Download Appendix B Floating Point Numbers

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Law of large numbers wikipedia , lookup

Approximations of π wikipedia , lookup

Rounding wikipedia , lookup

Large numbers wikipedia , lookup

Positional notation wikipedia , lookup

Location arithmetic wikipedia , lookup

Addition wikipedia , lookup

Arithmetic wikipedia , lookup

Elementary mathematics wikipedia , lookup

Transcript
Appendix B
Floating Point Numbers
Making a computer’s limited bit words represent a larger range of numerical values.
Modeling real floating-point number representations.
Mantissa – defined fractional component of a floating point number
Exponent – the exponentiation radix and value of the exponent
X  frac  R exponent
or
Decimal floating point range of values
1 of 9
X   1
sign
 frac  R exponent
Binary Floating Point
X  frac  R exp onent
Typical two’s complement
The fractional portion is in:

Fractional two’s complement 1  frac  0.5 and frac  0 and  0.5  frac  1

Mixed two’s complement

Hidden 1, two’s complement
2 Off  frac  2 Off 1 and  2 Off 1  frac  2 Off
2   frac  1  1 and  1   frac  1  2
The exponent portion is in (E is number of bits)

Integer two’s complement
2 E 1  1  exp  2 E 1
Note: specific values of the exponent may be used as “flags”.
Flags may be required to represent 0 and infinity
TI initially used a hidden-1, two’s complement format for floating point.
They now support IEEE, which based on hidden-1, sign-magnitude
Typical Sign-magnitude
X   1
sign
 frac  R exp onent
The fractional portion is in:

Fractional sign-magnitude
1  frac  0

Mixed sign-magnitude
2 Off  frac  2 Off 1

Hidden 1, sign-magnitude
2  frac  1

Hidden 1, sign-magnitude
2   frac  1  1 and  1   frac  1  2
The exponent portion is in (E is number of bits)

Excess M (integer with offset M)
2
E
 1  M  exp   M
Note: specific values of the exponent may be used as “flags”.
Flags may be required to represent 0 and infinity
IEEE floating point is based on hidden-1, sign-magnitude, excess exponent
2 of 9
Normalized Numbers
Numerical processes may result in numbers that are not in the correctly normalized floatingpoint format.
Text Examples of normalized and unnormalized numbers
Terms:
Normalized
The number is in the defined format
Unnormalized
The number is not in the normalized format
Denormalized
An unnormalized number, but it may be that way for a reason.
Denormalized representations are required when adding or subtracting two normalized numbers!
Since they may have unequal exponents, something has to be done with the mantissas.
3 of 9
IEEE Std. 754 Floating Point Numbers
IEEE Floating Point Numbers consist of 32-bit number representations with unique fractional
and exponent segments. The fractional parts are offset sign-magnitude, while the exponents are
in an excess 127 format.
http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4610935&tag=1
Sign (s): 1 bit
 1s
Exponent (exp):
Fraction (f):
Single: 8 bits
Single: 23 bits
Double: 11 bits
Double: 52 bits
Single  2 exp 127
Norm  1. f1 f 2   
Double  2
exp 1023
Denorm  0. f 1 f 2   
Representation based on the exponent range for 32-bit, single precision.
Excess 127: The “8-bit unsigned integer” is offset by 127 (i.e. UInt – 127)
0 and 255 or reserved values.
Exponent
Value
Type
1111 1111 (255)
Inf, NaN, Flags
Special
1111 1110 (254)
v   1  1. f 1 f 2     2127
Normalized


1000 0000 (128)
v   1  1. f1 f 2     21
Normalized
0111 1111 (127)
v   1  1. f1 f 2     2 0
Normalized
0111 1110 (126)
v   1  1. f 1 f 2     2 1
Normalized


0000 0001 (1)
v   1  1. f1 f 2     2 126
Normalized
0000 0000 (0)
v   1  0. f 1 f 2     2 126
Denormalized
s

s
s
s

s
s
4 of 9
Examples of IEEE Single Precision Floating Point
IEEE Hex/Binary
Value
3F80 0000
Norm: v   1  2127 127  1.0000000 
0011 1111 1000 0000 etc
v  1 2 0
BFC0 0000
Norm: v   1  2127 127  1.1000000
1011 1111 1100 0000 etc
v  1.5  2 0
7F80 0000
Special: (exp = 1111 1111)
0111 1111 1000 0000 etc
+Inf
FF00 0000
Norm: v   1  2 254127  1.0000000 
1111 1111 0000 0000 etc
v  1 2127
0070 000
Denorm: v   1  2 126  0.1110000
0000 0000 0111 0000 etc
v  7  2 126
8
IEEE Hex/Binary
Value
0
1
1
0
v  1 2 0
v  123
4000 0000
BFFF 0000
See: This is an on-line converter that I really liked!
http://babbage.cs.qc.edu/IEEE-754/
5 of 9
B1) Convert the following numbers to IEEE single precision floating-point format. Give the
results as eight hexadecimal digits.
Value
Sign bit
Binary w/ fraction
Exponent
Shifted Binary
Excess 127 Exp
Binary IEEE
Hex IEEE
9
0
1001.0
3 = 0000 0011
1.001
130 = 1000 0010
0100 0001 0001 0000
4110 0000
5/32
0
0.00101
-3 = 1111 1101
1.010
124 = 0111 1100
0011 1110 0010 0000
3E20 0000
Value
Sign bit
Binary w/ fraction
Exponent
Shifted Binary
Excess 127 Exp
Binary IEEE
Hex IEEE
-5/32
1
0.00101
-3 = 1111 1101
1.010
124 = 0111 1100
1011 1110 0010 0000
BE20 0000
6.125
0
0110.0010
2 = 0000 0010
1.100010
129 = 1000 0001
01000 0000 1100 0100
40C4 0000
B2) Convert the following IEEE single-precision floating-point numbers from hex to decimal:
IEEE Value
Sign bit
Excess 127 Exp
Exponent
Fraction
Format Value
Magnitude Binary
Value
42E4 8000
0
1000 0101 = 133
6
1.11001001000
2^6 x 1.11001001000
111 0010.01000
114.25
3F88 0000
0
0111 1111 = 127
0
1.0001000
2^0 x 1.0001000
1.0001000
1.0625
IEEE Value
Sign bit
Excess 127 Exp
Exponent
Fraction
Format Value
Magnitude Binary
Value
0080 0000
0
0000 0001 = 1
-126
1.0000
2^-126 x 1.0
0.000 --- 1
2^-126
C7F0 0000
1
1000 1111 = 143
16
1.1110000
-2^16 x 1.1110
11110000000000000
-122,880
6 of 9
Everyone should read:
What Every Computer Scientist Should Know
About Floating-Point Arithmetic
David Goldberg. 1991. What every computer scientist should know about floating-point
arithmetic. ACM Comput. Surv. 23, 1 (March 1991), 5-48. DOI=10.1145/103162.103163
http://doi.acm.org/10.1145/103162.103163
http://dl.acm.org/citation.cfm?id=103163
http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html#674
Abstract: Floating-point arithmetic is considered an esoteric subject by many people. This is
rather surprising because floating-point is ubiquitous in computer systems. Almost every
language has a floating-point datatype; computers from PC’s to supercomputers have floatingpoint accelerators; most compilers will be called upon to compile floating-point algorithms from
time to time; and virtually every operating system must respond to floating-point exceptions such
as overflow. This paper presents a tutorial on those aspects of floating-point that have a direct
impact on designers of computer systems. It begins with background on floating-point
representation and rounding error, continues with a discussion of the IEEE floating-point
standard, and concludes with numerous examples of how computer builders can better support
floating-point.
Another on-line article:
Five Tips for Floating Point Programming, By John D. Cook, 30 Oct 2008
http://www.codeproject.com/Articles/29637/Five-Tips-for-Floating-Point-Programming
From Wikipedia: http://en.wikipedia.org/wiki/Floating_point
 Dealing with exception cases
 Accuracy problems
 Machine precision definition
 Example of well-formed and poorly formed iteration for computing pi.
In this article there is an example of “catastrophic cancellation” where an iterative algorithm
performs subtraction and the numerical accuracy of the result is terrible or “dangerously
inaccurate”.
If you are going to be involved with “engineering or scientific” numerical processing,
take a class in Numerical Analysis.
7 of 9
Floating Point Arithmetic Hardware Implementation
Addition/Subtraction of floating point numbers:
 Denormalize the smaller number
 Add the mantissa
 Normalize the mantissa and adjust the exponent or form the correct representation.
 Check to see if the exponent has overflowed and form the correct representation.
Floating Point Add-Subtract
(from Chapter 6, Computer System Design and Architecture, V. Heuring and H. Jordan,
Addison Wesley, 1997, ISBN 0-8053-4330-X)
8 of 9
Multiplication of floating point numbers:
 Multiply the mantissas
 Add the exponents
 Normalize the mantissa and adjust the new exponent or form the correct representation.
 Check to see if the exponent has overflowed and form the correct representation
A(Sign, Exo, Mant.)
XOR
Sign
Add Exponents
B(Sign, Exo, Mant.)
Multiply Fractions
Normalize
Add Exponents
C(Sign, Exo, Mant.)
9 of 9