CSCI206 - Computer Organization & Programming
Introduction to Floating Point Numbers
zyBook: 10.5
Developed and maintained by the Bucknell University Computer Science Department - 2017
Real Numbers in Binary
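Recall the equation describing a positional number system with base $b$ and digits $d_i$:
$v = \sum_{i=0}^{n-1} d_i \times b^i$
This can be extended to real numbers by letting the index run below zero for the digits to the right of the decimal point:
$v = \sum_{i=-m}^{n-1} d_i \times b^i$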
http://www.virtualnerd.com/pre-algebra/real-numbers-right-triangles/real-and-irrational/define-realnumbers/real-number-definition
Real Numbers in Binary
Examples
0.1 in base 10 is $1 \times 10^{-1}$
0.1 in binary is $1 \times 2^{-1}$ = 0.5 (base 10)
Real Numbers in Binary
Fractions whose denominator is a power of 2 are easy.
Other values are represented using a sum of fractions with power of 2 denominators.
Algorithm to Convert Decimal to Binary
For decimal to binary integers we divide by 2 and record the remainder.
For decimal to binary real numbers we multiply by 2 and record the integer part. The recorded integer parts are the binary digits, beginning to the right of the decimal point. After each step, only work with the fractional part!
Example: convert 0.09375 to binary.
0.09375 * 2 = 0.1875, integer part 0
0.1875 * 2 = 0.375, integer part 0
0.375 * 2 = 0.75, integer part 0
0.75 * 2 = 1.5, integer part 1 (continue with 0.5)
0.5 * 2 = 1.0, integer part 1
The fraction part is zero, stop!
0.09375 in decimal = 0.00011 in binary
It appears magic, but the reason behind the algorithm is that v is expressed as b0*2^(-1) + b1*2^(-2) + …, so multiplying by 2 moves the next binary digit into the integer position, where it can be read off. The remaining fraction holds the rest of the digits.
Convert binary to decimal
Convert $0.00011_2$ to decimal
$0.00011_2 = 2^{-4} + 2^{-5} = 1/16 + 1/32 = 3/32 = 0.09375_{10}$
Number Representation in Computing
For a given range of integers, there is a
corresponding range of (exact) binary
representations
Example: the range [0-15] corresponds to the 4-bit
binary numbers 0000 through 1111.
Number Representation in Computing
Within a range of real numbers, there is no
way to encode all possible values.
Example: the range [0.0 - 1.0] has an infinite number
of points, so we would need an infinite number of
bits to represent all of the possible values!
As a result, in computing, real numbers are
approximate.
Activity 24, question 1 - 2
An Observation
Many common base 10 real numbers (for example, 0.1) generate an infinite number of binary digits
Fixed vs. Floating Point Approximations
Decimal notation    Scientific notation
2                   $2 \times 10^{0}$
300                 $3 \times 10^{2}$
321.7               $3.217 \times 10^{2}$
−53,000             $-5.3 \times 10^{4}$
6,720,000,000       $6.72 \times 10^{9}$
0.2                 $2 \times 10^{-1}$
Floating Point Representation
$(-1)^S \times M \times 2^E$
Where
S is the sign bit
M is a fixed point number (precision of numbers)
E is a signed integer (range of numbers)
Layout of the 32 or 64 bit word: S | Exponent | Mantissa (or Fraction)
IEEE 754 Standard (1985)
Layout: S | Exponent | Mantissa
One bit for Sign
Single precision float (32 bits): 8 bit Exponent, 23 bit Mantissa
Double precision float (64 bits): 11 bit Exponent, 52 bit Mantissa
IEEE 754 Standard (1985)
Mantissa is normalized, meaning it is a fixed point number in the form 1.xxxxxx
To save one bit, the leading "1." is implicit (not represented)
Exponent is represented in biased form: the stored value is the true exponent plus a bias B
B = 127 for single
B = 1023 for double
IEEE 754 Standard (1985) (normalized)
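$value = (-1)^S \times 1.M \times 2^{E - B}$
S, E, and M are encoded in the binary word. Here M is the stored fraction that follows the implicit leading 1, and E is the stored (biased) exponent.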
IEEE754 - Reserved Values
Not a Number (NaN) = exponent bits all 1 with a nonzero mantissa
IEEE754 - Example
Show 3.14 as a single precision float
3.14 - step 1 write in binary
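3.14 == 3 + 0.14, and 3 == 11 in binary
0.14*2 = 0.28, integer part 0
0.28*2 = 0.56, integer part 0
0.56*2 = 1.12, integer part 1
0.12*2 = 0.24, integer part 0
......
0.14 == 0.0010...... in binary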
3.14 - step 1 write in binary
need 24 bits for single (53 for double), counting the implicit leading 1
3.14 == 11.0010001111010111000011 (binary, rounded to 24 bits)
3.14 - step 2 normalize binary
Normalized form is 1.yyyyy
3.14 == 11.0010001111010111000011 == $1.10010001111010111000011 \times 2^{1}$
Note a total of 24 bits.
3.14 - step 3 write mantissa & sign
3.14 == $1.10010001111010111000011 \times 2^{1}$
M = 10010001111010111000011
S = 0 (positive)
Note that the mantissa keeps only 23 bits: the leading bit is always 1, so it is omitted from the stored representation only. It still counts as part of the value.
3.14 - step 4 encode exponent
3.14 == $1.10010001111010111000011 \times 2^{1}$
Exponent = 1, B = 127, (8 bits)
E (biased exponent) = 1 + 127 = 128 = 1000 0000
3.14 - step 5 write result
S = 0 (positive)
E = 1000 0000
M = 10010001111010111000011
0 1000 0000 10010001111010111000011
regrouped in 4-bit groups: 0100 0000 0100 1000 1111 0101 1100 0011
to hex = 0x4048f5c3
Endianness
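On a little-endian system (Intel, etc.), the bytes of the IEEE754 value are stored in memory in reverse order, least significant byte first.
0x 40 48 f5 c3 (big endian: bytes in memory are 40 48 f5 c3)
Swap the bytes within each 16-bit word (4840 c3f5), then swap the words!
0x c3f5 4840 (little endian: bytes in memory are c3 f5 48 40)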
#include <stdio.h>
int main(void) {
    float f = 3.14f;
    unsigned char *p = (unsigned char *)&f;  // view the float's bytes in memory order
    printf("%02x%02x %02x%02x\n", p[0], p[1], p[2], p[3]);  // little-endian result: c3f5 4840
}