Floating Point Numbers
CS208
Floating Point Numbers

Now you've seen unsigned and signed integers. In real life we also need to be able to represent numbers with fractional parts (like 12.5 and 45.39).
- These are called floating point numbers.
- You will learn the IEEE 32-bit floating point representation.
Floating Point Numbers


In the decimal system, a decimal point (radix point) separates the whole part from the fractional part.
Examples:
37.25 (whole = 37, fraction = 25/100)
123.567
10.12345678
Floating Point Numbers
For example, 37.25 can be analyzed as:
Column value:  10^1   10^0    10^-1    10^-2
Name:          Tens   Units   Tenths   Hundredths
Digit:         3      7       2        5
37.25 = (3 x 10) + (7 x 1) + (2 x 1/10) + (5 x 1/100)
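The positional expansion above can be checked mechanically. The following is a minimal sketch in Python; the helper name `expand_decimal` is only an illustration and does not come from the slides. It uses exact fractions so that 1/10 and 1/100 are not themselves rounded.

```python
from fractions import Fraction

def expand_decimal(text):
    """Return (digit, column value) pairs for a decimal string such as '37.25'."""
    whole, _, frac = text.partition(".")
    terms = []
    # Columns left of the radix point: ..., 10^1, 10^0
    for i, digit in enumerate(whole):
        terms.append((int(digit), Fraction(10) ** (len(whole) - 1 - i)))
    # Columns right of the radix point: 10^-1, 10^-2, ...
    for i, digit in enumerate(frac, start=1):
        terms.append((int(digit), Fraction(1, 10 ** i)))
    return terms

terms = expand_decimal("37.25")
for digit, column in terms:
    print(digit, "x", column)
print(sum(digit * column for digit, column in terms))  # 149/4, i.e. 37.25
```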
Binary Equivalence
The binary equivalent of a floating point number
can be determined by computing the binary
representation for each part separately.
1) For the whole part:
Use the subtraction or division method previously learned.
2) For the fractional part:
Use the subtraction or multiplication method (to be shown next).
Fractional Part – Multiplication Method
In the binary representation of a floating point
number the column values will be as follows:
… 2^5  2^4  2^3  2^2  2^1  2^0  .  2^-1  2^-2  2^-3  2^-4 …
… 32   16   8    4    2    1    .  1/2   1/4   1/8   1/16 …
… 32   16   8    4    2    1    .  .5    .25   .125  .0625 …
Fractional Part – Multiplication Method
Ex 1. Find the binary equivalent of 0.25
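Step 1: Multiply the fraction by 2, repeating on the fractional part of each result, until the fractional part becomes 0.
.25 x 2 = 0.5   (whole part 0)
.5  x 2 = 1.0   (whole part 1, fractional part is 0, so stop)
Step 2: Collect the whole parts in forward order. Put them after the radix point.
Column value:  .5   .25   .125   .0625
Bit:           0    1
0.25_10 = 0.01_2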
Fractional Part – Multiplication Method
Ex 2. Find the binary equivalent of 0.625
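Step 1: Multiply the fraction by 2, repeating on the fractional part of each result, until the fractional part becomes 0.
.625 x 2 = 1.25   (whole part 1)
.25  x 2 = 0.50   (whole part 0)
.50  x 2 = 1.0    (whole part 1, fractional part is 0, so stop)
Step 2: Collect the whole parts in forward order. Put them after the radix point.
Column value:  .5   .25   .125   .0625
Bit:           1    0     1
0.625_10 = 0.101_2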
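The multiplication method can be sketched in a few lines of Python. The function name `fraction_to_binary_mul` and the `max_bits` cutoff are assumptions added for illustration. The cutoff is needed because many decimal fractions (0.1, for example) never reach a fractional part of 0 in binary.

```python
from fractions import Fraction

def fraction_to_binary_mul(frac, max_bits=32):
    """Convert a fraction in [0, 1) to binary digits by repeated doubling."""
    frac = Fraction(frac)          # exact arithmetic, no rounding surprises
    bits = []
    while frac != 0 and len(bits) < max_bits:
        frac *= 2
        whole = int(frac)          # the whole part is always 0 or 1
        bits.append(str(whole))    # collect whole parts in forward order
        frac -= whole              # keep only the fractional part
    return "." + "".join(bits)

print(fraction_to_binary_mul("0.25"))   # .01
print(fraction_to_binary_mul("0.625"))  # .101
```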
Fractional Part – Subtraction Method
Start with the column values again, as follows:
… 2^0  .  2^-1  2^-2  2^-3  2^-4   2^-5    2^-6 …
… 1    .  1/2   1/4   1/8   1/16   1/32    1/64 …
… 1    .  .5    .25   .125  .0625  .03125  .015625 …
Fractional Part – Subtraction Method
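Starting with 0.5, try to subtract the column values from left to right. Insert a 1 in the column if the value can be subtracted (and subtract it), or a 0 if it cannot. Continue until the fraction becomes .0
Ex 1. Find the binary equivalent of 0.25
.5 cannot be subtracted from .25: bit 0
.25 - .25 = .0: bit 1 (the fraction is now .0, so stop)
Column value:  .5   .25   .125   .0625
Bit:           0    1
0.25_10 = 0.01_2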
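The subtraction rule can be written as a short loop. This is a sketch under the same assumptions as before: the name `fraction_to_binary_sub` and the `max_bits` cutoff are illustrative and not part of the slides.

```python
from fractions import Fraction

def fraction_to_binary_sub(frac, max_bits=32):
    """Convert a fraction in [0, 1) to binary by subtracting column values."""
    frac = Fraction(frac)
    column = Fraction(1, 2)        # first column right of the radix point
    bits = []
    while frac != 0 and len(bits) < max_bits:
        if column <= frac:         # the column value can be subtracted
            bits.append("1")
            frac -= column
        else:                      # it cannot, so the column holds a 0
            bits.append("0")
        column /= 2                # move one column to the right
    return "." + "".join(bits)

print(fraction_to_binary_sub("0.25"))  # .01
```

Both methods produce the same digits; they differ only in whether the fraction is scaled up (multiplication) or the column value is scaled down (subtraction).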
Binary Equivalent of FP number
Ex 2. Convert 37.25, using subtraction method.
Column value:  64   32   16   8    4    2    1    .  .5    .25   .125  .0625
Power of 2:    2^6  2^5  2^4  2^3  2^2  2^1  2^0  .  2^-1  2^-2  2^-3  2^-4
Bit:                1    0    0    1    0    1    .  0     1
Whole part:
37 - 32 = 5
5 - 4 = 1
1 - 1 = 0
Fractional part:
.25 - .25 = .0
37.25_10 = 100101.01_2
Binary Equivalent of FP number
Ex 3. Convert 18.625, using subtraction method.
Column value:  64   32   16   8    4    2    1    .  .5    .25   .125  .0625
Power of 2:    2^6  2^5  2^4  2^3  2^2  2^1  2^0  .  2^-1  2^-2  2^-3  2^-4
Bit:                     1    0    0    1    0    .  1     0     1
Whole part:
18 - 16 = 2
2 - 2 = 0
Fractional part:
.625 - .5 = .125
.125 - .125 = 0
18.625_10 = 10010.101_2
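The two worked examples can be reproduced by applying the subtraction method to the whole part and the fractional part in turn. The sketch below is one way to do that in Python; `to_binary` and its `max_frac_bits` cutoff are hypothetical names, and the function deliberately handles nonnegative inputs only.

```python
from fractions import Fraction

def to_binary(text, max_frac_bits=32):
    """Convert a nonnegative decimal string such as '37.25' to a binary string."""
    value = Fraction(text)
    if value < 0:
        raise ValueError("this sketch handles nonnegative values only")
    whole = int(value)
    frac = value - whole

    # Whole part: find the largest column value that fits, then work downward.
    column = 1
    while column * 2 <= whole:
        column *= 2
    whole_bits = []
    while column >= 1:
        if column <= whole:
            whole_bits.append("1")
            whole -= column
        else:
            whole_bits.append("0")
        column //= 2

    # Fractional part: subtract .5, .25, .125, ... until nothing is left.
    frac_bits = []
    column = Fraction(1, 2)
    while frac != 0 and len(frac_bits) < max_frac_bits:
        if column <= frac:
            frac_bits.append("1")
            frac -= column
        else:
            frac_bits.append("0")
        column /= 2

    result = "".join(whole_bits)
    if frac_bits:
        result += "." + "".join(frac_bits)
    return result

print(to_binary("37.25"))   # 100101.01
print(to_binary("18.625"))  # 10010.101
```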
Try It Yourself
Convert the following decimal numbers to binary:
60.75_10
190.5625_10
(Answers on next page)
Answers
60.75_10 = 111100.11_2
190.5625_10 = 10111110.1001_2
Problem storing binary form

We have no way to store the radix point!

A standards committee came up with a way to store floating point numbers (numbers that have a radix point).
IEEE Floating Point Representation

Floating point numbers can be stored in 32 bits by dividing the bits into three parts: the sign, the exponent, and the mantissa.
Bit 1: sign   Bits 2-9: exponent   Bits 10-32: mantissa
IEEE Floating Point Representation

The first (leftmost) field of our floating point representation will STILL be the sign bit:
- 0 for a positive number,
- 1 for a negative number.

Storing the Binary Form
How do we store a radix point?
- All we have are zeros and ones…
Make sure that the radix point is ALWAYS in the same position within the number.
Use the IEEE 32-bit standard:
- the leftmost digit must be a 1
Solution is Normalization
Every binary number, except the one
corresponding to the number zero, can be
normalized by choosing the exponent so that the
radix point falls to the right of the leftmost 1 bit.
37.25_10 = 100101.01_2 = 1.0010101 x 2^5
7.625_10 = 111.101_2 = 1.11101 x 2^2
0.3125_10 = 0.0101_2 = 1.01 x 2^-2
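Normalization amounts to locating the leftmost 1 bit and counting how far the radix point has to move. A minimal sketch, working directly on binary strings; the helper name `normalize` is an assumption made for this example.

```python
def normalize(binary):
    """Normalize a nonzero, unsigned binary string such as '100101.01'.

    Returns the normalized digits ('1.xxxx') and the power-of-two exponent.
    """
    whole, _, frac = binary.partition(".")
    digits = whole + frac
    first_one = digits.index("1")            # raises ValueError for zero
    # Number of places the radix point moves (left is positive).
    exponent = len(whole) - 1 - first_one
    rest = digits[first_one + 1:].rstrip("0")
    return "1." + (rest or "0"), exponent

print(normalize("100101.01"))  # ('1.0010101', 5)
print(normalize("111.101"))    # ('1.11101', 2)
print(normalize("0.0101"))     # ('1.01', -2)
```

Zero has no leftmost 1 bit, which is why the slide excludes it; in this sketch that case surfaces as a `ValueError`.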
IEEE Floating Point Representation

The second field of the floating point number
will be the exponent.

The exponent is stored as an unsigned 8-bit
number, RELATIVE to a bias of 127.

Exponent 5 is stored as (127 + 5) or 132
132_10 = 10000100_2

Exponent -5 is stored as (127 + (-5)) or 122
122_10 = 01111010_2
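Biasing is a single addition followed by an 8-bit binary conversion. In the sketch below, `biased_exponent` is a hypothetical helper. The range check reflects the fact that the stored values 0 and 255 are reserved by the IEEE format for special cases, which these slides do not cover.

```python
def biased_exponent(exponent, bias=127):
    """Return the 8-bit stored form of an exponent, using a bias of 127."""
    stored = exponent + bias
    # 0 and 255 are reserved (zero/subnormals and infinity/NaN), so a
    # normalized number must land in 1..254.
    if not 1 <= stored <= 254:
        raise ValueError("exponent out of range for a normalized 32-bit float")
    return format(stored, "08b")

print(biased_exponent(5))   # 10000100
print(biased_exponent(-5))  # 01111010
```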
Try It Yourself
How would the following exponents be stored (8 bits, 127-biased):
2^-10
2^8
(Answers on next slide)
Answers
               2^-10       2^8
exponent       -10         8
bias           +127        +127
               117         135
8-bit value    01110101    10000111
IEEE Floating Point Representation

The mantissa is the set of 0's and 1's to the right of the radix point of the normalized binary number (normalized means the digit to the left of the radix point is 1).
Ex:
1.00101 x 2^3
(The mantissa is 00101)
- The mantissa is stored in a 23-bit field, so we add zeros to the right side and store:
00101000000000000000000
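Filling the 23-bit field is just right-padding with zeros. A small sketch; `mantissa_field` is an illustrative name, and the sketch refuses inputs longer than 23 bits because rounding is outside what the slides describe.

```python
def mantissa_field(normalized, width=23):
    """Return the stored mantissa for normalized digits such as '1.00101'."""
    _, _, frac = normalized.partition(".")   # drop the leading '1.'
    if len(frac) > width:
        raise ValueError("mantissa would need rounding, not handled here")
    return frac.ljust(width, "0")            # add zeros on the right side

print(mantissa_field("1.00101"))  # 00101000000000000000000
```

The leading 1 is not stored: normalization guarantees it is always there, so only the digits after the radix point take up space.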
Decimal Floating Point to
IEEE standard Conversion
Ex 1: Find the IEEE FP representation of 40.15625
Step 1. Compute the binary equivalent of the whole part and the fractional part (i.e. convert 40 and .15625 to their binary equivalents).
Decimal Floating Point to
IEEE standard Conversion
Whole part:
40 - 32 = 8
8 - 8 = 0
Result: 101000
Fractional part:
.15625 - .12500 = .03125
.03125 - .03125 = .0
Result: .00101
So: 40.15625_10 = 101000.00101_2
Decimal Floating Point to
IEEE standard Conversion
Step 2. Normalize the number by moving the radix point to the right of the leftmost one.
101000.00101 = 1.0100000101 x 2^5
Decimal Floating Point to
IEEE standard Conversion
Step 3. Convert the exponent to a biased exponent
127 + 5 = 132
And convert the biased exponent to 8-bit unsigned binary:
132_10 = 10000100_2
Decimal Floating Point to
IEEE standard Conversion
Step 4. Store the results from steps 1-3:
Sign   Exponent (from step 3)   Mantissa (from step 2)
0      10000100                 01000001010000000000000
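A hand conversion like this one can be cross-checked against the machine's own 32-bit encoding. The sketch below uses Python's standard `struct` module to pack a value as a big-endian IEEE single and split the bits into the three fields; the function name `float_to_fields` is an assumption for this example.

```python
import struct

def float_to_fields(x):
    """Return (sign, exponent, mantissa) bit strings of x as a 32-bit float."""
    # Pack as a big-endian 32-bit float, then reread the same 4 bytes
    # as an unsigned 32-bit integer to get at the raw bits.
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    text = format(bits, "032b")
    return text[0], text[1:9], text[9:]

print(float_to_fields(40.15625))
# ('0', '10000100', '01000001010000000000000')
```

This check only confirms the result; it does not show the steps, which is why the manual method is still worth working through.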
Decimal Floating Point to
IEEE standard Conversion
Ex 2: Find the IEEE FP representation of –24.75
Step 1. Compute the binary equivalent of the whole
part and the fractional part.
Whole part:
24 - 16 = 8
8 - 8 = 0
Result: 11000
Fractional part:
.75 - .50 = .25
.25 - .25 = .0
Result: .11
So: -24.75_10 = -11000.11_2
Decimal Floating Point to
IEEE standard Conversion
Step 2. Normalize the number by moving the radix point to the right of the leftmost one.
-11000.11 = -1.100011 x 2^4
Decimal Floating Point to
IEEE standard Conversion.
Step 3. Convert the exponent to a biased exponent
127 + 4 = 131
==> 131_10 = 10000011_2
Step 4. Store the results from steps 1-3
Sign   Exponent   Mantissa
1      10000011   1000110..0
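Steps 1 to 4 can be combined into one routine. The following sketch does the normalization arithmetically (halving or doubling until the value lies between 1 and 2) rather than by moving a radix point in a string, which is an implementation choice of this example and not the slides' procedure. The name `encode_ieee32` is hypothetical, and the sketch rejects zero and any value that would need rounding.

```python
from fractions import Fraction

def encode_ieee32(text):
    """Encode a decimal string such as '-24.75' as (sign, exponent, mantissa)."""
    value = Fraction(text)
    if value == 0:
        raise ValueError("zero cannot be normalized")
    sign = "1" if value < 0 else "0"
    value = abs(value)

    # Normalize: scale by powers of two until 1 <= value < 2.
    exponent = 0
    while value >= 2:
        value /= 2
        exponent += 1
    while value < 1:
        value *= 2
        exponent -= 1

    # Bias the exponent by 127 and store it in 8 bits.
    stored = exponent + 127
    if not 1 <= stored <= 254:
        raise ValueError("exponent out of range for a normalized 32-bit float")
    exponent_bits = format(stored, "08b")

    # Mantissa: the part after the leading 1, as exactly 23 bits.
    scaled = (value - 1) * 2 ** 23
    if scaled.denominator != 1:
        raise ValueError("value would need rounding, not handled here")
    mantissa_bits = format(int(scaled), "023b")
    return sign, exponent_bits, mantissa_bits

print(encode_ieee32("-24.75"))
# ('1', '10000011', '10001100000000000000000')
```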
Try It Yourself
Convert the following numbers to IEEE floating point representation:
60.75_10
-190.5625_10
Answers
60.75_10 = 0 10000100 11100110..0
-190.5625_10 = 1 10000110 011111010010..0
IEEE standard to Decimal
Floating Point Conversion.

Do the steps in reverse order.

In reversing the normalization step, move the radix point the number of digits equal to the exponent:
- If exponent is positive, move to the right
- If exponent is negative, move to the left
IEEE standard to Decimal
Floating Point Conversion.
Ex 1: Convert the following 32-bit binary number to its decimal floating point equivalent:
Sign   Exponent   Mantissa
1      01111101   010..0
IEEE standard to Decimal
Floating Point Conversion..
Step 1: Extract the biased exponent and unbias it
Biased exponent = 01111101_2 = 125_10
Unbiased exponent: 125 - 127 = -2
IEEE standard to Decimal
Floating Point Conversion..
Step 2: Write the normalized number in the form:
1.(Mantissa) x 2^(Exponent)
For our number:
-1.01 x 2^-2
IEEE standard to Decimal
Floating Point Conversion.
Step 3: Denormalize the binary number from step 2 (i.e. move the radix point and get rid of the x 2^n part):
-0.0101_2 (negative exponent: move left)
Step 4: Convert the binary number to the FP equivalent (i.e. add all column values with 1s in them)
-0.0101_2 = -(0.25 + 0.0625) = -0.3125_10
IEEE standard to Decimal
Floating Point Conversion.
Ex 2: Convert the following 32-bit binary number to its decimal floating point equivalent:
Sign   Exponent   Mantissa
0      10000011   10011000..0
IEEE standard to Decimal
Floating Point Conversion..
Step 1: Extract the biased exponent and unbias it
Biased exponent = 10000011_2 = 131_10
Unbiased exponent: 131 - 127 = 4
IEEE standard to Decimal
Floating Point Conversion..
Step 2: Write the normalized number in the form:
1.(Mantissa) x 2^(Exponent)
For our number:
1.10011 x 2^4
IEEE standard to Decimal
Floating Point Conversion.
Step 3: Denormalize the binary number from step 2 (i.e. move the radix point and get rid of the x 2^n part):
11001.1_2 (positive exponent: move right)
Step 4: Convert the binary number to the FP equivalent (i.e. add all column values with 1s in them)
11001.1_2 = 16 + 8 + 1 + .5 = 25.5_10
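The reverse conversion can also be sketched in code. Instead of moving the radix point in a string, this version multiplies the value 1.(mantissa) by 2 raised to the unbiased exponent, which gives the same result. `decode_fields` is a hypothetical helper that accepts the mantissa as its leading bits and pads the rest with zeros, mirroring the `..0` shorthand used in the examples.

```python
from fractions import Fraction

def decode_fields(sign, exponent, mantissa):
    """Decode a normalized 32-bit float given as three bit strings."""
    stored = int(exponent, 2)
    if not 1 <= stored <= 254:
        raise ValueError("only normalized numbers are handled in this sketch")
    unbiased = stored - 127                       # Step 1: remove the bias
    mantissa = mantissa.ljust(23, "0")            # fill out the 23-bit field
    # Step 2: rebuild 1.(mantissa) as an exact fraction.
    significand = 1 + Fraction(int(mantissa, 2), 2 ** 23)
    # Steps 3 and 4: apply the power of two and the sign.
    value = significand * Fraction(2) ** unbiased
    return float(-value if sign == "1" else value)

print(decode_fields("1", "01111101", "01"))     # -0.3125
print(decode_fields("0", "10000011", "10011"))  # 25.5
```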
Try It Yourself
Convert the following IEEE floating point numbers to decimal:
0 10000010 1110010..0
0 10000110 01010..0
1 01111101 10100..0
Answers
0 10000010 1110010..0 = 15.125_10
0 10000110 01010..0 = 168.0_10
1 01111101 10100..0 = -0.40625_10