Computer representation of numbers in different bases.
Decimal: 37145 = 5 + 40 + 100 + 7000 + 30000
= 5 × 10^0 + 4 × 10^1 + 1 × 10^2 + 7 × 10^3 + 3 × 10^4
General formula: A number in the decimal system is a string like
a_n a_{n-1} a_{n-2} … a_0 = a_0 × 10^0 + a_1 × 10^1 + … + a_n × 10^n
For a fractional part in decimal,
0.b_1 b_2 b_3 … = b_1 × 10^{-1} + b_2 × 10^{-2} + b_3 × 10^{-3} + ⋯
Expanding this to any base β, a typical number may be seen as
(a_n a_{n-1} … a_0 . b_1 b_2 b_3 …)_β = Σ_{k=0}^{n} a_k β^k + Σ_{k=1}^{∞} b_k β^{-k}
Examples.
(723.3041)_10 = 7 × 10^2 + 2 × 10^1 + 3 × 10^0 + 3 × 10^{-1} + 0 × 10^{-2} + 4 × 10^{-3} + 1 × 10^{-4}
To convert a number from one base to another, we may do the
following.
Example, from decimal to octal: divide the part before the decimal
point repeatedly by 8, keep the remainder aside, and continue
working with the quotient in the same manner until the quotient
becomes less than 8.
e.g. 723 ÷ 8 = 90 remainder 3 → 90 ÷ 8 = 11 remainder 2 → 11 ÷ 8 = 1 remainder 3
Collecting the final quotient and the remainders in reverse order, we get (723)_10 = (1323)_8
Check: (1323)_8 = 1 × 8^3 + 3 × 8^2 + 2 × 8^1 + 3 × 8^0 = 512 + 192 + 16 + 3 = 723
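This repeated-division procedure is easy to script. Below is a minimal R sketch (the function name to_base is ours, not from the text):

> to_base <- function(n, base = 8) {
+   digits <- c()
+   while (n > 0) {
+     digits <- c(n %% base, digits)  # remainder becomes the next digit, right to left
+     n <- n %/% base
+   }
+   paste(digits, collapse = "")
+ }
> to_base(723, 8)
[1] "1323"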
(1323)_8 can also be expressed in binary: take each digit from the
right and expand that digit into three binary digits. Therefore,
(1323)_8 = (1 011 010 011)_2
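A small R sketch of this digit-by-digit expansion, using base R's intToBits() (the helper name oct_to_bin3 is ours):

> oct_to_bin3 <- function(d) paste(rev(as.integer(intToBits(d))[1:3]), collapse = "")
> sapply(c(1, 3, 2, 3), oct_to_bin3)  # the digits of (1323)_8
[1] "001" "011" "010" "011"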
Similarly, to get the fractional part in octal, keep multiplying it by 8,
collecting at each step the digit that appears to the left of the
decimal point as the next digit of the octal expansion.
0.3041 × 8 = 2 + 0.4328
0.4328 × 8 = 3 + 0.4624
0.4624 × 8 = 3 + 0.6992
0.6992 × 8 = 5 + 0.5936
………
Thus, (0.3041)_10 = (.2335…)_8 = (.010 011 011 101)_2
Thus
(723.3041)_10 = (1323.2335…)_8 = (1 011 010 011.010 011 011 101)_2
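The multiply-by-8 loop can be sketched in R as well (the function name frac_to_base is ours; it returns the first few base-β digits of the fraction):

> frac_to_base <- function(x, base = 8, ndigits = 4) {
+   digits <- integer(ndigits)
+   for (k in seq_len(ndigits)) {
+     x <- x * base
+     digits[k] <- floor(x)  # the integer part of the product is the next digit
+     x <- x - digits[k]
+   }
+   digits
+ }
> frac_to_base(0.3041, 8)
[1] 2 3 3 5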
Floating point representations.
Computer arithmetic treats numbers other than integers as floating
point numbers, and they all comprise a sign bit, an integer part, and
a fractional part, as shown in these normalized scientific notation forms:
54.23157 = 0.5423157 × 10^2
0.000327138 = 0.327138 × 10^{-3}
−20137.21765 = −0.2013721765 × 10^5
In general, in normalized form, a floating point number x is represented
as
x = ±r × 10^n, where r = normalized mantissa and n = exponent
Note that 0.1 ≤ r < 1
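For instance, r and n can be recovered in R (the variable names are ours):

> x <- 54.23157
> n <- ceiling(log10(abs(x)))  # exponent that puts the mantissa in [0.1, 1)
> r <- x / 10^n
> c(r, n)
[1] 0.5423157 2.0000000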
The standard IEEE single-precision floating point form is a 32-bit
number, in which the first bit is the sign bit, the mantissa is 23 bits,
and the exponent is 8 bits.
For double precision arithmetic, we use a 64-bit representation, in
which the first bit is the sign bit, the mantissa is 52 bits (53
significant bits, counting the implicit leading 1), and the exponent
is 11 bits.
IEEE Floating point format:
x = (−1)^s × (1 + fraction) × 2^(exponent − bias)
s = 0 for positive, and 1 for negative
Exponent = excess representation: actual exponent + bias
■ Ensures the stored exponent is unsigned
■ For single precision: bias = 127
■ For double precision: bias = 1023
The final layout of a 32-bit floating point number:
Sign bit | 8 exponent bits | 23 mantissa bits
Single precision range
■ Exponent fields 00000000 and 11111111 are reserved.
■ Smallest value: exponent field 00000001, which implies actual
exponent = 1 − 127 = −126.
The smallest normalized value is ±1 × 2^{-126} ≈ ±1.2 × 10^{-38}
■ Largest value: exponent field 11111110 implies
actual exponent = 254 − 127 = 127
Fraction part 0.1111…1, so the significand 1.1111…1 ≈ 2.0
The number is ±2.0 × 2^127 ≈ ±3.4 × 10^38
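These bounds are easy to confirm numerically in R, since doubles can hold the single-precision extremes:

> 2^-126                # smallest normalized single-precision magnitude
[1] 1.175494e-38
> (2 - 2^-23) * 2^127   # largest single-precision magnitude
[1] 3.402823e+38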
Example (from http://sandbox.mc.edu/~bennet/cs110/flt/dtof.html):
Convert −1313.3125 to IEEE 32-bit floating point format.
The integral part is (1313)_10 = (2441)_8 = (10100100001)_2. The
fractional part:
0.3125 × 2 = 0.625   Generate 0 and continue.
0.625 × 2 = 1.25     Generate 1 and continue with the rest.
0.25 × 2 = 0.5       Generate 0 and continue.
0.5 × 2 = 1.0        Generate 1 and nothing remains.
So (1313.3125)_10 = (10100100001.0101)_2.
Normalize: (10100100001.0101)_2 = (1.01001000010101)_2 × 2^10.
Mantissa is 01001000010101000000000, exponent is 10 + 127 =
137 = (10001001)_2, sign bit is 1.
So −1313.3125 is 11000100101001000010101000000000 =
(C4A42A00)_16
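As a cross-check, here is a minimal R sketch that recovers the pieces of the formula x = (−1)^s × (1 + fraction) × 2^(exponent − bias) for this value (variable names ours):

> x <- -1313.3125
> s <- as.integer(x < 0)
> e <- floor(log2(abs(x)))   # actual (unbiased) exponent
> fraction <- abs(x) / 2^e - 1
> s; e + 127; fraction       # sign bit, stored exponent, fraction
[1] 1
[1] 137
[1] 0.2825317

Here 0.2825317 is the decimal value of the binary fraction 0.01001000010101, matching the mantissa above.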
Machine precision
■ For a 32-bit number, the smallest relative difference that we can
represent is 2^{-23} ≈ 1.2 × 10^{-7}. Therefore, in usual single precision
computations, approximately 6 significant decimal digits of accuracy
are available.
■ For double precision, each number is stored using two
consecutive words in memory, and 52 bits are available for the mantissa.
Since 2^{-52} ≈ 2.2 × 10^{-16}, we have approximately 15 significant
decimal digits of double precision accuracy.
■ An integer in 32 bits lies between −(2^31 − 1) and
(2^31 − 1) = 2147483647. In integer arithmetic, accuracy is about
9 digits only.
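R exposes these limits directly through the built-in .Machine list:

> .Machine$double.eps    # 2^-52, the spacing of doubles near 1
[1] 2.220446e-16
> .Machine$integer.max   # 2^31 - 1
[1] 2147483647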
Computational rounding off error
■ Let x = actual number, and fl(x) = its floating point machine
number. Then fl(x) = x(1 + δ), where |δ| ≤ 2^{-24}
■ The relative error in representing x by its floating point
machine number fl(x) is
e = |x − fl(x)| / |x| ≤ 2^{-24}
■ The same model gives the relative errors of the basic operations:
fl(x ⊙ y) = (x ⊙ y)(1 + δ)
■ In computers, all non-integer numbers (floating point numbers)
are expressed with finite precision. π and 1/3 cannot be expressed
by a fixed number of significant digits, and here is our source of
loss of significant digits.
Compare 0.3+0.3+0.3 with 0.9. Assign their difference to error.
What is the value of error?
> 0.3+0.3+0.3==0.9
[1] FALSE
> error=(0.3+0.3+0.3-0.9)
> error
[1] -1.110223e-16
>
Note also that floating point computation results depend on the
order of computations. Addition may not always be associative, i.e.
(a+b)+c may not always equal a+(b+c).
Example.
> (0.00000000000000001+1)-1
[1] 0
> 0.00000000000000001+(1-1)
[1] 1e-17
>
Some floating point numbers, like 0.1, have to be approximated, and
this is a major source of error:
0.1 = 1.1001 1001 1001 1001 1001 1001 × 2^{-4}
The fraction part (the mantissa) 1001 1001 1001 1001 1001 1001
is also called the significand, and −4 is the exponent. Storage for
floating points is dictated by the machine architecture. For a
particular machine, the sign takes 1 bit, the significand is stored
using 24 bits, and the exponent uses 7 bits, so that the entire
floating point number is stored in 4 bytes for single precision, as in
the example shown earlier.
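One can see the approximation of 0.1 in R by printing more digits than the default display shows:

> sprintf("%.20f", 0.1)  # 0.1 is not exactly representable in binary
[1] "0.10000000000000000555"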
All these examples point to loss of significant digits due to
machine limitations (rounding/chopping errors) of floating point
numbers.
In particular, be careful when subtracting two numbers that are
almost equal, as careless subtraction may lead to loss of
significant digits.
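A standard illustration of such cancellation (this example is ours, not from the text): (1 − cos x)/x^2 tends to 0.5 as x → 0, but the direct form subtracts two nearly equal numbers and loses every significant digit.

> x <- 1e-8
> (1 - cos(x)) / x^2        # direct form: catastrophic cancellation
[1] 0
> 2 * sin(x / 2)^2 / x^2    # algebraically equivalent, cancellation-free
[1] 0.5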
In general, error in numerical computation (our current issue)
arises from:
■ Modeling errors
■ Mistakes
■ Physical measurement errors
■ Machine representation of floating points
■ Mathematical approximation errors
Consequences of inadequate handling of errors:
■ Loss of significant digits
■ Noise in function evaluation
■ Over- and under-stretching of errors