Download On the definition of ulp(x)

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Laws of Form wikipedia , lookup

Corecursion wikipedia , lookup

Fundamental theorem of algebra wikipedia , lookup

Addition wikipedia , lookup

Transcript
On the definition of ulp(x)
Jean-Michel Muller
To cite this version:
Jean-Michel Muller. On the definition of ulp(x). RR-5504, INRIA. 2005, pp.16. <inria00070503>
HAL Id: inria-00070503
https://hal.inria.fr/inria-00070503
Submitted on 19 May 2006
HAL is a multi-disciplinary open access
archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from
teaching and research institutions in France or
abroad, or from public or private research centers.
L’archive ouverte pluridisciplinaire HAL, est
destinée au dépôt et à la diffusion de documents
scientifiques de niveau recherche, publiés ou non,
émanant des établissements d’enseignement et de
recherche français ou étrangers, des laboratoires
publics ou privés.
INSTITUT NATIONAL DE RECHERCHE EN INFORMATIQUE ET EN AUTOMATIQUE
On the definition of ulp (x)
Jean-Michel Muller
N° 5504
February 2005
Thème SYM
ISSN 0249-6399
apport
de recherche
On the definition of ulp (x)
Jean-Michel Muller
Thème SYM — Systèmes symboliques
Projet Arénaire
Rapport de recherche n° 5504 — February 2005 —16 pages
Abstract: Function ulp (acronym for unit in the last place) is frequently used for expressing errors in floating-point computations. We present several previously suggested
definitions of that function, and analyse some of their properties.
Key-words: Computer arithmetic, floating-point arithmetic, unit in the last place, ULP
This text is also available as a research report of the Laboratoire de l’Informatique du Parallélisme
http://www.ens-lyon.fr/LIP.
Unité de recherche INRIA Rhône-Alpes
655, avenue de l’Europe, 38334 Montbonnot Saint Ismier (France)
Téléphone : +33 4 76 61 52 00 — Télécopie +33 4 76 61 52 52
A propos de la définition de ulp (x)
Résumé : La fonction ulp (acronyme pour unit in the last place, c’est-à-dire “poids du
dernier chiffre”) est fréquemment utilisée pour exprimer des erreurs en arithmétique virgule flottante. Nous présentons plusieurs définitions précédemment suggérées pour cette
fonction, et analysons quelques unes de leurs propriétés.
Mots-clés : Arithmétique des ordinateurs, arithmétique virgule flottante, poids du dernier
chiffre
On the definition of ulp (x)
3
1 Introduction
The term ulp (acronym for unit in the last place) was coined by W. Kahan in 1960. The
original definition was [5]:
ulp (x) is the gap between the two floating-point numbers nearest x, even if x
is one of them.
As told by Kahan [5], the adoption of the IEEE-754 standard for floating-point arithmetic
has made infinities and NaNs ubiquitous, and that must be taken into account in the definition of ulp (x). Kahan now suggests the following definition:
ulp (x) is the gap between the two finite floating-point numbers nearest x, even
if x is one of them. (But ulp (NaN) is NaN.)
Several slightly different definitions of ulp (x) appear in the literature [3, 4, 6, 8]. In
this paper, we remind these various definitions and we analyze some of their properties.
Among these properties, some have certainly already been found by other people having
dealt with this topic (without, to my knowledge, having been published, except when I give
references). And yet, I feel it may be useful to collect them in a paper.
Thorough the paper, we assume a radix-r floating-point (FP for short) arithmetic, with
n-digit mantissas1 . If X is an FP number, then X + denotes the smallest FP number larger
than X and X − denotes the largest FP number less than X.
A good definition of function ulp :
• should (of course) agree with the “intuitive” notion when x is not in an “ambiguous
area” (i.e., x is not near a power of the radix, of larger than the largest representable
number, or ±∞, or zero. . . );
• should be useful: after all, for a binary n-bit format, defining ulp (1) as 2 −n (i.e.,
1 − 1− ) or 2−n+1 (i.e., 1+ − 1) are equally legitimate from a theoretical point of
view. What matters is which choice is helpful (i.e., which choice will preserve in
“ambiguous areas” properties that are true when we are far from them);
Let us consider the following common claims. They are true “in general”, but they need
some clarification. In the following RN (x) is x rounded to the nearest (even) floatingpoint (FP) number, RD (x) is x rounded towards −∞, RU (x) is x rounded towards +∞,
and RZ (x) is x rounded towards zero. The uppercase letter X will denote an FP number,
whereas x will denote a real number.
1
The possible implicit leading bit of the binary systems is counted in these n digits. For instance, in IEEE754 double precision arithmetic, n is equal to 53.
RR n° 5504
4
Jean-Michel Muller
Common claim 1
X = RN (x) ⇒ |x − X| ≤
Common claim 2
|x − X| <
1
ulp
2
1
ulp ⇒ X = RN (x)
2
Common claim 3
|x − X| < 1 ulp ⇔ X ∈ { RD (x), RU (x)}
In these claims, several things are unclear. The first one, of course, is the definition
of ulp (especially near the powers of the radix). The second one is whether “ ulp ” means
ulp (x) or ulp (X). Of course, in most practical cases, both values will be equal. But in
difficult cases (e.g., X is a loose approximation to x, or these values are close to a power of
the radix), they may differ.
2 Should we consider ulp (x) or ulp (X) ?
It should be clear that, for measuring the error of an approximation, the (possibly very loose)
approximation should not be used for defining the measure of error: the distance between
x (exact value) and X (FP approximation) should be expressed in terms of ulp (x), instead
of ulp (X). Just consider the example given in Figure 1: we assume a binary floating-point
system, with n-bit mantissas, we consider the real number x = 1 + = 1 + 2−n+1 and
two (very poor) approximations A = 2− = 2 − 2−n+1 and B = 2+ = 2 + 2−n+2 . A
approximates x with error (2n−1 − 2) ≈ 2n−1 ulp (A), whereas B approximates x with
error (2n−2 + 1/2) ≈ 2n−2 ulp (B). From these values, one could believe that B is a
much better approximation to x than A. And yet, A is closer to x than B. This shows that
ulp (approximation) cannot be a sensible unit of measurement of error.
3 Various definitions of function ulp
Definition 1 (Kahan [2, 5]) KahanUlp (x) is the width of the interval whose endpoints are
the two finite representable numbers nearest x (even if x is not contained within that interval).
Note: in [4], Harrison attributes the previous definition of ulp (x) to me, because I used
approximately the same in my book on elementary functions [7] (when writing the book, I
was not aware of Kahan’s definition).
INRIA
On the definition of ulp (x)
5
A = 2− = 2 − 2−n+1
...
0
1/4 1/2
......
1
2
x = 1+ = 1 + 2−n+1
4
B = 2+ = 2 + 2−n+2
Figure 1: A approximates x with error (2n−1 −2) ≈ 2n−1 ulp (A), whereas B approximates
x with error (2n−2 + 1/2) ≈ 2n−2 ulp (B). From these values, one could believe that B is
a much better approximation to x than A. And yet, A is closer to x than B.
Definition 2 (Harrison [4]) HarrisonUlp (x) is the distance between the closest straddling points a and b (i.e., those with a ≤ x ≤ b and a 6= b), assuming an unbounded
exponent range.
It is worth being noticed that Kahan’s and Harrison’s definitions coincide on FP numbers. However, for real numbers they may differ near powers of the radix. For instance,
in radix 2 with n-bit mantissas, if 1 < x < 1 + 2−n−1 then KahanUlp (x) = 2−n and
HarrisonUlp (x) = 2−n+1 .
Definition 3 (Goldberg [3]) GoldbergUlp (x) is defined as follows. If the FP number
d.dddd . . . dβ e is used to represent x, it is in error by
|d.dddd . . . d − (x/β e )|
units in the last place.
This last definition uses the approximation that represents x. To use it, we will assume
that the approximation is RZ (x) (to keep the same exponent). We will call the obtained
definition the modified GoldbergUlp (x).
Overton [8] defines function ulp for FP numbers only. He defines ulp (X), for X > 0,
as the gap between X and the next larger floating-point number (for X < 0, ulp (X) =
ulp (−X)). His definition coincides with Goldberg’s definition on FP numbers. Markstein [6] gives a very similar definition.
RR n° 5504
6
Jean-Michel Muller
1 1 + 2−n−1
ulp = 2−n
ulp = 2−n+1
x y
Figure 2: The values of KahanUlp (x) near 1, assuming a binary FP system with n-bit
mantissas. Note the strange side effect: 1 seems to be a better approximation to y than to x.
1
ulp = 2−n
ulp = 2−n+1
Figure 3: The values of HarrisonUlp (x) near 1, assuming a binary FP system with n-bit
mantissas.
1
ulp = 2−n
ulp = 2−n+1
Figure 4: The values of Modified GoldbergUlp (x) near 1, assuming a binary FP system
with n-bit mantissas. Notice that Modified GoldbergUlp (x) and HarrisonUlp (x) only
differ when x is a power of the radix.
INRIA
On the definition of ulp (x)
7
4 Some properties (assuming unbounded exponents)
4.1 With rounding to nearest
Property 1 In radix 2,
|X − x| <
1
HarrisonUlp (x) ⇒ X = RN (x)
2
See Theorem 1 for proof. Property 1 is not true in radices greater than or equal to 3. Figure 5
gives a counter-example in radix 3.
X = 1−
1 + 3−n /2
x
1+
1
Figure 5: This example shows that Property 1 is not true in radix 3. Here, x satisfies
1 < x < 1 + 21 3−n and X = 1− = 1 − 3−n . We have HarrisonUlp (x) = 3−n+1 , and
|x − X| < 3−n+1 /2, so that |x − X| < 21 HarrisonUlp (x). And yet, X 6= RN (x).
Property 2 For any radix,
X = RN (x) ⇒ |X − x| ≤
1
HarrisonUlp (x)
2
See Theorem 2 for proof.
Property 3 For any radix,
|X − x| <
RR n° 5504
1
KahanUlp (x) ⇒ X = RN (x)
2
8
Jean-Michel Muller
See Theorem 1 for proof.
Property 4 In radix 2,
X = RN (x) ⇒ |X − x| ≤
1
KahanUlp (x)
2
See Theorem 2 for proof. Property 4 is not true in radices greater than or equal to 3.
Figure 6 gives a counter-example in radix 3.
1−
1 + 3−n /2
1+
1
In this area, 1 = RN (x)
and yet |x − 1| > 0.5 KahanUlp (x).
Figure 6: This example shows that Property 4 is not true in radix 3. If 1 + 21 3−n < x <
1 + 3−n , then 1 = RN (x), and yet |x − 1| > 0.5 KahanUlp (x).
We see that with rounding to nearest in radix 2, both Kahan’s and Harrison’s definitions
preserve the common claims listed above. As we shall see later, the situation is different
with directed roundings.
Definition 4 A regular ulp function is such that there exists a value xcut ∈ [1, 1 + r −n+1 )
so that
ulp (x) = r −n+1+k
if rk xcut < x < r k+1 xcut .
This does not uniquely define the value of ulp (x) since there remains an ambiguity at
x = r k xcut . This ambiguity has no importance if xcut 6= 1, but may make a difference if
xcut = 1.
INRIA
On the definition of ulp (x)
9
For instance, both HarrisonUlp and KahanUlp are regular ulp functions, with x cut = 1
−n
for HarrisonUlp and xcut = 1 + r 2 (r − 1) for KahanUlp .
Theorem 1 To have
1
ulp (x) ⇒ X = RN (x)
2
for any real x and FP number X, we need
|X − x| <
r
xcut ≥ 1 + r −n ( − 1).
2
(1)
Proof: We only consider the case 1 ≤ x < 1+ (the other cases are either straightforward,
or easily deduced from this one). First, if x > xcut , then ulp (x) = r −n+1 . In that case,
since 1− = 1 − r −n cannot be the FP number that is nearest x (because x is closer to 1 than
to 1− ), we must have
1
x − 1− > ulp (x),
2
i.e.,
r
x > 1 + r −n ( − 1).
2
This gives the condition of the theorem.
Conversely, if xcut ≥ 1 + r −n ( 2r − 1) then
• if 1 ≤ x < xcut then ulp (x) = 1 − 1− = r−n . Hence, the only values that can be
within 21 ulp (x) from x (if any) are 1 and 1+ , and at most one of these values only
can be within 12 ulp (x) from x. If there is one, it will necessary be the FP number
that is nearest x;
• if x > xcut then ulp (x) = 1+ −1 = r −n+1 . Since (1) implies that x−1− > 21 ulp (x),
the only values that can be within 21 ulp (x) from x (if any) are 1 and 1+ , and at most
one of these values only can be within less than 12 ulp (x) from x. If there is one, it
will necessary be the FP number that is nearest x.
Theorem 2 To have
X = RN (x) ⇒ |X − x| ≤
1
ulp (x)
2
for any real x and FP number X, we need
1
xcut ≤ 1 + r−n .
2
RR n° 5504
(2)
10
Jean-Michel Muller
Proof: Again, we only consider the case 1 ≤ x < 1+ (the other cases are either straightforward, or easily deduced from this one).
If xcut > 1 + 12 r−n then, for
1
1
1 + r−n < x < min{xcut , 1 + r−n+1 }
2
2
we have,
½
RN (x) = 1
ulp (x) = r −n
hence, we have 1 = RN (x), and yet |1 − x| > 21 ulp (x). Hence the condition of the
theorem.
Conversely, if xcut ≤ 1 + 21 r−n , then for 1 ≤ x ≤ xcut , we have both RN (x) = 1 and
|1 − x| ≤ 21 ulp (x), and for xcut < x < 1+ , we have ulp (x) = r −n+1 = 1+ − 1, so RN (x)
is the value X in {1, 1+ } that is nearest x, and |X − x| is obviously less than or equal to
(1+ − 1)/2 = 12 ulp (x).
Theorem 3 If the radix r is greater than or equal to 4, there is no regular ulp function that
satisfies both
1
|X − x| < ulp (x) ⇒ X = RN (x)
2
and
X = RN (x) ⇒ |X − x| ≤
1
ulp (x).
2
Theorem 3 implies that for r ≥ 4 (which means, in practice, for r = 10, since radices
different from 2 and 10 seem no longer used) we have to choose between both properties:
they will be true “in general”, but one of them will be wrong when x is close to a power of
r. Theorem 3 is an immediate consequence of Theorems 1 and 2 (conditions (1) and (2)
become incompatible for r ≥ 4). For r = 3, the only allowable value of x cut is 1 + 3−n /2.
For r = 2, xcut ∈ [1, (1 + 1+ )/2].
4.2 With directed roundings
Property 5 For any value of the radix r,
X ∈ { RD (x), RU (x)} ⇒ |X − x| < 1 HarrisonUlp (x)
INRIA
On the definition of ulp (x)
11
But now the converse is not true. There are values X and x for which |X − x| <
1 HarrisonUlp (x), and yet X is not in { RD (x), RU (x)} (consider the case x slightly above
1 and X equal to 1− , the FP predecessor of 1).
With KahanUlp (x), also, there are values X and x for which |X−x| < 1 HarrisonUlp (x),
and yet X is not in { RD (x), RU (x)} (consider, in radix 2 with n-bit mantissas, the case
X = 1 − 2−n and x between 1 + 2−n−1 and 1 + 2−n ).
With KahanUlp (x), there is no equivalent of property 5. As noticed by Harrison [4],
we can have X ∈ { RD (x), RU (x)}, and |X − x| significantly larger than 1 KahanUlp (x)
(it can be arbitrarily close, without being equal, to r KahanUlp (x)). Consider the radix-2
case depicted by Figure 7.
1 + 2−n−1
1
x
X
Figure 7: We assume radix 2 and n-bit mantissas. X is equal to RU (x), and yet |X − x| is
very close to 2 KahanUlp (x) [4].
4.3 If anyway one decides to use ulp (X)
Although we have indicated in Section 2 that using ulp (x) as the measure of error seems
much preferable, one may, for some application, find a good reason for using ulp (X). In
such a case, we list the obtained properties below.
RR n° 5504
12
Jean-Michel Muller
Property 6 Assuming unbounded exponents, we find, for any value of the radix r:
|X − x| <
1
2
HarrisonUlp (X) ⇒ X = RN (x)
X = RN (x) does not imply |X − x| ≤
1
2
HarrisonUlp (X)
|X − x| < HarrisonUlp (X) ⇒ X ∈ { RD (x), RU (x)}
X ∈ { RD (x), RU (x)} does not imply |X − x| ≤ HarrisonUlp (X)
|X − x| <
1
2
KahanUlp (X) ⇒ X = RN (x)
X = RN (x) does not imply |X − x| ≤
1
2
KahanUlp (X)
|X − x| < KahanUlp (X) ⇒ X ∈ { RD (x), RU (x)}
X ∈ { RD (x), RU (x)} does not imply |X − x| ≤ KahanUlp (X)
X = RN (x) ⇒ |X − x| ≤
|X − x| <
1
2
1
2
GoldbergUlp (X)
GoldbergUlp (X) does not imply X = RN (x)
X ∈ { RD (x), RU (x)} ⇒ |X − x| ≤ GoldbergUlp (X)
|X − x| < GoldbergUlp (X) does not imply X ∈ { RD (x), RU (x)}
In that case, Kahan’s and Harrison’s definitions satisfy the same properties, which is not
surprising since they coincide on FP numbers.
5 Properties near infinity
Kahan’s definition is the only one that clearly defines function ulp for big numbers. Define
L as the largest finite FP number, and L− as its predecessor. If x is larger than L, then it is
clear from definition 1 that
KahanUlp (x) = L − L− .
From this, it is clear that
|X − x| <
1
KahanUlp (x) ⇒ X = RN (x)
2
So, property 3 is always true (there is no need to assume unbounded exponents, as in the
previous section).
Interestingly enough, with IEEE-754 FP (binary) numbers, the converse holds. This
is due to a feature of the IEEE-754 Standard [1] (which by the way makes RN (x) quite
INRIA
On the definition of ulp (x)
13
different from what one would expect from the term “rounding to nearest”). The standard
says that an infinitely precise result with magnitude at least
¡
¢
2emax 2 − 2−n
shall round to ∞ with no change in sign. With that convention, if X is finite,
X = RN (x) ⇒ |X − x| ≤
1
KahanUlp (x),
2
i.e., Property 4 remains true for big numbers.
The intuitive generalization of Harrison’s definition (for big numbers, the straddling
points would be L and +∞), would give +∞ for ulp (x) when X > L. This would make
sense, but would be useless: any FP number would be within 1/2 ulp from such an x.
6 Conclusion
It appears that a definition that would preserve most properties would be
½
HarrisonUlp (x)
if |x| ≤ L
ulp (x) =
−
KahanUlp (x) = L − L
otherwise,
which could be given as follows:
Definition 5 If x is a real number that lies between two finite consecutive FP numbers a
and b, without being equal to one of them, then ulp (x) = |b − a|, otherwise ulp (x) is the
distance between the two finite FP numbers nearest x. Moreover, ulp (NaN) is NaN.
Acknowledgement
In November 2004, I had the opportunity to discuss these topics (as well as other aspects
of floating-point arithmetic) with Professor Kahan. These were enlightening discussions.
Although we may occasionally disagree on minor issues, we share the same feeling that the
big improvements brought to numerical computing by the IEEE-754 and 854 standards for
floating-point arithmetic are endangered: we must explain to computer architects, compiler
designers and numerical application programmers that some features of the standards that
sometimes seem arcane or that seem to hinder performance may be crucial when reliability
and/or portability are at stake.
RR n° 5504
14
Jean-Michel Muller
References
[1] American National Standards Institute and Institute of Electrical and Electronic Engineers. IEEE standard for binary floating-point arithmetic. ANSI/IEEE Standard, Std
754-1985, New York, 1985.
[2] 754 R committee. DRAFT standard for floating-point arithmetic p754 d0.6.5. October
2004.
[3] D. Goldberg. What every computer scientist should know about floating-point arithmetic. ACM Computing Surveys, 23(1):5–47, March 1991.
[4] J. Harrison. A machine-checked theory of floating-point arithmetic. In Y. Bertot,
G. Dowek, A. Hirschowitz, C. Paulin, and L. Théry, editors, Theorem Proving in Higher
Order Logics: 12th International Conference, TPHOLs’99, volume 1690 of Lecture
Notes in Computer Science, pages 113–130, Nice, France, September 1999. SpringerVerlag.
[5] W. Kahan.
A logarithm too clever by half.
Available
http://http.cs.berkeley.edu/~wkahan/LOG10HAF.TXT, 2004.
at
[6] P. Markstein. IA-64 and Elementary Functions : Speed and Precision. Hewlett-Packard
Professional Books. Prentice Hall, 2000. ISBN: 0130183482.
[7] J.M. Muller. Elementary Functions, Algorithms and Implementation. Birkhauser,
Boston, 1997.
[8] M. A. Overton. Numerical Computing with IEEE Floating-Point Arithmetic. SIAM,
2001.
Appendix: Maple programs that compute ulp (x) in double precision
The following two Maple programs compute KahanUlp (t) and ulp (t) as suggested in
Definition 5 for any real number t, assuming that the used floating-point format is the double
precision format of the IEEE-754 standard (i.e., r = 2 and n = 53).
KahanUlp := proc(t);
x := abs(t);
if x < 2^(-1021) then res := 2^(-1074)
INRIA
On the definition of ulp (x)
else if x > (1-2^(-53))*2^(1024) then res := 2^971
else
powermin := 2^(-1021); expmin := -1021;
powermax := 2^1024; expmax := 1024;
# x is between powermin = 2^expmin and powermax = 2^expmax
while (expmax-expmin > 1) do
expmiddle := round((expmax+expmin)/2);
powermiddle := 2^expmiddle;
if x >= powermiddle then
powermin := powermiddle;
expmin := expmiddle
else
powermax := powermiddle;
expmax := expmiddle
fi;
od;
# now, expmax - expmin = 1
# and powermin <= x < powermax
if x/powermin <= 1+2^(-54) then res := 2^(expmin-53)
else res := 2^(expmin-52)
fi;
fi;
fi;
res;
end;
SuggestedUlp := proc(t);
x := abs(t);
if x < 2^(-1021) then res := 2^(-1074)
else if x > (1-2^(-53))*2^(1024) then res := 2^971
else
powermin := 2^(-1021); expmin := -1021;
powermax := 2^1024; expmax := 1024;
# x is between powermin = 2^expmin and powermax = 2^expmax
while (expmax-expmin > 1) do
expmiddle := round((expmax+expmin)/2);
powermiddle := 2^expmiddle;
if x >= powermiddle then
powermin := powermiddle;
expmin := expmiddle
else
powermax := powermiddle;
expmax := expmiddle
RR n° 5504
15
16
Jean-Michel Muller
fi;
od;
# now, expmax - expmin = 1
# and powermin <= x < powermax
if x = powermin then res := 2^(expmin-53)
else res := 2^(expmin-52)
fi;
fi;
fi;
res;
end;
INRIA
Unité de recherche INRIA Rhône-Alpes
655, avenue de l’Europe - 38334 Montbonnot Saint-Ismier (France)
Unité de recherche INRIA Futurs : Parc Club Orsay Université - ZAC des Vignes
4, rue Jacques Monod - 91893 ORSAY Cedex (France)
Unité de recherche INRIA Lorraine : LORIA, Technopôle de Nancy-Brabois - Campus scientifique
615, rue du Jardin Botanique - BP 101 - 54602 Villers-lès-Nancy Cedex (France)
Unité de recherche INRIA Rennes : IRISA, Campus universitaire de Beaulieu - 35042 Rennes Cedex (France)
Unité de recherche INRIA Rocquencourt : Domaine de Voluceau - Rocquencourt - BP 105 - 78153 Le Chesnay Cedex (France)
Unité de recherche INRIA Sophia Antipolis : 2004, route des Lucioles - BP 93 - 06902 Sophia Antipolis Cedex (France)
Éditeur
INRIA - Domaine de Voluceau - Rocquencourt, BP 105 - 78153 Le Chesnay Cedex (France)
http://www.inria.fr
ISSN 0249-6399