On the computation of the reciprocal of floating point expansions
using an adapted Newton-Raphson iteration
Mioara Joldes, Valentina Popescu, Jean-Michel Muller
ASAP 2014
Motivation
Numerical problems require floating-point operations with higher precision than double
(binary64),
e.g. in dynamical systems: finding sinks in the Hénon map, iterating the Lorenz attractor,
long-term stability of the solar system.
These are usually also high-performance computing problems.
Our project: CAMPARY (CudA Multiple Precision ARithmetic librarY)
Existing multiple-precision libraries (on CPU/GPU architectures):
– multiple-digit representation: GNU MPFR, ARPREC; GPU variants: GARPREC, CUMP
– multiple-term representation: QD; GPU variant: GQD
Extending the precision using multiple-term format: FP expansions
Let x = M_x · 2^{e_x} be a precision-p floating-point (FP) number, with 1 ≤ |M_x| < 2.
Denote ulp(x) = 2^{e_x − p + 1} (Goldberg's definition).
Def.
A floating-point expansion u with n terms is the unevaluated sum u = Σ_{i=0}^{n−1} u_i of
n FP numbers u_0, ..., u_{n−1}, s.t. u_i ≠ 0 ⇒ |u_i| ≥ |u_{i+1}|.
Non-overlapping FP expansions
u is Bailey-nonoverlapping if for all 0 < i < n, we have |u_i| ≤ (1/2) ulp(u_{i−1}).
Example: π in double-double
p_0 = 11.001001000011111101101010100010001000010110100011000_2, and
p_1 = 1.0001101001100010011000110011000101000101110000000111_2 × 2^{−53}.
p_0 + p_1 ↔ a 107-bit FP approximation.
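The double-double idea above can be checked numerically. The following sketch (in Python, not part of the original slides; the helper `to_expansion` is our own illustrative name) builds a 2-term expansion of 1/3 by greedily rounding off the nearest double at each step, then verifies Bailey's non-overlap condition and the resulting ≈2p-bit accuracy:

```python
from fractions import Fraction
import math

def to_expansion(v, n):
    """Greedily extract n FP terms from the exact value v:
    each term is the double nearest to the remaining error."""
    terms = []
    r = Fraction(v)
    for _ in range(n):
        t = float(r)          # round-to-nearest double
        terms.append(t)
        r -= Fraction(t)      # exact remainder (Fraction(t) is exact)
    return terms

third = Fraction(1, 3)
p0, p1 = to_expansion(third, 2)

# Bailey non-overlap: |p1| <= 1/2 ulp(p0)
assert abs(p1) <= 0.5 * math.ulp(p0)

# The unevaluated sum p0 + p1 carries roughly 2*53 bits of 1/3
err = abs(Fraction(p0) + Fraction(p1) - third)
assert err < third * Fraction(1, 2**106)
```

Each correctly rounded term leaves a remainder of at most half an ulp, which is why the extracted terms are automatically non-overlapping.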
Extending the precision using FP expansions
Pros:
– use directly available and highly optimized native FP operations
– sufficiently simple and regular algorithms for addition and multiplication
– straightforwardly portable to highly parallel architectures, such as GPUs
Cons:
– lack of thorough error bounds
– no correct rounding
– QD supports only the 2-double and 4-double formats
Existing algorithms
Addition and multiplication:
– generalized/adapted versions of [Priest'91], [Shewchuk'97], [Bailey'01]
– based on error-free transforms: 2Sum, 2Prod, 2ProdFMA
Division is based on the classical "paper-and-pencil" long-division algorithm [Bailey'01,
Priest'91, Daumas'99].
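The error-free transforms named above are short FP sequences that return both the rounded result and its exact rounding error. A sketch (Python as illustration; the slides themselves give no code) of 2Sum (Knuth) and 2Prod (Dekker's splitting variant, which needs no FMA; with an FMA, the error is simply fma(a, b, -p)):

```python
from fractions import Fraction

def two_sum(a, b):
    """Knuth's 2Sum: returns (s, e) with s = RN(a+b) and a + b = s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

def two_prod(a, b):
    """Dekker's 2Prod via Veltkamp splitting (no FMA needed):
    returns (p, e) with p = RN(a*b) and a * b = p + e exactly."""
    C = 134217729.0            # 2**27 + 1, splitting constant for binary64
    def split(x):
        c = C * x
        hi = c - (c - x)
        return hi, x - hi
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, e

# Error-free: each pair holds the exact result (checked with rationals)
s, es = two_sum(1.0, 2.0**-60)
assert Fraction(s) + Fraction(es) == Fraction(1) + Fraction(2)**-60
p, ep = two_prod(0.1, 0.3)
assert Fraction(p) + Fraction(ep) == Fraction(0.1) * Fraction(0.3)
```

These pairs (result, error) are exactly the 2-term expansions that the addition and multiplication algorithms accumulate.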
Our contribution: Reciprocal of FP expansions with
an adapted Newton-Raphson iteration & explicit error bound
Newton iteration for a root α of f:
x_{n+1} = x_n − f(x_n)/f′(x_n);
when x_0 is close to α and f′(α) ≠ 0 → quadratic convergence.
Newton-Raphson iteration for the reciprocal, i.e. the root 1/a of f(x) = 1/x − a:
x_{n+1} = x_n(2 − a·x_n);
when x_0 is close to 1/a → quadratic convergence, with x_{n+1} − 1/a = −a(x_n − 1/a)².
Adapted Newton-Raphson iteration for the reciprocal of an FP expansion: all quantities
are FP expansions, and each operation is a rounded expansion operation (the products by
RdMulE, the subtraction by RdSubE):
x_{n+1} = RdMulE(x_n, RdSubE(2, RdMulE(a, x_n)))
→ quadratic convergence
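The quadratic convergence of the reciprocal iteration is easy to observe even in plain binary64, before any expansion machinery is involved. A minimal sketch (ours, not from the slides):

```python
# Quadratic convergence of x_{n+1} = x_n (2 - a x_n) toward 1/a, in plain
# binary64: each step roughly doubles the number of correct digits, and
# no division is ever performed.
a = 7.0
x = 0.1                      # crude initial guess for 1/7
for _ in range(6):
    x = x * (2.0 - a * x)    # one Newton-Raphson step

assert abs(a * x - 1.0) < 1e-15
assert abs(x - 1.0 / 7.0) < 1e-15
```

The same recurrence, run with expansion operations in place of double operations, is the basis of Algorithm 1 below; since the error squares at each step, doubling the number of expansion terms per iteration matches the precision actually achieved.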
Main Algorithm
Algorithm 1: Truncated Newton-iteration-based algorithm for the reciprocal of an FP expansion.
Input: FP expansion a = a_0 + ... + a_{2^k−1}; length of output FP expansion 2^q.
Output: FP expansion x = x_0 + ... + x_{2^q−1} s.t.
|x − 1/a| ≤ 2^{−2^q(p−3)−1} / |a|.   (1)
1: x_0 = RN(1/a_0)
2: for i ← 0 to q − 1 do
3:   v̂[0 : 2^{i+1}−1] ← RdMulE(x[0 : 2^i−1], a[0 : 2^{i+1}−1], 2^{i+1})
4:   ŵ[0 : 2^{i+1}−1] ← RdSubE(2, v̂[0 : 2^{i+1}−1], 2^{i+1})
5:   x[0 : 2^{i+1}−1] ← RdMulE(x[0 : 2^i−1], ŵ[0 : 2^{i+1}−1], 2^{i+1})
6: end for
7: return FP expansion x = x_0 + ... + x_{2^q−1}.
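The loop structure of Algorithm 1 can be sketched in a few lines. This is an idealized model, not the paper's implementation: RdMulE/RdSubE are the truncated expansion operations of the paper, and here they are emulated by computing exactly with rationals and then greedily truncating to the requested number of terms (which is at least as accurate as the real operations, so the bound (1) should still hold). The names `to_expansion`, `val`, and `reciprocal` are ours:

```python
from fractions import Fraction

def to_expansion(v, n):
    """Idealized 'round to an n-term expansion': greedily peel off nearest
    doubles from the exact value (a stand-in for RdMulE/RdSubE rounding)."""
    terms, r = [], Fraction(v)
    for _ in range(n):
        t = float(r)
        terms.append(t)
        r -= Fraction(t)
    return terms

def val(terms):
    """Exact value of an expansion, as a rational."""
    return sum(map(Fraction, terms))

def reciprocal(a, q):
    """Sketch of Algorithm 1: term-doubling Newton iteration for 1/a."""
    x = [1.0 / a[0]]                               # x0 = RN(1/a0)
    for i in range(q):
        k = 2 ** (i + 1)
        v = to_expansion(val(x) * val(a[:k]), k)   # ~ RdMulE(x, a, k)
        w = to_expansion(2 - val(v), k)            # ~ RdSubE(2, v, k)
        x = to_expansion(val(x) * val(w), k)       # ~ RdMulE(x, w, k)
    return x

a = to_expansion(Fraction(10, 3), 4)   # a 4-term input expansion
x = reciprocal(a, 2)                   # 2^2 = 4 output terms

# Check the slide's bound (1): |x*a - 1| <= 2^{-2^q (p-3) - 1}, p = 53, q = 2
rel_err = abs(val(x) * val(a) - 1)
assert rel_err <= Fraction(1, 2 ** 201)
```

Note how each pass through the loop doubles the working length k, mirroring the quadratic growth of the number of correct bits.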
Example, adapted Newton-Raphson iteration on FP expansions
General procedure: x_{n+1} = x_n(2 − a·x_n)
iter 0: x_0 = RN(1/a_0);
iter 1: [x_0, x_1] = x_0 · (2 − [a_0, a_1] · x_0);
iter 2: [x_0, x_1, x_2, x_3] = [x_0, x_1] · (2 − [a_0, a_1, a_2, a_3] · [x_0, x_1]);
...
But... when using error-free transforms, the product of an n-term expansion
[u_0, u_1, ..., u_{n−1}] by an m-term expansion [v_0, v_1, ..., v_{m−1}] can have up to
2mn terms [w_0, w_1, ..., w_{2mn−1}].
⇒ Use truncated addition/multiplication.
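The 2mn term growth is visible directly: with error-free transforms alone, every pairwise product contributes two terms. A small sketch (ours; `two_prod` is Dekker's 2Prod as above) multiplying two 2-term expansions:

```python
from fractions import Fraction

def two_prod(a, b):
    """Dekker's 2Prod: a * b = p + e exactly, with p = RN(a*b)."""
    C = 134217729.0                     # 2**27 + 1
    def split(x):
        c = C * x
        hi = c - (c - x)
        return hi, x - hi
    p = a * b
    ah, al = split(a)
    bh, bl = split(b)
    e = ((ah * bh - p) + ah * bl + al * bh) + al * bl
    return p, e

# n*m pairwise products, each an exact 2-term result -> 2*n*m terms
# before any accumulation; the count keeps growing across iterations,
# which is why Algorithm 1 truncates to 2^{i+1} terms at each step.
u = [1.0, 2.0**-53]                     # n = 2 terms
v = [3.0, 3.0 * 2.0**-53]               # m = 2 terms
partial = []
for ui in u:
    for vj in v:
        partial.extend(two_prod(ui, vj))

assert len(partial) == 2 * len(u) * len(v)           # 8 terms
exact = sum(map(Fraction, u)) * sum(map(Fraction, v))
assert sum(map(Fraction, partial)) == exact          # still error-free
```

Accumulating the partial terms with 2Sum reorders and compresses them but, in the worst case, still leaves up to 2mn nonzero terms, hence the truncated operations.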
Truncations and Error Analysis
General procedure: x_{n+1} = x_n(2 − a·x_n)
iter 0: x_0 = RN(1/a_0);
iter 1: [x_0, x_1] = x_0 · (2 − [a_0, a_1] · x_0), with the truncated intermediate
expansions denoted v̂ = [v̂_0, v̂_1] (for a·x_0) and ŵ = [ŵ_0, ŵ_1] (for 2 − v̂); ...
Error analysis based on triangle inequalities.
Let η = Σ_{j=0}^{∞} 2^{(−j−1)p} = 2^{−p}/(1 − 2^{−p}) and γ_i = 2^{−(2^{i+1}−1)p} · η/(1 − η).
Writing τ_i, w_i, v_i for the exact values approximated by the truncated x_{i+1}, ŵ_i, v̂_i,
and a^{(f_i)} for a truncated to its first f_i terms:
|x_{i+1} − τ_i| ≤ γ_i |x_i · ŵ_i|,            (2a)
|w_i − ŵ_i| ≤ γ_i |w_i| ≤ γ_i |2 − v̂_i|,      (2b)
|v_i − v̂_i| ≤ γ_i |a^{(f_i)} · x_i|,          (2c)
|a − a^{(f_i)}| ≤ γ_i |a|.                    (2d)
And finally,
|x_i − a^{−1}| / |a^{−1}| ≤ 2^{−2^i(p−3)−1}.
Comparison and Results
Table: (a) Error bound values for Priest's formula [Priest'91] vs. Daumas [Daumas] vs. our
analysis (1); d = 2^q terms are computed in the quotient.

Prec, iteration | Eq. Priest | Eq. Daumas | Eq. (1)
p = 53, q = 0   | 2          | 2^−49      | 2^−51
p = 53, q = 1   | 1          | 2^−98      | 2^−101
p = 53, q = 2   | 2^−2       | 2^−195     | 2^−201
p = 53, q = 3   | 2^−6       | 2^−387     | 2^−401
p = 53, q = 4   | 2^−13      | 2^−764     | 2^−801
p = 24, q = 0   | 2          | 2^−20      | 2^−22
p = 24, q = 1   | 1          | 2^−40      | 2^−43
p = 24, q = 2   | 2^−2       | 2^−79      | 2^−85
p = 24, q = 3   | 2^−5       | 2^−155     | 2^−169
p = 24, q = 4   | 2^−12      | 2^−300     | 2^−337
Comparison and Results
Table: Timings† in MFlops/s for Alg. 1 vs. the QD implementation for reciprocal (A) and
division (B) of expansions; the numerator, denominator and quotient have respectively
d_n, d_i and d_o terms.

(A) Reciprocal
d_i, d_o | Alg. 1 | QD
1, 1     | 107    | 107
2, 2     | 62     | 70
4, 4     | 10     | 3.6
1, 2     | 62     | 86.2
2, 4     | 10.7   | 3.7
4, 2     | 61     | 86.2
1, 4     | 12.6   | 7.36
1, 8     | 2      | ∗
2, 8     | 1.7    | ∗
4, 8     | 1.4    | ∗
8, 8     | 1.3    | ∗
1, 16    | 0.3    | ∗
2, 16    | 0.27   | ∗
4, 16    | 0.22   | ∗
8, 16    | 0.19   | ∗
16, 16   | 0.17   | ∗

(B) Division
d_n, d_i, d_o | Alg. 1 | QD
2, 2, 2       | 46.3   | 70
4, 4, 4       | 6.8    | 3.6
2, 1, 2       | 46.7   | 86.2
4, 2, 4       | 7      | 3.7
2, 4, 2       | 46.1   | 86.2
4, 1, 4       | 7.7    | 7.36

† Intel(R) Core(TM) i7 CPU 3820, 3.6 GHz
∗ precision not supported
Conclusion
Use a multiple-term format for multiple-precision floating-point numbers → FP expansions
Method for computing the reciprocal/division of FP expansions based on:
– "truncated" addition and multiplication
– "adapted" Newton-Raphson iteration
Thorough error analysis and explicit error bound
Our project: CAMPARY (CudA Multiple Precision ARithmetic librarY)
Thank you!