Math 321 Lecture 1
Newton's method in one and more dimensions and IEEE floating point

One of the giants of scientific computing was Isaac Newton, 300 years before he could have had access to a computer! Many of the best algorithms in use today were invented before the computer. The lesson you should draw is not to underestimate the importance of mathematics in scientific computing: the more you know, the better.

A prototype numerical problem is to find the root of the equation

$$f(x) = 0.$$

To make it more practical, let us suppose that we want to produce a computer subroutine to compute

$$x = \sqrt{z}.$$

To cast this into the standard form suitable for applying Newton's method, we rewrite it as

$$f(x) = x^2 - z = 0.$$

This equation has two roots, $+\sqrt{z}$ and $-\sqrt{z}$, so we will have to make sure that we get the one we are after. Of course, if $z < 0$, there is no real solution.

One step of Newton's method: suppose that we have somehow arrived at an estimate $x_n$ of the root $x^*$ on iteration $n$. We draw the tangent to the curve at the point $(x_n, f(x_n))$ and follow it down to the $x$ axis; the point where it crosses becomes our next iterate $x_{n+1}$, hopefully closer to $x^*$. Mathematically,

$$\tan\theta = f'(x_n) = \frac{f(x_n)}{x_n - x_{n+1}},$$

which, after rearrangement, gives us the computational procedure for obtaining the next iterate $x_{n+1}$ from the current iterate $x_n$:

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.$$

In our sqrt example this takes a very simple form: $f(x) = x^2 - z$, $f'(x) = 2x$, and

$$x_{n+1} = x_n - \frac{x_n^2 - z}{2x_n} = \frac{1}{2}\left(x_n + \frac{z}{x_n}\right).$$

This example is typical of a lot of scientific computation, in that we have a problem and a proposed method of solution.
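The square root iteration above can be sketched in C as follows. This is a minimal illustration, assuming $z > 0$ and a positive starting guess $x_0$ (so the iterates approach $+\sqrt{z}$); the name newton_sqrt, the relative stopping test and the iteration cap are choices made for this sketch, not part of the notes.

```c
/* Newton's method for f(x) = x^2 - z, using x_{n+1} = (x_n + z / x_n) / 2.
   Assumes z > 0 and x0 > 0, so the iteration approaches +sqrt(z). */
double newton_sqrt(double z, double x0, int max_iter, double rel_tol) {
    double x = x0;
    for (int n = 0; n < max_iter; n++) {
        double next = 0.5 * (x + z / x);
        double change = next - x;
        if (change < 0)
            change = -change;
        x = next;
        if (change <= rel_tol * x)   /* stop when the step is tiny relative to x */
            break;
    }
    return x;
}
```

For example, newton_sqrt(2.0, 1.0, 50, 1e-15) returns approximately 1.4142135623730951, and newton_sqrt(16.0, 1.0, 50, 1e-15) settles on 4.0.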
We would like to know when the method will work and when it won't (the Russian mathematician Kantorovich worked on this problem), how fast it will give us an answer (Newton understood this), and what to watch out for in a numerical implementation (the Berkeley mathematician Kahan gave us the IEEE754 standard for floating point arithmetic used in almost all computers today).

Obviously, it won't always work. We will not go into all the gory details, but if $f'(x_n) = 0$ the method fails disastrously, and if $f'(x_n)$ ever becomes numerically small, the procedure is likely to wander far from the desired solution.

To understand the rate of convergence, we have to do some analysis. Write

$$x_{n+1} = x^* + \delta_{n+1}, \qquad x_n = x^* + \delta_n,$$

where $x^*$ is the solution point, with $f(x^*) = 0$, $\delta_n$ is the error at step $n$, and $\delta_{n+1}$ is the error at step $n+1$. Then $x_{n+1} = x_n - f(x_n)/f'(x_n)$ becomes

$$x^* + \delta_{n+1} = x^* + \delta_n - \frac{f(x^* + \delta_n)}{f'(x^* + \delta_n)},$$

which gives

$$\delta_{n+1} = \delta_n - \frac{f(x^* + \delta_n)}{f'(x^* + \delta_n)}.$$

Expanding using Taylor series,

$$f(x^* + \delta_n) = f(x^*) + \delta_n f'(x^*) + \frac{\delta_n^2}{2} f''(x^*) + \dots,$$
$$f'(x^* + \delta_n) = f'(x^*) + \delta_n f''(x^*) + \frac{\delta_n^2}{2} f'''(x^*) + \dots,$$

and $f(x^*) = 0$, so

$$\delta_{n+1} = \delta_n - \frac{\delta_n\left(f'(x^*) + \frac{\delta_n}{2} f''(x^*) + \frac{\delta_n^2}{6} f'''(x^*) + \dots\right)}{f'(x^*) + \delta_n f''(x^*) + \frac{\delta_n^2}{2} f'''(x^*) + \dots} = \frac{\frac{\delta_n^2}{2}\left(f''(x^*) + \frac{2\delta_n}{3} f'''(x^*) + \dots\right)}{f'(x^*) + \delta_n f''(x^*) + \frac{\delta_n^2}{2} f'''(x^*) + \dots}.$$

When we are near the solution, $\delta_n \to 0$, so provided that $f'(x^*) \neq 0$,

$$\delta_{n+1} \approx \delta_n^2 \, \frac{f''(x^*)}{2 f'(x^*)}.$$

This is described as quadratic convergence: the number of correct digits roughly doubles with each iteration. It was a wonderful outcome 300 years ago and is still a very desirable property of numerical algorithms today.

If $f'(x^*) = 0$, the leading term in both numerator and denominator is proportional to $f''(x^*)$, and the convergence degrades to linear convergence:

$$\delta_{n+1} \approx \frac{1}{2}\,\delta_n.$$

If you were to solve $f(x) = x^2 - 4 = 0$ and $f(x) = (x - 2)^2 = 0$, both starting from $x_0 = 1$, you would clearly see the difference. Newton's algorithm does not work as well for repeated roots.
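The experiment suggested above can be run with a short C sketch (the function names are illustrative). It applies the Newton update to $f(x) = x^2 - 4$ and to $f(x) = (x - 2)^2$ from $x_0 = 1$ and reports $|x_n - 2|$. Keep $n$ small for the repeated root: after about 50 halvings the iterate rounds to exactly 2, where $f'(x) = 2(x - 2)$ is 0 and the step computes 0/0.

```c
/* Newton step for f(x) = x^2 - 4, which has simple roots at +2 and -2. */
static double step_simple(double x) {
    return x - (x * x - 4.0) / (2.0 * x);
}

/* Newton step for f(x) = (x - 2)^2, which has a double root at 2. */
static double step_repeated(double x) {
    return x - (x - 2.0) * (x - 2.0) / (2.0 * (x - 2.0));
}

/* |x_n - 2| after n Newton steps from x_0 = 1, for either function. */
double newton_error(int n, int repeated) {
    double x = 1.0;
    for (int i = 0; i < n; i++)
        x = repeated ? step_repeated(x) : step_simple(x);
    double err = x - 2.0;
    return err < 0 ? -err : err;
}
```

After five steps the simple root is accurate to within about $10^{-14}$, while the double root still has an error of exactly 1/32: the error halves at every step, as $\delta_{n+1} \approx \delta_n/2$ predicts.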
Computer implementation

Much scientific computation involves iterations of the type

$$x_{n+1} = \text{some function of } x_n,$$

and on the computer these will usually involve inexact arithmetic if we are using C++ or FORTRAN or the evalf function in Maple. For example,

$$\sqrt{2} = 1.4142\ldots$$

We know $\sqrt{2}$ is irrational, so it has a non-terminating, non-recurring decimal expansion, and a similar but less familiar non-terminating, non-recurring binary expansion.

Computers are able to represent integers and do integer arithmetic exactly, as long as the results stay within range. For example, a 32 bit binary representation of the decimal number 11 is

0000 0000 0000 0000 0000 0000 0000 1011

The leftmost bit is the sign bit, which is 0 for positive integers, and the 1 bits at the right end stand for 8, 2 and 1, since the decimal number 11 is $2^3 + 2^1 + 2^0$.

The ones complement of a number replaces each 1 bit by a 0 bit, and each 0 bit by a 1 bit:

1111 1111 1111 1111 1111 1111 1111 0100

The twos complement of a number is the ones complement plus 1:

1111 1111 1111 1111 1111 1111 1111 0101

The twos complement representation is the one most commonly used today: the pattern directly above is the binary representation of −11 on a 32 bit twos complement computer. As you can see, the leading sign bit of a negative integer is 1.

The representation of floating point, or real, numbers is a little more complicated. Over the last 50 years of computing, various formats with varying precision have been used on different computers, with the somewhat unnerving consequence that different computers have not always produced identical answers to the same calculation. We could spend a long time talking about the IEEE754 standard, but we cannot afford the time to discuss it thoroughly. A Google search on IEEE754 will provide more detail, including some of Kahan's proposals for the standard and comments on its implementation.
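Returning briefly to the integer bit patterns above, they can be reproduced in C with unsigned 32-bit arithmetic, which wraps modulo $2^{32}$; using uint32_t avoids signed overflow. The helper names in this sketch are hypothetical.

```c
#include <stdint.h>

/* Ones complement: replace every 1 bit by 0 and every 0 bit by 1. */
uint32_t ones_complement(uint32_t v) { return ~v; }

/* Twos complement: ones complement plus one. Unsigned arithmetic wraps
   modulo 2^32, so this is well defined for every input. */
uint32_t twos_complement(uint32_t v) { return ~v + 1u; }

/* Write the 32 bits of v into out, sign bit first, followed by '\0'. */
void to_binary32(uint32_t v, char out[33]) {
    for (int i = 0; i < 32; i++)
        out[i] = ((v >> (31 - i)) & 1u) ? '1' : '0';
    out[32] = '\0';
}
```

twos_complement(11u) gives 0xFFFFFFF5, the pattern shown above for −11, and to_binary32(11u, buf) fills buf with 28 zeros followed by 1011.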
What I will try to give here is a brief outline of IEEE754, which may help you to read about it in more detail if you are tempted.

To illustrate the point, consider the numbers in scientific notation

$$1.234 \times 10^{6}, \quad 1.234 \times 10^{-6}, \quad -1.234 \times 10^{6}, \quad -1.234 \times 10^{-6}.$$

There are two parts to each number, the fraction (mantissa) and the power (exponent), and there are two signs, one for the number itself and one for the exponent. There are many possible representations of the same number, all of which are perfectly valid scientific notation:

$$1.234 \times 10^{6} = 12.34 \times 10^{5} = 0.1234 \times 10^{7} = \dots$$

Let us pretend for a moment that we are designing a decimal floating point computer. (In fact, such a computer, the IBM 1620, was designed in 1955.) Firstly we have to decide on a dynamic range. Say we decide to accept numbers with exponents from −50 to +49, that is, magnitudes from about $10^{-50}$ to $10^{+50}$. Then suppose that we want each fraction to be represented with 7 decimal digits, and that a properly normalised real will resemble $0.1234567 \times 10^{16}$: the digit before the decimal point is 0, and the digit immediately after it is only 0 if the entire number is 0.0.

We can allocate one digit to the sign of the number, but we still have to represent the sign of the exponent. This can be done using excess 50 notation: we allocate two digits for the exponent, and let the stored value 00 mean an exponent of −50 (a factor $10^{-50}$), 50 mean an exponent of 0 (a factor $10^{0}$), and 99 mean an exponent of +49 (a factor $10^{+49}$). Because the digit before the decimal point is always 0, we need not represent it, or the decimal point itself. Then we can represent the number $0.1234567 \times 10^{16}$ by the sign, exponent and fraction fields + 66 1234567, since $16 + 50 = 66$. Apart from a couple of minor details, this is the design of the IBM 1620 floating point number.
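A sketch of the toy decimal format just described, under stated assumptions: the value is passed as an integer digit string times a power of ten (digits × 10^exp10) to avoid binary rounding, extra digits are truncated rather than rounded, and the struct and function names are hypothetical. It models only the simplified design above, not the real IBM 1620.

```c
/* Hypothetical toy decimal format: sign, two-digit excess-50 exponent,
   and a seven-digit fraction meaning 0.ddddddd. */
typedef struct {
    char sign;      /* '+' or '-' */
    int  exponent;  /* stored field 00..99, equal to the true exponent + 50 */
    long fraction;  /* 1000000..9999999, or 0 for the number zero */
} ToyDecimal;

/* Encode value = digits * 10^exp10. Returns 0 on success, or -1 if the
   exponent would overflow or underflow the two-digit field. */
int toy_encode(long long digits, int exp10, ToyDecimal *out) {
    out->sign = digits < 0 ? '-' : '+';
    long long d = digits < 0 ? -digits : digits;
    if (d == 0) {
        out->exponent = 0;
        out->fraction = 0;
        return 0;
    }
    int e = exp10 + 7;                          /* d * 10^exp10 = (d / 10^7) * 10^(exp10 + 7) */
    while (d >= 10000000LL) { d /= 10; e++; }   /* too many digits: truncate the last one */
    while (d < 1000000LL)   { d *= 10; e--; }   /* normalise: first fraction digit non-zero */
    if (e + 50 < 0 || e + 50 > 99)
        return -1;                              /* outside 10^-50 .. 10^+49 */
    out->exponent = e + 50;
    out->fraction = (long)d;
    return 0;
}
```

toy_encode(1234567, 9, &t) produces sign '+', exponent field 66 and fraction 1234567, the + 66 1234567 example above, while toy_encode(12, 0, &t) gives + 52 1200000, that is, $0.12 \times 10^{2}$.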
Some of the things that can go wrong when we do floating point arithmetic are (operation: exception, result):

big * big: overflow, +Inf
big * (-big): overflow, -Inf
x / 0.0: zero divide, +Inf (x > 0) or -Inf (x < 0)
0.0 / 0.0: invalid, NaN (not a number)
small * small: underflow, a subnormal result, which could be 0.0

The principles underlying the IEEE754 representation are similar, except that the fraction is a binary fraction, and the exponent is a power of 2 rather than a power of 10. A single precision real occupies 32 bits, or four 8-bit bytes. It has an 8-bit exponent, giving a dynamic range of roughly $10^{-38}$ to $10^{38}$, and a 24-bit fraction, roughly equivalent to 7 decimal digits. Actually, because the leading digit of the fraction of a properly normalised number is non-zero, it has to be a 1, so there is no need to store it: the fraction can be stored in 23 bits, leaving one bit over for the leading sign bit (0 for a positive number and 1 for a negative number).

Thus the binary representation of 13.0, which is $1.101_2 \times 2^3$, is

0 10000010 10100000000000000000000

The leading sign bit is 0, denoting a positive number. Note the excess exponent: 10000000 is the representation of exponent 1, so 10000010 is the representation of exponent 3 (the stored value is the true exponent plus 127). The remaining 23 bits hold the fraction bits 101 that follow the implied leading 1.

A double precision real occupies 64 bits, or eight 8-bit bytes. It has an 11-bit exponent, giving a dynamic range of roughly $10^{-308}$ to $10^{308}$, and a 53-bit fraction (with 52 bits stored and the first an implied 1), roughly equivalent to 16 decimal digits, plus a leading sign bit.
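The single precision layout can be inspected directly in C. This sketch assumes float is IEEE754 binary32 (true on essentially all current hardware) and uses memcpy to reinterpret the bits, which avoids the aliasing problems of pointer casts; FloatFields, float_bits and float_fields are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* The three fields of an IEEE754 single precision number. */
typedef struct {
    unsigned sign;      /* 1 bit: 0 for positive, 1 for negative */
    unsigned exponent;  /* 8 bits, stored in excess 127 notation */
    uint32_t fraction;  /* 23 stored bits; a normalised number has an implied leading 1 */
} FloatFields;

/* Copy the 32 bits of x into an integer. */
uint32_t float_bits(float x) {
    uint32_t u;
    memcpy(&u, &x, sizeof u);
    return u;
}

/* Split x into its sign, biased exponent and fraction fields. */
FloatFields float_fields(float x) {
    uint32_t u = float_bits(x);
    FloatFields f;
    f.sign = u >> 31;
    f.exponent = (u >> 23) & 0xFFu;
    f.fraction = u & 0x7FFFFFu;
    return f;
}
```

For 13.0f the bit pattern is 0x41500000: sign 0, exponent field 130 (true exponent 3) and fraction bits 0x500000, which is binary 101 followed by twenty 0s, matching the pattern above.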
We give a table of special numbers in single precision (number: hexadecimal representation, comment):

+0.0: 00000000 (both signed zeros are possible)
-0.0: 80000000
3.4e+38: 7f7fffff, the biggest normalised number
1.1e-38: 00800000, the smallest normalised number
1.1e-38: 007fffff, a subnormal number (the largest)
1.4e-45: 00000001, the smallest subnormal number
+Inf: 7f800000
-Inf: ff800000
NaN: ffffffff (quiet)
NaN: 7f800001 (signalling)

Note the subnormal numbers, which have an exponent field as small as it can be (00000000). For this exponent, the fraction is represented explicitly and does not have an implied 1 before the binary point. This allows computations to underflow gracefully and to represent numbers smaller than would be possible if only normalised numbers were permitted.

What causes underflow? The smallest single precision normalised number is 1.17549435e-38, so either of the following would produce an underflow:

1.2e-20 * 1.2e-20 = 1.44e-40
1.2345e-36 - 1.2344e-36 = 1.0e-40

Notice that when you subtract quantities that are almost equal, you also lose precision in the fraction part of the result. Allowing subnormal numbers is "better" than setting either of the above results to 0.0, which is the only alternative if we allow only normalised numbers: the very next operation might be a divide by this result, which will be catastrophic if it has been set to 0.0.

Quiet NaN and signalling NaN results are treated differently: every time an operation is performed on a signalling NaN, an invalid operation exception is raised, which, if it is trapped as an interrupt and properly handled, will allow a program in trouble to terminate gracefully. If the interrupts are ignored, NaNs will likely propagate through the program and produce output that can immediately be seen to be wrong.

For many programs, a programmer does not need to know the details of the binary representation of numbers in the computer, but programmers have to be aware that the precision is finite, even in double precision.
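The entries in the table, and the two underflow examples, can be checked with a few lines of C. This sketch assumes float is IEEE754 binary32, turns a bit pattern into a float with memcpy, and uses the standard C99 fpclassify macro from math.h with FLT_MAX and FLT_MIN from float.h; float_from_bits and fp_class are hypothetical helper names.

```c
#include <float.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Build a float from a 32-bit pattern, such as one of the hexadecimal entries above. */
float float_from_bits(uint32_t u) {
    float x;
    memcpy(&x, &u, sizeof x);
    return x;
}

/* Name the IEEE754 class of x. */
const char *fp_class(float x) {
    switch (fpclassify(x)) {
    case FP_ZERO:      return "zero";
    case FP_SUBNORMAL: return "subnormal";
    case FP_NORMAL:    return "normal";
    case FP_INFINITE:  return "infinite";
    case FP_NAN:       return "nan";
    default:           return "other";
    }
}
```

For example, fp_class(float_from_bits(0x00000001u)) returns "subnormal", and so does the product 1.2e-20f * 1.2e-20f, provided the hardware is not running in a flush-to-zero mode.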
You should know that terminating decimal fractions such as 0.1 and 0.01 do not have terminating binary fractions, so it is not guaranteed, for example, that 10 * 0.01 = 0.1 exactly. When you compute $x_n - x_{n+1}$ in Newton's method to see how your solution is converging, the closer the iterates come together, the more digits of precision you will lose in the difference.

Sometimes it is useful to know the precise details. Suppose that you wish to use Newton's method to compute the sqrt() function. If you are trying to compute sqrt(1.234e+50), you would like to start with a guess like 1.0e+25, so you really need to be able to manipulate the binary exponent.

Newton's method extended to higher dimensions

Suppose we are looking for the solution of two equations in two unknowns:

$$f(x, y) = 0, \qquad g(x, y) = 0.$$

For example, we could be looking for the points of intersection of the line $y = mx + b$ with the circle $(x - p)^2 + (y - q)^2 = r^2$.

We first recast Newton's method in one dimension, and then generalise. We would like to go from our current value $x_n$ to a new point $x_{n+1} = x_n + h$, where

$$f(x_n + h) = f(x_n) + h f'(x_n) + \dots = 0,$$

giving

$$h = -\frac{f(x_n)}{f'(x_n)}, \qquad x_{n+1} = x_n + h = x_n - \frac{f(x_n)}{f'(x_n)}.$$

The two dimensional analogue is as follows. From the point $(x_n, y_n)$, we want to move to the point $(x_{n+1}, y_{n+1})$, where

$$x_{n+1} = x_n + h, \qquad y_{n+1} = y_n + k,$$

such that

$$f(x_{n+1}, y_{n+1}) = 0, \qquad g(x_{n+1}, y_{n+1}) = 0.$$

Using the first order Taylor series approximation, now in two dimensions, with the partial derivatives evaluated at $(x_n, y_n)$,

$$f(x_{n+1}, y_{n+1}) \approx f(x_n, y_n) + h \frac{\partial f}{\partial x} + k \frac{\partial f}{\partial y} + \dots = 0,$$
$$g(x_{n+1}, y_{n+1}) \approx g(x_n, y_n) + h \frac{\partial g}{\partial x} + k \frac{\partial g}{\partial y} + \dots = 0,$$

which we can write as $J\mathbf{h} = -\mathbf{f}$, where

$$J = \begin{bmatrix} \dfrac{\partial f}{\partial x} & \dfrac{\partial f}{\partial y} \\[1ex] \dfrac{\partial g}{\partial x} & \dfrac{\partial g}{\partial y} \end{bmatrix}, \qquad \mathbf{h} = \begin{bmatrix} h \\ k \end{bmatrix}, \qquad \mathbf{f} = \begin{bmatrix} f \\ g \end{bmatrix},$$

so we have a matrix equation to solve for $h$ and $k$ at each step.
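A minimal C sketch of the two dimensional step for the line and circle example, assuming $f(x, y) = mx - y + b$ and $g(x, y) = (x - p)^2 + (y - q)^2 - r^2$, so that the Jacobian entries are $m$, $-1$, $2(x - p)$ and $2(y - q)$. For a 2×2 system Cramer's rule is adequate; larger systems would normally use Gaussian elimination. The function name, the stopping test and the iteration cap are choices made for this sketch.

```c
/* Newton's method for the intersection of the line f = m x - y + b = 0
   with the circle g = (x - p)^2 + (y - q)^2 - r^2 = 0.
   Each step solves J [h k]^T = -[f g]^T by Cramer's rule.
   Returns the number of iterations used, or -1 if J became singular
   or the iteration did not converge within max_iter steps. */
int line_circle_newton(double m, double b, double p, double q, double r,
                       double *x, double *y, int max_iter, double tol) {
    for (int it = 1; it <= max_iter; it++) {
        double f = m * *x - *y + b;
        double g = (*x - p) * (*x - p) + (*y - q) * (*y - q) - r * r;
        /* Jacobian entries at the current point */
        double j11 = m, j12 = -1.0;
        double j21 = 2.0 * (*x - p), j22 = 2.0 * (*y - q);
        double det = j11 * j22 - j12 * j21;
        if (det == 0.0)
            return -1;                       /* singular Jacobian */
        double h = (-f * j22 + j12 * g) / det;
        double k = (-j11 * g + j21 * f) / det;
        *x += h;
        *y += k;
        double size = (h < 0 ? -h : h) + (k < 0 ? -k : k);
        if (size < tol)
            return it;
    }
    return -1;
}
```

Starting from (2, 0.5) with m = b = p = q = 0 and r = 1 (the x axis and the unit circle), the iterates converge to the intersection (1, 0); which of the two intersection points is found depends on the starting guess.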
Returning to our example of finding the point(s) where the line $mx - y + b = 0$ cuts the circle $(x - p)^2 + (y - q)^2 - r^2 = 0$, we find

$$\frac{\partial f}{\partial x} = m, \quad \frac{\partial f}{\partial y} = -1, \quad \frac{\partial g}{\partial x} = 2(x - p), \quad \frac{\partial g}{\partial y} = 2(y - q),$$

giving

$$\begin{bmatrix} m & -1 \\ 2(x_n - p) & 2(y_n - q) \end{bmatrix} \begin{bmatrix} h \\ k \end{bmatrix} = -\begin{bmatrix} m x_n - y_n + b \\ (x_n - p)^2 + (y_n - q)^2 - r^2 \end{bmatrix}.$$

The power of matrix notation is that it extends to any number of dimensions. If we have $m$ functions $f_1, f_2, \dots, f_m$, each of which is a function of $m$ variables $x_1, x_2, \dots, x_m$, then we can write one step of Newton's method as

$$J\mathbf{h} = -\mathbf{f}, \qquad \text{where } J_{ij} = \frac{\partial f_i}{\partial x_j} \text{ evaluated at } \mathbf{x}_n.$$

The full Newton algorithm is as follows:

Begin with a starting guess x_0
REPEAT
    Evaluate the matrix J, with operation count m^2
    Evaluate the vector f, with operation count m
    Solve the matrix equation J h = -f, with operation count of order m^3
    Update x_{n+1} ← x_n + h, with operation count m
UNTIL h and f are "small enough", or the maximum number of iterations is exceeded.

One application of Newton's method is Bairstow's method for extracting the roots of polynomials. It is worthwhile for you to look at this application, because it shows you another important idea: that you can differentiate through a set of recurrence relations in a way that perhaps you have not seen before (though of course Maple will support such operations).

The fundamental theorem of algebra states that a polynomial of degree $n$ with complex coefficients has $n$ complex roots (counted with multiplicity) satisfying $P_n(z) = 0$. If we look at polynomials with real coefficients, it is not difficult to establish that if $z_1 = x + iy$ is a root of $P_n(z) = 0$, then $z_2 = x - iy$ is also a root. That is, polynomials with real coefficients have roots which are either real or can be grouped together in complex conjugate pairs. As an example to illustrate this, consider the quartic polynomial

$$P_4(z) = z^4 + a_3 z^3 + a_2 z^2 + a_1 z + a_0 = (x + iy)^4 + a_3 (x + iy)^3 + a_2 (x + iy)^2 + a_1 (x + iy) + a_0 = \text{Real} + i\,\text{Imag},$$

where

$$\text{Real} = x^4 - 6x^2 y^2 + y^4 + a_3 (x^3 - 3xy^2) + a_2 (x^2 - y^2) + a_1 x + a_0$$

contains only even powers of $y$.
$$\text{Imag} = 4x^3 y - 4xy^3 + a_3 (3x^2 y - y^3) + a_2 (2xy) + a_1 y$$

contains only odd powers of $y$. Therefore, since replacing $y$ by $-y$ leaves Real unchanged and changes the sign of Imag, it follows that

$$\text{Real} + i\,\text{Imag} = 0 \quad \text{implies} \quad \text{Real} - i\,\text{Imag} = 0,$$

which in turn illustrates that the complex roots of a real polynomial occur in complex conjugate pairs, which we can think of as the pair of roots of a real quadratic equation $z^2 + pz + q = 0$, where $p$ and $q$ are real.

Looking at the first few monic polynomials:
$P_1(z) = z + a_0 = 0$ has one real root;
$P_2(z) = z^2 + a_1 z + a_0 = 0$ has two real roots, or a complex conjugate pair;
$P_3(z) = z^3 + a_2 z^2 + a_1 z + a_0 = 0$ has one real root, and either a complex conjugate pair or two more real roots; and so on.

Bairstow's idea is that we can write $P_n(z)$ as

$$P_n(z) = (z^2 + pz + q)\,Q_{n-2}(z) + Rz + S,$$

where $Rz + S$ is the remainder left when we divide through by the quadratic, and the coefficients $R$ and $S$ depend on the coefficients $p$ and $q$ of the quadratic. If the quadratic $z^2 + pz + q$ exactly divides $P_n(z)$, then $R$ and $S$ will be zero. Although it is obvious that $R$ and $S$ depend on $p$ and $q$, the dependence is implicit, and is usually established through the Horner recurrence relations. Let

$$P_n(z) = z^n + a_{n-1} z^{n-1} + a_{n-2} z^{n-2} + \dots + a_1 z + a_0,$$
$$Q_{n-2}(z) = z^{n-2} + b_{n-3} z^{n-3} + b_{n-4} z^{n-4} + \dots + b_1 z + b_0.$$

Multiplying out and collecting powers of $z$,

$$(z^2 + pz + q)\,Q_{n-2}(z) + Rz + S = z^n + z^{n-1}(b_{n-3} + p) + z^{n-2}(b_{n-4} + p b_{n-3} + q) + z^{n-3}(b_{n-5} + p b_{n-4} + q b_{n-3}) + \dots + z^2(b_0 + p b_1 + q b_2) + z(p b_0 + q b_1 + R) + (q b_0 + S).$$

Equating coefficients with those of $P_n(z)$ leads to the recurrence relations

$$\begin{aligned} b_{n-3} + p &= a_{n-1} \\ b_{n-4} + p b_{n-3} + q &= a_{n-2} \\ b_{n-5} + p b_{n-4} + q b_{n-3} &= a_{n-3} \\ &\;\;\vdots \\ b_0 + p b_1 + q b_2 &= a_2 \\ R + p b_0 + q b_1 &= a_1 \\ S + q b_0 &= a_0 \end{aligned}$$

from which we can define the computational procedure (in Maple, with a an Array indexed from 0, $b_{n-2} = 1$ for the monic quotient and $b_{n-1} = 0$):

b := Array(0..n-1);
b[n-1] := 0; b[n-2] := 1;
for i from n-3 by -1 to 0 do
    b[i] := a[i+2] - p*b[i+1] - q*b[i+2];
end do;
R := a[1] - p*b[0] - q*b[1];
S := a[0] - q*b[0];

So, given $p$ and $q$, we can calculate $R(p, q)$ and $S(p, q)$.
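The same recurrences translate directly into C. This sketch assumes a monic polynomial of degree $n \ge 2$ with coefficients in a[0..n] (a[n] = 1 is implied and never read) and a caller-supplied array b with room for n entries; bairstow_remainder is a hypothetical name.

```c
/* Divide the monic polynomial z^n + a[n-1] z^(n-1) + ... + a[0] by
   z^2 + p z + q. On return b[0..n-2] holds the coefficients of Q_{n-2}
   (b[n-2] == 1), b[n-1] == 0, and R z + S is the remainder. */
void bairstow_remainder(int n, const double a[], double p, double q,
                        double b[], double *R, double *S) {
    b[n - 1] = 0.0;
    b[n - 2] = 1.0;
    for (int i = n - 3; i >= 0; i--)
        b[i] = a[i + 2] - p * b[i + 1] - q * b[i + 2];
    *R = a[1] - p * b[0] - q * b[1];
    *S = a[0] - q * b[0];
}
```

For $P_3(z) = z^3 + 2z^2 + 3z + 2 = (z^2 + z + 2)(z + 1)$, calling it with p = 1, q = 2 gives R = S = 0 and quotient $z + 1$, while p = q = 0 gives the remainder $3z + 2$ of division by $z^2$.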
We are looking for values of $p$ and $q$ such that

$$R(p, q) = 0, \qquad S(p, q) = 0,$$

an obvious application for Newton's method, but in order to implement it we have to be able to compute the partial derivatives $\partial R/\partial p$, etc. We do not have an explicit function $R(p, q)$, but if we go back to the definition of a partial derivative, we have

$\partial R/\partial p$ = rate of change of $R$ with $p$, when $q$ is held constant;
$\partial R/\partial q$ = rate of change of $R$ with $q$, when $p$ is held constant.

Returning to our set of equations giving $R$ and $S$ as functions of $p$ and $q$, we can differentiate each in turn with respect to $p$ to give

dbdp := Array(0..n-1);
dbdp[n-1] := 0; dbdp[n-2] := 0;
for i from n-3 by -1 to 0 do
    dbdp[i] := -b[i+1] - p*dbdp[i+1] - q*dbdp[i+2];
end do;
dRdp := -b[0] - p*dbdp[0] - q*dbdp[1];
dSdp := -q*dbdp[0];

and again with respect to $q$ to give

dbdq := Array(0..n-1);
dbdq[n-1] := 0; dbdq[n-2] := 0;
for i from n-3 by -1 to 0 do
    dbdq[i] := -p*dbdq[i+1] - q*dbdq[i+2] - b[i+2];
end do;
dRdq := -p*dbdq[0] - q*dbdq[1] - b[1];
dSdq := -q*dbdq[0] - b[0];

I wanted to show you Bairstow's method because it illustrates how partial derivatives can be computed even though the functions $R(p, q)$ and $S(p, q)$ are defined implicitly through a set of recurrence relations. There are several automatic differentiation packages, including two well known ones, ADIFOR for FORTRAN and ADIC for C (see http://www-fp.mcs.anl.gov/autodiff/), which, if you supply the program that evaluates the function, will return a program that computes its partial derivatives. These automatic differentiation packages have made Newton's method for complex functions much easier to use, because they provide mistake-free programs to compute the derivatives. It is very easy to make a mistake in coding the partial derivatives, and of course if the function depends on many variables, there is a lot of work to be done when the differentiation is performed manually.
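Putting the recurrences for $b$, $\partial b/\partial p$ and $\partial b/\partial q$ together with a Newton update on $(p, q)$, solved by Cramer's rule, gives a compact C sketch of Bairstow's method. It assumes a monic real polynomial with coefficients a[0..n], a[n] = 1, degree $2 \le n \le 64$ (an arbitrary limit for the fixed-size work arrays), and a starting $(p, q)$ close enough for Newton to converge. The names are hypothetical, and a complete root finder would also deflate $P_n$ by the quadratic factor found and repeat on $Q_{n-2}$.

```c
/* Absolute value without needing the maths library. */
static double absd(double v) { return v < 0 ? -v : v; }

/* Evaluate R, S and J = [[dR/dp, dR/dq], [dS/dp, dS/dq]] using the
   recurrences for b, db/dp and db/dq. Work arrays need n entries. */
static void bairstow_eval(int n, const double a[], double p, double q,
                          double b[], double dbp[], double dbq[],
                          double *R, double *S, double J[2][2]) {
    b[n - 1] = 0.0;   b[n - 2] = 1.0;
    dbp[n - 1] = 0.0; dbp[n - 2] = 0.0;
    dbq[n - 1] = 0.0; dbq[n - 2] = 0.0;
    for (int i = n - 3; i >= 0; i--) {
        b[i]   = a[i + 2] - p * b[i + 1] - q * b[i + 2];
        dbp[i] = -b[i + 1] - p * dbp[i + 1] - q * dbp[i + 2];
        dbq[i] = -p * dbq[i + 1] - q * dbq[i + 2] - b[i + 2];
    }
    *R = a[1] - p * b[0] - q * b[1];
    *S = a[0] - q * b[0];
    J[0][0] = -b[0] - p * dbp[0] - q * dbp[1];   /* dR/dp */
    J[0][1] = -p * dbq[0] - q * dbq[1] - b[1];   /* dR/dq */
    J[1][0] = -q * dbp[0];                       /* dS/dp */
    J[1][1] = -q * dbq[0] - b[0];                /* dS/dq */
}

#define BAIRSTOW_MAX_DEGREE 64   /* arbitrary limit for this sketch */

/* Newton iteration on (p, q) until z^2 + p z + q divides P_n.
   Returns the number of iterations, or -1 on a singular Jacobian,
   an unsupported degree, or no convergence within max_iter steps. */
int bairstow(int n, const double a[], double *p, double *q,
             int max_iter, double tol) {
    double b[BAIRSTOW_MAX_DEGREE], dbp[BAIRSTOW_MAX_DEGREE], dbq[BAIRSTOW_MAX_DEGREE];
    if (n < 2 || n > BAIRSTOW_MAX_DEGREE)
        return -1;
    for (int it = 1; it <= max_iter; it++) {
        double R, S, J[2][2];
        bairstow_eval(n, a, *p, *q, b, dbp, dbq, &R, &S, J);
        double det = J[0][0] * J[1][1] - J[0][1] * J[1][0];
        if (det == 0.0)
            return -1;
        /* Cramer's rule for J [dp dq]^T = -[R S]^T */
        double dp = (-R * J[1][1] + J[0][1] * S) / det;
        double dq = (-J[0][0] * S + J[1][0] * R) / det;
        *p += dp;
        *q += dq;
        if (absd(dp) + absd(dq) < tol)
            return it;
    }
    return -1;
}
```

For $P_3(z) = z^3 + 2z^2 + 3z + 2 = (z^2 + z + 2)(z + 1)$, starting from p = 0.5, q = 1.5 the iteration converges to p = 1, q = 2.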
With these procedures available, it is easy to iterate $p \leftarrow p + \delta p$, $q \leftarrow q + \delta q$, where

$$\begin{bmatrix} \dfrac{\partial R}{\partial p} & \dfrac{\partial R}{\partial q} \\[1ex] \dfrac{\partial S}{\partial p} & \dfrac{\partial S}{\partial q} \end{bmatrix} \begin{bmatrix} \delta p \\ \delta q \end{bmatrix} = -\begin{bmatrix} R(p, q) \\ S(p, q) \end{bmatrix},$$

until $R$ and $S$ are 0, implying that $z^2 + pz + q$ is a factor of $P_n(z)$.

While on the subject of polynomials, it is worth mentioning that if you have to program the evaluation of the polynomial

$$P_n(z) = a_0 + a_1 z + a_2 z^2 + \dots + a_n z^n$$

in a procedural language, with the coefficients $a_0, a_1, \dots, a_n$ stored in a[0] ... a[n], the procedure should be as follows:

f := a[n];
for i from n-1 by -1 to 0 do
    f := z*f + a[i];
end do;

As you can see, the total operation count for the loop is $n$ additions and $n$ multiplications. Even when coding small polynomials you should use the efficient evaluation process, for example

f := a[0] + z*(a[1] + z*(a[2] + z*a[3]));

for a cubic, rather than the obvious but less efficient

f := a[0] + a[1]*z + a[2]*z*z + a[3]*z*z*z;

Finally, I will show how detailed knowledge of the IEEE format can be used in a language like C to generate a good starting approximation to sqrt(x), which, as you have seen from the tutorial, is necessary for Newton's method to converge rapidly. In double precision, a floating point number is stored as one sign bit, followed by an 11 bit exponent stored in excess 1023 notation (i.e. 01111111111 represents $2^0$), followed by a 53 bit mantissa with the first 1 bit implied and the remaining 52 bits stored. To generate a close approximation to the correct exponent, the following sequence suffices:

AND (z, mask), where mask = 0111111111110000000000.....
SHIFT result 52 places to the right
SUBTRACT 1023 (01111111111) to remove the excess from the exponent
DIVIDE by 2 to halve the real exponent
ADD 1023 to restore the excess representation of the halved exponent
SHIFT result 52 places to the left to put the exponent in the correct position
AND result with the same mask to produce the approximate sqrt

Of course, in a language like Maple such bit manipulations are unnecessary, and procedural languages like C will always provide sqrt() in their libraries, but it may be useful to know how to manipulate exponents in other applications. For example, if you were writing a procedure to compute the fifth root rather than the sqrt, the process would be the same, except that "DIVIDE by 2" would be replaced by "DIVIDE by 5".
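The AND/SHIFT/SUBTRACT/DIVIDE/ADD/SHIFT/AND sequence can be sketched in C as below. This assumes double is IEEE754 binary64 with bias 1023 and that z is positive and normalised (no zeros, subnormals, infinities or NaNs); the function names are illustrative. The divisor is a parameter k, so the fifth-root variant is the same code with k = 5. The result is a power of two within a factor of 2 of the true root, which Newton's method then refines quickly.

```c
#include <stdint.h>
#include <string.h>

/* Power-of-two starting guess for the k-th root of z (z > 0, normalised, k >= 2). */
double pow2_root_guess(double z, int k) {
    const uint64_t mask = 0x7FF0000000000000ULL;  /* the 11 exponent bits */
    uint64_t bits;
    memcpy(&bits, &z, sizeof bits);               /* view the double as raw bits */
    int64_t e = (int64_t)((bits & mask) >> 52);   /* AND, then SHIFT right 52 */
    e -= 1023;                                    /* SUBTRACT the excess */
    e /= k;                                       /* DIVIDE (C truncates toward zero) */
    e += 1023;                                    /* ADD the excess back */
    uint64_t out = ((uint64_t)e << 52) & mask;    /* SHIFT left 52, AND with mask */
    double g;
    memcpy(&g, &out, sizeof g);                   /* fraction bits are zero: a power of two */
    return g;
}

/* Newton's method for sqrt(z), started from the power-of-two guess. */
double sqrt_newton(double z, int iterations) {
    double x = pow2_root_guess(z, 2);
    for (int i = 0; i < iterations; i++)
        x = 0.5 * (x + z / x);
    return x;
}
```

For sqrt(1.234e+50) the guess is $2^{83} \approx 9.7 \times 10^{24}$, close to the true value of about $1.11 \times 10^{25}$, and a handful of Newton steps reach full double precision.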