An algorithm for solving non-linear equations based
on the secant method
By J. G. P. Barnes*
This paper describes a variant of the generalized secant method for solving simultaneous non-linear equations. The method is of particular value in cases where the evaluation of the residuals for imputed values of the unknowns is tedious, or where a good approximation to the solution and the Jacobian at the solution are available.
Experiments are described comparing the method with the Newton-Raphson process. It is
shown that for suitable problems the method is considerably superior.
1. Introduction
With the increased use of digital computers, sets of
non-linear algebraic equations are now being solved in
which the functions may be defined by a lengthy process.
There seems to be a need to develop methods which
demand as few function evaluations as possible by
making the utmost use of the information obtained from
each evaluation. In this paper, therefore, the number of
function evaluations required will be taken as the criterion
for the comparison of methods.
The Newton-Raphson method for solving non-linear
equations is well known and has the advantage of being
rapidly convergent in the neighbourhood of a solution.
It does, however, demand the determination of the
matrix of first derivatives (the Jacobian) of the system
at each iteration. For lengthy sets of equations, explicit
derivatives are rarely available, and even if they were
their evaluation would usually be more tedious than
that of the parent functions. Hence, if the Newton-Raphson method is to be employed the derivatives must be evaluated numerically, and this means at least n + 1 function evaluations per iteration (n is the dimension of the system).
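The count of n + 1 evaluations arises from forward differencing: one evaluation at the base point plus one per coordinate direction. A minimal sketch (the example function is my own, not from the paper):

```python
import numpy as np

def numerical_jacobian(f, x, h=1e-7):
    """Forward-difference Jacobian: one evaluation at the point x plus one
    per coordinate direction, i.e. n + 1 function evaluations in all."""
    f0 = np.asarray(f(x))                    # evaluation 1
    J = np.empty((f0.size, x.size))
    for j in range(x.size):                  # evaluations 2 .. n + 1
        xh = x.copy()
        xh[j] += h
        J[:, j] = (np.asarray(f(xh)) - f0) / h
    return J, f0

# Example: f(x) = (x0 + x1^2, x0 x1) has Jacobian [[1, 2 x1], [x1, x0]].
J, f0 = numerical_jacobian(lambda v: np.array([v[0] + v[1]**2, v[0] * v[1]]),
                           np.array([1.0, 2.0]))
```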
For equations whose Jacobian does not change much
from iteration to iteration (i.e. for near-linear equations),
the complete re-evaluation of the Jacobian at each
iteration is unnecessary, and it has been suggested that
the Jacobian only be re-evaluated every k iterations,
where k is a small integer. In practice this procedure
is not to be recommended.
The method to be described has the great advantage
of not requiring the explicit evaluation of derivatives, but
uses instead an approximate value of the Jacobian and
corrects this after each function evaluation. Like the
Newton-Raphson method it is (given suitable initial
conditions) capable of determining both real and complex roots. It will be shown to be equivalent to the
generalized secant method described by Bittner (1959)
and Wolfe (1959), but has the additional advantage of
being able to make use of an initial approximation to
the Jacobian. The benefit of this last fact should not be
underestimated since the situation often arises in practice
where the same set of equations is to be solved several
times with slightly different values of certain parameters.
The final solution point and Jacobian of one problem
often provide excellent initial conditions for the next,
and under these circumstances the method may prove
to be many times faster than Newton-Raphson.
Both theoretical and experimental results will show
that this method is in general about twice as good as
Newton-Raphson in the neighbourhood of a solution.
2. Notation
Vectors, matrices and tensors will be denoted by bold-face type.
Subscripts will in general refer to components of tensors and superscripts to iterates. Thus x_i^{(j)} is the ith component of the jth iterate of the vector x.
The summation convention is applied where relevant.
The transpose of a matrix will be denoted by the superfix T.
The general non-linear equations to be solved (in n dimensions) are

    f_i(x_j) = 0    (1)

where subscripts run from 1 to n.
3. The basic method
Let J^{(1)} be the initial (guessed) value of the Jacobian and let x^{(1)} be the initial point, at which the function value is f^{(1)}.
Then the first step δx^{(1)} is defined by

    J^{(1)} δx^{(1)} = -f^{(1)}.    (2)

Note that if J^{(1)} were correct this would be the Newton-Raphson step.
This gives rise to the point

    x^{(2)} = x^{(1)} + δx^{(1)}    (3)

at which the function value is f^{(2)}.
The correction to be applied to the Jacobian J^{(1)} is determined by considering the behaviour of a linear system which has values f^{(1)}, f^{(2)} at x^{(1)}, x^{(2)}, respectively. Suppose J^{(2)} were the Jacobian of such a system; then we would have

    J^{(2)} δx^{(1)} = f^{(2)} - f^{(1)}.    (4)

The corrected Jacobian J^{(2)} is chosen to satisfy equation (4). Suppose the correction is D^{(1)}, so that

    J^{(2)} = J^{(1)} + D^{(1)};    (5)

then D^{(1)} satisfies f^{(2)} - f^{(1)} = (J^{(1)} + D^{(1)}) δx^{(1)}, and using (2) this gives

    f^{(2)} = D^{(1)} δx^{(1)}.    (6)

A solution of equation (6) is

    D^{(1)} = f^{(2)} (z^{(1)})^T / ((z^{(1)})^T δx^{(1)}),

where z^{(1)} is an arbitrary vector.
A general iteration is thus

    J^{(i)} δx^{(i)} = -f^{(i)}    (7)
    x^{(i+1)} = x^{(i)} + δx^{(i)}    (8)
    D^{(i)} = f^{(i+1)} (z^{(i)})^T / ((z^{(i)})^T δx^{(i)})    (9)
    J^{(i+1)} = J^{(i)} + D^{(i)}    (10)

where the vectors z^{(i)} are as yet undefined.
Note that

    D^{(i)} δx^{(i)} = f^{(i+1)}    (11)

and

    J^{(i+1)} δx^{(i)} = f^{(i+1)} - f^{(i)}.    (12)

It remains to choose the vectors z^{(i)}.

* Imperial Chemical Industries Limited, Bozedown House, Whitchurch Hill, Reading, Berks.

4. The vectors z^{(i)}

A desirable feature of any method of solving non-linear equations is that it should rapidly solve a linear set of equations. In fact, n + 1 function evaluations should suffice to determine the Jacobian of the system exactly, and hence the solution ought to be found on the (n + 2)nd function evaluation. In our case this means that x^{(n+2)} ought to be the solution. It will be shown in the next section that with the choice of vectors z^{(i)} as follows this desirable feature is in fact obtained.
If i ≥ n then z^{(i)} is chosen orthogonal to the previous n - 1 steps, δx^{(i-n+1)} . . . δx^{(i-1)}.
If i < n then we can only demand that z^{(i)} is orthogonal to the available i - 1 steps, δx^{(1)} . . . δx^{(i-1)}. This naturally leaves some freedom of choice, and for simplicity in this case z^{(i)} is taken to be that linear combination of δx^{(1)} . . . δx^{(i)} which is orthogonal to δx^{(1)} . . . δx^{(i-1)}.
It will be noticed from equation (9) that the magnitude of z^{(i)} is irrelevant. For convenience it is taken to be a unit vector, and the above computation is then readily performed in either case by the usual Gram-Schmidt orthogonalization process (see Section 13).
The important consequence of this choice of z^{(i)} is that

    (z^{(i)})^T δx^{(j)} = 0,    1 ≤ i - j < n.    (13)

Hence from (9), D^{(i)} δx^{(j)} = 0 for 1 ≤ i - j < n, so that

    J^{(i+1)} δx^{(j)} = [J^{(i)} + D^{(i)}] δx^{(j)} = J^{(i)} δx^{(j)} = . . . = J^{(j+1)} δx^{(j)},    0 ≤ i - j < n.    (14)

5. The linear case

Now consider the behaviour of the method on a general set of linear equations

    f = Gx - b.    (15)

Suppose that the step δx^{(m+1)} is the first step linearly dependent on the previous steps. Then certainly m ≤ n, since n + 1 vectors cannot be linearly independent.
Let

    δx^{(m+1)} = c_j δx^{(j)},    (16)

summed over j = 1, . . ., m. Now

    J^{(m+1)} δx^{(j)} = J^{(j+1)} δx^{(j)}        from (14)
                       = f^{(j+1)} - f^{(j)}       from (12)
                       = G(x^{(j+1)} - x^{(j)}),   from (15)

so that

    J^{(m+1)} δx^{(j)} = G δx^{(j)},    1 ≤ j ≤ m.    (17)

Hence from (16)

    J^{(m+1)} δx^{(m+1)} = c_j J^{(m+1)} δx^{(j)} = c_j G δx^{(j)} = G δx^{(m+1)}.    (18)

So

    x^{(m+2)} = x^{(m+1)} + δx^{(m+1)}

and

    f^{(m+2)} = G x^{(m+2)} - b
              = f^{(m+1)} + G δx^{(m+1)}            from (15)
              = f^{(m+1)} + J^{(m+1)} δx^{(m+1)}    from (18)
              = 0                                   from (7).

Thus the step δx^{(m+1)} leads to the solution, which is hence found after at most n + 2 function evaluations.
An alternative interpretation is afforded by considering the approach of the successive approximations J^{(i)} to the correct value G. To be explicit, consider the eigenvectors of zero eigenvalue (nullity vectors) of the matrices (J^{(i)} - G). Equation (17) shows that all previous steps are such eigenvectors, and are independent. Each step reduces by one the maximum possible value of the rank of J^{(i)} - G, until after at most n steps its rank is zero, implying that J^{(i)} = G. The following step, of course, gives the solution.
If the rank of J^{(1)} - G is not n then it is possible for the solution to be found in fewer than n + 1 steps, but not necessarily. It depends upon the null eigenvectors of J^{(1)} - G. If the orthogonalization procedure could be started with such null eigenvectors accounted for, then earlier convergence would be assured.
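The termination property just proved is easy to demonstrate. The following sketch (my own illustrative code, with hypothetical names, not the author's program) implements the iteration (7)-(10) with z^{(i)} chosen as in Section 4, and applies it to a linear system with a deliberately wrong initial Jacobian:

```python
import numpy as np

def secant_iterations(f, x, J, iters):
    """Basic method, equations (7)-(10): step with the approximate Jacobian,
    then apply the rank-one correction with z^(i) orthogonal to the
    previous steps, chosen as in Section 4."""
    n, steps = x.size, []
    fx = f(x)                                    # function evaluation 1
    for _ in range(iters):
        dx = np.linalg.solve(J, -fx)             # (7)
        x = x + dx                               # (8)
        fx = f(x)                                # one more evaluation
        steps.append(dx)
        # z: unit component of the newest step orthogonal to the previous
        # (at most n - 1) steps, via Gram-Schmidt.
        basis = []
        for s in steps[-n:-1]:
            w = s - sum((u @ s) * u for u in basis)
            if np.linalg.norm(w) > 1e-12:
                basis.append(w / np.linalg.norm(w))
        z = dx - sum((u @ dx) * u for u in basis)
        if np.linalg.norm(z) < 1e-12 * np.linalg.norm(dx):
            continue                             # dependent step: skip update
        z /= np.linalg.norm(z)
        J = J + np.outer(fx, z) / (z @ dx)       # (9) and (10)
    return x, fx

# Linear test (Section 5): f = Gx - b, with the wrong initial value J = I.
G = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x, fx = secant_iterations(lambda v: G @ v - b, np.zeros(3), np.eye(3),
                          iters=4)               # n + 1 steps, n + 2 evaluations
```

After n + 1 = 4 steps (n + 2 = 5 function evaluations) the residual vanishes to within rounding, as the theory above predicts.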
6. The general case

Now consider the behaviour with the general set of non-linear equations (1).
We have

    J^{(k)} δx^{(k-i)} = J^{(k-i+1)} δx^{(k-i)}    from (14)
                       = f^{(k-i+1)} - f^{(k-i)}   from (12)
                       = δf^{(k-i)}, say,          i < k and 0 < i ≤ n.    (19)

In particular, if k > n,

    J^{(k)} [δx^{(k-n)}, . . ., δx^{(k-1)}] = [δf^{(k-n)}, . . ., δf^{(k-1)}].    (20)

The value of the Jacobian J^{(k)} is therefore that of the linear system defined by the n + 1 pairs of points and function values, x^{(k-n)}, f^{(k-n)}, . . ., x^{(k)}, f^{(k)}.
The method is therefore identified with the generalized secant method and as such has been previously described by Bittner (1959) and Wolfe (1959). The present representation of the secant method, however, has the advantage of being able to use an initial value of J, and in practice has been found to be more reliable.

7. An explicit expression for x^{(k+1)}

In this section suppose k > n and for clarity put J^{(k)} = J.
Then the equations (20) may be written in the form

    f^{(k-i)} = J x^{(k-i)} + L,    i = 0 . . . n,    (21)

where L is a constant vector, from which

    x^{(k+1)} = x^{(k)} - J^{-1} f^{(k)}.    (22)

Rewrite (21) in the form

    F = JX + L l    (23)

where l is a 1 × (n + 1) matrix with each element unity and

    F = [f^{(k-n)}, f^{(k-n+1)}, . . ., f^{(k)}],   X = [x^{(k-n)}, x^{(k-n+1)}, . . ., x^{(k)}].

Equation (23) represents n sets of linear equations in n + 1 variables for the n² + n unknown elements of J and L.
Taking the ith row of equation (23),

    F_i = J_i X + L_i l,    (24)

where F_i is the ith row of the matrix F.
From (24), using Cramer's rule,

    L_i = det [X^T  F_i^T] / det [X^T  l^T],    (25)

each determinant being of order n + 1, with similar expressions for the elements of J_i. Since from (21) and (22) x^{(k+1)} = -J^{-1} L, equations (22) and (25) provide an explicit expression for x^{(k+1)} in terms of the previous n + 1 pairs of points and function values.

8. Convergence rate

Consider a general set of non-linear equations, and expand using Taylor's theorem about the solution point ξ:

    f_i(x) = f_{i,j} δx_j + ½ f_{i,jk} δx_j δx_k + . . .,    δx = x - ξ,

where the derivatives are evaluated at the solution point ξ.
No loss of generality is incurred by taking ξ = 0, and since the algorithm is seen from equations (7) and (20) to be invariant under a linear transformation, we will also assume f_{i,j} = δ_{ij}. So a general set of non-linear equations may be considered to be

    f_i = x_i + B_{ijk} x_j x_k + O(x³) = 0,

where B_{ijk} is symmetric in j and k. Near the solution point terms O(x³) may be ignored. The convergence rate of the method will therefore be considered with respect to the system of equations

    f_i = x_i + B_{ijk} x_j x_k = 0,    i = 1 . . . n.    (26)

To enable comparisons to be made, the Newton-Raphson algorithm will be considered first.

9. Newton-Raphson convergence

We consider equation (26). The derivatives are

    J_{ij} = δ_{ij} + 2 B_{ijk} x_k,

and J^{(i)} δx^{(i)} = -f^{(i)} defines the step δx^{(i)}.
Now

    x^{(i+1)} = x^{(i)} + δx^{(i)} = J^{-1} [J x^{(i)} - f^{(i)}],

and, writing x for x^{(i)},

    [Jx - f]_i = x_i + 2 B_{ijk} x_j x_k - x_i - B_{ijk} x_j x_k = B_{ijk} x_j x_k.

Now J^{-1} = 1 + O(x), so

    x_i^{(i+1)} = B_{ijk} x_j^{(i)} x_k^{(i)} + O(x³).    (27)

This is a well-known result indicating second-order convergence.
If x is some linear measure of the vector x, then heuristically

    x^{(i+1)} ~ B (x^{(i)})².    (28)
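The second-order behaviour of (27) and (28) is easy to observe numerically. A small sketch, using an arbitrary random tensor B of my own choosing (not the paper's data):

```python
import numpy as np

# The model system f_i = x_i + B_ijk x_j x_k of (26): solution x = 0,
# unit Jacobian there.  B is a small random tensor, symmetrized in j, k.
rng = np.random.default_rng(1)
n = 3
B = rng.uniform(-0.05, 0.05, size=(n, n, n))
B = 0.5 * (B + B.transpose(0, 2, 1))

def f(x):
    return x + np.einsum('ijk,j,k->i', B, x, x)

def jacobian(x):                                  # J_ij = delta_ij + 2 B_ijk x_k
    return np.eye(n) + 2 * np.einsum('ijk,k->ij', B, x)

x = rng.uniform(-0.5, 0.5, size=n)                # starting point
errors = []
for _ in range(4):
    errors.append(np.linalg.norm(x))
    x = x - np.linalg.solve(jacobian(x), f(x))    # one Newton-Raphson step
# Per (27), each error is of the order of the square of the previous one,
# so the number of correct figures roughly doubles per iteration.
```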
10. Convergence of secant method

To simplify the notation in this section we will consider x^{(n+2)}; the general result follows at once. From (21) and (22),

    x^{(n+2)} = -J^{-1} L,

where L is given by (25), the determinants now being formed from the points x^{(1)}, . . ., x^{(n+1)} and the corresponding function values.
Applying these formulae to equations (26) gives

    f_i = x_i + B_{ijk} x_j x_k.

Suppose that successive iterations give much closer approximations to the root. Then

    |x^{(1)}| > |x^{(2)}| > . . . > |x^{(n+1)}|.

Expanding the numerator and denominator determinants of (25) by their bottom rows, the dominant terms yield

    x_i^{(n+2)} = B_{ijk} x_j^{(1)} x_k^{(n+1)} + higher terms.    (29)

As before J^{-1} = 1 + O(x), and if x is again some linear measure of the error we obtain

    x^{(n+2)} ~ B x^{(1)} x^{(n+1)}.    (30)

11. Comparison of convergence rates

The convergence rates have been shown to be approximately given by

    x^{(i+1)} = B (x^{(i)})²            Newton-Raphson    (31)
    x^{(i+n+1)} = B x^{(i)} x^{(i+n)}   Secant method     (32)

where x^{(i)} is some linear measure of the error of the ith iterate.
Put v_i = log (B x^{(i)}); then

    v_{i+1} = 2 v_i                Newton-Raphson    (33)
    v_{i+n+1} = v_i + v_{i+n}      Secant method     (34)

Consider first the Newton-Raphson case. Each iteration requires the evaluation of the Jacobian and so involves at least n + 1 function evaluations. For the purpose of comparison we will suppose only n + 1 are in fact required. From equation (33) the reduction factor is seen to be 2^{1/(n+1)} per function evaluation.
Consider now the difference equation (34) describing the secant method. Its characteristic equation is

    t^{n+1} - t^n - 1 = 0.    (35)

We wish to find the dominant root of this equation. Equation (35) has one real root > 1 and, if n is odd, one other real root between -1 and 0. It has no other real roots.
The roots of the equation are the same as the eigenvalues of the companion matrix (of order n + 1):

    | 1  0  0  . . .  0  1 |
    | 1  0  0  . . .  0  0 |
    | 0  1  0  . . .  0  0 |
    | .  .  .         .  . |
    | 0  0  0  . . .  1  0 |

This is a non-negative matrix. Hence by the weak form of a theorem of Frobenius (Gantmacher (1959), Varga (1962)), it has a non-negative real eigenvalue equal to its spectral radius. It follows that the only positive real root of equation (35) is equal to this eigenvalue, and hence no other root of equation (35) has larger modulus. It remains to prove that no other root of the equation has the same modulus.
It is obvious that the positive real root is a simple root. Let it be r. Then from (35)

    r^n (r - 1) = 1.    (36)

Suppose r e^{iθ} is also a root. Then

    r^n e^{inθ} (r e^{iθ} - 1) = 1.

Taking moduli,

    |r e^{iθ} - 1| = r - 1    from (36),

which is impossible unless e^{iθ} = 1. Hence there are no roots of modulus r other than the real positive root. The positive real root of (35) is therefore dominant. We will henceforth denote this root by t.
The solution of the difference equation (34) is hence dominated by v_i = v_0 t^i, and the reduction factor per iteration is t. A similar result has previously been obtained by Tornheim (1964) as an example of a multipoint iterative method.
Hence the ratio of the number of function evaluations required by Newton-Raphson to the number for the secant method for a given error reduction is

    R_n = (n + 1) log t / log 2.    (37)

The secant method may be said to be better by a factor R_n. Some values of R_n and t are shown in Table 1.
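The dominant root t of (35) and the ratio R_n of (37) are easily checked numerically; a short sketch (illustrative code, not part of the paper):

```python
import numpy as np

def rates(n):
    """Dominant root t of t^(n+1) - t^n - 1 = 0 (equation (35)) and the
    ratio R_n = (n + 1) log t / log 2 of equation (37)."""
    # np.roots computes the eigenvalues of the companion matrix of the
    # polynomial, exactly as in the argument of Section 11.
    roots = np.roots([1, -1] + [0] * (n - 1) + [-1])
    t = max(r.real for r in roots if abs(r.imag) < 1e-8)
    return t, (n + 1) * np.log(t) / np.log(2)

# n = 1 gives the golden ratio t = 1.618 and R_1 = 1.388, as in Table 1.
```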
Table 1
Rates of convergence

    n        t        R_n
    1      1.618     1.388
    2      1.466     1.654
    3      1.380     1.860
    4      1.325     2.028
    5      1.285     2.172
    6      1.255     2.297
    7      1.232     2.409
    8      1.213     2.509
    9      1.197     2.600
    10     1.184     2.684
    20     1.114     3.283
    50     1.058     4.179
    100    1.034     4.914

12. Experimental results

To test the above theoretical prediction, the equations

    f_i = x_i + B_{ijk} x_j x_k = 0

were solved by both Newton-Raphson (the derivatives being available analytically) and the secant method, using a Mercury digital computer.
The coefficients B_{ijk} were generated as random variables from a rectangular distribution (|B_{ijk}| < B_0, say). In each run the starting point was taken at random on the unit sphere. By varying B_0 the effective degree of non-linearity is altered.
Two minor difficulties arise. First, the length of the mantissa of the floating-point representation used (on Mercury) was 29 bits, and so cancellation errors precluded an improvement of more than about 8 decimal digits per iteration. With the rapid convergence of the two methods here under comparison, many successive valid iterations could not therefore be obtained. Secondly, it was difficult to know what initial value of the Jacobian to give the modified secant method, and how to penalize it for having such knowledge.
To obtain results of reasonable variance in the face of the first difficulty, several runs were carried out for each n and B_0. Each run was terminated when the ratio of successive values of f² = f_i f_i (the accuracy measure) exceeded 10^14 using Newton-Raphson. (In each case at the corresponding level with the secant method the ratio of successive values of f² was less than 10^14.) The iteration number of the last allowed iteration on Newton-Raphson was recorded, and the corresponding number of iterations to obtain the same degree of accuracy with the secant method was found by interpolation between the two iterations around that accuracy. The interpolation was carried out on the logarithms of the successive values of f². Linear interpolation was discarded since it would have given rise to consistently low values for the iteration. Interpolation was actually carried out on the assumption that the successive values of these logarithms fitted a curve of the form log (f²) = α t^i + β, where t is given by Table 1. This is, of course, a result predicted by the above theory.
Two series of runs were carried out with the secant method. One was with the initial Jacobian exact (at the starting point) so that the first iteration is the same as Newton-Raphson. The second was with the Jacobian equal to the unit matrix (which is, of course, the correct value at the solution).
In comparing the number of function evaluations required, each Newton-Raphson iteration is scored as n + 1 evaluations.
The following scores were evaluated for each run.

(1) Initial J exact: Since the first step is the same in each case, the count was from the end of that step. This score would be expected to be in favour of the secant method since the initial J is good.

(2) Initial J exact: As an alternative to the above, the iterations were counted from the beginning, but a penalty of n function evaluations was added to the score of the secant method to compensate for the exact J. This penalty is obviously too heavy, and the score is therefore in favour of Newton-Raphson.

(3) Initial J unit: The initial J is equal to the correct final value; this sort of situation may well arise in practice. Iterations of both methods were counted from the start. This score is in favour of the secant method, especially for large n, since J is not altered substantially before the solution is nearly reached.

The equations were solved with n = 2(1)7 and B_0 = 0.01 and 0.1. Five runs were carried out for each pair of values of n and B_0. The relative behaviour of the two methods was found to be essentially independent of B_0, although obviously more iterations were required for the higher value. The runs are therefore grouped only according to the value of n. The total scores over all runs for each value of n were accumulated, and the corresponding estimates of R_n were evaluated and are shown in Tables 2 and 3. Comparison with the predicted value of R_n shows good agreement in every case in view of the comments about the bias of the scores. Scores 1 and 2 straddle the predicted value in every case. Score 3 shows how rapid the secant method is under favourable conditions.
Table 2
Mean number of iterations required

    DIMENSION   NEWTON-     SECANT, INITIAL JACOBIAN
                RAPHSON     (a) EXACT    (b) UNIT
        2         2.5         3.72         3.47
        3         2.5         4.10         3.68
        4         2.7         4.82         4.14
        5         3.0         6.12         5.40
        6         2.8         5.95         4.61
        7         3.4         8.36         6.07

Table 3
Comparison of scores with predicted value

    DIMENSION   PREDICTED   SCORE 1   SCORE 2   SCORE 3
        2         1.654      1.656     1.312     2.163
        3         1.860      1.938     1.409     2.718
        4         2.028      2.222     1.530     3.264
        5         2.172      2.343     1.618     3.336
        6         2.297      2.544     1.640     4.250
        7         2.409      2.608     1.771     4.479

13. Stability

Bittner (1959) has shown that the secant method as defined by equations (7) and (20) has the following property. Denote the determinant of unit vectors

    Δ_k = [ δx^{(k-n+1)}/|δx^{(k-n+1)}|, δx^{(k-n+2)}/|δx^{(k-n+2)}|, . . ., δx^{(k)}/|δx^{(k)}| ].

Then given w such that 0 < w < 1, and provided that

    |Δ_k| > w  for all k > n,    (38)

there exists a neighbourhood of the solution within which convergence is assured.
The condition |Δ_k| > w is the sort of expedient that might be thought necessary for the reliable computation of J^{(k+1)} from equation (20). This sort of condition is not obviously required for the algorithm as expressed by equation (7) et seq. It might, however, be thought necessary to impose a condition of the form

    |(z^{(k)})^T δx^{(k)}| / (|z^{(k)}| |δx^{(k)}|) > p    (39)

to ensure the reliable computation of (9).
It will now be shown that conditions (38) and (39) are related, by considering the orthogonalization process used to determine the z^{(k)}.
Suppose that ξ_1, . . ., ξ_m are a set of m ≤ n unit vectors. Define the vectors e_1, . . ., e_m and the scalars C_2, . . ., C_m by the following equations:

    e_1 = ξ_1,
    C_i e_i = ξ_i - Σ_{j<i} (e_j^T ξ_i) e_j,   |e_i| = 1,   i = 2, . . ., m.

Then the vectors e_1, . . ., e_m are an orthonormal basis of the space spanned by the set ξ_1, . . ., ξ_m.
If we now put m = min (n, k) and

    ξ_i = δx^{(k-m+i)} / |δx^{(k-m+i)}|

in the above, we obtain e_m = z^{(k)}, and, writing C_k for C_m (to distinguish iterations),

    C_k = (z^{(k)})^T δx^{(k)} / |δx^{(k)}|,

so that condition (39) may be written

    |C_k| > p.    (40)

In particular, if k ≥ n, so that m = n, the determinant Δ_k is, apart from sign, the product of the scalars arising in the orthonormalization at iteration k:

    |Δ_k| = |C'_2 C'_3 . . . C'_n|,    (41)

where the dash distinguishes these scalars from the C_k of successive iterations. Note that

    |C'_i| ≤ 1   and   |C'_i| ≥ |C_{k-n+i}|,    (42)

the first since each ξ_i is a unit vector (see Todd (1962)), and the latter being easily seen by considering the geometric properties of the system: each scalar is the distance of a unit vector from the space spanned by the steps against which it is orthogonalized, and at iteration k - n + i that space is at least as large.
Then
(i) If |Δ_k| ≥ w, then since each |C'_i| ≤ 1 and C'_n = C_k,

    |C_k| ≥ |C'_2 . . . C'_n| = |Δ_k| ≥ w.

(ii) If |C_k| ≥ p for all k ≤ k_0, say, then from (41) and (42)

    |Δ_k| ≥ |C_{k-n+2} . . . C_k| ≥ p^{n-1}.    (43)

It has thus been shown that conditions (38) and (39), although not identical, are nevertheless related. Consequently either of the two tests may be used to ensure convergence under suitable conditions.
As a practical consequence of the above, one or both of the tests are applied at each iteration and the proposed step δx^{(k)} rejected if the test fails. A satisfactory alternative is to set δx^{(k)} parallel to z^{(k)} (in which case the tests will evidently be satisfied). The magnitude of the alternative step is still arbitrary; a suitable value might be that of the rejected vector.
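The orthonormalization and the scalars C_i may be sketched as follows (an illustration with my own naming; it assumes the input vectors are linearly independent):

```python
import numpy as np

def orthonormalize(xi):
    """Gram-Schmidt on unit vectors xi_1 .. xi_m.  Returns the orthonormal
    basis e_1 .. e_m and the scalars C_2 .. C_m (each |C_i| <= 1).  With
    xi_i = dx^(k-m+i)/|dx^(k-m+i)| and m = min(n, k), the last basis vector
    is the vector z^(k) of Section 4 and the last scalar is C_k."""
    e, C = [xi[0]], []
    for v in xi[1:]:
        w = v - sum((u @ v) * u for u in e)   # component orthogonal to e_1..e_{i-1}
        c = np.linalg.norm(w)                 # C_i e_i = w with |e_i| = 1
        e.append(w / c)                       # (assumes the xi are independent)
        C.append(c)
    return e, C

# For m = n unit vectors, the determinant [xi_1, ..., xi_n] equals the
# product of the scalars apart from sign, as used in (41).
rng = np.random.default_rng(2)
xi = [v / np.linalg.norm(v) for v in rng.normal(size=(3, 3))]
e, C = orthonormalize(xi)
```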
A more general method, which has a much larger domain of convergence, may be formed by imposing a success criterion. The usual criterion employed is the minimization of f².
It is ensured that each step gives rise to an improvement (i.e. reduces f²) by multiplying the step by a suitable scalar in those cases where the direct application of the algorithm does not give rise to an improvement. The imposition of such a criterion ensures convergence over a large domain but does not impair the final convergence rate.
A generalization of the algorithm as defined by equations (7) et seq. is now needed for the case in which δx^{(i)} is not prescribed by equation (7). An argument similar to that of Section 3 (the secant condition (12) now gives D^{(i)} δx^{(i)} = f^{(i+1)} - f^{(i)} - J^{(i)} δx^{(i)}) leads to the replacement of (9) by

    D^{(i)} = (f^{(i+1)} - f^{(i)} - J^{(i)} δx^{(i)}) (z^{(i)})^T / ((z^{(i)})^T δx^{(i)}),    (44)

which reduces to (9) in the usual case. In practice the use of (44) for the calculation of D^{(i)} in every case is recommended.
The values of C_k and Δ_k were monitored for all the experiments of Section 12. From these it would seem that the simplest procedure likely to give consistent results is to test C_k only, and reject the step if |C_k| < p_0. p_0 might be 10^{-4}. Larger values of p_0 may delay convergence considerably.
Acknowledgements
The author wishes to express his thanks to Imperial
Chemical Industries Limited for permission to publish
this paper, to his colleagues Mr. I. Gray and Dr. H. H.
Robertson for their constant advice and encouragement,
and to the referee for his constructive criticisms.
References
BITTNER, L. (1959). "Eine Verallgemeinerung des Sekantenverfahrens (regula falsi) zur näherungsweisen Berechnung der Nullstellen eines nichtlinearen Gleichungssystems," Wissen. Zeit. der Technischen Hochschule Dresden, Vol. 9, p. 325.
GANTMACHER, F. R. (1959). Applications of the Theory of Matrices, New York: Interscience Publishers Inc.
TODD, J. (Ed.) (1962). A Survey of Numerical Analysis, New York: McGraw-Hill Book Co.
TORNHEIM, L. (1964). "Convergence of Multipoint Iterative Methods," J. Assoc. Comp. Mach., Vol. 11, p. 210.
VARGA, R. S. (1962). Matrix Iterative Analysis, London: Prentice-Hall International.
WOLFE, P. (1959). "The Secant Method for Simultaneous Non-linear Equations," Comm. Assoc. Comp. Mach., Vol. 2, p. 12.
To the Editor,
The Computer Journal.

"An impossible program"

Dear Sir,
I do not know whose leg Mr. Strachey is pulling (this Journal, January 1965, p. 313); but if each letter in refutation of his proof adds to some private tally for his amusement, then I am happy to amuse him. May I offer three independent refutations?

(i) He defines a function T[R]. Any subsequent "proof" that T cannot exist is then idle; the function exists by definition.

(ii) If T does not exist, then P does not exist, since T is essentially involved in the statement of P. So P is not a program. So P is not an acceptable argument for T.

(iii) If one accepts Mr. Strachey's reasoning up to the point "In each case T[P] has exactly the wrong value", the appropriate deduction is not "this contradiction shows that the function T cannot exist" but "this contradiction shows that either the function T does not exist or that P is not a program". Since the non-existence of T itself implies that P is not a program, the most that can be concluded is that in any event P is not a program.

I am, of course, being careful not to claim that Mr. Strachey's initial assertion (that it is impossible to write a program which can examine any other program and tell, in every case, if it will terminate or get into a closed loop when it is run) is false. But what is manifest is that his proof of the far stronger assertion (that T[R] does not exist) is invalid: both in its final step (see (iii) above) and in its assumption that a set of statements in CPL—or any other language—necessarily constitutes a program. (If anybody doubts my counter-assertion that P is not a program, let him try compiling P in—any—machine language!)

Yours faithfully,
H. G. APSIMON.
22 Stafford Court,
London, W.8.
18 February 1965.