An algorithm for solving non-linear equations based
on the secant method
By J. G. P. Barnes*
This paper describes a variant of the generalized secant method for solving simultaneous non-linear equations. The method is of particular value in cases where the evaluation of the residuals for imputed values of the unknowns is tedious, or where a good approximation to the solution and the Jacobian at the solution are available.
Experiments are described comparing the method with the Newton-Raphson process. It is
shown that for suitable problems the method is considerably superior.
1. Introduction
With the increased use of digital computers, sets of
non-linear algebraic equations are now being solved in
which the functions may be defined by a lengthy process.
There seems to be a need to develop methods which
demand as few function evaluations as possible by
making the utmost use of the information obtained from
each evaluation. In this paper, therefore, the number of
function evaluations required will be taken as the criterion
for the comparison of methods.
The Newton-Raphson method for solving non-linear
equations is well known and has the advantage of being
rapidly convergent in the neighbourhood of a solution.
It does, however, demand the determination of the
matrix of first derivatives (the Jacobian) of the system
at each iteration. For lengthy sets of equations, explicit
derivatives are rarely available, and even if they were
their evaluation would usually be more tedious than
that of the parent functions. Hence, if the Newton-Raphson method is to be employed the derivatives must be evaluated numerically, and this means at least n + 1 function evaluations per iteration (n is the dimension of the system).
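The count of n + 1 evaluations arises from forward differencing: one evaluation at the base point plus one per coordinate direction. A minimal sketch (the example function is my own, not from the paper):

```python
import numpy as np

def numerical_jacobian(f, x, h=1e-7):
    """Forward-difference Jacobian: one evaluation at the point x plus one
    per coordinate direction, i.e. n + 1 function evaluations in all."""
    f0 = np.asarray(f(x))                    # evaluation 1
    J = np.empty((f0.size, x.size))
    for j in range(x.size):                  # evaluations 2 .. n + 1
        xh = x.copy()
        xh[j] += h
        J[:, j] = (np.asarray(f(xh)) - f0) / h
    return J, f0

# Example: f(x) = (x0 + x1^2, x0 x1) has Jacobian [[1, 2 x1], [x1, x0]].
J, f0 = numerical_jacobian(lambda v: np.array([v[0] + v[1]**2, v[0] * v[1]]),
                           np.array([1.0, 2.0]))
```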
For equations whose Jacobian does not change much
from iteration to iteration (i.e. for near-linear equations),
the complete re-evaluation of the Jacobian at each
iteration is unnecessary, and it has been suggested that
the Jacobian only be re-evaluated every k iterations,
where k is a small integer. In practice this procedure
is not to be recommended.
The method to be described has the great advantage
of not requiring the explicit evaluation of derivatives, but
uses instead an approximate value of the Jacobian and
corrects this after each function evaluation. Like the
Newton-Raphson method it is (given suitable initial
conditions) capable of determining both real and complex roots. It will be shown to be equivalent to the
generalized secant method described by Bittner (1959)
and Wolfe (1959), but has the additional advantage of
being able to make use of an initial approximation to
the Jacobian. The benefit of this last fact should not be
underestimated since the situation often arises in practice
where the same set of equations is to be solved several
times with slightly different values of certain parameters.
The final solution point and Jacobian of one problem
often provide excellent initial conditions for the next,
and under these circumstances the method may prove
to be many times faster than Newton-Raphson.
Both theoretical and experimental results will show
that this method is in general about twice as good as
Newton-Raphson in the neighbourhood of a solution.
2. Notation
Vectors, matrices and tensors will be denoted by bold-face type.
Subscripts will in general refer to components of tensors and superscripts to iterates. Thus x_i^{(j)} is the ith component of the jth iterate of the vector x.
The summation convention is applied where relevant.
The transpose of a matrix will be denoted by the superfix T.
The general non-linear equations to be solved (in n dimensions) are

    f_i(x_j) = 0    (1)

where subscripts run from 1 to n.
3. The basic method
Let J^{(1)} be the initial (guessed) value of the Jacobian and let x^{(1)} be the initial point, at which the function value is f^{(1)}.
Then the first step δx^{(1)} is defined by

    J^{(1)} δx^{(1)} = -f^{(1)}.    (2)

Note that if J^{(1)} were correct this would be the Newton-Raphson step.
This gives rise to the point

    x^{(2)} = x^{(1)} + δx^{(1)}    (3)

at which the function value is f^{(2)}.
The correction to be applied to the Jacobian J^{(1)} is determined by considering the behaviour of a linear system which has values f^{(1)}, f^{(2)} at x^{(1)}, x^{(2)}, respectively. Suppose J^{(2)} were the Jacobian of such a system; then we would have

    J^{(2)} δx^{(1)} = f^{(2)} - f^{(1)}.    (4)

The corrected Jacobian J^{(2)} is chosen to satisfy equation (4). Suppose the correction is D^{(1)}, so that

    J^{(2)} = J^{(1)} + D^{(1)};    (5)

then D^{(1)} satisfies f^{(2)} - f^{(1)} = (J^{(1)} + D^{(1)}) δx^{(1)}, and using (2) this gives

    f^{(2)} = D^{(1)} δx^{(1)}.    (6)

A solution of equation (6) is

    D^{(1)} = f^{(2)} (z^{(1)})^T / ((z^{(1)})^T δx^{(1)}),

where z^{(1)} is an arbitrary vector.
A general iteration is thus

    J^{(i)} δx^{(i)} = -f^{(i)}    (7)
    x^{(i+1)} = x^{(i)} + δx^{(i)}    (8)
    D^{(i)} = f^{(i+1)} (z^{(i)})^T / ((z^{(i)})^T δx^{(i)})    (9)
    J^{(i+1)} = J^{(i)} + D^{(i)}    (10)

where the vectors z^{(i)} are as yet undefined.
Note that

    D^{(i)} δx^{(i)} = f^{(i+1)}    (11)

and

    J^{(i+1)} δx^{(i)} = f^{(i+1)} - f^{(i)}.    (12)

It remains to choose the vectors z^{(i)}.

* Imperial Chemical Industries Limited, Bozedown House, Whitchurch Hill, Reading, Berks.

4. The vectors z^{(i)}

A desirable feature of any method of solving non-linear equations is that it should rapidly solve a linear set of equations. In fact, n + 1 function evaluations should suffice to determine the Jacobian of the system exactly, and hence the solution ought to be found on the (n + 2)nd function evaluation. In our case this means that x^{(n+2)} ought to be the solution. It will be shown in the next section that with the choice of vectors z^{(i)} as follows this desirable feature is in fact obtained.
If i ≥ n then z^{(i)} is chosen orthogonal to the previous n - 1 steps, δx^{(i-n+1)} . . . δx^{(i-1)}.
If i < n then we can only demand that z^{(i)} is orthogonal to the available i - 1 steps, δx^{(1)} . . . δx^{(i-1)}. This naturally leaves some freedom of choice, and for simplicity in this case z^{(i)} is taken to be that linear combination of δx^{(1)} . . . δx^{(i)} which is orthogonal to δx^{(1)} . . . δx^{(i-1)}.
It will be noticed from equation (9) that the magnitude of z^{(i)} is irrelevant. For convenience it is taken to be a unit vector, and the above computation is then readily performed in either case by the usual Gram-Schmidt orthogonalization process (see Section 13).
The important consequence of this choice of z^{(i)} is that

    (z^{(i)})^T δx^{(j)} = 0,    1 ≤ i - j < n.    (13)

Hence from (9), D^{(i)} δx^{(j)} = 0 for 1 ≤ i - j < n, so that

    J^{(i+1)} δx^{(j)} = [J^{(i)} + D^{(i)}] δx^{(j)} = J^{(i)} δx^{(j)} = . . . = J^{(j+1)} δx^{(j)},    0 ≤ i - j < n.    (14)

5. The linear case

Now consider the behaviour of the method on a general set of linear equations

    f = Gx - b.    (15)

Suppose that the step δx^{(m+1)} is the first step linearly dependent on the previous steps. Then certainly m ≤ n, since n + 1 vectors cannot be linearly independent.
Let

    δx^{(m+1)} = c_j δx^{(j)},    (16)

summed over j = 1, . . ., m. Now

    J^{(m+1)} δx^{(j)} = J^{(j+1)} δx^{(j)}        from (14)
                       = f^{(j+1)} - f^{(j)}       from (12)
                       = G(x^{(j+1)} - x^{(j)}),   from (15)

so that

    J^{(m+1)} δx^{(j)} = G δx^{(j)},    1 ≤ j ≤ m.    (17)

Hence from (16)

    J^{(m+1)} δx^{(m+1)} = c_j J^{(m+1)} δx^{(j)} = c_j G δx^{(j)} = G δx^{(m+1)}.    (18)

So

    x^{(m+2)} = x^{(m+1)} + δx^{(m+1)}

and

    f^{(m+2)} = G x^{(m+2)} - b
              = f^{(m+1)} + G δx^{(m+1)}            from (15)
              = f^{(m+1)} + J^{(m+1)} δx^{(m+1)}    from (18)
              = 0                                   from (7).

Thus the step δx^{(m+1)} leads to the solution, which is hence found after at most n + 2 function evaluations.
An alternative interpretation is afforded by considering the approach of the successive approximations J^{(i)} to the correct value G. To be explicit, consider the eigenvectors of zero eigenvalue (nullity vectors) of the matrices (J^{(i)} - G). Equation (17) shows that all previous steps are such eigenvectors, and are independent. Each step reduces by one the maximum possible value of the rank of J^{(i)} - G, until after at most n steps its rank is zero, implying that J^{(i)} = G. The following step, of course, gives the solution.
If the rank of J^{(1)} - G is not n then it is possible for the solution to be found in fewer than n + 1 steps, but not necessarily. It depends upon the null eigenvectors of J^{(1)} - G. If the orthogonalization procedure could be started with such null eigenvectors accounted for, then earlier convergence would be assured.
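The termination property just proved is easy to demonstrate. The following sketch (my own illustrative code, with hypothetical names, not the author's program) implements the iteration (7)-(10) with z^{(i)} chosen as in Section 4, and applies it to a linear system with a deliberately wrong initial Jacobian:

```python
import numpy as np

def secant_iterations(f, x, J, iters):
    """Basic method, equations (7)-(10): step with the approximate Jacobian,
    then apply the rank-one correction with z^(i) orthogonal to the
    previous steps, chosen as in Section 4."""
    n, steps = x.size, []
    fx = f(x)                                    # function evaluation 1
    for _ in range(iters):
        dx = np.linalg.solve(J, -fx)             # (7)
        x = x + dx                               # (8)
        fx = f(x)                                # one more evaluation
        steps.append(dx)
        # z: unit component of the newest step orthogonal to the previous
        # (at most n - 1) steps, via Gram-Schmidt.
        basis = []
        for s in steps[-n:-1]:
            w = s - sum((u @ s) * u for u in basis)
            if np.linalg.norm(w) > 1e-12:
                basis.append(w / np.linalg.norm(w))
        z = dx - sum((u @ dx) * u for u in basis)
        if np.linalg.norm(z) < 1e-12 * np.linalg.norm(dx):
            continue                             # dependent step: skip update
        z /= np.linalg.norm(z)
        J = J + np.outer(fx, z) / (z @ dx)       # (9) and (10)
    return x, fx

# Linear test (Section 5): f = Gx - b, with the wrong initial value J = I.
G = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
x, fx = secant_iterations(lambda v: G @ v - b, np.zeros(3), np.eye(3),
                          iters=4)               # n + 1 steps, n + 2 evaluations
```

After n + 1 = 4 steps (n + 2 = 5 function evaluations) the residual vanishes to within rounding, as the theory above predicts.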
6. The general case

Now consider the behaviour with the general set of non-linear equations (1).
We have

    J^{(k)} δx^{(k-i)} = J^{(k-i+1)} δx^{(k-i)}    from (14)
                       = f^{(k-i+1)} - f^{(k-i)}   from (12)
                       = δf^{(k-i)}, say,          i < k and 0 < i ≤ n.    (19)

In particular, if k > n,

    J^{(k)} [δx^{(k-n)}, . . ., δx^{(k-1)}] = [δf^{(k-n)}, . . ., δf^{(k-1)}].    (20)

The value of the Jacobian J^{(k)} is therefore that of the linear system defined by the n + 1 pairs of points and function values, x^{(k-n)}, f^{(k-n)}, . . ., x^{(k)}, f^{(k)}.
The method is therefore identified with the generalized secant method and as such has been previously described by Bittner (1959) and Wolfe (1959). The present representation of the secant method, however, has the advantage of being able to use an initial value of J, and in practice has been found to be more reliable.

7. An explicit expression for x^{(k+1)}

In this section suppose k > n and for clarity put J^{(k)} = J.
Then the equations (20) may be written in the form

    f^{(k-i)} = J x^{(k-i)} + L,    i = 0 . . . n,    (21)

where L is a constant vector, from which

    x^{(k+1)} = x^{(k)} - J^{-1} f^{(k)}.    (22)

Rewrite (21) in the form

    F = JX + L l    (23)

where l is a 1 × (n + 1) matrix with each element unity and

    F = [f^{(k-n)}, f^{(k-n+1)}, . . ., f^{(k)}],   X = [x^{(k-n)}, x^{(k-n+1)}, . . ., x^{(k)}].

Equation (23) represents n sets of linear equations in n + 1 variables for the n² + n unknown elements of J and L.
Taking the ith row of equation (23),

    F_i = J_i X + L_i l,    (24)

where F_i is the ith row of the matrix F.
From (24), using Cramer's rule,

    L_i = det [X^T  F_i^T] / det [X^T  l^T],    (25)

each determinant being of order n + 1, with similar expressions for the elements of J_i. Since from (21) and (22) x^{(k+1)} = -J^{-1} L, equations (22) and (25) provide an explicit expression for x^{(k+1)} in terms of the previous n + 1 pairs of points and function values.

8. Convergence rate

Consider a general set of non-linear equations, and expand using Taylor's theorem about the solution point ξ:

    f_i(x) = f_{i,j} δx_j + ½ f_{i,jk} δx_j δx_k + . . .,    δx = x - ξ,

where the derivatives are evaluated at the solution point ξ.
No loss of generality is incurred by taking ξ = 0, and since the algorithm is seen from equations (7) and (20) to be invariant under a linear transformation, we will also assume f_{i,j} = δ_{ij}. So a general set of non-linear equations may be considered to be

    f_i = x_i + B_{ijk} x_j x_k + O(x³) = 0,

where B_{ijk} is symmetric in j and k. Near the solution point terms O(x³) may be ignored. The convergence rate of the method will therefore be considered with respect to the system of equations

    f_i = x_i + B_{ijk} x_j x_k = 0,    i = 1 . . . n.    (26)

To enable comparisons to be made, the Newton-Raphson algorithm will be considered first.

9. Newton-Raphson convergence

We consider equation (26). The derivatives are

    J_{ij} = δ_{ij} + 2 B_{ijk} x_k,

and J^{(i)} δx^{(i)} = -f^{(i)} defines the step δx^{(i)}.
Now

    x^{(i+1)} = x^{(i)} + δx^{(i)} = J^{-1} [J x^{(i)} - f^{(i)}],

and, writing x for x^{(i)},

    [Jx - f]_i = x_i + 2 B_{ijk} x_j x_k - x_i - B_{ijk} x_j x_k = B_{ijk} x_j x_k.

Now J^{-1} = 1 + O(x), so

    x_i^{(i+1)} = B_{ijk} x_j^{(i)} x_k^{(i)} + O(x³).    (27)

This is a well-known result indicating second-order convergence.
If x is some linear measure of the vector x, then heuristically

    x^{(i+1)} ~ B (x^{(i)})².    (28)
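The second-order behaviour of (27) and (28) is easy to observe numerically. A small sketch, using an arbitrary random tensor B of my own choosing (not the paper's data):

```python
import numpy as np

# The model system f_i = x_i + B_ijk x_j x_k of (26): solution x = 0,
# unit Jacobian there.  B is a small random tensor, symmetrized in j, k.
rng = np.random.default_rng(1)
n = 3
B = rng.uniform(-0.05, 0.05, size=(n, n, n))
B = 0.5 * (B + B.transpose(0, 2, 1))

def f(x):
    return x + np.einsum('ijk,j,k->i', B, x, x)

def jacobian(x):                                  # J_ij = delta_ij + 2 B_ijk x_k
    return np.eye(n) + 2 * np.einsum('ijk,k->ij', B, x)

x = rng.uniform(-0.5, 0.5, size=n)                # starting point
errors = []
for _ in range(4):
    errors.append(np.linalg.norm(x))
    x = x - np.linalg.solve(jacobian(x), f(x))    # one Newton-Raphson step
# Per (27), each error is of the order of the square of the previous one,
# so the number of correct figures roughly doubles per iteration.
```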
10. Convergence of secant method

To simplify the notation in this section we will consider x^{(n+2)}; the general result follows at once. From (21) and (22),

    x^{(n+2)} = -J^{-1} L,

where L is given by (25), the determinants now being formed from the points x^{(1)}, . . ., x^{(n+1)} and the corresponding function values.
Applying these formulae to equations (26) gives

    f_i = x_i + B_{ijk} x_j x_k.

Suppose that successive iterations give much closer approximations to the root. Then

    |x^{(1)}| > |x^{(2)}| > . . . > |x^{(n+1)}|.

Expanding the numerator and denominator determinants of (25) by their bottom rows, the dominant terms yield

    x_i^{(n+2)} = B_{ijk} x_j^{(1)} x_k^{(n+1)} + higher terms.    (29)

As before J^{-1} = 1 + O(x), and if x is again some linear measure of the error we obtain

    x^{(n+2)} ~ B x^{(1)} x^{(n+1)}.    (30)

11. Comparison of convergence rates

The convergence rates have been shown to be approximately given by

    x^{(i+1)} = B (x^{(i)})²            Newton-Raphson    (31)
    x^{(i+n+1)} = B x^{(i)} x^{(i+n)}   Secant method     (32)

where x^{(i)} is some linear measure of the error of the ith iterate.
Put v_i = log (B x^{(i)}); then

    v_{i+1} = 2 v_i                Newton-Raphson    (33)
    v_{i+n+1} = v_i + v_{i+n}      Secant method     (34)

Consider first the Newton-Raphson case. Each iteration requires the evaluation of the Jacobian and so involves at least n + 1 function evaluations. For the purpose of comparison we will suppose only n + 1 are in fact required. From equation (33) the reduction factor is seen to be 2^{1/(n+1)} per function evaluation.
Consider now the difference equation (34) describing the secant method. Its characteristic equation is

    t^{n+1} - t^n - 1 = 0.    (35)

We wish to find the dominant root of this equation. Equation (35) has one real root > 1 and, if n is odd, one other real root between -1 and 0. It has no other real roots.
The roots of the equation are the same as the eigenvalues of the companion matrix (of order n + 1):

    | 1  0  0  . . .  0  1 |
    | 1  0  0  . . .  0  0 |
    | 0  1  0  . . .  0  0 |
    | .  .  .         .  . |
    | 0  0  0  . . .  1  0 |

This is a non-negative matrix. Hence by the weak form of a theorem of Frobenius (Gantmacher (1959), Varga (1962)), it has a non-negative real eigenvalue equal to its spectral radius. It follows that the only positive real root of equation (35) is equal to this eigenvalue, and hence no other root of equation (35) has larger modulus. It remains to prove that no other root of the equation has the same modulus.
It is obvious that the positive real root is a simple root. Let it be r. Then from (35)

    r^n (r - 1) = 1.    (36)

Suppose r e^{iθ} is also a root. Then

    r^n e^{inθ} (r e^{iθ} - 1) = 1.

Taking moduli,

    |r e^{iθ} - 1| = r - 1    from (36),

which is impossible unless e^{iθ} = 1. Hence there are no roots of modulus r other than the real positive root. The positive real root of (35) is therefore dominant. We will henceforth denote this root by t.
The solution of the difference equation (34) is hence dominated by v_i = v_0 t^i, and the reduction factor per iteration is t. A similar result has previously been obtained by Tornheim (1964) as an example of a multipoint iterative method.
Hence the ratio of the number of function evaluations required by Newton-Raphson to the number for the secant method for a given error reduction is

    R_n = (n + 1) log t / log 2.    (37)

The secant method may be said to be better by a factor R_n. Some values of R_n and t are shown in Table 1.
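The dominant root t of (35) and the ratio R_n of (37) are easily checked numerically; a short sketch (illustrative code, not part of the paper):

```python
import numpy as np

def rates(n):
    """Dominant root t of t^(n+1) - t^n - 1 = 0 (equation (35)) and the
    ratio R_n = (n + 1) log t / log 2 of equation (37)."""
    # np.roots computes the eigenvalues of the companion matrix of the
    # polynomial, exactly as in the argument of Section 11.
    roots = np.roots([1, -1] + [0] * (n - 1) + [-1])
    t = max(r.real for r in roots if abs(r.imag) < 1e-8)
    return t, (n + 1) * np.log(t) / np.log(2)

# n = 1 gives the golden ratio t = 1.618 and R_1 = 1.388, as in Table 1.
```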
Table 1
Rates of convergence

    n        t        R_n
    1      1.618     1.388
    2      1.466     1.654
    3      1.380     1.860
    4      1.325     2.028
    5      1.285     2.172
    6      1.255     2.297
    7      1.232     2.409
    8      1.213     2.509
    9      1.197     2.600
    10     1.184     2.684
    20     1.114     3.283
    50     1.058     4.179
    100    1.034     4.914

12. Experimental results

To test the above theoretical prediction, the equations

    f_i = x_i + B_{ijk} x_j x_k = 0

were solved by both Newton-Raphson (the derivatives being available analytically) and the secant method, using a Mercury digital computer.
The coefficients B_{ijk} were generated as random variables from a rectangular distribution (|B_{ijk}| < B_0, say). In each run the starting point was taken at random on the unit sphere. By varying B_0 the effective degree of non-linearity is altered.
Two minor difficulties arise. First, the length of the mantissa of the floating-point representation used (on Mercury) was 29 bits, and so cancellation errors precluded an improvement of more than about 8 decimal digits per iteration. With the rapid convergence of the two methods here under comparison, many successive valid iterations could not therefore be obtained. Secondly, it was difficult to know what initial value of the Jacobian to give the modified secant method, and how to penalize it for having such knowledge.
To obtain results of reasonable variance in the face of the first difficulty, several runs were carried out for each n and B_0. Each run was terminated when the ratio of successive values of f² = f_i f_i (the accuracy measure) exceeded 10^14 using Newton-Raphson. (In each case at the corresponding level with the secant method the ratio of successive values of f² was less than 10^14.) The iteration number of the last allowed iteration on Newton-Raphson was recorded, and the corresponding number of iterations to obtain the same degree of accuracy with the secant method was found by interpolation between the two iterations around that accuracy. The interpolation was carried out on the logarithms of the successive values of f². Linear interpolation was discarded since it would have given rise to consistently low values for the iteration. Interpolation was actually carried out on the assumption that the successive values of these logarithms fitted a curve of the form log (f²) = α t^i + β, where t is given by Table 1. This is, of course, a result predicted by the above theory.
Two series of runs were carried out with the secant method. One was with the initial Jacobian exact (at the starting point) so that the first iteration is the same as Newton-Raphson. The second was with the Jacobian equal to the unit matrix (which is, of course, the correct value at the solution).
In comparing the number of function evaluations required, each Newton-Raphson iteration is scored as n + 1 evaluations.
The following scores were evaluated for each run.

(1) Initial J exact: Since the first step is the same in each case, the count was from the end of that step. This score would be expected to be in favour of the secant method since the initial J is good.

(2) Initial J exact: As an alternative to the above, the iterations were counted from the beginning, but a penalty of n function evaluations was added to the score of the secant method to compensate for the exact J. This penalty is obviously too heavy, and the score is therefore in favour of Newton-Raphson.

(3) Initial J unit: The initial J is equal to the correct final value; this sort of situation may well arise in practice. Iterations of both methods were counted from the start. This score is in favour of the secant method, especially for large n, since J is not altered substantially before the solution is nearly reached.

The equations were solved with n = 2(1)7 and B_0 = 0.01 and 0.1. Five runs were carried out for each pair of values of n and B_0. The relative behaviour of the two methods was found to be essentially independent of B_0, although obviously more iterations were required for the higher value. The runs are therefore grouped only according to the value of n. The total scores over all runs for each value of n were accumulated, and the corresponding estimates of R_n were evaluated and are shown in Tables 2 and 3. Comparison with the predicted value of R_n shows good agreement in every case in view of the comments about the bias of the scores. Scores 1 and 2 straddle the predicted value in every case. Score 3 shows how rapid the secant method is under favourable conditions.
Table 2
Mean number of iterations required

    DIMENSION   NEWTON-     SECANT, INITIAL JACOBIAN
                RAPHSON     (a) EXACT    (b) UNIT
        2         2.5         3.72         3.47
        3         2.5         4.10         3.68
        4         2.7         4.82         4.14
        5         3.0         6.12         5.40
        6         2.8         5.95         4.61
        7         3.4         8.36         6.07

Table 3
Comparison of scores with predicted value

    DIMENSION   PREDICTED   SCORE 1   SCORE 2   SCORE 3
        2         1.654      1.656     1.312     2.163
        3         1.860      1.938     1.409     2.718
        4         2.028      2.222     1.530     3.264
        5         2.172      2.343     1.618     3.336
        6         2.297      2.544     1.640     4.250
        7         2.409      2.608     1.771     4.479

13. Stability

Bittner (1959) has shown that the secant method as defined by equations (7) and (20) has the following property. Denote the determinant of unit vectors

    Δ_k = [ δx^{(k-n+1)}/|δx^{(k-n+1)}|, δx^{(k-n+2)}/|δx^{(k-n+2)}|, . . ., δx^{(k)}/|δx^{(k)}| ].

Then given w such that 0 < w < 1, and provided that

    |Δ_k| > w  for all k > n,    (38)

there exists a neighbourhood of the solution within which convergence is assured.
The condition |Δ_k| > w is the sort of expedient that might be thought necessary for the reliable computation of J^{(k+1)} from equation (20). This sort of condition is not obviously required for the algorithm as expressed by equation (7) et seq. It might, however, be thought necessary to impose a condition of the form

    |(z^{(k)})^T δx^{(k)}| / (|z^{(k)}| |δx^{(k)}|) > p    (39)

to ensure the reliable computation of (9).
It will now be shown that conditions (38) and (39) are related, by considering the orthogonalization process used to determine the z^{(k)}.
Suppose that ξ_1, . . ., ξ_m are a set of m ≤ n unit vectors. Define the vectors e_1, . . ., e_m and the scalars C_2, . . ., C_m by the following equations:

    e_1 = ξ_1,
    C_i e_i = ξ_i - Σ_{j<i} (e_j^T ξ_i) e_j,   |e_i| = 1,   i = 2, . . ., m.

Then the vectors e_1, . . ., e_m are an orthonormal basis of the space spanned by the set ξ_1, . . ., ξ_m.
If we now put m = min (n, k) and

    ξ_i = δx^{(k-m+i)} / |δx^{(k-m+i)}|

in the above, we obtain e_m = z^{(k)}, and, writing C_k for C_m (to distinguish iterations),

    C_k = (z^{(k)})^T δx^{(k)} / |δx^{(k)}|,

so that condition (39) may be written

    |C_k| > p.    (40)

In particular, if k ≥ n, so that m = n, the determinant Δ_k is, apart from sign, the product of the scalars arising in the orthonormalization at iteration k:

    |Δ_k| = |C'_2 C'_3 . . . C'_n|,    (41)

where the dash distinguishes these scalars from the C_k of successive iterations. Note that

    |C'_i| ≤ 1   and   |C'_i| ≥ |C_{k-n+i}|,    (42)

the first since each ξ_i is a unit vector (see Todd (1962)), and the latter being easily seen by considering the geometric properties of the system: each scalar is the distance of a unit vector from the space spanned by the steps against which it is orthogonalized, and at iteration k - n + i that space is at least as large.
Then
(i) If |Δ_k| ≥ w, then since each |C'_i| ≤ 1 and C'_n = C_k,

    |C_k| ≥ |C'_2 . . . C'_n| = |Δ_k| ≥ w.

(ii) If |C_k| ≥ p for all k ≤ k_0, say, then from (41) and (42)

    |Δ_k| ≥ |C_{k-n+2} . . . C_k| ≥ p^{n-1}.    (43)

It has thus been shown that conditions (38) and (39), although not identical, are nevertheless related. Consequently either of the two tests may be used to ensure convergence under suitable conditions.
As a practical consequence of the above, one or both of the tests are applied at each iteration and the proposed step δx^{(k)} rejected if the test fails. A satisfactory alternative is to set δx^{(k)} parallel to z^{(k)} (in which case the tests will evidently be satisfied). The magnitude of the alternative step is still arbitrary; a suitable value might be that of the rejected vector.
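The orthonormalization and the scalars C_i may be sketched as follows (an illustration with my own naming; it assumes the input vectors are linearly independent):

```python
import numpy as np

def orthonormalize(xi):
    """Gram-Schmidt on unit vectors xi_1 .. xi_m.  Returns the orthonormal
    basis e_1 .. e_m and the scalars C_2 .. C_m (each |C_i| <= 1).  With
    xi_i = dx^(k-m+i)/|dx^(k-m+i)| and m = min(n, k), the last basis vector
    is the vector z^(k) of Section 4 and the last scalar is C_k."""
    e, C = [xi[0]], []
    for v in xi[1:]:
        w = v - sum((u @ v) * u for u in e)   # component orthogonal to e_1..e_{i-1}
        c = np.linalg.norm(w)                 # C_i e_i = w with |e_i| = 1
        e.append(w / c)                       # (assumes the xi are independent)
        C.append(c)
    return e, C

# For m = n unit vectors, the determinant [xi_1, ..., xi_n] equals the
# product of the scalars apart from sign, as used in (41).
rng = np.random.default_rng(2)
xi = [v / np.linalg.norm(v) for v in rng.normal(size=(3, 3))]
e, C = orthonormalize(xi)
```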
A more general method, which has a much larger domain of convergence, may be formed by imposing a success criterion. The usual criterion employed is the minimization of f².
It is ensured that each step gives rise to an improvement (i.e. reduces f²) by multiplying the step by a suitable scalar in those cases where the direct application of the algorithm does not give rise to an improvement. The imposition of such a criterion ensures convergence over a large domain but does not impair the final convergence rate.
A generalization of the algorithm as defined by equations (7) et seq. is now needed for the case in which δx^{(i)} is not prescribed by equation (7). An argument similar to that of Section 3 (the secant condition (12) now gives D^{(i)} δx^{(i)} = f^{(i+1)} - f^{(i)} - J^{(i)} δx^{(i)}) leads to the replacement of (9) by

    D^{(i)} = (f^{(i+1)} - f^{(i)} - J^{(i)} δx^{(i)}) (z^{(i)})^T / ((z^{(i)})^T δx^{(i)}),    (44)

which reduces to (9) in the usual case. In practice the use of (44) for the calculation of D^{(i)} in every case is recommended.
The values of C_k and Δ_k were monitored for all the experiments of Section 12. From these it would seem that the simplest procedure likely to give consistent results is to test C_k only, and reject the step if |C_k| < p_0. p_0 might be 10^{-4}. Larger values of p_0 may delay convergence considerably.
Acknowledgements
The author wishes to express his thanks to Imperial
Chemical Industries Limited for permission to publish
this paper, to his colleagues Mr. I. Gray and Dr. H. H.
Robertson for their constant advice and encouragement,
and to the referee for his constructive criticisms.
References
BITTNER, L. (1959). "Eine Verallgemeinerung des Sekantenverfahrens (regula falsi) zur näherungsweisen Berechnung der Nullstellen eines nichtlinearen Gleichungssystems," Wissen. Zeit. der Technischen Hochschule Dresden, Vol. 9, p. 325.
GANTMACHER, F. R. (1959). Applications of the Theory of Matrices, New York: Interscience Publishers Inc.
TODD, J. (Ed.) (1962). A Survey of Numerical Analysis, New York: McGraw-Hill Book Co.
TORNHEIM, L. (1964). "Convergence of Multipoint Iterative Methods," J. Assoc. Comp. Mach., Vol. 11, p. 210.
VARGA, R. S. (1962). Matrix Iterative Analysis, London: Prentice-Hall International.
WOLFE, P. (1959). "The Secant Method for Simultaneous Non-linear Equations," Comm. Assoc. Comp. Mach., Vol. 2, p. 12.
To the Editor,
The Computer Journal.

"An impossible program"

Dear Sir,
I do not know whose leg Mr. Strachey is pulling (this Journal, January 1965, p. 313); but if each letter in refutation of his proof adds to some private tally for his amusement, then I am happy to amuse him. May I offer three independent refutations?

(i) He defines a function T[R]. Any subsequent "proof" that T cannot exist is then idle; the function exists by definition.

(ii) If T does not exist, then P does not exist, since T is essentially involved in the statement of P. So P is not a program. So P is not an acceptable argument for T.

(iii) If one accepts Mr. Strachey's reasoning up to the point "In each case T[P] has exactly the wrong value", the appropriate deduction is not "this contradiction shows that the function T cannot exist" but "this contradiction shows that either the function T does not exist or that P is not a program". Since the non-existence of T itself implies that P is not a program, the most that can be concluded is that in any event P is not a program.

I am, of course, being careful not to claim that Mr. Strachey's initial assertion (that it is impossible to write a program which can examine any other program and tell, in every case, if it will terminate or get into a closed loop when it is run) is false. But what is manifest is that his proof of the far stronger assertion (that T[R] does not exist) is invalid: both in its final step (see (iii) above) and in its assumption that a set of statements in CPL—or any other language—necessarily constitutes a program. (If anybody doubts my counter-assertion that P is not a program, let him try compiling P in—any—machine language!)

Yours faithfully,
H. G. APSIMON.
22 Stafford Court,
London, W.8.
18 February 1965.