MT5241 Computational Fluid Dynamics I
J.S.B. Gajjar
September 2004
1 Introduction
In this course we will be studying computational fluid dynamics, or CFD. CFD is concerned with the study of fluid flow problems using computational techniques, as opposed to analytical or experimental methods. The modelling of fluid flow leads to the solution of unsteady, nonlinear, partial differential equations, and there are only a handful of known 'exact' analytical solutions. Even in the simplest of geometries the solution of these equations is quite challenging. In this CFD course we will aim to study the different techniques which are used to solve the different types of equations which arise. As we will see, to be good at CFD one has to be fluent in many different subject areas such as numerical analysis, programming and fluid mechanics. This introductory course will concentrate mainly on finite-difference methods. In CFD II you will meet the popular finite-volume method. In the finite-difference method, the solution is sought on a grid of points, and we deal with function values at these nodal points. In a finite-element environment, on the other hand, the region is split into various sub-regions, and the solution in these sub-regions is approximated by basis functions.
1.1 Errors
To start with we need to discuss errors. Typically numerical computation of any
sort will generate errors and it is one of the tasks of the CFD practitioner to be
able to assess and be aware of the errors arising from numerical computation.
The different types of errors can be categorised into the following main types:
• Roundoff errors.
• Errors in modelling.
• Programming errors.
• Truncation and discretization errors.
Roundoff errors.
These arise when a computer is used for doing numerical calculations. Some typical examples include the inexact representation of numbers such as π and √2.
Roundoff and chopping errors arise from the way numbers are stored on the
computer, and in the way arithmetic operations are carried out. Whereas most
of the time the way numbers are stored on a computer is not under our control,
the way certain expressions are computed definitely is. A simple example will
illustrate the point. Consider the computation of the roots of a quadratic equation
ax² + bx + c = 0 with the expressions

x₁ = (−b + √(b² − 4ac)) / (2a),   x₂ = (−b − √(b² − 4ac)) / (2a).
Let us take a = c = 1, b = −28. Then

x₁ = 14 + √195,   x₂ = 14 − √195.

Now to 5 significant figures we have √195 = 13.964. Hence x₁ = 27.964 and x₂ = 0.036. So what can we say about the error? We need some way to quantify errors. There are two useful measures for this.
Absolute error
Suppose that φ* is an approximation to a quantity φ. Then the absolute error is defined by |φ* − φ|.
Relative error
Another measure is the relative error and this is defined by |φ* − φ|/|φ| if φ ≠ 0.
For our example above we see that |x₁* − x₁| = 2.4 × 10⁻⁴ and |x₂* − x₂| = 2.4 × 10⁻⁴, which look small. On the other hand the relative errors are |x₁* − x₁|/|x₁| = 8.6 × 10⁻⁶ and |x₂* − x₂|/|x₂| = 6.7 × 10⁻³. Thus the accuracy in computing x₂ is far less than in computing x₁. On the other hand if we compute x₂ via

x₂ = 14 − √195 = 1 / (14 + √195),

we obtain x₂ = 0.03576 with a much smaller absolute error of 3.4 × 10⁻⁷ and a relative error of 9.6 × 10⁻⁶.
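The cancellation above is easy to reproduce. The following short Python sketch is our own illustration (the helper round_sig and the way limited precision is modelled are choices made here, so the figures will only roughly match the 5-significant-figure arithmetic quoted above):

```python
import math

def round_sig(x, n=5):
    """Round x to n significant figures (a crude model of limited-precision arithmetic)."""
    return float(f"{x:.{n}g}")

s = round_sig(math.sqrt(195.0))        # 13.964, as in the text
x1 = 14.0 + s                          # well conditioned
x2_sub = 14.0 - s                      # subtraction of nearly equal numbers
x2_alt = round_sig(1.0 / (14.0 + s))   # the rearranged formula

x1_exact = 14.0 + math.sqrt(195.0)
x2_exact = 14.0 - math.sqrt(195.0)

for name, approx, exact in [("x1", x1, x1_exact),
                            ("x2 (subtraction)", x2_sub, x2_exact),
                            ("x2 (rearranged)", x2_alt, x2_exact)]:
    print(f"{name:18s} abs err = {abs(approx - exact):.2e}"
          f"  rel err = {abs(approx - exact)/abs(exact):.2e}")
```

The subtractive form loses several figures in x₂ while the rearranged form does not, which is the point made above.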
Errors in numerical modelling.
These arise for example when the equations being solved are not the proper equations for the flow problem in question. An example is using the Euler equations for calculating the solution of flows where viscosity effects are important. No matter how accurately the solution has been computed, it may not be close to the real physical solution because viscous effects have been neglected throughout the computation.
Programming errors, and bugs.
These are errors entirely under the control of the programmer. To eliminate these requires careful testing of the code and logic, as well as comparison with previous work. Even then, for a problem for which there may be no previous work to compare with, one has to carry out numerous self-consistency checks, with further analysis as necessary.
Truncation and discretization errors.
These errors arise when we take the continuum model and replace it with a discrete approximation. For example suppose we wish to solve

d²U/dx² = f(x).

Using Taylor series we can approximate the second derivative term by

(U(x_{i+1}) − 2U(x_i) + U(x_{i−1})) / h²,

where we have taken a uniform grid with spacing h, say, and node points x_i. As far as the approximation of the equation is concerned we will have a truncation error given by

τ(x_i) = (U(x_{i+1}) − 2U(x_i) + U(x_{i−1})) / h² − f(x_i) = (h²/12) d⁴U(x_i)/dx⁴ + ....
Even though the discrete equations may be solved to high accuracy, there will be
still an error of O(h2 ) arising from the discretization of the equations. Of course
with more points, we would expect the error to diminish.
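A quick numerical check of this behaviour is sketched below; the test function u(x) = sin x is an arbitrary choice for illustration and not part of the notes. The ratio error/h² settles to a constant, consistent with the O(h²) truncation error:

```python
import numpy as np

# Test u''(x) ~ (u(x+h) - 2u(x) + u(x-h)) / h^2 on u(x) = sin(x),
# whose exact second derivative is -sin(x).
x0 = 1.0
for h in [0.1, 0.05, 0.025, 0.0125]:
    approx = (np.sin(x0 + h) - 2*np.sin(x0) + np.sin(x0 - h)) / h**2
    err = abs(approx - (-np.sin(x0)))
    print(f"h = {h:7.4f}   error = {err:.3e}   error/h^2 = {err/h**2:.4f}")
# error/h^2 is roughly constant (about sin(x0)/12), confirming the O(h^2) behaviour.
```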
2 Initial value problems
Here we will look at the solution of ordinary differential equations of the type, say

dy/dx = f(x, y),   a ≤ x ≤ b,   (1)

subject to an initial condition

y(a) = α.   (2)

The methods that we will use generalise readily to systems of differential equations of the type

dY/dx = F(x, Y),   a ≤ x ≤ b,   (3)

where

Y = (y₁(x), y₂(x), ..., y_N(x))^T,   F = (f₁(x, Y), f₂(x, Y), ..., f_N(x, Y))^T,

with initial data

Y(a) = α,   (4)

say, where α = (α₁, α₂, ..., α_N)^T.
One question which arises immediately is, what about second and third order differential equations? Consider for example

g''' + g g'' + (1/2)(1 − g'²) = 0,   g(0) = g'(0) = g'(∞) − 1 = 0,   (5)

which arises in fluid applications. The answer to this is that if we take any differential equation, we can always write it as a system of first order equations.
Example
In (5) let

Y₁ = g(x),   Y₂ = g'(x),   Y₃ = g''(x),

then (5) becomes

dY/dx = (Y₂, Y₃, −Y₁Y₃ − (1/2)(1 − Y₂²))^T = F.   (6)
The boundary conditions in (5) are not of the form as in (4) because we have
conditions given at two points x = 0 and x = ∞. We will see how to deal with
this later.
Consider equation (1). There are various mathematical results that we should
be aware of, although we will not go too deeply into the implications or theory
behind some of these.
Suppose we define D to be the domain
D = {(x, y) | a ≤ x ≤ b, −∞ < y < ∞}
and f (x, y) is continuous on D. If f (x, y) satisfies a Lipschitz condition on D
then (1) has a unique solution for a ≤ x ≤ b. Recall f (x, y) satisfies a Lipschitz
condition on D means that there exists a constant L > 0 (called the Lipschitz
constant) such that
|f (x1 , y1 ) − f (x2 , y2 )| ≤ L|y1 − y2 |
whenever (x1 , y1 ), (x2 , y2 ) belong to D.
2.1 Euler's Method
This is the simplest of techniques for the numerical solution of (1). It is good for
proving various results, but it is rarely used in practice because of far superior
methods.
For simplicity define an equally spaced mesh

x_j = a + jh,   j = 0, ..., N,

where h = (b − a)/N is called the step size, so that x₀ = a and x_N = b.
We can derive Euler's method as follows. Suppose y(x) is the unique solution to (1), and twice differentiable. Then by Taylor's theorem we have

y(x_{i+1}) = y(x_i + h) = y(x_i) + h y'(x_i) + (h²/2) y''(ξ),   (7)

where x_i ≤ ξ ≤ x_{i+1}. But from the differential equation y'(x_i) = f(x_i, y_i), where y_i = y(x_i). This suggests the scheme

w₀ = α,
w_{i+1} = w_i + h f(x_i, w_i),   i = 0, 1, ..., N − 1,   (8)

for calculating the w_i, which will be our approximate solution to the equation. This is called Euler's method.
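A minimal Python sketch of the scheme (8) is given below; the test problem y' = −2y, y(0) = 1, with exact solution e^{−2x}, is our own choice for illustration:

```python
import numpy as np

def euler(f, a, b, alpha, N):
    """Euler's method (8) for y' = f(x, y), y(a) = alpha, with N equal steps."""
    h = (b - a) / N
    x = a + h * np.arange(N + 1)
    w = np.empty(N + 1)
    w[0] = alpha
    for i in range(N):
        w[i + 1] = w[i] + h * f(x[i], w[i])
    return x, w

# Illustrative test problem (not from the notes): y' = -2y, y(0) = 1.
x, w = euler(lambda x, y: -2.0 * y, 0.0, 1.0, 1.0, 20)
print("max error:", np.abs(w - np.exp(-2.0 * x)).max())
```

Halving h roughly halves the error, consistent with the O(h) truncation error derived next.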
2.1.1 Truncation error for Euler's method
Suppose that y_i = y(x_i) is the exact solution of (1) at x = x_i. Then the truncation error is defined by

τ_{i+1}(h) = [y_{i+1} − (y_i + h f(x_i, y_i))] / h = (y_{i+1} − y_i)/h − f(x_i, y_i),   (9)

for i = 0, 1, ..., N − 1. From earlier (7) we find that

τ_{i+1}(h) = (h/2) y''(ξ_i)

for some ξ_i in (x_i, x_{i+1}). So if y''(x) is bounded by a constant M in (a, b) then

|τ_{i+1}(h)| ≤ (h/2) M,
and thus we see that the truncation error for Euler's method is O(h). In general if τ_{i+1} = O(h^p) we say that the method is of order p. In principle, if h decreases we should be able to achieve greater accuracy, although in practice roundoff error limits the smallest size of h that we can take.
2.2 Higher order methods
2.2.1 Modified Euler
The modified Euler method is given by

w₀ = α,
k₁ = h f(x_i, w_i),
w_{i+1} = w_i + (h/2)[f(x_i, w_i) + f(x_{i+1}, w_{i+1})],   i = 0, 1, ..., N − 1.

This has truncation error O(h²). Sometimes this is also called a Runge-Kutta method of order 2.
2.2.2 Runge-Kutta method of order 4
One of the most common Runge-Kutta methods of order 4 is given by

w₀ = α,
k₁ = h f(x_i, w_i),
k₂ = h f(x_i + h/2, w_i + k₁/2),
k₃ = h f(x_i + h/2, w_i + k₂/2),
k₄ = h f(x_{i+1}, w_i + k₃),
w_{i+1} = w_i + (1/6)(k₁ + 2k₂ + 2k₃ + k₄),   (10)

for i = 0, 1, ..., N − 1.
Note that this requires 4 function evaluations per step.
All these methods generalise to a system of first order equations of the form
(3). Thus for instance the RK(4) method (10) above becomes
w₀ = α,
k₁ = h F(x_i, w_i),
k₂ = h F(x_i + h/2, w_i + k₁/2),
k₃ = h F(x_i + h/2, w_i + k₂/2),
k₄ = h F(x_{i+1}, w_i + k₃),
w_{i+1} = w_i + (1/6)(k₁ + 2k₂ + 2k₃ + k₄),

for i = 0, 1, ..., N − 1.
In the above we have taken the step size to be constant. But there is no reason to do this and the step size can be adjusted dynamically. This forms the basis of the so-called Runge-Kutta-Fehlberg methods.
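The following sketch implements the classical RK(4) step (10); it is written so that it also works for a system of first order equations, and the test problem is again our own choice for illustration:

```python
import numpy as np

def rk4(F, a, b, alpha, N):
    """Classical RK(4) scheme (10); works for scalar or vector F and alpha."""
    h = (b - a) / N
    x = a + h * np.arange(N + 1)
    w = np.empty((N + 1,) + np.shape(alpha))
    w[0] = alpha
    for i in range(N):
        k1 = h * F(x[i], w[i])
        k2 = h * F(x[i] + h/2, w[i] + k1/2)
        k3 = h * F(x[i] + h/2, w[i] + k2/2)
        k4 = h * F(x[i] + h,   w[i] + k3)
        w[i + 1] = w[i] + (k1 + 2*k2 + 2*k3 + k4) / 6
    return x, w

# Same illustrative test problem as before: y' = -2y, y(0) = 1.
x, w = rk4(lambda x, y: -2.0 * y, 0.0, 1.0, 1.0, 20)
print("max error:", np.abs(w - np.exp(-2.0 * x)).max())
```

The error is now dramatically smaller than Euler's method on the same mesh, at the cost of four function evaluations per step.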
2.3 m-step multi-step methods
The methods discussed above are called one-step methods. Methods that use the approximate values at more than one previous mesh point are called multi-step methods. There are two distinct types worth mentioning. These are of the form

w_{i+1} = c_{m−1} w_i + c_{m−2} w_{i−1} + ... + c₀ w_{i+1−m}
          + h[b_m f(x_{i+1}, w_{i+1}) + b_{m−1} f(x_i, w_i) + ... + b₀ f(x_{i+1−m}, w_{i+1−m})].   (11)

If b_m = 0 (so there is no term with w_{i+1} on the right hand side of (11)), the method is explicit.
If b_m ≠ 0 we have an implicit method.
2.3.1 Adams-Bashforth 4th order method (explicit)
Here we have

w₀ = α,   w₁ = α₁,   w₂ = α₂,   w₃ = α₃,

where these values are obtained using other methods such as RK(4) for instance. Then for i = 3, 4, ..., N − 1 we use

w_{i+1} = w_i + (h/24)[55f(x_i, w_i) − 59f(x_{i−1}, w_{i−1}) + 37f(x_{i−2}, w_{i−2}) − 9f(x_{i−3}, w_{i−3})].
2.3.2 Adams-Moulton 4th order method (implicit)
Here we have

w₀ = α,   w₁ = α₁,   w₂ = α₂,

where these values are obtained using other methods such as RK(4) for instance. Then

w_{i+1} = w_i + (h/24)[9f(x_{i+1}, w_{i+1}) + 19f(x_i, w_i) − 5f(x_{i−1}, w_{i−1}) + f(x_{i−2}, w_{i−2})],   for i = 2, 3, ..., N − 1.   (12)
Notice the presence of the term containing wi+1 on the right hand side of (12).
The Adams-Bashforth and Adams-Moulton methods can be derived by integrating the differential equation

dy/dx = f(x, y)

as

y(x_{i+1}) − y(x_i) = ∫_{x_i}^{x_{i+1}} f(s, y) ds,   (13)

and replacing f(x, y) in the integral by an interpolating polynomial passing through previous points.
The implicit methods are generally more stable, and have less roundoff errors
than the explicit methods. But here we need to bring in the ideas of stability of
a method.
2.3.3 Predictor-corrector methods
In practice the multi-step methods are rarely used as described above. Instead they are frequently used as predictor-corrector methods. First we predict a value using an explicit method, and then correct it using an implicit method.
4th order predictor-corrector

w^{(0)}_{i+1} = w_i + (h/24)[55f(x_i, w_i) − 59f(x_{i−1}, w_{i−1}) + 37f(x_{i−2}, w_{i−2}) − 9f(x_{i−3}, w_{i−3})],   (14)

w^{(k)}_{i+1} = w_i + (h/24)[9f(x_{i+1}, w^{(k−1)}_{i+1}) + 19f(x_i, w_i) − 5f(x_{i−1}, w_{i−1}) + f(x_{i−2}, w_{i−2})].   (15)

Here (14) is used to predict a value for w_{i+1} and (15) is used to iteratively correct it, as in the sketch below.
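A sketch of the 4th order predictor-corrector (14)-(15) might look as follows, with the starting values generated by RK(4) as suggested above; the test problem and the number of corrector iterations are assumptions made for illustration:

```python
import numpy as np

def abm4(f, a, b, alpha, N, corrector_iters=1):
    """Adams-Bashforth predictor (14) with Adams-Moulton corrector (15).
    The first three steps are generated with RK(4)."""
    h = (b - a) / N
    x = a + h * np.arange(N + 1)
    w = np.empty(N + 1)
    w[0] = alpha
    for i in range(3):                                   # start-up values w1, w2, w3
        k1 = h * f(x[i], w[i])
        k2 = h * f(x[i] + h/2, w[i] + k1/2)
        k3 = h * f(x[i] + h/2, w[i] + k2/2)
        k4 = h * f(x[i] + h,   w[i] + k3)
        w[i + 1] = w[i] + (k1 + 2*k2 + 2*k3 + k4) / 6
    for i in range(3, N):
        fi, fm1, fm2, fm3 = (f(x[i], w[i]), f(x[i-1], w[i-1]),
                             f(x[i-2], w[i-2]), f(x[i-3], w[i-3]))
        wp = w[i] + h/24 * (55*fi - 59*fm1 + 37*fm2 - 9*fm3)     # predictor (14)
        for _ in range(corrector_iters):                          # corrector (15)
            wp = w[i] + h/24 * (9*f(x[i+1], wp) + 19*fi - 5*fm1 + fm2)
        w[i + 1] = wp
    return x, w

x, w = abm4(lambda x, y: -2.0 * y, 0.0, 1.0, 1.0, 40)
print("max error:", np.abs(w - np.exp(-2.0 * x)).max())
```

Only two new function evaluations are needed per step (one for the predictor, one per corrector pass), in contrast with the four evaluations of RK(4).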
2.4 Improving accuracy with extrapolation
Suppose that we use a method with truncation error of O(h^m) to compute an approximation w_i. We can use Richardson extrapolation to get an approximation with greater accuracy. First generate an approximation w_i^{(1)}, say, with step size h and an approximation w_i^{(2)} with step size 2h. Then we can write

w_i^{(1)} = y_i + E h^m + E₁ h^{m+1} + ...,

and

w_i^{(2)} = y_i + E (2h)^m + E₁ (2h)^{m+1} + ....

Then we can eliminate the E term to get

2^m w_i^{(1)} − w_i^{(2)} = (2^m − 1) y_i + O(h^{m+1}).

Thus

w_i* = (2^m w_i^{(1)} − w_i^{(2)}) / (2^m − 1)

is a more accurate approximation to the solution than w_i^{(1)} or w_i^{(2)}.
For a 4th order Runge-Kutta method the above gives

w_i* = (16 w_i^{(1)} − w_i^{(2)}) / 15.
3 Stability
We have introduced a number of different methods. How do we select which
one to use? In practice most methods should work reasonably well on standard
problems. However certain types of problems (stiff problems) can cause difficulty
and care needs to be exercised in the choice of the method. Typically boundary
layer type problems (say with a small parameter multiplying a highest derivative)
are examples of stiff problems. This is where one needs to be aware of the stability
properties of a particular method.
3.1 Consistency
A method is said to be consistent if the local truncation error tends to zero as the step size → 0, ie

lim_{h→0} max_i |τ_i(h)| = 0.

3.2 Convergence
A method is said to be convergent with respect to the equation it approximates if

lim_{h→0} max_i |w_i − y(x_i)| = 0,
where y(x) is the exact solution and wi an approximation produced by the
method.
3.3
Stability
If we consider an m-step method

w₀ = α₀, w₁ = α₁, ..., w_{m−1} = α_{m−1},
w_{i+1} = a_{m−1} w_i + a_{m−2} w_{i−1} + ... + a₀ w_{i+1−m} + h[F(x_i, w_{i+1}, w_i, ..., w_{i+1−m})],   (16)

then, ignoring the F term, the homogeneous part is just a difference equation. The stability is thus connected with the stability of this difference equation and hence the roots of the characteristic polynomial

λ^m − a_{m−1} λ^{m−1} − ... − a₁ λ − a₀ = 0.
Why? Well consider (1) with f(x, y) = 0. This has the solution y(x) = α. Thus the difference equation has to produce the same solution, ie w_n = α. Now consider

w_{i+1} = a_{m−1} w_i + a_{m−2} w_{i−1} + ... + a₀ w_{i+1−m}.   (17)

If we look for solutions of the form w_n = λ^n then this gives the characteristic polynomial equation

λ^m − a_{m−1} λ^{m−1} − ... − a₁ λ − a₀ = 0.   (18)

Suppose λ₁, ..., λ_m are distinct roots of (18). Then we can write

w_n = Σ_{i=1}^{m} c_i λ_i^n.   (19)
Since wn = α is a solution, the difference equation (17) gives
α − am−1 α − ... − a0 α = 0,
or
α(1 − am−1 − ... − a0 ) = 0.
This shows that λ = 1 is a root, and we may take c₁ = α in (19). Thus (19) may be written as

w_n = α + Σ_{i=2}^{m} c_i λ_i^n.   (20)
In the absence of round-off error all the ci in (20) would be zero. If |λi | ≤ 1 then
the error due to roundoff will not grow. The argument used above generalises
to multiple roots λi of the characteristic polynomial. Thus if the roots λi , (i =
1, .., m) of the characteristic polynomial satisfy |λi | ≤ 1, then the method is
stable.
Example
Consider the 4th order Adams-Bashforth method

w_{i+1} = w_i + (h/24)[55f(x_i, w_i) − 59f(x_{i−1}, w_{i−1}) + 37f(x_{i−2}, w_{i−2}) − 9f(x_{i−3}, w_{i−3})].

This has the characteristic equation

λ⁴ − λ³ = 0,

giving the roots as λ = 1, 0, 0, 0. Thus this is stable according to our definition of stability above.
It can be proven that if the difference method is consistent with
the differential equation, then the method is stable if and only if the
method is convergent.
Is it enough just to have stability as defined above? Consider the solution of

dy/dx = −30y,   y(0) = 1/3.

The RK(4) method, although stable, has difficulty in computing an accurate solution of this problem. This means that we need something more than just the idea of stability defined above.
3.4 Absolute stability
Consider

dy/dx = ky,   y(0) = α,   k < 0.   (21)

The exact solution of this is y(x) = αe^{kx}. If we take our one-step method and
apply it to this equation we obtain
wi+1 = Q(hk)wi .
Similarly for a multi-step of the type used earlier (11), when applied to the test
equation (21) gives rise to the difference equation
w_{i+1} = a_{m−1} w_i + a_{m−2} w_{i−1} + ... + a₀ w_{i+1−m} + h[b_m k w_{i+1} + b_{m−1} k w_i + ... + b₀ k w_{i+1−m}].   (22)

Thus if we seek solutions of the form w_i = z^i this will give rise to the characteristic polynomial equation

Q(z, hk) = 0,

where

Q(z, hk) = (1 − hk b_m) z^m − (a_{m−1} + hk b_{m−1}) z^{m−1} − ... − (a₀ + hk b₀).
The region R of absolute stability for a one-step method is defined as the
region in the complex plane R= {hk ∈ C, |Q(hk)| < 1}, and for a multi-step
method R = {hk ∈ C, |βj | < 1}, where βj is a root of Q(z, hk) = 0.
A numerical method is A-stable if R contains the entire left half plane.
In practice the above conditions place a limit on the size of the step size which
we can use.
Consider the modified Euler method

w₀ = α,
k₁ = h f(x_i, w_i),
w_{i+1} = w_i + (h/2)[f(x_i, w_i) + f(x_{i+1}, w_{i+1})],   i = 0, 1, ..., N − 1.

This is an A-stable method.
4 Boundary Value Problems
4.1 Shooting Methods
Consider the differential equation

d²y/dx² + k dy/dx + x y = 0,   y(0) = 0,   y(1) = 1.   (23)

This is an example of a boundary value problem because conditions have to be satisfied at both ends. If we write this as a system of first order equations we have

Y₁ = y,   Y₂ = dy/dx,

dY₁/dx = Y₂,
dY₂/dx = −kY₂ − xY₁.   (24)
The boundary conditions give

Y₁(0) = 0,   Y₁(1) = 1.

We do not know the value of Y₂(0). If we knew this value then we could use our standard integrator to obtain a solution. So how do we handle this situation? Suppose we guess the value Y₂(0) = g, say. Then we can integrate (24) with the initial condition

Y(0) = (Y₁(0), Y₂(0))^T = (0, g)^T,

using any integrator, eg RK(4). This will give us

Y(1) = (Y₁(1), Y₂(1))^T = (β₁, β₂)^T,

where, because we guessed the value g = Y₂(0), β₁ will not necessarily satisfy the required condition Y₁(1) = 1 in (23). So now we need to go through some kind of iterative process to try and get the correct value of g such that the required condition at x = 1 is satisfied.
To do this define
φ(g) = Y1 (1; g) − 1,
(25)
where the semi-colon notation is used to indicate an additional parametric dependence on g. We want to find the value of g such that φ(g) = 0. This gives rise
to the idea of a shooting method. We adjust the value of g to force φ(g) = 0. We
can also think of this as finding the root of the equation φ(g) = 0.
4.2 How do we adjust the value of g?
To solve φ(g) = 0 we can use any of the root finding methods we know, eg secant method, Newton's method, bisection, etc.
4.3 Newton's Method, Secant Method
Suppose that we have a guess g̃ and we seek a correction dg such that φ(g̃ + dg) = 0. By Taylor expansion we have

φ(g̃ + dg) = φ(g̃) + φ'(g̃) dg + O(dg²).

This suggests that we take

dg = −φ(g̃)/φ'(g̃),

and hence a new value for g is g̃ + dg. Thus this gives rise to a sequence of iterations for n = 0, 1, ...,

g_{n+1} = g_n − φ(g_n)/φ'(g_n),   (26)

where we start the iteration with a suitable guess g₀. Provided that g₀ is close to the true value, Newton's method converges very fast, ie quadratically. Difficulties with convergence arise when φ'(g) is close to zero.
Now in (26) we are required to find φ'(g_n). How can we do this? One way is to estimate φ'(g_n) by

φ'(g_n) ≈ [φ(g_n) − φ(g_{n−1})] / (g_n − g_{n−1}).

This gives

g_{n+1} = g_n − φ(g_n)(g_n − g_{n−1}) / [φ(g_n) − φ(g_{n−1})],   (27)

which is known as the secant method. We need two starting values for g and then we can use (27) to generate the remaining values.
Consider (24) again,

dY₁/dx = Y₂,   dY₂/dx = −kY₂ − xY₁,

with Y(0) = (0, g)^T. Now from (25)

φ'(g) = ∂Y₁/∂g (1; g),
so this suggests differentiating the original system of equations and boundary conditions with respect to g to get

d/dx (∂Y₁/∂g) = ∂Y₂/∂g,
d/dx (∂Y₂/∂g) = −k ∂Y₂/∂g − x ∂Y₁/∂g,

with

(∂Y/∂g)(x = 0) = (0, 1)^T.   (28)

The system (28) defines another initial value problem with given initial conditions. Thus the procedure is as follows. First integrate (24) with initial conditions Y(0) = (0, g)^T and with a suitable initial guess g. Then, either simultaneously at each step, or after a step, solve (28). After integrating over all the steps we will have at x = 1 the values Y(x = 1) from (24) and (∂Y/∂g)(x = 1) from (28). Note that from (25)

φ'(g) = ∂Y₁/∂g (1; g).

From the solution of (28) we can extract ∂Y₁/∂g (x = 1) and hence substitute into (26) to update g. This forms the basis of Newton's method combined with shooting, to solve boundary value problems.
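A sketch of this Newton-shooting procedure for the example (23)-(24) is given below. The value k = 1 is assumed purely for illustration (the notes leave k unspecified), and the sensitivity system (28) is integrated together with (24) as a single augmented state using RK(4):

```python
import numpy as np

k = 1.0   # the constant k in (23); its value is not fixed in the notes, 1.0 is assumed here

def F(x, Y):
    """System (24) together with its g-derivative (28): Y = (Y1, Y2, dY1/dg, dY2/dg)."""
    Y1, Y2, Y3, Y4 = Y
    return np.array([Y2, -k*Y2 - x*Y1, Y4, -k*Y4 - x*Y3])

def integrate(g, N=200):
    """RK(4) integration of the augmented system from x = 0 to 1 with Y2(0) = g."""
    h = 1.0 / N
    x, Y = 0.0, np.array([0.0, g, 0.0, 1.0])
    for _ in range(N):
        k1 = h * F(x, Y)
        k2 = h * F(x + h/2, Y + k1/2)
        k3 = h * F(x + h/2, Y + k2/2)
        k4 = h * F(x + h,   Y + k3)
        Y += (k1 + 2*k2 + 2*k3 + k4) / 6
        x += h
    return Y

# Newton iteration (26) on phi(g) = Y1(1; g) - 1, using phi'(g) = dY1/dg at x = 1 from (28).
g = 1.0                                    # initial guess
for it in range(10):
    Y = integrate(g)
    phi, dphi = Y[0] - 1.0, Y[2]
    print(f"iter {it}: g = {g:.8f}, phi(g) = {phi:.2e}")
    if abs(phi) < 1e-12:
        break
    g -= phi / dphi
```

Because (23) is linear in y, φ(g) is linear in g and the Newton update converges in essentially one step; for nonlinear problems several iterations would be needed.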
It is clear how one can solve (28) after solving (24) at each step. This can also
be done simultaneously by working with an augmented system where we define
Y₁ = y,   Y₂ = dy/dx,   Y₃ = ∂Y₁/∂g = ∂y/∂g,   Y₄ = ∂Y₂/∂g,

and then

d/dx (Y₁, Y₂, Y₃, Y₄)^T = (Y₂, −kY₂ − xY₁, Y₄, −kY₄ − xY₃)^T,
Y(0) = (Y₁(0), Y₂(0), Y₃(0), Y₄(0))^T = (0, g, 0, 1)^T.   (29)

4.4 Multiple end conditions
What we have discussed so far works fine if we just have one condition to satisfy.
The same procedure works also for cases where we have two or more conditions
to satisfy. For example with

d⁴y/dx⁴ = y³ − (dy/dx)²,
y(0) = 1,   dy/dx(0) = 0,   y(1) = 2,   dy/dx(1) = 1,

we would need two starting values at x = 0 and then we will have two conditions to satisfy at x = 1. The corrections can be calculated in a similar way. Thus in the example above, suppose we define

Y₁ = y,   Y₂ = y',   Y₃ = y'',   Y₄ = y''',

then we will need to use guesses for Y₃(0) = e, say, and Y₄(0) = g. The conditions we need to satisfy can be written as

φ₁(e, g) = φ₂(e, g) = 0,

where

φ₁(e, g) = Y₁(1; e, g) − 2,   φ₂(e, g) = Y₂(1; e, g) − 1.
To obtain the corrections to the guessed values ẽ, g̃ we have

φ₁(ẽ + de, g̃ + dg) = 0 = φ₁(ẽ, g̃) + de ∂φ₁/∂e (ẽ, g̃) + dg ∂φ₁/∂g (ẽ, g̃) + O(de², dg²),
φ₂(ẽ + de, g̃ + dg) = 0 = φ₂(ẽ, g̃) + de ∂φ₂/∂e (ẽ, g̃) + dg ∂φ₂/∂g (ẽ, g̃) + O(de², dg²).

This can be solved to obtain the corrections (de, dg).
If we generalise to a multidimensional case where we have a vector of guesses g̃ we can find the corrections dg as

dg = −J⁻¹(g̃) φ(g̃),

where J is the Jacobian ∂φ_i/∂g_k and φ is the vector of conditions.
There are of course many variations on the above theme and different strategies when dealing with linear equations. For stiff systems one can also integrate from the two ends and match in the middle. Many of these techniques are discussed extensively in the book by Keller (1976).
4.5 Solution of boundary value problems using finite-differences
Boundary value problems can also be tackled directly using finite-differences or some other technique such as spectral approximation. We will look at one specific example to illustrate the finite-difference technique.
Consider

d²y/dx² = (1/8)(32 + 2x³ − y dy/dx),   1 ≤ x ≤ 3,   y(1) = 17,   y(3) = 43/3.   (30)

The exact solution of (30) is y(x) = x² + 16/x.
Let us first define a uniform grid (x₀, x₁, ..., x_N) with N + 1 points and grid spacing h = (x_N − x₀)/N, so that x_j = x₀ + jh, for (j = 0, 1, ..., N).
Let us approximate y by w_i at each of the nodes x = x_i. The derivatives of y in (30) are approximated in finite-difference form as

(dy/dx)_{x=x_i} = (w_{i+1} − w_{i−1}) / (2h) + O(h²),   (31)

(d²y/dx²)_{x=x_i} = (w_{i+1} − 2w_i + w_{i−1}) / h² + O(h²).   (32)
These can be derived by making use of a Taylor expansion about the point x = x_i. Thus for example

y(x_{i+1}) = y(x_i) + h dy/dx(x_i) + (h²/2) d²y/dx²(x_i) + (h³/6) d³y/dx³(x_i) + (h⁴/24) d⁴y/dx⁴(x_i) + O(h⁵),   (33)

y(x_{i−1}) = y(x_i) − h dy/dx(x_i) + (h²/2) d²y/dx²(x_i) − (h³/6) d³y/dx³(x_i) + (h⁴/24) d⁴y/dx⁴(x_i) + O(h⁵).   (34)
By adding and subtracting (33),(34) and replacing y(xi ) by wi we obtain (31),(32).
Next replace y and its derivatives in (30) by the above approximations to get

(w_{i+1} − 2w_i + w_{i−1}) / h² = 4 + x_i³/4 − w_i (w_{i+1} − w_{i−1}) / (16h),   for (i = 1, 2, ..., N − 1),   (35)
and

w₀ = 17,   w_N = 43/3.   (36)

The system of equations (35),(36) is a set of nonlinear difference equations. We have N + 1 equations for N + 1 unknowns w₀, ..., w_N. At this stage one has recourse to a number of different techniques for handling the nonlinearity. One is to set up an iteration scheme, replacing w_i by w_i^{(k)} say, and starting with a suitable guess for w_i^{(0)}. The nonlinear term in (35) can now be tackled in many different ways. Thus for example we can replace it by

w_i^{(k−1)} (w_{i+1}^{(k)} − w_{i−1}^{(k)}) / (16h),

or

w_i^{(k)} (w_{i+1}^{(k−1)} − w_{i−1}^{(k−1)}) / (16h),

and we are then left with a linear system of equations to find the w_i^{(k)}, and the iterations can be continued until a suitable convergence condition is satisfied.
What is a suitable convergence criterion? One condition is that the nonlinear
difference equations (35) need to be satisfied to sufficient accuracy.
4.5.1 Newton linearization
Where practicable Newton’s method offers the best means for handling nonlinear
equations because of its superior convergence properties. To use this technique
for nonlinear difference equations of the form (35), suppose that we have a guess
for the solution w_i = W_i. We seek corrections δw_i such that w_i = W_i + δw_i satisfy the system (35). Substituting w_i = W_i + δw_i into (35) and linearizing, ie ignoring terms of O(δw_i²), gives
(δw_{i+1} − 2δw_i + δw_{i−1}) / h² = F_i − δw_i (W_{i+1} − W_{i−1})/(16h) − W_i (δw_{i+1} − δw_{i−1})/(16h),   for (i = 1, 2, ..., N − 1),   (37)

and

δw₀ = F₀,   δw_N = F_N,   (38)

where

F_i = 4 + x_i³/4 − W_i (W_{i+1} − W_{i−1})/(16h) − (W_{i+1} − 2W_i + W_{i−1})/h²,   for i = 1, ..., N − 1,

and

F₀ = 17 − W₀,   F_N = 43/3 − W_N.
The system (37),(38) is now linear and can be solved to obtain the corrections
δwi and hence we can update the Wi by Wi + δwi . This is continued until the δwi
are small enough and the difference equations are satisfied to sufficient accuracy.
Instead of solving for the corrections δwi one can also solve for the total
quantities wi by replacing δwi in (37), (38) by δwi = wi − Wi , and solving for the
wi .
The choice of the iteration scheme is dependent on a number of factors including ease of implementation, convergence properties and so on. The techniques
described above lead to the solution of a tridiagonal system of linear equations of the form

α_i w_{i−1} + β_i w_i + γ_i w_{i+1} = δ_i,   i = 0, 1, ..., N,   (39)

where the α_i, β_i, γ_i are coefficients obtainable from the linear system and α₀ = γ_N = 0. For instance from (37) we have

α_i = 1/h² − W_i/(16h),   β_i = −2/h² + (W_{i+1} − W_{i−1})/(16h),   γ_i = 1/h² + W_i/(16h),   δ_i = F_i,   i = 1, 2, ..., N − 1,
and
β0 = 1, γ0 = 0, δ0 = F0 ,
αN = 0, βN = 1, δN = FN .
This can be solved using any standard library routine, or Thomas’s algorithm.
4.6 Thomas's tridiagonal algorithm
This version of a tridiagonal solver is based on Gaussian elimination. First we create zeros below the diagonal and then, once we have a triangular matrix, we solve for the w_i using back substitution. Thus the algorithm takes the form

β_j = β_j − γ_{j−1} α_j / β_{j−1},   j = 2, 3, ..., N,
δ_j = δ_j − δ_{j−1} α_j / β_{j−1},   j = 2, 3, ..., N,
w_N = δ_N / β_N,   w_j = (δ_j − γ_j w_{j+1}) / β_j,   j = N − 1, ..., 1.
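The Python sketch below combines the Thomas algorithm with the Newton linearization (37),(38) for the model problem (30). The grid size, the initial guess and the convergence tolerance are our own choices; the difference that remains against the exact solution x² + 16/x is the O(h²) discretization error, not an iteration error:

```python
import numpy as np

def thomas(alpha, beta, gamma, delta):
    """Solve the tridiagonal system (39): alpha[i] w[i-1] + beta[i] w[i] + gamma[i] w[i+1] = delta[i]."""
    n = len(beta)
    b, d = beta.astype(float), delta.astype(float)
    for j in range(1, n):                       # forward elimination
        m = alpha[j] / b[j-1]
        b[j] -= gamma[j-1] * m
        d[j] -= d[j-1] * m
    w = np.empty(n)
    w[-1] = d[-1] / b[-1]
    for j in range(n - 2, -1, -1):              # back substitution
        w[j] = (d[j] - gamma[j] * w[j+1]) / b[j]
    return w

# Newton linearization (37),(38) for the model problem (30) on 1 <= x <= 3.
N = 40
x = np.linspace(1.0, 3.0, N + 1)
h = x[1] - x[0]
W = 17.0 + (43.0/3.0 - 17.0) * (x - 1.0) / 2.0          # initial guess: straight line between the BCs

for it in range(20):
    F = np.zeros(N + 1)
    F[0], F[N] = 17.0 - W[0], 43.0/3.0 - W[N]
    F[1:N] = (4.0 + x[1:N]**3/4.0
              - W[1:N]*(W[2:] - W[:-2])/(16*h)
              - (W[2:] - 2*W[1:N] + W[:-2])/h**2)
    alpha = np.zeros(N + 1); beta = np.zeros(N + 1); gamma = np.zeros(N + 1)
    beta[0] = beta[N] = 1.0                              # boundary rows, cf. (38)
    alpha[1:N] = 1/h**2 - W[1:N]/(16*h)
    beta[1:N]  = -2/h**2 + (W[2:] - W[:-2])/(16*h)
    gamma[1:N] = 1/h**2 + W[1:N]/(16*h)
    dW = thomas(alpha, beta, gamma, F)
    W += dW
    if np.abs(dW).max() < 1e-10:
        break

print("Newton iterations:", it + 1,
      " discretization error vs x^2 + 16/x:", np.abs(W - (x**2 + 16.0/x)).max())
```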
5 Numerical solution of pde's
In this part of the course we will be concerned with the numerical solution of partial differential equations (pde’s), using finite-difference methods. This involves
covering the domain with a mesh of points and calculating the approximate solution of the pde at these mesh points. The techniques that we will use will depend
on the type of pde that we are dealing with, ie whether it is elliptic, parabolic,
or hyperbolic, although there are common features. We will need to learn how to
construct approximations to the given equation, how to solve the equation using
iterative methods, investigate stability and so on.
5.1 Classification
Partial differential equations can be classified as being of type elliptic, parabolic or
hyperbolic. In some cases equations can be of mixed type. There are systematic
methods for classifying an equation but for our purposes it is enough to summarize
the main results. Consider the second order pde
A
∂2φ
∂2φ
∂2φ
∂φ
∂φ
+
B
+
C
+D
+E
+ F φ + G = 0,
2
2
∂x
∂x∂y
∂y
∂x
∂y
(40)
where, in general, A, B, C, D, E, F, and G are functions of the independent variables x and y and of the dependent variables φ. The equation (40) is said to
be
• elliptic if B² − 4AC < 0,
• parabolic if B² − 4AC = 0, or
• hyperbolic if B² − 4AC > 0.
An example of an elliptic equation is Poisson's equation

∂²φ/∂x² + ∂²φ/∂y² = f(x, y).

The heat equation

∂φ/∂t = k ∂²φ/∂x²

is of parabolic type, and the wave equation

∂²φ/∂x² − ∂²φ/∂y² = 0

is a hyperbolic pde. An example of a mixed type equation is the transonic small disturbance equation given by

(K − ∂φ/∂x) ∂²φ/∂x² + ∂²φ/∂y² = 0.
A more formal approach to classification is to consider a system of first order
partial differential equations for the unknowns U = (u1 , u2 , ..., un )T and independent variables x = (x1 , x2 , ..., xm )T . Suppose that the equations can be written
in quasi-linear form
in quasi-linear form

Σ_{k=1}^{m} A_k ∂U/∂x_k = Q,   (41)

where the A_k are (n × n) matrices and Q is an (n × 1) column vector, and both can depend on x_k and U but not on the derivatives of U. If we seek plane wave solutions of the homogeneous part of (41) in the form

U = U₀ e^{i x·s},

where s = (s₁, s₂, ..., s_m)^T, then (41) leads to the system of equations

i [Σ_{k=1}^{m} A_k s_k] U = 0.   (42)

This will have a non-trivial solution only if the characteristic equation

det | Σ_{k=1}^{m} A_k s_k | = 0
holds. This will have at most n solutions and n vectors s, which can be regarded
as normals to the characteristic surfaces (which are the surfaces of constant phase
of the wave).
The system is hyperbolic if n real characteristics exist. If all the characteristics
are complex, the system is elliptic. If some are real and some complex, the system
is of mixed type. If the system (42) is of rank less than n, then we have a parabolic
system.
Example
Consider the steady two-dimensional Euler equations for incompressible flow

∂u/∂x + ∂v/∂y = 0,
u ∂u/∂x + v ∂u/∂y = −∂p/∂x,
u ∂v/∂x + v ∂v/∂y = −∂p/∂y.

Introducing the vector U = (u, v, p)^T and x = (x, y)^T we can express this in matrix vector form (41) with Q = 0 and

A₁ = ( 1  0  0          A₂ = ( 0  1  0
       u  0  1                 v  0  0
       0  u  0 ),              0  v  1 ).

Introducing λ = s₁/s₂ the characteristic equation gives

det ( λ       1       0
      uλ + v  0       λ
      0       uλ + v  1 ) = 0,

which leads to λ = −v/u or λ = ±i. Thus the Euler equations are hybrid.
5.2 Consistency, convergence and Lax equivalence theorem
These ideas were introduced earlier for ordinary differential equations but apply equally to pdes.
Consistent A discrete approximation to a partial differential equation is said
to be consistent if in the limit of the step-size(s) going to zero, the original pde
system is recovered, ie the truncation error approaches zero.
Stability If we define the error to be the difference between the computed
solutions and the exact solution of the discrete approximation, then the scheme
is stable if the error remains uniformly bounded for successive iterations.
Convergence A scheme is convergent if the solution of the discrete equations approaches the solution of the pde in the limit that the step-sizes approach zero.
Lax’s Equivalence Theorem For a well posed initial-value problem and
a consistent discretization, stability is the necessary and sufficient condition for
convergence.
5.3 Difference formulae
Let us first consider the various approximations to derivatives that we can construct. The usual tool for this is the Taylor expansion. Suppose that we have a
grid of points with equal mesh spacing ∆x in the x− direction and equal spacing
∆y in the y− direction. Thus we can define points xi , yj by
xi = x0 + i∆x ,
yj = y0 + j∆y .
Suppose that we are trying to approximate a derivative of a function φ(x, y) at
the points xi , yj . Denote the approximate value of φ(x, y) at the point xi , yj by
wi,j say.
5.3.1 Central Differences
The first and second derivatives in x or y may be approximated as before by

(∂²φ/∂x²)_{ij} = (w_{i+1,j} − 2w_{i,j} + w_{i−1,j}) / (Δx)² + O((Δx)²),   (43)

(∂²φ/∂y²)_{ij} = (w_{i,j+1} − 2w_{i,j} + w_{i,j−1}) / (Δy)² + O((Δy)²),   (44)

(∂φ/∂x)_{ij} = (w_{i+1,j} − w_{i−1,j}) / (2Δx) + O((Δx)²),   (45)

(∂φ/∂y)_{ij} = (w_{i,j+1} − w_{i,j−1}) / (2Δy) + O((Δy)²).   (46)

Note that replacing the derivative (∂²φ/∂x²)_{ij} by the approximation in (43) gives rise to a truncation error O((Δx)²). The approximations listed above are centered at the points (x_i, y_j), and are called central-difference approximations.
Consider the approximation

(∂φ/∂x)_{ij} = (w_{i+1,j} − w_{i,j}) / Δx.

By Taylor expansion we see that this gives rise to a truncation error of O(Δx). In addition this approximation is centered at the point (x_{i+1/2}, y_j).
5.3.2 One-sided approximations
We can also construct one-sided approximations to derivatives. Thus for example a second-order forward approximation to ∂φ/∂x at the point (x_i, y_j) is given by

(∂φ/∂x)_{ij} = (−3w_{i,j} + 4w_{i+1,j} − w_{i+2,j}) / (2Δx).

Tables 1 and 2 list some of the more commonly used central and one-sided approximations. In these tables the numbers denote the weights at the nodes, and the whole expression is multiplied by 1/(Δx)^j, where j denotes the order of the derivative, ie j = 1 for a first derivative and j = 2 for a second derivative.
Table 1: Weights for central differences

Order of accuracy     i−2      i−1     i       i+1     i+2
1st derivative
  (Δx)²                        −1/2    0       1/2
  (Δx)⁴               1/12     −2/3    0       2/3     −1/12
2nd derivative
  (Δx)²                        1       −2      1
  (Δx)⁴               −1/12    4/3     −5/2    4/3     −1/12

Table 2: Weights for one-sided differences

Order of accuracy     i        i+1     i+2     i+3     i+4
1st derivative
  (Δx)                −1       1
  (Δx)²               −3/2     2       −1/2
  (Δx)³               −11/6    3       −3/2    1/3
  (Δx)⁴               −25/12   4       −3      4/3     −1/4
2nd derivative
  (Δx)                1        −2      1
  (Δx)²               2        −5      4       −1
  (Δx)³               35/12    −26/3   19/2    −14/3   11/12

5.3.3 Mixed derivatives
The techniques for finding suitable discrete approximations for mixed derivatives are exactly the same, except this time we would make use of a multidimensional
Taylor expansion. Thus for example second order approximations to ∂²φ/∂x∂y at the point (i, j) are given by

∂²φ/∂x∂y = (w_{i+1,j+1} − w_{i−1,j+1} + w_{i−1,j−1} − w_{i+1,j−1}) / (4ΔxΔy) + O((Δx)², (Δy)²),

or

∂²φ/∂x∂y = (w_{i+1,j+1} − w_{i+1,j} − w_{i,j+1} + w_{i−1,j−1} − w_{i−1,j} − w_{i,j−1} + 2w_{i,j}) / (2ΔxΔy) + O((Δx)², (Δy)²).
5.4 Solution of elliptic pde's
A prototype elliptic pde is Poisson's equation given by

∂²φ/∂x² + ∂²φ/∂y² = f(x, y),   (47)

where f(x, y) is a known/given function. The equation (47) has to be solved in a domain D, say, with boundary conditions given on the boundary δD of D. These can be of three types:

• Dirichlet:  φ = g(x, y) on δD.
• Neumann:  ∂φ/∂n = g(x, y) on δD.
• Robin/Mixed:  B(φ, ∂φ/∂n) = 0 on δD. Robin boundary conditions involve a linear combination of φ and its normal derivative on the boundary. Mixed boundary conditions involve one type of condition on one part of the boundary and another type on other parts of the boundary.

In the above g(x, y) is a given function, and derivatives with respect to n denote the normal derivative, ie the derivative in the direction n which is normal to the boundary.
Let us consider a model problem with

∂²φ/∂x² + ∂²φ/∂y² = f(x, y),   0 < x, y < 1,   (48)
φ = 0 on δD.   (49)

Here the domain D is the square region 0 < x < 1 and 0 < y < 1. Construct a finite difference mesh with points (x_i, y_j), say, where

x_i = iΔx,   i = 0, 1, ..., N,   y_j = jΔy,   j = 0, 1, ..., M,

where Δx = 1/N and Δy = 1/M are the step sizes in the x and y directions.
Next replace the derivatives in (48) by the approximations from (43),(44) to get

(w_{i+1,j} − 2w_{i,j} + w_{i−1,j}) / (Δx)² + (w_{i,j+1} − 2w_{i,j} + w_{i,j−1}) / (Δy)² = f_{i,j},   1 ≤ i ≤ N − 1,   1 ≤ j ≤ M − 1,   (50)

and

w_{i,j} = 0   if i = 0 or N,   0 ≤ j ≤ M,   (51)
w_{i,j} = 0   if j = 0 or M,   0 ≤ i ≤ N.   (52)
Thus we have (N − 1) × (M − 1) unknown values wi,j to find at the interior points
of the domain. If we write wi = (wi,1 , wi,2 , ..., wi,M −1 )T , fi = (fi,1 , fi,2 , ..., fi,M −1 )T ,
we can write the above system of equations (50) in matrix form as

( B  I               ) ( w₁     )          ( f₁     )
( I  B  I            ) ( w₂     )          ( f₂     )
(    I  B  I         ) ( w₃     ) = (Δx)²  ( f₃     )
(       .  .  .      ) (  ...   )          (  ...   )
(          I  B      ) ( w_{N−1} )         ( f_{N−1} )

In the above I is the (M − 1) × (M − 1) identity matrix and B is a similarly sized tridiagonal matrix given by

B = ( b  c            )
    ( c  b  c         )
    (    c  b  c      )
    (       .  .  .   )
    (          c  b   )

with b = −2(1 + β²) and c = β², where β = Δx/Δy (so c = 1 when Δx = Δy).
What we observe is that the matrix is very sparse and can be very large even with a modest number of grid points. For instance with N = M = 101 the linear system is of size 10⁴ × 10⁴ and we have 10⁴ unknowns to find.
To solve the system of equations arising from the above discretization, one
has recourse to a number of methods. Direct methods are not feasible because of
the size of the matrices involved and in addition they are expensive and require a
lot of storage, unless one exploits the sparsity pattern of the matrices. Iterative
methods offer an alternative and are easy to use, but depending on the method,
convergence can be slow with an increasing number of mesh points.
5.5 Iterative methods for the solution of linear systems
5.5.1 Jacobi method
Consider (50) which we can rewrite as

w_{i,j} = (1 / (2(1 + β²))) [w_{i+1,j} + w_{i−1,j} + β²(w_{i,j+1} + w_{i,j−1}) − (Δx)² f_{i,j}],   (53)

with β = Δx/Δy. This suggests the iterative scheme

w^{new}_{i,j} = (1 / (2(1 + β²))) [w^{old}_{i+1,j} + w^{old}_{i−1,j} + β²(w^{old}_{i,j+1} + w^{old}_{i,j−1}) − (Δx)² f_{i,j}].   (54)

We start with a guess for w^{old}_{i,j}, compute a new value from (54), and continue iterating until suitable convergence criteria are satisfied. What are suitable convergence criteria? Ideally we wish to stop when the corrections are small, or when the equations (53) are satisfied to sufficient accuracy.
Suppose we write the linear system as
Av = f
where v is the exact solution of the linear system. Then if w is an approximate
solution, the error e is defined by e = v − w. But this is not much use unless we
know the exact solution. On the other hand we can write
Ae = A(v − w) = f − Aw.
Then the residual is defined by r = f − Aw which can be computed. For the
Jacobi scheme the residual is given by
ri,j = (∆x )2 fi,j + 2(1 + β 2 )wi,j − (wi+1,j + wi−1,j + β 2 (wi,j+1 + wi,j−1 )).
Therefore a suitable stopping condition might be
max_{i,j} |r_{i,j}| < ε₁,   or   √(Σ_{i,j} r²_{i,j}) < ε₂.

5.5.2 Gauss-Seidel iteration
The Jacobi scheme involves two levels of storage, w^{new}_{i,j} and w^{old}_{i,j}, which can be wasteful. It is more economical to overwrite values as and when they are computed and use the new values when ready. This gives rise to the Gauss-Seidel scheme, which, if we sweep with i increasing followed by j, is given by

w^{new}_{i,j} = (1 / (2(1 + β²))) [w^{old}_{i+1,j} + w^{new}_{i−1,j} + β²(w^{old}_{i,j+1} + w^{new}_{i,j−1}) − (Δx)² f_{i,j}],   (55)

where the new values overwrite existing values.
5.5.3 Relaxation and the SOR method
Instead of updating the new values as indicated above, it is better to use relaxation. Here we compute

w_{i,j} = (1 − ω) w^{old}_{i,j} + ω w*_{i,j},

where ω is called the relaxation factor, and w*_{i,j} denotes the value as computed by the Jacobi or Gauss-Seidel scheme. ω = 1 reduces to the Jacobi or Gauss-Seidel scheme. The Gauss-Seidel scheme with ω ≠ 1 is called the method of successive overrelaxation or SOR scheme.
5.5.4 Line relaxation
The Jacobi, Gauss-Seidel and SOR schemes are called point relaxation methods. On the other hand we may compute a whole line of new values at a time, leading to the line-relaxation methods. For instance suppose we write the equations as

w^{new}_{i+1,j} − 2(1 + β²) w^{new}_{i,j} + w^{new}_{i−1,j} = −β²(w^{old}_{i,j+1} + w^{new}_{i,j−1}) + (Δx)² f_{i,j},   (56)

then starting from j = 1 we may compute the values w_{i,j}, for i = 1, ..., N − 1, in one go. The left hand side of (56) can be seen to lead to a tridiagonal system of equations and we may use a tridiagonal solver to compute the solution, incrementing j until all the points have been computed. The above scheme can be combined with relaxation to give us a line-SOR method. In general the more implicit the scheme the better the convergence properties. A variation on the above scheme is to alternate the direction of sweep, by doing i lines first and on the next iteration doing the j lines first. This leads to the ADI, or alternating direction implicit, scheme.
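A sketch of point relaxation for the model problem (48)-(49) is given below; ω = 1 gives Gauss-Seidel and ω > 1 gives SOR. The manufactured right hand side and the formula used for the near-optimal ω are assumptions made for illustration rather than quoted from the notes:

```python
import numpy as np

def sor_poisson(f, N, M, omega=1.0, tol=1e-8, max_iter=20000):
    """Point relaxation for the model problem (48)-(49) on the unit square.
    omega = 1 is Gauss-Seidel; omega > 1 is SOR.  A sketch only."""
    dx, dy = 1.0/N, 1.0/M
    beta2 = (dx/dy)**2
    x = np.linspace(0.0, 1.0, N + 1)
    y = np.linspace(0.0, 1.0, M + 1)
    fij = f(x[:, None], y[None, :])
    w = np.zeros((N + 1, M + 1))                      # boundary values stay zero
    for it in range(max_iter):
        corr = 0.0
        for i in range(1, N):
            for j in range(1, M):
                wstar = (w[i+1, j] + w[i-1, j] + beta2*(w[i, j+1] + w[i, j-1])
                         - dx*dx*fij[i, j]) / (2.0*(1.0 + beta2))
                corr = max(corr, abs(wstar - w[i, j]))
                w[i, j] = (1.0 - omega)*w[i, j] + omega*wstar    # relaxation step
        if corr < tol:                                # simple correction-based stopping test
            break
    return w, it + 1

# Manufactured problem (an assumption for illustration):
# phi = sin(pi x) sin(pi y) satisfies (48) with f = -2 pi^2 sin(pi x) sin(pi y).
f = lambda X, Y: -2.0*np.pi**2*np.sin(np.pi*X)*np.sin(np.pi*Y)
N = M = 32
w_gs, it_gs = sor_poisson(f, N, M, omega=1.0)
w_sor, it_sor = sor_poisson(f, N, M, omega=2.0/(1.0 + np.sin(np.pi/N)))   # near-optimal omega
print("Gauss-Seidel iterations:", it_gs, "  SOR iterations:", it_sor)
```

The iteration counts illustrate the convergence estimates derived in the next section: SOR with a well chosen ω needs roughly an order of magnitude fewer sweeps than Gauss-Seidel on this mesh.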
5.6 Convergence properties of basic iteration schemes
Consider the linear system

Ax = b   (57)

where A = (a_{i,j}) is an n × n matrix, and x, b are n × 1 column vectors. Suppose we write the matrix A in the form A = D − L − U, where D is the diagonal part of A, D = diag(a_{1,1}, a_{2,2}, ..., a_{n,n}), −L is the strictly lower triangular part of A (the entries a_{i,j} with i > j), and −U is the strictly upper triangular part of A (the entries a_{i,j} with i < j). Then (57) can be written as

Dx = (L + U)x + b.
The Jacobi iteration is then defined as

x^{(k+1)} = D⁻¹(L + U)x^{(k)} + D⁻¹b.   (58)

The Gauss-Seidel iteration is defined by

(D − L)x^{(k+1)} = Ux^{(k)} + b,   (59)

or

x^{(k+1)} = (D − L)⁻¹Ux^{(k)} + (D − L)⁻¹b.   (60)
In general an iteration scheme for (57) may be written as

x^{(k+1)} = Px^{(k)} + c,   (61)

where P is called the iteration matrix and c is a constant vector. For the Jacobi scheme we have

P = P_J = D⁻¹(L + U),

and for the Gauss-Seidel scheme

P = P_G = (D − L)⁻¹U.

The exact solution to the linear system (57) satisfies

x = Px + c.   (62)
Hence if we define the error at the kth iteration by e(k) = x − x(k) , then by
subtracting (61) from (62) we see that the error satisfies the equation
e^{(k+1)} = Pe^{(k)} = P²e^{(k−1)} = ... = P^{k+1}e^{(0)}.   (63)

In order that the error diminishes as k → ∞ we must have

||P^k|| → 0 as k → ∞.   (64)

Since

||P^k|| ≤ ||P||^k,

we see that (64) is true provided ||P|| < 1. From linear algebra it can be proven that ||P^k|| → 0 as k → ∞ if and only if

ρ(P) = max_i |λ_i| < 1,

where λ_i are the eigenvalues of the matrix P. Note that ρ(P) is called the spectral radius of P. Thus for the iteration to converge we require the spectral radius of the iteration matrix to be less than unity. One further result which is useful is that for large k

||e^{(k+1)}|| ≈ ρ||e^{(k)}||.

Suppose we want to estimate how many iterations it takes to reduce the initial error by a factor ε. Then we see that we need q iterations where q is the smallest value for which

ρ^q < ε,   giving   q ≥ q_d = ln ε / ln ρ.
The quantity − ln ρ is called the asymptotic rate of convergence of the iteration.
It gives a measure of how much the error decreases at each iteration. Thus
iteration matrices where the spectral radius is close to 1 will converge slowly.
For the model problem it can be shown that for Jacobi iteration

ρ = ρ(P_J) = (1/2)(cos(π/N) + cos(π/M)),

and for Gauss-Seidel

ρ = ρ(P_G) = [ρ(P_J)]².

If we take N = M and N ≫ 1 then for Jacobi iteration we have

q_d = ln ε / ln(1 − π²/(2N²)) ≈ −(2N²/π²) ln ε.

Gauss-Seidel converges twice as fast as Jacobi, and requires less storage.
For point SOR the spectral radius depends on the relaxation factor ω, but for the model problem with the optimum value and N = M it can be shown that

ρ = (1 − sin(π/N)) / (1 + sin(π/N)),

giving

q_d ≈ −(N/(2π)) ln ε.

Thus SOR with the optimum relaxation factor requires an order of magnitude fewer iterations to converge as compared to Gauss-Seidel. Finally similar results can also be derived for line SOR, and it can be shown that using optimum values

ρ = [(1 − sin(π/(2N))) / (1 + sin(π/(2N)))]²,   q_d ≈ −(N/(2√2 π)) ln ε.
6 Solution of differential equations using Chebychev collocation
In this section we will briefly look at the solution of equations using Chebychev collocation. Spectral methods offer a completely different means for obtaining a solution of a differential equation (ode or pde) and they have a number of major advantages over finite difference and other similar methods. One is the concept of spectral accuracy. With finite difference methods, halving the mesh spacing for a second order method reduces the truncation error only by a factor of 4.
On the other hand with spectral methods the accuracy of the approximation (for
smooth functions) increases exponentially with increased number of points. This
is referred to as spectral accuracy. There are many different types of spectral
methods (for example the tau method, Galerkin method, Fourier approximation)
and a good place to start studying the basic ideas and theory is in the book
by Canuto et al. (1988). Here we will briefly look at one, namely Chebychev
collocation.
With any spectral method the idea is to represent the function as an expansion
in terms of some basis functions, usually orthogonal polynomials or in the case of
Fourier approximation, trigonometric functions or complex exponentials. With
Chebychev collocation a function is represented in terms of expansions using
Chebychev polynomials.
6.1 Basic properties of Chebychev polynomials
A Chebychev polynomial of degree n is defined by

T_n(x) = cos(n cos⁻¹ x).   (65)

The Chebychev polynomials are eigenfunctions of the Sturm-Liouville equation

d/dx (√(1 − x²) dT_k(x)/dx) + (k²/√(1 − x²)) T_k(x) = 0.

The following properties are easy to prove:

T_{k+1}(x) = 2x T_k(x) − T_{k−1}(x),

with T₀(x) = 1 and T₁(x) = x. Also

|T_k(x)| ≤ 1,   −1 ≤ x ≤ 1,   T_k(±1) = (±1)^k,   T_k'(±1) = (±1)^k k²,

2T_k(x) = (1/(k+1)) dT_{k+1}(x)/dx − (1/(k−1)) dT_{k−1}(x)/dx,   k ≥ 1.
Suppose we expand a (smooth) function u(x) in terms of Chebychev polynomials, then

u(x) = Σ_{k=0}^{∞} û_k T_k(x),   where   û_k = (2/(π c_k)) ∫_{−1}^{1} u(x) T_k(x) / √(1 − x²) dx,

and

c_k = 2 for k = 0, N;   c_k = 1 for 1 ≤ k ≤ N − 1.

If we define ũ(θ) = u(cos θ) then

ũ(θ) = Σ_{k=0}^{∞} û_k cos(kθ),

so that the Chebychev series for u corresponds to a cosine series for ũ. This implies that if u is infinitely differentiable then the Chebychev coefficients of the expansion will decay faster than algebraically.
In a collocation method we may represent a smooth function in terms of its values at a set of discrete points. Derivatives of the function are approximated by analytic derivatives of the interpolating polynomial. It is common to work with the Gauss-Lobatto points defined by

x_j = cos(jπ/N),   j = 0, 1, ..., N.

Suppose the given function is approximated at these points and we represent the approximate values of the function u(x) at these points x = x_j by u_j. Here we are assuming that we are working with N + 1 points in the interval −1 ≤ x ≤ 1. In a given differential equation we need to approximate the derivatives at these node points. The theory says that we can write

(du/dx)_j = Σ_{k=0}^{N} D_{j,k} u_k,

where D_{j,k} are the elements of the Chebychev collocation differentiation matrix D, say. The elements of the matrix D are given by (with c_p as defined above)

D_{p,j} = (c_p/c_j) (−1)^{p+j} / (x_p − x_j),   p ≠ j,
D_{p,p} = −x_p / (2(1 − x_p²)),   1 ≤ p ≤ N − 1,
D_{0,0} = (2N² + 1)/6,
D_{N,N} = −(2N² + 1)/6.
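A short sketch constructing this differentiation matrix and checking it on a smooth function is given below; the test function e^x is an arbitrary choice. The errors drop far faster than any fixed power of 1/N, which is the spectral accuracy referred to above:

```python
import numpy as np

def cheb_diff_matrix(N):
    """Chebychev collocation differentiation matrix on the Gauss-Lobatto points
    x_j = cos(j pi / N), j = 0..N, using the entries quoted above."""
    x = np.cos(np.pi * np.arange(N + 1) / N)
    c = np.ones(N + 1); c[0] = c[N] = 2.0
    D = np.zeros((N + 1, N + 1))
    for p in range(N + 1):
        for j in range(N + 1):
            if p != j:
                D[p, j] = (c[p] / c[j]) * (-1.0)**(p + j) / (x[p] - x[j])
    for p in range(1, N):                         # interior diagonal entries
        D[p, p] = -x[p] / (2.0 * (1.0 - x[p]**2))
    D[0, 0] = (2.0 * N**2 + 1.0) / 6.0
    D[N, N] = -(2.0 * N**2 + 1.0) / 6.0
    return D, x

# Quick check on a smooth function: u = exp(x) on [-1, 1].
N = 16
D, x = cheb_diff_matrix(N)
u = np.exp(x)
print("max error in u' :", np.abs(D @ u - np.exp(x)).max())
print("max error in u'':", np.abs(D @ (D @ u) - np.exp(x)).max())
```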
Thus in matrix form

(dw₀/dx, dw₁/dx, ..., dw_N/dx)^T = D (w₀, w₁, ..., w_N)^T,

and

(d²w₀/dx², d²w₁/dx², ..., d²w_N/dx²)^T = D² (w₀, w₁, ..., w_N)^T.
7 Parabolic Equations
In this section we will look at the solution of parabolic partial differential equations. The techniques introduced earlier apply equally to parabolic pdes.
One of the simplest parabolic pdes is the diffusion equation, which in one space dimension is

∂u/∂t = κ ∂²u/∂x².   (66)

For two or more space dimensions we have

∂u/∂t = κ∇²u.

In the above κ is some given constant.
Another familiar set of parabolic pdes is the boundary layer equations

u_x + v_y = 0,
u_t + u u_x + v u_y = −p_x + u_{yy},
0 = −p_y.

With a parabolic pde we expect, in addition to boundary conditions, an initial condition at, say, t = 0.
Let us consider (66) in the region a ≤ x ≤ b. We will take a uniform mesh in x with x_j = a + jΔx, j = 0, 1, ..., N and Δx = (b − a)/N. For the differencing in time we assume a constant step size Δt, so that t = t_k = kΔt.
7.1 First order central difference approximation
We may approximate (66) by

(w_j^{k+1} − w_j^k) / Δt = κ [(w_{j+1}^k − 2w_j^k + w_{j−1}^k) / Δx²].   (67)

Here w_j^k denotes an approximation to the exact solution u(x, t) of the pde at x = x_j, t = t_k. The scheme given in (67) is first order in time, O(Δt), and second order in space, O((Δx)²). Let us assume that we are given a suitable initial condition, and boundary conditions of the form

u(a, t) = f(t),   u(b, t) = g(t).

The scheme (67) is explicit because the unknowns at level k + 1 can be computed directly. Notice that there is a time lag before the effect of the boundary data is felt on the solution. As we will see later this scheme is conditionally stable for β ≤ 1/2, where β = κΔt/Δx². Note that β is sometimes called the Peclet or diffusion number.
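A sketch of the explicit scheme (67) is given below. The initial and boundary data (u(x, 0) = sin πx with homogeneous boundary values) are assumed for illustration; with the chosen step sizes β = 0.4, which respects the stability limit β ≤ 1/2:

```python
import numpy as np

def ftcs(kappa, a, b, N, dt, nsteps, init, f, g):
    """Explicit scheme (67) for u_t = kappa u_xx with u(a,t) = f(t), u(b,t) = g(t).
    Stable only for beta = kappa*dt/dx^2 <= 1/2."""
    dx = (b - a) / N
    beta = kappa * dt / dx**2
    x = a + dx * np.arange(N + 1)
    w = init(x)
    for k in range(nsteps):
        w[1:N] = w[1:N] + beta * (w[2:] - 2*w[1:N] + w[:-2])
        w[0], w[N] = f((k + 1) * dt), g((k + 1) * dt)
    return x, w, beta

# Assumed test problem (not from the notes): u_t = u_xx on [0,1],
# u(x,0) = sin(pi x), u(0,t) = u(1,t) = 0; exact solution exp(-pi^2 t) sin(pi x).
N, dt = 20, 1.0e-3                       # beta = 0.4
x, w, beta = ftcs(1.0, 0.0, 1.0, N, dt, 500, lambda x: np.sin(np.pi*x),
                  lambda t: 0.0, lambda t: 0.0)
print("beta =", beta,
      " max error at t = 0.5:", np.abs(w - np.exp(-np.pi**2*0.5)*np.sin(np.pi*x)).max())
```

Increasing dt so that β exceeds 1/2 makes the computed solution oscillate and blow up, in line with the stability analysis later in this section.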
7.2 Fully implicit, first order
A better approximation is one which makes use of an implicit scheme. Then instead of (67) we have

(w_j^{k+1} − w_j^k) / Δt = κ [(w_{j+1}^{k+1} − 2w_j^{k+1} + w_{j−1}^{k+1}) / Δx²].   (68)

The unknowns at level k + 1 are coupled together and we have a set of implicit equations to solve. If we rearrange (68) then

β w_{j−1}^{k+1} − (1 + 2β) w_j^{k+1} + β w_{j+1}^{k+1} = −w_j^k,   1 ≤ j ≤ N − 1.   (69)

In addition approximation of the boundary conditions gives

w_0^{k+1} = f(t_{k+1}),   w_N^{k+1} = g(t_{k+1}).   (70)

The discrete equations (69),(70) are of tridiagonal form and thus easily solved. The scheme is unconditionally stable. The accuracy of the above fully implicit scheme is only first order in time. We can try and improve on this with a second order scheme.
7.3 Richardson method
Consider

(w_j^{k+1} − w_j^{k−1}) / (2Δt) = κ [(w_{j+1}^k − 2w_j^k + w_{j−1}^k) / Δx²].   (71)

This uses three time levels and has accuracy O(Δt², Δx²). The scheme was devised by a meteorologist and is unconditionally unstable!
7.4 Du-Fort Frankel
This uses the approximation

(w_j^{k+1} − w_j^{k−1}) / (2Δt) = κ [(w_{j+1}^k − w_j^{k+1} − w_j^{k−1} + w_{j−1}^k) / Δx²].   (72)

This has truncation error O(Δt², Δx², (Δt/Δx)²), and is an explicit scheme. The scheme is unconditionally stable, but is inconsistent if Δt → 0, Δx → 0 with Δt/Δx remaining fixed.
7.5 Crank-Nicolson
A popular scheme is the Crank-Nicolson scheme given by

(w_j^{k+1} − w_j^k) / Δt = (κ/2) [(w_{j+1}^{k+1} − 2w_j^{k+1} + w_{j−1}^{k+1}) / Δx² + (w_{j+1}^k − 2w_j^k + w_{j−1}^k) / Δx²].   (73)

This is second order accurate, O(Δt², Δx²), and is unconditionally stable. (Taking very large time steps can however cause problems.) As can be seen it is also an implicit scheme.
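A sketch of (73) for the same assumed test problem as before is given below, using the tridiagonal rearrangement that appears again in the stability analysis later (equation (83)) together with the Thomas algorithm of section 4.6. The time step here corresponds to β = 4, well beyond the explicit limit, yet the computation remains stable:

```python
import numpy as np

def crank_nicolson(kappa, a, b, N, dt, nsteps, init):
    """Crank-Nicolson scheme (73) for u_t = kappa u_xx with homogeneous Dirichlet
    boundary values; interior unknowns solved with the Thomas algorithm."""
    dx = (b - a) / N
    beta = kappa * dt / dx**2
    x = a + dx * np.arange(N + 1)
    w = init(x)
    n = N - 1                                        # number of interior unknowns
    lower = -beta * np.ones(n); lower[0] = 0.0
    diag  = (2 + 2*beta) * np.ones(n)
    upper = -beta * np.ones(n); upper[-1] = 0.0
    for _ in range(nsteps):
        rhs = beta*w[:-2] + (2 - 2*beta)*w[1:N] + beta*w[2:]
        b_, d_ = diag.copy(), rhs.copy()             # Thomas algorithm
        for j in range(1, n):
            m = lower[j] / b_[j-1]
            b_[j] -= upper[j-1] * m
            d_[j] -= d_[j-1] * m
        sol = np.empty(n)
        sol[-1] = d_[-1] / b_[-1]
        for j in range(n - 2, -1, -1):
            sol[j] = (d_[j] - upper[j]*sol[j+1]) / b_[j]
        w[1:N] = sol
        w[0] = w[N] = 0.0
    return x, w

# Same assumed test problem: u(x,0) = sin(pi x) on [0,1], homogeneous BCs.
N, dt = 20, 1.0e-2                                   # beta = 4
x, w = crank_nicolson(1.0, 0.0, 1.0, N, dt, 50, lambda x: np.sin(np.pi*x))
print("max error at t = 0.5:", np.abs(w - np.exp(-np.pi**2*0.5)*np.sin(np.pi*x)).max())
```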
7.6 Multi-space dimensions
The schemes outlined above are easily extended to multi-dimensions. Thus in two space dimensions a first order explicit approximation to

∂u/∂t = κ∇²u

is

(w_{i,j}^{k+1} − w_{i,j}^k) / Δt = κ [(w_{i+1,j}^k − 2w_{i,j}^k + w_{i−1,j}^k) / Δx² + (w_{i,j+1}^k − 2w_{i,j}^k + w_{i,j−1}^k) / Δy²].   (74)

This is first order in Δt and second order in space. It is conditionally stable for

κΔt/(Δx)² + κΔt/(Δy)² ≤ 1/2.

If we use a fully implicit scheme we would obtain

(w_{i,j}^{k+1} − w_{i,j}^k) / Δt = κ [(w_{i+1,j}^{k+1} − 2w_{i,j}^{k+1} + w_{i−1,j}^{k+1}) / Δx² + (w_{i,j+1}^{k+1} − 2w_{i,j}^{k+1} + w_{i,j−1}^{k+1}) / Δy²].   (75)

This leads to an implicit system of equations of the form

α w_{i+1,j}^{k+1} + α w_{i−1,j}^{k+1} − (2α + 2β + 1) w_{i,j}^{k+1} + β w_{i,j−1}^{k+1} + β w_{i,j+1}^{k+1} = −w_{i,j}^k,

where α = κΔt/Δx², β = κΔt/Δy². The form of the discrete equations is very much like the system of equations arising in elliptic pdes.
From the computational point of view a better scheme is

(w_{i,j}^{k+1/2} − w_{i,j}^k) / (Δt/2) = κ [(w_{i+1,j}^{k+1/2} − 2w_{i,j}^{k+1/2} + w_{i−1,j}^{k+1/2}) / Δx² + (w_{i,j+1}^k − 2w_{i,j}^k + w_{i,j−1}^k) / Δy²],

(w_{i,j}^{k+1} − w_{i,j}^{k+1/2}) / (Δt/2) = κ [(w_{i+1,j}^{k+1/2} − 2w_{i,j}^{k+1/2} + w_{i−1,j}^{k+1/2}) / Δx² + (w_{i,j+1}^{k+1} − 2w_{i,j}^{k+1} + w_{i,j−1}^{k+1}) / Δy²],   (76)

which leads to tridiagonal systems of equations, similar to the ADI scheme. The above scheme is second order in time and space and also unconditionally stable.
7.7 Consistency revisited
Let us consider the truncation error for the first order central (explicit) scheme
(67), and also the Du-Fort Frankel scheme (72).
If u(x, t) is the exact solution then we may write ukj = u(xj , tk ) and thus from
a Taylor series expansion
uk+1
= u(xj , tk + ∆t ) =
j
∂u
∂t
ukj +∆t
∆2
+ t
2
j,k
!
∂2u
∂t2
!
+ O(∆t )3 ,
(77)
j,k
and
ukj+1 = u(xj + ∆x , tk ) =
ukj + ∆x
∂u
∂x
!
∂2u
∂x2
∆2
+ x
2
j,k
!
∆3
+ x
6
j,k
∂3u
∂x3
∆4
+ x
24
j,k
!
∂4u
∂x4
!
+ O(∆x )5 ,
j,k
(78)
and
ukj−1 = u(xj − ∆x , tk ) =
ukj
− ∆x
∂u
∂x
∆2
+ x
2
j,k
∂2u
∂x2
!
!
∆3
− x
6
j,k
∂3u
∂x3
!
∆4
+ x
24
j,k
∂4u
∂x4
!
+ O(∆x )5 .
j,k
(79)
Using (77, 78,79) and substituting into (67) gives

1  k
∂u
uj + ∆ t
∆t
∂t

κ  k
∂u
uj + ∆ x
2
∆x
∂x
+
ukj − ∆x
∂u
∂x
∆2
+ t
2
j,k
!
!
∆2
+ x
2
j,k
∆2
+ x
2
j,k
∂2u
∂x2
!
∂2u
∂t2
!
∂2u
∂x2
!
j,k
!

− ukj + O(∆t )3  =
∆3
+ x
6
j,k
∆3
− x
6
j,k
∂3u
∂x3
∂3u
∂x3
!
∆4
+ x
24
j,k
!
∆4
+ x
24
j,k
∂4u
∂x4
∂4u
∂x4
!
j,k
!
j,k
− 2ukj

+ O(∆x )5 
from which we obtain
"
∂2u
∂u
−κ 2
∂t
∂x
#
j,k
=−
∆t ∂ 2 u
κ∆2x ∂ 4 u
( 2 )j,k +
(
)j,k .
2 ∂t
12 ∂x4
(80)
Thus (80) shows that as ∆t → 0 and ∆x → 0 the original pde is satisfied, and
the right hand side of (80) implies a truncation error O(∆t , ∆2x ).
38
If we do the same for the Du-Fort Frankel scheme we find that

(u_j^{k+1} − u_j^{k−1}) / (2Δt) − κ (u_{j+1}^k − u_j^{k+1} − u_j^{k−1} + u_{j−1}^k) / Δx²
  = [∂u/∂t − κ ∂²u/∂x² + κ (Δt²/Δx²)(∂²u/∂t²)]_{j,k} + O(Δt², Δx², Δt²/Δx²).   (81)

This shows that the Du-Fort Frankel scheme is only consistent if the step sizes approach zero and also Δt/Δx → 0 simultaneously. Otherwise, if we take step sizes such that Δt/Δx remains constant as both step sizes approach zero, then (81) shows that we are solving the wrong equation.
7.8 Stability
Consider the first order explicit scheme, which can be written as

w_j^{k+1} = β w_{j−1}^k + (1 − 2β) w_j^k + β w_{j+1}^k,   1 ≤ j ≤ N − 1,

with w_0^k, w_N^k given. We can write the above in matrix form as

w^{k+1} = A w^k,   where   A = ( (1−2β)  β
                                 β  (1−2β)  β
                                    β  (1−2β)  β
                                          .   .   .
                                              β  (1−2β) ),   (82)

and w^k = (w_1^k, w_2^k, ..., w_{N−1}^k)^T.
If we recall the results from the discussion of convergence of iterative methods, the above scheme is stable if and only if ||A|| ≤ 1. Now the infinity norm ||A||∞ is defined by

||A||∞ = max_i Σ_j |a_{i,j}|
for an N × N matrix A. Thus from (82) we have
||A||∞ = β + |1 − 2β| + β = 2β + |1 − 2β|.
So if (1 − 2β) ≥ 0 ie β ≤ 1/2 then
||A||∞ = 2β + 1 − 2β = 1.
If (1 − 2β) < 0 then
||A||∞ = 2β + 2β − 1 = 4β − 1 > 1.
Thus we have proved that the explicit scheme is unstable if β > 1/2.
7.8.1 Stability of Crank-Nicolson scheme
The Crank-Nicolson scheme (73) may be written as

−β w_{j−1}^{k+1} + (2 + 2β) w_j^{k+1} − β w_{j+1}^{k+1} = β w_{j−1}^k + (2 − 2β) w_j^k + β w_{j+1}^k,   j = 1, 2, ..., N − 1,   (83)

where β = Δt κ/(Δx)². In matrix form this can be written as

B w^{k+1} = A w^k,   (84)

where w^k = (w_1^k, w_2^k, ..., w_{N−1}^k)^T, B = 2I_{N−1} − βS_{N−1}, A = 2I_{N−1} + βS_{N−1}, I_N is the N × N identity matrix, and S_{N−1} is the (N − 1) × (N − 1) tridiagonal matrix

S_{N−1} = ( −2  1
             1  −2  1
                 1  −2  1
                      .   .   .
                          1  −2 ).
Thus the Crank-Nicolson scheme will be stable if the spectral radius of the matrix
B−1 A is less than unity, ie ρ(B−1 A) < 1. We therefore need the eigenvalues of
the matrix B−1 A. To find these we need to invoke some theorems from linear
algebra. Recall that λ is an eigenvalue of the matrix S, and x a corresponding
eigenvector if
Sx = λx.
Thus for any integer p

S^p x = S^{p−1} S x = S^{p−1} λ x = ... = λ^p x.

Hence the eigenvalues of S^p are λ^p with eigenvector x. Extending this result, if P(S) is the matrix polynomial

P(S) = a₀ S^n + a₁ S^{n−1} + ... + a_n I,

then

P(S)x = P(λ)x,   and   P⁻¹(S)x = (1/P(λ)) x.

Finally if Q(S) is any other polynomial in S then we see that

P⁻¹(S) Q(S) x = (Q(λ)/P(λ)) x.

If we let P = B(S_{N−1}) = 2I_{N−1} − βS_{N−1}, and Q = A(S_{N−1}) = 2I_{N−1} + βS_{N−1}, then the eigenvalues of the matrix B⁻¹A are given by

µ = (2 + βλ) / (2 − βλ),

where λ is an eigenvalue of the matrix S_{N−1}.
Now the eigenvalues of the N × N tridiagonal matrix

T = ( a  b
      c  a  b
         c  a  b
             .  .  .
                c  a )

can be shown to be given by

λ = λ_n = a + 2√(bc) cos(nπ/(N + 1)),   n = 1, 2, ..., N.

Hence the eigenvalues of S_{N−1} are

λ_n = −4 sin²(nπ/(2N)),   n = 1, 2, ..., N − 1,

and so the eigenvalues of B⁻¹A are

µ_n = (2 − 4β sin²(nπ/(2N))) / (2 + 4β sin²(nπ/(2N))),   n = 1, 2, ..., N − 1.

Clearly

ρ(B⁻¹A) = max_n |µ_n| < 1   for all β > 0.

This proves that the Crank-Nicolson scheme is unconditionally stable.
7.9 Stability condition allowing exponential growth
In the above discussion of stability we have said that the solution of
\[
w^{k+1} = A\,w^{k}
\]
is stable if ||A|| ≤ 1. This condition does not make allowance for solutions of
the pde which may be growing exponentially in time. A necessary and sufficient
condition for stability when the solution of the pde is increasing exponentially in
time is that
||A|| ≤ 1 + M ∆t = 1 + O(∆t )
where M is a constant independent of ∆x and ∆t .
7.10 von-Neumann stability analysis
The matrix method for analysing the stability of a scheme can become difficult for more complicated schemes and in multi-dimensional situations. A very versatile
tool in this regard is the Fourier method developed by von Neumann. Here initial
values at mesh points are expressed in terms of a finite Fourier series, and we
consider the growth of individual Fourier components. A finite sine or cosine
series expansion in the interval a ≤ x ≤ b takes the form
\[
\sum_n a_n \sin\!\left(\frac{n\pi x}{L}\right),
\quad\text{or}\quad
\sum_n b_n \cos\!\left(\frac{n\pi x}{L}\right),
\]
where $L = b - a$. Now consider an individual component written in complex exponential form at a mesh point $x = x_j = a + j\Delta_x$:
\[
A_n e^{\frac{in\pi x}{L}} = A_n e^{\frac{in\pi a}{L}}\, e^{\frac{in\pi j\Delta_x}{L}} = \tilde{A}_n e^{i\alpha_n j\Delta_x},
\]
where $\alpha_n = n\pi/L$.
Given initial data we can express the initial values as
\[
w_p^0 = \sum_{n=0}^{N} \tilde{A}_n e^{i\alpha_n p\Delta_x},
\qquad p = 0, 1, \ldots, N,
\]
and we have $N+1$ equations to determine the $N+1$ unknowns $\tilde{A}_n$. To find how each Fourier mode develops in time, assume a simple separable solution of the form
\[
w_p^k = e^{i\alpha_n p\Delta_x} e^{\Omega t_k} = e^{i\alpha_n p\Delta_x} e^{\Omega k\Delta_t} = e^{i\alpha_n p\Delta_x}\,\xi^k,
\]
where ξ = eΩ∆t . Here ξ is called the amplification factor. For stability we thus
require |ξ| ≤ 1. If the exact solution of the pde grows exponentially, then the
difference scheme will allow such solutions if
|ξ| ≤ 1 + M ∆t
where M does not depend on ∆x or ∆t .
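As a simple illustration, the first order explicit scheme of 7.8 can be analysed in the same way. Substituting $w_j^k = \xi^k e^{i\alpha_n j\Delta_x}$ into $w_j^{k+1} = \beta w_{j-1}^k + (1-2\beta)w_j^k + \beta w_{j+1}^k$ gives
\[
\xi = 1 + \beta\left(e^{-i\alpha_n\Delta_x} - 2 + e^{i\alpha_n\Delta_x}\right)
= 1 - 4\beta\sin^2\left(\frac{\alpha_n\Delta_x}{2}\right),
\]
and requiring $|\xi| \le 1$ for all $\alpha_n$ recovers the condition $\beta \le 1/2$ obtained earlier with the matrix method.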
7.10.1 Fully implicit scheme
Consider the fully implicit scheme
\[
\frac{w_j^{k+1} - w_j^{k}}{\Delta_t}
= \kappa\,\frac{w_{j+1}^{k+1} - 2w_j^{k+1} + w_{j-1}^{k+1}}{\Delta_x^2}.
\tag{85}
\]
Let
\[
w_j^k = \xi^k e^{i\alpha_n j\Delta_x}.
\]
Then substituting into (85) gives
\[
\frac{1}{\Delta_t}\,\xi^k(\xi - 1)e^{i\alpha_n j\Delta_x}
= \frac{\kappa\,\xi^{k+1}}{\Delta_x^2}\left(e^{-i\alpha_n\Delta_x} - 2 + e^{i\alpha_n\Delta_x}\right)e^{i\alpha_n j\Delta_x}.
\]
Thus with $\beta = \kappa\Delta_t/\Delta_x^2$,
\[
\xi - 1 = \beta\xi\bigl(2\cos(\alpha_n\Delta_x) - 2\bigr)
= -4\beta\xi\sin^2\left(\frac{\alpha_n\Delta_x}{2}\right).
\]
This gives
\[
\xi = \frac{1}{1 + 4\beta\sin^2\left(\frac{\alpha_n\Delta_x}{2}\right)},
\]
and clearly 0 < ξ ≤ 1 for all β > 0 and for all αn . Thus the fully implicit scheme
is unconditionally stable.
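The following short sketch evaluates this amplification factor over a range of wavenumbers; the values of β and the sampling of αn ∆x are illustrative only.

# Sketch: amplification factor xi = 1/(1 + 4*beta*sin^2(theta/2)) of the fully
# implicit scheme, sampled over theta = alpha_n*dx in (0, pi].
import numpy as np

theta = np.linspace(1e-6, np.pi, 200)
for beta in (0.5, 2.0, 50.0):
    xi = 1.0 / (1.0 + 4.0 * beta * np.sin(theta / 2.0) ** 2)
    print(f"beta = {beta:5.1f}: max xi = {xi.max():.4f}, min xi = {xi.min():.4f}")
# The maximum never exceeds 1, consistent with unconditional stability.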
7.10.2 Richardson's scheme
The Richardson scheme is given by
\[
\frac{w_j^{k+1} - w_j^{k-1}}{2\Delta_t}
= \kappa\,\frac{w_{j+1}^{k} - 2w_j^{k} + w_{j-1}^{k}}{\Delta_x^2}.
\tag{86}
\]
Using a von-Neumann analysis and writing
\[
w_p^k = \xi^k e^{i\alpha_n p\Delta_x},
\]
gives after substitution into (86)
\[
e^{i\alpha_n p\Delta_x}\,\xi^{k-1}(\xi^2 - 1)
= \beta\xi^k\left(e^{-i\alpha_n\Delta_x} - 2 + e^{i\alpha_n\Delta_x}\right)e^{i\alpha_n p\Delta_x}.
\]
This gives
\[
\xi^2 - 1 = -4\xi\beta\sin^2\left(\frac{\alpha_n\Delta_x}{2}\right),
\]
where $\beta = 2\kappa\Delta_t/\Delta_x^2$. Thus
\[
\xi^2 + 4\xi\beta\sin^2\left(\frac{\alpha_n\Delta_x}{2}\right) - 1 = 0.
\]
This quadratic has two roots $\xi_1, \xi_2$. The sum and product of the roots are given by
\[
\xi_1 + \xi_2 = -4\beta\sin^2\left(\frac{\alpha_n\Delta_x}{2}\right),
\qquad \xi_1\xi_2 = -1.
\tag{87}
\]
For stability we require $|\xi_1| \le 1$ and $|\xi_2| \le 1$, and (87) shows that if $|\xi_1| < 1$ then $|\xi_2| > 1$, and vice-versa. Also if $\xi_1 = 1$ and $\xi_2 = -1$ then again from (87) we must have $\beta = 0$. Thus the Richardson scheme is unconditionally unstable.
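A quick numerical confirmation (with illustrative values of β and s = sin²(αn ∆x/2)) is to compute the two roots of the quadratic directly:

# Sketch: roots of xi^2 + 4*beta*s*xi - 1 = 0 for the Richardson scheme.
import numpy as np

def richardson_roots(beta, s):
    b = 4.0 * beta * s
    disc = np.sqrt(b * b + 4.0)
    return (-b + disc) / 2.0, (-b - disc) / 2.0

for beta, s in [(0.25, 0.5), (0.5, 1.0), (1.0, 0.1)]:
    x1, x2 = richardson_roots(beta, s)
    print(f"beta = {beta}, s = {s}: |xi1| = {abs(x1):.4f}, |xi2| = {abs(x2):.4f}")
# Since xi1*xi2 = -1, one of the two roots always has modulus greater than one
# whenever s > 0, confirming the instability.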
7.10.3 von Neumann analysis in multi-dimensions
For two or more dimensions, the Fourier method extends easily. For instance for two space dimensions we would seek solutions of the form
\[
w_{p,q}^k = e^{i\alpha_n p\Delta_x}\, e^{i\gamma_n q\Delta_y}\,\xi^k.
\]
7.11 Advanced iterative methods
We have looked at a number of simple iterative techniques such as point SOR,
line SOR and ADI. In this section we will consider more advanced techniques
such as the conjugate gradient method, and minimal residual methods.
7.11.1 Conjugate Gradient Method
This method was first put forward by Hestenes & Stiefel (1952) and was rediscovered and popularised in the 1970s. The method has the important property that it converges to the exact solution of the linear system in a finite number of iterations, in the absence of roundoff error. Used on its own the
method has a number of drawbacks, but combined with a good preconditioner it
is an effective tool for solving linear equations iteratively.
Consider
\[
Ax = b,
\tag{88}
\]
where $A$ is an $n\times n$ symmetric and positive definite matrix. Recall $A$ is symmetric if $A^T = A$ and positive definite if $x^T Ax > 0$ for any non-zero vector $x$. Consider the quadratic form
\[
\phi(x) = \tfrac{1}{2}x^T Ax - x^T b.
\tag{89}
\]
We see that
\[
\phi(x) = \tfrac{1}{2}x^T(Ax - b) - \tfrac{1}{2}x^T b,
\]
and from the positive definiteness of $A$ the quadratic form has a unique minimum $-\tfrac{1}{2}b^T A^{-1}b$ when $x = A^{-1}b$.
We will try and find a solution to (88) iteratively. Let $x_k$ be the $k$th iterate. Now we know that $\phi$ decreases most rapidly in the direction of the negative gradient, and from (89) we have
\[
-\nabla\phi(x_k) = b - Ax_k = r_k.
\]
If the residual $r_k \ne 0$ then we can find an $\alpha$ such that $\phi(x_k + \alpha r_k) < \phi(x_k)$. In the method of steepest descent we choose $\alpha$ to minimize
\[
\phi(x_k + \alpha r_k)
= \tfrac{1}{2}(x_k^T + \alpha r_k^T)A(x_k + \alpha r_k) - (x_k^T + \alpha r_k^T)b
= \tfrac{1}{2}x_k^T Ax_k + \alpha r_k^T Ax_k + \tfrac{1}{2}\alpha^2 r_k^T Ar_k - x_k^T b - \alpha r_k^T b,
\tag{90}
\]
where we have used $x_k^T Ar_k = (x_k^T Ar_k)^T = r_k^T Ax_k$. Differentiating (90) with respect to $\alpha$ and equating to zero to find a minimum gives
\[
0 = r_k^T Ax_k + \alpha r_k^T Ar_k - r_k^T b,
\]
which gives
\[
\alpha_k = \frac{r_k^T r_k}{r_k^T Ar_k}.
\tag{91}
\]
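A minimal sketch of the resulting steepest descent iteration is given below; the small test matrix, right-hand side, starting guess and tolerance are illustrative choices.

# Sketch: steepest descent with the optimal step length (91).
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=500):
    x = x0.copy()
    for _ in range(max_iter):
        r = b - A @ x                    # residual, the direction of steepest descent
        if np.linalg.norm(r) < tol:
            break
        alpha = (r @ r) / (r @ (A @ r))  # step length from (91)
        x = x + alpha * r
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])   # small symmetric positive definite example
b = np.array([1.0, 2.0])
print(steepest_descent(A, b, np.zeros(2)))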
The problem with the steepest descent algorithm as constructed above is that convergence can be slow, especially if $A$ is ill-conditioned. The condition number of a matrix $A$ is defined by
\[
\kappa(A) = \frac{\max_i \lambda_i}{\min_i \lambda_i},
\]
where $\lambda_i$ is an eigenvalue of the matrix $A$. The matrix is said to be ill-conditioned
if κ is large. The steepest descent algorithm works well if κ is small, but for the system of equations arising from a second-order finite difference approximation to Poisson's equation, κ is very large. In a sense the steepest descent method is slow because the search directions are too similar. We can try and search for the new iterate in another direction, say $p_k$. To minimize $\phi(x_{k-1} + \alpha p_k)$ we must choose, as in (91),
\[
\alpha_k = \frac{p_k^T r_{k-1}}{p_k^T Ap_k},
\]
where now the search direction is different from the direction of the gradient. Then
\[
\phi(x_{k-1} + \alpha_k p_k)
= \tfrac{1}{2}(x_{k-1} + \alpha_k p_k)^T A(x_{k-1} + \alpha_k p_k) - (x_{k-1} + \alpha_k p_k)^T b
= \phi(x_{k-1}) - \frac{1}{2}\,\frac{(p_k^T r_{k-1})^2}{p_k^T Ap_k},
\]
and $\phi$ is reduced provided that $p_k$ is not orthogonal to $r_{k-1}$. Thus the algorithm loops over
\[
r_{k-1} = b - Ax_{k-1},
\qquad
\alpha_k = \frac{p_k^T r_{k-1}}{p_k^T Ap_k},
\qquad
x_k = x_{k-1} + \alpha_k p_k.
\tag{92}
\]
Thus $x_k$ is a linear combination of the vectors $p_1, p_2, \ldots, p_k$. How can we choose the $p_i$? It turns out that we can choose them so that each $x_k$ solves the problem
\[
\min \phi(x), \qquad x \in \mathrm{span}(p_1, p_2, \ldots, p_k),
\]
when we choose $\alpha$ to solve
\[
\min_\alpha \phi(x_{k-1} + \alpha p_k),
\]
provided that we make the $p_i$ A-conjugate. Define an $n\times k$ matrix $P_k$ by $P_k = [p_1, p_2, \ldots, p_k]$. Then if $x \in \mathrm{span}(p_1, p_2, \ldots, p_k)$ there exists a $(k-1)$-vector $y$ such that
\[
x = P_{k-1}y + \alpha p_k,
\]
and
\[
\phi(x) = \phi(P_{k-1}y) + \alpha\, y^T P_{k-1}^T Ap_k
+ \frac{\alpha^2}{2}\, p_k^T Ap_k - \alpha\, p_k^T b.
\]
Choosing the $\{p_i\}$ so that
\[
P_{k-1}^T Ap_k = 0
\]
decouples the global part of the minimization from the part in the $p_k$ direction. Then the required global minimization of $\phi(x)$ is obtained by choosing
\[
P_{k-1}y = x_{k-1},
\qquad\text{and}\qquad
\alpha_k = \frac{p_k^T b}{p_k^T Ap_k} = \frac{p_k^T r_{k-1}}{p_k^T Ap_k},
\]
as before.
From the A-conjugacy condition $P_{k-1}^T Ap_k = 0$ above,
\[
p_i^T Ap_k = 0, \qquad i = 1, 2, \ldots, k-1,
\]
ie $p_k$ is A-conjugate to $p_1, p_2, \ldots, p_{k-1}$. Thus the $\{p_i\}$ are linearly independent and hence $x_n$, which is a linear combination of $p_1, p_2, \ldots, p_n$, is the exact solution of
\[
Ax = b.
\]
This condition does not determine $p_k$ uniquely. To do so we choose $p_k$ to be closest to $r_{k-1}$, because $r_{k-1}$ gives the most rapid reduction of $\phi(x)$.
It can be shown (eg, see Golub & van Loan) that
• the residuals $r_1, r_2, \ldots, r_{k-1}$ are orthogonal,
• $p_i^T r_j = 0$, $\quad i = 1, 2, \ldots, j$,
• $r_j = r_{j-1} - \alpha_j Ap_j$,
• $p_k = r_{k-1} + \beta_k p_{k-1}$ where
\[
\beta_k = -\frac{p_{k-1}^T Ar_{k-1}}{p_{k-1}^T Ap_{k-1}}.
\]
7.11.2 Conjugate Gradient Algorithm
Using these results we obtain the following algorithm

r0 = b − Ax0
p0 = r0
FOR i = 0, ..., n DO
BEGIN
  αi = (ri , ri )/(pi , Api )
  xi+1 = xi + αi pi
  ri+1 = ri − αi Api
  βi = (ri+1 , ri+1 )/(ri , ri )
  pi+1 = ri+1 + βi pi
END
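A minimal Python sketch of this algorithm is given below; the small test matrix, right-hand side and stopping tolerance are illustrative choices rather than part of the method.

# Sketch: the conjugate gradient algorithm above, for a small SPD test system.
import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10):
    x = x0.copy()
    r = b - A @ x
    p = r.copy()
    rr = r @ r
    for _ in range(len(b)):              # at most n steps in exact arithmetic
        Ap = A @ p
        alpha = rr / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rr_new = r @ r
        if np.sqrt(rr_new) < tol:
            break
        beta = rr_new / rr
        p = r + beta * p
        rr = rr_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(conjugate_gradient(A, b, np.zeros(2)))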
In practice $x_n \ne x$ because of rounding errors and the residuals are not orthogonal. It can be shown that
\[
\|x - x_k\|_A \le \left[\frac{1 - \sqrt{\kappa}}{1 + \sqrt{\kappa}}\right]^{2k}\|x - x_0\|_A,
\]
so that convergence can be slow for very large $\kappa$. Preconditioning of $A$ is required for the CG method to be useful.
7.11.3 Implementation issues
Note that the algorithm requires the matrix vector product Ap to be computed.
The matrix A is usually very large and sparse. As far as implementation of the
algorithm is concerned we do not explicitly construct the matrix A, only the effect
of A on a vector. Consider, for example, a second-order discretization applied to
say Poisson’s equation, ∇2 φ = f (x, y) with Dirichlet boundary conditions. The
discretization gives the set of equations
\[
\frac{w_{i+1,j} - 2w_{i,j} + w_{i-1,j}}{\Delta_x^2}
+ \frac{w_{i,j+1} - 2w_{i,j} + w_{i,j-1}}{\Delta_y^2} = f_{i,j},
\qquad 2 \le i \le N-1, \quad 2 \le j \le M-1,
\]
and
\[
w_{1,j} = fl(j), \quad w_{N,j} = fr(j), \qquad 1 \le j \le M,
\]
\[
w_{i,1} = gb(i), \quad w_{i,M} = gt(i), \qquad 2 \le i \le N-1,
\]
and $fl(y)$, $fr(y)$, $gb(x)$, $gt(x)$ are known functions given on the boundaries. Here we may define $x$ to be the $NM \times 1$ vector of unknowns
\[
x = (w_{1,1}, w_{2,1}, \ldots, w_{N,1}, w_{1,2}, \ldots, w_{i,j}, \ldots, w_{N,M})^T.
\]
In fact we may identify the $q$th component $x_q$ of $x$ by
\[
x_q = w_{i,j} \qquad\text{where } q = i + N(j-1).
\]
Then the matrix vector product $Ax$ will have elements $d_q$ with $q = i + N(j-1)$ given by
\[
d_q = \frac{w_{i+1,j} - 2w_{i,j} + w_{i-1,j}}{\Delta_x^2}
+ \frac{w_{i,j+1} - 2w_{i,j} + w_{i,j-1}}{\Delta_y^2},
\]
if $i, j$ identifies an interior point of the domain, and
\[
d_{1+N(j-1)} = w_{1,j}, \quad d_{N+N(j-1)} = w_{N,j}, \qquad 1 \le j \le M,
\]
\[
d_{i} = w_{i,1}, \quad d_{i+(M-1)N} = w_{i,M}, \qquad 2 \le i \le N-1,
\]
for points on the boundary.
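A matrix-free implementation of this product might look as follows; the routine name apply_A and its arguments are hypothetical, and the vector layout q = i + N(j − 1) described above is assumed.

# Sketch: form d = A x for the discrete operator above without assembling A.
import numpy as np

def apply_A(x, N, M, dx, dy):
    w = x.reshape(M, N)          # w[j-1, i-1] = w_{i,j}, since q = i + N*(j-1)
    d = np.empty_like(w)
    # interior points: five-point Laplacian
    d[1:-1, 1:-1] = ((w[1:-1, 2:] - 2.0 * w[1:-1, 1:-1] + w[1:-1, :-2]) / dx**2
                     + (w[2:, 1:-1] - 2.0 * w[1:-1, 1:-1] + w[:-2, 1:-1]) / dy**2)
    # boundary points: identity rows, as described above
    d[0, :] = w[0, :]
    d[-1, :] = w[-1, :]
    d[:, 0] = w[:, 0]
    d[:, -1] = w[:, -1]
    return d.reshape(-1)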
One other point to re-emphasise is that in the above analysis/discussion the matrix A is assumed to be symmetric and positive definite. Usually the systems of equations that we come across are not symmetric. Here one needs to consider alternative algorithms, such as the GMRES method discussed later, or one could try and use the conjugate gradient algorithm with the system
\[
A^T Ax = A^T b.
\]
The latter is not recommended as the condition number of the matrix $A^T A$ deteriorates if the original matrix A is ill-conditioned to start with.
7.11.4 CG, alternative derivation
Let $A$ be a symmetric positive definite matrix, ie
\[
A = A^T, \qquad x^T Ax > 0
\]
for any non-zero vector $x$. Also define an inner product
\[
(x, y) = x^T y.
\]
We saw that the functional
\[
F(x) = \tfrac{1}{2}(x, Ax) - (b, x),
\]
is minimized when x is a solution of
Ax = b
The steepest descent method is defined by looking at iterations of the form
\[
x^{(k+1)} = x^{(k)} + \alpha_k r^{(k)}
\]
where
\[
r^{(k)} = b - Ax^{(k)}
\]
is the residual, which points in the direction of steepest descent of $F(x)$. The $\alpha$'s are chosen to minimize $F(x^{(k+1)})$, and come out to be
\[
\alpha_k = \frac{(r^{(k)}, r^{(k)})}{(r^{(k)}, Ar^{(k)})}.
\]
The convergence of this technique can be slow, especially if the matrix is ill-conditioned. By choosing the directions differently we obtain the conjugate gradient algorithm.
This works as follows. Let
\[
x^{(k+1)} = x^{(k)} + \alpha_k p^{(k)}
\]
where $p^{(k)}$ is a direction vector. For the Conjugate Gradient method we let
\[
p^{(k)} = r^{(k)} + \beta_k p^{(k-1)}
\]
for $k \ge 1$, and $\beta_k$ is chosen so that $p^{(k)}$ is A-conjugate to $p^{(k-1)}$, ie
\[
(p^{(k)}, Ap^{(k-1)}) = 0.
\]
This gives
\[
\beta_k = -\frac{(r^{(k)}, Ap^{(k-1)})}{(p^{(k-1)}, Ap^{(k-1)})}.
\]
As before $\alpha_k$ is chosen to minimize $F(x^{(k+1)})$, and comes out to be
\[
\alpha_k = \frac{(p^{(k)}, r^{(k)})}{(p^{(k)}, Ap^{(k)})}.
\]
Collecting all the results together gives the CG algorithm
\[
\begin{aligned}
x^{(0)} &\quad\text{arbitrary}\\
x^{(k+1)} &= x^{(k)} + \alpha_k p^{(k)}, && k = 0, 1, \ldots\\
p^{(k)} &= \begin{cases} r^{(k)}, & k = 0\\ r^{(k)} + \beta_k p^{(k-1)}, & k = 1, 2, \ldots \end{cases}\\
\beta_k &= -\frac{(r^{(k)}, Ap^{(k-1)})}{(p^{(k-1)}, Ap^{(k-1)})}, && k = 1, 2, \ldots\\
r^{(k)} &= b - Ax^{(k)}, && k = 0, 1, \ldots\\
\alpha_k &= \frac{(p^{(k)}, r^{(k)})}{(p^{(k)}, Ap^{(k)})}, && k = 0, 1, \ldots
\end{aligned}
\]
It can be shown that the $\alpha_k$, $\beta_k$ and $r^{(k)}$ are equivalent to
\[
\begin{aligned}
\beta_k &= \frac{(r^{(k)}, r^{(k)})}{(r^{(k-1)}, r^{(k-1)})}, && k = 1, 2, \ldots\\
\alpha_k &= \frac{(r^{(k)}, r^{(k)})}{(p^{(k)}, Ap^{(k)})}, && k = 0, 1, \ldots\\
r^{(k)} &= r^{(k-1)} - \alpha_{k-1} Ap^{(k-1)}, && k = 1, 2, \ldots
\end{aligned}
\]
Also the following relations hold
\[
(r^{(i)}, r^{(j)}) = 0 \quad i \ne j,
\qquad
(p^{(i)}, Ap^{(j)}) = 0 \quad i \ne j,
\qquad
(r^{(i)}, Ap^{(j)}) = 0 \quad i \ne j \text{ and } i \ne j+1.
\]
The residual vectors $r^{(k)}$ are mutually orthogonal and the direction vectors $p^{(k)}$ are mutually A-conjugate. Hence the name conjugate gradient.
7.11.5 Preconditioning
The conjugate gradient method combined with preconditioning is a very effective tool for solving linear equations iteratively. Preconditioning helps to redistribute the eigenvalues of the iteration matrix and this can significantly enhance the convergence properties. Suppose that we are dealing with the linear system
\[
Ax = b.
\tag{93}
\]
A preconditioning matrix $M$ is such that $M$ is close to $A$ in some sense, and easily invertible. For instance if we were dealing with the matrix $A$ arising from a discretization of the Laplace operator, then one could take $M$ to be any one of the matrices stemming from one of the classical iteration schemes such as Jacobi, SOR, or ADI, say. The system (93) can be replaced by
\[
M^{-1}Ax = M^{-1}b.
\tag{94}
\]
The conjugate gradient algorithm is then applied to (94), leading to the algorithm

x0 = guess
r0 = b − Ax0
solve Mz0 = r0 and put p0 = z0
FOR k = 1, n
  αk = (zk−1 , rk−1 )/(pk−1 , Apk−1 )
  xk = xk−1 + αk pk−1
  rk = rk−1 − αk Apk−1
  Solve Mzk = rk
  βk = (zk , rk )/(zk−1 , rk−1 )
  pk = zk + βk pk−1
END
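A sketch of this preconditioned loop, using a simple Jacobi (diagonal) preconditioner as an illustrative choice of M, is given below; the test matrix and tolerance are again illustrative.

# Sketch: preconditioned conjugate gradient with M = diag(A) (Jacobi).
import numpy as np

def preconditioned_cg(A, b, x0, tol=1e-10, max_iter=200):
    M_diag = np.diag(A)              # Jacobi preconditioner
    x = x0.copy()
    r = b - A @ x
    z = r / M_diag                   # solve M z = r
    p = z.copy()
    zr = z @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = zr / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = r / M_diag
        zr_new = z @ r
        beta = zr_new / zr
        p = z + beta * p
        zr = zr_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
print(preconditioned_cg(A, b, np.zeros(2)))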