EEE484
COMPUTATIONAL METHODS
Course Notebook
February 3, 2009
................................................................
.........aaaaaaaaaaaaaaaa..............AAAAAAAAAAAAAAAA.........
.......aaaaaaaaaaaaaaaaaaaa..........AAAAAAAAAAAAAAAAAAAA.......
.....aaaaaaaaaaaaaaaaaaaaaaaa......AAAAAAAAAAAAAAAAAAAAAAAA.....
...aaaaaaaaaabbbbbbbbaaaaaaaa......AAAAAAAABBBBBBBBAAAAAAAAAA...
...aaaaaabbbbbbbbbbbbbbbbaaaa......AAAABBBBBBBBBBBBBBBBAAAAAA...
...aaaabbbbbbbbbbccbbbbbbbbaaaa..AAAABBBBBBBBCCBBBBBBBBBBAAAA...
...aaaabbbbccccccccccccbbbbaaaa..AAAABBBBCCCCCCCCCCCCBBBBAAAA...
.aaaabbbbccccddddddddddccbbbbaa..AABBBBCCDDDDDDDDDDCCCCBBBBAAAA.
.aaaabbccccddddeeeeeeddddccbbaa..AABBCCDDDDEEEEEEDDDDCCCCBBAAAA.
.aaaabbccddddeeffffffffeeddccaa..AACCDDEEFFFFFFFFEEDDDDCCBBAAAA.
.aabbbbccddeeffgghhhhggffeeccbb..BBCCEEFFGGHHHHGGFFEEDDCCBBBBAA.
.aabbccddeeffggiijjkkiiggeeddbb..BBDDEEGGIIKKJJIIGGFFEEDDCCBBAA.
.aabbccddeeffhhjjmmoolliiffddbb..BBDDFFIILLOOMMJJHHFFEEDDCCBBAA.
.aabbccddeeffhhkkpp--oojjggddbb..BBDDGGJJOO++PPKKHHFFEEDDCCBBAA.
.aabbccddeeffhhjjmmoolliiffddbb..BBDDFFIILLOOMMJJHHFFEEDDCCBBAA.
.aabbccddeeffggiijjkkiiggeeddbb..BBDDEEGGIIKKJJIIGGFFEEDDCCBBAA.
.aabbbbccddeeffgghhhhggffeeccbb..BBCCEEFFGGHHHHGGFFEEDDCCBBBBAA.
.aaaabbccddddeeffffffffeeddccaa..AACCDDEEFFFFFFFFEEDDDDCCBBAAAA.
.aaaabbccccddddeeeeeeddddccbbaa..AABBCCDDDDEEEEEEDDDDCCCCBBAAAA.
.aaaabbbbccccddddddddddccbbbbaa..AABBBBCCDDDDDDDDDDCCCCBBBBAAAA.
...aaaabbbbccccccccccccbbbbaaaa..AAAABBBBCCCCCCCCCCCCBBBBAAAA...
...aaaabbbbbbbbbbccbbbbbbbbaaaa..AAAABBBBBBBBCCBBBBBBBBBBAAAA...
...aaaaaabbbbbbbbbbbbbbbbaaaa......AAAABBBBBBBBBBBBBBBBAAAAAA...
...aaaaaaaaaabbbbbbbbaaaaaaaa......AAAAAAAABBBBBBBBAAAAAAAAAA...
.....aaaaaaaaaaaaaaaaaaaaaaaa......AAAAAAAAAAAAAAAAAAAAAAAA.....
.......aaaaaaaaaaaaaaaaaaaa..........AAAAAAAAAAAAAAAAAAAA.......
.........aaaaaaaaaaaaaaaa..............AAAAAAAAAAAAAAAA.........
................................................................
http://www1.gantep.edu.tr/~andrew/eee484/
Dr Andrew Beddall
[email protected]
Department of Electrical and Electronics Engineering,
University of Gaziantep, Turkey.
Preamble
This notebook presents notes, exercises and example exam questions for the course EEE484.
Fortran and C++ solutions can be found in the downloads section of the course web-site.
The content of this document is automatically built from the course web-site
( this build is dated Tue Feb 3 11:45:18 EET 2009 ).
You can download the latest version, in postscript or pdf format, from the course web-site.
Only 8 topics are present in this version (more coming soon):
• Lecture 1 - Numerical Truncation, Precision and Overflow
• Lecture 2 - Numerical Differentiation
• Lecture 3 - Roots, Maxima, Minima (closed methods)
• Lecture 4 - Roots, Maxima, Minima (open methods)
• Lecture 5 - Numerical Integration: Trapezoidal and Simpson’s formulae
• Lecture 6 - Solution of D.E.s: Runge-Kutta, and Finite-Difference
• Lecture 7 - Random Variables and Frequency Experiments
• Lecture 8 - Monte-Carlo Methods
Title page figure: Numerical solution for the potential around a dipole.
Contents

1 Numerical Truncation, Precision and Overflow                            1
  1.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . .  1
  1.2 Lecture Notes . . . . . . . . . . . . . . . . . . . . . . . . . .   1
  1.3 Lab Exercises . . . . . . . . . . . . . . . . . . . . . . . . . .   6
  1.4 Lab Solutions . . . . . . . . . . . . . . . . . . . . . . . . . .   7
  1.5 Example exam questions . . . . . . . . . . . . . . . . . . . . . .  9

2 Numerical Differentiation                                              10
  2.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . 10
  2.2 Lecture Notes . . . . . . . . . . . . . . . . . . . . . . . . . .  10
  2.3 Lab Exercises . . . . . . . . . . . . . . . . . . . . . . . . . .  15
  2.4 Lab Solutions . . . . . . . . . . . . . . . . . . . . . . . . . .  16
  2.5 Example exam questions . . . . . . . . . . . . . . . . . . . . . . 19

3 Roots, Maxima, Minima (closed methods)                                 21
  3.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . 21
  3.2 Lecture Notes . . . . . . . . . . . . . . . . . . . . . . . . . .  21
  3.3 Lab Exercises . . . . . . . . . . . . . . . . . . . . . . . . . .  27
  3.4 Lab Solutions . . . . . . . . . . . . . . . . . . . . . . . . . .  28
  3.5 Example exam questions . . . . . . . . . . . . . . . . . . . . . . 30

4 Roots, Maxima, Minima (open methods)                                   31
  4.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . 31
  4.2 Lecture Notes . . . . . . . . . . . . . . . . . . . . . . . . . .  31
  4.3 Lab Exercises . . . . . . . . . . . . . . . . . . . . . . . . . .  37
  4.4 Lab Solutions . . . . . . . . . . . . . . . . . . . . . . . . . .  38
  4.5 Example exam questions . . . . . . . . . . . . . . . . . . . . . . 41

5 Numerical Integration: Trapezoidal and Simpson’s formulae              43
  5.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . 43
  5.2 Lecture Notes . . . . . . . . . . . . . . . . . . . . . . . . . .  43
  5.3 Lab Exercises . . . . . . . . . . . . . . . . . . . . . . . . . .  50
  5.4 Lab Solutions . . . . . . . . . . . . . . . . . . . . . . . . . .  51
  5.5 Example exam questions . . . . . . . . . . . . . . . . . . . . . . 54

6 Solution of D.E.s: Runge-Kutta, and Finite-Difference                  55
  6.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . 55
  6.2 Lecture Notes . . . . . . . . . . . . . . . . . . . . . . . . . .  55
  6.3 Lab Exercises . . . . . . . . . . . . . . . . . . . . . . . . . .  64
  6.4 Lab Solutions . . . . . . . . . . . . . . . . . . . . . . . . . .  66
  6.5 Example exam questions . . . . . . . . . . . . . . . . . . . . . . 68

7 Random Variables and Frequency Experiments                             70
  7.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . 70
  7.2 Lecture Notes . . . . . . . . . . . . . . . . . . . . . . . . . .  70
  7.3 Lab Exercises . . . . . . . . . . . . . . . . . . . . . . . . . .  82
  7.4 Lab Solutions . . . . . . . . . . . . . . . . . . . . . . . . . .  83
  7.5 Example exam questions . . . . . . . . . . . . . . . . . . . . . . 85

8 Monte-Carlo Methods                                                    86
  8.1 Topics Covered . . . . . . . . . . . . . . . . . . . . . . . . . . 86
  8.2 Lecture Notes . . . . . . . . . . . . . . . . . . . . . . . . . .  86
  8.3 Lab Exercises . . . . . . . . . . . . . . . . . . . . . . . . . .  93
  8.4 Lab Solutions . . . . . . . . . . . . . . . . . . . . . . . . . .  94
  8.5 Example exam questions . . . . . . . . . . . . . . . . . . . . . . 96

A Linux Tutorial                                                         97
1 Numerical Truncation, Precision and Overflow

1.1 Topics Covered
o Introduction to numerical methods; Taylor’s expansion and truncation errors; round-off errors, overflow;
precision of data types in Fortran and C++.
1.2
Lecture Notes
Introduction
It is important to understand that, in general, numerical methods are not exact; neither are the machines
(computers) that perform the numerical calculations for us. In this lecture, we will look at the nature of
truncation errors and round-off errors. An understanding of these sources of errors in numerical methods is
as important as an understanding of the methods themselves.
Numerical Methods
We apply numerical techniques to solve numerical problems when analytical solutions are difficult or inconvenient. A simple example is the computation of the first derivative of a function f(x). Calculus gives us
an analytical method for forming an expression for the derivative, however, such analysis for some functions may be difficult, impossible, or inconvenient. A simple numerical solution uses the Forward-Difference
Approximation (FDA) that approximates the derivative by taking the gradient of the function f(x) in the
region x to x+h:
FDA = ( f(x+h) - f(x) ) / h
where h is small but not zero.
For example if f(x) = 2x^2 + 4x + 6 and we wish to determine the first derivative evaluated at x=3, the
FDA (using h=0.01) gives:
( (2(3.01)^2 + 4(3.01) + 6) - (2(3.00)^2 + 4(3.00) + 6) ) / 0.01 = 16.02.
Of course this is only an approximation (the true value, by calculus, is 16).
gnuplot> plot [0:4] 2*x**2+4*x+6
Truncation Errors
The error in the above approximation can be written as FDA - f’(x) = 16.02 - 16 = 0.02. This is called a
truncation error as it is due to the truncation of higher orders in the exact expression for the first derivative.
We can see the form of the truncation error in the FDA by considering Taylor’s expansion:
f(x+h) = f(x) + h.f‘(x)/1! + h^2.f‘‘(x)/2! + h^3.f‘‘‘(x)/3! + ....
Rearrange for the FDA:
( f(x+h) - f(x) ) / h = f‘(x) + h.f‘‘(x)/2 + h^2.f‘‘‘(x)/6 + ....
=> FDA =   f‘(x)   +   (h/2).f‘‘(x) + O(h^2)
          --------     ---------------------
          the          the truncation error
          derivative   in the FDA
We see that the FDA gives the first derivative plus some extra terms in the series. The error in the approximation FDA - f‘(x) is therefore (h/2).f‘‘(x) + O(h^2). This can be checked numerically with the above
example: (h/2).f‘‘(x) = (0.01/2) x (4) = 0.02 (as found above). The truncation error in the FDA is proportional to h; the FDA is therefore called a first-order approximation. Higher-order methods have truncation
errors that are proportional to higher powers of h and therefore yield smaller truncation errors (when h is
less than one).
We will investigate the round-off error in the above calculation at the end of the next section.
Computer Precision (Round-off Errors)
Numerical methods are implemented in computer programs where the numerical calculations can be performed quickly and conveniently. However, numbers are stored in computer memory with a limited precision;
the loss of precision of a value is called a round-off error. Round-off errors can occur when a value is initially
assigned, and can be compounded when values are combined in arithmetic operations. Iteration is common
in computational methods and so it is important to minimise compounded round-off errors.
As round-off errors can be a significant source of error in a numerical method (in addition to the truncation
error) we will look more closely at the nature of the round-off error and how it can be reduced.
A binary representation is used to store numbers in computer memory. For example the binary number
11.011 represents exactly the decimal number 3.375:
1x2 + 1x1 + 0x(1/2) + 1x(1/4) + 1x(1/8)  =  2 + 1 + 0 + 0.25 + 0.125  =  3.375
Similarly the decimal value 0.3125 can be expanded to 0.25 + 0.0625 = 1/4 + 1/16 that can be stored exactly
in binary as 0.0101. However, given a limited number of binary digits, it is possible that even a rational
decimal number might not be stored precisely in binary. For example there is no precise representation for
0.3; the nearest representation with 8 bits is 0.01001101, which gives 0.30078125. The precision increases as
more binary digits are used, but there is always a round-off error. In general, the only real numbers that
can be represented exactly in the computer’s memory are those that can be written in the form m/2^k where
m and k are integers; however, again there is a limit to the set of numbers that are included in this group
due to the limited number of binary digits used to store the value.
Floating-Point Representation
Computers store REAL numbers (as opposed to INTEGER numbers) in the floating-point representation
value = m x b^e, where m is the mantissa, b is the base (= 2 in computers) and e is the exponent. In Fortran,
a type "real" number is stored in 32 binary bits (4 bytes) [this is equivalent to a "float" in C/C++]. To
allow for a large exponent range the binary bits available for storage are shared between the mantissa and
the exponent of the number. For example the number 413.26 is represented by a mantissa part and an
exponent part as 0.41326x10^3. The division of the 32 binary bits is as follows: 8 bits are used to store
the exponent, 1 bit for the sign of the number, and 23 bits for the mantissa. The precision of the storage
of real data is therefore limited by the 23 bits used to store the mantissa.
In Fortran the number of binary bits used to store type real numbers can be increased from the default 32
to 64 or 128 by declaring the type "real" data with the kind specifier. The default single-precision data has
kind=4 where each data item is stored in 4 bytes (32 binary bits) of memory. Double-precision data (kind=8) is
allocated 8 bytes (64 binary bits) [this is equivalent to a "double" in C/C++] and quad-precision (kind=16)
16 bytes (128 binary bits). Double precision has about twice the precision of single precision and a much
larger range in the exponent; quad precision has more than four times the precision and a very large range
in the exponent. The three real kinds are illustrated in the table below.
+------------------+---------------------+------------------+---------+----------------+
| Type and Kind    | Memory allocation   | Precision        | Range   | C/C++          |
+------------------+---------------------+------------------+---------+----------------+
| real (kind=4 )*  | 4 bytes ( 32 bits)  | 7 s.f. (Single)  | 10^38   | "float"        |
| real (kind=8 )   | 8 bytes ( 64 bits)  | 15 s.f. (Double) | 10^308  | "double"       |
| real (kind=16)   | 16 bytes (128 bits) | 34 s.f. (Quad)   | 10^4931 | "long double"+ |
+------------------+---------------------+------------------+---------+----------------+
* default kind in Fortran.   s.f. = "significant figures".   + only on 64-bit platforms.
A limitation is also placed on the range of values that can be stored; this is illustrated for single-precision
type real data below:

                                      underflow
  overflow <----------------------->--<-----------------------> overflow
         -10^38              -10^-45  +10^-45              +10^38

If a number exceeds the permitted range, for example -10^38 to +10^38, then it cannot be stored; such a
situation results in the program continuing with wrong values or terminating with an overflow error. There
is also a limit to the representation of very small real numbers; the limit for single-precision real data is
about -10^-45 to about +10^-45; attempting to store a value smaller than this results in an underflow error.
Similarly, integer type data can be stored in 1, 2, 4 or 8 bytes, each giving a larger range of values that
can be represented. The default integer kind in Fortran is 4 bytes (kind=4) ["int" in C/C++].
As integer numbers are exact there is no corresponding precision; the only limitation is that of range
(integer overflow). The four kind types for integers, and the corresponding ranges, are summarised in the
table below.
+------------------+-------------------+----------------------------+-----------------+
| Type and Kind    | Memory allocation | Range                      | C/C++ (signed)  |
+------------------+-------------------+----------------------------+-----------------+
| integer (kind=1) | 1 byte ( 8 bits)  | -128 to 127                | "char"          |
| integer (kind=2) | 2 bytes (16 bits) | -32768 to 32767            | "short"         |
| integer (kind=4)*| 4 bytes (32 bits) | -2147483648 to 2147483647  | "int"           |
| integer (kind=8) | 8 bytes (64 bits) | about +- 9x10^18           | "long"+         |
+------------------+-------------------+----------------------------+-----------------+
* default kind in Fortran.   + only on 64-bit platforms.
kind specification in Fortran [you can investigate the C/C++ equivalent in your own time]
Examples of the declaration of data of different kinds:
real            :: A    ! Default (single precision)
real (kind=4)   :: B    ! Single precision
real (kind=8)   :: C    ! Double precision
real (kind=16)  :: D    ! Quad precision

Examples of assignments:

A = 1.2345678_4   or simply 1.2345678           (Single precision)
C = 1.234567890123456_8                         (Double precision)
D = 1.2345678901234567890123456789012345_16     (Quad precision)
Note that the underscore symbol is used to define the precision of the constant; if this is not used then some
precision might be lost, or unpredictable values assigned to some of the least significant digits. For example:
real(kind=8) :: C = 1.11111111111111_8    assigns C with 1.11111111111111
whereas
real(kind=8) :: C = 1.11111111111111      assigns C with 1.11111116409302
and
real(kind=8) :: C = 1.111111              assigns C with 1.11111104488373
In the last two cases the trailing 7 or 8 digits are not meaningful: the constant is first rounded to single precision and then converted, so the trailing digits are effectively garbage.
The E symbol can be used to specify a power-of-ten exponent:
A = 1.234568E38
or
A = 1.234568E38_4
C = 1.23456789012346E308_8
D = 1.234567890123456789012345678901235E4931_16
We will see later in the course how double- and quad-precision can greatly reduce round-off errors in numerical methods.
Remember: although double- and quad-precision can reduce round-off errors they have no effect on the
size of truncation errors; truncation errors are inherent to the numerical method and not to the internal
representation of numbers in a computer.
Examples
1. The expression ( (a+b)^2 - 2.a.b - b^2 ) / a^2 reduces to a^2 / a^2 = 1. But computed in a machine with
limited precision it can give unexpected results:
[Fortran]
real(kind=8) :: a=0.00001_8, b=88888.0_8, c
c = ( (a+b)**2 - 2*a*b - b**2 ) / a**2
print *, c
end
[C++]
#include <iostream>
int main () {
double a=0.00001, b=88888, c;
c = ( (a+b)*(a+b) - 2*a*b - b*b ) / (a*a);
std::cout << c << std::endl;
}
The result is 4.65661 in both cases! This is an extreme example of a calculation that is sensitive to round-off.
Note that quad precision gives the nearly correct result 1.000000000000017.
2. test for precision
The following programs (in Fortran and C++) implement the Forward-Difference Approximation algorithm
using single-precision.
[Fortran]
real :: h = 0.1
print *, "FDA = ", (f(3.0+h)-f(3.0))/h
contains
real function f(x)
real :: x
f = 2*x**2 + 4*x + 6
end function f
end
[C++]
#include <iostream>
float f(float x) { return 2*x*x + 4*x + 6; }
int main() {
float h = 0.1;
std::cout << "FDA = "
<< (f(3.0+h)-f(3.0))/h
<< std::endl;
}
Running the above programs for decreasing values of h reveals a decreasing truncation error (t.e.) but an
increasing round-off error (r.e.). Remember that the correct result should be 16.
h=0.1        FDA = 16.199990    t.e = 0.2,       r.e. = -0.000011444092
h=0.01       FDA = 16.019821    t.e = 0.02,      r.e. = -0.00017929077
h=0.001      FDA = 16.002655    t.e = 0.002,     r.e. = 0.0006542206
h=0.0001     FDA = 15.983582    t.e = 0.0002,    r.e. = -0.016618729
h=0.00001    FDA = 16.021729    t.e = 0.00002,   r.e. = 0.021709442
h=0.000001   FDA = 15.258789    t.e = 0.000002,  r.e. = -0.74121284
The optimal value occurs when h=0.001 where both truncation and round-off errors are relatively small.
3. tests for overflow and underflow:
! test integer overflow          |  Result:
integer :: i, j=1e9              |  1  1000000000
do i = 1, 5                      |  2  2000000000
  print *, i, j                  |  3  -294967296
  j = j * 2                      |  4  -589934592
end do                           |  5  -1179869184
end                              |

! test real overflow             |  Result:
integer :: i                     |  1  1.E+37
real :: r=1.0E37                 |  2  1.E+38
do i = 1, 5                      |  3  +Inf
  print *, i, r                  |  4  +Inf
  r = r * 10.                    |  5  +Inf
end do                           |
end                              |

! test real underflow            |  Result:
integer :: i                     |  1  1.E-42
real :: r=1.0E-42                |  2  1.E-43
do i = 1, 6                      |  3  1.E-44
  print *, i, r                  |  4  1.E-45
  r = r / 10.                    |  5  0.
end do                           |  6  0.
end                              |
Some compilers provide options that give different behavior with respect to overflow and underflow. For
example in the g95 compiler (www.g95.org) the following environment variables can be set:
G95_FPU_OVERFLOW=1
and
G95_FPU_UNDERFLOW=1
In this case the above two programs abort with a "Floating point exception" message instead of continuing
with bogus values.
1.3 Lab Exercises
Task 1 - Truncation Errors
The first derivative (gradient) of a function can be approximated by the Forward Difference Approximation:
FDA = ( F(x+h) - F(x) ) / h
where h is small but not zero.
In theory, the truncation error in this approximation is given by:
Error = h/2 F‘‘(x) + O(h^2)
Write a program that computes, using the FDA, at x=4.7, with h=0.01, the first derivative of the function:
F(x) = 3.4 + 18.7x - 1.6x^2
Hint: use double precision to avoid significant round-off errors confusing the results: for example
real(kind=8) :: h, and h=0.01_8
Questions
Compare your result with the exact result determined by calculus. Compare the error in the result with the
predicted truncation error. Do your comparisons make sense?
Task 2 - Precision (Round-off Errors)
1. What is the result of your FDA program with all the variables in single precision?
2. Write a Fortran program that declares variables A, B, C and D as type double precision real, and
determine the result of the assignments:
A = 1.11111111111111_8
B = 1.11111111111111
C = 1.111111
D = 1.111111_8
Explain your findings.
3. What do you expect to be the output of the following program? Run the program to see if you are right, and
explain your findings. Hint: Press [Ctrl][C] to break out of a program that does not terminate.
real :: a=0.0
do
a = a + 0.1
print *, a
if ( a == 1.0 ) exit
end do
end
Task 3 - Series expansion
Write a program that computes e^x by the series expansion: e^x = 1 + x + x^2/2! + x^3/3! + x^4/4! + ...
+ x^i/i! + ... Terminate the expansion when a term is less than 0.000001. Check your results against the
library function exp(x) or use your pocket calculator.
Hint: Factorials can be problematic due to integer overflow; you can avoid factorials by observing that the
(i+1)th term in the series is equal to the (i)th term times x/i.
1.4 Lab Solutions
Task 1 - Truncation Errors
The first derivative (gradient) of a function can be approximated by the Forward Difference Approximation:
FDA = ( F(x+h) - F(x) ) / h
where h is small but not zero.
In theory, the truncation error in this approximation is given by:
Error = h/2 F‘‘(x) + O(h^2)
Write a program that computes, using the FDA, at x=4.7, with h=0.01, the first derivative of the function:
F(x) = 3.4 + 18.7x - 1.6x^2
Hint: use double precision to avoid significant round-off errors confusing the results: for example
real(kind=8) :: h, and h=0.01_8
Questions
Compare your result with the exact result determined by calculus. Compare the error in the result with the
predicted truncation error. Do your comparisons make sense?
Solution
Program: eee484ex1a (see the downloads page)
The output is:
fda        =  3.644000
true       =  3.660000
error_fda  = -0.016000
true error = -0.016000
Note that double precision variables are used, real(kind=8), otherwise the results will include significant
round-off errors making the analysis less clear.
From the output, we can see that the fda value is close to the true value, but not exactly the same, as it
is only an estimate. According to theory the truncation error in this estimate is
(h/2).F‘‘(x) = (0.01/2)*(-3.2) = -0.016
which is the same as the true error = fda - true = 3.644 - 3.660 = -0.016.
The conclusion is: the expression for the truncation error is correct.
Task 2 - Precision (Round-off Errors)
1. What is the result of your FDA program with all the variables in single precision?
Solution
Simply replace kind=8 with kind=4, and _8 with _4 [or double with float], and rerun
the program; the result is
fda        =  3.64423
true       =  3.66000
error_fda  = -0.01600
true error = -0.01577
The result for the fda is different in the fourth decimal place - as well as the truncation error there is now an
additional round-off error.
2. Write a program that declares variables A, B, C and D as type double precision real, and determine the
result of the assignments:
A = 1.11111111111111_8
B = 1.11111111111111
C = 1.111111
D = 1.111111_8
Explain your findings.
Solution
A = 1.11111111111111_8
correctly assigns the value 1.11111111111111 to A.
B = 1.11111111111111 assigns 1.11111116409302 to B because the assignment is equivalent to
1.11111111111111_4 = 1.1111111
and so the last 7 digits contain garbage.
C = 1.111111 assigns 1.11111104488373 for the same reason as in B.
D = 1.111111_8 correctly assigns 1.11111100000000 to D.
3. What do you expect to be the output of the following program? Run the program to see if you are right, and
explain your findings. Hint: Press [Ctrl][C] to break out of a program that does not terminate.
real :: a=0.0
do
a = a + 0.1
print *, a
if ( a == 1.0 ) exit
end do
end
Solution
You might expect the program to output the numbers 0.1, 0.2, ...., 0.9, 1.0 and then terminate. But you
might actually find that, due to round-off errors, A does not take exactly the value 1.0 and therefore the
program fails the test (A==1.0) and continues to count without end (press [Ctrl][C] to stop the program).
A fix for this would be to replace the equality "==" with ">=" (greater than or equal to); the loop might
then end with A=1.0 or A=1.1.
Task 3 - Series expansion
Write a program that computes e^x by the series expansion: e^x = 1 + x + x^2/2! + x^3/3! + x^4/4! + ...
+ x^i/i! + ... Terminate the expansion when a term is less than 0.000001. Check your results against the
library function exp(x) or use your pocket calculator.
Hint: Factorials can be problematic due to integer overflow; you can avoid factorials by observing that the
(i+1)th term in the series is equal to the (i)th term times x/i.
Solution
Program: eee484ex1b (see the downloads page)
1.5 Example exam questions
Question
a) Explain the term ’truncation error’, and give an example.
How can truncation errors be reduced?
b) Explain the term ’round-off error’.
How can round-off errors be reduced?
c) Explain the terms ’overflow’ and ’underflow’.
How can overflow and underflow be avoided?
2 Numerical Differentiation

2.1 Topics Covered
o Numerical Differentiation:
- Forward Difference Approximation (first derivative): FDA
- Central Difference Approximation (first derivative): CDA
- Richardson Extrapolation (first derivative): REA
- Central Difference Approximation (second derivative): CDA2
- Richardson Extrapolation (second derivative): REA2
The student should be able to derive (or prove) the FDA, CDA, REA, CDA2 and REA2 (the formulae are
given) from Taylor’s Expansion, and show the form of the error in each approximation. The student should
be able to use the formulae by hand, and implement them in a computer program. A basic understanding
of the meaning of "Truncation Error" and "Round-off Error" is expected.
2.2 Lecture Notes
Introduction
When an analytical solution to the derivative of a given function is difficult or inconvenient then a numerical method can be used to provide an approximate solution. It is important however to understand the
truncation and roundoff errors involved in these numerical methods. We will look first at the most basic
method for the first derivative of a function, the FDA, and then move onto higher order methods, the CDA
and REA. We will also look at approximations for the second derivative: the CDA2 and REA2.
The Forward-Difference Approximation (FDA)
The FDA method for the numerical differentiation of a function can be derived by considering differentiation
from first principles, or alternatively by considering Taylor’s Expansion.
1. Differentiation from first principles:
f‘(x) =  limit    ( f(x+dx) - f(x) ) / dx
         dx -> 0

As a computer cannot divide by zero, the computed (finite) version of this expression is an approximation
where dx is small but not zero; I denote this value by h. Now f‘(x) is approximated by: (f(x+h)-f(x))/h
This is the Forward-Difference Approximation for the numerical derivative of a function; it has the most
basic form for a numerical derivative and is the least accurate:
+--------------------------------+
| FDA = ( f(x+h) - f(x) ) / h
|
+--------------------------------+
h is small but not zero
Example:
Compute the first derivative of f(x) = 3x^3 + 2x^2 + x at x=3 and x=10 using the FDA with h = 0.01

FDA(3) = ( f(3.01) - f(3) ) / 0.01 = 94.2903
by calculus f‘(3) = 94.0000  =>  error is 0.2903  (0.3%)

FDA(10) = ( f(10.01) - f(10) ) / 0.01 = 941.9203
by calculus f‘(10) = 941.0000  =>  error is 0.9203  (0.1%)
2. Taylor’s Expansion:
The FDA can also be obtained by rearranging the Taylor Expansion:
f(x+h) = f(x) + h f‘(x)/1! + h^2 f‘‘(x)/2! + h^3 f‘‘‘(x)/3! + ....
Rearrange for the FDA:
( f(x+h) - f(x) ) / h = f‘(x) + h f‘‘(x)/2 + h^2 f‘‘‘(x)/6 + ....
the left hand side is the FDA
=> FDA =   f‘(x)   +   (h/2) f‘‘(x) + O(h^2)
          --------     ---------------------
          the          the truncation error
          derivative   in the FDA
Consider the example of the numerical first derivative of f(x) = 3x^3 + 2x^2 + x at x=3 and x=10 with h =
0.01. We obtained the results:
FDA(3) = 94.2903 and error = 0.2903
FDA(10) = 941.9203 and error = 0.9203
We can check that the error is (h/2).f‘‘(x):
f‘‘(x) = 18x + 4, so error(x) = h(9x+2); error(3) = 0.2900 as above and error(10) = 0.9200 as above. The
small difference between the results is due to the omission of the O(h^2) term in the expression for the error.
Summary:
o The first derivative of a function f(x) is approximated by: FDA = ( f(x+h) - f(x) ) / h where h is small
but not zero.
o The error is approximately (h/2).f‘‘(x), i.e. proportional to h - to minimise the error choose a small value
of h.
o The error in the FDA is called a truncation error as it is due to the truncation of the higher-order terms
in the Taylor expansion.
Note that h should not be too small as round-off errors in the machine arithmetic increase as h decreases
(always use double precision variables! the "kind=8" specifier in Fortran, and the "double" declaration in
C/C++).
The Central-Difference Approximation (CDA)
The CDA gives an improved (higher-order) method:
+--------------------------------------+
|
CDA = ( f(x+h) - f(x-h) ) / (2h)
|
+--------------------------------------+ h is small but not zero
It can be shown (see the lecture) from Taylor’s Expansion that
CDA =   f‘(x)   +   (h^2/6) f‘‘‘(x) + O(h^4)
       --------     ------------------------
       the          the truncation error
       derivative   in the CDA
The CDA is a higher-order method than the FDA as it gives an error which is proportional to h^2 (the error
is therefore much smaller). Also, the error is proportional to the third derivative, f‘‘‘(x), which may further
reduce the error with respect to the FDA error, which has an f‘‘(x) dependence.
Richardson Extrapolation Approximation (REA)
A higher-order method is given by the Richardson Extrapolation Approximation:
+-------------------------------------------------+
| REA = (f(x-2h)-8f(x-h)+8f(x+h)-f(x+2h))/(12h) |
+-------------------------------------------------+ h is small but not zero
It can be shown from Taylor’s expansion that
REA =   f‘(x)   -   (h^4/30) f‘‘‘‘‘(x) + O(h^6)
       --------     ---------------------------
       the 1st      the truncation error
       derivative   in the REA
The truncation error is proportional to the fifth derivative and to h^4.
The results below compare the performance of the above three methods.
f(x) = 3x^3 + 2x^2 + x , first derivative at x=3 and x=10 , h = 0.01
+------------------------------------+--------------------------------------+
| FDA(3)=94.290300 (error=0.290300)  | FDA(10)=941.920300 (error=0.920300)  |
| CDA(3)=94.000300 (error=0.000300)  | CDA(10)=941.000300 (error=0.000300)  |
| REA(3)=94.000000 (error=0.000000)  | REA(10)=941.000000 (error=0.000000)  |
+------------------------------------+--------------------------------------+
The results illustrate that the CDA can give reasonably accurate results and so is worth implementing as a
simple method. The REA in this case is exact as the truncation error is proportional to the fifth derivative
which is zero.
Implementation
Implementation of the above methods is simple. The program requires a definition of f(x) and the two
inputs h and x.
Algorithm 2
! Program to compute the first derivative of a function f(x).
! The FDA, CDA and REA methods are implemented for comparison.
input "input the value of x ", x
input "input the value of h ", h
fda = (f(x+h)-f(x))/h
cda = (f(x+h)-f(x-h))/(2*h)
rea = (f(x-2*h)-8*f(x-h)+8*f(x+h)-f(x+2*h))/(12*h)
output "FDA = ", fda
output "CDA = ", cda
output "REA = ", rea
function definition f(x) = 3x^3 + 2x^2 + x
Note: you should use double precision variables to avoid large round-off errors.
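As a sketch, Algorithm 2 translates directly to Python (an illustrative translation, not the course's Fortran program; Python floats are double precision, satisfying the note above):

```python
# Sketch of Algorithm 2: compare the FDA, CDA and REA estimates of
# f'(x) for f(x) = 3x^3 + 2x^2 + x (exact derivative 9x^2 + 4x + 1).

def f(x):
    return 3*x**3 + 2*x**2 + x

def fda(f, x, h):
    # forward difference: truncation error ~ (h/2) f''(x)
    return (f(x + h) - f(x)) / h

def cda(f, x, h):
    # central difference: truncation error ~ (h^2/6) f'''(x)
    return (f(x + h) - f(x - h)) / (2*h)

def rea(f, x, h):
    # Richardson extrapolation: truncation error ~ -(h^4/30) f'''''(x)
    return (f(x - 2*h) - 8*f(x - h) + 8*f(x + h) - f(x + 2*h)) / (12*h)

if __name__ == "__main__":
    x, h = 3.0, 0.01
    print("FDA =", fda(f, x, h))   # approx 94.2903
    print("CDA =", cda(f, x, h))   # approx 94.0003
    print("REA =", rea(f, x, h))   # approx 94.0 (exact: f''''' = 0)
```

For this cubic the REA is exact up to round-off, since its truncation error is proportional to the fifth derivative, which vanishes.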
The Central-Difference Approximation for a Second Derivative (CDA2)
The second derivative of a function f(x) can be approximated by:
+----------------------------------------------+
|  CDA2 = ( f(x-h) - 2f(x) + f(x+h) ) / h^2    |
+----------------------------------------------+  h is small but not zero
It can be shown from Taylor's Expansion that

    CDA2 =    f''(x)    +    (h^2/12) f''''(x) + O(h^4)
                |                      |
       the 2nd derivative   the truncation error in the CDA2
The Richardson Extrapolation Approximation for a Second Derivative (REA2)
The second derivative of a function f(x) can be approximated by:
+----------------------------------------------------------------+
| REA2 = (-f(x-2h)+16f(x-h)-30f(x)+16f(x+h)-f(x+2h)) / (12h^2) |
+----------------------------------------------------------------+
h is small but not zero
It can be shown from Taylor's Expansion that

    REA2 =    f''(x)    -    (h^4/90) f''''''(x) + O(h^6)
                |                       |
       the 2nd derivative   the truncation error in the REA2
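The CDA2 and REA2 formulas can be checked with a short sketch (Python, using a hypothetical test function; for this cubic both truncation terms vanish, so both formulas agree with f''(x) up to round-off):

```python
# Sketch of the CDA2 and REA2 second-derivative formulas, checked on
# f(x) = 3x^3 + 2x^2 + x, for which f''(x) = 18x + 4, so f''(3) = 58.

def f(x):
    return 3*x**3 + 2*x**2 + x

def cda2(f, x, h):
    # second central difference: truncation error ~ (h^2/12) f''''(x)
    return (f(x - h) - 2*f(x) + f(x + h)) / h**2

def rea2(f, x, h):
    # Richardson extrapolation: truncation error ~ -(h^4/90) f''''''(x)
    return (-f(x - 2*h) + 16*f(x - h) - 30*f(x) + 16*f(x + h) - f(x + 2*h)) / (12*h**2)

x, h = 3.0, 0.01
print("CDA2 =", cda2(f, x, h))   # close to 58 (f'''' = 0 for a cubic)
print("REA2 =", rea2(f, x, h))   # close to 58 (f'''''' = 0 for a cubic)
```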
Summary of Methods
FDA  = (f(x+h)-f(x))/h                                     = f'(x)  + (h/2)    f''(x)     + ....
CDA  = (f(x+h)-f(x-h))/(2h)                                = f'(x)  + (h^2/6)  f'''(x)    + ....
REA  = (f(x-2h)-8f(x-h)+8f(x+h)-f(x+2h))/(12h)             = f'(x)  - (h^4/30) f'''''(x)  + ....
CDA2 = (f(x-h)-2f(x)+f(x+h))/h^2                           = f''(x) + (h^2/12) f''''(x)   + ....
REA2 = (-f(x-2h)+16f(x-h)-30f(x)+16f(x+h)-f(x+2h))/(12h^2) = f''(x) - (h^4/90) f''''''(x) + ....
Errors - the truncation error and the round-off error
The approximation methods FDA, CDA, and REA can be used to demonstrate the effect of truncation
errors and round-off errors. The error inherent to the CDA method, for example (h^2/6).f'''(x), is an
example of a truncation error: by truncating higher-order terms in the Taylor expansion the method becomes
only approximate. Another source of error exists when the FDA, CDA or REA are computed; this is the
round-off error due to limited precision in numerical arithmetic (numerical values are stored in the computer
with a limited number of binary bits). Round-off errors are compounded in arithmetic operations. The total
error is therefore a combination of the two error sources:
Total Error = Truncation Error + Round-off Error
The important parameter here is the value of h; the truncation error increases with increasing h, while the
round-off error decreases with increasing h.
Given a particular method, for example the CDA, the most accurate computed derivative is obtained by
minimising the total error; this corresponds to finding the optimal value of h.
This optimal value will differ depending on
1. The numerical method (FDA, CDA, REA, etc).
2. The function being differentiated, and the value of x.
3. The precision of the arithmetic (single-, double-, quad-precision).
To arrive at the optimal value some study of the output of your program is needed.
The total error in the CDA is given by:
|------------------------|
| Error = CDA(x) - f‘(x) | where f‘(x) is the
|------------------------| unknown first derivative.
A plot of |Error| against h will have
the form indicated qualitatively in
the figure. The rise on the right
(as h increases) is due to the
truncation error which has the form
of h^2, the rise on the left (as h
decreases) is due to round-off errors.
log(|Error|)
  |
  | \                           /
  |  \                         /
  |   \                       /
  |    \          _          /
  |     \_ _ _ _ / \_ _ _ _ /
  +----------------------------------
     -10   -8   -6   -4   -2    log(h)
A minimum error exists at some intermediate value of h corresponding to a minimum in the plot. If f'(x)
is unknown we can only plot CDA versus h, but, as f'(x) is a constant, the plot will have the same shape
(only shifted up or down). In this case again a minimum (or stationary) value in the plot will be observed,
corresponding to a minimum error.
The situation is less clear when the round-off error has the opposite sign to the truncation error; the CDA
may vary erratically about the true value, though again a relatively stable stationary value in a plot of CDA
versus h corresponds to a solution close to a minimum error.
This procedure of outputting the CDA with different values of h and then interpreting the results is an
example of step 8 in the section Errors and Problem Solving.
The lab exercise will require you to perform such an analysis.
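Such an h-scan can be sketched as follows (a minimal Python sketch, using the lab function f(x) = -1/x at x = 3, whose exact derivative is 1/9; Python floats are double precision):

```python
# Tabulate |CDA - f'(x)| for h = 10^-1 ... 10^-12 and observe the
# truncation error shrinking, then round-off error taking over.

def f(x):
    return -1.0 / x

def cda(f, x, h):
    return (f(x + h) - f(x - h)) / (2*h)

exact = 1.0 / 9.0                  # f'(3) for f(x) = -1/x
errs = []                          # errs[k-1] is |Error| for h = 10^-k
for k in range(1, 13):
    h = 10.0**(-k)
    err = abs(cda(f, 3.0, h) - exact)
    errs.append(err)
    print("h = 1e-%02d   |Error| = %.2e" % (k, err))
```

The printed table shows the characteristic V-shape: the error falls as h^2 until round-off dominates at very small h.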
2.3
Lab Exercises
We will estimate the derivative of the function f(x) = -1/x at x=3 and check the result against the exact
result f'(x) = 1/x^2 which gives f'(3) = 1/3^2 = 0.111 111 111 111...
i.e. Error = Estimate - 0.111 111 111 111.
Task 1
Compare the accuracy of FDA(3), CDA(3) and REA(3).
For this use h = 0.01, and double precision data (”kind=8” or ”double”).
Task 2
For CDA(3) investigate the effect of varying h; try h = 10^-1, 10^-2, 10^-3, ...., 10^-12. Use double precision
data ("kind=8" or "double"). Which value of h gives the most accurate estimate?
Task 3
Repeat task 2 replacing CDA with REA. Which value of h gives the most accurate estimate?
Task 4
Repeat task 2 with single-, double-, and quad-precision. Comment on the results.
Additional Tasks
Investigate the CDA2 and REA2 expressions for finding the second derivative of a function.
2.4
Lab Solutions
We will estimate the derivative of the function f(x) = -1/x at x=3 and check the result against the exact
result f'(x) = 1/x^2 which gives f'(3) = 1/3^2 = 0.111 111 111 111...
i.e. Error = Estimate - 0.111 111 111 111.
Task 1
Compare the accuracy of FDA(3), CDA(3) and REA(3).
For this use h = 0.01, and double precision data (”kind=8” or ”double”).
Solution
Program: eee484ex2a (see the downloads page).
FDA = 0.110741971207   Err = -0.000369139904
CDA = 0.111112345693   Err =  0.000001234582
REA = 0.111111111056   Err = -0.000000000055
Tru = 0.111111111111
For the same value of h and x, the accuracy increases as the order of the method increases. Remember that
the truncation errors for the FDA, CDA, and REA are proportional to h, h^2, and h^4 respectively.
Task 2
For CDA(3) investigate the effect of varying h; try h = 10^-1, 10^-2, 10^-3, ...., 10^-12. Use double precision
data ("kind=8" or "double"). Which value of h gives the most accurate estimate?
Solution
Program: eee484ex2b (see the downloads page).
h                CDA(3)            Error=CDA(3)-Tru
0.100000000000   0.111234705228    0.000123594117
0.010000000000   0.111112345693    0.000001234582
0.001000000000   0.111111123457    0.000000012346
0.000100000000   0.111111111235    0.000000000124
0.000010000000   0.111111111113    0.000000000002
0.000001000000   0.111111111123    0.000000000012
0.000000100000   0.111111110900   -0.000000000211
0.000000010000   0.111111110929   -0.000000000182
0.000000001000   0.111111129574    0.000000018463
0.000000000100   0.111111212868    0.000000101757
0.000000000010   0.111112045942    0.000000934831
0.000000000001   0.111108659166   -0.000002451946
As h decreases in size the truncation error is seen to decrease in the expected form; i.e. the error is
proportional to h^2, so each decrease in h by a factor of 10 decreases the error by a factor of 10^2. The
minimum error is obtained with h=0.00001, after which round-off error dominates. The round-off error
increases as h decreases in size. Remember that the total error is the sum of the truncation error and the
round-off error.
Further studies reveal that the optimal value of h depends on the function and the value of x.
Task 3
Repeat task 2 replacing CDA with REA. Which value of h gives the most accurate estimate?
Solution
Program: eee484ex2c (see the downloads page).
h                REA(3)            Error=REA(3)-Tru
0.100000000000   0.111110559352   -0.000000551759
0.010000000000   0.111111111056   -0.000000000055
0.001000000000   0.111111111111   -0.000000000000
0.000100000000   0.111111111112    0.000000000000
0.000010000000   0.111111111113    0.000000000002
0.000001000000   0.111111111116    0.000000000005
0.000000100000   0.111111110725   -0.000000000386
0.000000010000   0.111111113438    0.000000002327
0.000000001000   0.111111107981   -0.000000003130
0.000000000100   0.111110996954   -0.000000114157
0.000000000010   0.111109886799   -0.000001224312
0.000000000001   0.111104541456   -0.000006569655
Replacing CDA with REA results in much smaller truncation errors. The round-off errors, however, are
similar to those generated in the CDA method. Consequently, the optimal value of h occurs earlier, at about
h = 0.001.
Task 4
Repeat task 2 with single, double, and quad precision. Comment on the results.
Solution
Program: eee484ex2d (see the downloads page).
h                Error (kind=4)    Error (kind=8)    Error (kind=16)
                 (float)           (double)          (long double)
0.100000000000    0.000123433769    0.000123594117    0.0001235941169200346063527
0.010000000000    0.000000916421    0.000001234582    0.0000012345816188081102136
0.001000000000   -0.000009402633    0.000000012346    0.0000000123456803840879439
0.000100000000   -0.000079058111    0.000000000124    0.0000000001234567902606310
0.000010000000    0.000647775829    0.000000000002    0.0000000000012345679012483
0.000001000000   -0.003491610289    0.000000000012    0.0000000000000123456790123
0.000000100000   -0.160781651735   -0.000000000211    0.0000000000000001234567901
0.000000010000   -0.607816457748    0.000000000182    0.0000000000000000012345679
0.000000001000   -5.078164577484    0.000000018463    0.0000000000000000000123457
0.000000000100   -49.78163909912    0.000000101757    0.0000000000000000000001235
0.000000000010   -496.8164062500    0.000000934831    0.0000000000000000000000010
0.000000000001   -4967.164062500   -0.000002451946    0.0000000000000000000000106
Single precision ("kind=4" or "float") is clearly not appropriate for numerical differentiation; the round-off
error dominates early and so the optimal value of h is large, resulting in poor accuracy.
Double precision ("kind=8" or "double") performs well, but one should be careful not to choose a value of
h that is very small as this will result in significant round-off errors.
Quad precision ("kind=16" or "long double") again dramatically reduces round-off errors, the errors
becoming significant only at h = 10^-12 for this case. The use of quad-precision, however, is not common as
double precision is often sufficient and quad-precision arithmetic takes significantly longer to compute on
most platforms.
Conclusion
The FDA method gives poor results and should not be used. The CDA method gives reasonable results
if you require a simple (easy to remember) method and do not require very high precision. The REA
gives the best result of the three methods and should be used in applications where precision is important.
In this kind of numerical work it is advisable to use double precision data to avoid large round-off errors.
Although quad precision is sometimes available (depending on the platform and compiler) it is not (yet)
a commonly used precision. The choice of the value of h can be important; the optimal value will depend
on the function, where it is being evaluated, and the method used, so one should not choose an arbitrary value.
Additional Tasks
Investigate the CDA2 and REA2 expressions for finding the second derivative of a function.
Solution
This is left to the student. Feel free to discuss your results with your teacher.
2.5
Example exam questions
Question
a) Write a computer program to evaluate the first derivative of a
function f(x) using the Central Difference Approximation method:
CDA = ( f(x+h) - f(x-h) ) / (2h)
b) Using Taylor’s expansion show that the truncation error in this
approximation is given by: error = (h^2/6).f’’’(x) + O(h^4)
c) Theoretically, how can the error in the CDA be minimised?
In practice, what other type of error exists in this method?
d) i. Using the CDA with h=0.1, evaluate the first derivative of
f(x) = x^4 at x=3.2
ii. Using calculus, determine the value for the error in your result
and show that it equals (h^2/6).f’’’(x)
Question
a) Write a computer program to evaluate the first derivative of a
function f(x) using the Richardson Extrapolation Approximation method:
REA = ( f(x-2h) - 8f(x-h) + 8f(x+h) - f(x+2h) ) / (12h)
b) Using Taylor’s expansion show that the truncation error in this
approximation is given by: error = -(h^4/30).f’’’’’(x) + O(h^6)
c) Theoretically, how can the error in the REA be minimised?
In practice, what other type of error exists in this method?
d) i. Using the REA with h=0.1, evaluate the first derivative of
f(x) = x^6 at x=3.2
ii. Using calculus, determine the value for the error in your result
and show that it equals -(h^4/30).f’’’’’(x)
Question
a) Write a computer program to evaluate the second derivative of a
function f(x) using the Central Difference Approximation method:
CDA2 = ( f(x-h) - 2f(x) + f(x+h) ) / h^2
b) Using Taylor’s expansion show that the truncation error in this
approximation is given by: error = (h^2/12).f’’’’(x) + O(h^4)
c) Theoretically, how can the error in the CDA2 be minimised?
In practice, what other type of error exists in this method?
d) i. Using the CDA2 with h=0.1, evaluate the second derivative of
f(x) = x^5 at x=3.2
ii. Using calculus, determine the value for the error in your result
and show that it equals (h^2/12).f’’’’(x)
Question
a) Write a computer program to evaluate the second derivative of a
function f(x) using the Richardson Extrapolation Approximation method:
REA2 = (-f(x-2h)+16f(x-h)-30f(x)+16f(x+h)-f(x+2h))/(12h^2)
b) Using Taylor’s expansion show that the truncation error in this
approximation is given by: error = -(h^4/90).f’’’’’’(x) + O(h^6)
c) Theoretically, how can the error in the REA2 be minimised?
In practice, what other type of error exists in this method?
d) i. Using the REA2 with h=0.1, evaluate the second derivative of
f(x) = x^7 at x=1.5
ii. Using calculus, determine the value for the error in your result
and show that it equals -(h^4/90).f’’’’’’(x)
3
Roots, Maxima, Minima (closed methods)
3.1
Topics Covered
gnuplot> plot [0:10] exp(-x)*(x**3-6*x**2+8*x)
http://www1.gantep.edu.tr/~andrew/eee484/images/extrema-test-function.gif
o The sequential search method for finding roots; the student should remember the method, and be able to
derive an expression for the number of iterations required to obtain a given accuracy.
o The bisection method for finding roots; the student should remember the method, and be able to derive
an expression for the number of iterations required to obtain a given accuracy.
http://www1.gantep.edu.tr/~andrew/eee484/images/bisection_method.png
o The sequential search method for maxima and minima; the student should remember the method, and be
able to derive an expression for the number of iterations required to obtain a given accuracy.
http://www1.gantep.edu.tr/~andrew/eee484/images/extrema_example.png
3.2
Lecture Notes
Introduction
Numerical methods for finding roots and extremum (maxima and minima) of functions are used when analytical solutions are difficult (or impossible), or when a calculation is part of a larger numerical algorithm.
We will study a number of basic numerical methods starting from very simple (and inefficient) sequential
searches to very powerful Newton’s methods. The algorithms are divided into two groups: closed methods[this week] (where the solution is initially bracketed), and open methods[next week] (where the solution
is not bracketed).
Root finding (closed methods)
Definition:
The root, x0, of a function f(x) is such that f(x0) = 0.
For example if f(x) = x^3 - 28, then the root x0 is 28^(1/3) = 3.0365889718756625194208095785...
In general we will not find an exact solution, especially given that roots tend to be irrational. Our strategy
will be to define how accurate we want the solution to be and then compute the result approximately to this
accuracy. This is called a tolerance; for example, Tolerance = 0.0001 means that the root is required
to be correct within plus or minus 0.0001 (four decimal place accuracy). For this to work we also need to
be able to determine an error estimate with which the tolerance is compared; the algorithm terminates
when |error estimate| < tolerance.
The sequential search method (closed method)
In the sequential search first the position of the root is estimated such that a bracket a,b can be formed
placing a lower- and upper-bound on the root. For this some initial analysis of the function is required.
Note that if a single root (or odd number of roots) is bracketed by a and b then there will be a sign change
between f(a) and f(b). During this search (scan) of the function we can identify a root as follows:
Search the function in the range a ≤ x ≤ b in steps of dx until we see a sign change. An estimate of the
root can then be given as the center of the last inspected step with a maximum error of dx/2 (and mean
error of about dx/4).
                                 /- sign change
     i=0 i=1 i=2 i=3 i=4 ...    /                          i=n
 -----|---|---|---|---|---|---|-o-|---|---|---|---|---|---|---- x
      a   \-dx                  \- root                    b

 if the sign change occurs between x and x+dx
 then root estimate = x + dx/2, maximum error = dx/2
Algorithm 3a
Sequential search method for finding the root of f(x).
All roots between a and b are found.
input a, b, tolerance
dx = 2*tolerance
n = nint((b-a)/dx)
do i = 0, n-1
x = a + i*dx
if ( f(x)*f(x+dx) < 0 ) output "root = ", x+dx/2
end do
function definition f = x**3-28
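As a sketch, Algorithm 3a translates to Python as follows (an illustrative translation, not the course's `eee484ex3a` program):

```python
# Sequential search for roots of f(x) = x^3 - 28: scan [a, b] in
# steps of dx = 2*tolerance and report a root wherever f changes sign.

def f(x):
    return x**3 - 28

def sequential_roots(f, a, b, tolerance):
    roots = []
    dx = 2*tolerance
    n = int(round((b - a) / dx))
    for i in range(n):               # inspect the step [a+i*dx, a+(i+1)*dx]
        x = a + i*dx
        if f(x) * f(x + dx) < 0:     # sign change => a root lies in this step
            roots.append(x + dx/2)   # centre of the step, max error dx/2
    return roots

print(sequential_roots(f, 3.0, 3.1, 1e-4))   # one root near 3.0366
```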
Results for a=3.0, b=3.1 and different values of tolerance.
x0 (root)       error       tolerance     n
3.03           -0.66E-2     1.E-2                5
3.037           0.41E-3     1.E-3               50
3.0365         -0.89E-4     1.E-4              500
3.03659         0.10E-5     1.E-5             5000
3.036589        0.28E-6     1.E-6            50000
3.0365889      -0.72E-7     1.E-7           500000
3.03658897     -0.19E-8     1.E-8          5000000
3.036588971    -0.88E-9     1.E-9         50000000  (in 0.25 seconds!)
We obtain a high accuracy (tolerance = 1e-9) in less than one second (50 million steps). However, if the
initial bracket was a=0, b=10 then we would need 5 billion steps, taking about 30 seconds.
We can see that the error is proportional to 1/n and the run-time is proportional to 1/tolerance. We can
do this much more efficiently using the following bisection method.
The Bisection method (closed method)
In the Bisection method first the position of the root is estimated such that a bracket can be formed placing
a lower- and upper-bound on the root. For this some initial analysis of the function is required. A first
estimate of the root is then computed as the mid-point between the two bounds
     LowerBound                            UpperBound
 --------x-----------------o-------------------x-------- x
                           |
                        MidPoint

 MidPoint = ( LowerBound + UpperBound ) / 2
Consider the function F(x) = x^3 - 28; the root lies somewhere between x=3.0 and x=3.1. This can be
shown by evaluating the function at these two values: F(3.0) = 27 - 28 = -1 and F(3.1) = 29.791 - 28 =
+1.791; the function changes sign, implying that the root is bracketed between x=3.0 and x=3.1. The first
estimate of the root is then MidPoint = (3.0+3.1)/2 = 3.05.
We can improve on this estimate by determining which side of MidPoint the root lies on, then moving the
bracket accordingly and re-evaluating MidPoint:
if F(LowerBound) . F(MidPoint) is negative
then the root is to the left of MidPoint => move UpperBound to MidPoint
else the root is to the right of MidPoint => move LowerBound to MidPoint
     LowerBound         root          MidPoint      UpperBound
 --------x---------------o----------------|-------------x--------
        -ve                              +ve           +ve

                  <- root is this way: move UpperBound to MidPoint
MidPoint is recalculated and the procedure iterated until HalfBracket is less than Tolerance, where
HalfBracket = (UpperBound-LowerBound)/2 is the maximum possible error in our estimate. Each iteration
halves (bisects) the bracket (and therefore halves the maximum possible error), hence the term "Bisection".
The following algorithm represents a Bisection search for the root of F(x) = x^3 - 28
Algorithm 3b
input lb, ub, tolerance
do
  hb = (ub-lb)/2               ! the error estimate
  mp = (ub+lb)/2               ! the new root estimate
  output mp, hb
  if ( hb < tolerance ) exit   ! terminate if tolerance is satisfied
  if ( f(lb)*f(mp) < 0 ) ub=mp else lb=mp
end do
function definition f = x**3-28
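As a sketch, Algorithm 3b can be written in Python as follows (an illustrative translation, not the course's `eee484ex3b` program; it assumes the root is bracketed, i.e. f(lb) and f(ub) have opposite signs):

```python
# Bisection search for the root of f(x) = x^3 - 28: halve the bracket
# until the half-bracket (the maximum possible error) is below tolerance.

def f(x):
    return x**3 - 28

def bisect(f, lb, ub, tolerance):
    # requires f(lb)*f(ub) < 0 so the bracket contains a root
    while True:
        hb = (ub - lb) / 2           # HalfBracket: the error estimate
        mp = (ub + lb) / 2           # MidPoint: the root estimate
        if hb < tolerance:
            return mp
        if f(lb) * f(mp) < 0:        # root lies in the left half
            ub = mp
        else:                        # root lies in the right half
            lb = mp

print(bisect(f, 3.0, 3.1, 0.001))    # about 3.0367, as in the trace below
```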
For inputs 3.0, 3.1, 0.001 the result of the algorithm is:
MidPoint    HalfBracket
3.0500      0.0500
3.0250      0.0250
3.0375      0.0125
3.0313      0.0062
3.0344      0.0031
3.0359      0.0016
3.0367      0.0008
The algorithm terminates after six iterations when the value of HalfBracket (the error estimate) is smaller
than the value of Tolerance; i.e. 0.0008 is less than 0.001. The final value of MidPoint (the root estimate)
for this tolerance is 3.037. A trace of values is shown below:
iteration   F(MidPoint)   LowerB.   MidPoint   UpperB.   F(L)*F(M)   HalfBracket
    0         0.3726      3.0000    3.0500     3.1000      -ve         0.0500
    1        -0.3194      3.0000    3.0250     3.0500      +ve         0.0250
    2         0.0252      3.0250    3.0375     3.0500      -ve         0.0125
    3        -0.1474      3.0250    3.0313     3.0375      +ve         0.0062
    4        -0.0612      3.0313    3.0344     3.0375      +ve         0.0031
    5        -0.0180      3.0344    3.0359     3.0375      +ve         0.0016
    6         0.0036      3.0359   *3.0367*    3.0375      -ve         0.0008
Notice that each iteration halves the size of the search region, hence the term Bisection. The table below
gives the number of iterations required to satisfy a given tolerance (the values in brackets are explained
later).
Tolerance   MidPoint(root)   HalfBracket    true error       iterations
10^-1       3.0500000000     0.0500000000   +0.0134110281     0 (-1.0)
10^-2       3.0312500000     0.0062500000   -0.0053389719     3 ( 2.3)
10^-3       3.0367187500     0.0007812500   +0.0001297781     6 ( 5.6)
10^-4       3.0366210937     0.0000976562   +0.0000321219     9 ( 9.0)
10^-5       3.0365905762     0.0000061035   +0.0000016043    13 (12.3)
10^-6       3.0365882874     0.0000007629   -0.0000006845    16 (15.6)
10^-7       3.0365889549     0.0000000954   -0.0000000170    19 (18.9)
10^-8       3.0365889728     0.0000000060   +0.0000000009    23 (22.3)
10^-9       3.0365889720     0.0000000007   +0.0000000002    26 (25.6)
As expected, a greater number of iterations is required to achieve a greater accuracy; for this method the
convergence is exponential (3 or 4 iterations increase the accuracy by a factor of ten). The value of
HalfBracket is the largest possible error in the calculated root. This is illustrated in the table by comparing
this value with the true error = MidPoint - 28^(1/3); the true error is similar to, but always smaller than,
the value of HalfBracket. An expression for the relationship between the error and the number of iterations
can be derived as follows:
Given an initial error e_i, after one iteration the error is e_i/2 and after n iterations the error is e_i/2^n = e_f
(the final error), and so taking logs and rearranging for n we have:
n = log(e_i / e_f) / log(2). This is the number of iterations required to achieve an accuracy of e_f given an
initial accuracy of e_i.
In the above example the initial accuracy is (3.1-3.0)/2 = 0.05, and the final accuracy e_f must be less than
the tolerance. The expression becomes: n = log(0.05/Tolerance) / log(2). The results from this expression
are shown in the brackets in the above table. The number of iterations performed by the algorithm is the
same as that indicated by the above expression (rounded up to the nearest integer). We see that the error
is proportional to 2^-n and the run-time is proportional to log(1/tolerance).
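The expression can be checked directly against the iteration counts in the table above (a small Python sketch; `iterations_needed` is an illustrative helper name, not one of the course programs):

```python
# n = log(e_i / e_f) / log(2), rounded up: iterations needed to shrink
# an initial half-bracket of 0.05 below a given tolerance.
import math

def iterations_needed(e_i, tolerance):
    return math.ceil(math.log(e_i / tolerance) / math.log(2))

for tol in (1e-3, 1e-6, 1e-9):
    print("tolerance %g needs %d iterations" % (tol, iterations_needed(0.05, tol)))
```

These counts (6, 16, 26) match the iterations column in the table.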
The Bisection method is similar to the way we search for a word in a dictionary. The upper and lower
bounds are the last and first page respectively and we open the book at the centre page. The word lies
either to the left or the right of the current page; if it is to the right then we turn to the page half way
through the pages to the right (bisecting the pages to the right). We continue the search in the appropriate
direction, converging exponentially towards the required page. In this way a page can be found in a 1000-page
dictionary in only n = log(500/1) / log(2) = 9 bisections (the tolerance here is 1 page, the initial HalfBracket
is 1000/2 = 500 pages). Try it for yourself.
Maxima and Minima [extrema] (closed methods)
[See the figure given in the lecture (URL)]
For some functions we can use differential calculus to find extrema; we know that a minimum occurs when
f'(x)=0 and f''(x)>0, and a maximum when f'(x)=0 and f''(x)<0. For example f(x) = x^2 - 8x + 19, so
f'(x) = 2x - 8 = 0 and an extremum occurs at x = 4. And f''(x) = 2 (+ve), so this is a minimum.
Also by inspection f(x) = x^2 - 8x + 19 = (x-4)^2 + 3 and so f(4) is a minimum.
However, often it may be difficult, or impossible, to treat a function analytically and we must use a
numerical method for finding extrema. Also, we must be careful not to mistake local extrema for global
extrema.
[See the figure given in the lecture (URL)]
We can attempt to avoid making this mistake by inspecting the function graphically (or equivalently
performing a sequential search), or by re-running our algorithm a number of times with a broad variety of
different inputs.
For investigating methods for finding extrema, our test function is f(x) = e^-x (x^3 - 6x^2 + 8x), and we
are interested in x ≥ 0.
http://www1.gantep.edu.tr/~andrew/eee484/images/extrema-test-function.gif
Sequential Search (closed method)
If we plot our test function, say in the range 0<x<10, then we are performing a sequential search. During
this search (scan) of the function we can identify extrema as follows:
Search the function in the range a ≤ x ≤ b in steps of dx; with x as the current position, the following
conditions are tested:
if [f(x) > f(x-dx) and f(x) > f(x+dx)] then f(x) is a local or global maximum
if [f(x) < f(x-dx) and f(x) < f(x+dx)] then f(x) is a local or global minimum
[See the figure given in the lecture (URL)]
Here dx defines the tolerance, and the number of points inspected is nint[(b-a)/dx] + 1,
i.e. we loop over i = 0 to n and define x_i = a + i*dx.
Algorithm 3c
Sequential search method for finding minimum and maximum of f(x).
All global and local minimum and maximum between a and b are found.
input a, b
input dx
n = nint((b-a)/dx)
do i = 0, n
  x = a + i*dx
  if [f(x) < f(x-dx) and f(x) < f(x+dx)] output "minima ", f(x), "at x = ", x
  if [f(x) > f(x-dx) and f(x) > f(x+dx)] output "maxima ", f(x), "at x = ", x
end do
end
function definition f(x) = exp(-x) * (x^3 - 6x^2 + 8x)
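As a sketch, Algorithm 3c translates to Python as follows (an illustrative translation, not the course's `eee484ex3c` program; a coarser dx = 10^-3 is used here so the scan stays fast, at the cost of fewer correct decimal places):

```python
# Sequential search for extrema of f(x) = e^-x (x^3 - 6x^2 + 8x)
# over [a, b]: a point is a discrete maximum (minimum) if it is
# higher (lower) than both of its neighbours on the grid.
import math

def f(x):
    return math.exp(-x) * (x**3 - 6*x**2 + 8*x)

def scan_extrema(f, a, b, dx):
    minima, maxima = [], []
    n = int(round((b - a) / dx))
    for i in range(1, n):            # interior grid points only
        x = a + i*dx
        if f(x) < f(x - dx) and f(x) < f(x + dx):
            minima.append(x)
        if f(x) > f(x - dx) and f(x) > f(x + dx):
            maxima.append(x)
    return minima, maxima

mins, maxs = scan_extrema(f, 0.0, 10.0, 1e-3)
print("minima near", mins)           # near x = 2.7108
print("maxima near", maxs)           # near x = 0.5107 and x = 5.7785
```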
With a=0, b=10 and dx=10^-6 the output of this algorithm is:

maxima  1.592547 at x = 0.510 711   (actual error is 0.4x10^-6)
minima -0.165150 at x = 2.710 831   (actual error is 0.4x10^-6)
maxima  0.120121 at x = 5.778 457   (actual error is 0.1x10^-6)
Note that the results are given to 6 decimal places as this is the limit of the accuracy defined by dx = 10^-6.
To achieve this accuracy the algorithm needs to inspect 10,000,001 points; this takes about 5 seconds on
my 2.4 GHz CPU, which may be considered much too slow for general purposes (to increase the accuracy to
10^-9 the algorithm will take 5000 seconds!).
We can see that the error is proportional to 1/n and the run-time is proportional to 1/tolerance. A more
efficient method is the Golden Section search; this however is much more complex to derive and implement.
We will look at another powerful (but simple) extremum finder in the next lecture (open methods).
3.3
Lab Exercises
Task 1
Implement algorithms 3a, 3b, 3c given in the lecture into computer programs. Check the outputs of your
programs against the solutions given in the lecture.
Task 2
Using the bisection method, evaluate to at least 6 decimal place accuracy the root of the following function:
f(x) = x^2 + log_e(x) - 3.73
gnuplot> plot [1:2] x**2 + log(x) - 3.73
http://www1.gantep.edu.tr/~andrew/eee484/images/lab3-fig1.gif
For each method write down:
- The evaluated root (to the appropriate number of decimal places).
- The estimated error, explain how you arrive at your value.
- The number of iterations performed.
- The theoretically expected number of iterations for the required accuracy.
Task 3
Using any of your computer programs, find all extrema and roots of the function f(x) = e^x - 3x^2 (for
0 < x < 4) to 6 decimal place accuracy.
gnuplot> plot [0:4] exp(x) - 3*x**2
http://www1.gantep.edu.tr/~andrew/eee484/images/lab3-fig2.gif
If you have time, experiment with some more functions.
3.4
Lab Solutions
Task 1
Implement algorithms 3a, 3b, 3c given in the lecture into computer programs. Check the outputs of your
programs against the solutions given in the lecture.
Solutions
See eee484ex3a, eee484ex3b, eee484ex3c in the course downloads page.
Task 2
Using the bisection method, evaluate to at least 6 decimal place accuracy the root of the following function:
f(x) = x^2 + log_e(x) - 3.73
gnuplot> plot [1:2] x**2 + log(x) - 3.73
http://www1.gantep.edu.tr/~andrew/eee484/images/lab3-fig1.gif
For each method write down:
- The evaluated root (to the appropriate number of decimal places).
- The estimated error, explain how you arrive at your value.
- The number of iterations performed.
- The theoretically expected number of iterations for the required accuracy.
Solutions
With an initial approximate analysis, the root is determined to be between 1.0 and 2.0
i.e. F(1.0) = -2.73 and F(2.0) = +0.96
Program eee484ex3b (see the downloads page)
MidPoint        HalfBracket
1.500 000 00    0.500 000 00   - initial estimate
1.750 000 00    0.250 000 00   - iteration 1
1.875 000 00    0.125 000 00
1.812 500 00    0.062 500 00
1.781 250 00    0.031 250 00
1.765 625 00    0.015 625 00
1.773 437 50    0.007 812 50
1.777 343 75    0.003 906 25
1.775 390 62    0.001 953 12
1.776 367 19    0.000 976 56
1.775 878 91    0.000 488 28
1.776 123 05    0.000 244 14
1.776 245 12    0.000 122 07
1.776 306 15    0.000 061 04
1.776 336 67    0.000 030 52
1.776 351 93    0.000 015 26
1.776 359 56    0.000 007 63
1.776 355 74    0.000 003 81
1.776 353 84    0.000 001 91
1.776 354 79    0.000 000 95   - iteration 19
The program terminates after 19 iterations because HalfBracket (the error estimate) is less than 0.000 001
(the tolerance). The result for the root is the final value of MidPoint = 1.776 355 (6dp accuracy). The
theoretical number of iterations required is log(e_i/e_f)/log(2) = log(0.5/0.000001)/log(2) = 18.9 -> 19 (as
above).
Task 3
Using any of your computer programs, find all extrema and roots of the function f(x) = e^x - 3x^2 (for
0 < x < 4) to 6 decimal place accuracy.
gnuplot> plot [0:4] exp(x) - 3*x**2
http://www1.gantep.edu.tr/~andrew/eee484/images/lab3-fig2.gif
Solutions
The plot indicates that there is a maximum at about 0.2, a root at about 1.0, minimum at about 2.8 and
a second root at about 3.7. We will use eee484ex3a and eee484ex3c to perform a sequential search in the
range 0,4 with a tolerance of 0.000 001.
Results:

eee484ex3a
root = 0.910007
root = 3.733079

check with eee484ex3b with two brackets
{0.90, 0.92} root = 0.910008
{3.73, 3.74} root = 3.733079

eee484ex3c
maxima at x = 0.204481
minima at x = 2.833148
3.5
Example exam questions
Question 1 (Bisection Method)
a) Show that, for the bisection root-finding method, the number of
iterations, n, required to reduce the error from an initial value
of e_i to a final value of e_f is given by: n = log( e_i / e_f ) / log(2)
b) Given that a root of the function f(x) = e^x - 3x^2 is near 1.0,
estimate the number of iterations required to achieve an accuracy
of at least 6 decimal places.
Question 2 (Sequential search: root)
Explain how a sequential search can be performed to find roots
of a function f(x). Include in your answer an explanation of
the relationship between the number of function evaluations and the
accuracy of the solution.
Question 3 (Sequential search: extremum)
Explain how a sequential search can be performed to find maxima and
minima of a function f(x). Include in your answer an explanation of
the relationship between the number of function evaluations and the
accuracy of the solution.
4 Roots, Maxima, Minima (open methods)
4.1 Topics Covered
gnuplot> plot [0:10] exp(-x)*(x**3-6*x**2+8*x)
http://www1.gantep.edu.tr/~andrew/eee484/images/extrema-test-function.gif
o The Newton-Raphson root-finding method; the student should be able to derive the iterative formula, write a computer program implementing it, and use the formula to find the root of a function by hand (using a pocket calculator).
o Newton's Square-root; the student should be able to derive Newton's Square-root iterative formula from the Newton-Raphson iterative formula, write a computer program, and use the method to calculate by hand (using a pocket calculator) the square-root of a positive number.
o The Secant root-finding method.
o Newton's method and modified Newton's method for finding extrema.
4.2 Lecture Notes
Introduction
Continuing from last week, we now look at open methods (not requiring the solution to be bracketed) for
finding roots and extrema of functions.
The Newton-Raphson root finding method (open method)
The Newton-Raphson method for finding the root of a function f(x) uses information about the first derivative f'(x) to estimate how far (and in which direction) the root lies from the current position.
Theory:
Let x be an approximation to the root x0; the error e is defined as e = x - x0, and so we can write the root as x0 = x - e.
Taylor's Expansion gives:
f(x0) = f(x-e) = f(x) - e f'(x)/1! + e^2 f''(x)/2! - e^3 f'''(x)/3! + ....
Ignoring powers of e^2 and higher we arrive at an approximation to the root:
f(x0) = f(x) - e f'(x) (approximately).
f(x0) = 0 and so we can write 0 = f(x) - e f'(x), giving e = f(x) / f'(x) (approximately), i.e. we have an estimate of the error in the approximation x. We can now correct the root estimate x for this error and arrive at a value closer to the root: x0 = x - e, and so x0 = x - f(x) / f'(x) (approximately).
This is the Newton-Raphson improved estimate of the root x0 given an initial estimate x.
This improved estimate is still not exact as we have not included all the terms in the Taylor expansion (it has a truncation error), but by iterating the procedure we can repeat the improvements; the iteration is illustrated as follows:
x[i+1] = x[i] - f(x[i]) / f'(x[i])
where x[i] is the current estimate and x[i+1] is the next, improved, estimate. This is the Newton-Raphson iterative formula for the root of a function; the method can be represented by the following algorithm.
Algorithm 4a
input x            ! input the initial root estimate
input Tolerance    ! input the tolerance (required accuracy)
do
  Error = f(x) / f'(x)                   ! the error estimate
  output x, Error                        ! output current estimates
  if ( |Error| < Tolerance ) terminate   ! terminate if tolerance is satisfied
  x = x - Error                          ! subtract the error estimate
end do
define f(x) = x^3 - 28
define f'(x) = 3 x^2
Notes:
- The algorithm is very simple!
- It terminates when a tolerance is satisfied.
- No bracket is required, only an initial estimate of the root.
- The algorithm requires the first derivative of the function.
- The error estimate can be negative so the absolute value is compared to the tolerance.
- If f'(x) is close to zero (i.e. near a turning point in the function f(x)) then the estimate of the error, f(x)/f'(x), can be very large, launching the solution far away from the root. The algorithm might crash with an overflow error or take a long time to recover.
Convergence for Newton-Raphson is very rapid. The error in the improved estimate is proportional to the square of the current error; this vanishes quickly for an error << 1: on each iteration the number of correct significant figures doubles. This is demonstrated by executing the above algorithm for:
f(x) = x^3 - 28, f'(x) = 3x^2, Tolerance = 10^-12, and an initial estimate of x = 3.0:
iteration   Root estimate (x)       Error estimate
    0       3.000000000000000000   -0.037 037 037 037 037 037
    1       3.037037037037037037    0.000 447 999 059 936 399
    2       3.036589037977100638    0.000 000 066 101 436 680
    3       3.036588971875663958    0.000 000 000 000 001 439
The program converges in just 3 iterations giving an accuracy of the order of 10^-15.
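The iteration above can be sketched in a few lines; this is a minimal Python translation of Algorithm 4a (the function and the wrapper names are illustrative, not from the course programs):

```python
def newton_raphson(f, fprime, x, tolerance=1e-12, max_iterations=50):
    """Iterate x <- x - f(x)/f'(x) until the error estimate meets the tolerance."""
    for iteration in range(max_iterations):
        error = f(x) / fprime(x)      # the error estimate f(x)/f'(x)
        if abs(error) < tolerance:
            return x, iteration       # tolerance satisfied
        x = x - error                 # subtract the error estimate
    raise RuntimeError("tolerance not reached")

# f(x) = x^3 - 28, f'(x) = 3x^2, initial estimate 3.0 (as in the table above)
root, iterations = newton_raphson(lambda x: x**3 - 28, lambda x: 3 * x**2, 3.0)
print(root, iterations)   # about 3.036588971875664 in 3 iterations
```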
Newton’s Square-Root
A special case of the Newton-Raphson method can be written for the square-root of a positive number p: let x = p^(1/2); then the root of the function f(x) = x^2 - p gives the square-root of p. In the Newton-Raphson iterative formula x[i+1] = x[i] - f(x[i]) / f'(x[i]), the first derivative is simply 2x and so the formula can be written as:
x[i+1] = x[i] - (x[i]^2 - p) / 2x[i]   or   x[i+1] = x[i] - (x[i] - p/x[i])/2
Algorithm 4b
input p            ! input the number we want the sqrt of
input Tolerance    ! input the tolerance (required accuracy)
x = p              ! initial sqrt estimate
do
  Error = (x-p/x)/2                      ! the error estimate
  output x, Error                        ! output current estimates
  if ( |Error| < Tolerance ) terminate   ! terminate if tolerance is satisfied
  x = x - Error                          ! subtract the error estimate
end do
Notes:
1. The estimate x is initially set to p, though this could be p/2 (but not zero).
2. We need a tolerance to provide a termination condition.
3. The functions f(x) = x^2 - p and f'(x) = 2x do not need to be defined; they are absorbed directly into the error expression.
Example
For p=2 and a tolerance of 10^-9 the output of the algorithm is:

x                    Error estimate
2.000 000 000 000    0.500 000 000 000
1.500 000 000 000    0.083 333 333 333
1.416 666 666 667    0.002 450 980 392
1.414 215 686 275    0.000 002 123 900
1.414 213 562 375    0.000 000 000 002
The square root of 2 can therefore be written as 1.414213562 (9 dp) or subtracting the final error estimate
as 1.414213562373 (12 dp).
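A short Python sketch of Algorithm 4b; as the notes suggest, the final error estimate is subtracted from the result (names are illustrative):

```python
def newton_sqrt(p, tolerance=1e-9):
    """Square root of a positive p via Newton's iteration x <- x - (x - p/x)/2."""
    x = p                             # initial estimate (p/2 also works, but not zero)
    while True:
        error = (x - p / x) / 2       # the error estimate
        if abs(error) < tolerance:
            return x - error          # subtract the final error estimate too
        x = x - error                 # subtract the error estimate and repeat

print(newton_sqrt(2.0))   # 1.4142135623...
```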
The Secant Method
The main disadvantage of the Newton-Raphson method is that it requires a knowledge of the first derivative. If the first derivative is not known, or is inconvenient to implement, then it can be approximated numerically by the iterative form of the Forward Difference Approximation; this leads to the Secant Method:
we have the Newton-Raphson iterative formula x[i+1] = x[i] - f(x[i]) / f'(x[i]) and replacing f'(x[i]) with
( f(x[i]) - f(x[i-1]) ) / ( x[i] - x[i-1] )
we have
x[i+1] = x[i] - f(x[i]) ( x[i] - x[i-1] ) / ( f(x[i]) - f(x[i-1]) )
where x[i-1] is the previous estimate, x[i] is the current estimate, and x[i+1] is the next, improved, estimate. This is the Secant iterative formula for the root of a function; the method can be represented by the following algorithm.
Algorithm 4c
input x0           ! input the lower bracket (previous estimate)
input x1           ! input the upper bracket (current estimate)
input Tolerance    ! input the tolerance (required accuracy)
do
  Error = f(x1) * (x1-x0) / (f(x1)-f(x0))   ! the error estimate
  output x1, Error                          ! output the current values
  if ( |Error| < Tolerance ) terminate      ! terminate if the tolerance is satisfied
  x0 = x1                                   ! reassign the previous
  x1 = x1 - Error                           ! subtract the error estimate
end do
define f(x) = x^3 - 28
Notes:
- The algorithm is similar to Newton-Raphson, but does not require a knowledge of the first derivative.
- As with the Bisection method a lower and upper bracket is required but this bracket does not necessarily
need to contain the root.
- Unlike the Bisection method, convergence is not guaranteed.
Convergence for the Secant method is very rapid, almost as fast as Newton-Raphson. This is demonstrated
by executing the above algorithm for:
f(x) = x^3 - 28, Tolerance = 10^-12 and initial bracket x0=3.0 and x1=3.1:
iteration   Root estimate (x)       Error estimate
    0       3.100000000000000000    0.064 170 548 190 612 684
    1       3.035829451809387316   -0.000 743 875 474 020 715
    2       3.036573327283408032   -0.000 015 648 505 989 373
    3       3.036588975789397405    0.000 000 003 913 755 049
    4       3.036588971875642356   -0.000 000 000 000 020 164
The program converges in 4 iterations giving an accuracy of the order of 10^-14.
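Algorithm 4c translates directly; a minimal Python sketch (the wrapper names are illustrative, not the course program):

```python
def secant(f, x0, x1, tolerance=1e-12, max_iterations=50):
    """Newton-Raphson with f'(x) replaced by the finite difference through x0, x1."""
    for iteration in range(max_iterations):
        error = f(x1) * (x1 - x0) / (f(x1) - f(x0))   # the error estimate
        if abs(error) < tolerance:
            return x1, iteration
        x0, x1 = x1, x1 - error   # reassign the previous, subtract the error estimate
    raise RuntimeError("tolerance not reached")

# f(x) = x^3 - 28 with the bracket x0=3.0, x1=3.1 used above
root, iterations = secant(lambda x: x**3 - 28, 3.0, 3.1)
print(root, iterations)   # about 3.0365889719 in 4 iterations
```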
Conclusion
Below is a comparison of the Bisection, Newton-Raphson, and Secant methods for finding the root of f(x) = x^3 - 28 with a tolerance of 10^-12.

method            root estimate            error estimate   true error   number of iterations
Bisection         3.036 588 971 876 336        728E-15        673E-15           36
Secant            3.036 588 971 875 642         20E-15         20E-15            4
Newton-Raphson    3.036 588 971 875 664          1E-15          1E-15            3
True root         3.036 588 971 875 663
The Newton-Raphson method has the fastest convergence with the Secant method a close second. An additional advantage of the Newton-Raphson method over the Bisection and Secant methods is that it does
not require upper and lower bounds as inputs. However, a disadvantage is that it requires a knowledge of
the first derivative (which is not always available). Also, the Newton-Raphson method can fail at or close
to turning points in the function (why?). The Bisection method guarantees convergence whereas both the
Newton-Raphson and Secant methods can fail to converge on a root.
More than one root?
Functions may contain more than one root; the algorithms discussed above will only find one root at a time, and so the user will need to guide the root-finder to find the other roots. This involves giving different brackets (Bisection and Secant cases) or initial values of x (Newton-Raphson case) until all expected roots are found.
Hybrid algorithms
By combining the two methods a hybrid algorithm can be constructed which contains the rapid convergence
of the Newton-Raphson or Secant method with the robustness of the bisection method. Try this for yourself
and think about the advantages of your hybrid program.
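One possible design (a sketch of the idea, not the only way to build the hybrid): keep a bisection-style bracket at all times, take the fast Newton-Raphson step whenever it lands inside the bracket, and fall back on a bisection step otherwise. The function and bracket below are illustrative:

```python
def hybrid_root(f, fprime, a, b, tolerance=1e-12, max_iterations=200):
    """Newton-Raphson speed with bisection robustness: the bracket [a,b] always
    contains the root, so a wild Newton step can never launch us far away."""
    if f(a) * f(b) > 0:
        raise ValueError("initial bracket must contain the root")
    x = (a + b) / 2
    for _ in range(max_iterations):
        d = fprime(x)
        error = f(x) / d if d != 0 else float("inf")   # Newton-Raphson error estimate
        if abs(error) < tolerance:
            return x - error
        # accept the Newton step only if it stays inside the current bracket
        x_new = x - error if a < x - error < b else (a + b) / 2
        if f(a) * f(x_new) < 0:       # shrink the bracket around the root
            b = x_new
        else:
            a = x_new
        x = x_new
    raise RuntimeError("tolerance not reached")

root = hybrid_root(lambda x: x**3 - 28, lambda x: 3 * x**2, 3.0, 4.0)
```

The advantage to think about: near a turning point the Newton step is rejected (it leaves the bracket) and the method degrades gracefully to bisection instead of diverging.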
Extrema (open methods)
Last week we studied the Sequential Search method for finding extrema (minima and maxima) of a function; this is a closed method, i.e. it requires the extremum to be bracketed. This week we continue the study using open methods.
Newton’s method (open method)
This method is very closely related to the Newton-Raphson root finding method. It provides very fast
convergence. However, it is an open method and so involves some uncertainty about which extremum is
found.
In the Newton-Raphson method we have a target root x0 = x - e where x is a root estimate and e is the error. An estimate of e is obtained by truncating Taylor's expansion: f(x-e) = f(x) - e f'(x) = 0 (condition for a root) and so e = f(x)/f'(x). The estimate x is then improved iteratively with x[i+1] = x[i] - f(x[i])/f'(x[i]).
Similarly, differentiating the truncated Taylor expansion we have f'(x-e) = f'(x) - e f''(x) = 0 (condition for an extremum) and so e = f'(x)/f''(x). The estimate x is then improved iteratively with
x[i+1] = x[i] - f'(x[i])/f''(x[i])
This is Newton's iterative formula for finding extrema of f(x).
Clearly we need the first and second derivatives of f(x); for example for our test function:
f(x) = e^-x (x^3 - 6x^2 + 8x)
and so [using d/dx(u.v) = u.dv/dx + v.du/dx]
f'(x)  = e^-x (-x^3 + 9x^2 - 20x + 8)
f''(x) = e^-x (x^3 - 12x^2 + 38x - 28)
Algorithm 4d
Newton's method for finding accurately and quickly one minimum or maximum of f(x).
input x, tol    ! initial estimate and required accuracy
do
  e = f'(x) / f''(x)      ! the error estimate
  output x, e             ! output the current values
  if ( |e| < tol ) exit   ! terminate if tolerance is satisfied
  x = x - e               ! subtract the error estimate
end do
if [f''(x) < 0] output "maxima ", f(x), "at x = ", x
if [f''(x) > 0] output "minima ", f(x), "at x = ", x
end
define f(x)   = exp(-x) (x^3 - 6x^2 + 8x)
define f'(x)  = exp(-x) (-x^3 + 9x^2 - 20x + 8)
define f''(x) = exp(-x) (x^3 - 12x^2 + 38x - 28)
With tol = 10^-9 the output of this algorithm is:
[x=0.5]  maxima 1.592547 at x = 0.510 711 428 189 916
[x=2.7]  minima -0.165150 at x = 2.710 831 453 551 690
[x=5.7]  maxima 0.120121 at x = 5.778 457 118 251 383
The solutions converge in just 4 iterations with an error << 10^-9.
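Algorithm 4d can be sketched in Python using the derivatives worked out above (the wrapper and names are illustrative):

```python
import math

def newton_extremum(fp, fpp, x, tolerance=1e-9, max_iterations=100):
    """Newton's iteration x <- x - f'(x)/f''(x) converges on an extremum of f."""
    for _ in range(max_iterations):
        e = fp(x) / fpp(x)        # the error estimate
        if abs(e) < tolerance:
            return x - e
        x = x - e                 # subtract the error estimate
    raise RuntimeError("tolerance not reached")

# f(x) = exp(-x)(x^3 - 6x^2 + 8x) and its derivatives from the lecture
fp  = lambda x: math.exp(-x) * (-x**3 + 9 * x**2 - 20 * x + 8)    # f'(x)
fpp = lambda x: math.exp(-x) * (x**3 - 12 * x**2 + 38 * x - 28)   # f''(x)

x_max = newton_extremum(fp, fpp, 0.5)   # starting near the first maximum
print(x_max)   # about 0.510711428
```

The sign of f''(x_max) then classifies the solution (negative for a maximum), exactly as in the algorithm's final two lines.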
Newton's modified method (an open method)
The requirement that we need to know the first and second derivatives is a major disadvantage of Newton's method. However, as with the Secant root-finding method, derivatives can be replaced with numerical approximations.
The CDA approximation for the first derivative requires two x values, x0 and x1. The CDA2 approximation of the second derivative requires three values; in this case the third value m is taken as the mean of the first two, i.e. m = (x0 + x1)/2.
[See the figure given in the lecture (URL)]
We can write
CDA  = ( f(x+dx) - f(x-dx) ) / (2dx) = ( f(x1) - f(x0) ) / (x1-x0)
CDA2 = ( f(x-dx) + f(x+dx) - 2f(x) ) / dx^2 = 4 ( f(x0) + f(x1) - 2 f(m) ) / (x1-x0)^2
The error estimate in Algorithm 4d can be rewritten with the above approximations as:

      ( x1 - x0 )( f(x1) - f(x0) )
e = ------------------------------
     4 ( f(x0) + f(x1) - 2 f(m) )

Applying m = m - e improves the estimate, and then after the reassignments x0 = m - e and x1 = m + e, the procedure is iterated, causing m to converge quickly on an extremum.
    |<- e ->|<- e ->|
----+-------+-------+---> x
    x0      m       x1
Algorithm 4e
Newton's modified method for finding accurately and quickly one minimum or maximum of f(x).
input x0, x1    ! initial estimates of the extremum
input tol       ! required accuracy
do
  m = (x0+x1)/2
  e = (x1-x0)*(f(x1)-f(x0))/(f(x0)+f(x1)-2*f(m))/4   ! the error estimate
  output f(m), m, e                                  ! output the current values
  if ( |e| < tol ) exit                              ! terminate if tolerance is satisfied
  m = m - e                                          ! improve the extremum estimate
  x0 = m - e                                         ! modify x0
  x1 = m + e                                         ! and x1
end do
define f(x) = exp(-x) (x^3 - 6x^2 + 8x)
You can define f''(x) to determine whether the solution is a maximum or minimum.
With tol = 10^-9 the output of this algorithm is:
[x0=0.4 x1=0.6]  f(m) = 1.592547   x = 0.510 711 428 296   [actual error 0.1x10^-9]
[x0=2.7 x1=2.8]  f(m) = -0.165150  x = 2.710 831 454 653   [actual error 1.1x10^-9]
[x0=5.7 x1=5.8]  f(m) = 0.120121   x = 5.778 457 122 806   [actual error 4.6x10^-9]
The solutions converge in about 7 iterations (slightly slower than Newton's method) with an error of the order of 10^-9. The modified Newton's method avoids the need for derivatives, but at the cost of somewhat reduced accuracy.
4.3 Lab Exercises
Task 1
Implement the Newton-Raphson and Secant root-finding algorithms given in the lecture into computer programs. Using your computer programs, evaluate to at least 6 decimal place accuracy the root of the following
function:
f(x) = x^2 + log_e(x) - 3.73
gnuplot> plot [1:2] x**2 + log(x) - 3.73
http://www1.gantep.edu.tr/~andrew/eee484/images/lab4-fig1.gif
For each method write down:
- The evaluated root (to the appropriate number of decimal places).
- The estimated error, explain how you arrive at your value.
- The number of iterations performed.
- The theoretically expected number of iterations for the required accuracy.
Task 2
Implement Newton's Square-Root algorithm given in the lecture into a Fortran program. Use your program to evaluate, to at least 9 decimal place accuracy, the square root of 2, 3, 4, ...., 10.
Check the results against your pocket calculator.
Task 3
Implement Newton’s method and Modified Newton’s method for finding extrema of a function given in the
lecture into computer programs. Using your computer programs, find all extrema of the function f(x) = e^x - 3x^2 (for 0 < x < 4) to 6 decimal place accuracy.
gnuplot> plot [0:4] exp(x) - 3*x**2
http://www1.gantep.edu.tr/~andrew/eee484/images/lab4-fig2.gif
If you have time, experiment with some more functions.
4.4 Lab Solutions
Task 1
Implement the Bisection, Newton-Raphson and Secant algorithms given in the lecture into computer programs. Use your programs to evaluate, to at least 6 decimal place accuracy, the root of the following function:
f(x) = x^2 + log_e(x) - 3.73
gnuplot> plot [1:2] x**2 + log(x) - 3.73
http://www1.gantep.edu.tr/~andrew/eee484/images/lab4-fig1.gif
For each method write down:
- The evaluated root (to the appropriate number of decimal places).
- The estimated error, explain how you arrive at your value.
- The number of iterations performed.
- The theoretically expected number of iterations for the required accuracy.
Solutions
With an initial approximate analysis, the root is determined to be between 1.0 and 2.0
i.e. f(1.0) = -2.73 and f(2.0) = +0.96
Newton-Raphson Method
Program eee484ex4a (see the downloads page)
Here the root estimate is the current value of x and the error estimate is f(x)/f’(x).
x                Error estimate
1.500 000 000   -0.293 054 971   - initial estimate
1.793 054 971    0.016 643 345   - iteration 1
1.776 411 626    0.000 056 771   - iteration 2
1.776 354 855    0.000 000 001   - iteration 3
With an initial root estimate of 1.5 and a tolerance of 10^-6 the program terminates after 3 iterations; the final root estimate is 1.776355 (6 dp accuracy). Double precision (kind=8) is used to avoid round-off errors.
The theoretical number of iterations is estimated as follows: for the Newton-Raphson method the number of correct significant figures is said to double on each iteration; we start with one correct s.f. and want 7 correct s.f.; 2^2.8 = 7, which implies about 3 iterations (as above).
Secant Method
Program eee484ex4b (see the downloads page)
Here we provide an initial bracket x0=1, x1=2; the initial root estimate is x1=2.
x2               Error estimate
2.000 000 000    0.260 793 067   - initial estimate
1.739 206 933   -0.035 492 822   - iteration 1
1.774 699 754   -0.001 667 737   - iteration 2
1.776 367 491    0.000 012 641   - iteration 3
1.776 354 850   -0.000 000 004   - iteration 4
With an initial root estimate of 2.0 and a tolerance of 10^-6 the program terminates after 4 iterations; the final root estimate is 1.776355 (6 dp accuracy). Double precision (kind=8) is used to avoid round-off errors.
Theoretically, the number of required iterations is similar to that for the Newton-Raphson method, plus one or two more iterations.
Discussion
The Newton-Raphson method converges much more rapidly than the bisection method and tends to overshoot the required tolerance, giving a much higher accuracy than requested. Also, this method does not require an initial bracket; only an initial estimate of the root is necessary. However, the bisection method does not need a knowledge of the first derivative - this is an advantage when the first derivative is difficult to derive. The Secant method also does not require the first derivative and is much faster than the Bisection method.
Task 2
Implement Newton's Square-Root algorithm given in the lecture into a Fortran program. Use your program to evaluate, to at least 9 decimal place accuracy, the square root of 2, 3, 4, ...., 10.
Check the results against your pocket calculator.
Solution
Newton’s Square-Root
Program eee484ex4c (see the downloads page)
With a tolerance of 10^-9 the results are summarised below:
P       Estimate       True square-root
2.0     1.414213562    1.414213562
3.0     1.732050808    1.732050808
4.0     2.000000000    2.000000000
5.0     2.236067977    2.236067977
6.0     2.449489743    2.449489743
7.0     2.645751311    2.645751311
8.0     2.828427125    2.828427125
9.0     3.000000000    3.000000000
10.0    3.162277660    3.162277660
All final estimates are accurate when compared to the true root values. This is not surprising as the computations are based on the powerful Newton-Raphson algorithm. To obtain a 9 decimal place accuracy, 11
significant figures are required; single precision is not sufficient for this and so double precision (kind=8) is
employed.
Task 3
Implement Newton’s method and Modified Newton’s method for finding extrema of a function given in the
lecture into computer programs. Using your computer programs, find all extrema of the function f(x) = e^x - 3x^2 (for 0 < x < 4) to 6 decimal place accuracy.
gnuplot> plot [0:4] exp(x) - 3*x**2
http://www1.gantep.edu.tr/~andrew/eee484/images/lab4-fig2.gif
Solutions
The plot indicates that there is a maximum at about 0.2 and a minimum at about 2.8. We will use eee484ex4d
and eee484ex4e with a tolerance of 0.000 001.
Results:
eee484ex4d (Newton's method)
input 0.2: maxima at x = 0.204481 (in 2 iterations)
input 2.8: minima at x = 2.833148 (in 3 iterations)

eee484ex4e (modified Newton's method)
inputs 0.2,0.3: maxima at x = 0.204481 (in 3 iterations)
inputs 2.8,2.9: minima at x = 2.833148 (in 3 iterations)
4.5 Example exam questions
Question 1 (Newton-Raphson Method)
a) Using Taylor’s expansion derive the following Newton-Raphson iterative
formula for finding the root of a function f(x):
x[i+1] = x[i] - f(x[i]) / f’(x[i])
b) Write a computer program to implement the Newton-Raphson method for
the evaluation of the root of f(x) = e^x - 3x^2. Your program should
include a tolerance as an input.
c) Using the Newton-Raphson method evaluate the root of the
function f(x) = e^x - 3x^2, which is near 1.0, to an accuracy
of at least 6 decimal places. Show the result of each iteration.
Question 3 (Secant Method)
a) Using Taylor’s expansion derive the following Secant iterative
formula for finding the root of a function f(x):
x[i+1] = x[i] - f(x[i]) ( x[i] - x[i-1] ) / ( f(x[i]) - f(x[i-1]) )
b) Write a computer program to implement the Secant method for
the evaluation of the root of f(x) = e^x - 3x^2. Your program should
include a tolerance as an input.
c) Using the Secant method evaluate the root of the
function f(x) = e^x - 3x^2, which is near 1.0, to an accuracy
of at least 6 decimal places. Show the result of each iteration.
Question 5 (Newton’s Square-root)
a) Using the Newton-Raphson iterative formula for the root of a function f(x):
x[i+1] = x[i] - f(x[i]) / f’(x[i])
show that the iterative formula:
x[i+1] = x[i] - (x[i]-p/x[i])/2
converges to the square-root of p.
b) Write a computer program to implement the above formula. Your program should
include a tolerance as an input.
c) Using the above formula evaluate the square-root of 45.6 to an accuracy of at
least 6 decimal places. Show the result of each iteration.
d) Generalise Newton’s Square root to compute the n’th root of a number p.
Question 6
a) Using Taylor’s expansion, derive Newton’s iterative formula
for finding the extremum of a function f(x):
x[i+1] = x[i] - f’(x[i]) / f’’(x[i])
b) Write a computer program to implement Newton’s method.
Your program should include a tolerance as an input.
c) Using the Newton’s iterative formula find the maximum of the
function f(x) = e^x - 3x^2, which is near 0.2, to an accuracy
of at least 6 decimal places. Show the result of each iteration.
5 Numerical Integration: Trapezoidal and Simpson's formulae
5.1 Topics Covered
o Numerical Integration: the Extended Trapezoidal Formula (ETF) and the Extended Simpson's Formula (ESF). The student should remember the formulae for the ETF and ESF, be able to compute results by hand, and implement the formulae in computer programs. The student should understand the significance of the 1/n^2 and 1/n^4 terms in the truncation error.
5.2 Lecture Notes
Introduction
In this lecture we investigate numerical integration using the Newton-Cotes formulas (also called the Newton-Cotes rules). These are a group of formulas for numerical integration based on evaluating the integrand at n+1 equally-spaced points. They are named after Isaac Newton and Roger Cotes. From the Newton-Cotes group of rules the two simplest will be studied: the Trapezoid rule and Simpson's rule. The aim is to perform, numerically, the integral of a function F(x) over the limits a to b. The basic idea is to evaluate the function at equally-spaced locations between the limits; summing the values in an appropriate manner will give an approximation to the integral.
The approach taken here is to take simple (and therefore less accurate) formulae and implement them in an intelligent way to form basic but practical integration algorithms.
The Trapezoidal Rule (the building block)
Consider integrating a known function F(x) over the interval x = a to b. The Trapezoidal Rule gives the following expression for the exact integral I:

I = (h/2) ( F(a) + F(b) ) - (h^3/12) F''(z) + higher order terms

where h is the interval b-a, and F''(z) is the second derivative of the function evaluated at some unknown point z between a and b. You can find the derivation of this rule elsewhere.
Rearranging the Trapezoidal Rule gives:

(h/2) ( F(a) + F(b) ) = I + (h^3/12) F''(z) + higher order terms

The expression to the left of the equality can be calculated numerically; it approximates the true integral with a truncation error O(h^3). The value of F''(z) is generally not known (z is unknown though bounded between a and b) and so this term is omitted from the solution when we perform the numerical calculation.
The Extended Trapezoidal Rule (ETF)
We will now extend the Trapezoidal Rule to increase the accuracy of the numerical integral.
The expression on the left-hand side gives our numerical approximation for the integral I (which is what we
want to know). This expression is simply the area under the straight line between the points [a,F(a)] and
[b,F(b)]. Note that two function evaluations are performed. Of course we do not expect this straight line to give an exact representation of the curve F(x) (unless the curve is a straight line; then the function has the form F(x) = m x + c, so F''(x) = 0 and the LHS is exact). The right hand side is the exact integral I plus the unknown term which represents the truncation error in the approximation.
To increase the accuracy of the numerical integration we can divide the single interval up into n intervals
and perform n Trapezoidal Rules (n+1 function evaluations). For example for n = 5 intervals:
  (h/2) ( F(x0) + F(x1) )  =  I1 + (h^3/12) F''(z1)
+ (h/2) ( F(x1) + F(x2) )   + I2 + (h^3/12) F''(z2)
+ (h/2) ( F(x2) + F(x3) )   + I3 + (h^3/12) F''(z3)
+ (h/2) ( F(x3) + F(x4) )   + I4 + (h^3/12) F''(z4)
+ (h/2) ( F(x4) + F(x5) )   + I5 + (h^3/12) F''(z5)
where h = (b-a)/n = (b-a)/5
x0 = a, x1 = a+h, x2 = a+2h, x3 = a+3h, x4 = a+4h, x5 = a+5h = b
(i.e. xi = a + i*h , for i = 0 to 5)
The expression reduces to:

h ( F(x0)/2 + F(x1) + F(x2) + F(x3) + F(x4) + F(x5)/2 ) = I + 5 (h^3/12) F''(z)

where I = I1 + I2 + I3 + I4 + I5 is the exact integral over the full range, and F''(z) = ( F''(z1) + F''(z2) + F''(z3) + F''(z4) + F''(z5) )/5 is unknown and represents the error in the numerical integration.
For n intervals the expression becomes:
+-------------------------------------------------------------------+
| ETF = h ( F(x0)/2 + F(x1) + F(x2) + .... + F(x[n-1]) + F(xn)/2 ) |
|
|
|
= I + n.(h^3/12).F’’ + higher order terms
|
|
|
|
where
h = (b-a)/n ,
xi = a + i*h , for i = 0 to n
|
+-------------------------------------------------------------------+
Extended (Compound) Trapezoidal Formula (ETF) for n intervals.
It is useful to rearrange the formula as follows:

(h/2) ( F(x0) + F(xn) )  +  h.( F(x1) + F(x2) + ... + F(x[n-1]) )
         |       |
         a       b

Replacing h with (b-a)/n in the right hand side gives:

n (h^3/12) F'' = (b-a)^3 F''/(12 n^2) + higher order terms

giving for n intervals the numerical integral
+-----------------------------------------------------+
| ETF =   h ( F(a) + F(b) ) / 2                       |
|       + h ( F(x1) + F(x2) + ... + F(x[n-1]) )       |
|                                                     |
|     = I + (b-a)^3 F''/(12 n^2) + higher order terms |
|                                                     |
| where h=(b-a)/n , xi=a+i*h , for i = 1 to n-1       |
+-----------------------------------------------------+
Extended (Compound) Trapezoidal Formula for n intervals
Inspecting the truncation error term we can now see that the accuracy of the numerical integral can be increased by increasing the number of intervals n; the truncation error in the ETF is inversely proportional to the square of the number of intervals, i.e. doubling the number of intervals gives four times the accuracy. Note that the value of (b-a) is constant (the region of integration). Also, if the second derivative of F(x) is small then the error is small; if F'' is zero then the formula is exact, i.e. for the form F(x) = m x + c the error is zero (as expected).
Example
Using the Extended Trapezoidal Formula (ETF) integrate the function F(x) = x^3 - 3x^2 + 5 over the range x = 0.0 to 2.5 using 5 intervals.
Solution:
First we can see that the second derivative F''(x) = 6x - 6 is not zero, and so we expect the ETF to contain a non-zero truncation error.

h = (b-a)/n = (2.5-0.0)/5 = 0.5
xi = 0.0 + i*0.5
n=5
        i    x     F(x)
     ---------------------
x0=a    0   0.0   5.000
x1      1   0.5   4.375
x2      2   1.0   3.000
x3      3   1.5   1.625
x4      4   2.0   1.000
x5=b    5   2.5   1.875
ETF = 0.25*( 5.000 + 1.875 ) + 0.5*( 4.375 + 3.000 + 1.625 + 1.000 ) = 6.718750.
Compare with the analytical result 6.640625; we see the error E5 = 0.078125.
Repeating the integral over 10 intervals (n=10) gives ETF = 6.660156; the error is E10 = 0.019531 and so E5/E10 = 4.0000512 as predicted by the form of the error term in the Trapezoidal formula. Note that the 1/n^2 relation is not exact due to higher order terms in the truncation error.
Implementation
First consider a basic implementation. The inputs to the numerical integration are:
1. The function, F(x), to be integrated
2. The limits of the integration, a and b
3. The number of intervals of the integration, n.
Algorithm 5a
input a, b                  ! input the lower and upper limits
input n                     ! input the number of intervals
h = (b-a) / n               ! the interval size
etf = ( f(a) + f(b) ) / 2   ! sum the end points
do i = 1 to n-1
  x = a + i*h               ! calculate the evaluation position
  etf = etf + f(x)          ! sum over the remaining points
end do
etf = etf*h                 ! complete the ETF
print 'The integral = ',etf ! output the result
end
define the function f(x) = x^3 - 3*x^2 + 5
Note: use double precision variables to avoid round-off errors.
This algorithm gives the following outputs for the indicated inputs (the true error is given in the brackets):

a=0.0, b=2.5, n=   5   integral = 6.718750   (0.08)
a=0.0, b=2.5, n=  10   integral = 6.660156   (0.02)
a=0.0, b=2.5, n= 100   integral = 6.640820   (0.0002)
a=0.0, b=2.5, n=1000   integral = 6.640627   (0.000002)

As expected, the error in the approximation reduces as the square of the number of intervals, n. Here the error is calculated by comparing the numerical result with the analytical evaluation; of course the analytical result might not be available in practice.
Tolerance and Error Estimation
A desirable property of a numerical integration algorithm is that the accuracy of the result is determined by the user, i.e. a tolerance is an input to the algorithm. In our algorithm (above):

replace:   input n           ! input the number of intervals
with:      input tolerance   ! the required accuracy of the result
The algorithm then has to decide what value of n corresponds to the required accuracy (tolerance). For
this the algorithm needs to form an estimate E for the error in the ETF and terminate the algorithm when
abs(E) is less than Tolerance. We use abs(E) because E can be negative.
An error estimate
An error estimate can be formulated by making use of the fact that the error has a 1/n^2 form. Consider n intervals giving an approximation ETFn and error En. Now run the algorithm again with 2n intervals. The result is ETF2n with an error E2n = En/4 (to a good approximation). The difference between the two results is (ETFn - ETF2n) = (En - E2n) = (4E2n - E2n) = 3 E2n, and so we can write E2n = (ETFn - ETF2n)/3. We therefore have an estimate of the error in the final result ETF2n. This error estimate can be used as a termination condition in the algorithm: repeat the ETF, doubling the number of intervals, until abs(E2n) is less than the value of Tolerance.
Also we can use the error estimate to improve our final approximation:
ETFimproved = ETF2n - E2n (this is applied after the termination condition).
The estimated error is subtracted from the final approximation. This works well as long as the assumption that E2n = En/4 is accurate.
Example using the above data:
E10 = (ETF5 - ETF10)/3 = (6.718750 - 6.660156)/3 = 0.019531
ETFimproved = ETF10 - E10 = 6.660156 - 0.019531 = 6.640625 (exact to 6 dp).
The modification of Algorithm 5a for the input of a tolerance is left to the student (see lab exercise).
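One way the modification can be sketched in Python (an illustration of the doubling-plus-correction idea, not the course program; each call here recomputes every point, and a cleverer version would reuse the previous sum):

```python
def etf(f, a, b, n):
    """Extended Trapezoidal Formula with n intervals."""
    h = (b - a) / n
    total = (f(a) + f(b)) / 2 + sum(f(a + i * h) for i in range(1, n))
    return total * h

def etf_to_tolerance(f, a, b, tolerance, max_doublings=30):
    """Double n until E2n = (ETFn - ETF2n)/3 meets the tolerance, then
    subtract the error estimate from the final approximation."""
    n, previous = 1, etf(f, a, b, 1)
    for _ in range(max_doublings):
        n *= 2
        current = etf(f, a, b, n)
        e = (previous - current) / 3    # estimate of the error in current
        if abs(e) < tolerance:
            return current - e          # the improved final approximation
        previous = current
    raise RuntimeError("tolerance not reached")

result = etf_to_tolerance(lambda x: x**3 - 3 * x**2 + 5, 0.0, 2.5, 1e-6)
print(result)   # about 6.640625, the analytical value
```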
Simpson’s Rule (a higher-order method)
The second method in the Newton-Cotes group is Simpson's Rule, which provides a higher-order method:

I = (h/3) ( F(a) + 4 F((a+b)/2) + F(b) ) - (h^5/90) F''''(z) + higher order terms

where h = (b-a)/2. Rearranging Simpson's Rule gives:

(h/3) ( F(a) + 4 F((a+b)/2) + F(b) ) = I + (h^5/90) F''''(z) + higher order terms

The left hand side can be calculated; there are three function evaluations (i.e. n=2).
Extending Simpson's Rule for n intervals (n must be even) gives:
[n=2]
(h/3) ( F(x0) + 4 F(x1) + F(x2) ) = I + (h^5/90) F’’’’(z) + higher order terms
[n=4]
(h/3) ( F(x0) + 4 F(x1) + F(x2) ) + (h/3) ( F(x2) + 4 F(x3) + F(x4) )
= (h/3) ( F(x0) + 4 F(x1) + 2 F(x2) + 4 F(x3) + F(x4) ) = I + 2(h^5/90) F’’’’(z)
[n even]
(h/3) ( F(x0) + 4 F(x1) + F(x2) +
        F(x2) + 4 F(x3) + F(x4) +
        F(x4) + 4 F(x5) + F(x6) +
        ....
        F(x[n-2]) + 4 F(x[n-1]) + F(x[n]) ) = I + (n/2)(h^5/90) F''''(z)
and combining terms gives:
+-------------------------------------------------------------+
 ESF = (h/3) ( F(x0) + 4 F(x1) + 2 F(x2) + 4 F(x3) + ....
               + 2 F(x[n-2]) + 4 F(x[n-1]) + F(x[n]) )

     = I + (n/2)(h^5/90) F'''' + higher order terms

 where  h = (b-a)/n ,  xi = a + i*h ,  for i = 0 to n
+-------------------------------------------------------------+
Extended Simpson's Formula (ESF) for n (even) intervals.
Replacing h with (b-a)/n:

(n/2)(h^5/90) F'''' = (b-a)^5 F''''/(180 n^4) + higher order terms

and forming summation series, we have
+-------------------------------------------------------------+
 ESF =   (h/3) ( F(x0) + F(x[n]) )
       + 4 (h/3) ( F(x1) + F(x3) + .... + F(x[n-1]) )
       + 2 (h/3) ( F(x2) + F(x4) + .... + F(x[n-2]) )

     = I + (b-a)^5 F''''/(180 n^4) + higher order terms

 where  h = (b-a)/n ,  xi = a + i*h ,  for i = 0 to n
+-------------------------------------------------------------+
Extended Simpson's Formula (ESF) for n (even) intervals.
Inspecting the truncation error for the ESF we see that the error is proportional to 1/n^4 and to the fourth
derivative of F. We can therefore expect a much smaller truncation error than for the ETF.
Example (using the previous ETF example)
Using the Extended Simpson’s Formula (ESF) integrate the function
F(x) = x^3 - 3x^2 + 5 over the range x = 0.0 to 2.5 using 2 intervals.
Solution:
First we can see that the fourth derivative is zero and so even with just n=2 we expect the truncation error
to be zero.
h = (b-a)/n = (2.5-0.0)/2 = 1.25

n=2
        i    x      F(x)
      ----------------------
x0=a    0    0.00   5.000
x1      1    1.25   2.265625
x2=b    2    2.50   1.875

ESF = (h/3) ( F(x0) + 4 F(x1) + F(x2) )
    = 1.25/3 * ( 5.000 + 4*2.265625 + 1.875 ) = 6.640625
The result is exact as expected.
To test the ESF further we need a function that has a non-zero fourth derivative.
For example, integrate the function F(x) = x^5 over the same interval:
h = (b-a)/n = (2.5-0.0)/2 = 1.25

n=2
        i    x      F(x)
      ----------------------
x0=a    0    0.00   0.00
x1      1    1.25   3.0517578125
x2=b    2    2.50   97.65625

ESF = (h/3) ( F(x0) + 4 F(x1) + F(x2) )
    = 1.25/3 * ( 0 + 4*3.0517578125 + 97.65625 ) = 45.7763671875
The exact result is 2.5^6/6 = 40.690104166666667 and so the error is
E2 = 45.7763671875 - 40.690104166666667 = 5.086263020833333
Repeating for n=4 we get ESF = 41.00799560547 and so
E4 = 41.00799560547 - 40.690104166666667 = 0.3178914388021
We expect an error proportional to 1/n^4, i.e. E2/E4 = 2^4 = 16, and
5.086263020833333/0.3178914388021 = 16.00000, which is consistent with the 1/n^4 expectation.
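This 1/n^4 scaling is easy to check numerically. The sketch below (an illustration, not one of the course programs; the helper name `esf` is my own) evaluates the ESF for F(x) = x^5 with n=2 and n=4 and forms the error ratio.

```python
def esf(f, a, b, n):
    """Extended Simpson's Formula with n (even) intervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n, 2):               # odd points, weight 4
        s += 4 * f(a + i * h)
    for i in range(2, n - 1, 2):           # even interior points, weight 2
        s += 2 * f(a + i * h)
    return s * h / 3

exact = 2.5**6 / 6                         # analytic integral of x^5 over [0, 2.5]
e2 = esf(lambda x: x**5, 0.0, 2.5, 2) - exact
e4 = esf(lambda x: x**5, 0.0, 2.5, 4) - exact
print(e2 / e4)                             # close to 2^4 = 16
```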
Implementation
The inputs to the numerical integration are: 1. The function, F(x), to be integrated; 2. The limits of the
integration, a and b; 3. The number of intervals of the integration, n.
Algorithm 5b

input a, b                 ! input the lower and upper limits
input n                    ! input the number of intervals
h = (b-a) / n              ! the interval size
esf = f(a) + f(b)          ! sum the end points
do i = 1, n-1, 2
   x = a + i*h             ! calculate the evaluation position
   esf = esf + 4 * f(x)    ! sum the odd points
end do
do i = 2, n-2, 2
   x = a + i*h             ! calculate the evaluation position
   esf = esf + 2 * f(x)    ! sum the even points
end do
esf = esf * h/3
print 'The integral = ', esf   ! output the result
end

define the function f(x) = x^5
This ESF algorithm is compared with the ETF algorithm for F(x) = x^5, a=0.0, b=2.5:

intervals    ETF            error          ESF            error
  n=  4    46.968460083   6.278355916   41.007995605   0.317891439
  n=  8    42.274594307   1.584490140   40.709972382   0.019868215
  n= 16    41.087158024   0.397053858   40.691345930   0.001241763
  n= 32    40.789425838   0.099321672   40.690181777   0.000077610
  n=320    40.691097575   0.000993409   40.690104174   0.000000008
The ESF method is clearly more accurate than the ETF method.
Tolerance and Error Estimation
As discussed earlier for the ETF algorithm, it is desirable to replace the input n with a tolerance; for this the
algorithm needs to form an estimate E for the error in the ESF and terminate the algorithm when abs(E)
is less than the value of Tolerance.
An error estimate
An error estimate can be formulated by making use of the fact that the truncation error has a 1/n^4 form.
Consider n intervals giving an approximation ESFn and error En. Now run the algorithm again with 2n
intervals. The result is ESF2n with an error E2n = En/16 (to a good approximation). The difference between
the two results is (ESFn - ESF2n) = (En - E2n) = (16E2n - E2n) = 15E2n and so we can write
E2n = (ESFn - ESF2n)/15. We therefore have an estimate of the error in the final result ESF2n.
This error estimate can be used as a termination condition in the algorithm: repeat the ESF doubling the
number of intervals until abs(E2n ) is less than the value of Tolerance.
Also we can use the error estimate to improve our final approximation: ESFimproved = ESF2n - E2n (this is
applied after the termination condition). The estimated error is subtracted from the final approximation.
Example using the above data:
E32 = (ESF16 - ESF32 )/15 = (40.691345930-40.690181777)/15 = 0.000077610.
ESFimproved = ESF32 - E32 = 40.690181777 - 0.000077610 = 40.690104167 (exact to 9dp).
However in some cases, where the higher derivatives of a function are significant, this error estimate is not
accurate. In these cases it is better to use an alternative error estimate formed simply as E2n = ESFn -
ESF2n; in this way we repeat the calculations until the difference between the previous result and the new
result is smaller than a tolerance. This avoids the possible underestimation of the above error estimate (see
the lab exercise).
Adaptive methods
The Newton-Cotes formulas we have considered in this lecture perform numerical integration based on
evaluating the integrand at n+1 equally-spaced points. This may not be an effective approach in cases where
the function varies greatly over the region of integration. For example, a Gaussian function has long (infinite)
tails that reduce asymptotically towards zero. In such cases the spacing between function evaluations needs
to vary. This is the subject of adaptive numerical integration methods. You can read about these methods
elsewhere.
Summary
We have investigated two numerical methods for integrating a function F(x) over the range x = a to b based
on evaluating the integrand at n+1 equally-spaced points. The Extended Trapezoidal Formula (ETF) has
a truncation error proportional to the second derivative and 1/n^2. The Extended Simpson's Formula (ESF)
has a truncation error proportional to the fourth derivative and 1/n^4.
Error estimates can be formed by considering the 1/n^2 and 1/n^4 forms for the truncation error or simply
by comparing the difference between the previous result and the new result as the number of intervals
is increased. These error estimates can be used as a termination condition in the numerical integration
algorithm.
5.3
Lab Exercises
The task
Write a computer program to integrate the following functions to an accuracy of at least 6 decimal places
using the Extended Trapezoidal Formula and the Extended Simpson’s Formula.
F(x) = ( 1 - x^2 )^(1/2) over the range x=-1 to x=1
F(t) = 894 t / ( 1.76 + 3.21 t^2 )^3 over the range t=0 to t=1.61
Estimate the integrals from a rough sketch of the function.
Questions
1. What are the results of your program - do they look reasonable?
2. What is the approximate error in the numerical integral?
Note: To determine the integral to at least 6 decimal places you should run the program with, say, 16
intervals, then run it again with 32 intervals, then 64 and 128 etc., until the error estimate is less than
0.000001. But be careful with the way you choose the error estimate - you might run into problems! Also,
if you can think of a way to instruct the computer to perform this procedure automatically then you will
save yourself a lot of time (and have a useful program!).
5.4
Lab Solutions
Write a computer program to integrate the following functions to an accuracy of at least 6 decimal places
using the Extended Trapezoidal Formula and the Extended Simpson’s Formula.
F(t) = 894 t / ( 1.76 + 3.21 t^2 )^3 over the range t=0 to t=1.61
F(x) = ( 1 - x^2 )^(1/2) over the range x=-1 to x=1
Estimate the integrals from a rough sketch of the function.
1. What are the results of your program - do they look reasonable?
2. What is the approximate error in the numerical integral?
Solutions:
Programs: eee484ex5a and eee484ex5b (see download page).
To determine the integral to at least 6 decimal places you should run the program with say 16 intervals, then
run it again with 32 intervals and 64 and 128 etc until the error estimate is less than 0.000001. The results
shown below use the following error estimates: E2n = (ETFn -ETF2n )/3 and E2n = (ESFn -ESF2n )/15. We
have already seen in the lecture notes that these estimates can be very accurate. However, we will see that
they are not so good for the second function given in this exercise and so later we will try again with the
error estimates E2n = ETFn -ETF2n and E2n = ESFn -ESF2n .
First it is good practice to estimate the integral from a rough sketch to make sure that the computed result is not completely wrong (due to a programming error).
Results for F(t) = 894 t / ( 1.76 + 3.21 t^2 )^3
First a rough sketch of the integral shows that we expect the result to be approximately 21.8.
Next, computing the ETF and ESF for n=16 and then doubling until the tolerance is satisfied, gives:

    n      ETF         Error Estimate   True E.          ESF         Error Estimate   True E.
                       (ETFn-ETF2n)/3                               (ESFn-ESF2n)/15
   16    21.650237                                    21.795657
   32    21.756923    -0.035562       -0.035367       21.792484     0.000211         0.000195
   64    21.783457    -0.008845       -0.008833       21.792302     0.000012         0.000012
  128    21.790082    -0.002208       -0.002208      *21.792290     0.0000007        0.0000007
  256    21.791738    -0.000552       -0.000552       21.792290     0.00000005       0.00000005
  512    21.792152    -0.000138       -0.000138       21.792290     0.000000003      0.000000003
 1024    21.792255    -0.000034       -0.000034       21.792290     0.000000000      0.000000000
 2048    21.792281    -0.000009       -0.000009
 4096    21.792287    -0.000002       -0.000002
 8192   *21.792289    -0.0000005      -0.0000005
Here the Error Estimate and True Error are compared; in this case to 6dp the error estimates are accurate.
Iterations are terminated (*) when the error estimate is less than 10^-6.
For the ETF, 6 decimal place accuracy is achieved for n=8192 (the error estimate is -0.000 000 5) with
the numerical integral = 21.792 289. The actual error by calculus is also 0.000 000 5; in this case the error
estimate is accurate. The ESF gives the same result but in fewer iterations. Again the error estimate is
accurate.
Results for F(x) = ( 1 - x^2 )^(1/2)
First a rough sketch of the integral shows that we expect the result to be approximately 1.57.

    n      ETF         Error Estimate   True E.          ESF         Error Estimate   True E.
                       (ETFn-ETF2n)/3                               (ESFn-ESF2n)/15
   16     1.544910                                     1.560595
   32     1.561627    -0.005572       -0.009170        1.567199    -0.000440        -0.003597
   64     1.567551    -0.001975       -0.003245        1.569526    -0.000155        -0.001270
  128     1.569648    -0.000699       -0.001148        1.570348    -0.000055        -0.000449
  256     1.570390    -0.000247       -0.000406        1.570638    -0.000019        -0.000159
  512     1.570653    -0.000087       -0.000144        1.570740    -0.000007        -0.000056
 1024     1.570746    -0.000031       -0.000051        1.570777    -0.000002        -0.000020
 2048     1.570778    -0.000011       -0.000018       *1.570789    -0.0000009       -0.0000070
 4096     1.570790    -0.000004       -0.000006        1.570794    -0.0000003       -0.0000025
 8192     1.570794    -0.000001       -0.000002        1.570795    -0.0000001       -0.0000009
16384    *1.570796    -0.0000005      -0.0000008       1.570796    -0.00000004      -0.00000031
For the ETF, 6 decimal place accuracy is achieved with n=16384, giving a numerical integral = 1.570796.
The error estimate is -0.000 000 5 while the actual error by calculus is 0.000 000 9, about twice the estimated
error but still less than 10^-6. For this function the error estimate is not very accurate but still serves well
for the termination condition.
The ESF performs only slightly better than the ETF for this function, except that its error estimate is
of the order of 10 times smaller than the true error. Consequently the algorithm terminates too soon, giving
the result 1.570 789 that has only 5 dp accuracy! This problematic behavior can be explained by the fact
that the higher derivatives of this function are significant. The solution to this problem is to define the error
estimate in a more reliable way as E2n = ESFn - ESF2n; in this case the algorithm terminates at n=16384
giving the result 1.570796, which is now correct to 6 decimal places (see below).
Discussion
The error estimates (ETFn-ETF2n)/3 and (ESFn-ESF2n)/15 in some cases give very good estimates of the
truncation error and in other cases not so good; this varies from function to function (and where the function
is being evaluated). A much more reliable error estimate is E2n = ESFn - ESF2n. This will guarantee the
correct result!
It would be practical to start with, for example, n=1024; the solution is then only three or four program
runs away. Alternatively you could write the program such that it automatically repeats the ETF (doubling
n each time) until the termination condition is met. This is implemented in eee484ex5a-auto and
eee484ex5b-auto (see the downloads page). The more reliable error estimate of E2n = ESFn - ESF2n is also
used in these programs.
Results for F(t) = 894 t / ( 1.76 + 3.21 t^2 )^3

    n      ETF         Error           ESF         Error
    32    21.756923   -0.106686      21.792484    0.003172
    64    21.783457   -0.026534      21.792302    0.000183
   128    21.790082   -0.006625      21.792290    0.000011
   256    21.791738   -0.001656     *21.792290   <0.000001
   512    21.792152   -0.000414
  1024    21.792255   -0.000103
  2048    21.792281   -0.000026
  4096    21.792287   -0.000006
  8192    21.792289   -0.000002
 16384   *21.792289   -0.000000
Results for F(x) = ( 1 - x^2 )^(1/2) over the range x=-1 to x=1.

    n      ETF         Error           ESF         Error
    32     1.561627   -0.016717       1.567199   -0.006604
    64     1.567551   -0.005925       1.569526   -0.002327
   128     1.569648   -0.002097       1.570348   -0.000821
   256     1.570390   -0.000742       1.570638   -0.000290
   512     1.570653   -0.000262       1.570740   -0.000103
  1024     1.570746   -0.000093       1.570777   -0.000036
  2048     1.570778   -0.000033       1.570789   -0.000013
  4096     1.570790   -0.000012       1.570794   -0.000005
  8192     1.570794   -0.000004       1.570795   -0.000002
 16384     1.570796   -0.000001      *1.570796  <-0.000001
 32768    *1.570796  <-0.000001
In this case the 6 decimal place accuracy is guaranteed.
Conclusion
We can integrate functions numerically to a predefined accuracy. Higher-order methods do not necessarily
give more accurate results! The Extended Trapezoidal Formula and Extended Simpson’s Formula are simple
to implement and work well when used intelligently with a termination condition based on an error estimate
and a tolerance. Be careful when forming error estimates, E2n = ESFn - ESF2n is safer.
5.5
Example exam questions
Question
a) Using the ETF(or ESF) perform the following integral by dividing
the region of integration into ten equally spaced intervals:
Integral of ( 6 x^2 - e^x ) dx
from 1.0 to 5.0
b) Write a computer program to implement this method for 100 intervals.
Answers
a) The ETF is given by: I = h ( f0/2 + f1 + f2 + ..... + fn-1 + fn/2 )
where fi = f(xi) and xi = a + i*h and h = (b-a)/n.
Note that there are (n+1) function evaluations.
For ten intervals (n=10), and a range of 1.0 to 5.0, h = 0.4.
Evaluating with a pocket calculator f(x) = 6x^2 - e^x gives:
sum = 252.51921 ; I = 0.4*sum = 101.00768
Repeating with the ESF: I = h/3 ( f0 + 4 f1 + 2 f2 + ..... + 2 fn-2 + 4 fn-1 + fn )
Evaluating with a pocket calculator f(x) = 6x^2 - e^x gives:
sum = 767.1359 ; I = 0.4/3*sum = 102.285
A quick check with the analytical solution 102.305 indicates that
the answer seems reasonable.
b) See the lab exercise.
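The pocket-calculator arithmetic in part a) can be cross-checked with a few lines of Python (an illustrative sketch, not a course program):

```python
# ETF and ESF for f(x) = 6x^2 - e^x on [1, 5] with n = 10 intervals.
import math

f = lambda x: 6 * x**2 - math.exp(x)
a, b, n = 1.0, 5.0, 10
h = (b - a) / n                            # 0.4

xs = [a + i * h for i in range(n + 1)]     # the n+1 evaluation points
etf = h * (f(a) / 2 + sum(f(x) for x in xs[1:-1]) + f(b) / 2)
esf = (h / 3) * (f(a) + f(b)
                 + 4 * sum(f(x) for x in xs[1:-1:2])    # odd points
                 + 2 * sum(f(x) for x in xs[2:-1:2]))   # even interior points
exact = (2 * b**3 - math.exp(b)) - (2 * a**3 - math.exp(a))   # analytic integral
print(etf, esf, exact)                     # ~101.008, ~102.285, ~102.305
```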
6
Solution of D.E.s: Runge-Kutta, and Finite-Difference
6.1
Topics Covered
o Runge-Kutta; the student should be able to write down Euler (first-order RK) steps representing the time
evolution of a given (simple) physical system, and calculate the truncation error by considering Taylor’s
series. The student should be able to write a computer program implementing the formulae for the given
physical system.
o Laplace and Jacobi Relaxation: the student should be able to derive the finite difference form for Laplace's
equation using the "CDA2", and solve for the potential V(x,y,z).
6.2
Lecture Notes
Introduction
The numerical solution of differential equations is a very large subject spanning many types of problems and
solutions. In this lecture we will look at just two simplified topics: the solution of ordinary differential
equations using Euler's method (first-order Runge-Kutta), and the solution of partial differential equations
using the finite-difference method and Jacobi Relaxation. You can read about many other types of problems
and solutions in your course text book ("Ordinary Differential Equations" and "Partial Differential
Equations").
The Euler Method (first-order Runge-Kutta)
Many physical systems can be expressed in terms of first-order or second-order differential equations. The
time-evolution of these systems can be approximated with a Euler method. We will apply the Euler method
to simulate a body in free-fall, the charging and discharge of a simple R-C circuit, and the motion of a
mass-on-a-spring system.
To introduce the ideas of Euler methods we will first look at a simple 'freshman physics' problem of
a body in free-fall; here air resistance is ignored and the acceleration due to gravity is a constant, g = 9.81
m/s^2. First the problem is solved analytically and then we will develop Euler methods to solve the problem
numerically. We will then study Euler methods further.
Free-fall - analytical solution
We will determine the displacement of a body in free-fall. The boundary conditions for the solution are that
at t=0 the initial displacement, y, and the initial velocity, v, are both zero. The system is governed by the
second-order differential equation y'' = -g; integrating with the above boundary conditions gives
y' = -g t and so y = -0.5 g t^2.
For t = 10 seconds we therefore have a displacement of -0.5 x 9.81 x 10^2 = -490.500 m
Free-fall - numerical solution (by the Euler method)
We have the second-order differential equation y'' = -g; this can be written as two first-order equations:
dv/dt = -g and dy/dt = v. Rearranging these equations we have: dv = -g dt and dy = v dt, which can be
written as:
v(t+dt) = v(t) - g dt
y(t+dt) = y(t) + v(t) dt
These expressions simply state that the velocity a time dt later is the current velocity advanced by the
acceleration multiplied by dt, and the displacement a time dt later is the current displacement advanced by
the velocity multiplied by dt. The expressions are exact if dt takes the calculus form of "dt tends to zero".
However, for a numerical solution dt is small but not zero; with a finite value for dt the above equations
become Euler's method, and the expressions now, in general, contain truncation errors (see later).
Writing the expressions in the form of an algorithm:

v1 = v0 - g dt      Euler step evolving velocity and
y1 = y0 + v0 dt     displacement in a finite time dt
Three versions of the Euler step exist:

Simple Euler      v1 = v0 - g dt
                  y1 = y0 + v0 dt

Euler-Cromer      v1 = v0 - g dt
                  y1 = y0 + v1 dt

Improved Euler    v1 = v0 - g dt
                  y1 = y0 + (v0+v1)/2 dt

Each version uses a different velocity to evolve y:

Simple Euler   : v0          - generally poor accuracy
Euler-Cromer   : v1          - works well for oscillating systems
Improved Euler : (v0+v1)/2   - exact for the free-fall system.
Algorithm 6a
Implementation of the Simple Euler method to evolve the displacement of a body under free-fall (g=9.81)
for 10 seconds. The system evolves in time steps of 0.1 seconds (100 iterations).

dt = 0.1   ! Time step is 0.1 seconds
t  = 0     ! Time is initially zero
y  = 0     ! Displacement is initially zero
v1 = 0     ! Velocity is initially zero

do 100 iterations
   v0 = v1              ! Record the previous velocity
   t  = t + dt          ! Evolve time
   v1 = v0 - 9.81*dt    ! Evolve velocity
   y  = y + v0 dt       ! Evolve displacement (Simple Euler method)
end do

output y   ! Output the result

Result: displacement at t = 10.000 seconds is -485.595 m
The exact result (from calculus) is -g t^2/2 = -9.81 x 10^2/2 = -490.500 m. The difference, +4.905 m, between
this numerical result and the analytical result is due to truncation errors in the Simple Euler method. We
can investigate this by considering Taylor's expansion:
y(t+dt) = y(t) + y'(t) dt + y''(t) dt^2/2 + y'''(t) dt^3/6 + ...
In free-fall y'(t)=v(t), y''(t)=-g, and y'''(t)=0, and so we can write: y(t+dt) = y(t) + v(t) dt - g dt^2/2.
The first two terms in the above equation form the Euler step for evolving displacement (with the evolution of
velocity v(t+dt) = v(t) - g dt being exact). The last term, -g dt^2/2, represents the truncation error in each
Simple Euler step. In the above algorithm the Euler step is iterated 10s/0.1s = 100 times, which implies the
total error is 100 g dt^2/2 = 100 x 9.81 x 0.1^2/2 = 4.905 m, as seen in the results of the algorithm.
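Algorithm 6a translates directly into a few lines of code. The Python sketch below (illustrative; not one of the course programs) reproduces both the Simple Euler result and the predicted total truncation error.

```python
# Simple Euler free-fall, as in Algorithm 6a.
g, dt = 9.81, 0.1
t = y = v1 = 0.0

for _ in range(100):        # 100 steps of 0.1 s -> 10 s
    v0 = v1                 # record the previous velocity
    t += dt                 # evolve time
    v1 = v0 - g * dt        # evolve velocity
    y += v0 * dt            # evolve displacement (Simple Euler)

print(y)                    # close to -485.595 m, versus the exact -490.500 m
print(100 * g * dt**2 / 2)  # the accumulated truncation error, 4.905 m
```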
Again by considering Taylor’s expansion it can be shown that the Improved Euler method is exact for the
case of free-fall (see homework). Replace y = y + v0 dt with y = y + (v0+v1)/2 dt in the above algorithm
to convert it to the Improved Euler method.
Euler methods give a general tool for solving systems governed by first- or second-order differential equations
- in many cases analytical solutions are not available. Euler methods are generally not exact, though with
careful choice of the version of the method, and by using a small enough value for dt, Euler methods can give
good results.
The general form for the Simple Euler method:
First-order: dy/dt = f(), Euler step: y1 = y0 + f() dt
where f() is any function of t and y and t,y can represent any parameters.
Example, a marble falling in oil: dv/dt = g - bv/m
Euler step: v = v + (g-bv/m) dt (only velocity is evolved)
Second-order: y’’ = f()
where f() is any function of t and y and t,y can represent any parameters.
Replace this with two first-order equations: dv/dt = f() and dy/dt = v
Euler steps: v1 = v0 + f() dt and y = y + v0 dt
Example, a simple pendulum:
let y = theta and v = omega => theta’’ = - g Sin(theta)/L
replace with two first-order equations:
d(omega)/dt = -g Sin(theta)/L and d(theta)/dt = omega
Euler steps: omega1 = omega0 - (g Sin(theta)/L) dt and theta = theta + omega0 dt
For systems governed by second-order differential equations the Euler-Cromer and Improved Euler methods
can be obtained with simple modifications of the above formulae.
Summary so far
It has been shown that a Euler method can be implemented to evolve, in time, the motion of a free-falling
body. No analytical inputs are required and so this method can be employed in the study of more complex
systems where analytical solutions are difficult or impossible. The type of Euler method should be chosen
carefully; the Improved Euler method gives an exact result in the case of free-fall, while the Euler-Cromer
method is more suitable for oscillating systems.
Examples
The following are examples of employing the Simple Euler method to solve some physical systems (each
system has an analytical solution with which you can check the result of the numerical solution).
A first-order system
A charging R-C circuit.
A simple R-C circuit is governed by the 1st-order D.E.:

i = dq/dt ,

where i is the current in the circuit: i = (V0-V)/R,
V0 is the charging voltage, and V is the potential
difference across the capacitor, V = q/C.
The analytical result for potential difference across the capacitor
as it charges is given by: V = V0 (1-exp(-t/RC))
The Simple Euler simulation is as follows:
The system should be initialised:

R  = 1000   ! Circuit resistance (Ohms)
C  = 1E-6   ! Circuit capacitance (Farads)
V0 = 12     ! Charging voltage (Volts)
V  = 0      ! Initial potential of the capacitor (Volts)
q  = 0      ! Initially uncharged (Coulombs)

dt = 1E-5   ! Euler time step (seconds)
t  = 0      ! Start at t=0 seconds

The Simple Euler steps are

i = (V0-V)/R   ! Calculate the circuit current
t = t + dt     ! Advance the time
q = q + i dt   ! Add a small amount of charge dq = i dt
V = q/C        ! Recalculate the voltage

The system evolves by iterating the above Euler steps.
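The charging simulation can be sketched in Python as follows (an illustration, not one of the course programs), with the numerical voltage compared against the analytic V = V0 (1 - exp(-t/RC)).

```python
# Simple Euler simulation of a charging R-C circuit.
import math

R, C, V0 = 1000.0, 1e-6, 12.0   # Ohms, Farads, Volts (so RC = 1 ms)
V = q = 0.0
dt, t = 1e-5, 0.0               # Euler time step (seconds)

for _ in range(500):            # evolve to t = 5 ms, i.e. 5 time constants
    i = (V0 - V) / R            # circuit current
    t += dt                     # advance the time
    q += i * dt                 # add a small amount of charge dq = i dt
    V = q / C                   # recalculate the capacitor voltage

exact = V0 * (1 - math.exp(-t / (R * C)))
print(V, exact)                 # the two agree to within a few millivolts
```

With dt = RC/100 the Euler result tracks the exponential closely; a larger dt would show the truncation error growing.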
A second-order system
The displacement of a mass-on-a-spring
The restoring force of a mass on a spring is given by F = -k.x
=> the motion of the body is governed by the 2nd-order D.E.:
x’’ = -kx/m
where k is the spring constant (N/m),
x is the displacement,
m is the inertial mass.
and two 1st-order D.E.s are formed:

dv/dt = -kx/m  and  dx/dt = v
The analytical result for the displacement is: x = x0 Cos(w.t)
where x0 is the amplitude and w is the angular frequency = SQRT(k/m)
The Euler simulation is as follows (here I show the Euler-Cromer
method - it is more accurate for oscillating systems and is more
concise to write down):
The system should be initialised:

m = 0.1   ! Mass (kg)
k = 1.0   ! Spring constant (N/m)
x = 0.1   ! Initial displacement (amplitude) (m)
v = 0     ! The mass is initially at rest

dt = 0.01 ! Time step (seconds)
t  = 0    ! Start at t=0

The Euler-steps are

t = t + dt    ! Advance the time
a = -kx/m     ! Calculate the acceleration
v = v + a dt  ! Advance the speed
x = x + v dt  ! Advance the displacement (using the new speed)

The system evolves by iterating the above Euler steps.
Algorithm 6b
As an example, a simulation of the displacement of a mass-on-a-spring is implemented in the algorithm
below. The 0.1 kg mass is initially at rest and at a displacement of 0.1 m. The spring constant is 1 N/m
and the simulation evolves in time steps of 0.01 seconds. The algorithm terminates after 100 iterations (1
second).

m  = 0.1    ! Mass
k  = 1.0    ! Spring constant
x0 = 0.1    ! Amplitude
dt = 0.01   ! Time step

x = x0      ! Initial displacement
v = 0.0     ! Initial velocity
t = 0.0     ! Initial time

do 100 iterations
   t = t + dt     ! Advance time
   a = -k x/m     ! Calculate the acceleration
   v = v + a dt   ! Advance the velocity
   x = x + v dt   ! Advance the displacement (Euler-Cromer)
   ! Output, and compare the displacement with the exact expression
   output x, x0*COS(SQRT(k/m)*t)
end do
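A Python version of Algorithm 6b (illustrative only, not a course program) shows the Euler-Cromer step and the comparison with the analytic solution:

```python
# Euler-Cromer simulation of a mass-on-a-spring, as in Algorithm 6b.
import math

m, k, x0, dt = 0.1, 1.0, 0.1, 0.01
x, v, t = x0, 0.0, 0.0
w = math.sqrt(k / m)            # angular frequency of the analytic solution

for _ in range(100):
    t += dt                     # advance time
    a = -k * x / m              # acceleration from F = -kx
    v += a * dt                 # advance the velocity
    x += v * dt                 # advance the displacement with the NEW
                                # velocity -- this is what makes it Euler-Cromer

print(x, x0 * math.cos(w * t))  # numerical vs analytic displacement
```

Using the updated velocity keeps the oscillation amplitude bounded; a Simple Euler step (old velocity) would make the amplitude grow steadily.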
4th-Order Runge-Kutta
For "serious" calculations, higher-order methods are employed. One very popular method is the 4th-order
Runge-Kutta method; the truncation error is greatly reduced, though at the expense of a more complex
algorithm. You can read more about this in your course text book.
Finite-Difference Methods
Many physical systems can be represented by partial differential equations. Such systems can be solved
numerically using finite-difference methods; i.e. the PDEs are replaced with finite-difference approximations.
For example dV/dt can be approximated by the forward-difference approximation (V(t+dt) - V(t))/dt, and
d^2 V/dx^2 can be approximated by the Central-Difference approximation ( V(x-dx) - 2V(x) + V(x+dx) ) /
dx^2. In this lecture we will look at just one such method; you can refer to the course text book for an
introduction to a number of other Finite-Difference methods.
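Both approximations are easy to check numerically; the sketch below (illustrative only) applies them to test functions whose derivatives are known.

```python
def forward_diff(f, t, dt):
    """Forward-difference approximation to df/dt; error is O(dt)."""
    return (f(t + dt) - f(t)) / dt

def cda2(f, x, h):
    """Central-difference approximation (CDA2) to d^2 f/dx^2; error is O(h^2)."""
    return (f(x - h) - 2 * f(x) + f(x + h)) / h**2

# d/dt of t^2 at t=1 is 2; the forward difference gives 2 + dt
print(forward_diff(lambda t: t**2, 1.0, 0.01))

# d^2/dx^2 of x^3 at x=2 is 12; the CDA2 error term involves the fourth
# derivative, which vanishes for a cubic, so the result is essentially exact
print(cda2(lambda x: x**3, 2.0, 0.01))
```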
Finite-Difference: Laplace
For a region of space which does not contain any electric charge the electric potential, V(x,y,z), in that
region must obey Laplace's equation: d^2 V/dx^2 + d^2 V/dy^2 + d^2 V/dz^2 = 0; here d^2/dx^2 etc. are
partial derivatives.
Laplace’s equation can be solved analytically for simple (symmetric) configurations; however, if a more
complex configuration is to be solved then a numerical method must be employed. The numerical solution
involves four steps: 1. The region of space is represented by a three-dimensional lattice where the potential
is defined at discrete points. 2. Laplace’s Equation is written numerically as a finite difference equation.
3. The numerical equation is solved. 4. A relaxation algorithm is employed to apply the solution until
Laplace’s equation is satisfied. The simplest method for this is the Jacobi Method.
1. A 3-d lattice
The potential V(x,y,z) in a region of space can be mapped by a lattice V(i, j, k) where i, j, and k specify
points in the lattice:
x = i.dx , y = j.dy , z = k.dz
where dx, dy, dz are the spacing between the lattice points in the x-, y-, and z-direction respectively.
The lattice is initialised with the boundary conditions (which are fixed) and with an initial approximation
to the solution (which will be relaxed to the solution). Note: as the lattice spacing tends to zero the lattice
becomes continuous space. However, the lattice spacing must be finite in this computed solution; the model
is therefore not exact.
2. A Finite Difference equation for Laplace’s Equation.
Recall that the central difference approximation for the second derivative of a function F(x) is given by:
CDA2 = ( F(x-h) - 2F(x) + F(x+h) ) / h^2
and so we can write (approximately):
d^2 V
----- = ( V(x-dx,y,z) - 2V(x,y,z) + V(x+dx,y,z) ) / dx^2
dx^2
d^2 V
----- = ( V(x,y-dy,z) - 2V(x,y,z) + V(x,y+dy,z) ) / dy^2
dy^2
d^2 V
----- = ( V(x,y,z-dz) - 2V(x,y,z) + V(x,y,z+dz) ) / dz^2
dz^2
and Laplace's equation becomes

[ V(x-h,y,z) - 2V(x,y,z) + V(x+h,y,z) +
  V(x,y-h,z) - 2V(x,y,z) + V(x,y+h,z) +
  V(x,y,z-h) - 2V(x,y,z) + V(x,y,z+h) ] / h^2  =  0

where, for convenience, the lattice spacings are set equal,
i.e. dx = dy = dz = h
3. Solution
Solving for V(x,y,z) [see homework] gives

V(x,y,z) = [ V(x-dx,y,z) + V(x+dx,y,z) +
             V(x,y-dy,z) + V(x,y+dy,z) +
             V(x,y,z-dz) + V(x,y,z+dz) ] / 6
The equation simply states that the value of the potential at any point is the average of the potential at
neighboring points. The solution for V(x,y,z) is the function that satisfies this condition at all points simultaneously and satisfies the boundary conditions.
4. Jacobi Relaxation
Applying the above solution to the region of space modifies the field so that it is in better agreement with
Laplace's equation. The solution must be applied many times, each iteration giving a better agreement with
Laplace's equation. The solution is satisfied when further iterations yield insignificant modifications to the
potential field. The difference between the old and new lattice can be expressed as:

            sum |V1 - V2|
Delta = --------------------
          number of points

Iteration can be terminated when Delta is less than some small value that corresponds to an insignificant
change in the solution. This method of relaxation of the potential field is one of many techniques which
can be used to solve for V(x,y,z). The Jacobi relaxation method is the simplest form of relaxation; other
methods are employed to speed up the relaxation process, especially for large lattices.
Algorithm 6c
The following algorithm implements Jacobi Relaxation for a dipole potential. It employs a 33-by-33
two-dimensional lattice and iterates until Delta is less than 10^-6. Note that the matrix is centered at (0,0).

Declare matrix V1(-16:+16,-16:+16)
Declare matrix V2(-16:+16,-16:+16)

V2 = 0.0          ! Set all grid points to zero potential
V2(-6,0) = -1.0   ! Create the -ve pole
V2(+6,0) = +1.0   ! Create the +ve pole

do
   V1 = V2        ! Make a copy of the old lattice

   ! Apply the solution to Laplace's equation
   ! but don't modify the boundary.
   do i = -15,+15
      do j = -15,+15
         V2(i,j) = ( V1(i-1,j) + V1(i+1,j) + V1(i,j-1) + V1(i,j+1) )/4
      end do
   end do

   V2(-6,0) = -1.0   ! Reset the -ve pole
   V2(+6,0) = +1.0   ! Reset the +ve pole

   ! Compute the difference between the old and new solution
   Delta = sum(abs(V1-V2)) / (33*33)
   if (Delta < 0.000001) exit   ! Terminate if Delta is small
end do
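Algorithm 6c can be sketched in Python as follows (an illustration, not the dipole.f90 program; plain lists are used, and the lattice is indexed 0..32, so the poles at (-6,0) and (+6,0) of the centred matrix become (10,16) and (22,16)).

```python
# Jacobi relaxation of a 2-D dipole on a 33x33 lattice.
N = 33
neg, pos = (16 - 6, 16), (16 + 6, 16)      # pole positions, grid centred at (16,16)

V2 = [[0.0] * N for _ in range(N)]
V2[neg[0]][neg[1]] = -1.0                  # the -ve pole
V2[pos[0]][pos[1]] = +1.0                  # the +ve pole

iterations = 0
while True:
    V1 = [row[:] for row in V2]            # copy of the old lattice
    for i in range(1, N - 1):              # interior points only;
        for j in range(1, N - 1):          # the boundary stays fixed at 0
            V2[i][j] = (V1[i-1][j] + V1[i+1][j] + V1[i][j-1] + V1[i][j+1]) / 4
    V2[neg[0]][neg[1]] = -1.0              # reset the poles
    V2[pos[0]][pos[1]] = +1.0
    delta = sum(abs(V1[i][j] - V2[i][j])
                for i in range(N) for j in range(N)) / (N * N)
    iterations += 1
    if delta < 1e-6:
        break

print(iterations)   # converges after a few hundred iterations
```

By symmetry of the initial field, the relaxed potential stays antisymmetric about the mid-plane between the poles, which makes a convenient sanity check.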
The following are results for a dipole and a parallel-plate capacitor; values of potential are represented by
characters:

-zyxwvutsrqponmlkjihgfedcba.ABCDEFGHIJKLMNOPQRSTUVWXYZ+
|            |             |             |            |
-1V        -0.5V          0 V         +0.5V        +1.0V
A Dipole (dipole.f90)
Initial field
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
...................--......................++...................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
Final field after 473 iterations
................................................................
................................................................
................................................................
.........aaaaaaaaaaaaaaaa..............AAAAAAAAAAAAAAAA.........
.......aaaaaaaaaaaaaaaaaaaa..........AAAAAAAAAAAAAAAAAAAA.......
.....aaaaaaaaaaaaaaaaaaaaaaaa......AAAAAAAAAAAAAAAAAAAAAAAA.....
...aaaaaaaaaabbbbbbbbaaaaaaaa......AAAAAAAABBBBBBBBAAAAAAAAAA...
...aaaaaabbbbbbbbbbbbbbbbaaaa......AAAABBBBBBBBBBBBBBBBAAAAAA...
...aaaabbbbbbbbbbccbbbbbbbbaaaa..AAAABBBBBBBBCCBBBBBBBBBBAAAA...
...aaaabbbbccccccccccccbbbbaaaa..AAAABBBBCCCCCCCCCCCCBBBBAAAA...
.aaaabbbbccccddddddddddccbbbbaa..AABBBBCCDDDDDDDDDDCCCCBBBBAAAA.
.aaaabbccccddddeeeeeeddddccbbaa..AABBCCDDDDEEEEEEDDDDCCCCBBAAAA.
.aaaabbccddddeeffffffffeeddccaa..AACCDDEEFFFFFFFFEEDDDDCCBBAAAA.
.aabbbbccddeeffgghhhhggffeeccbb..BBCCEEFFGGHHHHGGFFEEDDCCBBBBAA.
.aabbccddeeffggiijjkkiiggeeddbb..BBDDEEGGIIKKJJIIGGFFEEDDCCBBAA.
.aabbccddeeffhhjjmmoolliiffddbb..BBDDFFIILLOOMMJJHHFFEEDDCCBBAA.
.aabbccddeeffhhkkpp--oojjggddbb..BBDDGGJJOO++PPKKHHFFEEDDCCBBAA.
.aabbccddeeffhhjjmmoolliiffddbb..BBDDFFIILLOOMMJJHHFFEEDDCCBBAA.
.aabbccddeeffggiijjkkiiggeeddbb..BBDDEEGGIIKKJJIIGGFFEEDDCCBBAA.
.aabbbbccddeeffgghhhhggffeeccbb..BBCCEEFFGGHHHHGGFFEEDDCCBBBBAA.
.aaaabbccddddeeffffffffeeddccaa..AACCDDEEFFFFFFFFEEDDDDCCBBAAAA.
.aaaabbccccddddeeeeeeddddccbbaa..AABBCCDDDDEEEEEEDDDDCCCCBBAAAA.
.aaaabbbbccccddddddddddccbbbbaa..AABBBBCCDDDDDDDDDDCCCCBBBBAAAA.
...aaaabbbbccccccccccccbbbbaaaa..AAAABBBBCCCCCCCCCCCCBBBBAAAA...
...aaaabbbbbbbbbbccbbbbbbbbaaaa..AAAABBBBBBBBCCBBBBBBBBBBAAAA...
...aaaaaabbbbbbbbbbbbbbbbaaaa......AAAABBBBBBBBBBBBBBBBAAAAAA...
...aaaaaaaaaabbbbbbbbaaaaaaaa......AAAAAAAABBBBBBBBAAAAAAAAAA...
.....aaaaaaaaaaaaaaaaaaaaaaaa......AAAAAAAAAAAAAAAAAAAAAAAA.....
.......aaaaaaaaaaaaaaaaaaaa..........AAAAAAAAAAAAAAAAAAAA.......
.........aaaaaaaaaaaaaaaa..............AAAAAAAAAAAAAAAA.........
................................................................
................................................................
................................................................
A parallel plate capacitor (capacitor.f90)
For a capacitor simply replace:

V2(-6,0)=-1.0
V2(+6,0)=+1.0

with:

V2(-6,-8:+8)=-1.0
V2(+6,-8:+8)=+1.0
Initial field
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
...................--......................++...................
...................--......................++...................
...................--......................++...................
...................--......................++...................
...................--......................++...................
...................--......................++...................
...................--......................++...................
...................--......................++...................
...................--......................++...................
...................--......................++...................
...................--......................++...................
...................--......................++...................
...................--......................++...................
...................--......................++...................
...................--......................++...................
...................--......................++...................
...................--......................++...................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
................................................................
Final field after 330 iterations
................................................................
.....aaaaaaaaaaaaaaaaaaaaaaaa......AAAAAAAAAAAAAAAAAAAAAAAA.....
...aaaabbbbbbccccccccccbbbbaaaa..AAAABBBBCCCCCCCCCCBBBBBBAAAA...
.aaaabbccccddddeeeeeeddddccbbaa..AABBCCDDDDEEEEEEDDDDCCCCBBAAAA.
.aabbccccddeeffffggggffeeddccaa..AACCDDEEFFGGGGFFFFEEDDCCCCBBAA.
.aabbccddffgghhiiiiiihhggeeddbb..BBDDEEGGHHIIIIIIHHGGFFDDCCBBAA.
.aaccddeegghhjjkkllllkkiiggeebb..BBEEGGIIKKLLLLKKJJHHGGEEDDCCAA.
.bbcceeffhhjjllnnppqqoolliiffcc..CCFFIILLOOQQPPNNLLJJHHFFEECCBB.
.bbcceeggiikkmmpptt--ssnnjjggcc..CCGGJJNNSS++TTPPMMKKIIGGEECCBB.
.bbddffhhjjlloorruu--ttookkggdd..DDGGKKOOTT++UURROOLLJJHHFFDDBB.
.bbddffhhkkmmppssvv--uuppllhhdd..DDHHLLPPUU++VVSSPPMMKKHHFFDDBB.
.bbddffiikknnppssww--uuqqllhhdd..DDHHLLQQUU++WWSSPPNNKKIIFFDDBB.
.bbddggiillnnqqttww--uuqqmmhhdd..DDHHMMQQUU++WWTTQQNNLLIIGGDDBB.
.bbeeggiillooqqttww--vvqqmmhhdd..DDHHMMQQVV++WWTTQQOOLLIIGGEEBB.
.bbeeggjjllooqqttww--vvqqmmiidd..DDIIMMQQVV++WWTTQQOOLLJJGGEEBB.
.bbeeggjjlloorrttww--vvqqmmiidd..DDIIMMQQVV++WWTTRROOLLJJGGEEBB.
.bbeeggjjlloorrttww--vvqqmmiidd..DDIIMMQQVV++WWTTRROOLLJJGGEEBB.
.bbeeggjjlloorrttww--vvqqmmiidd..DDIIMMQQVV++WWTTRROOLLJJGGEEBB.
.bbeeggjjllooqqttww--vvqqmmiidd..DDIIMMQQVV++WWTTQQOOLLJJGGEEBB.
.bbeeggiillooqqttww--vvqqmmhhdd..DDHHMMQQVV++WWTTQQOOLLIIGGEEBB.
.bbddggiillnnqqttww--uuqqmmhhdd..DDHHMMQQUU++WWTTQQNNLLIIGGDDBB.
.bbddffiikknnppssww--uuqqllhhdd..DDHHLLQQUU++WWSSPPNNKKIIFFDDBB.
.bbddffhhkkmmppssvv--uuppllhhdd..DDHHLLPPUU++VVSSPPMMKKHHFFDDBB.
.bbddffhhjjlloorruu--ttookkggdd..DDGGKKOOTT++UURROOLLJJHHFFDDBB.
.bbcceeggiikkmmpptt--ssnnjjggcc..CCGGJJNNSS++TTPPMMKKIIGGEECCBB.
.bbcceeffhhjjllnnppqqoolliiffcc..CCFFIILLOOQQPPNNLLJJHHFFEECCBB.
.aaccddeegghhjjkkllllkkiiggeebb..BBEEGGIIKKLLLLKKJJHHGGEEDDCCAA.
.aabbccddffgghhiiiiiihhggeeddbb..BBDDEEGGHHIIIIIIHHGGFFDDCCBBAA.
.aabbccccddeeffffggggffeeddccaa..AACCDDEEFFGGGGFFFFEEDDCCCCBBAA.
.aaaabbccccddddeeeeeeddddccbbaa..AABBCCDDDDEEEEEEDDDDCCCCBBAAAA.
...aaaabbbbbbccccccccccbbbbaaaa..AAAABBBBCCCCCCCCCCBBBBBBAAAA...
.....aaaaaaaaaaaaaaaaaaaaaaaa......AAAAAAAAAAAAAAAAAAAAAAAA.....
................................................................
The Fortran programs for these two simulations can be found in the downloads section of the course website.
Results for various configurations of charges, and for larger matrix sizes, can be found at:
http://www1.gantep.edu.tr/~andrew/eee484/downloads/laplace/
The program sources are also available at that URL.
6.3 Lab Exercises
Task 1: Investigation of the potential fields for various configurations
First download the program source codes dipole.f90 and capacitor.f90 from the course web site downloads
page. Compile and run them. If you prefer, try translating these programs to C or C++. The two
programs are the same except for the definitions of the dipole and capacitor plates. The dipole potential is
defined with the assignments:
V2(-6,0)=-1.0
V2(+6,0)=+1.0
The capacitor plates are defined with the assignments:
V2(-6,-8:+8)=-1.0
V2(+6,-8:+8)=+1.0
Note that the assignments appear twice in each program.
To simulate other configurations of potentials the above assignments are simply replaced by the appropriate potential distribution (the rest of the program remains unchanged). For example, a + (plus) shaped
arrangement can be achieved with the assignments:
V2(0,-8:+8)=-1.0
V2(-8:+8,0)=+1.0
[Initial field output: a bar of - poles crossed at the centre by a perpendicular bar of + poles, forming a + shape.]
Modify the dipole program for the following configurations, run your programs and check the outputs:
1. A dipole with equally signed potentials
2. A monopole
3. A quadrupole
4. A box potential
After you have investigated the above configurations try some others.
Task 2: Implementation of Euler simulations of a charging R-C circuit and a mass-on-a-spring
Implement the Euler simulations given in the lecture as computer programs and compare the
outputs with the analytical results. If you have time, compare the performance of the Simple Euler, Euler-Cromer and Improved Euler methods for these systems (the formulations can be tricky!).
1. A charging R-C circuit governed by the 1st-order D.E. dq/dt = (Vo - V)/R, where R = 1000 Ohms, C =
1 micro Farad, and the charging voltage Vo is 12 Volts. With the capacitor initially uncharged and a time step dt
= 10^-5 seconds, evolve the system for 1 millisecond.
Compare your result for the circuit voltage with the analytical solution: V(t) = Vo (1 - e^(-t/RC))
2. A mass-on-a-spring governed by the 2nd-order D.E. x'' = -kx/m, where m = 0.1 kg, k = 1 N/m. With an
initial displacement xo = 0.1 m and the body initially at rest, using a time step of 0.01 s, evolve the system
for 10 seconds.
Compare your result for the displacement of the mass with the analytical solution: x(t) = xo cos(wt) where
w = sqrt(k/m).
6.4 Lab Solutions
Task 1: Investigation of the potential fields for various configurations
Modify the dipole program for the following configurations, run your programs and check the outputs:
1. A dipole with equally signed potentials
2. A monopole
3. A quadrupole
4. A box potential
Solutions (see the downloads page):
Sources:  eee484ex6a1      eee484ex6a2      eee484ex6a3      eee484ex6a4
Outputs:  eee484ex6a1.out  eee484ex6a2.out  eee484ex6a3.out  eee484ex6a4.out
The potential definitions are:
A (+ +) dipole
V2(-6,0)=+1.0
V2(+6,0)=+1.0
A monopole
V2(0,0)=+1.0
A quadrupole
V2(-6,-6)=+1.0
V2(+6,+6)=+1.0
V2(-6,+6)=-1.0
V2(+6,-6)=-1.0
A box potential
V2(-8,-8:+8)=+1.0
V2(+8,-8:+8)=+1.0
V2(-8:+8,-8)=+1.0
V2(-8:+8,+8)=+1.0
Task 2: Implementation of Euler simulations of a charging R-C circuit and a mass-on-a-spring
1. A charging R-C circuit governed by the 1st-order D.E. dq/dt = (Vo - V)/R, where R = 1000 Ohms, C =
1 micro Farad, and the charging voltage Vo is 12 Volts. With the capacitor initially uncharged and a time step dt =
10^-5 seconds, evolve the system for 1 millisecond.
Compare your result for the circuit voltage with the analytical solution V(t) = Vo (1 - e^(-t/RC))
Solution eee484ex6b1 (see the downloads page).
For this Simple Euler simulation a 10 micro-second time step yields, initially, a 0.5 percent error; in this case
the error reduces slightly as the system evolves (to 0.3 percent after 1 ms). Reducing the time step to 1
micro-second reduces this initial error to 0.05 percent. Again, it can be shown that the error is proportional
to the size of the time step. A much more accurate simulation is obtained with the Improved Euler method,
eee484ex6b1improved; here the initial error of 0.5 percent is quickly reduced to near zero.
2. A mass-on-a-spring governed by the 2nd-order D.E. x'' = -kx/m, where m = 0.1 kg, k = 1 N/m. With an
initial displacement xo = 0.1 m and the body initially at rest, using a time step of 0.01 s, evolve the system
for 10 seconds.
Compare your result for the displacement of the mass with the analytical solution x(t) = xo cos(wt) where w
= sqrt(k/m).
Solution eee484ex6b2 (see the downloads page).
With the Simple Euler simulation, although the period of the system is well reproduced, the amplitude
of the oscillation increases (thereby not conserving energy); you can check this by comparing the simulated
displacement with the expected theoretical displacement over a few periods. The Euler-Cromer method,
eee484ex6b2cromer.f90, is much more accurate: it reproduces well both the period and the amplitude of the
system. Euler-Cromer is generally better for oscillating systems.
Conclusion
For the above simulations, and in general, we see that the Simple Euler method does not perform well. The
Improved Euler method can give much greater accuracy, except in the case of oscillating systems where
the Euler-Cromer method is the preferred choice. Errors are proportional to the size of the time step, dt,
i.e. the error in each simulation can be reduced by reducing the value of dt. However, reducing dt makes
it necessary to perform more iterations, increasing the run-time of the simulation; round-off errors may also
become large (in which case double precision should be used).
Final note
In practice, higher-order methods such as the 4th-order Runge-Kutta method are often employed. See rkshm.f90 on the downloads page.
6.5 Example exam questions
Question 1
Using the central-difference approximation to the second derivative
of a function F(x): CDA2 = ( F(x-h) - 2F(x) + F(x+h) ) / h^2
show that for a region of space that does not contain any
electric charge the following expression satisfies Laplace's equation.
V(i,j,k) = [ V(i-1,j,k) + V(i+1,j,k) +
V(i,j-1,k) + V(i,j+1,k) +
V(i,j,k-1) + V(i,j,k+1) ] / 6
where the matrix V represents the potential at
discrete points i,j,k in a three-dimensional lattice.
Your answer should include an explanation of the mapping of
x, y, and z space onto i, j, and k points in the lattice.
Hint: Laplace's equation for a chargeless region of space:

d^2 V     d^2 V     d^2 V
-----  +  -----  +  -----  =  0
dx^2      dy^2      dz^2
Question 2
Using the central-difference approximation to the second derivative
of a function F(x): CDA2 = ( F(x-dx) - 2F(x) + F(x+dx) ) / dx^2
and the forward-difference approximation to the first derivative
of a function F(t): FDA = ( F(t+dt) - F(t) ) / dt, show that the
finite-difference solution to the heat equation for a thin rod is
given by:
U(i,j+1) = r U(i-1,j) + (1-2r) U(i,j) + r U(i+1,j)
where the matrix U represents the temperature at
discrete points i,j in a two-dimensional lattice.
Your answer should include an explanation of the mapping of
x and t space onto i and j points in the lattice.
Hint: the heat equation for a one-dimensional conductor:

dU/dt = c d^2 U / dx^2
Question 3
a) Write down the formulae representing the following methods for the
numerical evolution of the displacement, y(t), of a body governed
by the second-order differential equation y''(t) = -g
i. Simple Euler method
ii. Improved Euler method
iii. Euler-Cromer method
b) Show that the Improved Euler method yields an exact result.
c) Write a computer program implementing the Improved Euler
method for the above system.
Question 4
a) A simple R-C circuit is governed by the 1st-order D.E.:
i = dq/dt, where i is the current in the circuit: i = V/R,
and V is the p.d. across the capacitor V = q/C.
Write down Euler steps representing the time-evolution of the potential difference
across the capacitor. Implement your formulae in a computer program.
b) A simple R-C circuit is governed by the 1st-order D.E.:
i = dq/dt, where i is the current in the circuit: i = (V0-V)/R,
V0 is the charging voltage, and V is the potential
difference across the capacitor V = q/C.
Write down Euler steps representing the time-evolution of the potential difference
across the capacitor. Implement your formulae in a computer program.
c) The restoring force in a mass-on-a-spring system is given by
F = -k.x, and the motion of the body is governed by the 2nd-order D.E.:
x'' = -k.x/m
where k is the spring constant (N/m), x is the displacement, and m is
the inertial mass. Two 1st-order D.E.s can be formed: dv/dt = -k.x/m and dx/dt = v
Write down Euler steps representing the time-evolution of the
displacement of the mass. Implement your formulae in a computer program.
d) (In this question you are not given the differential equations describing the
system, so you need to build them yourself). Write down Euler steps representing
the time-evolution of the voltage V recorded by the voltmeter in the system shown
below. Implement your formulae in a computer program.
The diagram is an R-C circuit:

              | |
       +------| |-------+
       |       C        |
       |                |
       |    +------+    |
       +----|  R   |----+
       |    +------+    |
       |                |
       +------(V)-------+

C = 1 micro Farad
R = 1000 Ohms
At t = 0, V = 12 volts
(V) is a voltmeter; you can assume it has infinite internal resistance.
7 Random Variables and Frequency Experiments
7.1 Topics Covered
o Review of Probability and Random Variables: the student should be familiar with the topics taught in
EEE 283 (operations on pdfs and pmfs, probabilities and conditional probabilities).
o Generation of pseudo-random numbers: the student should know how to generate random numbers (e.g.
by using the rand() function) and write computer programs to perform frequency experiments.
o Transformation of a uniform pdf to a non-uniform pdf: given a uniform pdf fX(x)=1 with 0<x<1 and
the transformation function y=T(x), the student should be able to determine and sketch the resultant
non-uniform pdf fY(y); the student should also be familiar with the "rejection method" for generating a
non-uniform pdf from a uniform pdf.
7.2 Lecture Notes
Introduction
Probability, random variables and random processes are important topics in science and engineering. These
topics are covered in the course EEE 283 (check out my web site for that course). Key to this subject are the
ideas of the random variable, probability density functions (pdf’s) and probability mass functions (pmf’s)
and operations on them. This is treated theoretically in EEE 283. However, all the results given in that
course can be reproduced experimentally by taking a frequency interpretation of probability.
[A brief summary of Probability and Random Variables is given in class].
For example, consider the tossing of a coin; we know that the probability of the outcome being "heads" is
0.5; in probability theory this is written P("heads") = 0.5. The frequency interpretation is P("heads") =
nheads/n in the limit as n goes to infinity, where n is the number of tosses of the coin (the number of trials)
and nheads is the number of outcomes that give "heads", i.e. we perform an experiment where the coin is
tossed an infinite number of times and we count the number of times the coin comes up "heads". In reality a good
approximation to the probability can be obtained with a large finite value of n.
Such frequency experiments allow us to verify that a theoretical result is true (a great help when writing
exam questions for EEE 283!) and find results for cases where the theoretical calculations are difficult to
evaluate (i.e. many problems in the real world).
To perform a frequency experiment one needs to generate a large number of trials. Often we have to do this
by hand (e.g. to test the effect of a new drug, a clinical trial is performed where a large number of patients
is given the drug while another large number of patients is not; the outcomes are analysed statistically).
However, if we know the underlying probabilities that govern a system (e.g. P("heads") = 0.5) then we can
simulate an experiment using a computer. For this we need to be able to generate a probability density
function (pdf), i.e. lists of numbers X = (x1, x2, x3, ..., xn) that are randomly distributed according to some
function fX(x). The most basic pdf is fX(x)=1 with 0<x<1, i.e. a uniform distribution.
Generating Random Numbers
In Fortran 90 the intrinsic subroutine random_number provides the programmer with lists of random
numbers uniformly distributed in the range 0<r<1. In the following example, array r is filled with random
numbers:
Algorithm 7a1 (Fortran syntax)
real :: r(8)
call random_number(r)
print *, r
Example result:
0.983900 0.699951 0.275312 0.661102 0.809842 0.910005 0.304463 0.484259
The equivalent in C++, using the standard library rand() function, is:
Algorithm 7a2 (c++ syntax)
for (int i=1; i<=8; i++) {
double r = rand()/(double(RAND_MAX)+1);
cout << r << " ";
}
cout << endl;
Example result:
0.840188 0.394383 0.783099 0.798440 0.911647 0.197551 0.335223 0.768230
These programs output 8 pseudo-random numbers. Random number generators create a sequence of pseudo-random numbers, usually distributed uniformly between 0 and 1. The numbers are not truly random; they
are created by a deterministic algorithm, hence the term pseudo-random.
There are various algorithms for producing large sequences of random numbers of varying quality. The
quality of a random number generator relates to four main properties:
1. The apparent randomness of the sequence.
2. The size of the period of the sequence, i.e. how many numbers are generated before the sequence repeats;
this varies from 10^9 in a minimal standard generator to 10^43 or more in high quality generators.
3. The uniformity of the distribution of random numbers; is the distribution flat? does it have gaps?
4. The distribution should pass some statistical/spectral tests.
[Sketches: three histograms of sets of random numbers R in the range 0 to 1:
 (i)   a uniform (flat) distribution;
 (ii)  a distribution with some non-uniformity;
 (iii) a distribution with gaps.]
A popular primitive algorithm is the multiplicative linear congruential generator, first used in 1948; with
carefully chosen constants this provides a good basic generator:
R(i+1) = ( a R(i) + b ) MOD m, where MOD means modulo.
The constants a, b and m are chosen carefully such that the sequence of numbers becomes chaotic and evenly
distributed. Park and Miller proposed a minimal standard with which more complex generators can be
compared; the constants are taken as:
a = 7^5 = 16807, b = 0, and m = 2^31 - 1 = 2147483647.
The range of values is 1 to m (divide by m to convert to 0<r<1). The period of this generator is m-1, about
2 billion.
A computer implementation of this algorithm using 32-bit integers is not straightforward, as a times R
can fall outside the integer range; we have to apply a trick (an approximate factorisation of m). The algorithm
is implemented below in Fortran 90 (see also the downloads page on the course website for the Fortran
77, C and C++ versions of this program). The algorithm is in the form of a function ran() to which
a seed is passed. The function returns a random number; the seed is returned modified so that the next call
of the function returns the next random number in the sequence. Before the first call to the function the
seed needs to be initialised with any value from 1 to 2147483647 (not zero). Different initial seed values result in
different sequences of random numbers.
ran.f90

integer :: i, iseed
real :: r
iseed=314159265   ! Initialise the seed
do i=1,10
  r=ran(iseed)    ! ran() is a function that
  print *, r      ! returns a random number.
end do
contains ! ran() is defined as an internal function:
real function ran(iseed)
!--------------------------------------------------------------
! Returns a uniform random deviate between 0.0 and 1.0.
! Based on: Park and Miller's "Minimal Standard" random number
!           generator (Comm. ACM, 31, 1192, 1988)
!--------------------------------------------------------------
implicit none
integer, intent(inout) :: iseed
integer, parameter :: IM=2147483647, IA=16807, IQ=127773, IR=2836
real,    parameter :: AM=128.0/IM
integer :: K
K = iseed/IQ
iseed = IA*(iseed-K*IQ) - IR*K
if (iseed < 0) iseed = iseed+IM
ran = AM*(iseed/128)
end function ran
end
For the given initial seed 314159265 the program gives the following sequence of values:
0.7264141
0.0418309
0.8427828
0.0521500
0.6508798
0.4857842
0.3372238
0.5747718
0.7214876
0.1894919
As mentioned above, the period of this algorithm is m-1 = 2147483646. This is actually not large; for example,
my 2.4 GHz cpu takes only 23 seconds to generate the complete sequence of random numbers! Compared
to, for example, simulations of high energy particle reactions, where farms of computers generate datasets
over days, the period of this generator is clearly not sufficient. Improved algorithms are available
providing uniform distributions of random numbers with periods of 10^12, 10^18, 10^43 and even 10^171; these
algorithms, however, are much more complex.
Frequency Experiments
We now have a method for generating large numbers of (pseudo) random numbers: "call random_number(r)"
in Fortran 90, and "double r = rand()/(double(RAND_MAX)+1);" in C++; and we can return to the
frequency experiments. Consider again the tossing of a coin. We can create an experiment by generating a
large number of random values (uniformly distributed between 0 and 1), calling any value less than 0.5 a
"head", and counting the number of times this occurs. This is illustrated in the algorithm below where a coin
is tossed one million times (n=1000000) and the fraction nheads/n is output.
Algorithm 7b (Fortran syntax [mostly])
The second (concise) form of the algorithm makes use of Fortran whole-array processing and intrinsics.
n = 1000000
nheads = 0
do i = 1, n
  call random_number(r)
  if (r<0.5) nheads = nheads+1
end do
output nheads/n

real :: r(1000000)
call random_number(r)
print *, count(r<0.5)/1000000.
end
Example output: 0.499687
The result is close to, but not exactly, the expected value of 0.5 because the process of generating outcomes
is random. Repeating the above experiment with different sample sizes, n, gives the following results:
n          nheads/n
100        0.47
1000       0.499
10000      0.5030
100000     0.50067
1000000    0.499687
Note that the difference between the value of nheads/n and 0.5 gets smaller as n increases, i.e. the experiment becomes more accurate as the statistics increase.
Operations on Random Variables
Continuing the subject of random variables, an important topic is that of operations on random variables.
Basic operations include the calculation of the expectation value and variance of a probability density function. The expectation value E[X] is the first moment about the origin (denoted by m1); it can be viewed
as the center of mass or arithmetic mean of a distribution and is defined as E[X] = m1 = the integral of
the product "x f(x)". The variance is the second moment about the mean (denoted by mu2); it represents
a measure of the size of the spread of the distribution about the mean m1, and is defined as E[(X-m1)^2]
= mu2 = the integral of the product "(x-m1)^2 f(x)". The variance can also be expressed, by simple algebraic
arguments, as mu2 = m2 - m1^2, where m2 is the second moment about the origin defined as E[X^2] = m2 = the
integral of the product "x^2 f(x)". We will now calculate the expectation value and variance for the uniform
pdf both theoretically and via a frequency experiment as follows.
Theory:
We have the random variable X with a pdf f(x)=1 in the range 0<x<1. E[X] = m1 = the integral of the
product "x f(x)" = 1/2 (see class notes for the integral), and the variance E[(X-1/2)^2] = mu2 = the integral
of the product "(x-1/2)^2 f(x)" = 1/12 (see class notes for the integral). Alternatively mu2 = m2 - m1^2; with
m2 = the integral of the product "x^2 f(x)" = 1/3, mu2 = 1/3 - (1/2)^2 = 1/12.
Experiment:
We now generate n random variables X = (x1, x2, x3, ..., xn) from a set of uniformly distributed values
0<x<1, and calculate experimentally the expectation value and the variance. For this we can use directly
the random number generator intrinsic to the Fortran or C++ compiler. In this frequency experiment the
calculation of the expectation value E[X] = m1 becomes the sum of the values of x normalised to the number
of values; i.e. m1 = (x1+x2+x3+...+xn)/n, which is simply the arithmetic mean. For the variance mu2, it is
convenient to use the equality mu2 = m2 - m1^2, which requires m2 = (x1^2+x2^2+x3^2+...+xn^2)/n.
Algorithm 7c (Fortran syntax [mostly])
The second (concise) form of the algorithm makes use of Fortran whole-array processing and intrinsics.
n = 1000000
m1=0, m2=0
do i = 1, n
  call random_number(x)
  m1 = m1 + x
  m2 = m2 + x^2
end do
m1 = m1/n
m2 = m2/n
output "    mean, m1 = ", m1
output "variance, mu2 = ", m2-m1^2

integer, parameter :: n = 1000000
real(kind=8) :: x(1000000), m1, m2
call random_number(x)
m1 = sum(x)/n
m2 = sum(x**2)/n
print *, "    mean, m1 = ", m1
print *, "variance, mu2 = ", m2 - m1**2
end
Example output (for n = 1,000,000):
     mean, m1 = 0.5000423   [1/1.9998]
variance, mu2 = 0.0832316   [1/12.015]
Increasing the number of trials to n = 1,000,000,000 (the program takes less than 1 minute to run!) gives:
     mean, m1 = 0.500005976  [1/1.99998]
variance, mu2 = 0.083332809  [1/12.00008]
While we do not obtain the exact values, it is clear that the expectation value and variance tend toward
the theoretical values for large n.
The above demonstration can be repeated for the triangular pdf: f(x) = 2x with 0<x<1. Here, E[X] = m1
= the integral of the product "x 2x" = 2/3 (see class notes for the integral), and the variance E[(X-2/3)^2] =
mu2 = the integral of the product "(x-2/3)^2 2x" = 1/18 (see class notes for the integral). Alternatively mu2
= m2 - m1^2; with m2 = the integral of the product "x^2 2x" = 1/2, mu2 = 1/2 - (2/3)^2 = 1/18.
For the experiment, Algorithm 7c only needs to be modified such that random numbers are distributed in
the form of a triangular pdf. This is obtained with the transformation x=sqrt(x); i.e. replace "call random_number(x)"
with "call random_number(x); x=sqrt(x)" [the transformation of distributions will be studied
later in this topic]. The result for n = 1,000,000,000 is:
     mean, m1 = 0.6666715428590047    [2/2.999978]
variance, mu2 = 0.055555029965474394  [1/18.013]
Again the experimental results are in agreement with the theoretical results.
Total Probability and Conditional Probability
Total probabilities are obtained by integrating the pdf. The probability of obtaining a value between a and
b is defined as P(a<X<b) = the integral of f(x) over the limits a to b. For example, for the triangular pdf f(x)
= 2x with 0<x<1, the probability of obtaining a value between 0.5 and 0.9 is P(0.5<X<0.9) = the integral
of 2x over the limits 0.5 to 0.9 = 0.56.
Experimentally, the probability is obtained by simply counting the number of values that appear within the
given range. This is illustrated below:
Algorithm 7d1 (Fortran syntax [mostly])

n = 1000000
m = 0
do i = 1, n
call random_number(x); x=sqrt(x)
if (x>0.5 .and. x<0.9) m = m+1
end do
output "P{0.5<X<0.9} = ", m/n

The same experiment as a complete whole-array Fortran program:

integer, parameter :: n = 1000000
real :: x(n), m
call random_number(x); x=sqrt(x)
m = count(x>0.5 .and. x<0.9)/real(n)
print *, "P{0.5<X<0.9} = ", m
(remember that the statement ”x=sqrt(x)” transforms the uniform pdf to a triangular pdf)
The result is shown below for various values of n.
n =       1000   P{0.5<X<0.9} = 0.574
n =    1000000   P{0.5<X<0.9} = 0.560993
n = 1000000000   P{0.5<X<0.9} = 0.560003037
For a large number of trials the probability tends toward the theoretical result.
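The counting estimate of a total probability is equally brief in other languages. Here is a sketch in Python with NumPy (an assumption: the course uses Fortran, and the seed and names are illustrative) for the triangular pdf:

```python
import numpy as np

def p_interval(n=1_000_000, lo=0.5, hi=0.9, seed=1):
    """Estimate P(lo < X < hi) for the triangular pdf f(x) = 2x by
    counting how many transformed deviates fall in the interval."""
    rng = np.random.default_rng(seed)
    x = np.sqrt(rng.random(n))       # triangular pdf f(x) = 2x
    return np.count_nonzero((x > lo) & (x < hi)) / n

print(p_interval())   # theory: 0.9**2 - 0.5**2 = 0.56
```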
From probability theory, the conditional probability P(A|B) = P(A intersect B)/P(B), where P(A|B) reads
"the probability of A given that B has occurred". For example, P(0.5<X<0.9|X>0.6) = P(0.5<X<0.9
intersect X>0.6) / P(X>0.6) = P(0.6<X<0.9) / P(X>0.6) = 0.45/0.64 = 0.703125. See the lecture for the
full calculations.
Experimentally, the probability is obtained by simply counting the number of values that appear within the
given range after first requiring that x>0.6. This is illustrated below:
Algorithm 7d2 (Fortran syntax [mostly])

m = 0; n = 0
do
call random_number(x); x=sqrt(x)   ! a triangular pdf
if ( x<0.6 ) cycle                 ! condition X>0.6 has occurred
n = n+1                            ! increase the trial count
if ( x>0.5 .and. x<0.9 ) m = m+1   ! condition 0.5<X<0.9
if ( n==1000000 ) exit             ! exit when there are enough trials
end do
output "P{0.5<X<0.9|X>0.6} = ", m/n

Here, "cycle" means return to the top of the loop, and "exit" means drop out of the loop.
The result is shown below for various values of n.
n =       1000   P{0.5<X<0.9|X>0.6} = 0.691
n =    1000000   P{0.5<X<0.9|X>0.6} = 0.703951
n = 1000000000   P{0.5<X<0.9|X>0.6} = 0.703116218
As the number of trials increases the result moves closer to the theoretical value obtained from the rule
P(A|B) = P(A intersect B)/P(B).
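Algorithm 7d2 can be cross-checked in the same way. A sketch in Python with NumPy (an assumption: the course codes in Fortran; seed and names are illustrative) that counts only the trials where the condition occurred:

```python
import numpy as np

def p_conditional(n=1_000_000, seed=2):
    """Estimate P(0.5 < X < 0.9 | X > 0.6) for the triangular pdf
    f(x) = 2x: count only trials where the condition X > 0.6 occurred."""
    rng = np.random.default_rng(seed)
    x = np.sqrt(rng.random(n))      # triangular pdf
    given = x[x > 0.6]              # trials where B = {X > 0.6} occurred
    return np.count_nonzero((given > 0.5) & (given < 0.9)) / given.size

print(p_conditional())   # theory: 0.45/0.64 = 0.703125
```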
The Binomial probability mass function
If the probability of success for a single trial is p then the probability of k successes out of n trials is given by
the Binomial pmf f(k) = C(n,k) p^k (1-p)^(n-k), where the binomial coefficient is C(n,k) = n!/(k!(n-k)!).
A program to generate this pmf can be found on the download page (binomial-pmf.f90).
For example if p=0.25 and n=6, the pmf is:

k   P{k}       Experimental
0   0.177979   0.177988
1   0.355957   0.356018
2   0.296631   0.296520
3   0.131836   0.131854
4   0.032959   0.032981
5   0.004395   0.004395
6   0.000244   0.000244
The values in the column labelled "Experimental" are obtained with the following algorithm.
Algorithm 7d3 (Fortran syntax [mostly])
In this algorithm m is an integer vector with 7 elements indexed 0 to 6, and x is a real vector with 6 elements.

n = 100000000
real x(6); integer m(0:6) = 0
do i = 1, n
call random_number(x)   ! generate 6 random values
k = count(x<0.25)       ! count how many are < 0.25
m(k) = m(k)+1
end do
output "P{k} = ", m/real(n)
Note that the output statement contains 7 values from the vector array m (see the above table).
For a large number of trials the probability tends toward the theoretical result. This experimental result
has two consequences: first, it demonstrates the correctness of the expression for the binomial distribution,
and second, it demonstrates a computational method for simulating stochastic processes.
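Algorithm 7d3 can also be reproduced outside Fortran. The following sketch in Python with NumPy (an assumption: seed, trial count and function name are illustrative) tallies the number of successes per trial exactly as m(k) does:

```python
import numpy as np

def binomial_experiment(trials=200_000, n=6, p=0.25, seed=3):
    """Frequency estimate of the Binomial pmf, mirroring Algorithm 7d3:
    draw n uniform deviates per trial, count successes (values < p),
    and tally how often each value of k occurs."""
    rng = np.random.default_rng(seed)
    k = np.count_nonzero(rng.random((trials, n)) < p, axis=1)
    return np.bincount(k, minlength=n + 1) / trials

print(binomial_experiment())   # frequencies close to the pmf table above
```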
In the next topic we will look in more detail at performing computational simulations of systems that involve
stochastic processes; this field of study is called Monte Carlo Simulation. Before we can do this we need to
know how to generate pdf’s of any required form.
Generation of Non-uniform Random Distributions
In simulations of random processes we often require a non-uniform distribution of random numbers. For
example, radioactive decay is characterised by an exponential pdf: fX(x) = a e^(-ax) with x≥0. There are two
useful methods for generating such non-uniform distributions:
1. The transformation method.
2. The rejection method.
The aim of both methods is to convert a uniform distribution of random numbers of the form fX(x)=1
with 0<x<1 into a non-uniform distribution of the form fY(y) with a<y<b; this is illustrated below.
fx|
 1|_________________
  |                 |
  |                 |
  |                 |
  +-----------------+-- x
  0                 1

Uniform (flat) distribution of a set of random numbers.
fy|
  |      _____
  |     /     \
  |    /       \______
  |   /               \
  +-+-----------------+-- y
    a                 b

Non-uniform distribution of a set of random numbers. The shape of the pdf is arbitrary.
1. The Transformation Method
Consider a collection of variables X = (x1, x2, x3, ...) that are distributed according to the pdf fX(x); then
the probability of finding a value that lies between x and x+dx is fX(x) dx. If y is some function of x then
we can write:

|fx(x)dx| = |fy(y)dy|

where fY(y) is the pdf that describes the collection Y = (y1, y2, y3, ...).
Now let fX(x)=1 with 0<x<1, e.g. the uniformly distributed random numbers that we generate via the
"random_number()" intrinsic subroutine in Fortran; then we can write

|dx| = |fy(y)dy| and so fy(y) = |dx/dy|
fx|
 1|_________________
  |                 |
  |                 |
  +-----------------+-- x
  0                 1

fx(x)=1 and so fy(y) = |dx/dy|
And so, in order to obtain a sequence characterised by the distribution fY(y), we must find a transformation
function y = T(x) that satisfies:

|dx/dy| = fy(y)
Example 1:
Consider that we want the exponential distribution fY(y) = a e^(-ay);
the transformation function is then y = T(x) = -ln(x)/a.
Proof: y = -ln(x)/a and so x = e^(-ay) and so abs(dx/dy) = abs(-a e^(-ay)) = a e^(-ay) = fY(y)
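This transformation is easy to try out. A sketch in Python with NumPy (an assumption: seed and names are illustrative; the shift to 1-u simply avoids taking log of an exact zero):

```python
import numpy as np

def exponential_sample(a=0.1, n=1_000_000, seed=4):
    """Transformation method for fY(y) = a*exp(-a*y):
    y = -ln(x)/a maps uniform x in (0,1) onto the exponential pdf."""
    rng = np.random.default_rng(seed)
    u = 1.0 - rng.random(n)          # uniform on (0, 1]; avoids log(0)
    return -np.log(u) / a

y = exponential_sample()
print(y.mean())   # theory: the exponential mean is 1/a = 10
```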
Example 2:
Consider that we want the distribution fY(y) = 2y

fx|                  fy|
 1|________           2|   /
  |        |           |  /
  |        |           | /
  +--------+-- x       |/
  0        1           +---+-- y
                       0   1

the transformation function is then y = T(x) = x^(1/2)
Proof: y = x^(1/2) and so x = y^2 and so dx/dy = 2y = fY(y)
A quick check for the correctness of the transformation is that the integrals of the two distributions are
equal: integral [fX(x) dx] = 1 x 1 = 1; integral [fY(y) dy] = 0.5 x 1 x 2 = 1; i.e. the integrals of the two
distributions are both 1 (remember that any pdf must always integrate to unity, i.e. the total probability is
one).
Example 3:
We wish to obtain the pdf fY(y) = 0.5 Sin(y) with 0<y<π.
The transformation function is y = T(x) = Cos^(-1)(1-2x).
Proof: y = Cos^(-1)(1-2x) and so x = 0.5 (1-Cos(y)) and dx/dy = 0.5 Sin(y) = fY(y). The range of y is
T(0)<y<T(1) = 0<y<π as required. One check for correctness is that the integral should be one: the
integral of 0.5 Sin(y) over the limits 0<y<π = 1.
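Example 3 can be verified numerically as well. A sketch in Python with NumPy (an assumption: seed and names are illustrative):

```python
import numpy as np

def sine_sample(n=1_000_000, seed=5):
    """Transformation method for fY(y) = 0.5 sin(y), 0 < y < pi
    (Example 3): y = arccos(1 - 2x) with x uniform on (0, 1)."""
    rng = np.random.default_rng(seed)
    return np.arccos(1.0 - 2.0 * rng.random(n))

y = sine_sample()
print(y.mean())   # theory: the mean of 0.5 sin(y) on (0, pi) is pi/2
```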
Algorithm 7e (Fortran syntax)
The following algorithm implements the transformation given in Example 2.

real :: x(8000), y(8000)
call random_number(x); y = sqrt(x)

Array x is filled with random numbers from a uniform distribution in the range 0<x<1. Array y is assigned the
transformation of these numbers, giving a distribution fY(y) = 2y with the range sqrt(0)=0 < y < sqrt(1)=1.
We can use the "minval" and "maxval" intrinsic functions to inspect the ranges:

print *, minval(x), maxval(x), minval(y), maxval(y)

gives:

0.0000973344  0.99984  0.0098658195  0.99992
The two distributions are illustrated in the histograms below. First the uniform distribution of random
numbers from fX(x); here, 8000 random numbers have been generated and placed into a 20-bin histogram
(the histogram is turned on its side).

                    1
 0+-----------------+---> fx(x)
  |#####################
  |####################
  |####################
  |###################
  |####################
  |######################
  |###################
  |####################
  |#####################
  |###################
  |####################
  |#####################
  |####################
  |####################
  |####################
  |######################
  |###################
  |###################
  |#####################
  |###################
 1+
  |
  x

The range of values is from 0 to 1. Each '#' symbol represents 20 numbers. The average number of entries
per bin is 8000 values / 20 bins = 400 values => 400/20 = 20 '#'s. Given that the values are created
randomly, statistical theory tells us that we expect a variation from bin-to-bin of sigma = sqrt(n) =
sqrt(400) = 20 = 1 '#'; i.e. we expect a variation of 1 or 2 '#'s, as is seen in this histogram.
Next, each value x is transformed to y = T(x) = x^(1/2); the resultant distribution, fY(y), is shown in the
histogram below.

                                      2
 0+-----------------------------------+----> fy(y)
  |#
  |###
  |#####
  |########
  |#########
  |############
  |#############
  |###############
  |################
  |###################
  |#######################
  |######################
  |#########################
  |##########################
  |#############################
  |################################
  |################################
  |#####################################
  |###################################
  |#######################################
 1+
  |
  y

The distribution is the required fy(y)=2y in the range 0 < y < 1. The number of entries is 8000 (as each
y value corresponds to an x value). The function is not exact due to statistical variations in the distribution
fx(x).
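The same 20-bin check can be reproduced in a few lines. A sketch in Python with NumPy (an assumption: seed and names are illustrative):

```python
import numpy as np

def ramp_histogram(n=8000, bins=20, seed=6):
    """Repeat the fY(y) = 2y histogram check: transform n uniform
    deviates with y = sqrt(x) and bin them over (0, 1)."""
    rng = np.random.default_rng(seed)
    y = np.sqrt(rng.random(n))
    counts, _ = np.histogram(y, bins=bins, range=(0.0, 1.0))
    return counts

counts = ramp_histogram()
print(counts)   # the counts rise roughly linearly with the bin index
```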
2. The Rejection method
The transformation method is useful when the transformation function y=T(x) can be derived easily. The
rejection method provides an alternative; it has the advantage of being able to create any required
distribution. In this method a sequence of random numbers X = (x1, x2, x3, ..., xn) is generated with a
uniform distribution in the range of interest, ymin to ymax. Now suppose that our goal is to produce a
sequence of numbers distributed according to the function fY(y):
fy(y)
  |_____________________________________ fmax
  |              ____
  |             /    \
  |    ________/      \
  |   /                \__
  |  /                    \
  | /                      \
  +--+----------------------+--- y
     y_min                  y_max
We proceed through the sequence (x1, x2, x3, ..., xn) and accept values with a probability proportional to
fY(x). This is achieved as follows: for each value of x a new random number, ptest, distributed uniformly
in the range 0<ptest<fmax, is generated. If fY(x) is greater than ptest then the number x is kept (added to
set Y); otherwise it is removed (rejected) from the sequence. The probability of number xi passing the test
is proportional to fY(xi). The resultant set Y = (y1, y2, y3, ..., ym) [m≤n] is therefore distributed according
to the function fY(y).
Algorithm 7f (Fortran syntax)
Consider that we want a distribution fY(y) = 0.5 + Sin^2(y) in the range π<y<3π.
fmax is therefore 0.5 + 1.0^2 = 1.5, and so ptest is generated from 0 to 1.5.

integer, parameter :: n=8000
integer :: i, m=0
real :: x(n), y(n), Pi=3.141593, fmax=1.5, Ptest
call random_number(x)
x = pi+2*pi*x                       ! x is random and uniform in the range pi < x < 3pi
do i = 1, n
call random_number(Ptest)
Ptest = Ptest*fmax                  ! range 0 < Ptest < fmax
if (0.5+sin(x(i))**2 > Ptest) then
m = m+1                             ! entry x(i) passed the test
y(m) = x(i)                         ! so we record it in y(m)
end if
end do
Result:

Before Rejection

 pi+----------------------> fx(x)
   |#####################
   |###################
   |####################
   |####################
   |###################
   |####################
   |###################
   |####################
   |#####################
   |###################
   |###################
   |####################
   |#####################
   |####################
   |#####################
   |#####################
   |####################
   |###################
   |#####################
   |###################
3pi+
    x

The original sequence contains 8000 entries distributed uniformly in the range pi < x < 3pi.
After Rejection

 pi+---------------------> fy(y)
   |#######
   |#########
   |#############
   |#################
   |###################
   |####################
   |################
   |#############
   |##########
   |#######
   |########
   |##########
   |##############
   |#################
   |####################
   |####################
   |#################
   |############
   |##########
   |######
3pi+
    y

The distribution of sequence y (after rejection of some entries) shows a sine-squared function of amplitude
1.0 on a base of 0.5. The number of entries is 5321, which is approximately equal to:

8000 x integral fy(y) [y=pi,3pi] / ( fmax x (3pi-pi) ) = 8000 x 2pi / (1.5 x 2pi) = 5333

The number of rejected entries is 8000-5321 = 2679.
Discussion:
The rejection method is less efficient than the transformation method, as it requires two random numbers
to be generated for each entry and some numbers are wasted (rejected); implementation is also more
difficult. However, the advantage is that any distribution can be generated, whereas the transformation
method is limited to distributions for which the transformation function can be calculated.
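Algorithm 7f can be reproduced compactly with arrays. A sketch in Python with NumPy (an assumption: the course codes in Fortran; seed and names are illustrative):

```python
import numpy as np

def rejection_sample(n=8000, seed=7):
    """Rejection method for fY(y) = 0.5 + sin(y)**2 on pi < y < 3*pi
    (as in Algorithm 7f): keep x when a second uniform deviate ptest,
    drawn from (0, fmax), falls below fY(x)."""
    rng = np.random.default_rng(seed)
    fmax = 1.5
    x = np.pi + 2.0 * np.pi * rng.random(n)   # uniform on (pi, 3*pi)
    ptest = fmax * rng.random(n)              # uniform on (0, fmax)
    return x[0.5 + np.sin(x)**2 > ptest]      # the accepted values

y = rejection_sample()
print(y.size)   # roughly 8000*2pi/(1.5*2pi) = 5333 entries survive
```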
Monte Carlo
Computer simulations of systems that involve some random process are called Monte Carlo simulations.
In such simulations random numbers are generated with the appropriate distribution corresponding to the
physical random process. There is a vast array of Monte Carlo applications in science and engineering.
We will look at some basic Monte Carlo simulations next week.
7.3 Lab Exercises
1. Implement Algorithm 7b into a computer program, compile and run the program. Check that P("heads")
= nheads/n tends to 0.5 for large n. Repeat this experiment replacing the coin with two dice. Determine
the probability that a double six is thrown. Compare this with the theoretically expected outcome.
2. Implement Algorithm 7c (with the transformation to a triangular pdf) into a computer program, compile
and run the program. Check that m1 = 2/3 and mu2 = 1/18. Modify the experiment to verify the identity
E[20X+30] = 20 E[X] + 30.
3. Modify the above program to compute the expectation value and variance of the pdf fX(x) = 0.5 Sin(x)
for 0<x<π. Compare your results with the theoretical solutions. Hint: the transformation function can be
found in the lecture notes.
4. A nucleus of an atom has a probability of decaying that is described by the pdf fT(t) = 0.1 exp(-0.1t),
where t≥0 is the time in seconds. Write a program to perform a stochastic experiment to determine a) the
probability that the nucleus survives 20 seconds; b) given that the nucleus has already survived 10 seconds,
the probability that the nucleus survives 20 seconds. Compare your results with the theoretical solutions.
Hint: Algorithm 7d1 will be helpful for part a, and Algorithm 7d2 for part b; the required transformation
function can be found in the lecture notes.
7.4 Lab Solutions
1. Implement Algorithm 7b into a computer program, compile and run the program. Check that P(”heads”)
= nheads /n tends to 0.5 for large n. Repeat this experiment replacing the coin with two dice. Determine the
probability that a double six is thrown. Compare this with the theoretically expected outcome.
Solution: eee484ex7a (see the downloads page).
This program counts the number of times two random numbers are both < 1/6, repeating this for
100,000,000 trials. The result is 2780456/100000000 = 0.02780456 = 1/35.965; the theoretical expectation
is 1/6 x 1/6 = 1/36.
2. Implement Algorithm 7c (with the transformation to a triangular pdf ) into a computer program, compile
and run the program. Check that m1 = 2/3 and mu2 = 1/18. Modify the experiment to verify the identity
E[20X+30] = 20 E[X] + 30.
Solution: eee484ex7b (see the downloads page).
The result for [n=100000000 and "call random_number(x); x=sqrt(x)"] is:

E[X] = 0.6666696399773584

To find E[20X+30] for this pdf we simply transform each generated value of x as follows: x = 20x + 30.
The result for [n=100000000 and "call random_number(x); x=sqrt(x); x = 20*x + 30"] is:

E[20X+30] = 43.33339279955304

Note that 20 E[X] + 30 = 20 x 0.6666696399773584 + 30 = 43.33339279954717, and so E[20X+30] =
20 E[X] + 30 is demonstrated experimentally.
If you are not convinced then you can repeat the experiment with different values and different pdfs.
3. Modify the above program to compute the expectation value and variance of the pdf fX(x) = 0.5 Sin(x)
for 0<x<π. Compare your results with the theoretical solutions.
Hint: the transformation function can be found in the lecture notes.
Solution: eee484ex7c (see the downloads page).
From the lecture notes, the transformation function is y = T(x) = Cos^(-1)(1-2x); i.e. to generate the pdf
fX(x) = 0.5 Sin(x) for 0 < x < π the Fortran code is:

call random_number(x)   ! x is uniform {0 < x < 1}
x = acos(1-2*x)         ! x has pdf 0.5 Sin(x) {0 < x < pi}

The output of the program is:

mean, m1 =      1.5707902846841562  = pi/2       - 0.000006
variance, mu2 = 0.4673394386775112  = pi^2/4 - 2 - 0.00006

From theory: m1 = pi/2, m2 = pi^2/2 - 2, and so mu2 = pi^2/2 - 2 - (pi/2)^2 = pi^2/4 - 2 =
0.4674011002723394; the experimental results are in agreement with the theory.
4. A nucleus of an atom has a probability of decaying that is described by the pdf fT(t) = 0.1 exp(-0.1t),
where t≥0 is the time in seconds. Write a program to perform a stochastic experiment to determine a) the
probability that the nucleus survives 20 seconds; b) given that the nucleus has already survived 10 seconds,
the probability that the nucleus survives 20 seconds. Compare your results with the theoretical solutions.
Hint: Algorithm 7d1 will be helpful for part a, and Algorithm 7d2 for part b; the required transformation
function can be found in the lecture notes.
Solution: eee484ex7d1, eee484ex7d2 (see the downloads page).
From the lecture notes, the transformation is x = -log(x)/0.1.
Part a asks for P(X>20 seconds); the theoretical result is 1/e^2, and the output of the program for
n = 100000000 is P(X>20) = 0.13533197 = 1/e^2 - 0.000003.
Part b asks for P(X>20|X>10); the theoretical result is 1/e, and the output of the program for
n = 100000000 is P(X>20|X>10) = 0.36785103 = 1/e - 0.00003.
The experimental results are in agreement with the theoretical results.
7.5 Example exam questions
Question 1
For the following transformation functions transforming a uniform pdf fx(x) in the range 0 < x < 1 into a
non-uniform pdf fy(y):

a) y=SQRT(3+x),  b) y=ArcCosine(1-2x),  c) y=1/(x+0.5)

i.   Determine the transformed probability density functions fy(y).
ii.  Write down the range of y values.
iii. Sketch the distribution fy(y).
iv.  Show that the integral of the transformed pdf is equal to the integral of the original uniform pdf.
Question 2
Write a computer program that performs a frequency experiment
to determine the probability that the outcome of throwing a
coin and die is (a "head" and a "3").
8 Monte-Carlo Methods
8.1 Topics Covered
o Monte Carlo integration; the student should be able to write a computer program to integrate a function
f(x) using the Monte Carlo method.
o Monte Carlo simulation; the student should be able to write a simple Monte Carlo simulation to solve a
problem that involves some underlying random process.
8.2 Lecture Notes
Introduction
Armed with methods that allow us to generate any pdf, we can now attempt to simulate less trivial physical
processes. Such simulations are called Monte Carlo simulations. But first, we will use the Monte Carlo
method as an alternative method for integration (Monte Carlo integration).
Monte Carlo Integration
In Monte Carlo integration numerical integration of a function is performed by making use of random
numbers. We will see that for simple one-dimensional functions MC integration is not as effective as other
numerical integration methods (though for high-dimensional integrals the MC method can be more efficient).
To introduce Monte Carlo integration we will compute the area of a circle and hence a value for π as follows:
generate n pairs of random numbers, x and y, each uniformly distributed between 0 and 1; count the number
of pairs m that satisfy the condition x^2 + y^2 < 1 (i.e. they lie inside a quarter of a circle of unit radius);
then m/n = π/4 and so π = 4m/n.
Algorithm 8a (Fortran syntax [mostly])
m=0, n=10000000
do i = 1, n
call random_number(x)
call random_number(y)
if ( x**2+y**2 < 1.0 ) m = m+1
end do
output n, 4*m/real(n)
Results for different values of n are tabulated below; estimates for π are given in column 2 and the errors in
column 3.

n              4*m/n       Error = 4*m/n - pi
1,000          3.156        0.014407259
10,000         3.1308      -0.010792741
100,000        3.14768      0.006087259
1,000,000      3.143304     0.001711259
10,000,000     3.1420207    0.000428059
100,000,000    3.1417096    0.00011685899
1,000,000,000  3.1415832   -0.000009637012
The method works, but clearly it is not computationally efficient, as it requires one billion random numbers
to achieve only 4 or 5 decimal places of accuracy. As n increases the accuracy of the computed value of π
improves (apart from some statistical variations). It can be shown, by repeating the experiment many times
(with different seeds), that the error is generally proportional to 1/n^(1/2).
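Algorithm 8a is a handy first Monte Carlo program in any language. A sketch in Python with NumPy (an assumption: the course codes in Fortran; seed and names are illustrative):

```python
import numpy as np

def mc_pi(n=1_000_000, seed=8):
    """Algorithm 8a: the fraction of random points in the unit square
    that fall inside the quarter circle estimates pi/4."""
    rng = np.random.default_rng(seed)
    x = rng.random(n)
    y = rng.random(n)
    m = np.count_nonzero(x**2 + y**2 < 1.0)
    return 4.0 * m / n

print(mc_pi())   # close to pi; the error shrinks like 1/sqrt(n)
```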
We will now look at a simple MC method for the integral I of a function f(x); the method is as follows:
1. Enclose the function in a box of area A and determine ymax.

y=f(x)
  |  ______________________________ y_max
  | |          ________            |
  | |         /        \       A   |
  | |   _____/          \          |
  | |  /                 \__       |
  | | /          I          \      |
  | |/                       \|
0 +--+------------------------+--- x
     a                        b
2. Uniformly populate the box with n random points: generate two random numbers r1, r2; a random point
in the box is then x = a + (b-a) r1 and y = ymax r2.
3. Count the number of points m that lie below the curve f(x).
4. The integral is then estimated from I/A = m/n, and so I = A (m/n) = (b-a) ymax m/n.
Example: in a previous lecture, "Numerical Integration", we used the Extended Trapezoidal Formula to
integrate the function f(x) = x^3 - 3x^2 + 5 over the range x = 0.0 to 2.5. With n=1000 (1000 intervals)
the result is 6.640627 (error = 0.000002). The MC integration is as follows:
For this function, in the integration range, we have turning points at x=0.0 (maximum) and x=2 (minimum),
and so ymax = f(0.0) = 5. So we have a = 0, b = 2.5, ymax = 5.0.
Algorithm 8b (Fortran syntax [mostly])
The second program is a complete, concise Fortran source using whole arrays.

define function F(x) = x**3 - 3*x**2 + 5
m = 0
input a, b, Ymax, n
do i = 1, n
call random_number(r1)
call random_number(r2)
x = a + (b-a)*r1
y = Ymax*r2
if ( y < f(x) ) m = m+1
end do
output (b-a)*Ymax*m/real(n)

integer, parameter :: n = 100000000
integer :: m
real :: a=0.0, b=2.5, Ymax=5.0
real :: x(n), y(n)
call random_number(x); x = a + (b-a)*x
call random_number(y); y = Ymax*y
m = count(y < x**3-3*x**2+5)
print *, (b-a)*Ymax*m/real(n)
end
The result for n=1000 is 6.450 (error = -0.190625). Repeating for increasing values of n gives:

n              m           integral    Error
1,000          516         6.450000   -0.190625
10,000         5320        6.650000    0.009375
100,000        53194       6.649250    0.008625
1,000,000      530692      6.633650   -0.006975
10,000,000     5309750     6.637187   -0.003437
100,000,000    53121655    6.640207   -0.000418
1,000,000,000  531243958   6.640550   -0.000076
Again the error reduces as 1/n^(1/2) (you have to perform many experiments, with different seeds, to see this
more clearly).
The accuracy of this simple Monte Carlo integration method is not good when compared to other numerical
methods such as Trapezoidal or Simpson integration; however, the MC integration method can be refined
to give improved accuracy (see text books for the details). Moreover, for high-dimensional integrations
the MC method can be more efficient than other integration methods. MC integration is also useful for
discontinuous shapes, for example a torus with a slice cut out of it; the shape is expressed more easily in a
MC program than in Trapezoidal or Simpson integration algorithms.
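The hit-or-miss scheme of Algorithm 8b generalises to any bounded, non-negative f(x). A sketch in Python with NumPy (an assumption: the course codes in Fortran; seed and names are illustrative):

```python
import numpy as np

def mc_integrate(f, a, b, ymax, n=1_000_000, seed=9):
    """Hit-or-miss MC integration (as in Algorithm 8b): the fraction of
    random points below the curve, times the box area (b-a)*ymax,
    estimates I. Assumes 0 <= f(x) <= ymax on [a, b]."""
    rng = np.random.default_rng(seed)
    x = a + (b - a) * rng.random(n)
    y = ymax * rng.random(n)
    m = np.count_nonzero(y < f(x))
    return (b - a) * ymax * m / n

I = mc_integrate(lambda x: x**3 - 3*x**2 + 5, 0.0, 2.5, 5.0)
print(I)   # exact value of the integral: 6.640625
```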
Monte Carlo Simulation
We will look at two example simulations:
1. a binary communication system, and 2. propagation of errors through a measurement system.
1. A binary communication system
A binary communication system consists of a transmitter that sends a binary "0" or "1" over a channel to
a receiver. On average, the transmitter transmits a "0" with a probability of 0.4 and a "1" with a probability
of 0.6. The channel occasionally causes errors to occur, flipping a "0" to a "1" and a "1" to a "0"; the
probability of this error occurring is 0.1.
Using this information, we wish to calculate the following:
a) The probability of a "1" being transmitted without error.
b) The probability of a "0" being transmitted without error.
c) If a "1" is observed at the receiver, the probability of it being correct.
d) If a "0" is observed at the receiver, the probability of it being correct.
The first two calculations are trivial, but the second two require Bayes' theorem. The solutions to these
questions are given below.
Transmitter (A):
     P(A0)=0.4          P(A1)=0.6
        A0                 A1
         +                 +
         |\               /|
         | \             / |
         |  \           /  |
         |   \         /   |
P(B0|A0) |    \       /    | P(B1|A1)
 = 0.9   |   Channel       |  = 0.9
         |      \   /      |
         |       \ /       |
         |        X        |
         |       / \       |
         |      /   \      |
         |     /     \     |
         |    /       \    |
         |   /         \   |
         |  /           \  |
         | /             \ |
         |/               \|
         +                 +
        B0                 B1
Receiver (B)

Bayes' Theorem: P(Y|X) = P(X|Y) P(Y) / P(X)

a) asks for P(B1|A1) = 0.90000
b) asks for P(B0|A0) = 0.90000
c) asks for P(A1|B1) = P(B1|A1) P(A1) / P(B1)
                     = 0.9 x 0.6 / (0.9 P(A1) + 0.1 P(A0))
                     = 0.54 / (0.9 x 0.6 + 0.1 x 0.4)
                     = 0.54 / (0.54 + 0.04) = 0.93103
d) asks for P(A0|B0) = P(B0|A0) P(A0) / P(B0)
                     = 0.9 x 0.4 / (0.9 P(A0) + 0.1 P(A1))
                     = 0.36 / (0.9 x 0.4 + 0.1 x 0.6)
                     = 0.36 / (0.36 + 0.06) = 0.85714

Answers: a) 0.90000  b) 0.90000  c) 0.93103  d) 0.85714
Note that the probability of observing a correct ”0” is less than that of a ”1” because more 1’s flip to 0’s as
P(A1) > P(A0).
88
The above questions can be solved via a MC simulation: the simulation generates the correct fractions
of zeros and ones, flips them with a probability of 10 percent, and counts the resulting numbers of zeros and
ones together with their history.
Algorithm 8c (Fortran syntax [mostly])

n = 10000000
correct0=0, correct1=0
incorrect0=0, incorrect1=0
do i = 1, n
call random_number(r)   ! generate a binary "1" or "0"
if (r<0.4) then
bit=0                   ! P{0}=0.4
else
bit=1                   ! P{1}=0.6
end if
call random_number(r)   ! give a 10% probability of an error
if (r<0.1) then         ! flip the bit
if (bit==0) incorrect1=incorrect1+1
if (bit==1) incorrect0=incorrect0+1
else                    ! no error
if (bit==0) correct0=correct0+1
if (bit==1) correct1=correct1+1
end if
end do
output "a) P(of a 1 being transmitted without error)", correct1/(correct1+incorrect0)
output "b) P(of a 0 being transmitted without error)", correct0/(correct0+incorrect1)
output "c) P(1 is observed correctly)", correct1/(correct1+incorrect1)
output "d) P(0 is observed correctly)", correct0/(correct0+incorrect0)
The output for 10 million trials is:

a) P(1 being transmitted without error) 0.90003
b) P(0 being transmitted without error) 0.90008
c) P(1 is observed correctly) 0.93116
d) P(0 is observed correctly) 0.85707
Note that ”correct1+incorrect0” is the number of 1’s at the transmitter, and ”correct1+incorrect1” is the
number of 1’s at the receiver.
The values obtained for parts a and b provide a basic validation of the simulation; the values obtained for
parts c and d provide a validation of Bayes' theorem.
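The same experiment is compact with boolean arrays. A sketch in Python with NumPy (an assumption: the course codes in Fortran; seed and names are illustrative):

```python
import numpy as np

def channel_sim(n=1_000_000, p1=0.6, perr=0.1, seed=10):
    """MC simulation of the binary channel (as in Algorithm 8c): transmit
    a "1" with probability p1, flip each bit with probability perr, then
    count how often an observed bit is correct."""
    rng = np.random.default_rng(seed)
    sent = rng.random(n) < p1        # True means a "1" was transmitted
    flip = rng.random(n) < perr      # True means the channel flipped it
    recv = sent ^ flip               # the received bit
    c = np.count_nonzero(sent & recv) / np.count_nonzero(recv)     # P(correct | observed 1)
    d = np.count_nonzero(~sent & ~recv) / np.count_nonzero(~recv)  # P(correct | observed 0)
    return c, d

print(channel_sim())   # theory (Bayes): 0.93103 and 0.85714
```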
2. Propagation of errors through a measurement system
Consider, for example, a pressure measurement system. Such a system may have several components that
process a signal that is initially generated by a pressure transducer. For example the system may contain
the following components:
                        dT1 |           dVs |           dT2 |
                            v               v               v
     +------------+   +------------+   +-----------+   +----------+
---->| Pressure   |-->| Deflection |-->| Amplifier |-->| Recorder |---->
  P  | transducer | R | bridge     | V1|           | V2|          |  Pm
     +------------+   +------------+   +-----------+   +----------+
A resistance R (Ohms) is output from the pressure transducer in response to an input pressure P (Pascals).
The deflection bridge converts the resistance into a voltage V1 (mV), which in turn is amplified to the voltage
V2 (mV) by the amplifier. Finally the recorder outputs a reading Pm (Pascals).
Suppose that the response of the last three components is affected by random variations in the environment:
variations from the standard ambient temperature, dT, affect the gain of the deflection bridge and create a
bias in the recorder output, and variations from the standard supply voltage, dVs, cause a bias in the output
of the amplifier. The output of each component can be modeled as follows:
R  = 0.0001 P
V1 = ( 0.04 + 0.00003 dT1 ) R
V2 = 1000 V1 + 0.13 dVs
Pm = 250 V2 + 2.7 dT2
Here dT has a Gaussian distribution centered at zero with a standard deviation sT = 3.0 C, and dVs has a
Gaussian distribution centered at zero with a standard deviation sVs = 0.23 V [see sketches in the class].
We can first consider the output of the system for various inputs given standard environmental conditions,
i.e. with dT and dVs set to their average values of zero. The model of the measurement system becomes:
R  = 0.0001 P
V1 = ( 0.04 ) R
V2 = 1000 V1
Pm = 250 V2 = 250 (1000 V1) = 250000 (0.04 R) = 10000 (0.0001 P) = P
i.e. Pm = P and so the system is perfectly calibrated.
However, dT and dVs are randomly non-zero, causing a random variation in the outputs of each component.
These random variations propagate through the system to the final output. Assuming Gaussian random
variations, the standard deviation sO in the output O of a component for an input I is given by (the d's
denote partial derivatives):

sO^2 = (sI dO/dI)^2 + (sA dO/dA)^2 + (sB dO/dB)^2 + (sC dO/dC)^2 + ...

Here, O is dependent on the input I and the random variables A, B, C, ... . Note that input I has a standard
deviation sI due to random errors in the output of the previous component; the random errors therefore
propagate through the system to the final output.
Including these random errors the response of the system is shown below for an input of 5000 Pa.
R = 0.0001 P = 0.0001 5000 = 0.5 Ohms
V1 = ( 0.04 + 0.00003 dT1 ) 0.5
= 0.04 0.5 + 0.00003 dT1 0.5
= 0.02 mV + 0.000015 dT1
dT1 is Gaussian with sT = 3.0
and so sV1^2 = (sT dV1/dT1)^2 = ( 3.0 0.000015 )^2
and so sV1 = 0.000045 mV
with V1 = 0.02 mV (with dT1 = 0 on average)
V2 = 1000 V1 + 0.13 dVs
= 1000 0.02 + 0.13 dVs
= 20 mV + 0.13 dVs
dVs is Gaussian with sVs = 0.23 and we have also sV1 = 0.000045
and so sV2^2 = (sV1 dV2/dV1)^2 + (sVs dV2/dVs)^2
= (0.000045 1000)^2 + (0.23 0.13)^2
= (0.045)^2 + (0.029)^2
and so sV2 = 0.05403 mV
with V2 = 20 mV (with dVs = 0 on average)
Pm = 250 V2 + 2.7 dT2
= 250 20 + 2.7 dT2
= 5000 Pa + 2.7 dT2
[here we assume this dT2 is independent of the above dT1]
dT2 is Gaussian with sT = 3.0
and so sPm^2 = (sV2 dPm/dV2)^2 + (sT dPm/dT2)^2
= (0.05403 250)^2 + (3.0 2.7)^2
and so sPm = 15.75 Pa
with Pm = 5000 Pa (with dT2 = 0 on average)
Note that again the system is perfectly calibrated with an average output of 5000 Pa for an input of 5000
Pa, but the output is Gaussian distributed with a standard deviation of 15.75 Pa.
We can verify this calculation by performing a Monte Carlo simulation of the system. For this we need to be
able to transform a uniform random variable into a Gaussian (normal) random variable. This can be achieved
by applying the Box-Muller transformation (see rnrm.f90 and rnrm.c++ on the downloads page):

real function rnrm()
!----------------------------------------------------------------------------
! Returns a normally distributed deviate with zero mean and unit variance.
! The routine uses the Box-Muller transformation of uniform deviates.
!----------------------------------------------------------------------------
real :: r, x, y
do
call random_number(x)
call random_number(y)
x = 2.0*x - 1.0
y = 2.0*y - 1.0
r = x**2 + y**2
if (r<1.0) exit
end do
rnrm = x*sqrt(-2.0*log(r)/r)
end function rnrm
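The same polar Box-Muller routine is short in any language. A sketch in pure Python (an assumption: the extra r > 0 guard, absent in the Fortran above, is added here to avoid log(0); seed and names are illustrative):

```python
import math
import random

def rnrm(rng=random):
    """Polar Box-Muller, mirroring rnrm.f90: returns a normally
    distributed deviate with zero mean and unit variance."""
    while True:
        x = 2.0 * rng.random() - 1.0
        y = 2.0 * rng.random() - 1.0
        r = x * x + y * y
        if 0.0 < r < 1.0:            # accept points inside the unit circle
            return x * math.sqrt(-2.0 * math.log(r) / r)

random.seed(11)
sample = [rnrm() for _ in range(100_000)]
mean = sum(sample) / len(sample)
var = sum(v * v for v in sample) / len(sample) - mean**2
print(mean, var)   # close to 0 and 1
```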
The above measurement system is simulated in the following algorithm:
Algorithm 8d (Fortran syntax [mostly])
m1=0, m2=0, n = 100000000
sT=3.0, sVs=0.23, P=5000.
do i = 1, n
! random variables
dT1 = sT * rnrm()    ! Gaussian pdf with standard deviation sT
dT2 = sT * rnrm()    ! Gaussian pdf with standard deviation sT
dVs = sVs * rnrm()   ! Gaussian pdf with standard deviation sVs
! model of the measurement system
R  = 0.0001 * P                      ! the pressure transducer
V1 = ( 0.04 + 0.00003 * dT1 ) * R    ! the deflection bridge
V2 = 1000 * V1 + 0.13 * dVs          ! the amplifier
Pm = 250 * V2 + 2.7 * dT2            ! the recorder
! statistical variables
m1 = m1+Pm      ! E[Pm]
m2 = m2+Pm**2   ! E[Pm^2]
end do
m1 = m1/n
m2 = m2/n
output "mean, m1 = ", m1
output "sd = sqrt(mu2) = ", sqrt(m2-m1**2)
The output for n = 100000000 is
mean, m1 =        4999.99994
sd = sqrt(mu2) =    15.75136
Notes:
1. The simulation is very simple and can be expanded easily to include more components and environmental
effects.
2. The result is in very good agreement with the theoretical result (given large enough n).
3. Although the environmental effects dT1, dVs, dT2 are included in the models, and they are non-zero,
the net result is a zero contribution giving a mean output of 5000.
4. In this treatment (and in the theoretical treatment) changes in the ambient temperature (dT1 and
dT2) are considered independent of each other. However, in reality these quantities are the same, i.e.
dT1 = dT2, and so are fully correlated; the actual standard deviation should therefore be larger. It is trivial
to incorporate this situation in the algorithm (set dT2 = dT1 instead of sampling a second deviate); see the lab exercise.
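Algorithm 8d translates directly into Python (a sketch; the function name simulate_8d is ours, and Python's built-in Gaussian generator stands in for rnrm):

```python
import math
import random

def simulate_8d(P, n, rng):
    """Monte Carlo simulation of the measurement system of Algorithm 8d.
    Returns the mean and standard deviation of the recorded pressure Pm."""
    sT, sVs = 3.0, 0.23
    m1 = m2 = 0.0
    for _ in range(n):
        # random environmental disturbances
        dT1 = sT * rng.gauss(0.0, 1.0)
        dT2 = sT * rng.gauss(0.0, 1.0)
        dVs = sVs * rng.gauss(0.0, 1.0)
        # model of the measurement system
        R = 0.0001 * P                      # the pressure transducer
        V1 = (0.04 + 0.00003 * dT1) * R     # the deflection bridge
        V2 = 1000.0 * V1 + 0.13 * dVs       # the amplifier
        Pm = 250.0 * V2 + 2.7 * dT2         # the recorder
        # statistical accumulators
        m1 += Pm                            # for E[Pm]
        m2 += Pm * Pm                       # for E[Pm^2]
    m1 /= n
    m2 /= n
    return m1, math.sqrt(m2 - m1 * m1)
```

A run with a much smaller n than above (e.g. n = 200000) already reproduces the theoretical values 5000 Pa and 15.75 Pa to within statistical fluctuations.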
8.3  Lab Exercises
Part A:
1. MC estimation of the volume of a sphere
The following Fortran program estimates the area of a circle of unit radius.
implicit none
integer :: i, m=0, n=10000000
real    :: x, y
do i = 1, n
call random_number(x)
call random_number(y)
if ( x**2+y**2 < 1.0 ) m=m+1
end do
print *, 4.0*m/real(n)
end
Copy the program (or rewrite in your language of choice), run and check it, and then modify it to estimate
the volume of a sphere of unit radius. How many random trials (n) does it take to achieve an accuracy of
three decimal places?
2. MC integration of a function f(x)
a) Write a Monte Carlo integration program to integrate the function f(x) = (1-x^2)^(1/2) over the range
x = -1 to x = 1.
b) Sketch the integral and determine ymax .
c) Write, compile and run your program.
d) Compare your computed result with the analytical result: π/2.
e) How many trials (n) are required to achieve an accuracy of three decimal places?
Part B:
Propagation of errors through a measurement system
a) Code Algorithm 8d into a computer program, compile and run the program, check that the output agrees
with the theoretical result.
b) Verify that the mean output of the measurement system is equal to the input for the input values: 1000
Pa, 2000 Pa, 4000 Pa, and 8000 Pa; comment on the size of the standard deviation for each input value.
c) In this model we assume that dT is independent for the deflection bridge and the recorder; modify your
program such that dT is not independent (this is more realistic), what is the effect of this on the standard
deviation of the output?
8.4  Lab Solutions
Part A:
1. MC estimation of the volume of a sphere
The following Fortran program estimates the area of a circle of unit radius.
implicit none
integer :: i, m=0, n=10000000
real    :: x, y
do i = 1, n
call random_number(x)
call random_number(y)
if ( x**2+y**2 < 1.0 ) m=m+1
end do
print *, 4.0*m/real(n)
end
Copy the program (or rewrite in your language of choice), run and check it, and then modify it to estimate
the volume of a sphere of unit radius. How many random trials (n) does it take to achieve an accuracy of
three decimal places?
Solution eee484ex8a (see the downloads page).
We simply add another dimension z and test for x^2 + y^2 + z^2 < 1.
This samples one eighth of the volume (an octant) of the sphere, so the volume estimate is 8m/n, which should approach 4π/3.
The error in the estimate is therefore 8m/n - 4π/3.
The error can be inspected for increasing values of n:

n            8*m/n        Error = 8*m/n - 4pi/3
10           2.4          -1.7887903
100          4.4           0.21120968
1000         4.104        -0.08479032
10000        4.1768       -0.011990322
100000       4.20456       0.01576968
1000000      4.191408      0.0026176786
10000000     4.1885104    -0.00027992134 | 3 d.p.
100000000    4.1886897    -0.00010080135 | accuracy
It appears that we need of the order of n = 10^7 trials to gain a 3 decimal place accuracy! The number of
trials required depends on the initial seed of the generator - there are significant statistical fluctuations.
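The modified program can be sketched in Python as follows (a sketch; the helper name sphere_volume is ours):

```python
import random

def sphere_volume(n, rng):
    """Hit-or-miss estimate of the volume of the unit sphere.
    Uniform points in the unit cube [0,1)^3 sample one octant of the
    sphere, so the volume estimate is 8*m/n (exact value 4*pi/3)."""
    m = 0
    for _ in range(n):
        x, y, z = rng.random(), rng.random(), rng.random()
        if x * x + y * y + z * z < 1.0:   # point falls inside the octant
            m += 1
    return 8.0 * m / n
```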
2. MC integration of a function f(x)
a) Write a Monte Carlo integration program to integrate the function f(x) = (1-x^2)^(1/2) over the range
x = -1 to x = 1.
b) Sketch the integral and determine ymax.
c) Write, compile and run your program.
d) Compare your computed result with the analytical result: π/2.
e) How many trials (n) are required to achieve an accuracy of three decimal places?
Solution eee484ex8b (see the downloads page).
We have a=-1.0, b=+1.0, Ymax=1.0 (by simple inspection), and the integral estimate is (b-a) Ymax m/n. The
error in the estimate is (b-a) Ymax m/n - π/2; we can investigate the effect of varying n as follows:
n              m           Estimate    Error
10             5           1.000000    -0.570796
100            66          1.320000    -0.250796
1,000          793         1.586000     0.015204
10,000         7855        1.571000     0.000204
100,000        78674       1.573480     0.002684
1,000,000      785723      1.571446     0.000650 | 3 d.p.
10,000,000     7856866     1.571373     0.000577 | accuracy
100,000,000    78547805    1.570956     0.000160
1,000,000,000  785407068   1.570814     0.000018
So it appears (after investigating other seeds) one requires of the order of n = 10^7 trials to gain a 3 decimal
place accuracy!
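The hit-or-miss scheme used here can be sketched in Python as (a sketch; the function name mc_integral is ours):

```python
import math
import random

def mc_integral(n, rng):
    """Hit-or-miss Monte Carlo estimate of the integral of
    f(x) = sqrt(1 - x**2) over [-1, 1]; the exact value is pi/2."""
    a, b, ymax = -1.0, 1.0, 1.0
    m = 0
    for _ in range(n):
        x = a + (b - a) * rng.random()    # uniform x in [a, b)
        y = ymax * rng.random()           # uniform y in [0, ymax)
        if y < math.sqrt(1.0 - x * x):    # count hits under the curve
            m += 1
    return (b - a) * ymax * m / n
```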
Part B:
Propagation of errors through a measurement system
a) Code Algorithm 8d into a computer program, compile and run the program, check that the output agrees
with the theoretical result.
b) Verify that the mean output of the measurement system is equal to the input for the input values: 1000
Pa, 2000 Pa, 4000 Pa, and 8000 Pa; comment on the size of the standard deviation for each input value.
c) In this model we assume that dT is independent for the deflection bridge and the recorder; modify your
program such that dT is not independent (this is more realistic), what is the effect of this on the standard
deviation of the output?
Solution eee484ex8c (see the downloads page).
a) For n=10000000 and P=5000 the output is: mean = 5000.005 sd = 15.758
b) For n=10000000
P=1000 the output is: mean = 1000.001, sd = 11.251
P=2000 the output is: mean = 2000.002, sd = 11.909
P=4000 the output is: mean = 4000.004, sd = 14.237
P=8000 the output is: mean = 8000.008, sd = 21.118
The relative error reduces as the input increases; this can be seen by inspecting the mathematical model of
the measurement system.
c) To make dT not independent, simply replace "dT2 = sT * rnrm()" with "dT2 = dT1"; the result is
an increase in the output standard deviation from 15.75 to 20.75 (when dT is independent the two contributions
sometimes partially cancel, if they have opposite signs).
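The effect of correlating the two temperature disturbances can be checked with a short Python sketch (the function name and the correlated flag are ours; Python's built-in Gaussian generator stands in for rnrm):

```python
import math
import random

def simulate_system(P, n, rng, correlated=False):
    """Algorithm 8d with an option to make the two temperature
    disturbances fully correlated (dT2 = dT1), as in part (c)."""
    sT, sVs = 3.0, 0.23
    m1 = m2 = 0.0
    for _ in range(n):
        dT1 = sT * rng.gauss(0.0, 1.0)
        dT2 = dT1 if correlated else sT * rng.gauss(0.0, 1.0)
        dVs = sVs * rng.gauss(0.0, 1.0)
        R = 0.0001 * P                      # the pressure transducer
        V1 = (0.04 + 0.00003 * dT1) * R     # the deflection bridge
        V2 = 1000.0 * V1 + 0.13 * dVs       # the amplifier
        Pm = 250.0 * V2 + 2.7 * dT2         # the recorder
        m1 += Pm
        m2 += Pm * Pm
    m1 /= n
    m2 /= n
    return m1, math.sqrt(m2 - m1 * m1)
```

With correlated=True the two temperature sensitivities (3.75 and 2.7 Pa per degree) add linearly before being squared, giving sqrt((3.75+2.7)^2 × 3.0^2 + (0.23 × 0.13 × 250)^2) ≈ 20.75 Pa instead of 15.75 Pa.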
8.5  Example exam questions
Question 1
Write a computer program to perform a Monte Carlo integration of the
function y = 5 sqrt(x) - x over the interval x=1.0 to x=9.0.
A  Linux Tutorial
Linux Tutorial - Simple data manipulation programs under Linux
Your central interactive computer account uses the Linux operating system. Linux is one of
several variants of Unix. Many basic though very useful data manipulation programs exist
under Linux. We will look at the following programs:
cat  - concatenate files
sort - sort lines of text files
uniq - remove duplicate lines from a sorted file
diff - find differences between two files
echo - display a line of text
sed  - a Stream EDitor
tr   - translate or delete characters
grep - search for lines in a file matching a pattern
head - output the first part of files
tail - output the last part of files
wc   - print the number of bytes, words, and lines in files
cut  - remove sections from each line of files
For detailed information about these commands type:
man ’command’ or info ’command’ or ’command’ --help
Additionally, we can use "output redirection" (the ’>’ symbol) to redirect the output of
a program to a file instead of the screen, "append" (the ’>>’ symbol) to add(append) to
files, and "piping" (the ’|’ symbol) to "pipe" the output of one program into another
program.
Copy the following data files to your file space, you can do this with the
command get-EEE484-unix. Starting with files file1.dat file2.dat file3.dat we can
perform the following operations (you can view the contents of a file with the command
"cat filename"):
The file contents are:
file1.dat
---------
cake
hat
pool
ten
tool

file2.dat
---------
house
pen
fish
cake

file3.dat
---------
bed
fence
tool
one
comb
Join the contents into a single file:
$ cat file1.dat file2.dat file3.dat > file4.dat
the output is sent to the file file4.dat, view it with ’cat file4.dat’
Sort into alphabetical order:
$ sort file4.dat > file5.dat
output to file5.dat, view it with ’cat file5.dat’
Remove multiple entries:
$ uniq file5.dat > file6.dat
We can combine the last two operations with:
$ sort file4.dat | uniq > file6.dat
(note that there is no file5.dat required)
Look at the difference:
$ diff file5.dat file6.dat
Append the word ’dog’ to the file:
$ echo dog >> file6.dat
Again sort into alphabetical order:
$ sort file6.dat > file7.dat
Replace the word ’house’ with the word ’home’:
$ sed "s/house/home/g" file7.dat > file8.dat
Translate all letters between a and z with their upper case values:
$ cat file8.dat | tr a-z A-Z > file9.dat
Search for the word ’home’ in the file:
$ grep home file9.dat
Search for the word ’HOME’ in the file:
$ grep HOME file9.dat
Search for the word ’home’ ignoring case:
$ grep -i home file9.dat
And again giving the line number of any occurrences:
$ grep -i -n home file9.dat
Display the first 8 lines:
$ head -n8 file9.dat
Display the last 5 lines:
$ tail -n5 file9.dat
Count the number of lines, words and characters in the file:
$ wc file9.dat
Display the first three characters of each line:
$ cut -b1-3 file9.dat
Display the second to fourth characters of each line:
$ cut -b2-4 file9.dat
Lab Exercise - Linux
Download the file lep.dat and perform the following analysis.
1. Use wc to determine how many particles are there in the list.
2. Use cut, sort and uniq to form a unique list of particle species;
how many species of particles are there?
3. Use grep and wc to determine how many particles there are of each species.
4. Use cut, sort, head, and tail to determine the maximum and minimum
particle momenta. Which particles are they?
5. Use piping, i.e. don't waste time creating intermediate files, and repeat the exercise
(except part 3) with the file lep2.dat; the file contains many more particles.
Solution for Lab Exercise Linux
1. Use wc to determine how many particles are there in the list.
Answer:
$ wc lep.dat
     26      52     442 lep.dat
26 lines implies 26 particles in the list.
2. Use cut, sort and uniq to form a unique list of particle species;
how many species of particles are there?
Answer:
$ cut -b1-7 lep.dat | sort | uniq
KAON-
NEUTRON
PHOTON
PION+
PION-
PROTON
There are six particle species in the list.
3. Use grep and wc to determine how many particles there are of each species.
Answer:
We search for each particle type (6 of them) and count.
$ grep KAON- lep.dat | wc
      1       2      17
$ grep NEUTRON lep.dat | wc
      2       4      34
$ grep PHOTON lep.dat | wc
     10      20     170
$ grep PION+ lep.dat | wc
      7      14     119
$ grep PION- lep.dat | wc
      5      10      85
$ grep PROTON lep.dat | wc
      1       2      17
The result is: 1 Kaon, 2 Neutrons, 10 Photons, 7 positive Pions,
5 negative Pions, and 1 Proton (26 in total).
4. Use cut, sort, head, and tail to determine the maximum and minimum particle momenta.
Which particles are they?
Answer:
$ cut -b11-16 lep.dat | sort -n | head -n1
0.120
$ cut -b11-16 lep.dat | sort -n | tail -n1
22.284
The minimum momentum is 0.120 GeV/c, the maximum is 22.284 GeV/c
$ grep 0.120 lep.dat
PHOTON      0.120
$ grep 22.284 lep.dat
PION-      22.284
The minimum momentum belongs to a Photon. The maximum momentum belongs to a
negative Pion.
5. Use piping, i.e. don't waste time creating intermediate files, and repeat the exercise
(except part 3) with the file lep2.dat; the file contains many more particles.
Answer:
$ wc -l lep2.dat
980 lep2.dat
There are 980 particles in the list.
$ cut -b1-8 lep2.dat | sort | uniq | wc -l
87
There are 87 particle species in the list.
$ cut -b11-19 lep2.dat | sort -n | head -n1
0.001432
$ cut -b11-19 lep2.dat | sort -n | tail -n1
40.19147
The minimum momentum is 0.001 GeV/c, maximum is 40.191 GeV/c.
$ grep 0.001432 lep2.dat
GAMMA       0.00143232674
$ grep 40.191 lep2.dat
B*-        40.1914749
The minimum momentum belongs to a GAMMA. The maximum momentum belongs to a B*-.