Calculus 1 to 4 (2004–2006)
Axel Schüler
January 3, 2007
Contents

1 Real and Complex Numbers
   Basics
      Notations
      Sums and Products
      Mathematical Induction
      Binomial Coefficients
   1.1 Real Numbers
      1.1.1 Ordered Sets
      1.1.2 Fields
      1.1.3 Ordered Fields
      1.1.4 Embedding of natural numbers into the real numbers
      1.1.5 The completeness of R
      1.1.6 The Absolute Value
      1.1.7 Supremum and Infimum revisited
      1.1.8 Powers of real numbers
      1.1.9 Logarithms
   1.2 Complex numbers
      1.2.1 The Complex Plane and the Polar form
      1.2.2 Roots of Complex Numbers
   1.3 Inequalities
      1.3.1 Monotony of the Power and Exponential Functions
      1.3.2 The Arithmetic-Geometric mean inequality
      1.3.3 The Cauchy–Schwarz Inequality
   1.4 Appendix A

2 Sequences and Series
   2.1 Convergent Sequences
      2.1.1 Algebraic operations with sequences
      2.1.2 Some special sequences
      2.1.3 Monotonic Sequences
      2.1.4 Subsequences
   2.2 Cauchy Sequences
   2.3 Series
      2.3.1 Properties of Convergent Series
      2.3.2 Operations with Convergent Series
      2.3.3 Series of Nonnegative Numbers
      2.3.4 The Number e
      2.3.5 The Root and the Ratio Tests
      2.3.6 Absolute Convergence
      2.3.7 Decimal Expansion of Real Numbers
      2.3.8 Complex Sequences and Series
      2.3.9 Power Series
      2.3.10 Rearrangements
      2.3.11 Products of Series

3 Functions and Continuity
   3.1 Limits of a Function
      3.1.1 One-sided Limits, Infinite Limits, and Limits at Infinity
   3.2 Continuous Functions
      3.2.1 The Intermediate Value Theorem
      3.2.2 Continuous Functions on Bounded and Closed Intervals. The Theorem about Maximum and Minimum
   3.3 Uniform Continuity
   3.4 Monotonic Functions
   3.5 Exponential, Trigonometric, and Hyperbolic Functions and their Inverses
      3.5.1 Exponential and Logarithm Functions
      3.5.2 Trigonometric Functions and their Inverses
      3.5.3 Hyperbolic Functions and their Inverses
   3.6 Appendix B
      3.6.1 Monotonic Functions have One-Sided Limits
      3.6.2 Proofs for sin x and cos x inequalities
      3.6.3 Estimates for π

4 Differentiation
   4.1 The Derivative of a Function
   4.2 The Derivatives of Elementary Functions
      4.2.1 Derivatives of Higher Order
   4.3 Local Extrema and the Mean Value Theorem
      4.3.1 Local Extrema and Convexity
   4.4 L’Hospital’s Rule
   4.5 Taylor’s Theorem
      4.5.1 Examples of Taylor Series
   4.6 Appendix C

5 Integration
   5.1 The Riemann–Stieltjes Integral
      5.1.1 Properties of the Integral
   5.2 Integration and Differentiation
      5.2.1 Table of Antiderivatives
      5.2.2 Integration Rules
      5.2.3 Integration of Rational Functions
      5.2.4 Partial Fraction Decomposition
      5.2.5 Other Classes of Elementary Integrable Functions
   5.3 Improper Integrals
      5.3.1 Integrals on unbounded intervals
      5.3.2 Integrals of Unbounded Functions
      5.3.3 The Gamma function
   5.4 Integration of Vector-Valued Functions
   5.5 Inequalities
   5.6 Appendix D
      5.6.1 More on the Gamma Function

6 Sequences of Functions and Basic Topology
   6.1 Discussion of the Main Problem
   6.2 Uniform Convergence
      6.2.1 Definitions and Example
      6.2.2 Uniform Convergence and Continuity
      6.2.3 Uniform Convergence and Integration
      6.2.4 Uniform Convergence and Differentiation
   6.3 Fourier Series
      6.3.1 An Inner Product on the Periodic Functions
   6.4 Basic Topology
      6.4.1 Finite, Countable, and Uncountable Sets
      6.4.2 Metric Spaces and Normed Spaces
      6.4.3 Open and Closed Sets
      6.4.4 Limits and Continuity
      6.4.5 Completeness and Compactness
      6.4.6 Continuous Functions in Rk
   6.5 Appendix E

7 Calculus of Functions of Several Variables
   7.1 Partial Derivatives
      7.1.1 Higher Partial Derivatives
      7.1.2 The Laplacian
   7.2 Total Differentiation
      7.2.1 Basic Theorems
   7.3 Taylor’s Formula
      7.3.1 Directional Derivatives
      7.3.2 Taylor’s Formula
   7.4 Extrema of Functions of Several Variables
   7.5 The Inverse Mapping Theorem
   7.6 The Implicit Function Theorem
   7.7 Lagrange Multiplier Rule
   7.8 Integrals depending on Parameters
      7.8.1 Continuity of I(y)
      7.8.2 Differentiation of Integrals
      7.8.3 Improper Integrals with Parameters
   7.9 Appendix

8 Curves and Line Integrals
   8.1 Rectifiable Curves
      8.1.1 Curves in Rk
      8.1.2 Rectifiable Curves
   8.2 Line Integrals
      8.2.1 Path Independence

9 Integration of Functions of Several Variables
   9.1 Basic Definition
      9.1.1 Properties of the Riemann Integral
   9.2 Integrable Functions
      9.2.1 Integration over More General Sets
      9.2.2 Fubini’s Theorem and Iterated Integrals
   9.3 Change of Variable
   9.4 Appendix

10 Surface Integrals
   10.1 Surfaces in R3
      10.1.1 The Area of a Surface
   10.2 Scalar Surface Integrals
      10.2.1 Other Forms for dS
      10.2.2 Physical Application
   10.3 Surface Integrals
      10.3.1 Orientation
   10.4 Gauß’ Divergence Theorem
   10.5 Stokes’ Theorem
      10.5.1 Green’s Theorem
      10.5.2 Stokes’ Theorem
      10.5.3 Vector Potential and the Inverse Problem of Vector Analysis

11 Differential Forms on Rn
   11.1 The Exterior Algebra Λ(Rn)
      11.1.1 The Dual Vector Space V∗
      11.1.2 The Pull-Back of k-forms
      11.1.3 Orientation of Rn
   11.2 Differential Forms
      11.2.1 Definition
      11.2.2 Differentiation
      11.2.3 Pull-Back
      11.2.4 Closed and Exact Forms
   11.3 Stokes’ Theorem
      11.3.1 Singular Cubes, Singular Chains, and the Boundary Operator
      11.3.2 Integration
      11.3.3 Stokes’ Theorem
      11.3.4 Special Cases
      11.3.5 Applications

12 Measure Theory and Integration
   12.1 Measure Theory
      12.1.1 Algebras, σ-algebras, and Borel Sets
      12.1.2 Additive Functions and Measures
      12.1.3 Extension of Countably Additive Functions
      12.1.4 The Lebesgue Measure on Rn
   12.2 Measurable Functions
   12.3 The Lebesgue Integral
      12.3.1 Simple Functions
      12.3.2 Positive Measurable Functions
   12.4 Some Theorems on Lebesgue Integrals
      12.4.1 The Role Played by Measure Zero Sets
      12.4.2 The space Lp(X, µ)
      12.4.3 The Monotone Convergence Theorem
      12.4.4 The Dominated Convergence Theorem
      12.4.5 Application of Lebesgue’s Theorem to Parametric Integrals
      12.4.6 The Riemann and the Lebesgue Integrals
      12.4.7 Appendix: Fubini’s Theorem

13 Hilbert Space
   13.1 The Geometry of the Hilbert Space
      13.1.1 Unitary Spaces
      13.1.2 Norm and Inner product
      13.1.3 Two Theorems of F. Riesz
      13.1.4 Orthogonal Sets and Fourier Expansion
      13.1.5 Appendix
   13.2 Bounded Linear Operators in Hilbert Spaces
      13.2.1 Bounded Linear Operators
      13.2.2 The Adjoint Operator
      13.2.3 Classes of Bounded Linear Operators
      13.2.4 Orthogonal Projections
      13.2.5 Spectrum and Resolvent
      13.2.6 The Spectrum of Self-Adjoint Operators

14 Complex Analysis
   14.1 Holomorphic Functions
      14.1.1 Complex Differentiation
      14.1.2 Power Series
      14.1.3 Cauchy–Riemann Equations
   14.2 Cauchy’s Integral Formula
      14.2.1 Integration
      14.2.2 Cauchy’s Theorem
      14.2.3 Cauchy’s Integral Formula
      14.2.4 Applications of the Coefficient Formula
      14.2.5 Power Series
   14.3 Local Properties of Holomorphic Functions
   14.4 Singularities
      14.4.1 Classification of Singularities
      14.4.2 Laurent Series
   14.5 Residues
      14.5.1 Calculating Residues
   14.6 Real Integrals
      14.6.1 Rational Functions in Sine and Cosine
      14.6.2 Integrals of the form \int_{-\infty}^{\infty} f(x) dx

15 Partial Differential Equations I — an Introduction
   15.1 Classification of PDE
      15.1.1 Introduction
      15.1.2 Examples
   15.2 First Order PDE — The Method of Characteristics
   15.3 Classification of Semi-Linear Second-Order PDEs
      15.3.1 Quadratic Forms
      15.3.2 Elliptic, Parabolic and Hyperbolic
      15.3.3 Change of Coordinates
      15.3.4 Characteristics
      15.3.5 The Vibrating String

16 Distributions
   16.1 Introduction — Test Functions and Distributions
      16.1.1 Motivation
      16.1.2 Test Functions D(Rn) and D(Ω)
   16.2 The Distributions D′(Rn)
      16.2.1 Regular Distributions
      16.2.2 Other Examples of Distributions
      16.2.3 Convergence and Limits of Distributions
      16.2.4 The distribution P(1/x)
      16.2.5 Operations with Distributions
   16.3 Tensor Product and Convolution Product
      16.3.1 The Support of a Distribution
      16.3.2 Tensor Products
      16.3.3 Convolution Product
      16.3.4 Linear Change of Variables
      16.3.5 Fundamental Solutions
   16.4 Fourier Transformation in S(Rn) and S′(Rn)
      16.4.1 The Space S(Rn)
      16.4.2 The Space S′(Rn)
      16.4.3 Fourier Transformation in S′(Rn)
   16.5 Appendix — More about Convolutions

17 PDE II — The Equations of Mathematical Physics
   17.1 Fundamental Solutions
      17.1.1 The Laplace Equation
      17.1.2 The Heat Equation
      17.1.3 The Wave Equation
   17.2 The Cauchy Problem
      17.2.1 Motivation of the Method
      17.2.2 The Wave Equation
      17.2.3 The Heat Equation
      17.2.4 Physical Interpretation of the Results
   17.3 Fourier Method for Boundary Value Problems
      17.3.1 Initial Boundary Value Problems
      17.3.2 Eigenvalue Problems for the Laplace Equation
   17.4 Boundary Value Problems for the Laplace and the Poisson Equations
      17.4.1 Formulation of Boundary Value Problems
      17.4.2 Basic Properties of Harmonic Functions
   17.5 Appendix
      17.5.1 Existence of Solutions to the Boundary Value Problems
      17.5.2 Extremal Properties of Harmonic Functions and the Dirichlet Principle
      17.5.3 Numerical Methods
Chapter 1
Real and Complex Numbers
Basics
Notations
R                   Real numbers
C                   Complex numbers
Q                   Rational numbers
N = {1, 2, . . . }  positive integers (natural numbers)
Z                   Integers
We know that N ⊆ Z ⊆ Q ⊆ R ⊆ C. We write R+ , Q+ and Z+ for the non-negative
real, rational, and integer numbers x ≥ 0, respectively. The notions A ⊂ B and A ⊆ B are
equivalent. If we want to point out that B is strictly bigger than A we write A ⊊ B.
We use the following symbols:

:=     defining equation
⇒      implication, “if . . . , then . . . ”
⇐⇒     “if and only if”, equivalence
∀      for all
∃      there exists
Let a < b be fixed real numbers. We denote the intervals as follows:

[a, b]   := {x ∈ R | a ≤ x ≤ b}    closed interval
(a, b)   := {x ∈ R | a < x < b}    open interval
[a, b)   := {x ∈ R | a ≤ x < b}    half-open interval
(a, b]   := {x ∈ R | a < x ≤ b}    half-open interval
[a, ∞)   := {x ∈ R | a ≤ x}        closed half-line
(a, ∞)   := {x ∈ R | a < x}        open half-line
(−∞, b]  := {x ∈ R | x ≤ b}        closed half-line
(−∞, b)  := {x ∈ R | x < b}        open half-line
(a) Sums and Products
Let us recall the meaning of the sum sign \sum and the product sign \prod. Suppose m ≤ n are
integers, and a_k, k = m, . . . , n, are real numbers. Then we set

    \sum_{k=m}^{n} a_k := a_m + a_{m+1} + \cdots + a_n,    \prod_{k=m}^{n} a_k := a_m a_{m+1} \cdots a_n .

In case m = n the sum and the product consist of one summand and one factor only, respectively.
In case n < m it is customary to set

    \sum_{k=m}^{n} a_k := 0   (empty sum),    \prod_{k=m}^{n} a_k := 1   (empty product).
The following rules are obvious: If m ≤ n ≤ p and d ∈ Z are integers then

    \sum_{k=m}^{n} a_k + \sum_{k=n+1}^{p} a_k = \sum_{k=m}^{p} a_k,
    \sum_{k=m}^{n} a_k = \sum_{k=m+d}^{n+d} a_{k-d}   (index shift).

We have for a ∈ R,

    \sum_{k=m}^{n} a = (n − m + 1)a.
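These conventions and rules are easy to check numerically. The following sketch is not part of the original notes; the helper names `series_sum` and `series_prod` are my own, and Python's built-in `sum` and `math.prod` happen to agree with the empty-sum and empty-product conventions:

```python
import math

def series_sum(a, m, n):
    """Sum a(k) for k = m, ..., n; an empty range (n < m) gives 0."""
    return sum(a(k) for k in range(m, n + 1))

def series_prod(a, m, n):
    """Product of a(k) for k = m, ..., n; an empty range gives 1."""
    return math.prod(a(k) for k in range(m, n + 1))

a = lambda k: k * k

# Empty sum and empty product conventions
assert series_sum(a, 5, 4) == 0
assert series_prod(a, 5, 4) == 1

# Splitting rule: sum_{k=m}^{n} + sum_{k=n+1}^{p} = sum_{k=m}^{p}
assert series_sum(a, 2, 5) + series_sum(a, 6, 9) == series_sum(a, 2, 9)

# Index shift: sum_{k=m}^{n} a_k = sum_{k=m+d}^{n+d} a_{k-d}
d = 3
assert series_sum(a, 2, 5) == sum(a(k - d) for k in range(2 + d, 5 + d + 1))
```

The empty-product convention (value 1) is what makes identities such as the splitting rule hold without case distinctions.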
(b) Mathematical Induction
Mathematical induction is a powerful method to prove theorems about natural numbers.
Theorem 1.1 (Principle of Mathematical Induction) Let n0 ∈ Z be an integer. To prove a
statement A(n) for all integers n ≥ n0 it is sufficient to show:
(I) A(n0 ) is true.
(II) For any n ≥ n0 : If A(n) is true, so is A(n + 1) (Induction step).
It is easy to see how the principle works: First, A(n0 ) is true. Applying (II) to n = n0 we obtain
that A(n0 + 1) is true. Successive application of (II) yields that A(n0 + 2), A(n0 + 3), and so on,
are true.
Example 1.1 (a) For all nonnegative integers n we have

    \sum_{k=1}^{n} (2k − 1) = n^2 .

Proof. We use induction over n. In case n = 0 we have an empty sum on the left hand side (lhs)
and 0^2 = 0 on the right hand side (rhs). Hence, the statement is true for n = 0.
Suppose it is true for some fixed n. We shall prove it for n + 1. By the definition of the sum
and by the induction hypothesis, \sum_{k=1}^{n} (2k − 1) = n^2, we have

    \sum_{k=1}^{n+1} (2k − 1) = \sum_{k=1}^{n} (2k − 1) + 2(n + 1) − 1
                              = n^2 + 2n + 1 = (n + 1)^2    (by the ind. hyp.).

This proves the claim for n + 1.
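The identity can also be spot-checked by machine. This check is not part of the original notes, merely a numerical illustration that also covers the empty-sum case n = 0:

```python
# Check sum_{k=1}^{n} (2k - 1) = n^2 for the first few hundred n,
# including n = 0, where the left side is an empty sum.
for n in range(0, 200):
    assert sum(2 * k - 1 for k in range(1, n + 1)) == n * n
```

Of course such a check proves nothing for all n; only the induction argument above does.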
(b) For all positive integers n ≥ 8 we have 2^n > 3n^2 .
Proof. In case n = 8 we have

    2^n = 2^8 = 256 > 192 = 3 · 64 = 3 · 8^2 = 3n^2 ,

and the statement is true in this case.
Suppose it is true for some fixed n ≥ 8, i. e. 2^n > 3n^2 (induction hypothesis). We will show
that the statement is true for n + 1, i. e. 2^{n+1} > 3(n + 1)^2 (induction assertion). Note that n ≥ 8
implies

    n − 1 ≥ 7 > 2  =⇒  (n − 1)^2 > 4 > 2  =⇒  n^2 − 2n − 1 > 0
    =⇒  3n^2 − 6n − 3 > 0,  i. e.  3(n^2 − 2n − 1) > 0    | + 3n^2 + 6n + 3
    =⇒  6n^2 > 3n^2 + 6n + 3  =⇒  2 · 3n^2 > 3(n^2 + 2n + 1)  =⇒  2 · 3n^2 > 3(n + 1)^2 .   (1.1)

By the induction hypothesis, 2^{n+1} = 2 · 2^n > 2 · 3n^2 . This together with (1.1) yields
2^{n+1} > 3(n + 1)^2 . Thus, we have shown the induction assertion. Hence the statement is true
for all positive integers n ≥ 8.
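A quick numerical check, not part of the original notes, confirms both the inequality and that the threshold n ≥ 8 is needed:

```python
# 2^n > 3 n^2 holds for all n >= 8 (checked here up to 100) ...
assert all(2 ** n > 3 * n * n for n in range(8, 101))

# ... but fails at n = 7: 2^7 = 128 <= 147 = 3 * 7^2, so the base case
# of the induction really must start at n = 8.
assert 2 ** 7 <= 3 * 7 ** 2
```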
For a positive integer n ∈ N we set

    n! := \prod_{k=1}^{n} k,   read: “n factorial,”   0! = 1! = 1.
(c) Binomial Coefficients
For non-negative integers n, k ∈ Z+ we define

    \binom{n}{k} := \prod_{i=1}^{k} \frac{n − i + 1}{i} = \frac{n(n − 1) \cdots (n − k + 1)}{k(k − 1) \cdots 2 · 1} .

The numbers \binom{n}{k} (read: “n choose k”) are called binomial coefficients since they appear in the
binomial theorem, see Proposition 1.4 below. It follows directly from the definition that

    \binom{n}{k} = 0   for k > n,
    \binom{n}{k} = \frac{n!}{k!(n − k)!} = \binom{n}{n − k}   for 0 ≤ k ≤ n.
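The defining product can be computed directly. The sketch below is not from the notes; it uses exact rational arithmetic for the intermediate quotients and compares against Python's built-in `math.comb`:

```python
import math
from fractions import Fraction
from functools import reduce

def binom(n, k):
    """Binomial coefficient via the defining product prod_{i=1}^{k} (n-i+1)/i.
    Fractions keep intermediate quotients exact; the final value is an integer."""
    return int(reduce(lambda acc, i: acc * Fraction(n - i + 1, i),
                      range(1, k + 1), Fraction(1)))

for n in range(0, 12):
    for k in range(0, n + 3):
        # Agrees with math.comb, including the convention binom(n, k) = 0 for k > n
        # (for k > n the product contains the factor n - n - 1 + 1 = 0).
        assert binom(n, k) == math.comb(n, k)
        # Symmetry binom(n, k) = binom(n, n - k) in the range 0 <= k <= n
        if k <= n:
            assert binom(n, k) == binom(n, n - k)
```

Note how the case k > n needs no special handling: one factor of the product vanishes.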
Lemma 1.2 For 0 ≤ k ≤ n we have:

    \binom{n}{k} + \binom{n}{k+1} = \binom{n+1}{k+1} .
Proof. For k = n the formula is obvious. For 0 ≤ k ≤ n − 1 we have

    \binom{n}{k} + \binom{n}{k+1} = \frac{n!}{k!(n − k)!} + \frac{n!}{(k + 1)!(n − k − 1)!}
    = \frac{(k + 1)\,n! + (n − k)\,n!}{(k + 1)!(n − k)!} = \frac{(n + 1)!}{(k + 1)!(n − k)!} = \binom{n+1}{k+1} .
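This addition formula (Pascal's rule) is easy to verify numerically; the following one-liner, not part of the original notes, checks it over a grid of values:

```python
import math

# Pascal's rule: C(n, k) + C(n, k+1) == C(n+1, k+1).
# For k = n the second term is C(n, n+1) = 0 and both sides equal 1.
for n in range(0, 30):
    for k in range(0, n + 1):
        assert math.comb(n, k) + math.comb(n, k + 1) == math.comb(n + 1, k + 1)
```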
We say that X is an n-set if X has exactly n elements. We write Card X = n (from “cardinality”) to denote the number of elements in X.
Lemma 1.3 The number of k-subsets of an n-set is \binom{n}{k}.
The lemma in particular shows that \binom{n}{k} is always an integer (which is not obvious from its definition).
Proof. We denote the number of k-subsets of an n-set Xn by C_k^n. It is clear that C_0^n = C_n^n = 1
since ∅ is the only 0-subset of Xn and Xn itself is the only n-subset of Xn. We use induction
over n. The case n = 1 is obvious since C_0^1 = C_1^1 = \binom{1}{0} = \binom{1}{1} = 1. Suppose that the claim is
true for some fixed n. We will show the statement for the (n + 1)-set X = {1, . . . , n + 1} and
all k with 1 ≤ k ≤ n. The family of (k + 1)-subsets of X splits into two disjoint classes. In the
first class A1 every subset contains n + 1; in the second class A2 , not. To form a subset in A1
one has to choose another k elements out of {1, . . . , n}. By the induction assumption this number
is Card A1 = C_k^n = \binom{n}{k}. To form a subset in A2 one has to choose k + 1 elements out of
{1, . . . , n}. By the induction assumption this number is Card A2 = C_{k+1}^n = \binom{n}{k+1}. By Lemma 1.2
we obtain

    C_{k+1}^{n+1} = Card A1 + Card A2 = \binom{n}{k} + \binom{n}{k+1} = \binom{n+1}{k+1} ,

which proves the induction assertion.
Proposition 1.4 (Binomial Theorem) Let x, y ∈ R and n ∈ N. Then we have

    (x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^{n−k} y^k .

Proof. We give a direct proof. Using the distributive law we find that each of the 2^n summands
of the product (x + y)^n has the form x^{n−k} y^k for some k = 0, . . . , n. We number the n factors
as (x + y)^n = f1 · f2 · · · fn , f1 = f2 = · · · = fn = x + y. Let us count how often the
summand x^{n−k} y^k appears. We have to choose k factors y out of the n factors f1 , . . . , fn . The
remaining n − k factors must be x. This gives a 1-1-correspondence between the k-subsets
of {f1 , . . . , fn } and the different summands of the form x^{n−k} y^k . Hence, by Lemma 1.3 their
number is C_k^n = \binom{n}{k}. This proves the proposition.
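The binomial theorem can be confirmed numerically at integer points, where the arithmetic is exact. This check is my own illustration, not part of the notes:

```python
import math

# (x + y)^n = sum_{k=0}^{n} C(n, k) x^(n-k) y^k, checked at a few
# integer points so there are no floating-point rounding issues.
for n in range(0, 10):
    for x, y in [(2, 3), (-1, 4), (5, -2)]:
        rhs = sum(math.comb(n, k) * x ** (n - k) * y ** k for k in range(n + 1))
        assert (x + y) ** n == rhs
```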
1.1 Real Numbers
In this lecture course we assume the system of real numbers to be given. Recall that the set of
integers is Z = {0, ±1, ±2, . . . } while the fractions of integers, Q = {m/n | m, n ∈ Z, n ≠ 0},
form the set of rational numbers.
A satisfactory discussion of the main concepts of analysis such as convergence, continuity,
differentiation and integration must be based on an accurately defined number concept.
An existence proof for the real numbers is given in [Rud76, Appendix to Chapter 1]. The author
explicitly constructs the real numbers R starting from the rational numbers Q.
The aim of the following two sections is to formulate the axioms which are sufficient to derive
all properties and theorems of the real number system.
The rational numbers are inadequate for many purposes, both as a field and as an ordered set. For
instance, there is no rational x with x² = 2. This leads to the introduction of irrational numbers
which are often written as infinite decimal expansions and are considered to be “approximated”
by the corresponding finite decimals. Thus the sequence

    1, 1.4, 1.41, 1.414, 1.4142, . . .

“tends to √2.” But unless the irrational number √2 has been clearly defined, the question must
arise: What is it that this sequence “tends to”?
This sort of question can be answered as soon as the so-called “real number system” is constructed.
Example 1.2 As shown in the exercise class, there is no rational number x with x² = 2. Set

    A = {x ∈ Q+ | x² < 2}   and   B = {x ∈ Q+ | x² > 2}.
Then A ∪ B = Q+ and A ∩ B = ∅. One can show that in the rational number system, A
has no largest element and B has no smallest element, for details see Appendix A or Rudin’s
book [Rud76, Example 1.1, page 2]. This example shows that the system of rational numbers
has certain gaps in spite of the fact that between any two rationals there is another: If r < s
then r < (r + s)/2 < s. The real number system fills these gaps. This is the principal reason
for the fundamental role which it plays in analysis.
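The “gap” between A and B can be made tangible by computing with exact rationals. The sketch below is my own illustration of the example, not part of the notes: bisection always lands strictly inside A or strictly inside B, never on the boundary, yet the two sides close in on where √2 would sit:

```python
from fractions import Fraction

# Bisection between A = {x in Q+ : x^2 < 2} and B = {x in Q+ : x^2 > 2}.
lo, hi = Fraction(1), Fraction(2)          # lo is in A, hi is in B
for _ in range(50):
    mid = (lo + hi) / 2
    assert mid * mid != 2                  # a rational midpoint never squares to 2
    if mid * mid < 2:
        lo = mid                           # mid lies in A
    else:
        hi = mid                           # mid lies in B

assert lo * lo < 2 < hi * hi               # still strictly on either side of the gap
assert hi - lo == Fraction(1, 2 ** 50)     # the gap shrinks, but never closes in Q
```

However far the bisection runs, A keeps no largest and B no smallest element; only in R does the common limit exist.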
We start with the brief discussion of the general concepts of ordered set and field.
1.1.1 Ordered Sets
Definition 1.1 (a) Let S be a set. An order (or total order) on S is a relation, denoted by <,
with the following properties. Let x, y, z ∈ S.
(i) One and only one of the following statements is true:

    x < y,   x = y,   y < x   (trichotomy).

(ii) x < y and y < z implies x < z (transitivity).
In this case S is called an ordered set.
(b) Suppose (S, <) is an ordered set, and E ⊆ S. If there exists a β ∈ S such that x ≤ β for
all x ∈ E, we say that E is bounded above, and call β an upper bound of E. Lower bounds are
defined in the same way with ≥ in place of ≤.
If E is both bounded above and below, we say that E is bounded.
The statement x < y may be read as “x is less than y” or “x precedes y”. It is convenient to
write y > x instead of x < y. The notation x ≤ y indicates x < y or x = y. In other words,
x ≤ y is the negation of x > y. For example, R is an ordered set if r < s is defined to mean
that s − r is a positive real number.
Example 1.3 (a) The intervals [a, b], (a, b], [a, b), (a, b), (−∞, b), and (−∞, b] are bounded
above by b and all numbers greater than b.
(b) E := {1/n | n ∈ N} = {1, 1/2, 1/3, . . . } is bounded above by any α ≥ 1. It is bounded below
by 0.
Definition 1.2 Suppose S is an ordered set, E ⊆ S, and E is bounded above. Suppose there
exists an α ∈ S such that
(i) α is an upper bound of E.
(ii) If β is an upper bound of E then α ≤ β.
Then α is called the supremum of E (or least upper bound) of E. We write
α = sup E.
An equivalent formulation of (ii) is the following:
(ii)′ If β < α then β is not an upper bound of E.
The infimum (or greatest lower bound) of a set E which is bounded below is defined in the same
manner: The statement
α = inf E
means that α is a lower bound of E and for all lower bounds β of E we have β ≤ α.
Example 1.4 (a) If α = sup E exists, then α may or may not belong to E. For instance consider
[0, 1) and [0, 1]. Then
1 = sup[0, 1) = sup[0, 1],
however 1 ∉ [0, 1) but 1 ∈ [0, 1]. We will show that sup[0, 1] = 1. Obviously, 1 is an upper
bound of [0, 1]. Suppose that β < 1; then β is not an upper bound of [0, 1] since 1 ∈ [0, 1]. Hence
1 = sup[0, 1].
We will show that sup[0, 1) = 1. Obviously, 1 is an upper bound of this interval. Suppose
that β < 1. Then β < (β + 1)/2 < 1. Since (β + 1)/2 ∈ [0, 1), β is not an upper bound. Consequently,
1 = sup[0, 1).
(b) Consider the sets A and B of Example 1.2 as subsets of the ordered set Q. Since A∪B = Q+
(there is no rational number with x2 = 2) the upper bounds of A are exactly the elements of B.
1.1 Real Numbers
Indeed, if a ∈ A and b ∈ B then a2 < 2 < b2 . Taking the square root we have a < b. Since B
contains no smallest member, A has no supremum in Q+ .
Similarly, B is bounded below by any element of A. Since A has no largest member, B has no
infimum in Q.
Remarks 1.1 (a) It is clear from (ii) and the trichotomy of < that there is at most one such α.
Indeed, suppose α′ also satisfies (i) and (ii), by (ii) we have α ≤ α′ and α′ ≤ α; hence α = α′ .
(b) If sup E exists and belongs to E, we call it the maximum of E and denote it by max E.
Hence, max E = sup E and max E ∈ E. Similarly, if the infimum of E exists and belongs to
E we call it the minimum and denote it by min E; min E = inf E, min E ∈ E.
bounded subset of Q | an upper bound | sup | max
--------------------|----------------|-----|----
[0, 1]              | 2              | 1   | 1
[0, 1)              | 2              | 1   | —
A                   | 2              | —   | —
(c) Suppose that α is an upper bound of E and α ∈ E then α = max E, that is, property (ii) in
Definition 1.2 is automatically satisfied. Similarly, if β ∈ E is a lower bound, then β = min E.
(d) If E is a finite set it always has a maximum and a minimum.
1.1.2 Fields
Definition 1.3 A field is a set F with two operations, called addition and multiplication which
satisfy the following so-called “field axioms” (A), (M), and (D):
(A) Axioms for addition
(A1)
(A2)
(A3)
(A4)
(A5)
If x ∈ F and y ∈ F then their sum x + y is in F .
Addition is commutative: x + y = y + x for all x, y ∈ F .
Addition is associative: (x + y) + z = x + (y + z) for all x, y, z ∈ F .
F contains an element 0 such that 0 + x = x for all x ∈ F .
To every x ∈ F there exists an element −x ∈ F such that x + (−x) = 0.
(M) Axioms for multiplication
(M1)
(M2)
(M3)
(M4)
(M5)
If x ∈ F and y ∈ F then their product xy is in F .
Multiplication is commutative: xy = yx for all x, y ∈ F .
Multiplication is associative: (xy)z = x(yz) for all x, y, z ∈ F .
F contains an element 1 such that 1x = x for all x ∈ F .
If x ∈ F and x 6= 0 then there exists an element 1/x ∈ F such that x · (1/x) = 1.
(D) The distributive law
x(y + z) = xy + xz
holds for all x, y, z ∈ F .
Remarks 1.2 (a) One usually writes
x − y,   x/y,   x + y + z,   xyz,   x²,   x³,   2x, . . .
in place of
x + (−y),   x · (1/y),   (x + y) + z,   (xy)z,   x · x,   x · x · x,   x + x, . . .
(b) The field axioms clearly hold in Q if addition and multiplication have their customary meaning. Thus Q is a field. The integers Z do not form a field since 2 ∈ Z has no multiplicative inverse
(axiom (M5) is not fulfilled).
(c) The smallest field is F2 = {0, 1}, consisting of the neutral element 0 for addition and the neutral element 1 for multiplication. Addition and multiplication are defined as follows:

  + | 0 1        · | 0 1
  0 | 0 1        0 | 0 0
  1 | 1 0        1 | 0 1

It is easy to check the field axioms (A), (M), and (D) directly.
(d) (A1) to (A5) and (M1) to (M5) mean that both (F, +) and (F \ {0}, ·) are commutative (or
abelian) groups, respectively.
Proposition 1.5 The axioms of addition imply the following statements.
(a) If x + y = x + z then y = z (Cancellation law).
(b) If x + y = x then y = 0 (The element 0 is unique).
(c) If x + y = 0 then y = −x (The inverse −x is unique).
(d) −(−x) = x.
Proof. If x + y = x + z, the axioms (A) give
y = 0 + y = (−x + x) + y = −x + (x + y) = −x + (x + z)
  = (−x + x) + z = 0 + z = z,
using (A4), (A5), (A3), the assumption, (A3), (A5), and (A4) in turn.
This proves (a). Take z = 0 in (a) to obtain (b). Take z = −x in (a) to obtain (c). Since
−x + x = 0, (c) with −x in place of x and x in place of y, gives (d).
Proposition 1.6 The axioms for multiplication imply the following statements.
(a) If x 6= 0 and xy = xz then y = z (Cancellation law).
(b) If x 6= 0 and xy = x then y = 1 (The element 1 is unique).
(c) If x 6= 0 and xy = 1 then y = 1/x (The inverse 1/x is unique).
(d) If x 6= 0 then 1/(1/x) = x.
The proof is so similar to that of Proposition 1.5 that we omit it.
Proposition 1.7 The field axioms imply the following statements, for any x, y, z ∈ F
(a) 0x = 0.
(b) If xy = 0 then x = 0 or y = 0.
(c) (−x)y = −(xy) = x(−y).
(d) (−x)(−y) = xy.
Proof. 0x + 0x = (0 + 0)x = 0x. Hence 1.5 (b) implies that 0x = 0, and (a) holds.
Suppose to the contrary that both x ≠ 0 and y ≠ 0 but xy = 0. Then (a) gives
1 = (1/y)(1/x) · xy = (1/y)(1/x) · 0 = 0,
a contradiction. Thus (b) holds.
The first equality in (c) comes from
(−x)y + xy = (−x + x)y = 0y = 0,
combined with 1.5 (b); the other half of (c) is proved in the same way. Finally,
(−x)(−y) = −[x(−y)] = −[−xy] = xy
by (c) and 1.5 (d).
1.1.3 Ordered Fields
In analysis, dealing with inequalities is as important as dealing with equations. Calculations
with inequalities are based on the order axioms. It turns out that everything can be reduced to the
notion of positivity.
In F there are distinguished positive elements (x > 0) such that the following axioms are valid.
Definition 1.4 An ordered field is a field F which is also an ordered set, such that for all
x, y, z ∈ F
(O) Axioms for ordered fields
(O1) x > 0 and y > 0 implies x + y > 0,
(O2) x > 0 and y > 0 implies xy > 0.
If x > 0 we call x positive; if x < 0, x is negative.
For example Q and R are ordered fields, if x > y is defined to mean that x − y is positive.
Proposition 1.8 The following statements are true in every ordered field F .
(a) If x < y and a ∈ F then a + x < a + y.
(b) If x < y and x′ < y ′ then x + x′ < y + y ′ .
Proof. (a) By assumption (a + y) − (a + x) = y − x > 0. Hence a + x < a + y.
(b) By assumption and by (a) we have x + x′ < y + x′ and y + x′ < y + y ′ . Using transitivity,
see Definition 1.1 (ii), we have x + x′ < y + y ′.
Proposition 1.9 The following statements are true in every ordered field.
(a) If x > 0 then −x < 0, and if x < 0 then −x > 0.
(b) If x > 0 and y < z then xy < xz.
(c) If x < 0 and y < z then xy > xz.
(d) If x 6= 0 then x2 > 0. In particular, 1 > 0.
(e) If 0 < x < y then 0 < 1/y < 1/x.
Proof. (a) If x > 0 then 0 = −x + x > −x + 0 = −x, so that −x < 0. If x < 0 then
0 = −x + x < −x + 0 = −x so that −x > 0. This proves (a).
(b) Since z > y, we have z − y > 0, hence x(z − y) > 0 by axiom (O2), and therefore
xz = x(z − y) + xy > 0 + xy = xy
by Proposition 1.8.
(c) By (a), (b) and Proposition 1.7 (c)
−[x(z − y)] = (−x)(z − y) > 0,
so that x(z − y) < 0, hence xz < xy.
(d) If x > 0, axiom (O2) of Definition 1.4 gives x² > 0. If x < 0 then −x > 0, hence (−x)² > 0. But
x² = (−x)² by Proposition 1.7 (d). Since 1² = 1, 1 > 0.
(e) If y > 0 and v ≤ 0 then yv ≤ 0. But y · (1/y) = 1 > 0. Hence 1/y > 0, and likewise 1/x > 0.
If we multiply x < y by the positive quantity (1/x)(1/y), we obtain 1/y < 1/x.
Remarks 1.3 (a) The finite field F2 = {0, 1}, see Remarks 1.2, is not an ordered field since
1 + 1 = 0 which contradicts 1 > 0.
(b) The field of complex numbers C (see below) is not an ordered field since i² = −1 contradicts
Proposition 1.9 (a), (d).
1.1.4 Embedding of natural numbers into the real numbers
Let F be an ordered field. We want to recover the integers inside F . In order to distinguish 0
and 1 in F from the integers 0 and 1 we temporarily write 0F and 1F . For a positive integer
n ∈ N, n ≥ 2 we define
nF := 1F + 1F + · · · + 1F   (n times).
Lemma 1.10 We have nF > 0F for all n ∈ N.
Proof. We use induction over n. By Proposition 1.9 (d) the statement is true for n = 1. Suppose
it is true for a fixed n, i. e. nF > 0F . Moreover 1F > 0F . Using axiom (O1) we obtain
(n + 1)F = nF + 1F > 0F .
From Lemma 1.10 it follows that m ≠ n implies nF ≠ mF . Indeed, let n be greater than m,
say n = m + k for some k ∈ N; then nF = mF + kF . Since kF > 0 it follows from 1.8 (a) that
nF > mF . In particular, nF ≠ mF . Hence, the mapping
N → F,
n 7→ nF
is a one-to-one correspondence (injective). In this way the positive integers are embedded into
the real numbers. Addition and multiplication of natural numbers and of its embeddings are the
same:
nF + mF = (n + m)F ,   nF mF = (nm)F .
From now on we identify a natural number with the associated real number. We write n for nF .
Definition 1.5 (The Archimedean Axiom) An ordered field F is called Archimedean if for all
x, y ∈ F with x > 0 and y > 0 there exists n ∈ N such that nx > y.
An equivalent formulation is: The subset N ⊂ F of positive integers is not bounded above.
Choose x = 1 in the above definition, then for any y ∈ F there is an n ∈ N such that n > y;
hence N is not bounded above.
Suppose N is not bounded above and x > 0, y > 0 are given. Then y/x is not an upper bound for N,
that is, there is some n ∈ N with n > y/x, i.e. nx > y.
1.1.5 The completeness of R
Using the axioms so far we are not yet able to prove the existence of irrational numbers. We
need the completeness axiom.
Definition 1.6 (Order Completeness) An ordered set S is said to be order complete if every
non-empty subset E ⊆ S which is bounded above has a supremum sup E in S.
(C) Completeness Axiom
The real numbers are order complete, i. e. every non-empty subset E ⊆ R which is bounded above has a supremum.
The set Q of rational numbers is not order complete since, for example, the bounded set
A = {x ∈ Q+ | x² < 2} has no supremum in Q. Later we will define √2 := sup A.
The existence of √2 in R is furnished by the completeness axiom (C).
Axiom (C) implies that every non-empty subset E ⊂ R which is bounded below has an infimum. This is an easy consequence of Homework 1.4 (a).
We will see that an order complete field is always Archimedean.
Proposition 1.11 (a) R is Archimedean.
(b) If x, y ∈ R, and x < y then there is a p ∈ Q with x < p < y.
Part (b) may be stated by saying that Q is dense in R.
Proof. (a) Let x, y > 0 be real numbers which do not fulfill the Archimedean property. That is,
if A := {nx | n ∈ N}, then y would be an upper bound of A. Then (C) furnishes that A has a
supremum α = sup A. Since x > 0, α − x < α and α − x is not an upper bound of A. Hence
α − x < mx for some m ∈ N. But then α < (m + 1)x, which is impossible, since α is an upper
bound of A.
(b) See [Rud76, Theorem 29].
Remarks 1.4 (a) If x, y ∈ Q with x < y, then there exists z ∈ R \ Q with x < z < y; choose
z = x + (y − x)/√2.
(b) We shall show that inf{1/n | n ∈ N} = 0. Since n > 0 for all n ∈ N, 1/n > 0 by
Proposition 1.9 (e) and 0 is a lower bound. Suppose α > 0. Since R is Archimedean, we find
m ∈ N such that 1 < mα or, equivalently, 1/m < α. Hence, α is not a lower bound for the set,
which proves the claim.
(c) Axiom (C) is equivalent to the Archimedean property together with the topological completeness (“Every Cauchy sequence in R is convergent,” see Proposition 2.18).
(d) Axiom (C) is equivalent to the axiom of nested intervals, see Proposition 2.11 below:
Let In := [an , bn ] be a sequence of closed nested intervals, that is I1 ⊇ I2 ⊇ I3 ⊇ · · · ,
such that for all ε > 0 there exists n0 such that 0 ≤ bn − an < ε for all n ≥ n0 .
Then there exists a unique real number a ∈ R which is a member of all intervals,
i. e. {a} = ⋂_{n∈N} In .
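As a numerical aside (the function below is our own sketch, not part of the notes), bisecting [1, 2] on the condition x² < 2 produces exactly such a sequence of closed nested intervals; their single common point is √2.

```python
def nested_intervals(predicate, a, b, steps=60):
    """Halve [a, b] repeatedly, keeping the configuration in which
    `predicate` holds at a but fails at b; the resulting intervals are
    nested and their widths tend to 0."""
    for _ in range(steps):
        m = (a + b) / 2
        if predicate(m):
            a = m   # m still satisfies x^2 < 2: move the left endpoint
        else:
            b = m   # m violates it: move the right endpoint
    return a, b

a, b = nested_intervals(lambda x: x * x < 2, 1.0, 2.0)
print(a, b)  # both endpoints approximate sqrt(2) = 1.414213...
```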
1.1.6 The Absolute Value
For x ∈ R one defines
| x | := x if x ≥ 0,   and   | x | := −x if x < 0.
Lemma 1.12 For a, x, y ∈ R we have
(a) | x | ≥ 0 and | x | = 0 if and only if x = 0. Further | −x | = | x |.
(b) ±x ≤ | x |, | x | = max{x, −x}, and | x | ≤ a ⇐⇒ (x ≤ a and −x ≤ a).
(c) | xy | = | x | | y | and | x/y | = | x | / | y | if y ≠ 0.
(d) | x + y | ≤ | x | + | y | (triangle inequality).
(e) | | x | − | y | | ≤ | x + y |.
Proof. (a) By Proposition 1.9 (a), x < 0 implies | x | = −x > 0. Also, x > 0 implies | x | > 0.
Putting both together we obtain, x 6= 0 implies | x | > 0 and thus | x | = 0 implies x = 0.
Moreover | 0 | = 0. This shows the first part.
The statement | x | = | −x | follows from (b) and −(−x) = x.
(b) Suppose first that x ≥ 0. Then x ≥ 0 ≥ −x and we have max{x, −x} = x = | x |. If x < 0
then −x > 0 > x and
max{−x, x} = −x = | x |. This proves max{x, −x} = | x |. Since the maximum is an
upper bound, | x | ≥ x and | x | ≥ −x. Suppose now a is an upper bound of {x, −x}. Then
| x | = max{x, −x} ≤ a. On the other hand, max{x, −x} ≤ a implies that a is an upper bound
of {x, −x} since max is.
One proves the first part of (c) by verifying the four cases (i) x, y ≥ 0, (ii) x ≥ 0, y < 0, (iii)
x < 0, y ≥ 0, and (iv) x, y < 0 separately. (i) is clear. In case (ii) we have by Proposition 1.9 (a)
and (b) that xy ≤ 0, and by Proposition 1.7 (c)
| x | | y | = x(−y) = −(xy) = | xy | .
The cases (iii) and (iv) are similar. To the second part: since x = (x/y) · y we have by the first
part of (c), | x | = | x/y | | y |. The claim follows by multiplication with 1/| y |.
(d) By (b) we have ±x ≤ | x | and ±y ≤ | y |. It follows from Proposition 1.8 (b) that
±(x + y) ≤ | x | + | y | .
By the second part of (b) with a = | x | + | y |, we obtain | x + y | ≤ | x | + | y |.
(e) Inserting u := x + y and v := −y into | u + v | ≤ | u | + | v | one obtains
| x | ≤ | x + y | + | −y | = | x + y | + | y | .
Adding − | y | on both sides one obtains | x | − | y | ≤ | x + y |. Changing the role of x and y
in the last inequality yields −(| x | − | y |) ≤ | x + y |. The claim follows again by (b) with
a = | x + y |.
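The inequalities (d) and (e) can be spot-checked numerically over a grid of sample values; this is only an illustration, not a substitute for the proof.

```python
# Check |x + y| <= |x| + |y| and ||x| - |y|| <= |x + y| on sample pairs.
samples = [-3.5, -1.0, 0.0, 0.5, 2.0]
for x in samples:
    for y in samples:
        assert abs(x + y) <= abs(x) + abs(y)        # Lemma 1.12 (d)
        assert abs(abs(x) - abs(y)) <= abs(x + y)   # Lemma 1.12 (e)
print("all sample pairs pass")
```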
1.1.7 Supremum and Infimum revisited
The following equivalent definition for the supremum of sets of real numbers is often used in
the sequel. Note that
x ≤ β for all x ∈ M   =⇒   sup M ≤ β.
Similarly, α ≤ x for all x ∈ M implies α ≤ inf M.
Remarks 1.5 (a) Suppose that E ⊂ R. Then α is the supremum of E if and only if
(1) α is an upper bound for E,
(2) For all ε > 0 there exists x ∈ E with α − ε < x.
Using the Archimedean axiom, (2) can be replaced by
(2’) For all n ∈ N there exists x ∈ E such that α − 1/n < x.
(b) Let M ⊂ R and N ⊂ R be nonempty subsets which are bounded above.
Then M + N := {m + n | m ∈ M, n ∈ N} is bounded above and
sup(M + N) = sup M + sup N.
(c) Let M ⊂ R+ and N ⊂ R+ be nonempty subsets which are bounded above.
Then MN := {mn | m ∈ M, n ∈ N} is bounded above and
sup(MN) = sup M sup N.
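For finite sets the supremum is just the maximum, so (b) and (c) can be illustrated directly (the sets below are our own examples):

```python
# Remarks 1.5 (b), (c) illustrated on finite sets, where sup = max.
M = {0.5, 1.0, 2.0}
N = {0.25, 3.0}
M_plus_N = {m + n for m in M for n in N}
MN = {m * n for m in M for n in N}          # all elements positive here
assert max(M_plus_N) == max(M) + max(N)     # sup(M+N) = sup M + sup N
assert max(MN) == max(M) * max(N)           # sup(MN)  = sup M * sup N
print(max(M_plus_N), max(MN))  # 5.0 6.0
```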
1.1.8 Powers of real numbers
We shall prove the existence of nth roots of positive reals. We already know x^n, n ∈ Z. It is
recursively defined by x^n := x^(n−1) · x, x^1 := x, for n ∈ N, and x^n := 1/x^(−n) for n < 0.
Proposition 1.13 (Bernoulli’s inequality) Let x ≥ −1 and n ∈ N. Then we have
(1 + x)^n ≥ 1 + nx.
Equality holds if and only if x = 0 or n = 1.
Proof. We use induction over n. In the cases n = 1 and x = 0 we have equality. The strict
inequality (with a > sign in place of the ≥ sign) holds for n = 2 and x ≠ 0 since (1 + x)² =
1 + 2x + x² > 1 + 2x. Suppose the strict inequality is true for some fixed n ≥ 2 and x ≠ 0.
Since 1 + x ≥ 0, by Proposition 1.9 (b) multiplication of the induction assumption by this factor
yields
(1 + x)^(n+1) ≥ (1 + nx)(1 + x) = 1 + (n + 1)x + nx² > 1 + (n + 1)x.
This proves the strict assertion for n + 1. We have equality only if n = 1 or x = 0.
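Bernoulli's inequality is easy to spot-check numerically (the sample values are our own; this only illustrates the statement):

```python
# (1 + x)^n >= 1 + n*x for x >= -1, strict for n >= 2 and x != 0.
for x in (-1.0, -0.5, 0.0, 0.3, 2.0):
    for n in range(1, 8):
        assert (1 + x) ** n >= 1 + n * x
        if n >= 2 and x != 0:
            assert (1 + x) ** n > 1 + n * x   # the strict case
print("Bernoulli holds on all samples")
```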
Lemma 1.14 (a) For x, y ∈ R with x, y > 0 and n ∈ N we have
x < y ⇐⇒ x^n < y^n .
(b) For x, y ∈ R+ and n ∈ N we have
n x^(n−1) (y − x) ≤ y^n − x^n ≤ n y^(n−1) (y − x).
We have equality if and only if n = 1 or x = y.
Proof. (a) Observe that
y^n − x^n = (y − x) Σ_{k=1}^{n} y^(n−k) x^(k−1) = c(y − x)     (1.2)
with c := Σ_{k=1}^{n} y^(n−k) x^(k−1) > 0 since x, y > 0. The claim follows.
(b) We have
y^n − x^n − n x^(n−1) (y − x) = (y − x) Σ_{k=1}^{n} ( y^(n−k) x^(k−1) − x^(n−1) )
                              = (y − x) Σ_{k=1}^{n} x^(k−1) ( y^(n−k) − x^(n−k) ) ≥ 0
since by (a) y − x and y^(n−k) − x^(n−k) have the same sign. The proof of the second inequality is
quite analogous.
Proposition 1.15 For every real x > 0 and every positive integer n ∈ N there is one and only
one y > 0 such that y^n = x.
This number y is written ⁿ√x or x^(1/n), and it is called “the nth root of x”.
Proof. The uniqueness is clear since by Lemma 1.14 (a) 0 < y1 < y2 implies 0 < y1^n < y2^n .
Set
E := {t ∈ R+ | t^n < x}.
Observe that E ≠ ∅ since 0 ∈ E. We show that E is bounded above. By Bernoulli’s inequality
and since x ≤ nx we have
t ∈ E =⇒ t^n < x ≤ nx < 1 + nx ≤ (1 + x)^n =⇒ t < 1 + x,
where the last step uses Lemma 1.14 (a).
Hence, 1 + x is an upper bound for E. By the order completeness of R there exists y ∈ R such
that y = sup E. We have to show that y^n = x. For this, we will show that each of the inequalities
y^n > x and y^n < x leads to a contradiction.
Assume y^n < x and consider (y + h)^n with “small” h (0 < h < 1). Lemma 1.14 (b) implies
0 ≤ (y + h)^n − y^n ≤ n (y + h)^(n−1) (y + h − y) < h n(y + 1)^(n−1) .
Choosing h small enough that h n(y + 1)^(n−1) < x − y^n we may continue
(y + h)^n − y^n < x − y^n .
Consequently, (y + h)^n < x and therefore y + h ∈ E. Since y + h > y, this contradicts the fact
that y is an upper bound of E.
Assume y^n > x and consider (y − h)^n with “small” h (0 < h < 1). Again by Lemma 1.14 (b)
we have
0 ≤ y^n − (y − h)^n ≤ n y^(n−1) (y − (y − h)) = h n y^(n−1) .
Choosing h small enough that h n y^(n−1) < y^n − x we may continue
y^n − (y − h)^n < y^n − x.
Consequently, x < (y − h)^n and therefore t^n < x < (y − h)^n for all t in E. Hence y − h is
an upper bound for E smaller than y. This contradicts the fact that y is the least upper bound.
Hence y^n = x, and the proof is complete.
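The proof is non-constructive, but the same supremum y = sup{t > 0 | tⁿ < x} can be approximated by bisection. A minimal Python sketch (function name and iteration count are our own choices):

```python
def nth_root(x, n, iterations=80):
    """Approximate the unique y > 0 with y**n == x (for x > 0, n >= 1)
    by bisecting toward sup{t > 0 | t**n < x}."""
    lo, hi = 0.0, max(1.0, x)   # 1 + x would also serve as an upper bound
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if mid ** n < x:
            lo = mid            # mid lies in E, so the sup is above it
        else:
            hi = mid
    return lo

print(nth_root(2.0, 2))   # ~1.41421356
print(nth_root(27.0, 3))  # ~3.0
```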
Remarks 1.6 (a) If a and b are positive real numbers and n ∈ N then (ab)^(1/n) = a^(1/n) b^(1/n) .
Proof. Put α = a^(1/n) and β = b^(1/n) . Then ab = α^n β^n = (αβ)^n , since multiplication is commutative. The uniqueness assertion of Proposition 1.15 shows therefore that
(ab)^(1/n) = αβ = a^(1/n) b^(1/n) .
(b) Fix b > 0. If m, n, p, q ∈ Z with n > 0, q > 0, and r = m/n = p/q, then we have
(b^m)^(1/n) = (b^p)^(1/q) .     (1.3)
Hence it makes sense to define b^r = (b^m)^(1/n) .
(c) Fix b > 1. If x ∈ R define
b^x = sup{b^p | p ∈ Q, p < x}.     (1.4)
For 0 < b < 1 set
b^x = 1/(1/b)^x .
Without proof we give the familiar laws for powers and exponentials. Later we will redefine the
power b^x with real exponent. Then we will be able to give easier proofs.
(d) If a, b > 0 and x, y ∈ R, then
(i) b^(x+y) = b^x b^y ,  b^(x−y) = b^x / b^y ,  (ii) b^(xy) = (b^x)^y ,  (iii) (ab)^x = a^x b^x .
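The laws in (d) can be spot-checked in floating-point arithmetic (tolerances are needed since float powers are inexact; the sample values are our own):

```python
import math

# Numerical spot-check of the power laws for a few real exponents.
a, b = 2.0, 3.0
for x in (0.5, 1.7, -2.3):
    for y in (0.25, 3.0):
        assert math.isclose(b ** (x + y), b ** x * b ** y)    # (i)
        assert math.isclose(b ** (x * y), (b ** x) ** y)      # (ii)
        assert math.isclose((a * b) ** x, a ** x * b ** x)    # (iii)
print("power laws verified on samples")
```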
1.1.9 Logarithms
Fix b > 1, y > 0. Similarly as in the preceding subsection, one can prove the existence of a
unique real x such that bx = y. This number x is called the logarithm of y to the base b, and we
write x = logb y. Knowing existence and uniqueness of the logarithm, it is not difficult to prove
the following properties.
Lemma 1.16 For any a > 0, a ≠ 1 we have
(a) loga (bc) = loga b + loga c if b, c > 0;
(b) loga (b^c) = c loga b if b > 0;
(c) loga b = logd b / logd a if b, d > 0 and d ≠ 1.
Later we will give an alternative definition of the logarithm function.
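These identities can be checked numerically with math.log, which accepts an optional base argument (the sample numbers are our own):

```python
import math

# Lemma 1.16 spot-checked; math.log(y, base) computes log_base(y).
a, b, c, d = 2.0, 8.0, 5.0, 10.0
assert math.isclose(math.log(b * c, a), math.log(b, a) + math.log(c, a))  # (a)
assert math.isclose(math.log(b ** c, a), c * math.log(b, a))              # (b)
assert math.isclose(math.log(b, a), math.log(b, d) / math.log(a, d))      # (c)
print("logarithm identities verified on samples")
```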
Review of Trigonometric Functions
(a) Degrees and Radians
The following table gives some important angles in degrees and radians. The precise definition
of π is given below. For the moment it is just an abbreviation to measure angles. Transformation
of an angle αdeg measured in degrees into the angle αrad measured in radians goes by αrad = αdeg · 2π/360°.
Degrees | 0° | 30° | 45° | 60° | 90° | 120° | 135° | 150° | 180° | 270° | 360°
Radians | 0  | π/6 | π/4 | π/3 | π/2 | 2π/3 | 3π/4 | 5π/6 | π    | 3π/2 | 2π
(b) Sine and Cosine
The sine, cosine, and tangent functions are defined in terms of ratios of sides of a right triangle
(with hypotenuse, the side adjacent to the angle ϕ, and the side opposite to ϕ):
cos ϕ = (length of the side adjacent to ϕ) / (length of the hypotenuse),
sin ϕ = (length of the side opposite to ϕ) / (length of the hypotenuse),
tan ϕ = (length of the side opposite to ϕ) / (length of the side adjacent to ϕ).
Let ϕ be any angle between 0° and 360°. Further let P be the point on the unit circle (with
center in (0, 0) and radius 1) such that the ray from the origin (0, 0) to P and the positive x-axis
make an angle ϕ. Then cos ϕ and sin ϕ are defined to be the x-coordinate and the y-coordinate
of the point P , respectively.
If the angle ϕ is between 0° and 90° this new definition coincides
with the definition using the right triangle since the hypotenuse,
which is a radius of the unit circle, has now length 1.
If 90° < ϕ < 180° we find
cos ϕ = − cos(180° − ϕ) < 0,   sin ϕ = sin(180° − ϕ) > 0.
If 180° < ϕ < 270° we find
cos ϕ = − cos(ϕ − 180°) < 0,   sin ϕ = − sin(ϕ − 180°) < 0.
If 270° < ϕ < 360° we find
cos ϕ = cos(360° − ϕ) > 0,   sin ϕ = − sin(360° − ϕ) < 0.
For angles greater than 360° or less than 0° define
cos ϕ = cos(ϕ + k·360°),   sin ϕ = sin(ϕ + k·360°),
where k ∈ Z is chosen such that 0° ≤ ϕ + k·360° < 360°. Thinking of ϕ as being given in radians,
cosine and sine are functions defined for all real ϕ taking values in the closed interval [−1, 1].
If ϕ ≠ π/2 + kπ, k ∈ Z, then cos ϕ ≠ 0 and we define
tan ϕ := sin ϕ / cos ϕ.
If ϕ ≠ kπ, k ∈ Z, then sin ϕ ≠ 0 and we define
cot ϕ := cos ϕ / sin ϕ.
In this way we have defined cosine, sine, tangent, and cotangent for arbitrary angles.
(c) Special Values

x in degrees | 0° | 30°  | 45°  | 60°  | 90° | 120°  | 135°  | 150°  | 180° | 270° | 360°
x in radians | 0  | π/6  | π/4  | π/3  | π/2 | 2π/3  | 3π/4  | 5π/6  | π    | 3π/2 | 2π
sin x        | 0  | 1/2  | √2/2 | √3/2 | 1   | √3/2  | √2/2  | 1/2   | 0    | −1   | 0
cos x        | 1  | √3/2 | √2/2 | 1/2  | 0   | −1/2  | −√2/2 | −√3/2 | −1   | 0    | 1
tan x        | 0  | √3/3 | 1    | √3   | —   | −√3   | −1    | −√3/3 | 0    | —    | 0
Recall the addition formulas for cosine and sine and the trigonometric Pythagorean identity:
cos(x + y) = cos x cos y − sin x sin y,   sin(x + y) = sin x cos y + cos x sin y,     (1.5)
sin²x + cos²x = 1.     (1.6)
1.2 Complex numbers
Some algebraic equations do not have solutions in the real number system. For instance the
quadratic equation x² − 4x + 8 = 0 gives ‘formally’
x1 = 2 + √−4   and   x2 = 2 − √−4.
We will see that one can work with this notation.
Definition 1.7 A complex number is an ordered pair (a, b) of real numbers. “Ordered” means
that (a, b) ≠ (b, a) if a ≠ b. Two complex numbers x = (a, b) and y = (c, d) are said to be
equal if and only if a = c and b = d. We define
x + y := (a + c, b + d),
xy := (ac − bd, ad + bc).
Theorem 1.17 These definitions turn the set of all complex numbers into a field, with (0, 0) and
(1, 0) in the role of 0 and 1.
Proof. We simply verify the field axioms as listed in Definition 1.3. Of course, we use the field
structure of R.
Let x = (a, b), y = (c, d), and z = (e, f ). (A1) is clear.
(A2) x + y = (a + c, b + d) = (c + a, d + b) = y + x.
(A3) (x+y)+z = (a+c, b+d)+(e, f ) = (a+c+e, b+d+f ) = (a, b)+(c+e, d+f ) = x+(y+z).
(A4) x + 0 = (a, b) + (0, 0) = (a, b) = x.
(A5) Put −x := (−a, −b). Then x + (−x) = (a, b) + (−a, −b) = (0, 0) = 0.
(M1) is clear.
(M2) xy = (ac − bd, ad + bc) = (ca − db, da + cb) = yx.
(M3) (xy)z = (ac − bd, ad + bc)(e, f ) = (ace − bde − adf − bcf, acf − bdf + ade + bce) =
(a, b)(ce − df, cf + de) = x(yz).
(M4) x · 1 = (a, b)(1, 0) = (a, b) = x.
(M5) If x ≠ 0 then (a, b) ≠ (0, 0), which means that at least one of the real numbers a, b is
different from 0. Hence a² + b² > 0 and we can define
1/x := ( a/(a² + b²), −b/(a² + b²) ).
Then
x · (1/x) = (a, b) ( a/(a² + b²), −b/(a² + b²) ) = (1, 0) = 1.
(D)
x(y + z) = (a, b)(c + e, d + f ) = (ac + ae − bd − bf, ad + af + bc + be)
         = (ac − bd, ad + bc) + (ae − bf, af + be)
         = xy + xz.
Remark 1.7 For any two real numbers a and b we have (a, 0) + (b, 0) = (a + b, 0) and
(a, 0)(b, 0) = (ab, 0). This shows that the complex numbers (a, 0) have the same arithmetic
properties as the corresponding real numbers a. We can therefore identify (a, 0) with a. This
gives us the real field as a subfield of the complex field.
Note that we have defined the complex numbers without any reference to the mysterious square
root of −1. We now show that the notation (a, b) is equivalent to the more customary a + bi.
Definition 1.8 i := (0, 1).
Lemma 1.18 (a) i² = −1. (b) If a, b ∈ R then (a, b) = a + bi.
Proof. (a) i² = (0, 1)(0, 1) = (−1, 0) = −1.
(b) a + bi = (a, 0) + (b, 0)(0, 1) = (a, 0) + (0, b) = (a, b).
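Definition 1.7 can be carried out literally in a few lines of code: represent complex numbers as pairs and define the two operations as above. This sketch (the helper names cadd and cmul are our own) reproduces Lemma 1.18:

```python
# Complex numbers as ordered pairs (a, b), per Definition 1.7.
def cadd(x, y):
    (a, b), (c, d) = x, y
    return (a + c, b + d)

def cmul(x, y):
    (a, b), (c, d) = x, y
    return (a * c - b * d, a * d + b * c)

i = (0, 1)
print(cmul(i, i))  # (-1, 0), i.e. i^2 = -1  (Lemma 1.18 (a))
a, b = 3, 4
assert cadd((a, 0), cmul((b, 0), i)) == (a, b)  # a + b*i = (a, b)
```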
Definition 1.9 If a, b are real and z = a + bi, then the complex number z̄ := a − bi is called the
conjugate of z. The numbers a and b are the real part and the imaginary part of z, respectively.
We shall write a = Re z and b = Im z.
Proposition 1.19 If z and w are complex, then
(a) the conjugate of z + w is z̄ + w̄,
(b) the conjugate of zw is z̄ · w̄,
(c) z + z̄ = 2 Re z,   z − z̄ = 2i Im z,
(d) z z̄ is positive real except when z = 0.
Proof. (a), (b), and (c) are quite trivial. To prove (d) write z = a + bi and note that z z̄ = a² + b² .
Definition 1.10 If z is a complex number, its absolute value | z | is the (nonnegative) square root of z z̄;
that is, | z | := √(z z̄).
The existence (and uniqueness) of | z | follows from Proposition 1.19 (d). Note that when x is
real, then x̄ = x, hence | x | = √(x²). Thus | x | = x if x > 0 and | x | = −x if x < 0. We have
recovered the definition of the absolute value for real numbers, see Subsection 1.1.6.
Proposition 1.20 Let z and w be complex numbers. Then
(a) | z | > 0 unless z = 0,
(b) | z̄ | = | z |,
(c) | zw | = | z | | w |,
(d) | Re z | ≤ | z |,
(e) | z + w | ≤ | z | + | w |.
Proof. (a) and (b) are trivial. Put z = a + bi and w = c + di, with a, b, c, d real. Then
| zw |² = (ac − bd)² + (ad + bc)² = (a² + b²)(c² + d²) = | z |² | w |²
or | zw |² = (| z | | w |)². Now (c) follows from the uniqueness assertion for roots.
To prove (d), note that a² ≤ a² + b², hence
| a | = √(a²) ≤ √(a² + b²) = | z | .
To prove (e), note that z w̄ is the conjugate of z̄ w, so that z w̄ + z̄ w = 2 Re (z w̄). Hence
| z + w |² = (z + w)(z̄ + w̄) = z z̄ + z w̄ + z̄ w + w w̄
           = | z |² + 2 Re (z w̄) + | w |²
           ≤ | z |² + 2 | z | | w | + | w |² = (| z | + | w |)².
Now (e) follows by taking square roots.
1.2.1 The Complex Plane and the Polar form
There is a bijective correspondence between complex numbers and the points of a plane.
[Figure: the point z = a + bi in the complex plane, with r = | z | its distance from 0 and ϕ the angle against the positive real axis.]
By the Pythagorean theorem it is clear that | z | = √(a² + b²) is exactly the distance of
z from the origin 0. The angle ϕ between the positive real axis and the half-line 0z is
called the argument of z and is denoted by ϕ = arg z. If z ≠ 0, the argument ϕ is
uniquely determined up to integer multiples of 2π.
Elementary trigonometry gives
sin ϕ = b/| z | ,   cos ϕ = a/| z | .
This gives, with r = | z |, a = r cos ϕ and b = r sin ϕ. Inserting these into the rectangular form
of z yields
z = r(cos ϕ + i sin ϕ),     (1.7)
which is called the polar form of the complex number z.
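Python's cmath module computes exactly this decomposition: cmath.polar returns the pair (r, ϕ) = (|z|, arg z), and cmath.rect inverts it.

```python
import cmath, math

# Polar form of z = 1 + i, matching z = r (cos phi + i sin phi).
z = 1 + 1j
r, phi = cmath.polar(z)
assert math.isclose(r, math.sqrt(2)) and math.isclose(phi, math.pi / 4)
assert cmath.isclose(cmath.rect(r, phi), z)  # back to rectangular form
print(r, phi)  # r = sqrt(2), phi = pi/4
```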
Example 1.5 a) z = 1 + i. Then | z | = √2 and sin ϕ = 1/√2 = cos ϕ. This implies ϕ = π/4.
Hence, the polar form of z is 1 + i = √2 (cos π/4 + i sin π/4).
b) z = −i. We have | −i | = 1 and sin ϕ = −1, cos ϕ = 0. Hence ϕ = 3π/2 and −i =
1(cos 3π/2 + i sin 3π/2).
c) Computing the rectangular form of z from the polar form is easier:
z = 32(cos 7π/6 + i sin 7π/6) = 32(−√3/2 − i/2) = −16√3 − 16i.
The addition of complex numbers corresponds to the addition of vectors in the plane. The
geometric meaning of the inequality | z + w | ≤ | z | + | w | is: the sum of two edges of a triangle
is bigger than the third edge.
Multiplying complex numbers z = r(cos ϕ + i sin ϕ) and w = s(cos ψ + i sin ψ) in the polar
form gives
zw = rs(cos ϕ + i sin ϕ)(cos ψ + i sin ψ)
   = rs( cos ϕ cos ψ − sin ϕ sin ψ + i(cos ϕ sin ψ + sin ϕ cos ψ) )
   = rs( cos(ϕ + ψ) + i sin(ϕ + ψ) ),     (1.8)
where we made use of the addition laws for sin and cos in the last equation.
Hence, the product of complex numbers is formed by taking the product of their absolute values
and the sum of their arguments.
The geometric meaning of multiplication by w is a similarity transformation of C. More precisely, we have a rotation around 0 by the angle ψ and then a dilatation with factor s and center
0.
Similarly, if w ≠ 0 we have
z/w = (r/s)( cos(ϕ − ψ) + i sin(ϕ − ψ) ).     (1.9)
Proposition 1.21 (De Moivre’s formula) Let z = r(cos ϕ + i sin ϕ) be a complex number with
absolute value r and argument ϕ. Then for all n ∈ Z one has
z^n = r^n (cos(nϕ) + i sin(nϕ)).     (1.10)
Proof. (a) First let n > 0. We use induction over n to prove De Moivre’s formula. For n = 1
there is nothing to prove. Suppose (1.10) is true for some fixed n. We will show that the
assertion is true for n + 1. Using the induction hypothesis and (1.8) we find
z^(n+1) = z^n · z = r^n (cos(nϕ) + i sin(nϕ)) · r(cos ϕ + i sin ϕ) = r^(n+1) (cos(nϕ + ϕ) + i sin(nϕ + ϕ)).
This proves the induction assertion.
(b) If n < 0, then z^n = 1/z^(−n) . Since 1 = 1(cos 0 + i sin 0), (1.9) and the result of (a) give
z^n = 1/z^(−n) = (1/r^(−n)) ( cos(0 − (−n)ϕ) + i sin(0 − (−n)ϕ) ) = r^n (cos(nϕ) + i sin(nϕ)).
This completes the proof.
Example 1.6 Compute the polar form of z = √3 − 3i and compute z^15 .
We have | z | = √(3 + 9) = 2√3, cos ϕ = 1/2, and sin ϕ = −√3/2. Therefore, ϕ = −π/3 and
z = 2√3 ( cos(−π/3) + i sin(−π/3) ). By De Moivre’s formula we have
z^15 = (2√3)^15 ( cos(−15·π/3) + i sin(−15·π/3) ) = 2^15 · 3^7 · √3 ( cos(−5π) + i sin(−5π) ),
z^15 = −2^15 · 3^7 · √3.
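The example above can be double-checked in floating-point arithmetic; the tiny imaginary part left over in z^15 is rounding error.

```python
import cmath, math

# z = sqrt(3) - 3i; De Moivre gives z^15 = -2^15 * 3^7 * sqrt(3).
z = math.sqrt(3) - 3j
expected = -(2 ** 15) * (3 ** 7) * math.sqrt(3)
assert cmath.isclose(z ** 15, expected, rel_tol=1e-9)
print(z ** 15)  # numerically a negative real (tiny imaginary rounding error)
```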
1.2.2 Roots of Complex Numbers
Let z ∈ C and n ∈ N. A complex number w is called an nth root of z if w^n = z. In contrast
to the real case, roots of complex numbers are not unique. We will see that there are exactly n
different nth roots of z for every z ≠ 0.
Let z = r(cos ϕ + i sin ϕ) and let w = s(cos ψ + i sin ψ) be an nth root of z. De Moivre’s formula
gives w^n = s^n (cos nψ + i sin nψ). If we compare w^n and z we get s^n = r, i.e. s = ⁿ√r ≥ 0.
Moreover nψ = ϕ + 2kπ, k ∈ Z, i.e. ψ = ϕ/n + 2kπ/n, k ∈ Z. For k = 0, 1, . . . , n − 1 we obtain
different values ψ0 , ψ1 , . . . , ψn−1 modulo 2π. We summarize.
Lemma 1.22 Let n ∈ N and let z = r(cos ϕ + i sin ϕ) ≠ 0 be a complex number. Then the complex
numbers
w_k = ⁿ√r ( cos((ϕ + 2kπ)/n) + i sin((ϕ + 2kπ)/n) ),   k = 0, 1, . . . , n − 1,
are the n different nth roots of z.
Example 1.7 Compute the 4th roots of z = −1.
We have | z | = 1, hence | w | = ⁴√1 = 1, and arg z = ϕ = 180°. Hence
ψ0 = ϕ/4 = 45°,
ψ1 = ϕ/4 + 1·360°/4 = 135°,
ψ2 = ϕ/4 + 2·360°/4 = 225°,
ψ3 = ϕ/4 + 3·360°/4 = 315°.
We obtain
w0 = cos 45° + i sin 45° = √2/2 + i·√2/2,
w1 = cos 135° + i sin 135° = −√2/2 + i·√2/2,
w2 = cos 225° + i sin 225° = −√2/2 − i·√2/2,
w3 = cos 315° + i sin 315° = √2/2 − i·√2/2.
Geometric interpretation of the nth roots: The nth roots of z ≠ 0 form a regular n-gon in the
complex plane with center 0. The vertices lie on a circle with center 0 and radius ⁿ√| z |.
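Lemma 1.22 translates directly into code; applied to z = −1, n = 4 it reproduces the four roots of Example 1.7 (the function name nth_roots is our own):

```python
import cmath, math

# w_k = |z|^(1/n) * exp(i (phi + 2 k pi) / n), k = 0, ..., n-1.
def nth_roots(z, n):
    r, phi = cmath.polar(z)
    return [r ** (1 / n) * cmath.exp(1j * (phi + 2 * k * math.pi) / n)
            for k in range(n)]

roots = nth_roots(-1, 4)
for w in roots:
    assert cmath.isclose(w ** 4, -1)
# the roots form a regular 4-gon on the unit circle
assert all(math.isclose(abs(w), 1) for w in roots)
print(roots)
```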
1.3 Inequalities
1.3.1 Monotony of the Power and Exponential Functions
Lemma 1.23 (a) For a, b > 0 and r ∈ Q we have
a < b ⇐⇒ a^r < b^r   if r > 0,
a < b ⇐⇒ a^r > b^r   if r < 0.
(b) For a > 0 and r, s ∈ Q we have
r < s ⇐⇒ a^r < a^s   if a > 1,
r < s ⇐⇒ a^r > a^s   if a < 1.
Proof. (a) Suppose that r > 0, r = m/n with integers m, n ∈ Z, n > 0. Using Lemma 1.14 (a)
twice we get
a < b ⇐⇒ a^m < b^m ⇐⇒ (a^m)^(1/n) < (b^m)^(1/n) ,
which proves the first claim. The second part, r < 0, can be obtained by setting −r in place of r
in the first part and using Proposition 1.9 (e).
(b) Suppose that s > r. Put x = s − r; then x ∈ Q and x > 0. By (a), 1 < a implies
1 = 1^x < a^x . Hence 1 < a^(s−r) = a^s /a^r (here we used Remark 1.6 (d)), and therefore a^r < a^s .
Changing the roles of r and s shows that s < r implies a^s < a^r , such that the converse direction
is also true.
The proof for a < 1 is similar.
1.3.2 The Arithmetic-Geometric mean inequality
Proposition 1.24 Let n ∈ N and x₁, …, x_n be in R₊. Then

    (x₁ + ⋯ + x_n)/n ≥ (x₁ ⋯ x_n)^{1/n}.   (1.11)

We have equality if and only if x₁ = x₂ = ⋯ = x_n.
Proof. We use forward-backward induction over n. First we show (1.11) is true for all n which
are powers of 2. Then we prove that if (1.11) is true for some n + 1, then it is true for n. Hence,
it is true for all positive integers.
The inequality is true for n = 1. Let a, b ≥ 0; then (√a − √b)² ≥ 0 implies a + b ≥ 2√(ab), so
the inequality is true in case n = 2. Equality holds if and only if a = b. Suppose it is true for
some fixed k ∈ N; we will show that it is true for 2k. Let x₁, …, x_k, y₁, …, y_k ∈ R₊. Using
the induction assumption and the inequality in case n = 2, we have

    (1/(2k)) ( Σ_{i=1}^k x_i + Σ_{i=1}^k y_i ) = (1/2) ( (1/k) Σ_{i=1}^k x_i + (1/k) Σ_{i=1}^k y_i )
        ≥ (1/2) ( (Π_{i=1}^k x_i)^{1/k} + (Π_{i=1}^k y_i)^{1/k} )
        ≥ ( Π_{i=1}^k x_i · Π_{i=1}^k y_i )^{1/(2k)}.
This completes the ‘forward’ part. Assume now (1.11) is true for n + 1. We will show it for n.
Let x₁, …, x_n ∈ R₊ and set A := (Σ_{i=1}^n x_i)/n. By the induction assumption we have

    (1/(n+1)) (x₁ + ⋯ + x_n + A) ≥ ( Π_{i=1}^n x_i · A )^{1/(n+1)}
    ⟺ (1/(n+1)) (nA + A) ≥ ( Π_{i=1}^n x_i )^{1/(n+1)} A^{1/(n+1)}
    ⟺ A^{n/(n+1)} ≥ ( Π_{i=1}^n x_i )^{1/(n+1)}
    ⟺ A ≥ ( Π_{i=1}^n x_i )^{1/n}.
It is trivial that in case x1 = x2 = · · · = xn we have equality. Suppose that equality holds in
a case where at least two of the xi are different, say x1 < x2 . Consider the inequality with the
new values x′₁ := x′₂ := (x₁ + x₂)/2 and x′_i := x_i for i ≥ 3. Then

    ( Π_{k=1}^n x_k )^{1/n} = (1/n) Σ_{k=1}^n x_k = (1/n) Σ_{k=1}^n x′_k ≥ ( Π_{k=1}^n x′_k )^{1/n}.

Hence Π x_k ≥ Π x′_k, and since x′_k = x_k for k ≥ 3,

    x₁x₂ ≥ x′₁x′₂ = ((x₁ + x₂)/2)² ⟺ 4x₁x₂ ≥ x₁² + 2x₁x₂ + x₂² ⟺ 0 ≥ (x₁ − x₂)².

This contradicts the choice x₁ < x₂. Hence x₁ = x₂ = ⋯ = x_n is the only case where
equality holds. This completes the proof.
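For a quick numerical sanity check of (1.11), one can compare the two means on random positive inputs (a Python sketch; the helper names `am` and `gm` are ours):

```python
import random

def am(xs):
    """Arithmetic mean."""
    return sum(xs) / len(xs)

def gm(xs):
    """Geometric mean."""
    prod = 1.0
    for x in xs:
        prod *= x
    return prod ** (1.0 / len(xs))

random.seed(0)
for _ in range(1000):
    xs = [random.uniform(0.1, 10.0) for _ in range(random.randint(1, 8))]
    assert am(xs) >= gm(xs) - 1e-12        # AM >= GM

# Equality exactly when all x_i coincide:
assert abs(am([3.0] * 5) - gm([3.0] * 5)) < 1e-12
```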
1.3.3 The Cauchy–Schwarz Inequality
Proposition 1.25 (Cauchy–Schwarz inequality) Suppose that x1 , . . . , xn , y1 , . . . , yn are real
numbers. Then we have
    ( Σ_{k=1}^n x_k y_k )² ≤ Σ_{k=1}^n x_k² · Σ_{k=1}^n y_k².   (1.12)

Equality holds if and only if there exists t ∈ R such that y_k = t x_k for k = 1, …, n; that is, the
vector y = (y₁, …, y_n) is a scalar multiple of the vector x = (x₁, …, x_n).
Proof. Consider the quadratic function f(t) = at² − 2bt + c, where

    a = Σ_{k=1}^n x_k²,  b = Σ_{k=1}^n x_k y_k,  c = Σ_{k=1}^n y_k².

Then

    f(t) = Σ_{k=1}^n x_k² t² − Σ_{k=1}^n 2x_k y_k t + Σ_{k=1}^n y_k²
         = Σ_{k=1}^n (x_k² t² − 2x_k y_k t + y_k²) = Σ_{k=1}^n (x_k t − y_k)² ≥ 0.
Equality holds if and only if there is a t ∈ R with y_k = t x_k for all k. Suppose now there is no
such t ∈ R; that is,

    f(t) > 0 for all t ∈ R.

In other words, the polynomial f(t) = at² − 2bt + c has no real zeros t_{1,2} = (1/a)(b ± √(b² − ac)).
That is, the discriminant D = b² − ac is negative (only complex roots); hence b² < ac:

    ( Σ_{k=1}^n x_k y_k )² < Σ_{k=1}^n x_k² · Σ_{k=1}^n y_k².

This proves the claim.
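The inequality (1.12) and its equality case are easy to test numerically (a sketch; `dot` is our helper name):

```python
import random

def dot(xs, ys):
    return sum(x * y for x, y in zip(xs, ys))

random.seed(1)
for _ in range(1000):
    n = random.randint(1, 6)
    xs = [random.uniform(-5, 5) for _ in range(n)]
    ys = [random.uniform(-5, 5) for _ in range(n)]
    # (sum x_k y_k)^2 <= (sum x_k^2)(sum y_k^2)
    assert dot(xs, ys) ** 2 <= dot(xs, xs) * dot(ys, ys) + 1e-9

# Equality when y = t x:
xs = [1.0, 2.0, 3.0]
ys = [2.0 * x for x in xs]
assert abs(dot(xs, ys) ** 2 - dot(xs, xs) * dot(ys, ys)) < 1e-9
```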
Corollary 1.26 (The Complex Cauchy–Schwarz inequality) If x1 , . . . , xn and y1 , . . . , yn are
complex numbers, then

    | Σ_{k=1}^n x_k ȳ_k |² ≤ Σ_{k=1}^n |x_k|² · Σ_{k=1}^n |y_k|².   (1.13)

Equality holds if and only if there exists a λ ∈ C such that y = λx, where y = (y₁, …, y_n) ∈ Cⁿ
and x = (x₁, …, x_n) ∈ Cⁿ.
Proof. Using the generalized triangle inequality |z₁ + ⋯ + z_n| ≤ |z₁| + ⋯ + |z_n| and the
real Cauchy–Schwarz inequality we obtain

    | Σ_{k=1}^n x_k ȳ_k |² ≤ ( Σ_{k=1}^n |x_k ȳ_k| )² = ( Σ_{k=1}^n |x_k| |y_k| )² ≤ Σ_{k=1}^n |x_k|² · Σ_{k=1}^n |y_k|².

This proves the inequality.
The right ‘≤’ is an equality if and only if there is a t ∈ R such that |y_k| = t |x_k| for all k. In
the first ‘≤’ sign we have equality if and only if all z_k = x_k ȳ_k have the same argument, that is,
arg y_k = arg x_k + ϕ for some fixed ϕ. Putting both together yields y = λx with λ = t(cos ϕ + i sin ϕ).
1.4 Appendix A
In this appendix we collect some additional facts which were not covered in the lecture.
We now show that the equation

    x² = 2   (1.14)

is not satisfied by any rational number x.
Suppose to the contrary that there were such an x. We could write x = m/n with integers m
and n, n ≠ 0, that are not both even. Then (1.14) implies

    m² = 2n².   (1.15)
This shows that m2 is even and hence m is even. Therefore m2 is divisible by 4. It follows that
the right hand side of (1.15) is divisible by 4, so that n2 is even, which implies that n is even.
But this contradicts our choice of m and n. Hence (1.14) is impossible for rational x.
We shall show that A contains no largest element and B contains no smallest. That is, for every
p ∈ A we can find a rational q ∈ A with p < q, and for every p ∈ B we can find a rational q ∈ B
such that q < p.
Suppose that p is in A. We associate with p > 0 the rational number

    q = p + (2 − p²)/(p + 2) = (2p + 2)/(p + 2).   (1.16)

Then

    q² − 2 = (4p² + 8p + 4 − 2p² − 8p − 8)/(p + 2)² = 2(p² − 2)/(p + 2)².   (1.17)
If p is in A then 2 − p2 > 0, (1.16) shows that q > p, and (1.17) shows that q 2 < 2. If p is in B
then 2 < p2 , (1.16) shows that q < p, and (1.17) shows that q 2 > 2.
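Incidentally, iterating the map (1.16) gives a concrete way to approximate √2 from below by rationals: each step stays in A and strictly increases. A sketch (Python's `Fraction` keeps the iteration inside Q; the starting point p = 1 is our choice):

```python
from fractions import Fraction

p = Fraction(1)                        # 1 is in A since 1^2 < 2
for _ in range(10):
    q = p + (2 - p * p) / (p + 2)      # the map (1.16)
    assert q > p and q * q < 2         # q stays in A and increases
    p = q

# After ten steps p^2 is already very close to 2.
assert abs(float(p * p) - 2.0) < 1e-6
```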
A Non-Archimedean Ordered Field
The fields Q and R are Archimedean, see below. But there exist ordered fields without this
property. Let F := R(t) be the field of rational functions f(t) = p(t)/q(t), where p and q are
polynomials with real coefficients. Since p and q have only finitely many zeros, for large t,
f (t) is either positive or negative. In the first case we set f > 0. In this way R(t) becomes an
ordered field. But t > n for all n ∈ N since the polynomial f (t) = t − n becomes positive for
large t (and fixed n).
Our aim is to define bx for arbitrary real x.
Lemma 1.27 Let b, p be real numbers with b > 1 and p > 0. Set
M = {br | r ∈ Q, r < p},
M ′ = {bs | s ∈ Q, p < s}.
Then
sup M = inf M ′ .
Proof. (a) M is bounded above by any b^s, s ∈ Q, with s > p, and M′ is bounded below by
any b^r, r ∈ Q, with r < p. Hence sup M and inf M′ both exist.
(b) Since r < p < s implies b^r < b^s by Lemma 1.23, sup M ≤ b^s for all b^s ∈ M′. Taking the
infimum over all such b^s, sup M ≤ inf M′.
(c) Let s₀ := sup M and let ε > 0 be given. We want to show that inf M′ < s₀ + ε. Choose n ∈ N
such that

    1/n < ε/(s₀(b − 1)).   (1.18)

By Proposition 1.11 there exist r, s ∈ Q with

    r < p < s and s − r < 1/n.   (1.19)

Using s − r < 1/n, Bernoulli's inequality (part 2), and (1.18), we compute

    b^s − b^r = b^r (b^{s−r} − 1) ≤ s₀ (b^{1/n} − 1) ≤ s₀ · (1/n)(b − 1) < ε.

Hence

    inf M′ ≤ b^s < b^r + ε ≤ sup M + ε.

Since ε was arbitrary, inf M′ ≤ sup M, and finally, with the result of (b), inf M′ = sup M.
Corollary 1.28 Suppose p ∈ Q and b > 1 is real. Then

    b^p = sup{b^r | r ∈ Q, r < p}.

Proof. For rational numbers r < p < s, Lemma 1.23 implies b^r < b^p < b^s. Hence
sup M ≤ b^p ≤ inf M′. By the lemma, these three numbers coincide.
Inequalities
Now we extend Bernoulli’s inequality to rational exponents.
Proposition 1.29 (Bernoulli's inequality) Let a ≥ −1 be real and r ∈ Q. Then
(a) (1 + a)^r ≥ 1 + ra if r ≥ 1,
(b) (1 + a)^r ≤ 1 + ra if 0 ≤ r ≤ 1.
Equality holds if and only if a = 0 or r = 1.
Proof. (b) Let r = m/n with m ≤ n, m, n ∈ N. Apply (1.11) to x_i := 1 + a for i = 1, …, m and
x_i := 1 for i = m + 1, …, n. We obtain

    (1/n) (m(1 + a) + (n − m)·1) ≥ ((1 + a)^m · 1^{n−m})^{1/n}
    (m/n) a + 1 ≥ (1 + a)^{m/n},

which proves (b). Equality holds if n = 1 or if x₁ = ⋯ = x_n, i.e. a = 0.
(a) Now let s ≥ 1 and z ≥ −1. Setting r := 1/s and a := (1 + z)^{1/r} − 1 we obtain r ≤ 1 and
a ≥ −1. Inserting this into (b) yields

    1 + z = (1 + a)^r ≤ 1 + ra = 1 + r((1 + z)^s − 1)
    z ≤ r((1 + z)^s − 1)
    1 + sz ≤ (1 + z)^s.

This completes the proof of (a).
Corollary 1.30 (Bernoulli's inequality) Let a ≥ −1 be real and x ∈ R. Then
(a) (1 + a)^x ≥ 1 + xa if x ≥ 1,
(b) (1 + a)^x ≤ 1 + xa if 0 ≤ x ≤ 1.
Equality holds if and only if a = 0 or x = 1.
Proof. (a) First let a > 0. By Proposition 1.29 (a), (1 + a)^r ≥ 1 + ra for rational r ≥ 1. Hence,

    (1 + a)^x = sup{(1 + a)^r | r ∈ Q, r < x} ≥ sup{1 + ra | r ∈ Q, r < x} = 1 + xa.

Now let −1 ≤ a < 0. Then r < x implies ra > xa, and Proposition 1.29 (a) implies

    (1 + a)^r ≥ 1 + ra > 1 + xa.   (1.20)

By the definition of the power with a real exponent, see (1.4) and homework 2.1,

    (1 + a)^x = 1 / sup{(1/(1 + a))^r | r ∈ Q, r < x} = inf{(1 + a)^r | r ∈ Q, r < x}.

Taking in (1.20) the infimum over all r ∈ Q with r < x we obtain

    (1 + a)^x = inf{(1 + a)^r | r ∈ Q, r < x} ≥ 1 + xa.

(b) The proof is analogous, so we omit it.
Proposition 1.31 (Young’s inequality) If a, b ∈ R+ and p > 1, then
    ab ≤ (1/p) a^p + (1/q) b^q,   (1.21)

where 1/p + 1/q = 1. Equality holds if and only if a^p = b^q.
Proof. First note that 1/q = 1 − 1/p. We reformulate Bernoulli's inequality for y ∈ R₊ and
p > 1:

    y^p − 1 ≥ p(y − 1) ⟺ (1/p)(y^p − 1) + 1 ≥ y ⟺ (1/p) y^p + 1/q ≥ y.

If b = 0 the statement is always true. If b ≠ 0, insert y := ab/b^q into the above inequality:

    (1/p) (ab)^p/b^{pq} + 1/q ≥ ab/b^q   | · b^q
    (1/p) a^p + (1/q) b^q ≥ ab,

since b^{pq} = b^{p+q}. We have equality if and only if y = 1 or p = 1. The latter is impossible
by assumption. For b ≠ 0, y = 1 is equivalent to b^q = ab, i.e. b^{q−1} = a, i.e. b^{(q−1)p} = a^p;
since (q − 1)p = q, this means b^q = a^p. If b = 0, equality holds if and only if a = 0.
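A numerical sanity check of (1.21) (a sketch; the random test ranges are arbitrary choices of ours):

```python
import random

random.seed(2)
for _ in range(1000):
    a = random.uniform(0.0, 5.0)
    b = random.uniform(0.0, 5.0)
    p = random.uniform(1.01, 10.0)
    q = p / (p - 1)                     # conjugate exponent: 1/p + 1/q = 1
    assert a * b <= a**p / p + b**q / q + 1e-9

# Equality iff a^p = b^q, e.g. a = 2^(1/p), b = 2^(1/q):
p, q = 3.0, 1.5
a, b = 2 ** (1 / p), 2 ** (1 / q)
assert abs(a * b - (a**p / p + b**q / q)) < 1e-12
```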
Proposition 1.32 (Hölder’s inequality) Let p > 1, 1/p + 1/q = 1, and x1 , . . . , xn ∈ R+ and
y₁, …, y_n ∈ R₊ be non-negative real numbers. Then

    Σ_{k=1}^n x_k y_k ≤ ( Σ_{k=1}^n x_k^p )^{1/p} ( Σ_{k=1}^n y_k^q )^{1/q}.   (1.22)

We have equality if and only if there exists c ∈ R such that x_k^p/y_k^q = c for all k = 1, …, n
(the x_k^p and y_k^q are proportional).
Proof. Set A := ( Σ_{k=1}^n x_k^p )^{1/p} and B := ( Σ_{k=1}^n y_k^q )^{1/q}. The cases A = 0 and B = 0 are trivial, so
we assume A, B > 0. By Young's inequality we have

    (x_k/A)(y_k/B) ≤ (1/p) x_k^p/A^p + (1/q) y_k^q/B^q
    ⟹ (1/(AB)) Σ_{k=1}^n x_k y_k ≤ (1/(pA^p)) Σ_{k=1}^n x_k^p + (1/(qB^q)) Σ_{k=1}^n y_k^q = 1/p + 1/q = 1
    ⟹ Σ_{k=1}^n x_k y_k ≤ AB = ( Σ_{k=1}^n x_k^p )^{1/p} ( Σ_{k=1}^n y_k^q )^{1/q}.

Equality holds if and only if x_k^p/A^p = y_k^q/B^q for all k = 1, …, n; therefore x_k^p/y_k^q = A^p/B^q
is constant.
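Numerically, (1.22) can be spot-checked for random non-negative data and random p (a sketch; `holder_sides` is our helper name). For p = q = 2 it reduces to the Cauchy–Schwarz inequality.

```python
import random

def holder_sides(xs, ys, p):
    """Return (LHS, RHS) of Hölder's inequality (1.22)."""
    q = p / (p - 1)
    lhs = sum(x * y for x, y in zip(xs, ys))
    rhs = sum(x**p for x in xs) ** (1/p) * sum(y**q for y in ys) ** (1/q)
    return lhs, rhs

random.seed(3)
for _ in range(500):
    n = random.randint(1, 6)
    xs = [random.uniform(0.0, 4.0) for _ in range(n)]
    ys = [random.uniform(0.0, 4.0) for _ in range(n)]
    lhs, rhs = holder_sides(xs, ys, random.uniform(1.1, 5.0))
    assert lhs <= rhs + 1e-9
```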
Corollary 1.33 (Complex Hölder's inequality) Let p > 1, 1/p + 1/q = 1, and x_k, y_k ∈ C,
k = 1, …, n. Then

    Σ_{k=1}^n |x_k y_k| ≤ ( Σ_{k=1}^n |x_k|^p )^{1/p} ( Σ_{k=1}^n |y_k|^q )^{1/q}.

Equality holds if and only if |x_k|^p / |y_k|^q = const. for k = 1, …, n.
Proof. Set x_k := |x_k| and y_k := |y_k| in (1.22). This proves the statement.
Proposition 1.34 (Minkowski's inequality) If x₁, …, x_n ∈ R₊ and y₁, …, y_n ∈ R₊ and
p ≥ 1, then

    ( Σ_{k=1}^n (x_k + y_k)^p )^{1/p} ≤ ( Σ_{k=1}^n x_k^p )^{1/p} + ( Σ_{k=1}^n y_k^p )^{1/p}.   (1.23)

Equality holds if p = 1, or if p > 1 and x_k/y_k = const.
Proof. The case p = 1 is obvious. Let p > 1. As before, let q > 0 be the unique positive number
with 1/p + 1/q = 1. We compute

    Σ_{k=1}^n (x_k + y_k)^p = Σ_{k=1}^n (x_k + y_k)(x_k + y_k)^{p−1}
        = Σ_{k=1}^n x_k (x_k + y_k)^{p−1} + Σ_{k=1}^n y_k (x_k + y_k)^{p−1}
        ≤ ( Σ_k x_k^p )^{1/p} ( Σ_k (x_k + y_k)^{(p−1)q} )^{1/q} + ( Σ_k y_k^p )^{1/p} ( Σ_k (x_k + y_k)^{(p−1)q} )^{1/q}   (by (1.22))
        = [ ( Σ_k x_k^p )^{1/p} + ( Σ_k y_k^p )^{1/p} ] ( Σ_k (x_k + y_k)^p )^{1/q},

since (p − 1)q = p. We can assume that Σ_k (x_k + y_k)^p > 0. Using 1 − 1/q = 1/p and dividing
the last inequality by ( Σ_k (x_k + y_k)^p )^{1/q}, we obtain the claim.
Equality holds if x_k^p/(x_k + y_k)^{(p−1)q} = const. and y_k^p/(x_k + y_k)^{(p−1)q} = const.; that is,
x_k/y_k = const.
Corollary 1.35 (Complex Minkowski's inequality) If x₁, …, x_n, y₁, …, y_n ∈ C and p ≥ 1,
then

    ( Σ_{k=1}^n |x_k + y_k|^p )^{1/p} ≤ ( Σ_{k=1}^n |x_k|^p )^{1/p} + ( Σ_{k=1}^n |y_k|^p )^{1/p}.   (1.24)

Equality holds if p = 1 or if p > 1 and x_k/y_k = λ > 0.
Proof. The triangle inequality gives |x_k + y_k| ≤ |x_k| + |y_k|; hence
Σ_{k=1}^n |x_k + y_k|^p ≤ Σ_{k=1}^n (|x_k| + |y_k|)^p. The real version of Minkowski's inequality
now proves the assertion.
If x = (x₁, …, x_n) is a vector in Rⁿ or Cⁿ, the (non-negative) number

    ‖x‖_p := ( Σ_{k=1}^n |x_k|^p )^{1/p}

is called the p-norm of the vector x. Minkowski's inequality then reads as

    ‖x + y‖_p ≤ ‖x‖_p + ‖y‖_p,

which is the triangle inequality for the p-norm.
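A short sketch of the p-norm and its triangle inequality (Python; `pnorm` is our name for ‖·‖_p):

```python
import random

def pnorm(xs, p):
    """The p-norm ||x||_p of a real vector."""
    return sum(abs(x) ** p for x in xs) ** (1 / p)

random.seed(4)
for _ in range(500):
    n = random.randint(1, 6)
    x = [random.uniform(-3, 3) for _ in range(n)]
    y = [random.uniform(-3, 3) for _ in range(n)]
    p = random.uniform(1.0, 6.0)
    s = [a + b for a, b in zip(x, y)]
    assert pnorm(s, p) <= pnorm(x, p) + pnorm(y, p) + 1e-9   # Minkowski
```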
Chapter 2
Sequences and Series
This chapter will deal with one of the main notions of calculus, the limit of a sequence. Although we are concerned with real sequences, almost all notions make sense in arbitrary metric
spaces like Rn or Cn .
Given a ∈ R and ε > 0 we define the ε-neighborhood of a as
Uε (a) := (a − ε, a + ε) = {x ∈ R | a − ε < x < a + ε} = {x ∈ R | | x − a | < ε}.
2.1 Convergent Sequences
A sequence is a mapping x : N → R. To every n ∈ N we associate a real number xn . We write
this as (xn )n∈N or (x1 , x2 , . . . ). For different sequences we use different letters as (an ), (bn ),
(yn ).
Example 2.1 (a) x_n = 1/n, (x_n) = (1, 1/2, 1/3, …);
(b) xn = (−1)n + 1, (xn ) = (0, 2, 0, 2, . . . );
(c) xn = a (a ∈ R fixed), (xn ) = (a, a, . . . ) (constant sequence),
(d) xn = 2n − 1, (xn ) = (1, 3, 5, 7, . . . ) the sequence of odd positive integers.
(e) xn = an (a ∈ R fixed), (xn ) = (a, a2 , a3 , . . . ) (geometric sequence);
Definition 2.1 A sequence (xn ) is said to be convergent to x ∈ R if
For every ε > 0 there exists n0 ∈ N such that n ≥ n0 implies
| xn − x | < ε.
x is called the limit of (xn ) and we write
    x = lim_{n→∞} x_n, or simply x = lim x_n, or x_n → x.
If there is no such x with the above property, the sequence (xn ) is said to be divergent.
In other words: (xn ) converges to x if any neighborhood Uε (x), ε > 0, contains “almost all”
elements of the sequence (xn ). “Almost all” means “all but finitely many.” Sometimes we say
“for sufficiently large n” which means the same.
This is an equivalent formulation since x_n ∈ Uε(x) means x − ε < x_n < x + ε, hence
|x_n − x| < ε. The n₀ in question need not be the smallest possible.
We write

    lim_{n→∞} x_n = +∞   (2.1)

if for all E > 0 there exists n₀ ∈ N such that n ≥ n₀ implies x_n ≥ E. Similarly, we write

    lim_{n→∞} x_n = −∞   (2.2)

if for all E > 0 there exists n₀ ∈ N such that n ≥ n₀ implies x_n ≤ −E. In these cases we say
that +∞ and −∞ are improper limits of (x_n). Note that in both cases (x_n) is divergent.
Example 2.2 This is Example 2.1 continued.
(a) lim_{n→∞} 1/n = 0. Indeed, let ε > 0 be fixed. We are looking for some n₀ with |1/n − 0| < ε
for all n ≥ n₀. This is equivalent to 1/ε < n. Choose n₀ > 1/ε (which is possible by the
Archimedean property). Then for all n ≥ n₀ we have

    n ≥ n₀ > 1/ε ⟹ 1/n < ε ⟹ |x_n − 0| < ε.

Therefore, (x_n) tends to 0 as n → ∞.
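The ε–n₀ bookkeeping above can be made concrete: for x_n = 1/n, any integer n₀ > 1/ε works. A sketch (the helper name `n0_for` is ours):

```python
import math

def n0_for(eps):
    """An n0 with |1/n - 0| < eps for all n >= n0 (for x_n = 1/n)."""
    return math.floor(1 / eps) + 1

for eps in (0.5, 0.1, 0.003):
    n0 = n0_for(eps)
    # check the defining property on a long stretch of indices
    assert all(abs(1/n - 0) < eps for n in range(n0, n0 + 1000))
```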
(b) x_n = (−1)ⁿ + 1 is divergent. Suppose to the contrary that x is the limit. To ε = 1 there is
n₀ such that for n ≥ n₀ we have |x_n − x| < 1. For even n ≥ n₀ this implies |2 − x| < 1; for
odd n ≥ n₀, |0 − x| = |x| < 1. The triangle inequality gives

    2 = |(2 − x) + x| ≤ |2 − x| + |x| < 1 + 1 = 2.
This is a contradiction. Hence, (xn ) is divergent.
(c) xn = a. lim xn = a since | xn − a | = | a − a | = 0 < ε for all ε > 0 and all n ∈ N.
(d) lim(2n − 1) = +∞. Indeed, suppose that E > 0 is given. Choose n₀ > E/2 + 1. Then

    n ≥ n₀ ⟹ n > E/2 + 1 ⟹ 2n − 2 > E ⟹ x_n = 2n − 1 > 2n − 2 > E.

This proves the claim. Similarly, one can show that lim(−n³) = −∞. But both ((−n)ⁿ) and
(1, 2, 1, 3, 1, 4, 1, 5, …) have no improper limit. Indeed, the first one becomes arbitrarily large
for even n and arbitrarily small for odd n. The second one becomes large for even n but is
constant for odd n.
(e) x_n = aⁿ (a ≥ 0). Then

    lim_{n→∞} aⁿ = 1 if a = 1, and lim_{n→∞} aⁿ = 0 if 0 ≤ a < 1.

(aⁿ) is divergent for a > 1; moreover, lim aⁿ = +∞. To prove this let E > 0 be given. By
the Archimedean property of R and since a − 1 > 0 we find m ∈ N such that m(a − 1) > E.
Bernoulli's inequality gives

    a^m ≥ m(a − 1) + 1 > m(a − 1) > E.
By Lemma 1.23 (b), n ≥ m implies

    aⁿ ≥ a^m > E.

This proves the claim.
Clearly (aⁿ) is convergent in the cases a = 0 and a = 1 since it is constant then. Let 0 < a < 1
and put b = 1/a − 1; then b > 0. Bernoulli's inequality gives

    1/aⁿ = (b + 1)ⁿ ≥ 1 + nb > nb ⟹ 0 < aⁿ < 1/(nb).   (2.3)

Let ε > 0. Choose n₀ > 1/(εb). Then ε > 1/(n₀b), and by (2.3), n ≥ n₀ implies

    |aⁿ − 0| = aⁿ < 1/(nb) ≤ 1/(n₀b) < ε.

Hence, aⁿ → 0.
Proposition 2.1 The limit of a convergent sequence is uniquely determined.
Proof. Suppose that x = lim xn and y = lim xn and x 6= y. Put ε := | x − y | /2 > 0. Then
∃n1 ∈ N ∀n ≥ n1 : | x − xn | < ε,
∃n2 ∈ N ∀n ≥ n2 : | y − xn | < ε.
Choose m ≥ max{n1 , n2 }. Then | x − xm | < ε and | y − xm | < ε. Hence,
| x − y | ≤ | x − xm | + | y − xm | < 2ε = | x − y | .
This contradiction establishes the statement.
Proposition 2.1 holds in arbitrary metric spaces.
Definition 2.2 A sequence (xn ) is said to be bounded if the set of its elements is a bounded set;
i. e. there is a C ≥ 0 such that
| xn | ≤ C
for all n ∈ N.
Similarly, (xn ) is said to be bounded above or bounded below if there exists C ∈ R such that
x_n ≤ C or x_n ≥ C, respectively, for all n ∈ N.
Proposition 2.2 If (xn ) is convergent, then (xn ) is bounded.
Proof. Let x = lim xn . To ε = 1 there exists n0 ∈ N such that | x − xn | < 1 for all n ≥ n0 .
Then | xn | = | xn − x + x | ≤ | xn − x | + | x | < | x | + 1 for all n ≥ n0 . Put
C := max{| x1 | , . . . , | xn0 −1 | , | x | + 1}.
Then | xn | ≤ C for all n ∈ N.
The converse statement is not true; there are bounded sequences which are not convergent, see
Example 2.1 (b).
Exercise class: If (x_n) has an improper limit, then (x_n) is divergent.
Proof. Suppose to the contrary that (x_n) is convergent; then it is bounded, say |x_n| ≤ C for all
n. But an improper limit forces x_n > E or x_n < −E for E = C and all sufficiently large n,
which contradicts the boundedness. Hence, (x_n) is divergent.
2.1.1 Algebraic operations with sequences
The sum, difference, product, quotient, and absolute value of sequences (x_n) and (y_n) are
defined as follows:

    (x_n) ± (y_n) := (x_n ± y_n),   (x_n) · (y_n) := (x_n y_n),
    (x_n)/(y_n) := (x_n/y_n)  (y_n ≠ 0),   |(x_n)| := (|x_n|).
Proposition 2.3 If (x_n) and (y_n) are convergent sequences and c ∈ R, then their sum,
difference, product, quotient (provided y_n ≠ 0 and lim y_n ≠ 0), and their absolute values are
also convergent:
(a) lim(x_n ± y_n) = lim x_n ± lim y_n;
(b) lim(c x_n) = c lim x_n and lim(x_n + c) = lim x_n + c;
(c) lim(x_n y_n) = lim x_n · lim y_n;
(d) lim(x_n/y_n) = lim x_n / lim y_n if y_n ≠ 0 for all n and lim y_n ≠ 0;
(e) lim |x_n| = |lim x_n|.
Proof. Let xn → x and yn → y.
(a) Given ε > 0, there exist integers n₁ and n₂ such that
n ≥ n1 implies | xn − x | < ε/2 and n ≥ n2 implies | yn − y | < ε/2.
If n₀ := max{n₁, n₂}, then n ≥ n₀ implies

    |(x_n + y_n) − (x + y)| ≤ |x_n − x| + |y_n − y| < ε.

The proof for the difference is quite similar.
(b) follows from | cxn − cx | = | c | | xn − x | and | (xn + c) − (x + c) | = | xn − x |.
(c) We use the identity

    x_n y_n − xy = (x_n − x)(y_n − y) + x(y_n − y) + y(x_n − x).   (2.4)

Given ε > 0 there are integers n₁ and n₂ such that
n ≥ n₁ implies |x_n − x| < √ε and n ≥ n₂ implies |y_n − y| < √ε.
If we take n0 = max{n1 , n2 }, n ≥ n0 implies
| (xn − x)(yn − y) | < ε,
so that
lim (xn − x)(yn − y) = 0.
n→∞
Now we apply (a) and (b) to (2.4) and conclude that
lim (xn yn − xy) = 0.
n→∞
(d) Choosing n₁ such that |y_n − y| < |y|/2 if n ≥ n₁, we see that

    |y| ≤ |y − y_n| + |y_n| < |y|/2 + |y_n| ⟹ |y_n| > |y|/2.

Given ε > 0, there is an integer n₂ > n₁ such that n ≥ n₂ implies

    |y_n − y| < |y|² ε/2.

Hence, for n ≥ n₂,

    |1/y_n − 1/y| = |(y_n − y)/(y_n y)| < (2/|y|²) |y_n − y| < ε,

and we get lim(1/y_n) = 1/lim y_n. The general case can be reduced to this one using (c) and
(x_n/y_n) = (x_n · 1/y_n).
(e) By Lemma 1.12 (e) we have ||x_n| − |x|| ≤ |x_n − x|. Given ε > 0, there is n₀ such that
n ≥ n₀ implies |x_n − x| < ε. By the above inequality, also ||x_n| − |x|| < ε and we are
done.
Example 2.3 (a) z_n := (n + 1)/n. Set x_n = 1 and y_n = 1/n. Then z_n = x_n + y_n and we already
know that lim x_n = 1 and lim y_n = 0. Hence, lim (n + 1)/n = lim 1 + lim 1/n = 1 + 0 = 1.
(b) a_n = (3n² + 13n)/(n² − 2). We can write this as

    a_n = (3 + 13/n)/(1 − 2/n²).

Since lim 1/n = 0, by Proposition 2.3 we obtain lim 1/n² = 0 and lim 13/n = 0. Hence
lim 2/n² = 0 and lim(3 + 13/n) = 3. Finally,

    lim_{n→∞} (3n² + 13n)/(n² − 2) = lim_{n→∞} (3 + 13/n) / lim_{n→∞} (1 − 2/n²) = 3/1 = 3.
(c) We introduce the notions of a polynomial and a rational function.
Given a₀, a₁, …, a_n ∈ R with a_n ≠ 0, the function p: R → R given by p(t) := a_n tⁿ + a_{n−1} t^{n−1} +
⋯ + a₁ t + a₀ is called a polynomial. The integer n is the degree of the polynomial
p(t), and a₀, a₁, …, a_n are called the coefficients of p(t). The set of all real polynomials forms a
real vector space denoted by R[x].
Given two polynomials p and q, put D := {t ∈ R | q(t) ≠ 0}. Then r = p/q is called a rational
function, where r: D → R is defined by

    r(t) := p(t)/q(t).

Polynomials are special rational functions, with q(t) ≡ 1. The set of rational functions with real
coefficients forms both a real vector space and a field. It is denoted by R(x).
Lemma 2.4 (a) Let a_n → 0 be a sequence tending to zero with a_n ≠ 0 for every n. Then

    lim_{n→∞} 1/a_n = +∞ if a_n > 0 for almost all n,
    lim_{n→∞} 1/a_n = −∞ if a_n < 0 for almost all n.

(b) Let y_n → a be a sequence converging to a with a > 0. Then y_n > 0 for almost all n ∈ N.
Proof. (a) We prove the case with −∞. Let ε > 0. By assumption there is a positive integer
n₀ such that n ≥ n₀ implies −ε < a_n < 0. This implies 0 < −a_n < ε and further 1/a_n < −1/ε < 0.
Suppose E > 0 is given; choose ε = 1/E and n₀ as above. Then by the previous argument,
n ≥ n₀ implies

    1/a_n < −1/ε = −E.

This shows lim_{n→∞} 1/a_n = −∞.
(b) To ε = a there exists n₀ such that n ≥ n₀ implies |y_n − a| < a. That is, −a < y_n − a < a,
i.e. 0 < y_n < 2a, which proves the claim.
Lemma 2.5 Suppose that p(t) = Σ_{k=0}^r a_k t^k and q(t) = Σ_{k=0}^s b_k t^k are real polynomials with
a_r ≠ 0 and b_s ≠ 0. Then

    lim_{n→∞} p(n)/q(n) = 0        if r < s,
    lim_{n→∞} p(n)/q(n) = a_r/b_s  if r = s,
    lim_{n→∞} p(n)/q(n) = +∞       if r > s and a_r/b_s > 0,
    lim_{n→∞} p(n)/q(n) = −∞       if r > s and a_r/b_s < 0.

Proof. Note first that

    p(n)/q(n) = ( n^r (a_r + a_{r−1}/n + ⋯ + a₀/n^r) ) / ( n^s (b_s + b_{s−1}/n + ⋯ + b₀/n^s) )
              = (1/n^{s−r}) · (a_r + a_{r−1}/n + ⋯ + a₀/n^r)/(b_s + b_{s−1}/n + ⋯ + b₀/n^s) =: (1/n^{s−r}) · c_n.

Suppose that r = s. By Proposition 2.3, 1/n^k → 0 for all k ∈ N. By the same proposition, the
limit of each summand in the numerator and the denominator is 0, except for the first in each
sum. Hence,

    lim_{n→∞} p(n)/q(n) = lim_{n→∞} (a_r + a_{r−1}/n + ⋯ + a₀/n^r) / lim_{n→∞} (b_s + b_{s−1}/n + ⋯ + b₀/n^s) = a_r/b_s.
Suppose now that r < s. As in the previous case, the sequence (c_n) tends to a_r/b_s, but the first
factor 1/n^{s−r} tends to 0. Hence, the product sequence tends to 0.
Suppose that r > s and a_r/b_s > 0. The sequence (c_n) has a positive limit. By Lemma 2.4 (b),
almost all c_n > 0. Hence,

    d_n := q(n)/p(n) = (1/n^{r−s}) · (1/c_n)

tends to 0 by the above part, and d_n > 0 for almost all n. By Lemma 2.4 (a), the sequence
1/d_n = p(n)/q(n) tends to +∞ as n → ∞, which proves the claim in the first case. The case
a_r/b_s < 0 can be obtained by multiplying with −1 and noting that lim_{n→∞} x_n = +∞ implies
lim_{n→∞} (−x_n) = −∞.
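Lemma 2.5 is easy to observe numerically; a sketch with the concrete polynomials from Example 2.3 (b) (our choice):

```python
def p(t):          # r = 2, a_r = 3
    return 3 * t**2 + 13 * t

def q(t):          # s = 2, b_s = 1
    return t**2 - 2

# r = s: p(n)/q(n) approaches a_r/b_s = 3.
assert abs(p(10**6) / q(10**6) - 3) < 1e-4

# r < s: dividing by a higher-degree polynomial drives the quotient to 0.
assert abs(p(10**6) / (10**6)**3) < 1e-5
```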
In the German literature the next proposition is known as the ‘Theorem of the two policemen’.
Proposition 2.6 (Sandwich Theorem) Let an , bn and xn be real sequences with an ≤ xn ≤ bn
for all but finitely many n ∈ N. Further let limn→∞ an = limn→∞ bn = x. Then xn is also
convergent to x.
Proof. Let ε > 0. There exist n₁, n₂, and n₃ ∈ N such that n ≥ n₁ implies a_n ∈ Uε(x), n ≥ n₂
implies bn ∈ Uε (x), and n ≥ n3 implies an ≤ xn ≤ bn . Choosing n0 = max{n1 , n2 , n3 },
n ≥ n0 implies xn ∈ Uε (x). Hence, xn → x.
Remark. (a) If two sequences (an ) and (bn ) differ only in finitely many elements, then both
sequences converge to the same limit or both diverge.
(b) Define the “shifted sequence” b_n := a_{n+k}, n ∈ N, where k is a fixed positive integer. Then
both sequences converge to the same limit or both diverge.
2.1.2 Some special sequences
Proposition 2.7
(a) If p > 0, then lim_{n→∞} 1/n^p = 0.
(b) If p > 0, then lim_{n→∞} p^{1/n} = 1.
(c) lim_{n→∞} n^{1/n} = 1.
(d) If a > 1 and α ∈ R, then lim_{n→∞} n^α/aⁿ = 0.
Proof. (a) Let ε > 0. Take n₀ > (1/ε)^{1/p} (note that the Archimedean property of the real
numbers is used here). Then n ≥ n₀ implies 1/n^p < ε.
(b) If p > 1, put x_n = p^{1/n} − 1. Then x_n > 0, and by Bernoulli's inequality (that is, by homework
4.1) we have p^{1/n} = (1 + (p − 1))^{1/n} < 1 + (p − 1)/n; that is,

    0 < x_n ≤ (1/n)(p − 1).

By Proposition 2.6, x_n → 0. If p = 1, (b) is trivial, and if 0 < p < 1 the result is obtained by
taking reciprocals.
(c) Put x_n = n^{1/n} − 1. Then x_n ≥ 0, and, by the binomial theorem,

    n = (1 + x_n)ⁿ ≥ (n(n − 1)/2) x_n².

Hence

    0 ≤ x_n ≤ √(2/(n − 1))   (n ≥ 2).

By (a), 1/√(n − 1) → 0. Applying the sandwich theorem again, x_n → 0 and so n^{1/n} → 1.
(d) Put p = a − 1; then p > 0. Let k be an integer such that k > α, k > 0. For n > 2k,

    (1 + p)ⁿ > (n choose k) p^k = (n(n − 1) ⋯ (n − k + 1)/k!) p^k > (n^k p^k)/(2^k k!).

Hence,

    0 < n^α/aⁿ = n^α/(1 + p)ⁿ < (2^k k!/p^k) n^{α−k}   (n > 2k).

Since α − k < 0, n^{α−k} → 0 by (a).
Q 1. Let (x_n) be a convergent sequence, x_n → x. Then the sequence of arithmetic means
s_n := (1/n) Σ_{k=1}^n x_k also converges to x.
Q 2. Let (x_n) be a convergent sequence of positive numbers with lim x_n = x > 0. Then
(x₁ x₂ ⋯ x_n)^{1/n} → x. Hint: Consider y_n = log x_n.
2.1.3 Monotonic Sequences
Definition 2.3 A real sequence (xn ) is said to be
(a) monotonically increasing if xn ≤ xn+1 for all n;
(b) monotonically decreasing if xn ≥ xn+1 for all n.
The class of monotonic sequences consists of the increasing and decreasing sequences.
A sequence is said to be strictly monotonically increasing or decreasing if xn < xn+1 or xn >
xn+1 for all n, respectively. We write xn ր and xn ց.
Proposition 2.8 A monotonic and bounded sequence is convergent. More precisely, if (xn ) is
increasing and bounded above, then lim xn = sup{xn }. If (xn ) is decreasing and bounded
below, then lim xn = inf{xn }.
Proof. Suppose x_n ≤ x_{n+1} for all n (the proof is analogous in the other case). Let E := {x_n |
n ∈ N} and x = sup E. Then x_n ≤ x, n ∈ N. For every ε > 0 there is an integer n₀ ∈ N such
that

    x − ε < x_{n₀} ≤ x,

for otherwise x − ε would be an upper bound of E. Since (x_n) increases, n ≥ n₀ implies

    x − ε < x_n ≤ x,
which shows that (xn ) converges to x.
Example 2.4 Let x_n = cⁿ/n! with some fixed c > 0. We will show that x_n → 0 as n → ∞.
Writing (x_n) recursively,

    x_{n+1} = (c/(n + 1)) x_n,   (2.5)

we observe that (x_n) is strictly decreasing for n ≥ c. Indeed, n ≥ c implies x_{n+1} =
(c/(n + 1)) x_n < x_n. On the other hand, x_n > 0 for all n, so (x_n) is bounded below by 0. By
Proposition 2.8, (x_n) converges to some x ∈ R. Taking the limit n → ∞ in (2.5), we have

    x = lim_{n→∞} x_{n+1} = lim_{n→∞} (c/(n + 1)) · lim_{n→∞} x_n = 0 · x = 0.

Hence, the sequence tends to 0.
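The recursion (2.5) also gives a stable way to evaluate x_n = cⁿ/n! without computing huge powers or factorials separately. A sketch with c = 10 (our choice):

```python
c = 10.0
x = c                            # x_1 = c^1/1!
for n in range(1, 200):
    x = x * c / (n + 1)          # x_{n+1} = (c/(n+1)) x_n, recursion (2.5)

# x is now x_200 = 10^200/200!, which is already astronomically small.
assert 0 < x < 1e-100
```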
2.1.4 Subsequences
Definition 2.4 Let (xn ) be a sequence and (nk )k∈N a strictly increasing sequence of positive
integers nk ∈ N. We call (xnk )k∈N a subsequence of (xn )n∈N . If (xnk ) converges, its limit is
called a subsequential limit of (xn ).
Example 2.5 (a) x_n = 1/n, n_k = 2^k. Then (x_{n_k}) = (1/2, 1/4, 1/8, …).
(b) (xn ) = (1, −1, 1, −1, . . . ). (x2k ) = (−1, −1, . . . ) has the subsequential limit −1; (x2k+1 ) =
(1, 1, 1, . . . ) has the subsequential limit 1.
Proposition 2.9 Subsequences of convergent sequences are convergent with the same limit.
Proof. Let lim xn = x and xnk be a subsequence. To ε > 0 there exists m0 ∈ N such that
n ≥ m0 implies | xn − x | < ε. Since nm ≥ m for all m, m ≥ m0 implies | xnm − x | < ε;
hence lim xnm = x.
Definition 2.5 Let (xn ) be a sequence. We call x ∈ R a limit point of (xn ) if every neighborhood of x contains infinitely many elements of (xn ).
Proposition 2.10 The point x is limit point of the sequence (xn ) if and only if x is a subsequential limit.
Proof. If lim xnk = x then every neighborhood Uε (x) contains all but finitely many xnk ; in
k→∞
particular, it contains infinitely many elements xn . That is, x is a limit point of (xn ).
Suppose x is a limit point of (x_n). To ε = 1 there exists x_{n₁} ∈ U₁(x). To ε = 1/k there exists
n_k with x_{n_k} ∈ U_{1/k}(x) and n_k > n_{k−1}. We have constructed a subsequence (x_{n_k}) of (x_n) with

    |x − x_{n_k}| < 1/k.

Hence, (x_{n_k}) converges to x.
Question: Which sequences do have limit points? The answer is: Every bounded sequence has
limit points.
Proposition 2.11 (Principle of nested intervals) Let I_n := [a_n, b_n] be a sequence of closed
nested intervals, I_{n+1} ⊆ I_n, such that their lengths b_n − a_n tend to 0: given ε > 0 there exists
n₀ such that 0 ≤ b_n − a_n < ε for all n ≥ n₀.
For any such interval sequence {I_n} there exists a unique real number x ∈ R which is a member
of all intervals, i.e. {x} = ⋂_{n∈N} I_n.
Proof. Since the intervals are nested, (a_n) is an increasing sequence bounded above by each
of the b_k, and (b_n) is a decreasing sequence bounded below by each of the a_k. Consequently,
by Proposition 2.8 we have

    ∃ x = lim_{n→∞} a_n = sup{a_n} ≤ b_m for all m, and ∃ y = lim_{m→∞} b_m = inf{b_m} ≥ x.

Since a_n ≤ x ≤ y ≤ b_n for all n ∈ N,

    ∅ ≠ [x, y] ⊆ ⋂_{n∈N} I_n.

We show the converse inclusion, namely that ⋂_{n∈N} [a_n, b_n] ⊆ [x, y]. Let p ∈ I_n for all n, that
is, a_n ≤ p ≤ b_n for all n ∈ N. Hence sup_n a_n ≤ p ≤ inf_n b_n; that is, p ∈ [x, y]. Thus
[x, y] = ⋂_{n∈N} I_n.
We show uniqueness, that is, x = y. Given ε > 0 we find n such that y − x ≤ b_n − a_n < ε.
Hence y − x ≤ 0; therefore x = y. The intersection contains the unique point x.
Proposition 2.12 (Bolzano–Weierstraß) A bounded real sequence has a limit point.
Proof. We use the principle of nested intervals. Let (x_n) be bounded, say |x_n| ≤ C. Hence,
the interval [−C, C] contains infinitely many x_k. Consider the intervals [−C, 0] and [0, C]. At
least one of them contains infinitely many x_k, say I₁ := [a₁, b₁]. Suppose we have already
constructed I_n = [a_n, b_n] of length b_n − a_n = C/2^{n−1} which contains infinitely many x_k.
Consider the two intervals [a_n, (a_n + b_n)/2] and [(a_n + b_n)/2, b_n] of length C/2ⁿ. At least
one of them still contains infinitely many x_k, say I_{n+1} := [a_{n+1}, b_{n+1}]. In this way we have
constructed a nested sequence of intervals whose lengths go to 0. By Proposition 2.11, there
exists a unique x ∈ ⋂_{n∈N} I_n. We will show that x is a subsequential limit of (x_n) (and hence a
limit point). For this, choose x_{n_k} ∈ I_k with n_k > n_{k−1}; this is possible since I_k contains
infinitely many x_m. Then a_k ≤ x_{n_k} ≤ b_k for all k ∈ N. Proposition 2.6 gives
x = lim a_k ≤ lim x_{n_k} ≤ lim b_k = x; hence lim x_{n_k} = x.
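The halving argument can be imitated for a concrete bounded sequence; here "contains infinitely many terms" is replaced by "contains many of the first N terms", which is an assumption of the illustration only:

```python
# x_n = (-1)^n is bounded by C = 1; its limit points are -1 and 1.
N = 10000
xs = [(-1) ** n for n in range(1, N + 1)]

a, b = -1.0, 1.0
for _ in range(40):                       # 40 halvings: length 2/2^40
    mid = (a + b) / 2
    # keep the left half if it still contains many terms, else the right
    if sum(1 for x in xs if a <= x <= mid) >= N // 4:
        b = mid
    else:
        a = mid

# The nested intervals close in on a limit point of (x_n).
assert b - a < 1e-9
assert abs(a - (-1.0)) < 1e-9
```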
Remark. The principle of nested intervals is equivalent to the order completeness of R.
Example 2.6 (a) x_n = (−1)^{n−1} + 1/n; the set of limit points is {−1, 1}. First note that −1 =
lim_{n→∞} x_{2n} and 1 = lim_{n→∞} x_{2n+1} are subsequential limits of (x_n). We show, for example,
that 1/3 is not a limit point. Indeed, for n ≥ 4, there exists a small neighborhood of 1/3 which has
no intersection with U_{1/4}(1) and U_{1/4}(−1). Hence, 1/3 is not a limit point.
(b) x_n = n − 5[n/5], where [x] denotes the greatest integer less than or equal to x ([π] = [3] = 3,
[−2.8] = −3, [1/2] = 0). Here (x_n) = (1, 2, 3, 4, 0, 1, 2, 3, 4, 0, …); the set of limit points is
{0, 1, 2, 3, 4}.
(c) One can enumerate the rational numbers in (0, 1) in the following way:

    x₁ = 1/2;  x₂ = 1/3, x₃ = 2/3;  x₄ = 1/4, x₅ = 2/4, x₆ = 3/4;  ⋯

The set of limit points is the whole interval [0, 1], since in any neighborhood of any real number
there is a rational number, see Proposition 1.11 (b). Any rational number of (0, 1)
appears infinitely often in this sequence, namely as p/q = 2p/(2q) = 3p/(3q) = ⋯.
(d) x_n = n has no limit point. Since (x_n) is not bounded, the Bolzano–Weierstraß theorem fails
to apply.
Definition 2.6 (a) Let (x_n) be a bounded sequence and A its set of limit points. Then sup A is
called the upper limit of (x_n) and inf A the lower limit of (x_n). We write

    lim sup_{n→∞} x_n = sup A and lim inf_{n→∞} x_n = inf A

for the upper and lower limits of (x_n), respectively.
(b) If (x_n) is not bounded above, we write lim sup x_n = +∞. If moreover +∞ is the only limit
point, lim inf x_n = +∞, and we can also write lim x_n = +∞. If (x_n) is not bounded below,
lim inf x_n = −∞.
Proposition 2.13 Let (xn ) be a bounded sequence and A the set of limit points of (xn ).
Then lim sup xn and lim inf xn are also limit points of (xn ).
Proof. Let t = lim sup xn and let ε > 0. By the definition of the supremum of A there exists x′ ∈ A
with
t − ε/2 < x′ ≤ t.
Since x′ is a limit point, Uε/2 (x′ ) contains infinitely many elements xk . Moreover, Uε/2 (x′ ) ⊆ Uε (t):
indeed, y ∈ Uε/2 (x′ ) implies | y − x′ | < ε/2 and therefore
| y − t | = | y − x′ + x′ − t | ≤ | y − x′ | + | x′ − t | < ε/2 + ε/2 = ε.
Hence Uε (t) contains infinitely many xk , so t is a limit point, too. The proof for lim inf xn is similar.
Proposition 2.14 Let b ∈ R be fixed. Suppose (xn ) is a sequence which is bounded above; then
xn ≤ b for all but finitely many n implies lim sup_{n→∞} xn ≤ b.   (2.6)
Similarly, if (xn ) is bounded below, then
xn ≥ b for all but finitely many n implies lim inf_{n→∞} xn ≥ b.   (2.7)
Proof. We prove only the first statement, for lim sup xn ; the proof for lim inf xn is similar.
Let t := lim sup xn and suppose to the contrary that t > b. Set ε = (t − b)/2; then Uε (t) contains
infinitely many xn (t is a limit point), and all elements of Uε (t) are greater than b. This contradicts
xn ≤ b for all but finitely many n. Hence lim sup xn ≤ b.
Applying the two statements to b = supn {xn } and b = infn {xn }, respectively, and noting that
inf A ≤ sup A, we obtain
infn {xn } ≤ lim inf_{n→∞} xn ≤ lim sup_{n→∞} xn ≤ supn {xn }.
The next proposition is a converse statement to Proposition 2.9.
Proposition 2.15 Let (xn ) be a bounded sequence with a unique limit point x. Then (xn )
converges to x.
Proof. Suppose to the contrary that (xn ) diverges; that is, there exists some ε > 0 such that
infinitely many xn are outside Uε (x). We view these elements as a subsequence (yk ) := (xnk )
of (xn ). Since (xn ) is bounded, so is (yk ). By Proposition 2.12 there exists a limit point y of
(yk ) which is in turn also a limit point of (xn ). Since y ∉ Uε (x), y ≠ x is a second limit point;
a contradiction! We conclude that (xn ) converges to x.
Note that t := lim sup xn is uniquely characterized by the following two properties: for every ε > 0,
t − ε < xn for infinitely many n, and
xn < t + ε for almost all n.
(See also homework 6.2.) Let us consider the above examples.
Example 2.7 (a) xn = (−1)^(n−1) + 1/n; lim inf xn = −1, lim sup xn = 1.
(b) xn = n − 5[n/5]; lim inf xn = 0, lim sup xn = 4.
(c) (xn ) is the sequence of rational numbers of (0, 1); lim inf xn = 0, lim sup xn = 1.
(d) xn = n; lim inf xn = lim sup xn = +∞.
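Upper and lower limits can be approximated by the supremum and infimum of a late tail of the sequence. A small numerical check of Example 2.7 (a); the cutoff indices are arbitrary sample choices:

```python
# x_n = (-1)^(n-1) + 1/n: lim sup x_n = 1, lim inf x_n = -1.
xs = [(-1) ** (n - 1) + 1 / n for n in range(1, 100001)]

tail = xs[50000:]      # terms x_50001, x_50002, ...
upper = max(tail)      # approximates the upper limit 1
lower = min(tail)      # approximates the lower limit -1
```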
Proposition 2.16 If sn ≤ tn for all but finitely many n, then
lim sup_{n→∞} sn ≤ lim sup_{n→∞} tn and lim inf_{n→∞} sn ≤ lim inf_{n→∞} tn .
Proof. (a) Denote the upper limits by s = lim sup sn and t = lim sup tn , and let ε > 0. By
homework 6.3 (a),
s − ε ≤ sn for all but finitely many n,
hence, by assumption,
s − ε ≤ sn ≤ tn for all but finitely many n.
By Proposition 2.14 this implies s − ε ≤ lim sup tn = t. By the first Remark in Subsection 1.1.7,
sup{s − ε | ε > 0} ≤ t, that is, s ≤ t.
(b) The statement for the lower limits follows from (a) and − sup E = inf(−E).
2.2 Cauchy Sequences
The aim of this section is to characterize convergent sequences without knowing their limits.
Definition 2.7 A sequence (xn ) is said to be a Cauchy sequence if:
For every ε > 0 there exists a positive integer n0 such that | xn − xm | < ε for all
m, n ≥ n0 .
The definition makes sense in arbitrary metric spaces. The definition is equivalent to
∀ ε > 0 ∃ n0 ∈ N ∀ n ≥ n0 ∀ k ∈ N : | xn+k − xn | < ε.
Lemma 2.17 Every convergent sequence is a Cauchy sequence.
Proof. Let xn → x. For ε > 0 there is n0 ∈ N such that n ≥ n0 implies xn ∈ Uε/2 (x). By
the triangle inequality, m, n ≥ n0 implies
| xn − xm | ≤ | xn − x | + | xm − x | < ε/2 + ε/2 = ε.
Hence, (xn ) is a Cauchy sequence.
Proposition 2.18 (Cauchy convergence criterion) A real sequence is convergent if and only
if it is a Cauchy sequence.
Proof. One direction is Lemma 2.17. We prove the other direction. Let (xn ) be a Cauchy
sequence. First we show that (xn ) is bounded. For ε = 1 there is a positive integer n0 such
that m, n ≥ n0 implies | xm − xn | < 1. In particular, | xn − xn0 | < 1 for all n ≥ n0 ; hence
| xn | < 1 + | xn0 |. Setting
C = max{| x1 | , | x2 | , . . . , | xn0 −1 | , | xn0 | + 1},
we obtain | xn | < C for all n.
By Proposition 2.12 there exists a limit point x of (xn ); and by Proposition 2.10 a subsequence
(xnk ) converging to x. We will show that limn→∞ xn = x. Let ε > 0. Since xnk → x we find
k0 ∈ N such that k ≥ k0 implies | xnk − x | < ε/2. Since (xn ) is a Cauchy sequence, there
exists n0 ∈ N such that m, n ≥ n0 implies | xn − xm | < ε/2. Put n1 := max{n0 , nk0 } and
choose k1 with nk1 ≥ n1 ≥ nk0 . Then n ≥ n1 implies
| x − xn | ≤ | x − xnk1 | + | xnk1 − xn | < ε/2 + ε/2 = ε.
Example 2.8 (a) xn = Σ_{k=1}^n 1/k = 1 + 1/2 + 1/3 + · · · + 1/n. We show that (xn ) is not a Cauchy
sequence. For, consider
x2m − xm = Σ_{k=m+1}^{2m} 1/k ≥ Σ_{k=m+1}^{2m} 1/(2m) = m · 1/(2m) = 1/2.
Hence, there is no n0 such that p, n ≥ n0 implies | xp − xn | < 1/2.
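The estimate x2m − xm ≥ 1/2 is easy to check numerically; m = 1000 below is an arbitrary sample value.

```python
# Partial sums of the harmonic series: the gap x_{2m} - x_m never
# drops below 1/2, so (x_n) is not a Cauchy sequence.
def harmonic(n):
    return sum(1 / k for k in range(1, n + 1))

m = 1000
gap = harmonic(2 * m) - harmonic(m)   # >= 1/2 for every m
```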
(b) xn = Σ_{k=1}^n (−1)^(k+1)/k = 1 − 1/2 + 1/3 − + · · · + (−1)^(n+1)/n. Consider
xn+k − xn = (−1)^n (1/(n+1) − 1/(n+2) + 1/(n+3) − · · · + (−1)^(k+1)/(n+k)).
Grouping the summands in pairs we obtain
| xn+k − xn | = (1/(n+1) − 1/(n+2)) + (1/(n+3) − 1/(n+4)) + · · · + r,
where the last term r equals 1/(n+k−1) − 1/(n+k) if k is even and 1/(n+k) if k is odd; since
all summands in parentheses are positive, so is the whole expression. Grouping the other way,
| xn+k − xn | = 1/(n+1) − (1/(n+2) − 1/(n+3)) − (1/(n+4) − 1/(n+5)) − · · · ,
where the final subtracted group is 1/(n+k) if k is even and 1/(n+k−1) − 1/(n+k) if k is odd.
Since all subtracted groups are positive as well,
| xn+k − xn | ≤ 1/(n+1).
Given ε > 0, choose n0 ∈ N with 1/(n0 + 1) < ε; then n ≥ n0 and k ∈ N imply | xn+k − xn | < ε.
Hence, (xn ) is a Cauchy sequence and converges.
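The tail estimate |xn+k − xn| ≤ 1/(n + 1) can also be observed numerically; n = 50 and the range of k below are arbitrary choices.

```python
# Partial sums of the alternating harmonic series and the bound
# |x_{n+k} - x_n| <= 1/(n+1) obtained from the grouping argument.
def alt_sum(n):
    return sum((-1) ** (k + 1) / k for k in range(1, n + 1))

n = 50
worst = max(abs(alt_sum(n + k) - alt_sum(n)) for k in range(1, 200))
```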
2.3 Series
Definition 2.8 Given a sequence (an ), we associate with (an ) a sequence (sn ), where
sn = Σ_{k=1}^n ak = a1 + a2 + · · · + an .
For (sn ) we also use the symbol
Σ_{k=1}^∞ ak ,   (2.8)
and we call it an infinite series or just a series. The numbers sn are called the partial sums of
the series. If (sn ) converges to s, we say that the series converges, and write
Σ_{k=1}^∞ ak = s.
The number s is called the sum of the series.
Remarks 2.1 (a) The sum of a series should be clearly understood as the limit of the sequence
of partial sums; it is not simply obtained by addition.
(b) If (sn ) diverges, the series is said to be divergent.
(c) The symbol Σ_{k=1}^∞ ak means both the sequence of partial sums as well as the limit of this
sequence (if it exists). Sometimes we use series of the form Σ_{k=k0}^∞ ak , k0 ∈ N. We simply
write Σ ak if there is no ambiguity about the bounds of the index k.
Example 2.9 (Example 2.8 continued)
(1) Σ_{n=1}^∞ 1/n is divergent. This is the harmonic series.
(2) Σ_{n=1}^∞ (−1)^(n+1) 1/n is convergent. It is an example of an alternating series (the summands
change their signs, and the absolute values of the summands form a sequence decreasing to 0).
(3) Σ_{n=0}^∞ q^n is called the geometric series. It is convergent for | q | < 1 with Σ_{n=0}^∞ q^n = 1/(1 − q).
This is seen from
Σ_{k=0}^n q^k = (1 − q^(n+1))/(1 − q),
see the proof of Lemma 1.14, first formula with y = 1, x = q. The series diverges for | q | ≥ 1.
The general formula in case | q | < 1 is
Σ_{n=n0}^∞ c q^n = c q^(n0)/(1 − q).   (2.9)
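The closed form for the geometric tail is easily sanity-checked; c, q, and n0 below are arbitrary sample values.

```python
# Truncated geometric tail versus the closed form c*q^n0/(1 - q).
c, q, n0 = 2.0, 0.75, 3

partial = sum(c * q ** n for n in range(n0, n0 + 200))
closed = c * q ** n0 / (1 - q)        # = 3.375 for these sample values
```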
2.3.1 Properties of Convergent Series
Lemma 2.19 (1) If Σ_{n=1}^∞ an is convergent, then Σ_{k=m}^∞ ak is convergent for any m ∈ N.
(2) If Σ an is convergent, then the sequence rn := Σ_{k=n+1}^∞ ak tends to 0 as n → ∞.
(3) If (an ) is a sequence of nonnegative real numbers, then Σ an converges if and only if the
partial sums are bounded.
Proof. (1) Suppose that Σ_{n=1}^∞ an = s; we show that Σ_{k=m}^∞ ak = s − (a1 + a2 + · · · + am−1 ).
Indeed, let (sn ) and (tn ) denote the nth partial sums of Σ_{k=1}^∞ ak and Σ_{k=m}^∞ ak , respectively.
Then for n > m one has tn = sn − Σ_{k=1}^{m−1} ak . Taking the limit n → ∞ proves the claim.
(2) Suppose that Σ_{n=1}^∞ an converges to s. By (1), rn = Σ_{k=n+1}^∞ ak is also a convergent
series for all n. We have
Σ_{k=1}^∞ ak = Σ_{k=1}^n ak + Σ_{k=n+1}^∞ ak , hence s = sn + rn , hence rn = s − sn ,
and therefore lim_{n→∞} rn = s − s = 0.
(3) Suppose an ≥ 0; then sn+1 = sn + an+1 ≥ sn , so (sn ) is an increasing sequence. If the partial
sums are bounded, (sn ) converges by Proposition 2.8.
The other direction is trivial since every convergent sequence is bounded.
Proposition 2.20 (Cauchy criterion) Σ an converges if and only if for every ε > 0 there is an
integer n0 ∈ N such that
| Σ_{k=m}^n ak | < ε   (2.10)
if m, n ≥ n0 .
Proof. Clear from Proposition 2.18: consider the sequence of partial sums sn = Σ_{k=1}^n ak and
note that for n ≥ m one has | sn − sm−1 | = | Σ_{k=m}^n ak |.
Corollary 2.21 If Σ an converges, then (an ) converges to 0.
Proof. Take m = n in (2.10); this yields | an | < ε. Hence (an ) tends to 0.
Proposition 2.22 (Comparison test) (a) If | an | ≤ Cbn for some C > 0 and for almost all
n ∈ N, and if Σ bn converges, then Σ an converges.
(b) If an ≥ Cdn ≥ 0 for some C > 0 and for almost all n, and if Σ dn diverges, then Σ an
diverges.
Proof. (a) Suppose n ≥ n1 implies | an | ≤ Cbn . Given ε > 0, there exists n0 ≥ n1 such that
m, n ≥ n0 implies
Σ_{k=m}^n bk < ε/C
by the Cauchy criterion. Hence
| Σ_{k=m}^n ak | ≤ Σ_{k=m}^n | ak | ≤ Σ_{k=m}^n Cbk < ε,
and (a) follows by the Cauchy criterion.
(b) follows from (a), for if Σ an converged, so would Σ dn .
2.3.2 Operations with Convergent Series
Definition 2.9 If Σ an and Σ bn are series, we define sums, differences, and scalar multiples as follows:
Σ an ± Σ bn := Σ (an ± bn ) and c Σ an := Σ c an , c ∈ R.
Let cn := Σ_{k=1}^n ak bn−k+1 ; then Σ cn is called the Cauchy product of Σ an and Σ bn .
If Σ an and Σ bn are convergent, it is easy to see that Σ_{n=1}^∞ (an + bn ) = Σ_{n=1}^∞ an + Σ_{n=1}^∞ bn
and Σ_{n=1}^∞ c an = c Σ_{n=1}^∞ an .
Caution: the product series Σ cn need not be convergent. Indeed, let an := bn := (−1)^n/√n.
One can show that Σ an and Σ bn are convergent (see Proposition 2.29 below); however, Σ cn
with cn = Σ_{k=1}^n ak bn−k+1 is not convergent. Proof: by the arithmetic-geometric mean
inequality, √(k(n + 1 − k)) ≤ (n + 1)/2, hence
| ak bn−k+1 | = 1/√(k(n + 1 − k)) ≥ 2/(n + 1) and | cn | ≥ Σ_{k=1}^n 2/(n + 1) = 2n/(n + 1).
Since cn does not converge to 0 as n → ∞, Σ cn diverges by Corollary 2.21.
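The divergence of this Cauchy product shows up immediately in numerics; a small sketch checking the bound |cn| ≥ 2n/(n + 1):

```python
import math

# a_n = b_n = (-1)^n / sqrt(n); the Cauchy product terms
# c_n = sum_{k=1}^n a_k b_{n-k+1} satisfy |c_n| >= 2n/(n+1) >= 1.
def a(n):
    return (-1) ** n / math.sqrt(n)

def c(n):
    return sum(a(k) * a(n - k + 1) for k in range(1, n + 1))

vals = [abs(c(n)) for n in range(1, 60)]
```

Since the terms |cn| stay at least 1, the product series cannot converge.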
2.3.3 Series of Nonnegative Numbers
Proposition 2.23 (Compression Theorem) Suppose a1 ≥ a2 ≥ · · · ≥ 0. Then the series
Σ_{n=1}^∞ an converges if and only if the series
Σ_{k=0}^∞ 2^k a_{2^k} = a1 + 2a2 + 4a4 + 8a8 + · · ·
converges.
Proof. By Lemma 2.19 (3) it suffices to consider boundedness of the partial sums. Let
sn = a1 + · · · + an ,   tk = a1 + 2a2 + · · · + 2^k a_{2^k} .   (2.11)
For n < 2^k,
sn ≤ a1 + (a2 + a3 ) + · · · + (a_{2^k} + · · · + a_{2^(k+1)−1}) ≤ a1 + 2a2 + · · · + 2^k a_{2^k} = tk .   (2.12)
On the other hand, if n > 2^k,
sn ≥ a1 + a2 + (a3 + a4 ) + · · · + (a_{2^(k−1)+1} + · · · + a_{2^k}) ≥ a1 + a2 + 2a4 + · · · + 2^(k−1) a_{2^k} ≥ (1/2) tk .   (2.13)
By (2.12) and (2.13), the sequences (sn ) and (tk ) are either both bounded or both unbounded. This
completes the proof.
Example 2.10 (a) Σ_{n=1}^∞ 1/n^p converges if p > 1 and diverges if p ≤ 1.
If p ≤ 0, divergence follows from Corollary 2.21. If p > 0, Proposition 2.23 is applicable, and
we are led to the series
Σ_{k=0}^∞ 2^k · 1/2^(kp) = Σ_{k=0}^∞ (1/2^(p−1))^k.
This is a geometric series with q = 1/2^(p−1). It converges if and only if 2^(p−1) > 1, if and only if
p > 1.
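The compression (condensation) step can be watched numerically; p = 2 and p = 1 below are sample exponents.

```python
# Partial sums of sum 1/n^p versus the condensed sums
# t_k = sum_{j<=k} 2^j (1/2^j)^p.  For p = 2 the condensed series is
# geometric (bounded); for p = 1 every condensed term equals 1.
def s(n, p):
    return sum(1 / m ** p for m in range(1, n + 1))

def t(k, p):
    return sum(2 ** j * (1 / 2 ** j) ** p for j in range(k + 1))

bounded = t(40, 2)     # tends to 2 as k grows
divergent = t(40, 1)   # = 41: grows without bound
```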
(b) If p > 1,
Σ_{n=2}^∞ 1/(n (log n)^p)   (2.14)
converges; if p ≤ 1, the series diverges. Here "log n" denotes the logarithm to base e.
If p < 0, 1/(n(log n)^p) > 1/n and divergence follows by comparison with the harmonic series. Now let
p > 0. By Lemma 1.23 (b), log n < log(n + 1). Hence (n(log n)^p) increases and 1/(n(log n)^p)
decreases; we can apply Proposition 2.23 to (2.14). This leads us to the series
Σ_{k=1}^∞ 2^k · 1/(2^k (log 2^k)^p) = Σ_{k=1}^∞ 1/(k log 2)^p = 1/(log 2)^p Σ_{k=1}^∞ 1/k^p ,
and the assertion follows from example (a).
This procedure can evidently be continued. For instance,
Σ_{n=3}^∞ 1/(n log n log log n) diverges, whereas Σ_{n=3}^∞ 1/(n log n (log log n)²) converges.
2.3.4 The Number e
Leonhard Euler (Basel 1707 – St. Petersburg 1783) was one of the greatest mathematicians.
He made contributions to Number Theory, Ordinary Differential Equations, Calculus of Variations,
Astronomy, and Mechanics. Fermat (1635) claimed that all numbers of the form fn = 2^(2^n) + 1,
n ∈ N, are prime numbers. This is obviously true for the first 5 numbers (3, 5, 17, 257, 65537).
Euler showed that 641 | 2^32 + 1. In fact, it is open whether any other element fn is a
prime number. Euler showed that the equation x³ + y³ = z³ has no solution in positive
integers x, y, z. This is a special case of Fermat's last theorem. It is known that the limit
γ = lim_{n→∞} (1 + 1/2 + 1/3 + · · · + 1/n − log n)
exists and gives a finite number γ, the so-called Euler constant. It is not known whether γ is
rational or not. Further, the Euler numbers Er play a role in calculating the series
Σn (−1)^n/(2n + 1)^r. Soon we will speak about the Euler formula
e^(ix) = cos x + i sin x. More about the life and work of famous mathematicians is to be found at
www-history.mcs.st-andrews.ac.uk/
We define
e := Σ_{n=0}^∞ 1/n! ,   (2.15)
where 0! = 1! = 1 by definition. Since
sn = 1 + 1 + 1/(1·2) + 1/(1·2·3) + · · · + 1/(1·2···n) < 1 + 1 + 1/2 + 1/2² + · · · + 1/2^(n−1) < 3,
the series converges (by comparison with the geometric series with q = 1/2) and the definition
makes sense. In fact, the series converges very rapidly and allows us to compute e with
great accuracy. It is of interest to note that e can also be defined by means of another limit process.
e is called the Euler number.
Proposition 2.24
e = lim_{n→∞} (1 + 1/n)^n .   (2.16)
Proof. Let
sn = Σ_{k=0}^n 1/k! ,   tn = (1 + 1/n)^n .
By the binomial theorem,
tn = 1 + n · 1/n + n(n−1)/2! · 1/n² + n(n−1)(n−2)/3! · 1/n³ + · · · + n(n−1)···1/n! · 1/n^n
= 1 + 1 + 1/2! (1 − 1/n) + 1/3! (1 − 1/n)(1 − 2/n) + · · ·
+ 1/n! (1 − 1/n)(1 − 2/n) · · · (1 − (n−1)/n)
≤ 1 + 1 + 1/2! + 1/3! + · · · + 1/n! .
Hence, tn ≤ sn , so that by Proposition 2.16
lim sup_{n→∞} tn ≤ lim sup_{n→∞} sn = lim_{n→∞} sn = e.   (2.17)
Next, if n ≥ m,
tn ≥ 1 + 1 + 1/2! (1 − 1/n) + · · · + 1/m! (1 − 1/n) · · · (1 − (m−1)/n).
Let n → ∞, keeping m fixed; again by Proposition 2.16 we get
lim inf_{n→∞} tn ≥ 1 + 1 + 1/2! + · · · + 1/m! = sm .
Letting m → ∞, we finally get
e ≤ lim inf_{n→∞} tn .   (2.18)
The proposition follows from (2.17) and (2.18).
The rapidity with which the series Σ 1/n! converges can be estimated as follows:
e − sn = 1/(n+1)! + 1/(n+2)! + · · ·
< 1/(n+1)! (1 + 1/(n+1) + 1/(n+1)² + · · ·) = 1/(n+1)! · 1/(1 − 1/(n+1)) = 1/(n! n),
so that
0 < e − sn < 1/(n! n).   (2.19)
We use the preceding inequality to compute e. For n = 9 we find
s9 = 1 + 1 + 1/2 + 1/6 + 1/24 + 1/120 + 1/720 + 1/5040 + 1/40320 + 1/362880 = 2.718281526...   (2.20)
By (2.19),
e − s9 < 1/(9! · 9) < 3.1/10⁷,
so that the first six digits of e in (2.20) are correct.
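Both the value s9 and the error bound (2.19) are easy to reproduce numerically:

```python
import math

# Partial sums s_n of e = sum 1/n! and the bound 0 < e - s_n < 1/(n! n).
def s(n):
    return sum(1 / math.factorial(k) for k in range(n + 1))

err = math.e - s(9)
bound = 1 / (math.factorial(9) * 9)    # = 1/3265920 < 3.1/10^7
```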
Example 2.11
(a) lim_{n→∞} (1 − 1/n)^n = lim_{n→∞} ((n − 1)/n)^n = lim_{n→∞} 1/(1 + 1/(n−1))^n
= lim_{n→∞} 1/((1 + 1/(n−1))^(n−1) · (1 + 1/(n−1))) = 1/e.
(b) lim_{n→∞} ((3n + 1)/(3n − 1))^(4n) = lim_{n→∞} ((3n + 1)/(3n))^(4n) · lim_{n→∞} ((3n)/(3n − 1))^(4n)
= lim_{n→∞} ((1 + 1/(3n))^(3n))^(4/3) · lim_{n→∞} ((1 + 1/(3n − 1))^(3n−1))^(4/3) = e^(4/3) · e^(4/3) = e^(8/3).
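Both limits can be checked numerically; n = 10^6 is an arbitrary large index (the convergence is only of order 1/n):

```python
import math

n = 10 ** 6
a = (1 - 1 / n) ** n                         # tends to 1/e
b = ((3 * n + 1) / (3 * n - 1)) ** (4 * n)   # tends to e^(8/3)
```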
Proposition 2.25 e is irrational.
Proof. Suppose e is rational, say e = p/q with positive integers p and q. By (2.19),
0 < q!(e − sq ) < 1/q.   (2.21)
By our assumption, q! e is an integer. Since
q! sq = q! (1 + 1 + 1/2! + · · · + 1/q!)
is also an integer, we see that q!(e − sq ) is an integer. Since q ≥ 1, (2.21) implies the existence
of an integer between 0 and 1, which is absurd.
2.3.5 The Root and the Ratio Tests
Theorem 2.26 (Root Test) Given Σ an , put α = lim sup_{n→∞} | an |^(1/n). Then
(a) if α < 1, Σ an converges;
(b) if α > 1, Σ an diverges;
(c) if α = 1, the test gives no information.
Proof. (a) If α < 1, choose β such that α < β < 1, and an integer n0 such that
| an |^(1/n) < β
for n ≥ n0 (such an n0 exists since α is the supremum of the limit set of (| an |^(1/n))). That is,
n ≥ n0 implies
| an | < β^n .
Since 0 < β < 1, Σ β^n converges. Convergence of Σ an now follows from the comparison
test.
(b) If α > 1, there is a subsequence (ank ) such that | ank |^(1/nk) → α. Hence | an | > 1 for infinitely
many n, so that the necessary condition for convergence, an → 0, fails.
(c) Consider the series Σ 1/n and Σ 1/n². For each of these series α = 1, but the first
diverges while the second converges.
Remark. (a) Σ an converges if there exists q < 1 such that | an |^(1/n) ≤ q for almost all n.
(b) Σ an diverges if | an | ≥ 1 for infinitely many n.
Theorem 2.27 (Ratio Test) The series Σ an
(a) converges if lim sup_{n→∞} | an+1 /an | < 1,
(b) diverges if | an+1 /an | ≥ 1 for all but finitely many n.
In place of (b) one can also use the (weaker) statement:
(b′) Σ an diverges if lim inf_{n→∞} | an+1 /an | > 1.
Indeed, if (b′) is satisfied, almost all elements of the sequence (| an+1 /an |) are ≥ 1.
Corollary 2.28 The series Σ an
(a) converges if lim sup_{n→∞} | an+1 /an | < 1,
(b) diverges if lim inf_{n→∞} | an+1 /an | > 1.
Proof of Theorem 2.27. If condition (a) holds, we can find β < 1 and an integer m such that
n ≥ m implies
| an+1 /an | < β.
In particular,
| am+1 | < β | am | ,   | am+2 | < β | am+1 | < β² | am | ,   . . . ,   | am+p | < β^p | am | .
That is,
| an | < (| am |/β^m) β^n
for n ≥ m, and (a) follows from the comparison test, since Σ β^n converges. If | an+1 | ≥ | an |
for all n ≥ n0 , the condition an → 0 does not hold, and (b) follows.
Remark 2.2 Homework 7.5 shows that in (b) “all but finitely many” cannot be replaced by the
weaker assumption “infinitely many.”
Example 2.12 (a) The series Σ_{n=0}^∞ n²/2^n converges since, for n ≥ 3,
| an+1 /an | = ((n + 1)²/2^(n+1)) · (2^n/n²) = (1/2)(1 + 1/n)² ≤ (1/2)(1 + 1/3)² = 8/9 < 1.
(b) Consider the series
1/2 + 1 + 1/8 + 1/4 + 1/32 + 1/16 + 1/128 + 1/64 + · · ·
= 1/2¹ + 1/2⁰ + 1/2³ + 1/2² + 1/2⁵ + 1/2⁴ + · · · .
Here lim inf_{n→∞} an+1 /an = 1/8 and lim sup_{n→∞} an+1 /an = 2, but lim_{n→∞} an^(1/n) = 1/2.
Indeed, a2n = 1/2^(2n−2) and a2n+1 = 1/2^(2n+1) yield
a2n+1 /a2n = 1/8,   a2n /a2n−1 = 2.
The root test indicates convergence; the ratio test does not apply.
(c) For Σ 1/n and Σ 1/n² both the ratio and the root test do not apply since both (an+1 /an ) and
(an^(1/n)) converge to 1.
The ratio test is frequently easier to apply than the root test. However, the root test has wider
scope.
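The difference in scope is visible numerically for Example 2.12 (b); a minimal sketch:

```python
# Example 2.12 (b): a_{2n} = 1/2^(2n-2), a_{2n+1} = 1/2^(2n+1).
# The ratios oscillate between 1/8 and 2, while the nth roots settle
# near 1/2 -- the root test applies, the ratio test does not.
def a(n):                                   # n = 1, 2, 3, ...
    return 1 / 2 ** (n - 2) if n % 2 == 0 else 1 / 2 ** n

ratios = [a(n + 1) / a(n) for n in range(1, 50)]
roots = [a(n) ** (1 / n) for n in range(1, 50)]
```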
Remark 2.3 For any sequence (cn ) of positive real numbers,
lim inf_{n→∞} cn+1 /cn ≤ lim inf_{n→∞} cn^(1/n) ≤ lim sup_{n→∞} cn^(1/n) ≤ lim sup_{n→∞} cn+1 /cn .
For the proof, see [Rud76, 3.37 Theorem]. In particular, if lim cn+1 /cn exists, then lim cn^(1/n)
also exists and both limits coincide.
Proposition 2.29 (Leibniz criterion) Let Σ bn = Σ (−1)^(n+1) an be an alternating series, that is,
(an ) is a decreasing sequence of positive numbers a1 ≥ a2 ≥ · · · ≥ 0. If lim an = 0,
then Σ bn converges.
Proof. The proof is quite the same as in Example 2.8 (b). For the partial sums sn of Σ bn we find
| sn − sm | ≤ am+1
if n ≥ m. Since (an ) tends to 0, the Cauchy criterion applies to (sn ). Hence Σ bn is
convergent.
2.3.6 Absolute Convergence
The series Σ an is said to converge absolutely if the series Σ | an | converges.
Proposition 2.30 If Σ an converges absolutely, then Σ an converges.
Proof. The assertion follows from the inequality
| Σ_{k=m}^n ak | ≤ Σ_{k=m}^n | ak |
plus the Cauchy criterion.
Remarks 2.4 For series with positive terms, absolute convergence is the same as convergence.
If Σ an converges but Σ | an | diverges, we say that Σ an converges nonabsolutely. For
instance, Σ (−1)^(n+1)/n converges nonabsolutely. The comparison test, as well as the root and
the ratio tests, is really a test for absolute convergence and cannot give any information about
nonabsolutely convergent series.
We shall see that we may operate with absolutely convergent series very much as with finite
sums: we may multiply them, and we may change the order in which the additions are carried out
without affecting the sum of the series. For nonabsolutely convergent series this is no
longer true, and more care has to be taken when dealing with them.
Without proof we mention the fact that one can multiply absolutely convergent series; for the
proof, see [Rud76, Theorem 3.50].
Proposition 2.31 If Σ an converges absolutely with Σ_{n=0}^∞ an = A, Σ bn converges with
Σ_{n=0}^∞ bn = B, and
cn = Σ_{k=0}^n ak bn−k ,   n ∈ Z+ ,
then
Σ_{n=0}^∞ cn = AB.
2.3.7 Decimal Expansion of Real Numbers
Proposition 2.32 (a) Let α be a real number with 0 ≤ α < 1. Then there exists a sequence
(an ), an ∈ {0, 1, 2, . . . , 9}, such that
α = Σ_{n=1}^∞ an 10^(−n).   (2.22)
The sequence (an ) is called a decimal expansion of α.
(b) Given a sequence (an ), an ∈ {0, 1, . . . , 9}, there exists a real number α ∈ [0, 1] such
that
α = Σ_{n=1}^∞ an 10^(−n).
Proof. (b) Comparison with the geometric series yields
Σ_{n=1}^∞ an 10^(−n) ≤ 9 Σ_{n=1}^∞ 10^(−n) = (9/10) · 1/(1 − 1/10) = 1.
Hence the series Σ_{n=1}^∞ an 10^(−n) converges to some α ∈ [0, 1].
(a) Given α ∈ [0, 1), we use induction to construct a sequence (an ) with (2.22) and
sn ≤ α < sn + 10^(−n),   where   sn = Σ_{k=1}^n ak 10^(−k).
First, cut [0, 1) into 10 pieces Ij := [j/10, (j + 1)/10), j = 0, . . . , 9, of equal length. If α ∈ Ij ,
put a1 := j. Then
s1 = a1 /10 ≤ α < s1 + 1/10.
Suppose a1 , . . . , an are already constructed and
sn ≤ α < sn + 10^(−n).
Consider the intervals Ij := [sn + j/10^(n+1), sn + (j + 1)/10^(n+1)), j = 0, . . . , 9. There is exactly
one j such that α ∈ Ij . Put an+1 := j; then
sn + an+1 /10^(n+1) ≤ α < sn + (an+1 + 1)/10^(n+1), that is, sn+1 ≤ α < sn+1 + 10^(−n−1).
The induction step is complete. By construction | α − sn | < 10^(−n), that is, lim sn = α.
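The induction in the proof is effectively an algorithm for producing decimal digits. A direct transcription; the sample value α = 1/7 is an arbitrary choice:

```python
# Digit-by-digit construction from the proof: at step n+1 pick the
# subinterval I_j = [s_n + j/10^(n+1), s_n + (j+1)/10^(n+1)) containing alpha.
def digits(alpha, n):
    """First n decimal digits of alpha in [0, 1)."""
    ds, s = [], 0.0
    for k in range(1, n + 1):
        j = int((alpha - s) * 10 ** k)   # index j of the chosen subinterval
        ds.append(j)
        s += j / 10 ** k                 # maintains s_n <= alpha < s_n + 10^(-n)
    return ds

d = digits(1 / 7, 6)
```

For α = 1/7 = 0.142857... this returns the digits [1, 4, 2, 8, 5, 7].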
Remarks 2.5 (a) The proof shows that any real number α ∈ [0, 1) can be approximated by
rational numbers.
(b) The construction avoids decimal expansions of the form α = . . . a999 . . . , a < 9, and gives
instead α = . . . (a + 1)000 . . . . It gives a bijective correspondence between the real numbers of
the interval [0, 1) and the sequences (an ), an ∈ {0, 1, . . . , 9}, not ending with nines. The
sequence (an ) = (0, 1, 9, 9, . . . ), for example, still corresponds to the real number 0.0199 · · · = 0.02.
(c) It is not difficult to see that α ∈ [0, 1) is rational if and only if there exist positive integers
n0 and p such that n ≥ n0 implies an = an+p —the decimal expansion is periodic from n0 on.
2.3.8 Complex Sequences and Series
Almost all notions and theorems carry over from real sequences to complex sequences. For
example
A sequence (zn ) of complex numbers converges to z if for every (real) ε > 0 there
exists a positive integer n0 ∈ N such that n ≥ n0 implies
| z − zn | < ε.
The following proposition shows that convergence of a complex sequence can be reduced to the
convergence of two real sequences.
Proposition 2.33 The complex sequence (zn ) converges to some complex number z if and only
if the real sequences ( Re zn ) converges to Re z and the real sequence ( Im zn ) converges to
Im z.
Proof. Using the (complex) limit law lim(zn + c) = c + lim zn it is easy to see that we
can restrict ourselves to the case z = 0. Suppose first zn → 0. Proposition 1.20 (d) gives
| Re zn | ≤ | zn |. Hence Re zn tends to 0 as n → ∞. Similarly, | Im zn | ≤ | zn | and therefore
Im zn → 0.
Suppose now xn := Re zn → 0 and yn := Im zn → 0 as n goes to infinity. Since
| zn |² = xn² + yn², we have | zn |² → 0 as n → ∞; this implies zn → 0.
Since the complex field C is not an ordered field, all notions and propositions where the order
is involved do not make sense for complex series or they need modifications. The sandwich
theorem does not hold; there is no notion of monotonic sequences, upper and lower limits. But
still there are bounded sequences (| zn | ≤ C), limit points, subsequences, Cauchy sequences,
series, and absolute convergence. The following theorems are true for complex sequences, too:
Proposition/Lemma/Theorem 1, 2, 3, 9, 10, 12, 15, 17, 18
The Bolzano–Weierstraß Theorem for bounded complex sequences (zn ) can be proved by considering the real and the imaginary sequences ( Re zn ) and ( Im zn ) separately.
The comparison test for series now reads:
(a) If | an | ≤ C | bn | for some C > 0 and for almost all n ∈ N, and if Σ | bn |
converges, then Σ an converges.
(b) If | an | ≥ C | dn | for some C > 0 and for almost all n, and if Σ | dn | diverges,
then Σ an diverges.
The Cauchy criterion, the root, and the ratio tests are true for complex series as well. Propositions 19, 20, 26, 27, 28, 30, 31 are true for complex series.
2.3.9 Power Series
Definition 2.10 Given a sequence (cn ) of complex numbers, the series
Σ_{n=0}^∞ cn z^n   (2.23)
is called a power series. The numbers cn are called the coefficients of the series; z is a complex
number.
In general, the series will converge or diverge, depending on the choice of z. More precisely,
with every power series there is associated a circle with center 0, the circle of convergence, such
that (2.23) converges if z is in the interior of the circle and diverges if z is in the exterior. The
radius R of this disc of convergence is called the radius of convergence.
On the disc of convergence, a power series defines a function since it associates to each z with
| z | < R a complex number, namely the sum of the numerical series Σn cn z^n. For example,
Σ_{n=0}^∞ z^n defines the function f (z) = 1/(1 − z) for | z | < 1. If almost all coefficients cn are 0, say
cn = 0 for all n ≥ m + 1, the power series is a finite sum and the corresponding function is a
polynomial: Σ_{n=0}^∞ cn z^n = Σ_{n=0}^m cn z^n = c0 + c1 z + c2 z² + · · · + cm z^m.
Theorem 2.34 Given a power series Σ cn z^n, put
α = lim sup_{n→∞} | cn |^(1/n),   R = 1/α.   (2.24)
If α = 0, set R = +∞; if α = +∞, set R = 0. Then Σ cn z^n converges if | z | < R, and diverges if
| z | > R.
The behavior on the circle of convergence cannot be described so simply.
Proof. Put an = cn z^n and apply the root test:
lim sup_{n→∞} | an |^(1/n) = | z | lim sup_{n→∞} | cn |^(1/n) = | z |/R.
This gives convergence if | z | < R and divergence if | z | > R.
The nonnegative number R is called the radius of convergence.
Example 2.13 (a) The series Σ_{n=0}^m cn z^n has cn = 0 for almost all n. Hence α =
lim_{n→∞} | cn |^(1/n) = lim_{n→∞} 0 = 0 and R = +∞.
(b) The series Σ n^n z^n has R = 0.
(c) The series Σ z^n/n! has R = +∞. (In this case the ratio test is easier to apply than the root
test. Indeed,
α = lim_{n→∞} | cn+1 /cn | = lim_{n→∞} n!/(n + 1)! = lim_{n→∞} 1/(n + 1) = 0,
and therefore R = +∞.)
(d) The series Σ z^n has R = 1. If | z | = 1 it diverges since (z^n) does not tend to 0. This
generalizes the geometric series; formula (2.9) still holds if | q | < 1, for example
Σ_{n=2}^∞ 2 (i/3)^n = 2(i/3)²/(1 − i/3) = −(3 + i)/15.
(e) The series Σ z^n/n has R = 1. It diverges if z = 1. It converges for all other z with | z | = 1
(without proof).
(f) The series Σ z^n/n² has R = 1. It converges for all z with | z | = 1 by the comparison test,
since | z^n/n² | = 1/n².
2.3.10 Rearrangements
The generalized associative law for finite sums says that we can insert brackets without affecting
the sum, for example, ((a1 + a2 ) + (a3 + a4 )) = (a1 + (a2 + (a3 + a4 ))). We will see that a
similar statement holds for series:
Suppose that Σk ak is a convergent series and Σl bl is a series obtained from Σk ak by "inserting
brackets", for example
b1 + b2 + b3 + · · · = (a1 + a2 ) + (a3 + · · · + a10 ) + (a11 + a12 ) + · · ·
with b1 = a1 + a2 , b2 = a3 + · · · + a10 , b3 = a11 + a12 , and so on.
Then Σl bl converges and the sum is the same. If Σk ak diverges to +∞, the same is true
for Σl bl . However, divergence of Σ ak does not imply divergence of Σ bl in general, since
1 − 1 + 1 − 1 + 1 − 1 + · · · diverges but (1 − 1) + (1 − 1) + · · · converges. For the proof
let sn = Σ_{k=1}^n ak and tm = Σ_{l=1}^m bl . By construction, tm = snm for a suitable subsequence (snm )
of the partial sums of Σk ak . Convergence (proper or improper) of (sn ) implies convergence
(proper or improper) of any subsequence. Hence, Σl bl converges.
For finite sums, the generalized commutative law holds:
a1 + a2 + a3 + a4 = a2 + a4 + a1 + a3 ;
that is, rearranging the summands does not affect the sum. We will see in Example 2.14 below
that this is not true for arbitrary series, but it is true for absolutely convergent series (see
Proposition 2.36 below).
Definition 2.11 Let σ : N → N be a bijective mapping; that is, in the sequence (σ(1), σ(2), . . . )
every positive integer appears once and only once. Putting
a′n = aσ(n) ,   (n = 1, 2, . . . ),
we say that Σ a′n is a rearrangement of Σ an .
If (sn ) and (s′n ) are the partial sums of Σ an and of a rearrangement Σ a′n , it is easily
seen that, in general, these two sequences consist of entirely different numbers. We are led to
the problem of determining under what conditions all rearrangements of a convergent series
converge and whether the sums are necessarily the same.
Example 2.14 (a) Consider the convergent series
Σ_{n=1}^∞ (−1)^(n+1)/n = 1 − 1/2 + 1/3 − + · · ·   (2.25)
and one of its rearrangements
1 − 1/2 − 1/4 + 1/3 − 1/6 − 1/8 + 1/5 − 1/10 − 1/12 + · · · .   (2.26)
If s is the sum of (2.25), then s > 0 since
(1 − 1/2) + (1/3 − 1/4) + · · · > 0.
We will show that (2.26) converges to s′ = s/2. Namely,
s′ = Σ a′n = (1 − 1/2) − 1/4 + (1/3 − 1/6) − 1/8 + (1/5 − 1/10) − 1/12 + · · ·
= 1/2 − 1/4 + 1/6 − 1/8 + 1/10 − 1/12 + · · ·
= (1/2)(1 − 1/2 + 1/3 − 1/4 + 1/5 − · · ·) = s/2.
Since s ≠ 0, s′ ≠ s. Hence there exist rearrangements which converge, however to a different
limit.
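The two sums can be compared numerically. The rearranged series is summed here in blocks 1/(2m−1) − 1/(4m−2) − 1/(4m); the truncation lengths are arbitrary.

```python
# Partial sums of the alternating harmonic series (2.25) and of its
# rearrangement (2.26); the second sum is about half the first.
M = 20000
s_orig = sum((-1) ** (n + 1) / n for n in range(1, 2 * M + 1))
s_rearr = sum(1 / (2 * m - 1) - 1 / (4 * m - 2) - 1 / (4 * m)
              for m in range(1, M + 1))
```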
(b) Consider the following rearrangement of the series (2.25):
Σ a′n = 1 − 1/2 + 1/3 − 1/4 + (1/5 + 1/7) − 1/6 + (1/9 + 1/11 + 1/13 + 1/15) − 1/8 + · · ·
+ (1/(2^n + 1) + 1/(2^n + 3) + · · · + 1/(2^(n+1) − 1)) − 1/(2n + 2) + · · · .
Since for every positive integer n ≥ 10
1/(2^n + 1) + 1/(2^n + 3) + · · · + 1/(2^(n+1) − 1) − 1/(2n + 2) > 2^(n−1) · 1/2^(n+1) − 1/(2n + 2)
> 1/4 − 1/(2n + 2) > 1/5,
the rearranged series diverges to +∞.
Without proof (see [Rud76, 3.54 Theorem]) we state the following surprising theorem. It
shows (together with Proposition 2.36) that the absolute convergence of a series is necessary
and sufficient for every rearrangement to be convergent (to the same limit).
Proposition 2.35 Let Σ an be a series of real numbers which converges, but not absolutely.
Suppose −∞ ≤ α ≤ β ≤ +∞. Then there exists a rearrangement Σ a′n with partial sums s′n
such that
lim inf_{n→∞} s′n = α,   lim sup_{n→∞} s′n = β.
Proposition 2.36 If Σ an is a series of complex numbers which converges absolutely, then
every rearrangement of Σ an converges, and they all converge to the same sum.
Proof. Let Σ a′n be a rearrangement of Σ an with partial sums s′n . Given ε > 0, by the Cauchy
criterion for the series Σ | an | there exists n0 ∈ N such that n ≥ m ≥ n0 implies
Σ_{k=m}^n | ak | < ε.   (2.27)
Now choose p such that the integers 1, 2, . . . , n0 are all contained in the set {σ(1), σ(2), . . . , σ(p)}:
{1, 2, . . . , n0 } ⊆ {σ(1), σ(2), . . . , σ(p)}.
Then, if n ≥ p, the numbers a1 , a2 , . . . , an0 cancel in the difference sn − s′n ; the remaining
terms ±ak all have indices k > n0 , so that by (2.27)
| sn − s′n | = | Σ_{k=1}^n ak − Σ_{k=1}^n aσ(k) | ≤ Σ_{k=n0+1}^N | ak | < ε
for a sufficiently large N. Hence (s′n ) converges to the same sum as (sn ).
The same argument shows that Σ a′n also converges absolutely.
2.3.11 Products of Series
If we multiply two finite sums a1 + a2 + · · · + an and b1 + b2 + · · · + bm by the distributive
law, we form all products ai bj , put them into a sequence p0 , p1 , . . . , ps , s = mn, and add up
p0 + p1 + p2 + · · · + ps . This method can be generalized to series a0 + a1 + · · · and b0 + b1 + · · · .
Surely we can form all products ai bj , arrange them in a sequence p0 , p1 , p2 , . . . , and
form the product series p0 + p1 + · · · . For example, consider the table

a0 b0  a0 b1  a0 b2  · · ·        p0  p1  p3  · · ·
a1 b0  a1 b1  a1 b2  · · ·        p2  p4  p7  · · ·
a2 b0  a2 b1  a2 b2  · · ·        p5  p8  p12 · · ·
 ...    ...    ...                ...  ...  ...

and the diagonal enumeration of the products. The question is: under which conditions on Σ an
and Σ bn does the product series converge, and does its sum depend on the arrangement of the
products ai bk ?
Proposition 2.37 If both series Σ_{k=0}^∞ ak and Σ_{k=0}^∞ bk converge absolutely with A = Σ_{k=0}^∞ ak
and B = Σ_{k=0}^∞ bk , then any of their product series Σ pk converges absolutely and Σ_{k=0}^∞ pk = AB.
Proof. For the nth partial sum of any product series Σ_{k=0}^n | pk | we have
| p0 | + | p1 | + · · · + | pn | ≤ (| a0 | + · · · + | am |) · (| b0 | + · · · + | bm |),
if m is sufficiently large. A fortiori,
| p0 | + | p1 | + · · · + | pn | ≤ Σ_{k=0}^∞ | ak | · Σ_{k=0}^∞ | bk | .
That is, any series Σ_{k=0}^∞ | pk | is bounded and hence convergent by Lemma 2.19 (3). By
Proposition 2.36 all product series converge to the same sum s = Σ_{k=0}^∞ pk . Consider now the very
special product series Σ qn whose partial sums consist of the sums of the elements in the
upper left squares. Then
q1 + q2 + · · · + q(n+1)² = (a0 + a1 + · · · + an )(b0 + · · · + bn ),
which converges to AB as n → ∞. Hence s = AB.
Arranging the elements a_i b_j as above in a diagonal array and summing up the elements on the nth diagonal, c_n = a_0 b_n + a_1 b_{n−1} + · · · + a_n b_0, we obtain the Cauchy product

∑_{n=0}^{∞} c_n = ∑_{n=0}^{∞} (a_0 b_n + a_1 b_{n−1} + · · · + a_n b_0).

Corollary 2.38 If both series ∑_{k=0}^{∞} a_k and ∑_{k=0}^{∞} b_k converge absolutely with A = ∑_{k=0}^{∞} a_k and B = ∑_{k=0}^{∞} b_k, their Cauchy product ∑_{k=0}^{∞} c_k converges absolutely and ∑_{k=0}^{∞} c_k = AB.
Example 2.15 We compute the Cauchy product of two geometric series (| p | < 1, | q | < 1, p ≠ q):

(1 + p + p^2 + · · · )(1 + q + q^2 + · · · )
= 1 + (p + q) + (p^2 + pq + q^2) + (p^3 + p^2 q + pq^2 + q^3) + · · ·
= (p − q)/(p − q) + (p^2 − q^2)/(p − q) + (p^3 − q^3)/(p − q) + · · · = 1/(p − q) · ∑_{n=1}^{∞} (p^n − q^n)
= 1/(p − q) · (p/(1 − p) − q/(1 − q)) = 1/(p − q) · (p(1 − q) − q(1 − p))/((1 − p)(1 − q)) = 1/(1 − p) · 1/(1 − q).
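A quick numeric check of this computation (an illustrative sketch, not part of the notes; the values of p and q are arbitrary with | p |, | q | < 1):

```python
p, q = 0.3, -0.5
N = 60  # the geometric tails are negligible at this depth

# c_n = p^n + p^(n-1) q + ... + q^n, the n-th Cauchy-product coefficient
c = [sum(p ** k * q ** (n - k) for k in range(n + 1)) for n in range(N)]

closed_form = 1 / ((1 - p) * (1 - q))
assert abs(sum(c) - closed_form) < 1e-12
```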
Cauchy Product of Power Series
In the case of power series the Cauchy product is appropriate since it is again a power series (which is not the case for other types of product series). Indeed, the Cauchy product of ∑_{k=0}^{∞} a_k z^k and ∑_{k=0}^{∞} b_k z^k has the general element

∑_{k=0}^{n} a_k z^k · b_{n−k} z^{n−k} = z^n ∑_{k=0}^{n} a_k b_{n−k},

such that

∑_{k=0}^{∞} a_k z^k · ∑_{k=0}^{∞} b_k z^k = ∑_{n=0}^{∞} (a_0 b_n + · · · + a_n b_0) z^n.
Corollary 2.39 Suppose that ∑_n a_n z^n and ∑_n b_n z^n are power series with positive radii of convergence R_1 and R_2, respectively. Let R = min{R_1, R_2}. Then the Cauchy product ∑_{n=0}^{∞} c_n z^n, c_n = a_0 b_n + · · · + a_n b_0, converges absolutely for | z | < R and

∑_{n=0}^{∞} a_n z^n · ∑_{n=0}^{∞} b_n z^n = ∑_{n=0}^{∞} c_n z^n,   | z | < R.

This follows from the previous corollary and the fact that both series converge absolutely for | z | < R.
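Since the Cauchy-product coefficients are the convolution c_n = a_0 b_n + · · · + a_n b_0, the corollary can be sanity-checked numerically. The sketch below (illustrative, not from the notes) uses a_k = b_k = 1/k!, so both factors are the exponential series and c_n must equal 2^n/n!:

```python
import math

N = 20
a = [1 / math.factorial(k) for k in range(N)]

# convolution of the coefficient sequences
c = [sum(a[k] * a[n - k] for k in range(n + 1)) for n in range(N)]

# binomial theorem: sum_k 1/(k!(n-k)!) = 2^n/n!
for n in range(N):
    assert abs(c[n] - 2 ** n / math.factorial(n)) < 1e-12

# hence the product series represents exp(2z); check at a sample point
z = 0.7
assert abs(sum(c[n] * z ** n for n in range(N)) - math.exp(2 * z)) < 1e-9
```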
Example 2.16 (a)

∑_{n=0}^{∞} (n + 1) z^n = 1/(1 − z)^2,   | z | < 1.

Indeed, consider the Cauchy product of ∑_{n=0}^{∞} z^n = 1/(1 − z), | z | < 1, with itself. Since a_n = b_n = 1, c_n = ∑_{k=0}^{n} a_k b_{n−k} = ∑_{k=0}^{n} 1 · 1 = n + 1, the claim follows.
(b)

(z + z^2) ∑_{n=0}^{∞} z^n = ∑_{n=0}^{∞} (z^{n+1} + z^{n+2}) = ∑_{n=1}^{∞} z^n + ∑_{n=2}^{∞} z^n
= z + 2 ∑_{n=2}^{∞} z^n = z + 2z^2 + 2z^3 + 2z^4 + · · · = z + 2z^2/(1 − z) = (z + z^2)/(1 − z).
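Both identities of Example 2.16 are easy to probe numerically; the following lines (an illustrative sketch) evaluate truncated series at a sample point z = 0.3:

```python
z = 0.3
geom = sum(z ** n for n in range(200))                # ~ 1/(1-z)
weighted = sum((n + 1) * z ** n for n in range(200))  # ~ 1/(1-z)^2

assert abs(weighted - 1 / (1 - z) ** 2) < 1e-12                   # part (a)
assert abs((z + z ** 2) * geom - (z + z ** 2) / (1 - z)) < 1e-12  # part (b)
```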
Chapter 3
Functions and Continuity
This chapter is devoted to another central notion in analysis: the notion of a continuous function. We will see that sums, products, quotients, and compositions of continuous functions are continuous. If nothing else is specified, D denotes a finite union of intervals.
Definition 3.1 Let D ⊂ R be a subset of R. A function is a map f : D → R.
(a) The set D is called the domain of f ; we write D = D(f ).
(b) If A ⊆ D, f (A) := {f (x) | x ∈ A} is called the image of A under f .
The function f ↾A : A → R given by f ↾A(a) = f (a), a ∈ A, is called the restriction of f to A.
(c) If B ⊂ R, we call f −1 (B) := {x ∈ D | f (x) ∈ B} the preimage of B under f .
(d) The graph of f is the set graph(f ) := {(x, f (x)) | x ∈ D}.
Later we will consider functions in a wider sense: from the complex numbers into the complex numbers, and from F^n into F^m where F = R or F = C.
We say that a function f : D → R is bounded if f (D) ⊂ R is a bounded set of real numbers, i. e. there is a C > 0 such that | f (x) | ≤ C for all x ∈ D. We say that f is bounded above (resp. bounded below) if there exists C ∈ R such that f (x) < C (resp. f (x) > C) for all x in the domain of f .
Example 3.1 (a) Power series (with radius of convergence R > 0), polynomials and rational
functions are the most important examples of functions.
Let c ∈ R. Then f (x) = c, f : R → R, is called the constant function.
(b) Properties of the functions change drastically if we change the domain or the image set.
Let f : R → R, g : R → R_+, k : R_+ → R, and h : R_+ → R_+ be the functions given by x ↦ x^2. Then g is surjective, k is injective, h is bijective, and f is neither injective nor surjective. Obviously, f ↾R_+ = k and g↾R_+ = h.
(c) Let f (x) = ∑_{n=0}^{∞} x^n, f : (−1, 1) → R, and h(x) = 1/(1 − x), h : R \ {1} → R. Then h↾(−1, 1) = f .
[Figure: the graphs of the constant function f(x) = c, the identity f(x) = x, and the absolute value function f(x) = |x|.]
3.1 Limits of a Function
Definition 3.2 (ε-δ-definition) Let (a, b) be a finite or infinite interval and x_0 ∈ (a, b). Let f : (a, b) \ {x_0} → R be a real-valued function. We call A ∈ R the limit of f at x_0 (“the limit of f (x) is A as x approaches x_0”; “f approaches A near x_0”) if the following is satisfied:
For any ε > 0 there exists δ > 0 such that x ∈ (a, b) and 0 < | x − x_0 | < δ imply | f (x) − A | < ε.
We write lim_{x→x_0} f (x) = A. Roughly speaking, if x is close to x_0, f (x) must be close to A.
[Figure: the ε-δ picture: for arguments within δ of x_0 the values of f stay within ε of A.]
Using quantifiers, lim_{x→x_0} f (x) = A reads as

∀ ε > 0 ∃ δ > 0 ∀ x ∈ (a, b) : 0 < | x − x_0 | < δ =⇒ | f (x) − A | < ε.
Note that the formal negation of lim_{x→x_0} f (x) = A is

∃ ε > 0 ∀ δ > 0 ∃ x ∈ (a, b) : 0 < | x − x_0 | < δ and | f (x) − A | ≥ ε.
Proposition 3.1 (sequences definition) Let f and x_0 be as above. Then lim_{x→x_0} f (x) = A if and only if for every sequence (x_n) with x_n ∈ (a, b), x_n ≠ x_0 for all n, and lim_{n→∞} x_n = x_0 we have lim_{n→∞} f (x_n) = A.
Proof. Suppose lim_{x→x_0} f (x) = A, and x_n → x_0 where x_n ≠ x_0 for all n. Given ε > 0 we find δ > 0 such that | f (x) − A | < ε if 0 < | x − x_0 | < δ. Since x_n → x_0, there is a positive integer n_0 such that n ≥ n_0 implies | x_n − x_0 | < δ. Therefore n ≥ n_0 implies | f (x_n) − A | < ε. That is, lim_{n→∞} f (x_n) = A.
Suppose to the contrary that the condition of the proposition is fulfilled but lim_{x→x_0} f (x) ≠ A. Then there is some ε > 0 such that for every δ = 1/n, n ∈ N, there is an x_n ∈ (a, b) such that 0 < | x_n − x_0 | < 1/n but | f (x_n) − A | ≥ ε. We have constructed a sequence (x_n), x_n ≠ x_0 and x_n → x_0 as n → ∞, such that lim_{n→∞} f (x_n) ≠ A, which contradicts our assumption. Hence lim_{x→x_0} f (x) = A.
Example. lim_{x→1} (x + 3) = 4. Indeed, given ε > 0 choose δ = ε. Then | x − 1 | < δ implies | (x + 3) − 4 | < δ = ε.
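The ε-δ game of this example can be played by machine; the sketch below (illustrative only) checks that δ = ε works for a few ε and sample points x with 0 < | x − 1 | < δ:

```python
f = lambda x: x + 3
for eps in [1.0, 0.1, 1e-3, 1e-6]:
    delta = eps
    # sample points with 0 < |x - 1| < delta
    xs = [1 + t * delta for t in (-0.999, -0.5, 0.5, 0.999)]
    assert all(abs(f(x) - 4) < eps for x in xs)
```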
3.1.1 One-sided Limits, Infinite Limits, and Limits at Infinity
Definition 3.3 (a) We write

lim_{x→x_0+0} f (x) = A

if for all sequences (x_n) with x_n > x_0 and lim_{n→∞} x_n = x_0 we have lim_{n→∞} f (x_n) = A. Sometimes we use the notation f (x_0 + 0) in place of lim_{x→x_0+0} f (x). We call f (x_0 + 0) the right-hand limit of f at x_0, or we say “A is the limit of f as x approaches x_0 from above (from the right).”
Similarly one defines the left-hand limit of f at x_0, lim_{x→x_0−0} f (x) = A, with x_n < x_0 in place of x_n > x_0. Sometimes we use the notation f (x_0 − 0).
(b) We write

lim_{x→+∞} f (x) = A

if for all sequences (x_n) with lim_{n→∞} x_n = +∞ we have lim_{n→∞} f (x_n) = A. Sometimes we use the notation f (+∞). In a similar way we define lim_{x→−∞} f (x) = A.
(c) Finally, the notions of (a), (b), and Definition 3.2 still make sense in case A = +∞ and A = −∞. For example,

lim_{x→x_0−0} f (x) = −∞

if for all sequences (x_n) with x_n < x_0 and lim_{n→∞} x_n = x_0 we have lim_{n→∞} f (x_n) = −∞.
Remark 3.1 All notions in the above definition can be given in ε-δ, ε-D, E-δ, or E-D language using inequalities. For example, lim_{x→x_0−0} f (x) = −∞ if and only if

∀ E > 0 ∃ δ > 0 ∀ x ∈ D(f ) : 0 < x_0 − x < δ =⇒ f (x) < −E.

For example, we show that lim_{x→0−0} 1/x = −∞. To E > 0 choose δ = 1/E. Then 0 < −x < δ = 1/E implies 0 < E < −1/x and hence f (x) < −E. This proves the claim.
Similarly, lim_{x→+∞} f (x) = +∞ if and only if

∀ E > 0 ∃ D > 0 ∀ x ∈ D(f ) : x > D =⇒ f (x) > E.

The proofs of the equivalence of the ε-δ definitions and the sequence definitions are along the lines of Proposition 3.1.
For example, lim_{x→+∞} x^2 = +∞. To E > 0 choose D = √E. Then x > D implies x > √E; thus x^2 > E.
Example 3.2 (a) lim_{x→+∞} 1/x = 0. For, let ε > 0; choose D = 1/ε. Then x > D = 1/ε implies 0 < 1/x < ε. This proves the claim.
(b) Consider the entier function f (x) = [x], defined in Example 2.6 (b). If n ∈ Z, lim_{x→n−0} f (x) = n − 1, whereas lim_{x→n+0} f (x) = n.

[Figure: the graph of f(x) = [x].]

Proof. We use the ε-δ definition of the one-sided limits to prove the first claim. Let ε > 0. Choose δ = 1/2; then 0 < n − x < 1/2 implies n − 1/2 < x < n and therefore f (x) = n − 1. In particular, | f (x) − (n − 1) | = 0 < ε. Similarly one proves lim_{x→n+0} f (x) = n.
Since the one-sided limits are different, lim_{x→n} f (x) does not exist.
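A numeric sketch of the two one-sided limits of the entier function (illustrative, with the integer n = 3 as an arbitrary sample):

```python
import math

n = 3
# approach n from below: [x] freezes at n - 1
from_left = [math.floor(n - 10.0 ** -k) for k in range(1, 10)]
# approach n from above: [x] freezes at n
from_right = [math.floor(n + 10.0 ** -k) for k in range(1, 10)]

assert all(v == n - 1 for v in from_left)
assert all(v == n for v in from_right)
```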
Definition 3.4 Suppose we are given two functions f and g, both defined on (a, b) \ {x_0}. By f + g we mean the function which assigns to each point x ≠ x_0 of (a, b) the number f (x) + g(x). Similarly, we define the difference f − g, the product f g, and the quotient f /g, with the understanding that the quotient is defined only at those points x at which g(x) ≠ 0.
Proposition 3.2 Suppose that f and g are functions defined on (a, b) \ {x_0}, a < x_0 < b, and lim_{x→x_0} f (x) = A, lim_{x→x_0} g(x) = B, α, β ∈ R. Then
(a) lim_{x→x_0} f (x) = A′ implies A′ = A.
(b) lim_{x→x_0} (αf + βg)(x) = αA + βB;
(c) lim_{x→x_0} (f g)(x) = AB;
(d) lim_{x→x_0} (f /g)(x) = A/B, if B ≠ 0;
(e) lim_{x→x_0} | f (x) | = | A |.
Proof. In view of Proposition 3.1, all these assertions follow immediately from the analogous properties of sequences, see Proposition 2.3. As an example, we show (c). Let (x_n), x_n ≠ x_0, be a sequence tending to x_0. By assumption, lim_{n→∞} f (x_n) = A and lim_{n→∞} g(x_n) = B. By Proposition 2.3, lim_{n→∞} f (x_n)g(x_n) = AB, that is, lim_{n→∞} (f g)(x_n) = AB. By Proposition 3.1, lim_{x→x_0} (f g)(x) = AB.
Remark 3.2 The proposition remains true if we replace (at the same time in all places) x → x0
by x → x0 + 0, x → x0 − 0, x → +∞, or x → −∞. Moreover we can replace A or B by +∞
or by −∞ provided the right members of (b), (c), (d) and (e) are defined.
Note that +∞ + (−∞), 0 · ∞, ∞/∞, and A/0 are not defined.
The extended real number system consists of the real field R and two symbols, +∞ and −∞.
We preserve the original order in R and define
−∞ < x < +∞
for every x ∈ R.
It is then clear that +∞ is an upper bound of every subset of the extended real number system,
and every nonempty subset has a least upper bound. If, for example, E is a set of real numbers
which is not bounded above in R, then sup E = +∞ in the extended real system. Exactly the
same remarks apply to lower bounds.
The extended real system does not form a field, but it is customary to make the following
conventions:
(a) If x is real then

x + ∞ = +∞,   x − ∞ = −∞,   x/(+∞) = x/(−∞) = 0.
(b) If x > 0 then x · (+∞) = +∞ and x · (−∞) = −∞.
(c) If x < 0 then x · (+∞) = −∞ and x · (−∞) = +∞.
When it is desired to make the distinction between the real numbers on the one hand and the
symbols +∞ and −∞ on the other hand quite explicit, the real numbers are called finite.
In Homework 9.2 (a) and (b) you are invited to give explicit proofs in two special cases.
Example 3.3 (a) Let p(x) and q(x) be polynomials and a ∈ R. Then

lim_{x→a} p(x) = p(a).

This immediately follows from lim_{x→a} x = a, lim_{x→a} c = c, and Proposition 3.2. Indeed, by (b) and (c), for p(x) = 3x^3 − 4x + 7 we have lim_{x→a} (3x^3 − 4x + 7) = 3 (lim_{x→a} x)^3 − 4 lim_{x→a} x + 7 = 3a^3 − 4a + 7 = p(a). This works for arbitrary polynomials. Suppose moreover that q(a) ≠ 0. Then by (d),

lim_{x→a} p(x)/q(x) = p(a)/q(a).

Hence, the limit of a rational function f (x) as x approaches a point a of the domain of f is f (a).
(b) Let f (x) = p(x)/q(x) be a rational function with polynomials p(x) = ∑_{k=0}^{r} a_k x^k and q(x) = ∑_{k=0}^{s} b_k x^k with real coefficients a_k and b_k and of degrees r and s, respectively. Then

lim_{x→+∞} f (x) =
    0,         if r < s,
    a_r/b_s,   if r = s,
    +∞,        if r > s and a_r/b_s > 0,
    −∞,        if r > s and a_r/b_s < 0.

The first two statements (r ≤ s) follow from Example 3.2 (a) together with Proposition 3.2. Namely, a_k x^{k−r} → 0 as x → +∞ provided 0 ≤ k < r. The statements for r > s follow from x^{r−s} → +∞ as x → +∞ and the above remark.
Note that

lim_{x→−∞} f (x) = (−1)^{r+s} lim_{x→+∞} f (x)

since

p(−x)/q(−x) = ((−1)^r a_r x^r + · · · )/((−1)^s b_s x^s + · · · ) = (−1)^{r+s} (a_r x^r + · · · )/(b_s x^s + · · · ).
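The case distinction can be observed numerically by evaluating rational functions at a large argument (a rough sketch; the coefficient lists below are arbitrary examples, lowest degree first):

```python
def rational(x, p, q):
    """Evaluate p(x)/q(x); p, q are coefficient lists, lowest degree first."""
    val = lambda cs: sum(c * x ** k for k, c in enumerate(cs))
    return val(p) / val(q)

x = 1e6
assert abs(rational(x, [1, 2], [5, 0, 3])) < 1e-5             # r < s: limit 0
assert abs(rational(x, [0, 0, 7], [1, 0, 2]) - 7 / 2) < 1e-5  # r = s: a_r/b_s
assert rational(x, [0, 0, 0, 1], [1, 1]) > 1e6                # r > s, a_r/b_s > 0
```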
3.2 Continuous Functions
Definition 3.5 Let f be a function and x0 ∈ D(f ). We say that f is continuous at x0 if
∀ ε > 0 ∃ δ > 0 ∀ x ∈ D(f ) : | x − x0 | < δ =⇒ | f (x) − f (x0 ) | < ε.
(3.1)
We say that f is continuous in A ⊂ D(f ) if f is continuous at all points x0 ∈ A.
Proposition 3.1 shows that the above definition of continuity at x_0 is equivalent to: for all sequences (x_n), x_n ∈ D(f ), with lim_{n→∞} x_n = x_0, we have lim_{n→∞} f (x_n) = f (x_0). In other words, f is continuous at x_0 if lim_{x→x_0} f (x) = f (x_0).
Example 3.4 (a) In Example 3.3 we have seen that every polynomial is continuous in R and every rational function f is continuous in its domain D(f ).
f (x) = | x | is continuous in R.
(b) Continuity is a local property: If two functions f, g : D → R coincide in a neighborhood
Uε (x0 ) ⊂ D of some point x0 , then f is continuous at x0 if and only if g is continuous at x0 .
(c) f (x) = [x] is continuous in R \ Z. If x_0 is not an integer, then n < x_0 < n + 1 for some n ∈ Z, and f coincides with the constant function n in a neighborhood U_ε(x_0). By (b), f is continuous at x_0. If x_0 = n ∈ Z, lim_{x→n} [x] does not exist; hence f is not continuous at n.
(d) f (x) = (x^2 − 1)/(x − 1) if x ≠ 1 and f (1) = 1. Then f is not continuous at x_0 = 1 since

lim_{x→1} (x^2 − 1)/(x − 1) = lim_{x→1} (x + 1) = 2 ≠ 1 = f (1).
There are two reasons for a function not being continuous at x_0. First, lim_{x→x_0} f (x) does not exist. Secondly, f has a limit at x_0 but lim_{x→x_0} f (x) ≠ f (x_0).
Proposition 3.3 Suppose f, g : D → R are continuous at x_0 ∈ D. Then f + g and f g are also continuous at x_0. If g(x_0) ≠ 0, then f /g is continuous at x_0.
The proof is obvious from Proposition 3.2.
The set C(D) of continuous functions on D ⊂ R forms a commutative algebra with 1.
Proposition 3.4 Let f : D → R and g : E → R be functions with f (D) ⊂ E. Suppose f is continuous at a ∈ D, and g is continuous at b = f (a) ∈ E. Then the composite function g ◦f : D → R is continuous at a.
Proof. Let (xn ) be a sequence with xn ∈ D and limn→∞ xn = a. Since f is continuous
at a, limn→∞ f (xn ) = b. Since g is continuous at b, limn→∞ g(f (xn )) = g(b); hence
g ◦f (xn ) → g ◦f (a). This completes the proof.
Example 3.5 f (x) = 1/x is continuous for x ≠ 0, g(x) = sin x is continuous (see below); hence (g ◦f )(x) = sin(1/x) is continuous on R \ {0}.
3.2.1 The Intermediate Value Theorem
In this paragraph, [a, b] ⊂ R is a closed, bounded interval, a, b ∈ R.
The intermediate value theorem is the basis for several existence theorems in analysis. It is
again equivalent to the order completeness of R.
Theorem 3.5 (Intermediate Value Theorem) Let f : [a, b] → R be a continuous function and
γ a real number between f (a) and f (b). Then there exists c ∈ [a, b] such that f (c) = γ.
The statement is clear from the graphical presentation. Nevertheless, it needs a proof since pictures do not prove anything.

[Figure: the graph of f crosses the level γ at some c ∈ (a, b).]

The statement is wrong for the rational numbers. For example, let D = {x ∈ Q | 1 ≤ x ≤ 2} and f (x) = x^2 − 2. Then f (1) = −1 and f (2) = 2, but there is no p ∈ D with f (p) = 0 since 2 has no rational square root.
Proof. Without loss of generality suppose f (a) ≤ f (b). Starting with [a_1, b_1] = [a, b], we successively construct a nested sequence of intervals [a_n, b_n] such that f (a_n) ≤ γ ≤ f (b_n). As in the proof of Proposition 2.12, [a_n, b_n] is one of the two half-intervals [a_{n−1}, m] and [m, b_{n−1}], where m = (a_{n−1} + b_{n−1})/2 is the midpoint of the (n − 1)st interval. By Proposition 2.11 the monotonic sequences (a_n) and (b_n) both converge to a common point c. Since f is continuous,

lim_{n→∞} f (a_n) = f (lim_{n→∞} a_n) = f (c) = f (lim_{n→∞} b_n) = lim_{n→∞} f (b_n).

By Proposition 2.14, f (a_n) ≤ γ ≤ f (b_n) implies

lim_{n→∞} f (a_n) ≤ γ ≤ lim_{n→∞} f (b_n);

hence, γ = f (c).
Example 3.6 (a) We again show the existence of the nth root of a positive real number a > 0, n ∈ N. By Example 3.4 (a), the polynomial p(x) = x^n − a is continuous in R. We find p(0) = −a < 0, and by Bernoulli’s inequality

p(1 + a) = (1 + a)^n − a ≥ 1 + na − a = 1 + (n − 1)a ≥ 1 > 0.

Theorem 3.5 shows that p has a root in the interval (0, 1 + a).
(b) A polynomial p of odd degree with real coefficients has a real zero. Namely, by Example 3.3, if the leading coefficient a_r of p is positive, lim_{x→−∞} p(x) = −∞ and lim_{x→+∞} p(x) = +∞. Hence there are a and b with a < b and p(a) < 0 < p(b). Therefore, there is a c ∈ (a, b) such that p(c) = 0.
There are polynomials of even degree having no real zeros. For example f (x) = x2k + 1.
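The proof of Theorem 3.5 is constructive: repeated halving locates the point c. A minimal bisection sketch (not from the notes; the cubic p(x) = x^3 − 4x + 1 is an arbitrary odd-degree example with p(1) < 0 < p(2)):

```python
def bisect(f, a, b, steps=80):
    """Nested-interval construction from the proof: keep f(a) <= 0 <= f(b)."""
    assert f(a) <= 0 <= f(b)
    for _ in range(steps):
        m = (a + b) / 2
        if f(m) <= 0:
            a = m
        else:
            b = m
    return (a + b) / 2

p = lambda x: x ** 3 - 4 * x + 1
root = bisect(p, 1, 2)
assert abs(p(root)) < 1e-9
```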
Remark 3.3 Theorem 3.5 is not true for continuous functions f : Q → R. For example,
f (x) = x2 − 2 is continuous, f (0) = −2 < 0 < 2 = f (2). However, there is no r ∈ Q
with f (r) = 0.
3.2.2 Continuous Functions on Bounded and Closed Intervals—The Theorem about Maximum and Minimum
We say that f : [a, b] → R is continuous if f is continuous on (a, b) and f (a + 0) = f (a) and f (b − 0) = f (b).
Theorem 3.6 (Theorem about Maximum and Minimum) Let f : [a, b] → R be continuous. Then f is bounded and attains its maximum and its minimum; that is, there exists C > 0 with | f (x) | ≤ C for all x ∈ [a, b], and there exist p, q ∈ [a, b] with

sup_{a≤x≤b} f (x) = max_{a≤x≤b} f (x) = f (p)   and   inf_{a≤x≤b} f (x) = min_{a≤x≤b} f (x) = f (q).
Remarks 3.4 (a) The theorem is not true in case of open, half-open, or infinite intervals. For example, f : (0, 1] → R, f (x) = 1/x, is continuous but not bounded. The function f : (0, 1) → R, f (x) = x, is continuous and bounded; however, it attains neither its maximum nor its minimum. Finally, f (x) = x^2 on R_+ is continuous but not bounded.
(b) Put M := max_{x∈[a,b]} f (x) and m := min_{x∈[a,b]} f (x). By the theorem about maximum and minimum and the intermediate value theorem, for all γ ∈ R with m ≤ γ ≤ M there exists c ∈ [a, b] such that f (c) = γ; that is, f attains all values between m and M.
Proof. We give the proof in case of the maximum. Replacing f by −f yields the proof for the minimum. Let

A = sup_{a≤x≤b} f (x) ∈ R ∪ {+∞}.

(Note that A = +∞ is equivalent to f not being bounded above.) Then there exists a sequence (x_n) in [a, b] such that lim_{n→∞} f (x_n) = A. Since (x_n) is bounded, by the Theorem of Weierstraß there exists a convergent subsequence (x_{n_k}) with p = lim_k x_{n_k} and a ≤ p ≤ b. Since f is continuous,

A = lim_{k→∞} f (x_{n_k}) = f (p).

In particular, A is a finite real number; that is, f is bounded above by A, and f attains its maximum A at the point p ∈ [a, b].
3.3 Uniform Continuity
Let D be a finite or infinite interval.
Definition 3.6 A function f : D → R is called uniformly continuous if for every ε > 0 there exists a δ > 0 such that for all x, x′ ∈ D, | x − x′ | < δ implies | f (x) − f (x′) | < ε.
f is uniformly continuous on [a, b] if and only if
∀ ε > 0 ∃ δ > 0 ∀ x, y ∈ [a, b] : | x − y | < δ =⇒ | f (x) − f (y) | < ε.
(3.2)
Remark 3.5 If f is uniformly continuous on D, then f is continuous on D. However, the converse is not true.
Consider, for example, f : (0, 1) → R, f (x) = 1/x, which is continuous. Suppose to the contrary that f is uniformly continuous. Then to ε = 1 there exists δ > 0 with (3.2). By the Archimedean property there exists n ∈ N such that 1/(2n) < δ. Consider x_n = 1/n and y_n = 1/(2n). Then | x_n − y_n | = 1/(2n) < δ. However,

| f (x_n) − f (y_n) | = 2n − n = n ≥ 1.

A contradiction! Hence, f is not uniformly continuous on (0, 1).
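The failing pair of sequences from the remark, checked numerically (an illustrative sketch, not part of the notes):

```python
f = lambda x: 1 / x
for n in [10, 100, 1000]:
    xn, yn = 1 / n, 1 / (2 * n)
    # the arguments get arbitrarily close ...
    assert abs(xn - yn) <= 1 / (2 * n) + 1e-12
    # ... while the images stay about n apart, so no single delta can work
    assert abs(f(xn) - f(yn)) > n - 1e-6
```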
Let us consider the differences between the concepts of continuity and uniform continuity. First, uniform continuity is a property of a function on a set, whereas continuity can be defined at a single point. To ask whether or not a given function is uniformly continuous at a certain point is meaningless. Secondly, if f is continuous on D, then it is possible to find, for each ε > 0 and for each point x_0 ∈ D, a number δ = δ(x_0, ε) > 0 having the property specified in Definition 3.5. This δ depends on ε and on x_0. If f is, however, uniformly continuous on D, then it is possible, for each ε > 0, to find one δ = δ(ε) > 0 which will do for all possible points x_0 of D.
That the two concepts are equivalent on bounded and closed intervals follows from the next
proposition.
Proposition 3.7 Let f : [a, b] → R be a continuous function on a bounded and closed interval.
Then f is uniformly continuous on [a, b].
Proof. Suppose to the contrary that f is not uniformly continuous. Then there exists ε_0 > 0 without matching δ > 0; for every positive integer n ∈ N there exists a pair of points x_n, x′_n with | x_n − x′_n | < 1/n but | f (x_n) − f (x′_n) | ≥ ε_0. Since [a, b] is bounded and closed, (x_n) has a subsequence (x_{n_k}) converging to some point p ∈ [a, b]. Since | x_n − x′_n | < 1/n, the sequence (x′_n) also converges to p. Hence

lim_{k→∞} (f (x_{n_k}) − f (x′_{n_k})) = f (p) − f (p) = 0,

which contradicts | f (x_{n_k}) − f (x′_{n_k}) | ≥ ε_0 for all k.
There exists an example of a bounded continuous function f : [0, 1) → R which is not uniformly continuous, see [Kön90, p. 91].
Discontinuities
If x is a point in the domain of a function f at which f is not continuous, we say f is discontinuous at x or f has a discontinuity at x. It is customary to divide discontinuities into two
types.
Definition 3.7 Let f : (a, b) → R be a function which is discontinuous at a point x0 . If the
one-sided limits limx→x0 +0 f (x) and limx→x0 −0 f (x) exist, then f is said to have a simple discontinuity or a discontinuity of the first kind. Otherwise the discontinuity is said to be of the
second kind.
Example 3.7 (a) f (x) = sign(x) is continuous on R \ {0} since it is locally constant. Moreover, f (0 + 0) = 1 and f (0 − 0) = −1. Hence, sign(x) has a simple discontinuity at x0 = 0.
(b) Define f (x) = 0 if x is rational, and f (x) = 1 if x is irrational. Then f has a discontinuity
of the second kind at every point x since neither f (x + 0) nor f (x − 0) exists.
(c) Define

f (x) = sin(1/x) if x ≠ 0,   f (0) = 0.

Consider the two sequences

x_n = 1/(π/2 + 2nπ)   and   y_n = 1/(nπ).

Then both sequences (x_n) and (y_n) approach 0 from above, but lim_{n→∞} f (x_n) = 1 and lim_{n→∞} f (y_n) = 0; hence f (0 + 0) does not exist. Therefore f has a discontinuity of the second kind at x = 0. We have not yet shown that sin x is a continuous function. This will be done in Section 3.5.2.
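The two sequences from part (c), checked numerically (an illustrative sketch, not part of the notes):

```python
import math

f = lambda x: math.sin(1 / x)
for n in [1, 5, 50]:
    xn = 1 / (math.pi / 2 + 2 * math.pi * n)  # sin(1/x_n) = sin(pi/2 + 2n pi) = 1
    yn = 1 / (math.pi * n)                    # sin(1/y_n) = sin(n pi) = 0
    assert 0 < xn < yn                        # both approach 0 from above
    assert abs(f(xn) - 1) < 1e-9
    assert abs(f(yn)) < 1e-9
```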
3.4 Monotonic Functions
Definition 3.8 Let f be a real function on the interval (a, b). Then f is said to be monotonically
increasing on (a, b) if a < x < y < b implies f (x) ≤ f (y). If the last inequality is reversed, we
obtain the definition of a monotonically decreasing function. The class of monotonic functions
consists of both the increasing and the decreasing functions.
If a < x < y < b implies f (x) < f (y), the function is said to be strictly increasing. Similarly,
strictly decreasing functions are defined.
Theorem 3.8 Let f be a monotonically increasing function on (a, b). Then the one-sided limits f (x + 0) and f (x − 0) exist at every point x of (a, b). More precisely,

sup_{t∈(a,x)} f (t) = f (x − 0) ≤ f (x) ≤ f (x + 0) = inf_{t∈(x,b)} f (t).   (3.3)

Furthermore, if a < x < y < b, then

f (x + 0) ≤ f (y − 0).   (3.4)
Analogous results evidently hold for monotonically decreasing functions.
Proof. See Appendix B to this chapter.
Proposition 3.9 Let f : [a, b] → R be a strictly monotonically increasing continuous function
and A = f (a) and B = f (b). Then f maps [a, b] bijectively onto [A, B] and the inverse function
f −1 : [A, B] → R
is again strictly monotonically increasing and continuous.
Note that the inverse function f −1 : [A, B] → [a, b] is defined by f (y0 ) = x0 , y0 ∈ [A, B],
where x0 is the unique element of [a, b] with f (x0 ) = y0 . However, we can think of f −1 as a
function into R. A similar statement is true for strictly decreasing functions.
Proof. By Remark 3.4, f maps [a, b] onto the whole closed interval [A, B] (intermediate value theorem). Since x < y implies f (x) < f (y), f is injective and hence bijective. Hence, the inverse mapping f^{−1} : [A, B] → [a, b] exists and is again strictly increasing (u < v implies f^{−1}(u) = x < y = f^{−1}(v), since otherwise x ≥ y would imply u ≥ v).
We show that g = f −1 is continuous. Suppose (un ) is a sequence in [A, B] with un → u and
un = f (xn ) and u = f (x). We have to show that (xn ) converges to x. Suppose to the contrary
that there exists ε0 > 0 such that | xn − x | ≥ ε0 for infinitely many n. Since (xn ) ⊆ [a, b] is
bounded, there exists a converging subsequence (xnk ), say, xnk → c as k → ∞. The above
inequality is true for the limit c, too, that is | c − x | ≥ ε0 . By continuity of f , xnk → c implies
f (xnk ) → f (c). That is unk → f (c). Since un → u = f (x) and the limit of a converging
sequence is unique, f (c) = f (x). Since f is bijective, x = c; this contradicts | c − x | ≥ ε0 .
Hence, g is continuous at u.
Example 3.8 The function f : R_+ → R_+, f (x) = x^n, is continuous and strictly increasing. Hence x = g(y) = y^{1/n} is continuous, too. This gives an alternative proof of Homework 5.5.
3.5 Exponential, Trigonometric, and Hyperbolic Functions
and their Inverses
3.5.1 Exponential and Logarithm Functions
In this section we are dealing with the exponential function, which is one of the most important functions in analysis. We use the exponential series to define the function. We will see that this definition is consistent with the definition of e^x for rational x ∈ Q as given in Chapter 1.
Definition 3.9 For z ∈ C put

E(z) = ∑_{n=0}^{∞} z^n/n! = 1 + z + z^2/2 + z^3/6 + · · · .   (3.5)
Note that E(0) = 1 and E(1) = e by the definition at page 61. The radius of convergence of
the exponential series (3.5) is R = +∞, i. e. the series converges absolutely for all z ∈ C, see
Example 2.13 (c).
Applying Proposition 2.31 (Cauchy product) on the multiplication of absolutely convergent series, we obtain

E(z)E(w) = ∑_{n=0}^{∞} z^n/n! · ∑_{m=0}^{∞} w^m/m! = ∑_{n=0}^{∞} ∑_{k=0}^{n} z^k w^{n−k}/(k! (n − k)!)
= ∑_{n=0}^{∞} (1/n!) ∑_{k=0}^{n} (n choose k) z^k w^{n−k} = ∑_{n=0}^{∞} (z + w)^n/n!,

which gives us the important addition formula

E(z + w) = E(z)E(w),   z, w ∈ C.   (3.6)
One consequence is that

E(z)E(−z) = E(0) = 1,   z ∈ C.   (3.7)

This shows that E(z) ≠ 0 for all z. By (3.5), E(x) > 0 if x > 0; hence (3.7) shows E(x) > 0 for all real x.
Iteration of (3.6) gives

E(z_1 + · · · + z_n) = E(z_1) · · · E(z_n).   (3.8)

Let us take z_1 = · · · = z_n = 1. Since E(1) = e by (2.15), we obtain

E(n) = e^n,   n ∈ N.   (3.9)
If p = m/n, where m, n are positive integers, then

E(p)^n = E(pn) = E(m) = e^m,   (3.10)

so that

E(p) = e^p,   p ∈ Q_+.   (3.11)

It follows from (3.7) that E(−p) = e^{−p} if p is positive and rational. Thus (3.11) holds for all rational p. This justifies the redefinition

e^x := E(x),   x ∈ C.

The notation exp(x) is often used in place of e^x.
Proposition 3.10 We can estimate the remainder term r_n(z) := ∑_{k=n}^{∞} z^k/k! as follows:

| r_n(z) | ≤ 2 | z |^n/n!   if   | z | ≤ (n + 1)/2.   (3.12)

Proof. We have

| r_n(z) | ≤ ∑_{k=n}^{∞} | z |^k/k!
= (| z |^n/n!) (1 + | z |/(n + 1) + | z |^2/((n + 1)(n + 2)) + · · · + | z |^k/((n + 1) · · · (n + k)) + · · · )
≤ (| z |^n/n!) (1 + | z |/(n + 1) + | z |^2/(n + 1)^2 + · · · + | z |^k/(n + 1)^k + · · · ).

Now | z | ≤ (n + 1)/2 implies

| r_n(z) | ≤ (| z |^n/n!) (1 + 1/2 + 1/4 + · · · + 1/2^k + · · · ) ≤ 2 | z |^n/n!.
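The bound (3.12) can be probed numerically (an illustrative sketch; the sample values of n and z are arbitrary, including the boundary case | z | = (n + 1)/2):

```python
import math

def tail(z, n, terms=60):
    """Partial tail sum_{k=n}^{n+terms-1} z^k / k!  (approximates r_n(z))."""
    return sum(z ** k / math.factorial(k) for k in range(n, n + terms))

for n in [1, 2, 5, 10]:
    for z in [0.1, -0.5, (n + 1) / 2]:
        assert abs(z) <= (n + 1) / 2
        assert abs(tail(z, n)) <= 2 * abs(z) ** n / math.factorial(n)
```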
Example 3.9 (a) Inserting n = 1 gives

| E(z) − 1 | = | r_1(z) | ≤ 2 | z |,   | z | ≤ 1.

In particular, E(z) is continuous at z_0 = 0. Indeed, to ε > 0 choose δ = ε/2; then | z | < δ implies | E(z) − 1 | ≤ 2 | z | ≤ ε; hence lim_{z→0} E(z) = E(0) = 1 and E is continuous at 0.
(b) Inserting n = 2 gives

| e^z − 1 − z | = | r_2(z) | ≤ | z |^2,   | z | ≤ 3/2.

This implies

| (e^z − 1)/z − 1 | ≤ | z |,   0 < | z | < 3/2.

The sandwich theorem gives lim_{z→0} (e^z − 1)/z = 1.
By (3.5), limx→∞ E(x) = +∞; hence (3.7) shows that limx→−∞ E(x) = 0. By (3.5),
0 < x < y implies that E(x) < E(y); by (3.7), it follows that E(−y) < E(−x); hence, E
is strictly increasing on the whole real axis.
The addition formula also shows that

lim_{h→0} (E(z + h) − E(z)) = E(z) lim_{h→0} (E(h) − 1) = E(z) · 0 = 0,   (3.13)

where lim_{h→0} E(h) = 1 directly follows from Example 3.9. Hence, E(z) is continuous for all z.
Proposition 3.11 Let e^x be defined on R by the power series (3.5). Then
(a) e^x is continuous for all x.
(b) e^x is a strictly increasing function and e^x > 0.
(c) e^{x+y} = e^x e^y.
(d) lim_{x→+∞} e^x = +∞, lim_{x→−∞} e^x = 0.
(e) lim_{x→+∞} x^n/e^x = 0 for every n ∈ N.
Proof. We have already proved (a) to (d); (3.5) shows that

e^x > x^{n+1}/(n + 1)!

for x > 0, so that

x^n/e^x < (n + 1)!/x,

and (e) follows. Part (e) shows that e^x tends faster to +∞ than any power of x, as x → +∞.
Since e^x, x ∈ R, is a strictly increasing continuous function, by Proposition 3.9 e^x has a strictly increasing continuous inverse function log y, log : (0, +∞) → R. The function log is defined by

e^{log y} = y,   y > 0,   (3.14)

or, equivalently, by

log(e^x) = x,   x ∈ R.   (3.15)
Writing u = e^x and v = e^y, (3.6) gives

log(uv) = log(e^x e^y) = log(e^{x+y}) = x + y,

such that

log(uv) = log u + log v,   u > 0, v > 0.   (3.16)
This shows that log has the familiar property which makes the logarithm useful for computations. Another customary notation for log x is ln x. Proposition 3.11 shows that

lim_{x→+∞} log x = +∞,   lim_{x→0+0} log x = −∞.
We summarize what we have proved so far.
Proposition 3.12 Let the logarithm log : (0, +∞) → R be the inverse function to the exponential function e^x. Then
(a) log is continuous on (0, +∞).
(b) log is strictly increasing.
(c) log(uv) = log u + log v for u, v > 0.
(d) lim_{x→+∞} log x = +∞,   lim_{x→0+0} log x = −∞.
It is seen from (3.14) that

x = e^{log x} =⇒ x^n = e^{n log x}   (3.17)

if x > 0 and n is an integer. Similarly, if m is a positive integer, we have

x^{1/m} = e^{(1/m) log x}.   (3.18)

Combining (3.17) and (3.18), we obtain

x^α = e^{α log x}   (3.19)

for any rational α. We now define x^α, for any real α and x > 0, by (3.19). In the same way, we redefine the exponential function

a^x = e^{x log a},   a > 0, x ∈ R.

It turns out that in case a ≠ 1, f (x) = a^x is strictly monotonic and continuous since e^x is so. Hence, f has a strictly monotonic continuous inverse function log_a : (0, +∞) → R defined by

log_a(a^x) = x,   x ∈ R,   and   a^{log_a x} = x,   x > 0.
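The redefinitions can be exercised numerically (an illustrative sketch, not from the notes; the sample values are arbitrary):

```python
import math

x, a = 2.0, 3.0
# x^alpha = e^(alpha log x): with alpha = 1/2 this is the square root
assert abs(math.exp(0.5 * math.log(x)) - math.sqrt(x)) < 1e-12
# a^x = e^(x log a)
assert abs(math.exp(x * math.log(a)) - 9.0) < 1e-12
# log(uv) = log u + log v
u, v = 5.0, 7.0
assert abs(math.log(u * v) - (math.log(u) + math.log(v))) < 1e-12
# log_a inverts a^x:  log_a(a^x) = x
log_a = lambda y: math.log(y) / math.log(a)
assert abs(log_a(a ** x) - x) < 1e-12
```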
3.5.2 Trigonometric Functions and their Inverses
[Figure (“Simple Trig Functions”): the graphs of sine and cosine on [−2π, 2π].]

In this section we redefine the trigonometric functions using the exponential function e^z. We will see that the new definitions coincide with the old ones.

Definition 3.10 For z ∈ C define

cos z = (1/2)(e^{iz} + e^{−iz}),   sin z = (1/(2i))(e^{iz} − e^{−iz}),   (3.20)

such that

e^{iz} = cos z + i sin z   (Euler formula).   (3.21)
Proposition 3.13 (a) The functions sin z and cos z can be written as power series which converge absolutely for all z ∈ C:

cos z = ∑_{n=0}^{∞} (−1)^n z^{2n}/(2n)! = 1 − z^2/2 + z^4/4! − z^6/6! + − · · ·
sin z = ∑_{n=0}^{∞} (−1)^n z^{2n+1}/(2n + 1)! = z − z^3/3! + z^5/5! − + · · · .   (3.22)

(b) sin x and cos x are real valued and continuous on R, where cos x is an even and sin x is an odd function, i. e. cos(−x) = cos x, sin(−x) = − sin x. We have

sin^2 x + cos^2 x = 1;   (3.23)
cos(x + y) = cos x cos y − sin x sin y;   sin(x + y) = sin x cos y + cos x sin y.   (3.24)
Proof. (a) Inserting iz into (3.5) in place of z and using (i^n) = (i, −1, −i, 1, i, −1, . . . ), we have

e^{iz} = ∑_{n=0}^{∞} i^n z^n/n! = ∑_{k=0}^{∞} (−1)^k z^{2k}/(2k)! + i ∑_{k=0}^{∞} (−1)^k z^{2k+1}/(2k + 1)!.

Inserting −iz into (3.5) in place of z, we have

e^{−iz} = ∑_{n=0}^{∞} (−i)^n z^n/n! = ∑_{k=0}^{∞} (−1)^k z^{2k}/(2k)! − i ∑_{k=0}^{∞} (−1)^k z^{2k+1}/(2k + 1)!.

Inserting this into (3.20) proves (a).
(b) Since the exponential function is continuous on C, sin z and cos z are also continuous on
C. In particular, their restrictions to R are continuous. Now let x ∈ R, then i x = −i x. By
Homework 11.3 and (3.20) we obtain
cos x = (1/2)(e^{ix} + e^{−ix}) = (1/2)(e^{ix} + \overline{e^{ix}}) = Re e^{ix},
and similarly
sin x = Im ei x .
Hence, sin x and cos x are real for real x.
For x ∈ R we have |e^{ix}| = 1. Namely, by (3.7) and Homework 11.3,

|e^{ix}|² = e^{ix} \overline{e^{ix}} = e^{ix} e^{−ix} = e^0 = 1,

so that for x ∈ R

|e^{ix}| = 1.   (3.25)

[Figure: the point e^{ix} = cos x + i sin x on the unit circle.]

On the other hand, the Euler formula and the fact that cos x and sin x are real give

1 = |e^{ix}|² = |cos x + i sin x|² = cos²x + sin²x.
3.5 Exponential, Trigonometric, and Hyperbolic Functions and their Inverses
Hence, e^{ix} = cos x + i sin x is a point on the unit circle in the complex plane, and cos x and sin x are its coordinates. This establishes the equivalence between the old definition of cos x as the length of the adjacent side in a right triangle with hypotenuse 1 and angle x·180°/π, and the power series definition of cos x. The only missing link is: the length of the arc from 1 to e^{ix} is x.
It follows directly from the definition that cos(−z) = cos z and sin(−z) = − sin z for all
z ∈ C. The addition laws for sin x and cos x follow from (3.6) applied to ei (x+y) . This
completes the proof of (b).
Lemma 3.14 There exists a unique number τ ∈ (0, 2) such that cos τ = 0. We define the
number π by
π = 2τ.
(3.26)
The proof is based on the following Lemma.
Lemma 3.15
(a) 0 < x < √6 implies

x − x³/6 < sin x < x.   (3.27)

(b) 0 < x < √2 implies

0 < cos x,   (3.28)
0 < sin x < x < sin x/cos x,   (3.29)
cos²x < 1/(1 + x²).   (3.30)
(c) cos x is strictly decreasing on [0, π]; whereas sin x is strictly increasing on [−π/2, π/2].
In particular, the sandwich theorem applied to statement (a), 1 − x²/6 < (sin x)/x < 1 as x → 0 + 0, gives lim_{x→0+0} (sin x)/x = 1. Since (sin x)/x is an even function, this implies lim_{x→0} (sin x)/x = 1.
The proof of the lemma is in the Appendix B to this chapter.
Proof of Lemma 3.14. cos 0 = 1. By Lemma 3.15, cos²1 < 1/2. By the double angle formula for cosine, cos 2 = 2 cos²1 − 1 < 0. By continuity of cos x and Theorem 3.5, cos has a zero τ in the interval (0, 2).
By the addition laws,

cos x − cos y = −2 sin((x + y)/2) sin((x − y)/2).

So by Lemma 3.15, 0 < x < y < 2 implies 0 < sin((x + y)/2) and sin((x − y)/2) < 0; therefore cos x > cos y. Hence, cos x is strictly decreasing on (0, 2). The zero τ is therefore
unique.
By definition, cos(π/2) = 0; and (3.23) shows sin(π/2) = ±1. By (3.27), sin(π/2) = 1. Thus e^{iπ/2} = i, and the addition formula for e^z gives

e^{πi} = −1,   e^{2πi} = 1;   (3.31)

hence,

e^{z+2πi} = e^z,   z ∈ C.   (3.32)
Proposition 3.16 (a) The function ez is periodic with period 2πi.
We have e^{ix} = 1, x ∈ R, if and only if x = 2kπ, k ∈ Z.
(b) The functions sin z and cos z are periodic with period 2π.
The real zeros of the sine and cosine functions are {kπ | k ∈ Z} and {π/2 + kπ | k ∈ Z},
respectively.
Proof. We have already proved (a). (b) follows from (a) and (3.20).
Tangent and Cotangent Functions
[Figure: graphs of the tangent and cotangent functions.]

For x ≠ π/2 + kπ, k ∈ Z, define

tan x = sin x/cos x.   (3.33)

For x ≠ kπ, k ∈ Z, define

cot x = cos x/sin x.   (3.34)
Lemma 3.17 (a) tan x is continuous at x ∈ R \ {π/2 + kπ | k ∈ Z}, and tan(x + π) = tan x;
(b) lim_{x→π/2−0} tan x = +∞,  lim_{x→π/2+0} tan x = −∞;
(c) tan x is strictly increasing on (−π/2, π/2).
Proof. (a) is clear by Proposition 3.3 since sin x and cos x are continuous. We show only (c) and leave (b) as an exercise. Let 0 < x < y < π/2. Then 0 < sin x < sin y and cos x > cos y > 0. Therefore

tan x = sin x/cos x < sin y/cos y = tan y.

Hence, tan is strictly increasing on (0, π/2). Since tan(−x) = − tan(x), tan is strictly increasing on the whole interval (−π/2, π/2).
Similarly to Lemma 3.17 one proves the next lemma.
Lemma 3.18 (a) cot x is continuous at x ∈ R \ {kπ | k ∈ Z}, and cot(x + π) = cot x;
(b) lim_{x→π−0} cot x = −∞,  lim_{x→0+0} cot x = +∞;
(c) cot x is strictly decreasing on (0, π).
Inverse Trigonometric Functions
We have seen in Lemma 3.15 that cos x is strictly decreasing on [0, π] and sin x is strictly increasing on [−π/2, π/2]. Obviously, the images are cos([0, π]) = sin([−π/2, π/2]) = [−1, 1]. Using Proposition 3.9 we obtain that the inverse functions exist and are monotonic and continuous.
Proposition 3.19 (and Definition) There exists the inverse function to cos

arccos : [−1, 1] → [0, π]   (3.35)

given by arccos(cos x) = x, x ∈ [0, π], or cos(arccos y) = y, y ∈ [−1, 1]. The function arccos x is strictly decreasing and continuous.
There exists the inverse function to sin

arcsin : [−1, 1] → [−π/2, π/2]   (3.36)

given by arcsin(sin x) = x, x ∈ [−π/2, π/2], or sin(arcsin y) = y, y ∈ [−1, 1]. The function arcsin x is strictly increasing and continuous.

[Figure: graphs of arccos and arcsin on [−1, 1].]

Note that arcsin x + arccos x = π/2 if x ∈ [−1, 1]. Indeed, let y = arcsin x; then x = sin y = cos(π/2 − y). Since y ∈ [−π/2, π/2], we have π/2 − y ∈ [0, π], and hence arccos x = π/2 − y. Therefore y + arccos x = π/2.
By Lemma 3.17, tan x is strictly increasing on (−π/2, π/2). Therefore, there exists the inverse function on the image tan((−π/2, π/2)) = R.

[Figure: graphs of arctan and arccot.]

Proposition 3.20 (and Definition) There exists the inverse function to tan

arctan : R → (−π/2, π/2)   (3.37)

given by arctan(tan x) = x, x ∈ (−π/2, π/2), or tan(arctan y) = y, y ∈ R. The function arctan x is strictly increasing and continuous.
There exists the inverse function to cot x

arccot : R → (0, π)   (3.38)

given by arccot(cot x) = x, x ∈ (0, π), or cot(arccot y) = y, y ∈ R. The function arccot x is strictly decreasing and continuous.
3.5.3 Hyperbolic Functions and their Inverses
Hyperbolic Cosine and Sine
The functions

sinh x = (e^x − e^{−x})/2,   (3.39)
cosh x = (e^x + e^{−x})/2,   (3.40)
tanh x = (e^x − e^{−x})/(e^x + e^{−x}) = sinh x/cosh x,   (3.41)
coth x = (e^x + e^{−x})/(e^x − e^{−x}) = cosh x/sinh x   (3.42)

are called hyperbolic sine, hyperbolic cosine, hyperbolic tangent, and hyperbolic cotangent, respectively. There are many analogies between these functions and their ordinary trigonometric counterparts.

[Figure: graphs of cosh and sinh.]
[Figure: graphs of the hyperbolic tangent and hyperbolic cotangent.]
The functions sinh x and tanh x are strictly increasing with sinh(R) = R and tanh(R) =
(−1, 1). Hence, their inverse functions are defined on R and on (−1, 1), respectively, and are
also strictly increasing and continuous. The function
arsinh : R → R
(3.43)
is given by arsinh (sinh(x)) = x, x ∈ R or sinh(arsinh (y)) = y, y ∈ R.
The function
artanh : (−1, 1) → R
(3.44)
is defined by artanh (tanh(x)) = x, x ∈ R or tanh(artanh (y)) = y, y ∈ (−1, 1).
The function cosh is strictly increasing on the half line R+ with cosh(R+ ) = [1, ∞). Hence,
the inverse function is defined on [1, ∞) taking values in R+ . It is also strictly increasing and
continuous.
arcosh : [1, ∞) → R+
(3.45)
is defined via arcosh (cosh(x)) = x, x ≥ 0 or by cosh(arcosh (y)) = y, y ≥ 1.
The function coth is strictly decreasing on x < 0 and on x > 0 with coth(R \ {0}) = R \ [−1, 1]. Hence, the inverse function is defined on R \ [−1, 1] taking values in R \ {0}. It is also strictly decreasing and continuous.
arcoth : R \ [−1, 1] → R
(3.46)
is defined via arcoth (coth(x)) = x, x 6= 0 or by coth(arcoth (y)) = y, y < −1 or y > 1.
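The inverses just defined also have closed-form expressions in terms of the logarithm; these are standard identities, not derived in the text, and can be checked against Python's standard library:

```python
import math

# arsinh y = log(y + sqrt(y^2 + 1)) on R,
# arcosh y = log(y + sqrt(y^2 - 1)) on [1, oo),
# artanh y = (1/2) log((1 + y)/(1 - y)) on (-1, 1).
for y in [-2.0, 0.3, 5.0]:
    assert abs(math.asinh(y) - math.log(y + math.sqrt(y * y + 1))) < 1e-12
for y in [1.5, 3.0]:
    assert abs(math.acosh(y) - math.log(y + math.sqrt(y * y - 1))) < 1e-12
for y in [-0.9, 0.4]:
    assert abs(math.atanh(y) - 0.5 * math.log((1 + y) / (1 - y))) < 1e-12
print("inverse hyperbolic identities confirmed")
```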
3.6 Appendix B
3.6.1 Monotonic Functions have One-Sided Limits
Proof of Theorem 3.8. By hypothesis, the set {f (t) | a < t < x} is bounded above by f (x),
and therefore has a least upper bound which we shall denote by A. Evidently A ≤ f (x). We
have to show that A = f (x − 0).
Let ε > 0 be given. It follows from the definition of A as a least upper bound that there exists
δ > 0 such that a < x − δ < x and
A − ε < f (x − δ) ≤ A.
(3.47)
Since f is monotonic, we have
f (x − δ) < f (t) ≤ A,
if x − δ < t < x.
(3.48)
Combining (3.47) and (3.48), we see that
| f (t) − A | < ε if
x − δ < t < x.
Hence f (x − 0) = A.
The second half of (3.3) is proved in precisely the same way. Next, if a < x < y < b, we see
from (3.3) that
f(x + 0) = inf_{x<t<b} f(t) = inf_{x<t<y} f(t).   (3.49)

The last equality is obtained by applying (3.3) to (a, y) instead of (a, b). Similarly,

f(y − 0) = sup_{a<t<y} f(t) = sup_{x<t<y} f(t).   (3.50)

Comparison of (3.49) and (3.50) gives (3.4).
3.6.2 Proofs for sin x and cos x inequalities
Proof of Lemma 3.15. (a) By (3.22),

cos x = (1 − x²/2!) + x⁴(1/4! − x²/6!) + ··· .

Now 0 < x < √2 implies 1 − x²/2 > 0 and, moreover, 1/(2n)! − x²/(2n + 2)! > 0 for all n ∈ N; hence cos x > 0.
By (3.22),

sin x = x(1 − x²/3!) + x⁵(1/5! − x²/7!) + ··· .

Now,

1 − x²/3! > 0 ⟺ x < √6,   1/5! − x²/7! > 0 ⟺ x < √42,  ··· .

Hence, sin x > 0 if 0 < x < √6. This gives (3.27). Similarly,

x − sin x = x³(1/3! − x²/5!) + x⁷(1/7! − x²/9!) + ··· ,
and we obtain sin x < x if 0 < x < √20. Finally we have to check that sin x − x cos x > 0. Indeed,

sin x − x cos x = x³(1/2! − 1/3!) − x⁵(1/4! − 1/5!) + x⁷(1/6! − 1/7!) − + ···
= x³·(2/3!) − x⁵·(4/5!) + x⁷·(6/7!) − x⁹·(8/9!) + ··· .

Now √10 > x > 0 implies

2n/(2n + 1)! − (2n + 2)x²/(2n + 3)! > 0

for all n ∈ N. This completes the proof of (a).
(b) Using (3.23), we get

0 < x cos x < sin x  ⟹  0 < x² cos²x < sin²x  ⟹  x² cos²x + cos²x < 1  ⟹  cos²x < 1/(1 + x²).

(c) In the proof of Lemma 3.14 we have seen that cos x is strictly decreasing in (0, π/2). By (3.23), sin x = √(1 − cos²x) is strictly increasing. Since sin x is an odd function, sin x is strictly increasing on [−π/2, π/2]. Since cos x = − sin(x − π/2), the statement for cos x follows.
3.6.3 Estimates for π
Proposition 3.21 For real x we have

cos x = Σ_{k=0}^n (−1)^k x^{2k}/(2k)! + r_{2n+2}(x),   (3.51)

sin x = Σ_{k=0}^n (−1)^k x^{2k+1}/(2k + 1)! + r_{2n+3}(x),   (3.52)

where

|r_{2n+2}(x)| ≤ |x|^{2n+2}/(2n + 2)!   if |x| ≤ 2n + 3,   (3.53)
|r_{2n+3}(x)| ≤ |x|^{2n+3}/(2n + 3)!   if |x| ≤ 2n + 4.   (3.54)

Proof. Let

r_{2n+2}(x) = ± x^{2n+2}/(2n + 2)! · (1 − x²/((2n + 3)(2n + 4)) ± ···).

Put

a_k := x^{2k}/((2n + 3)(2n + 4) ··· (2n + 2(k + 1))).
Then we have, by definition,

r_{2n+2}(x) = ± x^{2n+2}/(2n + 2)! · (1 − a₁ + a₂ − + ···).

Since

a_k = a_{k−1} · x²/((2n + 2k + 1)(2n + 2k + 2)),

|x| ≤ 2n + 3 implies

1 > a₁ > a₂ > ··· > 0,

and finally, as in the proof of the Leibniz criterion,

0 ≤ 1 − a₁ + a₂ − a₃ + − ··· ≤ 1.
Hence, | r2n+2 (x) | ≤ | x |2n+2 /(2n + 2)!. The estimate for the remainder of the sine series is
similar.
This is an application of Proposition 3.21. For numerical calculations it is convenient to use the following order of operations:

cos x = (···((−x²/(2n(2n − 1)) + 1)·(−x²/((2n − 2)(2n − 3))) + 1)···)·(−x²/2) + 1 + r_{2n+2}(x).

First we compute cos 1.5 and cos 1.6. Choosing n = 7 we obtain

cos x = ((((((−x²/182 + 1)(−x²/132) + 1)(−x²/90) + 1)(−x²/56) + 1)(−x²/30) + 1)(−x²/12) + 1)(−x²/2) + 1 + r₁₆(x).

By Proposition 3.21,

|r₁₆(x)| ≤ |x|¹⁶/16! ≤ 0.9·10⁻¹⁰   if |x| ≤ 1.6.

The calculations give

cos 1.5 = 0.07073720163 ± 20·10⁻¹¹ > 0,   cos 1.6 = −0.02919952239 ± 20·10⁻¹¹ < 0.
By the intermediate value theorem, 1.5 < π/2 < 1.6.
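The nested evaluation above is easy to program; here is a small Python sketch (the function name cos_horner is ours) that reproduces the sign change on [1.5, 1.6]:

```python
import math

def cos_horner(x, n=7):
    """Evaluate the degree-2n Taylor polynomial of cos by the nested
    scheme from the text, starting with the innermost factor."""
    s = 1.0
    for k in range(n, 0, -1):          # denominators 2k(2k-1): 182, 132, ..., 2
        s = 1.0 + s * (-x * x) / ((2 * k) * (2 * k - 1))
    return s

assert cos_horner(1.5) > 0 > cos_horner(1.6)          # the sign change
assert abs(cos_horner(1.5) - math.cos(1.5)) < 1e-9    # within the r16 bound
print(cos_horner(1.5), cos_horner(1.6))
```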
Now we compute cos x for two values of x which are close to π/2, found by linear interpolation:

a = 1.5 + 0.1 · cos 1.5/(cos 1.5 − cos 1.6) = 1.57078 . . . ,

cos 1.5707 = 0.000096326273 ± 20·10⁻¹¹ > 0,
cos 1.5708 = −0.00000367326 ± 20·10⁻¹¹ < 0.

Hence, 1.5707 < π/2 < 1.5708. The next linear interpolation gives

b = 1.5707 + 0.0001 · cos 1.5707/(cos 1.5707 − cos 1.5708) = 1.570796326 . . . ,

cos 1.570796326 = 0.00000000073 ± 20·10⁻¹¹ > 0,
cos 1.570796327 = −0.00000000027 ± 20·10⁻¹¹ < 0.

Therefore 1.570796326 < π/2 < 1.570796327, so that

π = 3.141592653 ± 10⁻⁹.
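The interpolation step can be iterated mechanically. A hedged Python sketch (using the library cosine in place of the series evaluation) that repeats the linear-interpolation update of the text:

```python
import math

# Start from the bracket [1.5, 1.6] on which cos changes sign and
# replace one endpoint by the linear interpolation point each step.
a, b = 1.5, 1.6
for _ in range(4):
    m = a + (b - a) * math.cos(a) / (math.cos(a) - math.cos(b))
    if math.cos(m) > 0:
        a = m          # m is still left of pi/2
    else:
        b = m          # m is right of pi/2
print(2 * m)           # approaches pi = 3.141592653...
```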
Chapter 4
Differentiation
4.1 The Derivative of a Function
We define the derivative of a function and prove the main properties like product, quotient and
chain rule. We relate the derivative of a function with the derivative of its inverse function. We
prove the mean value theorem and consider local extrema. Taylor’s theorem will be formulated.
Definition 4.1 Let f : (a, b) → R be a function and x0 ∈ (a, b). If the limit
lim_{x→x₀} (f(x) − f(x₀))/(x − x₀)   (4.1)
exists, we call f differentiable at x0 . The limit is denoted by f ′ (x0 ). We say f is differentiable
if f is differentiable at every point x ∈ (a, b). We thus have associated to every function f a
function f ′ whose domain is the set of points x0 where the limit (4.1) exists; f ′ is called the
derivative of f .
Sometimes the Leibniz notation is used to denote the derivative of f:

f′(x₀) = df(x₀)/dx = (d/dx) f(x₀).
Remarks 4.1 (a) Replacing x − x₀ by h, we see that f′(x₀) = lim_{h→0} (f(x₀ + h) − f(x₀))/h.
(b) The limits

lim_{h→0−0} (f(x₀ + h) − f(x₀))/h,   lim_{h→0+0} (f(x₀ + h) − f(x₀))/h
are called left-hand and right-hand derivatives of f in x0 , respectively. In particular for
f : [a, b] → R, we can consider the right-hand derivative at a and the left-hand derivative at b.
Example 4.1 (a) For f(x) = c, the constant function,

f′(x₀) = lim_{x→x₀} (f(x) − f(x₀))/(x − x₀) = lim_{x→x₀} (c − c)/(x − x₀) = 0.

(b) For f(x) = x,

f′(x₀) = lim_{x→x₀} (x − x₀)/(x − x₀) = 1.
(c) The slope of the tangent line. Given a function f : (a, b) → R which is differentiable at x₀, f′(x₀) is the slope of the tangent line to the graph of f through the point (x₀, f(x₀)).

[Figure: secant line through (x₀, f(x₀)) and (x₁, f(x₁)).]

The slope of the secant line through (x₀, f(x₀)) and (x₁, f(x₁)) is

m = tan α₁ = (f(x₁) − f(x₀))/(x₁ − x₀).

One can see: if x₁ approaches x₀, the secant line through (x₀, f(x₀)) and (x₁, f(x₁)) approaches the tangent line through (x₀, f(x₀)). Hence, the slope of the tangent line is the limit of the slopes of the secant lines as x approaches x₀:

f′(x₀) = lim_{x→x₀} (f(x) − f(x₀))/(x − x₀).
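The limit of secant slopes can be watched numerically; a minimal Python sketch (the helper name diff_quotient is ours):

```python
def diff_quotient(f, x0, h):
    """Slope of the secant line through (x0, f(x0)) and (x0+h, f(x0+h))."""
    return (f(x0 + h) - f(x0)) / h

# For f(x) = x^2 the secant slopes at x0 = 1 approach f'(1) = 2:
for h in [0.1, 0.01, 0.001]:
    print(diff_quotient(lambda x: x * x, 1.0, h))  # 2.1, 2.01, 2.001 (up to rounding)
```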
Proposition 4.1 If f is differentiable at x0 ∈ (a, b), then f is continuous at x0 .
Proof. By Proposition 3.2 we have
lim_{x→x₀} (f(x) − f(x₀)) = lim_{x→x₀} (f(x) − f(x₀))/(x − x₀) · (x − x₀) = f′(x₀) · lim_{x→x₀} (x − x₀) = f′(x₀) · 0 = 0.
The converse of this proposition is not true. For example, f(x) = |x| is continuous on R but differentiable only on R \ {0}, since lim_{h→0+0} |h|/h = 1 whereas lim_{h→0−0} |h|/h = −1. Later we will become acquainted with a function which is continuous on the whole line without being differentiable at any point!
Proposition 4.2 Let f : (r, s) → R be a function and a ∈ (r, s). Then f is differentiable at a if
and only if there exists a number c ∈ R and a function ϕ defined in a neighborhood of a such
that
f(x) = f(a) + (x − a)c + ϕ(x),   (4.2)

where

lim_{x→a} ϕ(x)/(x − a) = 0.   (4.3)

In this case f′(a) = c.
The proposition says that a function f differentiable at a can be approximated by a linear function, in our case by
y = f (a) + (x − a)f ′ (a).
The graph of this linear function is the tangent line to the graph of f at the point (a, f (a)). Later
we will use this point of view to define differentiability of functions f : Rn → Rm .
Proof. Suppose first f satisfies (4.2) and (4.3). Then

lim_{x→a} (f(x) − f(a))/(x − a) = lim_{x→a} (c + ϕ(x)/(x − a)) = c.

Hence, f is differentiable at a with f′(a) = c.
Now, let f be differentiable at a with f′(a) = c. Put ϕ(x) = f(x) − f(a) − (x − a)f′(a). Then

lim_{x→a} ϕ(x)/(x − a) = lim_{x→a} ((f(x) − f(a))/(x − a) − f′(a)) = 0.
Let us compute the linear function whose graph is the tangent line through (x₀, f(x₀)). Consider the right triangle P P₀ Q₀, where P = (x, y) lies on the tangent line. By Example 4.1 (c) we have

f′(x₀) = tan α = (y − y₀)/(x − x₀),

such that the tangent line has the equation

y = g(x) = f(x₀) + f′(x₀)(x − x₀).

This function is called the linearization of f at x₀. It is also the Taylor polynomial of degree 1 of f at x₀, see Section 4.5 below.
Proposition 4.3 Suppose f and g are defined on (a, b) and are differentiable at a point x ∈ (a, b). Then f + g, f g, and f/g are differentiable at x and

(a) (f + g)′(x) = f′(x) + g′(x);
(b) (f g)′(x) = f′(x)g(x) + f(x)g′(x);
(c) (f/g)′(x) = (f′(x)g(x) − f(x)g′(x))/g(x)².

In (c), we assume that g(x) ≠ 0.
Proof. (a) Since

((f + g)(x + h) − (f + g)(x))/h = (f(x + h) − f(x))/h + (g(x + h) − g(x))/h,

the claim follows from Proposition 3.2.
(b) Let h = f g and let t be variable. Then

h(t) − h(x) = f(t)(g(t) − g(x)) + g(x)(f(t) − f(x)),
(h(t) − h(x))/(t − x) = f(t) · (g(t) − g(x))/(t − x) + g(x) · (f(t) − f(x))/(t − x).
Noting that f (t) → f (x) as t → x, (b) follows.
(c) Next let h = f/g. Then

(h(t) − h(x))/(t − x) = (f(t)/g(t) − f(x)/g(x))/(t − x) = (f(t)g(x) − f(x)g(t))/(g(x)g(t)(t − x))
= (1/(g(t)g(x))) · (f(t)g(x) − f(x)g(x) + f(x)g(x) − f(x)g(t))/(t − x)
= (1/(g(t)g(x))) · (g(x) · (f(t) − f(x))/(t − x) − f(x) · (g(t) − g(x))/(t − x)).
Letting t → x, and applying Propositions 3.2 and 4.1, we obtain (c).
Example 4.2 (a) f (x) = xn , n ∈ Z. We will prove f ′ (x) = nxn−1 by induction on n ∈ N.
The cases n = 0, 1 are OK by Example 4.1. Suppose the statement is true for some fixed n. We
will show that (xn+1 )′ = (n + 1)xn .
By the product rule and the induction hypothesis
(xn+1 )′ = (xn · x)′ = (xn )′ x + xn (x′ ) = nxn−1 x + xn = (n + 1)xn .
This proves the claim for positive integers n. For negative n consider f (x) = 1/x−n and use
the quotient rule.
(b) (e^x)′ = e^x:

(e^x)′ = lim_{h→0} (e^{x+h} − e^x)/h = lim_{h→0} (e^x e^h − e^x)/h = e^x lim_{h→0} (e^h − 1)/h = e^x;   (4.4)

the last equation simply follows from Example 3.9 (b).
(c) (sin x)′ = cos x, (cos x)′ = − sin x. Using sin(x + y) − sin(x − y) = 2 cos x sin y we have

(sin x)′ = lim_{h→0} (sin(x + h) − sin x)/h = lim_{h→0} 2 cos(x + h/2) sin(h/2)/h
= lim_{h→0} cos(x + h/2) · lim_{h→0} sin(h/2)/(h/2).

Since cos x is continuous and lim_{h→0} (sin h)/h = 1 (by the argument after Lemma 3.15), we obtain (sin x)′ = cos x. The proof for cos x is analogous.
(d) (tan x)′ = 1/cos²x. Using the quotient rule for the function tan x = sin x/cos x we have

(tan x)′ = ((sin x)′ cos x − sin x (cos x)′)/cos²x = (cos²x + sin²x)/cos²x = 1/cos²x.
The next proposition deals with composite functions and is probably the most important statement about derivatives.
Proposition 4.4 (Chain rule) Let g : (α, β) → R be differentiable at x0 ∈ (α, β) and let
f : (a, b) → R be differentiable at y0 = g(x0 ) ∈ (a, b). Then h = f ◦g is differentiable at
x0 , and
h′ (x0 ) = f ′ (y0 )g ′ (x0 ).
(4.5)
Proof. We have

(f(g(x)) − f(g(x₀)))/(x − x₀) = (f(g(x)) − f(g(x₀)))/(g(x) − g(x₀)) · (g(x) − g(x₀))/(x − x₀)
→ lim_{y→y₀} (f(y) − f(y₀))/(y − y₀) · g′(x₀) = f′(y₀) g′(x₀)   as x → x₀.
Here we used that y = g(x) tends to y0 = g(x0 ) as x → x0 , since g is continuous at x0 .
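A numerical spot check of (4.5), using only the math module; the composite sin(x²) is our choice of example:

```python
import math

# h = f o g with f = sin and g(x) = x^2, so h'(x) = cos(x^2) * 2x.
def h(x):
    return math.sin(x * x)

x0 = 0.8
exact = math.cos(x0 * x0) * 2 * x0                  # chain rule value
numeric = (h(x0 + 1e-6) - h(x0 - 1e-6)) / 2e-6      # central difference
assert abs(exact - numeric) < 1e-8
print(exact)
```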
Proposition 4.5 Let f : (a, b) → R be strictly monotonic and continuous. Suppose f is differentiable at x with f′(x) ≠ 0. Then the inverse function g = f⁻¹ : f((a, b)) → R is differentiable at y = f(x) with

g′(y) = 1/f′(x) = 1/f′(g(y)).   (4.6)
Proof. Let (yₙ) ⊂ f((a, b)) be a sequence with yₙ → y and yₙ ≠ y for all n. Put xₙ = g(yₙ). Since g is continuous (by Corollary 3.9), lim_{n→∞} xₙ = x. Since g is injective, xₙ ≠ x for all n.
We have
lim_{n→∞} (g(yₙ) − g(y))/(yₙ − y) = lim_{n→∞} (xₙ − x)/(f(xₙ) − f(x)) = lim_{n→∞} 1/((f(xₙ) − f(x))/(xₙ − x)) = 1/f′(x).
Hence g ′ (y) = 1/f ′ (x) = 1/f ′ (g(y)).
We give some applications of the last two propositions.
Example 4.3 (a) Let f : R → R be differentiable; define F : R → R by F (x) := f (ax + b)
with some a, b ∈ R. Then
F ′ (x) = af ′ (ax + b).
(b) In what follows f is the original function (with known derivative) and g is the inverse
function to f. We fix the notation y = f(x) and x = g(y).
log : R₊ \ {0} → R is the inverse function to f(x) = e^x. By the above proposition,

(log y)′ = 1/(e^x)′ = 1/e^x = 1/y.
(c) x^α = e^{α log x}. Hence, (x^α)′ = (e^{α log x})′ = e^{α log x} · α · (1/x) = α x^{α−1}.
(d) Suppose f > 0 and g = log f. Then g′ = f′/f; hence f′ = f g′.
(e) arcsin : [−1, 1] → [−π/2, π/2] is the inverse function to y = f(x) = sin x. If y ∈ (−1, 1) then

(arcsin y)′ = 1/(sin x)′ = 1/cos x.

Since y ∈ [−1, 1] implies x = arcsin y ∈ [−π/2, π/2], we have cos x ≥ 0. Therefore, cos x = √(1 − sin²x) = √(1 − y²). Hence

(arcsin y)′ = 1/√(1 − y²),   −1 < y < 1.

Note that the derivative is not defined at the endpoints y = −1 and y = 1.
(f)

(arctan y)′ = 1/(tan x)′ = cos²x.

Since y = tan x we have

y² = tan²x = (1 − cos²x)/cos²x = 1/cos²x − 1,   so   cos²x = 1/(1 + y²),

hence

(arctan y)′ = 1/(1 + y²).
4.2 The Derivatives of Elementary Functions
function                    derivative

const.                      0
x^n (n ∈ N)                 n x^{n−1}
x^α (α ∈ R, x > 0)          α x^{α−1}
e^x                         e^x
a^x (a > 0)                 a^x log a
log x                       1/x
log_a x                     1/(x log a)
sin x                       cos x
cos x                       − sin x
tan x                       1/cos²x
cot x                       −1/sin²x
sinh x                      cosh x
cosh x                      sinh x
tanh x                      1/cosh²x
coth x                      −1/sinh²x
arcsin x                    1/√(1 − x²)
arccos x                    −1/√(1 − x²)
arctan x                    1/(1 + x²)
arccot x                    −1/(1 + x²)
arsinh x                    1/√(x² + 1)
arcosh x                    1/√(x² − 1)
artanh x                    1/(1 − x²)
arcoth x                    1/(1 − x²)
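Several rows of the table can be spot-checked against central difference quotients with the standard library; the sample points below are arbitrary interior points of the respective domains:

```python
import math

def num_deriv(f, x, h=1e-6):
    return (f(x + h) - f(x - h)) / (2 * h)

checks = [
    (math.sin,   math.cos,                           0.7),
    (math.tan,   lambda x: 1 / math.cos(x) ** 2,     0.7),
    (math.asin,  lambda x: 1 / math.sqrt(1 - x * x), 0.3),
    (math.atan,  lambda x: 1 / (1 + x * x),          0.3),
    (math.asinh, lambda x: 1 / math.sqrt(x * x + 1), 0.3),
    (math.atanh, lambda x: 1 / (1 - x * x),          0.3),
]
for f, fprime, x in checks:
    assert abs(num_deriv(f, x) - fprime(x)) < 1e-6
print("table rows confirmed at the sample points")
```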
4.2.1 Derivatives of Higher Order
Let f : D → R be differentiable. If the derivative f ′ : D → R is differentiable at x ∈ D, then
d²f(x)/dx² = f″(x) = (f′)′(x)
is called the second derivative of f at x. Similarly, one defines inductively higher order derivatives. Continuing in this manner, we obtain functions
f, f ′ , f ′′ , f (3) , . . . , f (k)
each of which is the derivative of the preceding one. f (n) is called the nth derivative of f or the
derivative of order n of f . We also use the Leibniz notation
f^{(k)}(x) = d^k f(x)/dx^k = (d/dx)^k f(x).
Definition 4.2 Let D ⊂ R and let k ∈ N be a positive integer. We denote by C^k(D) the set of all functions f : D → R such that f^{(k)}(x) exists for all x ∈ D and f^{(k)}(x) is continuous. Obviously C(D) ⊃ C¹(D) ⊃ C²(D) ⊃ ···. Further, we set

C^∞(D) = ⋂_{k∈N} C^k(D) = {f : D → R | f^{(k)}(x) exists ∀ k ∈ N, x ∈ D}.   (4.7)
f ∈ Ck (D) is called k times continuously differentiable. C(D) = C0 (D) is the vector space of
continuous functions on D.
Using induction over n, one proves the following proposition.
Proposition 4.6 (Leibniz formula) Let f and g be n times differentiable. Then f g is n times
differentiable with
(f(x)g(x))^{(n)} = Σ_{k=0}^n \binom{n}{k} f^{(k)}(x) g^{(n−k)}(x).   (4.8)
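For polynomials the Leibniz formula can be checked exactly; a small self-contained Python sketch with polynomials represented as coefficient lists (all helper names are ours):

```python
from math import comb

def poly_mul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def poly_deriv(p, k=1):
    for _ in range(k):
        p = [i * c for i, c in enumerate(p)][1:] or [0]
    return p

def poly_eval(p, x):
    return sum(c * x ** i for i, c in enumerate(p))

f = [1, 2, 0, 3]      # 1 + 2x + 3x^3
g = [0, 1, 5]         # x + 5x^2
n, x = 3, 2.0
lhs = poly_eval(poly_deriv(poly_mul(f, g), n), x)
rhs = sum(comb(n, k) * poly_eval(poly_deriv(f, k), x)
          * poly_eval(poly_deriv(g, n - k), x) for k in range(n + 1))
print(lhs == rhs)  # True
```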
4.3 Local Extrema and the Mean Value Theorem
Many properties of a function f like monotony, convexity, and existence of local extrema can
be studied using the derivative f ′ . From estimates for f ′ we obtain estimates for the growth of
f.
Definition 4.3 Let f : [a, b] → R be a function. We say that f has a local maximum at the point
ξ, ξ ∈ (a, b), if there exists δ > 0 such that f (x) ≤ f (ξ) for all x ∈ [a, b] with | x − ξ | < δ.
Local minima are defined likewise.
We say that ξ is a local extremum if it is either a local maximum or a local minimum.
Proposition 4.7 Let f be defined on [a, b]. If f has a local extremum at a point ξ ∈ (a, b), and
if f ′ (ξ) exists, then f ′ (ξ) = 0.
Proof. Suppose f has a local maximum at ξ. According with the definition choose δ > 0 such
that
a < ξ − δ < ξ < ξ + δ < b.
If ξ − δ < x < ξ, then

(f(x) − f(ξ))/(x − ξ) ≥ 0.

Letting x → ξ, we see that f′(ξ) ≥ 0.
If ξ < x < ξ + δ, then

(f(x) − f(ξ))/(x − ξ) ≤ 0.

Letting x → ξ, we see that f′(ξ) ≤ 0. Hence, f′(ξ) = 0.
Remarks 4.2 (a) f′(x) = 0 is a necessary but not a sufficient condition for a local extremum at x. For example, f(x) = x³ has f′(0) = 0, but x³ has no local extremum.
(b) If f attains its local extrema at the boundary, like f (x) = x on [0, 1], we do not have
f ′ (ξ) = 0.
Theorem 4.8 (Rolle’s Theorem) Let f : [a, b] → R be continuous with f (a) = f (b) and let f
be differentiable in (a, b). Then there exists a point ξ ∈ (a, b) with f ′ (ξ) = 0.
In particular, between two zeros of a differentiable function there is a zero of its derivative.
Proof. If f is the constant function, the theorem is trivial since f ′ (x) ≡ 0 on (a, b). Otherwise, there exists x0 ∈ (a, b) such that f (x0 ) > f (a) or f (x0 ) < f (a). Then f attains its
maximum or minimum, respectively, at a point ξ ∈ (a, b). By Proposition 4.7, f ′ (ξ) = 0.
Theorem 4.9 (Mean Value Theorem) Let f : [a, b] → R be continuous and differentiable in
(a, b). Then there exists a point ξ ∈ (a, b) such that
f′(ξ) = (f(b) − f(a))/(b − a).   (4.9)

Geometrically, the mean value theorem states that there exists a tangent line through some point (ξ, f(ξ)) which is parallel to the secant line AB, A = (a, f(a)), B = (b, f(b)).
Theorem 4.10 (Generalized Mean Value Theorem) Let f and g be continuous functions on
[a, b] which are differentiable on (a, b). Then there exists a point ξ ∈ (a, b) such that
(f (b) − f (a))g ′(ξ) = (g(b) − g(a))f ′ (ξ).
Proof. Put
h(t) = (f (b) − f (a))g(t) − (g(b) − g(a))f (t).
Then h is continuous in [a, b] and differentiable in (a, b) and
h(a) = f (b)g(a) − f (a)g(b) = h(b).
Rolle’s theorem shows that there exists ξ ∈ (a, b) such that
h′(ξ) = (f(b) − f(a))g′(ξ) − (g(b) − g(a))f′(ξ) = 0.
The theorem follows.
In case g′ is nonzero on (a, b) and g(b) − g(a) ≠ 0, the generalized MVT states the existence of some ξ ∈ (a, b) such that

(f(b) − f(a))/(g(b) − g(a)) = f′(ξ)/g′(ξ).
This is in particular true for g(x) = x and g ′ = 1 which gives the assertion of the Mean Value
Theorem.
Remark 4.3 Note that the MVT fails if f is complex-valued, continuous on [a, b], and differentiable on (a, b). Indeed, f(x) = e^{ix} on [0, 2π] is a counterexample: f is continuous on [0, 2π], differentiable on (0, 2π), and f(0) = f(2π) = 1. However, there is no ξ ∈ (0, 2π) such that

0 = (f(2π) − f(0))/(2π) = f′(ξ) = i e^{iξ},

since the exponential function has no zero, see (3.7) (e^z · e^{−z} = 1) in Subsection 3.5.1.
Corollary 4.11 Suppose f is differentiable on (a, b).
If f′(x) ≥ 0 for all x ∈ (a, b), then f is monotonically increasing.
If f ′ (x) = 0 for all x ∈ (a, b), then f is constant.
If f ′ (x) ≤ 0 for all x in (a, b), then f is monotonically decreasing.
Proof. All conclusions can be read off from the equality
f (x) − f (t) = (x − t)f ′ (ξ)
which is valid for each pair x, t, a < t < x < b and for some ξ ∈ (t, x).
4.3.1 Local Extrema and Convexity
Proposition 4.12 Let f : (a, b) → R be differentiable and suppose f ′′ (ξ) exists at a point
ξ ∈ (a, b). If
f ′ (ξ) = 0 and f ′′ (ξ) > 0,
then f has a local minimum at ξ. Similarly, if
f ′ (ξ) = 0
and f ′′ (ξ) < 0,
f has a local maximum at ξ.
Remark 4.4 The condition of Proposition 4.12 is sufficient but not necessary for the existence
of a local extremum. For example, f (x) = x4 has a local minimum at x = 0, but f ′′ (0) = 0.
Proof. We consider the case f ′′ (ξ) > 0; the proof of the other case is analogous. Since
f″(ξ) = lim_{x→ξ} (f′(x) − f′(ξ))/(x − ξ) > 0.

By Homework 10.4 there exists δ > 0 such that

(f′(x) − f′(ξ))/(x − ξ) > |f″(ξ)|/2 > 0

for all x with 0 < |x − ξ| < δ.
Since f′(ξ) = 0 it follows that

f′(x) < 0   if ξ − δ < x < ξ,
f′(x) > 0   if ξ < x < ξ + δ.
Hence, by Corollary 4.11, f is decreasing in (ξ − δ, ξ) and increasing in (ξ, ξ + δ). Therefore,
f has a local minimum at ξ.
Definition 4.4 A function f : (a, b) → R is said to be convex if for all x, y ∈ (a, b) and all λ ∈ [0, 1]

f(λx + (1 − λ)y) ≤ λ f(x) + (1 − λ) f(y).   (4.10)

[Figure: for a convex f, the chord from (x, f(x)) to (y, f(y)) lies above the graph.]

A function f is said to be concave if −f is convex.
Proposition 4.13 (a) Convex functions are continuous.
(b) Suppose f : (a, b) → R is twice differentiable. Then f is convex if and only if f ′′ (x) ≥ 0 for
all x ∈ (a, b).
Proof. The proof is in Appendix C to this chapter.
4.4 L’Hospital’s Rule
Theorem 4.14 (L'Hospital's Rule) Suppose f and g are differentiable in (a, b) and g′(x) ≠ 0 for all x ∈ (a, b), where −∞ ≤ a < b ≤ +∞. Suppose

lim_{x→a+0} f′(x)/g′(x) = A.   (4.11)

If

(a)  lim_{x→a+0} f(x) = lim_{x→a+0} g(x) = 0,  or   (4.12)
(b)  lim_{x→a+0} f(x) = lim_{x→a+0} g(x) = +∞,   (4.13)

then

lim_{x→a+0} f(x)/g(x) = A.   (4.14)
The analogous statements are of course also true if x → b − 0, or if g(x) → −∞.
Proof. First we consider the case of finite a ∈ R. (a) One can extend the definition of f and g via f(a) = g(a) = 0. Then f and g are continuous at a. By the generalized mean value theorem, for every x ∈ (a, b) there exists a ξ ∈ (a, x) such that

f(x)/g(x) = (f(x) − f(a))/(g(x) − g(a)) = f′(ξ)/g′(ξ).

If x approaches a then ξ also approaches a, and (a) follows.
(b) Now let f(a + 0) = g(a + 0) = +∞. Given ε > 0 choose δ > 0 such that

|f′(t)/g′(t) − A| < ε

if t ∈ (a, a + δ). By the generalized mean value theorem, for any x, y ∈ (a, a + δ) with x ≠ y,

|(f(x) − f(y))/(g(x) − g(y)) − A| < ε.

We have

f(x)/g(x) = (f(x) − f(y))/(g(x) − g(y)) · (1 − g(y)/g(x))/(1 − f(y)/f(x)).

The right factor tends to 1 as x approaches a; in particular there exists δ₁ > 0 with δ₁ < δ such that x ∈ (a, a + δ₁) implies

|f(x)/g(x) − (f(x) − f(y))/(g(x) − g(y))| < ε.

Further, the triangle inequality gives

|f(x)/g(x) − A| < 2ε.
This proves (b).
The case x → +∞ can be reduced to the limit process y → 0 + 0 using the substitution
y = 1/x.
L'Hospital's rule also applies in the cases A = +∞ and A = −∞.
Example 4.4 (a) lim_{x→0} (sin x)/x = lim_{x→0} (cos x)/1 = 1.
(b) lim_{x→0+0} √x/(1 − cos x) = lim_{x→0+0} (1/(2√x))/sin x = lim_{x→0+0} 1/(2√x sin x) = +∞.
(c) lim_{x→0+0} x log x = lim_{x→0+0} (log x)/(1/x) = lim_{x→0+0} (1/x)/(−1/x²) = lim_{x→0+0} (−x) = 0.
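The limits in (a) and (c) can also be observed numerically; this is a rough illustration, not a proof:

```python
import math

# (a) sin x / x -> 1 and (c) x log x -> 0 as x -> 0+:
for x in [1e-1, 1e-3, 1e-6]:
    print(x, math.sin(x) / x, x * math.log(x))
```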
Remark 4.5 It is easy to transform other indeterminate expressions to the forms 0/0 or ∞/∞ of l'Hospital's rule:

0 · ∞:   f · g = f/(1/g);
∞ − ∞:   f − g = (1/g − 1/f)/(1/(f g));
0⁰:   f^g = e^{g log f}.

Similarly, expressions of the form 1^∞ and ∞⁰ can be transformed.
4.5 Taylor’s Theorem
The aim of this section is to show how n times differentiable functions can be approximated by
polynomials of degree n.
First consider a polynomial p(x) = a_n x^n + ··· + a₁x + a₀. We compute

p′(x) = n a_n x^{n−1} + (n − 1) a_{n−1} x^{n−2} + ··· + a₁,
p″(x) = n(n − 1) a_n x^{n−2} + (n − 1)(n − 2) a_{n−1} x^{n−3} + ··· + 2a₂,
  ⋮
p^{(n)}(x) = n! a_n.

Inserting x = 0 gives p(0) = a₀, p′(0) = a₁, p″(0) = 2a₂, …, p^{(n)}(0) = n! a_n. Hence,

p(x) = p(0) + (p′(0)/1!) x + (p″(0)/2!) x² + ··· + (p^{(n)}(0)/n!) x^n.   (4.15)

Now, fix a ∈ R and let q(x) = p(x + a). Since q^{(k)}(0) = p^{(k)}(a), (4.15) gives

p(x + a) = q(x) = Σ_{k=0}^n (q^{(k)}(0)/k!) x^k,
p(x + a) = Σ_{k=0}^n (p^{(k)}(a)/k!) x^k.
Replacing in the above equation x + a by x yields
p(x) = Σ_{k=0}^n (p^{(k)}(a)/k!) (x − a)^k.   (4.16)
Theorem 4.15 (Taylor’s Theorem) Suppose f is a real function on [r, s], n ∈ N, f (n) is continuous on [r, s], f (n+1) (t) exists for all t ∈ (r, s). Let a and x be distinct points of [r, s] and
define
P_n(x) = Σ_{k=0}^n (f^{(k)}(a)/k!) (x − a)^k.   (4.17)

Then there exists a point ξ between x and a such that

f(x) = P_n(x) + (f^{(n+1)}(ξ)/(n + 1)!) (x − a)^{n+1}.   (4.18)

For n = 0, this is just the mean value theorem. P_n(x) is called the nth Taylor polynomial of f at x = a, and the second summand of (4.18),

R_{n+1}(x, a) = (f^{(n+1)}(ξ)/(n + 1)!) (x − a)^{n+1},

is called the Lagrangian remainder term.
In general, the theorem shows that f can be approximated by a polynomial of degree n, and that (4.18) allows one to estimate the error if we know bounds for |f^{(n+1)}(x)|.
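A quick numerical illustration of the error estimate, with f = exp at a = 0 as our choice of example; the bound uses |f^{(n+1)}(ξ)| ≤ e^x for ξ ∈ [0, x]:

```python
import math

def taylor_exp(x, n):
    """The nth Taylor polynomial P_n(x) of exp at a = 0."""
    return sum(x ** k / math.factorial(k) for k in range(n + 1))

x, n = 1.0, 6
err = abs(math.exp(x) - taylor_exp(x, n))
bound = math.exp(x) * abs(x) ** (n + 1) / math.factorial(n + 1)
assert err <= bound        # the Lagrangian remainder bound from (4.18)
print(err, bound)
```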
Proof. Consider a and x to be fixed; let M be the number defined by

f(x) = P_n(x) + M(x − a)^{n+1}

and put

g(t) = f(t) − P_n(t) − M(t − a)^{n+1},   for r ≤ t ≤ s.   (4.19)

We have to show that (n + 1)! M = f^{(n+1)}(ξ) for some ξ between a and x. By (4.17) and (4.19),

g^{(n+1)}(t) = f^{(n+1)}(t) − (n + 1)! M,   for r < t < s.   (4.20)

Hence the proof will be complete if we can show that g^{(n+1)}(ξ) = 0 for some ξ between a and x.
Since P_n^{(k)}(a) = f^{(k)}(a) for k = 0, 1, …, n, we have

g(a) = g′(a) = ··· = g^{(n)}(a) = 0.
Our choice of M shows that g(x) = 0, so that g ′ (ξ1 ) = 0 for some ξ1 between a and x, by
Rolle’s theorem. Since g ′ (a) = 0 we conclude similarly that g ′′ (ξ2 ) = 0 for some ξ2 between
a and ξ1 . After n + 1 steps we arrive at the conclusion that g (n+1) (ξn+1 ) = 0 for some ξn+1
between a and ξn , that is, between a and x.
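Numerically, (4.18) is easy to check; the following sketch (an illustration, not part of the original notes) takes f = exp at a = 0, where every derivative is again exp, so both P_n and the Lagrange bound are explicit.

```python
import math

def taylor_poly_exp(x, n, a=0.0):
    # P_n(x) = sum_{k=0}^{n} f^(k)(a)/k! (x - a)^k with f = exp, so f^(k) = exp
    return sum(math.exp(a) * (x - a) ** k / math.factorial(k) for k in range(n + 1))

a, x, n = 0.0, 0.5, 4
actual_error = abs(math.exp(x) - taylor_poly_exp(x, n, a))
# Lagrange bound: |f^(n+1)(xi)| <= e^x for xi in [a, x], hence
lagrange_bound = math.exp(x) * abs(x - a) ** (n + 1) / math.factorial(n + 1)
```

The actual error must fall below the Lagrangian remainder bound, which it does with room to spare.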
Definition 4.5 Suppose that f is a real function defined on [r, s] such that f (n) (t) exists for all
t ∈ (r, s) and all n ∈ N. Let x and a be points of [r, s]. Then
T_f(x) = ∑_{k=0}^{∞} (f^{(k)}(a)/k!) (x − a)^k   (4.21)
is called the Taylor series of f at a.
Remarks 4.6 (a) The radius r of convergence of a Taylor series can be 0.
(b) If T_f converges, it may happen that T_f(x) ≠ f(x). If T_f at a point a converges to f(x) in a certain neighborhood U_r(a), r > 0, then f is said to be analytic at a.
Example 4.5 We give an example for (b). Define f : R → R via
f(x) = { e^{−1/x^2},   if x ≠ 0,
         0,            if x = 0.
We will show that f ∈ C^∞(R) with f^{(k)}(0) = 0 for all k. To this end we prove by induction on n that there exists a polynomial p_n such that
f^{(n)}(x) = p_n(1/x) e^{−1/x^2},   x ≠ 0,
and f^{(n)}(0) = 0. For n = 0 the statement is clear, taking p_0(x) ≡ 1. Suppose the statement is true for n. First, let x ≠ 0; then
f^{(n+1)}(x) = ( p_n(1/x) e^{−1/x^2} )′ = ( −(1/x^2) p_n′(1/x) + (2/x^3) p_n(1/x) ) e^{−1/x^2}.
Choose p_{n+1}(t) = −p_n′(t) t^2 + 2 p_n(t) t^3.
Secondly,
f^{(n+1)}(0) = lim_{h→0} (f^{(n)}(h) − f^{(n)}(0))/h = lim_{h→0} (1/h) p_n(1/h) e^{−1/h^2} = lim_{x→±∞} x p_n(x) e^{−x^2} = 0,
where we used Proposition 2.7 in the last equality.
Hence T_f ≡ 0 at 0 (the Taylor series is identically 0), and T_f(x) does not converge to f(x) in a neighborhood of 0.
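A quick numerical check of this example (a sketch added for illustration): the one-sided difference quotients of f at 0 collapse to 0 extremely fast, while f itself stays strictly positive off 0, so the zero Taylor series cannot represent f.

```python
import math

def f(x):
    # the flat function of Example 4.5
    return math.exp(-1.0 / x**2) if x != 0 else 0.0

# one-sided difference quotients (f(h) - f(0))/h at 0 shrink rapidly
dq = [f(h) / h for h in (0.5, 0.25, 0.1)]
positive_off_zero = f(0.1)   # T_f = 0 misses this positive value
```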
4.5.1 Examples of Taylor Series
(a) Power series coincide with their Taylor series.
e^x = ∑_{n=0}^{∞} x^n/n!,   x ∈ R;     ∑_{n=0}^{∞} x^n = 1/(1 − x),   x ∈ (−1, 1).
(b) f(x) = log(1 + x), see Homework 13.4.
(c) f (x) = (1 + x)α , α ∈ R, a = 0. We have
f^{(k)}(x) = α(α−1)···(α−k+1)(1+x)^{α−k},   in particular f^{(k)}(0) = α(α−1)···(α−k+1).
Therefore,
(1 + x)^α = ∑_{k=0}^{n} (α(α − 1)···(α − k + 1)/k!) x^k + R_n(x).   (4.22)
The quotient test shows that the corresponding power series converges for | x | < 1. Consider
the Lagrangian remainder term with 0 < ξ < x < 1 and n + 1 > α. Then
| R_{n+1}(x) | = |\binom{α}{n+1}| (1 + ξ)^{α−n−1} x^{n+1} ≤ |\binom{α}{n+1}| x^{n+1} → 0
as n → ∞. Hence,
(1 + x)^α = ∑_{n=0}^{∞} \binom{α}{n} x^n,   0 < x < 1.   (4.23)
(4.23) is called the binomial series. Its radius of convergence is R = 1. Looking at other forms
of the remainder term gives that (4.23) holds for −1 < x < 1.
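The partial sums of (4.23) are cheap to evaluate because consecutive generalized binomial coefficients differ by the factor (α − k + 1)/k. The sketch below (values chosen purely for illustration) approximates (1 + x)^{1/2} = √(1 + x).

```python
import math

def binomial_series(alpha, x, n):
    # partial sum sum_{k=0}^{n} C(alpha, k) x^k,
    # using C(alpha, k) = C(alpha, k-1) * (alpha - k + 1)/k
    term, total = 1.0, 1.0      # C(alpha, 0) = 1
    for k in range(1, n + 1):
        term *= (alpha - k + 1) / k * x
        total += term
    return total

approx = binomial_series(0.5, 0.5, 60)   # should approach sqrt(1.5)
```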
(d) y = f (x) = arctan x. Since y ′ = 1/(1 + x2 ) and y ′′ = −2x/(1 + x2 )2 we see that
y ′(1 + x2 ) = 1.
Differentiating this n times and using Leibniz’s formula, Proposition 4.6, we have
∑_{k=0}^{n} \binom{n}{k} (y′)^{(k)} (1 + x^2)^{(n−k)} = 0
⟹ \binom{n}{n} y^{(n+1)} (1 + x^2) + \binom{n}{n−1} y^{(n)} 2x + \binom{n}{n−2} y^{(n−1)} 2 = 0;
x = 0:   y^{(n+1)} + n(n − 1) y^{(n−1)} = 0.
This yields
y^{(n)}(0) = { 0,              if n = 2k,
               (−1)^k (2k)!,   if n = 2k + 1.
Therefore,
arctan x = ∑_{k=0}^{n} ((−1)^k/(2k + 1)) x^{2k+1} + R_{2n+2}(x).   (4.24)
One can prove that −1 < x ≤ 1 implies R2n+2 (x) → 0 as n → ∞. In particular, x = 1 gives
π/4 = 1 − 1/3 + 1/5 − + · · · .
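Since for x = 1 the series in (4.24) is alternating with decreasing terms, the error after n terms is at most the first omitted term; a numerical sketch (the truncation index is arbitrary):

```python
import math

n = 100_000
partial = sum((-1) ** k / (2 * k + 1) for k in range(n + 1))
# alternating series estimate: |pi/4 - partial| <= first omitted term
error = abs(partial - math.pi / 4)
bound = 1 / (2 * (n + 1) + 1)
```

The slow 1/n decay of the bound explains why this is a poor way to compute π in practice.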
4.6 Appendix C
Corollary 4.16 (to the mean value theorem) Let f : R → R be a differentiable function with
f′(x) = c f(x)   for all x ∈ R,   (4.25)
where c ∈ R is a fixed number. Let A = f (0). Then
f(x) = A e^{cx}   for all x ∈ R.   (4.26)
Proof. Consider F (x) = f (x)e−cx . Using the product rule for derivatives and (4.25) we obtain
F′(x) = f′(x)e^{−cx} + f(x)(−c)e^{−cx} = (f′(x) − c f(x)) e^{−cx} = 0.
By Corollary 4.11, F (x) is constant. Since F (0) = f (0) = A, F (x) = A for all x ∈ R; the
statement follows.
The Continuity of derivatives
We have seen that there exist derivatives f ′ which are not continuous at some point. However,
not every function is a derivative. In particular, derivatives which exist at every point of an interval have one important property: The intermediate value theorem holds. The precise statement
follows.
Proposition 4.17 Suppose f is differentiable on [a, b] and suppose f ′ (a) < λ < f ′ (b). Then
there is a point x ∈ (a, b) such that f ′ (x) = λ.
Proof. Put g(t) = f (t) − λt. Then g is differentiable and g ′ (a) < 0. Therefore, g(t1) < g(a)
for some t1 ∈ (a, b). Similarly, g ′ (b) > 0, so that g(t2 ) < g(b) for some t2 ∈ (a, b). Hence,
g attains its minimum in the open interval (a, b) in some point x ∈ (a, b). By Proposition 4.7,
g ′(x) = 0. Hence, f ′ (x) = λ.
Corollary 4.18 If f is differentiable on [a, b], then f ′ cannot have discontinuities of the first
kind.
Proof of Proposition 4.13. (a) Suppose first that f ′′ ≥ 0 for all x. By Corollary 4.11, f ′ is
increasing. Let a < x < y < b and λ ∈ [0, 1]. Put t = λx + (1 − λ)y. Then x < t < y and by
the mean value theorem there exist ξ1 ∈ (x, t) and ξ2 ∈ (t, y) such that
(f(t) − f(x))/(t − x) = f′(ξ_1) ≤ f′(ξ_2) = (f(y) − f(t))/(y − t).
Since t − x = (1 − λ)(y − x) and y − t = λ(y − x) it follows that
(f(t) − f(x))/(1 − λ) ≤ (f(y) − f(t))/λ   ⟹   f(t) ≤ λ f(x) + (1 − λ) f(y).
Hence, f is convex.
(b) Let f : (a, b) → R be convex and twice differentiable. Suppose to the contrary f ′′ (x0 ) < 0
for some x0 ∈ (a, b). Let c = f ′ (x0 ); put
ϕ(x) = f (x) − (x − x0 )c.
Then ϕ : (a, b) → R is twice differentiable with ϕ′ (x0 ) = 0 and ϕ′′ (x0 ) < 0. Hence, by
Proposition 4.12, ϕ has a local maximum in x0 . By definition, there is a δ > 0 such that
Uδ (x0 ) ⊂ (a, b) and
ϕ(x0 − δ) < ϕ(x0 ), ϕ(x0 + δ) < ϕ(x0 ).
It follows that
f(x_0) = ϕ(x_0) > (1/2)(ϕ(x_0 − δ) + ϕ(x_0 + δ)) = (1/2)(f(x_0 − δ) + f(x_0 + δ)).
This contradicts the convexity of f if we set x = x0 − δ, y = x0 + δ, and λ = 1/2.
Chapter 5
Integration
In the first section of this chapter derivatives will not appear! Roughly speaking, integration
generalizes “addition”. The formula distance = velocity × time is only valid for constant velocity. The right formula is s = ∫_{t_0}^{t_1} v(t) dt. We need integrals to compute lengths of curves, areas of surfaces, and volumes.
The study of integrals requires a long preparation, but once this preliminary work has been
completed, integrals will be an invaluable tool for creating new functions, and the derivative
will reappear more powerful than ever. The relation between the integral and derivatives is
given in the Fundamental Theorem of Calculus.
The integral formalizes a simple intuitive concept, that of area. It is not surprising that learning the definition of an intuitive concept can present great difficulties, and “area” is certainly not an exception.
[Figure: a region under a graph between a and b, captioned “What’s the area??”]
5.1 The Riemann–Stieltjes Integral
In this section we will only define the area of some very special regions—those which are
bounded by the horizontal axis, the vertical lines through (a, 0) and (b, 0) and the graph of a
function f such that f (x) ≥ 0 for all x in [a, b]. If f is negative on a subinterval of [a, b], the
integral will represent the difference of the areas above and below the x-axis.
All intervals [a, b] are finite intervals.
Definition 5.1 Let [a, b] be an interval. By a partition of [a, b] we mean a finite set of points
x0 , x1 , . . . , xn , where
a = x0 ≤ x1 ≤ · · · ≤ xn = b.
We write
∆x_i = x_i − x_{i−1},   i = 1, . . . , n.
Now suppose f is a bounded real function defined on [a, b]. Corresponding to each partition P
of [a, b] we put
M_i = sup{ f(x) | x ∈ [x_{i−1}, x_i] },   (5.1)
m_i = inf{ f(x) | x ∈ [x_{i−1}, x_i] },   (5.2)
U(P, f) = ∑_{i=1}^{n} M_i ∆x_i,   L(P, f) = ∑_{i=1}^{n} m_i ∆x_i,   (5.3)
and finally
∫̄_a^b f dx = inf_P U(P, f),   (5.4)
∫̲_a^b f dx = sup_P L(P, f),   (5.5)
where the infimum and supremum are taken over all partitions P of [a, b]. The left members of
(5.4) and (5.5) are called the upper and lower Riemann integrals of f over [a, b], respectively.
If the upper and lower integrals are equal, we say that f is Riemann-integrable on [a, b] and we
write f ∈ R (that is R denotes the Riemann-integrable functions), and we denote the common
value of (5.4) and (5.5) by
∫_a^b f dx   or by   ∫_a^b f(x) dx.   (5.6)
This is the Riemann integral of f over [a, b].
Since f is bounded, there exist two numbers m and M such that m ≤ f (x) ≤ M for all
x ∈ [a, b]. Hence for every partition P
m(b − a) ≤ L(P, f ) ≤ U(P, f ) ≤ M(b − a),
so that the numbers L(P, f ) and U(P, f ) form a bounded set. This shows that the upper and the
lower integrals are defined for every bounded function f . The question of their equality, and
hence the question of the integrability of f , is a more delicate one. Instead of investigating it
separately for the Riemann integral, we shall immediately consider a more general situation.
Definition 5.2 Let α be a monotonically increasing function on [a, b] (since α(a) and α(b) are
finite, it follows that α is bounded on [a, b]). Corresponding to each partition P of [a, b], we
write
∆αi = α(xi ) − α(xi−1 ).
It is clear that ∆αi ≥ 0. For any real function f which is bounded on [a, b] we put
U(P, f, α) = ∑_{i=1}^{n} M_i ∆α_i,   (5.7)
L(P, f, α) = ∑_{i=1}^{n} m_i ∆α_i,   (5.8)
where Mi and mi have the same meaning as in Definition 5.1, and we define
∫̄_a^b f dα = inf U(P, f, α),   (5.9)
∫̲_a^b f dα = sup L(P, f, α),   (5.10)
where the infimum and the supremum are taken over all partitions P .
If the left members of (5.9) and (5.10) are equal, we denote their common value by
∫_a^b f dα   or sometimes by   ∫_a^b f(x) dα(x).   (5.11)
This is the Riemann–Stieltjes integral (or simply the Stieltjes integral) of f with respect to α,
over [a, b]. If (5.11) exists, we say that f is integrable with respect to α in the Riemann sense,
and write f ∈ R(α).
By taking α(x) = x, the Riemann integral is seen to be a special case of the Riemann–Stieltjes
integral. Let us mention explicitly that in the general case α need not even be continuous.
We shall now investigate the existence of the integral (5.11). Without saying so every time, f
will be assumed real and bounded, and α increasing on [a, b].
Definition 5.3 We say that a partition P ∗ is a refinement of the partition P if P ∗ ⊃ P (that
is, every point of P is a point of P ∗ ). Given two partitions, P1 and P2 , we say that P ∗ is their
common refinement if P ∗ = P1 ∪ P2 .
Lemma 5.1 If P ∗ is a refinement of P , then
L(P, f, α) ≤ L(P ∗ , f, α) and U(P, f, α) ≥ U(P ∗ , f, α).
(5.12)
Proof. We only prove the first inequality of (5.12); the proof of the second one is analogous.
Suppose first that P ∗ contains just one point more than P . Let this extra point be x∗ , and
suppose xi−1 ≤ x∗ < xi , where xi−1 and xi are two consecutive points of P . Put
w1 = inf{f (x) | x ∈ [xi−1 , x∗ ]},
w2 = inf{f (x) | x ∈ [x∗ , xi ]}.
Clearly, w1 ≥ mi and w2 ≥ mi (since inf M ≥ inf N if M ⊂ N, see homework 1.4 (b)),
where, as before, mi = inf{f (x) | x ∈ [xi−1 , xi ]}. Hence
L(P ∗ , f, α) − L(P, f, α) = w1 (α(x∗ ) − α(xi−1 )) + w2 (α(xi ) − α(x∗ )) − mi (α(xi ) − α(xi−1 ))
= (w1 − mi )(α(x∗ ) − α(xi−1 )) + (w2 − mi )(α(xi ) − α(x∗ )) ≥ 0.
If P ∗ contains k points more than P , we repeat this reasoning k times, and arrive at (5.12).
Proposition 5.2
∫̲_a^b f dα ≤ ∫̄_a^b f dα.
Proof. Let P∗ be the common refinement of two partitions P_1 and P_2. By Lemma 5.1
L(P1 , f, α) ≤ L(P ∗ , f, α) ≤ U(P ∗ , f, α) ≤ U(P2 , f, α).
Hence
L(P_1, f, α) ≤ U(P_2, f, α).   (5.13)
If P_2 is fixed and the supremum is taken over all P_1, (5.13) gives
∫̲_a^b f dα ≤ U(P_2, f, α).   (5.14)
The proposition follows by taking the infimum over all P2 in (5.14).
Proposition 5.3 (Riemann Criterion) f ∈ R(α) on [a, b] if and only if for every ε > 0 there
exists a partition P such that
U(P, f, α) − L(P, f, α) < ε.
(RC)
Proof. For every P we have
L(P, f, α) ≤ ∫̲_a^b f dα ≤ ∫̄_a^b f dα ≤ U(P, f, α).
Thus (RC) implies
0 ≤ ∫̄_a^b f dα − ∫̲_a^b f dα < ε.
Since the above inequality can be satisfied for every ε > 0, we have
∫̄_a^b f dα = ∫̲_a^b f dα,
that is f ∈ R(α).
Conversely, suppose f ∈ R(α), and let ε > 0 be given. Then there exist partitions P1 and P2
such that
Z b
Z b
ε
ε
U(P2 , f, α) −
(5.15)
f dα < ,
f dα − L(P1 , f, α) < .
2
2
a
a
We choose P to be the common refinement of P1 and P2 . Then Lemma 5.1, together with
(5.15), shows that
U(P, f, α) ≤ U(P_2, f, α) < ∫_a^b f dα + ε/2 < L(P_1, f, α) + ε ≤ L(P, f, α) + ε,
so that (RC) holds for this partition P .
Proposition 5.3 furnishes a convenient criterion for integrability. Before we apply it, we state
some closely related facts.
Lemma 5.4 (a) If (RC) holds for P and some ε, then (RC) holds with the same ε for every
refinement of P .
(b) If (RC) holds for P = {x0 , . . . , xn } and if si , ti are arbitrary points in [xi−1 , xi ], then
∑_{i=1}^{n} | f(s_i) − f(t_i) | ∆α_i < ε.
(c) If f ∈ R(α) and (RC) holds as in (b), then
| ∑_{i=1}^{n} f(t_i) ∆α_i − ∫_a^b f dα | < ε.
Proof. Lemma 5.1 implies (a). Under the assumptions made in (b), both f (si) and f (ti ) lie in
[mi , Mi ], so that | f (si ) − f (ti ) | ≤ Mi − mi . Thus
∑_{i=1}^{n} | f(t_i) − f(s_i) | ∆α_i ≤ U(P, f, α) − L(P, f, α),
which proves (b). The obvious inequalities
L(P, f, α) ≤ ∑_i f(t_i) ∆α_i ≤ U(P, f, α)
and
L(P, f, α) ≤ ∫_a^b f dα ≤ U(P, f, α)
prove (c).
Theorem 5.5 If f is continuous on [a, b] then f ∈ R(α) on [a, b].
Proof. Let ε > 0 be given. Choose η > 0 so that
(α(b) − α(a))η < ε.
Since f is uniformly continuous on [a, b] (Proposition 3.7), there exists a δ > 0 such that
| f (x) − f (t) | < η
(5.16)
if x, t ∈ [a, b] and | x − t | < δ. If P is any partition of [a, b] such that ∆xi < δ for all i, then
(5.16) implies that
Mi − mi ≤ η,
i = 1, . . . , n
(5.17)
and therefore
U(P, f, α) − L(P, f, α) = ∑_{i=1}^{n} (M_i − m_i) ∆α_i ≤ η ∑_{i=1}^{n} ∆α_i = η(α(b) − α(a)) < ε.
By Proposition 5.3, f ∈ R(α).
Example 5.1 (a) The proof of Theorem 5.5 together with Lemma 5.4 shows that
| ∑_{i=1}^{n} f(t_i) ∆α_i − ∫_a^b f dα | < ε
if ∆x_i < δ.
We compute I = ∫_a^b sin x dx. Let ε > 0. Since sin x is continuous, f ∈ R. There exists δ > 0 such that | x − t | < δ implies
| sin x − sin t | < ε/(b − a).   (5.18)
In this case (RC) is satisfied and consequently
| ∑_{i=1}^{n} sin(t_i) ∆x_i − ∫_a^b sin x dx | < ε
for every partition P with ∆xi < δ, i = 1, . . . , n.
Now we choose an equidistant partition of [a, b]: x_i = a + (b − a)i/n, i = 0, . . . , n. Then h = ∆x_i = (b − a)/n, and the condition (5.18) is satisfied provided n > (b − a)^2/ε. Using the addition formula cos(x − y) − cos(x + y) = 2 sin x sin y, we have
∑_{i=1}^{n} sin(x_i) ∆x_i = ∑_{i=1}^{n} sin(a + ih) h = (h/(2 sin(h/2))) ∑_{i=1}^{n} 2 sin(h/2) sin(a + ih)
= (h/(2 sin(h/2))) ∑_{i=1}^{n} ( cos(a + (i − 1/2)h) − cos(a + (i + 1/2)h) )
= (h/(2 sin(h/2))) ( cos(a + h/2) − cos(a + (n + 1/2)h) )
= ((h/2)/sin(h/2)) ( cos(a + h/2) − cos(b + h/2) ).
Since lim_{h→0} sin h/h = 1 and cos x is continuous, the above expression tends to cos a − cos b. Hence ∫_a^b sin x dx = cos a − cos b.
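The computation above can be mirrored numerically; the following sketch (partition size chosen arbitrarily) compares a Riemann sum for sin over an equidistant partition with cos a − cos b.

```python
import math

def riemann_sum(f, a, b, n):
    # sum f(x_i) * dx over the equidistant partition x_i = a + i(b - a)/n
    h = (b - a) / n
    return sum(f(a + i * h) * h for i in range(1, n + 1))

a, b = 0.0, math.pi
approx = riemann_sum(math.sin, a, b, 10_000)
exact = math.cos(a) - math.cos(b)   # = 2 on [0, pi]
```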
(b) For x ∈ [a, b] define
f(x) = { 1,   x ∈ Q,
         0,   x ∉ Q.
We will show f 6∈ R. Let P be any partition of [a, b]. Since any interval contains rational
as well as irrational points, m_i = 0 and M_i = 1 for all i. Hence L(P, f) = 0 whereas U(P, f) = ∑_{i=1}^{n} ∆x_i = b − a. We conclude that the upper and lower Riemann integrals don’t coincide; f ∉ R. A similar reasoning shows f ∉ R(α) if α(b) > α(a) since L(P, f, α) = 0 < U(P, f, α) = α(b) − α(a).
Proposition 5.6 If f is monotonic on [a, b], and α is continuous on [a, b], then f ∈ R(α).
Proof. Let ε > 0 be given. For any positive integer n, choose a partition such that
∆α_i = (α(b) − α(a))/n,   i = 1, . . . , n.
This is possible by the intermediate value theorem (Theorem 3.5) since α is continuous.
We suppose that f is monotonically increasing (the proof is analogous in the other case). Then
M_i = f(x_i),   m_i = f(x_{i−1}),   i = 1, . . . , n,
so that
U(P, f, α) − L(P, f, α) = ((α(b) − α(a))/n) ∑_{i=1}^{n} (f(x_i) − f(x_{i−1})) = ((α(b) − α(a))/n) (f(b) − f(a)) < ε
if n is taken large enough. By Proposition 5.3, f ∈ R(α).
We note the following facts without proofs; they can be found in [Rud76, pp. 126–128].
Proposition 5.7 Suppose f is bounded on [a, b], f has finitely many points of discontinuity on [a, b], and α is continuous at every point at which f is discontinuous. Then f ∈ R(α).
Proof. We give an idea of the proof in case of the Riemann integral (α(x) = x) and one single
discontinuity at c, a < c < b. Let ε > 0 be given, let m ≤ f(x) ≤ M for all x ∈ [a, b], and put C = M − m. First choose points a′ and b′ with a < a′ < c < b′ < b and C(b′ − a′) < ε. Let f_j, j = 1, 2, denote the restriction of f to the subintervals I_1 = [a, a′] and I_2 = [b′, b], respectively. Since f_j is continuous on I_j, f_j ∈ R over I_j and therefore, by the Riemann criterion, there exist partitions P_j, j = 1, 2, of I_j such that U(P_j, f_j) − L(P_j, f_j) < ε. Let P = P_1 ∪ P_2 be a partition of [a, b]. Then
U(P, f) − L(P, f) = U(P_1, f_1) − L(P_1, f_1) + U(P_2, f_2) − L(P_2, f_2) + (M_0 − m_0)(b′ − a′)
≤ ε + ε + C(b′ − a′) < 3ε,
where M0 and m0 are the supremum and infimum of f (x) on [a′ , b′ ]. The Riemann criterion is
satisfied for f on [a, b]; hence f ∈ R.
Proposition 5.8 Suppose f ∈ R(α) on [a, b], m ≤ f(x) ≤ M, ϕ is continuous on [m, M], and h(x) = ϕ(f(x)) on [a, b]. Then h ∈ R(α) on [a, b].
Remark 5.1 (a) A bounded function f is Riemann-integrable on [a, b] if and only if f is continuous almost everywhere on [a, b]. (The proof of this fact can be found in [Rud76, Theorem 11.33]).
“Almost everywhere” means that the discontinuities form a set of (Lebesgue) measure 0. A set M ⊂ R has measure 0 if for given ε > 0 there exist intervals I_n, n ∈ N, such that M ⊂ ⋃_{n∈N} I_n and ∑_{n∈N} | I_n | < ε. Here, | I | denotes the length of the interval. Examples of sets of measure 0 are finite sets, countable sets, and the Cantor set (which is uncountable).
(b) Note that such a “chaotic” function (at the point 0) as
f(x) = { cos(1/x),   x ≠ 0,
         0,          x = 0,
is integrable on [−π, π] since there is only one single discontinuity, at 0.
5.1.1 Properties of the Integral
Proposition 5.9 (a) If f_1, f_2 ∈ R(α) on [a, b] then f_1 + f_2 ∈ R(α), cf ∈ R(α) for every constant c, and
∫_a^b (f_1 + f_2) dα = ∫_a^b f_1 dα + ∫_a^b f_2 dα,   ∫_a^b cf dα = c ∫_a^b f dα.
(b) If f1 , f2 ∈ R(α) and f1 (x) ≤ f2 (x) on [a, b], then
∫_a^b f_1 dα ≤ ∫_a^b f_2 dα.
(c) If f ∈ R(α) on [a, b] and if a < c < b, then f ∈ R(α) on [a, c] and on [c, b], and
∫_a^b f dα = ∫_a^c f dα + ∫_c^b f dα.
(d) If f ∈ R(α) on [a, b] and | f (x) | ≤ M on [a, b], then
| ∫_a^b f dα | ≤ M(α(b) − α(a)).
(e) If f ∈ R(α1 ) and f ∈ R(α2 ), then f ∈ R(α1 + α2 ) and
∫_a^b f d(α_1 + α_2) = ∫_a^b f dα_1 + ∫_a^b f dα_2;
if f ∈ R(α) and c is a positive constant, then f ∈ R(cα) and
∫_a^b f d(cα) = c ∫_a^b f dα.
Proof. If f = f1 + f2 and P is any partition of [a, b], we have
L(P, f1 , α) + L(P, f2 , α) ≤ L(P, f, α) ≤ U(P, f, α) ≤ U(P, f1 , α) + U(P, f2 , α)
(5.19)
since inf_{I_i} f_1 + inf_{I_i} f_2 ≤ inf_{I_i} (f_1 + f_2) and sup_{I_i} f_1 + sup_{I_i} f_2 ≥ sup_{I_i} (f_1 + f_2).
If f_1 ∈ R(α) and f_2 ∈ R(α), let ε > 0 be given. There are partitions P_j, j = 1, 2, such that
U(Pj , fj , α) − L(Pj , fj , α) < ε.
These inequalities persist if P1 and P2 are replaced by their common refinement P . Then (5.19)
implies
U(P, f, α) − L(P, f, α) < 2ε
which proves that f ∈ R(α). With the same P we have
U(P, f_j, α) < ∫_a^b f_j dα + ε,   j = 1, 2;
since L(P, f, α) ≤ ∫_a^b f dα ≤ U(P, f, α), (5.19) implies
∫_a^b f dα ≤ U(P, f, α) < ∫_a^b f_1 dα + ∫_a^b f_2 dα + 2ε.
Since ε was arbitrary, we conclude that
∫_a^b f dα ≤ ∫_a^b f_1 dα + ∫_a^b f_2 dα.   (5.20)
If we replace f1 and f2 in (5.20) by −f1 and −f2 , respectively, the inequality is reversed, and
the equality is proved.
(b) Put f = f_2 − f_1. It suffices to prove that ∫_a^b f dα ≥ 0. For every partition P we have m_i ≥ 0 since f ≥ 0. Hence
∫_a^b f dα ≥ L(P, f, α) = ∑_{i=1}^{n} m_i ∆α_i ≥ 0
since in addition ∆αi = α(xi ) − α(xi−1 ) ≥ 0 (α is increasing).
The proofs of the other assertions are so similar that we omit the details. In part (c) the point is that (by passing to refinements) we may restrict ourselves to partitions which contain the point c when approximating ∫_a^b f dα, cf. Homework 14.5.
Note that in (c), f ∈ R(α) on [a, c] and on [c, b] in general does not imply that f ∈ R(α) on
[a, b]. For example consider the interval [−1, 1] with
f(x) = α(x) = { 0,   −1 ≤ x < 0,
                1,    0 ≤ x ≤ 1.
Then ∫_0^1 f dα = 0. The integral vanishes since α is constant on [0, 1]. However, f ∉ R(α) on [−1, 1] since for any partition P including the point 0, we have U(P, f, α) = 1 and L(P, f, α) = 0.
Proposition 5.10 If f, g ∈ R(α) on [a, b], then
(a) f g ∈ R(α);
(b) | f | ∈ R(α) and | ∫_a^b f dα | ≤ ∫_a^b | f | dα.
Proof. If we take ϕ(t) = t2 , Proposition 5.8 shows that f 2 ∈ R(α) if f ∈ R(α). The identity
4f g = (f + g)2 − (f − g)2
completes the proof of (a).
If we take ϕ(t) = | t |, Proposition 5.8 shows that | f | ∈ R(α). Choose c = ±1 so that c ∫ f dα ≥ 0. Then
| ∫ f dα | = c ∫ f dα = ∫ cf dα ≤ ∫ | f | dα,
since ±f ≤ | f |.
The unit step function or Heaviside function H(x) is defined by H(x) = 0 if x < 0 and
H(x) = 1 if x ≥ 0.
Example 5.2 (a) If a < s < b, f is bounded on [a, b], f is continuous at s, and α(x) = H(x−s),
then
∫_a^b f dα = f(s).
For the proof, consider the partition P with n = 3; a = x0 < x1 < s = x2 < x3 = b. Then
∆α1 = ∆α3 = 0, ∆α2 = 1, and
U(P, f, α) = M2 ,
L(P, f, α) = m2 .
Since f is continuous at s, we see that M_2 and m_2 converge to f(s) as x_1 → s.
(b) Suppose c_n ≥ 0 for all n = 1, . . . , N, and (s_n), n = 1, . . . , N, is a strictly increasing finite sequence of distinct points in (a, b). Further, α(x) = ∑_{n=1}^{N} c_n H(x − s_n). Then
∫_a^b f dα = ∑_{n=1}^{N} c_n f(s_n).
This follows from (a) and Proposition 5.9 (e).
Proposition 5.11 Suppose c_n ≥ 0 for all n ∈ N, ∑_{n=1}^{∞} c_n converges, (s_n) is a strictly increasing sequence of distinct points in (a, b), and
α(x) = ∑_{n=1}^{∞} c_n H(x − s_n).   (5.21)
Let f be continuous on [a, b]. Then
∫_a^b f dα = ∑_{n=1}^{∞} c_n f(s_n).   (5.22)
Proof. The comparison test shows that the series (5.21) converges for every x. Its sum α is
evidently an increasing function with α(a) = 0 and α(b) = ∑ c_n. Let ε > 0 be given, and choose N so that
∑_{n=N+1}^{∞} c_n < ε.
Put
α_1(x) = ∑_{n=1}^{N} c_n H(x − s_n),   α_2(x) = ∑_{n=N+1}^{∞} c_n H(x − s_n).
By Proposition 5.9 and Example 5.2,
∫_a^b f dα_1 = ∑_{n=1}^{N} c_n f(s_n).
Since α_2(b) − α_2(a) < ε, Proposition 5.9 (d) gives
| ∫_a^b f dα_2 | ≤ Mε,
where M = sup | f(x) |. Since α = α_1 + α_2 it follows that
| ∫_a^b f dα − ∑_{n=1}^{N} c_n f(s_n) | ≤ Mε.
If we let N → ∞ we obtain (5.22).
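A numerical sketch of (5.22) for a finite truncation (the jump heights and positions below are made-up illustration values): Riemann–Stieltjes sums with right-endpoint tags converge to ∑ c_n f(s_n).

```python
import math

cs = [0.5, 0.25, 0.125]      # jump heights c_n (illustrative)
ss = [0.3, 0.6, 0.9]         # jump points s_n in (0, 1)

def alpha(x):
    # alpha(x) = sum_n c_n H(x - s_n)
    return sum(c for c, s in zip(cs, ss) if x >= s)

def stieltjes_sum(f, alpha, a, b, n):
    # sum f(x_i) * (alpha(x_i) - alpha(x_{i-1})) over an equidistant partition
    h = (b - a) / n
    total = 0.0
    for i in range(1, n + 1):
        x0, x1 = a + (i - 1) * h, a + i * h
        total += f(x1) * (alpha(x1) - alpha(x0))
    return total

approx = stieltjes_sum(math.exp, alpha, 0.0, 1.0, 100_000)
exact = sum(c * math.exp(s) for c, s in zip(cs, ss))
```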
Proposition 5.12 Assume that α is increasing and α′ ∈ R on [a, b]. Let f be a bounded real
function on [a, b].
Then f ∈ R(α) if and only if f α′ ∈ R. In that case
∫_a^b f dα = ∫_a^b f(x) α′(x) dx.   (5.23)
The statement remains true if α is continuous on [a, b] and differentiable up to finitely many
points c1 , c2 , . . . , cn .
Proof. Let ε > 0 be given and apply the Riemann criterion Proposition 5.3 to α′ : There is a
partition P = {x0 , . . . , xn } of [a, b] such that
U(P, α′ ) − L(P, α′) < ε.
(5.24)
The mean value theorem furnishes points t_i ∈ [x_{i−1}, x_i] such that
∆α_i = α(x_i) − α(x_{i−1}) = α′(t_i)(x_i − x_{i−1}) = α′(t_i) ∆x_i,   i = 1, . . . , n.
If s_i ∈ [x_{i−1}, x_i], then
∑_{i=1}^{n} | α′(s_i) − α′(t_i) | ∆x_i < ε   (5.25)
by (5.24) and Lemma 5.4 (b). Put M = sup | f (x) |. Since
∑_{i=1}^{n} f(s_i) ∆α_i = ∑_{i=1}^{n} f(s_i) α′(t_i) ∆x_i
it follows from (5.25) that
| ∑_{i=1}^{n} f(s_i) ∆α_i − ∑_{i=1}^{n} f(s_i) α′(s_i) ∆x_i | ≤ Mε.   (5.26)
In particular,
∑_{i=1}^{n} f(s_i) ∆α_i ≤ U(P, f α′) + Mε
for all choices of s_i ∈ [x_{i−1}, x_i], so that
U(P, f, α) ≤ U(P, f α′) + Mε.
The same argument leads from (5.26) to
U(P, f α′ ) ≤ U(P, f, α) + Mε.
Thus
| U(P, f, α) − U(P, f α′) | ≤ Mε.   (5.27)
Now (5.25) remains true if P is replaced by any refinement. Hence (5.27) also remains true. We conclude that
| ∫̄_a^b f dα − ∫̄_a^b f(x) α′(x) dx | ≤ Mε.
But ε is arbitrary. Hence
∫̄_a^b f dα = ∫̄_a^b f(x) α′(x) dx,
for any bounded f.
for any bounded f . The equality for the lower integrals follows from (5.26) in exactly the same
way. The proposition follows.
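A sketch of (5.23) with α(x) = x² on [0, 1] (all choices here are illustrative): the Stieltjes sums for f dα and the Riemann sums for f(x)α′(x) approach the same value.

```python
import math

def right_sum(g, weight, a, b, n):
    # sum g(x_i) * (weight(x_i) - weight(x_{i-1})) over an equidistant partition
    h = (b - a) / n
    return sum(g(a + i * h) * (weight(a + i * h) - weight(a + (i - 1) * h))
               for i in range(1, n + 1))

# alpha(x) = x^2 is increasing on [0, 1] with alpha'(x) = 2x
stieltjes = right_sum(math.sin, lambda x: x * x, 0.0, 1.0, 50_000)
riemann = right_sum(lambda x: math.sin(x) * 2 * x, lambda x: x, 0.0, 1.0, 50_000)
```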
We now summarize the two cases.
Proposition 5.13 Let f be continuous on [a, b]. Suppose that, except for finitely many points c_0, c_1, . . . , c_n with c_0 = a and c_n = b, the derivative α′(x) exists and is continuous and bounded on [a, b] \ {c_0, . . . , c_n}. Then f ∈ R(α) and
∫_a^b f dα = ∫_a^b f(x) α′(x) dx + ∑_{i=1}^{n−1} f(c_i)(α(c_i + 0) − α(c_i − 0)) + f(a)(α(a + 0) − α(a)) + f(b)(α(b) − α(b − 0)).
Proof (sketch). (a) Note that A_i^+ = α(c_i + 0) − α(c_i) and A_i^− = α(c_i) − α(c_i − 0) exist by Theorem 3.8. Define
α_1(x) = ∑_{i=0}^{n−1} A_i^+ H(x − c_i) + ∑_{i=1}^{n} (−A_i^−) H(c_i − x).
(b) Then α_2 = α − α_1 is continuous.
(c) Since α_1 is piecewise constant, α_1′(x) = 0 for x ≠ c_k; hence α_2′(x) = α′(x) for x ≠ c_i. Applying Proposition 5.12 gives
∫_a^b f dα_2 = ∫_a^b f α_2′ dx = ∫_a^b f α′ dx.
Further,
∫_a^b f dα = ∫_a^b f d(α_1 + α_2) = ∫_a^b f α′ dx + ∫_a^b f dα_1.
By Proposition 5.11,
∫_a^b f dα_1 = ∑_{i=0}^{n−1} A_i^+ f(c_i) + ∑_{i=1}^{n} A_i^− f(c_i).
Example 5.3 (a) The Fundamental Theorem of Calculus, see Theorem 5.15, yields
∫_0^2 x d(x^3) = ∫_0^2 x · 3x^2 dx = (3x^4/4) |_0^2 = 12.
(b) f(x) = x^2 and
α(x) = { x,          0 ≤ x < 1,
         7,          x = 1,
         x^2 + 10,   1 < x < 2,
         64,         x = 2.
∫_0^2 f dα = ∫_0^2 f α′ dx + f(1)(α(1 + 0) − α(1 − 0)) + f(2)(α(2) − α(2 − 0))
= ∫_0^1 x^2 · 1 dx + ∫_1^2 x^2 · 2x dx + 1 · (11 − 1) + 4 · (64 − 14)
= (x^3/3) |_0^1 + (x^4/2) |_1^2 + 10 + 200 = 1/3 + 8 − 1/2 + 210 = 217 5/6.
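The value 217 5/6 can be checked by brute force (a sketch; the fine partition size is arbitrary): Riemann–Stieltjes sums for this discontinuous α converge to the same number.

```python
def alpha(x):
    # the integrator of Example 5.3 (b)
    if x < 1:
        return x
    if x == 1:
        return 7.0
    if x < 2:
        return x * x + 10
    return 64.0

def stieltjes_sum(f, alpha, a, b, n):
    # right-tagged Stieltjes sum over an equidistant partition
    total = 0.0
    for i in range(1, n + 1):
        x0 = a + (b - a) * (i - 1) / n
        x1 = a + (b - a) * i / n
        total += f(x1) * (alpha(x1) - alpha(x0))
    return total

approx = stieltjes_sum(lambda x: x * x, alpha, 0.0, 2.0, 200_000)
exact = 217 + 5 / 6
```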
Remark 5.2 The three preceding propositions show the flexibility of the Stieltjes process of integration. If α is a pure step function, the integral reduces to a finite or infinite series. If α has an integrable derivative, the integral reduces to the ordinary Riemann integral. This makes it possible to study series and integrals simultaneously, rather than separately.
5.2 Integration and Differentiation
We shall see that integration and differentiation are, in a certain sense, inverse operations.
Theorem 5.14 Let f ∈ R on [a, b]. For a ≤ x ≤ b put
F(x) = ∫_a^x f(t) dt.
Then F is continuous on [a, b]; furthermore, if f is continuous at x0 ∈ [a, b] then F is differentiable at x0 and
F ′ (x0 ) = f (x0 ).
Proof. Since f ∈ R, f is bounded. Suppose | f (t) | ≤ M on [a, b]. If a ≤ x < y ≤ b, then
| F(y) − F(x) | = | ∫_x^y f(t) dt | ≤ M(y − x),
by Proposition 5.9 (c) and (d). Given ε > 0, we see that
| F (y) − F (x) | < ε,
provided that | y − x | < ε/M. This proves continuity (and, in fact, uniform continuity) of F .
Now suppose that f is continuous at x0 . Given ε > 0, choose δ > 0 such that
| f (t) − f (x0 ) | < ε
if | t − x0 | < δ, t ∈ [a, b]. Hence, if
x0 − δ < s ≤ x0 ≤ t < x0 + δ,
and a ≤ s < t ≤ b,
we have by Proposition 5.9 (d) as before
| (F(t) − F(s))/(t − s) − f(x_0) | = | (1/(t − s)) ∫_s^t f(r) dr − (1/(t − s)) ∫_s^t f(x_0) dr |
= | (1/(t − s)) ∫_s^t (f(u) − f(x_0)) du | < ε.
This in particular holds for s = x_0, that is,
| (F(t) − F(x_0))/(t − x_0) − f(x_0) | < ε.
It follows that F′(x_0) = f(x_0).
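The theorem can be observed numerically; in this sketch (parameters arbitrary) F is built from midpoint Riemann sums for f = cos, and a symmetric difference quotient of F at x₀ reproduces f(x₀).

```python
import math

def F(x, n=20_000):
    # F(x) = integral_0^x cos(t) dt via a midpoint Riemann sum
    h = x / n
    return sum(math.cos((i + 0.5) * h) * h for i in range(n))

x0, h = 0.7, 1e-4
diff_quotient = (F(x0 + h) - F(x0 - h)) / (2 * h)   # should be close to cos(x0)
```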
Definition 5.4 A function F : [a, b] → R is called an antiderivative or a primitive of a function
f : [a, b] → R if F is differentiable and F ′ = f .
Remarks 5.3 (a) There exist functions f not having an antiderivative, for example the Heaviside function H(x) has a simple discontinuity at 0; but by Corollary 4.18 derivatives cannot
have simple discontinuities.
(b) The antiderivative F of a function f (if it exists) is unique up to an additive constant. More precisely, if F is an antiderivative of f on [a, b], then F_1(x) = F(x) + c is also an antiderivative of f. If F and G are antiderivatives of f on [a, b], then there is a constant c so that F(x) − G(x) = c. The first part is obvious since F_1′(x) = F′(x) = f(x). Suppose F and G are antiderivatives of f. Put H(x) = F(x) − G(x); then H′(x) = 0 and H(x) is constant by Corollary 4.11.
Notation for the antiderivative:
F(x) = ∫ f(x) dx = ∫ f dx.
The function f is called the integrand. Integration and differentiation are inverse to each other:
(d/dx) ∫ f(x) dx = f(x),   ∫ f′(x) dx = f(x).
Theorem 5.15 (Fundamental Theorem of Calculus) Let f : [a, b] → R be continuous.
(a) If
F(x) = ∫_a^x f(t) dt,
then F(x) is an antiderivative of f(x) on [a, b].
(b) If G(x) is an antiderivative of f(x), then
∫_a^b f(t) dt = G(b) − G(a).
Proof. (a) By Theorem 5.14, F(x) = ∫_a^x f(t) dt is differentiable at any point x_0 ∈ [a, b] with F′(x_0) = f(x_0).
(b) By the above remark, the antiderivative is unique up to a constant; hence F(x) − G(x) = C. Since F(a) = ∫_a^a f(t) dt = 0 we obtain
G(b) − G(a) = (F(b) − C) − (F(a) − C) = F(b) − F(a) = F(b) = ∫_a^b f(t) dt.
Note that the FTC is also true if f ∈ R and F is an antiderivative of f on [a, b]. Indeed, let ε > 0 be given. By the Riemann criterion, Proposition 5.3, there exists a partition P = {x_0, . . . , x_n} of [a, b] such that U(P, f) − L(P, f) < ε. By the mean value theorem, there exist points t_i ∈ [x_{i−1}, x_i] such that
F(x_i) − F(x_{i−1}) = f(t_i)(x_i − x_{i−1}),   i = 1, . . . , n.
Thus
F(b) − F(a) = ∑_{i=1}^{n} f(t_i) ∆x_i.
It follows from Lemma 5.4 (c) and the above equation that
| F(b) − F(a) − ∫_a^b f(x) dx | = | ∑_{i=1}^{n} f(t_i) ∆x_i − ∫_a^b f(x) dx | < ε.
Since ε > 0 was arbitrary, the proof is complete.
5.2.1 Table of Antiderivatives
By differentiating the right hand side one gets the left hand side of the table.
function        | domain                    | antiderivative
x^α             | α ∈ R \ {−1}, x > 0       | x^{α+1}/(α + 1)
1/x             | x < 0 or x > 0            | log |x|
e^x             | R                         | e^x
a^x             | a > 0, a ≠ 1, x ∈ R       | a^x / log a
sin x           | R                         | − cos x
cos x           | R                         | sin x
1/sin^2 x       | R \ {kπ | k ∈ Z}          | − cot x
1/cos^2 x       | R \ {π/2 + kπ | k ∈ Z}    | tan x
1/(1 + x^2)     | R                         | arctan x
1/√(1 + x^2)    | R                         | arsinh x = log(x + √(x^2 + 1))
1/√(1 − x^2)    | −1 < x < 1                | arcsin x
1/√(x^2 − 1)    | x < −1 or x > 1           | log(x + √(x^2 − 1))
5.2.2 Integration Rules
The aim of this subsection is to calculate antiderivatives of composed functions using antiderivatives of (already known) simpler functions.
Notation:
f(x)|_a^b := f(b) − f(a).
Proposition 5.16 (a) Let f and g be functions with antiderivatives F and G, respectively. Then
af (x) + bg(x), a, b ∈ R, has the antiderivative aF (x) + bG(x).
∫ (af + bg) dx = a ∫ f dx + b ∫ g dx.   (Linearity)
(b) If f and g are differentiable and f(x)g′(x) has an antiderivative, then f′(x)g(x) has an antiderivative, too:
∫ f′g dx = fg − ∫ fg′ dx.   (Integration by parts)   (5.28)
If f and g are continuously differentiable on [a, b], then
∫_a^b f′g dx = f(x)g(x)|_a^b − ∫_a^b fg′ dx.   (5.29)
(c) If ϕ : D → R is continuously differentiable with ϕ(D) ⊂ I, and f : I → R has an antiderivative F, then
∫ f(ϕ(x)) ϕ′(x) dx = F(ϕ(x)).   (Change of variable)   (5.30)
If ϕ : [a, b] → R is continuously differentiable with ϕ([a, b]) ⊂ I and f : I → R is continuous, then
∫_a^b f(ϕ(t)) ϕ′(t) dt = ∫_{ϕ(a)}^{ϕ(b)} f(x) dx.
Proof. Since differentiation is linear, (a) follows.
(b) Differentiating the right hand side, we obtain
(d/dx)( fg − ∫ fg′ dx ) = f′g + fg′ − fg′ = f′g,
which proves the statement.
(c) By the chain rule, F(ϕ(x)) is differentiable with
(d/dx) F(ϕ(x)) = F′(ϕ(x)) ϕ′(x) = f(ϕ(x)) ϕ′(x),
and (c) follows.
The statements about the Riemann integrals follow from the statements about antiderivatives using the fundamental theorem of calculus. For example, we show the second part of (c). By
the above part, F (ϕ(t)) is an antiderivative of f (ϕ(t))ϕ′(t). By the FTC we have
∫_a^b f(ϕ(t)) ϕ′(t) dt = F(ϕ(t))|_a^b = F(ϕ(b)) − F(ϕ(a)).
On the other hand, again by the FTC,
∫_{ϕ(a)}^{ϕ(b)} f(x) dx = F(x)|_{ϕ(a)}^{ϕ(b)} = F(ϕ(b)) − F(ϕ(a)).
This completes the proof of (c).
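The change-of-variable formula can be tested numerically; the sketch below (with ϕ(t) = t² and f = cos as arbitrary smooth choices) compares both sides of the second formula in (c) via midpoint Riemann sums.

```python
import math

def midpoint_integral(f, a, b, n=100_000):
    # midpoint Riemann sum for integral_a^b f(x) dx
    h = (b - a) / n
    return sum(f(a + (i + 0.5) * h) * h for i in range(n))

phi = lambda t: t * t          # continuously differentiable on [0, 1]
dphi = lambda t: 2 * t
lhs = midpoint_integral(lambda t: math.cos(phi(t)) * dphi(t), 0.0, 1.0)
rhs = midpoint_integral(math.cos, phi(0.0), phi(1.0))   # integral over [phi(a), phi(b)]
```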
Corollary 5.17 Suppose F is an antiderivative of f. Then
∫ f(ax + b) dx = (1/a) F(ax + b),   a ≠ 0;   (5.31)
∫ g′(x)/g(x) dx = log | g(x) |   (g differentiable and g(x) ≠ 0).   (5.32)
Example 5.4 (a) The antiderivative of a polynomial: if p(x) = ∑_{k=0}^{n} a_k x^k, then ∫ p(x) dx = ∑_{k=0}^{n} (a_k/(k+1)) x^{k+1}.
(b) Put f(x) = e^x and g(x) = x; then f′(x) = e^x and g′(x) = 1, and we obtain
∫ x e^x dx = x e^x − ∫ 1 · e^x dx = e^x (x − 1).
(c) On I = (0, ∞):  ∫ log x dx = ∫ 1 · log x dx = x log x − ∫ x · (1/x) dx = x log x − x.
(d)
Z
Z
Z
1
arctan x dx = 1 · arctan x dx = x arctan x − x
dx
1 + x2
Z
1
1
(1 + x2 )′
= x arctan x −
dx = x arctan x − log(1 + x2 ).
2
2
1+x
2
In the last equation we made use of (5.32).
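The antiderivatives in (b) and (d) are easy to confirm numerically. A minimal sketch using only the standard library; the `simpson` helper is our own illustration, not part of the text:

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

# (b): with F(x) = e^x (x - 1), the integral of x e^x over [0, 1]
# equals F(1) - F(0) = 0 - (-1) = 1.
assert abs(simpson(lambda x: x * math.exp(x), 0.0, 1.0) - 1.0) < 1e-9

# (d): F(x) = x arctan x - (1/2) log(1 + x^2).
F = lambda x: x * math.atan(x) - 0.5 * math.log(1.0 + x * x)
assert abs(simpson(math.atan, 0.0, 1.0) - (F(1.0) - F(0.0))) < 1e-9
```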
(e) Recurrent computation of integrals. Put

    I_n := ∫ dx/(1 + x²)^n,   n ∈ N;   I₁ = arctan x.

Then

    I_n = ∫ ((1 + x²) − x²)/(1 + x²)^n dx = I_{n−1} − ∫ x² dx/(1 + x²)^n.

Put u = x and v′ = x/(1 + x²)^n. Then u′ = 1 and

    v = ∫ x dx/(1 + x²)^n = (1/2) · (1 + x²)^{1−n}/(1 − n).

Hence,

    I_n = I_{n−1} − (1/2) · x(1 + x²)^{1−n}/(1 − n) + (1/(2(1 − n))) ∫ (1 + x²)^{1−n} dx,

    I_n = x/((2n − 2)(1 + x²)^{n−1}) + ((2n − 3)/(2n − 2)) I_{n−1}.

In particular, I₂ = x/(2(1 + x²)) + (1/2) arctan x and I₃ = x/(4(1 + x²)²) + (3/4) I₂.
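The reduction formula can be checked against direct quadrature. A sketch under the assumption that I_n denotes the antiderivative normalized by I_n(0) = 0 (the helper names are ours):

```python
import math

def simpson(f, a, b, n=2000):
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

def I(n, x):
    """I_n(x) from the reduction formula, with I_n(0) = 0."""
    if n == 1:
        return math.atan(x)
    return (x / ((2 * n - 2) * (1 + x * x) ** (n - 1))
            + (2 * n - 3) / (2 * n - 2) * I(n - 1, x))

for n in (2, 3, 4):
    direct = simpson(lambda x: 1.0 / (1 + x * x) ** n, 0.0, 2.0)
    assert abs(direct - I(n, 2.0)) < 1e-9
```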
Proposition 5.18 (Mean Value Theorem of Integration) Let f, ϕ : [a, b] → R be continuous functions and ϕ ≥ 0. Then there exists ξ ∈ [a, b] such that

    ∫_a^b f(x)ϕ(x) dx = f(ξ) ∫_a^b ϕ(x) dx.   (5.33)

In particular, in case ϕ = 1 we have

    ∫_a^b f(x) dx = f(ξ)(b − a)   for some ξ ∈ [a, b].

Proof. Put m = inf{f(x) | x ∈ [a, b]} and M = sup{f(x) | x ∈ [a, b]}. Since ϕ ≥ 0 we obtain mϕ(x) ≤ f(x)ϕ(x) ≤ Mϕ(x). By Proposition 5.9 (a) and (b) we have

    m ∫_a^b ϕ(x) dx ≤ ∫_a^b f(x)ϕ(x) dx ≤ M ∫_a^b ϕ(x) dx.

Hence there is a μ ∈ [m, M] such that

    ∫_a^b f(x)ϕ(x) dx = μ ∫_a^b ϕ(x) dx.

Since f is continuous on [a, b], the intermediate value theorem (Theorem 3.5) ensures that there is a ξ with μ = f(ξ). The claim follows.
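The proof is constructive enough to demonstrate numerically: compute μ as the ratio of the two integrals and check m ≤ μ ≤ M. A sketch with a concrete pair f, ϕ of our choosing (the `simpson` helper is ours):

```python
import math

def simpson(f, a, b, n=2000):
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

# f = sin, phi(x) = x(1 - x) >= 0 on [0, 1]
f, phi = math.sin, (lambda x: x * (1.0 - x))
mu = simpson(lambda x: f(x) * phi(x), 0.0, 1.0) / simpson(phi, 0.0, 1.0)
m = min(f(k / 1000) for k in range(1001))
M = max(f(k / 1000) for k in range(1001))
assert m <= mu <= M
xi = math.asin(mu)   # f is injective on [0, 1], so xi is explicit here
assert 0.0 <= xi <= 1.0
```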
Example 5.5 The trapezoid rule. Let f : [0, 1] → R be twice continuously differentiable. Then there exists ξ ∈ [0, 1] such that

    ∫_0^1 f(x) dx = (1/2)(f(0) + f(1)) − (1/12) f″(ξ).   (5.34)

Proof. Let ϕ(x) = (1/2)x(1 − x), such that ϕ(x) ≥ 0 for x ∈ [0, 1], ϕ′(x) = 1/2 − x, and ϕ″(x) = −1. Using integration by parts twice as well as Theorem 5.18 we find

    ∫_0^1 f(x) dx = −∫_0^1 ϕ″(x)f(x) dx = −ϕ′(x)f(x)|_0^1 + ∫_0^1 ϕ′(x)f′(x) dx
                  = (1/2)(f(0) + f(1)) + ϕ(x)f′(x)|_0^1 − ∫_0^1 ϕ(x)f″(x) dx
                  = (1/2)(f(0) + f(1)) − f″(ξ) ∫_0^1 ϕ(x) dx
                  = (1/2)(f(0) + f(1)) − (1/12) f″(ξ).

Indeed, ∫_0^1 ((1/2)x − (1/2)x²) dx = ((1/4)x² − (1/6)x³)|_0^1 = 1/4 − 1/6 = 1/12.
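For f(x) = e^x one can even solve (5.34) for ξ explicitly, since f″ = e^x is injective. A sketch:

```python
import math

f = math.exp                      # f'' = exp as well
exact = math.e - 1.0              # integral of e^x over [0, 1]
trap = 0.5 * (f(0.0) + f(1.0))    # one-panel trapezoid value
err = trap - exact                # equals f''(xi)/12 for some xi in [0, 1]
xi = math.log(12.0 * err)         # solve e^xi / 12 = err
assert 0.0 <= xi <= 1.0
```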
5.2.3 Integration of Rational Functions

We will give a useful method to compute antiderivatives of an arbitrary rational function. Consider a rational function p/q where p and q are polynomials. We will assume that deg p < deg q; otherwise we can express p/q as a polynomial function plus a rational function of this form, for example

    x²/(x − 1) = x + 1 + 1/(x − 1).
Polynomials
We need some preliminary facts on polynomials which are stated here without proof. Recall
that a non-zero constant polynomial has degree zero, deg c = 0 if c ≠ 0. By definition, the zero polynomial has degree −∞, deg 0 = −∞.
Theorem 5.19 (Fundamental Theorem of Algebra) Every polynomial p of positive degree
with complex coefficients has a complex root, i. e. there exists a complex number z such that
p(z) = 0.
Lemma 5.20 (Long Division) Let p and q be polynomials, then there exist unique polynomials
r and s such that
p = qs + r, deg r < deg q.
Lemma 5.21 Let p be a complex polynomial of degree n ≥ 1 and leading coefficient an . Then
there exist n uniquely determined numbers z1 , . . . , zn (which may be equal) such that
p(z) = an (z − z1 )(z − z2 ) · · · (z − zn ).
Proof. We use induction over n and the two preceding statements. In case n = 1 the linear polynomial p(z) = az + b can be written in the desired form

    p(z) = a(z − (−b/a))   with the unique root z₁ = −b/a.
Suppose the statement is true for all polynomials of degree n − 1. We will show it for degree n
polynomials. For, let zn be a complex root of p which exists by Theorem 5.19; p(zn ) = 0. Using
long division of p by the linear polynomial q(z) = z −zn we obtain a quotient polynomial p1 (z)
and a remainder polynomial r(z) of degree 0 (a constant polynomial) such that
p(z) = (z − zn )p1 (z) + r(z).
Inserting z = zn gives p(zn ) = 0 = r(zn ); hence the constant r vanishes and we have
p(z) = (z − zn )p1 (z)
with a polynomial p1 (z) of degree n − 1. Applying the induction hypothesis to p1 the statement
follows.
A root α of p is said to be a root of multiplicity k, k ∈ N, if α appears exactly k times among the zeros z₁, z₂, …, z_n. In that case (z − α)^k divides p(z) but (z − α)^{k+1} does not.
If p is a real polynomial, i.e. a polynomial with real coefficients, and α is a root of multiplicity k of p, then the complex conjugate ᾱ is also a root of multiplicity k of p. Indeed, conjugating the equation

    p(z) = (z − α)^k q(z)

and using that p has real coefficients, so that p(z̄) = \overline{p(z)}, we obtain

    p(z̄) = (z̄ − ᾱ)^k \overline{q(z)};

replacing z̄ by z shows that ᾱ is a root of multiplicity k.
Note that the product of the two complex linear factors z − α and z − ᾱ yields a real quadratic factor

    (z − α)(z − ᾱ) = z² − (α + ᾱ)z + αᾱ = z² − 2(Re α)z + |α|².

Using this fact, the real version of Lemma 5.21 is as follows.
Lemma 5.22 Let q be a real polynomial of degree n with leading coefficient a_n. Then there exist real numbers α_i, β_j, γ_j and multiplicities r_i, s_j ∈ N, i = 1, …, k, j = 1, …, l, such that

    q(x) = a_n ∏_{i=1}^k (x − α_i)^{r_i} · ∏_{j=1}^l (x² − 2β_j x + γ_j)^{s_j}.

We assume that the quadratic factors cannot be factored further; this means

    β_j² − γ_j < 0,   j = 1, …, l.

Of course, deg q = Σ_i r_i + Σ_j 2s_j = n.
Example 5.6 (a) x⁴ − 4 = (x² + 2)(x² − 2) = (x − √2)(x + √2)(x − i√2)(x + i√2) = (x − √2)(x + √2)(x² + 2).
(b) x³ + x − 2. One can guess the first zero x₁ = 1. Using long division one gets x³ + x − 2 = (x − 1)(x² + x + 2):

      x³        + x − 2 : (x − 1) = x² + x + 2
    −(x³ − x²)
           x²   + x − 2
         −(x²   − x)
               2x − 2
             −(2x − 2)
                    0

There are no further real zeros, since x² + x + 2 has no real roots.
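Long division is mechanical on coefficient lists (highest degree first). A short sketch; the `polydiv` helper is our illustration, not part of the text:

```python
def polydiv(p, q):
    """Divide polynomial p by q (coefficient lists, highest degree
    first). Returns (quotient, remainder) with deg r < deg q."""
    p = list(p)
    quot = []
    while len(p) >= len(q):
        c = p[0] / q[0]
        quot.append(c)
        for i in range(len(q)):
            p[i] -= c * q[i]
        p.pop(0)   # leading coefficient is now zero
    return quot, p

# x^3 + x - 2 divided by x - 1  ->  x^2 + x + 2, remainder 0
quot, rem = polydiv([1, 0, 1, -2], [1, -1])
assert quot == [1.0, 1.0, 2.0]
assert all(abs(c) < 1e-12 for c in rem)
```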
5.2.4 Partial Fraction Decomposition

Proposition 5.23 Let p(x) and q(x) be real polynomials with deg p < deg q. Then there exist real numbers A_{ir}, B_{js}, and C_{js} such that

    p(x)/q(x) = Σ_{i=1}^k Σ_{r=1}^{r_i} A_{ir}/(x − α_i)^r + Σ_{j=1}^l Σ_{s=1}^{s_j} (B_{js}x + C_{js})/(x² − 2β_j x + γ_j)^s,   (5.35)

where the α_i, β_j, γ_j, r_i, and s_j have the same meaning as in Lemma 5.22.
Example 5.7 (a) Compute ∫ f(x) dx = ∫ x⁴/(x³ − 1) dx. We use long division to obtain a rational function p/q with deg p < deg q: f(x) = x + x/(x³ − 1). To obtain the partial fraction decomposition (PFD), we need the factorization of the denominator polynomial q(x) = x³ − 1. One can guess the first real zero x₁ = 1 and divide q by x − 1; q(x) = (x − 1)(x² + x + 1).
The PFD then reads

    x/(x³ − 1) = a/(x − 1) + (bx + c)/(x² + x + 1).

We have to determine a, b, c. Multiplication by x³ − 1 gives

    0·x² + 1·x + 0 = a(x² + x + 1) + (bx + c)(x − 1) = (a + b)x² + (a − b + c)x + a − c.

The two polynomials on the left and on the right must coincide, that is, their coefficients must be equal:

    0 = a − c,   1 = a − b + c,   0 = a + b,

which gives a = −b = c = 1/3. Hence,

    x/(x³ − 1) = (1/3)·1/(x − 1) − (1/3)·(x − 1)/(x² + x + 1).

We can integrate the first two terms, but we have to rewrite the last one:

    (x − 1)/(x² + x + 1) = (1/2)·(2x + 1)/(x² + x + 1) − (3/2)·1/((x + 1/2)² + 3/4).

Recall that

    ∫ (2x − 2β)/(x² − 2βx + γ) dx = log |x² − 2βx + γ|,
    ∫ dx/((x + b)² + a²) = (1/a) arctan((x + b)/a).

Therefore,

    ∫ x⁴/(x³ − 1) dx = (1/2)x² + (1/3) log |x − 1| − (1/6) log(x² + x + 1) + (1/√3) arctan((2x + 1)/√3).
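Matching coefficients always reduces the PFD to a linear system, which can be solved exactly over the rationals. A sketch for the 3×3 system above; the elimination helper is our own:

```python
from fractions import Fraction

# x/(x^3 - 1) = a/(x - 1) + (bx + c)/(x^2 + x + 1) gives the system
#   a + b = 0,  a - b + c = 1,  a - c = 0.
A = [[1, 1, 0], [1, -1, 1], [1, 0, -1]]
rhs = [0, 1, 0]
M = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(A, rhs)]
for col in range(3):                  # Gauss-Jordan elimination
    piv = next(r for r in range(col, 3) if M[r][col] != 0)
    M[col], M[piv] = M[piv], M[col]
    for r in range(3):
        if r != col and M[r][col] != 0:
            fac = M[r][col] / M[col][col]
            M[r] = [x - fac * y for x, y in zip(M[r], M[col])]
sol = [M[i][3] / M[i][i] for i in range(3)]
assert sol == [Fraction(1, 3), Fraction(-1, 3), Fraction(1, 3)]  # a, b, c
```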
(b) If q(x) = (x − 1)³(x + 2)(x² + 2)²(x² + 1) and p(x) is any polynomial with deg p < deg q = 10, then the partial fraction decomposition reads

    p(x)/q(x) = A₁₁/(x − 1) + A₁₂/(x − 1)² + A₁₃/(x − 1)³ + A₂₁/(x + 2)
              + (B₁₁x + C₁₁)/(x² + 2) + (B₁₂x + C₁₂)/(x² + 2)² + (B₂₁x + C₂₁)/(x² + 1).   (5.36)

Suppose now that p(x) ≡ 1. One can immediately compute A₁₃ and A₂₁. Multiplying (5.36) by (x − 1)³ yields

    1/((x + 2)(x² + 2)²(x² + 1)) = A₁₃ + (x − 1)p₁(x)

with a rational function p₁ not having (x − 1) in the denominator. Inserting x = 1 gives A₁₃ = 1/(3·3²·2) = 1/54. Similarly,

    A₂₁ = 1/((x − 1)³(x² + 2)²(x² + 1)) |_{x=−2} = 1/((−3)³·6²·5) = −1/4860.
5.2.5 Other Classes of Elementary Integrable Functions

An elementary function is a composition of rational, exponential, and trigonometric functions and their inverse functions, for example

    f(x) = e^{sin(√(x−1))}/(x + log x).

A function is called elementary integrable if it has an elementary antiderivative. Rational functions are elementary integrable. "Most" functions are not elementary integrable, for example

    e^{−x²},   e^x/x,   1/log x,   sin x/x.

They define "new" functions:

    W(x) := ∫_0^x e^{−t²/2} dt   (Gaussian integral),
    li(x) := ∫_0^x dt/log t   (integral logarithm),
    F(ϕ, k) := ∫_0^ϕ dx/√(1 − k² sin²x)   (elliptic integral of the first kind),
    E(ϕ, k) := ∫_0^ϕ √(1 − k² sin²x) dx   (elliptic integral of the second kind).
∫ R(cos x, sin x) dx

Let R(u, v) be a rational function in two variables u and v, that is, R(u, v) = p(u, v)/q(u, v) with polynomials p and q in two variables. We substitute t = tan(x/2). Then

    sin x = 2t/(1 + t²),   cos x = (1 − t²)/(1 + t²),   dx = 2 dt/(1 + t²).

Hence

    ∫ R(cos x, sin x) dx = ∫ R((1 − t²)/(1 + t²), 2t/(1 + t²)) · 2 dt/(1 + t²) = ∫ R₁(t) dt

with another rational function R₁(t).
There are 3 special cases where another substitution is appropriate.
(a) R(−u, v) = −R(u, v), R is odd in u. Substitute t = sin x.
(b) R(u, −v) = −R(u, v), R is odd in v. Substitute t = cos x.
(c) R(−u, −v) = R(u, v). Substitute t = tan x.
Example 5.8 (1) ∫ sin³x dx. Here, R(u, v) = v³ is an odd function in v, such that (b) applies; t = cos x, dt = −sin x dx, sin²x = 1 − cos²x = 1 − t². This yields

    ∫ sin³x dx = −∫ sin²x · (−sin x dx) = −∫ (1 − t²) dt = −t + t³/3 + const.
               = −cos x + (cos³x)/3 + const.

(2) ∫ tan x dx. Here, R(u, v) = v/u. All of (a), (b), and (c) apply to this situation. For example, let t = sin x. Then cos²x = 1 − t², dt = cos x dx, and

    ∫ tan x dx = ∫ (sin x · cos x dx)/cos²x = ∫ t dt/(1 − t²) = −(1/2) ∫ d(1 − t²)/(1 − t²)
               = −(1/2) log(1 − t²) = −log |cos x|.
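Both antiderivatives can be confirmed by quadrature. A sketch; the `simpson` helper is our own:

```python
import math

def simpson(f, a, b, n=2000):
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

# tan x has antiderivative -log|cos x| on (-pi/2, pi/2)
F = lambda x: -math.log(abs(math.cos(x)))
assert abs(simpson(math.tan, 0.0, 1.0) - (F(1.0) - F(0.0))) < 1e-9

# sin^3 x has antiderivative -cos x + cos^3(x)/3
G = lambda x: -math.cos(x) + math.cos(x) ** 3 / 3.0
assert abs(simpson(lambda x: math.sin(x) ** 3, 0.0, 1.0) - (G(1.0) - G(0.0))) < 1e-9
```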
∫ R(x, ⁿ√(ax + b)) dx

The substitution t = ⁿ√(ax + b) yields x = (tⁿ − b)/a, dx = n tⁿ⁻¹ dt/a, and therefore

    ∫ R(x, ⁿ√(ax + b)) dx = (n/a) ∫ R((tⁿ − b)/a, t) tⁿ⁻¹ dt.

∫ R(x, √(ax² + 2bx + c)) dx

Using the method of completing the square, the above integral can be written in one of the three basic forms

    ∫ R(t, √(t² + 1)) dt,   ∫ R(t, √(t² − 1)) dt,   ∫ R(t, √(1 − t²)) dt.

The further substitutions

    t = sinh u,    √(t² + 1) = cosh u,   dt = cosh u du,
    t = ±cosh u,   √(t² − 1) = sinh u,   dt = ±sinh u du,
    t = ±cos u,    √(1 − t²) = sin u,    dt = ∓sin u du

reduce the integral to already known integrals.
Example 5.9 Compute I = ∫ dx/√(x² + 6x + 5). Hint: substitute t = √(x² + 6x + 5) − x.
Then (x + t)² = x² + 2tx + t² = x² + 6x + 5, such that t² + 2tx = 6x + 5 and therefore x = (t² − 5)/(6 − 2t) and

    dx = (2t(6 − 2t) + 2(t² − 5))/(6 − 2t)² dt = (−2t² + 12t − 10)/(6 − 2t)² dt.

Hence, using t + x = t + (t² − 5)/(6 − 2t) = (−t² + 6t − 5)/(6 − 2t),

    I = ∫ (−2t² + 12t − 10)/(6 − 2t)² · 1/(t + x) dt = ∫ 2(6 − 2t)(−t² + 6t − 5)/((−t² + 6t − 5)(6 − 2t)²) dt
      = 2 ∫ dt/(6 − 2t) = −log |6 − 2t| + const. = −log |6 + 2x − 2√(x² + 6x + 5)| + const.
5.3 Improper Integrals

The notion of the Riemann integral defined so far is apparently too tight for some applications: we can integrate only over finite intervals, and the functions are necessarily bounded. If the integration interval is unbounded or the function to integrate is unbounded, we speak about improper integrals. We consider three cases: one limit of the integral is infinite; the function is not defined at one of the end points a or b of the interval; both a and b are critical points (either infinity or the function is not defined there).

5.3.1 Integrals on unbounded intervals

Definition 5.5 Suppose f ∈ R on [a, b] for all b > a where a is fixed. Define

    ∫_a^∞ f(x) dx = lim_{b→+∞} ∫_a^b f(x) dx   (5.37)

if this limit exists (and is finite). In that case, we say that the integral on the left converges. If it also converges when f is replaced by |f|, it is said to converge absolutely.
If an integral converges absolutely, then it converges (see Example 5.11 below), and

    |∫_a^∞ f dx| ≤ ∫_a^∞ |f| dx.

Similarly, one defines ∫_{−∞}^b f(x) dx. Moreover,

    ∫_{−∞}^∞ f dx := ∫_{−∞}^0 f dx + ∫_0^∞ f dx

if both integrals on the right side converge.
Example 5.10 (a) The integral ∫_1^∞ dx/x^s converges for s > 1 and diverges for 0 < s ≤ 1. Indeed, for s ≠ 1,

    ∫_1^R dx/x^s = 1/(1 − s) · 1/x^{s−1} |_1^R = 1/(s − 1) · (1 − 1/R^{s−1}).

Since

    lim_{R→+∞} 1/R^{s−1} = 0 if s > 1,   and   lim_{R→+∞} 1/R^{s−1} = +∞ if 0 < s < 1,

it follows that

    ∫_1^∞ dx/x^s = 1/(s − 1)   if s > 1;

for s = 1, ∫_1^R dx/x = log R → +∞, so the integral diverges.
(b) ∫_0^∞ e^{−x} dx = 1. Indeed,

    ∫_0^R e^{−x} dx = −e^{−x}|_0^R = 1 − 1/e^R,

and hence ∫_0^∞ e^{−x} dx = 1.
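Numerically, an improper integral over [a, ∞) is just the limit of proper integrals over [a, b] as b grows. A crude sketch for the two examples above (helper names are ours; the doubling scheme assumes a decaying integrand):

```python
import math

def simpson(f, a, b, n=4000):
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

def improper(f, a):
    """Integrate f over [a, oo) by doubling intervals until the
    last piece is negligible."""
    total, b = 0.0, a + 1.0
    while True:
        piece = simpson(f, a, b)
        total += piece
        if abs(piece) < 1e-10 and b - a > 10:
            return total
        a, b = b, 2.0 * b

assert abs(improper(lambda x: x ** -2.0, 1.0) - 1.0) < 1e-6   # s = 2
assert abs(improper(lambda x: math.exp(-x), 0.0) - 1.0) < 1e-6
```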
Proposition 5.24 (Cauchy criterion) The improper integral ∫_a^∞ f dx converges if and only if for every ε > 0 there exists some b > a such that for all c, d > b

    |∫_c^d f dx| < ε.

Proof. The following Cauchy criterion for limits of functions is easily proved using sequences: the limit lim_{x→∞} F(x) exists if and only if

    ∀ ε > 0 ∃ R > 0 ∀ x, y > R : |F(x) − F(y)| < ε.   (5.38)

Indeed, suppose that (x_n) is any sequence converging to +∞ as n → ∞. We will show that (F(x_n)) converges if (5.38) is satisfied. Let ε > 0. By assumption, there exists R > 0 with the above property. Since x_n → +∞ there exists n₀ ∈ N such that n ≥ n₀ implies x_n > R. Hence, |F(x_n) − F(x_m)| < ε for m, n ≥ n₀. Thus, (F(x_n)) is a Cauchy sequence and therefore convergent. This proves one direction of the above criterion. The converse direction is even simpler: suppose that lim_{x→+∞} F(x) = A exists (and is finite!). We will show that the above criterion is satisfied. Let ε > 0. By definition of the limit there exists R > 0 such that x, y > R imply |F(x) − A| < ε/2 and |F(y) − A| < ε/2. By the triangle inequality,

    |F(x) − F(y)| = |F(x) − A − (F(y) − A)| ≤ |F(x) − A| + |F(y) − A| < ε/2 + ε/2 = ε

for x, y > R, which completes the proof of this Cauchy criterion.
Applying this criterion to the function F(t) = ∫_a^t f dx and noting that |F(d) − F(c)| = |∫_c^d f dx|, the limit lim_{t→∞} F(t) exists.
Example 5.11 (a) If ∫_a^∞ f dx converges absolutely, then ∫_a^∞ f dx converges. Indeed, let ε > 0 and suppose ∫_a^∞ |f| dx converges. By the Cauchy criterion for the latter integral and by the triangle inequality, Proposition 5.10, there exists b > 0 such that for all c, d > b

    |∫_c^d f dx| ≤ ∫_c^d |f| dx < ε.   (5.39)

Hence, the Cauchy criterion is satisfied for f if it holds for |f|. Thus, ∫_a^∞ f dx converges.
(b) ∫_1^∞ (sin x/x) dx. Integration by parts with u = 1/x and v′ = sin x yields u′ = −1/x², v = −cos x and

    ∫_c^d (sin x/x) dx = −(cos x)/x |_c^d − ∫_c^d (cos x)/x² dx.

Hence

    |∫_c^d (sin x/x) dx| ≤ |cos d|/d + |cos c|/c + ∫_c^d dx/x² ≤ 1/d + 1/c + (1/c − 1/d) = 2/c < ε

if c and d are sufficiently large. Hence, ∫_1^∞ (sin x/x) dx converges.
The integral does not converge absolutely. For non-negative integers n ∈ Z₊ we have

    ∫_{nπ}^{(n+1)π} |sin x|/x dx ≥ 1/((n + 1)π) ∫_{nπ}^{(n+1)π} |sin x| dx = 2/((n + 1)π);

hence

    ∫_π^{(n+1)π} |sin x|/x dx ≥ (2/π) Σ_{k=1}^n 1/(k + 1).

Since the harmonic series diverges, so does the integral ∫_π^∞ |sin x|/x dx.
Proposition 5.25 Suppose f ∈ R is nonnegative, f ≥ 0. Then ∫_a^∞ f dx converges if there exists C > 0 such that

    ∫_a^b f dx < C   for all b > a.

The proof is similar to the proof of Lemma 2.19 (c); we omit it. Analogous propositions are true for integrals ∫_{−∞}^a f dx.

Proposition 5.26 (Integral criterion for series) Assume that f ∈ R is nonnegative, f ≥ 0, and decreasing on [1, +∞). Then ∫_1^∞ f dx converges if and only if the series Σ_{n=1}^∞ f(n) converges.
Proof. Since f(n) ≤ f(x) ≤ f(n − 1) for n − 1 ≤ x ≤ n,

    f(n) ≤ ∫_{n−1}^n f dx ≤ f(n − 1).

Summation over n = 2, 3, …, N yields

    Σ_{n=2}^N f(n) ≤ ∫_1^N f dx ≤ Σ_{n=1}^{N−1} f(n).

If ∫_1^∞ f dx converges, the series Σ_{n=1}^∞ f(n) is bounded and therefore convergent.
Conversely, if Σ_{n=1}^∞ f(n) converges, the integral ∫_1^R f dx ≤ Σ_{n=1}^∞ f(n) is bounded as R → ∞, hence convergent by Proposition 5.25.
Example 5.12 Σ_{n=2}^∞ 1/(n (log n)^α) converges if and only if ∫_2^∞ dx/(x (log x)^α) converges. The substitution y = log x, dy = dx/x gives

    ∫_2^∞ dx/(x (log x)^α) = ∫_{log 2}^∞ dy/y^α,

which converges if and only if α > 1 (see Example 5.10).
5.3.2 Integrals of Unbounded Functions

Definition 5.6 Suppose f is a real function on [a, b) and f ∈ R on [a, t] for every t, a < t < b. Define

    ∫_a^b f dx = lim_{t→b−0} ∫_a^t f dx

if the limit on the right exists. Similarly, one defines

    ∫_a^b f dx = lim_{t→a+0} ∫_t^b f dx

if f is unbounded at a and integrable on [t, b] for all t with a < t < b.
In both cases we say that ∫_a^b f dx converges.

Example 5.13 (a)

    ∫_0^1 dx/√(1 − x²) = lim_{t→1−0} ∫_0^t dx/√(1 − x²) = lim_{t→1−0} arcsin x|_0^t = lim_{t→1−0} arcsin t = arcsin 1 = π/2.

(b)

    ∫_0^1 dx/x^α = lim_{t→0+0} ∫_t^1 dx/x^α = lim_{t→0+0} { (1/(1 − α)) x^{1−α}|_t^1, α ≠ 1;  log x|_t^1, α = 1 }
                = 1/(1 − α) if α < 1,   and +∞ if α ≥ 1.
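Example (b) with α = 1/2 is easy to watch numerically: the proper integrals over [t, 1] equal 2 − 2√t and approach 1/(1 − α) = 2 as t → 0+. A sketch (the `simpson` helper is ours):

```python
import math

def simpson(f, a, b, n=4000):
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

f = lambda x: 1.0 / math.sqrt(x)
for t in (0.25, 1.0 / 16, 1.0 / 256):
    # proper integral over [t, 1] vs. the closed form 2 - 2*sqrt(t)
    assert abs(simpson(f, t, 1.0) - (2.0 - 2.0 * math.sqrt(t))) < 1e-6
```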
Remarks 5.4 (a) The analogous statements to Proposition 5.24 and Proposition 5.25 are true for improper integrals ∫_a^b f dx.
For example, ∫_0^1 dx/(x(1 − x)) diverges since both improper integrals ∫_0^{1/2} f dx and ∫_{1/2}^1 f dx diverge; ∫_0^1 dx/(√x (1 − x)) diverges since it diverges at x = 1; finally, I = ∫_0^1 dx/√(x(1 − x)) converges. Indeed, the substitution x = sin²t gives I = π.
(b) If f is unbounded both at a and at b we define the improper integral

    ∫_a^b f dx = ∫_a^c f dx + ∫_c^b f dx,

where c is between a and b, if both improper integrals on the right side exist.
(c) Also, if f is unbounded at a, define

    ∫_a^∞ f dx = ∫_a^b f dx + ∫_b^∞ f dx

if the two improper integrals on the right side exist.
(d) If f is unbounded in the interior of the interval [a, b], say at c, we define the improper integral

    ∫_a^b f dx = ∫_a^c f dx + ∫_c^b f dx

if the two improper integrals on the right side exist. For example,

    ∫_{−1}^1 dx/√|x| = ∫_{−1}^0 dx/√|x| + ∫_0^1 dx/√|x| = lim_{t→0−0} ∫_{−1}^t dx/√|x| + lim_{t→0+0} ∫_t^1 dx/√|x|
                     = lim_{t→0−0} (−2√(−x))|_{−1}^t + lim_{t→0+0} (2√x)|_t^1 = 2 + 2 = 4.
5.3.3 The Gamma function

For x > 0 set

    Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt.   (5.40)

By Example 5.13, Γ₁(x) = ∫_0^1 t^{x−1} e^{−t} dt converges since for every t > 0

    t^{x−1} e^{−t} ≤ 1/t^{1−x}.

By Example 5.10, Γ₂(x) = ∫_1^∞ t^{x−1} e^{−t} dt converges since for every t ≥ t₀

    t^{x−1} e^{−t} ≤ 1/t².

Note that lim_{t→∞} t^{x+1} e^{−t} = 0 by Proposition 3.11. Hence, Γ(x) is defined for every x > 0.

Proposition 5.27 For every positive x,

    x Γ(x) = Γ(x + 1).   (5.41)

In particular, for n ∈ N we have Γ(n + 1) = n!.
Proof. Using integration by parts,

    ∫_ε^R t^x e^{−t} dt = −t^x e^{−t}|_ε^R + x ∫_ε^R t^{x−1} e^{−t} dt.

Taking the limits ε → 0+0 and R → +∞ one has Γ(x + 1) = xΓ(x). Since by Example 5.10

    Γ(1) = ∫_0^∞ e^{−t} dt = 1,

it follows from (5.41) that

    Γ(n + 1) = nΓ(n) = ⋯ = n(n − 1)(n − 2) ⋯ Γ(1) = n!.
The Gamma function interpolates the factorial function n!, which is defined only for positive integers n. However, this property alone is not sufficient for a complete characterization of the Gamma function; we need another property. This will be done in more detail in the appendix to this chapter.
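Both Γ(n + 1) = n! and the functional equation (5.41) can be checked directly; Python's standard library exposes the Gamma function as `math.gamma`. A sketch:

```python
import math

# Gamma interpolates the factorial: Gamma(n + 1) = n!
for n in range(1, 10):
    assert math.isclose(math.gamma(n + 1), math.factorial(n))

# The functional equation x*Gamma(x) = Gamma(x + 1) at non-integers:
for x in (0.5, 1.3, 2.7):
    assert math.isclose(x * math.gamma(x), math.gamma(x + 1))

# A classical non-integer value: Gamma(1/2) = sqrt(pi).
assert math.isclose(math.gamma(0.5), math.sqrt(math.pi))
```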
5.4 Integration of Vector-Valued Functions
A mapping γ : [a, b] → Rk , γ(t) = (γ1 (t), . . . , γk (t)) is said to be continuous if all the mappings
γi , i = 1, . . . , k, are continuous. Moreover, if all the γi are differentiable, we write γ ′ (t) =
(γ1′ (t), . . . , γk′ (t)).
Definition 5.7 Let f₁, …, f_k be real functions on [a, b] and let f = (f₁, …, f_k) be the corresponding mapping from [a, b] into R^k. If α increases on [a, b], to say that f ∈ R(α) means that f_j ∈ R(α) for j = 1, …, k. In this case we define

    ∫_a^b f dα = ( ∫_a^b f₁ dα, …, ∫_a^b f_k dα ).

In other words, ∫_a^b f dα is the point in R^k whose jth coordinate is ∫_a^b f_j dα. It is clear that parts (a), (c), and (e) of Proposition 5.9 are valid for these vector-valued integrals; we simply apply the earlier results to each coordinate. The same is true for Proposition 5.12, Theorem 5.14, and Theorem 5.15. To illustrate this, we state the analog of the fundamental theorem of calculus.

Theorem 5.28 If f = (f₁, …, f_k) ∈ R on [a, b] and if F = (F₁, …, F_k) is an antiderivative of f on [a, b], then

    ∫_a^b f(x) dx = F(b) − F(a).

The analog of Proposition 5.10 (b) offers some new features. Let x = (x₁, …, x_k) ∈ R^k be any vector in R^k. We denote its Euclidean norm by ‖x‖ = √(x₁² + ⋯ + x_k²).
Proposition 5.29 If f = (f₁, …, f_k) ∈ R(α) on [a, b], then ‖f‖ ∈ R(α) and

    ‖ ∫_a^b f dα ‖ ≤ ∫_a^b ‖f‖ dα.   (5.42)

Proof. By the definition of the norm,

    ‖f‖ = (f₁² + f₂² + ⋯ + f_k²)^{1/2}.

By Proposition 5.10 (a), each of the functions f_i² belongs to R(α); hence so does their sum f₁² + f₂² + ⋯ + f_k². Note that the square root is a continuous function on the positive half line. If we apply Proposition 5.8, we see that ‖f‖ ∈ R(α).
To prove (5.42), put y = (y₁, …, y_k) with y_j = ∫_a^b f_j dα. Then y = ∫_a^b f dα, and

    ‖y‖² = Σ_{j=1}^k y_j² = Σ_{j=1}^k y_j ∫_a^b f_j dα = ∫_a^b Σ_{j=1}^k (y_j f_j) dα.

By the Cauchy–Schwarz inequality, Proposition 1.25,

    Σ_{j=1}^k y_j f_j(t) ≤ ‖y‖ ‖f(t)‖,   t ∈ [a, b].

Inserting this into the preceding equation, the monotony of the integral gives

    ‖y‖² ≤ ‖y‖ ∫_a^b ‖f‖ dα.

If y = 0, (5.42) is trivial. If y ≠ 0, division by ‖y‖ gives (5.42).
Integration of Complex Valued Functions

This is a special case of the above arguments with k = 2. Let ϕ : [a, b] → C be a complex-valued function. Let u, v : [a, b] → R be the real and imaginary parts of ϕ, respectively; u = Re ϕ and v = Im ϕ.
The function ϕ = u + iv is said to be integrable if u, v ∈ R on [a, b], and we set

    ∫_a^b ϕ dx = ∫_a^b u dx + i ∫_a^b v dx.

The fundamental theorem of calculus holds: if the complex function ϕ is Riemann integrable, ϕ ∈ R on [a, b], and F(x) is an antiderivative of ϕ, then

    ∫_a^b ϕ(x) dx = F(b) − F(a).

Similarly, if u and v are both continuous, then F(x) = ∫_a^x ϕ(t) dt is an antiderivative of ϕ(x).
Proof. Let F = U + iV be the antiderivative of ϕ where U′ = u and V′ = v. By the fundamental theorem of calculus,

    ∫_a^b ϕ dx = ∫_a^b u dx + i ∫_a^b v dx = U(b) − U(a) + i(V(b) − V(a)) = F(b) − F(a).

Example:

    ∫_a^b e^{αt} dt = (1/α) e^{αt}|_a^b,   α ∈ C.
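Since the integral is defined coordinate-wise, any real quadrature rule works verbatim on complex-valued integrands. A sketch checking the example above (the `simpson_c` helper is ours):

```python
import cmath

def simpson_c(f, a, b, n=2000):
    """Simpson rule; the arithmetic works unchanged for complex values."""
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

alpha = 1 + 2j
num = simpson_c(lambda t: cmath.exp(alpha * t), 0.0, 1.0)
exact = (cmath.exp(alpha) - 1.0) / alpha   # (1/alpha) e^{alpha t} |_0^1
assert abs(num - exact) < 1e-9
```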
5.5 Inequalities

Besides the triangle inequality |∫_a^b f dα| ≤ ∫_a^b |f| dα, which was shown in Proposition 5.10, we can formulate Hölder's, Minkowski's, and the Cauchy–Schwarz inequalities for Riemann–Stieltjes integrals. For this, let p > 0 be a fixed positive real number and α an increasing function on [a, b]. For f ∈ R(α) define the L^p-norm

    ‖f‖_p = ( ∫_a^b |f|^p dα )^{1/p}.   (5.43)
Cauchy–Schwarz Inequality

Proposition 5.30 Let f, g : [a, b] → C be complex-valued functions with f, g ∈ R on [a, b]. Then

    ( ∫_a^b |fg| dx )² ≤ ∫_a^b |f|² dx · ∫_a^b |g|² dx.   (5.44)

Proof. Replacing f by |f| and g by |g|, it suffices to show (∫ fg dx)² ≤ ∫ f² dx · ∫ g² dx for nonnegative f and g. For this, put A = ∫_a^b g² dx, B = ∫_a^b fg dx, and C = ∫_a^b f² dx. Let λ ∈ R be arbitrary. By the positivity and linearity of the integral,

    0 ≤ ∫_a^b (f + λg)² dx = ∫_a^b f² dx + 2λ ∫_a^b fg dx + λ² ∫_a^b g² dx = C + 2Bλ + Aλ² =: h(λ).

Thus, h is non-negative for all real values λ.
Case 1. A = 0. Then 2Bλ + C ≥ 0 for all λ ∈ R. This implies B = 0 and C ≥ 0; the inequality is satisfied.
Case 2. A > 0. Dividing the above inequality by A, we have

    0 ≤ λ² + (2B/A)λ + C/A = (λ + B/A)² − (B/A)² + C/A.

This is satisfied for all λ if and only if

    (B/A)² ≤ C/A,   and finally B² ≤ AC.

This completes the proof.
Proposition 5.31 (a) Cauchy–Schwarz inequality. Suppose f, g ∈ R(α); then

    |∫_a^b fg dα| ≤ ∫_a^b |fg| dα ≤ √(∫_a^b |f|² dα) · √(∫_a^b |g|² dα),   (5.45)

or

    ∫_a^b |fg| dα ≤ ‖f‖₂ ‖g‖₂.   (5.46)

(b) Hölder's inequality. Let p and q be positive real numbers such that 1/p + 1/q = 1. If f, g ∈ R(α), then

    |∫_a^b fg dα| ≤ ∫_a^b |fg| dα ≤ ‖f‖_p ‖g‖_q.   (5.47)

(c) Minkowski's inequality. Let p ≥ 1 and f, g ∈ R(α); then

    ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p.   (5.48)
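The inequalities are easy to observe on concrete functions (here with dα = dx on [0, 1]; helper names are our illustration):

```python
import math

def simpson(f, a, b, n=2000):
    h = (b - a) / n
    s = f(a) + f(b)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * f(a + i * h)
    return s * h / 3

f, g = math.sin, math.exp
lhs = simpson(lambda x: abs(f(x) * g(x)), 0.0, 1.0)

norm2 = lambda u: math.sqrt(simpson(lambda x: u(x) ** 2, 0.0, 1.0))
assert lhs <= norm2(f) * norm2(g) + 1e-12          # Cauchy-Schwarz (5.46)

p, q = 3.0, 1.5                                    # 1/p + 1/q = 1
normr = lambda u, r: simpson(lambda x: abs(u(x)) ** r, 0.0, 1.0) ** (1.0 / r)
assert lhs <= normr(f, p) * normr(g, q) + 1e-12    # Hölder (5.47)
```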
5.6 Appendix D

The composition of an integrable and a continuous function is integrable

Proof of Proposition 5.8. Let ε > 0. Since ϕ is uniformly continuous on [m, M], there exists δ > 0 such that δ < ε and |ϕ(s) − ϕ(t)| < ε if |s − t| < δ and s, t ∈ [m, M].
Since f ∈ R(α), there exists a partition P = {x₀, x₁, …, x_n} of [a, b] such that

    U(P, f, α) − L(P, f, α) < δ².   (5.49)

Let M_i and m_i have the same meaning as in Definition 5.1, and let M_i* and m_i* be the analogous numbers for h. Divide the numbers 1, 2, …, n into two classes: i ∈ A if M_i − m_i < δ and i ∈ B if M_i − m_i ≥ δ. For i ∈ A our choice of δ shows that M_i* − m_i* ≤ ε. For i ∈ B, M_i* − m_i* ≤ 2K where K = sup{|ϕ(t)| : m ≤ t ≤ M}. By (5.49), we have

    δ Σ_{i∈B} Δα_i ≤ Σ_{i∈B} (M_i − m_i) Δα_i < δ²,   (5.50)

so that Σ_{i∈B} Δα_i < δ. It follows that

    U(P, h, α) − L(P, h, α) = Σ_{i∈A} (M_i* − m_i*) Δα_i + Σ_{i∈B} (M_i* − m_i*) Δα_i
                            ≤ ε(α(b) − α(a)) + 2Kδ < ε(α(b) − α(a) + 2K).

Since ε was arbitrary, Proposition 5.3 implies that h ∈ R(α).
Convex Functions are Continuous

Proposition 5.32 Every convex function f : (a, b) → R, −∞ ≤ a < b ≤ +∞, is continuous.
Proof. There is a very nice geometric proof in Rudin's book "Real and Complex Analysis", see [Rud66, 3.2 Theorem]. We give another proof here.
Let x ∈ (a, b); choose a finite subinterval (x₁, x₂) with a < x₁ < x < x₂ < b. Since f(λx₁ + (1 − λ)x₂) ≤ λf(x₁) + (1 − λ)f(x₂), λ ∈ [0, 1], f is bounded above on [x₁, x₂]. Choosing x₃ with x₁ < x₃ < x, the convexity of f implies

    (f(x₃) − f(x₁))/(x₃ − x₁) ≤ (f(x) − f(x₁))/(x − x₁)
    ⟹ f(x) ≥ f(x₁) + ((f(x₃) − f(x₁))/(x₃ − x₁)) (x − x₁).

This means that f is bounded below on [x₃, x₂] by a linear function; hence f is bounded on [x₃, x₂], say |f(x)| ≤ C on [x₃, x₂].
The convexity implies

    f((1/2)(x + h) + (1/2)(x − h)) ≤ (1/2)(f(x + h) + f(x − h))
    ⟹ f(x) − f(x − h) ≤ f(x + h) − f(x).

Iteration yields

    f(x − (ν − 1)h) − f(x − νh) ≤ f(x + h) − f(x) ≤ f(x + νh) − f(x + (ν − 1)h).

Summing over ν = 1, …, n we have

    f(x) − f(x − nh) ≤ n(f(x + h) − f(x)) ≤ f(x + nh) − f(x)
    ⟹ (1/n)(f(x) − f(x − nh)) ≤ f(x + h) − f(x) ≤ (1/n)(f(x + nh) − f(x)).

Let ε > 0 be given; choose n ∈ N such that 2C/n < ε and choose h such that x₃ < x − nh < x < x + nh < x₂. The above inequality then implies

    |f(x + h) − f(x)| ≤ 2C/n < ε.

This shows continuity of f at x.
If g is an increasing convex function and f is a convex function, then g∘f is convex, since f(λx + μy) ≤ λf(x) + μf(y), λ + μ = 1, λ, μ ≥ 0, implies

    g(f(λx + μy)) ≤ g(λf(x) + μf(y)) ≤ λg(f(x)) + μg(f(y)).
5.6.1 More on the Gamma Function

Let I ⊂ R be an interval. A positive function F : I → R is called logarithmically convex if log F : I → R is convex, i.e. for every x, y ∈ I and every λ, 0 ≤ λ ≤ 1, we have

    F(λx + (1 − λ)y) ≤ F(x)^λ F(y)^{1−λ}.

Proposition 5.33 The Gamma function is logarithmically convex.
Proof. Let x, y > 0 and 0 < λ < 1 be given. Set p = 1/λ and q = 1/(1 − λ). Then 1/p + 1/q = 1 and we apply Hölder's inequality to the functions

    f(t) = t^{(x−1)/p} e^{−t/p},   g(t) = t^{(y−1)/q} e^{−t/q}

and obtain

    ∫_ε^R f(t)g(t) dt ≤ ( ∫_ε^R f(t)^p dt )^{1/p} ( ∫_ε^R g(t)^q dt )^{1/q}.

Note that

    f(t)g(t) = t^{x/p + y/q − 1} e^{−t},   f(t)^p = t^{x−1} e^{−t},   g(t)^q = t^{y−1} e^{−t}.

Taking the limits ε → 0+0 and R → +∞ we obtain

    Γ(x/p + y/q) ≤ Γ(x)^{1/p} Γ(y)^{1/q}.

Remark 5.5 One can prove that a convex function (see Definition 4.4) is continuous, see Proposition 5.32. Also, an increasing convex function of a convex function f is convex; for example, e^f is convex if f is. We conclude that Γ(x) is continuous for x > 0.
Theorem 5.34 Let F : (0, +∞) → (0, +∞) be a function with
(a) F(1) = 1,
(b) F(x + 1) = xF(x),
(c) F is logarithmically convex.
Then F(x) = Γ(x) for all x > 0.
Proof. Since Γ(x) has properties (a), (b), and (c), it suffices to prove that F is uniquely determined by (a), (b), and (c). By (b),

    F(x + n) = F(x) x(x + 1) ⋯ (x + n − 1)

for every positive x and every positive integer n. In particular F(n + 1) = n!, and it suffices to show that F(x) is uniquely determined for every x with x ∈ (0, 1). Since n + x = (1 − x)n + x(n + 1), from (c) it follows

    F(n + x) ≤ F(n)^{1−x} F(n + 1)^x = F(n)^{1−x} F(n)^x n^x = (n − 1)! n^x.

Similarly, from n + 1 = x(n + x) + (1 − x)(n + 1 + x) it follows

    n! = F(n + 1) ≤ F(n + x)^x F(n + 1 + x)^{1−x} = F(n + x)(n + x)^{1−x}.

Combining both inequalities,

    n!(n + x)^{x−1} ≤ F(n + x) ≤ (n − 1)! n^x,

and, dividing by x(x + 1) ⋯ (x + n − 1),

    a_n(x) := n!(n + x)^{x−1}/(x(x + 1) ⋯ (x + n − 1)) ≤ F(x) ≤ (n − 1)! n^x/(x(x + 1) ⋯ (x + n − 1)) =: b_n(x).

Since b_n(x)/a_n(x) = (n + x) n^x/(n (n + x)^x) converges to 1 as n → ∞,

    F(x) = lim_{n→∞} (n − 1)! n^x/(x(x + 1) ⋯ (x + n − 1)).

Hence F is uniquely determined.
Stirling's Formula

We give an asymptotic formula for n! as n → ∞. We call two sequences (a_n) and (b_n) asymptotically equal if lim_{n→∞} a_n/b_n = 1, and we write a_n ~ b_n.

Proposition 5.35 (Stirling's Formula) The asymptotic behavior of n! is

    n! ~ √(2πn) (n/e)^n.

Proof. Using the trapezoid rule (5.34) with f(x) = log x, f″(x) = −1/x², we have

    ∫_k^{k+1} log x dx = (1/2)(log k + log(k + 1)) + 1/(12ξ_k²)

with k ≤ ξ_k ≤ k + 1. Summation over k = 1, …, n − 1 gives

    ∫_1^n log x dx = Σ_{k=1}^n log k − (1/2) log n + (1/12) Σ_{k=1}^{n−1} 1/ξ_k².

Since ∫ log x dx = x log x − x (integration by parts), we have

    n log n − n + 1 = Σ_{k=1}^n log k − (1/2) log n + (1/12) Σ_{k=1}^{n−1} 1/ξ_k²,

hence

    Σ_{k=1}^n log k = (n + 1/2) log n − n + γ_n,   where γ_n = 1 − (1/12) Σ_{k=1}^{n−1} 1/ξ_k².

Exponentiating both sides of the equation we find, with c_n = e^{γ_n},

    n! = n^{n+1/2} e^{−n} c_n.   (5.51)

Since 0 < 1/ξ_k² ≤ 1/k², the limit

    γ = lim_{n→∞} γ_n = 1 − (1/12) Σ_{k=1}^∞ 1/ξ_k²

exists, and so does the limit c = lim_{n→∞} c_n = e^γ.
Proof of c_n → √(2π). Using (5.51) we have

    c_n²/c_{2n} = (n!)² √(2n) (2n)^{2n}/(n^{2n+1} (2n)!) = √2 · 2^{2n} (n!)²/(√n (2n)!),   (5.52)

and lim_{n→∞} c_n²/c_{2n} = c²/c = c. Using Wallis's product formula for π,

    π = 2 ∏_{k=1}^∞ 4k²/(4k² − 1) = 2 lim_{n→∞} (2·2·4·4 ⋯ 2n·2n)/(1·3·3·5 ⋯ (2n − 1)(2n + 1)),

we have

    ( 2 ∏_{k=1}^n 4k²/(4k² − 1) )^{1/2} = √2 · (2·4 ⋯ 2n)/(3·5 ⋯ (2n − 1)) · 1/√(2n + 1)
        = 1/√(n + 1/2) · (2²·4² ⋯ (2n)²)/(2·3·4 ⋯ (2n − 1)(2n)) = 1/√(n + 1/2) · 2^{2n}(n!)²/(2n)!,

such that

    √π = lim_{n→∞} 2^{2n} (n!)²/(√n (2n)!).

Consequently, c = √2 · √π = √(2π), which completes the proof.
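Working with logarithms avoids overflow and shows the quality of the approximation: the error log n! − log(√(2πn)(n/e)^n) is about 1/(12n). A sketch using `math.lgamma` (log Γ) from the standard library:

```python
import math

# log of Stirling's approximation: (1/2) log(2 pi n) + n log n - n
log_stirling = lambda n: 0.5 * math.log(2 * math.pi * n) + n * math.log(n) - n

for n in (10, 100, 1000, 10000):
    err = math.lgamma(n + 1) - log_stirling(n)   # log(n!) - log(approx)
    # the leading correction term is 1/(12n), so n*err stays near 1/12
    assert abs(n * err - 1.0 / 12.0) < 0.01
```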
Proof of Hölder’s Inequality
Proof of Proposition 5.31. We prove (b). The other two statements are consequences, their
proofs are along the lines in Section 1.3. The main idea is to approximate the integral on the left
by Riemann sums and use Hölder’s inequality (1.22). Let ε > 0; without loss of generality, let
f, g ≥ 0. By Proposition 5.10 f g, f p , g q ∈ R(α) and by Proposition 5.3 there exist partitions
P1 , P2 , and P3 of [a, b] such that U(f g, P1 , α)−L(f g, P1, α) < ε, U(f p , P2 , α)−L(f p , P2 , α) <
ε, and U(g q , P3 , α) − L(g q , P3 , α) < ε. Let P = {x0 , x1 , . . . , xn } be the common refinement
of P1 , P2 , and P3 . By Lemma 5.4 (a) and (c)
Z b
n
X
f g dα <
(f g)(ti)∆αi + ε,
(5.53)
a
n
X
i=1
n
X
i=1
b
f (ti )p ∆αi <
q
g(ti ) ∆αi <
i=1
Z
Z
f p dα + ε,
(5.54)
g q dα + ε,
(5.55)
a
b
a
for any ti ∈ [xi−1 , xi ]. Using the two preceding inequalities and Hölder’s inequality (1.22) we
have
! p1
! 1q
n
n
n
1
1
X
X
X
f (ti )∆αip g(ti )∆αiq ≤
f (ti )p ∆αi
g(ti)q ∆αi
i=1
i=1
<
Z
i=1
b
p
f dα + ε
a
p1 Z
b
q
g dα + ε
a
1q
.
By (5.53),
Z
b
a
Z b
p1 Z b
1q
n
X
p
q
f g dα <
(f g)(ti )∆αi + ε <
f dα + ε
g dα + ε + ε.
i=1
a
a
156
Since ε > 0 was arbitrary, the claim follows.
5 Integration
Chapter 6
Sequences of Functions and Basic
Topology
In the present chapter we draw our attention to complex-valued functions (including the real-valued ones), although many of the theorems and proofs which follow extend to vector-valued functions without difficulty, and even to mappings into more general spaces. We stay within this
simple framework in order to focus attention on the most important aspects of the problem that
arise when limit processes are interchanged.
6.1 Discussion of the Main Problem
Definition 6.1 Suppose (f_n), n ∈ N, is a sequence of functions defined on a set E, and suppose that the sequence of numbers (f_n(x)) converges for every x ∈ E. We can then define a function f by

    f(x) = lim_{n→∞} f_n(x),   x ∈ E.   (6.1)

Under these circumstances we say that (f_n) converges on E and f is the limit (or the limit function) of (f_n). Sometimes we say that "(f_n) converges pointwise to f on E" if (6.1) holds.
Similarly, if Σ_{n=1}^∞ f_n(x) converges for every x ∈ E, and if we define

    f(x) = Σ_{n=1}^∞ f_n(x),   x ∈ E,   (6.2)

the function f is called the sum of the series Σ_{n=1}^∞ f_n.
The main problem which arises is to determine whether important properties of the functions
fn are preserved under the limit operations (6.1) and (6.2). For instance, if the functions fn are
continuous, or differentiable, or integrable, is the same true of the limit function? What are the
relations between fn′ and f′, say, or between the integrals of fn and those of f? To say that f is continuous at x means
lim_{t→x} f(t) = f(x).
Hence, to ask whether the limit of a sequence of continuous functions is continuous is the same
as to ask whether
lim_{t→x} lim_{n→∞} fn(t) = lim_{n→∞} lim_{t→x} fn(t),    (6.3)
i. e. whether the order in which limit processes are carried out is immaterial. We shall now
show by means of several examples that limit processes cannot in general be interchanged
without affecting the result. Afterwards, we shall prove that under certain conditions the order
in which limit operations are carried out is inessential.
Example 6.1 (a) Our first example, and the simplest one, concerns a “double sequence.” For
positive integers m, n ∈ N let
s_{mn} = m/(m + n).
Then, for fixed n,
lim_{m→∞} s_{mn} = 1,
so that lim_{n→∞} lim_{m→∞} s_{mn} = 1. On the other hand, for every fixed m,
lim_{n→∞} s_{mn} = 0,
so that lim_{m→∞} lim_{n→∞} s_{mn} = 0. The two limits cannot be interchanged.
(b) Let fn(x) = xⁿ on [0, 1]. Then
f(x) = 0 if 0 ≤ x < 1,  and  f(x) = 1 if x = 1.
All functions fn(x) are continuous on [0, 1]; however, the limit f(x) is discontinuous at x = 1; that is,
lim_{t→1−0} lim_{n→∞} tⁿ = 0 ≠ 1 = lim_{n→∞} lim_{t→1−0} tⁿ.
The limits cannot be interchanged.
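The failure of interchanging the two limits in (a) can be replayed numerically. The following small Python sketch (the names `s`, `row_vals`, `col_vals` are ours, not from the notes) evaluates s_mn with one index fixed and the other large:

```python
# Double sequence s_mn = m/(m+n) from Example 6.1 (a).
def s(m, n):
    return m / (m + n)

# n fixed, m large: s_mn approaches 1, so lim_n lim_m s_mn = 1.
row_vals = [s(10**6, n) for n in (1, 2, 3, 4)]
# m fixed, n large: s_mn approaches 0, so lim_m lim_n s_mn = 0.
col_vals = [s(m, 10**6) for m in (1, 2, 3, 4)]
```

The two rows of values approach different limits, which is exactly the point of the example.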
After these examples, which show what can go wrong if limit processes are interchanged carelessly, we now define a new notion of convergence, stronger than pointwise convergence as
defined in Definition 6.1, which will enable us to arrive at positive results.
6.2 Uniform Convergence
6.2.1 Definitions and Example
Definition 6.2 A sequence of functions (fn ) converges uniformly on E to a function f if for
every ε > 0 there is a positive integer n0 such that n ≥ n0 implies
| fn (x) − f (x) | ≤ ε
for all x ∈ E. We write fn ⇉ f on E.
As a formula, fn ⇉ f on E if
∀ ε > 0 ∃ n0 ∈ N ∀ n ≥ n0 ∀ x ∈ E : | fn(x) − f(x) | ≤ ε.    (6.4)
[Figure: the ε-tube around f on [a, b], bounded above by f(x) + ε and below by f(x) − ε. Uniform convergence of fn to f on [a, b] means that fn lies in the ε-tube of f for sufficiently large n.]
It is clear that every uniformly convergent sequence is pointwise convergent (to the same function). Quite explicitly, the difference between the two concepts is this: if (fn) converges pointwise on E to a function f, then for every ε > 0 and for every x ∈ E there exists an integer n0, depending on both ε and x, such that (6.4) holds if n ≥ n0. If (fn) converges uniformly on E, it is possible, for each ε > 0, to find one integer n0 which will do for all x ∈ E.
We say that the series Σ_{k=1}^∞ fk(x) converges uniformly on E if the sequence (sn(x)) of partial sums defined by
sn(x) = Σ_{k=1}^n fk(x)
converges uniformly on E.
Proposition 6.1 (Cauchy criterion) (a) The sequence of functions (fn ) defined on E converges
uniformly on E if and only if for every ε > 0 there is an integer n0 such that n, m ≥ n0 and
x ∈ E imply
| fn(x) − fm(x) | ≤ ε.    (6.5)
(b) The series of functions Σ_{k=1}^∞ gk(x) defined on E converges uniformly on E if and only if for every ε > 0 there is an integer n0 such that n, m ≥ n0 and x ∈ E imply
| Σ_{k=m}^n gk(x) | ≤ ε.
Proof. (a) Suppose (fn) converges uniformly on E and let f be the limit function. Then there is an integer n0 such that n ≥ n0 and x ∈ E imply
| fn(x) − f(x) | ≤ ε/2,
so that
| fn(x) − fm(x) | ≤ | fn(x) − f(x) | + | fm(x) − f(x) | ≤ ε
if m, n ≥ n0 and x ∈ E.
Conversely, suppose the Cauchy condition holds. By Proposition 2.18, the sequence (fn(x)) converges for every x to a limit which we may call f(x). Thus the sequence (fn) converges pointwise on E to f. We have to prove that the convergence is uniform. Let ε > 0 be given and choose n0 such that (6.5) holds. Fix n and let m → ∞ in (6.5). Since fm(x) → f(x) as m → ∞, this gives
| fn(x) − f(x) | ≤ ε
for every n ≥ n0 and x ∈ E.
(b) immediately follows from (a) with fn(x) = Σ_{k=1}^n gk(x).
Remark 6.1 Suppose
lim_{n→∞} fn(x) = f(x),  x ∈ E.
Put
Mn = sup_{x∈E} | fn(x) − f(x) |.
Then fn ⇉ f on E if and only if Mn → 0 as n → ∞. (Prove!)
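The criterion of Remark 6.1 can be illustrated numerically (a sketch of ours, not part of the notes) for fn(x) = xⁿ with limit f ≡ 0: on [0, 0.9] a grid approximation of Mn tends to 0, while on [0, 1) the points xn = 0.5^{1/n} witness Mn ≥ 1/2 for every n.

```python
# Grid approximation of M_n = sup |f_n - f| for f_n(x) = x**n, f = 0.
def sup_err(n, b, grid=2000):
    return max((b * i / grid) ** n for i in range(grid + 1))

# On [0, 0.9]: the sup is 0.9**n -> 0, so convergence is uniform there.
M_r = [sup_err(n, 0.9) for n in (10, 40, 160)]
# On [0, 1): at x_n = 0.5**(1/n) we get f_n(x_n) = 0.5 for every n,
# so M_n >= 0.5 and does not tend to 0: no uniform convergence.
witness = [(0.5 ** (1 / n)) ** n for n in (10, 40, 160)]
```

The witness points show why the sup over all of [0, 1) never becomes small, even though each fixed x eventually does.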
The following comparison test of a function series with a numerical series gives a sufficient
criterion for uniform convergence.
Theorem 6.2 (Weierstraß) Suppose (fn) is a sequence of functions defined on E, and suppose
| fn(x) | ≤ Mn,  x ∈ E, n ∈ N.    (6.6)
Then Σ_{n=1}^∞ fn converges uniformly on E if Σ_{n=1}^∞ Mn converges.
Proof. If Σ Mn converges then, for arbitrary ε > 0, there exists n0 such that m, n ≥ n0 implies Σ_{i=m}^n Mi ≤ ε. Hence, by the triangle inequality and (6.6),
| Σ_{i=m}^n fi(x) | ≤ Σ_{i=m}^n | fi(x) | ≤ Σ_{i=m}^n Mi ≤ ε  for all x ∈ E.
Uniform convergence now follows from Proposition 6.1.
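A quick numeric sanity check of the M-test (our own sketch; the choice fn(x) = sin(nx)/n² with Mn = 1/n² is ours, not from the notes): the tail of Σ Mn uniformly bounds the tail of the function series, independently of x.

```python
import math

# f_n(x) = sin(n x)/n**2 is dominated by M_n = 1/n**2.
def tail(x, m, big=4000):
    return abs(sum(math.sin(n * x) / n**2 for n in range(m + 1, big + 1)))

m, big = 100, 4000
bound = sum(1 / n**2 for n in range(m + 1, big + 1))
# Sample many x in [0, 2*pi]; the tail never exceeds the M_n tail bound.
worst = max(tail(2 * math.pi * j / 50, m, big) for j in range(51))
```

This is precisely the Cauchy-criterion estimate from the proof, evaluated on a grid.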
Proposition 6.3 (Comparison Test) If Σ_{n=1}^∞ gn(x) converges uniformly on E and | fn(x) | ≤ gn(x) for all sufficiently large n and all x ∈ E, then Σ_{n=1}^∞ fn(x) converges uniformly on E.
Proof. Apply the Cauchy criterion; note that
| Σ_{n=k}^m fn(x) | ≤ Σ_{n=k}^m | fn(x) | ≤ Σ_{n=k}^m gn(x) < ε.
Application of Weierstraß’ Theorem to Power Series and Fourier Series
Proposition 6.4 Let
Σ_{n=0}^∞ an zⁿ,  an ∈ C,    (6.7)
be a power series with radius of convergence R > 0. Then (6.7) converges uniformly on the closed disc {z | | z | ≤ r} for every r with 0 ≤ r < R.
Proof. We apply Weierstraß' theorem to fn(z) = an zⁿ. Note that
| fn(z) | = | an | | z |ⁿ ≤ | an | rⁿ.
Since r < R, r belongs to the disc of convergence, so the series Σ_{n=0}^∞ | an | rⁿ converges by Theorem 2.34. By Theorem 6.2, the series Σ_{n=0}^∞ an zⁿ converges uniformly on {z | | z | ≤ r}.
Remark 6.2 (a) The power series
Σ_{n=1}^∞ n an z^{n−1}
has the same radius of convergence R as the series (6.7) and hence also converges uniformly on the closed disc {z | | z | ≤ r}.
Indeed, this simply follows from the fact that
lim_{n→∞} ⁿ√((n+1) | a_{n+1} |) = lim_{n→∞} ⁿ√(n+1) · lim_{n→∞} ⁿ√(| an |) = 1/R.
(b) Note that the power series in general does not converge uniformly on the whole open disc of convergence | z | < R. As an example, consider the geometric series
f(z) = Σ_{k=0}^∞ zᵏ = 1/(1 − z),  | z | < 1.
Note that the condition
∃ ε0 > 0 ∀ n ∈ N ∃ xn ∈ E : | fn(xn) − f(xn) | ≥ ε0
implies that (fn) does not converge uniformly to f on E. (Prove!)
To ε = 1 and every n ∈ N choose zn = n/(n+1). Using Bernoulli's inequality we obtain
znⁿ = (1 − 1/(n+1))ⁿ ≥ 1 − n/(n+1) = 1 − zn,  hence  znⁿ/(1 − zn) ≥ 1,    (6.8)
so that
| s_{n−1}(zn) − f(zn) | = | Σ_{k=0}^{n−1} znᵏ − 1/(1 − zn) | = Σ_{k=n}^∞ znᵏ = znⁿ/(1 − zn) ≥ 1
by (6.8). The geometric series doesn't converge uniformly on the whole open unit disc.
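The contrast between the closed disc and the open disc can be seen numerically (sketch, names ours): on [0, 1/2] the error of the partial sums is uniformly tiny, while at the points zn = n/(n+1) the error of s_{n−1} stays at least 1, as inequality (6.8) predicts.

```python
# Partial sums of the geometric series versus f(x) = 1/(1 - x).
def s_n(x, n):
    # sum_{k=0}^{n-1} x**k, i.e. s_{n-1} in the notation of the notes
    return sum(x**k for k in range(n))

n = 50
# On [0, 0.5] the error x**n/(1 - x) is uniformly tiny.
err_r = max(abs(s_n(0.5 * i / 100, n) - 1 / (1 - 0.5 * i / 100))
            for i in range(101))
# At z_n = n/(n+1) the error is z_n**n/(1 - z_n) >= 1 by (6.8).
z = n / (n + 1)
err_at_z = abs(s_n(z, n) - 1 / (1 - z))
```

So no single n works for all x in (0, 1): the required n blows up as x approaches 1.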
Example 6.2 (a) A series of the form
Σ_{n=0}^∞ an cos(nx) + Σ_{n=1}^∞ bn sin(nx),  an, bn, x ∈ R,    (6.9)
is called a Fourier series (see Section 6.3 below). If both Σ_{n=0}^∞ | an | and Σ_{n=1}^∞ | bn | converge, then the series (6.9) converges uniformly on R to a function F(x).
Indeed, since | an cos(nx) | ≤ | an | and | bn sin(nx) | ≤ | bn |, by Theorem 6.2 the series (6.9) converges uniformly on R.
(b) Let f : R → R be the sum of the Fourier series
f(x) = Σ_{n=1}^∞ sin(nx)/n.    (6.10)
Note that (a) does not apply since Σ_n | bn | = Σ_n 1/n diverges.
If f(x) exists, so does f(x + 2π) = f(x), and f(0) = 0. We will show that the series converges uniformly on [δ, 2π − δ] for every δ > 0. For, put
sn(x) = Σ_{k=1}^n sin(kx) = Im( Σ_{k=1}^n e^{ikx} ).
If δ ≤ x ≤ 2π − δ we have
| sn(x) | ≤ | Σ_{k=1}^n e^{ikx} | = | (e^{i(n+1)x} − e^{ix})/(e^{ix} − 1) | ≤ 2/| e^{ix/2} − e^{−ix/2} | = 1/sin(x/2) ≤ 1/sin(δ/2).
Note that | Im z | ≤ | z | and | e^{ix} | = 1. Since sin(x/2) ≥ sin(δ/2) for δ/2 ≤ x/2 ≤ π − δ/2, we have for 0 < m < n, using summation by parts,
| Σ_{k=m}^n sin(kx)/k | = | Σ_{k=m}^n (sk(x) − s_{k−1}(x))/k |
= | sn(x)/(n+1) − s_{m−1}(x)/m + Σ_{k=m}^n sk(x) (1/k − 1/(k+1)) |
≤ (1/sin(δ/2)) ( 1/(n+1) + 1/m + Σ_{k=m}^n (1/k − 1/(k+1)) )
= (1/sin(δ/2)) ( 1/(n+1) + 1/m + 1/m − 1/(n+1) ) = 2/(m sin(δ/2)).
The right side becomes arbitrarily small as m → ∞. Using Proposition 6.1 (b), uniform convergence of (6.10) on [δ, 2π − δ] follows.
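The kernel bound used above, |Σ_{k=1}^n sin kx| ≤ 1/sin(x/2) on [δ, 2π − δ], can be spot-checked on a grid (our sketch, not part of the notes):

```python
import math

# Partial sums s_n(x) = sum_{k=1}^n sin(k x).
def s_n(x, n):
    return sum(math.sin(k * x) for k in range(1, n + 1))

delta = 0.3
ok = True
for n in (5, 50, 500):
    for j in range(1, 200):
        x = delta + j * (2 * math.pi - 2 * delta) / 200
        # the bound 1/sin(x/2) must dominate |s_n(x)| for every n
        if abs(s_n(x, n)) > 1 / math.sin(x / 2) + 1e-9:
            ok = False
```

The bound is independent of n, which is what makes the summation-by-parts estimate uniform.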
6.2.2 Uniform Convergence and Continuity
Theorem 6.5 Let E ⊂ R be a subset and fn : E → R, n ∈ N, be a sequence of continuous
functions on E uniformly converging to some function f : E → R.
Then f is continuous on E.
Proof. Let a ∈ E and ε > 0 be given. Since fn ⇉ f there is an r ∈ N such that
| fr(x) − f(x) | ≤ ε/3  for all x ∈ E.
Since fr is continuous at a, there exists δ > 0 such that | x − a | < δ implies
| fr(x) − fr(a) | ≤ ε/3.
Hence | x − a | < δ implies
| f(x) − f(a) | ≤ | f(x) − fr(x) | + | fr(x) − fr(a) | + | fr(a) − f(a) | ≤ ε/3 + ε/3 + ε/3 = ε.
This proves the assertion.
The same is true for functions f : E → C where E ⊂ C.
Example 6.3 (Example 6.2 continued) (a) A power series defines a continuous function on the disc of convergence {z | | z | < R}.
Indeed, let | z0 | < R. Then there exists r ∈ R such that | z0 | < r < R. By Proposition 6.4, Σ_n an xⁿ converges uniformly on {x | | x | ≤ r}. By Theorem 6.5, the function defined by the sum of the power series is continuous on [−r, r].
(b) The sum of the Fourier series (6.9) is a continuous function on R if both Σ_n | an | and Σ_n | bn | converge.
(c) The sum of the Fourier series f(x) = Σ_n sin(nx)/n is continuous on [δ, 2π − δ] for all δ with 0 < δ < π by the above theorem. Clearly, f is 2π-periodic since all partial sums are.
[Figure: graph of the 2π-periodic function f on [0, 2π] and beyond, with values between −π/2 and π/2.]
Later (see the section on Fourier series) we will show that
f(x) = 0 if x = 0,  and  f(x) = (π − x)/2 if x ∈ (0, 2π).
Since f is discontinuous at x0 = 2πn, the Fourier series does not converge uniformly on R.
Also, Example 6.1 (b) shows that the continuity of the fn(x) = xⁿ alone is not sufficient for the continuity of the limit function. On the other hand, the sequence of continuous functions (xⁿ) on (0, 1) converges to the continuous function 0. However, the convergence is not uniform. (Prove!)
6.2.3 Uniform Convergence and Integration
Example 6.4 Let fn(x) = 2n²x e^{−n²x²}; clearly lim_{n→∞} fn(x) = 0 for all x ∈ R. Further,
∫_0^1 fn(x) dx = −e^{−n²x²} |_0^1 = 1 − e^{−n²} → 1.
On the other hand, ∫_0^1 lim_{n→∞} fn(x) dx = ∫_0^1 0 dx = 0. Thus, lim_{n→∞} and integration cannot be interchanged. The reason: (fn) converges pointwise to 0 but not uniformly. Indeed,
fn(1/n) = 2n²·(1/n)·e^{−1} = 2n/e → +∞ as n → ∞.
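Since the antiderivative is known in closed form, the computation can be replayed exactly (sketch, names ours): the integrals tend to 1 although the pointwise limit is 0, and a midpoint Riemann sum cross-checks the closed form for moderate n.

```python
import math

# Exact integral of f_n(x) = 2 n^2 x exp(-n^2 x^2) over [0, 1].
def exact(n):
    return 1 - math.exp(-n**2)

# Midpoint-rule cross-check.
def riemann(n, steps=20000):
    h = 1.0 / steps
    return sum(2 * n**2 * ((i + 0.5) * h) * math.exp(-(n * (i + 0.5) * h) ** 2) * h
               for i in range(steps))

vals = [exact(n) for n in (1, 2, 5, 10)]   # increasing towards 1
```

The spike of fn near x = 1/n carries unit mass to the integral while escaping every pointwise limit.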
Theorem 6.6 Let α be an increasing function on [a, b]. Suppose fn ∈ R(α) on [a, b] for all n ∈ N and suppose fn → f uniformly on [a, b]. Then f ∈ R(α) on [a, b] and
∫_a^b f dα = lim_{n→∞} ∫_a^b fn dα.    (6.11)
Proof. Put
εn = sup_{x∈[a,b]} | fn(x) − f(x) |.
Then
fn − εn ≤ f ≤ fn + εn,
so that the lower and upper integrals L∫ and U∫ of f satisfy
∫_a^b (fn − εn) dα ≤ L∫_a^b f dα ≤ U∫_a^b f dα ≤ ∫_a^b (fn + εn) dα.    (6.12)
Hence,
0 ≤ U∫_a^b f dα − L∫_a^b f dα ≤ 2εn (α(b) − α(a)).
Since εn → 0 as n → ∞ (Remark 6.1), the upper and the lower integrals of f are equal. Thus f ∈ R(α). Another application of (6.12) yields
∫_a^b (fn − εn) dα ≤ ∫_a^b f dα ≤ ∫_a^b (fn + εn) dα,
| ∫_a^b fn dα − ∫_a^b f dα | ≤ εn (α(b) − α(a)).
This implies (6.11).
Corollary 6.7 If fn ∈ R(α) on [a, b] and if the series
f(x) = Σ_{n=1}^∞ fn(x),  a ≤ x ≤ b,
converges uniformly on [a, b], then
∫_a^b ( Σ_{n=1}^∞ fn ) dα = Σ_{n=1}^∞ ∫_a^b fn dα.
In other words, the series may be integrated term by term.
Corollary 6.8 Let fn : [a, b] → R be a sequence of continuous functions uniformly converging on [a, b] to f. Let x0 ∈ [a, b].
Then the sequence Fn(x) = ∫_{x0}^x fn(t) dt converges uniformly to F(x) = ∫_{x0}^x f(t) dt.
Proof. The pointwise convergence of Fn follows from the above theorem with α(t) = t and a and b replaced by x0 and x.
We show uniform convergence: let ε > 0. Since fn ⇉ f on [a, b], there exists n0 ∈ N such that n ≥ n0 implies | fn(t) − f(t) | ≤ ε/(b − a) for all t ∈ [a, b]. For n ≥ n0 and all x ∈ [a, b] we thus have
| Fn(x) − F(x) | = | ∫_{x0}^x (fn(t) − f(t)) dt | ≤ | ∫_{x0}^x | fn(t) − f(t) | dt | ≤ (ε/(b − a)) (b − a) = ε.
Hence, Fn ⇉ F on [a, b].
Example 6.5 (a) For every real t ∈ (−1, 1) we have
log(1 + t) = t − t²/2 + t³/3 ∓ ··· = Σ_{n=1}^∞ ((−1)^{n−1}/n) tⁿ.    (6.13)
Proof. In Homework 13.5 (a) the Taylor series
T(x) = Σ_{n=1}^∞ ((−1)^{n−1}/n) xⁿ
of log(1 + x) was computed, and it was shown that T(x) = log(1 + x) if x ∈ (0, 1).
By Proposition 6.4 the geometric series Σ_{n=0}^∞ (−1)ⁿ xⁿ converges uniformly to the function 1/(1 + x) on [−r, r] for all 0 < r < 1. By Corollary 6.7 we have for all t ∈ [−r, r]
log(1 + t) = log(1 + x) |_0^t = ∫_0^t dx/(1 + x) = ∫_0^t Σ_{n=0}^∞ (−1)ⁿ xⁿ dx
= Σ_{n=0}^∞ ∫_0^t (−1)ⁿ xⁿ dx = Σ_{n=0}^∞ ((−1)ⁿ/(n + 1)) t^{n+1} = Σ_{n=1}^∞ ((−1)^{n−1}/n) tⁿ.
(b) For | t | < 1 we have
arctan t = t − t³/3 + t⁵/5 ∓ ··· = Σ_{n=0}^∞ (−1)ⁿ t^{2n+1}/(2n + 1).    (6.14)
As in the previous example we use the uniform convergence of the geometric series on [−r, r] for every 0 < r < 1, which allows us to exchange integration and summation:
arctan t = ∫_0^t dx/(1 + x²) = ∫_0^t Σ_{n=0}^∞ (−1)ⁿ x^{2n} dx = Σ_{n=0}^∞ (−1)ⁿ ∫_0^t x^{2n} dx = Σ_{n=0}^∞ ((−1)ⁿ/(2n + 1)) t^{2n+1}.
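Both expansions are easy to verify numerically inside the interval of convergence (sketch, names ours): partial sums of (6.13) and (6.14) at t = 0.5 agree with the built-in logarithm and arctangent to machine precision.

```python
import math

# Partial sums of (6.13) and (6.14) at t = 0.5.
t = 0.5
log_series = sum((-1) ** (n - 1) / n * t**n for n in range(1, 200))
atan_series = sum((-1) ** n * t ** (2 * n + 1) / (2 * n + 1) for n in range(200))

err_log = abs(log_series - math.log(1 + t))
err_atan = abs(atan_series - math.atan(t))
```

For |t| < 1 the terms decay geometrically, so 200 terms are far more than enough here.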
Note that you are, in general, not allowed to insert t = 1 into the equations (6.13) and (6.14). However, the following proposition (the proof is in the appendix to this chapter) fills this gap.
Proposition 6.9 (Abel's Limit Theorem) Let Σ_{n=0}^∞ an be a convergent series of real numbers. Then the power series
f(x) = Σ_{n=0}^∞ an xⁿ
converges for x ∈ [0, 1] and is continuous on [0, 1].
As a consequence of the above proposition we have
log 2 = 1 − 1/2 + 1/3 − 1/4 ± ··· = Σ_{n=1}^∞ (−1)^{n−1}/n,
π/4 = 1 − 1/3 + 1/5 − 1/7 ± ··· = Σ_{n=0}^∞ (−1)ⁿ/(2n + 1).
Example 6.6 We have fn(x) = (1/n) e^{−x/n} ⇉ f(x) ≡ 0 on [0, +∞). Indeed, | fn(x) − 0 | ≤ 1/n < ε if n > 1/ε, for all x ∈ R₊. However,
∫_0^∞ fn(t) dt = −e^{−t/n} |_0^∞ = lim_{t→+∞} (1 − e^{−t/n}) = 1.
Hence
lim_{n→∞} ∫_0^∞ fn(t) dt = 1 ≠ 0 = ∫_0^∞ f(t) dt.
That is, Theorem 6.6 fails in case of improper integrals.
6.2.4 Uniform Convergence and Differentiation
Example 6.7 Let
fn(x) = sin(nx)/√n,  x ∈ R, n ∈ N,    (6.15)
and f(x) = lim_{n→∞} fn(x) = 0. Then f′(x) = 0, and
fn′(x) = √n cos(nx),
so that (fn′) does not converge to f′. For instance, fn′(0) = √n → +∞ as n → ∞, whereas f′(0) = 0. Note that (fn) converges uniformly to 0 on R since | sin(nx)/√n | ≤ 1/√n becomes small, independently of x ∈ R.
Consequently, uniform convergence of (fn) implies nothing about the sequence (fn′). Thus, stronger hypotheses are required for the assertion that fn → f implies fn′ → f′.
Theorem 6.10 Suppose (fn) is a sequence of continuously differentiable functions on [a, b] pointwise converging to some function f. Suppose further that (fn′) converges uniformly on [a, b].
Then fn converges uniformly to f on [a, b], f is continuously differentiable on [a, b], and
f′(x) = lim_{n→∞} fn′(x),  a ≤ x ≤ b.    (6.16)
Proof. Put g(x) = lim_{n→∞} fn′(x); then g is continuous by Theorem 6.5. By the Fundamental Theorem of Calculus, Theorem 5.14,
fn(x) = fn(a) + ∫_a^x fn′(t) dt.
By assumption on (fn′) and by Corollary 6.8, the sequence ( ∫_a^x fn′(t) dt )_{n∈N} converges uniformly on [a, b] to ∫_a^x g(t) dt. Taking the limit n → ∞ in the above equation, we thus obtain
f(x) = f(a) + ∫_a^x g(t) dt.
Since g is continuous, the right-hand side defines a differentiable function, namely the antiderivative of g, by the FTC. Hence f′(x) = g(x); since g is continuous, the proof is now complete.
For a more general result (without the additional assumption of continuity of fn′ ) see [Rud76,
7.17 Theorem].
Corollary 6.11 Let f(x) = Σ_{n=0}^∞ an xⁿ be a power series with radius of convergence R.
(a) Then f is differentiable on (−R, R) and we have
f′(x) = Σ_{n=1}^∞ n an x^{n−1},  x ∈ (−R, R).    (6.17)
(b) The function f is infinitely often differentiable on (−R, R) and we have
f^{(k)}(x) = Σ_{n=k}^∞ n(n−1)···(n−k+1) an x^{n−k},    (6.18)
an = (1/n!) f^{(n)}(0),  n ∈ N0.    (6.19)
In particular, f coincides with its Taylor series.
Proof. (a) By Remark 6.2 (a), the power series Σ_{n=0}^∞ (an xⁿ)′ has the same radius of convergence and converges uniformly on every closed subinterval [−r, r] of (−R, R). By Theorem 6.10, f(x) is differentiable and differentiation and summation can be interchanged.
(b) Iterated application of (a) yields that f^{(k−1)} is differentiable on (−R, R) with (6.18). In particular, inserting x = 0 into (6.18) we find
f^{(k)}(0) = k! ak,  hence  ak = f^{(k)}(0)/k!.
These are exactly the Taylor coefficients of f at a = 0. Hence, f coincides with its Taylor series.
Example 6.8 For x ∈ (−1, 1) we have
Σ_{n=1}^∞ n xⁿ = x/(1 − x)².
Since the geometric series f(x) = Σ_{n=0}^∞ xⁿ equals 1/(1 − x) on (−1, 1), by Corollary 6.11 we have
1/(1 − x)² = d/dx ( 1/(1 − x) ) = d/dx ( Σ_{n=0}^∞ xⁿ ) = Σ_{n=1}^∞ d/dx (xⁿ) = Σ_{n=1}^∞ n x^{n−1}.
Multiplying the preceding equation by x gives the result.
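The resulting identity is easy to check numerically (sketch, names ours): a long partial sum of Σ n xⁿ matches x/(1 − x)² inside the interval of convergence.

```python
# Check sum_{n>=1} n x**n = x/(1 - x)**2 at x = 0.3 with a long partial sum.
x = 0.3
lhs = sum(n * x**n for n in range(1, 400))
rhs = x / (1 - x) ** 2
err = abs(lhs - rhs)
```

The terms n·0.3ⁿ decay geometrically, so 400 terms give machine-precision agreement.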
6.3 Fourier Series
In this section we consider basic notions and results of the theory of Fourier series. The question
is to write a periodic function as a series of cos kx and sin kx, k ∈ N. In contrast to Taylor
expansions, the periodic function need not be infinitely often differentiable. Two Fourier
series may have the same behavior in one interval, but may behave in different ways in some
other interval. We have here a very striking contrast between Fourier series and power series.
In this section a periodic function is meant to be a 2π-periodic complex valued function on R,
that is f : R → C satisfies f (x + 2π) = f (x) for all x ∈ R. Special periodic functions are the
trigonometric polynomials.
Definition 6.3 A function f : R → R is called a trigonometric polynomial if there are real numbers ak, bk, k = 0, ..., n, with
f(x) = a0/2 + Σ_{k=1}^n ( ak cos kx + bk sin kx ).    (6.20)
The coefficients ak and bk are uniquely determined by f since
ak = (1/π) ∫_0^{2π} f(x) cos kx dx,  k = 0, 1, ..., n,    (6.21)
bk = (1/π) ∫_0^{2π} f(x) sin kx dx,  k = 1, ..., n.
This is immediate from
∫_0^{2π} cos kx sin mx dx = 0,
∫_0^{2π} cos kx cos mx dx = π δ_{km},  k, m ∈ N,    (6.22)
∫_0^{2π} sin kx sin mx dx = π δ_{km},
where δ_{km} = 1 if k = m and δ_{km} = 0 if k ≠ m is the so-called Kronecker symbol, see Homework 19.2. For example, if m ≥ 1 we have
(1/π) ∫_0^{2π} f(x) cos mx dx = (1/π) ∫_0^{2π} ( a0/2 + Σ_{k=1}^n (ak cos kx + bk sin kx) ) cos mx dx
= (1/π) Σ_{k=1}^n ∫_0^{2π} ( ak cos kx cos mx + bk sin kx cos mx ) dx
= (1/π) Σ_{k=1}^n ak π δ_{km} = am.
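The orthogonality relations (6.22) can be checked discretely (sketch, names ours): for trigonometric polynomials of low degree, an equally spaced Riemann sum over [0, 2π) reproduces the integral exactly up to rounding.

```python
import math

# Discrete check of (6.22) with N equally spaced sample points.
N = 256
xs = [2 * math.pi * j / N for j in range(N)]

def integ(g):
    return sum(g(x) for x in xs) * 2 * math.pi / N

cc_diff = integ(lambda x: math.cos(2 * x) * math.cos(3 * x))   # should be 0
cc_same = integ(lambda x: math.cos(3 * x) ** 2)                # should be pi
cs = integ(lambda x: math.cos(2 * x) * math.sin(3 * x))        # should be 0
err = max(abs(cc_diff), abs(cc_same - math.pi), abs(cs))
```

The exactness of the equally spaced sum for low-degree trigonometric polynomials is the discrete analogue of (6.22).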
Sometimes it is useful to consider complex trigonometric polynomials. Using the formulas expressing cos x and sin x in terms of e^{ix} and e^{−ix}, we can write the above polynomial (6.20) as
f(x) = Σ_{k=−n}^n ck e^{ikx},    (6.23)
where c0 = a0/2 and
ck = (ak − i bk)/2,  c_{−k} = (ak + i bk)/2,  k ≥ 1.
To obtain the coefficients ck using integration we need the notion of an integral of a complex-valued function, see Section 5.5. If m ≠ 0 we have
∫_a^b e^{imx} dx = (1/(im)) e^{imx} |_a^b.
If a = 0 and b = 2π and m ∈ Z we obtain
∫_0^{2π} e^{imx} dx = 0 if m ∈ Z \ {0},  and  ∫_0^{2π} e^{imx} dx = 2π if m = 0.
We conclude
ck = (1/(2π)) ∫_0^{2π} f(x) e^{−ikx} dx,  k = 0, ±1, ..., ±n.    (6.24)
Definition 6.4 Let f : R → C be a periodic function with f ∈ R on [0, 2π]. We call
ck = (1/(2π)) ∫_0^{2π} f(x) e^{−ikx} dx,  k ∈ Z,    (6.25)
the Fourier coefficients of f, and the series
Σ_{k=−∞}^∞ ck e^{ikx},    (6.26)
i. e. the sequence of partial sums
sn = Σ_{k=−n}^n ck e^{ikx},  n ∈ N,
the Fourier series of f.
The Fourier series can also be written as
a0/2 + Σ_{k=1}^∞ ( ak cos kx + bk sin kx ),    (6.27)
where ak and bk are given by (6.21). One can ask whether the Fourier series of a function
converges to the function itself. It is easy to see: if the function f is the uniform limit of a series of trigonometric polynomials,
f(x) = Σ_{k=−∞}^∞ γk e^{ikx},    (6.28)
then f coincides with its Fourier series. Indeed, since the series (6.28) converges uniformly, by Theorem 6.6 we can change the order of summation and integration and obtain
ck = (1/(2π)) ∫_0^{2π} ( Σ_{m=−∞}^∞ γm e^{imx} ) e^{−ikx} dx = (1/(2π)) Σ_{m=−∞}^∞ ∫_0^{2π} γm e^{i(m−k)x} dx = γk.
In general, the Fourier series of f neither converges uniformly nor pointwise to f. For Fourier series, convergence with respect to the L²-norm
‖f‖₂ = ( (1/(2π)) ∫_0^{2π} | f |² dx )^{1/2}    (6.29)
is the appropriate notion.
6.3.1 An Inner Product on the Periodic Functions
Let V be the linear space of periodic functions f : R → C with f ∈ R on [0, 2π]. We introduce an inner product on V by
f·g = (1/(2π)) ∫_0^{2π} f(x) \overline{g(x)} dx,  f, g ∈ V.
One easily checks the following properties for f, g, h ∈ V, λ, µ ∈ C:
(f + g)·h = f·h + g·h,
f·(g + h) = f·g + f·h,
(λf)·(µg) = λ \overline{µ} (f·g),
\overline{f·g} = g·f.
For every f ∈ V we have f·f = (1/(2π)) ∫_0^{2π} | f |² dx ≥ 0. However, f·f = 0 does not imply f = 0 (you can change f at finitely many points without any impact on f·f). If f ∈ V is continuous, then f·f = 0 implies f = 0, see Homework 14.3. Put ‖f‖₂ = √(f·f).
Note that in the physical literature the inner product in L²(X) is often linear in the second component and antilinear in the first component. Defining for k ∈ Z the periodic function ek : R → C by ek(x) = e^{ikx}, the Fourier coefficients of f ∈ V take the form
ck = f·ek,  k ∈ Z.
From (6.24) it follows that the functions ek, k ∈ Z, satisfy
ek·el = δ_{kl}.    (6.30)
Any such subset {ek | k ∈ Z} of an inner product space V satisfying (6.30) is called an orthonormal system (ONS). Using ek(x) = cos kx + i sin kx, the real orthogonality relations (6.22) immediately follow from (6.30).
The next lemma shows that the Fourier series of f is the best L²-approximation of a periodic function f ∈ V by trigonometric polynomials.
Lemma 6.12 (Least Square Approximation) Suppose f ∈ V has the Fourier coefficients ck, k ∈ Z, and let γk ∈ C be arbitrary. Then
‖ f − Σ_{k=−n}^n ck ek ‖₂² ≤ ‖ f − Σ_{k=−n}^n γk ek ‖₂²,    (6.31)
and equality holds if and only if ck = γk for all k. Further,
‖ f − Σ_{k=−n}^n ck ek ‖₂² = ‖f‖₂² − Σ_{k=−n}^n | ck |².    (6.32)
Proof. Let Σ always denote Σ_{k=−n}^n. Put gn = Σ γk ek. Then
f·gn = f·( Σ γk ek ) = Σ \overline{γk} (f·ek) = Σ ck \overline{γk}
and gn·ek = γk, so that
gn·gn = Σ | γk |².
Noting that | a − b |² = (a − b)\overline{(a − b)} = | a |² + | b |² − a\overline{b} − \overline{a}b, it follows that
‖f − gn‖₂² = (f − gn)·(f − gn) = f·f − f·gn − gn·f + gn·gn
= ‖f‖₂² − Σ ck \overline{γk} − Σ \overline{ck} γk + Σ | γk |²
= ‖f‖₂² − Σ | ck |² + Σ | γk − ck |²,    (6.33)
which is evidently minimized if and only if γk = ck. Inserting this into (6.33), equation (6.32) follows.
Corollary 6.13 (Bessel's Inequality) Under the assumptions of the above lemma we have
Σ_{k=−∞}^∞ | ck |² ≤ ‖f‖₂².    (6.34)
Proof. By equation (6.32), for every n ∈ N we have
Σ_{k=−n}^n | ck |² ≤ ‖f‖₂².
Taking the limit n → ∞ (or sup_{n∈N}) shows the assertion.
An ONS {ek | k ∈ Z} is said to be complete if, instead of Bessel's inequality, equality holds for all f ∈ V.
Definition 6.5 Let fn, f ∈ V. We say that (fn) converges to f in L² (denoted by fn → f in ‖·‖₂) if
lim_{n→∞} ‖fn − f‖₂ = 0.
Explicitly,
∫_0^{2π} | fn(x) − f(x) |² dx → 0 as n → ∞.
Remarks 6.3 (a) Note that the L²-limit in V is not unique; changing f(x) at finitely many points of [0, 2π] does not change the integral ∫_0^{2π} | f − fn |² dx.
(b) If fn ⇉ f on R then fn → f in ‖·‖₂. Indeed, let ε > 0. Then there exists n0 ∈ N such that n ≥ n0 implies sup_{x∈R} | fn(x) − f(x) | ≤ ε. Hence
∫_0^{2π} | fn − f |² dx ≤ ∫_0^{2π} ε² dx = 2πε².
This shows fn − f → 0 in ‖·‖₂.
(c) The above lemma, in particular (6.32), shows that the Fourier series converges in L² to f if and only if
‖f‖₂² = Σ_{k=−∞}^∞ | ck |².    (6.35)
This is called Parseval's Completeness Relation. We will see that it holds for all f ∈ V.
Let us write
f(x) ∼ Σ_{k=−∞}^∞ ck e^{ikx}
to express the fact that (ck) are the (complex) Fourier coefficients of f. Further,
sn(f) = sn(f; x) = Σ_{k=−n}^n ck e^{ikx}    (6.36)
denotes the nth partial sum.
Theorem 6.14 (Parseval's Completeness Theorem) The ONS {ek | k ∈ Z} is complete. More precisely, if f, g ∈ V with
f ∼ Σ_{k=−∞}^∞ ck ek,  g ∼ Σ_{k=−∞}^∞ γk ek,
then
(i) lim_{n→∞} (1/(2π)) ∫_0^{2π} | f − sn(f) |² dx = 0,    (6.37)
(ii) (1/(2π)) ∫_0^{2π} f \overline{g} dx = Σ_{k=−∞}^∞ ck \overline{γk},    (6.38)
(iii) (1/(2π)) ∫_0^{2π} | f |² dx = Σ_{k=−∞}^∞ | ck |² = a0²/4 + (1/2) Σ_{k=1}^∞ (ak² + bk²)  (Parseval's formula).    (6.39)
The proof is in Rudin's book, [Rud76, 8.16, p. 191]. It uses the Stone–Weierstraß theorem about the uniform approximation of a continuous function by polynomials. An elementary proof is in Forster's book [For01, §23].
Example 6.9 (a) Consider the periodic function f ∈ V given by
f(x) = 1 if 0 ≤ x < π,  and  f(x) = −1 if π ≤ x < 2π.
Since f is an odd function the coefficients ak vanish. We compute the Fourier coefficients bk:
bk = (2/π) ∫_0^π sin kx dx = −(2/(kπ)) cos kx |_0^π = (2/(kπ)) ((−1)^{k+1} + 1),
that is, bk = 0 if k is even and bk = 4/(kπ) if k is odd.
The Fourier series of f reads
f ∼ (4/π) Σ_{n=0}^∞ sin((2n+1)x)/(2n+1).
Noting that
‖f‖₂² = (1/(2π)) ∫_0^{2π} dx = 1,
Parseval's formula
Σ | ck |² = a0²/4 + (1/2) Σ_{n∈N} (an² + bn²)
gives
1 = (1/2) Σ_{n∈N} bn² = (8/π²) Σ_{n=0}^∞ 1/(2n+1)² =: (8/π²) s1,  hence  s1 = π²/8.
Now we can compute s = Σ_{n=1}^∞ 1/n². Since this series converges absolutely, we are allowed to rearrange the elements in such a way that we first add all the odd terms, which gives s1, and then all the even terms, which gives s0. Using s1 = π²/8 we find
s = s1 + s0 = s1 + 1/2² + 1/4² + 1/6² + ···,
s0 = (1/2²) (1 + 1/2² + 1/3² + ···) = s/4,
so s = s1 + s/4 and therefore
s = (4/3) s1 = π²/6.
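The two values obtained from Parseval's formula can be confirmed by brute-force partial sums (sketch, names ours):

```python
import math

# Partial sums for s1 = pi**2/8 and s = pi**2/6 from Example 6.9 (a).
s1 = sum(1 / (2 * n + 1) ** 2 for n in range(200000))
s = sum(1 / n**2 for n in range(1, 200000))

err1 = abs(s1 - math.pi**2 / 8)
err = abs(s - math.pi**2 / 6)
```

The tails are of order 1/N, so 200000 terms give about five correct decimals.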
(b) Fix a ∈ [0, 2π] and consider f ∈ V with
f(x) = 1 if 0 ≤ x ≤ a,  and  f(x) = 0 if a < x < 2π.
The Fourier coefficients of f are c0 = (1/(2π)) ∫_0^a dx = a/(2π) and
ck = f·ek = (1/(2π)) ∫_0^a e^{−ikx} dx = (i/(2πk)) (e^{−ika} − 1),  k ≠ 0.
If k ≠ 0,
| ck |² = (1/(4π²k²)) (1 − e^{ika})(1 − e^{−ika}) = (1 − cos ka)/(2π²k²),
hence Parseval's formula gives
Σ_{k=−∞}^∞ | ck |² = a²/(4π²) + Σ_{k=1}^∞ (1 − cos ak)/(π²k²)
= a²/(4π²) + (1/π²) Σ_{k=1}^∞ 1/k² − (1/π²) Σ_{k=1}^∞ cos(ak)/k²
= a²/(4π²) + (1/π²) ( s − Σ_{k=1}^∞ cos(ak)/k² ),
where s = Σ 1/k². On the other hand,
‖f‖₂² = (1/(2π)) ∫_0^a dx = a/(2π).
Hence, (6.35) reads
a²/(4π²) + (1/π²) ( s − Σ_{k=1}^∞ cos(ka)/k² ) = a/(2π),
and therefore
Σ_{k=1}^∞ cos(ka)/k² = a²/4 − aπ/2 + π²/6 = (a − π)²/4 − π²/12.    (6.40)
Since the series
Σ_{k=1}^∞ cos(kx)/k²    (6.41)
converges uniformly on R (use Theorem 6.2; Σ_k 1/k² is an upper bound), (6.41) is the Fourier series of the function
(x − π)²/4 − π²/12,  x ∈ [0, 2π],
and the Fourier series converges uniformly on R to the above function. Since the term-by-term differentiated series converges uniformly on [δ, 2π − δ], see Example 6.2, we obtain
− Σ_{k=1}^∞ sin(kx)/k = ( Σ_{k=1}^∞ cos(kx)/k² )′ = ( (x − π)²/4 − π²/12 )′ = (x − π)/2,
which is true for x ∈ (0, 2π).
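The closed form (6.40) is easy to test at a sample point (sketch, names ours):

```python
import math

# Check (6.40): sum_{k>=1} cos(k a)/k**2 = (a - pi)**2/4 - pi**2/12.
a = 1.0
lhs = sum(math.cos(k * a) / k**2 for k in range(1, 50000))
rhs = (a - math.pi) ** 2 / 4 - math.pi**2 / 12
err = abs(lhs - rhs)
```

The tail of the series is dominated by the tail of Σ 1/k², so 50000 terms suffice for four decimals.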
We can also integrate the Fourier series and obtain by Corollary 6.7
∫_0^x Σ_{k=1}^∞ (cos kt)/k² dt = Σ_{k=1}^∞ (1/k²) ∫_0^x cos kt dt = Σ_{k=1}^∞ (1/k³) sin kt |_0^x = Σ_{k=1}^∞ sin(kx)/k³.
On the other hand,
∫_0^x ( (t − π)²/4 − π²/12 ) dt = (x − π)³/12 − π²x/12 + π³/12.
By Homework 19.5,
f(x) = Σ_{k=1}^∞ sin(kx)/k³ = (x − π)³/12 − π²x/12 + π³/12
defines a continuously differentiable periodic function on R.
Theorem 6.15 Let f : R → R be a continuous periodic function which is piecewise continuously differentiable, i. e. there exists a partition {t0 , . . . , tr } of [0, 2π] such that f |[ti−1 , ti ]
is continuously differentiable.
Then the Fourier series of f converges uniformly to f .
Proof. Let ϕi : [t_{i−1}, t_i] → R denote the continuous derivative of f|[t_{i−1}, t_i] and ϕ : R → R the periodic function that coincides with ϕi on [t_{i−1}, t_i]. By Bessel's inequality, the Fourier coefficients γk of ϕ satisfy
Σ_{k=−∞}^∞ | γk |² ≤ ‖ϕ‖₂² < ∞.
If k ≠ 0, the Fourier coefficients ck of f can be computed from the Fourier coefficients γk of ϕ using integration by parts:
∫_{t_{i−1}}^{t_i} f(x) e^{−ikx} dx = (i/k) ( f(x) e^{−ikx} |_{t_{i−1}}^{t_i} − ∫_{t_{i−1}}^{t_i} ϕ(x) e^{−ikx} dx ).
Hence, summation over i = 1, ..., r yields
ck = (1/(2π)) ∫_0^{2π} f(x) e^{−ikx} dx = (1/(2π)) Σ_{i=1}^r ∫_{t_{i−1}}^{t_i} f(x) e^{−ikx} dx = (−i/(2πk)) ∫_0^{2π} ϕ(x) e^{−ikx} dx = −i γk/k.
Note that the term
Σ_{i=1}^r f(x) e^{−ikx} |_{t_{i−1}}^{t_i}
vanishes since f is continuous and f(2π) = f(0). Since for α, β ∈ C we have | αβ | ≤ (1/2)(| α |² + | β |²), we obtain
| ck | ≤ (1/2) ( 1/| k |² + | γk |² ).
Since both Σ_{k=1}^∞ 1/k² and Σ_{k=−∞}^∞ | γk |² converge,
Σ_{k=−∞}^∞ | ck | < ∞.
Thus, the Fourier series converges uniformly to a continuous function g (see Theorem 6.5). Since the Fourier series converges both to f and to g in the L² norm, ‖f − g‖₂ = 0. Since both f and g are continuous, they coincide. This completes the proof.
Note that for any f ∈ V the series Σ_{k∈Z} | ck |² converges, while the series Σ_{k∈Z} | ck | converges only if the Fourier series converges uniformly to f.
6.4 Basic Topology
In the study of functions of several variables we need some topological notions like neighborhood, open set, closed set, and compactness.
6.4.1 Finite, Countable, and Uncountable Sets
Definition 6.6 If there exists a 1-1 mapping of the set A onto the set B (a bijection), we say that A and B have the same cardinality or that A and B are equivalent; we write A ∼ B.
Definition 6.7 For any nonnegative integer n ∈ N0 let Nn be the set {1, 2, . . . , n}. For any set
A we say that:
(a) A is finite if A ∼ Nn for some n. The empty set ∅ is also considered to be
finite.
(b) A is infinite if A is not finite.
(c) A is countable if A ∼ N.
(d) A is uncountable if A is neither finite nor countable.
(e) A is at most countable if A is finite or countable.
For finite sets A and B we evidently have A ∼ B if A and B have the same number of elements.
For infinite sets, however, the idea of “having the same number of elements” becomes quite
vague, whereas the notion of 1-1 correspondence retains its clarity.
Example 6.10 (a) Z is countable. Indeed, the arrangement
0, 1, −1, 2, −2, 3, −3, . . .
gives a bijection between N and Z. Thus an infinite set (here Z) can be equivalent to one of its proper subsets (here N).
(b) Countable sets represent the “smallest” infinite cardinality: no uncountable set can be a subset of a countable set. Any countable set can be arranged in a sequence. In particular, Q is countable, see Example 2.6 (c).
(c) The countable union of countable sets is a countable set; this is Cantor's First Diagonal Process: write the elements of the nth set as the nth row x_{n1}, x_{n2}, x_{n3}, ... of an infinite array and enumerate the array along its diagonals,
x_{11}, x_{12}, x_{21}, x_{31}, x_{22}, x_{13}, x_{14}, x_{23}, x_{32}, x_{41}, ....
(d) Let A = {(xn ) | xn ∈ {0, 1} ∀ n ∈ N} be the set of all sequences whose elements are 0
and 1. This set A is uncountable. In particular, R is uncountable.
Proof. Suppose to the contrary that A is countable and arrange the elements of A in a sequence (sn)_{n∈N} of distinct elements of A. We construct a sequence s as follows: if the nth element of sn is 1, we let the nth digit of s be 0, and vice versa. Then the sequence s differs from every member s1, s2, ... in at least one place; hence s ∉ A, a contradiction since s is indeed an element of A. This proves that A is uncountable.
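The diagonal enumeration from (c) can be sketched in code (names ours): walking N × N along the diagonals i + j = d reaches every pair exactly once, which is the heart of the countability argument.

```python
# Cantor's first diagonal process: enumerate N x N along the
# diagonals i + j = d, d = 0, 1, 2, ...
def diagonal_enumeration(count):
    out, d = [], 0
    while len(out) < count:
        for i in range(d + 1):
            out.append((i, d - i))
            if len(out) == count:
                break
        d += 1
    return out

pairs = diagonal_enumeration(1000)   # first 1000 pairs, no repetitions
```

Interpreting the pair (i, j) as "the jth element of the ith countable set" gives the countable union statement.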
6.4.2 Metric Spaces and Normed Spaces
Definition 6.8 A set X is said to be a metric space if for any two points x, y ∈ X there is
associated a real number d(x, y), called the distance of x and y such that
(a) d(x, x) = 0 and d(x, y) > 0 for all x, y ∈ X with x 6= y (positive definiteness);
(b) d(x, y) = d(y, x) (symmetry);
(c) d(x, y) ≤ d(x, z) + d(z, y) for any z ∈ X (triangle inequality).
Any function d : X × X → R with these three properties is called a distance function or metric
on X.
Example 6.11 (a) C, R, Q, and Z are metric spaces with d(x, y) := | y − x |. Any subset of a metric space is again a metric space.
(b) The real plane R² is a metric space with respect to
d₂((x1, x2), (y1, y2)) := √((x1 − y1)² + (x2 − y2)²),
d₁((x1, x2), (y1, y2)) := | x1 − y1 | + | x2 − y2 |.
d₂ is called the euclidean metric.
(c) Let X be a set. Define
d(x, y) := 1 if x ≠ y,  and  d(x, y) := 0 if x = y.
Then (X, d) becomes a metric space. It is called the discrete metric space.
Definition 6.9 Let E be a vector space over C (or R). Suppose on E there is given a function
‖·‖ : E → R which associates to each x ∈ E a real number ‖x‖ such that the following three
conditions are satisfied:
(i) ‖x‖ ≥ 0 for every x ∈ E, and ‖x‖ = 0 if and only if x = 0;
(ii) ‖λx‖ = |λ| ‖x‖ for all λ ∈ C (λ ∈ R, resp.);
(iii) ‖x + y‖ ≤ ‖x‖ + ‖y‖ for all x, y ∈ E.
Then E is called a normed (vector) space and ‖x‖ is the norm of x.
‖x‖ generalizes the “length” of a vector x ∈ E. Every normed vector space E is a metric space
if we put d(x, y) = ‖x − y‖. (Prove!)
However, there are metric spaces that are not normed spaces, for example (N, d(m, n) =
|n − m|).
Example 6.12 (a) E = R^k or E = C^k. Let x = (x1, …, xk) ∈ E and define

‖x‖_2 = ( ∑_{i=1}^k |xi|² )^{1/2}.

Then ‖·‖_2 is a norm on E. It is called the Euclidean norm.
There are other possibilities to define a norm on E. For example,

‖x‖_∞ = max_{1≤i≤k} |xi|,   ‖x‖_1 = ∑_{i=1}^k |xi|,
‖x‖_a = ‖x‖_2 + 3 ‖x‖_1,   ‖x‖_b = max(‖x‖_1, ‖x‖_2).
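A quick numerical illustration (not from the notes): the norms of Example 6.12 (a) for a sample vector, plus a random finite check of the triangle inequality for ‖·‖_2:

```python
# The Euclidean, 1-, and maximum norms on R^k, with a randomized
# finite check of the triangle inequality for the Euclidean norm.
import math
import random

def norm2(x):  # Euclidean norm
    return math.sqrt(sum(t * t for t in x))

def norm1(x):
    return sum(abs(t) for t in x)

def norm_inf(x):
    return max(abs(t) for t in x)

x = [3.0, -4.0]
assert norm2(x) == 5.0 and norm1(x) == 7.0 and norm_inf(x) == 4.0

random.seed(0)
for _ in range(100):
    u = [random.uniform(-1, 1) for _ in range(5)]
    v = [random.uniform(-1, 1) for _ in range(5)]
    s = [a + b for a, b in zip(u, v)]
    # triangle inequality, up to floating-point slack
    assert norm2(s) <= norm2(u) + norm2(v) + 1e-12
```

Random sampling of course only tests the inequality; the proof is the Cauchy–Schwarz argument of Section 1.3.3.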
(b) E = C([a, b]). Let p ≥ 1. Then

‖f‖_∞ = sup_{x∈[a,b]} |f(x)|,   ‖f‖_p = ( ∫_a^b |f(t)|^p dt )^{1/p}

define norms on E. Note that ‖f‖_p ≤ (b − a)^{1/p} ‖f‖_∞.
(c) Hilbert’s sequence space. E = ℓ² = {(xn) | ∑_{n=1}^∞ |xn|² < ∞}. Then

‖x‖_2 = ( ∑_{n=1}^∞ |xn|² )^{1/2}

defines a norm on ℓ².
(d) The bounded sequences. E = ℓ^∞ = {(xn) | sup_{n∈N} |xn| < ∞}. Then

‖x‖_∞ = sup_{n∈N} |xn|

defines a norm on E.
(e) E = C([a, b]). Then

‖f‖_1 = ∫_a^b |f(t)| dt

defines a norm on E.
6.4.3 Open and Closed Sets
Definition 6.10 Let X be a metric space with metric d. All points and subsets mentioned below
are understood to be elements and subsets of X, in particular, let E ⊂ X be a subset of X.
(a) The set Uε (x) = {y | d(x, y) < ε} with some ε > 0 is called the
ε-neighborhood (or ε-ball with center x) of x. The number ε is called the radius of
the neighborhood Uε (x).
(b) A point p is an interior or inner point of E if there is a neighborhood Uε (p)
completely contained in E. E is open if every point of E is an interior point.
(c) A point p is called an accumulation or limit point of E if every neighborhood of
p contains a point q ≠ p such that q ∈ E.
(d) E is said to be closed if every accumulation point of E is a point of E. The closure of E (denoted by Ē) is E together with all accumulation points of E. In other
words, p ∈ Ē if and only if every neighborhood of p has a non-empty intersection
with E.
(e) The complement of E (denoted by E^c) is the set of all points p ∈ X such that
p ∉ E.
(f) E is bounded if there exists a real number C > 0 such that d(x, y) ≤ C for all
x, y ∈ E.
(g) E is dense in X if E = X.
Example 6.13 (a) X = R with the standard metric d(x, y) = | x − y |. E = (a, b) ⊂ R is
an open set. Indeed, for every x ∈ (a, b) we have Uε (x) ⊂ (a, b) if ε is small enough, say
ε ≤ min{| x − a | , | x − b |}. Hence, x is an inner point of (a, b). Since x was arbitrary, (a, b)
is open.
F = [a, b) is not open since a is not an inner point of [a, b). Indeed, Uε(a) ⊄ [a, b) for every
ε > 0.
We have

Ē = F̄ = set of accumulation points = [a, b].
Indeed, a is an accumulation point of both (a, b) and [a, b). This is true since every neighborhood
Uε(a), ε < b − a, contains a + ε/2 ∈ (a, b) (resp. ∈ [a, b)), which is different from a. For any
point x ∉ [a, b] we find a neighborhood Uε(x) with Uε(x) ∩ [a, b) = ∅; hence x ∉ Ē.
The set of rational numbers Q is dense in R. Indeed, every neighborhood Uε (r) of every real
number r contains a rational number, see Proposition 1.11 (b).
For the real line one can prove: every open subset of R is an at most countable union of disjoint
open (finite or infinite) intervals. A similar description for closed subsets of R is false. There is
no similar description of open subsets of R^k, k ≥ 2.
(b) For every metric space X, both the whole space X and the empty set ∅ are open as well as
closed.
(c) Let B = {x ∈ R^k | ‖x‖_2 < 1} be the open unit ball in R^k. B is open (see Lemma 6.16
below); B is not closed. For example, x0 = (1, 0, …, 0) is an accumulation point of B since
xn = (1 − 1/n, 0, …, 0) is a sequence of elements of B converging to x0; however, x0 ∉ B.
The set of accumulation points of B is B̄ = {x ∈ R^k | ‖x‖_2 ≤ 1}. This is also the closure
of B in R^k.
(d) Consider E = C([a, b]) with the supremum norm. Then g ∈ E is in the ε-neighborhood of
a function f ∈ E if and only if

|f(t) − g(t)| < ε  for all t ∈ [a, b],

that is, if and only if the graph of g lies in the band between f − ε and f + ε.
Lemma 6.16 Every neighborhood Ur(p), r > 0, of a point p is an open set.
Proof. Let q ∈ Ur(p). Then there exists ε > 0 such that d(q, p) = r − ε. We will show that
Uε(q) ⊂ Ur(p). For, let x ∈ Uε(q). Then by the triangle inequality we have

d(x, p) ≤ d(x, q) + d(q, p) < ε + (r − ε) = r.

Hence x ∈ Ur(p) and q is an interior point of Ur(p). Since q was arbitrary, Ur(p) is open.
Remarks 6.4 (a) If p is an accumulation point of a set E, then every neighborhood of p contains
infinitely many points of E.
(b) A finite set has no accumulation points; hence any finite set is closed.
Example 6.14 (a) The open complex unit disc, {z ∈ C | | z | < 1}.
(b) The closed unit disc, {z ∈ C | | z | ≤ 1}.
(c) A finite set.
(d) The set Z of all integers.
(e) {1/n | n ∈ N}.
(f) The set C of all complex numbers.
(g) The interval (a, b).
6 Sequences of Functions and Basic Topology
182
Here (d), (e), and (g) are regarded as subsets of R. Some properties of these sets are tabulated
below:
        Closed   Open   Bounded
(a)     No       Yes    Yes
(b)     Yes      No     Yes
(c)     Yes      No     Yes
(d)     Yes      No     No
(e)     No       No     Yes
(f)     Yes      Yes    No
(g)     No       Yes    Yes
Proposition 6.17 A subset E ⊂ X of a metric space X is open if and only if its complement
E c is closed.
Proof. First, suppose E c is closed. Choose x ∈ E. Then x 6∈ E c , and x is not an accumulation
point of E c . Hence there exists a neighborhood U of x such that U ∩ E c is empty, that is
U ⊂ E. Thus x is an interior point of E and E is open.
Next, suppose that E is open. Let x be an accumulation point of E c . Then every neighborhood
of x contains a point of E c , so that x is not an interior point of E. Since E is open, this means
that x ∈ E c . It follows that E c is closed.
6.4.4 Limits and Continuity
In this section we generalize the notions of convergent sequences and continuous functions to
arbitrary metric spaces.
Definition 6.11 Let X be a metric space and (xn) a sequence of elements of X. We say that
(xn) converges to x ∈ X if lim_{n→∞} d(xn, x) = 0. We write lim_{n→∞} xn = x or xn → x.
In other words, lim_{n→∞} xn = x if for every neighborhood Uε(x), ε > 0, of x there exists an
n0 ∈ N such that n ≥ n0 implies xn ∈ Uε(x).
Note that a subset F of a metric space X is closed if and only if F contains all limits of
convergent sequences (xn ), xn ∈ F .
Two metrics d1 and d2 on a space X are said to be topologically equivalent if limn→∞ xn = x
w.r.t. d1 if and only if limn→∞ xn = x w.r.t. d2 . In particular, two norms k·k1 and k·k2 on the
same linear space E are said to be equivalent if the metric spaces are topologically equivalent.
Proposition 6.18 Let E1 = (E, ‖·‖_1) and E2 = (E, ‖·‖_2) be normed vector spaces such that
there exist positive numbers c1, c2 > 0 with

c1 ‖x‖_1 ≤ ‖x‖_2 ≤ c2 ‖x‖_1  for all x ∈ E.  (6.42)

Then ‖·‖_1 and ‖·‖_2 are equivalent.
Proof. Condition (6.42) is obviously symmetric with respect to E1 and E2 since ‖x‖_2/c2 ≤
‖x‖_1 ≤ ‖x‖_2/c1. Therefore, it is sufficient to show the following: if xn → x w.r.t. ‖·‖_1 then
xn → x w.r.t. ‖·‖_2. Indeed, by definition, lim_{n→∞} ‖xn − x‖_1 = 0. By assumption,

c1 ‖xn − x‖_1 ≤ ‖xn − x‖_2 ≤ c2 ‖xn − x‖_1,  n ∈ N.

Since the first and the last expressions tend to 0 as n → ∞, the sandwich theorem shows that
lim_{n→∞} ‖xn − x‖_2 = 0, too. This proves xn → x w.r.t. ‖·‖_2.
Example 6.15 Let E = R^k or E = C^k with the norm ‖x‖_p = (|x1|^p + ⋯ + |xk|^p)^{1/p},
p ∈ [1, ∞). All these norms are equivalent. Indeed,

‖x‖_∞^p ≤ ∑_{i=1}^k |xi|^p ≤ ∑_{i=1}^k ‖x‖_∞^p = k ‖x‖_∞^p
⟹ ‖x‖_∞ ≤ ‖x‖_p ≤ k^{1/p} ‖x‖_∞.  (6.43)
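Inequality (6.43) is easy to test numerically; the following sketch (not from the notes) checks it on random vectors for several values of p:

```python
# Random finite check of (6.43): ‖x‖_∞ ≤ ‖x‖_p ≤ k^(1/p) ‖x‖_∞
# for vectors x in R^k.
import random

def norm_p(x, p):
    return sum(abs(t) ** p for t in x) ** (1.0 / p)

random.seed(1)
k = 6
for p in (1, 2, 3, 7):
    for _ in range(50):
        x = [random.uniform(-10, 10) for _ in range(k)]
        ninf = max(abs(t) for t in x)
        np_ = norm_p(x, p)
        assert ninf <= np_ + 1e-9                    # left inequality
        assert np_ <= k ** (1.0 / p) * ninf + 1e-9   # right inequality
```

The small 1e-9 slack only absorbs floating-point rounding.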
The following Proposition is quite analogous to Proposition 2.33 with k = 2. Recall that a
complex sequence (zn ) converges if and only if both Re zn and Im zn converge.
Proposition 6.19 Let (xn) be a sequence of vectors of the euclidean space (R^k, ‖·‖_2),
xn = (x_{n1}, …, x_{nk}). Then (xn) converges to a = (a1, …, ak) ∈ R^k if and only if

lim_{n→∞} x_{ni} = ai,  i = 1, …, k.

Proof. Suppose that lim_{n→∞} xn = a. Given ε > 0 there is an n0 ∈ N such that n ≥ n0
implies ‖xn − a‖_2 < ε. Thus, for i = 1, …, k we have

|x_{ni} − ai| ≤ ‖xn − a‖_2 < ε;

hence lim_{n→∞} x_{ni} = ai.
Conversely, suppose that lim_{n→∞} x_{ni} = ai for i = 1, …, k. Given ε > 0 there are
n_{0i} ∈ N such that n ≥ n_{0i} implies

|x_{ni} − ai| < ε/√k.

For n ≥ max{n_{01}, …, n_{0k}} we have (see (6.43))

‖xn − a‖_2 ≤ √k ‖xn − a‖_∞ < ε;

hence lim_{n→∞} xn = a.
Corollary 6.20 Let B ⊂ R^k be a bounded subset and (xn) a sequence of elements of B.
Then (xn) has a converging subsequence.
Proof. Since B is bounded, all coordinates of the elements of B are bounded; hence there is a
subsequence (xn^(1)) of (xn) such that the first coordinate converges. Further, there is a
subsequence (xn^(2)) of (xn^(1)) such that the second coordinate converges. Finally, there is a
subsequence (xn^(k)) of (xn^(k−1)) such that all coordinates converge. By the above proposition
the subsequence (xn^(k)) converges in R^k.
The same statement is true for subsets B ⊂ C^k.
Definition 6.12 A mapping f : X → Y from the metric space X into the metric space Y is
said to be continuous at a ∈ X if one of the following equivalent conditions is satisfied.
(a) For every ε > 0 there exists δ > 0 such that for every x ∈ X

d(x, a) < δ  implies  d(f(x), f(a)) < ε.  (6.44)

(b) For any sequence (xn), xn ∈ X, with lim_{n→∞} xn = a it follows that lim_{n→∞} f(xn) = f(a).
The mapping f is said to be continuous on X if f is continuous at every point a of X.
Proposition 6.21 The composition of two continuous mappings is continuous.
The proof is completely the same as in the real case (see Proposition 3.4) and we omit it.
We give the topological description of continuous functions.
Proposition 6.22 Let X and Y be metric spaces. A mapping f : X → Y is continuous if and
only if the preimage of any open set in Y is open in X.
Proof. Suppose that f is continuous and G ⊂ Y is open. If f −1 (G) = ∅, there is nothing to
prove; the empty set is open. Otherwise there exists x0 ∈ f −1 (G), and therefore f (x0 ) ∈ G.
Since G is open, there is ε > 0 such that Uε (f (x0 )) ⊂ G. Since f is continuous at x0 , there
exists δ > 0 such that x ∈ Uδ (x0 ) implies f (x) ∈ Uε (f (x0 )) ⊂ G. That is, Uδ (x0 ) ⊂ f −1 (G),
and x0 is an inner point of f −1 (G); hence f −1 (G) is open.
Suppose now that the condition of the proposition is fulfilled. We will show that f is continuous.
Fix x0 ∈ X and ε > 0. Since G = Uε(f(x0)) is open by Lemma 6.16, f⁻¹(G) is open by
assumption. In particular, x0 ∈ f⁻¹(G) is an inner point. Hence, there exists δ > 0 such that
Uδ(x0) ⊂ f⁻¹(G). It follows that f(Uδ(x0)) ⊂ Uε(f(x0)); this means that f is continuous at x0.
Since x0 was arbitrary, f is continuous on X.
Remark 6.5 Since the complement of an open set is a closed set, it is obvious that the proposition
holds if we replace “open set” by “closed set.”
In general, the image of an open set under a continuous function need not be open; consider
for example f(x) = sin x and G = (0, 2π), which is open; however, f((0, 2π)) = [−1, 1] is not
open.
6.4.5 Completeness and Compactness
(a) Completeness
Definition 6.13 Let (X, d) be a metric space. A sequence (xn) of elements of X is said to be a
Cauchy sequence if for every ε > 0 there exists a positive integer n0 ∈ N such that

d(xn, xm) < ε  for all m, n ≥ n0.

A metric space is said to be complete if every Cauchy sequence converges.
A complete normed vector space is called a Banach space.
Remark. The euclidean k-spaces R^k and C^k are complete. The function space (C([a, b]), ‖·‖_∞)
is complete, see homework 21.2. The Hilbert space ℓ² is complete.
(b) Compactness
The notion of compactness is of great importance in analysis, especially in connection with
continuity.
By an open cover of a set E in a metric space X we mean a collection {Gα | α ∈ I} of open
subsets of X such that E ⊂ ⋃_{α∈I} Gα. Here I is any index set and

⋃_{α∈I} Gα = {x ∈ X | ∃ β ∈ I : x ∈ Gβ}.
Definition 6.14 (Covering definition) A subset K of a metric space X is said to be compact if
every open cover of K contains a finite subcover. More explicitly, if {Gα } is an open cover of
K, then there are finitely many indices α1 , . . . , αn such that
K ⊂ Gα1 ∪ · · · ∪ Gαn .
Note that the definition does not say that a set is compact if it has some finite open cover; the
whole space X is open and already forms a cover with a single member. Instead, every open
cover must contain a finite subcover.
Example 6.16 (a) It is clear that every finite set is compact.
(b) Let (xn) be a sequence in a metric space X converging to x. Then
A = {xn | n ∈ N} ∪ {x}
is compact.
Proof. Let {Gα } be any open cover of A. In particular, the limit point x is covered by, say, G0 .
Then there is an n0 ∈ N such that xn ∈ G0 for every n ≥ n0 . Finally, xk is covered by some
Gk , k = 1, . . . , n0 − 1. Hence the collection
{Gk | k = 0, 1, . . . , n0 − 1}
is a finite subcover of A; therefore A is compact.
Proposition 6.23 (Sequence Definition) A subset K of a metric space X is compact if and
only if every sequence in K contains a convergent subsequence with limit in K.
Proof. (a) Let K be compact and suppose to the contrary that (xn) is a sequence in K with
no subsequence converging to a point of K. Then every x ∈ K has a neighborhood Ux
containing only finitely many elements of the sequence (xn). (Otherwise x would be a limit
point of (xn) and there would be a subsequence converging to x.) By construction,

K ⊂ ⋃_{x∈K} Ux.

Since K is compact, there are finitely many points y1, …, ym ∈ K with

K ⊂ U_{y1} ∪ ⋯ ∪ U_{ym}.

Since every U_{yi} contains only finitely many elements of (xn), there are only finitely many
elements of (xn) in K, a contradiction.
(b) The proof of this direction is in the appendix to this chapter.
Remark 6.6 Further properties. (a) A compact subset of a metric space is closed and bounded.
(b) A closed subset of a compact set is compact.
(c) A subset K of R^k or C^k is compact if and only if K is bounded and closed.
Proof. Suppose K is closed and bounded. Let (xn) be a sequence in K. By Corollary 6.20, (xn)
has a convergent subsequence. Since K is closed, the limit is in K. By the above proposition
K is compact. The other direction follows from (a).
(c) Compactness and Continuity
As in the real case (see Theorem 3.6) we have the analogous results for metric spaces.
Proposition 6.24 Let X be a compact metric space.
(a) Let f : X → Y be a continuous mapping into the metric space Y. Then f(X) is compact.
(b) Let f : X → R be a continuous mapping. Then f is bounded and attains its maximum and
minimum, that is, there are points p and q in X such that

f(p) = sup_{x∈X} f(x),  f(q) = inf_{x∈X} f(x).
Proof. (a) Let {Gα} be an open covering of f(X). By Proposition 6.22, f⁻¹(Gα) is open for
every α. Hence, {f⁻¹(Gα)} is an open cover of X. Since X is compact there is a finite
subcover of X, say {f⁻¹(Gα1), …, f⁻¹(Gαn)}. Then {Gα1, …, Gαn} is a finite subcover of
{Gα} covering f(X). Hence, f(X) is compact. We skip (b).
As for real functions, we have the following proposition about uniform continuity. The
proof is in the appendix.
Proposition 6.25 Let f : K → R be a continuous function on a compact set K ⊂ R. Then f
is uniformly continuous on K.
6.4.6 Continuous Functions in R^k
Proposition 6.26 (a) The projection mapping pi : Rk → R, i = 1, . . . , k, given by
pi (x1 , . . . , xk ) = xi is continuous.
(b) Let U ⊆ R^k be open and f, g : U → R be continuous on U. Then f + g, fg, |f|, and f/g
(g ≠ 0) are continuous functions on U.
(c) Let X be a metric space. A mapping
f = (f1 , . . . , fk ) : X → Rk
is continuous if and only if all components fi : X → R, i = 1, . . . , k, are continuous.
Proof. (a) Let (xn ) be a sequence converging to a = (a1 , . . . , ak ) ∈ Rk . Then the sequence
(pi (xn )) converges to ai = pi (a) by Proposition 6.19. This shows continuity of pi at a.
(b) The proofs are quite similar to the proofs in the real case, see Proposition 2.3. As a sample
we carry out the proof in the case fg. Let a ∈ U and put M = max{|f(a)|, |g(a)|, 1}. Let ε > 0,
ε < 3M², be given. Since f and g are continuous at a, there exists δ > 0 such that

‖x − a‖ < δ  implies  |f(x) − f(a)| < ε/(3M)  and  |g(x) − g(a)| < ε/(3M).  (6.45)

Note that

fg(x) − fg(a) = (f(x) − f(a))(g(x) − g(a)) + f(a)(g(x) − g(a)) + g(a)(f(x) − f(a)).

Taking the absolute value of this identity and using the triangle inequality as well as (6.45),
we find that ‖x − a‖ < δ implies

|fg(x) − fg(a)| ≤ |f(x) − f(a)| |g(x) − g(a)| + |f(a)| |g(x) − g(a)| + |g(a)| |f(x) − f(a)|
  ≤ ε²/(9M²) + M · ε/(3M) + M · ε/(3M) ≤ ε/3 + ε/3 + ε/3 = ε.

This proves continuity of fg at a.
(c) Suppose first that f is continuous at a ∈ X. Since fi = pi ◦f , fi is continuous by the result
of (a) and Proposition 6.21.
Suppose now that all the fi, i = 1, …, k, are continuous at a. Let (xn), xn ≠ a, be a sequence
in X with limn→∞ xn = a in X. Since fi is continuous, the sequences (fi (xn )) of numbers
converge to fi (a). By Proposition 6.19, the sequence of vectors f (xn ) converges to f (a); hence
f is continuous at a.
Example 6.17 Let f : R³ → R² be given by

f(x, y, z) = ( sin( (x² + e^z)/√(x² + y² + z² + 1) ), log|x² + y² + z² + 1| ).

Then f is continuous on R³. Indeed, since products, sums, and compositions of continuous
functions are continuous, √(x² + y² + z² + 1) is a continuous function on R³. We also made use
of Proposition 6.26 (a); the coordinate functions x, y, and z are continuous. Since the
denominator is nonzero, f1(x, y, z) = sin( (x² + e^z)/√(x² + y² + z² + 1) ) is continuous. Since
|x² + y² + z² + 1| > 0, f2(x, y, z) = log|x² + y² + z² + 1| is continuous. By Proposition 6.26 (c),
f is continuous.
6.5 Appendix E
(a) A compact subset is closed
Proof. Let K be a compact subset of a metric space X. We shall prove that the complement of
K is an open subset of X.
Suppose that p ∈ X, p ∉ K. If q ∈ K, let Vq and Uq be neighborhoods of p and q,
respectively, of radius less than d(p, q)/2. Since K is compact, there are finitely many points
q1, …, qn in K such that

K ⊂ Uq1 ∪ ⋯ ∪ Uqn =: U.

If V = Vq1 ∩ ⋯ ∩ Vqn, then V is a neighborhood of p which does not intersect U. Hence
V ⊂ K^c, so that p is an interior point of K^c, and K is closed. We show that K is bounded.
Let ε > 0 be given. Since K is compact, the open cover {Uε(x) | x ∈ K} of K has a finite
subcover, say {Uε(x1), …, Uε(xn)}. Let U = ⋃_{i=1}^n Uε(xi); then the distance of any two
points x and y in U is bounded by

2ε + ∑_{1≤i<j≤n} d(xi, xj).
A closed subset of a compact set is compact
Proof. Suppose F ⊂ K ⊂ X, F is closed in X, and K is compact. Let {Uα} be an open cover
of F. Since F^c is open, {Uα, F^c} is an open cover Ω of K. Since K is compact, there is a
finite subcover Φ of Ω which covers K. If F^c is a member of Φ, we may remove it from Φ and
still retain an open cover of F. Thus we have shown that a finite subcollection of {Uα} covers
F.
Equivalence of Compactness and Sequential Compactness
Proof of Proposition 6.23 (b). This direction is hard to prove. It does not work in arbitrary
topological spaces and essentially uses that X is a metric space. The proof is roughly along the
lines of Exercises 22 to 26 in [Rud76]. We give the proof of Bredon (see [Bre97, 9.4 Theorem]).
Suppose that every sequence in K contains a subsequence converging in K.
1) K contains a countable dense set. For, we show that for every ε > 0, K can be covered by
a finite number of ε-balls (ε is fixed). Suppose this is not true, i.e. K cannot be covered by any
finite number of ε-balls. Then we construct a sequence (xn) as follows. Take an arbitrary x1.
Suppose x1, …, xn are already found; since K is not covered by a finite number of ε-balls, we
find xn+1 whose distance to every preceding element of the sequence is greater than or equal
to ε. Consider a limit point x of this sequence and an ε/2-neighborhood U of x. Almost all
elements of a suitable subsequence of (xn) belong to U, say xr and xs with s > r. Since both
are in U, their distance is less than ε. But this contradicts the construction of the sequence.
Now take the union of all those finite sets corresponding to ε = 1/n, n ∈ N. This is a countable
dense set of K.
2) Any open cover {Uα} of K has a countable subcover. Let x ∈ K be given. Since {Uα}α∈I is
an open cover of K, we find β ∈ I and n ∈ N such that U_{2/n}(x) ⊂ Uβ. Further, since {xi}i∈N
is dense in K, we find i ∈ N such that d(x, xi) < 1/n. By the triangle inequality,

x ∈ U_{1/n}(xi) ⊂ U_{2/n}(x) ⊂ Uβ.

To each of the countably many U_{1/n}(xi) choose one Uβ ⊃ U_{1/n}(xi). This gives a countable
subcover of {Uα}.
3) Rename the countable open subcover {Vn}n∈N and consider the decreasing sequence (Cn)
of closed sets

Cn = K \ ⋃_{k=1}^n Vk,  C1 ⊃ C2 ⊃ ⋯.

If some Ck = ∅ we have found a finite subcover, namely V1, V2, …, Vk. Suppose that all the Cn
are nonempty, say xn ∈ Cn. Further, let x be the limit of a subsequence (x_{ni}). Since
x_{ni} ∈ Cm for all ni ≥ m and Cm is closed, x ∈ Cm for all m. Hence x ∈ ⋂_{m∈N} Cm.
However,

⋂_{m∈N} Cm = K \ ⋃_{m∈N} Vm = ∅.

This contradiction completes the proof.
Proof of Proposition 6.25. Let ε > 0 be given. Since f is continuous, we can associate to each
point p ∈ K a positive number δ(p) such that q ∈ K ∩ Uδ(p) (p) implies | f (q) − f (p) | < ε/2.
Let J(p) = {q ∈ K | | p − q | < δ(p)/2}.
Since p ∈ J(p), the collection {J(p) | p ∈ K} is an open cover of K; and since K is compact,
there is a finite set of points p1 , . . . , pn in K such that
K ⊂ J(p1 ) ∪ · · · ∪ J(pn ).
(6.46)
We put δ := ½ min{δ(p1), …, δ(pn)}. Then δ > 0. Now let p and q be points of K with
|p − q| < δ. By (6.46), there is an integer m, 1 ≤ m ≤ n, such that p ∈ J(pm); hence

|p − pm| < ½ δ(pm),

and we also have

|q − pm| ≤ |p − q| + |p − pm| < δ + ½ δ(pm) ≤ δ(pm).

Finally, continuity at pm gives

|f(p) − f(q)| ≤ |f(p) − f(pm)| + |f(pm) − f(q)| < ε.
Proposition 6.27 There exists a real continuous function on the real line which is nowhere
differentiable.
Proof. Define

φ(x) = |x|,  x ∈ [−1, 1],

and extend the definition of φ to all real x by requiring periodicity: φ(x + 2) = φ(x). Then for
all s, t ∈ R,

|φ(s) − φ(t)| ≤ |s − t|.  (6.47)

In particular, φ is continuous on R. Define

f(x) = ∑_{n=0}^∞ (3/4)^n φ(4^n x).  (6.48)

Since 0 ≤ φ ≤ 1, Theorem 6.2 shows that the series (6.48) converges uniformly on R. By
Theorem 6.5, f is continuous on R.
Now fix a real number x and a positive integer m ∈ N. Put

δm = ±1/(2 · 4^m),

where the sign is chosen such that no integer lies between 4^m x and 4^m (x + δm). This can be
done since 4^m |δm| = ½. It follows that |φ(4^m x) − φ(4^m (x + δm))| = ½. Define

γn = ( φ(4^n (x + δm)) − φ(4^n x) ) / δm.
When n > m, then 4^n δm is an even integer, so that γn = 0 by periodicity of φ. When
0 ≤ n ≤ m, (6.47) implies |γn| ≤ 4^n. Since |γm| = 4^m, we conclude that

| (f(x + δm) − f(x)) / δm | = | ∑_{n=0}^m (3/4)^n γn | ≥ 3^m − ∑_{n=0}^{m−1} 3^n = ½ (3^m + 1).

As m → ∞, δm → 0. It follows that f is not differentiable at x.
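A numerical illustration of this mechanism (not part of the notes): evaluating a 30-term partial sum of (6.48) at x = 0 and δm = 1/(2 · 4^m), the difference quotients grow without bound, as the estimate above predicts:

```python
# Partial sums of the series (6.48) and the growth of the difference
# quotients |f(x + δ_m) − f(x)| / δ_m at x = 0, which the proof bounds
# below by (3^m + 1)/2.
def phi(x):
    # |x| on [-1, 1], extended with period 2
    x = ((x + 1.0) % 2.0) - 1.0
    return abs(x)

def f(x, terms=30):
    # partial sum of the series (6.48)
    return sum((0.75 ** n) * phi((4 ** n) * x) for n in range(terms))

x = 0.0
quotients = []
for m in range(1, 8):
    delta = 1.0 / (2 * 4 ** m)
    quotients.append(abs(f(x + delta) - f(x)) / delta)

# the quotients grow strictly, so no derivative can exist at x
assert all(b > a for a, b in zip(quotients, quotients[1:]))
```

Only a partial sum is computed, so this is suggestive rather than exact, but the growth is already clearly visible.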
Proof of Abel’s Limit Theorem, Proposition 6.9. By Proposition 6.4, the series converges on
(−1, 1) and the limit function is continuous there since the radius of convergence is at least 1,
by assumption. Hence it suffices to prove continuity at x = 1, i.e. that lim_{x→1−0} f(x) = f(1).
Put rn = ∑_{k=n}^∞ ak; then r0 = f(1), r_{n+1} − rn = −an for all nonnegative integers
n ∈ Z+, and lim_{n→∞} rn = 0. Hence there is a constant C with |rn| ≤ C, and the series
∑_{n=0}^∞ r_{n+1} x^n converges for |x| < 1 by the comparison test. We have

(1 − x) ∑_{n=0}^∞ r_{n+1} x^n = ∑_{n=0}^∞ r_{n+1} x^n − ∑_{n=0}^∞ r_{n+1} x^{n+1}
  = ∑_{n=0}^∞ r_{n+1} x^n − ∑_{n=0}^∞ rn x^n + r0 = −∑_{n=0}^∞ an x^n + f(1);

hence

f(1) − f(x) = (1 − x) ∑_{n=0}^∞ r_{n+1} x^n.

Let ε > 0 be given. Choose N ∈ N such that n ≥ N implies |rn| < ε. Put δ = ε/(CN); then
x ∈ (1 − δ, 1) implies

|f(1) − f(x)| ≤ (1 − x) ∑_{n=0}^{N−1} |r_{n+1}| x^n + (1 − x) ∑_{n=N}^∞ |r_{n+1}| x^n
  ≤ (1 − x)CN + (1 − x) ε ∑_{n=0}^∞ x^n ≤ ε + ε = 2ε;

hence f(x) tends to f(1) as x → 1 − 0.
Definition 6.15 If X is a metric space, C(X) will denote the set of all continuous, bounded
functions with domain X. We associate with each f ∈ C(X) its supremum norm

‖f‖_∞ = ‖f‖ = sup_{x∈X} |f(x)|.  (6.49)

Since f is assumed to be bounded, ‖f‖ < ∞. Note that the boundedness assumption on f is
redundant if X is a compact metric space (Proposition 6.24). Thus C(X) consists of all
continuous functions in that case.
It is clear that C(X) is a vector space since the sum of bounded functions is again a bounded
function (see the triangle inequality below) and the sum of continuous functions is a continuous
function (see Proposition 6.26). We show that ‖f‖_∞ is indeed a norm on C(X).
(i) Obviously, ‖f‖_∞ ≥ 0 since the absolute value |f(x)| is nonnegative. Further, ‖0‖ = 0.
Suppose now ‖f‖ = 0. This implies |f(x)| = 0 for all x; hence f = 0.
(ii) Clearly, for every (real or complex) number λ we have

‖λf‖ = sup_{x∈X} |λf(x)| = |λ| sup_{x∈X} |f(x)| = |λ| ‖f‖.

(iii) If h = f + g then

|h(x)| ≤ |f(x)| + |g(x)| ≤ ‖f‖ + ‖g‖,  x ∈ X;

hence

‖f + g‖ ≤ ‖f‖ + ‖g‖.
We have thus made C(X) into a normed vector space. Remark 6.1 can be rephrased as
A sequence (fn ) converges to f with respect to the norm in C(X) if and only if
fn → f uniformly on X.
Accordingly, closed subsets of C(X) are sometimes called uniformly closed, the closure of a
set A ⊂ C(X) is called the uniform closure, and so on.
Theorem 6.28 The above norm makes C(X) into a Banach space (a complete normed space).
Proof. Let (fn) be a Cauchy sequence in C(X). This means that to every ε > 0 there corresponds
an n0 ∈ N such that n, m ≥ n0 implies ‖fn − fm‖ < ε. It follows by Proposition 6.1 that there
is a function f with domain X to which (fn) converges uniformly. By Theorem 6.5, f is
continuous. Moreover, f is bounded, since there is an n such that |f(x) − fn(x)| < 1 for all
x ∈ X, and fn is bounded.
Thus f ∈ C(X), and since fn → f uniformly on X, we have ‖f − fn‖ → 0 as n → ∞.
Chapter 7
Calculus of Functions of Several Variables
In this chapter we consider functions f : U → R or f : U → R^m where U ⊂ R^n is an open set.
In Subsection 6.4.6 we collected the main properties of continuous functions. Now we will
study differentiation and integration of such functions in more detail.
The Norm of a Linear Mapping
Proposition 7.1 Let T ∈ L(R^n, R^m) be a linear mapping of the euclidean space R^n into R^m.
(a) Then there exists some C > 0 such that

‖T(x)‖_2 ≤ C ‖x‖_2  for all x ∈ R^n.  (7.1)

(b) T is uniformly continuous on R^n.

Proof. (a) Using the standard bases of R^n and R^m we identify T with its matrix T = (a_ij),
T e_j = ∑_{i=1}^m a_ij e_i. For x = (x1, …, xn) we have

T(x) = ( ∑_{j=1}^n a_1j xj, …, ∑_{j=1}^n a_mj xj );

hence by the Cauchy–Schwarz inequality we have

‖T(x)‖_2² = ∑_{i=1}^m ( ∑_{j=1}^n a_ij xj )² ≤ ∑_{i=1}^m ( ∑_{j=1}^n |a_ij xj| )²
  ≤ ∑_{i=1}^m ( ∑_{j=1}^n |a_ij|² ) ( ∑_{j=1}^n |xj|² ) = ( ∑_{i,j} a_ij² ) ‖x‖_2² = C² ‖x‖_2²,

where C = √( ∑_{i,j} a_ij² ). Consequently,

‖Tx‖ ≤ C ‖x‖.

(b) Let ε > 0. Put δ = ε/C with the above C. Then ‖x − y‖ < δ implies

‖Tx − Ty‖ = ‖T(x − y)‖ ≤ C ‖x − y‖ < ε,

which proves (b).
Definition 7.1 Let V and W be normed vector spaces and A ∈ L(V, W). The smallest number C
with (7.1) is called the norm of the linear map A and is denoted by ‖A‖:

‖A‖ = inf{C | ‖Ax‖ ≤ C ‖x‖ for all x ∈ V}.  (7.2)

By definition,

‖Ax‖ ≤ ‖A‖ ‖x‖.  (7.3)

Let T ∈ L(R^n, R^m) be a linear mapping. One can show that

‖T‖ = sup_{x≠0} ‖Tx‖/‖x‖ = sup_{‖x‖=1} ‖Tx‖ = sup_{‖x‖≤1} ‖Tx‖.
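As a numerical aside (not from the notes), the constant C = (∑ a_ij²)^{1/2} from the proof of Proposition 7.1 dominates ‖Tx‖₂/‖x‖₂; sampling random unit vectors for a sample matrix never exceeds it:

```python
# The constant C = sqrt(sum a_ij^2) from the proof of Proposition 7.1
# bounds ‖Tx‖₂ / ‖x‖₂; we check this on random unit vectors for a
# sample 2x3 matrix T.
import math
import random

T = [[1.0, 2.0, 0.0],
     [0.0, -1.0, 3.0]]

def apply(T, x):
    return [sum(a * t for a, t in zip(row, x)) for row in T]

def norm2(x):
    return math.sqrt(sum(t * t for t in x))

C = math.sqrt(sum(a * a for row in T for a in row))

random.seed(2)
best = 0.0
for _ in range(2000):
    x = [random.gauss(0, 1) for _ in range(3)]
    n = norm2(x)
    best = max(best, norm2(apply(T, [t / n for t in x])))

# the sampled operator norm never exceeds the Cauchy–Schwarz bound C
assert best <= C + 1e-9
```

The sampled supremum approximates ‖T‖, which in general is strictly smaller than C.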
7.1 Partial Derivatives
We consider functions f : U → R where U ⊂ R^n is an open set. We want to find derivatives
“one variable at a time.”
Definition 7.2 Let U ⊂ R^n be open and f : U → R a real function. Then f is called partial
differentiable at a = (a1, …, an) ∈ U with respect to the ith coordinate if the limit

Di f(a) = lim_{h→0} ( f(a1, …, ai + h, …, an) − f(a1, …, an) ) / h  (7.4)

exists, where h is real and sufficiently small (such that (a1, …, ai + h, …, an) ∈ U).
Di f(a) is called the ith partial derivative of f at a. We also use the notations

Di f(a) = ∂f/∂xi (a) = ∂f(a)/∂xi = f_{xi}(a).
It is important that Di f(a) is the ordinary derivative of a certain function; in fact, if g(x) =
f(a1, …, x, …, an), then Di f(a) = g′(ai). That is, Di f(a) is the slope of the tangent line at
(a, f(a)) to the curve obtained by intersecting the graph of f with the plane xj = aj, j ≠ i. It
also means that computation of Di f(a) is a problem we can already solve.
Example 7.1 (a) f (x, y) = sin(xy 2 ). Then D1 f (x, y) = y 2 cos(xy 2 ) and D2 f (x, y) =
2xy cos(xy 2 ).
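Since each partial derivative is an ordinary one-variable derivative, it can be checked against a finite-difference quotient; the sketch below (not part of the notes) does this for Example 7.1 (a):

```python
# Check the partial derivatives of Example 7.1 (a),
# D1 f = y^2 cos(xy^2) and D2 f = 2xy cos(xy^2), against central
# finite-difference quotients at a sample point.
import math

def f(x, y):
    return math.sin(x * y * y)

def d1f(x, y):
    return y * y * math.cos(x * y * y)

def d2f(x, y):
    return 2 * x * y * math.cos(x * y * y)

x, y, h = 0.7, 1.3, 1e-6
fd1 = (f(x + h, y) - f(x - h, y)) / (2 * h)  # difference in x only
fd2 = (f(x, y + h) - f(x, y - h)) / (2 * h)  # difference in y only
assert abs(fd1 - d1f(x, y)) < 1e-6
assert abs(fd2 - d2f(x, y)) < 1e-6
```

The central quotient has error of order h², so agreement to 1e-6 is expected for h = 1e-6.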
(b) Consider the radius function r : R^n → R,

r(x) = ‖x‖_2 = √(x1² + ⋯ + xn²),  x = (x1, …, xn) ∈ R^n.

Then r is partial differentiable on R^n \ {0} with

∂r/∂xi (x) = xi/r(x),  x ≠ 0.  (7.5)

Indeed, the function

f(ξ) = √(x1² + ⋯ + ξ² + ⋯ + xn²)

is differentiable, where x1, …, x_{i−1}, x_{i+1}, …, xn are considered to be constant. Using the
chain rule one obtains (with ξ = xi)

∂r/∂xi (x) = f′(ξ) = (1/2) · 2ξ/√(x1² + ⋯ + ξ² + ⋯ + xn²) = xi/r.
(c) Let f : (0, +∞) → R be differentiable. The composition x ↦ f(r(x)) (with the above
radius function r) is denoted by f(r); it is partial differentiable on R^n \ {0}. The chain rule
gives

∂/∂xi f(r) = f′(r) ∂r/∂xi = f′(r) xi/r.
(d) Partial differentiability does not imply continuity. Define

f(x, y) = xy/(x² + y²)² = xy/r⁴  for (x, y) ≠ (0, 0),  and  f(0, 0) = 0.

Obviously, f is partial differentiable on R² \ {0}. Indeed, by definition of the partial derivative,

∂f/∂x (0, 0) = lim_{h→0} f(h, 0)/h = lim_{h→0} 0 = 0.

Since f is symmetric in x and y, ∂f/∂y (0, 0) = 0, too. However, f is not continuous at 0 since
f(ε, ε) = 1/(4ε²) becomes large as ε tends to 0.
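The two behaviors are easy to observe numerically; the following sketch (not from the notes) shows the axis difference quotients vanishing while the diagonal values blow up:

```python
# The function of Example 7.1 (d): both partial derivatives exist at
# the origin, yet f(ε, ε) = 1/(4ε²) blows up along the diagonal, so f
# is not continuous there.
def f(x, y):
    if (x, y) == (0.0, 0.0):
        return 0.0
    return x * y / (x * x + y * y) ** 2

# both difference quotients along the axes vanish at 0 ...
h = 1e-6
assert f(h, 0.0) / h == 0.0 and f(0.0, h) / h == 0.0

# ... but f grows without bound along the diagonal x = y:
eps = [10.0 ** (-k) for k in range(1, 6)]
vals = [f(e, e) for e in eps]
assert all(b > a for a, b in zip(vals, vals[1:]))
```

The first value already equals f(0.1, 0.1) = 1/(4 · 0.01) = 25, and each further step multiplies it by 100.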
Remark 7.1 In the next section we will become acquainted with a stronger notion of
differentiability which implies continuity. In particular, a continuously partial differentiable
function is continuous.
Definition 7.3 Let U ⊂ R^n be open and f : U → R partial differentiable. The vector

grad f(x) = ( ∂f/∂x1 (x), …, ∂f/∂xn (x) )  (7.6)

is called the gradient of f at x ∈ U.
Example 7.2 (a) For the radius function r(x) defined in Example 7.1 (b) we have

grad r(x) = x/r.

Note that x/r is a unit vector (of euclidean norm 1) in the direction of x. With the notations of
Example 7.1 (c),

grad f(r) = f′(r) x/r.

(b) Let f, g : U → R be partial differentiable functions. Then we have the following product
rule:

grad (fg) = g grad f + f grad g.  (7.7)

This is immediate from the product rule for functions of one variable,

∂(fg)/∂xi = (∂f/∂xi) g + f (∂g/∂xi).

(c) f(x, y) = x^y. Then grad f(x, y) = (y x^{y−1}, x^y log x).
Notation. Instead of grad f one also writes ∇f (“nabla f”). ∇ is a vector-valued differential
operator:

∇ = ( ∂/∂x1, …, ∂/∂xn ).
Definition 7.4 Let U ⊂ R^n. A vector field on U is a mapping

v = (v1, …, vn) : U → R^n.  (7.8)

To every point x ∈ U there is associated a vector v(x) ∈ R^n.
If the vector field v is partial differentiable (i.e. all components vi are partial differentiable)
then

div v = ∑_{i=1}^n ∂vi/∂xi  (7.9)

is called the divergence of the vector field v.
Formally the divergence of v can be written as an inner product of ∇ and v:

div v = ∇·v = ∑_{i=1}^n ∂vi/∂xi.
The product rule gives the following rule for the divergence. Let f : U → R be a partial
differentiable function and v = (v1, …, vn) : U → R^n a partial differentiable vector field; then

∂(f vi)/∂xi = (∂f/∂xi) · vi + f · (∂vi/∂xi).

Summation over i gives

div (f v) = grad f · v + f div v.  (7.10)

Using the nabla operator this can be rewritten as

∇·(f v) = ∇f · v + f ∇·v.
Example 7.3 Let F : Rⁿ \ 0 → Rⁿ be the vector field F(x) = x/r, r = ‖x‖. Since

div x = Σ_{i=1}^n ∂x_i/∂x_i = n  and  x·x = r²,

Example 7.2 gives with v = x and f(r) = 1/r

div (x/r) = grad(1/r)·x + (1/r) div x = −(x·x)/r³ + n/r = (n−1)/r.   (7.10)
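Formula (7.10) can be confirmed symbolically for n = 3, where it predicts div(x/r) = 2/r:

```python
import sympy as sp

# Verify (7.10) for n = 3: div(x/r) = (n - 1)/r = 2/r.
x1, x2, x3 = sp.symbols('x1 x2 x3', positive=True)
r = sp.sqrt(x1**2 + x2**2 + x3**2)
v = (x1 / r, x2 / r, x3 / r)                       # the field x/r
div_v = sp.simplify(sum(sp.diff(vi, xi)
                        for vi, xi in zip(v, (x1, x2, x3))))
print(div_v)  # 2/sqrt(x1**2 + x2**2 + x3**2)
```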
7.1.1 Higher Partial Derivatives
Let U ⊂ Rn be open and f : U → R a partial differentiable function. If all partial derivatives
Di f : U → R are again partial differentiable, f is called twice partial differentiable. We can
form the partial derivatives Dj Di f of the second order.
More generally, f : U → R is said to be (k+1)-times partial differentiable if it is k-times partial differentiable and all partial derivatives of order k

D_{i_k} D_{i_{k−1}} ··· D_{i_1} f : U → R

are partial differentiable.
A function f : U → R is said to be k-times continuously partial differentiable if it is k-times partial differentiable and all partial derivatives of order less than or equal to k are continuous. The set of all such functions on U is denoted by C^k(U).
We also use the notation

D_j D_i f = ∂²f/(∂x_j ∂x_i) = f_{x_i x_j},  D_i D_i f = ∂²f/∂x_i²,  D_{i_k} ··· D_{i_1} f = ∂^k f/(∂x_{i_k} ··· ∂x_{i_1}).
Example. Let f(x, y) = sin(xy²). One easily sees that

f_{yx} = f_{xy} = 2y cos(xy²) − 2xy³ sin(xy²).
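The equality of the mixed partial derivatives in this example is easy to confirm symbolically:

```python
import sympy as sp

# Schwarz's lemma on the example f(x, y) = sin(x*y**2):
# the mixed partials f_xy and f_yx agree.
x, y = sp.symbols('x y')
f = sp.sin(x * y**2)
fxy = sp.diff(f, x, y)
fyx = sp.diff(f, y, x)
print(sp.simplify(fxy - fyx))  # 0
```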
Proposition 7.2 (Schwarz’s Lemma) Let U ⊂ Rⁿ be open and f : U → R be twice continuously partial differentiable. Then for every a ∈ U and all i, j = 1, …, n we have

D_j D_i f(a) = D_i D_j f(a).   (7.11)
Proof. Without loss of generality we assume n = 2, i = 1, j = 2, and a = 0; we write (x, y) in place of (x₁, x₂). Since U is open, there is a small square of side length 2δ > 0 completely contained in U:

{(x, y) ∈ R² | |x| < δ, |y| < δ} ⊂ U.

For fixed y ∈ U_δ(0) define the function F : (−δ, δ) → R via

F(x) = f(x, y) − f(x, 0).

By the mean value theorem (Theorem 4.9) there is a ξ with |ξ| ≤ |x| such that

F(x) − F(0) = x F′(ξ).

But F′(ξ) = f_x(ξ, y) − f_x(ξ, 0). Applying the mean value theorem to the function h(y) = f_x(ξ, y), there is an η with |η| ≤ |y| and

f_x(ξ, y) − f_x(ξ, 0) = h′(η) y = (∂/∂y) f_x(ξ, η) y = f_{xy}(ξ, η) y.

Altogether we have

F(x) − F(0) = f(x, y) − f(x, 0) − f(0, y) + f(0, 0) = f_{xy}(ξ, η) xy.   (7.12)

The same arguments, but starting with the function G(y) = f(x, y) − f(0, y), show the existence of ξ′ and η′ with |ξ′| ≤ |x|, |η′| ≤ |y| and

f(x, y) − f(x, 0) − f(0, y) + f(0, 0) = f_{yx}(ξ′, η′) xy.   (7.13)

From (7.12) and (7.13) for xy ≠ 0 it follows that

f_{xy}(ξ, η) = f_{yx}(ξ′, η′).

If (x, y) approaches (0, 0), so do (ξ, η) and (ξ′, η′). Since f_{xy} and f_{yx} are both continuous, it follows from the above equation that

D₂D₁f(0, 0) = D₁D₂f(0, 0).
Corollary 7.3 Let U ⊂ Rⁿ be open and f : U → R be k-times continuously partial differentiable. Then

D_{i_k} ··· D_{i_1} f = D_{i_{π(k)}} ··· D_{i_{π(1)}} f

for every permutation π of 1, …, k.
Proof. The proof is by induction on k, using the fact that any permutation can be written as a product of transpositions (j ↔ j+1).
Example 7.4 Let U ⊂ R³ be open and let v : U → R³ be a partial differentiable vector field. One defines a new vector field curl v : U → R³, the curl of v, by

curl v = (∂v₃/∂x₂ − ∂v₂/∂x₃, ∂v₁/∂x₃ − ∂v₃/∂x₁, ∂v₂/∂x₁ − ∂v₁/∂x₂).   (7.14)

Formally one can think of curl v as being the vector product of ∇ and v,

curl v = ∇ × v = det ( e₁ e₂ e₃ ; ∂/∂x₁ ∂/∂x₂ ∂/∂x₃ ; v₁ v₂ v₃ ),

where e₁, e₂, and e₃ are the unit vectors in R³. If f : U → R has continuous second partial derivatives then, by Proposition 7.2,

curl grad f = 0.   (7.15)
Indeed, the first coordinate of curl grad f is by definition

∂²f/(∂x₂∂x₃) − ∂²f/(∂x₃∂x₂) = 0.

The other two components are obtained by cyclic permutation of the indices.
We have found: curl v = 0 is a necessary condition for a continuously partial differentiable vector field v : U → R³ to be the gradient of a function f : U → R.
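The identity curl grad f = 0 can be checked symbolically on any sample smooth function; here is a sketch with an arbitrarily chosen f (any f with continuous second partials would do).

```python
import sympy as sp

# curl(grad f) = 0 for a sample smooth f on R^3.
x1, x2, x3 = sp.symbols('x1 x2 x3')
f = x1**2 * sp.sin(x2) + sp.exp(x1 * x3)
g = [sp.diff(f, v) for v in (x1, x2, x3)]          # grad f
curl = [sp.diff(g[2], x2) - sp.diff(g[1], x3),     # components of (7.14)
        sp.diff(g[0], x3) - sp.diff(g[2], x1),
        sp.diff(g[1], x1) - sp.diff(g[0], x2)]
print([sp.simplify(c) for c in curl])  # [0, 0, 0]
```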
7.1.2 The Laplacian
Let U ⊂ Rⁿ be open and f ∈ C²(U). Put

∆f = div grad f = ∂²f/∂x₁² + ··· + ∂²f/∂xₙ²   (7.16)

and call

∆ = ∂²/∂x₁² + ··· + ∂²/∂xₙ²

the Laplacian or Laplace operator. The equation ∆f = 0 is called the Laplace equation; its solutions are the harmonic functions. If f depends on an additional time variable t, f : U × I → R, (x, t) ↦ f(x, t), one considers the so-called wave equation

f_tt − a²∆f = 0,   (7.17)

and the so-called heat equation

f_t − k∆f = 0.   (7.18)
Example 7.5 Let f : (0, +∞) → R be twice continuously differentiable. We want to compute the Laplacian ∆f(r), r = ‖x‖, x ∈ Rⁿ \ 0. By Example 7.2 we have

grad f(r) = f′(r) x/r,

and by the product rule and Example 7.3 we obtain

∆f(r) = div grad f(r) = grad f′(r)·(x/r) + f′(r) div(x/r) = f″(r) (x/r)·(x/r) + f′(r)(n−1)/r;

thus

∆f(r) = f″(r) + ((n−1)/r) f′(r).

In particular, ∆(1/r^{n−2}) = 0 if n ≥ 3 and ∆ log r = 0 if n = 2. Prove!
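The two harmonicity claims at the end of the example can be verified symbolically, here for n = 3 and n = 2:

```python
import sympy as sp

# Delta(1/r^{n-2}) = 0 for n = 3, and Delta(log r) = 0 for n = 2.
x1, x2, x3 = sp.symbols('x1 x2 x3', positive=True)
r3 = sp.sqrt(x1**2 + x2**2 + x3**2)
lap3 = sp.simplify(sum(sp.diff(1 / r3, v, 2) for v in (x1, x2, x3)))
r2 = sp.sqrt(x1**2 + x2**2)
lap2 = sp.simplify(sp.diff(sp.log(r2), x1, 2) + sp.diff(sp.log(r2), x2, 2))
print(lap3, lap2)  # 0 0
```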
7.2 Total Differentiation
In this section we define (total) differentiability of a function f from Rⁿ to Rᵐ. Roughly speaking, f is differentiable (at some point) if it can be approximated by a linear mapping. In contrast to partial differentiability we need not refer to single coordinates. Differentiable functions are continuous. In this section U always denotes an open subset of Rⁿ. The vector space of linear mappings of a vector space V into a vector space W will be denoted by L(V, W).
Motivation: If f : R → R is differentiable at x ∈ R and f′(x) = a, then

lim_{h→0} (f(x + h) − f(x) − a·h)/h = 0.

Note that the mapping h ↦ ah is linear from R to R, and any linear mapping from R to R is of that form.
Definition 7.5 The mapping f : U → Rᵐ is said to be differentiable at a point x ∈ U if there exists a linear map A : Rⁿ → Rᵐ such that

lim_{h→0} ‖f(x + h) − f(x) − A(h)‖ / ‖h‖ = 0.   (7.19)

The linear map A ∈ L(Rⁿ, Rᵐ) is called the derivative of f at x and will be denoted by Df(x). In case n = m = 1 this notion coincides with the ordinary differentiability of a function.
Remark 7.2 We reformulate the definition of differentiability of f at a ∈ U: Define a function ϕ_a : U_ε(0) ⊂ Rⁿ → Rᵐ (depending on a) by

f(a + h) = f(a) + A(h) + ϕ_a(h).   (7.20)

Then f is differentiable at a if and only if lim_{h→0} ‖ϕ_a(h)‖/‖h‖ = 0. Replacing the r.h.s. of (7.20) by f(a) + A(h) (forgetting about ϕ_a) and inserting x in place of a + h and Df(a) in place of A, we obtain the linearization L : Rⁿ → Rᵐ of f at a:

L(x) = f(a) + Df(a)(x − a).   (7.21)
Lemma 7.4 If f is differentiable at x ∈ U the linear mapping A is uniquely determined.
Proof. Throughout we refer to the euclidean norms on Rⁿ and Rᵐ. Suppose that A′ ∈ L(Rⁿ, Rᵐ) is another linear mapping satisfying (7.19). Then for h ∈ Rⁿ, h ≠ 0,

‖A(h) − A′(h)‖ = ‖f(x + h) − f(x) − A′(h) − (f(x + h) − f(x) − A(h))‖
≤ ‖f(x + h) − f(x) − A(h)‖ + ‖f(x + h) − f(x) − A′(h)‖,

hence

‖A(h) − A′(h)‖/‖h‖ ≤ ‖f(x + h) − f(x) − A(h)‖/‖h‖ + ‖f(x + h) − f(x) − A′(h)‖/‖h‖.

Since the limit h → 0 on the right exists and equals 0, the l.h.s. also tends to 0 as h → 0, that is,

lim_{h→0} ‖A(h) − A′(h)‖/‖h‖ = 0.

Now fix h₀ ∈ Rⁿ, h₀ ≠ 0, and put h = t h₀, t ∈ R, t → 0. Then h → 0 and hence

0 = lim_{t→0} ‖A(t h₀) − A′(t h₀)‖/‖t h₀‖ = lim_{t→0} (|t| ‖A(h₀) − A′(h₀)‖)/(|t| ‖h₀‖) = ‖A(h₀) − A′(h₀)‖/‖h₀‖.

Hence ‖A(h₀) − A′(h₀)‖ = 0, which implies A(h₀) = A′(h₀), and thus A = A′.
Definition 7.6 The matrix (a_ij) ∈ R^{m×n} of the linear map Df(x) with respect to the standard bases in Rⁿ and Rᵐ is called the Jacobi matrix of f at x. It is denoted by f′(x); that is,

a_ij = (f′(x))_ij = e_i·Df(x)(e_j),

where the e_i and e_j are the standard basis vectors of Rᵐ and Rⁿ, respectively.
Example. Let f : Rⁿ → Rᵐ be linear, f(x) = B(x) with B ∈ L(Rⁿ, Rᵐ). Then Df(x) = B is the constant linear mapping. Indeed,

f(x + h) − f(x) − B(h) = B(x + h) − B(x) − B(h) = 0

since B is linear. Hence lim_{h→0} ‖f(x + h) − f(x) − B(h)‖/‖h‖ = 0, which proves the claim.
Remark 7.3 (a) Using a column vector h = (h₁, …, hₙ)ᵀ, the map Df(x)(h) is given by matrix multiplication:

Df(x)(h) = f′(x)·h = (a₁₁ … a₁ₙ; … ; a_{m1} … a_{mn})·(h₁, …, hₙ)ᵀ = (Σ_{j=1}^n a_{1j} h_j, …, Σ_{j=1}^n a_{mj} h_j)ᵀ.
Once the standard basis in Rᵐ is chosen, we can write f(x) = (f₁(x), …, f_m(x)) as a vector of m scalar functions f_i : Rⁿ → R. By Proposition 6.19 the limit of the vector function

lim_{h→0} (1/‖h‖) (f(x + h) − f(x) − Df(x)(h))

exists and is equal to 0 if and only if the limit exists for every coordinate i = 1, …, m and is 0:

lim_{h→0} (1/‖h‖) ( f_i(x + h) − f_i(x) − Σ_{j=1}^n a_ij h_j ) = 0,  i = 1, …, m.   (7.22)

We see: f is differentiable at x if and only if all f_i, i = 1, …, m, are. In this case the Jacobi matrix f′(x) is just the collection of the row vectors f_i′(x), i = 1, …, m:

f′(x) = (f₁′(x); … ; f_m′(x)),

where f_i′(x) = (a_{i1}, a_{i2}, …, a_{in}).
(b) Case m = 1 (f = f₁): f′(a) ∈ R^{1×n} is a linear functional (a row vector). Proposition 7.6 below will show that f′(a) = grad f(a). The linearization L(x) of f at a is given by the linear functional Df(a) : Rⁿ → R. The graph of L is an n-dimensional hyper plane in R^{n+1} touching the graph of f at the point (a, f(a)). In coordinates, the linearization (hyper plane equation) is

x_{n+1} = L(x) = f(a) + f′(a)·(x − a).

Here f′(a) is the row vector corresponding to the linear functional Df(a) : Rⁿ → R with respect to the standard basis.
Example 7.6 Let C = (c_ij) ∈ R^{n×n} be a symmetric n×n matrix, that is, c_ij = c_ji for all i, j = 1, …, n, and define f : Rⁿ → R by

f(x) = x·C(x) = Σ_{i,j=1}^n c_ij x_i x_j,  x = (x₁, …, xₙ) ∈ Rⁿ.

If a, h ∈ Rⁿ we have

f(a + h) = (a + h)·C(a + h) = a·C(a) + h·C(a) + a·C(h) + h·C(h)
= a·C(a) + 2C(a)·h + h·C(h)
= f(a) + v·h + ϕ(h),

where v = 2C(a) and ϕ(h) = h·C(h). Since, by the Cauchy–Schwarz inequality,

|ϕ(h)| = |h·C(h)| ≤ ‖h‖ ‖C(h)‖ ≤ ‖h‖ ‖C‖ ‖h‖ = ‖C‖ ‖h‖²,

we have lim_{h→0} ϕ(h)/‖h‖ = 0. This proves f to be differentiable at a ∈ Rⁿ with derivative Df(a)(x) = 2C(a)·x. The Jacobi matrix is a row vector, f′(a) = 2C(a)ᵀ.
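The derivative Df(a) = 2C(a) of the quadratic form can be checked numerically against central difference quotients, here for a randomly chosen symmetric C (the matrix and the point a are arbitrary test data).

```python
import numpy as np

# f(x) = x . C x with symmetric C; the derivative is Df(a)(h) = (2 C a) . h.
rng = np.random.default_rng(0)
B = rng.normal(size=(4, 4))
C = (B + B.T) / 2                     # symmetrize
a = rng.normal(size=4)

def f(x):
    return x @ C @ x

grad_exact = 2 * C @ a
eps = 1e-6
grad_num = np.array([(f(a + eps * e) - f(a - eps * e)) / (2 * eps)
                     for e in np.eye(4)])
print(np.max(np.abs(grad_num - grad_exact)))  # tiny
```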
7.2.1 Basic Theorems
Lemma 7.5 Let f : U → Rᵐ be differentiable at x; then f is continuous at x.
Proof. Define ϕ_x(h) as in Remark 7.2 with Df(x) = A; then

lim_{h→0} ‖ϕ_x(h)‖ = 0

since f is differentiable at x. Since A is continuous by Proposition 7.1, lim_{h→0} A(h) = A(0) = 0. This gives

lim_{h→0} f(x + h) = f(x) + lim_{h→0} A(h) + lim_{h→0} ϕ_x(h) = f(x).

This shows continuity of f at x.
Proposition 7.6 Let f : U → Rᵐ, f(x) = (f₁(x), …, f_m(x)), be differentiable at x ∈ U. Then all partial derivatives ∂f_i(x)/∂x_j, i = 1, …, m, j = 1, …, n, exist and the Jacobi matrix f′(x) ∈ R^{m×n} has the form

(a_ij) = f′(x) = ( ∂f₁/∂x₁ … ∂f₁/∂xₙ ; … ; ∂f_m/∂x₁ … ∂f_m/∂xₙ )(x) = ( ∂f_i(x)/∂x_j ),  i = 1, …, m, j = 1, …, n.   (7.23)
Notation. (a) For the Jacobi matrix we also use the notation

f′(x) = ∂(f₁, …, f_m)/∂(x₁, …, xₙ) (x).

(b) In case n = m the determinant det(f′(x)) of the Jacobi matrix is called the Jacobian or functional determinant of f at x. It is denoted by

det(f′(x)) = ∂(f₁, …, fₙ)/∂(x₁, …, xₙ) (x).
Proof. Inserting h = t e_j = (0, …, t, …, 0) into (7.22) (see Remark 7.3), we have, since ‖h‖ = |t| and h_k = t δ_kj, for all i = 1, …, m

0 = lim_{t→0} |f_i(x + t e_j) − f_i(x) − Σ_{k=1}^n a_ik h_k| / ‖t e_j‖
 = lim_{t→0} |f_i(x₁, …, x_j + t, …, xₙ) − f_i(x) − t a_ij| / |t|
 = | lim_{t→0} (f_i(x₁, …, x_j + t, …, xₙ) − f_i(x))/t − a_ij |
 = | ∂f_i(x)/∂x_j − a_ij |.

Hence a_ij = ∂f_i(x)/∂x_j.
Hyper Planes
A plane in R³ is the set H = {(x₁, x₂, x₃) ∈ R³ | a₁x₁ + a₂x₂ + a₃x₃ = a₄}, where a_i ∈ R, i = 1, …, 4. The vector a = (a₁, a₂, a₃) is the normal vector to H; a is orthogonal to any vector x − x′, x, x′ ∈ H. Indeed, a·(x − x′) = a·x − a·x′ = a₄ − a₄ = 0.
The plane H is 2-dimensional since H can be written with two parameters α₁, α₂ ∈ R as (x₁⁰, x₂⁰, x₃⁰) + α₁v₁ + α₂v₂, where (x₁⁰, x₂⁰, x₃⁰) is some point in H and v₁, v₂ ∈ R³ are independent vectors spanning H.
This concept can be generalized to Rⁿ. A hyper plane in Rⁿ is the set of points

H = {(x₁, …, xₙ) ∈ Rⁿ | a₁x₁ + a₂x₂ + ··· + aₙxₙ = a_{n+1}},

where a₁, …, a_{n+1} ∈ R. The vector a = (a₁, …, aₙ) ∈ Rⁿ is called the normal vector to the hyper plane H. Note that a is unique only up to scalar multiples. A hyper plane in Rⁿ is of dimension n − 1 since there are n − 1 linearly independent vectors v₁, …, v_{n−1} ∈ Rⁿ and a point h ∈ H such that

H = {h + α₁v₁ + ··· + α_{n−1}v_{n−1} | α₁, …, α_{n−1} ∈ R}.
Example 7.7 (a) Special case m = 1; let f : U → R be differentiable. Then

f′(x) = (∂f/∂x₁ (x), …, ∂f/∂xₙ (x)) = grad f(x).

It is a row vector and gives a linear functional on Rⁿ which linearly associates to each vector y = (y₁, …, yₙ)ᵀ ∈ Rⁿ a real number

Df(x)(y) = grad f(x)·y = Σ_{j=1}^n f_{x_j}(x) y_j.
In particular, by Remark 7.3 (b), the equation of the linearization of f at a (the touching hyper plane) is

x_{n+1} = L(x) = f(a) + grad f(a)·(x − a)
x_{n+1} = f(a) + (f_{x₁}(a), …, f_{xₙ}(a))·(x₁ − a₁, …, xₙ − aₙ)
x_{n+1} = f(a) + Σ_{j=1}^n f_{x_j}(a)(x_j − a_j)
0 = Σ_{j=1}^n (−f_{x_j}(a))(x_j − a_j) + 1·(x_{n+1} − f(a))
0 = ñ·(x̃ − ã),

where x̃ = (x₁, …, xₙ, x_{n+1}), ã = (a₁, …, aₙ, f(a)), and ñ = (−grad f(a), 1) ∈ R^{n+1} is the normal vector to the hyper plane at ã.
(b) Special case n = 1; let f : (a, b) → Rᵐ, f = (f₁, …, f_m), be differentiable. Then f is a curve in Rᵐ with initial point f(a) and end point f(b). f′(t) = (f₁′(t), …, f_m′(t))ᵀ ∈ R^{m×1} is the Jacobi matrix of f at t (a column vector). It is the tangent vector to the curve f at t ∈ (a, b).
(c) Let f : R³ → R² be given by

f(x, y, z) = (f₁, f₂) = (x³ − 3xy² + z, sin(xyz²)).

Then

f′(x, y, z) = ∂(f₁, f₂)/∂(x, y, z) = ( 3x² − 3y²  −6xy  1 ; yz² cos(xyz²)  xz² cos(xyz²)  2xyz cos(xyz²) ).

The linearization of f at (a, b, c) is

L(x, y, z) = f(a, b, c) + f′(a, b, c)·(x − a, y − b, z − c).
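The Jacobi matrix of this example is easy to reproduce with sympy's `jacobian`:

```python
import sympy as sp

# Jacobi matrix of f(x, y, z) = (x^3 - 3xy^2 + z, sin(xyz^2)).
x, y, z = sp.symbols('x y z')
F = sp.Matrix([x**3 - 3*x*y**2 + z, sp.sin(x * y * z**2)])
J = F.jacobian([x, y, z])
print(J)
```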
Remark 7.4 Note that the existence of all partial derivatives does not imply the existence of f′(a). Recall Example 7.1 (d): there we gave a function having partial derivatives at the origin that is not continuous at (0, 0), and hence not differentiable at (0, 0). However, the next proposition shows that the converse is true provided all partial derivatives are continuous.
Proposition 7.7 Let f : U → Rm be continuously partial differentiable at a point a ∈ U.
Then f is differentiable at a and f ′ : U → L(Rn , Rm ) is continuous at a.
The proof in case n = 2, m = 1 is in the appendix to this chapter.
Theorem 7.8 (Chain Rule) If f : Rn → Rm is differentiable at a point a and g : Rm → Rp is
differentiable at b = f (a), then the composition k = g ◦f : Rn → Rp is differentiable at a and
Dk(a) = Dg(b)◦Df (a).
(7.24)
Using Jacobi matrices, this can be written as
k ′ (a) = g ′ (b) · f ′ (a).
(7.25)
Proof. Let A = Df(a), B = Dg(b), and y = f(x). Defining functions ϕ, ψ, and ρ by

ϕ(x) = f(x) − f(a) − A(x − a),   (7.26)
ψ(y) = g(y) − g(b) − B(y − b),   (7.27)
ρ(x) = g∘f(x) − g∘f(a) − B∘A(x − a),   (7.28)

then

lim_{x→a} ‖ϕ(x)‖/‖x − a‖ = 0,  lim_{y→b} ‖ψ(y)‖/‖y − b‖ = 0,   (7.29)

and we have to show that

lim_{x→a} ‖ρ(x)‖/‖x − a‖ = 0.
Inserting (7.26) and (7.27) we find

ρ(x) = g(f(x)) − g(f(a)) − BA(x − a) = g(f(x)) − g(f(a)) − B(f(x) − f(a) − ϕ(x))
= [g(f(x)) − g(f(a)) − B(f(x) − f(a))] + B∘ϕ(x)
= ψ(f(x)) + B(ϕ(x)).

Using ‖T(x)‖ ≤ ‖T‖ ‖x‖ (see Proposition 7.1), this shows

‖ρ(x)‖/‖x − a‖ ≤ ‖ψ(f(x))‖/‖x − a‖ + ‖B∘ϕ(x)‖/‖x − a‖ ≤ (‖ψ(y)‖/‖y − b‖)·(‖f(x) − f(a)‖/‖x − a‖) + ‖B‖ (‖ϕ(x)‖/‖x − a‖).

Inserting (7.26) again into the above inequality, we continue

= (‖ψ(y)‖/‖y − b‖)·(‖ϕ(x) + A(x − a)‖/‖x − a‖) + ‖B‖ (‖ϕ(x)‖/‖x − a‖)
≤ (‖ψ(y)‖/‖y − b‖)·(‖ϕ(x)‖/‖x − a‖ + ‖A‖) + ‖B‖ (‖ϕ(x)‖/‖x − a‖).

All terms on the right side tend to 0 as x approaches a. This completes the proof.
Remarks 7.5 (a) The chain rule in coordinates. If A = f′(a), B = g′(f(a)), and C = k′(a), then A ∈ R^{m×n}, B ∈ R^{p×m}, C ∈ R^{p×n}, and

∂(k₁, …, k_p)/∂(x₁, …, xₙ) = ∂(g₁, …, g_p)/∂(y₁, …, y_m) ∘ ∂(f₁, …, f_m)/∂(x₁, …, xₙ),   (7.30)

∂k_r/∂x_j (a) = Σ_{i=1}^m ∂g_r/∂y_i (f(a)) ∂f_i/∂x_j (a),  r = 1, …, p, j = 1, …, n.   (7.31)

(b) In particular, in case p = 1, k(x) = g(f(x)), we have

∂k/∂x_j = (∂g/∂y₁)(∂f₁/∂x_j) + ··· + (∂g/∂y_m)(∂f_m/∂x_j).
Example 7.8 (a) Let f(u, v) = uv, u = g(x, y) = x² + y², v = h(x, y) = xy, and z = f(g(x, y), h(x, y)) = (x² + y²)xy = x³y + xy³. Then

∂z/∂x = (∂f/∂u)(∂g/∂x) + (∂f/∂v)(∂h/∂x) = v·2x + u·y = 2x²y + y(x² + y²),

so ∂z/∂x = 3x²y + y³.
(b) Let f(u, v) = u^v, u(t) = v(t) = t. Then F(t) = f(u(t), v(t)) = tᵗ and

F′(t) = (∂f/∂u) u′(t) + (∂f/∂v) v′(t) = v u^{v−1}·1 + u^v log u·1 = t·t^{t−1} + tᵗ log t = tᵗ(log t + 1).
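sympy reproduces this chain-rule computation directly:

```python
import sympy as sp

# Check Example 7.8 (b): d/dt t**t = t**t * (log t + 1).
t = sp.symbols('t', positive=True)
Fp = sp.diff(t**t, t)
print(sp.simplify(Fp))  # t**t*(log(t) + 1)
```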
7.3 Taylor’s Formula
The gradient of f gives an approximation of a scalar function f by a linear functional. Taylor’s formula generalizes this concept of approximation to higher order. We consider quadratic approximations of f to determine local extrema. Throughout this section we refer to the euclidean norm ‖x‖ = ‖x‖₂ = √(x₁² + ··· + xₙ²).
7.3.1 Directional Derivatives
Definition 7.7 Let f : U → R be a function, a ∈ U, and e ∈ Rⁿ a unit vector, ‖e‖ = 1. The directional derivative of f at a in the direction of the unit vector e is the limit

(D_e f)(a) = lim_{t→0} (f(a + te) − f(a))/t.   (7.32)

Note that for e = e_j we have D_e f = D_j f = ∂f/∂x_j.
Proposition 7.9 Let f : U → R be continuously differentiable. Then for every a ∈ U and every unit vector e ∈ Rⁿ, ‖e‖ = 1, we have

D_e f(a) = e·grad f(a).   (7.33)

Proof. Define g : R → Rⁿ by g(t) = a + te = (a₁ + te₁, …, aₙ + teₙ). For sufficiently small t ∈ R, say |t| ≤ ε, the composition k = f∘g,

R →(g) Rⁿ →(f) R,  k(t) = f(g(t)) = f(a₁ + te₁, …, aₙ + teₙ),

is defined. We compute k′(t) using the chain rule:

k′(t) = Σ_{j=1}^n ∂f/∂x_j (a + te) g_j′(t).

Since g_j′(t) = (a_j + te_j)′ = e_j and g(0) = a, it follows that

k′(t) = Σ_{j=1}^n ∂f/∂x_j (a + te) e_j,  k′(0) = Σ_{j=1}^n f_{x_j}(a) e_j = grad f(a)·e.   (7.34)

On the other hand, by definition of the directional derivative,

k′(0) = lim_{t→0} (k(t) − k(0))/t = lim_{t→0} (f(a + te) − f(a))/t = D_e f(a).

This completes the proof.
Remark 7.6 (Geometric meaning of grad f) Suppose that grad f(a) ≠ 0 and let e be a normed vector, ‖e‖ = 1. Varying e, D_e f(a) = e·grad f(a) becomes maximal if and only if e and ∇f(a) have the same direction. Hence the vector grad f(a) points in the direction of maximal slope of f at a. Similarly, −grad f(a) is the direction of maximal decline.
For example, f(x, y) = √(1 − x² − y²) has

grad f(x, y) = (−x/√(1 − x² − y²), −y/√(1 − x² − y²)).

The maximal slope of f at (x, y) is in the direction e = (−x, −y)/√(x² + y²). In this case, the tangent line to the graph points to the z-axis and has maximal slope.
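Formula (7.33) can be checked numerically: the difference quotient of f along a unit vector e should approach e·grad f(a). The test function f(x, y) = x²y and the point and direction below are arbitrary choices for illustration.

```python
import numpy as np

# D_e f(a) = e . grad f(a), checked against a difference quotient.
def f(p):
    x, y = p
    return x**2 * y

a = np.array([1.0, 2.0])
grad = np.array([2 * a[0] * a[1], a[0] ** 2])   # grad f = (2xy, x^2)
e = np.array([3.0, 4.0]) / 5.0                  # a unit vector
t = 1e-6
De_num = (f(a + t * e) - f(a)) / t
print(De_num, e @ grad)   # both approximately 3.2
```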
Corollary 7.10 Let f : U → R be k-times continuously differentiable, a ∈ U, and x ∈ Rⁿ such that the whole segment a + tx, t ∈ [0, 1], is contained in U. Then the function h : [0, 1] → R, h(t) = f(a + tx), is k-times continuously differentiable, with

h^{(k)}(t) = Σ_{i₁,…,i_k=1}^n D_{i_k} ··· D_{i_1} f(a + tx) x_{i₁} ··· x_{i_k}.   (7.35)

In particular,

h^{(k)}(0) = Σ_{i₁,…,i_k=1}^n D_{i_k} ··· D_{i_1} f(a) x_{i₁} ··· x_{i_k}.   (7.36)
Proof. The proof is by induction on k. For k = 1 it is exactly the statement of the Proposition. We demonstrate the step from k = 1 to k = 2. By (7.34),

h″(t) = Σ_{i₁=1}^n d/dt ( ∂f/∂x_{i₁} (a + tx) ) x_{i₁} = Σ_{i₁=1}^n Σ_{i₂=1}^n ∂²f/(∂x_{i₂}∂x_{i₁}) (a + tx) x_{i₂} x_{i₁}.

In the second equality we applied the chain rule to h̃(t) = f_{x_{i₁}}(a + tx).
For brevity we use the following notation for the term on the right of (7.36):

(x∇)^k f(a) = Σ_{i₁,…,i_k=1}^n x_{i₁} ··· x_{i_k} D_{i_k} ··· D_{i_1} f(a).

In particular, (x∇)f(a) = x₁f_{x₁}(a) + x₂f_{x₂}(a) + ··· + xₙf_{xₙ}(a) and (x∇)² f(a) = Σ_{i,j=1}^n x_i x_j ∂²f/(∂x_i ∂x_j)(a).
7.3.2 Taylor’s Formula
Theorem 7.11 Let f ∈ C^{k+1}(U), a ∈ U, and x ∈ Rⁿ such that a + tx ∈ U for all t ∈ [0, 1]. Then there exists θ ∈ [0, 1] such that

f(a + x) = Σ_{m=0}^k (1/m!) (x∇)^m f(a) + (1/(k+1)!) (x∇)^{k+1} f(a + θx),   (7.37)

that is,

f(a + x) = f(a) + Σ_{i=1}^n x_i f_{x_i}(a) + (1/2!) Σ_{i,j=1}^n x_i x_j f_{x_i x_j}(a) + ··· + (1/(k+1)!) Σ_{i₁,…,i_{k+1}} x_{i₁} ··· x_{i_{k+1}} f_{x_{i₁}···x_{i_{k+1}}}(a + θx).

The expression R_{k+1}(a, x) = (1/(k+1)!) (x∇)^{k+1} f(a + θx) is called the Lagrange remainder term.
Proof. Consider the function h : [0, 1] → R, h(t) = f(a + tx). By Corollary 7.10, h is (k+1)-times continuously differentiable. By Taylor’s theorem for functions in one variable (Theorem 4.15 with x = 1 and a = 0 therein), we have

f(a + x) = h(1) = Σ_{m=0}^k h^{(m)}(0)/m! + h^{(k+1)}(θ)/(k+1)!.

By Corollary 7.10, for m = 1, …, k we have

h^{(m)}(0)/m! = (1/m!) (x∇)^m f(a)

and

h^{(k+1)}(θ)/(k+1)! = (1/(k+1)!) (x∇)^{k+1} f(a + θx);

the assertion follows.
It is often convenient to substitute x := x + a. Then the Taylor expansion reads

f(x) = Σ_{m=0}^k (1/m!) ((x − a)∇)^m f(a) + (1/(k+1)!) ((x − a)∇)^{k+1} f(a + θ(x − a)),

that is,

f(x) = f(a) + Σ_{i=1}^n (x_i − a_i) f_{x_i}(a) + (1/2!) Σ_{i,j=1}^n (x_i − a_i)(x_j − a_j) f_{x_i x_j}(a) + ··· + (1/(k+1)!) Σ_{i₁,…,i_{k+1}} (x_{i₁} − a_{i₁}) ··· (x_{i_{k+1}} − a_{i_{k+1}}) f_{x_{i₁}···x_{i_{k+1}}}(a + θ(x − a)).
We write the Taylor formula for the case n = 2, k = 3:

f(a + x, b + y) = f(a, b) + (f_x(a, b)x + f_y(a, b)y)
+ (1/2!) (f_xx(a, b)x² + 2f_xy(a, b)xy + f_yy(a, b)y²)
+ (1/3!) (f_xxx(a, b)x³ + 3f_xxy(a, b)x²y + 3f_xyy(a, b)xy² + f_yyy(a, b)y³) + R₄(a, x).

If f ∈ ⋂_{k=0}^∞ C^k(U) = {f | f ∈ C^k(U) for all k ∈ N} and lim_{k→∞} R_k(a, x) = 0 for all x ∈ U, then

f(x) = Σ_{m=0}^∞ (1/m!) ((x − a)∇)^m f(a).

The r.h.s. is called the Taylor series of f at a.
Example 7.9 (a) We compute the Taylor expansion of f(x, y) = cos x sin y at (0, 0) to the third order. We have

f_x = −sin x sin y,  f_x(0, 0) = 0,  f_y = cos x cos y,  f_y(0, 0) = 1,
f_xx = −cos x sin y,  f_xx(0, 0) = 0,  f_xy = −sin x cos y,  f_xy(0, 0) = 0,
f_yy = −cos x sin y,  f_yy(0, 0) = 0,
f_xxy = −cos x cos y,  f_xxy(0, 0) = −1,  f_yyy = −cos x cos y,  f_yyy(0, 0) = −1,
f_xyy(0, 0) = f_xxx(0, 0) = 0.

Inserting this gives

f(x, y) = y + (1/3!)(−3x²y − y³) + R₄(x, y; 0).

The same result can be obtained by multiplying the Taylor series for cos x and sin y:

(1 − x²/2 + x⁴/4! ∓ ···)(y − y³/3! ± ···) = y − x²y/2 − y³/6 + ···.
(b) The Taylor series of f(x, y) = e^{xy²} at (0, 0) is

Σ_{n=0}^∞ (xy²)ⁿ/n! = 1 + xy² + (1/2) x²y⁴ + ···;

it converges all over R² to f(x, y).
Corollary 7.12 (Mean Value Theorem) Let f : U → R be continuously differentiable, a ∈ U, x ∈ Rⁿ such that a + tx ∈ U for all t ∈ [0, 1]. Then there exists θ ∈ [0, 1] such that

f(a + x) − f(a) = ∇f(a + θx)·x,  f(y) − f(x) = ∇f((1 − θ)x + θy)·(y − x).   (7.38)

This is the special case of Taylor’s formula with k = 0.
Corollary 7.13 Let f : U → R be k-times continuously differentiable, a ∈ U, x ∈ Rⁿ such that a + tx ∈ U for all t ∈ [0, 1]. Then there exists ϕ : U → R such that

f(a + x) = Σ_{m=0}^k (1/m!) (x∇)^m f(a) + ϕ(x),   (7.39)

where lim_{x→0} ϕ(x)/‖x‖^k = 0.

Proof. By Taylor’s theorem for f ∈ C^k(U), there exists θ ∈ [0, 1] such that

f(a + x) = Σ_{m=0}^{k−1} (1/m!) (x∇)^m f(a) + (1/k!) (x∇)^k f(a + θx) = Σ_{m=0}^{k} (1/m!) (x∇)^m f(a) + ϕ(x).

This implies

ϕ(x) = (1/k!) ( (x∇)^k f(a + θx) − (x∇)^k f(a) ).

Since |x_{i₁} ··· x_{i_k}| ≤ ‖x‖ ··· ‖x‖ = ‖x‖^k for x ≠ 0,

|ϕ(x)|/‖x‖^k ≤ (1/k!) Σ_{i₁,…,i_k} | D_{i_k} ··· D_{i_1} f(a + θx) − D_{i_k} ··· D_{i_1} f(a) |.

Since all kth partial derivatives of f are continuous,

D_{i_k} ··· D_{i_1} f(a + θx) − D_{i_k} ··· D_{i_1} f(a) → 0 as x → 0.

This proves the claim.
Remarks 7.7 (a) With the above notations let

P_m(x) = (1/m!) ((x − a)∇)^m f(a).

Then P_m is a polynomial of degree m in the set of variables x = (x₁, …, xₙ) and we have

f(x) = Σ_{m=0}^k P_m(x) + ϕ(x),  lim_{x→a} |ϕ(x)|/‖x − a‖^k = 0.

Let us consider in more detail the cases m = 0, 1, 2.
Case m = 0.

P₀(x) = f(a);

P₀ is the constant polynomial with value f(a).
Case m = 1. We have

P₁(x) = Σ_{j=1}^n f_{x_j}(a)(x_j − a_j) = grad f(a)·(x − a).
Using Corollary 7.13, the first order approximation of a continuously differentiable function is

f(x) = f(a) + grad f(a)·(x − a) + ϕ(x),  lim_{x→a} ϕ(x)/‖x − a‖ = 0.

The linearization of f at a is L(x) = P₀(x) + P₁(x).
Case m = 2.

P₂(x) = (1/2) Σ_{i,j=1}^n f_{x_i x_j}(a)(x_i − a_i)(x_j − a_j).   (7.40)

Hence, P₂(x) is quadratic with the corresponding matrix ((1/2) f_{x_i x_j}(a)). As a special case of Corollary 7.13 (k = 2) we have for f ∈ C²(U)

f(a + x) = f(a) + grad f(a)·x + (1/2) xᵀ·Hess f(a)·x + ϕ(x),  lim_{x→0} ϕ(x)/‖x‖² = 0,   (7.41)

where

(Hess f)(a) = ( f_{x_i x_j}(a) )_{i,j=1}^n   (7.42)

is called the Hessian matrix of f at a ∈ U. The Hessian matrix is symmetric by Schwarz’ lemma.
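sympy provides the matrix (7.42) directly via `hessian`; here is a sketch on the sample function f = e^{x₁} cos x₂ (an arbitrary choice), whose Hessian at the origin is diag(1, −1) and which is symmetric, as Schwarz's lemma requires.

```python
import sympy as sp

# Hessian matrix of f = exp(x1)*cos(x2); at 0 it is diag(1, -1).
x1, x2 = sp.symbols('x1 x2')
f = sp.exp(x1) * sp.cos(x2)
H = sp.hessian(f, (x1, x2))
H0 = H.subs({x1: 0, x2: 0})
print(H0)  # Matrix([[1, 0], [0, -1]])
```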
7.4 Extrema of Functions of Several Variables
Definition 7.8 Let f : U → R be a function. The point x ∈ U is called a local maximum (minimum) of f if there exists a neighborhood U_ε(x) ⊂ U of x such that

f(x) ≥ f(y)  (f(x) ≤ f(y))  for all y ∈ U_ε(x).

A local extremum is either a local maximum or a local minimum.

Proposition 7.14 Let f : U → R be partial differentiable. If f has a local extremum at x ∈ U, then grad f(x) = 0.
Proof. For i = 1, …, n consider the function

g_i(t) = f(x + t e_i).

This is a differentiable function of one variable, defined on a certain interval (−ε, ε). If f has an extremum at x, then g_i has an extremum at t = 0. By Proposition 4.7,

g_i′(0) = 0.

Since g_i′(0) = lim_{t→0} (f(x + t e_i) − f(x))/t = f_{x_i}(x) and i was arbitrary, it follows that

grad f(x) = (D₁f(x), …, Dₙf(x)) = 0.
Example 7.10 Let f(x, y) = √(1 − x² − y²) be defined on the open unit disc U = {(x, y) ∈ R² | x² + y² < 1}. Then grad f(x, y) = (−x, −y)/√(1 − x² − y²) = 0 if and only if x = y = 0. If f has an extremum in U, it is at the origin. Obviously, f(x, y) = √(1 − x² − y²) ≤ 1 = f(0, 0) for all points in U, so that f attains its global (and local) maximum at (0, 0).
To obtain a sufficient criterion for the existence of local extrema we have to consider the Hessian matrix. Beforehand, we need some facts from Linear Algebra.
Definition 7.9 Let A ∈ R^{n×n} be a real, symmetric n×n matrix, that is, a_ij = a_ji for all i, j = 1, …, n. The associated quadratic form

Q(x) = Σ_{i,j=1}^n a_ij x_i x_j = xᵀ·A·x

is called

positive definite if Q(x) > 0 for all x ≠ 0,
negative definite if Q(x) < 0 for all x ≠ 0,
indefinite if Q(x) > 0, Q(y) < 0 for some x, y,
positive semidefinite if Q(x) ≥ 0 for all x and Q is not positive definite,
negative semidefinite if Q(x) ≤ 0 for all x and Q is not negative definite.

Also, we say that the corresponding matrix A is positive definite if Q(x) is.
Example 7.11 Let n = 2, Q(x) = Q(x₁, x₂). Then Q₁(x) = 3x₁² + 7x₂² is positive definite, Q₂(x) = −x₁² − 2x₂² is negative definite, Q₃(x) = x₁² − 2x₂² is indefinite, Q₄(x) = x₁² is positive semidefinite, and Q₅(x) = −x₂² is negative semidefinite.
Proposition 7.15 (Sylvester) Let A be a real symmetric n×n matrix and Q(x) = x·Ax the corresponding quadratic form. For k = 1, …, n let

A_k = (a₁₁ … a₁ₖ; … ; aₖ₁ … aₖₖ),  D_k = det A_k.

Let λ₁, …, λₙ be the eigenvalues of A. Then
(a) Q is positive definite if and only if λ₁ > 0, λ₂ > 0, …, λₙ > 0. This is the case if and only if D₁ > 0, D₂ > 0, …, Dₙ > 0.
(b) Q(x) is negative definite if and only if λ₁ < 0, λ₂ < 0, …, λₙ < 0. This is the case if and only if (−1)^k D_k > 0 for all k = 1, …, n.
(c) Q(x) is indefinite if and only if A has both positive and negative eigenvalues.
Example 7.12 Case n = 2. Let A = (a b; b c) ∈ R^{2×2} be a symmetric matrix. By Sylvester’s criterion A is
(a) positive definite if and only if det A > 0 and a > 0,
(b) negative definite if and only if det A > 0 and a < 0,
(c) indefinite if and only if det A < 0,
(d) semidefinite if and only if det A = 0.
Proposition 7.16 Let f : U → R be twice continuously differentiable and let grad f(a) = 0 at some point a ∈ U.
(a) If Hess f(a) is positive definite, then f has a local minimum at a.
(b) If Hess f(a) is negative definite, then f has a local maximum at a.
(c) If Hess f(a) is indefinite, then f has no local extremum at a.
Note that in general there is no information about a if Hess f(a) is semidefinite.
Proof. By (7.41) and since grad f(a) = 0,

f(a + x) = f(a) + (1/2) x·A(x) + ϕ(x),  lim_{x→0} ϕ(x)/‖x‖² = 0,   (7.43)

where A = Hess f(a).
(a) Let A be positive definite. Since the unit sphere S = {x ∈ Rⁿ | ‖x‖ = 1} is compact (closed and bounded) and the map Q(x) = x·A(x) is continuous, Q attains its minimum, say m, on S; see Proposition 6.24:

m = min{x·A(x) | x ∈ S}.

Since Q is positive definite and 0 ∉ S, m > 0. If x is nonzero, y = x/‖x‖ ∈ S and therefore

m ≤ y·A(y) = (x/‖x‖)·A(x/‖x‖) = (1/‖x‖²) x·A(x).

This implies Q(x) = x·A(x) ≥ m‖x‖² for all x.
Since ϕ(x)/‖x‖² → 0 as x → 0, there exists δ > 0 such that ‖x‖ < δ implies

−(m/4)‖x‖² ≤ ϕ(x) ≤ (m/4)‖x‖².

From (7.43) it follows that

f(a + x) = f(a) + (1/2) Q(x) + ϕ(x) ≥ f(a) + (m/2)‖x‖² − (m/4)‖x‖² ≥ f(a) + (m/4)‖x‖²,

hence

f(a + x) > f(a)  if 0 < ‖x‖ < δ,

and f has a strict (isolated) local minimum at a.
(b) If A = Hess f(a) is negative definite, consider −f in place of f and apply (a).
(c) Let A = Hess f(a) be indefinite. We have to show that in every neighborhood of a there exist x′ and x″ such that f(x″) < f(a) < f(x′). Since A is indefinite, there is a vector x ∈ Rⁿ \ 0 such that x·A(x) = m > 0. Then for small t we have

f(a + tx) = f(a) + (1/2) tx·A(tx) + ϕ(tx) = f(a) + (m/2) t² + ϕ(tx).

If t is small enough, −(m/4)t² ≤ ϕ(tx) ≤ (m/4)t², hence

f(a + tx) > f(a)  if 0 < |t| < δ.

Similarly, if y ∈ Rⁿ \ 0 satisfies y·A(y) < 0, for sufficiently small t we have f(a + ty) < f(a).
Example 7.13 (a) f(x, y) = x² + y². Here ∇f = (2x, 2y) = 0 if and only if x = y = 0. Furthermore,

Hess f(x, y) = (2 0; 0 2)

is positive definite. f has a (strict) local minimum at (0, 0).
(b) Find the local extrema of z = f(x, y) = 4x² − y² on R² (the graph is a hyperbolic paraboloid). We find that the necessary condition ∇f = 0 implies f_x = 8x = 0, f_y = −2y = 0; thus x = y = 0. Further,

Hess f(x, y) = (f_xx f_xy; f_yx f_yy) = (8 0; 0 −2).

The Hessian matrix at (0, 0) is indefinite; the function has no extremum at the origin (0, 0).
(c) f(x, y) = x² + y³. ∇f(x, y) = (2x, 3y²) vanishes if and only if x = y = 0. Furthermore,

Hess f(0, 0) = (2 0; 0 0)

is positive semidefinite. However, there is no local extremum at the origin since f(ε, 0) = ε² > 0 = f(0, 0) > −ε³ = f(0, −ε).
(d) f(x, y) = x² + y⁴. Again the Hessian matrix at (0, 0) is positive semidefinite. However, (0, 0) is a strict local minimum.
Local and Global Extrema
To compute the global extrema of a function f : U̅ → R, where U ⊂ Rⁿ is open and U̅ is the closure of U, we go along the following lines:
(a) Compute the local extrema on U;
(b) Compute the global extrema on the boundary ∂U = U̅ ∩ Uᶜ;
(c) If U is unbounded without boundary (as U = Rⁿ), consider the limits at infinity.
Note that

sup_{x∈U̅} f(x) = max{maximum of all local maxima in U, sup_{x∈∂U} f(x)}.

To compute the global extremum of f on the boundary one has to find the local extrema in the interior points of the boundary and to compare them with the values on the boundary of the boundary.
Example 7.14 Find the global extrema of f (x, y) = x²y on Ū = {(x, y) ∈ R² | x² + y² ≤ 1}
(where U is the open unit disc).
Since grad f = (fx, fy) = (2xy, x²), local extrema can appear only on the y-axis: x = 0, y
arbitrary. The Hessian matrix at (0, y) is
Hess f (0, y) = ( fxx fxy ; fxy fyy )|x=0 = ( 2y 2x ; 2x 0 )|x=0 = ( 2y 0 ; 0 0 ).
This matrix is positive semidefinite in case y > 0, negative semidefinite in case y < 0 and 0 at
(0, 0). Hence, the above criterion gives no answer. We have to apply the definition directly. In
case y > 0 we have f (x, y) = x2 y ≥ 0 for all x. In particular f (x, y) ≥ f (0, y) = 0. Hence
(0, y) is a local minimum. Similarly, in case y < 0, f (x, y) ≤ f (0, y) = 0 for all x. Hence, f
has a local maximum at (0, y), y < 0. However f takes both positive and negative values in a
neighborhood of (0, 0), for example f (ε, ε) = ε3 and f (ε, −ε) = −ε3 . Thus (0, 0) is not a local
extremum.
We have to consider the boundary x² + y² = 1. Inserting x² = 1 − y² we obtain
g(y) = f (x, y)|x²+y²=1 = (1 − y²)y = y − y³,  | y | ≤ 1.
We compute the local extrema of g on the circle x² + y² = 1 (note that the circle has no boundary,
such that the local extrema are actually the global extrema):
g′(y) = 1 − 3y² = 0,  hence  | y | = 1/√3.
Since g″(1/√3) < 0 and g″(−1/√3) > 0, g attains its maximum 2/(3√3) at y = 1/√3. Since this
is greater than the value 0 of the local maxima of f at (0, y), y < 0, f attains its global maximum at the two
points
M1,2 = ( ±√(2/3), 1/√3 ),
where f (M1,2) = 2/(3√3). g attains its minimum −2/(3√3) at y = −1/√3. Since this is less
than the value 0 of the local minima of f at (0, y), y > 0, f attains its global minimum at the two points
m1,2 = ( ±√(2/3), −1/√3 ),
where f (m1,2) = −2/(3√3).
The arithmetic-geometric mean inequality shows the same result for x, y > 0:
1/3 = (x² + y²)/3 = ( x²/2 + x²/2 + y² )/3 ≥ ( (x²/2)·(x²/2)·y² )^(1/3)  =⇒  x²y ≤ 2/(3√3).
(b) Among all boxes with volume 1 find the one for which the sum of the lengths of the 12 edges is
minimal.
Let x, y and z denote the lengths of the three perpendicular edges at one vertex. By assumption
xyz = 1, and g(x, y, z) = 4(x + y + z) is the function to minimize.
7 Calculus of Functions of Several Variables
Local Extrema. Inserting the constraint z = 1/(xy) we have to minimize
f (x, y) = 4( x + y + 1/(xy) )  on  U = {(x, y) | x > 0, y > 0}.
The necessary conditions are
fx = 4( 1 − 1/(x²y) ) = 0,  fy = 4( 1 − 1/(xy²) ) = 0  =⇒  x²y = xy² = 1  =⇒  x = y = 1.
Further,
fxx = 8/(x³y),  fyy = 8/(xy³),  fxy = 4/(x²y²),
such that
det Hess f (1, 1) = det ( 8 4 ; 4 8 ) = 64 − 16 > 0;
hence f has an extremum at (1, 1). Since fxx(1, 1) = 8 > 0, f has a local minimum at (1, 1).
Global Extrema. We show that (1, 1) is even the global minimum on the first quadrant U.
Consider N = {(x, y) | 1/25 ≤ x, y ≤ 5}. If (x, y) ∉ N, then
f (x, y) ≥ 4(5 + 0 + 0) = 20.
Since f (1, 1) = 12 < 20, the global minimum of f on the first quadrant is attained
on the compact rectangle N. Inserting the four boundary edges x = 5, y = 5, x = 1/25, and y = 1/25,
we find in all cases f (x, y) ≥ 20, such that the local minimum (1, 1) is also the global minimum.
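That (1, 1) minimizes f on N can also be checked by a crude numerical grid search; this is only a sanity check of the argument above, not part of the proof:

```python
def f(x, y):
    # Sum of the 12 edge lengths of a box with volume 1 and base edges x, y.
    return 4 * (x + y + 1 / (x * y))

# Crude grid search over the rectangle N = [1/25, 5] x [1/25, 5].
n = 400
lo, hi = 1 / 25, 5.0
step = (hi - lo) / n
best = min(f(lo + i * step, lo + j * step)
           for i in range(n + 1) for j in range(n + 1))
assert 12.0 <= best < 12.01    # minimum f(1, 1) = 12, attained near (1, 1)
```

The lower bound best ≥ 12 also follows directly from the AM-GM inequality: x + y + 1/(xy) ≥ 3 (xy · 1/(xy))^(1/3) = 3.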
7.5 The Inverse Mapping Theorem
Suppose that f : R → R is differentiable on an open set U ⊂ R containing a ∈ U, and
f ′(a) ≠ 0. If f ′(a) > 0, then there is an open interval V ⊂ U containing a such that f ′(x) >
0 for all x ∈ V . Thus f is strictly increasing on V and therefore injective, with an inverse
function g defined on some open interval W containing f (a). Moreover, g is differentiable (see
Proposition 4.5) and g′(y) = 1/f ′(x) if f (x) = y. The analogous result in higher dimensions is
more involved, but it is very important.
[Figure: f maps an open neighborhood V ⊂ U of a in Rⁿ onto an open neighborhood W of f (a) in Rⁿ.]
Theorem 7.17 (Inverse Mapping Theorem) Suppose that f : Rⁿ → Rⁿ is continuously differentiable on an open set U containing a, and det f ′(a) ≠ 0. Then there is an open set V ⊂ U
containing a and an open set W containing f (a) such that f : V → W has a continuous inverse
g : W → V which is differentiable on W; for y = f (x) ∈ W we have
g′(y) = (f ′(x))⁻¹,  i.e.  Dg(y) = (Df (x))⁻¹.  (7.44)
For the proof see [Rud76, 9.24 Theorem] or [Spi65, 2-11].
Corollary 7.18 Let U ⊂ Rⁿ be open, f : U → Rⁿ continuously differentiable, and
det f ′(x) ≠ 0 for all x ∈ U. Then f (U) is open in Rⁿ.
Remarks 7.8 (a) One main part of the proof is to show that there is an open set V ⊂ U which is mapped
onto an open set W . In general, this is not true for continuous mappings. For example, sin x
maps the open interval (0, 2π) onto the closed set [−1, 1]. Note that sin x does not satisfy the
assumptions of the corollary since sin′(π/2) = cos(π/2) = 0.
(b) Note that continuity of f ′(x) in a neighborhood of a, continuity of the determinant mapping
det : Rⁿˣⁿ → R, and det f ′(a) ≠ 0 imply that det f ′(x) ≠ 0 in a neighborhood V1 of a, see
homework 10.4. This implies that the linear mapping Df (x) is invertible for x ∈ V1. Thus,
Df (x)⁻¹ and (f ′(x))⁻¹ exist for x ∈ V1; the linear mapping Df (x) is regular.
(c) Let us reformulate the statement of the theorem. Suppose
y1 = f1 (x1 , . . . , xn ),
y2 = f2 (x1 , . . . , xn ),
..
.
yn = fn (x1 , . . . , xn )
is a system of n equations in the n variables x1 , . . . , xn, where y1 , . . . , yn are given in a neighborhood W
of f (a). Under the assumptions of the theorem, there exists a unique solution x = g(y) of this
system of equations
x1 = g1 (y1 , . . . , yn ),
x2 = g2 (y1 , . . . , yn ),
..
.
xn = gn (y1, . . . , yn )
in a certain neighborhood (x1 , . . . , xn ) ∈ V of a. Note that the theorem states the existence of
such a solution. It doesn’t provide an explicit formula.
(d) Note that the inverse function g may exist even if det f ′(x) = 0. For example, f : R → R
defined by f (x) = x³ has f ′(0) = 0; however, g(y) = ∛y is inverse to f (x). One thing is
certain: if det f ′(a) = 0, then g cannot be differentiable at f (a). If g were differentiable at f (a),
the chain rule applied to g(f (x)) = x would give
g′(f (a)) · f ′(a) = id
and consequently
det g ′ (f (a)) det f ′ (a) = det id = 1
contradicting det f ′ (a) = 0.
(e) Note that the theorem states that under the given assumptions f is locally invertible. There
is no information about the existence of an inverse function g to f on a fixed open set. See
Example 7.15 (a) below.
Example 7.15 (a) Let x = r cos ϕ and y = r sin ϕ be the polar coordinates in R². More
precisely, let
f (r, ϕ) = ( r cos ϕ, r sin ϕ ) = (x, y),  f : R² → R².
The Jacobian is
∂(x, y)/∂(r, ϕ) = det ( xr xϕ ; yr yϕ ) = det ( cos ϕ  −r sin ϕ ; sin ϕ  r cos ϕ ) = r.
Let f (r0, ϕ0) = (x0, y0) ≠ (0, 0); then r0 ≠ 0 and the Jacobian of f at (r0, ϕ0) is non-zero.
Since all partial derivatives of f with respect to r and ϕ exist and are continuous on R², the
assumptions of the theorem are satisfied. Hence, in a neighborhood U of (x0, y0) there exists a
continuously differentiable inverse function r = r(x, y), ϕ = ϕ(x, y). In this case, the function
can be given explicitly: r = √(x² + y²), ϕ = arg(x, y). We want to compute the Jacobi matrix
of the inverse function. Since the inverse matrix is
( cos ϕ  −r sin ϕ ; sin ϕ  r cos ϕ )⁻¹ = ( cos ϕ  sin ϕ ; −(1/r) sin ϕ  (1/r) cos ϕ ),
we obtain by the theorem
g′(x, y) = ∂(r, ϕ)/∂(x, y) = ( cos ϕ  sin ϕ ; −(1/r) sin ϕ  (1/r) cos ϕ )
         = ( x/√(x² + y²)  y/√(x² + y²) ; −y/(x² + y²)  x/(x² + y²) );
in particular, the second row gives the partial derivatives of the argument function with respect
to x and y:
∂ arg(x, y)/∂x = −y/(x² + y²),  ∂ arg(x, y)/∂y = x/(x² + y²).
Note that we have not determined the explicit form of the argument function, which is not unique
since f (r, ϕ + 2kπ) = f (r, ϕ) for all k ∈ Z. However, the gradient always takes the above form.
Note that det f ′(r, ϕ) ≠ 0 for all r ≠ 0 is not sufficient for f to be injective on R² \ {(0, 0)}.
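Since g ∘ f = id implies g′(f (p)) · f ′(p) = I, the matrix computed above can be tested numerically: multiplying it by f ′(r, ϕ) must give the identity. A small Python sketch (function names are ours):

```python
import math

def jac_f(r, phi):
    # Jacobian of f(r, phi) = (r cos phi, r sin phi)
    return [[math.cos(phi), -r * math.sin(phi)],
            [math.sin(phi),  r * math.cos(phi)]]

def jac_g(x, y):
    # Claimed Jacobian of the inverse: rows are grad r and grad arg
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r, y / r],
            [-y / r2, x / r2]]

r, phi = 2.0, 0.7
x, y = r * math.cos(phi), r * math.sin(phi)
A, B = jac_g(x, y), jac_f(r, phi)
prod = [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)]
for i in range(2):
    for j in range(2):
        assert abs(prod[i][j] - (1.0 if i == j else 0.0)) < 1e-12
```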
(b) Let f : R2 → R2 be given by (u, v) = f (x, y) where
u(x, y) = sin x − cos y,
v(x, y) = − cos x + sin y.
Since
∂(u, v)/∂(x, y) = det ( ux uy ; vx vy ) = det ( cos x  sin y ; sin x  cos y ) = cos x cos y − sin x sin y = cos(x + y),
f is locally invertible at (x0, y0) = (π/4, −π/4), since the Jacobian at (x0, y0) is cos 0 = 1 ≠ 0.
Since f (π/4, −π/4) = (0, −√2), the inverse function g(u, v) = (x, y) is defined in a neighborhood
of (0, −√2), and the Jacobi matrix of g at (0, −√2) is
( 1/√2  −1/√2 ; 1/√2  1/√2 )⁻¹ = ( 1/√2  1/√2 ; −1/√2  1/√2 ).
Note that at the point (π/4, π/4) the Jacobian of f vanishes. There is indeed no neighborhood of (π/4, π/4)
where f is injective, since for all t ∈ R
f ( π/4 + t, π/4 − t ) = f ( π/4, π/4 ) = (0, 0).
7.6 The Implicit Function Theorem
Motivation: Hyper Surfaces
Suppose that F : U → R is a continuously differentiable function and grad F (x) ≠ 0 for all
x ∈ U. Then
S = {(x1, . . . , xn) | F (x1, . . . , xn) = 0}
is called a hyper surface in Rⁿ. A hyper surface in Rⁿ has dimension n − 1. Examples are
hyper planes a1x1 + · · · + anxn + c = 0 (with (a1, . . . , an) ≠ 0) and spheres x1² + · · · + xn² = r². The
graph of a differentiable function f : U → R is also a hyper surface in Rⁿ⁺¹:
Γf = {(x, f (x)) ∈ Rⁿ⁺¹ | x ∈ U}.
Question: Is any hyper surface locally the graph of a differentiable function? More precisely,
we may ask the following question: Suppose that f : Rn × R → R is differentiable and
f (a1 , . . . , an , b) = 0. Can we find for each (x1 , . . . , xn ) near (a1 , . . . , an ) a unique y near b
such that f (x1 , . . . , xn , y) = 0? The answer to this question is provided by the Implicit Function Theorem (IFT).
Consider the function f : R² → R defined by f (x, y) = x² + y² − 1. If we choose (a, b) with
a, b > 0, there are open intervals A and B containing a and b with the following property: if
x ∈ A, there is a unique y ∈ B with f (x, y) = 0. We can therefore define a function g : A → B
by the condition g(x) ∈ B and f (x, g(x)) = 0. If b > 0 then g(x) = √(1 − x²); if b < 0 then
g(x) = −√(1 − x²). Both functions g are differentiable. These functions are said to be defined
implicitly by the equation f (x, y) = 0.
On the other hand, there exists no neighborhood of (1, 0) such that f (x, y) = 0 can locally be
solved for y. Note that fy(1, 0) = 0. However, it can be solved for x = h(y) = √(1 − y²).
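For the circle this formula is easy to test: with f (x, y) = x² + y² − 1 and the upper branch g(x) = √(1 − x²), the implicit-function rule g′(x) = −fx/fy = −x/y should agree with a numerical derivative of g. A quick sketch:

```python
import math

def g(x):
    return math.sqrt(1 - x * x)        # upper branch y = g(x)

x = 0.6
y = g(x)
implicit = -(2 * x) / (2 * y)          # g'(x) = -f_x / f_y = -x / y
h = 1e-6
numeric = (g(x + h) - g(x - h)) / (2 * h)
assert abs(implicit - numeric) < 1e-8  # both equal -x / sqrt(1 - x^2) = -0.75
```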
Theorem 7.19 Suppose that f : Rⁿ × Rᵐ → Rᵐ, f = f (x, y), is continuously differentiable
in an open set containing (a, b) ∈ Rⁿ⁺ᵐ and f (a, b) = 0. Let Dy f (x, y) be the linear mapping
from Rᵐ into Rᵐ given by
Dy f (x, y) = ( Dn+j fi(x, y) ) = ∂(f1, . . . , fm)/∂(y1, . . . , ym) (x, y) = ( ∂fi(x, y)/∂yj ),  i, j = 1, . . . , m.  (7.45)
If det Dy f (a, b) ≠ 0, there is an open set A ⊂ Rⁿ containing a and an open set B ⊂ Rᵐ
containing b with the following properties: There exists a unique continuously differentiable
function g : A → B such that
(a) g(a) = b,
(b) f (x, g(x)) = 0 for all x ∈ A.
For the derivative Dg(x) ∈ L(Rⁿ, Rᵐ) we have
Dg(x) = −(Dy f (x, g(x)))⁻¹ ◦ Dx f (x, g(x)),  i.e.  g′(x) = −(fy′(x, g(x)))⁻¹ · fx′(x, g(x)).
The Jacobi matrix g′(x) is given by
∂(g1, . . . , gm)/∂(x1, . . . , xn) (x) = −fy′(x, g(x))⁻¹ · ∂(f1, . . . , fm)/∂(x1, . . . , xn) (x, g(x)),  (7.46)
that is,
∂gk(x)/∂xj = − Σ_{l=1}^{m} ( fy′(x, g(x))⁻¹ )kl · ∂fl(x, g(x))/∂xj,  k = 1, . . . , m, j = 1, . . . , n.
Idea of Proof. Define F : Rⁿ × Rᵐ → Rⁿ × Rᵐ by F (x, y) = (x, f (x, y)). Let M = fy′(a, b).
Then
F ′(a, b) = ( 1n  0n,m ; fx′(a, b)  M )  =⇒  det F ′(a, b) = det M ≠ 0.
By the Inverse Mapping Theorem 7.17 there exists an open set W ⊂ Rⁿ × Rᵐ containing F (a, b) = (a, 0) and an open set V ⊂ Rⁿ × Rᵐ containing (a, b), which may be taken of the
form A × B, such that F : A × B → W has a differentiable inverse h : W → A × B.
Since g is differentiable, it is easy to find the Jacobi matrix. In fact, since fi(x, g(x)) = 0,
i = 1, . . . , m, taking the partial derivative ∂/∂xj on both sides gives by the chain rule
0 = ∂fi(x, g(x))/∂xj + Σ_{k=1}^{m} ∂fi(x, g(x))/∂yk · ∂gk(x)/∂xj,
in matrix form
0 = ( ∂fi(x, g(x))/∂xj ) + fy′(x, g(x)) · ( ∂gk(x)/∂xj ),
hence
−( ∂fi(x, g(x))/∂xj ) = fy′(x, g(x)) · ( ∂gk(x)/∂xj ).
Since det fy′(a, b) ≠ 0, also det fy′(x, y) ≠ 0 in a small neighborhood of (a, b). Hence fy′(x, g(x))
is invertible and we can multiply the preceding equation from the left by (fy′(x, g(x)))⁻¹, which
gives (7.46).
Remarks 7.9 (a) The theorem gives a sufficient condition for “locally” solving the system of
equations
0 = f1 (x1 , . . . , xn , y1 , . . . , ym ),
. . .
0 = fm (x1 , . . . , xn , y1 , . . . , ym )
with given x1 , . . . , xn for y1 , . . . , ym .
(b) We rewrite the statement in the case n = m = 1: Suppose f (x, y) is continuously differentiable on an
open set G ⊂ R² which contains (a, b), and f (a, b) = 0. If fy(a, b) ≠ 0, then there exist δ, ε > 0
such that the following holds: for every x ∈ Uδ(a) there exists a unique y = g(x) ∈ Uε(b) with
f (x, y) = 0. We have g(a) = b; the function y = g(x) is continuously differentiable with
g′(x) = − fx(x, g(x)) / fy(x, g(x)).
Be careful: note that fx(x, g(x)) ≠ d/dx ( f (x, g(x)) ).
Example 7.16 (a) Let f (x, y) = sin(x + y) + e^{xy} − 1. Note that f (0, 0) = 0. Since
fy(0, 0) = ( cos(x + y) + x e^{xy} )|(0,0) = cos 0 + 0 = 1 ≠ 0,
f (x, y) = 0 can uniquely be solved for y = g(x) in a neighborhood of x = 0, y = 0. Further,
fx(0, 0) = ( cos(x + y) + y e^{xy} )|(0,0) = 1.
By Remark 7.9 (b),
g′(x) = − fx(x, y)/fy(x, y) |y=g(x) = − ( cos(x + g(x)) + g(x) e^{x g(x)} ) / ( cos(x + g(x)) + x e^{x g(x)} ).
In particular g′(0) = −1.
Remark. Differentiating the equation fx + fy g′ = 0 we obtain
0 = fxx + fxy g′ + (fyx + fyy g′) g′ + fy g″,
g″ = − (1/fy) ( fxx + 2 fxy g′ + fyy (g′)² ),
and with g′ = −fx/fy,
g″ = ( −fxx fy² + 2 fxy fx fy − fyy fx² ) / fy³.
Since
fxx(0, 0) = ( − sin(x + y) + y² e^{xy} )|(0,0) = 0,
fyy(0, 0) = ( − sin(x + y) + x² e^{xy} )|(0,0) = 0,
fxy(0, 0) = ( − sin(x + y) + e^{xy}(1 + xy) )|(0,0) = 1,
we obtain g″(0) = ( −0·1 + 2·1·1·1 − 0·1 )/1 = 2. Together with g′(0) = −1, the Taylor expansion of g(x) around 0 reads
g(x) = −x + x² + r3(x).
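The expansion can be tested numerically: from g′(0) = −1 and g″(0) = 2, the quadratic approximation is −x + x²; solving f (x, y) = 0 for y by bisection (our choice of root finder, purely for illustration) at a small x should agree to third order:

```python
import math

def f(x, y):
    return math.sin(x + y) + math.exp(x * y) - 1

def solve_y(x, lo=-0.5, hi=0.5):
    # Bisection on y -> f(x, y); f is increasing in y near (0, 0).
    for _ in range(80):
        mid = (lo + hi) / 2
        if f(x, mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

x = 0.01
y = solve_y(x)
taylor = -x + x * x          # from g'(0) = -1, g''(0) = 2
assert abs(y - taylor) < 1e-5   # remainder r3(x) is of order x^3
```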
(b) Let γ(t) = (x(t), y(t)) be a twice continuously differentiable curve γ ∈ C²([0, 1]) in R². Suppose that in a
neighborhood of t = 0 the curve describes a function y = g(x). Find the Taylor polynomial of degree
2 of g at x0 = x(0).
Inserting the curve into the equation y = g(x) we have y(t) = g(x(t)). Differentiation gives
ẏ = g′ẋ,  ÿ = g″ẋ² + g′ẍ.
Thus
g′(x) = ẏ/ẋ,  g″(x) = (ÿ − g′ẍ)/ẋ² = (ÿẋ − ẍẏ)/ẋ³.
Now we have the Taylor polynomial of g at x0:
T2(g)(x) = y0 + g′(x0)(x − x0) + ( g″(x0)/2 )(x − x0)²,  where y0 = y(0) = g(x0).
(c) The tangent hyper plane to a hyper surface.
Anyone who understands geometry can understand everything in this world
(Galileo Galilei, 1564 – 1642)
Suppose that F : U → R is continuously differentiable, a ∈ U, F (a) = 0, and grad F (a) ≠ 0.
Then
∇F (a)·(x − a) = Σ_{i=1}^{n} Fxi(a)(xi − ai) = 0
is the equation of the tangent hyper plane to the surface F (x) = 0 at the point a.
Proof. Indeed, since the gradient at a is nonzero we may assume without loss of generality that
Fxn(a) ≠ 0. By the IFT, F (x1, . . . , xn−1, xn) = 0 is locally solvable for xn = g(x1, . . . , xn−1)
in a neighborhood of a = (a1, . . . , an) with g(ã) = an, where ã = (a1, . . . , an−1) and
x̃ = (x1, . . . , xn−1). Define the tangent hyperplane to be the graph of the linearization of g
at (a1, . . . , an−1, an). By Example 7.7 (a) the tangent hyperplane to the graph of g at ã is given by
xn = g(ã) + grad g(ã)·(x̃ − ã).  (7.47)
Since F (ã, g(ã)) = 0, by the implicit function theorem
∂g(ã)/∂xj = − Fxj(a)/Fxn(a),  j = 1, . . . , n − 1.
Inserting this into (7.47) we have
xn − an = − ( 1/Fxn(a) ) Σ_{j=1}^{n−1} Fxj(a)(xj − aj).
Multiplication by −Fxn(a) gives
−Fxn(a)(xn − an) = Σ_{j=1}^{n−1} Fxj(a)(xj − aj)  =⇒  0 = grad F (a)·(x − a).
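For the unit sphere F (x, y, z) = x² + y² + z² − 1 and the (hypothetical sample) point a = (0, 0, 1) this can be sanity-checked numerically: grad F (a) = (0, 0, 2), so the tangent plane is z = 1, and points of the sphere near a deviate from it only to second order:

```python
import math

# Unit sphere, tangent plane at a = (0, 0, 1): grad F(a) = (0, 0, 2),
# so grad F(a).(x - a) = 2 (z - 1) = 0, i.e. the plane z = 1.
for eps in (1e-1, 1e-2, 1e-3):
    # point on the sphere at angular distance eps from a
    p = (math.sin(eps), 0.0, math.cos(eps))
    deviation = abs(p[2] - 1.0)          # distance to the plane z = 1
    assert deviation < eps ** 2          # second order: 1 - cos t <= t^2 / 2
```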
[Figure: level sets U−1, U0, U1 of f .]
Let f : U → R be differentiable. For c ∈ R define the level set Uc = {x ∈ U | f (x) = c}.
The set Uc may be empty, may consist of a single point (in case of a local extremum) or, in the
“generic” case, that is if grad f (a) ≠ 0 for a ∈ Uc and Uc is non-empty, Uc is an (n − 1)-dimensional
hyper surface. {Uc | c ∈ R} is a family of non-intersecting subsets of U which cover U.
7.7 Lagrange Multiplier Rule
This is a method to find local extrema of a function under certain constraints.
Consider the following problem: Find the local extrema of a function f (x, y) of two variables
where x and y are not independent of each other but satisfy the constraint
ϕ(x, y) = 0.
Suppose further that f and ϕ are continuously differentiable. Note that the level sets Uc =
{(x, y) ∈ R² | f (x, y) = c} form a family of non-intersecting curves in the plane.
[Figure: level curves f = c crossing the constraint curve ϕ = 0.]
We have to find the curve f (x, y) = c intersecting the constraint curve ϕ(x, y) = 0 where c is as large
or as small as possible. Usually f = c intersects ϕ = 0 as c changes monotonically. However, if c is
maximal, the curve f = c touches the curve ϕ = 0. In other words, the tangent lines coincide. This
means that the normal vectors defining the tangent lines are scalar multiples of each other.
Theorem 7.20 (Lagrange Multiplier Rule) Let f, ϕ : U → R, where U ⊂ Rⁿ is open, be continuously differentiable, and suppose that f has a local extremum at a ∈ U under the constraint ϕ(x) = 0. Suppose
that grad ϕ(a) ≠ 0.
Then there exists a real number λ such that
grad f (a) = λ grad ϕ(a).
This number λ is called a Lagrange multiplier.
Proof. The idea is to solve the constraint ϕ(x) = 0 for one variable and to consider the “free”
extremum problem with one variable less. Suppose without loss of generality that ϕxn(a) ≠
0. By the implicit function theorem we can solve ϕ(x) = 0 for xn = g(x1, . . . , xn−1) in a
neighborhood of x = a. Differentiating ϕ(x̃, g(x̃)) = 0 and inserting a = (ã, an ) as before we
have
ϕxj (a) + ϕxn (a)gxj (ã) = 0,
j = 1, . . . , n − 1.
(7.48)
Since h(x̃) = f (x̃, g(x̃)) has a local extremum at ã all partial derivatives of h vanish at ã:
fxj (a) + fxn (a)gxj (ã) = 0,
j = 1, . . . , n − 1.
(7.49)
Setting λ = fxn (a)/ϕxn (a) and comparing (7.48) and (7.49) we find
fxj (a) = λϕxj (a),
j = 1, . . . , n − 1.
Since by definition, fxn (a) = λϕxn (a) we finally obtain grad f (a) = λ grad ϕ(a) which
completes the proof.
Example 7.17 (a) Let A = (aij) be a real symmetric n × n matrix, and define f (x) = x·Ax =
Σ_{i,j} aij xi xj. We ask for the local extrema of f on the unit sphere Sⁿ⁻¹ = {x ∈ Rⁿ | ‖x‖ = 1}.
This constraint can be written as ϕ(x) = ‖x‖² − 1 = Σ_{i=1}^{n} xi² − 1 = 0. Suppose that f attains
a local minimum at a ∈ Sⁿ⁻¹. By Example 7.6 (b),
grad f (a) = 2A(a).
On the other hand,
grad ϕ(a) = (2x1, . . . , 2xn)|x=a = 2a.
By Theorem 7.20 there exists a real number λ1 such that
grad f (a) = 2A(a) = λ1 grad ϕ(a) = 2λ1 a.
Hence A(a) = λ1 a; that is, λ1 is an eigenvalue of A and a a corresponding eigenvector. In
particular, A has a real eigenvalue. Since Sⁿ⁻¹ has no boundary, the global minimum is also a
local one. We find: if f (a) = a·A(a) = a·λ1a = λ1 is the global minimum, then λ1 is the smallest
eigenvalue.
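The conclusion can be illustrated numerically for n = 2: minimizing x·Ax over a fine sample of the unit circle approaches the smallest eigenvalue. We pick the (arbitrary) sample matrix A = ( 2 1 ; 1 3 ), whose eigenvalues are (5 ∓ √5)/2:

```python
import math

a11, a12, a22 = 2.0, 1.0, 3.0
# Smallest eigenvalue of the symmetric matrix [[a11, a12], [a12, a22]].
lam_min = (a11 + a22 - math.sqrt((a11 - a22) ** 2 + 4 * a12 ** 2)) / 2

# Minimize the quadratic form x.Ax over sampled unit vectors (cos t, sin t).
n = 100000
m = min(
    a11 * math.cos(t) ** 2 + 2 * a12 * math.sin(t) * math.cos(t)
    + a22 * math.sin(t) ** 2
    for t in (2 * math.pi * k / n for k in range(n))
)
assert abs(m - lam_min) < 1e-6
```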
(b) Let a be the point of a hypersurface M = {x | ϕ(x) = 0} with minimal distance to a given
point b ∉ M. Then the line through a and b is orthogonal to M.
Indeed, the function f (x) = ‖x − b‖² attains its minimum under the condition ϕ(x) = 0 at a.
By the theorem, there is a real number λ such that
grad f (a) = 2(a − b) = λ grad ϕ(a).
The assertion follows since, by Example 7.16 (c), grad ϕ(a) is orthogonal to M at a and b − a
is a multiple of the normal vector ∇ϕ(a).
Theorem 7.21 (Lagrange Multiplier Rule, extended version) Let f, ϕi : U → R, i =
1, . . . , m, m < n, be continuously differentiable functions. Let M = {x ∈ U | ϕ1(x) = · · · =
ϕm(x) = 0} and suppose that f (x) has a local extremum at a under the constraint x ∈ M.
Suppose further that the Jacobi matrix ϕ′(a) ∈ Rᵐˣⁿ has maximal rank m.
Then there exist real numbers λ1, . . . , λm such that
grad f (a) = grad (λ1ϕ1 + · · · + λmϕm)(a) = λ1 grad ϕ1(a) + · · · + λm grad ϕm(a).
Note that the rank condition ensures that there is a choice of m variables out of x1, . . . , xn such
that the Jacobian of ϕ1, . . . , ϕm with respect to this set of variables is nonzero at a.
7.8 Integrals depending on Parameters
Problem: Define I(y) = ∫_a^b f (x, y) dx; what are the relations between properties of f (x, y) and
of I(y), for example with respect to continuity and differentiability?
7.8.1 Continuity of I(y)
Proposition 7.22 Let f (x, y) be continuous on the rectangle R = [a, b] × [c, d].
Then I(y) = ∫_a^b f (x, y) dx is continuous on [c, d].
Proof. Let ε > 0. Since f is continuous on the compact set R, f is uniformly continuous on R
(see Proposition 6.25). Hence, there is a δ > 0 such that | x − x′ | < δ and | y − y ′ | < δ and
(x, y), (x′, y ′ ) ∈ R imply
| f (x, y) − f (x′ , y ′) | < ε.
Therefore, | y − y0 | < δ and y, y0 ∈ [c, d] imply
| I(y) − I(y0) | = | ∫_a^b ( f (x, y) − f (x, y0) ) dx | ≤ ε(b − a).
This shows continuity of I(y) at y0.
For example, I(y) = ∫_0^1 arctan(x/y) dx is continuous for y > 0.
Remark 7.10 (a) Note that continuity at y0 means that we can interchange the limit and the
integral:
lim_{y→y0} ∫_a^b f (x, y) dx = ∫_a^b lim_{y→y0} f (x, y) dx = ∫_a^b f (x, y0) dx.
(b) A similar statement holds for y → ∞: Suppose that f (x, y) is continuous on [a, b] × [c, +∞)
and lim_{y→+∞} f (x, y) = ϕ(x) exists uniformly for all x ∈ [a, b], that is,
∀ ε > 0 ∃ R > 0 ∀ x ∈ [a, b], y ≥ R : | f (x, y) − ϕ(x) | < ε.
Then ∫_a^b ϕ(x) dx exists and lim_{y→∞} I(y) = ∫_a^b ϕ(x) dx.
7.8.2 Differentiation of Integrals
Proposition 7.23 Let f (x, y) be defined on R = [a, b] × [c, d] and continuous as a function of x
for every fixed y. Suppose that fy (x, y) exists for all (x, y) ∈ R and is continuous as a function
of the two variables x and y.
Then I(y) is differentiable and
I′(y) = d/dy ∫_a^b f (x, y) dx = ∫_a^b fy(x, y) dx.
Proof. Let ε > 0. Since fy (x, y) is continuous, it is uniformly continuous on R. Hence there
exists δ > 0 such that | x′ − x′′ | < δ and | y ′ − y ′′ | < δ imply | fy (x′ , y ′) − fy (x′′ , y ′′) | < ε.
We have for | h | < δ
| ( I(y0 + h) − I(y0) )/h − ∫_a^b fy(x, y0) dx |
  ≤ ∫_a^b | ( f (x, y0 + h) − f (x, y0) )/h − fy(x, y0) | dx
  = ∫_a^b | fy(x, y0 + θh) − fy(x, y0) | dx < ε(b − a)
for some θ ∈ (0, 1) (mean value theorem). Since this inequality holds for all small h, it holds for the limit as h → 0,
too. Thus,
| I′(y0) − ∫_a^b fy(x, y0) dx | ≤ ε(b − a).
Since ε was arbitrary, the claim follows.
In the case of variable integration limits we have the following proposition.
Proposition 7.24 Let f (x, y) be as in Proposition 7.23. Let α(y) and β(y) be differentiable on
[c, d], and suppose that α([c, d]) and β([c, d]) are contained in [a, b].
Let I(y) = ∫_{α(y)}^{β(y)} f (x, y) dx. Then I(y) is differentiable and
I′(y) = ∫_{α(y)}^{β(y)} fy(x, y) dx + β′(y) f (β(y), y) − α′(y) f (α(y), y).  (7.50)
Proof. Let F (y, u, v) = ∫_u^v f (x, y) dx; then I(y) = F (y, α(y), β(y)). The fundamental theorem
of calculus yields
∂F/∂v (y, u, v) = ∂/∂v ∫_u^v f (x, y) dx = f (v, y),
∂F/∂u (y, u, v) = ∂/∂u ∫_u^v f (x, y) dx = −f (u, y).  (7.51)
By the chain rule, the previous proposition and (7.51) we have
I′(y) = ∂F/∂y + ∂F/∂u · α′(y) + ∂F/∂v · β′(y)
      = ∂F/∂y (y, α(y), β(y)) + ∂F/∂u (y, α(y), β(y)) α′(y) + ∂F/∂v (y, α(y), β(y)) β′(y)
      = ∫_{α(y)}^{β(y)} fy(x, y) dx + α′(y)( −f (α(y), y) ) + β′(y) f (β(y), y).
Example 7.18 (a) I(y) = ∫_3^4 sin(xy)/x dx is differentiable by Proposition 7.23 since fy(x, y) =
( x cos(xy) )/x = cos(xy) is continuous. Hence
I′(y) = ∫_3^4 cos(xy) dx = sin(xy)/y |_{x=3}^{x=4} = ( sin 4y − sin 3y )/y.
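Both sides of this computation can be compared by numerical quadrature; the simpson helper below is our own composite Simpson rule, not from the text:

```python
import math

def simpson(fun, a, b, n=1000):
    # Composite Simpson rule; n must be even.
    h = (b - a) / n
    s = fun(a) + fun(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * fun(a + k * h)
    return s * h / 3

y = 1.0
lhs = simpson(lambda x: math.cos(x * y), 3.0, 4.0)   # I'(y) by quadrature
rhs = (math.sin(4 * y) - math.sin(3 * y)) / y        # closed form above
assert abs(lhs - rhs) < 1e-10
```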
(b) I(y) = ∫_{log y}^{sin y} e^{x²y} dx is differentiable with
I′(y) = ∫_{log y}^{sin y} x² e^{x²y} dx + cos y · e^{y sin² y} − (1/y) e^{y (log y)²}.
7.8.3 Improper Integrals with Parameters
Suppose that the improper integral ∫_a^∞ f (x, y) dx exists for y ∈ [c, d].
Definition 7.10 We say that the improper integral ∫_a^∞ f (x, y) dx converges uniformly with respect to y on [c, d] if for every ε > 0 there is an A0 > 0 such that A > A0 implies
| I(y) − ∫_a^A f (x, y) dx | = | ∫_A^∞ f (x, y) dx | < ε
for all y ∈ [c, d].
Note that the Cauchy and Weierstraß criteria (see Proposition 6.1 and Theorem 6.2) for uniform
convergence of series of functions also hold for improper parametric integrals. For example the
theorem of Weierstraß now reads as follows.
Proposition 7.25 Suppose that ∫_a^A f (x, y) dx exists for all A ≥ a and y ∈ [c, d]. Suppose
further that | f (x, y) | ≤ ϕ(x) for all x ≥ a and that ∫_a^∞ ϕ(x) dx converges.
Then ∫_a^∞ f (x, y) dx converges uniformly with respect to y ∈ [c, d].
Example 7.19 I(y) = ∫_1^∞ e^{−xy} x^y y² dx converges uniformly on [2, 4] since
| f (x, y) | = e^{−xy} x^y y² ≤ e^{−2x} x⁴ 4² = ϕ(x)
and ∫_1^∞ e^{−2x} x⁴ 4² dx converges.
If we add the assumption of uniform convergence then the preceding theorems remain true for
improper integrals.
Proposition 7.26 Let f (x, y) be continuous on {(x, y) ∈ R² | a ≤ x < ∞, c ≤ y ≤ d}.
Suppose that I(y) = ∫_a^∞ f (x, y) dx converges uniformly with respect to y ∈ [c, d].
Then I(y) is continuous on [c, d].
Proof. This proof was not carried out in the lecture. Let ε > 0. Since the improper integral
converges uniformly, there exists A0 > 0 such that for all A ≥ A0 we have
| ∫_A^∞ f (x, y) dx | < ε
for all y ∈ [c, d]. Let A ≥ A0 be fixed. On {(x, y) ∈ R² | a ≤ x ≤ A, c ≤ y ≤ d}, f (x, y)
is uniformly continuous; hence there is a δ > 0 such that | x′ − x″ | < δ and | y′ − y″ | < δ
imply
| f (x′, y′) − f (x″, y″) | < ε/(A − a).
Therefore,
∫_a^A | f (x, y) − f (x, y0) | dx < ( ε/(A − a) ) (A − a) = ε  for | y − y0 | < δ.
Finally,
| I(y) − I(y0) | ≤ ∫_a^A | f (x, y) − f (x, y0) | dx + | ∫_A^∞ f (x, y) dx | + | ∫_A^∞ f (x, y0) dx | ≤ 3ε
for | y − y0 | < δ.
We skip the proof of the following proposition.
Proposition 7.27 Let fy(x, y) be continuous on {(x, y) | a ≤ x < ∞, c ≤ y ≤ d} and f (x, y)
continuous with respect to x for every fixed y ∈ [c, d].
Suppose that for all y ∈ [c, d] the integral I(y) = ∫_a^∞ f (x, y) dx exists and that the integral
∫_a^∞ fy(x, y) dx converges uniformly with respect to y ∈ [c, d].
Then I(y) is differentiable and I′(y) = ∫_a^∞ fy(x, y) dx.
Combining the results of the last proposition and Proposition 7.25 we get the following corollary.
Corollary 7.28 Let fy(x, y) be continuous on {(x, y) | a ≤ x < ∞, c ≤ y ≤ d} and f (x, y)
continuous with respect to x for every fixed y ∈ [c, d].
Suppose that
(a) for all y ∈ [c, d] the integral I(y) = ∫_a^∞ f (x, y) dx exists,
(b) | fy(x, y) | ≤ ϕ(x) for all x ≥ a and all y,
(c) ∫_a^∞ ϕ(x) dx exists.
Then I(y) is differentiable and I′(y) = ∫_a^∞ fy(x, y) dx.
Example 7.20 (a) I(y) = ∫_0^∞ e^{−x²} cos(2yx) dx. Here f (x, y) = e^{−x²} cos(2yx) and fy(x, y) =
−2x sin(2yx) e^{−x²}; the integral ∫_0^∞ fy(x, y) dx converges uniformly with respect to y since
| fy(x, y) | ≤ 2x e^{−x²} ≤ K e^{−x²/2}.
Hence,
I′(y) = − ∫_0^∞ 2x sin(2yx) e^{−x²} dx.
Integration by parts with u = sin(2yx), v′ = −2x e^{−x²} gives u′ = 2y cos(2yx), v = e^{−x²}, and
∫_0^A ( −e^{−x²} 2x ) sin(2yx) dx = sin(2yA) e^{−A²} − ∫_0^A 2y cos(2yx) e^{−x²} dx.
As A → ∞ the first summand on the right tends to 0; thus I(y) satisfies the ordinary differential
equation
I′(y) = −2y I(y).
This ODE separates: dI/I = −2y dy; integration yields log I = −y² + c, so
the general solution is I(y) = C e^{−y²}. We determine the constant C by inserting y = 0. Since
I(0) = ∫_0^∞ e^{−x²} dx = √π/2, we find
I(y) = ( √π/2 ) e^{−y²}.
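The closed form can be checked by quadrature; truncating the integral at x = 8 is harmless because the integrand decays like e^{−x²}. Again a Simpson-rule sketch (the helper is ours):

```python
import math

def simpson(fun, a, b, n=2000):
    # Composite Simpson rule; n must be even.
    h = (b - a) / n
    s = fun(a) + fun(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * fun(a + k * h)
    return s * h / 3

y = 1.0
approx = simpson(lambda x: math.exp(-x * x) * math.cos(2 * y * x), 0.0, 8.0)
exact = math.sqrt(math.pi) / 2 * math.exp(-y * y)
assert abs(approx - exact) < 1e-8
```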
(b) The Gamma function Γ(x) = ∫_0^∞ t^{x−1} e^{−t} dt is in C^∞(R+). Let x > 0, say x ∈
[c, d]. Recall from Subsection 5.3.3 the definition and the proof of the convergence of the improper integrals Γ1 = ∫_0^1 f (x, t) dt and Γ2 = ∫_1^∞ f (x, t) dt, where f (x, t) = t^{x−1} e^{−t}. Note
that Γ1(x) is an improper integral at t = 0 + 0.
By L’Hospital’s rule lim_{t→0+0} t^α log t = 0 for all α > 0. In particular, | log t | < t^{−c/2} if 0 < t < t0 < 1.
[Figure: graph of the Gamma function on (0.5, 3).]
Since e^{−t} < 1 and moreover t^{x−1} < t^{c−1} for t < t0 by Lemma 1.23 (b), we conclude that
| ∂f (x, t)/∂x | = t^{x−1} | log t | e^{−t} ≤ | log t | t^{c−1} ≤ t^{−c/2} t^{c−1} = 1/t^{1−c/2}
for 0 < t < t0. Since ϕ(t) = 1/t^{1−c/2} is integrable over [0, 1], Γ1(x) is differentiable by the
Corollary with Γ1′(x) = ∫_0^1 t^{x−1} log t e^{−t} dt. Similarly, Γ2(x) is an improper integral over the
unbounded interval [1, +∞); for sufficiently large t ≥ t0 > 1 we have log t < t and t^x < t^d,
such that
| ∂f (x, t)/∂x | = t^{x−1} log t e^{−t} ≤ t^x e^{−t} ≤ t^d e^{−t} = t^d e^{−t/2} e^{−t/2} ≤ M e^{−t/2}.
Since t^d e^{−t/2} tends to 0 as t → ∞, it is bounded by some constant M, and e^{−t/2} is integrable on
[1, +∞), such that Γ2(x) is differentiable with
Γ2′(x) = ∫_1^∞ t^{x−1} log t e^{−t} dt.
Consequently, Γ(x) is differentiable for all x > 0 with
Γ′(x) = ∫_0^∞ t^{x−1} log t e^{−t} dt.
Similarly one can show that Γ ∈ C^∞(R_{>0}) with
Γ^{(k)}(x) = ∫_0^∞ t^{x−1} (log t)^k e^{−t} dt.
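The formula for Γ′ can be compared with a finite-difference derivative of Python's math.gamma; we evaluate at x = 2, where the integrand is harmless at t = 0 (a sketch, with our own Simpson helper and an ad-hoc truncation at t = 40):

```python
import math

def simpson(fun, a, b, n=4000):
    # Composite Simpson rule; n must be even.
    h = (b - a) / n
    s = fun(a) + fun(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * fun(a + k * h)
    return s * h / 3

x = 2.0
# Gamma'(x) = integral of t^(x-1) log t e^(-t); t log t -> 0 as t -> 0.
integral = simpson(lambda t: t ** (x - 1) * math.log(t) * math.exp(-t),
                   1e-12, 40.0)
h = 1e-5
diff = (math.gamma(x + h) - math.gamma(x - h)) / (2 * h)
assert abs(integral - diff) < 1e-4   # both approximate Gamma'(2) = 1 - euler_gamma
```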
7.9 Appendix
Proof of Proposition 7.7. Let A = (Aij) = ( ∂fi/∂xj ) be the matrix of partial derivatives considered as a linear map from Rⁿ to Rᵐ. Our aim is to show that
lim_{h→0} ‖f (a + h) − f (a) − A h‖ / ‖h‖ = 0.
For this, it suffices by Proposition 6.26 to prove the convergence to 0 for each coordinate i = 1, . . . , m:
lim_{h→0} ( fi(a + h) − fi(a) − Σ_{j=1}^{n} Aij hj ) / ‖h‖ = 0.  (7.52)
Without loss of generality we assume m = 1 and f = f1. For simplicity, let n = 2, f = f (x, y),
a = (a, b), and h = (h, k). Note first that by the mean value theorem we have
f (a + h, b + k) − f (a, b) = f (a + h, b + k) − f (a, b + k) + f (a, b + k) − f (a, b)
 = ∂f/∂x (ξ, b + k) h + ∂f/∂y (a, η) k,
where ξ ∈ (a, a + h) and η ∈ (b, b + k). Using this, the expression in (7.52) reads
where ξ ∈ (a, a + h) and η ∈ (b, b + k). Using this, the expression in(7.52) reads
(a, b) h − ∂f
(a, b) k
f (a + h, b + k) − f (a, b) − ∂f
∂x
∂y
√
=
h2 + k 2
∂f
∂f
∂f
∂f
(ξ,
b
+
k)
−
(a,
b)
h
+
(a,
η)
−
(a,
b)
k
∂x
∂x
∂y
∂x
√
.
=
h2 + k 2
Since both ∂f/∂x and ∂f/∂y are continuous at (a, b), given ε > 0 we find δ > 0 such that (x, y) ∈
Uδ((a, b)) implies
| ∂f/∂x (x, y) − ∂f/∂x (a, b) | < ε,  | ∂f/∂y (x, y) − ∂f/∂y (a, b) | < ε.
This shows
| ( ∂f/∂x (ξ, b + k) − ∂f/∂x (a, b) ) h + ( ∂f/∂y (a, η) − ∂f/∂y (a, b) ) k | / √(h² + k²) ≤ ( ε| h | + ε| k | ) / √(h² + k²) ≤ 2ε,
(a, b) ∂f
(a, b)).
hence f is differentiable at (a, b) with Jacobi matrix A = ( ∂f
∂x
∂y
Since both components of A—the partial derivatives—are continuous functions of (x, y), the
assignment x 7→ f ′ (x) is continuous by Proposition 6.26.
Chapter 8
Curves and Line Integrals
8.1 Rectifiable Curves
8.1.1 Curves in Rᵏ
We consider curves in Rk . We define the tangent vector, regular points, angle of intersection.
Definition 8.1 A curve in Rk is a continuous mapping γ : I → Rk , where I ⊂ R is a closed
interval consisting of more than one point.
The interval can be I = [a, b], I = [a, +∞), or I = R. In the first case γ(a) and γ(b) are
called the initial and end point of γ. These two points define a natural orientation of the curve
“from γ(a) to γ(b)”. Replacing γ(t) by γ(a + b − t) we obtain the curve from γ(b) to γ(a) with
opposite orientation.
If γ(a) = γ(b), γ is said to be a closed curve. The curve γ is given by a k-tuple γ = (γ1, . . . , γk)
of continuous real-valued functions. If γ is differentiable, the curve is said to be differentiable.
Note that we have defined the curve to be a mapping, not a set of points in Rᵏ. Of course, with
each curve γ in Rᵏ there is associated a subset of Rᵏ, namely the image of γ,
C = γ(I) = {γ(t) ∈ Rᵏ | t ∈ I},
but different curves γ may have the same image C = γ(I). The curve is said to be simple if γ
is injective on the inner points I° of I. A simple curve has no self-intersection.
Example 8.1 (a) A circle in R2 of radius r > 0 with center (0, 0) is described by the curve
γ : [0, 2π] → R2 ,
γ(t) = (r cos t, r sin t).
Note that γ̃ : [0, 4π] → R² with γ̃(t) = (r cos t, r sin t) has the same image but is different from γ. γ is a
simple curve, γ̃ is not.
(b) Let p, q ∈ Rᵏ be fixed points, p ≠ q. Then
γ1(t) = (1 − t)p + tq,  t ∈ [0, 1],
γ2(t) = (1 − t)p + tq,  t ∈ R,
are the segment pq from p to q and the line pq through p and q, respectively. If v ∈ Rᵏ is a
vector, then γ3(t) = p + tv, t ∈ R, is the line through p with direction v.
(c) If f : [a, b] → R is a continuous function, the graph of f is a curve in R2 :
γ : [a, b] → R2 ,
γ(t) = (t, f (t)).
(d) Implicit Curves. Let F : U ⊂ R2 → R be continuously differentiable, F(a, b) = 0, and
∇F(a, b) ≠ 0 for some point (a, b) ∈ U. By the Implicit Function Theorem, F(x, y) = 0 can
locally be solved for y = g(x) or x = f(y). In both cases γ(t) = (t, g(t)) and γ(t) = (f(t), t)
is a curve through (a, b). For example,
  F(x, y) = y² − x³ − x² = 0
is locally solvable except for (a, b) = (0, 0). The corresponding curve is Newton's knot.
Definition 8.2 (a) A simple curve γ : I → Rk is said to be regular at t0 if γ is continuously
differentiable on I and γ′(t0) ≠ 0. γ is regular if it is regular at every point t0 ∈ I.
(b) The vector γ′(t0) is called the tangent vector; α(t) = γ(t0) + tγ′(t0), t ∈ R, is called the
tangent line to the curve γ at the point γ(t0).
Remark 8.1 The moving particle. Let t be the time variable and s(t) the coordinates of a point
moving in Rk. Then the tangent vector v(t) = s′(t) is the velocity vector of the moving point.
The instantaneous speed is the euclidean norm of v(t), ‖v(t)‖ = √(s′1(t)² + · · · + s′k(t)²). The
acceleration vector is the second derivative of s(t), a(t) = v′(t) = s″(t).
Let γi : Ii → Rk, i = 1, 2, be two regular curves with a common point γ1(t1) = γ2(t2). The
angle of intersection φ between the two curves γi at ti is defined to be the angle between the
two tangent vectors γ1′(t1) and γ2′(t2). Hence,
  cos φ = γ1′(t1)·γ2′(t2) / ( ‖γ1′(t1)‖ ‖γ2′(t2)‖ ),   φ ∈ [0, π].
(Figure: Newton's knot, the curve y² = x³ + x².)
Example 8.2 (a) Newton's knot. The curve γ : R → R2 given by γ(t) = (t² − 1, t³ − t)
is not injective since γ(−1) = γ(1) = (0, 0) = x0. The point x0 is a double point of the curve.
In general γ has two different tangent lines at a double point. Since γ′(t) = (2t, 3t² − 1) we
have γ′(−1) = (−2, 2) and γ′(1) = (2, 2). The curve is regular since γ′(t) ≠ 0 for all t.
Let us compute the angle of self-intersection. Since γ(−1) = γ(1) = (0, 0), the self-intersection angle φ satisfies
  cos φ = (−2, 2)·(2, 2) / 8 = 0,
hence ϕ = 90◦ , the intersection is orthogonal.
(b) Neil’s parabola. Let γ : R → R2 be given by γ(t) = (t2 , t3 ). Since γ ′ (t) = (2t, 3t2 ), the
origin is the only singular point.
8.1.2 Rectifiable Curves
The goal of this subsection is to define the length of a curve. For differentiable curves, there
is a formula using the tangent vector. However, the “length of a curve” also makes sense for some
non-differentiable, continuous curves.
Let γ : [a, b] → Rk be a curve. We associate to each partition P = {t0 , . . . , tn } of [a, b] the
points xi = γ(ti), i = 0, . . . , n, and the number
  ℓ(P, γ) = Σ_{i=1}^{n} ‖γ(ti) − γ(ti−1)‖.   (8.1)
The ith term in this sum is the euclidean distance of the points xi−1 = γ(ti−1) and xi = γ(ti).
Hence ℓ(P, γ) is the length of the polygonal path with vertices x0, . . . , xn. As our partition becomes finer and finer, this polygon approaches the image of γ more and more closely.
(Figure: a polygonal path with vertices x0, x1, x2, x3 inscribed in the curve over the partition a < t1 < t2 < b.)
Definition 8.3 A curve γ : [a, b] → Rk is said to be rectifiable if the set of non-negative real
numbers {ℓ(P, γ) | P is a partition of [a, b]} is bounded. In this case
ℓ(γ) = sup ℓ(P, γ),
where the supremum is taken over all partitions P of [a, b], is called the length of γ.
In certain cases, ℓ(γ) is given by a Riemann integral. We shall prove this for continuously
differentiable curves, i. e. for curves γ whose derivative γ ′ is continuous.
Proposition 8.1 If γ′ is continuous on [a, b], then γ is rectifiable, and
  ℓ(γ) = ∫_a^b ‖γ′(t)‖ dt.
Proof. If a ≤ ti−1 < ti ≤ b, by Theorem 5.28, γ(ti) − γ(ti−1) = ∫_{ti−1}^{ti} γ′(t) dt. Applying
Proposition 5.29 we have
  ‖γ(ti) − γ(ti−1)‖ = ‖ ∫_{ti−1}^{ti} γ′(t) dt ‖ ≤ ∫_{ti−1}^{ti} ‖γ′(t)‖ dt.
Hence
  ℓ(P, γ) ≤ ∫_a^b ‖γ′(t)‖ dt
for every partition P of [a, b]. Consequently,
  ℓ(γ) ≤ ∫_a^b ‖γ′(t)‖ dt.
To prove the opposite inequality, let ε > 0 be given. Since γ′ is uniformly continuous on [a, b],
there exists δ > 0 such that
  ‖γ′(s) − γ′(t)‖ < ε   if   |s − t| < δ.
Let P be a partition with Δti ≤ δ for all i. If ti−1 ≤ t ≤ ti it follows that
  ‖γ′(t)‖ ≤ ‖γ′(ti)‖ + ε.
Hence
  ∫_{ti−1}^{ti} ‖γ′(t)‖ dt ≤ ‖γ′(ti)‖ Δti + ε Δti
    = ‖ ∫_{ti−1}^{ti} (γ′(t) + γ′(ti) − γ′(t)) dt ‖ + ε Δti
    ≤ ‖ ∫_{ti−1}^{ti} γ′(t) dt ‖ + ‖ ∫_{ti−1}^{ti} (γ′(ti) − γ′(t)) dt ‖ + ε Δti
    ≤ ‖γ(ti) − γ(ti−1)‖ + ε Δti + ε Δti = ‖γ(ti) − γ(ti−1)‖ + 2ε Δti.
If we add these inequalities, we obtain
  ∫_a^b ‖γ′(t)‖ dt ≤ ℓ(P, γ) + 2ε(b − a) ≤ ℓ(γ) + 2ε(b − a).
Since ε was arbitrary,
  ∫_a^b ‖γ′(t)‖ dt ≤ ℓ(γ).
This completes the proof.
Special Case k = 2
Let k = 2 and γ(t) = (x(t), y(t)), t ∈ [a, b]. Then
  ℓ(γ) = ∫_a^b √(x′(t)² + y′(t)²) dt.
In particular, let γ(t) = (t, f(t)) be the graph of a continuously differentiable function
f : [a, b] → R. Then ℓ(γ) = ∫_a^b √(1 + f′(t)²) dt.
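The graph formula can be checked numerically against the defining supremum: the polygonal lengths ℓ(P, γ) approach the integral as the partition is refined. A minimal sketch in Python (the helper names are mine, not from the text):

```python
import math

def polygonal_length(gamma, a, b, n):
    """l(P, gamma) from (8.1) for the equidistant partition of [a, b] into n pieces."""
    ts = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(math.dist(gamma(ts[i - 1]), gamma(ts[i])) for i in range(1, n + 1))

# Graph of f(t) = t^2 on [0, 1]:  gamma(t) = (t, t^2).
gamma = lambda t: (t, t * t)

# Arc-length integral  ∫_0^1 sqrt(1 + (2t)^2) dt, here by a midpoint rule.
n = 100_000
integral = sum(math.sqrt(1 + (2 * ((i + 0.5) / n)) ** 2) / n for i in range(n))

print(polygonal_length(gamma, 0.0, 1.0, 1000), integral)  # both ≈ 1.4789
```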
Example 8.3 Catenary Curve. Let f(t) = a cosh(t/a), t ∈ [0, b], b > 0. Then f′(t) = sinh(t/a)
and moreover
  ℓ(γ) = ∫_0^b √(1 + sinh²(t/a)) dt = ∫_0^b cosh(t/a) dt = a sinh(t/a) |_0^b = a sinh(b/a).
(Figure: cycloid and its defining circle.)
(b) The position of a bulge in a bicycle tire as it rolls down the street can be parametrized by an angle θ as shown in the figure.
Let the radius of the tire be a. It can be verified by plane trigonometry that
  γ(θ) = ( a(θ − sin θ), a(1 − cos θ) ).
This curve is called a cycloid. Find the distance travelled by the bulge for 0 ≤ θ ≤ 2π.
Using 1 − cos θ = 2 sin²(θ/2) we have
  γ′(θ) = a(1 − cos θ, sin θ),
  ‖γ′(θ)‖ = a √((1 − cos θ)² + sin²θ) = a √(2 − 2 cos θ) = a√2 √(1 − cos θ) = 2a sin(θ/2).
Therefore,
  ℓ(γ) = 2a ∫_0^{2π} sin(θ/2) dθ = −4a cos(θ/2) |_0^{2π} = 4a(−cos π + cos 0) = 8a.
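A quick numerical cross-check of ℓ(γ) = 8a via an inscribed polygon (a sketch; the radius value is arbitrary):

```python
import math

a = 1.5  # tire radius, chosen arbitrarily

def cycloid(theta):
    return (a * (theta - math.sin(theta)), a * (1 - math.cos(theta)))

# Inscribed polygon over a fine equidistant partition of [0, 2*pi].
n = 50_000
ts = [2 * math.pi * i / n for i in range(n + 1)]
length = sum(math.dist(cycloid(ts[i - 1]), cycloid(ts[i])) for i in range(1, n + 1))

print(length, 8 * a)  # both ≈ 12.0
```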
(c) The arc element ds. Formally the arc element of a plane differentiable curve can be computed
using the pythagorean theorem:
  (ds)² = (dx)² + (dy)²  ⟹  ds = √(dx² + dy²),
  ds = dx √(1 + dy²/dx²),
  ds = √(1 + f′(x)²) dx.
(d) Arc of an Ellipse. The ellipse with equation x²/a² + y²/b² = 1, 0 < b ≤ a, is parametrized
by γ(t) = (a cos t, b sin t), t ∈ [0, t0], such that γ′(t) = (−a sin t, b cos t). Hence,
  ℓ(γ) = ∫_0^{t0} √(a² sin²t + b² cos²t) dt = ∫_0^{t0} √(a² − (a² − b²) cos²t) dt
       = a ∫_0^{t0} √(1 − ε² cos²t) dt,
where ε = √(a² − b²)/a. This integral can be transformed into the function
  E(τ, ε) = ∫_0^τ √(1 − ε² sin²t) dt,
the elliptic integral of the second kind as defined in Chapter 5.
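E(τ, ε) has no elementary closed form, but it is easy to evaluate numerically. A sketch (midpoint rule; the reference value ≈ 9.6884 for the full perimeter with a = 2, b = 1 is a standard numerical value, not taken from the text):

```python
import math

def E(tau, eps, n=100_000):
    """E(tau, eps) = ∫_0^tau sqrt(1 - eps^2 sin^2 t) dt, midpoint rule."""
    h = tau / n
    return sum(math.sqrt(1 - (eps * math.sin((i + 0.5) * h)) ** 2) for i in range(n)) * h

a, b = 2.0, 1.0
eps = math.sqrt(a * a - b * b) / a
perimeter = 4 * a * E(math.pi / 2, eps)  # full perimeter of the ellipse
print(perimeter)  # ≈ 9.6884
```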
(e) A non-rectifiable curve. Consider the graph γ(t) = (t, f(t)), t ∈ [0, 1], of the function f,
  f(t) = t cos(π/(2t)),  0 < t ≤ 1,   f(0) = 0.
Since lim_{t→0+0} f(t) = f(0) = 0, f is continuous and γ(t) is a curve. However, this curve is not rectifiable. Indeed, choose the partition Pk = {t0 = 0, 1/(4k), 1/(4k − 2), . . . , 1/4, 1/2, t_{2k+1} = 1} consisting of 2k + 2 points, ti = 1/(4k − 2i + 2), i = 1, . . . , 2k. Note that t0 = 0 and
t_{2k+1} = 1 play a special role and will be omitted in the calculations below. Then cos(π/(2ti)) =
(1, −1, 1, −1, . . . , −1, 1), i = 1, . . . , 2k. Thus
  ℓ(Pk, γ) ≥ Σ_{i=2}^{2k} √((ti − ti−1)² + (f(ti) − f(ti−1))²) ≥ Σ_{i=2}^{2k} |f(ti) − f(ti−1)|
    ≥ ( 1/(4k − 2) + 1/(4k) ) + ( 1/(4k − 4) + 1/(4k − 2) ) + · · · + ( 1/2 + 1/4 )
    = 1/2 + 2( 1/4 + · · · + 1/(4k − 2) ) + 1/(4k),
which is unbounded for k → ∞ since the harmonic series is unbounded. Hence γ is not
rectifiable.
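The divergence of ℓ(Pk, γ) can also be watched numerically: the polygon lengths grow without bound, roughly logarithmically in k, like the partial sums of the harmonic series. A sketch:

```python
import math

def f(t):
    return t * math.cos(math.pi / (2 * t)) if t > 0 else 0.0

def partition_length(k):
    """l(P_k, gamma) for the partition P_k = {0, 1/(4k), 1/(4k-2), ..., 1/4, 1/2, 1}."""
    ts = [0.0] + [1 / (4 * k - 2 * i + 2) for i in range(1, 2 * k + 1)] + [1.0]
    pts = [(t, f(t)) for t in ts]
    return sum(math.dist(pts[i - 1], pts[i]) for i in range(1, len(pts)))

for k in (10, 100, 1000):
    print(k, partition_length(k))  # grows without bound as k increases
```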
8.2 Line Integrals
Many physical applications can be found in [MW85, Chapter 18]. Integration of vector
fields along curves is of fundamental importance in both mathematics and physics. We use the
concept of work to motivate the material in this section.
The motion of an object is described by a parametric curve ~x = ~x(t) = (x(t), y(t), z(t)). By
differentiating this function, we obtain the velocity ~v(t) = ~x′(t) and the acceleration ~a(t) =
~x′′(t). We use the physicists' notation ~x˙(t) and ~x¨(t) to denote derivatives with respect to the
time t.
According to Newton’s law, the total force F̃ acting on an object of mass m is
F̃ = m~a.
Since the kinetic energy K is defined by K = ½ m~v² = ½ m ~v·~v we have
  K̇(t) = ½ m(~v˙·~v + ~v·~v˙) = m~a·~v = F̃·~v.
The total change of the kinetic energy from time t1 to t2, denoted W, is called the work done by
the force F̃ along the path ~x(t):
  W = ∫_{t1}^{t2} K̇(t) dt = ∫_{t1}^{t2} F̃·~v dt = ∫_{t1}^{t2} F̃(t)·~x˙(t) dt.
Let us now suppose that the force F̃ at time t depends only on the position ~x(t). That is, we
assume that there is a vector field F~ (~x) such that F̃ (t) = F~ (~x(t)) (gravitational and electrostatic
attraction are position-dependent while magnetic forces are velocity-dependent). Then we may
rewrite the above integral as
  W = ∫_{t1}^{t2} F~(~x(t))·~x˙(t) dt.
In the one-dimensional case, by a change of variables, this can be simplified to
  W = ∫_a^b F(x) dx,
where a and b are the starting and ending positions.
Definition 8.4 Let Γ = {~x(t) | t ∈ [r, s]} be a continuously differentiable curve ~x(t) ∈
C1([r, s]) in Rn and f~ : Γ → Rn a continuous vector field on Γ. The integral
  ∫_Γ f~(~x)·d~x = ∫_r^s f~(~x(t))·~x˙(t) dt
is called the line integral of the vector field f~ along the curve Γ .
Remark 8.2 (a) The definition of the line integral does not depend on the parametrization of
Γ.
(b) If we take different curves between the same endpoints, the line integral may be different.
(c) If the vector field f~ is orthogonal to the tangent vector, then ∫_Γ f~·d~x = 0.
(d) Other notations. If f~ = (P, Q) is a vector field in R2,
  ∫_Γ f~·d~x = ∫_Γ P dx + Q dy,
where the right side is either a symbol or ∫_Γ P dx = ∫_Γ (P, 0)·d~x.
Example 8.4 (a) Find the line integral ∫_{Γi} y dx + (x − y) dy, i = 1, 2, where
  Γ1 = {~x(t) = (t, t²) | t ∈ [0, 1]}
and Γ2 = Γ3 ∪ Γ4, with Γ3 = {(t, 0) | t ∈ [0, 1]}, Γ4 = {(1, t) | t ∈ [0, 1]}.
(Figure: Γ1 runs from (0, 0) to (1, 1) along the parabola; Γ2 runs from (0, 0) to (1, 0) along Γ3 and then up to (1, 1) along Γ4.)
In the first case ~x˙(t) = (1, 2t); hence
  ∫_{Γ1} y dx + (x − y) dy = ∫_0^1 ( t²·1 + (t − t²)·2t ) dt = ∫_0^1 (3t² − 2t³) dt = 1/2.
In the second case ∫_{Γ2} f~ d~x = ∫_{Γ3} f~ d~x + ∫_{Γ4} f~ d~x. For the first part (dx, dy) = (dt, 0), for the
second part (dx, dy) = (0, dt) such that
  ∫_{Γ2} y dx + (x − y) dy = ∫_0^1 ( 0 dt + (t − 0)·0 ) + ∫_0^1 ( t·0 + (1 − t) dt ) = ( t − ½t² )|_0^1 = 1/2.
(b) Find the work done by the force field F~(x, y, z) = (y, −x, 1) as a particle moves from
(1, 0, 0) to (1, 0, 1) along the following paths, ε = ±1:
  ~xε(t) = (cos t, ε sin t, t/(2π)),  t ∈ [0, 2π].
We find
  ∫_{Γε} F~·d~x = ∫_0^{2π} (ε sin t, −cos t, 1)·(−sin t, ε cos t, 1/(2π)) dt
    = ∫_0^{2π} ( −ε sin²t − ε cos²t + 1/(2π) ) dt
    = −2πε + 1.
In case ε = 1, the motion is “with the force”, so the work is positive; for the path ε = −1, the
motion is against the force and the work is negative.
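Both line integrals of Example 8.4 can be verified by direct numerical quadrature of the defining integral. A sketch (helper names are mine, not from the text):

```python
import math

def line_integral(field, x, xdot, t0, t1, n=100_000):
    """∫_Γ f·dx = ∫_{t0}^{t1} f(x(t))·x'(t) dt, midpoint rule."""
    h = (t1 - t0) / n
    total = 0.0
    for i in range(n):
        t = t0 + (i + 0.5) * h
        total += sum(fi * vi for fi, vi in zip(field(x(t)), xdot(t))) * h
    return total

# (a): f = (y, x - y) along Gamma_1(t) = (t, t^2).
f = lambda p: (p[1], p[0] - p[1])
val_a = line_integral(f, lambda t: (t, t * t), lambda t: (1.0, 2 * t), 0.0, 1.0)

# (b) with eps = 1: F = (y, -x, 1) along the helix.
F = lambda p: (p[1], -p[0], 1.0)
helix = lambda t: (math.cos(t), math.sin(t), t / (2 * math.pi))
helixdot = lambda t: (-math.sin(t), math.cos(t), 1 / (2 * math.pi))
val_b = line_integral(F, helix, helixdot, 0.0, 2 * math.pi)

print(val_a, val_b)  # ≈ 0.5  and  ≈ 1 - 2*pi
```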
We can also define a scalar line integral in the following way. Let γ : [a, b] → Rn be a continuously differentiable curve, Γ = γ([a, b]), and f : Γ → R a continuous function. The integral
  ∫_Γ f(x) ds := ∫_a^b f(γ(t)) ‖γ′(t)‖ dt
is called the scalar line integral of f along Γ .
Properties of Line Integrals
Remark 8.3 (a) Linearity.
  ∫_Γ (f~ + ~g) d~x = ∫_Γ f~ d~x + ∫_Γ ~g d~x,   ∫_Γ λf~ d~x = λ ∫_Γ f~ d~x.
(b) Change of orientation. If ~x(t), t ∈ [r, s], defines a curve Γ which goes from a = ~x(r) to
b = ~x(s), then ~y(t) = ~x(r + s − t), t ∈ [r, s], defines the curve −Γ which goes in the opposite
direction from b to a. It is easy to see that
  ∫_{−Γ} f~ d~x = − ∫_Γ f~ d~x.
(c) Triangle inequality.
  | ∫_Γ f~ d~x | ≤ ℓ(Γ) sup_{x∈Γ} ‖f~(x)‖.
Proof. Let ~x(t), t ∈ [t0, t1], be a parametrization of Γ; then
  | ∫_Γ f~ d~x | = | ∫_{t0}^{t1} f~(~x(t))·~x′(t) dt | ≤ ∫_{t0}^{t1} ‖f~(~x(t))‖ ‖~x′(t)‖ dt
    ≤ sup_{~x∈Γ} ‖f~(~x)‖ ∫_{t0}^{t1} ‖~x′(t)‖ dt = sup_{~x∈Γ} ‖f~(~x)‖ ℓ(Γ),
using the triangle inequality and the Cauchy–Schwarz inequality.
(d) Splitting. If Γ1 and Γ2 are two curves such that the ending point of Γ1 equals the starting
point of Γ2 then
  ∫_{Γ1∪Γ2} f~ d~x = ∫_{Γ1} f~ d~x + ∫_{Γ2} f~ d~x.
8.2.1 Path Independence
Problem: For which vector fields f~ is the line integral from a to b independent of the path
(see Example 8.4 (a) and Example 8.2)?
Definition 8.5 A vector field f~ : G → Rn , G ⊂ Rn , is called conservative if for any points a
and b in G and any curves Γ1 and Γ2 from a to b we have
  ∫_{Γ1} f~ d~x = ∫_{Γ2} f~ d~x.
In this case we say that the line integral ∫_Γ f~ d~x is path independent and we use the notation
∫_a^b f~ d~x.
Definition 8.6 A vector field f~ : G → Rn is called a potential field or gradient vector field if
there exists a continuously differentiable function U : G → R such that f~(x) = grad U(x) for
x ∈ G. We call U the potential or antiderivative of f~.
Example 8.5 The gravitational force is given by
  F~(x) = −α x/‖x‖³,
where α = γmM. It is a potential field with potential
  U(x) = α/‖x‖.
This follows from Example 7.2 (a), grad f(‖x‖) = f′(‖x‖) x/‖x‖, with f(y) = 1/y and f′(y) = −1/y².
Remark 8.4 (a) A vector field f~ is conservative if and only if the line integral over any closed
curve in G is 0. Indeed, suppose that f~ is conservative and Γ = Γ1 ∪ Γ2 is a closed curve,
where Γ1 is a curve from a to b and Γ2 is a curve from b to a. By Remark 8.3 (b), changing the
orientation of Γ2, the sign of the line integral changes and −Γ2 is again a curve from a to b:
  ∫_Γ f~ d~x = ( ∫_{Γ1} + ∫_{Γ2} ) f~ d~x = ( ∫_{Γ1} − ∫_{−Γ2} ) f~ d~x = 0.
The proof of the other direction is similar.
(Figure: a polygonal path joining two points x and y inside a connected region G.)
(b) Uniqueness of a potential. An open subset G ⊂ Rn is said to be connected if any two
points x, y ∈ G can be connected by a polygonal path from x to y inside G. If it exists, U(x)
is uniquely determined up to a constant.
Indeed, if grad U1 (x) = grad U2 (x) = f~ put U = U1 − U2 then we have ∇U = ∇U1 − ∇U2 =
f~ − f~ = 0. Now suppose that x, y ∈ G can be connected by a segment xy ⊂ G inside G. By
the MVT (Corollary 7.12)
U(y) − U(x) = ∇U((1 − t)x + ty) (y − x) = 0,
since ∇U = 0 on the segment xy. This shows that U = U1 − U2 = const. on any polygonal
path inside G. Since G is connected, U1 − U2 is constant on G.
Theorem 8.2 Let G ⊂ Rn be a domain.
(i) If U : G → R is continuously differentiable and f~ = grad U, then f~ is conservative, and
for every (piecewise continuously differentiable) curve Γ from a to b, a, b ∈ G, we have
  ∫_Γ f~ d~x = U(b) − U(a).
(ii) Let f~ : G → Rn be a continuous, conservative vector field and a ∈ G. Put
  U(x) = ∫_a^x f~ d~y,  x ∈ G.
Then U(x) is a potential for f~, that is grad U = f~.
(iii) A continuous vector field f~ is conservative in G if and only if it is a potential field.
Proof. (i) Let Γ = {~x(t) | t ∈ [r, s]} be a continuously differentiable curve from a = ~x(r) to
b = ~x(s). We define φ(t) = U(~x(t)) and compute the derivative using the chain rule:
  φ̇(t) = grad U(~x(t))·~x˙(t) = f~(~x(t))·~x˙(t).
By definition of the line integral we have
  ∫_Γ f~ d~x = ∫_r^s f~(~x(t))·~x˙(t) dt.
Inserting the above expression and applying the fundamental theorem of calculus, we find
  ∫_Γ f~ d~x = ∫_r^s φ̇(t) dt = φ(s) − φ(r) = U(~x(s)) − U(~x(r)) = U(b) − U(a).
(ii) Choose h ∈ Rn small such that x + th ∈ G for all t ∈ [0, 1]. By the path independence of
the line integral,
  U(x + h) − U(x) = ∫_a^{x+h} f~·d~y − ∫_a^x f~·d~y = ∫_x^{x+h} f~·d~y.
Consider the curve ~x(t) = x + th, t ∈ [0, 1], from x to x + h. Then ~x˙(t) = h. By the mean value
theorem of integration (Theorem 5.18 with φ = 1, a = 0 and b = 1) we have
  ∫_x^{x+h} f~·d~y = ∫_0^1 f~(~x(t))·h dt = f~(x + θh)·h,
where θ ∈ [0, 1]. We check grad U(x) = f~(x) using the definition of the derivative:
  | U(x + h) − U(x) − f~(x)·h | / ‖h‖ = | (f~(x + θh) − f~(x))·h | / ‖h‖
    ≤ ‖f~(x + θh) − f~(x)‖ ‖h‖ / ‖h‖ = ‖f~(x + θh) − f~(x)‖ → 0 as h → 0,
by the Cauchy–Schwarz inequality, since f~ is continuous at x. This shows that ∇U = f~.
(iii) follows immediately from (i) and (ii).
Remark 8.5 (a) In case n = 2, a simple path to compute the line integral (and so the potential
U) in (ii) consists of 2 segments: from (0, 0) via (x, 0) to (x, y). The line integral of P dx + Q dy
then reads as ordinary Riemann integrals
  U(x, y) = ∫_0^x P(t, 0) dt + ∫_0^y Q(x, t) dt.
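This two-segment construction is easy to test numerically for a field satisfying the integrability condition. A sketch with a made-up field (P, Q) = (2xy + cos x, x²), whose potential should be x²y + sin x (the field and helper names are my own choice, not from the text):

```python
import math

def midpoint_integral(g, lo, hi, n=4000):
    h = (hi - lo) / n
    return sum(g(lo + (i + 0.5) * h) for i in range(n)) * h

def potential(P, Q, x, y):
    """U(x, y) = ∫_0^x P(t, 0) dt + ∫_0^y Q(x, t) dt   (Remark 8.5 (a))."""
    return (midpoint_integral(lambda t: P(t, 0.0), 0.0, x)
            + midpoint_integral(lambda t: Q(x, t), 0.0, y))

# P_y = 2x = Q_x, so the integrability condition holds.
P = lambda x, y: 2 * x * y + math.cos(x)
Q = lambda x, y: x * x

print(potential(P, Q, 1.2, 0.7))  # ≈ 1.2**2 * 0.7 + sin(1.2)
```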
(b) Case n = 3. You can also use just one single segment from the origin to the endpoint
(x, y, z). This path is parametrized by the curve
  ~x(t) = (tx, ty, tz),  t ∈ [0, 1],  ~x˙(t) = (x, y, z).
We obtain
  U(x, y, z) = ∫_{(0,0,0)}^{(x,y,z)} f1 dx + f2 dy + f3 dz   (8.2)
    = x ∫_0^1 f1(tx, ty, tz) dt + y ∫_0^1 f2(tx, ty, tz) dt + z ∫_0^1 f3(tx, ty, tz) dt.   (8.3)
(c) Although Theorem 8.2 gives a necessary and sufficient condition for a vector field to be
conservative, we are missing an easy criterion.
Recall from Example 7.4 that a necessary condition for f~ = (f1, . . . , fn) to be a potential vector
field is
  ∂fi/∂xj = ∂fj/∂xi,  1 ≤ i < j ≤ n,
which is a simple consequence of Schwarz's lemma: if fi = U_{xi}, then
  ∂fi/∂xj = U_{xi xj} = U_{xj xi} = ∂fj/∂xi.
The condition ∂fi/∂xj = ∂fj/∂xi, 1 ≤ i < j ≤ n, is called the integrability condition for f~. It is a
necessary condition for f~ to be conservative. However, it is not sufficient.
Remark 8.6 Counterexample. Let G = R2 \ {(0, 0)} and
  f~ = (P, Q) = ( −y/(x² + y²), x/(x² + y²) ).
The vector field satisfies the integrability condition Py = Qx. However, it is not conservative.
For, consider the unit circle γ(t) = (cos t, sin t), t ∈ [0, 2π]. Then γ′(t) = (−sin t, cos t) and
  ∫_γ f~ d~x = ∫_γ (−y dx)/(x² + y²) + (x dy)/(x² + y²) = ∫_0^{2π} ( sin²t/1 + cos²t/1 ) dt = ∫_0^{2π} dt = 2π.
This contradicts ∫_Γ f~ d~x = 0 for conservative vector fields. Hence, f~ is not conservative.
f~ fails to be conservative since G = R2 \ {(0, 0)} has a hole.
For more details, see homework 30.1.
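The value 2π of the loop integral can be reproduced numerically. A sketch (helper names are mine):

```python
import math

def loop_integral(field, n=10_000):
    """∮ f·dx over the unit circle gamma(t) = (cos t, sin t), midpoint rule."""
    h = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        fx, fy = field(math.cos(t), math.sin(t))
        total += (fx * (-math.sin(t)) + fy * math.cos(t)) * h
    return total

vortex = lambda x, y: (-y / (x * x + y * y), x / (x * x + y * y))
print(loop_integral(vortex))  # ≈ 2*pi, not 0, so the field cannot be conservative
```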
The next proposition shows that under one additional assumption this criterion is also sufficient.
A connected open subset G (a region) of Rn is called simply connected if every closed polygonal
path inside G can be shrunk inside G to a single point.
Roughly speaking, simply connected sets do not have holes.
  convex subset of Rn: simply connected
  1-torus S1 = {z ∈ C | |z| = 1}: not simply connected
  annulus {(x, y) ∈ R2 | r² < x² + y² ≤ R²}, 0 ≤ r < R ≤ ∞: not simply connected
  R2 \ {(0, 0)}: not simply connected
  R3 \ {(0, 0, 0)}: simply connected
The precise mathematical term for a curve γ to be “shrinkable to a point” is null-homotopic.
Definition 8.7 (a) A closed curve γ : [a, b] → G, G ⊂ Rn open, is said to be null-homotopic if
there exists a continuous mapping h : [a, b] × [0, 1] → G and a point x0 ∈ G such that
(a) h(t, 0) = γ(t) for all t,
(b) h(t, 1) = x0 for all t,
(c) h(a, s) = h(b, s) = x0 for all s ∈ [0, 1].
(b) G is simply connected if any closed curve in G is null-homotopic.
Proposition 8.3 Let f~ = (f1, f2, f3) be a continuously differentiable vector field on a region G ⊂ R3.
(a) If f~ is conservative then curl f~ = 0, i. e.
  ∂f3/∂x2 − ∂f2/∂x3 = 0,  ∂f1/∂x3 − ∂f3/∂x1 = 0,  ∂f2/∂x1 − ∂f1/∂x2 = 0.
(b) If curl f~ = 0 and G is simply connected, then f~ is conservative.
Proof. (a) Let f~ be conservative; by Theorem 8.2 there exists a potential U, grad U = f~.
However, curl grad U = 0 since
  ∂f3/∂x2 − ∂f2/∂x3 = ∂²U/∂x2∂x3 − ∂²U/∂x3∂x2 = 0
by Schwarz's Lemma.
(b) This will be an application of Stokes’ theorem, see below.
Example 8.6 Let, on R3, f~ = (P, Q, R) = (6xy² + eˣ, 6x²y, 1). Then
  curl f~ = (Ry − Qz, Pz − Rx, Qx − Py) = (0, 0, 12xy − 12xy) = 0;
hence, f~ is conservative with a potential U(x, y, z).
First method to compute the potential U: ODE ansatz.
The ansatz Ux = 6xy² + eˣ will be integrated with respect to x:
  U(x, y, z) = ∫ Ux dx + C(y, z) = ∫ (6xy² + eˣ) dx + C(y, z) = 3x²y² + eˣ + C(y, z).
Hence, Uy = 6x²y + Cy(y, z) must equal 6x²y, and Uz = Cz must equal 1.
This implies Cy = 0 and Cz = 1. The solution here is C(y, z) = z + c1 such that U =
3x²y² + eˣ + z + c1.
Second method: Line Integrals. See Remark 8.5 (b):
  U(x, y, z) = x ∫_0^1 f1(tx, ty, tz) dt + y ∫_0^1 f2(tx, ty, tz) dt + z ∫_0^1 f3(tx, ty, tz) dt
    = x ∫_0^1 (6t³xy² + e^{tx}) dt + y ∫_0^1 6t³x²y dt + z ∫_0^1 dt
    = (3/2)x²y² + eˣ − 1 + (3/2)x²y² + z = 3x²y² + eˣ + z − 1,
which differs from the potential found by the first method only by a constant.
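The claimed potential is easily verified by comparing its numerical gradient with the field. A sketch using central differences (helper names are mine):

```python
import math

def field(x, y, z):  # the field f of Example 8.6
    return (6 * x * y * y + math.exp(x), 6 * x * x * y, 1.0)

def U(x, y, z):  # candidate potential from the first method (with c1 = 0)
    return 3 * x * x * y * y + math.exp(x) + z

def num_grad(g, p, h=1e-6):
    """Central-difference gradient of g at the point p."""
    return tuple(
        (g(*[pj + h * (i == j) for j, pj in enumerate(p)])
         - g(*[pj - h * (i == j) for j, pj in enumerate(p)])) / (2 * h)
        for i in range(3))

p = (0.4, -1.1, 2.0)
print(num_grad(U, p))  # ≈ field(0.4, -1.1, 2.0)
```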
Chapter 9
Integration of Functions of Several
Variables
References to this chapter are [O’N75, Section 4], which is quite elementary and easily accessible. Another elementary approach is [MW85, Chapter 17] (part III). A more advanced but still
accessible treatment is [Spi65, Chapter 3]. This will be our main reference here. Rudin's
book [Rud76] is not recommendable for an introduction to integration.
9.1 Basic Definition
The definition of the Riemann integral of a function f : A → R, where A ⊂ Rn is a closed
rectangle, is so similar to that of the ordinary integral that a rapid treatment will be given, see
Section 5.1.
If nothing is specified otherwise, A denotes a rectangle. A rectangle A is the cartesian product
of n intervals,
A = [a1 , b1 ] × · · · × [an , bn ] = {(x1 , . . . , xn ) ∈ Rn | ak ≤ xk ≤ bk , k = 1, . . . , n}.
Recall that a partition of a closed interval [a, b] is a sequence t0 , . . . , tk where a = t0 ≤ t1 ≤
· · · ≤ tk = b. The partition divides the interval [a, b] into k subintervals [ti−1 , ti ]. A partition
of a rectangle [a1 , b1 ] × · · · × [an , bn ] is a collection P = (P1 , . . . , Pn ) where each Pi is a
partition of the interval [ai , bi ]. Suppose for example that P1 = (t0 , . . . , tk ) is a partition of
[a1 , b1 ] and P2 = (s0 , . . . , sl ) is a partition of [a2 , b2 ]. Then the partition P = (P1 , P2 ) of
[a1 , b1 ] × [a2 , b2 ] divides the closed rectangle [a1 , b1 ] × [a2 , b2 ] into kl subrectangles, a typical
one being [ti−1 , ti ] × [sj−1 , sj ]. In general, if Pi divides [ai , bi ] into Ni subintervals, then P =
(P1 , . . . , Pn ) divides [a1 , b1 ] × · · · × [an , bn ] into N1 · · · Nn subrectangles. These subrectangles
will be called subrectangles of the partition P .
Suppose now A is a rectangle, f : A → R is a bounded function, and P is a partition of A. For
each subrectangle S of the partition let
mS = inf{f (x) | x ∈ S},
MS = sup{f (x) | x ∈ S},
and let v(S) be the volume of the rectangle S. Note that the volume of the rectangle
A = [a1, b1] × · · · × [an, bn] is
  v(A) = (b1 − a1)(b2 − a2) · · · (bn − an).
The lower and the upper sums of f for P are defined by
  L(P, f) = Σ_S mS v(S)   and   U(P, f) = Σ_S MS v(S),
where the sum is taken over all subrectangles S of the partition P. Clearly, if f is bounded with
m ≤ f(x) ≤ M on the rectangle A, then
  m v(A) ≤ L(P, f) ≤ U(P, f) ≤ M v(A),
so that the numbers L(P, f ) and U(P, f ) form bounded sets. Lemma 5.1 remains true; the proof
is completely the same.
Lemma 9.1 (a) Suppose the partition P ∗ is a refinement of P (that is, each subrectangle of P ∗
is contained in a subrectangle of P ). Then
L(P, f ) ≤ L(P ∗ , f ) and U(P ∗ , f ) ≤ U(P, f ).
(b) If P and P ′ are any two partitions, then L(P, f ) ≤ U(P ′ , f ).
It follows from the above lemma that every lower sum is bounded above by any upper sum
and vice versa.
Definition 9.1 Let f : A → R be a bounded function. The function f is called Riemann integrable on the rectangle A if
  ∫̲_A f dx := sup_P {L(P, f)} = inf_P {U(P, f)} =: ∫̄_A f dx,
where the supremum and the infimum are taken over all partitions P of A. This common
number is the Riemann integral of f on A and is denoted by
  ∫_A f dx   or   ∫_A f(x1, . . . , xn) dx1 · · · dxn.
∫̲_A f dx and ∫̄_A f dx are called the lower and the upper integral of f on A, respectively. They
always exist. The set of integrable functions on A is denoted by R(A).
As in the one dimensional case we have the following criterion.
Proposition 9.2 (Riemann Criterion) A bounded function f : A → R is integrable if and only
if for every ε > 0 there exists a partition P of A such that U(P, f ) − L(P, f ) < ε.
Example 9.1 (a) Let f : A → R be a constant function, f(x) = c. Then for any partition P and
any subrectangle S we have mS = MS = c, so that
  L(P, f) = U(P, f) = Σ_S c v(S) = c Σ_S v(S) = c v(A).
Hence, ∫_A c dx = c v(A).
(b) Let f : [0, 1] × [0, 1] → R be defined by
  f(x, y) = 0 if x is rational,   f(x, y) = 1 if x is irrational.
If P is a partition, then every subrectangle S will contain points (x, y) with x rational, and also
points (x, y) with x irrational. Hence mS = 0 and MS = 1, so
  L(P, f) = Σ_S 0·v(S) = 0   and   U(P, f) = Σ_S 1·v(S) = v(A) = v([0, 1] × [0, 1]) = 1.
Therefore, ∫̄_A f dx = 1 ≠ 0 = ∫̲_A f dx and f is not integrable.
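For an integrable function, by contrast, L(P, f) and U(P, f) squeeze together as the partition is refined. A sketch for f(x, y) = xy on [0, 1]², where the Riemann integral is 1/4; since f increases in each variable, mS and MS sit at opposite corners of each subrectangle:

```python
def darboux_sums(n):
    """L(P, f) and U(P, f) for f(x, y) = x*y on [0,1]^2, n-by-n equidistant partition."""
    h = 1.0 / n
    L = U = 0.0
    for i in range(n):
        for j in range(n):
            L += (i * h) * (j * h) * h * h              # inf at the lower-left corner
            U += ((i + 1) * h) * ((j + 1) * h) * h * h  # sup at the upper-right corner
    return L, U

for n in (4, 16, 64):
    print(n, darboux_sums(n))  # both sums tend to 1/4 and their gap shrinks
```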
9.1.1 Properties of the Riemann Integral
We briefly write R for R(A).
Remark 9.1 (a) R is a linear space and ∫_A (·) dx is a linear functional, i. e. f, g ∈ R imply
λf + µg ∈ R for all λ, µ ∈ R and
  ∫_A (λf + µg) dx = λ ∫_A f dx + µ ∫_A g dx.
(b) R is a lattice, i. e., f ∈ R implies |f| ∈ R. If f, g ∈ R, then max{f, g} ∈ R and
min{f, g} ∈ R.
(c) R is an algebra, i. e., f, g ∈ R imply f g ∈ R.
(d) The triangle inequality holds:
  | ∫_A f dx | ≤ ∫_A |f| dx.
(e) C(A) ⊂ R(A).
(f) If f ∈ R(A) and f(A) ⊂ [a, b], g ∈ C[a, b], then g◦f ∈ R(A).
(g) If f ∈ R and f = g except at finitely many points, then g ∈ R and ∫_A f dx = ∫_A g dx.
(h) Let f : A → R and let P be a partition of A. Then f ∈ R(A) if and only if f↾S is
integrable for each subrectangle S. In this case
  ∫_A f dx = Σ_S ∫_S f↾S dx.
9.2 Integrable Functions
We are going to characterize integrable functions. For, we need the notion of a set of measure
zero.
Definition 9.2 Let A be a subset of Rn. A has (n-dimensional) measure zero if for every ε > 0
there exists a sequence (Ui)i∈N of closed rectangles Ui which cover A such that Σ_{i=1}^∞ v(Ui) < ε.
Open rectangles can also be used in the definition.
Remark 9.2 (a) Any finite set {a1 , . . . , am } ⊂ Rn is of measure 0. Indeed, let ε > 0 and
choose Ui be a rectangle with midpoint ai and volume ε/m. Then {Ui | i = 1, . . . , m} covers
P
A and i v(Ui ) ≤ ε.
(b) Any countable set is of measure 0.
(c) This follows from (a) and from (d) below.
(d) If each (Ai )i∈N has measure 0 then A = A1 ∪ A2 ∪ · · · has measure 0.
Proof. Let ε > 0. Since Ai has measure 0 there exist closed rectangles Uik , i ∈ N, k ∈ N,
S
such that for fixed i, the family {Uik | k ∈ N} covers Ai , i.e.
k∈N Uik ⊇ Ai and
P
i−1
, i ∈ N. In this way we have constructed an infinite array {Uik } which
k∈N v(Uik ) ≤ ε/2
covers A. Arranging those sets in a sequence (cf. Cantor’s first diagonal process), we obtain a
sequence of rectangles which covers A and
Hence,
∞
X
∞
X
ε
v(Uik ) ≤
= 2ε.
i−1
2
i=1
i,k=1
P∞
i,k=1 v(Uik )
≤ 2ε and A has measure 0.
(e) Let A = [a1, b1] × · · · × [an, bn] be a non-singular rectangle, that is, ai < bi for all i = 1, . . . , n.
Then A is not of measure 0. Indeed, we use the following two facts about the volume of finite
unions of rectangles:
(a) v(U1 ∪ · · · ∪ Un) ≤ Σ_{i=1}^n v(Ui),
(b) U ⊆ V implies v(U) ≤ v(V).
Now let ε = v(A)/2 = (b1 − a1) · · · (bn − an)/2 and suppose that the open rectangles (Ui)i∈N
cover the compact set A with Σ_i v(Ui) ≤ ε. Then there exists a finite subcover U1 ∪ · · · ∪ Um ⊇ A.
This and (a), (b) imply
  ε < v(A) ≤ v( ∪_{i=1}^m Ui ) ≤ Σ_{i=1}^m v(Ui) ≤ Σ_{i=1}^∞ v(Ui).
This contradicts Σ_i v(Ui) ≤ ε; thus, A does not have measure 0.
Theorem 9.3 Let A be a closed rectangle and f : A → R a bounded function. Let
  B = {x ∈ A | f is discontinuous at x}.
Then f is integrable if and only if B is a set of measure 0.
For the proof see [Spi65, 3-8 Theorem] or [Rud76, Theorem 11.33].
9.2.1 Integration over More General Sets
We have so far dealt only with integrals of functions over rectangles. Integrals over other sets
are easily reduced to this type.
If C ⊂ Rn, the characteristic function χC of C is defined by
  χC(x) = 1 if x ∈ C,   χC(x) = 0 if x ∉ C.
(Figure: a set C inside a rectangle A.)
Definition 9.3 Let f : C → R be bounded and A a rectangle, C ⊂ A. We call f Riemann
integrable on C if the product function f·χC : A → R is Riemann integrable on A. In this case
we define
  ∫_C f dx = ∫_A f χC dx.
This certainly occurs if both f and χC are integrable on A. Note that ∫_C 1 dx = ∫_A χC dx =: v(C)
is defined to be the volume or measure of C.
Problem: Under which conditions on C does the volume v(C) = ∫_C dx exist? By Theorem 9.3,
χC is integrable if and only if the set B of discontinuities of χC in A has measure 0.
The boundary of a set C
(Figure: a set C in a rectangle A with its boundary ∂C.)
Let C ⊂ A. For every x ∈ A exactly one of the following three cases occurs:
(a) x has a neighborhood which is completely contained in C (x is an inner point of C),
(b) x has a neighborhood which is completely contained in Cᶜ (x is an inner point of Cᶜ),
(c) every neighborhood of x intersects both C and Cᶜ. In this case we say x belongs to the
boundary ∂C of C. By definition ∂C = C̄ ∩ (Cᶜ)̄ (the intersection of the closures); also ∂C = C̄ \ C°.
By the above discussion, A is the disjoint union of two open sets and a closed set:
A = C° ∪ ∂C ∪ (Cᶜ)°.
Theorem 9.4 The characteristic function χC : A → R is integrable if and only if the boundary
of C has measure 0.
Proof. Since the boundary ∂C is closed and contained in the bounded set A, ∂C is compact. Suppose
first x is an inner point of C. Then there is an open set U ⊂ C containing x. Thus χC (x) = 1
on x ∈ U; clearly χC is continuous at x (since it is locally constant). Similarly, if x is an inner
point of C c , χC (x) is locally constant, namely χC = 0 in a neighborhood of x. Hence χC is
continuous at x. Finally, if x is in the boundary of C for every open neighborhood U of x there
is y1 ∈ U ∩ C and y2 ∈ U ∩ C c , so that χC (y1 ) = 1 whereas χC (y2 ) = 0. Hence, χC is not
continuous at x. Thus, the set of discontinuity of χC is exactly the boundary ∂C. The rest
follows from Theorem 9.3.
Definition 9.4 A bounded set C is called Jordan measurable or simply a Jordan set if its boundary has measure 0. The integral v(C) = ∫_C 1 dx is called the n-dimensional Jordan measure of
C or the n-dimensional volume of C; sometimes we write µ(C) in place of v(C).
Naturally, the one-dimensional volume is the length, and the two-dimensional volume is the
area.
Typical Examples of Jordan Measurable Sets
Hyperplanes Σ_{i=1}^n ai xi = c and, more generally, hypersurfaces f(x1, . . . , xn) = c, f ∈ C1(G),
are sets of measure 0 in Rn. Curves in Rn have measure 0. Graphs of functions
Γf = {(x, f(x)) ∈ Rn+1 | x ∈ G}, f continuous, are of measure 0 in Rn+1. If G is a bounded
region in Rn, the boundary ∂G has measure 0. If G ⊂ Rn is a region, the cylinder
C = ∂G × R = {(x, xn+1) | x ∈ ∂G} ⊂ Rn+1 is a measure-0 set.
Let D ⊂ Rn+1 be given by
  D = {(x, xn+1) | x ∈ K, 0 ≤ xn+1 ≤ f(x)},
where K ⊂ Rn is a compact set and f : K → R is continuous. Then D is Jordan
measurable. Indeed, D is bounded by the graph Γf, the hyperplane xn+1 = 0,
and the cylinder ∂K × R = {(x, xn+1) | x ∈ ∂K}, all of which have measure 0 in Rn+1.
9.2.2 Fubini’s Theorem and Iterated Integrals
Our goal is to evaluate Riemann integrals; however, so far we have had no method to compute
multiple integrals. The following theorem fills this gap.
Theorem 9.5 (Fubini’s Theorem) Let A ⊂ Rⁿ and B ⊂ Rᵐ be closed rectangles, and let
f : A × B → R be integrable. For x ∈ A let g_x : B → R be defined by g_x(y) = f(x, y), and let
L(x) = ∫_B g_x dy = ∫_B f(x, y) dy  (lower integral),
U(x) = ∫_B g_x dy = ∫_B f(x, y) dy  (upper integral).
Then L(x) and U(x) are integrable on A, and
∫_{A×B} f dxdy = ∫_A L(x) dx = ∫_A ( ∫_B f(x, y) dy ) dx  (inner integral taken as lower integral),
∫_{A×B} f dxdy = ∫_A U(x) dx = ∫_A ( ∫_B f(x, y) dy ) dx  (inner integral taken as upper integral).
The integrals on the right are called iterated integrals.
The proof is in the appendix to this chapter.
Remarks 9.3 (a) A similar proof shows that we can exchange the order of integration:
∫_{A×B} f dxdy = ∫_B ( ∫_A f(x, y) dx ) dy,
where the inner integral may again be taken as a lower or as an upper integral. These integrals are called iterated integrals for f.
(b) In practice it is often the case that each g_x is integrable, so that
∫_{A×B} f dxdy = ∫_A ( ∫_B f(x, y) dy ) dx. This certainly occurs if f is continuous.
(c) If A = [a₁, b₁] × ··· × [aₙ, bₙ] and f : A → R is continuous, we can apply Fubini’s theorem
repeatedly to obtain
∫_A f dx = ∫_{aₙ}^{bₙ} ··· ∫_{a₁}^{b₁} f(x₁, …, xₙ) dx₁ ··· dxₙ.
(d) If C ⊂ A × B, Fubini’s theorem can be used to compute ∫_C f dx, since this is by definition
∫_{A×B} f χC dx. Here are two examples, in the cases n = 2 and n = 3.
Let a < b and let ϕ(x) and ψ(x) be continuous real-valued functions on [a, b] with ϕ(x) < ψ(x) on [a, b]. Put
C = {(x, y) ∈ R² | a ≤ x ≤ b, ϕ(x) ≤ y ≤ ψ(x)}.
Let f(x, y) be continuous on C. Then f is integrable on C and
∬_C f dxdy = ∫_a^b ( ∫_{ϕ(x)}^{ψ(x)} f(x, y) dy ) dx.
Let
G = {(x, y, z) ∈ R³ | a ≤ x ≤ b, ϕ(x) ≤ y ≤ ψ(x), α(x, y) ≤ z ≤ β(x, y)},
where all functions are sufficiently nice. Then
∭_G f(x, y, z) dxdydz = ∫_a^b ( ∫_{ϕ(x)}^{ψ(x)} ( ∫_{α(x,y)}^{β(x,y)} f(x, y, z) dz ) dy ) dx.
(e) Cavalieri’s Principle. Let A and B be Jordan sets in R³ and let A_c = {(x, y) | (x, y, c) ∈ A} be the section of A with the plane z = c; B_c is defined similarly. Suppose each A_c and B_c is Jordan measurable (in R²) and that they have the same area, v(A_c) = v(B_c), for all c ∈ R. Then A and B have the same volume, v(A) = v(B).
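Cavalieri’s principle lends itself to a quick numeric illustration. The following sketch (my own example, not from the notes) compares a straight cylinder with a sheared copy of it; both have disc cross-sections of area π at every height, so the slice-summed volumes should agree:

```python
import math

# Cavalieri's principle, illustrated numerically (my own example, not from the
# notes): the straight cylinder A = {x^2 + y^2 <= 1, 0 <= z <= 1} and the
# sheared cylinder B = {(x - z)^2 + y^2 <= 1, 0 <= z <= 1} have congruent disc
# cross-sections (area pi) at every height z, so their volumes agree.

def volume_by_slices(in_section, n=80, box=(-1.5, 2.5)):
    """Sum cross-sectional areas over z-slices (midpoint rule everywhere)."""
    dz = 1.0 / n
    lo, hi = box
    h = (hi - lo) / n
    vol = 0.0
    for i in range(n):
        z = (i + 0.5) * dz
        area = sum(h * h
                   for ix in range(n) for iy in range(n)
                   if in_section(lo + (ix + 0.5) * h, lo + (iy + 0.5) * h, z))
        vol += area * dz
    return vol

vA = volume_by_slices(lambda x, y, z: x * x + y * y <= 1.0)
vB = volume_by_slices(lambda x, y, z: (x - z) ** 2 + y * y <= 1.0)
```

Both values approximate π; the bounding box and resolution n are ad-hoc choices.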
Example 9.2 (a) Let f(x, y) = xy and
C = {(x, y) ∈ R² | 0 ≤ x ≤ 1, x² ≤ y ≤ x} = {(x, y) ∈ R² | 0 ≤ y ≤ 1, y ≤ x ≤ √y}.
Then
∬_C xy dxdy = ∫_0^1 ( ∫_{x²}^x xy dy ) dx = ∫_0^1 [ (1/2) x y² ]_{y=x²}^{y=x} dx = (1/2) ∫_0^1 (x³ − x⁵) dx
= [ x⁴/8 − x⁶/12 ]_0^1 = 1/8 − 1/12 = 1/24.
Interchanging the order of integration we obtain
∬_C xy dxdy = ∫_0^1 ( ∫_y^{√y} xy dx ) dy = ∫_0^1 [ (1/2) x² y ]_{x=y}^{x=√y} dy = (1/2) ∫_0^1 (y² − y³) dy
= [ y³/6 − y⁴/8 ]_0^1 = 1/6 − 1/8 = 1/24.
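Both iterated integrals can be checked numerically. A minimal sketch of mine (not part of the notes), using midpoint Riemann sums for both orders of integration; the resolution n is an arbitrary choice:

```python
# A numeric sanity check (my own sketch, not part of the notes) that both
# orders of integration for f(x, y) = x*y over
# C = {0 <= x <= 1, x^2 <= y <= x} give the exact value 1/24.
# Midpoint Riemann sums; the resolution n is an arbitrary choice.

def iterated(a, b, lo, hi, f, n=400):
    """Midpoint rule for int_a^b ( int_{lo(t)}^{hi(t)} f(t, s) ds ) dt."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t = a + (i + 0.5) * h
        k = (hi(t) - lo(t)) / n
        total += sum(f(t, lo(t) + (j + 0.5) * k) for j in range(n)) * k * h
    return total

f = lambda x, y: x * y
I1 = iterated(0.0, 1.0, lambda x: x * x, lambda x: x, f)          # dy dx order
I2 = iterated(0.0, 1.0, lambda y: y, lambda y: y ** 0.5,          # dx dy order
              lambda y, x: f(x, y))
```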
(b) Let G = {(x, y, z) ∈ R³ | x, y, z ≥ 0, x + y + z ≤ 1} and f(x, y, z) = 1/(x + y + z + 1)³.
The set G can be parametrized as follows:
∭_G f dxdydz = ∫_0^1 ∫_0^{1−x} ∫_0^{1−x−y} dz/(1 + x + y + z)³ dy dx
= ∫_0^1 ∫_0^{1−x} [ −1/(2(1 + x + y + z)²) ]_{z=0}^{z=1−x−y} dy dx
= ∫_0^1 ∫_0^{1−x} ( 1/(2(1 + x + y)²) − 1/8 ) dy dx
= (1/2) ∫_0^1 ( 1/(x + 1) + (x − 3)/4 ) dx = (1/2) ( log 2 − 5/8 ).
(c) Let f(x, y) = e^{y/x} and let D be the quadrilateral with vertices (1,1), (2,2), (1,2), and (2,4). Compute the integral of f on D.
D can be parametrized as follows: D = {(x, y) | 1 ≤ x ≤ 2, x ≤ y ≤ 2x}. Hence,
∬_D f dxdy = ∫_1^2 ( ∫_x^{2x} e^{y/x} dy ) dx = ∫_1^2 [ x e^{y/x} ]_{y=x}^{y=2x} dx = ∫_1^2 (e² x − e x) dx = (3/2)(e² − e).
But trying to reverse the order of integration, we encounter two problems. First, we must break
D into several regions:
∬_D f dxdy = ∫_1^2 ( ∫_1^y e^{y/x} dx ) dy + ∫_2^4 ( ∫_{y/2}^2 e^{y/x} dx ) dy.
This is not a serious problem. A greater problem is that e^{y/x} has no elementary antiderivative with respect to x, so
∫_1^y e^{y/x} dx and ∫_{y/2}^2 e^{y/x} dx are very difficult to evaluate. In this example, there is a considerable
advantage in one order of integration over the other.
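The convenient order of integration is easy to verify numerically. A sketch of mine (not from the notes), using midpoint sums; n is an arbitrary resolution:

```python
import math

# Numeric check (my own sketch) of the convenient order of integration in
# Example 9.2 (c): f(x, y) = exp(y/x) over D = {1 <= x <= 2, x <= y <= 2x}.
# The closed form is (3/2)(e^2 - e); n is an arbitrary resolution.

def integral_dydx(n=400):
    hx = 1.0 / n
    total = 0.0
    for i in range(n):
        x = 1.0 + (i + 0.5) * hx
        hy = x / n                       # inner interval [x, 2x] has length x
        inner = sum(math.exp((x + (j + 0.5) * hy) / x) for j in range(n)) * hy
        total += inner * hx
    return total

exact = 1.5 * (math.e ** 2 - math.e)
approx = integral_dydx()
```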
The Cosmopolitan Integral
Let f ≥ 0 be a continuous real-valued function on [a, b]. There are some very special solids whose volumes can be expressed by integrals. The simplest such solid G is a “volume of revolution” obtained by revolving the region under the graph of f ≥ 0 on [a, b] around the horizontal axis. We apply Fubini’s theorem to the set
G = {(x, y, z) ∈ R³ | a ≤ x ≤ b, y² + z² ≤ f(x)²}.
Consequently, the volume v(G) is given by
v(G) = ∭_G dxdydz = ∫_a^b ( ∬_{G_x} dydz ) dx,   (9.1)
where G_x = {(y, z) ∈ R² | y² + z² ≤ f(x)²} is the closed disc of radius f(x) around (0, 0).
For any fixed x ∈ [a, b] its area is v(G_x) = ∬_{G_x} dydz = π f(x)². Hence
v(G) = π ∫_a^b f(x)² dx.   (9.2)
Example 9.3 We compute the volume of the ellipsoid obtained by revolving the graph of the ellipse
x²/a² + y²/b² = 1
around the x-axis. We have y² = f(x)² = b²(1 − x²/a²); hence
v(G) = π b² ∫_{−a}^{a} ( 1 − x²/a² ) dx = π b² [ x − x³/(3a²) ]_{−a}^{a} = π b² ( 2a − 2a/3 ) = (4π/3) b² a.
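Formula (9.2) is easy to check numerically. A minimal sketch of mine; the semi-axes a, b below are arbitrary sample values:

```python
import math

# Numeric sketch (mine) of formula (9.2): the solid of revolution of the
# half-ellipse y = b*sqrt(1 - x^2/a^2) has volume 4*pi*a*b^2/3. The sample
# semi-axes a, b are arbitrary.

def volume_of_revolution(f_sq, lo, hi, n=1000):
    """v = pi * int f(x)^2 dx, midpoint rule."""
    h = (hi - lo) / n
    return math.pi * sum(f_sq(lo + (i + 0.5) * h) for i in range(n)) * h

a, b = 2.0, 0.5
v = volume_of_revolution(lambda x: b * b * (1 - x * x / (a * a)), -a, a)
```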
9.3 Change of Variable
We want to generalize the change of variables formula
∫_{g(a)}^{g(b)} f(x) dx = ∫_a^b f(g(y)) g′(y) dy.
If B_R is the ball in R³ with radius R around the origin, we have in cartesian coordinates
∭_{B_R} f dxdydz = ∫_{−R}^{R} ( ∫_{−√(R²−x²)}^{√(R²−x²)} ( ∫_{−√(R²−x²−y²)}^{√(R²−x²−y²)} f(x, y, z) dz ) dy ) dx.
Usually, the complicated limits yield hard computations. Here spherical coordinates are appropriate.
To motivate the formula, consider the area of a parallelogram D in the x-y-plane spanned by the two vectors a = (a₁, a₂) and b = (b₁, b₂):
D = {λa + µb | λ, µ ∈ [0, 1]} = {(g₁(λ, µ), g₂(λ, µ)) | λ, µ ∈ [0, 1]},
where g₁(λ, µ) = λa₁ + µb₁ and g₂(λ, µ) = λa₂ + µb₂. As known from linear algebra, the area of D equals the norm of the vector product
v(D) = ‖a × b‖ = ‖ det[ e₁ e₂ e₃ ; a₁ a₂ 0 ; b₁ b₂ 0 ] ‖ = ‖(0, 0, a₁b₂ − a₂b₁)‖ = | a₁b₂ − a₂b₁ | =: d.
Introducing new variables λ and µ with
x = λa₁ + µb₁,  y = λa₂ + µb₂,
the parallelogram D in the x-y-plane corresponds to the unit square C = [0, 1] × [0, 1] in the λ-µ-plane, and D = g(C). We want to compare the area d of D with the area 1 of C. Note that d is exactly the absolute value of the Jacobian ∂(g₁, g₂)/∂(λ, µ); indeed
∂(g₁, g₂)/∂(λ, µ) = det[ ∂g₁/∂λ  ∂g₁/∂µ ; ∂g₂/∂λ  ∂g₂/∂µ ] = det[ a₁ b₁ ; a₂ b₂ ] = a₁b₂ − a₂b₁.
Hence,
∬_D dxdy = ∬_C | ∂(g₁, g₂)/∂(λ, µ) | dλdµ.
This is true in any Rⁿ for any regular map g : C → D.
Theorem 9.6 (Change of variable) Let C and D be compact Jordan sets in Rⁿ, and let M ⊂ C be a set of measure 0. Let g : C → D be continuously differentiable with the following properties:
(i) g is injective on C \ M.
(ii) g′(x) is regular on C \ M.
Let f : D → R be continuous. Then
∫_D f(y) dy = ∫_C f(g(x)) | ∂(g₁, …, gₙ)/∂(x₁, …, xₙ)(x) | dx.   (9.3)
Remark 9.4 Why the absolute value of the Jacobian? In R¹ we don’t have the absolute value. But in contrast to Rⁿ, n > 1, in R¹ we have an orientation of the integration set: ∫_a^b f dx = −∫_b^a f dx.
For the proof see [Rud76, 10.9 Theorem]. The main steps of the proof are: 1) In a small open set, g can be written as the composition of n “flips” and n “primitive mappings”. A flip exchanges two variables x_i and x_k, whereas a primitive mapping H is equal to the identity except for one variable, H(x) = x + (h(x) − x)e_m, where h : U → R.
2) If the statement is true for transformations S and T, then it is true for the composition S ∘ T; this follows from det(AB) = det A det B.
3) Use a partition of unity.
Example 9.4 (a) Polar coordinates. Let A = {(r, ϕ) | 0 ≤ r ≤ R, 0 ≤ ϕ < 2π} be a rectangle in polar coordinates. The mapping g(r, ϕ) = (x, y), x = r cos ϕ, y = r sin ϕ, maps this rectangle continuously differentiably onto the disc D of radius R. Let M = {(r, ϕ) | r = 0}. Since ∂(x, y)/∂(r, ϕ) = r, the map g is bijective and regular on A \ M. The assumptions of the theorem are satisfied and we have
∬_D f(x, y) dxdy = ∬_A f(r cos ϕ, r sin ϕ) r drdϕ = ∫_0^R ( ∫_0^{2π} f(r cos ϕ, r sin ϕ) r dϕ ) dr,
where the last equality is by Fubini’s theorem.
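A small numeric sketch of the polar-coordinates formula (my own example): integrating f(x, y) = x² + y² over the disc of radius R, where the exact value is 2π ∫_0^R r²·r dr = πR⁴/2:

```python
import math

# Numeric sketch (my own example) of the polar change of variables: integrate
# f(x, y) = x^2 + y^2 over the disc of radius R. In polar coordinates the
# Jacobian contributes the factor r; the exact value is pi*R^4/2.

def polar_integral(f, R, n=500):
    hr, hp = R / n, 2 * math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * hr
        for j in range(n):
            phi = (j + 0.5) * hp
            total += f(r * math.cos(phi), r * math.sin(phi)) * r * hr * hp
    return total

R = 1.5
I = polar_integral(lambda x, y: x * x + y * y, R)
```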
(b) Spherical coordinates. Recall from the exercise class the spherical coordinates r ∈ [0, ∞), ϕ ∈ [0, 2π], and ϑ ∈ [0, π]:
x = r sin ϑ cos ϕ,  y = r sin ϑ sin ϕ,  z = r cos ϑ.
The Jacobian reads
∂(x, y, z)/∂(r, ϑ, ϕ) = det[ x_r x_ϑ x_ϕ ; y_r y_ϑ y_ϕ ; z_r z_ϑ z_ϕ ]
= det[ sin ϑ cos ϕ  r cos ϑ cos ϕ  −r sin ϑ sin ϕ ; sin ϑ sin ϕ  r cos ϑ sin ϕ  r sin ϑ cos ϕ ; cos ϑ  −r sin ϑ  0 ] = r² sin ϑ.
Sometimes one uses the order (r, ϕ, ϑ) of variables; then ∂(x, y, z)/∂(r, ϕ, ϑ) = −r² sin ϑ.
Hence, for the unit ball B₁,
∭_{B₁} f(x, y, z) dxdydz = ∫_0^1 ( ∫_0^{2π} ( ∫_0^π f(r sin ϑ cos ϕ, r sin ϑ sin ϕ, r cos ϑ) r² sin ϑ dϑ ) dϕ ) dr.
This example was not covered in the lecture. Compute the volume of the ellipsoid E given by u²/a² + v²/b² + w²/c² = 1. We use scaled spherical coordinates:
u = a r sin ϑ cos ϕ,  v = b r sin ϑ sin ϕ,  w = c r cos ϑ,
where r ∈ [0, 1], ϑ ∈ [0, π], ϕ ∈ [0, 2π]. Since the rows of the spherical Jacobian matrix ∂(x, y, z)/∂(r, ϑ, ϕ) are simply multiplied by a, b, and c, respectively, we have
∂(u, v, w)/∂(r, ϑ, ϕ) = abc r² sin ϑ.
Hence, if B₁ is the unit ball around 0, we have, using iterated integrals,
v(E) = ∭_E dudvdw = abc ∭_{B₁} r² sin ϑ drdϑdϕ = abc ∫_0^1 r² dr ∫_0^{2π} dϕ ∫_0^π sin ϑ dϑ
= abc · (1/3) · 2π · (−cos ϑ)|_0^π = (4π/3) abc.
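Since the integrand separates, the ellipsoid volume can be checked with three one-dimensional midpoint sums (a sketch of mine; a, b, c are sample semi-axes):

```python
import math

# The ellipsoid volume separates into three 1D integrals (my own sketch):
# v(E) = abc * int_0^1 r^2 dr * int_0^{2pi} dphi * int_0^pi sin(theta) dtheta.
# The semi-axes a, b, c below are arbitrary sample values.

def midpoint(f, lo, hi, n=2000):
    h = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * h) for i in range(n)) * h

a, b, c = 3.0, 2.0, 1.0
v = (a * b * c
     * midpoint(lambda r: r * r, 0.0, 1.0)
     * midpoint(lambda p: 1.0, 0.0, 2 * math.pi)
     * midpoint(math.sin, 0.0, math.pi))
```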
(c) Compute
∬_C (x² + y²) dxdy,
where C is bounded by the four hyperbolas
xy = 1,  xy = 2,  x² − y² = 1,  x² − y² = 4.
We change coordinates, g(x, y) = (u, v):
u = xy,  v = x² − y².
The Jacobian is
∂(u, v)/∂(x, y) = det[ y  x ; 2x  −2y ] = −2(x² + y²).
The Jacobian of the inverse transform is
∂(x, y)/∂(u, v) = −1/(2(x² + y²)).
In the (u, v)-plane, the region is a rectangle D = {(u, v) ∈ R² | 1 ≤ u ≤ 2, 1 ≤ v ≤ 4}. Hence,
∬_C (x² + y²) dxdy = ∬_D (x² + y²) | ∂(x, y)/∂(u, v) | dudv = ∬_D (x² + y²)/(2(x² + y²)) dudv = (1/2) v(D) = 3/2.
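The change-of-variables result can be cross-checked by brute force: sum (x² + y²) over a fine grid restricted to the (first-quadrant) region. A rough sketch of mine; the bounding box and resolution are ad-hoc choices:

```python
# Brute-force cross-check (my own sketch) of the change-of-variables result:
# sum (x^2 + y^2) over grid points of the first-quadrant region
# 1 <= x*y <= 2, 1 <= x^2 - y^2 <= 4. The exact value is 3/2.

def region_sum(n=900):
    x_lo, x_hi, y_lo, y_hi = 0.9, 2.4, 0.3, 1.4   # box containing the region
    hx, hy = (x_hi - x_lo) / n, (y_hi - y_lo) / n
    total = 0.0
    for i in range(n):
        x = x_lo + (i + 0.5) * hx
        for j in range(n):
            y = y_lo + (j + 0.5) * hy
            if 1.0 <= x * y <= 2.0 and 1.0 <= x * x - y * y <= 4.0:
                total += (x * x + y * y) * hx * hy
    return total

I = region_sum()
```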
Physical Applications
If ρ(x) = ρ(x₁, x₂, x₃) is a mass density of a solid C ⊂ R³, then
m = ∫_C ρ dx is the mass of C, and
x̄_i = (1/m) ∫_C x_i ρ(x) dx, i = 1, …, 3, are the coordinates of the mass center x̄ of C.
The moments of inertia of C are defined as follows:
I_xx = ∭_C (y² + z²) ρ dxdydz,  I_yy = ∭_C (x² + z²) ρ dxdydz,  I_zz = ∭_C (x² + y²) ρ dxdydz,
I_xy = ∭_C xy ρ dxdydz,  I_xz = ∭_C xz ρ dxdydz,  I_yz = ∭_C yz ρ dxdydz.
Here Ixx , Iyy , and Izz are the moments of inertia of the solid with respect to the x-axis, y-axis,
and z-axis, respectively.
Example 9.5 Compute the mass center of a homogeneous half-plate of radius R, C = {(x, y) |
x2 + y 2 ≤ R2 , y ≥ 0}.
Solution. By the symmetry of C with respect to the y-axis, x̄ = 0. Using polar coordinates we find
ȳ = (1/m) ∬_C y dxdy = (1/m) ∫_0^R ( ∫_0^π (r sin ϕ) r dϕ ) dr = (1/m) ∫_0^R r² dr · (−cos ϕ)|_0^π = (1/m) · (2R³/3).
Since the mass is proportional to the area, m = πR²/2, and we find that (0, 4R/(3π)) is the mass center of the half-plate.
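The centroid height 4R/(3π) is easy to confirm numerically. A sketch of mine, summing y dA in polar coordinates; R is an arbitrary sample value:

```python
import math

# Numeric sketch (mine) of Example 9.5: the centroid height of the half-disc
# {x^2 + y^2 <= R^2, y >= 0}, computed in polar coordinates, against the
# closed form 4R/(3*pi). R is an arbitrary sample value.

def centroid_y(R, n=400):
    hr, hp = R / n, math.pi / n
    mass = 0.5 * math.pi * R * R          # area of the half-disc
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * hr
        for j in range(n):
            phi = (j + 0.5) * hp
            total += (r * math.sin(phi)) * r * hr * hp   # y dA in polar form
    return total / mass

R = 2.0
ybar = centroid_y(R)
```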
9.4 Appendix
Proof of Fubini’s Theorem. Let P_A be a partition of A and P_B a partition of B. Together they give a partition P of A × B for which any subrectangle S is of the form S_A × S_B, where S_A is a subrectangle of the partition P_A, and S_B is a subrectangle of the partition P_B. Thus
L(P, f) = Σ_S m_S v(S) = Σ_{S_A, S_B} m_{S_A×S_B} v(S_A × S_B) = Σ_{S_A} ( Σ_{S_B} m_{S_A×S_B} v(S_B) ) v(S_A).
Now, if x ∈ S_A, then clearly m_{S_A×S_B}(f) ≤ m_{S_B}(g_x), since the reference set S_A × S_B on the left is bigger than the reference set {x} × S_B on the right. Consequently, for x ∈ S_A we have
Σ_{S_B} m_{S_A×S_B}(f) v(S_B) ≤ Σ_{S_B} m_{S_B}(g_x) v(S_B) ≤ ∫_B g_x dy = L(x)  (lower integral).
Since this holds for every x ∈ S_A, it follows that Σ_{S_B} m_{S_A×S_B}(f) v(S_B) ≤ m_{S_A}(L).
Therefore,
Σ_{S_A} ( Σ_{S_B} m_{S_A×S_B}(f) v(S_B) ) v(S_A) ≤ Σ_{S_A} m_{S_A}(L) v(S_A) = L(P_A, L).
We thus obtain
L(P, f) ≤ L(P_A, L) ≤ U(P_A, L) ≤ U(P_A, U) ≤ U(P, f),
where the proof of the last inequality is entirely analogous to the proof of the first. Since f is integrable, sup{L(P, f)} = inf{U(P, f)} = ∫_{A×B} f dxdy. Hence,
sup{L(P_A, L)} = inf{U(P_A, L)} = ∫_{A×B} f dxdy.
In other words, L(x) is integrable on A and ∫_{A×B} f dxdy = ∫_A L(x) dx.
The assertion for U(x) follows similarly from the inequalities
L(P, f) ≤ L(P_A, L) ≤ L(P_A, U) ≤ U(P_A, U) ≤ U(P, f).
Chapter 10
Surface Integrals
10.1 Surfaces in R³
Recall that a domain G is an open and connected subset in Rn ; connected means that for any
two points x and y in G, there exist points x0 , x1 , . . . , xk with x0 = x and xk = y such that
every segment xi−1 xi , i = 1, . . . , k, is completely contained in G.
Definition 10.1 Let G ⊂ R2 be a domain and F : G → R3 continuously differentiable. The
mapping F as well as the set F = F (G) = {F (s, t) | (s, t) ∈ G} is called an open regular
surface if the Jacobian matrix F ′ (s, t) has rank 2 for all (s, t) ∈ G.
If
F(s, t) = (x(s, t), y(s, t), z(s, t)),
the Jacobian matrix of F is
F′(s, t) = [ x_s  x_t ; y_s  y_t ; z_s  z_t ].
The two column vectors of F′(s, t) span the tangent plane to F at (s, t):
D₁F(s, t) = ( ∂x/∂s (s, t), ∂y/∂s (s, t), ∂z/∂s (s, t) ),
D₂F(s, t) = ( ∂x/∂t (s, t), ∂y/∂t (s, t), ∂z/∂t (s, t) ).
Justification: Suppose (s, t0 ) ∈ G where t0 is fixed. Then γ(s) = F (s, t0 ) defines a curve
in F with tangent vector γ ′ (s) = D1 F (s, t0 ). Similarly, for fixed s0 we obtain another curve
γ̃(t) = F (s0 , t) with tangent vector γ̃ ′ (t) = D2 F (s0 , t). Since F ′ (s, t) has rank 2 at every point
of G, the vectors D1 F and D2 F are linearly independent; hence they span a plane.
Definition 10.2 Let F : G → R3 be an open regular surface, and (s0 , t0 ) ∈ G. Then
~x = F (s0 , t0 ) + αD1 F (s0 , t0 ) + βD2 F (s0 , t0 ),
α, β ∈ R
is called the tangent plane E to F at F (s0 , t0 ). The line through F (s0 , t0 ) which is orthogonal
to E is called the normal line to F at F (s0 , t0 ).
Recall that the vector product ~x × ~y of vectors ~x = (x₁, x₂, x₃) and ~y = (y₁, y₂, y₃) from R³ is the vector
~x × ~y = det[ e₁ e₂ e₃ ; x₁ x₂ x₃ ; y₁ y₂ y₃ ] = (x₂y₃ − y₂x₃, x₃y₁ − y₃x₁, x₁y₂ − y₁x₂).
It is orthogonal to the plane spanned by the parallelogram P with edges ~x and ~y. Its length is the area of the parallelogram P.
A vector which points in the direction of the normal line is
D₁F(s₀, t₀) × D₂F(s₀, t₀) = det[ e₁ e₂ e₃ ; x_s y_s z_s ; x_t y_t z_t ],   (10.1)
and the unit normal vector at (s₀, t₀) is
~n = ± (D₁F × D₂F) / ‖D₁F × D₂F‖.   (10.2)
Example 10.1 (Graph of a function) Let F be given by the graph of a function f : G → R, namely F(x, y) = (x, y, f(x, y)). By definition
D₁F = (1, 0, f_x),  D₂F = (0, 1, f_y),
hence
D₁F × D₂F = det[ e₁ e₂ e₃ ; 1 0 f_x ; 0 1 f_y ] = (−f_x, −f_y, 1).
Therefore, the tangent plane has the equation
−f_x (x − x₀) − f_y (y − y₀) + (z − z₀) = 0.
Further, the unit normal vector to the tangent plane is
~n = ± (f_x, f_y, −1) / √(f_x² + f_y² + 1).
10.1.1 The Area of a Surface
Let F and F be as above. We assume that the continuous vector fields D1 F and D2 F on G can
be extended to continuous functions on the closure G.
Definition 10.3 The number
|F| := ∬_G ‖D₁F × D₂F‖ dsdt   (10.3)
is called the area of F and of F. We call
dS = ‖D₁F × D₂F‖ dsdt
the scalar surface element of F. In this notation, |F| = ∬_G dS.
Example 10.2 Let F = {(s cos t, s sin t, 2t) | s ∈ [0, 2], t ∈ [0, 4π]} be the surface spanned by a helix. We shall compute its area. We have
D₁F = (cos t, sin t, 0),  D₂F = (−s sin t, s cos t, 2),
such that
D₁F × D₂F = det[ e₁ e₂ e₃ ; cos t  sin t  0 ; −s sin t  s cos t  2 ] = (2 sin t, −2 cos t, s).
Therefore,
|F| = ∫_0^{4π} ∫_0^2 √(4 cos²t + 4 sin²t + s²) dsdt = 4π ∫_0^2 √(4 + s²) ds = 8π( √2 − log(√2 − 1) ).
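The closed form can be verified by a one-dimensional midpoint sum (my own sketch, not from the notes):

```python
import math

# Numeric check (my own sketch) of Example 10.2: the helicoidal surface area
# |F| = 4*pi * int_0^2 sqrt(4 + s^2) ds versus the closed form
# 8*pi*(sqrt(2) - log(sqrt(2) - 1)).

def helix_area(n=2000):
    h = 2.0 / n
    inner = sum(math.sqrt(4.0 + ((i + 0.5) * h) ** 2) for i in range(n)) * h
    return 4 * math.pi * inner

exact = 8 * math.pi * (math.sqrt(2) - math.log(math.sqrt(2) - 1))
approx = helix_area()
```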
Example 10.3 (Guldin’s Rule (Paul Guldin, 1577–1643, Swiss mathematician)) Let f be a continuously differentiable function on [a, b] with f(x) ≥ 0 for all x ∈ [a, b]. Let the graph of f revolve around the x-axis and let F be the corresponding surface. We have
|F| = 2π ∫_a^b f(x) √(1 + f′(x)²) dx.
Proof. Using polar coordinates in the y-z-plane, we obtain a parametrization of F:
F = {(x, f(x) cos ϕ, f(x) sin ϕ) | x ∈ [a, b], ϕ ∈ [0, 2π]}.
We have
D₁F = (1, f′(x) cos ϕ, f′(x) sin ϕ),  D₂F = (0, −f sin ϕ, f cos ϕ),
D₁F × D₂F = (f f′, −f cos ϕ, −f sin ϕ),
so that dS = f(x) √(1 + f′(x)²) dxdϕ. Hence
|F| = ∫_a^b ( ∫_0^{2π} f(x) √(1 + f′(x)²) dϕ ) dx = 2π ∫_a^b f(x) √(1 + f′(x)²) dx.
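A quick sanity check of Guldin's rule on an example of my own: revolving f(x) = x on [0, 1] gives a cone of radius 1 and slant height √2, whose lateral area π·r·l = π√2 should be reproduced:

```python
import math

# Sanity check of Guldin's rule on my own example: revolving f(x) = x on
# [0, 1] around the x-axis gives a cone of radius 1 and slant height sqrt(2),
# whose lateral area is pi*r*l = pi*sqrt(2).

def guldin_area(f, fprime, a, b, n=2000):
    """|F| = 2*pi * int_a^b f(x) * sqrt(1 + f'(x)^2) dx, midpoint rule."""
    h = (b - a) / n
    s = sum(f(a + (i + 0.5) * h) * math.sqrt(1.0 + fprime(a + (i + 0.5) * h) ** 2)
            for i in range(n))
    return 2 * math.pi * s * h

cone_area = guldin_area(lambda x: x, lambda x: 1.0, 0.0, 1.0)
```

Since the integrand is linear here, the midpoint rule reproduces the value essentially exactly.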
10.2 Scalar Surface Integrals
Let F and F be as above, and let f : F → R be a continuous function on the compact subset F ⊂ R³.
Definition 10.4 The number
∬_F f(~x) dS := ∬_G f(F(s, t)) ‖D₁F(s, t) × D₂F(s, t)‖ dsdt
is called the scalar surface integral of f on F.
10.2.1 Other Forms for dS
(a) Let the surface F be given as the graph of a function F(x, y) = (x, y, f(x, y)), (x, y) ∈ G. Then, see Example 10.1,
dS = √(1 + f_x² + f_y²) dxdy.
(b) Let the surface be given implicitly as F(x, y, z) = 0, and suppose this equation is locally solvable for z in a neighborhood of some point (x₀, y₀, z₀). Then the surface element (up to the sign) is given by
dS = ( √(F_x² + F_y² + F_z²) / |F_z| ) dxdy = ( ‖grad F‖ / |F_z| ) dxdy.
One checks that D₁F × D₂F = (F_x, F_y, F_z)/F_z.
(c) If F is given by F(s, t) = (x(s, t), y(s, t), z(s, t)) we have
dS = √(EG − H²) dsdt,
where
E = x_s² + y_s² + z_s²,  G = x_t² + y_t² + z_t²,  H = x_s x_t + y_s y_t + z_s z_t.
Indeed, using ‖~a × ~b‖ = ‖~a‖ ‖~b‖ sin ϕ and sin²ϕ = 1 − cos²ϕ, where ϕ is the angle spanned by ~a and ~b, we get
EG − H² = ‖D₁F‖² ‖D₂F‖² − (D₁F · D₂F)² = ‖D₁F‖² ‖D₂F‖² (1 − cos²ϕ) = ‖D₁F‖² ‖D₂F‖² sin²ϕ = ‖D₁F × D₂F‖²,
which proves the claim.
Example 10.4 (a) We give two different forms for the scalar surface element of a sphere. By (b), the sphere x² + y² + z² = R² has surface element
dS = ( ‖(2x, 2y, 2z)‖ / |2z| ) dxdy = ( R/|z| ) dxdy.
If
x = R cos ϕ sin ϑ,  y = R sin ϕ sin ϑ,  z = R cos ϑ,
we obtain
D₁ = F_ϑ = R (cos ϕ cos ϑ, sin ϕ cos ϑ, −sin ϑ),
D₂ = F_ϕ = R (−sin ϕ sin ϑ, cos ϕ sin ϑ, 0),
D₁ × D₂ = R² (cos ϕ sin²ϑ, sin ϕ sin²ϑ, sin ϑ cos ϑ).
Hence,
dS = ‖D₁ × D₂‖ dϑdϕ = R² sin ϑ dϑdϕ.
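Summing this surface element over the full parameter rectangle should recover the sphere area 4πR². A numeric sketch of mine (R is a sample value):

```python
import math

# Numeric sketch (mine): summing the surface element dS = R^2 sin(theta)
# dtheta dphi over theta in [0, pi] recovers the sphere area 4*pi*R^2.
# The phi-integral contributes the exact factor 2*pi.

def sphere_area(R, n=1000):
    ht = math.pi / n
    s = sum(R * R * math.sin((i + 0.5) * ht) for i in range(n)) * ht
    return 2 * math.pi * s

R = 3.0
A = sphere_area(R)
```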
(b) Riemann integral in R³ and surface integrals over spheres. Let M = {(x, y, z) ∈ R³ | ρ ≤ ‖(x, y, z)‖ ≤ R}, where R > ρ ≥ 0. Let f : M → R be continuous. Let us denote the sphere of radius r by S_r = {(x, y, z) ∈ R³ | x² + y² + z² = r²}. Then
∭_M f dxdydz = ∫_ρ^R ( ∬_{S_r} f(~x) dS ) dr = ∫_ρ^R r² ( ∬_{S₁} f(r~x) dS(~x) ) dr.
Indeed, by the previous example and by our knowledge of spherical coordinates (r, ϑ, ϕ),
dxdydz = r² sin ϑ dr dϑ dϕ = dr dS_r.
On the other hand, on the unit sphere S₁ we have dS = sin ϑ dϑ dϕ, such that
dxdydz = r² dr dS,
which establishes the second formula.
10.2.2 Physical Applications
(a) If ρ(x, y, z) is the mass density on a surface F, then ∬_F ρ dS is the total mass of F. The mass center of F has coordinates (x_c, y_c, z_c) with
x_c = ( 1 / ∬_F ρ dS ) ∬_F x ρ(x, y, z) dS,
and similarly for y_c and z_c.
(b) Let σ(~x) be a charge density on a surface F. Then
U(~y) = ∬_F ( σ(~x) / ‖~y − ~x‖ ) dS(~x),  ~y ∉ F,
is the potential generated by F.
10.3 Surface Integrals
10.3.1 Orientation
We want to define the notion of orientation for a regular surface. Let F be a regular (injective)
surface with or without boundary. Then for every point x0 ∈ F there exists the tangent plane
Ex0 ; the normal line to F at x0 is uniquely defined.
However, a unit vector on the normal line can have two different directions.
Definition 10.5 (a) Let F be a surface as above. A unit normal field to F is a continuous
function ~n : F → R3 with the following two properties for every x0 ∈ F
(i) ~n(x0 ) is orthogonal to the tangent plane to F at x0 .
(ii) k~n(x0 )k = 1.
(b) A regular surface F is called orientable, if there exists a unit normal field on F.
Suppose F is an oriented, open, regular surface with piecewise smooth boundary ∂F. Let
F (s, t) be a parametrization of F. We assume that the vector functions F , DF1 , and DF2 can
be extended to continuous functions on F. The unit normal vector is given by
~n = ε (D₁F × D₂F) / ‖D₁F × D₂F‖,
where ε = +1 or ε = −1 fixes the orientation of F. It turns out that for a regular surface F there exist either exactly two unit normal fields or none at all. If F is provided with an orientation, we write F₊ for the pair (F, ~n). For F with the opposite orientation, we write F₋.
Examples of non-orientable surfaces are the Möbius band and the real projective plane. Analytically the Möbius band is given by
F(s, t) = ( (1 + t cos(s/2)) sin s, (1 + t cos(s/2)) cos s, t sin(s/2) ),  (s, t) ∈ [0, 2π] × [−1/2, 1/2].
Definition 10.6 Let ~f : F → R³ be a continuous vector field on F. The number
∬_{F₊} ~f(~x) · ~n dS   (10.4)
is called the surface integral of the vector field ~f on F₊. We call
d~S = ~n dS = ε D₁F × D₂F dsdt
the surface element of F.
Remark 10.1 (a) The surface integral is independent of the parametrization of F but depends on the orientation: ∬_{F₊} ~f · d~S = −∬_{F₋} ~f · d~S.
For, let (s, t) = (s(ξ, η), t(ξ, η)) be a new parametrization with F(s(ξ, η), t(ξ, η)) = G(ξ, η). Then
dsdt = ( ∂(s, t)/∂(ξ, η) ) dξdη = (s_ξ t_η − s_η t_ξ) dξdη.
Further,
D₁G = D₁F s_ξ + D₂F t_ξ,  D₂G = D₁F s_η + D₂F t_η,
so that, using ~x × ~x = 0 and ~x × ~y = −~y × ~x,
D₁G × D₂G dξdη = (D₁F s_ξ + D₂F t_ξ) × (D₁F s_η + D₂F t_η) dξdη = (s_ξ t_η − s_η t_ξ) D₁F × D₂F dξdη = D₁F × D₂F dsdt.
(b) The scalar surface integral is a special case of the surface integral, namely
∬ f dS = ∬ (f ~n) · ~n dS.
(c) Special cases. Let F be the graph of a function f, F = {(x, y, f(x, y)) | (x, y) ∈ C}; then
d~S = ±(−f_x, −f_y, 1) dxdy.
If the surface is given implicitly by F(x, y, z) = 0 and it is locally solvable for z, then
d~S = ±( grad F / F_z ) dxdy.
(d) Still another form of d~S:
∬_F ~f · d~S = ε ∬_G ~f(F(s, t)) · (D₁F × D₂F) dsdt,
~f(F(s, t)) · (D₁F × D₂F) = det[ f₁(F(s, t))  f₂(F(s, t))  f₃(F(s, t)) ; x_s(s, t)  y_s(s, t)  z_s(s, t) ; x_t(s, t)  y_t(s, t)  z_t(s, t) ].   (10.5)
(e) Again another notation. Computing the previous determinant, or the determinant (10.1), explicitly, we have
~f · (D₁F × D₂F) = f₁ det[ y_s z_s ; y_t z_t ] + f₂ det[ z_s x_s ; z_t x_t ] + f₃ det[ x_s y_s ; x_t y_t ]
= f₁ ∂(y, z)/∂(s, t) + f₂ ∂(z, x)/∂(s, t) + f₃ ∂(x, y)/∂(s, t).
Hence,
d~S = ε D₁F × D₂F dsdt = ε ( ∂(y, z)/∂(s, t), ∂(z, x)/∂(s, t), ∂(x, y)/∂(s, t) ) dsdt = ε ( dydz, dzdx, dxdy ).
Therefore we can write
∬_F ~f · d~S = ∬_F ( f₁ dydz + f₂ dzdx + f₃ dxdy ).
In this setting,
∬_F f₁ dydz = ∬_F (f₁, 0, 0) · d~S = ± ∬_G f₁(F(s, t)) ( ∂(y, z)/∂(s, t) ) dsdt.
Sometimes one uses
d~S = ( cos(~n, e₁), cos(~n, e₂), cos(~n, e₃) ) dS,
since cos(~n, e_i) = ~n · e_i = n_i and d~S = ~n dS.
Note that we have surface integrals in the last two lines, not ordinary double integrals, since F is a surface in R³ and f₁ = f₁(x, y, z) can also depend on x.
The physical meaning of ∬_F ~f · d~S is the flow of the vector field ~f through the surface F. The flow is (locally) positive if ~n and ~f are on the same side of the tangent plane to F, and negative in the other case.
Example 10.5 (a) Compute the surface integral
∬_{F₊} f dzdx
of f(x, y, z) = x²yz, where F is the graph of g(x, y) = x² + y over the unit square G = [0, 1] × [0, 1] with the downward directed unit normal field.
By Remark 10.1 (c),
d~S = (g_x, g_y, −1) dxdy = (2x, 1, −1) dxdy.
Hence
∬_{F₊} f dzdx = ∬_{F₊} (0, f, 0) · d~S = ∬_G x²y(x² + y) dxdy = ∫_0^1 ( ∫_0^1 (x⁴y + x²y²) dy ) dx = 19/90.
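The value 19/90 can be confirmed exactly, since over the unit square a monomial x^p y^q integrates to 1/((p+1)(q+1)). A small sketch of mine using exact rational arithmetic:

```python
from fractions import Fraction

# Exact check (my own sketch) of the double integral in Example 10.5 (a):
# over the unit square, x^p y^q integrates to 1/((p+1)(q+1)), so
# x^2*y*(x^2 + y) = x^4*y + x^2*y^2 integrates to 1/10 + 1/9 = 19/90.

def monomial(p, q):
    """int_0^1 int_0^1 x^p y^q dx dy as an exact fraction."""
    return Fraction(1, (p + 1) * (q + 1))

value = monomial(4, 1) + monomial(2, 2)
```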
(b) Let G denote the upper half ball of radius R in R³:
G = {(x, y, z) | x² + y² + z² ≤ R², z ≥ 0},
and let F be the boundary of G with the orientation of the outer normal. Then F consists of the upper half sphere
F₁ = {(x, y, √(R² − x² − y²)) | x² + y² ≤ R²},  z = g(x, y) = √(R² − x² − y²),
with the upward directed unit normal field, and of the disc F₂ in the x-y-plane,
F₂ = {(x, y, 0) | x² + y² ≤ R²},  z = g(x, y) = 0,
with the downward directed normal. Let ~f(x, y, z) = (ax, by, cz). We want to compute
∬_{F₊} ~f · d~S.
By Remark 10.1 (c), the surface element of the half-sphere F₁ is d~S = (1/z)(x, y, z) dxdy. Hence
I₁ = ∬_{F₁₊} ~f · d~S = ∬_{B_R} (1/z)(ax, by, cz) · (x, y, z) dxdy = ∬_{B_R} ( (ax² + by² + cz²)/z )|_{z=g(x,y)} dxdy.
Using polar coordinates x = r cos ϕ, y = r sin ϕ, r ∈ [0, R], and z = √(R² − x² − y²) = √(R² − r²), we get
I₁ = ∫_0^{2π} ∫_0^R ( ( a r² cos²ϕ + b r² sin²ϕ + c(R² − r²) ) / √(R² − r²) ) r dr dϕ.
Noting that ∫_0^{2π} sin²ϕ dϕ = ∫_0^{2π} cos²ϕ dϕ = π, we continue:
I₁ = π ∫_0^R ( a r³/√(R² − r²) + b r³/√(R² − r²) + 2c r √(R² − r²) ) dr.
Using the substitution r = R sin t, dr = R cos t dt, we have
∫_0^R ( r³ / √(R² − r²) ) dr = ∫_0^{π/2} ( R³ sin³t · R cos t / (R √(1 − sin²t)) ) dt = R³ ∫_0^{π/2} sin³t dt = (2/3) R³.
Hence,
I₁ = (2π/3) R³ (a + b) + πc ∫_0^R (R² − r²)^{1/2} d(r²)
= (2π/3) R³ (a + b) + πc [ −(2/3)(R² − r²)^{3/2} ]_0^R = (2π/3) R³ (a + b) + (2π/3) c R³ = (2π/3) R³ (a + b + c).
In the case of the disc F₂ we have z = g(x, y) = 0, so that g_x = g_y = 0 and
d~S = (0, 0, −1) dxdy
by Remark 10.1 (c). Hence
I₂ = ∬_{F₂₊} ~f · d~S = ∬_{B_R} (ax, by, cz)|_{z=0} · (0, 0, −1) dxdy = −c ∬_{B_R} z|_{z=0} dxdy = 0.
Hence,
∬_F (ax, by, cz) · d~S = I₁ + I₂ = (2π/3) R³ (a + b + c).
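The hemisphere contribution I₁ can also be checked numerically in spherical coordinates, where the outward unit normal on the sphere is (x, y, z)/R and dS = R² sin ϑ dϑdϕ. A sketch of mine; a, b, c, R are arbitrary sample values:

```python
import math

# Numeric sketch (mine) of Example 10.5 (b): the flux of f = (a*x, b*y, c*z)
# through the upper half sphere, computed in spherical coordinates with the
# outward unit normal (x, y, z)/R and dS = R^2 sin(theta) dtheta dphi. Since
# the disc part contributes 0, the total flux is 2*pi*R^3*(a + b + c)/3.

def hemisphere_flux(a, b, c, R, n=400):
    ht, hp = (math.pi / 2) / n, 2 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * ht
        for j in range(n):
            p = (j + 0.5) * hp
            x = R * math.sin(t) * math.cos(p)
            y = R * math.sin(t) * math.sin(p)
            z = R * math.cos(t)
            f_dot_n = (a * x * x + b * y * y + c * z * z) / R
            total += f_dot_n * R * R * math.sin(t) * ht * hp
    return total

a, b, c, R = 1.0, 2.0, 3.0, 1.5
flux = hemisphere_flux(a, b, c, R)
```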
10.4 Gauß’ Divergence Theorem
The aim is to generalize the fundamental theorem of calculus to higher dimensions:
∫_a^b f′(x) dx = f(b) − f(a).
Note that a and b form the boundary of the segment [a, b]. There are three possibilities to do this:
∭_G g dxdydz = ∬_{(∂G)₊} ~f · d~S  — Gauß’ theorem in R³,
∬_G g dxdy = ∫_{(∂G)₊} ~f · d~x  — Green’s theorem in R²,
∬_{F₊} ~g · d~S = ∫_{(∂F)₊} ~f · d~x  — Stokes’ theorem.
Let G ⊂ R3 be a bounded domain (open, connected) such that its boundary F = ∂G satisfies
the following assumptions:
1. F is a union of regular, orientable surfaces Fi . The parametrization Fi (s, t), (s, t) ∈ Ci ,
of Fi as well as D1 Fi and D2 Fi are continuous vector functions on Ci; Ci is a domain in
R2 .
2. Let Fi be oriented by the outer normal (with respect to G).
3. There is given a continuously differentiable vector field f~ : G → R3 on G (More
precisely, there exist an open set U ⊃ G and a continuously differentiable function
f˜: U → R3 such that f˜↾G = f~.)
Theorem 10.1 (Gauß’ Divergence Theorem) Under the above assumptions we have
∭_G div ~f dxdydz = ∬_{(∂G)₊} ~f · d~S.   (10.6)
Sometimes the theorem is called the Gauß–Ostrogradski theorem or simply Ostrogradski’s theorem.
Other writings:
∭_G ( ∂f₁/∂x + ∂f₂/∂y + ∂f₃/∂z ) dxdydz = ∬_{∂G} ( f₁ dydz + f₂ dzdx + f₃ dxdy ).   (10.7)
The theorem holds for more general regions G ⊂ R³. We give a proof for
G = {(x, y, z) | (x, y) ∈ C, α(x, y) ≤ z ≤ β(x, y)},
where C ⊂ R² is a domain and α, β ∈ C¹(C) define regular top and bottom surfaces F₁ and F₂ of F, respectively. We prove only one part of (10.7), namely the part with ~f = (0, 0, f₃):
∭_G ( ∂f₃/∂z ) dxdydz = ∬_{∂G} f₃ dxdy.   (10.8)
Proof.
By Fubini’s theorem, the left side reads
∭_G ( ∂f₃/∂z ) dxdydz = ∬_C ( ∫_{α(x,y)}^{β(x,y)} ( ∂f₃/∂z ) dz ) dxdy = ∬_C ( f₃(x, y, β(x, y)) − f₃(x, y, α(x, y)) ) dxdy,   (10.9)
where the last equality is by the fundamental theorem of calculus.
Now we are going to compute the surface integral. The outer normal for the top surface is (−β_x(x, y), −β_y(x, y), 1), such that
I₁ = ∬_{F₁₊} f₃ dxdy = ∬_C (0, 0, f₃) · (−β_x(x, y), −β_y(x, y), 1) dxdy = ∬_C f₃(x, y, β(x, y)) dxdy.
Since the bottom surface F₂ is oriented downward, the outer normal is (α_x(x, y), α_y(x, y), −1), such that
I₂ = ∬_{F₂₊} f₃ dxdy = −∬_C f₃(x, y, α(x, y)) dxdy.
Finally, the shell F₃ is parametrized by an angle ϕ and z:
F₃ = {(r(ϕ) cos ϕ, r(ϕ) sin ϕ, z) | α(x, y) ≤ z ≤ β(x, y), ϕ ∈ [0, 2π]}.
Since D₂F = (0, 0, 1), the normal vector is orthogonal to the z-axis, ~n = (n₁, n₂, 0). Therefore,
I₃ = ∬_{F₃₊} f₃ dxdy = ∬_{F₃₊} (0, 0, f₃) · (n₁, n₂, 0) dS = 0.
Comparing I₁ + I₂ + I₃ with (10.9) proves the theorem in this special case.
Remarks 10.2 (a) Gauß’ divergence theorem can be used to compute the volume of the domain G ⊂ R³. Suppose the boundary ∂G of G has the orientation of the outer normal. Then
v(G) = ∬_{∂G} x dydz = ∬_{∂G} y dzdx = ∬_{∂G} z dxdy.
(b) Applying the mean value theorem to the left-hand side of Gauß’ formula, we have for any bounded region G containing x₀
∭_G div ~f(x₀ + ~h) dxdydz = div ~f(x₀ + ~h) v(G) = ∬_{(∂G)₊} ~f · d~S,
where ~h is a small vector and ∭_G dxdydz is the volume v(G). Hence
div ~f(x₀) = lim_{G→x₀} ( 1/v(G) ) ∬_{(∂G)₊} ~f · d~S = lim_{ε→0} ( 1/((4/3)πε³) ) ∬_{S_ε(x₀)₊} ~f · d~S,
where the region G shrinks to x₀. In the second formula, we have chosen G = B_ε(x₀), the open ball of radius ε with center x₀. The right-hand side can be thought of as the source density of the field ~f. In particular, the right side gives a basis-independent description of div ~f.
Example 10.6 We want to compute the surface integral from Example 10.5 (b) using Gauß’ theorem:
∬_{F₊} ~f · d~S = ∭_G div ~f dxdydz = ∭_{x²+y²+z²≤R², z≥0} (a + b + c) dxdydz = (2πR³/3)(a + b + c).
There are corollaries of Gauß’ divergence theorem which play an important role in partial differential equations.
Recall (Proposition 7.9 (Prop. 8.9)) that the directional derivative of a function v : U → R, U ⊂ Rⁿ, at x₀ in the direction of the unit vector ~n is given by D_~n v(x₀) = grad v(x₀) · ~n.
Notation. Let U ⊂ R³ be open and let F₊ ⊂ U be an oriented, regular open surface with the unit normal vector ~n(x₀) at x₀ ∈ F. Let g : U → R be differentiable. Then
∂g/∂~n (x₀) = grad g(x₀) · ~n(x₀)   (10.10)
is called the normal derivative of g on F₊ at x₀.
Proposition 10.2 Let G be a region as in Gauß’ theorem, let the boundary ∂G be oriented with the outer normal, and let u, v be twice continuously differentiable on an open set U with G ⊂ U. Then we have Green’s identities:
∭_G ∇u · ∇v dxdydz = ∬_{∂G} u ( ∂v/∂~n ) dS − ∭_G u ∆v dxdydz,   (10.11)
∭_G ( u ∆v − v ∆u ) dxdydz = ∬_{∂G} ( u ∂v/∂~n − v ∂u/∂~n ) dS,   (10.12)
∭_G ∆u dxdydz = ∬_{∂G} ( ∂u/∂~n ) dS.   (10.13)
Proof. Put ~f = u ∇v. Then by nabla calculus
div ~f = ∇·(u ∇v) = ∇u · ∇v + u ∇·(∇v) = grad u · grad v + u ∆v.
Applying Gauß’ theorem, we obtain
∭_G div ~f dxdydz = ∭_G ( grad u · grad v ) dxdydz + ∭_G u ∆v dxdydz = ∬_{∂G} u grad v · ~n dS = ∬_{∂G} u ( ∂v/∂~n ) dS.
This proves Green’s first identity. Changing the roles of u and v and taking the difference, we obtain the second formula.
Inserting v = −1 into (10.12) we get (10.13).
Application to the Laplace equation
Let u₁ and u₂ be functions on G with ∆u₁ = ∆u₂ which coincide on the boundary ∂G, u₁(x) = u₂(x) for all x ∈ ∂G. Then u₁ ≡ u₂ in G. In other words, a harmonic function is uniquely determined by its boundary values.
Proof. Put u = u₁ − u₂ and apply Green’s first formula (10.11) to v = u. Note that ∆u = ∆u₁ − ∆u₂ = 0 (u is harmonic in G) and u(x) = u₁(x) − u₂(x) = 0 for x ∈ ∂G. Hence
∭_G ∇u · ∇u dxdydz = ∬_{∂G} u ( ∂u/∂~n ) dS − ∭_G u ∆u dxdydz = 0,
since u = 0 on ∂G and ∆u = 0 in G. Since ∇u · ∇u = ‖∇u‖² ≥ 0 and ‖∇u‖² is a continuous function on G, by homework 14.3, ‖∇u‖² = 0 on G; hence ∇u = 0 on G. By the Mean Value Theorem, Corollary 7.12, u is constant on G. Since u = 0 on ∂G, u(x) = 0 for all x ∈ G.
10.5 Stokes’ Theorem
Roughly speaking, Stokes’ theorem relates a surface integral over a surface F with a line integral
over the boundary ∂F. In case of a plane surface in R2 , it is called Green’s theorem.
10.5.1 Green’s Theorem
[Figure: a region G whose boundary consists of an outer curve Γ1 and inner curves Γ2, Γ3]
Let G be a domain in R² with piecewise smooth (differentiable) boundaries Γ1, Γ2, . . . , Γk. We give an orientation to the boundary: the outer curve is oriented counterclockwise (mathematically positive), the inner boundaries are oriented in the opposite direction.
Theorem 10.3 (Green’s Theorem) Let (P, Q) be a continuously differentiable vector field on
G and let the boundary Γ = ∂G be oriented as above. Then
$$\iint_G\left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dxdy = \int_\Gamma P\, dx + Q\, dy. \qquad (10.14)$$
Proof. (a) First, we consider a region G of type 1 in the plane, as shown in the figure and we
will prove that
$$\iint_G \frac{\partial P}{\partial y}\, dxdy = -\int_\Gamma P\, dx. \qquad (10.15)$$
[Figure: a type 1 region a ≤ x ≤ b, ϕ(x) ≤ y ≤ ψ(x), with boundary pieces Γ1 (left), Γ2 (bottom), Γ3 (right), Γ4 (top)]
The double integral on the left may be evaluated as an iterated integral (Fubini's theorem); we have
$$\iint_G \frac{\partial P}{\partial y}\, dxdy = \int_a^b\left(\int_{\varphi(x)}^{\psi(x)} P_y(x,y)\, dy\right) dx = \int_a^b \big(P(x,\psi(x)) - P(x,\varphi(x))\big)\, dx.$$
The latter equality is due to the fundamental theorem of calculus. To compute the line integral, we parametrize the four parts of Γ in a natural way:
$$-\Gamma_1:\ \vec x_1(t) = (a, t),\quad t\in[\varphi(a),\psi(a)],\quad dx = 0,\ dy = dt,$$
$$\Gamma_2:\ \vec x_2(t) = (t, \varphi(t)),\quad t\in[a,b],\quad dx = dt,\ dy = \varphi'(t)\, dt,$$
$$\Gamma_3:\ \vec x_3(t) = (b, t),\quad t\in[\varphi(b),\psi(b)],\quad dx = 0,\ dy = dt,$$
$$-\Gamma_4:\ \vec x_4(t) = (t, \psi(t)),\quad t\in[a,b],\quad dx = dt,\ dy = \psi'(t)\, dt.$$
Since dx = 0 on Γ1 and Γ3 we are left with the line integrals over Γ2 and Γ4:
$$\int_\Gamma P\, dx = \int_a^b P(t,\varphi(t))\, dt - \int_a^b P(t,\psi(t))\, dt,$$
which is the negative of the double integral computed above; this proves (10.15).
Let us prove the second part, $\iint_G \frac{\partial Q}{\partial x}\, dxdy = \int_\Gamma Q\, dy$. Using Proposition 7.24 we have
$$\int_{\varphi(x)}^{\psi(x)} \frac{\partial}{\partial x}Q(x,y)\, dy = \frac{d}{dx}\int_{\varphi(x)}^{\psi(x)} Q(x,y)\, dy - \psi'(x)Q(x,\psi(x)) + \varphi'(x)Q(x,\varphi(x)).$$
Inserting this into $\iint_G \frac{\partial Q}{\partial x}\, dxdy = \int_a^b\int_{\varphi(x)}^{\psi(x)} \frac{\partial Q}{\partial x}\, dy\, dx$, we get
$$\iint_G \frac{\partial Q}{\partial x}\, dxdy = \int_a^b\left(\frac{d}{dx}\int_{\varphi(x)}^{\psi(x)} Q(x,y)\, dy - \psi'(x)Q(x,\psi(x)) + \varphi'(x)Q(x,\varphi(x))\right) dx \qquad (10.16)$$
$$= \int_{\varphi(b)}^{\psi(b)} Q(b,y)\, dy - \int_{\varphi(a)}^{\psi(a)} Q(a,y)\, dy - \int_a^b Q(x,\psi(x))\,\psi'(x)\, dx + \int_a^b Q(x,\varphi(x))\,\varphi'(x)\, dx. \qquad (10.17)$$
We compute the line integrals:
$$\int_{\Gamma_1} Q\, dy = -\int_{-\Gamma_1} Q\, dy = -\int_{\varphi(a)}^{\psi(a)} Q(a,y)\, dy, \qquad \int_{\Gamma_2} Q\, dy = \int_a^b Q(t,\varphi(t))\,\varphi'(t)\, dt.$$
Further,
$$\int_{\Gamma_3} Q\, dy = \int_{\varphi(b)}^{\psi(b)} Q(b,y)\, dy, \qquad \int_{\Gamma_4} Q\, dy = -\int_{-\Gamma_4} Q\, dy = -\int_a^b Q(t,\psi(t))\,\psi'(t)\, dt.$$
Adding up these integrals and comparing the result with (10.17), the proof for type 1 regions is complete.
[Figure: a region broken into subregions, each of which is both of type 1 (a ≤ x ≤ b, ϕ(x) ≤ y ≤ ψ(x)) and of type 2]
Exactly in the same way, we can prove that if G is a type 2 region then (10.14) holds.
(b) Breaking a region G up into smaller regions, each of which is both of type 1 and type 2, we see that Green's theorem is valid for G: the line integrals along the inner boundaries cancel, leaving the line integral around the boundary of G.
(c) If the region has a hole, one can split it into two simply connected regions, for which
Green’s theorem is valid by the arguments of (b).
Application: Area of a Region
If Γ is a curve which bounds a region G, then the area of G is $A = \int_\Gamma (1-\alpha)x\, dy - \alpha y\, dx$, where α ∈ R is arbitrary; in particular,
$$A = \frac12\int_\Gamma x\, dy - y\, dx = \int_\Gamma x\, dy = -\int_\Gamma y\, dx. \qquad (10.18)$$
Proof. Choosing Q = (1 − α)x, P = −αy one has
$$A = \iint_G dxdy = \iint_G \big((1-\alpha) - (-\alpha)\big)\, dxdy = \iint_G (Q_x - P_y)\, dxdy = \int_\Gamma P\, dx + Q\, dy = -\alpha\int_\Gamma y\, dx + (1-\alpha)\int_\Gamma x\, dy.$$
Inserting α = 0, α = 1, and α = 1/2 yields the assertion.
Example 10.7 Find the area bounded by the ellipse Γ : x²/a² + y²/b² = 1. We parametrize Γ by ~x(t) = (a cos t, b sin t), t ∈ [0, 2π], with ~x˙(t) = (−a sin t, b cos t). Then (10.18) gives
$$A = \frac12\int_0^{2\pi} \big(a\cos t\cdot b\cos t - b\sin t\cdot(-a\sin t)\big)\, dt = \frac12\int_0^{2\pi} ab\, dt = \pi ab.$$
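A short numerical confirmation (with arbitrarily chosen semi-axes) discretizes the line integral in (10.18):

```python
import math

a, b = 3.0, 1.5   # semi-axes (our own choice)

def area(n=100000):
    # A = (1/2) * integral over [0, 2pi] of (x(t) y'(t) - y(t) x'(t)) dt
    total, dt = 0.0, 2 * math.pi / n
    for i in range(n):
        t = (i + 0.5) * dt
        x, y = a * math.cos(t), b * math.sin(t)
        xdot, ydot = -a * math.sin(t), b * math.cos(t)
        total += 0.5 * (x * ydot - y * xdot) * dt
    return total

print(area(), math.pi * a * b)
```
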
10.5.2 Stokes’ Theorem
Conventions: Let F+ be a regular, oriented surface. Let Γ = ∂F+ be the boundary of F with the
induced orientation: the orientation of the surface (normal vector) together with the orientation
of the boundary form a right-oriented screw. A second way to get the induced orientation:
sitting in the arrowhead of the unit normal vector to the surface, the boundary curve has counter
clockwise orientation.
Theorem 10.4 (Stokes' theorem) Let F+ be a smooth regular oriented surface with a parametrization F ∈ C²(G), where G is a plane region to which Green's theorem applies. Let Γ = ∂F be the boundary with the above orientation. Further, let f~ be a continuously differentiable vector field on F. Then we have
$$\iint_{F_+} \operatorname{curl}\vec f\cdot d\vec S = \int_{\partial F_+} \vec f\cdot d\vec x. \qquad (10.19)$$
This can also be written as
$$\iint_{F_+}\left(\frac{\partial f_3}{\partial y} - \frac{\partial f_2}{\partial z}\right) dydz + \left(\frac{\partial f_1}{\partial z} - \frac{\partial f_3}{\partial x}\right) dzdx + \left(\frac{\partial f_2}{\partial x} - \frac{\partial f_1}{\partial y}\right) dxdy = \int_\Gamma f_1\, dx + f_2\, dy + f_3\, dz.$$
Proof. Main idea: reduction to Green's theorem. Since both sides of the equation are additive with respect to the vector field f~, it suffices to prove the statement for the vector fields (f1, 0, 0), (0, f2, 0), and (0, 0, f3). We show the theorem for f~ = (f, 0, 0); the other cases are quite analogous:
$$\iint_F \frac{\partial f}{\partial z}\, dzdx - \frac{\partial f}{\partial y}\, dxdy = \int_{\partial F} f\, dx.$$
Let F(u, v), (u, v) ∈ G, be the parametrization of the surface F. Then
$$dx = \frac{\partial x}{\partial u}\, du + \frac{\partial x}{\partial v}\, dv,$$
such that the line integral on the right reads, with $P(u,v) = f(x(u,v), y(u,v), z(u,v))\,\frac{\partial x}{\partial u}$ and $Q(u,v) = f\,\frac{\partial x}{\partial v}$:
$$\int_{\partial F} f\, dx = \int_{\partial G} f x_u\, du + f x_v\, dv = \int_{\partial G} P\, du + Q\, dv \underset{\text{Green's th.}}{=} \iint_G \left(\frac{\partial Q}{\partial u} - \frac{\partial P}{\partial v}\right) du\, dv$$
$$= \iint_G \big({-}(f_v x_u + f x_{uv}) + (f_u x_v + f x_{vu})\big)\, du\, dv = \iint_G (f_u x_v - f_v x_u)\, du\, dv$$
$$= \iint_G \big({-}(f_x x_v + f_y y_v + f_z z_v)\,x_u + (f_x x_u + f_y y_u + f_z z_u)\,x_v\big)\, du\, dv = \iint_G \big({-}f_y(x_u y_v - x_v y_u) + f_z(z_u x_v - z_v x_u)\big)\, du\, dv$$
$$= \iint_G \left(-f_y\,\frac{\partial(x,y)}{\partial(u,v)} + f_z\,\frac{\partial(z,x)}{\partial(u,v)}\right) du\, dv = \iint_F -f_y\, dxdy + f_z\, dzdx.$$
This completes the proof.
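Stokes' theorem can also be checked numerically on a concrete surface; everything below (surface, field, grid sizes) is our own choice, not from the text. We take the paraboloid cap z = 1 − u² − v² over the unit disk, oriented upwards, and f~ = (−y, x, z); then curl f~ = (0, 0, 2) and F_u × F_v = (2u, 2v, 1), so curl f~ · (F_u × F_v) = 2.

```python
import math

def surface_integral(n=600):
    # integral over the unit disk of curl f . (F_u x F_v) = 2, midpoint grid
    total, h = 0.0, 2.0 / n
    for i in range(n):
        u = -1 + (i + 0.5) * h
        for j in range(n):
            v = -1 + (j + 0.5) * h
            if u * u + v * v <= 1.0:
                total += 2.0 * h * h
    return total

def line_integral(n=100000):
    # boundary of the cap is the unit circle z = 0; there f . dx = -y dx + x dy
    total, dt = 0.0, 2 * math.pi / n
    for i in range(n):
        t = (i + 0.5) * dt
        total += (math.sin(t)**2 + math.cos(t)**2) * dt
    return total

print(surface_integral(), line_integral(), 2 * math.pi)
```

Both integrals approximate 2π, the surface one only up to the error of the grid approximation of the disk.
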
Remark 10.3 (a) Green’s theorem is a special case with F = G × {0}, ~n = (0, 0, 1) (orientation) and f~ = (P, Q, 0).
(b) The right side of (10.19) is called the circulation of the vector field f~ over the closed curve Γ. Now let ~x0 ∈ F be fixed and consider smaller and smaller neighborhoods F0+ of ~x0 with boundaries Γ0. By Stokes' theorem and by the Mean Value Theorem of integration,
$$\int_{\Gamma_0}\vec f\cdot d\vec x = \iint_{F_0}\operatorname{curl}\vec f\cdot\vec n\, dS = \operatorname{curl}\vec f(x_0)\cdot\vec n(x_0)\,\operatorname{area}(F_0).$$
Hence,
$$\operatorname{curl}\vec f(x_0)\cdot\vec n(x_0) = \lim_{F_0\to x_0}\frac{\int_{\partial F_0}\vec f\cdot d\vec x}{|F_0|}.$$
We call curl f~(x0 ) · ~n(x0 ) the infinitesimal circulation of the vector field f~ at x0 corresponding
to the unit normal vector ~n.
(c) Stokes’ theorem then says that the integral over the infinitesimal circulation of a vector field
f~ corresponding to the unit normal vector ~n over F equals the circulation of the vector field
along the boundary of F.
Path Independence of Line Integrals
We are now going to complete the proof of Proposition 8.3 and show that for a simply connected region G ⊂ R³ and a twice continuously differentiable vector field f~ with curl f~ = 0 for all x ∈ G, the vector field f~ is conservative.
Proof. Indeed, let Γ be a closed, regular, piecewise differentiable curve Γ ⊂ G and let Γ be the boundary of a smooth regular oriented surface F+, Γ = ∂F+, such that Γ has the induced orientation. Inserting curl f~ = 0 into Stokes' theorem gives
$$\iint_{F_+} \operatorname{curl}\vec f\cdot d\vec S = 0 = \int_\Gamma \vec f\cdot d\vec x;$$
the line integral is path independent and hence f~ is conservative. Note that the region must be simply connected; otherwise it is in general impossible to find F with boundary Γ.
10.5.3 Vector Potential and the Inverse Problem of Vector Analysis
Let f~ be a continuously differentiable vector field on the simply connected region G ⊂ R3 .
Definition 10.7 The vector field f~ on G is called a source-free field (solenoidal field) if there
exists a vector field ~g on G with f~ = curl ~g . Then ~g is called the vector potential to f~.
Theorem 10.5 f~ is source-free if and only if div f~ = 0.
Proof. (a) If f~ = curl ~g then div f~ = div ( curl ~g ) = 0.
(b) To simplify notations, we skip the arrows. We explicitly construct a vector potential g to f
with g = (g1 , g2 , 0) and curl g = f . This means
$$f_1 = -\frac{\partial g_2}{\partial z}, \qquad f_2 = \frac{\partial g_1}{\partial z}, \qquad f_3 = \frac{\partial g_2}{\partial x} - \frac{\partial g_1}{\partial y}.$$
Integrating the first two equations, we obtain
$$g_2 = -\int_{z_0}^z f_1(x,y,t)\, dt + h(x,y), \qquad g_1 = \int_{z_0}^z f_2(x,y,t)\, dt,$$
where h(x, y) is the integration constant, not depending on z. Inserting this into the third equation, we obtain
$$\frac{\partial g_2}{\partial x} - \frac{\partial g_1}{\partial y} = -\int_{z_0}^z \frac{\partial f_1}{\partial x}(x,y,t)\, dt + h_x(x,y) - \int_{z_0}^z \frac{\partial f_2}{\partial y}(x,y,t)\, dt = -\int_{z_0}^z\left(\frac{\partial f_1}{\partial x} + \frac{\partial f_2}{\partial y}\right) dt + h_x$$
$$\underset{\operatorname{div} f = 0}{=} \int_{z_0}^z \frac{\partial f_3}{\partial z}(x,y,t)\, dt + h_x = f_3(x,y,z) - f_3(x,y,z_0) + h_x(x,y).$$
This must equal f3(x, y, z); hence hx(x, y) = f3(x, y, z0). Integration with respect to x finally gives $h(x,y) = \int_{x_0}^x f_3(t,y,z_0)\, dt$; the third equation is satisfied and curl g = f.
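The construction in part (b) can be reproduced symbolically. The sketch below (our own example, using sympy, with z0 = x0 = 0 and the divergence-free sample field f = (y, z, x)) follows the proof literally and verifies curl g = f.

```python
import sympy as sp

x, y, z, t = sp.symbols('x y z t')

# Sample divergence-free field (our own choice)
f1, f2, f3 = y, z, x
assert sp.diff(f1, x) + sp.diff(f2, y) + sp.diff(f3, z) == 0

# The construction from the proof, with z0 = 0 and x0 = 0
g1 = sp.integrate(f2.subs(z, t), (t, 0, z))
g2 = -sp.integrate(f1.subs(z, t), (t, 0, z)) + sp.integrate(f3.subs({x: t, z: 0}), (t, 0, x))
g3 = sp.Integer(0)

curl_g = (sp.diff(g3, y) - sp.diff(g2, z),
          sp.diff(g1, z) - sp.diff(g3, x),
          sp.diff(g2, x) - sp.diff(g1, y))
print(curl_g)   # reproduces (f1, f2, f3) = (y, z, x)
```
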
Remarks 10.4 (a) The proof of the second direction is constructive; you can use this method to calculate a vector potential explicitly. You can also try another ansatz, say g = (0, g2, g3) or g = (g1, 0, g3).
(b) If g is a vector potential for f and U ∈ C2 (G), then g̃ = g + grad U is also a vector potential
for f . Indeed
curl g̃ = curl g + curl grad U = f.
The Inverse Problem of Vector Analysis
Let h be a function and ~a be a vector field on G; both continuously differentiable.
Problem: Does there exist a vector field f~ such that
div f~ = h and curl f~ = ~a?
Proposition 10.6 The above problem has a solution if and only if div ~a = 0.
Proof. The condition is necessary since div ~a = div curl f~ = 0. We skip the vector arrows.
For the other direction we use the ansatz f = r + s with
curl r = 0,  div r = h,   (10.20)
curl s = a,  div s = 0.   (10.21)
Since curl r = 0, by Proposition 8.3 there exists a potential U with r = grad U. Then
curl r = 0 and div r = div grad U = ∆(U). Hence (10.20) is satisfied if and only if
r = grad U and ∆(U) = h.
Since div a = 0 by assumption, there exists a vector potential g such that curl g = a. Let ϕ be
twice continuously differentiable on G and set s = g + grad ϕ. Then curl s = curl g = a and
div s = div g + div grad ϕ = div g + ∆(ϕ). Hence, div s = 0 if and only if ∆(ϕ) = − div g.
Both equations ∆(U) = h and ∆(ϕ) = − div g are so called Poisson equations which can be
solved within the theory of partial differential equations (PDE).
The inverse problem does not have a unique solution. Choose a harmonic function ψ, Δ(ψ) = 0, and
put f1 = f + grad ψ. Then
div f1 = div f + div grad ψ = div f + ∆(ψ) = div f = h,
curl f1 = curl f + curl grad ψ = curl f = a.
Chapter 11

Differential Forms on Rn
We show that Gauß', Green's and Stokes' theorems are three cases of a "general" theorem which is also named after Stokes. The simple formula now reads $\int_c d\omega = \int_{\partial c}\omega$. The appearance of the Jacobian in the change of variable theorem will become clear. We formulate the Poincaré lemma.
Good references are [Spi65], [AF01], and [vW81].
11.1 The Exterior Algebra Λ(Rn)

Although we are working with the ground field R all constructions make sense for arbitrary fields K, in particular K = C. Let {e1, . . . , en} be the standard basis of Rn; for h ∈ Rn we write h = (h1, . . . , hn) with respect to the standard basis, $h = \sum_i h_i e_i$.
11.1.1 The Dual Vector Space V ∗
The interplay between a normed space E and its dual space E ′ forms the basis of functional
analysis. We start with the definition of the (algebraic) dual.
Definition 11.1 Let V be a linear space. The dual vector space V ∗ to V is the set of all linear
functionals f : V → R,
V ∗ = {f : V → R | f is linear}.
It turns out that V ∗ is again a linear space if we introduce addition and scalar multiples in the
natural way. For f, g ∈ V ∗ , α ∈ R put
(f + g)(v) := f (v) + g(v),
(αf )(v) := αf (v).
The evaluation of f ∈ V ∗ on v ∈ V is sometimes denoted by
f (v) = hf , vi ∈ K.
In this case, the brackets denote the dual pairing between V ∗ and V . By definition, the pairing
is linear in both components. That is, for all v, w ∈ V and for all λ, µ ∈ R
hλf + µg , vi = λ hf , vi + µ hg , vi ,
hf , λv + µwi = λ hf , vi + µ hf , wi .
Example 11.1 (a) Let V = Rn with the above standard basis. For i = 1, . . . , n define the ith
coordinate functional dxi : Rn → R by
dxi (h) = dxi (h1 , . . . , hn ) = hi ,
h ∈ Rn .
The functional dxi associates to each vector h ∈ Rn its ith coordinate hi . The functional dxi is
indeed linear since for all v, w ∈ Rn and α, β ∈ R, dxi (αv+βw) = (αv+βw)i = αvi +βwi =
α dxi (v) + β dxi (w).
The linear space (Rn)∗ also has dimension n. We will show that { dx1, dx2, . . . , dxn } is a basis of (Rn)∗. We call it the dual basis to {e1, . . . , en}. Using the Kronecker symbol, the evaluation of dxi on ej reads as follows:
$$dx_i(e_j) = \delta_{ij}, \qquad i, j = 1, \dots, n.$$
{ dx1, dx2, . . . , dxn } generates V∗. Indeed, let f ∈ V∗. Then $f = \sum_{i=1}^n f(e_i)\, dx_i$ since both sides coincide for all h ∈ V:
$$\sum_{i=1}^n f(e_i)\, dx_i(h) = \sum_{i=1}^n f(e_i)\, h_i \underset{f\ \text{homog.}}{=} \sum_i f(h_i e_i) = f\left(\sum_i h_i e_i\right) = f(h).$$
In Proposition 11.1 below, we will see that { dx1, . . . , dxn } is not only generating but also linearly independent.
(b) If V = C([0, 1]), the continuous functions on [0, 1], and α is an increasing function on [0, 1], then the Riemann–Stieltjes integral
$$\varphi_\alpha(f) = \int_0^1 f\, d\alpha$$
defines a linear functional ϕα on V.
If a ∈ [0, 1],
$$\delta_a(f) = f(a), \qquad f\in V,$$
defines the evaluation functional at a. In case a = 0 this is Dirac's δ-functional, playing an important role in the theory of distributions (generalized functions).
(c) Let a ∈ Rn. Then $\langle a, x\rangle = \sum_{i=1}^n a_i x_i$, x ∈ Rn, defines a linear functional on Rn. By (a) this is already the most general form of a linear functional on Rn.
Definition 11.2 Let k ∈ N. An alternating (or skew-symmetric) multilinear form of degree k on Rn, a k-form for short, is a mapping ω : Rn × · · · × Rn → R (k factors Rn) which is multilinear and skew-symmetric, i. e.
$$\text{MULT:}\quad \omega(\cdots, \alpha v_i + \beta w_i, \cdots) = \alpha\,\omega(\cdots, v_i, \cdots) + \beta\,\omega(\cdots, w_i, \cdots), \qquad (11.1)$$
$$\text{SKEW:}\quad \omega(\cdots, v_i, \cdots, v_j, \cdots) = -\,\omega(\cdots, v_j, \cdots, v_i, \cdots), \quad i, j = 1, \dots, k,\ i\ne j, \qquad (11.2)$$
for all vectors v1, v2, . . . , vk, wi ∈ Rn.
We denote the linear space of all k-forms on Rn by Λk (Rn ) with the convention Λ0 (Rn ) = R.
In case k = 1 property (11.2) is an empty condition such that Λ1 (Rn ) = (Rn )∗ is just the dual
space.
Let f1, . . . , fk ∈ (Rn)∗ be linear functionals on Rn. Then we define the k-form f1 ∧ · · · ∧ fk ∈ Λk(Rn) (read: "f1 wedge f2 . . . wedge fk") as follows:
$$f_1\wedge\cdots\wedge f_k(h_1,\dots,h_k) = \begin{vmatrix} f_1(h_1) & \cdots & f_1(h_k)\\ \vdots & & \vdots\\ f_k(h_1) & \cdots & f_k(h_k)\end{vmatrix}. \qquad (11.3)$$
In particular, let i1, . . . , ik ∈ {1, . . . , n} be fixed and choose fj = dxij, j = 1, . . . , k. Then
$$dx_{i_1}\wedge\cdots\wedge dx_{i_k}(h_1,\dots,h_k) = \begin{vmatrix} h_{1 i_1} & \cdots & h_{k i_1}\\ \vdots & & \vdots\\ h_{1 i_k} & \cdots & h_{k i_k}\end{vmatrix}.$$
f1 ∧ · · · ∧ fk is indeed a k-form: since the fi are linear, the determinant is multilinear,
$$\begin{vmatrix} \lambda a + \mu a' & b & c\\ \lambda d + \mu d' & e & f\\ \lambda g + \mu g' & h & i\end{vmatrix} = \lambda\begin{vmatrix} a & b & c\\ d & e & f\\ g & h & i\end{vmatrix} + \mu\begin{vmatrix} a' & b & c\\ d' & e & f\\ g' & h & i\end{vmatrix},$$
and skew-symmetric,
$$\begin{vmatrix} b & a & c\\ e & d & f\\ h & g & i\end{vmatrix} = -\begin{vmatrix} a & b & c\\ d & e & f\\ g & h & i\end{vmatrix}.$$
For example, let y = (y1, . . . , yn), z = (z1, . . . , zn) ∈ Rn; then
$$dx_3\wedge dx_1(y,z) = \begin{vmatrix} y_3 & z_3\\ y_1 & z_1\end{vmatrix} = y_3 z_1 - y_1 z_3.$$
If fr = fs = f for some r ≠ s, we have f1 ∧ · · · ∧ f ∧ · · · ∧ f ∧ · · · ∧ fk = 0 since determinants with identical rows vanish. Also, for any ω ∈ Λk(Rn),
$$\omega(h_1,\dots,h,\dots,h,\dots,h_k) = 0, \qquad h_1,\dots,h_k, h\in \mathbb{R}^n,$$
since the defining determinant has two identical columns.
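The determinant formula (11.3) is easy to model in code. The sketch below (names and the k = 2 restriction are our own) represents a functional by its coefficient vector and reproduces the example dx3 ∧ dx1(y, z) = y3 z1 − y1 z3.

```python
# A functional f on R^n is stored as its coefficient vector: f(h) = sum_i f_i h_i;
# dx_i is then the i-th standard unit vector (0-based indices here).
def apply(f, h):
    return sum(fi * hi for fi, hi in zip(f, h))

def wedge2(f, g):
    # (f ^ g)(h1, h2) = det [[f(h1), f(h2)], [g(h1), g(h2)]], formula (11.3) for k = 2
    return lambda h1, h2: apply(f, h1) * apply(g, h2) - apply(f, h2) * apply(g, h1)

def dx(i, n=3):
    return [1 if j == i else 0 for j in range(n)]

y, z = [2, 5, 7], [1, -3, 4]
val = wedge2(dx(2), dx(0))(y, z)        # dx3 ^ dx1 in the text's 1-based notation
print(val, y[2] * z[0] - y[0] * z[2])   # both are y3*z1 - y1*z3 = -1
```
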
Proposition 11.1 For k ≤ n the k-forms { dxi1 ∧ · · · ∧ dxik | 1 ≤ i1 < i2 < · · · < ik ≤ n}
form a basis of the vector space Λk (Rn ). A k-form with k > n is identically zero. We have
$$\dim \Lambda^k(\mathbb{R}^n) = \binom{n}{k}.$$
Proof. Any k-form ω is uniquely determined by its values on the k-tuple of vectors (ei1 , . . . , eik )
with 1 ≤ i1 < i2 < · · · < ik ≤ n. Indeed, using skew-symmetry of ω, we know ω on all k-tuples of basis vectors; using linearity in each component, we get ω on all k-tuples of vectors.
This shows that the dxi1 ∧ · · · ∧ dxik with 1 ≤ i1 < i2 < · · · < ik ≤ n generate the linear space Λk(Rn). We make this precise in case k = 2. With $y = \sum_i y_i e_i$, $z = \sum_j z_j e_j$ we have by linearity and skew-symmetry of ω
$$\omega(y,z) = \sum_{i,j=1}^n y_i z_j\,\omega(e_i,e_j) \underset{\mathrm{SKEW}}{=} \sum_{1\le i<j\le n} (y_i z_j - y_j z_i)\,\omega(e_i,e_j) = \sum_{i<j} \omega(e_i,e_j) \begin{vmatrix} y_i & z_i \\ y_j & z_j \end{vmatrix} = \sum_{i<j} \omega(e_i,e_j)\, dx_i\wedge dx_j(y,z).$$
Hence,
$$\omega = \sum_{i<j} \omega(e_i,e_j)\, dx_i\wedge dx_j.$$
This shows that the $\binom{n}{2}$ 2-forms { dxi ∧ dxj | i < j } generate Λ²(Rn).
We show linear independence. Suppose that $\sum_{i<j}\alpha_{ij}\, dx_i\wedge dx_j = 0$ for some αij ∈ R. Evaluating this on (er, es), r < s, gives
$$0 = \sum_{i<j}\alpha_{ij}\, dx_i\wedge dx_j(e_r,e_s) = \sum_{i<j}\alpha_{ij}\begin{vmatrix}\delta_{ri} & \delta_{si}\\ \delta_{rj} & \delta_{sj}\end{vmatrix} = \sum_{i<j}\alpha_{ij}(\delta_{ri}\delta_{sj} - \delta_{rj}\delta_{si}) = \alpha_{rs};$$
hence, the above 2-forms are linearly independent. The arguments for general k are similar.
In general, let ω ∈ Λk(Rn); then there exist unique numbers $a_{i_1\cdots i_k} = \omega(e_{i_1},\dots,e_{i_k}) \in \mathbb{R}$, i1 < i2 < · · · < ik, such that
$$\omega = \sum_{1\le i_1<\cdots<i_k\le n} a_{i_1\cdots i_k}\, dx_{i_1}\wedge\cdots\wedge dx_{i_k}.$$
Example 11.2 Let n = 3.
k = 1 { dx1 , dx2 , dx3 } is a basis of Λ1 (R3 ).
k = 2 { dx1 ∧ dx2 , dx1 ∧ dx3 , dx2 ∧ dx3 } is a basis of Λ2 (R3 ).
k = 3 { dx1 ∧ dx2 ∧ dx3 } is a basis of Λ3 (R3 ).
Λk (R3 ) = {0} for k ≥ 4.
Definition 11.3 An algebra A over R is a linear space together with a product map (a, b) → ab, A × A → A, such that the following holds for all a, b, c ∈ A and α ∈ R:
(i) a(bc) = (ab)c (associative),
(ii) (a + b)c = ac + bc, a(b + c) = ab + ac,
(iii) α(ab) = (αa)b = a(αb).
Standard examples are C(X), the continuous functions on a metric space X or Rn×n , the full
n × n-matrix algebra over R or the algebra of polynomials R[X].
Let $\Lambda(\mathbb{R}^n) = \bigoplus_{k=0}^n \Lambda^k(\mathbb{R}^n)$ be the direct sum of linear spaces.
Proposition 11.2 (i) Λ(Rn ) is an R-algebra with unity 1 and product ∧ defined by
( dxi1 ∧ · · · ∧ dxik ) ∧ ( dxj1 ∧ · · · ∧ dxjl ) = dxi1 ∧ · · · ∧ dxik ∧ dxj1 ∧ · · · ∧ dxjl
(ii) If ωk ∈ Λk (Rn ) and ωl ∈ Λl (Rn ) then ωk ∧ωl ∈ Λk+l (Rn ) and
ωk ∧ωl = (−1)kl ωl ∧ωk .
Proof. (i) Associativity is clear since concatenation of strings is associative. The distributive laws are used to extend multiplication from the basis to the entire space Λ(Rn).
We show (ii) for ωk = dxi1 ∧ · · · ∧ dxik and ωl = dxj1 ∧ · · · ∧ dxjl . We already know
dxi ∧ dxj = − dxj ∧ dxi . There are kl transpositions dxir ↔ dxjs necessary to transport all
dxjs from the right to the left of ωk . Hence the sign is (−1)kl .
In particular, dxi ∧ dxi = 0. The formula dxi ∧ dxj = − dxj ∧ dxi determines the product in
Λ(Rn ) uniquely.
We call Λ(Rn) the exterior algebra of the vector space Rn.
The following formula will be used in the next subsection. Let ω ∈ Λk(Rn) and η ∈ Λl(Rn); then for all v1, . . . , vk+l ∈ Rn
$$(\omega\wedge\eta)(v_1,\dots,v_{k+l}) = \frac{1}{k!\,l!}\sum_{\sigma\in S_{k+l}} (-1)^\sigma\, \omega(v_{\sigma(1)},\dots,v_{\sigma(k)})\,\eta(v_{\sigma(k+1)},\dots,v_{\sigma(k+l)}). \qquad (11.4)$$
Indeed, let ω = f1 ∧ · · · ∧ fk, η = fk+1 ∧ · · · ∧ fk+l, fi ∈ (Rn)∗. The above formula can be obtained from
$$(f_1\wedge\cdots\wedge f_k)\wedge(f_{k+1}\wedge\cdots\wedge f_{k+l})(v_1,\dots,v_{k+l}) = \begin{vmatrix} f_1(v_1) & \cdots & f_1(v_{k+l})\\ \vdots & & \vdots\\ f_{k+l}(v_1) & \cdots & f_{k+l}(v_{k+l})\end{vmatrix}$$
when expanding this determinant with respect to the last l rows. This can be done using Laplace expansion:
$$|A| = \sum_{1\le j_1<\cdots<j_k\le k+l} (-1)^{\sum_{m=1}^k(i_m+j_m)} \begin{vmatrix} a_{i_1 j_1} & \cdots & a_{i_1 j_k}\\ \vdots & & \vdots\\ a_{i_k j_1} & \cdots & a_{i_k j_k}\end{vmatrix}\cdot \begin{vmatrix} a_{i_{k+1} j_{k+1}} & \cdots & a_{i_{k+1} j_{k+l}}\\ \vdots & & \vdots\\ a_{i_{k+l} j_{k+1}} & \cdots & a_{i_{k+l} j_{k+l}}\end{vmatrix},$$
where (i1, . . . , ik) is any fixed ordered multi-index and (jk+1, . . . , jk+l) is the complementary ordered multi-index to (j1, . . . , jk) such that all integers 1, 2, . . . , k + l appear.
11.1.2 The Pull-Back of k-forms
Definition 11.4 Let A ∈ L(Rn, Rm) be a linear mapping and k ∈ N. For ω ∈ Λk(Rm) we define
a k-form A∗ (ω) ∈ Λk (Rn ) by
(A∗ ω)(h1 , . . . , hk ) = ω(A(h1 ), A(h2 ), . . . , A(hk )),
h1 , . . . , hk ∈ Rn
We call A∗ (ω) the pull-back of ω under A.
Note that A∗ ∈ L(Λk(Rm), Λk(Rn)) is a linear mapping. In case k = 1 we call A∗ the dual mapping to A. In case k = 0, ω ∈ R, we simply set A∗(ω) = ω. We have A∗(ω∧η) = A∗(ω)∧A∗(η). Indeed, let ω ∈ Λk(Rm), η ∈ Λl(Rm), and hi ∈ Rn, i = 1, . . . , k + l; then by (11.4)
$$A^*(\omega\wedge\eta)(h_1,\dots,h_{k+l}) = (\omega\wedge\eta)(A(h_1),\dots,A(h_{k+l})) = \frac{1}{k!\,l!}\sum_{\sigma\in S_{k+l}}(-1)^\sigma\,\omega(A(h_{\sigma(1)}),\dots,A(h_{\sigma(k)}))\,\eta(A(h_{\sigma(k+1)}),\dots,A(h_{\sigma(k+l)}))$$
$$= \frac{1}{k!\,l!}\sum_{\sigma\in S_{k+l}}(-1)^\sigma\,A^*(\omega)(h_{\sigma(1)},\dots,h_{\sigma(k)})\,A^*(\eta)(h_{\sigma(k+1)},\dots,h_{\sigma(k+l)}) = (A^*(\omega)\wedge A^*(\eta))(h_1,\dots,h_{k+l}).$$
Example 11.3 (a) Let $A = \begin{pmatrix} 1 & 0 & 3\\ 2 & 1 & 0\end{pmatrix} \in \mathbb{R}^{2\times 3}$ be a linear map A : R³ → R², defined by matrix multiplication, A(v) = A·v, v ∈ R³. Let {e1, e2, e3} and {f1, f2} be the standard bases of R³ and R², resp., and let { dx1, dx2, dx3 } and { dy1, dy2 } be their dual bases, respectively.
First we compute A∗(dy1) and A∗(dy2):
$$A^*(dy_1)(e_i) = dy_1(A(e_i)) = a_{i1}, \qquad A^*(dy_2)(e_i) = a_{i2}.$$
In particular,
$$A^*(dy_1) = 1\, dx_1 + 0\, dx_2 + 3\, dx_3, \qquad A^*(dy_2) = 2\, dx_1 + dx_2.$$
Compute A∗(dy2 ∧ dy1). By definition, for 1 ≤ i < j ≤ 3,
$$A^*(dy_2\wedge dy_1)(e_i,e_j) = dy_2\wedge dy_1(A(e_i),A(e_j)) = dy_2\wedge dy_1(A_i,A_j) = \begin{vmatrix} a_{i2} & a_{j2}\\ a_{i1} & a_{j1}\end{vmatrix}.$$
In particular,
$$A^*(dy_2\wedge dy_1)(e_1,e_2) = \begin{vmatrix} 2 & 1\\ 1 & 0\end{vmatrix} = -1, \quad A^*(dy_2\wedge dy_1)(e_1,e_3) = \begin{vmatrix} 2 & 0\\ 1 & 3\end{vmatrix} = 6, \quad A^*(dy_2\wedge dy_1)(e_2,e_3) = \begin{vmatrix} 1 & 0\\ 0 & 3\end{vmatrix} = 3.$$
Hence,
$$A^*(dy_2\wedge dy_1) = -\,dx_1\wedge dx_2 + 6\, dx_1\wedge dx_3 + 3\, dx_2\wedge dx_3.$$
On the other hand,
$$A^*(dy_2)\wedge A^*(dy_1) = (2\, dx_1 + dx_2)\wedge(dx_1 + 3\, dx_3) = -\,dx_1\wedge dx_2 + 6\, dx_1\wedge dx_3 + 3\, dx_2\wedge dx_3.$$
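The numbers in this example can be reproduced mechanically from the 2×2 determinants; the following sketch (our own check) evaluates A∗(dy2 ∧ dy1) on all pairs of basis vectors.

```python
# Rows of A are indexed by y-coordinates: dy1(A e_i) = A[0][i], dy2(A e_i) = A[1][i]
# (0-based indices in code, 1-based in the text).
A = [[1, 0, 3],
     [2, 1, 0]]

def pullback_dy2_dy1(i, j):
    # (A*(dy2 ^ dy1))(e_i, e_j) = det [[dy2(Ae_i), dy2(Ae_j)], [dy1(Ae_i), dy1(Ae_j)]]
    return A[1][i] * A[0][j] - A[1][j] * A[0][i]

coeffs = (pullback_dy2_dy1(0, 1), pullback_dy2_dy1(0, 2), pullback_dy2_dy1(1, 2))
print(coeffs)   # coefficients of dx1^dx2, dx1^dx3, dx2^dx3: (-1, 6, 3)
```
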
(b) Let A ∈ Rn×n, A : Rn → Rn, and ω = dx1 ∧ · · · ∧ dxn ∈ Λn(Rn). Then A∗(ω) = det(A) ω.

11.1.3 Orientation of Rn
If {e1, . . . , en} and {f1, . . . , fn} are two bases of Rn there exists a unique regular matrix A = (aij), det A ≠ 0, such that $e_i = \sum_j a_{ij} f_j$. We say that {e1, . . . , en} and {f1, . . . , fn} are equivalent if and only if det A > 0. Since det A ≠ 0, there are exactly two equivalence classes.
equivalent if and only if det A > 0. Since det A 6= 0, there are exactly two equivalence classes.
We say that the two bases {ei | i = 1, . . . , n} and {fi | i = 1, . . . , n} define the same orientation
if and only if det A > 0.
Definition 11.5 An orientation of Rn is given by fixing one of the two equivalence classes.
Example 11.4 (a) In R² the bases {e1, e2} and {e2, e1} have different orientations since $A = \begin{pmatrix} 0 & 1\\ 1 & 0\end{pmatrix}$ and det A = −1.
(b) In R3 the bases {e1 , e2 , e3 }, {e3 , e1 , e2 } and {e2 , e3 , e1 } have the same orientation whereas
{e1 , e3 , e2 }, {e2 , e1 , e3 }, and {e3 , e2 , e1 } have opposite orientation.
(c) The standard basis {e1 , . . . , en } and {e2 , e1 , e3 , . . . , en } define different orientations.
11.2 Differential Forms
11.2.1 Definition
Throughout this section let U ⊂ Rn be an open and connected set.
Definition 11.6 (a) A differential k-form on U is a mapping ω : U → Λk (Rn ), i. e. to every
point p ∈ U we associate a k-form ω(p) ∈ Λk (Rn ). The linear space of differential k-forms on
U is denoted by Ω k (U).
(b) Let ω be a differential k-form on U. Since { dxi1 ∧· · ·∧ dxik | 1 ≤ i1 < i2 < · · · < ik ≤ n}
forms a basis of Λk (Rn ) there exist uniquely determined functions ai1 ···ik on U such that
$$\omega(p) = \sum_{1\le i_1<\cdots<i_k\le n} a_{i_1\cdots i_k}(p)\, dx_{i_1}\wedge\cdots\wedge dx_{i_k}. \qquad (11.5)$$
If all functions ai1···ik are in Cr(U), r ∈ N ∪ {∞}, we say ω is an r times continuously differentiable differential k-form on U. The set of those differential k-forms is denoted by $\Omega_r^k(U)$. We define $\Omega_r^0(U) = C^r(U)$ and $\Omega(U) = \bigoplus_{k=0}^n \Omega^k(U)$. The product in Λ(Rn) defines a product in Ω(U):
$$(\omega\wedge\eta)(x) = \omega(x)\wedge\eta(x), \qquad x\in U,$$
hence Ω(U) is an algebra. For example, if
$$\omega_1 = x^2\, dy\wedge dz + xyz\, dx\wedge dy, \qquad \omega_2 = (xy^2 + 3z^2)\, dx$$
define a differential 2-form and a 1-form on R³, ω1 ∈ Ω²(R³), ω2 ∈ Ω¹(R³), then ω1 ∧ ω2 = (x³y² + 3x²z²) dx∧dy∧dz.
11.2.2 Differentiation
Definition 11.7 Let f ∈ Ω⁰(U) = Cr(U) and p ∈ U. We define df(p) = Df(p); then df is a differential 1-form on U.
If $\omega(p) = \sum_{1\le i_1<\cdots<i_k\le n} a_{i_1\cdots i_k}(p)\, dx_{i_1}\wedge\cdots\wedge dx_{i_k}$ is a differential k-form, we define
$$d\omega(p) = \sum_{1\le i_1<\cdots<i_k\le n} da_{i_1\cdots i_k}(p)\wedge dx_{i_1}\wedge\cdots\wedge dx_{i_k}. \qquad (11.6)$$
Then dω is a differential (k + 1)-form. The linear operator d : Ωk(U) → Ωk+1(U) is called the exterior differential.
Remarks 11.1 (a) Note that for a function f : U → R, Df ∈ L(Rn, R) = Λ¹(Rn). By Example 7.7 (a)
$$Df(x)(h) = \operatorname{grad} f(x)\cdot h = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(x)\, h_i = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(x)\, dx_i(h),$$
hence
$$df(x) = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(x)\, dx_i. \qquad (11.7)$$
Viewing xi : U → R as a C∞-function, by the above formula dxi(x) = dxi. This justifies the notation dxi. If f ∈ C∞(R) we have df(x) = f′(x) dx.
(b) One can show that the definition of dω does not depend on the choice of the basis
{ dx1 , . . . , dxn } of Λ1 (Rn ).
Example 11.5 (a) G = R², ω = e^{xy} dx + xy³ dy. Then
$$d\omega = d(e^{xy})\wedge dx + d(xy^3)\wedge dy = (y e^{xy}\, dx + x e^{xy}\, dy)\wedge dx + (y^3\, dx + 3xy^2\, dy)\wedge dy = (-x e^{xy} + y^3)\, dx\wedge dy.$$
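The coefficient of dx ∧ dy can be confirmed symbolically: for ω = P dx + Q dy one has dω = (Q_x − P_y) dx ∧ dy. A small sympy sketch (our own check):

```python
import sympy as sp

x, y = sp.symbols('x y')
P, Q = sp.exp(x * y), x * y**3          # omega = P dx + Q dy
coeff = sp.diff(Q, x) - sp.diff(P, y)   # coefficient of dx ^ dy in d(omega)
print(sp.simplify(coeff))               # equals y**3 - x*exp(x*y), the text's coefficient
```
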
(b) Let f be continuously differentiable. Then
$$df = f_x\, dx + f_y\, dy + f_z\, dz = \operatorname{grad} f\cdot(dx, dy, dz) = \operatorname{grad} f\cdot d\vec x.$$
(c) Let v = (v1, v2, v3) be a C¹ vector field. Put ω = v1 dx + v2 dy + v3 dz. Then we have
$$d\omega = \left(\frac{\partial v_3}{\partial y} - \frac{\partial v_2}{\partial z}\right) dy\wedge dz + \left(\frac{\partial v_1}{\partial z} - \frac{\partial v_3}{\partial x}\right) dz\wedge dx + \left(\frac{\partial v_2}{\partial x} - \frac{\partial v_1}{\partial y}\right) dx\wedge dy = \operatorname{curl}(v)\cdot(dy\wedge dz,\ dz\wedge dx,\ dx\wedge dy) = \operatorname{curl}(v)\cdot d\vec S.$$
(d) Let v be as above. Put ω = v1 dy∧ dz + v2 dz∧ dx + v3 dx∧ dy. Then we have
dω = div (v) dx∧ dy∧ dz.
Proposition 11.3 The exterior differential d is a linear mapping which satisfies
(i) d(ω∧η) = dω∧η + (−1)ᵏ ω∧dη for ω ∈ $\Omega_1^k(U)$, η ∈ $\Omega_1(U)$;
(ii) d(dω) = 0 for ω ∈ $\Omega_2(U)$.
Proof. (i) We first prove Leibniz' rule for functions f, g ∈ $\Omega_1^0(U)$. By Remarks 11.1 (a),
$$d(fg) = \sum_i \frac{\partial}{\partial x_i}(fg)\, dx_i = \sum_i\left(\frac{\partial f}{\partial x_i}\, g + f\,\frac{\partial g}{\partial x_i}\right) dx_i = \left(\sum_i \frac{\partial f}{\partial x_i}\, dx_i\right) g + f \sum_i\frac{\partial g}{\partial x_i}\, dx_i = df\, g + f\, dg.$$
For I = (i1, . . . , ik) and J = (j1, . . . , jl) we abbreviate dxI = dxi1 ∧ · · · ∧ dxik and dxJ = dxj1 ∧ · · · ∧ dxjl. Let ω = ΣI aI dxI and η = ΣJ bJ dxJ. By definition
$$d(\omega\wedge\eta) = d\left(\sum_{I,J} a_I b_J\, dx_I\wedge dx_J\right) = \sum_{I,J} d(a_I b_J)\wedge dx_I\wedge dx_J = \sum_{I,J} (da_I\, b_J + a_I\, db_J)\wedge dx_I\wedge dx_J$$
$$= \sum_{I,J} da_I\wedge dx_I\wedge b_J\, dx_J + \sum_{I,J} (-1)^k a_I\, dx_I\wedge db_J\wedge dx_J = d\omega\wedge\eta + (-1)^k\,\omega\wedge d\eta,$$
where in the third line we used dbJ ∧ dxI = (−1)ᵏ dxI ∧ dbJ.
(ii) Again by the definition of d:
$$d(d\omega) = \sum_I d(da_I\wedge dx_I) = \sum_{I,j} d\left(\frac{\partial a_I}{\partial x_j}\right)\wedge dx_j\wedge dx_I = \sum_{I,i,j} \frac{\partial^2 a_I}{\partial x_i\partial x_j}\, dx_i\wedge dx_j\wedge dx_I$$
$$\underset{\text{Schwarz' lemma}}{=} \sum_{I,i,j} \frac{\partial^2 a_I}{\partial x_j\partial x_i}\,(-\, dx_j\wedge dx_i\wedge dx_I) = -\,d(d\omega).$$
It follows that d(dω) = d²ω = 0.
11.2.3 Pull-Back
Definition 11.8 Let f : U → V be a differentiable function with open sets U ⊂ Rn and
V ⊂ Rm . Let ω ∈ Ω k (V ) be a differential k-form. We define a differential k-form f ∗ (ω) ∈
Ω k (U) by
$$(f^*\omega)(p) = (Df(p))^*\big(\omega(f(p))\big), \quad\text{i.e.}\quad (f^*\omega)(p; h_1,\dots,h_k) = \omega\big(f(p); Df(p)(h_1),\dots,Df(p)(h_k)\big), \qquad p\in U,\ h_1,\dots,h_k\in\mathbb{R}^n.$$
In case k = 0 and ω ∈ Ω 0 (V ) = C∞ (V ) we simply set
(f ∗ ω)(p) = ω(f (p)),
f ∗ ω = ω ◦f.
We call f ∗ (ω) the pull-back of the differential k-form ω with respect to f .
Note that by definition the pull-back f∗ is a linear mapping from the space of differential k-forms on V to the space of differential k-forms on U, f∗ : Ωk(V) → Ωk(U).
Proposition 11.4 Let f be as above and ω, η ∈ Ω(V). Let { dy1, . . . , dym } be the dual basis to the standard basis in (Rm)∗. Then we have, with f = (f1, . . . , fm):
$$\text{(a)}\quad f^*(dy_i) = \sum_{j=1}^n \frac{\partial f_i}{\partial x_j}\, dx_j = df_i, \quad i = 1,\dots,m, \qquad (11.8)$$
$$\text{(b)}\quad f^*(d\omega) = d(f^*\omega), \qquad (11.9)$$
$$\text{(c)}\quad f^*(a\omega) = (a\circ f)\, f^*(\omega), \quad a\in C^\infty(V), \qquad (11.10)$$
$$\text{(d)}\quad f^*(\omega\wedge\eta) = f^*(\omega)\wedge f^*(\eta). \qquad (11.11)$$
If n = m, then
$$\text{(e)}\quad f^*(dy_1\wedge\cdots\wedge dy_n) = \frac{\partial(f_1,\dots,f_n)}{\partial(x_1,\dots,x_n)}\, dx_1\wedge\cdots\wedge dx_n. \qquad (11.12)$$
Proof. We show (a). Let h ∈ Rn; by Definition 11.7 and the definition of the derivative we have
$$f^*(dy_i)(h) = dy_i(Df(p)(h)) = dy_i\left(\Big(\sum_{j=1}^n \frac{\partial f_k(p)}{\partial x_j}\, h_j\Big)_{k=1,\dots,m}\right) = \sum_{j=1}^n \frac{\partial f_i(p)}{\partial x_j}\, h_j = \sum_{j=1}^n \frac{\partial f_i(p)}{\partial x_j}\, dx_j(h).$$
This shows (a). Equation (11.10) is a special case of (11.11); we prove (d). Let p ∈ U. Using the pull-back formula for k-forms we obtain
$$f^*(\omega\wedge\eta)(p) = (Df(p))^*\big((\omega\wedge\eta)(f(p))\big) = (Df(p))^*\big(\omega(f(p))\wedge\eta(f(p))\big) = (Df(p))^*(\omega(f(p)))\wedge(Df(p))^*(\eta(f(p))) = f^*(\omega)(p)\wedge f^*(\eta)(p) = (f^*(\omega)\wedge f^*(\eta))(p).$$
To show (11.9) we start with a 0-form g and prove that f∗(dg) = d(f∗g) for functions g : V → R. By (11.7) and (11.10) we have
$$f^*(dg)(p) = f^*\left(\sum_{i=1}^m \frac{\partial g}{\partial y_i}\, dy_i\right)(p) = \sum_{i=1}^m \frac{\partial g(f(p))}{\partial y_i}\, f^*(dy_i) = \sum_{i=1}^m \frac{\partial g(f(p))}{\partial y_i}\sum_{j=1}^n \frac{\partial f_i}{\partial x_j}(p)\, dx_j$$
$$= \sum_{j=1}^n\left(\sum_{i=1}^m \frac{\partial g(f(p))}{\partial y_i}\,\frac{\partial f_i(p)}{\partial x_j}\right) dx_j \underset{\text{chain rule}}{=} \sum_{j=1}^n \frac{\partial}{\partial x_j}(g\circ f)(p)\, dx_j = d(g\circ f)(p) = d(f^*g)(p).$$
Now let ω = ΣI aI dxI be an arbitrary form. Since by the Leibniz rule
$$d\big(f^*(dx_I)\big) = d\big(d(f_{i_1})\wedge\cdots\wedge d(f_{i_k})\big) = 0,$$
we get by the Leibniz rule
$$d(f^*\omega) = d\left(\sum_I f^*(a_I)\, f^*(dx_I)\right) = \sum_I\Big(d(f^*(a_I))\wedge f^*(dx_I) + f^*(a_I)\, d(f^*(dx_I))\Big) = \sum_I d(f^*(a_I))\wedge f^*(dx_I).$$
On the other hand, by (d) we have
$$f^*\left(d\Big(\sum_I a_I\, dx_I\Big)\right) = f^*\left(\sum_I da_I\wedge dx_I\right) = \sum_I f^*(da_I)\wedge f^*(dx_I).$$
By the first part of (b), both expressions coincide. This completes the proof of (b).
We finally prove (e). By (a) and (d) we have
$$f^*(dy_1\wedge\cdots\wedge dy_n) = f^*(dy_1)\wedge\cdots\wedge f^*(dy_n) = \sum_{i_1=1}^n \frac{\partial f_1}{\partial x_{i_1}}\, dx_{i_1}\wedge\cdots\wedge\sum_{i_n=1}^n \frac{\partial f_n}{\partial x_{i_n}}\, dx_{i_n} = \sum_{i_1,\dots,i_n=1}^n \frac{\partial f_1}{\partial x_{i_1}}\cdots\frac{\partial f_n}{\partial x_{i_n}}\, dx_{i_1}\wedge\cdots\wedge dx_{i_n}.$$
Since the square of a 1-form vanishes, the only non-vanishing terms in the above sum are the permutations (i1, . . . , in) of (1, . . . , n). Using skew-symmetry to write dxi1 ∧ · · · ∧ dxin as a multiple of dx1 ∧ · · · ∧ dxn, we obtain the sign of the permutation (i1, . . . , in):
$$f^*(dy_1\wedge\cdots\wedge dy_n) = \sum_{I=(i_1,\dots,i_n)\in S_n} \operatorname{sign}(I)\,\frac{\partial f_1}{\partial x_{i_1}}\cdots\frac{\partial f_n}{\partial x_{i_n}}\, dx_1\wedge\cdots\wedge dx_n = \frac{\partial(f_1,\dots,f_n)}{\partial(x_1,\dots,x_n)}\, dx_1\wedge\cdots\wedge dx_n.$$
Example 11.6 (a) Let f (r, ϕ) = (r cos ϕ, r sin ϕ) be given on R2 \ (0 × R) and let { dr, dϕ}
and { dx, dy} be the dual bases to {er , eϕ } and {e1 , e2 }. We have
f∗(x) = r cos ϕ,   f∗(y) = r sin ϕ,
f∗( dx) = cos ϕ dr − r sin ϕ dϕ,   f∗( dy) = sin ϕ dr + r cos ϕ dϕ,
f∗( dx ∧ dy) = r dr ∧ dϕ,
f∗( (−y/(x² + y²)) dx + (x/(x² + y²)) dy ) = dϕ.
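In coordinates, the pullback of the top-degree form is just the Jacobian determinant: f∗( dx ∧ dy) = (det Df) dr ∧ dϕ, and for the polar map det Df = r. This can be checked numerically; the following Python sketch (not part of the notes) approximates the Jacobian by central finite differences:

```python
import math

def f(r, phi):
    # the polar coordinates map f(r, phi) = (r cos phi, r sin phi)
    return (r * math.cos(phi), r * math.sin(phi))

def jacobian_det(g, u, v, h=1e-6):
    # central finite-difference Jacobian determinant of g at (u, v)
    gu = [(a - b) / (2 * h) for a, b in zip(g(u + h, v), g(u - h, v))]
    gv = [(a - b) / (2 * h) for a, b in zip(g(u, v + h), g(u, v - h))]
    return gu[0] * gv[1] - gu[1] * gv[0]

# f*(dx ^ dy) = det Df dr ^ dphi, and det Df should equal r
for r, phi in [(1.0, 0.3), (2.5, 1.2), (0.7, 4.0)]:
    assert abs(jacobian_det(f, r, phi) - r) < 1e-5
```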
(b) Let k ∈ N, r ∈ {1, . . . , k}, and α ∈ R. Define a mapping I : Rk → Rk+1 and a form ω ∈ Ω k (Rk+1 ) by
I(x_1, …, x_k) = (x_1, …, x_{r−1}, α, x_r, …, x_k),
ω(y_1, …, y_{k+1}) = ∑_{i=1}^{k+1} f_i(y) dy_1 ∧ · · · ∧ d̂y_i ∧ · · · ∧ dy_{k+1},
where fi ∈ C∞ (Rk+1 ) for all i; the hat means omission of the factor dyi . Then
I ∗ (ω)(x) = fr (x1 , . . . , xr−1 , α, xr , . . . , xk ) dx1 ∧ · · · ∧ dxk .
This follows from
I∗( dy_i) = dx_i,   i = 1, …, r − 1,
I∗( dy_r) = 0,
I∗( dy_{i+1}) = dx_i,   i = r, …, k.
Roughly speaking: f ∗ (ω) is obtained by substituting the new variables at all places.
11.2.4 Closed and Exact Forms
Motivation: Let f (x) be a continuous function on R. Then ω = f (x) dx is a 1-form. By the
fundamental theorem of calculus, there exists an antiderivative F (x) to f (x) such that d F (x) =
f (x) dx = ω.
Problem: Given ω ∈ Ω k (U). Does there exist η ∈ Ω k−1 (U) with dη = ω?
Definition 11.9 ω ∈ Ω k (U) is called closed if d ω = 0.
ω ∈ Ω k (U) is called exact if there exists η ∈ Ω k−1 (U) such that d η = ω.
Remarks 11.2 (a) An exact form ω is closed; indeed, dω = d(dη) = 0.
(b) A 1-form ω = ∑_i f_i dx_i is closed if and only if curl f⃗ = 0 for the corresponding vector field f⃗ = (f_1, …, f_n). Here the general curl can be defined as a vector with n(n−1)/2 components
( curl f⃗ )_{ij} = ∂f_j/∂x_i − ∂f_i/∂x_j.
The form ω is exact if and only if f⃗ is conservative, that is, f⃗ is a gradient vector field with f⃗ = grad U. Then ω = d U.
(c) There are closed forms that are not exact; for example, the winding form
ω = (−y/(x² + y²)) dx + (x/(x² + y²)) dy
on R² \ {(0, 0)} is not exact, cf. homework 30.1.
(d) If d η = ω then d(η + dξ) = ω, too, for all ξ ∈ Ω k−2 (U).
Definition 11.10 An open set U is called star-shaped if there exists an x0 ∈ U such that for all x ∈ U the segment from x0 to x is
in U, i. e. (1 − t)x0 + tx ∈ U for all t ∈ [0, 1].
Convex sets U are star-shaped (take any x0 ∈ U); any star-shaped set is connected and simply
connected.
Lemma 11.5 Let U ⊂ Rn be star-shaped with respect to the origin and let
ω = ∑_{i_1<···<i_k} a_{i_1···i_k} dx_{i_1} ∧ · · · ∧ dx_{i_k} ∈ Ω k (U). Define
I(ω)(x) = ∑_{i_1<···<i_k} ∑_{r=1}^k (−1)^{r−1} ( ∫_0^1 t^{k−1} a_{i_1···i_k}(tx) dt ) x_{i_r} dx_{i_1} ∧ · · · ∧ d̂x_{i_r} ∧ · · · ∧ dx_{i_k},   (11.13)
where the hat means omission of the factor dx_{i_r}. Then we have
I( dω) + d(Iω) = ω.   (11.14)
(Without proof.)
Example 11.7 (a) Let k = 1, n = 3, and ω = a1 dx1 + a2 dx2 + a3 dx3 . Then
I(ω) = x_1 ∫_0^1 a_1(tx) dt + x_2 ∫_0^1 a_2(tx) dt + x_3 ∫_0^1 a_3(tx) dt.
Note that this is exactly the formula for the potential U(x_1, x_2, x_3) from Remark 8.5 (b). Let (a_1, a_2, a_3) be a vector field on U with dω = 0; this is equivalent to curl a = 0 by Example 11.5 (c). The above lemma shows dU = ω for U = I(ω); this means grad U = (a_1, a_2, a_3), so U is the potential of the vector field (a_1, a_2, a_3).
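The potential formula can be tried out numerically. The Python sketch below uses the illustrative closed 1-form a = grad(x₁²x₂ + x₃) (my choice, not an example from the notes), approximates the integrals ∫_0^1 a_i(tx) dt by Simpson's rule, and checks that grad I(ω) = a:

```python
def a(x):
    # a closed 1-form a1 dx1 + a2 dx2 + a3 dx3, chosen as
    # a = grad(x1^2 x2 + x3)  (illustrative, not from the text)
    x1, x2, x3 = x
    return (2 * x1 * x2, x1 * x1, 1.0)

def quad(g, n=200):
    # composite Simpson rule on [0, 1] (n even)
    h = 1.0 / n
    s = g(0.0) + g(1.0)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(i * h)
    return s * h / 3

def potential(x):
    # U(x) = sum_i x_i * int_0^1 a_i(t x) dt   ((11.13) for k = 1)
    return sum(x[i] * quad(lambda t, i=i: a([t * xj for xj in x])[i])
               for i in range(3))

def grad(F, x, h=1e-5):
    g = []
    for i in range(3):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        g.append((F(xp) - F(xm)) / (2 * h))
    return g

x = (0.8, -0.4, 1.3)
assert all(abs(gi - ai) < 1e-4 for gi, ai in zip(grad(potential, x), a(x)))
```

For this field the potential is U = x₁²x₂ + x₃, so for instance potential((1, 1, 1)) is 2.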
(b) Let k = 2, n = 3, and ω = a1 dx2 ∧ dx3 + a2 dx3 ∧ dx1 + a3 dx1 ∧ dx2 where a is a C1 -vector
field on U. Then
I(ω) = ( x_3 ∫_0^1 t a_2(tx) dt − x_2 ∫_0^1 t a_3(tx) dt ) dx_1
    + ( x_1 ∫_0^1 t a_3(tx) dt − x_3 ∫_0^1 t a_1(tx) dt ) dx_2
    + ( x_2 ∫_0^1 t a_1(tx) dt − x_1 ∫_0^1 t a_2(tx) dt ) dx_3.
By Example 11.5 (d), ω is closed if and only if div a = 0 on U. Let η = b_1 dx_1 + b_2 dx_2 + b_3 dx_3 be such that dη = ω. This means curl b = a. The Poincaré lemma shows that b with curl b = a exists if and only if div a = 0. Then b is the vector potential of a. In case d ω = 0 we can choose the 1-form b⃗·d x⃗ = I(ω).
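The vector-potential formula can likewise be tested numerically. The sketch below (Python; the divergence-free field a = (y, z, x) is my illustrative choice, not from the text) evaluates the three integrals by Simpson's rule and checks curl b = a by finite differences:

```python
def a(x):
    # a divergence-free field (illustrative): a = (y, z, x), div a = 0
    return (x[1], x[2], x[0])

def quad(g, n=200):
    # composite Simpson rule on [0, 1]
    h = 1.0 / n
    s = g(0.0) + g(1.0)
    for i in range(1, n):
        s += (4 if i % 2 else 2) * g(i * h)
    return s * h / 3

def A(i, x):
    # int_0^1 t a_i(t x) dt
    return quad(lambda t: t * a([t * xj for xj in x])[i])

def b(x):
    # the vector potential I(omega) from (11.13) for k = 2, n = 3
    return (x[2] * A(1, x) - x[1] * A(2, x),
            x[0] * A(2, x) - x[2] * A(0, x),
            x[1] * A(0, x) - x[0] * A(1, x))

def curl(F, x, h=1e-5):
    def d(i, j):  # numerical dF_i/dx_j
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        return (F(xp)[i] - F(xm)[i]) / (2 * h)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

x = (0.5, 1.1, -0.7)
assert all(abs(c - v) < 1e-4 for c, v in zip(curl(b, x), a(x)))
```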
Theorem 11.6 (Poincaré Lemma) Let U be star-shaped. Then every closed differential form
is exact.
Proof. Without loss of generality let U be star-shaped with respect to the origin and dω = 0. Then I( dω) = 0, and (11.14) of Lemma 11.5 gives d(Iω) = ω.
Remarks 11.3 (a) Let U be star-shaped, ω ∈ Ω k (U). Suppose d η_0 = ω for some η_0 ∈ Ω k−1 (U). Then the general solution of d η = ω is given by η_0 + dξ with ξ ∈ Ω k−2 (U).
Indeed, let η be a second solution of d η = ω. Then d(η − η0 ) = 0. By the Poincaré lemma,
there exists ξ ∈ Ω k−2 (U) with η − η0 = dξ, hence η = η0 + dξ.
(b) Let V be a linear space and W a linear subspace of V . We define an equivalence relation
on V by v1 ∼ v2 if v1 − v2 ∈ W . The equivalence class of v is denoted by v + W . One easily
sees that the set of equivalence classes, denoted by V /W , is again a linear space: α(v + W ) +
β(u + W ) := αv + βu + W .
Let U be an arbitrary open subset of Rn . We define
C k (U) = {ω ∈ Ω k (U) | d ω = 0}, the cocycles on U,
B k (U) = {ω ∈ Ω k (U) | ω is exact}, the coboundaries on U.
Since exact forms are closed, B k (U) is a linear subspace of C k (U). The factor space
H^k_{deR}(U) = C k (U)/B k (U)
is called the de Rham cohomology of U. If U is star-shaped, H^k_{deR}(U) = 0 for k ≥ 1, by Poincaré's lemma. The first de Rham cohomology H^1_{deR} of R² \ {(0, 0)} is non-zero; the winding form represents a non-zero element. We have
H^0_{deR}(U) ≅ R^p
if and only if U has exactly p connected components, U = U_1 ∪ · · · ∪ U_p (disjoint union). Then the characteristic functions χ_{U_i}, i = 1, …, p, form a basis of the 0-cocycles C 0 (U) (note that B 0 (U) = 0).
11.3 Stokes’ Theorem
11.3.1
Singular Cubes, Singular Chains, and the Boundary Operator
A very nice treatment of the topics of this section is [Spi65, Chapter 4]. The set [0, 1]^k = [0, 1] × · · · × [0, 1] = {x ∈ Rk | 0 ≤ x_i ≤ 1, i = 1, …, k} is called the k-dimensional unit cube. Let U ⊂ Rn be open.
Definition 11.11 (a) A singular k-cube in U ⊂ Rn is a continuously differentiable mapping
ck : [0, 1]k → U.
(b) A singular k-chain in U is a formal sum
sk = n1 ck,1 + · · · + nr ck,r
with singular k-cubes ck,i and integers ni ∈ Z.
A singular 0-cube is a point, a singular 1-cube is a curve,
in general, a singular 2-cube (in R3 ) is a surface with
a boundary of 4 pieces which are differentiable curves.
Note that a singular 2-cube can also be a single point—
that is where the name “singular” comes from.
Let I^k : [0, 1]^k → R^k be the identity map, i. e. I^k(x) = x, x ∈ [0, 1]^k. It is called the standard k-cube in R^k. We are going to define the boundary ∂s_k of a singular k-chain s_k. For i = 1, …, k define
I^k_{(i,0)}(x_1, …, x_{k−1}) = (x_1, …, x_{i−1}, 0, x_i, …, x_{k−1}),
I^k_{(i,1)}(x_1, …, x_{k−1}) = (x_1, …, x_{i−1}, 1, x_i, …, x_{k−1}).
Insert a 0 and a 1 at the ith component, respectively.
The boundary of the standard k-cube I^k is now defined as the singular (k − 1)-chain
∂I^k = ∑_{i=1}^k (−1)^i ( I^k_{(i,0)} − I^k_{(i,1)} ).   (11.15)
It is the formal sum of 2k singular (k − 1)-cubes, the faces of the k-cube.
The boundary of an arbitrary singular k-cube c_k : [0, 1]^k → U ⊂ Rn is defined by composition of c_k with the faces of the standard cube:
∂c_k = c_k ◦ ∂I^k = ∑_{i=1}^k (−1)^i ( c_k ◦ I^k_{(i,0)} − c_k ◦ I^k_{(i,1)} ),   (11.16)
and for a singular k-chain sk = n1 ck,1 + · · · + nr ck,r we set
∂sk = n1 ∂ck,1 + · · · + nr ∂ck,r .
The boundary operator ∂ associates to each singular k-chain a singular (k − 1)-chain (since both I^k_{(i,0)} and I^k_{(i,1)} depend on k − 1 variables, all from the segment [0, 1]).
One can show that
∂(∂sk ) = 0
for any singular k-chain sk .
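The identity ∂(∂s_k) = 0 can be verified mechanically: encode each face of the standard cube by its pattern of inserted constants and expand (11.15) twice; all (k − 2)-faces then occur twice with opposite signs. A small Python sketch of this bookkeeping (illustrative, not from the notes):

```python
from collections import defaultdict

def face(cube, i, e):
    # precompose with I^d_(i,e): the (i-1)-th free variable becomes the
    # constant e, the later free variables shift down by one
    out = []
    for kind, val in cube:
        if kind == 'v' and val == i - 1:
            out.append(('c', e))
        elif kind == 'v' and val > i - 1:
            out.append(('v', val - 1))
        else:
            out.append((kind, val))
    return tuple(out)

def boundary(chain):
    # chain: dict mapping a symbolic cube to an integer coefficient
    out = defaultdict(int)
    for cube, coeff in chain.items():
        d = sum(1 for kind, _ in cube if kind == 'v')  # free variables
        for i in range(1, d + 1):
            sign = (-1) ** i
            out[face(cube, i, 0)] += sign * coeff      # (11.15)
            out[face(cube, i, 1)] -= sign * coeff
    return out

k = 3
I3 = tuple(('v', j) for j in range(k))       # the standard 3-cube
dd = boundary(boundary({I3: 1}))
assert all(c == 0 for c in dd.values())      # every coefficient cancels
```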
Example 11.8 (a) In case n = k = 3 we have
∂I³ = −I³_{(1,0)} + I³_{(1,1)} + I³_{(2,0)} − I³_{(2,1)} − I³_{(3,0)} + I³_{(3,1)},
where
−I³_{(1,0)}(x_1, x_2) = −(0, x_1, x_2),   +I³_{(1,1)}(x_1, x_2) = +(1, x_1, x_2),
+I³_{(2,0)}(x_1, x_2) = +(x_1, 0, x_2),   −I³_{(2,1)}(x_1, x_2) = −(x_1, 1, x_2),
−I³_{(3,0)}(x_1, x_2) = −(x_1, x_2, 0),   +I³_{(3,1)}(x_1, x_2) = +(x_1, x_2, 1).
Note, if we take care of the signs in (11.15), all 6 unit normal vectors D_1 I³_{(i,j)} × D_2 I³_{(i,j)} to the faces have the orientation of the outer normal with respect to the unit 3-cube [0, 1]³. The above sum ∂I³ is a formal sum of singular 2-cubes. You are not allowed to add componentwise:
−(0, x_1, x_2) + (1, x_1, x_2) ≠ (1, 0, 0).
(b) In case k = 2 we have
∂I²(x) = I²_{(1,1)} − I²_{(1,0)} + I²_{(2,0)} − I²_{(2,1)} = (1, x) − (0, x) + (x, 0) − (x, 1).
With the corners E_1 = (0, 0), E_2 = (1, 0), E_3 = (1, 1), E_4 = (0, 1) of the square,
∂∂I² = (E_3 − E_2) − (E_4 − E_1) + (E_2 − E_1) − (E_3 − E_4) = 0.
For a singular 2-cube c_2 with the four boundary curves Γ_1, …, Γ_4 we have ∂c_2 = Γ_1 + Γ_2 − Γ_3 − Γ_4.
(c) Let c_2 : [0, 2π] × [0, π] → R³ \ {(0, 0, 0)} be the singular 2-cube
c_2(s, t) = (cos s sin t, sin s sin t, cos t).
By (b),
∂c_2 = c_2 ◦ ∂I² = c_2(2π, x) − c_2(0, x) + c_2(x, 0) − c_2(x, π)
    = (cos 2π sin x, sin 2π sin x, cos x) − (cos 0 sin x, sin 0 sin x, cos x) + (cos x sin 0, sin x sin 0, cos 0) − (cos x sin π, sin x sin π, cos π)
    = (sin x, 0, cos x) − (sin x, 0, cos x) + (0, 0, 1) − (0, 0, −1)
    = (0, 0, 1) − (0, 0, −1).
Hence, the boundary ∂c2 of the singular 2-cube c2 is a degenerate singular 1-chain. We come
back to this example.
11.3.2 Integration
Definition 11.12 Let ck : [0, 1]k → U ⊂ Rn , ~x = ck (t1 , . . . , tk ), be a singular k-cube and ω
a k-form on U. Then (ck )∗ (ω) is a k-form on the unit cube [0, 1]k . Thus there exists a unique
function f (t), t ∈ [0, 1]k , such that
(ck )∗ (ω) = f (t) dt1 ∧ · · · ∧ dtk .
Then
∫_{c_k} ω := ∫_{I^k} (c_k)∗(ω) := ∫_{[0,1]^k} f(t) dt_1 · · · dt_k
is called the integral of ω over the singular cube ck ; on the right there is the k-dimensional
Riemann integral.
If s_k = ∑_{i=1}^r n_i c_{k,i} is a k-chain, set
∫_{s_k} ω = ∑_{i=1}^r n_i ∫_{c_{k,i}} ω.
If k = 0, a 0-cube is a single point c_0(0) = x_0 and a 0-form is a function ω ∈ C∞(G). We set
∫_{c_0} ω = c_0∗(ω)|_{t=0} = ω(c_0(0)) = ω(x_0).
We discuss two special cases, k = 1 and k = n.
Example 11.9 (a) k = 1. Let c : [0, 1] → Rn be an oriented, smooth curve Γ = c([0, 1]). Let
ω = f1 (x) dx1 + · · · + fn (x) dxn be a 1-form on Rn , then
c∗ (ω) = (f1 (c(t))c′1 (t) + · · · + fn (c(t))c′n (t)) dt
is a 1-form on [0, 1] such that
∫_c ω = ∫_{[0,1]} c∗ω = ∫_0^1 ( f_1(c(t))c_1′(t) + · · · + f_n(c(t))c_n′(t) ) dt = ∫_Γ f⃗ · d x⃗.
Obviously, ∫_c ω is the line integral of f⃗ over Γ.
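As a concrete check, integrating the 1-form −y dx + x dy once around the unit circle gives 2π. The Python sketch below (midpoint rule and finite differences; an illustration, not from the notes) evaluates ∫_c ω = ∫_0^1 ∑_i f_i(c(t)) c_i′(t) dt:

```python
import math

def c(t):
    # the unit circle, traversed once
    return (math.cos(2 * math.pi * t), math.sin(2 * math.pi * t))

def omega(x):
    # the 1-form -y dx + x dy, given by its coefficient functions
    return (-x[1], x[0])

def line_integral(omega, c, n=1000):
    # int_c omega = int_0^1 sum_i f_i(c(t)) c_i'(t) dt  (Example 11.9 (a))
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h                                   # midpoint rule
        x = c(t)
        dc = [(a - b) / 2e-6 for a, b in zip(c(t + 1e-6), c(t - 1e-6))]
        total += sum(f * d for f, d in zip(omega(x), dc)) * h
    return total

assert abs(line_integral(omega, c) - 2 * math.pi) < 1e-3
```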
(b) k = n. Let c : [0, 1]k → Rk be continuously differentiable and let x = c(t). Let ω =
f (x) dx1 ∧ · · · ∧ dxk be a differential k-form on Rk . By Proposition 11.4 (e),
c∗(ω) = f(c(t)) (∂(c_1, …, c_k)/∂(t_1, …, t_k)) dt_1 ∧ · · · ∧ dt_k.
Therefore,
∫_c ω = ∫_{[0,1]^k} f(c(t)) (∂(c_1, …, c_k)/∂(t_1, …, t_k)) dt_1 · · · dt_k.   (11.17)
Let c = I^k be the standard k-cube in [0, 1]^k. Then
∫_{I^k} ω = ∫_{[0,1]^k} f(x) dx_1 · · · dx_k
is the k-dimensional Riemann integral of f over [0, 1]^k.
Let Ĩ^k(x_1, …, x_k) = (x_2, x_1, x_3, …, x_k). Then I^k([0, 1]^k) = Ĩ^k([0, 1]^k) = [0, 1]^k; however,
∫_{Ĩ^k} ω = ∫_{[0,1]^k} f(x_2, x_1, x_3, …, x_k)(−1) dx_1 · · · dx_k = − ∫_{[0,1]^k} f(x_1, x_2, x_3, …, x_k) dx_1 · · · dx_k = − ∫_{I^k} ω.
We see that ∫_{I^k} ω is an oriented Riemann integral. Note that in the above formula (11.17) we do not have the absolute value of the Jacobian.
11.3.3 Stokes’ Theorem
Theorem 11.7 Let U be an open subset of Rn , k ≥ 0 a non-negative integer and let sk+1 be a
singular (k + 1)-chain on U. Let ω be a continuously differentiable k-form on U, ω ∈ Ω k (U). Then we have
∫_{∂s_{k+1}} ω = ∫_{s_{k+1}} d ω.
Proof. (a) Let s_{k+1} = I^{k+1} be the standard (k + 1)-cube; in particular n = k + 1. Let
ω = ∑_{i=1}^{k+1} f_i(x) dx_1 ∧ · · · ∧ d̂x_i ∧ · · · ∧ dx_{k+1}. Then
d ω = ∑_{i=1}^{k+1} (−1)^{i+1} (∂f_i(x)/∂x_i) dx_1 ∧ · · · ∧ dx_{k+1},
hence by Example 11.6 (b), Fubini's theorem and the fundamental theorem of calculus,
∫_{I^{k+1}} d ω = ∑_{i=1}^{k+1} (−1)^{i+1} ∫_{[0,1]^{k+1}} (∂f_i/∂x_i) dx_1 · · · dx_{k+1}
    = ∑_{i=1}^{k+1} (−1)^{i+1} ∫_{[0,1]^k} ( ∫_0^1 (∂f_i/∂x_i)(x_1, …, t, …, x_{k+1}) dt ) dx_1 · · · d̂x_i · · · dx_{k+1}
    = ∑_{i=1}^{k+1} (−1)^{i+1} ∫_{[0,1]^k} ( f_i(x_1, …, 1, …, x_{k+1}) − f_i(x_1, …, 0, …, x_{k+1}) ) dx_1 · · · d̂x_i · · · dx_{k+1}
    =_{Example 11.6 (b)} ∑_{i=1}^{k+1} (−1)^{i+1} ( ∫_{[0,1]^k} (I^{k+1}_{(i,1)})∗ω − ∫_{[0,1]^k} (I^{k+1}_{(i,0)})∗ω )
    = ∫_{∂I^{k+1}} ω
by definition of ∂I^{k+1}. The assertion is shown in the case of the identity map.
(b) The general case. Let c_{k+1} be a singular (k + 1)-cube in U. Since the pull-back and the differential commute (Proposition 11.4) we have, using (a),
∫_{c_{k+1}} d ω = ∫_{I^{k+1}} (c_{k+1})∗( d ω) = ∫_{I^{k+1}} d ((c_{k+1})∗ω) = ∫_{∂I^{k+1}} (c_{k+1})∗ω
    = ∑_{i=1}^{k+1} (−1)^i ( ∫_{[0,1]^k} (I^{k+1}_{(i,0)})∗(c_{k+1})∗ω − ∫_{[0,1]^k} (I^{k+1}_{(i,1)})∗(c_{k+1})∗ω )
    = ∑_{i=1}^{k+1} (−1)^i ( ∫_{c_{k+1}◦I^{k+1}_{(i,0)}} ω − ∫_{c_{k+1}◦I^{k+1}_{(i,1)}} ω ) = ∫_{∂c_{k+1}} ω.
(c) Finally, let s_{k+1} = ∑_i n_i c_{k+1,i} with n_i ∈ Z and singular (k + 1)-cubes c_{k+1,i}. By definition and by (b),
∫_{s_{k+1}} d ω = ∑_i n_i ∫_{c_{k+1,i}} d ω = ∑_i n_i ∫_{∂c_{k+1,i}} ω = ∫_{∂s_{k+1}} ω.
Remark 11.4 Stokes' theorem is valid for arbitrary oriented compact differentiable k-dimensional manifolds F and continuously differentiable (k − 1)-forms ω on F.
Example 11.10 We come back to Example 11.8 (c). Let ω = (x dy∧ dz + y dz∧ dx +
z dx∧ dy)/r³ be a 2-form on R³ \ {(0, 0, 0)}. It is easy to show that ω is closed, d ω = 0. We compute ∫_{c_2} ω. First note that c_2∗(r³) = 1, c_2∗(x) = cos s sin t, c_2∗(y) = sin s sin t, c_2∗(z) = cos t
such that
c∗2 ( dx) = d(cos s sin t) = − sin s sin t ds + cos s cos t dt,
c∗2 ( dy) = d(sin s sin t) = cos s sin t ds + sin s cos t dt,
c∗2 ( dz) = − sin t dt.
and
c_2∗(ω) = c_2∗(x dy∧ dz + y dz∧ dx + z dx∧ dy)
    = c_2∗(x) c_2∗( dy)∧c_2∗( dz) + c_2∗(y dz∧ dx) + c_2∗(z dx∧ dy)
    = (− cos² s sin³ t − sin² s sin³ t − cos t (sin² s sin t cos t + cos² s sin t cos t)) ds∧ dt
    = − sin t ds∧ dt,
such that
∫_{c_2} ω = ∫_{[0,2π]×[0,π]} c_2∗(ω) = ∫_{[0,2π]×[0,π]} − sin t ds∧ dt = ∫_0^{2π} ∫_0^π (− sin t) dt ds = −4π.
Stokes’ theorem shows that ω is not exact on R3 \ {(0, 0, 0)}. Suppose to the contrary that
ω = d η for some η ∈ Ω 1 (R3 \ {(0, 0, 0)}). Since by Example 11.8 (c), ∂c2 is a degenerate
1-chain (it consists of two points), the pull-back (∂c2 )∗ (η) is 0 and so is the integral
0 = ∫_{I¹} (∂c_2)∗(η) = ∫_{∂c_2} η = ∫_{c_2} dη = ∫_{c_2} ω = −4π,
a contradiction; hence, ω is not exact.
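The value −4π can be reproduced numerically without computing the pullback by hand: approximate the partial derivatives of c_2 by finite differences, form the 2×2 minors, and integrate over the parameter rectangle. A Python sketch (illustrative, not from the notes):

```python
import math

def c2(s, t):
    # the singular 2-cube parametrizing the unit sphere
    return (math.cos(s) * math.sin(t), math.sin(s) * math.sin(t), math.cos(t))

def omega(p):
    # coefficients (P, Q, R) of (x dy^dz + y dz^dx + z dx^dy)/r^3
    r3 = sum(q * q for q in p) ** 1.5
    return (p[0] / r3, p[1] / r3, p[2] / r3)

def pullback_coeff(s, t, h=1e-5):
    # c2*(omega) = (P det d(y,z) + Q det d(z,x) + R det d(x,y)) ds^dt
    ds = [(a - b) / (2 * h) for a, b in zip(c2(s + h, t), c2(s - h, t))]
    dt = [(a - b) / (2 * h) for a, b in zip(c2(s, t + h), c2(s, t - h))]
    P, Q, R = omega(c2(s, t))
    return (P * (ds[1] * dt[2] - ds[2] * dt[1])
            + Q * (ds[2] * dt[0] - ds[0] * dt[2])
            + R * (ds[0] * dt[1] - ds[1] * dt[0]))

n = 200
hs, ht = 2 * math.pi / n, math.pi / n
integral = sum(pullback_coeff((i + 0.5) * hs, (j + 0.5) * ht) * hs * ht
               for i in range(n) for j in range(n))
assert abs(integral - (-4 * math.pi)) < 1e-2
```

The computed pullback coefficient agrees with the hand calculation −sin t above.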
We come back to the two special cases k = 1, n = 2 and k = 1, n = 3.
11.3.4 Special Cases
k = 1, n = 3. Let c : [0, 1]2 → U ⊆ R3 be a singular 2-cube, F = c([0, 1]2 ) is a regular
smooth surface in R3 . Then ∂F is a closed path consisting of 4 parts with the counter-clockwise
orientation. Let ω = f1 dx1 +f2 dx2 +f3 dx3 be a differential 1-form on U. By Example 11.9 (a)
∫_{∂c_2} ω = ∫_{∂F} f_1 dx_1 + f_2 dx_2 + f_3 dx_3.
On the other hand, by Example 11.5 (c),
d ω = curl f · ( dx_2∧ dx_3, dx_3∧ dx_1, dx_1∧ dx_2).
In this case Stokes' theorem gives
∫_{∂c_2} ω = ∫_{c_2} d ω,   i.e.
∫_{∂F} f_1 dx_1 + f_2 dx_2 + f_3 dx_3 = ∫_F curl f · ( dx_2∧ dx_3, dx_3∧ dx_1, dx_1∧ dx_2).
If F is in the x1 –x2 plane, we get Green’s theorem.
k = 2, n = 3. Let c_3 be a singular 3-cube in R³ and G = c_3([0, 1]³). Further let
ω = v1 dx2 ∧ dx3 + v2 dx3 ∧ dx1 + v3 dx1 ∧ dx2 ,
with a continuously differentiable vector field v ∈ C1 (G).
By Example 11.5 (d),
d ω = div (v) dx1 ∧ dx2 ∧ dx3 . The boundary of G consists of the 6 faces ∂c3 ([0, 1]3 ). They
are oriented with the outer unit normal vector. Stokes’ theorem then gives
∫_{c_3} d ω = ∫_{∂c_3} v_1 dx_2∧ dx_3 + v_2 dx_3∧ dx_1 + v_3 dx_1∧ dx_2,   i.e.
∫_G div v dxdydz = ∫_{∂G} v⃗ · dS⃗.
This is Gauß’ divergence theorem.
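The divergence theorem can be sanity-checked numerically on the unit cube itself. The Python sketch below uses the illustrative field v = (xy, yz, zx) (my choice, not from the text) and compares the volume integral of div v with the flux through the six outer-oriented faces:

```python
def v(x, y, z):
    # a C^1 vector field on the unit cube (illustrative choice)
    return (x * y, y * z, z * x)

def div_v(x, y, z, h=1e-5):
    # numerical divergence by central differences
    return ((v(x + h, y, z)[0] - v(x - h, y, z)[0])
            + (v(x, y + h, z)[1] - v(x, y - h, z)[1])
            + (v(x, y, z + h)[2] - v(x, y, z - h)[2])) / (2 * h)

n = 20
h = 1.0 / n
pts = [(i + 0.5) * h for i in range(n)]   # midpoint grid on [0, 1]

volume_integral = sum(div_v(x, y, z)
                      for x in pts for y in pts for z in pts) * h**3

# flux through the 6 faces, with outer normals -+e1, -+e2, -+e3
flux = sum((v(1, u, w)[0] - v(0, u, w)[0]
            + v(u, 1, w)[1] - v(u, 0, w)[1]
            + v(u, w, 1)[2] - v(u, w, 0)[2]) for u in pts for w in pts) * h**2

assert abs(volume_integral - flux) < 1e-6
```

For this field div v = x + y + z, so both sides equal 3/2.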
11.3.5 Applications
The following two applications were not covered by the lecture.
(a) Brouwer's Fixed Point Theorem
Proposition 11.8 (Retraction Theorem) Let G ⊂ Rn be a compact, connected, simply connected set with smooth boundary ∂G.
There exists no vector field f : G → Rn, f_i ∈ C²(G), i = 1, …, n, such that f(G) ⊂ ∂G and f(x) = x for all x ∈ ∂G.
Proof. Suppose to the contrary that such f exists; consider ω ∈ Ω n−1 (U),
ω = x1 dx2 ∧ dx3 ∧ · · · ∧ dxn . First we show that f ∗ (dω) = 0. By definition, for
v1 , . . . , vn ∈ Rn we have
f∗( dω)(p)(v_1, …, v_n) = dω(f(p))(Df(p)v_1, Df(p)v_2, …, Df(p)v_n).
Since dim f(G) = dim ∂G = n − 1, the n vectors Df(p)v_1, Df(p)v_2, …, Df(p)v_n can be thought of as being n vectors in an (n − 1)-dimensional linear space; hence, they are linearly
dependent. Consequently, any alternating n-form on those vectors is 0. Thus f ∗ (dω) = 0. By
Stokes' theorem,
∫_{∂G} f∗(ω) = ∫_G f∗( dω) = 0.
On the other hand, f = id on ∂G such that
f ∗ (ω) = ω |∂G= x1 dx2 ∧ · · · ∧ dxn |∂G .
Again, by Stokes' theorem,
0 = ∫_{∂G} f∗(ω) = ∫_{∂G} ω = ∫_G dx_1 ∧ · · · ∧ dx_n = |G| > 0;
a contradiction.
Theorem 11.9 (Brouwer's Fixed Point Theorem) Let g : B_1 → B_1 be a continuous mapping of the closed unit ball B_1 ⊂ Rn into itself.
Then g has a fixed point p, g(p) = p.
Proof. (a) We first prove that the theorem holds true for a twice continuously differentiable mapping g. Suppose to the contrary that g has no fixed point. For any p ∈ B_1 the line through p and g(p) is then well defined. Let h(p) be the intersection point of this line with the unit sphere S^{n−1} such that h(p) − p is a positive multiple of p − g(p). In particular, h is a C²-mapping from B_1 into S^{n−1} which is the identity on S^{n−1}. By the previous proposition, such a mapping does not exist. Hence, g has a fixed point.
(b) For a continuous mapping one needs the Stone–Weierstraß theorem to approximate the continuous mapping by polynomials.
In case n = 1 Brouwer's theorem is just the intermediate value theorem.
(b) The Fundamental Theorem of Algebra
We give a first proof of the fundamental theorem of algebra, Theorem 5.19:
Every polynomial f (z) = z n + a1 z n−1 + · · · + an with complex coefficients ai ∈ C
has a root in C.
We use two facts: the winding form ω on R² \ {(0, 0)} is closed but not exact, and z^n and f(z) are "close together" for sufficiently large |z|.
We view C as R2 with (a, b) = a + bi. Define the following singular 1-cubes on R2
c_{R,n}(s) = (R^n cos(2πns), R^n sin(2πns)) = z^n,
c_{R,f}(s) = f ◦ c_{R,1}(s) = f(R cos(2πs), R sin(2πs)) = f(z),
where z = z(s) = R(cos 2πs + i sin 2πs), s ∈ [0, 1]. Note that |z| = R. Further, let
c(s, t) = (1 − t)c_{R,f}(s) + t c_{R,n}(s) = (1 − t)f(z) + t z^n,   (s, t) ∈ [0, 1]²,
b(s, t) = f((1 − t) R(cos 2πs, sin 2πs)) = f((1 − t)z),   (s, t) ∈ [0, 1]²,
be singular 2-cubes in R².
Lemma 11.10 If |z| = R is sufficiently large, then
|c(s, t)| ≥ R^n/2,   (s, t) ∈ [0, 1]².
Proof. Since f(z) − z^n is a polynomial of degree less than n,
(f(z) − z^n)/z^n → 0 as z → ∞;
in particular |f(z) − z^n| ≤ R^n/2 if R is sufficiently large. Then we have
|c(s, t)| = |(1 − t)f(z) + t z^n| = |z^n + (1 − t)(f(z) − z^n)|
    ≥ |z^n| − (1 − t) |f(z) − z^n| ≥ R^n − R^n/2 = R^n/2.
The only fact we need is c(s, t) ≠ 0 for sufficiently large R; hence, c maps the unit square into R² \ {(0, 0)}.
Lemma 11.11 Let ω = ω(x, y) = (−y dx + x dy)/(x² + y²) be the winding form on R² \ {(0, 0)}. Then we have
(a) ∂c = c_{R,f} − c_{R,n},   ∂b = f(z) − f(0).
(b) For sufficiently large R, c, c_{R,n}, and c_{R,f} are chains in R² \ {(0, 0)} and
∫_{c_{R,n}} ω = ∫_{c_{R,f}} ω = 2πn.
Proof. (a) Note that z(0) = z(1) = R. Since ∂I2 (x) = (x, 0) − (x, 1) + (1, x) − (0, x) we have
∂c(s) = c(s, 0) − c(s, 1) + c(1, s) − c(0, s)
= f (z) − z n − ((1 − s)f (R) + sRn ) + ((1 − s)f (R) + sRn ) = f (z) − z n .
This proves (a). Similarly, we have
∂b(s) = b(s, 0) − b(s, 1) + b(1, s) − b(0, s)
= f (z) − f (0) + f ((1 − s)R) − f ((1 − s)R) = f (z) − f (0).
(b) By Lemma 11.10, c is a singular 2-chain in R² \ {(0, 0)} for sufficiently large R.
Hence ∂c is a 1-chain in R2 \ {(0, 0)}. In particular, both cR,n and cR,f take values in
R2 \ {(0, 0)}. Hence (∂c)∗ (ω) is well-defined. We compute c∗R,n (ω) using the pull-backs of
dx and dy
c_{R,n}∗(x² + y²) = R^{2n},
c_{R,n}∗( dx) = −2πnR^n sin(2πns) ds,
c_{R,n}∗( dy) = 2πnR^n cos(2πns) ds,
c_{R,n}∗(ω) = 2πn ds.
Hence
∫_{c_{R,n}} ω = ∫_0^1 2πn ds = 2πn.
By Stokes' theorem and since ω is closed,
∫_{∂c} ω = ∫_c d ω = 0,
such that by (a) and the above calculation
0 = ∫_{∂c} ω = ∫_{c_{R,f}} ω − ∫_{c_{R,n}} ω,
hence
∫_{c_{R,f}} ω = ∫_{c_{R,n}} ω = 2πn.
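The value 2πn is 2π times the winding number of the curve around the origin, and it can be computed numerically for a concrete polynomial. The Python sketch below (with the illustrative choice f(z) = z³ + z + 1, not from the notes) integrates the winding form along s ↦ f(Re^{2πis}) and recovers n = 3:

```python
import cmath
import math

def f(z):
    # a monic polynomial of degree n = 3 (illustrative choice)
    return z**3 + z + 1

def winding(curve, n=4000):
    # (1/2pi) int_c (-y dx + x dy)/(x^2 + y^2), midpoint rule
    total = 0.0
    h = 1.0 / n
    for k in range(n):
        s = (k + 0.5) * h
        w = curve(s)
        dw = (curve(s + 1e-6) - curve(s - 1e-6)) / 2e-6   # numerical c'(s)
        x, y, dx, dy = w.real, w.imag, dw.real, dw.imag
        total += (-y * dx + x * dy) / (x * x + y * y) * h
    return total / (2 * math.pi)

R = 10.0  # large enough that f(z) stays away from 0 on |z| = R
curve = lambda s: f(R * cmath.exp(2j * math.pi * s))
assert abs(winding(curve) - 3) < 1e-2
```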
We complete the proof of the fundamental theorem of algebra. Suppose to the contrary that the polynomial f(z) is non-zero in C; then b as well as ∂b are singular chains in R² \ {(0, 0)}.
By Lemma 11.11 (b) and again by Stokes' theorem we have
∫_{c_{R,f}} ω = ∫_{c_{R,f} − f(0)} ω = ∫_{∂b} ω = ∫_b d ω = 0,
where the first equality holds since the integral of ω over the constant 1-cube f(0) vanishes.
But this is a contradiction to Lemma 11.11 (b). Hence, b is not a 2-chain in R² \ {(0, 0)}, that is, there exist s, t ∈ [0, 1] such that b(s, t) = f((1 − t)z) = 0. We have found that (1 − t)R(cos(2πs) + i sin(2πs)) is a zero of f. Actually, we have shown a little more: there is a zero of f in the disc {z ∈ C | |z| ≤ R} whenever R ≥ max{1, 2 ∑_k |a_k|}. Indeed, in this case
|f(z) − z^n| ≤ ∑_{k=1}^{n} |a_k| R^{n−k} ≤ R^{n−1} ∑_{k=1}^{n} |a_k| ≤ R^n/2,
and this condition ensures |c(s, t)| ≠ 0 as in the proof of Lemma 11.10.
Chapter 12
Measure Theory and Integration
12.1 Measure Theory
Citation from Rudin's book, [Rud66, Chapter 1]: Towards the end of the 19th century it became
clear to many mathematicians that the Riemann integral should be replaced by some other
type of integral, more general and more flexible, better suited for dealing with limit processes.
Among the attempts made in this direction, the most notable ones were due to Jordan, Borel,
W.H. Young, and Lebesgue. It was Lebesgue’s construction which turned out to be the most
successful.
In a brief outline, here is the main idea: The Riemann integral of a function f over an interval
[a, b] can be approximated by sums of the form
∑_{i=1}^n f(t_i) m(E_i),
where E1 , . . . , En are disjoint intervals whose union is [a, b], m(Ei ) denotes the length of Ei
and ti ∈ Ei for i = 1, . . . , n. Lebesgue discovered that a completely satisfactory theory of integration results if the sets Ei in the above sum are allowed to belong to a larger class of subsets
of the line, the so-called “measurable sets,” and if the class of functions under consideration is
enlarged to what we call “measurable functions.” The crucial set-theoretic properties involved
are the following: The union and the intersection of any countable family of measurable sets are
measurable;. . . the notion of “length” (now called “measure”) can be extended to them in such
a way that
m(E1 ∪ E2 ∪ · · · ) = m(E1 ) + m(E2 ) + · · ·
for any countable collection {Ei } of pairwise disjoint measurable sets. This property of m is
called countable additivity.
The passage from Riemann’s theory of integration to that of Lebesgue is a process of completion. It is of the same fundamental importance in analysis as the construction of the real number
system from rationals.
12.1.1 Algebras, σ-algebras, and Borel Sets
(a) The Measure Problem. Definitions
Lebesgue (1904) stated the following problem: we want to associate to each bounded subset E of the real line a non-negative real number m(E), called the measure of E, such that the following properties are satisfied:
(1) Any two congruent sets (by shift or reflection) have the same measure.
(2) The measure is countably additive.
(3) The measure of the unit interval [0, 1] is 1.
He emphasized that he was not able to solve this problem in full detail, but only for a certain class of sets which he called measurable. We will see that this restriction to a large family of bounded sets is unavoidable: the measure problem has no solution.
Definition 12.1 Let X be a set. A non-empty family A of subsets of X is called an algebra if A, B ∈ A implies A^c ∈ A and A ∪ B ∈ A.
An algebra A is called a σ-algebra if for all countable families {A_n | n ∈ N} with A_n ∈ A we have
⋃_{n∈N} A_n = A_1 ∪ A_2 ∪ · · · ∪ A_n ∪ · · · ∈ A.
Remarks 12.1 (a) Since A ∈ A implies A ∪ A^c = X ∈ A, we have X ∈ A and ∅ = X^c ∈ A.
(b) If A is an algebra, then A ∩ B ∈ A for all A, B ∈ A. Indeed, by de Morgan's rules
(⋃_α A_α)^c = ⋂_α A_α^c and (⋂_α A_α)^c = ⋃_α A_α^c,
we have A ∩ B = (A^c ∪ B^c)^c, and all the members on the right are in A by the definition of an algebra.
(c) Let A be a σ-algebra. Then ⋂_{n∈N} A_n ∈ A if A_n ∈ A for all n ∈ N. Again by de Morgan's rule,
⋂_{n∈N} A_n = ( ⋃_{n∈N} A_n^c )^c.
(d) The family P(X) of all subsets of X is both an algebra as well as a σ-algebra.
(e) Any σ-algebra is an algebra but there are algebras not being σ-algebras.
(f) The family of finite and cofinite subsets (these are complements of finite sets) of an infinite
set form an algebra. Do they form a σ-algebra?
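The finite/cofinite family can be modelled explicitly, which makes the closure properties easy to check. In the Python sketch below (an illustrative representation, not from the notes) a subset of an infinite set X is stored either as a finite set or as the complement of a finite set:

```python
def complement(A):
    # complement of a finite set is cofinite, and vice versa
    kind, s = A
    return ('cofin' if kind == 'fin' else 'fin', s)

def union(A, B):
    # union of finite/cofinite subsets of an infinite set X
    (ka, sa), (kb, sb) = A, B
    if ka == 'fin' and kb == 'fin':
        return ('fin', sa | sb)
    if ka == 'cofin' and kb == 'cofin':
        return ('cofin', sa & sb)      # (X\sa) u (X\sb) = X \ (sa & sb)
    if ka == 'cofin':
        return ('cofin', sa - sb)      # (X\sa) u sb = X \ (sa - sb)
    return ('cofin', sb - sa)

A = ('fin', frozenset({1, 2, 3}))
B = ('cofin', frozenset({2, 3, 4}))    # the complement of {2, 3, 4}
# closure under complement and union, i.e. the algebra axioms:
assert union(A, B) == ('cofin', frozenset({4}))
assert complement(complement(A)) == A
assert union(A, complement(A)) == ('cofin', frozenset())   # = X
```

The σ-algebra question posed above is left to the reader; the representation makes it easy to experiment with countable unions of singletons.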
(b) Elementary Sets and Borel Sets in Rn
Let R̄ be the extended real axis together with ±∞, R̄ = R ∪ {+∞} ∪ {−∞}. We use the old
rules as introduced in Section 3.1.1 at page 79. The new rule which is used in measure theory
only is
0 · ±∞ = ±∞ · 0 = 0.
The set
I = {(x_1, …, x_n) ∈ Rn | a_i ⊲ x_i ⊲ b_i,  i = 1, …, n}
is called a rectangle or a box in Rn, where each ⊲ stands either for < or for ≤ and a_i, b_i ∈ R̄. For example a_i = −∞ and b_i = +∞ yields I = Rn, whereas a_1 = 2, b_1 = 1 yields I = ∅.
A subset of Rn is called an elementary set if it is the union of a finite number of rectangles in Rn. Let E_n denote the set of elementary subsets of Rn:
E_n = {I_1 ∪ I_2 ∪ · · · ∪ I_r | r ∈ N, I_j is a box in Rn}.
Lemma 12.1 En is an algebra but not a σ-algebra.
Proof. The complement of a finite interval is the union of
two intervals, the complement of an infinite interval is an
infinite interval. Hence, the complement of a rectangle in
Rn is the finite union of rectangles.
The countable (disjoint) union M = ⋃_{n∈N} [n, n + 1/2] is not an elementary set.
Note that any elementary set is the disjoint union of a finite number of rectangles.
Let B be any nonempty family of subsets of X. Let σ(B) denote the intersection of all σ-algebras containing B, i. e. σ(B) = ⋂_{i∈I} A_i, where {A_i | i ∈ I} is the family of all σ-algebras A_i which contain B, B ⊆ A_i for all i ∈ I.
Note that the σ-algebra P(X) of all subsets is always a member of that family {Ai } such that
σ(B) exists. We call σ(B) the σ-algebra generated by B. By definition, σ(B) is the smallest
σ-algebra which contains the sets of B. It is indeed a σ-algebra. Moreover, σ(σ(B)) = σ(B)
and if B1 ⊂ B2 then σ(B1 ) ⊂ σ(B2 ).
Definition 12.2 The Borel algebra Bn in Rn is the σ-algebra generated by the elementary sets
En . Its elements are called Borel sets.
The Borel algebra Bn = σ(En ) is the smallest σ-algebra which contains all boxes in Rn. We
will see that the Borel algebra is a huge family of subsets of Rn which contains “all sets we are
ever interested in.” Later, we will construct a non-Borel set.
Proposition 12.2 Open and closed subsets of Rn are Borel sets.
Proof. We give the proof in case of R². Let I_ε(x_0, y_0) = (x_0 − ε, x_0 + ε) × (y_0 − ε, y_0 + ε) denote the open square of size 2ε by 2ε with midpoint (x_0, y_0). Then I_{1/(n+1)} ⊆ I_{1/n} for n ∈ N. Let M ⊂ R² be open. To every point (x_0, y_0) ∈ M with rational coordinates x_0, y_0 we choose the largest square I_{1/n}(x_0, y_0) ⊆ M and denote it by J(x_0, y_0). We show that
M = ⋃_{(x_0,y_0)∈M, x_0,y_0∈Q} J(x_0, y_0).
Since the number of rational points in M is at most countable, the right side is in σ(E_2). Now let (a, b) ∈ M be arbitrary. Since M is open, there exists n ∈ N such that I_{2/n}(a, b) ⊆ M. Since the rational points are dense in R², there is a rational point (x_0, y_0) which is contained in I_{1/n}(a, b). Then we have
I_{1/n}(x_0, y_0) ⊆ I_{2/n}(a, b) ⊆ M.
Since (a, b) ∈ I_{1/n}(x_0, y_0) ⊆ J(x_0, y_0), we have shown that M is the union of the countable family of sets J. Since closed sets are the complements of open sets and complements are again in the σ-algebra, the assertion follows for closed sets.
Remarks 12.2 (a) We have proved that any open subset M of Rn is the countable union of
rectangles I ⊆ M.
(b) The Borel algebra Bn is also the σ-algebra generated by the family of open or closed sets
in Rn , Bn = σ(Gn ) = σ(Fn ). Countable unions and intersections of open or closed sets are in
Bn .
Let us look in more detail at some of the sets in σ(En). Let G and F be the families of all open and closed subsets of Rn, respectively. Let G_δ be the collection of all intersections of sequences of open sets (from G), and let F_σ be the collection of all unions of sequences of sets of F. One can prove that F ⊂ G_δ and G ⊂ F_σ. These inclusions are strict. Since countable intersections and unions of countable intersections and unions are still countable operations, G_δ, F_σ ⊂ σ(En).
For an arbitrary family S of sets let S_σ be the collection of all unions of sequences of sets in S, and let S_δ be the collection of all intersections of sequences of sets in S. We can iterate the operations represented by σ and δ, obtaining from the class G the classes G_δ, G_δσ, G_δσδ, …, and from F the classes F_σ, F_σδ, …. It turns out that we have the inclusions
G ⊂ G_δ ⊂ G_δσ ⊂ · · · ⊂ σ(En),
F ⊂ F_σ ⊂ F_σδ ⊂ · · · ⊂ σ(En).
No two of these classes are equal. There are Borel sets that belong to none of them.
12.1.2 Additive Functions and Measures
Definition 12.3 (a) Let A be an algebra over X. An additive function or content µ on A is a
function µ : A → [0, +∞] such that
(i) µ(∅) = 0,
(ii) µ(A ∪ B) = µ(A) + µ(B) for all A, B ∈ A with A ∩ B = ∅.
(b) An additive function µ is called countably additive (or σ-additive in the German literature)
on A if for any disjoint family {A_n | A_n ∈ A, n ∈ N}, that is A_i ∩ A_j = ∅ for all i ≠ j, with ⋃_{n∈N} A_n ∈ A we have
μ( ⋃_{n∈N} A_n ) = ∑_{n∈N} μ(A_n).
(c) A measure is a countably additive function on a σ-algebra A.
If X is a set, A a σ-algebra on X and μ a measure on A, then the triple (X, A, μ) is called
a measure space. Likewise, if X is a set and A a σ-algebra on X, the pair (X, A) is called a
measurable space.
Notation. We write ∑_{n∈N} A_n in place of ⋃_{n∈N} A_n if {A_n} is a disjoint family of subsets. The countable additivity then reads as follows:
μ(A_1 ∪ A_2 ∪ · · ·) = μ(A_1) + μ(A_2) + · · ·,   i.e.   μ( ∑_{n=1}^∞ A_n ) = ∑_{n=1}^∞ μ(A_n).
We say μ is finite if μ(X) < ∞. If μ(X) = 1, we call (X, A, μ) a probability space. We call μ σ-finite if there exist sets A_n ∈ A with μ(A_n) < ∞ and X = ⋃_{n=1}^∞ A_n.
Example 12.1 (a) Let X be a set, x_0 ∈ X and A = P(X). Then
μ(A) = 1 if x_0 ∈ A, and μ(A) = 0 if x_0 ∉ A,
defines a finite measure on A. µ is called the point mass concentrated at x0 .
(b1) Let X be a set and A = P(X). Put
μ(A) = n if A has n elements, and μ(A) = +∞ if A has infinitely many elements.
μ is a measure on A, the so-called counting measure.
(b2) Let X be a set and A = P(X). Put
μ(A) = 0 if A has finitely many or countably many elements, and μ(A) = +∞ if A has uncountably many elements.
μ is countably additive, but not σ-finite.
(b3) Let X be a set and A = P(X). Put
μ(A) = 0 if A is finite, and μ(A) = +∞ if A is infinite.
μ is additive, but not σ-additive.
(c) X = Rn, A = En is the algebra of elementary sets of Rn. Every A ∈ En is the finite disjoint union of rectangles, A = ∑_{k=1}^m I_k. We set μ(A) = ∑_{k=1}^m μ(I_k), where
μ(I) = (b_1 − a_1) · · · (b_n − a_n)
if I = {(x_1, …, x_n) ∈ Rn | a_i ⊲ x_i ⊲ b_i, i = 1, …, n} with a_i ≤ b_i, and μ(∅) = 0. Then μ is an additive function on En. It is called the Lebesgue content on Rn. Note that μ is not a measure (since En is not a σ-algebra and μ is not yet shown to be countably additive). However, we will see in Proposition 12.5 below that μ is even countably additive. By definition, μ(line in R²) = 0 and μ(plane in R³) = 0.
(d) Let X = R, A = E1 , and α an increasing function on R. For a, b in R with a < b set
µα ([a, b]) = α(b + 0) − α(a − 0),
µα ([a, b)) = α(b − 0) − α(a − 0),
µα ((a, b]) = α(b + 0) − α(a + 0),
µα ((a, b)) = α(b − 0) − α(a + 0).
Then μ_α is an additive function on E_1 if we set
μ_α(A) = ∑_{i=1}^n μ_α(I_i)   if A = ∑_{i=1}^n I_i.
We call µα the Lebesgue–Stieltjes content.
On the other hand, if µ : E1 → R is an additive function, then αµ : R → R, defined by
αµ(x) = µ((0, x]) for x ≥ 0, and αµ(x) = −µ((x, 0]) for x < 0,
is an increasing, right-continuous function on R such that µ = µαµ. In general α ≠ αµα,
since the function on the right-hand side is continuous from the right whereas α, in general,
is not.
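The correspondence α ↦ µα can be made concrete. The following Python sketch (a toy illustration of ours, not from the notes) takes for α a unit jump at 0 and shows how a right-continuous α charges the whole jump to intervals of the form (a, 0]:

```python
def alpha(x):
    # a right-continuous increasing function with a unit jump at 0 (our own toy choice)
    return 0.0 if x < 0 else 1.0

def mu_alpha(a, b):
    # content of the half-open interval (a, b]: alpha(b + 0) - alpha(a + 0),
    # which equals alpha(b) - alpha(a) because alpha is right-continuous
    return alpha(b) - alpha(a)

# additivity on (a, b] = (a, c] + (c, b]
assert mu_alpha(-1, 1) == mu_alpha(-1, 0) + mu_alpha(0, 1)
# the whole jump at 0 is charged to intervals (a, 0] that end at 0 ...
assert mu_alpha(-0.5, 0) == 1.0
# ... while intervals strictly to the right of 0 get content 0
assert mu_alpha(0, 0.5) == 0.0
```

This µα is the point mass at 0 restricted to half-open intervals, which matches Example 12.1 (a).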
Properties of Additive Functions
Proposition 12.3 Let A be an algebra over X and µ an additive function on A. Then
(a) µ(Σ_{k=1}^n Ak) = Σ_{k=1}^n µ(Ak) if Ak ∈ A, k = 1, ..., n, form a disjoint family of n
subsets.
(b) µ(A ∪ B) + µ(A ∩ B) = µ(A) + µ(B), A, B ∈ A.
(c) A ⊆ B implies µ(A) ≤ µ(B) (µ is monotone).
(d) If A ⊆ B, A, B ∈ A, and µ(A) < +∞, then µ(B \ A) = µ(B) − µ(A),
(µ is subtractive).
(e) µ(⋃_{k=1}^n Ak) ≤ Σ_{k=1}^n µ(Ak), if Ak ∈ A, k = 1, ..., n (µ is finitely subadditive).
(f) If {Ak | k ∈ N} is a disjoint family in A and Σ_{k=1}^∞ Ak ∈ A, then
µ(Σ_{k=1}^∞ Ak) ≥ Σ_{k=1}^∞ µ(Ak).
Proof. (a) is by induction. (d), (c), and (b) are easy (cf. Homework 34.4).
(e) We can write ⋃_{k=1}^n Ak as the finite disjoint union of n sets of A:
⋃_{k=1}^n Ak = A1 ∪ (A2 \ A1) ∪ (A3 \ (A1 ∪ A2)) ∪ ··· ∪ (An \ (A1 ∪ ··· ∪ An−1)).
Since µ is additive,
µ(⋃_{k=1}^n Ak) = Σ_{k=1}^n µ(Ak \ (A1 ∪ ··· ∪ Ak−1)) ≤ Σ_{k=1}^n µ(Ak),
where we used µ(B \ A) ≤ µ(B) (from (d)).
(f) Since µ is additive and monotone,
Σ_{k=1}^n µ(Ak) = µ(Σ_{k=1}^n Ak) ≤ µ(Σ_{k=1}^∞ Ak).
Taking the supremum over n on the left gives the assertion.
Proposition 12.4 Let µ be an additive function on the algebra A. Consider the following statements:
(a) µ is countably additive.
(b) For any increasing sequence An ⊆ An+1, An ∈ A, with ⋃_{n=1}^∞ An = A ∈ A, we have lim_{n→∞} µ(An) = µ(A).
(c) For any decreasing sequence An ⊇ An+1, An ∈ A, with ⋂_{n=1}^∞ An = A ∈ A and µ(A1) < ∞, we have lim_{n→∞} µ(An) = µ(A).
(d) Statement (c) with A = ∅ only.
We have (a) ↔ (b) → (c) → (d). In case µ(X) < ∞ (µ is finite), all statements are equivalent.
Proof. (a) → (b). Without loss of generality A1 = ∅. Put Bn = An \ An−1 for n = 2, 3, ....
Then {Bn} is a disjoint family with An = B2 ∪ B3 ∪ ··· ∪ Bn and A = Σ_{n=2}^∞ Bn. Hence, by
countable additivity of µ,
µ(A) = Σ_{k=2}^∞ µ(Bk) = lim_{n→∞} Σ_{k=2}^n µ(Bk) = lim_{n→∞} µ(Σ_{k=2}^n Bk) = lim_{n→∞} µ(An).
S
(b) → (a). Let {An } be a family of disjoint sets in A with An = A ∈ A; put Bk =
A1 ∪ · · · ∪ Ak . Then Bk is an increasing to A sequence. By (b)
!
!
n
∞
n
X
X
X
µ(Bn ) = µ
=
Ak
µ(Ak ) −→ µ(A) = µ
Ak ;
µ is additive
k=1
Thus,
∞
X
µ(Ak ) = µ
k=1
∞
X
k=1
n→∞
k=1
k=1
!
Ak .
(b) → (c). Since (An) decreases to A, (A1 \ An) increases to A1 \ A. By (b),
µ(A1 \ An) −→ µ(A1 \ A) as n → ∞,
hence µ(A1) − µ(An) −→ µ(A1) − µ(A), which implies the assertion.
(c) → (d) is trivial.
Now let µ be finite; in particular, µ(B) < ∞ for all B ∈ A. We show (d) → (b). Let (An) be a
sequence of subsets An ∈ A increasing to A. Then (A \ An) is a sequence decreasing to ∅.
By (d), µ(A \ An) −→ 0. Since µ is subtractive (Proposition 12.3 (d)) and all values are finite,
µ(An) −→ µ(A).
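The finiteness assumption in (c) cannot be dropped. A standard counterexample (not spelled out in the notes, but easily checked against Example 12.1 (b1)) uses the counting measure on N:

```latex
% Counting measure $\mu$ on $\mathbb{N}$; the sets $A_n = \{n, n+1, n+2, \dots\}$ decrease
% with $\bigcap_{n=1}^{\infty} A_n = \emptyset$, yet
\mu(A_n) = +\infty \ \text{for all } n,
\qquad \lim_{n\to\infty} \mu(A_n) = +\infty \neq 0 = \mu(\emptyset).
```

Here µ is even countably additive, so σ-additivity alone does not give continuity from above without µ(A1) < ∞.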
Proposition 12.5 Let α : R → R be a right-continuous increasing function, and µα the
corresponding Lebesgue–Stieltjes content on E1. Then µα is countably additive.
Proof. For simplicity we write µ for µα. Recall that µ((a, b]) = α(b) − α(a). We will perform
the proof in the case
(a, b] = ⋃_{k=1}^∞ (ak, bk]
with a disjoint family (ak, bk] of intervals. By Proposition 12.3 (f) we already know
µ((a, b]) ≥ Σ_{k=1}^∞ µ((ak, bk]).   (12.1)
We prove the opposite direction. Let ε > 0. Since α is continuous from the right at a, there
exists a0 ∈ (a, b) such that α(a0) − α(a) < ε and, similarly, for every k ∈ N there exists ck > bk
such that α(ck) − α(bk) < ε/2^k. Hence,
[a0, b] ⊂ ⋃_{k=1}^∞ (ak, bk] ⊂ ⋃_{k=1}^∞ (ak, ck)
is an open covering of a compact set. By Heine–Borel (Definition 6.14) there exists a finite
subcover
[a0, b] ⊂ ⋃_{k=1}^N (ak, ck), hence (a0, b] ⊂ ⋃_{k=1}^N (ak, ck],
such that by Proposition 12.3 (e)
µ((a0, b]) ≤ Σ_{k=1}^N µ((ak, ck]).
By the choice of a0 and ck,
µ((ak, ck]) = µ((ak, bk]) + α(ck) − α(bk) ≤ µ((ak, bk]) + ε/2^k.
Similarly, µ((a, b]) = µ((a, a0]) + µ((a0, b]) such that
µ((a, b]) ≤ µ((a0, b]) + ε ≤ Σ_{k=1}^N (µ((ak, bk]) + ε/2^k) + ε ≤ Σ_{k=1}^N µ((ak, bk]) + 2ε ≤ Σ_{k=1}^∞ µ((ak, bk]) + 2ε.
Since ε was arbitrary,
µ((a, b]) ≤ Σ_{k=1}^∞ µ((ak, bk]).
In view of (12.1), µα is countably additive.
Corollary 12.6 The correspondence µ ↦ αµ from Example 12.1 (d) defines a bijection between the countably additive functions µ on E1 and the monotonically increasing, right-continuous
functions α on R (up to constant functions, i. e. α and α + c define the same additive function).
Historical Note. It was the great achievement of Émile Borel (1871–1956) that he actually proved
the countable additivity of the Lebesgue measure. He realized that the countable additivity of µ
is a serious mathematical problem, far from being evident.
12.1.3 Extension of Countably Additive Functions
Here we must stop the rigorous treatment of measure theory. Up to now, we know only two
trivial examples of measures (Example 12.1 (a) and (b)). We give an outline of the steps toward
the construction of the Lebesgue measure.
• Construction of an outer measure µ∗ on P(X) from a countably additive function µ on an
algebra A.
• Construction of the σ-algebra Aµ of measurable sets.
The extension theory is due to Carathéodory (1914). For a detailed treatment, see [Els02, Section II.4].
Theorem 12.7 (Extension and Uniqueness) Let µ be a countably additive function on the algebra A.
(a) There exists an extension of µ to a measure on the σ-algebra σ(A) which coincides with µ
on A. We denote the measure on σ(A) also by µ. It is defined as the restriction of the outer
measure µ* : P(X) → [0, ∞],
µ*(A) = inf { Σ_{n=1}^∞ µ(An) | A ⊂ ⋃_{n=1}^∞ An, An ∈ A, n ∈ N },
to the µ-measurable sets Aµ*.
(b) This extension is unique, if (X, A, µ) is σ-finite.
(For a proof, see [Brö92, (2.6), p.68])
Remark 12.3 (a) A subset A ⊂ X is said to be µ-measurable if for all Y ⊂ X
µ*(Y) = µ*(A ∩ Y) + µ*(A^c ∩ Y).
The family of µ-measurable sets forms a σ-algebra Aµ*.
(b) We have A ⊂ σ(A) ⊂ Aµ∗ and µ∗ (A) = µ(A) for all A ∈ A.
(c) (X, Aµ*, µ*) is a measure space; in particular, µ* is countably additive on the measurable
sets Aµ*, and we again denote it by µ.
12.1.4 The Lebesgue Measure on Rn
Using the facts from the previous subsection we conclude that for any increasing, right-continuous function α on R there exists a measure µα on the σ-algebra of Borel sets. We call this
measure the Lebesgue–Stieltjes measure on R. In case α(x) = x we call it the Lebesgue measure. Extending the Lebesgue content on elementary sets of Rn to the Borel algebra Bn, we
obtain the n-dimensional Lebesgue measure λn on Rn.
Completeness
A measure µ : A → R+ on a σ-algebra A is said to be complete if A ∈ A, µ(A) = 0, and
B ⊂ A imply B ∈ A. It turns out that the Lebesgue measure λn on the Borel sets of Rn is
not complete. Adjoining to Bn the subsets of measure-zero sets, we obtain the σ-algebra Aλn
of Lebesgue measurable sets:
Aλn = σ(Bn ∪ {X ⊆ Rn | ∃ B ∈ Bn : X ⊂ B, λn(B) = 0}).
The Lebesgue measure λn on Aλn is now complete.
Remarks 12.4 (a) The Lebesgue measure is invariant under the motion group of Rn . More
precisely, let O(n) = {T ∈ Rn×n | T ⊤ T = T T ⊤ = En } be the group of real orthogonal
n × n-matrices (“motions”), then
λn(T(A)) = λn(A),  A ∈ Aλn,  T ∈ O(n).
(b) λn is translation invariant, i. e. λn (A) = λn (x+A) for all x ∈ Rn . Moreover, the invariance
of λn under translations uniquely characterizes the Lebesgue measure λn : If λ is a translation
invariant measure on Bn , then λ = cλn for some c ∈ R+ .
(c) There exist non-measurable subsets in Rn . We construct a subset E of R that is not Lebesgue
measurable.
We write x ∼ y if x − y is rational. This is an equivalence relation since x ∼ x for all x ∈ R,
x ∼ y implies y ∼ x for all x and y, and x ∼ y and y ∼ z imply x ∼ z. Let E be a subset of
(0, 1) that contains exactly one point of every equivalence class (that there is such
a set E is a direct application of the axiom of choice). We claim that E is not measurable. Let
E + r = {x + r | x ∈ E}. We need the following two properties of E:
(a) If x ∈ (0, 1), then x ∈ E + r for some rational r ∈ (−1, 1).
(b) If r and s are distinct rationals, then (E + r) ∩ (E + s) = ∅.
To prove (a), note that for every x ∈ (0, 1) there exists y ∈ E with x ∼ y. If r = x − y, then
x = y + r ∈ E + r.
To prove (b), suppose that x ∈ (E + r) ∩ (E + s). Then x = y + r = z + s for some y, z ∈ E.
Since y − z = s − r 6= 0, we have y ∼ z, and E contains two equivalent points, in contradiction
to our choice of E.
Now assume that E is Lebesgue measurable with λ(E) = α. Define S = ⋃ (E + r), where
the union is over all rational r ∈ (−1, 1). By (b), the sets E + r are pairwise disjoint; since λ
is translation invariant, λ(E + r) = λ(E) = α for all r. Since S ⊂ (−1, 2), λ(S) ≤ 3. The
countable additivity of λ now forces α = 0 and hence λ(S) = 0. But (a) implies (0, 1) ⊂ S,
hence 1 ≤ λ(S), and we have a contradiction.
(d) Any countable set has Lebesgue measure zero. Indeed, every single point is a box with
edges of length 0; hence λ({pt}) = 0. Since λ is countably additive,
λ({x1, x2, ..., xn, ...}) = Σ_{n=1}^∞ λ({xn}) = 0.
In particular, the rational numbers have Lebesgue measure 0, λ(Q) = 0.
(e) There are uncountable sets with measure zero. The Cantor set (Cantor: 1845–1918, inventor
of set theory) is a prominent example:
C = { Σ_{i=1}^∞ ai/3^i | ai ∈ {0, 2} for all i ∈ N }.
Obviously, C ⊂ [0, 1]; C is compact and can be written as the intersection of a decreasing
sequence (Cn) of closed subsets: C1 = [0, 1/3] ∪ [2/3, 1], λ(C1) = 2/3, and, recursively,
Cn+1 = (1/3)Cn ∪ ((1/3)Cn + 2/3) ⟹ λ(Cn+1) = (1/3)λ(Cn) + λ((1/3)Cn + 2/3) = (2/3)λ(Cn).
It turns out that Cn = { Σ_{i=1}^∞ ai/3^i | ai ∈ {0, 2} for all i = 1, ..., n }. Clearly,
λ(Cn+1) = (2/3)λ(Cn) = ··· = (2/3)^n λ(C1) = (2/3)^{n+1}.
By Proposition 12.4 (c), λ(C) = lim_{n→∞} λ(Cn) = 0. However, C has the same cardinality as
{0, 2}^N ≅ {0, 1}^N ≅ R, which is uncountable.
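The recursion for Cn can be carried out exactly with rational arithmetic. The sketch below (our own illustration, not part of the notes) constructs the intervals of Cn by removing middle thirds and confirms λ(Cn) = (2/3)^n:

```python
from fractions import Fraction

def remove_middle_thirds(intervals):
    # replace each closed interval [a, b] by its first and last thirds
    out = []
    for a, b in intervals:
        third = (b - a) / 3
        out.append((a, a + third))
        out.append((b - third, b))
    return out

C = [(Fraction(0), Fraction(1))]          # C_0 = [0, 1]
for n in range(1, 6):
    C = remove_middle_thirds(C)
    length = sum(b - a for a, b in C)     # lambda(C_n), computed exactly
    assert length == Fraction(2, 3) ** n
```

Exact `Fraction` arithmetic avoids any floating-point drift, so the identity λ(Cn) = (2/3)^n is verified literally for the levels computed.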
12.2 Measurable Functions
Let A be a σ-algebra over X.
Definition 12.4 A real function f : X → R is called A-measurable if for all a ∈ R the set
{x ∈ X | f (x) > a} belongs to A.
A complex function f : X → C is said to be A-measurable if both Re f and Im f are A-measurable.
A function f : U → R, U ⊂ Rn , is said to be a Borel function if f is Bn -measurable, i. e. f is
measurable with respect to the Borel algebra on Rn .
A function f : U → V , U ⊂ Rn , V ⊂ Rm , is called a Borel function if f −1 (B) is a Borel set
for all Borel sets B ⊂ V . It is part of homework 39.3 (b) to prove that in case m = 1 these
definitions coincide. Also, f = (f1 , . . . , fm ) is a Borel function if all fi are.
Note that {x ∈ X | f (x) > a} = f −1 ((a, +∞)). From Proposition 12.8 below it becomes clear
that the last two notions are consistent. Note that no measure on (X, A) needs to be specified
to define a measurable function.
Example 12.2 (a) Any continuous function f : U → R, U ⊂ Rn , is a Borel function. Indeed,
since f is continuous and (a, +∞) is open, f −1 ((a, +∞)) is open as well and hence a Borel set
(cf. Proposition 12.2).
(b) The characteristic function χA is A-measurable if and only if A ∈ A (see homework 35.3).
(c) Let f : U → V and g : V → W be Borel functions. Then g ◦f : U → W is a Borel function,
too. Indeed, for any Borel set C ⊂ W , g −1(C) is a Borel set in V since g is a Borel function.
Since f is a Borel function (g ◦f )−1 (C) = f −1 (g −1(C)) is a Borel subset of U which shows the
assertion.
Proposition 12.8 Let f : X → R be a function. The following are equivalent
(a) {x | f (x) > a} ∈ A for all a ∈ R (i. e. f is A-measurable).
(b) {x | f (x) ≥ a} ∈ A for all a ∈ R.
(c) {x | f (x) < a} ∈ A for all a ∈ R.
(d) {x | f (x) ≤ a} ∈ A for all a ∈ R.
(e) f −1 (B) ∈ A for all Borel sets B ∈ B1 .
Proof. (a) → (b) follows from the identity
[a, +∞] = ⋂_{n∈N} (a − 1/n, +∞]
and the invariance of intersections under preimages, f −1 (A ∩ B) = f −1 (A) ∩ f −1 (B).
Since f is A-measurable and A is a σ-algebra, the countable intersection on the right is in A.
(a) → (d) follows from {x | f (x) ≤ a} = {x | f (x) > a}c . The remaining directions are left
to the reader (see also homework 35.5).
Remark 12.5 (a) Let f, g : X → R be A-measurable. Then {x | f(x) > g(x)} and {x | f(x) = g(x)} are in A.
Proof. Since
{x | f(x) < g(x)} = ⋃_{q∈Q} ({x | f(x) < q} ∩ {x | q < g(x)}),
all sets {f < q} and {q < g} on the right are in A, and the right-hand side is a countable union,
so it is in A. A similar argument works for {f > g}. Note that the sets {f ≥ g}
and {f ≤ g} are the complements of {f < g} and {f > g}, respectively; hence they belong to
A as well. Finally, {f = g} = {f ≥ g} ∩ {f ≤ g}.
(b) It is not difficult to see that for any sequence (an) of real numbers
limsup_{n→∞} an = inf_{n∈N} sup_{k≥n} ak and liminf_{n→∞} an = sup_{n∈N} inf_{k≥n} ak.   (12.2)
As a consequence we can construct new measurable functions using sup and lim. Let (fn)
be a sequence of A-measurable real functions on X. Then sup_{n∈N} fn, inf_{n∈N} fn, limsup_{n→∞} fn, and liminf_{n→∞} fn are
A-measurable. In particular, lim_{n→∞} fn is measurable if the limit exists.
Proof. Note that for all a ∈ R we have
{sup_{n∈N} fn ≤ a} = ⋂_{n∈N} {fn ≤ a}.
Since all fn are measurable, so is sup fn. A similar proof works for inf fn. By (12.2), limsup fn
and liminf fn are measurable, too.
Proposition 12.9 Let f, g : X → R be Borel functions on X ⊂ Rn. Then αf + βg, f·g, and |f|
are Borel functions, too.
Proof. The function h(x) = (f (x), g(x)) : X → R2 is a Borel function since its coordinate
functions are so. Since the sum s(x, y) = x + y and the product p(x, y) = xy are continuous
functions, the compositions s◦h and p◦h are Borel functions by Example 12.2 (c). Since the
constant functions α and β are Borel, so are αf , βg, and finally αf + βg. Hence, the Borel
functions over X form a linear space, moreover a real algebra. In particular −f is Borel and so
is | f | = max{f, −f }.
Let (X, A, µ) be a measure space and f : X → R arbitrary. Let f + = max{f, 0} and f − = max{−f, 0}
denote the positive and negative parts of f . We have
f = f + −f − and | f | = f + +f − ; moreover f + , f − ≥ 0.
Corollary 12.10 f is a Borel function if and only if both f+ and f− are Borel.
12.3 The Lebesgue Integral
We define the Lebesgue integral of a complex function in three steps; first for positive simple
functions, then for positive measurable functions and finally for arbitrary measurable functions.
In this section (X, A, µ) is a measure space.
12.3.1 Simple Functions
Definition 12.5 Let M ⊆ X be a subset. The function
χM(x) = 1 if x ∈ M, and χM(x) = 0 if x ∉ M,
is called the characteristic function of M.
An A-measurable function f : X → R is called simple if f takes only finitely many values
c1, ..., cn. We denote the set of simple functions on (X, A) by S; the set of non-negative simple
functions is denoted by S+.
Clearly, if c1, ..., cn are the distinct values of the simple function f, then
f = Σ_{i=1}^n ci χAi,
where Ai = {x | f(x) = ci}. It is clear that f is measurable if and only if Ai ∈ A for all i.
Obviously, {Ai | i = 1, ..., n} is a disjoint family of subsets of X.
It is easy to see that f, g ∈ S implies αf + βg ∈ S, max{f, g} ∈ S, min{f, g} ∈ S, and f g ∈ S.
Step 1: Positive Simple Functions
For f = Σ_{i=1}^n ci χAi ∈ S+ define
∫_X f dµ = Σ_{i=1}^n ci µ(Ai).   (12.3)
The convention 0·(+∞) = 0 is used here; it may happen that ci = 0 for some i and µ(Ai) = +∞.
Remarks 12.6 (a) Since ci ≥ 0 for all i, the right-hand side is well-defined in R.
(b) Given another presentation of f, say f = Σ_{j=1}^m dj χBj, then Σ_{j=1}^m dj µ(Bj) gives the same value
as (12.3).
The following properties are easily checked.
Lemma 12.11 For f, g ∈ S+, A ∈ A, and c ∈ R+ we have
(1) ∫_X χA dµ = µ(A).
(2) ∫_X cf dµ = c ∫_X f dµ.
(3) ∫_X (f + g) dµ = ∫_X f dµ + ∫_X g dµ.
(4) f ≤ g implies ∫_X f dµ ≤ ∫_X g dµ.
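Definition (12.3) and properties (1), (2) of the lemma can be illustrated with the counting measure on a finite set. In the sketch below (our own illustration; the pair-list representation of a simple function is a choice we made, not the text's), a simple function is stored as a list of pairs (ci, Ai) with disjoint Ai:

```python
def integral_simple(f, mu):
    # (12.3): for f = sum_i c_i * chi_{A_i}, the integral is sum_i c_i * mu(A_i);
    # f is represented as a list of pairs (c_i, A_i) with disjoint A_i
    return sum(c * mu(A) for c, A in f)

mu = len                                   # the counting measure on a finite ground set

f = [(2.0, frozenset({0, 1})), (5.0, frozenset({2}))]
chi_A = [(1.0, frozenset({0, 1, 2}))]      # the characteristic function of A = {0, 1, 2}

assert integral_simple(f, mu) == 9.0                  # 2*2 + 5*1
assert integral_simple(chi_A, mu) == 3.0              # property (1): mu(A)
scaled = [(3 * c, A) for c, A in f]
assert integral_simple(scaled, mu) == 3 * integral_simple(f, mu)   # property (2)
```

Property (3) would require merging the two partitions first, which is exactly the content of Remark 12.6 (b).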
12.3.2 Positive Measurable Functions
The idea is to approximate a positive measurable function with an increasing sequence of positive simple ones.
Theorem 12.12 Let f : X → [0, +∞] be measurable. There exist simple functions sn , n ∈ N,
on X such that
(a)
0 ≤ s1 ≤ s2 ≤ · · · ≤ f .
(b)
sn (x) −→ f (x), as n → ∞, for every x ∈ X.
n→∞
Example. X = (a, b), n = 1, 1 ≤ i ≤ 2. Then
E11 = f^{−1}([0, 1/2)),  E12 = f^{−1}([1/2, 1)),  F1 = f^{−1}([1, +∞]).
(Figure: the graph of f on (a, b) together with the approximating simple function s1.)
Proof. For n ∈ N and for 1 ≤ i ≤ n2^n, define
Eni = f^{−1}([(i − 1)/2^n, i/2^n))  and  Fn = f^{−1}([n, +∞]),
and put
sn = Σ_{i=1}^{n2^n} ((i − 1)/2^n) χEni + n χFn.
Proposition 12.8 shows that Eni and Fn are measurable sets. It is easily seen that the functions
sn satisfy (a). If x is such that f(x) < +∞, then
0 ≤ f(x) − sn(x) ≤ 1/2^n   (12.4)
as soon as n is large enough, that is, x ∈ Eni for some i and x ∉ Fn. If f(x) = +∞,
then sn(x) = n; this proves (b).
From (12.4) it follows that sn ⇉ f uniformly on X if f is bounded.
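The construction in the proof can be evaluated directly: for finite f(x) with f(x) < n one has sn(x) = ⌊2^n f(x)⌋/2^n. The sketch below (an illustration of ours; f and the sample grid are arbitrary choices) checks monotonicity and the estimate (12.4) pointwise:

```python
import math

def s_n(f, x, n):
    # The approximating simple function from Theorem 12.12: s_n takes the value
    # (i-1)/2^n on E_ni = f^{-1}([(i-1)/2^n, i/2^n)) and the value n on F_n = f^{-1}([n, +inf])
    y = f(x)
    if y >= n:
        return n
    return math.floor(y * 2 ** n) / 2 ** n

f = lambda x: x * x
xs = [i / 100 for i in range(301)]          # sample points of [0, 3]
for n in (1, 2, 5, 10):
    for x in xs:
        assert 0 <= s_n(f, x, n) <= s_n(f, x, n + 1) <= f(x)   # increasing and below f
        if f(x) < n:
            assert f(x) - s_n(f, x, n) <= 2 ** (-n)            # the estimate (12.4)
```

Since multiplying a float by a power of two is exact, the dyadic comparisons here are not affected by rounding.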
Step 2: Positive Measurable Real Functions
Definition 12.6 (Lebesgue Integral) Let f : X → [0, +∞] be measurable, and let (sn) be an
increasing sequence of non-negative simple functions sn converging to f(x) for all x ∈ X,
lim_{n→∞} sn(x) = sup_{n∈N} sn(x) = f(x). Define
∫_X f dµ = lim_{n→∞} ∫_X sn dµ = sup_{n∈N} ∫_X sn dµ   (12.5)
and call this number in [0, +∞] the Lebesgue integral of f over X with respect to the measure µ, or the µ-integral of f over X.
The definition of the Lebesgue integral does not depend on the special choice of the increasing
functions sn ↗ f. One can equivalently define
∫_X f dµ = sup { ∫_X s dµ | s ≤ f, and s is a simple function }.
Observe that we apparently have two definitions for ∫_X f dµ if f is a simple function. However,
these assign the same value to the integral, since f itself is the largest simple function s with
s ≤ f.
Proposition 12.13 The properties (1) to (4) from Lemma 12.11 hold for any non-negative measurable functions f, g : X → [0, +∞], c ∈ R+ .
(Without proof.)
Step 3: Measurable Real Functions
Let f : X → R be measurable and f+(x) = max(f(x), 0), f−(x) = max(−f(x), 0). Then f+
and f− are both non-negative and measurable. Define
∫_X f dµ = ∫_X f+ dµ − ∫_X f− dµ
if at least one of the integrals on the right is finite. We say that f is µ-integrable if both are
finite.
Step 4: Measurable Complex Functions
Definition 12.7 (Lebesgue Integral, continued) A complex, measurable function
f : X → C is called µ-integrable if
∫_X |f| dµ < ∞.
If f = u + iv is µ-integrable, where u = Re f and v = Im f are the real and imaginary parts
of f, then u and v are real measurable functions on X. Define the µ-integral of f over X by
∫_X f dµ = ∫_X u+ dµ − ∫_X u− dµ + i ∫_X v+ dµ − i ∫_X v− dµ.   (12.6)
These four functions u+, u−, v+, and v− are measurable, real, and non-negative. Since we have
u+ ≤ |u| ≤ |f| etc., each of these four integrals is finite. Thus, (12.6) defines the integral on
the left as a complex number.
We define ℒ^1(X, µ) to be the collection of all complex µ-integrable functions f on X.
Note that for an integrable function f, ∫_X f dµ is a finite number.
Proposition 12.14 Let f, g : X → C be measurable.
(a) f is µ-integrable if and only if |f| is µ-integrable, and we have
|∫_X f dµ| ≤ ∫_X |f| dµ.
(b) f is µ-integrable if and only if there exists an integrable function h with |f| ≤ h.
(c) If f, g are integrable, so is c1 f + c2 g, and
∫_X (c1 f + c2 g) dµ = c1 ∫_X f dµ + c2 ∫_X g dµ.
(d) If f ≤ g on X, then ∫_X f dµ ≤ ∫_X g dµ.
It follows that the set ℒ^1(X, µ) of µ-integrable complex-valued functions on X is a linear
space. The Lebesgue integral defines a positive linear functional on ℒ^1(X, µ). Note that (b)
implies that any measurable and bounded function f on a space X with µ(X) < ∞ is integrable.
Step 5: ∫_A f dµ
Definition 12.8 Let A ∈ A and let f : X → R or f : X → C be measurable. The function f is called
µ-integrable over A if χA f is µ-integrable over X. In this case we put
∫_A f dµ = ∫_X χA f dµ.
In particular, Lemma 12.11 (1) now reads ∫_A dµ = µ(A).
12.4 Some Theorems on Lebesgue Integrals
12.4.1 The Role Played by Measure Zero Sets
Equivalence Relations
Let X be a set and R ⊂ X × X. For a, b ∈ X we write a ∼ b if (a, b) ∈ R.
Definition 12.9 (a) The subset R ⊂ X ×X is said to be an equivalence relation if R is reflexive,
symmetric and transitive, that is,
(r) ∀ x ∈ X : x ∼ x.
(s) ∀ x, y ∈ X : x ∼ y =⇒ y ∼ x.
(t) ∀ x, y, z ∈ X : x ∼ y ∧ y ∼ z =⇒ x ∼ z.
For a ∈ X the set ā := {x ∈ X | x ∼ a} is called the equivalence class of a. We have ā = b̄ if
and only if a ∼ b.
(b) A partition P of X is a disjoint family P = {Aα | α ∈ I} of subsets Aα ⊂ X such that
Σ_{α∈I} Aα = X.
The set of equivalence classes is sometimes denoted by X/ ∼.
Example 12.3 (a) On Z define a ∼ b if 2 | (a − b); a and b are equivalent if both are odd or both
are even. There are two equivalence classes: 1̄ = −5̄ = 2Z + 1 (the odd numbers) and 0̄ = 100̄ = 2Z
(the even numbers).
(b) Let W ⊂ V be a subspace of the linear space V. For x, y ∈ V define x ∼ y if x − y ∈ W.
This is an equivalence relation: it is reflexive since x − x = 0 ∈ W, it
is symmetric since x − y ∈ W implies y − x = −(x − y) ∈ W, and it is transitive since
x − y, y − z ∈ W imply that their sum (x − y) + (y − z) = x − z ∈ W, such that x ∼ z. One
has 0̄ = W and x̄ = x + W := {x + w | w ∈ W}. The set of equivalence classes with respect to
this equivalence relation is called the factor space or quotient space of V with respect to W and
is denoted by V/W. The factor space becomes a linear space if we define x̄ + ȳ to be the class of
x + y and λx̄ the class of λx, λ ∈ C. Addition is indeed well-defined: if x ∼ x′ and y ∼ y′, say x − x′ = w1
and y − y′ = w2 with w1, w2 ∈ W, then x + y − (x′ + y′) = w1 + w2 ∈ W, such that x + y and x′ + y′ lie in the same class.
(c) Similarly as in (a), for m ∈ N define the equivalence relation a ≡ b (mod m) if m | (a − b).
We say "a is congruent to b modulo m". This defines a partition of the integers into m disjoint
equivalence classes r̄ = {am + r | a ∈ Z}, r = 0, 1, ..., m − 1.
(d) Two triangles in the plane are equivalent if
(1) there exists a translation mapping the first one onto the second one;
(2) there exists a rotation around (0, 0) mapping the first one onto the second one;
(3) there exists a motion (a rotation, a translation, a reflection, or a composition of these) mapping the first one onto the second one.
Then (1) – (3) define different equivalence relations on triangles, or more generally on subsets
of the plane.
(e) Cardinality of sets is an equivalence relation.
Proposition 12.15 (a) Let ∼ be an equivalence relation on X. Then P = {x̄ | x ∈ X} defines
a partition of X, denoted by P∼.
(b) Let P be a partition of X. Then x ∼ y if there exists A ∈ P with x, y ∈ A defines an
equivalence relation ∼P on X.
(c) ∼_{P∼} = ∼ and P_{∼P} = P.
Let P be a property which a point x may or may not have. For instance, P may be the property
“f (x) > 0” if f is a given function or “(fn (x)) converges” if (fn ) is a given sequence of
functions.
Definition 12.10 If (X, A, µ) is a measure space and A ∈ A, we say “P holds almost everywhere on A”, abbreviated by “P holds a. e. on A”, if there exists N ∈ A such that µ(N) = 0
and P holds for every point x ∈ A \ N.
This concept of course strongly depends on the measure µ, and sometimes we write a. e. to
emphasize the dependence on µ.
(a) Main example. On the set of measurable functions f : X → C we define an equivalence relation by
f ∼ g if f = g a. e. on X.
This is indeed an equivalence relation: reflexivity and symmetry are clear since f(x) = f(x) for all x ∈ X, and f(x) = g(x) implies
g(x) = f(x). For transitivity, let f = g a. e. on X and g = h a. e. on X; that is, there exist M, N ∈ A with
µ(M) = µ(N) = 0 and f(x) = g(x) for all x ∈ X \ M, g(x) = h(x) for all x ∈ X \ N. Hence,
f(x) = h(x) for all x ∈ X \ (M ∪ N). Since 0 ≤ µ(M ∪ N) ≤ µ(M) + µ(N) = 0 + 0 = 0 by
Proposition 12.3 (e), µ(M ∪ N) = 0 and finally f = h a. e. on X.
(b) Note that f = g a. e. on X implies
∫_X f dµ = ∫_X g dµ.
Indeed, let N denote the zero set where f ≠ g. Then
|∫_X f dµ − ∫_X g dµ| ≤ ∫_X |f − g| dµ = ∫_{X\N} |f − g| dµ + ∫_N |f − g| dµ ≤ µ(X \ N)·0 + µ(N)·(+∞) = 0.
Here we used that for disjoint sets A, B ∈ A,
∫_{A∪B} f dµ = ∫_X χ_{A∪B} f dµ = ∫_X χA f dµ + ∫_X χB f dµ = ∫_A f dµ + ∫_B f dµ.
Proposition 12.16 Let f : X → [0, +∞] be measurable. Then
∫_X f dµ = 0 if and only if f = 0 a. e. on X.
Proof. By the above argument in (b), f = 0 a. e. implies ∫_X f dµ = ∫_X 0 dµ = 0, which proves
one direction. The other direction is homework 40.4.
12.4.2 The space L^p(X, µ)
For any measurable function f : X → C and any real p, 1 ≤ p < ∞, define
‖f‖p = (∫_X |f|^p dµ)^{1/p}.   (12.7)
This number may be finite or ∞. In the first case, |f|^p is integrable and we write f ∈ ℒ^p(X, µ).
Proposition 12.17 Let p, q > 1 be given such that 1/p + 1/q = 1.
(a) Let f, g : X → C be measurable functions such that f ∈ ℒ^p and g ∈ ℒ^q.
Then fg ∈ ℒ^1 and
∫_X |fg| dµ ≤ ‖f‖p ‖g‖q   (Hölder inequality).   (12.8)
(b) Let f, g ∈ ℒ^p. Then f + g ∈ ℒ^p and
‖f + g‖p ≤ ‖f‖p + ‖g‖p   (Minkowski inequality).   (12.9)
Idea of proof. Hölder's inequality follows from Young's inequality (Proposition 1.31), as in the classical case
of Hölder's inequality in Rn, see Proposition 1.32.
Minkowski's inequality follows from Hölder's inequality as in Proposition 1.34.
Note that Minkowski's inequality implies that f, g ∈ ℒ^p yields ‖f + g‖p < ∞, such that f + g ∈ ℒ^p. In
particular, ℒ^p is a linear space.
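Both inequalities can be sanity-checked for the counting measure on a finite set, where ∫_X |f|^p dµ = Σ_i |x_i|^p. The sketch below is an illustration of ours, not part of the notes; the vectors are random and the exponents 3 and 3/2 are conjugate:

```python
import random

def norm_p(xs, p):
    # ||f||_p for the counting measure on a finite set: (sum |x_i|^p)^(1/p)
    return sum(abs(x) ** p for x in xs) ** (1.0 / p)

random.seed(0)
f = [random.uniform(-1, 1) for _ in range(50)]
g = [random.uniform(-1, 1) for _ in range(50)]
p, q = 3.0, 1.5          # conjugate exponents: 1/3 + 2/3 = 1

# Hölder (12.8): the integral of |fg| is bounded by ||f||_p * ||g||_q
assert sum(abs(a * b) for a, b in zip(f, g)) <= norm_p(f, p) * norm_p(g, q) + 1e-12
# Minkowski (12.9): the triangle inequality for ||.||_p
assert norm_p([a + b for a, b in zip(f, g)], p) <= norm_p(f, p) + norm_p(g, p) + 1e-12
```

The small tolerance only absorbs floating-point rounding; the inequalities themselves hold exactly.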
Let us check the properties of ‖·‖p. For all measurable f, g we have
‖f‖p ≥ 0,  ‖λf‖p = |λ| ‖f‖p,  ‖f + g‖p ≤ ‖f‖p + ‖g‖p.
All properties of a norm (see Definition 6.9 at page 179) are satisfied except for the definiteness:
‖f‖p = 0 implies ∫_X |f|^p dµ = 0, which by Proposition 12.16 implies |f|^p = 0 a. e., i.e. f = 0
a. e. However, it does not imply f = 0. To overcome this problem, we use the equivalence
relation f = g a. e. and consider from now on only equivalence classes of functions in ℒ^p; that
is, we identify functions f and g which are equal a. e.
The space N = {f : X → C | f is measurable and f = 0 a. e.} is a linear subspace of
ℒ^p(X, µ) for all p, and f = g a. e. if and only if f − g ∈ N. Hence the factor space ℒ^p/N
(see Example 12.3 (b)) is again a linear space.
Definition 12.11 Let (X, A, µ) be a measure space. L^p(X, µ) denotes the set of equivalence
classes of functions of ℒ^p(X, µ) with respect to the equivalence relation f = g a. e.; that is,
L^p(X, µ) = ℒ^p(X, µ)/N
is the quotient space. (L^p(X, µ), ‖·‖p) is a normed space. With this norm, L^p(X, µ) is complete.
Example 12.4 (a) We have χQ = 0 in L^p(R, λ) since χQ = 0 a. e. on R with respect to the
Lebesgue measure (note that Q is a set of measure zero).
(b) In the case of the sequence spaces ℒ^p(N) with respect to the counting measure, ℒ^p = L^p since
f = 0 a. e. implies f = 0.
(c) f(x) = 1/x^α, α > 0, is in L^2(0, 1) if and only if 2α < 1. We identify functions and their
equivalence classes.
12.4.3 The Monotone Convergence Theorem
The following theorem about monotone convergence, due to Beppo Levi (1875–1961), is one of
the most important in the theory of integration. The theorem holds for an arbitrary increasing
sequence of measurable functions with, possibly, ∫_X fn dµ = +∞.
Theorem 12.18 (Monotone Convergence Theorem, Lebesgue) Let (fn) be a sequence of
measurable functions on X and suppose that
(1) 0 ≤ f1(x) ≤ f2(x) ≤ ··· ≤ +∞ for all x ∈ X,
(2) fn(x) −→ f(x) as n → ∞, for every x ∈ X.
Then f is measurable, and
lim_{n→∞} ∫_X fn dµ = ∫_X f dµ = ∫_X lim_{n→∞} fn dµ.
(Without proof.) Note that the measurability of f is a consequence of Remark 12.5 (b).
Corollary 12.19 (Beppo Levi) Let fn : X → [0, +∞] be measurable for all n ∈ N, and
f(x) = Σ_{n=1}^∞ fn(x) for x ∈ X. Then
∫_X Σ_{n=1}^∞ fn dµ = Σ_{n=1}^∞ ∫_X fn dµ.
Example 12.5 (a) Let X = N, A = P(N) the σ-algebra of all subsets, and µ the counting
measure on N. The functions on N can be identified with the sequences (xn), f(n) = xn.
Trivially, any function is A-measurable.
What is ∫_N f dµ? First, let f ≥ 0. For the simple function gn = xn χ{n} we obtain
∫_N gn dµ = xn µ({n}) = xn. Note that f = Σ_{n=1}^∞ gn and gn ≥ 0 since xn ≥ 0. By Corollary 12.19,
∫_N f dµ = Σ_{n=1}^∞ ∫_N gn dµ = Σ_{n=1}^∞ xn.
Now let f be an arbitrary integrable function, i. e. ∫_N |f| dµ < ∞; thus Σ_{n=1}^∞ |xn| < ∞. Therefore,
(xn) ∈ ℒ^1(N, µ) if and only if Σ xn converges absolutely. The space of absolutely convergent
series is denoted by ℓ^1 or ℓ^1(N).
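A small numerical illustration of (a), not part of the notes: with the made-up choice xn = 2^{-n} the µ-integral is the geometric series, while the harmonic sequence (1/n) gives a non-example for ℓ^1:

```python
import math

# x_n = 1/2^n: the mu-integral over N with respect to the counting measure
# equals the sum of the series (Example 12.5 (a))
xs = [2.0 ** (-n) for n in range(1, 40)]      # truncation of (x_n), n >= 1
integral = sum(xs)
assert math.isclose(integral, 1.0, rel_tol=1e-9)

# (1/n) is not in l^1: the partial sums of the harmonic series grow without bound
partial = sum(1.0 / n for n in range(1, 10**5 + 1))
assert partial > 10
```

The truncation at n = 39 leaves an error of 2^{-39}, far below the asserted tolerance.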
(b) Let amn ≥ 0 for all m, n ∈ N. Then
Σ_{n=1}^∞ Σ_{m=1}^∞ amn = Σ_{m=1}^∞ Σ_{n=1}^∞ amn.
Proof. Consider the measure space (N, P(N), µ) from (a). For n ∈ N define functions fn(m) =
amn, and put f(m) = Σ_{n=1}^∞ fn(m). By Corollary 12.19 and (a) we then have
Σ_{n=1}^∞ Σ_{m=1}^∞ amn = Σ_{n=1}^∞ ∫_N fn dµ = ∫_N Σ_{n=1}^∞ fn(m) dµ = ∫_N f dµ = Σ_{m=1}^∞ f(m) = Σ_{m=1}^∞ Σ_{n=1}^∞ amn.
Proposition 12.20 Let f : X → [0, +∞] be measurable. Then
ϕ(A) = ∫_A f dµ, A ∈ A,
defines a measure ϕ on A.
Proof. Since f ≥ 0, ϕ(A) ≥ 0 for all A ∈ A. Let (An) be a countable disjoint family of
measurable sets An ∈ A and let A = Σ_{n=1}^∞ An. By homework 40.1, χA = Σ_{n=1}^∞ χAn, and
therefore, using B. Levi's theorem,
ϕ(A) = ∫_A f dµ = ∫_X χA f dµ = ∫_X Σ_{n=1}^∞ χAn f dµ = Σ_{n=1}^∞ ∫_X χAn f dµ = Σ_{n=1}^∞ ∫_{An} f dµ = Σ_{n=1}^∞ ϕ(An).
12.4.4 The Dominated Convergence Theorem
Besides the monotone convergence theorem, the present theorem is the most important one. It is
due to Henri Lebesgue. The great advantage, compared with Theorem 6.6, is that µ(X) = ∞ is
allowed; that is, non-compact domains X are included. We only need the pointwise convergence
of (fn), not the uniform convergence. The main assumption here is the existence of an integrable
upper bound for all fn.
Theorem 12.21 (Dominated Convergence Theorem of Lebesgue) Let fn : X → R or
fn : X → C, and g : X → R, be measurable functions such that
(1) fn(x) → f(x) as n → ∞ a. e. on X,
(2) |fn(x)| ≤ g(x) a. e. on X,
(3) ∫_X g dµ < +∞.
Then f is measurable and integrable, ∫_X |f| dµ < ∞, and
lim_{n→∞} ∫_X fn dµ = ∫_X f dµ = ∫_X lim_{n→∞} fn dµ,
lim_{n→∞} ∫_X |fn − f| dµ = 0.   (12.10)
Note that (12.10) shows that (fn) converges to f in the normed space L^1(X, µ).
Example 12.6 (a) Let An ∈ A, n ∈ N, A1 ⊂ A2 ⊂ ··· be an increasing sequence with
⋃_{n∈N} An = A. If f ∈ ℒ^1(A, µ), then f ∈ ℒ^1(An) for all n and
lim_{n→∞} ∫_{An} f dµ = ∫_A f dµ.   (12.11)
Indeed, the sequence (χAn f) converges pointwise to χA f, since χA(x) = 1 iff x ∈ A
iff x ∈ An for all n ≥ n0 iff lim_{n→∞} χAn(x) = 1. Moreover, |χAn f| ≤ |χA f|, which is
integrable. By Lebesgue's theorem,
lim_{n→∞} ∫_{An} f dµ = lim_{n→∞} ∫_X χAn f dµ = ∫_X χA f dµ = ∫_A f dµ.
However, if we do not assume f ∈ ℒ^1(A, µ), the statement is not true (see Remark 12.7 below).
Exhaustion theorem. Let (An) be an increasing sequence of measurable sets and A =
⋃_{n=1}^∞ An. Suppose that f is measurable and (∫_{An} f dµ) is a bounded sequence. Then f ∈
ℒ^1(A, µ) and (12.11) holds.
(b) Let fn(x) = (−1)^n x^n on [0, 1]. The sequence is dominated by the integrable function 1, that is,
|fn(x)| ≤ 1 for all x ∈ [0, 1]. Hence, lim_{n→∞} ∫_{[0,1]} fn dλ = 0 = ∫_{[0,1]} lim_{n→∞} fn dλ.
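For (b) the integrals can be computed exactly, since ∫_{[0,1]} x^n dλ = 1/(n + 1). A quick check (the helper name I is our own):

```python
# I(n) = integral of f_n(x) = (-1)^n x^n over [0, 1], computed exactly
def I(n):
    return (-1) ** n / (n + 1)

assert all(abs(I(n)) <= 0.5 for n in range(1, 200))   # dominated: |I(n)| <= 1/2 for n >= 1
assert abs(I(199)) < 0.01                             # I(n) -> 0 = integral of the a.e. limit
```

Note that (fn(x)) converges to 0 only for x in [0, 1), i.e. a.e. on [0, 1], which is exactly the kind of hypothesis the theorem permits.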
12.4.5 Application of Lebesgue's Theorem to Parametric Integrals
As a direct application of the dominated convergence theorem we now treat parameter-dependent integrals; see Propositions 7.22 and 7.23.
Proposition 12.22 (Continuity) Let U ⊂ Rn be an open connected set, t0 ∈ U, and
f : Rm × U → R be a function. Assume that
(a) for a.e. x ∈ Rm, the function t ↦ f(x, t) is continuous at t0,
(b) there exists an integrable function F : Rm → R such that for every t ∈ U,
|f(x, t)| ≤ F(x) a.e. on Rm.
Then the function
g(t) = ∫_{Rm} f(x, t) dx
is continuous at t0.
Proof. First we note that for any fixed t ∈ U, the function ft (x) = f (x, t) is integrable on
Rm since it is dominated by the integrable function F . We have to show that for any sequence
tj → t0 , tj ∈ U, g(tj ) tends to g(t0 ) as j → ∞. We set fj (x) = f (x, tj ) and f0 (x) = f (x, t0 )
for all j ∈ N. By (a) we have
    f0 (x) = lim_{j→∞} fj (x),   a. e. x ∈ Rm .
By (a) and (b), the assumptions of the dominated convergence theorem are satisfied and thus
    lim_{j→∞} g(tj ) = lim_{j→∞} ∫_{Rm} fj (x) dx = ∫_{Rm} lim_{j→∞} fj (x) dx = ∫_{Rm} f0 (x) dx = g(t0 ).
Proposition 12.23 (Differentiation under the Integral Sign) Let I ⊂ R be an open interval
and f : Rm × I → R be a function such that
(a) for every t ∈ I the function x 7→ f (x, t) is integrable,
(b) for almost all x ∈ Rm , the function t 7→ f (x, t) is finite and continuously
differentiable,
(c) there exists an integrable function F : Rm → R such that for every t ∈ I
    | ∂f/∂t (x, t) | ≤ F (x),   a.e. x ∈ Rm .
Then the function g(t) = ∫_{Rm} f (x, t) dx is differentiable on I with
    g′(t) = ∫_{Rm} ∂f/∂t (x, t) dx.
The proof uses the previous theorem about the continuity of the parametric integral. A detailed
proof is to be found in [Kön90, p. 283].
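Differentiation under the integral sign can be tested numerically. In the sketch below (plain Python; the concrete choice f(x, t) = e^{−x²} cos(tx), whose partial derivative ∂f/∂t is dominated by the integrable |x| e^{−x²}, and a midpoint rule truncated to [−8, 8] are our assumptions, not part of the notes), a central difference of g is compared with the integral of ∂f/∂t:

```python
import math

def integral(f, a=-8.0, b=8.0, m=40_000):
    # midpoint rule; e^{-x^2} decays so fast that truncating R to [-8, 8] is harmless
    h = (b - a) / m
    return h * sum(f(a + (i + 0.5) * h) for i in range(m))

f  = lambda x, t: math.exp(-x * x) * math.cos(t * x)       # the integrand
df = lambda x, t: -x * math.exp(-x * x) * math.sin(t * x)  # df/dt, dominated by |x|e^{-x^2}

g = lambda t: integral(lambda x: f(x, t))
t0, eps = 0.7, 1e-4
lhs = (g(t0 + eps) - g(t0 - eps)) / (2 * eps)   # g'(t0) via central difference
rhs = integral(lambda x: df(x, t0))             # differentiating under the integral
assert abs(lhs - rhs) < 1e-5
# sanity check against the closed form g(t) = sqrt(pi) * e^{-t^2/4}
assert abs(g(t0) - math.sqrt(math.pi) * math.exp(-t0 * t0 / 4)) < 1e-9
```

The closed form √π e^{−t²/4} confirms that both sides compute the same derivative.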
Example 12.7 (a) Let f ∈ L1 (R). Then the Fourier transform fˆ: R → C,
    fˆ(t) = (1/√(2π)) ∫_R e^{−itx} f (x) dx,
is continuous on R, see homework 41.3.
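Example (a) can be illustrated numerically. The sketch below (our assumptions: plain Python, a midpoint rule, and the concrete choice f = χ[−1,1], for which fˆ(t) = √(2/π) sin(t)/t is known in closed form; the helper `fourier` is ours) checks the computed transform against the closed form near and away from t = 0:

```python
import cmath, math

def fourier(f, t, a=-1.0, b=1.0, m=20_000):
    """Midpoint approximation of (1/sqrt(2*pi)) * integral of e^{-itx} f(x)
    over [a, b] (the support of our f)."""
    h = (b - a) / m
    s = sum(cmath.exp(-1j * t * (a + (i + 0.5) * h)) * f(a + (i + 0.5) * h)
            for i in range(m))
    return s * h / math.sqrt(2 * math.pi)

f = lambda x: 1.0      # f = indicator of [-1, 1]; fhat(t) = sqrt(2/pi) * sin(t)/t
for t in (1e-6, 0.5, 2.0):
    exact = math.sqrt(2 / math.pi) * math.sin(t) / t
    assert abs(fourier(f, t) - exact) < 1e-6   # continuous through t = 0
```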
(b) Let K ⊂ R3 be a compact subset and ρ : K → R integrable; the Newton potential (with mass density ρ) is given by
    u(t) = ∫_K ρ(x)/kx − tk dx,   t ∉ K.
Then u(t) is a harmonic function on R3 \ K.
Similarly, if K ⊂ R2 is compact and ρ ∈ L1 (K), the Newton potential is given by
    u(t) = ∫_K ρ(x) log kx − tk dx,   t ∉ K.
Then u(t) is a harmonic function on R2 \ K.
12.4 Some Theorems on Lebesgue Integrals
12.4.6 The Riemann and the Lebesgue Integrals
Proposition 12.24 Let f be a bounded function on the finite interval [a, b].
(a) f is Riemann integrable on [a, b] if and only if f is continuous a. e. on [a, b].
(b) If f is Riemann integrable on [a, b], then f is Lebesgue integrable, too. Both integrals
coincide.
Let I ⊂ R be an interval such that f is Riemann integrable on all compact subintervals of I.
(c) f is Lebesgue integrable on I if and only if | f | is improperly Riemann integrable on I (see
Section 5.4); both integrals coincide.
Remarks 12.7 (a) The characteristic function χQ on [0, 1] is Lebesgue but not Riemann integrable; χQ is nowhere continuous on [0, 1].
(b) The (improper) Riemann integral
    ∫_1^∞ (sin x)/x dx
converges (see Example 5.11); however, the Lebesgue integral does not exist since the integral does not converge absolutely. Indeed, for integers n ≥ 1 we have with some c > 0
    ∫_{nπ}^{(n+1)π} | sin x |/x dx ≥ (1/((n + 1)π)) ∫_{nπ}^{(n+1)π} | sin x | dx = c/((n + 1)π);
hence
    ∫_π^{(n+1)π} | sin x |/x dx ≥ (c/π) Σ_{k=1}^n 1/(k + 1).
Since the harmonic series diverges, so does the integral ∫_1^∞ | sin x |/x dx.
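The strip estimate can be reproduced numerically. The sketch below (plain Python, midpoint rule; helper names are ours) checks each strip against 2/((n + 1)π) (here c = 2, since ∫ | sin x | dx over one period strip equals 2) and compares the partial integrals with the harmonic lower bound:

```python
import math

def integral(f, a, b, m=2_000):
    h = (b - a) / m
    return h * sum(f(a + (i + 0.5) * h) for i in range(m))

f = lambda x: abs(math.sin(x)) / x
# each period strip contributes at least 2/((n+1)*pi), i.e. c = 2 above
for n in range(1, 6):
    strip = integral(f, n * math.pi, (n + 1) * math.pi)
    assert strip >= 2 / ((n + 1) * math.pi) - 1e-9

# hence the integrals over [pi, (N+1)*pi] dominate a harmonic tail and diverge
total    = lambda N: integral(f, math.pi, (N + 1) * math.pi, m=4_000 * N)
harmonic = lambda N: (2 / math.pi) * sum(1 / (k + 1) for k in range(1, N + 1))
assert total(50) >= harmonic(50)
```

The growth is only logarithmic, which is why the divergence is invisible in small-scale plots.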
12.4.7 Appendix: Fubini’s Theorem
Theorem 12.25 Let (X1 , A1 , µ1 ) and (X2 , A2 , µ2 ) be σ-finite measure spaces, let f be an A1 ⊗
A2 -measurable function and X = X1 × X2 .
(a) If f : X → [0, +∞], ϕ(x1 ) = ∫_{X2} f (x1 , x2 ) dµ2 , ψ(x2 ) = ∫_{X1} f (x1 , x2 ) dµ1 , then
    ∫_{X2} ψ(x2 ) dµ2 = ∫_{X1×X2} f d(µ1 ⊗ µ2 ) = ∫_{X1} ϕ(x1 ) dµ1 .
(b) If f ∈ L1 (X, µ1 ⊗ µ2 ) then
    ∫_{X1×X2} f d(µ1 ⊗ µ2 ) = ∫_{X2} ( ∫_{X1} f (x1 , x2 ) dµ1 ) dµ2 .
Here A1 ⊗ A2 denotes the smallest σ-algebra over X, which contains all sets A × B, A ∈ A1
and B ∈ A2 . Define µ(A × B) = µ1 (A)µ2 (B) and extend µ to a measure µ1 ⊗ µ2 on A1 ⊗ A2 .
Remark 12.8 In (a), as in Levi’s theorem, we don’t need any assumption on f to change the
order of integration since f ≥ 0. In (b) f is an arbitrary measurable function on X1 × X2 ,
however, the integral ∫_X | f | dµ needs to be finite.
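Fubini's theorem for a non-negative integrand can be illustrated with iterated midpoint sums. In the sketch below (plain Python; the test function and the helper are ours, not from the notes) the two orders of integration agree:

```python
def integral(f, a, b, m=400):
    h = (b - a) / m
    return h * sum(f(a + (i + 0.5) * h) for i in range(m))

f = lambda x, y: x * y * y + x        # non-negative and integrable on [0,1]^2
# iterated midpoint sums in both orders; Fubini says they agree
I1 = integral(lambda x: integral(lambda y: f(x, y), 0.0, 1.0), 0.0, 1.0)
I2 = integral(lambda y: integral(lambda x: f(x, y), 0.0, 1.0), 0.0, 1.0)
assert abs(I1 - I2) < 1e-6
assert abs(I1 - 2 / 3) < 1e-4         # exact value: 1/6 + 1/2 = 2/3
```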
Chapter 13
Hilbert Space
Functional analysis is a fruitful interplay between linear algebra and analysis. One defines
function spaces with certain properties and certain topologies and considers linear operators
between such spaces. The friendliest examples of such spaces are Hilbert spaces.
This chapter is divided into two parts—one describes the geometry of a Hilbert space, the
second is concerned with linear operators on the Hilbert space.
13.1 The Geometry of the Hilbert Space
13.1.1 Unitary Spaces
Let E be a linear space over K = R or over K = C.
Definition 13.1 An inner product on E is a function h· , ·i : E × E → K with
(a) hλ1 x1 + λ2 x2 , yi = λ1 hx1 , yi + λ2 hx2 , yi (Linearity)
(b) hx , yi = hy , xi* (Hermitian property; * denotes complex conjugation)
(c) hx , xi ≥ 0 for all x ∈ E, and hx , xi = 0 implies x = 0 (Positive definiteness)
A unitary space is a linear space together with an inner product.
Let us list some immediate consequences from these axioms: From (a) and (b) it follows that
(d)
    hy , λ1 x1 + λ2 x2 i = λ1* hy , x1 i + λ2* hy , x2 i .
A form on E × E satisfying (a) and (d) is called a sesquilinear form. (a) implies h0 , yi = 0
for all y ∈ E. The mapping x 7→ hx , yi is a linear mapping into K (a linear functional) for all
y ∈ E.
By (c), we may define kxk, the norm of the vector x ∈ E, to be the square root of hx , xi; thus

    kxk² = hx , xi .        (13.1)
Proposition 13.1 (Cauchy–Schwarz Inequality) Let (E, h· , ·i) be a unitary space. For x, y ∈ E we have
    | hx , yi | ≤ kxk kyk .
Equality holds if and only if x = βy for some β ∈ K.
Proof. Choose α ∈ C, | α | = 1, such that α hy , xi = | hx , yi |. For λ ∈ R we then have (since α* hx , yi = (α hy , xi)* = | hx , yi |)
    hx − αλy , x − αλyi = hx , xi − αλ hy , xi − α*λ hx , yi + λ² hy , yi
                        = hx , xi − 2λ | hx , yi | + λ² hy , yi ≥ 0.
This is a quadratic polynomial aλ2 + bλ + c in λ with real coefficients. Since this polynomial
takes only non-negative values, its discriminant b2 − 4ac must be non-positive:
4 | hx , yi |2 − 4 kxk2 kyk2 ≤ 0.
This implies | hx , yi | ≤ kxk kyk.
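The inequality is easy to test numerically. The sketch below (plain Python; the random complex vectors in C^7 and the helper names are our choices) checks both the inequality and the equality case x = βy:

```python
import math, random

def inner(x, y):   # Hermitian inner product on C^n: sum of x_k * conj(y_k)
    return sum(a * b.conjugate() for a, b in zip(x, y))

def norm(x):
    return math.sqrt(inner(x, x).real)

random.seed(0)
x = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(7)]
y = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(7)]
assert abs(inner(x, y)) <= norm(x) * norm(y) + 1e-12   # |<x,y>| <= ||x|| ||y||

beta = complex(2.0, -3.0)
xb = [beta * yk for yk in y]                           # x = beta*y gives equality
assert abs(abs(inner(xb, y)) - norm(xb) * norm(y)) < 1e-9
```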
Corollary 13.2 k·k defines a norm on E.
Proof. It is clear that kxk ≥ 0. From (c) it follows that kxk = 0 implies x = 0. Further, kλxk = √(hλx , λxi) = √(| λ |² hx , xi) = | λ | kxk. We prove the triangle inequality. Since 2 Re (z) = z + z* we have by Proposition 1.20 and the Cauchy–Schwarz inequality
    kx + yk² = hx + y , x + yi = hx , xi + hx , yi + hy , xi + hy , yi
             = kxk² + kyk² + 2 Re hx , yi
             ≤ kxk² + kyk² + 2 | hx , yi |
             ≤ kxk² + kyk² + 2 kxk kyk = (kxk + kyk)²;
hence kx + yk ≤ kxk + kyk.
By the corollary, any unitary space is a normed space with the norm kxk = √(hx , xi).
Recall that any normed vector space is a metric space with the metric d(x, y) = kx − yk. Hence,
the notions of open and closed sets, neighborhoods, converging sequences, Cauchy sequences,
continuous mappings, and so on make sense in a unitary space. In particular, lim_{n→∞} xn = x
means that the sequence (kxn − xk) of non-negative real numbers tends to 0. Recall from
Definition 6.8 that a metric space is said to be complete if every Cauchy sequence converges.
Definition 13.2 A complete unitary space is called a Hilbert space.
Example 13.1 Let K = C.
(a) E = Cn , x = (x1 , . . . , xn ) ∈ Cn , y = (y1 , . . . , yn ) ∈ Cn . Then
    hx , yi = Σ_{k=1}^n xk yk*
defines an inner product, with the euclidean norm kxk = (Σ_{k=1}^n | xk |²)^{1/2} . (Cn , h· , ·i) is a Hilbert space.
13.1 The Geometry of the Hilbert Space
333
(b) E = L2 (X, µ) is a Hilbert space with the inner product hf , gi = ∫_X f g* dµ.
By Proposition 12.17 with p = q = 2 we obtain the Cauchy–Schwarz inequality
    | ∫_X f g* dµ | ≤ ( ∫_X | f |² dµ )^{1/2} ( ∫_X | g |² dµ )^{1/2} .
Using CSI one can prove Minkowski's inequality, that is, f, g ∈ L2 (X, µ) implies f + g ∈ L2 (X, µ). Also, hf , gi is a finite complex number, since f g* ∈ L1 (X, µ).
Note that the inner product is positive definite since ∫_X | f |² dµ = 0 implies (by Proposition 12.16) | f | = 0 a. e. and therefore f = 0 in L2 (X, µ). The proof of the completeness of L2 (X, µ) is more involved; we skip it.
(c) E = ℓ2 , i. e.
    ℓ2 = {(xn ) | xn ∈ C, n ∈ N, Σ_{n=1}^∞ | xn |² < ∞}.
Note that Cauchy–Schwarz's inequality in Rk (Corollary 1.26) implies
    | Σ_{n=1}^k xn yn* |² ≤ Σ_{n=1}^k | xn |² Σ_{n=1}^k | yn |² ≤ Σ_{n=1}^∞ | xn |² Σ_{n=1}^∞ | yn |² .
Taking the supremum over all k ∈ N on the left, we have
    | Σ_{n=1}^∞ xn yn* |² ≤ Σ_{n=1}^∞ | xn |² Σ_{n=1}^∞ | yn |² ;
hence
    h(xn ) , (yn )i = Σ_{n=1}^∞ xn yn*
is an absolutely converging series such that the inner product is well-defined on ℓ2 .
Lemma 13.3 Let E be a unitary space. For any fixed y ∈ E the mappings f, g : E → C given
by
f (x) = hx , yi , and g(x) = hy , xi
are continuous functions on E.
Proof. First proof. Let (xn ) be a sequence in E , converging to x ∈ E, that is,
limn→∞ kxn − xk = 0. Then
    | hxn , yi − hx , yi | = | hxn − x , yi | ≤ kxn − xk kyk −→ 0
by the CSI as n → ∞. This proves continuity of f . The same proof works for g.
Second proof. The Cauchy–Schwarz inequality implies that for x1 , x2 ∈ E
| hx1 , yi − hx2 , yi | = | hx1 − x2 , yi | ≤ kx1 − x2 k kyk ,
which proves that the map x 7→ hx , yi is in fact uniformly continuous (Given ε > 0 choose
δ = ε/ kyk. Then kx1 − x2 k < δ implies | hx1 , yi − hx2 , yi | < ε). The same is true for
x 7→ hy , xi.
Definition 13.3 Let H be a unitary space. We call x and y orthogonal to each other, and write
x ⊥ y, if hx , yi = 0. Two subsets M, N ⊂ H are called orthogonal to each other if x ⊥ y for
all x ∈ M and y ∈ N.
For a subset M ⊂ H define the orthogonal complement M ⊥ of M to be the set
M ⊥ = {x ∈ H | hx , mi = 0,
for all m ∈ M }.
For example, E = Rn with the standard inner product and v = (v1 , . . . , vn ) ∈ Rn , v ≠ 0, yields
    {v}⊥ = {x ∈ Rn | Σ_{k=1}^n xk vk = 0}.
This is a hyperplane in Rn which is orthogonal to v.
Lemma 13.4 Let H be a unitary space and M ⊂ H be an arbitrary subset. Then, M ⊥ is a
closed linear subspace of H.
Proof. (a) Suppose that x, y ∈ M ⊥ . Then for m ∈ M we have
hλ1 x + λ2 y , mi = λ1 hx , mi + λ2 hy , mi = 0;
hence λ1 x + λ2 y ∈ M ⊥ . This shows that M ⊥ is a linear subspace.
(b) We show that any converging sequence (xn ) of elements of M ⊥ has its limit in M ⊥ . Suppose
lim xn = x, xn ∈ M ⊥ , x ∈ H. Then for all m ∈ M, hxn , mi = 0. Since the inner product is
n→∞
continuous in the first argument (see Lemma 13.3) we obtain
0 = lim hxn , mi = hx , mi .
n→∞
This shows x ∈ M ⊥ ; hence M ⊥ is closed.
13.1.2 Norm and Inner product
Problem. Given a normed linear space (E, k·k), does there exist an inner product h· , ·i on E such that kxk = √(hx , xi) for all x ∈ E? In this case we call k·k an inner product norm.
Proposition 13.5 (a) A norm k·k on a linear space E over K = C or K = R is an inner
product norm if and only if the parallelogram law
kx + yk2 + kx − yk2 = 2(kxk2 + kyk2 ),
x, y ∈ E
(13.2)
is satisfied.
(b) If (13.2) is satisfied, the inner product h· , ·i is given by (13.3) in the real case K = R and
by (13.4) in the complex case K = C.
    hx , yi = (1/4) (kx + yk² − kx − yk²),   if K = R.        (13.3)
    hx , yi = (1/4) (kx + yk² − kx − yk² + i kx + iyk² − i kx − iyk²),   if K = C.        (13.4)
These equations are called polarization identities.
Proof. We check the parallelogram and the polarization identity in the real case, K = R.
kx + yk2 + kx − yk2 = hx + y , x + yi + hx − y , x − yi
= hx , xi + hy , xi + hx , yi + hy , yi + (hx , xi − hy , xi − hx , yi + hy , yi)
= 2 kxk2 + 2 kyk2 .
Further,
kx + yk2 − kx − yk2 = (hx , xi + hy , xi + hx , yi + hy , yi)−
− (hx , xi − hy , xi − hx , yi + hy , yi) = 4 hx , yi .
The proof that the parallelogram identity is sufficient for E being a unitary space is in the
appendix to this section.
Example 13.2 We show that L1 ([0, 2]) with kf k1 = ∫_0^2 | f | dx is not an inner product norm. Indeed, let f = χ[1,2] and g = χ[0,1] . Then f + g = | f − g | = 1 and kf k1 = kgk1 = ∫_0^1 dx = 1 such that
    kf + gk1² + kf − gk1² = 2² + 2² = 8 ≠ 4 = 2(kf k1² + kgk1²).
The parallelogram identity is not satisfied for k·k1 such that L1 ([0, 2]) is not an inner product
space.
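Both halves of the discussion can be checked numerically: the euclidean norm satisfies the parallelogram law and its polarization identity (13.3) recovers the inner product, while an ℓ¹ analogue of the example above violates the law. A sketch (plain Python; finite vectors stand in for the functions f = χ[1,2], g = χ[0,1]):

```python
import math, random

def norm2(v): return math.sqrt(sum(t * t for t in v))   # inner product norm
def norm1(v): return sum(abs(t) for t in v)             # l^1 norm: no inner product

add = lambda u, v: [a + b for a, b in zip(u, v)]
sub = lambda u, v: [a - b for a, b in zip(u, v)]

random.seed(1)
x = [random.uniform(-1, 1) for _ in range(5)]
y = [random.uniform(-1, 1) for _ in range(5)]
# parallelogram law holds for ||.||_2 ...
assert abs(norm2(add(x, y)) ** 2 + norm2(sub(x, y)) ** 2
           - 2 * (norm2(x) ** 2 + norm2(y) ** 2)) < 1e-12
# ... and polarization (13.3) recovers the inner product
dot = sum(a * b for a, b in zip(x, y))
pol = (norm2(add(x, y)) ** 2 - norm2(sub(x, y)) ** 2) / 4
assert abs(pol - dot) < 1e-12
# the l^1 analogue of f = chi_[1,2], g = chi_[0,1] violates the law: 8 != 4
f, g = [0.0, 1.0], [1.0, 0.0]
assert norm1(add(f, g)) ** 2 + norm1(sub(f, g)) ** 2 == 8.0
assert 2 * (norm1(f) ** 2 + norm1(g) ** 2) == 4.0
```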
13.1.3 Two Theorems of F. Riesz
(born: January 22, 1880 in Austria-Hungary, died: February 28, 1956, founder of functional
analysis)
Definition 13.4 Let (H1 , h· , ·i1 ) and (H2 , h· , ·i2 ) be Hilbert spaces. Let H = {(x1 , x2 ) | x1 ∈
H1 , x2 ∈ H2 } be the direct sum of the Hilbert spaces H1 and H2 . Then
h(x1 , x2 ) , (y1 , y2)i = hx1 , y1 i1 + hx2 , y2 i2
defines an inner product on H. With this inner product H becomes a Hilbert space. H =
H1 ⊕ H2 is called the (direct) orthogonal sum of H1 and H2 .
Definition 13.5 Two Hilbert spaces H1 and H2 are called isomorphic if there exists a bijective
linear mapping ϕ : H1 → H2 such that
hϕ(x) , ϕ(y)i2 = hx , yi1 ,
x, y ∈ H1 .
ϕ is called an isometric isomorphism or a unitary map.
Back to the orthogonal sum H = H1 ⊕ H2 . Let H̃1 = {(x1 , 0) | x1 ∈ H1 } and H̃2 = {(0, x2 ) |
x2 ∈ H2 }. Then x1 7→ (x1 , 0) and x2 7→ (0, x2 ) are isometric isomorphisms from Hi → H̃i ,
i = 1, 2. We have H̃1 ⊥ H̃2 and H̃i , i = 1, 2 are closed linear subspaces of H.
In this situation we say that H is the inner orthogonal sum of the two closed subspaces H̃1 and
H̃2 .
13 Hilbert Space
336
(a) Riesz’s First Theorem
Problem. Let H1 be a closed linear subspace of H. Does there exist another closed linear
subspace H2 such that H = H1 ⊕ H2 ?
Answer: YES.
Lemma 13.6 (Minimal Distance Lemma) Let C be a convex and closed subset of the Hilbert space H. For x ∈ H let
    ̺(x) = inf{kx − yk | y ∈ C}.
Then there exists a unique element c ∈ C such that
    ̺(x) = kx − ck .
Proof. Existence. Since ̺(x) is an infimum, there exists a sequence (yn ), yn ∈ C, which
approximates the infimum, lim_{n→∞} kx − yn k = ̺(x). We will show that (yn ) is a Cauchy sequence. By the parallelogram law (see Proposition 13.5) we have
    kyn − ym k² = k(yn − x) + (x − ym )k²
                = 2 kyn − xk² + 2 kx − ym k² − k2x − yn − ym k²
                = 2 kyn − xk² + 2 kx − ym k² − 4 kx − (yn + ym )/2k² .
Since C is convex, (yn + ym )/2 ∈ C and therefore kx − (yn + ym )/2k ≥ ̺(x). Hence
    kyn − ym k² ≤ 2 kyn − xk² + 2 kx − ym k² − 4̺(x)² .
By the choice of (yn ), the first two sequences tend to ̺(x)² as m, n → ∞. Thus,
    lim_{m,n→∞} kyn − ym k² = 2(̺(x)² + ̺(x)²) − 4̺(x)² = 0,
hence (yn ) is a Cauchy sequence. Since H is complete, there exists an element c ∈ H such
that limn→∞ yn = c. Since yn ∈ C and C is closed, c ∈ C. By construction, we have
kyn − xk −→ ̺(x). On the other hand, since yn −→ c and the norm is continuous (see
homework 42.1. (b)), we have
kyn − xk −→ kc − xk .
This implies ̺(x) = kc − xk.
Uniqueness. Let c, c′ be two such elements. Then, by the parallelogram law,
    0 ≤ kc − c′k² = k(c − x) + (x − c′)k²
                  = 2 kc − xk² + 2 kx − c′k² − 4 kx − (c + c′)/2k²
                  ≤ 2(̺(x)² + ̺(x)²) − 4̺(x)² = 0.
This implies c = c′ ; the point c ∈ C which realizes the infimum is unique.
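For a concrete closed convex set the minimizer of Lemma 13.6 can be computed and tested. In the sketch below (plain Python; our example is the box C = [0, 1]², for which the nearest point is obtained by clipping each coordinate, a fact specific to this choice of C), random points of C are verified to lie no closer than the claimed minimizer:

```python
import math, random

# C = [0,1]^2 is closed and convex; the nearest point to x is found by
# clipping each coordinate into [0, 1].
clip = lambda x: [min(1.0, max(0.0, t)) for t in x]
dist = lambda u, v: math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

x = [1.9, -0.6]
c = clip(x)                        # the claimed minimizer
rho = dist(x, c)

random.seed(2)
for _ in range(10_000):            # no sampled point of C beats c
    y = [random.random(), random.random()]
    assert dist(x, y) >= rho - 1e-12
assert c == [1.0, 0.0]
```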
Theorem 13.7 (Riesz's first theorem) Let H1 be a closed linear subspace of the Hilbert space H. Then we have
    H = H1 ⊕ H1⊥ ,
that is, any x ∈ H has a unique representation x = x1 + x2 with x1 ∈ H1 and x2 ∈ H1⊥ .
Proof. Existence. Apply Lemma 13.6 to the convex, closed set H1 . There exists a unique
x1 ∈ H1 such that
̺(x) = inf{kx − yk | y ∈ H1 } = kx − x1 k ≤ kx − x1 − ty1 k
for all t ∈ K and y1 ∈ H1 . homework 42.2 (c) now implies x2 = x − x1 ⊥ y1 for all y1 ∈ H1 .
Hence x2 ∈ H1⊥ . Therefore, x = x1 + x2 , and the existence of such a representation is shown.
Uniqueness. Suppose that x = x1 + x2 = x′1 + x′2 are two possibilities to write x as a sum of
elements of x1 , x′1 ∈ H1 and x2 , x′2 ∈ H1⊥ . Then
x1 − x′1 = x′2 − x2 = u
belongs to both H1 and H1⊥ (both are linear subspaces). Hence hu , ui = 0 which implies u = 0. That is, x1 = x′1 and x2 = x′2 .
Let x = x1 + x2 be as above. Then the mappings P1 (x) = x1 and P2 (x) = x2 are well-defined
on H. They are called orthogonal projections of H onto H1 and H2 , respectively. We will
consider projections in more detail later.
Example 13.3 Let H be a Hilbert space, z ∈ H, z 6= 0, H1 = K z the one-dimensional linear
subspace spanned by one single vector z. Since any finite dimensional subspace is closed,
Riesz’s first theorem applies. We want to compute the projections of x ∈ H with respect to H1
and H1⊥ . Let x1 = αz; we have to determine α such that hx − x1 , zi = 0, that is
hx − αz , zi = hx , zi − hαz , zi = hx , zi − α hz , zi = 0.
Hence,
    α = hx , zi / hz , zi = hx , zi / kzk² .
The Riesz representation with respect to H1 = Kz and H1⊥ is
    x = (hx , zi / kzk²) z + ( x − (hx , zi / kzk²) z ).
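The decomposition can be verified numerically. The sketch below (plain Python, random vectors in R^6; helper names are ours) computes α = hx , zi/kzk² and checks that x − αz is orthogonal to z:

```python
import math, random

inner = lambda x, y: sum(a * b for a, b in zip(x, y))

random.seed(3)
x = [random.uniform(-1, 1) for _ in range(6)]
z = [random.uniform(-1, 1) for _ in range(6)]

alpha = inner(x, z) / inner(z, z)            # <x,z>/||z||^2
x1 = [alpha * t for t in z]                  # projection onto H1 = span{z}
x2 = [a - b for a, b in zip(x, x1)]          # remainder, lies in H1-perp
assert abs(inner(x2, z)) < 1e-12             # x - x1 is orthogonal to z
assert all(abs(a - (b + c)) < 1e-12 for a, b, c in zip(x, x1, x2))
```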
(b) Riesz’s Representation Theorem
Recall from Section 11 that a linear functional on the vector space E is a mapping F : E → K
such that F (λ1 x1 + λ2 x2 ) = λ1 F (x1 ) + λ2 F (x2 ) for all x1 , x2 ∈ E and λ1 , λ2 ∈ K.
Let (E, k·k) be a normed linear space over K. Recall that a linear functional F : E → K is
called continuous if xn −→ x in E implies F (xn ) −→ F (x).
The set of all continuous linear functionals F on E form a linear space E ′ with the same linear
operations as in E ∗ .
Now let (H, h· , ·i) be a Hilbert space. By Lemma 13.3, Fy : H → K, Fy (x) = hx , yi defines
a continuous linear functional on H. Riesz’s representation theorem states that any continuous
linear functional on H is of this form.
Theorem 13.8 (Riesz’s Representation Theorem) Let F be a continuous linear functional on
the Hilbert space H.
Then there exists a unique element y ∈ H such that F (x) = Fy (x) = hx , yi for all x ∈ H.
Proof. Existence. Let H1 = ker F be the null-space of the linear functional F . H1 is a linear
subspace (since F is linear). H1 is closed since H1 = F −1 ({0}) is the preimage of the closed
set {0} under the continuous map F . By Riesz’s first theorem, H = H1 ⊕ H1⊥ .
Case 1. H1⊥ = {0}. Then H = H1 and F (x) = 0 for all x. We can choose y = 0; F (x) =
hx , 0i.
Case 2. H1⊥ ≠ {0}. Suppose u ∈ H1⊥ , u ≠ 0. Then F (u) ≠ 0 (otherwise, u ∈ H1⊥ ∩ H1 such that hu , ui = 0 which implies u = 0). We have
    F ( x − (F (x)/F (u)) u ) = F (x) − (F (x)/F (u)) F (u) = 0.
Hence x − (F (x)/F (u)) u ∈ H1 . Since u ∈ H1⊥ we have
    0 = h x − (F (x)/F (u)) u , u i = hx , ui − (F (x)/F (u)) hu , ui ;
hence
    F (x) = (F (u)/hu , ui) hx , ui = h x , (F (u)*/kuk²) u i = Fy (x),
where y = (F (u)*/kuk²) u.
Uniqueness. Suppose that both y1 , y2 ∈ H give the same functional F , i. e. F (x) = hx , y1 i = hx , y2 i for all x. This implies
    hy1 − y2 , xi = 0,   x ∈ H.
In particular, choose x = y1 − y2 . This gives ky1 − y2 k2 = 0; hence y1 = y2 .
(c) Example
Any continuous linear functional on L2 (X, µ) is of the form F (f ) = ∫_X f g* dµ with some g ∈ L2 (X, µ). Any continuous linear functional on ℓ2 is given by
    F ((xn )) = Σ_{n=1}^∞ xn yn* ,   with (yn ) ∈ ℓ2 .
13.1.4 Orthogonal Sets and Fourier Expansion
Motivation. Let E = Rn be the euclidean space with the standard inner product and standard basis {e1 , . . . , en }. Then we have with xk = hx , ek i
    x = Σ_{k=1}^n xk ek ,   kxk² = Σ_{k=1}^n | xk |² ,   hx , yi = Σ_{k=1}^n xk yk .
We want to generalize these formulas to arbitrary Hilbert spaces.
(a) Orthonormal Sets
Let (H, h· , ·i) be a Hilbert space.
Definition 13.6 Let {xi | i ∈ I} be a family of elements of H.
{xi } is called an orthogonal set or OS if hxi , xj i = 0 for i 6= j.
{xi } is called an orthonormal set or NOS if hxi , xj i = δij for all i, j ∈ I.
Example 13.4 (a) H = ℓ2 , en = (0, 0, . . . , 0, 1, 0, . . . ) with the 1 at the nth component. Then {en | n ∈ N} is an OS in H.
(b) H = L2 ((0, 2π)) with the Lebesgue measure, hf , gi = ∫_0^{2π} f g* dλ. Then
    {1, sin(nx), cos(nx) | n ∈ N}
is an OS in H. Normalizing yields
    { 1/√(2π), sin(nx)/√π, cos(nx)/√π | n ∈ N }   and   { e^{inx}/√(2π) | n ∈ Z }
to be orthonormal sets of H.
Lemma 13.9 (The Pythagorean Theorem) Let {x1 , . . . , xk } be an OS in H; then
    kx1 + · · · + xk k² = kx1 k² + · · · + kxk k² .
The easy proof is left to the reader.

Lemma 13.10 Let {xn } be an OS in H. Then Σ_{k=1}^∞ xk converges if and only if Σ_{k=1}^∞ kxk k² converges.
The proof is in the appendix.
(b) Fourier Expansion and Completeness
Throughout this paragraph let {xn | n ∈ N} be an NOS in the Hilbert space H.
Definition 13.7 The numbers hx , xn i, n ∈ N, are called Fourier coefficients of x ∈ H with
respect to the NOS {xn }.
Example 13.5 Consider the NOS { 1/√(2π), sin(nx)/√π, cos(nx)/√π | n ∈ N } from the previous example on H = L2 ((0, 2π)). Let f ∈ H. Then
    hf , sin(nx)/√πi = (1/√π) ∫_0^{2π} f (t) sin(nt) dt,
    hf , cos(nx)/√πi = (1/√π) ∫_0^{2π} f (t) cos(nt) dt,
    hf , 1/√(2π)i = (1/√(2π)) ∫_0^{2π} f (t) dt.
These are the usual Fourier coefficients, up to a factor. Note that we have a different normalization from that in Definition 6.3, since the inner product there carries the factor 1/(2π).
Proposition 13.11 (Bessel's Inequality) For x ∈ H we have
    Σ_{k=1}^∞ | hx , xk i |² ≤ kxk² .        (13.5)

Proof. Let n ∈ N be a positive integer and yn = x − Σ_{k=1}^n hx , xk i xk . Then
    hyn , xm i = hx , xm i − Σ_{k=1}^n hx , xk i hxk , xm i = hx , xm i − Σ_{k=1}^n hx , xk i δkm = 0
for m = 1, . . . , n. Hence, {yn , hx , x1 i x1 , . . . , hx , xn i xn } is an OS. By Lemma 13.9
    kxk² = kyn + Σ_{k=1}^n hx , xk i xk k² = kyn k² + Σ_{k=1}^n | hx , xk i |² kxk k² ≥ Σ_{k=1}^n | hx , xk i |² ,
since kxk k² = 1 for all k. Taking the supremum over all n on the right, the assertion follows.
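Bessel's inequality is easy to observe for a small, deliberately incomplete NOS. A sketch (plain Python; the choice of {e1, e3} in R^4 is ours):

```python
import random

inner = lambda x, y: sum(a * b for a, b in zip(x, y))

# a deliberately incomplete NOS in R^4: only two of four basis directions
e1 = [1.0, 0.0, 0.0, 0.0]
e2 = [0.0, 0.0, 1.0, 0.0]

random.seed(4)
x = [random.uniform(-1, 1) for _ in range(4)]
bessel = inner(x, e1) ** 2 + inner(x, e2) ** 2
assert bessel <= inner(x, x) + 1e-12   # (13.5); equality would need a CNOS
```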
Corollary 13.12 For any x ∈ H the series Σ_{k=1}^∞ hx , xk i xk converges in H.

Proof. Since {hx , xk i xk } is an OS, by Lemma 13.10 the series converges if and only if the series Σ_{k=1}^∞ khx , xk i xk k² = Σ_{k=1}^∞ | hx , xk i |² converges. By Bessel's inequality, this series converges.

We call Σ_{k=1}^∞ hx , xk i xk the Fourier series of x with respect to the NOS {xk }.
Remarks 13.1 (a) In general, the Fourier series of x does not converge to x.
(b) The NOS { 1/√(2π), sin(nx)/√π, cos(nx)/√π } gives the ordinary Fourier series of a function f which is integrable over (0, 2π).
Theorem 13.13 Let {xk | k ∈ N} be an NOS in H. The following are equivalent:
(a) x = Σ_{k=1}^∞ hx , xk i xk for all x ∈ H, i.e. the Fourier series of x converges to x.
(b) hz , xk i = 0 for all k ∈ N implies z = 0, i. e. the NOS is maximal.
(c) For every x ∈ H we have kxk² = Σ_{k=1}^∞ | hx , xk i |² .
(d) If x ∈ H and y ∈ H, then hx , yi = Σ_{k=1}^∞ hx , xk i hxk , yi .
Formula (c) is called Parseval’s identity.
Definition 13.8 An orthonormal set {xi | i ∈ N} which satisfies the above (equivalent) properties is called a complete orthonormal system, CNOS for short.
Proof. (a) → (d): Since the inner product is continuous in both components we have
    hx , yi = h Σ_{k=1}^∞ hx , xk i xk , Σ_{n=1}^∞ hy , xn i xn i
            = Σ_{k,n=1}^∞ hx , xk i hy , xn i* hxk , xn i = Σ_{k=1}^∞ hx , xk i hxk , yi ,
since hxk , xn i = δkn .
(d) → (c): Put y = x.
(c) → (b): Suppose hz , xk i = 0 for all k. By (c) we then have
    kzk² = Σ_{k=1}^∞ | hz , xk i |² = 0;
hence z = 0.
(b) → (a): Fix x ∈ H and put y = Σ_{k=1}^∞ hx , xk i xk , which converges according to Corollary 13.12. With z = x − y we have for all positive integers n ∈ N
    hz , xn i = hx − y , xn i = h x − Σ_{k=1}^∞ hx , xk i xk , xn i
              = hx , xn i − Σ_{k=1}^∞ hx , xk i hxk , xn i = hx , xn i − hx , xn i = 0.
This shows z = 0 and therefore x = y, i. e. the Fourier series of x converges to x.
Example 13.6 (a) H = ℓ2 , {en | n ∈ N} is an NOS. We show that this NOS is complete.
For, let x = (xn ) be orthogonal to every en , n ∈ N; that is, 0 = hx , en i = xn . Hence,
x = (0, 0, . . . ) = 0. By (b), {en } is a CNOS. What does the Fourier series of x look like? The Fourier coefficients of x are hx , en i = xn such that
    x = Σ_{n=1}^∞ xn en
is the Fourier series of x. The NOS {en | n ≥ 2} is not complete.
(b) H = L2 ((0, 2π)). The systems
    { 1/√(2π), sin(nx)/√π, cos(nx)/√π | n ∈ N }   and   { e^{inx}/√(2π) | n ∈ Z }
are both CNOSs in H. This was stated in Theorem 6.14.
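Completeness of the trigonometric system can be probed through Parseval's identity (Theorem 13.13 (c)). The sketch below (plain Python, midpoint rule; the concrete choice f(x) = x, whose Fourier coefficients have simple closed forms, is our assumption) sums the squared coefficients and compares with kf k²:

```python
import math

def integral(f, a=0.0, b=2 * math.pi, m=40_000):
    h = (b - a) / m
    return h * sum(f(a + (i + 0.5) * h) for i in range(m))

f = lambda x: x                                  # f in L2((0, 2*pi))
norm_sq = integral(lambda x: f(x) ** 2)          # ||f||^2 = 8*pi^3/3
assert abs(norm_sq - 8 * math.pi ** 3 / 3) < 1e-6

# closed-form coefficients for f(x) = x w.r.t. the CNOS:
#   <f, 1/sqrt(2 pi)> = 2 pi^2/sqrt(2 pi), <f, sin(nx)/sqrt(pi)> = -2 pi/(n sqrt(pi)),
#   <f, cos(nx)/sqrt(pi)> = 0; we verify two of them numerically
c0 = integral(lambda x: f(x) / math.sqrt(2 * math.pi))
assert abs(c0 - 2 * math.pi ** 2 / math.sqrt(2 * math.pi)) < 1e-6
c1 = integral(lambda x: f(x) * math.sin(x) / math.sqrt(math.pi))
assert abs(c1 - (-2 * math.pi / math.sqrt(math.pi))) < 1e-4

# Parseval: ||f||^2 = c0^2 + sum_n 4*pi/n^2  (truncation error below 1e-3)
parseval = c0 ** 2 + sum(4 * math.pi / n ** 2 for n in range(1, 200_000))
assert abs(parseval - norm_sq) < 1e-3
```

Since Σ 1/n² = π²/6, the identity c0² + 4π·π²/6 = 2π³ + 2π³/3 = 8π³/3 holds exactly.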
(c) Existence of CNOS in a Separable Hilbert Space
Definition 13.9 A metric space E is called separable if there exists a countable dense subset
of E.
Example 13.7 (a) Rn is separable. M = {(r1 , . . . , rn ) | r1 , . . . , rn ∈ Q} is a countable dense
set in Rn .
(b) Cn is separable. M = {(r1 + is1 , . . . , rn + isn ) | r1 , . . . , rn , s1 , . . . , sn ∈ Q} is a countable
dense subset of Cn .
(c) L2 ([a, b]) is separable. The polynomials {1, x, x2 , . . . } are linearly independent in L2 ([a, b]) and they can be orthonormalized via Schmidt's process. As a result we get a countable CNOS in L2 ([a, b]) (Legendre polynomials in case −a = 1 = b). However, L2 (R) contains no polynomial; in this case the Hermite functions, which are of the form pn (x) e^{−x²/2} with polynomials pn , form a countable CNOS.
More generally, L2 (G, λn ) is separable for any region G ⊂ Rn with respect to the Lebesgue measure.
(d) Any Hilbert space is isomorphic to some L2 (X, µ) where µ is the counting measure on X;
X = N gives ℓ2 . X uncountable gives a non-separable Hilbert space.
Proposition 13.14 (Schmidt’s Orthogonalization Process) Let {yk } be an at most countable
linearly independent subset of the Hilbert space H. Then there exists an NOS {xk } such that
for every n
lin {y1 , . . . , yn } = lin {x1 , . . . , xn }.
The NOS can be computed recursively:
    x1 := y1 / ky1 k ,
    xn+1 := ( yn+1 − Σ_{k=1}^n hyn+1 , xk i xk ) / k yn+1 − Σ_{k=1}^n hyn+1 , xk i xk k .
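Schmidt's process translates directly into code. A sketch (plain Python; the three input vectors of R^3 are our example):

```python
import math

inner = lambda x, y: sum(a * b for a, b in zip(x, y))
norm  = lambda x: math.sqrt(inner(x, x))

def gram_schmidt(ys):
    """Schmidt's process: linearly independent ys -> NOS xs with
    lin{y1..yn} = lin{x1..xn} for every n."""
    xs = []
    for y in ys:
        v = list(y)
        for x in xs:                       # subtract the projections <y, x> x
            c = inner(y, x)
            v = [vi - c * xi for vi, xi in zip(v, x)]
        n = norm(v)                        # nonzero by linear independence
        xs.append([vi / n for vi in v])
    return xs

xs = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
for i in range(3):
    for j in range(3):
        want = 1.0 if i == j else 0.0
        assert abs(inner(xs[i], xs[j]) - want) < 1e-12   # <x_i, x_j> = delta_ij
```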
Corollary 13.15 Let {ek | k ∈ N } be an NOS where N = {1, . . . , n} for some n ∈ N or N = N. Suppose that H1 = lin {ek | k ∈ N } is the linear span of the NOS. Then
    x1 = Σ_{k∈N} hx , ek i ek
is the orthogonal projection of x ∈ H onto H1 .
Proposition 13.16 (a) A Hilbert space H has an at most countable complete orthonormal system (CNOS) if and only if H is separable.
(b) Let H be a separable Hilbert space. Then H is either isomorphic to Kn for some n ∈ N or
to ℓ2 .
13.1.5 Appendix
(a) The Inner Product constructed from an Inner Product Norm
Proof of Proposition 13.5. We consider only the case K = R. Assume that the parallelogram
identity is satisfied. We will show that
    hx , yi = (1/4) (kx + yk² − kx − yk²)
defines a bilinear form on E.
(a) We show additivity. First note that the parallelogram identity implies
    kx1 + x2 + yk² = (1/2) kx1 + x2 + yk² + (1/2) kx1 + x2 + yk²
    = (1/2) (2 kx1 + yk² + 2 kx2 k² − kx1 − x2 + yk²) + (1/2) (2 kx2 + yk² + 2 kx1 k² − kx2 − x1 + yk²)
    = kx1 + yk² + kx2 + yk² + kx1 k² + kx2 k² − (1/2) (kx1 − x2 + yk² + kx2 − x1 + yk²).
Replacing y by −y, we have
    kx1 + x2 − yk² = kx1 − yk² + kx2 − yk² + kx1 k² + kx2 k² − (1/2) (kx1 − x2 − yk² + kx2 − x1 − yk²).
By definition and the above two formulas (note that kx1 − x2 ± yk = kx2 − x1 ∓ yk, so the subtracted terms cancel),
    hx1 + x2 , yi = (1/4) (kx1 + x2 + yk² − kx1 + x2 − yk²)
    = (1/4) (kx1 + yk² − kx1 − yk² + kx2 + yk² − kx2 − yk²)
    = hx1 , yi + hx2 , yi ,
that is, h· , ·i is additive in the first variable. It is obviously symmetric and hence additive in the
second variable, too.
(b) We show hλx , yi = λ hx , yi for all λ ∈ R, x, y ∈ E. By (a), h2x , yi = 2 hx , yi. By induction on n, hnx , yi = n hx , yi for all n ∈ N. Now let λ = m/n, m, n ∈ N. Then
    n hλx , yi = n h(m/n) x , yi = hm x , yi = m hx , yi   =⇒   hλx , yi = (m/n) hx , yi = λ hx , yi .
Hence, hλx , yi = λ hx , yi holds for all positive rational numbers λ. Further,
    0 = hx + (−x) , yi = hx , yi + h−x , yi
implies h−x , yi = − hx , yi and, moreover, h−λx , yi = −λ hx , yi such that the equation holds for all λ ∈ Q. Suppose that λ ∈ R is given. Then there exists a sequence (λn ), λn ∈ Q, of rational numbers with λn → λ. This implies λn x → λx for all x ∈ E and, since k·k is continuous,
    hλx , yi = lim_{n→∞} hλn x , yi = lim_{n→∞} λn hx , yi = λ hx , yi .
This completes the proof.
(b) Convergence of Orthogonal Series
We reformulate Lemma 13.10:
    Let {xn } be an OS in H. Then Σ_{k=1}^∞ xk converges if and only if Σ_{k=1}^∞ kxk k² converges.
Note that the convergence of a series Σ_{i=1}^∞ xi of elements xi of a Hilbert space H is defined to be the limit of the partial sums, lim_{n→∞} Σ_{i=1}^n xi . In particular, the Cauchy criterion applies since H is complete:
    The series Σ yi converges if and only if for every ε > 0 there exists n0 ∈ N such that m, n ≥ n0 imply k Σ_{i=m}^n yi k < ε.
Proof. By the above discussion, Σ_{k=1}^∞ xk converges if and only if k Σ_{k=m}^n xk k² becomes small for sufficiently large m, n ∈ N. By the Pythagorean theorem this term equals
    Σ_{k=m}^n kxk k² ;
hence the series Σ xk converges if and only if the series Σ kxk k² converges.
13.2 Bounded Linear Operators in Hilbert Spaces
13.2.1 Bounded Linear Operators
Let (E1 , k·k1 ) and (E2 , k·k2 ) be normed linear spaces. Recall that a linear map T : E1 → E2 is
called continuous if xn −→ x in E1 implies T (xn ) −→ T (x) in E2 .
Definition 13.10 (a) A linear map T : E1 → E2 is called bounded if there exists a positive real number C > 0 such that
    kT (x)k2 ≤ C kxk1   for all x ∈ E1 .        (13.6)
(b) Suppose that T : E1 → E2 is a bounded linear map. Then the operator norm is the smallest number C satisfying (13.6) for all x ∈ E1 , that is,
    kT k = inf {C > 0 | ∀ x ∈ E1 : kT (x)k2 ≤ C kxk1 } .
One can show that
    (a) kT k = sup { kT (x)k2 / kxk1 | x ∈ E1 , x ≠ 0 },
    (b) kT k = sup { kT (x)k2 | kxk1 ≤ 1 },
    (c) kT k = sup { kT (x)k2 | kxk1 = 1 }.
Indeed, we may restrict ourselves to unit vectors since
    kT (αx)k2 / kαxk1 = | α | kT (x)k2 / (| α | kxk1 ) = kT (x)k2 / kxk1 .
This shows the equivalence of (a) and (c). Since kT (αx)k2 = | α | kT (x)k2 , the suprema (b) and (c) are equal. That these suprema coincide with the infimum in the definition follows from the fact that the least upper bound is the infimum over all upper bounds. From (a) it follows that
    kT (x)k2 ≤ kT k kxk1 .        (13.7)
Also, if E1 −S→ E2 −T→ E3 are bounded linear mappings, then T ◦S is a bounded linear mapping with
    kT ◦Sk ≤ kT k kSk .
Indeed, for x ≠ 0 one has by (13.7)
    kT (S(x))k3 ≤ kT k kS(x)k2 ≤ kT k kSk kxk1 .
Hence, k(T ◦S)(x)k3 / kxk1 ≤ kT k kSk.
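The characterization kT k = sup{kT (x)k : kxk = 1} suggests a crude numerical estimate: sample many unit directions and take the largest quotient kT (x)k/kxk. In the sketch below (plain Python; the diagonal 2×2 matrix with entries 2 and 1, whose operator norm is 2, is our example, and the sampling estimator is ours), every sampled quotient stays below 2 while the maximum approaches it:

```python
import math, random

matvec = lambda A, x: [sum(a * t for a, t in zip(row, x)) for row in A]
norm   = lambda x: math.sqrt(sum(t * t for t in x))

def opnorm_estimate(A, samples=20_000):
    """Estimate ||A|| = sup{||Ax||/||x|| : x != 0} by random sampling."""
    random.seed(5)
    best = 0.0
    for _ in range(samples):
        x = [random.gauss(0.0, 1.0) for _ in range(len(A[0]))]
        best = max(best, norm(matvec(A, x)) / norm(x))
    return best

A = [[2.0, 0.0], [0.0, 1.0]]      # operator norm is 2 (largest singular value)
est = opnorm_estimate(A)
assert est <= 2.0 + 1e-12         # every quotient is a lower bound for the sup
assert est > 1.99                 # and the sup is approached from below
```

Sampling only ever produces lower bounds, mirroring the fact that the supremum need not be attained by any particular trial vector.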
Proposition 13.17 For a linear map T : E1 → E2 of a normed space E1 into a normed space
E2 the following are equivalent:
(a) T is bounded.
(b) T is continuous.
(c) T is continuous at one point of E1 .
Proof. (a) → (b). This follows from the fact
kT (x1 ) − T (x2 )k = kT (x1 − x2 )k ≤ kT k kx1 − x2 k ,
and T is even uniformly continuous on E1 . (b) trivially implies (c).
(c) → (a). Suppose T is continuous at x0 . To each ε > 0 one can find δ > 0 such that
kx − x0 k < δ implies kT (x) − T (x0 )k < ε. Let y = x − x0 . In other words kyk < δ implies
kT (y + x0 ) − T (x0 )k = kT (y)k < ε.
Suppose z ∈ E1 , kzk ≤ 1. Then k(δ/2)zk ≤ δ/2 < δ; hence kT ((δ/2)z)k < ε. By linearity of T ,
kT (z)k < 2ε/δ. This shows kT k ≤ 2ε/δ.
Definition 13.11 Let E and F be normed linear spaces. Let L (E, F ) denote the set of all
bounded linear maps from E to F . In case E = F we simply write L (E) in place of L (E, F ).
Proposition 13.18 Let E and F be normed linear spaces. Then L (E, F ) is a normed linear
space if we define the linear structure by
(S + T )(x) = S(x) + T (x),
(λT )(x) = λ T (x)
for S, T ∈ L (E, F ), λ ∈ K. The operator norm kT k makes L (E, F ) a normed linear space.
Note that L (E, F ) is complete if and only if F is complete.
Example 13.8 (a) Recall that L (Kn , Km ) is a normed vector space with kAk ≤ (Σ_{i,j} | aij |²)^{1/2} , where A = (aij ) is the matrix representation of the linear operator A, see Proposition 7.1.
(b) The space E ′ = L (E, K) of continuous linear functionals on E.
(c) H = L2 ((0, 1)), g ∈ C([0, 1]),
Tg (f )(t) = g(t)f (t)
defines a bounded linear operator on H. (see homework)
(d) H = L2 ((0, 1)), k(s, t) ∈ L2 ([0, 1] × [0, 1]). Then
    (Kf )(t) = ∫01 k(s, t)f (s) ds,   f ∈ H = L2 ([0, 1]),
defines a continuous linear operator K ∈ L (H). We have
    | (Kf )(t) |2 = | ∫01 k(s, t)f (s) ds |2 ≤ ( ∫01 | k(s, t) | | f (s) | ds )2
                 ≤ ∫01 | k(s, t) |2 ds · ∫01 | f (s) |2 ds        (Cauchy–Schwarz)
                 = ∫01 | k(s, t) |2 ds · kf k2H .
Hence, integrating over t,
    kK(f )k2H ≤ ∫01 ( ∫01 | k(s, t) |2 ds ) dt · kf k2H ,
    kK(f )kH ≤ kkkL2 ([0,1]×[0,1]) kf kH .
This shows Kf ∈ H and further, kKk ≤ kkkL2 ([0,1]2 ) . K is called an integral operator; K is
compact, i. e. it maps the unit ball into a set whose closure is compact.
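The bound kKk ≤ kkkL2 can be observed numerically by discretizing the integral operator on a grid; the following NumPy sketch uses the kernel k(s, t) = e^{st}, an arbitrary illustrative choice not taken from the text.

```python
import numpy as np

# Discretize (Kf)(t) = integral_0^1 k(s,t) f(s) ds by the midpoint rule.
# The kernel k(s, t) = exp(s*t) is an arbitrary illustrative choice.
n = 200
h = 1.0 / n
s = (np.arange(n) + 0.5) * h            # midpoints of [0, 1]
k = np.exp(np.outer(s, s))              # k[i, j] = k(s_j, t_i) with t = s

K_disc = k * h                          # maps samples of f to samples of Kf

# With the discrete L2 norm ||f||^2 = h * sum |f_j|^2, the operator norm of
# K_disc is its largest singular value (the h-weights cancel because domain
# and range use the same grid); the kernel's L2 norm is the Frobenius norm.
op_norm = np.linalg.svd(K_disc, compute_uv=False)[0]
kernel_l2 = np.sqrt((np.abs(k) ** 2).sum() * h * h)

assert op_norm <= kernel_l2 + 1e-9      # kKk ≤ kkk_{L2}
```

The inequality is generally strict: the L2 norm of the kernel is the Hilbert–Schmidt norm of K, which dominates the operator norm.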
(e) H = L2 (R), a ∈ R,
(Va f )(t) = f (t − a), t ∈ R,
defines a bounded linear operator called the shift operator. Indeed,
    kVa f k22 = ∫R | f (t − a) |2 dt = ∫R | f (t) |2 dt = kf k22 ;
13.2 Bounded Linear Operators in Hilbert Spaces
since all quotients kVa (x)k / kxk = 1, kVa k = 1 .
(f) H = ℓ2 . We define the right-shift S by
S(x1 , x2 , . . . ) = (0, x1 , x2 , . . . ).
Obviously, kS(x)k = kxk = ( Σ∞n=1 | xn |2 )1/2 . Hence, kSk = 1.
(g) Let E1 = C 1 ([0, 1]) and E2 = C([0, 1]). Define the differentiation operator (T f )(t) = f ′ (t).
Let kf k1 = kf k2 = supt∈[0,1] | f (t) |. Then T is linear but not bounded. Indeed, let fn (t) = tn .
Then kfn k1 = 1 and (T fn )(t) = ntn−1 , such that kT fn k2 = n. Thus, kT fn k2 / kfn k1 = n → +∞
as n → ∞; T is unbounded.
However, if we put kf k1 = supt∈[0,1] | f (t) | + supt∈[0,1] | f ′ (t) | and kf k2 as before, then T is
bounded since
    kT f k2 = supt∈[0,1] | f ′ (t) | ≤ kf k1 =⇒ kT k ≤ 1.
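The blow-up of the quotients kT fn k / kfn k can be seen numerically; the grid size and the exponents below are arbitrary choices for illustration.

```python
import numpy as np

# Sketch of Example 13.8 (g): f_n(t) = t^n has sup-norm 1 on [0, 1] while
# its derivative n t^(n-1) has sup-norm n, so kT f_n k / kf_n k = n -> oo.
t = np.linspace(0.0, 1.0, 10001)
for n in (5, 50, 500):
    fn = t ** n                         # kf_n k_sup = 1, attained at t = 1
    dfn = n * t ** (n - 1)              # (T f_n)(t) = n t^(n-1)
    assert np.isclose(dfn.max() / fn.max(), n)
```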
13.2.2 The Adjoint Operator
In this subsection H is a Hilbert space and L (H) the space of bounded linear operators on H.
Let T ∈ L (H) be a bounded linear operator and y ∈ H. Then F (x) = hT (x) , yi defines a
continuous linear functional on H. Indeed,
    | F (x) | = | hT (x) , yi | ≤ kT (x)k kyk ≤ kT k kyk kxk = C kxk ,
with C = kT k kyk (the first estimate is the Cauchy–Schwarz inequality). Hence, F is bounded
and therefore continuous. In particular,
kF k ≤ kT k kyk
By Riesz’s representation theorem, there exists a unique vector z ∈ H such that
hT (x) , yi = F (x) = hx , zi .
Note that by the above inequality
kzk = kF k ≤ kT k kyk .
(13.8)
Suppose y1 is another element of H which corresponds to z1 ∈ H with
hT (x) , y1 i = hx , z1 i .
Finally, let u ∈ H be the element which corresponds to y + y1 ,
hT (x) , y + y1 i = hx , ui .
Since the element u which is given by Riesz’s representation theorem is unique, we have u =
z + z1 . Similarly,
hT (x) , λyi = F (x) = hx , λzi
shows that λz corresponds to λy.
Definition 13.12 The above correspondence y 7→ z is linear. Define the linear operator T ∗ by
z = T ∗ (y). By definition,
    hT (x) , yi = hx , T ∗ (y)i ,   x, y ∈ H.     (13.9)
T ∗ is called the adjoint operator to T .
Proposition 13.19 Let T, T1 , T2 ∈ L (H). Then T ∗ is a bounded linear operator with kT ∗ k =
kT k. We have
(a) (T1 + T2 )∗ = T1∗ + T2∗ and
(b) (λ T )∗ = λ T ∗ .
(c) (T1 T2 )∗ = T2∗ T1∗ .
(d) If T is invertible in L (H), so is T ∗ , and we have (T ∗ )−1 = (T −1 )∗ .
(e) (T ∗ )∗ = T .
Proof. Inequality (13.8) shows that
    kT ∗ (y)k ≤ kT k kyk ,   y ∈ H.
By definition, this implies kT ∗ k ≤ kT k, and T ∗ is bounded. Since
    hT ∗ (x) , yi = hy , T ∗ (x)i = hT (y) , xi = hx , T (y)i
(the two middle inner products carry complex conjugation), we get (T ∗ )∗ = T . We conclude
kT k = kT ∗∗ k ≤ kT ∗ k, such that kT ∗ k = kT k.
(a). For x, y ∈ H we have
    h(T1 + T2 )(x) , yi = hT1 (x) + T2 (x) , yi = hT1 (x) , yi + hT2 (x) , yi
                       = hx , T1∗ (y)i + hx , T2∗ (y)i = hx , (T1∗ + T2∗ )(y)i ,
which proves (a).
(c) and (d) are left to the reader.
A mapping ∗ : A → A such that the above properties (a), (b), and (c) are satisfied is called an
involution. An algebra with involution is called a ∗-algebra.
We have seen that L (H) is a (non-commutative) ∗-algebra. An example of a commutative
∗-algebra is C(K) with the involution f ∗ (x) = f (x).
Example 13.9 (Example 13.8 continued)
(a) H = Cn , A = (aij ) ∈ M(n × n, C). Then A∗ = (bij ) has the matrix elements bij = āji ;
that is, A∗ is the conjugate transpose of A.
(b) H = L2 ([0, 1]), Tg∗ = Tḡ (multiplication by the complex conjugate function ḡ).
(c) H = L2 (R), Va (f )(t) = f (t − a) (Shift operator), Va∗ = V−a .
(d) H = ℓ2 . The right-shift S is defined by S((xn )) = (0, x1 , x2 , . . . ). We compute the adjoint
S ∗ . For x, y ∈ ℓ2 ,
    hS(x) , yi = Σ∞n=2 xn−1 ȳn = Σ∞n=1 xn ȳn+1 = h(x1 , x2 , . . . ) , (y2 , y3 , . . . )i .
Hence, S ∗ ((yn )) = (y2 , y3 , . . . ) is the left-shift.
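Both matrix adjoints (Example 13.9 (a)) and the shift adjoint can be checked numerically; the sketch below uses NumPy with the convention that the inner product is linear in the first argument, and the sizes and random seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
x = rng.normal(size=n) + 1j * rng.normal(size=n)
y = rng.normal(size=n) + 1j * rng.normal(size=n)

def inner(u, v):
    # <u, v> = sum_k u_k * conj(v_k)   (linear in the first argument)
    return np.vdot(v, u)

# the adjoint of A is its conjugate transpose: b_ij = conj(a_ji)
A_star = A.conj().T
assert np.allclose(inner(A @ x, y), inner(x, A_star @ y))

# truncated right shift: S x = (0, x_1, ..., x_{n-1}); its adjoint (here the
# real transpose) acts as the left shift (y_2, ..., y_n, 0)
S = np.diag(np.ones(n - 1), k=-1)
assert np.allclose(S.T @ y, np.append(y[1:], 0.0))
```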
13.2.3 Classes of Bounded Linear Operators
Let H be a complex Hilbert space.
(a) Self-Adjoint and Normal Operators
Definition 13.13 An operator A ∈ L (H) is called
(a) self-adjoint, if A∗ = A,
(b) normal, if A∗ A = A A∗ .
A self-adjoint operator A is called positive, if hAx , xi ≥ 0 for all x ∈ H. We write A ≥ 0. If
A and B are self-adjoint, we write A ≥ B if A − B ≥ 0.
A crucial role in proving the simplest properties is played by the polarization identity, which
generalizes the polarization identity from Subsection 13.1.2. However, it exists only in complex
Hilbert spaces:
4 hA(x) , yi = hA(x + y) , x + yi − hA(x − y) , x − yi +
+ i hA(x + iy) , x + iyi − i hA(x − iy) , x − iyi .
We use the identity as follows: hA(x) , xi = 0 for all x ∈ H implies A = 0.
Indeed, by the polarization identity, hA(x) , yi = 0 for all x, y ∈ H. In particular y = A(x)
yields A(x) = 0 for all x; thus, A = 0.
Remarks 13.2 (a) A is normal if and only if kA(x)k = kA∗ (x)k for all x ∈ H. Indeed, if A
is normal, then for all x ∈ H we have hA∗ A(x) , xi = hA A∗ (x) , xi, which implies kA(x)k2 =
hA(x) , A(x)i = hA∗ (x) , A∗ (x)i = kA∗ (x)k2 . On the other hand, kA(x)k = kA∗ (x)k for all x
together with the polarization identity implies h(A∗ A − A A∗ )(x) , xi = 0 for all x; hence
A∗ A − A A∗ = 0, which proves the claim.
(b) Sums and real scalar multiples of self-adjoint operators are self-adjoint.
(c) The product AB of self-adjoint operators is self-adjoint if and only if A and B commute
with each other, AB = BA.
(d) A is self-adjoint if and only if hAx , xi is real for all x ∈ H.
Proof. Let A∗ = A. Then hAx , xi = hx , Axi, which is the complex conjugate of hAx , xi;
hence hAx , xi is real. For the opposite direction, hA(x) , xi real for all x means hA(x) , xi =
hx , A(x)i, and the polarization identity yields hA(x) , yi = hx , A(y)i for all x, y; hence A∗ = A.
(b) Unitary and Isometric Operators
Definition 13.14 Let T ∈ L (H). Then T is called
(a) unitary, if T ∗ T = I = T T ∗ .
(b) isometric, if
kT (x)k = kxk for all x ∈ H.
Proposition 13.20 (a) T is isometric if and only if T ∗ T = I and if and only if hT (x) , T (y)i =
hx , yi for all x, y ∈ H.
(b) T is unitary, if and only if T is isometric and surjective.
(c) If S, T are unitary, so are ST and T −1 . The unitary operators of L (H) form a group.
Proof. (a) T isometric yields hT (x) , T (x)i = hx , xi and further h(T ∗ T − I)(x) , xi = 0 for
all x. The polarization identity implies T ∗ T = I. This implies h(T ∗ T − I)(x) , yi = 0 for
all x, y ∈ H. Hence, hT (x) , T (y)i = hx , yi. Inserting y = x shows T is isometric.
(b) Suppose T is unitary. T ∗ T = I shows T is isometric. Since T T ∗ = I, T is surjective.
Suppose now, T is isometric and surjective. Since T is isometric, T (x) = 0 implies x = 0;
hence, T is bijective with an inverse operator T −1 . Insert y = T −1 (z) into hT (x) , T (y)i =
hx , yi. This gives
    hT (x) , zi = hx , T −1 (z)i ,   x, z ∈ H.
Hence T −1 = T ∗ and therefore T ∗ T = T T ∗ = I.
(c) is easy (see homework 45.4).
Note that an isometric operator is injective with norm 1 (since kT (x)k / kxk = 1 for all x). In
case H = Cn , the unitary operators on Cn form the unitary group U(n). In case H = Rn , the
unitary operators on H form the orthogonal group O(n).
Example 13.10 (a) H = L2 (R). The shift operator Va is unitary since Va∗ = V−a and Va Vb =
Va+b , so Va Va∗ = Va∗ Va = V0 = I. The multiplication operator Tg f = gf is unitary if and only
if | g | = 1. Tg is self-adjoint (resp. positive) if and only if g is real (resp. g ≥ 0).
(b) H = ℓ2 , the right-shift S((xn )) = (0, x1 , x2 , . . . ) is isometric but not unitary since S is not
surjective. S ∗ is not isometric since S ∗ (1, 0, . . . ) = 0; hence S ∗ is not injective.
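On finitely supported sequences the behaviour of the right and left shifts can be checked directly; a minimal sketch (the concrete vectors are arbitrary):

```python
import numpy as np

# Example 13.10 (b): the right shift S(x) = (0, x_1, x_2, ...) preserves the
# l2 norm (isometric), while its adjoint, the left shift, deletes the first
# entry and is therefore neither isometric nor injective.
x = np.array([3.0, -1.0, 2.0])

def right_shift(x):
    return np.concatenate(([0.0], x))   # same l2 norm as x

def left_shift(y):
    return y[1:]                        # S*: drops y_1

assert np.isclose(np.linalg.norm(right_shift(x)), np.linalg.norm(x))
e1 = np.array([1.0, 0.0])
assert np.linalg.norm(left_shift(e1)) == 0.0   # S*(e_1) = 0, so S* not injective
```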
(c) Fourier transform. For f ∈ L1 (R) define
    (Ff )(t) = (1/√2π) ∫R e−itx f (x) dx.
Let S(R) = {f ∈ C∞ (R) | supt∈R | tn f (k) (t) | < ∞ for all n, k ∈ Z+ }. S(R) is called the
Schwartz space after Laurent Schwartz. We have S(R) ⊆ L1 (R) ∩ L2 (R); for example, f (x) =
e−x² belongs to S(R). We will show later that F : S(R) → S(R) is bijective and norm preserving,
kF(f )kL2 (R) = kf kL2 (R) , f ∈ S(R). F has a unique extension to a unitary operator on L2 (R).
The inverse Fourier transform is
    (F −1 f )(t) = (1/√2π) ∫R eitx f (x) dx,   f ∈ S(R).
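A finite-dimensional analogue of this unitarity is the discrete Fourier transform: with NumPy's "ortho" normalization the DFT preserves the ℓ2 norm, mirroring kF(f )kL2 = kf kL2 (this is an illustration, not the continuous operator F itself).

```python
import numpy as np

# Parseval for the unitary (ortho-normalized) DFT: norms are preserved and
# the inverse transform recovers the input exactly.
rng = np.random.default_rng(1)
f = rng.normal(size=128) + 1j * rng.normal(size=128)

F = np.fft.fft(f, norm="ortho")
assert np.isclose(np.linalg.norm(F), np.linalg.norm(f))   # kF f k = kf k
assert np.allclose(np.fft.ifft(F, norm="ortho"), f)       # F^{-1} F = id
```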
13.2.4 Orthogonal Projections
(a) Riesz’s First Theorem—revisited
Let H1 ⊆ H be a closed linear subspace. By Theorem 13.7 any x ∈ H has a unique decomposition
x = x1 + x2 with x1 ∈ H1 and x2 ∈ H1⊥ . The map PH1 (x) = x1 is a linear operator from H
to H, (see homework 44.1). PH1 is called the orthogonal projection from H onto the closed
subspace H1 . Obviously, H1 is the image of PH1 ; in particular, PH1 is surjective if and only if
H1 = H. In this case, PH = I is the identity. Since
kPH1 (x)k2 = kx1 k2 ≤ kx1 k2 + kx2 k2 = kxk2
we have kPH1 k ≤ 1. If H1 6= {0}, there exists a non-zero x1 ∈ H1 such that kPH1 (x1 )k = kx1 k.
This shows kPH1 k = 1.
Here is the algebraic characterization of orthogonal projections.
Proposition 13.21 A linear operator P ∈ L (H) is an orthogonal projection if and only if
P 2 = P and P ∗ = P .
In this case H1 = {x ∈ H | P (x) = x}.
Proof. “→”. Suppose that P = PH1 is the projection onto H1 . Since P is the identity on H1 ,
P 2 (x) = P (x1 ) = x1 = P (x) for all x ∈ H; hence P 2 = P .
Let x = x1 + x2 and y = y1 + y2 be the unique decompositions of x and y in elements of H1
and H1⊥ , respectively. Then
hP (x) , yi = hx1 , y1 + y2 i = hx1 , y1 i + hx1 , y2 i = hx1 , y1 i = hx1 + x2 , y1 i = hx , P (y)i ,
| {z }
=0
that is, P ∗ = P .
“←”. Suppose P 2 = P = P ∗ and put H1 = {x | P (x) = x}. First note, that for P 6= 0, H1 6=
{0} is non-trivial. Indeed, since P (P (x)) = P (x), the image of P is contained in the eigenspace
of P to the eigenvalue 1, P (H) ⊂ H1 . Since P (z) = z for z ∈ H1 , H1 ⊂ P (H) and thus
H1 = P (H).
Since P is continuous and {0} is closed, H1 = (P − I)−1 ({0}) is a closed linear subspace of
H. By Riesz’s first theorem, H = H1 ⊕ H1⊥ . We have to show that P (x) = x1 for all x.
Since P 2 = P , P (P (x)) = P (x) for all x; hence P (x) ∈ H1 . We show x − P (x) ∈ H1⊥ , which
completes the proof. Indeed, let z ∈ H1 ; then
hx − P (x) , zi = hx , zi − hP (x) , zi = hx , zi − hx , P (z)i = hx , zi − hx , zi = 0.
Hence x = P (x) + (I − P )(x) is the unique Riesz decomposition of x with respect to H1 and
H1⊥ .
Example 13.11 (a) Let {x1 , . . . , xn } be an NOS in H. Then
    P (x) = Σnk=1 hx , xk i xk ,   x ∈ H,
defines the orthogonal projection P : H → H onto lin {x1 , . . . , xn }. Indeed, since P (xm ) =
Σnk=1 hxm , xk i xk = xm , we have P 2 = P , and since
    hP (x) , yi = Σnk=1 hx , xk i hxk , yi = hx , Σnk=1 hy , xk i xk i = hx , P (y)i ,
we get P ∗ = P ; hence P is an orthogonal projection.
(b) H = L2 ([0, 1] ∪ [2, 3]), g ∈ C([0, 1] ∪ [2, 3]). For f ∈ H define Tg f = gf . Then Tg = (Tg )∗
if and only if g(t) is real for all t. Tg is an orthogonal projection if g(t)2 = g(t) such that
g(t) = 0 or g(t) = 1. Since g is continuous, there are only four solutions: g1 = 0, g2 = 1,
g3 = χ[0,1] , and g4 = χ[2,3] .
In case of g3 , the subspace H1 can be identified with L2 ([0, 1]) since f ∈ H1 iff Tg f = f iff
gf = f iff f (t) = 0 for all t ∈ [2, 3].
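The algebraic characterization of Proposition 13.21 is easy to check numerically: with an orthonormal system as the columns of a matrix Q, the projection of Example 13.11 (a) is P = Q Q^H. The dimensions and the random seed below are arbitrary.

```python
import numpy as np

# P = Q Q^H projects onto the span of the orthonormal columns of Q and
# satisfies P^2 = P and P* = P (Proposition 13.21), with kP k = 1.
rng = np.random.default_rng(2)
M = rng.normal(size=(5, 2)) + 1j * rng.normal(size=(5, 2))
Q, _ = np.linalg.qr(M)                  # orthonormalize the two columns
P = Q @ Q.conj().T

assert np.allclose(P @ P, P)            # idempotent
assert np.allclose(P.conj().T, P)       # self-adjoint
assert np.allclose(P @ Q[:, 0], Q[:, 0])        # identity on H1
x = rng.normal(size=5) + 1j * rng.normal(size=5)
assert np.linalg.norm(P @ x) <= np.linalg.norm(x) + 1e-12   # kP k ≤ 1
```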
(b) Properties of Orthogonal Projections
Throughout this paragraph let P1 and P2 be orthogonal projections on the closed subspaces H1
and H2 , respectively.
Lemma 13.22 The following are equivalent.
(a) P1 + P2 is an orthogonal projection.
(b) P1 P2 = 0.
(c) H1 ⊥ H2 .
Proof. (a) → (b). Let P1 + P2 be a projection. Then
    (P1 + P2 )2 = P12 + P1 P2 + P2 P1 + P22 = P1 + P2 + P1 P2 + P2 P1 = P1 + P2 ,
hence P1 P2 + P2 P1 = 0. Multiplying this from the left by P1 and from the right by P1 yields
P1 P2 + P1 P2 P1 = 0 = P1 P2 P1 + P2 P1 .
This implies P1 P2 = P2 P1 and finally P1 P2 = P2 P1 = 0.
(b) → (c). Let x1 ∈ H1 and x2 ∈ H2 . Then
0 = hP1 P2 (x2 ) , x1 i = hP2 (x2 ) , P1 (x1 )i = hx2 , x1 i .
This shows H1 ⊥ H2 .
(c) → (b). Let x, z ∈ H be arbitrary. Then
    hP1 P2 (x) , zi = hP2 (x) , P1 (z)i = 0,
since P2 (x) ∈ H2 and P1 (z) ∈ H1 with H1 ⊥ H2 . Hence P1 P2 (x) = 0 and therefore P1 P2 = 0.
The same argument works for P2 P1 = 0.
(b) → (a). Since P1 P2 = 0 implies P2 P1 = 0 (via H1 ⊥ H2 ),
(P1 + P2 )∗ = P1∗ + P2∗ = P1 + P2 ,
(P1 + P2 )2 = P12 + P1 P2 + P2 P1 + P22 = P1 + 0 + 0 + P2 .
Lemma 13.23 The following are equivalent:
(a) P1 P2 is an orthogonal projection.
(b) P1 P2 = P2 P1 .
In this case, P1 P2 is the orthogonal projection onto H1 ∩ H2 .
Proof. (b) → (a). (P1 P2 )∗ = P2∗ P1∗ = P2 P1 = P1 P2 , by assumption. Moreover, (P1 P2 )2 =
P1 P2 P1 P2 = P1 P1 P2 P2 = P1 P2 which completes this direction.
(a) → (b). P1 P2 = (P1 P2 )∗ = P2∗ P1∗ = P2 P1 .
Clearly, P1 P2 (H) ⊆ H1 and P2 P1 (H) ⊆ H2 ; hence P1 P2 (H) ⊆ H1 ∩ H2 . On the other hand
x ∈ H1 ∩ H2 implies P1 P2 x = x. This shows P1 P2 (H) = H1 ∩ H2 .
The proof of the following lemma is quite similar to that of the previous two lemmas, so we
omit it (see homework 40.5).
Lemma 13.24 The following are equivalent.
(a) H1 ⊆ H2 ,
(b) P1 P2 = P1 ,
(c) P2 P1 = P1 ,
(d) P1 ≤ P2 ,
(e) P2 − P1 is an orth. projection,
(f) kP1 (x)k ≤ kP2 (x)k ,   x ∈ H.
Proof. We show (d) ⇒ (c). From P1 ≤ P2 we conclude that I − P2 ≤ I − P1 . Note that both
I − P1 and I − P2 are again orthogonal projections, onto H1⊥ and H2⊥ , respectively. Thus for all
x ∈ H:
    k(I − P2 )P1 (x)k2 = h(I − P2 )P1 (x) , (I − P2 )P1 (x)i
                      = h(I − P2 )∗ (I − P2 )P1 (x) , P1 (x)i = h(I − P2 )P1 (x) , P1 (x)i
                      ≤ h(I − P1 )P1 (x) , P1 (x)i = hP1 (x) − P1 (x) , P1 (x)i = h0 , P1 (x)i = 0.
Hence, k(I − P2 )P1 (x)k2 = 0 which implies (I − P2 )P1 = 0 and therefore P1 = P2 P1 .
13.2.5 Spectrum and Resolvent
Let T ∈ L (H) be a bounded linear operator.
(a) Definitions
Definition 13.15 (a) The resolvent set of T , denoted by ρ(T ), is the set of all λ ∈ C such that
there exists a bounded linear operator Rλ (T ) ∈ L (H) with
Rλ (T )(T − λI) = (T − λI)Rλ (T ) = I,
i. e. T − λI has a bounded (continuous) inverse Rλ (T ). We call Rλ (T ) the resolvent of T
at λ.
(b) The set C \ ρ(T ) is called the spectrum of T and is denoted by σ(T ).
(c) λ ∈ C is called an eigenvalue of T if there exists a nonzero vector x, called an eigenvector,
with (T − λI)x = 0. The set of all eigenvalues is called the point spectrum σp (T ).
Remark 13.3 (a) Note that the point spectrum is a subset of the spectrum, σp (T ) ⊆ σ(T ).
Suppose to the contrary, the eigenvalue λ with eigenvector y belongs to the resolvent set. Then
there exists Rλ (T ) ∈ L (H) with
y = Rλ (T )(T − λI)(y) = Rλ (T )(0) = 0
which contradicts the definition of an eigenvector; hence eigenvalues belong to the spectrum.
(b) λ ∈ σp (T ) is equivalent to T − λI not being injective. It may happen that T − λI is not
surjective, which also implies λ ∈ σ(T ) (see Example 13.12 (b) below).
Example 13.12 (a) H = Cn , A ∈ M(n × n, C). Since in finite dimensional spaces T ∈ L (H)
is injective if and only if T is surjective, σ(A) = σp (A).
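This finite-dimensional situation is easy to check numerically; the 2×2 matrix below is an arbitrary illustrative choice.

```python
import numpy as np

# In C^n the spectrum equals the point spectrum (Example 13.12 (a)):
# every spectral value is an eigenvalue.
A = np.array([[2.0, 1.0],
              [0.0, 3.0]])
eigenvalues = np.linalg.eigvals(A)
assert np.allclose(sorted(eigenvalues.real), [2.0, 3.0])

# for lambda outside sigma(A) the resolvent (A - lambda*I)^{-1} exists
lam = 5.0
R = np.linalg.inv(A - lam * np.eye(2))
assert np.allclose(R @ (A - lam * np.eye(2)), np.eye(2))
```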
(b) H = L2 ([0, 1]). (T f )(x) = xf (x). We have
σp (T ) = ∅.
Indeed, suppose λ is an eigenvalue and f ∈ L2 ([0, 1]) an eigenfunction to T , that is, (T −
λI)(f ) = 0; hence (x − λ)f (x) ≡ 0 a. e. on [0, 1]. Since x − λ is nonzero a. e., f = 0 a. e. on
[0, 1]. That is, f = 0 in H, which contradicts the definition of an eigenvector. We have
    C \ [0, 1] ⊆ ρ(T ).
Suppose λ 6∈ [0, 1]. Since x − λ 6= 0 for all x ∈ [0, 1], g(x) = 1/(x − λ) is a continuous (hence
bounded) function on [0, 1]. Hence,
    (Rλ f )(x) = f (x)/(x − λ)
defines a bounded linear operator which is inverse to T − λI since
    (T − λI)(f (x)/(x − λ)) = (x − λ) · f (x)/(x − λ) = f (x).
We have
    σ(T ) = [0, 1].
Suppose to the contrary that there exists λ ∈ ρ(T ) ∩ [0, 1]. Then there exists Rλ ∈ L (H) with
    Rλ (T − λI) = I.     (13.10)
By homework 39.5 (a), the norm of the multiplication operator Tg is less than or equal to kgk∞
(the supremum norm of g). Choose fε = χ(λ−ε,λ+ε) . Since χM = χ2M ,
    k(T − λI)fε k = k(x − λ)χUε (λ) (x)fε (x)k ≤ supx∈[0,1] | (x − λ)χUε (λ) (x) | kfε k .
However,
    supx∈[0,1] | (x − λ)χUε (λ) (x) | = supx∈Uε (λ) | x − λ | = ε.
This shows
    k(T − λI)fε k ≤ ε kfε k .
Inserting fε into (13.10) we obtain
    kfε k = kRλ (T − λI)fε k ≤ kRλ k k(T − λI)fε k ≤ kRλ k ε kfε k ,
which implies kRλ k ≥ 1/ε. Since ε > 0 was arbitrary, this contradicts the boundedness of Rλ .
(b) Properties of the Spectrum
Lemma 13.25 Let T ∈ L (H). Then
    σ(T ∗ ) = σ(T )∗   and   ρ(T ∗ ) = ρ(T )∗ ,
where M ∗ denotes the set of complex conjugates of the elements of M .
Proof. Suppose that λ ∈ ρ(T ). Then there exists Rλ (T ) ∈ L (H) such that
    Rλ (T )(T − λI) = (T − λI)Rλ (T ) = I.
Taking adjoints, (Rλ (T )(T − λI))∗ = ((T − λI)Rλ (T ))∗ = I, that is,
    (T ∗ − λ̄I)Rλ (T )∗ = Rλ (T )∗ (T ∗ − λ̄I) = I.
This shows that Rλ̄ (T ∗ ) = Rλ (T )∗ is again a bounded linear operator on H; hence λ̄ ∈ ρ(T ∗ )
and (ρ(T ))∗ ⊆ ρ(T ∗ ). Since ∗ is an involution (T ∗∗ = T ), the opposite inclusion follows.
Since σ(T ) is the complement of the resolvent set, the claim for the spectrum follows as well.
For λ, µ ∈ ρ(T ) and S with λ ∈ ρ(S) we have the resolvent identities
Rλ (T ) − Rµ (T ) = (λ − µ)Rλ (T )Rµ (T ) = (λ − µ)Rµ (T )Rλ (T ),
Rλ (T ) − Rλ (S) = Rλ (T )(S − T )Rλ (S).
Proposition 13.26 (a) ρ(T ) is open and σ(T ) is closed.
(b) If λ0 ∈ ρ(T ) and | λ − λ0 | < kRλ0 (T )k−1 , then λ ∈ ρ(T ) and
    Rλ (T ) = Σ∞n=0 (λ − λ0 )n Rλ0 (T )n+1 .
(c) If | λ | > kT k, then λ ∈ ρ(T ) and
    Rλ (T ) = − Σ∞n=0 λ−n−1 T n .
Proof. (a) follows from (b).
(b) For brevity, we write Rλ0 in place of Rλ0 (T ). With q = | λ − λ0 | kRλ0 k < 1 we have
    Σ∞n=0 | λ − λ0 |n kRλ0 kn+1 = Σ∞n=0 q n kRλ0 k = kRλ0 k /(1 − q) ,
so the series of norms converges. By homework 38.4, Σ xn converges if Σ kxn k converges. Hence,
    B = Σ∞n=0 (λ − λ0 )n Rλ0n+1
converges in L (H) with respect to the operator norm. Moreover,
    (T − λI)B = (T − λ0 I)B − (λ − λ0 )B
              = Σ∞n=0 (λ − λ0 )n (T − λ0 I)Rλ0n+1 − Σ∞n=0 (λ − λ0 )n+1 Rλ0n+1
              = Σ∞n=0 (λ − λ0 )n Rλ0n − Σ∞n=0 (λ − λ0 )n+1 Rλ0n+1
              = (λ − λ0 )0 Rλ00 = I.
Similarly, one shows B(T − λI) = I. Thus, Rλ (T ) = B.
(c) Since | λ | > kT k, the series converges with respect to the operator norm, say
    C = − Σ∞n=0 λ−n−1 T n .
We have
    (T − λI)C = − Σ∞n=0 λ−n−1 T n+1 + Σ∞n=0 λ−n T n = λ0 T 0 = I.
Similarly, C(T − λI) = I; hence Rλ (T ) = C.
Remarks 13.4 (a) By (b), Rλ (T ) is a holomorphic (i. e. complex differentiable) function in
the variable λ with values in L (H). One can use this to show that the spectrum is non-empty,
σ(T ) 6= ∅.
(b) If kT k < 1, then T − I is invertible with inverse − Σ∞n=0 T n .
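The Neumann series of Proposition 13.26 (c) can be verified numerically for a small matrix; the entries, the choice of λ, and the truncation length are arbitrary illustrative choices (the truncated series converges because |λ| > kT k).

```python
import numpy as np

# Check R_lambda(T) = -sum_{n>=0} lambda^(-n-1) T^n against a direct inverse.
T = np.array([[0.0, 0.5],
              [0.2, 0.1]])
lam = 2.0                               # |lambda| > kT k, series converges

R_series = sum(-lam ** (-n - 1) * np.linalg.matrix_power(T, n)
               for n in range(60))
R_exact = np.linalg.inv(T - lam * np.eye(2))
assert np.allclose(R_series, R_exact)
```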
(c) Proposition 13.26 (c) means: if λ ∈ σ(T ) then | λ | ≤ kT k. However, there is, in general, a
smaller disc around 0 which contains the spectrum. By definition, the spectral radius r(T ) of T
is the smallest non-negative number such that the spectrum is completely contained in the disc
around 0 with radius r(T ):
    r(T ) = sup{| λ | | λ ∈ σ(T )}.
(d) λ ∈ σ(T ) implies λn ∈ σ(T n ) for all non-negative integers n. Indeed, suppose λn ∈ ρ(T n ),
that is, B(T n − λn I) = (T n − λn I)B = I for some bounded B. Since T n − λn I = C(T − λI) =
(T − λI)C with C = Σn−1k=0 λn−1−k T k , we obtain
    BC(T − λI) = (T − λI)CB = I;
thus λ ∈ ρ(T ), a contradiction.
We shall refine the above statement and give a better upper bound for {| λ | | λ ∈ σ(T )} than
kT k.
Proposition 13.27 Let T ∈ L (H) be a bounded linear operator. Then the spectral radius of T is
    r(T ) = limn→∞ kT n k1/n .     (13.11)
The proof is in the appendix.
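Formula (13.11) is interesting precisely for non-normal operators, where kT k can be much larger than r(T ); a numerical sketch (the matrix and the truncation n = 200 are arbitrary choices):

```python
import numpy as np

# Gelfand's formula: kT^n k^(1/n) -> r(T). For this non-normal matrix the
# operator norm is about 10 while both eigenvalues equal 0.5.
T = np.array([[0.5, 10.0],
              [0.0, 0.5]])
r_exact = max(abs(np.linalg.eigvals(T)))        # spectral radius = 0.5

n = 200
r_approx = np.linalg.norm(np.linalg.matrix_power(T, n), ord=2) ** (1.0 / n)
assert abs(r_approx - r_exact) < 0.05           # slow but visible convergence
```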
13.2.6 The Spectrum of Self-Adjoint Operators
Proposition 13.28 Let T = T ∗ be a self-adjoint operator in L (H). Then λ ∈ ρ(T ) if and
only if there exists C > 0 such that
    k(T − λI)xk ≥ C kxk   for all x ∈ H.
Proof. Suppose that λ ∈ ρ(T ). Then there exists (a non-zero) bounded operator Rλ (T ) such
that
kxk = kRλ (T )(T − λI)xk ≤ kRλ (T )k k(T − λI)xk .
Hence,
    k(T − λI)xk ≥ (1/ kRλ (T )k) kxk ,   x ∈ H.
We can choose C = 1/ kRλ (T )k and the condition of the proposition is satisfied.
Suppose now that the condition is satisfied. We prove the other direction in 3 steps, i. e. we
show that T − λI has a bounded inverse operator which is defined on the whole space H.
Step 1. T − λI is injective. Suppose to the contrary that (T − λ)x1 = (T − λ)x2 . Then
0 = k(T − λ)(x1 − x2 )k ≥ C kx1 − x2 k ,
and kx1 − x2 k = 0 follows. That is x1 = x2 . Hence, T − λI is injective.
Step 2. H1 = (T − λI)H, the range of T − λI is closed. Suppose that yn = (T − λI)xn ,
xn ∈ H, converges to some y ∈ H. We want to show that y ∈ H1 . Clearly (yn ) is a Cauchy
sequence such that kym − yn k → 0 as m, n → ∞. By assumption,
kym − yn k = k(T − λI)(xn − xm )k ≥ C kxn − xm k .
Thus, (xn ) is a Cauchy sequence in H. Since H is complete, xn → x for some x ∈ H. Since
T − λI is continuous,
yn = (T − λI)xn −→ (T − λI)x.
n→∞
Hence, y = (T − λI)x and H1 is a closed subspace.
Step 3. H1 = H. By Riesz's first theorem, H = H1 ⊕ H1⊥ . We have to show that H1⊥ = {0}.
Let u ∈ H1⊥ . Since T ∗ = T ,
    0 = h(T − λI)x , ui = hx , (T − λ̄I)ui   for all x ∈ H.
This shows (T − λ̄I)u = 0, hence T (u) = λ̄u. This implies
    hT (u) , ui = λ̄ hu , ui .
However, T = T ∗ implies that the left side is real, by Remark 13.2 (d). Hence λ = λ̄ is real.
We conclude (T − λI)u = 0. By injectivity of T − λI, u = 0. That is, H1⊥ = {0} and H1 = H.
We have shown that there exists a linear operator S = (T − λI)−1 which is inverse to T − λI
and defined on the whole space H. Since
kyk = k(T − λI)S(y)k ≥ C kS(y)k ,
S is bounded with kSk ≤ 1/C. Hence, S = Rλ (T ).
Note that for any bounded real function f (x, y) we have
    supx,y f (x, y) = supx (supy f (x, y)) = supy (supx f (x, y)).
In particular, kxk = supkyk≤1 | hx , yi |, since y = x/ kxk yields the supremum and the CSI gives
the upper bound. Further, kT (x)k = supkyk≤1 | hT (x) , yi |, such that
    kT k = supkxk≤1 supkyk≤1 | hT (x) , yi | = supkxk≤1, kyk≤1 | hT (x) , yi | = supkyk≤1 supkxk≤1 | hT (x) , yi | .
For self-adjoint operators we can say more.
Proposition 13.29 Let T = T ∗ ∈ L (H). Then we have
    kT k = supkxk≤1 | hT (x) , xi | .     (13.12)
Proof. Let C = supkxk≤1 | hT (x) , xi |. By the Cauchy–Schwarz inequality, | hT (x) , xi | ≤
kT k kxk2 , such that C ≤ kT k.
For any real α > 0 we have
    kT (x)k2 = hT (x) , T (x)i = hT 2 (x) , xi
             = (1/4) [ hT (αx + α−1 T (x)) , αx + α−1 T (x)i − hT (αx − α−1 T (x)) , αx − α−1 T (x)i ]
             ≤ (1/4) ( C kαx + α−1 T (x)k2 + C kαx − α−1 T (x)k2 )
             = (C/4) ( 2 kαxk2 + 2 kα−1 T (x)k2 )        (parallelogram identity)
             = (C/2) ( α2 kxk2 + α−2 kT (x)k2 ).
Inserting α2 = kT (x)k / kxk we obtain
    kT (x)k2 ≤ (C/2) ( kT (x)k kxk + kxk kT (x)k ) = C kT (x)k kxk ,
which implies kT (x)k ≤ C kxk. Thus, kT k = C.
Let m = infkxk=1 hT (x) , xi and M = supkxk=1 hT (x) , xi denote the lower and the upper bound
of T . Then we have
    supkxk≤1 | hT (x) , xi | = max{| m | , M } = kT k ,
and
    m kxk2 ≤ hT (x) , xi ≤ M kxk2   for all x ∈ H.
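For a symmetric matrix the bounds m and M are the extreme eigenvalues, which makes these identities easy to test; the matrix and the number of random trial vectors below are arbitrary choices.

```python
import numpy as np

# m and M are the smallest and largest eigenvalues; kT k = max{|m|, |M|},
# and every Rayleigh quotient <Tx, x>/<x, x> lies in [m, M].
T = np.array([[2.0, 1.0],
              [1.0, -1.0]])             # symmetric, hence self-adjoint
eigs = np.linalg.eigvalsh(T)            # sorted ascending
m, M = eigs[0], eigs[-1]

op_norm = np.linalg.norm(T, ord=2)
assert np.isclose(op_norm, max(abs(m), abs(M)))

rng = np.random.default_rng(3)
for _ in range(100):
    x = rng.normal(size=2)
    q = x @ T @ x / (x @ x)
    assert m - 1e-12 <= q <= M + 1e-12
```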
Corollary 13.30 Let T = T ∗ ∈ L (H) be a self-adjoint operator. Then
σ(T ) ⊂ [m, M].
Proof. Suppose that λ0 6∈ [m, M ]. Then
    C := infµ∈[m,M ] | λ0 − µ | > 0.
Since m = infkxk=1 hT (x) , xi and M = supkxk=1 hT (x) , xi, we have for kxk = 1 (using the CSI
and hT (x) , xi ∈ [m, M ])
    k(T − λ0 I)xk = kxk k(T − λ0 I)xk ≥ | h(T − λ0 I)x , xi | = | hT (x) , xi − λ0 kxk2 | ≥ C.
This implies
    k(T − λ0 I)xk ≥ C kxk   for all x ∈ H.
By Proposition 13.28, λ0 ∈ ρ(T ).
Example 13.13 (a) Let H = L2 [0, 1], g ∈ C[0, 1] a real-valued function, and (Tg f )(t) =
g(t)f (t). Let m = inft∈[0,1] g(t), M = supt∈[0,1] g(t). One proves that m and M are the lower
and upper bounds of Tg , such that σ(Tg ) ⊆ [m, M ]. Since g is continuous, by the intermediate
value theorem, σ(Tg ) = [m, M ].
(b) Let T = T ∗ ∈ L (H) be self-adjoint. Then all eigenvalues of T are real and eigenvectors
to different eigenvalues are orthogonal to each other. Proof. The first statement is clear from
Corollary 13.30. Suppose that T (x) = λx and T (y) = µy with λ 6= µ. Then
    λ hx , yi = hT (x) , yi = hx , T (y)i = µ̄ hx , yi = µ hx , yi .
Since λ 6= µ, hx , yi = 0.
The statement about orthogonality holds for arbitrary normal operators.
Appendix: Compact Self-Adjoint Operators in Hilbert Space
Proof of Proposition 13.27. From the theory of power series, Theorem 2.34, we know that the
series
    Σ∞n=0 kT n k z n     (13.13)
converges if | z | < R and diverges if | z | > R, where
    R = 1 / lim supn→∞ kT n k1/n .     (13.14)
Inserting z = 1/λ and using homework 38.4, we see that the series
    − Σ∞n=0 λ−n−1 T n
diverges if | λ | < lim supn→∞ kT n k1/n (and converges if | λ | > lim supn→∞ kT n k1/n ). The
reason for the divergence of the power series is that the spectrum σ(T ) and the circle with radius
lim supn→∞ kT n k1/n have points in common; hence
    r(T ) = lim supn→∞ kT n k1/n .
On the other hand, by Remark 13.4 (d), λ ∈ σ(T ) implies λn ∈ σ(T n ); hence, by Remark 13.4 (c),
    | λn | ≤ kT n k =⇒ | λ | ≤ kT n k1/n .
Taking the supremum over all λ ∈ σ(T ) on the left and the lim inf over all n on the right, we have
    r(T ) ≤ lim infn→∞ kT n k1/n ≤ lim supn→∞ kT n k1/n = r(T ).
Hence, the sequence kT n k1/n converges to r(T ) as n tends to ∞.
Compact operators generalize finite rank operators. Integral operators on compact sets are compact.
Definition 13.16 A linear operator T ∈ L (H) is called compact if the closure T (U1 ) of the
image of the unit ball U1 = {x | kxk ≤ 1} is compact in H. In other words, for every sequence
(xn ) with xn ∈ U1 there exists a subsequence (xnk ) such that T (xnk ) converges.
Proposition 13.31 For T ∈ L (H) the following are equivalent:
(a) T is compact.
(b) T ∗ is compact.
(c) For all sequences (xn ) with hxn , yi −→ hx , yi for all y ∈ H (weak convergence) we have
T (xn ) −→ T (x) in norm.
(d) There exists a sequence (Tn ) of operators of finite rank such that kT − Tn k −→ 0.
Definition 13.17 Let T be an operator on H and H1 a closed subspace of H. We call H1 a
reducing subspace if both H1 and H1⊥ are T -invariant, i. e. T (H1 ) ⊂ H1 and T (H1⊥ ) ⊂ H1⊥ .
Proposition 13.32 Let T ∈ L (H) be normal.
(a) The eigenspace ker(T − λI) is a reducing subspace for T and ker(T − λI) = ker(T − λI)∗ .
(b) If λ, µ are distinct eigenvalues of T , ker(T − λI) ⊥ ker(T − µI).
Proof. (a) Since T is normal, so is T − λI. Hence k(T − λI)(x)k = k(T − λI)∗ (x)k. Thus,
ker(T − λI) = ker(T − λI)∗ . In particular, T ∗ (x) = λ̄x if x ∈ ker(T − λI).
We show invariance. Let x ∈ ker(T − λI); then T (x) = λx ∈ ker(T − λI). Similarly,
x ∈ ker(T − λI)⊥ and y ∈ ker(T − λI) imply
    hT (x) , yi = hx , T ∗ (y)i = hx , λ̄yi = λ hx , yi = 0.
Hence, ker(T − λI)⊥ is T -invariant, too.
(b) Let T (x) = λx and T (y) = µy. Then (a) and T ∗ (y) = µ̄y imply
    λ hx , yi = hT (x) , yi = hx , T ∗ (y)i = hx , µ̄yi = µ hx , yi .
Thus (λ − µ) hx , yi = 0; since λ 6= µ, x ⊥ y.
Theorem 13.33 (Spectral Theorem for Compact Self-Adjoint Operators) Let H be an infinite
dimensional separable Hilbert space and T ∈ L (H) compact and self-adjoint.
Then there exists a real sequence (λn ) with λn −→ 0 as n → ∞ and a CNOS {en | n ∈ N} ∪
{fk | k ∈ N } for some subset N ⊂ N such that
    T (en ) = λn en ,   n ∈ N,
    T (fk ) = 0,   k ∈ N .
Moreover,
    T (x) = Σ∞n=1 λn hx , en i en ,   x ∈ H.     (13.15)
Remarks 13.5 (a) Since {en } ∪ {fk } is a CNOS, any x ∈ H can be written as its Fourier series
    x = Σ∞n=1 hx , en i en + Σk∈N hx , fk i fk .
Applying T and using T (en ) = λn en and T (fk ) = 0 we have
    T (x) = Σ∞n=1 hx , en i λn en ,
which establishes (13.15). The main point is the existence of a CNOS of eigenvectors {en } ∪ {fk }.
(b) In case H = Cn (Rn ) the theorem says that any hermitean (symmetric) matrix A is diagonalizable with only real eigenvalues.
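The finite-dimensional case can be tested directly: diagonalizing a hermitean matrix with an orthonormal eigenbasis and rebuilding its action from the eigenvalue expansion. Matrix size and seed below are arbitrary.

```python
import numpy as np

# Finite-dimensional instance of Theorem 13.33: a hermitean matrix has an
# orthonormal basis of eigenvectors with real eigenvalues, and
# T(x) = sum_n lambda_n <x, e_n> e_n reproduces T x.
rng = np.random.default_rng(4)
B = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
T = (B + B.conj().T) / 2                # hermitean

lams, E = np.linalg.eigh(T)             # real eigenvalues, orthonormal columns
assert np.allclose(lams.imag, 0.0)
assert np.allclose(E.conj().T @ E, np.eye(4))

x = rng.normal(size=4) + 1j * rng.normal(size=4)
Tx = sum(lams[n] * np.vdot(E[:, n], x) * E[:, n] for n in range(4))
assert np.allclose(Tx, T @ x)
```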
Chapter 14
Complex Analysis
Here are some useful textbooks on Complex Analysis: [FL88] (in German), [Kno78] (in German), [Nee97], [Rüh83] (in German), [Hen88].
The main part of this chapter deals with holomorphic functions which is another name for a
function which is complex differentiable in an open set. On the one hand, we are already
familiar with a huge class of holomorphic functions: polynomials, the exponential function,
sine and cosine functions. On the other hand, holomorphic functions possess quite amazing
properties, completely unusual from the viewpoint of real analysis. The properties are very
strong. For example, it is easy to construct a real function which is 17 times differentiable but
not 18 times. A complex differentiable function (in a small region) is automatically infinitely
often differentiable.
Good references are Ahlfors [Ahl78], a little harder is Conway [Con78], easier is Howie
[How03].
14.1 Holomorphic Functions
14.1.1 Complex Differentiation
We start with some notations.
    Ur       {z | | z | < r}        open ball of radius r around 0
    UR (a)   {z | | z − a | < R}    open ball of radius R around a
    Ūr       {z | | z | ≤ r}        closed ball of radius r around 0
    Ůr       {z | 0 < | z | < r}    punctured ball of radius r
    Sr       {z | | z | = r}        circle of radius r around 0
Definition 14.1 Let U ⊂ C be an open subset of C and f : U → C be a complex function.
(a) If z0 ∈ U and the limit
    limz→z0 (f (z) − f (z0 ))/(z − z0 ) =: f ′ (z0 )
exists, we call f complex differentiable at z0 and f ′ (z0 ) the derivative of f at z0 .
364
(b) If f is complex differentiable for every z0 ∈ U, we say that f is holomorphic in U. We call
f ′ the derivative of f on U.
(c) f is holomorphic at z0 if it is complex differentiable in a certain neighborhood of z0 .
To be quite explicit, f ′ (z0 ) exists if to every ε > 0 there exists some δ > 0 such that z ∈ Uδ (z0 ),
z 6= z0 , implies
    | (f (z) − f (z0 ))/(z − z0 ) − f ′ (z0 ) | < ε.
Remarks 14.1 (a) Differentiability of f at z0 forces f to be continuous at z0 . Indeed, f is
differentiable at z0 with derivative f ′ (z0 ) if and only if, there exists a function r(z, z0 ) such that
f (z) = f (z0 ) + f ′ (z0 )(z − z0 ) + (z − z0 )r(z, z0 ),
where limz→z0 r(z, z0 ) = 0. In particular, taking the limit z → z0 in the above equation we get
lim f (z) = f (z0 ),
z→z0
which proves continuity at z0 .
Complex conjugation is a uniformly continuous function on C since | z̄ − z̄0 | = | z − z0 | for
all z, z0 ∈ C.
(b) The derivative satisfies the well-known sum, product, and quotient rules; that is, if both f
and g are holomorphic in U, so are f + g, f g, and f /g, provided g 6= 0 in U, and we have
    (f + g)′ = f ′ + g ′ ,   (f g)′ = f ′ g + f g ′ ,   (f /g)′ = (f ′ g − f g ′ )/g 2 .
Also, the chain rule holds; if f : U → V and g : V → C are holomorphic, so is g ◦f and
(g ◦f )′ (z) = g ′(f (z)) f ′ (z).
The proofs are exactly the same as in the real case. Since the constant functions f (z) = c and
the identity f (z) = z are holomorphic in C, so is every polynomial with complex coefficients
and, moreover, every rational function (quotient of two polynomials) f : U → C, provided the
denominator has no zeros in U. So, we already know a large class of holomorphic functions.
Another, bigger class consists of the convergent power series.
Example 14.1 f (z) = | z |2 is complex differentiable at 0 with f ′ (0) = 0; f is not differentiable
at z0 = 1. Indeed,
    limh→0 (f (0 + h) − f (0))/h = limh→0 | h |2 /h = limh→0 h̄ = 0.
On the other hand, let ε ∈ R. Then
    limε→0 (| 1 + ε |2 − 1)/ε = limε→0 (2ε + ε2 )/ε = 2,
whereas
    limε→0 (| 1 + iε |2 − 1)/(iε) = limε→0 (1 + ε2 − 1)/(iε) = 0.
This shows that f ′ (1) does not exist.
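The failure of complex differentiability can be seen numerically: difference quotients of f (z) = |z|² at z0 = 1 along the real and the imaginary directions approach different limits. The step size is an arbitrary small choice.

```python
# Difference quotients of f(z) = |z|^2 at z0 = 1 depend on the direction of
# approach (2 along the real axis, 0 along the imaginary axis), so the
# complex derivative f'(1) cannot exist; at z0 = 0 both quotients vanish.
f = lambda z: abs(z) ** 2
eps = 1e-6

q_real = (f(1 + eps) - f(1)) / eps
q_imag = (f(1 + 1j * eps) - f(1)) / (1j * eps)
assert abs(q_real - 2.0) < 1e-4
assert abs(q_imag - 0.0) < 1e-4

assert abs((f(eps) - f(0)) / eps) < 1e-4   # consistent with f'(0) = 0
```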
14.1.2 Power Series
Recall from Subsection 2.3.9 that a power series Σ cn z n has a radius of convergence
    R = 1 / lim supn→∞ | cn |1/n .
That is, the series converges absolutely for all z with | z | < R; the series diverges for | z | > R;
the behaviour for | z | = R depends on the (cn ). Moreover, the series converges uniformly on
every closed ball Ūr with 0 < r < R, see Proposition 6.4.
We already know that a real power series can be differentiated term by term, see Corollary 6.11.
We will see that power series are holomorphic inside their radius of convergence.
Proposition 14.1 Let a ∈ C and
    f (z) = Σ∞n=0 cn (z − a)n     (14.1)
be a power series with radius of convergence R. Then f : UR (a) → C is holomorphic and the
derivative is
    f ′ (z) = Σ∞n=1 ncn (z − a)n−1 .     (14.2)
Proof. If the series (14.1) converges in UR (a), the root test shows that the series (14.2) also
converges there. Without loss of generality, take a = 0. Denote the sum of the series (14.2) by
g(z), fix w ∈ U_R(0), and choose r so that |w| < r < R. If z ≠ w, we have

(f(z) − f(w))/(z − w) − g(w) = Σ_{n=0}^∞ c_n [ (z^n − w^n)/(z − w) − n w^{n−1} ].

The expression in the brackets is 0 if n = 1. For n ≥ 2 it is (by direct computation)

(z^n − w^n)/(z − w) − n w^{n−1} = (z − w) Σ_{k=1}^{n−1} k w^{k−1} z^{n−k−1} = Σ_{k=1}^{n−1} (k w^{k−1} z^{n−k} − k w^k z^{n−k−1}),    (14.3)

which gives a telescoping sum if we shift k := k + 1 in the first summand. If |z| < r, the absolute value of the sum in (14.3) is less than

n(n − 1)/2 · r^{n−2},

so

| (f(z) − f(w))/(z − w) − g(w) | ≤ |z − w| Σ_{n=2}^∞ n^2 |c_n| r^{n−2}.    (14.4)

Since r < R, the last series converges. Hence the left side of (14.4) tends to 0 as z → w. This
says that f′(w) = g(w), and completes the proof.
Corollary 14.2 Since f ′ (z) is again a power series with the same radius of convergence R, the
proposition can be applied to f ′ (z). It follows that f has derivatives of all orders and that each
derivative has a power series expansion around a:

f^(k)(z) = Σ_{n=k}^∞ n(n − 1)···(n − k + 1) c_n (z − a)^{n−k}.    (14.5)

Inserting z = a implies

f^(k)(a) = k! c_k,   k = 0, 1, … .

This shows that the coefficients c_n in the power series expansion f(z) = Σ_{n=0}^∞ c_n (z − a)^n of f with midpoint a are unique.
Example 14.2 The exponential function e^z = Σ_{n=0}^∞ z^n/n! is holomorphic on the whole complex plane with (e^z)′ = e^z; similarly, the trigonometric functions sin z and cos z are holomorphic in C since

sin z = Σ_{n=0}^∞ (−1)^n z^{2n+1}/(2n + 1)!,   cos z = Σ_{n=0}^∞ (−1)^n z^{2n}/(2n)!.
We have (sin z)′ = cos z and (cos z)′ = − sin z.
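The termwise differentiation of Proposition 14.1 can be checked concretely for these series. The following sketch (my own illustration; the truncation order N = 40 and the test point are arbitrary choices) differentiates the truncated sine series coefficientwise and compares the result with cos z.

```python
import math, cmath

N = 40  # truncation order, an arbitrary choice
sin_coeffs = [0.0] * N
for n in range(N // 2):
    # coefficient of z^(2n+1) in sin z is (-1)^n / (2n+1)!
    sin_coeffs[2 * n + 1] = (-1) ** n / math.factorial(2 * n + 1)

# termwise derivative per (14.2): c_n z^n -> n c_n z^(n-1)
cos_coeffs = [n * sin_coeffs[n] for n in range(1, N)]

def eval_series(coeffs, z):
    # Horner evaluation of sum_k coeffs[k] * z^k
    total = 0j
    for c in reversed(coeffs):
        total = total * z + c
    return total

z = 0.7 + 0.3j
print(abs(eval_series(cos_coeffs, z) - cmath.cos(z)))  # ~ 0
```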
Definition 14.2 A complex function which is defined on C and which is holomorphic on the
entire complex plane is called an entire function.
14.1.3 Cauchy–Riemann Equations
Let us identify the complex field C with the two-dimensional real plane ℝ^2 via z = x + iy, that
is, every complex number z corresponds to an ordered pair (x, y) of real numbers. In this way,
a complex function w = f (z) corresponds to a function U → R2 where U ⊂ C is open. We
have w = u + iv where u = u(x, y) and v = v(x, y) are the real and the imaginary parts of
the function f ; u = Re w and v = Im w. Problem: What is the relation between complex
differentiability and the differentiability of f as a function from R2 to R2 ?
Proposition 14.3 Let
f : U → C,
U ⊂ C open,
a∈U
be a function. Then the following are equivalent:
(a) f is complex differentiable at a.
(b) f (x, y) = u(x, y) + iv(x, y) is real differentiable at a as a function
f : U ⊂ R2 → R2 , and the Cauchy–Riemann equations are satisfied at a:
∂u/∂x (a) = ∂v/∂y (a),   ∂u/∂y (a) = −∂v/∂x (a);    (14.6)

in short, u_x = v_y and u_y = −v_x.
In this case,
f ′ = ux + ivx = vy − iuy .
Proof. (a) → (b): Suppose that z = h + ik is a complex number such that a + z ∈ U; put
f ′ (a) = b1 + ib2 . By assumption,
lim_{z→0} | f(a + z) − f(a) − z f′(a) | / |z| = 0.
We shall write this in the real form with real variables h and k. Note that
zf ′ (a) = (h + ik)(b1 + ib2 ) = hb1 − kb2 + i(hb2 + kb1 )
i.e.

( hb_1 − kb_2 )   ( b_1  −b_2 ) ( h )
( hb_2 + kb_1 ) = ( b_2   b_1 ) ( k ).
This implies, with the identification z = (h, k),
lim_{z→0} | f(a + z) − f(a) − ( b_1 −b_2 ; b_2 b_1 )(h, k)^T | / |z| = 0.
That is (see Subsection 7.2), f is real differentiable at a with the Jacobian matrix
f′(a) = Df(a) = ( b_1 −b_2 ; b_2 b_1 ).    (14.7)
By Proposition 7.6, the Jacobian matrix is exactly the matrix of the partial derivatives, that is
Df(a) = ( u_x u_y ; v_x v_y ).
Comparing this with (14.7), we obtain ux (a) = vy (a) = Re f ′ (a) and uy (a) = −vx (a) =
Im f ′ (a). This completes the proof of the first direction.
(b) → (a). Since f = (u, v) is differentiable at a ∈ U as a real function, there exists a linear
mapping Df (a) ∈ L (R2 ) such that
lim_{(h,k)→0} ‖ f(a + (h, k)) − f(a) − Df(a)(h, k)^T ‖ / ‖(h, k)‖ = 0.
By Proposition 7.6,
Df(a) = ( u_x u_y ; v_x v_y ).
The Cauchy–Riemann equations show that Df takes the form
Df(a) = ( b_1 −b_2 ; b_2 b_1 ),
where u_x = b_1 and v_x = b_2. Writing

Df(a)(h, k)^T = (hb_1 − kb_2, hb_2 + kb_1)^T = z(b_1 + ib_2)

in the complex form with z = h + ik gives that f is complex differentiable at a with f′(a) = b_1 + ib_2.
Example 14.3 (a) We already know that f (z) = z 2 is complex differentiable. Hence, the
Cauchy–Riemann equations must be fulfilled. From
f(z) = z^2 = (x + iy)^2 = x^2 − y^2 + 2ixy,   u(x, y) = x^2 − y^2,   v(x, y) = 2xy,

we conclude

u_x = 2x,   u_y = −2y,   v_x = 2y,   v_y = 2x.
The Cauchy–Riemann equations are satisfied.
(b) f(z) = |z|^2. Since f(z) = x^2 + y^2, we have u(x, y) = x^2 + y^2 and v(x, y) = 0. The Cauchy–Riemann equations yield u_x = 2x = 0 = v_y and u_y = 2y = 0 = −v_x, such that z = 0 is the only solution of the CRE; z = 0 is the only point where f is differentiable.
(c) f(z) = z̄ is nowhere differentiable since u(x, y) = x, v(x, y) = −y; thus
1 = u_x ≠ v_y = −1.
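The Cauchy–Riemann equations can also be verified numerically. The following sketch is my own illustration (the sample point and step size are arbitrary choices): it approximates the partial derivatives of u = Re f and v = Im f for f(z) = z^2 by central finite differences.

```python
# Central finite differences confirm u_x = v_y and u_y = -v_x for f(z) = z^2.

f = lambda z: z * z
x, y, h = 0.8, -0.5, 1e-6  # arbitrary sample point and step

def u(x, y): return f(complex(x, y)).real
def v(x, y): return f(complex(x, y)).imag

ux = (u(x + h, y) - u(x - h, y)) / (2 * h)
uy = (u(x, y + h) - u(x, y - h)) / (2 * h)
vx = (v(x + h, y) - v(x - h, y)) / (2 * h)
vy = (v(x, y + h) - v(x, y - h)) / (2 * h)

print(ux, vy)   # both ~ 2x = 1.6
print(uy, -vx)  # both ~ -2y = 1.0
```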
A function f : U → C, U ⊂ C open, is called locally
constant in U, if for every point a ∈ U there exists a ball
V with a ∈ V ⊂ U such that f is constant on V .
Clearly, on every connected component of U, f is constant.
In fact, one can define U to be connected by requiring that
every locally constant function f : U → C is constant.
Corollary 14.4 Let U ⊂ C be open and f : U → C be a holomorphic function on U.
(a) If f ′ (z) = 0 for all z ∈ U, then f is locally constant in U.
(b) If f takes real values only, then f is locally constant.
(c) If f has a continuous second derivative, u = Re f and v = Im f are harmonic functions,
i. e., they satisfy the Laplace equation ∆(u) = uxx + uyy = 0 and ∆(v) = 0.
Proof. (a) Since f ′ (z) = 0 for all z ∈ U, the Cauchy–Riemann equations imply ux = uy =
vx = vy = 0 in U. From real analysis, it is known that u and v are locally constant in U (apply
Corollary 7.12 with grad f (a + θx) = 0).
(b) Since f takes only real values, v(x, y) = 0 for all (x, y) ∈ U. This implies vx = vy = 0 on
U. By the Cauchy–Riemann equations, ux = uy = 0 and f is locally constant by (a).
(c) ux = vy implies uxx = vyx and differentiating uy = −vx with respect to y yields
uyy = −vxy . Since both u and v are twice continuously differentiable (since so is f ), by
Schwarz’ Lemma, the sum is uxx + uyy = vyx − vxy = 0. The same argument works for
vxx + vyy = 0.
Remarks 14.2 (a) We will see soon that the additional differentiability assumption in (c) is
superfluous.
(b) Note that a converse statement to (c) is easily proved: if Q = (a, b) × (c, d) is an open
rectangle and u : Q → R is harmonic, then there exists a holomorphic function f : Q → C
such that u = Re f .
14.2 Cauchy’s Integral Formula
14.2.1 Integration
The major objective of this section is to prove the converse of Proposition 14.1: every function holomorphic in a disc D can be represented as a power series in D. The quickest route to this is
via Cauchy’s Theorem and Cauchy’s Integral Formula. The required integration theory will be
developed. It is a useful tool to study holomorphic functions.
Recall from Section 5.4 the definition of the Riemann integral of a bounded complex valued
function ϕ : [a, b] → C. It was defined by integrating both the real and the imaginary parts of
ϕ. In what follows, a path is always a piecewise continuously differentiable curve.
Definition 14.3 Let U ⊂ C be open and f : U → C a continuous function on U. Suppose that γ : [t_0, t_n] → U is a path in U, where γ is continuously differentiable on each [t_{k−1}, t_k], k = 1, …, n. The integral of f along γ is defined as the line integral

∫_γ f(z) dz := Σ_{k=1}^n ∫_{t_{k−1}}^{t_k} f(γ(t)) γ′(t) dt.    (14.8)
By the change of variable rule, the integral of f along γ does not depend on the parametrization γ of the path {γ(t) | t ∈ [t_0, t_n]}. However, if we exchange the initial and the end point of γ, the integral changes sign.
Remarks 14.3 (Properties of the complex integral) (a) The integral of f along γ is linear over C:

∫_γ (αf_1 + βf_2) dz = α ∫_γ f_1 dz + β ∫_γ f_2 dz,   ∫_{γ−} f(z) dz = − ∫_{γ+} f(z) dz,
where γ− has the opposite orientation of γ+ .
(b) If γ1 and γ2 are two paths so that γ1 and γ2 join to form γ, then we have
∫_γ f(z) dz = ∫_{γ_1} f(z) dz + ∫_{γ_2} f(z) dz.
(c) From the definition and the triangle inequality, it follows that for a continuously differentiable path γ

| ∫_γ f(z) dz | ≤ M ℓ,

where |f(z)| ≤ M for all z on γ and ℓ = ∫_a^b |γ′(t)| dt is the length of the curve γ.
(d) The integral of f over γ generalizes the real integral ∫_a^b f(t) dt. Indeed, let γ(t) = t, t ∈ [a, b]; then

∫_γ f(z) dz = ∫_a^b f(t) dt.
(e) Let γ be the circle Sr (a) of radius r with center a. We can parametrize the positively oriented
circle as γ(t) = a + reit , t ∈ [0, 2π]. Then
∫_γ f(z) dz = ir ∫_0^{2π} f(a + re^{it}) e^{it} dt.
Example 14.4 (a) Let γ_1(t) = e^{it}, t ∈ [0, π], be the half of the unit circle from 1 to −1 via i, and γ_2(t) = −t, t ∈ [−1, 1], the segment from 1 to −1. Then γ_1′(t) = ie^{it} and γ_2′(t) = −1. Hence,

∫_{γ_1} z̄^2 dz = i ∫_0^π e^{−2it} e^{it} dt = i ∫_0^π e^{−it} dt = i [e^{−it}/(−i)]_0^π = −(e^{−iπ} − 1) = −(−1 − 1) = 2,

∫_{γ_2} z̄^2 dz = − ∫_{−1}^{1} t^2 dt = −2/3.

In particular, the integral of z̄^2 is not path independent.
(b)

∫_{S_r} dz/z^n = ∫_0^{2π} (ire^{it})/(r^n e^{int}) dt = ir^{−n+1} ∫_0^{2π} e^{−(n−1)it} dt = 0 if n ≠ 1, and 2πi if n = 1.
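This key computation is easy to reproduce numerically. The sketch below is my own illustration (the radius and the step count of the midpoint rule are arbitrary choices); it uses the parametrization γ(t) = re^{it} from Remark 14.3 (e).

```python
import cmath, math

# Integrate z^(-n) over the circle S_r: 2*pi*i for n = 1, and 0 otherwise.

def circle_integral(f, r=2.0, steps=4000):
    total = 0j
    for k in range(steps):
        t = 2 * math.pi * (k + 0.5) / steps   # midpoint rule on [0, 2*pi]
        z = r * cmath.exp(1j * t)
        dz = 1j * z * (2 * math.pi / steps)   # gamma'(t) dt
        total += f(z) * dz
    return total

print(circle_integral(lambda z: 1 / z))       # ~ 2*pi*i
print(circle_integral(lambda z: 1 / z ** 3))  # ~ 0
```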
14.2.2 Cauchy’s Theorem
Cauchy’s theorem is the main step in the proof that every function holomorphic at a point a can be written as a power series with midpoint a. As a consequence of Corollary 14.2, holomorphic functions then have derivatives of all orders.
We start with a very weak form. The additional assumption is that f has an antiderivative.

Lemma 14.5 Let f : U → C be continuous, and suppose that f has an antiderivative F which is holomorphic on U, F′ = f. If γ is any path in U joining z_0 and z_1 in U, we have

∫_γ f(z) dz = F(z_1) − F(z_0).

In particular, if γ is a closed path in U, then

∫_γ f(z) dz = 0.
Proof. It suffices to prove the statement for a continuously differentiable curve γ(t). Put h(t) = F(γ(t)). By the chain rule,

h′(t) = d/dt F(γ(t)) = F′(γ(t)) γ′(t) = f(γ(t)) γ′(t).

By definition of the integral and the fundamental theorem of calculus (see Subsection 5.5),

∫_γ f(z) dz = ∫_a^b f(γ(t)) γ′(t) dt = ∫_a^b h′(t) dt = h(t)|_a^b = h(b) − h(a) = F(z_1) − F(z_0).
Example 14.5 (a) ∫_{2+3i}^{1−i} z^3 dz = (1 − i)^4/4 − (2 + 3i)^4/4.

(b) ∫_1^{πi} e^z dz = e^{πi} − e = −1 − e.
Theorem 14.6 (Cauchy’s Theorem) Let U be a simply connected region in C and let f(z) be holomorphic in U. Suppose that γ(t) is a path in U joining z_0 and z_1 in U. Then ∫_γ f(z) dz depends on z_0 and z_1 only and not on the choice of the path. In particular, ∫_γ f(z) dz = 0 for any closed path γ in U.
Proof. We give the proof under the weak additional assumption that f ′ not only exists but is
continuous in U. In this case, the partial derivatives ux , uy , vx , and vy are continuous and we can
apply the integrability criterion Proposition 8.3 which was a consequence of Green’s theorem,
see Theorem 10.3. Note that we need U to be simply connected in contrast to Lemma 14.5.
Without this additional assumption (f ′ is continuous), the proof is lengthy (see [FB93, Lan89,
Jän93]) and starts with triangular or rectangular paths and is generalized then to arbitrary paths.
We have

∫_γ f(z) dz = ∫_γ (u + iv)(dx + i dy) = ∫_γ (u dx − v dy) + i ∫_γ (v dx + u dy).

We have path independence of the line integral ∫_γ P dx + Q dy if and only if the integrability
condition Q_x = P_y is satisfied, if and only if P dx + Q dy is a closed form.
In our case, the real part is path independent if and only if −vx = uy . The imaginary part is
path independent if and only if ux = vy . These are exactly the Cauchy–Riemann equations
which are satisfied since f is holomorphic.
Remarks 14.4 (a) The proposition holds under the following weaker assumption: f is continuous on the closure Ū and holomorphic in U, U is a simply connected region, and γ = ∂U is a path.
(b) The statement is wrong without the assumption “U is simply connected”. Indeed, consider
the circle of radius r with center a, that is γ(t) = a + reit . Then f (z) = 1/(z − a) is singular
at a and we have
∫_{S_r(a)} dz/(z − a) = ir ∫_0^{2π} e^{it}/(re^{it}) dt = i ∫_0^{2π} dt = 2πi.
(c) For a non-simply connected region G one cuts G with pairwise mutually inverse paths (in the picture: δ_1, δ_2, δ_3, and δ_4). The resulting region G̃ is simply connected, such that ∫_{∂G̃} f(z) dz = 0 by (a). Since the integrals along the δ_i, i = 1, …, 4, cancel, we have

∫_{γ+γ_1+γ_2} f(z) dz = 0.

In particular, if f is holomorphic in {z | 0 < |z − a| < R} and 0 < r_1 < r_2 < R, then

∫_{S_{r_1}(a)} f(z) dz = ∫_{S_{r_2}(a)} f(z) dz

if both circles are positively oriented.
Proposition 14.7 Let U be a simply connected region, z0 ∈ U, U0 = U \ {z0 }. Suppose that f
is holomorphic in U0 and bounded in a certain neighborhood of z0 .
Then

∫_γ f(z) dz = 0

for every non-selfintersecting closed path γ in U_0.
Proof. Suppose that |f(z)| ≤ C for |z − z_0| < ε_0. For any ε with 0 < ε < ε_0 we then have, by Remark 14.3 (c),

| ∫_{S_ε(z_0)} f(z) dz | ≤ 2πε C.

By Remark 14.4 (c), ∫_γ f(z) dz = ∫_{S_{ε_0}(z_0)} f(z) dz = ∫_{S_ε(z_0)} f(z) dz. Hence

| ∫_γ f(z) dz | = | ∫_{S_ε(z_0)} f(z) dz | ≤ 2πε C.

Since this is true for all small ε > 0, ∫_γ f(z) dz = 0.
We will see soon that under the conditions of the proposition, f can be made holomorphic at z0 ,
too.
14.2.3 Cauchy’s Integral Formula
Theorem 14.8 (Cauchy’s Integral Formula) Let U be a region. Suppose that f is holomorphic in U, and γ is a non-selfintersecting positively oriented path in U such that γ is the boundary of U_0 ⊂ U; in particular, U_0 is simply connected. Then for every a ∈ U_0 we have

f(a) = (1/2πi) ∫_γ f(z)/(z − a) dz.    (14.9)
Proof. a ∈ U_0 is fixed. For z ∈ U we define

F(z) = (f(z) − f(a))/(z − a) if z ≠ a,   F(z) = 0 if z = a.
Then F(z) is holomorphic in U \ {a} and bounded in a neighborhood of a, since f′(a) exists and therefore

| (f(z) − f(a))/(z − a) − f′(a) | < ε
as z approaches a. Using Proposition 14.7 and Remark 14.4 (b), we have ∫_γ F(z) dz = 0, that is,

∫_γ (f(z) − f(a))/(z − a) dz = 0,

such that

∫_γ f(z)/(z − a) dz = ∫_γ f(a)/(z − a) dz = f(a) ∫_γ dz/(z − a) = 2πi f(a).
Remark 14.5 The values of a holomorphic function f inside a path γ are completely determined by the values of f on γ.
Example 14.6 Evaluate

I_r := ∫_{S_r(a)} sin z/(z^2 + 1) dz

in the cases a = 1 + i and r = 1/2, 2, 3.
Solution. We use the partial fraction decomposition of z^2 + 1 to obtain linear terms in the denominator:

1/(z^2 + 1) = (1/2i) ( 1/(z − i) − 1/(z + i) ).

Hence, with f(z) = sin z we have in case r = 3

I_3 = ∫_{S_r(a)} sin z/(z^2 + 1) dz = (1/2i) ∫_{S_r(a)} sin z/(z − i) dz − (1/2i) ∫_{S_r(a)} sin z/(z + i) dz
    = π(f(i) − f(−i)) = 2π sin(i) = πi(e − 1/e).

In case r = 2, the function sin z/(z + i) is holomorphic inside the circle of radius 2 with center a. Hence,

I_2 = π sin(i) = I_3/2.

In case r = 1/2, both integrands are holomorphic there, such that I_{1/2} = 0.
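The three values of I_r can be checked by direct numerical contour integration. The sketch below is my own illustration (midpoint rule; the step count is an arbitrary choice), using the circle parametrization z = a + re^{it}.

```python
import cmath, math

# Numerically integrate sin z / (z^2 + 1) over circles about a = 1 + i.

def contour(f, a, r, steps=4000):
    total = 0j
    for k in range(steps):
        t = 2 * math.pi * (k + 0.5) / steps
        z = a + r * cmath.exp(1j * t)
        total += f(z) * 1j * (z - a) * (2 * math.pi / steps)  # gamma'(t) dt
    return total

f = lambda z: cmath.sin(z) / (z * z + 1)
I3 = contour(f, 1 + 1j, 3.0)
print(I3)  # ~ pi*i*(e - 1/e), both poles +-i inside
print(contour(f, 1 + 1j, 2.0))  # ~ I3/2, only the pole at i inside
print(abs(contour(f, 1 + 1j, 0.5)))  # ~ 0, no pole inside
```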
Example 14.7 Consider the function f(z) = e^{iz^2}, which is an entire function. Let γ_1(t) = t, t ∈ [0, R], be the segment from 0 to R on the real line; let γ_2(t) = Re^{it}, t ∈ [0, π/4], be the arc of the circle of radius R with center 0; and let finally γ_3(t) = te^{iπ/4}, t ∈ [0, R], be the segment from 0 to Re^{iπ/4}. By Cauchy’s Theorem,

I_1 + I_2 − I_3 = ∫_{γ_1+γ_2−γ_3} f(z) dz = 0.
Obviously, since (e^{iπ/4})^2 = e^{iπ/2} = i,

I_1 = ∫_0^R e^{it^2} dt,   I_3 = e^{iπ/4} ∫_0^R e^{−t^2} dt.
We shall show that |I_2(R)| → 0 as R tends to ∞. We have

|I_2(R)| = | ∫_0^{π/4} e^{i R^2 e^{2it}} iRe^{it} dt | ≤ R ∫_0^{π/4} | e^{iR^2(cos 2t + i sin 2t)} | dt = R ∫_0^{π/4} e^{−R^2 sin 2t} dt.
Note that sin t is a concave function on [0, π/2], that is, the graph of the sine function is above
the graph of the corresponding linear function through (0, 0) and (π/2, 1); thus, sin t ≥ 2t/π,
t ∈ [0, π/2]. We have
|I_2(R)| ≤ R ∫_0^{π/4} e^{−R^2 · 4t/π} dt = −(π/4R) (e^{−R^2} − 1) = (π/4R) (1 − e^{−R^2}).
We conclude that |I_2(R)| tends to 0 as R → ∞. Since I_1 + I_2 − I_3 = 0 for all R by Cauchy’s Theorem, we conclude

lim_{R→∞} I_1(R) = ∫_0^∞ e^{it^2} dt = e^{iπ/4} ∫_0^∞ e^{−t^2} dt = lim_{R→∞} I_3(R).
The integral on the right is √π/2 (see below); hence e^{it^2} = cos(t^2) + i sin(t^2) implies

∫_0^∞ cos(t^2) dt = ∫_0^∞ sin(t^2) dt = √(2π)/4.

These are the so-called Fresnel integrals. We show that I = ∫_0^∞ e^{−x^2} dx = √π/2. (This was already done in Homework 41.) For this, we compute a double integral using Fubini’s theorem:

∫_0^∞ ∫_0^∞ e^{−x^2−y^2} dx dy = ( ∫_0^∞ e^{−x^2} dx ) ( ∫_0^∞ e^{−y^2} dy ) = I^2.
Passing to polar coordinates yields dx dy = r dr dφ, x^2 + y^2 = r^2, such that

∫∫_{(ℝ_+)^2} e^{−x^2−y^2} dx dy = lim_{R→∞} ∫_0^{π/2} dφ ∫_0^R e^{−r^2} r dr.
The change of variables r^2 = t, dt = 2r dr, yields

I^2 = (1/2)(π/2) ∫_0^∞ e^{−t} dt = π/4  ⟹  I = √π/2.

This proves the claim. In addition, the change of variables √x = s also yields

Γ(1/2) = ∫_0^∞ (e^{−x}/√x) dx = √π.
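The value I = √π/2 is easy to confirm with elementary quadrature. The sketch below is my own illustration (Simpson's rule on the arbitrary interval [0, 10]; the tail beyond 10 is smaller than e^{−100} ≈ 4·10^{−44} and can be ignored).

```python
import math

# Composite Simpson's rule approximating int_0^inf e^(-x^2) dx = sqrt(pi)/2.

def simpson(f, a, b, n=2000):     # n must be even
    h = (b - a) / n
    s = f(a) + f(b)
    for k in range(1, n):
        s += (4 if k % 2 else 2) * f(a + k * h)
    return s * h / 3

I = simpson(lambda x: math.exp(-x * x), 0.0, 10.0)
print(I, math.sqrt(math.pi) / 2)  # agree to many digits
```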
Theorem 14.9 Let γ be a path in an open set U and g be a continuous function on U. If a is not on γ, define

h(a) = ∫_γ g(z)/(z − a) dz.

Then h is holomorphic on the complement of γ in U and has derivatives of all orders. They are given by

h^(n)(a) = n! ∫_γ g(z)/(z − a)^{n+1} dz.
Proof. Let b ∈ U and not on γ. Then there exists some r > 0 such that |z − b| ≥ r for all points z on γ. Let 0 < s < r. We shall see that h has a power series expansion in the ball U_s(b). We write

1/(z − a) = 1/(z − b − (a − b)) = (1/(z − b)) · 1/(1 − (a − b)/(z − b))
          = (1/(z − b)) ( 1 + (a − b)/(z − b) + ((a − b)/(z − b))^2 + ··· ).

This geometric series converges absolutely and uniformly for |a − b| ≤ s because

| (a − b)/(z − b) | ≤ s/r < 1.

Since g is continuous and γ is a compact set, g(z) is bounded on γ, such that by Theorem 6.6 the series Σ_{n=0}^∞ g(z) ((a − b)/(z − b))^n can be integrated term by term, and we find

h(a) = ∫_γ Σ_{n=0}^∞ g(z) (a − b)^n/(z − b)^{n+1} dz = Σ_{n=0}^∞ (a − b)^n ∫_γ g(z)/(z − b)^{n+1} dz = Σ_{n=0}^∞ c_n (a − b)^n,

where

c_n = ∫_γ g(z)/(z − b)^{n+1} dz.

This proves that h can be expanded into a power series in a neighborhood of b. By Proposition 14.1 and Corollary 14.2, h has derivatives of all orders in a neighborhood of b. By the formula in Corollary 14.2,

h^(n)(b) = n! c_n = n! ∫_γ g(z)/(z − b)^{n+1} dz.
Remark 14.6 There is an easy way to deduce the formula. Formally, we can exchange the differentiation d/da and ∫_γ:

h′(a) = d/da ∫_γ g(z)/(z − a) dz = ∫_γ d/da g(z)(z − a)^{−1} dz = ∫_γ g(z)/(z − a)^2 dz,

h″(a) = ∫_γ d/da g(z)(z − a)^{−2} dz = 2 ∫_γ g(z)/(z − a)^3 dz.
Theorem 14.10 Suppose that f is holomorphic in U and U_r(a) ⊂ U. Then f has a power series expansion in U_r(a),

f(z) = Σ_{n=0}^∞ c_n (z − a)^n.

In particular, f has derivatives of all orders, and we have the following coefficient formula:

c_n = f^(n)(a)/n! = (1/2πi) ∫_{S_r(a)} f(z)/(z − a)^{n+1} dz.    (14.10)
Proof. In view of Cauchy’s Integral Formula (Theorem 14.8) we obtain

f(a) = (1/2πi) ∫_{S_R(a)} f(z)/(z − a) dz.

Inserting g(z) = f(z)/(2πi) (f is continuous) into Theorem 14.9, we see that f can be expanded into a power series with center a and, therefore, has derivatives of all orders at a,

f^(n)(a) = (n!/2πi) ∫_{S_R(a)} f(z)/(z − a)^{n+1} dz.
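The coefficient formula (14.10) can be tested numerically for an explicit entire function. The sketch below is my own illustration (f = e^z at a = 0, circle radius and step count arbitrary): the contour integral should reproduce c_n = 1/n!.

```python
import cmath, math

# c_n = (1/2*pi*i) int_{S_r} e^z / z^(n+1) dz, expected to equal 1/n!.

def coeff(n, r=1.0, steps=4000):
    total = 0j
    for k in range(steps):
        t = 2 * math.pi * (k + 0.5) / steps
        z = r * cmath.exp(1j * t)
        total += cmath.exp(z) / z ** (n + 1) * 1j * z * (2 * math.pi / steps)
    return total / (2j * math.pi)

for n in range(5):
    print(n, coeff(n).real, 1 / math.factorial(n))  # the two columns agree
```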
14.2.4 Applications of the Coefficient Formula
Proposition 14.11 (Growth of Taylor Coefficients) Suppose that f is holomorphic in U and is bounded by M > 0 in U_r(a) ⊂ U; that is, |z − a| < r implies |f(z)| ≤ M. Let Σ_{n=0}^∞ c_n (z − a)^n be the power series expansion of f at a. Then we have

|c_n| ≤ M/r^n.    (14.11)
Proof. By the coefficient formula (14.10) and Remark 14.3 (c), noting that |f(z)/(z − a)^{n+1}| ≤ M/r^{n+1} for z ∈ S_r(a), we have

|c_n| ≤ (1/2π) | ∫_{S_r(a)} f(z)/(z − a)^{n+1} dz | ≤ (1/2π) (M/r^{n+1}) ℓ(S_r(a)) = (1/2π)(M/r^{n+1}) 2πr = M/r^n.
Theorem 14.12 (Liouville’s Theorem) A bounded entire function is constant.
Proof. Suppose that |f(z)| ≤ M for all z ∈ C. Since f is given by a power series f(z) = Σ_{n=0}^∞ c_n z^n with radius of convergence R = ∞, the previous proposition gives

|c_n| ≤ M/r^n

for all r > 0. This shows c_n = 0 for all n ≠ 0; hence f(z) = c_0 is constant.
Remarks 14.7 (a) Note that we explicitly assume f to be holomorphic on the entire complex
plane. For example, f (z) = e1/z is holomorphic and bounded outside every ball Uε (0). However, f is not constant.
(b) Note that f (z) = sin z is an entire function which is not constant. Hence, sin z is unbounded
as a complex function.
Theorem 14.13 (Fundamental Theorem of Algebra) A polynomial p(z) with complex coefficients of degree deg p ≥ 1 has a complex root.
Proof. Suppose to the contrary that p(z) ≠ 0 for all z ∈ C. It is known, see Example 3.3, that lim_{|z|→∞} |p(z)| = +∞. In particular, there exists R > 0 such that

|z| ≥ R ⟹ |p(z)| ≥ 1.
That is, f (z) = 1/p(z) is bounded by 1 if | z | ≥ R. On the other hand, f is a continuous
function and {z | | z | ≤ R} is a compact subset of C. Hence, f (z) = 1/p(z) is bounded on
UR , too. That is, f is bounded on the entire plane. By Liouville’s theorem, f is constant and so
is p. This contradicts our assumption deg p ≥ 1. Hence, p has a root in C.
Now, there is an inverse-like statement to Cauchy’s Theorem.
Theorem 14.14 (Morera’s Theorem) Let f : U → C be a continuous function where U ⊂ C
is open. Suppose that the integral of f along each closed triangular path [z1 , z2 , z3 ] in U is 0.
Then f is holomorphic in U.
Proof. Fix z_0 ∈ U. We show that f has an antiderivative in a small neighborhood U_ε(z_0) ⊂ U. For a ∈ U_ε(z_0) define

F(a) = ∫_{z_0}^{a} f(z) dz.

Note that F(a) takes the same value for all polygonal paths from z_0 to a, by the assumption of the theorem. We have

| (F(a + h) − F(a))/h − f(a) | = | (1/h) ∫_a^{a+h} (f(z) − f(a)) dz |,

where the integral on the right is over the segment from a to a + h and we used ∫_a^{a+h} c dz = ch. By Remark 14.3 (c), the right side is less than or equal to

(1/|h|) sup_{z ∈ U_{|h|}(a)} |f(z) − f(a)| · |h| = sup_{z ∈ U_{|h|}(a)} |f(z) − f(a)|.

Since f is continuous, the above term tends to 0 as h tends to 0. This shows that F is differentiable at a with F′(a) = f(a). Since F is holomorphic, by Theorem 14.10 it has derivatives of all orders; in particular f = F′ is holomorphic.
Corollary 14.15 Suppose that (fn ) is a sequence of holomorphic functions on U, uniformly
converging to f on U. Then f is holomorphic on U.
Proof. Since fn are continuous and uniformly converging, f is continuous on U. Let γ be any
closed triangular path in U. Since (fn ) converges uniformly, we may exchange integration and
limit:

∫_γ f(z) dz = ∫_γ lim_{n→∞} f_n(z) dz = lim_{n→∞} ∫_γ f_n(z) dz = lim_{n→∞} 0 = 0,

since each f_n is holomorphic. By Morera’s theorem, f is holomorphic in U.
Summary
Let U be a region and f : U → C be a function on U. The following are equivalent:
(a) f is holomorphic in U.
(b) f = u + iv is real differentiable and the Cauchy–Riemann equations u_x = v_y
and u_y = −v_x are satisfied in U.
(c) If U is simply connected: f is continuous and for every closed triangular path
γ = [z_1, z_2, z_3] in U, ∫_γ f(z) dz = 0 (Morera condition).
(d) f possesses locally an antiderivative, that is, for every a ∈ U there is a ball
Uε (a) ⊂ U and a holomorphic function F such that F ′ (z) = f (z) for all z ∈ Uε (a).
(e) f is continuous and for every ball U_r(a) whose closure is contained in U we have

f(b) = (1/2πi) ∫_{S_r(a)} f(z)/(z − b) dz,   ∀ b ∈ U_r(a).
(f) For a ∈ U there exists a ball with center a such that f can be expanded in that
ball into a power series.
(g) For every ball B which is completely contained in U, f can be expanded into a
power series in B.
14.2.5 Power Series
Since holomorphic functions are locally representable by power series, it is quite useful to know how to operate with power series. If a holomorphic function f is represented by a power series, we say that f is an analytic function. Thus, by Theorem 14.10, every holomorphic function is analytic and vice versa.
(a) Uniqueness
If both Σ c_n z^n and Σ b_n z^n converge in a ball around 0 and define the same function, then c_n = b_n for all n ∈ ℕ_0.
(b) Multiplication
If both Σ c_n z^n and Σ b_n z^n converge in a ball U_r(0) around 0, then

( Σ_{n=0}^∞ c_n z^n ) · ( Σ_{n=0}^∞ b_n z^n ) = Σ_{n=0}^∞ d_n z^n,   |z| < r,

where d_n = Σ_{k=0}^n c_{n−k} b_k.
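The convolution formula for d_n is a one-liner to test. The sketch below is my own illustration: multiplying the series of e^z with itself coefficientwise must reproduce the coefficients of e^{2z}, namely 2^n/n!.

```python
import math

# Cauchy product of power series coefficients: d_n = sum_k c_{n-k} b_k.

N = 10
c = [1 / math.factorial(n) for n in range(N)]   # coefficients of e^z
d = [sum(c[n - k] * c[k] for k in range(n + 1)) for n in range(N)]

print(d[:5])
print([2 ** n / math.factorial(n) for n in range(5)])  # same numbers: e^(2z)
```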
(c) The Inverse 1/f
Let f(z) = Σ_{n=0}^∞ c_n z^n be a convergent power series with c_0 ≠ 0. Then f(0) = c_0 ≠ 0 and, by continuity of f, there exists r > 0 such that the power series converges in the ball U_r(0) and is non-zero there. Hence, 1/f(z) is holomorphic in U_r(0) and therefore can be expanded into a convergent power series in U_r(0), see summary (f). Suppose that 1/f(z) = g(z) = Σ_{n=0}^∞ b_n z^n, |z| < r. Then f(z)g(z) = 1 = 1 + 0z + 0z^2 + ···, and the uniqueness together with (b) yields

1 = c_0 b_0,   0 = c_0 b_1 + c_1 b_0,   0 = c_0 b_2 + c_1 b_1 + c_2 b_0,   ···

This system of equations can be solved recursively for b_n, n ∈ ℕ_0; for example, b_0 = 1/c_0,
b_1 = −c_1 b_0/c_0.
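The recursion just described can be coded directly. The sketch below is my own illustration: for f(z) = 1 − z it must return the geometric series 1/(1 − z) = 1 + z + z^2 + ···, i.e. b_n = 1 for all n.

```python
# Recursive reciprocal of a power series: from sum_{k=0}^n c_k b_{n-k} = 0
# (n >= 1) we get b_n = -(sum_{k=1}^n c_k b_{n-k}) / c_0, with b_0 = 1/c_0.

def inverse_series(c, N):
    b = [1 / c[0]]
    for n in range(1, N):
        s = sum(c[k] * b[n - k] for k in range(1, min(n, len(c) - 1) + 1))
        b.append(-s / c[0])
    return b

print(inverse_series([1.0, -1.0], 6))  # [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```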
(d) Double Series
Suppose that

f_k(z) = Σ_{n=0}^∞ c_{kn} (z − a)^n,   k ∈ ℕ,

are power series converging in U_r(a). Suppose further that the series Σ_{k=1}^∞ f_k(z) converges locally uniformly in U_r(a) as well. Then

Σ_{k=1}^∞ f_k(z) = Σ_{n=0}^∞ ( Σ_{k=1}^∞ c_{kn} ) (z − a)^n.

In particular, one can form the sum of a locally uniformly convergent series Σ f_k(z) of power series coefficientwise. Note that a series of functions Σ_{k=1}^∞ f_k(z) converges locally uniformly at b if there exists ε > 0 such that the series converges uniformly in U_ε(b).
Note that any locally uniformly converging series of holomorphic functions defines a holomorphic function (Theorem of Weierstraß). Indeed, since the series converges uniformly, line integral and summation can be exchanged: let γ = [z_0, z_1, z_2] be any closed triangular path inside U; then by Cauchy’s theorem

∫_γ f(z) dz = ∫_γ Σ_{k=1}^∞ f_k(z) dz = Σ_{k=1}^∞ ∫_γ f_k(z) dz = Σ_{k=1}^∞ 0 = 0.

By Morera’s theorem, f(z) = Σ_{k=1}^∞ f_k(z) is holomorphic.
(e) Change of Center
Let f(z) = Σ_{n=0}^∞ c_n (z − a)^n be convergent in U_r(a), r > 0, and b ∈ U_r(a). Then f can be expanded into a power series with center b,

f(z) = Σ_{n=0}^∞ b_n (z − b)^n,   b_n = f^(n)(b)/n!,

with radius of convergence at least r − |b − a|. Also, the coefficients can be obtained by reordering for powers of (z − b)^k using the binomial formula

(z − a)^n = (z − b + b − a)^n = Σ_{k=0}^n (n choose k) (z − b)^k (b − a)^{n−k}.
(f) Composition
We restrict ourselves to the case
f (z) = a0 + a1 z + a2 z 2 + · · ·
g(z) = b1 z + b2 z 2 + · · ·
where g(0) = 0 and therefore the image under g of a small neighborhood of 0 lies in a small neighborhood of 0, on which we assume the first power series f to be defined; thus, f(g(z)) is defined and holomorphic in a certain neighborhood of 0, see Remark 14.1. Hence
h(z) = f (g(z)) = c0 + c1 z + c2 z 2 + · · ·
where the coefficients cn = h(n) (0)/n! can be computed using the chain rule, for example,
c0 = f (g(0)) = a0 , c1 = f ′ (g(0))g ′(0) = a1 b1 .
(g) The Composition Inverse f −1
Suppose that f(z) = Σ_{n=1}^∞ a_n z^n, a_1 ≠ 0, has radius of convergence r > 0. Then there exists a power series g(z) = Σ_{n=1}^∞ b_n z^n converging on U_ε(0) such that f(g(z)) = z = g(f(z)) for all z ∈ U_ε(0). Using (f) and the uniqueness, the coefficients b_n can be computed recursively.
Example 14.8 (a) The function

f(z) = 1/(1 + z^2) + 1/(3 − z)

is holomorphic in C \ {i, −i, 3}. Expanding f into a power series with center 1, the closest singularities to 1 are ±i. Since the disc of convergence cannot contain ±i, the radius of convergence is |1 − i| = √2. Expanding the power series around a = 2, the closest singularity of f is 3; hence, the radius of convergence is now |3 − 2| = 1.
(b) Change of center. We want to expand f(z) = 1/(1 − z), which is holomorphic in C \ {1}, into a power series around b = i/2. For arbitrary b with |b| < 1 we have

1/(1 − z) = 1/(1 − b − (z − b)) = (1/(1 − b)) · 1/(1 − (z − b)/(1 − b)) = Σ_{n=0}^∞ (z − b)^n/(1 − b)^{n+1} = f̃(z).

By the root test, the radius of convergence of this series is |1 − b|. In case b = i/2 we have r = |1 − i/2| = √(1 + 1/4) = √5/2. Note that the power series 1 + z + z^2 + ··· has radius of convergence 1 and a priori defines an analytic (= holomorphic) function in the open unit ball. However, changing the center we obtain an analytic continuation of f to a larger region. This example shows that (under certain assumptions) analytic functions can be extended into a larger region by changing the center of the series.
14.3 Local Properties of Holomorphic Functions
We omit the proof of the Open Mapping Theorem and refer to [Jän93, Satz 11, Satz 13] see
also [Con78].
Theorem 14.16 (Open Mapping Theorem) Let G be a region and f a non-constant holomorphic function on G. Then for every open subset U ⊂ G, f (U) is open.
The main idea is to show that any function f holomorphic at some point a with f(a) = 0 looks like a power function; that is, there exists a positive integer k such that f(z) = h(z)^k in a small neighborhood of a, where h is holomorphic at a with a zero of order 1 at a.
Theorem 14.17 (Maximum Modulus Theorem) Let f be holomorphic in the region U and
a ∈ U is a point such that | f (a) | ≥ | f (z) | for all z ∈ U. Then f must be a constant function.
Proof. First proof. Let V = f(U) and b = f(a). By assumption, |b| ≥ |w| for all w ∈ V.
b cannot be an inner point of V since otherwise there is some c in the neighborhood of b with |c| > |b|, which contradicts the assumption. Hence b is in the boundary, ∂V ∩ V. In particular, V is not open. Hence, the Open Mapping Theorem says that f is constant.
Second proof. We give a direct proof using Cauchy’s Integral Formula. For simplicity let a = 0 and let Ū_r(0) ⊆ U be a small ball in U. With γ = S_r(0), z = γ(t) = re^{it}, t ∈ [0, 2π], dz = rie^{it} dt, we get

f(0) = (1/2πi) ∫_0^{2π} f(re^{it}) rie^{it}/(re^{it}) dt = (1/2π) ∫_0^{2π} f(re^{it}) dt.
In other words, f(0) is the arithmetic mean of the values of f on any circle with center 0.
Let M = |f(0)| ≥ |f(z)| be the maximal modulus of f on U. Suppose there exists z_0 = r_0 e^{it_0} with r_0 < r and |f(z_0)| < M. Since f is continuous, there exists a whole neighborhood U_ε(t_0) of t_0 with |f(r_0 e^{it})| < M for t ∈ U_ε(t_0). However, in this case

M = |f(0)| = | (1/2π) ∫_0^{2π} f(r_0 e^{it}) dt | ≤ (1/2π) ∫_0^{2π} |f(r_0 e^{it})| dt < M,

which contradicts the mean value property. Hence, |f(z)| = M is constant in any sufficiently small neighborhood of 0. Let z_1 ∈ U be any point in U. We connect 0 and z_1 by a path in U. Let d be its distance from the boundary ∂U. Let z move continuously from 0 to z_1 and consider the chain of balls with center z and radius d/2. By the above, |f(z)| = M in any such ball; hence |f(z)| = M in U. It follows from homework 47.2 that f is constant.
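The mean value property underlying this proof is simple to observe numerically. The sketch below is my own illustration (f = e^z; radii and sample count are arbitrary choices): the average of f over any circle about 0 equals f(0) = 1.

```python
import cmath, math

# Average of a holomorphic function over circles about 0 equals its value at 0.

def circle_mean(f, r, steps=2000):
    return sum(f(r * cmath.exp(2j * math.pi * k / steps))
               for k in range(steps)) / steps

print(circle_mean(cmath.exp, 0.5))  # ~ 1 = e^0
print(circle_mean(cmath.exp, 2.0))  # ~ 1 as well
```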
Remark 14.8 In other words, if f is holomorphic in G and U ⊂ G, then sup_{z∈U} |f(z)| is attained on the boundary ∂U. Note that both theorems fail in the real setting: the image of the open set (0, 2π) under the sine function is [−1, 1], which is not open. The maximum of f(x) = 1 − x^2 over (−1, 1) is not attained on the boundary, since f(−1) = f(1) = 0 while f(0) = 1. However, |z^2 − 1| on the closed complex unit ball attains its maximum at z = ±i, that is, on the boundary.
Recall from topology:
• An accumulation point of a set M ⊂ C is a point of z ∈ C such that for every ε > 0,
Uε (z) contains infinitely many elements of M. Accumulation points are in the closure of
M, not necessarily in the boundary of M. The set of accumulation points of M is closed
(Indeed, suppose that a is in the closure of the set of accumulation points of M. Then
every neighborhood of a meets the set of accumulation points of M; in particular, every
neighborhood contains infinitely many elements of M. Hence a itself is an accumulation point
of M).
• M is connected if every locally constant function is constant. M is not connected if M is
the disjoint union of two non-empty subsets A and B, both are open and closed in M.
For example, M = {1/n | n ∈ N} has no accumulation point in C \ {0} but it has one
accumulation point, 0, in C.
Proposition 14.18 Let U be a region and let f : U → C be holomorphic in U.
If the set Z(f ) of the zeros of f has an accumulation point in U, then f is identically 0 in U.
Example 14.9 Consider the holomorphic function f(z) = sin(1/z), f : U → C, U = C \ {0}, with the set of zeros Z(f) = {z_n = 1/(nπ) | n ∈ Z, n ≠ 0}. The only accumulation point of Z(f), namely 0, does not belong to U. The proposition does not apply.
Proof. Suppose a ∈ U is an accumulation point of Z(f ). Expand f into a power series with
center a:
f(z) = Σ_{n=0}^{∞} cn (z − a)^n,   | z − a | < r.
Since a is an accumulation point of the zeros, there exists a sequence (zn ) of zeros converging
to a. Since f is continuous at a, limn→∞ f (zn ) = 0 = f (a). This shows c0 = 0. The same
argument works with the function

f1(z) = f(z)/(z − a) = c1 + c2 (z − a) + c3 (z − a)² + · · · ,

which is holomorphic in the same ball with center a and has a as an accumulation point of zeros. Hence, c1 = 0. In the same way we conclude that c2 = c3 = · · · = cn = · · · = 0. This shows that f is identically 0 on Ur(a). That is, the set

A = {a ∈ U | a is an accumulation point of Z(f) }
is an open set. Also,
B = {a ∈ U | a is not an accumulation point of Z(f ) }
is open (around every non-accumulation point z there is a whole neighborhood of z containing no accumulation points of Z(f)). Now, U is the disjoint union of A and B, both open as
well as closed in U. Hence, the characteristic function on A is a locally constant function on U.
Since U is connected, either U = A or U = B. Since by assumption A is non-empty, A = U,
that is f is identically 0 on U.
Theorem 14.19 (Uniqueness Theorem) Suppose that f and g are both holomorphic functions
on U and U is a region. Then the following are equivalent:
(a) f = g
(b) The set D = {z ∈ U | f (z) = g(z)} where f and g are equal has an accumulation point in U.
(c) There exists z0 ∈ U such that f^{(n)}(z0) = g^{(n)}(z0) for all non-negative integers n ∈ N0.
Proof. (a) ↔ (b). Apply the previous proposition to the function f − g.
(a) implies (c) is trivial. Suppose that (c) is satisfied. Then, the power series expansion of
f − g at z0 is identically 0. In particular, the set Z(f − g) contains a ball Bε (z0 ) which has an
accumulation point. Hence, f − g = 0.
The following proposition is an immediate consequence of the uniqueness theorem.
Proposition 14.20 (Uniqueness of Analytic Continuation) Suppose that M ⊂ U ⊂ C where
U is a region and M has an accumulation point in U. Let g be a function on M and suppose
that f is a holomorphic function on U which extends g, that is, f(z) = g(z) on M.
Then f is unique.
Remarks 14.9 (a) The previous proposition shows a quite amazing property of a holomorphic
function: It is completely determined by “very few values”. This is in a striking contrast to
C∞-functions on the real line. For example, the "hat function"

h(x) = { e^{−1/(1−x²)},   | x | < 1,
         0,               | x | ≥ 1,

is identically 0 on [2, 3] (a set with accumulation points); however, h is not identically 0. This shows that h is not the restriction of a holomorphic function.
(b) For the uniqueness theorem, it is an essential point that U is connected.
(c) It is now clear that the real functions e^x, sin x, and cos x have a unique analytic continuation
into the complex plane.
(d) The algebra O(U) of holomorphic functions on a region U is a domain, that is, f g = 0 implies f = 0 or g = 0. Indeed, suppose that f(z0) ≠ 0; then f(z) ≠ 0 in a certain neighborhood of z0 (by continuity of f ). Then g = 0 on that neighborhood. Since every point of a non-empty open set is an accumulation point of that set, g = 0.
14.4 Singularities
We consider functions which are holomorphic in a punctured ball Ůr(a) = Ur(a) \ {a}. From information about the behaviour of the function near the center a, a number of interesting and useful results will be derived. In particular, we will use these results to evaluate certain improper integrals over the real line which cannot be evaluated by methods of real calculus.
14.4.1 Classification of Singularities
Throughout this subsection U is a region, a ∈ U, and f : U \ {a} → C is holomorphic.
Definition 14.4 (a) Let f be holomorphic in U \ {a} where U is a region and a ∈ U. Then a is
said to be an isolated singularity.
(b) The point a is called a removable singularity if there exists a holomorphic function
g : Ur (a) → C such that g(z) = f (z) for all z with 0 < | z − a | < r.
Example 14.10 The functions sin(z)/z, 1/z, and e^{1/z} all have isolated singularities at 0. However, only f(z) = sin(z)/z has a removable singularity. The holomorphic function g(z) which coincides with f on C \ {0} is g(z) = 1 − z²/3! + z⁴/5! − + · · · . Hence, redefining f(0) := g(0) = 1 makes f holomorphic in C. We will see later that the other two singularities are not removable.
It is convenient to denote the new function g, with one more point in its domain (namely a), also by f.
Proposition 14.21 (Riemann, 1851) Suppose that f : U \ {a} → C, a ∈ U, is holomorphic. Then a is a removable singularity of f if and only if there exists a punctured neighborhood Ůr(a) on which f is bounded.
Proof. The necessity of the condition follows from the fact, that a holomorphic function g is
continuous and the continuous function | g(z) | defined on the compact set Ur/2 (a) is bounded;
hence f is bounded.
For the sufficiency we assume, without loss of generality, a = 0 (if a is non-zero, consider the function f̃(z) = f(z + a) instead). The function

h(z) = { z² f(z),   z ≠ 0,
         0,         z = 0,

is holomorphic in Ůr(0). Moreover, h is differentiable at 0 since f is bounded in a neighborhood of 0 and

h′(0) = lim_{z→0} (h(z) − h(0))/z = lim_{z→0} z f(z) = 0.
Thus, h can be expanded into a power series at 0,
h(z) = c0 + c1 z + c2 z² + c3 z³ + · · · = c2 z² + c3 z³ + · · ·
with c0 = c1 = 0 since h(0) = h′ (0) = 0. For non-zero z we have
f(z) = h(z)/z² = c2 + c3 z + c4 z² + · · · .
The right side defines a holomorphic function in a neighborhood of 0 which coincides with f
for z ≠ 0. Setting f(0) := c2 removes the singularity at 0.
Definition 14.5 (a) An isolated singularity a of f is called a pole of f if there exists a positive
integer m ∈ N and a holomorphic function g : Ur (a) → C such that
f(z) = g(z)/(z − a)^m.
The smallest number m such that (z − a)m f (z) has a removable singularity at a is called the
order of the pole.
(b) An isolated singularity a of f which is neither removable nor a pole is called an essential
singularity.
(c) If f is holomorphic at a and there exists a positive integer m and a holomorphic function g
such that f (z) = (z − a)m g(z), and g(a) 6= 0, a is called a zero of order m of f .
Note that m = 0 corresponds to removable singularities. If f (z) has a zero of order m at a,
1/f (z) has a pole of order m at a and vice versa.
Example 14.11 The function f(z) = 1/z² has a pole of order 2 at z = 0 since z² f(z) = 1 has a removable singularity at 0 while z f(z) = 1/z does not. The function f(z) = (cos z − 1)/z³ has a pole of order 1 at 0 since (cos z − 1)/z³ = −1/(2z) + z/4! ∓ · · · .
14.4.2 Laurent Series
In a neighborhood of an isolated singularity a holomorphic function cannot, in general, be expanded into a power series; it can, however, be expanded into a so-called Laurent series.
Definition 14.6 A Laurent series with center a is a series of the form
Σ_{n=−∞}^{∞} cn (z − a)^n,

or more precisely the pair of series

f−(z) = Σ_{n=1}^{∞} c_{−n} (z − a)^{−n}   and   f+(z) = Σ_{n=0}^{∞} cn (z − a)^n.
The Laurent series is said to be convergent if both series converge.
(Figure: f+ converges in the disc | z − a | < R; f− converges in the exterior | z − a | > r.)

Remark 14.10 (a) f−(z) is "a power series in 1/(z − a)." Thus, we can derive facts about the convergence of Laurent series from the convergence of power series. In fact, suppose that 1/r is the radius of convergence of the power series Σ_{n=1}^{∞} c_{−n} ζ^n and R is the radius of convergence of the series Σ_{n=0}^{∞} cn (z − a)^n; then the Laurent series Σ_{n∈Z} cn (z − a)^n converges in the annulus A_{r,R}(a) = {z | r < | z − a | < R} and defines a holomorphic function there.
(b) The power series f+(z) = Σ_{n≥0} cn (z − a)^n converges in the interior of the ball U_R(a), whereas the series with negative powers, called the principal part of the Laurent series, f−(z) = Σ_{n<0} cn (z − a)^n, converges in the exterior of the ball U_r(a). Since both series must converge, f(z) converges in the intersection of the two domains, which is the annulus A_{r,R}(a).
The easiest way to determine the type of an isolated singularity is to use the Laurent series, which is, roughly speaking, a power series with both positive and negative powers of z − a.
Proposition 14.22 Suppose that f is holomorphic in the open annulus Ar,R (a) =
{z | r < | z − a | < R}. Then f (z) has an expansion in a convergent Laurent series for
z ∈ Ar,R
f(z) = Σ_{n=0}^{∞} cn (z − a)^n + Σ_{n=1}^{∞} c_{−n} / (z − a)^n   (14.12)

with coefficients

cn = (1/(2πi)) ∫_{S_ρ(a)} f(z)/(z − a)^{n+1} dz,   n ∈ Z,   (14.13)

where r < ρ < R. The series converges uniformly on every annulus A_{s1,s2}(a) with r < s1 ≤ s2 < R.
Proof. (Figure: the annulus with inner circle S_{s1}(a), outer circle S_{s2}(a), the point z, and the closed path γ.)
Let z be in the annulus A_{s1,s2} and let γ be the closed path around z in the annulus, consisting of the two circles −S_{s1}(a), S_{s2}(a) and the two "bridges." By Cauchy's integral formula,

f(z) = (1/(2πi)) ∫_γ f(w)/(w − z) dw = f1(z) + f2(z)
     = (1/(2πi)) ∫_{S_{s2}(a)} f(w)/(w − z) dw − (1/(2πi)) ∫_{S_{s1}(a)} f(w)/(w − z) dw.
We consider the two functions

f1(z) = (1/(2πi)) ∫_{S_{s2}(a)} f(w)/(w − z) dw   and   f2(z) = −(1/(2πi)) ∫_{S_{s1}(a)} f(w)/(w − z) dw

separately.
In what follows, we will see that f1(z) is a power series Σ_{n=0}^{∞} cn (z − a)^n and f2(z) = Σ_{n=1}^{∞} c_{−n}/(z − a)^n. The first part is completely analogous to the proof of Theorem 14.9.
Case 1. w ∈ S_{s2}(a). Then | z − a | < | w − a | and | q | = | (z − a)/(w − a) | < 1, such that

1/(w − z) = (1/(w − a)) · 1/(1 − (z − a)/(w − a)) = (1/(w − a)) Σ_{n=0}^{∞} q^n = Σ_{n=0}^{∞} (z − a)^n/(w − a)^{n+1}.
Since f (w) is bounded on Ss2 (a), the geometric series has a converging numerical upper bound.
Hence, the series converges uniformly with respect to w; we can exchange integration and
summation:
f1(z) = (1/(2πi)) ∫_{S_{s2}(a)} f(w) Σ_{n=0}^{∞} (z − a)^n/(w − a)^{n+1} dw
      = Σ_{n=0}^{∞} ( (1/(2πi)) ∫_{S_{s2}(a)} f(w)/(w − a)^{n+1} dw ) (z − a)^n
      = Σ_{n=0}^{∞} cn (z − a)^n,

where cn = (1/(2πi)) ∫_{S_{s2}(a)} f(w)/(w − a)^{n+1} dw are the coefficients of the power series f1(z).
Case 2. w ∈ S_{s1}(a). Then | z − a | > | w − a | and | q | = | (w − a)/(z − a) | < 1, such that

1/(w − z) = −(1/(z − a)) · 1/(1 − (w − a)/(z − a)) = −(1/(z − a)) Σ_{n=0}^{∞} q^n = −Σ_{n=0}^{∞} (w − a)^n/(z − a)^{n+1}.
Since f (w) is bounded on Ss1 (a), the geometric series has a converging numerical upper bound.
Hence, the series converges uniformly with respect to w; we can exchange integration and
summation:

f2(z) = −(1/(2πi)) ∫_{S_{s1}(a)} f(w) ( −Σ_{n=0}^{∞} (w − a)^n/(z − a)^{n+1} ) dw
      = Σ_{n=0}^{∞} ( (1/(2πi)) ∫_{S_{s1}(a)} f(w) (w − a)^n dw ) (z − a)^{−(n+1)}
      = Σ_{n=1}^{∞} c_{−n} (z − a)^{−n},

where c_{−n} = (1/(2πi)) ∫_{S_{s1}(a)} f(w) (w − a)^{n−1} dw are the coefficients of the series f2(z).
Since the integrand f(w)/(w − a)^k, k ∈ Z, is holomorphic in both annuli A_{s1,ρ} and A_{ρ,s2}, by Remark 14.4 (c)

∫_{S_{s2}(a)} f(w)/(w − a)^k dw = ∫_{S_ρ(a)} f(w)/(w − a)^k dw   and   ∫_{S_{s1}(a)} f(w)/(w − a)^k dw = ∫_{S_ρ(a)} f(w)/(w − a)^k dw,

that is, in the coefficient formulas we can replace both circles S_{s1}(a) and S_{s2}(a) by a common circle S_ρ(a). Since a power series converges uniformly on every compact subset of the disc of convergence, the last assertion follows.
Remark 14.11 The Laurent series of f on Ar,R (a) is unique. Its coefficients cn , n ∈ Z are
uniquely determined by (14.13). Another value of ρ with r < ρ < R yields the same values cn
by Remark 14.4 (c).
Example 14.12 Find the Laurent expansion of

f(z) = 2/(z² − 4z + 3)

in the three annuli with midpoint 0: 0 < | z | < 1, 1 < | z | < 3, and 3 < | z |.
Using partial fraction decomposition, f(z) = 1/(1 − z) + 1/(z − 3).
(a) In the case | z | < 1 we find

1/(3 − z) = (1/3) · 1/(1 − z/3) = (1/3) Σ_{n=0}^{∞} (z/3)^n.

Hence, together with

1/(1 − z) = Σ_{n=0}^{∞} z^n,

we obtain

f(z) = Σ_{n=0}^{∞} (1 − 1/3^{n+1}) z^n,   | z | < 1.

(b) In the case | z | > 1,

1/(1 − z) = −(1/z) · 1/(1 − 1/z) = −Σ_{n=0}^{∞} 1/z^{n+1},

and, as in (a), for | z | < 3 we have 1/(z − 3) = −(1/3) Σ_{n=0}^{∞} (z/3)^n, such that

f(z) = Σ_{n=0}^{∞} ( −1/z^{n+1} − z^n/3^{n+1} ),   1 < | z | < 3.

(c) In case | z | > 3 we have

1/(z − 3) = (1/z) · 1/(1 − 3/z) = Σ_{n=0}^{∞} 3^n/z^{n+1},

such that

f(z) = Σ_{n=1}^{∞} (3^{n−1} − 1) / z^n,   | z | > 3.
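The coefficients found in case (b) can be double-checked numerically via formula (14.13): on a circle of radius 2 inside the annulus 1 < | z | < 3, the contour integral for cn reduces to an average over the circle. A minimal Python sketch (the sample count is an arbitrary choice):

```python
import cmath

def laurent_coeff(f, n, radius, samples=20000):
    """c_n = (1/2 pi i) times the integral of f(z)/z^(n+1) over |z| = radius
    (center 0). With z = radius*e^(it), dz = iz dt, the integrand becomes
    f(z)*z^(-n)*i dt, so c_n is simply the average of f(z)*z^(-n) over the circle."""
    total = 0j
    for k in range(samples):
        z = radius * cmath.exp(2j * cmath.pi * k / samples)
        total += f(z) * z ** (-n)
    return total / samples

f = lambda z: 2 / (z * z - 4 * z + 3)

# expected from case (b): c_n = -1/3^(n+1) for n >= 0 and c_(-n) = -1 for n >= 1
c0 = laurent_coeff(f, 0, 2.0)
c_minus_1 = laurent_coeff(f, -1, 2.0)
c2 = laurent_coeff(f, 2, 2.0)
```

Since the integrand is smooth and periodic in t, this trapezoidal average converges extremely fast.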
We want to study the behaviour of f in a neighborhood of an essential singularity a. It is characterized by the following theorem. For the proof see Conway, [Con78, p. 300].

Theorem 14.23 (Great Picard Theorem (1879)) Suppose that f(z) is holomorphic in the annulus G = {z | 0 < | z − a | < r} with an essential singularity at a. Then there exists a complex number w1 with the following property: for any complex number w ≠ w1, there are infinitely many z ∈ G with f(z) = w.

In other words, in every neighborhood of a the function f(z) takes all complex values with possibly one exception. In case of f(z) = e^{1/z} the number 0 is omitted; in case of f(z) = sin(1/z) no complex number is omitted.
We will prove a much weaker form of this statement.
Proposition 14.24 (Casorati–Weierstraß) Suppose that f (z) is holomorphic in the annulus
G = {z | 0 < | z − a | < r} with an essential singularity at a.
Then the image of any neighborhood of a in G is dense in C, that is for every w ∈ C and any
ε > 0 and δ > 0 there exists z ∈ G such that | z − a | < δ and | f (z) − w | < ε.
Proof. For simplicity, assume that δ < r. Assume to the contrary that there exist w ∈ C and ε > 0 such that | f(z) − w | ≥ ε for all z ∈ Ůδ(a). Then the function

g(z) = 1/(f(z) − w),   z ∈ Ůδ(a),

is bounded (by 1/ε) in some neighborhood of a; hence, by Proposition 14.21, a is a removable singularity of g(z). We conclude that

f(z) = 1/g(z) + w

has a removable singularity at a if g(a) ≠ 0. If, on the other hand, g(z) has a zero at a of order m, that is,

g(z) = Σ_{n=m}^{∞} cn (z − a)^n,   cm ≠ 0,
the function (z − a)m f (z) has a removable singularity at a. Thus, f has a pole of order m at a.
Both conclusions contradict our assumption that f has an essential singularity at a.
The Laurent expansion establishes an easy classification of the singularity of f at a. We summarize the main facts about isolated singularities.

Proposition 14.25 Suppose that f(z) is holomorphic in the punctured disc U = ŮR(a) and possesses there the Laurent expansion f(z) = Σ_{n=−∞}^{∞} cn (z − a)^n.
Then the singularity at a
(a) is removable if cn = 0 for all n < 0. In this case, | f (z) | is bounded in U.
(b) is a pole of order m if c_{−m} ≠ 0 and cn = 0 for all n < −m. In this case, lim_{z→a} | f (z) | = +∞.
(c) is an essential singularity if cn ≠ 0 for infinitely many n < 0. In this case, | f (z) | has no finite or infinite limit as z → a.
The easy proof is left to the reader. Note that Casorati–Weierstraß implies that | f (z) | has no limit at a.
Example 14.13 f(z) = e^{1/z} has in C \ {0} the Laurent expansion

e^{1/z} = Σ_{n=0}^{∞} 1/(n! z^n),   | z | > 0.

Since c_{−n} ≠ 0 for all n, f has an essential singularity at 0.
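The omitted-value statement for e^{1/z} can be made concrete: solving e^{1/z} = w for any w ≠ 0 gives z_k = 1/(Log w + 2πik), a sequence of preimages accumulating at 0. A short Python check (the target value w is an arbitrary choice):

```python
import cmath

# For any w != 0, e^(1/z) = w is solved by z_k = 1/(Log w + 2*pi*i*k), k in Z.
# As k grows, z_k -> 0, so every punctured neighborhood of 0 contains points
# where e^(1/z) takes the value w; only w = 0 is omitted.
w = 2.0 - 3.0j
z_list = [1 / (cmath.log(w) + 2j * cmath.pi * k) for k in range(1, 6)]
values = [cmath.exp(1 / z) for z in z_list]
```

Each z_k indeed satisfies e^{1/z_k} = w, and the |z_k| shrink toward the singularity.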
14.5 Residues
Throughout U ⊂ C is an open connected subset of C.
Definition 14.7 Suppose that f : Ůr(a) → C is holomorphic, 0 < r1 < r, and let

f(z) = Σ_{n∈Z} cn (z − a)^n,   cn = (1/(2πi)) ∫_{S_{r1}(a)} f(z)/(z − a)^{n+1} dz,

be the Laurent expansion of f in the annulus {z | 0 < | z − a | < r}. Then the coefficient

c_{−1} = (1/(2πi)) ∫_{S_{r1}(a)} f(z) dz

is called the residue of f at a and is denoted by Res_a f(z) or Res_a f.
Remarks 14.12 (a) If f is holomorphic at a, Res_a f = 0 by Cauchy's theorem.
(b) The integral ∫_γ f(z) dz depends only on the coefficient c_{−1} in the Laurent expansion of f(z) around a. Indeed, every summand cn (z − a)^n with n ≠ −1 has an antiderivative in U \ {a}, such that its integral over a closed path is 0.
(c) Res_a f + Res_a g = Res_a (f + g) and Res_a (λf) = λ Res_a f.
Theorem 14.26 (Residue Theorem) Suppose that f : U \ {a1, . . . , am} → C, a1, . . . , am ∈ U, is holomorphic. Further, let γ be a non-self-intersecting, positively oriented, closed curve in U such that the points a1, . . . , am are in the inner part of γ. Then

∫_γ f(z) dz = 2πi Σ_{k=1}^{m} Res_{ak} f.   (14.14)
Proof. (Figure: the curve γ with small circles around the singularities a1, . . . , am, joined to γ by bridges δ1, δ2, . . . .)
As in Remark 14.4 we can replace γ by the sum of integrals over small circles, one around each singularity. As before, we obtain

∫_γ f(z) dz = Σ_{k=1}^{m} ∫_{S_ε(ak)} f(z) dz,

where all circles are positively oriented. Applying the definition of the residue we obtain the assertion.
Remarks 14.13 (a) The residue theorem generalizes Cauchy's Theorem, see Theorem 14.6. Indeed, if f(z) possesses an analytic continuation to the points a1, . . . , am, all the residues are zero and therefore ∫_γ f(z) dz = 0.
(b) If g(z) is holomorphic in the region U, g(z) = Σ_{n=0}^{∞} cn (z − a)^n, c0 = g(a), then

f(z) = g(z)/(z − a),   z ∈ U \ {a},

is holomorphic in U \ {a} with Laurent expansion around a

f(z) = c0/(z − a) + c1 + c2 (z − a) + · · · ,

where c0 = g(a) = Res_a f. The residue theorem gives

∫_{S_r(a)} g(z)/(z − a) dz = 2πi Res_a f = 2πi c0 = 2πi g(a).

We recovered Cauchy's integral formula.
14.5.1 Calculating Residues

(a) Pole of order 1

As in the previous Remark, suppose that f has a pole of order 1 at a and g(z) = (z − a) f(z) is the corresponding holomorphic function in Ur(a). Then

Res_a f = g(a) = lim_{z→a, z≠a} g(z) = lim_{z→a} (z − a) f(z).   (14.15)

(b) Pole of order m

Suppose that f has a pole of order m at a. Then

f(z) = c_{−m}/(z − a)^m + c_{−m+1}/(z − a)^{m−1} + · · · + c_{−1}/(z − a) + c0 + c1 (z − a) + · · · ,   0 < | z − a | < r,   (14.16)

is the Laurent expansion of f around a. Multiplying (14.16) by (z − a)^m yields a holomorphic function

(z − a)^m f(z) = c_{−m} + c_{−m+1} (z − a) + · · · + c_{−1} (z − a)^{m−1} + · · · ,   | z − a | < r.

Differentiating this (m − 1) times, all terms with coefficients c_{−m}, c_{−m+1}, . . . , c_{−2} vanish and we are left with the power series

d^{m−1}/dz^{m−1} ( (z − a)^m f(z) ) = (m − 1)! c_{−1} + m(m − 1) · · · 2 · c0 (z − a) + · · · .

Inserting z = a on the left, we obtain c_{−1}. However, on the left we have to take the limit z → a since f is not defined at a. Thus, if f has a pole of order m at a,

Res_a f(z) = (1/(m − 1)!) lim_{z→a} d^{m−1}/dz^{m−1} ( (z − a)^m f(z) ).   (14.17)
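As a sanity check of (14.17), the following Python sketch computes Res_0 e^z/z³ in two independent ways: from the contour-integral definition of c_{−1}, and from the derivative formula, which for this pole of order 3 gives (1/2!)(e^z)″|_{z=0} = 1/2. (The test function, radius, and sample count are arbitrary choices.)

```python
import cmath

def residue_at(f, a, radius=0.5, samples=20000):
    """Res_a f = c_(-1) = (1/2 pi i) times the integral of f(z) dz over a small
    circle around a. With z = a + radius*e^(it), dz = i(z - a) dt, this is the
    average of f(z)*(z - a) over the circle."""
    total = 0j
    for k in range(samples):
        z = a + radius * cmath.exp(2j * cmath.pi * k / samples)
        total += f(z) * (z - a)
    return total / samples

# f(z) = e^z / z^3 has a pole of order 3 at 0; formula (14.17) gives
# Res_0 f = (1/(3-1)!) * d^2/dz^2 (z^3 f(z)) at z = 0, i.e. (1/2)(e^z)''|_0 = 1/2.
res = residue_at(lambda z: cmath.exp(z) / z ** 3, 0.0)
```

Both routes agree on the value 1/2, the coefficient of z² in the series of e^z.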
(c) Quotients of Holomorphic Functions

Suppose that f = p/q where p and q are holomorphic at a and q has a zero of order 1 at a, that is, q(a) = 0 ≠ q′(a). Then, by (a),

Res_a (p/q) = lim_{z→a} (z − a) p(z)/q(z) = lim_{z→a} p(z) / lim_{z→a} (q(z) − q(a))/(z − a) = p(a)/q′(a).   (14.18)
Example 14.14 Compute ∫_{S1(i)} dz/(1 + z⁴).
The only singularities of f(z) = 1/(1 + z⁴) inside the disc {z | | z − i | < 1} are a1 = e^{πi/4} = (1 + i)/√2 and a2 = e^{3πi/4} = (−1 + i)/√2. Indeed, | a1 − i |² = 2 − √2 < 1. (Figure: the circle S1(i) around i containing the two poles a1 and a2.) We apply the Residue Theorem and (c) and obtain

∫_{S1(i)} dz/(1 + z⁴) = 2πi ( Res_{a1} f + Res_{a2} f ) = 2πi ( 1/(4a1³) + 1/(4a2³) ) = 2πi · (−a1 − a2)/4 = (√2 π)/2,

where we used 1/(4a³) = a/(4a⁴) = −a/4 for a⁴ = −1.
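This value can be verified by brute-force numerical integration over the circle S1(i); the following sketch parametrizes z = i + e^{it} (the sample count is an arbitrary choice):

```python
import cmath
import math

def circle_integral(f, center, radius, samples=20000):
    """Integral of f(z) dz over the positively oriented circle of given center
    and radius, using z = center + radius*e^(it), dz = i*(z - center) dt."""
    total = 0j
    for k in range(samples):
        t = 2 * math.pi * k / samples
        z = center + radius * cmath.exp(1j * t)
        total += f(z) * 1j * (z - center)
    return total * (2 * math.pi / samples)

val = circle_integral(lambda z: 1 / (1 + z ** 4), 1j, 1.0)
expected = math.sqrt(2) * math.pi / 2  # = 2*pi*i*(Res_a1 f + Res_a2 f)
```

The trapezoidal rule on this periodic analytic integrand converges geometrically, so `val` matches the residue-theorem value to high accuracy.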
14.6 Real Integrals
14.6.1 Rational Functions in Sine and Cosine
Suppose we have to compute the integral of such a function over a full period [0, 2π]. The idea
is to replace t by z = eit on the unit circle, cos t and sin t by (z + 1/z)/2 and (z − 1/z)/(2i),
respectively, and finally dt = dz/(iz).
Proposition 14.27 Suppose that R(x, y) is a rational function in two variables and R(cos t, sin t) is defined for all t ∈ [0, 2π]. Then

∫_0^{2π} R(cos t, sin t) dt = 2πi Σ_{a∈U1(0)} Res_a f(z),   (14.19)

where

f(z) = (1/(iz)) R( (1/2)(z + 1/z), (1/(2i))(z − 1/z) )

and the sum is over all isolated singularities of f(z) in the open unit ball.
Proof. By the residue theorem,

∫_{S1(0)} f(z) dz = 2πi Σ_{a∈U1(0)} Res_a f.

Let z = e^{it} for t ∈ [0, 2π]. Rewriting the integral on the left using dz = e^{it} i dt = iz dt,

∫_{S1(0)} f(z) dz = ∫_0^{2π} R(cos t, sin t) dt,

which completes the proof.
Example 14.15 For | a | < 1,

∫_0^{2π} dt/(1 − 2a cos t + a²) = 2π/(1 − a²).

For a = 0 the statement is trivially true; suppose now a ≠ 0. The complex function corresponding to the integrand is

f(z) = 1/(iz (1 + a² − az − a/z)) = 1/(i(−az² + (1 + a²)z − a)) = (i/a) · 1/((z − a)(z − 1/a)).

In the unit disc, f(z) has exactly one pole of order 1, namely z = a. By (14.15), the formula in Subsection 14.5.1,

Res_a f = lim_{z→a} (z − a) f(z) = (i/a) · 1/(a − 1/a) = i/(a² − 1);

the assertion follows from the proposition:

∫_0^{2π} dt/(1 − 2a cos t + a²) = 2πi · i/(a² − 1) = 2π/(1 − a²).

Specializing R = 1 and r = a ∈ R in Homework 49.1, we obtain the same formula.
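A quick numerical cross-check of this formula; the midpoint rule on a smooth periodic integrand converges very fast, and the parameter values below are arbitrary:

```python
import math

def trig_integral(a, samples=20000):
    """Midpoint-rule approximation of the integral of
    1 / (1 - 2a*cos t + a^2) over t in [0, 2*pi]."""
    h = 2 * math.pi / samples
    return sum(
        h / (1 - 2 * a * math.cos((k + 0.5) * h) + a * a)
        for k in range(samples)
    )

# compare against the closed form 2*pi / (1 - a^2) for several |a| < 1
checks = [(a, trig_integral(a), 2 * math.pi / (1 - a * a))
          for a in (0.0, 0.3, -0.8)]
```

All three sample values of a reproduce 2π/(1 − a²) to many digits.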
14.6.2 Integrals of the form ∫_{−∞}^{∞} f(x) dx

(a) The Principal Value

We often compute improper integrals of the form ∫_{−∞}^{∞} f(x) dx. Using the residue theorem, we calculate limits

lim_{R→∞} ∫_{−R}^{R} f(x) dx,   (14.20)

which is called the principal value (or Cauchy mean value) of the integral over R; we denote it by

Vp ∫_{−∞}^{∞} f(x) dx.

The existence of the "coupled" limit (14.20) in general does not imply the existence of the improper integral

∫_{−∞}^{∞} f(x) dx = lim_{r→−∞} ∫_r^0 f(x) dx + lim_{s→∞} ∫_0^s f(x) dx.

For example, Vp ∫_{−∞}^{∞} x dx = 0 whereas ∫_{−∞}^{∞} x dx does not exist since ∫_0^{∞} x dx = +∞. In general, the existence of the improper integral implies the existence of the principal value. If f is an even function or f(x) ≥ 0, the existence of the principal value implies the existence of the improper integral.
(b) Rational Functions

The main idea to evaluate the integral ∫_R f(x) dx is as follows. Let H = {z | Im(z) > 0} be the upper half-plane and let f : H \ {a1, . . . , am} → C be holomorphic. Choose R > 0 large enough such that | ak | < R for all k = 1, . . . , m, that is, all isolated singularities of f are in the upper half-disc of radius R around 0.
Consider the path which consists of the segment from −R to R on the real line and the half-circle γR of radius R. (Figure: the closed semicircular contour through −R and R enclosing the poles ak.) By the residue theorem,

∫_{−R}^{R} f(x) dx + ∫_{γR} f(z) dz = 2πi Σ_{k=1}^{m} Res_{ak} f(z).

If

lim_{R→∞} ∫_{γR} f(z) dz = 0,   (14.21)

the above formula implies

lim_{R→∞} ∫_{−R}^{R} f(x) dx = 2πi Σ_{k=1}^{m} Res_{ak} f.

Knowing the existence of the improper integral ∫_{−∞}^{∞} f(x) dx, one has

∫_{−∞}^{∞} f(x) dx = 2πi Σ_{k=1}^{m} Res_{ak} f.

Suppose that f = p/q is a rational function such that q has no real zeros and deg q ≥ deg p + 2. Then (14.21) is satisfied. Indeed, since only the two leading terms of p and q determine the limit behaviour of f(z) for | z | → ∞, there exists C > 0 with | p(z)/q(z) | ≤ C/R² on γR. Using the estimate Mℓ(γ) from Remark 14.3 (c) we get

| ∫_{γR} p(z)/q(z) dz | ≤ (C/R²) ℓ(γR) = πC/R −→ 0 as R → ∞.

For the same reason, namely | p(x)/q(x) | ≤ C/x² for large x, the improper real integral exists (comparison test) and converges absolutely. Thus, we have shown the following proposition.
Proposition 14.28 Suppose that p and q are polynomials with deg q ≥ deg p + 2. Further, q has no real zeros and a1, . . . , am are all poles of the rational function f(z) = p(z)/q(z) in the open upper half-plane H. Then

∫_{−∞}^{∞} f(x) dx = 2πi Σ_{k=1}^{m} Res_{ak} f.
Example 14.16 (a) Consider ∫_{−∞}^{∞} dx/(1 + x²). The only zero of q(z) = z² + 1 in H is a1 = i, and deg(1 + z²) = 2 ≥ deg(1) + 2, such that

∫_{−∞}^{∞} dx/(1 + x²) = 2πi Res_i 1/(1 + z²) = 2πi (1/(2z))|_{z=i} = π.
(b) It follows from Example 14.14 that

∫_{−∞}^{∞} dx/(1 + x⁴) = 2πi ( Res_{a1} f + Res_{a2} f ) = (√2 π)/2.
(c) We compute the integral

∫_0^{∞} dt/(1 + t⁶) = (1/2) ∫_{−∞}^{∞} dt/(1 + t⁶).

(Figure: the three poles a1, a2 = i, a3 of 1/(1 + z⁶) on the upper unit semicircle.)
The zeros of q(z) = z⁶ + 1 in the upper half-plane are a1 = e^{iπ/6}, a2 = e^{iπ/2} = i, and a3 = e^{5iπ/6}. They are all of multiplicity 1 such that Formula (14.18) applies:

Res_{ak} 1/q(z) = 1/q′(ak) = 1/(6 ak⁵) = −ak/6.

By Proposition 14.28 and noting that a1 + a3 = i,

∫_0^{∞} dt/(1 + t⁶) = (1/2) · 2πi · (−1/6) ( e^{iπ/6} + i + e^{5iπ/6} ) = −(πi/6) · 2i = π/3.
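The residue computation in (c) is easy to replay in Python; everything below follows formula (14.18) and Proposition 14.28 directly:

```python
import cmath
import math

# The upper-half-plane zeros of q(z) = z^6 + 1 are e^(i*pi*m/6) for m = 1, 3, 5.
poles = [cmath.exp(1j * math.pi * m / 6) for m in (1, 3, 5)]

# By (14.18), Res_a 1/q = 1/q'(a) = 1/(6a^5) = -a/6, since a^6 = -1 gives a^5 = -1/a.
residue_sum = sum(-a / 6 for a in poles)

# Integral of 1/(1+t^6) over [0, inf) = (1/2) * 2*pi*i * (sum of residues)
integral = (1j * math.pi * residue_sum).real
expected = math.pi / 3
```

The residues sum to −i/3, so the integral comes out to π/3 as computed above.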
(c) Functions of Type g(z) e^{iαz}

Proposition 14.29 Suppose that p and q are polynomials with deg q ≥ deg p + 1. Further, q has no real zeros and a1, . . . , am are all poles of the rational function g = p/q in the open upper half-plane H. Put f(z) = g(z) e^{iαz}, where α ∈ R is positive, α > 0. Then

∫_{−∞}^{∞} f(x) dx = 2πi Σ_{k=1}^{m} Res_{ak} f.
Proof. (The proof was omitted in the lecture.)
Instead of a semi-circle it is more appropriate to consider a rectangle now. (Figure: the rectangle with vertices −r, r, r + ir, −r + ir in the upper half-plane.) According to the residue theorem,

∫_{−r}^{r} f(x) dx + ∫_{r}^{r+ir} f(z) dz + ∫_{r+ir}^{−r+ir} f(z) dz + ∫_{−r+ir}^{−r} f(z) dz = 2πi Σ_{k=1}^{m} Res_{ak} f.

Since deg q ≥ deg p + 1, lim_{z→∞} p(z)/q(z) = 0. Thus, s_r = sup_{| z | ≥ r} | p(z)/q(z) | exists and tends to 0 as r → ∞.
Consider the second integral I2 with z = r + it, t ∈ [0, r], dz = i dt. On this segment we have the following estimate:

| (p(z)/q(z)) e^{iα(r+it)} | ≤ s_r e^{−αt},

which implies

| I2 | ≤ s_r ∫_0^r e^{−αt} dt = (s_r/α)(1 − e^{−αr}) ≤ s_r/α.

A similar estimate holds for the fourth integral from −r + ir to −r. In case of the third integral one has z = t + ir, t ∈ [−r, r], dz = dt, such that

| I3 | ≤ ∫_{−r}^{r} | e^{iα(t+ri)} | s_r dt = s_r e^{−αr} ∫_{−r}^{r} dt = 2r s_r e^{−αr}.

Since 2r e^{−αr} is bounded and s_r → 0 as r → ∞, all three integrals I2, I3, and I4 tend to 0 as r → ∞. This completes the proof.
Example 14.17 For a > 0,

∫_0^{∞} cos t/(t² + a²) dt = (π/(2a)) e^{−a}.

Obviously,

∫_0^{∞} cos t/(t² + a²) dt = (1/2) Re ∫_{−∞}^{∞} e^{it}/(t² + a²) dt.

The function f(z) = e^{iz}/(z² + a²) has a single pole of order 1 in the upper half-plane, at z = ai. By formula (14.18),

Res_{ai} e^{iz}/(z² + a²) = ( e^{iz}/(2z) )|_{z=ai} = e^{−a}/(2ai).

Proposition 14.29 gives the result.
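A direct numerical check of this example, truncating the improper integral at a finite upper limit; the oscillatory tail beyond it is of order 1/upper², so only a handful of digits are expected (all parameter values below are arbitrary choices):

```python
import math

def cos_integral(a, upper=200.0, samples=400000):
    """Midpoint-rule approximation of the integral of
    cos(t) / (t^2 + a^2) over t in [0, upper]."""
    h = upper / samples
    total = 0.0
    for k in range(samples):
        t = (k + 0.5) * h
        total += math.cos(t) / (t * t + a * a) * h
    return total

approx = cos_integral(1.0)
exact = math.pi / 2 * math.exp(-1.0)  # (pi/(2a)) e^(-a) with a = 1
```

For a = 1 the crude quadrature already agrees with (π/2)e^{−1} ≈ 0.5778 to better than three decimal places.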
(d) A Fourier transformation

Lemma 14.30 For a ∈ R,

(1/√(2π)) ∫_R e^{−x²/2} e^{−iax} dx = e^{−a²/2}.   (14.22)
Proof. (Figure: the closed rectangular path γ = γ1 + γ2 + γ3 + γ4 with vertices −R, R, R + ai, −R + ai.)
Let f(z) = e^{−z²/2}, z ∈ C, and let γ be the closed rectangular path γ = γ1 + γ2 + γ3 + γ4 as in the picture. Since f is an entire function, by Cauchy's theorem ∫_γ f(z) dz = 0. Note that γ2 is parametrized as z = R + ti, t ∈ [0, a], dz = i dt, such that

∫_{γ2} f(z) dz = ∫_0^a e^{−(R+it)²/2} i dt = ∫_0^a e^{−(R² + 2Rit − t²)/2} i dt,

| ∫_{γ2} f(z) dz | ≤ ∫_0^a e^{−R²/2 + t²/2} dt = e^{−R²/2} ∫_0^a e^{t²/2} dt = C e^{−R²/2}.

Since e^{−R²/2} tends to 0 as R → ∞, the above integral tends to 0 as well; hence lim_{R→∞} ∫_{γ2} f(z) dz = 0. Similarly, one can show that lim_{R→∞} ∫_{γ4} f(z) dz = 0. Since ∫_γ f(z) dz = 0, we have ∫_{γ1+γ3} f(z) dz = 0, that is,

∫_{−∞}^{∞} f(x) dx = ∫_{−∞}^{∞} f(x + ai) dx.

Using

∫_R e^{−x²/2} dx = √(2π),

which follows from Example 14.7, page 374, or from homework 41.3, we have

√(2π) = ∫_R e^{−x²/2} dx = ∫_R e^{−(x² + 2iax − a²)/2} dx = e^{a²/2} ∫_R e^{−x²/2 − iax} dx,

hence

(1/√(2π)) ∫_R e^{−x²/2 − iax} dx = e^{−a²/2}.
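Formula (14.22) is easy to confirm numerically: the integrand decays like e^{−x²/2}, so truncating at | x | = 12 costs essentially nothing, and by symmetry only the cosine part survives. A small Python sketch (truncation width and sample count are arbitrary choices):

```python
import math

def gauss_transform(a, half_width=12.0, samples=200000):
    """Midpoint approximation of (1/sqrt(2 pi)) * integral of
    e^(-x^2/2) e^(-iax) dx over the real line. The imaginary (sine)
    part cancels by symmetry, leaving a cosine integral."""
    h = 2 * half_width / samples
    total = 0.0
    for k in range(samples):
        x = -half_width + (k + 0.5) * h
        total += math.exp(-0.5 * x * x) * math.cos(a * x) * h
    return total / math.sqrt(2 * math.pi)

# compare against the closed form e^(-a^2/2) for a few values of a
checked = [(a, gauss_transform(a), math.exp(-0.5 * a * a))
           for a in (0.0, 1.0, 2.0)]
```

The Gaussian is its own Fourier transform in this normalization, and the quadrature reproduces e^{−a²/2} to many digits.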
Chapter 15
Partial Differential Equations I — an Introduction
15.1 Classification of PDE
15.1.1 Introduction
There is no general theory known concerning the solvability of all PDE. Such a theory is extremely unlikely to exist, given the rich variety of physical, geometric, probabilistic phenomena
which can be modelled by PDE. Instead, research focuses on various particular PDEs that are
important for applications in mathematics and physics.
Definition 15.1 A partial differential equation (abbreviated as PDE) is an equation of the form
F (x, y, . . . , u, ux, uy , . . . , uxx , uxy , . . . ) = 0
(15.1)
where F is a given function of the independent variables x, y, . . . , of the unknown function u, and of a finite number of its partial derivatives.
We call u a solution of (15.1) if after substitution of u(x, y, . . . ) and its partial derivatives (15.1)
is identically satisfied in some region Ω in the space of the independent variables x, y, . . . . The
order of a PDE is the order of the highest derivative that occurs.
A PDE is called linear if it is linear in the unknown function u and its derivatives ux, uy, uxy, . . . , with coefficients depending only on the variables x, y, . . . . In other words, a linear PDE
can be written in the form
G(u, ux , uy , . . . , uxx , uxy , . . . ) = f (x, y, . . . ),
(15.2)
where the function f on the right depends only on the variables x, y, . . . and G is linear in all
components with coefficients depending on x, y, . . . . More precisely, the formal differential
operator L(u) = G(u, ux, uy , . . . , uxx , uxy , . . . ) which associates to each function u(x, y, . . . )
a new function L(u)(x, y, . . . ) is a linear operator. The linear PDE (15.2) (L(u) = f ) is called
homogeneous if f = 0 and inhomogeneous otherwise. For example, cos(xy²) uxxy − y² ux + u sin x + tan(x² + y²) = 0 is a linear inhomogeneous PDE of order 3; the corresponding homogeneous linear PDE is cos(xy²) uxxy − y² ux + u sin x = 0.
A PDE is called quasi-linear if it is linear in all partial derivatives of order m (the order of the PDE) with coefficients which depend on the variables x, y, . . . and partial derivatives of order less than m; for example, ux uxx + u² = 0 is quasi-linear, uxy uxx + 1 = 0 is not. Semi-linear equations are those quasi-linear equations in which the coefficients of the highest-order terms do not depend on u and its partial derivatives; sin(x) uxx + u² = 0 is semi-linear, ux uxx + u² = 0 is not. Sometimes one considers systems of PDEs involving one or more unknown functions and their derivatives.
15.1.2 Examples
(1) The Laplace equation in n dimensions for a function u(x1 , . . . , xn ) is the linear second
order equation
∆u = ux1x1 + · · · + uxn xn = 0.
The solutions u are called harmonic (or potential) functions. In case n = 2 we associate
with a harmonic function u(x, y) its “conjugate” harmonic function v(x, y) such that the
first-order system of Cauchy–Riemann equations
ux = vy ,
uy = −vx
is satisfied. A real solution (u, v) gives rise to the analytic function f (z) = u + iv. The
Poisson equation is
∆u = f,
for a given function f : Ω → R.
The Laplace equation models equilibrium states while the Poisson equation is important in electrostatics. Laplace and Poisson equation always describe stationary processes
(there is no time dependence).
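As a small numerical illustration of (1) (an illustrative sketch only; the sample point and step size are arbitrary), the pair u = x² − y², v = 2xy satisfies both the Cauchy–Riemann equations and the Laplace equation, matching the analytic function f(z) = z² = u + iv:

```python
def u(x, y):
    return x * x - y * y   # Re(z^2), a harmonic function

def v(x, y):
    return 2 * x * y       # Im(z^2), its conjugate harmonic function

h = 1e-5
x0, y0 = 0.7, -1.3

# central finite differences for the first partial derivatives
ux = (u(x0 + h, y0) - u(x0 - h, y0)) / (2 * h)
uy = (u(x0, y0 + h) - u(x0, y0 - h)) / (2 * h)
vx = (v(x0 + h, y0) - v(x0 - h, y0)) / (2 * h)
vy = (v(x0, y0 + h) - v(x0, y0 - h)) / (2 * h)

# 5-point stencil for the Laplacian of u
lap_u = (u(x0 + h, y0) + u(x0 - h, y0) + u(x0, y0 + h) + u(x0, y0 - h)
         - 4 * u(x0, y0)) / (h * h)
```

Up to floating-point noise, ux = vy, uy = −vx, and Δu = 0 at the sample point.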
(2) The heat equation. Here one coordinate t is distinguished as the “time” coordinate, while
the remaining coordinates x1 , . . . , xn represent spatial coordinates. We consider
u : Ω × R+ → R,
Ω open in Rn ,
where R+ = {t ∈ R | t > 0} is the positive time axis and pose the equation
kut = ∆u,
where
∆u = ux1 x1 + · · · + uxn xn .
The heat equation models heat conduction and other diffusion processes.
(3) The wave equation. With the same notations as in (2), here we have the equation
utt − a2 ∆u = 0.
It models wave and oscillation phenomena.
(4) The Korteweg–de Vries equation
ut − 6u ux + uxxx = 0
models the propagation of waves in shallow waters.
(5) The Monge–Ampère equation
uxx uyy − u2xy = f
with a given function f , is used for finding surfaces with prescribed curvature.
(6) The Maxwell equations for the electric field strength E = (E1 , E2 , E3 ) and the magnetic
field strength B = (B1 , B2 , B3 ) as functions of (t, x1 , x2 , x3 ):
div B = 0,
(magnetostatic law),
Bt + curl E = 0,
(magnetodynamic law),
div E = 4πρ,
(electrostatic law, ρ = charge density),
Et − curl B = −4πj
(electrodynamic law, j = current density)
(7) The Navier–Stokes equations for the velocity v(x, t) = (v 1 , v 2 , v 3 ) and the pressure
p(x, t) of an incompressible fluid of density ρ and viscosity η:
ρ vt^j + ρ Σ_{i=1}^{3} v^i v^j_{xi} − η ∆v^j = −p_{xj},   j = 1, 2, 3,

div v = 0.
(8) The Schrödinger equation
iℏ ut = −(ℏ²/(2m)) ∆u + V(x, u)
(m = mass, V = given potential, u : Ω → C) from quantum mechanics is formally
similar to the heat equation, in particular in the case V = 0. The factor i, however, leads
to crucial differences.
Classification
We have seen so many rather different-looking PDEs, and it is hopeless to develop a theory that
can treat all these diverse equations. In order to proceed we want to look for criteria to classify
PDEs. Here are some possibilities.
(I) Algebraically.
(a) Linear equations are (1), (2), (3), (6) (which is of first order), and (8);
(b) semi-linear equations are (4) and (7);
(c) a non-linear equation is (5).
Naturally, linear equations are simpler than non-linear ones. We shall therefore mostly study linear equations.
(II) The order of the equation. The Cauchy–Riemann equations and the Maxwell equations
are linear first order equations. (1), (2), (3), (5), (7), (8) are of second order; (4) is of third
order. Equations of higher order rarely occur. The most important PDEs are second order
PDEs.
(III) Elliptic, parabolic, hyperbolic. In particular, for the second order equations the following
classification turns out to be useful: Let x = (x1 , . . . , xn ) ∈ Ω and
F (x, u, uxi , uxixj ) = 0
be a second-order PDE. We introduce auxiliary variables pi , pij , i, j = 1, . . . , n, and study the function F (x, u, pi , pij ). The equation is called elliptic in Ω if the matrix
(Fpij (x, u(x), uxi (x), uxi xj (x)))i,j=1,...,n
of the first derivatives of F with respect to the variables pij is positive definite or negative definite for all x ∈ Ω. Note that this may depend on the function u. The Laplace equation is the prominent example of an elliptic equation. Example (5) is elliptic if f (x) > 0.
The equation is called hyperbolic if the above matrix has precisely one negative and n − 1 positive eigenvalues (or conversely, depending on the choice of the sign). Example (3) is hyperbolic, and so is (5) if f (x) < 0.
Finally, the equation is parabolic if one eigenvalue of the above matrix is 0 and all the
other eigenvalues have the same sign. More precisely, the equation can be written in the
form
ut = F (t, x, u, uxi , uxixj )
with an elliptic F .
(IV) According to solvability. We consider the second-order PDE F (x, u, uxi , uxixj ) = 0 for
u : Ω → R, and wish to impose additional conditions upon the solution u, typically
prescribing the values of u or of certain first derivatives of u on the boundary ∂Ω or part
of it.
Ideally such a boundary problem satisfies the three conditions of Hadamard for a well-posed problem:
(a) Existence of a solution u for the given boundary values;
(b) Uniqueness of the solution;
(c) Stability, meaning continuous dependence on the boundary values.
Example 15.1 In the following examples Ω = R2 and u = u(x, y).
(a) Find all solutions u ∈ C2 (R2 ) with uxx = 0. We first integrate with respect to x and find that ux is independent of x, say ux = a(y). We integrate with respect to x again and obtain u(x, y) = xa(y) + b(y) with arbitrary functions a and b. Note that the ODE u′′ = 0 has the general solution ax + b with constant coefficients a, b; now the coefficients are functions of y.
(b) Solve uxx + u = 0, u ∈ C2 (R2 ). The solution of the corresponding ODE u′′ + u = 0,
u = u(x), u ∈ C2 (R), is a cos x + b sin x such that the general solution of the corresponding
PDE in 2 variables x and y is a(y) cos x + b(y) sin x with arbitrary functions a and b.
(c) Solve uxy = 0, u ∈ C2 (R2 ). First integrate ∂/∂y (ux ) = 0 with respect to y; we obtain ux = f̃(x). Integration with respect to x yields u = ∫ f̃(x) dx + g(y) = f (x) + g(y), where f is differentiable and g is arbitrary.
15.2 First Order PDE — The Method of Characteristics
We solve first order PDE by the method of characteristics. It applies to quasi-linear equations
a(x, y, u)ux + b(x, y, u)uy = c(x, y, u)
(15.3)
as well as to the linear equation
a(x, y)ux + b(x, y)uy = c0 (x, y)u + c1 (x, y).
(15.4)
We restrict ourselves to the linear equation with an initial condition given as a parametric curve
in the xyu-space
Γ = Γ (s) = (x0 (s), y0 (s), u0 (s)),  s ∈ (a, b) ⊆ R.   (15.5)
The curve Γ will be called the initial curve. The initial condition then reads
u(x0 (s), y0 (s)) = u0 (s),  s ∈ (a, b).
[Figure: the integral surface in xyu-space with the initial curve Γ (s) = (x(0, s), y(0, s), u(0, s)), an initial point on it, and the characteristic curves emanating from the initial curve.]
The geometric idea behind this method is the following. The solution u = u(x, y) can be thought of as a surface in R3 = {(x, y, u) | x, y, u ∈ R}. Starting from a point on the initial curve, we construct a characteristic curve in the surface u. If we do so for any point of the initial curve, we obtain a one-parameter family of characteristic curves; glueing all these curves we get the solution surface u.
The linear equation (15.4) can be rewritten as
(a, b, c0 u + c1 )·(ux , uy , −1) = 0.
(15.6)
Recall that (ux , uy , −1) is the normal vector to the surface (x, y, u(x, y)), that is, the tangent
equation to u at (x0 , y0 , u0 ) is
u − u0 = ux (x − x0 ) + uy (y − y0 ) ⇔ (x − x0 , y − y0 , u − u0 )·(ux , uy , −1) = 0.
It follows from (15.6) that (a, b, c0 u + c1 ) is a vector in the tangent plane. Finding a curve (x(t), y(t), u(t)) with exactly this tangent vector
(a(x(t), y(t)), b(x(t), y(t)), c0 (x(t), y(t))u(t) + c1 (x(t), y(t)))
is equivalent to solving the system of ODEs
x′ (t) = a(x(t), y(t)),   (15.7)
y′ (t) = b(x(t), y(t)),   (15.8)
u′ (t) = c0 (x(t), y(t))u(t) + c1 (x(t), y(t)).   (15.9)
This system is called the characteristic equations. The solutions are called characteristic curves
of the equation. Note that the above system is autonomous, i. e. there is no explicit dependence
on the parameter t.
In order to determine characteristic curves we need an initial condition. We shall require the initial point to lie on the initial curve Γ (s). Since each curve (x(t), y(t), u(t)) emanates from a different point Γ (s), we shall explicitly write the curves in the form (x(t, s), y(t, s), u(t, s)).
The initial conditions are written as
x(0, s) = x0 (s),
y(0, s) = y0 (s),
u(0, s) = u0 (s).
Notice that we selected the parameter t such that the characteristic curve is located at the initial
curve at t = 0. Note further that the parametrization (x(t, s), y(t, s), u(t, s)) represents a surface
in R3 .
The method of characteristics also applies to quasi-linear equations.
To summarize the method: In the first step we identify the initial curve Γ . In the second step we select a point s on Γ and solve the characteristic equations using the selected point as initial point. After performing these steps for all points on Γ , we obtain a portion of the solution surface, also called the integral surface, which consists of the union of the characteristic curves.
Example 15.2 Solve the equation
ux + uy = 2
subject to the initial condition u(x, 0) = x2 . The characteristic equations and the parametric
initial conditions are
xt (t, s) = 1,
yt (t, s) = 1,
ut (t, s) = 2,
x(0, s) = s,
y(0, s) = 0,
u(0, s) = s2 .
It is easy to solve the characteristic equations:
x(t, s) = t + f1 (s),
y(t, s) = t + f2 (s),
u(t, s) = 2t + f3 (s).
Inserting the initial conditions, we find
x(t, s) = t + s,
y(t, s) = t,
u(t, s) = 2t + s2 .
We have obtained a parametric representation of the integral surface. To find an explicit representation we have to invert the transformation (x(t, s), y(t, s)) in the form (t(x, y), s(x, y)),
namely, we have to solve for s and t. In the current example, we find t = y, s = x − y. Thus
the explicit formula for the integral surface is
u(x, y) = 2y + (x − y)2.
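The characteristic system above is simple enough to integrate numerically as well. The following sketch (hypothetical code, not part of the notes) integrates the characteristic ODEs by the explicit Euler method from several points of Γ and compares the result with the explicit integral surface u(x, y) = 2y + (x − y)²:

```python
def characteristics(s, t_end=1.0, n=1000):
    """Explicit Euler for x' = 1, y' = 1, u' = 2 starting at Gamma(s) = (s, 0, s^2)."""
    dt = t_end / n
    x, y, u = s, 0.0, s * s        # initial conditions at t = 0
    for _ in range(n):
        x += dt * 1.0              # x'(t) = 1
        y += dt * 1.0              # y'(t) = 1
        u += dt * 2.0              # u'(t) = 2
    return x, y, u

def u_explicit(x, y):
    """The explicit formula for the integral surface found above."""
    return 2 * y + (x - y) ** 2

for s in (-2.0, 0.0, 0.5, 3.0):
    x, y, u = characteristics(s)
    assert abs(u - u_explicit(x, y)) < 1e-9
```

Since the right hand sides are constant here, Euler integration is exact up to rounding; for genuinely non-linear characteristic equations one would substitute a Runge–Kutta step.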
Remark 15.1 (a) This simple example might lead us to think that each initial value problem for a first-order PDE possesses a unique solution. But this is not the case. Is the problem (15.3) together with the initial condition (15.5) well-posed? Under which conditions does there exist a unique integral surface that contains the initial curve?
(b) Notice that even if the PDE is linear, the characteristic equations are non-linear. It follows that one can expect at most a local existence theorem for a first order PDE.
(c) The inversion of the parametric representation of the integral surface might hide further difficulties. Recall that the implicit function theorem implies that the inversion locally exists if the Jacobian ∂(x, y)/∂(t, s) ≠ 0. An explicit computation of the Jacobian at a point s of the initial curve gives
J = (∂x/∂t)(∂y/∂s) − (∂x/∂s)(∂y/∂t) = a y0′ − b x0′ ,
i.e. J is the determinant of the 2×2 matrix with rows (a, b) and (x0′ , y0′ ).
Thus, the Jacobian vanishes at some point if and only if the vectors (a, b) and (x0′ , y0′ ) are linearly dependent. The geometrical meaning of J = 0 is that the projection of Γ into the xy plane is tangent to the projection of the characteristic curve into the xy plane. To ensure a unique solution near the initial curve we must have J ≠ 0. This condition is called the transversality condition.
Example 15.3 (Well-posed and Ill-posed Problems) (a) Solve ux = 1 subject to the initial condition u(0, y) = g(y). The characteristic equations and the initial conditions are given by
xt (t, s) = 1,  yt (t, s) = 0,  ut (t, s) = 1,
x(0, s) = 0,  y(0, s) = s,  u(0, s) = g(s).
The parametric integral surface is (x(t, s), y(t, s), u(t, s)) = (t, s, t+g(s)) such that the explicit
solution is u(x, y) = x + g(y).
(b) If we keep ux = 1 but modify the initial condition into u(x, 0) = h(x), the picture changes dramatically:
xt (t, s) = 1,  yt (t, s) = 0,  ut (t, s) = 1,
x(0, s) = s,  y(0, s) = 0,  u(0, s) = h(s).
In this case the parametric solution is
(x(t, s), y(t, s), u(t, s)) = (t + s, 0, t + h(s)).
Now, however, the transformation x = t + s, y = 0 cannot be inverted. Geometrically, the projection of the initial curve is the x axis, but this is also the projection of the characteristic curve. In the special case h(x) = x + c for some constant c, we obtain u(t, s) = t + s + c. Then it is not necessary to invert (x, y) since we find at once u = x + c + f (y) for every differentiable function f with f (0) = 0. We have infinitely many solutions — uniqueness fails.
(c) However, for any other choice of h, existence fails — the problem has no solution at all. Note that the Jacobian is the determinant of the matrix with rows (a, b) = (1, 0) and (x0′ , y0′ ) = (1, 0), so J = 0.
Remark 15.2 Because of the special role played by the projections of the characteristics on the xy plane, we also use the term characteristics to denote them. In case of the linear PDE (15.4) the ODE for the projection is
x′ (t) = dx/dt = a(x(t), y(t)),   y′ (t) = dy/dt = b(x(t), y(t)),   (15.10)
which yields y′ (x) = dy/dx = b(x, y)/a(x, y).
15.3 Classification of Semi-Linear Second-Order PDEs
15.3.1 Quadratic Forms
We recall some basic facts about quadratic forms and symmetric matrices.
Proposition 15.1 (Sylvester’s Law of Inertia) Suppose that A ∈ Rn×n is a symmetric matrix.
(a) Then there exist an invertible matrix B ∈ Rn×n , numbers r, s, t ∈ N0 with r + s + t = n, and a diagonal matrix D = diag (d1 , d2 , . . . , dr+s , 0, . . . , 0) with di > 0 for i = 1, . . . , r and di < 0 for i = r + 1, . . . , r + s such that
BAB⊤ = D = diag (d1 , . . . , dr+s , 0, . . . , 0).
We call (r, s, t) the signature of A.
(b) The signature does not depend on the change of coordinates, i. e. if there exists another invertible matrix B′ and a diagonal matrix D′ with D′ = B′ A(B′ )⊤ , then the signatures of D′ and D coincide.
15.3.2 Elliptic, Parabolic and Hyperbolic
Consider the semi-linear second-order PDE in n variables x1 , . . . , xn in a region Ω ⊂ Rn
∑_{i,j=1}^{n} aij (x) uxi xj + F (x, u, ux1 , . . . , uxn ) = 0   (15.11)
with continuous coefficients aij (x). Since we assume u ∈ C2 (Ω), by Schwarz’s lemma we
assume without loss of generality that aij = aji. Using the terminology of the introduction
(Classification (III), see page 404) we find that the matrix A(x) := (aij (x))i,j=1,...,n , coincides
with the matrix (Fpij )i,j defined therein.
Definition 15.2 We call the PDE (15.11) elliptic at x0 if the matrix A(x0 ) is positive definite
or negative definite. We call it parabolic at x0 if A(x0 ) is positive or negative semidefinite with
exactly one eigenvalue 0. We call it hyperbolic if A(x0 ) has the signature (n − 1, 1, 0), i. e. A is
indefinite with n − 1 positive eigenvalues and one negative eigenvalue and no zero eigenvalue
(or vice versa).
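The case distinction in Definition 15.2 can be expressed as a small eigenvalue-sign test. This is an illustrative sketch (the function name, tolerance and the sample eigenvalue lists are our own choices, not from the notes):

```python
# Classify a second-order PDE at a point from the eigenvalues of the
# symmetric coefficient matrix A(x0), following Definition 15.2.

def classify(eigenvalues, tol=1e-12):
    pos = sum(1 for e in eigenvalues if e > tol)
    neg = sum(1 for e in eigenvalues if e < -tol)
    zero = len(eigenvalues) - pos - neg
    n = len(eigenvalues)
    if zero == 0 and (pos == n or neg == n):
        return "elliptic"                       # definite
    if zero == 1 and (pos == n - 1 or neg == n - 1):
        return "parabolic"                      # semidefinite, one zero eigenvalue
    if zero == 0 and (pos, neg) in {(n - 1, 1), (1, n - 1)}:
        return "hyperbolic"                     # signature (n-1, 1, 0) or (1, n-1, 0)
    return "none of the three types"

# Laplace operator u_xx + u_yy + u_zz: A = diag(1, 1, 1)
assert classify([1, 1, 1]) == "elliptic"
# Heat operator in coordinates (t, x, y): A = diag(0, 1, 1)
assert classify([0, 1, 1]) == "parabolic"
# Wave operator u_tt - a^2 (u_xx + u_yy) with a = 2: A = diag(1, -4, -4)
assert classify([1, -4, -4]) == "hyperbolic"
```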
15.3.3 Change of Coordinates
First we study how the coefficients aij will change if we impose a non-singular transformation
of coordinates y = ϕ(x);
yl = ϕl (x1 , . . . , xn ),
l = 1, . . . , n;
The transformation is called non-singular if the Jacobian ∂(ϕ1 , . . . , ϕn )/∂(x1 , . . . , xn ) is non-zero at every point x0 ∈ Ω. By the Inverse Mapping Theorem, the transformation then possesses locally an inverse transformation denoted by x = ψ(y),
xl = ψl (y1, . . . , yn ),
l = 1, . . . , n.
Putting
ũ(y) := u(ψ(y)),
then u(x) = ũ(ϕ(x))
and if moreover ϕl ∈ C2 (Ω) we have by the chain rule
uxi = ∑_{l=1}^{n} ũyl ∂ϕl /∂xi ,
uxi xj = (uxi )xj = ∑_{k,l=1}^{n} ũyl yk (∂ϕl /∂xi )(∂ϕk /∂xj ) + ∑_{l=1}^{n} ũyl ∂²ϕl /(∂xi ∂xj ).   (15.12)
Inserting (15.12) into (15.11) one has
∑_{k,l=1}^{n} ũyl yk ∑_{i,j=1}^{n} aij (∂ϕl /∂xi )(∂ϕk /∂xj ) + ∑_{l=1}^{n} ũyl ∑_{i,j=1}^{n} aij ∂²ϕl /(∂xi ∂xj ) + F̃ (y, ũ, ũy1 , . . . , ũyn ) = 0.   (15.13)
We denote by ãlk the new coefficients of the second partial derivatives of ũ,
ãlk = ∑_{i,j=1}^{n} aij (x) (∂ϕl /∂xi )(∂ϕk /∂xj ),
and write (15.13) in the same form as (15.11):
∑_{k,l=1}^{n} ãlk (y) ũyl yk + F̃ (y, ũ, ũy1 , . . . , ũyn ) = 0.   (15.14)
Equation (15.14) later plays a crucial role in simplifying PDE (15.11). Namely, if we want some of the coefficients ãlk to be 0, the right hand side of (15.14) has to be 0. Writing
blj = ∂ϕl /∂xj ,  l, j = 1, . . . , n,  B = (blj ),
the new coefficient matrix Ã(y) = (ãlk (y)) reads as follows:
Ã = B·A·B⊤ .
By Proposition 15.1, A and à have the same signature. We have shown the following proposition.
Proposition 15.2 The type of a semi-linear second order PDE is invariant under the change of
coordinates.
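Propositions 15.1 and 15.2 can be illustrated numerically: the eigenvalue signs of A and of Ã = B·A·B⊤ agree for any invertible B. A minimal 2×2 sketch (the concrete matrices are assumptions; the eigenvalues of a symmetric 2×2 matrix are computed in closed form):

```python
import math

def mat2_mul(X, Y):
    """Product of two 2x2 matrices given as nested lists."""
    return [[X[0][0]*Y[0][0] + X[0][1]*Y[1][0], X[0][0]*Y[0][1] + X[0][1]*Y[1][1]],
            [X[1][0]*Y[0][0] + X[1][1]*Y[1][0], X[1][0]*Y[0][1] + X[1][1]*Y[1][1]]]

def transpose(X):
    return [[X[0][0], X[1][0]], [X[0][1], X[1][1]]]

def signature2(S, tol=1e-12):
    """(positive, negative) eigenvalue counts of a symmetric 2x2 matrix S."""
    p, q, r = S[0][0], S[0][1], S[1][1]
    root = math.sqrt((p - r) ** 2 + 4 * q * q)
    eigs = [((p + r) + root) / 2, ((p + r) - root) / 2]
    return (sum(e > tol for e in eigs), sum(e < -tol for e in eigs))

A = [[1.0, 0.0], [0.0, -3.0]]      # indefinite: signature (1, 1)
B = [[2.0, 1.0], [0.0, 1.0]]       # invertible (det = 2)
A_tilde = mat2_mul(mat2_mul(B, A), transpose(B))

assert signature2(A) == signature2(A_tilde) == (1, 1)
```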
Notation. We call the operator L with
L(u) = ∑_{i,j=1}^{n} aij (x) ∂²u/(∂xi ∂xj ) + F (x, u, ux1 , . . . , uxn )
a differential operator and denote by
L2 (u) = ∑_{i,j=1}^{n} aij (x) ∂²u/(∂xi ∂xj )
the sum of its highest order terms; L2 is a linear operator.
Definition 15.3 The second-order PDE L(u) = 0 has normal form if
L2 (u) = ∑_{j=1}^{m} uxj xj − ∑_{j=m+1}^{r} uxj xj
with some positive integers m ≤ r ≤ n.
Remarks 15.3 (a) It happens that the type of the equation depends on the point x0 ∈ Ω. For example, the Tricomi equation
yuxx + uyy = 0
is of mixed type. More precisely, it is elliptic if y > 0, parabolic if y = 0 and hyperbolic if y < 0.
(b) The Laplace equation is elliptic, the heat equation is parabolic, the wave equation is hyperbolic.
(c) The classification is not complete in case n ≥ 3; for example, the quadratic form can be of
type (n − 2, 1, 1).
(d) Case n = 2. The PDE
auxx + 2buxy + cuyy + F (x, y, u, ux, uy ) = 0
with coefficients a = a(x, y), b = b(x, y) and c = c(x, y) is elliptic, parabolic or hyperbolic at
(x0 , y0) if and only if ac − b2 > 0, ac − b2 = 0 or ac − b2 < 0 at (x0 , y0 ), respectively.
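For n = 2 the criterion of part (d) is a one-line discriminant test; applied to the Tricomi equation y uxx + uyy = 0 from part (a) it reproduces the mixed type. A sketch (the function name and tolerance are our own choices):

```python
# Type of a u_xx + 2b u_xy + c u_yy + ... = 0 at a point via the sign of ac - b^2.

def type2d(a, b, c, tol=1e-12):
    d = a * c - b * b
    if d > tol:
        return "elliptic"
    if d < -tol:
        return "hyperbolic"
    return "parabolic"

# Tricomi equation: a = y, b = 0, c = 1, so the type depends on the sign of y.
assert type2d(a=2.0, b=0.0, c=1.0) == "elliptic"     # y = 2 > 0
assert type2d(a=0.0, b=0.0, c=1.0) == "parabolic"    # y = 0
assert type2d(a=-2.0, b=0.0, c=1.0) == "hyperbolic"  # y = -2 < 0
```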
15.3.4 Characteristics
Suppose we are given the semi-linear second-order PDE in Ω ⊂ Rn
∑_{i,j=1}^{n} aij (x) ∂²u/(∂xi ∂xj ) + F (x, u, ux1 , . . . , uxn ) = 0   (15.15)
with continuous aij ; aij (x) = aji (x).
We define the concept of characteristics which plays an important role in the theory of PDEs,
not only of second-order PDEs.
Definition 15.4 Suppose that σ ∈ C1 (Ω) is continuously differentiable, grad σ ≠ 0, and
(a) for some point x0 of the hypersurface F = {x ∈ Ω | σ(x) = c}, c ∈ R, we have
∑_{i,j=1}^{n} aij (x0 ) (∂σ(x0 )/∂xi )(∂σ(x0 )/∂xj ) = 0.   (15.16)
Then F is said to be characteristic at x0 .
(b) If F is characteristic at every point of F, F is called a characteristic hypersurface or simply a characteristic of the PDE (15.11). Equation (15.16) is called the characteristic equation of (15.11).
In case n = 2 we speak of characteristic lines.
If all hypersurfaces σ(x) = c, a < c < b, are characteristic, this family of hypersurfaces fills the region Ω such that for any point x ∈ Ω there is exactly one hypersurface with σ(x) = c. This c can be chosen as one new coordinate. Setting y1 = σ(x), we see from (15.14) that ã11 = 0. That is, the knowledge of one or more characteristic hypersurfaces can simplify the PDE.
Example 15.4 (a) The characteristic equation of uxy = 0 is σx σy = 0 such that σx = 0 and
σy = 0 define the characteristic lines; the parallel lines to the coordinate axes, y = c1 and
x = c2 , are the characteristics.
(b) Find type and characteristic lines of
x2 uxx − y 2 uyy = 0,  x ≠ 0, y ≠ 0.
Since det A = x2 (−y 2 ) − 0 = −x2 y 2 < 0, the equation is hyperbolic. The characteristic equation, in the most general case, is
aσx2 + 2bσx σy + cσy2 = 0.
Since grad σ ≠ 0, σ(x, y) = c is locally solvable for y = y(x) such that y′ = −σx /σy . Another way to obtain this is as follows: Differentiating the equation σ(x, y) = c yields σx dx + σy dy = 0 or dy/dx = −σx /σy . Inserting this into the previous equation we obtain a quadratic ODE
a(y′ )2 − 2by′ + c = 0,
with solutions
y′ = (b ± √(b2 − ac))/a,  if a ≠ 0.
We can see that the elliptic equation has no characteristic lines, the parabolic equation has one family of characteristics, and the hyperbolic equation has two families of characteristic lines.
Hyperbolic case. In general, if c1 = ϕ1 (x, y) is the first family of characteristic lines and c2 = ϕ2 (x, y) is the second family of characteristic lines,
ξ = ϕ1 (x, y),  η = ϕ2 (x, y)
is the appropriate change of variable which gives ã = c̃ = 0. The transformed equation then
reads
2b̃ ũξη + F (ξ, η, ũ, ũξ , ũη ) = 0.
Parabolic case. Since det A = 0, there is only one real family of characteristic hyperplanes,
say c1 = ϕ1 (x, y). We impose the change of variables
z = ϕ1 (x, y),
y = y.
Since det à = 0, the coefficients b̃ vanish (together with ã). The transformed equation reads
c̃ ũyy + F (z, y, ũ, ũz , ũy ) = 0.
The above two equations are called the characteristic forms of the PDE (15.11).
In our case the characteristic equation is
x2 (y′ )2 − y 2 = 0,  y′ = ±y/x.
This yields
dy/y = ± dx/x,  log | y | = ± log | x | + c0 .
We obtain the two families of characteristic lines
y = c1 x,  y = c2 /x.
Indeed, in our example
ξ = y/x = c1 ,  η = xy = c2
gives
ξx = −y/x2 ,  ξy = 1/x,  ξxx = 2y/x3 ,  ξyy = 0,  ξxy = −1/x2 ,
ηx = y,  ηy = x,  ηxx = 0,  ηyy = 0,  ηxy = 1.
In our case (15.12) reads
uxx = ũξξ ξx2 + 2ũξη ξx ηx + ũηη ηx2 + ũξ ξxx + ũη ηxx ,
uyy = ũξξ ξy2 + 2ũξη ξy ηy + ũηη ηy2 + ũξ ξyy + ũη ηyy .
Noting x2 = η/ξ, y 2 = ξη and inserting the values of the partial derivatives of ξ and η we get
uxx = (y 2 /x4 ) ũξξ − 2(y 2 /x2 ) ũξη + y 2 ũηη + 2(y/x3 ) ũξ ,
uyy = (1/x2 ) ũξξ + 2ũξη + x2 ũηη .
Hence
x2 uxx − y 2 uyy = −4y 2 ũξη + 2(y/x) ũξ = 0,  i.e.  ũξη − (1/(2xy)) ũξ = 0.
Since η = xy, we obtain the characteristic form of the equation to be
ũξη − (1/(2η)) ũξ = 0.
Using the substitution v = ũξ , we obtain vη − (1/(2η)) v = 0, which corresponds to the ODE v′ − (1/(2η)) v = 0. Hence, v(η, ξ) = c(ξ)√η. Integration with respect to ξ gives ũ(ξ, η) = A(ξ)√η + B(η). Transforming back to the variables x and y, the general solution is
u(x, y) = A(y/x) √(xy) + B(xy).
(c) The one-dimensional wave equation utt − a2 uxx = 0. The characteristic equation σt2 = a2 σx2
yields
−σt /σx = dx/ dt = ẋ = ±a.
The characteristics are x = at + c1 and x = −at + c2 . The change of variables ξ = x − at and η = x + at yields ũξη = 0, which has general solution ũ(ξ, η) = f (ξ) + g(η). Hence, u(x, t) = f (x − at) + g(x + at) is the general solution; see also homework 23.2.
(d) The wave equation in n dimensions has characteristic equation
σt2 − a2 ∑_{i=1}^{n} σx2i = 0.
This equation is satisfied by the characteristic cone
σ(x, t) = a2 (t − t(0) )2 − ∑_{i=1}^{n} (xi − xi(0) )2 = 0,
where the point (x(0) , t(0) ) is the peak of the cone. Indeed,
σt = 2a2 (t − t(0) ),  σxi = −2(xi − xi(0) )
imply σt2 − a2 ∑_{i=1}^{n} σx2i = 4a2 (a2 (t − t(0) )2 − ∑_{i=1}^{n} (xi − xi(0) )2 ) = 4a2 σ(x, t) = 0 on the cone.
Further, there are other characteristic surfaces: the hyperplanes
σ(x, t) = at + ∑_{i=1}^{n} bi xi = 0,
where ‖b‖ = 1.
(e) The heat equation has characteristic equation ∑_{i=1}^{n} σx2i = 0, which implies σxi = 0 for all i = 1, . . . , n, such that t = c is the only family of characteristic surfaces (the coordinate hyperplanes).
(f) The Poisson and Laplace equations have the same characteristic equation; however, we have one variable less (no t) and obtain grad σ = 0, which is impossible. The Poisson and Laplace equations don't have characteristic surfaces.
15.3.5 The Vibrating String
(a) The Infinite String on R
We consider the Cauchy problem for an infinite string (no boundary values):
utt − a2 uxx = 0,  u(x, 0) = u0 (x),  ut (x, 0) = u1 (x),
where u0 and u1 are given. Inserting the initial values into the general solution u(x, t) = f (x − at) + g(x + at) (see Example 15.4 (c)) we get
u0 (x) = f (x) + g(x),  u1 (x) = −af ′ (x) + ag ′ (x).
Differentiating the first one yields u0′ (x) = f ′ (x) + g ′ (x) such that
f ′ (x) = (1/2) u0′ (x) − (1/(2a)) u1 (x),  g ′ (x) = (1/2) u0′ (x) + (1/(2a)) u1 (x).
Integrating these equations we obtain
f (x) = (1/2) u0 (x) − (1/(2a)) ∫_0^x u1 (y) dy + A,  g(x) = (1/2) u0 (x) + (1/(2a)) ∫_0^x u1 (y) dy + B,
where A and B are constants such that A + B = 0 (since f (x) + g(x) = u0 (x)). Finally we
have
u(x, t) = f (x − at) + g(x + at)
= (1/2)(u0 (x + at) + u0 (x − at)) − (1/(2a)) ∫_0^{x−at} u1 (y) dy + (1/(2a)) ∫_0^{x+at} u1 (y) dy
= (1/2)(u0 (x + at) + u0 (x − at)) + (1/(2a)) ∫_{x−at}^{x+at} u1 (y) dy.   (15.17)
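d'Alembert's formula (15.17) is easy to evaluate numerically. The sketch below uses the assumed initial data u0 = sin, u1 = cos (not from the notes), for which the exact solution is u(x, t) = sin x cos at + (cos x sin at)/a, and approximates the integral by the trapezoidal rule:

```python
import math

a = 2.0
u0 = math.sin                      # assumed initial displacement
u1 = math.cos                      # assumed initial velocity

def integral(f, lo, hi, n=2000):
    """Composite trapezoidal rule."""
    h = (hi - lo) / n
    return ((f(lo) + f(hi)) / 2 + sum(f(lo + i * h) for i in range(1, n))) * h

def u(x, t):
    """d'Alembert's formula (15.17)."""
    return 0.5 * (u0(x + a * t) + u0(x - a * t)) \
        + integral(u1, x - a * t, x + a * t) / (2 * a)

def u_exact(x, t):
    return math.sin(x) * math.cos(a * t) + math.cos(x) * math.sin(a * t) / a

for x, t in ((0.0, 0.0), (0.3, 0.5), (-1.2, 2.0)):
    assert abs(u(x, t) - u_exact(x, t)) < 1e-4
```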
It is clear from (15.17) that u(x, t) is uniquely determined by the values of the initial functions u0 and u1 in the interval [x − at, x + at] whose end points are cut out by the characteristic lines through the point (x, t). This interval represents the domain of dependence for the solution at the point (x, t), as shown in the figure.
[Figure: the characteristic lines through (x, t) cut out the interval [x − at, x + at] on the x-axis.]
Conversely, the initial values at point (ξ, 0) of the x-axis influence u(x, t) at points (x, t) in the
wedge-shaped region bounded by the characteristics through (ξ, 0), i. e. , for ξ−at < x < ξ+at.
This indicates that our “signal” or “disturbance” only moves with speed a.
We want to give some interpretation of the solution (15.17). Suppose u1 = 0 and
u0 (x) = 1 − | x |/a  if | x | ≤ a,  u0 (x) = 0  if | x | > a.
[Figure: the triangular initial displacement u0 at t = 0.]
In this example we consider the vibrating string which is plucked at time t = 0 as shown in the figure (given u0 (x)). The initial velocity is zero (u1 = 0).
[Figure: snapshots of the string at t = 1/2, t = 1, and t = 2.]
In the pictures one can see the behaviour of the string. The initial peak is divided into two smaller peaks with half the displacement, one moving to the right and one moving to the left with speed a.
Formula (15.17) is due to d'Alembert (1746). Usually one assumes u0 ∈ C2 (R) and u1 ∈ C1 (R). In this case, u ∈ C2 (R2 ) and we are able to evaluate the classical Laplacian ∆(u), which gives a continuous function. On the other hand, the right hand side of (15.17) makes sense for arbitrary continuous functions u1 and arbitrary u0 . If we want to call these u(x, t) generalized solutions of the Cauchy problem we have to alter the meaning of ∆(u). In particular, we need a more general notion of functions and derivatives. This is the main objective of the next section.
(b) The Finite String over [0, l]
We consider the initial boundary value problem (IBVP)
utt = a2 uxx ,  u(x, 0) = u0 (x),  ut (x, 0) = u1 (x),  x ∈ [0, l],
u(0, t) = u(l, t) = 0,  t ∈ R.
Suppose we are given functions u0 ∈ C2 ([0, l]) and u1 ∈ C1 ([0, l]) on [0, l] with
u0 (0) = u0 (l) = 0,
u1 (0) = u1 (l) = 0,
u′′0 (0) = u′′0 (l) = 0.
To solve the IBVP, we define new functions ũ0 and ũ1 on R as follows: first extend both
functions to [−l, l] as odd functions, that is, ũi (−x) = −ui (x), i = 0, 1. Then extend ũi as a
2l-periodic function to the entire real line. The above assumptions ensure that ũ0 ∈ C2 (R) and
ũ1 ∈ C1 (R). Put
u(x, t) = (1/2)(ũ0 (x + at) + ũ0 (x − at)) + (1/(2a)) ∫_{x−at}^{x+at} ũ1 (y) dy.
Then u(x, t) solves the IBVP.
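The odd 2l-periodic extension can be sketched in a few lines; the concrete u0 is an assumed example (for u0 (x) = sin(πx/l) the extension coincides with sin(πx/l) itself, so the checks below hold by construction):

```python
import math

l = 1.0

def u0(x):
    """Initial displacement on [0, l] with u0(0) = u0(l) = 0 (assumed example)."""
    return math.sin(math.pi * x / l)

def extend_odd_periodic(f, l):
    """Odd reflection about 0 and l, repeated with period 2l."""
    def tilde(x):
        x = math.fmod(x, 2 * l)          # reduce to (-2l, 2l)
        if x < 0:
            x += 2 * l                   # reduce to [0, 2l)
        if x <= l:
            return f(x)
        return -f(2 * l - x)             # odd reflection about x = l
    return tilde

tilde_u0 = extend_odd_periodic(u0, l)

assert abs(tilde_u0(0.3) - u0(0.3)) < 1e-12          # agrees with u0 on [0, l]
assert abs(tilde_u0(-0.3) + u0(0.3)) < 1e-12         # odd
assert abs(tilde_u0(0.3 + 2 * l) - u0(0.3)) < 1e-12  # 2l-periodic
```

Plugging tilde_u0 (and the analogous extension of u1) into the d'Alembert formula then yields the solution of the IBVP.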
Chapter 16
Distributions
16.1 Introduction — Test Functions and Distributions
In this section we introduce the notion of distributions. Distributions are generalized functions. The class of distributions has a lot of very nice properties: they are differentiable up to arbitrary order, one can exchange limit procedures and differentiation, and Schwarz's lemma holds. Distributions play an important role in the theory of PDE; in particular, the notion of a fundamental solution of a differential operator can be made rigorous only within the theory of distributions.
Generalized functions were first used by P. Dirac to study quantum mechanical phenomena. He made systematic use of the so-called δ-function (better: δ-distribution). The mathematical foundations of this theory are due to S. L. Sobolev (1936) and L. Schwartz (1950; 1915–2002).
Since then many mathematicians made progress in the theory of distributions. Motivation comes
from problems in mathematical physics and in the theory of partial differential equations.
Good accessible (German) introductions are given in the books of W. Walter [Wal74] and O. Forster [For81, § 17]. More detailed explanations of the theory are to be found in the books of H. Triebel (in English and German), V. S. Wladimirow (in Russian and German) and Gelfand/Schilow (in Russian and German, parts I, II, and III), [Tri92, Wla72, GS69, GS64].
16.1.1 Motivation
Distributions generalize the notion of a function. They are linear functionals on certain spaces of
test functions. Using distributions one can express rigorously the density of a mass point, charge
density of a point, the single-layer and the double-layer potentials, see [Arn04, pp. 92]. Roughly
speaking, a generalized function is given at a point by the “mean values” in the neighborhood
of that point.
The main idea, to associate to each "sufficiently nice" function f a linear functional Tf (a distribution) on an appropriate function space D, is described by the following formula:
⟨Tf , ϕ⟩ = ∫_R f (x)ϕ(x) dx,  ϕ ∈ D.   (16.1)
On the left we adopt the notation of a dual pairing of vector spaces from Definition 11.1. In general the bracket ⟨T , ϕ⟩ denotes the evaluation of the functional T on the test function ϕ. Sometimes it is also written as T (ϕ). It does not denote an inner product; the left and the right arguments are from completely different spaces.
What we really want of Tf is
(a) The correspondence should be one-to-one, i. e., different functionals Tf and Tg correspond to different functions f and g. To achieve this, we need the function space D
sufficiently large.
(b) The class of functions f should contain at least the continuous functions. However, if
f (x) = xn , the function f (x)ϕ(x) must be integrable over R, that is xn ϕ(x) ∈ L1 (R).
Since polynomials are not in L1 (R), the functions ϕ must be “very small” for large | x |.
Roughly speaking, there are two possibilities to this end. First, take only those functions
ϕ which are identically zero outside a compact set (which depends on ϕ). This leads to
the test functions D(R). Then Tf is well-defined if f is integrable over every compact
subset of R. These functions f are called locally integrable.
Secondly, we take ϕ(x) to be rapidly decreasing as | x | tends to ∞. More precisely, we want
sup_{x∈R} | xn ϕ(x) | < ∞
for all non-negative integers n ∈ Z+ . This concept leads to the notion of the so-called Schwartz space S (R).
(c) We want to differentiate f arbitrarily often, even in case f has discontinuities. The only thing we have to do is to give the expression
∫_R f ′ (x)ϕ(x) dx,  ϕ ∈ D,
a meaning. Using integration by parts and the fact that ϕ(+∞) = ϕ(−∞) = 0, the above expression equals −∫_R f (x)ϕ′ (x) dx. That is, instead of differentiating f , we differentiate the test function ϕ. In this way, the functional Tf ′ makes sense as long as f ϕ′ is integrable. Since we want to differentiate f arbitrarily often, we need the test function ϕ to be arbitrarily often differentiable, ϕ ∈ C∞ (R).
Note that conditions (b) and (c) make the space of test functions sufficiently small.
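The integration-by-parts identity of (c) can be checked numerically for a function with a kink, say f (x) = | x | with f ′ (x) = sign x, against the (unnormalized) bump function introduced below. This is a sketch; the shift of the test function, the quadrature step counts and tolerances are our own choices:

```python
import math

def phi(t):                        # bump test function, supp phi = [-1, 1]
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0

def dphi(t, h=1e-5):               # phi' by central differences
    return (phi(t + h) - phi(t - h)) / (2 * h)

def trapz(g, lo, hi, n=4000):      # composite trapezoidal rule
    step = (hi - lo) / n
    return ((g(lo) + g(hi)) / 2 + sum(g(lo + i * step) for i in range(1, n))) * step

# f(x) = |x|, f'(x) = sign(x); shifted test function phi(x - 0.2) so that
# the common value of both sides is non-zero.
lhs = trapz(lambda x: math.copysign(1.0, x) * phi(x - 0.2), -1.5, 1.5)
rhs = -trapz(lambda x: abs(x) * dphi(x - 0.2), -1.5, 1.5)

assert abs(lhs - rhs) < 1e-2       # <T_{f'}, phi> = -int f phi' dx
assert lhs > 0.1                   # and the common value is not 0
```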
16.1.2 Test Functions D(Rn ) and D(Ω)
We want f ϕ to be integrable for all polynomials f . We use the first approach and consider only functions ϕ which are 0 outside a bounded set. If nothing is stated otherwise, Ω ⊆ Rn denotes an open, connected subset of Rn .
(a) The Support of a Function and the Space of Test Functions
Let f be a function defined on Ω. The set
supp f := closure of {x ∈ Ω | f (x) ≠ 0} ⊂ Rn
is called the support of f .
Remark 16.1 (a) supp f is always closed; it is the smallest closed subset M such that f (x) = 0
for all x ∈ Rn \ M.
(b) A point x0 6∈ supp f if and only if there exists ε > 0 such that f ≡ 0 in Uε (x0 ). This in
particular implies that for f ∈ C∞ (Rn ) we have f (k) (x0 ) = 0 for all k ∈ N.
(c) supp f is compact if and only if it is bounded.
Example 16.1 (a) supp sin = R.
(b) Let f : (−1, 1) → R, f (x) = x(1 − x). Then supp f = [−1, 1].
(c) The characteristic function χM has support M (more precisely, the closure of M ).
(d) Let h be the hat function on R — note that supp h = [−1, 1] — and f (x) = 2h(x) − 3h(x − 10). Then supp f = [−1, 1] ∪ [9, 11].
Definition 16.1 (a) The space D(Rn ) consists of all infinitely differentiable functions f on Rn with compact support:
D(Rn ) = C0∞ (Rn ) = {f ∈ C∞ (Rn ) | supp f is compact}.
(b) Let Ω be a region in Rn . Define D(Ω) as follows:
D(Ω) = {f ∈ C∞ (Ω) | supp f is compact in Rn and supp f ⊂ Ω}.
We call D(Ω) the space of test functions on Ω.
First of all let us make sure of the existence of such functions. On the real axis consider the "hat" function (also called bump function)
h(t) = c e^{−1/(1−t²)}  if | t | < 1,  h(t) = 0  if | t | ≥ 1.
[Figure: the hat function h with maximum c/e at t = 0 and support [−1, 1].]
The constant c is chosen such that ∫_R h(t) dt = 1. The function h is continuous on R. It was already shown in Example 4.5 that h(k) (−1) = h(k) (1) = 0 for all k ∈ N. Hence h ∈ D(R) is a test function with supp h = [−1, 1]. Accordingly, the function
h(x) = cn e^{−1/(1−‖x‖²)}  if ‖x‖ < 1,  h(x) = 0  if ‖x‖ ≥ 1
is an element of D(Rn ) with support being the closed unit ball supp h = {x | ‖x‖ ≤ 1}. The constant cn is chosen such that ∫_{Rn} h(x) dx = ∫_{U1 (0)} h(x) dx = 1.
For ε > 0 we introduce the notation
hε (x) = (1/εn ) h(x/ε).
Then supp hε = Ūε (0) and
∫_{Rn} hε (x) dx = (1/εn ) ∫_{Uε (0)} h(x/ε) dx = ∫_{U1 (0)} h(y) dy = 1.
So far, we have constructed only one function h(x) (as well as its scaled relatives hε (x)) which is C∞ and has compact support. Using this single hat function hε we are able
(a) to restrict the support of an arbitrary integrable function f to a given domain by replacing f by f hε (x − a), which has support in Ūε (a), and
(b) to make f smooth.
(b) Mollification
In this way, we obtain a supply of C∞ functions with compact support which is large enough for our purposes (especially, to recover the function f from the functional Tf ). Using the function hε , S. L. Sobolev developed the following mollification method.
Definition 16.2 (a) Let f ∈ L1 (Rn ) and g ∈ D(Rn ); define the convolution product f ∗ g by
(f ∗ g)(x) = ∫_{Rn} f (y)g(x − y) dy = ∫_{Rn} f (x − y)g(y) dy = (g ∗ f )(x).
(b) We define the mollified function fε of f by
fε = f ∗ hε .
Note that
fε (x) = ∫_{Rn} hε (x − y)f (y) dy = ∫_{Uε (x)} hε (x − y)f (y) dy.   (16.2)
Roughly speaking, fε (x) is the mean value of f in the ε-neighborhood of x. If f is continuous
at x0 then fε (x0 ) = f (ξ) for some ξ ∈ Uε (x0 ). This follows from Proposition 5.18.
In particular let f = χ[1,2] be the characteristic function of the interval [1, 2]. The mollification fε looks as follows:
fε (x) = 0 for x < 1 − ε and for x > 2 + ε,
fε (x) = 1 for 1 + ε < x < 2 − ε,
fε (x) = ∗ for 1 − ε < x < 1 + ε and for 2 − ε < x < 2 + ε,
where ∗ denotes a value between 0 and 1.
[Figure: the graph of fε , a smoothed version of the step function χ[1,2] .]
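The mollification of χ[1,2] can be reproduced numerically. The sketch below computes the normalizing constant c by quadrature and evaluates fε = f ∗ hε at sample points; the step counts and the choice ε = 0.25 are assumptions:

```python
import math

def h(t):
    """Unnormalized bump function."""
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0

def trapz(g, lo, hi, n=4000):
    """Composite trapezoidal rule."""
    step = (hi - lo) / n
    return ((g(lo) + g(hi)) / 2 + sum(g(lo + i * step) for i in range(1, n))) * step

c = 1.0 / trapz(h, -1.0, 1.0)      # normalize so that the integral of c*h is 1
eps = 0.25

def h_eps(x):
    return c * h(x / eps) / eps

def f(x):                          # characteristic function of [1, 2]
    return 1.0 if 1.0 <= x <= 2.0 else 0.0

def f_eps(x):                      # (f * h_eps)(x) = int h_eps(x - y) f(y) dy
    return trapz(lambda y: h_eps(x - y) * f(y), x - eps, x + eps)

assert f_eps(0.5) < 1e-12            # x < 1 - eps: value 0
assert abs(f_eps(1.5) - 1.0) < 1e-6  # 1 + eps < x < 2 - eps: value 1
assert 0.0 < f_eps(1.0) < 1.0        # transition zone: a value between 0 and 1
```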
Remarks 16.2 (a) For f ∈ L1 (Rn ), fε ∈ C∞ (Rn ),
(b) fε → f in L1 (Rn ) as ε → 0.
(c) C0 (Rn ) ⊂ L1 (Rn ) is dense (with respect to the L1 -norm). In other words, for any f ∈ L1 (Rn ) and ε > 0 there exists g ∈ C(Rn ) with supp g compact and ∫_{Rn} | f − g | dx < ε.
(Sketch of Proof). (A) Any integrable function can be approximated by integrable
functions with compact support. This follows from Example 12.6.
(B) Any integrable function with compact support can be approximated by simple functions (which are finite linear combinations of characteristic functions) with
compact support.
(C) Any characteristic function with compact support can be approximated by characteristic functions χQ where Q is a finite union of boxes.
(D) Any χQ where Q is a closed box can be approximated by a sequence fn of
continuous functions with compact support:
fn(x) = max{0, 1 − n d(x, Q)},   n ∈ N,
where d(x, Q) denotes the distance of x from Q. Note that fn is 1 in Q and 0
outside U1/n (Q).
(d) C0∞(Rn) ⊂ L1(Rn) is dense.
(b) Convergence in D
Notations. For x ∈ Rn and a multi-index α ∈ N0^n, α = (α1, ..., αn), we write

| α | = α1 + α2 + · · · + αn,   α! = α1! · · · αn!,   x^α = x1^{α1} x2^{α2} · · · xn^{αn},

D^α u(x) = ∂^{|α|} u(x) / (∂x1^{α1} · · · ∂xn^{αn}).
It is clear that D(Rn ) is a linear space. We shall introduce an appropriate notion of convergence.
Definition 16.3 A sequence (ϕn (x)) of functions of D(Rn ) converges to ϕ ∈ D(Rn ) if there
exists a compact set K ⊆ Rn such that
(a) supp ϕn ⊆ K for all n ∈ N and
(b) D^α ϕn ⇉ D^α ϕ uniformly on K for all multi-indices α.
We denote this type of convergence by ϕn −→_D ϕ.
16 Distributions
422
Example 16.2 Let ϕ ∈ D be a fixed test function and consider the sequences (ϕn(x)) given by
(a) ϕn(x) = ϕ(x)/n. This sequence converges to 0 in D since supp ϕn = supp ϕ for all n and the convergence is uniform for all x ∈ Rn (in fact, it suffices to consider x ∈ supp ϕ).
(b) ϕn(x) = ϕ(x/n)/n. The sequence does not converge to 0 in D since the supports supp(ϕn) = n supp(ϕ), n ∈ N, are not contained in any common compact subset.
(c) ϕn(x) = ϕ(nx)/n has no limit if ϕ ≠ 0, see homework 49.2.
Note that D(Rn ) is not a metric space, more precisely, there is no metric on D(Rn ) such that
the metric convergence and the above convergence coincide.
16.2 The Distributions D′(Rn)
Definition 16.4 A distribution (generalized function) is a continuous linear functional on the
space D(Rn ) of test functions.
Here, a linear functional T on D is said to be continuous if and only if for all sequences (ϕn), ϕn, ϕ ∈ D, with ϕn −→_D ϕ we have ⟨T, ϕn⟩ → ⟨T, ϕ⟩ in C.
The set of distributions is denoted by D′ (Rn ) or simply by D′ .
The evaluation of a distribution T ∈ D′ on a test function ϕ ∈ D is denoted by ⟨T, ϕ⟩. Two distributions T1 and T2 are equal if and only if ⟨T1, ϕ⟩ = ⟨T2, ϕ⟩ for all ϕ ∈ D.
Remark 16.3 (Characterization of continuity.) (a) A linear functional T on D(Rn) is continuous if and only if ϕn −→_D 0 implies ⟨T, ϕn⟩ → 0 in C. Indeed, continuity of T trivially implies this statement. Suppose now that ϕn −→_D ϕ. Then (ϕn − ϕ) −→_D 0; thus ⟨T, ϕn − ϕ⟩ → 0 as n → ∞. Since T is linear, this shows ⟨T, ϕn⟩ → ⟨T, ϕ⟩ and T is continuous.
(b) A linear functional T on D is continuous if and only if for every compact set K there exist a constant C > 0 and l ∈ Z+ such that

| ⟨T, ϕ⟩ | ≤ C sup_{x∈K, |α|≤l} | D^α ϕ(x) |   for all ϕ ∈ D with supp ϕ ⊂ K.    (16.3)
We show that the criterion (16.3) implies continuity of T. Indeed, let ϕn −→_D 0. Then there exists a compact subset K ⊂ Rn such that supp ϕn ⊆ K for all n. By the criterion, there are C > 0 and l ∈ Z+ with | ⟨T, ϕn⟩ | ≤ C sup | D^α ϕn(x) |, where the supremum is taken over all x ∈ K and multi-indices α with | α | ≤ l. Since D^α ϕn ⇉ 0 on K for all α, in particular sup | D^α ϕn(x) | → 0 as n → ∞. This shows ⟨T, ϕn⟩ → 0 and T is continuous.
For the proof of the converse direction, see [Tri92, p. 52].
16.2.1 Regular Distributions
A large subclass of distributions of D′ is given by ordinary functions via the correspondence f ↔ Tf given by ⟨Tf, ϕ⟩ = ∫_R f(x)ϕ(x) dx. We are looking for a class of functions which is as large as possible.
Definition 16.5 Let Ω be an open subset of Rn . A function f (x) on Ω is said to be locally
integrable over Ω if f (x) is integrable over every compact subset K ⊆ Ω; we write in this case
f ∈ L1loc (Ω).
Remark 16.4 The following are equivalent:
(a) f ∈ L1loc (Rn ).
(b) For any R > 0, f ∈ L1 (UR (0)).
(c) For any x0 ∈ Rn there exists ε > 0 such that f ∈ L1 (Uε (x0 )).
Lemma 16.1 If f is locally integrable, f ∈ L1loc(Ω), then Tf is a distribution, Tf ∈ D′(Ω).
A distribution T which is of the form T = Tf with some locally integrable function f is called regular.
Proof. First, Tf is a linear functional on D since integration is a linear operation. Secondly, if ϕm −→_D 0, then there exists a compact set K with supp ϕm ⊂ K for all m. We have the following estimate:

| ∫_{Rn} f(x)ϕm(x) dx | ≤ sup_{x∈K} | ϕm(x) | ∫_K | f(x) | dx = C sup_{x∈K} | ϕm(x) |,

where C = ∫_K | f | dx exists since f ∈ L1loc. The expression on the right tends to 0 since ϕm(x) uniformly tends to 0. Hence ⟨Tf, ϕm⟩ → 0 and Tf belongs to D′.
Example 16.3 (a) C(Ω) ⊂ L1loc(Ω), L1(Ω) ⊆ L1loc(Ω).
(b) f(x) = 1/x is in L1loc((0, 1)); however, f ∉ L1((0, 1)) and f ∉ L1loc(R) since f is not integrable over [−1, 1].
Lemma 16.2 (Du Bois-Reymond, Fundamental Lemma of the Calculus of Variations) Let Ω ⊆ Rn be a region. Suppose that f ∈ L1loc(Ω) and ⟨Tf, ϕ⟩ = 0 for all ϕ ∈ D(Ω).
Then f = 0 almost everywhere in Ω.
Proof. For simplicity we consider the case n = 1, Ω = (−π, π). Fix ε with 0 < ε < π. Let ϕn(x) = e^{−inx} hε(x), n ∈ Z. Then supp ϕn ⊂ [−π, π]. Since both e^{−inx} and hε are C∞-functions, ϕn ∈ D(Ω) and

cn = ⟨Tf, ϕn⟩ = ∫_{−π}^{π} f(x) e^{−inx} hε(x) dx = 0,   n ∈ Z;

that is, all Fourier coefficients of f hε ∈ L2[−π, π] vanish. From Theorem 13.13 (b) it follows that f hε is 0 in L2(−π, π). By Proposition 12.16 it follows that f hε is 0 a.e. in (−π, π). Since hε > 0 on (−π, π), f = 0 a.e. on (−π, π).
Remark 16.5 The previous lemma shows that if f1 and f2 are locally integrable and Tf1 = Tf2, then f1 = f2 a.e.; that is, the correspondence f ↔ Tf is one-to-one. In this way we can identify the locally integrable functions with a subspace of the distributions, L1loc(Rn) ⊆ D′(Rn).
16.2.2 Other Examples of Distributions
Definition 16.6 Every non-regular distribution is called singular. The most important example
of singular distribution is the δ-distribution δa defined by
⟨δa, ϕ⟩ = ϕ(a),   a ∈ Rn,  ϕ ∈ D.
It is immediate that δa is a linear functional on D. Suppose that ϕn −→_D 0; then ϕn(x) → 0 pointwise. Hence δa(ϕn) = ϕn(a) → 0; the functional is continuous on D and therefore a distribution. We will also use the notation δ(x − a) in place of δa, and δ or δ(x) in place of δ0.
Proof that δa is singular. If δa ∈ D′ were regular, there would exist a function f ∈ L1loc such that δa = Tf, that is, ϕ(a) = ∫_{Rn} f(x)ϕ(x) dx. First proof. Let Ω ⊂ Rn be an open set such that a ∉ Ω. Let ϕ ∈ D(Ω), that is, supp ϕ ⊂ Ω. In particular ϕ(a) = 0. That is, ∫_Ω f(x)ϕ(x) dx = 0 for all ϕ ∈ D(Ω). By Du Bois-Reymond's Lemma, f = 0 a.e. in Ω. Since Ω was arbitrary, f = 0 a.e. in Rn \ {a} and therefore f = 0 a.e. in Rn. It follows that Tf = 0 in D′(Rn); however, δa ≠ 0 – a contradiction.
Second proof for a = 0. Since f ∈ L1loc, there exists ε > 0 such that

d := ∫_{Uε(0)} | f(x) | dx < 1.

Putting ϕ(x) = h(x/ε) with the bump function h, we have supp ϕ = Uε(0) and sup_{x∈Rn} | ϕ(x) | = ϕ(0) > 0, such that

| ∫_{Rn} f(x)ϕ(x) dx | ≤ sup | ϕ(x) | ∫_{Uε(0)} | f(x) | dx = ϕ(0) d < ϕ(0).

This contradicts ∫_{Rn} f(x)ϕ(x) dx = | ϕ(0) | = ϕ(0).
In the same way one can show that the assignment

⟨T, ϕ⟩ = D^α ϕ(a),   a ∈ Rn,  ϕ ∈ D,

defines an element of D′ which is singular.
The distribution

⟨T, ϕ⟩ = ∫_{Rn} f(x) D^α ϕ(x) dx,   f ∈ L1loc,

may be regular or singular, depending on the properties of f.
Locally integrable functions as well as δa describe mass, force, or charge densities. That is why
L. Schwartz named the generalized functions “distributions.”
16.2.3 Convergence and Limits of Distributions
There are many ways to approximate the distribution δ by a sequence of L1loc-functions.
Definition 16.7 A sequence (Tn), Tn ∈ D′(Rn), is said to be convergent to T ∈ D′(Rn) if and only if for all ϕ ∈ D(Rn)

lim_{n→∞} Tn(ϕ) = T(ϕ).

Similarly, for a family Tε, ε > 0, of distributions in D′ we say that lim_{ε→0} Tε = T if lim_{ε→0} Tε(ϕ) = T(ϕ) for all ϕ ∈ D.
Note that D′ (Rn ) with the above notion of convergence is complete, see [Wal02, p. 39].
Example 16.4 Let f(x) = ½ χ[−1,1] and let fε(x) = (1/ε) f(x/ε) be the scaling of f. Note that fε = 1/(2ε) χ[−ε,ε]. We will show that fε → δ in D′(R). Indeed, for ϕ ∈ D(R), by the Mean Value Theorem of integration,

Tfε(ϕ) = (1/(2ε)) ∫_R χ[−ε,ε] ϕ dx = (1/(2ε)) ∫_{−ε}^{ε} ϕ(x) dx = (1/(2ε)) · 2ε ϕ(ξ) = ϕ(ξ),  ξ ∈ [−ε, ε],

for some ξ. Since ϕ is continuous at 0, ϕ(ξ) tends to ϕ(0) as ε → 0, such that

lim_{ε→0} Tfε(ϕ) = ϕ(0) = δ(ϕ).

This proves the claim.
The following lemma generalizes this example.
Lemma 16.3 Suppose that f ∈ L1(R) with ∫_R f(x) dx = 1. For ε > 0 define the scaled function fε(x) = (1/ε) f(x/ε).
Then lim_{ε→0+0} Tfε = δ in D′(R).
R
Proof. By the change of variable theorem, R fε (x) dx = 1 for all ε > 0. To prove the claim we
have to show that for all ϕ ∈ D
Z
Z
fε (x)ϕ(x) dx −→ ϕ(0) =
fε (x)ϕ(0) dx as ε → 0;
R
or, equivalently,
R
Z
fε (x)(ϕ(x) − ϕ(0)) dx −→ 0,
R
as ε → 0.
Using the new coordinate y with x = εy, dx = ε dy, the above integral equals

∫_R ε fε(εy)(ϕ(εy) − ϕ(0)) dy = ∫_R f(y)(ϕ(εy) − ϕ(0)) dy.
Since ϕ is continuous at 0, for every fixed y the family of functions (ϕ(εy) − ϕ(0)) tends to 0 as ε → 0. Hence the family of functions gε(y) = f(y)(ϕ(εy) − ϕ(0)) pointwise tends to 0. Further, gε has an integrable upper bound, 2C | f |, where C = sup_{x∈R} | ϕ(x) |. By Lebesgue's theorem about dominated convergence, the limit of the integrals is 0:

lim_{ε→0} ∫_R | f(y) | | ϕ(εy) − ϕ(0) | dy = ∫_R | f(y) | lim_{ε→0} | ϕ(εy) − ϕ(0) | dy = 0.

This proves the claim.
The following families of locally integrable functions approximate δ as ε → 0:

fε(x) = (ε/(πx²)) sin²(x/ε),     fε(x) = (1/(2ε√π)) e^{−x²/(4ε²)},
fε(x) = (1/π) · ε/(x² + ε²),     fε(x) = (1/(πx)) sin(x/ε).    (16.4)

The first three functions satisfy the assumptions of the Lemma; the last one does not, since (sin x)/x is not in L1(R). Later we will see that the above lemma even holds if ∫_{−∞}^{∞} f(x) dx = 1 as an improper Riemann integral.
16.2.4 The Distribution P(1/x)
Since the function 1/x is not locally integrable in a neighborhood of 0, 1/x is not a regular distribution. However, we can define a substitute that coincides with 1/x for all x ≠ 0.
Recall that the principal value (or Cauchy mean value) of an improper Riemann integral is defined as follows. Suppose f(x) has a singularity at c ∈ [a, b]; then

Vp ∫_a^b f(x) dx := lim_{ε→0} ( ∫_a^{c−ε} + ∫_{c+ε}^b ) f(x) dx.

For example, Vp ∫_{−1}^{1} dx/x^{2n+1} = 0, n ∈ N.
For ϕ ∈ D define

F(ϕ) = Vp ∫_{−∞}^{∞} ϕ(x)/x dx = lim_{ε→0} ( ∫_{−∞}^{−ε} + ∫_{ε}^{∞} ) ϕ(x)/x dx.
Then F is obviously linear. We have to show that F(ϕ) is finite and that F is continuous on D. Suppose that supp ϕ ⊆ [−R, R]. Define the auxiliary function

ψ(x) = (ϕ(x) − ϕ(0))/x,  x ≠ 0;   ψ(0) = ϕ′(0).
Since ϕ is differentiable at 0, ψ ∈ C(R). Since 1/x is odd, ( ∫_{−R}^{−ε} + ∫_{ε}^{R} ) dx/x = 0, and we get

F(ϕ) = lim_{ε→0} ( ∫_{−∞}^{−ε} + ∫_{ε}^{∞} ) ϕ(x)/x dx = lim_{ε→0} ( ∫_{−R}^{−ε} + ∫_{ε}^{R} ) (ϕ(x) − ϕ(0))/x dx
     = lim_{ε→0} ( ∫_{−R}^{−ε} + ∫_{ε}^{R} ) ψ(x) dx = ∫_{−R}^{R} ψ(x) dx.
Since ψ is continuous, the above integral exists.
We now prove continuity of F. By Taylor's theorem, ϕ(x) = ϕ(0) + xϕ′(ξx) for some value ξx between x and 0. Therefore

| F(ϕ) | = | lim_{ε→0} ( ∫_{−∞}^{−ε} + ∫_{ε}^{∞} ) ϕ(x)/x dx |
        = | lim_{ε→0} ( ∫_{−R}^{−ε} + ∫_{ε}^{R} ) (ϕ(0) + xϕ′(ξx))/x dx |
        ≤ ∫_{−R}^{R} | ϕ′(ξx) | dx ≤ 2R sup_{x∈R} | ϕ′(x) |.

This shows that condition (16.3) in Remark 16.3 is satisfied with C = 2R and l = 1, such that F is a continuous linear functional on D(R), F ∈ D′(R). We denote this distribution by P(1/x).
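The two expressions for F(ϕ) derived above, the symmetric cutoff and the integral of the auxiliary function ψ, can be compared numerically. The following sketch is a hypothetical illustration; ϕ(x) = e^{−(x−1)²} is a smooth, non-even stand-in for a test function (numerically negligible outside [−R, R]), and ψ(0) = ϕ′(0) = 2e^{−1}:

```python
import numpy as np

phi = lambda x: np.exp(-(x - 1.0) ** 2)   # smooth, non-even stand-in for a test function
R = 8.0                                    # phi is negligible outside [-R, R]

# Way 1: symmetric cutoff, folded onto (eps, R):
# int_{eps<|x|<R} phi(x)/x dx = int_eps^R (phi(x) - phi(-x))/x dx
eps = 1e-6
xp = np.linspace(eps, R, 2000001)
way1 = np.sum((phi(xp) - phi(-xp)) / xp) * (xp[1] - xp[0])

# Way 2: int_{-R}^{R} psi(x) dx with psi = (phi(x) - phi(0))/x, psi(0) = phi'(0)
x = np.linspace(-R, R, 2000001)
safe = np.where(x == 0.0, 1.0, x)                      # avoid division by zero
psi = np.where(x == 0.0, 2.0 * np.exp(-1.0), (phi(x) - phi(0.0)) / safe)
way2 = np.sum(psi) * (x[1] - x[0])

print(way1, way2)  # the two values agree
```

Both Riemann sums approximate the same principal value, illustrating that the singular integral and the regularized integrand define the same functional.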
In quantum physics one needs the so-called Sokhotsky formulas, [Wla72, p. 76]:

lim_{ε→0+0} 1/(x + εi) = −πiδ(x) + P(1/x),
lim_{ε→0+0} 1/(x − εi) = πiδ(x) + P(1/x).

Idea of proof: show the sum and the difference of the above formulas instead:

lim_{ε→0+0} 2x/(x² + ε²) = 2 P(1/x),
lim_{ε→0+0} (−2iε)/(x² + ε²) = −2πiδ.

The second limit follows from (16.4).
16.2.5 Operation with Distributions
The distributions are distinguished by the fact that in many calculations they are much easier to
handle than functions. For this purpose it is necessary to define operations on the set D′ . We
already know how to add distributions and how to multiply them with complex numbers since
D′ is a linear space. Our guiding principle to define multiplication, derivatives, tensor products, convolution, Fourier transform is always the same: for regular distributions, i.e. locally
integrable functions, we want to recover the old well-known operation.
(a) Multiplication
There is no notion of a product T1 T2 of distributions. However, we can define a·T = T·a for T ∈ D′(Rn), a ∈ C∞(Rn). What happens in the case of a regular distribution T = Tf?

⟨aTf, ϕ⟩ = ∫_{Rn} a(x)f(x)ϕ(x) dx = ∫_{Rn} f(x) (a(x)ϕ(x)) dx = ⟨Tf, aϕ⟩.    (16.5)

Obviously, aϕ ∈ D(Rn) since a ∈ C∞(Rn) and ϕ has compact support; thus aϕ has compact support, too. Hence the right hand side of (16.5) defines a linear functional on D(Rn). We have to show continuity. Suppose that ϕn −→_D 0; then aϕn −→_D 0. Then ⟨T, aϕn⟩ → 0 since T is continuous.
Definition 16.8 For a ∈ C∞(Rn) and T ∈ D′(Rn) we define aT ∈ D′(Rn) by

⟨aT, ϕ⟩ = ⟨T, aϕ⟩

and call aT the product of a and T.
Example 16.5 (a) x · P(1/x) = 1. Indeed, for ϕ ∈ D(R),

⟨x P(1/x), ϕ⟩ = ⟨P(1/x), xϕ(x)⟩ = Vp ∫_{−∞}^{∞} (x ϕ(x))/x dx = ∫_R ϕ(x) dx = ⟨1, ϕ⟩.
(b) If f(x) ∈ C∞(Rn) then

⟨f(x)δa, ϕ⟩ = ⟨δa, f(x)ϕ(x)⟩ = f(a)ϕ(a) = f(a) ⟨δa, ϕ⟩.

This shows f(x)δa = f(a)δa.
(c) Note that multiplication of distributions is no longer associative:

(δ · x) · P(1/x) = 0 · P(1/x) = 0,   but   δ · (x · P(1/x)) = δ · 1 = δ.
(b) Differentiation
Consider n = 1. Suppose that f ∈ L1loc is continuously differentiable. Suppose further that ϕ ∈ D with supp ϕ ⊂ (−a, a), such that ϕ(−a) = ϕ(a) = 0. We want to define (Tf)′ to be Tf′. Using integration by parts we find

⟨Tf′, ϕ⟩ = ∫_{−a}^{a} f′(x)ϕ(x) dx = f(x)ϕ(x)|_{−a}^{a} − ∫_{−a}^{a} f(x)ϕ′(x) dx = −∫_{−a}^{a} f(x)ϕ′(x) dx = −⟨Tf, ϕ′⟩,

where we used that ϕ(−a) = ϕ(a) = 0. Hence, it makes sense to define ⟨(Tf)′, ϕ⟩ = −⟨Tf, ϕ′⟩.
This can easily be generalized to arbitrary partial derivatives D^α Tf.
Definition 16.9 For T ∈ D′(Rn) and a multi-index α ∈ N0^n define D^α T ∈ D′(Rn) by

⟨D^α T, ϕ⟩ = (−1)^{|α|} ⟨T, D^α ϕ⟩.
We have to make sure that D^α T is indeed a distribution. The linearity of D^α T is obvious. To prove continuity, let ϕn −→_D 0. By definition, this implies D^α ϕn −→_D 0. Since T is continuous, ⟨T, D^α ϕn⟩ → 0. This shows ⟨D^α T, ϕn⟩ → 0; hence D^α T is a continuous linear functional on D.
Note that it is exactly the fact D^α T ∈ D′ that requires the complicated looking notion of convergence in D, D^α ϕn → D^α ϕ. Further, a distribution has partial derivatives of all orders.
Lemma 16.4 Let a ∈ C∞ (Rn ) and T ∈ D′ (Rn ). Then
(a) Differentiation D α : D′ → D′ is continuous in D′ , that is, Tn → T in D′ implies D α Tn →
D α T in D′ .
(b) ∂/∂xi (aT) = (∂a/∂xi) T + a (∂T/∂xi),   i = 1, ..., n (product rule).
(c) For any two multi-indices α and β
D α+β T = D α (D β T ) = D β (D α T ) (Schwarz’s Lemma).
Proof. (a) Suppose that Tn → T in D′, that is, ⟨Tn, ψ⟩ → ⟨T, ψ⟩ for all ψ ∈ D. In particular, for ψ = D^α ϕ, ϕ ∈ D, we get

(−1)^{|α|} ⟨D^α Tn, ϕ⟩ = ⟨Tn, D^α ϕ⟩ −→ ⟨T, D^α ϕ⟩ = (−1)^{|α|} ⟨D^α T, ϕ⟩.

Since this holds for all ϕ ∈ D, the assertion follows.
(b) This follows from (writing a_{xi} = ∂a/∂xi)

⟨a ∂T/∂xi, ϕ⟩ = ⟨∂T/∂xi, aϕ⟩ = −⟨T, ∂/∂xi (aϕ)⟩
  = −⟨T, a_{xi} ϕ⟩ − ⟨T, a ∂ϕ/∂xi⟩ = −⟨a_{xi} T, ϕ⟩ − ⟨aT, ∂ϕ/∂xi⟩
  = −⟨a_{xi} T, ϕ⟩ + ⟨∂/∂xi (aT), ϕ⟩ = ⟨−a_{xi} T + ∂/∂xi (aT), ϕ⟩.

"Cancelling" ϕ on both sides proves the claim.
(c) The easy proof uses D α+β ϕ = D α (D β ϕ) for ϕ ∈ D.
Example 16.6 (a) Let a ∈ Rn, f ∈ L1loc(Rn), ϕ ∈ D. Then

⟨D^α δa, ϕ⟩ = (−1)^{|α|} ⟨δa, D^α ϕ⟩ = (−1)^{|α|} D^α ϕ(a),
⟨D^α Tf, ϕ⟩ = (−1)^{|α|} ∫_{Rn} f D^α ϕ dx.
(b) Recall that the so-called Heaviside function H(x) is defined as the characteristic function of the half-line (0, +∞). We compute its derivative in D′:

⟨T′H, ϕ⟩ = −∫_R H(x)ϕ′(x) dx = −∫_0^∞ ϕ′(x) dx = −ϕ(x)|_0^∞ = ϕ(0) = ⟨δ, ϕ⟩.

This shows T′H = δ.
(c) More generally, let f(x) be differentiable on G = R \ {c} = (−∞, c) ∪ (c, ∞) with a discontinuity of the first kind at c. (A figure shows the graph of f with jump height h at c.)
The derivative of Tf in D′ is

T′f = Tf′ + h δc,  where  h = f(c + 0) − f(c − 0)

is the difference between the right-hand and left-hand limits of f at c. Indeed, for ϕ ∈ D we have

⟨T′f, ϕ⟩ = −∫_{−∞}^{c} f(x)ϕ′(x) dx − ∫_{c}^{∞} f(x)ϕ′(x) dx
        = −f(c − 0)ϕ(c) + f(c + 0)ϕ(c) + ∫_G f′(x)ϕ(x) dx
        = ⟨(f(c + 0) − f(c − 0)) δc + Tf′, ϕ⟩ = ⟨h δc + Tf′, ϕ⟩.
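The jump formula can be tested numerically: with f(x) = H(x)e^{−x} (jump h = 1 at c = 0), the pairing −⟨Tf, ϕ′⟩ should match ⟨Tf′, ϕ⟩ + h ϕ(0). A sketch, as a hypothetical illustration with a Gaussian stand-in for the test function ϕ:

```python
import numpy as np

phi = lambda x: np.exp(-x ** 2)              # smooth stand-in for a test function
dphi = lambda x: -2.0 * x * np.exp(-x ** 2)  # its derivative

x = np.linspace(0.0, 40.0, 2000001)          # f vanishes for x < 0
dx = x[1] - x[0]
f = np.exp(-x)                               # f(x) = H(x) e^{-x}: jump h = 1 at c = 0
fprime = -np.exp(-x)                         # classical derivative away from the jump

lhs = -np.sum(f * dphi(x)) * dx                        # <T_f', phi> = -<T_f, phi'>
rhs = np.sum(fprime * phi(x)) * dx + 1.0 * phi(0.0)    # <T_{f'} + h*delta_0, phi>
print(lhs, rhs)
```

With h = 1 and c = 0 this also reproduces the special case T′H = δ up to the exponential factor.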
(d) We prove that f(x) = log | x | is in L1loc(R) (see homework 50.4 and 51.5) and compute its derivative in D′(R).
Proof. Since f(x) is continuous on R \ {0}, the only critical point is 0. Since the integral (improper Riemann or Lebesgue) ∫_0^1 log x dx = ∫_{−∞}^0 t e^t dt = −1 exists, f is locally integrable at 0 and therefore defines a regular distribution. We will show that f′(x) = P(1/x). We use the fact that ∫_{−∞}^{∞} = ∫_{−∞}^{−ε} + ∫_{−ε}^{ε} + ∫_{ε}^{∞} for all ε > 0; the limit ε → 0 of the right hand side gives ∫_{−∞}^{∞}. By definition of the derivative,
⟨(log | x |)′, ϕ(x)⟩ = −⟨log | x |, ϕ′(x)⟩ = −∫_{−∞}^{∞} log | x | ϕ′(x) dx
  = −( ∫_{−∞}^{−ε} + ∫_{−ε}^{ε} + ∫_{ε}^{∞} ) log | x | ϕ′(x) dx.
Since ∫_{−1}^{1} | log | x | ϕ′(x) | dx < ∞, the middle integral ∫_{−ε}^{ε} log | x | ϕ′(x) dx tends to 0 as ε → 0 (apply Lebesgue's theorem to the family of functions gε(x) = χ[−ε,ε](x) log | x | ϕ′(x), which pointwise tends to 0 and is dominated by the integrable function | log | x | ϕ′(x) |). We consider the third integral. Integration by parts and ϕ(+∞) = 0 give

∫_{ε}^{∞} log x ϕ′(x) dx = log x ϕ(x)|_{ε}^{∞} − ∫_{ε}^{∞} ϕ(x)/x dx = −log ε ϕ(ε) − ∫_{ε}^{∞} ϕ(x)/x dx.

Similarly,

∫_{−∞}^{−ε} log(−x) ϕ′(x) dx = log ε ϕ(−ε) − ∫_{−∞}^{−ε} ϕ(x)/x dx.

The sum of the two non-integral terms tends to 0 as ε → 0 since ε log ε → 0. Indeed,

log ε ϕ(ε) − log ε ϕ(−ε) = 2ε log ε · (ϕ(ε) − ϕ(−ε))/(2ε) −→ 2 lim_{ε→0} (ε log ε) ϕ′(0) = 0.

Hence

⟨f′, ϕ⟩ = lim_{ε→0} ( ∫_{−∞}^{−ε} + ∫_{ε}^{∞} ) ϕ(x)/x dx = ⟨P(1/x), ϕ⟩.
(c) Convergence and Fourier Series
Lemma 16.5 Suppose that (fn ) converges locally uniformly to some function f , that is, fn ⇉ f
uniformly on every compact set; assume further that fn is locally integrable for all n, fn ∈
L1loc (Rn ).
(a) Then f ∈ L1loc (Rn ) and Tfn → Tf in D′ (Rn ).
(b) D α Tfn → D α Tf in D′ (Rn ) for all multi-indices α.
Proof. (a) Let K be a compact subset of Rn; we will show that f ∈ L1(K). Since the fn converge uniformly on K to f, by Theorem 6.6 f is integrable and lim_{n→∞} ∫_K fn(x) dx = ∫_K f dx, such that f ∈ L1loc(Rn).
We show that Tfn → Tf in D′. Indeed, for any ϕ ∈ D with compact support K, again by Theorem 6.6 and uniform convergence of fn ϕ on K,

lim_{n→∞} Tfn(ϕ) = lim_{n→∞} ∫_K fn(x)ϕ(x) dx = ∫_K ( lim_{n→∞} fn(x) ) ϕ(x) dx = ∫_K f(x)ϕ(x) dx = Tf(ϕ);

we are allowed to exchange limit and integration since (fn(x)ϕ(x)) uniformly converges on K. Since this is true for all ϕ ∈ D, it follows that Tfn → Tf.
(b) By Lemma 16.4 (a), differentiation is a continuous operation in D′. Thus D^α Tfn → D^α Tf.
Example 16.7 (a) Suppose that a, b > 0 and m ∈ N are given such that | cn | ≤ a | n |^m + b for all n ∈ Z. Then the Fourier series

Σ_{n∈Z} cn e^{inx}

converges in D′(R).
First consider the series

c0 x^{m+2}/(m + 2)! + Σ_{n∈Z, n≠0} cn/(ni)^{m+2} e^{inx}.    (16.6)

By assumption,

| cn/(ni)^{m+2} e^{inx} | = | cn |/| n |^{m+2} ≤ (a | n |^m + b)/| n |^{m+2} ≤ ã/| n |².

Since Σ_{n≠0} ã/| n |² < ∞, the series (16.6) converges uniformly on R by the criterion of Weierstraß (Theorem 6.2). By Lemma 16.5, the series (16.6) converges in D′, too, and can be differentiated term-by-term. The (m + 2)nd derivative of (16.6) is exactly the given Fourier series.
(A figure shows the 2π-periodic sawtooth, taking values between −1/2 and 1/2, on the interval [−π, 2π].) The 2π-periodic function f(x) = 1/2 − x/(2π), x ∈ [0, 2π), has discontinuities of the first kind at 2πn, n ∈ Z; the jump has height 1 since f(0 + 0) − f(0 − 0) = 1/2 + 1/2 = 1.
Therefore, in D′,

f′(x) = −1/(2π) + Σ_{n∈Z} δ(x − 2πn).

The Fourier series of f(x) is

f(x) ∼ (1/(2πi)) Σ_{n≠0} (1/n) e^{inx}.
R 2π
Note that f and the Fourier series g on the right are equal in L2 (0, 2π). Hence 0 | f − g |2 = 0.
This implies f = g a. e. on [0, 2π]; moreover f = g a. e. on R. Thus f = g in L1loc (R) and f
coincides with g in D′ (R).
1 X 1 inx
f (x) =
e
in D′ (R).
2πi
n
n6=0
By Lemma 16.5 the series can be differentiated termwise up to arbitrary order. Applying Example 16.6 we obtain:

f′(x) + 1/(2π) = Σ_{n∈Z} δ(x − 2πn) = (1/(2π)) Σ_{n∈Z} e^{inx}  in D′(R).
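The last identity (a Dirac comb) can be probed numerically: pairing the symmetric partial sums (1/(2π)) Σ_{|n|≤N} e^{inx} with a rapidly decaying function ϕ should approach Σ_k ϕ(2πk). A sketch, as a hypothetical illustration; the Gaussian ϕ(x) = e^{−x²} is a stand-in for a test function, effectively supported in (−2π, 2π), so the comb sum is ≈ ϕ(0) = 1:

```python
import numpy as np

phi = lambda x: np.exp(-x ** 2)     # rapidly decaying stand-in for a test function
x = np.linspace(-12.0, 12.0, 400001)
dx = x[1] - x[0]

N = 20
total = 0.0
for n in range(-N, N + 1):
    # (1/2pi) * integral of e^{inx} phi(x) dx, by a Riemann sum
    total += np.sum(np.exp(1j * n * x) * phi(x)).real * dx / (2.0 * np.pi)

print(total)  # close to phi(0) = 1
```

The Gaussian's Fourier integrals decay like e^{−n²/4}, so the partial sum stabilizes after only a few dozen terms.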
(b) A solution of x^m u(x) = 0 in D′ is

u(x) = Σ_{n=0}^{m−1} cn δ^{(n)}(x),   cn ∈ C,

since for every ϕ ∈ D and n = 0, ..., m − 1 we have

⟨x^m δ^{(n)}(x), ϕ⟩ = (−1)^n ⟨δ, (x^m ϕ(x))^{(n)}⟩ = (−1)^n (x^m ϕ(x))^{(n)}|_{x=0} = 0;

thus the given u satisfies x^m u = 0. One can show that this is the general solution, see [Wla72, p. 84].
(c) The general solution of the ODE u^{(m)} = 0 in D′ is a polynomial of degree m − 1.
Proof. We only prove that u′ = 0 implies u = c in D′. The general statement follows by induction on m.
Suppose that u′ = 0. That is, for all ψ ∈ D we have 0 = ⟨u′, ψ⟩ = −⟨u, ψ′⟩. Fix an auxiliary function ϕ1 ∈ D with ∫_R ϕ1(t) dt = 1. For ϕ ∈ D the function

ψ(x) = ∫_{−∞}^{x} (ϕ(t) − I ϕ1(t)) dt,  where I = ⟨1, ϕ⟩,

belongs to D, since ∫_R (ϕ(t) − I ϕ1(t)) dt = I − I = 0, so that ψ has compact support. Since ⟨u, ψ′⟩ = 0 and ψ′ = ϕ − I ϕ1, we obtain

0 = ⟨u, ψ′⟩ = ⟨u, ϕ − I ϕ1⟩ = ⟨u, ϕ⟩ − ⟨u, ϕ1⟩ ⟨1, ϕ⟩ = ⟨u − c·1, ϕ⟩,

where c = ⟨u, ϕ1⟩. Since this is true for all test functions ϕ ∈ D(R), we obtain 0 = u − c, or u = c, which proves the assertion.
16.3 Tensor Product and Convolution Product
16.3.1 The Support of a Distribution
Let T ∈ D′ be a distribution. We say that T vanishes at x0 if there exists ε > 0 such that ⟨T, ϕ⟩ = 0 for all functions ϕ ∈ D with supp ϕ ⊆ Uε(x0). Similarly, we say that two distributions T1 and T2 are equal at x0, T1(x0) = T2(x0), if T1 − T2 vanishes at x0. Note that T1 = T2 if and only if T1 = T2 at a ∈ Rn for all points a ∈ Rn.
Definition 16.10 Let T ∈ D′ be a distribution. The support of T, denoted by supp T, is the set of all points x such that T does not vanish at x, that is,

supp T = {x | ∀ ε > 0 ∃ ϕ ∈ D(Uε(x)) : ⟨T, ϕ⟩ ≠ 0}.
Remarks 16.6 (a) If f is continuous, then supp Tf = supp f ; for an arbitrary locally integrable
function we have, in general supp Tf ⊂ supp f . The support of a distribution is closed. Its
complement is the largest open subset G of Rn such that T ↾G = 0.
(b) supp δa = {a}, that is, δa vanishes at all points b 6= a. supp TH = [0, +∞), supp TχQ = ∅.
16.3.2 Tensor Products
(a) Tensor product of Functions
Let f : Rn → C, g : Rm → C be functions. Then the tensor product f ⊗ g : Rn+m → C is
defined via f ⊗g(x, y) = f (x)g(y), x ∈ Rn , y ∈ Rm .
If ϕk ∈ D(Rn) and ψk ∈ D(Rm), k = 1, ..., r, we call the function ϕ(x, y) = Σ_{k=1}^{r} ϕk(x)ψk(y), which is defined on Rn+m, the tensor product of the functions ϕk and ψk. It is denoted by Σ_k ϕk ⊗ ψk. The set of such tensors Σ_{k=1}^{r} ϕk ⊗ ψk is denoted by D(Rn) ⊗ D(Rm). It is a linear space.
Note first that under the above assumptions on ϕk and ψk the tensor product ϕ = Σ_k ϕk ⊗ ψk is in C∞(Rn+m). Let K1 ⊂ Rn and K2 ⊂ Rm denote the common compact supports of the families {ϕk} and {ψk}, respectively. Then supp ϕ ⊂ K1 × K2. Since both K1 and K2 are compact, the product K1 × K2 is again compact. Hence ϕ(x, y) ∈ D(Rn+m). Thus D(Rn) ⊗ D(Rm) ⊂ D(Rn+m). Moreover, D(Rn) ⊗ D(Rm) is a dense subspace of D(Rn+m). That is, for any η ∈ D(Rn+m) there exist positive integers rm ∈ N and test functions ϕk^{(m)}, ψk^{(m)} such that

Σ_{k=1}^{rm} ϕk^{(m)} ⊗ ψk^{(m)} −→_D η  as m → ∞.
(b) Tensor Product of Distributions
Definition 16.11 Let T ∈ D′ (Rn ) and S ∈ D′ (Rm ) be two distributions. Then there exists a
unique distribution F ∈ D′ (Rn+m ) such that for all ϕ ∈ D(Rn ) and ψ ∈ D(Rm )
F (ϕ ⊗ ψ) = T (ϕ)S(ψ).
This distribution F is denoted by T ⊗ S.
Indeed, T ⊗ S is linear on D(Rn) ⊗ D(Rm), such that (T ⊗ S)(Σ_{k=1}^{r} ϕk ⊗ ψk) = Σ_{k=1}^{r} T(ϕk)S(ψk). By continuity it is extended from D(Rn) ⊗ D(Rm) to D(Rn+m). For example, if a ∈ Rn, b ∈ Rm, then δa ⊗ δb = δ(a,b). Indeed, for ϕ ∈ D(Rn) and ψ ∈ D(Rm) we have

(δa ⊗ δb)(ϕ ⊗ ψ) = ϕ(a)ψ(b) = (ϕ ⊗ ψ)(a, b) = δ(a,b)(ϕ ⊗ ψ).
Lemma 16.6 Let F = T ⊗ S be the unique distribution in D′(Rn+m) where T ∈ D′(Rn) and S ∈ D′(Rm), and let η(x, y) ∈ D(Rn+m).
Then ϕ(x) = ⟨S(y), η(x, y)⟩ is in D(Rn), ψ(y) = ⟨T(x), η(x, y)⟩ is in D(Rm), and we have

⟨T ⊗ S, η⟩ = ⟨S, ⟨T, η⟩⟩ = ⟨T, ⟨S, η⟩⟩.
For the proof, see [Wla72, II.7].
Example 16.8 (a) Regular Distributions. Let f ∈ L1loc(Rn) and g ∈ L1loc(Rm). Then f ⊗ g ∈ L1loc(Rn+m) and Tf ⊗ Tg = Tf⊗g. Indeed, by Fubini's theorem, for test functions ϕ and ψ one has

⟨Tf ⊗ Tg, ϕ ⊗ ψ⟩ = ⟨Tf, ϕ⟩ ⟨Tg, ψ⟩ = ∫_{Rn} f(x)ϕ(x) dx ∫_{Rm} g(y)ψ(y) dy
  = ∫_{Rn+m} f(x)g(y) ϕ(x)ψ(y) dxdy = ⟨Tf⊗g, ϕ ⊗ ψ⟩.
(b) ⟨δx0 ⊗ T, η⟩ = ⟨T, η(x0, y)⟩. Indeed,

⟨δx0 ⊗ T, ϕ(x)ψ(y)⟩ = ⟨δx0, ϕ(x)⟩ ⟨T, ψ⟩ = ϕ(x0) ⟨T, ψ(y)⟩ = ⟨T, ϕ(x0)ψ(y)⟩.

In particular,

(δa ⊗ Tg)(η) = ∫_{Rm} g(y)η(a, y) dy.

(c) For any α ∈ N0^n, β ∈ N0^m,

D^{α+β}(T ⊗ S) = (Dx^α T) ⊗ (Dy^β S) = D^β((D^α T) ⊗ S) = D^α(T ⊗ D^β S).
Idea of proof in case n = m = 1. Let ϕ, ψ ∈ D(R). Then

(∂/∂x (T ⊗ S))(ϕ ⊗ ψ) = −(T ⊗ S)(∂/∂x (ϕ ⊗ ψ)) = −(T ⊗ S)(ϕ′ ⊗ ψ)
  = −T(ϕ′)S(ψ) = T′(ϕ)S(ψ) = (T′ ⊗ S)(ϕ ⊗ ψ).
16.3.3 Convolution Product
Motivation: knowing the fundamental solution E of a linear differential operator L, that is, L(E) = δ, one immediately has a solution of the equation L[u] = f for arbitrary f, namely u = E ∗ f, where ∗ is the convolution product already defined for functions in Definition 16.2.
(a) Convolution Product of Functions
The main problem with convolutions is that we run into trouble with the support. Even in the case that f and g are locally integrable, f ∗ g need not be a locally integrable function. However, there are three cases where all is fine:
1. One of the two functions f or g has compact support.
2. Both functions have support in [0, +∞).
3. Both functions are in L1(R).
In the last case (f ∗ g)(x) = ∫ f(y)g(x − y) dy is again integrable. The convolution product is a commutative and associative operation on L1(Rn).
(b) Convolution Product of Distributions
Let us consider the case of regular distributions. Suppose that f, g, f ∗ g ∈ L1loc(Rn). As usual, we want to have Tf ∗ Tg = Tf∗g. Let ϕ ∈ D(Rn); then

⟨Tf∗g, ϕ⟩ = ∫_R (f ∗ g)(x)ϕ(x) dx = ∬_{R²} f(y)g(x − y)ϕ(x) dxdy
  =_{t=x−y} ∬_{R²} f(y)g(t)ϕ(y + t) dy dt = Tf⊗g(ϕ̃),    (16.7)

where ϕ̃(y, t) = ϕ(y + t).
(A figure shows supp ϕ(x + y) in the plane: an unbounded diagonal strip.) There are two problems: (a) In general ϕ̃ is not a test function since it has unbounded support in R2n. Indeed, (y, t) ∈ supp ϕ̃ if y + t = c ∈ supp ϕ, which describes a family of parallel hyperplanes forming an unbounded strip. (b) The integral need not exist. We overcome the second problem if we impose the condition that the set

Kϕ = {(y, t) ∈ R2n | y ∈ supp Tf, t ∈ supp Tg, y + t ∈ supp ϕ}

is bounded for any ϕ ∈ D(Rn); then the integral (16.7) makes sense.
We want to solve problem (a) by "cutting" ϕ̃. Define

Tf∗g(ϕ) = lim_{k→∞} (Tf ⊗ Tg)(ϕ(y + t)ηk(y, t)),

where ηk ∈ D(R2n) and ηk −→ 1 as k → ∞. Such a sequence exists: let η(y, t) ∈ D(R2n) with η(y, t) = 1 for ‖y‖² + ‖t‖² ≤ 1 and put ηk(y, t) = η(y/k, t/k), k ∈ N. Then lim_{k→∞} ηk(y, t) = 1 for all (y, t) ∈ R2n.
Definition 16.12 Let T, S ∈ D′ (Rn ) be distributions and assume that for every ϕ ∈ D(Rn ) the
set
Kϕ := {(x, y) ∈ R2n | x + y ∈ supp ϕ, x ∈ supp T, y ∈ supp S}
is bounded. Define

⟨T ∗ S, ϕ⟩ = lim_{k→∞} ⟨T ⊗ S, ϕ(x + y)ηk(x, y)⟩.    (16.8)
T ∗ S is called the convolution product of the distributions S and T .
Remark 16.7 (a) The sequence (16.8) becomes stationary for large k such that the limit exists.
Indeed, for fixed ϕ, the set Kϕ is bounded; hence there exists k0 ∈ N such that ηk(x, y) = 1 for all (x, y) ∈ Kϕ and all k ≥ k0. That is, ϕ(x + y)ηk(x, y) does not change for k ≥ k0.
(b) The limit is a distribution in D′(Rn).
(c) The limit does not depend on the special choice of the sequence ηk .
Remarks 16.8 (Properties) (a) If S or T has compact support, then T ∗ S exists. Indeed, suppose that supp T is compact. Then x + y ∈ supp ϕ and x ∈ supp T imply y ∈ supp ϕ − supp T = {y1 − y2 | y1 ∈ supp ϕ, y2 ∈ supp T}. Hence (x, y) ∈ Kϕ implies

‖(x, y)‖ ≤ ‖x‖ + ‖y‖ ≤ ‖x‖ + ‖y1‖ + ‖y2‖ ≤ 2C + D

if supp T ⊂ UC(0) and supp ϕ ⊂ UD(0). That is, Kϕ is bounded.
(b) If S ∗ T exists, so does T ∗ S and S ∗ T = T ∗ S.
(c) If T ∗ S exists, so do D α T ∗ S, T ∗ D α S, D α (T ∗ S) and they coincide:
D α (T ∗ S) = D α T ∗ S = T ∗ D α S.
Proof. For simplicity let n = 1 and D^α = d/dx. Suppose that ϕ ∈ D(R); then

⟨(S ∗ T)′, ϕ⟩ = −⟨S ∗ T, ϕ′⟩ = −lim_{k→∞} ⟨S ⊗ T, ϕ′(x + y) ηk(x, y)⟩
  = −lim_{k→∞} ⟨S ⊗ T, ∂/∂x (ϕ(x + y)ηk(x, y)) − ϕ(x + y) ∂ηk/∂x⟩
  = lim_{k→∞} ⟨S′ ⊗ T, ϕ(x + y)ηk(x, y)⟩ − lim_{k→∞} ⟨S ⊗ T, ϕ(x + y) ∂ηk/∂x⟩
  = ⟨S′ ∗ T, ϕ⟩,

where the second limit vanishes since ∂ηk/∂x = 0 on Kϕ for large k. The proof of the second equality uses commutativity of the convolution product.
(d) If supp S is compact and ψ ∈ D(Rn) is such that ψ(y) = 1 in a neighborhood of supp S, then

(T ∗ S)(ϕ) = ⟨T ⊗ S, ϕ(x + y)ψ(y)⟩,   ∀ ϕ ∈ D(Rn).
(e) If T1 , T2 , T3 ∈ D′ (Rn ) all have compact support, then T1 ∗ (T2 ∗ T3 ) and (T1 ∗ T2 ) ∗ T3 exist
and T1 ∗ (T2 ∗ T3 ) = (T1 ∗ T2 ) ∗ T3 .
16.3.4 Linear Change of Variables
Suppose that y = Ax + b is a regular, linear change of variables; that is, A is a regular n × n matrix. As usual, consider first the case of a regular distribution f(x). Let f̃(x) = f(Ax + b) with y = Ax + b, x = A⁻¹(y − b), dy = | det A | dx. Then

⟨f̃(x), ϕ(x)⟩ = ∫ f(Ax + b)ϕ(x) dx = ∫ f(y)ϕ(A⁻¹(y − b)) (1/| det A |) dy
  = (1/| det A |) ⟨f(y), ϕ(A⁻¹(y − b))⟩.
Definition 16.13 Let T ∈ D′(Rn), A a regular n × n matrix and b ∈ Rn. Then T(Ax + b) denotes the distribution

⟨T(Ax + b), ϕ(x)⟩ := (1/| det A |) ⟨T(y), ϕ(A⁻¹(y − b))⟩.

For example, T(x) = T and ⟨T(x − a), ϕ(x)⟩ = ⟨T, ϕ(x + a)⟩; in particular, δ(x − a) = δa:

⟨δ(x − b), ϕ(x)⟩ = ⟨δ(x), ϕ(x + b)⟩ = ϕ(0 + b) = ϕ(b) = ⟨δb, ϕ⟩.
Example 16.9 (a) δ ∗ S = S ∗ δ = S for all S ∈ D′. The existence is clear since δ has compact support.

⟨δ ∗ S, ϕ⟩ = lim_{k→∞} ⟨δ(x) ⊗ S(y), ϕ(x + y)ηk(x, y)⟩ = lim_{k→∞} ⟨S(y), ϕ(y)ηk(0, y)⟩ = ⟨S, ϕ⟩.

(b) δa ∗ S = S(x − a). Indeed,

(δa ∗ S)(ϕ) = lim_{k→∞} ⟨δa ⊗ S, ϕ(x + y)ηk(x, y)⟩ = lim_{k→∞} ⟨S(y), ϕ(a + y)ηk(a, y)⟩
  = ⟨S(y), ϕ(a + y)⟩ = ⟨S(y − a), ϕ⟩.

In particular, δa ∗ δb = δa+b.
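A discrete analogue of δa ∗ S = S(x − a): convolving a sequence with a unit impulse at index a shifts the sequence by a. A minimal sketch (hypothetical illustration, not from the text):

```python
import numpy as np

s = np.array([1.0, 2.0, 3.0, 0.0, 0.0])   # a finite sequence standing in for S
a = 2
delta_a = np.zeros(5)
delta_a[a] = 1.0                          # discrete delta at index a
shifted = np.convolve(s, delta_a)[:5]     # (delta_a * s)[k] = s[k - a]
print(shifted)  # [0. 0. 1. 2. 3.]
```

The convolution sum Σ_j δa[j] s[k − j] collapses to the single term s[k − a], exactly the shift.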
(c) Let ̺ ∈ L1loc(Rn) and suppose that supp T̺ is compact.
Case n = 2: f(x) = log(1/‖x‖) ∈ L1loc(R²). We call

V(x) = (̺ ∗ f)(x) = ∬_{R²} ̺(y) log(1/‖x − y‖) dy

the surface potential with density ̺.
Case n ≥ 3: f(x) = 1/‖x‖^{n−2} ∈ L1loc(Rn). We call

V(x) = (̺ ∗ f)(x) = ∫_{Rn} ̺(y)/‖x − y‖^{n−2} dy

the vector potential with density ̺.
(d) For α > 0 and x ∈ R put fα(x) = (1/(α√(2π))) e^{−x²/(2α²)}. Then fα ∗ fβ = f_{√(α²+β²)}.
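Property (d) can be checked by discretizing the convolution on a grid; with Gaussians the identity fα ∗ fβ = f√(α²+β²) should hold up to quadrature error. A sketch (hypothetical illustration):

```python
import numpy as np

def f(x, alpha):
    # f_alpha(x) = 1/(alpha*sqrt(2*pi)) * exp(-x^2 / (2*alpha^2))
    return np.exp(-x ** 2 / (2.0 * alpha ** 2)) / (alpha * np.sqrt(2.0 * np.pi))

x = np.linspace(-20.0, 20.0, 8001)
dx = x[1] - x[0]
a, b = 1.0, 2.0

conv = np.convolve(f(x, a), f(x, b), mode="same") * dx   # numeric (f_a * f_b)(x)
target = f(x, np.sqrt(a ** 2 + b ** 2))
print(np.max(np.abs(conv - target)))  # small discretization error
```

On a symmetric grid, `mode="same"` aligns the discrete convolution with the original sample points, so the two arrays can be compared entry by entry.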
16.3.5 Fundamental Solutions
Suppose that L[u] is a linear differential operator on Rn,

L[u] = Σ_{|α|≤k} cα(x) D^α u,

where cα ∈ C∞(Rn).
Definition 16.14 A distribution E ∈ D′(Rn) is said to be a fundamental solution of the differential operator L if

L(E) = δ.

Note that E ∈ D′(Rn) need not be unique. It is a general result due to Malgrange and Ehrenpreis (1952) that any linear partial differential operator with constant coefficients possesses a fundamental solution.
(a) ODE
We start with an example from the theory of ordinary differential equations. Recall that H =
χ(0,+∞) denotes the Heaviside function.
Lemma 16.7 Suppose that u(t) is a solution of the following initial value problem for the ODE
\[
L[u] = u^{(m)} + a_1(t)u^{(m-1)} + \dots + a_m(t)u = 0,
\]
\[
u(0) = u'(0) = \dots = u^{(m-2)}(0) = 0, \qquad u^{(m-1)}(0) = 1.
\]
Then E = T_{H(t)u(t)} is a fundamental solution of L, that is, it satisfies L(E) = δ.
Proof. Using Leibniz' rule, Example 16.5 (b), and u(0) = 0 we find
\[
\mathrm{E}' = u(0)\,\delta + T_{Hu'} = T_{Hu'} .
\]
Similarly, one has
\[
\mathrm{E}'' = T_{Hu''}, \;\dots,\; \mathrm{E}^{(m-1)} = T_{Hu^{(m-1)}}, \qquad \mathrm{E}^{(m)} = T_{Hu^{(m)}} + \delta(t).
\]
This yields
\[
L(\mathrm{E}) = \mathrm{E}^{(m)} + a_1(t)\mathrm{E}^{(m-1)} + \dots + a_m(t)\mathrm{E} = T_{H(t)\,L[u](t)} + \delta = T_0 + \delta = \delta .
\]
Example 16.10 We have the following examples of fundamental solutions:
\[
y' + ay = 0:\quad \mathrm{E} = T_{H(x)e^{-ax}}; \qquad
y'' + a^2y = 0:\quad \mathrm{E} = T_{H(x)\frac{\sin ax}{a}} .
\]
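The first of these can be tested numerically: convolving E(t) = H(t)e^{−at} with a source f yields the solution of u′ + au = f with u(0) = 0. A sketch (a and f are arbitrary choices; the comparison solution is computed by hand):

```python
import numpy as np

# E(t) = H(t) e^{-a t} is a fundamental solution of d/dt + a: the causal
# convolution u = E * f solves u' + a u = f with u(0) = 0.
a = 2.0
dt = 1e-3
t = np.arange(0.0, 5.0, dt)

E = np.exp(-a * t)                 # H(t) e^{-a t} sampled on t >= 0
f = np.sin(3.0 * t)                # an arbitrary smooth source

u = np.convolve(E, f)[: t.size] * dt   # u(t) = int_0^t E(t-s) f(s) ds

# exact solution of u' + a u = sin(3t), u(0) = 0
u_exact = (a*np.sin(3*t) - 3*np.cos(3*t) + 3*np.exp(-a*t)) / (a**2 + 9)
assert np.max(np.abs(u - u_exact)) < 1e-2
```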
(b) PDE
Here is the main application of the convolution product: knowing a fundamental solution of a partial differential operator L, one immediately knows a weak solution of the inhomogeneous equation L(u) = f for f ∈ D′(R^n).
Theorem 16.8 Suppose that
\[
L[u] = \sum_{|\alpha|\le k} c_\alpha D^\alpha u
\]
is a linear differential operator in R^n with constant coefficients c_α. Suppose further that E ∈ D′(R^n) is a fundamental solution of L, and let f ∈ D′(R^n) be a distribution such that the convolution product S = E ∗ f exists.
Then L(S) = f in D′.
In the set of distributions of D′ which possess a convolution with E, S is the unique solution of L(S) = f.
Proof. By Remark 16.8 (b) we have
\[
L(S) = \sum_{|\alpha|\le k} c_\alpha D^\alpha(\mathrm{E}*f)
= \sum_{|\alpha|\le k} \bigl(c_\alpha D^\alpha \mathrm{E}\bigr) * f
= L(\mathrm{E}) * f = \delta * f = f .
\]
Suppose that S₁ and S₂ are both solutions of L(S) = f, i.e. L(S₁) = L(S₂) = f. Then
\[
S_1 - S_2 = (S_1 - S_2) * \delta = (S_1 - S_2) * \Bigl(\sum_{|\alpha|\le k} c_\alpha D^\alpha \mathrm{E}\Bigr)
= \Bigl(\sum_{|\alpha|\le k} c_\alpha D^\alpha (S_1 - S_2)\Bigr) * \mathrm{E} = (f-f)*\mathrm{E} = 0. \tag{16.9}
\]
16.4 Fourier Transformation in S(R^n) and S′(R^n)
We want to define the Fourier transformation for test functions ϕ as well as for distributions. The problem with D(R^n) is that the Fourier transform
\[
F\varphi(\xi) = \alpha_n \int_{\mathbb{R}} e^{-\mathrm{i}x\xi}\,\varphi(x)\,dx
\]
of a test function ϕ is an entire (analytic) function, so Fϕ does not have compact support: the only test function in D which is analytic is 0. To overcome this problem, we enlarge the space of test functions, D ⊂ S, in such a way that S becomes invariant under the Fourier transformation, F(S) ⊂ S.
Lemma 16.9 Let ϕ ∈ D(R). Then the Fourier transform g(z) = α_n ∫_R e^{−itz} ϕ(t) dt is holomorphic in the whole complex plane and bounded in any half-plane H_a = {z ∈ C | Im(z) ≤ a}.
Proof. (a) We show that the complex limit lim_{h→0}(g(z + h) − g(z))/h exists for all z ∈ C. Indeed,
\[
\frac{g(z+h)-g(z)}{h} = \alpha_n \int_{\mathbb{R}} e^{-\mathrm{i}zt}\,\frac{e^{-\mathrm{i}ht}-1}{h}\,\varphi(t)\,dt .
\]
Since \(\bigl| e^{-\mathrm{i}zt}\,\frac{e^{-\mathrm{i}ht}-1}{h}\,\varphi(t)\bigr| \le C\) for all t ∈ supp(ϕ) and all h ∈ C with |h| ≤ 1, we can apply Lebesgue's Dominated Convergence theorem:
\[
\lim_{h\to 0}\frac{g(z+h)-g(z)}{h}
= \alpha_n \int_{\mathbb{R}} e^{-\mathrm{i}zt}\,\lim_{h\to 0}\frac{e^{-\mathrm{i}ht}-1}{h}\,\varphi(t)\,dt
= \alpha_n \int_{\mathbb{R}} e^{-\mathrm{i}zt}\,(-\mathrm{i}t)\,\varphi(t)\,dt = F(-\mathrm{i}t\,\varphi(t))(z).
\]
(b) Suppose that Im(z) ≤ a. Then
\[
| g(z) | \le \alpha_n \int_K \bigl| e^{-\mathrm{i}t\,\mathrm{Re}(z)} \bigr|\, e^{t\,\mathrm{Im}(z)}\, |\varphi(t)|\,dt
\le \alpha_n \,\sup_{t\in K} |\varphi(t)| \int_K e^{ta}\,dt,
\]
where K is a compact set which contains supp ϕ.
16.4.1 The Space S(R^n)
Definition 16.15 S(R^n) is the set of all functions f ∈ C∞(R^n) such that for all multi-indices α and β
\[
p_{\alpha,\beta}(f) = \sup_{x\in\mathbb{R}^n} \bigl| x^\beta D^\alpha f(x) \bigr| < \infty .
\]
S is called the Schwartz space or the space of rapidly decreasing functions:
\[
S(\mathbb{R}^n) = \{ f \in C^\infty(\mathbb{R}^n) \mid \forall\,\alpha,\beta:\ p_{\alpha,\beta}(f) < \infty \}.
\]
Roughly speaking, a Schwartz function is a function decreasing to 0 (together with all its partial derivatives) faster than any rational function 1/P(x) as x → ∞. In place of p_{α,β} one can also use the norms
\[
p_{k,l}(\varphi) = \sum_{|\alpha|\le k,\ |\beta|\le l} p_{\alpha,\beta}(\varphi), \qquad k, l \in \mathbb{Z}_+,
\]
to describe S(R^n).
The set S(R^n) is a linear space and the p_{α,β} are norms on S.
For example, P(x) ∉ S for any non-zero polynomial P(x); however, e^{−‖x‖²} ∈ S(R^n).
S(R^n) is an algebra. Indeed, the generalized Leibniz rule ensures p_{k,l}(ϕ·ψ) < ∞. For example, f(x) = p(x)e^{−ax²+bx+c}, a > 0, belongs to S(R) for any polynomial p; on the other hand, g(x) = e^{−|x|} is not differentiable at 0 and hence not in S(R).
Convergence in S
Definition 16.16 Let ϕ_n, ϕ ∈ S. We say that the sequence (ϕ_n) converges in S to ϕ, abbreviated ϕ_n →_S ϕ, if one of the following equivalent conditions is satisfied for all multi-indices α and β:
\[
p_{\alpha,\beta}(\varphi - \varphi_n) \xrightarrow[n\to\infty]{} 0; \qquad
x^\beta D^\alpha(\varphi - \varphi_n) \rightrightarrows 0 \ \text{uniformly on } \mathbb{R}^n; \qquad
x^\beta D^\alpha \varphi_n \rightrightarrows x^\beta D^\alpha \varphi \ \text{uniformly on } \mathbb{R}^n .
\]
Remarks 16.9 (a) In quantum mechanics one defines the position and momentum operators Q_k and P_k, k = 1, …, n, by
\[
(Q_k\varphi)(x) = x_k\,\varphi(x), \qquad (P_k\varphi)(x) = -\,\mathrm{i}\,\frac{\partial\varphi}{\partial x_k},
\]
respectively. The space S is invariant under both operators Q_k and P_k; that is, x^β D^α ϕ(x) ∈ S(R^n) for all ϕ ∈ S(R^n).
(b) S(R^n) ⊂ L¹(R^n).
Recall that a rational function P(x)/Q(x) is integrable over [1, +∞) if and only if Q(x) ≠ 0 for x ≥ 1 and deg Q ≥ deg P + 2. Indeed, C/x² is then an integrable upper bound. We want to find a condition on m such that
\[
\int_{\mathbb{R}^n} \frac{dx}{(1+\|x\|^2)^m} < \infty .
\]
To this end, we use that any non-zero x ∈ R^n can uniquely be written as x = r y, where r = ‖x‖ and y is on the unit sphere S^{n−1}. One can show that dx₁ dx₂ ⋯ dx_n = r^{n−1} dr dS, where dS is the surface element of the unit sphere S^{n−1}. Using this and Fubini's theorem,
\[
\int_{\mathbb{R}^n} \frac{dx}{(1+\|x\|^2)^m}
= \int_0^\infty\!\!\int_{S^{n-1}} \frac{r^{n-1}\,dS\,dr}{(1+r^2)^m}
= \omega_{n-1} \int_0^\infty \frac{r^{n-1}\,dr}{(1+r^2)^m},
\]
where ω_{n−1} is the (n−1)-dimensional measure of the unit sphere S^{n−1}. By the above criterion, the integral is finite if and only if 2m − n + 1 > 1, i.e. if and only if m > n/2. In particular,
\[
\int_{\mathbb{R}^n} \frac{dx}{1+\|x\|^{n+1}} < \infty .
\]
In case n = 1 the integral is π.
By the above argument,
\[
\int_{\mathbb{R}^n} |\varphi(x)|\,dx
= \int_{\mathbb{R}^n} \frac{(1+\|x\|^{2n})\,|\varphi(x)|}{1+\|x\|^{2n}}\,dx
\le C\, p_{0,2n}(\varphi) \int_{\mathbb{R}^n} \frac{dx}{1+\|x\|^{2n}} < \infty .
\]
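The quoted n = 1 value can be confirmed numerically (a sketch; the cutoff 10⁴ contributes only an O(10⁻⁴) tail error since the tails are bounded by 2/L):

```python
import numpy as np

# int_R dx/(1+x^2) = pi; Riemann sum on [-1e4, 1e4].
x = np.linspace(-1.0e4, 1.0e4, 2_000_001)
dx = x[1] - x[0]
val = np.sum(1.0 / (1.0 + x**2)) * dx
assert abs(val - np.pi) < 1e-3
```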
(c) D(R^n) ⊂ S(R^n); indeed, the supremum p_{α,β}(ϕ) of any test function ϕ ∈ D(R^n) is finite since the supremum of a continuous function over a compact set is finite. On the other hand, D ⊊ S since e^{−‖x‖²} is in S but not in D.
(d) In contrast to D(R^n), the space S(R^n) of rapidly decreasing functions is a metric space. Indeed, S(R^n) is a locally convex space, that is, a linear space V whose topology is given by a set of semi-norms p_α separating the elements of V, i.e. p_α(x) = 0 for all α implies x = 0. Any locally convex linear space whose topology is given by a countable set of semi-norms is metrizable. Let (p_n)_{n∈N} be the defining family of semi-norms. Then
\[
d(\varphi,\psi) = \sum_{n=1}^{\infty} \frac{1}{2^n}\,\frac{p_n(\varphi-\psi)}{1+p_n(\varphi-\psi)}, \qquad \varphi,\psi\in V,
\]
defines a metric on V describing the same topology. (In our case, use Cantor's first diagonal method to write the norms p_{k,l}, k, l ∈ N, from the array into a sequence p_n.)
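That d is a metric rests on the monotonicity and subadditivity of t ↦ t/(1 + t). A small numerical sketch with a stand-in family of seminorms (here p_n(v) = |v_n| on vectors, a hypothetical toy model, not the Schwartz seminorms):

```python
import numpy as np

# Toy model of d(phi, psi) = sum_n 2^{-n} p_n(phi-psi)/(1 + p_n(phi-psi));
# we check the triangle inequality on random "elements".
rng = np.random.default_rng(0)

def d(u, v):
    p = np.abs(u - v)                        # p_n(u - v), n = 1..N
    w = 0.5 ** np.arange(1, p.size + 1)      # weights 1/2^n
    return float(np.sum(w * p / (1.0 + p)))

for _ in range(200):
    u, v, z = rng.normal(size=(3, 8))
    assert d(u, z) <= d(u, v) + d(v, z) + 1e-12
```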
Definition 16.17 Let f(x) ∈ L¹(R^n). Then the Fourier transform Ff of the function f(x) is given by
\[
Ff(\xi) = \hat f(\xi) = \frac{1}{\sqrt{2\pi}^{\,n}} \int_{\mathbb{R}^n} e^{-\mathrm{i}x\cdot\xi} f(x)\,dx,
\]
where x = (x₁, …, x_n), ξ = (ξ₁, …, ξ_n), and x·ξ = Σ_{k=1}^n x_k ξ_k.
Let us abbreviate the normalization factor, α_n = 1/√(2π)^n. Caution: Wladimirow [Wla72] uses another convention, with e^{+iξ·x} under the integral and normalization factor 1 in place of α_n.
Note that Ff(0) = α_n ∫_{R^n} f(x) dx.
Example 16.11 We calculate the Fourier transform Fϕ of the function
\[
\varphi(x) = e^{-\|x\|^2/2} = e^{-\frac12 x\cdot x}, \qquad x\in\mathbb{R}^n .
\]
(a) n = 1. From complex analysis, Lemma 14.30, we know
\[
F\bigl(e^{-\frac{x^2}{2}}\bigr)(\xi) = \frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} e^{-\frac{x^2}{2}}\, e^{-\mathrm{i}x\xi}\,dx = e^{-\frac{\xi^2}{2}}. \tag{16.10}
\]
(b) Arbitrary n. Thus,
\[
F\varphi(\xi) = \hat\varphi(\xi)
= \alpha_n \int_{\mathbb{R}^n} e^{-\frac12\sum_{k=1}^n x_k^2}\, e^{-\mathrm{i}\sum_{k=1}^n x_k\xi_k}\,dx
= \prod_{k=1}^{n} \alpha_1 \int_{\mathbb{R}} e^{-\frac12 x_k^2 - \mathrm{i}x_k\xi_k}\,dx_k
\stackrel{(16.10)}{=} \prod_{k=1}^{n} e^{-\frac12 \xi_k^2} = e^{-\frac12 \xi^2}.
\]
Hence, the Fourier transform of e^{−x²/2} is the function itself. It follows via scaling x ↦ cx that
\[
F\bigl(e^{-\frac{c^2x^2}{2}}\bigr)(\xi) = \frac{1}{c^n}\, e^{-\frac{\xi^2}{2c^2}} .
\]
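Equation (16.10) can be verified by direct quadrature of the defining integral, which keeps the normalization convention transparent (a sketch; the grids are arbitrary choices):

```python
import numpy as np

# F(e^{-x^2/2})(xi) = e^{-xi^2/2} under the convention with factor (2 pi)^{-1/2}.
x = np.linspace(-20.0, 20.0, 4001)
dx = x[1] - x[0]
g = np.exp(-x**2 / 2.0)

xi = np.linspace(-5.0, 5.0, 101)
Fg = np.array([np.sum(g * np.exp(-1j * x * s)) for s in xi]) * dx / np.sqrt(2*np.pi)
assert np.max(np.abs(Fg - np.exp(-xi**2 / 2.0))) < 1e-8
```

For a rapidly decaying analytic integrand this Riemann sum is accurate far beyond the tolerance used.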
Theorem 16.10 Let ϕ, ψ ∈ S(R^n). Then we have
(i) F(x^α ϕ(x)) = i^{|α|} D^α(Fϕ), that is, F∘Q_k = −P_k∘F, k = 1, …, n.
(ii) F(D^α ϕ(x))(ξ) = i^{|α|} ξ^α (Fϕ)(ξ), that is, F∘P_k = Q_k∘F, k = 1, …, n.
(iii) F(ϕ) ∈ S(R^n); moreover, ϕ_n →_S ϕ implies Fϕ_n →_S Fϕ, that is, the Fourier transform F is a continuous linear operator on S.
(iv) F(ϕ ∗ ψ) = α_n^{−1} F(ϕ) F(ψ).
(v) F(ϕ · ψ) = α_n F(ϕ) ∗ F(ψ).
(vi)
\[
F(\varphi(Ax+b))(\xi) = \frac{1}{|\det A|}\, e^{\mathrm{i}A^{-1}b\cdot\xi}\, F\varphi\bigl(A^{-\top}\xi\bigr),
\]
where A is a regular n × n matrix and A^{−⊤} denotes the transpose of A^{−1}. In particular,
\[
F(\varphi(\lambda x))(\xi) = \frac{1}{|\lambda|^n}\,(F\varphi)\Bigl(\frac{\xi}{\lambda}\Bigr), \qquad
F(\varphi(x+b))(\xi) = e^{\mathrm{i}b\cdot\xi}\,(F\varphi)(\xi).
\]
Proof. (i) We carry out the proof in case α = (1, 0, …, 0); the general case follows by iteration.
\[
\frac{\partial}{\partial\xi_1}(F\varphi)(\xi) = \frac{\partial}{\partial\xi_1}\,\alpha_n \int_{\mathbb{R}^n} e^{-\mathrm{i}\xi\cdot x}\,\varphi(x)\,dx .
\]
Since \(\frac{\partial}{\partial\xi_1}\bigl(e^{-\mathrm{i}\xi\cdot x}\varphi(x)\bigr) = -\mathrm{i}x_1 e^{-\mathrm{i}\xi\cdot x}\varphi(x)\) tends to 0 as x → ∞, we can exchange partial differentiation and integration, see Proposition 12.23. Hence,
\[
\frac{\partial}{\partial\xi_1}(F\varphi)(\xi) = -\alpha_n \int_{\mathbb{R}^n} e^{-\mathrm{i}\xi\cdot x}\, \mathrm{i}x_1\,\varphi(x)\,dx = F(-\mathrm{i}x_1\varphi(x))(\xi).
\]
(ii) Without loss of generality we again assume α = (1, 0, …, 0). Using integration by parts, we obtain
\[
F\Bigl(\frac{\partial}{\partial x_1}\varphi(x)\Bigr)(\xi)
= \alpha_n \int_{\mathbb{R}^n} e^{-\mathrm{i}\xi\cdot x}\,\varphi_{x_1}(x)\,dx
= -\alpha_n \int_{\mathbb{R}^n} \frac{\partial}{\partial x_1}\bigl(e^{-\mathrm{i}\xi\cdot x}\bigr)\varphi(x)\,dx
= \mathrm{i}\xi_1\,\alpha_n \int_{\mathbb{R}^n} e^{-\mathrm{i}\xi\cdot x}\,\varphi(x)\,dx
= \mathrm{i}\xi_1\,(F\varphi)(\xi).
\]
(iii) By (i) and (ii) we have, for |α| ≤ k and |β| ≤ l,
\[
\bigl| \xi^\alpha D^\beta F\varphi \bigr| \le \alpha_n \int_{\mathbb{R}^n} \bigl| D^\alpha\bigl(x^\beta \varphi(x)\bigr) \bigr|\,dx
\le c_1 \int_{\mathbb{R}^n} (1+\|x\|^{l}) \sum_{|\gamma|\le k} | D^\gamma\varphi(x) |\,dx
\]
\[
\le c_2 \int_{\mathbb{R}^n} \frac{1+\|x\|^{l+n+1}}{1+\|x\|^{n+1}} \sum_{|\gamma|\le k} | D^\gamma\varphi(x) |\,dx
\le c_3 \sup_{x\in\mathbb{R}^n}\Bigl( (1+\|x\|^{l+n+1}) \sum_{|\gamma|\le k} | D^\gamma\varphi(x) | \Bigr)
\le c_4\, p_{k,\,l+n+1}(\varphi).
\]
This implies Fϕ ∈ S(R^n) and, moreover, that F : S → S is continuous.
(iv) First note that L¹(R^n) is an algebra with respect to the convolution product, with ‖f ∗ g‖_{L¹} ≤ ‖f‖_{L¹}‖g‖_{L¹}. Indeed,
\[
\|f*g\|_{L^1} = \int_{\mathbb{R}^n} | (f*g)(x) |\,dx
= \int_{\mathbb{R}^n} \Bigl| \int_{\mathbb{R}^n} f(y)\,g(x-y)\,dy \Bigr|\,dx
\le \int_{\mathbb{R}^n} | f(y) | \int_{\mathbb{R}^n} | g(x-y) |\,dx\,dy
= \|g\|_{L^1} \int_{\mathbb{R}^n} | f(y) |\,dy = \|f\|_{L^1}\,\|g\|_{L^1} .
\]
This in particular shows that (f ∗ g)(x) is finite a.e. on R^n. By definition and Fubini's theorem we have
\[
F(\varphi*\psi)(\xi)
= \alpha_n \int_{\mathbb{R}^n} e^{-\mathrm{i}x\cdot\xi} \int_{\mathbb{R}^n} \varphi(y)\,\psi(x-y)\,dy\,dx
= \alpha_n^{-1} \int_{\mathbb{R}^n} \Bigl( \alpha_n \int_{\mathbb{R}^n} e^{-\mathrm{i}(x-y)\cdot\xi}\,\psi(x-y)\,dx \Bigr)\, \alpha_n\, e^{-\mathrm{i}y\cdot\xi}\,\varphi(y)\,dy
\stackrel{z=x-y}{=} \alpha_n^{-1}\, F\psi(\xi)\, F\varphi(\xi).
\]
(v) will be done later, after Proposition 16.11.
(vi) is straightforward using ⟨A^{−1}y, ξ⟩ = ⟨y, A^{−⊤}ξ⟩.
Remark 16.10 The operator G, which is also defined on L¹(R^n) by
\[
G\varphi(\xi) = \check\varphi(\xi) = \alpha_n \int_{\mathbb{R}^n} e^{+\mathrm{i}x\cdot\xi}\,\varphi(x)\,dx,
\]
has properties similar to those of F. Put ϕ⁻(x) := ϕ(−x). Then
\[
G\varphi = F\varphi^- = \overline{F(\overline\varphi)} \qquad\text{and}\qquad F\varphi = G\varphi^- = \overline{G(\overline\varphi)} .
\]
It is easy to see that (iv) holds for G, too; that is,
\[
G(\varphi*\psi) = \alpha_n^{-1}\, G(\varphi)\,G(\psi).
\]
Proposition 16.11 (Fourier Inversion Formula) The Fourier transformation is a one-to-one mapping of S(R^n) onto S(R^n). The inverse Fourier transformation is given by G:
\[
F(G\varphi) = G(F\varphi) = \varphi, \qquad \varphi\in S(\mathbb{R}^n).
\]
Proof. Let ψ(x) = e^{−x·x/2} and Ψ(x) = ψ(εx) = e^{−ε²x²/2}. Then Ψ_ε(x) := FΨ(x) = \(\frac{1}{\varepsilon^n}\,\psi\bigl(\frac{x}{\varepsilon}\bigr)\). We have
\[
\alpha_n \int_{\mathbb{R}^n} \Psi_\varepsilon(x)\,dx
= \alpha_n \int_{\mathbb{R}^n} \frac{1}{\varepsilon^n}\,\psi\Bigl(\frac{x}{\varepsilon}\Bigr)\,dx
= \alpha_n \int_{\mathbb{R}^n} \psi(x)\,dx = \hat\psi(0) = 1 .
\]
Further,
\[
\frac{1}{\sqrt{2\pi}^{\,n}}\int_{\mathbb{R}^n} \Psi_\varepsilon(x)\,\varphi(x)\,dx
= \frac{1}{\sqrt{2\pi}^{\,n}}\int_{\mathbb{R}^n} \psi(x)\,\varphi(\varepsilon x)\,dx
\xrightarrow[\varepsilon\to 0]{} \frac{1}{\sqrt{2\pi}^{\,n}}\int_{\mathbb{R}^n} \psi(x)\,\varphi(0)\,dx = \varphi(0). \tag{16.11}
\]
In other words, α_n Ψ_ε(x) is a δ-sequence.
We compute G(Fϕ·Ψ)(x). Using Fubini's theorem we have
\[
G(F\varphi\,\Psi)(x)
= \frac{1}{\sqrt{2\pi}^{\,n}}\int_{\mathbb{R}^n} (F\varphi)(\xi)\,\Psi(\xi)\,e^{\mathrm{i}x\cdot\xi}\,d\xi
= \frac{1}{(2\pi)^n}\int_{\mathbb{R}^n}\int_{\mathbb{R}^n} \Psi(\xi)\,e^{\mathrm{i}x\cdot\xi}\, e^{-\mathrm{i}\xi\cdot y}\,\varphi(y)\,dy\,d\xi
\]
\[
= \frac{1}{\sqrt{2\pi}^{\,n}}\int_{\mathbb{R}^n} \varphi(y)\,\frac{1}{\sqrt{2\pi}^{\,n}}\int_{\mathbb{R}^n} e^{-\mathrm{i}(y-x)\cdot\xi}\,\Psi(\xi)\,d\xi\,dy
= \frac{1}{\sqrt{2\pi}^{\,n}}\int_{\mathbb{R}^n} \varphi(y)\,(F\Psi)(y-x)\,dy
\]
\[
\stackrel{z:=y-x}{=} \frac{1}{\sqrt{2\pi}^{\,n}}\int_{\mathbb{R}^n} F\Psi(z)\,\varphi(z+x)\,dz
= \frac{1}{\sqrt{2\pi}^{\,n}}\int_{\mathbb{R}^n} \Psi_\varepsilon(z)\,\varphi(z+x)\,dz
\xrightarrow[\varepsilon\to 0]{} \varphi(x),
\]
see (16.11). On the other hand, by Lebesgue's dominated convergence theorem,
\[
\lim_{\varepsilon\to 0} G(F\varphi\cdot\Psi)(x)
= \alpha_n \int_{\mathbb{R}^n} (F\varphi)(\xi)\,\psi(0)\,e^{\mathrm{i}x\cdot\xi}\,d\xi
= \alpha_n \int_{\mathbb{R}^n} (F\varphi)(\xi)\,e^{\mathrm{i}x\cdot\xi}\,d\xi = G(F\varphi)(x).
\]
This proves the first part. The second part, F(Gϕ) = ϕ, follows from G(ϕ) = \(\overline{F(\overline\varphi)}\), F(ϕ) = \(\overline{G(\overline\varphi)}\), and the first part.
We are now going to complete the proof of Theorem 16.10 (v). To this end, let ϕ = Gϕ₁ and ψ = Gψ₁ with ϕ₁, ψ₁ ∈ S. By (iv) we have
\[
F(\varphi\cdot\psi) = F\bigl(G(\varphi_1)\,G(\psi_1)\bigr) = F\bigl(\alpha_n\, G(\varphi_1*\psi_1)\bigr) = \alpha_n\,\varphi_1*\psi_1 = \alpha_n\, F\varphi * F\psi .
\]
Proposition 16.12 (Fourier–Plancherel formula) For ϕ, ψ ∈ S(R^n) we have
\[
\int_{\mathbb{R}^n} \varphi\,\overline{\psi}\,dx = \int_{\mathbb{R}^n} F(\varphi)\,\overline{F(\psi)}\,dx . \tag{16.12}
\]
In particular, ‖ϕ‖_{L²(R^n)} = ‖F(ϕ)‖_{L²(R^n)}.
Proof. First note that
\[
F(\overline\psi)(-y) = \alpha_n \int_{\mathbb{R}^n} e^{\mathrm{i}x\cdot y}\,\overline{\psi(x)}\,dx
= \overline{\alpha_n \int_{\mathbb{R}^n} e^{-\mathrm{i}x\cdot y}\,\psi(x)\,dx} = \overline{F(\psi)(y)} .
\]
By Theorem 16.10 (v),
\[
\int_{\mathbb{R}^n} \varphi(x)\,\overline{\psi(x)}\,dx
= \alpha_n^{-1}\, F(\varphi\cdot\overline\psi)(0)
= \bigl(F(\varphi) * F(\overline\psi)\bigr)(0)
= \int_{\mathbb{R}^n} F(\varphi)(y)\, F(\overline\psi)(0-y)\,dy
= \int_{\mathbb{R}^n} F(\varphi)(y)\,\overline{F(\psi)(y)}\,dy .
\]
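The discrete analogue of the Plancherel identity holds exactly for the unitary DFT; a quick numerical illustration:

```python
import numpy as np

# ||phi||_2 = ||F phi||_2: np.fft.fft with norm="ortho" is a unitary map, the
# discrete counterpart of the unitary Fourier transform considered here.
rng = np.random.default_rng(1)
phi = rng.normal(size=1024) + 1j * rng.normal(size=1024)
Fphi = np.fft.fft(phi, norm="ortho")
assert abs(np.linalg.norm(phi) - np.linalg.norm(Fphi)) < 1e-10
```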
Remark 16.11 S (Rn ) ⊂ L2 (Rn ) is dense. Thus, the Fourier transformation has a unique
extension to a unitary operator F : L2 (Rn ) → L2 (Rn ). (To a given f ∈ L2 choose a sequence
ϕn ∈ S converging to f in the L2 -norm. Since F preserves the L2 -norm, kFϕn − Fϕm k =
kϕn − ϕm k and (ϕn ) is a Cauchy sequence in L2 , (Fϕn ) is a Cauchy sequence, too; hence it
converges to some g ∈ L2 . We define F(f ) = g.)
16.4.2 The Space S′(R^n)
Definition 16.18 A tempered distribution (or slowly increasing distribution) is a continuous
linear functional T on the space S (Rn ). The set of all tempered distributions is denoted by
S ′ (Rn ).
A linear functional T on S(R^n) is continuous if for all sequences ϕ_n ∈ S with ϕ_n →_S 0 we have ⟨T, ϕ_n⟩ → 0.
For ϕ_n ∈ D with ϕ_n →_D 0 it follows that ϕ_n →_S 0. So, every continuous linear functional on S (restricted to D) is continuous on D. Moreover, the mapping ι : S′ → D′, ι(T) = T↾_D, is injective, since T(ϕ) = 0 for all ϕ ∈ D implies T(ψ) = 0 for all ψ ∈ S, which follows from the density of D in S and the continuity of T. Using the injection ι, we can identify S′ with a subspace of D′, S′ ⊆ D′. That is, every tempered distribution "is" a distribution.
Lemma 16.13 (Characterization of S′) A linear functional T defined on S(R^n) belongs to S′(R^n) if and only if there exist non-negative integers k and l and a positive number C such that for all ϕ ∈ S(R^n)
\[
| T(\varphi) | \le C\, p_{k,l}(\varphi), \qquad\text{where } p_{k,l}(\varphi) = \sum_{|\alpha|\le k,\ |\beta|\le l} p_{\alpha,\beta}(\varphi).
\]
Remarks 16.12 (a) With the usual identification f ↔ T_f of functions and regular distributions, L¹(R^n) ⊆ S′(R^n) and L²(R^n) ⊆ S′(R^n).
(b) L¹_loc ⊄ S′; for example, T_f ∉ S′(R) for f(x) = e^{x²}, since T_f(ϕ) is not well-defined for all Schwartz functions ϕ: for instance, ⟨T_{e^{x²}}, e^{−x²}⟩ = +∞.
(c) If T ∈ D′(R^n) and supp T is compact, then T ∈ S′(R^n).
(d) Let f(x) be measurable. If there exist C > 0 and m ∈ N such that
\[
| f(x) | \le C\,(1+\|x\|^2)^m \quad\text{for a.e. } x\in\mathbb{R}^n,
\]
then T_f ∈ S′. Indeed, the above estimate and Remark 16.9 imply
\[
| \langle T_f , \varphi\rangle | \le \int_{\mathbb{R}^n} | f(x)\varphi(x) |\,dx
\le C \int_{\mathbb{R}^n} (1+\|x\|^2)^m (1+\|x\|^2)^n\, |\varphi(x)|\,\frac{dx}{(1+\|x\|^2)^n}
\le C\, p_{0,\,2m+2n}(\varphi) \int_{\mathbb{R}^n} \frac{dx}{(1+\|x\|^2)^n} .
\]
By Lemma 16.13, f defines a tempered regular distribution, T_f ∈ S′.
Operations on S′
The operations are defined in the same way as in the case of D′. One has to show that the result is again in the (smaller) space S′. If T ∈ S′ then:
(a) D^α T ∈ S′ for all multi-indices α.
(b) f·T ∈ S′ for all f ∈ C∞(R^n) such that D^α f grows at most polynomially at infinity for all multi-indices α (i.e. for all multi-indices α there exist C_α > 0 and k_α such that |D^α f(x)| ≤ C_α(1+‖x‖)^{k_α}).
(c) T(Ax + b) ∈ S′ for any regular real n × n matrix A and b ∈ R^n.
(d) T ∈ S′(R^n) and S ∈ S′(R^m) implies T ⊗ S ∈ S′(R^{n+m}).
(e) Let T ∈ S′(R^n), ψ ∈ S(R^n). Define the convolution product
\[
\langle \psi * T , \varphi\rangle = \bigl\langle 1(x)\otimes T(y) , \psi(x)\varphi(x+y)\bigr\rangle
= \Bigl\langle T , \int_{\mathbb{R}^n} \psi(x)\,\varphi(x+y)\,dx \Bigr\rangle, \qquad \varphi\in S(\mathbb{R}^n).
\]
Note that this definition coincides with the more general Definition 16.12 since
\[
\lim_{k\to\infty} \psi(x)\,\varphi(x+y)\,\eta_k(x,y) = \psi(x)\,\varphi(x+y) \in S(\mathbb{R}^{2n}).
\]
16.4.3 Fourier Transformation in S′(R^n)
We follow our guiding principle to define the Fourier transform of a distribution T ∈ S′: first consider the case of a regular tempered distribution. We want to define F(T_f) := T_{Ff}. Suppose that f(x) ∈ L¹(R^n) is integrable. Then its Fourier transform Ff exists and is a bounded continuous function:
\[
| Ff(\xi) | \le \alpha_n \int_{\mathbb{R}^n} \bigl| e^{-\mathrm{i}\xi\cdot x} f(x) \bigr|\,dx
= \alpha_n \int_{\mathbb{R}^n} | f(x) |\,dx = \alpha_n\,\|f\|_{L^1} < \infty .
\]
By Remark 16.12 (d), Ff defines a distribution in S′. By Fubini's theorem,
\[
\langle T_{Ff} , \varphi\rangle = \int_{\mathbb{R}^n} Ff(\xi)\,\varphi(\xi)\,d\xi
= \alpha_n \iint_{\mathbb{R}^{2n}} f(x)\, e^{-\mathrm{i}\xi\cdot x}\,\varphi(\xi)\,d\xi\,dx
= \int_{\mathbb{R}^n} f(x)\, F\varphi(x)\,dx = \langle T_f , F\varphi\rangle .
\]
Hence, ⟨T_{Ff}, ϕ⟩ = ⟨T_f, Fϕ⟩. We take this equation as the definition of the Fourier transform of a distribution T ∈ S′.
Definition 16.19 For T ∈ S′ and ϕ ∈ S define
\[
\langle FT , \varphi\rangle = \langle T , F\varphi\rangle . \tag{16.13}
\]
We call FT the Fourier transform of the distribution T.
Since Fϕ ∈ S(R^n), FT is well-defined on S. Further, F is a linear operator and T a linear functional; hence FT is again a linear functional. We show that FT is a continuous linear functional on S. To this end, let ϕ_n →_S ϕ in S. By Theorem 16.10, Fϕ_n →_S Fϕ. Since T is continuous,
\[
\langle FT , \varphi_n\rangle = \langle T , F\varphi_n\rangle \longrightarrow \langle T , F\varphi\rangle = \langle FT , \varphi\rangle,
\]
which proves continuity.
Lemma 16.14 The Fourier transformation F : S ′ (Rn ) → S ′ (Rn ) is a continuous bijection.
Proof. (a) We show continuity (see also homework 53.3). Suppose that T_n → T in S′; that is, for all ϕ ∈ S, ⟨T_n, ϕ⟩ → ⟨T, ϕ⟩. Hence,
\[
\langle FT_n , \varphi\rangle = \langle T_n , F\varphi\rangle \xrightarrow[n\to\infty]{} \langle T , F\varphi\rangle = \langle FT , \varphi\rangle,
\]
which proves the assertion.
(b) We define a second transformation G : S ′ → S ′ via
hGT , ϕi := hT , Gϕi
and show that F ◦G = G◦F = id on S ′ . Taking into account Proposition 16.11 we have
hG(FT ) , ϕi = hFT , Gϕi = hT , F(Gϕ)i = hT , ϕi ;
thus, G◦F = id . The proof of the direction F ◦G = id is similar; hence, F is a bijection.
Remark 16.13 All the properties of the Fourier transformation stated in Theorem 16.10 (i), (ii), (iii), (iv), and (v) remain valid in the case of S′. In particular, F(x^α T) = i^{|α|} D^α(FT). Indeed, for ϕ ∈ S(R^n), by Theorem 16.10 (ii),
\[
F(x^\alpha T)(\varphi) = \langle x^\alpha T , F\varphi\rangle = \langle T , x^\alpha F\varphi\rangle
= \bigl\langle T , (-\mathrm{i})^{|\alpha|} F(D^\alpha\varphi)\bigr\rangle
= (-\mathrm{i})^{|\alpha|} \langle FT , D^\alpha\varphi\rangle
= (-1)^{|\alpha|}(-\mathrm{i})^{|\alpha|} \langle D^\alpha(FT) , \varphi\rangle
= \bigl\langle \mathrm{i}^{|\alpha|} D^\alpha(FT) , \varphi\bigr\rangle .
\]
Example 16.12 (a) Let a ∈ R^n. We compute Fδ_a. For ϕ ∈ S(R^n),
\[
F\delta_a(\varphi) = \delta_a(F\varphi) = (F\varphi)(a)
= \alpha_n \int_{\mathbb{R}^n} e^{-\mathrm{i}x\cdot a}\,\varphi(x)\,dx = T_{\alpha_n e^{-\mathrm{i}x\cdot a}}(\varphi).
\]
Hence, Fδ_a is the regular distribution corresponding to f(x) = α_n e^{−ix·a}. In particular, F(δ) = T_{α_n·1} is a constant function; note that F(δ) = G(δ) = \(\frac{1}{\sqrt{2\pi}^{\,n}}\) T_1. Moreover, F(T_1) = G(T_1) = α_n^{−1} δ.
(b) n = 1, b > 0.
\[
F(H(b-|x|)) = \alpha_1 \int_{\mathbb{R}} e^{-\mathrm{i}x\xi}\, H(b-|x|)\,dx
= \alpha_1 \int_{-b}^{b} e^{-\mathrm{i}x\xi}\,dx
= \frac{2\sin(b\xi)}{\sqrt{2\pi}\,\xi} .
\]
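This transform can be checked by trapezoidal quadrature (a sketch; b and the ξ-range are arbitrary, and ξ = 0 is avoided since the singularity there is removable):

```python
import numpy as np

# F(H(b - |x|))(xi) = 2 sin(b xi) / (sqrt(2 pi) xi), checked by quadrature.
b = 1.5
x = np.linspace(-b, b, 20_001)
dx = x[1] - x[0]
w = np.ones_like(x)
w[0] = w[-1] = 0.5                      # trapezoidal end weights

xi = np.linspace(0.1, 10.0, 200)
Ff = np.array([np.sum(w * np.exp(-1j * x * s)) for s in xi]) * dx / np.sqrt(2*np.pi)
exact = 2.0 * np.sin(b * xi) / (np.sqrt(2*np.pi) * xi)
assert np.max(np.abs(Ff - exact)) < 1e-5
```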
(c) The single-layer distribution. Suppose that S is a compact, regular, piecewise differentiable, non-self-intersecting surface in R³ and ̺(x) ∈ L¹_loc(R³) is a function on S (a density function or distribution—in the physical sense). We define the distribution ̺δ_S by the scalar surface integral
\[
\langle \varrho\delta_S , \varphi\rangle = \iint_S \varrho(x)\,\varphi(x)\,dS .
\]
The support of ̺δ_S is S, a set of measure zero with respect to the 3-dimensional Lebesgue measure. Hence, ̺δ_S is a singular distribution.
Similarly, one defines the double-layer distribution (which comes from dipoles) by
\[
\Bigl\langle -\frac{\partial}{\partial\vec n}(\varrho\delta_S) , \varphi \Bigr\rangle = \iint_S \varrho(x)\,\frac{\partial\varphi(x)}{\partial\vec n}\,dS,
\]
where ~n denotes the unit normal vector to the surface.
We compute the Fourier transform of the single layer, F(̺δ_S), in the case of a sphere of radius r, S_r = {x ∈ R³ | ‖x‖ = r}, and density ̺ = 1. By Fubini's theorem,
\[
\langle F\delta_{S_r} , \varphi\rangle = \langle \delta_{S_r} , F\varphi\rangle
= \iint_{S_r} \frac{1}{\sqrt{2\pi}^{\,3}} \int_{\mathbb{R}^3} e^{-\mathrm{i}x\cdot\xi}\,\varphi(x)\,dx\,dS_\xi
= \frac{1}{\sqrt{2\pi}^{\,3}} \int_{\mathbb{R}^3} \Bigl( \iint_{S_r} \bigl( \cos(x\cdot\xi) - \mathrm{i}\underbrace{\sin(x\cdot\xi)}_{\text{odd; integrates to }0} \bigr)\,dS_\xi \Bigr)\,\varphi(x)\,dx .
\]
Using spherical coordinates on S_r, where x is fixed to be the z-axis and ϑ is the angle between x and ξ ∈ S_r, we have dS = r² sin ϑ dϕ dϑ and x·ξ = r‖x‖ cos ϑ. Hence, the inner (surface) integral reads
\[
\int_0^{2\pi}\!\!\int_0^{\pi} \cos(\|x\|\,r\cos\vartheta)\, r^2\sin\vartheta\,d\vartheta\,d\varphi
\stackrel{s=\|x\| r\cos\vartheta}{=} 2\pi \int_{-\|x\| r}^{\|x\| r} \cos s\;\frac{r}{\|x\|}\,ds
= 4\pi\,\frac{r}{\|x\|}\,\sin(\|x\|\,r),
\]
where we used ds = −‖x‖ r sin ϑ dϑ. Hence,
\[
\langle F\delta_{S_r} , \varphi\rangle = \frac{2r}{\sqrt{2\pi}} \int_{\mathbb{R}^3} \varphi(x)\,\frac{\sin(r\|x\|)}{\|x\|}\,dx;
\]
the Fourier transform of δ_{S_r} is the regular distribution
\[
F\delta_{S_r}(x) = \frac{2r}{\sqrt{2\pi}}\,\frac{\sin(r\|x\|)}{\|x\|} .
\]
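The inner surface integral above can be confirmed by one-dimensional quadrature in the angle ϑ (a sketch; r and ‖x‖ are arbitrary choices):

```python
import numpy as np

# Check: the integral of cos(x.xi) over the sphere S_r equals
# 4 pi r sin(r ||x||) / ||x||, written in spherical coordinates with
# dS = r^2 sin(theta) dtheta dphi and x.xi = r ||x|| cos(theta).
r, k = 1.7, 2.3                        # sphere radius and k = ||x||
theta = np.linspace(0.0, np.pi, 40_001)
dth = theta[1] - theta[0]
integrand = np.cos(k * r * np.cos(theta)) * r**2 * np.sin(theta)
surface_int = 2*np.pi * np.sum(integrand) * dth   # the phi-integral gives 2 pi
exact = 4*np.pi * r * np.sin(k * r) / k
assert abs(surface_int - exact) < 1e-6
```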
(d) The resolvent of the Laplacian −∆. Consider the Hilbert space H = L²(R^n) and its dense subspace S(R^n). For ϕ ∈ S the Laplacian −∆ϕ is defined. Recall that the resolvent of a linear operator A at λ is the bounded linear operator on H given by R_λ(A) = (A − λI)^{−1}. Given f ∈ H we are looking for u ∈ H with R_λ(A)f = u. This is equivalent to solving f = (A − λI)(u) for u. In the case A = −∆ we can apply the Fourier transformation to solve this equation. By Theorem 16.10 (ii),
\[
-\Delta u - \lambda u = f \;\Longleftrightarrow\; -F\Bigl(\sum_{k=1}^{n}\frac{\partial^2}{\partial x_k^2}u\Bigr) - \lambda Fu = Ff
\;\Longleftrightarrow\; \sum_{k=1}^{n}\xi_k^2\,(Fu)(\xi) - \lambda\,(Fu)(\xi) = (Fu)(\xi)\,(\xi^2-\lambda) = Ff(\xi),
\]
so that
\[
Fu(\xi) = \frac{Ff(\xi)}{\xi^2-\lambda}, \qquad u(x) = G\Bigl(\frac{Ff(\xi)}{\xi^2-\lambda}\Bigr)(x).
\]
Hence,
\[
R_\lambda(-\Delta) = F^{-1}\circ \frac{1}{\xi^2-\lambda}\circ F,
\]
where in the middle is the multiplication operator by the function 1/(ξ² − λ). One can see that this operator is bounded in H if and only if λ ∈ C \ R₊, so that the spectrum of −∆ satisfies σ(−∆) ⊂ R₊.
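This factorization F^{−1} ∘ (ξ² − λ)^{−1} ∘ F can be realized numerically with the discrete Fourier transform; a 1D sketch (λ = −1 lies outside the spectrum R₊, so ξ² − λ never vanishes; the grid and right-hand side are arbitrary choices):

```python
import numpy as np

# Resolvent of -d^2/dx^2 at lambda = -1 via the DFT on a large periodic grid.
lam = -1.0
N, L = 4096, 40.0
xgrid = (np.arange(N) - N // 2) * (L / N)
xi = 2*np.pi * np.fft.fftfreq(N, d=L/N)       # angular frequencies

f = np.exp(-xgrid**2)                          # rapidly decaying right-hand side
u = np.real(np.fft.ifft(np.fft.fft(f) / (xi**2 - lam)))

# verify -u'' - lam*u = f by second-order finite differences
h = L / N
upp = (np.roll(u, -1) - 2*u + np.roll(u, 1)) / h**2
assert np.max(np.abs(-upp - lam*u - f)) < 1e-3
```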
16.5 Appendix—More about Convolutions
Since the following proposition is used in several places, we make the statement explicit.
Proposition 16.15 Let T(x, t) and S(x, t) be distributions in D′(R^{n+1}) with supp T ⊂ R^n × R₊ and supp S ⊂ Γ₊(0, 0). Here Γ₊(0, 0) = {(x, t) ∈ R^{n+1} | ‖x‖ ≤ at} denotes the forward light cone at the origin.
Then the convolution T ∗ S exists in D′(R^{n+1}) and can be written as
\[
\langle T*S , \varphi\rangle = \bigl\langle T(x,t)\otimes S(y,s) ,\ \eta(t)\,\eta(s)\,\eta(as-\|y\|)\,\varphi(x+y,\,t+s)\bigr\rangle, \tag{16.14}
\]
ϕ ∈ D(R^{n+1}), where η ∈ D(R) with η(t) = 1 for t > −ε, and ε > 0 is any fixed positive number. The convolution (T ∗ S)(x, t) vanishes for t < 0 and is continuous in both components, that is:
(a) If T_k → T in D′(R^{n+1}) and supp T_k, T ⊆ R^n × R₊, then T_k ∗ S → T ∗ S in D′(R^{n+1}).
(b) If S_k → S in D′(R^{n+1}) and supp S_k, S ⊆ Γ₊(0, 0), then T ∗ S_k → T ∗ S in D′(R^{n+1}).
Proof. Since η ∈ D(R), there exists δ > 0 with η(x) = 0 for x < −δ. Let ϕ(x, t) ∈ D(R^{n+1}) with supp ϕ ⊂ U_R(0) for some R > 0. Let η_K(x, t, y, s), K → R^{2n+2}, be a sequence in D(R^{2n+2}) converging to 1 in R^{2n+2}, see before Definition 16.12. For sufficiently large K we then have
\[
\psi_K := \eta(s)\,\eta(t)\,\eta(as-\|y\|)\,\eta_K(x,t,y,s)\,\varphi(x+y,\,t+s)
= \eta(s)\,\eta(t)\,\eta(as-\|y\|)\,\varphi(x+y,\,t+s) =: \psi . \tag{16.15}
\]
To prove this it suffices to show that ψ ∈ D(R^{2n+2}). Indeed, ψ is arbitrarily often differentiable and its support is contained in
\[
\{(x,t,y,s) \mid s, t \ge -\delta,\ as-\|y\| \ge -\delta,\ \|x+y\|^2 + |t+s|^2 \le R^2\},
\]
which is a bounded set.
Since η(t) = 1 in a neighborhood of supp T and η(s)η(as − ‖y‖) = 1 in a neighborhood of supp S, we have T(x, t) = η(t)T(x, t) and S(y, s) = η(s)η(as − ‖y‖)S(y, s). Using (16.15) we obtain
\[
\langle T*S , \varphi\rangle
= \lim_{K\to\mathbb{R}^{2n+2}} \bigl\langle T(x,t)\otimes S(y,s) ,\ \eta_K(x,t,y,s)\,\varphi(x+y,\,t+s)\bigr\rangle
= \lim_{K\to\mathbb{R}^{2n+2}} \bigl\langle T(x,t)\otimes S(y,s) ,\ \psi_K\bigr\rangle, \qquad \varphi\in D(\mathbb{R}^{n+1}).
\]
This proves the first assertion.
We now prove that the right hand side of (16.14) defines a continuous linear functional on D(R^{n+1}). Let ϕ_k →_D ϕ as k → ∞. Then
\[
\psi_k := \eta(t)\,\eta(s)\,\eta(as-\|y\|)\,\varphi_k(x+y,\,t+s) \xrightarrow{D} \psi
\]
as k → ∞. Hence,
\[
\langle T*S , \varphi_k\rangle = \langle T(x,t)\otimes S(y,s) , \psi_k\rangle \to \langle T(x,t)\otimes S(y,s) , \psi\rangle = \langle T*S , \varphi\rangle, \qquad k\to\infty,
\]
and T ∗ S is continuous.
We show that T ∗ S vanishes for t < 0. To this end, let ϕ ∈ D(R^{n+1}) with supp ϕ ⊆ R^n × (−∞, −δ₁]. Choosing δ < δ₁/2, one has
\[
\eta(t)\,\eta(s)\,\eta(as-\|y\|)\,\varphi(x+y,\,t+s) = 0,
\]
so that ⟨T ∗ S, ϕ⟩ = 0. Continuity of the convolution product follows from the continuity of the tensor product.
Chapter 17
PDE II — The Equations of Mathematical
Physics
In this chapter we study in detail the Laplace, wave, and heat equations. First, for all space dimensions n we determine the fundamental solutions of the corresponding differential operators; then we consider initial value problems and initial boundary value problems. We also study eigenvalue problems for the Laplace equation.
Recall Green's identities, see Proposition 10.2,
\[
\iiint_G \bigl(u\,\Delta(v) - v\,\Delta(u)\bigr)\,dx\,dy\,dz
= \iint_{\partial G} \Bigl( u\,\frac{\partial v}{\partial\vec n} - v\,\frac{\partial u}{\partial\vec n} \Bigr)\,dS,
\qquad
\iiint_G \Delta(u)\,dx\,dy\,dz = \iint_{\partial G} \frac{\partial u}{\partial\vec n}\,dS . \tag{17.1}
\]
We also need that for x ∈ R^n \ {0},
\[
\Delta\Bigl(\frac{1}{\|x\|^{n-2}}\Bigr) = 0, \quad n \ge 3; \qquad
\Delta(\log\|x\|) = 0, \quad n = 2,
\]
see Example 7.5.
17.1 Fundamental Solutions
17.1.1 The Laplace Equation
Let us denote by ωn the measure of the unit sphere Sn−1 in Rn , that is, ω2 = 2π, ω3 = 4π.
Theorem 17.1 The function
\[
E_n(x) = \begin{cases} \dfrac{1}{2\pi}\,\log\|x\|, & n = 2,\\[2ex]
-\dfrac{1}{(n-2)\,\omega_n}\,\dfrac{1}{\|x\|^{n-2}}, & n \ge 3, \end{cases}
\]
is locally integrable; the corresponding regular distribution E_n satisfies the equation ∆E_n = δ, and hence is a fundamental solution for the Laplacian in R^n.
Proof. Step 1. Example 7.5 shows that ∆(E_n(x)) = 0 if x ≠ 0.
Step 2. By homework 50.4, log‖x‖ is in L¹_loc(R²) and 1/‖x‖^α ∈ L¹_loc(R^n) if and only if α < n. Hence each E_n, n ≥ 2, defines a regular distribution in R^n.
Let n ≥ 3 and ϕ ∈ D(R^n). Using that 1/‖x‖^{n−2} is locally integrable and Example 12.6 (a), ψ ∈ D implies
\[
\int_{\mathbb{R}^n} \frac{\psi(x)}{r^{n-2}}\,dx = \lim_{\varepsilon\to 0} \int_{\|x\|\ge\varepsilon} \frac{\psi(x)}{r^{n-2}}\,dx, \qquad r = \|x\| .
\]
Abbreviating β_n = −1/((n−2)ω_n) we have
\[
\langle \Delta E_n , \varphi\rangle = \beta_n \int_{\mathbb{R}^n} \frac{\Delta\varphi(x)\,dx}{\|x\|^{n-2}}
= \beta_n \lim_{\varepsilon\to 0} \int_{\|x\|\ge\varepsilon} \frac{\Delta\varphi(x)\,dx}{\|x\|^{n-2}} .
\]
We compute the integral on the right using v(x) = 1/r^{n−2}, which is harmonic for ‖x‖ ≥ ε, ∆v = 0. Applying Green's identity to the region ‖x‖ ≥ ε (the boundary term at infinity vanishes since ϕ has compact support), we have
\[
\beta_n \int_{\|x\|\ge\varepsilon} \frac{\Delta\varphi(x)\,dx}{\|x\|^{n-2}}
= \beta_n \int_{\|x\|=\varepsilon} \Bigl( \frac{1}{r^{n-2}}\,\frac{\partial\varphi}{\partial\vec n}(x) - \varphi(x)\,\frac{\partial}{\partial\vec n}\frac{1}{r^{n-2}} \Bigr)\,dS .
\]
Let us consider the first integral as ε → 0. Note that ϕ and grad ϕ are both bounded by a constant C since ϕ is a test function. We make use of the estimate
\[
\Bigl| \int_{\|x\|=\varepsilon} \frac{1}{r^{n-2}}\,\frac{\partial\varphi}{\partial\vec n}(x)\,dS \Bigr|
\le \frac{1}{\varepsilon^{n-2}} \int_{\|x\|=\varepsilon} \Bigl| \frac{\partial\varphi}{\partial\vec n} \Bigr|\,dS
\le \frac{c}{\varepsilon^{n-2}} \int_{\|x\|=\varepsilon} dS
= \frac{c'\,\varepsilon^{n-1}}{\varepsilon^{n-2}} = c'\varepsilon,
\]
which tends to 0 as ε → 0.
Hence we are left with computing the second integral. The outer unit normal vector of the region ‖x‖ ≥ ε on the sphere ‖x‖ = ε is ~n = −x/‖x‖, so that ∂/∂~n (1/‖x‖^{n−2}) = (n−2)/r^{n−1}; thus
\[
\text{second integral} = -\beta_n \int_{\|x\|=\varepsilon} \varphi(x)\,\frac{n-2}{r^{n-1}}\,dS
= \frac{1}{\omega_n\,\varepsilon^{n-1}} \int_{\|x\|=\varepsilon} \varphi(x)\,dS .
\]
Note that ω_n ε^{n−1} is exactly the (n−1)-dimensional measure of the sphere of radius ε, so this expression is the mean value of ϕ over that sphere. Since ϕ is continuous at 0, the mean value tends to ϕ(0) as ε → 0; hence ⟨∆E_n, ϕ⟩ = ϕ(0) = ⟨δ, ϕ⟩. This proves the assertion in case n ≥ 3.
The proof in case n = 2 is quite analogous.
Corollary 17.2 Suppose that f(x) is a continuous function with compact support. Then S = E ∗ f is a regular distribution and we have ∆S = f in D′. In particular,
\[
S(x) = \frac{1}{2\pi}\iint_{\mathbb{R}^2} \log\|x-y\|\,f(y)\,dy, \quad n = 2; \qquad
S(x) = -\frac{1}{4\pi}\iiint_{\mathbb{R}^3} \frac{f(y)}{\|x-y\|}\,dy, \quad n = 3. \tag{17.2}
\]
Proof. By Theorem 16.8, S = E ∗ f is a solution of Lu = f if E is a fundamental solution of the differential operator L. Inserting the fundamental solution of the Laplacian for n = 2 and n = 3 and using that f has compact support, the assertion follows.
Remarks 17.1 (a) The given solution (17.2) is even a classical solution of the Poisson equation. Indeed, we can differentiate the parameter integral as usual.
(b) The function G(x, y) = E_n(x − y) is called the Green's function of the Laplace equation.
17.1.2 The Heat Equation
Proposition 17.3 The function
\[
F(x,t) = \frac{1}{(4\pi a^2 t)^{n/2}}\, H(t)\, e^{-\frac{\|x\|^2}{4a^2 t}}
\]
defines a regular distribution E = T_F and a fundamental solution of the heat equation u_t − a²∆u = 0, that is,
\[
\mathrm{E}_t - a^2\Delta_x \mathrm{E} = \delta(x)\otimes\delta(t). \tag{17.3}
\]
Proof. Step 1. The function F(x, t) is locally integrable since F = 0 for t ≤ 0, F ≥ 0 for t > 0, and
\[
\int_{\mathbb{R}^n} F(x,t)\,dx = \frac{1}{(4\pi a^2 t)^{n/2}} \int_{\mathbb{R}^n} e^{-\frac{\|x\|^2}{4a^2 t}}\,dx
= \prod_{k=1}^{n} \frac{1}{\sqrt\pi}\int_{\mathbb{R}} e^{-\xi_k^2}\,d\xi_k = 1 . \tag{17.4}
\]
Step 2. For t > 0, F ∈ C∞ and therefore
\[
\frac{\partial F}{\partial t} = \Bigl( \frac{x^2}{4a^2t^2} - \frac{n}{2t} \Bigr) F, \qquad
\frac{\partial F}{\partial x_i} = -\frac{x_i}{2a^2t}\,F, \qquad
\frac{\partial^2 F}{\partial x_i^2} = \Bigl( \frac{x_i^2}{4a^4t^2} - \frac{1}{2a^2t} \Bigr) F;
\]
hence
\[
\frac{\partial F}{\partial t} - a^2\Delta F = 0 . \tag{17.5}
\]
See also homework 59.2.
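The normalization (17.4) is easy to confirm numerically in one dimension (a sketch; a and t are arbitrary positive choices):

```python
import numpy as np

# Total mass of the one-dimensional heat kernel is 1 for every t > 0.
a, t = 0.7, 0.35
x = np.linspace(-40.0, 40.0, 40_001)
dx = x[1] - x[0]
F = np.exp(-x**2 / (4.0 * a**2 * t)) / np.sqrt(4.0 * np.pi * a**2 * t)
assert abs(np.sum(F) * dx - 1.0) < 1e-8
```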
We give a proof using the Fourier transformation with respect to the spatial variables. Let Ê(ξ, t) = (F_x F)(ξ, t). We apply the Fourier transformation to (17.3) and obtain a first order ODE with respect to the time variable t:
\[
\frac{\partial}{\partial t}\hat{\mathrm{E}}(\xi,t) + a^2\xi^2\,\hat{\mathrm{E}}(\xi,t) = \alpha_n\, 1(\xi)\,\delta(t).
\]
Recall from Example 16.10 that u′ + bu = δ has the fundamental solution u(t) = H(t)e^{−bt}; hence
\[
\hat{\mathrm{E}}(\xi,t) = H(t)\,\alpha_n\, e^{-a^2\xi^2 t} .
\]
We want to apply the inverse Fourier transformation with respect to the spatial variables. For this, note that by Example 16.11,
\[
F^{-1}\bigl(e^{-\frac{\xi^2}{2c^2}}\bigr) = c^n\, e^{-\frac{c^2x^2}{2}},
\]
where, in our case, \(\frac{1}{2c^2} = a^2t\), i.e. \(c = \frac{1}{\sqrt{2a^2t}}\). Hence, for t > 0,
\[
\mathrm{E}(x,t) = H(t)\,\alpha_n\, F^{-1}\bigl(e^{-a^2\xi^2 t}\bigr)
= \frac{1}{(2\pi)^{n/2}}\cdot\frac{1}{(2a^2t)^{n/2}}\, e^{-\frac{x^2}{2\cdot 2a^2 t}}
= \frac{1}{(4\pi a^2 t)^{n/2}}\, e^{-\frac{x^2}{4a^2 t}} .
\]
Corollary 17.4 Suppose that f(x, t) is a continuous function on R^n × R₊ with compact support. Let
\[
V(x,t) = \frac{H(t)}{(4a^2\pi)^{n/2}} \int_0^t\!\!\int_{\mathbb{R}^n} \frac{e^{-\frac{\|x-y\|^2}{4a^2(t-s)}}}{(t-s)^{n/2}}\, f(y,s)\,dy\,ds .
\]
Then V(x, t) is a regular distribution in D′(R^n × R₊) and a solution of u_t − a²∆u = f in D′(R^n × R₊).
Proof. This follows from Theorem 16.8.
17.1.3 The Wave Equation
We shall determine the fundamental solutions for the wave equation in dimensions n = 3,
n = 2, and n = 1. In case n = 3 we again apply the Fourier transformation. For the other
dimensions we use the method of descent.
(a) Case n = 3
Proposition 17.5 The distribution
\[
\mathrm{E}(x,t) = \delta_{S_{at}} \otimes \frac{H(t)}{4\pi a^2 t} \in D'(\mathbb{R}^4)
\]
is a fundamental solution for the wave operator □_a u = u_{tt} − a²(u_{x₁x₁} + u_{x₂x₂} + u_{x₃x₃}), where δ_{S_{at}} denotes the single-layer distribution of the sphere of radius a·t around 0.
Proof. As in the case of the heat equation, let Ê(ξ, t) = (F_x E)(ξ, t) be the Fourier transform of the fundamental solution E(x, t). Then Ê(ξ, t) satisfies
\[
\frac{\partial^2}{\partial t^2}\hat{\mathrm{E}} + a^2\xi^2\,\hat{\mathrm{E}} = \alpha_3\, 1(\xi)\,\delta(t).
\]
Again, this is an ODE of order 2 in t. Recall from Example 16.10 that u″ + a²u = δ, a ≠ 0, has the solution u(t) = H(t) sin(at)/a. Thus,
\[
\hat{\mathrm{E}}(\xi,t) = \alpha_3\, H(t)\,\frac{\sin(a\|\xi\| t)}{a\|\xi\|},
\]
where ξ is thought of as a parameter. Apply the inverse Fourier transformation F_x^{−1} to this function. Recall from Example 16.12 (c) that the Fourier transform of the single layer of the sphere of radius at around 0 is
\[
F\delta_{S_{at}}(\xi) = \frac{2at}{\sqrt{2\pi}}\,\frac{\sin(at\|\xi\|)}{\|\xi\|} .
\]
This shows
\[
\mathrm{E}(x,t) = \frac{1}{a}\cdot\frac{1}{2\pi}\cdot\frac{1}{2at}\, H(t)\,\delta_{S_{at}}(x) = \frac{1}{4\pi a^2 t}\, H(t)\,\delta_{S_{at}}(x).
\]
Let us evaluate ⟨E₃, ϕ(x, t)⟩. Using dx₁dx₂dx₃ = dS_r dr, where x = (x₁, x₂, x₃) and r = ‖x‖, as well as the transformation r = at, dr = a dt, where dS is the surface element of the sphere S_r(0), we obtain
\[
\langle \mathrm{E}_3 , \varphi(x,t)\rangle = \frac{1}{4\pi a^2}\int_0^\infty \frac{1}{t}\iint_{S_{at}} \varphi(x,t)\,dS\,dt \tag{17.6}
\]
\[
= \frac{1}{4\pi a^2}\int_0^\infty \frac{a}{r}\iint_{S_r} \varphi\Bigl(x,\frac{r}{a}\Bigr)\,dS\,\frac{dr}{a}
= \frac{1}{4\pi a^2}\int_{\mathbb{R}^3} \frac{\varphi\bigl(x, \frac{\|x\|}{a}\bigr)}{\|x\|}\,dx . \tag{17.7}
\]
(b) The Dimensions n = 2 and n = 1
To construct the fundamental solution E2 (x, t), x = (x1 , x2 ), we use the so-called method of
descent.
Lemma 17.6 A fundamental solution E₂ of the 2-dimensional wave operator □_{a,2} is given by
\[
\langle \mathrm{E}_2 , \varphi(x_1,x_2,t)\rangle = \lim_{k\to\infty}\bigl\langle \mathrm{E}_3(x_1,x_2,x_3,t) , \varphi(x_1,x_2,t)\,\eta_k(x_3)\bigr\rangle,
\]
where E₃ denotes a fundamental solution of the 3-dimensional wave operator □_{a,3} and η_k ∈ D(R) is a sequence of functions converging to 1 as k → ∞.
Proof. Let ϕ = ϕ(x₁, x₂, t) ∈ D(R³). Noting that η_k″ → 0 uniformly on R as k → ∞, we get
\[
\langle \Box_{a,2}\mathrm{E}_2 , \varphi\rangle = \langle \mathrm{E}_2 , \Box_{a,2}\varphi\rangle
= \lim_{k\to\infty}\bigl\langle \mathrm{E}_3 , \Box_{a,2}\varphi(x_1,x_2,t)\cdot\eta_k(x_3)\bigr\rangle
= \lim_{k\to\infty}\bigl\langle \mathrm{E}_3 , \Box_{a,3}\bigl(\varphi(x_1,x_2,t)\,\eta_k(x_3)\bigr)\bigr\rangle
\]
\[
= \lim_{k\to\infty}\bigl\langle \Box_{a,3}\mathrm{E}_3 , \varphi(x_1,x_2,t)\,\eta_k(x_3)\bigr\rangle
= \lim_{k\to\infty}\bigl\langle \delta(x_1,x_2,x_3)\,\delta(t) , \varphi(x_1,x_2,t)\,\eta_k(x_3)\bigr\rangle
= \varphi(0,0,0) = \bigl\langle \delta(x_1,x_2,t) , \varphi(x_1,x_2,t)\bigr\rangle .
\]
Here we used that ∆_{x₁,x₂,x₃}(ϕ(x₁,x₂,t)·η(x₃)) = ∆_{x₁,x₂}ϕ(x₁,x₂,t)·η + ϕ·η″(x₃), so that the two sides of the third equality differ only by the term a²ϕ·η_k″(x₃), which vanishes in the limit.
Proposition 17.7 (a) For x = (x₁, x₂) ∈ R² and t ∈ R, the regular distribution
\[
\mathrm{E}_2(x,t) = \frac{1}{2\pi a}\,\frac{H(at-\|x\|)}{\sqrt{a^2t^2-\|x\|^2}}
= \begin{cases} \dfrac{1}{2\pi a}\,\dfrac{1}{\sqrt{a^2t^2-\|x\|^2}}, & at > \|x\|,\\[2ex] 0, & at \le \|x\|, \end{cases}
\]
is a fundamental solution of the 2-dimensional wave operator.
(b) The regular distribution
\[
\mathrm{E}_1(x,t) = \frac{1}{2a}\,H(at-|x|)
= \begin{cases} \dfrac{1}{2a}, & |x| < at,\\[1ex] 0, & |x| \ge at, \end{cases}
\]
is a fundamental solution of the one-dimensional wave operator.
Proof. (a) By the above lemma,

⟨E2, ϕ(x1,x2,t)⟩ = ⟨E3, ϕ(x1,x2,t)·1(x3)⟩ = (1/(4πa²)) ∫₀^∞ (1/t) ∬_{S_at} ϕ(x1,x2,t) dS dt.

We compute the surface element of the sphere of radius at around 0 in terms of x1, x2. The upper half-sphere is the graph of the function x3 = f(x1,x2) = √(a²t² − x1² − x2²). By the formula before Example 10.4, dS = √(1 + fx1² + fx2²) dx1 dx2. In case of the sphere we have

dS_{x1,x2} = at dx1 dx2 / √(a²t² − x1² − x2²).

Integration over both the upper and the lower half-sphere yields a factor 2:

⟨E2, ϕ⟩ = 2·(1/(4πa²)) ∫₀^∞ (1/t) ∬_{x1²+x2²≤a²t²} at ϕ(x1,x2,t)/√(a²t² − x1² − x2²) dx1 dx2 dt
  = (1/(2πa)) ∫₀^∞ ∬_{‖x‖≤at} ϕ(x1,x2,t)/√(a²t² − x1² − x2²) dx1 dx2 dt.

This shows that E2(x,t) is a regular distribution of the above form. One can show directly that E2 ∈ L¹loc(R³); indeed, ∭_{R²×[−R,R]} E2(x1,x2,t) dx dt < ∞ for all R > 0.
(b) It was already shown in homework 57.2 that E1 is a fundamental solution of the one-dimensional wave operator; a short proof can be found in [Wla72, II. §6.5 Example g)]. Here we use the method of descent. Let ϕ ∈ D(R²). Since ∫_R |E2(x1,x2,t)| dx2 < ∞ and ∫_R E2(x1,x2,t) dx2 again defines a locally integrable function, we have as in Lemma 17.6

E1(ϕ) = lim_{k→∞} ⟨E2(x1,x2,t), ϕ(x1,t)ηk(x2)⟩ = lim_{k→∞} ∫_{R³} E2(x1,x2,t)ηk(x2)ϕ(x1,t) dx1 dx2 dt.

Hence, the fundamental solution E1 is the regular distribution

E1(x1,t) = ∫_{−∞}^{∞} E2(x1,x2,t) dx2.
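The last identity can be checked numerically: integrating E2 over x2 must reproduce the constant value 1/(2a) of E1 inside |x1| < at. The following sketch (the values a = 2, t = 1.5 and the substitution-based quadrature are illustrative choices, not from the text) uses x2 = c·sin θ to remove the inverse-square-root singularity at the endpoints:

```python
import math

def E2(x1, x2, t, a=1.0):
    # regular part of the 2-d fundamental solution
    r2 = a*a*t*t - x1*x1 - x2*x2
    return 1.0/(2*math.pi*a*math.sqrt(r2)) if r2 > 0 else 0.0

def descend_to_E1(x1, t, a=1.0, n=20_000):
    # E1(x1,t) = ∫ E2(x1,x2,t) dx2, with the substitution x2 = c*sin(theta)
    c2 = a*a*t*t - x1*x1
    if c2 <= 0:
        return 0.0
    c, h = math.sqrt(c2), math.pi/n
    s = 0.0
    for i in range(n):
        th = -math.pi/2 + (i + 0.5)*h
        s += E2(x1, c*math.sin(th), t, a) * c*math.cos(th) * h
    return s

a, t = 2.0, 1.5
for x1 in (0.0, 1.0, 2.0):
    print(abs(descend_to_E1(x1, t, a) - 1/(2*a)) < 1e-6)   # True: value is 1/(2a)
```

The value is independent of x1 as long as |x1| < at, exactly as the piecewise formula for E1 predicts.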
17.2 The Cauchy Problem
In this section we formulate and study the classical and generalized Cauchy problems for the wave equation and for the heat equation.
17.2.1 Motivation of the Method
To explain the method, we first apply the theory of distributions to solve an initial value problem for a linear second-order ODE.
Consider the Cauchy problem

u″(t) + a²u(t) = f(t),   u|t=0+ = u0,   u′|t=0+ = u1,   (17.8)

where f ∈ C(R+). We extend the solution u(t) as well as f(t) by 0 to negative values of t; we denote the new functions by ũ and f̃, respectively. Since ũ has a jump of height u0 at 0, by Example 16.6, ũ′(t) = {u′(t)} + u0δ(t). Similarly, u′(t) jumps at 0 by u1, such that ũ″(t) = {u″(t)} + u0δ′(t) + u1δ(t). Hence, ũ satisfies on R the equation

ũ″ + a²ũ = f̃(t) + u0δ′(t) + u1δ(t).   (17.9)

We construct the solution ũ. Since the fundamental solution E(t) = H(t) sin(at)/a as well as the right-hand side of (17.9) have positive support, the convolution product exists and equals

ũ = E ∗ (f̃ + u0δ′(t) + u1δ(t)) = E ∗ f̃ + u0E′(t) + u1E(t),
ũ = (1/a) ∫₀^t f(τ) sin a(t−τ) dτ + u0E′(t) + u1E(t).

Since ũ satisfies (17.9) in case t > 0 and the solution of the Cauchy problem is unique, the above formula gives the classical solution for t > 0, that is,

u(t) = (1/a) ∫₀^t f(τ) sin a(t−τ) dτ + u0 cos(at) + u1 sin(at)/a.
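The closed-form solution can be compared against a direct numerical integration of the ODE. A minimal sketch, with the illustrative data a = 2, f(t) = cos t, u0 = 1, u1 = 0 (not from the text), evaluates the Duhamel integral by the midpoint rule and the ODE by classical RK4:

```python
import math

a, u0, u1 = 2.0, 1.0, 0.0          # illustrative data
f = lambda t: math.cos(t)

def duhamel(t, N=20_000):
    # u(t) = (1/a) ∫_0^t f(τ) sin a(t-τ) dτ + u0 cos(at) + u1 sin(at)/a
    h = t/N
    integral = sum(f((i + 0.5)*h)*math.sin(a*(t - (i + 0.5)*h)) for i in range(N))*h
    return integral/a + u0*math.cos(a*t) + u1*math.sin(a*t)/a

def deriv(u, v, s):
    # first-order system: u' = v, v' = f(s) - a^2 u
    return v, f(s) - a*a*u

def rk4(t, N=2_000):
    h = t/N
    u, v, s = u0, u1, 0.0
    for _ in range(N):
        k1 = deriv(u, v, s)
        k2 = deriv(u + h/2*k1[0], v + h/2*k1[1], s + h/2)
        k3 = deriv(u + h/2*k2[0], v + h/2*k2[1], s + h/2)
        k4 = deriv(u + h*k3[0], v + h*k3[1], s + h)
        u += h/6*(k1[0] + 2*k2[0] + 2*k3[0] + k4[0])
        v += h/6*(k1[1] + 2*k2[1] + 2*k3[1] + k4[1])
        s += h
    return u

print(abs(duhamel(1.5) - rk4(1.5)) < 1e-6)   # True
```

For this particular f the exact solution is cos(t)/3 + (2/3) cos(2t), which both computations reproduce.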
17.2.2 The Wave Equation
(a) The Classical and the Generalized Initial Value Problem—Existence, Uniqueness, and
Continuity
Definition 17.1 (a) The problem

□a u = f(x,t),   x ∈ Rⁿ, t > 0,   (17.10)
u|t=0+ = u0(x),   (17.11)
∂u/∂t|t=0+ = u1(x),   (17.12)

where we assume that f ∈ C(Rⁿ×R+), u0 ∈ C¹(Rⁿ), u1 ∈ C(Rⁿ), is called the classical initial value problem (CIVP, for short) to the wave equation.
A function u(x,t) is called a classical solution of the CIVP if

u(x,t) ∈ C²(Rⁿ×(0,+∞)) ∩ C¹(Rⁿ×[0,+∞)),

u(x,t) satisfies the wave equation (17.10) for t > 0, and it satisfies the initial conditions (17.11) and (17.12) as t → 0+.
(b) The problem

□a U = F(x,t) + U0(x) ⊗ δ′(t) + U1(x) ⊗ δ(t)

with F ∈ D′(Rⁿ⁺¹), U0, U1 ∈ D′(Rⁿ), and supp F ⊂ Rⁿ×[0,+∞) is called the generalized initial value problem (GIVP). A generalized function U ∈ D′(Rⁿ⁺¹) with supp U ⊂ Rⁿ×[0,+∞) which satisfies the above equation is called a (generalized, weak) solution of the GIVP.
Proposition 17.8 (a) Suppose that u(x,t) is a solution of the CIVP with the given data f, u0, and u1. Then the regular distribution Tu is a solution of the GIVP with right-hand side Tf + Tu0⊗δ′(t) + Tu1⊗δ(t), provided that f(x,t) and u(x,t) are extended by 0 into the domain {(x,t) ∈ Rⁿ⁺¹ | t < 0}.
(b) Conversely, suppose that U is a solution of the GIVP. Let the distributions F = Tf, U0 = Tu0, U1 = Tu1, and U = Tu be regular and satisfy the regularity assumptions of the CIVP. Then u(x,t) is a solution of the CIVP.
Proof. (b) Suppose that U is a solution of the GIVP; let ϕ ∈ D(Rⁿ⁺¹). By definition of the tensor product and the derivative,

⟨Utt − a²∆U, ϕ⟩ = ⟨F, ϕ⟩ + ⟨U0⊗δ′, ϕ⟩ + ⟨U1⊗δ, ϕ⟩
  = ∫₀^∞ ∫_{Rⁿ} f(x,t)ϕ(x,t) dx dt − ∫_{Rⁿ} u0(x) ∂ϕ/∂t(x,0) dx + ∫_{Rⁿ} u1(x)ϕ(x,0) dx.   (17.13)
Applying integration by parts with respect to t twice, we find

∫₀^∞ u ϕtt dt = uϕt|₀^∞ − ∫₀^∞ ut ϕt dt
  = −u(x,0)ϕt(x,0) − utϕ|₀^∞ + ∫₀^∞ utt ϕ dt
  = −u(x,0)ϕt(x,0) + ut(x,0)ϕ(x,0) + ∫₀^∞ utt ϕ dt.
Since Rⁿ has no boundary and ϕ has compact support, integration by parts with respect to the spatial variables x yields no boundary terms. Hence, by the above formula and ∬ u ∆ϕ dt dx = ∬ ∆u ϕ dt dx, we obtain

⟨Utt − a²∆U, ϕ⟩ = ⟨U, ϕtt − a²∆ϕ⟩ = ∫_{Rⁿ} ∫₀^∞ u(x,t)(ϕtt − a²∆ϕ) dt dx
  = ∫_{Rⁿ} ∫₀^∞ (utt − a²∆u) ϕ(x,t) dt dx − ∫_{Rⁿ} (u(x,0)ϕt(x,0) − ut(x,0)ϕ(x,0)) dx.   (17.14)
For any ϕ ∈ D(Rⁿ×R+), supp ϕ is contained in Rⁿ×(0,+∞) such that ϕ(x,0) = ϕt(x,0) = 0. From (17.13) and (17.14) it follows that

∫_{Rⁿ} ∫₀^∞ (f(x,t) − utt + a²∆u) ϕ(x,t) dt dx = 0.

By Lemma 16.2 (Du Bois-Reymond) it follows that utt − a²∆u = f on Rⁿ×R+. Inserting this into (17.13) and (17.14) we have

∫_{Rⁿ} (u0(x) − u(x,0)) ϕt(x,0) dx − ∫_{Rⁿ} (u1(x) − ut(x,0)) ϕ(x,0) dx = 0.

If we set ϕ(x,t) = ψ(x)η(t), where η ∈ D(R) and η(t) = 1 in a neighborhood of 0, then ϕt(x,0) = 0 and therefore

∫_{Rⁿ} (u1(x) − ut(x,0)) ψ(x) dx = 0,   ψ ∈ D(Rⁿ).

Moreover, setting ϕ(x,t) = tη(t)ψ(x) gives

∫_{Rⁿ} (u0(x) − u(x,0)) ψ(x) dx = 0,   ψ ∈ D(Rⁿ).

Again, Lemma 16.2 yields

u0(x) = u(x,0),   u1(x) = ut(x,0),

and u(x,t) is a solution of the CIVP.
(a) Conversely, if u(x,t) is a solution of the CIVP then (17.14) holds with U(x,t) = H(t)u(x,t). Comparing this with (17.13) it is seen that

Utt − a²∆U = F + U0⊗δ′ + U1⊗δ,

where F(x,t) = H(t)f(x,t), U0(x) = u(x,0), and U1(x) = ut(x,0).
Corollary 17.9 Suppose that F, U0, and U1 are data of the GIVP. Then there exists a unique solution U of the GIVP. It can be written as

U = V + V(0) + V(1),   where   V = En ∗ F,   V(1) = En ∗x U1,   V(0) = ∂/∂t (En ∗x U0).

Here En ∗x U1 := En ∗ (U1(x)⊗δ(t)) denotes the convolution product with respect to the spatial variables x only, and En denotes the fundamental solution of the n-dimensional wave operator □a,n. The solution U depends continuously, in the sense of convergence in D′, on F, U0, and U1.
Proof. The supports of the distributions U0⊗δ′ and U1⊗δ are contained in the hyperplane {(x,t) ∈ Rⁿ⁺¹ | t = 0}. Hence the support of the distribution F + U0⊗δ′ + U1⊗δ is contained in the half space Rⁿ×R+. It follows from Proposition 16.15 that the convolution product

U = En ∗ (F + U0⊗δ′ + U1⊗δ)

exists and has support in the half space t ≥ 0. It follows from Theorem 16.8 that U is a solution of the GIVP. On the other hand, any solution of the GIVP has support in Rⁿ×R+ and therefore, by Proposition 16.15, possesses the convolution with En. By Theorem 16.8, the solution U is unique.
Suppose that Uk → U as k → ∞ in D′(Rⁿ⁺¹); then En ∗ Uk → En ∗ U by the continuity of the convolution product in D′ (see Proposition 16.15).
(b) Explicit Solutions for n = 1, 2, 3
We now make the formulas from Corollary 17.9 explicit, that is, we compute the above convolutions to obtain the potentials V, V(0), and V(1).
Proposition 17.10 Let f ∈ C²(Rⁿ×R+), u0 ∈ C³(Rⁿ), and u1 ∈ C²(Rⁿ) for n = 2, 3; let f ∈ C¹(R×R+), u0 ∈ C²(R), and u1 ∈ C¹(R) in case n = 1.
Then there exists a unique solution of the CIVP. In case n = 3 it is given by Kirchhoff's formula

u(x,t) = (1/(4πa²)) [ ∭_{U_at(x)} f(y, t − ‖x−y‖/a)/‖x−y‖ dy + (1/t) ∬_{S_at(x)} u1(y) dSy + ∂/∂t ( (1/t) ∬_{S_at(x)} u0(y) dSy ) ].   (17.15)

The first term V is called the retarded potential.
In case n = 2, x = (x1,x2), y = (y1,y2), it is given by Poisson's formula

u(x,t) = (1/(2πa)) ∫₀^t ∬_{U_{a(t−s)}(x)} f(y,s)/√(a²(t−s)² − ‖x−y‖²) dy ds
  + (1/(2πa)) ∬_{U_at(x)} u1(y)/√(a²t² − ‖x−y‖²) dy
  + (1/(2πa)) ∂/∂t ∬_{U_at(x)} u0(y)/√(a²t² − ‖x−y‖²) dy.   (17.16)
In case n = 1 it is given by d'Alembert's formula

u(x,t) = (1/(2a)) ∫₀^t ∫_{x−a(t−s)}^{x+a(t−s)} f(y,s) dy ds + (1/(2a)) ∫_{x−at}^{x+at} u1(y) dy + (1/2)(u0(x+at) + u0(x−at)).   (17.17)
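d'Alembert's formula is easy to test numerically. The sketch below (the Gaussian initial datum and the special case f = 0, u1 = 0 are illustrative assumptions) checks utt = a²uxx by central differences:

```python
import math

def u0(x):
    return math.exp(-x*x)      # illustrative initial displacement

def u(x, t, a=1.0):
    # d'Alembert's formula specialised to f = 0 and u1 = 0
    return 0.5*(u0(x + a*t) + u0(x - a*t))

# verify u_tt = a^2 u_xx by central second differences
a, h = 1.0, 1e-3
for x, t in [(0.3, 0.7), (-1.2, 2.0)]:
    utt = (u(x, t + h, a) - 2*u(x, t, a) + u(x, t - h, a))/h**2
    uxx = (u(x + h, t, a) - 2*u(x, t, a) + u(x - h, t, a))/h**2
    print(abs(utt - a*a*uxx) < 1e-4)   # True: the PDE residual is tiny
```

At t = 0 the formula obviously returns u0(x), so both initial conditions (with u1 = 0) hold as well.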
The solution u(x,t) depends continuously on u0, u1, and f in the following sense: if

|f − f̃| < ε,   |u0 − ũ0| < ε0,   |u1 − ũ1| < ε1,   ‖grad(u0 − ũ0)‖ < ε0′

(where we impose the last inequality only in cases n = 3 and n = 2), then the corresponding solutions u(x,t) and ũ(x,t) satisfy in a strip 0 ≤ t ≤ T

|u(x,t) − ũ(x,t)| < (1/2)T²ε + Tε1 + ε0 + aTε0′,

where the last term is omitted in case n = 1.
Proof (idea). We show Kirchhoff's formula.
(a) The potential term with f. By Proposition 16.15, the convolution product E3 ∗ f exists. It is shown in [Wla72, p. 153] that for a locally integrable function f ∈ L¹loc(Rⁿ⁺¹) with supp f ⊂ Rⁿ×R+, the convolution En ∗ Tf is again a locally integrable function. Formally, the convolution product is given by

(E3 ∗ f)(x,t) = ∫_{R⁴} E3(y,s) f(x−y, t−s) dy ds = ∫_{R⁴} E3(x−y, t−s) f(y,s) dy ds,

where the integral is to be understood as the evaluation of E3(y,s) on the shifted function f(x−y, t−s). Since f has support on the positive time axis, one can restrict oneself to s > 0 and t − s > 0, that is, to 0 < s < t. Thus formula (17.6) gives

E3 ∗ f(x,t) = (1/(4πa²)) ∫₀^t (1/s) ∬_{S_as} f(x−y, t−s) dS(y) ds.
Using r = as, dr = a ds, we obtain

V(x,t) = (1/(4πa²)) ∫₀^{at} (1/r) ∬_{S_r} f(x−y, t−r/a) dS(y) dr.
Using dy1 dy2 dy3 = dr dS as well as ‖y‖ = r = as we can proceed:

V(x,t) = (1/(4πa²)) ∭_{U_at(0)} f(x−y, t−‖y‖/a)/‖y‖ dy1 dy2 dy3.

The shift z = x − y, dz1 dz2 dz3 = dy1 dy2 dy3, finally yields

V(x,t) = (1/(4πa²)) ∭_{U_at(x)} f(z, t−‖x−z‖/a)/‖x−z‖ dz.

This is the first potential term of Kirchhoff's formula.
(b) We compute V(1)(x,t). By definition,

V(1) = E3 ∗ (u1⊗δ) = E3 ∗x u1.

Formally, this is given by

V(1)(x,t) = (1/(4πa²t)) ∭_{R³} δ_{S_at}(y) u1(x−y) dy = (1/(4πa²t)) ∬_{S_at} u1(x−y) dS(y) = (1/(4πa²t)) ∬_{S_at(x)} u1(y) dS(y).

(c) Recall that (DᵅS) ∗ T = Dᵅ(S ∗ T), by Remark 16.8 (b). In particular,

E3 ∗ (u0⊗δ′) = ∂/∂t (E3 ∗x u0),

which immediately gives (c) in view of (b).
Remark 17.2 (a) The stronger regularity (differentiability) conditions on f, u0, u1 are necessary to prove u ∈ C²(Rⁿ×R+) and to show stability.
(b) Proposition 17.10 and Corollary 17.9 show that the GIVP for the wave equation is a well-posed problem (existence, uniqueness, stability).
17.2.3 The Heat Equation
Definition 17.2 (a) The problem

ut − a²∆u = f(x,t),   x ∈ Rⁿ, t > 0,   (17.18)
u(x,0) = u0(x),   (17.19)

where we assume that f ∈ C(Rⁿ×R+) and u0 ∈ C(Rⁿ), is called the classical initial value problem (CIVP, for short) to the heat equation.
A function u(x,t) is called a classical solution of the CIVP if

u(x,t) ∈ C²(Rⁿ×(0,+∞)) ∩ C(Rⁿ×[0,+∞)),

and u(x,t) satisfies the heat equation (17.18) and the initial condition (17.19).
(b) The problem

Ut − a²∆U = F + U0⊗δ

with F ∈ D′(Rⁿ⁺¹), U0 ∈ D′(Rⁿ), and supp F ⊂ Rⁿ×R+ is called the generalized initial value problem (GIVP). A generalized function U ∈ D′(Rⁿ⁺¹) with supp U ⊂ Rⁿ×R+ which satisfies the above equation is called a generalized solution of the GIVP.
The fundamental solution of the heat operator has the following properties:

∫_{Rⁿ} E(x,t) dx = 1,   t > 0;   E(x,t) → δ(x) as t → 0+.

The fundamental solution describes the heat distribution of a point source at the origin (0,0). Since E(x,t) > 0 for all t > 0 and all x ∈ Rⁿ, the heat propagates with infinite speed. This is in contrast to our experience. However, for short distances the heat equation gives sufficiently good results; for long distances one uses the transport equation. We summarize the results, which are similar to those for the wave equation.
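The normalization ∫ E(x,t) dx = 1 can be verified by quadrature. The following sketch does this for n = 1 (the truncation length L and the step count are arbitrary choices):

```python
import math

def E(x, t, a=1.0):
    # fundamental solution of u_t - a^2 u_xx in one space dimension
    return math.exp(-x*x/(4*a*a*t))/math.sqrt(4*math.pi*a*a*t)

def total_heat(t, a=1.0, L=50.0, N=100_000):
    # midpoint rule for ∫_{-L}^{L} E(x,t) dx; the tail beyond ±L is negligible
    h = 2*L/N
    return sum(E(-L + (i + 0.5)*h, t, a) for i in range(N))*h

for t in (0.01, 1.0, 10.0):
    print(abs(total_heat(t) - 1.0) < 1e-6)   # True for every t > 0
```

The mass stays 1 for all t > 0 while the kernel concentrates at x = 0 as t → 0+, which is the quantitative content of E(x,t) → δ(x).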
Proposition 17.11 (a) Suppose that u(x,t) is a solution of the CIVP with the given data f and u0. Then the regular distribution Tũ is a solution of the GIVP with right-hand side Tf̃ + Tu0⊗δ, provided that f(x,t) and u(x,t) are extended to f̃(x,t) and ũ(x,t) by 0 into the left half-space {(x,t) ∈ Rⁿ⁺¹ | t < 0}.
(b) Conversely, suppose that U is a solution of the GIVP. Let the distributions F = Tf, U0 = Tu0, and U = Tu be regular and satisfy the regularity assumptions of the CIVP. Then u(x,t) is a solution of the CIVP.
Proposition 17.12 Suppose that F and U0 are data of the GIVP and that both have compact support. Then there exists a solution U of the GIVP which can be written as

U = V + V(0),   where   V = E ∗ F,   V(0) = E ∗x U0.

The solution U varies continuously with F and U0.
Remark 17.3 The theorem differs from the corresponding result for the wave equation in that there is no statement about uniqueness. It turns out that the GIVP cannot be solved uniquely: A. Friedman (Partial Differential Equations of Parabolic Type) gave an example of a non-vanishing distribution which solves the GIVP with F = 0 and U0 = 0.
However, if all distributions are regular and we place additional requirements on the growth of the regular distribution as t, ‖x‖ → ∞, uniqueness can be achieved.
For existence and uniqueness we introduce the following classes of functions:

M = {f ∈ C(Rⁿ×R+) | f is bounded on the strip Rⁿ×[0,T] for all T > 0},
Cb(Rⁿ) = {f ∈ C(Rⁿ) | f is bounded on Rⁿ}.
Corollary 17.13 (a) Let f ∈ M and u0 ∈ Cb(Rⁿ). Then the two potentials V(x,t) as in Corollary 17.4 and

V(0)(x,t) = E ∗ (Tu0⊗δ) = H(t)/(4πa²t)^{n/2} ∫_{Rⁿ} u0(y) e^{−‖x−y‖²/(4a²t)} dy

are regular distributions, and u = V + V(0) is a solution of the GIVP.
(b) In case f ∈ C²(Rⁿ×R+) with Dᵅf ∈ M for all α with |α| ≤ 1 (first-order partial derivatives), the solution in (a) is a solution of the CIVP. In particular, V(0)(x,t) → u0(x) as t → 0+.
(c) The solution u of the GIVP is unique in the class M.
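For n = 1 the potential V(0) is an explicit convolution and can be evaluated by quadrature. With the illustrative choice u0(x) = cos x (bounded, so u0 ∈ Cb(R)) the exact solution is e^{−a²t} cos x, which the numerical potential reproduces:

```python
import math

def V0(x, t, u0, a=1.0, L=40.0, N=100_000):
    # V0(x,t) = (4π a² t)^(-1/2) ∫ u0(y) exp(-(x-y)²/(4a²t)) dy  (n = 1)
    h = 2*L/N
    s = sum(u0(x - L + (i + 0.5)*h) *
            math.exp(-((L - (i + 0.5)*h)**2)/(4*a*a*t)) for i in range(N))
    return s*h/math.sqrt(4*math.pi*a*a*t)

x, t, a = 0.5, 0.3, 1.0
print(V0(x, t, math.cos, a))             # ≈ the exact value below
print(math.exp(-a*a*t)*math.cos(x))
```

Shrinking t brings V0(x,t,u0) closer and closer to u0(x), illustrating V(0)(x,t) → u0(x) as t → 0+.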
17.2.4 Physical Interpretation of the Results
[Figure: the backward light cone Γ−(x,t) and the forward light cone Γ+(x,t), both with apex (x,t), drawn in (y1, y2, s)-space; the backward cone meets the plane s = 0 in a ball of radius at.]
Definition 17.3 We introduce the two cones in Rⁿ⁺¹

Γ−(x,t) = {(y,s) | ‖x−y‖ < a(t−s), s < t},
Γ+(x,t) = {(y,s) | ‖x−y‖ < a(s−t), s > t},

which are called the domain of dependence (backward light cone) and the domain of influence (forward light cone), respectively.
Recall that the boundaries ∂Γ+ and ∂Γ− are characteristic surfaces of the wave equation.
(a) Propagation of Waves in Space
Consider the fundamental solution

E3(x,t) = (1/(4πa²t)) δ_{S_at}

of the 3-dimensional wave equation. It shows that the disturbance at time t > 0 caused by a point source δ(x)δ(t) at the origin is located on a sphere of radius at around 0. The disturbance moves like a spherical wave, ‖x‖ = at, with velocity a. In the beginning there is silence, then disturbance (on the sphere), and afterwards again silence. This is called Huygens' principle.
It follows by the superposition principle that the solution u(x0,t0) of an initial disturbance u0(x)δ′(t) + u1(x)δ(t) is completely determined by the values of u0 and u1 on the sphere cut out by the backward light cone at t = 0, that is, by the values u0(x) and u1(x) at all x with ‖x − x0‖ = at0.
Now, let the disturbance be situated in a compact set K rather than in a single point. Suppose that d and D are the minimal and maximal distances of x from K. Then the disturbance starts to act at x at time t0 = d/a, it lasts for (D−d)/a, and for t > D/a = t1 there is again silence at x. Therefore, we can observe a forward wave front at time t0 and a backward wave front at time t1.
This shows that the domain of influence M(K) of the compact set K is the union of all boundaries of forward light cones Γ+(y,0) with y ∈ K at time t = 0:

M(K) = {(y,s) | ∃ x ∈ K : ‖x−y‖ = as}.
(b) Propagation of Plane Waves
Consider the fundamental solution

E2(x,t) = H(at − ‖x‖)/(2πa√(a²t² − ‖x‖²)),   x = (x1,x2),

of the 2-dimensional wave equation.
[Figure: the expanding disc of disturbance, of radius at, shown at times t = 0, 1, 2, 3, 4.]
It shows that the disturbance caused by a point source δ(x)δ(t) at the origin at time 0 fills the disc U_at of radius at around 0. One observes a forward wave front moving with speed a. In contrast to the 3-dimensional picture, there exists no back front: the disturbance is permanently present from t0 on. We speak of wave diffusion; Huygens' principle does not hold.
Diffusion can also be observed in case of an arbitrary initial disturbance u0(x)δ′(t) + u1(x)δ(t). Indeed, the superposition principle shows that the domain of influence of a compact initial disturbance K is the union of all discs U_at(y) with y ∈ K.
(c) Propagation on a Line
Recall that E1(x,t) = (1/(2a)) H(at − |x|). The disturbance at time t > 0 caused by a point source δ(x)δ(t) is the whole closed interval [−at, at]. We have two forward wave "fronts", one at the point x = at and one at x = −at, one moving to the right and one moving to the left. As in the plane case, there does not exist a backward wave front; we observe diffusion.
For more details, see the discussion in Wladimirow [Wla72, p. 155–159].
17.3 Fourier Method for Boundary Value Problems
A good, easily accessible introduction to the Fourier method can be found in [KK71]. In this section we use Fourier series to solve BEVPs for the Laplace equation as well as initial boundary value problems for the wave and heat equations.
Recall that the following sets are CNOS in the Hilbert space H:

{ (1/√(2π)) e^{int} | n ∈ Z },   H = L²(a, a+2π),
{ (1/√(b−a)) e^{2πint/(b−a)} | n ∈ Z },   H = L²(a, b),
{ 1/√(2π), (1/√π) sin(nt), (1/√π) cos(nt) | n ∈ N },   H = L²(a, a+2π),
{ 1/√(b−a), √(2/(b−a)) sin(2πnt/(b−a)), √(2/(b−a)) cos(2πnt/(b−a)) | n ∈ N },   H = L²(a, b).

For any function f ∈ L¹(0,2π) one has an associated Fourier series

f ∼ Σ_{n∈Z} cn e^{int},   cn = (1/(2π)) ∫₀^{2π} f(t) e^{−int} dt.
Lemma 17.14 Each of the following two sets forms a CNOS in H = L²(0,π) (on the half interval):

{ √(2/π) sin(nt) | n ∈ N },   { 1/√π, √(2/π) cos(nt) | n ∈ N }.

Proof. To check that they form an NOS is left to the reader. We show completeness of the first set. Let f ∈ L²(0,π). Extend f to an odd function f̃ ∈ L²(−π,π), that is, f̃(x) = f(x) and f̃(−x) = −f(x) for x ∈ (0,π). Since f̃ is an odd function, in its Fourier series

a0/2 + Σ_{n=1}^∞ (an cos(nt) + bn sin(nt))

we have an = 0 for all n. Since the Fourier series Σ_{n=1}^∞ bn sin(nt) converges to f̃ in L²(−π,π), it converges to f in L²(0,π). Thus, the sine system is complete. The proof for the cosine system is analogous (extend f to an even function).
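Completeness of the sine system can be illustrated numerically: for f(x) = x (an illustrative choice, with known coefficients bn = 2(−1)^{n+1}/n) the L²(0,π) error of the partial sums decreases as more terms are used:

```python
import math

def sine_coeff(n, f, N=10_000):
    # b_n = (2/π) ∫_0^π f(x) sin(nx) dx, midpoint rule
    h = math.pi/N
    return (2/math.pi)*sum(f((i + 0.5)*h)*math.sin(n*(i + 0.5)*h)
                           for i in range(N))*h

f = lambda x: x
b = [sine_coeff(n, f) for n in range(1, 41)]   # b_1 ≈ 2, b_2 ≈ -1, ...

def l2_error(K, N=2_000):
    # L²(0,π) distance between f and the K-term partial sum
    h = math.pi/N
    s = 0.0
    for i in range(N):
        x = (i + 0.5)*h
        s += (f(x) - sum(b[n-1]*math.sin(n*x) for n in range(1, K + 1)))**2
    return math.sqrt(s*h)

print(l2_error(5) > l2_error(20) > l2_error(40))   # True: error decreases
```

The error behaves like √(2π/K), consistent with the coefficients decaying like 1/n.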
17.3.1 Initial Boundary Value Problems
(a) The Homogeneous Heat Equation, Periodic Boundary Conditions
We consider heat conduction in a closed wire loop of length 2π. Let u(x,t) be the temperature of the wire at position x and time t. Since the wire is closed (a loop), u(x,t) = u(x+2π,t); u is thought of as a 2π-periodic function on R for every fixed t. Thus, we have the following periodic boundary conditions:

u(0,t) = u(2π,t),   ux(0,t) = ux(2π,t),   t ∈ R+.   (PBC)

The initial temperature distribution at time t = 0 is given, such that the BIVP reads

ut − a²uxx = 0,   x ∈ R, t > 0,
u(x,0) = u0(x),   x ∈ R,   (PBC).   (17.20)

Separation of variables. We seek solutions of the form u(x,t) = f(x)·g(t), ignoring the initial conditions for a while. The heat equation then takes the form

f(x)g′(t) = a²f″(x)g(t)   ⇐⇒   (1/a²) g′(t)/g(t) = f″(x)/f(x) = κ = const.

We obtain a system of two independent ODEs coupled only through κ:

f″(x) − κf(x) = 0,   g′(t) − a²κg(t) = 0.
The periodic boundary conditions imply

f(0)g(t) = f(2π)g(t),   f′(0)g(t) = f′(2π)g(t),

which for nontrivial g gives f(0) = f(2π) and f′(0) = f′(2π). In case κ = 0, f″(x) = 0 has the general solution f(x) = c1x + c2. The only periodic solution is f(x) = c2 = const. Suppose now that κ = −ν² < 0. Then the general solution of the second order ODE is

f(x) = c1 cos(νx) + c2 sin(νx).

Since f is periodic with period 2π, only a discrete set of values ν is possible, namely νn = n, n ∈ Z. This implies κn = −n², n ∈ N. Finally, in case κ = ν² > 0, the general solution

f(x) = c1 e^{νx} + c2 e^{−νx}

provides no periodic solutions f. So far, we have obtained a set of solutions

fn(x) = an cos(nx) + bn sin(nx),   n ∈ N0,

corresponding to κn = −n². The ODE for gn(t) now reads

gn′(t) + a²n²gn(t) = 0.

Its solution is gn(t) = c e^{−a²n²t}, n ∈ N. Hence, the solutions of the BVP are given by

un(x,t) = e^{−a²n²t}(an cos(nx) + bn sin(nx)),   n ∈ N,   u0(x,t) = a0/2,

and finite or "infinite" linear combinations:

u(x,t) = a0/2 + Σ_{n=1}^∞ e^{−a²n²t}(an cos(nx) + bn sin(nx)).   (17.21)
Consider now the initial value, that is, t = 0. The corresponding series is

u(x,0) = a0/2 + Σ_{n=1}^∞ (an cos(nx) + bn sin(nx)),   (17.22)

which is the ordinary Fourier series of u0(x). That is, the Fourier coefficients an and bn of the initial function u0(x) formally give a solution u(x,t).
1. If the Fourier series of u0 converges pointwise to u0, the initial condition is satisfied by the function u(x,t) given in (17.21).
2. If the Fourier series of u0 is twice differentiable (with respect to x), so is the function u(x,t) given by (17.21).
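The recipe can be sketched in code. With the illustrative initial temperature u0(x) = cos x + 0.5 sin 3x the exact solution is e^{−a²t} cos x + 0.5 e^{−9a²t} sin 3x, and the coefficients obtained by quadrature reproduce it:

```python
import math

def coeffs(u0, nmax=8, N=4_096):
    # Fourier coefficients a_n, b_n of a 2π-periodic function (midpoint rule)
    h = 2*math.pi/N
    xs = [(i + 0.5)*h for i in range(N)]
    a = [sum(u0(x)*math.cos(n*x) for x in xs)*h/math.pi for n in range(nmax + 1)]
    b = [sum(u0(x)*math.sin(n*x) for x in xs)*h/math.pi for n in range(nmax + 1)]
    return a, b

def u(x, t, a_n, b_n, a=1.0):
    # the series (17.21), truncated at len(a_n) - 1 terms
    return a_n[0]/2 + sum(math.exp(-a*a*n*n*t) *
                          (a_n[n]*math.cos(n*x) + b_n[n]*math.sin(n*x))
                          for n in range(1, len(a_n)))

u0 = lambda x: math.cos(x) + 0.5*math.sin(3*x)
A, B = coeffs(u0)
x, t = 1.2, 0.4
print(u(x, t, A, B))                                         # ≈ exact value below
print(math.exp(-0.4)*math.cos(x) + 0.5*math.exp(-3.6)*math.sin(3*x))
```

Each mode decays at its own rate e^{−a²n²t}, so high frequencies disappear quickly: the loop's temperature smooths out.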
Lemma 17.15 Consider the BIVP (17.20).
(a) Existence. Suppose that u0 ∈ C⁴(R) is periodic. Then the function u(x,t) given by (17.21) is in C^{2,1}_{x,t}([0,2π]×R+) and solves the classical BIVP (17.20).
(b) Uniqueness and Stability. In the class of functions C^{2,1}_{x,t}([0,2π]×R+) the solution of the above BIVP is unique.
Proof. (a) The Fourier coefficients of u0⁽⁴⁾ are bounded, such that the Fourier coefficients of u0 have growth 1/n⁴ (integrate the Fourier series of u0⁽⁴⁾ four times). Then the series for uxx(x,t) and ut(x,t) are both dominated by the series Σ_{n=1}^∞ 1/n²; hence they converge uniformly. This shows that the series u(x,t) can be differentiated term by term twice w.r.t. x and once w.r.t. t.
(b) For any fixed t ≥ 0, u(x,t) is continuous in x. Consider v(t) := ‖u(·,t)‖²_{L²(0,2π)}. Then

v′(t) = d/dt ∫₀^{2π} u(x,t)² dx = 2 ∫₀^{2π} u(x,t)ut(x,t) dx = 2 ∫₀^{2π} u(x,t) a²uxx(x,t) dx
  = 2a² uxu|₀^{2π} − 2a² ∫₀^{2π} (ux(x,t))² dx = −2a² ∫₀^{2π} ux² dx ≤ 0,

since the boundary term vanishes by (PBC).
This shows that v(t) is monotonically decreasing in t.
Suppose now that u1 and u2 both solve the BIVP in the given class. Then u = u1 − u2 solves the BIVP in this class with homogeneous initial values, that is, u(x,0) = 0; hence v(0) = ‖u(·,0)‖²_{L²} = 0. Since v(t) is decreasing for t ≥ 0 and non-negative, v(t) = 0 for all t ≥ 0; hence u(x,t) = 0 in L²(0,2π) for all t ≥ 0. Since u(x,t) is continuous in x, this implies u(x,t) = 0 for all x and t. Thus, u1(x,t) = u2(x,t): the solution is unique.
Stability. Since v(t) is decreasing,

sup_{t∈R+} ‖u(·,t)‖_{L²(0,2π)} ≤ ‖u0‖_{L²(0,2π)}.

This shows that small changes in the initial condition u0 imply small changes in the solution u(x,t). The problem is well-posed.
(b) The Inhomogeneous Heat Equation, Periodic Boundary Conditions
We study the IBVP

ut − a²uxx = f(x,t),   u(x,0) = 0,   (PBC).   (17.23)

Solution. Let en(x) = e^{inx}/√(2π), n ∈ Z, be the CNOS in L²(0,2π). These functions are all eigenfunctions of the differential operator d²/dx²: en″(x) = −n²en(x). Let t > 0 be fixed and let

f(x,t) ∼ Σ_{n∈Z} cn(t) en(x)

be the Fourier series of f(x,t) with coefficients cn(t). For u, we try the following ansatz:

u(x,t) ∼ Σ_{n∈Z} dn(t) en(x).   (17.24)
If f(x,t) is continuous in x and piecewise continuously differentiable with respect to x, its Fourier series converges pointwise and we have

ut − a²uxx = Σ_{n∈Z} (dn′(t) + a²n²dn(t)) en = f(x,t) = Σ_{n∈Z} cn(t) en.

For each n this is an ODE in t:

dn′(t) + a²n²dn(t) = cn(t),   dn(0) = 0.

From ODE theory the solution is well known:

dn(t) = e^{−a²n²t} ∫₀^t e^{a²n²s} cn(s) ds.
Under certain regularity and growth conditions on f , (17.24) solves the inhomogeneous IBVP.
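The coefficient formula can be checked against the ODE it solves. With the illustrative choice cn(s) ≡ 1 and λ = a²n², the exact solution of d′ + λd = 1, d(0) = 0, is (1 − e^{−λt})/λ:

```python
import math

def d_n(t, n, c, a=1.0, N=20_000):
    # d_n(t) = e^{-a²n²t} ∫_0^t e^{a²n²s} c(s) ds, midpoint rule
    lam = a*a*n*n
    h = t/N
    integral = sum(math.exp(lam*(i + 0.5)*h)*c((i + 0.5)*h) for i in range(N))*h
    return math.exp(-lam*t)*integral

n, t = 3, 0.5
lam = n*n
print(d_n(t, n, lambda s: 1.0))          # ≈ (1 - e^{-λt})/λ
print((1 - math.exp(-lam*t))/lam)
```

The prefactor e^{−a²n²t} again damps high frequencies, so the driven solution stays smooth in x for t > 0.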
(c) The Homogeneous Wave Equation with Dirichlet Conditions
Consider the initial boundary value problem for the vibrating string of length π:

(E)  utt − a²uxx = 0,   0 < x < π, t > 0;
(BC)  u(0,t) = u(π,t) = 0;
(IC)  u(x,0) = ϕ(x),   ut(x,0) = ψ(x),   0 < x < π.

The ansatz u(x,t) = f(x)g(t) yields

f″/f = κ = g″/(a²g),   that is,   f″(x) = κf(x),   g″ = κa²g.

The boundary conditions imply f(0) = f(π) = 0. Hence, the first ODE has the only solutions

fn(x) = cn sin(nx),   κn = −n²,   n ∈ N.

The corresponding ODEs for g then read

gn″ + n²a²gn = 0,

which has the general solution an cos(nat) + bn sin(nat). Hence,

u(x,t) = Σ_{n=1}^∞ (an cos(nat) + bn sin(nat)) sin(nx)
solves the boundary value problem in the sense of D′(R²) (choose any an, bn of polynomial growth). Now insert the initial conditions at t = 0:

u(x,0) = Σ_{n=1}^∞ an sin(nx) = ϕ(x),   ut(x,0) = Σ_{n=1}^∞ n a bn sin(nx) = ψ(x).

Since {√(2/π) sin(nx) | n ∈ N} is a CNOS in L²(0,π), we can determine the Fourier coefficients of ϕ and ψ with respect to this CNOS and obtain an and bn, respectively.
Regularity. Suppose that ϕ ∈ C⁴([0,π]) and ψ ∈ C³([0,π]). Then the Fourier sine coefficients an and n a bn of ϕ and ψ have growth 1/n⁴ and 1/n³, respectively. Hence, the series

u(x,t) = Σ_{n=1}^∞ (an cos(nat) + bn sin(nat)) sin(nx)   (17.25)

can be differentiated twice with respect to x or t, since the differentiated series have a summable upper bound c/n². Hence, (17.25) solves the IBVP.
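A minimal numerical sketch (the data ϕ(x) = sin x + 0.3 sin 2x, ψ = 0, and a = 2 are illustrative choices): only a1 = 1 and a2 = 0.3 are nonzero, the series reduces to two terms, and the wave equation holds up to finite-difference error:

```python
import math

A = {1: 1.0, 2: 0.3}    # a_n from phi(x) = sin x + 0.3 sin 2x; psi = 0 gives b_n = 0

def u(x, t, a=2.0):
    # the (here finite) series (17.25)
    return sum(A[n]*math.cos(n*a*t)*math.sin(n*x) for n in A)

a, h = 2.0, 1e-3
x, t = 0.8, 1.7
utt = (u(x, t + h) - 2*u(x, t) + u(x, t - h))/h**2
uxx = (u(x + h, t) - 2*u(x, t) + u(x - h, t))/h**2
print(abs(utt - a*a*uxx) < 1e-3, u(0.0, t) == 0.0)   # True True
```

The Dirichlet conditions hold exactly since every mode vanishes at x = 0 and x = π.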
(d) The Wave Equation with Inhomogeneous Boundary Conditions
Consider the following problem in Ω ⊂ Rⁿ:

utt − a²∆u = 0,   u(x,0) = ut(x,0) = 0,   u|∂Ω = w(x,t).

Idea. Find an extension v(x,t) of w(x,t), v ∈ C²(Ω×R+), and look for the function ũ = u − v. Then ũ has homogeneous boundary conditions and satisfies the IBVP

ũtt − a²∆ũ = −vtt + a²∆v,   ũ(x,0) = −v(x,0),   ũt(x,0) = −vt(x,0),   ũ|∂Ω = 0.

This problem can be split into two problems, one with zero initial conditions and one with a homogeneous wave equation.
17.3.2 Eigenvalue Problems for the Laplace Equation
In the previous subsection we have seen that Fourier's method for BIVPs often leads to boundary eigenvalue problems (BEVP) for the Laplace equation.
We formulate the problems. Let n = 1 and Ω = (0,l). One considers the following types of BEVPs for the Laplace equation, f″ = λf:
• Dirichlet boundary conditions: f(0) = f(l) = 0.
• Neumann boundary conditions: f′(0) = f′(l) = 0.
• Periodic boundary conditions: f(0) = f(l), f′(0) = f′(l).
• Mixed boundary conditions: α1f(0) + α2f′(0) = 0, β1f(l) + β2f′(l) = 0.
• Symmetric boundary conditions: if u and v are functions satisfying these boundary conditions, then (u′v − uv′)|₀^l = 0. In this case integration by parts gives

∫₀^l (u″v − uv″) dx = (u′v − v′u)|₀^l − ∫₀^l (u′v′ − v′u′) dx = 0.

That is, ⟨u″, v⟩ = ⟨u, v″⟩ and the Laplace operator becomes symmetric.
Proposition 17.16 Let Ω ⊂ Rⁿ. The BEVP with Dirichlet conditions

∆u = λu,   u|∂Ω = 0,   u ∈ C²(Ω) ∩ C¹(Ω̄),   (17.26)

has countably many eigenvalues λk. All eigenvalues are negative and of finite multiplicity. If 0 > λ1 > λ2 > ⋯, then the sequence (1/λk) tends to 0. The eigenfunctions uk corresponding to λk form a CNOS in L²(Ω).
Sketch of proof. (a) Let H = L²(Ω). We use Green's first formula with v = u and u|∂Ω = 0,

∫_Ω u∆u dx + ∫_Ω (∇u)² dx = 0,

to show that all eigenvalues of ∆ are negative. Let ∆u = λu, u ≠ 0. First note that λ = 0 is not an eigenvalue of ∆. Suppose to the contrary ∆u = 0; then u is harmonic. Since u|∂Ω = 0, by the uniqueness theorem for the Dirichlet problem, u = 0 in Ω. Then

λ‖u‖² = λ⟨u,u⟩ = ⟨λu,u⟩ = ⟨∆u,u⟩ = ∫_Ω u∆u dx = −∫_Ω (∇u)² dx < 0.

Hence, λ is negative.
(b) Assume that a Green's function G for Ω exists. By (17.34), that is,

u(y) = ∫_Ω G(x, y) ∆u(x) dx + ∫_∂Ω u(x) ∂G(x, y)/∂n_x dS(x),

u|_∂Ω = 0 implies

u(y) = ∫_Ω G(x, y) ∆u(x) dx.

This shows that the integral operator A : L²(Ω) → L²(Ω) defined by

(Av)(y) := ∫_Ω G(x, y) v(x) dx

is inverse to the Laplacian. Since G(x, y) = G(y, x) is real, A is self-adjoint. By (a), its eigenvalues 1/λ_k are all negative. If

∫∫_{Ω×Ω} |G(x, y)|² dx dy < ∞,
A is a compact operator.
We want to justify the last statement. Let (Kf)(x) = ∫_Ω k(x, y) f(y) dy be an integral operator on H = L²(Ω) with kernel k(x, y) ∈ H̃ = L²(Ω × Ω). Let {u_n | n ∈ N} be a CNOS in H; then {u_n(x) u_m(y) | n, m ∈ N} is a CNOS in H̃. Let k_nm be the Fourier coefficients of k with respect to the basis {u_n(x) u_m(y)} in H̃. Then

(Kf)(x) = ∫_Ω f(y) Σ_{n,m} k_nm u_n(x) u_m(y) dy
        = Σ_n u_n(x) Σ_m k_nm ∫_Ω u_m(y) f(y) dy
        = Σ_n u_n(x) Σ_m k_nm ⟨f, u_m⟩ = Σ_{n,m} ( k_nm ⟨f, u_m⟩ ) u_n.
This in particular shows that

‖Kf‖² = Σ_n ( Σ_m k_nm ⟨f, u_m⟩ )² ≤ Σ_{m,n} k_nm² · Σ_m ⟨f, u_m⟩² ≤ ‖f‖² Σ_{m,n} k_nm²,

such that

‖K‖² ≤ Σ_{m,n} k_nm² = ∫_{Ω×Ω} k(x, y)² dx dy = Σ_n ‖Ku_n‖².
We show that K is approximated by the sequence (K_n) of finite-rank operators defined by

K_n f = Σ_m Σ_{r=1}^n k_rm ⟨f, u_m⟩ u_r.

Indeed,

‖(K − K_n)f‖² = Σ_m Σ_{r=n+1}^∞ k_rm² |⟨f, u_m⟩|² ≤ sup_m ( Σ_{r=n+1}^∞ k_rm² ) ‖f‖²,

such that

‖K − K_n‖² = sup_m Σ_{r=n+1}^∞ k_rm² −→ 0

as n → ∞. Hence, K is compact.
(c) By (a) and (b), A is a negative, compact, self-adjoint operator. By the spectral theorem for compact self-adjoint operators, Theorem 13.33, there exists an NOS (u_k) of eigenfunctions of A corresponding to the eigenvalues 1/λ_k. The NOS (u_k) is complete since 0 is not an eigenvalue of A.
Example 17.1 Dirichlet Conditions on the Square. Let Q = (0, π) × (0, π) ⊂ R². The Laplace operator with Dirichlet boundary conditions on Q has eigenfunctions

u_mn(x, y) = (2/π) sin(mx) sin(ny),

corresponding to the eigenvalues λ_mn = −(m² + n²). The eigenfunctions {u_mn | m, n ∈ N} form a CNOS in the Hilbert space L²(Q).
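As a sanity check (not part of the original notes; the helper names are ad hoc), one can verify numerically that u_mn satisfies ∆u = λ_mn u and is L²-normalized on Q:

```python
import math

def u(m, n, x, y):
    # Eigenfunction u_mn(x, y) = (2/pi) sin(mx) sin(ny) on Q = (0, pi)^2
    return (2.0 / math.pi) * math.sin(m * x) * math.sin(n * y)

def laplacian(f, x, y, h=1e-4):
    # Second-order central finite-difference Laplacian
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h)
            - 4.0 * f(x, y)) / h**2

m, n = 3, 2
lam = -(m**2 + n**2)          # eigenvalue lambda_mn = -(m^2 + n^2)
x0, y0 = 1.0, 2.0             # an interior point of Q

f = lambda x, y: u(m, n, x, y)
assert abs(laplacian(f, x0, y0) - lam * f(x0, y0)) < 1e-4

# L2 norm on Q by a midpoint rule: <u_mn, u_mn> = 1
N = 400
hq = math.pi / N
pts = [(i + 0.5) * hq for i in range(N)]
dot = sum(u(m, n, x, y) ** 2 for x in pts for y in pts) * hq * hq
assert abs(dot - 1.0) < 1e-3
```

The finite-difference check confirms the eigenvalue relation pointwise, while the quadrature confirms the normalization factor 2/π.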
Example 17.2 Dirichlet Conditions on the Ball U₁(0) in R². We consider the BEVP with Dirichlet boundary conditions on the ball,

−∆u = λu,  u|_{S₁(0)} = 0.

In polar coordinates u(x, y) = ũ(r, ϕ) this reads

(1/r) ∂/∂r (r ũ_r) + (1/r²) ũ_ϕϕ = −λũ,  0 < r < 1, 0 ≤ ϕ < 2π.
Separation of variables. We try the ansatz ũ(r, ϕ) = R(r)Φ(ϕ). We have the boundary condition R(1) = 0, and R(r) is bounded in a neighborhood of r = 0. Also, Φ is 2π-periodic. Then ∂ũ/∂r = R′Φ and

(1/r) ∂/∂r (r ũ_r) = (1/r) ∂/∂r (r R′ Φ) = (R′/r + R′′) Φ,  ũ_ϕϕ = RΦ′′.

Hence, −∆u = λu now reads

(R′/r + R′′) Φ + (R/r²) Φ′′ = −λRΦ,

(R′/r + R′′)/R + (1/r²) Φ′′/Φ = −λ,

(rR′ + r²R′′)/R + λr² = −Φ′′/Φ = µ.
In this way, we obtain the two one-dimensional problems

Φ′′ + µΦ = 0,  Φ(0) = Φ(2π);
r²R′′ + rR′ + (λr² − µ)R = 0,  |R(0)| < ∞,  R(1) = 0.  (17.27)

The eigenvalues and eigenfunctions of the first problem are

µ_k = k²,  Φ_k(ϕ) = e^{ikϕ},  k ∈ Z.
Equation (17.27) is the Bessel ODE. For µ = k² the solution of (17.27) bounded at r = 0 is given by the Bessel function J_k(r√λ). Recall from homework 21.2 that

J_k(x) = Σ_{n=0}^∞ (−1)ⁿ (x/2)^{2n+k} / (n! (n+k)!),  k ∈ N₀.

To determine the eigenvalues λ we use the boundary condition R(1) = 0 in (17.27), namely J_k(√λ) = 0. Hence, √λ = µ_kj, where µ_kj, j = 1, 2, ..., denote the positive zeros of J_k. We obtain

λ_kj = µ_kj²,  R_kj(r) = J_k(µ_kj r),  j = 1, 2, ....
The solution of the BEVP is

λ_kj = µ_{|k|j}²,  u_kj(x) = J_{|k|}(µ_{|k|j} r) e^{ikϕ},  k ∈ Z, j = 1, 2, ....

Note that the Bessel functions {J_k | k ∈ Z₊} and the system {e^{ikt} | k ∈ Z} form complete OSs in L²((0,1), r dr) and in L²(0, 2π), respectively. Hence, the OS {u_kj | k ∈ Z, j ∈ Z₊} is a complete OS in L²(U₁(0)). Thus, there are no further solutions to the given BEVP. For more details on Bessel functions, see [FK98, p. 383].
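The eigenvalues λ_kj = µ_kj² can be computed from the series for J_k quoted above. The sketch below (pure Python; the names are ad hoc, and with scipy available one would rather call scipy.special.jn_zeros) truncates the series and locates the first positive zero of J₀ by a scan plus bisection:

```python
import math

def J(k, x, terms=40):
    # Truncated series J_k(x) = sum_{n>=0} (-1)^n (x/2)^(2n+k) / (n! (n+k)!)
    s = 0.0
    for n in range(terms):
        s += (-1)**n * (x / 2.0)**(2*n + k) / (math.factorial(n) * math.factorial(n + k))
    return s

def first_zero(k, a=0.1, b=10.0, steps=200):
    # Scan [a, b] for a sign change of J_k, then bisect to machine precision
    N = 1000
    xs = [a + (b - a) * i / N for i in range(N + 1)]
    for x0, x1 in zip(xs, xs[1:]):
        if J(k, x0) * J(k, x1) < 0:
            lo, hi = x0, x1
            break
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if J(k, lo) * J(k, mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

mu01 = first_zero(0)     # first positive zero of J_0, approx 2.404826
lam01 = mu01**2          # smallest Dirichlet eigenvalue lambda_01 on the unit disk
assert abs(mu01 - 2.404826) < 1e-4
```

The corresponding radial eigenfunction is R_01(r) = J_0(µ_01 r), which by construction vanishes at r = 1.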
17.4 Boundary Value Problems for the Laplace and the Poisson Equations
Throughout this section (if nothing is stated otherwise) we will assume that Ω is a bounded
region in Rn , n ≥ 2. We suppose further that Ω belongs to the class C2 , that is, the boundary
∂Ω consists of finitely many twice continuously differentiable hypersurfaces; Ω ′ := Rn \ Ω is
assumed to be connected (i. e. it is a region, too). All functions are assumed to be real valued.
17.4.1 Formulation of Boundary Value Problems
(a) The Inner Dirichlet Problem:
Given ϕ ∈ C(∂Ω) and f ∈ C(Ω), find u ∈ C(Ω) ∩ C²(Ω) such that

∆u(x) = f(x) ∀ x ∈ Ω,  and  u(y) = ϕ(y) ∀ y ∈ ∂Ω.
(b) The Exterior Dirichlet Problem:
Given ϕ ∈ C(∂Ω) and f ∈ C(Ω′), find u ∈ C(Ω′) ∩ C²(Ω′) such that

∆u(x) = f(x) ∀ x ∈ Ω′,  u(y) = ϕ(y) ∀ y ∈ ∂Ω,  and  lim_{‖x‖→∞} u(x) = 0.
(c) The Inner Neumann Problem:
Given ϕ ∈ C(∂Ω) and f ∈ C(Ω), find u ∈ C¹(Ω) ∩ C²(Ω) such that

∆u(x) = f(x) ∀ x ∈ Ω,  and  ∂u/∂n₋(y) = ϕ(y) ∀ y ∈ ∂Ω.

Here ∂u/∂n₋(y) denotes the limit of the directional derivative

∂u/∂n₋(y) = lim_{t→0+} n(y) · grad u(y − t n(y)),

where n(y) is the outer normal to Ω at y ∈ ∂Ω. That is, x = y − t n(y) ∈ Ω approaches y ∈ ∂Ω in the direction of the normal vector n(y). We assume that this limit exists for all boundary points y ∈ ∂Ω.
(d) The Exterior Neumann Problem:
Given ϕ ∈ C(∂Ω) and f ∈ C(Ω′), find u ∈ C¹(Ω′) ∩ C²(Ω′) such that

∆u(x) = f(x) ∀ x ∈ Ω′,  ∂u/∂n₊(y) = ϕ(y) ∀ y ∈ ∂Ω,  and  lim_{‖x‖→∞} u(x) = 0.

Here ∂u/∂n₊(y) denotes the limit of the directional derivative

∂u/∂n₊(y) = lim_{t→0+} n(y) · grad u(y + t n(y)),

where n(y) is the outer normal to Ω at y ∈ ∂Ω; that is, x = y + t n(y) ∈ Ω′ approaches y from outside. We assume that this limit exists for all boundary points y ∈ ∂Ω. In both Neumann problems one can also look for a function u ∈ C²(Ω) ∩ C(Ω̄) or u ∈ C²(Ω′) ∩ C(Ω̄′), respectively, provided the above limits exist and define continuous functions on the boundary.
These four problems are intimately connected with each other, and we will obtain solutions to
all of them simultaneously.
17.4.2 Basic Properties of Harmonic Functions
Recall that a function u ∈ C²(Ω) is said to be harmonic if ∆u = 0 in Ω.
We say that an operator L on a function space V over Rⁿ is invariant under an affine transformation T, T(x) = Ax + b, where A ∈ L(Rⁿ), b ∈ Rⁿ, if

L ∘ T* = T* ∘ L,

where T* : V → V is given by (T*f)(x) = f(T(x)), f ∈ V. It follows that the Laplacian is invariant under translations (T(x) = x + b) and rotations T (i.e. TᵀT = TTᵀ = I). Indeed, if Ã = BABᵀ denotes the transformed coefficient matrix, then for translations B is the identity, and in case of a rotation, B = T⁻¹. Since the Laplacian has A = I, in both cases Ã = A = I; hence the Laplacian is invariant.
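The invariance can be illustrated numerically: for a rotation T and a smooth f, ∆(f ∘ T) and (∆f) ∘ T agree. A minimal finite-difference sketch (the test function, angle, and step size are arbitrary choices of this note):

```python
import math

def laplacian(f, x, y, h=1e-4):
    # Central finite-difference Laplacian in R^2
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h)
            - 4.0 * f(x, y)) / h**2

def f(x, y):
    return x**3 * y + math.sin(x - 2 * y)   # an arbitrary smooth test function

t = 0.7                                     # rotation angle
T = lambda x, y: (math.cos(t) * x - math.sin(t) * y,
                  math.sin(t) * x + math.cos(t) * y)

fT = lambda x, y: f(*T(x, y))               # pullback T*f = f o T

x0, y0 = 0.3, -0.8
lhs = laplacian(fT, x0, y0)                 # Delta(f o T) at (x0, y0)
rhs = laplacian(f, *T(x0, y0))              # (Delta f) o T at (x0, y0)
assert abs(lhs - rhs) < 1e-3
```

The same check with a non-orthogonal matrix T would fail, since then TᵀT ≠ I.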
In this section we assume that Ω ⊂ Rⁿ is a region where Gauß' divergence theorem is valid for all vector fields f ∈ C¹(Ω) ∩ C(Ω̄):

∫_Ω div f(x) dx = ∫_∂Ω f(y) · dS⃗(y),

where the dot · denotes the inner product in Rⁿ. The term under the integral can be written as

ω(y) = f(y) · dS⃗ = (f₁(y), ..., f_n(y)) · ( dy₂ ∧ dy₃ ∧ ··· ∧ dy_n, −dy₁ ∧ dy₃ ∧ ··· ∧ dy_n, ..., (−1)^{n−1} dy₁ ∧ ··· ∧ dy_{n−1} )
     = Σ_{k=1}^n (−1)^{k−1} f_k(y) dy₁ ∧ ··· ∧ d̂y_k ∧ ··· ∧ dy_n,

where the hat means omission of this factor. In this way ω(y) becomes a differential (n−1)-form. Using differentiation of forms, see Definition 11.7, we obtain

dω = div f(y) dy₁ ∧ dy₂ ∧ ··· ∧ dy_n.

This establishes the above generalized form of Gauß' divergence theorem. For a continuous scalar function U : ∂Ω → R one defines U(y) dS⃗(y) := U(y) n(y) dS(y), where n is the outer unit normal vector to the surface ∂Ω.
Recall that we obtain Green's first formula by inserting f(x) = v(x)∇u(x), u, v ∈ C²(Ω):

∫_Ω v(x) ∆u(x) dx + ∫_Ω ∇u(x) · ∇v(x) dx = ∫_∂Ω v(y) ∂u/∂n(y) dS(y).

Interchanging the roles of u and v and taking the difference, we obtain Green's second formula

∫_Ω ( v(x) ∆u(x) − u(x) ∆v(x) ) dx = ∫_∂Ω ( v(y) ∂u/∂n(y) − u(y) ∂v/∂n(y) ) dS(y).  (17.28)
Recall that

E₂(x) = (1/2π) log ‖x‖,  n = 2,
E_n(x) = −(1/((n−2)ω_n)) ‖x‖^{−n+2},  n ≥ 3,

are the fundamental solutions of the Laplacian in Rⁿ.
Theorem 17.17 (Green's representation formula) Let u ∈ C²(Ω̄). Then for x ∈ Ω we have

u(x) = ∫_Ω E_n(x−y) ∆u(y) dy + ∫_∂Ω ( u(y) ∂E_n/∂n_y (x−y) − E_n(x−y) ∂u/∂n(y) ) dS(y).  (17.29)

Here ∂/∂n_y denotes the derivative in the direction of the outer normal with respect to the variable y.
Note that the distributions {∆u}, (∂u/∂n) δ_∂Ω, and ∂/∂n (u δ_∂Ω) have compact support such that the convolution products with E_n exist.
Proof (idea). For sufficiently small ε > 0, U_ε(x) ⊂ Ω, since Ω is open. We apply Green's second formula with v(y) = E_n(x−y) and Ω \ U_ε(x) in place of Ω. Since E_n(x−y) is harmonic with respect to the variable y in Ω \ {x} (recall from Example 7.5 that E_n(x) is harmonic in Rⁿ \ {0}), we obtain

∫_{Ω \ U_ε(x)} E_n(x−y) ∆u(y) dy = ∫_∂Ω ( E_n(x−y) ∂u/∂n(y) − u(y) ∂E_n(x−y)/∂n_y ) dS(y)
  + ∫_{∂U_ε(x)} ( E_n(x−y) ∂u/∂n(y) − u(y) ∂E_n(x−y)/∂n_y ) dS(y).  (17.30)
In the second integral n denotes the outer normal to Ω \ U_ε(x), hence the inner normal of U_ε(x). We wish to evaluate the limits of the individual integrals in this formula as ε → 0. Consider the left-hand side of (17.30). Since u ∈ C²(Ω̄), ∆u is bounded; since E_n(x−y) is locally integrable, the lhs converges to

∫_Ω E_n(x−y) ∆u(y) dy.
On ∂U_ε(x) we have E_n(x−y) = β_n ε^{−n+2}, β_n = −1/(ω_n(n−2)). Thus, as ε → 0,

| ∫_{∂U_ε(x)} E_n(x−y) ∂u/∂n(y) dS | ≤ (|β_n|/ε^{n−2}) ∫_{∂U_ε(x)} |∂u/∂n(y)| dS
  ≤ (1/((n−2) ω_n ε^{n−2})) ω_n ε^{n−1} sup_{U_ε(x)} |∂u/∂n| = Cε −→ 0.
Furthermore, since n is the interior normal of the ball U_ε(x), the same calculations as in the proof of Theorem 17.1 show that ∂E_n(x−y)/∂n_y = −β_n (d/dε)(ε^{−n+2}) = −ε^{−n+1}/ω_n. We obtain

− ∫_{∂U_ε(x)} u(y) ∂E_n(x−y)/∂n_y dS(y) = (1/(ω_n ε^{n−1})) ∫_{S_ε(x)} u(y) dS(y) −→ u(x).

In the last line we used that the integral on the right is the spherical mean of u over the sphere S_ε(x), and u is continuous at x.
Remarks 17.4 (a) Green's representation formula is also true for functions u ∈ C²(Ω) ∩ C¹(Ω̄). To prove this, consider Green's representation theorem on smaller regions Ω_ε ⊂ Ω such that Ω̄_ε ⊂ Ω.
(b) Applying Green's representation formula to a test function ϕ ∈ D(Ω), see Definition 16.1, with ϕ(y) = ∂ϕ/∂n(y) = 0 for y ∈ ∂Ω, we obtain

ϕ(x) = ∫_Ω E_n(x−y) ∆ϕ(y) dy.

(c) We may now draw the following consequence from Green's representation formula: if one knows ∆u, then u is completely determined by its values and those of its normal derivative on ∂Ω. In particular, a harmonic function on Ω can be reconstructed from its boundary data. One may ask conversely whether one can construct a harmonic function for arbitrary given values of u and ∂u/∂n on ∂Ω. Ignoring regularity conditions, we will find out that this is not possible in general. Roughly speaking, only one of these data is sufficient to describe u completely.
(d) In the case of a harmonic function u ∈ C²(Ω) ∩ C¹(Ω̄), ∆u = 0, Green's representation formula reads (n = 3):

u(x) = (1/4π) ∫_∂Ω ( (1/‖x−y‖) ∂u(y)/∂n − u(y) ∂/∂n_y (1/‖x−y‖) ) dS(y).  (17.31)

In particular, the surface potentials V⁽⁰⁾(x) and V⁽¹⁾(x) can be differentiated arbitrarily often for x ∈ Ω. Outside ∂Ω, V⁽⁰⁾ and V⁽¹⁾ are harmonic. It follows from (17.31) that any harmonic function is a C∞ function.
Spherical Means and Ball Means
First of all note that u ∈ C¹(Ω̄) and u harmonic in Ω implies

∫_∂Ω ∂u(y)/∂n dS = 0.  (17.32)

Indeed, this follows from Green's first formula inserting v = 1 and ∆u = 0.
Proposition 17.18 (Mean Value Property) Suppose that u is harmonic in U_R(x₀) and continuous in Ū_R(x₀).
(a) Then u(x₀) coincides with its spherical mean over the sphere S_R(x₀):

u(x₀) = (1/(ω_n R^{n−1})) ∫_{S_R(x₀)} u(y) dS(y)  (spherical mean).  (17.33)

(b) Further,

u(x₀) = (n/(ω_n Rⁿ)) ∫_{U_R(x₀)} u(x) dx  (ball mean).
Proof. (a) For simplicity, we consider only the case n = 3 and x₀ = 0. Apply Green's representation formula (17.31) to the ball Ω = U_ρ(0) with ρ < R. Noting (17.32), from (17.31) it follows that

u(0) = (1/4π) ( (1/ρ) ∫_{S_ρ(0)} ∂u(y)/∂n dS − ∫_{S_ρ(0)} u(y) ∂/∂n_y (1/‖y‖) dS )
     = −(1/4π) ∫_{S_ρ(0)} u(y) ∂/∂n_y (1/‖y‖) dS = (1/4π) ∫_{S_ρ(0)} u(y) (1/ρ²) dS
     = (1/(4πρ²)) ∫_{S_ρ(0)} u(y) dS.

Since u is continuous on the closed ball of radius R, the formula remains valid as ρ → R.
(b) Use dx = dx₁ ··· dx_n = dr dS_r, where ‖x‖ = r. Multiply both sides of (17.33), with R replaced by r, by r^{n−1} and integrate with respect to r from 0 to R:

∫_0^R r^{n−1} u(x₀) dr = ∫_0^R r^{n−1} (1/(ω_n r^{n−1})) ∫_{S_r(x₀)} u(y) dS dr,
(1/n) Rⁿ u(x₀) = (1/ω_n) ∫_{U_R(x₀)} u(x) dx.

The assertion follows. Note that Rⁿ ω_n / n is exactly the n-dimensional volume of U_R(x₀). The proof in case n = 2 is similar.
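For a concrete harmonic function the mean value property is easy to test: with u(x, y) = x² − y², which is harmonic in R², the average of u over any circle equals the value at the center. A small numerical sketch (names ad hoc):

```python
import math

def u(x, y):
    # u(x, y) = x^2 - y^2 is harmonic: u_xx + u_yy = 2 - 2 = 0
    return x * x - y * y

def spherical_mean(f, x0, y0, R, N=10000):
    # Average of f over N equally spaced points of the circle S_R(x0, y0),
    # i.e. (1/(2*pi*R)) * integral of f over the circle
    s = 0.0
    for i in range(N):
        t = 2.0 * math.pi * i / N
        s += f(x0 + R * math.cos(t), y0 + R * math.sin(t))
    return s / N

x0, y0, R = 1.5, -0.3, 2.0
assert abs(spherical_mean(u, x0, y0, R) - u(x0, y0)) < 1e-8
```

The equally spaced sum is exact here up to rounding, since the integrand is a trigonometric polynomial in the angle.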
Proposition 17.19 (Minimum–Maximum Principle) Let u be harmonic in Ω and continuous in Ω̄. Then

max_{x∈Ω̄} u(x) = max_{x∈∂Ω} u(x);

i.e. u attains its maximum on the boundary ∂Ω. The same is true for the minimum.
Proof. Suppose to the contrary that M = u(x₀) = max_{x∈Ω̄} u(x) is attained at an inner point x₀ ∈ Ω and M > m = max_{x∈∂Ω} u(x) = u(y₀), y₀ ∈ ∂Ω.
(a) We show that u(x) = M is constant in any ball U_ε(x₀) ⊂ Ω around x₀.
Suppose to the contrary that u(x₁) < M for some x₁ ∈ U_ε(x₀). By continuity of u, u(x) < M for all x ∈ U_ε(x₀) ∩ U_η(x₁). In particular

M = u(x₀) = (n/(ω_n εⁿ)) ∫_{U_ε(x₀)} u(x) dx < (n/(ω_n εⁿ)) ∫_{U_ε(x₀)} M dy = M;

this is a contradiction; u is constant in U_ε(x₀).
(b) u(x) = M is constant in Ω. Let x₁ ∈ Ω; we will show that u(x₁) = M. Since Ω is connected and bounded, there exists a path from x₀ to x₁ which can be covered by a chain of finitely many balls in Ω. In all balls, starting with the ball around x₀ from (a), u(x) = M is constant. Hence, u is constant in Ω. Since u is continuous, u is constant in Ω̄. This contradicts the assumption; hence, the maximum is attained on the boundary ∂Ω.
Passing from u to −u, the statement about the minimum follows.
Remarks 17.5 (a) A stronger proposition holds with "local maximum" in place of "maximum".
(b) Another, stricter version of the maximum principle is: Let u ∈ C²(Ω) ∩ C(Ω̄) and ∆u ≥ 0 in Ω. Then either u is constant or

u(y) < max_{x∈∂Ω} u(x)

for all y ∈ Ω.
Corollary 17.20 (Uniqueness) The inner and the exterior Dirichlet problems each have at most one solution.
Proof. Suppose that u₁ and u₂ are both solutions of the Dirichlet problem, ∆u₁ = ∆u₂ = f. Put u = u₁ − u₂. Then ∆u(x) = 0 for all x ∈ Ω and u(y) = 0 on the boundary y ∈ ∂Ω.
(a) Inner problem. By the maximum principle, u(x) = 0 for all x ∈ Ω; that is, u₁ = u₂.
(b) Exterior problem. Suppose that u ≢ 0. Without loss of generality we may assume that u(x₁) = α > 0 for some x₁ ∈ Ω′. By assumption, |u(x)| → 0 as ‖x‖ → ∞. Hence, there exists r > 0 with x₁ ∈ U_r(0) such that |u(x)| < α/2 for all ‖x‖ ≥ r. Since u is harmonic in U_r(0) \ Ω̄, the maximum principle yields

α = u(x₁) ≤ max_{x ∈ S_r(0) ∪ ∂Ω} u(x) ≤ α/2,

a contradiction.
Corollary 17.21 (Stability) Suppose that u1 and u2 are solutions of the inner Dirichlet problem ∆u1 = ∆u2 = f with boundary values ϕ1 (y) and ϕ2 (y) on ∂Ω, respectively. Suppose
further that
| ϕ1 (y) − ϕ2 (y) | ≤ ε ∀ y ∈ ∂Ω.
Then | u1 (x) − u2 (x) | ≤ ε for all x ∈ Ω.
A similar statement is true for the exterior Dirichlet problem.
Proof. Put u = u1 − u2 . Then ∆u = 0 and | u(y) | ≤ ε for all y ∈ ∂Ω. By the Maximum
Principle, | u(x) | ≤ ε for all x ∈ Ω.
Lemma 17.22 Suppose that u is a non-constant harmonic function on Ω and that the maximum of u(x) is attained at y ∈ ∂Ω. Then ∂u/∂n(y) > 0.
For the proof see [Tri92, 3.4.2. Theorem, p. 174].
Proposition 17.23 (Uniqueness) (a) The exterior Neumann problem has at most one solution.
(b) A necessary condition for solvability of the inner Neumann problem is

∫_∂Ω ϕ dS = ∫_Ω f(x) dx.

Two solutions of the inner Neumann problem differ by a constant.
Proof. (a) Suppose that u₁ and u₂ are solutions of the exterior Neumann problem; then u = u₁ − u₂ satisfies ∆u = 0 and ∂u/∂n(y) = 0. The above lemma shows that u(x) = c is constant in Ω′. Since lim_{‖x‖→∞} u(x) = 0, the constant c is 0; hence u₁ = u₂.
(b) Inner problem. The statement on uniqueness up to constants follows as in (a). The necessity of the formula follows from (17.28) with v ≡ 1, ∆u = f, ∂v/∂n = 0.
Proposition 17.24 (Converse Mean Value Theorem) Suppose that u ∈ C(Ω) and that for every x₀ ∈ Ω with Ū_r(x₀) ⊂ Ω we have the mean value property

u(x₀) = (1/(ω_n r^{n−1})) ∫_{S_r(x₀)} u(y) dS(y) = (1/ω_n) ∫_{S₁(0)} u(x₀ + ry) dS(y).

Then u ∈ C∞(Ω) and u is harmonic in Ω.
Proof. (a) We show that u ∈ C∞(Ω). The mean value property ensures that the mollification h_ε ∗ u equals u as long as U_ε(x) ⊂ Ω; that is, the mollification does not change u. By homework 49.4, u ∈ C∞ since h_ε is. We prove h_ε ∗ u = u. Here g denotes the 1-dimensional bump function, see page 419.

u_ε(x) = (u ∗ h_ε)(x) = ∫_{Rⁿ} u(y) h_ε(x−y) dy = ∫_{Rⁿ} u(x−y) h_ε(y) dy
  = ∫_{U_ε(0)} u(x−y) h(y/ε) ε^{−n} dy = ∫_{U₁(0)} u(x−εz) h(z) dz
  = ∫_0^1 ∫_{S₁(0)} u(x−rεy) g(r) r^{n−1} dS(y) dr
  = ω_n u(x) ∫_0^1 g(r) r^{n−1} dr = u(x) ∫_{Rⁿ} h(y) dy = u(x).

Here we used the mean value property ∫_{S₁(0)} u(x−rεy) dS(y) = ω_n u(x) and ∫_{Rⁿ} h(y) dy = 1.
R
Second Part. Differentiating the above equation with respect to r yields Ur (x) ∆u(y) dy = 0
for any ball in Ω, since the left-hand side u(x) does not depend on r.
Z
Z
d
u(x + ry) dS(y) =
y · ∇u(x + ry) dS(y)
0=
dr S1 (0)
S1 (0)
Z
=
(r −1 z)∇u(x + z)r 1−n dS(z)
Sr (0)
Z
−n
=r
~n(z) · ∇u(x + z) dS(z)
Sr (0)
Z
∂u
−n
=r
(x + z) dS(z)
n
Sr (0) ∂~
Z
Z
∂u(y)
−n
−n
=r
dS(y) = r
∆u(x) dx.
n
Sr (x0 ) ∂~
Ur (x0 )
In the last line we used Green’s 2nd formula with v = 1. Thus ∆u = 0. Suppose to the contrary
that ∆u(x0 ) 6= 0, say ∆u(x0 ) > 0. By continuity of ∆u(x), ∆u(x) > 0 for x ∈ Uε (x0 ). Hence
R
∆u(x) dx > 0 which contradicts the above equation. We conclude that u is harmonic in
Uε (x0 )
Ur (x0 ).
Remark 17.6 A regular distribution u ∈ D′(Ω) is called harmonic if ∆u = 0, that is,

⟨∆u, ϕ⟩ = ∫_Ω u(x) ∆ϕ(x) dx = 0,  ϕ ∈ D(Ω).

Weyl's Lemma: any harmonic regular distribution is a harmonic function; in particular, u ∈ C∞(Ω).
Example 17.3 Solve

∆u = −2,  (x, y) ∈ Ω = (0, a) × (−b/2, b/2),  u|_∂Ω = 0.

Ansatz: u = w + v with ∆w = −2 and ∆v = 0. For example, choose w = −x² + ax. Then v is harmonic with boundary conditions

v(0, y) = v(a, y) = 0,  v(x, −b/2) = v(x, b/2) = x² − ax.

Use separation of variables, v(x, y) = X(x)Y(y), to solve the problem.
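A numerical sketch of the resulting solution (an assumption of this note, not worked out in the original): the sine coefficients of x² − ax on (0, a) are b_n = −8a²/(π³n³) for odd n and 0 for even n, which gives v as a cosh-weighted sine series; the code checks the boundary values and that ∆u ≈ −2.

```python
import math

a, b = 1.0, 1.0

def v(x, y, terms=60):
    # Harmonic part with v = x^2 - a x on y = +/- b/2 and v = 0 on x = 0, a;
    # b_n below are the (assumed) sine coefficients of x^2 - a x
    s = 0.0
    for n in range(1, terms, 2):                    # odd n only
        bn = -8.0 * a * a / (math.pi**3 * n**3)
        k = n * math.pi / a
        s += bn * math.sin(k * x) * math.cosh(k * y) / math.cosh(k * b / 2.0)
    return s

def u(x, y):
    # u = w + v with w = -x^2 + a x, Delta w = -2, Delta v = 0
    return -x * x + a * x + v(x, y)

# boundary values vanish (up to series truncation)
assert abs(u(0.3, b / 2.0)) < 1e-4
assert abs(u(0.0, 0.1)) < 1e-12

# Delta u = -2 at an interior point, by finite differences
h = 1e-3
x0, y0 = 0.4, 0.1
lap = (u(x0 + h, y0) + u(x0 - h, y0) + u(x0, y0 + h) + u(x0, y0 - h)
       - 4.0 * u(x0, y0)) / h**2
assert abs(lap + 2.0) < 1e-3
```

Truncating after 30 odd modes already meets the tolerances, since the coefficients decay like n⁻³.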
17.5 Appendix
17.5.1 Existence of Solutions to the Boundary Value Problems
(a) Green's Function
Let u ∈ C²(Ω) ∩ C¹(Ω̄). Let us combine Green's representation formula and Green's 2nd formula with a harmonic function v(x) = v_y(x), x ∈ Ω, where y ∈ Ω is thought of as a parameter:

u(y) = ∫_Ω E_n(x−y) ∆u(x) dx + ∫_∂Ω ( u(x) ∂E_n/∂n_x (x−y) − E_n(x−y) ∂u/∂n(x) ) dS(x),
0 = ∫_Ω v_y(x) ∆u(x) dx + ∫_∂Ω ( u(x) ∂v_y/∂n(x) − v_y(x) ∂u/∂n(x) ) dS(x).

Adding up these two lines and denoting G(x, y) = E_n(x−y) + v_y(x), we get

u(y) = ∫_Ω G(x, y) ∆u(x) dx + ∫_∂Ω ( u(x) ∂G(x, y)/∂n_x − G(x, y) ∂u/∂n(x) ) dS(x).

Suppose now that G(x, y) vanishes for all x ∈ ∂Ω; then the last surface integral is 0 and

u(y) = ∫_Ω G(x, y) ∆u(x) dx + ∫_∂Ω u(x) ∂G(x, y)/∂n_x dS(x).  (17.34)

In this formula, u is completely determined by its boundary values and ∆u in Ω. This motivates the following definition.
Definition 17.4 A function G : Ω̄ × Ω → R satisfying
(a) G(x, y) = 0 for all x ∈ ∂Ω, y ∈ Ω, x ≠ y, and
(b) v_y(x) = G(x, y) − E_n(x−y) is harmonic in x ∈ Ω for all y ∈ Ω,
is called a Green's function of Ω. More precisely, G(x, y) is a Green's function for the inner Dirichlet problem on Ω.
Remarks 17.7 (a) The function v_y(x) is in particular harmonic at x = y. Since E_n(x−y) has a pole at x = y, G(x, y) has a pole of the same order at x = y such that G(x, y) − E_n(x−y) has no singularity.
If such a function G(x, y) exists, then (17.34) holds for all u ∈ C²(Ω̄).
(b) In particular, if in addition u is harmonic in Ω,

u(y) = ∫_∂Ω u(x) ∂G(x, y)/∂n_x dS(x).  (17.35)

This is the so-called Poisson formula for Ω. In general, it is difficult to find Green's function. For most regions Ω it is even impossible to give G(x, y) explicitly. However, if Ω has some kind of symmetry, one can use the reflection principle to construct G(x, y) explicitly. Nevertheless, G(x, y) exists for all "well-behaved" Ω (the boundary is a C² set and Gauß' divergence theorem holds for Ω).
(c) The Reflection Principle
This is a method to calculate Green's function explicitly for domains Ω with the following property: using repeated reflections on spheres and on hyperplanes occurring as boundaries of Ω and its reflections, the whole Rⁿ can be filled up without overlapping.
Example 17.4 Green's function on a ball U_R(0). For this we use the reflection on the sphere S_R(0). For y ∈ Rⁿ put

ȳ := (R²/‖y‖²) y for y ≠ 0,  ȳ := ∞ for y = 0.
Note that this map has the property ȳ · y = R² and ‖y‖² ȳ = R² y. Points on the sphere S_R(0) are fixed under this map, ȳ = y. Let E_n : R₊ → R denote the radial scalar function corresponding to E_n with E_n(x) = E_n(‖x‖), that is, E_n(r) = −1/((n−2)ω_n r^{n−2}), n ≥ 3. Then we put

G(x, y) = E_n(‖x−y‖) − E_n( (‖y‖/R) ‖x−ȳ‖ ) for y ≠ 0,
G(x, 0) = E_n(‖x‖) − E_n(R).  (17.36)

For x ≠ y, G(x, y) is harmonic in x, since for ‖y‖ < R we have ‖ȳ‖ > R and therefore x − ȳ ≠ 0. The function G(x, y) has only one singularity in U_R(0), namely at x = y, and this is the same as that of E_n(x−y). Therefore,

v_y(x) = G(x, y) − E_n(x−y) = −E_n( (‖y‖/R) ‖x−ȳ‖ ) for y ≠ 0,  v₀(x) = −E_n(R),

is harmonic for all x ∈ Ω. For x ∈ ∂Ω = S_R(0) and y ≠ 0 we have

G(x, y) = E_n( (‖x‖² + ‖y‖² − 2x·y)^{1/2} ) − E_n( (‖y‖/R) (‖x‖² + ‖ȳ‖² − 2x·ȳ)^{1/2} )
        = E_n( (R² + ‖y‖² − 2x·y)^{1/2} ) − E_n( (‖y‖² + R² − 2x·y)^{1/2} ) = 0,

since (‖y‖²/R²)‖x‖² = ‖y‖², (‖y‖²/R²)‖ȳ‖² = R², and (‖y‖²/R²) x·ȳ = x·y. For y = 0 we have

G(x, 0) = E_n(‖x‖) − E_n(R) = E_n(R) − E_n(R) = 0.

This proves that G(x, y) is a Green's function for U_R(0). In particular, the above calculation shows that r = ‖x−y‖ and r̄ = (‖y‖/R)‖x−ȳ‖ are equal if x ∈ ∂Ω.
One can show that Green's function is symmetric, that is, G(x, y) = G(y, x). This is a general property of Green's functions.
To apply formula (17.34) we have to compute the normal derivative ∂G(x, y)/∂n_x. Note first that for any constant z ∈ Rⁿ and x ∈ S_R(0),

∂/∂n_x f(‖x−z‖) = n(x) · ∇_x f(‖x−z‖) = f′(‖x−z‖) (x/‖x‖) · (x−z)/‖x−z‖.

Note further that for ‖x‖ = R we have

r = ‖x−y‖ = (‖y‖/R) ‖x−ȳ‖,  (17.37)
(x−y)·x − (‖y‖²/R²)(x−ȳ)·x = R² − ‖y‖².  (17.38)

Hence, for y ≠ 0,

∂G(x, y)/∂n_x = −(1/((n−2)ω_n)) ( ∂/∂n_x ‖x−y‖^{−n+2} − (‖y‖^{−n+2}/R^{−n+2}) ∂/∂n_x ‖x−ȳ‖^{−n+2} )
 = (1/ω_n) ( ((x−y)/‖x−y‖ · x/‖x‖) ‖x−y‖^{−n+1} − (‖y‖^{−n+2}/R^{−n+2}) ((x−ȳ)/‖x−ȳ‖ · x/‖x‖) ‖x−ȳ‖^{−n+1} )
 = (1/(ω_n rⁿ R)) ( (x−y)·x − (‖y‖²/R²)(x−ȳ)·x ).

By (17.38), the expression in the brackets is R² − ‖y‖². Hence,

∂G(x, y)/∂n_x = (1/(ω_n R)) (R² − ‖y‖²)/‖x−y‖ⁿ.

This formula holds true also in case y = 0. Inserting it into (17.34), for any harmonic function u ∈ C²(U_R(0)) ∩ C(Ū_R(0)) we have

u(y) = (1/(ω_n R)) ∫_{S_R(0)} u(x) (R² − ‖y‖²)/‖x−y‖ⁿ dS(x).  (17.39)

This is the so-called Poisson formula for the ball U_R(0).
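Poisson's formula can be tested numerically for n = 2 (where ω₂ = 2π): for a harmonic u, the integral over S_R(0) reproduces u(y) at interior points. A sketch with ad hoc names:

```python
import math

def u(x, y):
    # Re((x + iy)^3) = x^3 - 3 x y^2, harmonic in R^2
    return x**3 - 3.0 * x * y * y

R = 2.0
yy = (0.5, -0.7)                  # evaluation point with |y| < R

def poisson(y1, y2, N=20000):
    # u(y) = (R^2 - |y|^2)/(omega_n R) * int_{S_R(0)} u(x)/|x - y|^n dS(x), n = 2
    s = 0.0
    for i in range(N):
        t = 2.0 * math.pi * i / N
        x1, x2 = R * math.cos(t), R * math.sin(t)
        s += u(x1, x2) / ((x1 - y1)**2 + (x2 - y2)**2)
    s *= R * 2.0 * math.pi / N    # dS = R dt on the circle
    return (R * R - (y1 * y1 + y2 * y2)) / (2.0 * math.pi * R) * s

assert abs(poisson(*yy) - u(*yy)) < 1e-6
```

The equally spaced quadrature converges very fast here because the integrand is smooth and periodic in the angle.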
Proposition 17.25 Let n ≥ 2. Consider the inner Dirichlet problem in Ω = U_R(0) with f = 0. The function

u(y) = ((R² − ‖y‖²)/(ω_n R)) ∫_{S_R(0)} ϕ(x)/‖x−y‖ⁿ dS(x)  for ‖y‖ < R,
u(y) = ϕ(y)  for ‖y‖ = R,

is continuous on the closed ball Ū_R(0) and harmonic in U_R(0).
In case n = 2 the function u(y) can be written in the following form:

u(y) = Re( (1/(2πi)) ∫_{S_R(0)} ϕ(z) ((z+y)/(z−y)) dz/z ),  y ∈ U_R(0) ⊂ C.
For the proof of the general statement with n ≥ 2, see [Jos02, Theorem 1.1.2] or [Joh82, p. 107]. We show the last statement for n = 2. Since y z̄ − ȳ z is purely imaginary,

Re( (z+y)/(z−y) ) = Re( (z+y)(z̄−ȳ) / ((z−y)(z̄−ȳ)) ) = Re( (|z|² − |y|² + y z̄ − ȳ z) / |z−y|² ) = (R² − |y|²)/|z−y|².

Using the parametrization z = Re^{it}, dt = dz/(iz), we obtain

Re( (1/2π) ∫_{S_R(0)} ϕ(z) ((z+y)/(z−y)) dz/(iz) ) = (1/2π) ∫_0^{2π} ((R² − |y|²)/|z−y|²) ϕ(z) dt
  = ((R² − |y|²)/(2πR)) ∫_{S_R(0)} ϕ(x)/|x−y|² |dx|.

In the last line we have a (real) line integral of the first kind, using x = (x₁, x₂) = x₁ + ix₂ = z, x ∈ S_R(0), and |dx| = R dt on the circle.
Other Examples. (a) n = 3. The half-space Ω = {(x₁, x₂, x₃) ∈ R³ | x₃ > 0}. We use the ordinary reflection with respect to the plane x₃ = 0, given by y = (y₁, y₂, y₃) ↦ y′ = (y₁, y₂, −y₃). Then Green's function for Ω is

G(x, y) = E₃(x−y) − E₃(x−y′) = (1/4π) ( 1/‖x−y′‖ − 1/‖x−y‖ )

(see homework 57.1).
(b) n = 3. The half-ball Ω = {(x₁, x₂, x₃) ∈ R³ | ‖x‖ < R, x₃ > 0}. We use the reflections y → y′ and y → ȳ (reflection with respect to the sphere S_R(0)). Then

G(x, y) = E₃(x−y) − (R/‖y‖) E₃(x−ȳ) − E₃(x−y′) + (R/‖y‖) E₃(x−ȳ′)

is Green's function for Ω.
(c) n = 3, Ω = {(x₁, x₂, x₃) ∈ R³ | x₂ > 0, x₃ > 0}. We introduce the reflection y = (y₁, y₂, y₃) ↦ y* = (y₁, −y₂, y₃). Then Green's function for Ω is

G(x, y) = E₃(x−y) − E₃(x−y′) − E₃(x−y*) + E₃(x−(y*)′).
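The defining properties of the half-space Green's function in (a) can be verified numerically: G(·, y) vanishes on the plane x₃ = 0 and is harmonic away from x = y. A sketch (names ad hoc):

```python
import math

def E3(v):
    # Fundamental solution of the Laplacian in R^3: E3(x) = -1/(4*pi*|x|)
    r = math.sqrt(v[0]**2 + v[1]**2 + v[2]**2)
    return -1.0 / (4.0 * math.pi * r)

def G(x, y):
    # Green's function of the half-space x3 > 0 via the reflection y' = (y1, y2, -y3)
    yr = (y[0], y[1], -y[2])
    d = lambda p, q: (p[0] - q[0], p[1] - q[1], p[2] - q[2])
    return E3(d(x, y)) - E3(d(x, yr))

y = (0.3, -0.2, 1.1)

# G(x, y) = 0 for x on the boundary plane x3 = 0
for xb in [(0.0, 0.0, 0.0), (2.0, -1.0, 0.0), (-0.5, 3.0, 0.0)]:
    assert abs(G(xb, y)) < 1e-14

# G is harmonic in x away from x = y (finite-difference Laplacian)
h = 1e-3
x0 = (1.0, 0.5, 0.7)
lap = sum(G(tuple(x0[j] + (h if j == i else 0) for j in range(3)), y)
          + G(tuple(x0[j] - (h if j == i else 0) for j in range(3)), y)
          for i in range(3)) - 6.0 * G(x0, y)
lap /= h**2
assert abs(lap) < 1e-4
```

On the plane x₃ = 0 the distances to y and to its mirror image y′ agree exactly, so G vanishes there to machine precision.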
(b) Green's Function of the Second Kind
Consider the Neumann problem and the analogue of the ansatz used for Green's function in case of the Dirichlet problem:

u(y) = ∫_Ω H(x, y) ∆u(x) dx + ∫_∂Ω ( u(x) ∂H(x, y)/∂n_x − H(x, y) ∂u/∂n(x) ) dS(x).  (17.40)

We want to choose a Green's function of the second kind H(x, y) in such a way that only the last surface integral remains present.
Inserting u = 1 into the representation formula with G, we have

1 = ∫_∂Ω ∂G(x, y)/∂n_x dS(x).

Imposing ∂G(x, y)/∂n_x = α = const., this constant must be α = 1/vol(∂Ω). Note that one defines a Green's function for the Neumann problem by replacing the condition G(x, y) = 0 on ∂Ω by ∂G(x, y)/∂n_x = const.
Green's function of the second kind (Neumann problem) for the ball of radius R in R³, U_R(0):

H(x, y) = −(1/4π) ( 1/‖x−y‖ + R/(‖y‖ ‖x−ȳ‖) + (1/R) log( 2R² / (R² − x·y + ‖y‖ ‖x−ȳ‖) ) ).
(c) Existence Theorems
The aim of these considerations is to sketch the method of proving existence of solutions of the four BVPs.
Definition. Suppose that Ω ⊂ Rⁿ, n ≥ 3, and let µ(y) be a continuous function on the boundary ∂Ω, that is, µ ∈ C(∂Ω). We call

w(x) = ∫_∂Ω µ(y) E_n(x−y) dS(y),  x ∈ Rⁿ,  (17.41)

a single-layer potential, and

v(x) = ∫_∂Ω µ(y) ∂E_n/∂n_y (x−y) dS(y),  x ∈ Rⁿ,  (17.42)

a double-layer potential.
Remarks 17.8 (a) For x ∉ ∂Ω the integrals (17.41) and (17.42) exist.
(b) The single-layer potential w(x) is continuous on Rⁿ. The double-layer potential jumps at y₀ ∈ ∂Ω by µ(y₀) as x approaches y₀, see (17.43) below.
Theorem 17.26 Let Ω be a connected bounded region in Rⁿ of the class C² and let Ω′ = Rⁿ \ Ω also be connected.
Then the interior Dirichlet problem for the Laplace equation has a unique solution. It can be represented in the form of a double-layer potential. The exterior Neumann problem likewise has a unique solution, which can be represented in the form of a single-layer potential.
Theorem 17.27 Under the same assumptions as in the previous theorem, the inner Neumann problem for the Laplace equation has a solution if and only if ∫_∂Ω ϕ(y) dS(y) = 0. If this condition is satisfied, the solution is unique up to a constant.
The exterior Dirichlet problem has a unique solution.
Remark. Let µ_ID denote the continuous function which produces the solution v(x) of the interior Dirichlet problem, i.e. v(x) = ∫_∂Ω µ_ID(y) K(x, y) dS(y), where K(x, y) = ∂E_n(x−y)/∂n_y. Because of the jump relation for v(x) at x₀ ∈ ∂Ω,

lim_{x→x₀, x∈Ω} v(x) − (1/2) µ_ID(x₀) = v(x₀) = lim_{x→x₀, x∈Ω′} v(x) + (1/2) µ_ID(x₀),  (17.43)

µ_ID satisfies the integral equation

ϕ(x) = (1/2) µ_ID(x) + ∫_∂Ω µ_ID(y) K(x, y) dS(y),  x ∈ ∂Ω.

The above equation can be written as ϕ = (A + ½ I) µ_ID, where A is the above integral operator in L²(∂Ω). One can prove the following facts: A is compact, A + ½ I is injective and surjective, and ϕ continuous implies µ_ID continuous. For details, see [Tri92, 3.4].
Application to the Poisson Equation
Consider the inner Dirichlet problem ∆u = f and u = ϕ on ∂Ω. We suppose that f ∈ C(Ω̄) ∩ C¹(Ω). We already know that

w(x) = (E_n ∗ f)(x) = −(1/((n−2)ω_n)) ∫_{Rⁿ} f(y)/‖x−y‖^{n−2} dy

is a distributive solution of the Poisson equation, ∆w = f. By the assumptions on f, w ∈ C²(Ω) and therefore is a classical solution. To solve the problem we try the ansatz u = w + v. Then ∆u = ∆w + ∆v = f + ∆v. Hence, ∆u = f if and only if ∆v = 0. Thus, the inner Dirichlet problem for the Poisson equation reduces to the inner Dirichlet problem for the Laplace equation ∆v = 0 with boundary values

v(y) = u(y) − w(y) = ϕ(y) − w(y) =: ϕ̃(y),  y ∈ ∂Ω.

Since ϕ and w are continuous on ∂Ω, so is ϕ̃.
17.5.2 Extremal Properties of Harmonic Functions and the Dirichlet Principle
(a) The Dirichlet Principle
Consider the inner Dirichlet problem for the Poisson equation with given data f ∈ C(Ω) and ϕ ∈ C(∂Ω). Put

C¹_ϕ(Ω) := {v ∈ C¹(Ω) | v = ϕ on ∂Ω}.

On this space define the Dirichlet integral by

E(v) = (1/2) ∫_Ω ‖∇v‖² dx + ∫_Ω f v dx,  v ∈ C¹_ϕ(Ω).  (17.44)

This integral is also called the energy integral. The Dirichlet principle says that among all functions v with given boundary values ϕ, the function u with ∆u = f minimizes the energy integral E.
Proposition 17.28 A function u ∈ C1ϕ (Ω) ∩ C2 (Ω) is a solution of the inner Dirichlet problem
if and only if the energy integral E attains its minimum on C1ϕ (Ω) at u.
Proof. (a) Suppose first that u ∈ C1ϕ (Ω) ∩ C2 (Ω) is a solution of the inner Dirichlet problem,
∆u = f . For v ∈ C1ϕ (Ω) let w = v − u ∈ C1ϕ (Ω). Then
E(v) = E(u + w) = (1/2) ∫_Ω (∇u + ∇w) · (∇u + ∇w) dx + ∫_Ω (u + w) f dx
     = (1/2) ∫_Ω k∇uk² dx + (1/2) ∫_Ω k∇wk² dx + ∫_Ω ∇u · ∇w dx + ∫_Ω (u + w) f dx.
Since u and v satisfy the same boundary conditions, w |∂Ω = 0. Further, ∆u = f . By Green’s
1st formula,
∫_Ω ∇u · ∇w dx = − ∫_Ω (∆u) w dx + ∫_{∂Ω} (∂u/∂n) w dS = − ∫_Ω f w dx.
Inserting this into the above equation, we have
E(v) = (1/2) ∫_Ω k∇uk² dx + (1/2) ∫_Ω k∇wk² dx − ∫_Ω f w dx + ∫_Ω (u + w) f dx
     = E(u) + (1/2) ∫_Ω k∇wk² dx ≥ E(u).
This shows that E(u) is minimal.
(b) Conversely, let u ∈ C1ϕ(Ω) ∩ C2(Ω) minimize the energy integral. In particular, for any test
function ψ ∈ D(Ω) (note that ψ has zero boundary values), the function

g(t) = E(u + tψ) = E(u) + t ∫_Ω (∇u · ∇ψ + f ψ) dx + (t²/2) ∫_Ω k∇ψk² dx
has a local minimum at t = 0. Hence g′(0) = 0, which is, again by Green's 1st formula and
ψ |∂Ω = 0, equivalent to
0 = ∫_Ω (∇u · ∇ψ + f ψ) dx = ∫_Ω (−∆u + f) ψ dx.
By the fundamental lemma of the calculus of variations, ∆u = f almost everywhere in Ω. Since
both ∆u and f are continuous, the equation holds pointwise for all x ∈ Ω.
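The computation in part (a) of the proof can be watched numerically. A minimal 1D sketch (my own illustration, not from the notes): on Ω = (0, 1) with f = 2, the minimizer with zero boundary values is u(x) = x² − x, and every perturbation by a multiple of sin(πx) increases the energy:

```python
import math

# 1D sketch of the Dirichlet principle: on Omega = (0,1) with f = 2,
#   E(v) = 1/2 int v'^2 dx + int f v dx
# is minimized among functions with v(0) = v(1) = 0 by the solution of
# u'' = f, namely u(x) = x^2 - x.

def energy(v, dv, f, n=20000):
    # Midpoint-rule approximation of E(v).
    h = 1.0 / n
    s = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        s += (0.5 * dv(x) ** 2 + f(x) * v(x)) * h
    return s

f  = lambda x: 2.0
u  = lambda x: x * x - x                      # minimizer: u'' = 2, u(0) = u(1) = 0
du = lambda x: 2 * x - 1
w  = lambda x: math.sin(math.pi * x)          # perturbation, w(0) = w(1) = 0
dw = lambda x: math.pi * math.cos(math.pi * x)

E_u = energy(u, du, f)                        # = -1/6 here
for t in (0.5, 1.0, -1.0):
    v  = lambda x, t=t: u(x) + t * w(x)
    dv = lambda x, t=t: du(x) + t * dw(x)
    # E(u + t*w) = E(u) + t^2/2 * int w'^2 dx > E(u), as in the proof above
    assert energy(v, dv, f) > E_u
print(round(E_u, 4))
```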
17 PDE II — The Equations of Mathematical Physics
492
(b) Hilbert Space Methods
We want to give another reformulation of the Dirichlet problem. Consider the problem
∆u = −f,
u |∂Ω = 0.
On C1 (Ω) define a bilinear map

u·vE = ∫_Ω ∇u · ∇v dx.
C1 (Ω) is not yet an inner product space since for any non-vanishing constant function u, u·uE =
0. Denote by C10 (Ω) the subspace of functions in C1 (Ω) vanishing on the boundary ∂Ω. Now,
u·vE is an inner product on C10 (Ω). The positive definiteness is a consequence of the Poincaré
inequality below. Its corresponding norm is kuk²E = ∫_Ω k∇uk² dx. Let u be a solution of the
above Dirichlet problem. Then for any v ∈ C10 (Ω), by Green’s 1st formula
v·uE = ∫_Ω ∇v · ∇u dx = − ∫_Ω v ∆u dx = ∫_Ω v f dx = v·fL2 .
This suggests that u can be found by representing the known linear functional of v,

F(v) = ∫_Ω v f dx,
as an inner product v·uE . To make use of Riesz's representation theorem, Theorem 13.8, we
have to complete C10 (Ω) to a Hilbert space W with respect to the energy norm k·kE and to
prove that the above linear functional F is bounded with respect to the energy norm. This is
a consequence of the next lemma. We make the same assumptions on Ω as at the beginning of
Section 17.4.
Lemma 17.29 (Poincaré inequality) Let Ω ⊂ Rn be bounded. Then there exists C > 0 such that for all
u ∈ C10 (Ω),

kukL2(Ω) ≤ C kukE .
Proof. Let Ω be contained in the cube Γ = {x ∈ Rn | | xi | ≤ a, i = 1, . . . , n}. We extend u
by zero outside Ω. For any x = (x1 , . . . , xn ), by the Fundamental Theorem of Calculus
u(x)² = ( ∫_{−a}^{x1} ux1(y1, x2, . . . , xn) dy1 )² = ( ∫_{−a}^{x1} 1 · ux1(y1, x2, . . . , xn) dy1 )²
      ≤ ∫_{−a}^{x1} dy1 · ∫_{−a}^{x1} ux1² dy1 = (x1 + a) ∫_{−a}^{x1} ux1² dy1 ≤ 2a ∫_{−a}^{a} ux1² dy1,

where the inequality is the Cauchy–Schwarz inequality (CSI).
Since the last integral does not depend on x1 , integration with respect to x1 gives
∫_{−a}^{a} u(x)² dx1 ≤ 4a² ∫_{−a}^{a} ux1² dy1.
Integrating over x2 , . . . , xn from −a to a we find
∫_Γ u² dx ≤ 4a² ∫_Γ ux1² dy.
The same inequality holds for xi , i = 2, . . . , n in place of x1 such that
kuk²L2 = ∫_Γ u² dx ≤ (4a²/n) ∫_Γ (∇u)² dx = C² kuk²E ,

where C = 2a/√n.
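The lemma can be illustrated numerically. A sketch under the assumptions n = 1, Ω = (−a, a) with a = 1, so that C = 2a = 2, for one sample function vanishing on the boundary:

```python
import math

# Numerical illustration (not a proof) of the Poincare inequality
# ||u||_L2 <= C ||u||_E with C = 2a/sqrt(n), in the 1D case n = 1,
# Omega = (-a, a), a = 1, for a sample u vanishing on the boundary.

a = 1.0
u  = lambda x: math.cos(math.pi * x / 2)               # u(-1) = u(1) = 0
du = lambda x: -math.pi / 2 * math.sin(math.pi * x / 2)

def l2_norm(g, n=100000):
    # Midpoint rule for (int_{-a}^{a} g^2 dx)^(1/2).
    h = 2 * a / n
    return math.sqrt(sum(g(-a + (i + 0.5) * h) ** 2 * h for i in range(n)))

norm_L2 = l2_norm(u)           # = 1 here
norm_E  = l2_norm(du)          # = pi/2 here
C = 2 * a / math.sqrt(1)       # the constant from the lemma for n = 1
assert norm_L2 <= C * norm_E
print(round(norm_L2, 3), round(norm_E, 3))
```

Here kukL2 = 1 and kukE = π/2, well within the bound.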
The Poincaré inequality is sometimes called the Poincaré–Friedrichs inequality. It remains true for
functions u in the completion W . Let us discuss the elements of W in more detail. By definition,
f ∈ W if there is a Cauchy sequence (fn ) in C10 (Ω) such that (fn ) “converges to f ” in the
energy norm. By the Poincaré inequality, (fn ) is also an L2 -Cauchy sequence. Since L2 (Ω)
is complete, (fn ) has an L2 -limit f . This shows W ⊆ L2 (Ω). For simplicity, let Ω ⊂ R1 .
By definition of the energy norm, ∫_Ω |(fn − fm)′|² dx → 0 as m, n → ∞; that is, (fn′) is an
L2-Cauchy sequence, too. Hence, (fn′) also has some L2-limit, say g ∈ L2(Ω). So far,

kfn − f kL2 −→ 0,   kfn′ − gkL2 −→ 0.        (17.45)
We will show that the above limits imply f′ = g in D′(Ω). Indeed, by (17.45) and the Cauchy–
Schwarz inequality, for all ϕ ∈ D(Ω),

| ∫_Ω (fn − f) ϕ′ dx | ≤ ( ∫_Ω | fn − f |² dx )^{1/2} ( ∫_Ω | ϕ′ |² dx )^{1/2} −→ 0,
| ∫_Ω (fn′ − g) ϕ dx | ≤ ( ∫_Ω | fn′ − g |² dx )^{1/2} ( ∫_Ω | ϕ |² dx )^{1/2} −→ 0.

Hence,

∫_Ω f′ ϕ dx = − ∫_Ω f ϕ′ dx = − lim_{n→∞} ∫_Ω fn ϕ′ dx = lim_{n→∞} ∫_Ω fn′ ϕ dx = ∫_Ω g ϕ dx.
This shows f′ = g in D′(Ω). One says that the elements of W possess weak derivatives, that
is, their distributional derivatives are L2-functions (and hence regular distributions).
Also, the inner product u·vE is positive definite since the L2-inner product is. It turns out that W
is a separable Hilbert space. W is the so-called Sobolev space W01,2(Ω), sometimes also denoted
by H10(Ω). The upper indices 1 and 2 in W01,2(Ω) refer to the highest order of partial derivatives
(| α | = 1) and to the Lp-space (p = 2) in the definition of W, respectively. The lower index 0
refers to the so-called generalized boundary value 0. For further reading on Sobolev spaces,
see [Fol95, Chapter 6].
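The defining identity of the weak derivative, ∫_Ω f′ϕ dx = −∫_Ω f ϕ′ dx, can be tested by quadrature. A sketch with the standard example (mine, not from the notes): f(x) = |x| on (−1, 1), whose weak derivative is g(x) = sign(x), checked against one smooth bump test function:

```python
import math

# Quadrature check of int f phi' dx = -int g phi dx for f(x) = |x| with
# weak derivative g(x) = sign(x) on (-1, 1), against a smooth bump test
# function phi in D((-1, 1)) centered at c = 0.3 with radius r = 0.5.

c, r = 0.3, 0.5

def phi(x):
    s = (x - c) / r
    return math.exp(-1.0 / (1.0 - s * s)) if abs(s) < 1 else 0.0

def dphi(x):
    # Classical derivative of the bump (chain rule; zero outside support).
    s = (x - c) / r
    if abs(s) >= 1:
        return 0.0
    return phi(x) * (-2.0 * s) / ((1.0 - s * s) ** 2 * r)

f = lambda x: abs(x)
g = lambda x: math.copysign(1.0, x)

def integrate(fun, n=200000):
    # Midpoint rule on (-1, 1).
    h = 2.0 / n
    return sum(fun(-1 + (i + 0.5) * h) * h for i in range(n))

lhs = integrate(lambda x: f(x) * dphi(x))
rhs = -integrate(lambda x: g(x) * phi(x))
print(abs(lhs - rhs))   # close to 0: |x| has weak derivative sign(x)
```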
Corollary 17.30 F(v) = v·fL2 = ∫_Ω f v dx defines a bounded linear functional on W.
Proof. By the Cauchy–Schwarz and Poincaré inequalities,
| F(v) | ≤ ∫_Ω | f v | dx ≤ kf kL2 kvkL2 ≤ C kf kL2 kvkE .
Hence, F is bounded with kF k ≤ C kf kL2 .
Corollary 17.31 Let f ∈ C(Ω). Then there exists a unique u ∈ W such that

v·uE = v·fL2 ,   ∀ v ∈ W.

This u solves ∆u = −f in D′(Ω).
The first statement is a consequence of Riesz's representation theorem; note that F is a bounded
linear functional on the Hilbert space W. The last statement follows from D(Ω) ⊂ C10(Ω) and
u·ϕE = − ⟨∆u, ϕ⟩ = f·ϕL2 . This is the so-called modified Dirichlet problem. It remains an open
task to identify the solution u ∈ W with an ordinary function u ∈ C2(Ω).
17.5.3 Numerical Methods
(a) Difference Methods
Since most ODE and PDE are not solvable in closed form, there are many methods for finding
approximate solutions to a given equation or a given problem. A general principle is
discretization. One replaces the derivative u′(x) by one of its difference quotients
∂+u(x) = (u(x + h) − u(x))/h,   ∂−u(x) = (u(x) − u(x − h))/h,

where h is called the step size. One can also use the symmetric difference quotient
(u(x + h) − u(x − h))/(2h). The five-point formula for the Laplacian in R² is then given by

∆h u(x, y) := (∂x−∂x+ + ∂y−∂y+) u(x, y)
            = (u(x − h, y) + u(x + h, y) + u(x, y − h) + u(x, y + h) − 4u(x, y)) / h².
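A short sketch of the five-point formula in code (my own illustration, with a hypothetical test function for which the formula happens to be exact):

```python
# The five-point formula Delta_h from above, applied to the test function
# u(x, y) = x^3 + y^3, whose exact Laplacian is 6x + 6y. Centered second
# differences are exact for cubics, so the discrete and the exact values
# agree up to rounding.

def laplace5(u, x, y, h):
    return (u(x - h, y) + u(x + h, y) + u(x, y - h) + u(x, y + h)
            - 4.0 * u(x, y)) / h ** 2

u = lambda x, y: x ** 3 + y ** 3
x, y, h = 0.4, 0.7, 0.01
approx = laplace5(u, x, y, h)
exact = 6 * x + 6 * y          # = 6.6
print(approx, exact)
```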
Besides the equation, the domain Ω as well as its boundary ∂Ω undergo a discretization: If
Ω = (0, 1) × (0, 1) then
Ωh = {(nh, mh) ∈ Ω | n, m ∈ N},
∂Ωh = {(nh, mh) ∈ ∂Ω | n, m ∈ Z}.
The discretization of the inner Dirichlet problem then reads

∆h u = f,  x ∈ Ωh ,    u |∂Ωh = ϕ.
Neumann problems also have discretizations; see [Hac92, Chapter 4].
(b) The Ritz–Galerkin Method
Suppose we have a boundary value problem in its variational formulation:
Find u ∈ V , so that u·vE = F (v) for all v ∈ V ,
where we are thinking of the Sobolev space V = W from the previous paragraph. Of course,
F is assumed to be bounded.
Difference methods arise through discretizing the differential operator. Now we wish to leave
the differential operator, which is hidden in the inner product u·vE , unchanged. The Ritz–Galerkin
method consists in replacing the infinite-dimensional space V by a finite-dimensional space VN ,
VN ⊂ V,
dim VN = N < ∞.
VN equipped with the norm k·kE is still a Banach space. Since VN ⊆ V , both the inner product
u·vE and the functional F are defined for u, v ∈ VN . Thus, we may pose the problem
Find uN ∈ VN , so that uN ·vE = F(v) for all v ∈ VN .

The solution to the above problem, if it exists, is called the Ritz–Galerkin solution (belonging to
VN ).
An introductory example is to be found in [Hac92, 8.1.11, p. 164], see also [Bra01, Chapter 2].
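A concrete sketch of the method (my own minimal example, not the one from [Hac92]): for the 1D model problem −u″ = f on (0, 1) with zero boundary values and VN spanned by N hat functions on a uniform grid, the Ritz–Galerkin equations become a tridiagonal linear system.

```python
# Ritz-Galerkin sketch for the 1D model problem -u'' = f on (0, 1),
# u(0) = u(1) = 0, with f = 1 and exact solution u(x) = x(1 - x)/2.
# V_N = span of N hat functions phi_1, ..., phi_N on a uniform grid;
# "u_N . v_E = F(v) for all v in V_N" becomes the tridiagonal system
# A c = b with A_ij = int phi_i' phi_j' dx and b_i = int f phi_i dx = h.

N = 9                        # number of interior nodes
h = 1.0 / (N + 1)

diag = [2.0 / h] * N         # stiffness matrix: 2/h on the diagonal ...
off  = [-1.0 / h] * (N - 1)  # ... and -1/h on both off-diagonals
b    = [h] * N               # load vector for f = 1

# Thomas algorithm (tridiagonal Gaussian elimination).
for i in range(1, N):
    m = off[i - 1] / diag[i - 1]
    diag[i] -= m * off[i - 1]
    b[i] -= m * b[i - 1]
c = [0.0] * N
c[-1] = b[-1] / diag[-1]
for i in range(N - 2, -1, -1):
    c[i] = (b[i] - off[i] * c[i + 1]) / diag[i]

# In this 1D problem the Galerkin coefficients coincide with the exact
# nodal values u(x_i); in general one only gets the best approximation
# of u in the energy norm.
for i, ci in enumerate(c):
    x = (i + 1) * h
    print(round(ci, 6), round(x * (1 - x) / 2, 6))
```

For general f one would also approximate the load integrals b_i by quadrature.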
Bibliography
[AF01]
I. Agricola and T. Friedrich. Globale Analysis (in German). Friedr. Vieweg & Sohn,
Braunschweig, 2001.
[Ahl78] L. V. Ahlfors. Complex analysis. An introduction to the theory of analytic functions of one complex variable. International Series in Pure and Applied Mathematics.
McGraw-Hill Book Co., New York, third edition, 1978.
[Arn04] V. I. Arnold. Lectures in Partial Differential Equations. Universitext. Springer and
Phasis, Berlin. Moscow, 2004.
[Bra01] D. Braess. Finite elements. Theory, fast solvers, and applications in solid mechanics.
Cambridge University Press, Cambridge, 2001.
[Bre97] Glen E. Bredon. Topology and geometry. Number 139 in Graduate Texts in Mathematics. Springer-Verlag, New York, 1997.
[Brö92] Th. Bröcker. Analysis II (German). B. I. Wissenschaftsverlag, Mannheim, 1992.
[Con78] J. B. Conway. Functions of one complex variable. Number 11 in Graduate Texts in
Mathematics. Springer-Verlag, New York, 1978.
[Con90] J. B. Conway. A course in functional analysis. Number 96 in Graduate Texts in
Mathematics. Springer-Verlag, New York, 1990.
[Cou88] R. Courant. Differential and integral calculus I–II. Wiley Classics Library. John
Wiley & Sons, New York etc., 1988.
[Die93] J. Dieudonné. Treatise on analysis. Volume I – IX. Pure and Applied Mathematics.
Academic Press, Boston, 1993.
[Els02]
J. Elstrodt. Maß- und Integrationstheorie. Springer, Berlin, third edition, 2002.
[Eva98] L. C. Evans. Partial differential equations. Number 19 in Graduate Studies in Mathematics. AMS, Providence, 1998.
[FB93]
E. Freitag and R. Busam. Funktionentheorie. Springer-Lehrbuch. Springer-Verlag,
Berlin, 1993.
[FK98]
H. Fischer and H. Kaul. Mathematik für Physiker. Band 2. Teubner Studienbücher:
Mathematik. Teubner, Stuttgart, 1998.
[FL88]
W. Fischer and I. Lieb. Ausgewählte Kapitel der Funktionentheorie. Number 48 in
Vieweg Studium. Friedrich Vieweg & Sohn, Braunschweig, 1988.
[Fol95]
G. B. Folland. Introduction to partial differential equations. Princeton University
Press, Princeton, 1995.
[For81]
O. Forster. Analysis 3 (in German). Vieweg Studium: Aufbaukurs Mathematik.
Vieweg, Braunschweig, 1981.
[For01]
O. Forster. Analysis 1 – 3 (in German). Vieweg Studium: Grundkurs Mathematik.
Vieweg, Braunschweig, 2001.
[GS64]
I. M. Gelfand and G. E. Schilow. Verallgemeinerte Funktionen (Distributionen).
III: Einige Fragen zur Theorie der Differentialgleichungen. (German). Number 49
in Hochschulbücher für Mathematik. VEB Deutscher Verlag der Wissenschaften,
Berlin, 1964.
[GS69]
I. M. Gelfand and G. E. Schilow. Verallgemeinerte Funktionen (Distributionen). I:
Verallgemeinerte Funktionen und das Rechnen mit ihnen. (German). Number 47
in Hochschulbücher für Mathematik. VEB Deutscher Verlag der Wissenschaften,
Berlin, 1969.
[Hac92] W. Hackbusch. Elliptic differential equations. Theory and numerical treatment. Number 18 in Springer Series in
Computational Mathematics. Springer-Verlag, Berlin, 1992.
[Hen88] P. Henrici. Applied and computational complex analysis. Vol. 1. Wiley Classics
Library. John Wiley & Sons, Inc., New York, 1988.
[How03] J. M. Howie. Complex analysis. Springer Undergraduate Mathematics Series. Springer-Verlag, London, 2003.
[HS91]
F. Hirzebruch and W. Scharlau. Einführung in die Funktionalanalysis. Number 296
in BI-Hochschultaschenbücher. BI-Wissenschaftsverlag, Mannheim, 1991.
[HW96] E. Hairer and G. Wanner. Analysis by its history. Undergraduate texts in Mathematics.
Readings in Mathematics. Springer-Verlag, New York, 1996.
[Jän93]
K. Jänich. Funktionentheorie. Springer-Lehrbuch. Springer-Verlag, Berlin, third edition, 1993.
[Joh82]
F. John. Partial differential equations. Number 1 in Applied Mathematical Sciences.
Springer-Verlag, New York, 1982.
[Jos02]
J. Jost. Partial differential equations. Number 214 in Graduate Texts in Mathematics.
Springer-Verlag, New York, 2002.
[KK71] A. Kufner and J. Kadlec. Fourier Series. G. A. Toombs Iliffe Books, London, 1971.
[Kno78] K. Knopp. Elemente der Funktionentheorie. Number 2124 in Sammlung Göschen.
Walter de Gruyter, Berlin, New York, ninth edition, 1978.
[Kön90] K. Königsberger. Analysis 1 (English). Springer-Verlag, Berlin, Heidelberg, New
York, 1990.
[Lan89] S. Lang. Undergraduate Analysis. Undergraduate texts in mathematics. Springer,
New York-Heidelberg, second edition, 1989.
[MW85] J. Marsden and A. Weinstein. Calculus. I, II, III. Undergraduate Texts in Mathematics. Springer-Verlag, New York etc., 1985.
[Nee97] T. Needham. Visual complex analysis. Oxford University Press, New York, 1997.
[O’N75] P. V. O’Neil. Advanced calculus. Collier Macmillan Publishing Co., London, 1975.
[RS80]
M. Reed and B. Simon. Methods of modern mathematical physics. I. Functional
Analysis. Academic Press, Inc., New York, 1980.
[Rud66] W. Rudin. Real and Complex Analysis. International Student Edition. McGraw-Hill
Book Co., New York-Toronto, 1966.
[Rud76] W. Rudin. Principles of mathematical analysis. International Series in Pure and Applied Mathematics. McGraw-Hill Book Co., New York-Auckland-Düsseldorf, third
edition, 1976.
[Rüh83] F. Rühs. Funktionentheorie. Hochschulbücher für Mathematik. VEB Deutscher Verlag der Wissenschaften, Berlin, fourth edition, 1983.
[Spi65]
M. Spivak. Calculus on manifolds. W. A. Benjamin, New York, Amsterdam, 1965.
[Spi80]
M. Spivak. Calculus. Publish or Perish, Inc., Berkeley, California, 1980.
[Str92]
W. A. Strauss. Partial differential equations. John Wiley & Sons, New York, 1992.
[Tri92]
H. Triebel. Higher analysis. Hochschulbücher für Mathematik. Johann Ambrosius
Barth Verlag GmbH, Leipzig, 1992.
[vW81] C. von Westenholz. Differential forms in mathematical physics. Number 3 in Studies in Mathematics and its Applications. North-Holland Publishing Co., AmsterdamNew York, second edition, 1981.
[Wal74] W. Walter. Einführung in die Theorie der Distributionen (in German). Bibliographisches Institut, B.I.- Wissenschaftsverlag, Mannheim-Wien-Zürich, 1974.
[Wal02] W. Walter. Analysis 1–2 (in German). Springer-Lehrbuch. Springer, Berlin, fifth
edition, 2002.
[Wla72] V. S. Wladimirow. Gleichungen der mathematischen Physik (in German). Number 74
in Hochschulbücher für Mathematik. VEB Deutscher Verlag der Wissenschaften,
Berlin, 1972.
PD Dr. A. Schüler
Mathematisches Institut
Universität Leipzig
04009 Leipzig
[email protected]