Calculus 1 to 4 (2004–2006)

Axel Schüler

January 3, 2007

Contents

1 Real and Complex Numbers
  Basics
    Notations
    Sums and Products
    Mathematical Induction
    Binomial Coefficients
  1.1 Real Numbers
    1.1.1 Ordered Sets
    1.1.2 Fields
    1.1.3 Ordered Fields
    1.1.4 Embedding of natural numbers into the real numbers
    1.1.5 The completeness of R
    1.1.6 The Absolute Value
    1.1.7 Supremum and Infimum revisited
    1.1.8 Powers of real numbers
    1.1.9 Logarithms
  1.2 Complex numbers
    1.2.1 The Complex Plane and the Polar form
    1.2.2 Roots of Complex Numbers
  1.3 Inequalities
    1.3.1 Monotony of the Power and Exponential Functions
    1.3.2 The Arithmetic-Geometric mean inequality
    1.3.3 The Cauchy–Schwarz Inequality
  1.4 Appendix A

2 Sequences and Series
  2.1 Convergent Sequences
    2.1.1 Algebraic operations with sequences
    2.1.2 Some special sequences
    2.1.3 Monotonic Sequences
    2.1.4 Subsequences
  2.2 Cauchy Sequences
  2.3 Series
    2.3.1 Properties of Convergent Series
    2.3.2 Operations with Convergent Series
    2.3.3 Series of Nonnegative Numbers
    2.3.4 The Number e
    2.3.5 The Root and the Ratio Tests
    2.3.6 Absolute Convergence
    2.3.7 Decimal Expansion of Real Numbers
    2.3.8 Complex Sequences and Series
    2.3.9 Power Series
    2.3.10 Rearrangements
    2.3.11 Products of Series

3 Functions and Continuity
  3.1 Limits of a Function
    3.1.1 One-sided Limits, Infinite Limits, and Limits at Infinity
  3.2 Continuous Functions
    3.2.1 The Intermediate Value Theorem
    3.2.2 Continuous Functions on Bounded and Closed Intervals—The Theorem about Maximum and Minimum
  3.3 Uniform Continuity
  3.4 Monotonic Functions
  3.5 Exponential, Trigonometric, and Hyperbolic Functions and their Inverses
    3.5.1 Exponential and Logarithm Functions
    3.5.2 Trigonometric Functions and their Inverses
    3.5.3 Hyperbolic Functions and their Inverses
  3.6 Appendix B
    3.6.1 Monotonic Functions have One-Sided Limits
    3.6.2 Proofs for sin x and cos x inequalities
    3.6.3 Estimates for π

4 Differentiation
  4.1 The Derivative of a Function
  4.2 The Derivatives of Elementary Functions
    4.2.1 Derivatives of Higher Order
  4.3 Local Extrema and the Mean Value Theorem
    4.3.1 Local Extrema and Convexity
  4.4 L’Hospital’s Rule
  4.5 Taylor’s Theorem
    4.5.1 Examples of Taylor Series
  4.6 Appendix C

5 Integration
  5.1 The Riemann–Stieltjes Integral
    5.1.1 Properties of the Integral
  5.2 Integration and Differentiation
    5.2.1 Table of Antiderivatives
    5.2.2 Integration Rules
    5.2.3 Integration of Rational Functions
    5.2.4 Partial Fraction Decomposition
    5.2.5 Other Classes of Elementary Integrable Functions
  5.3 Improper Integrals
    5.3.1 Integrals on unbounded intervals
    5.3.2 Integrals of Unbounded Functions
    5.3.3 The Gamma function
  5.4 Integration of Vector-Valued Functions
  5.5 Inequalities
  5.6 Appendix D
    5.6.1 More on the Gamma Function

6 Sequences of Functions and Basic Topology
  6.1 Discussion of the Main Problem
  6.2 Uniform Convergence
    6.2.1 Definitions and Example
    6.2.2 Uniform Convergence and Continuity
    6.2.3 Uniform Convergence and Integration
    6.2.4 Uniform Convergence and Differentiation
  6.3 Fourier Series
    6.3.1 An Inner Product on the Periodic Functions
  6.4 Basic Topology
    6.4.1 Finite, Countable, and Uncountable Sets
    6.4.2 Metric Spaces and Normed Spaces
    6.4.3 Open and Closed Sets
    6.4.4 Limits and Continuity
    6.4.5 Completeness and Compactness
    6.4.6 Continuous Functions in Rk
  6.5 Appendix E

7 Calculus of Functions of Several Variables
  7.1 Partial Derivatives
    7.1.1 Higher Partial Derivatives
    7.1.2 The Laplacian
  7.2 Total Differentiation
    7.2.1 Basic Theorems
  7.3 Taylor’s Formula
    7.3.1 Directional Derivatives
    7.3.2 Taylor’s Formula
  7.4 Extrema of Functions of Several Variables
  7.5 The Inverse Mapping Theorem
  7.6 The Implicit Function Theorem
  7.7 Lagrange Multiplier Rule
  7.8 Integrals depending on Parameters
    7.8.1 Continuity of I(y)
    7.8.2 Differentiation of Integrals
    7.8.3 Improper Integrals with Parameters
  7.9 Appendix

8 Curves and Line Integrals
  8.1 Rectifiable Curves
    8.1.1 Curves in Rk
    8.1.2 Rectifiable Curves
  8.2 Line Integrals
    8.2.1 Path Independence

9 Integration of Functions of Several Variables
  9.1 Basic Definition
    9.1.1 Properties of the Riemann Integral
  9.2 Integrable Functions
    9.2.1 Integration over More General Sets
    9.2.2 Fubini’s Theorem and Iterated Integrals
  9.3 Change of Variable
  9.4 Appendix

10 Surface Integrals
  10.1 Surfaces in R3
    10.1.1 The Area of a Surface
  10.2 Scalar Surface Integrals
    10.2.1 Other Forms for dS
    10.2.2 Physical Application
  10.3 Surface Integrals
    10.3.1 Orientation
  10.4 Gauß’ Divergence Theorem
  10.5 Stokes’ Theorem
    10.5.1 Green’s Theorem
    10.5.2 Stokes’ Theorem
    10.5.3 Vector Potential and the Inverse Problem of Vector Analysis

11 Differential Forms on Rn
  11.1 The Exterior Algebra Λ(Rn)
    11.1.1 The Dual Vector Space V∗
    11.1.2 The Pull-Back of k-forms
    11.1.3 Orientation of Rn
  11.2 Differential Forms
    11.2.1 Definition
    11.2.2 Differentiation
    11.2.3 Pull-Back
    11.2.4 Closed and Exact Forms
  11.3 Stokes’ Theorem
    11.3.1 Singular Cubes, Singular Chains, and the Boundary Operator
    11.3.2 Integration
    11.3.3 Stokes’ Theorem
    11.3.4 Special Cases
    11.3.5 Applications

12 Measure Theory and Integration
  12.1 Measure Theory
    12.1.1 Algebras, σ-algebras, and Borel Sets
    12.1.2 Additive Functions and Measures
    12.1.3 Extension of Countably Additive Functions
    12.1.4 The Lebesgue Measure on Rn
  12.2 Measurable Functions
  12.3 The Lebesgue Integral
    12.3.1 Simple Functions
    12.3.2 Positive Measurable Functions
  12.4 Some Theorems on Lebesgue Integrals
    12.4.1 The Role Played by Measure Zero Sets
    12.4.2 The space Lp(X, µ)
    12.4.3 The Monotone Convergence Theorem
    12.4.4 The Dominated Convergence Theorem
    12.4.5 Application of Lebesgue’s Theorem to Parametric Integrals
    12.4.6 The Riemann and the Lebesgue Integrals
    12.4.7 Appendix: Fubini’s Theorem

13 Hilbert Space
  13.1 The Geometry of the Hilbert Space
    13.1.1 Unitary Spaces
    13.1.2 Norm and Inner product
    13.1.3 Two Theorems of F. Riesz
    13.1.4 Orthogonal Sets and Fourier Expansion
    13.1.5 Appendix
  13.2 Bounded Linear Operators in Hilbert Spaces
    13.2.1 Bounded Linear Operators
    13.2.2 The Adjoint Operator
    13.2.3 Classes of Bounded Linear Operators
    13.2.4 Orthogonal Projections
    13.2.5 Spectrum and Resolvent
    13.2.6 The Spectrum of Self-Adjoint Operators

14 Complex Analysis
  14.1 Holomorphic Functions
    14.1.1 Complex Differentiation
    14.1.2 Power Series
    14.1.3 Cauchy–Riemann Equations
  14.2 Cauchy’s Integral Formula
    14.2.1 Integration
    14.2.2 Cauchy’s Theorem
    14.2.3 Cauchy’s Integral Formula
    14.2.4 Applications of the Coefficient Formula
    14.2.5 Power Series
  14.3 Local Properties of Holomorphic Functions
  14.4 Singularities
    14.4.1 Classification of Singularities
    14.4.2 Laurent Series
  14.5 Residues
    14.5.1 Calculating Residues
  14.6 Real Integrals
    14.6.1 Rational Functions in Sine and Cosine
    14.6.2 Integrals of the form $\int_{-\infty}^{\infty} f(x)\,dx$

15 Partial Differential Equations I — an Introduction
  15.1 Classification of PDE
    15.1.1 Introduction
    15.1.2 Examples
  15.2 First Order PDE — The Method of Characteristics
  15.3 Classification of Semi-Linear Second-Order PDEs
    15.3.1 Quadratic Forms
    15.3.2 Elliptic, Parabolic and Hyperbolic
    15.3.3 Change of Coordinates
    15.3.4 Characteristics
    15.3.5 The Vibrating String

16 Distributions
  16.1 Introduction — Test Functions and Distributions
    16.1.1 Motivation
    16.1.2 Test Functions D(Rn) and D(Ω)
  16.2 The Distributions D′(Rn)
    16.2.1 Regular Distributions
    16.2.2 Other Examples of Distributions
    16.2.3 Convergence and Limits of Distributions
    16.2.4 The distribution P(1/x)
    16.2.5 Operation with Distributions
  16.3 Tensor Product and Convolution Product
    16.3.1 The Support of a Distribution
    16.3.2 Tensor Products
    16.3.3 Convolution Product
    16.3.4 Linear Change of Variables
    16.3.5 Fundamental Solutions
  16.4 Fourier Transformation in S(Rn) and S′(Rn)
    16.4.1 The Space S(Rn)
    16.4.2 The Space S′(Rn)
    16.4.3 Fourier Transformation in S′(Rn)
  16.5 Appendix—More about Convolutions

17 PDE II — The Equations of Mathematical Physics
  17.1 Fundamental Solutions
    17.1.1 The Laplace Equation
    17.1.2 The Heat Equation
    17.1.3 The Wave Equation
  17.2 The Cauchy Problem
    17.2.1 Motivation of the Method
    17.2.2 The Wave Equation
    17.2.3 The Heat Equation
    17.2.4 Physical Interpretation of the Results
  17.3 Fourier Method for Boundary Value Problems
    17.3.1 Initial Boundary Value Problems
    17.3.2 Eigenvalue Problems for the Laplace Equation
  17.4 Boundary Value Problems for the Laplace and the Poisson Equations
    17.4.1 Formulation of Boundary Value Problems
    17.4.2 Basic Properties of Harmonic Functions
  17.5 Appendix
    17.5.1 Existence of Solutions to the Boundary Value Problems
    17.5.2 Extremal Properties of Harmonic Functions and the Dirichlet Principle
    17.5.3 Numerical Methods

Chapter 1

Real and Complex Numbers

Basics

Notations

R              Real numbers
C              Complex numbers
Q              Rational numbers
N = {1, 2, ...}  positive integers (natural numbers)
Z              Integers

We know that N ⊆ Z ⊆ Q ⊆ R ⊆ C. We write R+, Q+, and Z+ for the non-negative real, rational, and integer numbers x ≥ 0, respectively. The notions A ⊂ B and A ⊆ B are equivalent. If we want to point out that B is strictly bigger than A, we write A ⊊ B.

We use the following symbols:

:=    defining equation
⇒     implication, “if ..., then ...”
⟺     “if and only if”, equivalence
∀     for all
∃     there exists

Let a < b be fixed real numbers. We denote the intervals as follows:

[a, b]  := {x ∈ R | a ≤ x ≤ b}   closed interval
(a, b)  := {x ∈ R | a < x < b}   open interval
[a, b)  := {x ∈ R | a ≤ x < b}   half-open interval
(a, b]  := {x ∈ R | a < x ≤ b}   half-open interval
[a, ∞)  := {x ∈ R | a ≤ x}      closed half-line
(a, ∞)  := {x ∈ R | a < x}      open half-line
(−∞, b] := {x ∈ R | x ≤ b}      closed half-line
(−∞, b) := {x ∈ R | x < b}      open half-line

(a) Sums and Products

Let us recall the meaning of the sum sign $\sum$ and the product sign $\prod$. Suppose m ≤ n are integers, and $a_k$, k = m, ..., n, are real numbers. Then we set

$$\sum_{k=m}^{n} a_k := a_m + a_{m+1} + \cdots + a_n, \qquad \prod_{k=m}^{n} a_k := a_m a_{m+1} \cdots a_n.$$

In case m = n the sum and the product consist of one summand and one factor only, respectively. In case n < m it is customary to set

$$\sum_{k=m}^{n} a_k := 0 \quad \text{(empty sum)}, \qquad \prod_{k=m}^{n} a_k := 1 \quad \text{(empty product)}.$$
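The empty-sum and empty-product conventions can be checked directly in a few lines of Python (an illustration added here, not part of the original notes; Python's built-in `sum` and `math.prod` happen to follow the same conventions for an empty range):

```python
import math

def sum_range(a, m, n):
    """Sum of a(k) for k = m..n; an empty range (n < m) gives 0."""
    return sum(a(k) for k in range(m, n + 1))

def prod_range(a, m, n):
    """Product of a(k) for k = m..n; an empty range (n < m) gives 1."""
    return math.prod(a(k) for k in range(m, n + 1))

a = lambda k: k  # the sequence a_k = k

print(sum_range(a, 1, 4))   # 1 + 2 + 3 + 4 = 10
print(prod_range(a, 1, 4))  # 1 * 2 * 3 * 4 = 24
print(sum_range(a, 5, 4))   # empty sum  -> 0
print(prod_range(a, 5, 4))  # empty product -> 1
```

The choices 0 and 1 are forced by the requirement that splitting a sum or product at an arbitrary index should remain valid.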
The following rules are obvious: if m ≤ n ≤ p and d ∈ Z are integers, then

$$\sum_{k=m}^{n} a_k + \sum_{k=n+1}^{p} a_k = \sum_{k=m}^{p} a_k, \qquad \sum_{k=m}^{n} a_k = \sum_{k=m+d}^{n+d} a_{k-d} \quad \text{(index shift)}.$$

We have, for a ∈ R, $\sum_{k=m}^{n} a = (n - m + 1)a$.

(b) Mathematical Induction

Mathematical induction is a powerful method to prove theorems about natural numbers.

Theorem 1.1 (Principle of Mathematical Induction) Let n₀ ∈ Z be an integer. To prove a statement A(n) for all integers n ≥ n₀ it is sufficient to show:
(I) A(n₀) is true (base case).
(II) For any n ≥ n₀: if A(n) is true, so is A(n + 1) (induction step).

It is easy to see how the principle works: first, A(n₀) is true. Applying (II) to n = n₀ we obtain that A(n₀ + 1) is true. Successive application of (II) yields that A(n₀ + 2), A(n₀ + 3), and so on, are true.

Example 1.1 (a) For all nonnegative integers n we have
$$\sum_{k=1}^{n} (2k - 1) = n^2.$$

Proof. We use induction over n. In case n = 0 we have an empty sum on the left-hand side (lhs) and 0² = 0 on the right-hand side (rhs); hence the statement is true for n = 0. Suppose it is true for some fixed n; we shall prove it for n + 1. By the definition of the sum and by the induction hypothesis $\sum_{k=1}^{n}(2k-1) = n^2$, we have
$$\sum_{k=1}^{n+1} (2k - 1) = \sum_{k=1}^{n} (2k - 1) + 2(n + 1) - 1 \overset{\text{ind.\ hyp.}}{=} n^2 + 2n + 1 = (n + 1)^2.$$
This proves the claim for n + 1. □

(b) For all positive integers n ≥ 8 we have 2ⁿ > 3n².

Proof. In case n = 8 we have 2⁸ = 256 > 192 = 3 · 64 = 3 · 8², and the statement is true in this case. Suppose it is true for some fixed n ≥ 8, i.e. 2ⁿ > 3n² (induction hypothesis). We will show that the statement is true for n + 1, i.e. 2ⁿ⁺¹ > 3(n + 1)² (induction assertion). Note that n ≥ 8 implies

$$n - 1 \ge 7 > 2 \;\Longrightarrow\; (n-1)^2 > 4 > 2 \;\Longrightarrow\; n^2 - 2n - 1 > 0 \;\Longrightarrow\; 3(n^2 - 2n - 1) > 0$$
$$\Longrightarrow\; 3n^2 - 6n - 3 > 0 \;\overset{+\,3n^2 + 6n + 3}{\Longrightarrow}\; 6n^2 > 3n^2 + 6n + 3 \;\Longrightarrow\; 2 \cdot 3n^2 > 3(n^2 + 2n + 1) = 3(n + 1)^2. \tag{1.1}$$

By the induction assumption, 2ⁿ⁺¹ = 2 · 2ⁿ > 2 · 3n². This together with (1.1) yields 2ⁿ⁺¹ > 3(n + 1)².
Thus we have shown the induction assertion. Hence the statement is true for all positive integers n ≥ 8. □

For a positive integer n ∈ N we set $n! := \prod_{k=1}^{n} k$ (read: “n factorial”), with 0! = 1! = 1.

(c) Binomial Coefficients

For non-negative integers n, k ∈ Z+ we define
$$\binom{n}{k} := \frac{n(n-1)\cdots(n-k+1)}{k(k-1)\cdots 2 \cdot 1} = \prod_{i=1}^{k} \frac{n-i+1}{i}.$$

The numbers $\binom{n}{k}$ (read: “n choose k”) are called binomial coefficients since they appear in the binomial theorem, see Proposition 1.4 below. It follows directly from the definition that
$$\binom{n}{k} = 0 \text{ for } k > n, \qquad \binom{n}{k} = \binom{n}{n-k} = \frac{n!}{k!(n-k)!} \text{ for } 0 \le k \le n.$$

Lemma 1.2 For 0 ≤ k ≤ n we have
$$\binom{n}{k} + \binom{n}{k+1} = \binom{n+1}{k+1}.$$

Proof. For k = n the formula is obvious. For 0 ≤ k ≤ n − 1 we have
$$\binom{n}{k} + \binom{n}{k+1} = \frac{n!}{k!(n-k)!} + \frac{n!}{(k+1)!(n-k-1)!} = \frac{(k+1)\,n! + (n-k)\,n!}{(k+1)!(n-k)!} = \frac{(n+1)!}{(k+1)!(n-k)!} = \binom{n+1}{k+1}. \;\square$$

We say that X is an n-set if X has exactly n elements. We write Card X = n (from “cardinality”) to denote the number of elements of X.

Lemma 1.3 The number of k-subsets of an n-set is $\binom{n}{k}$.

The lemma in particular shows that $\binom{n}{k}$ is always an integer (which is not obvious from its definition).

Proof. We denote the number of k-subsets of an n-set $X_n$ by $C_k^n$. It is clear that $C_0^n = C_n^n = 1$, since ∅ is the only 0-subset of $X_n$ and $X_n$ itself is its only n-subset. We use induction over n. The case n = 1 is obvious since $C_0^1 = C_1^1 = \binom{1}{0} = \binom{1}{1} = 1$. Suppose that the claim is true for some fixed n. We will show the statement for the (n + 1)-set X = {1, ..., n + 1} and all k with 1 ≤ k ≤ n. The family of (k + 1)-subsets of X splits into two disjoint classes: in the first class $A_1$ every subset contains n + 1; in the second class $A_2$, none does. To form a subset in $A_1$ one has to choose another k elements out of {1, ..., n}; by the induction assumption this number is $\mathrm{Card}\,A_1 = C_k^n = \binom{n}{k}$. To form a subset in $A_2$ one has to choose k + 1 elements out of {1, ..., n}; by the induction assumption this number is $\mathrm{Card}\,A_2 = C_{k+1}^n = \binom{n}{k+1}$. By Lemma 1.2 we obtain
$$C_{k+1}^{n+1} = \mathrm{Card}\,A_1 + \mathrm{Card}\,A_2 = \binom{n}{k} + \binom{n}{k+1} = \binom{n+1}{k+1},$$
which proves the induction assertion. □

Proposition 1.4 (Binomial Theorem) Let x, y ∈ R and n ∈ N. Then we have
$$(x + y)^n = \sum_{k=0}^{n} \binom{n}{k} x^{n-k} y^k.$$

Proof. We give a direct proof. Using the distributive law we find that each of the 2ⁿ summands of the product (x + y)ⁿ has the form $x^{n-k} y^k$ for some k = 0, ..., n. We number the n factors as $(x + y)^n = f_1 \cdot f_2 \cdots f_n$, $f_1 = f_2 = \cdots = f_n = x + y$. Let us count how often the summand $x^{n-k} y^k$ appears. We have to choose k factors y out of the n factors $f_1, \dots, f_n$; the remaining n − k factors must be x. This gives a one-to-one correspondence between the k-subsets of $\{f_1, \dots, f_n\}$ and the different summands of the form $x^{n-k} y^k$. Hence, by Lemma 1.3, their number is $C_k^n = \binom{n}{k}$. This proves the proposition. □

1.1 Real Numbers

In this lecture course we assume the system of real numbers to be given. Recall that the set of integers is Z = {0, ±1, ±2, ...}, while the fractions of integers Q = {m/n | m, n ∈ Z, n ≠ 0} form the set of rational numbers. A satisfactory discussion of the main concepts of analysis, such as convergence, continuity, differentiation, and integration, must be based on an accurately defined number concept. An existence proof for the real numbers is given in [Rud76, Appendix to Chapter 1], where the author explicitly constructs the real numbers R starting from the rational numbers Q. The aim of the following two sections is to formulate the axioms which are sufficient to derive all properties and theorems of the real number system.

The rational numbers are inadequate for many purposes, both as a field and as an ordered set. For instance, there is no rational x with x² = 2. This leads to the introduction of irrational numbers, which are often written as infinite decimal expansions and are considered to be “approximated” by the corresponding finite decimals.
Thus the sequence 1, 1.4, 1.41, 1.414, 1.4142, ... “tends to √2.” But unless the irrational number √2 has been clearly defined, the question must arise: what is it that this sequence “tends to”? This sort of question can be answered as soon as the so-called “real number system” is constructed.

Example 1.2 As shown in the exercise class, there is no rational number x with x² = 2. Set A = {x ∈ Q+ | x² < 2} and B = {x ∈ Q+ | x² > 2}. Then A ∪ B = Q+ and A ∩ B = ∅. One can show that, in the rational number system, A has no largest element and B has no smallest element; for details see Appendix A or Rudin’s book [Rud76, Example 1.1, page 2].

This example shows that the system of rational numbers has certain gaps, in spite of the fact that between any two rationals there is another: if r < s then r < (r + s)/2 < s. The real number system fills these gaps. This is the principal reason for the fundamental role it plays in analysis.

We start with a brief discussion of the general concepts of ordered set and field.

1.1.1 Ordered Sets

Definition 1.1 (a) Let S be a set. An order (or total order) on S is a relation, denoted by <, with the following properties. Let x, y, z ∈ S.
(i) One and only one of the following statements is true: x < y, x = y, y < x (trichotomy).
(ii) x < y and y < z implies x < z (transitivity).

In this case S is called an ordered set.

(b) Suppose (S, <) is an ordered set and E ⊆ S. If there exists a β ∈ S such that x ≤ β for all x ∈ E, we say that E is bounded above, and call β an upper bound of E. Lower bounds are defined in the same way with ≥ in place of ≤. If E is both bounded above and bounded below, we say that E is bounded.

The statement x < y may be read as “x is less than y” or “x precedes y”. It is convenient to write y > x instead of x < y. The notation x ≤ y indicates x < y or x = y. In other words, x ≤ y is the negation of x > y.
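On a finite set the two order axioms can be verified mechanically. The following Python sketch is an illustration added to these notes (the helper name `is_total_order` is ours, not from the text): it brute-forces trichotomy over all pairs and transitivity over all triples.

```python
from itertools import product

def is_total_order(S, less):
    """Check trichotomy and transitivity of the relation `less` on the finite set S."""
    for x, y in product(S, repeat=2):
        # trichotomy: exactly one of x < y, x = y, y < x holds
        if [less(x, y), x == y, less(y, x)].count(True) != 1:
            return False
    for x, y, z in product(S, repeat=3):
        # transitivity: x < y and y < z imply x < z
        if less(x, y) and less(y, z) and not less(x, z):
            return False
    return True

S = {1, 2, 3, 4}
print(is_total_order(S, lambda x, y: x < y))   # True: the usual order
print(is_total_order(S, lambda x, y: x != y))  # False: trichotomy fails
```

The second call fails because for distinct x, y both `less(x, y)` and `less(y, x)` hold, violating trichotomy.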
For example, R is an ordered set if r < s is defined to mean that s − r is a positive real number.

Example 1.3 (a) The intervals [a, b], (a, b], [a, b), (a, b), (−∞, b), and (−∞, b] are bounded above by b and by every number greater than b.
(b) E := {1/n | n ∈ N} = {1, 1/2, 1/3, ...} is bounded above by any α ≥ 1. It is bounded below by 0.

Definition 1.2 Suppose S is an ordered set, E ⊆ S, and E is bounded above. Suppose there exists an α ∈ S such that
(i) α is an upper bound of E;
(ii) if β is an upper bound of E, then α ≤ β.
Then α is called the supremum (or least upper bound) of E. We write α = sup E.

An equivalent formulation of (ii) is the following:
(ii)′ If β < α then β is not an upper bound of E.

The infimum (or greatest lower bound) of a set E which is bounded below is defined in the same manner: the statement α = inf E means that α is a lower bound of E and that β ≤ α for all lower bounds β of E.

Example 1.4 (a) If α = sup E exists, then α may or may not belong to E. For instance, consider [0, 1) and [0, 1]. Then 1 = sup[0, 1) = sup[0, 1]; however 1 ∉ [0, 1) but 1 ∈ [0, 1].
We show that sup[0, 1] = 1. Obviously, 1 is an upper bound of [0, 1]. Suppose that β < 1; then β is not an upper bound of [0, 1] since β ≱ 1. Hence 1 = sup[0, 1].
We show that sup[0, 1) = 1. Obviously, 1 is an upper bound of this interval. Suppose that β < 1 (we may assume β ≥ 0, since otherwise 0 ∈ [0, 1) already exceeds β). Then β < (β + 1)/2 < 1. Since (β + 1)/2 ∈ [0, 1), β is not an upper bound. Consequently, 1 = sup[0, 1).

(b) Consider the sets A and B of Example 1.2 as subsets of the ordered set Q. Since A ∪ B = Q+ (there is no rational number with x² = 2), the upper bounds of A are exactly the elements of B. Indeed, if a ∈ A and b ∈ B then a² < 2 < b²; taking square roots, a < b. Since B contains no smallest member, A has no supremum in Q+. Similarly, B is bounded below by every element of A. Since A has no largest member, B has no infimum in Q.
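The claim that A has no largest element can be made concrete with the construction from Rudin’s Example 1.1: for q ∈ A, the rational q′ = q − (q² − 2)/(q + 2) = (2q + 2)/(q + 2) satisfies q < q′ and q′² < 2. A short check with exact rational arithmetic (an illustration added here, not part of the original notes):

```python
from fractions import Fraction

def next_in_A(q):
    """Rudin's map: for q in A = {x in Q+ : x^2 < 2}, return q' in A with q' > q."""
    return q - (q * q - 2) / (q + 2)

q = Fraction(1)
for _ in range(5):
    q_next = next_in_A(q)
    assert q_next > q and q_next * q_next < 2  # strictly larger, still in A
    q = q_next

print(q, float(q))  # a rational approximation of sqrt(2) from below
```

Iterating the map produces ever larger elements of A, so no member of A can be the largest; a symmetric map shows B has no smallest member.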
Remarks 1.1 (a) It is clear from (ii) and the trichotomy of < that there is at most one such α. Indeed, suppose α′ also satisfies (i) and (ii), by (ii) we have α ≤ α′ and α′ ≤ α; hence α = α′ . (b) If sup E exists and belongs to E, we call it the maximum of E and denote it by max E. Hence, max E = sup E and max E ∈ E. Similarly, if the infimum of E exists and belongs to E we call it the minimum and denote it by min E; min E = inf E, min E ∈ E. bounded subset of Q [0, 1] [0, 1) A an upper bound sup 2 2 2 1 1 — max 1 — — (c) Suppose that α is an upper bound of E and α ∈ E then α = max E, that is, property (ii) in Definition 1.2 is automatically satisfied. Similarly, if β ∈ E is a lower bound, then β = min E. (d) If E is a finite set it has always a maximum and a minimum. 1.1.2 Fields Definition 1.3 A field is a set F with two operations, called addition and multiplication which satisfy the following so-called “field axioms” (A), (M), and (D): (A) Axioms for addition (A1) (A2) (A3) (A4) (A5) If x ∈ F and y ∈ F then their sum x + y is in F . Addition is commutative: x + y = y + x for all x, y ∈ F . Addition is associative: (x + y) + z = x + (y + z) for all x, y, z ∈ F . F contains an element 0 such that 0 + x = x for all x ∈ F . To every x ∈ F there exists an element −x ∈ F such that x + (−x) = 0. (M) Axioms for multiplication (M1) (M2) (M3) (M4) (M5) If x ∈ F and y ∈ F then their product xy is in F . Multiplication is commutative: xy = yx for all x, y ∈ F . Multiplication is associative: (xy)z = x(yz) for all x, y, z ∈ F . F contains an element 1 such that 1x = x for all x ∈ F . If x ∈ F and x 6= 0 then there exists an element 1/x ∈ F such that x · (1/x) = 1. (D) The distributive law x(y + z) = xy + xz holds for all x, y, z ∈ F . 1 Real and Complex Numbers 18 Remarks 1.2 (a) One usually writes x − y, x , x + y + z, xyz, x2 , x3 , 2x, . . . y in place of 1 x + (−y), x · , (x + y) + z, (xy)z, x · x, x · x · x, 2x, . . . 
y (b) The field axioms clearly hold in Q if addition and multiplication have their customary meaning. Thus Q is a field. The integers Z form not a field since 2 ∈ Z has no multiplicative inverse (axiom (M5) is not fulfilled). (c) The smallest field is F2 = {0, 1} consisting of the neutral element 0 for addition and the neu+ 0 1 tral element 1 for multiplication. Multiplication and addition are defined as follows 0 0 1 1 1 0 · 0 1 0 0 0 . It is easy to check the field axioms (A), (M), and (D) directly. 1 0 1 (d) (A1) to (A5) and (M1) to (M5) mean that both (F, +) and (F \ {0}, ·) are commutative (or abelian) groups, respectively. Proposition 1.5 The axioms of addition imply the following statements. (a) If x + y = x + z then y = z (Cancellation law). (b) If x + y = x then y = 0 (The element 0 is unique). (c) If x + y = 0 the y = −x (The inverse −x is unique). (d) −(−x) = x. Proof. If x + y = x + z, the axioms (A) give y = 0 + y = (−x + x) + y = −x + (x + y) = −x + (x + z) A4 A5 A3 assump. = (−x + x) + z = 0 + z = z. A3 A5 A4 This proves (a). Take z = 0 in (a) to obtain (b). Take z = −x in (a) to obtain (c). Since −x + x = 0, (c) with −x in place of x and x in place of y, gives (d). Proposition 1.6 The axioms for multiplication imply the following statements. (a) If x 6= 0 and xy = xz then y = z (Cancellation law). (b) If x 6= 0 and xy = x then y = 1 (The element 1 is unique). (c) If x 6= 0 and xy = 1 then y = 1/x (The inverse 1/x is unique). (d) If x 6= 0 then 1/(1/x) = x. The proof is so similar to that of Proposition 1.5 that we omit it. 1.1 Real Numbers 19 Proposition 1.7 The field axioms imply the following statements, for any x, y, z ∈ F (a) 0x = 0. (b) If xy = 0 then x = 0 or y = 0. (c) (−x)y = −(xy) = x(−y). (d) (−x)(−y) = xy. Proof. 0x + 0x = (0 + 0)x = 0x. Hence 1.5 (b) implies that 0x = 0, and (a) holds. Suppose to the contrary that both x 6= 0 and y 6= 0 then (a) gives 1 1 1 1 1 = · xy = · 0 = 0, y x y x a contradiction. Thus (b) holds. 
The first equality in (c) comes from (−x)y + xy = (−x + x)y = 0y = 0, combined with 1.5 (b); the other half of (c) is proved in the same way. Finally, (−x)(−y) = −[x(−y)] = −[−xy] = xy by (c) and 1.5 (d). 1.1.3 Ordered Fields In analysis dealing with equations is as important as dealing with inequalities. Calculations with inequalities are based on the ordering axioms. It turns out that all can be reduced to the notion of positivity. In F there are distinguished positive elements (x > 0) such that the following axioms are valid. Definition 1.4 An ordered field is a field F which is also an ordered set, such that for all x, y, z ∈ F (O) Axioms for ordered fields (O1) x > 0 and y > 0 implies x + y > 0, (O2) x > 0 and y > 0 implies xy > 0. If x > 0 we call x positive; if x < 0, x is negative. For example Q and R are ordered fields, if x > y is defined to mean that x − y is positive. Proposition 1.8 The following statements are true in every ordered field F . (a) If x < y and a ∈ F then a + x < a + y. (b) If x < y and x′ < y ′ then x + x′ < y + y ′ . 1 Real and Complex Numbers 20 Proof. (a) By assumption (a + y) − (a + x) = y − x > 0. Hence a + x < a + y. (b) By assumption and by (a) we have x + x′ < y + x′ and y + x′ < y + y ′ . Using transitivity, see Definition 1.1 (ii), we have x + x′ < y + y ′. Proposition 1.9 The following statements are true in every ordered field. (a) If x > 0 then −x < 0, and if x < 0 then −x > 0. (b) If x > 0 and y < z then xy < xz. (c) If x < 0 and y < z then xy > xz. (d) If x 6= 0 then x2 > 0. In particular, 1 > 0. (e) If 0 < x < y then 0 < 1/y < 1/x. Proof. (a) If x > 0 then 0 = −x + x > −x + 0 = −x, so that −x < 0. If x < 0 then 0 = −x + x < −x + 0 = −x so that −x > 0. This proves (a). (b) Since z > y, we have z − y > 0, hence x(z − y) > 0 by axiom (O2), and therefore xz = x(z − y) + xy > P rp. 1.8 0 + xy = xy. (c) By (a), (b) and Proposition 1.7 (c) −[x(z − y)] = (−x)(z − y) > 0, so that x(z − y) < 0, hence xz < xy. 
(d) If x > 0 axiom 1.4 (ii) gives x2 > 0. If x < 0 then −x > 0, hence (−x)2 > 0 But x2 = (−x)2 by Proposition 1.7 (d). Since 12 = 1, 1 > 0. (e) If y > 0 and v ≤ 0 then yv ≤ 0. But y · (1/y) = 1 > 0. Hence 1/y > 0, likewise 1/x > 0. If we multiply x < y by the the positive quantity (1/x)(1/y), we obtain 1/y < 1/x. Remarks 1.3 (a) The finite field F2 = {0, 1}, see Remarks 1.2, is not an ordered field since 1 + 1 = 0 which contradicts 1 > 0. (b) The field of complex numbers C (see below) is not an ordered field since i2 = −1 contradicts Proposition 1.9 (a), (d). 1.1.4 Embedding of natural numbers into the real numbers Let F be an ordered field. We want to recover the integers inside F . In order to distinguish 0 and 1 in F from the integers 0 and 1 we temporarily write 0F and 1F . For a positive integer n ∈ N, n ≥ 2 we define nF := 1F + 1F + · · · + 1F Lemma 1.10 We have nF > 0F for all n ∈ N. (n times). 1.1 Real Numbers 21 Proof. We use induction over n. By Proposition 1.9 (d) the statement is true for n = 1. Suppose it is true for a fixed n, i. e. nF > 0F . Moreover 1F > 0F . Using axiom (O2) we obtain (n + 1)1F = nF + 1F > 0. From Lemma 1.10 it follows that m 6= n implies nF 6= mF . Indeed, let n be greater than m, say n = m + k for some k ∈ N, then nF = mF + kF . Since kF > 0 it follows from 1.8 (a) that nF > mF . In particular, nF 6= mF . Hence, the mapping N → F, n 7→ nF is a one-to-one correspondence (injective). In this way the positive integers are embedded into the real numbers. Addition and multiplication of natural numbers and of its embeddings are the same: nF mF = (nm)F . nF + mF = (n + m)F , From now on we identify a natural number with the associated real number. We write n for nF . Definition 1.5 (The Archimedean Axiom) An ordered field F is called Archimedean if for all x, y ∈ F with x > 0 and y > 0 there exists n ∈ N such that nx > y. An equivalent formulation is: The subset N ⊂ F of positive integers is not bounded above. 
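The Archimedean axiom is constructive in Q and R: given x, y > 0, the least n with nx > y is simply ⌊y/x⌋ + 1. A minimal sketch (the helper name `archimedean_n` is ours; exact binary fractions are used so the comparisons are not disturbed by rounding):

```python
import math

def archimedean_n(x, y):
    """Smallest positive integer n with n*x > y, for x > 0.
    Its existence for all x, y > 0 is exactly the Archimedean axiom."""
    assert x > 0
    return max(1, math.floor(y / x) + 1)

n = archimedean_n(0.25, 1000.0)   # even a tiny x eventually overtakes y
assert n * 0.25 > 1000.0          # n works ...
assert (n - 1) * 0.25 <= 1000.0   # ... and is minimal
```

Note that this computation already presupposes the Archimedean property (the floor function exists because N is not bounded above); the non-Archimedean field R(t) of Appendix A admits no such n for x = 1, y = t.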
Choose x = 1 in the above definition; then for any y ∈ F there is an n ∈ N such that n > y, hence N is not bounded above. Conversely, suppose N is not bounded above and x > 0, y > 0 are given. Then y/x is not an upper bound for N, that is, there is some n ∈ N with n > y/x, i.e. nx > y. 1.1.5 The completeness of R Using the axioms so far we are not yet able to prove the existence of irrational numbers. We need the completeness axiom. Definition 1.6 (Order Completeness) An ordered set S is said to be order complete if every non-empty subset E ⊆ S which is bounded above has a supremum sup E in S. (C) Completeness Axiom The real numbers are order complete, i.e. every non-empty subset E ⊂ R which is bounded above has a supremum. The set Q of rational numbers is not order complete since, for example, the bounded set A = {x ∈ Q₊ | x² < 2} has no supremum in Q. Later we will define √2 := sup A. The existence of √2 in R is furnished by the completeness axiom (C). Axiom (C) implies that every non-empty subset E ⊂ R which is bounded below has an infimum. This is an easy consequence of Homework 1.4 (a). We will see that an order complete field is always Archimedean. Proposition 1.11 (a) R is Archimedean. (b) If x, y ∈ R and x < y, then there is a p ∈ Q with x < p < y. Part (b) may be stated by saying that Q is dense in R. Proof. (a) Suppose, to the contrary, that x, y > 0 are real numbers which do not fulfill the Archimedean property. Then, with A := {nx | n ∈ N}, y would be an upper bound of A, and (C) furnishes a supremum α = sup A. Since x > 0, we have α − x < α, so α − x is not an upper bound of A. Hence α − x < mx for some m ∈ N. But then α < (m + 1)x, which is impossible, since α is an upper bound of A. (b) See [Rud76, Theorem 29]. Remarks 1.4 (a) If x, y ∈ Q with x < y, then there exists z ∈ R \ Q with x < z < y; choose z = x + (y − x)/√2. (b) (Exercise class) We shall show that inf{1/n | n ∈ N} = 0. Since n > 0 for all n ∈ N, Proposition 1.9 (e) gives 1/n > 0, so 0 is a lower bound. Suppose α > 0. 
Since R is Archimedean, we find m ∈ N such that 1 < mα or, equivalently 1/m < α. Hence, α is not a lower bound for E which proves the claim. (c) Axiom (C) is equivalent to the Archimedean property together with the topological completeness (“Every Cauchy sequence in R is convergent,” see Proposition 2.18). (d) Axiom (C) is equivalent to the axiom of nested intervals, see Proposition 2.11 below: Let In := [an , bn ] a sequence of closed nested intervals, that is (I1 ⊇ I2 ⊇ I3 ⊇ · · · ) such that for all ε > 0 there exists n0 such that 0 ≤ bn − an < ε for all n ≥ n0 . Then there exists a unique real number a ∈ R which is a member of all intervals, T i. e. {a} = n∈N In . 1.1.6 The Absolute Value For x ∈ R one defines | x | := ( x, −x, if if x ≥ 0, x < 0. Lemma 1.12 For a, x, y ∈ R we have (a) | x | ≥ 0 and | x | = 0 if and only if x = 0. Further | −x | = | x |. (b) ±x ≤ | x |, | x | = max{x, −x}, and | x | ≤ a ⇐⇒ (x ≤ a and (c) | xy | = | x | | y | and xy = || xy || if y 6= 0. (d) | x + y | ≤ | x | + | y | (triangle inequality). (e) | | x | − | y | | ≤ | x + y |. − x ≤ a). Proof. (a) By Proposition 1.9 (a), x < 0 implies | x | = −x > 0. Also, x > 0 implies | x | > 0. Putting both together we obtain, x 6= 0 implies | x | > 0 and thus | x | = 0 implies x = 0. Moreover | 0 | = 0. This shows the first part. The statement | x | = | −x | follows from (b) and −(−x) = x. (b) Suppose first that x ≥ 0. Then x ≥ 0 ≥ −x and we have max{x, −x} = x = | x |. If x < 0 then −x > 0 > x and 1.1 Real Numbers 23 max{−x, x} = −x = | x |. This proves max{x, −x} = | x |. Since the maximum is an upper bound, | x | ≥ x and | x | ≥ −x. Suppose now a is an upper bound of {x, −x}. Then | x | = max{x, −x} ≤ a. On the other hand, max{x, −x} ≤ a implies that a is an upper bound of {x, −x} since max is. One proves the first part of (c) by verifying the four cases (i) x, y ≥ 0, (ii) x ≥ 0, y < 0, (iii) x < 0, y ≥ 0, and (iv) x, y < 0 separately. (i) is clear. 
In case (ii) we have by Proposition 1.9 (a) and (b) that xy ≤ 0, and Proposition 1.7 (c) | x | | y | = x(−y) = −(xy) = | xy | . The cases (iii) and (iv) are similar. To the second part. Since x = xy · y we have by the first part of (c), | x | = xy | y |. The claim follows by multipli- cation with | y1 | . (d) By (b) we have ±x ≤ | x | and ±y ≤ | y |. It follows from Proposition 1.8 (b) that ±(x + y) ≤ | x | + | y | . By the second part of (b) with a = | x | + | y |, we obtain | x + y | ≤ | x | + | y |. (e) Inserting u := x + y and v := −y into | u + v | ≤ | u | + | v | one obtains | x | ≤ | x + y | + | −y | = | x + y | + | y | . Adding − | y | on both sides one obtains | x | − | y | ≤ | x + y |. Changing the role of x and y in the last inequality yields −(| x | − | y |) ≤ | x + y |. The claim follows again by (b) with a = | x + y |. 1.1.7 Supremum and Infimum revisited The following equivalent definition for the supremum of sets of real numbers is often used in the sequel. Note that x≤β =⇒ sup M ≤ β. ∀x ∈ M Similarly, α ≤ x for all x ∈ M implies α ≤ inf M. Remarks 1.5 (a) Suppose that E ⊂ R. Then α is the supremum of E if and only if (1) α is an upper bound for E, (2) For all ε > 0 there exists x ∈ E with α − ε < x. Using the Archimedean axiom (2) can be replaced by (2’) For all n ∈ N there exists x ∈ E such that α − 1 n < x. 1 Real and Complex Numbers 24 (b) Let M ⊂ R and N ⊂ R nonempty subsets which are bounded above. Then M + N := {m + n | m ∈ M, n ∈ N} is bounded above and sup(M + N) = sup M + sup N. (c) Let M ⊂ R+ and N ⊂ R+ nonempty subsets which are bounded above. Then MN := {mn | m ∈ M, n ∈ N} is bounded above and sup(MN) = sup M sup N. 1.1.8 Powers of real numbers We shall prove the existence of nth roots of positive reals. We already know xn , n ∈ Z. It is 1 recursively defined by xn := xn−1 · x, x1 := x, n ∈ N and xn := x−n for n < 0. Proposition 1.13 (Bernoulli’s inequality) Let x ≥ −1 and n ∈ N. Then we have (1 + x)n ≥ 1 + nx. 
Equality holds if and only if x = 0 or n = 1. Proof. We use induction over n. In the cases n = 1 and x = 0 we have equality. The strict inequality (with an > sign in place of the ≥ sign) holds for n0 = 2 and x 6= 0 since (1 + x)2 = 1 + 2x + x2 > 1 + 2x. Suppose the strict inequality is true for some fixed n ≥ 2 and x 6= 0. Since 1 + x ≥ 0 by Proposition 1.9 (b) multiplication of the induction assumption by this factor yields (1 + x)n+1 ≥ (1 + nx)(1 + x) = 1 + (n + 1)x + nx2 > 1 + (n + 1)x. This proves the strict assertion for n + 1. We have equality only if n = 1 or x = 0. Lemma 1.14 (a) For x, y ∈ R with x, y > 0 and n ∈ N we have x < y ⇐⇒ xn < y n . (b) For x, y ∈ R+ and n ∈ N we have nxn−1 (y − x) ≤ y n − xn ≤ ny n−1 (y − x). We have equality if and only if n = 1 or x = y. Proof. (a) Observe that n n y − x = (y − x) n X k=1 y n−k xk−1 = c(y − x) (1.2) 1.1 Real Numbers 25 P with c := nk=1 y n−k xk−1 > 0 since x, y > 0. The claim follows. (b) We have y n − xn − nxn−1 (y − x) = (y − x) = (y − x) n X k=1 n X k=1 y n−k xk−1 − xn−1 xk−1 y n−k − xn−k ≥ 0 since by (a) y − x and y n−k − xn−k have the same sign. The proof of the second inequality is quite analogous. Proposition 1.15 For every real x > 0 and every positive integer n ∈ N there is one and only one y > 0 such that y n = x. √ 1 This number y is written n x or x n , and it is called “the nth root of x”. Proof. The uniqueness is clear since by Lemma 1.14 (a) 0 < y1 < y2 implies 0 < y1n < y2n . Set E := {t ∈ R+ | tn < x}. Observe that E 6= ∅ since 0 ∈ E. We show that E is bounded above. By Bernoulli’s inequality and since 0 < x < nx we have t ∈ E ⇔ tn < x < 1 + nx < (1 + x)n =⇒ t<1+x Lemma 1.14 Hence, 1 + x is an upper bound for E. By the order completeness of R there exists y ∈ R such that y = sup E. We have to show that y n = x. For, we will show that each of the inequalities y n > x and y n < x leads to a contradiction. Assume y n < x and consider (y + h)n with “small” h (0 < h < 1). 
Lemma 1.14 (b) implies 0 ≤ (y + h)n − y n ≤ n (y + h)n−1 (y + h − y) < h n(y + 1)n−1 . Choosing h small enough that h n(y + 1)n−1 < x − y n we may continue (y + h)n − y n ≤ x − y n . Consequently, (y + h)n < x and therefore y + h ∈ E. Since y + h > y, this contradicts the fact that y is an upper bound of E. Assume y n > x and consider (y − h)n with “small” h (0 < h < 1). Again by Lemma 1.14 (b) we have 0 ≤ y n − (y − h)n ≤ n y n−1(y − y + h) < h ny n−1. Choosing h small enough that h ny n−1 < y n − x we may continue y n − (y − h)n ≤ y n − x. 1 Real and Complex Numbers 26 Consequently, x < (y − h)n and therefore tn < x < (y − h)n for all t in E. Hence y − h is an upper bound for E smaller than y. This contradicts the fact that y is the least upper bound. Hence y n = x, and the proof is complete. Remarks 1.6 (a) If a and b are positive real numbers and n ∈ N then (ab)1/n = a1/n b1/n . Proof. Put α = a1/n and β = b1/n . Then ab = αn β n = (αβ)n , since multiplication is commutative. The uniqueness assertion of Proposition 1.15 shows therefore that (ab)1/n = αβ = a1/n b1/n . (b) Fix b > 0. If m, n, p, q ∈ Z and n > 0, q > 0, and r = m/n = p/q. Then we have (bm )1/n = (bp )1/q . (1.3) Hence it makes sense to define br = (bm )1/n . (c) Fix b > 1. If x ∈ R define bx = sup{bp | p ∈ Q, p < x}. For 0 < b < 1 set bx = (1.4) 1 . 1 x b Without proof we give the familiar laws for powers and exponentials. Later we will redefine the power bx with real exponent. Then we are able to give easier proofs. (d) If a, b > 0 and x, y ∈ R, then x (i) bx+y = bx by , bx−y = bby , (ii) bxy = (bx )y , (iii) (ab)x = ax bx . 1.1.9 Logarithms Fix b > 1, y > 0. Similarly as in the preceding subsection, one can prove the existence of a unique real x such that bx = y. This number x is called the logarithm of y to the base b, and we write x = logb y. Knowing existence and uniqueness of the logarithm, it is not difficult to prove the following properties. 
Lemma 1.16 For any a > 0, a 6= 1 we have (a) loga (bc) = loga b + loga c if b, c > 0; (b) loga (bc ) = c loga b, if b > 0; logd b if b, d > 0 and d 6= 1. (c) loga b = logd a Later we will give an alternative definition of the logarithm function. 1.1 Real Numbers 27 Review of Trigonometric Functions (a) Degrees and Radians The following table gives some important angles in degrees and radians. The precise definition of π is given below. For a moment it is just an abbreviation to measure angles. Transformation 2π of angles αdeg measured in degrees into angles measured in radians goes by αrad = αdeg 360 ◦. Degrees 0◦ Radians 0 30◦ 45◦ 60◦ 90◦ 120◦ 135◦ 150◦ π 6 π 4 π 3 π 2 2π 3 3π 4 5π 6 180◦ π 270◦ 3π 2 360◦ 2π (b) Sine and Cosine The sine, cosine, and tangent functions are defined in terms of ratios of sides of a right triangle: length of the side adjacent to ϕ , length of the hypotenuse length of the side opposite to ϕ , sin ϕ = length of the hypotenuse length of the side opposite to ϕ tan ϕ = . length of the side adjacent to θ cos ϕ = hypotenuse opposite side . ϕ adjacent side Let ϕ be any angle between 0◦ and 360◦ . Further let P be the point on the unit circle (with center in (0, 0) and radius 1) such that the ray from P to the origin (0, 0) and the positive x-axis make an angle ϕ. Then cos ϕ and sin ϕ are defined to be the x-coordinate and the y-coordinate of the point P , respectively. y P 1 sin ϕ ϕ cos ϕ x If the angle ϕ is between 0◦ and 90◦ this new definition coincides with the definition using the right triangle since the hypotenuse which is a radius of the unit circle has now length 1. y P If 90◦ < ϕ < 180◦ we find sin ϕ ϕ cos ϕ x cos ϕ = − cos(180◦ − ϕ) < 0, sin ϕ = sin(180◦ − ϕ) > 0. 1 Real and Complex Numbers 28 y If 180◦ < ϕ < 270◦ we find ϕ cos ϕ cos ϕ = − cos(ϕ − 180◦ ) < 0, x sin ϕ = − sin(ϕ − 180◦ ) < 0. sin ϕ P y If 270◦ < ϕ < 360◦ we find ϕ cos ϕ cos ϕ = cos(360◦ − ϕ) > 0, x sin ϕ = − sin(360◦ − ϕ) < 0. 
sin ϕ 1 P For angles greater than 360◦ or less than 0◦ define cos ϕ = cos(ϕ + k·360◦), sin ϕ = sin(ϕ + k·360◦), where k ∈ Z is chosen such that 0◦ ≤ ϕ + k 360◦ < 360◦ . Thinking of ϕ to be given in radians, cosine and sine are functions defined for all real ϕ taking values in the closed interval [−1, 1]. If ϕ 6= π2 + kπ, k ∈ Z then cos ϕ 6= 0 and we define tan ϕ := If ϕ 6= kπ, k ∈ Z then sin ϕ 6= 0 and we define cot ϕ := sin ϕ . cos ϕ cos ϕ . sin ϕ In this way we have defined cosine, sine, tangent, and cotangent for arbitrary angles. (c) Special Values x in degrees 0◦ 30◦ 45◦ 60◦ 90◦ 120◦ 135◦ 150◦ 180◦ 270◦ 360◦ x in radians 0 π 6 π 4 π 3 π 2 2π 3 3π 4 5π 6 π 3π 2 2π sin x 0 1 2 √ √ √ √ 1 2 0 −1 0 cos x 1 √ 3 2 − 12 − −1 0 1 2 2 √ 2 2 3 2 1 2 1 0 2 2 3 2 √ 2 2 − √ 3 2 √ √ √ √ 3 3 tan x 0 1 3 / − 3 −1 − 0 / 0 3 3 Recall the addition formulas for cosine and sine and the trigonometric pythagoras. cos(x + y) = cos x cos y − sin x sin y, sin(x + y) = sin x cos y + cos x sin y. sin2 x + cos2 x = 1. (1.5) (1.6) 1.2 Complex numbers 29 1.2 Complex numbers Some algebraic equations do not have solutions in the real number system. For instance the quadratic equation x2 − 4x + 8 = 0 gives ‘formally’ √ √ x1 = 2 + −4 and x2 = 2 − −4. We will see that one can work with this notation. Definition 1.7 A complex number is an ordered pair (a, b) of real numbers. “Ordered” means that (a, b) 6= (b, a) if a 6= b. Two complex numbers x = (a, b) and y = (c, d) are said to be equal if and only if a = c and b = d. We define x + y := (a + c, b + d), xy := (ac − bd, ad + bc). Theorem 1.17 These definitions turn the set of all complex numbers into a field, with (0, 0) and (1, 0) in the role of 0 and 1. Proof. We simply verify the field axioms as listed in Definition 1.3. Of course, we use the field structure of R. Let x = (a, b), y = (c, d), and z = (e, f ). (A1) is clear. (A2) x + y = (a + c, b + d) = (c + a, d + b) = y + x. 
(A3) (x+y)+z = (a+c, b+d)+(e, f ) = (a+c+e, b+d+f ) = (a, b)+(c+e, d+f ) = x+(y+z). (A4) x + 0 = (a, b) + (0, 0) = (a, b) = x. (A5) Put −x := (−a, −b). Then x + (−x) = (a, b) + (−a, −b) = (0, 0) = 0. (M1) is clear. (M2) xy = (ac − bd, ad + bc) = (ca − db, da + cb) = yx. (M3) (xy)z = (ac − bd, ad + bc)(e, f ) = (ace − bde − adf − bcf, acf − bdf + ade + bce) = (a, b)(ce − df, cf + de) = x(yz). (M4) x · 1 = (a, b)(1, 0) = (a, b) = x. (M5) If x 6= 0 then (a, b) 6= (0, 0), which means that at least one of the real numbers a, b is different from 0. Hence a2 + b2 > 0 and we can define −b 1 a , . := x a2 + b2 a2 + b2 Then 1 x · = (a, b) x (D) a −b , a2 + b2 a2 + b2 = (1, 0) = 1. x(y + z) = (a, b)(c + e, d + f ) = (ac + ae − bd − bf, ad + af + bc + be) = (ac − bd, ad + bc) + (ae − bf, af + be) = xy + yz. 1 Real and Complex Numbers 30 Remark 1.7 For any two real numbers a and b we have (a, 0) + (b, 0) = (a + b, 0) and (a, 0)(b, 0) = (ab, 0). This shows that the complex numbers (a, 0) have the same arithmetic properties as the corresponding real numbers a. We can therefore identify (a, 0) with a. This gives us the real field as a subfield of the complex field. Note that we have defined the complex numbers without any reference to the mysterious square root of −1. We now show that the notation (a, b) is equivalent to the more customary a + bi. Definition 1.8 i := (0, 1). Lemma 1.18 (a) i2 = −1. (b) If a, b ∈ R then (a, b) = a + bi. Proof. (a) i2 = (0, 1)(0, 1) = (−1, 0) = −1. (b) a + bi = (a, 0) + (b, 0)(0, 1) = (a, 0) + (0, b) = (a, b). Definition 1.9 If a, b are real and z = a + bi, then the complex number z := a − bi is called the conjugate of z. The numbers a and b are the real part and the imaginary part of z, respectively. We shall write a = Re z and b = Im z. Proposition 1.19 If z and w are complex, then (a) z + w = z + w, (b) zw = z · w, (c) z + z = 2 Re z, z − z = 2i Im z, (d) z z is positive real except when z = 0. Proof. (a), (b), and (c) are quite trivial. 
To prove (d) write z = a + bi and note that z z = a2 + b2 . Definition 1.10 If z is complex number, its absolute value | z | is the (nonnegative) root of z z; √ that is | z | := z z. The existence (and uniqueness)√of | x | follows from Proposition 1.19 (d). Note that when x is real, then x = x, hence | x | = x2 . Thus | x | = x if x > 0 and | x | = −x if x < 0. We have recovered the definition of the absolute value for real numbers, see Subsection 1.1.6. Proposition 1.20 Let z and w be complex numbers . Then (a) | z | > 0 unless z = 0, (b) | z | = | z |, (c) | zw | = | z | | w |, (d) | Re z | ≤ | z |, (e) | z + w | ≤ | z | + | w | . Proof. (a) and (b) are trivial. Put z = a + bi and w = c + di, with a, b, c, d real. Then | zw |2 = (ac − bd)2 + (ad + bc)2 = (a2 + b2 )(c2 + d2 ) = | z |2 | w |2 1.2 Complex numbers 31 or | zw |2 = (| z | | w |)2 . Now (c) follows from the uniqueness assertion for roots. To prove (d), note that a2 ≤ a2 + b2 , hence √ √ | a | = a2 ≤ a2 + b2 = | z | . To prove (e), note that z w is the conjugate of z w, so that z w + zw = 2 Re (z w). Hence | z + w |2 = (z + w)(z + w) = z z + z w + z w + w w = | z |2 + 2 Re (z w) + | w |2 ≤ | z |2 + 2 | z | | w | + | w |2 = (| z | + | w |)2 . Now (e) follows by taking square roots. 1.2.1 The Complex Plane and the Polar form There is a bijective correspondence between complex numbers and the points of a plane. Im z=a+b i b r=| z| ϕ a By the Pythagorean theorem it is clear that √ | z | = a2 + b2 is exactly the distance of z from the origin 0. The angle ϕ between the positive real axis and the half-line 0z is called the argument of z and is denoted by ϕ = arg z. If z 6= 0, the argument ϕ is uniquely determined up to integer multiples of 2π Re Elementary trigonometry gives sin ϕ = b , |z| cos ϕ = a . |z| This gives with r = | z |, a = r cos ϕ and b = r sin ϕ. Inserting these into the rectangular form of z yields z = r(cos ϕ + i sin ϕ), (1.7) which is called the polar form of the complex number z. 
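The polar form and the product rule of the following subsection can be checked with Python's standard `cmath` module, whose `polar` and `rect` functions convert between the rectangular form a + bi and the pair (r, ϕ) with z = r(cos ϕ + i sin ϕ). This is only a numerical sanity check of Example 1.5 (a) and of formula (1.8), not part of the text's development:

```python
import cmath
import math

# Example 1.5 (a): z = 1 + i has |z| = sqrt(2) and argument pi/4.
r, phi = cmath.polar(1 + 1j)
assert math.isclose(r, math.sqrt(2))
assert math.isclose(phi, math.pi / 4)

# Formula (1.8): multiplication multiplies the moduli and adds the arguments.
z = cmath.rect(2.0, 0.5)          # r = 2, phi = 0.5
w = cmath.rect(3.0, 0.8)          # s = 3, psi = 0.8
rw, pw = cmath.polar(z * w)
assert math.isclose(rw, 6.0)      # rs = 2 * 3
assert math.isclose(pw, 1.3)      # phi + psi = 0.5 + 0.8
```

Note that `cmath.polar` returns the argument in (−π, π], so for angles whose sum leaves that interval the result differs from ϕ + ψ by a multiple of 2π, exactly as the text's "uniquely determined up to integer multiples of 2π" predicts.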
√ √ Example 1.5 a) z = 1 + i. Then | z | = 2 and sin ϕ = 1/ 2 = cos ϕ. This implies ϕ = π/4. √ Hence, the polar form of z is 1 + i = 2(cos π/4 + i sin π/4). b) z = −i. We have | −i | = 1 and sin ϕ = −1, cos ϕ = 0. Hence ϕ = 3π/2 and −i = 1(cos 3π/2 + i sin 3π/2). c) Computing the rectangular form of z from the polar form is easier. √ √ z = 32(cos 7π/6 + i sin 7π/6) = 32(− 3/2 − i/2) = −16 3 − 16i. 1 Real and Complex Numbers 32 z+w z w The addition of complex numbers corresponds to the addition of vectors in the plane. The geometric meaning of the inequality | z + w | ≤ | z | + | w | is: the sum of two edges of a triangle is bigger than the third edge. Multiplying complex numbers z = r(cos ϕ + i sin ϕ) and w = s(cos ψ + i sin ψ) in the polar form gives zw zw = rs(cos ϕ + i sin ϕ)(cos ψ + i sin ψ) = rs(cos ϕ cos ψ − sin ϕ sin ψ)+ z w i(cos ϕ sin ψ + sin ϕ cos ψ) zw = rs(cos(ϕ + ψ) + i sin(ϕ + ψ)), (1.8) where we made use of the addition laws for sin and cos in the last equation. Hence, the product of complex numbers is formed by taking the product of their absolute values and the sum of their arguments. The geometric meaning of multiplication by w is a similarity transformation of C. More precisely, we have a rotation around 0 by the angle ψ and then a dilatation with factor s and center 0. Similarly, if w 6= 0 we have z r = (cos(ϕ − ψ) + i sin(ϕ − ψ)) . (1.9) w s Proposition 1.21 (De Moivre’s formula) Let z = r(cos ϕ + i sin ϕ) be a complex number with absolute value r and argument ϕ. Then for all n ∈ Z one has z n = r n (cos(nϕ) + i sin(nϕ)). (1.10) Proof. (a) First let n > 0. We use induction over n to prove De Moivre’s formula. For n = 1 there is nothing to prove. Suppose (1.10) is true for some fixed n. We will show that the assertion is true for n + 1. Using induction hypothesis and (1.8) we find z n+1 = z n ·z = r n (cos(nϕ)+i sin(nϕ))r(cos ϕ+i sin ϕ) = r n+1 (cos(nϕ+ϕ)+i sin(nϕ+ϕ)). This proves the induction assertion. (b) If n < 0, then z n = 1/(z −n ). 
Since 1 = 1(cos 0 + i sin 0), (1.9) and the result of (a) give z^n = 1/z^(−n) = (1/r^(−n)) (cos(0 − (−n)ϕ) + i sin(0 − (−n)ϕ)) = r^n (cos(nϕ) + i sin(nϕ)). This completes the proof. Example 1.6 Compute the polar form of z = √3 − 3i and compute z^15. We have |z| = √(3 + 9) = 2√3, cos ϕ = 1/2, and sin ϕ = −√3/2. Therefore ϕ = −π/3 and z = 2√3 (cos(−π/3) + i sin(−π/3)). By De Moivre's formula we have z^15 = (2√3)^15 (cos(−15·π/3) + i sin(−15·π/3)) = 2^15·3^7·√3 (cos(−5π) + i sin(−5π)), hence z^15 = −2^15·3^7·√3. 1.2.2 Roots of Complex Numbers Let z ∈ C and n ∈ N. A complex number w is called an nth root of z if w^n = z. In contrast to the real case, roots of complex numbers are not unique. We will see that there are exactly n different nth roots of z for every z ≠ 0. Let z = r(cos ϕ + i sin ϕ) and let w = s(cos ψ + i sin ψ) be an nth root of z. De Moivre's formula gives w^n = s^n (cos nψ + i sin nψ). Comparing w^n and z we get s^n = r, i.e. s = r^(1/n) ≥ 0. Moreover nψ = ϕ + 2kπ, k ∈ Z, i.e. ψ = ϕ/n + 2kπ/n, k ∈ Z. For k = 0, 1, . . . , n − 1 we obtain n different values ψ₀, ψ₁, . . . , ψ₋₁ modulo 2π, namely ψ₀, . . . , ψ_{n−1}. We summarize. Lemma 1.22 Let n ∈ N and let z = r(cos ϕ + i sin ϕ) ≠ 0 be a complex number. Then the complex numbers w_k = r^(1/n) (cos((ϕ + 2kπ)/n) + i sin((ϕ + 2kπ)/n)), k = 0, 1, . . . , n − 1, are the n different nth roots of z. Example 1.7 Compute the 4th roots of z = −1. Since |z| = 1 we have |w| = 1^(1/4) = 1, and arg z = ϕ = 180°. Hence ψ₀ = ϕ/4 = 45°, ψ₁ = ϕ/4 + 1·360°/4 = 135°, ψ₂ = ϕ/4 + 2·360°/4 = 225°, ψ₃ = ϕ/4 + 3·360°/4 = 315°. We obtain w₀ = cos 45° + i sin 45° = √2/2 + i √2/2, w₁ = cos 135° + i sin 135° = −√2/2 + i √2/2, w₂ = cos 225° + i sin 225° = −√2/2 − i √2/2, w₃ = cos 315° + i sin 315° = √2/2 − i √2/2. Geometric interpretation of the nth roots. The nth roots of z ≠ 0 form a regular n-gon in the complex plane with center 0. The vertices lie on a circle with center 0 and radius |z|^(1/n). 
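The formula of Lemma 1.22 translates directly into a few lines of Python, which we can use to re-check Example 1.7. This is a numerical sketch (the function name `nth_roots` is ours):

```python
import cmath
import math

def nth_roots(z, n):
    """All n distinct nth roots of z != 0, via Lemma 1.22:
    w_k = r^(1/n) (cos((phi + 2k*pi)/n) + i sin((phi + 2k*pi)/n))."""
    r, phi = cmath.polar(z)
    s = r ** (1.0 / n)
    return [cmath.rect(s, (phi + 2 * math.pi * k) / n) for k in range(n)]

roots = nth_roots(-1, 4)
# Each w_k really is a 4th root of -1 (up to floating-point error).
for w in roots:
    assert abs(w**4 - (-1)) < 1e-12
# The first root is cos 45 + i sin 45 = (sqrt(2)/2)(1 + i), as in Example 1.7.
assert abs(roots[0] - (math.sqrt(2) / 2) * (1 + 1j)) < 1e-12
```

Plotting the four roots would show them at the vertices of a square inscribed in the unit circle, the n = 4 case of the regular n-gon described above.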
1.3 Inequalities 1.3.1 Monotony of the Power and Exponential Functions Lemma 1.23 (a) For a, b > 0 and r ∈ Q we have a < b ⟺ a^r < b^r if r > 0, and a < b ⟺ a^r > b^r if r < 0. (b) For a > 0 and r, s ∈ Q we have r < s ⟺ a^r < a^s if a > 1, and r < s ⟺ a^r > a^s if a < 1. Proof. (a) Suppose that r > 0, r = m/n with integers m, n ∈ Z, n > 0. Using Lemma 1.14 (a) twice we get a < b ⟺ a^m < b^m ⟺ (a^m)^(1/n) < (b^m)^(1/n), which proves the first claim. The case r < 0 is obtained by putting −r in place of r in the first part and using Proposition 1.9 (e). (b) Suppose that s > r. Put x = s − r; then x ∈ Q and x > 0. By (a), 1 < a implies 1 = 1^x < a^x. Hence 1 < a^(s−r) = a^s/a^r (here we used Remark 1.6 (d)), and therefore a^r < a^s. Changing the roles of r and s shows that s < r implies a^s < a^r, so the converse direction is also true. The proof for a < 1 is similar. 1.3.2 The Arithmetic-Geometric Mean Inequality Proposition 1.24 Let n ∈ N and x₁, . . . , xₙ ∈ R₊. Then (x₁ + · · · + xₙ)/n ≥ (x₁ · · · xₙ)^(1/n). (1.11) We have equality if and only if x₁ = x₂ = · · · = xₙ. Proof. We use forward-backward induction over n. First we show that (1.11) is true for all n which are powers of 2. Then we prove that if (1.11) is true for some n + 1, then it is true for n. Hence it is true for all positive integers. The inequality is true for n = 1. Let a, b ≥ 0; then (√a − √b)² ≥ 0 implies a + b ≥ 2√(ab), and the inequality is true in the case n = 2. Equality holds if and only if a = b. Suppose the inequality is true for some fixed k ∈ N; we will show that it is true for 2k. Let x₁, . . . , xₖ, y₁, . . . , yₖ ∈ R₊. Using the induction assumption and the inequality in the case n = 2, we have (1/2k) (Σᵢ xᵢ + Σᵢ yᵢ) = (1/2) ((1/k) Σᵢ xᵢ + (1/k) Σᵢ yᵢ) ≥ (1/2) ((∏ᵢ xᵢ)^(1/k) + (∏ᵢ yᵢ)^(1/k)) ≥ (∏ᵢ xᵢ ∏ᵢ yᵢ)^(1/2k), where all sums and products run from i = 1 to k; the last step is the case n = 2 applied to a = (∏ᵢ xᵢ)^(1/k) and b = (∏ᵢ yᵢ)^(1/k). This completes the 'forward' part. Assume now (1.11) is true for n + 1. We will show it for n. 
P Let x1 , . . . , xn ∈ R+ and set A := ( ni=1 xi )/n. By induction assumption we have 1 ! n+1 n Y n Y 1 ! n+1 1 1 1 ⇐⇒ A n+1 (x1 + · · · + xn + A) ≥ (nA + A) ≥ xi A xi n+1 n+1 i=1 i=1 1 1 ! ! !1/n n n n n+1 n+1 Y Y Y 1 n A≥ A n+1 ⇐⇒ A n+1 ≥ ⇐⇒ A ≥ xi xi xi . i=1 i=1 i=1 It is trivial that in case x1 = x2 = · · · = xn we have equality. Suppose that equality holds in a case where at least two of the xi are different, say x1 < x2 . Consider the inequality with the new set of values x′1 := x′2 := (x1 + x2 )/2, and x′i = xi for i ≥ 3. Then !1/n n n n X X Y 1 1 xk = xk = x′k ≥ x′k . n n k=1 k=1 k=1 k=1 2 x1 + x2 ′ ′ ⇐⇒ 4x1 x2 ≥ x21 + 2x1 x2 + x22 ⇐⇒ 0 ≥ (x1 − x2 )2 . x1 x2 ≥ x1 x2 = 2 n Y !1/n This contradicts the choice x1 < x2 . Hence, x1 = x2 = · · · = xn is the only case where equality holds. This completes the proof. 1.3.3 The Cauchy–Schwarz Inequality Proposition 1.25 (Cauchy–Schwarz inequality) Suppose that x1 , . . . , xn , y1 , . . . , yn are real numbers. Then we have !2 n n n X X X xk yk ≤ x2k · yk2. (1.12) k=1 k=1 k=1 Equality holds if and only if there exists t ∈ R such that yk = t xk for k = 1, . . . , n that is, the vector y = (y1 , . . . , yn ) is a scalar multiple of the vector x = (x1 , . . . , xn ). Proof. Consider the quadratic function f (t) = at2 − 2bt + c where a= n X x2k , b= k=1 n X xk yk , c= k=1 n X yk2 . k=1 Then f (t) = = n X k=1 n X k=1 x2k t2 − n X k=1 2xk yk t + n X yk2 k=1 n X x2k t2 − 2xk yk t + yk2 = (xk t − yk )2 ≥ 0. k=1 1 Real and Complex Numbers 36 Equality holds if and only if there is a t ∈ R with yk = txk for all k. Suppose now, there is no such t ∈ R. That is f (t) > 0, for all t ∈ R. √ In other words, the polynomial f (t) = at2 − 2bt + c has no real zeros, t1,2 = a1 b ± b2 − ac . That is, the discriminant D = b2 − ac is negative (only complex roots); hence b2 < ac: !2 n n n X X X 2 xk yk < xk · yk2 . k=1 k=1 k=1 this proves the claim. Corollary 1.26 (The Complex Cauchy–Schwarz inequality) If x1 , . . . , xn and y1 , . . . 
, y_n are complex numbers, then

| ∑_{k=1}^n x_k y_k |² ≤ ∑_{k=1}^n | x_k |² · ∑_{k=1}^n | y_k |².   (1.13)

Equality holds if and only if there exists a λ ∈ C such that y = λx, where y = (y_1, …, y_n) ∈ C^n, x = (x_1, …, x_n) ∈ C^n.

Proof. Using the generalized triangle inequality | z_1 + ⋯ + z_n | ≤ | z_1 | + ⋯ + | z_n | and the real Cauchy–Schwarz inequality we obtain

| ∑_{k=1}^n x_k y_k |² ≤ (∑_{k=1}^n | x_k y_k |)² = (∑_{k=1}^n | x_k | | y_k |)² ≤ ∑_{k=1}^n | x_k |² · ∑_{k=1}^n | y_k |².

This proves the inequality. The right “less equal” is an equality if there is a t ∈ R such that | y | = t | x |. In the first “less equal” sign we have equality if and only if all z_k = x_k ȳ_k have the same argument; that is, arg y_k = arg x_k + φ. Putting both together yields y = λx with λ = t(cos φ + i sin φ).

1.4 Appendix A

In this appendix we collect some additional facts which were not covered by the lecture.

We now show that the equation

x² = 2   (1.14)

is not satisfied by any rational number x. Suppose to the contrary that there were such an x; we could write x = m/n with integers m and n, n ≠ 0, that are not both even. Then (1.14) implies

m² = 2n².   (1.15)

This shows that m² is even and hence m is even. Therefore m² is divisible by 4. It follows that the right hand side of (1.15) is divisible by 4, so that n² is even, which implies that n is even. But this contradicts our choice of m and n. Hence (1.14) is impossible for rational x.

Let A be the set of all positive rational numbers p with p² < 2 and let B be the set of all positive rational numbers p with p² > 2. We shall show that A contains no largest element and B contains no smallest. That is, for every p ∈ A we can find a rational q ∈ A with p < q, and for every p ∈ B we can find a rational q ∈ B such that q < p.

Suppose that p is in A. We associate with p > 0 the rational number

q = p + (2 − p²)/(p + 2) = (2p + 2)/(p + 2).   (1.16)

Then

q² − 2 = (4p² + 8p + 4 − 2p² − 8p − 8)/(p + 2)² = 2(p² − 2)/(p + 2)².   (1.17)

If p is in A then 2 − p² > 0, (1.16) shows that q > p, and (1.17) shows that q² < 2.
If p is in B then 2 < p2 , (1.16) shows that q < p, and (1.17) shows that q 2 > 2. A Non-Archimedean Ordered Field The fields Q and R are Archimedean, see below. But there exist ordered fields without this property. Let F := R(t) the field of rational functions f (t) = p(t)/q(t) where p and q are polynomials with real coefficients. Since p and q have only finitely many zeros, for large t, f (t) is either positive or negative. In the first case we set f > 0. In this way R(t) becomes an ordered field. But t > n for all n ∈ N since the polynomial f (t) = t − n becomes positive for large t (and fixed n). Our aim is to define bx for arbitrary real x. Lemma 1.27 Let b, p be real numbers with b > 1 and p > 0. Set M = {br | r ∈ Q, r < p}, M ′ = {bs | s ∈ Q, p < s}. Then sup M = inf M ′ . Proof. (a) M is bounded above by arbitrary bs , s ∈ Q, with s > p, and M ′ is bounded below by any br , r ∈ Q, with r < p. Hence sup M and inf M ′ both exist. (b) Since r < p < s implies ar < bs by Lemma 1.23, sup M ≤ bs for all bs ∈ M ′ . Taking the infimum over all such bs , sup M ≤ inf M ′ . (c) Let s = sup M and ε > 0 be given. We want to show that inf M ′ < s + ε. Choose n ∈ N such that 1/n < ε/(s(b − 1)). (1.18) By Proposition 1.11 there exist r, s ∈ Q with r < p < s and s−r < 1 . n (1.19) 1 Real and Complex Numbers 38 Using s − r < 1/n, Bernoulli’s inequality (part 2), and (1.18), we compute 1 1 bs − br = br (bs−r − 1) ≤ s(b n − 1) ≤ s (b − 1) < ε. n Hence inf M ′ ≤ bs < br + ε ≤ sup M + ε. Since ε was arbitrary, inf M ′ ≤ sup M, and finally, with the result of (b), inf M ′ = sup M. Corollary 1.28 Suppose p ∈ Q and b > 1 is real. Then bp = sup{br | r ∈ Q, r < p}. Proof. For all rational numbers r, p, s ∈ Q, r < p < s implies ar < ap < as . Hence sup M ≤ ap ≤ inf M ′ . By the lemma, these three numbers coincide. Inequalities Now we extend Bernoulli’s inequality to rational exponents. Proposition 1.29 (Bernoulli’s inequality) Let a ≥ −1 real and r ∈ Q. 
Then (a) (1 + a)r ≥ 1 + ra if r ≥ 1, (b) (1 + a)r ≤ 1 + ra if 0 ≤ r ≤ 1. Equality holds if and only if a = 0 or r = 1. Proof. (b) Let r = m/n with m ≤ n, m, n ∈ N. Apply (1.11) to xi := 1 + a, i = 1, . . . , m and xi := 1 for i = m + 1, . . . , n. We obtain 1 1 (m(1 + a) + (n − m)1) ≥ (1 + a)m · 1n−m n n m m a + 1 ≥ (1 + a) n , n which proves (b). Equality holds if n = 1 or if x1 = · · · = xn i. e. a = 0. (a) Now let s ≥ 1, z ≥ −1. Setting r = 1/s and a := (1 + z)1/r − 1 we obtain r ≤ 1 and a ≥ −1. Inserting this into (b) yields 1 r (1 + a)r ≤ (1 + z) r ≤ 1 + r ((1 + z)s − 1) z ≤ r ((1 + z)s − 1) 1 + sz ≤ (1 + z)s . This completes the proof of (a). 1.4 Appendix A 39 Corollary 1.30 (Bernoulli’s inequality) Let a ≥ −1 real and x ∈ R. Then (a) (1 + a)x ≥ 1 + xa if x ≥ 1, (b) (1 + a)x ≤ 1 + xa if x ≤ 1. Equality holds if and only if a = 0 or x = 1. Proof. (a) First let a > 0. By Proposition 1.29 (a) (1 + a)r ≥ 1 + ra if r ∈ Q. Hence, (1 + a)x = sup{(1 + a)r | r ∈ Q, r < x} ≥ sup{1 + ra | r ∈ Q, r < x} = 1 + xa. Now let −1 ≤ a < 0. Then r < x implies ra > xa, and Proposition 1.29 (a) implies (1 + a)r ≥ 1 + ra > 1 + xa. (1.20) By definition of the power with a real exponent, see (1.4) (1 + a)x = 1 sup{(1/(a + 1))r | r ∈ Q, r < x} = HW 2.1 inf{(1 + a)r | r ∈ Q, r < x}. Taking in (1.20) the infimum over all r ∈ Q with r < x we obtain (1 + a)x = inf{(1 + a)r | r ∈ Q, r < x} ≥ 1 + xa. (b) The proof is analogous, so we omit it. Proposition 1.31 (Young’s inequality) If a, b ∈ R+ and p > 1, then ab ≤ 1 p 1 q a + b, p q (1.21) where 1/p + 1/q = 1. Equality holds if and only if ap = bq . Proof. First note that 1/q = 1 − 1/p. We reformulate Bernoulli’s inequality for y ∈ R+ and p>1 y p − 1 ≥ p(y − 1) ⇐⇒ 1 p 1 1 (y − 1) + 1 ≥ y ⇐⇒ y p + ≥ y. p p q If b = 0 the statement is always true. If b 6= 0 insert y := ab/bq into the above inequality: p 1 1 ab ab + ≥ q q p b q b 1 ap bp 1 ab + ≥ q | · bq pq p b q b 1 p 1 q a + b ≥ ab, p q since bp+q = bpq . We have equality if y = 1 or p = 1. 
The later is impossible by assumption. y = 1 is equivalent to bq = ab or bq−1 = a or b(q−1)p = ap (b 6= 0). If b = 0 equality holds if and only if a = 0. 1 Real and Complex Numbers 40 Proposition 1.32 (Hölder’s inequality) Let p > 1, 1/p + 1/q = 1, and x1 , . . . , xn ∈ R+ and y1 , . . . , yn ∈ R+ non-negative real numbers. Then n X k=1 n X xk yk ≤ ! p1 xpk k=1 n X ykq k=1 ! 1q . (1.22) We have equality if and only if there exists c ∈ R such that for all k = 1, . . . , n, xpk /ykq = c (they are proportional). 1 1 P P Proof. Set A := ( nk=1 xpk ) p and B := ( nk=1 ykq ) q . The cases A = 0 and B = 0 are trivial. So we assume A, B > 0. By Young’s inequality we have 1 xpk xk yk 1 ykq · ≤ + A B p Ap q B q n n n 1 X 1 X p 1 X q =⇒ xk yk ≤ x + y AB k=1 pAp k=1 k qB q k=1 k 1 X p 1 X q xk + P q yk = P p p xk q yk 1 1 = + =1 p q ! p1 ! q1 n n n X X X . xk yk ≤ xpk ykq =⇒ k=1 k=1 k=1 Equality holds if and only if xpk /Ap = ykq /B q for all k = 1, . . . , n. Therefore, xpk /ykq = const. Corollary 1.33 (Complex Hölder’s inequality) Let p > 1, 1/p + 1/q = 1 and xk , yk ∈ C, k = 1, . . . , n. Then n X k=1 | xk yk | ≤ n X k=1 p | xk | ! p1 n X k=1 q | yk | ! 1q . Equality holds if and only if | xk |p / | yk |q = const. for k = 1, . . . , n. Proof. Set xk := | xk | and yk := | yk | in (1.22). This will prove the statement. Proposition 1.34 (Minkowski’s inequality) If x1 , . . . , xn ∈ R+ and y1 , . . . , yn ∈ R+ and p ≥ 1 then n X (xk + yk )p k=1 ! p1 ≤ n X k=1 xpk ! p1 Equality holds if p = 1 or if p > 1 and xk /yk = const. + n X k=1 ykp ! p1 . (1.23) 1.4 Appendix A 41 Proof. The case p = 1 is obvious. Let p > 1. As before let q > 0 be the unique positive number with 1/p + 1/q = 1. We compute n n n n X X X X p p−1 p−1 (xk + yk ) = (xk + yk )(xk + yk ) = xk (xk + yk ) + yk (xk + yk )p−1 k=1 ≤ (1.22) ≤ k=1 X X xpk xpk p1 1/p X k + (xk + yk )(p−1)q X ! q1 k=1 + X ykp k=1 1p 1/q X 1/q q (xk + yk )p . yk X (xk + yk )(p−1)q k ! q1 P 1 1 We can assume that (xk + yk )p > 0. 
Using 1 − = by taking the quotient of the last q p P p 1/q inequality by ( (xk + yk ) ) we obtain the claim. Equality holds if xpk /(xk + yk )(p−1)q = const. and ykp /(xk + yk )(p−1)q) = const.; that is xk /yk = const. Corollary 1.35 (Complex Minkowski’s inequality) If x1 , . . . , xn , y1, . . . , yn ∈ C and p ≥ 1 then ! p1 ! p1 ! p1 n n n X X X p p p ≤ + . (1.24) | xk + yk | | xk | | yk | k=1 k=1 k=1 Equality holds if p = 1 or if p > 1 and xk /yk = λ > 0. Proof. Using the triangle inequality gives | xk + yk | ≤ | xk | + | yk |; hence Pn Pn p p ≤ The real version of Minkowski’s inequality k=1 | xk + yk | k=1 (| xk | + | yk |) . now proves the assertion. If x = (x1 , . . . , xn ) is a vector in Rn or Cn , the (non-negative) number kxkp := n X k=1 | xk | p ! p1 is called the p-norm of the vector x. Minkowski’s inequalitie then reads as kx + ykp ≤ kxkp + kykp which is the triangle inequality for the p-norm. 42 1 Real and Complex Numbers Chapter 2 Sequences and Series This chapter will deal with one of the main notions of calculus, the limit of a sequence. Although we are concerned with real sequences, almost all notions make sense in arbitrary metric spaces like Rn or Cn . Given a ∈ R and ε > 0 we define the ε-neighborhood of a as Uε (a) := (a − ε, a + ε) = {x ∈ R | a − ε < x < a + ε} = {x ∈ R | | x − a | < ε}. 2.1 Convergent Sequences A sequence is a mapping x : N → R. To every n ∈ N we associate a real number xn . We write this as (xn )n∈N or (x1 , x2 , . . . ). For different sequences we use different letters as (an ), (bn ), (yn ). Example 2.1 (a) xn = n1 , n1 , (xn ) = 1, 12 , 13 , . . . ; (b) xn = (−1)n + 1, (xn ) = (0, 2, 0, 2, . . . ); (c) xn = a (a ∈ R fixed), (xn ) = (a, a, . . . ) (constant sequence), (d) xn = 2n − 1, (xn ) = (1, 3, 5, 7, . . . ) the sequence of odd positive integers. (e) xn = an (a ∈ R fixed), (xn ) = (a, a2 , a3 , . . . 
) (geometric sequence); Definition 2.1 A sequence (xn ) is said to be convergent to x ∈ R if For every ε > 0 there exists n0 ∈ N such that n ≥ n0 implies | xn − x | < ε. x is called the limit of (xn ) and we write x = lim xn n→∞ or simply x = lim xn or xn → x. If there is no such x with the above property, the sequence (xn ) is said to be divergent. In other words: (xn ) converges to x if any neighborhood Uε (x), ε > 0, contains “almost all” elements of the sequence (xn ). “Almost all” means “all but finitely many.” Sometimes we say “for sufficiently large n” which means the same. 43 2 Sequences and Series 44 This is an equivalent formulation since xn ∈ Uε (x) means x−ε < xn < x+ε, hence | x − xn | < ε. The n0 in question need not to be the smallest possible. We write lim xn = +∞ n→∞ (2.1) if for all E > 0 there exists n0 ∈ N such that n ≥ n0 implies xn ≥ E. Similarly, we write lim xn = −∞ n→∞ (2.2) if for all E > 0 there exists n0 ∈ N such that n ≥ n0 implies xn ≤ −E. In these cases we say that +∞ and −∞ are improper limits of (xn ). Note that in both cases (xn ) is divergent. Example 2.2 This is Example 2.1 continued. (a) limn→∞ n1 = 0. Indeed, let ε > 0 be fixed. We are looking for some n0 with n1 − 0 < ε for all n ≥ n0 . This is equivalent to 1/ε < n. Choose n0 > 1/ε (which is possible by the Archimedean property). Then for all n ≥ n0 we have n ≥ n0 > 1 1 =⇒ < ε =⇒ | xn − 0 | < ε. ε n Therefore, (xn ) tends to 0 as n → ∞. (b) xn = (−1)n + 1 is divergent. Suppose to the contrary that x is the limit. To ε = 1 there is n0 such that for n ≥ n0 we have | xn − x | < 1. For even n ≥ n0 this implies | 2 − x | < 1 for odd n ≥ n0 , | 0 − x | = | x | < 1. The triangle inequality gives 2 = | (2 − x) + x | ≤ | 2 − x | + | x | < 1 + 1 = 2. This is a contradiction. Hence, (xn ) is divergent. (c) xn = a. lim xn = a since | xn − a | = | a − a | = 0 < ε for all ε > 0 and all n ∈ N. (d) lim(2n − 1) = +∞. Indeed, suppose that E > 0 is given. Choose n0 > E2 + 1. 
Then

n ≥ n₀ =⇒ n > E/2 + 1 =⇒ 2n − 2 > E =⇒ x_n = 2n − 1 > 2n − 2 > E.

This proves the claim. Similarly, one can show that lim(−n³) = −∞. But both ((−n)^n) and (1, 2, 1, 3, 1, 4, 1, 5, …) have no improper limit. Indeed, the first one becomes arbitrarily large for even n and arbitrarily small for odd n. The second one becomes large for even n but is constant for odd n.

(e) x_n = a^n, (a ≥ 0). Then

lim_{n→∞} a^n = 1 if a = 1, and lim_{n→∞} a^n = 0 if 0 ≤ a < 1.

(a^n) is divergent for a > 1. Moreover, lim a^n = +∞. To prove this let E > 0 be given. By the Archimedean property of R and since a − 1 > 0 we find m ∈ N such that m(a − 1) > E. Bernoulli’s inequality gives

a^m ≥ m(a − 1) + 1 > m(a − 1) > E.

By Lemma 1.23 (b), n ≥ m implies a^n ≥ a^m > E. This proves the claim. Clearly (a^n) is convergent in the cases a = 0 and a = 1 since it is constant then. Let 0 < a < 1 and put b = 1/a − 1; then b > 0. Bernoulli’s inequality gives

1/a^n = (b + 1)^n ≥ 1 + nb > nb  =⇒  0 < a^n < 1/(nb).   (2.3)

Let ε > 0. Choose n₀ > 1/(εb). Then ε > 1/(n₀b) and n ≥ n₀ implies, by (2.3),

| a^n − 0 | = | a^n | = a^n < 1/(nb) ≤ 1/(n₀b) < ε.

Hence, a^n → 0.

Proposition 2.1 The limit of a convergent sequence is uniquely determined.

Proof. Suppose that x = lim x_n and y = lim x_n and x ≠ y. Put ε := | x − y | /2 > 0. Then

∃ n₁ ∈ N ∀ n ≥ n₁ : | x − x_n | < ε,
∃ n₂ ∈ N ∀ n ≥ n₂ : | y − x_n | < ε.

Choose m ≥ max{n₁, n₂}. Then | x − x_m | < ε and | y − x_m | < ε. Hence,

| x − y | ≤ | x − x_m | + | y − x_m | < 2ε = | x − y | .

This contradiction establishes the statement. Proposition 2.1 holds in arbitrary metric spaces.

Definition 2.2 A sequence (x_n) is said to be bounded if the set of its elements is a bounded set; i. e. there is a C ≥ 0 such that | x_n | ≤ C for all n ∈ N. Similarly, (x_n) is said to be bounded above or bounded below if there exists C ∈ R such that x_n ≤ C or x_n ≥ C, respectively, for all n ∈ N.

Proposition 2.2 If (x_n) is convergent, then (x_n) is bounded.

Proof. Let x = lim x_n.
To ε = 1 there exists n0 ∈ N such that | x − xn | < 1 for all n ≥ n0 . Then | xn | = | xn − x + x | ≤ | xn − x | + | x | < | x | + 1 for all n ≥ n0 . Put C := max{| x1 | , . . . , | xn0 −1 | , | x | + 1}. Then | xn | ≤ C for all n ∈ N. The reversal statement is not true; there are bounded sequences which are not convergent, see Example 2.1 (b). Ex Class: If (xn ) has an improper limit, then (xn ) is divergent. Proof. Suppose to the contrary that (xn ) is convergent; then it is bounded, say | xn | ≤ C for all n. This contradicts xn > E as well as xn < −E for E = C and sufficiently large n. Hence, (xn ) has no improper limits, a contradiction. 2.1.1 Algebraic operations with sequences The sum, difference, product, quotient and absolute value of sequences (xn ) and (yn ) are defined as follows (xn ) ± (yn ) := (xn ± yn ), (xn ) xn := , (yn 6= 0) (yn ) yn (xn ) · (yn ) := (xn yn ), | (xn ) | := (| xn |). Proposition 2.3 If (xn ) and (yn ) are convergent sequences and c ∈ R, then their sum, difference, product, quotient (provided yn 6= 0 and lim yn 6= 0), and their absolute values are also convergent: (a) lim(xn ± yn ) = lim xn ± lim yn ; (b) lim(cxn ) = c lim xn , lim(xn + c) = lim xn + c. (c) lim(xn yn ) = lim xn · lim yn ; xn (d) lim xynn = lim if yn 6= 0 for all n and lim yn 6= 0; lim yn (e) lim | xn | = | lim xn | . Proof. Let xn → x and yn → y. (a) Given ε > 0 then there exist integers n1 and n2 such that n ≥ n1 implies | xn − x | < ε/2 and n ≥ n2 implies | yn − y | < ε/2. If n0 := max{n1 , n2 }, then n ≥ n0 implies | (xn + yn ) − (x + y) | ≤ | xn − x | + | yn − y | ≤ ε. The proof for the difference is quite similar. (b) follows from | cxn − cx | = | c | | xn − x | and | (xn + c) − (x + c) | = | xn − x |. (c) We use the identity xn yn − xy = (xn − x)(yn − y) + x(yn − y) + y(xn − x). Given ε > 0 there are integers n1 and n2 such that (2.4) 2.1 Convergent Sequences 47 n ≥ n1 implies | xn − x | < √ ε and n ≥ n2 implies | yn − y | < √ ε. 
If we take n0 = max{n1 , n2 }, n ≥ n0 implies | (xn − x)(yn − y) | < ε, so that lim (xn − x)(yn − y) = 0. n→∞ Now we apply (a) and (b) to (2.4) and conclude that lim (xn yn − xy) = 0. n→∞ (d) Choosing n1 such that | yn − y | < | y | /2 if n ≥ n1 , we see that | y | ≤ | y − yn | + | yn | < | y | /2 + | yn | =⇒ | yn | > | y | /2. Given ε > 0, there is an integer n2 > n1 such that n ≥ n2 implies | yn − y | < | y |2 ε/2. Hence, for n ≥ n2 , 1 yn − y 1 yn − y = yn y < 2 | yn − y | < ε | y |2 1 . The general case can be reduced to the above case using (c) and lim yn (xn /yn ) = (xn · 1/yn ). (e) By Lemma 1.12 (e) we have | | xn | − | x | | ≤ | xn − x |. Given ε > 0, there is n0 such that n ≥ n0 implies | xn − x | < ε. By the above inequality, also | | xn | − | x | | ≤ ε and we are done. and we get lim( y1n ) = Example 2.3 (a) zn := n+1 . Set xn = 1 and yn = 1/n. Then zn = xn + yn and we already n = lim 1 + lim n1 = 1 + 0 = 1. know that lim xn = 1 and lim yn = 0. Hence, lim n+1 n 2 (b) an = 3nn2+13n . We can write this as −2 an = 3 + 13 n . 1 − n22 Since lim 1/n = 0, by Proposition 2.3, we obtain lim 1/n2 = 0 and lim 13/n = 0. Hence lim 2/n2 = 0 and lim (3 + 13/n) = 3. Finally, limn→∞ 3 + 13 3 3n2 + 13n n = = = 3. lim 2 2 n→∞ n −2 1 limn→∞ 1 − n (c) We introduce the notion of a polynomial and and a rational function. Given a0 , a1 , . . . , an ∈ R, an 6= 0. The function p : R → R given by p(t) := an tn + an−1 tn−1 + · · · + a1 t + a0 is called a polynomial. The positive integer n is the degree of the polynomial 2 Sequences and Series 48 p(t), and a1 , . . . an are called the coefficients of p(t). The set of all real polynomials forms a real vector space denoted by R[x]. Given two polynomials p and q; put D := {t ∈ R | q(t) 6= 0}. Then r = pq is a called a rational function where r : D → R is defined by r(t) := p(t) . q(t) Polynomials are special rational functions with q(t) ≡ 1. 
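The limit laws of Proposition 2.3 can be illustrated numerically with Example 2.3 (b), a_n = (3n² + 13n)/(n² − 2) → 3; a minimal sketch (function name `a` is ours):

```python
def a(n):
    # a_n = (3n^2 + 13n) / (n^2 - 2), cf. Example 2.3 (b); limit is 3
    return (3 * n**2 + 13 * n) / (n**2 - 2)

# the terms approach the limit predicted by Proposition 2.3
for n in (10, 100, 10_000):
    print(n, a(n))

assert abs(a(10_000) - 3) < 0.002
```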
The set of rational functions with real coefficients forms both a real vector space and a field. It is denoted by R(x).

Lemma 2.4 (a) Let a_n → 0 be a sequence tending to zero with a_n ≠ 0 for every n. Then

lim_{n→∞} 1/a_n = +∞, if a_n > 0 for almost all n;
lim_{n→∞} 1/a_n = −∞, if a_n < 0 for almost all n.

(b) Let y_n → a be a sequence converging to a with a > 0. Then y_n > 0 for almost all n ∈ N.

Proof. (a) We will prove the case with −∞. Let ε > 0. By assumption there is a positive integer n₀ such that n ≥ n₀ implies −ε < a_n < 0. This implies 0 < −a_n < ε and further 1/a_n < −1/ε < 0. Suppose E > 0 is given; choose ε = 1/E and n₀ as above. Then by the previous argument, n ≥ n₀ implies

1/a_n < −1/ε = −E.

This shows lim_{n→∞} 1/a_n = −∞.

(b) To ε = a there exists n₀ such that n ≥ n₀ implies | y_n − a | < a. That is, −a < y_n − a < a, or 0 < y_n < 2a, which proves the claim.

Lemma 2.5 Suppose that p(t) = ∑_{k=0}^r a_k t^k and q(t) = ∑_{k=0}^s b_k t^k are real polynomials with a_r ≠ 0 and b_s ≠ 0. Then

lim_{n→∞} p(n)/q(n) = 0, if r < s;
lim_{n→∞} p(n)/q(n) = a_r/b_s, if r = s;
lim_{n→∞} p(n)/q(n) = +∞, if r > s and a_r/b_s > 0;
lim_{n→∞} p(n)/q(n) = −∞, if r > s and a_r/b_s < 0.

Proof. Note first that

p(n)/q(n) = (n^r/n^s) · (a_r + a_{r−1}/n + ⋯ + a_0/n^r)/(b_s + b_{s−1}/n + ⋯ + b_0/n^s) =: (1/n^{s−r}) · c_n.

Suppose that r = s. By Proposition 2.3, 1/n^k → 0 for all k ∈ N. By the same proposition, the limit of each summand in the numerator and denominator is 0 except for the first in each sum. Hence,

lim_{n→∞} p(n)/q(n) = lim_{n→∞} (a_r + a_{r−1}/n + ⋯ + a_0/n^r) / lim_{n→∞} (b_s + b_{s−1}/n + ⋯ + b_0/n^s) = a_r/b_s.

Suppose now that r < s. As in the previous case, the sequence (c_n) tends to a_r/b_s, but the first factor 1/n^{s−r} tends to 0. Hence, the product sequence tends to 0.

Suppose that r > s and a_r/b_s > 0. The sequence (c_n) has a positive limit. By Lemma 2.4 (b), almost all c_n > 0. Hence,

d_n := q(n)/p(n) = (1/n^{r−s}) · (1/c_n)

tends to 0 by the above part, and d_n > 0 for almost all n. By Lemma 2.4 (a), the sequence p(n)/q(n) = 1/d_n tends to +∞ as n → ∞, which proves the claim in the first case. The case a_r/b_s < 0 can be obtained by multiplying with −1 and noting that lim_{n→∞} x_n = +∞ implies lim_{n→∞} (−x_n) = −∞.

In the German literature the next proposition is known as the ‘Theorem of the two policemen’.

Proposition 2.6 (Sandwich Theorem) Let (a_n), (b_n), and (x_n) be real sequences with a_n ≤ x_n ≤ b_n for all but finitely many n ∈ N. Further let lim_{n→∞} a_n = lim_{n→∞} b_n = x. Then (x_n) is also convergent to x.

Proof. Let ε > 0. There exist n₁, n₂, and n₃ ∈ N such that n ≥ n₁ implies a_n ∈ U_ε(x), n ≥ n₂ implies b_n ∈ U_ε(x), and n ≥ n₃ implies a_n ≤ x_n ≤ b_n. Choosing n₀ = max{n₁, n₂, n₃}, n ≥ n₀ implies x_n ∈ U_ε(x). Hence, x_n → x.

Remark. (a) If two sequences (a_n) and (b_n) differ only in finitely many elements, then both sequences converge to the same limit or both diverge.
(b) Define the “shifted sequence” b_n := a_{n+k}, n ∈ N, where k is a fixed positive integer. Then both sequences converge to the same limit or both diverge.

2.1.2 Some special sequences

Proposition 2.7
(a) If p > 0, then lim_{n→∞} 1/n^p = 0.
(b) If p > 0, then lim_{n→∞} p^{1/n} = 1.
(c) lim_{n→∞} n^{1/n} = 1.
(d) If a > 1 and α ∈ R, then lim_{n→∞} n^α/a^n = 0.

Proof. (a) Let ε > 0. Take n₀ > (1/ε)^{1/p} (note that the Archimedean property of the real numbers is used here). Then n ≥ n₀ implies 1/n^p < ε.

(b) If p > 1, put x_n = p^{1/n} − 1. Then x_n > 0, and by Bernoulli’s inequality (that is, by homework 4.1) we have

p^{1/n} = (1 + (p − 1))^{1/n} ≤ 1 + (p − 1)/n;  that is,  0 < x_n ≤ (p − 1)/n.

By Proposition 2.6, x_n → 0. If p = 1, (b) is trivial, and if 0 < p < 1 the result is obtained by taking reciprocals.

(c) Put x_n = n^{1/n} − 1. Then x_n ≥ 0, and, by the binomial theorem,

n = (1 + x_n)^n ≥ (n(n − 1)/2) x_n².

Hence,

0 ≤ x_n ≤ √(2/(n − 1)),  (n ≥ 2).

By (a), 1/√(n − 1) → 0. Applying the sandwich theorem again, x_n → 0 and so n^{1/n} → 1.

(d) Put p = a − 1; then p > 0. Let k be an integer such that k > α, k > 0.
For n > 2k, n k n(n − 1) · · · (n − k + 1) k nk pk n p = p > k . (1 + p) > k k! 2 k! 0 ≤ xn ≤ Hence, 0< nα nα 2k k! α−k = < n an (1 + p)n pk (n > 2k). Since α − k < 0, nα−k → 0 by (a). Q 1. Let (xn ) be a convergent sequence, xn → x. Then the sequence of arithmetic means n X sn := n1 xk also converges to x. k=1 Q 2. Let (xn ) > 0 a convergent sequence of positive numbers with and lim xn = x > 0. Then √ n x1 x2 · · · xn → x. Hint: Consider yn = log xn . 2.1.3 Monotonic Sequences Definition 2.3 A real sequence (xn ) is said to be (a) monotonically increasing if xn ≤ xn+1 for all n; (b) monotonically decreasing if xn ≥ xn+1 for all n. The class of monotonic sequences consists of the increasing and decreasing sequences. A sequence is said to be strictly monotonically increasing or decreasing if xn < xn+1 or xn > xn+1 for all n, respectively. We write xn ր and xn ց. Proposition 2.8 A monotonic and bounded sequence is convergent. More precisely, if (xn ) is increasing and bounded above, then lim xn = sup{xn }. If (xn ) is decreasing and bounded below, then lim xn = inf{xn }. Proof. Suppose xn ≤ xn+1 for all n (the proof is analogous in the other case). Let E := {xn | n ∈ N} and x = sup E. Then xn ≤ x, n ∈ N. For every ε > 0 there is an integer n0 ∈ N such that x − ε < xn0 < x, for otherwise x − ε would be an upper bound of E. Since xn increases, n ≥ n0 implies x − ε < xn < x, 2.1 Convergent Sequences 51 which shows that (xn ) converges to x. cn with some fixed c > 0. We will show that xn → 0 as n → ∞. n! Writing (xn ) recursively, Example 2.4 Let xn = xn+1 = c xn , n+1 (2.5) c < n+1 xn . On the other hand, xn > 0 for all n such that (xn ) is bounded below by 0. By Proposition 2.8, (xn ) converges to some x ∈ R. Taking the limit n → ∞ in (2.5), we have we observe that (xn ) is strictly decreasing for n ≥ c. Indeed, n ≥ c implies xn+1 = xn c · lim xn = 0 · x = 0. n→∞ n + 1 n→∞ x = lim xn+1 = lim n→∞ Hence, the sequence tends to 0. 
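The behaviour of Example 2.4 can be checked numerically: computing x_n = c^n/n! via the recursion (2.5) shows the terms decrease once n ≥ c and tend to 0. A minimal sketch (function name `x` is ours):

```python
def x(n, c):
    # x_n = c^n / n!, built iteratively via (2.5): x_{k} = (c/k) * x_{k-1}
    val = 1.0
    for k in range(1, n + 1):
        val *= c / k
    return val

c = 10.0
assert x(11, c) < x(10, c)     # strictly decreasing once n >= c
assert x(200, c) < 1e-100      # tends to 0
```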
2.1.4 Subsequences Definition 2.4 Let (xn ) be a sequence and (nk )k∈N a strictly increasing sequence of positive integers nk ∈ N. We call (xnk )k∈N a subsequence of (xn )n∈N . If (xnk ) converges, its limit is called a subsequential limit of (xn ). Example 2.5 (a) xn = 1/n, nk = 2k . then (xnk ) = (1/2, 1/4, 1/8, . . . ). (b) (xn ) = (1, −1, 1, −1, . . . ). (x2k ) = (−1, −1, . . . ) has the subsequential limit −1; (x2k+1 ) = (1, 1, 1, . . . ) has the subsequential limit 1. Proposition 2.9 Subsequences of convergent sequences are convergent with the same limit. Proof. Let lim xn = x and xnk be a subsequence. To ε > 0 there exists m0 ∈ N such that n ≥ m0 implies | xn − x | < ε. Since nm ≥ m for all m, m ≥ m0 implies | xnm − x | < ε; hence lim xnm = x. Definition 2.5 Let (xn ) be a sequence. We call x ∈ R a limit point of (xn ) if every neighborhood of x contains infinitely many elements of (xn ). Proposition 2.10 The point x is limit point of the sequence (xn ) if and only if x is a subsequential limit. Proof. If lim xnk = x then every neighborhood Uε (x) contains all but finitely many xnk ; in k→∞ particular, it contains infinitely many elements xn . That is, x is a limit point of (xn ). Suppose x is a limit point of (xn ). To ε = 1 there exists xn1 ∈ U1 (x). To ε = 1/k there exists nk with xnk ∈ U1/k (x) and nk > nk−1 . We have constructed a subsequence (xnk ) of (xn ) with | x − xnk | < 1 ; k 2 Sequences and Series 52 Hence, (xnk ) converges to x. Question: Which sequences do have limit points? The answer is: Every bounded sequence has limit points. Proposition 2.11 (Principle of nested intervals) Let In := [an , bn ] a sequence of closed nested intervals In+1 ⊆ In such that their lengths bn − an tend to 0: Given ε > 0 there exists n0 such that 0 ≤ bn − an < ε for all n ≥ n0 . For any such interval sequence {In } there exists a unique real number x ∈ R which is a member T of all intervals, i. e. {x} = n∈N In . Proof. 
Since the intervals are nested, (an ) ր is an increasing sequence bounded above by each of the bk , and (bn ) ց is decreasing sequence bounded below by each of the ak . Consequently, by Proposition 2.8 we have ∃x = lim an = sup{an } ≤ bm , n→∞ for all m, and ∃y = lim bm = inf{bm } ≥ x. Since an ≤ x ≤ y ≤ bn for all n ∈ N ∅ 6= [x, y] ⊆ m→∞ \ N In . n∈ T We show the converse inclusion namely that n∈N [an , bn ] ⊆ [x, y]. Let p ∈ In for all n, that is, an ≤ p ≤ bn for all n ∈ N. Hence supn an ≤ p ≤ inf n bn ; that is p ∈ [x, y]. Thus, T [x, y] = n∈N In . We show uniqueness, that is x = y. Given ε > 0 we find n such that y − x ≤ bn − an ≤ ε. Hence y − x ≤ 0; therefore x = y. The intersection contains a unique point x. Proposition 2.12 (Bolzano–Weierstraß) A bounded real sequence has a limit point. Proof. We use the principle of nested intervals. Let (xn ) be bounded, say | xn | ≤ C. Hence, the interval [−C, C] contains infinitely many xk . Consider the intervals [−C, 0] and [0, C]. At least one of them contains infinitely many xk , say I1 := [a1 , b1 ]. Suppose, we have already constructed In = [an , bn ] of length bn − an = C/2n−2 which contains infinitely many xk . Consider the two intervals [an , (an + bn )/2] and [(an + bn )/2, bn ] of length C/2n−1 . At least one of them still contains infinitely many xk , say In+1 := [an+1 , bn+1 ]. In this way we have constructed a nested sequence of intervals which length go to 0. By Proposition 2.11, there T exists a unique x ∈ n∈N In . We will show that x is a subsequential limit of (xn ) (and hence a limit point). For, choose xnk ∈ Ik ; this is possible since Ik contains infinitely many xm . Then, ak ≤ xnk ≤ bk for all k ∈ N. Proposition 2.6 gives x = lim ak ≤ lim xnk ≤ lim bk = x; hence lim xnk = x. Remark. The principle of nested intevals is equivalent to the order completeness of R. 2.1 Convergent Sequences 53 Example 2.6 (a) xn = (−1)n−1 + n1 ; set of limit points is {−1, 1}. 
First note that −1 = limn→∞ x2n and 1 = limn→∞ x2n+1 are subsequential limits of (xn ). We show, for example, that 13 is not a limit point. Indeed, for n ≥ 4, there exists a small neighborhood of 13 which has no intersection with U 1 (1) and U 1 (−1). Hence, 13 is not a limit point. 4 hni 4 , where [x] denotes the least integer less than or equal to x ([π] = [3] = 3, (b) xn = n − 5 5 [−2.8] = −3, [1/2] = 0). (xn ) = (1, 2, 3, 4, 0, 1, 2, 3, 4, 0, . . . ); the set of limit points is {0, 1, 2, 3, 4} (c) One can enumerate the rational numbers in (0, 1) in the following way. 1 , x1 , 2 1 2 , 3, x2 , x3 3 1 2 3 , 4, , x4 , x5 , x6 , 4 4 ··· ··· The set of limit points is the whole interval [0, 1] since in any neighborhood of any real number there is a rational number, see Proposition 1.11 (b) 1.11 (b). Any rational number of (0, 1) appears infinitely often in this sequence, namely as pq = 2p = 3p = ···. 2q 3q (d) xn = n has no limit point. Since (xn ) is not bounded, Bolzano-Weierstraß fails to apply. Definition 2.6 (a) Let (xn ) be a bounded sequence and A its set of limit points. Then sup A is called the upper limit of (xn ) and inf A is called the lower limit of (xn ). We write lim xn , and n→∞ = lim xn . n→∞ for the upper and lower limits of (xn ), respectively. (b) If (xn ) is not bounded above, we write lim xn = +∞. If moreover +∞ is the only limit point, lim xn = +∞, and we can also write lim xn = +∞. If (xn ) is not bounded below, lim xn = −∞. Proposition 2.13 Let (xn ) be a bounded sequence and A the set of limit points of (xn ). Then lim xn and lim xn are also limit points of (xn ). Proof. Let x = lim xn . Let ε > 0. By the definition of the supremum of A there exists x′ ∈ A with ε x − < x′ < x. 2 Since x′ is a limit point, U 2ε (x′ ) contains infinitely many elements xk . By construction, U 2ε (x′ ) ⊆ Uε (x). Indeed, x ∈ U 2ε (x′ ) implies | x − x′ | < 2ε and therefore | x − x | = | x − x′ + x′ − x | ≤ | x − x′ | + | x′ − x | < ε ε + = ε. 
2 2 Hence, x is a limit point, too. The proof for lim xn is similar. Proposition 2.14 Let b ∈ R be fixed. Suppose (xn ) is a sequence which is bounded above, then xn ≤ b for all but finitely many n implies lim xn ≤ b. n→∞ (2.6) 2 Sequences and Series 54 Similarly, if (xn ) is bounded below, then xn ≥ b for all but finitely many n implies lim xn ≥ b. (2.7) n→∞ Proof. We prove only the first part for lim xn . Proving statement for lim xn is similar. Let t := lim xn . Suppose to the contrary that t > b. Set ε = (t − b)/2, then Uε (t) contains infinitely many xn (t is a limit point) which are all greater than b; this contradicts xn ≤ b for all but finitely many n. Hence lim xn ≤ b. Applying the first part to b = supn {xn } and noting that inf A ≤ sup A, we have inf {xn } ≤ lim xn ≤ lim xn ≤ sup{xn }. n n→∞ n→∞ n The next proposition is a converse statement to Proposition 2.9. Proposition 2.15 Let (xn ) be a bounded sequence with a unique limit point x. Then (xn ) converges to x. Proof. Suppose to the contrary that (xn ) diverges; that is, there exists some ε > 0 such that infinitely many xn are outside Uε (x). We view these elements as a subsequence (yk ) := (xnk ) of (xn ). Since (xn ) is bounded, so is (yk ). By Proposition 2.12 there exists a limit point y of (yk ) which is in turn also a limit point of (xn ). Since y 6∈ Uε (x), y 6= x is a second limit point; a contradiction! We conclude that (xn ) converges to x. Note that t := lim xn is uniquely characterized by the following two properties. For every ε > 0 t − ε <xn , for infinitely many n, xn < t + ε, for almost all n. (See also homework 6.2) Let us consider the above examples. n−1 Example 2.7 (a) + 1/n; lim xn = −1, lim xn = 1. h nxin = (−1) , lim xn = 0, lim xn = 4. (b) xn = n − 5 5 (c) (xn ) is the sequence of rational numbers of (0, 1); lim xn = 0, lim xn = 1. (d) xn = n; lim xn = lim xn = +∞. Proposition 2.16 If sn ≤ tn for all but finitely many n, then lim sn ≤ lim tn , n→∞ n→∞ lim sn ≤ lim tn . 
n→∞ n→∞ 2.2 Cauchy Sequences 55 Proof. (a) We keep the notations s∗ and t∗ for the upper limits of (sn ) and (tn ), respectively. Set s = lim sn and t = lim tn . Let ε > 0. By homework 6.3 (a) =⇒ by assumption s − ε ≤ sn for all but finitely many n s − ε ≤ sn ≤ tn for all but finitely many n =⇒ s − ε ≤ lim tn by Prp. 2.14 =⇒ by first Remark in Subsection 1.1.7 sup{s − ε | ε > 0 } ≤ t s ≤ t. (b) The proof for the lower limit follows from (a) and − sup E = inf(−E). 2.2 Cauchy Sequences The aim of this section is to characterize convergent sequences without knowing their limits. Definition 2.7 A sequence (xn ) is said to be a Cauchy sequence if: For every ε > 0 there exists a positive integer n0 such that | xn − xm | < ε for all m, n ≥ n0 . The definition makes sense in arbitrary metric spaces. The definition is equivalent to ∀ ε > 0 ∃ n0 ∈ N ∀ n ≥ n0 ∀ k ∈ N : | xn+k − xn | < ε. Lemma 2.17 Every convergent sequence is a Cauchy sequence. Proof. Let xn → x. To ε > 0 there is n0 ∈ N such that n ≥ n0 implies xn ∈ Uε/2 (x). By triangle inequality, m, n ≥ n0 implies | xn − xm | ≤ | xn − x | + | xm − x | ≤ ε/2 + ε/2 = ε. Hence, (xn ) is a Cauchy sequence. Proposition 2.18 (Cauchy convergence criterion) A real sequence is convergent if and only if it is a Cauchy sequence. Proof. One direction is Lemma 2.17. We prove the other direction. Let (xn ) be a Cauchy sequence. First we show that (xn ) is bounded. To ε = 1 there is a positive integer n0 such that m, n ≥ n0 implies | xm − xn | < 1. In particular | xn − xn0 | < 1 for all n ≥ n0 ; hence | xn | < 1 + | xn0 |. Setting C = max{| x1 | , | x2 | , . . . , | xn0 −1 | , | xn0 | + 1}, 2 Sequences and Series 56 | xn | < C for all n. By Proposition 2.12 there exists a limit point x of (xn ); and by Proposition 2.10 a subsequence (xnk ) converging to x. We will show that limn→∞ xn = x. Let ε > 0. Since xnk → x we find k0 ∈ N such that k ≥ k0 implies | xnk − x | < ε/2. 
Since $(x_n)$ is a Cauchy sequence, there exists $n_0 \in \mathbb{N}$ such that $m, n \ge n_0$ implies $|x_n - x_m| < \varepsilon/2$. Put $n_1 := \max\{n_0, n_{k_0}\}$ and choose $k_1$ with $n_{k_1} \ge n_1 \ge n_{k_0}$. Then $n \ge n_1$ implies
$$ |x - x_n| \le |x - x_{n_{k_1}}| + |x_{n_{k_1}} - x_n| < 2 \cdot \varepsilon/2 = \varepsilon. $$

Example 2.8 (a) $x_n = \sum_{k=1}^n \frac{1}{k} = 1 + \frac12 + \frac13 + \cdots + \frac1n$. We show that $(x_n)$ is not a Cauchy sequence. For, consider
$$ x_{2m} - x_m = \sum_{k=m+1}^{2m} \frac{1}{k} \ge \sum_{k=m+1}^{2m} \frac{1}{2m} = m \cdot \frac{1}{2m} = \frac12. $$
Hence, there is no $n_0$ such that $p, n \ge n_0$ implies $|x_p - x_n| < \frac12$.

(b) $x_n = \sum_{k=1}^n \frac{(-1)^{k+1}}{k} = 1 - 1/2 + 1/3 - + \cdots + (-1)^{n+1}\,1/n$. Consider
$$ x_{n+k} - x_n = (-1)^n\left(\frac{1}{n+1} - \frac{1}{n+2} + \frac{1}{n+3} - \cdots + (-1)^{k+1}\frac{1}{n+k}\right). $$
Grouping consecutive summands in pairs,
$$ |x_{n+k} - x_n| = \left(\frac{1}{n+1} - \frac{1}{n+2}\right) + \left(\frac{1}{n+3} - \frac{1}{n+4}\right) + \cdots + \begin{cases} \frac{1}{n+k-1} - \frac{1}{n+k}, & k \text{ even},\\[2pt] \frac{1}{n+k}, & k \text{ odd}; \end{cases} $$
on the other hand,
$$ |x_{n+k} - x_n| = \frac{1}{n+1} - \left(\frac{1}{n+2} - \frac{1}{n+3}\right) - \cdots - \begin{cases} \frac{1}{n+k}, & k \text{ even},\\[2pt] \frac{1}{n+k-1} - \frac{1}{n+k}, & k \text{ odd}. \end{cases} $$
Since all summands in parentheses are positive, we conclude
$$ |x_{n+k} - x_n| < \frac{1}{n+1}. $$
Hence, $(x_n)$ is a Cauchy sequence and converges.

2.3 Series

Definition 2.8 Given a sequence $(a_n)$, we associate with $(a_n)$ the sequence $(s_n)$, where
$$ s_n = \sum_{k=1}^n a_k = a_1 + a_2 + \cdots + a_n. $$
For $(s_n)$ we also use the symbol
$$ \sum_{k=1}^{\infty} a_k, \qquad (2.8) $$
and we call it an infinite series, or just a series. The numbers $s_n$ are called the partial sums of the series. If $(s_n)$ converges to $s$, we say that the series converges, and write
$$ \sum_{k=1}^{\infty} a_k = s. $$
The number $s$ is called the sum of the series.

Remarks 2.1 (a) The sum of a series should be clearly understood as the limit of the sequence of partial sums; it is not simply obtained by addition.
(b) If $(s_n)$ diverges, the series is said to be divergent.
(c) The symbol $\sum_{k=1}^\infty a_k$ denotes both the sequence of partial sums and the limit of this sequence (if it exists). Sometimes we use series of the form $\sum_{k=k_0}^\infty a_k$, $k_0 \in \mathbb{N}$. We simply write $\sum a_k$ if there is no ambiguity about the bounds of the index $k$.
Example 2.9 (Example 2.8 continued)
(1) $\sum_{n=1}^\infty \frac{1}{n}$ is divergent. This is the harmonic series.
(2) $\sum_{n=1}^\infty (-1)^{n+1}\frac{1}{n}$ is convergent. It is an example of an alternating series (the summands change their signs, and the absolute values of the summands form a sequence decreasing to 0).
(3) $\sum_{n=0}^\infty q^n$ is called the geometric series. It is convergent for $|q| < 1$ with $\sum_{n=0}^\infty q^n = \frac{1}{1-q}$. This is seen from $\sum_{k=0}^n q^k = \frac{1 - q^{n+1}}{1 - q}$; see the proof of Lemma 1.14, first formula with $y = 1$, $x = q$. The series diverges for $|q| \ge 1$. The general formula in case $|q| < 1$ is
$$ \sum_{n=n_0}^\infty c\,q^n = \frac{c\,q^{n_0}}{1-q}. \qquad (2.9) $$

2.3.1 Properties of Convergent Series

Lemma 2.19 (1) If $\sum_{n=1}^\infty a_n$ is convergent, then $\sum_{k=m}^\infty a_k$ is convergent for any $m \in \mathbb{N}$.
(2) If $\sum a_n$ is convergent, then the sequence $r_n := \sum_{k=n+1}^\infty a_k$ tends to 0 as $n \to \infty$.
(3) If $(a_n)$ is a sequence of nonnegative real numbers, then $\sum a_n$ converges if and only if the partial sums are bounded.

Proof. (1) Suppose that $\sum_{n=1}^\infty a_n = s$; we show that $\sum_{k=m}^\infty a_k = s - (a_1 + a_2 + \cdots + a_{m-1})$. Indeed, let $(s_n)$ and $(t_n)$ denote the $n$th partial sums of $\sum_{k=1}^\infty a_k$ and $\sum_{k=m}^\infty a_k$, respectively. Then for $n > m$ one has $t_n = s_n - \sum_{k=1}^{m-1} a_k$. Taking the limit $n \to \infty$ proves the claim.

(2) Suppose that $\sum_{n=1}^\infty a_n$ converges to $s$. By (1), $r_n = \sum_{k=n+1}^\infty a_k$ is also a convergent series for all $n$. We have
$$ \sum_{k=1}^\infty a_k = \sum_{k=1}^n a_k + \sum_{k=n+1}^\infty a_k \implies s = s_n + r_n \implies r_n = s - s_n \implies \lim_{n\to\infty} r_n = s - s = 0. $$

(3) Suppose $a_n \ge 0$; then $s_{n+1} = s_n + a_{n+1} \ge s_n$. Hence, $(s_n)$ is an increasing sequence, and if it is bounded it converges by Proposition 2.8. The other direction is trivial since every convergent sequence is bounded.

Proposition 2.20 (Cauchy criterion) $\sum a_n$ converges if and only if for every $\varepsilon > 0$ there is an integer $n_0 \in \mathbb{N}$ such that
$$ \left|\sum_{k=m}^{n} a_k\right| < \varepsilon \qquad (2.10) $$
if $m, n \ge n_0$.

Proof. Clear from Proposition 2.18. Consider the sequence of partial sums $s_n = \sum_{k=1}^n a_k$ and note that for $n \ge m$ one has $|s_n - s_{m-1}| = \left|\sum_{k=m}^n a_k\right|$.
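Formula (2.9) is easy to test numerically; the following sketch (values $c$, $q$, $n_0$ are my arbitrary choices) compares a long truncation of the tail series with the closed form.

```python
# Sketch checking formula (2.9): sum_{n >= n0} c*q**n = c*q**n0 / (1 - q)
# for |q| < 1; 80 terms are far more than enough for q = 0.5.
c, q, n0 = 2.0, 0.5, 3

partial = sum(c * q ** n for n in range(n0, n0 + 80))
closed = c * q ** n0 / (1 - q)

assert abs(partial - closed) < 1e-12
```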
Corollary 2.21 If $\sum a_n$ converges, then $(a_n)$ converges to 0.

Proof. Take $m = n$ in (2.10); this yields $|a_n| < \varepsilon$. Hence $(a_n)$ tends to 0.

Proposition 2.22 (Comparison test) (a) If $|a_n| \le C b_n$ for some $C > 0$ and for almost all $n \in \mathbb{N}$, and if $\sum b_n$ converges, then $\sum a_n$ converges.
(b) If $a_n \ge C d_n \ge 0$ for some $C > 0$ and for almost all $n$, and if $\sum d_n$ diverges, then $\sum a_n$ diverges.

Proof. (a) Suppose $n \ge n_1$ implies $|a_n| \le C b_n$. Given $\varepsilon > 0$, there exists $n_0 \ge n_1$ such that $m, n \ge n_0$ implies
$$ \sum_{k=m}^n b_k < \frac{\varepsilon}{C} $$
by the Cauchy criterion. Hence
$$ \left|\sum_{k=m}^n a_k\right| \le \sum_{k=m}^n |a_k| \le \sum_{k=m}^n C b_k < \varepsilon, $$
and (a) follows by the Cauchy criterion.
(b) follows from (a), for if $\sum a_n$ converged, so would $\sum d_n$.

2.3.2 Operations with Convergent Series

Definition 2.9 If $\sum a_n$ and $\sum b_n$ are series, we define sums and differences as follows:
$$ \sum a_n \pm \sum b_n := \sum (a_n \pm b_n), \qquad c \sum a_n := \sum c\,a_n, \quad c \in \mathbb{R}. $$
Let $c_n := \sum_{k=1}^n a_k b_{n-k+1}$; then $\sum c_n$ is called the Cauchy product of $\sum a_n$ and $\sum b_n$.

If $\sum a_n$ and $\sum b_n$ are convergent, it is easy to see that $\sum_{n=1}^\infty (a_n + b_n) = \sum_{n=1}^\infty a_n + \sum_{n=1}^\infty b_n$ and $\sum_{n=1}^\infty c\,a_n = c \sum_{n=1}^\infty a_n$.

Caution: the product series $\sum c_n$ need not be convergent. Indeed, let $a_n := b_n := (-1)^n/\sqrt{n}$. One can show that $\sum a_n$ and $\sum b_n$ are convergent (see Proposition 2.29 below); however, $\sum c_n$ with $c_n = \sum_{k=1}^n a_k b_{n-k+1}$ is not convergent. Proof: by the arithmetic–geometric mean inequality,
$$ |a_k b_{n-k+1}| = \frac{1}{\sqrt{k(n+1-k)}} \ge \frac{2}{n+1}. $$
Hence $|c_n| \ge \sum_{k=1}^n \frac{2}{n+1} = \frac{2n}{n+1}$. Since $c_n$ does not converge to 0 as $n \to \infty$, $\sum_{n=0}^\infty c_n$ diverges by Corollary 2.21.

2.3.3 Series of Nonnegative Numbers

Proposition 2.23 (Compression Theorem) Suppose $a_1 \ge a_2 \ge \cdots \ge 0$. Then the series $\sum_{n=1}^\infty a_n$ converges if and only if the series
$$ \sum_{k=0}^\infty 2^k a_{2^k} = a_1 + 2a_2 + 4a_4 + 8a_8 + \cdots \qquad (2.11) $$
converges.

Proof. By Lemma 2.19 (3) it suffices to consider boundedness of the partial sums. Let
$$ s_n = a_1 + \cdots + a_n, \qquad t_k = a_1 + 2a_2 + \cdots + 2^k a_{2^k}. $$
For $n < 2^{k+1}$,
$$ s_n \le a_1 + (a_2 + a_3) + \cdots + (a_{2^k} + \cdots + a_{2^{k+1}-1}) \le a_1 + 2a_2 + \cdots + 2^k a_{2^k} = t_k. \qquad (2.12) $$
On the other hand, if $n > 2^k$,
$$ s_n \ge a_1 + a_2 + (a_3 + a_4) + \cdots + (a_{2^{k-1}+1} + \cdots + a_{2^k}) \ge \tfrac12 a_1 + a_2 + 2a_4 + \cdots + 2^{k-1} a_{2^k} \ge \tfrac12 t_k. \qquad (2.13) $$
By (2.12) and (2.13), the sequences $(s_n)$ and $(t_k)$ are either both bounded or both unbounded. This completes the proof.

Example 2.10 (a) $\sum_{n=1}^\infty \frac{1}{n^p}$ converges if $p > 1$ and diverges if $p \le 1$.
If $p \le 0$, divergence follows from Corollary 2.21. If $p > 0$, Proposition 2.23 is applicable, and we are led to the series
$$ \sum_{k=0}^\infty 2^k \frac{1}{2^{kp}} = \sum_{k=0}^\infty \left(\frac{1}{2^{p-1}}\right)^k. $$
This is a geometric series with $q = \frac{1}{2^{p-1}}$. It converges if and only if $2^{p-1} > 1$, that is, if and only if $p > 1$.

(b) If $p > 1$,
$$ \sum_{n=2}^\infty \frac{1}{n(\log n)^p} \qquad (2.14) $$
converges; if $p \le 1$, the series diverges. Here "$\log n$" denotes the logarithm to the base $e$.
If $p \le 0$, then $\frac{1}{n(\log n)^p} \ge \frac{1}{n}$ for $n \ge 3$, and divergence follows by comparison with the harmonic series. Now let $p > 0$. By Lemma 1.23 (b), $\log n < \log(n+1)$. Hence $\left(n(\log n)^p\right)$ increases and $\left(\frac{1}{n(\log n)^p}\right)$ decreases; we can apply Proposition 2.23 to (2.14). This leads us to the series
$$ \sum_{k=1}^\infty 2^k \cdot \frac{1}{2^k(\log 2^k)^p} = \sum_{k=1}^\infty \frac{1}{(k \log 2)^p} = \frac{1}{(\log 2)^p}\sum_{k=1}^\infty \frac{1}{k^p}, $$
and the assertion follows from example (a).
This procedure can evidently be continued. For instance,
$$ \sum_{n=3}^\infty \frac{1}{n \log n \log\log n} \quad\text{diverges, whereas}\quad \sum_{n=3}^\infty \frac{1}{n \log n (\log\log n)^2} \quad\text{converges.} $$

2.3.4 The Number e

Leonhard Euler (Basel 1707 – St. Petersburg 1783) was one of the greatest mathematicians. He made contributions to number theory, ordinary differential equations, the calculus of variations, astronomy, and mechanics. Fermat (1635) claimed that all numbers of the form $f_n = 2^{2^n} + 1$, $n \in \mathbb{N}$, are prime numbers. This is true for the first five of these numbers (3, 5, 17, 257, 65537), but Euler showed that $641 \mid 2^{32} + 1$. In fact, it is open whether any other $f_n$ is a prime number.
Euler showed that the equation $x^3 + y^3 = z^3$ has no solution in positive integers $x, y, z$. This is a special case of Fermat's last theorem. It is known that the limit
$$ \gamma = \lim_{n\to\infty}\left(1 + \frac12 + \frac13 + \cdots + \frac1n - \log n\right) $$
exists and gives a finite number $\gamma$, the so-called Euler constant. It is not known whether $\gamma$ is rational or not. Further, the Euler numbers $E_r$ play a role in calculating the series $\sum_n (-1)^n \frac{1}{(2n+1)^r}$. Soon we will speak about the Euler formula $e^{ix} = \cos x + i\sin x$. More about the life and work of famous mathematicians is to be found at www-history.mcs.st-andrews.ac.uk/

We define
$$ e := \sum_{n=0}^\infty \frac{1}{n!}, \qquad (2.15) $$
where $0! = 1! = 1$ by definition. Since
$$ s_n = 1 + 1 + \frac{1}{1\cdot 2} + \frac{1}{1\cdot 2\cdot 3} + \cdots + \frac{1}{1\cdot 2\cdots n} < 1 + 1 + \frac12 + \frac{1}{2^2} + \cdots + \frac{1}{2^{n-1}} < 3, $$
the series converges (by comparing it with the geometric series with $q = \frac12$), and the definition makes sense. In fact, the series converges very rapidly and allows us to compute $e$ with great accuracy. $e$ is called the Euler number. It is of interest to note that $e$ can also be defined by means of another limit process.

Proposition 2.24
$$ e = \lim_{n\to\infty}\left(1 + \frac{1}{n}\right)^n. \qquad (2.16) $$

Proof. Let
$$ s_n = \sum_{k=0}^n \frac{1}{k!}, \qquad t_n = \left(1 + \frac{1}{n}\right)^n. $$
By the binomial theorem,
$$ t_n = 1 + n\cdot\frac{1}{n} + \frac{n(n-1)}{2!}\cdot\frac{1}{n^2} + \frac{n(n-1)(n-2)}{3!}\cdot\frac{1}{n^3} + \cdots + \frac{n(n-1)\cdots 1}{n!}\cdot\frac{1}{n^n} $$
$$ = 1 + 1 + \frac{1}{2!}\left(1 - \frac1n\right) + \frac{1}{3!}\left(1 - \frac1n\right)\left(1 - \frac2n\right) + \cdots + \frac{1}{n!}\left(1 - \frac1n\right)\left(1 - \frac2n\right)\cdots\left(1 - \frac{n-1}{n}\right) $$
$$ \le 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \cdots + \frac{1}{n!}. $$
Hence, $t_n \le s_n$, so that by Proposition 2.16
$$ \varlimsup_{n\to\infty} t_n \le \varlimsup_{n\to\infty} s_n = \lim_{n\to\infty} s_n = e. \qquad (2.17) $$
Next, if $n \ge m$,
$$ t_n \ge 1 + 1 + \frac{1}{2!}\left(1 - \frac1n\right) + \cdots + \frac{1}{m!}\left(1 - \frac1n\right)\cdots\left(1 - \frac{m-1}{n}\right). $$
Let $n \to \infty$, keeping $m$ fixed; again by Proposition 2.16 we get
$$ \varliminf_{n\to\infty} t_n \ge 1 + 1 + \frac{1}{2!} + \cdots + \frac{1}{m!} = s_m. $$
Letting $m \to \infty$, we finally get
$$ e \le \varliminf_{n\to\infty} t_n. \qquad (2.18) $$
The proposition follows from (2.17) and (2.18).

The rapidity with which the series $\sum 1/n!$ converges can be estimated as follows:
$$ \frac{1}{(n+1)!} + \frac{1}{(n+2)!} + \cdots < \frac{1}{(n+1)!}\left(1 + \frac{1}{n+1} + \frac{1}{(n+1)^2} + \cdots\right) = \frac{1}{(n+1)!}\cdot\frac{1}{1 - \frac{1}{n+1}} = \frac{1}{n!\,n}, $$
so that
$$ 0 < e - s_n < \frac{1}{n!\,n}. \qquad (2.19) $$
We use the preceding inequality to compute $e$. For $n = 9$ we find
$$ s_9 = 1 + 1 + \frac12 + \frac16 + \frac{1}{24} + \frac{1}{120} + \frac{1}{720} + \frac{1}{5040} + \frac{1}{40{,}320} + \frac{1}{362{,}880} = 2.718281526\ldots \qquad (2.20) $$
By (2.19), $e - s_9 < \frac{1}{9!\cdot 9} < 3.1\cdot 10^{-7}$, so that the first six digits of $e$ in (2.20) are correct.

Example 2.11 (a)
$$ \lim_{n\to\infty}\left(1 - \frac1n\right)^n = \lim_{n\to\infty}\left(\frac{n-1}{n}\right)^n = \lim_{n\to\infty}\frac{1}{\left(1+\frac{1}{n-1}\right)^{n-1}\left(1+\frac{1}{n-1}\right)} = \frac{1}{e}. $$
(b)
$$ \lim_{n\to\infty}\left(\frac{3n+1}{3n-1}\right)^{4n} = \lim_{n\to\infty}\left(\frac{3n+1}{3n}\right)^{4n}\cdot\lim_{n\to\infty}\left(\frac{3n}{3n-1}\right)^{4n} = \lim_{n\to\infty}\left(\left(1+\frac{1}{3n}\right)^{3n}\right)^{\!4/3}\cdot\lim_{n\to\infty}\left(\left(1+\frac{1}{3n-1}\right)^{3n}\right)^{\!4/3} = e^{4/3}\cdot e^{4/3} = e^{8/3}. $$

Proposition 2.25 $e$ is irrational.

Proof. Suppose $e$ is rational, say $e = p/q$ with positive integers $p$ and $q$. By (2.19),
$$ 0 < q!\,(e - s_q) < \frac1q. \qquad (2.21) $$
By our assumption, $q!\,e$ is an integer. Since
$$ q!\,s_q = q!\left(1 + 1 + \frac{1}{2!} + \cdots + \frac{1}{q!}\right) $$
is also an integer, we see that $q!\,(e - s_q)$ is an integer. Since $q \ge 1$, (2.21) implies the existence of an integer between 0 and 1, which is absurd.

2.3.5 The Root and the Ratio Tests

Theorem 2.26 (Root Test) Given $\sum a_n$, put $\alpha = \varlimsup_{n\to\infty} \sqrt[n]{|a_n|}$. Then
(a) if $\alpha < 1$, $\sum a_n$ converges;
(b) if $\alpha > 1$, $\sum a_n$ diverges;
(c) if $\alpha = 1$, the test gives no information.

Proof. (a) If $\alpha < 1$, choose $\beta$ such that $\alpha < \beta < 1$, and an integer $n_0$ such that
$$ \sqrt[n]{|a_n|} < \beta \quad\text{for } n \ge n_0 $$
(such an $n_0$ exists since $\alpha$ is the supremum of the limit set of $(\sqrt[n]{|a_n|})$). That is, $n \ge n_0$ implies $|a_n| < \beta^n$. Since $0 < \beta < 1$, $\sum \beta^n$ converges. Convergence of $\sum a_n$ now follows from the comparison test.
(b) If $\alpha > 1$ there is a subsequence $(a_{n_k})$ such that $\sqrt[n_k]{|a_{n_k}|} \to \alpha$. Hence $|a_n| > 1$ for infinitely many $n$, so that the necessary condition for convergence, $a_n \to 0$, fails.
(c) Consider the series $\sum \frac{1}{n}$ and $\sum \frac{1}{n^2}$. For each of these series $\alpha = 1$, but the first diverges and the second converges.

Remark. (a) $\sum a_n$ converges if there exists $q < 1$ such that $\sqrt[n]{|a_n|} \le q$ for almost all $n$.
(b) $\sum a_n$ diverges if $|a_n| \ge 1$ for infinitely many $n$.
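The remark's criterion (a) is easy to see in action numerically; in the following sketch the sample series $\sum n^2/2^n$ and the threshold $q = 0.9$ are my own choices: the roots $\sqrt[n]{|a_n|}$ stay below $q < 1$ and in fact approach $1/2$.

```python
# Sketch of the root-test remark (a) for a_n = n**2 / 2**n:
# the n-th roots n**(2/n) / 2 stay below q = 0.9 and tend to 1/2.
roots = [((n * n) / 2 ** n) ** (1.0 / n) for n in range(20, 200)]

assert all(r < 0.9 for r in roots)       # some fixed q < 1 works
assert abs(roots[-1] - 0.5) < 0.05       # the roots settle near 1/2
```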
Theorem 2.27 (Ratio Test) The series $\sum a_n$
(a) converges if $\varlimsup_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right| < 1$,
(b) diverges if $\left|\frac{a_{n+1}}{a_n}\right| \ge 1$ for all but finitely many $n$.

In place of (b) one can also use the (weaker) statement:
(b') $\sum a_n$ diverges if $\varliminf_{n\to\infty} \left|\frac{a_{n+1}}{a_n}\right| > 1$.
Indeed, if (b') is satisfied, almost all elements of the sequence $\left(\left|\frac{a_{n+1}}{a_n}\right|\right)$ are $\ge 1$.

Corollary 2.28 The series $\sum a_n$
(a) converges if $\lim\left|\frac{a_{n+1}}{a_n}\right| < 1$,
(b) diverges if $\lim\left|\frac{a_{n+1}}{a_n}\right| > 1$.

Proof of Theorem 2.27. If condition (a) holds, we can find $\beta < 1$ and an integer $m$ such that $n \ge m$ implies $\left|\frac{a_{n+1}}{a_n}\right| < \beta$. In particular,
$$ |a_{m+1}| < \beta|a_m|, \quad |a_{m+2}| < \beta|a_{m+1}| < \beta^2|a_m|, \quad\ldots,\quad |a_{m+p}| < \beta^p|a_m|. $$
That is,
$$ |a_n| < \frac{|a_m|}{\beta^m}\,\beta^n \quad\text{for } n \ge m, $$
and (a) follows from the comparison test, since $\sum \beta^n$ converges.
If $|a_{n+1}| \ge |a_n|$ for $n \ge n_0$, it is seen that the condition $a_n \to 0$ does not hold, and (b) follows.

Remark 2.2 Homework 7.5 shows that in (b) "all but finitely many" cannot be replaced by the weaker assumption "infinitely many."

Example 2.12 (a) The series $\sum_{n=0}^\infty n^2/2^n$ converges since, for $n \ge 3$,
$$ \left|\frac{a_{n+1}}{a_n}\right| = \frac{(n+1)^2}{2^{n+1}}\cdot\frac{2^n}{n^2} = \frac12\left(1+\frac1n\right)^2 \le \frac12\left(1+\frac13\right)^2 = \frac89 < 1. $$
(b) Consider the series
$$ \frac12 + 1 + \frac18 + \frac14 + \frac{1}{32} + \frac{1}{16} + \frac{1}{128} + \frac{1}{64} + \cdots = \frac{1}{2^1} + \frac{1}{2^0} + \frac{1}{2^3} + \frac{1}{2^2} + \frac{1}{2^5} + \frac{1}{2^4} + \cdots, $$
where $\varliminf_{n\to\infty} \frac{a_{n+1}}{a_n} = \frac18$ and $\varlimsup_{n\to\infty} \frac{a_{n+1}}{a_n} = 2$, but $\lim \sqrt[n]{a_n} = \frac12$. Indeed, $a_{2n} = 1/2^{2n-2}$ and $a_{2n+1} = 1/2^{2n+1}$ yield
$$ \frac{a_{2n+1}}{a_{2n}} = \frac18, \qquad \frac{a_{2n}}{a_{2n-1}} = 2. $$
The root test indicates convergence; the ratio test does not apply.
(c) For $\sum \frac1n$ and $\sum \frac{1}{n^2}$, both the ratio and the root tests do not apply, since both $(a_{n+1}/a_n)$ and $(\sqrt[n]{a_n})$ converge to 1.

The ratio test is frequently easier to apply than the root test. However, the root test has wider scope.

Remark 2.3 For any sequence $(c_n)$ of positive real numbers,
$$ \varliminf_{n\to\infty}\frac{c_{n+1}}{c_n} \le \varliminf_{n\to\infty}\sqrt[n]{c_n} \le \varlimsup_{n\to\infty}\sqrt[n]{c_n} \le \varlimsup_{n\to\infty}\frac{c_{n+1}}{c_n}. $$
For the proof, see [Rud76, 3.37 Theorem]. In particular, if $\lim \frac{c_{n+1}}{c_n}$ exists, then $\lim \sqrt[n]{c_n}$ also exists and both limits coincide.
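The sequence of Example 2.12 (b) makes the inequality chain of Remark 2.3 concrete; the following sketch (my own illustration) shows the ratios jumping between $1/8$ and $2$ while the $n$-th roots sit near $1/2$.

```python
# Sketch of Example 2.12 (b): a_{2m} = 2**-(2m-2), a_{2m+1} = 2**-(2m+1).
def a(n):                      # n >= 1
    if n % 2 == 0:
        return 2.0 ** (2 - n)  # even index: a_{2m} = 2**-(2m-2)
    return 2.0 ** (-n)         # odd index: a_{2m+1} = 2**-(2m+1)

# The consecutive ratios oscillate: liminf = 1/8, limsup = 2.
ratios = [a(n + 1) / a(n) for n in range(1, 100)]
assert min(ratios) == 0.125 and max(ratios) == 2.0

# The n-th roots, in contrast, converge to 1/2 (root test applies).
assert abs(a(201) ** (1 / 201) - 0.5) < 0.01
```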
cn+1 √ exists, then lim n cn also cn P P bn = Proposition 2.29 (Leibniz criterion) Let bn be an alternating serie, that is P n+1 (−1) an with a decreasing sequence of positive numbers a1 ≥ a2 ≥ · · · ≥ 0. If lim an = 0 P then bn converges. Proof. The proof is quite the same as in Example 2.8 (b). We find for the partial sums sn of P bn | sn − sm | ≤ am+1 P if n ≥ m. Since (an ) tends to 0, the Cauchy criterion applies to (sn ). Hence, bn is convergent. 2.3.6 Absolute Convergence The series P an is said to converge absolutely if the series Proposition 2.30 If P an converges absolutely, then P plus the Cauchy criterion. k=m | an | converges. an converges. Proof. The assertion follows from the inequality n n X X ak ≤ | ak | k=m P 2 Sequences and Series 66 Remarks 2.4 For series with positive terms, absolute convergence is the same as convergence. P P P If an converges but | an | diverges, we say that an converges nonabsolutely. For inP n+1 stance (−1) /n converges nonabsolutely. The comparison test as well as the root and the ratio tests, is really a test for absolute convergence and cannot give any information about nonabsolutely convergent series. We shall see that we may operate with absolutely convergent series very much as with finite sums. We may multiply them, we may change the order in which the additions are carried out without effecting the sum of the series. But for nonabsolutely convergent sequences this is no longer true and more care has to be taken when dealing with them. Without proof we mention the fact that one can multiply absolutely convergent series; for the proof, see [Rud76, Theorem 3.50]. Proposition 2.31 If cn = n X k=0 P an converges absolutely with ak bn−k , n ∈ Z+ . Then ∞ X ∞ X an = A, n=0 P bn converges, ∞ X bn = B, n=0 cn = AB. n=0 2.3.7 Decimal Expansion of Real Numbers Proposition 2.32 (a) Let α be a real number with 0 ≤ α < 1. Then there exists a sequence (an ), an ∈ {0, 1, 2, . . . , 9} such that α= ∞ X an 10−n . 
The sequence $(a_n)$ is called a decimal expansion of $\alpha$.
(b) Given a sequence $(a_k)$, $a_k \in \{0, 1, \dots, 9\}$, there exists a real number $\alpha \in [0, 1]$ such that
$$ \alpha = \sum_{n=1}^\infty a_n 10^{-n}. $$

Proof. (b) Comparison with the geometric series yields
$$ \sum_{n=1}^\infty a_n 10^{-n} \le 9 \sum_{n=1}^\infty 10^{-n} = \frac{9}{10}\cdot\frac{1}{1 - 1/10} = 1. $$
Hence the series $\sum_{n=1}^\infty a_n 10^{-n}$ converges to some $\alpha \in [0, 1]$.

(a) Given $\alpha \in [0, 1)$, we use induction to construct a sequence $(a_n)$ with (2.22) and
$$ s_n \le \alpha < s_n + 10^{-n}, \quad\text{where } s_n = \sum_{k=1}^n a_k 10^{-k}. $$
First, cut $[0, 1)$ into 10 pieces $I_j := [j/10, (j+1)/10)$, $j = 0, \dots, 9$, of equal length. If $\alpha \in I_j$, put $a_1 := j$. Then
$$ s_1 = \frac{a_1}{10} \le \alpha < s_1 + \frac{1}{10}. $$
Suppose $a_1, \dots, a_n$ are already constructed and $s_n \le \alpha < s_n + 10^{-n}$. Consider the intervals $I_j := [s_n + j/10^{n+1},\, s_n + (j+1)/10^{n+1})$, $j = 0, \dots, 9$. There is exactly one $j$ such that $\alpha \in I_j$. Put $a_{n+1} := j$; then
$$ s_n + \frac{a_{n+1}}{10^{n+1}} \le \alpha < s_n + \frac{a_{n+1}+1}{10^{n+1}}, \quad\text{that is,}\quad s_{n+1} \le \alpha < s_{n+1} + 10^{-n-1}. $$
The induction step is complete. By construction $|\alpha - s_n| < 10^{-n}$, that is, $\lim s_n = \alpha$.

Remarks 2.5 (a) The proof shows that any real number $\alpha \in [0, 1)$ can be approximated by rational numbers.
(b) The construction avoids decimal expansions of the form $\alpha = \ldots a\,9\,9\,9\,9\ldots$, $a < 9$, and gives instead $\alpha = \ldots(a+1)\,0\,0\,0\ldots$. It gives a bijective correspondence between the real numbers of the interval $[0, 1)$ and the sequences $(a_n)$, $a_n \in \{0, 1, \dots, 9\}$, not ending with nines. The excluded sequence $(a_n) = (0, 1, 9, 9, \dots)$, for example, corresponds to the real number $0.02$.
(c) It is not difficult to see that $\alpha \in [0, 1)$ is rational if and only if there exist positive integers $n_0$ and $p$ such that $n \ge n_0$ implies $a_n = a_{n+p}$; that is, the decimal expansion is periodic from $n_0$ on.

2.3.8 Complex Sequences and Series

Almost all notions and theorems carry over from real sequences to complex sequences.
For example: a sequence $(z_n)$ of complex numbers converges to $z$ if for every (real) $\varepsilon > 0$ there exists a positive integer $n_0 \in \mathbb{N}$ such that $n \ge n_0$ implies $|z - z_n| < \varepsilon$.

The following proposition shows that convergence of a complex sequence can be reduced to the convergence of two real sequences.

Proposition 2.33 The complex sequence $(z_n)$ converges to some complex number $z$ if and only if the real sequence $(\operatorname{Re} z_n)$ converges to $\operatorname{Re} z$ and the real sequence $(\operatorname{Im} z_n)$ converges to $\operatorname{Im} z$.

Proof. Using the (complex) limit law $\lim(z_n + c) = c + \lim z_n$, it is easy to see that we can restrict ourselves to the case $z = 0$. Suppose first $z_n \to 0$. Proposition 1.20 (d) gives $|\operatorname{Re} z_n| \le |z_n|$; hence $\operatorname{Re} z_n$ tends to 0 as $n \to \infty$. Similarly, $|\operatorname{Im} z_n| \le |z_n|$, and therefore $\operatorname{Im} z_n \to 0$. Suppose now $x_n := \operatorname{Re} z_n \to 0$ and $y_n := \operatorname{Im} z_n \to 0$ as $n$ goes to infinity. Since $|z_n|^2 = x_n^2 + y_n^2$, $|z_n|^2 \to 0$ as $n \to \infty$; this implies $z_n \to 0$.

Since the complex field $\mathbb{C}$ is not an ordered field, all notions and propositions in which the order is involved do not make sense for complex series, or they need modifications. The sandwich theorem does not hold; there is no notion of monotonic sequences or of upper and lower limits. But there still are bounded sequences ($|z_n| \le C$), limit points, subsequences, Cauchy sequences, series, and absolute convergence. The following results remain true for complex sequences: Propositions/Lemmas/Theorems 2.1, 2.2, 2.3, 2.9, 2.10, 2.12, 2.15, 2.17, and 2.18. The Bolzano–Weierstraß theorem for bounded complex sequences $(z_n)$ can be proved by considering the real and imaginary sequences $(\operatorname{Re} z_n)$ and $(\operatorname{Im} z_n)$ separately.

The comparison test for series now reads:
(a) If $|a_n| \le C|b_n|$ for some $C > 0$ and for almost all $n \in \mathbb{N}$, and if $\sum |b_n|$ converges, then $\sum a_n$ converges.
(b) If $|a_n| \ge C|d_n|$ for some $C > 0$ and for almost all $n$, and if $\sum |d_n|$ diverges, then $\sum |a_n|$ diverges; in particular, $\sum a_n$ does not converge absolutely.

The Cauchy criterion, the root test, and the ratio test are true for complex series as well.
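Proposition 2.33 can be watched at work on a sample sequence (the sequence $z_n = z + 1/n + i/n^2$ is my own choice for illustration): real part, imaginary part, and the complex sequence itself all converge together.

```python
# Sketch of Proposition 2.33 with the assumed sample z_n = z + 1/n + 1j/n**2.
z = 2 - 3j

def z_n(n):
    return z + 1 / n + 1j / n ** 2

n = 10 ** 6
# Re z_n -> Re z and Im z_n -> Im z ...
assert abs(z_n(n).real - z.real) < 1e-5
assert abs(z_n(n).imag - z.imag) < 1e-5
# ... and indeed z_n -> z in modulus, since |z_n - z|**2 = x_n**2 + y_n**2.
assert abs(z_n(n) - z) < 2e-6
```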
Propositions 2.19, 2.20, 2.26, 2.27, 2.28, 2.30, and 2.31 are true for complex series.

2.3.9 Power Series

Definition 2.10 Given a sequence $(c_n)$ of complex numbers, the series
$$ \sum_{n=0}^\infty c_n z^n \qquad (2.23) $$
is called a power series. The numbers $c_n$ are called the coefficients of the series; $z$ is a complex number.

In general, the series will converge or diverge, depending on the choice of $z$. More precisely, with every power series there is associated a circle with center 0, the circle of convergence, such that (2.23) converges if $z$ is in the interior of the circle and diverges if $z$ is in the exterior. The radius $R$ of this disc of convergence is called the radius of convergence. On the disc of convergence, a power series defines a function, since it associates to each $z$ with $|z| < R$ a complex number, namely the sum of the numerical series $\sum_n c_n z^n$. For example, $\sum_{n=0}^\infty z^n$ defines the function $f(z) = \frac{1}{1-z}$ for $|z| < 1$. If almost all coefficients $c_n$ are 0, say $c_n = 0$ for all $n \ge m+1$, the power series is a finite sum and the corresponding function is a polynomial:
$$ \sum_{n=0}^\infty c_n z^n = \sum_{n=0}^m c_n z^n = c_0 + c_1 z + c_2 z^2 + \cdots + c_m z^m. $$

Theorem 2.34 Given a power series $\sum c_n z^n$, put
$$ \alpha = \varlimsup_{n\to\infty} \sqrt[n]{|c_n|}, \qquad R = \frac{1}{\alpha}. \qquad (2.24) $$
If $\alpha = 0$, set $R = +\infty$; if $\alpha = +\infty$, set $R = 0$. Then $\sum c_n z^n$ converges if $|z| < R$, and diverges if $|z| > R$.

The behavior on the circle of convergence cannot be described so simply.

Proof. Put $a_n = c_n z^n$ and apply the root test:
$$ \varlimsup_{n\to\infty} \sqrt[n]{|a_n|} = |z|\, \varlimsup_{n\to\infty} \sqrt[n]{|c_n|} = \frac{|z|}{R}. $$
This gives convergence if $|z| < R$ and divergence if $|z| > R$.

The nonnegative number $R$ is called the radius of convergence.

Example 2.13 (a) The series $\sum_{n=0}^m c_n z^n$ has $c_n = 0$ for almost all $n$. Hence $\alpha = \varlimsup_{n\to\infty} \sqrt[n]{|c_n|} = \lim_{n\to\infty} 0 = 0$ and $R = +\infty$.
(b) The series $\sum n^n z^n$ has $R = 0$.
(c) The series $\sum \frac{z^n}{n!}$ has $R = +\infty$. (In this case the ratio test is easier to apply than the root test. Indeed,
$$ \alpha = \lim_{n\to\infty}\left|\frac{c_{n+1}}{c_n}\right| = \lim_{n\to\infty}\frac{n!}{(n+1)!} = \lim_{n\to\infty}\frac{1}{n+1} = 0, $$
and therefore $R = +\infty$.)
(d) The series $\sum z^n$ has $R = 1$. If $|z| = 1$, it diverges, since $(z^n)$ does not tend to 0. This generalizes the geometric series; formula (2.9) still holds for complex $q$ with $|q| < 1$. For instance,
$$ \sum_{n=2}^\infty 2\left(\frac{i}{3}\right)^n = \frac{2(i/3)^2}{1 - i/3} = -\frac{3+i}{15}. $$
(e) The series $\sum z^n/n$ has $R = 1$. It diverges if $z = 1$. It converges for all other $z$ with $|z| = 1$ (without proof).
(f) The series $\sum z^n/n^2$ has $R = 1$. It converges for all $z$ with $|z| = 1$ by the comparison test, since $|z^n/n^2| = 1/n^2$.

2.3.10 Rearrangements

The generalized associative law for finite sums says that we can insert brackets without affecting the sum; for example, $((a_1 + a_2) + (a_3 + a_4)) = (a_1 + (a_2 + (a_3 + a_4)))$. We will see that a similar statement holds for series.

Suppose that $\sum_k a_k$ is a converging series and $\sum_l b_l$ is a series obtained from $\sum_k a_k$ by "inserting brackets," for example
$$ b_1 + b_2 + b_3 + \cdots = \underbrace{(a_1 + a_2)}_{b_1} + \underbrace{(a_3 + \cdots + a_{10})}_{b_2} + \underbrace{(a_{11} + a_{12})}_{b_3} + \cdots $$
Then $\sum_l b_l$ converges and the sum is the same. If $\sum_k a_k$ diverges to $+\infty$, the same is true for $\sum_l b_l$. However, divergence of $\sum a_k$ does not imply divergence of $\sum b_l$ in general, since $1 - 1 + 1 - 1 + 1 - 1 + \cdots$ diverges but $(1-1) + (1-1) + \cdots$ converges. For the proof, let $s_n = \sum_{k=1}^n a_k$ and $t_m = \sum_{l=1}^m b_l$. By construction, $t_m = s_{n_m}$ for a suitable subsequence $(s_{n_m})$ of the partial sums of $\sum_k a_k$. Convergence (proper or improper) of $(s_n)$ implies convergence (proper or improper) of any subsequence. Hence, $\sum_l b_l$ converges.

For finite sums, the generalized commutative law holds: $a_1 + a_2 + a_3 + a_4 = a_2 + a_4 + a_1 + a_3$; that is, any rearrangement of the summands does not affect the sum. We will see in Example 2.14 below that this is not true for arbitrary series, but it is true for absolutely converging ones (see Proposition 2.36 below).

Definition 2.11 Let $\sigma\colon \mathbb{N} \to \mathbb{N}$ be a bijective mapping; that is, in the sequence $(\sigma(1), \sigma(2), \dots)$ every positive integer appears once and only once.
Putting $a'_n = a_{\sigma(n)}$ $(n = 1, 2, \dots)$, we say that $\sum a'_n$ is a rearrangement of $\sum a_n$.

If $(s_n)$ and $(s'_n)$ are the partial sums of $\sum a_n$ and of a rearrangement $\sum a'_n$ of $\sum a_n$, it is easily seen that, in general, these two sequences consist of entirely different numbers. We are led to the problem of determining under what conditions all rearrangements of a convergent series converge and whether the sums are necessarily the same.

Example 2.14 (a) Consider the convergent series
$$ \sum_{n=1}^\infty \frac{(-1)^{n+1}}{n} = 1 - \frac12 + \frac13 - + \cdots \qquad (2.25) $$
and one of its rearrangements
$$ 1 - \frac12 - \frac14 + \frac13 - \frac16 - \frac18 + \frac15 - \frac{1}{10} - \frac{1}{12} + \cdots. \qquad (2.26) $$
If $s$ is the sum of (2.25), then $s > 0$ since
$$ s = \left(1 - \frac12\right) + \left(\frac13 - \frac14\right) + \cdots > 0. $$
We will show that (2.26) converges to $s' = s/2$. Namely,
$$ s' = \sum a'_n = \left(1 - \frac12\right) - \frac14 + \left(\frac13 - \frac16\right) - \frac18 + \left(\frac15 - \frac{1}{10}\right) - \frac{1}{12} + \cdots $$
$$ = \frac12 - \frac14 + \frac16 - \frac18 + \frac{1}{10} - \frac{1}{12} + \cdots = \frac12\left(1 - \frac12 + \frac13 - \frac14 + \frac15 - \cdots\right) = \frac{s}{2}. $$
Since $s \ne 0$, $s' \ne s$. Hence, there exist rearrangements which converge, but to a different limit.

(b) Consider the following rearrangement of the series (2.25):
$$ \sum a'_n = 1 - \frac12 + \frac13 - \frac14 + \left(\frac15 + \frac17\right) - \frac16 + \left(\frac19 + \frac{1}{11} + \frac{1}{13} + \frac{1}{15}\right) - \frac18 + \cdots $$
$$ \cdots + \left(\frac{1}{2^n+1} + \frac{1}{2^n+3} + \cdots + \frac{1}{2^{n+1}-1}\right) - \frac{1}{2n+2} + \cdots $$
Since for every positive integer $n \ge 10$
$$ \frac{1}{2^n+1} + \frac{1}{2^n+3} + \cdots + \frac{1}{2^{n+1}-1} - \frac{1}{2n+2} > 2^{n-1}\cdot\frac{1}{2^{n+1}} - \frac{1}{2n+2} = \frac14 - \frac{1}{2n+2} > \frac15, $$
the rearranged series diverges to $+\infty$.

Without proof (see [Rud76, 3.54 Theorem]) we mention the following surprising theorem. It shows (together with Proposition 2.36) that the absolute convergence of a series is necessary and sufficient for every rearrangement to be convergent (to the same limit).

Proposition 2.35 Let $\sum a_n$ be a series of real numbers which converges, but not absolutely. Suppose $-\infty \le \alpha \le \beta \le +\infty$. Then there exists a rearrangement $\sum a'_n$ with partial sums $s'_n$ such that
$$ \varliminf_{n\to\infty} s'_n = \alpha, \qquad \varlimsup_{n\to\infty} s'_n = \beta. $$
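The computation in Example 2.14 (a) can be reproduced numerically; the following sketch (grouping into the three-term blocks used above, with $s = \log 2$ taken as known) sums the rearrangement and compares with $s/2$.

```python
# Sketch of Example 2.14 (a): the rearranged series, summed in blocks
# 1/(2m-1) - 1/(2(2m-1)) - 1/(4m), approaches s/2 = (1/2) * log(2).
import math

total = 0.0
for m in range(1, 100001):
    total += 1 / (2 * m - 1) - 1 / (2 * (2 * m - 1)) - 1 / (4 * m)

assert abs(total - 0.5 * math.log(2)) < 1e-5
```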
Proposition 2.36 If $\sum a_n$ is a series of complex numbers which converges absolutely, then every rearrangement of $\sum a_n$ converges, and they all converge to the same sum.

Proof. Let $\sum a'_n$ be a rearrangement with partial sums $s'_n$. Given $\varepsilon > 0$, by the Cauchy criterion for the series $\sum |a_n|$ there exists $n_0 \in \mathbb{N}$ such that $n \ge m \ge n_0$ implies
$$ \sum_{k=m}^n |a_k| < \varepsilon. \qquad (2.27) $$
Now choose $p$ such that the integers $1, 2, \dots, n_0$ are all contained in the set $\{\sigma(1), \sigma(2), \dots, \sigma(p)\}$. Then, if $n \ge p$, the numbers $a_1, a_2, \dots, a_{n_0}$ cancel in the difference $s_n - s'_n$, so that
$$ |s_n - s'_n| = \left|\sum_{k=1}^n a_k - \sum_{k=1}^n a_{\sigma(k)}\right| \le \sum_{k=n_0+1}^{N} |a_k| < \varepsilon $$
for a suitable $N$ (namely the largest index occurring in the difference), by (2.27). Hence $(s'_n)$ converges to the same sum as $(s_n)$. The same argument shows that $\sum a'_n$ also converges absolutely.

2.3.11 Products of Series

If we multiply two finite sums $a_1 + a_2 + \cdots + a_n$ and $b_1 + b_2 + \cdots + b_m$ by the distributive law, we form all products $a_i b_j$, put them into a sequence $p_0, p_1, \dots, p_s$, $s = mn$, and add up $p_0 + p_1 + p_2 + \cdots + p_s$. This method can be generalized to series $a_0 + a_1 + \cdots$ and $b_0 + b_1 + \cdots$: we can form all products $a_i b_j$, arrange them in a sequence $p_0, p_1, p_2, \dots$, and form the product series $p_0 + p_1 + \cdots$. For example, consider the table of all products and its diagonal enumeration:

    a0b0  a0b1  a0b2  ...        p0  p1  p3  ...
    a1b0  a1b1  a1b2  ...        p2  p4  p7  ...
    a2b0  a2b1  a2b2  ...        p5  p8  p12 ...
     ...   ...   ...             ...  ...  ...

The question is: under which conditions on $\sum a_n$ and $\sum b_n$ does the product series converge, and does its sum depend on the arrangement of the products $a_i b_k$?
For the nth partial sum of any product series nk=0 | pk | we have | p0 | + | p1 | + · · · + | pn | ≤ (| a0 | + · · · + | am |) · (| b0 | + · · · + | bm |), if m is sufficiently large. More than ever, | p0 | + | p1 | + · · · + | pn | ≤ ∞ X k=0 | ak | · ∞ X k=0 | bk | . P That is, any series ∞ by Lemma 2.19 (c). By Propok=0 | pk | is bounded and hence convergent P∞ sition 2.36 all product series converge to the same sum s = k=0 pk . Consider now the very P special product series ∞ k=1 qn with partial sums consisting of the sum of the elements in the upper left square. Then q1 + q2 + · · · + q(n+1)2 = (a0 + a1 + · · · + an )(b0 + · · · + bn ). converges to s = AB. Arranging the elements ai bj as above in a diagonal array and summing up the elements on the nth diagonal cn = a0 bn + a1 bn−1 + · · · + an b0 , we obtain the Cauchy product ∞ X n=0 cn = ∞ X n=0 P∞ (a0 bn + a1 bn−1 + · · · + an b0 ). P P Corollary 2.38 If both series k=0 ak and ∞ bk converge absolutely with A = ∞ k=0 k=0 ak P∞ P∞ P∞ and B = k=0 bk , their Cauchy product k=0 ck converges absolutely and k=0 ck = AB. 2.3 Series 73 Example 2.15 We compute the Cauchy product of two geometric series: (1 + p + p2 + · · · )(1 + q + q 2 + · · · ) = 1 + (p + q) + (p2 + pq + q 2 )+ + (p3 + p2 q + pq 2 + q 2 ) + · · · ∞ p − q p2 − q 2 p3 − q 3 1 X n = + + +··· = (p − q n ) p−q p−q p−q p − q n=1 = | p |<1,| q |<1 1 p q 1 p(1 − q) − q(1 − p) 1 1 − = = · . p−q1−p 1−q p − q (1 − p)(1 − q) 1−p 1−q Cauchy Product of Power Series In case of power series the Cauchy product is appropriate since it is again a power series (which P k is not the case for other types of product series). Indeed, the Cauchy product of ∞ k=0 ak z and P∞ k k=0 bk z is given by the general element n X k ak z bn−k z n−k =z k=0 such that ∞ X k=0 ak z k · n n X ak bn−k , k=0 ∞ X bk z k = ∞ X n=0 k=0 (a0 bn + · · · + an b0 )z n . P Corollary 2.39 Suppose that n an z n and n bn z n are power series with positive radius of convergence R1 and R2 , respectively. 
Let R = min{R1 , R2 }. Then the Cauchy product P∞ n n=0 cn z , cn = a0 bn + · · · + an b0 , converges absolutely for | z | < R and P ∞ X n=0 an z n · ∞ X bn z n = n=0 ∞ X cn z n , n=0 | z | < R. This follows from the previous corollary and the fact that both series converge absolutely for | z | < R. Example 2.16 (a) ∞ X 1 , | z | < 1. (1 − z)2 n=0 P∞ n 1 = 1−z , | z | < 1 with itself. Indeed, consider the Cauchy product of n=0 z Pn Pn an = bn = 1, cn = k=0 ak bn−k = k=0 1 · 1 = n + 1, the claim follows. (b) 2 (z + z ) ∞ X n=0 n z = ∞ X (z n+1 (n + 1)z n = +z n+2 n=0 =z+2 )= ∞ X n=1 ∞ X n=2 n z + ∞ X Since zn = n=2 z n = z + 2z 2 + 2z 3 + 2z 4 + · · · = z + z + z2 2z 2 = . 1−z 1−z 74 2 Sequences and Series Chapter 3 Functions and Continuity This chapter is devoted to another central notion in analysis—the notion of a continuous function. We will see that sums, product, quotients, and compositions of continuous functions are continuous. If nothing is specified otherwise D will denote a finite union of intervals. Definition 3.1 Let D ⊂ R be a subset of R. A function is a map f : D → R. (a) The set D is called the domain of f ; we write D = D(f ). (b) If A ⊆ D, f (A) := {f (x) | x ∈ A} is called the image of A under f . The function f ↾A : A → R given by f ↾A(a) = f (a), a ∈ A, is called the restriction of f to A. (c) If B ⊂ R, we call f −1 (B) := {x ∈ D | f (x) ∈ B} the preimage of B under f . (d) The graph of f is the set graph(f ) := {(x, f (x)) | x ∈ D}. Later we will consider functions in a wider sense: From the complex numbers into complex numbers and from Fn into Fm where F = R or F = C. We say that a function f : D → R is bounded, if f (D) ⊂ R is a bounded set of real numbers, i. e. there is a C > 0 such that | f (x) | ≤ C for all x ∈ D. We say that f is bounded above (resp. bounded below) if there exists C ∈ R such that f (x) < C (resp. f (x) > C) for all x in the domain of f . 
Example 3.1 (a) Power series (with radius of convergence $R > 0$), polynomials, and rational functions are the most important examples of functions. Let $c \in \mathbb{R}$. Then $f(x) = c$, $f\colon \mathbb{R} \to \mathbb{R}$, is called the constant function.
(b) The properties of a function change drastically if we change the domain or the image set. Let $f\colon \mathbb{R} \to \mathbb{R}$, $g\colon \mathbb{R} \to \mathbb{R}_+$, $k\colon \mathbb{R}_+ \to \mathbb{R}$, and $h\colon \mathbb{R}_+ \to \mathbb{R}_+$ all be given by $x \mapsto x^2$. Then $g$ is surjective, $k$ is injective, $h$ is bijective, and $f$ is neither injective nor surjective. Obviously, $f{\restriction}\mathbb{R}_+ = k$ and $g{\restriction}\mathbb{R}_+ = h$.
(c) Let $f(x) = \sum_{n=0}^\infty x^n$, $f\colon (-1, 1) \to \mathbb{R}$, and $h(x) = \frac{1}{1-x}$, $h\colon \mathbb{R}\setminus\{1\} \to \mathbb{R}$. Then $h{\restriction}(-1, 1) = f$.

[Figure: the graphs of the constant function $f(x) = c$, the identity $f(x) = x$, and the absolute value function $f(x) = |x|$.]

3.1 Limits of a Function

Definition 3.2 ($\varepsilon$-$\delta$-definition) Let $(a, b)$ be a finite or infinite interval and $x_0 \in (a, b)$. Let $f\colon (a, b)\setminus\{x_0\} \to \mathbb{R}$ be a real-valued function. We call $A \in \mathbb{R}$ the limit of $f$ at $x_0$ ("the limit of $f(x)$ is $A$ as $x$ approaches $x_0$"; "$f$ approaches $A$ near $x_0$") if the following is satisfied: for every $\varepsilon > 0$ there exists $\delta > 0$ such that $x \in (a, b)$ and $0 < |x - x_0| < \delta$ imply $|f(x) - A| < \varepsilon$. We write
$$ \lim_{x\to x_0} f(x) = A. $$
Roughly speaking: if $x$ is close to $x_0$, then $f(x)$ must be close to $A$.

[Figure: the band between $f(x) - \varepsilon$ and $f(x) + \varepsilon$ over the interval $(x - \delta, x + \delta)$.]

Using quantifiers, $\lim_{x\to x_0} f(x) = A$ reads as
$$ \forall\,\varepsilon > 0\ \exists\,\delta > 0\ \forall\,x \in (a, b):\ 0 < |x - x_0| < \delta \implies |f(x) - A| < \varepsilon. $$
Note that the formal negation of $\lim_{x\to x_0} f(x) = A$ is
$$ \exists\,\varepsilon > 0\ \forall\,\delta > 0\ \exists\,x \in (a, b):\ 0 < |x - x_0| < \delta \ \text{ and }\ |f(x) - A| \ge \varepsilon. $$

Proposition 3.1 (sequences definition) Let $f$ and $x_0$ be as above. Then $\lim_{x\to x_0} f(x) = A$ if and only if for every sequence $(x_n)$ with $x_n \in (a, b)$, $x_n \ne x_0$ for all $n$, and $\lim_{n\to\infty} x_n = x_0$, we have $\lim_{n\to\infty} f(x_n) = A$.

Proof. Suppose $\lim_{x\to x_0} f(x) = A$, and $x_n \to x_0$ where $x_n \ne x_0$ for all $n$. Given $\varepsilon > 0$, we find $\delta > 0$ such that $|f(x) - A| < \varepsilon$ if $0 < |x - x_0| < \delta$.
Since x_n → x₀, there is a positive integer n₀ such that n ≥ n₀ implies |x_n − x₀| < δ. Therefore n ≥ n₀ implies |f(x_n) − A| < ε. That is, lim_{n→∞} f(x_n) = A.
Suppose, to the contrary, that the condition of the proposition is fulfilled but lim_{x→x₀} f(x) ≠ A. Then there is some ε > 0 such that for every δ = 1/n, n ∈ N, there is an x_n ∈ (a, b) with 0 < |x_n − x₀| < 1/n but |f(x_n) − A| ≥ ε. We have thus constructed a sequence (x_n), x_n ≠ x₀, with x_n → x₀ as n → ∞ and lim_{n→∞} f(x_n) ≠ A, which contradicts our assumption. Hence lim_{x→x₀} f(x) = A.

Example. lim_{x→1} (x + 3) = 4. Indeed, given ε > 0 choose δ = ε. Then |x − 1| < δ implies |(x + 3) − 4| < δ = ε.

3.1.1 One-sided Limits, Infinite Limits, and Limits at Infinity

Definition 3.3 (a) We write

    lim_{x→x₀+0} f(x) = A

if for all sequences (x_n) with x_n > x₀ and lim_{n→∞} x_n = x₀, we have lim_{n→∞} f(x_n) = A. Sometimes we use the notation f(x₀ + 0) in place of lim_{x→x₀+0} f(x). We call f(x₀ + 0) the right-hand limit of f at x₀, or we say "A is the limit of f as x approaches x₀ from above (from the right)."
Similarly one defines the left-hand limit of f at x₀, lim_{x→x₀−0} f(x) = A, with x_n < x₀ in place of x_n > x₀. Sometimes we use the notation f(x₀ − 0).
(b) We write

    lim_{x→+∞} f(x) = A

if for all sequences (x_n) with lim_{n→∞} x_n = +∞ we have lim_{n→∞} f(x_n) = A. Sometimes we use the notation f(+∞). In a similar way we define lim_{x→−∞} f(x) = A.
(c) Finally, the notions of (a), (b), and Definition 3.2 still make sense in case A = +∞ or A = −∞. For example, lim_{x→x₀−0} f(x) = −∞ if for all sequences (x_n) with x_n < x₀ and lim_{n→∞} x_n = x₀ we have lim_{n→∞} f(x_n) = −∞.

Remark 3.1 All notions in the above definition can be given in ε-δ, ε-D, E-δ, or E-D language using inequalities. For example, lim_{x→x₀−0} f(x) = −∞ if and only if

    ∀ E > 0 ∃ δ > 0 ∀ x ∈ D(f) : 0 < x₀ − x < δ ⟹ f(x) < −E.

For example, we show that lim_{x→0−0} 1/x = −∞. Given E > 0, choose δ = 1/E.
Then 0 < −x < δ = 1/E implies E < −1/x, and hence f(x) = 1/x < −E. This proves the claim.
Similarly, lim_{x→+∞} f(x) = +∞ if and only if

    ∀ E > 0 ∃ D > 0 ∀ x ∈ D(f) : x > D ⟹ f(x) > E.

The proofs of the equivalence of the ε-δ definitions and the sequence definitions are along the lines of Proposition 3.1.
For example, lim_{x→+∞} x² = +∞: given E > 0 choose D = √E. Then x > D implies x > √E; thus x² > E.

Example 3.2 (a) lim_{x→+∞} 1/x = 0. For, let ε > 0; choose D = 1/ε. Then x > D = 1/ε implies 0 < 1/x < ε. This proves the claim.
(b) Consider the entier function f(x) = [x], defined in Example 2.6 (b). If n ∈ Z, then lim_{x→n−0} f(x) = n − 1, whereas lim_{x→n+0} f(x) = n.

(Figure: the graph of f(x) = [x] near an integer n.)

Proof. We use the ε-δ definition of the one-sided limits to prove the first claim. Let ε > 0. Choose δ = 1/2; then 0 < n − x < 1/2 implies n − 1/2 < x < n and therefore f(x) = n − 1. In particular, |f(x) − (n − 1)| = 0 < ε. Similarly one proves lim_{x→n+0} f(x) = n.
Since the one-sided limits are different, lim_{x→n} f(x) does not exist.

Definition 3.4 Suppose we are given two functions f and g, both defined on (a, b) \ {x₀}. By f + g we mean the function which assigns to each point x ≠ x₀ of (a, b) the number f(x) + g(x). Similarly we define the difference f − g, the product fg, and the quotient f/g, with the understanding that the quotient is defined only at those points x at which g(x) ≠ 0.

Proposition 3.2 Suppose that f and g are functions defined on (a, b) \ {x₀}, a < x₀ < b, with lim_{x→x₀} f(x) = A and lim_{x→x₀} g(x) = B, and let α, β ∈ R. Then
(a) lim_{x→x₀} f(x) = A′ implies A′ = A (the limit is unique);
(b) lim_{x→x₀} (αf + βg)(x) = αA + βB;
(c) lim_{x→x₀} (fg)(x) = AB;
(d) lim_{x→x₀} (f/g)(x) = A/B, if B ≠ 0;
(e) lim_{x→x₀} |f(x)| = |A|.

Proof. In view of Proposition 3.1, all these assertions follow immediately from the analogous properties of sequences, see Proposition 2.3. As an example, we show (c). Let (x_n), x_n ≠ x₀, be a sequence tending to x₀. By assumption, lim_{n→∞} f(x_n) = A and lim_{n→∞} g(x_n) = B.
By Proposition 2.3, lim_{n→∞} f(x_n)g(x_n) = AB, that is, lim_{n→∞} (fg)(x_n) = AB. By Proposition 3.1, lim_{x→x₀} (fg)(x) = AB.

Remark 3.2 The proposition remains true if we replace (at the same time in all places) x → x₀ by x → x₀ + 0, x → x₀ − 0, x → +∞, or x → −∞. Moreover, we can replace A or B by +∞ or −∞ provided the right-hand members of (b), (c), (d), and (e) are defined. Note that +∞ + (−∞), 0 · ∞, ∞/∞, and A/0 are not defined.

The extended real number system consists of the real field R and two symbols, +∞ and −∞. We preserve the original order in R and define −∞ < x < +∞ for every x ∈ R. It is then clear that +∞ is an upper bound of every subset of the extended real number system, and that every nonempty subset has a least upper bound. If, for example, E is a set of real numbers which is not bounded above in R, then sup E = +∞ in the extended real system. Exactly the same remarks apply to lower bounds.
The extended real system does not form a field, but it is customary to make the following conventions:
(a) If x is real then x + ∞ = +∞, x − ∞ = −∞, and x/(+∞) = x/(−∞) = 0.
(b) If x > 0 then x · (+∞) = +∞ and x · (−∞) = −∞.
(c) If x < 0 then x · (+∞) = −∞ and x · (−∞) = +∞.
When it is desired to make the distinction between the real numbers on the one hand and the symbols +∞ and −∞ on the other hand quite explicit, the real numbers are called finite. In Homework 9.2 (a) and (b) you are invited to give explicit proofs in two special cases.

Example 3.3 (a) Let p(x) and q(x) be polynomials and a ∈ R. Then

    lim_{x→a} p(x) = p(a).

This immediately follows from lim_{x→a} x = a, lim_{x→a} c = c, and Proposition 3.2. Indeed, by (b) and (c), for p(x) = 3x³ − 4x + 7 we have lim_{x→a} (3x³ − 4x + 7) = 3 (lim_{x→a} x)³ − 4 lim_{x→a} x + 7 = 3a³ − 4a + 7 = p(a). This works for arbitrary polynomials.
Suppose moreover that q(a) ≠ 0. Then by (d),

    lim_{x→a} p(x)/q(x) = p(a)/q(a).

Hence, the limit of a rational function f(x), as x approaches a point a of the domain of f, is f(a).
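The statement of Example 3.3 can be checked with the sequence definition from Proposition 3.1: pick any x_n → a with x_n ≠ a and watch p(x_n)/q(x_n) approach p(a)/q(a). A small sketch in Python, using the polynomial from Example 3.3 (a) and a hypothetical denominator q with no real zeros:

```python
# Sequence-definition check of Example 3.3: for a rational function
# f = p/q with q(a) != 0, lim_{x->a} f(x) = f(a).

def p(x): return 3 * x**3 - 4 * x + 7   # the polynomial from Example 3.3 (a)
def q(x): return x**2 + 1               # hypothetical denominator, q(x) != 0 on R

a = 2.0
for n in range(1, 8):
    x = a + 10.0**(-n)                  # a sequence x_n -> a with x_n != a
    # the error shrinks at the same rate as |x_n - a| (up to a constant)
    assert abs(p(x) / q(x) - p(a) / q(a)) < 10.0**(-n + 2)
```

Here p(2) = 23 and q(2) = 5, so the quotients settle down to 23/5 = 4.6 as x_n approaches 2.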
(b) Let f(x) = p(x)/q(x) be a rational function with polynomials p(x) = ∑_{k=0}^r a_k x^k and q(x) = ∑_{k=0}^s b_k x^k with real coefficients a_k and b_k and of degrees r and s, respectively. Then

    lim_{x→+∞} f(x) =  0,        if r < s,
                       a_r/b_s,  if r = s,
                       +∞,       if r > s and a_r/b_s > 0,
                       −∞,       if r > s and a_r/b_s < 0.

The first two statements (r ≤ s) follow from Example 3.2 (a) together with Proposition 3.2: after dividing numerator and denominator by x^s, every term a_k x^{k−s} with k < s tends to 0 as x → +∞. The statements for r > s follow from x^{r−s} → +∞ as x → +∞ and the above remark. Note that

    lim_{x→−∞} f(x) = (−1)^{r+s} lim_{x→+∞} f(x)

since

    p(−x)/q(−x) = ((−1)^r a_r x^r + · · ·)/((−1)^s b_s x^s + · · ·) = (−1)^{r+s} (a_r x^r + · · ·)/(b_s x^s + · · ·).

3.2 Continuous Functions

Definition 3.5 Let f be a function and x₀ ∈ D(f). We say that f is continuous at x₀ if

    ∀ ε > 0 ∃ δ > 0 ∀ x ∈ D(f) : |x − x₀| < δ ⟹ |f(x) − f(x₀)| < ε.   (3.1)

We say that f is continuous in A ⊂ D(f) if f is continuous at all points x₀ ∈ A.
Proposition 3.1 shows that the above definition of continuity at x₀ is equivalent to: for all sequences (x_n), x_n ∈ D(f), with lim_{n→∞} x_n = x₀, we have lim_{n→∞} f(x_n) = f(x₀). In other words, f is continuous at x₀ if lim_{x→x₀} f(x) = f(x₀).

Example 3.4 (a) In Example 3.3 we have seen that every polynomial is continuous on R and that every rational function f is continuous on its domain D(f). Also, f(x) = |x| is continuous on R.
(b) Continuity is a local property: if two functions f, g : D → R coincide in a neighborhood U_ε(x₀) ⊂ D of some point x₀, then f is continuous at x₀ if and only if g is continuous at x₀.
(c) f(x) = [x] is continuous on R \ Z. If x₀ is not an integer, then n < x₀ < n + 1 for some n ∈ Z, and f coincides with the constant function n in a neighborhood U_ε(x₀). By (b), f is continuous at x₀. If x₀ = n ∈ Z, then lim_{x→n} [x] does not exist; hence f is not continuous at n.
(d) f(x) = (x² − 1)/(x − 1) if x ≠ 1, and f(1) = 1. Then f is not continuous at x₀ = 1 since

    lim_{x→1} (x² − 1)/(x − 1) = lim_{x→1} (x + 1) = 2 ≠ 1 = f(1).
There are two reasons for a function not being continuous at x₀: first, lim_{x→x₀} f(x) may not exist; secondly, f may have a limit at x₀ but lim_{x→x₀} f(x) ≠ f(x₀).

Proposition 3.3 Suppose f, g : D → R are continuous at x₀ ∈ D. Then f + g and fg are also continuous at x₀. If g(x₀) ≠ 0, then f/g is continuous at x₀.

The proof is obvious from Proposition 3.2. The set C(D) of continuous functions on D ⊂ R forms a commutative algebra with 1.

Proposition 3.4 Let f : D → R and g : E → R be functions with f(D) ⊂ E. Suppose f is continuous at a ∈ D, and g is continuous at b = f(a) ∈ E. Then the composite function g∘f : D → R is continuous at a.

Proof. Let (x_n) be a sequence with x_n ∈ D and lim_{n→∞} x_n = a. Since f is continuous at a, lim_{n→∞} f(x_n) = b. Since g is continuous at b, lim_{n→∞} g(f(x_n)) = g(b); hence (g∘f)(x_n) → (g∘f)(a). This completes the proof.

Example 3.5 f(x) = 1/x is continuous for x ≠ 0 and g(x) = sin x is continuous (see below); hence (g∘f)(x) = sin(1/x) is continuous on R \ {0}.

3.2.1 The Intermediate Value Theorem

In this paragraph, [a, b] ⊂ R is a closed, bounded interval, a, b ∈ R. The intermediate value theorem is the basis for several existence theorems in analysis. It is again equivalent to the order completeness of R.

Theorem 3.5 (Intermediate Value Theorem) Let f : [a, b] → R be a continuous function and γ a real number between f(a) and f(b). Then there exists c ∈ [a, b] such that f(c) = γ.

The statement is clear from the graphical presentation; nevertheless, it needs a proof, since pictures do not prove anything. The statement is wrong over the rational numbers. For example, let D = {x ∈ Q | 1 ≤ x ≤ 2} and f(x) = x² − 2. Then f(1) = −1 and f(2) = 2, but there is no p ∈ D with f(p) = 0, since 2 has no rational square root.

Proof. Without loss of generality suppose f(a) ≤ f(b).
Starting with [a₁, b₁] = [a, b], we successively construct a nested sequence of intervals [a_n, b_n] such that f(a_n) ≤ γ ≤ f(b_n). As in the proof of Proposition 2.12, [a_n, b_n] is one of the two half-intervals [a_{n−1}, m] and [m, b_{n−1}], where m = (a_{n−1} + b_{n−1})/2 is the midpoint of the (n−1)st interval. By Proposition 2.11 the monotonic sequences (a_n) and (b_n) both converge to a common point c. Since f is continuous,

    lim_{n→∞} f(a_n) = f(lim_{n→∞} a_n) = f(c) = f(lim_{n→∞} b_n) = lim_{n→∞} f(b_n).

By Proposition 2.14, f(a_n) ≤ γ ≤ f(b_n) implies

    lim_{n→∞} f(a_n) ≤ γ ≤ lim_{n→∞} f(b_n);

hence γ = f(c).

Example 3.6 (a) We again show the existence of the nth root of a positive real number a > 0, n ∈ N. By Example 3.3, the polynomial p(x) = x^n − a is continuous on R. We find p(0) = −a < 0, and by Bernoulli's inequality p(1 + a) = (1 + a)^n − a ≥ 1 + na − a = 1 + (n − 1)a ≥ 1 > 0. Theorem 3.5 shows that p has a root in the interval (0, 1 + a).
(b) A polynomial p of odd degree with real coefficients has a real zero. Namely, by Example 3.3, if the leading coefficient a_r of p is positive, then lim_{x→−∞} p(x) = −∞ and lim_{x→∞} p(x) = +∞. Hence there are a and b with a < b and p(a) < 0 < p(b). Therefore there is a c ∈ (a, b) such that p(c) = 0.
There are polynomials of even degree having no real zeros, for example f(x) = x^{2k} + 1.

Remark 3.3 Theorem 3.5 is not true for continuous functions f : Q → R. For example, f(x) = x² − 2 is continuous and f(0) = −2 < 0 < 2 = f(2); however, there is no r ∈ Q with f(r) = 0.

3.2.2 Continuous Functions on Bounded and Closed Intervals—The Theorem about Maximum and Minimum

We say that f : [a, b] → R is continuous if f is continuous on (a, b), f(a + 0) = f(a), and f(b − 0) = f(b).

Theorem 3.6 (Theorem about Maximum and Minimum) Let f : [a, b] → R be continuous.
Then f is bounded and attains its maximum and its minimum; that is, there exists C > 0 with |f(x)| ≤ C for all x ∈ [a, b], and there exist p, q ∈ [a, b] with

    sup_{a≤x≤b} f(x) = max_{a≤x≤b} f(x) = f(p)   and   inf_{a≤x≤b} f(x) = min_{a≤x≤b} f(x) = f(q).

Remarks 3.4 (a) The theorem is not true for open, half-open, or infinite intervals. For example, f : (0, 1] → R, f(x) = 1/x, is continuous but not bounded. The function f : (0, 1) → R, f(x) = x, is continuous and bounded; however, it attains neither a maximum nor a minimum. Finally, f(x) = x² on R₊ is continuous but not bounded.
(b) Put M := max_{x∈[a,b]} f(x) and m := min_{x∈[a,b]} f(x). By the theorem about maximum and minimum and the intermediate value theorem, for all γ ∈ R with m ≤ γ ≤ M there exists c ∈ [a, b] such that f(c) = γ; that is, f attains all values between m and M.

Proof. We give the proof in case of the maximum; replacing f by −f yields the proof for the minimum. Let

    A = sup_{a≤x≤b} f(x) ∈ R ∪ {+∞}.

(Note that A = +∞ is equivalent to f not being bounded above.) Then there exists a sequence (x_n) in [a, b] such that lim_{n→∞} f(x_n) = A. Since (x_n) is bounded, by the Theorem of Weierstraß there exists a convergent subsequence (x_{n_k}) with p = lim_k x_{n_k} and a ≤ p ≤ b. Since f is continuous,

    A = lim_{k→∞} f(x_{n_k}) = f(p).

In particular, A is a finite real number; that is, f is bounded above by A, and f attains its maximum A at the point p ∈ [a, b].

3.3 Uniform Continuity

Let D be a finite or infinite interval.

Definition 3.6 A function f : D → R is called uniformly continuous if for every ε > 0 there exists a δ > 0 such that for all x, x′ ∈ D, |x − x′| < δ implies |f(x) − f(x′)| < ε.
In particular, f is uniformly continuous on [a, b] if and only if

    ∀ ε > 0 ∃ δ > 0 ∀ x, y ∈ [a, b] : |x − y| < δ ⟹ |f(x) − f(y)| < ε.   (3.2)

Remark 3.5 If f is uniformly continuous on D, then f is continuous on D. However, the converse is not true.
Consider, for example, f : (0, 1) → R, f(x) = 1/x, which is continuous. Suppose to the contrary that f is uniformly continuous. Then for ε = 1 there exists δ > 0 as in (3.2). By the Archimedean property there exists n ∈ N such that 1/(2n) < δ. Consider x_n = 1/n and y_n = 1/(2n). Then |x_n − y_n| = 1/(2n) < δ. However,

    |f(x_n) − f(y_n)| = 2n − n = n ≥ 1.

A contradiction! Hence f is not uniformly continuous on (0, 1).

Let us consider the differences between the concepts of continuity and uniform continuity. First, uniform continuity is a property of a function on a set, whereas continuity is defined at a single point. To ask whether a given function is uniformly continuous at a certain point is meaningless. Secondly, if f is continuous on D, then it is possible to find, for each ε > 0 and for each point x₀ ∈ D, a number δ = δ(x₀, ε) > 0 having the property specified in Definition 3.5; this δ depends on ε and on x₀. If f is, however, uniformly continuous on D, then it is possible, for each ε > 0, to find one δ = δ(ε) > 0 which works for all points x₀ of D.
That the two concepts are equivalent on bounded and closed intervals follows from the next proposition.

Proposition 3.7 Let f : [a, b] → R be a continuous function on a bounded and closed interval. Then f is uniformly continuous on [a, b].

Proof. Suppose to the contrary that f is not uniformly continuous. Then there exists ε₀ > 0 with no matching δ > 0; in particular, for every positive integer n ∈ N there exists a pair of points x_n, x′_n with |x_n − x′_n| < 1/n but |f(x_n) − f(x′_n)| ≥ ε₀. Since [a, b] is bounded and closed, (x_n) has a subsequence (x_{n_k}) converging to some point p ∈ [a, b]. Since |x_n − x′_n| < 1/n, the sequence (x′_{n_k}) also converges to p. Hence

    lim_{k→∞} (f(x_{n_k}) − f(x′_{n_k})) = f(p) − f(p) = 0,

which contradicts |f(x_{n_k}) − f(x′_{n_k})| ≥ ε₀ for all k.

There exists an example of a bounded continuous function f : [0, 1) → R which is not uniformly continuous, see [Kön90, p.
91].

Discontinuities

If x is a point in the domain of a function f at which f is not continuous, we say f is discontinuous at x, or f has a discontinuity at x. It is customary to divide discontinuities into two types.

Definition 3.7 Let f : (a, b) → R be a function which is discontinuous at a point x₀. If the one-sided limits lim_{x→x₀+0} f(x) and lim_{x→x₀−0} f(x) exist, then f is said to have a simple discontinuity, or a discontinuity of the first kind. Otherwise the discontinuity is said to be of the second kind.

Example 3.7 (a) f(x) = sign(x) is continuous on R \ {0} since it is locally constant there. Moreover, f(0 + 0) = 1 and f(0 − 0) = −1. Hence sign(x) has a simple discontinuity at x₀ = 0.
(b) Define f(x) = 0 if x is rational, and f(x) = 1 if x is irrational. Then f has a discontinuity of the second kind at every point x, since neither f(x + 0) nor f(x − 0) exists.
(c) Define

    f(x) = sin(1/x) if x ≠ 0;   f(x) = 0 if x = 0.

Consider the two sequences x_n = 1/(π/2 + 2nπ) and y_n = 1/(nπ). Both sequences (x_n) and (y_n) approach 0 from above, but lim_{n→∞} f(x_n) = 1 and lim_{n→∞} f(y_n) = 0; hence f(0 + 0) does not exist. Therefore f has a discontinuity of the second kind at x = 0. We have not yet shown that sin x is a continuous function; this will be done in Section 3.5.2.

3.4 Monotonic Functions

Definition 3.8 Let f be a real function on the interval (a, b). Then f is said to be monotonically increasing on (a, b) if a < x < y < b implies f(x) ≤ f(y). If the last inequality is reversed, we obtain the definition of a monotonically decreasing function. The class of monotonic functions consists of both the increasing and the decreasing functions. If a < x < y < b implies f(x) < f(y), the function is said to be strictly increasing; strictly decreasing functions are defined similarly.

Theorem 3.8 Let f be a monotonically increasing function on (a, b). Then the one-sided limits f(x + 0) and f(x − 0) exist at every point x of (a, b).
More precisely,

    sup_{t∈(a,x)} f(t) = f(x − 0) ≤ f(x) ≤ f(x + 0) = inf_{t∈(x,b)} f(t).   (3.3)

Furthermore, if a < x < y < b, then

    f(x + 0) ≤ f(y − 0).   (3.4)

Analogous results evidently hold for monotonically decreasing functions.
Proof. See Appendix B to this chapter.

Proposition 3.9 Let f : [a, b] → R be a strictly monotonically increasing continuous function, and let A = f(a) and B = f(b). Then f maps [a, b] bijectively onto [A, B], and the inverse function f⁻¹ : [A, B] → R is again strictly monotonically increasing and continuous.

Note that the inverse function f⁻¹ : [A, B] → [a, b] is defined by f⁻¹(y₀) = x₀, y₀ ∈ [A, B], where x₀ is the unique element of [a, b] with f(x₀) = y₀. However, we can think of f⁻¹ as a function into R. A similar statement holds for strictly decreasing functions.

Proof. By Remark 3.4, f maps [a, b] onto the whole closed interval [A, B] (intermediate value theorem). Since x < y implies f(x) < f(y), f is injective and hence bijective. Hence the inverse mapping f⁻¹ : [A, B] → [a, b] exists and is again strictly increasing (u < v implies f⁻¹(u) = x < y = f⁻¹(v); otherwise x ≥ y would imply u ≥ v).
We show that g = f⁻¹ is continuous. Suppose (u_n) is a sequence in [A, B] with u_n → u, where u_n = f(x_n) and u = f(x). We have to show that (x_n) converges to x. Suppose to the contrary that there exists ε₀ > 0 such that |x_n − x| ≥ ε₀ for infinitely many n. Since (x_n) ⊆ [a, b] is bounded, there exists a convergent subsequence (x_{n_k}), say x_{n_k} → c as k → ∞. The above inequality carries over to the limit c, that is, |c − x| ≥ ε₀. By continuity of f, x_{n_k} → c implies f(x_{n_k}) → f(c), that is, u_{n_k} → f(c). Since u_n → u = f(x) and the limit of a convergent sequence is unique, f(c) = f(x). Since f is bijective, x = c; this contradicts |c − x| ≥ ε₀. Hence g is continuous at u.

Example 3.8 The function f : R₊ → R₊, f(x) = x^n, is continuous and strictly increasing.
Hence its inverse x = g(y) = y^{1/n} is continuous, too. This gives an alternative proof of Homework 5.5.

3.5 Exponential, Trigonometric, and Hyperbolic Functions and their Inverses

3.5.1 Exponential and Logarithm Functions

In this section we deal with the exponential function, one of the most important functions in analysis. We use the exponential series to define it. We will see that this definition is consistent with the definition of e^x for rational x ∈ Q as given in Chapter 1.

Definition 3.9 For z ∈ C put

    E(z) = ∑_{n=0}^∞ z^n/n! = 1 + z + z²/2 + z³/6 + · · ·.   (3.5)

Note that E(0) = 1 and E(1) = e by the definition on page 61. The radius of convergence of the exponential series (3.5) is R = +∞, i.e. the series converges absolutely for all z ∈ C, see Example 2.13 (c).
Applying Proposition 2.31 (Cauchy product) on the multiplication of absolutely convergent series, we obtain

    E(z)E(w) = ∑_{n=0}^∞ z^n/n! · ∑_{m=0}^∞ w^m/m! = ∑_{n=0}^∞ ∑_{k=0}^n z^k w^{n−k}/(k!(n−k)!)
             = ∑_{n=0}^∞ (1/n!) ∑_{k=0}^n (n choose k) z^k w^{n−k} = ∑_{n=0}^∞ (z + w)^n/n!,

which gives us the important addition formula

    E(z + w) = E(z)E(w),   z, w ∈ C.   (3.6)

One consequence is that

    E(z)E(−z) = E(0) = 1,   z ∈ C.   (3.7)

This shows that E(z) ≠ 0 for all z. By (3.5), E(x) > 0 if x > 0; hence (3.7) shows E(x) > 0 for all real x. Iteration of (3.6) gives

    E(z₁ + · · · + z_n) = E(z₁) · · · E(z_n).   (3.8)

Let us take z₁ = · · · = z_n = 1. Since E(1) = e by (2.15), we obtain

    E(n) = e^n,   n ∈ N.   (3.9)

If p = m/n, where m and n are positive integers, then

    E(p)^n = E(pn) = E(m) = e^m,   (3.10)

so that

    E(p) = e^p,   p ∈ Q₊.   (3.11)

It follows from (3.7) that E(−p) = e^{−p} if p is positive and rational. Thus (3.11) holds for all rational p. This justifies the redefinition

    e^x := E(x),   x ∈ C.

The notation exp(x) is often used in place of e^x.

Proposition 3.10 We can estimate the remainder term r_n(z) := ∑_{k=n}^∞ z^k/k! as follows:

    |r_n(z)| ≤ 2|z|^n/n!   if |z| ≤ (n + 1)/2.   (3.12)

Proof.
We have

    |r_n(z)| ≤ ∑_{k=n}^∞ |z|^k/k!
             = (|z|^n/n!) (1 + |z|/(n+1) + |z|²/((n+1)(n+2)) + · · · + |z|^k/((n+1) · · · (n+k)) + · · ·)
             ≤ (|z|^n/n!) (1 + |z|/(n+1) + |z|²/(n+1)² + · · · + |z|^k/(n+1)^k + · · ·).

Now |z| ≤ (n + 1)/2 implies

    |r_n(z)| ≤ (|z|^n/n!) (1 + 1/2 + 1/4 + · · · + 1/2^k + · · ·) = 2|z|^n/n!.

Example 3.9 (a) Inserting n = 1 gives

    |E(z) − 1| = |r₁(z)| ≤ 2|z|,   |z| ≤ 1.

In particular, E(z) is continuous at z₀ = 0. Indeed, given ε > 0 choose δ = ε/2; then |z| < δ implies |E(z) − 1| ≤ 2|z| < ε; hence lim_{z→0} E(z) = E(0) = 1 and E is continuous at 0.
(b) Inserting n = 2 gives

    |e^z − 1 − z| = |r₂(z)| ≤ |z|²,   |z| ≤ 3/2.

This implies

    |(e^z − 1)/z − 1| ≤ |z|,   0 < |z| ≤ 3/2.

The sandwich theorem gives lim_{z→0} (e^z − 1)/z = 1.

By (3.5), lim_{x→∞} E(x) = +∞; hence (3.7) shows that lim_{x→−∞} E(x) = 0. By (3.5), 0 < x < y implies E(x) < E(y); by (3.7), it follows that E(−y) < E(−x); hence E is strictly increasing on the whole real axis. The addition formula also shows that

    lim_{h→0} (E(z + h) − E(z)) = E(z) lim_{h→0} (E(h) − 1) = E(z) · 0 = 0,   (3.13)

where lim_{h→0} E(h) = 1 directly follows from Example 3.9. Hence E(z) is continuous for all z.

Proposition 3.11 Let e^x be defined on R by the power series (3.5). Then
(a) e^x is continuous for all x;
(b) e^x is a strictly increasing function and e^x > 0;
(c) e^{x+y} = e^x e^y;
(d) lim_{x→+∞} e^x = +∞, lim_{x→−∞} e^x = 0;
(e) lim_{x→+∞} x^n/e^x = 0 for every n ∈ N.

Proof. We have already proved (a) to (d). (3.5) shows that

    e^x > x^{n+1}/(n + 1)!

for x > 0, so that

    x^n/e^x < (n + 1)!/x,

and (e) follows. Part (e) shows that e^x tends to +∞ faster than any power of x as x → +∞.

Since e^x, x ∈ R, is a strictly increasing continuous function, by Proposition 3.9 e^x has a strictly increasing continuous inverse function log y, log : (0, +∞) → R. The function log is defined by

    e^{log y} = y,   y > 0,   (3.14)

or, equivalently, by

    log(e^x) = x,   x ∈ R.
(3.15)

Writing u = e^x and v = e^y, (3.6) gives log(uv) = log(e^x e^y) = log(e^{x+y}) = x + y, so that

    log(uv) = log u + log v,   u > 0, v > 0.   (3.16)

This shows that log has the familiar property which makes the logarithm useful for computations. Another customary notation for log x is ln x. Proposition 3.11 shows that

    lim_{x→+∞} log x = +∞,   lim_{x→0+0} log x = −∞.

We summarize what we have proved so far.

Proposition 3.12 Let the logarithm log : (0, +∞) → R be the inverse function of the exponential function e^x. Then
(a) log is continuous on (0, +∞);
(b) log is strictly increasing;
(c) log(uv) = log u + log v for u, v > 0;
(d) lim_{x→+∞} log x = +∞, lim_{x→0+0} log x = −∞.

It is seen from (3.14) that

    x = e^{log x} ⟹ x^n = e^{n log x}   (3.17)

if x > 0 and n is an integer. Similarly, if m is a positive integer, we have

    x^{1/m} = e^{(1/m) log x}.   (3.18)

Combining (3.17) and (3.18), we obtain

    x^α = e^{α log x}   (3.19)

for any rational α. We now define x^α, for any real α and x > 0, by (3.19). In the same way, we redefine the exponential function to base a,

    a^x = e^{x log a},   a > 0, x ∈ R.

It turns out that in case a ≠ 1, f(x) = a^x is strictly monotonic and continuous since e^x is so. Hence f has a strictly monotonic continuous inverse function log_a : (0, +∞) → R defined by

    log_a(a^x) = x, x ∈ R;   a^{log_a x} = x, x > 0.

3.5.2 Trigonometric Functions and their Inverses

In this section we redefine the trigonometric functions using the exponential function e^z. We will see that the new definitions coincide with the old ones.
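Before moving on to the trigonometric functions, the redefinitions above can be sanity-checked numerically: for rational exponents, x^α = e^{α log x} must agree with repeated multiplication and with roots, and log must turn products into sums as in (3.16). A minimal sketch (test values are arbitrary):

```python
import math

# x^alpha := e^(alpha * log x), as in (3.19), agrees with repeated
# multiplication for integer alpha and with roots for alpha = 1/m.
x = 5.0
assert abs(math.exp(3 * math.log(x)) - x * x * x) < 1e-9      # x^3
assert abs(math.exp(0.5 * math.log(x)) - math.sqrt(x)) < 1e-12  # x^(1/2)

# log(uv) = log u + log v, property (3.16):
u, v = 2.7, 0.31
assert abs(math.log(u * v) - (math.log(u) + math.log(v))) < 1e-12
```

The same check works for any real α > 0 and x > 0; the definition via exp and log is exactly how general powers are computed in floating-point libraries.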
Definition 3.10 For z ∈ C define

    cos z = (e^{iz} + e^{−iz})/2,   sin z = (e^{iz} − e^{−iz})/(2i),   (3.20)

so that

    e^{iz} = cos z + i sin z   (Euler formula).   (3.21)

(Figure: graphs of the sine and cosine functions.)

Proposition 3.13 (a) The functions sin z and cos z can be written as power series which converge absolutely for all z ∈ C:

    cos z = ∑_{n=0}^∞ (−1)^n z^{2n}/(2n)! = 1 − z²/2 + z⁴/4! − z⁶/6! + − · · ·
    sin z = ∑_{n=0}^∞ (−1)^n z^{2n+1}/(2n+1)! = z − z³/3! + z⁵/5! − + · · ·.   (3.22)

(b) sin x and cos x are real-valued and continuous on R, where cos x is an even and sin x an odd function, i.e. cos(−x) = cos x, sin(−x) = −sin x. We have

    sin²x + cos²x = 1;   (3.23)
    cos(x + y) = cos x cos y − sin x sin y;   sin(x + y) = sin x cos y + cos x sin y.   (3.24)

Proof. (a) Inserting iz into (3.5) in place of z and using (i^n)_{n≥1} = (i, −1, −i, 1, i, −1, …), we have

    e^{iz} = ∑_{n=0}^∞ i^n z^n/n! = ∑_{k=0}^∞ (−1)^k z^{2k}/(2k)! + i ∑_{k=0}^∞ (−1)^k z^{2k+1}/(2k+1)!.

Inserting −iz into (3.5) in place of z, we have

    e^{−iz} = ∑_{n=0}^∞ (−i)^n z^n/n! = ∑_{k=0}^∞ (−1)^k z^{2k}/(2k)! − i ∑_{k=0}^∞ (−1)^k z^{2k+1}/(2k+1)!.

Inserting this into (3.20) proves (a).
(b) Since the exponential function is continuous on C, sin z and cos z are also continuous on C; in particular, their restrictions to R are continuous. Now let x ∈ R; then the complex conjugate of ix is −ix. By Homework 11.3 and (3.20) we obtain

    cos x = (e^{ix} + e^{−ix})/2 = (e^{ix} + conj(e^{ix}))/2 = Re e^{ix},

and similarly sin x = Im e^{ix}. Hence sin x and cos x are real for real x.
For x ∈ R we have |e^{ix}| = 1. Namely, by (3.7) and Homework 11.3,

    |e^{ix}|² = e^{ix} · conj(e^{ix}) = e^{ix} e^{−ix} = e⁰ = 1,

so that for x ∈ R

    |e^{ix}| = 1.   (3.25)

(Figure: e^{ix} on the unit circle, with coordinates cos x and sin x.)

On the other hand, the Euler formula and the fact that cos x and sin x are real give

    1 = |e^{ix}| = |cos x + i sin x| = √(cos²x + sin²x).
Hence e^{ix} = cos x + i sin x is a point on the unit circle in the complex plane, and cos x and sin x are its coordinates. This establishes the equivalence between the old definition of cos x, as the length of the adjacent side in a right triangle with hypotenuse 1 and angle x·180°/π, and the power series definition of cos x. The only missing link is: the length of the arc from 1 to e^{ix} is x.
It follows directly from the definition that cos(−z) = cos z and sin(−z) = −sin z for all z ∈ C. The addition laws for sin x and cos x follow from (3.6) applied to e^{i(x+y)}. This completes the proof of (b).

Lemma 3.14 There exists a unique number τ ∈ (0, 2) such that cos τ = 0. We define the number π by

    π = 2τ.   (3.26)

The proof is based on the following lemma.

Lemma 3.15 (a) 0 < x < √6 implies

    x − x³/6 < sin x < x.   (3.27)

(b) 0 < x < √2 implies

    0 < cos x,   (3.28)
    0 < sin x < x < sin x/cos x,   (3.29)
    cos²x < 1/(1 + x²).   (3.30)

(c) cos x is strictly decreasing on [0, π], whereas sin x is strictly increasing on [−π/2, π/2].

In particular, the sandwich theorem applied to statement (a), 1 − x²/6 < (sin x)/x < 1 as x → 0 + 0, gives lim_{x→0+0} (sin x)/x = 1. Since (sin x)/x is an even function, this implies lim_{x→0} (sin x)/x = 1.
The proof of the lemma is in Appendix B to this chapter.

Proof of Lemma 3.14. cos 0 = 1. By Lemma 3.15, cos²1 < 1/2. By the double angle formula for cosine, cos 2 = 2cos²1 − 1 < 0. By continuity of cos x and Theorem 3.5, cos has a zero τ in the interval (0, 2). By the addition laws,

    cos x − cos y = −2 sin((x + y)/2) sin((x − y)/2).

So by Lemma 3.15, 0 < x < y < 2 implies 0 < sin((x + y)/2) and sin((x − y)/2) < 0; therefore cos x > cos y. Hence cos x is strictly decreasing on (0, 2), and the zero τ is therefore unique.

By definition, cos(π/2) = 0, and (3.23) shows sin(π/2) = ±1. By (3.27), sin(π/2) = 1.
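The number τ of Lemma 3.14 can actually be computed with the same interval-halving scheme used in the proof of Theorem 3.5, since cos 0 > 0 > cos 2. A small sketch (the function names are ours):

```python
import math

# Lemma 3.14 defines pi via the unique zero tau of cos on (0, 2), pi = 2*tau.
# The nested-interval construction from the proof of Theorem 3.5 computes it:

def bisect_zero(f, a, b, steps=60):
    """Bisection under the intermediate value theorem:
    assumes f(a) and f(b) have opposite signs."""
    for _ in range(steps):
        m = (a + b) / 2
        if f(a) * f(m) <= 0:   # sign change in [a, m]: keep the left half
            b = m
        else:                  # otherwise the zero lies in [m, b]
            a = m
    return (a + b) / 2

tau = bisect_zero(math.cos, 0.0, 2.0)
assert abs(2 * tau - math.pi) < 1e-12   # pi = 2*tau
```

Sixty halvings shrink [0, 2] below the resolution of double precision, so τ agrees with π/2 to full floating-point accuracy.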
Thus e^{iπ/2} = i, and the addition formula for e^z gives

    e^{πi} = −1,   e^{2πi} = 1;   (3.31)

hence

    e^{z+2πi} = e^z,   z ∈ C.   (3.32)

Proposition 3.16 (a) The function e^z is periodic with period 2πi. We have e^{ix} = 1, x ∈ R, if and only if x = 2kπ, k ∈ Z.
(b) The functions sin z and cos z are periodic with period 2π. The real zeros of the sine and cosine functions are {kπ | k ∈ Z} and {π/2 + kπ | k ∈ Z}, respectively.
Proof. We have already proved (a); (b) follows from (a) and (3.20).

Tangent and Cotangent Functions

(Figure: graphs of the tangent and cotangent functions.)

For x ≠ π/2 + kπ, k ∈ Z, define

    tan x = sin x / cos x.   (3.33)

For x ≠ kπ, k ∈ Z, define

    cot x = cos x / sin x.   (3.34)

Lemma 3.17 (a) tan x is continuous at every x ∈ R \ {π/2 + kπ | k ∈ Z}, and tan(x + π) = tan x;
(b) lim_{x→π/2−0} tan x = +∞, lim_{x→π/2+0} tan x = −∞;
(c) tan x is strictly increasing on (−π/2, π/2).

Proof. (a) is clear by Proposition 3.3 since sin x and cos x are continuous. We show only (c) and leave (b) as an exercise. Let 0 < x < y < π/2. Then 0 < sin x < sin y and cos x > cos y > 0. Therefore

    tan x = sin x/cos x < sin y/cos y = tan y.

Hence tan is strictly increasing on (0, π/2). Since tan(−x) = −tan x, tan is strictly increasing on the whole interval (−π/2, π/2).

Similarly to Lemma 3.17 one proves the next lemma.

Lemma 3.18 (a) cot x is continuous at every x ∈ R \ {kπ | k ∈ Z}, and cot(x + π) = cot x;
(b) lim_{x→0−0} cot x = −∞, lim_{x→0+0} cot x = +∞;
(c) cot x is strictly decreasing on (0, π).

Inverse Trigonometric Functions

We have seen in Lemma 3.15 that cos x is strictly decreasing on [0, π] and sin x is strictly increasing on [−π/2, π/2]. Obviously, the images are cos([0, π]) = sin([−π/2, π/2]) = [−1, 1]. Using Proposition 3.9 we obtain that the inverse functions exist, and that they are monotonic and continuous.
Proposition 3.19 (and Definition) There exists the inverse function to cos,

    arccos : [−1, 1] → [0, π],   (3.35)

given by arccos(cos x) = x, x ∈ [0, π], or cos(arccos y) = y, y ∈ [−1, 1]. The function arccos x is strictly decreasing and continuous.
There exists the inverse function to sin,

    arcsin : [−1, 1] → [−π/2, π/2],   (3.36)

given by arcsin(sin x) = x, x ∈ [−π/2, π/2], or sin(arcsin y) = y, y ∈ [−1, 1]. The function arcsin x is strictly increasing and continuous.

(Figure: graphs of arccos and arcsin.)

Note that arcsin x + arccos x = π/2 if x ∈ [−1, 1]. Indeed, let y = arcsin x; then x = sin y = cos(π/2 − y). Since y ∈ [−π/2, π/2], we have π/2 − y ∈ [0, π], and hence arccos x = π/2 − y. Therefore arcsin x + arccos x = y + (π/2 − y) = π/2.

By Lemma 3.17, tan x is strictly increasing on (−π/2, π/2). Therefore there exists the inverse function on the image tan((−π/2, π/2)) = R.

Proposition 3.20 (and Definition) There exists the inverse function to tan,

    arctan : R → (−π/2, π/2),   (3.37)

given by arctan(tan x) = x, x ∈ (−π/2, π/2), or tan(arctan y) = y, y ∈ R. The function arctan x is strictly increasing and continuous.
There exists the inverse function to cot,

    arccot : R → (0, π),   (3.38)

given by arccot(cot x) = x, x ∈ (0, π), or cot(arccot y) = y, y ∈ R. The function arccot x is strictly decreasing and continuous.

(Figure: graphs of arctan and arccot.)

3.5.3 Hyperbolic Functions and their Inverses

The functions

    sinh x = (e^x − e^{−x})/2,   (3.39)
    cosh x = (e^x + e^{−x})/2,   (3.40)
    tanh x = sinh x/cosh x = (e^x − e^{−x})/(e^x + e^{−x}),   (3.41)
    coth x = cosh x/sinh x = (e^x + e^{−x})/(e^x − e^{−x})   (3.42)

are called hyperbolic sine, hyperbolic cosine, hyperbolic tangent, and hyperbolic cotangent, respectively. There are many analogies between these functions and their ordinary trigonometric counterparts.

(Figure: graphs of cosh and sinh.)
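One such analogy, which follows directly from the definitions (3.39) and (3.40) by expanding the squares, is cosh²x − sinh²x = 1, the hyperbolic counterpart of sin²x + cos²x = 1 in (3.23). A quick numerical sketch (the identity itself is standard, though not stated explicitly above):

```python
import math

# The hyperbolic counterpart of sin^2 + cos^2 = 1, built straight
# from the definitions (3.39) and (3.40):
def sinh(x): return (math.exp(x) - math.exp(-x)) / 2
def cosh(x): return (math.exp(x) + math.exp(-x)) / 2

for x in [-2.0, -0.5, 0.0, 1.0, 3.0]:
    assert abs(cosh(x)**2 - sinh(x)**2 - 1) < 1e-9

# sinh is strictly increasing on R, so arsinh is its well-defined inverse:
assert abs(math.asinh(sinh(1.25)) - 1.25) < 1e-12
```

Algebraically, (e^x + e^{−x})² − (e^x − e^{−x})² = 4, which is exactly the identity after dividing by 4.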
[Figure: graphs of the hyperbolic tangent and the hyperbolic cotangent]

The functions sinh x and tanh x are strictly increasing with sinh(R) = R and tanh(R) = (−1, 1). Hence, their inverse functions are defined on R and on (−1, 1), respectively, and are also strictly increasing and continuous. The function

  arsinh : R → R  (3.43)

is given by arsinh(sinh x) = x, x ∈ R, or sinh(arsinh y) = y, y ∈ R. The function

  artanh : (−1, 1) → R  (3.44)

is defined by artanh(tanh x) = x, x ∈ R, or tanh(artanh y) = y, y ∈ (−1, 1).

The function cosh is strictly increasing on the half line R+ with cosh(R+) = [1, ∞). Hence, the inverse function is defined on [1, ∞), taking values in R+; it is also strictly increasing and continuous. The function

  arcosh : [1, ∞) → R+  (3.45)

is defined via arcosh(cosh x) = x, x ≥ 0, or by cosh(arcosh y) = y, y ≥ 1.

The function coth is strictly decreasing on the half lines x < 0 and x > 0 with coth(R \ {0}) = R \ [−1, 1]. Hence, the inverse function is defined on R \ [−1, 1], taking values in R \ {0}; it is also strictly decreasing and continuous. The function

  arcoth : R \ [−1, 1] → R  (3.46)

is defined via arcoth(coth x) = x, x ≠ 0, or by coth(arcoth y) = y, y < −1 or y > 1.

3.6 Appendix B

3.6.1 Monotonic Functions have One-Sided Limits

Proof of Theorem 3.8. By hypothesis, the set {f(t) | a < t < x} is bounded above by f(x), and therefore has a least upper bound which we shall denote by A. Evidently A ≤ f(x). We have to show that A = f(x − 0). Let ε > 0 be given. It follows from the definition of A as a least upper bound that there exists δ > 0 such that a < x − δ < x and

  A − ε < f(x − δ) ≤ A.  (3.47)

Since f is monotonic, we have

  f(x − δ) ≤ f(t) ≤ A, if x − δ < t < x.  (3.48)

Combining (3.47) and (3.48), we see that

  | f(t) − A | < ε if x − δ < t < x.

Hence f(x − 0) = A. The second half of (3.3) is proved in precisely the same way.
Next, if a < x < y < b, we see from (3.3) that

  f(x + 0) = inf_{x<t<b} f(t) = inf_{x<t<y} f(t).  (3.49)

The last equality is obtained by applying (3.3) to (a, y) instead of (a, b). Similarly,

  f(y − 0) = sup_{a<t<y} f(t) = sup_{x<t<y} f(t).  (3.50)

Comparison of (3.49) and (3.50) gives (3.4).

3.6.2 Proofs for sin x and cos x Inequalities

Proof of Lemma 3.15. (a) By (3.22),

  cos x = 1 − x²/2! + x⁴/4! − x⁶/6! + ··· .

0 < x < √2 implies 1 − x²/2 > 0 and, moreover, 1/(2n)! − x²/(2n + 2)! > 0 for all n ∈ N; hence C(x) > 0. By (3.22),

  sin x = x(1 − x²/3!) + x⁵(1/5! − x²/7!) + ··· .

Now,

  1 − x²/3! > 0 ⟺ x < √6,  1/5! − x²/7! > 0 ⟺ x < √42, ... .

Hence, S(x) > 0 if 0 < x < √6. This gives (3.27). Similarly,

  x − sin x = x³(1/3! − x²/5!) + x⁷(1/7! − x²/9!) + ··· ,

and we obtain sin x < x if 0 < x < √20. Finally we have to check whether sin x − x cos x > 0; equivalently,

  0 <? x³(1/2! − 1/3!) − x⁵(1/4! − 1/5!) + x⁷(1/6! − 1/7!) − + ··· ,
  0 <? x³·2/3! − x⁵·4/5! + x⁷·6/7! − x⁹·8/9! + ··· .

Now √10 > x > 0 implies

  2n/(2n + 1)! − x²·(2n + 2)/(2n + 3)! > 0

for all n ∈ N. This completes the proof of (a).

(b) Using (3.23), we get

  0 < x cos x < sin x ⟹ 0 < x² cos² x < sin² x ⟹ x² cos² x + cos² x < 1 ⟹ cos² x < 1/(1 + x²).

(c) In the proof of Lemma 3.14 we have seen that cos x is strictly decreasing on (0, π/2). By (3.23), sin x = √(1 − cos² x) is strictly increasing there. Since sin x is an odd function, sin x is strictly increasing on [−π/2, π/2]. Since cos x = −sin(x − π/2), the statement for cos x follows.

3.6.3 Estimates for π

Proposition 3.21 For real x we have

  cos x = Σ_{k=0}^{n} (−1)^k x^{2k}/(2k)! + r_{2n+2}(x),  (3.51)
  sin x = Σ_{k=0}^{n} (−1)^k x^{2k+1}/(2k+1)! + r_{2n+3}(x),  (3.52)

where

  | r_{2n+2}(x) | ≤ | x |^{2n+2}/(2n+2)!  if | x | ≤ 2n + 3,  (3.53)
  | r_{2n+3}(x) | ≤ | x |^{2n+3}/(2n+3)!  if | x | ≤ 2n + 4.  (3.54)

Proof. Let

  r_{2n+2}(x) = ± x^{2n+2}/(2n+2)! · (1 − x²/((2n+3)(2n+4)) ± ···).

Put

  a_k := x^{2k}/((2n+3)(2n+4)···(2n+2(k+1))).

Then we have, by definition,

  r_{2n+2}(x) = ± x^{2n+2}/(2n+2)! · (1 − a₁ + a₂ − + ···).

Since

  a_k = a_{k−1} · x²/((2n+2k+1)(2n+2k+2)),

| x | ≤ 2n + 3 implies 1 > a₁ > a₂ > ··· > 0, and finally, as in the proof of the Leibniz criterion,

  0 ≤ 1 − a₁ + a₂ − a₃ + − ··· ≤ 1.

Hence, | r_{2n+2}(x) | ≤ | x |^{2n+2}/(2n+2)!. The estimate for the remainder of the sine series is similar.

This is an application of Proposition 3.21. For numerical calculations it is convenient to use the following order of operations:

  cos x = ((···((−x²/(2n(2n−1)) + 1)·(−x²)/((2n−2)(2n−3)) + 1)···)·(−x²)/2 + 1) + r_{2n+2}(x).

First we compute cos 1.5 and cos 1.6. Choosing n = 7 we obtain

  cos x = (((((((−x²/182 + 1)(−x²)/132 + 1)(−x²)/90 + 1)(−x²)/56 + 1)(−x²)/30 + 1)(−x²)/12 + 1)(−x²)/2 + 1) + r₁₆(x).

By Proposition 3.21,

  | r₁₆(x) | ≤ | x |¹⁶/16! ≤ 0.9·10^{−10} if | x | ≤ 1.6.

The calculations give

  cos 1.5 = 0.07073720163 ± 20·10^{−11} > 0,
  cos 1.6 = −0.02919952239 ± 20·10^{−11} < 0.

By the intermediate value theorem, 1.5 < π/2 < 1.6. Now we compute cos x for two values of x which are close to the linear interpolation point

  a = 1.5 + 0.1·cos 1.5/(cos 1.5 − cos 1.6) = 1.57078... :

  cos 1.5707 = 0.000096326273 ± 20·10^{−11} > 0,
  cos 1.5708 = −0.00000367326 ± 20·10^{−11} < 0.

Hence, 1.5707 < π/2 < 1.5708. The next linear interpolation gives

  b = 1.5707 + 0.0001·cos 1.5707/(cos 1.5707 − cos 1.5708) = 1.570796326... :

  cos 1.570796326 = 0.00000000073 ± 20·10^{−11} > 0,
  cos 1.570796327 = −0.00000000027 ± 20·10^{−11} < 0.

Therefore 1.570796326 < π/2 < 1.570796327, so that π = 3.141592653 ± 10^{−9}.

Chapter 4 Differentiation

4.1 The Derivative of a Function

We define the derivative of a function and prove the main properties like product, quotient, and chain rule. We relate the derivative of a function to the derivative of its inverse function.
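The computation of π/2 sketched above is easy to reproduce. The following Python sketch evaluates the degree-14 partial sum of the cosine series (n = 7, with the Horner-like order of operations from the text) and locates its zero in [1.5, 1.6] by bisection instead of linear interpolation; purely illustrative:

```python
import math

# Degree-14 partial sum of cos (n = 7, as in the text), evaluated with
# the nested scheme; divisors are (2k)(2k-1) for k = 7, ..., 1.
def cos16(x):
    s = 1.0
    for d in (182, 132, 90, 56, 30, 12, 2):
        s = s * (-x * x) / d + 1.0
    return s

# Bisection for the zero in [1.5, 1.6]: cos16(1.5) > 0 > cos16(1.6).
a, b = 1.5, 1.6
for _ in range(60):
    m = (a + b) / 2
    if cos16(m) > 0:
        a = m
    else:
        b = m
print(2 * a)   # close to pi = 3.141592653...
```

By (3.53) the partial sum differs from cos by at most 0.9·10^{−10} on [−1.6, 1.6], so the computed zero pins down π to roughly nine decimal places, in line with the estimate in the text.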
We prove the mean value theorem and consider local extrema. Taylor's theorem will be formulated.

Definition 4.1 Let f : (a, b) → R be a function and x₀ ∈ (a, b). If the limit

  lim_{x→x₀} (f(x) − f(x₀))/(x − x₀)  (4.1)

exists, we call f differentiable at x₀. The limit is denoted by f′(x₀). We say f is differentiable if f is differentiable at every point x ∈ (a, b).

We thus have associated to every function f a function f′ whose domain is the set of points x₀ where the limit (4.1) exists; f′ is called the derivative of f. Sometimes the Leibniz notation is used to denote the derivative of f:

  f′(x₀) = df/dx (x₀) = d/dx f(x₀).

Remarks 4.1 (a) Replacing x − x₀ by h, we see that f′(x₀) = lim_{h→0} (f(x₀ + h) − f(x₀))/h.
(b) The limits

  lim_{h→0−0} (f(x₀ + h) − f(x₀))/h,  lim_{h→0+0} (f(x₀ + h) − f(x₀))/h

are called left-hand and right-hand derivatives of f at x₀, respectively. In particular, for f : [a, b] → R we can consider the right-hand derivative at a and the left-hand derivative at b.

Example 4.1 (a) For the constant function f(x) = c,

  f′(x₀) = lim_{x→x₀} (f(x) − f(x₀))/(x − x₀) = lim_{x→x₀} (c − c)/(x − x₀) = 0.

(b) For f(x) = x,

  f′(x₀) = lim_{x→x₀} (x − x₀)/(x − x₀) = 1.

(c) The slope of the tangent line. Let f : (a, b) → R be differentiable at x₀. Then f′(x₀) is the slope of the tangent line to the graph of f through the point (x₀, f(x₀)). The slope of the secant line through (x₀, f(x₀)) and (x₁, f(x₁)) is

  m = tan α₁ = (f(x₁) − f(x₀))/(x₁ − x₀).

[Figure: secant line through (x₀, f(x₀)) and (x₁, f(x₁))]

One can see: if x₁ approaches x₀, the secant line through (x₀, f(x₀)) and (x₁, f(x₁)) approaches the tangent line through (x₀, f(x₀)). Hence, the slope of the tangent line is the limit of the slopes of the secant lines as x approaches x₀:

  f′(x₀) = lim_{x→x₀} (f(x) − f(x₀))/(x − x₀).

Proposition 4.1 If f is differentiable at x₀ ∈ (a, b), then f is continuous at x₀.

Proof.
By Proposition 3.2 we have

  lim_{x→x₀} (f(x) − f(x₀)) = lim_{x→x₀} (f(x) − f(x₀))/(x − x₀) · (x − x₀) = f′(x₀) · lim_{x→x₀} (x − x₀) = f′(x₀) · 0 = 0.

The converse of this proposition is not true. For example, f(x) = | x | is continuous on R but differentiable only on R \ {0}, since lim_{h→0+0} | h |/h = 1 whereas lim_{h→0−0} | h |/h = −1. Later we will become acquainted with a function which is continuous on the whole line without being differentiable at any point!

Proposition 4.2 Let f : (r, s) → R be a function and a ∈ (r, s). Then f is differentiable at a if and only if there exist a number c ∈ R and a function ϕ defined in a neighborhood of a such that

  f(x) = f(a) + (x − a)c + ϕ(x),  (4.2)

where

  lim_{x→a} ϕ(x)/(x − a) = 0.  (4.3)

In this case f′(a) = c.

The proposition says that a function f differentiable at a can be approximated by a linear function, in our case by y = f(a) + (x − a)f′(a). The graph of this linear function is the tangent line to the graph of f at the point (a, f(a)). Later we will use this point of view to define differentiability of functions f : Rⁿ → Rᵐ.

Proof. Suppose first f satisfies (4.2) and (4.3). Then

  lim_{x→a} (f(x) − f(a))/(x − a) = lim_{x→a} (c + ϕ(x)/(x − a)) = c.

Hence, f is differentiable at a with f′(a) = c. Now, let f be differentiable at a with f′(a) = c. Put ϕ(x) = f(x) − f(a) − (x − a)f′(a). Then

  lim_{x→a} ϕ(x)/(x − a) = lim_{x→a} ((f(x) − f(a))/(x − a) − f′(a)) = 0.

Let us compute the linear function whose graph is the tangent line through (x₀, f(x₀)). Consider the rectangular triangle P P₀ Q₀. By Example 4.1 (c) we have

  f′(x₀) = tan α = (y − y₀)/(x − x₀),

[Figure: tangent line through P₀ = (x₀, f(x₀)) and a point P = (x, y) on it]

such that the tangent line has the equation

  y = g(x) = f(x₀) + f′(x₀)(x − x₀).

This function is called the linearization of f at x₀. It is also the Taylor polynomial of degree 1 of f at x₀, see Section 4.5 below.

Proposition 4.3 Suppose f and g are defined on (a, b) and are differentiable at a point x ∈ (a, b).
Then f + g, f g, and f/g are differentiable at x, and

(a) (f + g)′(x) = f′(x) + g′(x);
(b) (f g)′(x) = f′(x)g(x) + f(x)g′(x);
(c) (f/g)′(x) = (f′(x)g(x) − f(x)g′(x))/g(x)².

In (c), we assume that g(x) ≠ 0.

Proof. (a) Since

  ((f + g)(x + h) − (f + g)(x))/h = (f(x + h) − f(x))/h + (g(x + h) − g(x))/h,

the claim follows from Proposition 3.2.

(b) Let h = f g and let t be a variable. Then

  h(t) − h(x) = f(t)(g(t) − g(x)) + g(x)(f(t) − f(x)),
  (h(t) − h(x))/(t − x) = f(t)·(g(t) − g(x))/(t − x) + g(x)·(f(t) − f(x))/(t − x).

Noting that f(t) → f(x) as t → x, (b) follows.

(c) Next let h = f/g. Then

  (h(t) − h(x))/(t − x) = (f(t)/g(t) − f(x)/g(x))/(t − x) = (f(t)g(x) − f(x)g(t))/(g(x)g(t)(t − x))
  = 1/(g(t)g(x)) · (f(t)g(x) − f(x)g(x) + f(x)g(x) − f(x)g(t))/(t − x)
  = 1/(g(t)g(x)) · (g(x)·(f(t) − f(x))/(t − x) − f(x)·(g(t) − g(x))/(t − x)).

Letting t → x and applying Propositions 3.2 and 4.1, we obtain (c).

Example 4.2 (a) f(x) = xⁿ, n ∈ Z. We will prove f′(x) = nx^{n−1} by induction on n ∈ N. The cases n = 0, 1 are OK by Example 4.1. Suppose the statement is true for some fixed n. We will show that (x^{n+1})′ = (n + 1)xⁿ. By the product rule and the induction hypothesis,

  (x^{n+1})′ = (xⁿ·x)′ = (xⁿ)′x + xⁿ(x)′ = nx^{n−1}·x + xⁿ = (n + 1)xⁿ.

This proves the claim for positive integers n. For negative n consider f(x) = 1/x^{−n} and use the quotient rule.

(b) (e^x)′ = e^x:

  (e^x)′ = lim_{h→0} (e^{x+h} − e^x)/h = lim_{h→0} (e^x e^h − e^x)/h = e^x lim_{h→0} (e^h − 1)/h = e^x;  (4.4)

the last equation simply follows from Example 3.9 (b).

(c) (sin x)′ = cos x, (cos x)′ = −sin x. Using sin(x + y) − sin(x − y) = 2 cos x sin y we have

  (sin x)′ = lim_{h→0} (sin(x + h) − sin x)/h = lim_{h→0} 2 cos(x + h/2) sin(h/2)/h = lim_{h→0} cos(x + h/2) · lim_{h→0} sin(h/2)/(h/2).

Since cos x is continuous and lim_{h→0} sin h/h = 1 (by the argument after Lemma 3.15), we obtain (sin x)′ = cos x. The proof for cos x is analogous.

(d) (tan x)′ = 1/cos² x.
Using the quotient rule for the function tan x = sin x/cos x we have

  (tan x)′ = ((sin x)′ cos x − sin x (cos x)′)/cos² x = (cos² x + sin² x)/cos² x = 1/cos² x.

The next proposition deals with composite functions and is probably the most important statement about derivatives.

Proposition 4.4 (Chain rule) Let g : (α, β) → R be differentiable at x₀ ∈ (α, β) and let f : (a, b) → R be differentiable at y₀ = g(x₀) ∈ (a, b). Then h = f ◦ g is differentiable at x₀, and

  h′(x₀) = f′(y₀)g′(x₀).  (4.5)

Proof. We have

  (f(g(x)) − f(g(x₀)))/(x − x₀) = (f(g(x)) − f(g(x₀)))/(g(x) − g(x₀)) · (g(x) − g(x₀))/(x − x₀)
  −→ lim_{y→y₀} (f(y) − f(y₀))/(y − y₀) · g′(x₀) = f′(y₀)g′(x₀) as x → x₀.

Here we used that y = g(x) tends to y₀ = g(x₀) as x → x₀, since g is continuous at x₀.

Proposition 4.5 Let f : (a, b) → R be strictly monotonic and continuous. Suppose f is differentiable at x. Then the inverse function g = f⁻¹ : f((a, b)) → R is differentiable at y = f(x) with

  g′(y) = 1/f′(x) = 1/f′(g(y)).  (4.6)

Proof. Let (yₙ) ⊂ f((a, b)) be a sequence with yₙ → y and yₙ ≠ y for all n. Put xₙ = g(yₙ). Since g is continuous (by Corollary 3.9), lim_{n→∞} xₙ = x. Since g is injective, xₙ ≠ x for all n. We have

  lim_{n→∞} (g(yₙ) − g(y))/(yₙ − y) = lim_{n→∞} (xₙ − x)/(f(xₙ) − f(x)) = lim_{n→∞} 1/((f(xₙ) − f(x))/(xₙ − x)) = 1/f′(x).

Hence g′(y) = 1/f′(x) = 1/f′(g(y)).

We give some applications of the last two propositions.

Example 4.3 (a) Let f : R → R be differentiable; define F : R → R by F(x) := f(ax + b) with some a, b ∈ R. Then F′(x) = a f′(ax + b).

(b) In what follows f is the original function (with known derivative) and g is the inverse function to f. We fix the notation y = f(x) and x = g(y). log : R+ \ {0} → R is the inverse function to f(x) = e^x. By the above proposition,

  (log y)′ = 1/(e^x)′ = 1/e^x = 1/y.

(c) x^α = e^{α log x}. Hence, (x^α)′ = (e^{α log x})′ = e^{α log x}·α/x = αx^{α−1}.
(d) Suppose f > 0 and g = log f. Then g′ = f′/f; hence f′ = f g′.

(e) arcsin : [−1, 1] → R is the inverse function to y = f(x) = sin x. If y ∈ (−1, 1), then

  (arcsin y)′ = 1/(sin x)′ = 1/cos x.

Since y ∈ [−1, 1] implies x = arcsin y ∈ [−π/2, π/2], we have cos x ≥ 0. Therefore, cos x = √(1 − sin² x) = √(1 − y²). Hence

  (arcsin y)′ = 1/√(1 − y²),  −1 < y < 1.

Note that the derivative is not defined at the endpoints y = −1 and y = 1.

(f)

  (arctan y)′ = 1/(tan x)′ = 1/(1/cos² x) = cos² x.

Since y = tan x we have

  y² = tan² x = sin² x/cos² x = (1 − cos² x)/cos² x = 1/cos² x − 1,

hence cos² x = 1/(1 + y²) and

  (arctan y)′ = 1/(1 + y²).

4.2 The Derivatives of Elementary Functions

  function              derivative
  const.                0
  xⁿ (n ∈ N)            n x^{n−1}
  x^α (α ∈ R, x > 0)    α x^{α−1}
  e^x                   e^x
  a^x (a > 0)           a^x log a
  log x                 1/x
  log_a x               1/(x log a)
  sin x                 cos x
  cos x                 −sin x
  tan x                 1/cos² x
  cot x                 −1/sin² x
  sinh x                cosh x
  cosh x                sinh x
  tanh x                1/cosh² x
  coth x                −1/sinh² x
  arcsin x              1/√(1 − x²)
  arccos x              −1/√(1 − x²)
  arctan x              1/(1 + x²)
  arccot x              −1/(1 + x²)
  arsinh x              1/√(x² + 1)
  arcosh x              1/√(x² − 1)
  artanh x              1/(1 − x²)
  arcoth x              1/(1 − x²)

4.2.1 Derivatives of Higher Order

Let f : D → R be differentiable. If the derivative f′ : D → R is differentiable at x ∈ D, then

  d²f/dx²(x) = f′′(x) = (f′)′(x)

is called the second derivative of f at x. Similarly, one defines higher order derivatives inductively. Continuing in this manner, we obtain functions

  f, f′, f′′, f^(3), ..., f^(k),

each of which is the derivative of the preceding one. f^(n) is called the nth derivative of f, or the derivative of order n of f. We also use the Leibniz notation

  f^(k)(x) = d^k f(x)/dx^k = (d/dx)^k f(x).

Definition 4.2 Let D ⊂ R and k ∈ N a positive integer. We denote by C^k(D) the set of all functions f : D → R such that f^(k)(x) exists for all x ∈ D and f^(k)(x) is continuous. Obviously C(D) ⊃ C¹(D) ⊃ C²(D) ⊃ ··· .
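Two rows of the derivative table can be spot-checked with central difference quotients; a sketch (the tolerance 10⁻⁸ is an ad-hoc choice):

```python
import math

# Spot-checking (arctan x)' = 1/(1 + x^2) and (tanh x)' = 1/cosh(x)^2
# with central difference quotients; an illustration, not a proof.
def central(f, x, eps=1e-6):
    return (f(x + eps) - f(x - eps)) / (2 * eps)

for x in (-1.0, 0.0, 0.5, 2.0):
    assert abs(central(math.atan, x) - 1 / (1 + x * x)) < 1e-8
    assert abs(central(math.tanh, x) - 1 / math.cosh(x) ** 2) < 1e-8
print("table entries confirmed numerically")
```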
Further, we set

  C^∞(D) = ⋂_{k∈N} C^k(D) = {f : D → R | f^(k)(x) exists for all k ∈ N, x ∈ D}.  (4.7)

f ∈ C^k(D) is called k times continuously differentiable. C(D) = C⁰(D) is the vector space of continuous functions on D.

Using induction on n, one proves the following proposition.

Proposition 4.6 (Leibniz formula) Let f and g be n times differentiable. Then f g is n times differentiable with

  (f(x)g(x))^(n) = Σ_{k=0}^{n} (n choose k) f^(k)(x) g^(n−k)(x).  (4.8)

4.3 Local Extrema and the Mean Value Theorem

Many properties of a function f, like monotony, convexity, and the existence of local extrema, can be studied using the derivative f′. From estimates for f′ we obtain estimates for the growth of f.

Definition 4.3 Let f : [a, b] → R be a function. We say that f has a local maximum at the point ξ, ξ ∈ (a, b), if there exists δ > 0 such that f(x) ≤ f(ξ) for all x ∈ [a, b] with | x − ξ | < δ. Local minima are defined likewise. We say that ξ is a local extremum if it is either a local maximum or a local minimum.

Proposition 4.7 Let f be defined on [a, b]. If f has a local extremum at a point ξ ∈ (a, b), and if f′(ξ) exists, then f′(ξ) = 0.

Proof. Suppose f has a local maximum at ξ. In accordance with the definition choose δ > 0 such that a < ξ − δ < ξ < ξ + δ < b. If ξ − δ < x < ξ, then

  (f(x) − f(ξ))/(x − ξ) ≥ 0.

Letting x → ξ, we see that f′(ξ) ≥ 0. If ξ < x < ξ + δ, then

  (f(x) − f(ξ))/(x − ξ) ≤ 0.

Letting x → ξ, we see that f′(ξ) ≤ 0. Hence, f′(ξ) = 0.

Remarks 4.2 (a) f′(x) = 0 is a necessary but not a sufficient condition for a local extremum at x. For example, f(x) = x³ has f′(0) = 0, but x³ has no local extremum.
(b) If f attains its local extrema at the boundary, like f(x) = x on [0, 1], we need not have f′(ξ) = 0.

Theorem 4.8 (Rolle's Theorem) Let f : [a, b] → R be continuous with f(a) = f(b) and let f be differentiable in (a, b). Then there exists a point ξ ∈ (a, b) with f′(ξ) = 0.
In particular, between two zeros of a differentiable function there is a zero of its derivative.

Proof. If f is the constant function, the theorem is trivial since f′(x) ≡ 0 on (a, b). Otherwise, there exists x₀ ∈ (a, b) such that f(x₀) > f(a) or f(x₀) < f(a). Then f attains its maximum or minimum, respectively, at a point ξ ∈ (a, b). By Proposition 4.7, f′(ξ) = 0.

Theorem 4.9 (Mean Value Theorem) Let f : [a, b] → R be continuous and differentiable in (a, b). Then there exists a point ξ ∈ (a, b) such that

  f′(ξ) = (f(b) − f(a))/(b − a).  (4.9)

Geometrically, the mean value theorem states that there exists a tangent line through some point (ξ, f(ξ)) which is parallel to the secant line AB, A = (a, f(a)), B = (b, f(b)).

[Figure: secant AB and a parallel tangent at some ξ ∈ (a, b)]

Theorem 4.10 (Generalized Mean Value Theorem) Let f and g be continuous functions on [a, b] which are differentiable on (a, b). Then there exists a point ξ ∈ (a, b) such that

  (f(b) − f(a))g′(ξ) = (g(b) − g(a))f′(ξ).

Proof. Put

  h(t) = (f(b) − f(a))g(t) − (g(b) − g(a))f(t).

Then h is continuous on [a, b] and differentiable in (a, b), and

  h(a) = f(b)g(a) − f(a)g(b) = h(b).

Rolle's theorem shows that there exists ξ ∈ (a, b) such that

  h′(ξ) = (f(b) − f(a))g′(ξ) − (g(b) − g(a))f′(ξ) = 0.

The theorem follows.

In case g′ is nonzero on (a, b) and g(b) − g(a) ≠ 0, the generalized MVT states the existence of some ξ ∈ (a, b) such that

  (f(b) − f(a))/(g(b) − g(a)) = f′(ξ)/g′(ξ).

This is in particular true for g(x) = x and g′ = 1, which gives the assertion of the Mean Value Theorem.

Remark 4.3 Note that the MVT fails if f is complex-valued, continuous on [a, b], and differentiable on (a, b). Indeed, f(x) = e^{ix} on [0, 2π] is a counterexample. f is continuous on [0, 2π], differentiable on (0, 2π), and f(0) = f(2π) = 1. However, there is no ξ ∈ (0, 2π) such that 0 = (f(2π) − f(0))/2π = f′(ξ) = i e^{iξ}, since the exponential function has no zero, see (3.7) (e^z·e^{−z} = 1) in Subsection 3.5.1.
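For f(x) = x³ on [0, 2] the point ξ of Theorem 4.9 can be computed explicitly; a small numerical sketch:

```python
import math

# Mean value theorem for f(x) = x^3 on [0, 2]: the secant slope is
# (8 - 0)/2 = 4, and f'(xi) = 3 xi^2 = 4 has the solution
# xi = 2/sqrt(3), which indeed lies in (0, 2).
a, b = 0.0, 2.0
slope = (b ** 3 - a ** 3) / (b - a)
xi = math.sqrt(slope / 3)
assert a < xi < b
assert abs(3 * xi ** 2 - slope) < 1e-12
print(xi)   # 2/sqrt(3) = 1.1547...
```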
Corollary 4.11 Suppose f is differentiable on (a, b). If f′(x) ≥ 0 for all x ∈ (a, b), then f is monotonically increasing. If f′(x) = 0 for all x ∈ (a, b), then f is constant. If f′(x) ≤ 0 for all x ∈ (a, b), then f is monotonically decreasing.

Proof. All conclusions can be read off from the equality

  f(x) − f(t) = (x − t)f′(ξ),

which is valid for each pair x, t with a < t < x < b and for some ξ ∈ (t, x).

4.3.1 Local Extrema and Convexity

Proposition 4.12 Let f : (a, b) → R be differentiable and suppose f′′(ξ) exists at a point ξ ∈ (a, b). If f′(ξ) = 0 and f′′(ξ) > 0, then f has a local minimum at ξ. Similarly, if f′(ξ) = 0 and f′′(ξ) < 0, f has a local maximum at ξ.

Remark 4.4 The condition of Proposition 4.12 is sufficient but not necessary for the existence of a local extremum. For example, f(x) = x⁴ has a local minimum at x = 0, but f′′(0) = 0.

Proof. We consider the case f′′(ξ) > 0; the proof of the other case is analogous. We have

  f′′(ξ) = lim_{x→ξ} (f′(x) − f′(ξ))/(x − ξ) > 0.

By Homework 10.4 there exists δ > 0 such that

  (f′(x) − f′(ξ))/(x − ξ) > | f′′(ξ) |/2 > 0

for all x with 0 < | x − ξ | < δ. Since f′(ξ) = 0 it follows that

  f′(x) < 0 if ξ − δ < x < ξ,
  f′(x) > 0 if ξ < x < ξ + δ.

Hence, by Corollary 4.11, f is decreasing on (ξ − δ, ξ) and increasing on (ξ, ξ + δ). Therefore, f has a local minimum at ξ.

Definition 4.4 A function f : (a, b) → R is said to be convex if for all x, y ∈ (a, b) and all λ ∈ [0, 1]

  f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y).  (4.10)

[Figure: a convex function; the chord from (x, f(x)) to (y, f(y)) lies above the graph]

A function f is said to be concave if −f is convex.

Proposition 4.13 (a) Convex functions are continuous. (b) Suppose f : (a, b) → R is twice differentiable. Then f is convex if and only if f′′(x) ≥ 0 for all x ∈ (a, b).

Proof. The proof is in Appendix C to this chapter.
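Proposition 4.13 (b) can be probed numerically for f = exp, whose second derivative exp is everywhere positive; the convexity inequality (4.10) is checked on sample points, with a tiny tolerance for rounding:

```python
import math

# Convexity of exp (f'' = exp > 0): check (4.10) on sample points.
# The 1e-12 slack only absorbs floating-point rounding.
def convex_ok(f, x, y, l):
    return f(l * x + (1 - l) * y) <= l * f(x) + (1 - l) * f(y) + 1e-12

pts = [-2.0, -0.5, 0.0, 1.0, 3.0]
ls = [k / 10 for k in range(11)]
print(all(convex_ok(math.exp, x, y, l)
          for x in pts for y in pts for l in ls))   # True
```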
4.4 L'Hospital's Rule

Theorem 4.14 (L'Hospital's Rule) Suppose f and g are differentiable in (a, b) and g′(x) ≠ 0 for all x ∈ (a, b), where −∞ ≤ a < b ≤ +∞. Suppose

  lim_{x→a+0} f′(x)/g′(x) = A.  (4.11)

If

  (a) lim_{x→a+0} f(x) = lim_{x→a+0} g(x) = 0, or  (4.12)
  (b) lim_{x→a+0} f(x) = lim_{x→a+0} g(x) = +∞,  (4.13)

then

  lim_{x→a+0} f(x)/g(x) = A.  (4.14)

The analogous statements are of course also true if x → b − 0, or if g(x) → −∞.

Proof. First we consider the case of finite a ∈ R.

(a) One can extend the definitions of f and g via f(a) = g(a) = 0. Then f and g are continuous at a. By the generalized mean value theorem, for every x ∈ (a, b) there exists a ξ ∈ (a, x) such that

  f(x)/g(x) = (f(x) − f(a))/(g(x) − g(a)) = f′(ξ)/g′(ξ).

If x approaches a then ξ also approaches a, and (a) follows.

(b) Now let f(a + 0) = g(a + 0) = +∞. Given ε > 0, choose δ > 0 such that

  | f′(t)/g′(t) − A | < ε if t ∈ (a, a + δ).

By the generalized mean value theorem, for any x, y ∈ (a, a + δ) with x ≠ y,

  | (f(x) − f(y))/(g(x) − g(y)) − A | < ε.

We have

  f(x)/g(x) = (f(x) − f(y))/(g(x) − g(y)) · (1 − g(y)/g(x))/(1 − f(y)/f(x)).

For fixed y, the right factor tends to 1 as x approaches a; in particular, there exists δ₁ > 0 with δ₁ < δ such that x ∈ (a, a + δ₁) implies

  | f(x)/g(x) − (f(x) − f(y))/(g(x) − g(y)) | < ε.

Further, the triangle inequality gives

  | f(x)/g(x) − A | < 2ε.

This proves (b). The case x → +∞ can be reduced to the limit process y → 0 + 0 using the substitution y = 1/x.

L'Hospital's rule also applies in the cases A = +∞ and A = −∞.

Example 4.4 (a) lim_{x→0} sin x/x = lim_{x→0} cos x/1 = 1.

(b) lim_{x→0+0} √x/(1 − cos x) = lim_{x→0+0} (1/(2√x))/sin x = lim_{x→0+0} 1/(2√x sin x) = +∞.

(c) lim_{x→0+0} x log x = lim_{x→0+0} log x/(1/x) = lim_{x→0+0} (1/x)/(−1/x²) = lim_{x→0+0} (−x) = 0.

Remark 4.5 It is easy to transform other indefinite expressions into 0/0 or ∞/∞, to which l'Hospital's rule applies:

  0·∞:  f·g = f/(1/g);
  ∞ − ∞:  f − g = (1/g − 1/f)/(1/(fg));
  0⁰:  f^g = e^{g log f}.
Similarly, expressions of the form 1^∞ and ∞⁰ can be transformed.

4.5 Taylor's Theorem

The aim of this section is to show how n times differentiable functions can be approximated by polynomials of degree n.

First consider a polynomial p(x) = aₙxⁿ + ··· + a₁x + a₀. We compute

  p′(x) = naₙx^{n−1} + (n−1)a_{n−1}x^{n−2} + ··· + a₁,
  p′′(x) = n(n−1)aₙx^{n−2} + (n−1)(n−2)a_{n−1}x^{n−3} + ··· + 2a₂,
  ...
  p^(n)(x) = n! aₙ.

Inserting x = 0 gives p(0) = a₀, p′(0) = a₁, p′′(0) = 2a₂, ..., p^(n)(0) = n!aₙ. Hence,

  p(x) = p(0) + p′(0)/1! x + p′′(0)/2! x² + ··· + p^(n)(0)/n! xⁿ.  (4.15)

Now fix a ∈ R and let q(x) = p(x + a). Since q^(k)(0) = p^(k)(a), (4.15) gives

  p(x + a) = q(x) = Σ_{k=0}^{n} q^(k)(0)/k! x^k = Σ_{k=0}^{n} p^(k)(a)/k! x^k.

Replacing x + a by x in the above equation yields

  p(x) = Σ_{k=0}^{n} p^(k)(a)/k! (x − a)^k.  (4.16)

Theorem 4.15 (Taylor's Theorem) Suppose f is a real function on [r, s], n ∈ N, f^(n) is continuous on [r, s], and f^(n+1)(t) exists for all t ∈ (r, s). Let a and x be distinct points of [r, s] and define

  Pₙ(x) = Σ_{k=0}^{n} f^(k)(a)/k! (x − a)^k.  (4.17)

Then there exists a point ξ between x and a such that

  f(x) = Pₙ(x) + f^(n+1)(ξ)/(n+1)! (x − a)^{n+1}.  (4.18)

For n = 0, this is just the mean value theorem. Pₙ(x) is called the nth Taylor polynomial of f at x = a, and the second summand of (4.18),

  R_{n+1}(x, a) = f^(n+1)(ξ)/(n+1)! (x − a)^{n+1},

is called the Lagrangian remainder term. In general, the theorem shows that f can be approximated by a polynomial of degree n, and (4.18) allows us to estimate the error if we know bounds for | f^(n+1)(x) |.

Proof. Consider a and x to be fixed; let M be the number defined by

  f(x) = Pₙ(x) + M(x − a)^{n+1},

and put

  g(t) = f(t) − Pₙ(t) − M(t − a)^{n+1}, for r ≤ t ≤ s.  (4.19)

We have to show that (n + 1)!M = f^(n+1)(ξ) for some ξ between a and x. By (4.17) and (4.19),

  g^(n+1)(t) = f^(n+1)(t) − (n + 1)!M, for r < t < s.
(4.20)

Hence the proof will be complete if we can show that g^(n+1)(ξ) = 0 for some ξ between a and x. Since Pₙ^(k)(a) = f^(k)(a) for k = 0, 1, ..., n, we have

  g(a) = g′(a) = ··· = g^(n)(a) = 0.

Our choice of M shows that g(x) = 0, so that g′(ξ₁) = 0 for some ξ₁ between a and x, by Rolle's theorem. Since g′(a) = 0, we conclude similarly that g′′(ξ₂) = 0 for some ξ₂ between a and ξ₁. After n + 1 steps we arrive at the conclusion that g^(n+1)(ξ_{n+1}) = 0 for some ξ_{n+1} between a and ξₙ, that is, between a and x.

Definition 4.5 Suppose that f is a real function defined on [r, s] such that f^(n)(t) exists for all t ∈ (r, s) and all n ∈ N. Let x and a be points of [r, s]. Then

  T_f(x) = Σ_{k=0}^{∞} f^(k)(a)/k! (x − a)^k  (4.21)

is called the Taylor series of f at a.

Remarks 4.6 (a) The radius of convergence r of a Taylor series can be 0.
(b) If T_f converges, it may happen that T_f(x) ≠ f(x). If T_f at a point a converges to f(x) in a certain neighborhood U_r(a), r > 0, f is said to be analytic at a.

Example 4.5 We give an example for (b). Define f : R → R via

  f(x) = e^{−1/x²} if x ≠ 0,  f(x) = 0 if x = 0.

We will show that f ∈ C^∞(R) with f^(k)(0) = 0. For this we prove by induction on n that there exists a polynomial pₙ such that

  f^(n)(x) = pₙ(1/x) e^{−1/x²}, x ≠ 0,

and f^(n)(0) = 0. For n = 0 the statement is clear, taking p₀(x) ≡ 1. Suppose the statement is true for n. First, let x ≠ 0; then

  f^(n+1)(x) = (pₙ(1/x) e^{−1/x²})′ = −(1/x²) pₙ′(1/x) e^{−1/x²} + (2/x³) pₙ(1/x) e^{−1/x²}.

Choose p_{n+1}(t) = −pₙ′(t)t² + 2pₙ(t)t³. Secondly,

  f^(n+1)(0) = lim_{h→0} (f^(n)(h) − f^(n)(0))/h = lim_{h→0} pₙ(1/h) e^{−1/h²}/h = lim_{x→±∞} x pₙ(x) e^{−x²} = 0,

where we used Proposition 2.7 in the last equality. Hence T_f ≡ 0 at 0 (the Taylor series is identically 0), and T_f(x) does not converge to f(x) in a neighborhood of 0.

4.5.1 Examples of Taylor Series

(a) Power series coincide with their Taylor series:

  e^x = Σ_{n=0}^{∞} xⁿ/n!
, x ∈ R, and

  Σ_{n=0}^{∞} xⁿ = 1/(1 − x), x ∈ (−1, 1).

(b) f(x) = log(1 + x); see Homework 13.4.

(c) f(x) = (1 + x)^α, α ∈ R, a = 0. We have f^(k)(x) = α(α−1)···(α−k+1)(1+x)^{α−k}, in particular f^(k)(0) = α(α−1)···(α−k+1). Therefore, writing (α choose k) = α(α−1)···(α−k+1)/k!,

  (1 + x)^α = Σ_{k=0}^{n} (α choose k) x^k + R_{n+1}(x).  (4.22)

The quotient test shows that the corresponding power series converges for | x | < 1. Consider the Lagrangian remainder term with 0 < ξ < x < 1 and n + 1 > α. Then

  | R_{n+1}(x) | = | (α choose n+1) | (1 + ξ)^{α−n−1} x^{n+1} ≤ | (α choose n+1) | x^{n+1} −→ 0

as n → ∞. Hence,

  (1 + x)^α = Σ_{n=0}^{∞} (α choose n) xⁿ, 0 < x < 1.  (4.23)

(4.23) is called the binomial series. Its radius of convergence is R = 1. Looking at other forms of the remainder term gives that (4.23) holds for −1 < x < 1.

(d) y = f(x) = arctan x. Since y′ = 1/(1 + x²) and y′′ = −2x/(1 + x²)², we see that y′(1 + x²) = 1. Differentiating this n times and using Leibniz's formula, Proposition 4.6, we have

  Σ_{k=0}^{n} (n choose k) (y′)^(k) (1 + x²)^(n−k) = 0
  ⟹ y^(n+1)(1 + x²) + (n choose 1) y^(n)·2x + (n choose 2) y^(n−1)·2 = 0;
  at x = 0:  y^(n+1)(0) + n(n−1) y^(n−1)(0) = 0.

This yields

  y^(n)(0) = 0 if n = 2k,  y^(n)(0) = (−1)^k (2k)! if n = 2k + 1.

Therefore,

  arctan x = Σ_{k=0}^{n} (−1)^k x^{2k+1}/(2k+1) + R_{2n+2}(x).  (4.24)

One can prove that −1 < x ≤ 1 implies R_{2n+2}(x) → 0 as n → ∞. In particular, x = 1 gives

  π/4 = 1 − 1/3 + 1/5 − + ··· .

4.6 Appendix C

Corollary 4.16 (to the mean value theorem) Let f : R → R be a differentiable function with

  f′(x) = c f(x) for all x ∈ R,  (4.25)

where c ∈ R is a fixed number, and let A = f(0). Then

  f(x) = A e^{cx} for all x ∈ R.  (4.26)

Proof. Consider F(x) = f(x)e^{−cx}. Using the product rule for derivatives and (4.25) we obtain

  F′(x) = f′(x)e^{−cx} + f(x)(−c)e^{−cx} = (f′(x) − c f(x)) e^{−cx} = 0.

By Corollary 4.11, F(x) is constant. Since F(0) = f(0) = A, we have F(x) = A for all x ∈ R; the statement follows.
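Corollary 4.16 singles out A e^{cx} as the only solution of f′ = cf. As an illustration (not a proof), a crude Euler scheme for f′ = cf approaches A e^{cx} as the step size shrinks:

```python
import math

# Euler's method for f' = c f with f(0) = A, integrated up to x;
# the result should approach A * e^{cx} as the number of steps grows.
def euler(c, A, x, steps):
    h, y = x / steps, A
    for _ in range(steps):
        y += h * c * y
    return y

c, A, x = 0.5, 2.0, 1.0
exact = A * math.exp(c * x)
for steps in (10, 100, 10000):
    print(steps, abs(euler(c, A, x, steps) - exact))
# the error decreases toward 0
```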
The Continuity of Derivatives

We have seen that there exist derivatives f′ which are not continuous at some point. However, not every function is a derivative. In particular, derivatives which exist at every point of an interval have one important property: the intermediate value theorem holds. The precise statement follows.

Proposition 4.17 Suppose f is differentiable on [a, b] and f′(a) < λ < f′(b). Then there is a point x ∈ (a, b) such that f′(x) = λ.

Proof. Put g(t) = f(t) − λt. Then g is differentiable and g′(a) < 0. Therefore, g(t₁) < g(a) for some t₁ ∈ (a, b). Similarly, g′(b) > 0, so that g(t₂) < g(b) for some t₂ ∈ (a, b). Hence, g attains its minimum on [a, b] at some point x of the open interval (a, b). By Proposition 4.7, g′(x) = 0. Hence, f′(x) = λ.

Corollary 4.18 If f is differentiable on [a, b], then f′ cannot have discontinuities of the first kind.

Proof of Proposition 4.13 (b). Suppose first that f′′ ≥ 0 for all x. By Corollary 4.11, f′ is increasing. Let a < x < y < b and λ ∈ [0, 1]. Put t = λx + (1 − λ)y. Then x < t < y, and by the mean value theorem there exist ξ₁ ∈ (x, t) and ξ₂ ∈ (t, y) such that

  (f(t) − f(x))/(t − x) = f′(ξ₁) ≤ f′(ξ₂) = (f(y) − f(t))/(y − t).

Since t − x = (1 − λ)(y − x) and y − t = λ(y − x), it follows that

  (f(t) − f(x))/(1 − λ) ≤ (f(y) − f(t))/λ ⟹ f(t) ≤ λf(x) + (1 − λ)f(y).

Hence, f is convex.

Conversely, let f : (a, b) → R be convex and twice differentiable. Suppose to the contrary that f′′(x₀) < 0 for some x₀ ∈ (a, b). Let c = f′(x₀); put ϕ(x) = f(x) − (x − x₀)c. Then ϕ : (a, b) → R is twice differentiable with ϕ′(x₀) = 0 and ϕ′′(x₀) < 0. Hence, by Proposition 4.12, ϕ has a local maximum at x₀. By definition, there is a δ > 0 such that U_δ(x₀) ⊂ (a, b) and ϕ(x₀ − δ) < ϕ(x₀), ϕ(x₀ + δ) < ϕ(x₀). It follows that

  f(x₀) = ϕ(x₀) > (1/2)(ϕ(x₀ − δ) + ϕ(x₀ + δ)) = (1/2)(f(x₀ − δ) + f(x₀ + δ)).

This contradicts the convexity of f if we set x = x₀ − δ, y = x₀ + δ, and λ = 1/2.
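To close the chapter, a numerical illustration of Taylor's theorem: for f = exp, a = 0, n = 5, the Lagrange remainder (4.18) gives |f(x) − P₅(x)| ≤ e/6! on [−1, 1], which a quick check confirms:

```python
import math

# Taylor polynomial P5 of exp at a = 0; by (4.18) the error on [-1, 1]
# is at most max|f^(6)| / 6! = e / 720.
def taylor_exp(x, n=5):
    return sum(x ** k / math.factorial(k) for k in range(n + 1))

bound = math.e / math.factorial(6)
for k in range(-10, 11):
    x = k / 10
    assert abs(math.exp(x) - taylor_exp(x)) <= bound
print("Lagrange bound verified on [-1, 1]")
```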
Chapter 5 Integration

In the first section of this chapter derivatives will not appear! Roughly speaking, integration generalizes "addition". The formula distance = velocity × time is only valid for constant velocity. The right formula is

  s = ∫_{t₀}^{t₁} v(t) dt.

We need integrals to compute lengths of curves, areas of surfaces, and volumes. The study of integrals requires a long preparation, but once this preliminary work has been completed, integrals will be an invaluable tool for creating new functions, and the derivative will reappear more powerful than ever. The relation between the integral and derivatives is given in the Fundamental Theorem of Calculus.

The integral formalizes a simple intuitive concept, that of area. It is not a surprise that to learn the definition of an intuitive concept can present great difficulties; "area" is certainly not an exception.

[Figure: "What's the area?" A region under a graph over [a, b]]

5.1 The Riemann–Stieltjes Integral

In this section we will only define the area of some very special regions: those which are bounded by the horizontal axis, the vertical lines through (a, 0) and (b, 0), and the graph of a function f such that f(x) ≥ 0 for all x in [a, b]. If f is negative on a subinterval of [a, b], the integral will represent the difference of the areas above and below the x-axis. All intervals [a, b] are finite intervals.

Definition 5.1 Let [a, b] be an interval. By a partition of [a, b] we mean a finite set of points x₀, x₁, ..., xₙ, where

  a = x₀ ≤ x₁ ≤ ··· ≤ xₙ = b.

We write Δxᵢ = xᵢ − x_{i−1}, i = 1, ..., n.
The left members of (5.4) and (5.5) are called the upper and lower Riemann integrals of f over [a, b], respectively. If the upper and lower integrals are equal, we say that f is Riemann integrable on [a, b], we write f ∈ R (that is, R denotes the set of Riemann integrable functions), and we denote the common value of (5.4) and (5.5) by

  ∫_a^b f dx  or by  ∫_a^b f(x) dx.  (5.6)

This is the Riemann integral of f over [a, b]. Since f is bounded, there exist two numbers m and M such that m ≤ f(x) ≤ M for all x ∈ [a, b]. Hence for every partition P,

  m(b − a) ≤ L(P, f) ≤ U(P, f) ≤ M(b − a),

so that the numbers L(P, f) and U(P, f) form a bounded set. This shows that the upper and the lower integrals are defined for every bounded function f. The question of their equality, and hence the question of the integrability of f, is a more delicate one. Instead of investigating it separately for the Riemann integral, we shall immediately consider a more general situation.

Definition 5.2 Let α be a monotonically increasing function on [a, b] (since α(a) and α(b) are finite, it follows that α is bounded on [a, b]). Corresponding to each partition P of [a, b], we write

  ∆α_i = α(x_i) − α(x_{i−1}).

It is clear that ∆α_i ≥ 0. For any real function f which is bounded on [a, b] we put

  U(P, f, α) = Σ_{i=1}^n M_i ∆α_i,  (5.7)
  L(P, f, α) = Σ_{i=1}^n m_i ∆α_i,  (5.8)

where M_i and m_i have the same meaning as in Definition 5.1, and we define

  ∫̄_a^b f dα = inf U(P, f, α),  (5.9)
  ∫̲_a^b f dα = sup L(P, f, α),  (5.10)

where the infimum and the supremum are taken over all partitions P. If the left members of (5.9) and (5.10) are equal, we denote their common value by

  ∫_a^b f dα  or sometimes by  ∫_a^b f(x) dα(x).  (5.11)

This is the Riemann–Stieltjes integral (or simply the Stieltjes integral) of f with respect to α over [a, b]. If (5.11) exists, we say that f is integrable with respect to α in the Riemann sense, and write f ∈ R(α).
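The Stieltjes sums (5.7) and (5.8) can be explored the same way. A minimal sketch (names invented; suprema and infima are again approximated by sampling):

```python
# Approximate L(P, f, alpha) and U(P, f, alpha) of Definition 5.2.
def stieltjes_darboux(f, alpha, partition, samples=200):
    lower = upper = 0.0
    for x0, x1 in zip(partition, partition[1:]):
        d_alpha = alpha(x1) - alpha(x0)          # Delta alpha_i >= 0 (alpha increasing)
        ys = [f(x0 + (x1 - x0) * k / samples) for k in range(samples + 1)]
        lower += min(ys) * d_alpha               # m_i * Delta alpha_i
        upper += max(ys) * d_alpha               # M_i * Delta alpha_i
    return lower, upper

# f(x) = x, alpha(x) = x^2 on [0, 1]
P = [i / 1000 for i in range(1001)]
L, U = stieltjes_darboux(lambda x: x, lambda x: x * x, P)
```

For this choice both sums approach 2/3, the value of ∫₀¹ x d(x²) = ∫₀¹ 2x² dx.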
By taking α(x) = x, the Riemann integral is seen to be a special case of the Riemann–Stieltjes integral. Let us mention explicitly that in the general case α need not even be continuous.

We shall now investigate the existence of the integral (5.11). Without saying so every time, f will be assumed real and bounded, and α increasing on [a, b].

Definition 5.3 We say that a partition P* is a refinement of the partition P if P* ⊃ P (that is, every point of P is a point of P*). Given two partitions P₁ and P₂, we say that P* is their common refinement if P* = P₁ ∪ P₂.

Lemma 5.1 If P* is a refinement of P, then

  L(P, f, α) ≤ L(P*, f, α)  and  U(P, f, α) ≥ U(P*, f, α).  (5.12)

Proof. We only prove the first inequality of (5.12); the proof of the second one is analogous. Suppose first that P* contains just one point more than P. Let this extra point be x*, and suppose x_{i−1} ≤ x* < x_i, where x_{i−1} and x_i are two consecutive points of P. Put

  w₁ = inf{f(x) | x ∈ [x_{i−1}, x*]},  w₂ = inf{f(x) | x ∈ [x*, x_i]}.

Clearly w₁ ≥ m_i and w₂ ≥ m_i (since inf M ≥ inf N if M ⊂ N, see Homework 1.4 (b)), where, as before, m_i = inf{f(x) | x ∈ [x_{i−1}, x_i]}. Hence

  L(P*, f, α) − L(P, f, α)
   = w₁(α(x*) − α(x_{i−1})) + w₂(α(x_i) − α(x*)) − m_i(α(x_i) − α(x_{i−1}))
   = (w₁ − m_i)(α(x*) − α(x_{i−1})) + (w₂ − m_i)(α(x_i) − α(x*)) ≥ 0.

If P* contains k points more than P, we repeat this reasoning k times and arrive at (5.12).

Proposition 5.2

  ∫̲_a^b f dα ≤ ∫̄_a^b f dα.

Proof. Let P* be the common refinement of two partitions P₁ and P₂. By Lemma 5.1,

  L(P₁, f, α) ≤ L(P*, f, α) ≤ U(P*, f, α) ≤ U(P₂, f, α).

Hence

  L(P₁, f, α) ≤ U(P₂, f, α).  (5.13)

If P₂ is fixed and the supremum is taken over all P₁, (5.13) gives

  ∫̲_a^b f dα ≤ U(P₂, f, α).  (5.14)

The proposition follows by taking the infimum over all P₂ in (5.14).
Proposition 5.3 (Riemann Criterion) f ∈ R(α) on [a, b] if and only if for every ε > 0 there exists a partition P such that

  U(P, f, α) − L(P, f, α) < ε.  (RC)

Proof. For every P we have

  L(P, f, α) ≤ ∫̲_a^b f dα ≤ ∫̄_a^b f dα ≤ U(P, f, α).

Thus (RC) implies

  0 ≤ ∫̄_a^b f dα − ∫̲_a^b f dα < ε.

Since this inequality can be satisfied for every ε > 0, we have

  ∫̄_a^b f dα = ∫̲_a^b f dα,

that is, f ∈ R(α).

Conversely, suppose f ∈ R(α), and let ε > 0 be given. Then there exist partitions P₁ and P₂ such that

  U(P₂, f, α) − ∫_a^b f dα < ε/2,   ∫_a^b f dα − L(P₁, f, α) < ε/2.  (5.15)

We choose P to be the common refinement of P₁ and P₂. Then Lemma 5.1, together with (5.15), shows that

  U(P, f, α) ≤ U(P₂, f, α) < ∫_a^b f dα + ε/2 < L(P₁, f, α) + ε ≤ L(P, f, α) + ε,

so that (RC) holds for this partition P.

Proposition 5.3 furnishes a convenient criterion for integrability. Before we apply it, we state some closely related facts.

Lemma 5.4 (a) If (RC) holds for P and some ε, then (RC) holds with the same ε for every refinement of P.
(b) If (RC) holds for P = {x₀, …, x_n} and if s_i, t_i are arbitrary points in [x_{i−1}, x_i], then

  Σ_{i=1}^n | f(s_i) − f(t_i) | ∆α_i < ε.

(c) If f ∈ R(α) and (RC) holds as in (b), then

  | Σ_{i=1}^n f(t_i) ∆α_i − ∫_a^b f dα | < ε.

Proof. Lemma 5.1 implies (a). Under the assumptions made in (b), both f(s_i) and f(t_i) lie in [m_i, M_i], so that | f(s_i) − f(t_i) | ≤ M_i − m_i. Thus

  Σ_{i=1}^n | f(t_i) − f(s_i) | ∆α_i ≤ U(P, f, α) − L(P, f, α),

which proves (b). The obvious inequalities

  L(P, f, α) ≤ Σ_i f(t_i) ∆α_i ≤ U(P, f, α)

and

  L(P, f, α) ≤ ∫_a^b f dα ≤ U(P, f, α)

prove (c).

Theorem 5.5 If f is continuous on [a, b], then f ∈ R(α) on [a, b].

Proof. Let ε > 0 be given. Choose η > 0 so that (α(b) − α(a))η < ε. Since f is uniformly continuous on [a, b] (Proposition 3.7), there exists a δ > 0 such that

  | f(x) − f(t) | < η  (5.16)

if x, t ∈ [a, b] and | x − t | < δ.
If P is any partition of [a, b] such that ∆x_i < δ for all i, then (5.16) implies that

  M_i − m_i ≤ η,  i = 1, …, n,  (5.17)

and therefore

  U(P, f, α) − L(P, f, α) = Σ_{i=1}^n (M_i − m_i) ∆α_i ≤ η Σ_{i=1}^n ∆α_i = η(α(b) − α(a)) < ε.

By Proposition 5.3, f ∈ R(α).

Example 5.1 (a) The proof of Theorem 5.5 together with Lemma 5.4 shows that

  | Σ_{i=1}^n f(t_i) ∆α_i − ∫_a^b f dα | < ε

if ∆x_i < δ.

We compute I = ∫_a^b sin x dx. Let ε > 0. Since sin x is continuous, f ∈ R. There exists δ > 0 such that | x − t | < δ implies

  | sin x − sin t | < ε/(b − a).  (5.18)

In this case (RC) is satisfied, and consequently

  | Σ_{i=1}^n sin(t_i) ∆x_i − ∫_a^b sin x dx | < ε

for every partition P with ∆x_i < δ, i = 1, …, n. We choose an equidistant partition of [a, b], x_i = a + (b − a)i/n, i = 0, …, n. Then h = ∆x_i = (b − a)/n, and the condition (5.18) is satisfied provided n > (b − a)²/ε. Using the addition formula cos(x − y) − cos(x + y) = 2 sin x sin y, we have

  Σ_{i=1}^n sin(x_i) ∆x_i = Σ_{i=1}^n sin(a + ih) h = h/(2 sin(h/2)) · Σ_{i=1}^n 2 sin(h/2) sin(a + ih)
   = h/(2 sin(h/2)) · Σ_{i=1}^n (cos(a + (i − 1/2)h) − cos(a + (i + 1/2)h))
   = h/(2 sin(h/2)) · (cos(a + h/2) − cos(a + (n + 1/2)h))
   = (h/2)/sin(h/2) · (cos(a + h/2) − cos(b + h/2)).

Since lim_{h→0} sin h / h = 1 and cos x is continuous, the above expression tends to cos a − cos b. Hence ∫_a^b sin x dx = cos a − cos b.

(b) For x ∈ [a, b] define

  f(x) = 1 if x ∈ Q,  f(x) = 0 if x ∉ Q.

We will show f ∉ R. Let P be any partition of [a, b]. Since any interval contains rational as well as irrational points, m_i = 0 and M_i = 1 for all i. Hence L(P, f) = 0, whereas U(P, f) = Σ_{i=1}^n ∆x_i = b − a. We conclude that the upper and lower Riemann integrals don't coincide; f ∉ R. A similar reasoning shows f ∉ R(α) if α(b) > α(a), since L(P, f, α) = 0 < U(P, f, α) = α(b) − α(a).

Proposition 5.6 If f is monotonic on [a, b] and α is continuous on [a, b], then f ∈ R(α).

Proof. Let ε > 0 be given.
For any positive integer n, choose a partition P = {x₀, …, x_n} such that

  ∆α_i = (α(b) − α(a))/n,  i = 1, …, n.

This is possible by the intermediate value theorem (Theorem 3.5), since α is continuous.

We suppose that f is monotonically increasing (the proof is analogous in the other case). Then M_i = f(x_i) and m_i = f(x_{i−1}), i = 1, …, n, so that

  U(P, f, α) − L(P, f, α) = (α(b) − α(a))/n · Σ_{i=1}^n (f(x_i) − f(x_{i−1}))
   = (α(b) − α(a))/n · (f(b) − f(a)) < ε

if n is taken large enough. By Proposition 5.3, f ∈ R(α).

Without proofs, which can be found in [Rud76, pp. 126–128], we note the following facts.

Proposition 5.7 Suppose f is bounded on [a, b], f has finitely many points of discontinuity on [a, b], and α is continuous at every point at which f is discontinuous. Then f ∈ R(α).

Proof. We give an idea of the proof in the case of the Riemann integral (α(x) = x) and one single discontinuity at c, a < c < b. Let ε > 0 be given, let m ≤ f(x) ≤ M for all x ∈ [a, b], and put C = M − m. First choose points a′ and b′ with a < a′ < c < b′ < b and C(b′ − a′) < ε. Let f_j, j = 1, 2, denote the restrictions of f to the subintervals I₁ = [a, a′] and I₂ = [b′, b], respectively. Since f_j is continuous on I_j, f_j ∈ R over I_j, and therefore, by the Riemann criterion, there exist partitions P_j, j = 1, 2, of I_j such that

  U(P_j, f_j) − L(P_j, f_j) < ε,  j = 1, 2.

Let P = P₁ ∪ P₂ be a partition of [a, b]. Then

  U(P, f) − L(P, f) = U(P₁, f₁) − L(P₁, f₁) + U(P₂, f₂) − L(P₂, f₂) + (M₀ − m₀)(b′ − a′)
   ≤ ε + ε + C(b′ − a′) < 3ε,

where M₀ and m₀ are the supremum and infimum of f(x) on [a′, b′]. The Riemann criterion is satisfied for f on [a, b]; hence f ∈ R.

Proposition 5.8 Suppose f ∈ R(α) on [a, b], m ≤ f(x) ≤ M, ϕ is continuous on [m, M], and h(x) = ϕ(f(x)) on [a, b]. Then h ∈ R(α) on [a, b].

Remark 5.1 (a) A bounded function f is Riemann integrable on [a, b] if and only if f is continuous almost everywhere on [a, b].
(The proof of this fact can be found in [Rud76, Theorem 11.33].) "Almost everywhere" means that the discontinuities form a set of (Lebesgue) measure 0. A set M ⊂ R has measure 0 if for given ε > 0 there exist intervals I_n, n ∈ N, such that M ⊂ ⋃_{n∈N} I_n and Σ_{n∈N} | I_n | < ε. Here | I | denotes the length of the interval I. Examples of sets of measure 0 are finite sets, countable sets, and the Cantor set (which is uncountable).

(b) Note that such a "chaotic" function (at the point 0) as

  f(x) = cos(1/x) if x ≠ 0,  f(0) = 0,

is integrable on [−π, π], since there is only one single discontinuity, at 0.

5.1.1 Properties of the Integral

Proposition 5.9 (a) If f₁, f₂ ∈ R(α) on [a, b], then f₁ + f₂ ∈ R(α), cf ∈ R(α) for every constant c, and

  ∫_a^b (f₁ + f₂) dα = ∫_a^b f₁ dα + ∫_a^b f₂ dα,   ∫_a^b cf dα = c ∫_a^b f dα.

(b) If f₁, f₂ ∈ R(α) and f₁(x) ≤ f₂(x) on [a, b], then

  ∫_a^b f₁ dα ≤ ∫_a^b f₂ dα.

(c) If f ∈ R(α) on [a, b] and a < c < b, then f ∈ R(α) on [a, c] and on [c, b], and

  ∫_a^b f dα = ∫_a^c f dα + ∫_c^b f dα.

(d) If f ∈ R(α) on [a, b] and | f(x) | ≤ M on [a, b], then

  | ∫_a^b f dα | ≤ M(α(b) − α(a)).

(e) If f ∈ R(α₁) and f ∈ R(α₂), then f ∈ R(α₁ + α₂) and

  ∫_a^b f d(α₁ + α₂) = ∫_a^b f dα₁ + ∫_a^b f dα₂;

if f ∈ R(α) and c is a positive constant, then f ∈ R(cα) and

  ∫_a^b f d(cα) = c ∫_a^b f dα.

Proof. If f = f₁ + f₂ and P is any partition of [a, b], we have

  L(P, f₁, α) + L(P, f₂, α) ≤ L(P, f, α) ≤ U(P, f, α) ≤ U(P, f₁, α) + U(P, f₂, α),  (5.19)

since inf_{I_i} f₁ + inf_{I_i} f₂ ≤ inf_{I_i} (f₁ + f₂) and sup_{I_i} f₁ + sup_{I_i} f₂ ≥ sup_{I_i} (f₁ + f₂). If f₁ ∈ R(α) and f₂ ∈ R(α), let ε > 0 be given. There are partitions P_j, j = 1, 2, such that

  U(P_j, f_j, α) − L(P_j, f_j, α) < ε.

These inequalities persist if P₁ and P₂ are replaced by their common refinement P. Then (5.19) implies

  U(P, f, α) − L(P, f, α) < 2ε,

which proves that f ∈ R(α).
With the same P we have

  U(P, f_j, α) < ∫_a^b f_j dα + ε,  j = 1, 2,

and L(P, f, α) ≤ ∫_a^b f dα ≤ U(P, f, α); hence (5.19) implies

  ∫_a^b f dα ≤ U(P, f, α) < ∫_a^b f₁ dα + ∫_a^b f₂ dα + 2ε.

Since ε was arbitrary, we conclude that

  ∫_a^b f dα ≤ ∫_a^b f₁ dα + ∫_a^b f₂ dα.  (5.20)

If we replace f₁ and f₂ in (5.20) by −f₁ and −f₂, respectively, the inequality is reversed, and the equality is proved.

(b) Put f = f₂ − f₁. It suffices to prove that ∫_a^b f dα ≥ 0. For every partition P we have m_i ≥ 0, since f ≥ 0. Hence

  ∫_a^b f dα ≥ L(P, f, α) = Σ_{i=1}^n m_i ∆α_i ≥ 0,

since in addition ∆α_i = α(x_i) − α(x_{i−1}) ≥ 0 (α is increasing).

The proofs of the other assertions are so similar that we omit the details. In part (c) the point is that (by passing to refinements) we may restrict ourselves to partitions which contain the point c in approximating ∫_a^b f dα, cf. Homework 14.5.

Note that in (c), f ∈ R(α) on [a, c] and on [c, b] in general does not imply that f ∈ R(α) on [a, b]. For example, consider the interval [−1, 1] with

  f(x) = α(x) = 0 for −1 ≤ x < 0,  f(x) = α(x) = 1 for 0 ≤ x ≤ 1.

Then ∫₀¹ f dα = 0; the integral vanishes since α is constant on [0, 1]. However, f ∉ R(α) on [−1, 1], since for any partition P including the point 0 we have U(P, f, α) = 1 and L(P, f, α) = 0.

Proposition 5.10 If f, g ∈ R(α) on [a, b], then
(a) f g ∈ R(α);
(b) | f | ∈ R(α) and | ∫_a^b f dα | ≤ ∫_a^b | f | dα.

Proof. If we take ϕ(t) = t², Proposition 5.8 shows that f² ∈ R(α) if f ∈ R(α). The identity 4fg = (f + g)² − (f − g)² completes the proof of (a). If we take ϕ(t) = | t |, Proposition 5.8 shows that | f | ∈ R(α). Choose c = ±1 so that c ∫ f dα ≥ 0. Then, since ±f ≤ | f |,

  | ∫ f dα | = c ∫ f dα = ∫ cf dα ≤ ∫ | f | dα.

The unit step function or Heaviside function H(x) is defined by H(x) = 0 if x < 0 and H(x) = 1 if x ≥ 0.

Example 5.2 (a) If a < s < b, f is bounded on [a, b], f is continuous at s, and α(x) = H(x − s), then

  ∫_a^b f dα = f(s).
For the proof, consider a partition P with n = 3: a = x₀ < x₁ < s = x₂ < x₃ = b. Then ∆α₁ = ∆α₃ = 0 and ∆α₂ = 1, so U(P, f, α) = M₂ and L(P, f, α) = m₂. Since f is continuous at s, we see that M₂ and m₂ converge to f(s) as x₁ → s.

(b) Suppose c_n ≥ 0 for all n = 1, …, N, and (s_n), n = 1, …, N, is a strictly increasing finite sequence of distinct points in (a, b). Further, α(x) = Σ_{n=1}^N c_n H(x − s_n). Then

  ∫_a^b f dα = Σ_{n=1}^N c_n f(s_n).

This follows from (a) and Proposition 5.9 (e).

Proposition 5.11 Suppose c_n ≥ 0 for all positive integers n ∈ N, Σ_{n=1}^∞ c_n converges, (s_n) is a strictly increasing sequence of distinct points in (a, b), and

  α(x) = Σ_{n=1}^∞ c_n H(x − s_n).  (5.21)

Let f be continuous on [a, b]. Then

  ∫_a^b f dα = Σ_{n=1}^∞ c_n f(s_n).  (5.22)

Proof. The comparison test shows that the series (5.21) converges for every x. Its sum α is evidently an increasing function with α(a) = 0 and α(b) = Σ c_n. Let ε > 0 be given and choose N so that

  Σ_{n=N+1}^∞ c_n < ε.

Put

  α₁(x) = Σ_{n=1}^N c_n H(x − s_n),  α₂(x) = Σ_{n=N+1}^∞ c_n H(x − s_n).

By Proposition 5.9 and Example 5.2,

  ∫_a^b f dα₁ = Σ_{n=1}^N c_n f(s_n).

Since α₂(b) − α₂(a) < ε, Proposition 5.9 (d) gives

  | ∫_a^b f dα₂ | ≤ Mε,

where M = sup | f(x) |. Since α = α₁ + α₂, it follows that

  | ∫_a^b f dα − Σ_{n=1}^N c_n f(s_n) | ≤ Mε.

If we let N → ∞, we obtain (5.22).

Proposition 5.12 Assume that α is increasing and α′ ∈ R on [a, b]. Let f be a bounded real function on [a, b]. Then f ∈ R(α) if and only if f α′ ∈ R. In that case

  ∫_a^b f dα = ∫_a^b f(x) α′(x) dx.  (5.23)

The statement remains true if α is continuous on [a, b] and differentiable up to finitely many points c₁, c₂, …, c_n.

Proof. Let ε > 0 be given and apply the Riemann criterion (Proposition 5.3) to α′: there is a partition P = {x₀, …, x_n} of [a, b] such that

  U(P, α′) − L(P, α′) < ε.  (5.24)
The mean value theorem furnishes points t_i ∈ [x_{i−1}, x_i] such that

  ∆α_i = α(x_i) − α(x_{i−1}) = α′(t_i)(x_i − x_{i−1}) = α′(t_i) ∆x_i,  i = 1, …, n.

If s_i ∈ [x_{i−1}, x_i], then

  Σ_{i=1}^n | α′(s_i) − α′(t_i) | ∆x_i < ε  (5.25)

by (5.24) and Lemma 5.4 (b). Put M = sup | f(x) |. Since

  Σ_{i=1}^n f(s_i) ∆α_i = Σ_{i=1}^n f(s_i) α′(t_i) ∆x_i,

it follows from (5.25) that

  | Σ_{i=1}^n f(s_i) ∆α_i − Σ_{i=1}^n f(s_i) α′(s_i) ∆x_i | ≤ Mε.  (5.26)

In particular,

  Σ_{i=1}^n f(s_i) ∆α_i ≤ U(P, f α′) + Mε

for all choices of s_i ∈ [x_{i−1}, x_i], so that

  U(P, f, α) ≤ U(P, f α′) + Mε.

The same argument leads from (5.26) to

  U(P, f α′) ≤ U(P, f, α) + Mε.

Thus

  | U(P, f, α) − U(P, f α′) | ≤ Mε.  (5.27)

Now (5.25) remains true if P is replaced by any refinement. Hence (5.26) also remains true. We conclude that

  | ∫̄_a^b f dα − ∫̄_a^b f(x) α′(x) dx | ≤ Mε.

But ε is arbitrary. Hence

  ∫̄_a^b f dα = ∫̄_a^b f(x) α′(x) dx

for any bounded f. The equality for the lower integrals follows from (5.26) in exactly the same way. The proposition follows.

We now summarize the two cases.

Proposition 5.13 Let f be continuous on [a, b]. Suppose that, except for finitely many points c₀, c₁, …, c_n with c₀ = a and c_n = b, the derivative α′(x) exists and is continuous and bounded on [a, b] \ {c₀, …, c_n}. Then f ∈ R(α) and

  ∫_a^b f dα = ∫_a^b f(x) α′(x) dx + Σ_{i=1}^{n−1} f(c_i)(α(c_i + 0) − α(c_i − 0))
    + f(a)(α(a + 0) − α(a)) + f(b)(α(b) − α(b − 0)).

Proof (sketch). (a) Note that A_i⁺ = α(c_i + 0) − α(c_i) and A_i⁻ = α(c_i) − α(c_i − 0) exist by Theorem 3.8. Define

  α₁(x) = Σ_{i=0}^{n−1} A_i⁺ H(x − c_i) + Σ_{i=1}^{n} (−A_i⁻) H(c_i − x).

(b) Then α₂ = α − α₁ is continuous.
(c) Since α₁ is piecewise constant, α₁′(x) = 0 for x ≠ c_i. Hence α₂′(x) = α′(x) for x ≠ c_i. Applying Proposition 5.12 gives

  ∫_a^b f dα₂ = ∫_a^b f α₂′ dx = ∫_a^b f α′ dx.

Further,

  ∫_a^b f dα = ∫_a^b f d(α₁ + α₂) = ∫_a^b f α′ dx + ∫_a^b f dα₁.

By Proposition 5.11,

  ∫_a^b f dα₁ = Σ_{i=0}^{n−1} A_i⁺ f(c_i) + Σ_{i=1}^{n} A_i⁻ f(c_i).
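Proposition 5.11 is easy to check numerically. The sketch below (all helper names invented) forms a plain Stieltjes sum with midpoint tags against a finite step function α and compares it with Σₙ cₙ f(sₙ):

```python
# Riemann-Stieltjes sum against a step integrator alpha(x) = sum_n c_n H(x - s_n).
def heaviside(x):
    return 1.0 if x >= 0 else 0.0      # the unit step function H

def stieltjes_sum(f, alpha, a, b, n):
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    return sum(f((x0 + x1) / 2) * (alpha(x1) - alpha(x0))
               for x0, x1 in zip(xs, xs[1:]))

cs, ss = [0.5, 0.25], [0.3, 0.7]       # weights c_n and jump points s_n in (0, 1)
alpha = lambda x: sum(c * heaviside(x - s) for c, s in zip(cs, ss))
f = lambda x: x * x                    # continuous integrand
approx = stieltjes_sum(f, alpha, 0.0, 1.0, 10000)
exact = sum(c * f(s) for c, s in zip(cs, ss))   # sum_n c_n f(s_n)
```

With a fine enough partition, approx agrees with exact to several decimal places.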
Example 5.3 (a) The Fundamental Theorem of Calculus (see Theorem 5.15) yields

  ∫₀² x d(x³) = ∫₀² x · 3x² dx = 3 · x⁴/4 |₀² = 12.

(b) Let f(x) = x² and

  α(x) = x for 0 ≤ x < 1,  α(1) = 7,  α(x) = x² + 10 for 1 < x < 2,  α(2) = 64.

Then

  ∫₀² f dα = ∫₀² f α′ dx + f(1)(α(1 + 0) − α(1 − 0)) + f(2)(α(2) − α(2 − 0))
   = ∫₀¹ x² · 1 dx + ∫₁² x² · 2x dx + 1 · (11 − 1) + 4 · (64 − 14)
   = x³/3 |₀¹ + x⁴/2 |₁² + 10 + 200 = 1/3 + 8 − 1/2 + 210 = 217 5/6.

Remark 5.2 The three preceding propositions show the flexibility of the Stieltjes process of integration. If α is a pure step function, the integral reduces to an infinite series. If α has an integrable derivative, the integral reduces to the ordinary Riemann integral. This makes it possible to study series and integrals simultaneously, rather than separately.

5.2 Integration and Differentiation

We shall see that integration and differentiation are, in a certain sense, inverse operations.

Theorem 5.14 Let f ∈ R on [a, b]. For a ≤ x ≤ b put

  F(x) = ∫_a^x f(t) dt.

Then F is continuous on [a, b]; furthermore, if f is continuous at x₀ ∈ [a, b], then F is differentiable at x₀ and

  F′(x₀) = f(x₀).

Proof. Since f ∈ R, f is bounded. Suppose | f(t) | ≤ M on [a, b]. If a ≤ x < y ≤ b, then

  | F(y) − F(x) | = | ∫_x^y f(t) dt | ≤ M(y − x),

by Proposition 5.9 (c) and (d). Given ε > 0, we see that | F(y) − F(x) | < ε provided that | y − x | < ε/M. This proves continuity (and, in fact, uniform continuity) of F.

Now suppose that f is continuous at x₀. Given ε > 0, choose δ > 0 such that

  | f(t) − f(x₀) | < ε  if  | t − x₀ | < δ, t ∈ [a, b].

Hence, if x₀ − δ < s ≤ x₀ ≤ t < x₀ + δ and a ≤ s < t ≤ b, we have, by Proposition 5.9 (d) as before,

  | (F(t) − F(s))/(t − s) − f(x₀) | = | 1/(t − s) ∫_s^t f(r) dr − 1/(t − s) ∫_s^t f(x₀) dr |
   = | 1/(t − s) ∫_s^t (f(u) − f(x₀)) du | < ε.

This in particular holds for s = x₀, that is,

  | (F(t) − F(x₀))/(t − x₀) − f(x₀) | < ε.

It follows that F′(x₀) = f(x₀).
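Theorem 5.14 can be illustrated numerically. In the sketch below (illustrative only), F is evaluated with a composite trapezoid sum, which is merely a convenient assumption here, and a central difference quotient of F is compared with f(x₀):

```python
import math

# F(x) = int_0^x f(t) dt, approximated by a composite trapezoid sum.
def F(f, x, n=2000):
    h = x / n
    return h * (f(0.0) / 2 + sum(f(i * h) for i in range(1, n)) + f(x) / 2)

f = math.cos
x0, h = 0.8, 1e-4
# central difference quotient of F at x0; by Theorem 5.14 it approximates f(x0)
derivative = (F(f, x0 + h) - F(f, x0 - h)) / (2 * h)
```

Here derivative agrees with cos 0.8 to several decimal places.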
Definition 5.4 A function F : [a, b] → R is called an antiderivative or a primitive of a function f : [a, b] → R if F is differentiable and F′ = f.

Remarks 5.3 (a) There exist functions f not having an antiderivative. For example, the Heaviside function H(x) has a simple discontinuity at 0; but by Corollary 4.18 derivatives cannot have simple discontinuities.

(b) The antiderivative F of a function f (if it exists) is unique up to an additive constant. More precisely, if F is an antiderivative of f on [a, b], then F₁(x) = F(x) + c is also an antiderivative of f; and if F and G are antiderivatives of f on [a, b], then there is a constant c such that F(x) − G(x) = c. The first part is obvious, since F₁′(x) = F′(x) + 0 = f(x). For the second part, put H(x) = F(x) − G(x); then H′(x) = 0, and H(x) is constant by Corollary 4.11.

Notation for the antiderivative:

  F(x) = ∫ f(x) dx = ∫ f dx.

The function f is called the integrand. Integration and differentiation are inverse to each other:

  d/dx ∫ f(x) dx = f(x),  ∫ f′(x) dx = f(x).

Theorem 5.15 (Fundamental Theorem of Calculus) Let f : [a, b] → R be continuous.
(a) If

  F(x) = ∫_a^x f(t) dt,

then F(x) is an antiderivative of f(x) on [a, b].
(b) If G(x) is an antiderivative of f(x), then

  ∫_a^b f(t) dt = G(b) − G(a).

Proof. (a) By Theorem 5.14, F(x) = ∫_a^x f(t) dt is differentiable at any point x₀ ∈ [a, b] with F′(x₀) = f(x₀).
(b) By the above remark, the antiderivative is unique up to a constant; hence F(x) − G(x) = C. Since F(a) = ∫_a^a f(t) dt = 0, we obtain

  G(b) − G(a) = (F(b) − C) − (F(a) − C) = F(b) − F(a) = F(b) = ∫_a^b f(x) dx.

Note that part (b) of the FTC remains true if we only assume f ∈ R: if f ∈ R on [a, b] and F is an antiderivative of f, then ∫_a^b f dx = F(b) − F(a). Indeed, let ε > 0 be given. By the Riemann criterion (Proposition 5.3) there exists a partition P = {x₀, …, x_n} of [a, b] such that

  U(P, f) − L(P, f) < ε.
By the mean value theorem there exist points t_i ∈ [x_{i−1}, x_i] such that

  F(x_i) − F(x_{i−1}) = f(t_i)(x_i − x_{i−1}),  i = 1, …, n.

Thus

  F(b) − F(a) = Σ_{i=1}^n f(t_i) ∆x_i.

It follows from Lemma 5.4 (c) and the above equation that

  | F(b) − F(a) − ∫_a^b f(x) dx | = | Σ_{i=1}^n f(t_i) ∆x_i − ∫_a^b f(x) dx | < ε.

Since ε > 0 was arbitrary, the proof is complete.

5.2.1 Table of Antiderivatives

By differentiating the right-hand side one gets the left-hand side of the table.

  function        domain                          antiderivative
  x^α             α ∈ R \ {−1}, x > 0             x^{α+1}/(α + 1)
  1/x             x < 0 or x > 0                  log | x |
  e^x             R                               e^x
  a^x             a > 0, a ≠ 1, x ∈ R             a^x / log a
  sin x           R                               − cos x
  cos x           R                               sin x
  1/sin² x        R \ {kπ | k ∈ Z}                − cot x
  1/cos² x        R \ {π/2 + kπ | k ∈ Z}          tan x
  1/(1 + x²)      R                               arctan x
  1/√(1 + x²)     R                               arsinh x = log(x + √(x² + 1))
  1/√(1 − x²)     −1 < x < 1                      arcsin x
  1/√(x² − 1)     x < −1 or x > 1                 log | x + √(x² − 1) |

5.2.2 Integration Rules

The aim of this subsection is to calculate antiderivatives of composed functions using antiderivatives of (already known) simpler functions. Notation: f(x)|_a^b := f(b) − f(a).

Proposition 5.16 (a) Let f and g be functions with antiderivatives F and G, respectively. Then af(x) + bg(x), a, b ∈ R, has the antiderivative aF(x) + bG(x):

  ∫ (af + bg) dx = a ∫ f dx + b ∫ g dx.  (Linearity)

(b) If f and g are differentiable and f(x)g′(x) has an antiderivative, then f′(x)g(x) has an antiderivative, too:

  ∫ f′g dx = fg − ∫ fg′ dx.  (Integration by parts)  (5.28)

If f and g are continuously differentiable on [a, b], then

  ∫_a^b f′g dx = f(x)g(x)|_a^b − ∫_a^b fg′ dx.  (5.29)

(c) If ϕ : D → R is continuously differentiable with ϕ(D) ⊂ I and f : I → R has an antiderivative F, then

  ∫ f(ϕ(x)) ϕ′(x) dx = F(ϕ(x)).  (Change of variable)  (5.30)

If ϕ : [a, b] → R is continuously differentiable with ϕ([a, b]) ⊂ I and f : I → R is continuous, then

  ∫_a^b f(ϕ(t)) ϕ′(t) dt = ∫_{ϕ(a)}^{ϕ(b)} f(x) dx.

Proof. Since differentiation is linear, (a) follows.
(b) Differentiating the right-hand side, we obtain

  d/dx (fg − ∫ fg′ dx) = f′g + fg′ − fg′ = f′g,

which proves the statement.

(c) By the chain rule, F(ϕ(x)) is differentiable with

  d/dx F(ϕ(x)) = F′(ϕ(x)) ϕ′(x) = f(ϕ(x)) ϕ′(x),

and (c) follows.

The statements about the Riemann integrals follow from the statements about antiderivatives using the fundamental theorem of calculus. For example, we show the second part of (c). By the above part, F(ϕ(t)) is an antiderivative of f(ϕ(t))ϕ′(t). By the FTC we have

  ∫_a^b f(ϕ(t)) ϕ′(t) dt = F(ϕ(t))|_a^b = F(ϕ(b)) − F(ϕ(a)).

On the other hand, again by the FTC,

  ∫_{ϕ(a)}^{ϕ(b)} f(x) dx = F(x)|_{ϕ(a)}^{ϕ(b)} = F(ϕ(b)) − F(ϕ(a)).

This completes the proof of (c).

Corollary 5.17 Suppose F is an antiderivative of f. Then

  ∫ f(ax + b) dx = (1/a) F(ax + b),  a ≠ 0;  (5.31)
  ∫ g′(x)/g(x) dx = log | g(x) |  (g differentiable and g(x) ≠ 0).  (5.32)

Example 5.4 (a) The antiderivative of a polynomial: if p(x) = Σ_{k=0}^n a_k x^k, then ∫ p(x) dx = Σ_{k=0}^n (a_k/(k + 1)) x^{k+1}.

(b) Put f(x) = e^x and g(x) = x; then f′(x) = e^x and g′(x) = 1, and we obtain

  ∫ x e^x dx = x e^x − ∫ 1 · e^x dx = e^x (x − 1).

(c) On I = (0, ∞): ∫ log x dx = ∫ 1 · log x dx = x log x − ∫ x · (1/x) dx = x log x − x.

(d)

  ∫ arctan x dx = ∫ 1 · arctan x dx = x arctan x − ∫ x/(1 + x²) dx
   = x arctan x − (1/2) ∫ (1 + x²)′/(1 + x²) dx = x arctan x − (1/2) log(1 + x²).

In the last equation we made use of (5.32).

(e) Recurrent computation of integrals. Let

  I_n := ∫ dx/(1 + x²)^n,  n ∈ N;  I₁ = arctan x.

Then

  I_n = ∫ ((1 + x²) − x²)/(1 + x²)^n dx = I_{n−1} − ∫ x² dx/(1 + x²)^n.

Put u = x and v′ = x/(1 + x²)^n. Then u′ = 1 and

  v = ∫ x dx/(1 + x²)^n = (1/2) · (1 + x²)^{1−n}/(1 − n).

Hence, integrating by parts,

  I_n = I_{n−1} − x(1 + x²)^{1−n}/(2(1 − n)) + (1/(2(1 − n))) ∫ (1 + x²)^{1−n} dx,
  I_n = x/((2n − 2)(1 + x²)^{n−1}) + ((2n − 3)/(2n − 2)) I_{n−1}.

In particular, I₂ = x/(2(1 + x²)) + (1/2) arctan x and I₃ = x/(4(1 + x²)²) + (3/4) I₂.
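The closed form for I₂ can be sanity-checked by differentiating it numerically; the sketch below (names invented) compares a central difference quotient of I₂ with the integrand 1/(1 + x²)²:

```python
import math

# I2(x) = x/(2(1+x^2)) + (1/2) arctan x, the antiderivative from Example 5.4 (e)
def I2(x):
    return x / (2 * (1 + x * x)) + 0.5 * math.atan(x)

x0, h = 0.7, 1e-5
diff_quot = (I2(x0 + h) - I2(x0 - h)) / (2 * h)   # numerical derivative of I2
integrand = 1.0 / (1 + x0 * x0) ** 2              # should match 1/(1+x^2)^2
```

The two values agree to about eight decimal places, as expected from the O(h²) error of a central difference.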
Proposition 5.18 (Mean Value Theorem of Integration) Let f, ϕ : [a, b] → R be continuous functions with ϕ ≥ 0. Then there exists ξ ∈ [a, b] such that

  ∫_a^b f(x) ϕ(x) dx = f(ξ) ∫_a^b ϕ(x) dx.  (5.33)

In particular, in the case ϕ = 1 we have

  ∫_a^b f(x) dx = f(ξ)(b − a)  for some ξ ∈ [a, b].

Proof. Put m = inf{f(x) | x ∈ [a, b]} and M = sup{f(x) | x ∈ [a, b]}. Since ϕ ≥ 0, we obtain

  m ϕ(x) ≤ f(x) ϕ(x) ≤ M ϕ(x).

By Proposition 5.9 (a) and (b) we have

  m ∫_a^b ϕ(x) dx ≤ ∫_a^b f(x) ϕ(x) dx ≤ M ∫_a^b ϕ(x) dx.

Hence there is a µ ∈ [m, M] such that

  ∫_a^b f(x) ϕ(x) dx = µ ∫_a^b ϕ(x) dx.

Since f is continuous on [a, b], the intermediate value theorem (Theorem 3.5) ensures that there is a ξ with µ = f(ξ). The claim follows.

Example 5.5 The trapezoid rule. Let f : [0, 1] → R be twice continuously differentiable. Then there exists ξ ∈ [0, 1] such that

  ∫₀¹ f(x) dx = (1/2)(f(0) + f(1)) − (1/12) f′′(ξ).  (5.34)

Proof. Let ϕ(x) = (1/2)x(1 − x), so that ϕ(x) ≥ 0 for x ∈ [0, 1], ϕ′(x) = 1/2 − x, and ϕ′′(x) = −1. Using integration by parts twice, as well as Proposition 5.18, we find

  ∫₀¹ f(x) dx = − ∫₀¹ ϕ′′(x) f(x) dx = −ϕ′(x) f(x)|₀¹ + ∫₀¹ ϕ′(x) f′(x) dx
   = (1/2)(f(0) + f(1)) + ϕ(x) f′(x)|₀¹ − ∫₀¹ ϕ(x) f′′(x) dx
   = (1/2)(f(0) + f(1)) − f′′(ξ) ∫₀¹ ϕ(x) dx
   = (1/2)(f(0) + f(1)) − (1/12) f′′(ξ).

Indeed, ∫₀¹ ((1/2)x − (1/2)x²) dx = ((1/4)x² − (1/6)x³)|₀¹ = 1/4 − 1/6 = 1/12.

5.2.3 Integration of Rational Functions

We will give a useful method to compute antiderivatives of an arbitrary rational function. Consider a rational function p/q where p and q are polynomials. We will assume that deg p < deg q; otherwise we can express p/q as a polynomial plus a rational function of this form, for example

  x²/(x − 1) = x + 1 + 1/(x − 1).

Polynomials

We need some preliminary facts on polynomials, which are stated here without proof. Recall that a non-zero constant polynomial has degree zero, deg c = 0 if c ≠ 0. By definition, the zero polynomial has degree −∞, deg 0 = −∞.
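Long division of polynomials, used above to split p/q into a polynomial plus a proper rational part, is straightforward to implement. A sketch (the helper name and the coefficient-list convention, lowest degree first, are invented):

```python
# Divide p by q (coefficient lists [a_0, a_1, ...]); returns (s, r) with
# p = q*s + r and deg r < deg q, as in polynomial long division.
def polydiv(p, q):
    p = p[:]                                    # working copy; reduced in place
    s = [0.0] * max(len(p) - len(q) + 1, 1)
    for k in range(len(p) - len(q), -1, -1):
        coeff = p[k + len(q) - 1] / q[-1]       # eliminate the current top term
        s[k] = coeff
        for j in range(len(q)):
            p[k + j] -= coeff * q[j]
    return s, p[:len(q) - 1]

# x^2 divided by (x - 1): quotient x + 1, remainder 1
s, r = polydiv([0.0, 0.0, 1.0], [-1.0, 1.0])
```

Here s and r represent x + 1 and 1, recovering x²/(x − 1) = x + 1 + 1/(x − 1).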
Theorem 5.19 (Fundamental Theorem of Algebra) Every polynomial p of positive degree with complex coefficients has a complex root, i.e. there exists a complex number z such that p(z) = 0.

Lemma 5.20 (Long Division) Let p and q be polynomials with q ≠ 0. Then there exist unique polynomials s and r such that

  p = qs + r,  deg r < deg q.

Lemma 5.21 Let p be a complex polynomial of degree n ≥ 1 with leading coefficient a_n. Then there exist n uniquely determined numbers z₁, …, z_n (which may be equal) such that

  p(z) = a_n (z − z₁)(z − z₂) ⋯ (z − z_n).

Proof. We use induction on n and the two preceding statements. In the case n = 1 the linear polynomial p(z) = az + b can be written in the desired form

  p(z) = a(z − (−b/a))

with the unique root z₁ = −b/a. Suppose the statement is true for all polynomials of degree n − 1; we will show it for polynomials of degree n. Let z_n be a complex root of p, which exists by Theorem 5.19: p(z_n) = 0. Using long division of p by the linear polynomial q(z) = z − z_n, we obtain a quotient polynomial p₁(z) and a remainder polynomial r(z) of degree 0 (a constant polynomial) such that

  p(z) = (z − z_n) p₁(z) + r(z).

Inserting z = z_n gives p(z_n) = 0 = r(z_n); hence the constant r vanishes, and we have

  p(z) = (z − z_n) p₁(z)

with a polynomial p₁(z) of degree n − 1. Applying the induction hypothesis to p₁, the statement follows.

A root α of p is said to be a root of multiplicity k, k ∈ N, if α appears exactly k times among the zeros z₁, z₂, …, z_n. In that case (z − α)^k divides p(z) but (z − α)^{k+1} does not.

If p is a real polynomial, i.e. a polynomial with real coefficients, and α is a root of multiplicity k of p, then the complex conjugate ᾱ is also a root of multiplicity k of p. Indeed, since p has real coefficients, conjugating p(z) = (z − α)^k q(z) gives

  p(z̄) = (z̄ − ᾱ)^k q̄(z̄).
Substituting z := z̄ then yields p(z) = (z − ᾱ)^k q̄(z), so that ᾱ is indeed a root of multiplicity k.

Note that the product of the two complex linear factors z − α and z − ᾱ yields a real quadratic factor:

  (z − α)(z − ᾱ) = z² − (α + ᾱ)z + αᾱ = z² − 2(Re α)z + | α |².

Using this fact, the real version of Lemma 5.21 is as follows.

Lemma 5.22 Let q be a real polynomial of degree n with leading coefficient a_n. Then there exist real numbers α_i, β_j, γ_j and multiplicities r_i, s_j ∈ N, i = 1, …, k, j = 1, …, l, such that

  q(x) = a_n ∏_{i=1}^k (x − α_i)^{r_i} ∏_{j=1}^l (x² − 2β_j x + γ_j)^{s_j}.

We assume that the quadratic factors cannot be factored further; this means

  β_j² − γ_j < 0,  j = 1, …, l.

Of course, deg q = Σ_i r_i + Σ_j 2s_j = n.

Example 5.6 (a) x⁴ − 4 = (x² + 2)(x² − 2) = (x − √2)(x + √2)(x − i√2)(x + i√2) = (x − √2)(x + √2)(x² + 2).

(b) x³ + x − 2. One can guess the first zero, x₁ = 1. Long division gives x³ + x − 2 = (x − 1)(x² + x + 2):

  x³ + 0x² + x − 2 : (x − 1) = x² + x + 2
  −(x³ −  x²)
         x² + x − 2
       −(x² − x)
             2x − 2
           −(2x − 2)
                  0

There are no further real zeros, since x² + x + 2 has none.

5.2.4 Partial Fraction Decomposition

Proposition 5.23 Let p(x) and q(x) be real polynomials with deg p < deg q. Then there exist real numbers A_{ir}, B_{js}, and C_{js} such that

  p(x)/q(x) = Σ_{i=1}^k Σ_{r=1}^{r_i} A_{ir}/(x − α_i)^r + Σ_{j=1}^l Σ_{s=1}^{s_j} (B_{js}x + C_{js})/(x² − 2β_j x + γ_j)^s,  (5.35)

where the α_i, β_j, γ_j, r_i, and s_j have the same meaning as in Lemma 5.22.

Example 5.7 (a) Compute ∫ f(x) dx = ∫ x⁴/(x³ − 1) dx. We use long division to obtain a rational function p/q with deg p < deg q:

  f(x) = x + x/(x³ − 1).

To obtain the partial fraction decomposition (PFD), we need the factorization of the denominator polynomial q(x) = x³ − 1. One can guess the first real zero x₁ = 1 and divide q by x − 1; this gives q(x) = (x − 1)(x² + x + 1). The PFD then reads

  x/(x³ − 1) = a/(x − 1) + (bx + c)/(x² + x + 1).

We have to determine a, b, and c. Multiplication by x³ − 1 gives

  0·x² + 1·x + 0 = a(x² + x + 1) + (bx + c)(x − 1) = (a + b)x² + (a − b + c)x + a − c.
The two polynomials on the left and on the right must coincide, that is, their coefficients must be equal:

  0 = a + b,  1 = a − b + c,  0 = a − c,

which gives a = −b = c = 1/3. Hence

  x/(x³ − 1) = (1/3) · 1/(x − 1) − (1/3) · (x − 1)/(x² + x + 1).

We can integrate the first two terms, but we have to rewrite the last one:

  (x − 1)/(x² + x + 1) = (1/2) · (2x + 1)/(x² + x + 1) − (3/2) · 1/((x + 1/2)² + 3/4).

Recall that

  ∫ (2x − 2β)/(x² − 2βx + γ) dx = log | x² − 2βx + γ |,
  ∫ dx/((x + b)² + a²) = (1/a) arctan((x + b)/a).

Therefore,

  ∫ x⁴/(x³ − 1) dx = x²/2 + (1/3) log | x − 1 | − (1/6) log(x² + x + 1) + (1/√3) arctan((2x + 1)/√3).

(b) If q(x) = (x − 1)³(x + 2)(x² + 2)²(x² + 1) and p(x) is any polynomial with deg p < deg q = 10, then the partial fraction decomposition reads

  p(x)/q(x) = A₁₁/(x − 1) + A₁₂/(x − 1)² + A₁₃/(x − 1)³ + A₂₁/(x + 2)
   + (B₁₁x + C₁₁)/(x² + 2) + (B₁₂x + C₁₂)/(x² + 2)² + (B₂₁x + C₂₁)/(x² + 1).  (5.36)

Suppose now that p(x) ≡ 1. One can immediately compute A₁₃ and A₂₁. Multiplying (5.36) by (x − 1)³ yields

  1/((x + 2)(x² + 2)²(x² + 1)) = A₁₃ + (x − 1) p₁(x)

with a rational function p₁ not having x − 1 in the denominator. Inserting x = 1 gives A₁₃ = 1/(3 · 3² · 2) = 1/54. Similarly,

  A₂₁ = 1/((x − 1)³(x² + 2)²(x² + 1)) |_{x=−2} = 1/((−3)³ · 6² · 5) = −1/4860.

5.2.5 Other Classes of Elementary Integrable Functions

An elementary function is a composition of rational, exponential, and trigonometric functions and their inverse functions, for example

  f(x) = e^{sin(√(x−1))}/(x + log x).

A function is called elementary integrable if it has an elementary antiderivative. Rational functions are elementary integrable. "Most" functions are not elementary integrable, for example

  e^{−x²},  e^x/x,  1/log x,  sin x / x.

They define "new" functions:

  W(x) := ∫₀^x e^{−t²/2} dt  (Gaussian integral),
  li(x) := ∫₀^x dt/log t  (integral logarithm),
  F(ϕ, k) := ∫₀^ϕ dx/√(1 − k² sin² x)  (elliptic integral of the first kind),
  E(ϕ, k) := ∫₀^ϕ √(1 − k² sin² x) dx  (elliptic integral of the second kind).
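Although these functions are not elementary, they are easy to evaluate numerically. A sketch for the elliptic integral E(ϕ, k), where the quadrature choice (a composite midpoint sum) is an assumption of convenience:

```python
import math

# E(phi, k) = int_0^phi sqrt(1 - k^2 sin^2 x) dx, via a composite midpoint sum
def E(phi, k, n=10000):
    h = phi / n
    return sum(math.sqrt(1.0 - (k * math.sin((i + 0.5) * h)) ** 2) * h
               for i in range(n))

quarter = E(math.pi / 2, 0.0)   # k = 0: the integrand is 1, so E = pi/2
```

For k = 0 the integrand is identically 1 and E(π/2, 0) = π/2; for k = 1 it degenerates to ∫₀^{π/2} cos x dx = 1, which the same routine reproduces.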
0 R R(cos x, sin x) dx Let R(u, v) be a rational function in two variables u and v, that is R(u, v) = x mials p and q in two variables. We substitute t = tan . Then 2 sin x = Hence Z 2t , 1 + t2 R(cos x, sin x) dx = Z cos x = R 1 − t2 , 1 + t2 1 − t2 2t , 1 + t2 1 + t2 dx = 2dt . 1 + t2 2dt = 1 + t2 with another rational function R1 (t). There are 3 special cases where another substitution is apropriate. p(u,v) q(u,v) Z R1 (t) dt with polino- 5 Integration 142 (a) R(−u, v) = −R(u, v), R is odd in u. Substitute t = sin x. (b) R(u, −v) = −R(u, v), R is odd in v. Substitute t = cos x. (c) R(−u, −v) = R(u, v). Substitute t = tan x. Example 5.8 (1) Z sin3 x dx. Here, R(u, v) = v 3 is an odd function in v, such that (b) applies; t = cos x, dt = − sin x dx, sin2 x = 1 − cos2 x = 1 − t2 . This yields Z Z Z t3 3 2 sin x dx = − sin x · (− sin x dx) = − (1 − t2 ) dt = −t + + const. 3 3 cos + const. . = − cos x + 3 R (2) tan x dx. Here, R(u, v) = uv . All (a), (b), and (c) apply to this situation. For example, let t = sin x. Then cos2 x = 1 − t2 , dt = cos x dx and Z Z Z Z 1 sin x · cos x dx d(1 − t2 ) t dt = − = tan x dx = = cos2 x 1 − t2 2 1 − t2 1 = − log(1 − t2 ) = − log | cos x | . 2 R R(x, √ n ax + b) dx The substitution t= √ n ax + b yields x = (tn − b)/a, dx = ntn−1 dt/a, and therefore n Z Z √ t −b n n R , t tn−1 dt. R(x, ax + b) dx = a a R R(x, √ ax2 + 2bx + c) dx Using the method of complete squares the above integral can be written in one of the three basic forms Z Z Z √ √ √ 2 2 R(t, t − 1) dt, R(t, 1 − t2 ) dt. R(t, t + 1) dt, Further substitutions t = sinh u, t = ± cosh u, t = ± cos u, √ √ √ t2 + 1 = cosh u, dt = cosh u du, t2 − 1 = sinh u, dt = ± sinh u du, 1 − t2 = sin u, reduce the integral to already known integrals. dt = ∓ sin u du 5.3 Improper Integrals 143 Z √ dx . Hint: t = x2 + 6x + 5 − x. x2 + 6x + 5 2 2 2 Then (x + t) = x + 2tx + t = x2 + 6x + 5 such that t2 + 2tx = 6x + 5 and therefore x = and 2t(6 − 2t) + 2(t2 − 5) −2t2 + 12t − 10 dx = dt = dt. 
(6 − 2t)2 (6 − 2t)2 Example 5.9 Compute I = Hence, using t + x = t + √ t2 −5 6−2t = t2 −5 6−2t −t2 +6t−5 , 6−2t Z Z (−2t2 + 12t − 10) dt 1 2(6 − 2t)(−t2 + 6t − 5) I= = dt (6 − 2t)2 t+x (−t2 + 6t − 5)(6 − 2t)2 Z √ dt = − log | 6 − 2t | + const. = log 6 − 2 x2 + 6x + 5 + 2x + const. =2 6 − 2t 5.3 Improper Integrals The notion of the Riemann integral defined so far is apparently too tight for some applications: we can integrate only over finite intervals and the functions are necessarily bounded. If the integration interval is unbounded or the function to integrate is unbounded we speak about improper integrals. We consider three cases: one limit of the integral is infinite; the function is not defined at one of the end points a or b of the interval; both a and b are critical points (either infinity or the function is not defined there). 5.3.1 Integrals on unbounded intervals Definition 5.5 Suppose f ∈ R on [a, b] for all b > a where a is fixed. Define Z ∞ f (x) dx = lim b→+∞ a Z b f (x) dx (5.37) a if this limit exists (and is finite). In that case, we say that the integral on the left converges. If it also converges if f has been replaced by | f |, it is said to converge absolutely. If an integral converges absolutely, then it converges, see Example 5.11 below, where Z ∞ Z ∞ f dx ≤ | f | dx. a Similarly, one defines Z a b f (x) dx. Moreover, −∞ Z ∞ f dx := −∞ if both integrals on the right side converge. Z 0 −∞ f dx + Z ∞ 0 f dx 5 Integration 144 Z Example 5.10 (a) The integral Indeed, Z 1 Since R 1 1 R→+∞ Rs−1 Z ∞ 0 (b) Hence Z Z R 0 ∞ dx converges for s > 1 and diverges for 0 < s ≤ 1. xs R dx 1 1 1 1 1 − s−1 . · = = xs 1 − s xs−1 1 s−1 R lim it follows that ∞ e−x dx = 1. = ( 0, +∞, dx 1 , = s x s−1 if s > 1, if 0 < s < 1, if s > 1. R 1 e−x dx = −e−x 0 = 1 − R . e 0 Proposition 5.24 (Cauchy criterion) The improper integral Z ∞ f dx converges if and only if a for every ε > 0 there exists some b > a such that for all c, d > b Z d < ε. f dx c Proof. 
The following Cauchy criterion for limits of functions is easily proved using sequences: The limit lim F (x) exists if and only if x→∞ ∀ ε > 0 ∃ R > 0 ∀ x, y > R : | F (x) − F (y) | < ε. (5.38) Indeed, suppose that (xn ) is any sequence converging to +∞ as n → ∞. We will show that (F (xn )) converges if (5.38) is satisfied. Let ε > 0. By assumption, there exists R > 0 with the above property. Since xn −→ +∞ there exists n0 ∈ N such that n ≥ n0 implies n→∞ xn > R. Hence, | F (xn ) − F (xm ) | < ε as m, n ≥ n0 . Thus, (F (xn )) is a Cauchy sequence and therefore convergent. This proves one direction of the above criterion. The inverse direction is even simpler: Suppose that lim F (x) = A exists (and is finite!). We will show that the above x→+∞ criterion is satisfied.Let ε > 0. By definition of the limit there exists R > 0 such that x, y > R imply | F (x) − A | < ε/2 and | F (y) − A | < ε/2. By the triangle inequality, | F (x) − F (y) | = | F (x) − A − (F (y) − A) | ≤ | F (x) − A | + | F (y) − A | < ε ε + = ε, 2 2 as x, y > R which completes the proof of this Cauchy criterion. R Rt d Applying this criterion to the function F (t) = a f dx noting that | F (d) − F (c) | = c f dx , the limit limt→∞ F (t) exists. 5.3 Improper Integrals 145 R∞ R∞ Example 5.11 (a) If a f dx converges absolutely, then a f dx converges. Indeed, let ε > 0 R∞ and a | f | dx converges. By the Cauchy Criterion for the later integral and by the triangle inequality, Proposition 5.10, there exists b > 0 such that for all c, d > b Z d Z d f dx ≤ | f | dx < ε. (5.39) c c R∞ Hence, the Cauchy criterion is satisfied for f if it holds for | f |. Thus, a f dx converges. R∞ (b) 1 sinx x dx. Partial integration with u = x1 and v ′ = sin x yields u′ = − x12 , v = − cos x and d Z d Z d 1 sin x cos x dx = − cos x − dx x x x2 c c c Z d Z d 1 1 sin x dx ≤ − cos d + cos c + dx d 2 x c c c x 1 1 1 1 1 1 <ε + ≤ + + − ≤2 c d d c c d R∞ if c and d are sufficiently large. Hence, 1 sinx x dx converges. 
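The convergence just established can also be observed numerically. Here is a rough stdlib-only sketch (the trapezoidal step count and the sampled endpoints are ad-hoc choices) that approximates $\int_1^b \frac{\sin x}{x}\,dx$ for several $b$; the increments over a late full period are far smaller than the early ones, as the partial-integration bound predicts:

```python
import math

def integral_sinc(b, n=100000):
    """Trapezoidal approximation of the integral of sin(x)/x over [1, b]."""
    h = (b - 1.0) / n
    s = 0.5 * (math.sin(1.0) + math.sin(b) / b)
    for k in range(1, n):
        x = 1.0 + k * h
        s += math.sin(x) / x
    return s * h

# Endpoints 1, 2, 50 and 51 full periods past x = 1:
bs = [1 + m * 2 * math.pi for m in (1, 2, 50, 51)]
vals = [integral_sinc(b) for b in bs]
print(vals)
```

The difference between the two late values is much smaller than between the two early ones, consistent with the Cauchy criterion.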
The integral does not converge absolutely. For non-negative integers n ∈ Z+ we have Z (n+1)π Z (n+1)π sin x 1 2 dx ≥ ; | sin x | dx = x (n + 1)π nπ (n + 1)π nπ hence n X sin x 1 dx ≥ 2 . x π k=1 k + 1 1 R∞ Since the harmonic series diverges, so does the integral π sinx x dx. R∞ Proposition 5.25 Suppose f ∈ R is nonnegative, f ≥ 0. Then a f dx converges if there exists C > 0 such that Z b f dx < C, for all b > a. Z (n+1)π a The proof is similar to the proof of Lemma 2.19 (c); we omit it. Analogous propositions are true Ra for integrals −∞ f dx. Proposition 5.26 (Integral criterion for series) Assume that f ∈ R is nonnegative f ≥ 0 and R∞ P decreasing on [1, +∞). Then 1 f dx converges if and only if the series ∞ n=1 f (n) converges. Proof. Since f (n) ≤ f (x) ≤ f (n − 1) for n − 1 ≤ x ≤ n, Z n f dx ≤ f (n − 1). f (n) ≤ n−1 Summation over n = 2, 3, . . . , N yields N X n=2 f (n) ≤ Z N 1 f dx ≤ N −1 X n=1 f (n). 5 Integration 146 R∞ P If 1 f dx converges the series ∞ and therefore convergent. n=1 f (n) is bounded RR P∞ P∞ Conversely, if n=1 f (n) converges, the integral 1 f dx ≤ n=1 f (n) is bounded as R → ∞, hence convergent by Proposition 5.25. R ∞ dx P∞ 1 Example 5.12 n=2 n(log n)α converges if and only if 2 x(log x)α converges. The substitution gives y = log x, dy = dx x Z ∞ Z ∞ dx dy = α α x(log x) 2 log 2 y which converges if and only if α > 1 (see Example 5.10). 5.3.2 Integrals of Unbounded Functions Definition 5.6 Suppose f is a real function on [a, b) and f ∈ R on [a, t] for every t, a < t < b. Define Z Z b t f dx = lim f dx t→b−0 a a if the limit on the right exists. Similarly, one defines Z b f dx = lim t→a+0 a Z b f dx t if f is unbounded at a and integrable on [t, b] for all t with a < t < b. Rb In both cases we say that a f dx converges. Example 5.13 (a) Z 1 Z t dx dx π √ √ = lim = lim arcsin x|t0 = lim arcsin t = arcsin 1 = . 
$\dfrac{\pi}{2}$.
(b)
$$\int_0^1 \frac{dx}{x^\alpha} = \lim_{t\to 0+0}\int_t^1 \frac{dx}{x^\alpha} = \lim_{t\to 0+0}\begin{cases} \left.\frac{1}{1-\alpha}\,x^{1-\alpha}\right|_t^1, & \alpha \ne 1,\\[2pt] \left.\log x\right|_t^1, & \alpha = 1,\end{cases} \;=\; \begin{cases} \frac{1}{1-\alpha}, & \alpha < 1,\\ +\infty, & \alpha \ge 1.\end{cases}$$

Remarks 5.4 (a) The analogous statements to Proposition 5.24 and Proposition 5.25 are true for improper integrals $\int_a^b f\,dx$. For example, $\int_0^1 \frac{dx}{x(1-x)}$ diverges since both improper integrals $\int_0^{1/2} f\,dx$ and $\int_{1/2}^1 f\,dx$ diverge; $\int_0^1 \frac{dx}{\sqrt{x}\,(1-x)}$ diverges since it diverges at $x = 1$; finally, $I = \int_0^1 \frac{dx}{\sqrt{x(1-x)}}$ converges. Indeed, the substitution $x = \sin^2 t$ gives $I = \pi$.
(b) If $f$ is unbounded both at $a$ and at $b$ we define the improper integral
$$\int_a^b f\,dx = \int_a^c f\,dx + \int_c^b f\,dx$$
if $c$ is between $a$ and $b$ and both improper integrals on the right side exist.
(c) Also, if $f$ is unbounded at $a$ define
$$\int_a^\infty f\,dx = \int_a^b f\,dx + \int_b^\infty f\,dx$$
if the two improper integrals on the right side exist.
(d) If $f$ is unbounded in the interior of the interval $[a,b]$, say at $c$, we define the improper integral
$$\int_a^b f\,dx = \int_a^c f\,dx + \int_c^b f\,dx$$
if the two improper integrals on the right side exist. For example,
$$\int_{-1}^1 \frac{dx}{\sqrt{|x|}} = \int_{-1}^0 \frac{dx}{\sqrt{|x|}} + \int_0^1 \frac{dx}{\sqrt{|x|}} = \lim_{t\to 0-0}\int_{-1}^t \frac{dx}{\sqrt{|x|}} + \lim_{t\to 0+0}\int_t^1 \frac{dx}{\sqrt{|x|}} = \lim_{t\to 0-0}\left.\left(-2\sqrt{-x}\right)\right|_{-1}^t + \lim_{t\to 0+0}\left.2\sqrt{x}\right|_t^1 = 4.$$

5.3.3 The Gamma function

For $x > 0$ set
$$\Gamma(x) = \int_0^\infty t^{x-1} e^{-t}\,dt. \qquad (5.40)$$
By Example 5.13, $\Gamma_1(x) = \int_0^1 t^{x-1} e^{-t}\,dt$ converges since for every $t > 0$,
$$t^{x-1} e^{-t} \le \frac{1}{t^{1-x}}.$$
By Example 5.10, $\Gamma_2(x) = \int_1^\infty t^{x-1} e^{-t}\,dt$ converges since for every $t \ge t_0$,
$$t^{x-1} e^{-t} \le \frac{1}{t^2}.$$
Note that $\lim_{t\to\infty} t^{x+1} e^{-t} = 0$ by Proposition 3.11. Hence, $\Gamma(x)$ is defined for every $x > 0$.

Proposition 5.27 For every positive $x$,
$$x\,\Gamma(x) = \Gamma(x+1). \qquad (5.41)$$
In particular, for $n \in \mathbb{N}$ we have $\Gamma(n+1) = n!$.

Proof. Using integration by parts,
$$\int_\varepsilon^R t^x e^{-t}\,dt = \left.-t^x e^{-t}\right|_\varepsilon^R + x\int_\varepsilon^R t^{x-1} e^{-t}\,dt.$$
Taking the limits $\varepsilon \to 0+0$ and $R \to +\infty$ one has $\Gamma(x+1) = x\,\Gamma(x)$.
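The functional equation (5.41) and the value $\Gamma(n+1) = n!$ can be checked against Python's built-in Gamma function, and the defining integral (5.40) can be approximated by a crude quadrature (a sketch; the cutoff `upper` and the step count are ad-hoc choices):

```python
import math

def gamma_quad(x, upper=60.0, n=60000):
    """Trapezoidal approximation of the integral (5.40) for x > 1;
    the integrand t**(x-1) * exp(-t) vanishes at t = 0 and is negligible at the cutoff."""
    h = upper / n
    s = 0.5 * upper**(x - 1) * math.exp(-upper)   # endpoint terms; value at t = 0 is 0 for x > 1
    for k in range(1, n):
        t = k * h
        s += t**(x - 1) * math.exp(-t)
    return s * h

# Gamma(n + 1) = n! and the recursion x * Gamma(x) = Gamma(x + 1):
for n in range(1, 8):
    assert math.isclose(math.gamma(n + 1), math.factorial(n))
for x in (0.5, 1.3, 4.75):
    assert math.isclose(x * math.gamma(x), math.gamma(x + 1))

print(gamma_quad(3.0))   # close to Gamma(3) = 2! = 2
```

The quadrature agrees with `math.gamma` to several digits, illustrating that the improper integral really does converge for $x > 1$.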
Since by Example 5.10 Z ∞ Γ(1) = e−t dt = 1, 0 5 Integration 148 it follows from (5.41) that Γ(n + 1) = nΓ(n) = · · · = n(n − 1)(n − 2) · · · Γ(1) = n! The Gamma function interpolates the factorial function n! which is defined only for positive integers n. However, this property alone is not sufficient for a complete characterization of the Gamma function. We need another property. This will be done more in detail in the appendix to this chapter. 5.4 Integration of Vector-Valued Functions A mapping γ : [a, b] → Rk , γ(t) = (γ1 (t), . . . , γk (t)) is said to be continuous if all the mappings γi , i = 1, . . . , k, are continuous. Moreover, if all the γi are differentiable, we write γ ′ (t) = (γ1′ (t), . . . , γk′ (t)). Definition 5.7 Let f1 , . . . , fk be real functions on [a, b] and let f = (f1 , . . . , fk ) be the correponding mapping from [a, b] into Rk . If α increases on [a, b], to say that f ∈ R(α) means that fj ∈ R(α) for j = 1, . . . , k. In this case we define Z b Z b Z b f dα = f1 dα, . . . , fk dα . a a a Rb Rb In other words a f dα is the point in Rk whose jth coordinate is a fj dα. It is clear that parts (a), (c), and (e) of Proposition 5.9 are valid for these vector valued integrals; we simply apply the earlier results to each coordinate. The same is true for Proposition 5.12, Theorem 5.14, and Theorem 5.15. To illustrate this, we state the analog of the fundamental theorem of calculus. Theorem 5.28 If f = (f1 , . . . , fk ) ∈ R on [a, b] and if F = (F1 , . . . , Fk ) is an antiderivative of f on [a, b], then Z b f (x) dx = F (b) − F (a). a The analog of Proposition 5.10 (b) offers some new features. Let x = (x1 , . . . , xk ) ∈ Rk be any p vector in Rk . We denote its Euclidean norm by kxk = x21 + · · · + x2k . Proposition 5.29 If f = (f1 , . . . , fk ) ∈ R(α) on [a, b] then kf k ∈ R(α) and Z b Z b f dα kf k dα. ≤ a (5.42) a Proof. By the definition of the norm, kf k = f12 + f22 + · · · + fk2 12 . 
By Proposition 5.10 (a) each of the functions fi2 belong to R(α); hence so does their sum f12 + f22 + · · · + fk2 . Note that the square-root is a continuous function on the positive half line. If we 5.4 Integration of Vector-Valued Functions 149 apply Proposition 5.8 we see kf k ∈ R(α). Rb Rb To prove (5.42), put y = (y1 , . . . , yk ) with yj = a fj dα. Then we have y = a f dα, and 2 kyk = k X yj2 = j=1 k X yj Z Z bX k b fj dα = a j=1 a (yj fj ) dα. j=1 By the Cauchy–Schwarz inequality, Proposition 1.25, k X j=1 yj fj (t) ≤ kyk kf (t)k , t ∈ [a, b]. Inserting this into the preceding equation, the monotony of the integral gives 2 kyk ≤ kyk Z a b kf k dα. If y = 0, (5.42) is trivial. If y 6= 0, division by kyk gives (5.42). Integration of Complex Valued Functions This is a special case of the above arguments with k = 2. Let ϕ : [a, b] → C be a complexvalued function. Let u, v : [a, b] → R be the real and imaginary parts of ϕ, respectively; u = Re ϕ and v = Im ϕ. The function ϕ = u + iv is said to be integrable if u, v ∈ R on [a, b] and we set Z b ϕ dx = a Z b u dx + i a Z b v dx. a The fundamental theorem of calculus holds: If the complex function ϕ is Riemann integrable, ϕ ∈ R on [a, b] and F (x) is an antiderivative of ϕ, then Z a b ϕ(x) dx = F (b) − F (a). Rx Similarly, if u and v are both continuous, F (x) = a ϕ(t) dt is an antiderivative of ϕ(x). Proof. Let F = U +iV be the antiderivative of ϕ where U ′ = u and V ′ = v. By the fundamental theorem of calculus Z b Z b Z b ϕ dx = u dx + i v dx = U(b) − U(a) + i (V (b) − V (a)) = F (b) − F (a). a Example: a a Z a b b 1 αt e dt = e , α a αt α ∈ C. 5 Integration 150 5.5 Inequalities R Rb b Besides the triangle inequality a f dα ≤ a | f | dα which was shown in Proposition 5.10 we can formulate Hölder’s, Minkowski’s, and the Cauchy–Schwarz inequalities for Riemann– Stieltjes integrals. For, let p > 0 be a fixed positive real number and α an increasing function on [a, b]. 
For f ∈ R(α) define the Lp -norm kf kp = Z b a p | f | dα p1 . (5.43) Cauchy–Schwarz Inequality Proposition 5.30 Let f, g : [a, b] → C be complex valued functions and f, g ∈ R on [a, b]. Then Z b 2 Z b Z b 2 | f g | dx ≤ | f | dx · | g |2 dx. (5.44) a a a R R R Proof. Letting f = | f | and g = | g |, it suffices to show ( f g dx)2 ≤ f 2 dx · g 2 dx. For, Rb Rb Rb put A = a g 2 dx, B = a f g dx, and C = a f 2 dx. Let λ ∈ C be arbitrary. By the positivity and linearity of the integal, Z b Z b Z b Z b 2 2 2 (f + λg) dx = f dx + 2λ f g dx + λ g 2 dx = C + 2Bλ + Aλ2 =: h(λ). 0≤ a a a a Thus, h is non-negative for all complex values λ. Case 1. A = 0. Inserting this, we get 2Bλ + C ≥ 0 for all λ ∈ C. This implies B = 0 and C ≥ 0; the inequality is satisfied. Case 2. A > 0. Dividing the above inequality by A, we have 2 2 C B 2B B C 2 0≤λ + λ+ ≤ λ+ − + . A A A A A This is satisfied for all λ if and only if 2 B C ≤ A A and, finally B 2 ≤ AC. This completes the proof. Proposition 5.31 (a) Cauchy–Schwarz inequality. Suppose f, g ∈ R(α), then s s Z b Z b Z b Z b 2 f g dα ≤ | f g | dα ≤ | f | dα | g |2 dα or a a a a Z b | f g | dα ≤ kf k2 kgk2 . a (5.45) (5.46) 5.6 Appendix D 151 1 1 (b) Hölder’s inequality. Let p and q be positive real numbers such that + = 1. If f, g ∈ R(α), p q then Z b Z b ≤ f g dα | f g | dα ≤ kf kp kgkq . (5.47) a a (c) Minkowski’s inequality. Let p ≥ 1 and f, g ∈ R(α), then kf + gkp ≤ kf kp + kgkp . (5.48) 5.6 Appendix D The composition of an integrable and a continuous function is integrable Proof of Proposition 5.8. Let ε > 0. Since ϕ is uniformly continuous on [m, M], there exists δ > 0 such that δ < ε and | ϕ(s) − ϕ(t) | < ε if | s − t | < δ and [s, t ∈ [m, M]. Since f ∈ R(α), there exists a partition P = {x0 , x1 , . . . , xn } of [a, b] such that U(P, f, α) − L(P, f, α) < δ 2 . (5.49) Let Mi and mi have the same meaning as in Definition 5.1, and let Mi∗ and m∗i the analogous numbers for h. Divide the numbers 1, 2, . . . 
, n into two classes: i ∈ A if Mi − mi < δ and i ∈ B if Mi − mi > δ. For i ∈ A our choice of δ shows that Mi∗ − m∗i ≤ ε. For i ∈ B, Mi∗ − m∗i ≤ 2K where K = sup{| ϕ(t) | | m ≤ t ≤ M}. By (5.49), we have X X ∆αi ≤ (Mi − mi )∆αi < δ 2 (5.50) δ so that P i∈B i∈B i∈B ∆αi < δ. It follows that U(P, h, α) − L(P, h, α) = X i∈A (Mi∗ − m∗i )∆αi + X i∈B (Mi∗ − m∗i )∆αi ≤ ε(α(b) − α(a)) + 2Kδ < ε(α(b) − α(a) + 2K). Since ε was arbitrary, Proposition 5.3 implies that h ∈ R(α). Convex Functions are Continuous Proposition 5.32 Every convex function f : (a, b) → R, −∞ ≤ a < b ≤ +∞, is continuous. Proof. There is a very nice geometric proof in Rudin’s book “Real and Complex Analysis”, see [Rud66, 3.2 Theorem]. We give another proof here. Let x ∈ (a, b); choose a finite subinterval (x1 , x2 ) with a < x1 < x < x2 < b. Since f (x) ≤ λf (x1 ) + (1 − λ)f (x2 ), λ ∈ [0, 1], f is bounded above on [x1 , x2 ]. Chosing x3 with x1 < x3 < x the convexity of f implies f (x3 ) − f (x1 ) f (x) − f (x1 ) f (x3 ) − f (x1 ) ≤ =⇒ f (x) ≥ (x − x1 ). x3 − x1 x − x1 x3 − x1 5 Integration 152 This means that f is bounded below on [x3 , x2 ] by a linear function; hence f is bounded on [x3 , x2 ], say | f (x) | ≤ C on [x3 , x2 ]. The convexity implies 1 1 1 (x + h) + (x − h) ≤ (f (x + h) + f (x − h)) f 2 2 2 =⇒ f (x) − f (x − h) ≤ f (x + h) − f (x). Iteration yields f (x − (ν − 1)h) − f (x − νh) ≤ f (x + h) − f (x) ≤ f (x + νh) − f (x + (ν − 1)h). Summing up over ν = 1, . . . , n we have f (x) − f (x − nh) ≤ n (f (x + h) − f (x)) ≤ f (x + nh) − f (x) 1 1 =⇒ (f (x) − f (x − nh)) ≤ f (x + h) − f (x) ≤ (f (x + nh) − f (x)) . n n Let ε > 0 be given; choose n ∈ N such that 2C/n < ε and choose h such that x3 < x − nh < x < x + nh < x2 . The above inequality then implies | f (x + h) − f (x) | ≤ 2C < ε. n This shows continuity of f at x. 
If g is an increasing convex function and f is a convex function, then g ◦f is convex since f (λx + µy) ≤ λf (x) + µf (y), λ + µ = 1, λ, µ ≥ 0, implies g(f (λx + µy)) ≤ g(λf (x) + µg(x)) ≤ λg(f (x)) + µg(f (y)). 5.6.1 More on the Gamma Function Let I ⊂ R be an interval. A positive function F : I → R is called logarithmic convex if log F : I → R is convex, i. e. for every x, y ∈ I and every λ, 0 ≤ λ ≤ 1 we have F (λx + (1 − λ)y) ≤ F (x)λ F (y)1−λ. Proposition 5.33 The Gamma function is logarithmic convex. Proof. Let x, y > 0 and 0 < λ < 1 be given. Set p = 1/λ and q = 1/(1 − λ). Then 1/p + 1/q = 1 and we apply Hölder’s inequality to the functions f (t) = t and obtain Z ε R f (t)g(t) dt ≤ x−1 p t e− p , Z R ε g(t) = t y−1 q p1 Z f (t)p dt t e− q R ε 1q g(t)q dt . 5.6 Appendix D 153 Note that x y f (t)g(t) = t p + q −1 e−t , f (t)p = tx−1 e−t , g(t)q = ty−1 e−t . Taking the limts ε → 0 + 0 and R → +∞ we obtain 1 1 x y ≤ Γ(x) p Γ(y) q . + Γ p q Remark 5.5 One can prove that a convex function (see Definition 4.4) is continuous, see Proposition 5.32. Also, an increasing convex function of a convex function f is convex, for example ef is convex if f is. We conclude that Γ(x) is continuous for x > 0. Theorem 5.34 Let F : (0, +∞) → (0, +∞) be a function with (a) F (1) = 1, (b) F (x + 1) = xF (x), (c) F is logarithmic convex. Then F (x) = Γ(x) for all x > 0. Proof. Since Γ(x) has the properties (a), (b), and (c) it suffices to prove that F is uniquely determined by (a), (b), and (c). By (b), F (x + n) = F (x)x(x + 1) · · · (x + n) for every positive x and every positive integer n. In particular F (n + 1) = n! and it suffices to show that F (x) is uniquely determined for every x with x ∈ (0, 1). Since n + x = (1 − x)n + x(n + 1) from (c) it follows F (n + x) ≤ F (n)1−x F (n + 1)x = F (n)1−x F (n)x nx = (n − 1)!nx . Similarly, from n + 1 = x(n + x) + (1 − x)((n + 1 + x) it follows n! = F (n + 1) ≤ F (n + x)x F (n + 1 + x)1−x = F (n + x)(n + x)1−x . 
Combining both inequalities, n!(n + x)x−1 ≤ F (n + x) ≤ (n − 1)!nx and moreover an (x) := Since bn (x) an (x) = (n − 1)!nx n!(n + x)x−1 ≤ F (x) ≤ =: bn (x). x(x + 1) · · · (x + n − 1) x(x + 1) · · · (x + n − 1) (n+x)nx n(n+x)x converges to 1 as n → ∞, (n − 1)!nx . n→∞ x(x + 1) · · · (x + n) F (x) = lim Hence F is uniquely determined. 5 Integration 154 Stirling’s Formula We give an asymptotic formula for n! as n → ∞. We call two sequences (an ) and (bn ) to be an asymptotically equal if lim = 1, and we write an ∼ bn . n→∞ bn Proposition 5.35 (Stirling’s Formula) The asymptotical behavior of n! is n n √ . n! ∼ 2πn e Proof. Using the trapezoid rule (5.34) with f (x) = log x, f ′′ (x) = −1/x2 we have Z k+1 log x dx = k 1 1 (log k + log(k + 1)) + 2 12ξk2 with k ≤ ξk ≤ k + 1. Summation over k = 1, . . . , n − 1 gives Since R Z n 1 n X n−1 1 X 1 1 log x dx = log k − log n + . 2 12 k=1 ξk2 k=1 log x dx = x log x − x (integration by parts), we have n log n − n + 1 = n X log k = k=1 where γn = 1 − 1 12 Pn−1 1 k=1 ξk2 . n X k=1 n−1 log k − 1 n+ 2 1 1 X 1 log n + 2 12 k=1 ξk2 log n − n + γn , Exponentiating both sides of the equation we find with cn = eγn 1 n! = nn+ 2 e−n cn . (5.51) Since 0 < 1/ξk2 ≤ 1/k 2 , the limit ∞ X 1 γ = lim γn = 1 − n→∞ ξ2 k=1 k exists, and so the limit c = lim cn = eγ . n→∞ √ Proof of cn → 2π. Using (5.51) we have √ (n!)2 2n(2n)2n √ 22n (n!)2 c2n = 2√ = c2n n2n+1 (2n)! n(2n)! and limn→∞ c2n c2n = c2 c = c. Using Wallis’s product formula for π π=2 ∞ Y 2·2·4·4 · · · ·2n·2n 4k 2 2 = lim 4k 2 − 1 n→∞ 1·3·3·5 · · · ·(2n − 1)(2n + 1) k=1 (5.52) 5.6 Appendix D 155 we have n Y 4k 2 2 4k 2 − 1 k=1 ! 12 = √ 2 2·4 · · · 2n 1 √ =q 3·5 · · · (2n − 1) 2n + 1 n+ 1 =q n+ such that 1 2 √ 22 ·42 · · · (2n)2 2·3·4 · · · (2n − 1)(2n) 22n (n!)2 · , (2n)! √ Consequently, c = 1 2 · 22n (n!)2 π = lim √ . n→∞ n(2n)! 2π which completes the proof. Proof of Hölder’s Inequality Proof of Proposition 5.31. We prove (b). 
The other two statements are consequences, their proofs are along the lines in Section 1.3. The main idea is to approximate the integral on the left by Riemann sums and use Hölder’s inequality (1.22). Let ε > 0; without loss of generality, let f, g ≥ 0. By Proposition 5.10 f g, f p , g q ∈ R(α) and by Proposition 5.3 there exist partitions P1 , P2 , and P3 of [a, b] such that U(f g, P1 , α)−L(f g, P1, α) < ε, U(f p , P2 , α)−L(f p , P2 , α) < ε, and U(g q , P3 , α) − L(g q , P3 , α) < ε. Let P = {x0 , x1 , . . . , xn } be the common refinement of P1 , P2 , and P3 . By Lemma 5.4 (a) and (c) Z b n X f g dα < (f g)(ti)∆αi + ε, (5.53) a n X i=1 n X i=1 b f (ti )p ∆αi < q g(ti ) ∆αi < i=1 Z Z f p dα + ε, (5.54) g q dα + ε, (5.55) a b a for any ti ∈ [xi−1 , xi ]. Using the two preceding inequalities and Hölder’s inequality (1.22) we have ! p1 ! 1q n n n 1 1 X X X f (ti )∆αip g(ti )∆αiq ≤ f (ti )p ∆αi g(ti)q ∆αi i=1 i=1 < Z i=1 b p f dα + ε a p1 Z b q g dα + ε a 1q . By (5.53), Z b a Z b p1 Z b 1q n X p q f g dα < (f g)(ti )∆αi + ε < f dα + ε g dα + ε + ε. i=1 a a 156 Since ε > 0 was arbitrary, the claim follows. 5 Integration Chapter 6 Sequences of Functions and Basic Topology In the present chapter we draw our attention to complex-valued functions (including the realvalued), although many of the theorems and proofs which follow extend to vector-valued functions without difficulty and even to mappings into more general spaces. We stay within this simple framework in order to focus attention on the most important aspects of the problem that arise when limit processes are interchanged. 6.1 Discussion of the Main Problem Definition 6.1 Suppose (fn ), n ∈ N, is a sequence of functions defined on a set E, and suppose that the sequence of numbers (fn (x)) converges for every x ∈ E. We can then define a function f by f (x) = lim fn (x), n→∞ x ∈ E. (6.1) Under these circumstances we say that (fn ) converges on E and f is the limit (or the limit function) of (fn ). 
Sometimes we say that "$(f_n)$ converges pointwise to $f$ on $E$" if (6.1) holds. Similarly, if $\sum_{n=1}^\infty f_n(x)$ converges for every $x \in E$, and if we define
$$f(x) = \sum_{n=1}^{\infty} f_n(x), \qquad x \in E, \qquad (6.2)$$
the function $f$ is called the sum of the series $\sum_{n=1}^\infty f_n$.

The main problem which arises is to determine whether important properties of the functions $f_n$ are preserved under the limit operations (6.1) and (6.2). For instance, if the functions $f_n$ are continuous, or differentiable, or integrable, is the same true of the limit function? What are the relations between $f_n'$ and $f'$, say, or between the integrals of $f_n$ and that of $f$?

To say that $f$ is continuous at $x$ means $\lim_{t\to x} f(t) = f(x)$. Hence, to ask whether the limit of a sequence of continuous functions is continuous is the same as to ask whether
$$\lim_{t\to x}\lim_{n\to\infty} f_n(t) = \lim_{n\to\infty}\lim_{t\to x} f_n(t), \qquad (6.3)$$
i.e. whether the order in which limit processes are carried out is immaterial. We shall now show by means of several examples that limit processes cannot in general be interchanged without affecting the result. Afterwards, we shall prove that under certain conditions the order in which limit operations are carried out is inessential.

Example 6.1 (a) Our first example, and the simplest one, concerns a "double sequence." For positive integers $m, n \in \mathbb{N}$ let
$$s_{mn} = \frac{m}{m+n}.$$
Then, for fixed $n$, $\lim_{m\to\infty} s_{mn} = 1$, so that $\lim_{n\to\infty}\lim_{m\to\infty} s_{mn} = 1$. On the other hand, for every fixed $m$, $\lim_{n\to\infty} s_{mn} = 0$, so that $\lim_{m\to\infty}\lim_{n\to\infty} s_{mn} = 0$. The two limits cannot be interchanged.

(b) Let $f_n(x) = x^n$ on $[0,1]$. Then
$$f(x) = \begin{cases} 0, & 0 \le x < 1,\\ 1, & x = 1.\end{cases}$$
All functions $f_n(x)$ are continuous on $[0,1]$; however, the limit $f(x)$ is discontinuous at $x = 1$; that is,
$$\lim_{t\to 1-0}\lim_{n\to\infty} t^n = 0 \ne 1 = \lim_{n\to\infty}\lim_{t\to 1-0} t^n.$$
The limits cannot be interchanged.

After these examples, which show what can go wrong if limit processes are interchanged carelessly, we now define a new notion of convergence, stronger than pointwise convergence as defined in Definition 6.1, which will enable us to arrive at positive results.

6.2 Uniform Convergence

6.2.1 Definitions and Example

Definition 6.2 A sequence of functions $(f_n)$ converges uniformly on $E$ to a function $f$ if for every $\varepsilon > 0$ there is a positive integer $n_0$ such that $n \ge n_0$ implies
$$|f_n(x) - f(x)| \le \varepsilon \quad \text{for all } x \in E.$$
We write $f_n \rightrightarrows f$ on $E$. As a formula: $f_n \rightrightarrows f$ on $E$ if
$$\forall\,\varepsilon > 0\ \exists\, n_0 \in \mathbb{N}\ \forall\, n \ge n_0\ \forall\, x \in E:\ |f_n(x) - f(x)| \le \varepsilon. \qquad (6.4)$$

[Figure: uniform convergence of $f_n$ to $f$ on $[a,b]$ means that $f_n$ lies in the $\varepsilon$-tube between $f(x)-\varepsilon$ and $f(x)+\varepsilon$ for sufficiently large $n$.]

It is clear that every uniformly convergent sequence is pointwise convergent (to the same function). Quite explicitly, the difference between the two concepts is this: If $(f_n)$ converges pointwise on $E$ to a function $f$, then for every $\varepsilon > 0$ and for every $x \in E$ there exists an integer $n_0$, depending on both $\varepsilon$ and $x$, such that (6.4) holds if $n \ge n_0$. If $(f_n)$ converges uniformly on $E$, it is possible, for each $\varepsilon > 0$, to find one integer $n_0$ which will do for all $x \in E$.

We say that the series $\sum_{k=1}^\infty f_k(x)$ converges uniformly on $E$ if the sequence $(s_n(x))$ of partial sums defined by $s_n(x) = \sum_{k=1}^n f_k(x)$ converges uniformly on $E$.

Proposition 6.1 (Cauchy criterion) (a) The sequence of functions $(f_n)$ defined on $E$ converges uniformly on $E$ if and only if for every $\varepsilon > 0$ there is an integer $n_0$ such that $n, m \ge n_0$ and $x \in E$ imply
$$|f_n(x) - f_m(x)| \le \varepsilon. \qquad (6.5)$$
(b) The series of functions $\sum_{k=1}^\infty g_k(x)$ defined on $E$ converges uniformly on $E$ if and only if for every $\varepsilon > 0$ there is an integer $n_0$ such that $n \ge m \ge n_0$ and $x \in E$ imply
$$\left|\sum_{k=m}^{n} g_k(x)\right| \le \varepsilon.$$
Proof. Suppose $(f_n)$ converges uniformly on $E$ and let $f$ be the limit function.
Then there is an integer n0 such that n ≥ n0 , x ∈ E implies ε | fn (x) − f (x) | ≤ , 2 so that | fn (x) − fm (x) | ≤ | fn (x) − f (x) | + | fm (x) − f (x) | ≤ ε if m, n ≥ n0 , x ∈ E. Conversely, suppose the Cauchy condition holds. By Proposition 2.18, the sequence (fn (x)) converges for every x to a limit which may we call f (x). Thus the sequence (fn ) converges 6 Sequences of Functions and Basic Topology 160 pointwise on E to f . We have to prove that the convergence is uniform. Let ε > 0 be given, choose n0 such that (6.5) holds. Fix n and let m → ∞ in (6.5). Since fm (x) → f (x) as m → ∞ this gives | fn (x) − f (x) | ≤ ε for every n ≥ n0 and x ∈ E. P (b) immediately follows from (a) with fn (x) = nk=1 gk (x). Remark 6.1 Suppose lim fn (x) = f (x), n→∞ x ∈ E. Put Mn = sup | fn (x) − f (x) | . x∈E Then fn ⇉ f uniformly on E if and only if Mn → 0 as n → ∞. (prove!) The following comparison test of a function series with a numerical series gives a sufficient criterion for uniform convergence. Theorem 6.2 (Weierstraß) Suppose (fn ) is a sequence of functions defined on E, and suppose Then P∞ n=1 | fn (x) | ≤ Mn , fn converges uniformly on E if P∞ x ∈ E, n ∈ N. n=1 (6.6) Mn converges. P Proof. If Mn converges, then, for arbitrary ε > 0 there exists n0 such that m, n ≥ n0 implies Pn i=m Mi ≤ ε. Hence, n n n X X X f (x) ≤ | f (x) | ≤ Mi ≤ ε, ∀ x ∈ E. i i tr.In. (6.6) i=m i=m i=m Uniform convergence now follows from Proposition 6.1. P Proposition 6.3 ( Comparison Test) If ∞ converges uniformly on E and | fn (x) | ≤ n=1 gn (x)P gn (x) for all sufficiently large n and all x ∈ E then ∞ n=1 fn (x) converges uniformly on E. Proof. Apply the Cauchy criterion. Note that m m m X X X fn (x) ≤ | fn (x) | ≤ gn (x) < ε. n=k n=k n=k 6.2 Uniform Convergence 161 Application of Weierstraß’ Theorem to Power Series and Fourier Series Proposition 6.4 Let ∞ X an ∈ C, an z n , n=0 (6.7) be a power series with radius of convergence R > 0. 
Then (6.7) converges uniformly on the closed disc {z | | z | ≤ r} for every r with 0 ≤ r < R. Proof. We apply Weierstraß’ theorem to fn (z) = an z n . Note that | fn (z) | = | an | | z |n ≤ | an | r n . P n Since r < R, r belongs to the disc of convergence, the series ∞ n=0 | an | r converges by TheoP∞ rem 2.34. By Theorem 6.2, the series n=0 an z n converges uniformly on {z | | z | ≤ r}. Remark 6.2 (a) The power series ∞ X nan z n−1 n=0 has the same radius of convergence R as the series (6.7) and hence also converges uniformly on the closed disc {z | | z | ≤ r}. Indeed, this simply follows from the fact that lim n→∞ p n (n + 1) | an+1 | = lim n→∞ √ n p 1 n | an | = . n→∞ R n + 1 lim (b) Note that the power series in general does not converge uniformly on the whole open disc of convergence | z | < R. As an example, consider the geometric series f (z) = ∞ X 1 = zk , 1−z k=0 | z | < 1. Note that the condition ∃ ε0 > 0 ∀ n ∈ N ∃ xn ∈ E : | fn (xn ) − f (xn ) | ≥ ε0 implies that (fn ) does not converge uniformly to f on E. Prove! n To ε = 1 and every n ∈ N choose zn = n+1 and we obtain, using Bernoulli’s inequality, n 1 1 znn n zn = 1 − = 1 − zn , hence ≥1−n ≥ 1. (6.8) n+1 n+1 1 − zn so that n−1 X 1 | sn−1 (zn ) − f (zn ) | = znk − 1 − zn k=0 ∞ X znn znk = ≥ 1. = 1 − zn (6.8) k=n The geometric series doesn’t converge uniformly on the whole open unit disc. 6 Sequences of Functions and Basic Topology 162 Example 6.2 (a) A series of the form ∞ X an cos(nx) + n=0 ∞ X an , bn , x ∈ R, bn sin(nx), n=1 (6.9) P∞ P is called a Fourier series (see Section 6.3 below). If both ∞ n=0 | an | and n=0 | bn | converge then the series (6.9) converges uniformly on R to a function F (x). Indeed, since | an cos(nx) | ≤ | an | and | bn sin(nx) | ≤ | bn |, by Theorem 6.2, the series (6.9) converges uniformly on R. (b) Let f : R → R be the sum of the Fourier series f (x) = ∞ X sin nx n=1 n (6.10) P P Note that (a) does not apply since n | bn | = n n1 diverges. 
If f (x) exists, so does f (x + 2π) = f (x), and f (0) = 0. We will show that the series converges uniformly on [δ, 2π − δ] for every δ > 0. For, put ! n n X X sn (x) = sin kx = Im eikx . k=1 k=1 If δ ≤ x ≤ 2π − δ we have n ei(n+1)x − eix X 2 1 1 ikx ≤ = | sn (x) | ≤ e = . x ≤ ix ix/2 −ix/2 e −1 |e −e | sin 2 sin δ2 k=1 Note that | Im z | ≤ | z | and eix = 1. Since sin x2 ≥ sin 2δ for δ/2 ≤ x/2 ≤ π − δ/2 we have for 0 < m < n n n X X sin kx s (x) − s (x) k k−1 = k k k=m k=m n X (x) (x) s 1 s 1 n m−1 = + − − sk (x) k k + 1 n + 1 m k=m ! n X 1 1 1 1 1 ≤ + − + n+1 m sin 2δ k=m k k + 1 1 2 1 1 1 1 ≤ ≤ − + + δ sin 2 m n + 1 n + 1 m m sin 2δ The right side becomes arbitraryly small as m → ∞. Using Proposition 6.1 (b) uniform convergence of (6.10) on [δ, 2π − δ] follows. 6.2.2 Uniform Convergence and Continuity Theorem 6.5 Let E ⊂ R be a subset and fn : E → R, n ∈ N, be a sequence of continuous functions on E uniformly converging to some function f : E → R. Then f is continuous on E. 6.2 Uniform Convergence 163 Proof. Let a ∈ E and ε > 0 be given. Since fn ⇉ f there is an r ∈ N such that | fr (x) − f (x) | ≤ ε/3 for all x ∈ E. Since fr is continuous at a, there exists δ > 0 such that | x − a | < δ implies | fr (x) − f (a) | ≤ ε/3. Hence | x − a | < δ implies | f (x) − f (a) | ≤ | f (x) − fr (x) | + | fr (x) − fr (a) | + | fr (a) − f (a) | ≤ ε ε ε + + = ε. 3 3 3 This proves the assertion. The same is true for functions f : E → C where E ⊂ C. Example 6.3 (Example 6.2 continued) (a) A power series defines a continuous functions on the disc of convergence {z | | z | < R}. Indeed, let | z0 | < R. Then there exists r ∈ R such that | z0 | < r < R. By Proposition 6.4, P n n an x converges uniformly on {x | | x | ≤ r}. By Theorem 6.5, the function, defined by the sum of the power series is continuous on [−r, r]. P (b) The sum of the Fourier series (6.9) is a continuous function on R if both n | an | and P n | bn | converge. 
(c) The sum of the Fourier series $f(x) = \sum_n \frac{\sin(nx)}{n}$ is continuous on $[\delta, 2\pi - \delta]$ for all $\delta$ with $0 < \delta < \pi$ by the above theorem. Clearly, $f$ is $2\pi$-periodic since all partial sums are.

[Figure: graph of the $2\pi$-periodic function $f(x)$ on $[0, 2\pi]$, with values between $\pi/2$ and $-\pi/2$.]

Later (see the section on Fourier series) we will show that
$$f(x) = \begin{cases} 0, & x = 0, \\ \dfrac{\pi - x}{2}, & x \in (0, 2\pi). \end{cases}$$
Since $f$ is discontinuous at $x_0 = 2\pi n$, the Fourier series does not converge uniformly on $\mathbb{R}$.

Also, Example 6.1 (b) shows that the continuity of the $f_n(x) = x^n$ alone is not sufficient for the continuity of the limit function. On the other hand, the sequence of continuous functions $(x^n)$ on $(0, 1)$ converges to the continuous function $0$; however, the convergence is not uniform. Prove!

6.2.3 Uniform Convergence and Integration

Example 6.4 Let $f_n(x) = 2n^2 x e^{-n^2 x^2}$; clearly $\lim_{n\to\infty} f_n(x) = 0$ for all $x \in \mathbb{R}$. Further,
$$\int_0^1 f_n(x)\,dx = \left. -e^{-n^2 x^2} \right|_0^1 = 1 - e^{-n^2} \longrightarrow 1.$$
On the other hand, $\int_0^1 \lim_{n\to\infty} f_n(x)\,dx = \int_0^1 0\,dx = 0$. Thus, $\lim_{n\to\infty}$ and integration cannot be interchanged. The reason: $(f_n)$ converges pointwise to $0$, but not uniformly. Indeed,
$$f_n\!\left(\frac{1}{n}\right) = 2n\,e^{-1} = \frac{2n}{e} \underset{n\to\infty}{\longrightarrow} +\infty.$$

Theorem 6.6 Let $\alpha$ be an increasing function on $[a, b]$. Suppose $f_n \in \mathscr{R}(\alpha)$ on $[a, b]$ for all $n \in \mathbb{N}$, and suppose $f_n \to f$ uniformly on $[a, b]$. Then $f \in \mathscr{R}(\alpha)$ on $[a, b]$ and
$$\int_a^b f\,d\alpha = \lim_{n\to\infty} \int_a^b f_n\,d\alpha. \tag{6.11}$$

Proof. Put
$$\varepsilon_n = \sup_{x\in[a,b]} |f_n(x) - f(x)|.$$
Then $f_n - \varepsilon_n \le f \le f_n + \varepsilon_n$, so that the upper and the lower integrals of $f$ satisfy
$$\int_a^b (f_n - \varepsilon_n)\,d\alpha \le \underline{\int_a^b} f\,d\alpha \le \overline{\int_a^b} f\,d\alpha \le \int_a^b (f_n + \varepsilon_n)\,d\alpha. \tag{6.12}$$
Hence,
$$0 \le \overline{\int_a^b} f\,d\alpha - \underline{\int_a^b} f\,d\alpha \le 2\varepsilon_n\,(\alpha(b) - \alpha(a)).$$
Since $\varepsilon_n \to 0$ as $n \to \infty$ (Remark 6.1), the upper and the lower integrals of $f$ are equal. Thus $f \in \mathscr{R}(\alpha)$. Another application of (6.12) yields
$$\left| \int_a^b f_n\,d\alpha - \int_a^b f\,d\alpha \right| \le \varepsilon_n\,(\alpha(b) - \alpha(a)).$$
This implies (6.11).
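Example 6.4 can be replayed numerically; the sketch below (sample points and sample values of $n$ are my own choices) shows the integrals tending to $1$, the pointwise convergence to $0$, and the blow-up along $x_n = 1/n$:

```python
import math

def f_n(n, x):
    # f_n(x) = 2 n^2 x e^{-n^2 x^2}
    return 2 * n * n * x * math.exp(-((n * x) ** 2))

print([1 - math.exp(-n * n) for n in (1, 2, 3)])  # exact integrals over [0,1] -> 1
print([f_n(n, 0.5) for n in (5, 10, 20)])         # pointwise convergence to 0 at x = 0.5
print([f_n(n, 1.0 / n) for n in (5, 10, 20)])     # equals 2n/e -> infinity: not uniform
```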
Corollary 6.7 If $f_n \in \mathscr{R}(\alpha)$ on $[a, b]$ and if the series
$$f(x) = \sum_{n=1}^\infty f_n(x), \qquad a \le x \le b,$$
converges uniformly on $[a, b]$, then
$$\int_a^b \left(\sum_{n=1}^\infty f_n\right) d\alpha = \sum_{n=1}^\infty \int_a^b f_n\,d\alpha.$$
In other words, the series may be integrated term by term.

Corollary 6.8 Let $f_n \colon [a, b] \to \mathbb{R}$ be a sequence of continuous functions converging uniformly on $[a, b]$ to $f$, and let $x_0 \in [a, b]$. Then the sequence $F_n(x) = \int_{x_0}^x f_n(t)\,dt$ converges uniformly to $F(x) = \int_{x_0}^x f(t)\,dt$.

Proof. The pointwise convergence of $F_n$ follows from the above theorem with $\alpha(t) = t$ and $a$ and $b$ replaced by $x_0$ and $x$. We show uniform convergence. Let $\varepsilon > 0$. Since $f_n \rightrightarrows f$ on $[a, b]$, there exists $n_0 \in \mathbb{N}$ such that $n \ge n_0$ implies $|f_n(t) - f(t)| \le \frac{\varepsilon}{b-a}$ for all $t \in [a, b]$. For $n \ge n_0$ and all $x \in [a, b]$ we thus have
$$|F_n(x) - F(x)| = \left| \int_{x_0}^x (f_n(t) - f(t))\,dt \right| \le \left| \int_{x_0}^x |f_n(t) - f(t)|\,dt \right| \le \frac{\varepsilon}{b-a}\,(b - a) = \varepsilon.$$
Hence, $F_n \rightrightarrows F$ on $[a, b]$.

Example 6.5 (a) For every real $t \in (-1, 1)$ we have
$$\log(1 + t) = t - \frac{t^2}{2} + \frac{t^3}{3} \mp \cdots = \sum_{n=1}^\infty \frac{(-1)^{n-1} t^n}{n}. \tag{6.13}$$

Proof. In Homework 13.5 (a) the Taylor series
$$T(x) = \sum_{n=1}^\infty \frac{(-1)^{n-1}}{n} x^n$$
of $\log(1 + x)$ was computed, and it was shown that $T(x) = \log(1 + x)$ if $x \in (0, 1)$. By Proposition 6.4, the geometric series $\sum_{n=0}^\infty (-1)^n x^n$ converges uniformly to the function $\frac{1}{1+x}$ on $[-r, r]$ for all $0 < r < 1$. By Corollary 6.7 we have for all $t \in [-r, r]$
$$\log(1 + t) = \log(1 + x)\big|_0^t = \int_0^t \frac{dx}{1 + x} = \int_0^t \sum_{n=0}^\infty (-1)^n x^n\,dx \underset{\text{Cor 6.7}}{=} \sum_{n=0}^\infty \int_0^t (-1)^n x^n\,dx = \sum_{n=0}^\infty \frac{(-1)^n}{n+1}\,t^{n+1} = \sum_{n=1}^\infty \frac{(-1)^{n-1}}{n}\,t^n.$$

(b) For $|t| < 1$ we have
$$\arctan t = t - \frac{t^3}{3} + \frac{t^5}{5} \mp \cdots = \sum_{n=0}^\infty (-1)^n \frac{t^{2n+1}}{2n+1}. \tag{6.14}$$
As in the previous example we use the uniform convergence of the geometric series on $[-r, r]$ for every $0 < r < 1$, which allows us to exchange integration and summation:
$$\arctan t = \int_0^t \frac{dx}{1 + x^2} = \int_0^t \sum_{n=0}^\infty (-1)^n x^{2n}\,dx = \sum_{n=0}^\infty \int_0^t (-1)^n x^{2n}\,dx = \sum_{n=0}^\infty \frac{(-1)^n}{2n+1}\,t^{2n+1}.$$

Note that you are, in general, not allowed to insert $t = 1$ into the equations (6.13) and (6.14). However, the following proposition (the proof is in the appendix to this chapter) fills this gap.

Proposition 6.9 (Abel's Limit Theorem) Let $\sum_{n=0}^\infty a_n$ be a convergent series of real numbers. Then the power series
$$f(x) = \sum_{n=0}^\infty a_n x^n$$
converges for $x \in [0, 1]$ and is continuous on $[0, 1]$.

As a consequence of the above proposition we have
$$\log 2 = 1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} \pm \cdots = \sum_{n=1}^\infty \frac{(-1)^{n-1}}{n}, \qquad \frac{\pi}{4} = 1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} \pm \cdots = \sum_{n=0}^\infty \frac{(-1)^n}{2n+1}.$$

Example 6.6 We have $f_n(x) = \frac{1}{n} e^{-x/n} \rightrightarrows f(x) \equiv 0$ on $[0, +\infty)$. Indeed, $|f_n(x) - 0| \le \frac{1}{n} < \varepsilon$ if $n \ge \frac{1}{\varepsilon}$, for all $x \in \mathbb{R}_+$. However,
$$\int_0^\infty f_n(t)\,dt = \left. -e^{-t/n} \right|_0^{+\infty} = \lim_{t\to+\infty} \left(1 - e^{-t/n}\right) = 1.$$
Hence
$$\lim_{n\to\infty} \int_0^\infty f_n(t)\,dt = 1 \ne 0 = \int_0^\infty f(t)\,dt.$$
That is, Theorem 6.6 fails in the case of improper integrals.

6.2.4 Uniform Convergence and Differentiation

Example 6.7 Let
$$f_n(x) = \frac{\sin(nx)}{\sqrt{n}}, \qquad x \in \mathbb{R},\ n \in \mathbb{N}, \tag{6.15}$$
and $f(x) = \lim_{n\to\infty} f_n(x) = 0$. Then $f'(x) = 0$ and $f_n'(x) = \sqrt{n}\cos(nx)$, so that $(f_n')$ does not converge to $f'$. For instance, $f_n'(0) = \sqrt{n} \to +\infty$ as $n \to \infty$, whereas $f'(0) = 0$. Note that $(f_n)$ converges uniformly to $0$ on $\mathbb{R}$ since $|\sin(nx)/\sqrt{n}| \le 1/\sqrt{n}$ becomes small independently of $x \in \mathbb{R}$. Consequently, uniform convergence of $(f_n)$ implies nothing about the sequence $(f_n')$. Thus, stronger hypotheses are required for the assertion that $f_n \to f$ implies $f_n' \to f'$.

Theorem 6.10 Suppose $(f_n)$ is a sequence of continuously differentiable functions on $[a, b]$ converging pointwise to some function $f$. Suppose further that $(f_n')$ converges uniformly on $[a, b]$. Then $f_n$ converges uniformly to $f$ on $[a, b]$, $f$ is continuously differentiable on $[a, b]$, and
$$f'(x) = \lim_{n\to\infty} f_n'(x), \qquad a \le x \le b. \tag{6.16}$$

Proof. Put $g(x) = \lim_{n\to\infty} f_n'(x)$; then $g$ is continuous by Theorem 6.5.
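Equations (6.13) and (6.14), and the two Abel-limit values, can be checked numerically; a sketch with arbitrarily chosen evaluation points and truncation lengths:

```python
import math

def log1p_series(t, terms=200):
    # (6.13): log(1+t) = sum_{n>=1} (-1)^{n-1} t^n / n
    return sum((-1) ** (n - 1) * t ** n / n for n in range(1, terms + 1))

def arctan_series(t, terms=200):
    # (6.14): arctan t = sum_{n>=0} (-1)^n t^{2n+1} / (2n+1)
    return sum((-1) ** n * t ** (2 * n + 1) / (2 * n + 1) for n in range(terms))

print(log1p_series(0.5), math.log(1.5))
print(arctan_series(0.5), math.atan(0.5))
# Abel's limit theorem justifies t = 1 in (6.13):
leibniz = sum((-1) ** (n - 1) / n for n in range(1, 100001))
print(leibniz, math.log(2))
```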
By the Fundamental Theorem of Calculus, Theorem 5.14,
$$f_n(x) = f_n(a) + \int_a^x f_n'(t)\,dt.$$
By the assumption on $(f_n')$ and by Corollary 6.8, the sequence $\left(\int_a^x f_n'(t)\,dt\right)_{n\in\mathbb{N}}$ converges uniformly on $[a, b]$ to $\int_a^x g(t)\,dt$. Taking the limit $n \to \infty$ in the above equation, we thus obtain
$$f(x) = f(a) + \int_a^x g(t)\,dt.$$
Since $g$ is continuous, the right-hand side defines a differentiable function, namely the antiderivative of $g(x)$, by the FTC. Hence $f'(x) = g(x)$; since $g$ is continuous, the proof is now complete.

For a more general result (without the additional assumption of continuity of $f_n'$) see [Rud76, 7.17 Theorem].

Corollary 6.11 Let $f(x) = \sum_{n=0}^\infty a_n x^n$ be a power series with radius of convergence $R$.
(a) Then $f$ is differentiable on $(-R, R)$ and we have
$$f'(x) = \sum_{n=1}^\infty n a_n x^{n-1}, \qquad x \in (-R, R). \tag{6.17}$$
(b) The function $f$ is infinitely often differentiable on $(-R, R)$ and we have
$$f^{(k)}(x) = \sum_{n=k}^\infty n(n-1)\cdots(n-k+1)\,a_n x^{n-k}, \tag{6.18}$$
$$a_n = \frac{1}{n!}\,f^{(n)}(0), \qquad n \in \mathbb{N}_0. \tag{6.19}$$
In particular, $f$ coincides with its Taylor series.

Proof. (a) By Remark 6.2 (a), the power series $\sum_{n=0}^\infty (a_n x^n)'$ has the same radius of convergence and converges uniformly on every closed subinterval $[-r, r]$ of $(-R, R)$. By Theorem 6.10, $f(x)$ is differentiable, and differentiation and summation can be interchanged.

(b) Iterated application of (a) yields that $f^{(k-1)}$ is differentiable on $(-R, R)$ with (6.18). In particular, inserting $x = 0$ into (6.18) we find
$$f^{(k)}(0) = k!\,a_k \implies a_k = \frac{f^{(k)}(0)}{k!}.$$
These are exactly the Taylor coefficients of $f$ at $a = 0$. Hence, $f$ coincides with its Taylor series.

Example 6.8 For $x \in (-1, 1)$ we have
$$\sum_{n=1}^\infty n x^n = \frac{x}{(1-x)^2}.$$
Since the geometric series $f(x) = \sum_{n=0}^\infty x^n$ equals $1/(1-x)$ on $(-1, 1)$, by Corollary 6.11 we have
$$\frac{1}{(1-x)^2} = \frac{d}{dx}\,\frac{1}{1-x} = \frac{d}{dx}\left(\sum_{n=0}^\infty x^n\right) = \sum_{n=1}^\infty \frac{d}{dx}(x^n) = \sum_{n=1}^\infty n x^{n-1}.$$
Multiplying the preceding equation by $x$ gives the result.
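Example 6.8 can be sanity-checked numerically (truncation length and sample points are ad hoc choices):

```python
def sum_nxn(x, terms=2000):
    # partial sum of sum_{n>=1} n x^n
    return sum(n * x ** n for n in range(1, terms + 1))

for x in (0.3, -0.5, 0.9):
    print(x, sum_nxn(x), x / (1 - x) ** 2)  # the two values agree closely
```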
6.3 Fourier Series

In this section we consider basic notions and results of the theory of Fourier series. The question is how to write a periodic function as a series of $\cos kx$ and $\sin kx$, $k \in \mathbb{N}$. In contrast to Taylor expansions, the periodic function need not be infinitely often differentiable. Two Fourier series may have the same behavior in one interval, but may behave in different ways in some other interval. We have here a very striking contrast between Fourier series and power series.

In this section a periodic function is meant to be a $2\pi$-periodic complex-valued function on $\mathbb{R}$, that is, $f \colon \mathbb{R} \to \mathbb{C}$ satisfies $f(x + 2\pi) = f(x)$ for all $x \in \mathbb{R}$. Special periodic functions are the trigonometric polynomials.

Definition 6.3 A function $f \colon \mathbb{R} \to \mathbb{R}$ is called a trigonometric polynomial if there are real numbers $a_k$, $b_k$, $k = 0, \dots, n$, with
$$f(x) = \frac{a_0}{2} + \sum_{k=1}^n a_k \cos kx + b_k \sin kx. \tag{6.20}$$
The coefficients $a_k$ and $b_k$ are uniquely determined by $f$ since
$$a_k = \frac{1}{\pi} \int_0^{2\pi} f(x)\cos kx\,dx, \quad k = 0, 1, \dots, n, \qquad b_k = \frac{1}{\pi} \int_0^{2\pi} f(x)\sin kx\,dx, \quad k = 1, \dots, n. \tag{6.21}$$
This is immediate from
$$\int_0^{2\pi} \cos kx \sin mx\,dx = 0, \qquad \int_0^{2\pi} \cos kx \cos mx\,dx = \pi\delta_{km}, \qquad \int_0^{2\pi} \sin kx \sin mx\,dx = \pi\delta_{km}, \qquad k, m \in \mathbb{N}, \tag{6.22}$$
where $\delta_{km} = 1$ if $k = m$ and $\delta_{km} = 0$ if $k \ne m$ is the so-called Kronecker symbol, see Homework 19.2. For example, if $m \ge 1$ we have
$$\frac{1}{\pi} \int_0^{2\pi} f(x)\cos mx\,dx = \frac{1}{\pi} \int_0^{2\pi} \left(\frac{a_0}{2} + \sum_{k=1}^n a_k \cos kx + b_k \sin kx\right)\cos mx\,dx = \frac{1}{\pi} \sum_{k=1}^n \int_0^{2\pi} (a_k \cos kx \cos mx + b_k \sin kx \cos mx)\,dx = \frac{1}{\pi} \sum_{k=1}^n a_k \pi\delta_{km} = a_m.$$
Sometimes it is useful to consider complex trigonometric polynomials. Using the formulas expressing $\cos x$ and $\sin x$ in terms of $e^{ix}$ and $e^{-ix}$, we can write the above polynomial (6.20) as
$$f(x) = \sum_{k=-n}^n c_k e^{ikx}, \tag{6.23}$$
where $c_0 = a_0/2$ and
$$c_k = \frac{1}{2}(a_k - i b_k), \qquad c_{-k} = \frac{1}{2}(a_k + i b_k), \qquad k \ge 1.$$
To obtain the coefficients $c_k$ using integration, we need the notion of an integral of a complex-valued function, see Section 5.5.
If $m \ne 0$ we have
$$\int_a^b e^{imx}\,dx = \left.\frac{1}{im}\,e^{imx}\right|_a^b.$$
If $a = 0$, $b = 2\pi$, and $m \in \mathbb{Z}$, we obtain
$$\int_0^{2\pi} e^{imx}\,dx = \begin{cases} 0, & m \in \mathbb{Z} \setminus \{0\}, \\ 2\pi, & m = 0. \end{cases}$$
We conclude
$$c_k = \frac{1}{2\pi} \int_0^{2\pi} f(x)\,e^{-ikx}\,dx, \qquad k = 0, \pm 1, \dots, \pm n. \tag{6.24}$$

Definition 6.4 Let $f \colon \mathbb{R} \to \mathbb{C}$ be a periodic function with $f \in \mathscr{R}$ on $[0, 2\pi]$. We call
$$c_k = \frac{1}{2\pi} \int_0^{2\pi} f(x)\,e^{-ikx}\,dx, \qquad k \in \mathbb{Z}, \tag{6.25}$$
the Fourier coefficients of $f$, and the series
$$\sum_{k=-\infty}^\infty c_k e^{ikx}, \tag{6.26}$$
i.e. the sequence of partial sums $s_n = \sum_{k=-n}^n c_k e^{ikx}$, $n \in \mathbb{N}$, the Fourier series of $f$. The Fourier series can also be written as
$$\frac{a_0}{2} + \sum_{k=1}^\infty a_k \cos kx + b_k \sin kx, \tag{6.27}$$
where $a_k$ and $b_k$ are given by (6.21).

One can ask whether the Fourier series of a function converges to the function itself. It is easy to see: if the function $f$ is the uniform limit of a series of trigonometric polynomials,
$$f(x) = \sum_{k=-\infty}^\infty \gamma_k e^{ikx}, \tag{6.28}$$
then $f$ coincides with its Fourier series. Indeed, since the series (6.28) converges uniformly, by Proposition 6.6 we can change the order of summation and integration and obtain
$$c_k = \frac{1}{2\pi} \int_0^{2\pi} \left(\sum_{m=-\infty}^\infty \gamma_m e^{imx}\right) e^{-ikx}\,dx = \sum_{m=-\infty}^\infty \frac{1}{2\pi} \int_0^{2\pi} \gamma_m e^{i(m-k)x}\,dx = \gamma_k.$$
In general, the Fourier series of $f$ neither converges uniformly nor pointwise to $f$. For Fourier series, convergence with respect to the $L^2$-norm
$$\|f\|_2 = \left(\frac{1}{2\pi} \int_0^{2\pi} |f|^2\,dx\right)^{\frac12} \tag{6.29}$$
is the appropriate notion.

6.3.1 An Inner Product on the Periodic Functions

Let $V$ be the linear space of periodic functions $f \colon \mathbb{R} \to \mathbb{C}$ with $f \in \mathscr{R}$ on $[0, 2\pi]$. We introduce an inner product on $V$ by
$$f\cdot g = \frac{1}{2\pi} \int_0^{2\pi} f(x)\,\overline{g(x)}\,dx, \qquad f, g \in V.$$
One easily checks the following properties for $f, g, h \in V$, $\lambda, \mu \in \mathbb{C}$:
$$(f + g)\cdot h = f\cdot h + g\cdot h, \qquad f\cdot(g + h) = f\cdot g + f\cdot h, \qquad (\lambda f)\cdot(\mu g) = \lambda\bar\mu\; f\cdot g, \qquad f\cdot g = \overline{g\cdot f}.$$
For every $f \in V$ we have $f\cdot f = \frac{1}{2\pi}\int_0^{2\pi} |f|^2\,dx \ge 0$. However, $f\cdot f = 0$ does not imply $f = 0$ (you can change $f$ at finitely many points without any impact on $f\cdot f$).
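Formula (6.25) can be evaluated numerically for a concrete trigonometric polynomial; the sketch below (the test function and the quadrature size are my own choices) recovers the coefficients predicted by (6.23):

```python
import cmath
import math

def fourier_coeff(f, k, N=4096):
    # c_k = (1/2π) ∫₀^{2π} f(x) e^{-ikx} dx, approximated by the midpoint rule (cf. (6.25))
    h = 2 * math.pi / N
    return sum(
        f((j + 0.5) * h) * cmath.exp(-1j * k * (j + 0.5) * h) for j in range(N)
    ) * h / (2 * math.pi)

# f(x) = 1 + 2 cos x + 3 sin 2x, so c_0 = 1, c_{±1} = 1, c_2 = -3i/2, c_{-2} = 3i/2
f = lambda x: 1 + 2 * math.cos(x) + 3 * math.sin(2 * x)
for k in (0, 1, -1, 2, -2, 3):
    print(k, fourier_coeff(f, k))
```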
If $f \in V$ is continuous, then $f\cdot f = 0$ implies $f = 0$, see Homework 14.3. Put $\|f\|_2 = \sqrt{f\cdot f}$. Note that in the physical literature the inner product in $L^2(X)$ is often linear in the second component and antilinear in the first component.

Define for $k \in \mathbb{Z}$ the periodic function $e_k \colon \mathbb{R} \to \mathbb{C}$ by $e_k(x) = e^{ikx}$; then the Fourier coefficients of $f \in V$ take the form
$$c_k = f\cdot e_k, \qquad k \in \mathbb{Z}.$$
From (6.24) it follows that the functions $e_k$, $k \in \mathbb{Z}$, satisfy
$$e_k\cdot e_l = \delta_{kl}. \tag{6.30}$$
Any such subset $\{e_k \mid k \in \mathbb{N}\}$ of an inner product space $V$ satisfying (6.30) is called an orthonormal system (ONS). Using $e_k(x) = \cos kx + i\sin kx$, the real orthogonality relations (6.22) immediately follow from (6.30).

The next lemma shows that the Fourier series of $f$ is the best $L^2$-approximation of a periodic function $f \in V$ by trigonometric polynomials.

Lemma 6.12 (Least Square Approximation) Suppose $f \in V$ has the Fourier coefficients $c_k$, $k \in \mathbb{Z}$, and let $\gamma_k \in \mathbb{C}$ be arbitrary. Then
$$\left\| f - \sum_{k=-n}^n c_k e_k \right\|_2^2 \le \left\| f - \sum_{k=-n}^n \gamma_k e_k \right\|_2^2, \tag{6.31}$$
and equality holds if and only if $c_k = \gamma_k$ for all $k$. Further,
$$\left\| f - \sum_{k=-n}^n c_k e_k \right\|_2^2 = \|f\|_2^2 - \sum_{k=-n}^n |c_k|^2. \tag{6.32}$$

Proof. Let $\sum$ always denote $\sum_{k=-n}^n$. Put $g_n = \sum \gamma_k e_k$. Then
$$f\cdot g_n = f\cdot\sum \gamma_k e_k = \sum \overline{\gamma_k}\; f\cdot e_k = \sum c_k\overline{\gamma_k},$$
and $g_n\cdot e_k = \gamma_k$, so that $g_n\cdot g_n = \sum |\gamma_k|^2$. Noting that $|a - b|^2 = (a - b)\overline{(a - b)} = |a|^2 + |b|^2 - a\bar b - \bar a b$, it follows that
$$\|f - g_n\|_2^2 = (f - g_n)\cdot(f - g_n) = f\cdot f - f\cdot g_n - g_n\cdot f + g_n\cdot g_n = \|f\|_2^2 - \sum c_k\overline{\gamma_k} - \sum \overline{c_k}\gamma_k + \sum |\gamma_k|^2 = \|f\|_2^2 - \sum |c_k|^2 + \sum |\gamma_k - c_k|^2, \tag{6.33}$$
which is evidently minimized if and only if $\gamma_k = c_k$. Inserting this into (6.33), equation (6.32) follows.

Corollary 6.13 (Bessel's Inequality) Under the assumptions of the above lemma we have
$$\sum_{k=-\infty}^\infty |c_k|^2 \le \|f\|_2^2. \tag{6.34}$$

Proof. By equation (6.32), for every $n \in \mathbb{N}$ we have $\sum_{k=-n}^n |c_k|^2 \le \|f\|_2^2$. Taking the limit $n \to \infty$ (or $\sup_{n\in\mathbb{N}}$) shows the assertion.
An ONS $\{e_k \mid k \in \mathbb{Z}\}$ is said to be complete if, instead of Bessel's inequality, equality holds for all $f \in V$.

Definition 6.5 Let $f_n, f \in V$. We say that $(f_n)$ converges to $f$ in $L^2$ (denoted by $f_n \xrightarrow{\|\cdot\|_2} f$) if $\lim_{n\to\infty} \|f_n - f\|_2 = 0$; explicitly,
$$\int_0^{2\pi} |f_n(x) - f(x)|^2\,dx \underset{n\to\infty}{\longrightarrow} 0.$$

Remarks 6.3 (a) Note that the $L^2$-limit in $V$ is not unique; changing $f(x)$ at finitely many points of $[0, 2\pi]$ does not change the integral $\int_0^{2\pi} |f - f_n|^2\,dx$.

(b) If $f_n \rightrightarrows f$ on $\mathbb{R}$, then $f_n \xrightarrow{\|\cdot\|_2} f$. Indeed, let $\varepsilon > 0$. Then there exists $n_0 \in \mathbb{N}$ such that $n \ge n_0$ implies $\sup_{x\in\mathbb{R}} |f_n(x) - f(x)| \le \varepsilon$. Hence
$$\int_0^{2\pi} |f_n - f|^2\,dx \le \int_0^{2\pi} \varepsilon^2\,dx = 2\pi\varepsilon^2.$$
This shows $f_n - f \xrightarrow{\|\cdot\|_2} 0$.

(c) The above lemma, in particular (6.32), shows that the Fourier series converges in $L^2$ to $f$ if and only if
$$\|f\|_2^2 = \sum_{k=-\infty}^\infty |c_k|^2. \tag{6.35}$$
This is called Parseval's Completeness Relation. We will see that it holds for all $f \in V$.

Let us write
$$f(x) \sim \sum_{k=-\infty}^\infty c_k e^{ikx}$$
to express the fact that $(c_k)$ are the (complex) Fourier coefficients of $f$. Further,
$$s_n(f) = s_n(f; x) = \sum_{k=-n}^n c_k e^{ikx} \tag{6.36}$$
denotes the $n$th partial sum.

Theorem 6.14 (Parseval's Completeness Theorem) The ONS $\{e_k \mid k \in \mathbb{Z}\}$ is complete. More precisely, if $f, g \in V$ with
$$f \sim \sum_{k=-\infty}^\infty c_k e_k, \qquad g \sim \sum_{k=-\infty}^\infty \gamma_k e_k,$$
then
$$\text{(i)}\quad \lim_{n\to\infty} \frac{1}{2\pi} \int_0^{2\pi} |f - s_n(f)|^2\,dx = 0, \tag{6.37}$$
$$\text{(ii)}\quad \frac{1}{2\pi} \int_0^{2\pi} f\,\bar g\,dx = \sum_{k=-\infty}^\infty c_k\overline{\gamma_k}, \tag{6.38}$$
$$\text{(iii)}\quad \frac{1}{2\pi} \int_0^{2\pi} |f|^2\,dx = \sum_{k=-\infty}^\infty |c_k|^2 = \frac{a_0^2}{4} + \frac{1}{2}\sum_{k=1}^\infty (a_k^2 + b_k^2) \quad\text{(Parseval's formula)}. \tag{6.39}$$

The proof is in Rudin's book, [Rud76, 8.16, p. 191]. It uses the Stone–Weierstraß theorem about the uniform approximation of a continuous function by polynomials. An elementary proof is in Forster's book [For01, §23].

Example 6.9 (a) Consider the periodic function $f \in V$ given by
$$f(x) = \begin{cases} 1, & 0 \le x < \pi, \\ -1, & \pi \le x < 2\pi. \end{cases}$$
Since $f$ is an odd function, the coefficients $a_k$ vanish. We compute the Fourier coefficients $b_k$.
$$b_k = \frac{2}{\pi} \int_0^\pi \sin kx\,dx = \left. -\frac{2}{k\pi}\cos kx \right|_0^\pi = \frac{2}{k\pi}\left((-1)^{k+1} + 1\right) = \begin{cases} \dfrac{4}{k\pi}, & k \text{ odd}, \\[4pt] 0, & k \text{ even}. \end{cases}$$
The Fourier series of $f$ reads
$$f \sim \frac{4}{\pi} \sum_{n=0}^\infty \frac{\sin(2n+1)x}{2n+1}.$$
Noting that
$$\|f\|_2^2 = \frac{1}{2\pi} \int_0^{2\pi} dx = 1,$$
Parseval's formula
$$\sum |c_k|^2 = \frac{a_0^2}{4} + \frac{1}{2}\sum_{n\in\mathbb{N}} (a_n^2 + b_n^2)$$
gives
$$1 = \frac{1}{2}\sum_{n\in\mathbb{N}} b_n^2 = \frac{8}{\pi^2}\sum_{n=0}^\infty \frac{1}{(2n+1)^2} =: \frac{8}{\pi^2}\,s_1 \implies s_1 = \frac{\pi^2}{8}.$$
Now we can compute $s = \sum_{n=1}^\infty \frac{1}{n^2}$. Since this series converges absolutely, we are allowed to rearrange the elements in such a way that we first add all the odd terms, which gives $s_1$, and then all the even terms, which gives $s_0$. Using $s_1 = \pi^2/8$ we find
$$s = s_1 + s_0 = s_1 + \left(\frac{1}{2^2} + \frac{1}{4^2} + \frac{1}{6^2} + \cdots\right) = s_1 + \frac{1}{4}\left(1 + \frac{1}{2^2} + \frac{1}{3^2} + \cdots\right) = s_1 + \frac{1}{4}\,s,$$
$$\frac{3}{4}\,s = s_1 \implies s = \frac{\pi^2}{6}.$$

(b) Fix $a \in [0, 2\pi]$ and consider $f \in V$ with
$$f(x) = \begin{cases} 1, & 0 \le x \le a, \\ 0, & a < x < 2\pi. \end{cases}$$
The Fourier coefficients of $f$ are $c_0 = \frac{1}{2\pi}\int_0^a dx = \frac{a}{2\pi}$ and
$$c_k = f\cdot e_k = \frac{1}{2\pi} \int_0^a e^{-ikx}\,dx = \frac{i}{2\pi k}\left(e^{-ika} - 1\right), \qquad k \ne 0.$$
If $k \ne 0$,
$$|c_k|^2 = \frac{1}{4\pi^2 k^2}\left(1 - e^{ika}\right)\left(1 - e^{-ika}\right) = \frac{1 - \cos ka}{2\pi^2 k^2},$$
hence
$$\sum_{k=-\infty}^\infty |c_k|^2 = \frac{a^2}{4\pi^2} + \sum_{k=1}^\infty \frac{1 - \cos ak}{\pi^2 k^2} = \frac{a^2}{4\pi^2} + \frac{1}{\pi^2}\sum_{k=1}^\infty \frac{1}{k^2} - \frac{1}{\pi^2}\sum_{k=1}^\infty \frac{\cos ak}{k^2} = \frac{a^2}{4\pi^2} + \frac{1}{\pi^2}\left(s - \sum_{k=1}^\infty \frac{\cos ak}{k^2}\right),$$
where $s = \sum 1/k^2$. On the other hand,
$$\|f\|_2^2 = \frac{1}{2\pi} \int_0^a dx = \frac{a}{2\pi}.$$
Hence, (6.37) reads
$$\frac{a^2}{4\pi^2} + \frac{1}{\pi^2}\left(s - \sum_{k=1}^\infty \frac{\cos ka}{k^2}\right) = \frac{a}{2\pi},$$
$$\sum_{k=1}^\infty \frac{\cos ka}{k^2} = \frac{a^2}{4} - \frac{a\pi}{2} + \frac{\pi^2}{6} = \frac{(a - \pi)^2}{4} - \frac{\pi^2}{12}. \tag{6.40}$$
Since the series
$$\sum_{k=1}^\infty \frac{\cos kx}{k^2} \tag{6.41}$$
converges uniformly on $\mathbb{R}$ (use Theorem 6.2; $\sum_k 1/k^2$ is an upper bound), (6.41) is the Fourier series of the function
$$\frac{(x - \pi)^2}{4} - \frac{\pi^2}{12}, \qquad x \in [0, 2\pi],$$
and the Fourier series converges uniformly on $\mathbb{R}$ to the above function. Since the term-by-term differentiated series converges uniformly on $[\delta, 2\pi - \delta]$, see Example 6.2, we obtain
$$-\sum_{k=1}^\infty \frac{\sin kx}{k} = \left(\sum_{k=1}^\infty \frac{\cos kx}{k^2}\right)' = \left(\frac{(x - \pi)^2}{4} - \frac{\pi^2}{12}\right)' = \frac{x - \pi}{2},$$
which is true for $x \in (0, 2\pi)$.
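The two values $s_1 = \pi^2/8$ and $s = \pi^2/6$ derived above can be confirmed by direct summation (the truncation lengths are arbitrary):

```python
import math

s1 = sum(1.0 / (2 * n + 1) ** 2 for n in range(200000))  # sum over odd squares
s = sum(1.0 / n ** 2 for n in range(1, 400000))          # Basel sum
print(s1, math.pi ** 2 / 8)
print(s, math.pi ** 2 / 6)
```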
We can also integrate the Fourier series and obtain, by Corollary 6.7,
$$\int_0^x \sum_{k=1}^\infty \frac{\cos kt}{k^2}\,dt = \sum_{k=1}^\infty \frac{1}{k^2} \int_0^x \cos kt\,dt = \sum_{k=1}^\infty \left. \frac{1}{k^3}\sin kt \right|_0^x = \sum_{k=1}^\infty \frac{\sin kx}{k^3}.$$
On the other hand,
$$\int_0^x \left(\frac{(t - \pi)^2}{4} - \frac{\pi^2}{12}\right) dt = \frac{(x - \pi)^3}{12} - \frac{\pi^2}{12}\,x + \frac{\pi^3}{12}.$$
By Homework 19.5,
$$f(x) = \sum_{k=1}^\infty \frac{\sin kx}{k^3} = \frac{(x - \pi)^3}{12} - \frac{\pi^2}{12}\,x + \frac{\pi^3}{12}$$
defines a continuously differentiable periodic function on $\mathbb{R}$.

[Figures: graphs of the series and of the piecewise polynomial over two periods, $0 \le x \le 4\pi$.]

Theorem 6.15 Let $f \colon \mathbb{R} \to \mathbb{R}$ be a continuous periodic function which is piecewise continuously differentiable, i.e. there exists a partition $\{t_0, \dots, t_r\}$ of $[0, 2\pi]$ such that $f|_{[t_{i-1}, t_i]}$ is continuously differentiable. Then the Fourier series of $f$ converges uniformly to $f$.

Proof. Let $\varphi_i \colon [t_{i-1}, t_i] \to \mathbb{R}$ denote the continuous derivative of $f|_{[t_{i-1}, t_i]}$ and $\varphi \colon \mathbb{R} \to \mathbb{R}$ the periodic function that coincides with $\varphi_i$ on $[t_{i-1}, t_i]$. By Bessel's inequality, the Fourier coefficients $\gamma_k$ of $\varphi$ satisfy
$$\sum_{k=-\infty}^\infty |\gamma_k|^2 \le \|\varphi\|_2^2 < \infty.$$
If $k \ne 0$, the Fourier coefficients $c_k$ of $f$ can be obtained from the $\gamma_k$ using integration by parts:
$$\int_{t_{i-1}}^{t_i} f(x)\,e^{-ikx}\,dx = \frac{i}{k}\left(\left. f(x)\,e^{-ikx}\right|_{t_{i-1}}^{t_i} - \int_{t_{i-1}}^{t_i} \varphi(x)\,e^{-ikx}\,dx\right).$$
Hence summation over $i = 1, \dots, r$ yields
$$c_k = \frac{1}{2\pi} \int_0^{2\pi} f(x)\,e^{-ikx}\,dx = \frac{1}{2\pi} \sum_{i=1}^r \int_{t_{i-1}}^{t_i} f(x)\,e^{-ikx}\,dx = \frac{-i}{2\pi k} \int_0^{2\pi} \varphi(x)\,e^{-ikx}\,dx = \frac{-i\gamma_k}{k}.$$
Note that the term $\sum_{i=1}^r f(x)\,e^{-ikx}\big|_{t_{i-1}}^{t_i}$ vanishes since $f$ is continuous and $f(2\pi) = f(0)$. Since for $\alpha, \beta \in \mathbb{C}$ we have $|\alpha\beta| \le \frac{1}{2}(|\alpha|^2 + |\beta|^2)$, we obtain
$$|c_k| \le \frac{1}{2}\left(\frac{1}{|k|^2} + |\gamma_k|^2\right).$$
Since both $\sum_{k=1}^\infty \frac{1}{k^2}$ and $\sum_{k=-\infty}^\infty |\gamma_k|^2$ converge, $\sum_{k=-\infty}^\infty |c_k| < \infty$. Thus, the Fourier series converges uniformly to a continuous function $g$ (see Theorem 6.5). Since the Fourier series converges both to $f$ and to $g$ in the $L^2$-norm, $\|f - g\|_2 = 0$. Since both $f$ and $g$ are continuous, they coincide. This completes the proof.
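The closed form for $\sum_k \sin(kx)/k^3$ just obtained can be spot-checked numerically (the sample points and truncation length are my own choices):

```python
import math

def F(x, terms=20000):
    # partial sum of sum_{k>=1} sin(kx)/k^3
    return sum(math.sin(k * x) / k ** 3 for k in range(1, terms + 1))

def closed(x):
    # (x - π)³/12 - π² x/12 + π³/12, valid on [0, 2π]
    return (x - math.pi) ** 3 / 12 - math.pi ** 2 * x / 12 + math.pi ** 3 / 12

for x in (0.5, 2.0, 5.0):
    print(x, F(x), closed(x))  # the pairs agree closely
```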
Note that for any $f \in V$ the series $\sum_{k\in\mathbb{Z}} |c_k|^2$ converges, while the series $\sum_{k\in\mathbb{Z}} |c_k|$ converges only if the Fourier series converges uniformly to $f$.

6.4 Basic Topology

In the study of functions of several variables we need some topological notions like neighborhood, open set, closed set, and compactness.

6.4.1 Finite, Countable, and Uncountable Sets

Definition 6.6 If there exists a 1-1 mapping of the set $A$ onto the set $B$ (a bijection), we say that $A$ and $B$ have the same cardinality, or that $A$ and $B$ are equivalent; we write $A \sim B$.

Definition 6.7 For any nonnegative integer $n \in \mathbb{N}_0$ let $N_n$ be the set $\{1, 2, \dots, n\}$. For any set $A$ we say that:
(a) $A$ is finite if $A \sim N_n$ for some $n$. The empty set $\emptyset$ is also considered to be finite.
(b) $A$ is infinite if $A$ is not finite.
(c) $A$ is countable if $A \sim \mathbb{N}$.
(d) $A$ is uncountable if $A$ is neither finite nor countable.
(e) $A$ is at most countable if $A$ is finite or countable.

For finite sets $A$ and $B$ we evidently have $A \sim B$ if $A$ and $B$ have the same number of elements. For infinite sets, however, the idea of "having the same number of elements" becomes quite vague, whereas the notion of 1-1 correspondence retains its clarity.

Example 6.10 (a) $\mathbb{Z}$ is countable. Indeed, the arrangement $0, 1, -1, 2, -2, 3, -3, \dots$ gives a bijection between $\mathbb{N}$ and $\mathbb{Z}$. An infinite set $\mathbb{Z}$ can be equivalent to one of its proper subsets $\mathbb{N}$.

(b) Countable sets represent the "smallest" infinite cardinality: no uncountable set can be a subset of a countable set. Any countable set can be arranged in a sequence. In particular, $\mathbb{Q}$ is countable, see Example 2.6 (c).

(c) The countable union of countable sets is a countable set; this is Cantor's First Diagonal Process: arrange the elements $x_{nm}$ in an infinite array whose $n$th row is $x_{n1}, x_{n2}, x_{n3}, \dots$, and enumerate them along the successive anti-diagonals, $x_{11}, x_{12}, x_{21}, x_{31}, x_{22}, x_{13}, x_{14}, \dots$

(d) Let $A = \{(x_n) \mid x_n \in \{0, 1\}\ \forall\, n \in \mathbb{N}\}$ be the set of all sequences whose elements are $0$ and $1$. This set $A$ is uncountable.
In particular, $\mathbb{R}$ is uncountable.

Proof. Suppose to the contrary that $A$ is countable, and arrange the elements of $A$ in a sequence $(s_n)_{n\in\mathbb{N}}$ of distinct elements of $A$. We construct a sequence $s$ as follows: if the $n$th element of $s_n$ is $1$, we let the $n$th digit of $s$ be $0$, and vice versa. Then the sequence $s$ differs from every member $s_1, s_2, \dots$ in at least one place; hence $s \not\in A$, a contradiction, since $s$ is indeed an element of $A$. This proves that $A$ is uncountable.

6.4.2 Metric Spaces and Normed Spaces

Definition 6.8 A set $X$ is said to be a metric space if to any two points $x, y \in X$ there is associated a real number $d(x, y)$, called the distance of $x$ and $y$, such that
(a) $d(x, x) = 0$ and $d(x, y) > 0$ for all $x, y \in X$ with $x \ne y$ (positive definiteness);
(b) $d(x, y) = d(y, x)$ (symmetry);
(c) $d(x, y) \le d(x, z) + d(z, y)$ for any $z \in X$ (triangle inequality).
Any function $d \colon X \times X \to \mathbb{R}$ with these three properties is called a distance function or metric on $X$.

Example 6.11 (a) $\mathbb{C}$, $\mathbb{R}$, $\mathbb{Q}$, and $\mathbb{Z}$ are metric spaces with $d(x, y) := |y - x|$. Any subset of a metric space is again a metric space.

(b) The real plane $\mathbb{R}^2$ is a metric space with respect to
$$d_2((x_1, x_2), (y_1, y_2)) := \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}, \qquad d_1((x_1, x_2), (y_1, y_2)) := |x_1 - y_1| + |x_2 - y_2|.$$
$d_2$ is called the euclidean metric.

(c) Let $X$ be a set. Define
$$d(x, y) := \begin{cases} 1, & \text{if } x \ne y, \\ 0, & \text{if } x = y. \end{cases}$$
Then $(X, d)$ becomes a metric space. It is called the discrete metric space.

Definition 6.9 Let $E$ be a vector space over $\mathbb{C}$ (or $\mathbb{R}$). Suppose on $E$ there is given a function $\|\cdot\| \colon E \to \mathbb{R}$ which associates to each $x \in E$ a real number $\|x\|$ such that the following three conditions are satisfied:
(i) $\|x\| \ge 0$ for every $x \in E$, and $\|x\| = 0$ if and only if $x = 0$;
(ii) $\|\lambda x\| = |\lambda|\,\|x\|$ for all $\lambda \in \mathbb{C}$ (in $\mathbb{R}$, resp.);
(iii) $\|x + y\| \le \|x\| + \|y\|$ for all $x, y \in E$.
Then $E$ is called a normed (vector) space, and $\|x\|$ is the norm of $x$; $\|x\|$ generalizes the "length" of a vector $x \in E$.
Every normed vector space $E$ is a metric space if we put $d(x, y) = \|x - y\|$. Prove! However, there are metric spaces that do not arise from normed spaces, for example $(\mathbb{N}, d(m, n) = |n - m|)$, since $\mathbb{N}$ is not a vector space.

Example 6.12 (a) $E = \mathbb{R}^k$ or $E = \mathbb{C}^k$. Let $x = (x_1, \dots, x_k) \in E$ and define
$$\|x\|_2 = \sqrt{\sum_{i=1}^k |x_i|^2}.$$
Then $\|\cdot\|_2$ is a norm on $E$, called the Euclidean norm. There are other possibilities to define a norm on $E$, for example
$$\|x\|_\infty = \max_{1\le i\le k} |x_i|, \qquad \|x\|_1 = \sum_{i=1}^k |x_i|, \qquad \|x\|_a = \|x\|_2 + 3\|x\|_1, \qquad \|x\|_b = \max(\|x\|_1, \|x\|_2).$$

(b) $E = C([a, b])$. Let $p \ge 1$. Then
$$\|f\|_\infty = \sup_{x\in[a,b]} |f(x)|, \qquad \|f\|_p = \left(\int_a^b |f(t)|^p\,dt\right)^{\frac1p}$$
define norms on $E$. Note that $\|f\|_p \le \sqrt[p]{b - a}\;\|f\|_\infty$.

(c) Hilbert's sequence space $E = \ell^2 = \{(x_n) \mid \sum_{n=1}^\infty |x_n|^2 < \infty\}$. Then
$$\|x\|_2 = \left(\sum_{n=1}^\infty |x_n|^2\right)^{\frac12}$$
defines a norm on $\ell^2$.

(d) The bounded sequences: $E = \ell^\infty = \{(x_n) \mid \sup_{n\in\mathbb{N}} |x_n| < \infty\}$. Then $\|x\|_\infty = \sup_{n\in\mathbb{N}} |x_n|$ defines a norm on $E$.

(e) $E = C([a, b])$. Then $\|f\|_1 = \int_a^b |f(t)|\,dt$ defines a norm on $E$.

6.4.3 Open and Closed Sets

Definition 6.10 Let $X$ be a metric space with metric $d$. All points and subsets mentioned below are understood to be elements and subsets of $X$; in particular, let $E \subset X$ be a subset of $X$.
(a) The set $U_\varepsilon(x) = \{y \mid d(x, y) < \varepsilon\}$ with some $\varepsilon > 0$ is called the $\varepsilon$-neighborhood (or $\varepsilon$-ball with center $x$) of $x$. The number $\varepsilon$ is called the radius of the neighborhood $U_\varepsilon(x)$.
(b) A point $p$ is an interior or inner point of $E$ if there is a neighborhood $U_\varepsilon(p)$ completely contained in $E$. $E$ is open if every point of $E$ is an interior point.
(c) A point $p$ is called an accumulation or limit point of $E$ if every neighborhood of $p$ has a point $q \ne p$ such that $q \in E$.
(d) $E$ is said to be closed if every accumulation point of $E$ is a point of $E$. The closure of $E$ (denoted by $\overline{E}$) is $E$ together with all accumulation points of $E$. In other words, $p \in \overline{E}$ if and only if every neighborhood of $p$ has a non-empty intersection with $E$.
(e) The complement of $E$ (denoted by $E^c$) is the set of all points $p \in X$ such that $p \not\in E$.
(f) $E$ is bounded if there exists a real number $C > 0$ such that $d(x, y) \le C$ for all $x, y \in E$.
(g) $E$ is dense in $X$ if $\overline{E} = X$.

Example 6.13 (a) $X = \mathbb{R}$ with the standard metric $d(x, y) = |x - y|$. $E = (a, b) \subset \mathbb{R}$ is an open set. Indeed, for every $x \in (a, b)$ we have $U_\varepsilon(x) \subset (a, b)$ if $\varepsilon$ is small enough, say $\varepsilon \le \min\{|x - a|, |x - b|\}$. Hence, $x$ is an inner point of $(a, b)$. Since $x$ was arbitrary, $(a, b)$ is open.

$F = [a, b)$ is not open since $a$ is not an inner point of $[a, b)$; indeed, $U_\varepsilon(a) \not\subset [a, b)$ for every $\varepsilon > 0$. We have
$$\overline{E} = \overline{F} = \text{set of accumulation points} = [a, b].$$
Indeed, $a$ is an accumulation point of both $(a, b)$ and $[a, b)$. This is true since every neighborhood $U_\varepsilon(a)$, $\varepsilon < b - a$, contains $a + \varepsilon/2 \in (a, b)$ (resp. in $[a, b)$), which is different from $a$. For any point $x \not\in [a, b]$ we find a neighborhood $U_\varepsilon(x)$ with $U_\varepsilon(x) \cap [a, b) = \emptyset$; hence $x \not\in \overline{E}$.

The set of rational numbers $\mathbb{Q}$ is dense in $\mathbb{R}$. Indeed, every neighborhood $U_\varepsilon(r)$ of every real number $r$ contains a rational number, see Proposition 1.11 (b).

For the real line one can prove: every open set is an at most countable union of disjoint open (finite or infinite) intervals. A similar description for closed subsets of $\mathbb{R}$ is false, and there is no similar description of open subsets of $\mathbb{R}^k$, $k \ge 2$.

(b) For every metric space $X$, both the whole space $X$ and the empty set $\emptyset$ are open as well as closed.

(c) Let $B = \{x \in \mathbb{R}^k \mid \|x\|_2 < 1\}$ be the open unit ball in $\mathbb{R}^k$. $B$ is open (see Lemma 6.16 below); $B$ is not closed. For example, $x_0 = (1, 0, \dots, 0)$ is an accumulation point of $B$ since $x_n = (1 - 1/n, 0, \dots, 0)$ is a sequence of elements of $B$ converging to $x_0$; however, $x_0 \not\in B$. The set of accumulation points of $B$ is $\{x \in \mathbb{R}^k \mid \|x\|_2 \le 1\}$. This is also the closure $\overline{B}$ of $B$ in $\mathbb{R}^k$.

(d) Consider $E = C([a, b])$ with the supremum norm.
Then $g \in E$ is in the $\varepsilon$-neighborhood of a function $f \in E$ if and only if $|f(t) - g(t)| < \varepsilon$ for all $t \in [a, b]$, i.e. the graph of $g$ stays within the strip between $f - \varepsilon$ and $f + \varepsilon$.

Lemma 6.16 Every neighborhood $U_r(p)$, $r > 0$, of a point $p$ is an open set.

Proof. Let $q \in U_r(p)$. Then there exists $\varepsilon > 0$ such that $d(q, p) = r - \varepsilon$. We will show that $U_\varepsilon(q) \subset U_r(p)$. For, let $x \in U_\varepsilon(q)$. Then by the triangle inequality we have
$$d(x, p) \le d(x, q) + d(q, p) < \varepsilon + (r - \varepsilon) = r.$$
Hence $x \in U_r(p)$, and $q$ is an interior point of $U_r(p)$. Since $q$ was arbitrary, $U_r(p)$ is open.

Remarks 6.4 (a) If $p$ is an accumulation point of a set $E$, then every neighborhood of $p$ contains infinitely many points of $E$.
(b) A finite set has no accumulation points; hence any finite set is closed.

Example 6.14 (a) The open complex unit disc, $\{z \in \mathbb{C} \mid |z| < 1\}$.
(b) The closed unit disc, $\{z \in \mathbb{C} \mid |z| \le 1\}$.
(c) A finite set.
(d) The set $\mathbb{Z}$ of all integers.
(e) $\{1/n \mid n \in \mathbb{N}\}$.
(f) The set $\mathbb{C}$ of all complex numbers.
(g) The interval $(a, b)$.

Here (d), (e), and (g) are regarded as subsets of $\mathbb{R}$. Some properties of these sets are tabulated below:

          (a)  (b)  (c)  (d)  (e)  (f)  (g)
Closed    No   Yes  Yes  Yes  No   Yes  No
Open      Yes  No   No   No   No   Yes  Yes
Bounded   Yes  Yes  Yes  No   Yes  No   Yes

Proposition 6.17 A subset $E \subset X$ of a metric space $X$ is open if and only if its complement $E^c$ is closed.

Proof. First, suppose $E^c$ is closed. Choose $x \in E$. Then $x \not\in E^c$, and $x$ is not an accumulation point of $E^c$. Hence there exists a neighborhood $U$ of $x$ such that $U \cap E^c$ is empty, that is, $U \subset E$. Thus $x$ is an interior point of $E$, and $E$ is open.

Next, suppose that $E$ is open. Let $x$ be an accumulation point of $E^c$. Then every neighborhood of $x$ contains a point of $E^c$, so that $x$ is not an interior point of $E$. Since $E$ is open, this means that $x \in E^c$. It follows that $E^c$ is closed.

6.4.4 Limits and Continuity

In this section we generalize the notions of convergent sequences and continuous functions to arbitrary metric spaces.
Definition 6.11 Let $X$ be a metric space and $(x_n)$ a sequence of elements of $X$. We say that $(x_n)$ converges to $x \in X$ if $\lim_{n\to\infty} d(x_n, x) = 0$. We write $\lim_{n\to\infty} x_n = x$ or $x_n \underset{n\to\infty}{\longrightarrow} x$. In other words, $\lim_{n\to\infty} x_n = x$ if for every neighborhood $U_\varepsilon$, $\varepsilon > 0$, of $x$ there exists an $n_0 \in \mathbb{N}$ such that $n \ge n_0$ implies $x_n \in U_\varepsilon$.

Note that a subset $F$ of a metric space $X$ is closed if and only if $F$ contains all limits of convergent sequences $(x_n)$, $x_n \in F$.

Two metrics $d_1$ and $d_2$ on a space $X$ are said to be topologically equivalent if $\lim_{n\to\infty} x_n = x$ w.r.t. $d_1$ if and only if $\lim_{n\to\infty} x_n = x$ w.r.t. $d_2$. In particular, two norms $\|\cdot\|_1$ and $\|\cdot\|_2$ on the same linear space $E$ are said to be equivalent if the corresponding metric spaces are topologically equivalent.

Proposition 6.18 Let $E_1 = (E, \|\cdot\|_1)$ and $E_2 = (E, \|\cdot\|_2)$ be normed vector spaces such that there exist positive numbers $c_1, c_2 > 0$ with
$$c_1 \|x\|_1 \le \|x\|_2 \le c_2 \|x\|_1 \qquad \text{for all } x \in E. \tag{6.42}$$
Then $\|\cdot\|_1$ and $\|\cdot\|_2$ are equivalent.

Proof. Condition (6.42) is obviously symmetric with respect to $E_1$ and $E_2$ since $\|x\|_2/c_2 \le \|x\|_1 \le \|x\|_2/c_1$. Therefore, it is sufficient to show the following: if $x_n \to x$ w.r.t. $\|\cdot\|_1$, then $x_n \to x$ w.r.t. $\|\cdot\|_2$. Indeed, by definition, $\lim_{n\to\infty} \|x_n - x\|_1 = 0$. By assumption,
$$c_1 \|x_n - x\|_1 \le \|x_n - x\|_2 \le c_2 \|x_n - x\|_1, \qquad n \in \mathbb{N}.$$
Since the first and the last expressions tend to $0$ as $n \to \infty$, the sandwich theorem shows that $\lim_{n\to\infty} \|x_n - x\|_2 = 0$, too. This proves $x_n \to x$ w.r.t. $\|\cdot\|_2$.

Example 6.15 Let $E = \mathbb{R}^k$ or $E = \mathbb{C}^k$ with the norm
$$\|x\|_p = \left(|x_1|^p + \cdots + |x_k|^p\right)^{\frac1p}, \qquad p \in [1, \infty).$$
All these norms are equivalent. Indeed,
$$\|x\|_\infty^p \le \sum_{i=1}^k |x_i|^p \le k\,\|x\|_\infty^p \implies \|x\|_\infty \le \|x\|_p \le \sqrt[p]{k}\;\|x\|_\infty. \tag{6.43}$$

The following proposition is quite analogous to Proposition 2.33 with $k = 2$. Recall that a complex sequence $(z_n)$ converges if and only if both $\operatorname{Re} z_n$ and $\operatorname{Im} z_n$ converge.

Proposition 6.19 Let $(x_n)$ be a sequence of vectors of the euclidean space $(\mathbb{R}^k, \|\cdot\|_2)$, $x_n = (x_{n1}, \dots, x_{nk})$. Then $(x_n)$ converges to $a = (a_1,$
$\dots, a_k) \in \mathbb{R}^k$ if and only if
$$\lim_{n\to\infty} x_{ni} = a_i, \qquad i = 1, \dots, k.$$

Proof. Suppose that $\lim_{n\to\infty} x_n = a$. Given $\varepsilon > 0$ there is an $n_0 \in \mathbb{N}$ such that $n \ge n_0$ implies $\|x_n - a\|_2 < \varepsilon$. Thus, for $i = 1, \dots, k$ we have $|x_{ni} - a_i| \le \|x_n - a\|_2 < \varepsilon$; hence $\lim_{n\to\infty} x_{ni} = a_i$.

Conversely, suppose that $\lim_{n\to\infty} x_{ni} = a_i$ for $i = 1, \dots, k$. Given $\varepsilon > 0$ there are $n_{0i} \in \mathbb{N}$ such that $n \ge n_{0i}$ implies
$$|x_{ni} - a_i| < \frac{\varepsilon}{\sqrt{k}}.$$
For $n \ge \max\{n_{01}, \dots, n_{0k}\}$ we have (see (6.43))
$$\|x_n - a\|_2 \le \sqrt{k}\,\|x_n - a\|_\infty < \varepsilon;$$
hence $\lim_{n\to\infty} x_n = a$.

Corollary 6.20 Let $B \subset \mathbb{R}^k$ be a bounded subset and $(x_n)$ a sequence of elements of $B$. Then $(x_n)$ has a converging subsequence.

Proof. Since $B$ is bounded, all coordinates of $B$ are bounded; hence there is a subsequence $(x_n^{(1)})$ of $(x_n)$ such that the first coordinate converges. Further, there is a subsequence $(x_n^{(2)})$ of $(x_n^{(1)})$ such that the second coordinate converges. Finally there is a subsequence $(x_n^{(k)})$ of $(x_n^{(k-1)})$ such that all coordinates converge. By the above proposition the subsequence $(x_n^{(k)})$ converges in $\mathbb{R}^k$. The same statement is true for subsets $B \subset \mathbb{C}^k$.

Definition 6.12 A mapping $f \colon X \to Y$ from the metric space $X$ into the metric space $Y$ is said to be continuous at $a \in X$ if one of the following equivalent conditions is satisfied:
(a) For every $\varepsilon > 0$ there exists $\delta > 0$ such that for every $x \in X$,
$$d(x, a) < \delta \quad \text{implies} \quad d(f(x), f(a)) < \varepsilon. \tag{6.44}$$
(b) For any sequence $(x_n)$, $x_n \in X$, with $\lim_{n\to\infty} x_n = a$ it follows that $\lim_{n\to\infty} f(x_n) = f(a)$.
The mapping $f$ is said to be continuous on $X$ if $f$ is continuous at every point $a$ of $X$.

Proposition 6.21 The composition of two continuous mappings is continuous.

The proof is completely the same as in the real case (see Proposition 3.4), and we omit it. We give the topological description of continuous functions.

Proposition 6.22 Let $X$ and $Y$ be metric spaces. A mapping $f \colon X \to Y$ is continuous if and only if the preimage of any open set in $Y$ is open in $X$.

Proof.
Suppose that f is continuous and G ⊂ Y is open. If f⁻¹(G) = ∅, there is nothing to prove; the empty set is open. Otherwise there exists x_0 ∈ f⁻¹(G), and therefore f(x_0) ∈ G. Since G is open, there is ε > 0 such that U_ε(f(x_0)) ⊂ G. Since f is continuous at x_0, there exists δ > 0 such that x ∈ U_δ(x_0) implies f(x) ∈ U_ε(f(x_0)) ⊂ G. That is, U_δ(x_0) ⊂ f⁻¹(G), and x_0 is an inner point of f⁻¹(G); hence f⁻¹(G) is open.

Suppose now that the condition of the proposition is fulfilled. We will show that f is continuous. Fix x_0 ∈ X and ε > 0. Since G = U_ε(f(x_0)) is open by Lemma 6.16, f⁻¹(G) is open by assumption. In particular, x_0 ∈ f⁻¹(G) is an inner point. Hence, there exists δ > 0 such that U_δ(x_0) ⊂ f⁻¹(G). It follows that f(U_δ(x_0)) ⊂ U_ε(f(x_0)); this means that f is continuous at x_0. Since x_0 was arbitrary, f is continuous on X.

Remark 6.5 Since the complement of an open set is a closed set, the proposition also holds if we replace "open set" by "closed set." In general, the image of an open set under a continuous function need not be open; consider for example f(x) = sin x and the open set G = (0, 2π); then f((0, 2π)) = [−1, 1], which is not open.

6.4.5 Completeness and Compactness

(a) Completeness

Definition 6.13 Let (X, d) be a metric space. A sequence (x_n) of elements of X is said to be a Cauchy sequence if for every ε > 0 there exists a positive integer n_0 ∈ ℕ such that d(x_n, x_m) < ε for all m, n ≥ n_0. A metric space is said to be complete if every Cauchy sequence converges. A complete normed vector space is called a Banach space.

Remark. The euclidean spaces ℝ^k and ℂ^k are complete. The function space (C([a, b]), ‖·‖_∞) is complete, see homework 21.2. The Hilbert space ℓ² is complete.

(b) Compactness

The notion of compactness is of great importance in analysis, especially in connection with continuity.
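Before turning to compactness, the completeness statement for (C([a, b]), ‖·‖_∞) can be illustrated numerically. The sketch below (a grid-based approximation of the sup norm on [0, 1]; the grid size and the choice of the exponential series are arbitrary) shows the Taylor partial sums of e^x forming a Cauchy sequence in the sup norm whose uniform limit is the continuous function e^x.

```python
import math

# Approximate the sup norm on [0, 1] by sampling on a fixed grid
# (an under-approximation of the true supremum; adequate as an illustration).
xs = [i / 1000 for i in range(1001)]

def sup_dist(f, g):
    return max(abs(f(x) - g(x)) for x in xs)

def taylor_exp(n):
    # n-th partial sum of the exponential series; these form a Cauchy
    # sequence in (C([0,1]), sup norm)
    return lambda x: sum(x**k / math.factorial(k) for k in range(n + 1))

d_cauchy = sup_dist(taylor_exp(20), taylor_exp(40))  # small for large indices
d_limit = sup_dist(taylor_exp(40), math.exp)         # distance to the limit e^x
print(d_cauchy, d_limit)
```

Both printed distances are on the order of floating-point round-off, consistent with uniform convergence of the partial sums.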
By an open cover of a set E in a metric space X we mean a collection {G_α | α ∈ I} of open subsets of X such that E ⊂ ⋃_α G_α. Here I is any index set and

    ⋃_{α∈I} G_α = {x ∈ X | ∃ β ∈ I : x ∈ G_β}.

Definition 6.14 (Covering definition) A subset K of a metric space X is said to be compact if every open cover of K contains a finite subcover. More explicitly, if {G_α} is an open cover of K, then there are finitely many indices α_1, ..., α_n such that K ⊂ G_{α_1} ∪ ··· ∪ G_{α_n}.

Note that the definition does not say that a set is compact if there exists a finite open cover: the whole space X is open and by itself is a cover consisting of one member. Rather, every open cover must have a finite subcover.

Example 6.16 (a) It is clear that every finite set is compact.

(b) Let (x_n) be a sequence in a metric space X converging to x. Then A = {x_n | n ∈ ℕ} ∪ {x} is compact.

Proof. Let {G_α} be any open cover of A. In particular, the limit point x is covered by some member, say G_0. Then there is an n_0 ∈ ℕ such that x_n ∈ G_0 for every n ≥ n_0. Finally, each x_k is covered by some G_k, k = 1, ..., n_0 − 1. Hence the collection {G_k | k = 0, 1, ..., n_0 − 1} is a finite subcover of A; therefore A is compact.

Proposition 6.23 (Sequence definition) A subset K of a metric space X is compact if and only if every sequence in K contains a convergent subsequence with limit in K.

Proof. (a) Let K be compact and suppose to the contrary that (x_n) is a sequence in K without any subsequence converging to a point of K. Then every x ∈ K has a neighborhood U_x containing only finitely many elements of the sequence (x_n). (Otherwise x would be a limit point of (x_n), and there would be a subsequence converging to x.) By construction,

    K ⊂ ⋃_{x∈K} U_x.

Since K is compact, there are finitely many points y_1, ..., y_m ∈ K with K ⊂ U_{y_1} ∪ ··· ∪ U_{y_m}. Since every U_{y_i} contains only finitely many elements of (x_n), there are only finitely many elements of (x_n) in K, a contradiction.
(b) The proof is in the appendix to this chapter.

Remark 6.6 Further properties. (a) A compact subset of a metric space is closed and bounded. (b) A closed subset of a compact set is compact. (c) A subset K of ℝ^k or ℂ^k is compact if and only if K is bounded and closed.

Proof (of (c)). Suppose K is closed and bounded. Let (x_n) be a sequence in K. By Corollary 6.20, (x_n) has a convergent subsequence. Since K is closed, the limit is in K. By the above proposition K is compact. The other direction follows from (a).

(c) Compactness and Continuity

As in the real case (see Theorem 3.6) we have the analogous results for metric spaces.

Proposition 6.24 Let X be a compact metric space.
(a) Let f : X → Y be a continuous mapping into the metric space Y. Then f(X) is compact.
(b) Let f : X → ℝ be a continuous mapping. Then f is bounded and attains its maximum and minimum; that is, there are points p and q in X such that

    f(p) = sup_{x∈X} f(x),   f(q) = inf_{x∈X} f(x).

Proof. (a) Let {G_α} be an open cover of f(X). By Proposition 6.22, f⁻¹(G_α) is open for every α. Hence {f⁻¹(G_α)} is an open cover of X. Since X is compact, there is a finite subcover of X, say {f⁻¹(G_{α_1}), ..., f⁻¹(G_{α_n})}. Then {G_{α_1}, ..., G_{α_n}} is a finite subcover of {G_α} covering f(X). Hence f(X) is compact. We skip (b).

Similarly as for real functions we have the following proposition about uniform continuity. The proof is in the appendix.

Proposition 6.25 Let f : K → ℝ be a continuous function on a compact set K ⊂ ℝ. Then f is uniformly continuous on K.

6.4.6 Continuous Functions in ℝ^k

Proposition 6.26 (a) The projection mapping p_i : ℝ^k → ℝ, i = 1, ..., k, given by p_i(x_1, ..., x_k) = x_i is continuous.
(b) Let U ⊆ ℝ^k be open and f, g : U → ℝ be continuous on U. Then f + g, fg, |f|, and f/g (g ≠ 0) are continuous functions on U.
(c) Let X be a metric space. A mapping f = (f_1, ..., f_k) : X → ℝ^k is continuous if and only if all components f_i : X → ℝ, i = 1, ..., k, are continuous.

Proof. (a) Let (x_n) be a sequence converging to a = (a_1, ..., a_k) ∈ ℝ^k. Then the sequence (p_i(x_n)) converges to a_i = p_i(a) by Proposition 6.19. This shows continuity of p_i at a.

(b) The proofs are quite similar to the proofs in the real case, see Proposition 2.3. As a sample we carry out the proof for fg. Let a ∈ U and put M = max{|f(a)|, |g(a)|}. Let ε > 0, ε < 3M², be given. Since f and g are continuous at a, there exists δ > 0 such that ‖x − a‖ < δ implies

    |f(x) − f(a)| < ε/(3M)   and   |g(x) − g(a)| < ε/(3M).   (6.45)

Note that

    fg(x) − fg(a) = (f(x) − f(a))(g(x) − g(a)) + f(a)(g(x) − g(a)) + g(a)(f(x) − f(a)).

Taking absolute values in this identity and using the triangle inequality as well as (6.45), we obtain that ‖x − a‖ < δ implies

    |fg(x) − fg(a)| ≤ |f(x) − f(a)| |g(x) − g(a)| + |f(a)| |g(x) − g(a)| + |g(a)| |f(x) − f(a)|
                   ≤ ε²/(9M²) + M·ε/(3M) + M·ε/(3M) ≤ ε/3 + ε/3 + ε/3 = ε.

This proves continuity of fg at a.

(c) Suppose first that f is continuous at a ∈ X. Since f_i = p_i ∘ f, f_i is continuous by (a) and Proposition 6.21. Suppose now that all the f_i, i = 1, ..., k, are continuous at a. Let (x_n) be a sequence in X with lim_{n→∞} x_n = a. Since each f_i is continuous, the numerical sequence (f_i(x_n)) converges to f_i(a). By Proposition 6.19, the sequence of vectors f(x_n) converges to f(a); hence f is continuous at a.

Example 6.17 Let f : ℝ³ → ℝ² be given by

    f(x, y, z) = ( sin((x² + e^z)/√(x² + y² + z² + 1)), log|x² + y² + z² + 1| ).

Then f is continuous on ℝ³. Indeed, since products, sums, and compositions of continuous functions are continuous, √(x² + y² + z² + 1) is a continuous function on ℝ³. We also made use of Proposition 6.26 (a); the coordinate functions x, y, and z are continuous. Since the denominator is nonzero, f_1(x, y, z) = sin((x² + e^z)/√(x² + y² + z² + 1)) is continuous.
Since x² + y² + z² + 1 > 0, f_2(x, y, z) = log|x² + y² + z² + 1| is continuous. By Proposition 6.26 (c), f is continuous.

6.5 Appendix E

(a) A compact subset is closed

Proof. Let K be a compact subset of a metric space X. We shall prove that the complement of K is an open subset of X. Suppose p ∈ X, p ∉ K. For q ∈ K, let V_q and U_q be neighborhoods of p and q, respectively, of radius less than d(p, q)/2. Since K is compact, there are finitely many points q_1, ..., q_n in K such that

    K ⊂ U_{q_1} ∪ ··· ∪ U_{q_n} =: U.

If V = V_{q_1} ∩ ··· ∩ V_{q_n}, then V is a neighborhood of p which does not intersect U. Hence V ⊂ K^c, so that p is an interior point of K^c, and K is closed.

We show that K is bounded. Let ε > 0 be given. Since K is compact, the open cover {U_ε(x) | x ∈ K} of K has a finite subcover, say {U_ε(x_1), ..., U_ε(x_n)}. Let U = ⋃_{i=1}^n U_ε(x_i); then the maximal distance of two points x and y in U is bounded by

    2ε + Σ_{1≤i<j≤n} d(x_i, x_j).

A closed subset of a compact set is compact

Proof. Suppose F ⊂ K ⊂ X, F is closed in X, and K is compact. Let {U_α} be an open cover of F. Since F^c is open, {U_α, F^c} is an open cover Ω of K. Since K is compact, there is a finite subcollection Φ of Ω which covers K. If F^c is a member of Φ, we may remove it from Φ and still retain an open cover of F. Thus we have shown that a finite subcollection of {U_α} covers F.

Equivalence of Compactness and Sequential Compactness

Proof of Proposition 6.23 (b). This direction is harder to prove. It does not work in arbitrary topological spaces and essentially uses that X is a metric space. The proof is roughly along the lines of Exercises 22 to 26 in [Rud76]; we give the proof of Bredon (see [Bre97, 9.4 Theorem]).

Suppose that every sequence in K contains a subsequence converging in K.

1) K contains a countable dense set. For this, we show that for every fixed ε > 0, K can be covered by a finite number of ε-balls.
Suppose this is not true, i.e. K cannot be covered by any finite number of ε-balls. Then we construct a sequence (x_n) as follows. Take an arbitrary x_1. Suppose x_1, ..., x_n are already found; since K is not covered by a finite number of ε-balls, we find x_{n+1} whose distance to every preceding element of the sequence is greater than or equal to ε. By assumption, (x_n) has a limit point x; consider an ε/2-neighborhood U of x. Almost all elements of a suitable subsequence of (x_n) belong to U, in particular two elements x_r and x_s with s > r. Since both are in U, their distance is less than ε. But this contradicts the construction of the sequence. Now take the union of all those finite sets corresponding to ε = 1/n, n ∈ ℕ. This is a countable dense subset of K.

2) Any open cover {U_α} of K has a countable subcover. Let x ∈ K be given. Since {U_α}_{α∈I} is an open cover of K, we find β ∈ I and n ∈ ℕ such that U_{2/n}(x) ⊂ U_β. Further, since {x_i}_{i∈ℕ} is dense in K, we find i ∈ ℕ such that d(x, x_i) < 1/n. By the triangle inequality,

    x ∈ U_{1/n}(x_i) ⊂ U_{2/n}(x) ⊂ U_β.

To each of the countably many U_{1/n}(x_i) choose one U_β ⊃ U_{1/n}(x_i). This is a countable subcover of {U_α}.

3) Rename the countable subcover as {V_n}_{n∈ℕ} and consider the decreasing sequence of closed sets

    C_n = K \ ⋃_{k=1}^n V_k,   C_1 ⊃ C_2 ⊃ ···.

If some C_k = ∅, we have found a finite subcover, namely V_1, V_2, ..., V_k. Suppose that all the C_n are nonempty, say x_n ∈ C_n. By assumption, a subsequence (x_{n_i}) converges to some x ∈ K. Since x_{n_i} ∈ C_m for all n_i ≥ m and C_m is closed, x ∈ C_m for all m. Hence x ∈ ⋂_{m∈ℕ} C_m. However,

    ⋂_{m∈ℕ} C_m = K \ ⋃_{m∈ℕ} V_m = ∅.

This contradiction completes the proof.

Proof of Proposition 6.25. Let ε > 0 be given. Since f is continuous, we can associate to each point p ∈ K a positive number δ(p) such that q ∈ K ∩ U_{δ(p)}(p) implies |f(q) − f(p)| < ε/2. Let

    J(p) = {q ∈ K | |p − q| < δ(p)/2}.
Since p ∈ J(p), the collection {J(p) | p ∈ K} is an open cover of K; and since K is compact, there is a finite set of points p_1, ..., p_n in K such that

    K ⊂ J(p_1) ∪ ··· ∪ J(p_n).   (6.46)

We put δ := (1/2) min{δ(p_1), ..., δ(p_n)}. Then δ > 0. Now let p and q be points of K with |p − q| < δ. By (6.46), there is an integer m, 1 ≤ m ≤ n, such that p ∈ J(p_m); hence

    |p − p_m| < (1/2) δ(p_m),

and we also have

    |q − p_m| ≤ |p − q| + |p − p_m| < δ + (1/2) δ(p_m) ≤ δ(p_m).

Finally, continuity at p_m gives

    |f(p) − f(q)| ≤ |f(p) − f(p_m)| + |f(p_m) − f(q)| < ε.

Proposition 6.27 There exists a real continuous function on the real line which is nowhere differentiable.

Proof. Define φ(x) = |x|, x ∈ [−1, 1], and extend the definition of φ to all real x by requiring periodicity, φ(x + 2) = φ(x). Then for all s, t ∈ ℝ,

    |φ(s) − φ(t)| ≤ |s − t|.   (6.47)

In particular, φ is continuous on ℝ. Define

    f(x) = Σ_{n=0}^∞ (3/4)^n φ(4^n x).   (6.48)

Since 0 ≤ φ ≤ 1, Theorem 6.2 shows that the series (6.48) converges uniformly on ℝ. By Theorem 6.5, f is continuous on ℝ.

Now fix a real number x and a positive integer m ∈ ℕ. Put

    δ_m = ±(1/2)·4^{−m},

where the sign is chosen so that no integer lies between 4^m x and 4^m(x + δ_m). This can be done since 4^m |δ_m| = 1/2. It follows that |φ(4^m x) − φ(4^m x + 4^m δ_m)| = 1/2. Define

    γ_n = (φ(4^n(x + δ_m)) − φ(4^n x))/δ_m.

When n > m, then 4^n δ_m is an even integer, so that γ_n = 0 by periodicity of φ. When 0 ≤ n < m, (6.47) implies |γ_n| ≤ 4^n. Since |γ_m| = 4^m, we conclude that

    |(f(x + δ_m) − f(x))/δ_m| = |Σ_{n=0}^m (3/4)^n γ_n| ≥ 3^m − Σ_{n=0}^{m−1} 3^n = (3^m + 1)/2.

As m → ∞, δ_m → 0 while the difference quotients are unbounded. It follows that f is not differentiable at x.

Proof of Abel's Limit Theorem, Proposition 6.9. By Proposition 6.4, the series converges on (−1, 1) and the limit function is continuous there since the radius of convergence is at least 1, by assumption. Hence it suffices to prove continuity at x = 1, i.e.
that lim_{x→1−0} f(x) = f(1).

Put r_n = Σ_{k=n}^∞ a_k; then r_0 = f(1) and r_{n+1} − r_n = −a_n for all nonnegative integers n ∈ ℤ_+, and lim_{n→∞} r_n = 0. Hence there is a constant C with |r_n| ≤ C, and the series Σ_{n=0}^∞ r_{n+1} x^n converges for |x| < 1 by the comparison test. We have

    (1 − x) Σ_{n=0}^∞ r_{n+1} x^n = Σ_{n=0}^∞ r_{n+1} x^n − Σ_{n=0}^∞ r_{n+1} x^{n+1}
                                 = Σ_{n=0}^∞ r_{n+1} x^n − Σ_{n=0}^∞ r_n x^n + r_0
                                 = −Σ_{n=0}^∞ a_n x^n + f(1);

hence,

    f(1) − f(x) = (1 − x) Σ_{n=0}^∞ r_{n+1} x^n.

Let ε > 0 be given. Choose N ∈ ℕ such that n ≥ N implies |r_n| < ε. Put δ = ε/(CN); then x ∈ (1 − δ, 1) implies

    |f(1) − f(x)| ≤ (1 − x) Σ_{n=0}^{N−1} |r_{n+1}| x^n + (1 − x) Σ_{n=N}^∞ |r_{n+1}| x^n
                  ≤ (1 − x)CN + (1 − x) ε Σ_{n=0}^∞ x^n ≤ ε + ε = 2ε;

hence f(x) tends to f(1) as x → 1 − 0.

Definition 6.15 If X is a metric space, C(X) will denote the set of all continuous, bounded functions with domain X. We associate with each f ∈ C(X) its supremum norm

    ‖f‖_∞ = ‖f‖ = sup_{x∈X} |f(x)|.   (6.49)

Since f is assumed to be bounded, ‖f‖ < ∞. Note that the boundedness assumption is redundant if X is a compact metric space (Proposition 6.24); thus C(X) consists of all continuous functions on X in that case.

It is clear that C(X) is a vector space, since the sum of bounded functions is again a bounded function (see the triangle inequality below) and the sum of continuous functions is a continuous function (see Proposition 6.26). We show that ‖f‖_∞ is indeed a norm on C(X).

(i) Obviously, ‖f‖_∞ ≥ 0 since the absolute value |f(x)| is nonnegative. Further, ‖0‖ = 0. Suppose now ‖f‖ = 0. This implies |f(x)| = 0 for all x; hence f = 0.

(ii) Clearly, for every (real or complex) number λ we have

    ‖λf‖ = sup_{x∈X} |λf(x)| = |λ| sup_{x∈X} |f(x)| = |λ| ‖f‖.

(iii) If h = f + g, then

    |h(x)| ≤ |f(x)| + |g(x)| ≤ ‖f‖ + ‖g‖,   x ∈ X;

hence ‖f + g‖ ≤ ‖f‖ + ‖g‖.

We have thus made C(X) into a normed vector space.
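The norm axioms (i)-(iii) can be tried out numerically. The following sketch approximates the sup norm (6.49) on X = [0, 2π] by sampling on a grid; the choice of X, the grid size, and the test functions sin and cos are arbitrary.

```python
import math

# Grid-based approximation of the supremum norm on X = [0, 2*pi]
xs = [2 * math.pi * i / 2000 for i in range(2001)]

def sup_norm(f):
    return max(abs(f(x)) for x in xs)

n_f = sup_norm(math.sin)                                # = 1
n_g = sup_norm(math.cos)                                # = 1
n_sum = sup_norm(lambda x: math.sin(x) + math.cos(x))   # = sqrt(2)

# Triangle inequality (iii); note the inequality can be strict:
print(n_sum <= n_f + n_g, n_sum)
```

Here ‖sin + cos‖_∞ = √2 < 2 = ‖sin‖_∞ + ‖cos‖_∞, so the triangle inequality holds strictly for this pair.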
Remark 6.1 can be rephrased as follows: a sequence (f_n) converges to f with respect to the norm in C(X) if and only if f_n → f uniformly on X. Accordingly, closed subsets of C(X) are sometimes called uniformly closed, the closure of a set A ⊂ C(X) is called the uniform closure, and so on.

Theorem 6.28 The above norm makes C(X) into a Banach space (a complete normed space).

Proof. Let (f_n) be a Cauchy sequence in C(X). This means that to every ε > 0 corresponds an n_0 ∈ ℕ such that n, m ≥ n_0 implies ‖f_n − f_m‖ < ε. It follows by Proposition 6.1 that there is a function f with domain X to which (f_n) converges uniformly. By Theorem 6.5, f is continuous. Moreover, f is bounded, since there is an n such that |f(x) − f_n(x)| < 1 for all x ∈ X, and f_n is bounded. Thus f ∈ C(X), and since f_n → f uniformly on X, we have ‖f − f_n‖ → 0 as n → ∞.

Chapter 7 Calculus of Functions of Several Variables

In this chapter we consider functions f : U → ℝ or f : U → ℝ^m, where U ⊂ ℝ^n is an open set. In Subsection 6.4.6 we collected the main properties of continuous functions f. Now we will study differentiation and integration of such functions in more detail.

The Norm of a Linear Mapping

Proposition 7.1 Let T ∈ L(ℝ^n, ℝ^m) be a linear mapping of the euclidean space ℝ^n into ℝ^m.
(a) Then there exists some C > 0 such that

    ‖T(x)‖_2 ≤ C ‖x‖_2   for all x ∈ ℝ^n.   (7.1)

(b) T is uniformly continuous on ℝ^n.

Proof. (a) Using the standard bases of ℝ^n and ℝ^m we identify T with its matrix T = (a_{ij}), T e_j = Σ_{i=1}^m a_{ij} e_i. For x = (x_1, ..., x_n) we have

    T(x) = (Σ_{j=1}^n a_{1j} x_j, ..., Σ_{j=1}^n a_{mj} x_j);

hence by the Cauchy–Schwarz inequality we have

    ‖T(x)‖_2² = Σ_{i=1}^m (Σ_{j=1}^n a_{ij} x_j)² ≤ Σ_{i=1}^m (Σ_{j=1}^n a_{ij}²)(Σ_{j=1}^n x_j²) = C² ‖x‖_2²,

where C = √(Σ_{i,j} a_{ij}²). Consequently, ‖Tx‖ ≤ C ‖x‖.

(b) Let ε > 0. Put δ = ε/C with the above C. Then ‖x − y‖ < δ implies

    ‖Tx − Ty‖ = ‖T(x − y)‖ ≤ C ‖x − y‖ < ε,

which proves (b).
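The Cauchy–Schwarz estimate in the proof of Proposition 7.1 can be checked numerically. In the sketch below (the matrix and the random sample vectors are arbitrary choices), C = √(Σ a_{ij}²) is the constant from the proof; it is in general not the smallest constant satisfying (7.1).

```python
import math
import random

# A linear map T : R^3 -> R^2 identified with its matrix (a_ij)
A = [[1.0, -2.0, 0.5],
     [0.0,  3.0, 1.0]]

# The constant from the Cauchy-Schwarz estimate in the proof
C = math.sqrt(sum(a * a for row in A for a in row))

def apply_T(x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def norm2(v):
    return math.sqrt(sum(t * t for t in v))

random.seed(0)
ok = all(
    norm2(apply_T(x)) <= C * norm2(x) + 1e-12
    for x in ([random.uniform(-10, 10) for _ in range(3)] for _ in range(1000))
)
print(ok)  # the bound ||T(x)||_2 <= C ||x||_2 holds for every sample
```

Since C dominates the operator norm of T, the inequality holds for all x, not only for the sampled vectors.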
Definition 7.1 Let V and W be normed vector spaces and A ∈ L(V, W). The smallest number C satisfying (7.1) is called the norm of the linear map A and is denoted by ‖A‖:

    ‖A‖ = inf{C | ‖Ax‖ ≤ C ‖x‖ for all x ∈ V}.   (7.2)

By definition,

    ‖Ax‖ ≤ ‖A‖ ‖x‖.   (7.3)

Let T ∈ L(ℝ^n, ℝ^m) be a linear mapping. One can show that

    ‖T‖ = sup_{x≠0} ‖Tx‖/‖x‖ = sup_{‖x‖=1} ‖Tx‖ = sup_{‖x‖≤1} ‖Tx‖.

7.1 Partial Derivatives

We consider functions f : U → ℝ where U ⊂ ℝ^n is an open set. We want to find derivatives "one variable at a time."

Definition 7.2 Let U ⊂ ℝ^n be open and f : U → ℝ a real function. Then f is called partially differentiable at a = (a_1, ..., a_n) ∈ U with respect to the ith coordinate if the limit

    D_i f(a) = lim_{h→0} (f(a_1, ..., a_i + h, ..., a_n) − f(a_1, ..., a_n))/h   (7.4)

exists, where h is real and sufficiently small (such that (a_1, ..., a_i + h, ..., a_n) ∈ U). D_i f(a) is called the ith partial derivative of f at a. We also use the notations

    D_i f(a) = ∂f/∂x_i (a) = f_{x_i}(a).

It is important that D_i f(a) is the ordinary derivative of a certain function; in fact, if g(x) = f(a_1, ..., x, ..., a_n), then D_i f(a) = g′(a_i). That is, D_i f(a) is the slope of the tangent line at (a, f(a)) to the curve obtained by intersecting the graph of f with the planes x_j = a_j, j ≠ i. It also means that the computation of D_i f(a) is a problem we can already solve.

Example 7.1 (a) f(x, y) = sin(xy²). Then D_1 f(x, y) = y² cos(xy²) and D_2 f(x, y) = 2xy cos(xy²).

(b) Consider the radius function r : ℝ^n → ℝ,

    r(x) = ‖x‖_2 = √(x_1² + ··· + x_n²),   x = (x_1, ..., x_n) ∈ ℝ^n.

Then r is partially differentiable on ℝ^n \ {0} with

    ∂r/∂x_i (x) = x_i/r(x),   x ≠ 0.   (7.5)

Indeed, the function

    f(ξ) = √(x_1² + ··· + ξ² + ··· + x_n²)

is differentiable, where x_1, ..., x_{i−1}, x_{i+1}, ..., x_n are considered to be constant.
Using the chain rule one obtains (with ξ = x_i)

    ∂r/∂x_i (x) = f′(ξ) = (1/2)·2ξ/√(x_1² + ··· + ξ² + ··· + x_n²) = x_i/r.

(c) Let f : (0, +∞) → ℝ be differentiable. The composition x ↦ f(r(x)) (with the above radius function r) is denoted by f(r); it is partially differentiable on ℝ^n \ {0}. The chain rule gives

    ∂/∂x_i f(r) = f′(r) ∂r/∂x_i = f′(r) x_i/r.

(d) Partial differentiability does not imply continuity. Define

    f(x, y) = xy/(x² + y²)² = xy/r⁴ for (x, y) ≠ (0, 0),   f(0, 0) = 0.

Obviously, f is partially differentiable on ℝ² \ {0}. At the origin, by the definition of the partial derivative,

    ∂f/∂x (0, 0) = lim_{h→0} f(h, 0)/h = lim_{h→0} 0 = 0.

Since f is symmetric in x and y, ∂f/∂y (0, 0) = 0, too. However, f is not continuous at 0 since f(ε, ε) = 1/(4ε²) becomes large as ε tends to 0.

Remark 7.1 In the next section we will become acquainted with a stronger notion of differentiability which implies continuity. In particular, a continuously partially differentiable function is continuous.

Definition 7.3 Let U ⊂ ℝ^n be open and f : U → ℝ partially differentiable. The vector

    grad f(x) = (∂f/∂x_1 (x), ..., ∂f/∂x_n (x))   (7.6)

is called the gradient of f at x ∈ U.

Example 7.2 (a) For the radius function r(x) defined in Example 7.1 (b) we have

    grad r(x) = x/r.

Note that x/r is a unit vector (of euclidean norm 1) in the direction of x. With the notations of Example 7.1 (c),

    grad f(r) = f′(r) x/r.   (7.7)

(b) Let f, g : U → ℝ be partially differentiable functions. Then we have the following product rule:

    grad (fg) = g grad f + f grad g.

This is immediate from the product rule for functions of one variable,

    ∂(fg)/∂x_i = (∂f/∂x_i) g + f (∂g/∂x_i).

(c) f(x, y) = x^y. Then grad f(x, y) = (y x^{y−1}, x^y log x).

Notation. Instead of grad f one also writes ∇f ("nabla f"). ∇ is a vector-valued differential operator:

    ∇ = (∂/∂x_1, ..., ∂/∂x_n).

Definition 7.4 Let U ⊂ ℝ^n. A vector field on U is a mapping v = (v_1, ..., v_n) : U → ℝ^n.   (7.8)
To every point x ∈ U there is associated a vector v(x) ∈ ℝ^n. If the vector field v is partially differentiable (i.e. all components v_i are partially differentiable), then

    div v = Σ_{i=1}^n ∂v_i/∂x_i   (7.9)

is called the divergence of the vector field v. Formally, the divergence of v can be written as an inner product of ∇ and v:

    div v = ∇·v = Σ_{i=1}^n (∂/∂x_i) v_i.

The product rule gives the following rule for the divergence. Let f : U → ℝ be a partially differentiable function and v = (v_1, ..., v_n) : U → ℝ^n a partially differentiable vector field; then

    ∂(f v_i)/∂x_i = (∂f/∂x_i)·v_i + f·(∂v_i/∂x_i).

Summation over i gives

    div (f v) = grad f · v + f div v.

Using the nabla operator this can be rewritten as ∇·(f v) = ∇f·v + f ∇·v.

Example 7.3 Let F : ℝ^n \ {0} → ℝ^n be the vector field F(x) = x/r, r = ‖x‖. Since

    div x = Σ_{i=1}^n ∂x_i/∂x_i = n

and x·x = r², Example 7.2 gives, with v = x and f(r) = 1/r,

    div (x/r) = grad (1/r)·x + (1/r) div x = −(x/r³)·x + n/r = (n − 1)/r.   (7.10)

7.1.1 Higher Partial Derivatives

Let U ⊂ ℝ^n be open and f : U → ℝ a partially differentiable function. If all partial derivatives D_i f : U → ℝ are again partially differentiable, f is called twice partially differentiable, and we can form the partial derivatives D_j D_i f of the second order. More generally, f : U → ℝ is said to be (k + 1)-times partially differentiable if it is k-times partially differentiable and all partial derivatives of order k,

    D_{i_k} D_{i_{k−1}} ··· D_{i_1} f : U → ℝ,

are partially differentiable. A function f : U → ℝ is said to be k-times continuously partially differentiable if it is k-times partially differentiable and all partial derivatives of order less than or equal to k are continuous. The set of all such functions on U is denoted by C^k(U). We also use the notation

    D_j D_i f = ∂²f/(∂x_j ∂x_i) = f_{x_i x_j},   D_i D_i f = ∂²f/∂x_i²,   D_{i_k} ··· D_{i_1} f = ∂^k f/(∂x_{i_k} ··· ∂x_{i_1}).

Example. Let f(x, y) = sin(xy²). One easily sees that

    f_{yx} = f_{xy} = 2y cos(xy²) − 2xy³ sin(xy²).
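The equality f_{xy} = f_{yx} in this example can be checked numerically with a symmetric second difference quotient; the step size and the test point below are arbitrary choices.

```python
import math

def f(x, y):
    return math.sin(x * y * y)

def mixed_fd(x, y, h=1e-4):
    # symmetric second difference approximating the mixed partial derivative;
    # it approximates f_xy and f_yx simultaneously
    return (f(x + h, y + h) - f(x + h, y - h)
            - f(x - h, y + h) + f(x - h, y - h)) / (4 * h * h)

def mixed_exact(x, y):
    # f_xy = f_yx = 2y*cos(x*y^2) - 2x*y^3*sin(x*y^2)
    return 2 * y * math.cos(x * y * y) - 2 * x * y**3 * math.sin(x * y * y)

x0, y0 = 0.7, 1.3
print(abs(mixed_fd(x0, y0) - mixed_exact(x0, y0)))  # small
```

The finite-difference value agrees with the closed-form mixed derivative up to the discretization error of order h².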
Proposition 7.2 (Schwarz’s Lemma) Let U ⊂ Rn be open and f : U → R be twice continuously partial differentiable. Then for every a ∈ U and all i, j = 1, . . . , n we have Dj Di f (a) = Di Dj f (a). (7.11) Proof. Without loss of generality we assume n = 2, i = 1, j = 2, and a = 0; we write (x, y) in place of (x1 , x2 ). Since U is open, there is a small square of length 2δ > 0 completely contained in U: {(x, y) ∈ R2 | | x | < δ, | y | < δ} ⊂ U. For fixed y ∈ Uδ (0) define the function F : (−δ, δ) → R via F (x) = f (x, y) − f (x, 0). By the mean value theorem (Theorem 4.9) there is a ξ with | ξ | ≤ | x | such that F (x) − F (0) = xF ′ (ξ). But F ′ (ξ) = fx (ξ, y) − fx (ξ, 0). Applying the mean value theorem to the function h(y) = fx (ξ, y), there is an η with | η | ≤ | y | and fx (ξ, y) − fx (ξ, 0) = h′ (y)y = ∂ fx (ξ, η) y = fxy (ξ, η) y. ∂y Altogether we have F (x) − F (0) = f (x, y) − f (x, 0) − f (0, y) + f (0, 0) = fxy (ξ, η)xy. (7.12) 7 Calculus of Functions of Several Variables 198 The same arguments but starting with the function G(y) = f (x, y) − f (0, y) show the existence of ξ ′ and η ′ with | ξ ′ | ≤ | x |, | η ′ | ≤ | y | and f (x, y) − f (x, 0) − f (0, y) + f (0, 0) = fxy (ξ ′, η ′ ) xy. (7.13) From (7.12) and (7.13) for xy 6= 0 it follows that fxy (ξ, η) = fxy (ξ ′, η ′ ). If (x, y) approaches (0, 0) so do (ξ, η) and (ξ ′ , η ′ ). Since fxy and fyx are both continuous it follows from the above equation D2 D1 f (0, 0) = D1 D2 f (0, 0). Corollary 7.3 Let U ⊂ Rn be open and f : U → Rn be k-times continuously partial differentiable. Then Dik · · · Di1 f = Diπ(k) · · · Diπ(1) f for every permutation π of 1, . . . , k. Proof. The proof is by induction on k using the fact that any permutation can be written as a product of transpositions (j ↔ j + 1). Example 7.4 Let U ⊂ R3 be open and let v : U → R3 be a partial differentiable vector field. One defines a new vector field curl v : U → R3 , the curl of v by ∂v3 ∂v2 ∂v1 ∂v3 ∂v2 ∂v1 curl v = . 
Formally one can think of curl v as the vector product of ∇ and v,

    curl v = ∇ × v = det ( e_1, e_2, e_3 ; ∂/∂x_1, ∂/∂x_2, ∂/∂x_3 ; v_1, v_2, v_3 ),

where e_1, e_2, and e_3 are the unit vectors in ℝ³ and the formal determinant is expanded along the first row. If f : U → ℝ has continuous second partial derivatives then, by Proposition 7.2,

    curl grad f = 0.   (7.15)

Indeed, the first coordinate of curl grad f is by definition

    ∂²f/(∂x_2 ∂x_3) − ∂²f/(∂x_3 ∂x_2) = 0.

The other two components are obtained by cyclic permutation of the indices. We have found: curl v = 0 is a necessary condition for a continuously partially differentiable vector field v : U → ℝ³ to be the gradient of a function f : U → ℝ.

7.1.2 The Laplacian

Let U ⊂ ℝ^n be open and f ∈ C²(U). Put

    ∆f = div grad f = ∂²f/∂x_1² + ··· + ∂²f/∂x_n²,   (7.16)

and call

    ∆ = ∂²/∂x_1² + ··· + ∂²/∂x_n²

the Laplacian or Laplace operator. The equation ∆f = 0 is called the Laplace equation; its solutions are the harmonic functions. If f depends on an additional time variable t, f : U × I → ℝ, (x, t) ↦ f(x, t), one considers the so-called wave equation

    f_tt − a² ∆f = 0,   (7.17)

and the so-called heat equation

    f_t − k ∆f = 0.   (7.18)

Example 7.5 Let f : (0, +∞) → ℝ be twice continuously differentiable. We want to compute the Laplacian ∆f(r), r = ‖x‖, x ∈ ℝ^n \ {0}. By Example 7.2 we have

    grad f(r) = f′(r) x/r,

and by the product rule and Example 7.3 we obtain

    ∆f(r) = div grad f(r) = grad f′(r)·(x/r) + f′(r) div (x/r) = f′′(r) (x/r)·(x/r) + f′(r) (n − 1)/r;

thus

    ∆f(r) = f′′(r) + ((n − 1)/r) f′(r).

In particular, ∆(1/r^{n−2}) = 0 if n ≥ 3 and ∆ log r = 0 if n = 2. Prove!

7.2 Total Differentiation

In this section we define (total) differentiability of a function f from ℝ^n to ℝ^m. Roughly speaking, f is differentiable at some point if it can be approximated there by a linear mapping. In contrast to partial differentiability we need not refer to single coordinates. Differentiable functions are continuous. In this section U always denotes an open subset of ℝ^n.
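Before moving on, the radial Laplacian formula above can be sanity-checked numerically for the case n = 3, f(r) = 1/r, where ∆(1/r) = 0 away from the origin. This is only a finite-difference sketch; the step size and test point are arbitrary choices.

```python
import math

def u(p):
    # u(x) = 1/r = 1/||x||_2, the case n = 3, f(r) = 1/r^(n-2)
    return 1.0 / math.sqrt(sum(t * t for t in p))

def laplacian(g, p, h=1e-4):
    # sum of second central differences in each coordinate direction
    total = 0.0
    for i in range(len(p)):
        q_plus = list(p)
        q_plus[i] += h
        q_minus = list(p)
        q_minus[i] -= h
        total += (g(q_plus) - 2.0 * g(p) + g(q_minus)) / (h * h)
    return total

p0 = [0.8, -0.5, 1.1]
print(abs(laplacian(u, p0)))  # close to 0: u is harmonic away from the origin
```

The same function laplacian can be used to check that ∆ log r = 0 for points in the plane (n = 2).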
The vector space of linear mappings of a vector space V into a vector space W will be denoted by L(V, W).

Motivation: if f : ℝ → ℝ is differentiable at x ∈ ℝ and f′(x) = a, then

    lim_{h→0} (f(x + h) − f(x) − a·h)/h = 0.

Note that the mapping h ↦ ah is linear from ℝ → ℝ, and any linear mapping ℝ → ℝ is of that form.

Definition 7.5 The mapping f : U → ℝ^m is said to be differentiable at a point x ∈ U if there exists a linear map A : ℝ^n → ℝ^m such that

    lim_{h→0} ‖f(x + h) − f(x) − A(h)‖/‖h‖ = 0.   (7.19)

The linear map A ∈ L(ℝ^n, ℝ^m) is called the derivative of f at x and will be denoted by Df(x). In case n = m = 1 this notion coincides with the ordinary differentiability of a function.

Remark 7.2 We reformulate the definition of differentiability of f at a ∈ U: define a function φ_a : U_ε(0) ⊂ ℝ^n → ℝ^m (depending on both a and h) by

    f(a + h) = f(a) + A(h) + φ_a(h).   (7.20)

Then f is differentiable at a if and only if lim_{h→0} ‖φ_a(h)‖/‖h‖ = 0. Replacing the right-hand side of (7.20) by f(a) + A(h) (forgetting about φ_a) and inserting x in place of a + h and Df(a) in place of A, we obtain the linearization L : ℝ^n → ℝ^m of f at a:

    L(x) = f(a) + Df(a)(x − a).   (7.21)

Lemma 7.4 If f is differentiable at x ∈ U, the linear mapping A is uniquely determined.

Proof. Throughout we refer to the euclidean norms on ℝ^n and ℝ^m. Suppose that A′ ∈ L(ℝ^n, ℝ^m) is another linear mapping satisfying (7.19). Then for h ∈ ℝ^n, h ≠ 0,

    ‖A(h) − A′(h)‖ = ‖f(x + h) − f(x) − A′(h) − (f(x + h) − f(x) − A(h))‖
                   ≤ ‖f(x + h) − f(x) − A(h)‖ + ‖f(x + h) − f(x) − A′(h)‖,

so

    ‖A(h) − A′(h)‖/‖h‖ ≤ ‖f(x + h) − f(x) − A(h)‖/‖h‖ + ‖f(x + h) − f(x) − A′(h)‖/‖h‖.

Since the limit of the right-hand side as h → 0 exists and equals 0, the left-hand side also tends to 0 as h → 0; that is,

    lim_{h→0} ‖A(h) − A′(h)‖/‖h‖ = 0.

Now fix h_0 ∈ ℝ^n, h_0 ≠ 0, and put h = t h_0, t ∈ ℝ, t → 0. Then h → 0 and hence

    0 = lim_{t→0} ‖A(t h_0) − A′(t h_0)‖/‖t h_0‖ = lim_{t→0} |t| ‖A(h_0) − A′(h_0)‖/(|t| ‖h_0‖) = ‖A(h_0) − A′(h_0)‖/‖h_0‖.
Hence ‖A(h_0) − A′(h_0)‖ = 0, which implies A(h_0) = A′(h_0); thus A = A′.

Definition 7.6 The matrix (a_{ij}) ∈ ℝ^{m×n} of the linear map Df(x) with respect to the standard bases in ℝ^n and ℝ^m is called the Jacobi matrix of f at x. It is denoted by f′(x); that is,

    a_{ij} = (f′(x))_{ij} = e_i · Df(x)(e_j).

Example. Let f : ℝ^n → ℝ^m be linear, f(x) = B(x) with B ∈ L(ℝ^n, ℝ^m). Then Df(x) = B is the constant linear mapping. Indeed,

    f(x + h) − f(x) − B(h) = B(x + h) − B(x) − B(h) = 0

since B is linear. Hence lim_{h→0} ‖f(x + h) − f(x) − B(h)‖/‖h‖ = 0, which proves the claim.

Remark 7.3 (a) Writing h = (h_1, ..., h_n)^⊤ as a column vector, Df(x)(h) is given by matrix multiplication:

    Df(x)(h) = f′(x)·h = (Σ_{j=1}^n a_{1j} h_j, ..., Σ_{j=1}^n a_{mj} h_j)^⊤.

Once the standard basis in ℝ^m is chosen, we can write f(x) = (f_1(x), ..., f_m(x)) as a vector of m scalar functions f_i : ℝ^n → ℝ. By Proposition 6.19 the limit

    lim_{h→0} (1/‖h‖)(f(x + h) − f(x) − Df(x)(h))

exists and is equal to 0 if and only if the limit exists and is 0 for every coordinate i = 1, ..., m:

    lim_{h→0} (1/‖h‖) (f_i(x + h) − f_i(x) − Σ_{j=1}^n a_{ij} h_j) = 0,   i = 1, ..., m.   (7.22)

We see that f is differentiable at x if and only if all f_i, i = 1, ..., m, are. In this case the Jacobi matrix f′(x) is just the stack of the row vectors f_i′(x) = (a_{i1}, a_{i2}, ..., a_{in}), i = 1, ..., m.

(b) In the case m = 1 (f = f_1), f′(a) ∈ ℝ^{1×n} is a linear functional (a row vector). Proposition 7.6 below will show that f′(a) = grad f(a). The linearization L(x) of f at a is given by the linear functional Df(a) : ℝ^n → ℝ. The graph of L is an n-dimensional hyperplane in ℝ^{n+1} touching the graph of f at the point (a, f(a)).
In coordinates, the linearization (hyperplane equation) is

    x_{n+1} = L(x) = f(a) + f′(a)·(x − a).

Here f′(a) is the row vector corresponding to the linear functional Df(a) : ℝ^n → ℝ w.r.t. the standard basis.

Example 7.6 Let C = (c_{ij}) ∈ ℝ^{n×n} be a symmetric n × n matrix, that is, c_{ij} = c_{ji} for all i, j = 1, ..., n, and define f : ℝ^n → ℝ by

    f(x) = x·C(x) = Σ_{i,j=1}^n c_{ij} x_i x_j,   x = (x_1, ..., x_n) ∈ ℝ^n.

If a, h ∈ ℝ^n we have

    f(a + h) = (a + h)·C(a + h) = a·C(a) + h·C(a) + a·C(h) + h·C(h)
             = a·C(a) + 2C(a)·h + h·C(h) = f(a) + v·h + φ(h),

where v = 2C(a) and φ(h) = h·C(h). Since, by the Cauchy–Schwarz inequality,

    |φ(h)| = |h·C(h)| ≤ ‖h‖ ‖C(h)‖ ≤ ‖C‖ ‖h‖²,

we have lim_{h→0} φ(h)/‖h‖ = 0. This proves f to be differentiable at a ∈ ℝ^n with derivative Df(a)(x) = 2C(a)·x. The Jacobi matrix is the row vector f′(a) = (2C(a))^⊤.

7.2.1 Basic Theorems

Lemma 7.5 If f : U → ℝ^m is differentiable at x, then f is continuous at x.

Proof. Define φ_x(h) as in Remark 7.2 with Df(x) = A; then lim_{h→0} ‖φ_x(h)‖ = 0 since f is differentiable at x. Since A is continuous by Proposition 7.1, lim_{h→0} A(h) = A(0) = 0. This gives

    lim_{h→0} f(x + h) = f(x) + lim_{h→0} A(h) + lim_{h→0} φ_x(h) = f(x).

This shows continuity of f at x.

Proposition 7.6 Let f : U → ℝ^m, f(x) = (f_1(x), ..., f_m(x)), be differentiable at x ∈ U. Then all partial derivatives ∂f_i/∂x_j (x), i = 1, ..., m, j = 1, ..., n, exist, and the Jacobi matrix f′(x) ∈ ℝ^{m×n} has the form

    f′(x) = (a_{ij}) = (∂f_i/∂x_j (x)),   i = 1, ..., m,  j = 1, ..., n.   (7.23)

Notation. (a) For the Jacobi matrix we also use the notation

    f′(x) = ∂(f_1, ..., f_m)/∂(x_1, ..., x_n) (x).

(b) In case n = m the determinant det(f′(x)) of the Jacobi matrix is called the Jacobian or functional determinant of f at x. It is denoted by

    det(f′(x)) = ∂(f_1, ..., f_n)/∂(x_1, ..., x_n) (x).

Proof.
Inserting h = t ej = (0, …, t, …, 0) into (7.22) (see Remark 7.3) we have, since ‖h‖ = |t| and hk = t δkj for all k = 1, …, n,

0 = lim_{t→0} | fi(x + t ej) − fi(x) − Σ_{k=1}^n aik hk | / ‖t ej‖
  = lim_{t→0} | fi(x1, …, xj + t, …, xn) − fi(x) − t aij | / |t|
  = lim_{t→0} | (fi(x1, …, xj + t, …, xn) − fi(x))/t − aij |
  = | ∂fi(x)/∂xj − aij |.

Hence aij = ∂fi(x)/∂xj.

Hyper Planes

A plane in R³ is the set H = {(x1, x2, x3) ∈ R³ | a1x1 + a2x2 + a3x3 = a4}, where ai ∈ R, i = 1, …, 4. The vector a = (a1, a2, a3) is the normal vector to H; a is orthogonal to any vector x − x′, x, x′ ∈ H. Indeed, a·(x − x′) = a·x − a·x′ = a4 − a4 = 0. The plane H is 2-dimensional since H can be written with two parameters α1, α2 ∈ R as (x01, x02, x03) + α1 v1 + α2 v2, where (x01, x02, x03) is some point in H and v1, v2 ∈ R³ are linearly independent vectors spanning H.

This concept can be generalized to Rn. A hyper plane in Rn is the set of points H = {(x1, …, xn) ∈ Rn | a1x1 + a2x2 + ··· + anxn = an+1}, where a1, …, an+1 ∈ R. The vector a = (a1, …, an) ∈ Rn is called the normal vector to the hyper plane H. Note that a is unique only up to scalar multiples. A hyper plane in Rn is of dimension n − 1 since there are n − 1 linearly independent vectors v1, …, vn−1 ∈ Rn and a point h ∈ H such that

H = {h + α1 v1 + ··· + αn−1 vn−1 | α1, …, αn−1 ∈ R}.

Example 7.7 (a) Special case m = 1; let f: U → R be differentiable. Then

f′(x) = (∂f/∂x1 (x), …, ∂f/∂xn (x)) = grad f(x).

It is a row vector and gives a linear functional on Rn which linearly associates to each vector y = (y1, …, yn)⊤ ∈ Rn the real number

Df(x)(y) = grad f(x)·y = Σ_{j=1}^n fxj(x) yj.

In particular, by Remark 7.3 (b), the equation of the linearization of f at a (the touching hyper plane) is

xn+1 = L(x) = f(a) + grad f(a)·(x − a), i.e. xn+1 = f(a) + (fx1(a), . . .
, fxn(a))·(x1 − a1, …, xn − an), that is,

xn+1 = f(a) + Σ_{j=1}^n fxj(a)(xj − aj),
0 = Σ_{j=1}^n (−fxj(a))(xj − aj) + 1·(xn+1 − f(a)),
0 = ñ·(x̃ − ã),

where x̃ = (x1, …, xn, xn+1), ã = (a1, …, an, f(a)), and ñ = (−grad f(a), 1) ∈ Rn+1 is the normal vector to the hyper plane at ã.

(b) Special case n = 1; let f: (a, b) → Rm, f = (f1, …, fm), be differentiable. Then f is a curve in Rm with initial point f(a) and end point f(b). The Jacobi matrix of f at t is the column vector f′(t) = (f1′(t), …, fm′(t))⊤ ∈ Rm×1; it is the tangent vector to the curve f at t ∈ (a, b).

(c) Let f: R³ → R² be given by

f(x, y, z) = (f1, f2) = (x³ − 3xy² + z, sin(xyz²)).

Then

f′(x, y, z) = ∂(f1, f2)/∂(x, y, z) =
( 3x² − 3y²       −6xy              1
  yz² cos(xyz²)   xz² cos(xyz²)     2xyz cos(xyz²) ).

The linearization of f at (a, b, c) is L(x, y, z) = f(a, b, c) + f′(a, b, c)·(x − a, y − b, z − c).

Remark 7.4 Note that the existence of all partial derivatives does not imply the existence of f′(a). Recall Example 7.1 (d): there we had a function having partial derivatives at the origin which is not continuous at (0, 0), and hence not differentiable at (0, 0). However, the next proposition shows that the converse is true provided all partial derivatives are continuous.

Proposition 7.7 Let f: U → Rm be continuously partial differentiable at a point a ∈ U. Then f is differentiable at a and f′: U → L(Rn, Rm) is continuous at a.

The proof in case n = 2, m = 1 is in the appendix to this chapter.

Theorem 7.8 (Chain Rule) If f: Rn → Rm is differentiable at a point a and g: Rm → Rp is differentiable at b = f(a), then the composition k = g∘f: Rn → Rp is differentiable at a and

Dk(a) = Dg(b)∘Df(a). (7.24)

Using Jacobi matrices, this can be written as

k′(a) = g′(b)·f′(a). (7.25)

Proof. Let A = Df(a), B = Dg(b), and y = f(x).
Defining functions ϕ, ψ, and ρ by

ϕ(x) = f(x) − f(a) − A(x − a), (7.26)
ψ(y) = g(y) − g(b) − B(y − b), (7.27)
ρ(x) = g∘f(x) − g∘f(a) − B∘A(x − a), (7.28)

then

lim_{x→a} ‖ϕ(x)‖/‖x − a‖ = 0,  lim_{y→b} ‖ψ(y)‖/‖y − b‖ = 0, (7.29)

and we have to show that lim_{x→a} ‖ρ(x)‖/‖x − a‖ = 0. Inserting (7.26) and (7.27) we find

ρ(x) = g(f(x)) − g(f(a)) − BA(x − a) = g(f(x)) − g(f(a)) − B(f(x) − f(a) − ϕ(x))
     = [g(f(x)) − g(f(a)) − B(f(x) − f(a))] + B∘ϕ(x)
     = ψ(f(x)) + B(ϕ(x)).

Using ‖T(x)‖ ≤ ‖T‖ ‖x‖ (see Proposition 7.1) this shows

‖ρ(x)‖/‖x − a‖ ≤ ‖ψ(f(x))‖/‖x − a‖ + ‖B∘ϕ(x)‖/‖x − a‖
  ≤ (‖ψ(y)‖/‖y − b‖)·(‖f(x) − f(a)‖/‖x − a‖) + ‖B‖·(‖ϕ(x)‖/‖x − a‖).

Inserting (7.26) again into the above inequality we continue

… = (‖ψ(y)‖/‖y − b‖)·(‖ϕ(x) + A(x − a)‖/‖x − a‖) + ‖B‖·(‖ϕ(x)‖/‖x − a‖)
  ≤ (‖ψ(y)‖/‖y − b‖)·(‖ϕ(x)‖/‖x − a‖ + ‖A‖) + ‖B‖·(‖ϕ(x)‖/‖x − a‖).

All terms on the right side tend to 0 as x approaches a. This completes the proof.

Remarks 7.5 (a) The chain rule in coordinates. If A = f′(a), B = g′(f(a)), and C = k′(a), then A ∈ Rm×n, B ∈ Rp×m, and C ∈ Rp×n and

∂(k1, …, kp)/∂(x1, …, xn) = ∂(g1, …, gp)/∂(y1, …, ym) ∘ ∂(f1, …, fm)/∂(x1, …, xn), (7.30)

∂kr/∂xj (a) = Σ_{i=1}^m ∂gr/∂yi (f(a)) · ∂fi/∂xj (a), r = 1, …, p, j = 1, …, n. (7.31)

(b) In particular, in case p = 1, k(x) = g(f(x)), we have

∂k/∂xj = ∂g/∂y1 · ∂f1/∂xj + ··· + ∂g/∂ym · ∂fm/∂xj.

Example 7.8 (a) Let f(u, v) = uv, u = g(x, y) = x² + y², v = h(x, y) = xy, and z = f(g(x, y), h(x, y)) = (x² + y²)xy = x³y + xy³. Then

∂z/∂x = ∂f/∂u · ∂g/∂x + ∂f/∂v · ∂h/∂x = v·2x + u·y = 2x²y + y(x² + y²),
∂z/∂x = 3x²y + y³.

(b) Let f(u, v) = u^v, u(t) = v(t) = t. Then F(t) = f(u(t), v(t)) = t^t and

F′(t) = ∂f/∂u · u′(t) + ∂f/∂v · v′(t) = v u^{v−1}·1 + u^v log u·1 = t·t^{t−1} + t^t log t = t^t (log t + 1).

7.3 Taylor’s Formula

The gradient of f gives an approximation of a scalar function f by a linear functional.
Taylor’s formula generalizes this concept of approximation to higher order. We consider the quadratic approximation of f to determine local extrema. Throughout this section we refer to the euclidean norm ‖x‖ = ‖x‖2 = √(x1² + ··· + xn²).

7.3.1 Directional Derivatives

Definition 7.7 Let f: U → R be a function, a ∈ U, and e ∈ Rn a unit vector, ‖e‖ = 1. The directional derivative of f at a in the direction of the unit vector e is the limit

(De f)(a) = lim_{t→0} (f(a + te) − f(a))/t. (7.32)

Note that for e = ej we have De f = Dj f = ∂f/∂xj.

Proposition 7.9 Let f: U → R be continuously differentiable. Then for every a ∈ U and every unit vector e ∈ Rn, ‖e‖ = 1, we have

De f(a) = e·grad f(a). (7.33)

Proof. Define g: R → Rn by g(t) = a + te = (a1 + te1, …, an + ten). For sufficiently small t ∈ R, say |t| ≤ ε, the composition k = f∘g,

R →g Rn →f R, k(t) = f(g(t)) = f(a1 + te1, …, an + ten),

is defined. We compute k′(t) using the chain rule:

k′(t) = Σ_{j=1}^n ∂f/∂xj (a + te) gj′(t).

Since gj′(t) = (aj + tej)′ = ej and g(0) = a, it follows

k′(t) = Σ_{j=1}^n ∂f/∂xj (a + te) ej, (7.34)
k′(0) = Σ_{j=1}^n fxj(a) ej = grad f(a)·e.

On the other hand, by definition of the directional derivative,

k′(0) = lim_{t→0} (k(t) − k(0))/t = lim_{t→0} (f(a + te) − f(a))/t = De f(a).

This completes the proof.

Remark 7.6 (Geometric meaning of grad f) Suppose that grad f(a) ≠ 0 and let e be a normed vector, ‖e‖ = 1. Varying e, De f(a) = e·grad f(a) becomes maximal if and only if e and ∇f(a) have the same direction. Hence the vector grad f(a) points in the direction of maximal slope of f at a. Similarly, −grad f(a) is the direction of maximal decline.

For example, f(x, y) = √(1 − x² − y²) has

grad f(x, y) = ( −x/√(1 − x² − y²), −y/√(1 − x² − y²) ).

The maximal slope of f at (x, y) is in the direction e = (−x, −y)/√(x² + y²). In this case, the tangent line to the graph points to the z-axis and has maximal slope.
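Proposition 7.9 lends itself to a quick numerical check. The minimal sketch below (plain Python; the function f, the point a, and the direction e are our own illustrative choices, not from the text) compares the difference quotient (7.32) with e·grad f(a):

```python
import math

def f(x, y):
    return x**2 + 3*x*y

def grad_f(x, y):
    # computed by hand: f_x = 2x + 3y, f_y = 3x
    return (2*x + 3*y, 3*x)

a = (1.0, 2.0)
e = (1/math.sqrt(2), 1/math.sqrt(2))   # unit vector, ||e|| = 1

# difference quotient (f(a + t e) - f(a)) / t from (7.32), small t
t = 1e-6
num = (f(a[0] + t*e[0], a[1] + t*e[1]) - f(*a)) / t

# right-hand side of (7.33): e . grad f(a)
exact = e[0]*grad_f(*a)[0] + e[1]*grad_f(*a)[1]

assert abs(num - exact) < 1e-4   # agree up to an error of order t
```

For this quadratic f the one-sided quotient differs from e·grad f(a) by a term of order t, so the two values agree to about six digits.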
Corollary 7.10 Let f : U → R be k-times continuously differentiable, a ∈ U and x ∈ Rn such that the whole segment a + tx, t ∈ [0, 1] is contained in U. Then the function h : [0, 1] → R, h(t) = f (a + tx) is k-times continuously differentiable, where h(k) (t) = n X i1 ,...,ik =1 Dik · · · Di1 f (a + tx)xi1 · · · xik . (7.35) In particular h(k) (0) = n X i1 ,...,ik =1 Dik · · · Di1 f (a)xi1 · · · xik . (7.36) Proof. The proof is by induction on k. For k = 1 it is exactly the statement of the Proposition. We demonstrate the step from k = 1 to k = 2. By (7.34) n n n X X X d ∂f (a + tx) ∂f ′′ xi1 = (a + tx)xi2 xi1 . h (t) = dt ∂xi1 ∂xi2 ∂xi1 i =1 i =1 i =1 1 1 2 In the second equality we applied the chain rule to h̃(t) = fxi1 (a + tx). 7 Calculus of Functions of Several Variables 208 For brevity we use the following notation for the term on the right of (7.36): n X (x ∇)k f (a) = xi1 · · · xik Dik · · · Di1 f (a). i1 ,...,ik =1 In particular, (x∇)f (a) = x1 fx1 (a) + x2 fx2 (a) + · · · + xn fxn (a) and (∇ x)2 f (a) = Pn ∂2f i,j=1 xi xj ∂xi ∂xj . 7.3.2 Taylor’s Formula Theorem 7.11 Let f ∈ Ck+1 (U), a ∈ U, and x ∈ Rn such that a + tx ∈ U for all t ∈ [0, 1]. Then there exists θ ∈ [0, 1] such that k X 1 1 (x ∇)m f (a) + (x ∇)k+1f (a + θx) f (a + x) = m! (k + 1)! m=0 f (a + x) = f (a) + n X i=1 n 1 X xi fxi (a) + xi xj fxi xj (a) + · · · + 2! i,j=1 X 1 + xi1 · · · xik+1 fxi1 ···xik+1 (a + θx). (k + 1)! i ,...i 1 The expression Rk+1 (a, x) = (7.37) 1 (x ∇)k+1 f (a (k+1)! k+1 + θx) is called the Lagrange remainder term. Proof. Consider the function h : [0, 1] → R, h(t) = f (a + tx). By Corollary 7.10, h is a (k + 1)-times continuously differentiable. By Taylor’s theorem for functions in one variable (Theorem 4.15 with x = 1 and a = 0 therein), we have k X h(m) (0) h(k+1) (θ) + . f (a + x) = h(1) = m! (k + 1)! m=0 By Corollary 7.10 for m = 1, . . . , k we have 1 h(m) (0) = (x ∇)m f (a). m! m! and 1 h(k+1) (θ) = (x ∇)k+1 f (a + θx); (k + 1)! (k + 1)! the assertion follows. 
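The reduction to one variable in Corollary 7.10 can be illustrated numerically: for h(t) = f(a + tx), the second derivative h″(0) must equal (x∇)²f(a). A minimal sketch (f, the point a, and the direction v are our own illustrative choices):

```python
def f(x, y):
    return x**2 * y

a = (1.0, 1.0)
v = (0.3, 0.5)                    # plays the role of x in Corollary 7.10

def h(t):                         # h(t) = f(a + t v)
    return f(a[0] + t*v[0], a[1] + t*v[1])

# h''(0) by a central second difference
t = 1e-4
h2_numeric = (h(t) - 2*h(0) + h(-t)) / t**2

# (v . nabla)^2 f(a) with the hand-computed second partials
# f_xx = 2y, f_xy = 2x, f_yy = 0, all evaluated at a = (1, 1)
h2_exact = v[0]**2 * 2 + 2*v[0]*v[1] * 2 + v[1]**2 * 0

assert abs(h2_numeric - h2_exact) < 1e-5
```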
It is often convenient to substitute x := x + a. Then the Taylor expansion reads

f(x) = Σ_{m=0}^k (1/m!) ((x − a)∇)^m f(a) + (1/(k+1)!) ((x − a)∇)^{k+1} f(a + θ(x − a)),

f(x) = f(a) + Σ_{i=1}^n (xi − ai) fxi(a) + (1/2!) Σ_{i,j=1}^n (xi − ai)(xj − aj) fxixj(a) + ···
     + (1/(k+1)!) Σ_{i1,…,ik+1} (xi1 − ai1) ··· (xik+1 − aik+1) fxi1···xik+1(a + θ(x − a)).

We write the Taylor formula for the case n = 2, k = 3:

f(a + x, b + y) = f(a, b) + (fx(a, b)x + fy(a, b)y)
  + (1/2!)(fxx(a, b)x² + 2fxy(a, b)xy + fyy(a, b)y²)
  + (1/3!)(fxxx(a, b)x³ + 3fxxy(a, b)x²y + 3fxyy(a, b)xy² + fyyy(a, b)y³) + R4(a, x).

If f ∈ ⋂_{k=0}^∞ C^k(U) = {f : f ∈ C^k(U) for all k ∈ N} and lim_{k→∞} Rk(a, x) = 0 for all x ∈ U, then

f(x) = Σ_{m=0}^∞ (1/m!) ((x − a)∇)^m f(a).

The r.h.s. is called the Taylor series of f at a.

Example 7.9 (a) We compute the Taylor expansion of f(x, y) = cos x sin y at (0, 0) to the third order. We have

fx = −sin x sin y, fy = cos x cos y, fxx = −cos x sin y, fyy = −cos x sin y, fxy = −sin x cos y,
fxxy = −cos x cos y, fyyy = −cos x cos y,

hence

fx(0, 0) = 0, fy(0, 0) = 1, fxx(0, 0) = 0, fyy(0, 0) = 0, fxy(0, 0) = 0,
fxxy(0, 0) = −1, fyyy(0, 0) = −1, fxyy(0, 0) = fxxx(0, 0) = 0.

Inserting this gives

f(x, y) = y + (1/3!)(−3x²y − y³) + R4(x, y; 0).

The same result can be obtained by multiplying the Taylor series for cos x and sin y:

(1 − x²/2 + x⁴/4! ∓ ···)(y − y³/3! ± ···) = y − x²y/2 − y³/6 + ···.

(b) The Taylor series of f(x, y) = e^{xy²} at (0, 0) is

Σ_{n=0}^∞ (xy²)^n/n! = 1 + xy² + (1/2)x²y⁴ + ···;

it converges all over R² to f(x, y).

Corollary 7.12 (Mean Value Theorem) Let f: U → R be continuously differentiable, a ∈ U, x ∈ Rn such that a + tx ∈ U for all t ∈ [0, 1]. Then there exists θ ∈ [0, 1] such that

f(a + x) − f(a) = ∇f(a + θx)·x, (7.38)
f(y) − f(x) = ∇f((1 − θ)x + θy)·(y − x).

This is the special case of Taylor’s formula with k = 0.
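The third-order expansion of cos x sin y from Example 7.9 (a) is easy to verify numerically; the remainder R4 makes the error fourth order in (x, y). A small sketch (the helper name taylor3 is ours):

```python
import math

def taylor3(x, y):
    # third-order Taylor polynomial of cos(x)*sin(y) at (0, 0),
    # as computed in Example 7.9 (a): y - x^2 y / 2 - y^3 / 6
    return y - x**2 * y / 2 - y**3 / 6

x, y = 0.1, 0.1
err = abs(math.cos(x) * math.sin(y) - taylor3(x, y))
assert err < 1e-5   # remainder is O(||(x, y)||^4)
```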
Corollary 7.13 Let f: U → R be k-times continuously differentiable, a ∈ U, x ∈ Rn such that a + tx ∈ U for all t ∈ [0, 1]. Then there exists ϕ: U → R such that

f(a + x) = Σ_{m=0}^k (1/m!) (x∇)^m f(a) + ϕ(x), (7.39)

where lim_{x→0} ϕ(x)/‖x‖^k = 0.

Proof. By Taylor’s theorem for f ∈ C^k(U), there exists θ ∈ [0, 1] such that

f(x + a) = Σ_{m=0}^{k−1} (1/m!) (x∇)^m f(a) + (1/k!) (x∇)^k f(a + θx) =: Σ_{m=0}^k (1/m!) (x∇)^m f(a) + ϕ(x).

This implies

ϕ(x) = (1/k!) ((x∇)^k f(a + θx) − (x∇)^k f(a)).

Since |xi1 ··· xik| ≤ ‖x‖ ··· ‖x‖ = ‖x‖^k, for x ≠ 0,

|ϕ(x)|/‖x‖^k ≤ (1/k!) ‖∇^k f(a + θx) − ∇^k f(a)‖.

Since all kth partial derivatives of f are continuous, Di1i2···ik (f(a + θx) − f(a)) → 0 as x → 0. This proves the claim.

Remarks 7.7 (a) With the above notations let

Pm(x) = (1/m!) ((x − a)∇)^m f(a).

Then Pm is a polynomial of degree m in the set of variables x = (x1, …, xn) and we have

f(x) = Σ_{m=0}^k Pm(x) + ϕ(x),  lim_{x→a} ‖ϕ(x)‖/‖x − a‖^k = 0.

Let us consider in more detail the cases m = 0, 1, 2.

Case m = 0. P0(x) = (1/0!) ((x − a)∇)^0 f(a) = f(a); P0 is the constant polynomial with value f(a).

Case m = 1. We have

P1(x) = Σ_{j=1}^n fxj(a)(xj − aj) = grad f(a)·(x − a). (7.40)

Using Corollary 7.13, the first order approximation of a continuously differentiable function is

f(x) = f(a) + grad f(a)·(x − a) + ϕ(x),  lim_{x→a} ϕ(x)/‖x − a‖ = 0.

The linearization of f at a is L(x) = P0(x) + P1(x).

Case m = 2.

P2(x) = (1/2) Σ_{i,j=1}^n fxixj(a)(xi − ai)(xj − aj).

Hence, P2(x) is a quadratic form with the corresponding matrix (½ fxixj(a)). As a special case of Corollary 7.13 (m = 2) we have for f ∈ C²(U)

f(a + x) = f(a) + grad f(a)·x + ½ x⊤·Hess f(a)·x + ϕ(x),  lim_{x→0} ϕ(x)/‖x‖² = 0, (7.41)

where

(Hess f)(a) = (fxixj(a))_{i,j=1}^n (7.42)

is called the Hessian matrix of f at a ∈ U. The Hessian matrix is symmetric by Schwarz’ lemma.
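The quadratic approximation (7.41) can be tested on a concrete function. In the sketch below (f, the point a, and the increment h are our own illustrative choices; gradient and Hessian are computed by hand), the error ϕ(h) is indeed third order in h:

```python
import math

def f(x, y):
    return math.exp(x) * y**2

a = (0.0, 1.0)
# hand-computed at a: f(a) = 1, grad f(a) = (1, 2),
# Hess f(a) = [[1, 2], [2, 2]]; symmetric, as Schwarz' lemma demands
grad = (1.0, 2.0)
H = ((1.0, 2.0), (2.0, 2.0))

h = (0.01, 0.02)
quad = (f(*a) + grad[0]*h[0] + grad[1]*h[1]
        + 0.5*(H[0][0]*h[0]**2 + 2*H[0][1]*h[0]*h[1] + H[1][1]*h[1]**2))

err = abs(f(a[0] + h[0], a[1] + h[1]) - quad)
assert err < 1e-4   # phi(h) = O(||h||^3), far smaller than ||h||^2
```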
7.4 Extrema of Functions of Several Variables

Definition 7.8 Let f: U → R be a function. The point x ∈ U is called a local maximum (minimum) of f if there exists a neighborhood Uε(x) ⊂ U of x such that f(x) ≥ f(y) (f(x) ≤ f(y)) for all y ∈ Uε(x). A local extremum is either a local maximum or a local minimum.

Proposition 7.14 Let f: U → R be partial differentiable. If f has a local extremum at x ∈ U then grad f(x) = 0.

Proof. For i = 1, …, n consider the function gi(t) = f(x + tei). This is a differentiable function of one variable, defined on a certain interval (−ε, ε). If f has an extremum at x, then gi has an extremum at t = 0. By Proposition 4.7, gi′(0) = 0. Since gi′(0) = lim_{t→0} (f(x + tei) − f(x))/t = fxi(x) and i was arbitrary, it follows that grad f(x) = (D1 f(x), …, Dn f(x)) = 0.

Example 7.10 Let f(x, y) = √(1 − x² − y²) be defined on the open unit disc U = {(x, y) ∈ R² | x² + y² < 1}. Then grad f(x, y) = (−x/r, −y/r), with r = √(1 − x² − y²), vanishes if and only if x = y = 0. Hence, if f has an extremum in U, then it is at the origin. Obviously, f(x, y) = √(1 − x² − y²) ≤ 1 = f(0, 0) for all points in U, such that f attains its global (and local) maximum at (0, 0).

To obtain a sufficient criterion for the existence of local extrema we have to consider the Hessian matrix. Before, we need some facts from Linear Algebra.

Definition 7.9 Let A ∈ Rn×n be a real, symmetric n × n matrix, that is aij = aji for all i, j = 1, …, n. The associated quadratic form

Q(x) = Σ_{i,j=1}^n aij xi xj = x⊤·A·x

is called

positive definite if Q(x) > 0 for all x ≠ 0,
negative definite if Q(x) < 0 for all x ≠ 0,
indefinite if Q(x) > 0 and Q(y) < 0 for some x, y,
positive semidefinite if Q(x) ≥ 0 for all x and Q is not positive definite,
negative semidefinite if Q(x) ≤ 0 for all x and Q is not negative definite.

Also, we say that the corresponding matrix A is positive definite etc. if Q(x) is.

Example 7.11 Let n = 2, Q(x) = Q(x1, x2).
Then Q1(x) = 3x1² + 7x2² is positive definite, Q2(x) = −x1² − 2x2² is negative definite, Q3(x) = x1² − 2x2² is indefinite, Q4(x) = x1² is positive semidefinite, and Q5(x) = −x2² is negative semidefinite.

Proposition 7.15 (Sylvester) Let A be a real symmetric n × n matrix and Q(x) = x·Ax the corresponding quadratic form. For k = 1, …, n let

Ak = (aij)_{i,j=1,…,k},  Dk = det Ak.

Let λ1, …, λn be the eigenvalues of A. Then

(a) Q is positive definite if and only if λ1 > 0, λ2 > 0, …, λn > 0. This is the case if and only if D1 > 0, D2 > 0, …, Dn > 0.
(b) Q(x) is negative definite if and only if λ1 < 0, λ2 < 0, …, λn < 0. This is the case if and only if (−1)^k Dk > 0 for all k = 1, …, n.
(c) Q(x) is indefinite if and only if A has both positive and negative eigenvalues.

Example 7.12 Case n = 2. Let A = ( a b ; b c ) ∈ R2×2 be a symmetric matrix. By Sylvester’s criterion A is

(a) positive definite if and only if det A > 0 and a > 0,
(b) negative definite if and only if det A > 0 and a < 0,
(c) indefinite if and only if det A < 0,
(d) semidefinite if and only if det A = 0.

Proposition 7.16 Let f: U → R be twice continuously differentiable and let grad f(a) = 0 at some point a ∈ U.

(a) If Hess f(a) is positive definite, then f has a local minimum at a.
(b) If Hess f(a) is negative definite, then f has a local maximum at a.
(c) If Hess f(a) is indefinite, then f has no local extremum at a.

Note that in general there is no information about a if Hess f(a) is semidefinite.

Proof. By (7.41) and since grad f(a) = 0,

f(a + x) = f(a) + ½ x·A(x) + ϕ(x),  lim_{x→0} ϕ(x)/‖x‖² = 0, (7.43)

where A = Hess f(a).

(a) Let A be positive definite. Since the unit sphere S = {x ∈ Rn | ‖x‖ = 1} is compact (closed and bounded) and the map Q(x) = x·A(x) is continuous, the function attains its minimum, say m, on S, see Proposition 6.24:

m = min{x·A(x) | x ∈ S}.
Since Q(x) is positive definite and 0 ∉ S, m > 0. If x is nonzero, y = x/‖x‖ ∈ S and therefore

m ≤ y·A(y) = (x/‖x‖)·A(x/‖x‖) = (1/‖x‖²) x·A(x).

This implies Q(x) = x·A(x) ≥ m‖x‖² for all x. Since ϕ(x)/‖x‖² → 0 as x → 0, there exists δ > 0 such that ‖x‖ < δ implies

−(m/4)‖x‖² ≤ ϕ(x) ≤ (m/4)‖x‖².

From (7.43) it follows

f(a + x) = f(a) + ½ Q(x) + ϕ(x) ≥ f(a) + ½ m‖x‖² − (m/4)‖x‖² ≥ f(a) + (m/4)‖x‖²,

hence f(a + x) > f(a) if 0 < ‖x‖ < δ, and f has a strict (isolated) local minimum at a.

(b) If A = Hess f(a) is negative definite, consider −f in place of f and apply (a).

(c) Let A = Hess f(a) be indefinite. We have to show that in every neighborhood of a there exist x′ and x′′ such that f(x′′) < f(a) < f(x′). Since A is indefinite, there is a vector x ∈ Rn \ 0 such that x·A(x) = m > 0. Then for small t we have

f(a + tx) = f(a) + ½ tx·A(tx) + ϕ(tx) = f(a) + (m/2)t² + ϕ(tx).

If t is small enough, −(m/4)t² ≤ ϕ(tx) ≤ (m/4)t², hence f(a + tx) > f(a) if 0 < |t| < δ. Similarly, if y ∈ Rn \ 0 satisfies y·A(y) < 0, for sufficiently small t we have f(a + ty) < f(a).

Example 7.13 (a) f(x, y) = x² + y². Here ∇f = (2x, 2y) = 0 if and only if x = y = 0. Furthermore,

Hess f(x, y) = ( 2 0 ; 0 2 )

is positive definite. f has a (strict) local minimum at (0, 0).

(b) Find the local extrema of z = f(x, y) = 4x² − y² on R² (the graph is a hyperbolic paraboloid). We find that the necessary condition ∇f = 0 implies fx = 8x = 0, fy = −2y = 0; thus x = y = 0. Further,

Hess f(x, y) = ( fxx fxy ; fyx fyy ) = ( 8 0 ; 0 −2 ).

The Hessian matrix at (0, 0) is indefinite; the function does not have an extremum at the origin (0, 0).

(c) f(x, y) = x² + y³. ∇f(x, y) = (2x, 3y²) vanishes if and only if x = y = 0. Furthermore,

Hess f(0, 0) = ( 2 0 ; 0 0 )

is positive semidefinite. However, there is no local extremum at the origin since f(ε, 0) = ε² > 0 = f(0, 0) > −ε³ = f(0, −ε).

(d) f(x, y) = x² + y⁴.
Again the Hessian matrix at (0, 0) is positive semidefinite. However, (0, 0) is a strict local minimum.

Local and Global Extrema

To compute the global extrema of a function f: Ū → R, where U ⊂ Rn is open and Ū is the closure of U, we have to go along the following lines:

(a) Compute the local extrema on U;
(b) Compute the global extrema on the boundary ∂U = Ū ∩ U^c;
(c) If U is unbounded without boundary (as U = Rn), consider the limits at infinity.

Note that

sup_{x∈Ū} f(x) = max{maximum of all local maxima in U, sup_{x∈∂U} f(x)}.

To compute the global extremum of f on the boundary one has to find the local extrema on the interior points of the boundary and to compare them with the values on the boundary of the boundary.

Example 7.14 Find the global extrema of f(x, y) = x²y on Ū = {(x, y) ∈ R² | x² + y² ≤ 1}, the closed unit disc. Since grad f = (fx, fy) = (2xy, x²), local extrema can appear only on the y-axis: x = 0, y arbitrary. The Hessian matrix at (0, y) is

Hess f(0, y) = ( fxx fxy ; fxy fyy )|_{x=0} = ( 2y 2x ; 2x 0 )|_{x=0} = ( 2y 0 ; 0 0 ).

This matrix is positive semidefinite in case y > 0, negative semidefinite in case y < 0, and 0 at (0, 0). Hence, the above criterion gives no answer. We have to apply the definition directly. In case y > 0 we have f(x, y) = x²y ≥ 0 for all x; in particular f(x, y) ≥ f(0, y) = 0. Hence (0, y) is a local minimum. Similarly, in case y < 0, f(x, y) ≤ f(0, y) = 0 for all x. Hence, f has a local maximum at (0, y), y < 0. However, f takes both positive and negative values in every neighborhood of (0, 0), for example f(ε, ε) = ε³ and f(ε, −ε) = −ε³. Thus (0, 0) is not a local extremum.

We have to consider the boundary x² + y² = 1. Inserting x² = 1 − y² we obtain

g(y) = f(x, y)|_{x²+y²=1} = (1 − y²)y = y − y³, |y| ≤ 1.
We compute the local extrema of g on the circle x² + y² = 1 (note that the circle has no boundary, so that the local extrema are actually the global extrema):

g′(y) = 1 − 3y² = 0,  |y| = 1/√3.

Since g′′(1/√3) < 0 and g′′(−1/√3) > 0, g attains its maximum 2/(3√3) at y = 1/√3. Since this is greater than the local maximum of f at (0, y), y > 0, f attains its global maximum at the two points

M1,2 = ( ±√(2/3), 1/√3 ),

where f(M1,2) = x²y = 2/(3√3). g attains its minimum −2/(3√3) at y = −1/√3. Since this is less than the local minimum of f at (0, y), y < 0, f attains its global minimum at the two points

m1,2 = ( ±√(2/3), −1/√3 ),

where f(m1,2) = x²y = −2/(3√3).

The arithmetic-geometric mean inequality shows the same result for x, y > 0:

1/3 = (x² + y²)/3 = (x²/2 + x²/2 + y²)/3 ≥ ((x²/2)·(x²/2)·y²)^{1/3}  =⇒  x²y ≤ 2/(3√3).

(b) Among all boxes with volume 1 find the one where the sum of the lengths of the 12 edges is minimal. Let x, y and z denote the lengths of the three perpendicular edges at one vertex. By assumption xyz = 1; and g(x, y, z) = 4(x + y + z) is the function to minimize.

Local Extrema. Inserting the constraint z = 1/(xy) we have to minimize

f(x, y) = 4( x + y + 1/(xy) ) on U = {(x, y) | x > 0, y > 0}.

The necessary condition is

fx = 4(1 − 1/(x²y)) = 0,  fy = 4(1 − 1/(xy²)) = 0  =⇒  x²y = xy² = 1  =⇒  x = y = 1.

Further,

fxx = 8/(x³y),  fyy = 8/(xy³),  fxy = 4/(x²y²),

such that

det Hess f(1, 1) = det( 8 4 ; 4 8 ) = 64 − 16 > 0;

hence f has an extremum at (1, 1). Since fxx(1, 1) = 8 > 0, f has a local minimum at (1, 1).

Global Extrema. We show that (1, 1) is even the global minimum on the first quadrant U. Consider N = {(x, y) | 1/25 ≤ x, y ≤ 5}. If (x, y) ∉ N, then f(x, y) ≥ 4(5 + 0 + 0) = 20. Since f(1, 1) = 12 < 20, the global minimum of f on the right-upper quadrant is attained on the compact rectangle N.
Inserting the four boundaries x = 5, y = 5, x = 1/25, and y = 1/25, in all cases f(x, y) ≥ 20, such that the local minimum at (1, 1) is also the global minimum.

7.5 The Inverse Mapping Theorem

Suppose that f: R → R is differentiable on an open set U ⊂ R containing a ∈ U, and f′(a) ≠ 0. If f′(a) > 0, then there is an open interval V ⊂ U containing a such that f′(x) > 0 for all x ∈ V. Thus f is strictly increasing on V and therefore injective, with an inverse function g defined on some open interval W containing f(a). Moreover g is differentiable (see Proposition 4.5) and g′(y) = 1/f′(x) if f(x) = y. An analogous result in higher dimensions is more involved but the result is very important.

[Figure: f maps an open neighborhood V ⊂ U of a in Rn onto an open neighborhood W of f(a) in Rn.]

Theorem 7.17 (Inverse Mapping Theorem) Suppose that f: Rn → Rn is continuously differentiable on an open set U containing a, and det f′(a) ≠ 0. Then there is an open set V ⊂ U containing a and an open set W containing f(a) such that f: V → W has a continuous inverse g: W → V which is differentiable, and for all y ∈ W with y = f(x) we have

g′(y) = (f′(x))^{−1},  Dg(y) = (Df(x))^{−1}. (7.44)

For the proof see [Rud76, 9.24 Theorem] or [Spi65, 2-11].

Corollary 7.18 Let U ⊂ Rn be open, f: U → Rn continuously differentiable, and det f′(x) ≠ 0 for all x ∈ U. Then f(U) is open in Rn.

Remarks 7.8 (a) One main part is to show that there is an open set V ⊂ U which is mapped onto an open set W. In general, this is not true for continuous mappings.
For example, sin x maps the open interval (0, 2π) onto the closed set [−1, 1]. Note that sin x does not satisfy the assumptions of the corollary since sin′(π/2) = cos(π/2) = 0.

(b) Note that continuity of f′(x) in a neighborhood of a, continuity of the determinant mapping det: Rn×n → R, and det f′(a) ≠ 0 imply that det f′(x) ≠ 0 in a neighborhood V1 of a, see homework 10.4. This implies that the linear mapping Df(x) is invertible for x ∈ V1. Thus, Df(x)^{−1} and (f′(x))^{−1} exist for x ∈ V1; the linear mapping Df(x) is regular.

(c) Let us reformulate the statement of the theorem. Suppose

y1 = f1(x1, …, xn),
y2 = f2(x1, …, xn),
…
yn = fn(x1, …, xn)

is a system of n equations in n variables x1, …, xn; y1, …, yn are given in a neighborhood W of f(a). Under the assumptions of the theorem, there exists a unique solution x = g(y) of this system of equations,

x1 = g1(y1, …, yn),
x2 = g2(y1, …, yn),
…
xn = gn(y1, …, yn),

in a certain neighborhood (x1, …, xn) ∈ V of a. Note that the theorem states the existence of such a solution. It doesn’t provide an explicit formula.

(d) Note that the inverse function g may exist even if det f′(x) = 0. For example f: R → R defined by f(x) = x³ has f′(0) = 0; however g(y) = ∛y is inverse to f(x). One thing is certain: if det f′(a) = 0 then g cannot be differentiable at f(a). If g were differentiable at f(a), the chain rule applied to g(f(x)) = x would give g′(f(a))·f′(a) = id, and consequently det g′(f(a)) det f′(a) = det id = 1, contradicting det f′(a) = 0.

(e) Note that the theorem states that under the given assumptions f is locally invertible. There is no information about the existence of an inverse function g to f on a fixed open set. See Example 7.15 (a) below.

Example 7.15 (a) Let x = r cos ϕ and y = r sin ϕ be the polar coordinates in R².
More precisely, let r cos ϕ x , f : R2 → R2 . = f (r, ϕ) = r sin ϕ y The Jacobian is ∂(x, y) xr xϕ cos ϕ −r sin ϕ = = r. = ∂(r, ϕ) yr yϕ sin ϕ r cos ϕ Let f (r0 , ϕ0 ) = (x0 , y0 ) 6= (0, 0), then r0 6= 0 and the Jacobian of f at (r0 , ϕ0 ) is non-zero. Since all partial derivatives of f with respect to r and ϕ exist and they are continuous on R2 , the assumptions of the theorem are satisfied. Hence, in a neighborhood U of (x0 , y0 ) there exists a continuously differentiable inverse function r = r(x, y), ϕ = ϕ(x, y). In this case, the function p 2 can be given explicitly, r = x + y 2 , ϕ = arg(x, y). We want to compute the Jacobi matrix of the inverse function. Since the inverse matrix −1 cos ϕ sin ϕ cos ϕ −r sin ϕ = − 1r sin ϕ 1r cos ϕ sin ϕ r cos ϕ we obtain by the theorem g ′(x, y) = ∂(r, ϕ) ∂(x, y) = cos ϕ − 1r sin ϕ sin ϕ = 1 cos ϕ r x x2 +y 2 y − x2 +y 2 √ √ y x2 +y 2 x x2 +y 2 ! ; in particular, the second row gives the partial derivatives of the argument function with respect to x and y ∂ arg(x, y) −y x ∂ arg(x, y) = 2 = , . ∂x x + y2 ∂y x2 + y 2 Note that we have not determined the explicit form of the argument function which is not unique since f (r, ϕ+2kπ) = f (r, ϕ), for all k ∈ Z. However, the gradient takes always the above form. Note that det f ′ (r, ϕ) 6= 0 for all r 6= 0 is not sufficient for f to be injective on R2 \ {(0, 0)}. (b) Let f : R2 → R2 be given by (u, v) = f (x, y) where u(x, y) = sin x − cos y, v(x, y) = − cos x + sin y. Since ∂(u, v) ux uy cos x sin y = = = cos x cos y − sin x sin y = cos(x + y) ∂(x, y) vx vy sin x cos y 7.6 The Implicit Function Theorem 219 f is locally invertible at (x0 , y0 ) = ( π4 , − π4 ) since the Jacobian at (x0 , y0 ) is cos 0 = 1 6= 0. √ Since f ( π4 , − π4 ) = (0, − 2), the inverse function g(u, v) = (x, y) is defined in a neighborhood √ √ of (0, − 2) and the Jacobi matrix of g at (0, − 2) is !−1 ! √1 √1 √1 √1 − 2 2 2 2 = . √1 √1 √1 √1 − 2 2 2 2 Note that at point ( π4 , π4 ) the Jacobian of f vanishes. 
There is indeed no neighborhood of $(\frac{\pi}{4},\frac{\pi}{4})$ on which $f$ is injective, since for all $t\in\mathbb{R}$
$$ f\left(\frac{\pi}{4}+t,\ \frac{\pi}{4}-t\right) = f\left(\frac{\pi}{4},\frac{\pi}{4}\right) = (0,0). $$

7.6 The Implicit Function Theorem

Motivation: Hyper Surfaces

Suppose that $F\colon U\to\mathbb{R}$ is a continuously differentiable function and $\operatorname{grad}F(x)\ne0$ for all $x\in U$. Then
$$ S = \{(x_1,\dots,x_n) \mid F(x_1,\dots,x_n) = 0\} $$
is called a hyper surface in $\mathbb{R}^n$; it has dimension $n-1$. Examples are hyper planes $a_1x_1+\dots+a_nx_n+c=0$ (with $(a_1,\dots,a_n)\ne0$) and spheres $x_1^2+\dots+x_n^2=r^2$. The graph of a differentiable function $f\colon U\to\mathbb{R}$, $U\subset\mathbb{R}^n$, is also a hyper surface, in $\mathbb{R}^{n+1}$:
$$ \Gamma_f = \{(x,f(x))\in\mathbb{R}^{n+1} \mid x\in U\}. $$
Question: Is every hyper surface locally the graph of a differentiable function? More precisely, we may ask: suppose that $f\colon\mathbb{R}^n\times\mathbb{R}\to\mathbb{R}$ is differentiable and $f(a_1,\dots,a_n,b)=0$. Can we find, for each $(x_1,\dots,x_n)$ near $(a_1,\dots,a_n)$, a unique $y$ near $b$ such that $f(x_1,\dots,x_n,y)=0$? The answer to this question is provided by the Implicit Function Theorem (IFT).

Consider the function $f\colon\mathbb{R}^2\to\mathbb{R}$ defined by $f(x,y)=x^2+y^2-1$. If we choose $(a,b)$ on the circle with $b\ne0$, there are open intervals $A$ and $B$ containing $a$ and $b$ with the following property: if $x\in A$, there is a unique $y\in B$ with $f(x,y)=0$. We can therefore define a function $g\colon A\to B$ by the conditions $g(x)\in B$ and $f(x,g(x))=0$. If $b>0$ then $g(x)=\sqrt{1-x^2}$; if $b<0$ then $g(x)=-\sqrt{1-x^2}$. Both functions $g$ are differentiable. These functions are said to be defined implicitly by the equation $f(x,y)=0$. On the other hand, there exists no neighborhood of $(1,0)$ in which $f(x,y)=0$ can locally be solved for $y$; note that $f_y(1,0)=0$. However, it can be solved there for $x = h(y) = \sqrt{1-y^2}$.

Theorem 7.19 Suppose that $f\colon\mathbb{R}^n\times\mathbb{R}^m\to\mathbb{R}^m$, $f=f(x,y)$, is continuously differentiable in an open set containing $(a,b)\in\mathbb{R}^{n+m}$ and $f(a,b)=0$. Let $D_yf(x,y)$ be the linear mapping from $\mathbb{R}^m$ into $\mathbb{R}^m$ given by
$$ D_yf(x,y) = \frac{\partial(f_1,\dots,f_m)}{\partial(y_1,\dots,y_m)}(x,y) = \bigl(D_{n+j}f_i(x,y)\bigr) = \left(\frac{\partial f_i}{\partial y_j}(x,y)\right), \quad i,j=1,\dots,m. \tag{7.45} $$
If $\det D_yf(a,b)\ne0$, there are an open set $A\subset\mathbb{R}^n$ containing $a$ and an open set $B\subset\mathbb{R}^m$ containing $b$ with the following property: there exists a unique continuously differentiable function $g\colon A\to B$ such that
(a) $g(a)=b$,
(b) $f(x,g(x))=0$ for all $x\in A$.
For the derivative $Dg(x)\in L(\mathbb{R}^n,\mathbb{R}^m)$ we have
$$ Dg(x) = -\bigl(D_yf(x,g(x))\bigr)^{-1}\circ D_xf(x,g(x)), \qquad g'(x) = -\bigl(f_y'(x,g(x))\bigr)^{-1}\cdot f_x'(x,g(x)). $$
The Jacobi matrix $g'(x)$ is given by
$$ \frac{\partial(g_1,\dots,g_m)}{\partial(x_1,\dots,x_n)}(x) = -f_y'(x,g(x))^{-1}\cdot\frac{\partial(f_1,\dots,f_m)}{\partial(x_1,\dots,x_n)}(x,g(x)), \tag{7.46} $$
$$ \frac{\partial g_k(x)}{\partial x_j} = -\sum_{l=1}^{m}\bigl(f_y'(x,g(x))^{-1}\bigr)_{kl}\,\frac{\partial f_l(x,g(x))}{\partial x_j}, \quad k=1,\dots,m,\ j=1,\dots,n. $$

Idea of Proof. Define $F\colon\mathbb{R}^n\times\mathbb{R}^m\to\mathbb{R}^n\times\mathbb{R}^m$ by $F(x,y)=(x,f(x,y))$, and let $M=f_y'(a,b)$. Then
$$ F'(a,b) = \begin{pmatrix} 1_n & 0_{n,m} \\ f_x'(a,b) & M \end{pmatrix} \implies \det F'(a,b) = \det M \ne 0, $$
since the block matrix is lower triangular. By the inverse mapping theorem (Theorem 7.17) there exist an open set $W\subset\mathbb{R}^n\times\mathbb{R}^m$ containing $F(a,b)=(a,0)$ and an open set $V\subset\mathbb{R}^n\times\mathbb{R}^m$ containing $(a,b)$, which may be chosen of the form $A\times B$, such that $F\colon A\times B\to W$ has a differentiable inverse $h\colon W\to A\times B$.

Since $g$ is differentiable, it is easy to find the Jacobi matrix. In fact, since $f_i(x,g(x))=0$, $i=1,\dots,m$, taking the partial derivative $\frac{\partial}{\partial x_j}$ on both sides gives, by the chain rule,
$$ 0 = \frac{\partial f_i(x,g(x))}{\partial x_j} + \sum_{k=1}^{m}\frac{\partial f_i(x,g(x))}{\partial y_k}\,\frac{\partial g_k(x)}{\partial x_j}, $$
in matrix form
$$ 0 = f_x'(x,g(x)) + f_y'(x,g(x))\cdot g'(x), \qquad -f_x'(x,g(x)) = f_y'(x,g(x))\cdot g'(x). $$
Since $\det f_y'(a,b)\ne0$, also $\det f_y'(x,y)\ne0$ in a small neighborhood of $(a,b)$. Hence $f_y'(x,g(x))$ is invertible and we can multiply the preceding equation from the left by $(f_y'(x,g(x)))^{-1}$, which gives (7.46).

Remarks 7.9 (a) The theorem gives a sufficient condition for "locally" solving the system of equations
$$ 0 = f_1(x_1,\dots,x_n,y_1,\dots,y_m), \quad \dots, \quad 0 = f_m(x_1,\dots,x_n,y_1,\dots,y_m) $$
with given $x_1,\dots,x_n$ for $y_1,\dots,y_m$.
(b) We restate the case $n=m=1$: Let $f(x,y)$ be continuously differentiable on an open set $G\subset\mathbb{R}^2$ containing $(a,b)$ with $f(a,b)=0$. If $f_y(a,b)\ne0$, then there exist $\delta,\varepsilon>0$ such that the following holds: for every $x\in U_\delta(a)$ there exists a unique $y=g(x)\in U_\varepsilon(b)$ with $f(x,y)=0$. We have $g(a)=b$; the function $y=g(x)$ is continuously differentiable with
$$ g'(x) = -\frac{f_x(x,g(x))}{f_y(x,g(x))}. $$
Be careful: note that $f_x(x,g(x)) \ne \frac{d}{dx}\bigl(f(x,g(x))\bigr)$.

Example 7.16 (a) Let $f(x,y)=\sin(x+y)+e^{xy}-1$; note that $f(0,0)=0$. Since
$$ f_y(0,0) = \left.\cos(x+y)+xe^{xy}\right|_{(0,0)} = \cos0+0 = 1 \ne 0, $$
$f(x,y)=0$ can uniquely be solved for $y=g(x)$ in a neighborhood of $x=0$, $y=0$. Further $f_x(0,0)=\cos(x+y)+ye^{xy}|_{(0,0)}=1$. By Remark 7.9 (b),
$$ g'(x) = -\left.\frac{f_x(x,y)}{f_y(x,y)}\right|_{y=g(x)} = -\frac{\cos(x+g(x))+g(x)\,e^{xg(x)}}{\cos(x+g(x))+x\,e^{xg(x)}}. $$
In particular $g'(0)=-1$.

Remark. Differentiating the equation $f_x+f_yg'=0$ once more, we obtain
$$ 0 = f_{xx} + f_{xy}g' + (f_{yx}+f_{yy}g')g' + f_yg'', $$
$$ g'' = -\frac{f_{xx}+2f_{xy}g'+f_{yy}(g')^2}{f_y} \overset{g'=-f_x/f_y}{=} \frac{-f_{xx}f_y^2 + 2f_{xy}f_xf_y - f_{yy}f_x^2}{f_y^3}. $$
Since
$$ f_{xx}(0,0) = \left.\bigl(-\sin(x+y)+y^2e^{xy}\bigr)\right|_{(0,0)} = 0, \quad f_{yy}(0,0) = \left.\bigl(-\sin(x+y)+x^2e^{xy}\bigr)\right|_{(0,0)} = 0, $$
$$ f_{xy}(0,0) = \left.\bigl(-\sin(x+y)+e^{xy}(1+xy)\bigr)\right|_{(0,0)} = 1, $$
we obtain $g''(0) = -(0+2\cdot1\cdot(-1)+0)/1 = 2$. Therefore the Taylor expansion of $g(x)$ around $0$ reads
$$ g(x) = -x + x^2 + r_3(x). $$

(b) Let $\gamma(t)=(x(t),y(t))$, $\gamma\in C^2([0,1])$, be a twice continuously differentiable curve in $\mathbb{R}^2$. Suppose that in a neighborhood of $t=0$ the curve describes a function $y=g(x)$. Find the Taylor polynomial of degree 2 of $g$ at $x_0=x(0)$. Inserting the curve into the equation $y=g(x)$ we have $y(t)=g(x(t))$.
Differentiation gives
$$ \dot y = g'\dot x, \qquad \ddot y = g''\dot x^2 + g'\ddot x. $$
Thus
$$ g'(x) = \frac{\dot y}{\dot x}, \qquad g''(x) = \frac{\ddot y - g'\ddot x}{\dot x^2} = \frac{\ddot y\dot x - \ddot x\dot y}{\dot x^3}. $$
Now we have the Taylor polynomial of $g$ at $x_0$:
$$ T_2(g)(x) = y(0) + g'(x_0)(x-x_0) + \frac{g''(x_0)}{2}(x-x_0)^2. $$

(c) The tangent hyper plane to a hyper surface.

Anyone who understands geometry can understand everything in this world (Galileo Galilei, 1564–1642)

Suppose that $F\colon U\to\mathbb{R}$ is continuously differentiable, $a\in U$, $F(a)=0$, and $\operatorname{grad}F(a)\ne0$. Then
$$ \nabla F(a)\cdot(x-a) = \sum_{i=1}^{n}F_{x_i}(a)(x_i-a_i) = 0 $$
is the equation of the tangent hyper plane to the surface $F(x)=0$ at the point $a$.

Proof. Since the gradient at $a$ is nonzero we may assume without loss of generality that $F_{x_n}(a)\ne0$. By the IFT, $F(x_1,\dots,x_{n-1},x_n)=0$ is locally solvable for $x_n=g(x_1,\dots,x_{n-1})$ in a neighborhood of $a=(a_1,\dots,a_n)$, with $g(\tilde a)=a_n$, where $\tilde a=(a_1,\dots,a_{n-1})$ and $\tilde x=(x_1,\dots,x_{n-1})$. Define the tangent hyperplane to be the graph of the linearization of $g$ at $\tilde a$. By Example 7.7 (a) this graph is given by
$$ x_n = g(\tilde a) + \operatorname{grad}g(\tilde a)\cdot(\tilde x-\tilde a). \tag{7.47} $$
Since $F(\tilde x,g(\tilde x))=0$, the implicit function theorem gives
$$ \frac{\partial g(\tilde a)}{\partial x_j} = -\frac{F_{x_j}(a)}{F_{x_n}(a)}, \quad j=1,\dots,n-1. $$
Inserting this into (7.47) we have
$$ x_n - a_n = -\frac{1}{F_{x_n}(a)}\sum_{j=1}^{n-1}F_{x_j}(a)(x_j-a_j). $$
Multiplication by $-F_{x_n}(a)$ gives
$$ -F_{x_n}(a)(x_n-a_n) = \sum_{j=1}^{n-1}F_{x_j}(a)(x_j-a_j) \implies 0 = \operatorname{grad}F(a)\cdot(x-a). $$

[Figure: level sets $U_{-1}$, $U_0$, $U_1$ of a function $f$]

Let $f\colon U\to\mathbb{R}$ be differentiable. For $c\in\mathbb{R}$ define the level set $U_c = \{x\in U \mid f(x)=c\}$. The set $U_c$ may be empty, may consist of a single point (in the case of a local extremum), or, in the "generic" case (that is, if $U_c$ is non-empty and $\operatorname{grad}f\ne0$ on $U_c$), $U_c$ is an $(n-1)$-dimensional hyper surface. $\{U_c \mid c\in\mathbb{R}\}$ is a family of non-intersecting subsets of $U$ which cover $U$.

7.7 Lagrange Multiplier Rule

This is a method to find local extrema of a function under certain constraints. Consider the following problem: find the local extrema of a function $f(x,y)$ of two variables where $x$ and $y$ are not independent of each other but satisfy the constraint $\varphi(x,y)=0$. Suppose further that $f$ and $\varphi$ are continuously differentiable. Note that the level sets $U_c = \{(x,y)\in\mathbb{R}^2 \mid f(x,y)=c\}$ form a family of non-intersecting curves in the plane.

[Figure: level curves $f=c$ and the constraint curve $\varphi=0$]

We have to find the curve $f(x,y)=c$ intersecting the constraint curve $\varphi(x,y)=0$ with $c$ as large or as small as possible. Usually $f=c$ still intersects $\varphi=0$ as $c$ changes monotonically; but when $c$ is extremal, the curve $f=c$ touches the curve $\varphi=0$. In other words, the tangent lines coincide. This means that the normal vectors defining the tangent lines are scalar multiples of each other.

Theorem 7.20 (Lagrange Multiplier Rule) Let $f,\varphi\colon U\to\mathbb{R}$, $U\subset\mathbb{R}^n$ open, be continuously differentiable, and suppose that $f$ has a local extremum at $a\in U$ under the constraint $\varphi(x)=0$. Suppose that $\operatorname{grad}\varphi(a)\ne0$. Then there exists a real number $\lambda$ such that
$$ \operatorname{grad}f(a) = \lambda\operatorname{grad}\varphi(a). $$
This number $\lambda$ is called a Lagrange multiplier.

Proof. The idea is to solve the constraint $\varphi(x)=0$ for one variable and to consider the "free" extremum problem in one variable fewer. Suppose without loss of generality that $\varphi_{x_n}(a)\ne0$. By the implicit function theorem we can solve $\varphi(x)=0$ for $x_n=g(x_1,\dots,x_{n-1})$ in a neighborhood of $x=a$. Differentiating $\varphi(\tilde x,g(\tilde x))=0$ and inserting $a=(\tilde a,a_n)$ as before, we have
$$ \varphi_{x_j}(a) + \varphi_{x_n}(a)\,g_{x_j}(\tilde a) = 0, \quad j=1,\dots,n-1. \tag{7.48} $$
Since $h(\tilde x)=f(\tilde x,g(\tilde x))$ has a local extremum at $\tilde a$, all partial derivatives of $h$ vanish at $\tilde a$:
$$ f_{x_j}(a) + f_{x_n}(a)\,g_{x_j}(\tilde a) = 0, \quad j=1,\dots,n-1. \tag{7.49} $$
Setting $\lambda = f_{x_n}(a)/\varphi_{x_n}(a)$ and comparing (7.48) and (7.49) we find
$$ f_{x_j}(a) = \lambda\varphi_{x_j}(a), \quad j=1,\dots,n-1. $$
Since by definition $f_{x_n}(a)=\lambda\varphi_{x_n}(a)$, we finally obtain $\operatorname{grad}f(a)=\lambda\operatorname{grad}\varphi(a)$, which completes the proof.
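The multiplier condition can be checked numerically on a toy problem. The following sketch (the function, constraint, and grid search are illustrative choices, not taken from the notes) extremizes $f(x,y)=x+y$ on the unit circle and verifies that $\operatorname{grad}f(a)=\lambda\operatorname{grad}\varphi(a)$ at the maximizer:

```python
import math

# A sketch of Theorem 7.20 on a toy problem (chosen here, not from the
# notes): extremize f(x, y) = x + y subject to phi(x, y) = x^2 + y^2 - 1 = 0.
f = lambda x, y: x + y
grad_f = lambda x, y: (1.0, 1.0)
grad_phi = lambda x, y: (2 * x, 2 * y)

# Scan the constraint curve (cos t, sin t) for the maximum of f.
ts = [2 * math.pi * k / 100000 for k in range(100000)]
best_t = max(ts, key=lambda t: f(math.cos(t), math.sin(t)))
a = (math.cos(best_t), math.sin(best_t))   # ~ (1/sqrt(2), 1/sqrt(2))

# At the extremum, grad f(a) = lambda * grad phi(a) for a single lambda:
gf, gp = grad_f(*a), grad_phi(*a)
lam = gf[0] / gp[0]                        # multiplier from the first component
print(lam, gf[1] - lam * gp[1])            # lambda ~ 1/sqrt(2), residual ~ 0
```

Here $\lambda = 1/(2x) = \sqrt2/2$ at the maximizer, and the residual in the second component vanishes, as the theorem predicts.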
Example 7.17 (a) Let $A=(a_{ij})$ be a real symmetric $n\times n$ matrix, and define $f(x) = x\cdot Ax = \sum_{i,j}a_{ij}x_ix_j$. We ask for the local extrema of $f$ on the unit sphere $S^{n-1} = \{x\in\mathbb{R}^n \mid \|x\|=1\}$. This constraint can be written as $\varphi(x) = \|x\|^2-1 = \sum_{i=1}^{n}x_i^2 - 1 = 0$. Suppose that $f$ attains a local minimum at $a\in S^{n-1}$. By Example 7.6 (b), $\operatorname{grad}f(a) = 2Aa$; on the other hand, $\operatorname{grad}\varphi(a) = (2x_1,\dots,2x_n)|_{x=a} = 2a$. By Theorem 7.20 there exists a real number $\lambda_1$ such that
$$ \operatorname{grad}f(a) = 2Aa = \lambda_1\operatorname{grad}\varphi(a) = 2\lambda_1a. $$
Hence $Aa=\lambda_1a$; that is, $\lambda_1$ is an eigenvalue of $A$ and $a$ a corresponding eigenvector. In particular, every real symmetric matrix has a real eigenvalue. Since $S^{n-1}$ is compact and has no boundary, the global minimum (which exists) is also a local one. We find: if $f(a) = a\cdot Aa = a\cdot\lambda_1a = \lambda_1$ is the global minimum, then $\lambda_1$ is the smallest eigenvalue of $A$.

(b) Let $a$ be the point of a hyper surface $M=\{x \mid \varphi(x)=0\}$ with minimal distance to a given point $b\notin M$. Then the line through $a$ and $b$ is orthogonal to $M$. Indeed, the function $f(x)=\|x-b\|^2$ attains its minimum under the constraint $\varphi(x)=0$ at $a$. By the theorem, there is a real number $\lambda$ such that
$$ \operatorname{grad}f(a) = 2(a-b) = \lambda\operatorname{grad}\varphi(a). $$
The assertion follows since, by Example 7.16 (c), $\operatorname{grad}\varphi(a)$ is orthogonal to $M$ at $a$, and $b-a$ is a multiple of the normal vector $\nabla\varphi(a)$.

Theorem 7.21 (Lagrange Multiplier Rule — extended version) Let $f,\varphi_i\colon U\to\mathbb{R}$, $i=1,\dots,m$, $m<n$, be continuously differentiable functions. Let
$$ M = \{x\in U \mid \varphi_1(x) = \dots = \varphi_m(x) = 0\}, $$
and suppose that $f$ has a local extremum at $a$ under the constraint $x\in M$. Suppose further that the Jacobi matrix $\varphi'(a)\in\mathbb{R}^{m\times n}$ has maximal rank $m$. Then there exist real numbers $\lambda_1,\dots,\lambda_m$ such that
$$ \operatorname{grad}f(a) = \operatorname{grad}(\lambda_1\varphi_1+\dots+\lambda_m\varphi_m)(a). $$
Note that the rank condition ensures that there is a choice of $m$ variables among $x_1,\dots,x_n$ such that the Jacobian of $\varphi_1,\dots,\varphi_m$ with respect to this set of variables is nonzero at $a$.
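Example 7.17 (a) can be illustrated numerically. The sketch below uses a concrete $2\times2$ symmetric matrix chosen for this purpose (not from the notes); its eigenvalues $1$ and $3$ are known by hand, so the minimum of $x\cdot Ax$ on the unit circle and the Lagrange condition $Aa=\lambda_1a$ can both be checked:

```python
import math

# Illustrative check of Example 7.17 (a): A = [[2, 1], [1, 2]] is symmetric
# with eigenvalues 1 and 3; a unit eigenvector for 1 is a = (1, -1)/sqrt(2).
A = [[2.0, 1.0], [1.0, 2.0]]
f = lambda x, y: 2 * x * x + 2 * x * y + 2 * y * y   # f(x) = x . Ax

# Minimize f over the unit circle by scanning the parametrization (cos t, sin t).
vals = [f(math.cos(2 * math.pi * k / 100000), math.sin(2 * math.pi * k / 100000))
        for k in range(100000)]
lam1 = min(vals)
print(lam1)        # ~1.0, the smallest eigenvalue of A

# Lagrange condition grad f(a) = 2*A*a = lam1 * 2*a at a = (1, -1)/sqrt(2):
a = (1 / math.sqrt(2), -1 / math.sqrt(2))
Aa = (A[0][0] * a[0] + A[0][1] * a[1], A[1][0] * a[0] + A[1][1] * a[1])
print(Aa[0] - 1.0 * a[0], Aa[1] - 1.0 * a[1])   # both ~0, so A a = lam1 a
```

On the circle, $f = 2 + \sin 2t$, so the scan recovers the minimum $1$ at $t = 3\pi/4$, matching the smallest eigenvalue, as the example asserts.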
7.8 Integrals depending on Parameters

Problem: Define $I(y) = \int_a^b f(x,y)\,dx$. What are the relations between properties of $f(x,y)$ and of $I(y)$, for example with respect to continuity and differentiability?

7.8.1 Continuity of I(y)

Proposition 7.22 Let $f(x,y)$ be continuous on the rectangle $R=[a,b]\times[c,d]$. Then $I(y)=\int_a^b f(x,y)\,dx$ is continuous on $[c,d]$.

Proof. Let $\varepsilon>0$. Since $f$ is continuous on the compact set $R$, $f$ is uniformly continuous on $R$ (see Proposition 6.25). Hence there is a $\delta>0$ such that $|x-x'|<\delta$, $|y-y'|<\delta$, and $(x,y),(x',y')\in R$ imply $|f(x,y)-f(x',y')|<\varepsilon$. Therefore $|y-y_0|<\delta$ and $y,y_0\in[c,d]$ imply
$$ |I(y)-I(y_0)| = \left|\int_a^b\bigl(f(x,y)-f(x,y_0)\bigr)\,dx\right| \le \varepsilon(b-a). $$
This shows continuity of $I(y)$ at $y_0$.

For example, $I(y)=\int_0^1\arctan\frac{x}{y}\,dx$ is continuous for $y>0$.

Remark 7.10 (a) Note that continuity at $y_0$ means that we can interchange the limit and the integral:
$$ \lim_{y\to y_0}\int_a^b f(x,y)\,dx = \int_a^b\lim_{y\to y_0}f(x,y)\,dx = \int_a^b f(x,y_0)\,dx. $$
(b) A similar statement holds for $y\to\infty$: suppose that $f(x,y)$ is continuous on $[a,b]\times[c,+\infty)$ and $\lim_{y\to+\infty}f(x,y)=\varphi(x)$ exists uniformly for all $x\in[a,b]$, that is,
$$ \forall\,\varepsilon>0\ \exists\,R>0\ \forall\,x\in[a,b],\ y\ge R:\ |f(x,y)-\varphi(x)|<\varepsilon. $$
Then $\int_a^b\varphi(x)\,dx$ exists and $\lim_{y\to\infty}I(y)=\int_a^b\varphi(x)\,dx$.

7.8.2 Differentiation of Integrals

Proposition 7.23 Let $f(x,y)$ be defined on $R=[a,b]\times[c,d]$ and continuous as a function of $x$ for every fixed $y$. Suppose that $f_y(x,y)$ exists for all $(x,y)\in R$ and is continuous as a function of the two variables $x$ and $y$. Then $I(y)$ is differentiable and
$$ I'(y) = \frac{d}{dy}\int_a^b f(x,y)\,dx = \int_a^b f_y(x,y)\,dx. $$

Proof. Let $\varepsilon>0$. Since $f_y(x,y)$ is continuous on $R$, it is uniformly continuous on $R$. Hence there exists $\delta>0$ such that $|x'-x''|<\delta$ and $|y'-y''|<\delta$ imply $|f_y(x',y')-f_y(x'',y'')|<\varepsilon$. We have for $|h|<\delta$
$$ \left|\frac{I(y_0+h)-I(y_0)}{h} - \int_a^b f_y(x,y_0)\,dx\right| = \left|\int_a^b\left(\frac{f(x,y_0+h)-f(x,y_0)}{h} - f_y(x,y_0)\right)dx\right| \le \int_a^b|f_y(x,y_0+\theta h)-f_y(x,y_0)|\,dx < \varepsilon(b-a) $$
for some $\theta\in(0,1)$, by the mean value theorem. Since this inequality holds for all small $h$, it holds for the limit as $h\to0$, too:
$$ \left|I'(y_0) - \int_a^b f_y(x,y_0)\,dx\right| \le \varepsilon(b-a). $$
Since $\varepsilon$ was arbitrary, the claim follows.

In the case of variable integration limits we have the following theorem.

Proposition 7.24 Let $f(x,y)$ be as in Proposition 7.23. Let $\alpha(y)$ and $\beta(y)$ be differentiable on $[c,d]$, and suppose that $\alpha([c,d])$ and $\beta([c,d])$ are contained in $[a,b]$. Let $I(y)=\int_{\alpha(y)}^{\beta(y)}f(x,y)\,dx$. Then $I(y)$ is differentiable and
$$ I'(y) = \int_{\alpha(y)}^{\beta(y)}f_y(x,y)\,dx + \beta'(y)f(\beta(y),y) - \alpha'(y)f(\alpha(y),y). \tag{7.50} $$

Proof. Let $F(y,u,v)=\int_u^v f(x,y)\,dx$; then $I(y)=F(y,\alpha(y),\beta(y))$. The fundamental theorem of calculus yields
$$ \frac{\partial F}{\partial v}(y,u,v) = \frac{\partial}{\partial v}\int_u^v f(x,y)\,dx = f(v,y), \qquad \frac{\partial F}{\partial u}(y,u,v) = \frac{\partial}{\partial u}\left(-\int_v^u f(x,y)\,dx\right) = -f(u,y). \tag{7.51} $$
By the chain rule, the previous proposition, and (7.51) we have
$$ I'(y) = \frac{\partial F}{\partial y} + \frac{\partial F}{\partial u}\,\alpha'(y) + \frac{\partial F}{\partial v}\,\beta'(y) = \int_{\alpha(y)}^{\beta(y)}f_y(x,y)\,dx - \alpha'(y)f(\alpha(y),y) + \beta'(y)f(\beta(y),y), $$
where the partial derivatives of $F$ are taken at $(y,\alpha(y),\beta(y))$.

Example 7.18 (a) $I(y)=\int_3^4\frac{\sin(xy)}{x}\,dx$ is differentiable by Proposition 7.23 since $f_y(x,y)=\frac{x\cos(xy)}{x}=\cos(xy)$ is continuous. Hence
$$ I'(y) = \int_3^4\cos(xy)\,dx = \left.\frac{\sin(xy)}{y}\right|_3^4 = \frac{\sin4y}{y} - \frac{\sin3y}{y}. $$
(b) $I(y)=\int_{\log y}^{\sin y}e^{x^2y}\,dx$ is differentiable with
$$ I'(y) = \int_{\log y}^{\sin y}x^2e^{x^2y}\,dx + \cos y\,e^{y\sin^2y} - \frac{1}{y}\,e^{y(\log y)^2}. $$

7.8.3 Improper Integrals with Parameters

Suppose that the improper integral $\int_a^\infty f(x,y)\,dx$ exists for $y\in[c,d]$.

Definition 7.10 We say that the improper integral $\int_a^\infty f(x,y)\,dx$ converges uniformly with respect to $y$ on $[c,d]$ if for every $\varepsilon>0$ there is an $A_0>0$ such that $A>A_0$ implies
$$ \left|I(y) - \int_a^A f(x,y)\,dx\right| = \left|\int_A^\infty f(x,y)\,dx\right| < \varepsilon $$
for all $y\in[c,d]$.

Note that the Cauchy and Weierstraß criteria (see Proposition 6.1 and Theorem 6.2) for uniform convergence of series of functions also hold for improper parametric integrals. For example, the theorem of Weierstraß now reads as follows.

Proposition 7.25 Suppose that $\int_a^A f(x,y)\,dx$ exists for all $A\ge a$ and $y\in[c,d]$. Suppose further that $|f(x,y)|\le\varphi(x)$ for all $x\ge a$ and all $y\in[c,d]$, and that $\int_a^\infty\varphi(x)\,dx$ converges. Then $\int_a^\infty f(x,y)\,dx$ converges uniformly with respect to $y\in[c,d]$.

Example 7.19 $I(y)=\int_1^\infty e^{-xy}x^yy^2\,dx$ converges uniformly on $[2,4]$ since
$$ |f(x,y)| = e^{-xy}x^yy^2 \le e^{-2x}x^4\cdot4^2 = \varphi(x) $$
and $\int_1^\infty e^{-2x}x^4\cdot4^2\,dx < \infty$ converges.

If we add the assumption of uniform convergence, the preceding theorems remain true for improper integrals.

Proposition 7.26 Let $f(x,y)$ be continuous on $\{(x,y)\in\mathbb{R}^2 \mid a\le x<\infty,\ c\le y\le d\}$, and suppose that $I(y)=\int_a^\infty f(x,y)\,dx$ converges uniformly with respect to $y\in[c,d]$. Then $I(y)$ is continuous on $[c,d]$.

Proof (not carried out in the lecture). Let $\varepsilon>0$. Since the improper integral converges uniformly, there exists $A_0>0$ such that for all $A\ge A_0$ we have
$$ \left|\int_A^\infty f(x,y)\,dx\right| < \varepsilon \quad \text{for all } y\in[c,d]. $$
Let $A\ge A_0$ be fixed. On $\{(x,y)\in\mathbb{R}^2 \mid a\le x\le A,\ c\le y\le d\}$, $f$ is uniformly continuous; hence there is a $\delta>0$ such that $|x'-x''|<\delta$ and $|y'-y''|<\delta$ imply
$$ |f(x',y')-f(x'',y'')| < \frac{\varepsilon}{A-a}. $$
Therefore
$$ \int_a^A|f(x,y)-f(x,y_0)|\,dx < \frac{\varepsilon}{A-a}(A-a) = \varepsilon \quad \text{for } |y-y_0|<\delta. $$
Finally, splitting the integral,
$$ |I(y)-I(y_0)| \le \int_a^A|f(x,y)-f(x,y_0)|\,dx + \left|\int_A^\infty f(x,y)\,dx\right| + \left|\int_A^\infty f(x,y_0)\,dx\right| \le 3\varepsilon \quad \text{for } |y-y_0|<\delta. $$
We skip the proof of the following proposition.
Proposition 7.27 Let $f_y(x,y)$ be continuous on $\{(x,y) \mid a\le x<\infty,\ c\le y\le d\}$ and $f(x,y)$ continuous with respect to $x$ for every fixed $y\in[c,d]$. Suppose that for all $y\in[c,d]$ the integral $I(y)=\int_a^\infty f(x,y)\,dx$ exists and that the integral $\int_a^\infty f_y(x,y)\,dx$ converges uniformly with respect to $y\in[c,d]$. Then $I(y)$ is differentiable and $I'(y)=\int_a^\infty f_y(x,y)\,dx$.

Combining the results of the last proposition and Proposition 7.25 we get the following corollary.

Corollary 7.28 Let $f_y(x,y)$ be continuous on $\{(x,y) \mid a\le x<\infty,\ c\le y\le d\}$ and $f(x,y)$ continuous with respect to $x$ for every fixed $y\in[c,d]$. Suppose that
(a) for all $y\in[c,d]$ the integral $I(y)=\int_a^\infty f(x,y)\,dx$ exists,
(b) $|f_y(x,y)|\le\varphi(x)$ for all $x\ge a$ and all $y$,
(c) $\int_a^\infty\varphi(x)\,dx$ exists.
Then $I(y)$ is differentiable and $I'(y)=\int_a^\infty f_y(x,y)\,dx$.

Example 7.20 (a) $I(y)=\int_0^\infty e^{-x^2}\cos(2yx)\,dx$. Here $f(x,y)=e^{-x^2}\cos(2yx)$ and $f_y(x,y)=-2x\sin(2yx)\,e^{-x^2}$; the integral $\int_0^\infty f_y\,dx$ converges uniformly with respect to $y$ since
$$ |f_y(x,y)| \le 2xe^{-x^2} \le Ke^{-x^2/2}. $$
Hence
$$ I'(y) = -\int_0^\infty 2x\sin(2yx)\,e^{-x^2}\,dx. $$
Integration by parts with $u=\sin(2yx)$, $v'=-2xe^{-x^2}$, so that $u'=2y\cos(2yx)$, $v=e^{-x^2}$, gives
$$ \int_0^A(-e^{-x^2}2x)\sin(2yx)\,dx = \sin(2yA)\,e^{-A^2} - 2y\int_0^A\cos(2yx)\,e^{-x^2}\,dx. $$
As $A\to\infty$ the first summand on the right tends to $0$; thus $I(y)$ satisfies the ordinary differential equation $I'(y)=-2yI(y)$.
ODE: $y'=-2xy$; $dy/y=-2x\,dx$; integration yields $\log y=-x^2+c$, $y=c'e^{-x^2}$. The general solution is therefore $I(y)=Ce^{-y^2}$. We determine the constant $C$ by inserting $y=0$: since $I(0)=\int_0^\infty e^{-x^2}\,dx=\sqrt{\pi}/2$, we find
$$ I(y) = \frac{\sqrt{\pi}}{2}\,e^{-y^2}. $$

(b) The Gamma function $\Gamma(x)=\int_0^\infty t^{x-1}e^{-t}\,dt$ is in $C^\infty(\mathbb{R}_+)$. Let $x>0$, say $x\in[c,d]$ with $0<c<d$. Recall from Subsection 5.3.3 the definition and the proof of the convergence of the improper integrals $\Gamma_1(x)=\int_0^1 f(x,t)\,dt$ and $\Gamma_2(x)=\int_1^\infty f(x,t)\,dt$, where $f(x,t)=t^{x-1}e^{-t}$. Note that $\Gamma_1(x)$ is an improper integral at $t=0+0$.

[Figure: graph of the Gamma function on $(0,3]$]

By L'Hospital's rule, $\lim_{t\to0+0}t^\alpha\log t=0$ for all $\alpha>0$. In particular, $|\log t|<t^{-c/2}$ if $0<t<t_0<1$. Since $e^{-t}<1$ and moreover $t^{x-1}<t^{c-1}$ for $t<t_0$ by Lemma 1.23 (b), we conclude that
$$ \left|\frac{\partial f}{\partial x}(x,t)\right| = t^{x-1}|\log t|\,e^{-t} \le |\log t|\,t^{c-1} \le t^{-c/2}t^{c-1} = \frac{1}{t^{1-c/2}} $$
for $0<t<t_0$. Since $\varphi(t)=t^{-(1-c/2)}$ is integrable over $[0,1]$, $\Gamma_1(x)$ is differentiable by the corollary, with
$$ \Gamma_1'(x) = \int_0^1 t^{x-1}\log t\,e^{-t}\,dt. $$
Similarly, $\Gamma_2(x)$ is an improper integral over the unbounded interval $[1,+\infty)$. For sufficiently large $t\ge t_0>1$ we have $\log t<t$ and $t^x\le t^d$, such that
$$ \left|\frac{\partial f}{\partial x}(x,t)\right| = t^{x-1}\log t\,e^{-t} \le t^xe^{-t} \le t^de^{-t} = t^de^{-t/2}\,e^{-t/2} \le Me^{-t/2}. $$
Since $t^de^{-t/2}$ tends to $0$ as $t\to\infty$, it is bounded by some constant $M$, and $e^{-t/2}$ is integrable on $[1,+\infty)$, so that $\Gamma_2(x)$ is differentiable with
$$ \Gamma_2'(x) = \int_1^\infty t^{x-1}\log t\,e^{-t}\,dt. $$
Consequently, $\Gamma(x)$ is differentiable for all $x>0$ with
$$ \Gamma'(x) = \int_0^\infty t^{x-1}\log t\,e^{-t}\,dt. $$
Similarly one can show that $\Gamma\in C^\infty(\mathbb{R}_{>0})$ with
$$ \Gamma^{(k)}(x) = \int_0^\infty t^{x-1}(\log t)^k\,e^{-t}\,dt. $$

7.9 Appendix

Proof of Proposition 7.7. Let $A=(A_{ij})=\left(\frac{\partial f_i}{\partial x_j}\right)$ be the matrix of partial derivatives, considered as a linear map from $\mathbb{R}^n$ to $\mathbb{R}^m$. Our aim is to show that
$$ \lim_{h\to0}\frac{\|f(a+h)-f(a)-Ah\|}{\|h\|} = 0. $$
For this, it suffices to prove the convergence to $0$ for each coordinate $i=1,\dots,m$, by Proposition 6.26:
$$ \lim_{h\to0}\frac{f_i(a+h)-f_i(a)-\sum_{j=1}^{n}A_{ij}h_j}{\|h\|} = 0. \tag{7.52} $$
Without loss of generality we assume $m=1$ and $f=f_1$. For simplicity, let $n=2$, $f=f(x,y)$, $a=(a,b)$, and $h=(h,k)$. Note first that by the mean value theorem we have
$$ f(a+h,b+k)-f(a,b) = f(a+h,b+k)-f(a,b+k) + f(a,b+k)-f(a,b) = \frac{\partial f}{\partial x}(\xi,b+k)\,h + \frac{\partial f}{\partial y}(a,\eta)\,k, $$
where $\xi\in(a,a+h)$ and $\eta\in(b,b+k)$.
Using this, the expression in (7.52) reads
$$ \frac{f(a+h,b+k)-f(a,b)-\frac{\partial f}{\partial x}(a,b)\,h-\frac{\partial f}{\partial y}(a,b)\,k}{\sqrt{h^2+k^2}} = \frac{\left(\frac{\partial f}{\partial x}(\xi,b+k)-\frac{\partial f}{\partial x}(a,b)\right)h + \left(\frac{\partial f}{\partial y}(a,\eta)-\frac{\partial f}{\partial y}(a,b)\right)k}{\sqrt{h^2+k^2}}. $$
Since both $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ are continuous at $(a,b)$, given $\varepsilon>0$ we find $\delta>0$ such that $(x,y)\in U_\delta((a,b))$ implies
$$ \left|\frac{\partial f}{\partial x}(x,y)-\frac{\partial f}{\partial x}(a,b)\right| < \varepsilon, \qquad \left|\frac{\partial f}{\partial y}(x,y)-\frac{\partial f}{\partial y}(a,b)\right| < \varepsilon. $$
This shows
$$ \frac{\left|\left(\frac{\partial f}{\partial x}(\xi,b+k)-\frac{\partial f}{\partial x}(a,b)\right)h + \left(\frac{\partial f}{\partial y}(a,\eta)-\frac{\partial f}{\partial y}(a,b)\right)k\right|}{\sqrt{h^2+k^2}} \le \frac{\varepsilon|h|+\varepsilon|k|}{\sqrt{h^2+k^2}} \le 2\varepsilon; $$
hence $f$ is differentiable at $(a,b)$ with Jacobi matrix $A=\left(\frac{\partial f}{\partial x}(a,b)\ \ \frac{\partial f}{\partial y}(a,b)\right)$. Since both components of $A$ (the partial derivatives) are continuous functions of $(x,y)$, the assignment $x\mapsto f'(x)$ is continuous by Proposition 6.26.

Chapter 8 Curves and Line Integrals

8.1 Rectifiable Curves

8.1.1 Curves in $\mathbb{R}^k$

We consider curves in $\mathbb{R}^k$; we define the tangent vector, regular points, and the angle of intersection.

Definition 8.1 A curve in $\mathbb{R}^k$ is a continuous mapping $\gamma\colon I\to\mathbb{R}^k$, where $I\subset\mathbb{R}$ is a closed interval consisting of more than one point. The interval can be $I=[a,b]$, $I=[a,+\infty)$, or $I=\mathbb{R}$. In the first case $\gamma(a)$ and $\gamma(b)$ are called the initial and end point of $\gamma$. These two points define a natural orientation of the curve, "from $\gamma(a)$ to $\gamma(b)$". Replacing $\gamma(t)$ by $\gamma(a+b-t)$ we obtain the curve from $\gamma(b)$ to $\gamma(a)$ with the opposite orientation. If $\gamma(a)=\gamma(b)$, $\gamma$ is said to be a closed curve.
The curve $\gamma$ is given by a $k$-tuple $\gamma=(\gamma_1,\dots,\gamma_k)$ of continuous real-valued functions. If $\gamma$ is differentiable, the curve is said to be differentiable.
Note that we have defined a curve to be a mapping, not a set of points in $\mathbb{R}^k$. Of course, with each curve $\gamma$ in $\mathbb{R}^k$ there is associated a subset of $\mathbb{R}^k$, namely the image of $\gamma$,
$$ C = \gamma(I) = \{\gamma(t)\in\mathbb{R}^k \mid t\in I\}, $$
but different curves $\gamma$ may have the same image $C=\gamma(I)$. The curve is said to be simple if $\gamma$ is injective on the set $I^\circ$ of inner points of $I$. A simple curve has no self-intersection.
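The distinction between a curve (a mapping) and its image (a point set) can be made concrete. A small sketch, with the unit circle as an illustrative example:

```python
import math

# A curve is a mapping t -> gamma(t); two different mappings can share
# one image. Illustration (example chosen here): the unit circle
# traversed once on [0, 2*pi] and twice on [0, 4*pi].
def gamma(t):
    return (math.cos(t), math.sin(t))

def gamma_tilde(t):          # same formula, parameter interval [0, 4*pi]
    return (math.cos(t), math.sin(t))

# Sample both images on a grid and compare them as point sets (rounded).
img1 = {tuple(round(c, 6) for c in gamma(2 * math.pi * k / 1000)) for k in range(1000)}
img2 = {tuple(round(c, 6) for c in gamma_tilde(4 * math.pi * k / 2000)) for k in range(2000)}
print(img1 == img2)          # the images coincide ...

# ... but gamma_tilde is not simple: it returns to (1, 0) at the interior
# parameter 2*pi of [0, 4*pi], while gamma is injective on (0, 2*pi).
print(gamma_tilde(0), gamma_tilde(2 * math.pi))
```

The two mappings have identical images, yet only the first is a simple curve, which is exactly the distinction the definition draws.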
Example 8.1 (a) A circle in $\mathbb{R}^2$ of radius $r>0$ with center $(0,0)$ is described by the curve
$$ \gamma\colon[0,2\pi]\to\mathbb{R}^2, \qquad \gamma(t) = (r\cos t,\ r\sin t). $$
Note that $\tilde\gamma\colon[0,4\pi]\to\mathbb{R}^2$ with $\tilde\gamma(t)=\gamma(t)$ has the same image but is different from $\gamma$; $\gamma$ is a simple curve, $\tilde\gamma$ is not.
(b) Let $p,q\in\mathbb{R}^k$ be fixed points, $p\ne q$. Then
$$ \gamma_1(t) = (1-t)p + tq, \quad t\in[0,1], \qquad \gamma_2(t) = (1-t)p + tq, \quad t\in\mathbb{R}, $$
are the segment $\overline{pq}$ from $p$ to $q$ and the line $pq$ through $p$ and $q$, respectively. If $v\in\mathbb{R}^k$ is a vector, then $\gamma_3(t)=p+tv$, $t\in\mathbb{R}$, is the line through $p$ with direction $v$.
(c) If $f\colon[a,b]\to\mathbb{R}$ is a continuous function, the graph of $f$ is a curve in $\mathbb{R}^2$: $\gamma\colon[a,b]\to\mathbb{R}^2$, $\gamma(t)=(t,f(t))$.
(d) Implicit curves. Let $F\colon U\subset\mathbb{R}^2\to\mathbb{R}$ be continuously differentiable with $F(a,b)=0$ and $\nabla F(a,b)\ne0$ for some point $(a,b)\in U$. By the implicit function theorem, $F(x,y)=0$ can locally be solved for $y=g(x)$ or for $x=f(y)$. In both cases $\gamma(t)=(t,g(t))$ resp. $\gamma(t)=(f(t),t)$ is a curve through $(a,b)$. For example, $F(x,y)=y^2-x^3-x^2=0$ is locally solvable except at $(a,b)=(0,0)$; the corresponding curve is Newton's knot (see Example 8.2 (a)).

Definition 8.2 (a) A simple curve $\gamma\colon I\to\mathbb{R}^k$ is said to be regular at $t_0$ if $\gamma$ is continuously differentiable on $I$ and $\gamma'(t_0)\ne0$; $\gamma$ is regular if it is regular at every point $t_0\in I$.
(b) The vector $\gamma'(t_0)$ is called the tangent vector, and $\alpha(t)=\gamma(t_0)+t\gamma'(t_0)$, $t\in\mathbb{R}$, is called the tangent line to the curve $\gamma$ at the point $\gamma(t_0)$.

Remark 8.1 The moving particle. Let $t$ be the time variable and $s(t)$ the position vector of a point moving in $\mathbb{R}^k$. Then the tangent vector $v(t)=s'(t)$ is the velocity vector of the moving point, and the instantaneous speed is its euclidean norm, $\|v(t)\|=\sqrt{s_1'(t)^2+\dots+s_k'(t)^2}$. The acceleration vector is the second derivative of $s(t)$: $a(t)=v'(t)=s''(t)$.

Let $\gamma_i\colon I_i\to\mathbb{R}^k$, $i=1,2$, be two regular curves with a common point $\gamma_1(t_1)=\gamma_2(t_2)$. The angle of intersection $\varphi$ between the two curves $\gamma_i$ at $t_i$ is defined to be the angle between the two tangent vectors $\gamma_1'(t_1)$ and $\gamma_2'(t_2)$. Hence
$$ \cos\varphi = \frac{\gamma_1'(t_1)\cdot\gamma_2'(t_2)}{\|\gamma_1'(t_1)\|\,\|\gamma_2'(t_2)\|}, \qquad \varphi\in[0,\pi]. $$

[Figure: Newton's knot]

Example 8.2 (a) Newton's knot. The curve $\gamma\colon\mathbb{R}\to\mathbb{R}^2$ given by $\gamma(t)=(t^2-1,\ t^3-t)$ is not injective since $\gamma(-1)=\gamma(1)=(0,0)=x_0$. The point $x_0$ is a double point of the curve; in general $\gamma$ has two different tangent lines at a double point. Since $\gamma'(t)=(2t,\ 3t^2-1)$, we have $\gamma'(-1)=(-2,2)$ and $\gamma'(1)=(2,2)$. The curve is regular since $\gamma'(t)\ne0$ for all $t$. Let us compute the angle of self-intersection: since $\gamma(-1)=\gamma(1)=(0,0)$, the self-intersection angle $\varphi$ satisfies
$$ \cos\varphi = \frac{(-2,2)\cdot(2,2)}{8} = 0, $$
hence $\varphi=90^\circ$; the intersection is orthogonal.
(b) Neil's parabola. Let $\gamma\colon\mathbb{R}\to\mathbb{R}^2$ be given by $\gamma(t)=(t^2,t^3)$. Since $\gamma'(t)=(2t,3t^2)$, the origin is the only singular point.

8.1.2 Rectifiable Curves

The goal of this subsection is to define the length of a curve. For differentiable curves there is a formula using the tangent vector; however, the "length of a curve" makes sense for some non-differentiable, continuous curves as well.

Let $\gamma\colon[a,b]\to\mathbb{R}^k$ be a curve. We associate to each partition $P=\{t_0,\dots,t_n\}$ of $[a,b]$ the points $x_i=\gamma(t_i)$, $i=0,\dots,n$, and the number
$$ \ell(P,\gamma) = \sum_{i=1}^{n}\|\gamma(t_i)-\gamma(t_{i-1})\|. \tag{8.1} $$
The $i$th term in this sum is the euclidean distance of the points $x_{i-1}=\gamma(t_{i-1})$ and $x_i=\gamma(t_i)$. Hence $\ell(P,\gamma)$ is the length of the polygonal path with vertices $x_0,\dots,x_n$. As our partition becomes finer and finer, this polygon approaches the image of $\gamma$ more and more closely.

[Figure: a curve with inscribed polygonal path $x_0,\dots,x_3$ over a partition $a<t_1<t_2<b$]

Definition 8.3 A curve $\gamma\colon[a,b]\to\mathbb{R}^k$ is said to be rectifiable if the set of non-negative real numbers $\{\ell(P,\gamma) \mid P \text{ is a partition of } [a,b]\}$ is bounded.
In this case
$$ \ell(\gamma) = \sup_P\,\ell(P,\gamma), $$
where the supremum is taken over all partitions $P$ of $[a,b]$, is called the length of $\gamma$.
In certain cases, $\ell(\gamma)$ is given by a Riemann integral. We shall prove this for continuously differentiable curves, i.e. for curves $\gamma$ whose derivative $\gamma'$ is continuous.

Proposition 8.1 If $\gamma'$ is continuous on $[a,b]$, then $\gamma$ is rectifiable, and
$$ \ell(\gamma) = \int_a^b\|\gamma'(t)\|\,dt. $$

Proof. If $a\le t_{i-1}<t_i\le b$ then, by Theorem 5.28, $\gamma(t_i)-\gamma(t_{i-1})=\int_{t_{i-1}}^{t_i}\gamma'(t)\,dt$. Applying Proposition 5.29 we have
$$ \|\gamma(t_i)-\gamma(t_{i-1})\| = \left\|\int_{t_{i-1}}^{t_i}\gamma'(t)\,dt\right\| \le \int_{t_{i-1}}^{t_i}\|\gamma'(t)\|\,dt. $$
Hence $\ell(P,\gamma)\le\int_a^b\|\gamma'(t)\|\,dt$ for every partition $P$ of $[a,b]$. Consequently,
$$ \ell(\gamma) \le \int_a^b\|\gamma'(t)\|\,dt. $$
To prove the opposite inequality, let $\varepsilon>0$ be given. Since $\gamma'$ is uniformly continuous on $[a,b]$, there exists $\delta>0$ such that $\|\gamma'(s)-\gamma'(t)\|<\varepsilon$ if $|s-t|<\delta$. Let $P$ be a partition with $\Delta t_i\le\delta$ for all $i$. If $t_{i-1}\le t\le t_i$ it follows that $\|\gamma'(t)\|\le\|\gamma'(t_i)\|+\varepsilon$. Hence
$$ \int_{t_{i-1}}^{t_i}\|\gamma'(t)\|\,dt \le \|\gamma'(t_i)\|\,\Delta t_i + \varepsilon\Delta t_i = \left\|\int_{t_{i-1}}^{t_i}\bigl(\gamma'(t)+(\gamma'(t_i)-\gamma'(t))\bigr)\,dt\right\| + \varepsilon\Delta t_i $$
$$ \le \left\|\int_{t_{i-1}}^{t_i}\gamma'(t)\,dt\right\| + \left\|\int_{t_{i-1}}^{t_i}(\gamma'(t_i)-\gamma'(t))\,dt\right\| + \varepsilon\Delta t_i \le \|\gamma(t_i)-\gamma(t_{i-1})\| + 2\varepsilon\Delta t_i. $$
If we add these inequalities, we obtain
$$ \int_a^b\|\gamma'(t)\|\,dt \le \ell(P,\gamma) + 2\varepsilon(b-a) \le \ell(\gamma) + 2\varepsilon(b-a). $$
Since $\varepsilon$ was arbitrary, $\int_a^b\|\gamma'(t)\|\,dt\le\ell(\gamma)$. This completes the proof.

Special case $k=2$: $\gamma(t)=(x(t),y(t))$, $t\in[a,b]$. Then
$$ \ell(\gamma) = \int_a^b\sqrt{x'(t)^2+y'(t)^2}\,dt. $$
In particular, let $\gamma(t)=(t,f(t))$ be the graph of a continuously differentiable function $f\colon[a,b]\to\mathbb{R}$. Then $\ell(\gamma)=\int_a^b\sqrt{1+f'(t)^2}\,dt$.

Example 8.3 (a) Catenary curve. Let $f(t)=a\cosh\frac{t}{a}$, $t\in[0,b]$, $b>0$. Then $f'(t)=\sinh\frac{t}{a}$, and
$$ \ell(\gamma) = \int_0^b\sqrt{1+\sinh^2\frac{t}{a}}\,dt = \int_0^b\cosh\frac{t}{a}\,dt = \left.a\sinh\frac{t}{a}\right|_0^b = a\sinh\frac{b}{a}. $$
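Arc-length formulas like the catenary's can be confirmed numerically. A sketch with illustrative parameters (the Simpson-rule helper is ad hoc, not from the notes):

```python
import math

# Numerical check of the catenary arc length: for f(t) = a*cosh(t/a) on
# [0, b] the length is a*sinh(b/a). Parameters a, b chosen for illustration.
def simpson(g, lo, hi, n=1000):          # composite Simpson rule, n even
    h = (hi - lo) / n
    s = g(lo) + g(hi)
    s += 4 * sum(g(lo + (2 * k - 1) * h) for k in range(1, n // 2 + 1))
    s += 2 * sum(g(lo + 2 * k * h) for k in range(1, n // 2))
    return s * h / 3

a, b = 2.0, 3.0
speed = lambda t: math.sqrt(1 + math.sinh(t / a) ** 2)   # = cosh(t/a)
length = simpson(speed, 0.0, b)
print(length, a * math.sinh(b / a))      # the two values agree
```

The quadrature reproduces $a\sinh(b/a)$ to high accuracy, matching the closed-form evaluation above.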
[Figure: cycloid and its generating circle]

(b) Cycloid. The position of a bulge in a bicycle tire as it rolls down the street can be parametrized by an angle $\theta$. Let the radius of the tire be $a$. It can be verified by plane trigonometry that
$$ \gamma(\theta) = \bigl(a(\theta-\sin\theta),\ a(1-\cos\theta)\bigr). $$
This curve is called a cycloid. Find the distance travelled by the bulge for $0\le\theta\le2\pi$. Using $1-\cos\theta=2\sin^2\frac{\theta}{2}$ we have
$$ \gamma'(\theta) = a(1-\cos\theta,\ \sin\theta), \qquad \|\gamma'(\theta)\| = a\sqrt{(1-\cos\theta)^2+\sin^2\theta} = a\sqrt{2-2\cos\theta} = a\sqrt{2}\,\sqrt{1-\cos\theta} = 2a\sin\frac{\theta}{2} $$
(note that $\sin\frac{\theta}{2}\ge0$ on $[0,2\pi]$). Therefore
$$ \ell(\gamma) = 2a\int_0^{2\pi}\sin\frac{\theta}{2}\,d\theta = -4a\left[\cos\frac{\theta}{2}\right]_0^{2\pi} = 4a(-\cos\pi+\cos0) = 8a. $$

(c) The arc element $ds$. Formally, the arc element of a plane differentiable curve can be computed using the Pythagorean theorem:
$$ (ds)^2 = (dx)^2 + (dy)^2 \implies ds = \sqrt{dx^2+dy^2} = \sqrt{1+\left(\frac{dy}{dx}\right)^2}\,dx = \sqrt{1+f'(x)^2}\,dx. $$

(d) Arc of an ellipse. The ellipse with equation $x^2/a^2+y^2/b^2=1$, $0<b\le a$, is parametrized by $\gamma(t)=(a\cos t,\ b\sin t)$, $t\in[0,t_0]$, such that $\gamma'(t)=(-a\sin t,\ b\cos t)$. Hence
$$ \ell(\gamma) = \int_0^{t_0}\sqrt{a^2\sin^2t+b^2\cos^2t}\,dt = \int_0^{t_0}\sqrt{a^2-(a^2-b^2)\cos^2t}\,dt = a\int_0^{t_0}\sqrt{1-\varepsilon^2\cos^2t}\,dt, $$
where $\varepsilon=\frac{\sqrt{a^2-b^2}}{a}$. This integral can be transformed into the elliptic integral of the second kind,
$$ E(\tau,\varepsilon) = \int_0^{\tau}\sqrt{1-\varepsilon^2\sin^2t}\,dt, $$
as defined in Chapter 5.

(e) A non-rectifiable curve. Consider the graph $\gamma(t)=(t,f(t))$, $t\in[0,1]$, of the function
$$ f(t) = \begin{cases} t\cos\dfrac{\pi}{2t}, & 0<t\le1, \\ 0, & t=0. \end{cases} $$
Since $\lim_{t\to0+0}f(t)=f(0)=0$, $f$ is continuous and $\gamma(t)$ is a curve. However, this curve is not rectifiable. Indeed, choose the partition $P_k=\{0,\ \frac{1}{4k},\ \frac{1}{4k-2},\ \dots,\ \frac14,\ \frac12,\ 1\}$, that is $t_0=0$, $t_i=\frac{1}{4k-2i+2}$ for $i=1,\dots,2k$, and $t_{2k+1}=1$, consisting of $2k+2$ points. The points $t_0=0$ and $t_{2k+1}=1$ play a special role and will be omitted in the calculation below. Then $\cos\frac{\pi}{2t_i}=(-1)^{i+1}$, $i=1,\dots,2k$ (the values alternate $1,-1,1,-1,\dots$), so $f(t_i)=\pm t_i$ with alternating signs. Thus
$$ \ell(P_k,\gamma) \ge \sum_{i=2}^{2k}\sqrt{(t_i-t_{i-1})^2+(f(t_i)-f(t_{i-1}))^2} \ge \sum_{i=2}^{2k}|f(t_i)-f(t_{i-1})| = \sum_{i=2}^{2k}(t_i+t_{i-1}) = \frac{1}{4k} + 2\left(\frac14+\frac16+\dots+\frac{1}{4k-2}\right) + \frac12, $$
which is unbounded for $k\to\infty$ since the harmonic series is unbounded. Hence $\gamma$ is not rectifiable.

8.2 Line Integrals

A lot of physical applications are to be found in [MW85, Chapter 18]. Integration of vector fields along curves is of fundamental importance in both mathematics and physics. We use the concept of work to motivate the material in this section.
The motion of an object is described by a parametric curve $\vec x=\vec x(t)=(x(t),y(t),z(t))$. By differentiating this function, we obtain the velocity $\vec v(t)=\vec x'(t)$ and the acceleration $\vec a(t)=\vec x''(t)$. We also use the physicists' notation $\dot{\vec x}(t)$ and $\ddot{\vec x}(t)$ to denote derivatives with respect to the time $t$. According to Newton's law, the total force $\tilde F$ acting on an object of mass $m$ is $\tilde F=m\vec a$. Since the kinetic energy $K$ is defined by $K=\frac12m\vec v^2=\frac12m\vec v\cdot\vec v$, we have
$$ \dot K(t) = \frac12m(\dot{\vec v}\cdot\vec v+\vec v\cdot\dot{\vec v}) = m\vec a\cdot\vec v = \tilde F\cdot\vec v. $$
The total change of the kinetic energy from time $t_1$ to $t_2$, denoted $W$, is called the work done by the force $\tilde F$ along the path $\vec x(t)$:
$$ W = \int_{t_1}^{t_2}\dot K(t)\,dt = \int_{t_1}^{t_2}\tilde F\cdot\vec v\,dt = \int_{t_1}^{t_2}\tilde F(t)\cdot\dot{\vec x}(t)\,dt. $$
Let us now suppose that the force $\tilde F$ at time $t$ depends only on the position $\vec x(t)$; that is, we assume that there is a vector field $\vec F(\vec x)$ such that $\tilde F(t)=\vec F(\vec x(t))$ (gravitational and electrostatic attraction are position-dependent, while magnetic forces are velocity-dependent). Then we may rewrite the above integral as
$$ W = \int_{t_1}^{t_2}\vec F(\vec x(t))\cdot\dot{\vec x}(t)\,dt. $$
In the one-dimensional case, by a change of variables, this can be simplified to
$$ W = \int_a^b F(x)\,dx, $$
where $a$ and $b$ are the starting and ending positions.
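The work integral $W=\int_{t_1}^{t_2}\vec F(\vec x(t))\cdot\dot{\vec x}(t)\,dt$ can be approximated directly from this formula. A sketch with an illustrative field and path (both chosen here, not taken from the notes):

```python
import math

# Sketch: work done by the position-dependent field F(x, y) = (-y, x)
# along the unit circle x(t) = (cos t, sin t), t in [0, 2*pi].
F = lambda x, y: (-y, x)

def work(F, x, dx, t1, t2, n=2000):
    # left-endpoint Riemann sum of F(x(t)) . x'(t) dt; crude but adequate here
    h = (t2 - t1) / n
    total = 0.0
    for k in range(n):
        t = t1 + k * h
        fx, fy = F(*x(t))
        vx, vy = dx(t)
        total += (fx * vx + fy * vy) * h
    return total

path = lambda t: (math.cos(t), math.sin(t))
velocity = lambda t: (-math.sin(t), math.cos(t))
W = work(F, path, velocity, 0.0, 2 * math.pi)
print(W)   # integrand is identically 1, so W = 2*pi
```

Here $\vec F(\vec x(t))\cdot\dot{\vec x}(t)=\sin^2t+\cos^2t=1$, so the Riemann sum returns $2\pi$: a one-line sanity check of the definition that follows.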
Definition 8.4 Let $\Gamma=\{\vec x(t) \mid t\in[r,s]\}$ be a continuously differentiable curve, $\vec x\in C^1([r,s])$, in $\mathbb{R}^n$, and $\vec f\colon\Gamma\to\mathbb{R}^n$ a continuous vector field on $\Gamma$. The integral
$$ \int_\Gamma\vec f(\vec x)\cdot d\vec x = \int_r^s\vec f(\vec x(t))\cdot\dot{\vec x}(t)\,dt $$
is called the line integral of the vector field $\vec f$ along the curve $\Gamma$.

Remark 8.2 (a) The definition of the line integral does not depend on the parametrization of $\Gamma$.
(b) If we take different curves between the same endpoints, the line integral may be different.
(c) If the vector field $\vec f$ is orthogonal to the tangent vector along $\Gamma$, then $\int_\Gamma\vec f\cdot d\vec x=0$.
(d) Other notations. If $\vec f=(P,Q)$ is a vector field in $\mathbb{R}^2$,
$$ \int_\Gamma\vec f\cdot d\vec x = \int_\Gamma P\,dx+Q\,dy, $$
where the right side is either a symbol or is read via $\int_\Gamma P\,dx=\int_\Gamma(P,0)\cdot d\vec x$.

Example 8.4 (a) Find the line integral $\int_{\Gamma_i}y\,dx+(x-y)\,dy$, $i=1,2$, where
$$ \Gamma_1 = \{\vec x(t)=(t,t^2) \mid t\in[0,1]\}, \qquad \Gamma_2 = \Gamma_3\cup\Gamma_4, \quad \Gamma_3=\{(t,0) \mid t\in[0,1]\}, \quad \Gamma_4=\{(1,t) \mid t\in[0,1]\}; $$
both paths go from $(0,0)$ to $(1,1)$. In the first case $\dot{\vec x}(t)=(1,2t)$; hence
$$ \int_{\Gamma_1}y\,dx+(x-y)\,dy = \int_0^1\bigl(t^2\cdot1+(t-t^2)\cdot2t\bigr)\,dt = \int_0^1(3t^2-2t^3)\,dt = \frac12. $$
In the second case $\int_{\Gamma_2}\vec f\,d\vec x=\int_{\Gamma_3}\vec f\,d\vec x+\int_{\Gamma_4}\vec f\,d\vec x$. For the first part $(dx,dy)=(dt,0)$ and $y=0$; for the second part $(dx,dy)=(0,dt)$ and $x=1$, such that
$$ \int_{\Gamma_2}y\,dx+(x-y)\,dy = \int_0^1 0\,dt + \int_0^1(1-t)\,dt = \left.t-\frac{t^2}{2}\right|_0^1 = \frac12. $$
(b) Find the work done by the force field $\vec F(x,y,z)=(y,-x,1)$ as a particle moves from $(1,0,0)$ to $(1,0,1)$ along the following paths ($\varepsilon=\pm1$):
$$ \vec x_\varepsilon(t) = \left(\cos t,\ \varepsilon\sin t,\ \frac{t}{2\pi}\right), \quad t\in[0,2\pi]. $$
We find
$$ \int_{\Gamma_\varepsilon}\vec F\cdot d\vec x = \int_0^{2\pi}(\varepsilon\sin t,\ -\cos t,\ 1)\cdot\left(-\sin t,\ \varepsilon\cos t,\ \frac{1}{2\pi}\right)dt = \int_0^{2\pi}\left(-\varepsilon\sin^2t-\varepsilon\cos^2t+\frac{1}{2\pi}\right)dt = -2\pi\varepsilon+1. $$
In the case $\varepsilon=-1$ the motion is "with the force", so the work is positive; for the path with $\varepsilon=1$ the motion is against the force and the work is negative.

We can also define a scalar line integral in the following way. Let $\gamma\colon[a,b]\to\mathbb{R}^n$ be a continuously differentiable curve, $\Gamma=\gamma([a,b])$, and $f\colon\Gamma\to\mathbb{R}$ a continuous function. The integral
$$ \int_\Gamma f(x)\,ds := \int_a^b f(\gamma(t))\,\|\gamma'(t)\|\,dt $$
is called the scalar line integral of $f$ along $\Gamma$.

Properties of Line Integrals

Remark 8.3 (a) Linearity:
$$ \int_\Gamma(\vec f+\vec g)\cdot d\vec x = \int_\Gamma\vec f\cdot d\vec x+\int_\Gamma\vec g\cdot d\vec x, \qquad \int_\Gamma\lambda\vec f\cdot d\vec x = \lambda\int_\Gamma\vec f\cdot d\vec x. $$
(b) Change of orientation. If $\vec x(t)$, $t\in[r,s]$, defines a curve $\Gamma$ which goes from $a=\vec x(r)$ to $b=\vec x(s)$, then $\vec y(t)=\vec x(r+s-t)$, $t\in[r,s]$, defines the curve $-\Gamma$ which goes in the opposite direction, from $b$ to $a$. It is easy to see that
$$ \int_{-\Gamma}\vec f\cdot d\vec x = -\int_\Gamma\vec f\cdot d\vec x. $$
(c) Estimate:
$$ \left|\int_\Gamma\vec f\cdot d\vec x\right| \le \ell(\Gamma)\sup_{x\in\Gamma}\|\vec f(x)\|. $$
Proof. Let $\vec x(t)$, $t\in[t_0,t_1]$, be a parametrization of $\Gamma$. By the triangle and Cauchy–Schwarz inequalities,
$$ \left|\int_\Gamma\vec f\cdot d\vec x\right| = \left|\int_{t_0}^{t_1}\vec f(\vec x(t))\cdot\vec x'(t)\,dt\right| \le \int_{t_0}^{t_1}\|\vec f(\vec x(t))\|\,\|\vec x'(t)\|\,dt \le \sup_{\vec x\in\Gamma}\|\vec f(\vec x)\|\int_{t_0}^{t_1}\|\vec x'(t)\|\,dt = \sup_{\vec x\in\Gamma}\|\vec f(\vec x)\|\,\ell(\Gamma). $$
(d) Splitting. If $\Gamma_1$ and $\Gamma_2$ are two curves such that the ending point of $\Gamma_1$ equals the starting point of $\Gamma_2$, then
$$ \int_{\Gamma_1\cup\Gamma_2}\vec f\cdot d\vec x = \int_{\Gamma_1}\vec f\cdot d\vec x+\int_{\Gamma_2}\vec f\cdot d\vec x. $$

8.2.1 Path Independence

Problem: For which vector fields $\vec f$ is the line integral from $a$ to $b$ independent of the chosen path (compare Example 8.4 (a) and (b))?

Definition 8.5 A vector field $\vec f\colon G\to\mathbb{R}^n$, $G\subset\mathbb{R}^n$, is called conservative if for any points $a$ and $b$ in $G$ and any curves $\Gamma_1$ and $\Gamma_2$ in $G$ from $a$ to $b$ we have
$$ \int_{\Gamma_1}\vec f\cdot d\vec x = \int_{\Gamma_2}\vec f\cdot d\vec x. $$
In this case we say that the line integral $\int_\Gamma\vec f\cdot d\vec x$ is path independent, and we use the notation $\int_a^b\vec f\cdot d\vec x$.

Definition 8.6 A vector field $\vec f\colon G\to\mathbb{R}^n$ is called a potential field or gradient vector field if there exists a continuously differentiable function $U\colon G\to\mathbb{R}$ such that $\vec f(x)=\operatorname{grad}U(x)$ for $x\in G$. We call $U$ the potential or antiderivative of $\vec f$.

Example 8.5 The gravitational force is given by
$$ \vec F(x) = -\alpha\frac{x}{\|x\|^3}, \qquad \text{where } \alpha=\gamma mM. $$
It is a potential field with potential $U(x) = \dfrac{\alpha}{\|x\|}$. This follows from Example 7.2 (a), $\operatorname{grad} f(\|x\|) = f'(\|x\|)\,\dfrac{x}{\|x\|}$, with $f(y) = 1/y$ and $f'(y) = -1/y^2$.

Remark 8.4 (a) A vector field $\vec f$ is conservative if and only if the line integral over any closed curve in $G$ is $0$. Indeed, suppose that $\vec f$ is conservative and $\Gamma = \Gamma_1 \cup \Gamma_2$ is a closed curve, where $\Gamma_1$ is a curve from $a$ to $b$ and $\Gamma_2$ is a curve from $b$ to $a$. By Remark 8.3 (b), changing the orientation of $\Gamma_2$ changes the sign of the line integral, and $-\Gamma_2$ is again a curve from $a$ to $b$:
\[ \int_\Gamma \vec f\,d\vec x = \int_{\Gamma_1} \vec f\,d\vec x + \int_{\Gamma_2} \vec f\,d\vec x = \int_{\Gamma_1} \vec f\,d\vec x - \int_{-\Gamma_2} \vec f\,d\vec x = 0. \]
The proof of the other direction is similar.
(b) Uniqueness of a potential. An open subset $G \subset \mathbb{R}^n$ is said to be *connected* if any two points $x, y \in G$ can be connected by a polygonal path from $x$ to $y$ inside $G$. If a potential exists, it is uniquely determined up to a constant. Indeed, if $\operatorname{grad} U_1(x) = \operatorname{grad} U_2(x) = \vec f$, put $U = U_1 - U_2$; then
\[ \nabla U = \nabla U_1 - \nabla U_2 = \vec f - \vec f = 0. \]
Now suppose that $x, y \in G$ can be connected by a segment $\overline{xy} \subset G$ inside $G$. By the MVT (Corollary 7.12),
\[ U(y) - U(x) = \nabla U((1-t)x + ty)\cdot(y - x) = 0, \]
since $\nabla U = 0$ on the segment $\overline{xy}$. This shows that $U = U_1 - U_2$ is constant on any polygonal path inside $G$. Since $G$ is connected, $U_1 - U_2$ is constant on $G$.

Theorem 8.2 Let $G \subset \mathbb{R}^n$ be a domain.
(i) If $U\colon G \to \mathbb{R}$ is continuously differentiable and $\vec f = \operatorname{grad} U$, then $\vec f$ is conservative, and for every (piecewise continuously differentiable) curve $\Gamma$ from $a$ to $b$, $a, b \in G$, we have
\[ \int_\Gamma \vec f\,d\vec x = U(b) - U(a). \]
(ii) Let $\vec f\colon G \to \mathbb{R}^n$ be a continuous, conservative vector field and $a \in G$. Put
\[ U(x) = \int_a^x \vec f\,d\vec y, \qquad x \in G. \]
Then $U(x)$ is a potential for $\vec f$, that is, $\operatorname{grad} U = \vec f$.
(iii) A continuous vector field $\vec f$ is conservative in $G$ if and only if it is a potential field.

Proof. (i) Let $\Gamma = \{\vec x(t) \mid t \in [r,s]\}$ be a continuously differentiable curve from $a = \vec x(r)$ to $b = \vec x(s)$.
We define $\varphi(t) = U(\vec x(t))$ and compute the derivative using the chain rule:
\[ \dot\varphi(t) = \operatorname{grad} U(\vec x(t))\cdot\dot{\vec x}(t) = \vec f(\vec x(t))\cdot\dot{\vec x}(t). \]
By definition of the line integral we have
\[ \int_\Gamma \vec f\,d\vec x = \int_r^s \vec f(\vec x(t))\cdot\dot{\vec x}(t)\,dt. \]
Inserting the above expression and applying the fundamental theorem of calculus, we find
\[ \int_\Gamma \vec f\,d\vec x = \int_r^s \dot\varphi(t)\,dt = \varphi(s) - \varphi(r) = U(\vec x(s)) - U(\vec x(r)) = U(b) - U(a). \]
(ii) Choose $h \in \mathbb{R}^n$ small such that $x + th \in G$ for all $t \in [0,1]$. By the path independence of the line integral,
\[ U(x+h) - U(x) = \int_a^{x+h} \vec f\cdot d\vec y - \int_a^x \vec f\cdot d\vec y = \int_x^{x+h} \vec f\cdot d\vec y. \]
Consider the curve $\vec x(t) = x + th$, $t \in [0,1]$, from $x$ to $x+h$. Then $\dot{\vec x}(t) = h$. By the mean value theorem of integration (Theorem 5.18 with $\varphi = 1$, $a = 0$ and $b = 1$) we have
\[ \int_x^{x+h} \vec f\cdot d\vec y = \int_0^1 \vec f(\vec x(t))\cdot h\,dt = \vec f(x + \theta h)\cdot h, \]
where $\theta \in [0,1]$. We check $\operatorname{grad} U(x) = \vec f(x)$ using the definition of the derivative:
\[ \frac{\bigl|U(x+h) - U(x) - \vec f(x)\cdot h\bigr|}{\|h\|} = \frac{\bigl|(\vec f(x+\theta h) - \vec f(x))\cdot h\bigr|}{\|h\|} \overset{\text{CSI}}{\le} \frac{\bigl\|\vec f(x+\theta h) - \vec f(x)\bigr\|\,\|h\|}{\|h\|} = \bigl\|\vec f(x+\theta h) - \vec f(x)\bigr\| \xrightarrow[h\to 0]{} 0, \]
since $\vec f$ is continuous at $x$. This shows that $\nabla U = \vec f$.
(iii) follows immediately from (i) and (ii).

Remark 8.5 (a) In case $n = 2$, a simple path to compute the line integral (and so the potential $U$) in (ii) consists of 2 segments: from $(0,0)$ via $(x,0)$ to $(x,y)$. The line integral of $P\,dx + Q\,dy$ then reads as ordinary Riemann integrals
\[ U(x,y) = \int_0^x P(t,0)\,dt + \int_0^y Q(x,t)\,dt. \]
(b) Case $n = 3$. You can also use just one single segment from the origin to the endpoint $(x,y,z)$. This path is parametrized by the curve $\vec x(t) = (tx,ty,tz)$, $t \in [0,1]$, with $\dot{\vec x}(t) = (x,y,z)$. We obtain
\[ U(x,y,z) = \int_{(0,0,0)}^{(x,y,z)} f_1\,dx + f_2\,dy + f_3\,dz \tag{8.2} \]
\[ = x\int_0^1 f_1(tx,ty,tz)\,dt + y\int_0^1 f_2(tx,ty,tz)\,dt + z\int_0^1 f_3(tx,ty,tz)\,dt. \tag{8.3} \]
(c) Although Theorem 8.2 gives a necessary and sufficient condition for a vector field to be conservative, we are missing an easy criterion. Recall from Example 7.4 that a necessary condition for $\vec f = (f_1,\dots,f_n)$ to be a potential vector field is
\[ \frac{\partial f_i}{\partial x_j} = \frac{\partial f_j}{\partial x_i}, \qquad 1 \le i < j \le n, \]
which is a simple consequence of Schwarz's lemma: if $f_i = U_{x_i}$ then
\[ U_{x_i x_j} = \frac{\partial U_{x_i}}{\partial x_j} = \frac{\partial f_i}{\partial x_j} = \frac{\partial f_j}{\partial x_i} = \frac{\partial U_{x_j}}{\partial x_i} = U_{x_j x_i}. \]
The condition
\[ \frac{\partial f_i}{\partial x_j} = \frac{\partial f_j}{\partial x_i}, \qquad 1 \le i < j \le n, \]
is called the *integrability condition* for $\vec f$. It is a necessary condition for $\vec f$ to be conservative. However, it is not sufficient.

Remark 8.6 Counterexample. Let $G = \mathbb{R}^2 \setminus \{(0,0)\}$ and
\[ \vec f = (P,Q) = \Bigl(\frac{-y}{x^2+y^2},\ \frac{x}{x^2+y^2}\Bigr). \]
The vector field satisfies the integrability condition $P_y = Q_x$. However, it is not conservative. For, consider the unit circle $\gamma(t) = (\cos t, \sin t)$, $t \in [0,2\pi]$. Then $\gamma'(t) = (-\sin t, \cos t)$ and
\[ \int_\gamma \vec f\,d\vec x = \int_\gamma \frac{-y\,dx}{x^2+y^2} + \frac{x\,dy}{x^2+y^2} = \int_0^{2\pi} \Bigl(\frac{\sin^2 t}{1} + \frac{\cos^2 t}{1}\Bigr)dt = \int_0^{2\pi} dt = 2\pi. \]
This contradicts $\int_\Gamma \vec f\,d\vec x = 0$ for conservative vector fields. Hence, $\vec f$ is not conservative. $\vec f$ fails to be conservative since $G = \mathbb{R}^2 \setminus \{(0,0)\}$ has a hole. For more details, see homework 30.1.

The next proposition shows that under one additional assumption this criterion is also sufficient. A connected open subset $G$ (a *region*) of $\mathbb{R}^n$ is called *simply connected* if every closed polygonal path inside $G$ can be shrunk inside $G$ to a single point. Roughly speaking, simply connected sets do not have holes.

- a convex subset of $\mathbb{R}^n$ — simply connected
- the 1-torus $S^1 = \{z \in \mathbb{C} \mid |z| = 1\}$ — not simply connected
- the annulus $\{(x,y) \in \mathbb{R}^2 \mid r^2 < x^2+y^2 \le R^2\}$, $0 \le r < R \le \infty$ — not simply connected
- $\mathbb{R}^2 \setminus \{(0,0)\}$ — not simply connected
- $\mathbb{R}^3 \setminus \{(0,0,0)\}$ — simply connected

The precise mathematical term for a curve $\gamma$ being "shrinkable to a point" is *null-homotopic*.

Definition 8.7 (a) A closed curve $\gamma\colon [a,b] \to G$, $G \subset \mathbb{R}^n$ open, is said to be *null-homotopic* if there exist a continuous mapping $h\colon [a,b] \times [0,1] \to G$ and a point $x_0 \in G$ such that
(a) $h(t,0) = \gamma(t)$ for all $t$,
(b) $h(t,1) = x_0$ for all $t$,
(c) $h(a,s) = h(b,s) = x_0$ for all $s \in [0,1]$.
(b) $G$ is *simply connected* if every closed curve in $G$ is null-homotopic.

Proposition 8.3 Let $\vec f = (f_1,f_2,f_3)$ be a continuously differentiable vector field on a region $G \subset \mathbb{R}^3$.
(a) If $\vec f$ is conservative then $\operatorname{curl}\vec f = 0$, i.e.
\[ \frac{\partial f_3}{\partial x_2} - \frac{\partial f_2}{\partial x_3} = 0, \qquad \frac{\partial f_1}{\partial x_3} - \frac{\partial f_3}{\partial x_1} = 0, \qquad \frac{\partial f_2}{\partial x_1} - \frac{\partial f_1}{\partial x_2} = 0. \]
(b) If $\operatorname{curl}\vec f = 0$ and $G$ is simply connected, then $\vec f$ is conservative.

Proof. (a) Let $\vec f$ be conservative; by Theorem 8.2 there exists a potential $U$ with $\operatorname{grad} U = \vec f$. However, $\operatorname{curl}\operatorname{grad} U = 0$ since
\[ \frac{\partial f_3}{\partial x_2} - \frac{\partial f_2}{\partial x_3} = \frac{\partial^2 U}{\partial x_2\,\partial x_3} - \frac{\partial^2 U}{\partial x_3\,\partial x_2} = 0 \]
by Schwarz's Lemma.
(b) This will be an application of Stokes' theorem, see below.

Example 8.6 Let, on $\mathbb{R}^3$, $\vec f = (P,Q,R) = (6xy^2 + e^x,\ 6x^2y,\ 1)$. Then
\[ \operatorname{curl}\vec f = (R_y - Q_z,\ P_z - R_x,\ Q_x - P_y) = (0,\ 0,\ 12xy - 12xy) = 0; \]
hence, $\vec f$ is conservative with a potential $U(x,y,z)$.

First method to compute the potential $U$: ODE ansatz. The ansatz $U_x = 6xy^2 + e^x$ is integrated with respect to $x$:
\[ U(x,y,z) = \int U_x\,dx + C(y,z) = \int (6xy^2 + e^x)\,dx + C(y,z) = 3x^2y^2 + e^x + C(y,z). \]
Hence,
\[ U_y = 6x^2y + C_y(y,z) \overset{!}{=} 6x^2y, \qquad U_z = C_z \overset{!}{=} 1. \]
This implies $C_y = 0$ and $C_z = 1$. The solution here is $C(y,z) = z + c_1$, such that $U = 3x^2y^2 + e^x + z + c_1$.

Second method: line integrals, see Remark 8.5 (b):
\[ U(x,y,z) = x\int_0^1 f_1(tx,ty,tz)\,dt + y\int_0^1 f_2(tx,ty,tz)\,dt + z\int_0^1 f_3(tx,ty,tz)\,dt \]
\[ = x\int_0^1 (6t^3xy^2 + e^{tx})\,dt + y\int_0^1 6t^3x^2y\,dt + z\int_0^1 dt = 3x^2y^2 + e^x - 1 + z, \]
which differs from the potential found by the first method only by a constant, as it must.

Chapter 9 Integration of Functions of Several Variables

References for this chapter are [O'N75, Section 4], which is quite elementary and easily accessible. Another elementary approach is [MW85, Chapter 17] (part III). A more advanced but still quite accessible treatment is [Spi65, Chapter 3]; this will be our main reference here. Rudin's book [Rud76] is not recommended for an introduction to integration.
9.1 Basic Definition

The definition of the Riemann integral of a function $f\colon A \to \mathbb{R}$, where $A \subset \mathbb{R}^n$ is a closed rectangle, is so similar to that of the ordinary integral that a rapid treatment will be given; see Section 5.1. Unless specified otherwise, $A$ denotes a rectangle. A rectangle $A$ is the Cartesian product of $n$ intervals,
\[ A = [a_1,b_1] \times \cdots \times [a_n,b_n] = \{(x_1,\dots,x_n) \in \mathbb{R}^n \mid a_k \le x_k \le b_k,\ k = 1,\dots,n\}. \]
Recall that a partition of a closed interval $[a,b]$ is a sequence $t_0,\dots,t_k$ where $a = t_0 \le t_1 \le \cdots \le t_k = b$. The partition divides the interval $[a,b]$ into $k$ subintervals $[t_{i-1},t_i]$. A *partition* of a rectangle $[a_1,b_1] \times \cdots \times [a_n,b_n]$ is a collection $P = (P_1,\dots,P_n)$, where each $P_i$ is a partition of the interval $[a_i,b_i]$. Suppose, for example, that $P_1 = (t_0,\dots,t_k)$ is a partition of $[a_1,b_1]$ and $P_2 = (s_0,\dots,s_l)$ is a partition of $[a_2,b_2]$. Then the partition $P = (P_1,P_2)$ of $[a_1,b_1] \times [a_2,b_2]$ divides the closed rectangle $[a_1,b_1] \times [a_2,b_2]$ into $kl$ subrectangles, a typical one being $[t_{i-1},t_i] \times [s_{j-1},s_j]$. In general, if $P_i$ divides $[a_i,b_i]$ into $N_i$ subintervals, then $P = (P_1,\dots,P_n)$ divides $[a_1,b_1] \times \cdots \times [a_n,b_n]$ into $N_1\cdots N_n$ subrectangles. These subrectangles will be called subrectangles of the partition $P$.

Suppose now $A$ is a rectangle, $f\colon A \to \mathbb{R}$ is a bounded function, and $P$ is a partition of $A$. For each subrectangle $S$ of the partition let
\[ m_S = \inf\{f(x) \mid x \in S\}, \qquad M_S = \sup\{f(x) \mid x \in S\}, \]
and let $v(S)$ be the volume of the rectangle $S$. Note that the volume of the rectangle $A = [a_1,b_1] \times \cdots \times [a_n,b_n]$ is
\[ v(A) = (b_1-a_1)(b_2-a_2)\cdots(b_n-a_n). \]
The lower and the upper sums of $f$ for $P$ are defined by
\[ L(P,f) = \sum_S m_S\,v(S) \qquad\text{and}\qquad U(P,f) = \sum_S M_S\,v(S), \]
where the sum is taken over all subrectangles $S$ of the partition $P$.
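To make the definitions concrete, here is a small numerical sketch (the choice $f(x,y) = x+y$ on $[0,1]^2$ and the uniform $n\times n$ partition are ours, not part of the notes): since this $f$ is increasing in each variable, $m_S$ and $M_S$ are attained at the lower-left and upper-right corners of each subrectangle, and one sees $U(P,f) - L(P,f) = 2/n$ shrink as the partition is refined.

```python
def lower_upper_sums(n):
    """Lower and upper sums of f(x, y) = x + y on [0, 1]^2 for the uniform
    n x n partition.  f is increasing in each variable, so the inf (sup)
    over a subrectangle is its value at the lower-left (upper-right) corner."""
    h = 1.0 / n
    vol = h * h                       # v(S) for every subrectangle S
    lower = upper = 0.0
    for i in range(n):
        for j in range(n):
            lower += (i * h + j * h) * vol              # m_S * v(S)
            upper += ((i + 1) * h + (j + 1) * h) * vol  # M_S * v(S)
    return lower, upper

L10, U10 = lower_upper_sums(10)
L100, U100 = lower_upper_sums(100)
print(L10, U10)     # about 0.9 and 1.1
print(L100, U100)   # about 0.99 and 1.01 -- both squeeze toward the integral 1
```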
Clearly, if $f$ is bounded with $m \le f(x) \le M$ on the rectangle $A$, then
\[ m\,v(A) \le L(P,f) \le U(P,f) \le M\,v(A), \]
so that the numbers $L(P,f)$ and $U(P,f)$ form bounded sets. Lemma 5.1 remains true; the proof is completely the same.

Lemma 9.1 (a) Suppose the partition $P^*$ is a refinement of $P$ (that is, each subrectangle of $P^*$ is contained in a subrectangle of $P$). Then
\[ L(P,f) \le L(P^*,f) \qquad\text{and}\qquad U(P^*,f) \le U(P,f). \]
(b) If $P$ and $P'$ are any two partitions, then $L(P,f) \le U(P',f)$.

It follows from the above lemma that every lower sum is bounded above by any upper sum, and vice versa.

Definition 9.1 Let $f\colon A \to \mathbb{R}$ be a bounded function. The function $f$ is called *Riemann integrable* on the rectangle $A$ if
\[ \underline{\int_A} f\,dx := \sup_P\{L(P,f)\} = \inf_P\{U(P,f)\} =: \overline{\int_A} f\,dx, \]
where the supremum and the infimum are taken over all partitions $P$ of $A$. This common number is the Riemann integral of $f$ on $A$ and is denoted by
\[ \int_A f\,dx \qquad\text{or}\qquad \int_A f(x_1,\dots,x_n)\,dx_1\cdots dx_n. \]
$\underline{\int_A} f\,dx$ and $\overline{\int_A} f\,dx$ are called the lower and the upper integral of $f$ on $A$, respectively. They always exist. The set of integrable functions on $A$ is denoted by $\mathcal{R}(A)$.

As in the one-dimensional case we have the following criterion.

Proposition 9.2 (Riemann Criterion) A bounded function $f\colon A \to \mathbb{R}$ is integrable if and only if for every $\varepsilon > 0$ there exists a partition $P$ of $A$ such that $U(P,f) - L(P,f) < \varepsilon$.

Example 9.1 (a) Let $f\colon A \to \mathbb{R}$ be a constant function, $f(x) = c$. Then for any partition $P$ and any subrectangle $S$ we have $m_S = M_S = c$, so that
\[ L(P,f) = U(P,f) = \sum_S c\,v(S) = c\sum_S v(S) = c\,v(A). \]
Hence, $\int_A c\,dx = c\,v(A)$.
(b) Let $f\colon [0,1]\times[0,1] \to \mathbb{R}$ be defined by
\[ f(x,y) = \begin{cases} 0, & \text{if } x \text{ is rational},\\ 1, & \text{if } x \text{ is irrational}. \end{cases} \]
If $P$ is a partition, then every subrectangle $S$ will contain points $(x,y)$ with $x$ rational, and also points $(x,y)$ with $x$ irrational.
Hence $m_S = 0$ and $M_S = 1$, so
\[ L(P,f) = \sum_S 0\cdot v(S) = 0 \qquad\text{and}\qquad U(P,f) = \sum_S 1\cdot v(S) = v(A) = v([0,1]\times[0,1]) = 1. \]
Therefore, $\overline{\int_A} f\,dx = 1 \ne 0 = \underline{\int_A} f\,dx$, and $f$ is not integrable.

9.1.1 Properties of the Riemann Integral

We briefly write $\mathcal{R}$ for $\mathcal{R}(A)$.

Remark 9.1 (a) $\mathcal{R}$ is a linear space and $\int_A(\cdot)\,dx$ is a linear functional, i.e. $f, g \in \mathcal{R}$ imply $\lambda f + \mu g \in \mathcal{R}$ for all $\lambda, \mu \in \mathbb{R}$ and
\[ \int_A (\lambda f + \mu g)\,dx = \lambda\int_A f\,dx + \mu\int_A g\,dx. \]
(b) $\mathcal{R}$ is a lattice, i.e. $f \in \mathcal{R}$ implies $|f| \in \mathcal{R}$. If $f, g \in \mathcal{R}$, then $\max\{f,g\} \in \mathcal{R}$ and $\min\{f,g\} \in \mathcal{R}$.
(c) $\mathcal{R}$ is an algebra, i.e. $f, g \in \mathcal{R}$ imply $fg \in \mathcal{R}$.
(d) The triangle inequality holds:
\[ \Bigl|\int_A f\,dx\Bigr| \le \int_A |f|\,dx. \]
(e) $C(A) \subset \mathcal{R}(A)$.
(f) If $f \in \mathcal{R}(A)$ with $f(A) \subset [a,b]$ and $g \in C[a,b]$, then $g\circ f \in \mathcal{R}(A)$.
(g) If $f \in \mathcal{R}$ and $f = g$ except at finitely many points, then $g \in \mathcal{R}$ and $\int_A f\,dx = \int_A g\,dx$.
(h) Let $f\colon A \to \mathbb{R}$ and let $P$ be a partition of $A$. Then $f \in \mathcal{R}(A)$ if and only if $f{\restriction}S$ is integrable for each subrectangle $S$. In this case
\[ \int_A f\,dx = \sum_S \int_S f{\restriction}S\,dx. \]

9.2 Integrable Functions

We are going to characterize integrable functions. For this we need the notion of a set of measure zero.

Definition 9.2 Let $A$ be a subset of $\mathbb{R}^n$. $A$ has ($n$-dimensional) *measure zero* if for every $\varepsilon > 0$ there exists a sequence $(U_i)_{i\in\mathbb{N}}$ of closed rectangles $U_i$ which cover $A$ such that $\sum_{i=1}^\infty v(U_i) < \varepsilon$. Open rectangles can also be used in the definition.

Remark 9.2 (a) Any finite set $\{a_1,\dots,a_m\} \subset \mathbb{R}^n$ is of measure $0$. Indeed, let $\varepsilon > 0$ and choose $U_i$ to be a rectangle with midpoint $a_i$ and volume $\varepsilon/m$. Then $\{U_i \mid i = 1,\dots,m\}$ covers $A$ and $\sum_i v(U_i) \le \varepsilon$.
(b) Any countable set is of measure $0$.
(c) If each $A_i$, $i \in \mathbb{N}$, has measure $0$, then $A = A_1 \cup A_2 \cup \cdots$ has measure $0$.
Proof. Let $\varepsilon > 0$. Since $A_i$ has measure $0$, there exist closed rectangles $U_{ik}$, $i \in \mathbb{N}$, $k \in \mathbb{N}$, such that for fixed $i$ the family $\{U_{ik} \mid k \in \mathbb{N}\}$ covers $A_i$, i.e. $\bigcup_{k\in\mathbb{N}} U_{ik} \supseteq A_i$, and $\sum_{k\in\mathbb{N}} v(U_{ik}) \le \varepsilon/2^{i-1}$, $i \in \mathbb{N}$.
In this way we have constructed an infinite array $\{U_{ik}\}$ which covers $A$. Arranging those sets in a sequence (cf. Cantor's first diagonal process), we obtain a sequence of rectangles which covers $A$, and
\[ \sum_{i,k=1}^\infty v(U_{ik}) \le \sum_{i=1}^\infty \frac{\varepsilon}{2^{i-1}} = 2\varepsilon. \]
Hence, $\sum_{i,k=1}^\infty v(U_{ik}) \le 2\varepsilon$ and $A$ has measure $0$.
(d) Let $A = [a_1,b_1]\times\cdots\times[a_n,b_n]$ be a non-singular rectangle, that is, $a_i < b_i$ for all $i = 1,\dots,n$. Then $A$ is *not* of measure $0$. Indeed, we use the following two facts about the volume of finite unions of rectangles:
(a) $v(U_1 \cup \cdots \cup U_m) \le \sum_{i=1}^m v(U_i)$,
(b) $U \subseteq V$ implies $v(U) \le v(V)$.
Now let $\varepsilon = v(A)/2 = (b_1-a_1)\cdots(b_n-a_n)/2$ and suppose that the open rectangles $(U_i)_{i\in\mathbb{N}}$ cover the compact set $A$. Then there exists a finite subcover $U_1 \cup \cdots \cup U_m \supseteq A$. This and (a), (b) imply
\[ \varepsilon < v(A) \le v\Bigl(\bigcup_{i=1}^m U_i\Bigr) \le \sum_{i=1}^m v(U_i) \le \sum_{i=1}^\infty v(U_i). \]
This contradicts $\sum_i v(U_i) \le \varepsilon$; thus, $A$ does not have measure $0$.

Theorem 9.3 Let $A$ be a closed rectangle and $f\colon A \to \mathbb{R}$ a bounded function. Let $B = \{x \in A \mid f \text{ is discontinuous at } x\}$. Then $f$ is integrable if and only if $B$ is a set of measure $0$.

For the proof see [Spi65, 3-8 Theorem] or [Rud76, Theorem 11.33].

9.2.1 Integration over More General Sets

We have so far dealt only with integrals of functions over rectangles. Integrals over other sets are easily reduced to this type. If $C \subset \mathbb{R}^n$, the characteristic function $\chi_C$ of $C$ is defined by
\[ \chi_C(x) = \begin{cases} 1, & x \in C,\\ 0, & x \notin C. \end{cases} \]
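A numerical illustration of integrating a characteristic function (a sketch; the unit disc and the grid size are our choice, not from the notes): a midpoint Riemann sum of $\chi_C$ for the disc $C = \{x^2+y^2 \le 1\}$ over the rectangle $[-1,1]^2$ approaches the area $\pi$, and the error comes only from the subrectangles meeting the boundary of $C$.

```python
def volume_by_chi(chi, a1, b1, a2, b2, n=1000):
    """Midpoint Riemann sum of the characteristic function chi over the
    rectangle [a1, b1] x [a2, b2], i.e. an approximation of v(C)."""
    hx = (b1 - a1) / n
    hy = (b2 - a2) / n
    count = 0
    for i in range(n):
        x = a1 + (i + 0.5) * hx
        for j in range(n):
            y = a2 + (j + 0.5) * hy
            if chi(x, y):               # chi = 1 on C, 0 outside
                count += 1
    return count * hx * hy

area = volume_by_chi(lambda x, y: x * x + y * y <= 1.0, -1, 1, -1, 1)
print(area)   # close to pi = 3.14159...
```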
Definition 9.3 Let $f\colon C \to \mathbb{R}$ be bounded and $A$ a rectangle with $C \subset A$. We call $f$ *Riemann integrable on* $C$ if the product function $f\cdot\chi_C\colon A \to \mathbb{R}$ is Riemann integrable on $A$. In this case we define
\[ \int_C f\,dx = \int_A f\chi_C\,dx. \]
This certainly occurs if both $f$ and $\chi_C$ are integrable on $A$. Note that $\int_C 1\,dx = \int_A \chi_C\,dx =: v(C)$ is defined to be the volume or measure of $C$.

Problem: Under which conditions on $C$ does the volume $v(C) = \int_C dx$ exist? By Theorem 9.3, $\chi_C$ is integrable if and only if the set $B$ of discontinuities of $\chi_C$ in $A$ has measure $0$.

The boundary of a set $C$

Let $C \subset A$. For every $x \in A$ exactly one of the following three cases occurs:
(a) $x$ has a neighborhood which is completely contained in $C$ ($x$ is an inner point of $C$);
(b) $x$ has a neighborhood which is completely contained in $C^c$ ($x$ is an inner point of $C^c$);
(c) every neighborhood of $x$ intersects both $C$ and $C^c$. In this case we say that $x$ belongs to the *boundary* $\partial C$ of $C$.
By definition $\partial C = \overline{C} \cap \overline{C^c}$; also $\partial C = \overline{C} \setminus C^\circ$. By the above discussion, $A$ is the disjoint union of two open sets and a closed set: $A = C^\circ \cup \partial C \cup (C^c)^\circ$.

Theorem 9.4 The characteristic function $\chi_C\colon A \to \mathbb{R}$ is integrable if and only if the boundary of $C$ has measure $0$.

Proof. Since the boundary $\partial C$ is closed and contained in the bounded set $A$, $\partial C$ is compact. Suppose first $x$ is an inner point of $C$. Then there is an open set $U \subset C$ containing $x$. Thus $\chi_C = 1$ on $U$; clearly $\chi_C$ is continuous at $x$ (since it is locally constant). Similarly, if $x$ is an inner point of $C^c$, $\chi_C$ is locally constant, namely $\chi_C = 0$ in a neighborhood of $x$; hence $\chi_C$ is continuous at $x$. Finally, if $x$ is in the boundary of $C$, then for every open neighborhood $U$ of $x$ there are $y_1 \in U \cap C$ and $y_2 \in U \cap C^c$, so that $\chi_C(y_1) = 1$ whereas $\chi_C(y_2) = 0$. Hence, $\chi_C$ is not continuous at $x$. Thus the set of discontinuities of $\chi_C$ is exactly the boundary $\partial C$. The rest follows from Theorem 9.3.

Definition 9.4 A bounded set $C$ is called *Jordan measurable*, or simply a *Jordan set*, if its boundary has measure $0$. The integral $v(C) = \int_C 1\,dx$ is called the $n$-dimensional Jordan measure of $C$ or the $n$-dimensional volume of $C$; sometimes we write $\mu(C)$ in place of $v(C)$. Naturally, the one-dimensional volume is the length, and the two-dimensional volume is the area.

Typical Examples of Jordan Measurable Sets

Hyperplanes $\sum_{i=1}^n a_i x_i = c$ and, more generally, hypersurfaces $f(x_1,\dots,x_n) = c$, $f \in C^1(G)$, are sets of measure $0$ in $\mathbb{R}^n$. Curves in $\mathbb{R}^n$ have measure $0$. Graphs of continuous functions, $\Gamma_f = \{(x,f(x)) \in \mathbb{R}^{n+1} \mid x \in G\}$, are of measure $0$ in $\mathbb{R}^{n+1}$. If $G$ is a bounded region in $\mathbb{R}^n$, the boundary $\partial G$ has measure $0$. If $G \subset \mathbb{R}^n$ is a region, the cylinder $C = \partial G \times \mathbb{R} = \{(x,x_{n+1}) \mid x \in \partial G\} \subset \mathbb{R}^{n+1}$ is a measure-$0$ set. Let $D \subset \mathbb{R}^{n+1}$ be given by
\[ D = \{(x,x_{n+1}) \mid x \in K,\ 0 \le x_{n+1} \le f(x)\}, \]
where $K \subset \mathbb{R}^n$ is a compact set and $f\colon K \to \mathbb{R}$ is continuous. Then $D$ is Jordan measurable.
Indeed, $D$ is bounded by the graph $\Gamma_f$, the hyperplane $x_{n+1} = 0$, and the cylinder $\partial K \times \mathbb{R} = \{(x,x_{n+1}) \mid x \in \partial K\}$, and all of these have measure $0$ in $\mathbb{R}^{n+1}$.

9.2.2 Fubini's Theorem and Iterated Integrals

Our goal is to evaluate Riemann integrals; however, so far we have no method to compute multiple integrals. The following theorem fills this gap.

Theorem 9.5 (Fubini's Theorem) Let $A \subset \mathbb{R}^n$ and $B \subset \mathbb{R}^m$ be closed rectangles, and let $f\colon A\times B \to \mathbb{R}$ be integrable. For $x \in A$ let $g_x\colon B \to \mathbb{R}$ be defined by $g_x(y) = f(x,y)$, and let
\[ L(x) = \underline{\int_B} g_x\,dy = \underline{\int_B} f(x,y)\,dy, \qquad U(x) = \overline{\int_B} g_x\,dy = \overline{\int_B} f(x,y)\,dy. \]
Then $L(x)$ and $U(x)$ are integrable on $A$ and
\[ \int_{A\times B} f\,dx\,dy = \int_A L(x)\,dx = \int_A \Bigl(\underline{\int_B} f(x,y)\,dy\Bigr)dx, \qquad \int_{A\times B} f\,dx\,dy = \int_A U(x)\,dx = \int_A \Bigl(\overline{\int_B} f(x,y)\,dy\Bigr)dx. \]
The integrals on the right are called *iterated integrals*. The proof is in the appendix to this chapter.
Remarks 9.3 (a) A similar proof shows that we can exchange the order of integration:
\[ \int_{A\times B} f\,dx\,dy = \int_B \Bigl(\underline{\int_A} f(x,y)\,dx\Bigr)dy = \int_B \Bigl(\overline{\int_A} f(x,y)\,dx\Bigr)dy. \]
These integrals are called iterated integrals for $f$.
(b) In practice it is often the case that each $g_x$ is integrable, so that $\int_{A\times B} f\,dx\,dy = \int_A \bigl(\int_B f(x,y)\,dy\bigr)dx$. This certainly occurs if $f$ is continuous.
(c) If $A = [a_1,b_1]\times\cdots\times[a_n,b_n]$ and $f\colon A \to \mathbb{R}$ is continuous, we can apply Fubini's theorem repeatedly to obtain
\[ \int_A f\,dx = \int_{a_n}^{b_n}\!\cdots\int_{a_1}^{b_1} f(x_1,\dots,x_n)\,dx_1\cdots dx_n. \]
(d) If $C \subset A\times B$, Fubini's theorem can be used to compute $\int_C f\,dx$, since this is by definition $\int_{A\times B} f\chi_C\,dx$. Here are two examples, in case $n = 2$ and $n = 3$.

Let $a < b$ and let $\varphi(x)$ and $\psi(x)$ be continuous real-valued functions on $[a,b]$ with $\varphi(x) < \psi(x)$ on $[a,b]$. Put
\[ C = \{(x,y) \in \mathbb{R}^2 \mid a \le x \le b,\ \varphi(x) \le y \le \psi(x)\}. \]
Let $f(x,y)$ be continuous on $C$. Then $f$ is integrable on $C$ and
\[ \iint_C f\,dx\,dy = \int_a^b \Bigl(\int_{\varphi(x)}^{\psi(x)} f(x,y)\,dy\Bigr)dx. \]
Let
\[ G = \{(x,y,z) \in \mathbb{R}^3 \mid a \le x \le b,\ \varphi(x) \le y \le \psi(x),\ \alpha(x,y) \le z \le \beta(x,y)\}, \]
where all functions are sufficiently nice. Then
\[ \iiint_G f(x,y,z)\,dx\,dy\,dz = \int_a^b \biggl(\int_{\varphi(x)}^{\psi(x)} \Bigl(\int_{\alpha(x,y)}^{\beta(x,y)} f(x,y,z)\,dz\Bigr)dy\biggr)dx. \]
(e) Cavalieri's Principle. Let $A$ and $B$ be Jordan sets in $\mathbb{R}^3$ and let $A_c = \{(x,y) \mid (x,y,c) \in A\}$ be the section of $A$ with the plane $z = c$; $B_c$ is defined similarly. Suppose each $A_c$ and $B_c$ is Jordan measurable (in $\mathbb{R}^2$) and they have the same area, $v(A_c) = v(B_c)$, for all $c \in \mathbb{R}$. Then $A$ and $B$ have the same volume, $v(A) = v(B)$.

Example 9.2 (a) Let $f(x,y) = xy$ and
\[ C = \{(x,y) \in \mathbb{R}^2 \mid 0 \le x \le 1,\ x^2 \le y \le x\} = \{(x,y) \in \mathbb{R}^2 \mid 0 \le y \le 1,\ y \le x \le \sqrt y\}. \]
Then
\[ \iint_C xy\,dx\,dy = \int_0^1\Bigl(\int_{x^2}^x xy\,dy\Bigr)dx = \int_0^1 \frac{xy^2}{2}\Big|_{y=x^2}^{x}\,dx = \frac12\int_0^1 (x^3 - x^5)\,dx = \Bigl[\frac{x^4}{8} - \frac{x^6}{12}\Bigr]_0^1 = \frac18 - \frac1{12} = \frac1{24}. \]
Interchanging the order of integration, we obtain
\[ \iint_C xy\,dx\,dy = \int_0^1\Bigl(\int_y^{\sqrt y} xy\,dx\Bigr)dy = \int_0^1 \frac{x^2y}{2}\Big|_{x=y}^{\sqrt y}\,dy = \frac12\int_0^1 (y^2 - y^3)\,dy = \frac16 - \frac18 = \frac1{24}. \]
(b) Let $G = \{(x,y,z) \in \mathbb{R}^3 \mid x,y,z \ge 0,\ x+y+z \le 1\}$ and $f(x,y,z) = 1/(x+y+z+1)^3$. The set $G$ can be parametrized as follows:
\[ \iiint_G f\,dx\,dy\,dz = \int_0^1\biggl(\int_0^{1-x}\Bigl(\int_0^{1-x-y} \frac{dz}{(1+x+y+z)^3}\Bigr)dy\biggr)dx = \int_0^1\biggl(\int_0^{1-x}\frac{-1}{2(1+x+y+z)^2}\Big|_0^{1-x-y}\,dy\biggr)dx \]
\[ = \int_0^1\biggl(\int_0^{1-x}\Bigl(\frac{1}{2(1+x+y)^2} - \frac18\Bigr)dy\biggr)dx = \frac12\int_0^1\Bigl(\frac{1}{x+1} + \frac{x-3}{4}\Bigr)dx = \frac12\log 2 - \frac{5}{16}. \]
(c) Let $f(x,y) = e^{y/x}$ and let $D$ be the region bounded by the lines $y = x$, $y = 2x$, $x = 1$, and $x = 2$ (with vertices $(1,1)$, $(2,2)$, $(1,2)$, $(2,4)$). Compute the integral of $f$ on $D$. $D$ can be parametrized as follows:
\[ D = \{(x,y) \mid 1 \le x \le 2,\ x \le y \le 2x\}. \]
Hence,
\[ \iint_D f\,dx\,dy = \int_1^2\Bigl(\int_x^{2x} e^{y/x}\,dy\Bigr)dx = \int_1^2 xe^{y/x}\Big|_{y=x}^{2x}\,dx = \int_1^2 (e^2x - ex)\,dx = \frac32(e^2 - e). \]
But trying to reverse the order of integration we encounter two problems. First, we must break $D$ into several regions:
\[ \iint_D f\,dx\,dy = \int_1^2\Bigl(\int_1^y e^{y/x}\,dx\Bigr)dy + \int_2^4\Bigl(\int_{y/2}^2 e^{y/x}\,dx\Bigr)dy. \]
This is not a serious problem. A greater problem is that $e^{1/x}$ has no elementary antiderivative, so $\int_1^y e^{y/x}\,dx$ and $\int_{y/2}^2 e^{y/x}\,dx$ are very difficult to evaluate.
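The iterated integrals of Example 9.2 can be checked numerically. The sketch below is ours (the helper `iterated` and the grid size are not part of the notes): it evaluates (a) in both orders of integration, which agree as Fubini's theorem predicts, and (c) in the easy order.

```python
import math

def iterated(a, b, lo, hi, f, n=300):
    """Iterated midpoint sum  int_a^b ( int_{lo(u)}^{hi(u)} f(u, v) dv ) du."""
    H = (b - a) / n
    total = 0.0
    for i in range(n):
        u = a + (i + 0.5) * H
        l, h = lo(u), hi(u)
        step = (h - l) / n
        inner = 0.0
        for j in range(n):
            v = l + (j + 0.5) * step
            inner += f(u, v) * step
        total += inner * H
    return total

# Example 9.2 (a): both orders of integration give 1/24.
v1 = iterated(0.0, 1.0, lambda x: x * x, lambda x: x, lambda x, y: x * y)
v2 = iterated(0.0, 1.0, lambda y: y, lambda y: y ** 0.5, lambda y, x: x * y)
# Example 9.2 (c), easy order: the value is (3/2)(e^2 - e).
v3 = iterated(1.0, 2.0, lambda x: x, lambda x: 2 * x, lambda x, y: math.exp(y / x))
print(v1, v2, v3)   # ~0.041666..., ~0.041666..., ~7.006
```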
In this example there is a considerable advantage in one order of integration over the other.

The Cosmopolitan Integral

Let $f \ge 0$ be a continuous real-valued function on $[a,b]$. There are some very special solids whose volumes can be expressed by integrals. The simplest such solid $G$ is a "volume of revolution" obtained by revolving the region under the graph of $f \ge 0$ on $[a,b]$ around the horizontal $x$-axis. We apply Fubini's theorem to the set
\[ G = \{(x,y,z) \in \mathbb{R}^3 \mid a \le x \le b,\ y^2 + z^2 \le f(x)^2\}. \]
Consequently, the volume $v(G)$ is given by
\[ v(G) = \iiint_G dx\,dy\,dz = \int_a^b \Bigl(\iint_{G_x} dy\,dz\Bigr)dx, \tag{9.1} \]
where $G_x = \{(y,z) \in \mathbb{R}^2 \mid y^2+z^2 \le f(x)^2\}$ is the closed disc of radius $f(x)$ around $(0,0)$. For any fixed $x \in [a,b]$ its area is $v(G_x) = \iint_{G_x} dy\,dz = \pi f(x)^2$. Hence
\[ v(G) = \pi\int_a^b f(x)^2\,dx. \tag{9.2} \]

Example 9.3 We compute the volume of the ellipsoid obtained by revolving the graph of the ellipse
\[ \frac{x^2}{a^2} + \frac{y^2}{b^2} = 1 \]
around the $x$-axis. We have $y^2 = f(x)^2 = b^2\bigl(1 - \frac{x^2}{a^2}\bigr)$; hence
\[ v(G) = \pi b^2\int_{-a}^a \Bigl(1 - \frac{x^2}{a^2}\Bigr)dx = \pi b^2\Bigl[x - \frac{x^3}{3a^2}\Bigr]_{-a}^a = \pi b^2\Bigl(2a - \frac{2a}{3}\Bigr) = \frac{4\pi}{3}\,ab^2. \]

9.3 Change of Variable

We want to generalize the change of variables formula $\int_{g(a)}^{g(b)} f(x)\,dx = \int_a^b f(g(y))\,g'(y)\,dy$. If $B_R$ is the ball in $\mathbb{R}^3$ with radius $R$ around the origin, we have in Cartesian coordinates
\[ \iiint_{B_R} f\,dx\,dy\,dz = \int_{-R}^R\biggl(\int_{-\sqrt{R^2-x^2}}^{\sqrt{R^2-x^2}}\Bigl(\int_{-\sqrt{R^2-x^2-y^2}}^{\sqrt{R^2-x^2-y^2}} f(x,y,z)\,dz\Bigr)dy\biggr)dx. \]
Usually, the complicated limits yield hard computations. Here spherical coordinates are appropriate. To motivate the formula, consider the area of a parallelogram $D$ in the $x$-$y$-plane spanned by the two vectors $a = (a_1,a_2)$ and $b = (b_1,b_2)$:
\[ D = \{\lambda a + \mu b \mid \lambda,\mu \in [0,1]\} = \bigl\{(g_1(\lambda,\mu),\ g_2(\lambda,\mu)) \bigm| \lambda,\mu \in [0,1]\bigr\}, \]
where $g_1(\lambda,\mu) = \lambda a_1 + \mu b_1$ and $g_2(\lambda,\mu) = \lambda a_2 + \mu b_2$.
As known from linear algebra, the area of $D$ equals the norm of the vector product
\[ v(D) = \|a\times b\| = \Bigl\|\det\begin{pmatrix} e_1 & e_2 & e_3\\ a_1 & a_2 & 0\\ b_1 & b_2 & 0 \end{pmatrix}\Bigr\| = \|(0,0,a_1b_2 - a_2b_1)\| = |a_1b_2 - a_2b_1| =: d. \]
Introducing new variables $\lambda$ and $\mu$ with $x = \lambda a_1 + \mu b_1$, $y = \lambda a_2 + \mu b_2$, the parallelogram $D$ in the $x$-$y$-plane is now the unit square $C = [0,1]\times[0,1]$ in the $\lambda$-$\mu$-plane, and $D = g(C)$. We want to compare the area $d$ of $D$ with the area $1$ of $C$. Note that $d$ is exactly the absolute value of the Jacobian $\frac{\partial(g_1,g_2)}{\partial(\lambda,\mu)}$; indeed,
\[ \frac{\partial(g_1,g_2)}{\partial(\lambda,\mu)} = \det\begin{pmatrix} \dfrac{\partial g_1}{\partial\lambda} & \dfrac{\partial g_1}{\partial\mu}\\[1ex] \dfrac{\partial g_2}{\partial\lambda} & \dfrac{\partial g_2}{\partial\mu} \end{pmatrix} = \det\begin{pmatrix} a_1 & b_1\\ a_2 & b_2 \end{pmatrix} = a_1b_2 - a_2b_1. \]
Hence,
\[ \iint_D dx\,dy = \iint_C \Bigl|\frac{\partial(g_1,g_2)}{\partial(\lambda,\mu)}\Bigr|\,d\lambda\,d\mu. \]
This is true in any $\mathbb{R}^n$ and for any regular map $g\colon C \to D$.

Theorem 9.6 (Change of variable) Let $C$ and $D$ be compact Jordan sets in $\mathbb{R}^n$, and let $M \subset C$ be a set of measure $0$. Let $g\colon C \to D$ be continuously differentiable with the following properties:
(i) $g$ is injective on $C \setminus M$;
(ii) $g'(x)$ is regular on $C \setminus M$.
Let $f\colon D \to \mathbb{R}$ be continuous.
Then
\[ \int_D f(y)\,dy = \int_C f(g(x))\,\Bigl|\frac{\partial(g_1,\dots,g_n)}{\partial(x_1,\dots,x_n)}(x)\Bigr|\,dx. \tag{9.3} \]

Remark 9.4 Why the absolute value of the Jacobian? In $\mathbb{R}^1$ we don't have the absolute value. But, in contrast to $\mathbb{R}^n$, $n > 1$, there we have an orientation of the integration set: $\int_a^b f\,dx = -\int_b^a f\,dx$.

For the proof see [Rud76, 10.9 Theorem]. The main steps of the proof are: 1) In a small open set, $g$ can be written as the composition of finitely many "flips" and "primitive mappings". A flip exchanges two variables $x_i$ and $x_k$, whereas a primitive mapping $H$ is equal to the identity except for one variable, $H(x) = x + (h(x) - x_m)e_m$, where $h\colon U \to \mathbb{R}$. 2) If the statement is true for transformations $S$ and $T$, then it is true for the composition $S\circ T$, which follows from $\det(AB) = \det A\,\det B$. 3) Use a partition of unity.

Example 9.4 (a) Polar coordinates. Let $A = \{(r,\varphi) \mid 0 \le r \le R,\ 0 \le \varphi < 2\pi\}$ be a rectangle in polar coordinates. The mapping $g(r,\varphi) = (x,y)$, $x = r\cos\varphi$, $y = r\sin\varphi$, maps this rectangle continuously differentiably onto the disc $D$ of radius $R$. Let $M = \{(r,\varphi) \mid r = 0\}$. Since $\frac{\partial(x,y)}{\partial(r,\varphi)} = r$, the map $g$ is bijective and regular on $A \setminus M$. The assumptions of the theorem are satisfied and we have
\[ \iint_D f(x,y)\,dx\,dy = \iint_A f(r\cos\varphi,\ r\sin\varphi)\,r\,dr\,d\varphi \overset{\text{Fubini}}{=} \int_0^R\!\!\int_0^{2\pi} f(r\cos\varphi,\ r\sin\varphi)\,r\,d\varphi\,dr. \]
(b) Spherical coordinates. Recall from the exercise class the spherical coordinates $r \in [0,\infty)$, $\varphi \in [0,2\pi]$, and $\vartheta \in [0,\pi]$:
\[ x = r\sin\vartheta\cos\varphi, \qquad y = r\sin\vartheta\sin\varphi, \qquad z = r\cos\vartheta. \]
The Jacobian reads
\[ \frac{\partial(x,y,z)}{\partial(r,\vartheta,\varphi)} = \det\begin{pmatrix} x_r & x_\vartheta & x_\varphi\\ y_r & y_\vartheta & y_\varphi\\ z_r & z_\vartheta & z_\varphi \end{pmatrix} = \det\begin{pmatrix} \sin\vartheta\cos\varphi & r\cos\vartheta\cos\varphi & -r\sin\vartheta\sin\varphi\\ \sin\vartheta\sin\varphi & r\cos\vartheta\sin\varphi & r\sin\vartheta\cos\varphi\\ \cos\vartheta & -r\sin\vartheta & 0 \end{pmatrix} = r^2\sin\vartheta. \]
Sometimes one uses the other order of the angles, $\frac{\partial(x,y,z)}{\partial(r,\varphi,\vartheta)} = -r^2\sin\vartheta$. Hence, for the unit ball $B_1$,
\[ \iiint_{B_1} f(x,y,z)\,dx\,dy\,dz = \int_0^1\!\!\int_0^{2\pi}\!\!\int_0^\pi f(r\sin\vartheta\cos\varphi,\ r\sin\vartheta\sin\varphi,\ r\cos\vartheta)\,r^2\sin\vartheta\,d\vartheta\,d\varphi\,dr. \]

This example was not covered in the lecture. Compute the volume of the ellipsoid $E$ given by $u^2/a^2 + v^2/b^2 + w^2/c^2 = 1$. We use scaled spherical coordinates:
\[ u = ar\sin\vartheta\cos\varphi, \qquad v = br\sin\vartheta\sin\varphi, \qquad w = cr\cos\vartheta, \]
where $r \in [0,1]$, $\vartheta \in [0,\pi]$, $\varphi \in [0,2\pi]$. Since the rows of the spherical Jacobian matrix $\frac{\partial(x,y,z)}{\partial(r,\vartheta,\varphi)}$ are simply multiplied by $a$, $b$, and $c$, respectively, we have
\[ \frac{\partial(u,v,w)}{\partial(r,\vartheta,\varphi)} = abc\,r^2\sin\vartheta. \]
Hence we have, using iterated integrals over the parameter rectangle $[0,1]\times[0,\pi]\times[0,2\pi]$,
\[ v(E) = \iiint_E du\,dv\,dw = abc\iiint r^2\sin\vartheta\,dr\,d\vartheta\,d\varphi = abc\int_0^1 r^2\,dr\int_0^{2\pi}d\varphi\int_0^\pi \sin\vartheta\,d\vartheta = abc\cdot\frac13\cdot 2\pi\cdot(-\cos\vartheta)\Big|_0^\pi = \frac{4\pi}{3}\,abc. \]
(c) Compute
\[ \iint_C (x^2+y^2)\,dx\,dy, \]
where $C$ is bounded by the four hyperbolas
\[ xy = 1, \qquad xy = 2, \qquad x^2 - y^2 = 1, \qquad x^2 - y^2 = 4. \]
We change coordinates, $g(x,y) = (u,v)$:
\[ u = xy, \qquad v = x^2 - y^2. \]
The Jacobian is
\[ \frac{\partial(u,v)}{\partial(x,y)} = \det\begin{pmatrix} y & x\\ 2x & -2y \end{pmatrix} = -2(x^2+y^2). \]
The Jacobian of the inverse transform is
\[ \frac{\partial(x,y)}{\partial(u,v)} = -\frac{1}{2(x^2+y^2)}. \]
In the $(u,v)$-plane the region is a rectangle, $D = \{(u,v) \in \mathbb{R}^2 \mid 1 \le u \le 2,\ 1 \le v \le 4\}$. Hence,
\[ \iint_C (x^2+y^2)\,dx\,dy = \iint_D (x^2+y^2)\,\Bigl|\frac{\partial(x,y)}{\partial(u,v)}\Bigr|\,du\,dv = \iint_D \frac{x^2+y^2}{2(x^2+y^2)}\,du\,dv = \frac12\,v(D) = \frac32. \]

Physical Applications

If $\varrho(x) = \varrho(x_1,x_2,x_3)$ is a mass density of a solid $C \subset \mathbb{R}^3$, then
\[ m = \int_C \varrho\,dx \]
is the mass of $C$, and
\[ \overline{x_i} = \frac1m\int_C x_i\,\varrho(x)\,dx, \qquad i = 1,\dots,3, \]
are the coordinates of the mass center $\overline{x}$ of $C$.
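The scaled-spherical-coordinates computation of Example 9.4 (b) can be verified numerically. The following is a sketch (the function name and grid size are our choices, not from the notes): it integrates the Jacobian $abc\,r^2\sin\vartheta$ over the parameter rectangle by midpoint sums and compares the result with $\frac{4\pi}{3}abc$.

```python
import math

def ellipsoid_volume(a, b, c, n=200):
    """Volume of the ellipsoid u^2/a^2 + v^2/b^2 + w^2/c^2 <= 1, computed as
    the iterated midpoint sum of the Jacobian a*b*c*r^2*sin(theta) over the
    parameter rectangle [0,1] x [0,pi] x [0,2*pi]."""
    hr, ht = 1.0 / n, math.pi / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * hr
        for j in range(n):
            th = (j + 0.5) * ht
            total += a * b * c * r * r * math.sin(th) * hr * ht
    return total * 2.0 * math.pi   # the phi-integration contributes a factor 2*pi

vol = ellipsoid_volume(1.0, 2.0, 3.0)
print(vol, 4.0 * math.pi / 3.0 * 1.0 * 2.0 * 3.0)   # both close to 25.13
```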
The moments of inertia of $C$ are defined as follows:
\[ I_{xx} = \iiint_C (y^2+z^2)\,\varrho\,dx\,dy\,dz, \qquad I_{yy} = \iiint_C (x^2+z^2)\,\varrho\,dx\,dy\,dz, \qquad I_{zz} = \iiint_C (x^2+y^2)\,\varrho\,dx\,dy\,dz, \]
\[ I_{xy} = \iiint_C xy\,\varrho\,dx\,dy\,dz, \qquad I_{xz} = \iiint_C xz\,\varrho\,dx\,dy\,dz, \qquad I_{yz} = \iiint_C yz\,\varrho\,dx\,dy\,dz. \]
Here $I_{xx}$, $I_{yy}$, and $I_{zz}$ are the moments of inertia of the solid with respect to the $x$-axis, $y$-axis, and $z$-axis, respectively.

Example 9.5 Compute the mass center of a homogeneous half-plate of radius $R$, $C = \{(x,y) \mid x^2+y^2 \le R^2,\ y \ge 0\}$.
Solution. By the symmetry of $C$ with respect to the $y$-axis, $\overline{x} = 0$. Using polar coordinates we find
\[ \overline{y} = \frac1m\iint_C y\,dx\,dy = \frac1m\int_0^R\Bigl(\int_0^\pi r\sin\varphi\cdot r\,d\varphi\Bigr)dr = \frac1m\int_0^R r^2\,dr\,(-\cos\varphi)\Big|_0^\pi = \frac1m\cdot\frac{2R^3}{3}. \]
Since the mass is proportional to the area, $m = \frac{\pi R^2}{2}$, and we find that $\bigl(0, \frac{4R}{3\pi}\bigr)$ is the mass center of the half-plate.

9.4 Appendix

Proof of Fubini's Theorem. Let $P_A$ be a partition of $A$ and $P_B$ a partition of $B$. Together they give a partition $P$ of $A\times B$ for which any subrectangle $S$ is of the form $S_A\times S_B$, where $S_A$ is a subrectangle of the partition $P_A$ and $S_B$ is a subrectangle of the partition $P_B$. Thus
\[ L(P,f) = \sum_S m_S\,v(S) = \sum_{S_A,S_B} m_{S_A\times S_B}\,v(S_A\times S_B) = \sum_{S_A}\Bigl(\sum_{S_B} m_{S_A\times S_B}\,v(S_B)\Bigr)v(S_A). \]
Now, if $x \in S_A$, then clearly $m_{S_A\times S_B}(f) \le m_{S_B}(g_x)$, since the reference set $S_A\times S_B$ on the left is bigger than the reference set $\{x\}\times S_B$ on the right. Consequently, for $x \in S_A$ we have
\[ \sum_{S_B} m_{S_A\times S_B}\,v(S_B) \le \sum_{S_B} m_{S_B}(g_x)\,v(S_B) \le \underline{\int_B} g_x\,dy = L(x), \]
hence $\sum_{S_B} m_{S_A\times S_B}\,v(S_B) \le m_{S_A}(L(x))$. Therefore,
\[ \sum_{S_A}\Bigl(\sum_{S_B} m_{S_A\times S_B}\,v(S_B)\Bigr)v(S_A) \le \sum_{S_A} m_{S_A}(L(x))\,v(S_A) = L(P_A,L). \]
We thus obtain
\[ L(P,f) \le L(P_A,L) \le U(P_A,L) \le U(P_A,U) \le U(P,f), \]
where the proof of the last inequality is entirely analogous to the proof of the first. Since $f$ is integrable, $\sup\{L(P,f)\} = \inf\{U(P,f)\} = \int_{A\times B} f\,dx\,dy$. Hence,
\[ \sup\{L(P_A,L)\} = \inf\{U(P_A,L)\} = \int_{A\times B} f\,dx\,dy. \]
In other words, $L(x)$ is integrable on $A$ and $\int_{A\times B} f\,dx\,dy = \int_A L(x)\,dx$.
The assertion for $U(x)$ follows similarly from the inequalities
\[ L(P,f) \le L(P_A,L) \le L(P_A,U) \le U(P_A,U) \le U(P,f). \]

Chapter 10 Surface Integrals

10.1 Surfaces in $\mathbb{R}^3$

Recall that a domain $G$ is an open and connected subset of $\mathbb{R}^n$; connected means that for any two points $x$ and $y$ in $G$ there exist points $x_0, x_1, \dots, x_k$ with $x_0 = x$ and $x_k = y$ such that every segment $\overline{x_{i-1}x_i}$, $i = 1,\dots,k$, is completely contained in $G$.

Definition 10.1 Let $G \subset \mathbb{R}^2$ be a domain and $F\colon G \to \mathbb{R}^3$ continuously differentiable. The mapping $F$, as well as the set $\mathcal{F} = F(G) = \{F(s,t) \mid (s,t) \in G\}$, is called an *open regular surface* if the Jacobian matrix $F'(s,t)$ has rank $2$ for all $(s,t) \in G$. If
\[ F(s,t) = \begin{pmatrix} x(s,t)\\ y(s,t)\\ z(s,t) \end{pmatrix}, \qquad\text{the Jacobian matrix of } F \text{ is}\qquad F'(s,t) = \begin{pmatrix} x_s & x_t\\ y_s & y_t\\ z_s & z_t \end{pmatrix}. \]
The two column vectors of $F'(s,t)$ span the tangent plane to $\mathcal{F}$ at $(s,t)$:
\[ D_1F(s,t) = \Bigl(\frac{\partial x}{\partial s}(s,t),\ \frac{\partial y}{\partial s}(s,t),\ \frac{\partial z}{\partial s}(s,t)\Bigr), \qquad D_2F(s,t) = \Bigl(\frac{\partial x}{\partial t}(s,t),\ \frac{\partial y}{\partial t}(s,t),\ \frac{\partial z}{\partial t}(s,t)\Bigr). \]
Justification: Suppose $(s,t_0) \in G$ where $t_0$ is fixed. Then $\gamma(s) = F(s,t_0)$ defines a curve in $\mathcal{F}$ with tangent vector $\gamma'(s) = D_1F(s,t_0)$. Similarly, for fixed $s_0$ we obtain another curve $\tilde\gamma(t) = F(s_0,t)$ with tangent vector $\tilde\gamma'(t) = D_2F(s_0,t)$. Since $F'(s,t)$ has rank $2$ at every point of $G$, the vectors $D_1F$ and $D_2F$ are linearly independent; hence they span a plane.

Definition 10.2 Let $F\colon G \to \mathbb{R}^3$ be an open regular surface and $(s_0,t_0) \in G$. Then
\[ \vec x = F(s_0,t_0) + \alpha D_1F(s_0,t_0) + \beta D_2F(s_0,t_0), \qquad \alpha,\beta \in \mathbb{R}, \]
is called the *tangent plane* $E$ to $\mathcal{F}$ at $F(s_0,t_0)$. The line through $F(s_0,t_0)$ which is orthogonal to $E$ is called the *normal line* to $\mathcal{F}$ at $F(s_0,t_0)$.

Recall that the vector product $\vec x\times\vec y$ of vectors $\vec x = (x_1,x_2,x_3)$ and $\vec y = (y_1,y_2,y_3)$ from $\mathbb{R}^3$ is the vector
\[ \vec x\times\vec y = \det\begin{pmatrix} e_1 & e_2 & e_3\\ x_1 & x_2 & x_3\\ y_1 & y_2 & y_3 \end{pmatrix} = (x_2y_3 - y_2x_3,\ x_3y_1 - y_3x_1,\ x_1y_2 - y_1x_2). \]
It is orthogonal to the plane spanned by the parallelogram $P$ with edges $\vec x$ and $\vec y$.
Its length is the area of the parallelogram P. A vector which points in the direction of the normal line is

  D₁F(s₀, t₀) × D₂F(s₀, t₀) = det | e₁ e₂ e₃ ; x_s y_s z_s ; x_t y_t z_t |,        (10.1)

and

  n⃗ = ± (D₁F × D₂F) / ‖D₁F × D₂F‖        (10.2)

is the unit normal vector at (s₀, t₀).

Example 10.1 (Graph of a function) Let F be given by the graph of a function f: G → R, namely F(x, y) = (x, y, f(x, y)). By definition, D₁F = (1, 0, f_x) and D₂F = (0, 1, f_y), hence

  D₁F × D₂F = det | e₁ e₂ e₃ ; 1 0 f_x ; 0 1 f_y | = (−f_x, −f_y, 1).

Therefore, the tangent plane has the equation

  −f_x (x − x₀) − f_y (y − y₀) + 1·(z − z₀) = 0.

Further, the unit normal vector to the tangent plane is

  n⃗ = ± (f_x, f_y, −1) / √(f_x² + f_y² + 1).

10.1.1 The Area of a Surface

Let F and F be as above. We assume that the continuous vector fields D₁F and D₂F on G can be extended to continuous functions on the closure of G.

Definition 10.3 The number

  |F| = ∫∫_G ‖D₁F × D₂F‖ dsdt        (10.3)

is called the area of F and of F. We call

  dS = ‖D₁F × D₂F‖ dsdt

the scalar surface element of F. In this notation, |F| = ∫∫_G dS.

Example 10.2 Let F = {(s cos t, s sin t, 2t) | s ∈ [0, 2], t ∈ [0, 4π]} be the surface spanned by a helix. [Figure: the helix surface (s cos t, s sin t, 2t).] We shall compute its area. We have

  D₁F = (cos t, sin t, 0),  D₂F = (−s sin t, s cos t, 2),

such that

  D₁F × D₂F = det | e₁ e₂ e₃ ; cos t sin t 0 ; −s sin t s cos t 2 | = (2 sin t, −2 cos t, s).

Therefore,

  |F| = ∫₀^{4π} ∫₀² √(4 cos²t + 4 sin²t + s²) dsdt = 4π ∫₀² √(4 + s²) ds = 8π(√2 − log(√2 − 1)).

Example 10.3 (Guldin's Rule (Paul Guldin, 1577–1643, Swiss mathematician)) Let f be a continuously differentiable function on [a, b] with f(x) ≥ 0 for all x ∈ [a, b]. Let the graph of f revolve around the x-axis and let F be the corresponding surface of revolution. We have

  |F| = 2π ∫_a^b f(x) √(1 + f′(x)²) dx.

Proof.
Using polar coordinates in the y-z-plane, we obtain a parametrization of F:

  F = {(x, f(x) cos φ, f(x) sin φ) | x ∈ [a, b], φ ∈ [0, 2π]}.

We have

  D₁F = (1, f′(x) cos φ, f′(x) sin φ),  D₂F = (0, −f sin φ, f cos φ),
  D₁F × D₂F = (f f′, −f cos φ, −f sin φ),

so that dS = f(x) √(1 + f′(x)²) dxdφ. Hence

  |F| = ∫_a^b ∫₀^{2π} f(x) √(1 + f′(x)²) dφ dx = 2π ∫_a^b f(x) √(1 + f′(x)²) dx.

10.2 Scalar Surface Integrals

Let F and F be as above, and let f: F → R be a continuous function on the compact subset F ⊂ R³.

Definition 10.4 The number

  ∫∫_F f(x⃗) dS := ∫∫_G f(F(s, t)) ‖D₁F(s, t) × D₂F(s, t)‖ dsdt

is called the scalar surface integral of f on F.

10.2.1 Other Forms for dS

(a) Let the surface F be given as the graph of a function, F(x, y) = (x, y, f(x, y)), (x, y) ∈ G. Then, see Example 10.1,

  dS = √(1 + f_x² + f_y²) dxdy.

(b) Let the surface be given implicitly as F(x, y, z) = 0, and suppose this equation is locally solvable for z in a neighborhood of some point (x₀, y₀, z₀). Then the surface element (up to the sign) is given by

  dS = √(F_x² + F_y² + F_z²) / |F_z| dxdy = ‖grad F‖ / |F_z| dxdy.

One checks that, for the parametrization (x, y, z(x, y)), D₁F × D₂F = (F_x, F_y, F_z)/F_z.

(c) If F is given by F(s, t) = (x(s, t), y(s, t), z(s, t)), we have

  dS = √(EG − H²) dsdt,

where

  E = x_s² + y_s² + z_s²,  G = x_t² + y_t² + z_t²,  H = x_s x_t + y_s y_t + z_s z_t.

Indeed, using ‖a⃗ × b⃗‖ = ‖a⃗‖ ‖b⃗‖ sin φ and sin²φ = 1 − cos²φ, where φ is the angle spanned by a⃗ and b⃗, we get

  EG − H² = ‖D₁F‖² ‖D₂F‖² − (D₁F·D₂F)² = ‖D₁F‖² ‖D₂F‖² (1 − cos²φ)
          = ‖D₁F‖² ‖D₂F‖² sin²φ = ‖D₁F × D₂F‖²,

which proves the claim.

Example 10.4 (a) We give two different forms for the scalar surface element of a sphere. By (b), the sphere x² + y² + z² = R² has surface element

  dS = ‖(2x, 2y, 2z)‖ / |2z| dxdy = (R/|z|) dxdy.
If x = R cos φ sin θ, y = R sin φ sin θ, z = R cos θ, we obtain

  D₁ = F_θ = R(cos φ cos θ, sin φ cos θ, −sin θ),
  D₂ = F_φ = R(−sin φ sin θ, cos φ sin θ, 0),
  D₁ × D₂ = R²(cos φ sin²θ, sin φ sin²θ, sin θ cos θ).

Hence, dS = ‖D₁ × D₂‖ dθdφ = R² sin θ dθdφ.

(b) Riemann integral in R³ and surface integral over spheres. Let M = {(x, y, z) ∈ R³ | ρ ≤ ‖(x, y, z)‖ ≤ R}, where R > ρ ≥ 0. Let f: M → R be continuous, and let us denote the sphere of radius r by S_r = {(x, y, z) ∈ R³ | x² + y² + z² = r²}. Then

  ∫∫∫_M f dxdydz = ∫_ρ^R ( ∫∫_{S_r} f(x⃗) dS ) dr = ∫_ρ^R r² ( ∫∫_{S₁} f(r x⃗) dS(x⃗) ) dr.

Indeed, by the previous example and by our knowledge of spherical coordinates (r, θ, φ),

  dxdydz = r² sin θ dr dθ dφ = dr dS_r.

On the other hand, on the unit sphere S₁ we have dS = sin θ dθ dφ, such that dxdydz = r² dr dS, which establishes the second formula.

10.2.2 Physical Application

(a) If ρ(x, y, z) is the mass density on a surface F, then ∫∫_F ρ dS is the total mass of F. The mass center of F has coordinates (x_c, y_c, z_c) with

  x_c = ( 1 / ∫∫_F ρ dS ) ∫∫_F x ρ(x, y, z) dS,

and similarly for y_c and z_c.

(b) Let σ(x⃗) be a charge density on a surface F. Then

  U(y⃗) = ∫∫_F σ(x⃗) / ‖y⃗ − x⃗‖ dS(x⃗),  y⃗ ∉ F,

is the potential generated by F.

10.3 Surface Integrals

10.3.1 Orientation

We want to define the notion of orientation for a regular surface. Let F be a regular (injective) surface with or without boundary. Then for every point x₀ ∈ F there exists the tangent plane E_{x₀}; the normal line to F at x₀ is uniquely defined. However, a unit vector on the normal line can have two different directions.

Definition 10.5 (a) Let F be a surface as above. A unit normal field to F is a continuous function n⃗: F → R³ with the following two properties for every x₀ ∈ F:
(i) n⃗(x₀) is orthogonal to the tangent plane to F at x₀;
(ii) ‖n⃗(x₀)‖ = 1.
(b) A regular surface F is called orientable if there exists a unit normal field on F.
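As a numerical aside, the spherical surface element dS = R² sin θ dθdφ from Example 10.4 and the shell formula above can be spot-checked; the radii and the test integrand f = x² + y² + z² (which is constant, equal to r², on each sphere S_r) are arbitrary choices.

```python
import math

R_out, R_in, n = 1.5, 0.5, 200
dth, dph = math.pi / n, 2 * math.pi / n

# Area of the sphere of radius R_out via dS = R^2 sin(theta) dtheta dphi;
# the exact value is 4*pi*R_out^2.
area = sum(R_out ** 2 * math.sin((i + 0.5) * dth) * dth * dph
           for i in range(n) for _ in range(n))

# Triple integral of f = r^2 over the shell R_in <= r <= R_out, computed as
# the iterated integral over r of the sphere integrals: f is constant on S_r
# and |S_r| = 4*pi*r^2, so the integrand in r is 4*pi*r^4 and the exact
# value is 4*pi*(R_out^5 - R_in^5)/5.
nr = 400
dr = (R_out - R_in) / nr
shell = sum(4 * math.pi * (R_in + (k + 0.5) * dr) ** 4 * dr for k in range(nr))

exact_area = 4 * math.pi * R_out ** 2
exact_shell = 4 * math.pi * (R_out ** 5 - R_in ** 5) / 5
```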
Suppose F is an oriented, open, regular surface with piecewise smooth boundary ∂F. Let F(s, t) be a parametrization of F. We assume that the vector functions F, D₁F, and D₂F can be extended to continuous functions on the closure of F. The unit normal vector is given by

  n⃗ = ε (D₁F × D₂F) / ‖D₁F × D₂F‖,

where ε = +1 or ε = −1 fixes the orientation of F. It turns out that for a regular surface F there exist either exactly two unit normal fields or no such field at all. If F is provided with an orientation, we write F₊ for the pair (F, n⃗); for F with the opposite orientation we write F₋.

Examples of non-orientable surfaces are the Möbius band and the real projective plane. Analytically, the Möbius band is given by

  F(s, t) = ( (1 + t cos(s/2)) sin s, (1 + t cos(s/2)) cos s, t sin(s/2) ),  (s, t) ∈ [0, 2π] × [−1/2, 1/2].

Definition 10.6 Let f⃗: F → R³ be a continuous vector field on F. The number

  ∫∫_{F₊} f⃗(x⃗) · n⃗ dS        (10.4)

is called the surface integral of the vector field f⃗ on F₊. We call

  dS⃗ = n⃗ dS = ε D₁F × D₂F dsdt

the (vector) surface element of F.

Remark 10.1 (a) The surface integral is independent of the parametrization of F but depends on the orientation: ∫∫_{F₊} f⃗ · dS⃗ = − ∫∫_{F₋} f⃗ · dS⃗.

For, let (s, t) = (s(ξ, η), t(ξ, η)) be a new parametrization with F(s(ξ, η), t(ξ, η)) = G(ξ, η). Then the Jacobian gives

  dsdt = ∂(s, t)/∂(ξ, η) dξdη = (s_ξ t_η − s_η t_ξ) dξdη.

Further, D₁G = D₁F s_ξ + D₂F t_ξ and D₂G = D₁F s_η + D₂F t_η, so that, using x⃗ × x⃗ = 0 and x⃗ × y⃗ = −y⃗ × x⃗,

  D₁G × D₂G dξdη = (D₁F s_ξ + D₂F t_ξ) × (D₁F s_η + D₂F t_η) dξdη
                 = (s_ξ t_η − s_η t_ξ) D₁F × D₂F dξdη = D₁F × D₂F dsdt.

(b) The scalar surface integral is a special case of the surface integral, namely ∫∫ f dS = ∫∫ (f n⃗)·n⃗ dS.

(c) Special cases. Let F be the graph of a function f, F = {(x, y, f(x, y)) | (x, y) ∈ C}; then

  dS⃗ = ±(−f_x, −f_y, 1) dxdy.

If the surface is given implicitly by F(x, y, z) = 0 and this equation is locally solvable for z, then

  dS⃗ = ± (grad F / F_z) dxdy.
(d) Still another form of dS⃗:

  ∫∫_F f⃗ · dS⃗ = ε ∫∫_G f⃗(F(s, t)) · (D₁F × D₂F) dsdt,        (10.5)

where

  f⃗(F(s, t)) · (D₁F × D₂F) = det | f₁(F(s, t)) f₂(F(s, t)) f₃(F(s, t)) ; x_s y_s z_s ; x_t y_t z_t |.

(e) Yet another notation. Computing the previous determinant, or the determinant (10.1), explicitly, we have

  f⃗ · (D₁F × D₂F) = f₁ det | y_s z_s ; y_t z_t | + f₂ det | z_s x_s ; z_t x_t | + f₃ det | x_s y_s ; x_t y_t |
                   = f₁ ∂(y, z)/∂(s, t) + f₂ ∂(z, x)/∂(s, t) + f₃ ∂(x, y)/∂(s, t).

Hence,

  dS⃗ = ε D₁F × D₂F dsdt = ε ( ∂(y, z)/∂(s, t) dsdt, ∂(z, x)/∂(s, t) dsdt, ∂(x, y)/∂(s, t) dsdt ) = ε ( dydz, dzdx, dxdy ).

Therefore we can write

  ∫∫_F f⃗ · dS⃗ = ∫∫_F ( f₁ dydz + f₂ dzdx + f₃ dxdy ).

In this setting,

  ∫∫_F f₁ dydz = ∫∫_F (f₁, 0, 0) · dS⃗ = ± ∫∫_G f₁(F(s, t)) ∂(y, z)/∂(s, t) dsdt.

Sometimes one uses

  dS⃗ = ( cos(n⃗, e₁), cos(n⃗, e₂), cos(n⃗, e₃) ) dS,

since cos(n⃗, e_i) = n⃗·e_i = n_i and dS⃗ = n⃗ dS. Note that we have surface integrals in the last two lines, not ordinary double integrals, since F is a surface in R³ and f₁ = f₁(x, y, z) can also depend on x.

The physical meaning of ∫∫_F f⃗ · dS⃗ is the flow of the vector field f⃗ through the surface F. The flow is (locally) positive if n⃗ and f⃗ are on the same side of the tangent plane to F, and negative in the other case.

Example 10.5 (a) Compute the surface integral

  ∫∫_{F₊} f dzdx

of f(x, y, z) = x²yz, where F is the graph of g(x, y) = x² + y over the unit square G = [0, 1] × [0, 1], with the downward directed unit normal field.

By Remark 10.1 (c),

  dS⃗ = (g_x, g_y, −1) dxdy = (2x, 1, −1) dxdy.

Hence

  ∫∫_{F₊} f dzdx = ∫∫_{F₊} (0, f, 0) · dS⃗ = ∫∫_G x²y(x² + y) dxdy = ∫₀¹ dx ∫₀¹ (x⁴y + x²y²) dy = 19/90.

(b) Let G denote the upper half ball of radius R in R³,

  G = {(x, y, z) | x² + y² + z² ≤ R², z ≥ 0},

and let F be the boundary of G with the orientation of the outer normal.
Then F consists of the upper half sphere F₁,

  F₁ = {(x, y, √(R² − x² − y²)) | x² + y² ≤ R²},  z = g(x, y) = √(R² − x² − y²),

with the upward orientation of the unit normal field, and of the disc F₂ in the x-y-plane,

  F₂ = {(x, y, 0) | x² + y² ≤ R²},  z = g(x, y) = 0,

with the downward directed normal. Let f⃗(x, y, z) = (ax, by, cz). We want to compute

  ∫∫_{F₊} f⃗ · dS⃗.

By Remark 10.1 (c), the surface element of the half-sphere F₁ is dS⃗ = (1/z)(x, y, z) dxdy. Hence

  I₁ = ∫∫_{F₁₊} f⃗ · dS⃗ = ∫∫_{B_R} (ax, by, cz) · (1/z)(x, y, z) |_{z=g(x,y)} dxdy = ∫∫_{B_R} (ax² + by² + cz²) (1/z) |_{z=g(x,y)} dxdy.

Using polar coordinates x = r cos φ, y = r sin φ, r ∈ [0, R], and z = √(R² − r²), we get

  I₁ = ∫₀^{2π} dφ ∫₀^R ( a r² cos²φ + b r² sin²φ + c(R² − r²) ) / √(R² − r²) · r dr.

Noting ∫₀^{2π} sin²φ dφ = ∫₀^{2π} cos²φ dφ = π, we continue

  I₁ = π ∫₀^R ( a r³/√(R² − r²) + b r³/√(R² − r²) + 2c r √(R² − r²) ) dr.

Using r = R sin t, dr = R cos t dt, we have

  ∫₀^R r³/√(R² − r²) dr = ∫₀^{π/2} R³ sin³t · R cos t / ( R √(1 − sin²t) ) dt = R³ ∫₀^{π/2} sin³t dt = (2/3) R³.

Hence,

  I₁ = (2π/3) R³ (a + b) + πc ∫₀^R (R² − r²)^{1/2} d(r²)
     = (2π/3) R³ (a + b) + πc ( −(2/3)(R² − r²)^{3/2} ) |₀^R
     = (2π/3) R³ (a + b + c).

In the case of the disc F₂ we have z = g(x, y) = 0, such that g_x = g_y = 0 and

  dS⃗ = (0, 0, −1) dxdy

by Remark 10.1 (c). Hence

  I₂ = ∫∫_{F₂₊} f⃗ · dS⃗ = ∫∫_{B_R} (ax, by, cz) · (0, 0, −1) |_{z=0} dxdy = −c ∫∫_{B_R} z |_{z=0} dxdy = 0.

Hence,

  ∫∫_{F₊} (ax, by, cz) · dS⃗ = (2π/3) R³ (a + b + c).

10.4 Gauß' Divergence Theorem

The aim is to generalize the fundamental theorem of calculus,

  ∫_a^b f′(x) dx = f(b) − f(a),

to higher dimensions. Note that a and b form the boundary of the segment [a, b]. There are three possibilities to do this:

  ∫∫∫_G g dxdydz = ∫∫_{(∂G)₊} f⃗ · dS⃗  ⟹  Gauß' theorem in R³,
  ∫∫_G g dxdy = ∫_{(∂G)₊} f⃗ · dx⃗  ⟹  Green's theorem in R²,
  ∫∫_{F₊} g⃗ · dS⃗ = ∫_{(∂F)₊} f⃗ · dx⃗  ⟹  Stokes' theorem.
Let G ⊂ R³ be a bounded domain (open, connected) such that its boundary F = ∂G satisfies the following assumptions:

1. F is a union of regular, orientable surfaces F_i. The parametrization F_i(s, t), (s, t) ∈ C_i, of F_i, as well as D₁F_i and D₂F_i, are continuous vector functions on the closure of C_i; C_i is a domain in R².
2. Each F_i is oriented by the outer normal (with respect to G).
3. There is given a continuously differentiable vector field f⃗ on the closure of G. (More precisely, there exist an open set U ⊃ Ḡ and a continuously differentiable function f̃: U → R³ such that f̃↾Ḡ = f⃗.)

Theorem 10.1 (Gauß' Divergence Theorem) Under the above assumptions we have

  ∫∫∫_G div f⃗ dxdydz = ∫∫_{∂G₊} f⃗ · dS⃗.        (10.6)

Sometimes the theorem is called the Gauß–Ostrogradski theorem or simply Ostrogradski's theorem. Another way of writing it is

  ∫∫∫_G ( ∂f₁/∂x + ∂f₂/∂y + ∂f₃/∂z ) dxdydz = ∫∫_{∂G} ( f₁ dydz + f₂ dzdx + f₃ dxdy ).        (10.7)

The theorem holds for more general regions G ⊂ R³. We give a proof for

  G = {(x, y, z) | (x, y) ∈ C, α(x, y) ≤ z ≤ β(x, y)},

where C ⊂ R² is a domain and α, β ∈ C¹(C) define regular bottom and top surfaces of F, respectively. [Figure: the cylinder-like region G over C between the graphs z = α(x, y) and z = β(x, y).] We prove only one part of (10.7), namely the case f⃗ = (0, 0, f₃):

  ∫∫∫_G ∂f₃/∂z dxdydz = ∫∫_{∂G} f₃ dxdy.        (10.8)

Proof.
By Fubini's theorem, the left side reads

  ∫∫∫_G ∂f₃/∂z dxdydz = ∫∫_C ( ∫_{α(x,y)}^{β(x,y)} ∂f₃/∂z dz ) dxdy
                       = ∫∫_C ( f₃(x, y, β(x, y)) − f₃(x, y, α(x, y)) ) dxdy,        (10.9)

where the last equality is by the fundamental theorem of calculus. Now we are going to compute the surface integral. The outer normal direction for the top surface F₁ is (−β_x(x, y), −β_y(x, y), 1), such that

  I₁ = ∫∫_{F₁₊} f₃ dxdy = ∫∫_C (0, 0, f₃) · (−β_x(x, y), −β_y(x, y), 1) dxdy = ∫∫_C f₃(x, y, β(x, y)) dxdy.

Since the bottom surface F₂ is oriented downward, the outer normal direction is (α_x(x, y), α_y(x, y), −1), such that

  I₂ = ∫∫_{F₂₊} f₃ dxdy = − ∫∫_C f₃(x, y, α(x, y)) dxdy.

Finally, the shell F₃ is parametrized by an angle φ and z:

  F₃ = {(r(φ) cos φ, r(φ) sin φ, z) | α(x, y) ≤ z ≤ β(x, y), φ ∈ [0, 2π]}.

Since D₂F = (0, 0, 1), the normal vector is orthogonal to the z-axis, n⃗ = (n₁, n₂, 0). Therefore,

  I₃ = ∫∫_{F₃₊} f₃ dxdy = ∫∫_{F₃₊} (0, 0, f₃) · (n₁, n₂, 0) dS = 0.

Comparing I₁ + I₂ + I₃ with (10.9) proves the theorem in this special case.

Remarks 10.2 (a) Gauß' divergence theorem can be used to compute the volume of a domain G ⊂ R³. Suppose the boundary ∂G of G has the orientation of the outer normal. Then

  v(G) = ∫∫_{∂G} x dydz = ∫∫_{∂G} y dzdx = ∫∫_{∂G} z dxdy.
(b) Applying the mean value theorem to the left-hand side of Gauß' formula, we have for any bounded region G containing x₀

  ∫∫∫_G div f⃗ dxdydz = div f⃗(x₀ + h) v(G) = ∫∫_{∂G₊} f⃗ · dS⃗,

where h is a small vector and v(G) = ∫∫∫_G dxdydz is the volume of G. Hence

  div f⃗(x₀) = lim_{G→x₀} (1/v(G)) ∫∫_{∂G₊} f⃗ · dS⃗ = lim_{ε→0} ( 3/(4πε³) ) ∫∫_{S_ε(x₀)₊} f⃗ · dS⃗,

where the region G tends to x₀. In the second formula we have chosen G = B_ε(x₀), the open ball of radius ε with center x₀. The right-hand side can be thought of as the source density of the field f⃗. In particular, the right side gives a basis-independent description of div f⃗.

Example 10.6 We want to compute the surface integral from Example 10.5 (b) using Gauß' theorem:

  ∫∫_{F₊} f⃗ · dS⃗ = ∫∫∫_{x²+y²+z²≤R², z≥0} div f⃗ dxdydz = ∫∫∫ (a + b + c) dxdydz = (2πR³/3)(a + b + c).

We now draw some consequences of Gauß' divergence theorem which play an important role in partial differential equations. Recall (Proposition 7.9 (Prop. 8.9)) that the directional derivative of a function v: U → R, U ⊂ Rⁿ, at x₀ in the direction of the unit vector n⃗ is given by D_{n⃗} v(x₀) = grad v(x₀) · n⃗.

Notation. Let U ⊂ R³ be open and F₊ ⊂ U an oriented, regular open surface with unit normal vector n⃗(x₀) at x₀ ∈ F. Let g: U → R be differentiable. Then

  ∂g/∂n⃗ (x₀) = grad g(x₀) · n⃗(x₀)        (10.10)

is called the normal derivative of g on F₊ at x₀.

Proposition 10.2 Let G be a region as in Gauß' theorem, let the boundary ∂G be oriented with the outer normal, and let u, v be twice continuously differentiable on an open set U with Ḡ ⊂ U. Then we have Green's identities:

  ∫∫∫_G ∇(u)·∇(v) dxdydz = ∫∫_{∂G} u ∂v/∂n⃗ dS − ∫∫∫_G u Δ(v) dxdydz,        (10.11)
  ∫∫∫_G ( u Δ(v) − v Δ(u) ) dxdydz = ∫∫_{∂G} ( u ∂v/∂n⃗ − v ∂u/∂n⃗ ) dS,        (10.12)
  ∫∫∫_G Δ(u) dxdydz = ∫∫_{∂G} ∂u/∂n⃗ dS.        (10.13)

Proof. Put f = u∇(v). Then by nabla calculus

  div f = ∇·(u∇v) = ∇(u)·∇(v) + u ∇·(∇v) = grad u · grad v + u Δ(v).
Applying Gauß’ theorem, we obtain ZZZ ZZZ ZZZ div f dxdydz = ( grad u· grad v) dxdydz + u∆(v) dxdydz G = ZG Z ∂G u grad v · ~n dS = ZZ G u ∂v dS. ∂~n ∂G This proves Green’s first identity. Changing the role of u and v and taking the difference, we obtain the second formula. Inserting v = −1 into (10.12) we get (10.13). Application to Laplace equation Let u1 and u2 be functions on G with ∆u1 = ∆u2 , which coincide on the boundary ∂G, u1 (x) = u2 (x) for all x ∈ ∂G. Then u1 ≡ u2 in G. 10 Surface Integrals 272 Proof. Put u = u1 − u2 and apply Green’s first formula (10.11) to u = v. Note that ∆(u) = ∆(u1 ) − ∆(u2 ) = 0 (U is harmonic ini G) and u(x) = u1 (x) − u2 (x) = 0 on the boundary x ∈ ∂G. In other words, a harmonic function is uniquely determined by its boundary values. ZZZ ZZZ ZZ ∂u dS − u ∆(u) dxdydz = 0. ∇(u)·∇(u) dxdydz = u |{z} | {z } ∂~n G ∂G 2 0,x∈∂G 0 2 Since ∇(u)·∇(u) = k∇(u)k ≥ 0 and k∇(u)k is a continuous function on G, by homework 14.3, k∇(u)k2 = 0 on G; hence ∇(u) = 0 on G. By the Mean Value Theorem, Corollary 7.12, u is constant on G. Since u = 0 on ∂G, u(x) = 0 for all x ∈ G. 10.5 Stokes’ Theorem Roughly speaking, Stokes’ theorem relates a surface integral over a surface F with a line integral over the boundary ∂F. In case of a plane surface in R2 , it is called Green’s theorem. 10.5.1 Green’s Theorem Γ 11111 00000 00000 11111 00000 11111 00000 11111 Γ 00000000000 11111111111 00000 11111 00000000000 11111111111 00000 11111 00000000000 11111111111 00000 00000000000 11111 11111111111 00000000000 11111111111 00000000000 11111111111 00000000000 11111111111 00000000000 11111111111 2 Γ1 Let G be a domain in R2 with picewise smooth (differentiable) boundaries Γ1 , Γ2 , . . . , Γk . We give an orientation to the boundary: the outer curve is oriented counter clockwise (mathematical positive), the inner boundaries are oriented in the opposite direction. 
Theorem 10.3 (Green's Theorem) Let (P, Q) be a continuously differentiable vector field on Ḡ and let the boundary Γ = ∂G be oriented as above. Then

  ∫∫_G ( ∂Q/∂x − ∂P/∂y ) dxdy = ∫_Γ P dx + Q dy.        (10.14)

Proof. (a) First, we consider a region G of type 1 in the plane, bounded below by the graph of φ, above by the graph of ψ, and on the sides by the vertical segments x = a and x = b. [Figure: the type 1 region with boundary pieces Γ₁ (left side), Γ₂ (bottom graph of φ), Γ₃ (right side), Γ₄ (top graph of ψ).] We will prove that

  ∫∫_G ∂P/∂y dxdy = − ∫_Γ P dx.        (10.15)

The double integral on the left may be evaluated as an iterated integral (Fubini's theorem); we have

  ∫∫_G ∂P/∂y dxdy = ∫_a^b ( ∫_{φ(x)}^{ψ(x)} P_y(x, y) dy ) dx = ∫_a^b ( P(x, ψ(x)) − P(x, φ(x)) ) dx.

The latter equality is due to the fundamental theorem of calculus. To compute the line integral, we parametrize the four parts of Γ in a natural way:

  −Γ₁:  x⃗₁(t) = (a, t),     t ∈ [φ(a), ψ(a)],  dx = 0,   dy = dt,
   Γ₂:  x⃗₂(t) = (t, φ(t)),  t ∈ [a, b],        dx = dt,  dy = φ′(t) dt,
   Γ₃:  x⃗₃(t) = (b, t),     t ∈ [φ(b), ψ(b)],  dx = 0,   dy = dt,
  −Γ₄:  x⃗₄(t) = (t, ψ(t)),  t ∈ [a, b],        dx = dt,  dy = ψ′(t) dt.

Since dx = 0 on Γ₁ and Γ₃, we are left with the line integrals over Γ₂ and Γ₄:

  ∫_Γ P dx = ∫_a^b P(t, φ(t)) dt − ∫_a^b P(t, ψ(t)) dt.

Comparing this with the double integral above proves (10.15). Let us prove the second part, ∫∫_G ∂Q/∂x dxdy = ∫_Γ Q dy. Using Proposition 7.24 we have

  ∫_{φ(x)}^{ψ(x)} ∂Q/∂x (x, y) dy = d/dx ( ∫_{φ(x)}^{ψ(x)} Q(x, y) dy ) − ψ′(x) Q(x, ψ(x)) + φ′(x) Q(x, φ(x)).

Inserting this into ∫∫_G ∂Q/∂x dxdy = ∫_a^b ( ∫_{φ(x)}^{ψ(x)} ∂Q/∂x dy ) dx, we get

  ∫∫_G ∂Q/∂x dxdy = ∫_a^b d/dx ( ∫_{φ(x)}^{ψ(x)} Q(x, y) dy ) dx − ∫_a^b ψ′(x) Q(x, ψ(x)) dx + ∫_a^b φ′(x) Q(x, φ(x)) dx        (10.16)
                  = ∫_{φ(b)}^{ψ(b)} Q(b, y) dy − ∫_{φ(a)}^{ψ(a)} Q(a, y) dy − ∫_a^b ψ′(x) Q(x, ψ(x)) dx + ∫_a^b φ′(x) Q(x, φ(x)) dx.        (10.17)

We compute the line integrals:

  ∫_{−Γ₁} Q dy = − ∫_{φ(a)}^{ψ(a)} Q(a, y) dy,    ∫_{Γ₃} Q dy = ∫_{φ(b)}^{ψ(b)} Q(b, y) dy.

Further,

  ∫_{Γ₂} Q dy = ∫_a^b Q(t, φ(t)) φ′(t) dt,    ∫_{−Γ₄} Q dy = − ∫_a^b Q(t, ψ(t)) ψ′(t) dt.

Adding up these integrals and comparing the result with (10.17), the proof for type 1 regions is complete.
[Figure: a region decomposed into pieces which are simultaneously of type 1 and type 2.]

Exactly in the same way, we can prove that if G is a type 2 region (bounded by two graphs over the y-axis), then (10.14) holds.

(b) Breaking a region G up into smaller regions, each of which is both of type 1 and type 2, Green's theorem is valid for G: the line integrals along the inner boundaries cancel, leaving the line integral around the boundary of G.

(c) If the region has a hole, one can split it into two simply connected regions, for which Green's theorem is valid by the arguments of (b).
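Green's identity (10.14) can be spot-checked numerically for a concrete field; the choice P = −y³, Q = x³ on the unit disc is an arbitrary illustration, and both sides then equal 3π/2.

```python
import math

# Left side: integral of Q_x - P_y = 3x^2 + 3y^2 = 3r^2 over the unit
# disc, via a midpoint sum in polar coordinates (area element r dr dphi).
n = 400
dr, dphi = 1.0 / n, 2 * math.pi / n
lhs = sum(3 * ((i + 0.5) * dr) ** 2 * ((i + 0.5) * dr) * dr * dphi
          for i in range(n) for _ in range(n))

# Right side: line integral of P dx + Q dy around the unit circle,
# x(t) = cos t, y(t) = sin t, so dx = -sin t dt, dy = cos t dt.
m = 4000
dt = 2 * math.pi / m
rhs = 0.0
for k in range(m):
    t = (k + 0.5) * dt
    x, y = math.cos(t), math.sin(t)
    P, Q = -y ** 3, x ** 3
    rhs += (P * (-math.sin(t)) + Q * math.cos(t)) * dt

exact = 3 * math.pi / 2
```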
Application: Area of a Region

If Γ is a curve which bounds a region G, then the area of G is A = ∫_Γ ( (1 − α) x dy − α y dx ), where α ∈ R is arbitrary; in particular,

  A = (1/2) ∫_Γ ( x dy − y dx ) = ∫_Γ x dy = − ∫_Γ y dx.        (10.18)

Proof. Choosing Q = (1 − α)x, P = −αy, one has

  A = ∫∫_G dxdy = ∫∫_G ( (1 − α) − (−α) ) dxdy = ∫∫_G ( Q_x − P_y ) dxdy = ∫_Γ P dx + Q dy
    = −α ∫_Γ y dx + (1 − α) ∫_Γ x dy.

Inserting α = 1/2, α = 0, and α = 1 yields the assertion.

Example 10.7 Find the area bounded by the ellipse Γ: x²/a² + y²/b² = 1. We parametrize Γ by

  x⃗(t) = (a cos t, b sin t), t ∈ [0, 2π],  x⃗˙(t) = (−a sin t, b cos t).

Then (10.18) gives

  A = (1/2) ∫₀^{2π} ( a cos t · b cos t − b sin t · (−a sin t) ) dt = (1/2) ∫₀^{2π} ab dt = πab.

10.5.2 Stokes' Theorem

Conventions: Let F₊ be a regular, oriented surface. Let Γ = ∂F₊ be the boundary of F with the induced orientation: the orientation of the surface (normal vector) together with the orientation of the boundary form a right-oriented screw. A second way to get the induced orientation: sitting in the arrowhead of the unit normal vector to the surface, one sees the boundary curve with counterclockwise orientation.

Theorem 10.4 (Stokes' Theorem) Let F₊ be a smooth regular oriented surface with a parametrization F ∈ C²(G), where G is a plane region to which Green's theorem applies. Let Γ = ∂F be the boundary with the above orientation. Further, let f⃗ be a continuously differentiable vector field on F. Then we have

  ∫∫_{F₊} curl f⃗ · dS⃗ = ∫_{∂F₊} f⃗ · dx⃗.        (10.19)

This can also be written as

  ∫∫_{F₊} ( (∂f₃/∂y − ∂f₂/∂z) dydz + (∂f₁/∂z − ∂f₃/∂x) dzdx + (∂f₂/∂x − ∂f₁/∂y) dxdy ) = ∫_Γ ( f₁ dx + f₂ dy + f₃ dz ).

Proof. Main idea: reduction to Green's theorem. Since both sides of the equation are additive with respect to the vector field f⃗, it suffices to prove the statement for the vector fields (f₁, 0, 0), (0, f₂, 0), and (0, 0, f₃). We show the theorem for f⃗ = (f, 0, 0); the other cases are quite analogous:

  ∫∫_F ( ∂f/∂z dzdx − ∂f/∂y dxdy ) = ∫_{∂F} f dx.
Let F(u, v), (u, v) ∈ G, be the parametrization of the surface F. Then

  dx = ∂x/∂u du + ∂x/∂v dv,

such that the line integral on the right reads, with P(u, v) = f(x(u, v), y(u, v), z(u, v)) ∂x/∂u and Q(u, v) = f ∂x/∂v,

  ∫_{∂F} f dx = ∫_{∂G} ( f x_u du + f x_v dv ) = ∫_{∂G} ( P du + Q dv )
    = (Green's theorem) ∫∫_G ( ∂Q/∂u − ∂P/∂v ) du dv
    = ∫∫_G ( (f_u x_v + f x_uv) − (f_v x_u + f x_vu) ) du dv = ∫∫_G ( f_u x_v − f_v x_u ) du dv
    = ∫∫_G ( −(f_x x_v + f_y y_v + f_z z_v) x_u + (f_x x_u + f_y y_u + f_z z_u) x_v ) du dv
    = ∫∫_G ( −f_y (x_u y_v − x_v y_u) + f_z (z_u x_v − z_v x_u) ) du dv
    = ∫∫_G ( −f_y ∂(x, y)/∂(u, v) + f_z ∂(z, x)/∂(u, v) ) du dv = ∫∫_F ( −f_y dxdy + f_z dzdx ),

where the mixed partials x_uv = x_vu cancel since F ∈ C². This completes the proof.

Remark 10.3 (a) Green's theorem is a special case with F = G × {0}, n⃗ = (0, 0, 1) (orientation), and f⃗ = (P, Q, 0).

(b) The right side of (10.19) is called the circulation of the vector field f⃗ over the closed curve Γ. Now let x⃗₀ ∈ F be fixed and consider smaller and smaller neighborhoods F₀₊ of x⃗₀ with boundaries Γ₀. By Stokes' theorem and by the mean value theorem of integration,

  ∫_{Γ₀} f⃗ · dx⃗ = ∫∫_{F₀} curl f⃗ · n⃗ dS = curl f⃗(x₀) · n⃗(x₀) area(F₀).

Hence,

  curl f⃗(x₀) · n⃗(x₀) = lim_{F₀→x₀} ( ∫_{∂F₀} f⃗ · dx⃗ ) / |F₀|.

We call curl f⃗(x₀) · n⃗(x₀) the infinitesimal circulation of the vector field f⃗ at x₀ corresponding to the unit normal vector n⃗.

(c) Stokes' theorem then says that the integral over the infinitesimal circulation of a vector field f⃗ corresponding to the unit normal vector n⃗ over F equals the circulation of the vector field along the boundary of F.

Path Independence of Line Integrals

We are going to complete the proof of Proposition 8.3 and show that, for a simply connected region G ⊂ R³ and a twice continuously differentiable vector field f⃗ with curl f⃗ = 0 for all x ∈ G, the vector field f⃗ is conservative.

Proof.
Indeed, let Γ be a closed, regular, piecewise differentiable curve Γ ⊂ G, and let Γ be the boundary of a smooth regular oriented surface F₊, Γ = ∂F₊, such that Γ has the induced orientation. Inserting curl f⃗ = 0 into Stokes' theorem gives

  0 = ∫∫_{F₊} curl f⃗ · dS⃗ = ∫_Γ f⃗ · dx⃗;

the line integral is path independent and hence f⃗ is conservative. Note that the region must be simply connected; otherwise it is in general impossible to find F with boundary Γ.

10.5.3 Vector Potential and the Inverse Problem of Vector Analysis

Let f⃗ be a continuously differentiable vector field on the simply connected region G ⊂ R³.

Definition 10.7 The vector field f⃗ on G is called a source-free field (solenoidal field) if there exists a vector field g⃗ on G with f⃗ = curl g⃗. Then g⃗ is called a vector potential to f⃗.

Theorem 10.5 f⃗ is source-free if and only if div f⃗ = 0.

Proof. (a) If f⃗ = curl g⃗, then div f⃗ = div(curl g⃗) = 0.

(b) To simplify notation, we skip the arrows. We explicitly construct a vector potential g to f with g = (g₁, g₂, 0) and curl g = f. This means

  f₁ = −∂g₂/∂z,  f₂ = ∂g₁/∂z,  f₃ = ∂g₂/∂x − ∂g₁/∂y.

Integrating the first two equations, we obtain

  g₂ = − ∫_{z₀}^z f₁(x, y, t) dt + h(x, y),  g₁ = ∫_{z₀}^z f₂(x, y, t) dt,

where the integration "constant" h(x, y) does not depend on z. Inserting this into the third equation, we obtain

  ∂g₂/∂x − ∂g₁/∂y = − ∫_{z₀}^z ∂f₁/∂x (x, y, t) dt + h_x(x, y) − ∫_{z₀}^z ∂f₂/∂y (x, y, t) dt
    = − ∫_{z₀}^z ( ∂f₁/∂x + ∂f₂/∂y ) dt + h_x
    = ∫_{z₀}^z ∂f₃/∂z (x, y, t) dt + h_x    (using div f = 0)
    = f₃(x, y, z) − f₃(x, y, z₀) + h_x(x, y).

This yields h_x(x, y) = f₃(x, y, z₀). Integration with respect to x finally gives h(x, y) = ∫_{x₀}^x f₃(t, y, z₀) dt; the third equation is satisfied and curl g = f.

Remarks 10.4 (a) The proof of the second direction is constructive; you can use this method to calculate a vector potential explicitly. You can also try another ansatz, say g = (0, g₂, g₃) or g = (g₁, 0, g₃).
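The construction in part (b) of the proof can be illustrated numerically (an aside, not from the text): for the sample divergence-free field f = (y, z, x) with z₀ = x₀ = 0, the recipe gives g = (z²/2, −yz + x²/2, 0), and a finite-difference curl recovers f.

```python
# For f = (y, z, x) (div f = 0), the construction with z0 = x0 = 0 gives
#   g1 = ∫_0^z f2(x,y,t) dt = z^2/2,
#   g2 = -∫_0^z f1(x,y,t) dt + ∫_0^x f3(t,y,0) dt = -y*z + x^2/2,
#   g3 = 0.
def g(p):
    x, y, z = p
    return (z * z / 2.0, -y * z + x * x / 2.0, 0.0)

def curl(field, p, h=1e-5):
    # central-difference curl of a vector field at the point p
    def d(i, j):  # ∂(field component i)/∂(coordinate j)
        q1, q2 = list(p), list(p)
        q1[j] += h
        q2[j] -= h
        return (field(q1)[i] - field(q2)[i]) / (2 * h)
    return (d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1))

p = (0.7, -1.3, 2.1)
c = curl(g, p)
f = (p[1], p[2], p[0])   # the target field (y, z, x) at p
```

Since g has only quadratic components, the central differences are exact up to rounding, and curl g agrees with f at the sample point.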
(b) If g is a vector potential for f and U ∈ C²(G), then g̃ = g + grad U is also a vector potential for f. Indeed,

  curl g̃ = curl g + curl grad U = f.

The Inverse Problem of Vector Analysis

Let h be a function and a⃗ a vector field on G, both continuously differentiable. Problem: Does there exist a vector field f⃗ such that

  div f⃗ = h  and  curl f⃗ = a⃗ ?

Proposition 10.6 The above problem has a solution if and only if div a⃗ = 0.

Proof. The condition is necessary since div a⃗ = div curl f⃗ = 0. We skip the vector arrows. For the other direction we use the ansatz f = r + s with

  curl r = 0,  div r = h,        (10.20)
  curl s = a,  div s = 0.        (10.21)

Since curl r = 0, by Proposition 8.3 there exists a potential U with r = grad U. Then curl r = 0 and div r = div grad U = Δ(U). Hence (10.20) is satisfied if and only if r = grad U and Δ(U) = h. Since div a = 0 by assumption, there exists a vector potential g such that curl g = a. Let φ be twice continuously differentiable on G and set s = g + grad φ. Then curl s = curl g = a and div s = div g + div grad φ = div g + Δ(φ). Hence, div s = 0 if and only if Δ(φ) = −div g. Both equations Δ(U) = h and Δ(φ) = −div g are so-called Poisson equations, which can be solved within the theory of partial differential equations (PDE).

The inverse problem does not have a unique solution. Choose a harmonic function ψ, Δ(ψ) = 0, and put f₁ = f + grad ψ. Then

  div f₁ = div f + div grad ψ = div f + Δ(ψ) = div f = h,
  curl f₁ = curl f + curl grad ψ = curl f = a.

Chapter 11 Differential Forms on Rⁿ

We show that Gauß', Green's, and Stokes' theorems are three cases of a "general" theorem which is also named after Stokes. The simple formula now reads ∫_c dω = ∫_{∂c} ω. The appearance of the Jacobian in the change of variable theorem will become clear. We formulate the Poincaré lemma. Good references are [Spi65], [AF01], and [vW81].
11.1 The Exterior Algebra Λ(Rⁿ)

Although we are working with the ground field R, all constructions make sense for arbitrary fields K, in particular K = C. Let {e₁, …, e_n} be the standard basis of Rⁿ; for h ∈ Rⁿ we write h = (h₁, …, h_n) with respect to the standard basis, h = Σ_i h_i e_i.

11.1.1 The Dual Vector Space V*

The interplay between a normed space E and its dual space E′ forms the basis of functional analysis. We start with the definition of the (algebraic) dual.

Definition 11.1 Let V be a linear space. The dual vector space V* to V is the set of all linear functionals f: V → R,

  V* = {f: V → R | f is linear}.

It turns out that V* is again a linear space if we introduce addition and scalar multiples in the natural way. For f, g ∈ V*, α ∈ R, put

  (f + g)(v) := f(v) + g(v),  (αf)(v) := αf(v).

The evaluation of f ∈ V* on v ∈ V is sometimes denoted by f(v) = ⟨f, v⟩ ∈ K. In this case, the brackets denote the dual pairing between V* and V. By definition, the pairing is linear in both components; that is, for all v, w ∈ V and for all λ, µ ∈ R,

  ⟨λf + µg, v⟩ = λ⟨f, v⟩ + µ⟨g, v⟩,  ⟨f, λv + µw⟩ = λ⟨f, v⟩ + µ⟨f, w⟩.

Example 11.1 (a) Let V = Rⁿ with the above standard basis. For i = 1, …, n define the i-th coordinate functional dx_i: Rⁿ → R by

  dx_i(h) = dx_i(h₁, …, h_n) = h_i,  h ∈ Rⁿ.

The functional dx_i associates to each vector h ∈ Rⁿ its i-th coordinate h_i. The functional dx_i is indeed linear, since for all v, w ∈ Rⁿ and α, β ∈ R,

  dx_i(αv + βw) = (αv + βw)_i = αv_i + βw_i = α dx_i(v) + β dx_i(w).

The linear space (Rⁿ)* also has dimension n. We will show that {dx₁, dx₂, …, dx_n} is a basis of (Rⁿ)*; we call it the dual basis to {e₁, …, e_n}. Using the Kronecker symbol, the evaluation of dx_i on e_j reads as follows:

  dx_i(e_j) = δ_ij,  i, j = 1, …, n.

{dx₁, dx₂, …, dx_n} generates V*. Indeed, let f ∈ V*.
Then f = Σ_{i=1}^n f(e_i) dx_i, since both sides coincide for all h ∈ V:

  Σ_{i=1}^n f(e_i) dx_i(h) = Σ_{i=1}^n f(e_i) h_i = Σ_i f(h_i e_i) = f( Σ_i h_i e_i ) = f(h),

using the linearity of f. In Proposition 11.1 below, we will see that {dx₁, …, dx_n} is not only generating but also linearly independent.

(b) Let V = C([0, 1]) be the continuous functions on [0, 1] and let α be an increasing function on [0, 1]. Then the Riemann–Stieltjes integral

  φ_α(f) = ∫₀¹ f dα,  f ∈ V,

defines a linear functional φ_α on V. If a ∈ [0, 1],

  δ_a(f) = f(a),  f ∈ V,

defines the evaluation functional of f at a. In case a = 0 this is Dirac's δ-functional, playing an important role in the theory of distributions (generalized functions).

(c) Let a ∈ Rⁿ. Then ⟨a, x⟩ = Σ_{i=1}^n a_i x_i, x ∈ Rⁿ, defines a linear functional on Rⁿ. By (a), this is already the most general form of a linear functional on Rⁿ.

Definition 11.2 Let k ∈ N. An alternating (or skew-symmetric) multilinear form of degree k on Rⁿ, a k-form for short, is a mapping ω: Rⁿ × ⋯ × Rⁿ → R (k factors Rⁿ) which is multilinear and skew-symmetric, i.e.,

  MULT  ω(⋯, αv_i + βw_i, ⋯) = α ω(⋯, v_i, ⋯) + β ω(⋯, w_i, ⋯),        (11.1)
  SKEW  ω(⋯, v_i, ⋯, v_j, ⋯) = −ω(⋯, v_j, ⋯, v_i, ⋯),  i, j = 1, …, k, i ≠ j,        (11.2)

for all vectors v₁, v₂, …, v_k, w_i ∈ Rⁿ. We denote the linear space of all k-forms on Rⁿ by Λ^k(Rⁿ), with the convention Λ⁰(Rⁿ) = R. In case k = 1, property (11.2) is an empty condition, such that Λ¹(Rⁿ) = (Rⁿ)* is just the dual space.

Let f₁, …, f_k ∈ (Rⁿ)* be linear functionals on Rⁿ. Then we define the k-form f₁∧⋯∧f_k ∈ Λ^k(Rⁿ) (read: "f₁ wedge f₂ … wedge f_k") as follows:

  f₁∧⋯∧f_k (h₁, …, h_k) = det | f₁(h₁) ⋯ f₁(h_k) ; ⋯ ; f_k(h₁) ⋯ f_k(h_k) |.        (11.3)

In particular, let i₁, …, i_k ∈ {1, …, n} be fixed and choose f_j = dx_{i_j}, j = 1, …, k. Then
h1i · · · hki k k f1 ∧ · · · ∧fk is indeed a k-form since the fi are linear, the determinant is multilinear ′ a b c a b c λa + µa′ b c λd + µd′ e f = λ d e f + µ d′ e f , g ′ h i g h i λg + µg ′ h i and skew-symmetric b a c a b c d e f = − e d f . h g i g h i For example, let y = (y1 , . . . , yn ), z = (z1 , . . . , zn ) ∈ Rn , y3 z3 = y3 z1 − y1 z3 . dx3 ∧ dx1 (y, z) = y1 z1 If fr = fs = f for some r 6= s, we have f1 ∧ · · · ∧f ∧ · · · ∧f ∧ · · · ∧fk = 0 since determinants with identical rows vanish. Also, for any ω ∈ Λk (Rn ), ω(h1 , . . . , h, . . . , h, . . . , hk ) = 0, h1 , . . . , hk , h ∈ Rn since the defining determinant has two identical columns. 11 Differential Forms on Rn 282 Proposition 11.1 For k ≤ n the k-forms { dxi1 ∧ · · · ∧ dxik | 1 ≤ i1 < i2 < · · · < ik ≤ n} form a basis of the vector space Λk (Rn ). A k-form with k > n is identically zero. We have n k n dim Λ (R ) = . k Proof. Any k-form ω is uniquely determined by its values on the k-tuple of vectors (ei1 , . . . , eik ) with 1 ≤ i1 < i2 < · · · < ik ≤ n. Indeed, using skew-symmetry of ω, we know ω on all ktuples of basis vectors; using linearity in each component, we get ω on all k-tuples of vectors. This shows that the dxi1 ∧ · · · ∧ dxik with 1 ≤ i1 < i2 < · · · < ik ≤ n generate the linear space P P Λk (Rn ). We make this precise in case k = 2. With y = i yi ei , z = j zj ej we have by linearity and skew-symmetry of ω ω(y, z) = n X yi zj ω(ei , ej ) = SKEW i,j=1 = X i<j Hence, X (yi zj − yj zi )ω(ei , ej ) 1≤i<j≤n yi zi X = ω(ei , ej ) ω(ei , ej ) dxi ∧ dxj (y, z). yj zj i<j ω= X i<j ω(ei, ej ) dxi ∧ dxj . This shows that the n2 2-forms { dxi ∧ dxj | i < j} generate Λ2 (Rn ). P We show its linear independence. Suppose that i<j αij dxi ∧ dxj = 0 for some αij ∈ R. Evaluating this on (er , es ), r < s, gives X X δri δsi X = αij dxi ∧ dxj (er , es ) = αij αij (δri δsj − δrj δsi ) = αrs ; 0= δ δ rj sj i<j i<j i<j hence, the above 2-forms are linearly independent. 
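As a small numerical aside, the determinant formula (11.3) is easy to experiment with. The following sketch (Python with NumPy; the function name and the 0-based index convention are choices of this note, not from the text) evaluates an elementary $k$-form on $k$ vectors and reproduces the $dx_3\wedge dx_1$ example above:

```python
import numpy as np

def elementary_form(indices):
    """Return dx_{i1} ∧ ... ∧ dx_{ik} on R^n as a function of k vectors,
    evaluated via the determinant definition (11.3).
    `indices` are 0-based coordinate indices i1, ..., ik."""
    def omega(*vectors):
        # Matrix whose (r, s) entry is dx_{i_r}(h_s) = (h_s)_{i_r}.
        M = np.array([[v[i] for v in vectors] for i in indices])
        return np.linalg.det(M)
    return omega

# Reproduce dx3 ∧ dx1 (y, z) = y3*z1 - y1*z3 (1-based indices 3 and 1).
dx3_dx1 = elementary_form([2, 0])
y, z = np.array([1.0, 2.0, 3.0]), np.array([4.0, 5.0, 6.0])
print(dx3_dx1(y, z))   # ≈ 6, since 3*4 - 1*6 = 6
print(dx3_dx1(z, y))   # ≈ -6: skew-symmetry flips the sign
```

Swapping two arguments swaps two columns of the determinant, which is exactly the property (SKEW).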
The arguments for general $k$ are similar. In general, if $\omega \in \Lambda^k(\mathbb{R}^n)$, then there exist unique numbers $a_{i_1\cdots i_k} = \omega(e_{i_1}, \dots, e_{i_k}) \in \mathbb{R}$, $i_1 < i_2 < \cdots < i_k$, such that
$$\omega = \sum_{1\le i_1 < \cdots < i_k \le n} a_{i_1\cdots i_k}\, dx_{i_1}\wedge\cdots\wedge dx_{i_k}.$$

Example 11.2 Let $n = 3$.

$k = 1$: $\{dx_1, dx_2, dx_3\}$ is a basis of $\Lambda^1(\mathbb{R}^3)$.
$k = 2$: $\{dx_1\wedge dx_2, dx_1\wedge dx_3, dx_2\wedge dx_3\}$ is a basis of $\Lambda^2(\mathbb{R}^3)$.
$k = 3$: $\{dx_1\wedge dx_2\wedge dx_3\}$ is a basis of $\Lambda^3(\mathbb{R}^3)$.
$\Lambda^k(\mathbb{R}^3) = \{0\}$ for $k \ge 4$.

Definition 11.3 An algebra $A$ over $\mathbb{R}$ is a linear space together with a product map $(a, b) \mapsto ab$, $A \times A \to A$, such that the following holds for all $a, b, c \in A$ and $\alpha \in \mathbb{R}$:
(i) $a(bc) = (ab)c$ (associativity),
(ii) $(a + b)c = ac + bc$ and $a(b + c) = ab + ac$,
(iii) $\alpha(ab) = (\alpha a)b = a(\alpha b)$.

Standard examples are $C(X)$, the continuous functions on a metric space $X$; $\mathbb{R}^{n\times n}$, the full $n\times n$ matrix algebra over $\mathbb{R}$; and the algebra of polynomials $\mathbb{R}[X]$.

Let $\Lambda(\mathbb{R}^n) = \bigoplus_{k=0}^n \Lambda^k(\mathbb{R}^n)$ be the direct sum of linear spaces.

Proposition 11.2 (i) $\Lambda(\mathbb{R}^n)$ is an $\mathbb{R}$-algebra with unity $1$ and product $\wedge$ defined by
$$(dx_{i_1}\wedge\cdots\wedge dx_{i_k}) \wedge (dx_{j_1}\wedge\cdots\wedge dx_{j_l}) = dx_{i_1}\wedge\cdots\wedge dx_{i_k}\wedge dx_{j_1}\wedge\cdots\wedge dx_{j_l}.$$
(ii) If $\omega_k \in \Lambda^k(\mathbb{R}^n)$ and $\omega_l \in \Lambda^l(\mathbb{R}^n)$, then $\omega_k\wedge\omega_l \in \Lambda^{k+l}(\mathbb{R}^n)$ and $\omega_k\wedge\omega_l = (-1)^{kl}\,\omega_l\wedge\omega_k$.

Proof. (i) Associativity is clear since concatenation of strings is associative. The distributive laws are used to extend the multiplication from the basis to the entire space $\Lambda(\mathbb{R}^n)$.
We show (ii) for $\omega_k = dx_{i_1}\wedge\cdots\wedge dx_{i_k}$ and $\omega_l = dx_{j_1}\wedge\cdots\wedge dx_{j_l}$. We already know $dx_i\wedge dx_j = -dx_j\wedge dx_i$. There are $kl$ transpositions $dx_{i_r} \leftrightarrow dx_{j_s}$ necessary to transport all the $dx_{j_s}$ from the right to the left of $\omega_k$. Hence the sign is $(-1)^{kl}$.

In particular, $dx_i\wedge dx_i = 0$. The formula $dx_i\wedge dx_j = -dx_j\wedge dx_i$ determines the product in $\Lambda(\mathbb{R}^n)$ uniquely. We call $\Lambda(\mathbb{R}^n)$ the exterior algebra of the vector space $\mathbb{R}^n$.

The following formula will be used in the next subsection. Let $\omega \in \Lambda^k(\mathbb{R}^n)$ and $\eta \in \Lambda^l(\mathbb{R}^n)$; then for all $v_1, \dots, v_{k+l} \in \mathbb{R}^n$,
$$(\omega\wedge\eta)(v_1, \dots, v_{k+l}) = \frac{1}{k!\,l!} \sum_{\sigma \in S_{k+l}} (-1)^\sigma\, \omega(v_{\sigma(1)}, \dots, v_{\sigma(k)})\, \eta(v_{\sigma(k+1)}, \dots, v_{\sigma(k+l)}). \qquad (11.4)$$
Indeed, let $\omega = f_1\wedge\cdots\wedge f_k$ and $\eta = f_{k+1}\wedge\cdots\wedge f_{k+l}$ with $f_i \in (\mathbb{R}^n)^*$. The above formula can be obtained from
$$(f_1\wedge\cdots\wedge f_k)\wedge(f_{k+1}\wedge\cdots\wedge f_{k+l})(v_1, \dots, v_{k+l}) = \begin{vmatrix} f_1(v_1) & \cdots & f_1(v_k) & f_1(v_{k+1}) & \cdots & f_1(v_{k+l})\\ \vdots & & \vdots & \vdots & & \vdots\\ f_k(v_1) & \cdots & f_k(v_k) & f_k(v_{k+1}) & \cdots & f_k(v_{k+l})\\ f_{k+1}(v_1) & \cdots & f_{k+1}(v_k) & f_{k+1}(v_{k+1}) & \cdots & f_{k+1}(v_{k+l})\\ \vdots & & \vdots & \vdots & & \vdots\\ f_{k+l}(v_1) & \cdots & f_{k+l}(v_k) & f_{k+l}(v_{k+1}) & \cdots & f_{k+l}(v_{k+l}) \end{vmatrix}$$
by expanding this determinant with respect to the last $l$ rows. This can be done using the Laplace expansion
$$|A| = \sum_{1\le j_1 < \cdots < j_k \le n} (-1)^{\sum_{m=1}^k (i_m + j_m)} \begin{vmatrix} a_{i_1 j_1} & \cdots & a_{i_1 j_k}\\ \vdots & & \vdots\\ a_{i_k j_1} & \cdots & a_{i_k j_k} \end{vmatrix}\, \begin{vmatrix} a_{i_{k+1} j_{k+1}} & \cdots & a_{i_{k+1} j_{k+l}}\\ \vdots & & \vdots\\ a_{i_{k+l} j_{k+1}} & \cdots & a_{i_{k+l} j_{k+l}} \end{vmatrix},$$
where $(i_1, \dots, i_k)$ is any fixed ordered multi-index and $(j_{k+1}, \dots, j_{k+l})$ is the ordered multi-index complementary to $(j_1, \dots, j_k)$, so that all integers $1, 2, \dots, k+l$ appear.

11.1.2 The Pull-Back of k-forms

Definition 11.4 Let $A \in L(\mathbb{R}^n, \mathbb{R}^m)$ be a linear mapping and $k \in \mathbb{N}$. For $\omega \in \Lambda^k(\mathbb{R}^m)$ we define a $k$-form $A^*(\omega) \in \Lambda^k(\mathbb{R}^n)$ by
$$(A^*\omega)(h_1, \dots, h_k) = \omega(A(h_1), A(h_2), \dots, A(h_k)), \quad h_1, \dots, h_k \in \mathbb{R}^n.$$
We call $A^*(\omega)$ the pull-back of $\omega$ under $A$. Note that $A^* \in L(\Lambda^k(\mathbb{R}^m), \Lambda^k(\mathbb{R}^n))$ is a linear mapping. In case $k = 1$ we call $A^*$ the dual mapping to $A$. In case $k = 0$, $\omega \in \mathbb{R}$, we simply set $A^*(\omega) = \omega$.

We have $A^*(\omega\wedge\eta) = A^*(\omega)\wedge A^*(\eta)$. Indeed, let $\omega \in \Lambda^k(\mathbb{R}^m)$, $\eta \in \Lambda^l(\mathbb{R}^m)$, and $h_i \in \mathbb{R}^n$, $i = 1, \dots, k+l$; then by (11.4)
$$A^*(\omega\wedge\eta)(h_1, \dots, h_{k+l}) = (\omega\wedge\eta)(A(h_1), \dots, A(h_{k+l}))$$
$$= \frac{1}{k!\,l!} \sum_{\sigma \in S_{k+l}} (-1)^\sigma\, \omega(A(h_{\sigma(1)}), \dots, A(h_{\sigma(k)}))\, \eta(A(h_{\sigma(k+1)}), \dots, A(h_{\sigma(k+l)}))$$
$$= \frac{1}{k!\,l!} \sum_{\sigma \in S_{k+l}} (-1)^\sigma\, A^*(\omega)(h_{\sigma(1)}, \dots, h_{\sigma(k)})\, A^*(\eta)(h_{\sigma(k+1)}, \dots, h_{\sigma(k+l)})$$
$$= (A^*(\omega)\wedge A^*(\eta))(h_1, \dots, h_{k+l}).$$
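Proposition 11.2 can be checked mechanically. Below is a small sketch (Python; the dictionary representation of forms, keyed by sorted index tuples, is an implementation choice of this note, not from the text) that wedges elementary forms by concatenating their index strings and sorting with a sign, exactly as in the proof of (ii):

```python
def sort_sign(idx):
    """Sign of the permutation that sorts `idx`; sign 0 if an index
    repeats (since dx_i ∧ dx_i = 0)."""
    idx = list(idx)
    if len(set(idx)) != len(idx):
        return 0, ()
    sign = 1
    # insertion sort, counting transpositions
    for i in range(1, len(idx)):
        j = i
        while j > 0 and idx[j-1] > idx[j]:
            idx[j-1], idx[j] = idx[j], idx[j-1]
            sign = -sign
            j -= 1
    return sign, tuple(idx)

def wedge(a, b):
    """Wedge product of forms given as {sorted index tuple: coefficient}."""
    out = {}
    for I, x in a.items():
        for J, y in b.items():
            s, K = sort_sign(I + J)
            if s:
                out[K] = out.get(K, 0) + s * x * y
    return {K: c for K, c in out.items() if c != 0}

wk = {(1, 2): 1.0}     # dx1 ∧ dx2  (k = 2)
wl = {(3,): 2.0}       # 2 dx3      (l = 1)
print(wedge(wk, wl))   # {(1, 2, 3): 2.0}
print(wedge(wl, wk))   # {(1, 2, 3): 2.0}, since (-1)^{kl} = (-1)^2 = +1
```

With $k = l = 1$ the sign rule gives anticommutativity: `wedge({(2,): 1.0}, {(1,): 1.0})` yields `{(1, 2): -1.0}`.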
Example 11.3 (a) Let $A = \begin{pmatrix} 1 & 0 & 3\\ 2 & 1 & 0 \end{pmatrix} \in \mathbb{R}^{2\times 3}$ be the linear map $A\colon \mathbb{R}^3 \to \mathbb{R}^2$ defined by matrix multiplication, $A(v) = A\cdot v$, $v \in \mathbb{R}^3$. Let $\{e_1, e_2, e_3\}$ and $\{f_1, f_2\}$ be the standard bases of $\mathbb{R}^3$ and $\mathbb{R}^2$, and let $\{dx_1, dx_2, dx_3\}$ and $\{dy_1, dy_2\}$ be their dual bases, respectively.
First we compute $A^*(dy_1)$ and $A^*(dy_2)$:
$$A^*(dy_1)(e_i) = dy_1(A(e_i)) = a_{1i}, \qquad A^*(dy_2)(e_i) = a_{2i}.$$
In particular,
$$A^*(dy_1) = 1\,dx_1 + 0\,dx_2 + 3\,dx_3, \qquad A^*(dy_2) = 2\,dx_1 + dx_2.$$
We compute $A^*(dy_2\wedge dy_1)$. By definition, for $1 \le i < j \le 3$, writing $A_i = A(e_i)$ for the $i$th column of $A$,
$$A^*(dy_2\wedge dy_1)(e_i, e_j) = dy_2\wedge dy_1(A_i, A_j) = \begin{vmatrix} a_{2i} & a_{2j}\\ a_{1i} & a_{1j}\end{vmatrix}.$$
In particular,
$$A^*(dy_2\wedge dy_1)(e_1, e_2) = \begin{vmatrix} 2 & 1\\ 1 & 0\end{vmatrix} = -1, \quad A^*(dy_2\wedge dy_1)(e_1, e_3) = \begin{vmatrix} 2 & 0\\ 1 & 3\end{vmatrix} = 6, \quad A^*(dy_2\wedge dy_1)(e_2, e_3) = \begin{vmatrix} 1 & 0\\ 0 & 3\end{vmatrix} = 3.$$
Hence,
$$A^*(dy_2\wedge dy_1) = -dx_1\wedge dx_2 + 6\,dx_1\wedge dx_3 + 3\,dx_2\wedge dx_3.$$
On the other hand,
$$A^*(dy_2)\wedge A^*(dy_1) = (2\,dx_1 + dx_2)\wedge(dx_1 + 3\,dx_3) = -dx_1\wedge dx_2 + 6\,dx_1\wedge dx_3 + 3\,dx_2\wedge dx_3.$$
(b) Let $A \in \mathbb{R}^{n\times n}$, $A\colon \mathbb{R}^n \to \mathbb{R}^n$, and $\omega = dx_1\wedge\cdots\wedge dx_n \in \Lambda^n(\mathbb{R}^n)$. Then
$$A^*(\omega) = \det(A)\,\omega.$$

11.1.3 Orientation of $\mathbb{R}^n$

If $\{e_1, \dots, e_n\}$ and $\{f_1, \dots, f_n\}$ are two bases of $\mathbb{R}^n$, there exists a unique regular matrix $A = (a_{ij})$ ($\det A \ne 0$) such that $e_i = \sum_j a_{ij} f_j$. We say that $\{e_1, \dots, e_n\}$ and $\{f_1, \dots, f_n\}$ are equivalent if and only if $\det A > 0$. Since $\det A \ne 0$, there are exactly two equivalence classes. We say that the two bases $\{e_i \mid i = 1, \dots, n\}$ and $\{f_i \mid i = 1, \dots, n\}$ define the same orientation if and only if $\det A > 0$.

Definition 11.5 An orientation of $\mathbb{R}^n$ is given by fixing one of the two equivalence classes.

Example 11.4 (a) In $\mathbb{R}^2$ the bases $\{e_1, e_2\}$ and $\{e_2, e_1\}$ have different orientations since $A = \begin{pmatrix} 0 & 1\\ 1 & 0\end{pmatrix}$ and $\det A = -1$.
(b) In $\mathbb{R}^3$ the bases $\{e_1, e_2, e_3\}$, $\{e_3, e_1, e_2\}$, and $\{e_2, e_3, e_1\}$ have the same orientation, whereas $\{e_1, e_3, e_2\}$, $\{e_2, e_1, e_3\}$, and $\{e_3, e_2, e_1\}$ have the opposite orientation.
(c) The standard basis $\{e_1, \dots, e_n\}$ and $\{e_2, e_1, e_3, \dots, e_n\}$ define different orientations.

11.2 Differential Forms

11.2.1 Definition

Throughout this section let $U \subset \mathbb{R}^n$ be an open and connected set.

Definition 11.6 (a) A differential $k$-form on $U$ is a mapping $\omega\colon U \to \Lambda^k(\mathbb{R}^n)$, i.e. to every point $p \in U$ we associate a $k$-form $\omega(p) \in \Lambda^k(\mathbb{R}^n)$. The linear space of differential $k$-forms on $U$ is denoted by $\Omega^k(U)$.
(b) Let $\omega$ be a differential $k$-form on $U$. Since $\{dx_{i_1}\wedge\cdots\wedge dx_{i_k} \mid 1 \le i_1 < i_2 < \cdots < i_k \le n\}$ forms a basis of $\Lambda^k(\mathbb{R}^n)$, there exist uniquely determined functions $a_{i_1\cdots i_k}$ on $U$ such that
$$\omega(p) = \sum_{1\le i_1 < \cdots < i_k \le n} a_{i_1\cdots i_k}(p)\, dx_{i_1}\wedge\cdots\wedge dx_{i_k}. \qquad (11.5)$$
If all functions $a_{i_1\cdots i_k}$ are in $C^r(U)$, $r \in \mathbb{N}\cup\{\infty\}$, we say $\omega$ is an $r$ times continuously differentiable differential $k$-form on $U$. The set of those differential $k$-forms is denoted by $\Omega_r^k(U)$. We define $\Omega_r^0(U) = C^r(U)$ and $\Omega(U) = \bigoplus_{k=0}^n \Omega^k(U)$. The product in $\Lambda(\mathbb{R}^n)$ defines a product in $\Omega(U)$:
$$(\omega\wedge\eta)(x) = \omega(x)\wedge\eta(x), \quad x \in U;$$
hence $\Omega(U)$ is an algebra. For example, if
$$\omega_1 = x^2\,dy\wedge dz + xyz\,dx\wedge dy, \qquad \omega_2 = (xy^2 + 3z^2)\,dx$$
define a differential 2-form and a 1-form on $\mathbb{R}^3$, $\omega_1 \in \Omega^2(\mathbb{R}^3)$, $\omega_2 \in \Omega^1(\mathbb{R}^3)$, then
$$\omega_1\wedge\omega_2 = (x^3y^2 + 3x^2z^2)\,dx\wedge dy\wedge dz.$$

11.2.2 Differentiation

Definition 11.7 Let $f \in \Omega_r^0(U) = C^r(U)$ and $p \in U$. We define $df(p) = Df(p)$; then $df$ is a differential 1-form on $U$. If $\omega(p) = \sum_{1\le i_1 < \cdots < i_k \le n} a_{i_1\cdots i_k}(p)\, dx_{i_1}\wedge\cdots\wedge dx_{i_k}$ is a differential $k$-form, we define
$$d\omega(p) = \sum_{1\le i_1 < \cdots < i_k \le n} da_{i_1\cdots i_k}(p)\wedge dx_{i_1}\wedge\cdots\wedge dx_{i_k}. \qquad (11.6)$$
Then $d\omega$ is a differential $(k+1)$-form. The linear operator $d\colon \Omega^k(U) \to \Omega^{k+1}(U)$ is called the exterior differential.

Remarks 11.1 (a) Note that for a function $f\colon U \to \mathbb{R}$, $Df \in L(\mathbb{R}^n, \mathbb{R}) = \Lambda^1(\mathbb{R}^n)$. By Example 7.7 (a),
$$Df(x)(h) = \operatorname{grad} f(x)\cdot h = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(x)\,h_i = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(x)\,dx_i(h),$$
hence
$$df(x) = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(x)\,dx_i. \qquad (11.7)$$
Viewing $x_i\colon U \to \mathbb{R}$ as a $C^\infty$-function, the above formula gives $dx_i(x) = dx_i$.
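Formulas (11.6) and (11.7) are easy to reproduce with a computer algebra system. A minimal sketch with SymPy for a 1-form $\omega = a_1\,dx + a_2\,dy$ on $\mathbb{R}^2$: by (11.6) and $dx\wedge dx = dy\wedge dy = 0$, the only surviving coefficient of $d\omega$ is that of $dx\wedge dy$. (The particular form chosen here is the one that reappears in Example 11.5 (a) below.)

```python
import sympy as sp

x, y = sp.symbols('x y')
a1, a2 = sp.exp(x*y), x*y**3          # ω = e^{xy} dx + x y³ dy

# dω = da1 ∧ dx + da2 ∧ dy = (∂a2/∂x − ∂a1/∂y) dx∧dy
coeff = sp.simplify(sp.diff(a2, x) - sp.diff(a1, y))
print(coeff)    # y**3 - x*exp(x*y), i.e. dω = (y³ − x e^{xy}) dx∧dy
```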
This justifies the notation $dx_i$. If $f \in C^\infty(\mathbb{R})$ we have $df(x) = f'(x)\,dx$.
(b) One can show that the definition of $d\omega$ does not depend on the choice of the basis $\{dx_1, \dots, dx_n\}$ of $\Lambda^1(\mathbb{R}^n)$.

Example 11.5 (a) $G = \mathbb{R}^2$, $\omega = e^{xy}\,dx + xy^3\,dy$. Then
$$d\omega = d(e^{xy})\wedge dx + d(xy^3)\wedge dy = (ye^{xy}\,dx + xe^{xy}\,dy)\wedge dx + (y^3\,dx + 3xy^2\,dy)\wedge dy = (-xe^{xy} + y^3)\,dx\wedge dy.$$
(b) Let $f$ be continuously differentiable. Then
$$df = f_x\,dx + f_y\,dy + f_z\,dz = \operatorname{grad} f\cdot(dx, dy, dz) = \operatorname{grad} f\cdot d\vec{x}.$$
(c) Let $v = (v_1, v_2, v_3)$ be a $C^1$-vector field. Put $\omega = v_1\,dx + v_2\,dy + v_3\,dz$. Then we have
$$d\omega = \Bigl(\frac{\partial v_3}{\partial y} - \frac{\partial v_2}{\partial z}\Bigr)dy\wedge dz + \Bigl(\frac{\partial v_1}{\partial z} - \frac{\partial v_3}{\partial x}\Bigr)dz\wedge dx + \Bigl(\frac{\partial v_2}{\partial x} - \frac{\partial v_1}{\partial y}\Bigr)dx\wedge dy = \operatorname{curl}(v)\cdot(dy\wedge dz,\ dz\wedge dx,\ dx\wedge dy) = \operatorname{curl}(v)\cdot d\vec{S}.$$
(d) Let $v$ be as above. Put $\omega = v_1\,dy\wedge dz + v_2\,dz\wedge dx + v_3\,dx\wedge dy$. Then we have
$$d\omega = \operatorname{div}(v)\,dx\wedge dy\wedge dz.$$

Proposition 11.3 The exterior differential $d$ is a linear mapping which satisfies
(i) $d(\omega\wedge\eta) = d\omega\wedge\eta + (-1)^k\,\omega\wedge d\eta$ for $\omega \in \Omega_1^k(U)$, $\eta \in \Omega_1^l(U)$;
(ii) $d(d\omega) = 0$ for $\omega \in \Omega_2^k(U)$.

Proof. (i) We first prove Leibniz' rule for functions $f, g \in \Omega_1^0(U)$. By Remarks 11.1 (a),
$$d(fg) = \sum_i \frac{\partial(fg)}{\partial x_i}\,dx_i = \sum_i \Bigl(\frac{\partial f}{\partial x_i}g + f\frac{\partial g}{\partial x_i}\Bigr)dx_i = \Bigl(\sum_i \frac{\partial f}{\partial x_i}\,dx_i\Bigr)g + f\sum_i \frac{\partial g}{\partial x_i}\,dx_i = df\,g + f\,dg.$$
For $I = (i_1, \dots, i_k)$ and $J = (j_1, \dots, j_l)$ we abbreviate $dx_I = dx_{i_1}\wedge\cdots\wedge dx_{i_k}$ and $dx_J = dx_{j_1}\wedge\cdots\wedge dx_{j_l}$. Let $\omega = \sum_I a_I\,dx_I$ and $\eta = \sum_J b_J\,dx_J$. By definition,
$$d(\omega\wedge\eta) = d\Bigl(\sum_{I,J} a_I b_J\,dx_I\wedge dx_J\Bigr) = \sum_{I,J} d(a_I b_J)\wedge dx_I\wedge dx_J = \sum_{I,J} (da_I\,b_J + a_I\,db_J)\wedge dx_I\wedge dx_J$$
$$= \sum_{I,J} da_I\wedge dx_I\wedge b_J\,dx_J + (-1)^k \sum_{I,J} a_I\,dx_I\wedge db_J\wedge dx_J = d\omega\wedge\eta + (-1)^k\,\omega\wedge d\eta,$$
where in the third step we used $db_J\wedge dx_I = (-1)^k\,dx_I\wedge db_J$.
(ii) Again by the definition of $d$,
$$d(d\omega) = \sum_I d(da_I\wedge dx_I) = \sum_{I,j} d\Bigl(\frac{\partial a_I}{\partial x_j}\,dx_j\wedge dx_I\Bigr) = \sum_{I,i,j} \frac{\partial^2 a_I}{\partial x_i\,\partial x_j}\,dx_i\wedge dx_j\wedge dx_I$$
$$\overset{\text{Schwarz' lemma}}{=} \sum_{I,i,j} \frac{\partial^2 a_I}{\partial x_j\,\partial x_i}\,(-dx_j\wedge dx_i\wedge dx_I) = -d(d\omega).$$
It follows that $d(d\omega) = d^2\omega = 0$.

11.2.3 Pull-Back

Definition 11.8 Let $f\colon U \to V$ be a differentiable function with open sets $U \subset \mathbb{R}^n$ and $V \subset \mathbb{R}^m$.
Let $\omega \in \Omega^k(V)$ be a differential $k$-form. We define a differential $k$-form $f^*(\omega) \in \Omega^k(U)$ by
$$(f^*\omega)(p) = (Df(p))^*\,\omega(f(p)),$$
that is,
$$(f^*\omega)(p; h_1, \dots, h_k) = \omega\bigl(f(p); Df(p)(h_1), \dots, Df(p)(h_k)\bigr), \quad p \in U,\ h_1, \dots, h_k \in \mathbb{R}^n.$$
In case $k = 0$ and $\omega \in \Omega^0(V) = C^\infty(V)$ we simply set
$$(f^*\omega)(p) = \omega(f(p)), \qquad f^*\omega = \omega\circ f.$$
We call $f^*(\omega)$ the pull-back of the differential $k$-form $\omega$ with respect to $f$. Note that by definition the pull-back $f^*$ is a linear mapping from the space of differential $k$-forms on $V$ to the space of differential $k$-forms on $U$, $f^*\colon \Omega^k(V) \to \Omega^k(U)$.

Proposition 11.4 Let $f$ be as above and $\omega, \eta \in \Omega(V)$. Let $\{dy_1, \dots, dy_m\}$ be the dual basis to the standard basis in $(\mathbb{R}^m)^*$. Then we have, with $f = (f_1, \dots, f_m)$:
(a) $f^*(dy_i) = \sum_{j=1}^n \dfrac{\partial f_i}{\partial x_j}\,dx_j = df_i$, $i = 1, \dots, m$;  (11.8)
(b) $f^*(d\omega) = d(f^*\omega)$;  (11.9)
(c) $f^*(a\,\omega) = (a\circ f)\,f^*(\omega)$, $a \in C^\infty(V)$;  (11.10)
(d) $f^*(\omega\wedge\eta) = f^*(\omega)\wedge f^*(\eta)$.  (11.11)
If $n = m$, then
(e) $f^*(dy_1\wedge\cdots\wedge dy_n) = \dfrac{\partial(f_1, \dots, f_n)}{\partial(x_1, \dots, x_n)}\,dx_1\wedge\cdots\wedge dx_n$.  (11.12)

Proof. We show (a). Let $h \in \mathbb{R}^n$; by Definition 11.7 and the definition of the derivative we have
$$f^*(dy_i)(h) = dy_i(Df(p)(h)) = dy_i\Bigl(\Bigl(\sum_{j=1}^n \frac{\partial f_k(p)}{\partial x_j}h_j\Bigr)_{k=1,\dots,m}\Bigr) = \sum_{j=1}^n \frac{\partial f_i(p)}{\partial x_j}\,h_j = \sum_{j=1}^n \frac{\partial f_i(p)}{\partial x_j}\,dx_j(h).$$
This shows (a). Equation (11.10) is a special case of (11.11); we prove (d). Let $p \in U$. Using the pull-back formula for $k$-forms we obtain
$$f^*(\omega\wedge\eta)(p) = (Df(p))^*\bigl((\omega\wedge\eta)(f(p))\bigr) = (Df(p))^*\bigl(\omega(f(p))\wedge\eta(f(p))\bigr) = (Df(p))^*(\omega(f(p)))\wedge(Df(p))^*(\eta(f(p))) = (f^*(\omega)\wedge f^*(\eta))(p).$$
To show (11.9) we start with a 0-form $g$ and prove that $f^*(dg) = d(f^*g)$ for functions $g\colon V \to \mathbb{R}$. By (11.7) and (11.10) we have
$$f^*(dg)(p) = f^*\Bigl(\sum_{i=1}^m \frac{\partial g}{\partial y_i}\,dy_i\Bigr)(p) = \sum_{i=1}^m \frac{\partial g(f(p))}{\partial y_i}\,f^*(dy_i) = \sum_{i=1}^m \frac{\partial g(f(p))}{\partial y_i}\sum_{j=1}^n \frac{\partial f_i(p)}{\partial x_j}\,dx_j$$
$$= \sum_{j=1}^n \Bigl(\sum_{i=1}^m \frac{\partial g(f(p))}{\partial y_i}\frac{\partial f_i(p)}{\partial x_j}\Bigr)dx_j \overset{\text{chain rule}}{=} \sum_{j=1}^n \frac{\partial}{\partial x_j}(g\circ f)(p)\,dx_j = d(g\circ f)(p) = d(f^*g)(p).$$
Now let $\omega = \sum_I a_I\,dy_I$ be an arbitrary form. Since by the Leibniz rule
$$d\bigl(f^*(dy_I)\bigr) = d\bigl(df_{i_1}\wedge\cdots\wedge df_{i_k}\bigr) = 0,$$
we get, again by the Leibniz rule,
$$d(f^*\omega) = d\Bigl(\sum_I f^*(a_I)\,f^*(dy_I)\Bigr) = \sum_I \bigl(d(f^*(a_I))\wedge f^*(dy_I) + f^*(a_I)\,d(f^*(dy_I))\bigr) = \sum_I d(f^*(a_I))\wedge f^*(dy_I).$$
On the other hand, by (d) we have
$$f^*(d\omega) = f^*\Bigl(\sum_I da_I\wedge dy_I\Bigr) = \sum_I f^*(da_I)\wedge f^*(dy_I).$$
By the first part of (b), both expressions coincide. This completes the proof of (b).
We finally prove (e). By (a) and (d) we have
$$f^*(dy_1\wedge\cdots\wedge dy_n) = f^*(dy_1)\wedge\cdots\wedge f^*(dy_n) = \Bigl(\sum_{i_1=1}^n \frac{\partial f_1}{\partial x_{i_1}}\,dx_{i_1}\Bigr)\wedge\cdots\wedge\Bigl(\sum_{i_n=1}^n \frac{\partial f_n}{\partial x_{i_n}}\,dx_{i_n}\Bigr) = \sum_{i_1,\dots,i_n=1}^n \frac{\partial f_1}{\partial x_{i_1}}\cdots\frac{\partial f_n}{\partial x_{i_n}}\,dx_{i_1}\wedge\cdots\wedge dx_{i_n}.$$
Since the square of a 1-form vanishes, the only non-vanishing terms in the above sum are those where $(i_1, \dots, i_n)$ is a permutation of $(1, \dots, n)$. Using skew-symmetry to write $dx_{i_1}\wedge\cdots\wedge dx_{i_n}$ as a multiple of $dx_1\wedge\cdots\wedge dx_n$, we obtain the sign of the permutation $(i_1, \dots, i_n)$:
$$f^*(dy_1\wedge\cdots\wedge dy_n) = \sum_{I=(i_1,\dots,i_n)\in S_n} \operatorname{sign}(I)\,\frac{\partial f_1}{\partial x_{i_1}}\cdots\frac{\partial f_n}{\partial x_{i_n}}\,dx_1\wedge\cdots\wedge dx_n = \frac{\partial(f_1, \dots, f_n)}{\partial(x_1, \dots, x_n)}\,dx_1\wedge\cdots\wedge dx_n.$$

Example 11.6 (a) Let $f(r, \varphi) = (r\cos\varphi, r\sin\varphi)$ be given on $\mathbb{R}^2\setminus(\{0\}\times\mathbb{R})$ and let $\{dr, d\varphi\}$ and $\{dx, dy\}$ be the dual bases to $\{e_r, e_\varphi\}$ and $\{e_1, e_2\}$. We have
$$f^*(x) = r\cos\varphi, \qquad f^*(y) = r\sin\varphi,$$
$$f^*(dx) = \cos\varphi\,dr - r\sin\varphi\,d\varphi, \qquad f^*(dy) = \sin\varphi\,dr + r\cos\varphi\,d\varphi,$$
$$f^*(dx\wedge dy) = r\,dr\wedge d\varphi, \qquad f^*\Bigl(\frac{-y}{x^2+y^2}\,dx + \frac{x}{x^2+y^2}\,dy\Bigr) = d\varphi.$$
(b) Let $k \in \mathbb{N}$, $r \in \{1, \dots, k\}$, and $\alpha \in \mathbb{R}$. Define a mapping $I\colon \mathbb{R}^k \to \mathbb{R}^{k+1}$ and a form $\omega \in \Omega^k(\mathbb{R}^{k+1})$ by
$$I(x_1, \dots, x_k) = (x_1, \dots, x_{r-1}, \alpha, x_r, \dots, x_k), \qquad \omega(y_1, \dots, y_{k+1}) = \sum_{i=1}^{k+1} f_i(y)\,dy_1\wedge\cdots\wedge\widehat{dy_i}\wedge\cdots\wedge dy_{k+1},$$
where $f_i \in C^\infty(\mathbb{R}^{k+1})$ for all $i$; the hat means omission of the factor $dy_i$.
Then
$$I^*(\omega)(x) = f_r(x_1, \dots, x_{r-1}, \alpha, x_r, \dots, x_k)\,dx_1\wedge\cdots\wedge dx_k.$$
This follows from
$$I^*(dy_i) = dx_i,\ i = 1, \dots, r-1, \qquad I^*(dy_r) = 0, \qquad I^*(dy_{i+1}) = dx_i,\ i = r, \dots, k.$$
Roughly speaking: $f^*(\omega)$ is obtained by substituting the new variables at all places.

11.2.4 Closed and Exact Forms

Motivation: Let $f(x)$ be a continuous function on $\mathbb{R}$. Then $\omega = f(x)\,dx$ is a 1-form. By the fundamental theorem of calculus, there exists an antiderivative $F(x)$ of $f(x)$, so that
$$dF(x) = f(x)\,dx = \omega.$$
Problem: Given $\omega \in \Omega^k(U)$, does there exist $\eta \in \Omega^{k-1}(U)$ with $d\eta = \omega$?

Definition 11.9 $\omega \in \Omega^k(U)$ is called closed if $d\omega = 0$. $\omega \in \Omega^k(U)$ is called exact if there exists $\eta \in \Omega^{k-1}(U)$ such that $d\eta = \omega$.

Remarks 11.2 (a) An exact form $\omega$ is closed; indeed, $d\omega = d(d\eta) = 0$.
(b) A 1-form $\omega = \sum_i f_i\,dx_i$ is closed if and only if $\operatorname{curl}\vec{f} = 0$ for the corresponding vector field $\vec{f} = (f_1, \dots, f_n)$. Here the general curl can be defined as a vector with $n(n-1)/2$ components,
$$(\operatorname{curl}\vec{f})_{ij} = \frac{\partial f_j}{\partial x_i} - \frac{\partial f_i}{\partial x_j}.$$
The form $\omega$ is exact if and only if $\vec{f}$ is conservative, that is, $\vec{f}$ is a gradient vector field, $\vec{f} = \operatorname{grad} U$. Then $\omega = dU$.
(c) There are closed forms that are not exact; for example, the winding form
$$\omega = \frac{-y}{x^2+y^2}\,dx + \frac{x}{x^2+y^2}\,dy$$
on $\mathbb{R}^2\setminus\{(0,0)\}$ is not exact, cf. homework 30.1.
(d) If $d\eta = \omega$, then $d(\eta + d\xi) = \omega$, too, for all $\xi \in \Omega^{k-2}(U)$.

Definition 11.10 An open set $U$ is called star-shaped if there exists an $x_0 \in U$ such that for all $x \in U$ the segment from $x_0$ to $x$ lies in $U$, i.e. $(1-t)x_0 + tx \in U$ for all $t \in [0,1]$.
Convex sets $U$ are star-shaped (take any $x_0 \in U$); any star-shaped set is connected and simply connected.

Lemma 11.5 Let $U \subset \mathbb{R}^n$ be star-shaped with respect to the origin and let
$$\omega = \sum_{i_1 < \cdots < i_k} a_{i_1\cdots i_k}\,dx_{i_1}\wedge\cdots\wedge dx_{i_k} \in \Omega^k(U).$$
Define
$$I(\omega)(x) = \sum_{i_1 < \cdots < i_k} \sum_{r=1}^k (-1)^{r-1} \Bigl(\int_0^1 t^{k-1}\,a_{i_1\cdots i_k}(tx)\,dt\Bigr)\, x_{i_r}\; dx_{i_1}\wedge\cdots\wedge\widehat{dx_{i_r}}\wedge\cdots\wedge dx_{i_k}, \qquad (11.13)$$
where the hat means omission of the factor $dx_{i_r}$. Then we have
$$I(d\omega) + d(I\omega) = \omega. \qquad (11.14)$$
(Without proof.)

Example 11.7 (a) Let $k = 1$, $n = 3$, and $\omega = a_1\,dx_1 + a_2\,dx_2 + a_3\,dx_3$. Then
$$I(\omega) = x_1\int_0^1 a_1(tx)\,dt + x_2\int_0^1 a_2(tx)\,dt + x_3\int_0^1 a_3(tx)\,dt.$$
Note that this is exactly the formula for the potential $U(x_1, x_2, x_3)$ from Remark 8.5 (b). Let $(a_1, a_2, a_3)$ be a vector field on $U$ with $d\omega = 0$. This is equivalent to $\operatorname{curl} a = 0$ by Example 11.5 (c). The above lemma shows $dU = \omega$ for $U = I(\omega)$; this means $\operatorname{grad} U = (a_1, a_2, a_3)$: $U$ is the potential of the vector field $(a_1, a_2, a_3)$.
(b) Let $k = 2$, $n = 3$, and $\omega = a_1\,dx_2\wedge dx_3 + a_2\,dx_3\wedge dx_1 + a_3\,dx_1\wedge dx_2$, where $a$ is a $C^1$-vector field on $U$. Then
$$I(\omega) = \Bigl(x_3\int_0^1 t\,a_2(tx)\,dt - x_2\int_0^1 t\,a_3(tx)\,dt\Bigr)dx_1 + \Bigl(x_1\int_0^1 t\,a_3(tx)\,dt - x_3\int_0^1 t\,a_1(tx)\,dt\Bigr)dx_2 + \Bigl(x_2\int_0^1 t\,a_1(tx)\,dt - x_1\int_0^1 t\,a_2(tx)\,dt\Bigr)dx_3.$$
By Example 11.5 (d), $\omega$ is closed if and only if $\operatorname{div}(a) = 0$ on $U$. Let $\eta = b_1\,dx_1 + b_2\,dx_2 + b_3\,dx_3$ be such that $d\eta = \omega$. This means $\operatorname{curl} b = a$. The Poincaré lemma shows that $b$ with $\operatorname{curl} b = a$ exists if and only if $\operatorname{div}(a) = 0$. Then $b$ is the vector potential of $a$. In case $d\omega = 0$ we can choose $\vec{b}\cdot d\vec{x} = I(\omega)$.

Theorem 11.6 (Poincaré Lemma) Let $U$ be star-shaped. Then every closed differential form on $U$ is exact.

Proof. Without loss of generality let $U$ be star-shaped with respect to the origin and $d\omega = 0$. By Lemma 11.5, $d(I\omega) = \omega$.

Remarks 11.3 (a) Let $U$ be star-shaped, $\omega \in \Omega^k(U)$. Suppose $d\eta_0 = \omega$ for some $\eta_0 \in \Omega^{k-1}(U)$. Then the general solution of $d\eta = \omega$ is given by $\eta_0 + d\xi$ with $\xi \in \Omega^{k-2}(U)$. Indeed, let $\eta$ be a second solution of $d\eta = \omega$. Then $d(\eta - \eta_0) = 0$. By the Poincaré lemma, there exists $\xi \in \Omega^{k-2}(U)$ with $\eta - \eta_0 = d\xi$; hence $\eta = \eta_0 + d\xi$.
(b) Let $V$ be a linear space and $W$ a linear subspace of $V$.
We define an equivalence relation on $V$ by $v_1 \sim v_2$ if $v_1 - v_2 \in W$. The equivalence class of $v$ is denoted by $v + W$. One easily sees that the set of equivalence classes, denoted by $V/W$, is again a linear space: $\alpha(v + W) + \beta(u + W) := \alpha v + \beta u + W$.
Let $U$ be an arbitrary open subset of $\mathbb{R}^n$. We define
$$C^k(U) = \{\omega \in \Omega^k(U) \mid d\omega = 0\}, \quad \text{the cocycles on } U,$$
$$B^k(U) = \{\omega \in \Omega^k(U) \mid \omega \text{ is exact}\}, \quad \text{the coboundaries on } U.$$
Since exact forms are closed, $B^k(U)$ is a linear subspace of $C^k(U)$. The factor space
$$H^k_{\mathrm{deR}}(U) = C^k(U)/B^k(U)$$
is called the de Rham cohomology of $U$. If $U$ is star-shaped, $H^k_{\mathrm{deR}}(U) = 0$ for $k \ge 1$, by Poincaré's lemma. The first de Rham cohomology $H^1_{\mathrm{deR}}$ of $\mathbb{R}^2\setminus\{(0,0)\}$ is non-zero; the winding form represents a non-zero element. We have
$$H^0_{\mathrm{deR}}(U) \cong \mathbb{R}^p$$
if and only if $U$ has exactly $p$ connected components, $U = U_1\cup\cdots\cup U_p$ (disjoint union). Then the characteristic functions $\chi_{U_i}$, $i = 1, \dots, p$, form a basis of the 0-cocycles $C^0(U)$ (note $B^0(U) = 0$).

11.3 Stokes' Theorem

11.3.1 Singular Cubes, Singular Chains, and the Boundary Operator

A very nice treatment of the topics of this section is [Spi65, Chapter 4]. The set
$$[0,1]^k = [0,1]\times\cdots\times[0,1] = \{x \in \mathbb{R}^k \mid 0 \le x_i \le 1,\ i = 1, \dots, k\}$$
is called the $k$-dimensional unit cube. Let $U \subset \mathbb{R}^n$ be open.

Definition 11.11 (a) A singular $k$-cube in $U \subset \mathbb{R}^n$ is a continuously differentiable mapping $c_k\colon [0,1]^k \to U$.
(b) A singular $k$-chain in $U$ is a formal sum $s_k = n_1 c_{k,1} + \cdots + n_r c_{k,r}$ with singular $k$-cubes $c_{k,i}$ and integers $n_i \in \mathbb{Z}$.

A singular 0-cube is a point and a singular 1-cube is a curve; in general, a singular 2-cube (in $\mathbb{R}^3$) is a surface whose boundary consists of 4 pieces which are differentiable curves. Note that a singular 2-cube can also be a single point; that is where the name "singular" comes from.

Let $I^k\colon [0,1]^k \to \mathbb{R}^k$ be the identity map, i.e. $I^k(x) = x$, $x \in [0,1]^k$. It is called the standard $k$-cube in $\mathbb{R}^k$.
We are going to define the boundary $\partial s_k$ of a singular $k$-chain $s_k$. For $i = 1, \dots, k$ define
$$I^k_{(i,0)}(x_1, \dots, x_{k-1}) = (x_1, \dots, x_{i-1}, 0, x_i, \dots, x_{k-1}),$$
$$I^k_{(i,1)}(x_1, \dots, x_{k-1}) = (x_1, \dots, x_{i-1}, 1, x_i, \dots, x_{k-1});$$
these maps $[0,1]^{k-1} \to [0,1]^k$ insert a $0$ and a $1$ at the $i$th component, respectively. The boundary of the standard $k$-cube $I^k$ is now defined by
$$\partial I^k = \sum_{i=1}^k (-1)^i \bigl(I^k_{(i,0)} - I^k_{(i,1)}\bigr). \qquad (11.15)$$
It is the formal sum of $2k$ singular $(k-1)$-cubes, the faces of the $k$-cube. The boundary of an arbitrary singular $k$-cube $c_k\colon [0,1]^k \to U \subset \mathbb{R}^n$ is defined by composing the above faces of $I^k$ with $c_k$:
$$\partial c_k = c_k\circ\partial I^k = \sum_{i=1}^k (-1)^i \bigl(c_k\circ I^k_{(i,0)} - c_k\circ I^k_{(i,1)}\bigr), \qquad (11.16)$$
and for a singular $k$-chain $s_k = n_1 c_{k,1} + \cdots + n_r c_{k,r}$ we set
$$\partial s_k = n_1\,\partial c_{k,1} + \cdots + n_r\,\partial c_{k,r}.$$
The boundary operator $\partial$ associates to each singular $k$-chain a singular $(k-1)$-chain (since both $I^k_{(i,0)}$ and $I^k_{(i,1)}$ depend on $k-1$ variables, all from the segment $[0,1]$). One can show that $\partial(\partial s_k) = 0$ for any singular $k$-chain $s_k$.

Example 11.8 (a) In case $n = k = 3$ we have (the figure shows the six signed faces of the unit cube in the $x_1, x_2, x_3$ axes)
$$\partial I^3 = -I^3_{(1,0)} + I^3_{(1,1)} + I^3_{(2,0)} - I^3_{(2,1)} - I^3_{(3,0)} + I^3_{(3,1)},$$
where
$$-I^3_{(1,0)}(x_1, x_2) = -(0, x_1, x_2), \qquad +I^3_{(1,1)}(x_1, x_2) = +(1, x_1, x_2),$$
$$+I^3_{(2,0)}(x_1, x_2) = +(x_1, 0, x_2), \qquad -I^3_{(2,1)}(x_1, x_2) = -(x_1, 1, x_2),$$
$$-I^3_{(3,0)}(x_1, x_2) = -(x_1, x_2, 0), \qquad +I^3_{(3,1)}(x_1, x_2) = +(x_1, x_2, 1).$$
Note: if we take care of the signs in (11.15), all 6 unit normal vectors $D_1 I^3_{(i,j)}\times D_2 I^3_{(i,j)}$ to the faces have the orientation of the outer normal with respect to the unit 3-cube $[0,1]^3$. The above sum $\partial I^3$ is a formal sum of singular 2-cubes; you are not allowed to add componentwise: $-(0, x_1, x_2) + (1, x_1, x_2) \ne (1, 0, 0)$.
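The claim $\partial(\partial s_k) = 0$ can be spot-checked numerically: evaluate every signed face-of-face map of $I^k$ on a sample point and verify that the results cancel. A small sketch (Python; the encoding of faces as insertion maps follows (11.15), while the helper names and the evaluation-by-sample-point trick are choices of this note):

```python
from collections import defaultdict

def insert(x, i, e):
    """The face map I^k_{(i,e)}: insert the value e at 1-based position i."""
    return x[:i-1] + (e,) + x[i-1:]

def check_dd_zero(k, p):
    """Evaluate all signed terms of ∂∂I^k at the point p in [0,1]^{k-2};
    distinct composed maps give distinct tuples (for generic p), and the
    signs attached to each resulting tuple must sum to zero."""
    acc = defaultdict(int)
    for i in range(1, k + 1):            # faces of I^k, sign (-1)^{i+e}
        for e in (0, 1):
            for j in range(1, k):        # faces of the (k-1)-cube, sign (-1)^{j+f}
                for f in (0, 1):
                    q = insert(insert(p, j, f), i, e)
                    acc[q] += (-1) ** (i + e) * (-1) ** (j + f)
    return all(c == 0 for c in acc.values())

print(check_dd_zero(2, ()))          # True, matching ∂∂I^2 = 0 below
print(check_dd_zero(3, (0.3,)))      # True
print(check_dd_zero(4, (0.3, 0.7)))  # True
```

The cancellation happens in pairs: inserting $f$ at position $j$ and then $e$ at position $i$ gives the same map as performing the two insertions in the other order (with shifted position), and the two orders carry opposite signs.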
(b) In case $k = 2$ we have
$$\partial I^2(x) = I^2_{(1,1)} - I^2_{(1,0)} + I^2_{(2,0)} - I^2_{(2,1)} = (1, x) - (0, x) + (x, 0) - (x, 1).$$
(The figure shows the unit square with corners $E_1, \dots, E_4$ and oriented edges $\Gamma_1, \dots, \Gamma_4$; for the pictured 2-cube $c_2$ one has $\partial c_2 = \Gamma_1 + \Gamma_2 - \Gamma_3 - \Gamma_4$.) Moreover,
$$\partial\partial I^2 = (E_3 - E_2) - (E_4 - E_1) + (E_2 - E_1) - (E_3 - E_4) = 0.$$
(c) Let $c_2\colon [0, 2\pi]\times[0, \pi] \to \mathbb{R}^3\setminus\{(0,0,0)\}$ be the singular 2-cube
$$c_2(s, t) = (\cos s\,\sin t,\ \sin s\,\sin t,\ \cos t).$$
By (b),
$$\partial c_2(x) = c_2\circ\partial I^2 = c_2(2\pi, x) - c_2(0, x) + c_2(x, 0) - c_2(x, \pi)$$
$$= (\cos 2\pi\,\sin x,\ \sin 2\pi\,\sin x,\ \cos x) - (\cos 0\,\sin x,\ \sin 0\,\sin x,\ \cos x) + (\cos x\,\sin 0,\ \sin x\,\sin 0,\ \cos 0) - (\cos x\,\sin\pi,\ \sin x\,\sin\pi,\ \cos\pi)$$
$$= (\sin x, 0, \cos x) - (\sin x, 0, \cos x) + (0, 0, 1) - (0, 0, -1) = (0, 0, 1) - (0, 0, -1).$$
Hence, the boundary $\partial c_2$ of the singular 2-cube $c_2$ is a degenerate singular 1-chain. We come back to this example.

11.3.2 Integration

Definition 11.12 Let $c_k\colon [0,1]^k \to U \subset \mathbb{R}^n$, $\vec{x} = c_k(t_1, \dots, t_k)$, be a singular $k$-cube and $\omega$ a $k$-form on $U$. Then $(c_k)^*(\omega)$ is a $k$-form on the unit cube $[0,1]^k$; thus there exists a unique function $f(t)$, $t \in [0,1]^k$, such that $(c_k)^*(\omega) = f(t)\,dt_1\wedge\cdots\wedge dt_k$. Then
$$\int_{c_k} \omega := \int_{I^k} (c_k)^*(\omega) := \int_{[0,1]^k} f(t)\,dt_1\cdots dt_k$$
is called the integral of $\omega$ over the singular cube $c_k$; on the right there is the $k$-dimensional Riemann integral. If $s_k = \sum_{i=1}^r n_i c_{k,i}$ is a $k$-chain, set
$$\int_{s_k} \omega = \sum_{i=1}^r n_i \int_{c_{k,i}} \omega.$$
If $k = 0$, a 0-cube is a single point $c_0(0) = x_0$ and a 0-form is a function $\omega \in C^\infty(G)$. We set $\int_{c_0} \omega = (c_0)^*(\omega)\big|_{t=0} = \omega(c_0(0)) = \omega(x_0)$.
We discuss the two special cases $k = 1$ and $k = n$.

Example 11.9 (a) $k = 1$. Let $c\colon [0,1] \to \mathbb{R}^n$ be an oriented, smooth curve, $\Gamma = c([0,1])$. Let $\omega = f_1(x)\,dx_1 + \cdots + f_n(x)\,dx_n$ be a 1-form on $\mathbb{R}^n$; then
$$c^*(\omega) = \bigl(f_1(c(t))\,c_1'(t) + \cdots + f_n(c(t))\,c_n'(t)\bigr)\,dt$$
is a 1-form on $[0,1]$, so that
$$\int_c \omega = \int_{[0,1]} c^*\omega = \int_0^1 \bigl(f_1(c(t))\,c_1'(t) + \cdots + f_n(c(t))\,c_n'(t)\bigr)\,dt = \int_\Gamma \vec{f}\cdot d\vec{x}.$$
Obviously, $\int_c \omega$ is the line integral of $\vec{f}$ over $\Gamma$.
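As a numerical illustration of Example 11.9 (a), the following sketch (Python with NumPy; the particular curve and form are choices of this note) pulls the 1-form $\omega = x\,dy$ back along the unit circle $c(t) = (\cos 2\pi t, \sin 2\pi t)$ and integrates the resulting coefficient with the midpoint rule; the value is the enclosed area $\pi$:

```python
import numpy as np

# c*(ω) = x(c(t)) · y'(t) dt = cos(2πt) · 2π cos(2πt) dt
N = 20000
t = (np.arange(N) + 0.5) / N          # midpoints of a uniform grid on [0, 1]
integrand = np.cos(2*np.pi*t) * 2*np.pi*np.cos(2*np.pi*t)
line_integral = integrand.sum() / N   # midpoint-rule approximation of ∫_0^1
print(line_integral)                  # ≈ 3.14159..., i.e. π
```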
(b) $k = n$. Let $c\colon [0,1]^k \to \mathbb{R}^k$ be continuously differentiable and let $x = c(t)$. Let $\omega = f(x)\,dx_1\wedge\cdots\wedge dx_k$ be a differential $k$-form on $\mathbb{R}^k$. By Proposition 11.4 (e),
$$c^*(\omega) = f(c(t))\,\frac{\partial(c_1, \dots, c_k)}{\partial(t_1, \dots, t_k)}\,dt_1\wedge\cdots\wedge dt_k.$$
Therefore,
$$\int_c \omega = \int_{[0,1]^k} f(c(t))\,\frac{\partial(c_1, \dots, c_k)}{\partial(t_1, \dots, t_k)}\,dt_1\cdots dt_k. \qquad (11.17)$$
Let $c = I^k$ be the standard $k$-cube in $[0,1]^k$. Then
$$\int_{I^k} \omega = \int_{[0,1]^k} f(x)\,dx_1\cdots dx_k$$
is the $k$-dimensional Riemann integral of $f$ over $[0,1]^k$. Let $\tilde{I}^k(x_1, \dots, x_k) = (x_2, x_1, x_3, \dots, x_k)$. Then $I^k([0,1]^k) = \tilde{I}^k([0,1]^k) = [0,1]^k$; however,
$$\int_{\tilde{I}^k} \omega = \int_{[0,1]^k} f(x_2, x_1, x_3, \dots, x_k)\,(-1)\,dx_1\cdots dx_k = -\int_{[0,1]^k} f(x_1, x_2, x_3, \dots, x_k)\,dx_1\cdots dx_k = -\int_{I^k} \omega.$$
We see that $\int_{I^k} \omega$ is an oriented Riemann integral. Note that in the above formula (11.17) we do not have the absolute value of the Jacobian.

11.3.3 Stokes' Theorem

Theorem 11.7 Let $U$ be an open subset of $\mathbb{R}^n$, $k \ge 0$ a non-negative integer, and let $s_{k+1}$ be a singular $(k+1)$-chain on $U$. Let $\omega$ be a differential $k$-form on $U$, $\omega \in \Omega_1^k(U)$. Then we have
$$\int_{\partial s_{k+1}} \omega = \int_{s_{k+1}} d\omega.$$

Proof. (a) Let $s_{k+1} = I^{k+1}$ be the standard $(k+1)$-cube; in particular $n = k+1$. Let $\omega = \sum_{i=1}^{k+1} f_i(x)\,dx_1\wedge\cdots\wedge\widehat{dx_i}\wedge\cdots\wedge dx_{k+1}$. Then
$$d\omega = \sum_{i=1}^{k+1} (-1)^{i+1}\,\frac{\partial f_i}{\partial x_i}(x)\,dx_1\wedge\cdots\wedge dx_{k+1};$$
hence, by Example 11.6 (b), Fubini's theorem, and the fundamental theorem of calculus,
$$\int_{I^{k+1}} d\omega = \sum_{i=1}^{k+1} (-1)^{i+1} \int_{[0,1]^{k+1}} \frac{\partial f_i}{\partial x_i}\,dx_1\cdots dx_{k+1}$$
$$= \sum_{i=1}^{k+1} (-1)^{i+1} \int_{[0,1]^k} \Bigl(\int_0^1 \frac{\partial f_i}{\partial x_i}(x_1, \dots, t, \dots, x_{k+1})\,dt\Bigr)\,dx_1\cdots\widehat{dx_i}\cdots dx_{k+1}$$
$$= \sum_{i=1}^{k+1} (-1)^{i+1} \int_{[0,1]^k} \bigl(f_i(x_1, \dots, 1, \dots, x_{k+1}) - f_i(x_1, \dots, 0, \dots, x_{k+1})\bigr)\,dx_1\cdots\widehat{dx_i}\cdots dx_{k+1}$$
$$= \sum_{i=1}^{k+1} (-1)^{i+1} \Bigl(\int_{[0,1]^k} \bigl(I^{k+1}_{(i,1)}\bigr)^*\omega - \int_{[0,1]^k} \bigl(I^{k+1}_{(i,0)}\bigr)^*\omega\Bigr) = \int_{\partial I^{k+1}} \omega,$$
by the definition of $\partial I^{k+1}$. The assertion is shown in the case of the identity map.
(b) The general case. Let $I^{k+1}$ be the standard $(k+1)$-cube.
Since the pull-back and the exterior differential commute (Proposition 11.4) we have
$$\int_{c_{k+1}} d\omega = \int_{I^{k+1}} (c_{k+1})^*(d\omega) = \int_{I^{k+1}} d\bigl((c_{k+1})^*\omega\bigr) \overset{(a)}{=} \int_{\partial I^{k+1}} (c_{k+1})^*\omega$$
$$= \sum_{i=1}^{k+1} (-1)^i \Bigl(\int_{I^{k+1}_{(i,0)}} (c_{k+1})^*\omega - \int_{I^{k+1}_{(i,1)}} (c_{k+1})^*\omega\Bigr) = \sum_{i=1}^{k+1} (-1)^i \Bigl(\int_{c_{k+1}\circ I^{k+1}_{(i,0)}} \omega - \int_{c_{k+1}\circ I^{k+1}_{(i,1)}} \omega\Bigr) = \int_{\partial c_{k+1}} \omega.$$
(c) Finally, let $s_{k+1} = \sum_i n_i c_{k+1,i}$ with $n_i \in \mathbb{Z}$ and singular $(k+1)$-cubes $c_{k+1,i}$. By definition and by (b),
$$\int_{s_{k+1}} d\omega = \sum_i n_i \int_{c_{k+1,i}} d\omega = \sum_i n_i \int_{\partial c_{k+1,i}} \omega = \int_{\partial s_{k+1}} \omega.$$

Remark 11.4 Stokes' theorem is valid for arbitrary oriented compact differentiable $k$-dimensional manifolds $F$ and continuously differentiable $(k-1)$-forms $\omega$ on $F$.

Example 11.10 We come back to Example 11.8 (c). Let
$$\omega = \frac{x\,dy\wedge dz + y\,dz\wedge dx + z\,dx\wedge dy}{r^3}, \qquad r = \sqrt{x^2 + y^2 + z^2},$$
be a 2-form on $\mathbb{R}^3\setminus\{(0,0,0)\}$. It is easy to show that $\omega$ is closed, $d\omega = 0$. We compute $\int_{c_2} \omega$. First note that $c_2^*(r^3) = 1$, $c_2^*(x) = \cos s\,\sin t$, $c_2^*(y) = \sin s\,\sin t$, $c_2^*(z) = \cos t$, so that
$$c_2^*(dx) = d(\cos s\,\sin t) = -\sin s\,\sin t\,ds + \cos s\,\cos t\,dt,$$
$$c_2^*(dy) = d(\sin s\,\sin t) = \cos s\,\sin t\,ds + \sin s\,\cos t\,dt,$$
$$c_2^*(dz) = -\sin t\,dt,$$
and
$$c_2^*(\omega) = c_2^*(x)\,c_2^*(dy)\wedge c_2^*(dz) + c_2^*(y\,dz\wedge dx) + c_2^*(z\,dx\wedge dy)$$
$$= \bigl(-\cos^2 s\,\sin^3 t - \sin^2 s\,\sin^3 t - \cos t\,(\sin^2 s\,\sin t\,\cos t + \cos^2 s\,\sin t\,\cos t)\bigr)\,ds\wedge dt = -\sin t\,ds\wedge dt,$$
so that
$$\int_{c_2} \omega = \int_{[0,2\pi]\times[0,\pi]} c_2^*(\omega) = \int_0^{2\pi}\!\!\int_0^\pi (-\sin t)\,dt\,ds = -4\pi.$$
Stokes' theorem shows that $\omega$ is not exact on $\mathbb{R}^3\setminus\{(0,0,0)\}$. Suppose to the contrary that $\omega = d\eta$ for some $\eta \in \Omega^1(\mathbb{R}^3\setminus\{(0,0,0)\})$. Since by Example 11.8 (c) $\partial c_2$ is a degenerate 1-chain (it consists of two points), the pull-back $(\partial c_2)^*(\eta)$ is $0$ and so is the integral:
$$0 = \int_{I^1} (\partial c_2)^*(\eta) = \int_{\partial c_2} \eta = \int_{c_2} d\eta = \int_{c_2} \omega = -4\pi,$$
a contradiction; hence, $\omega$ is not exact.
We come back to the two special cases $k = 1$, $n = 3$ and $k = 2$, $n = 3$.

11.3.4 Special Cases

$k = 1$, $n = 3$.
Let $c_2\colon [0,1]^2 \to U \subseteq \mathbb{R}^3$ be a singular 2-cube such that $F = c_2([0,1]^2)$ is a regular smooth surface in $\mathbb{R}^3$. Then $\partial F$ is a closed path consisting of 4 parts with the counter-clockwise orientation. Let $\omega = f_1\,dx_1 + f_2\,dx_2 + f_3\,dx_3$ be a differential 1-form on $U$. By Example 11.9 (a),
$$\int_{\partial c_2} \omega = \int_{\partial F} f_1\,dx_1 + f_2\,dx_2 + f_3\,dx_3.$$
On the other hand, by Example 11.5 (c),
$$d\omega = \operatorname{curl} f\cdot(dx_2\wedge dx_3,\ dx_3\wedge dx_1,\ dx_1\wedge dx_2).$$
In this case Stokes' theorem $\int_{\partial c_2}\omega = \int_{c_2} d\omega$ gives
$$\int_{\partial F} f_1\,dx_1 + f_2\,dx_2 + f_3\,dx_3 = \int_F \operatorname{curl} f\cdot(dx_2\wedge dx_3,\ dx_3\wedge dx_1,\ dx_1\wedge dx_2).$$
If $F$ lies in the $x_1$–$x_2$ plane, we get Green's theorem.

$k = 2$, $n = 3$. Let $c_3$ be a singular 3-cube in $\mathbb{R}^3$ and $G = c_3([0,1]^3)$. Further let
$$\omega = v_1\,dx_2\wedge dx_3 + v_2\,dx_3\wedge dx_1 + v_3\,dx_1\wedge dx_2$$
with a continuously differentiable vector field $v \in C^1(G)$. By Example 11.5 (d),
$$d\omega = \operatorname{div}(v)\,dx_1\wedge dx_2\wedge dx_3.$$
The boundary of $G$ consists of the 6 faces $\partial c_3([0,1]^3)$, oriented with the outer unit normal vector. Stokes' theorem then gives
$$\int_{c_3} d\omega = \int_{\partial c_3} v_1\,dx_2\wedge dx_3 + v_2\,dx_3\wedge dx_1 + v_3\,dx_1\wedge dx_2,$$
that is,
$$\int_G \operatorname{div} v\,dx\,dy\,dz = \int_{\partial G} \vec{v}\cdot d\vec{S}.$$
This is Gauß' divergence theorem.

11.3.5 Applications

The following two applications were not covered in the lecture.

(a) Brouwer's Fixed Point Theorem

Proposition 11.8 (Retraction Theorem) Let $G \subset \mathbb{R}^n$ be a compact, connected, simply connected set with smooth boundary $\partial G$. There exists no vector field $f\colon G \to \mathbb{R}^n$, $f_i \in C^2(G)$, $i = 1, \dots, n$, such that $f(G) \subset \partial G$ and $f(x) = x$ for all $x \in \partial G$.

Proof. Suppose to the contrary that such an $f$ exists; consider $\omega \in \Omega^{n-1}(U)$, $\omega = x_1\,dx_2\wedge dx_3\wedge\cdots\wedge dx_n$. First we show that $f^*(d\omega) = 0$. By definition, for $v_1, \dots, v_n \in \mathbb{R}^n$ we have
$$f^*(d\omega)(p)(v_1, \dots, v_n) = d\omega(f(p))\bigl(Df(p)v_1,\ Df(p)v_2,\ \dots,\ Df(p)v_n\bigr).$$
Since $\dim f(G) = \dim \partial G = n-1$, the $n$ vectors $Df(p)v_1, Df(p)v_2, \dots, Df(p)v_n$ can be thought of as being $n$ vectors in an $(n-1)$-dimensional linear space; hence, they are linearly dependent.
Consequently, any alternating $n$-form evaluated on those vectors is $0$. Thus $f^*(d\omega) = 0$. By Stokes' theorem,
$$\int_{\partial G} f^*(\omega) = \int_G d(f^*\omega) = \int_G f^*(d\omega) = 0.$$
On the other hand, $f = \mathrm{id}$ on $\partial G$, so that $f^*(\omega) = \omega|_{\partial G} = x_1\,dx_2\wedge\cdots\wedge dx_n|_{\partial G}$. Again by Stokes' theorem,
$$0 = \int_{\partial G} f^*(\omega) = \int_G dx_1\wedge\cdots\wedge dx_n = |G|,$$
a contradiction.

Theorem 11.9 (Brouwer's Fixed Point Theorem) Let $g\colon B_1 \to B_1$ be a continuous mapping of the closed unit ball $B_1 \subset \mathbb{R}^n$ into itself. Then $g$ has a fixed point $p$, $g(p) = p$.

Proof. (a) We first prove that the theorem holds for a twice continuously differentiable mapping $g$. Suppose to the contrary that $g$ has no fixed point. For any $p \in B_1$ the line through $p$ and $g(p)$ is then well defined. Let $h(p)$ be the intersection point of this line with the unit sphere $S^{n-1}$ such that $h(p) - p$ is a non-negative multiple of $p - g(p)$. In particular, $h$ is a $C^2$-mapping from $B_1$ into $S^{n-1}$ which is the identity on $S^{n-1}$. By the previous proposition, such a mapping does not exist. Hence, $g$ has a fixed point.
(b) For a merely continuous mapping one uses the Stone–Weierstraß theorem to approximate the continuous mapping by polynomials. In case $n = 1$ Brouwer's theorem is just the intermediate value theorem.

(b) The Fundamental Theorem of Algebra

We give a first proof of the fundamental theorem of algebra, Theorem 5.19: every polynomial $f(z) = z^n + a_1 z^{n-1} + \cdots + a_n$ with complex coefficients $a_i \in \mathbb{C}$ has a root in $\mathbb{C}$.
We use two facts: the winding form $\omega$ on $\mathbb{R}^2\setminus\{(0,0)\}$ is closed but not exact, and $z^n$ and $f(z)$ are "close together" for sufficiently large $|z|$. We view $\mathbb{C}$ as $\mathbb{R}^2$ with $(a, b) = a + bi$. Define the following singular 1-cubes on $\mathbb{R}^2$:
$$c_{R,n}(s) = (R^n\cos(2\pi ns),\ R^n\sin(2\pi ns)) = z^n, \qquad c_{R,f}(s) = f\circ c_{R,1}(s) = f(R\cos(2\pi s), R\sin(2\pi s)) = f(z),$$
where $z = z(s) = R(\cos 2\pi s + i\sin 2\pi s)$, $s \in [0,1]$. Note that $|z| = R$.
Further, let

c(s,t) = (1−t) c_{R,f}(s) + t c_{R,n}(s) = (1−t) f(z) + t z^n,  (s,t) ∈ [0,1]^2,
b(s,t) = f((1−t) R(cos 2πs, sin 2πs)) = f((1−t) z),  (s,t) ∈ [0,1]^2,

be singular 2-cubes in R^2.

Lemma 11.10 If |z| = R is sufficiently large, then

|c(s,t)| ≥ R^n/2,  (s,t) ∈ [0,1]^2.

Proof. Since f(z) − z^n is a polynomial of degree less than n,

(f(z) − z^n)/z^n → 0 as z → ∞;

in particular, |f(z) − z^n| ≤ R^n/2 if R is sufficiently large. Then we have

|c(s,t)| = |(1−t) f(z) + t z^n| = |z^n + (1−t)(f(z) − z^n)| ≥ |z^n| − (1−t)|f(z) − z^n| ≥ R^n − R^n/2 = R^n/2.

The only fact we need is c(s,t) ≠ 0 for sufficiently large R; hence, c maps the unit square into R^2 \ {(0,0)}.

Lemma 11.11 Let ω = ω(x,y) = (−y dx + x dy)/(x^2 + y^2) be the winding form on R^2 \ {(0,0)}. Then we have

(a) ∂c = c_{R,f} − c_{R,n}, ∂b = c_{R,f} − f(0), where f(0) stands for the constant path.
(b) For sufficiently large R, c, c_{R,n}, and c_{R,f} are chains in R^2 \ {(0,0)} and

∫_{c_{R,n}} ω = ∫_{c_{R,f}} ω = 2πn.

Proof. (a) Note that z(0) = z(1) = R. Since ∂I^2(x) = (x,0) − (x,1) + (1,x) − (0,x), we have

∂c(s) = c(s,0) − c(s,1) + c(1,s) − c(0,s) = f(z) − z^n − ((1−s)f(R) + sR^n) + ((1−s)f(R) + sR^n) = f(z) − z^n.

This proves (a). Similarly, we have

∂b(s) = b(s,0) − b(s,1) + b(1,s) − b(0,s) = f(z) − f(0) + f((1−s)R) − f((1−s)R) = f(z) − f(0).

(b) By Lemma 11.10, c is a singular 2-chain in R^2 \ {(0,0)} for sufficiently large R. Hence ∂c is a 1-chain in R^2 \ {(0,0)}. In particular, both c_{R,n} and c_{R,f} take values in R^2 \ {(0,0)}; hence (∂c)^*(ω) is well-defined. We compute c_{R,n}^*(ω) using the pull-backs of dx and dy:

c_{R,n}^*(x^2 + y^2) = R^{2n},
c_{R,n}^*(dx) = −2πnR^n sin(2πns) ds,
c_{R,n}^*(dy) = 2πnR^n cos(2πns) ds,
c_{R,n}^*(ω) = 2πn ds.

Hence

∫_{c_{R,n}} ω = ∫_0^1 2πn ds = 2πn.

By Stokes' theorem and since ω is closed,

∫_{∂c} ω = ∫_c dω = 0,

such that by (a) and the above calculation

0 = ∫_{∂c} ω = ∫_{c_{R,f}} ω − ∫_{c_{R,n}} ω,

hence

∫_{c_{R,f}} ω = ∫_{c_{R,n}} ω = 2πn.
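The value 2πn of the winding integral can be checked numerically. Below is a small sketch (not part of the original notes; the helper name `winding_integral` is ours) that integrates the winding form along c_{R,n} with a midpoint rule:

```python
import math

def winding_integral(path, n_steps=20000):
    """Integrate the winding form omega = (-y dx + x dy)/(x^2 + y^2)
    along a closed path  path : [0,1] -> R^2 \\ {(0,0)}  by a midpoint rule."""
    total = 0.0
    h = 1.0 / n_steps
    for k in range(n_steps):
        x0, y0 = path(k * h)          # start of the small segment
        x1, y1 = path((k + 1) * h)    # end of the small segment
        xm, ym = path((k + 0.5) * h)  # midpoint, where omega is evaluated
        dx, dy = x1 - x0, y1 - y0
        total += (-ym * dx + xm * dy) / (xm * xm + ym * ym)
    return total

# c_{R,n}(s) = (R^n cos(2 pi n s), R^n sin(2 pi n s)) with R = 2, n = 3
R, n = 2.0, 3
c = lambda s: (R ** n * math.cos(2 * math.pi * n * s),
               R ** n * math.sin(2 * math.pi * n * s))
print(winding_integral(c) / (2 * math.pi))  # close to n = 3
```

The same routine applied to c_{R,f} for a concrete polynomial f, with R large, illustrates Lemma 11.11 (b).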
We complete the proof of the fundamental theorem of algebra. Suppose to the contrary that the polynomial f(z) has no zero in C; then b as well as ∂b are singular chains in R^2 \ {(0,0)}. By Lemma 11.11 (b) and again by Stokes' theorem we have

∫_{c_{R,f}} ω = ∫_{c_{R,f} − f(0)} ω = ∫_{∂b} ω = ∫_b dω = 0.

But this is a contradiction to Lemma 11.11 (b). Hence, b is not a 2-chain in R^2 \ {(0,0)}, that is, there exist s, t ∈ [0,1] such that b(s,t) = f((1−t)z) = 0. We have found that (1−t)R(cos(2πs) + i sin(2πs)) is a zero of f.

Actually, we have shown a little more: there is a zero of f in the disc {z ∈ C | |z| ≤ R}, where R ≥ max{1, 2∑_i |a_i|}. Indeed, in this case

|f(z) − z^n| ≤ ∑_{k=1}^n |a_k| |z|^{n−k} ≤ R^{n−1} ∑_{k=1}^n |a_k| ≤ R^n/2,

and this condition ensures |c(s,t)| ≠ 0 as in the proof of Lemma 11.10.

Chapter 12 Measure Theory and Integration

12.1 Measure Theory

Citation from Rudin's book, [Rud66, Chapter 1]:

Towards the end of the 19th century it became clear to many mathematicians that the Riemann integral should be replaced by some other type of integral, more general and more flexible, better suited for dealing with limit processes. Among the attempts made in this direction, the most notable ones were due to Jordan, Borel, W. H. Young, and Lebesgue. It was Lebesgue's construction which turned out to be the most successful.

In a brief outline, here is the main idea: The Riemann integral of a function f over an interval [a,b] can be approximated by sums of the form

∑_{i=1}^n f(t_i) m(E_i),

where E_1, …
, E_n are disjoint intervals whose union is [a,b], m(E_i) denotes the length of E_i, and t_i ∈ E_i for i = 1,…,n. Lebesgue discovered that a completely satisfactory theory of integration results if the sets E_i in the above sum are allowed to belong to a larger class of subsets of the line, the so-called "measurable sets," and if the class of functions under consideration is enlarged to what we call "measurable functions." The crucial set-theoretic properties involved are the following: the union and the intersection of any countable family of measurable sets are measurable; … the notion of "length" (now called "measure") can be extended to them in such a way that

m(E_1 ∪ E_2 ∪ ⋯) = m(E_1) + m(E_2) + ⋯

for any countable collection {E_i} of pairwise disjoint measurable sets. This property of m is called countable additivity.

The passage from Riemann's theory of integration to that of Lebesgue is a process of completion. It is of the same fundamental importance in analysis as the construction of the real number system from the rationals.

12.1.1 Algebras, σ-algebras, and Borel Sets

(a) The Measure Problem. Definitions

Lebesgue (1904) states the following problem: We want to associate to each bounded subset E of the real line a positive real number m(E), called the measure of E, such that the following properties are satisfied:

(1) Any two congruent sets (by shift or reflection) have the same measure.
(2) The measure is countably additive.
(3) The measure of the unit interval [0,1] is 1.

He emphasized that he was not able to solve this problem in full generality, but only for a certain class of sets which he called measurable. We will see that this restriction to a large family of bounded sets is unavoidable: the measure problem has no solution.

Definition 12.1 Let X be a set. A non-empty family A of subsets of X is called an algebra if A, B ∈ A implies A^c ∈ A and A ∪ B ∈ A.
An algebra A is called a σ-algebra if for every countable family {A_n | n ∈ N} with A_n ∈ A we have

⋃_{n∈N} A_n = A_1 ∪ A_2 ∪ ⋯ ∪ A_n ∪ ⋯ ∈ A.

Remarks 12.1 (a) Since A ∈ A implies A ∪ A^c ∈ A, we have X ∈ A and ∅ = X^c ∈ A.

(b) If A is an algebra, then A ∩ B ∈ A for all A, B ∈ A. Indeed, by de Morgan's rules (⋃_α A_α)^c = ⋂_α A_α^c and (⋂_α A_α)^c = ⋃_α A_α^c, we have A ∩ B = (A^c ∪ B^c)^c, and all the members on the right are in A by the definition of an algebra.

(c) Let A be a σ-algebra. Then ⋂_{n∈N} A_n ∈ A if A_n ∈ A for all n ∈ N. Again by de Morgan's rule,

⋂_{n∈N} A_n = (⋃_{n∈N} A_n^c)^c.

(d) The family P(X) of all subsets of X is both an algebra and a σ-algebra.

(e) Any σ-algebra is an algebra, but there are algebras which are not σ-algebras.

(f) The family of finite and cofinite subsets (the latter are complements of finite sets) of an infinite set forms an algebra. Do they form a σ-algebra?

(b) Elementary Sets and Borel Sets in R^n

Let R̄ be the real axis extended by ±∞, R̄ = R ∪ {+∞} ∪ {−∞}. We use the old rules as introduced in Section 3.1.1 at page 79. The new rule, used in measure theory only, is 0·(±∞) = (±∞)·0 = 0. The set

I = {(x_1,…,x_n) ∈ R^n | a_i ⋆ x_i ⋆ b_i, i = 1,…,n}

is called a rectangle or a box in R^n, where each relation sign ⋆ stands either for < or for ≤, and a_i, b_i ∈ R̄. For example, a_i = −∞ and b_i = +∞ yields I = R^n, whereas a_1 = 2, b_1 = 1 yields I = ∅. A subset of R^n is called an elementary set if it is the union of a finite number of rectangles in R^n. Let E_n denote the set of elementary subsets of R^n:

E_n = {I_1 ∪ I_2 ∪ ⋯ ∪ I_r | r ∈ N, I_j is a box in R^n}.

Lemma 12.1 E_n is an algebra but not a σ-algebra.

Proof. The complement of a finite interval is the union of two intervals; the complement of an infinite interval is an infinite interval. Hence, the complement of a rectangle in R^n is a finite union of rectangles. The countable (disjoint) union M = ⋃_{n∈N} [n, n + 1/2] is not an elementary set.
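The closure of E_1 under complements, used in the proof of Lemma 12.1, can be made concrete in dimension one. The following sketch is our own illustration (the helper name `complement` is ours); it represents a 1-dimensional elementary set as a sorted union of intervals given by their endpoints (endpoint types, open versus closed, are ignored here) and returns its complement, again as a finite union of intervals:

```python
def complement(intervals):
    """Complement in R of a finite union of intervals (a, b),
    given by their endpoints; +/-inf endpoints are allowed.
    The result is again a finite union of intervals, illustrating
    that elementary sets form an algebra."""
    INF = float("inf")
    result, prev = [], -INF
    for a, b in sorted(intervals):
        if prev < a:               # gap before the next interval
            result.append((prev, a))
        prev = max(prev, b)        # merge overlapping intervals
    if prev < INF:                 # unbounded tail on the right
        result.append((prev, INF))
    return result

print(complement([(0, 1), (2, 3)]))
# three intervals: one unbounded to the left, one between, one to the right
```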
Note that any elementary set is the disjoint union of a finite number of rectangles.

Let B be any non-empty family of subsets of X. Let σ(B) denote the intersection of all σ-algebras containing B, i.e. σ(B) = ⋂_{i∈I} A_i, where {A_i | i ∈ I} is the family of all σ-algebras A_i which contain B, B ⊆ A_i for all i ∈ I. Note that the σ-algebra P(X) of all subsets is always a member of that family {A_i}, so that σ(B) exists. We call σ(B) the σ-algebra generated by B. By definition, σ(B) is the smallest σ-algebra which contains the sets of B. It is indeed a σ-algebra. Moreover, σ(σ(B)) = σ(B), and B_1 ⊆ B_2 implies σ(B_1) ⊆ σ(B_2).

Definition 12.2 The Borel algebra B_n in R^n is the σ-algebra generated by the elementary sets E_n. Its elements are called Borel sets. The Borel algebra B_n = σ(E_n) is the smallest σ-algebra which contains all boxes in R^n.

We will see that the Borel algebra is a huge family of subsets of R^n which contains "all sets we are ever interested in." Later, we will construct a non-Borel set.

Proposition 12.2 Open and closed subsets of R^n are Borel sets.

Proof. We give the proof in the case of R^2. Let I_ε(x_0,y_0) = (x_0−ε, x_0+ε) × (y_0−ε, y_0+ε) denote the open square of size 2ε by 2ε with midpoint (x_0,y_0). Then I_{1/(n+1)}(x_0,y_0) ⊆ I_{1/n}(x_0,y_0) for n ∈ N. Let M ⊆ R^2 be open. To every point (x_0,y_0) ∈ M with rational coordinates x_0, y_0 we choose the largest square I_{1/n}(x_0,y_0) contained in M and denote it by J(x_0,y_0). We show that

M = ⋃_{(x_0,y_0)∈M, x_0,y_0∈Q} J(x_0,y_0).

Since the number of rational points in M is at most countable, the right-hand side is in σ(E_2). Now let (a,b) ∈ M be arbitrary. Since M is open, there exists n ∈ N such that I_{2/n}(a,b) ⊆ M. Since the rational points are dense in R^2, there is a rational point (x_0,y_0) which is contained in I_{1/n}(a,b). Then we have I_{1/n}(x_0,y_0) ⊆ I_{2/n}(a,b) ⊆ M. Since (a,b) ∈ I_{1/n}(x_0,y_0) ⊆ J(x_0,y_0), we have shown that M is the union of the countable family of squares J(x_0,y_0). Since closed sets are the complements of open sets and complements are again in the σ-algebra, the assertion follows for closed sets.

Remarks 12.2 (a) We have proved that any open subset M of R^n is the countable union of rectangles I ⊆ M.

(b) The Borel algebra B_n is also the σ-algebra generated by the family of open or of closed sets in R^n, B_n = σ(G_n) = σ(F_n). Countable unions and intersections of open or closed sets are in B_n.

Let us look in more detail at some of the sets in σ(E_n). Let G and F be the families of all open and all closed subsets of R^n, respectively. Let G_δ be the collection of all intersections of sequences of open sets (from G), and let F_σ be the collection of all unions of sequences of sets of F. One can prove that F ⊂ G_δ and G ⊂ F_σ; these inclusions are strict. Since countable intersections and unions of countable intersections and unions are still countable operations,

G_δ, F_σ ⊂ σ(E_n).

For an arbitrary family S of sets, let S_σ be the collection of all unions of sequences of sets in S, and let S_δ be the collection of all intersections of sequences of sets in S. We can iterate the operations represented by σ and δ, obtaining from the class G the classes G_δ, G_δσ, G_δσδ, …, and from F the classes F_σ, F_σδ, …. It turns out that

G ⊂ G_δ ⊂ G_δσ ⊂ ⋯ ⊂ σ(E_n),  F ⊂ F_σ ⊂ F_σδ ⊂ ⋯ ⊂ σ(E_n).

No two of these classes are equal. There are Borel sets that belong to none of them.

12.1.2 Additive Functions and Measures

Definition 12.3 (a) Let A be an algebra over X. An additive function or content µ on A is a function µ : A → [0,+∞] such that

(i) µ(∅) = 0,
(ii) µ(A ∪ B) = µ(A) + µ(B) for all A, B ∈ A with A ∩ B = ∅.
(b) An additive function µ is called countably additive (or σ-additive in the German literature) on A if for any disjoint family {A_n | A_n ∈ A, n ∈ N}, that is, A_i ∩ A_j = ∅ for all i ≠ j, with ⋃_{n∈N} A_n ∈ A we have

µ(⋃_{n∈N} A_n) = ∑_{n∈N} µ(A_n).

(c) A measure is a countably additive function on a σ-algebra A.

If X is a set, A a σ-algebra on X, and µ a measure on A, then the triple (X, A, µ) is called a measure space. Likewise, if X is a set and A a σ-algebra on X, the pair (X, A) is called a measurable space.

Notation. We write ∑_{n∈N} A_n in place of ⋃_{n∈N} A_n if {A_n} is a disjoint family of subsets. Countable additivity then reads as follows:

µ(A_1 ∪ A_2 ∪ ⋯) = µ(A_1) + µ(A_2) + ⋯,  i.e.  µ(∑_{n=1}^∞ A_n) = ∑_{n=1}^∞ µ(A_n).

We say µ is finite if µ(X) < ∞. If µ(X) = 1, we call (X, A, µ) a probability space. We call µ σ-finite if there exist sets A_n ∈ A with µ(A_n) < ∞ and X = ⋃_{n=1}^∞ A_n.

Example 12.1 (a) Let X be a set, x_0 ∈ X, and A = P(X). Then

µ(A) = 1 if x_0 ∈ A,  µ(A) = 0 if x_0 ∉ A,

defines a finite measure on A. µ is called the point mass concentrated at x_0.

(b1) Let X be a set and A = P(X). Put µ(A) = n if A has n elements, and µ(A) = +∞ if A has infinitely many elements. Then µ is a measure on A, the so-called counting measure.

(b2) Let X be a set and A = P(X). Put µ(A) = 0 if A has finitely many or countably many elements, and µ(A) = +∞ if A has uncountably many elements. Then µ is countably additive, but not σ-finite (if X is uncountable).

(b3) Let X be a set and A = P(X). Put µ(A) = 0 if A is finite, and µ(A) = +∞ if A is infinite. Then µ is additive, but not σ-additive (if X is infinite).

(c) Let X = R^n and A = E_n, the algebra of elementary sets of R^n. Every A ∈ E_n is a finite disjoint union of rectangles, A = ∑_{k=1}^m I_k. We set µ(A) = ∑_{k=1}^m µ(I_k), where

µ(I) = (b_1 − a_1)⋯(b_n − a_n)  if  I = {(x_1,…,x_n) ∈ R^n | a_i ⋆ x_i ⋆ b_i, i = 1,…,n} with a_i ≤ b_i,

and µ(∅) = 0. Then µ is an additive function on E_n. It is called the Lebesgue content on R^n.
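The Lebesgue content of Example 12.1 (c) is straightforward to compute. A minimal sketch (the helper names `box_content` and `content` are ours; the boxes passed to `content` are assumed pairwise disjoint, as in the definition):

```python
from functools import reduce

def box_content(box):
    """Lebesgue content of a box given as a list of edges (a_i, b_i):
    the product of the edge lengths (0 if some a_i >= b_i)."""
    return reduce(lambda v, e: v * max(e[1] - e[0], 0.0), box, 1.0)

def content(boxes):
    """Content of a finite union of pairwise disjoint boxes:
    additivity over the pieces."""
    return sum(box_content(b) for b in boxes)

# the unit square split into two disjoint boxes: the content adds up
A = [[(0, 1), (0, 0.5)], [(0, 1), (0.5, 1)]]
print(content(A))  # 1.0, the content of [0,1]^2
```

Note that `box_content([(2, 1)])` returns 0, matching the convention that a_1 = 2, b_1 = 1 yields the empty box.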
Note that µ is not yet a measure (E_n is not a σ-algebra, and µ has not yet been shown to be countably additive). However, we will see in Proposition 12.5 below that µ is indeed countably additive. By definition, µ(line in R^2) = 0 and µ(plane in R^3) = 0.

(d) Let X = R, A = E_1, and α an increasing function on R. For a, b ∈ R with a < b set

µ_α([a,b]) = α(b+0) − α(a−0),
µ_α([a,b)) = α(b−0) − α(a−0),
µ_α((a,b]) = α(b+0) − α(a+0),
µ_α((a,b)) = α(b−0) − α(a+0).

Then µ_α is an additive function on E_1 if we set µ_α(A) = ∑_{i=1}^n µ_α(I_i) for A = ∑_{i=1}^n I_i. We call µ_α the Lebesgue–Stieltjes content.

On the other hand, if µ : E_1 → R is an additive function, then α_µ : R → R defined by

α_µ(x) = µ((0,x]) for x ≥ 0,  α_µ(x) = −µ((x,0]) for x < 0,

is an increasing, right-continuous function on R such that µ = µ_{α_µ}. In general α ≠ α_{µ_α}, since the function on the right-hand side is continuous from the right, whereas α, in general, is not.

Properties of Additive Functions

Proposition 12.3 Let A be an algebra over X and µ an additive function on A. Then:

(a) µ(∑_{k=1}^n A_k) = ∑_{k=1}^n µ(A_k) if A_k ∈ A, k = 1,…,n, form a disjoint family of n subsets.
(b) µ(A ∪ B) + µ(A ∩ B) = µ(A) + µ(B) for A, B ∈ A.
(c) A ⊆ B implies µ(A) ≤ µ(B) (µ is monotone).
(d) If A ⊆ B, A, B ∈ A, and µ(A) < +∞, then µ(B \ A) = µ(B) − µ(A) (µ is subtractive).
(e) µ(⋃_{k=1}^n A_k) ≤ ∑_{k=1}^n µ(A_k) if A_k ∈ A, k = 1,…,n (µ is finitely subadditive).
(f) If {A_k | k ∈ N} is a disjoint family in A and ∑_{k=1}^∞ A_k ∈ A, then

µ(∑_{k=1}^∞ A_k) ≥ ∑_{k=1}^∞ µ(A_k).

Proof. (a) is by induction. (d), (c), and (b) are easy (cf. Homework 34.4).

(e) We can write ⋃_{k=1}^n A_k as the finite disjoint union of n sets of A:

⋃_{k=1}^n A_k = A_1 ∪ (A_2 \ A_1) ∪ (A_3 \ (A_1 ∪ A_2)) ∪ ⋯ ∪ (A_n \ (A_1 ∪ ⋯ ∪ A_{n−1})).

Since µ is additive,

µ(⋃_{k=1}^n A_k) = ∑_{k=1}^n µ(A_k \ (A_1 ∪ ⋯ ∪ A_{k−1})) ≤ ∑_{k=1}^n µ(A_k),

where we used µ(A_k \ B) ≤ µ(A_k), which follows from monotonicity (c).
(f) Since µ is additive and monotone,

∑_{k=1}^n µ(A_k) = µ(∑_{k=1}^n A_k) ≤ µ(∑_{k=1}^∞ A_k).

Taking the supremum over n on the left gives the assertion.

Proposition 12.4 Let µ be an additive function on the algebra A. Consider the following statements:

(a) µ is countably additive.
(b) For any increasing sequence A_n ⊆ A_{n+1}, A_n ∈ A, with ⋃_{n=1}^∞ A_n = A ∈ A, we have lim_{n→∞} µ(A_n) = µ(A).
(c) For any decreasing sequence A_n ⊇ A_{n+1}, A_n ∈ A, with ⋂_{n=1}^∞ A_n = A ∈ A and µ(A_n) < ∞, we have lim_{n→∞} µ(A_n) = µ(A).
(d) Statement (c) with A = ∅ only.

We have (a) ↔ (b) → (c) → (d). In case µ(X) < ∞ (µ is finite), all statements are equivalent.

Proof. (a) → (b). Without loss of generality A_1 = ∅. Put B_n = A_n \ A_{n−1} for n = 2, 3, …. Then {B_n} is a disjoint family with A_n = B_2 ∪ B_3 ∪ ⋯ ∪ B_n and A = ⋃_{n=2}^∞ B_n. Hence, by countable additivity of µ,

µ(A) = ∑_{k=2}^∞ µ(B_k) = lim_{n→∞} ∑_{k=2}^n µ(B_k) = lim_{n→∞} µ(⋃_{k=2}^n B_k) = lim_{n→∞} µ(A_n).

(b) → (a). Let {A_n} be a family of disjoint sets in A with ⋃_{n∈N} A_n = A ∈ A; put B_k = A_1 ∪ ⋯ ∪ A_k. Then (B_k) is a sequence increasing to A. By (b) and since µ is additive,

µ(B_n) = µ(∑_{k=1}^n A_k) = ∑_{k=1}^n µ(A_k) → µ(A)  as n → ∞.

Thus ∑_{k=1}^∞ µ(A_k) = µ(∑_{k=1}^∞ A_k).

(b) → (c). Since (A_n) decreases to A, (A_1 \ A_n) increases to A_1 \ A. By (b),

µ(A_1 \ A_n) → µ(A_1 \ A)  as n → ∞,

hence µ(A_1) − µ(A_n) → µ(A_1) − µ(A), which implies the assertion.

(c) → (d) is trivial.

Now let µ be finite; in particular, µ(B) < ∞ for all B ∈ A. We show (d) → (b). Let (A_n) be a sequence in A increasing to A ∈ A. Then (A \ A_n) is a sequence decreasing to ∅. By (d), µ(A \ A_n) → 0. Since µ is subtractive (Proposition 12.3 (d)) and all values are finite, µ(A_n) → µ(A).

Proposition 12.5 Let α be a right-continuous increasing function α : R → R, and µ_α the corresponding Lebesgue–Stieltjes content on E_1. Then µ_α is countably additive.

Proof. For simplicity we write µ for µ_α.
Recall that µ((a,b]) = α(b) − α(a), by right-continuity of α. We perform the proof in the case

(a,b] = ⋃_{k=1}^∞ (a_k, b_k]

with a disjoint family (a_k, b_k], k ∈ N, of intervals. By Proposition 12.3 (f) we already know

µ((a,b]) ≥ ∑_{k=1}^∞ µ((a_k, b_k]).  (12.1)

We prove the opposite direction. Let ε > 0. Since α is continuous from the right at a, there exists a_0 ∈ (a,b) such that α(a_0) − α(a) < ε and, similarly, for every k ∈ N there exists c_k > b_k such that α(c_k) − α(b_k) < ε/2^k. Hence

[a_0, b] ⊂ ⋃_{k=1}^∞ (a_k, b_k] ⊂ ⋃_{k=1}^∞ (a_k, c_k)

is an open covering of a compact set. By Heine–Borel (Definition 6.14) there exists a finite subcover

[a_0, b] ⊂ ⋃_{k=1}^N (a_k, c_k),  hence  (a_0, b] ⊂ ⋃_{k=1}^N (a_k, c_k],

such that by Proposition 12.3 (e)

µ((a_0, b]) ≤ ∑_{k=1}^N µ((a_k, c_k]).

By the choice of a_0 and c_k,

µ((a_k, c_k]) = µ((a_k, b_k]) + α(c_k) − α(b_k) ≤ µ((a_k, b_k]) + ε/2^k.

Similarly, µ((a,b]) = µ((a,a_0]) + µ((a_0,b]) ≤ µ((a_0,b]) + ε, such that

µ((a,b]) ≤ µ((a_0,b]) + ε ≤ ∑_{k=1}^N (µ((a_k,b_k]) + ε/2^k) + ε ≤ ∑_{k=1}^N µ((a_k,b_k]) + 2ε ≤ ∑_{k=1}^∞ µ((a_k,b_k]) + 2ε.

Since ε was arbitrary, µ((a,b]) ≤ ∑_{k=1}^∞ µ((a_k,b_k]). In view of (12.1), µ_α is countably additive.

Corollary 12.6 The correspondence µ ↦ α_µ from Example 12.1 (d) defines a bijection between the countably additive functions µ on E_1 and the monotonically increasing, right-continuous functions α on R (up to constant functions, i.e. α and α + c define the same additive function).

Historical Note. It was the great achievement of Émile Borel (1871–1956) that he really proved the countable additivity of the Lebesgue measure. He realized that the countable additivity of µ is a serious mathematical problem, far from being evident.

12.1.3 Extension of Countably Additive Functions

Here we must stop the rigorous treatment of measure theory. Up to now, we know only a few trivial examples of measures (Example 12.1 (a) and (b)). We give an outline of the steps toward the construction of the Lebesgue measure.
• Construction of an outer measure µ* on P(X) from a countably additive function µ on an algebra A.
• Construction of the σ-algebra A_{µ*} of measurable sets.

The extension theory is due to Carathéodory (1914). For a detailed treatment, see [Els02, Section II.4].

Theorem 12.7 (Extension and Uniqueness) Let µ be a countably additive function on the algebra A.

(a) There exists an extension of µ to a measure on the σ-algebra σ(A) which coincides with µ on A. We denote the measure on σ(A) also by µ. It is defined as the restriction of the outer measure µ* : P(X) → [0,∞],

µ*(A) = inf { ∑_{n=1}^∞ µ(A_n) | A ⊂ ⋃_{n=1}^∞ A_n, A_n ∈ A, n ∈ N },

to the µ-measurable sets A_{µ*}.

(b) This extension is unique if (X, A, µ) is σ-finite.

(For a proof, see [Brö92, (2.6), p. 68].)

Remark 12.3 (a) A subset A ⊂ X is said to be µ-measurable if for all Y ⊂ X,

µ*(Y) = µ*(A ∩ Y) + µ*(A^c ∩ Y).

The family of µ-measurable sets forms a σ-algebra A_{µ*}.

(b) We have A ⊂ σ(A) ⊂ A_{µ*} and µ*(A) = µ(A) for all A ∈ A.

(c) (X, A_{µ*}, µ*) is a measure space; in particular, µ* is countably additive on the measurable sets A_{µ*}, and we denote it again by µ.

12.1.4 The Lebesgue Measure on R^n

Using the facts from the previous subsection, we conclude that for any increasing, right-continuous function α on R there exists a measure µ_α on the σ-algebra of Borel sets. We call this measure the Lebesgue–Stieltjes measure on R. In case α(x) = x we call it the Lebesgue measure. Extending the Lebesgue content on the elementary sets of R^n to the Borel algebra B_n, we obtain the n-dimensional Lebesgue measure λ_n on R^n.

Completeness

A measure µ : A → R̄_+ on a σ-algebra A is said to be complete if A ∈ A, µ(A) = 0, and B ⊂ A imply B ∈ A. It turns out that the Lebesgue measure λ_n on the Borel sets of R^n is not complete. Adjoining to B_n the subsets of measure-zero sets, we obtain the σ-algebra A_{λ_n} of Lebesgue measurable sets:
A_{λ_n} = σ(B_n ∪ {X ⊆ R^n | ∃ B ∈ B_n : X ⊂ B, λ_n(B) = 0}).

The Lebesgue measure λ_n on A_{λ_n} is now complete.

Remarks 12.4 (a) The Lebesgue measure is invariant under the motion group of R^n. More precisely, let O(n) = {T ∈ R^{n×n} | T^⊤T = TT^⊤ = E_n} be the group of real orthogonal n×n matrices ("motions"); then

λ_n(T(A)) = λ_n(A),  A ∈ A_{λ_n},  T ∈ O(n).

(b) λ_n is translation invariant, i.e. λ_n(A) = λ_n(x + A) for all x ∈ R^n. Moreover, the invariance of λ_n under translations uniquely characterizes the Lebesgue measure: if λ is a translation invariant measure on B_n, then λ = cλ_n for some c ∈ R_+.

(c) There exist non-measurable subsets of R^n. We construct a subset E of R that is not Lebesgue measurable. We write x ∼ y if x − y is rational. This is an equivalence relation, since x ∼ x for all x ∈ R; x ∼ y implies y ∼ x for all x and y; and x ∼ y together with y ∼ z implies x ∼ z. Let E be a subset of (0,1) that contains exactly one point of every equivalence class (the assertion that there is such a set E is a direct application of the axiom of choice). We claim that E is not measurable.

Let E + r = {x + r | x ∈ E}. We need the following two properties of E:

(a) If x ∈ (0,1), then x ∈ E + r for some rational r ∈ (−1,1).
(b) If r and s are distinct rationals, then (E + r) ∩ (E + s) = ∅.

To prove (a), note that for every x ∈ (0,1) there exists y ∈ E with x ∼ y. If r = x − y, then x = y + r ∈ E + r. To prove (b), suppose that x ∈ (E + r) ∩ (E + s). Then x = y + r = z + s for some y, z ∈ E. Since y − z = s − r ≠ 0, we have y ∼ z, and E contains two equivalent points, in contradiction to our choice of E.

Now assume that E is Lebesgue measurable with λ(E) = α. Define S = ⋃(E + r), where the union is over all rational r ∈ (−1,1). By (b), the sets E + r are pairwise disjoint; since λ is translation invariant, λ(E + r) = λ(E) = α for all r. Since S ⊂ (−1,2), λ(S) ≤ 3. The countable additivity of λ now forces α = 0 and hence λ(S) = 0.
But (a) implies (0,1) ⊂ S, hence 1 ≤ λ(S), and we have a contradiction.

(d) Any countable set has Lebesgue measure zero. Indeed, every single point is a box with edges of length 0; hence λ({pt}) = 0. Since λ is countably additive,

λ({x_1, x_2, …, x_n, …}) = ∑_{n=1}^∞ λ({x_n}) = 0.

In particular, the rational numbers have Lebesgue measure 0, λ(Q) = 0.

(e) There are uncountable sets with measure zero. The Cantor set (Cantor: 1845–1918, inventor of set theory) is a prominent example:

C = { ∑_{i=1}^∞ a_i/3^i | a_i ∈ {0,2} ∀ i ∈ N }.

Obviously, C ⊂ [0,1]; C is compact and can be written as the intersection of a decreasing sequence (C_n) of closed subsets: C_1 = [0, 1/3] ∪ [2/3, 1], λ(C_1) = 2/3, and, recursively,

C_{n+1} = (1/3) C_n ∪ ((1/3) C_n + 2/3)  ⟹  λ(C_{n+1}) = (1/3) λ(C_n) + (1/3) λ(C_n) = (2/3) λ(C_n).

It turns out that C_n = { ∑_{i=1}^∞ a_i/3^i | a_i ∈ {0,2} for i = 1,…,n, a_i ∈ {0,1,2} otherwise }. Clearly,

λ(C_{n+1}) = (2/3) λ(C_n) = ⋯ = (2/3)^n λ(C_1) = (2/3)^{n+1}.

By Proposition 12.4 (c), λ(C) = lim_{n→∞} λ(C_n) = 0. However, C has the same cardinality as {0,2}^N ≅ {0,1}^N ≅ R, which is uncountable.

12.2 Measurable Functions

Let A be a σ-algebra over X.

Definition 12.4 A real function f : X → R is called A-measurable if for all a ∈ R the set {x ∈ X | f(x) > a} belongs to A. A complex function f : X → C is said to be A-measurable if both Re f and Im f are A-measurable.

A function f : U → R, U ⊂ R^n, is said to be a Borel function if f is B_n-measurable, i.e. f is measurable with respect to the Borel algebra on R^n. A function f : U → V, U ⊂ R^n, V ⊂ R^m, is called a Borel function if f^{−1}(B) is a Borel set for all Borel sets B ⊂ V. It is part of Homework 39.3 (b) to prove that in case m = 1 these definitions coincide. Also, f = (f_1,…,f_m) is a Borel function if all f_i are. Note that {x ∈ X | f(x) > a} = f^{−1}((a,+∞)). From Proposition 12.8 below it becomes clear that the last two notions are consistent.
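Returning to the Cantor set of Remark 12.4 (e): the recursion λ(C_{n+1}) = (2/3) λ(C_n) can be verified by explicitly tracking the closed intervals that make up C_n. A small sketch (our own illustration, not from the notes, using exact rational arithmetic):

```python
from fractions import Fraction

def cantor_step(intervals):
    """Remove the open middle third of every interval (a, b)."""
    out = []
    for a, b in intervals:
        third = (b - a) / 3
        out += [(a, a + third), (b - third, b)]
    return out

intervals = [(Fraction(0), Fraction(1))]
for n in range(1, 6):
    intervals = cantor_step(intervals)
    length = sum(b - a for a, b in intervals)
    assert length == Fraction(2, 3) ** n   # lambda(C_n) = (2/3)^n exactly

print(len(intervals), float(sum(b - a for a, b in intervals)))
# 32 intervals of total length (2/3)^5 ≈ 0.1317
```

Since (2/3)^n → 0, the total length of the covering intervals tends to 0, in line with λ(C) = 0.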
Note that no measure on (X, A) needs to be specified in order to define a measurable function.

Example 12.2 (a) Any continuous function f : U → R, U ⊂ R^n, is a Borel function. Indeed, since f is continuous and (a,+∞) is open, f^{−1}((a,+∞)) is open as well and hence a Borel set (cf. Proposition 12.2).

(b) The characteristic function χ_A is A-measurable if and only if A ∈ A (see Homework 35.3).

(c) Let f : U → V and g : V → W be Borel functions. Then g∘f : U → W is a Borel function, too. Indeed, for any Borel set C ⊂ W, g^{−1}(C) is a Borel set in V since g is a Borel function. Since f is a Borel function, (g∘f)^{−1}(C) = f^{−1}(g^{−1}(C)) is a Borel subset of U, which shows the assertion.

Proposition 12.8 Let f : X → R̄ be a function. The following are equivalent:

(a) {x | f(x) > a} ∈ A for all a ∈ R (i.e. f is A-measurable).
(b) {x | f(x) ≥ a} ∈ A for all a ∈ R.
(c) {x | f(x) < a} ∈ A for all a ∈ R.
(d) {x | f(x) ≤ a} ∈ A for all a ∈ R.
(e) f^{−1}(B) ∈ A for all Borel sets B ∈ B_1.

Proof. (a) → (b) follows from the identity

[a,+∞] = ⋂_{n∈N} (a − 1/n, +∞]

and the invariance of intersections under preimages, f^{−1}(A ∩ B) = f^{−1}(A) ∩ f^{−1}(B). Since f is A-measurable and A is a σ-algebra, the countable intersection on the right is in A.

(a) → (d) follows from {x | f(x) ≤ a} = {x | f(x) > a}^c. The remaining directions are left to the reader (see also Homework 35.5).

Remark 12.5 (a) Let f, g : X → R be A-measurable. Then {x | f(x) > g(x)} and {x | f(x) = g(x)} are in A.

Proof. Since

{x | f(x) < g(x)} = ⋃_{q∈Q} ({x | f(x) < q} ∩ {x | q < g(x)}),

all the sets {f < q} and {q < g} on the right are in A, and the union on the right is countable; hence the right-hand side is in A. A similar argument works for {f > g}. Note that the sets {f ≥ g} and {f ≤ g} are the complements of {f < g} and {f > g}, respectively; hence they belong to A as well. Finally, {f = g} = {f ≥ g} ∩ {f ≤ g}.
(b) It is not difficult to see that for any sequence (a_n) of real numbers

lim sup_{n→∞} a_n = inf_{n∈N} sup_{k≥n} a_k  and  lim inf_{n→∞} a_n = sup_{n∈N} inf_{k≥n} a_k.  (12.2)

As a consequence we can construct new measurable functions using sup and lim. Let (f_n) be a sequence of A-measurable real functions on X. Then sup_{n∈N} f_n, inf_{n∈N} f_n, lim sup_{n→∞} f_n, and lim inf_{n→∞} f_n are A-measurable. In particular, lim_{n→∞} f_n is measurable if the limit exists.

Proof. Note that for all a ∈ R we have

{sup_{n∈N} f_n ≤ a} = ⋂_{n∈N} {f_n ≤ a}.

Since all f_n are measurable, so is sup_{n∈N} f_n. A similar proof works for inf_{n∈N} f_n. By (12.2), lim sup_{n→∞} f_n and lim inf_{n→∞} f_n are measurable, too.

Proposition 12.9 Let f, g : X → R be Borel functions on X ⊂ R^n. Then αf + βg, f·g, and |f| are Borel functions, too.

Proof. The function h(x) = (f(x), g(x)) : X → R^2 is a Borel function since its coordinate functions are. Since the sum s(x,y) = x + y and the product p(x,y) = xy are continuous functions, the compositions s∘h and p∘h are Borel functions by Example 12.2 (c). Since the constant functions α and β are Borel, so are αf, βg, and finally αf + βg. Hence, the Borel functions on X form a linear space, moreover a real algebra. In particular, −f is Borel and so is |f| = max{f, −f}.

Let (X, A, µ) be a measure space and f : X → R̄ arbitrary. Let f^+ = max{f, 0} and f^− = max{−f, 0} denote the positive and negative parts of f. We have f = f^+ − f^− and |f| = f^+ + f^−; moreover, f^+, f^− ≥ 0.

Corollary 12.10 f is a Borel function if and only if both f^+ and f^− are Borel.

12.3 The Lebesgue Integral

We define the Lebesgue integral of a complex function in three steps: first for positive simple functions, then for positive measurable functions, and finally for arbitrary measurable functions. In this section, (X, A, µ) is a measure space.

12.3.1 Simple Functions

Definition 12.5 Let M ⊆ X be a subset.
The function

χ_M(x) = 1 if x ∈ M,  χ_M(x) = 0 if x ∉ M,

is called the characteristic function of M. An A-measurable function f : X → R is called simple if f takes only finitely many values c_1, …, c_n. We denote the set of simple functions on (X, A) by S; the set of non-negative simple functions is denoted by S_+.

Clearly, if c_1, …, c_n are the distinct values of the simple function f, then

f = ∑_{i=1}^n c_i χ_{A_i},

where A_i = {x | f(x) = c_i}. It is clear that f is measurable if and only if A_i ∈ A for all i. Obviously, {A_i | i = 1,…,n} is a disjoint family of subsets of X. It is easy to see that f, g ∈ S implies αf + βg ∈ S, max{f,g} ∈ S, min{f,g} ∈ S, and fg ∈ S.

Step 1: Positive Simple Functions

For f = ∑_{i=1}^n c_i χ_{A_i} ∈ S_+ define

∫_X f dµ = ∑_{i=1}^n c_i µ(A_i).  (12.3)

The convention 0·(+∞) = 0 is used here; it may happen that c_i = 0 for some i and µ(A_i) = +∞.

Remarks 12.6 (a) Since c_i ≥ 0 for all i, the right-hand side of (12.3) is well-defined in R̄.

(b) Given another presentation of f, say f = ∑_{j=1}^m d_j χ_{B_j}, the sum ∑_{j=1}^m d_j µ(B_j) gives the same value as (12.3).

The following properties are easily checked.

Lemma 12.11 For f, g ∈ S_+, A ∈ A, c ∈ R_+ we have:

(1) ∫_X χ_A dµ = µ(A).
(2) ∫_X cf dµ = c ∫_X f dµ.
(3) ∫_X (f + g) dµ = ∫_X f dµ + ∫_X g dµ.
(4) f ≤ g implies ∫_X f dµ ≤ ∫_X g dµ.

12.3.2 Positive Measurable Functions

The idea is to approximate a positive measurable function by an increasing sequence of positive simple ones.

Theorem 12.12 Let f : X → [0,+∞] be measurable. There exist simple functions s_n, n ∈ N, on X such that

(a) 0 ≤ s_1 ≤ s_2 ≤ ⋯ ≤ f;
(b) s_n(x) → f(x), as n → ∞, for every x ∈ X.

(Figure: graph of f over X = (a,b) together with the staircase approximation s_1. Example: n = 1, 1 ≤ i ≤ 2; then E_{11} = f^{−1}([0, 1/2)), E_{12} = f^{−1}([1/2, 1)), F_1 = f^{−1}([1,+∞]).)

Proof. For n ∈ N and for 1 ≤ i ≤ n2^n, define

E_{ni} = f^{−1}([(i−1)/2^n, i/2^n))  and  F_n = f^{−1}([n, +∞]),

and put

s_n = ∑_{i=1}^{n2^n} ((i−1)/2^n) χ_{E_{ni}} + n χ_{F_n}.
Proposition 12.8 shows that E_{ni} and F_n are measurable sets. It is easily seen that the functions s_n satisfy (a). If x is such that f(x) < +∞, then

0 ≤ f(x) − s_n(x) ≤ 1/2^n  (12.4)

as soon as n is large enough, that is, as soon as x ∈ E_{ni} for some i and not x ∈ F_n. If f(x) = +∞, then s_n(x) = n; this proves (b). From (12.4) it follows that s_n ⇉ f uniformly on X if f is bounded.

Step 2: Positive Measurable Real Functions

Definition 12.6 (Lebesgue Integral) Let f : X → [0,+∞] be measurable, and let (s_n) be an increasing sequence of non-negative simple functions converging to f(x) for all x ∈ X, lim_{n→∞} s_n(x) = sup_{n∈N} s_n(x) = f(x). Define

∫_X f dµ = lim_{n→∞} ∫_X s_n dµ = sup_{n∈N} ∫_X s_n dµ  (12.5)

and call this number in [0,+∞] the Lebesgue integral of f over X with respect to the measure µ, or the µ-integral of f over X.

The definition of the Lebesgue integral does not depend on the special choice of the increasing sequence s_n ↗ f. One can equivalently define

∫_X f dµ = sup { ∫_X s dµ | s ≤ f, s a simple function }.

Observe that we apparently have two definitions of ∫_X f dµ if f itself is a simple function. However, these assign the same value to the integral, since f itself is the largest simple function s with s ≤ f.

Proposition 12.13 The properties (1) to (4) from Lemma 12.11 hold for any non-negative measurable functions f, g : X → [0,+∞] and c ∈ R_+. (Without proof.)

Step 3: Measurable Real Functions

Let f : X → R̄ be measurable and f^+ = max{f, 0}, f^− = max{−f, 0}. Then f^+ and f^− are both positive and measurable. Define

∫_X f dµ = ∫_X f^+ dµ − ∫_X f^− dµ

if at least one of the integrals on the right is finite. We say that f is µ-integrable if both are finite.

Step 4: Measurable Complex Functions

Definition 12.7 (Lebesgue Integral, Continued) A complex, measurable function f : X → C is called µ-integrable if

∫_X |f| dµ < ∞.
measurable function X If f = u + iv is µ-integrable, where u = Re f and v = Im f are the real and imaginary parts of f , u and v are real measurable functions on X. Define the µ-integral of f over X by Z Z Z Z Z + − + f dµ = u dµ − u dµ + i v dµ − i v − dµ. (12.6) X X X X X These four functions u+ , u− , v + , and v − are measurable, real, and non-negative. Since we have u+ ≤ | u | ≤ | f | etc., each of these four integrals is finite. Thus, (12.6) defines the integral on the left as a complex number. We define L 1 (X, µ) to be the collection of all complex µ-integrable functions f on X. R Note that for an integrable functions f , X f dµ is a finite number. Proposition 12.14 Let f, g : X → C be measurable. (a) f is µ-integrable if and only if | f | is µ-integrable and we have Z Z f dµ ≤ | f | dµ. X X (b) f is µ-integrable if and only if there exists an integrable function h with | f | ≤ h. (c) If f, g are integrable, so is c1 f + c2 g where Z Z Z (c1 f + c2 g) dµ = c1 f dµ + c2 g dµ. X (d) If f ≤ g on X, then X R X f dµ ≤ R X X g dµ. It follows that the set L 1 (X, µ) of µ-integrable complex-valued functions on X is a linear space. The Lebesgue integral defines a positive linear functional on L 1 (X, µ). Note that (b) implies that any measurable and bounded function f on a space X with µ(X) < ∞ is integrable. Step 5: R A f dµ Definition 12.8 Let A ∈ A, f : X → R or f : X → C measurable. The function f is called µ-integrable over A if χA f is µ-integrable over X. In this case we put, Z Z f dµ = χA f dµ. A In particular, Lemma 12.11 (1) now reads X R A dµ = µ(A). 12 Measure Theory and Integration 322 12.4 Some Theorems on Lebesgue Integrals 12.4.1 The Role Played by Measure Zero Sets Equivalence Relations Let X be a set and R ⊂ X × X. For a, b ∈ X we write a ∼ b if (a, b) ∈ R. Definition 12.9 (a) The subset R ⊂ X ×X is said to be an equivalence relation if R is reflexive, symmetric and transitive, that is, (r) ∀ x ∈ X : x ∼ x. (s) ∀ x, y ∈ X : x ∼ y =⇒ y ∼ x. 
(t) ∀ x, y, z ∈ X : x ∼ y ∧ y ∼ z ⟹ x ∼ z.
For a ∈ X the set ā := {x ∈ X | x ∼ a} is called the equivalence class of a. We have ā = b̄ if and only if a ∼ b.
(b) A partition P of X is a disjoint family P = {A_α | α ∈ I} of subsets A_α ⊂ X such that ⋃_{α∈I} A_α = X.
The set of equivalence classes is sometimes denoted by X/∼.

Example 12.3 (a) On Z define a ∼ b if 2 | (a − b). Then a and b are equivalent if both are odd or both are even. There are two equivalence classes: 1̄ = −5̄ = 2Z + 1 (the odd numbers) and 0̄ = 100̄ = 2Z (the even numbers).
(b) Let W ⊂ V be a subspace of the linear space V. For x, y ∈ V define x ∼ y if x − y ∈ W. This is an equivalence relation: the relation is reflexive since x − x = 0 ∈ W; it is symmetric since x − y ∈ W implies y − x = −(x − y) ∈ W; and it is transitive since x − y, y − z ∈ W implies that their sum (x − y) + (y − z) = x − z ∈ W, so that x ∼ z. One has 0̄ = W and x̄ = x + W := {x + w | w ∈ W}. The set of equivalence classes with respect to this equivalence relation is called the factor space or quotient space of V with respect to W and is denoted by V/W. The factor space becomes a linear space if we define x̄ + ȳ := (x + y)¯ and λx̄ := (λx)¯, λ ∈ C. Addition is indeed well-defined: x ∼ x′ and y ∼ y′, say x − x′ = w_1 and y − y′ = w_2 with w_1, w_2 ∈ W, imply x + y − (x′ + y′) = w_1 + w_2 ∈ W, so that (x + y)¯ = (x′ + y′)¯.
(c) Similarly as in (a), for m ∈ N define the equivalence relation a ≡ b (mod m) if m | (a − b). We say “a is congruent to b modulo m”. This defines a partition of the integers into m disjoint equivalence classes 0̄, 1̄, …, (m−1)¯, where r̄ = {am + r | a ∈ Z}.
(d) Call two triangles in the plane equivalent if
(1) there exists a translation mapping the first one onto the second one;
(2) there exists a rotation around (0, 0) mapping the first one onto the second one;
(3) there exists a motion (a composition of rotations, translations, and reflections) mapping the first one onto the second one.
Then (1) – (3) define three different equivalence relations on triangles, or more generally on subsets of the plane.
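The congruence relation of Example 12.3 (c) can be made concrete on a finite set of integers. A small sketch (the helper `residue_classes` is hypothetical, introduced only for illustration):

```python
def residue_classes(m, universe):
    """Partition `universe` (a finite iterable of integers) into the
    equivalence classes of a ~ b  iff  m | (a - b), as in Example 12.3 (c)."""
    classes = {}
    for a in universe:
        classes.setdefault(a % m, set()).add(a)
    return classes

universe = range(-10, 11)
classes = residue_classes(3, universe)
# m = 3 classes; they are disjoint and cover the universe
assert set(classes) == {0, 1, 2}
assert sum(len(c) for c in classes.values()) == len(list(universe))
# a ~ b iff a and b land in the same class: 3 | (7 - (-2))
assert (7 % 3) == (-2 % 3)
```

The classes are exactly the sets r̄ ∩ universe for r = 0, …, m − 1.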
(e) Equal cardinality of sets is an equivalence relation.

Proposition 12.15 (a) Let ∼ be an equivalence relation on X. Then P = {x̄ | x ∈ X} defines a partition of X, denoted by P∼.
(b) Let P be a partition of X. Then “x ∼ y if there exists A ∈ P with x, y ∈ A” defines an equivalence relation ∼_P on X.
(c) ∼_{P∼} = ∼ and P_{∼P} = P.

Let P be a property which a point x may or may not have. For instance, P may be the property “f(x) > 0” if f is a given function, or “(f_n(x)) converges” if (f_n) is a given sequence of functions.

Definition 12.10 If (X, A, µ) is a measure space and A ∈ A, we say “P holds almost everywhere on A”, abbreviated “P holds a.e. on A”, if there exists N ∈ A such that µ(N) = 0 and P holds for every point x ∈ A \ N.

This concept of course strongly depends on the measure µ, and sometimes we write µ-a.e. to emphasize the dependence on µ.

(a) Main example. On the set of measurable functions f : X → C we define an equivalence relation by f ∼ g if f = g a.e. on X. This is indeed an equivalence relation: f(x) = f(x) for all x ∈ X, and f(x) = g(x) implies g(x) = f(x). For transitivity, let f = g a.e. on X and g = h a.e. on X; that is, there exist M, N ∈ A with µ(M) = µ(N) = 0 and f(x) = g(x) for all x ∈ X \ M, g(x) = h(x) for all x ∈ X \ N. Hence f(x) = h(x) for all x ∈ X \ (M ∪ N). Since 0 ≤ µ(M ∪ N) ≤ µ(M) + µ(N) = 0 + 0 = 0 by Proposition 12.3 (e), µ(M ∪ N) = 0 and finally f = h a.e. on X.

(b) Note that f = g a.e. on X implies

    ∫_X f dµ = ∫_X g dµ.

Indeed, let N denote the zero-set where f ≠ g. Then

    | ∫_X f dµ − ∫_X g dµ | ≤ ∫_X |f − g| dµ = ∫_N |f − g| dµ + ∫_{X\N} |f − g| dµ ≤ (+∞) · µ(N) + 0 · µ(X \ N) = 0,

using the convention (+∞) · 0 = 0. Here we used that for disjoint sets A, B ∈ A,

    ∫_{A∪B} f dµ = ∫_X χ_{A∪B} f dµ = ∫_X χ_A f dµ + ∫_X χ_B f dµ = ∫_A f dµ + ∫_B f dµ.

Proposition 12.16 Let f : X → [0, +∞] be measurable. Then ∫_X f dµ = 0 if and only if f = 0 a.e. on X.

Proof. By the above argument in (b), f = 0 a.e. implies ∫_X f dµ = ∫_X 0 dµ = 0, which proves one direction. The other direction is homework 40.4.

12.4.2 The Space Lᵖ(X, µ)

For any measurable function f : X → C and any real p, 1 ≤ p < ∞, define

    ‖f‖_p = ( ∫_X |f|ᵖ dµ )^{1/p}.   (12.7)

This number may be finite or ∞. In the first case, |f|ᵖ is integrable and we write f ∈ L^p(X, µ).

Proposition 12.17 Let p, q ≥ 1 be given such that 1/p + 1/q = 1.
(a) Let f, g : X → C be measurable functions with f ∈ L^p and g ∈ L^q. Then fg ∈ L¹ and

    ∫_X |fg| dµ ≤ ‖f‖_p ‖g‖_q   (Hölder’s inequality).   (12.8)

(b) Let f, g ∈ L^p. Then f + g ∈ L^p and

    ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p   (Minkowski’s inequality).   (12.9)

Idea of proof. Hölder’s inequality follows from Young’s inequality (Proposition 1.31), as in the classical case of Hölder’s inequality in Rⁿ, see Proposition 1.32. Minkowski’s inequality follows from Hölder’s inequality as in Proposition 1.34.

Note that Minkowski’s inequality shows that f, g ∈ L^p yields ‖f + g‖_p < ∞, so that f + g ∈ L^p. In particular, L^p is a linear space. Let us check the properties of ‖·‖_p. For all measurable f, g we have

    ‖f‖_p ≥ 0,  ‖λf‖_p = |λ| ‖f‖_p,  ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p.

All properties of a norm (see Definition 6.9 on page 179) are satisfied except for definiteness: ‖f‖_p = 0 implies ∫_X |f|ᵖ dµ = 0, which by Proposition 12.16 implies |f|ᵖ = 0 a.e., i.e. f = 0 a.e. However, it does not imply f = 0. To overcome this problem, we use the equivalence relation f = g a.e. and from now on consider only equivalence classes of functions in L^p; that is, we identify functions f and g which are equal a.e. The space

    N = {f : X → C | f is measurable and f = 0 a.e.}

is a linear subspace of L^p(X, µ) for all p, and f = g a.e. if and only if f − g ∈ N. The factor space L^p/N (see Example 12.3 (b)) is again a linear space.

Definition 12.11 Let (X, A, µ) be a measure space.
Lᵖ(X, µ) denotes the set of equivalence classes of functions in L^p(X, µ) with respect to the equivalence relation f = g a.e.; that is, Lᵖ(X, µ) = L^p(X, µ)/N is the quotient space. (Lᵖ(X, µ), ‖·‖_p) is a normed space. With this norm, Lᵖ(X, µ) is complete.

Example 12.4 (a) We have χ_Q = 0 in Lᵖ(R, λ) since χ_Q = 0 a.e. on R with respect to the Lebesgue measure (note that Q is a set of measure zero).
(b) In the case of the sequence spaces L^p(N) with respect to the counting measure, L^p = Lᵖ, since f = 0 a.e. implies f = 0.
(c) f(x) = 1/x^α, α > 0, is in L²(0, 1) if and only if 2α < 1.
We identify functions and their equivalence classes.

12.4.3 The Monotone Convergence Theorem

The following theorem about monotone convergence, due to Beppo Levi (1875–1961), is one of the most important in the theory of integration. The theorem holds for an arbitrary increasing sequence of measurable functions with, possibly, ∫_X f_n dµ = +∞.

Theorem 12.18 (Monotone Convergence Theorem/Lebesgue) Let (f_n) be a sequence of measurable functions on X and suppose that
(1) 0 ≤ f_1(x) ≤ f_2(x) ≤ ··· ≤ +∞ for all x ∈ X,
(2) f_n(x) → f(x) as n → ∞ for every x ∈ X.
Then f is measurable, and

    lim_{n→∞} ∫_X f_n dµ = ∫_X f dµ = ∫_X lim_{n→∞} f_n dµ.

(Without proof.) Note that the measurability of f is a consequence of Remark 12.5 (b).

Corollary 12.19 (Beppo Levi) Let f_n : X → [0, +∞] be measurable for all n ∈ N, and f(x) = Σ_{n=1}^∞ f_n(x) for x ∈ X. Then

    ∫_X Σ_{n=1}^∞ f_n dµ = Σ_{n=1}^∞ ∫_X f_n dµ.

Example 12.5 (a) Let X = N, A = P(N) the σ-algebra of all subsets, and µ the counting measure on N. The functions on N can be identified with the sequences (x_n), f(n) = x_n. Trivially, any function is A-measurable. What is ∫_N f dµ? First, let f ≥ 0. For the simple function g_n = x_n χ_{{n}} we obtain ∫ g_n dµ = x_n µ({n}) = x_n. Note that f = Σ_{n=1}^∞ g_n and g_n ≥ 0 since x_n ≥ 0. By Corollary 12.19,

    ∫_N f dµ = Σ_{n=1}^∞ ∫_N g_n dµ = Σ_{n=1}^∞ x_n.

Now let f be an arbitrary integrable function, i.e. ∫_N |f| dµ < ∞; thus Σ_{n=1}^∞ |x_n| < ∞. Therefore, (x_n) ∈ L¹(N, µ) if and only if Σ x_n converges absolutely. The space of absolutely convergent series is denoted by ℓ¹ or ℓ¹(N).

(b) Let a_mn ≥ 0 for all n, m ∈ N. Then

    Σ_{n=1}^∞ Σ_{m=1}^∞ a_mn = Σ_{m=1}^∞ Σ_{n=1}^∞ a_mn.

Proof. Consider the measure space (N, P(N), µ) from (a). For n ∈ N define functions f_n(m) = a_mn and put f(m) = Σ_{n=1}^∞ f_n(m). By Corollary 12.19 and (a) we then have

    Σ_{n=1}^∞ Σ_{m=1}^∞ a_mn = Σ_{n=1}^∞ ∫ f_n dµ = ∫ f dµ = Σ_{m=1}^∞ f(m) = Σ_{m=1}^∞ Σ_{n=1}^∞ a_mn.

Proposition 12.20 Let f : X → [0, +∞] be measurable. Then

    ϕ(A) = ∫_A f dµ,  A ∈ A,

defines a measure ϕ on A.

Proof. Since f ≥ 0, ϕ(A) ≥ 0 for all A ∈ A. Let (A_n) be a countable disjoint family of measurable sets A_n ∈ A and let A = ⋃_{n=1}^∞ A_n. By homework 40.1, χ_A = Σ_{n=1}^∞ χ_{A_n}, and therefore, using B. Levi’s theorem,

    ϕ(A) = ∫_A f dµ = ∫_X χ_A f dµ = ∫_X Σ_{n=1}^∞ χ_{A_n} f dµ = Σ_{n=1}^∞ ∫_X χ_{A_n} f dµ = Σ_{n=1}^∞ ∫_{A_n} f dµ = Σ_{n=1}^∞ ϕ(A_n).

12.4.4 The Dominated Convergence Theorem

Besides the monotone convergence theorem, the present theorem is the most important one. It is due to Henri Lebesgue. The great advantage, compared with Theorem 6.6, is that µ(X) = ∞ is allowed, that is, non-compact domains X are included. We only need the pointwise convergence of (f_n), not uniform convergence. The main assumption here is the existence of an integrable upper bound for all f_n.

Theorem 12.21 (Dominated Convergence Theorem of Lebesgue) Let f_n : X → R or f_n : X → C and g be measurable functions such that
(1) f_n(x) → f(x) as n → ∞ a.e. on X,
(2) |f_n(x)| ≤ g(x) a.e. on X,
(3) ∫_X g dµ < +∞.
Then f is measurable and integrable, ∫_X |f| dµ < ∞, and

    lim_{n→∞} ∫_X f_n dµ = ∫_X f dµ = ∫_X lim_{n→∞} f_n dµ,
    lim_{n→∞} ∫_X |f_n − f| dµ = 0.   (12.10)

Note that (12.10) shows that (f_n) converges to f in the normed space L¹(X, µ).
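The interchange of the order of summation in Example 12.5 (b) can be checked numerically on a truncation. This is a sketch only; the helper `iterated_sum` is hypothetical, and a_mn = 2⁻ᵐ 2⁻ⁿ is chosen so that the double sum equals (Σ 2⁻ᵐ)(Σ 2⁻ⁿ) = 1 · 1 = 1.

```python
def iterated_sum(a, N, rows_first):
    """Truncated iterated sum of the non-negative array a(m, n), 1 <= m, n <= N,
    in either order of summation (Example 12.5 (b))."""
    if rows_first:
        return sum(sum(a(m, n) for n in range(1, N + 1)) for m in range(1, N + 1))
    return sum(sum(a(m, n) for m in range(1, N + 1)) for n in range(1, N + 1))

a = lambda m, n: 2.0**-m * 2.0**-n
s1 = iterated_sum(a, 30, rows_first=True)
s2 = iterated_sum(a, 30, rows_first=False)
assert abs(s1 - s2) < 1e-12   # both orders give the same value ...
assert abs(s1 - 1.0) < 1e-6   # ... and converge to the double sum 1
```

For non-negative terms no cancellation can occur, which is exactly why Corollary 12.19 needs no integrability hypothesis.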
Example 12.6 (a) Let A_n ∈ A, n ∈ N, A_1 ⊂ A_2 ⊂ ··· be an increasing sequence with ⋃_{n∈N} A_n = A. If f ∈ L¹(A, µ), then f ∈ L¹(A_n) for all n and

    lim_{n→∞} ∫_{A_n} f dµ = ∫_A f dµ.   (12.11)

Indeed, the sequence (χ_{A_n} f) converges pointwise to χ_A f, since χ_A(x) = 1 iff x ∈ A iff x ∈ A_n for all n ≥ n_0 iff lim_{n→∞} χ_{A_n}(x) = 1. Moreover, |χ_{A_n} f| ≤ |χ_A f|, which is integrable. By Lebesgue’s theorem,

    lim_{n→∞} ∫_{A_n} f dµ = lim_{n→∞} ∫_X χ_{A_n} f dµ = ∫_X χ_A f dµ = ∫_A f dµ.

However, if we do not assume f ∈ L¹(A, µ), the statement is not true (see Remark 12.7 below).

Exhaustion theorem. Let (A_n) be an increasing sequence of measurable sets and A = ⋃_{n=1}^∞ A_n. Suppose that f is measurable and (∫_{A_n} |f| dµ) is a bounded sequence. Then f ∈ L¹(A, µ) and (12.11) holds.

(b) Let f_n(x) = (−1)ⁿ xⁿ on [0, 1]. The sequence is dominated by the integrable function 1 ≥ |f_n(x)| for all x ∈ [0, 1]. Hence lim_{n→∞} ∫_{[0,1]} f_n dλ = 0 = ∫_{[0,1]} lim_{n→∞} f_n dλ.

12.4.5 Application of Lebesgue’s Theorem to Parametric Integrals

As a direct application of the dominated convergence theorem we now treat parameter-dependent integrals, see Propositions 7.22 and 7.23.

Proposition 12.22 (Continuity) Let U ⊂ Rⁿ be an open connected set, t_0 ∈ U, and f : Rᵐ × U → R a function. Assume that
(a) for a.e. x ∈ Rᵐ, the function t ↦ f(x, t) is continuous at t_0,
(b) there exists an integrable function F : Rᵐ → R such that for every t ∈ U, |f(x, t)| ≤ F(x) a.e. on Rᵐ.
Then the function

    g(t) = ∫_{Rᵐ} f(x, t) dx

is continuous at t_0.

Proof. First we note that for any fixed t ∈ U, the function f_t(x) = f(x, t) is integrable on Rᵐ since it is dominated by the integrable function F. We have to show that for any sequence t_j → t_0, t_j ∈ U, g(t_j) tends to g(t_0) as j → ∞. We set f_j(x) = f(x, t_j) and f_0(x) = f(x, t_0) for all j ∈ N. By (a) we have

    f_0(x) = lim_{j→∞} f_j(x),  a.e. x ∈ Rᵐ.
By (b), the assumptions of the dominated convergence theorem are satisfied, and thus

    lim_{j→∞} g(t_j) = lim_{j→∞} ∫_{Rᵐ} f_j(x) dx = ∫_{Rᵐ} lim_{j→∞} f_j(x) dx = ∫_{Rᵐ} f_0(x) dx = g(t_0).

Proposition 12.23 (Differentiation under the Integral Sign) Let I ⊂ R be an open interval and f : Rᵐ × I → R a function such that
(a) for every t ∈ I the function x ↦ f(x, t) is integrable,
(b) for almost all x ∈ Rᵐ, the function t ↦ f(x, t) is finite and continuously differentiable,
(c) there exists an integrable function F : Rᵐ → R such that for every t ∈ I,

    | ∂f/∂t (x, t) | ≤ F(x),  a.e. x ∈ Rᵐ.

Then the function g(t) = ∫_{Rᵐ} f(x, t) dx is differentiable on I with

    g′(t) = ∫_{Rᵐ} ∂f/∂t (x, t) dx.

The proof uses the previous theorem about the continuity of the parametric integral. A detailed proof can be found in [Kön90, p. 283].

Example 12.7 (a) Let f ∈ L¹(R). Then the Fourier transform f̂ : R → C,

    f̂(t) = (1/√(2π)) ∫_R e^{−itx} f(x) dx,

is continuous on R, see homework 41.3.
(b) Let K ⊂ R³ be a compact subset and ρ : K → R integrable; the Newton potential (with mass density ρ) is given by

    u(t) = ∫_K ρ(x)/‖x − t‖ dx,  t ∉ K.

Then u(t) is a harmonic function on R³ \ K. Similarly, if K ⊂ R² is compact and ρ ∈ L¹(K), the Newton potential is given by

    u(t) = ∫_K ρ(x) log ‖x − t‖ dx,  t ∉ K.

Then u(t) is a harmonic function on R² \ K.

12.4.6 The Riemann and the Lebesgue Integrals

Proposition 12.24 Let f be a bounded function on the finite interval [a, b].
(a) f is Riemann integrable on [a, b] if and only if f is continuous a.e. on [a, b].
(b) If f is Riemann integrable on [a, b], then f is Lebesgue integrable, too, and both integrals coincide.
Let I ⊂ R be an interval such that f is Riemann integrable on all compact subintervals of I.
(c) f is Lebesgue integrable on I if and only if |f| is improperly Riemann integrable on I (see Section 5.4); in this case both integrals coincide.
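Differentiation under the integral sign can be sanity-checked numerically. This is a sketch under stated assumptions: the example f(x, t) = e^{tx} on [0, 1] is not from the text, and `trapezoid` is a hypothetical quadrature helper. On [0, 1], |∂/∂t e^{tx}| = |x e^{tx}| ≤ e^{|t|}, an integrable bound, so Proposition 12.23 applies and g′(t) = ∫_0^1 x e^{tx} dx = ((t − 1)eᵗ + 1)/t².

```python
import math

def trapezoid(h, a, b, n=4000):
    """Composite trapezoidal rule for the integral of h over [a, b]."""
    dx = (b - a) / n
    return dx * (0.5 * (h(a) + h(b)) + sum(h(a + k * dx) for k in range(1, n)))

t = 0.7
# differentiate under the integral sign, then integrate numerically
dg = trapezoid(lambda x: x * math.exp(t * x), 0.0, 1.0)
exact = ((t - 1) * math.exp(t) + 1) / t**2
assert abs(dg - exact) < 1e-6
# consistency with a central difference quotient of g(t) = ∫_0^1 e^{tx} dx
g = lambda s: trapezoid(lambda x: math.exp(s * x), 0.0, 1.0)
assert abs(dg - (g(t + 1e-5) - g(t - 1e-5)) / 2e-5) < 1e-5
```

Both routes, swapping derivative and integral versus differencing g itself, agree to quadrature accuracy.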
Remarks 12.7 (a) The characteristic function χ_Q on [0, 1] is Lebesgue but not Riemann integrable; χ_Q is nowhere continuous on [0, 1].
(b) The (improper) Riemann integral

    ∫_1^∞ (sin x)/x dx

converges (see Example 5.11); however, the Lebesgue integral does not exist, since the integral does not converge absolutely. Indeed, for integers n ≥ 1 we have, with some c > 0,

    ∫_{nπ}^{(n+1)π} |sin x|/x dx ≥ (1/((n+1)π)) ∫_{nπ}^{(n+1)π} |sin x| dx = c/((n+1)π);

hence

    ∫_π^{(n+1)π} |sin x|/x dx ≥ (c/π) Σ_{k=1}^n 1/(k+1).

Since the harmonic series diverges, so does the integral ∫_1^∞ |sin x|/x dx.

12.4.7 Appendix: Fubini’s Theorem

Theorem 12.25 Let (X_1, A_1, µ_1) and (X_2, A_2, µ_2) be σ-finite measure spaces, let f be an A_1 ⊗ A_2-measurable function, and let X = X_1 × X_2.
(a) If f : X → [0, +∞], ϕ(x_1) = ∫_{X_2} f(x_1, x_2) dµ_2, and ψ(x_2) = ∫_{X_1} f(x_1, x_2) dµ_1, then

    ∫_{X_2} ψ(x_2) dµ_2 = ∫_{X_1×X_2} f d(µ_1 ⊗ µ_2) = ∫_{X_1} ϕ(x_1) dµ_1.

(b) If f ∈ L¹(X, µ_1 ⊗ µ_2), then

    ∫_{X_1×X_2} f d(µ_1 ⊗ µ_2) = ∫_{X_2} ( ∫_{X_1} f(x_1, x_2) dµ_1 ) dµ_2.

Here A_1 ⊗ A_2 denotes the smallest σ-algebra over X which contains all sets A × B, A ∈ A_1 and B ∈ A_2. Define µ(A × B) = µ_1(A)µ_2(B) and extend µ to a measure µ_1 ⊗ µ_2 on A_1 ⊗ A_2.

Remark 12.8 In (a), as in Levi’s theorem, we do not need any assumption on f to change the order of integration, since f ≥ 0. In (b), f is an arbitrary measurable function on X_1 × X_2; however, the integral ∫_X |f| dµ needs to be finite.

Chapter 13 Hilbert Space

Functional analysis is a fruitful interplay between linear algebra and analysis. One defines function spaces with certain properties and certain topologies and considers linear operators between such spaces. The friendliest examples of such spaces are Hilbert spaces. This chapter is divided into two parts: the first describes the geometry of a Hilbert space, the second is concerned with linear operators on the Hilbert space.
13.1 The Geometry of the Hilbert Space

13.1.1 Unitary Spaces

Let E be a linear space over K = R or over K = C.

Definition 13.1 An inner product on E is a function ⟨·,·⟩ : E × E → K with
(a) ⟨λ_1 x_1 + λ_2 x_2, y⟩ = λ_1 ⟨x_1, y⟩ + λ_2 ⟨x_2, y⟩   (linearity),
(b) ⟨x, y⟩ = conj ⟨y, x⟩, the complex conjugate   (Hermitian property),
(c) ⟨x, x⟩ ≥ 0 for all x ∈ E, and ⟨x, x⟩ = 0 implies x = 0   (positive definiteness).
A unitary space is a linear space together with an inner product.

Let us list some immediate consequences of these axioms. From (a) and (b) it follows that
(d) ⟨y, λ_1 x_1 + λ_2 x_2⟩ = conj(λ_1) ⟨y, x_1⟩ + conj(λ_2) ⟨y, x_2⟩.
A form on E × E satisfying (a) and (d) is called a sesquilinear form. (a) implies ⟨0, y⟩ = 0 for all y ∈ E. The mapping x ↦ ⟨x, y⟩ is a linear mapping into K (a linear functional) for every y ∈ E. By (c), we may define ‖x‖, the norm of the vector x ∈ E, to be the square root of ⟨x, x⟩; thus

    ‖x‖² = ⟨x, x⟩.   (13.1)

Proposition 13.1 (Cauchy–Schwarz Inequality) Let (E, ⟨·,·⟩) be a unitary space. For x, y ∈ E we have

    |⟨x, y⟩| ≤ ‖x‖ ‖y‖.

Equality holds if and only if x = βy for some β ∈ K.

Proof. Choose α ∈ C, |α| = 1, such that α ⟨y, x⟩ = |⟨x, y⟩|. For λ ∈ R we then have (since conj(α) ⟨x, y⟩ = conj(α ⟨y, x⟩) = |⟨x, y⟩|)

    ⟨x − αλy, x − αλy⟩ = ⟨x, x⟩ − αλ ⟨y, x⟩ − conj(α) λ ⟨x, y⟩ + λ² ⟨y, y⟩
                        = ⟨x, x⟩ − 2λ |⟨x, y⟩| + λ² ⟨y, y⟩ ≥ 0.

This is a quadratic polynomial aλ² + bλ + c in λ with real coefficients. Since this polynomial takes only non-negative values, its discriminant b² − 4ac must be non-positive:

    4 |⟨x, y⟩|² − 4 ‖x‖² ‖y‖² ≤ 0.

This implies |⟨x, y⟩| ≤ ‖x‖ ‖y‖.

Corollary 13.2 ‖·‖ defines a norm on E.

Proof. It is clear that ‖x‖ ≥ 0. From (c) it follows that ‖x‖ = 0 implies x = 0. Further, ‖λx‖ = √⟨λx, λx⟩ = √(|λ|² ⟨x, x⟩) = |λ| ‖x‖. We prove the triangle inequality.
Since 2 Re(z) = z + z̄, we have by Proposition 1.20 and the Cauchy–Schwarz inequality

    ‖x + y‖² = ⟨x + y, x + y⟩ = ⟨x, x⟩ + ⟨x, y⟩ + ⟨y, x⟩ + ⟨y, y⟩
             = ‖x‖² + ‖y‖² + 2 Re ⟨x, y⟩
             ≤ ‖x‖² + ‖y‖² + 2 |⟨x, y⟩|
             ≤ ‖x‖² + ‖y‖² + 2 ‖x‖ ‖y‖ = (‖x‖ + ‖y‖)²;

hence ‖x + y‖ ≤ ‖x‖ + ‖y‖.

By the corollary, any unitary space is a normed space with the norm ‖x‖ = √⟨x, x⟩. Recall that any normed vector space is a metric space with the metric d(x, y) = ‖x − y‖. Hence the notions of open and closed sets, neighborhoods, converging sequences, Cauchy sequences, continuous mappings, and so on make sense in a unitary space. In particular, lim_{n→∞} x_n = x means that the sequence (‖x_n − x‖) of non-negative real numbers tends to 0. Recall from Definition 6.8 that a metric space is said to be complete if every Cauchy sequence converges.

Definition 13.2 A complete unitary space is called a Hilbert space.

Example 13.1 Let K = C.
(a) E = Cⁿ, x = (x_1, …, x_n) ∈ Cⁿ, y = (y_1, …, y_n) ∈ Cⁿ. Then

    ⟨x, y⟩ = Σ_{k=1}^n x_k conj(y_k)

defines an inner product, with the euclidean norm ‖x‖ = (Σ_{k=1}^n |x_k|²)^{1/2}. (Cⁿ, ⟨·,·⟩) is a Hilbert space.
(b) E = L²(X, µ) is a Hilbert space with the inner product ⟨f, g⟩ = ∫_X f ḡ dµ. By Proposition 12.17 with p = q = 2 we obtain the Cauchy–Schwarz inequality

    | ∫_X f ḡ dµ | ≤ ( ∫_X |f|² dµ )^{1/2} ( ∫_X |g|² dµ )^{1/2}.

Using the CSI one can prove Minkowski’s inequality, that is, f, g ∈ L²(X, µ) implies f + g ∈ L²(X, µ). Also, ⟨f, g⟩ is a finite complex number, since f ḡ ∈ L¹(X, µ). Note that the inner product is positive definite, since ∫_X |f|² dµ = 0 implies (by Proposition 12.16) |f| = 0 a.e. and therefore f = 0 in L²(X, µ). The proof of the completeness of L²(X, µ) is more complicated; we skip it.
(c) E = ℓ², i.e.

    ℓ² = {(x_n) | x_n ∈ C, n ∈ N, Σ_{n=1}^∞ |x_n|² < ∞}.

Note that the Cauchy–Schwarz inequality in R (Corollary 1.26) implies

    Σ_{n=1}^k |x_n conj(y_n)| ≤ ( Σ_{n=1}^k |x_n|² )^{1/2} ( Σ_{n=1}^k |y_n|² )^{1/2} ≤ ( Σ_{n=1}^∞ |x_n|² )^{1/2} ( Σ_{n=1}^∞ |y_n|² )^{1/2}.

Taking the supremum over all k ∈ N on the left, we see that

    ⟨(x_n), (y_n)⟩ = Σ_{n=1}^∞ x_n conj(y_n)

is an absolutely convergent series, so the inner product is well-defined on ℓ².

Lemma 13.3 Let E be a unitary space. For any fixed y ∈ E the mappings f, g : E → C given by f(x) = ⟨x, y⟩ and g(x) = ⟨y, x⟩ are continuous functions on E.

Proof. First proof. Let (x_n) be a sequence in E converging to x ∈ E, that is, lim_{n→∞} ‖x_n − x‖ = 0. Then by the CSI,

    |⟨x_n, y⟩ − ⟨x, y⟩| = |⟨x_n − x, y⟩| ≤ ‖x_n − x‖ ‖y‖ → 0

as n → ∞. This proves the continuity of f. The same proof works for g.
Second proof. The Cauchy–Schwarz inequality implies that for x_1, x_2 ∈ E,

    |⟨x_1, y⟩ − ⟨x_2, y⟩| = |⟨x_1 − x_2, y⟩| ≤ ‖x_1 − x_2‖ ‖y‖,

which proves that the map x ↦ ⟨x, y⟩ is in fact uniformly continuous (given ε > 0, choose δ = ε/‖y‖; then ‖x_1 − x_2‖ < δ implies |⟨x_1, y⟩ − ⟨x_2, y⟩| < ε). The same is true for x ↦ ⟨y, x⟩.

Definition 13.3 Let H be a unitary space. We call x and y orthogonal to each other, and write x ⊥ y, if ⟨x, y⟩ = 0. Two subsets M, N ⊂ H are called orthogonal to each other if x ⊥ y for all x ∈ M and y ∈ N. For a subset M ⊂ H, define the orthogonal complement M⊥ of M to be the set

    M⊥ = {x ∈ H | ⟨x, m⟩ = 0 for all m ∈ M}.

For example, E = Rⁿ with the standard inner product and v = (v_1, …, v_n) ∈ Rⁿ, v ≠ 0, yields

    {v}⊥ = {x ∈ Rⁿ | Σ_{k=1}^n x_k v_k = 0}.

This is a hyperplane in Rⁿ which is orthogonal to v.

Lemma 13.4 Let H be a unitary space and M ⊂ H an arbitrary subset. Then M⊥ is a closed linear subspace of H.

Proof. (a) Suppose that x, y ∈ M⊥. Then for m ∈ M we have

    ⟨λ_1 x + λ_2 y, m⟩ = λ_1 ⟨x, m⟩ + λ_2 ⟨y, m⟩ = 0;

hence λ_1 x + λ_2 y ∈ M⊥. This shows that M⊥ is a linear subspace.
(b) We show that any converging sequence (x_n) of elements of M⊥ has its limit in M⊥. Suppose lim_{n→∞} x_n = x with x_n ∈ M⊥, x ∈ H. Then for all m ∈ M, ⟨x_n, m⟩ = 0.
Since the inner product is continuous in the first argument (see Lemma 13.3), we obtain

    0 = lim_{n→∞} ⟨x_n, m⟩ = ⟨x, m⟩.

This shows x ∈ M⊥; hence M⊥ is closed.

13.1.2 Norm and Inner Product

Problem. Given a normed linear space (E, ‖·‖), does there exist an inner product ⟨·,·⟩ on E such that ‖x‖ = √⟨x, x⟩ for all x ∈ E? In this case we call ‖·‖ an inner product norm.

Proposition 13.5 (a) A norm ‖·‖ on a linear space E over K = C or K = R is an inner product norm if and only if the parallelogram law

    ‖x + y‖² + ‖x − y‖² = 2(‖x‖² + ‖y‖²),  x, y ∈ E,   (13.2)

is satisfied.
(b) If (13.2) is satisfied, the inner product ⟨·,·⟩ is given by (13.3) in the real case K = R and by (13.4) in the complex case K = C:

    ⟨x, y⟩ = (1/4)(‖x + y‖² − ‖x − y‖²),  if K = R;   (13.3)
    ⟨x, y⟩ = (1/4)(‖x + y‖² − ‖x − y‖² + i ‖x + iy‖² − i ‖x − iy‖²),  if K = C.   (13.4)

These equations are called polarization identities.

Proof. We check the parallelogram law and the polarization identity in the real case K = R:

    ‖x + y‖² + ‖x − y‖² = ⟨x + y, x + y⟩ + ⟨x − y, x − y⟩
      = ⟨x, x⟩ + ⟨y, x⟩ + ⟨x, y⟩ + ⟨y, y⟩ + (⟨x, x⟩ − ⟨y, x⟩ − ⟨x, y⟩ + ⟨y, y⟩)
      = 2 ‖x‖² + 2 ‖y‖².

Further,

    ‖x + y‖² − ‖x − y‖² = (⟨x, x⟩ + ⟨y, x⟩ + ⟨x, y⟩ + ⟨y, y⟩) − (⟨x, x⟩ − ⟨y, x⟩ − ⟨x, y⟩ + ⟨y, y⟩) = 4 ⟨x, y⟩.

The proof that the parallelogram identity is sufficient for E to be a unitary space is in the appendix to this section.

Example 13.2 We show that L¹([0, 2]) with ‖f‖_1 = ∫_0^2 |f| dx is not an inner product space. Indeed, let f = χ_{[1,2]} and g = χ_{[0,1]}. Then f + g = |f − g| = 1 on [0, 2] and ‖f‖_1 = ‖g‖_1 = ∫_0^1 dx = 1, so that

    ‖f + g‖_1² + ‖f − g‖_1² = 2² + 2² = 8 ≠ 4 = 2(‖f‖_1² + ‖g‖_1²).

The parallelogram identity is not satisfied for ‖·‖_1, so L¹([0, 2]) is not an inner product space.

13.1.3 Two Theorems of F.
Riesz (born January 22, 1880, in Austria-Hungary; died February 28, 1956; a founder of functional analysis)

Definition 13.4 Let (H_1, ⟨·,·⟩_1) and (H_2, ⟨·,·⟩_2) be Hilbert spaces and let H = {(x_1, x_2) | x_1 ∈ H_1, x_2 ∈ H_2} be the direct sum of the Hilbert spaces H_1 and H_2. Then

    ⟨(x_1, x_2), (y_1, y_2)⟩ = ⟨x_1, y_1⟩_1 + ⟨x_2, y_2⟩_2

defines an inner product on H. With this inner product, H becomes a Hilbert space. H = H_1 ⊕ H_2 is called the (direct) orthogonal sum of H_1 and H_2.

Definition 13.5 Two Hilbert spaces H_1 and H_2 are called isomorphic if there exists a bijective linear mapping ϕ : H_1 → H_2 such that ⟨ϕ(x), ϕ(y)⟩_2 = ⟨x, y⟩_1 for all x, y ∈ H_1. ϕ is called an isometric isomorphism or a unitary map.

Back to the orthogonal sum H = H_1 ⊕ H_2. Let H̃_1 = {(x_1, 0) | x_1 ∈ H_1} and H̃_2 = {(0, x_2) | x_2 ∈ H_2}. Then x_1 ↦ (x_1, 0) and x_2 ↦ (0, x_2) are isometric isomorphisms from H_i to H̃_i, i = 1, 2. We have H̃_1 ⊥ H̃_2, and H̃_i, i = 1, 2, are closed linear subspaces of H. In this situation we say that H is the inner orthogonal sum of the two closed subspaces H̃_1 and H̃_2.

(a) Riesz’s First Theorem

Problem. Let H_1 be a closed linear subspace of H. Does there exist another closed linear subspace H_2 such that H = H_1 ⊕ H_2? Answer: YES.

Lemma 13.6 (Minimal Distance Lemma) Let C be a convex and closed subset of the Hilbert space H. For x ∈ H let

    ϱ(x) = inf{‖x − y‖ | y ∈ C}.

Then there exists a unique element c ∈ C such that ϱ(x) = ‖x − c‖.

[Figure: a point x outside the closed convex set C and its nearest point c ∈ C.]

Proof. Existence. Since ϱ(x) is an infimum, there exists a sequence (y_n), y_n ∈ C, which approximates the infimum: lim_{n→∞} ‖x − y_n‖ = ϱ(x). We will show that (y_n) is a Cauchy sequence. By the parallelogram law (see Proposition 13.5) we have

    ‖y_n − y_m‖² = ‖(y_n − x) + (x − y_m)‖²
                  = 2 ‖y_n − x‖² + 2 ‖x − y_m‖² − ‖2x − y_n − y_m‖²
                  = 2 ‖y_n − x‖² + 2 ‖x − y_m‖² − 4 ‖x − (y_n + y_m)/2‖².

Since C is convex, (y_n + y_m)/2 ∈ C and therefore ‖x − (y_n + y_m)/2‖ ≥ ϱ(x). Hence

    ‖y_n − y_m‖² ≤ 2 ‖y_n − x‖² + 2 ‖x − y_m‖² − 4 ϱ(x)².

By the choice of (y_n), the first two terms each tend to 2ϱ(x)² as m, n → ∞. Thus

    lim_{m,n→∞} ‖y_n − y_m‖² ≤ 2(ϱ(x)² + ϱ(x)²) − 4 ϱ(x)² = 0,

hence (y_n) is a Cauchy sequence. Since H is complete, there exists an element c ∈ H such that lim_{n→∞} y_n = c. Since y_n ∈ C and C is closed, c ∈ C. By construction, we have ‖y_n − x‖ → ϱ(x). On the other hand, since y_n → c and the norm is continuous (see homework 42.1 (b)), we have ‖y_n − x‖ → ‖c − x‖. This implies ϱ(x) = ‖c − x‖.

Uniqueness. Let c, c′ be two such elements. Then, by the parallelogram law,

    0 ≤ ‖c − c′‖² = ‖(c − x) + (x − c′)‖²
                   = 2 ‖c − x‖² + 2 ‖x − c′‖² − 4 ‖x − (c + c′)/2‖²
                   ≤ 2(ϱ(x)² + ϱ(x)²) − 4 ϱ(x)² = 0.

This implies c = c′; the point c ∈ C which realizes the infimum is unique.
Theorem 13.7 (Riesz’s First Theorem) Let H_1 be a closed linear subspace of the Hilbert space H. Then

    H = H_1 ⊕ H_1⊥,

that is, any x ∈ H has a unique representation x = x_1 + x_2 with x_1 ∈ H_1 and x_2 ∈ H_1⊥.

Proof. Existence. Apply Lemma 13.6 to the convex, closed set H_1. There exists a unique x_1 ∈ H_1 such that

    ϱ(x) = inf{‖x − y‖ | y ∈ H_1} = ‖x − x_1‖ ≤ ‖x − x_1 − t y_1‖

for all t ∈ K and y_1 ∈ H_1. Homework 42.2 (c) now implies x_2 = x − x_1 ⊥ y_1 for all y_1 ∈ H_1. Hence x_2 ∈ H_1⊥. Therefore x = x_1 + x_2, and the existence of such a representation is shown.

Uniqueness. Suppose that x = x_1 + x_2 = x_1′ + x_2′ are two ways to write x as a sum of elements x_1, x_1′ ∈ H_1 and x_2, x_2′ ∈ H_1⊥. Then u := x_1 − x_1′ = x_2′ − x_2 belongs to both H_1 and H_1⊥ (by linearity of H_1 and H_1⊥). Hence ⟨u, u⟩ = 0, which implies u = 0. That is, x_1 = x_1′ and x_2 = x_2′.

Let x = x_1 + x_2 be as above. Then the mappings P_1(x) = x_1 and P_2(x) = x_2 are well-defined on H. They are called the orthogonal projections of H onto H_1 and H_2, respectively. We will consider projections in more detail later.

Example 13.3 Let H be a Hilbert space, z ∈ H, z ≠ 0, and H_1 = Kz the one-dimensional linear subspace spanned by the single vector z. Since any finite-dimensional subspace is closed, Riesz’s first theorem applies. We want to compute the projections of x ∈ H with respect to H_1 and H_1⊥. Let x_1 = αz; we have to determine α such that ⟨x − x_1, z⟩ = 0, that is,

    ⟨x − αz, z⟩ = ⟨x, z⟩ − α ⟨z, z⟩ = 0.

Hence

    α = ⟨x, z⟩ / ⟨z, z⟩ = ⟨x, z⟩ / ‖z‖².

The decomposition of x with respect to H_1 = Kz and H_1⊥ is

    x = (⟨x, z⟩/‖z‖²) z + ( x − (⟨x, z⟩/‖z‖²) z ).

(b) Riesz’s Representation Theorem

Recall from Section 11 that a linear functional on the vector space E is a mapping F : E → K such that F(λ_1 x_1 + λ_2 x_2) = λ_1 F(x_1) + λ_2 F(x_2) for all x_1, x_2 ∈ E and λ_1, λ_2 ∈ K. Let (E, ‖·‖) be a normed linear space over K.
Recall that a linear functional F : E → K is called continuous if x_n → x in E implies F(x_n) → F(x). The set of all continuous linear functionals on E forms a linear space E′, with the same linear operations as in the algebraic dual E*.

Now let (H, ⟨·,·⟩) be a Hilbert space. By Lemma 13.3, F_y : H → K, F_y(x) = ⟨x, y⟩, defines a continuous linear functional on H. Riesz’s representation theorem states that any continuous linear functional on H is of this form.

Theorem 13.8 (Riesz’s Representation Theorem) Let F be a continuous linear functional on the Hilbert space H. Then there exists a unique element y ∈ H such that

    F(x) = F_y(x) = ⟨x, y⟩  for all x ∈ H.

Proof. Existence. Let H_1 = ker F be the null-space of the linear functional F. H_1 is a linear subspace (since F is linear). H_1 is closed, since H_1 = F⁻¹({0}) is the preimage of the closed set {0} under the continuous map F. By Riesz’s first theorem, H = H_1 ⊕ H_1⊥.
Case 1: H_1⊥ = {0}. Then H = H_1 and F(x) = 0 for all x. We can choose y = 0; F(x) = ⟨x, 0⟩.
Case 2: H_1⊥ ≠ {0}. Choose u ∈ H_1⊥, u ≠ 0. Then F(u) ≠ 0 (otherwise u ∈ H_1⊥ ∩ H_1, so that ⟨u, u⟩ = 0, which implies u = 0). We have

    F( x − (F(x)/F(u)) u ) = F(x) − (F(x)/F(u)) F(u) = 0.

Hence x − (F(x)/F(u)) u ∈ H_1. Since u ∈ H_1⊥, we have

    0 = ⟨ x − (F(x)/F(u)) u, u ⟩ = ⟨x, u⟩ − (F(x)/F(u)) ⟨u, u⟩,

so that

    F(x) = (F(u)/‖u‖²) ⟨x, u⟩ = ⟨x, y⟩ = F_y(x),  where y = (conj(F(u))/‖u‖²) u.

Uniqueness. Suppose that both y_1, y_2 ∈ H give the same functional F, i.e. F(x) = ⟨x, y_1⟩ = ⟨x, y_2⟩ for all x. This implies ⟨x, y_1 − y_2⟩ = 0 for all x ∈ H. In particular, choosing x = y_1 − y_2 gives ‖y_1 − y_2‖² = 0; hence y_1 = y_2.

(c) Example

The continuous linear functionals on L²(X, µ) are exactly those of the form F(f) = ∫_X f ḡ dµ with some g ∈ L²(X, µ). Every continuous linear functional on ℓ² is given by

    F((x_n)) = Σ_{n=1}^∞ x_n conj(y_n),  with (y_n) ∈ ℓ².

13.1.4 Orthogonal Sets and Fourier Expansion

Motivation.
Let E = Rⁿ be the Euclidean space with the standard inner product and standard basis {e₁, …, eₙ}. Then we have, with xₖ = ⟨x, eₖ⟩,

  x = Σ_{k=1}^n xₖ eₖ,  ‖x‖² = Σ_{k=1}^n |xₖ|²,  ⟨x, y⟩ = Σ_{k=1}^n xₖ yₖ.

We want to generalize these formulas to arbitrary Hilbert spaces.

(a) Orthonormal Sets

Let (H, ⟨· , ·⟩) be a Hilbert space.

Definition 13.6 Let {xᵢ | i ∈ I} be a family of elements of H. {xᵢ} is called an orthogonal set or OS if ⟨xᵢ, xⱼ⟩ = 0 for i ≠ j. {xᵢ} is called an orthonormal set or NOS if ⟨xᵢ, xⱼ⟩ = δᵢⱼ for all i, j ∈ I.

Example 13.4 (a) H = ℓ², eₙ = (0, 0, …, 0, 1, 0, …) with the 1 at the nth component. Then {eₙ | n ∈ N} is an NOS in H.

(b) H = L²((0, 2π)) with the Lebesgue measure, ⟨f, g⟩ = ∫₀^{2π} f \overline{g} dλ. Then {1, sin(nx), cos(nx) | n ∈ N} is an OS in H, while

  {1/√(2π), sin(nx)/√π, cos(nx)/√π | n ∈ N}  and  {e^{inx}/√(2π) | n ∈ Z}

are orthonormal sets of H.

Lemma 13.9 (The Pythagorean Theorem) Let {x₁, …, xₖ} be an OS in H. Then

  ‖x₁ + ⋯ + xₖ‖² = ‖x₁‖² + ⋯ + ‖xₖ‖².

The easy proof is left to the reader.

Lemma 13.10 Let {xₙ} be an OS in H. Then Σ_{k=1}^∞ xₖ converges if and only if Σ_{k=1}^∞ ‖xₖ‖² converges. The proof is in the appendix.

(b) Fourier Expansion and Completeness

Throughout this paragraph let {xₙ | n ∈ N} be an NOS in the Hilbert space H.

Definition 13.7 The numbers ⟨x, xₙ⟩, n ∈ N, are called the Fourier coefficients of x ∈ H with respect to the NOS {xₙ}.

Example 13.5 Consider the NOS {1/√(2π), sin(nx)/√π, cos(nx)/√π | n ∈ N} from the previous example on H = L²((0, 2π)). Let f ∈ H. Then

  ⟨f, sin(nx)/√π⟩ = (1/√π) ∫₀^{2π} f(t) sin(nt) dt,
  ⟨f, cos(nx)/√π⟩ = (1/√π) ∫₀^{2π} f(t) cos(nt) dt,
  ⟨f, 1/√(2π)⟩ = (1/√(2π)) ∫₀^{2π} f(t) dt.

These are the usual Fourier coefficients, up to a factor. Note that we have another normalization than in Definition 6.3, since the inner product there has the factor 1/(2π).

Proposition 13.11 (Bessel's Inequality) For x ∈ H we have

  Σ_{k=1}^∞ |⟨x, xₖ⟩|² ≤ ‖x‖².  (13.5)

Proof.
Let n ∈ N and put yₙ = x − Σ_{k=1}^n ⟨x, xₖ⟩ xₖ. Then for m = 1, …, n,

  ⟨yₙ, xₘ⟩ = ⟨x, xₘ⟩ − Σ_{k=1}^n ⟨x, xₖ⟩ ⟨xₖ, xₘ⟩ = ⟨x, xₘ⟩ − Σ_{k=1}^n ⟨x, xₖ⟩ δₖₘ = 0.

Hence, {yₙ, ⟨x, x₁⟩x₁, …, ⟨x, xₙ⟩xₙ} is an OS. By Lemma 13.9,

  ‖x‖² = ‖yₙ + Σ_{k=1}^n ⟨x, xₖ⟩ xₖ‖² = ‖yₙ‖² + Σ_{k=1}^n |⟨x, xₖ⟩|² ‖xₖ‖² ≥ Σ_{k=1}^n |⟨x, xₖ⟩|²,

since ‖xₖ‖² = 1 for all k. Taking the supremum over all n on the right, the assertion follows.

Corollary 13.12 For any x ∈ H the series Σ_{k=1}^∞ ⟨x, xₖ⟩ xₖ converges in H.

Proof. Since {⟨x, xₖ⟩ xₖ} is an OS, by Lemma 13.10 the series converges if and only if the series Σ_{k=1}^∞ ‖⟨x, xₖ⟩ xₖ‖² = Σ_{k=1}^∞ |⟨x, xₖ⟩|² converges. By Bessel's inequality, this series converges.

We call Σ_{k=1}^∞ ⟨x, xₖ⟩ xₖ the Fourier series of x with respect to the NOS {xₖ}.

Remarks 13.1 (a) In general, the Fourier series of x does not converge to x.
(b) The NOS {1/√(2π), sin(nx)/√π, cos(nx)/√π} gives the ordinary Fourier series of a function f which is integrable over (0, 2π).

Theorem 13.13 Let {xₖ | k ∈ N} be an NOS in H. The following are equivalent:

(a) x = Σ_{k=1}^∞ ⟨x, xₖ⟩ xₖ for all x ∈ H, i.e. the Fourier series of x converges to x.
(b) ⟨z, xₖ⟩ = 0 for all k ∈ N implies z = 0, i.e. the NOS is maximal.
(c) For every x ∈ H we have ‖x‖² = Σ_{k=1}^∞ |⟨x, xₖ⟩|².
(d) If x ∈ H and y ∈ H, then ⟨x, y⟩ = Σ_{k=1}^∞ ⟨x, xₖ⟩ ⟨xₖ, y⟩.

Formula (c) is called Parseval's identity.

Definition 13.8 An orthonormal set {xᵢ | i ∈ N} which satisfies the above (equivalent) properties is called a complete orthonormal system, CNOS for short.

Proof. (a) → (d): Since the inner product is continuous in both components we have

  ⟨x, y⟩ = ⟨Σ_{k=1}^∞ ⟨x, xₖ⟩ xₖ , Σ_{n=1}^∞ ⟨y, xₙ⟩ xₙ⟩ = Σ_{k,n=1}^∞ ⟨x, xₖ⟩ \overline{⟨y, xₙ⟩} ⟨xₖ, xₙ⟩ = Σ_{k=1}^∞ ⟨x, xₖ⟩ ⟨xₖ, y⟩,

since ⟨xₖ, xₙ⟩ = δₖₙ.

(d) → (c): Put y = x.

(c) → (b): Suppose ⟨z, xₖ⟩ = 0 for all k.
By (c) we then have

  ‖z‖² = Σ_{k=1}^∞ |⟨z, xₖ⟩|² = 0;

hence z = 0.

(b) → (a): Fix x ∈ H and put y = Σ_{k=1}^∞ ⟨x, xₖ⟩ xₖ, which converges according to Corollary 13.12. With z = x − y we have for all positive integers n ∈ N

  ⟨z, xₙ⟩ = ⟨x − Σ_{k=1}^∞ ⟨x, xₖ⟩ xₖ , xₙ⟩ = ⟨x, xₙ⟩ − Σ_{k=1}^∞ ⟨x, xₖ⟩ ⟨xₖ, xₙ⟩ = ⟨x, xₙ⟩ − ⟨x, xₙ⟩ = 0.

This shows z = 0 and therefore x = y, i.e. the Fourier series of x converges to x.

Example 13.6 (a) H = ℓ², {eₙ | n ∈ N} is an NOS. We show that this NOS is complete. For, let x = (xₙ) be orthogonal to every eₙ, n ∈ N; that is, 0 = ⟨x, eₙ⟩ = xₙ. Hence x = (0, 0, …) = 0. By (b), {eₙ} is a CNOS.

What does the Fourier series of x look like? The Fourier coefficients of x are ⟨x, eₙ⟩ = xₙ, so that x = Σ_{n=1}^∞ xₙ eₙ is the Fourier series of x. The NOS {eₙ | n ≥ 2} is not complete.

(b) H = L²((0, 2π)). The systems

  {1/√(2π), sin(nx)/√π, cos(nx)/√π | n ∈ N}  and  {e^{inx}/√(2π) | n ∈ Z}

are both CNOSs in H. This was stated in Theorem 6.14.

(c) Existence of a CNOS in a Separable Hilbert Space

Definition 13.9 A metric space E is called separable if there exists a countable dense subset of E.

Example 13.7 (a) Rⁿ is separable. M = {(r₁, …, rₙ) | r₁, …, rₙ ∈ Q} is a countable dense set in Rⁿ.
(b) Cⁿ is separable. M = {(r₁ + is₁, …, rₙ + isₙ) | r₁, …, rₙ, s₁, …, sₙ ∈ Q} is a countable dense subset of Cⁿ.
(c) L²([a, b]) is separable. The polynomials {1, x, x², …} are linearly independent in L²([a, b]) and they can be orthonormalized via Schmidt's process. As a result we get a countable CNOS in L²([a, b]) (the Legendre polynomials in case −a = 1 = b). However, L²(R) contains no polynomial; in this case the Hermite functions, which are of the form pₙ(x) e^{−x²/2} with polynomials pₙ, form a countable CNOS. More generally, L²(G, λⁿ) is separable for any region G ⊂ Rⁿ with respect to the Lebesgue measure.
(d) Any Hilbert space is isomorphic to some L2 (X, µ) where µ is the counting measure on X; X = N gives ℓ2 . X uncountable gives a non-separable Hilbert space. Proposition 13.14 (Schmidt’s Orthogonalization Process) Let {yk } be an at most countable linearly independent subset of the Hilbert space H. Then there exists an NOS {xk } such that for every n lin {y1 , . . . , yn } = lin {x1 , . . . , xn }. The NOS can be computed recursively, y1 x1 := , ky1 k xn+1 = (yn+1 − n X k=1 hyn+1 , xk i xk )/ k∼k Corollary 13.15 Let {ek | k ∈ N} be an NOS where N = {1, . . . , n} for some n ∈ N or N = N. Suppose that H1 = lin {ek | k ∈ N} is the linear span of the NOS. Then P x1 = k∈N hx , ek i ek is the orthogonal projection of x ∈ H onto H1 . 13.1 The Geometry of the Hilbert Space 343 Proposition 13.16 (a) A Hilbert space H has an at most countable complete orthonormal system (CNOS) if and only if H is separable. (b) Let H be a separable Hilbert space. Then H is either isomorphic to Kn for some n ∈ N or to ℓ2 . 13.1.5 Appendix (a) The Inner Product constructed from an Inner Product Norm Proof of Proposition 13.5. We consider only the case K = R. Assume that the parallelogram identity is satisfied. We will show that hx , yi = 1 kx + yk2 − kx − yk2 4 defines a bilinear form on E. (a) We show Additivity. First note that the parallelogram identity implies 1 1 kx1 +x2 +yk2 + kx1 +x2 +yk2 2 2 1 1 2 2 = 2 kx1 +yk +2 kx2 k − kx1 −x2 +yk2 + 2 kx2 +yk2 +2 kx1 k2 − kx2 −x1 +yk2 2 2 1 2 2 2 2 = kx1 +yk + kx2 +yk + kx1 k + kx2 k − kx1 −x2 +yk2 + kx2 −x1 +yk2 2 kx1 +x2 +yk2 = Replacing y by −y, we have kx1 +x2 −yk2 = kx1 −yk2 + kx2 −yk2 + kx1 k2 + kx2 k2 − By definition and the above two formulas, 1 kx1 −x2 −yk2 + kx2 −x1 −yk2 . 2 1 kx1 + x2 + yk2 − kx1 + x2 − yk2 4 1 = kx1 + yk2 − kx1 − yk2 + kx2 + yk2 − kx2 − yk2 2 = hx1 , yi + hx2 , yi , hx1 + x2 , yi = that is, h· , ·i is additive in the first variable. It is obviously symmetric and hence additive in the second variable, too. 
(b) We show hλx , yi = λ hx , yi for all λ ∈ R, x, y ∈ E. By (a), h2x , yi = 2 hx , yi. By induction on n, hnx , yi = n hx , yi for all n ∈ N. Now let λ = m , m, n ∈ N. Then n Dm E D m E n hλx , yi = n x , y = n x , y = m hx , yi n n m =⇒ hλx , yi = hx , yi = λ hx , yi . n Hence, hλx , yi = λ hx , yi holds for all positive rational numbers λ. Suppose λ ∈ Q+ , then 0 = hx + (−x) , yi = hx , yi + h−x , yi 13 Hilbert Space 344 implies h−x , yi = − hx , yi and, moreover, h−λx , yi = −λ hx , yi such that the equation holds for all λ ∈ Q. Suppose that λ ∈ R is given. Then there exists a sequence (λn ), λn ∈ Q of rational numbers with λn → λ. This implies λn x → λx for all x ∈ E and, since k·k is continuous, hλx , yi = lim hλn x , yi = lim λn hx , yi = λ hx , yi . n→∞ n→∞ This completes the proof. (b) Convergence of Orthogonal Series We reformulate Lemma 13.10 P P∞ 2 Let {xn } be an OS in H. Then ∞ k=1 xk converges if and only if k=1 kxk k converges. P x of elements xi of a Hilbert space H is defined Note that the convergence of a series ∞ i=1 Pni to be the limit of the partial sums limn→∞ i=1 xi . In particular, the Cauchy criterion applies since H is complete: P The series yi converges if and only if for every ε > 0 there exists n0 ∈ N such n X yi < ε. that m, n ≥ n0 imply i=m P Pn 2 Proof. By the above discussion, ∞ k=1 xk converges if and only if k k=m xk k becomes small for sufficiently large m, n ∈ N. By the Pythagorean theorem this term equals n X k=m hence the series P kxk k2 ; xk converges, if and only if the series P kxk k2 converges. 13.2 Bounded Linear Operators in Hilbert Spaces 13.2.1 Bounded Linear Operators Let (E1 , k·k1 ) and (E2 , k·k2 ) be normed linear space. Recall that a linear map T : E1 → E2 is called continuous if xn −→ x in E1 implies T (xn ) −→ T (x) in E2 . Definition 13.10 (a) A linear map T : E1 → E2 is called bounded if there exist a positive real number C > 0 such that kT (x)k2 ≤ C kxk1 , for all x ∈ E1 . 
(13.6)

(b) Suppose that T : E₁ → E₂ is a bounded linear map. Then the operator norm of T is the smallest number C satisfying (13.6) for all x ∈ E₁, that is,

  ‖T‖ = inf{C > 0 | ∀ x ∈ E₁ : ‖T(x)‖₂ ≤ C ‖x‖₁}.

One can show that

(a) ‖T‖ = sup{‖T(x)‖₂/‖x‖₁ | x ∈ E₁, x ≠ 0},
(b) ‖T‖ = sup{‖T(x)‖₂ | ‖x‖₁ ≤ 1},
(c) ‖T‖ = sup{‖T(x)‖₂ | ‖x‖₁ = 1}.

Indeed, we may restrict ourselves to unit vectors since

  ‖T(αx)‖₂/‖αx‖₁ = |α| ‖T(x)‖₂/(|α| ‖x‖₁) = ‖T(x)‖₂/‖x‖₁.

This shows the equivalence of (a) and (c). Since ‖T(αx)‖₂ = |α| ‖T(x)‖₂, the suprema (b) and (c) are equal. That these suprema agree with the infimum defining ‖T‖ follows from the fact that the least upper bound is the infimum over all upper bounds. From (a) it follows that

  ‖T(x)‖₂ ≤ ‖T‖ ‖x‖₁.  (13.7)

Also, if S : E₁ → E₂ and T : E₂ → E₃ are bounded linear mappings, then T∘S is a bounded linear mapping with

  ‖T∘S‖ ≤ ‖T‖ ‖S‖.

Indeed, for x ≠ 0 one has by (13.7)

  ‖T(S(x))‖₃ ≤ ‖T‖ ‖S(x)‖₂ ≤ ‖T‖ ‖S‖ ‖x‖₁.

Hence, ‖(T∘S)(x)‖₃/‖x‖₁ ≤ ‖T‖ ‖S‖.

Proposition 13.17 For a linear map T : E₁ → E₂ of a normed space E₁ into a normed space E₂ the following are equivalent:

(a) T is bounded.
(b) T is continuous.
(c) T is continuous at one point of E₁.

Proof. (a) → (b). This follows from

  ‖T(x₁) − T(x₂)‖ = ‖T(x₁ − x₂)‖ ≤ ‖T‖ ‖x₁ − x₂‖,

and T is even uniformly continuous on E₁. (b) trivially implies (c).

(c) → (a). Suppose T is continuous at x₀. To each ε > 0 one can find δ > 0 such that ‖x − x₀‖ < δ implies ‖T(x) − T(x₀)‖ < ε. Let y = x − x₀. In other words, ‖y‖ < δ implies ‖T(y + x₀) − T(x₀)‖ = ‖T(y)‖ < ε. Suppose z ∈ E₁, ‖z‖ ≤ 1. Then ‖(δ/2)z‖ ≤ δ/2 < δ; hence ‖T((δ/2)z)‖ < ε. By linearity of T, ‖T(z)‖ < 2ε/δ. This shows ‖T‖ ≤ 2ε/δ.

Definition 13.11 Let E and F be normed linear spaces. Let L(E, F) denote the set of all bounded linear maps from E to F. In case E = F we simply write L(E) in place of L(E, F).
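Characterization (c) of the operator norm, the supremum of ‖T(x)‖ over the unit sphere, can be illustrated numerically. The following Python sketch is an added example: for the diagonal matrix A = diag(2, 1) on R², the operator norm is 2, and sampling unit vectors recovers it.

```python
import math

def apply(A, x):
    """Matrix-vector product A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def norm(x):
    return math.sqrt(sum(c * c for c in x))

A = [[2.0, 0.0], [0.0, 1.0]]   # ||A|| = 2 (attained at x = e1)
samples = 10000
best = 0.0
for k in range(samples):
    t = 2 * math.pi * k / samples
    x = [math.cos(t), math.sin(t)]          # a unit vector in R^2
    best = max(best, norm(apply(A, x)))
# best approximates sup{||A x|| : ||x|| = 1} = 2 from below
```

Sampling can only approach the supremum from below, which matches the definition: every value ‖A x‖ on the sphere is a lower bound for the candidate constants C in (13.6).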
Proposition 13.18 Let E and F be normed linear spaces. Then L (E, F ) is a normed linear space if we define the linear structure by (S + T )(x) = S(x) + T (x), (λT )(x) = λ T (x) for S, T ∈ L (E, F ), λ ∈ K. The operator norm kT k makes L (E, F ) a normed linear space. Note that L (E, F ) is complete if and only if F is complete. Example 13.8 (a) Recall that L (Kn , Km ) is a normed vector space with kAk ≤ P 1 2 2 , where A = (aij ) is the matrix representation of the linear operator A, see | a | ij i,j Proposition 7.1 (b) The space E ′ = L (E, K) of continuous linear functionals on E. (c) H = L2 ((0, 1)), g ∈ C([0, 1]), Tg (f )(t) = g(t)f (t) defines a bounded linear operator on H. (see homework) (d) H = L2 ((0, 1)), k(s, t) ∈ L2 ([0, 1] × [0, 1]). Then Z 1 k(s, t)f (s) ds, f ∈ H = L2 ([0, 1]) (Kf )(t) = 0 defines a continuous linear operator K ∈ L (H). We have Z 1 2 Z 1 2 2 | (Kf )(t) | = k(s, t)f (s) ds ≤ | k(s, t) | | f (s) | ds 0 0 Z 1 Z 1 2 ≤ | k(s, t) | ds | f (s) |2 ds C-S-I 0 0 Z 1 = | k(s, t) |2 ds kf k2H . 0 Hence, kK(f )k2H ≤ Z 1 0 Z 1 0 2 | k(s, t) | ds dt kf k2H kK(f )kH ≤ kkkL2 ([0,1]×[0,1]) kf kH . This shows Kf ∈ H and further, kKk ≤ kkkL2 ([0,1]2 ) . K is called an integral operator; K is compact, i. e. it maps the unit ball into a set whose closure is compact. (e) H = L2 (R), a ∈ R, (Va f )(t) = f (t − a), t ∈ R, defines a bounded linear operator called the shift operator. Indeed, Z Z 2 2 kVa f k2 = | f (t − a) | dt = | f (t) |2 dt = kf k22 ; R R 13.2 Bounded Linear Operators in Hilbert Spaces 347 since all quotients kVa (x)k / kxk = 1, kVa k = 1 . (f) H = ℓ2 . We define the right-shift S by S(x1 , x2 , . . . ) = (0, x1 , x2 , . . . ). 1 P∞ 2 2 Obviously, kS(x)k = kxk = | x | . Hence, kSk = 1. n n=1 1 (g) Let E1 = C ([0, 1]) and E2 = C([0, 1]). Define the differentiation operator (T f )(t) = f ′ (t). Let kf k1 = kf k2 = sup | f (t) |. Then T is linear but not bounded. Indeed, let fn (t) = 1 − tn . 
t∈[0,1] Then kfn k1 = 1 and T fn (t) = ntn−1 such that ktfn k2 = n. Thus, kT fn k2 / kfn k1 = n → +∞ as n → ∞. T is unbounded. However, if we put kf k1 = sup | f (t) | + sup | f ′ (t) | and kf k2 as before, then T is bounded t∈[0,1] t∈[0,1] since kT f k2 = sup | f ′ (t) | ≤ kf k1 =⇒ kT k ≤ 1. t∈[0,1] 13.2.2 The Adjoint Operator In this subsection H is a Hilbert space and L (H) the space of bounded linear operators on H. Let T ∈ L (H) be a bounded linear operator and y ∈ H. Then F (x) = hT (x) , yi defines a continuous linear functional on H. Indeed, | F (x) | = | hT (x) , yi | ≤ kT (x)k kyk ≤ kT k kyk kxk ≤ C kxk . | {z } CSI C Hence, F is bounded and therefore continuous. in particular, kF k ≤ kT k kyk By Riesz’s representation theorem, there exists a unique vector z ∈ H such that hT (x) , yi = F (x) = hx , zi . Note that by the above inequality kzk = kF k ≤ kT k kyk . (13.8) Suppose y1 is another element of H which corresponds to z1 ∈ H with hT (x) , y1 i = hx , z1 i . Finally, let u ∈ H be the element which corresponds to y + y1 , hT (x) , y + y1 i = hx , ui . Since the element u which is given by Riesz’s representation theorem is unique, we have u = z + z1 . Similarly, hT (x) , λyi = F (x) = hx , λzi shows that λz corresponds to λy. 13 Hilbert Space 348 Definition 13.12 The above correspondence y 7→ z is linear. Define the linear operator T ∗ by z = T ∗ (y). By definition, hT (x) , yi = x , T ∗ (y) , x, y ∈ H. (13.9) T ∗ is called the adjoint operator to T . Proposition 13.19 Let T, T1 , T2 ∈ L (H). Then T ∗ is a bounded linear operator with T ∗ = kT k. We have (a) (T1 + T2 )∗ = T1∗ + T2∗ and (b) (λ T )∗ = λ T ∗ . (c) (T1 T2 )∗ = T2∗ T1∗ . (d) If T is invertible in L (H), so is T ∗ , and we have (T ∗ )−1 = (T −1 )∗ . (e) (T ∗ )∗ = T . Proof. Inequality (13.8) shows that ∗ T (y) ≤ kT k kyk , By definition, this implies and T ∗ is bounded. Since y ∈ H. ∗ T ≤ kT k ∗ T (x) , y = hy , T ∗ (x)i = hT (y) , xi = hx , T (y)i , we get (T ∗ )∗ = T . 
We conclude kT k = T ∗∗ ≤ T ∗ ; such that T ∗ = kT k. (a). For x, y ∈ H we have h(T1 + T2 )(x) , yi = hT1 (x) + T2 (x) , yi = hT1 (x) , yi + hT2 (x) , yi = x , T1∗ (y) + x , T2∗ (y) = x , (T1∗ + T2∗ )(y) ; which proves (a). (c) and (d) are left to the reader. A mapping ∗ : A → A such that the above properties (a), (b), and (c) are satisfied is called an involution. An algebra with involution is called a ∗-algebra. We have seen that L (H) is a (non-commutative) ∗-algebra. An example of a commutative ∗-algebra is C(K) with the involution f ∗ (x) = f (x). Example 13.9 (Example 13.8 continued) (a) H = Cn , A = (aij ) ∈ M(n × n, C). Then A∗ = (bij ) has the matrix elements bij = aji . (b) H = L2 ([0, 1]), Tg∗ = Tg . (c) H = L2 (R), Va (f )(t) = f (t − a) (Shift operator), Va∗ = V−a . 13.2 Bounded Linear Operators in Hilbert Spaces 349 (d) H = ℓ2 . The right-shift S is defined by S((xn )) = (0, x1 , x2 , . . . ). We compute the adjoint S ∗. ∞ ∞ X X hS(x) , yi = xn−1 yn = xn yn+1 = h(x1 , x2 , . . . ) , (y2 , y3 , . . . )i . n=2 n=1 Hence, S ∗ ((yn )) = (y2 , y3 , . . . ) is the left-shift. 13.2.3 Classes of Bounded Linear Operators Let H be a complex Hilbert space. (a) Self-Adjoint and Normal Operators Definition 13.13 An operator A ∈ L (H) is called (a) self-adjoint, if A∗ = A, (b) normal , if A∗ A = A A∗ , A self-adjoint operator A is called positive, if hAx , xi ≥ 0 for all x ∈ H. We write A ≥ 0. If A and B are self-adjoint, we write A ≥ B if A − B ≥ 0. A crucial role in proving the simplest properties plays the polarization identity which generalizes the polarization identity from Subsection 13.1.2. However, this exist only in complex Hilbert spaces. 4 hA(x) , yi = hA(x + y) , x + yi − hA(x − y) , x − yi + + i hA(x + iy) , x + iyi − i hA(x − iy) , x − iyi . We use the identity as follows hA(x) , xi = 0 for all x ∈ H implies A = 0. Indeed, by the polarization identity, hA(x) , yi = 0 for all x, y ∈ H. In particular y = A(x) yields A(x) = 0 for all x; thus, A = 0. 
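For H = Cⁿ, Example 13.9 (a) identifies the adjoint with the conjugate transpose of the matrix. The following added Python sketch checks the defining relation ⟨T(x), y⟩ = ⟨x, T*(y)⟩ on concrete vectors; the matrix and vectors are arbitrary sample data.

```python
def inner(u, v):
    """<u, v> on C^n, conjugate-linear in the second argument."""
    return sum(a * b.conjugate() for a, b in zip(u, v))

def apply(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def adjoint(A):
    """Conjugate transpose: (A*)_{ij} = conj(A_{ji})."""
    n, m = len(A), len(A[0])
    return [[A[i][j].conjugate() for i in range(n)] for j in range(m)]

A = [[1 + 2j, 3j], [0j, 4 - 1j]]
x = [1j, 2 + 0j]
y = [3 + 1j, -1j]
lhs = inner(apply(A, x), y)             # <A x, y>
rhs = inner(x, apply(adjoint(A), y))    # <x, A* y>
# lhs == rhs, as (13.9) requires
```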
Remarks 13.2 (a) A is normal if and only if kA(x)k = A∗ (x) for all x ∈ H. Indeed, if A is normal, then for all x ∈ H we have A∗ A(x) , x = A A∗ (x) , x which imply kA(x)k2 = 2 hA(x) , A(x)i = A∗ (x) , A∗ (x) = A∗ (x) . On the other hand, the polarization identity and A∗ A(x) , x = A A∗ (x) , x implies (A∗ A − A A∗ )(x) , x = 0 for all x; hence A∗ A − A A∗ = 0 which proves the claim. (b) Sums and real scalar multiples of self-adjoint operators are self-adjoint. (c) The product AB of self-adjoint operators is self-adjoint if and only if A and B commute with each other, AB = BA. (d) A is self-adjoint if and only if hAx , xi is real for all x ∈ H. Proof. Let A∗ = A. Then hAx , xi = hx , Axi = hAx , xi is real; for the opposite direction hA(x) , xi = hx , A(x)i and the polarization identity yields hA(x) , yi = hx , A(y)i for all x, y; hence A∗ = A. 13 Hilbert Space 350 (b) Unitary and Isometric Operators Definition 13.14 Let T ∈ L (H). Then T is called (a) unitary, if T ∗ T = I = T T ∗ . (b) isometric, if kT (x)k = kxk for all x ∈ H. Proposition 13.20 (a) T is isometric if and only if T ∗ T = I and if and only if hT (x) , T (y)i = hx , yi for all x, y ∈ H. (b) T is unitary, if and only if T is isometric and surjective. (c) If S, T are unitary, so are ST and T −1 . The unitary operators of L (H) form a group. Proof. (a) T isometric yields hT (x) , T (x)i = hx , xi and further (T ∗ T − I)(x) , x = 0 for all x. The polarization identity implies T ∗ T = I. This implies (T ∗ T − I)(x) , y = 0, for all x, y ∈ H. Hence, hT (x) , T (y)i = hx , yi. Inserting y = x shows T is isometric. (b) Suppose T is unitary. T ∗ T = I shows T is isometric. Since T T ∗ = I, T is surjective. Suppose now, T is isometric and surjective. Since T is isometric, T (x) = 0 implies x = 0; hence, T is bijective with an inverse operator T −1 . Insert y = T −1 (z) into hT (x) , T (y)i = hx , yi. This gives hT (x) , zi = x , T −1 (z) , x, z ∈ H. Hence T −1 = T ∗ and therefore T ∗ T = T T ∗ = I. 
(c) is easy (see homework 45.4). Note that an isometric operator is injective with norm 1 (since kT (x)k / kxk = 1 for all x). In case H = Cn , the unitary operators on Cn form the unitary group U(n). In case H = Rn , the unitary operators on H form the orthogonal group O(n). Example 13.10 (a) H = L2 (R). The shift operator Va is unitary since Va Vb = Va+b . The multiplication operator Tg f = gf is unitary if and only if | g | = 1. Tg is self-adjoint (resp. positive) if and only if g is real (resp. positive). (b) H = ℓ2 , the right-shift S((xn )) = (0, x1 , x2 , . . . ) is isometric but not unitary since S is not surjective. S ∗ is not isometric since S ∗ (1, 0, . . . ) = 0; hence S ∗ is not injective. (c) Fourier transform. For f ∈ L1 (R) define Z 1 (Ff )(t) = √ e−itx f (x) dx. 2π R Let S(R) = {f ∈ C∞ (R) | supt∈R tn f (k) (t) < ∞, ∀ n, k ∈ Z+ }. S(R) is called the Schwartz space after Laurent Schwartz. We have S(R) ⊆ L1 (R) ∩ L2 (R), for example, f (x) = 2 e−x ∈ S(R). We will show later that F : S(R) → S(R) is bijective and norm preserving, kF(f )kL2 (R) = kf kL2 (R) , f ∈ S(R). F has a unique extension to a unitary operator on L2 (R). The inverse Fourier transform is Z 1 −1 (F f )(t) = √ eitx f (x) dx, f ∈ S(R). 2π R 13.2 Bounded Linear Operators in Hilbert Spaces 351 13.2.4 Orthogonal Projections (a) Riesz’s First Theorem—revisited Let H1 be a closed linear subspace. By Theorem 13.7 any x ∈ H has a unique decomposition x = x1 + x2 with x1 ∈ H1 and x2 ∈ H1⊥ . The map PH1 (x) = x1 is a linear operator from H to H, (see homework 44.1). PH1 is called the orthogonal projection from H onto the closed subspace H1 . Obviously, H1 is the image of PH1 ; in particular, PH1 is surjective if and only if H1 = H. In this case, PH = I is the identity. Since kPH1 (x)k2 = kx1 k2 ≤ kx1 k2 + kx2 k2 = kxk2 we have kPH1 k ≤ 1. If H1 6= {0}, there exists a non-zero x1 ∈ H1 such that kPH1 (x1 )k = kx1 k. This shows kPH1 k = 1. 
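Returning to Example 13.10 (b): the right shift S on ℓ² is isometric but not unitary, and its adjoint S* is the left shift. The added Python sketch below models sequences by finite lists (a truncation, so only finitely many components are nonzero).

```python
def right_shift(x):
    """S(x1, x2, ...) = (0, x1, x2, ...): isometric, not surjective."""
    return [0.0] + list(x)

def left_shift(y):
    """S*(y1, y2, ...) = (y2, y3, ...): the adjoint left shift."""
    return list(y[1:])

def norm_sq(x):
    return sum(c * c for c in x)

x = [1.0, 2.0, 3.0]
isometric = norm_sq(right_shift(x)) == norm_sq(x)              # ||S x|| = ||x||
recovers = left_shift(right_shift(x)) == x                     # S* S = I
loses_e1 = right_shift(left_shift([1.0, 0.0])) == [0.0, 0.0]   # S S* != I
```

That S*S = I while SS* kills the first basis vector is exactly why S is isometric yet not unitary, and why S* fails to be injective.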
Here is the algebraic characterization of orthogonal projections. Proposition 13.21 A linear operator P ∈ L (H) is an orthogonal projection if and only if P 2 = P and P ∗ = P . In this case H1 = {x ∈ H | P (x) = x}. Proof. “→”. Suppose that P = PH1 is the projection onto H1 . Since P is the identity on H1 , P 2 (x) = P (x1 ) = x1 = P (x) for all x ∈ H; hence P 2 = P . Let x = x1 + x2 and y = y1 + y2 be the unique decompositions of x and y in elements of H1 and H1⊥ , respectively. Then hP (x) , yi = hx1 , y1 + y2 i = hx1 , y1 i + hx1 , y2 i = hx1 , y1 i = hx1 + x2 , y1 i = hx , P (y)i , | {z } =0 that is, P ∗ = P . “←”. Suppose P 2 = P = P ∗ and put H1 = {x | P (x) = x}. First note, that for P 6= 0, H1 6= {0} is non-trivial. Indeed, since P (P (x)) = P (x), the image of P is part of the eigenspace of P to the eigenvalues 1, P (H) ⊂ H1 . Since for z ∈ H1 , P (z) = z, H1 ⊂ P (H) and thus H1 = P (H). Since P is continuous and {0} is closed, H1 = (P − I)−1 ({0}) is a closed linear subspace of H. By Riesz’s first theorem, H = H1 ⊕ H1⊥ . We have to show that P (x) = x1 for all x. Since P 2 = P , P (P (x)) = P (x) for all x; hence P (x) ∈ H1 . We show x − P (x) ∈ H1⊥ which completes the proof. For, let z ∈ H1 , then hx − P (x) , zi = hx , zi − hP (x) , zi = hx , zi − hx , P (z)i = hx , zi − hx , zi = 0. Hence x = P (x) + (I − P )(x) is the unique Riesz decomposition of x with respect to H1 and H1⊥ . Example 13.11 (a) Let {x1 , . . . , xn } be an NOS in H. Then P (x) = n X k=1 hx , xk i xk , x ∈ H, 13 Hilbert Space 352 defines the orthogonal projection P : H → H onto lin {x1 , . . . , xn }. Indeed, since P (xm ) = Pn 2 k=1 hxm , xk i xk = xm , P = P and since + * n n X X hP (x) , yi = hx , xk i hxk , yi = x , hxk , yi xk = hx , P (y)i . k=1 k=1 Hence, P ∗ = P and P is a projection. (b) H = L2 ([0, 1] ∪ [2, 3]), g ∈ C([0, 1] ∪ [2, 3]). For f ∈ H define Tg f = gf . Then Tg = (Tg )∗ if and only if g(t) is real for all t. 
Tg is an orthogonal projection if g(t)2 = g(t) such that g(t) = 0 or g(t) = 1. Since g is continuous, there are only four solutions: g1 = 0, g2 = 1, g3 = χ[0,1] , and g4 = χ[2,3] . In case of g3 , the subspace H1 can be identified with L2 ([0, 1]) since f ∈ H1 iff Tg f = f iff gf = f iff f (t) = 0 for all t ∈ [2, 3]. (b) Properties of Orthogonal Projections Throughout this paragraph let P1 and P2 be orthogonal projections on the closed subspaces H1 and H2 , respectively. Lemma 13.22 The following are equivalent. (a) P1 + P2 is an orthogonal projection. (b) P1 P2 = 0. (c) H1 ⊥ H2 . Proof. (a) → (b). Let P1 + P2 be a projection. Then ! (P1 + P2 )2 = P12 + P1 P2 + P2 P1 + P22 = P1 + P2 + P1 P2 + P2 P1 = P1 + P2 , hence P1 P2 + P2 P1 = 0. Multiplying this from the left by P1 and from the right by P1 yields P1 P2 + P1 P2 P1 = 0 = P1 P2 P1 + P2 P1 . This implies P1 P2 = P2 P1 and finally P1 P2 = P2 P1 = 0. (b) → (c). Let x1 ∈ H1 and x2 ∈ H2 . Then 0 = hP1 P2 (x2 ) , x1 i = hP2 (x2 ) , P1 (x1 )i = hx2 , x1 i . This shows H1 ⊥ H2 . (c) → (b). Let x, z ∈ H be arbitrary. Then hP1 P2 (x) , zi = hP2 (x) , P1 (z)i = hx2 , z1 i = 0; Hence P1 P2 (x) = 0 and therefore P1 P2 = 0. The same argument works for P2 P1 = 0. (b) → (a). Since P1 P2 = 0 implies P2 P1 = 0 (via H1 ⊥ H2 ), (P1 + P2 )∗ = P1∗ + P2∗ = P1 + P2 , (P1 + P2 )2 = P12 + P1 P2 + P2 P1 + P22 = P1 + 0 + 0 + P2 . 13.2 Bounded Linear Operators in Hilbert Spaces 353 Lemma 13.23 The following are equivalent P1 P2 is an orthogonal projection. P1 P2 = P2 P1 . (a) (b) In this case, P1 P2 is the orthogonal projection onto H1 ∩ H2 . Proof. (b) → (a). (P1 P2 )∗ = P2∗ P1∗ = P2 P1 = P1 P2 , by assumption. Moreover, (P1 P2 )2 = P1 P2 P1 P2 = P1 P1 P2 P2 = P1 P2 which completes this direction. (a) → (b). P1 P2 = (P1 P2 )∗ = P2∗ P1∗ = P2 P1 . Clearly, P1 P2 (H) ⊆ H1 and P2 P1 (H) ⊆ H2 ; hence P1 P2 (H) ⊆ H1 ∩ H2 . On the other hand x ∈ H1 ∩ H2 implies P1 P2 x = x. This shows P1 P2 (H) = H1 ∩ H2 . 
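A finite-dimensional sanity check of Proposition 13.21 (an added sketch, not from the original notes): for z ∈ R³ the matrix P = z zᵀ/‖z‖² is the orthogonal projection onto span{z}, so it must satisfy P² = P and Pᵀ = P.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

z = [1.0, 2.0, 2.0]                       # ||z||^2 = 9
nsq = sum(c * c for c in z)
P = [[zi * zj / nsq for zj in z] for zi in z]   # P = z z^T / ||z||^2

P2 = matmul(P, P)
sym = all(P[i][j] == P[j][i] for i in range(3) for j in range(3))   # P^T = P
idem = all(abs(P2[i][j] - P[i][j]) < 1e-12
           for i in range(3) for j in range(3))                     # P^2 = P
```

The same check applied to a non-orthogonal (oblique) projection matrix would satisfy P² = P but fail the symmetry test, which is the content of the proposition.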
The proof of the following lemma is quite similar to that of the previous two lemmas, so we omit it (see homework 40.5). Lemma 13.24 The following are equivalent. (a) H1 ⊆ H2 , (b) (d) P1 P2 = P1 , (e) P2 − P1 (c) is an orth. projection, (f) P1 ≤ P2 , P2 P1 = P1 , kP1 (x)k ≤ kP2 (x)k , x ∈ H. Proof. We show (d) ⇒ (c). From P1 ≤ P2 we conclude that I − P2 ≤ I − P2 . Note that both I − P1 and I − P2 are again orthogonal projections on H1⊥ and H2⊥ , respectively. Thus for all x ∈ H: k(I − P2 )P1 (x)k2 = h(I − P2 )P1 (x) , (I − P2 )P1 (x)i = (I − P2 )∗ (I − P2 )P1 (x) , P1 (x) = h(I − P2 )P1 (x) , P1 (x)i ≤ h(I − P1 )P1 (x) , P1 (x)i = hP1 (x) − P1 (x) , P1 (x)i = h0 , P1 (x)i = 0. Hence, k(I − P2 )P1 (x)k2 = 0 which implies (I − P2 )P1 = 0 and therefore P1 = P2 P1 . 13.2.5 Spectrum and Resolvent Let T ∈ L (H) be a bounded linear operator. (a) Definitions Definition 13.15 (a) The resolvent set of T , denoted by ρ(T ), is the set of all λ ∈ C such that there exists a bounded linear operator Rλ (T ) ∈ L (H) with Rλ (T )(T − λI) = (T − λI)Rλ (T ) = I, 13 Hilbert Space 354 i. e. there T − λI has a bounded (continuous) inverse Rλ (T ). We call Rλ (T ) the resolvent of T at λ. (b) The set C \ ρ(T ) is called the spectrum of T and is denoted by σ(T ). (c) λ ∈ C is called an eigenvalue of T if there exists a nonzero vector x, called eigenvector, with (T − λI)x = 0. The set of all eigenvalues is the point spectrum σp (T ) Remark 13.3 (a) Note that the point spectrum is a subset of the spectrum, σp (T ) ⊆ σ(T ). Suppose to the contrary, the eigenvalue λ with eigenvector y belongs to the resolvent set. Then there exists Rλ (T ) ∈ L (H) with y = Rλ (T )(T − λI)(y) = Rλ (T )(0) = 0 which contradicts the definition of an eigenvector; hence eigenvalues belong to the spectrum. (b) λ ∈ σp (T ) is equivalent to T − λI not being injective. It may happen that T − λI is not surjective, which also implies λ ∈ σ(T ) (see Example 13.12 (b) below). Example 13.12 (a) H = Cn , A ∈ M(n × n, C). 
Since in finite dimensional spaces T ∈ L (H) is injective if and only if T is surjective, σ(A) = σp (A). (b) H = L2 ([0, 1]). (T f )(x) = xf (x). We have σp (T ) = ∅. Indeed, suppose λ is an eigenvalue and f ∈ L 2 ([0, 1]) an eigenfunction to T , that is (T − λI)(f ) = 0; hence (x − λ)f (x) ≡ 0 a. e. on [0, 1]. Since x − λ is nonzero a. e. , f = 0 a. e. on [0, 1]. That is f = 0 in H which contradicts the definition of an eigenvector. We have C \ [0, 1] ⊆ ρ(T ). Suppose λ 6∈ [0, 1]. Since x − λ 6= 0 for all x ∈ [0, 1], g(x) = bounded) function on [0, 1]. Hence, (Rλ f )(x) = 1 is a continuous (hence x−λ 1 f (x) x−λ defines a bounded linear operator which is inverse to T − λI since 1 1 (T − λI) f (x) = (x − λ) f (x) = f (x). x−λ x−λ We have σ(T ) = [0, 1]. Suppose to the contrary that there exists λ ∈ ρ(T ) ∩ [0, 1]. Then there exists Rλ ∈ L (H) with Rλ (T − λI) = I. (13.10) By homework 39.5 (a), the norm of the multiplication operator Tg is less that or equal to kgk∞ (the supremum norm of g). Choose fε = χ(λ−ε,λ+ε) . Since χM = χ2M , k(T − λI)fε k = (x − λ)χUε (λ) (x)fε (x) ≤ sup (x − λ)χUε (λ) (x) kfε k . x∈[0,1] 13.2 Bounded Linear Operators in Hilbert Spaces However, sup (x − λ)χUε (λ) (x) = sup | x − λ | = ε. x∈[0,1] This shows 355 x∈Uε (λ) k(T − λI)fε k ≤ ε kfε k . Inserting fε into (13.10) we obtain kfε k = kRλ (T − λI)fε k ≤ kRλ k k(T − λI)fε k ≤ kRλ k ε kfε k which implies kRλ k ≥ 1/ε. This contradicts the boundedness of Rλ since ε > 0 was arbitrary. (b) Properties of the Spectrum Lemma 13.25 Let T ∈ L (H). Then σ(T ∗ ) = σ(T )∗ , (complex conjugation) ρ(T ∗ ) = ρ(T )∗ . Proof. Suppose that λ ∈ ρ(T ). Then there exists Rλ (T ) ∈ L (H) such that Rλ (T )(T − λI) = (T − λI)Rλ (T ) = I (R (T )(T − λI))∗ = ((T − λI)R )∗ = I λ λ (T − λI)Rλ (T )∗ = Rλ (t)∗ (T ∗ − λI) = I. ∗ This shows Rλ (T ∗ ) = Rλ (T )∗ is again a bounded linear operator on H. Hence, ρ(T ∗ ) ⊆ (ρ(T ))∗ . Since ∗ is an involution (T ∗∗ = T ), the opposite inclusion follows. 
Since σ(T ) is the complement of the resolvent set, the claim for the spectrum follows as well. For λ, µ, T and S we have Rλ (T ) − Rµ (T ) = (λ − µ)Rλ (T )Rµ (T ) = (λ − µ)Rµ (T )Rλ (T ), Rλ (T ) − Rλ (S) = Rλ (T )(S − T )Rλ (S). Proposition 13.26 (a) ρ(T ) is open and σ(T ) is closed. (b) If λ0 ∈ ρ(T ) and | λ − λ0 | < kRλ0 (T )k−1 then λ ∈ ρ(T ) and Rλ (T ) = ∞ X n=0 (λ − λ0 )n Rλ0 (T )n+1 . (c) If | λ | > kT k, then λ ∈ ρ(T ) and Rλ (T ) = − ∞ X n=0 λ−n−1 T n . 13 Hilbert Space 356 Proof. (a) follows from (b). (b) For brevity, we write Rλ0 in place of Rλ0 (T ). With q = | λ − λ0 | kRλ0 (T )k, q ∈ (0, 1) we have ∞ ∞ X X kRλ0 k n n+1 converges. | λ − λ0 | kRλ0 k = q n kRλ0 k = 1 − q n=0 n=0 P P By homework 38.4, xn converges if kxn k converges. Hence, B= ∞ X n=0 (λ − λ0 )n Rλn+1 0 converges in L (H) with respect to the operator norm. Moreover, (T − λI)B = (T − λ0 I)B − (λ − λ0 )B ∞ ∞ X X n n+1 = (λ − λ0 ) (T − λ0 I)Rλ0 − (λ − λ0 )n+1 Rλn+1 0 n=0 = n=0 ∞ X n=0 (λ − λ0 )n Rλn0 − = (λ − λ0 ) 0 Rλ0 0 = I. ∞ X n=0 (λ − λ0 )n+1 Rλn+1 0 Similarly, one shows B(T − λI) = I. Thus, Rλ (T ) = B. (c) Since | λ | > kT k, the series converges with respect to operator norm, say C=− We have (T − λI)C = − ∞ X λ ∞ X λ−n−1 T n . n=0 −n−1 T n+1 n=0 + ∞ X λ−n T n = λ0 T 0 = I. n=0 Similarly, C(T − λI) = I; hence Rλ (T ) = C. Remarks 13.4 (a) By (b), Rλ (T ) is a holomorphic (i. e. complex differentiable) function in the variable λ with values in L (H). One can use this to show that the spectrum is non-empty, σ(T ) 6= ∅. P n (b) If kT k < 1, T − I is invertible with inverse − ∞ n=0 T . (c) Proposition 13.26 (c) means: If λ ∈ σ(T ) then | λ | ≤ kT k. However, there is, in general, a smaller disc around 0 which contains the spectrum. By definition, the spectral radius r(T ) of T is the smallest non-negative number such that the spectrum is completely contained in the disc around 0 with radius r(T ): σ(Τ) 0 r(T) r(T ) = sup{| λ | | λ ∈ σ(T )}. 
13.2 Bounded Linear Operators in Hilbert Spaces 357 (d) λ ∈ σ(T ) implies λn ∈ σ(T n ) for all non-negative integers. Indeed, suppose λn ∈ ρ(T n ), that is B(T n − λn ) = (T n − λn )B = I for some bounded B. Hence, B n X k=0 T k λn−1−k (T − λ) = (T − λ)CB = I; thus λ ∈ ρ(T ). We shall refine the above statement and give a better upper bound for {| λ | | λ ∈ σ(T )} than kT k. Proposition 13.27 Let T ∈L (H) be a bounded linear operator. Then the spectral radius of T is 1 r(T ) = lim kT n k n . (13.11) n→∞ The proof is in the appendix. 13.2.6 The Spectrum of Self-Adjoint Operators Proposition 13.28 Let T = T ∗ be a self-adjoint operator in L (H). Then λ ∈ ρ(T ) if and only if there exists C > 0 such that k(T − λI)xk ≥ C kxk . Proof. Suppose that λ ∈ ρ(T ). Then there exists (a non-zero) bounded operator Rλ (T ) such that kxk = kRλ (T )(T − λI)xk ≤ kRλ (T )k k(T − λI)xk . Hence, k(T − λI)xk ≥ 1 kxk , kRλ (T )k x ∈ H. We can choose C = 1/ kRλ (T )k and the condition of the proposition is satisfied. Suppose, the condition is satisfied. We prove the other direction in 3 steps, i. e. T − λ0 I has a bounded inverse operator which is defined on the whole space H. Step 1. T − λI is injective. Suppose to the contrary that (T − λ)x1 = (T − λ)x2 . Then 0 = k(T − λ)(x1 − x2 )k ≥ C kx1 − x2 k , and kx1 − x2 k = 0 follows. That is x1 = x2 . Hence, T − λI is injective. Step 2. H1 = (T − λI)H, the range of T − λI is closed. Suppose that yn = (T − λI)xn , xn ∈ H, converges to some y ∈ H. We want to show that y ∈ H1 . Clearly (yn ) is a Cauchy sequence such that kym − yn k → 0 as m, n → ∞. By assumption, kym − yn k = k(T − λI)(xn − xm )k ≥ C kxn − xm k . Thus, (xn ) is a Cauchy sequence in H. Since H is complete, xn → x for some x ∈ H. Since T − λI is continuous, yn = (T − λI)xn −→ (T − λI)x. n→∞ 13 Hilbert Space 358 Hence, y = (T − λI)x and H1 is a closed subspace. Step 3. H1 = H. By Riesz first theorem, H = H1 ⊕ H1⊥ . We have to show that H1⊥ = {0}. 
Let $u \in H_1^\perp$; then, since $T^* = T$,
\[ 0 = \langle (T-\lambda I)x, u\rangle = \langle x, (T-\bar\lambda I)u\rangle \quad \text{for all } x \in H. \]
This shows $(T-\bar\lambda I)u = 0$, hence $T(u) = \bar\lambda u$. This implies
\[ \langle T(u), u\rangle = \bar\lambda \langle u, u\rangle. \]
However, $T = T^*$ implies that the left side is real, by Remark 13.2 (d). Hence $\bar\lambda = \lambda$ is real. We conclude $(T-\lambda I)u = 0$. By injectivity of $T - \lambda I$, $u = 0$. That is, $H_1 = H$.

We have shown that there exists a linear operator $S = (T-\lambda I)^{-1}$ which is inverse to $T - \lambda I$ and defined on the whole space $H$. Since
\[ \|y\| = \|(T-\lambda I)S(y)\| \ge C\|S(y)\|, \]
$S$ is bounded with $\|S\| \le 1/C$. Hence $S = R_\lambda(T)$.

Note that for any bounded real function $f(x,y)$ we have
\[ \sup_{x,y} f(x,y) = \sup_x\Bigl(\sup_y f(x,y)\Bigr) = \sup_y\Bigl(\sup_x f(x,y)\Bigr). \]
In particular, $\|x\| = \sup_{\|y\|\le 1} |\langle x, y\rangle|$, since $y = x/\|x\|$ attains the supremum and the Cauchy–Schwarz inequality gives the upper bound. Further, $\|T(x)\| = \sup_{\|y\|\le 1} |\langle T(x), y\rangle|$, such that
\[ \|T\| = \sup_{\|x\|\le 1}\,\sup_{\|y\|\le 1} |\langle T(x), y\rangle| = \sup_{\|x\|\le 1,\, \|y\|\le 1} |\langle T(x), y\rangle| = \sup_{\|y\|\le 1}\,\sup_{\|x\|\le 1} |\langle T(x), y\rangle|. \]
In the case of self-adjoint operators we can improve on this.

Proposition 13.29 Let $T = T^* \in L(H)$. Then we have
\[ \|T\| = \sup_{\|x\|\le 1} |\langle T(x), x\rangle|. \tag{13.12} \]

Proof. Let $C = \sup_{\|x\|\le 1} |\langle T(x), x\rangle|$. By the Cauchy–Schwarz inequality, $|\langle T(x), x\rangle| \le \|T\|\,\|x\|^2$, such that $C \le \|T\|$. For any real $\alpha > 0$ we have
\[ \|T(x)\|^2 = \langle T(x), T(x)\rangle = \langle T^2(x), x\rangle \]
\[ = \tfrac14\Bigl( \bigl\langle T(\alpha x + \alpha^{-1}T(x)),\, \alpha x + \alpha^{-1}T(x)\bigr\rangle - \bigl\langle T(\alpha x - \alpha^{-1}T(x)),\, \alpha x - \alpha^{-1}T(x)\bigr\rangle \Bigr) \]
\[ \le \frac{C}{4}\Bigl( \|\alpha x + \alpha^{-1}T(x)\|^2 + \|\alpha x - \alpha^{-1}T(x)\|^2 \Bigr) \overset{\text{P.I.}}{=} \frac{C}{2}\Bigl( \|\alpha x\|^2 + \|\alpha^{-1}T(x)\|^2 \Bigr) = \frac{C}{2}\Bigl( \alpha^2\|x\|^2 + \alpha^{-2}\|T(x)\|^2 \Bigr), \]
where P.I. denotes the parallelogram identity. Inserting $\alpha^2 = \|T(x)\|/\|x\|$ we obtain
\[ \|T(x)\|^2 \le \frac{C}{2}\bigl( \|T(x)\|\,\|x\| + \|x\|\,\|T(x)\| \bigr) = C\,\|T(x)\|\,\|x\|, \]
which implies $\|T(x)\| \le C\|x\|$. Thus $\|T\| = C$.

Let $m = \inf_{\|x\|=1} \langle T(x), x\rangle$ and $M = \sup_{\|x\|=1} \langle T(x), x\rangle$ denote the lower and upper bound of $T$. Then we have
\[ \sup_{\|x\|\le 1} |\langle T(x), x\rangle| = \max\{|m|, M\} = \|T\|, \]
and
\[ m\|x\|^2 \le \langle T(x), x\rangle \le M\|x\|^2 \quad \text{for all } x \in H. \]

Corollary 13.30 Let $T = T^* \in L(H)$ be a self-adjoint operator. Then $\sigma(T) \subset [m, M]$.

Proof.
Suppose that $\lambda_0 \notin [m, M]$. Then $C := \inf_{\mu\in[m,M]} |\lambda_0 - \mu| > 0$. Since $m = \inf_{\|x\|=1}\langle T(x), x\rangle$ and $M = \sup_{\|x\|=1}\langle T(x), x\rangle$, we have for $\|x\| = 1$
\[ \|(T-\lambda_0 I)x\| = \underbrace{\|x\|}_{1}\,\|(T-\lambda_0 I)x\| \overset{\text{CSI}}{\ge} |\langle (T-\lambda_0 I)x, x\rangle| = \Bigl|\underbrace{\langle T(x), x\rangle}_{\in[m,M]} - \lambda_0\underbrace{\|x\|^2}_{1}\Bigr| \ge C. \]
This implies $\|(T-\lambda_0 I)x\| \ge C\|x\|$ for all $x \in H$. By Proposition 13.28, $\lambda_0 \in \rho(T)$.

Example 13.13 (a) Let $H = L^2[0,1]$, $g \in C[0,1]$ a real-valued function, and $(T_g f)(t) = g(t)f(t)$. Let $m = \inf_{t\in[0,1]} g(t)$ and $M = \sup_{t\in[0,1]} g(t)$. One proves that $m$ and $M$ are the lower and upper bounds of $T_g$, such that $\sigma(T_g) \subseteq [m, M]$. Since $g$ is continuous, by the intermediate value theorem, $\sigma(T_g) = [m, M]$.
(b) Let $T = T^* \in L(H)$ be self-adjoint. Then all eigenvalues of $T$ are real, and eigenvectors to different eigenvalues are orthogonal to each other.
Proof. The first statement is clear from Corollary 13.30. Suppose that $T(x) = \lambda x$ and $T(y) = \mu y$ with $\lambda \neq \mu$. Then
\[ \lambda\langle x, y\rangle = \langle T(x), y\rangle = \langle x, T(y)\rangle = \langle x, \mu y\rangle = \mu\langle x, y\rangle, \]
since $\mu$ is real. Because $\lambda \neq \mu$, $\langle x, y\rangle = 0$.
The statement about orthogonality holds for arbitrary normal operators.

Appendix: Compact Self-Adjoint Operators in Hilbert Space

Proof of Proposition 13.27. From the theory of power series, Theorem 2.34, we know that the series
\[ -z\sum_{n=0}^\infty T^n z^n \tag{13.13} \]
converges if $|z| < R$ and diverges if $|z| > R$, where
\[ R = \frac{1}{\varlimsup_{n\to\infty} \|T^n\|^{1/n}}. \tag{13.14} \]
Inserting $z = 1/\lambda$ and using homework 38.4, the series
\[ -\sum_{n=0}^\infty \lambda^{-n-1} T^n \]
diverges if $|\lambda| < \varlimsup_{n\to\infty} \|T^n\|^{1/n}$ (and converges if $|\lambda| > \varlimsup_{n\to\infty} \|T^n\|^{1/n}$). The reason for the divergence of the power series is that the spectrum $\sigma(T)$ and the circle with radius $\varlimsup_{n\to\infty} \|T^n\|^{1/n}$ have points in common; hence
\[ r(T) = \varlimsup_{n\to\infty} \|T^n\|^{1/n}. \]
On the other hand, by Remark 13.4 (d), $\lambda \in \sigma(T)$ implies $\lambda^n \in \sigma(T^n)$; hence, by Remark 13.4 (c),
\[ |\lambda^n| \le \|T^n\| \implies |\lambda| \le \|T^n\|^{1/n}. \]
Taking the supremum over all $\lambda \in \sigma(T)$ on the left and the lower limit over $n$ on the right, we have
\[ r(T) \le \varliminf_{n\to\infty} \|T^n\|^{1/n} \le \varlimsup_{n\to\infty} \|T^n\|^{1/n} = r(T). \]
Hence, the sequence $\|T^n\|^{1/n}$ converges to $r(T)$ as $n$ tends to $\infty$.

Compact operators generalize finite rank operators. Integral operators on compact sets are compact.

Definition 13.16 A linear operator $T \in L(H)$ is called compact if the closure $\overline{T(U_1)}$ of the image of the unit ball $U_1 = \{x \mid \|x\| \le 1\}$ is compact in $H$. In other words, for every sequence $(x_n)$, $x_n \in U_1$, there exists a subsequence $(x_{n_k})$ such that $T(x_{n_k})$ converges.

Proposition 13.31 For $T \in L(H)$ the following are equivalent:
(a) $T$ is compact.
(b) $T^*$ is compact.
(c) For every sequence $(x_n)$ which converges weakly, i.e. $\langle x_n, y\rangle \to \langle x, y\rangle$ for all $y \in H$, we have $T(x_n) \to T(x)$ in norm.
(d) There exists a sequence $(T_n)$ of operators of finite rank such that $\|T - T_n\| \to 0$.

Definition 13.17 Let $T$ be an operator on $H$ and $H_1$ a closed subspace of $H$. We call $H_1$ a reducing subspace if both $H_1$ and $H_1^\perp$ are $T$-invariant, i.e. $T(H_1) \subset H_1$ and $T(H_1^\perp) \subset H_1^\perp$.

Proposition 13.32 Let $T \in L(H)$ be normal.
(a) The eigenspace $\ker(T-\lambda I)$ is a reducing subspace for $T$ and $\ker(T-\lambda I) = \ker\bigl((T-\lambda I)^*\bigr)$.
(b) If $\lambda, \mu$ are distinct eigenvalues of $T$, then $\ker(T-\lambda I) \perp \ker(T-\mu I)$.

Proof. (a) Since $T$ is normal, so is $T - \lambda I$. Hence $\|(T-\lambda I)(x)\| = \|(T-\lambda I)^*(x)\|$ for all $x$. Thus $\ker(T-\lambda I) = \ker\bigl((T-\lambda I)^*\bigr)$. In particular, $T^*(x) = \bar\lambda x$ if $x \in \ker(T-\lambda I)$. We show invariance. Let $x \in \ker(T-\lambda I)$; then $T(x) = \lambda x \in \ker(T-\lambda I)$. Similarly, $x \in \ker(T-\lambda I)^\perp$ and $y \in \ker(T-\lambda I)$ imply
\[ \langle T(x), y\rangle = \langle x, T^*(y)\rangle = \langle x, \bar\lambda y\rangle = \lambda\langle x, y\rangle = 0. \]
Hence, $\ker(T-\lambda I)^\perp$ is $T$-invariant, too.
(b) Let $T(x) = \lambda x$ and $T(y) = \mu y$. Then (a) and $T^*(y) = \bar\mu y$ imply
\[ \lambda\langle x, y\rangle = \langle T(x), y\rangle = \langle x, T^*(y)\rangle = \langle x, \bar\mu y\rangle = \mu\langle x, y\rangle. \]
Thus $(\lambda - \mu)\langle x, y\rangle = 0$; since $\lambda \neq \mu$, $x \perp y$.

Theorem 13.33 (Spectral Theorem for Compact Self-Adjoint Operators) Let $H$ be an infinite dimensional separable Hilbert space and $T \in L(H)$ compact and self-adjoint.
Then there exists a real sequence $(\lambda_n)$ with $\lambda_n \to 0$ as $n\to\infty$ and a CNOS $\{e_n \mid n \in \mathbb{N}\} \cup \{f_k \mid k \in N\}$, $N \subset \mathbb{N}$, such that
\[ T(e_n) = \lambda_n e_n, \quad n \in \mathbb{N}, \qquad T(f_k) = 0, \quad k \in N. \]
Moreover,
\[ T(x) = \sum_{n=1}^\infty \lambda_n \langle x, e_n\rangle e_n, \quad x \in H. \tag{13.15} \]

Remarks 13.5 (a) Since $\{e_n\} \cup \{f_k\}$ is a CNOS, any $x \in H$ can be written as its Fourier series
\[ x = \sum_{n=1}^\infty \langle x, e_n\rangle e_n + \sum_{k\in N} \langle x, f_k\rangle f_k. \]
Applying $T$ and using $T(e_n) = \lambda_n e_n$ and $T(f_k) = 0$, we have
\[ T(x) = \sum_{n=1}^\infty \lambda_n \langle x, e_n\rangle e_n, \]
which establishes (13.15). The main point is the existence of a CNOS of eigenvectors $\{e_n\} \cup \{f_k\}$.
(b) In case $H = \mathbb{C}^n$ ($\mathbb{R}^n$), the theorem says that any hermitian (symmetric) matrix $A$ is diagonalizable with only real eigenvalues.

Chapter 14 Complex Analysis

Here are some useful textbooks on Complex Analysis: [FL88] (in German), [Kno78] (in German), [Nee97], [Rüh83] (in German), [Hen88]. The main part of this chapter deals with holomorphic functions, which is another name for functions that are complex differentiable in an open set. On the one hand, we are already familiar with a huge class of holomorphic functions: polynomials, the exponential function, the sine and cosine functions. On the other hand, holomorphic functions possess quite amazing properties, completely unusual from the viewpoint of real analysis, and these properties are very strong. For example, it is easy to construct a real function which is 17 times differentiable but not 18 times; a complex differentiable function (in a small region) is automatically infinitely often differentiable. Good references are Ahlfors [Ahl78]; a little harder is Conway [Con78], easier is Howie [How03].

14.1 Holomorphic Functions

14.1.1 Complex Differentiation

We start with some notations.
\[
\begin{aligned}
U_r &= \{z \mid |z| < r\} && \text{open ball of radius } r \text{ around } 0,\\
U_R(a) &= \{z \mid |z-a| < R\} && \text{open ball of radius } R \text{ around } a,\\
\overline{U_r} &= \{z \mid |z| \le r\} && \text{closed ball of radius } r \text{ around } 0,\\
U_r^{\circ} &= \{z \mid 0 < |z| < r\} && \text{punctured ball of radius } r \text{ around } 0,\\
S_r &= \{z \mid |z| = r\} && \text{circle of radius } r \text{ around } 0.
\end{aligned}
\]

Definition 14.1 Let $U \subset \mathbb{C}$ be an open subset of $\mathbb{C}$ and $f: U \to \mathbb{C}$ a complex function.
(a) If $z_0 \in U$ and the limit
\[ \lim_{z\to z_0} \frac{f(z) - f(z_0)}{z - z_0} =: f'(z_0) \]
exists, we call $f$ complex differentiable at $z_0$ and $f'(z_0)$ the derivative of $f$ at $z_0$.
(b) If $f$ is complex differentiable at every $z_0 \in U$, we say that $f$ is holomorphic in $U$. We call $f'$ the derivative of $f$ on $U$.
(c) $f$ is holomorphic at $z_0$ if it is complex differentiable in a certain neighborhood of $z_0$.

To be quite explicit, $f'(z_0)$ exists if for every $\varepsilon > 0$ there exists some $\delta > 0$ such that $z \in U_\delta(z_0)$, $z \neq z_0$, implies
\[ \left| \frac{f(z) - f(z_0)}{z - z_0} - f'(z_0) \right| < \varepsilon. \]

Remarks 14.1 (a) Differentiability of $f$ at $z_0$ forces $f$ to be continuous at $z_0$. Indeed, $f$ is differentiable at $z_0$ with derivative $f'(z_0)$ if and only if there exists a function $r(z, z_0)$ such that
\[ f(z) = f(z_0) + f'(z_0)(z - z_0) + (z - z_0)\, r(z, z_0), \quad \text{where } \lim_{z\to z_0} r(z, z_0) = 0. \]
In particular, taking the limit $z \to z_0$ in the above equation we get $\lim_{z\to z_0} f(z) = f(z_0)$, which proves continuity at $z_0$. Complex conjugation is a uniformly continuous function on $\mathbb{C}$ since $|\bar z - \bar z_0| = |z - z_0|$ for all $z, z_0 \in \mathbb{C}$.
(b) The derivative satisfies the well-known sum, product, and quotient rules; that is, if both $f$ and $g$ are holomorphic in $U$, so are $f+g$, $fg$, and $f/g$, provided $g \neq 0$ in $U$, and we have
\[ (f+g)' = f' + g', \qquad (fg)' = f'g + fg', \qquad \left(\frac{f}{g}\right)' = \frac{f'g - fg'}{g^2}. \]
Also, the chain rule holds: if $U \xrightarrow{\,f\,} V \xrightarrow{\,g\,} \mathbb{C}$ are holomorphic, so is $g\circ f$, and $(g\circ f)'(z) = g'(f(z))\, f'(z)$. The proofs are exactly the same as in the real case.
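The differentiation rules can be sanity-checked numerically. The following sketch (the test functions, test point, and step size are arbitrary illustrative choices) compares central difference quotients along the real direction with the product and chain rules:

```python
import cmath

# Central difference quotient for a complex function (real step h);
# for holomorphic f this approximates the complex derivative f'(z).
def d(f, z, h=1e-5):
    return (f(z + h) - f(z - h)) / (2 * h)

f, g = cmath.exp, lambda z: z * z + 1    # illustrative test functions
df, dg = cmath.exp, lambda z: 2 * z      # their known derivatives
z = 0.4 - 0.3j

# product rule: (fg)' = f'g + fg'
assert abs(d(lambda w: f(w) * g(w), z) - (df(z) * g(z) + f(z) * dg(z))) < 1e-8
# chain rule: (g o f)' = g'(f(z)) f'(z)
assert abs(d(lambda w: g(f(w)), z) - dg(f(z)) * df(z)) < 1e-8
```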
Since the constant functions $f(z) = c$ and the identity $f(z) = z$ are holomorphic in $\mathbb{C}$, so is every polynomial with complex coefficients and, moreover, every rational function (quotient of two polynomials) $f: U \to \mathbb{C}$, provided the denominator has no zeros in $U$. So we already know a large class of holomorphic functions. Another, bigger class are the convergent power series.

Example 14.1 $f(z) = |z|^2$ is complex differentiable at $0$ with $f'(0) = 0$; $f$ is not differentiable at $z_0 = 1$. Indeed,
\[ \lim_{h\to 0} \frac{f(0+h) - f(0)}{h} = \lim_{h\to 0} \frac{|h|^2}{h} = \lim_{h\to 0} \bar h = 0. \]
On the other hand, let $\varepsilon \in \mathbb{R}$. Then
\[ \lim_{\varepsilon\to 0} \frac{|1+\varepsilon|^2 - 1}{\varepsilon} = \lim_{\varepsilon\to 0} \frac{2\varepsilon + \varepsilon^2}{\varepsilon} = 2, \]
whereas
\[ \lim_{\varepsilon\to 0} \frac{|1+i\varepsilon|^2 - 1}{i\varepsilon} = \lim_{\varepsilon\to 0} \frac{1 + \varepsilon^2 - 1}{i\varepsilon} = \lim_{\varepsilon\to 0} \frac{\varepsilon}{i} = 0. \]
This shows that $f'(1)$ does not exist.

14.1.2 Power Series

Recall from Subsection 2.3.9 that a power series $\sum c_n z^n$ has a radius of convergence
\[ R = \frac{1}{\varlimsup_{n\to\infty} |c_n|^{1/n}}. \]
That is, the series converges absolutely for all $z$ with $|z| < R$; the series diverges for $|z| > R$; the behaviour for $|z| = R$ depends on the $(c_n)$. Moreover, it converges uniformly on every closed ball $\overline{U_r}$ with $0 < r < R$, see Proposition 6.4. We already know that a real power series can be differentiated termwise, see Corollary 6.11. We will see that power series are holomorphic inside their radius of convergence.

Proposition 14.1 Let $a \in \mathbb{C}$ and
\[ f(z) = \sum_{n=0}^\infty c_n (z-a)^n \tag{14.1} \]
be a power series with radius of convergence $R$. Then $f: U_R(a) \to \mathbb{C}$ is holomorphic, and the derivative is
\[ f'(z) = \sum_{n=1}^\infty n c_n (z-a)^{n-1}. \tag{14.2} \]

Proof. If the series (14.1) converges in $U_R(a)$, the root test shows that the series (14.2) also converges there. Without loss of generality, take $a = 0$. Denote the sum of the series (14.2) by $g(z)$, fix $w \in U_R(0)$ and choose $r$ so that $|w| < r < R$. If $z \neq w$, we have
\[ \frac{f(z) - f(w)}{z - w} - g(w) = \sum_{n=0}^\infty c_n \left( \frac{z^n - w^n}{z - w} - n w^{n-1} \right). \]
The expression in the brackets is $0$ for $n = 0$ and $n = 1$.
For $n \ge 2$ it is (by direct computation)
\[ \frac{z^n - w^n}{z - w} - n w^{n-1} = (z - w) \sum_{k=1}^{n-1} k w^{k-1} z^{n-k-1}, \tag{14.3} \]
since
\[ (z - w)\sum_{k=1}^{n-1} k w^{k-1} z^{n-k-1} = \sum_{k=1}^{n-1} \bigl( k w^{k-1} z^{n-k} - k w^k z^{n-k-1} \bigr), \]
which gives a telescoping sum if we shift $k := k+1$ in the first summand. If $|z| < r$, the absolute value of the sum on the right of (14.3), without the factor $(z-w)$, is less than
\[ \frac{n(n-1)}{2}\, r^{n-2}, \]
so
\[ \left| \frac{f(z) - f(w)}{z - w} - g(w) \right| \le |z - w| \sum_{n=2}^\infty n^2 |c_n|\, r^{n-2}. \tag{14.4} \]
Since $r < R$, the last series converges. Hence the left side of (14.4) tends to $0$ as $z \to w$. This says that $f'(w) = g(w)$, and completes the proof.

Corollary 14.2 Since $f'(z)$ is again a power series with the same radius of convergence $R$, the proposition can be applied to $f'(z)$. It follows that $f$ has derivatives of all orders and that each derivative has a power series expansion around $a$,
\[ f^{(k)}(z) = \sum_{n=k}^\infty n(n-1)\cdots(n-k+1)\, c_n (z-a)^{n-k}. \tag{14.5} \]
Inserting $z = a$ implies
\[ f^{(k)}(a) = k!\, c_k, \quad k = 0, 1, 2, \dots \]
This shows that the coefficients $c_n$ in the power series expansion $f(z) = \sum_{n=0}^\infty c_n (z-a)^n$ of $f$ with midpoint $a$ are unique.

Example 14.2 The exponential function $e^z = \sum_{n=0}^\infty \frac{z^n}{n!}$ is holomorphic on the whole complex plane with $(e^z)' = e^z$; similarly, the trigonometric functions $\sin z$ and $\cos z$ are holomorphic in $\mathbb{C}$ since
\[ \sin z = \sum_{n=0}^\infty (-1)^n \frac{z^{2n+1}}{(2n+1)!}, \qquad \cos z = \sum_{n=0}^\infty (-1)^n \frac{z^{2n}}{(2n)!}. \]
We have $(\sin z)' = \cos z$ and $(\cos z)' = -\sin z$.

Definition 14.2 A complex function which is defined and holomorphic on the entire complex plane is called an entire function.

14.1.3 Cauchy–Riemann Equations

Let us identify the complex field $\mathbb{C}$ and the two-dimensional real plane $\mathbb{R}^2$ via $z = x + iy$; that is, every complex number $z$ corresponds to an ordered pair $(x, y)$ of real numbers. In this way, a complex function $w = f(z)$ corresponds to a function $U \to \mathbb{R}^2$, where $U \subset \mathbb{C}$ is open. We have $w = u + iv$, where $u = u(x,y)$ and $v = v(x,y)$ are the real and imaginary parts of the function $f$; $u = \operatorname{Re} w$ and $v = \operatorname{Im} w$.
Problem: What is the relation between complex differentiability of $f$ and the differentiability of $f$ as a function from $\mathbb{R}^2$ to $\mathbb{R}^2$?

Proposition 14.3 Let $U \subset \mathbb{C}$ be open, $a \in U$, and $f: U \to \mathbb{C}$ a function. Then the following are equivalent:
(a) $f$ is complex differentiable at $a$.
(b) $f(x,y) = u(x,y) + iv(x,y)$ is real differentiable at $a$ as a function $f: U \subset \mathbb{R}^2 \to \mathbb{R}^2$, and the Cauchy–Riemann equations are satisfied at $a$:
\[ \frac{\partial u}{\partial x}(a) = \frac{\partial v}{\partial y}(a), \qquad \frac{\partial u}{\partial y}(a) = -\frac{\partial v}{\partial x}(a); \tag{14.6} \]
in short, $u_x = v_y$ and $u_y = -v_x$.
In this case, $f' = u_x + i v_x = v_y - i u_y$.

Proof. (a) $\Rightarrow$ (b): Suppose that $z = h + ik$ is a complex number such that $a + z \in U$; put $f'(a) = b_1 + i b_2$. By assumption,
\[ \lim_{z\to 0} \frac{|f(a+z) - f(a) - z f'(a)|}{|z|} = 0. \]
We shall write this in the real form with real variables $h$ and $k$. Note that
\[ z f'(a) = (h + ik)(b_1 + i b_2) = h b_1 - k b_2 + i(h b_2 + k b_1) = \begin{pmatrix} b_1 & -b_2 \\ b_2 & b_1 \end{pmatrix}\begin{pmatrix} h \\ k \end{pmatrix}. \]
This implies, with the identification $z = (h, k)$,
\[ \lim_{z\to 0} \frac{\left\| f(a+z) - f(a) - \begin{pmatrix} b_1 & -b_2 \\ b_2 & b_1 \end{pmatrix}\begin{pmatrix} h \\ k \end{pmatrix} \right\|}{|z|} = 0. \]
That is (see Subsection 7.2), $f$ is real differentiable at $a$ with the Jacobian matrix
\[ f'(a) = Df(a) = \begin{pmatrix} b_1 & -b_2 \\ b_2 & b_1 \end{pmatrix}. \tag{14.7} \]
By Proposition 7.6, the Jacobian matrix is exactly the matrix of the partial derivatives, that is,
\[ Df(a) = \begin{pmatrix} u_x & u_y \\ v_x & v_y \end{pmatrix}. \]
Comparing this with (14.7), we obtain $u_x(a) = v_y(a) = \operatorname{Re} f'(a)$ and $u_y(a) = -v_x(a) = -\operatorname{Im} f'(a)$. This completes the proof of the first direction.
(b) $\Rightarrow$ (a): Since $f = (u, v)$ is differentiable at $a \in U$ as a real function, there exists a linear mapping $Df(a) \in L(\mathbb{R}^2)$ such that
\[ \lim_{(h,k)\to 0} \frac{\left\| f(a + (h,k)) - f(a) - Df(a)\begin{pmatrix} h \\ k \end{pmatrix} \right\|}{\|(h,k)\|} = 0. \]
By Proposition 7.6,
\[ Df(a) = \begin{pmatrix} u_x & u_y \\ v_x & v_y \end{pmatrix}. \]
The Cauchy–Riemann equations show that $Df(a)$ takes the form
\[ Df(a) = \begin{pmatrix} b_1 & -b_2 \\ b_2 & b_1 \end{pmatrix}, \]
where $u_x = b_1$ and $v_x = b_2$. Writing
\[ Df(a)\begin{pmatrix} h \\ k \end{pmatrix} = \begin{pmatrix} h b_1 - k b_2 \\ h b_2 + k b_1 \end{pmatrix} = z(b_1 + i b_2) \]
in the complex form with $z = h + ik$ shows that $f$ is complex differentiable at $a$ with $f'(a) = b_1 + i b_2$.
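Proposition 14.3 can be illustrated numerically. The following sketch (the function $e^z$, the test point, and the step size are illustrative choices) approximates the partial derivatives of $u = \operatorname{Re} f$ and $v = \operatorname{Im} f$ by central differences and checks the Cauchy–Riemann equations together with $f' = u_x + i v_x$:

```python
import cmath

# Check u_x = v_y, u_y = -v_x and f' = u_x + i v_x for f(z) = e^z
# at an arbitrary test point, using central differences of step h.
f = cmath.exp
x, y, h = 0.3, -0.7, 1e-6
u = lambda x, y: f(complex(x, y)).real
v = lambda x, y: f(complex(x, y)).imag

ux = (u(x + h, y) - u(x - h, y)) / (2 * h)
uy = (u(x, y + h) - u(x, y - h)) / (2 * h)
vx = (v(x + h, y) - v(x - h, y)) / (2 * h)
vy = (v(x, y + h) - v(x, y - h)) / (2 * h)

assert abs(ux - vy) < 1e-8 and abs(uy + vx) < 1e-8   # Cauchy-Riemann
assert abs(complex(ux, vx) - cmath.exp(complex(x, y))) < 1e-8  # f' = e^z
```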
Example 14.3 (a) We already know that $f(z) = z^2$ is complex differentiable. Hence the Cauchy–Riemann equations must be fulfilled. From
\[ f(z) = z^2 = (x + iy)^2 = x^2 - y^2 + 2ixy, \qquad u(x,y) = x^2 - y^2, \quad v(x,y) = 2xy, \]
we conclude $u_x = 2x$, $u_y = -2y$, $v_x = 2y$, $v_y = 2x$. The Cauchy–Riemann equations are satisfied.
(b) $f(z) = |z|^2$. Since $f(z) = x^2 + y^2$, we have $u(x,y) = x^2 + y^2$ and $v(x,y) = 0$. The Cauchy–Riemann equations yield $u_x = 2x = 0 = v_y$ and $u_y = 2y = 0 = -v_x$, such that $z = 0$ is the only solution of the Cauchy–Riemann equations; $z = 0$ is the only point where $f$ is differentiable.
(c) $f(z) = \bar z$ is nowhere differentiable, since $u(x,y) = x$, $v(x,y) = -y$; thus $1 = u_x \neq v_y = -1$.

A function $f: U \to \mathbb{C}$, $U \subset \mathbb{C}$ open, is called locally constant in $U$ if for every point $a \in U$ there exists a ball $V$ with $a \in V \subset U$ such that $f$ is constant on $V$. Clearly, $f$ is constant on every connectedness component of $U$. In fact, one can define $U$ to be connected by requiring that every locally constant function $f: U \to \mathbb{C}$ is constant.

Corollary 14.4 Let $U \subset \mathbb{C}$ be open and $f: U \to \mathbb{C}$ a holomorphic function on $U$.
(a) If $f'(z) = 0$ for all $z \in U$, then $f$ is locally constant in $U$.
(b) If $f$ takes real values only, then $f$ is locally constant.
(c) If $f$ has a continuous second derivative, then $u = \operatorname{Re} f$ and $v = \operatorname{Im} f$ are harmonic functions, i.e. they satisfy the Laplace equation
\[ \Delta(u) = u_{xx} + u_{yy} = 0 \quad \text{and} \quad \Delta(v) = 0. \]

Proof. (a) Since $f'(z) = 0$ for all $z \in U$, the Cauchy–Riemann equations imply $u_x = u_y = v_x = v_y = 0$ in $U$. From real analysis it is known that then $u$ and $v$ are locally constant in $U$ (apply Corollary 7.12 with $\operatorname{grad} f(a + \theta x) = 0$).
(b) Since $f$ takes only real values, $v(x,y) = 0$ for all $(x,y) \in U$. This implies $v_x = v_y = 0$ on $U$. By the Cauchy–Riemann equations, $u_x = u_y = 0$, and $f$ is locally constant by (a).
(c) $u_x = v_y$ implies $u_{xx} = v_{yx}$, and differentiating $u_y = -v_x$ with respect to $y$ yields $u_{yy} = -v_{xy}$.
Since both $u$ and $v$ are twice continuously differentiable (because so is $f$), Schwarz's lemma on the symmetry of second partial derivatives gives
\[ u_{xx} + u_{yy} = v_{yx} - v_{xy} = 0. \]
The same argument works for $v_{xx} + v_{yy} = 0$.

Remarks 14.2 (a) We will see soon that the additional differentiability assumption in (c) is superfluous.
(b) Note that an inverse-like statement to (c) is easily proved: if $Q = (a,b)\times(c,d)$ is an open rectangle and $u: Q \to \mathbb{R}$ is harmonic, then there exists a holomorphic function $f: Q \to \mathbb{C}$ such that $u = \operatorname{Re} f$.

14.2 Cauchy's Integral Formula

14.2.1 Integration

The major objective of this section is to prove the converse of Proposition 14.1: every function holomorphic in $D$ is representable as a power series in $D$. The quickest route to this is via Cauchy's Theorem and Cauchy's Integral Formula. The required integration theory will be developed; it is a useful tool for studying holomorphic functions.

Recall from Section 5.4 the definition of the Riemann integral of a bounded complex-valued function $\varphi: [a,b] \to \mathbb{C}$. It was defined by integrating both the real and the imaginary parts of $\varphi$. In what follows, a path is always a piecewise continuously differentiable curve.

Definition 14.3 Let $U \subset \mathbb{C}$ be open and $f: U \to \mathbb{C}$ a continuous function on $U$. Suppose that $\gamma: [t_0, t_n] \to U$ is a path in $U$. The integral of $f$ along $\gamma$ is defined as the line integral
\[ \int_\gamma f(z)\, dz := \sum_{k=1}^n \int_{t_{k-1}}^{t_k} f(\gamma(t))\, \gamma'(t)\, dt, \tag{14.8} \]
where $\gamma$ is continuously differentiable on $[t_{k-1}, t_k]$ for all $k = 1, \dots, n$.

By the change of variable rule, the integral of $f$ along $\gamma$ does not depend on the parametrization $\gamma$ of the path $\{\gamma(t) \mid t \in [t_0, t_n]\}$. However, if we exchange the initial and the end point of $\gamma$, we obtain a negative sign.

Remarks 14.3 (Properties of the complex integral) (a) The integral of $f$ along $\gamma$ is linear over $\mathbb{C}$:
\[ \int_\gamma (\alpha f_1 + \beta f_2)\, dz = \alpha\int_\gamma f_1\, dz + \beta\int_\gamma f_2\, dz, \qquad \int_{\gamma_-} f(z)\, dz = -\int_{\gamma_+} f(z)\, dz, \]
where $\gamma_-$ has the opposite orientation of $\gamma_+$.
(b) If $\gamma_1$ and $\gamma_2$ are two paths such that the end point of $\gamma_1$ is the initial point of $\gamma_2$ and $\gamma$ is their join, then we have
\[ \int_\gamma f(z)\, dz = \int_{\gamma_1} f(z)\, dz + \int_{\gamma_2} f(z)\, dz. \]
(c) From the definition and the triangle inequality it follows that for a continuously differentiable path $\gamma$
\[ \left| \int_\gamma f(z)\, dz \right| \le M\ell, \]
where $|f(z)| \le M$ for all $z$ on $\gamma$ and $\ell = \int_a^b |\gamma'(t)|\, dt$ is the length of the curve $\gamma$.
(d) The integral of $f$ along $\gamma$ generalizes the real integral $\int_a^b f(t)\, dt$. Indeed, let $\gamma(t) = t$, $t \in [a,b]$; then
\[ \int_\gamma f(z)\, dz = \int_a^b f(t)\, dt. \]
(e) Let $\gamma$ be the circle $S_r(a)$ of radius $r$ with center $a$. We can parametrize the positively oriented circle as $\gamma(t) = a + re^{it}$, $t \in [0, 2\pi]$. Then
\[ \int_\gamma f(z)\, dz = ir\int_0^{2\pi} f(a + re^{it})\, e^{it}\, dt. \]

Example 14.4 (a) Let $\gamma_1(t) = e^{it}$, $t \in [0, \pi]$, be the half of the unit circle from $1$ to $-1$ via $i$, and let $\gamma_2(t) = -t$, $t \in [-1, 1]$, be the segment from $1$ to $-1$. Then $\gamma_1'(t) = ie^{it}$ and $\gamma_2'(t) = -1$. Hence, for $f(z) = \bar z^2$,
\[ \int_{\gamma_1} \bar z^2\, dz = i\int_0^\pi e^{-2it} e^{it}\, dt = i\int_0^\pi e^{-it}\, dt = \left. \frac{i}{-i}\, e^{-it} \right|_0^\pi = -(-1 - 1) = 2, \]
whereas
\[ \int_{\gamma_2} \bar z^2\, dz = -\int_{-1}^1 t^2\, dt = -\frac23. \]
In particular, the integral of $\bar z^2$ is not path independent.
(b)
\[ \int_{S_r} \frac{dz}{z^n} = \int_0^{2\pi} \frac{ire^{it}}{r^n e^{int}}\, dt = ir^{-n+1}\int_0^{2\pi} e^{-(n-1)it}\, dt = \begin{cases} 0, & n \neq 1, \\ 2\pi i, & n = 1. \end{cases} \]

14.2.2 Cauchy's Theorem

Cauchy's theorem is the main step in the proof that every function holomorphic in a neighborhood of $a$ can be written as a power series with midpoint $a$. As a consequence of Corollary 14.2, holomorphic functions then have derivatives of all orders. We start with a very weak form; the additional assumption is that $f$ has an antiderivative.

Lemma 14.5 Let $f: U \to \mathbb{C}$ be continuous, and suppose that $f$ has an antiderivative $F$ which is holomorphic on $U$, $F' = f$. If $\gamma$ is any path in $U$ joining $z_0$ and $z_1$ in $U$, we have
\[ \int_\gamma f(z)\, dz = F(z_1) - F(z_0). \]
In particular, if $\gamma$ is a closed path in $U$,
\[ \int_\gamma f(z)\, dz = 0. \]

Proof.
It suffices to prove the statement for a continuously differentiable curve $\gamma: [a,b] \to U$. Put $h(t) = F(\gamma(t))$. By the chain rule,
\[ h'(t) = \frac{d}{dt} F(\gamma(t)) = F'(\gamma(t))\, \gamma'(t) = f(\gamma(t))\, \gamma'(t). \]
By the definition of the integral and the fundamental theorem of calculus (see Subsection 5.5),
\[ \int_\gamma f(z)\, dz = \int_a^b f(\gamma(t))\, \gamma'(t)\, dt = \int_a^b h'(t)\, dt = h(t)\Big|_a^b = h(b) - h(a) = F(z_1) - F(z_0). \]

Example 14.5 (a) $\displaystyle \int_{2+3i}^{1-i} z^3\, dz = \frac{(1-i)^4}{4} - \frac{(2+3i)^4}{4}$.
(b) $\displaystyle \int_1^{\pi i} e^z\, dz = e^{\pi i} - e^1 = -1 - e$.

Theorem 14.6 (Cauchy's Theorem) Let $U$ be a simply connected region in $\mathbb{C}$ and let $f(z)$ be holomorphic in $U$. Suppose that $\gamma(t)$ is a path in $U$ joining $z_0$ and $z_1$. Then $\int_\gamma f(z)\, dz$ depends on $z_0$ and $z_1$ only and not on the choice of the path. In particular, $\int_\gamma f(z)\, dz = 0$ for any closed path $\gamma$ in $U$.

Proof. We give the proof under the additional assumption that $f'$ not only exists but is continuous in $U$. In this case, the partial derivatives $u_x, u_y, v_x, v_y$ are continuous, and we can apply the integrability criterion Proposition 8.3, which was a consequence of Green's theorem, see Theorem 10.3. Note that we need $U$ to be simply connected, in contrast to Lemma 14.5. Without this additional assumption, the proof is lengthy (see [FB93, Lan89, Jän93]); it starts with triangular or rectangular paths and is then generalized to arbitrary paths. We have
\[ \int_\gamma f(z)\, dz = \int_\gamma (u + iv)(dx + i\, dy) = \int_\gamma (u\, dx - v\, dy) + i\int_\gamma (v\, dx + u\, dy). \]
The line integral $\int_\gamma P\, dx + Q\, dy$ is path independent if and only if the integrability condition $Q_x = P_y$ is satisfied, if and only if $P\, dx + Q\, dy$ is a closed form. In our case, the real part is path independent if and only if $-v_x = u_y$; the imaginary part is path independent if and only if $u_x = v_y$. These are exactly the Cauchy–Riemann equations, which are satisfied since $f$ is holomorphic.
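Cauchy's theorem can be illustrated numerically by approximating the line integral (14.8) with a Riemann sum. In the sketch below (circle, test functions, and grid size are arbitrary illustrative choices), the integral of the holomorphic $z^2$ over a closed circle vanishes, while the non-holomorphic $\bar z$ gives the non-zero value $2\pi i r^2$:

```python
import numpy as np

# Riemann-sum approximation of the line integral over the positively
# oriented circle gamma(t) = r e^{it}, t in [0, 2*pi].
def circle_integral(f, r=1.0, N=4096):
    t = np.linspace(0.0, 2 * np.pi, N, endpoint=False)
    z = r * np.exp(1j * t)
    dz = 1j * z * (2 * np.pi / N)        # gamma'(t) dt
    return np.sum(f(z) * dz)

# holomorphic integrand: closed-path integral is 0 (Cauchy's theorem)
assert abs(circle_integral(lambda z: z**2)) < 1e-9
# non-holomorphic integrand conj(z): integral is 2*pi*i*r^2 != 0
assert abs(circle_integral(np.conj) - 2j * np.pi) < 1e-9
```

For periodic integrands this equally spaced sum is in fact the trapezoidal rule, which converges very fast, so a moderate $N$ already gives machine-precision results.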
Remarks 14.4 (a) The theorem holds under the following weaker assumption: $f$ is continuous on the closure $\overline{U}$ and holomorphic in $U$, $U$ is a simply connected region, and $\gamma = \partial U$ is a path.
(b) The statement is wrong without the assumption that $U$ is simply connected. Indeed, consider the circle of radius $r$ with center $a$, that is, $\gamma(t) = a + re^{it}$. Then $f(z) = 1/(z-a)$ is singular at $a$, and we have
\[ \int_{S_r(a)} \frac{dz}{z-a} = \int_0^{2\pi} \frac{ire^{it}}{re^{it}}\, dt = i\int_0^{2\pi} dt = 2\pi i. \]
(c) For a non-simply connected region $G$, one cuts $G$ along pairwise inverse paths (in the figure: $\delta_1, \delta_2, \delta_3, \delta_4$). The resulting region $\tilde G$ is simply connected, such that $\int_{\partial\tilde G} f(z)\, dz = 0$ by (a). Since the integrals along the $\delta_i$, $i = 1, \dots, 4$, cancel, we have
\[ \int_{\gamma + \gamma_1 + \gamma_2} f(z)\, dz = 0. \]
In particular, if $f$ is holomorphic in $\{z \mid 0 < |z - a| < R\}$ and $0 < r_1 < r_2 < R$, then
\[ \int_{S_{r_1}(a)} f(z)\, dz = \int_{S_{r_2}(a)} f(z)\, dz \]
if both circles are positively oriented.

Proposition 14.7 Let $U$ be a simply connected region, $z_0 \in U$, $U_0 = U \setminus \{z_0\}$. Suppose that $f$ is holomorphic in $U_0$ and bounded in a certain neighborhood of $z_0$. Then
\[ \int_\gamma f(z)\, dz = 0 \]
for every non-self-intersecting closed path $\gamma$ in $U_0$.

Proof. Suppose that $|f(z)| \le C$ for $|z - z_0| < \varepsilon_0$. For any $\varepsilon$ with $0 < \varepsilon < \varepsilon_0$ we then have, by Remark 14.3 (c),
\[ \left| \int_{S_\varepsilon(z_0)} f(z)\, dz \right| \le 2\pi\varepsilon\, C. \]
By Remark 14.4 (c), $\int_\gamma f(z)\, dz = \int_{S_\varepsilon(z_0)} f(z)\, dz$ for all such $\varepsilon$. Hence
\[ \left| \int_\gamma f(z)\, dz \right| = \left| \int_{S_\varepsilon(z_0)} f(z)\, dz \right| \le 2\pi\varepsilon\, C. \]
Since this is true for all small $\varepsilon > 0$, $\int_\gamma f(z)\, dz = 0$.

We will see soon that under the conditions of the proposition, $f$ can be extended holomorphically to $z_0$, too.

14.2.3 Cauchy's Integral Formula

Theorem 14.8 (Cauchy's Integral Formula) Let $U$ be a region. Suppose that $f$ is holomorphic in $U$, and $\gamma$ is a non-self-intersecting, positively oriented closed path in $U$ such that $\gamma$ is the boundary of $U_0 \subset U$; in particular, $U_0$ is simply connected.
Then for every $a \in U_0$ we have
\[ f(a) = \frac{1}{2\pi i} \int_\gamma \frac{f(z)\, dz}{z - a}. \tag{14.9} \]

Proof. Let $a \in U_0$ be fixed. For $z \in U$ we define
\[ F(z) = \begin{cases} \dfrac{f(z) - f(a)}{z - a}, & z \neq a, \\[1ex] 0, & z = a. \end{cases} \]
Then $F(z)$ is holomorphic in $U \setminus \{a\}$ and bounded in a neighborhood of $a$, since $f'(a)$ exists and therefore
\[ \left| \frac{f(z) - f(a)}{z - a} - f'(a) \right| < \varepsilon \]
as $z$ approaches $a$. Using Proposition 14.7 and Remark 14.4 (b), we have $\int_\gamma F(z)\, dz = 0$, that is,
\[ \int_\gamma \frac{f(z) - f(a)}{z - a}\, dz = 0, \]
such that
\[ \int_\gamma \frac{f(z)\, dz}{z - a} = \int_\gamma \frac{f(a)\, dz}{z - a} = f(a)\int_\gamma \frac{dz}{z - a} = 2\pi i\, f(a). \]

Remark 14.5 The values of a holomorphic function $f$ inside a path $\gamma$ are completely determined by the values of $f$ on $\gamma$.

Example 14.6 Evaluate
\[ I_r := \int_{S_r(a)} \frac{\sin z}{z^2 + 1}\, dz \]
in the cases $a = 1 + i$ and $r = \frac12, 2, 3$.
Solution. We use the partial fraction decomposition of $1/(z^2+1)$ to obtain linear terms in the denominator:
\[ \frac{1}{z^2 + 1} = \frac{1}{2i}\left( \frac{1}{z - i} - \frac{1}{z + i} \right). \]
Hence, with $f(z) = \sin z$, we have in case $r = 3$ (both $i$ and $-i$ lie inside the circle)
\[ I_3 = \frac{1}{2i}\int_{S_r(a)} \frac{\sin z}{z - i}\, dz - \frac{1}{2i}\int_{S_r(a)} \frac{\sin z}{z + i}\, dz = \pi\bigl(f(i) - f(-i)\bigr) = 2\pi\sin(i) = \pi i\,(e - 1/e). \]
In case $r = 2$, the function $\frac{\sin z}{z+i}$ is holomorphic inside the circle of radius $2$ with center $a$ (only $i$ lies inside). Hence $I_2 = \pi\sin(i) = I_3/2$. In case $r = \frac12$, both integrands are holomorphic inside the circle, such that $I_{1/2} = 0$.

Example 14.7 Consider the function $f(z) = e^{iz^2}$, which is an entire function. Let $\gamma_1(t) = t$, $t \in [0, R]$, be the segment from $0$ to $R$ on the real line; let $\gamma_2(t) = Re^{it}$, $t \in [0, \pi/4]$, be the arc of the circle of radius $R$ with center $0$; and let finally $\gamma_3(t) = te^{i\pi/4}$, $t \in [0, R]$, be the segment from $0$ to $Re^{i\pi/4}$. By Cauchy's Theorem,
\[ I_1 + I_2 - I_3 = \int_{\gamma_1 + \gamma_2 - \gamma_3} f(z)\, dz = 0. \]
Rie dt ≤ R e dt ≤ R 0 0 0 Note that sin t is a concave function on [0, π/2], that is, the graph of the sine function is above the graph of the corresponding linear function through (0, 0) and (π/2, 1); thus, sin t ≥ 2t/π, t ∈ [0, π/2]. We have Z π/4 π −R2 π −R2 4t/π −R2 . e dt = − −1 = | I2 (R) | ≤ R e 1−e 4R 4R 0 We conclude that | I2 (R) | tends to 0 as R → ∞. By Cauchy’s Theorem I1 + I2 − I3 = 0 for all R, we conclude Z ∞ Z ∞ 2 it2 iπ/4 lim I1 (R) = e dt = e e−t dt = lim I3 (R). R→∞ 0 R→∞ 0 √ 2 π/2 (see below); hence eit = cos(t2 ) + i sin(t2 ) implies √ Z ∞ Z ∞ 2π 2 = cos(t ) dt = sin(t2 ) dt. 4 0 0 R∞ √ 2 These are the so called Fresnel integrals. We show that I = 0 e−x dx = π/2. (This was already done in Homework 41) For, we compute the double integral using Fubini’s theorem: The integral on the right is Z∞ Z∞ 0 −x2 −y 2 e 0 dxdy = Z ∞ −x2 e 0 dx Z ∞ 2 e−y dy = I 2 . 0 Passing to polar coordinates yields dxdy = r dr, x2 + y 2 = r 2 such that Z π/2 Z R ZZ 2 −x2 −y 2 e dxdy = lim dϕ e−r r dr. R+)2 ( R→∞ 0 0 The change of variables r 2 = t, dt = 2r dr yields √ Z 1 π ∞ −t π π 2 e dt = =⇒ I = I = . 22 0 4 2 √ This proves the claim. In addition, the change of variables x = s also yields Z ∞ −x √ 1 e √ dx = π. Γ = x 2 0 14 Complex Analysis 376 Theorem 14.9 Let γ be a path in an open set U and g be a continuous function on U. If a is not on γ, define Z g(z) h(a) = dz. z−a γ Then h is holomorphic on the complement of γ in U and has derivatives of all orders. They are given by Z g(z) (n) h (a) = n! dz. (z − a)n+1 γ Proof. Let b ∈ U and not on γ. Then there exists some r > 0 such that | z − b | ≥ r for all points z on γ. Let 0 < s < r. We shall see that h has a power series expansion in the ball Us (b). We write z 1 1 1 1 = = z−a z − b − (a − b) z − b 1 − a−b z−b ! 2 a−b a−b 1 1+ + = +··· . z−b z−b z−b a b s γ This geometric series converges absolutely and uniformly for | a − b | ≤ s because a−b s z − b ≤ r < 1. 
Since $g$ is continuous and $\gamma$ is a compact set, $g(z)$ is bounded on $\gamma$, such that by Theorem 6.6 the series $\sum_{n=0}^\infty g(z)\,\frac{(a-b)^n}{(z-b)^{n+1}}$ can be integrated term by term, and we find
\[ h(a) = \int_\gamma \sum_{n=0}^\infty g(z)\,\frac{(a-b)^n}{(z-b)^{n+1}}\, dz = \sum_{n=0}^\infty (a-b)^n \int_\gamma \frac{g(z)\, dz}{(z-b)^{n+1}} = \sum_{n=0}^\infty c_n (a-b)^n, \quad \text{where } c_n = \int_\gamma \frac{g(z)\, dz}{(z-b)^{n+1}}. \]
This proves that $h$ can be expanded into a power series in a neighborhood of $b$. By Proposition 14.1 and Corollary 14.2, $h$ has derivatives of all orders in a neighborhood of $b$. By the formula in Corollary 14.2,
\[ h^{(n)}(b) = n!\, c_n = n!\int_\gamma \frac{g(z)\, dz}{(z-b)^{n+1}}. \]

Remark 14.6 There is an easy way to remember the formula: formally, we may exchange the differentiation $\frac{d}{da}$ and $\int_\gamma$:
\[ h'(a) = \frac{d}{da}\int_\gamma g(z)(z-a)^{-1}\, dz = \int_\gamma \frac{d}{da}\,\frac{g(z)}{z-a}\, dz = \int_\gamma \frac{g(z)}{(z-a)^2}\, dz, \]
\[ h''(a) = \frac{d}{da}\int_\gamma g(z)(z-a)^{-2}\, dz = 2\int_\gamma \frac{g(z)}{(z-a)^3}\, dz. \]

Theorem 14.10 Suppose that $f$ is holomorphic in $U$ and $\overline{U_r(a)} \subset U$. Then $f$ has a power series expansion in $U_r(a)$,
\[ f(z) = \sum_{n=0}^\infty c_n (z-a)^n. \]
In particular, $f$ has derivatives of all orders, and we have the following coefficient formula:
\[ c_n = \frac{f^{(n)}(a)}{n!} = \frac{1}{2\pi i}\int_{S_r(a)} \frac{f(z)\, dz}{(z-a)^{n+1}}. \tag{14.10} \]

Proof. In view of Cauchy's Integral Formula (Theorem 14.8) we obtain
\[ f(a) = \frac{1}{2\pi i}\int_{S_r(a)} \frac{f(z)\, dz}{z-a}. \]
Inserting $g(z) = f(z)/(2\pi i)$ ($f$ is continuous) into Theorem 14.9, we see that $f$ can be expanded into a power series with center $a$ and, therefore, it has derivatives of all orders at $a$, with
\[ f^{(n)}(a) = \frac{n!}{2\pi i}\int_{S_r(a)} \frac{f(z)\, dz}{(z-a)^{n+1}}. \]

14.2.4 Applications of the Coefficient Formula

Proposition 14.11 (Growth of Taylor Coefficients) Suppose that $f$ is holomorphic in $U$ and is bounded by $M > 0$ in $U_r(a) \subset U$; that is, $|z - a| < r$ implies $|f(z)| \le M$. Let $\sum_{n=0}^\infty c_n(z-a)^n$ be the power series expansion of $f$ at $a$. Then we have
\[ |c_n| \le \frac{M}{r^n}. \tag{14.11} \]

Proof.
By the coefficient formula (14.10) and Remark 14.3 (c), noting that
\[ \left| \frac{f(z)}{(z-a)^{n+1}} \right| \le \frac{M}{r^{n+1}} \quad \text{for } z \in S_r(a), \]
we have
\[ |c_n| = \left| \frac{1}{2\pi i}\int_{S_r(a)} \frac{f(z)\, dz}{(z-a)^{n+1}} \right| \le \frac{1}{2\pi}\cdot\frac{M}{r^{n+1}}\cdot\ell(S_r(a)) = \frac{M}{2\pi r^{n+1}}\cdot 2\pi r = \frac{M}{r^n}. \]

Theorem 14.12 (Liouville's Theorem) A bounded entire function is constant.

Proof. Suppose that $|f(z)| \le M$ for all $z \in \mathbb{C}$. Since $f$ is given by a power series $f(z) = \sum_{n=0}^\infty c_n z^n$ with radius of convergence $R = \infty$, the previous proposition gives
\[ |c_n| \le \frac{M}{r^n} \quad \text{for all } r > 0. \]
Letting $r \to \infty$ shows $c_n = 0$ for all $n \ge 1$; hence $f(z) = c_0$ is constant.

Remarks 14.7 (a) Note that we explicitly assume $f$ to be holomorphic on the entire complex plane. For example, $f(z) = e^{1/z}$ is holomorphic and bounded outside every ball $U_\varepsilon(0)$; however, $f$ is not constant.
(b) Note that $f(z) = \sin z$ is an entire function which is not constant. Hence $\sin z$ is unbounded as a complex function.

Theorem 14.13 (Fundamental Theorem of Algebra) A polynomial $p(z)$ with complex coefficients of degree $\deg p \ge 1$ has a complex root.

Proof. Suppose to the contrary that $p(z) \neq 0$ for all $z \in \mathbb{C}$. It is known, see Example 3.3, that $\lim_{|z|\to\infty} |p(z)| = +\infty$. In particular, there exists $R > 0$ such that
\[ |z| \ge R \implies |p(z)| \ge 1. \]
That is, $f(z) = 1/p(z)$ is bounded by $1$ if $|z| \ge R$. On the other hand, $f$ is a continuous function and $\{z \mid |z| \le R\}$ is a compact subset of $\mathbb{C}$; hence $f(z) = 1/p(z)$ is bounded on $\overline{U_R}$, too. That is, $f$ is bounded and holomorphic on the entire plane. By Liouville's theorem, $f$ is constant, and so is $p$. This contradicts our assumption $\deg p \ge 1$. Hence, $p$ has a root in $\mathbb{C}$.

Now there is an inverse-like statement to Cauchy's Theorem.

Theorem 14.14 (Morera's Theorem) Let $f: U \to \mathbb{C}$ be a continuous function, where $U \subset \mathbb{C}$ is open. Suppose that the integral of $f$ along each closed triangular path $[z_1, z_2, z_3]$ in $U$ is $0$. Then $f$ is holomorphic in $U$.

Proof. Fix $z_0 \in U$. We show that $f$ has an antiderivative in a small neighborhood $U_\varepsilon(z_0) \subset U$.
For a ∈ Uε (z0 ) define Za F (a) = f (z) dz. z0 a+h a Note that F (a) takes the same value for all polygonal paths from z0 to a by assumption of the theorem. We have Z a+h F (a + h) − F (a) 1 (f (z) − f (a)) dz , − f (a) = h h a where the integral on the right is over the segment form a to R a+h a + h and we used a c dz = ch. By Remark 14.3 (c), the right side is less than or equal to z ≤ 1 sup | f (z) − f (a) | | h | = sup | f (z) − f (a) | . | h | z∈Uh(a) z∈Uh (a) Since f is continuous the above term tends to 0 as h tends to 0. This shows that F is differentiable at a with F ′ (a) = f (a). Since F is holomorphic in U, by Theorem 14.10 it has derivatives of all orders; in particular f is holomorphic. Corollary 14.15 Suppose that (fn ) is a sequence of holomorphic functions on U, uniformly converging to f on U. Then f is holomorphic on U. Proof. Since fn are continuous and uniformly converging, f is continuous on U. Let γ be any closed triangular path in U. Since (fn ) converges uniformly, we may exchange integration and limit: Z Z Z f (z) dz = γ lim fn (z) dz = lim γ n→∞ n→∞ fn (z) dz = lim 0 = 0 γ n→∞ since each fn is holomorphic. By Morera’s theorem, f is holomorphic in U. Summary Let U be a region and f : U → C be a function on U. The following are equivalent: (a) f is holomorphic in U. (b) f = u + iv is real differentiable and the Cauchy–Riemann equations ux = vy and uv = −vx are satisfied in U. (c) If U is simply connected, f is continuous and for every closed triangular path R γ = [z1 , z2 , z3 ] in U, γ f (z) dz = 0 (Morera condition). 14 Complex Analysis 380 (d) f possesses locally an antiderivative, that is, for every a ∈ U there is a ball Uε (a) ⊂ U and a holomorphic function F such that F ′ (z) = f (z) for all z ∈ Uε (a). (e) f is continuous and for every ball Ur (a) with Ur (a) ⊂ U we have Z 1 f (z) f (b) = dz, ∀ b ∈ Ur (a). 2πi z−b Sr (a) (f) For a ∈ U there exists a ball with center a such that f can be expanded in that ball into a power series. 
(g) For every ball B which is completely contained in U, f can be expanded into a power series in B. 14.2.5 Power Series Since holomorphic functions are locally representable by power series, it is quite useful to know how to operate with power series. In case that a holomorphic function f is represented by a power series, we say that f is an analytic function. In other words, every holomorphic function is analytic and vice versa. Thus, by Theorem 14.10, any holomorphic function is analytic and vice versa. (a) Uniqueness P P bn z n converge in a ball around 0 and define the same function then If both cn z n and cn = bn for all n ∈ N0 . (b) Multiplication P P If both cn z n and bn z n converge in a ball Ur (0) around 0 then ∞ X n=0 where dn = Pn n cn z · ∞ X n=0 n bn z = ∞ X dn z n , n=0 | z | < r, k=0 cn−k bk . (c) The Inverse 1/f Let f (z) = ∞ X cn z n be a convergent power series and n=0 c0 6= 0. Then f (0) = c0 6= 0 and, by continuity of f , there exists r > 0 such that the power series converges in the ball Ur (0) and is non-zero there. Hence, 1/f (z) is holomorphic in Ur (0) and therefore it can be expanded into a converging power series in Ur (0), see summary (f). Suppose 14.2 Cauchy’s Integral Formula that 1/f (z) = g(z) = ∞ X n=0 381 bn z n , | z | < r. Then f (z)g(z) = 1 = 1 + 0z + 0z 2 + · · · , the uniqueness and (b) yields 1 = c0 b0 , 0 = c0 b1 + c1 b0 , 0 = c0 b2 + c1 b1 + c2 b0 , · · · This system of equations can be solved recursively for bn , n ∈ N0 , for example, b0 = 1/c0, b1 = −c1 b0 /c0 . (d) Double Series Suppose that fk (z) = ∞ X n=0 ckn (z − a)n , k∈N are converging in Ur (a) power series. Suppose further that the series ∞ X fk (z) k=1 converges locally uniformly in Ur (a) as well. Then ∞ X fk (z) = ∞ ∞ X X n=0 k=1 k=1 ! ckn (z − a)n . P In particular, one can form the sum of a locally uniformly convergent series fk (z) of power P series coefficientwise. 
Note that a series of functions ∞ k=1 fk (z) converges locally uniformly at b if there exists ε > 0 such that the series converges uniformly in Uε (b). Note that any locally uniformly converging series of holomorphic functions defines a holomorphic function (Theorem of Weierstraß). Indeed, since the series converges uniformly, line integral and summation can be exchanged: Let γ = [z0 z1 z2 ] be any closed triangular path inside U, then by Cauchy’s theorem Z Z X ∞ ∞ Z ∞ X X f (z) dz = fk (z) dz = fk (z) dz = 0 = 0. γ By Morera’s theorem, f (z) = γ k=1 P∞ k=1 f (z) k=1 γ k=1 is holomorphic. (e) Change of Center P n Let f (z) = ∞ n=0 cn (z − a) be convergent in Ur (a), r > 0, and b ∈ Ur (a). Then f can be expanded into a power series with center b ∞ X f (n) (b) n bn (z − b) , bn = f (z) = , n! n=0 with radius of convergence at least r − | b − a |. Also, the coefficients can be obtained by reordering for powers of (z − b)k using the binomial formula n X n n n (z − b)k (b − a)n−k . (z − a) = (z − b + b − a) = k k=0 14 Complex Analysis 382 (f) Composition We restrict ourselves to the case f (z) = a0 + a1 z + a2 z 2 + · · · g(z) = b1 z + b2 z 2 + · · · where g(0) = 0 and therefore, the image of g is a small neighborhood of 0, and we assume that the first power series f is defined there; thus, f (g(z)) is defined and holomorphic in a certain neighborhood of 0, see Remark 14.1. Hence h(z) = f (g(z)) = c0 + c1 z + c2 z 2 + · · · where the coefficients cn = h(n) (0)/n! can be computed using the chain rule, for example, c0 = f (g(0)) = a0 , c1 = f ′ (g(0))g ′(0) = a1 b1 . (g) The Composition Inverse f −1 P n Suppose that f (z) = ∞ n=1 an z , a1 6= 0, has radius of convergence r > 0. Then there exists a P∞ power series g(z) = n=1 bn z n converging on Uε (0) such that f (g(z)) = z = g(f (z)) for all z ∈ Uε (0). Using (f) and the uniqueness, the coefficients bn can be computed recursively. Example 14.8 (a) The function f (z) = 1 1 + 2 1+z 3−z is holomorphic in C \ {i, −i, 3}. 
Expanding f into a power series with center 1, the closest singularity to 1 is ±i. Since the disc of convergence cannot contain ±i, the radius of convergence √ is | 1 − i | = 2. Expanding the power series around a = 2, the closest singularity of f is 3; hence, the radius of convergence is now | 3 − 2 | = 1. 1 1−z which is holomorphic in C \ {1} into a power series around b = i/2. For arbitrary b with | b | < 1 we have (b) Change of center. We want to expand f (z) = i i/2 0 1 1 1 1 1 = = · 1−z 1 − b − (z − b) 1 − b 1 − z−b 1−b ∞ X 1 = (z − b)n = f˜(z). n+1 (1 − b) n=0 By the root test, the of this series is | 1 − b |. In case b = i/2 we have p radius of convergence √ r = | 1 − i/2 | = 1 + 1/4 = 5/2. Note that the power series 1 + z + z 2 + · · · has radius of convergence 1 and a priori defines an analytic (= holomorphic) function in the open unit ball. However, changing the center we obtain an analytic continuation of f to a larger region. This example shows that (under certain assumptions) analytic functions can be extended into a larger region by changing the center of the series. 14.3 Local Properties of Holomorphic Functions 383 14.3 Local Properties of Holomorphic Functions We omit the proof of the Open Mapping Theorem and refer to [Jän93, Satz 11, Satz 13] see also [Con78]. Theorem 14.16 (Open Mapping Theorem) Let G be a region and f a non-constant holomorphic function on G. Then for every open subset U ⊂ G, f (U) is open. The main idea is to show that any at some point a holomorphic function f with f (a) = 0 looks like a power function, that is, there exists a positive integer k such that f (z) = h(z)k in a small neighborhood of a, where h is holomorphic at a with a zero of order 1 at a. Theorem 14.17 (Maximum Modulus Theorem) Let f be holomorphic in the region U and a ∈ U is a point such that | f (a) | ≥ | f (z) | for all z ∈ U. Then f must be a constant function. Proof. First proof. Let V = f (U) and b = f (a). By assumption, | b | ≥ | w | for all w ∈ V . 
b cannot be an inner point of V since otherwise there is some c in the neighborhood of b with | c | > | b |, which contradicts the assumption. Hence b is in the boundary, b ∈ ∂V ∩ V. In particular, V is not open. Hence, the Open Mapping Theorem says that f is constant.
Second proof. We give a direct proof using Cauchy's Integral Formula. For simplicity let a = 0 and let U_r(0) ⊆ U be a small ball in U. By Cauchy's formula with γ = S_ρ(0), z = γ(t) = ρe^{it}, t ∈ [0, 2π], dz = ρie^{it} dt, where 0 < ρ ≤ r, we get
  f(0) = \frac{1}{2\pi i} \int_0^{2\pi} \frac{f(\rho e^{it})\, \rho i e^{it}}{\rho e^{it}}\, dt = \frac{1}{2\pi} \int_0^{2\pi} f(\rho e^{it})\, dt.
In other words, f(0) is the arithmetic mean of the values of f on any circle with center 0. Let M = | f(0) | ≥ | f(z) | be the maximal modulus of f on U. Suppose there exists z_0 = ρe^{it_0} with ρ < r and | f(z_0) | < M. Since f is continuous, there exists a whole neighborhood U_ε(t_0) of t_0 with | f(ρe^{it}) | < M for t ∈ U_ε(t_0). However, in this case
  M = | f(0) | = \left| \frac{1}{2\pi} \int_0^{2\pi} f(\rho e^{it})\, dt \right| \le \frac{1}{2\pi} \int_0^{2\pi} | f(\rho e^{it}) |\, dt < M,
which contradicts the mean value property. Hence, | f(z) | = M is constant in any sufficiently small neighborhood of 0. Let z_1 ∈ U be any point in U. We connect 0 and z_1 by a path in U; let d be its distance from the boundary ∂U. Moving z continuously from 0 to z_1, consider the chain of balls with center z and radius d/2. By the above, | f(z) | = M in any such ball; hence | f(z) | = M along the whole path and finally in U. It follows from homework 47.2 that f is constant.
Remark 14.8 In other words, if f is holomorphic in G and U ⊂ G, then sup_{z ∈ U} | f(z) | is attained on the boundary ∂U. Note that both theorems fail in the real setting: the image of the sine function on the open set (0, 2π) is [−1, 1], which is not open. The maximum of f(x) = 1 − x² over (−1, 1) is not attained on the boundary since f(−1) = f(1) = 0 while f(0) = 1. However, | z² − 1 | on the closed complex unit ball attains its maximum at z = ±i, that is, on the boundary.
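The mean value property used in the second proof is easy to check numerically. The sketch below (the helper name `circle_mean` and the test function are my own choices, not from the text) approximates the mean of f over a circle by an equally spaced Riemann sum and compares it with the value at the center:

```python
import cmath, math

def circle_mean(f, a, r, num=2000):
    # (1/2π) ∫_0^{2π} f(a + r e^{it}) dt, approximated on num equidistant points
    return sum(f(a + r * cmath.exp(2j * math.pi * k / num))
               for k in range(num)) / num

# For a holomorphic function, the mean over every circle equals the center value.
center = 0.3 + 0.2j
for r in (0.5, 1.0, 2.0):
    assert abs(circle_mean(cmath.exp, center, r) - cmath.exp(center)) < 1e-9
print("mean value property verified")
```

Replacing f by | f | in the same sum illustrates the inequality in the proof: the mean of | f | over a circle can only equal | f(0) | when | f | is constant on that circle.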
14 Complex Analysis 384 Recall from topology: • An accumulation point of a set M ⊂ C is a point of z ∈ C such that for every ε > 0, Uε (z) contains infinitely many elements of M. Accumulation points are in the closure of M, not necessarily in the boundary of M. The set of accumulation points of M is closed (Indeed, suppose that a is in the closure of the set of accumulation points of M. Then every neighborhood of a meets the set of accumulation points of M; in particular, every neighborhood has infinitely many element of M. Hence a itself is an accumulation point of M). • M is connected if every locally constant function is constant. M is not connected if M is the disjoint union of two non-empty subsets A and B, both are open and closed in M. For example, M = {1/n | n ∈ N} has no accumulation point in C \ {0} but it has one accumulation point, 0, in C. Proposition 14.18 Let U be a region and let f : U → C be holomorphic in U. If the set Z(f ) of the zeros of f has an accumulation point in U, then f is identically 0 in U. Example 14.9 Consider the holomorphic function f (z) = sin 1z , f : U → C, U = C \ {0} and 1 | n ∈ Z}. The only accumulation point 0 of Z(f ) does with the set of zeros Z(f ) = {zn = nπ not belong to U. The proposition does not apply. Proof. Suppose a ∈ U is an accumulation point of Z(f ). Expand f into a power series with center a: ∞ X f (z) = cn (z − a)n , | z − a | < r. n=0 Since a is an accumulation point of the zeros, there exists a sequence (zn ) of zeros converging to a. Since f is continuous at a, limn→∞ f (zn ) = 0 = f (a). This shows c0 = 0. The same argument works with the function f (z) , z−a which is holomorphic in the same ball with center a and has a as an accumulation point of zeros. Hence, c1 = 0. In the same way we conclude that c2 = c3 = · · · = cn = · · · = 0. This shows that f is identically 0 on Ur (a). 
That is, the set f1 (z) = c1 + c2 (z − a) + c3 (z − a)3 + · · · = A = {a ∈ U | a is an accumulation point of Z(f ) } is an open set. Also, B = {a ∈ U | a is not an accumulation point of Z(f ) } is open (with every non-accumulation point z, there is a whole neighborhood of z not containing accumulation points of Z(f )). Now, U is the disjoint union of A and B, both are open as well as closed in U. Hence, the characteristic function on A is a locally constant function on U. Since U is connected, either U = A or U = B. Since by assumption A is non-empty, A = U, that is f is identically 0 on U. 14.4 Singularities 385 Theorem 14.19 (Uniqueness Theorem) Suppose that f and g are both holomorphic functions on U and U is a region. Then the following are equivalent: (a) f = g (b) The set D = {z ∈ U | f (z) = g(z)} where f and g are equal has an accumulation point in U. (c) There exists z0 ∈ U such that f (n) (z0 ) = g (n) (z0 ) for all non-negative integers n ∈ N0 . Proof. (a) ↔ (b). Apply the previous proposition to the function f − g. (a) implies (c) is trivial. Suppose that (c) is satisfied. Then, the power series expansion of f − g at z0 is identically 0. In particular, the set Z(f − g) contains a ball Bε (z0 ) which has an accumulation point. Hence, f − g = 0. The following proposition is an immediate consequence of the uniqueness theorem. Proposition 14.20 (Uniqueness of Analytic Continuation) Suppose that M ⊂ U ⊂ C where U is a region and M has an accumulation point in U. Let g be a function on M and suppose that f is a holomorphic function on U which extents g, that is f (z) = g(z) on M. Then f is unique. Remarks 14.9 (a) The previous proposition shows a quite amazing property of a holomorphic function: It is completely determined by “very few values”. This is in a striking contrast to C∞ -functions on the real line. 
For example, the "hat function"
  h(x) = e^{−1/(1−x²)} for | x | < 1,  h(x) = 0 for | x | ≥ 1,
is identically 0 on [2, 3] (a set with accumulation points); however, h is not identically 0. This shows that h, although C^∞, is not analytic.
(b) For the uniqueness theorem it is an essential point that U is connected.
(c) It is now clear that the real functions e^x, sin x, and cos x have a unique analytic continuation into the complex plane.
(d) The algebra O(U) of holomorphic functions on a region U is a domain, that is, fg = 0 implies f = 0 or g = 0. Indeed, suppose that f(z_0) ≠ 0; then f(z) ≠ 0 in a certain neighborhood of z_0 (by continuity of f). Hence g = 0 on that neighborhood. Since an open set always has an accumulation point in itself, g = 0 on U.
14.4 Singularities
We consider functions which are holomorphic in a punctured ball U_r(a) \ {a}. From information about the behaviour of the function near the center a, a number of interesting and useful results will be derived. In particular, we will use these results to evaluate certain improper integrals over the real line which cannot be evaluated by the methods of real calculus.
14.4.1 Classification of Singularities
Throughout this subsection U is a region, a ∈ U, and f : U \ {a} → C is holomorphic.
Definition 14.4 (a) Let f be holomorphic in U \ {a} where U is a region and a ∈ U. Then a is said to be an isolated singularity of f.
(b) The point a is called a removable singularity if there exists a holomorphic function g : U_r(a) → C such that g(z) = f(z) for all z with 0 < | z − a | < r.
Example 14.10 The functions sin z / z, 1/z, and e^{1/z} all have isolated singularities at 0. However, only f(z) = sin z / z has a removable singularity. The holomorphic function g which coincides with f on C \ {0} is g(z) = 1 − z²/3! + z⁴/5! − + ⋯. Hence, redefining f(0) := g(0) = 1 makes f holomorphic in C. We will see later that the other two singularities are not removable.
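The three model singularities can already be told apart numerically by sampling | f | on a small circle around 0: sin z / z stays bounded, 1/z grows like 1/r, and e^{1/z} grows far faster than any power of 1/r. A rough sketch; the helper name and the thresholds are ad hoc choices for illustration only:

```python
import cmath, math

def max_on_circle(f, r, num=400):
    # maximum of |f| sampled on the circle |z| = r
    return max(abs(f(r * cmath.exp(2j * math.pi * k / num))) for k in range(num))

r = 1e-2
assert max_on_circle(lambda z: cmath.sin(z) / z, r) < 2        # bounded: removable
assert max_on_circle(lambda z: 1 / z, r) >= 1 / r - 1          # grows like 1/r: pole
assert max_on_circle(lambda z: cmath.exp(1 / z), r) > 1e40     # ~e^{1/r}: essential
print("ok")
```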
It is convenient to denote the new function g, with one more point in its domain (namely a), also by f.
Proposition 14.21 (Riemann, 1851) Suppose that f : U \ {a} → C, a ∈ U, is holomorphic. Then a is a removable singularity of f if and only if there exists a punctured neighborhood U_r(a) \ {a} on which f is bounded.
Proof. The necessity of the condition follows from the fact that a holomorphic extension g is continuous, and the continuous function | g(z) | defined on the compact set \overline{U_{r/2}(a)} is bounded; hence f is bounded. For the sufficiency we assume, without loss of generality, a = 0 (if a is non-zero, consider the function f̃(z) = f(z + a) instead). The function
  h(z) = z² f(z) for z ≠ 0,  h(0) = 0,
is holomorphic in U_r(0) \ {0}. Moreover, h is differentiable at 0 since f is bounded in a neighborhood of 0 and
  h′(0) = \lim_{z→0} \frac{h(z) − h(0)}{z} = \lim_{z→0} z f(z) = 0.
Thus, h can be expanded into a power series at 0,
  h(z) = c_0 + c_1 z + c_2 z² + c_3 z³ + ⋯ = c_2 z² + c_3 z³ + ⋯,
with c_0 = c_1 = 0 since h(0) = h′(0) = 0. For non-zero z we have
  f(z) = \frac{h(z)}{z²} = c_2 + c_3 z + c_4 z² + ⋯.
The right side defines a holomorphic function in a neighborhood of 0 which coincides with f for z ≠ 0. Setting f(0) = c_2 removes the singularity at 0.
Definition 14.5 (a) An isolated singularity a of f is called a pole of f if there exist a positive integer m ∈ N and a holomorphic function g : U_r(a) → C such that
  f(z) = \frac{g(z)}{(z − a)^m}.
The smallest number m such that (z − a)^m f(z) has a removable singularity at a is called the order of the pole.
(b) An isolated singularity a of f which is neither removable nor a pole is called an essential singularity.
(c) If f is holomorphic at a and there exist a positive integer m and a holomorphic function g such that f(z) = (z − a)^m g(z) and g(a) ≠ 0, then a is called a zero of order m of f. Note that m = 0 corresponds to removable singularities.
If f(z) has a zero of order m at a, then 1/f(z) has a pole of order m at a, and vice versa.
Example 14.11 The function f(z) = 1/z² has a pole of order 2 at z = 0 since z² f(z) = 1 has a removable singularity at 0 whereas z f(z) = 1/z has not. The function f(z) = (cos z − 1)/z³ has a pole of order 1 at 0 since
  \frac{\cos z − 1}{z^3} = −\frac{1}{2z} + \frac{z}{4!} ∓ ⋯.
14.4.2 Laurent Series
In a neighborhood of an isolated singularity a holomorphic function cannot, in general, be expanded into a power series; it can, however, be expanded into a so-called Laurent series.
Definition 14.6 A Laurent series with center a is a series of the form
  \sum_{n=−∞}^{∞} c_n (z − a)^n,
or, more precisely, the pair of series
  f_−(z) = \sum_{n=1}^{∞} c_{−n} (z − a)^{−n}  and  f_+(z) = \sum_{n=0}^{∞} c_n (z − a)^n.
The Laurent series is said to be convergent if both series converge.
[Figure: two discs with common center a and radii r < R; f_+ converges inside the disc of radius R, f_− converges outside the disc of radius r.]
Remark 14.10 (a) f_−(z) is "a power series in 1/(z − a)." Thus, we can derive facts about the convergence of Laurent series from the convergence of power series. In fact, suppose that 1/r is the radius of convergence of the power series \sum_{n=1}^{∞} c_{−n} ζ^n and R is the radius of convergence of the series \sum_{n=0}^{∞} c_n z^n; then the Laurent series \sum_{n∈Z} c_n z^n converges in the annulus A_{r,R} = {z | r < | z | < R} and defines there a holomorphic function. The power series f_+(z) converges in the inner part of the ball U_R(a), whereas the series with the negative powers, called the principal part of the Laurent series, converges in the exterior of the ball U_r(a). Since both series must converge, f(z) converges on the intersection of the two domains, which is the annulus A_{r,R}(a).
(b) The easiest way to determine the type of an isolated singularity is to use Laurent series, which are, roughly speaking, power series with both positive and negative powers of z − a.
Proposition 14.22 Suppose that f is holomorphic in the open annulus A_{r,R}(a) = {z | r < | z − a | < R}.
Then f (z) has an expansion in a convergent Laurent series for z ∈ Ar,R f (z) = ∞ X n=0 n cn (z − a) + ∞ X n=1 c−n 1 (z − a)n (14.12) n ∈ Z, (14.13) with coefficients 1 cn = 2πi Z Sρ (a) where r < ρ < R. r < s1 ≤ s2 < R. f (z) dz, (z − a)n+1 The series converges uniformly on every annulus As1 ,s2 (a) with 14.4 Singularities 389 Proof. z s1 a s 2 γ Let z be in the annulus As1 ,s2 and let γ be the closed path around z in the annulus consisting of the two circles −Ss1 (a), Ss2 (a) and the two “bridges.” By Cauchy’s integral formula, Z f (w) 1 dw = f1 (z) + f2 (z) = f (z) = 2πi γ w − z Z Z 1 1 f (w) f (w) dw − dw = 2πi w−z 2πi w−z Ss2 (a) We consider the two functions Z 1 f1 (z) = 2πi Ss2 (a) f (w) dw, w−z Ss1 (a) 1 f2 (z) = − 2πi Z Ss1 (a) f (w) dw w−z separately. P n In what follows, we will see that f1 (z) is a power series ∞ n=0 cn (z − a) and f2 (z) = P∞ 1 The first part is completely analogous to the proof of Theorem 14.9. n=1 c−n (z−a)n . z−a <1 Case 1. w ∈ Ss2 (a). Then | z − a | < | w − a | and | q | = s w−a z such that 2 a ∞ ∞ 1 1 1 X n X (z − a)n 1 q = . = = z−a w−z w − a 1 − w−a w − a n=0 (w − a)n+1 n=0 w Since f (w) is bounded on Ss2 (a), the geometric series has a converging numerical upper bound. Hence, the series converges uniformly with respect to w; we can exchange integration and summation: Z 1 f1 (z) = 2πi Ss2 (a) where cn = 1 2πi X (z − a)n (z − a)n f (w) dw = (w − a)n+1 2πi n=0 n=0 R Ss2 (a) s1 z a w ∞ ∞ X Z Ss2 (a) ∞ X f (w)dw = cn (z−a)n , (w − a)n+1 n=0 f (w)dw (w−a)n+1 are the coefficients of the power series f1 (z). w−a < 1 such Case 2. w ∈ Ss1 (a). Then | z − a | > | w − a | and z−a that ∞ ∞ X 1 1 1 X n 1 (w − a)n =− = − q = − . w−z z − a 1 − w−a z − a n=0 (z − a)n+1 z−a n=0 Since f (w) is bounded on Ss1 (a), the geometric series has a converging numerical upper bound. 
Hence, the series converges uniformly with respect to w; we can exchange integration and summation:
  f_2(z) = −\frac{1}{2\pi i} \int_{S_{s_1}(a)} f(w) \left( −\sum_{n=0}^{∞} \frac{(w − a)^n}{(z − a)^{n+1}} \right) dw = \sum_{n=1}^{∞} c_{−n} (z − a)^{−n},
where
  c_{−n} = \frac{1}{2\pi i} \int_{S_{s_1}(a)} f(w) (w − a)^{n−1}\, dw
are the coefficients of the series f_2(z). Since the integrand f(w)/(w − a)^k, k ∈ Z, is holomorphic in both annuli A_{s_1,ρ} and A_{ρ,s_2}, by Remark 14.4 (c)
  \int_{S_{s_2}(a)} \frac{f(w)\, dw}{(w − a)^k} = \int_{S_ρ(a)} \frac{f(w)\, dw}{(w − a)^k}  and  \int_{S_{s_1}(a)} \frac{f(w)\, dw}{(w − a)^k} = \int_{S_ρ(a)} \frac{f(w)\, dw}{(w − a)^k};
that is, in the coefficient formulas we can replace both circles S_{s_1}(a) and S_{s_2}(a) by a common circle S_ρ(a). Since a power series converges uniformly on every compact subset of its disc of convergence, the last assertion follows.
Remark 14.11 The Laurent series of f on A_{r,R}(a) is unique; its coefficients c_n, n ∈ Z, are uniquely determined by (14.13). Another value of ρ with r < ρ < R yields the same values c_n by Remark 14.4 (c).
Example 14.12 Find the Laurent expansion of f(z) = 2/(z² − 4z + 3) in the three annuli with center 0: 0 < | z | < 1, 1 < | z | < 3, and 3 < | z |. Using partial fraction decomposition, f(z) = 1/(1 − z) + 1/(z − 3), and we find, for | z | < 3,
  \frac{1}{3 − z} = \frac{1}{3} \cdot \frac{1}{1 − z/3} = \frac{1}{3} \sum_{n=0}^{∞} \left( \frac{z}{3} \right)^n.
(a) For | z | < 1 we also have 1/(1 − z) = \sum_{n=0}^{∞} z^n; hence
  f(z) = \sum_{n=0}^{∞} \left( 1 − \frac{1}{3^{n+1}} \right) z^n,  | z | < 1.
(b) For | z | > 1,
  \frac{1}{1 − z} = −\frac{1}{z} \cdot \frac{1}{1 − 1/z} = −\sum_{n=0}^{∞} \frac{1}{z^{n+1}},
and, as in (a), for | z | < 3 we use the expansion of 1/(3 − z), such that
  f(z) = \sum_{n=0}^{∞} \frac{−1}{z^{n+1}} + \sum_{n=0}^{∞} \frac{−1}{3^{n+1}}\, z^n,  1 < | z | < 3.
(c) For | z | > 3 we have
  \frac{1}{z − 3} = \frac{1}{z} \cdot \frac{1}{1 − 3/z} = \sum_{n=0}^{∞} \frac{3^n}{z^{n+1}},
such that
  f(z) = \sum_{n=1}^{∞} (3^{n−1} − 1)\, \frac{1}{z^n}.
We want to study the behaviour of f in a neighborhood of an essential singularity a. It is characterized by the following theorem; for the proof see Conway [Con78, p. 300].
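The coefficient formula (14.13) can be checked numerically against Example 14.12: approximating c_n = (1/2πi) ∮_{S_2(0)} f(z) z^{−n−1} dz by a Riemann sum on the circle | z | = 2, which lies in the middle annulus, should reproduce c_n = −1/3^{n+1} for n ≥ 0 and c_n = −1 for n < 0. A sketch, with a helper name of my own:

```python
import cmath, math

def laurent_coeff(f, n, r, num=4000):
    # c_n = (1/2πi) ∮_{S_r(0)} f(z)/z^{n+1} dz
    #     = (1/2π) ∫_0^{2π} f(re^{it}) (re^{it})^{-n} dt   (since dz = iz dt)
    total = 0j
    for k in range(num):
        z = r * cmath.exp(2j * math.pi * k / num)
        total += f(z) * z ** (-n)
    return total / num

f = lambda z: 2 / (z * z - 4 * z + 3)
# middle annulus 1 < |z| < 3, sampled on the circle |z| = 2
for n in range(4):
    assert abs(laurent_coeff(f, n, 2.0) - (-1 / 3 ** (n + 1))) < 1e-9  # c_n = -1/3^{n+1}
    assert abs(laurent_coeff(f, -(n + 1), 2.0) - (-1)) < 1e-9          # c_{-n} = -1
print("ok")
```

Sampling instead on | z | = 1/2 or | z | = 4 would reproduce the coefficients of cases (a) and (c), since the formula always returns the Laurent coefficients of the annulus containing the chosen circle.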
Theorem 14.23 (Great Picard Theorem (1879)) Suppose that f (z) is holomorphic in the annulus G = {z | 0 < | z − a | < r} with an essential singularity at a. Then there exists a complex number w1 with the following property. For any complex number w 6= w1 , there are infinitely many z ∈ G with f (z) = w. In other words, in every neighborhood of a the function f (z) takes all complex values with possibly one omission. In case of f (z) = e1/z the number 0 is omitted; in case of f (z) = sin 1z no complex number is omitted. We will prove a much weaker form of this statement. Proposition 14.24 (Casorati–Weierstraß) Suppose that f (z) is holomorphic in the annulus G = {z | 0 < | z − a | < r} with an essential singularity at a. Then the image of any neighborhood of a in G is dense in C, that is for every w ∈ C and any ε > 0 and δ > 0 there exists z ∈ G such that | z − a | < δ and | f (z) − w | < ε. Proof. For simplicity, assume that δ < r. Assume to the contrary that there exists w ∈ C and ◦ ε > 0 such that | f (z) − w | ≥ ε for all z ∈ U δ (a). Then the function g(z) = 1 , f (z) − w ◦ z ∈ U δ (a) is bounded (by 1/ε) in some neighborhood of a; hence, by Proposition 14.21, a is a removable singularity of g(z). We conclude that f (z) = 1 +w g(z) has a removable singularity at a if g(a) 6= 0. If, on the other hand, g(z) has a zero at a of order m, that is ∞ X cn (z − a)n , cm 6= 0, g(z) = n=m 14 Complex Analysis 392 the function (z − a)m f (z) has a removable singularity at a. Thus, f has a pole of order m at a. Both conclusions contradict our assumption that f has an essential singularity at a. The Laurent expansion establishes an easy classification of the singularity of f at a. We summarize the main facts about isolated singularities. ◦ Proposition 14.25 Suppose that f (z) is holomorphic in the punctured disc U = U R (a) and ∞ X cn (z − a)n . possesses there the Laurent expansion f (z) = n=−∞ Then the singularity at a (a) is removable if cn = 0 for all n < 0. 
In this case, | f(z) | is bounded near a.
(b) is a pole of order m if c_{−m} ≠ 0 and c_n = 0 for all n < −m. In this case, \lim_{z→a} | f(z) | = +∞.
(c) is an essential singularity if c_n ≠ 0 for infinitely many n < 0. In this case, | f(z) | has no finite or infinite limit as z → a.
The easy proof is left to the reader. Note that Casorati–Weierstraß implies that | f(z) | has no limit at an essential singularity a.
Example 14.13 f(z) = e^{1/z} has in C \ {0} the Laurent expansion
  e^{1/z} = \sum_{n=0}^{∞} \frac{1}{n!\, z^n},  | z | > 0.
Since c_{−n} ≠ 0 for all n, f has an essential singularity at 0.
14.5 Residues
Throughout, U ⊆ C is an open connected subset of C.
Definition 14.7 Suppose that f : U_r(a) \ {a} → C is holomorphic, let 0 < r_1 < r, and let
  f(z) = \sum_{n∈Z} c_n (z − a)^n,  c_n = \frac{1}{2\pi i} \int_{S_{r_1}(a)} \frac{f(z)\, dz}{(z − a)^{n+1}},
be the Laurent expansion of f in the annulus {z | 0 < | z − a | < r}. Then the coefficient
  c_{−1} = \frac{1}{2\pi i} \int_{S_{r_1}(a)} f(z)\, dz
is called the residue of f at a and is denoted by Res_a f(z) or Res_a f.
Remarks 14.12 (a) If f is holomorphic at a, Res_a f = 0 by Cauchy's theorem.
(b) The integral ∫_γ f(z) dz depends only on the coefficient c_{−1} in the Laurent expansion of f(z) around a. Indeed, every summand c_n (z − a)^n with n ≠ −1 has an antiderivative in U \ {a}, so that its integral over a closed path is 0.
(c) Res_a f + Res_a g = Res_a (f + g) and Res_a (λf) = λ Res_a f.
Theorem 14.26 (Residue Theorem) Suppose that f : U \ {a_1, …, a_m} → C, a_1, …, a_m ∈ U, is holomorphic. Further, let γ be a non-selfintersecting, positively oriented, closed curve in U such that the points a_1, …, a_m are in the inner part of γ. Then
  \int_γ f(z)\, dz = 2\pi i \sum_{k=1}^{m} \mathrm{Res}_{a_k} f.  (14.14)
Proof. As in Remark 14.4 we can replace γ by a sum of integrals over small circles, one around each singularity. As before, we obtain
  \int_γ f(z)\, dz = \sum_{k=1}^{m} \int_{S_ε(a_k)} f(z)\, dz,
where all circles are positively oriented. Applying the definition of the residue we obtain the assertion.
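As a sanity check of the Residue Theorem, one can compare a numerically computed contour integral with 2πi times the sum of the residues, each residue obtained as the coefficient c_{−1} over a small circle. The example function (3z + 1)/(z(z − 2)) and the helper `contour_integral` are my own illustrative choices:

```python
import cmath, math

def contour_integral(f, a, r, num=4000):
    # ∮_{S_r(a)} f(z) dz with z = a + r e^{it}, dz = i r e^{it} dt
    total = 0j
    for k in range(num):
        t = 2 * math.pi * k / num
        total += f(a + r * cmath.exp(1j * t)) * 1j * r * cmath.exp(1j * t)
    return total * 2 * math.pi / num

f = lambda z: (3 * z + 1) / (z * (z - 2))             # simple poles at 0 and 2
res0 = contour_integral(f, 0, 0.5) / (2j * math.pi)   # c_{-1} at 0
res2 = contour_integral(f, 2, 0.5) / (2j * math.pi)   # c_{-1} at 2
assert abs(res0 - (-0.5)) < 1e-9   # Res_0 f = 1/(0 - 2) = -1/2
assert abs(res2 - 3.5) < 1e-9      # Res_2 f = 7/2
# residue theorem: one big positively oriented circle around both poles
assert abs(contour_integral(f, 0, 3.0) - 2j * math.pi * (res0 + res2)) < 1e-9
print("ok")
```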
Remarks 14.13 (a) The residue theorem generalizes the Cauchy’s Theorem, see Theorem 14.6. Indeed, if f (z) possesses an analytic continuation to the points a1 , . . . , am , all the residues are R zero and therefore γ f (z) dz = 0. ∞ X (b) If g(z) is holomorphic in the region U, g(z) = cn (z − a)n , c0 = g(a), then n=0 f (z) = g(z) , z−a z ∈ U \ {a} is holomorphic in U \ {a} with a Laurent expansion around a: f (z) = c0 + c1 + c2 (z − a)2 + · · · , z−a where c0 = g(a) = Res f . The residue theorem gives a Z Sr (a) g(z) dz = 2πi Res f = 2πi c0 = 2πig(a). a z−a We recovered Cauchy’s integral formula. 14 Complex Analysis 394 14.5.1 Calculating Residues (a) Pole of order 1 As in the previous Remark, suppose that f has a pole of order 1 at a and g(z) = (z − a)f (z) is the corresponding holomorphic function in Ur (a). Then Res f = g(a) = a lim g(z) = lim (z − a)f (z). (14.15) z→a z→a,z6=a (b) Pole of order m Suppose that f has a pole of order m at a. Then f (z) = c−m c−m+1 c−1 + c0 + c1 (z − a) + · · · , + +···+ m m−1 (z − a) (z − a) z−a 0 < |z − a| < r (14.16) is the Laurent expansion of f around a. Multiplying (14.16) by (z − a)m yields a holomorphic function (z − a)m f (z) = c−m + c−m+1 (z − a) + · · · c−1 (z − a)m−1 + · · · , | z − a | < r. Differentiating this (m − 1) times, all terms having coefficient c−m , c−m+1 , . . . , c−2 vanish and we are left with the power series dm−1 ((z − a)m f (z)) = (m − 1)!c−1 + m(m − 1) · · · 2 c0 (z − a) + · · · dz m−1 Inserting z = a on the left, we obtain c−1 . However, on the left we have to take the limit z → a since f is not defined at a. Thus, if f has a pol of order m at a, Res f (z) = a dm−1 1 lim m−1 ((z − a)m f (z)) . (m − 1)! z→a dz (14.17) (c) Quotients of Holomorphic Functions Suppose that f = pq where p and q are holomorphic at a and q has a zero of order 1 at a, that is q(a) = 0 6= q ′ (a). Then, by (a) p(z) p = lim Res = lim (z − a) a z→a q q(z) z→a p(z) q(z)−q(a) z−a lim p(z) = z→a lim q(z)−q(a) z−a z→a = p(a) . 
q ′ (a) (14.18) 14.6 Real Integrals 395 R dz Example 14.14 Compute S1 (i) 1+z The only singularities of 4. 4 f (z) = 1/(1 + z ) inside the disc {z | | z − i | < 1} are √ √ a1 = eπi/4 = (1 + i)/ 2 and a2 = e3πi/4 = (−1 + i)/ 2. In√ deed, | a1 − i |2 = 2 − 2 < 1. We apply the Residue Theorem and (c) and obtain Im i a2 a1 0 1 Z Re S1 (i) dz = 2πi Res f + Res f = a1 a2 1 + z4 √ 2π 1 −a1 − a2 1 = . + 3 = 2πi 2πi 3 4a1 4a2 4 2 14.6 Real Integrals 14.6.1 Rational Functions in Sine and Cosine Suppose we have to compute the integral of such a function over a full period [0, 2π]. The idea is to replace t by z = eit on the unit circle, cos t and sin t by (z + 1/z)/2 and (z − 1/z)/(2i), respectively, and finally dt = dz/(iz). Proposition 14.27 Supppose that R(x, y) is a rational function in two variables and R(cos t, sin t) is defined for all t ∈ [0, 2π]. Then Z 2π X R(cos t, sin t) dt = 2πi Res f (z), (14.19) 0 where a∈U1 (0) 1 f (z) = R iz a 1 1 1 1 z+ , z− 2 z 2i z and the sum is over all isolated singularities of f (z) in the open unit ball. Proof. By the residue theorem, Z f (z) dz = 2πi S1 (0) X a∈U1 (0) Res f a Let z = eit for t ∈ [0, 2π]. Rewriting the integral on the left using dz = eit i dt = iz dt Z 2π Z f (z) dz = R(cos t, sin t) dt S1 (0) 0 completes the proof. Example 14.15 For | a | < 1, Z 2π 0 dt 2π = . 2 1 − 2a cos t + a 1 − a2 14 Complex Analysis 396 For a = 0, the statement is trivially true; suppose now a 6= 0. Indeed, the complex function corresponding to the integrand is f (z) = 1 1 i/a . = = iz(1 + a2 − az − a/z) i(−az 2 − a + (1 + a2 )z) (z − a) z − a1 In the unit disc, f (z) has exactly one pole of order 1, namely z = a. By (14.15), the formula in Subsection 14.5.1, i i a ; Res f = lim (z − a)f (z) = = 2 1 a z→a a −1 a− a the assertion follows from the proposition: Z 2π 0 dt 2π i = = 2πi 2 . 2 1 − 2a cos t + a a −1 1 − a2 Specializingi R = 1 and r = a ∈ R in Homework 49.1, we obtain the same formula. 
14.6.2 Integrals of the form (a) The Principal Value R∞ f (x) dx −∞ We often compute improper integrals of the form Z ∞ f (x) dx. Using the residue theorem, we −∞ calculate limits lim R→∞ Z R f (x) dx, (14.20) −R which is called the principal value (or Cauchy mean value) of the integral over R and we denote it by Z ∞ Vp f (x) dx. −∞ The existence of the “coupled” limit (14.20) in general does not imply the existence of the improper integral Z ∞ −∞ f (x) dx = lim r→−∞ Z r 0 f (x) dx + lim s→∞ Z s f (x) dx. 0 R∞ R∞ R∞ For example, Vp −∞ x dx = 0 whereas −∞ x dx does not exist since 0 x dx = +∞. In general, the existence of the improper integral implies the existence of the principal value. If f is an even function or f (x) ≥ 0, the existence of the principal value implies the existence of the improper integral. (b) Rational Functions R The main idea to evaluate the integral R f (x) dx is as follows. Let H = {z | Im (z) > 0} be the upper half-plane and f : H \ {a1 , . . . , am } → C be holomorphic. Choose R > 0 large 14.6 Real Integrals 397 enough such that | ak | < R for all k = 1, . . . , m, that is, all isolated singularities of f are in the upper-plane-half-disc of radius R around 0. Consider the path as in the picture which consists of the segment from −R to R on the real iR line and the half-circle γR of radius R. By the γ . R residue theorem, . . Z R Z m . . . X . f (x) dx+ Res f (z). f (z) dz = 2πi −R −R R If lim R→∞ Z γR f (z) dz = 0 k=1 ak (14.21) γR the above formula implies lim R→∞ Z R f (x) dx = 2πi −R Knowing the existence of the improper integral Z ∞ Z Res (z). k=1 ∞ ak f (x) dx one has −∞ f (x) dx = 2πi −∞ m X m X Res (z). k=1 ak Suppose that f = pq is a rational function such that q has no real zeros and deg q ≥ deg p + 2. Then (14.21) is satisfied. Indeed, since only the two leading terms of p and q determine the the p(z) ≤ C on γR . 
Using the estimate $\bigl|\int_\gamma f\,dz\bigr| \le M\,\ell(\gamma)$ from Remark 14.3 (c) we get

$$\left|\int_{\gamma_R} \frac{p(z)}{q(z)}\,dz\right| \le \frac{C}{R^2}\,\ell(\gamma_R) = \frac{\pi C}{R} \xrightarrow[R\to\infty]{} 0.$$

For the same reason, namely $|p(x)/q(x)| \le C/x^2$ for large $x$, the improper real integral exists (comparison test) and converges absolutely. Thus we have shown the following proposition.

Proposition 14.28 Suppose that $p$ and $q$ are polynomials with $\deg q \ge \deg p + 2$, $q$ has no real zeros, and $a_1,\dots,a_m$ are all poles of the rational function $f(z) = \frac{p(z)}{q(z)}$ in the open upper half-plane $H$. Then

$$\int_{-\infty}^{\infty} f(x)\,dx = 2\pi i \sum_{k=1}^{m} \mathrm{Res}_{a_k} f.$$

Example 14.16 (a) $\int_{-\infty}^{\infty} \frac{dx}{1+x^2}$. The only zero of $q(z) = z^2+1$ in $H$ is $a_1 = i$, and $\deg(1+z^2) = 2 \ge \deg(1) + 2$, such that

$$\int_{-\infty}^{\infty} \frac{dx}{1+x^2} = 2\pi i\,\mathrm{Res}_{i}\,\frac{1}{1+z^2} = 2\pi i\,\frac{1}{2z}\Big|_{z=i} = \pi.$$

(b) It follows from Example 14.14 that

$$\int_{-\infty}^{\infty} \frac{dx}{1+x^4} = 2\pi i\left(\mathrm{Res}_{a_1} f + \mathrm{Res}_{a_2} f\right) = \frac{\sqrt2\,\pi}{2}.$$

(c) We compute the integral

$$\int_0^{\infty} \frac{dt}{1+t^6} = \frac12\int_{-\infty}^{\infty} \frac{dt}{1+t^6}.$$

The zeros of $q(z) = z^6+1$ in the upper half-plane are $a_1 = e^{i\pi/6}$, $a_2 = e^{i\pi/2} = i$, and $a_3 = e^{5i\pi/6}$. They are all of multiplicity 1, such that formula (14.18) applies:

$$\mathrm{Res}_{a_k} \frac{1}{q(z)} = \frac{1}{q'(a_k)} = \frac{1}{6a_k^5} = -\frac{a_k}{6},$$

since $a_k^6 = -1$. By Proposition 14.28, and noting that $a_1 + a_3 = i$,

$$\int_0^{\infty} \frac{dt}{1+t^6} = \frac12\,2\pi i\left(-\frac16\right)\left(e^{i\pi/6} + i + e^{5i\pi/6}\right) = -\frac{\pi i}{6}\,2i = \frac{\pi}{3}.$$

(c) Functions of Type $g(z)\,e^{i\alpha z}$

Proposition 14.29 Suppose that $p$ and $q$ are polynomials with $\deg q \ge \deg p + 1$, $q$ has no real zeros, and $a_1,\dots,a_m$ are all poles of the rational function $g = \frac{p}{q}$ in the open upper half-plane $H$. Put $f(z) = g(z)\,e^{i\alpha z}$, where $\alpha \in \mathbb{R}$ is positive, $\alpha > 0$. Then

$$\int_{-\infty}^{\infty} f(x)\,dx = 2\pi i \sum_{k=1}^{m} \mathrm{Res}_{a_k} f.$$

Proof. (The proof was omitted in the lecture.) Instead of a semi-circle it is more appropriate to consider a rectangle now, with vertices $-r$, $r$, $r+ir$, $-r+ir$ as in the picture. According to the residue theorem,

$$\int_{-r}^{r} f(x)\,dx + \int_{r}^{r+ir} f(z)\,dz + \int_{r+ir}^{-r+ir} f(z)\,dz + \int_{-r+ir}^{-r} f(z)\,dz = 2\pi i \sum_{k=1}^{m} \mathrm{Res}_{a_k} f.$$
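Returning to Example 14.16 (c): the residue computation there can be reproduced with ordinary complex arithmetic (a small sketch; nothing here is specific to the lecture's notation):

```python
import cmath, math

# Poles of 1/(z^6 + 1) in the upper half-plane and their residues
# Res_a = 1/q'(a) = -a/6 (using a^6 = -1); by the residue theorem the
# integral over the whole real line is 2π/3, hence π/3 over [0, ∞).
poles = [cmath.exp(1j * math.pi * k / 6) for k in (1, 3, 5)]
integral = 2j * math.pi * sum(-a / 6 for a in poles)
assert abs(integral - 2 * math.pi / 3) < 1e-12
```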
Since $\deg q \ge \deg p + 1$, $\lim_{z\to\infty} \frac{p(z)}{q(z)} = 0$. Thus $s_r = \sup_{|z|\ge r} \bigl|\frac{p(z)}{q(z)}\bigr|$ exists and tends to $0$ as $r \to \infty$.

Consider the second integral $I_2$ with $z = r + it$, $t \in [0,r]$, $dz = i\,dt$. On this segment we have the estimate

$$\left|\frac{p(z)}{q(z)}\,e^{i\alpha(r+it)}\right| \le s_r\,e^{-\alpha t},$$

which implies

$$|I_2| \le s_r \int_0^r e^{-\alpha t}\,dt = \frac{s_r}{\alpha}\bigl(1 - e^{-\alpha r}\bigr) \le \frac{s_r}{\alpha}.$$

A similar estimate holds for the fourth integral, from $-r+ir$ to $-r$. In case of the third integral one has $z = t + ir$, $t \in [-r,r]$, $dz = dt$, such that

$$|I_3| \le s_r \int_{-r}^{r} \bigl|e^{i\alpha(t+ri)}\bigr|\,dt = s_r\,e^{-\alpha r}\int_{-r}^{r} dt = 2r s_r\,e^{-\alpha r}.$$

Since $2re^{-\alpha r}$ is bounded and $s_r \to 0$ as $r \to \infty$, all three integrals $I_2$, $I_3$, and $I_4$ tend to $0$ as $r \to \infty$. This completes the proof.

Example 14.17 For $a > 0$,

$$\int_0^{\infty} \frac{\cos t}{t^2+a^2}\,dt = \frac{\pi}{2a}\,e^{-a}.$$

Obviously,

$$\int_0^{\infty} \frac{\cos t}{t^2+a^2}\,dt = \frac12\,\mathrm{Re}\int_{-\infty}^{\infty} \frac{e^{it}}{t^2+a^2}\,dt.$$

The function $f(z) = \frac{e^{iz}}{z^2+a^2}$ has a single pole of order 1 in the upper half-plane, at $z = ai$. By formula (14.18),

$$\mathrm{Res}_{ai}\,\frac{e^{iz}}{z^2+a^2} = \frac{e^{iz}}{2z}\Big|_{z=ai} = \frac{e^{-a}}{2ai}.$$

Proposition 14.29 gives the result.

(d) A Fourier transformation

Lemma 14.30 For $a \in \mathbb{R}$,

$$\frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} e^{-\frac12 x^2 - iax}\,dx = e^{-\frac12 a^2}. \qquad(14.22)$$

Proof. Let $f(z) = e^{-\frac12 z^2}$, $z \in \mathbb{C}$, and let $\gamma$ be the closed rectangular path $\gamma = \gamma_1 + \gamma_2 + \gamma_3 + \gamma_4$ with vertices $-R$, $R$, $R+ai$, $-R+ai$, as in the picture. Since $f$ is an entire function, Cauchy's theorem gives $\int_\gamma f(z)\,dz = 0$. Note that $\gamma_2$ is parametrized as $z = R + ti$, $t \in [0,a]$, $dz = i\,dt$, such that

$$\int_{\gamma_2} f(z)\,dz = \int_0^a e^{-\frac12(R+it)^2}\,i\,dt = \int_0^a e^{-\frac12(R^2+2Rit-t^2)}\,i\,dt,$$

$$\left|\int_{\gamma_2} f(z)\,dz\right| \le \int_0^a e^{-\frac12 R^2 + \frac12 t^2}\,dt = e^{-\frac12 R^2}\int_0^a e^{\frac12 t^2}\,dt = C\,e^{-\frac12 R^2}.$$

Since $e^{-\frac12 R^2}$ tends to $0$ as $R \to \infty$, the above integral tends to $0$ as well; hence $\lim_{R\to\infty}\int_{\gamma_2} f(z)\,dz = 0$. Similarly, one can show that $\lim_{R\to\infty}\int_{\gamma_4} f(z)\,dz = 0$. Since $\int_\gamma f(z)\,dz = 0$, we have $\int_{\gamma_1+\gamma_3} f(z)\,dz = 0$, that is,

$$\int_{-\infty}^{\infty} f(x)\,dx = \int_{-\infty}^{\infty} f(x+ai)\,dx.$$
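As a quick numerical cross-check of Example 14.17 (a sketch: the truncation point $T$ and the step count are arbitrary choices; the tail beyond $T$ is $O(1/T^2)$ by integration by parts):

```python
import math

# Composite midpoint rule on [lo, hi] with n subintervals.
def midpoint(f, lo, hi, n):
    h = (hi - lo) / n
    return sum(f(lo + (k + 0.5) * h) for k in range(n)) * h

a = 1.0
approx = midpoint(lambda t: math.cos(t) / (t*t + a*a), 0.0, 200.0, 200_000)
exact = math.pi / (2*a) * math.exp(-a)
assert abs(approx - exact) < 1e-3
```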
Using $\int_{\mathbb{R}} e^{-\frac12 x^2}\,dx = \sqrt{2\pi}$, which follows from Example 14.7, page 374, or from homework 41.3, we have

$$\sqrt{2\pi} = \int_{\mathbb{R}} e^{-\frac12 x^2}\,dx = \int_{\mathbb{R}} e^{-\frac12(x^2+2iax-a^2)}\,dx = e^{\frac12 a^2}\int_{\mathbb{R}} e^{-\frac12 x^2 - iax}\,dx,$$

hence

$$\frac{1}{\sqrt{2\pi}}\int_{\mathbb{R}} e^{-\frac12 x^2 - iax}\,dx = e^{-\frac12 a^2}.$$

Chapter 15
Partial Differential Equations I — an Introduction

15.1 Classification of PDE

15.1.1 Introduction

There is no general theory known concerning the solvability of all PDE. Such a theory is extremely unlikely to exist, given the rich variety of physical, geometric, and probabilistic phenomena which can be modelled by PDE. Instead, research focuses on various particular PDEs that are important for applications in mathematics and physics.

Definition 15.1 A partial differential equation (abbreviated PDE) is an equation of the form

$$F(x, y, \dots, u, u_x, u_y, \dots, u_{xx}, u_{xy}, \dots) = 0, \qquad(15.1)$$

where $F$ is a given function of the independent variables $x, y, \dots$, of the unknown function $u$, and of a finite number of its partial derivatives. We call $u$ a solution of (15.1) if, after substitution of $u(x,y,\dots)$ and its partial derivatives, (15.1) is identically satisfied in some region $\Omega$ of the space of the independent variables $x, y, \dots$.

The order of a PDE is the order of the highest derivative that occurs. A PDE is called linear if it is linear in the unknown function $u$ and its derivatives $u_x, u_y, u_{xy}, \dots$, with coefficients depending only on the variables $x, y, \dots$. In other words, a linear PDE can be written in the form

$$G(u, u_x, u_y, \dots, u_{xx}, u_{xy}, \dots) = f(x, y, \dots), \qquad(15.2)$$

where the function $f$ on the right depends only on the variables $x, y, \dots$ and $G$ is linear in all components with coefficients depending on $x, y, \dots$. More precisely, the formal differential operator $L(u) = G(u, u_x, u_y, \dots, u_{xx}, u_{xy}, \dots)$, which associates to each function $u(x, y, \dots)$ a new function $L(u)(x, y, \dots)$, is a linear operator.
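The defining property of such an operator, $L(\alpha u + \beta v) = \alpha L(u) + \beta L(v)$, can be illustrated numerically. A minimal sketch with the hypothetical choice $L(u) = u_{xx}$, approximated by a second difference (the functions and the evaluation point are arbitrary):

```python
import math

h = 1e-4
def L(u, x):
    # Second-difference approximation of u_xx; as a finite sum of
    # function values it is exactly linear in u.
    return (u(x + h) - 2*u(x) + u(x - h)) / h**2

u = math.sin
v = math.exp
w = lambda x: 2*u(x) + 3*v(x)
x0 = 0.7
assert abs(L(w, x0) - (2*L(u, x0) + 3*L(v, x0))) < 1e-5
```

The identity holds exactly in exact arithmetic; the tolerance only absorbs floating-point rounding.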
The linear PDE (15.2), $L(u) = f$, is called homogeneous if $f = 0$ and inhomogeneous otherwise. For example, $\cos(xy^2)\,u_{xxy} - y^2 u_x + u\sin x + \tan(x^2+y^2) = 0$ is a linear inhomogeneous PDE of order 3; the corresponding homogeneous linear PDE is $\cos(xy^2)\,u_{xxy} - y^2 u_x + u\sin x = 0$.

A PDE is called quasi-linear if it is linear in all partial derivatives of order $m$ (the order of the PDE), with coefficients which depend on the variables $x, y, \dots$ and on partial derivatives of order less than $m$; for example, $u_x u_{xx} + u^2 = 0$ is quasi-linear, $u_{xy} u_{xx} + 1 = 0$ is not. Semi-linear equations are those quasi-linear equations in which the coefficients of the highest order terms do not depend on $u$ and its partial derivatives; $\sin(x)\,u_{xx} + u^2 = 0$ is semi-linear, $u_x u_{xx} + u^2 = 0$ is not. Sometimes one considers systems of PDEs involving one or more unknown functions and their derivatives.

15.1.2 Examples

(1) The Laplace equation in $n$ dimensions for a function $u(x_1,\dots,x_n)$ is the linear second order equation

$$\Delta u = u_{x_1x_1} + \dots + u_{x_nx_n} = 0.$$

The solutions $u$ are called harmonic (or potential) functions. In case $n = 2$ we associate with a harmonic function $u(x,y)$ its "conjugate" harmonic function $v(x,y)$ such that the first-order system of Cauchy–Riemann equations $u_x = v_y$, $u_y = -v_x$ is satisfied. A real solution $(u,v)$ gives rise to the analytic function $f(z) = u + iv$. The Poisson equation is $\Delta u = f$, for a given function $f : \Omega \to \mathbb{R}$. The Laplace equation models equilibrium states, while the Poisson equation is important in electrostatics. The Laplace and Poisson equations always describe stationary processes (there is no time dependence).

(2) The heat equation. Here one coordinate $t$ is distinguished as the "time" coordinate, while the remaining coordinates $x_1,\dots,x_n$ represent spatial coordinates.
We consider $u : \Omega \times \mathbb{R}_+ \to \mathbb{R}$, $\Omega$ open in $\mathbb{R}^n$, where $\mathbb{R}_+ = \{t \in \mathbb{R} \mid t > 0\}$ is the positive time axis, and pose the equation

$$k\,u_t = \Delta u, \quad\text{where } \Delta u = u_{x_1x_1} + \dots + u_{x_nx_n}.$$

The heat equation models heat conduction and other diffusion processes.

(3) The wave equation. With the same notations as in (2), here we have the equation

$$u_{tt} - a^2\,\Delta u = 0.$$

It models wave and oscillation phenomena.

(4) The Korteweg–de Vries equation

$$u_t - 6u\,u_x + u_{xxx} = 0$$

models the propagation of waves in shallow water.

(5) The Monge–Ampère equation

$$u_{xx}u_{yy} - u_{xy}^2 = f,$$

with a given function $f$, is used for finding surfaces with prescribed curvature.

(6) The Maxwell equations for the electric field strength $E = (E_1,E_2,E_3)$ and the magnetic field strength $B = (B_1,B_2,B_3)$ as functions of $(t, x_1, x_2, x_3)$:

$$\mathrm{div}\,B = 0 \quad\text{(magnetostatic law)},$$
$$B_t + \mathrm{curl}\,E = 0 \quad\text{(magnetodynamic law)},$$
$$\mathrm{div}\,E = 4\pi\rho \quad\text{(electrostatic law, } \rho = \text{charge density)},$$
$$E_t - \mathrm{curl}\,B = -4\pi j \quad\text{(electrodynamic law, } j = \text{current density)}.$$

(7) The Navier–Stokes equations for the velocity $v(x,t) = (v^1,v^2,v^3)$ and the pressure $p(x,t)$ of an incompressible fluid of density $\rho$ and viscosity $\eta$:

$$\rho\,v^j_t + \rho\sum_{i=1}^{3} v^i v^j_{x_i} - \eta\,\Delta v^j = -p_{x_j}, \quad j = 1,2,3, \qquad \mathrm{div}\,v = 0.$$

(8) The Schrödinger equation

$$i\hbar\,u_t = -\frac{\hbar^2}{2m}\,\Delta u + V(x,u)$$

($m$ = mass, $V$ = given potential, $u : \Omega \to \mathbb{C}$) from quantum mechanics is formally similar to the heat equation, in particular in the case $V = 0$. The factor $i$, however, leads to crucial differences.

Classification

We have seen many rather different-looking PDEs, and it is hopeless to develop a theory that can treat all these diverse equations. In order to proceed we want to look for criteria to classify PDEs. Here are some possibilities.

(I) Algebraically.
(a) Linear equations are (1), (2), (3), (6) — which is of first order — and (8).
(b) Semi-linear equations are (4) and (7).
(c) A non-linear equation is (5).

Naturally, linear equations are simpler than non-linear ones. We shall therefore mostly study linear equations.

(II) The order of the equation. The Cauchy–Riemann equations and the Maxwell equations are linear first order equations. (1), (2), (3), (5), (7), (8) are of second order; (4) is of third order. Equations of higher order rarely occur. The most important PDEs are second order PDEs.

(III) Elliptic, parabolic, hyperbolic. In particular, for second order equations the following classification turns out to be useful. Let $x = (x_1,\dots,x_n) \in \Omega$ and let $F(x, u, u_{x_i}, u_{x_ix_j}) = 0$ be a second-order PDE. We introduce auxiliary variables $p_i$, $p_{ij}$, $i,j = 1,\dots,n$, and study the function $F(x, u, p_i, p_{ij})$. The equation is called elliptic in $\Omega$ if the matrix

$$\bigl(F_{p_{ij}}(x, u(x), u_{x_i}(x), u_{x_ix_j}(x))\bigr)_{i,j=1,\dots,n}$$

of the first derivatives of $F$ with respect to the variables $p_{ij}$ is positive definite or negative definite for all $x \in \Omega$. This may depend on the function $u$. The Laplace equation is the prominent example of an elliptic equation; Example (5) is elliptic if $f(x) > 0$. The equation is called hyperbolic if the above matrix has precisely one negative and $n-1$ positive eigenvalues (or conversely, depending on the choice of the sign). Example (3) is hyperbolic, and so is (5) if $f(x) < 0$. Finally, the equation is parabolic if one eigenvalue of the above matrix is $0$ and all the other eigenvalues have the same sign; more precisely, if the equation can be written in the form $u_t = F(t, x, u, u_{x_i}, u_{x_ix_j})$ with an elliptic $F$.

(IV) According to solvability.
We consider the second-order PDE $F(x, u, u_{x_i}, u_{x_ix_j}) = 0$ for $u : \Omega \to \mathbb{R}$, and wish to impose additional conditions upon the solution $u$, typically prescribing the values of $u$ or of certain first derivatives of $u$ on the boundary $\partial\Omega$ or on part of it. Ideally such a boundary problem satisfies the three conditions of Hadamard for a well-posed problem:

(a) Existence of a solution $u$ for the given boundary values;
(b) Uniqueness of the solution;
(c) Stability, meaning continuous dependence on the boundary values.

Example 15.1 In the following examples $\Omega = \mathbb{R}^2$ and $u = u(x,y)$.

(a) Find all solutions $u \in C^2(\mathbb{R}^2)$ with $u_{xx} = 0$. We first integrate with respect to $x$ and find that $u_x$ is independent of $x$, say $u_x = a(y)$. We integrate with respect to $x$ again and obtain $u(x,y) = x\,a(y) + b(y)$ with arbitrary functions $a$ and $b$. Note that the ODE $u'' = 0$ has the general solution $ax + b$ with constant coefficients $a$, $b$; now the coefficients are functions of $y$.

(b) Solve $u_{xx} + u = 0$, $u \in C^2(\mathbb{R}^2)$. The solution of the corresponding ODE $u'' + u = 0$, $u = u(x)$, $u \in C^2(\mathbb{R})$, is $a\cos x + b\sin x$, such that the general solution of the corresponding PDE in the two variables $x$ and $y$ is $a(y)\cos x + b(y)\sin x$ with arbitrary functions $a$ and $b$.

(c) Solve $u_{xy} = 0$, $u \in C^2(\mathbb{R}^2)$. First integrate $\frac{\partial}{\partial y}(u_x) = 0$ with respect to $y$; we obtain $u_x = \tilde f(x)$. Integration with respect to $x$ yields $u = \int \tilde f(x)\,dx + g(y) = f(x) + g(y)$, where $f$ is differentiable and $g$ is arbitrary.

15.2 First Order PDE — The Method of Characteristics

We solve first order PDEs by the method of characteristics. It applies to quasi-linear equations

$$a(x,y,u)\,u_x + b(x,y,u)\,u_y = c(x,y,u) \qquad(15.3)$$

as well as to the linear equation

$$a(x,y)\,u_x + b(x,y)\,u_y = c_0(x,y)\,u + c_1(x,y). \qquad(15.4)$$

We restrict ourselves to the linear equation, with an initial condition given as a parametric curve in $xyu$-space,

$$\Gamma = \Gamma(s) = (x_0(s), y_0(s), u_0(s)), \quad s \in (a,b) \subseteq \mathbb{R}. \qquad(15.5)$$

The curve $\Gamma$ will be called the initial curve.
The initial condition then reads

$$u(x_0(s), y_0(s)) = u_0(s), \quad s \in (a,b).$$

[Figure: the initial curve $\Gamma(s)$, the characteristic curves emanating from the initial points $(x(0,s), y(0,s), u(0,s))$, and the integral surface they sweep out.]

The geometric idea behind this method is the following. The solution $u = u(x,y)$ can be thought of as a surface in $\mathbb{R}^3 = \{(x,y,u) \mid x,y,u \in \mathbb{R}\}$. Starting from a point on the initial curve, we construct a characteristic curve lying in the surface $u$. If we do so for every point of the initial curve, we obtain a one-parameter family of characteristic curves; glueing all these curves together, we get the solution surface $u$.

The linear equation (15.4) can be rewritten as

$$(a,\ b,\ c_0u + c_1)\cdot(u_x,\ u_y,\ -1) = 0. \qquad(15.6)$$

Recall that $(u_x, u_y, -1)$ is the normal vector to the surface $(x, y, u(x,y))$; that is, the tangent plane to $u$ at $(x_0, y_0, u_0)$ is

$$u - u_0 = u_x(x - x_0) + u_y(y - y_0) \iff (x - x_0,\ y - y_0,\ u - u_0)\cdot(u_x,\ u_y,\ -1) = 0.$$

It follows from (15.6) that $(a, b, c_0u + c_1)$ is a vector in the tangent plane. Finding a curve $(x(t), y(t), u(t))$ with exactly this tangent vector, $(a(x(t),y(t)),\ b(x(t),y(t)),\ c_0(x(t),y(t))\,u(t) + c_1(x(t),y(t)))$, is equivalent to solving the ODE system

$$x'(t) = a(x(t), y(t)), \qquad(15.7)$$
$$y'(t) = b(x(t), y(t)), \qquad(15.8)$$
$$u'(t) = c_0(x(t), y(t))\,u(t) + c_1(x(t), y(t)). \qquad(15.9)$$

This system is called the characteristic equations; the solutions are called characteristic curves of the equation. Note that the above system is autonomous, i.e. there is no explicit dependence on the parameter $t$. In order to determine characteristic curves we need an initial condition; we shall require the initial point to lie on the initial curve $\Gamma(s)$. Since each curve $(x(t), y(t), u(t))$ emanates from a different point $\Gamma(s)$, we shall explicitly write the curves in the form $(x(t,s), y(t,s), u(t,s))$. The initial conditions are written as

$$x(0,s) = x_0(s), \quad y(0,s) = y_0(s), \quad u(0,s) = u_0(s).$$
Notice that we selected the parameter $t$ such that the characteristic curve sits on the initial curve at $t = 0$. Note further that the parametrization $(x(t,s), y(t,s), u(t,s))$ represents a surface in $\mathbb{R}^3$. The method of characteristics also applies to quasi-linear equations.

To summarize the method: in the first step we identify the initial curve $\Gamma$. In the second step we select a point $s$ on $\Gamma$ as initial point and solve the characteristic equations, using the point we selected on $\Gamma$ as the initial point. After performing these steps for all points on $\Gamma$, we obtain a portion of the solution surface, also called the integral surface, which consists of the union of the characteristic curves.

Example 15.2 Solve the equation $u_x + u_y = 2$ subject to the initial condition $u(x,0) = x^2$. The characteristic equations and the parametric initial conditions are

$$x_t(t,s) = 1, \quad y_t(t,s) = 1, \quad u_t(t,s) = 2,$$
$$x(0,s) = s, \quad y(0,s) = 0, \quad u(0,s) = s^2.$$

It is easy to solve the characteristic equations:

$$x(t,s) = t + f_1(s), \quad y(t,s) = t + f_2(s), \quad u(t,s) = 2t + f_3(s).$$

Inserting the initial conditions, we find

$$x(t,s) = t + s, \quad y(t,s) = t, \quad u(t,s) = 2t + s^2.$$

We have obtained a parametric representation of the integral surface. To find an explicit representation we have to invert the transformation $(x(t,s), y(t,s))$ in the form $(t(x,y), s(x,y))$; namely, we have to solve for $s$ and $t$. In the current example we find $t = y$, $s = x - y$. Thus the explicit formula for the integral surface is

$$u(x,y) = 2y + (x-y)^2.$$

Remark 15.1 (a) This simple example might lead us to think that each initial value problem for a first-order PDE possesses a unique solution, but this is not the case. Is the problem (15.3) together with the initial condition (15.5) well-posed? Under which conditions does there exist a unique integral surface that contains the initial curve?

(b) Notice that even if the PDE is linear, the characteristic equations are non-linear.
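The explicit solution of Example 15.2 can be verified directly (a small sketch using central differences; the sample points are arbitrary):

```python
# u(x,y) = 2y + (x - y)^2 should satisfy u_x + u_y = 2 everywhere and the
# initial condition u(x,0) = x^2.
u = lambda x, y: 2*y + (x - y)**2

h = 1e-6
for x0, y0 in [(0.0, 0.0), (1.5, -0.7), (-2.0, 3.0)]:
    u_x = (u(x0 + h, y0) - u(x0 - h, y0)) / (2*h)
    u_y = (u(x0, y0 + h) - u(x0, y0 - h)) / (2*h)
    assert abs(u_x + u_y - 2.0) < 1e-7
assert u(3.0, 0.0) == 9.0
```

Central differences are exact for quadratic polynomials, so the tolerance only covers floating-point rounding.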
It follows that one can expect at most a local existence theorem for a first order PDE.

(c) The inversion of the parametric representation of the integral surface might hide further difficulties. Recall that by the implicit function theorem the inversion locally exists if the Jacobian $\frac{\partial(x,y)}{\partial(t,s)} \ne 0$. An explicit computation of the Jacobian at a point $s$ of the initial curve gives

$$J = \frac{\partial x}{\partial t}\frac{\partial y}{\partial s} - \frac{\partial x}{\partial s}\frac{\partial y}{\partial t} = a\,y_0' - b\,x_0' = \begin{vmatrix} a & b \\ x_0' & y_0' \end{vmatrix}.$$

Thus the Jacobian vanishes at some point if and only if the vectors $(a,b)$ and $(x_0', y_0')$ are linearly dependent. The geometrical meaning of $J = 0$ is that the projection of $\Gamma$ onto the $xy$ plane is tangent to the projection of the characteristic curve onto the $xy$ plane. To ensure a unique solution near the initial curve we must have $J \ne 0$; this condition is called the transversality condition.

Example 15.3 (Well-posed and Ill-posed Problems) (a) Solve $u_x = 1$ subject to the initial condition $u(0,y) = g(y)$. The characteristic equations and the initial conditions are given by

$$x_t = 1, \quad y_t(t,s) = 0, \quad u_t(t,s) = 1,$$
$$x(0,s) = 0, \quad y(0,s) = s, \quad u(0,s) = g(s).$$

The parametric integral surface is $(x(t,s), y(t,s), u(t,s)) = (t, s, t + g(s))$, such that the explicit solution is $u(x,y) = x + g(y)$.

(b) If we keep $u_x = 1$ but modify the initial condition into $u(x,0) = h(x)$, the picture changes dramatically:

$$x_t = 1, \quad y_t(t,s) = 0, \quad u_t(t,s) = 1,$$
$$x(0,s) = s, \quad y(0,s) = 0, \quad u(0,s) = h(s).$$

In this case the parametric solution is $(x(t,s), y(t,s), u(t,s)) = (t+s, 0, t+h(s))$. Now, however, the transformation $x = t + s$, $y = 0$ cannot be inverted. Geometrically, the projection of the initial curve is the $x$ axis, but this is also the projection of the characteristic curve. In the special case $h(x) = x + c$ for some constant $c$, we obtain $u(t,s) = t + s + c$.
Then it is not necessary to invert $(x,y)$, since we find at once $u = x + c + f(y)$ for every differentiable function $f$ with $f(0) = 0$: we have infinitely many solutions — uniqueness fails. (c) However, for any other choice of $h$, existence fails — the problem has no solution at all. Note that the Jacobian is

$$J = \begin{vmatrix} a & b \\ x_0' & y_0' \end{vmatrix} = \begin{vmatrix} 1 & 0 \\ 1 & 0 \end{vmatrix} = 0.$$

Remark 15.2 Because of the special role played by the projections of the characteristics onto the $xy$ plane, we also use the term characteristics to denote these projections. In case of the linear PDE (15.4), the ODE for the projection is

$$x'(t) = \frac{dx}{dt} = a(x(t), y(t)), \quad y'(t) = \frac{dy}{dt} = b(x(t), y(t)), \qquad(15.10)$$

which yields

$$y'(x) = \frac{dy}{dx} = \frac{b(x,y)}{a(x,y)}.$$

15.3 Classification of Semi-Linear Second-Order PDEs

15.3.1 Quadratic Forms

We recall some basic facts about quadratic forms and symmetric matrices.

Proposition 15.1 (Sylvester's Law of Inertia) Suppose that $A \in \mathbb{R}^{n\times n}$ is a symmetric matrix.
(a) Then there exist an invertible matrix $B \in \mathbb{R}^{n\times n}$, numbers $r, s, t \in \mathbb{N}_0$ with $r + s + t = n$, and a diagonal matrix $D = \mathrm{diag}(d_1, d_2, \dots, d_{r+s}, 0, \dots, 0)$ with $d_i > 0$ for $i = 1,\dots,r$ and $d_i < 0$ for $i = r+1,\dots,r+s$, such that

$$BAB^\top = \mathrm{diag}(d_1, \dots, d_{r+s}, 0, \dots, 0).$$

We call $(r, s, t)$ the signature of $A$.
(b) The signature does not depend on the change of coordinates, i.e., if there exist another regular matrix $B'$ and a diagonal matrix $D'$ with $D' = B'A(B')^\top$, then the signatures of $D'$ and $D$ coincide.

15.3.2 Elliptic, Parabolic and Hyperbolic

Consider the semi-linear second-order PDE in $n$ variables $x_1,\dots,x_n$ in a region $\Omega \subset \mathbb{R}^n$,

$$\sum_{i,j=1}^{n} a_{ij}(x)\,u_{x_ix_j} + F(x, u, u_{x_1}, \dots, u_{x_n}) = 0, \qquad(15.11)$$

with continuous coefficients $a_{ij}(x)$. Since we assume $u \in C^2(\Omega)$, by Schwarz's lemma we may assume without loss of generality that $a_{ij} = a_{ji}$.
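Proposition 15.1 can be illustrated with a small hand-checked sketch: a congruence transformation $A \mapsto BAB^\top$ with invertible $B$ preserves the signature. For $2\times2$ symmetric matrices the sign of the determinant already decides between the definite case ($\det > 0$), the indefinite signature $(1,1,0)$ ($\det < 0$), and the degenerate case ($\det = 0$), so it suffices to compare determinant signs (the particular $A$ and $B$ are arbitrary choices):

```python
# 2x2 sketch of Sylvester's law of inertia, in exact integer arithmetic.
def mat_mul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 0], [0, -1]]                     # signature (1, 1, 0), det A = -1
B = [[2, 1], [0, 1]]                      # invertible, det B = 2
Bt = [[B[j][i] for j in range(2)] for i in range(2)]
C = mat_mul(mat_mul(B, A), Bt)            # congruent to A

det = lambda M: M[0][0] * M[1][1] - M[0][1] * M[1][0]
assert det(A) < 0 and det(C) < 0          # both indefinite: same signature
```

In the $2\times2$ case this is exactly the content of Remark 15.3 (d) below: the sign of $ac - b^2 = \det A$ decides the type of the equation and is unchanged by coordinate transformations.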
Using the terminology of the introduction (Classification (III), see page 404) we find that the matrix $A(x) := (a_{ij}(x))_{i,j=1,\dots,n}$ coincides with the matrix $(F_{p_{ij}})_{i,j}$ defined therein.

Definition 15.2 We call the PDE (15.11) elliptic at $x_0$ if the matrix $A(x_0)$ is positive definite or negative definite. We call it parabolic at $x_0$ if $A(x_0)$ is positive or negative semidefinite with exactly one eigenvalue $0$. We call it hyperbolic at $x_0$ if $A(x_0)$ has the signature $(n-1, 1, 0)$, i.e. $A$ is indefinite with $n-1$ positive eigenvalues, one negative eigenvalue, and no zero eigenvalue (or vice versa).

15.3.3 Change of Coordinates

First we study how the coefficients $a_{ij}$ change if we impose a non-singular transformation of coordinates

$$y = \varphi(x); \quad y_l = \varphi_l(x_1,\dots,x_n), \quad l = 1,\dots,n.$$

The transformation is called non-singular if the Jacobian $\frac{\partial(\varphi_1,\dots,\varphi_n)}{\partial(x_1,\dots,x_n)}(x_0) \ne 0$ at any point $x_0 \in \Omega$. By the Inverse Mapping Theorem, the transformation then possesses locally an inverse transformation, denoted by

$$x = \psi(y); \quad x_l = \psi_l(y_1,\dots,y_n), \quad l = 1,\dots,n.$$

Putting $\tilde u(y) := u(\psi(y))$, we have $u(x) = \tilde u(\varphi(x))$, and if moreover $\varphi_l \in C^2(\Omega)$, the chain rule gives

$$u_{x_i} = \sum_{l=1}^{n} \tilde u_{y_l}\frac{\partial\varphi_l}{\partial x_i}, \qquad u_{x_ix_j} = (u_{x_i})_{x_j} = \sum_{k,l=1}^{n} \tilde u_{y_ly_k}\frac{\partial\varphi_l}{\partial x_i}\frac{\partial\varphi_k}{\partial x_j} + \sum_{l=1}^{n} \tilde u_{y_l}\frac{\partial^2\varphi_l}{\partial x_i\partial x_j}. \qquad(15.12)$$

Inserting (15.12) into (15.11) one has

$$\sum_{k,l=1}^{n} \tilde u_{y_ly_k}\sum_{i,j=1}^{n} a_{ij}\frac{\partial\varphi_l}{\partial x_i}\frac{\partial\varphi_k}{\partial x_j} + \sum_{l=1}^{n} \tilde u_{y_l}\sum_{i,j=1}^{n} a_{ij}\frac{\partial^2\varphi_l}{\partial x_i\partial x_j} + \tilde F(y, \tilde u, \tilde u_{y_1}, \dots, \tilde u_{y_n}) = 0. \qquad(15.13)$$

We denote by $\tilde a_{lk}$ the new coefficients of the second partial derivatives of $\tilde u$,

$$\tilde a_{lk} = \sum_{i,j=1}^{n} a_{ij}(x)\,\frac{\partial\varphi_l}{\partial x_i}\frac{\partial\varphi_k}{\partial x_j},$$

and write (15.13) in the same form as (15.11):

$$\sum_{k,l=1}^{n} \tilde a_{lk}(y)\,\tilde u_{y_ly_k} + \tilde F(y, \tilde u, \tilde u_{y_1}, \dots, \tilde u_{y_n}) = 0. \qquad(15.14)$$

Equation (15.14) later plays a crucial role in simplifying PDE (15.11).
Namely, by choosing the transformation $\varphi$ appropriately, we can make some of the coefficients $\tilde a_{lk}$ in (15.14) vanish. Writing

$$b_{lj} = \frac{\partial\varphi_l}{\partial x_j}, \quad l,j = 1,\dots,n, \quad B = (b_{lj}),$$

the new coefficient matrix $\tilde A(y) = (\tilde a_{lk}(y))$ reads

$$\tilde A = B\,A\,B^\top.$$

By Proposition 15.1, $A$ and $\tilde A$ have the same signature. We have shown the following proposition.

Proposition 15.2 The type of a semi-linear second order PDE is invariant under a change of coordinates.

Notation. We call the operator $L$ with

$$L(u) = \sum_{i,j=1}^{n} a_{ij}(x)\frac{\partial^2 u}{\partial x_i\partial x_j} + F(x, u, u_{x_1}, \dots, u_{x_n})$$

a differential operator, and denote by

$$L_2(u) = \sum_{i,j=1}^{n} a_{ij}(x)\frac{\partial^2 u}{\partial x_i\partial x_j}$$

the sum of its highest order terms; $L_2$ is a linear operator.

Definition 15.3 The second-order PDE $L(u) = 0$ has normal form if

$$L_2(u) = \sum_{j=1}^{m} u_{x_jx_j} - \sum_{j=m+1}^{r} u_{x_jx_j}$$

with some positive integers $m \le r \le n$.

Remarks 15.3 (a) It can happen that the type of the equation depends on the point $x_0 \in \Omega$. For example, the Tricomi equation $y\,u_{xx} + u_{yy} = 0$ is of mixed type: it is elliptic if $y > 0$, parabolic if $y = 0$, and hyperbolic if $y < 0$.

(b) The Laplace equation is elliptic, the heat equation is parabolic, the wave equation is hyperbolic.

(c) The classification is not complete in case $n \ge 3$; for example, the quadratic form can be of type $(n-2, 1, 1)$.

(d) Case $n = 2$. The PDE

$$a\,u_{xx} + 2b\,u_{xy} + c\,u_{yy} + F(x, y, u, u_x, u_y) = 0,$$

with coefficients $a = a(x,y)$, $b = b(x,y)$, and $c = c(x,y)$, is elliptic, parabolic, or hyperbolic at $(x_0,y_0)$ if and only if $ac - b^2 > 0$, $ac - b^2 = 0$, or $ac - b^2 < 0$ at $(x_0,y_0)$, respectively.

15.3.4 Characteristics

Suppose we are given the semi-linear second-order PDE in $\Omega \subset \mathbb{R}^n$,

$$\sum_{i,j=1}^{n} a_{ij}(x)\frac{\partial^2 u}{\partial x_i\partial x_j} + F(x, u, u_{x_1}, \dots, u_{x_n}) = 0, \qquad(15.15)$$

with continuous $a_{ij}$, $a_{ij}(x) = a_{ji}(x)$. We define the concept of characteristics, which plays an important role in the theory of PDEs, not only of second-order PDEs.
Definition 15.4 Suppose that $\sigma \in C^1(\Omega)$ is continuously differentiable with $\mathrm{grad}\,\sigma \ne 0$.
(a) If for some point $x_0$ of the hypersurface $\mathcal{F} = \{x \in \Omega \mid \sigma(x) = c\}$, $c \in \mathbb{R}$, we have

$$\sum_{i,j=1}^{n} a_{ij}(x_0)\frac{\partial\sigma(x_0)}{\partial x_i}\frac{\partial\sigma(x_0)}{\partial x_j} = 0, \qquad(15.16)$$

then $\mathcal{F}$ is said to be characteristic at $x_0$.
(b) If $\mathcal{F}$ is characteristic at every point of $\mathcal{F}$, then $\mathcal{F}$ is called a characteristic hypersurface, or simply a characteristic, of the PDE (15.11). Equation (15.16) is called the characteristic equation of (15.11). In case $n = 2$ we speak of characteristic lines.

If all hypersurfaces $\sigma(x) = c$, $a < c < b$, are characteristic, this family of hypersurfaces fills the region $\Omega$ in such a way that through any point $x \in \Omega$ there is exactly one hypersurface with $\sigma(x) = c$. This $c$ can be chosen as one new coordinate: setting $y_1 = \sigma(x)$, we see from (15.14) that $\tilde a_{11} = 0$. That is, the knowledge of one or more characteristic hypersurfaces can simplify the PDE.

Example 15.4 (a) The characteristic equation of $u_{xy} = 0$ is $\sigma_x\sigma_y = 0$, such that $\sigma_x = 0$ and $\sigma_y = 0$ define the characteristic lines; the lines parallel to the coordinate axes, $y = c_1$ and $x = c_2$, are the characteristics.

(b) Find the type and the characteristic lines of $x^2u_{xx} - y^2u_{yy} = 0$, $x \ne 0$, $y \ne 0$. Since $\det A = x^2(-y^2) - 0 = -x^2y^2 < 0$, the equation is hyperbolic. The characteristic equation, in the most general case, is

$$a\,\sigma_x^2 + 2b\,\sigma_x\sigma_y + c\,\sigma_y^2 = 0.$$

Since $\mathrm{grad}\,\sigma \ne 0$, $\sigma(x,y) = c$ is locally solvable for $y = y(x)$, such that $y' = -\sigma_x/\sigma_y$. Another way to obtain this is as follows: differentiating the equation $\sigma(x,y) = c$ yields $\sigma_x\,dx + \sigma_y\,dy = 0$, or $dy/dx = -\sigma_x/\sigma_y$. Inserting this into the previous equation we obtain a quadratic ODE,

$$a\,(y')^2 - 2b\,y' + c = 0,$$

with solutions

$$y' = \frac{b \pm \sqrt{b^2 - ac}}{a}, \quad\text{if } a \ne 0.$$

We can see that an elliptic equation has no characteristic lines, a parabolic equation has one family of characteristics, and a hyperbolic equation has two families of characteristic lines.

Hyperbolic case.
In general, if $c_1 = \varphi_1(x,y)$ is the first family of characteristic lines and $c_2 = \varphi_2(x,y)$ is the second family, then

$$\xi = \varphi_1(x,y), \quad \eta = \varphi_2(x,y)$$

is the appropriate change of variables, which gives $\tilde a = \tilde c = 0$. The transformed equation then reads

$$2\tilde b\,\tilde u_{\xi\eta} + F(\xi, \eta, \tilde u, \tilde u_\xi, \tilde u_\eta) = 0.$$

Parabolic case. Since $\det A = 0$, there is only one real family of characteristic lines, say $c_1 = \varphi_1(x,y)$. We impose the change of variables $z = \varphi_1(x,y)$, $y = y$. Since $\det\tilde A = 0$, the coefficient $\tilde b$ vanishes (together with $\tilde a$). The transformed equation reads

$$\tilde c\,\tilde u_{yy} + F(z, y, \tilde u, \tilde u_z, \tilde u_y) = 0.$$

The above two equations are called the characteristic forms of the PDE (15.11).

In our case the characteristic equation is $x^2(y')^2 - y^2 = 0$, i.e. $y' = \pm y/x$. This yields

$$\frac{dy}{y} = \pm\frac{dx}{x}, \qquad \log|y| = \pm\log|x| + c_0.$$

We obtain the two families of characteristic lines

$$y = c_1x \ \text{ and } \ y = \frac{c_2}{x}, \quad\text{i.e.}\quad \frac{y}{x} = c_1, \quad xy = c_2.$$

Indeed, in our example

$$\xi = \frac{y}{x}, \quad \eta = xy$$

gives

$$\xi_x = -\frac{y}{x^2}, \quad \xi_y = \frac1x, \quad \xi_{xx} = \frac{2y}{x^3}, \quad \xi_{yy} = 0, \quad \xi_{xy} = -\frac{1}{x^2},$$
$$\eta_x = y, \quad \eta_y = x, \quad \eta_{xx} = 0, \quad \eta_{yy} = 0, \quad \eta_{xy} = 1.$$

In our case (15.12) reads

$$u_{xx} = \tilde u_{\xi\xi}\,\xi_x^2 + 2\tilde u_{\xi\eta}\,\xi_x\eta_x + \tilde u_{\eta\eta}\,\eta_x^2 + \tilde u_\xi\,\xi_{xx} + \tilde u_\eta\,\eta_{xx},$$
$$u_{yy} = \tilde u_{\xi\xi}\,\xi_y^2 + 2\tilde u_{\xi\eta}\,\xi_y\eta_y + \tilde u_{\eta\eta}\,\eta_y^2 + \tilde u_\xi\,\xi_{yy} + \tilde u_\eta\,\eta_{yy}.$$

Noting $x^2 = \eta/\xi$, $y^2 = \xi\eta$, and inserting the values of the partial derivatives of $\xi$ and $\eta$, we get

$$u_{xx} = \tilde u_{\xi\xi}\,\frac{y^2}{x^4} - 2\tilde u_{\xi\eta}\,\frac{y^2}{x^2} + \tilde u_{\eta\eta}\,y^2 + 2\tilde u_\xi\,\frac{y}{x^3},$$
$$u_{yy} = \tilde u_{\xi\xi}\,\frac{1}{x^2} + 2\tilde u_{\xi\eta} + \tilde u_{\eta\eta}\,x^2.$$

Hence

$$x^2u_{xx} - y^2u_{yy} = -4y^2\,\tilde u_{\xi\eta} + 2\frac{y}{x}\,\tilde u_\xi = 0 \iff \tilde u_{\xi\eta} - \frac{1}{2xy}\,\tilde u_\xi = 0.$$

Since $\eta = xy$, we obtain the characteristic form of the equation to be

$$\tilde u_{\xi\eta} - \frac{1}{2\eta}\,\tilde u_\xi = 0.$$

Using the substitution $v = \tilde u_\xi$, we obtain $v_\eta - \frac{1}{2\eta}v = 0$, which corresponds to the ODE $v' - \frac{v}{2\eta} = 0$. Hence $v(\xi,\eta) = c(\xi)\sqrt\eta$. Integration with respect to $\xi$ gives $\tilde u(\xi,\eta) = A(\xi)\sqrt\eta + B(\eta)$. Transforming back to the variables $x$ and $y$, the general solution is

$$u(x,y) = A\!\left(\frac{y}{x}\right)\sqrt{xy} + B(xy).$$

(c) The one-dimensional wave equation $u_{tt} - a^2u_{xx} = 0$.
The characteristic equation $\sigma_t^2 = a^2\sigma_x^2$ yields $-\sigma_t/\sigma_x = dx/dt = \dot x = \pm a$. The characteristics are $x = at + c_1$ and $x = -at + c_2$. The change of variables $\xi = x - at$ and $\eta = x + at$ yields $\tilde u_{\xi\eta} = 0$, which has the general solution $\tilde u(\xi,\eta) = f(\xi) + g(\eta)$. Hence,

$$u(x,t) = f(x - at) + g(x + at)$$

is the general solution; see also homework 23.2.

(d) The wave equation in $n$ dimensions has characteristic equation

$$\sigma_t^2 - a^2\sum_{i=1}^{n}\sigma_{x_i}^2 = 0.$$

This equation is satisfied by the characteristic cone

$$\sigma(x,t) = a^2(t - t^{(0)})^2 - \sum_{i=1}^{n}(x_i - x_i^{(0)})^2 = 0,$$

where the point $(x^{(0)}, t^{(0)})$ is the peak of the cone. Indeed,

$$\sigma_t = 2a^2(t - t^{(0)}), \quad \sigma_{x_i} = -2(x_i - x_i^{(0)}),$$

and $\sigma = 0$ imply

$$\sigma_t^2 - a^2\sum_{i=1}^{n}\sigma_{x_i}^2 = 4a^2\Bigl(a^2(t - t^{(0)})^2 - \sum_{i=1}^{n}(x_i - x_i^{(0)})^2\Bigr) = 0.$$

Further, there are other characteristic surfaces: the hyperplanes

$$\sigma(x,t) = at + \sum_{i=1}^{n} b_ix_i = 0, \quad\text{where } \|b\| = 1.$$

(e) The heat equation has characteristic equation $\sum_{i=1}^{n}\sigma_{x_i}^2 = 0$, which implies $\sigma_{x_i} = 0$ for all $i = 1,\dots,n$, such that $t = c$ is the only family of characteristic surfaces (the coordinate hyperplanes).

(f) The Poisson and Laplace equations have the same characteristic equation; however, we have one variable less (no $t$) and obtain $\mathrm{grad}\,\sigma = 0$, which is impossible. The Poisson and Laplace equations have no characteristic surfaces.

15.3.5 The Vibrating String

(a) The Infinite String on $\mathbb{R}$

We consider the Cauchy problem for an infinite string (no boundary values):

$$u_{tt} - a^2u_{xx} = 0, \quad u(x,0) = u_0(x), \quad u_t(x,0) = u_1(x),$$

where $u_0$ and $u_1$ are given. Inserting the initial values into the general solution $u(x,t) = f(x - at) + g(x + at)$ (see Example 15.4 (c)) we get

$$u_0(x) = f(x) + g(x), \qquad u_1(x) = -af'(x) + ag'(x).$$

Differentiating the first equation yields $u_0'(x) = f'(x) + g'(x)$, such that

$$f'(x) = \frac12u_0'(x) - \frac{1}{2a}u_1(x), \qquad g'(x) = \frac12u_0'(x) + \frac{1}{2a}u_1(x).$$

Integrating these equations we obtain

$$f(x) = \frac12u_0(x) - \frac{1}{2a}\int_0^x u_1(y)\,dy + A,$$
$$g(x) = \frac12u_0(x) + \frac{1}{2a}\int_0^x u_1(y)\,dy + B,$$

where $A$ and $B$ are constants with $A + B = 0$ (since $f(x) + g(x) = u_0(x)$). Finally we have

$$u(x,t) = f(x - at) + g(x + at) = \frac12\bigl(u_0(x + at) + u_0(x - at)\bigr) - \frac{1}{2a}\int_0^{x-at} u_1(y)\,dy + \frac{1}{2a}\int_0^{x+at} u_1(y)\,dy$$
$$= \frac12\bigl(u_0(x + at) + u_0(x - at)\bigr) + \frac{1}{2a}\int_{x-at}^{x+at} u_1(y)\,dy. \qquad(15.17)$$

It is clear from (15.17) that $u(x,t)$ is uniquely determined by the values of the initial functions $u_0$ and $u_1$ on the interval $[x - at, x + at]$, whose end points are cut out by the characteristic lines through the point $(x,t)$. This interval represents the domain of dependence for the solution at the point $(x,t)$, as shown in the figure. [Figure: the interval $[x-at,\,x+at]$ on the $x$-axis cut out by the two characteristics through $(x,t)$.] Conversely, the initial values at a point $(\xi, 0)$ of the $x$-axis influence $u(x,t)$ at the points $(x,t)$ in the wedge-shaped region bounded by the characteristics through $(\xi,0)$, i.e., for $\xi - at < x < \xi + at$. This indicates that our "signal" or "disturbance" moves only with speed $a$.

We want to give some interpretation of the solution (15.17). Suppose $u_1 = 0$ and

$$u_0(x) = \begin{cases} 1 - \bigl|\frac{x}{a}\bigr|, & |x| \le a, \\ 0, & |x| > a. \end{cases}$$

In this example we consider a vibrating string which is plucked at time $t = 0$ as given by $u_0(x)$; the initial velocity is zero ($u_1 = 0$). [Figures: snapshots of the string at $t = 0$, $t = 1/2$, $t = 1$, and $t = 2$.] In the pictures one can see the behaviour of the string: the initial peak is divided into two smaller peaks with half the displacement, one moving to the right and one moving to the left with speed $a$.

Formula (15.17) is due to d'Alembert (1746). Usually one assumes $u_0 \in C^2(\mathbb{R})$ and $u_1 \in C^1(\mathbb{R})$; in this case $u \in C^2(\mathbb{R}^2)$, and we are able to evaluate the classical Laplacian $\Delta(u)$, which gives a continuous function. On the other hand, the right hand side of (15.17) makes sense for an arbitrary continuous function $u_1$ and arbitrary $u_0$.
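The plucked-string picture is easy to reproduce from d'Alembert's formula. A minimal sketch with $a = 1$ and $u_1 = 0$, so that (15.17) reduces to the average of two shifted copies of $u_0$:

```python
# d'Alembert's solution (15.17) for the plucked string with u1 = 0:
# the hat splits into two half-height hats travelling with speed a = 1.
def u0(x, a=1.0):
    return max(0.0, 1.0 - abs(x / a))

def u(x, t, a=1.0):
    return 0.5 * (u0(x + a*t, a) + u0(x - a*t, a))

assert u(0.0, 0.0) == 1.0                      # t = 0: the original pluck
# t = 2: two half-height peaks centred at x = ±2, nothing in between
assert u(2.0, 2.0) == 0.5 and u(-2.0, 2.0) == 0.5 and u(0.0, 2.0) == 0.0
```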
If we want to call such a $u(x,t)$ a generalized solution of the Cauchy problem, we have to alter the meaning of $\Delta(u)$. In particular, we need a more general notion of functions and of derivatives. This is our main objective in the next section.

(b) The Finite String over $[0,l]$

We consider the initial boundary value problem (IBVP)
$$u_{tt} = a^2 u_{xx}, \qquad u(x,0) = u_0(x), \quad u_t(x,0) = u_1(x), \ x \in [0,l], \qquad u(0,t) = u(l,t) = 0, \ t \in \mathbb R.$$
Suppose we are given functions $u_0 \in C^2([0,l])$ and $u_1 \in C^1([0,l])$ on $[0,l]$ with
$$u_0(0) = u_0(l) = 0, \qquad u_1(0) = u_1(l) = 0, \qquad u_0''(0) = u_0''(l) = 0.$$
To solve the IBVP, we define new functions $\tilde u_0$ and $\tilde u_1$ on $\mathbb R$ as follows: first extend both functions to $[-l,l]$ as odd functions, that is, $\tilde u_i(-x) = -u_i(x)$, $i = 0,1$; then extend $\tilde u_i$ as a $2l$-periodic function to the entire real line. The above assumptions ensure that $\tilde u_0 \in C^2(\mathbb R)$ and $\tilde u_1 \in C^1(\mathbb R)$. Put
$$u(x,t) = \frac12\bigl(\tilde u_0(x+at) + \tilde u_0(x-at)\bigr) + \frac1{2a}\int_{x-at}^{x+at} \tilde u_1(y)\,\mathrm dy.$$
Then $u(x,t)$ solves the IBVP.

Chapter 16 Distributions

16.1 Introduction — Test Functions and Distributions

In this section we introduce the notion of distributions. Distributions are generalized functions. The class of distributions has a lot of very nice properties: they are differentiable up to arbitrary order, one can exchange limit procedures and differentiation, and Schwarz's lemma holds. Distributions play an important role in the theory of PDEs; in particular, the notion of a fundamental solution of a differential operator can be made rigorous only within the theory of distributions. Generalized functions were first used by P. Dirac to study quantum mechanical phenomena; he made systematic use of the so-called $\delta$-function (better: $\delta$-distribution). The mathematical foundations of the theory are due to S. L. Sobolev (1936) and L. Schwartz (1950; 1915–2002). Since then many mathematicians have made progress in the theory of distributions.
Motivation comes from problems in mathematical physics and in the theory of partial differential equations. Good accessible (German) introductions are given in the books of W. Walter [Wal74] and O. Forster [For81, § 17]. More detailed expositions of the theory are to be found in the books of H. Triebel (in English and German), V. S. Vladimirov (in Russian and German), and Gelfand/Shilov (in Russian and German, parts I, II, and III), [Tri92, Wla72, GS69, GS64].

16.1.1 Motivation

Distributions generalize the notion of a function. They are linear functionals on certain spaces of test functions. Using distributions one can express rigorously the density of a mass point, the charge density of a point charge, and the single-layer and double-layer potentials, see [Arn04, pp. 92]. Roughly speaking, a generalized function is given at a point by the "mean values" in neighborhoods of that point. The main idea, to associate to each "sufficiently nice" function $f$ a linear functional $T_f$ (a distribution) on an appropriate function space $D$, is described by the following formula:
$$\langle T_f, \varphi\rangle = \int_{\mathbb R} f(x)\varphi(x)\,\mathrm dx, \qquad \varphi \in D. \tag{16.1}$$
On the left we adopt the notation of a dual pairing of vector spaces from Definition 11.1. In general the bracket $\langle T, \varphi\rangle$ denotes the evaluation of the functional $T$ on the test function $\varphi$; sometimes it is also written as $T(\varphi)$. It does not denote an inner product; the left and the right arguments are from completely different spaces. What we really want of $T_f$ is:

(a) The correspondence should be one-to-one, i.e., different functions $f$ and $g$ should correspond to different functionals $T_f$ and $T_g$. To achieve this, we need the function space $D$ to be sufficiently large.

(b) The class of functions $f$ should contain at least the continuous functions. However, if $f(x) = x^n$, the function $f(x)\varphi(x)$ must be integrable over $\mathbb R$, that is, $x^n\varphi(x) \in L^1(\mathbb R)$. Since polynomials are not in $L^1(\mathbb R)$, the functions $\varphi$ must be "very small" for large $|x|$.
Roughly speaking, there are two possibilities to this end. First, take only those functions $\varphi$ which are identically zero outside a compact set (which depends on $\varphi$). This leads to the test functions $D(\mathbb R)$; then $T_f$ is well-defined if $f$ is integrable over every compact subset of $\mathbb R$. These functions $f$ are called locally integrable. Secondly, we take $\varphi(x)$ to be rapidly decreasing as $|x|$ tends to $\infty$; more precisely, we want
$$\sup_{x\in\mathbb R} |x^n\varphi(x)| < \infty$$
for all non-negative integers $n \in \mathbb Z_+$. This concept leads to the notion of the so-called Schwartz space $\mathcal S(\mathbb R)$.

(c) We want to differentiate $f$ arbitrarily often, even in the case that $f$ has discontinuities. The only thing we have to do is to give the expression
$$\int_{\mathbb R} f'(x)\varphi(x)\,\mathrm dx, \qquad \varphi \in D,$$
a meaning. Using integration by parts and the fact that $\varphi(+\infty) = \varphi(-\infty) = 0$, the above expression equals $-\int_{\mathbb R} f(x)\varphi'(x)\,\mathrm dx$. That is, instead of differentiating $f$, we differentiate the test function $\varphi$. In this way the functional $T_{f'}$ makes sense as long as $f\varphi'$ is integrable. Since we want to differentiate $f$ arbitrarily often, we need the test functions $\varphi$ to be arbitrarily often differentiable, $\varphi \in C^\infty(\mathbb R)$.

Note that conditions (b) and (c) make the space of test functions sufficiently small.

16.1.2 Test Functions $D(\mathbb R^n)$ and $D(\Omega)$

We want to solve the problem that $f\varphi$ be integrable for all polynomials $f$. We use the first approach and consider only functions $\varphi$ which are $0$ outside a bounded set. If nothing else is stated, $\Omega \subseteq \mathbb R^n$ denotes an open, connected subset of $\mathbb R^n$.

(a) The Support of a Function and the Space of Test Functions

Let $f$ be a function defined on $\Omega$. The set
$$\operatorname{supp} f := \overline{\{x \in \Omega \mid f(x) \neq 0\}} \subset \mathbb R^n$$
is called the support of $f$.

Remark 16.1 (a) $\operatorname{supp} f$ is always closed; it is the smallest closed subset $M$ such that $f(x) = 0$ for all $x \in \mathbb R^n \setminus M$. (b) A point $x_0 \notin \operatorname{supp} f$ if and only if there exists $\varepsilon > 0$ such that $f \equiv 0$ on $U_\varepsilon(x_0)$.
This in particular implies that for $f \in C^\infty(\mathbb R^n)$ we have $f^{(k)}(x_0) = 0$ for all $k \in \mathbb N$. (c) $\operatorname{supp} f$ is compact if and only if it is bounded.

Example 16.1 (a) $\operatorname{supp}\sin = \mathbb R$. (b) Let $f\colon (-1,1) \to \mathbb R$, $f(x) = x(1-x)$. Then $\operatorname{supp} f = [-1,1]$. (c) The characteristic function $\chi_M$ has support $\overline M$. (d) Let $h$ be the hat function on $\mathbb R$ (note that $\operatorname{supp} h = [-1,1]$) and $f(x) = 2h(x) - 3h(x-10)$. Then $\operatorname{supp} f = [-1,1] \cup [9,11]$.

Definition 16.1 (a) The space $D(\mathbb R^n)$ consists of all infinitely differentiable functions $f$ on $\mathbb R^n$ with compact support:
$$D(\mathbb R^n) = C_0^\infty(\mathbb R^n) = \{f \in C^\infty(\mathbb R^n) \mid \operatorname{supp} f \text{ is compact}\}.$$
(b) Let $\Omega$ be a region in $\mathbb R^n$. Define $D(\Omega)$ as follows:
$$D(\Omega) = \{f \in C^\infty(\Omega) \mid \operatorname{supp} f \text{ is compact in } \mathbb R^n \text{ and } \operatorname{supp} f \subset \Omega\}.$$
We call $D(\Omega)$ the space of test functions on $\Omega$.

First of all, let us make sure that such functions exist. On the real axis consider the "hat" function (also called bump function)
$$h(t) = \begin{cases} c\, e^{-\frac1{1-t^2}}, & |t| < 1, \\ 0, & |t| \ge 1. \end{cases}$$
The constant $c$ is chosen such that $\int_{\mathbb R} h(t)\,\mathrm dt = 1$. The function $h$ is continuous on $\mathbb R$. It was already shown in Example 4.5 that $h^{(k)}(-1) = h^{(k)}(1) = 0$ for all $k \in \mathbb N$. Hence $h \in D(\mathbb R)$ is a test function with $\operatorname{supp} h = [-1,1]$. Accordingly, the function
$$h(x) = \begin{cases} c_n\, e^{-\frac1{1-\|x\|^2}}, & \|x\| < 1, \\ 0, & \|x\| \ge 1, \end{cases}$$
is an element of $D(\mathbb R^n)$ with support the closed unit ball, $\operatorname{supp} h = \{x \mid \|x\| \le 1\}$. The constant $c_n$ is chosen such that $\int_{\mathbb R^n} h(x)\,\mathrm dx = \int_{U_1(0)} h(x)\,\mathrm dx = 1$.

For $\varepsilon > 0$ we introduce the notation
$$h_\varepsilon(x) = \frac1{\varepsilon^n}\, h\Bigl(\frac x\varepsilon\Bigr).$$
Then $\operatorname{supp} h_\varepsilon = \overline{U_\varepsilon(0)}$ and
$$\int_{\mathbb R^n} h_\varepsilon(x)\,\mathrm dx = \frac1{\varepsilon^n}\int_{U_\varepsilon(0)} h\Bigl(\frac x\varepsilon\Bigr)\mathrm dx = \int_{U_1(0)} h(y)\,\mathrm dy = 1.$$
So far we have constructed only one function $h(x)$ (as well as its scaled relatives $h_\varepsilon(x)$) which is $C^\infty$ and has compact support. Using this single hat function $h_\varepsilon$ we are able (a) to restrict the support of an arbitrary integrable function $f$ to a given domain, by replacing $f$ by $f(x)\,h_\varepsilon(x-a)$, which has support in $\overline{U_\varepsilon(a)}$, and (b) to make $f$ smooth.
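The normalizing constant $c$ has no closed form, but it (and the scaling property $\int h_\varepsilon = 1$) is easy to check numerically. The following sketch computes $c$ by quadrature; the grid size is an illustrative choice.

```python
import math

# Sketch: the one-dimensional bump function h and its scalings h_eps.
# The normalising constant c is computed numerically by the midpoint rule.
def bump_unnormalised(t):
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1 else 0.0

N = 20000  # number of quadrature points (illustrative choice)
I = sum(bump_unnormalised(-1 + (k + 0.5) * (2 / N)) for k in range(N)) * (2 / N)
c = 1.0 / I  # so that the integral of h over [-1, 1] is 1

def h(t):
    return c * bump_unnormalised(t)

def h_eps(x, eps):
    # h_eps(x) = (1/eps) h(x/eps), supported in [-eps, eps]
    return h(x / eps) / eps

# the integral of h_eps over its support is again 1, for any eps > 0
eps = 0.25
J = sum(h_eps(-eps + (k + 0.5) * (2 * eps / N), eps) for k in range(N)) * (2 * eps / N)
print(round(J, 6))  # close to 1.0
```

The same construction works in $n$ dimensions with $h_\varepsilon(x) = \varepsilon^{-n} h(x/\varepsilon)$; only the normalizing constant changes.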
(b) Mollification

In this way we obtain a supply of $C^\infty$ functions with compact support which is large enough for our purposes (especially, to recover the function $f$ from the functional $T_f$). Using the function $h_\varepsilon$, S. L. Sobolev developed the following mollification method.

Definition 16.2 (a) Let $f \in L^1(\mathbb R^n)$ and $g \in D(\mathbb R^n)$; define the convolution product $f*g$ by
$$(f*g)(x) = \int_{\mathbb R^n} f(y)g(x-y)\,\mathrm dy = \int_{\mathbb R^n} f(x-y)g(y)\,\mathrm dy = (g*f)(x).$$
(b) We define the mollified function $f_\varepsilon$ of $f$ by $f_\varepsilon = f * h_\varepsilon$. Note that
$$f_\varepsilon(x) = \int_{\mathbb R^n} h_\varepsilon(x-y)f(y)\,\mathrm dy = \int_{U_\varepsilon(x)} h_\varepsilon(x-y)f(y)\,\mathrm dy. \tag{16.2}$$
Roughly speaking, $f_\varepsilon(x)$ is a mean value of $f$ in the $\varepsilon$-neighborhood of $x$. If $f$ is continuous at $x_0$ then $f_\varepsilon(x_0) = f(\xi)$ for some $\xi \in U_\varepsilon(x_0)$; this follows from Proposition 5.18.

In particular, let $f = \chi_{[1,2]}$ be the characteristic function of the interval $[1,2]$. The mollification $f_\varepsilon$ looks as follows:
$$f_\varepsilon(x) = \begin{cases} 0, & x < 1-\varepsilon, \\ *, & 1-\varepsilon < x < 1+\varepsilon, \\ 1, & 1+\varepsilon < x < 2-\varepsilon, \\ *, & 2-\varepsilon < x < 2+\varepsilon, \\ 0, & 2+\varepsilon < x, \end{cases}$$
where $*$ denotes a value between $0$ and $1$.

Remarks 16.2 (a) For $f \in L^1(\mathbb R^n)$, $f_\varepsilon \in C^\infty(\mathbb R^n)$. (b) $f_\varepsilon \to f$ in $L^1(\mathbb R^n)$ as $\varepsilon \to 0$. (c) $C_0(\mathbb R^n) \subset L^1(\mathbb R^n)$ is dense (with respect to the $L^1$-norm). In other words, for any $f \in L^1(\mathbb R^n)$ and $\varepsilon > 0$ there exists $g \in C(\mathbb R^n)$ with $\operatorname{supp} g$ compact and $\int_{\mathbb R^n} |f-g|\,\mathrm dx < \varepsilon$.

(Sketch of proof.) (A) Any integrable function can be approximated by integrable functions with compact support; this follows from Example 12.6. (B) Any integrable function with compact support can be approximated by simple functions (which are finite linear combinations of characteristic functions) with compact support. (C) Any characteristic function with compact support can be approximated by characteristic functions $\chi_Q$ where $Q$ is a finite union of boxes.
(D) Any $\chi_Q$, where $Q$ is a closed box, can be approximated by a sequence $(f_n)$ of continuous functions with compact support:
$$f_n(x) = \max\{0,\ 1 - n\,d(x,Q)\}, \qquad n \in \mathbb N,$$
where $d(x,Q)$ denotes the distance of $x$ from $Q$. Note that $f_n$ is $1$ on $Q$ and $0$ outside $U_{1/n}(Q)$.

(d) $C_0^\infty(\mathbb R^n) \subset L^1(\mathbb R^n)$ is dense.

(c) Convergence in $D$

Notations. For $x \in \mathbb R^n$ and a multi-index $\alpha \in \mathbb N_0^n$, $\alpha = (\alpha_1,\dots,\alpha_n)$, we write
$$|\alpha| = \alpha_1 + \alpha_2 + \cdots + \alpha_n, \qquad \alpha! = \alpha_1!\cdots\alpha_n!, \qquad x^\alpha = x_1^{\alpha_1} x_2^{\alpha_2}\cdots x_n^{\alpha_n}, \qquad D^\alpha u(x) = \frac{\partial^{|\alpha|} u(x)}{\partial x_1^{\alpha_1}\cdots\partial x_n^{\alpha_n}}.$$
It is clear that $D(\mathbb R^n)$ is a linear space. We shall introduce an appropriate notion of convergence.

Definition 16.3 A sequence $(\varphi_n(x))$ of functions of $D(\mathbb R^n)$ converges to $\varphi \in D(\mathbb R^n)$ if there exists a compact set $K \subseteq \mathbb R^n$ such that (a) $\operatorname{supp}\varphi_n \subseteq K$ for all $n \in \mathbb N$, and (b) $D^\alpha\varphi_n \rightrightarrows D^\alpha\varphi$ uniformly on $K$ for all multi-indices $\alpha$. We denote this type of convergence by $\varphi_n \xrightarrow{\,D\,} \varphi$.

Example 16.2 Let $\varphi \in D$ be a fixed test function and consider the sequences $(\varphi_n(x))$ given by:
(a) $\varphi_n(x) = \dfrac{\varphi(x)}n$. This sequence converges to $0$ in $D$ since $\operatorname{supp}\varphi_n = \operatorname{supp}\varphi$ for all $n$ and the convergence is uniform for all $x \in \mathbb R^n$ (in fact, it suffices to consider $x \in \operatorname{supp}\varphi$).
(b) $\varphi_n(x) = \dfrac{\varphi(x/n)}n$. The sequence does not converge to $0$ in $D$ since the supports $\operatorname{supp}\varphi_n = n\operatorname{supp}\varphi$, $n \in \mathbb N$, are not contained in any common compact subset.
(c) $\varphi_n(x) = \dfrac{\varphi(nx)}n$ has no limit in $D$ if $\varphi \neq 0$, see homework 49.2.

Note that $D(\mathbb R^n)$ is not a metric space; more precisely, there is no metric on $D(\mathbb R^n)$ such that the metric convergence and the above convergence coincide.

16.2 The Distributions $D'(\mathbb R^n)$

Definition 16.4 A distribution (generalized function) is a continuous linear functional on the space $D(\mathbb R^n)$ of test functions. Here a linear functional $T$ on $D$ is said to be continuous if and only if for all sequences $(\varphi_n)$, $\varphi_n, \varphi \in D$, with $\varphi_n \xrightarrow{\,D\,} \varphi$ we have $\langle T, \varphi_n\rangle \to \langle T, \varphi\rangle$ in $\mathbb C$. The set of distributions is denoted by $D'(\mathbb R^n)$ or simply by $D'$.
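The two convergence conditions in Example 16.2 can be illustrated with a concrete bump function; the function and the sampling grid below are illustrative choices, not fixed by the text.

```python
import math

# Illustration of Example 16.2 with a concrete bump phi in D(R),
# supported in [-1, 1].
def phi(t):
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1 else 0.0

grid = [-1 + k / 500.0 for k in range(1001)]

# (a) phi_n = phi/n -> 0 in D: common support [-1, 1] and the sampled
# sup norms shrink like 1/n.
sup10 = max(abs(phi(x) / 10) for x in grid)
sup100 = max(abs(phi(x) / 100) for x in grid)
print(sup100 < sup10)

# (b) phi_n(x) = phi(x/n)/n does NOT converge in D: supp phi_n = [-n, n]
# leaves every fixed compact set, even though the values shrink.
print(phi(1.5) == 0.0, phi(1.5 / 2) > 0.0)  # x = 1.5 enters supp phi(x/2)
```

Condition (a) of Definition 16.3 is exactly what fails in case (b): no single compact set contains all the supports.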
The evaluation of a distribution $T \in D'$ on a test function $\varphi \in D$ is denoted by $\langle T, \varphi\rangle$. Two distributions $T_1$ and $T_2$ are equal if and only if $\langle T_1, \varphi\rangle = \langle T_2, \varphi\rangle$ for all $\varphi \in D$.

Remark 16.3 (Characterization of continuity.) (a) A linear functional $T$ on $D(\mathbb R^n)$ is continuous if and only if $\varphi_n \xrightarrow{\,D\,} 0$ implies $\langle T, \varphi_n\rangle \to 0$ in $\mathbb C$. Indeed, continuity of $T$ trivially implies the above statement. Suppose now that $\varphi_n \xrightarrow{\,D\,} \varphi$. Then $(\varphi_n - \varphi) \xrightarrow{\,D\,} 0$; thus $\langle T, \varphi_n - \varphi\rangle \to 0$ as $n \to \infty$. Since $T$ is linear, this shows $\langle T, \varphi_n\rangle \to \langle T, \varphi\rangle$, and $T$ is continuous.

(b) A linear functional $T$ on $D$ is continuous if and only if for every compact set $K$ there exist a constant $C > 0$ and an $l \in \mathbb Z_+$ such that
$$|\langle T, \varphi\rangle| \le C \sup_{x\in K,\ |\alpha|\le l} |D^\alpha\varphi(x)|, \qquad \forall\,\varphi \in D \text{ with } \operatorname{supp}\varphi \subset K. \tag{16.3}$$
We show that the criterion (16.3) implies continuity of $T$. Indeed, let $\varphi_n \xrightarrow{\,D\,} 0$. Then there exists a compact subset $K \subset \mathbb R^n$ such that $\operatorname{supp}\varphi_n \subseteq K$ for all $n$. By the criterion, there are a $C > 0$ and an $l \in \mathbb Z_+$ with $|\langle T, \varphi_n\rangle| \le C\sup |D^\alpha\varphi_n(x)|$, where the supremum is taken over all $x \in K$ and multi-indices $\alpha$ with $|\alpha| \le l$. Since $D^\alpha\varphi_n \rightrightarrows 0$ on $K$ for all $\alpha$, we have in particular $\sup |D^\alpha\varphi_n(x)| \to 0$ as $n \to \infty$. This shows $\langle T, \varphi_n\rangle \to 0$, and $T$ is continuous. For the proof of the converse direction, see [Tri92, p. 52].

16.2.1 Regular Distributions

A large subclass of the distributions of $D'$ is given by ordinary functions via the correspondence $f \leftrightarrow T_f$ given by $\langle T_f, \varphi\rangle = \int_{\mathbb R^n} f(x)\varphi(x)\,\mathrm dx$. We are looking for a class of functions which is as large as possible.

Definition 16.5 Let $\Omega$ be an open subset of $\mathbb R^n$. A function $f(x)$ on $\Omega$ is said to be locally integrable over $\Omega$ if $f(x)$ is integrable over every compact subset $K \subseteq \Omega$; we write in this case $f \in L^1_{\mathrm{loc}}(\Omega)$.

Remark 16.4 The following are equivalent: (a) $f \in L^1_{\mathrm{loc}}(\mathbb R^n)$. (b) For any $R > 0$, $f \in L^1(U_R(0))$. (c) For any $x_0 \in \mathbb R^n$ there exists $\varepsilon > 0$ such that $f \in L^1(U_\varepsilon(x_0))$.
Lemma 16.1 If $f$ is locally integrable, $f \in L^1_{\mathrm{loc}}(\Omega)$, then $T_f$ is a distribution, $T_f \in D'(\Omega)$. A distribution $T$ which is of the form $T = T_f$ with some locally integrable function $f$ is called regular.

Proof. First, $T_f$ is a linear functional on $D$ since integration is a linear operation. Secondly, if $\varphi_m \xrightarrow{\,D\,} 0$, then there exists a compact set $K$ with $\operatorname{supp}\varphi_m \subset K$ for all $m$. We have the estimate
$$\Bigl|\int_{\mathbb R^n} f(x)\varphi_m(x)\,\mathrm dx\Bigr| \le \sup_{x\in K}|\varphi_m(x)|\int_K |f(x)|\,\mathrm dx = C\sup_{x\in K}|\varphi_m(x)|,$$
where $C = \int_K |f|\,\mathrm dx$ exists since $f \in L^1_{\mathrm{loc}}$. The expression on the right tends to $0$ since $\varphi_m(x)$ tends to $0$ uniformly. Hence $\langle T_f, \varphi_m\rangle \to 0$ and $T_f$ belongs to $D'$.

Example 16.3 (a) $C(\Omega) \subset L^1_{\mathrm{loc}}(\Omega)$, $L^1(\Omega) \subseteq L^1_{\mathrm{loc}}(\Omega)$. (b) $f(x) = \frac1x$ is in $L^1_{\mathrm{loc}}((0,1))$; however, $f \notin L^1((0,1))$ and $f \notin L^1_{\mathrm{loc}}(\mathbb R)$ since $f$ is not integrable over $[-1,1]$.

Lemma 16.2 (Du Bois-Reymond, Fundamental Lemma of the Calculus of Variations) Let $\Omega \subseteq \mathbb R^n$ be a region. Suppose that $f \in L^1_{\mathrm{loc}}(\mathbb R^n)$ and $\langle T_f, \varphi\rangle = 0$ for all $\varphi \in D(\Omega)$. Then $f = 0$ almost everywhere in $\Omega$.

Proof. For simplicity we consider the case $n = 1$, $\Omega = (-\pi,\pi)$. Fix $\varepsilon$ with $0 < \varepsilon < \pi$. Let $\varphi_n(x) = e^{-inx} h_\varepsilon(x)$, $n \in \mathbb Z$. Then $\operatorname{supp}\varphi_n \subset [-\pi,\pi]$. Since both $e^{-inx}$ and $h_\varepsilon$ are $C^\infty$-functions, $\varphi_n \in D(\Omega)$ and
$$c_n = \langle T_f, \varphi_n\rangle = \int_{-\pi}^{\pi} f(x)e^{-inx} h_\varepsilon(x)\,\mathrm dx = 0, \qquad n \in \mathbb Z;$$
hence all Fourier coefficients of $f h_\varepsilon \in L^2[-\pi,\pi]$ vanish. From Theorem 13.13 (b) it follows that $f h_\varepsilon$ is $0$ in $L^2(-\pi,\pi)$. By Proposition 12.16 it follows that $f h_\varepsilon = 0$ a.e. in $(-\pi,\pi)$. Since $h_\varepsilon > 0$ on $(-\varepsilon,\varepsilon)$, $f = 0$ a.e. on $(-\varepsilon,\varepsilon)$; since $\varepsilon < \pi$ was arbitrary, $f = 0$ a.e. on $(-\pi,\pi)$.

Remark 16.5 The previous lemma shows that if $f_1$ and $f_2$ are locally integrable and $T_{f_1} = T_{f_2}$, then $f_1 = f_2$ a.e.; that is, the correspondence $f \mapsto T_f$ is one-to-one. In this way we can identify the locally integrable functions with a subspace of the distributions, $L^1_{\mathrm{loc}}(\mathbb R^n) \subseteq D'(\mathbb R^n)$.

16.2.2 Other Examples of Distributions

Definition 16.6 Every non-regular distribution is called singular.
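Example 16.3 (b) can be made concrete numerically: on compact subsets of $(0,1)$ the integral of $1/x$ is finite, while over $[\delta,1]$ it grows like $-\log\delta$ as $\delta \to 0$. The quadrature below is an illustrative sketch.

```python
import math

# Example 16.3(b): f(x) = 1/x is locally integrable on (0,1), since every
# compact subset [a,b] of (0,1) keeps away from 0, but f is not in
# L^1((0,1)): the integral over [delta, 1] grows like -log(delta).
def integral_inv(a, b, n=100000):
    w = (b - a) / n                                 # midpoint rule
    return sum(1.0 / (a + (k + 0.5) * w) for k in range(n)) * w

print(integral_inv(0.2, 0.8))                       # finite: log(0.8/0.2)
for delta in (1e-1, 1e-2, 1e-3):
    print(delta, integral_inv(delta, 1.0))          # blows up as delta -> 0
```

The divergence of the last column is exactly why $1/x$ cannot define a regular distribution on all of $\mathbb R$.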
The most important example of a singular distribution is the $\delta$-distribution $\delta_a$, $a \in \mathbb R^n$, defined by
$$\langle \delta_a, \varphi\rangle = \varphi(a), \qquad \varphi \in D.$$
It is immediate that $\delta_a$ is a linear functional on $D$. Suppose that $\varphi_n \xrightarrow{\,D\,} 0$; then $\varphi_n(x) \to 0$ pointwise. Hence $\delta_a(\varphi_n) = \varphi_n(a) \to 0$; the functional is continuous on $D$ and therefore a distribution. We will also use the notation $\delta(x-a)$ in place of $\delta_a$, and $\delta$ or $\delta(x)$ in place of $\delta_0$.

Proof that $\delta_a$ is singular. If $\delta_a \in D'$ were regular, there would exist a function $f \in L^1_{\mathrm{loc}}$ such that $\delta_a = T_f$, that is, $\varphi(a) = \int_{\mathbb R^n} f(x)\varphi(x)\,\mathrm dx$.

First proof. Let $\Omega \subset \mathbb R^n$ be open such that $a \notin \Omega$, and let $\varphi \in D(\Omega)$, that is, $\operatorname{supp}\varphi \subset \Omega$. In particular $\varphi(a) = 0$, so that $\int_\Omega f(x)\varphi(x)\,\mathrm dx = 0$ for all $\varphi \in D(\Omega)$. By Du Bois-Reymond's lemma, $f = 0$ a.e. in $\Omega$. Since $\Omega$ was arbitrary, $f = 0$ a.e. in $\mathbb R^n\setminus\{a\}$ and therefore $f = 0$ a.e. in $\mathbb R^n$. It follows that $T_f = 0$ in $D'(\mathbb R^n)$; however $\delta_a \neq 0$, a contradiction.

Second proof, for $a = 0$. Since $f \in L^1_{\mathrm{loc}}$, there exists $\varepsilon > 0$ such that
$$d := \int_{U_\varepsilon(0)} |f(x)|\,\mathrm dx < 1.$$
Putting $\varphi(x) = h(x/\varepsilon)$ with the bump function $h$, we have $\operatorname{supp}\varphi = \overline{U_\varepsilon(0)}$ and $\sup_{x\in\mathbb R^n}|\varphi(x)| = \varphi(0) > 0$, such that
$$\Bigl|\int_{\mathbb R^n} f(x)\varphi(x)\,\mathrm dx\Bigr| \le \sup_{x\in\mathbb R^n}|\varphi(x)|\int_{U_\varepsilon(0)} |f(x)|\,\mathrm dx = \varphi(0)\,d < \varphi(0).$$
This contradicts $\int_{\mathbb R^n} f(x)\varphi(x)\,\mathrm dx = |\varphi(0)| = \varphi(0)$.

In the same way one can show that the assignment
$$\langle T, \varphi\rangle = D^\alpha\varphi(a), \qquad a \in \mathbb R^n,\ \varphi \in D,$$
defines an element of $D'$ which is singular. The distribution
$$\langle T, \varphi\rangle = \int_{\mathbb R^n} f(x)\,D^\alpha\varphi(x)\,\mathrm dx, \qquad f \in L^1_{\mathrm{loc}},\ \varphi \in D,$$
may be regular or singular, depending on the properties of $f$.

Locally integrable functions as well as $\delta_a$ describe mass, force, or charge densities. That is why L. Schwartz named the generalized functions "distributions."

16.2.3 Convergence and Limits of Distributions

There are a lot of possibilities to approximate the distribution $\delta$ by a sequence of $L^1_{\mathrm{loc}}$ functions.

Definition 16.7 A sequence $(T_n)$, $T_n \in D'(\mathbb R^n)$, is said to be convergent to $T \in D'(\mathbb R^n)$ if and only if for all $\varphi \in D(\mathbb R^n)$
$$\lim_{n\to\infty} T_n(\varphi) = T(\varphi).$$
Similarly, let $T_\varepsilon$, $\varepsilon > 0$, be a family of distributions in $D'$; we say that $\lim_{\varepsilon\to0} T_\varepsilon = T$ if $\lim_{\varepsilon\to0} T_\varepsilon(\varphi) = T(\varphi)$ for all $\varphi \in D$. Note that $D'(\mathbb R^n)$ with the above notion of convergence is complete, see [Wal02, p. 39].

Example 16.4 Let $f(x) = \frac12\chi_{[-1,1]}$ and let $f_\varepsilon = \frac1\varepsilon f\bigl(\frac x\varepsilon\bigr)$ be the scaling of $f$; note that $f_\varepsilon = \frac1{2\varepsilon}\chi_{[-\varepsilon,\varepsilon]}$. We will show that $f_\varepsilon \to \delta$ in $D'(\mathbb R)$. Indeed, for $\varphi \in D(\mathbb R)$, by the mean value theorem of integration,
$$T_{f_\varepsilon}(\varphi) = \frac1{2\varepsilon}\int_{\mathbb R}\chi_{[-\varepsilon,\varepsilon]}\varphi\,\mathrm dx = \frac1{2\varepsilon}\int_{-\varepsilon}^{\varepsilon}\varphi(x)\,\mathrm dx = \frac1{2\varepsilon}\,2\varepsilon\,\varphi(\xi) = \varphi(\xi), \qquad \xi \in [-\varepsilon,\varepsilon],$$
for some $\xi$. Since $\varphi$ is continuous at $0$, $\varphi(\xi)$ tends to $\varphi(0)$ as $\varepsilon \to 0$, such that
$$\lim_{\varepsilon\to0} T_{f_\varepsilon}(\varphi) = \varphi(0) = \delta(\varphi).$$
This proves the claim. The following lemma generalizes this example.

Lemma 16.3 Suppose that $f \in L^1(\mathbb R)$ with $\int_{\mathbb R} f(x)\,\mathrm dx = 1$. For $\varepsilon > 0$ define the scaled function $f_\varepsilon(x) = \frac1\varepsilon f\bigl(\frac x\varepsilon\bigr)$. Then $\lim_{\varepsilon\to0+0} T_{f_\varepsilon} = \delta$ in $D'(\mathbb R)$.

Proof. By the change-of-variables theorem, $\int_{\mathbb R} f_\varepsilon(x)\,\mathrm dx = 1$ for all $\varepsilon > 0$. To prove the claim we have to show that for all $\varphi \in D$
$$\int_{\mathbb R} f_\varepsilon(x)\varphi(x)\,\mathrm dx \longrightarrow \varphi(0) = \int_{\mathbb R} f_\varepsilon(x)\varphi(0)\,\mathrm dx \quad\text{as } \varepsilon \to 0,$$
or, equivalently,
$$\int_{\mathbb R} f_\varepsilon(x)\bigl(\varphi(x)-\varphi(0)\bigr)\,\mathrm dx \longrightarrow 0 \quad\text{as } \varepsilon \to 0.$$
Using the new coordinate $y$ with $x = \varepsilon y$, $\mathrm dx = \varepsilon\,\mathrm dy$, the above integral equals
$$\int_{\mathbb R} \varepsilon\, f_\varepsilon(\varepsilon y)\bigl(\varphi(\varepsilon y)-\varphi(0)\bigr)\,\mathrm dy = \int_{\mathbb R} f(y)\bigl(\varphi(\varepsilon y)-\varphi(0)\bigr)\,\mathrm dy.$$
Since $\varphi$ is continuous at $0$, for every fixed $y$ the functions $\varphi(\varepsilon y) - \varphi(0)$ tend to $0$ as $\varepsilon \to 0$. Hence the family of functions $g_\varepsilon(y) = f(y)\bigl(\varphi(\varepsilon y)-\varphi(0)\bigr)$ tends to $0$ pointwise. Further, $g_\varepsilon$ has an integrable upper bound $2C|f|$, where $C = \sup_{x\in\mathbb R}|\varphi(x)|$. By Lebesgue's theorem on dominated convergence, the limit of the integrals is $0$:
$$\lim_{\varepsilon\to0}\int_{\mathbb R} |f(y)|\,|\varphi(\varepsilon y)-\varphi(0)|\,\mathrm dy = \int_{\mathbb R} |f(y)|\lim_{\varepsilon\to0}|\varphi(\varepsilon y)-\varphi(0)|\,\mathrm dy = 0.$$
This proves the claim.

The following sequences of locally integrable functions approximate $\delta$ as $\varepsilon \to 0$.
$$f_\varepsilon(x) = \frac1{\pi\varepsilon}\,\frac{\sin^2(x/\varepsilon)}{(x/\varepsilon)^2}, \qquad f_\varepsilon(x) = \frac1{2\varepsilon\sqrt\pi}\,e^{-\frac{x^2}{4\varepsilon^2}}, \qquad f_\varepsilon(x) = \frac1\pi\,\frac{\varepsilon}{x^2+\varepsilon^2}, \qquad f_\varepsilon(x) = \frac1{\pi x}\sin\frac x\varepsilon. \tag{16.4}$$
The first three functions satisfy the assumptions of the lemma; the last one does not, since $\frac{\sin x}x$ is not in $L^1(\mathbb R)$. Later we will see that the above lemma even holds if $\int_{-\infty}^\infty f(x)\,\mathrm dx = 1$ as an improper Riemann integral.

16.2.4 The Distribution $P\frac1x$

Since the function $\frac1x$ is not locally integrable in a neighborhood of $0$, $1/x$ does not define a regular distribution. However, we can define a substitute that coincides with $1/x$ for all $x \neq 0$. Recall that the principal value (or Cauchy mean value) of an improper Riemann integral is defined as follows: if $f(x)$ has a singularity at $c \in [a,b]$, then
$$\mathrm{Vp}\int_a^b f(x)\,\mathrm dx := \lim_{\varepsilon\to0}\Bigl(\int_a^{c-\varepsilon} + \int_{c+\varepsilon}^b\Bigr)f(x)\,\mathrm dx.$$
For example, $\mathrm{Vp}\int_{-1}^1 \frac{\mathrm dx}{x^{2n+1}} = 0$, $n \in \mathbb N$. For $\varphi \in D$ define
$$F(\varphi) = \mathrm{Vp}\int_{-\infty}^\infty \frac{\varphi(x)}x\,\mathrm dx = \lim_{\varepsilon\to0}\Bigl(\int_{-\infty}^{-\varepsilon} + \int_\varepsilon^\infty\Bigr)\frac{\varphi(x)}x\,\mathrm dx.$$
Then $F$ is obviously linear. We have to show that $F(\varphi)$ is finite and that $F$ is continuous on $D$. Suppose that $\operatorname{supp}\varphi \subseteq [-R,R]$. Define the auxiliary function
$$\psi(x) = \begin{cases}\frac{\varphi(x)-\varphi(0)}x, & x \neq 0, \\ \varphi'(0), & x = 0.\end{cases}$$
Since $\varphi$ is differentiable at $0$, $\psi \in C(\mathbb R)$. Since $1/x$ is odd, $\bigl(\int_{-R}^{-\varepsilon} + \int_\varepsilon^R\bigr)\frac{\mathrm dx}x = 0$, and we get
$$F(\varphi) = \lim_{\varepsilon\to0}\Bigl(\int_{-\infty}^{-\varepsilon} + \int_\varepsilon^\infty\Bigr)\frac{\varphi(x)}x\,\mathrm dx = \lim_{\varepsilon\to0}\Bigl(\int_{-R}^{-\varepsilon} + \int_\varepsilon^R\Bigr)\frac{\varphi(x)-\varphi(0)}x\,\mathrm dx = \lim_{\varepsilon\to0}\Bigl(\int_{-R}^{-\varepsilon} + \int_\varepsilon^R\Bigr)\psi(x)\,\mathrm dx = \int_{-R}^R \psi(x)\,\mathrm dx.$$
Since $\psi$ is continuous, the above integral exists. We now prove continuity of $F$. By Taylor's theorem, $\varphi(x) = \varphi(0) + x\varphi'(\xi_x)$ for some value $\xi_x$ between $0$ and $x$. Therefore
$$|F(\varphi)| = \lim_{\varepsilon\to0}\Bigl|\Bigl(\int_{-R}^{-\varepsilon} + \int_\varepsilon^R\Bigr)\frac{\varphi(0)+x\varphi'(\xi_x)}x\,\mathrm dx\Bigr| \le \int_{-R}^R |\varphi'(\xi_x)|\,\mathrm dx \le 2R\,\sup_{x\in\mathbb R}|\varphi'(x)|.$$
This shows that condition (16.3) of Remark 16.3 is satisfied with $C = 2R$ and $l = 1$, such that $F$ is a continuous linear functional on $D(\mathbb R)$, $F \in D'(\mathbb R)$. We denote this distribution by $P\frac1x$.

In quantum physics one needs the so-called Sokhotsky formulas, [Wla72, p. 76]:
$$\lim_{\varepsilon\to0+0}\frac1{x+\varepsilon i} = -\pi i\,\delta(x) + P\frac1x, \qquad \lim_{\varepsilon\to0+0}\frac1{x-\varepsilon i} = \pi i\,\delta(x) + P\frac1x.$$
Idea of proof: show the sum and the difference of the above formulas instead,
$$\lim_{\varepsilon\to0+0}\frac{2x}{x^2+\varepsilon^2} = 2P\frac1x, \qquad \lim_{\varepsilon\to0+0}\frac{-2i\varepsilon}{x^2+\varepsilon^2} = -2\pi i\,\delta.$$
The second limit follows from (16.4).

16.2.5 Operations with Distributions

Distributions are distinguished by the fact that in many calculations they are much easier to handle than functions. For this purpose it is necessary to define operations on the set $D'$. We already know how to add distributions and how to multiply them by complex numbers, since $D'$ is a linear space. Our guiding principle in defining multiplication, derivatives, tensor products, convolution, and the Fourier transform is always the same: for regular distributions, i.e. locally integrable functions, we want to recover the old well-known operation.

(a) Multiplication

There is no notion of a product $T_1 T_2$ of two distributions. However, we can define $a\cdot T = T\cdot a$ for $T \in D'(\mathbb R^n)$ and $a \in C^\infty(\mathbb R^n)$. What happens in the case of a regular distribution $T = T_f$?
$$\langle a T_f, \varphi\rangle = \int_{\mathbb R^n} a(x)f(x)\varphi(x)\,\mathrm dx = \int_{\mathbb R^n} f(x)\,a(x)\varphi(x)\,\mathrm dx = \langle T_f, a\varphi\rangle. \tag{16.5}$$
Obviously $a\varphi \in D(\mathbb R^n)$: since $a \in C^\infty(\mathbb R^n)$ and $\varphi$ has compact support, $a\varphi$ has compact support, too. Hence the right-hand side of (16.5) defines a linear functional on $D(\mathbb R^n)$. We have to show continuity. Suppose that $\varphi_n \xrightarrow{\,D\,} 0$; then $a\varphi_n \xrightarrow{\,D\,} 0$, and hence $\langle T, a\varphi_n\rangle \to 0$ since $T$ is continuous.

Definition 16.8 For $a \in C^\infty(\mathbb R^n)$ and $T \in D'(\mathbb R^n)$ we define $aT \in D'(\mathbb R^n)$ by $\langle aT, \varphi\rangle = \langle T, a\varphi\rangle$ and call $aT$ the product of $a$ and $T$.

Example 16.5 (a) $x\,P\frac1x = 1$. Indeed, for $\varphi \in D(\mathbb R)$,
$$\Bigl\langle x\,P\frac1x, \varphi\Bigr\rangle = \Bigl\langle P\frac1x, x\varphi(x)\Bigr\rangle = \mathrm{Vp}\int_{-\infty}^\infty \frac{x\varphi(x)}x\,\mathrm dx = \int_{\mathbb R}\varphi(x)\,\mathrm dx = \langle 1, \varphi\rangle.$$
(b) If $f(x) \in C^\infty(\mathbb R^n)$ then
$$\langle f(x)\delta_a, \varphi\rangle = \langle \delta_a, f(x)\varphi(x)\rangle = f(a)\varphi(a) = f(a)\langle \delta_a, \varphi\rangle.$$
This shows $f(x)\delta_a = f(a)\delta_a$.
(c) Note that multiplication of distributions is no longer associative:
$$(\delta\cdot x)\cdot P\frac1x = 0\cdot P\frac1x = 0, \qquad \delta\cdot\Bigl(x\cdot P\frac1x\Bigr) = \delta\cdot 1 = \delta.$$

(b) Differentiation

Consider $n = 1$ and suppose that $f \in L^1_{\mathrm{loc}}$ is continuously differentiable.
Suppose further that $\varphi \in D$ with $\operatorname{supp}\varphi \subset (-a,a)$, so that $\varphi(-a) = \varphi(a) = 0$. We want to define $(T_f)'$ to be $T_{f'}$. Using integration by parts we find
$$\langle T_{f'}, \varphi\rangle = \int_{-a}^a f'(x)\varphi(x)\,\mathrm dx = f(x)\varphi(x)\Big|_{-a}^a - \int_{-a}^a f(x)\varphi'(x)\,\mathrm dx = -\int_{-a}^a f(x)\varphi'(x)\,\mathrm dx = -\langle T_f, \varphi'\rangle,$$
where we used that $\varphi(-a) = \varphi(a) = 0$. Hence it makes sense to define $\langle T_f', \varphi\rangle = -\langle T_f, \varphi'\rangle$. This can easily be generalized to arbitrary partial derivatives $D^\alpha T_f$.

Definition 16.9 For $T \in D'(\mathbb R^n)$ and a multi-index $\alpha \in \mathbb N_0^n$ define $D^\alpha T \in D'(\mathbb R^n)$ by
$$\langle D^\alpha T, \varphi\rangle = (-1)^{|\alpha|}\langle T, D^\alpha\varphi\rangle.$$
We have to make sure that $D^\alpha T$ is indeed a distribution. The linearity of $D^\alpha T$ is obvious. To prove continuity, let $\varphi_n \xrightarrow{\,D\,} 0$. By definition, this implies $D^\alpha\varphi_n \xrightarrow{\,D\,} 0$. Since $T$ is continuous, $\langle T, D^\alpha\varphi_n\rangle \to 0$. This shows $\langle D^\alpha T, \varphi_n\rangle \to 0$; hence $D^\alpha T$ is a continuous linear functional on $D$. Note that it is exactly the fact $D^\alpha T \in D'$ that needs the complicated-looking notion of convergence in $D$, $D^\alpha\varphi_n \to D^\alpha\varphi$. Further, a distribution has partial derivatives of all orders.

Lemma 16.4 Let $a \in C^\infty(\mathbb R^n)$ and $T \in D'(\mathbb R^n)$. Then
(a) Differentiation $D^\alpha\colon D' \to D'$ is continuous in $D'$; that is, $T_n \to T$ in $D'$ implies $D^\alpha T_n \to D^\alpha T$ in $D'$.
(b) $\dfrac{\partial}{\partial x_i}(aT) = \dfrac{\partial a}{\partial x_i}T + a\dfrac{\partial T}{\partial x_i}$, $i = 1,\dots,n$ (product rule).
(c) For any two multi-indices $\alpha$ and $\beta$, $D^{\alpha+\beta}T = D^\alpha(D^\beta T) = D^\beta(D^\alpha T)$ (Schwarz's lemma).

Proof. (a) Suppose that $T_n \to T$ in $D'$, that is, $\langle T_n, \psi\rangle \to \langle T, \psi\rangle$ for all $\psi \in D$. In particular, for $\psi = D^\alpha\varphi$, $\varphi \in D$, we get
$$(-1)^{|\alpha|}\langle D^\alpha T_n, \varphi\rangle = \langle T_n, D^\alpha\varphi\rangle \longrightarrow \langle T, D^\alpha\varphi\rangle = (-1)^{|\alpha|}\langle D^\alpha T, \varphi\rangle.$$
Since this holds for all $\varphi \in D$, the assertion follows.
(b) This follows from
$$\Bigl\langle a\frac{\partial T}{\partial x_i}, \varphi\Bigr\rangle = \Bigl\langle\frac{\partial T}{\partial x_i}, a\varphi\Bigr\rangle = -\Bigl\langle T, \frac{\partial}{\partial x_i}(a\varphi)\Bigr\rangle = -\Bigl\langle\frac{\partial a}{\partial x_i}T, \varphi\Bigr\rangle - \Bigl\langle aT, \frac{\partial\varphi}{\partial x_i}\Bigr\rangle = \Bigl\langle -\frac{\partial a}{\partial x_i}T + \frac{\partial}{\partial x_i}(aT), \varphi\Bigr\rangle.$$
"Cancelling" $\varphi$ on both sides proves the claim.
(c) The easy proof uses $D^{\alpha+\beta}\varphi = D^\alpha(D^\beta\varphi)$ for $\varphi \in D$.
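Definition 16.9 can be tested numerically on a function that is not classically differentiable everywhere. For $f(x) = |x|$ the distributional derivative is $\operatorname{sign}(x)$, so $-\langle f, \varphi'\rangle$ should equal $\int \operatorname{sign}(x)\varphi(x)\,\mathrm dx$. The (shifted) bump test function below is an illustrative choice, not fixed by the text.

```python
import math

# Definition 16.9 in action for f(x) = |x|: <f', phi> := -<f, phi'>
# should equal the pairing of sign(x) with phi.
def phi(t):
    s = t - 0.3                        # shift so both pairings are nonzero
    return math.exp(-1.0 / (1.0 - s * s)) if abs(s) < 1 else 0.0

def dphi(t, h=1e-5):                   # numerical derivative of phi
    return (phi(t + h) - phi(t - h)) / (2 * h)

def quad(g, a=-1.5, b=1.5, n=30000):   # midpoint rule over supp(phi)
    w = (b - a) / n
    return sum(g(a + (k + 0.5) * w) for k in range(n)) * w

lhs = -quad(lambda x: abs(x) * dphi(x))               # -<f, phi'>
rhs = quad(lambda x: math.copysign(1.0, x) * phi(x))  # <sign(x), phi>
print(lhs, rhs)  # the two pairings agree up to quadrature error
```

The same scheme verifies $T_H' = \delta$ from Example 16.6 (b): pairing $H$ with $-\varphi'$ returns $\varphi(0)$.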
Example 16.6 (a) Let $a \in \mathbb R^n$, $f \in L^1_{\mathrm{loc}}(\mathbb R^n)$, $\varphi \in D$. Then
$$\langle D^\alpha\delta_a, \varphi\rangle = (-1)^{|\alpha|}\langle \delta_a, D^\alpha\varphi\rangle = (-1)^{|\alpha|}D^\alpha\varphi(a), \qquad \langle D^\alpha f, \varphi\rangle = (-1)^{|\alpha|}\int_{\mathbb R^n} f\,D^\alpha\varphi\,\mathrm dx.$$
(b) Recall that the so-called Heaviside function $H(x)$ is defined as the characteristic function of the half-line $(0,+\infty)$. We compute its derivative in $D'$:
$$\langle T_H', \varphi\rangle = -\langle T_H, \varphi'\rangle = -\int_{\mathbb R} H(x)\varphi'(x)\,\mathrm dx = -\int_0^\infty \varphi'(x)\,\mathrm dx = -\varphi(x)\big|_0^\infty = \varphi(0) = \langle \delta, \varphi\rangle.$$
This shows $T_H' = \delta$.
(c) More generally, let $f(x)$ be differentiable on $G = \mathbb R\setminus\{c\} = (-\infty,c)\cup(c,\infty)$ with a discontinuity of the first kind at $c$. [Figure: a function $f$ with a jump of height $h$ at $c$.] The derivative of $T_f$ in $D'$ is
$$T_f' = T_{f'} + h\,\delta_c, \qquad h = f(c+0) - f(c-0),$$
where $h$ is the difference between the right-hand and left-hand limits of $f$ at $c$. Indeed, for $\varphi \in D$ we have
$$\langle T_f', \varphi\rangle = -\Bigl(\int_{-\infty}^c + \int_c^\infty\Bigr)f(x)\varphi'(x)\,\mathrm dx = -f(c-0)\varphi(c) + f(c+0)\varphi(c) + \int_G f'(x)\varphi(x)\,\mathrm dx = \bigl\langle h\,\delta_c + T_{f'}, \varphi\bigr\rangle.$$
(d) We prove that $f(x) = \log|x|$ is in $L^1_{\mathrm{loc}}(\mathbb R)$ (see homeworks 50.4 and 51.5) and compute its derivative in $D'(\mathbb R)$.

Proof. Since $f(x)$ is continuous on $\mathbb R\setminus\{0\}$, the only critical point is $0$. Since the integral (improper Riemann or Lebesgue) $\int_0^1 \log x\,\mathrm dx = \int_{-\infty}^0 t\,e^t\,\mathrm dt = -1$ exists, $f$ is locally integrable at $0$ and therefore defines a regular distribution. We will show that $f'(x) = P\frac1x$. We use the fact that $\int_{-\infty}^\infty = \int_{-\infty}^{-\varepsilon} + \int_{-\varepsilon}^\varepsilon + \int_\varepsilon^\infty$ for all positive $\varepsilon > 0$; the limit $\varepsilon \to 0$ of the right-hand side gives $\int_{-\infty}^\infty$. By definition of the derivative,
$$\langle (\log|x|)', \varphi(x)\rangle = -\langle \log|x|, \varphi'(x)\rangle = -\int_{-\infty}^\infty \log|x|\,\varphi'(x)\,\mathrm dx = -\Bigl(\int_{-\infty}^{-\varepsilon} + \int_{-\varepsilon}^\varepsilon + \int_\varepsilon^\infty\Bigr)\log|x|\,\varphi'(x)\,\mathrm dx.$$
Since $\int_{-1}^1 \bigl|\log|x|\,\varphi'(x)\bigr|\,\mathrm dx < \infty$, the middle integral $\int_{-\varepsilon}^\varepsilon \log|x|\,\varphi'(x)\,\mathrm dx$ tends to $0$ as $\varepsilon \to 0$ (apply Lebesgue's theorem to the family of functions $g_\varepsilon(x) = \chi_{[-\varepsilon,\varepsilon]}(x)\log|x|\,\varphi'(x)$, which tends to $0$ pointwise and is dominated by the integrable function $\log|x|\,\varphi'(x)$). We consider the third integral.
Integration by parts and $\varphi(+\infty) = 0$ give
$$\int_\varepsilon^\infty \log x\,\varphi'(x)\,\mathrm dx = \log x\,\varphi(x)\Big|_\varepsilon^\infty - \int_\varepsilon^\infty \frac{\varphi(x)}x\,\mathrm dx = -\log\varepsilon\,\varphi(\varepsilon) - \int_\varepsilon^\infty \frac{\varphi(x)}x\,\mathrm dx.$$
Similarly,
$$\int_{-\infty}^{-\varepsilon}\log(-x)\,\varphi'(x)\,\mathrm dx = \log\varepsilon\,\varphi(-\varepsilon) - \int_{-\infty}^{-\varepsilon}\frac{\varphi(x)}x\,\mathrm dx.$$
The sum of the two non-integral terms tends to $0$ as $\varepsilon \to 0$ since $\varepsilon\log\varepsilon \to 0$; indeed,
$$\log\varepsilon\,\varphi(-\varepsilon) - \log\varepsilon\,\varphi(\varepsilon) = -2\varepsilon\log\varepsilon\;\frac{\varphi(\varepsilon)-\varphi(-\varepsilon)}{2\varepsilon} \longrightarrow -2\Bigl(\lim_{\varepsilon\to0}\varepsilon\log\varepsilon\Bigr)\varphi'(0) = 0.$$
Hence
$$\langle f', \varphi\rangle = \lim_{\varepsilon\to0}\Bigl(\int_{-\infty}^{-\varepsilon} + \int_\varepsilon^\infty\Bigr)\frac{\varphi(x)}x\,\mathrm dx = \Bigl\langle P\frac1x, \varphi\Bigr\rangle.$$

(c) Convergence and Fourier Series

Lemma 16.5 Suppose that $(f_n)$ converges locally uniformly to some function $f$, that is, $f_n \rightrightarrows f$ uniformly on every compact set; assume further that $f_n$ is locally integrable for all $n$, $f_n \in L^1_{\mathrm{loc}}(\mathbb R^n)$. (a) Then $f \in L^1_{\mathrm{loc}}(\mathbb R^n)$ and $T_{f_n} \to T_f$ in $D'(\mathbb R^n)$. (b) $D^\alpha T_{f_n} \to D^\alpha T_f$ in $D'(\mathbb R^n)$ for all multi-indices $\alpha$.

Proof. (a) Let $K$ be a compact subset of $\mathbb R^n$; we will show that $f \in L^1(K)$. Since the $f_n$ converge uniformly on $K$ to $f$, by Theorem 6.6 $f$ is integrable over $K$ and $\lim_{n\to\infty}\int_K f_n(x)\,\mathrm dx = \int_K f\,\mathrm dx$, such that $f \in L^1_{\mathrm{loc}}(\mathbb R^n)$. We show that $T_{f_n} \to T_f$ in $D'$. Indeed, for any $\varphi \in D$ with compact support $K$, again by Theorem 6.6 and the uniform convergence of $f_n\varphi$ on $K$,
$$\lim_{n\to\infty} T_{f_n}(\varphi) = \lim_{n\to\infty}\int_K f_n(x)\varphi(x)\,\mathrm dx = \int_K \lim_{n\to\infty} f_n(x)\,\varphi(x)\,\mathrm dx = \int_K f(x)\varphi(x)\,\mathrm dx = T_f(\varphi);$$
we are allowed to exchange limit and integration since $(f_n(x)\varphi(x))$ converges uniformly on $K$. Since this is true for all $\varphi \in D$, it follows that $T_{f_n} \to T_f$.
(b) By Lemma 16.4 (a), differentiation is a continuous operation in $D'$. Thus $D^\alpha T_{f_n} \to D^\alpha T_f$.

Example 16.7 (a) Suppose that $a, b > 0$ and $m \in \mathbb N$ are given such that $|c_n| \le a|n|^m + b$ for all $n \in \mathbb Z$. Then the Fourier series
$$\sum_{n\in\mathbb Z} c_n e^{inx}$$
converges in $D'(\mathbb R)$. First consider the series
$$\frac{c_0\,x^{m+2}}{(m+2)!} + \sum_{n\in\mathbb Z,\,n\neq0}\frac{c_n}{(ni)^{m+2}}\,e^{inx}. \tag{16.6}$$
By assumption,
$$\Bigl|\frac{c_n}{(ni)^{m+2}}\,e^{inx}\Bigr| = \frac{|c_n|}{|n|^{m+2}} \le \frac{a|n|^m+b}{|n|^{m+2}} \le \frac{\tilde a}{|n|^2}.$$
Since $\sum_{n\neq0}\frac{\tilde a}{|n|^2} < \infty$, the series (16.6) converges uniformly on $\mathbb R$ by the criterion of Weierstraß (Theorem 6.2).
By Lemma 16.5 the series (16.6) converges in $D'$, too, and can be differentiated term by term. The $(m+2)$nd derivative of (16.6) is exactly the given Fourier series.

[Figure: graph of the $2\pi$-periodic sawtooth function $f$.] The $2\pi$-periodic function $f(x) = \frac12 - \frac x{2\pi}$, $x \in [0,2\pi)$, has discontinuities of the first kind at $2\pi n$, $n \in \mathbb Z$; the jump has height $1$ since $f(0+0) - f(0-0) = \frac12 + \frac12 = 1$. Therefore, in $D'$,
$$f'(x) = -\frac1{2\pi} + \sum_{n\in\mathbb Z}\delta(x-2\pi n).$$
The Fourier series of $f(x)$ is
$$f(x) \sim \frac1{2\pi i}\sum_{n\neq0}\frac1n\,e^{inx}.$$
Note that $f$ and the Fourier series $g$ on the right are equal in $L^2(0,2\pi)$. Hence $\int_0^{2\pi}|f-g|^2 = 0$. This implies $f = g$ a.e. on $[0,2\pi]$; moreover $f = g$ a.e. on $\mathbb R$. Thus $f = g$ in $L^1_{\mathrm{loc}}(\mathbb R)$ and $f$ coincides with $g$ in $D'(\mathbb R)$:
$$f(x) = \frac1{2\pi i}\sum_{n\neq0}\frac1n\,e^{inx} \quad\text{in } D'(\mathbb R).$$
By Lemma 16.5 the series can be differentiated termwise up to arbitrary order. Applying Example 16.6 we obtain
$$f'(x) + \frac1{2\pi} = \sum_{n\in\mathbb Z}\delta(x-2\pi n) = \frac1{2\pi}\sum_{n\in\mathbb Z} e^{inx} \quad\text{in } D'(\mathbb R).$$

(b) A solution of $x^m u(x) = 0$ in $D'$ is
$$u(x) = \sum_{n=0}^{m-1} c_n\,\delta^{(n)}(x), \qquad c_n \in \mathbb C,$$
since for every $\varphi \in D$ and $n = 0,\dots,m-1$ we have
$$\langle x^m\delta^{(n)}(x), \varphi\rangle = (-1)^n\bigl\langle \delta, (x^m\varphi(x))^{(n)}\bigr\rangle = (-1)^n(x^m\varphi(x))^{(n)}\big|_{x=0} = 0;$$
thus the given $u$ satisfies $x^m u = 0$. One can show that this is the general solution, see [Wla72, p. 84].

(c) The general solution of the ODE $u^{(m)} = 0$ in $D'$ is a polynomial of degree at most $m-1$.

Proof. We only prove that $u' = 0$ implies $u = c$ in $D'$; the general statement follows by induction on $m$. Suppose that $u' = 0$, that is, for all $\psi \in D$ we have $0 = \langle u', \psi\rangle = -\langle u, \psi'\rangle$. Fix an auxiliary test function $\varphi_1 \in D$ with $\langle 1, \varphi_1\rangle = \int_{\mathbb R}\varphi_1\,\mathrm dt = 1$. For $\varphi \in D$ put $I = \langle 1, \varphi\rangle$ and
$$\psi(x) = \int_{-\infty}^x\bigl(\varphi(t) - I\,\varphi_1(t)\bigr)\,\mathrm dt.$$
Then $\psi$ belongs to $D$: it is $C^\infty$ since both $\varphi$ and $\varphi_1$ are, and it vanishes for large $|x|$ because $\int_{\mathbb R}(\varphi - I\varphi_1)\,\mathrm dt = I - I = 0$. Since $\langle u, \psi'\rangle = 0$ and $\psi' = \varphi - I\varphi_1$, we obtain
$$0 = \langle u, \psi'\rangle = \langle u, \varphi - I\varphi_1\rangle = \langle u, \varphi\rangle - \langle u, \varphi_1\rangle\langle 1, \varphi\rangle = \langle u - \langle u, \varphi_1\rangle\,1, \varphi\rangle = \langle u - c\,1, \varphi\rangle,$$
where $c = \langle u, \varphi_1\rangle$. Since this is true for all test functions $\varphi \in D(\mathbb R)$, we obtain $u - c = 0$, i.e. $u = c$, which proves the assertion.
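The sawtooth series above can be checked numerically at a point of continuity. In real form, $\frac1{2\pi i}\sum_{n\neq0}\frac1n e^{inx} = \sum_{n\ge1}\frac{\sin nx}{\pi n}$; the sketch below evaluates partial sums (the evaluation point is an illustrative choice).

```python
import math

# Partial sums of the Fourier series of the 2*pi-periodic sawtooth
# f(x) = 1/2 - x/(2*pi) on (0, 2*pi), written in real form as
# sum_{n >= 1} sin(n*x) / (pi*n).
def partial_sum(x, N):
    return sum(math.sin(n * x) / (math.pi * n) for n in range(1, N + 1))

x = 1.0                                    # a point of continuity in (0, 2*pi)
f = 0.5 - x / (2 * math.pi)
for N in (10, 100, 1000):
    print(N, abs(partial_sum(x, N) - f))   # the error shrinks as N grows
```

At the jump points $2\pi n$ the partial sums converge to the midpoint $0$ instead, which is consistent with the $L^2$ and $D'$ (but not pointwise-everywhere) equality of $f$ and its series.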
16.3 Tensor Product and Convolution Product

16.3.1 The Support of a Distribution

Let $T \in D'$ be a distribution. We say that $T$ vanishes at $x_0$ if there exists $\varepsilon > 0$ such that $\langle T, \varphi\rangle = 0$ for all functions $\varphi \in D$ with $\operatorname{supp}\varphi \subseteq U_\varepsilon(x_0)$. Similarly, we say that two distributions $T_1$ and $T_2$ are equal at $x_0$, $T_1(x_0) = T_2(x_0)$, if $T_1 - T_2$ vanishes at $x_0$. Note that $T_1 = T_2$ if and only if $T_1 = T_2$ at $a$ for all points $a \in \mathbb R^n$.

Definition 16.10 Let $T \in D'$ be a distribution. The support of $T$, denoted by $\operatorname{supp} T$, is the set of all points $x$ such that $T$ does not vanish at $x$, that is,
$$\operatorname{supp} T = \{x \mid \forall\,\varepsilon > 0\ \exists\,\varphi \in D(U_\varepsilon(x)) : \langle T, \varphi\rangle \neq 0\}.$$

Remarks 16.6 (a) If $f$ is continuous, then $\operatorname{supp} T_f = \operatorname{supp} f$; for an arbitrary locally integrable function we have, in general, $\operatorname{supp} T_f \subset \operatorname{supp} f$. The support of a distribution is closed; its complement is the largest open subset $G$ of $\mathbb R^n$ such that $T{\upharpoonright}G = 0$. (b) $\operatorname{supp}\delta_a = \{a\}$, that is, $\delta_a$ vanishes at all points $b \neq a$; $\operatorname{supp} T_H = [0,+\infty)$; $\operatorname{supp} T_{\chi_{\mathbb Q}} = \emptyset$.

16.3.2 Tensor Products

(a) Tensor Product of Functions

Let $f\colon \mathbb R^n \to \mathbb C$, $g\colon \mathbb R^m \to \mathbb C$ be functions. The tensor product $f\otimes g\colon \mathbb R^{n+m} \to \mathbb C$ is defined via
$$(f\otimes g)(x,y) = f(x)g(y), \qquad x \in \mathbb R^n,\ y \in \mathbb R^m.$$
If $\varphi_k \in D(\mathbb R^n)$ and $\psi_k \in D(\mathbb R^m)$, $k = 1,\dots,r$, we call the function $\varphi(x,y) = \sum_{k=1}^r \varphi_k(x)\psi_k(y)$, which is defined on $\mathbb R^{n+m}$, the tensor product of the functions $\varphi_k$ and $\psi_k$. It is denoted by $\sum_k \varphi_k\otimes\psi_k$. The set of such tensors $\sum_{k=1}^r \varphi_k\otimes\psi_k$ is denoted by $D(\mathbb R^n)\otimes D(\mathbb R^m)$; it is a linear space.

Note first that under the above assumptions on $\varphi_k$ and $\psi_k$ the tensor product $\varphi = \sum_k \varphi_k\otimes\psi_k$ is in $C^\infty(\mathbb R^{n+m})$. Let $K_1 \subset \mathbb R^n$ and $K_2 \subset \mathbb R^m$ denote common compact supports of the families $\{\varphi_k\}$ and $\{\psi_k\}$, respectively. Then $\operatorname{supp}\varphi \subset K_1\times K_2$. Since both $K_1$ and $K_2$ are compact, their product $K_1\times K_2$ is again compact. Hence $\varphi(x,y) \in D(\mathbb R^{n+m})$, and thus $D(\mathbb R^n)\otimes D(\mathbb R^m) \subset D(\mathbb R^{n+m})$. Moreover, $D(\mathbb R^n)\otimes D(\mathbb R^m)$ is a dense subspace of $D(\mathbb R^{n+m})$.
That is, for any $\eta\in\mathcal D(\mathbb R^{n+m})$ there exist positive integers $r_m\in\mathbb N$ and test functions $\varphi_k^{(m)}$, $\psi_k^{(m)}$ such that
$$ \sum_{k=1}^{r_m} \varphi_k^{(m)}\otimes\psi_k^{(m)} \xrightarrow{\ \mathcal D\ } \eta \quad\text{as } m\to\infty. $$

(b) Tensor Product of Distributions

Definition 16.11 Let $T\in\mathcal D'(\mathbb R^n)$ and $S\in\mathcal D'(\mathbb R^m)$ be two distributions. Then there exists a unique distribution $F\in\mathcal D'(\mathbb R^{n+m})$ such that for all $\varphi\in\mathcal D(\mathbb R^n)$ and $\psi\in\mathcal D(\mathbb R^m)$
$$ F(\varphi\otimes\psi) = T(\varphi)\,S(\psi). $$
This distribution $F$ is denoted by $T\otimes S$.

Indeed, $T\otimes S$ is linear on $\mathcal D(\mathbb R^n)\otimes\mathcal D(\mathbb R^m)$ with $(T\otimes S)\bigl(\sum_{k=1}^r \varphi_k\otimes\psi_k\bigr) = \sum_{k=1}^r T(\varphi_k)S(\psi_k)$. By continuity it is extended from $\mathcal D(\mathbb R^n)\otimes\mathcal D(\mathbb R^m)$ to $\mathcal D(\mathbb R^{n+m})$. For example, if $a\in\mathbb R^n$ and $b\in\mathbb R^m$, then $\delta_a\otimes\delta_b = \delta_{(a,b)}$. Indeed, for $\varphi\in\mathcal D(\mathbb R^n)$ and $\psi\in\mathcal D(\mathbb R^m)$ we have
$$ (\delta_a\otimes\delta_b)(\varphi\otimes\psi) = \varphi(a)\psi(b) = (\varphi\otimes\psi)(a,b) = \delta_{(a,b)}(\varphi\otimes\psi). $$

Lemma 16.6 Let $F = T\otimes S$ be the unique distribution in $\mathcal D'(\mathbb R^{n+m})$, where $T\in\mathcal D'(\mathbb R^n)$, $S\in\mathcal D'(\mathbb R^m)$, and let $\eta(x,y)\in\mathcal D(\mathbb R^{n+m})$. Then $\varphi(x) = \langle S(y),\, \eta(x,y)\rangle$ is in $\mathcal D(\mathbb R^n)$, $\psi(y) = \langle T(x),\, \eta(x,y)\rangle$ is in $\mathcal D(\mathbb R^m)$, and we have
$$ \langle T\otimes S,\ \eta\rangle = \langle S,\ \langle T,\eta\rangle\rangle = \langle T,\ \langle S,\eta\rangle\rangle. $$
For the proof, see [Wla72, II.7].

Example 16.8 (a) Regular distributions. Let $f\in L^1_{\mathrm{loc}}(\mathbb R^n)$ and $g\in L^1_{\mathrm{loc}}(\mathbb R^m)$. Then $f\otimes g\in L^1_{\mathrm{loc}}(\mathbb R^{n+m})$ and $T_f\otimes T_g = T_{f\otimes g}$. Indeed, by Fubini's theorem, for test functions $\varphi$ and $\psi$ one has
$$ \langle T_f\otimes T_g,\ \varphi\otimes\psi\rangle = \langle T_f,\varphi\rangle\,\langle T_g,\psi\rangle = \int_{\mathbb R^n} f(x)\varphi(x)\,dx \int_{\mathbb R^m} g(y)\psi(y)\,dy = \int_{\mathbb R^{n+m}} f(x)g(y)\,\varphi(x)\psi(y)\,dx\,dy = \langle T_{f\otimes g},\ \varphi\otimes\psi\rangle. $$

(b) $\langle \delta_{x_0}\otimes T,\ \eta\rangle = \langle T,\ \eta(x_0,y)\rangle$. Indeed,
$$ \langle\delta_{x_0}\otimes T,\ \varphi(x)\psi(y)\rangle = \langle\delta_{x_0},\varphi\rangle\,\langle T,\psi\rangle = \varphi(x_0)\,\langle T,\psi(y)\rangle = \langle T,\ \varphi(x_0)\psi(y)\rangle. $$
In particular, $(\delta_a\otimes T_g)(\eta) = \int_{\mathbb R^m} g(y)\,\eta(a,y)\,dy$.

(c) For any $\alpha\in\mathbb N_0^n$, $\beta\in\mathbb N_0^m$,
$$ D^{\alpha+\beta}(T\otimes S) = (D_x^\alpha T)\otimes(D_y^\beta S) = D^\beta\bigl((D^\alpha T)\otimes S\bigr) = D^\alpha\bigl(T\otimes D^\beta S\bigr). $$
Idea of proof in case $n = m = 1$. Let $\varphi,\psi\in\mathcal D(\mathbb R)$. Then
$$ \Bigl(\frac{\partial}{\partial x}(T\otimes S)\Bigr)(\varphi\otimes\psi) = -(T\otimes S)\Bigl(\frac{\partial}{\partial x}(\varphi\otimes\psi)\Bigr) = -(T\otimes S)(\varphi'\otimes\psi) = -T(\varphi')S(\psi) = T'(\varphi)S(\psi) = (T'\otimes S)(\varphi\otimes\psi). $$
16.3.3 Convolution Product

Motivation: knowing a fundamental solution $E$ of a linear differential operator $L$, that is, $L(E) = \delta$, one immediately has a solution of the equation $L[u] = f$ for arbitrary $f$, namely $u = E * f$, where $*$ is the convolution product already defined for functions in Definition 16.2.

(a) Convolution Product of Functions

The main problem with convolutions is that we run into trouble with the support: even when $f$ and $g$ are locally integrable, $f * g$ need not be a locally integrable function. However, there are three cases where everything is fine:

1. One of the two functions $f$ or $g$ has compact support.
2. Both functions have support in $[0, +\infty)$.
3. Both functions are in $L^1(\mathbb R)$.

In the last case $(f*g)(x) = \int f(y)g(x-y)\,dy$ is again integrable. The convolution product is a commutative and associative operation on $L^1(\mathbb R^n)$.

(b) Convolution Product of Distributions

Let us consider the case of regular distributions. Suppose that $f, g, f*g \in L^1_{\mathrm{loc}}(\mathbb R^n)$. As usual we want to have $T_f * T_g = T_{f*g}$. Let $\varphi\in\mathcal D(\mathbb R^n)$; substituting $t = x - y$,
$$ \langle T_{f*g},\ \varphi\rangle = \int (f*g)(x)\varphi(x)\,dx = \iint f(y)g(x-y)\varphi(x)\,dx\,dy = \iint f(y)g(t)\varphi(y+t)\,dy\,dt = T_{f\otimes g}(\tilde\varphi), \tag{16.7} $$
where $\tilde\varphi(y,t) = \varphi(y+t)$.
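The $L^1$ case can be illustrated numerically: on a fine grid, the discrete convolution has total mass equal to the product of the masses (this is the Fubini computation behind $\|f*g\|_{L^1}\le\|f\|_{L^1}\|g\|_{L^1}$, with equality for non-negative functions), and it is commutative. A Python sketch with sample functions of our choosing:

```python
import numpy as np

x = np.linspace(-20, 20, 4001)
h = x[1] - x[0]
fx = np.exp(-x**2)        # integral sqrt(pi)
gx = np.exp(-np.abs(x))   # integral 2

conv = np.convolve(fx, gx, mode="same") * h   # Riemann-sum convolution
conv2 = np.convolve(gx, fx, mode="same") * h  # same result: f*g = g*f

mass_conv = conv.sum() * h
mass_prod = (fx.sum() * h) * (gx.sum() * h)
print(mass_conv, mass_prod)  # both close to 2*sqrt(pi)
```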
[Figure: the support of $\tilde\varphi(x,y) = \varphi(x+y)$ in the $(x,y)$-plane is an unbounded diagonal strip lying over $\operatorname{supp}\varphi$.]

There are two problems: (a) in general $\tilde\varphi$ is not a test function, since it has unbounded support in $\mathbb R^{2n}$; indeed, $(y,t)\in\operatorname{supp}\tilde\varphi$ if $y + t = c\in\operatorname{supp}\varphi$, which describes a family of parallel lines forming an unbounded strip. (b) The integral need not exist.

We overcome the second problem if we impose the condition that the set
$$ K_\varphi = \{(y,t)\in\mathbb R^{2n} \mid y\in\operatorname{supp} T_f,\ t\in\operatorname{supp} T_g,\ y+t\in\operatorname{supp}\varphi\} $$
is bounded for any $\varphi\in\mathcal D(\mathbb R^n)$; then the integral (16.7) makes sense. We solve problem (a) by "cutting" $\tilde\varphi$. Define
$$ T_{f*g}(\varphi) = \lim_{k\to\infty}\, (T_f\otimes T_g)\bigl(\varphi(y+t)\,\eta_k(y,t)\bigr), $$
where $\eta_k\in\mathcal D(\mathbb R^{2n})$ and $\eta_k\to 1$ as $k\to\infty$. Such a sequence exists: let $\eta(y,t)\in\mathcal D(\mathbb R^{2n})$ with $\eta(y,t) = 1$ for $\|y\|^2 + \|t\|^2 \le 1$, and put $\eta_k(y,t) = \eta\bigl(\tfrac{y}{k}, \tfrac{t}{k}\bigr)$, $k\in\mathbb N$. Then $\lim_{k\to\infty}\eta_k(y,t) = 1$ for all $(y,t)\in\mathbb R^{2n}$.

Definition 16.12 Let $T, S\in\mathcal D'(\mathbb R^n)$ be distributions and assume that for every $\varphi\in\mathcal D(\mathbb R^n)$ the set
$$ K_\varphi := \{(x,y)\in\mathbb R^{2n} \mid x+y\in\operatorname{supp}\varphi,\ x\in\operatorname{supp} T,\ y\in\operatorname{supp} S\} $$
is bounded. Define
$$ \langle T*S,\ \varphi\rangle = \lim_{k\to\infty}\ \langle T\otimes S,\ \varphi(x+y)\,\eta_k(x,y)\rangle. \tag{16.8} $$
$T*S$ is called the convolution product of the distributions $S$ and $T$.

Remark 16.7 (a) The sequence (16.8) becomes stationary for large $k$, so the limit exists. Indeed, for fixed $\varphi$ the set $K_\varphi$ is bounded; hence there exists $k_0\in\mathbb N$ such that $\eta_k(x,y) = 1$ for all $(x,y)\in K_\varphi$ and all $k\ge k_0$. That is, $\varphi(x+y)\eta_k(x,y)$ does not change for $k\ge k_0$.
(b) The limit is a distribution in $\mathcal D'(\mathbb R^n)$.
(c) The limit does not depend on the special choice of the sequence $\eta_k$.

Remarks 16.8 (Properties) (a) If $S$ or $T$ has compact support, then $T*S$ exists. Indeed, suppose that $\operatorname{supp} T$ is compact. Then $x+y\in\operatorname{supp}\varphi$ and $x\in\operatorname{supp} T$ imply $y\in\operatorname{supp}\varphi - \operatorname{supp} T = \{y_1 - y_2 \mid y_1\in\operatorname{supp}\varphi,\ y_2\in\operatorname{supp} T\}$. Hence $(x,y)\in K_\varphi$ implies $\|(x,y)\| \le \|x\| + \|y\| \le \|x\| + \|y_1\| + \|y_2\| \le 2C + D$ if $\operatorname{supp} T\subset U_C(0)$ and $\operatorname{supp}\varphi\subset U_D(0)$. That is, $K_\varphi$ is bounded.
(b) If $S*T$ exists, so does $T*S$, and $S*T = T*S$.
(c) If $T*S$ exists, so do $D^\alpha T * S$, $T * D^\alpha S$, $D^\alpha(T*S)$, and they coincide:
$$ D^\alpha(T*S) = D^\alpha T * S = T * D^\alpha S. $$
Proof. For simplicity let $n = 1$ and $D^\alpha = \frac{d}{dx}$. Suppose that $\varphi\in\mathcal D(\mathbb R)$; then
$$ \langle (S*T)',\ \varphi\rangle = -\langle S*T,\ \varphi'\rangle = -\lim_{k\to\infty}\langle S\otimes T,\ \varphi'(x+y)\,\eta_k(x,y)\rangle $$
$$ = -\lim_{k\to\infty}\Bigl\langle S\otimes T,\ \frac{\partial}{\partial x}\bigl(\varphi(x+y)\eta_k(x,y)\bigr) - \varphi(x+y)\frac{\partial\eta_k}{\partial x}\Bigr\rangle $$
$$ = \lim_{k\to\infty}\langle S'\otimes T,\ \varphi(x+y)\eta_k(x,y)\rangle - \lim_{k\to\infty}\underbrace{\Bigl\langle S\otimes T,\ \varphi(x+y)\frac{\partial\eta_k}{\partial x}\Bigr\rangle}_{=0\ \text{for large } k} = \langle S'*T,\ \varphi\rangle. $$
The proof of the second equality uses commutativity of the convolution product.
(d) If $\operatorname{supp} S$ is compact and $\psi\in\mathcal D(\mathbb R^n)$ is such that $\psi(y) = 1$ in a neighborhood of $\operatorname{supp} S$, then
$$ (T*S)(\varphi) = \langle T\otimes S,\ \varphi(x+y)\,\psi(y)\rangle, \qquad \forall\,\varphi\in\mathcal D(\mathbb R^n). $$
(e) If $T_1, T_2, T_3\in\mathcal D'(\mathbb R^n)$ all have compact support, then $T_1*(T_2*T_3)$ and $(T_1*T_2)*T_3$ exist and $T_1*(T_2*T_3) = (T_1*T_2)*T_3$.

16.3.4 Linear Change of Variables

Suppose that $y = Ax + b$ is a regular linear change of variables; that is, $A$ is a regular $n\times n$ matrix. As usual, consider first the case of a regular distribution $f(x)$.
Let f(x) = f (Ax + b) −1 with y = Ax + b, x = A (y − b), dy = det A dx. Then D E Z ˜ f (x) , ϕ(x) = f (Ax + b)ϕ(x) dx Z 1 = f (y)ϕ(A−1(y − b)) dy | det A | 1 f (y) , ϕ(A−1 (y − b)) . = | det A | Definition 16.13 Let T ∈ D′ (Rn ), A a regular n × n-matrix and b ∈ Rn . Then T (Ax + b) denotes the distribution 1 T (y) , ϕ(A−1 (y − b)) . hT (Ax + b) , ϕ(x)i := | det A | For example, T (x) = T , hT (x − a) , ϕ(x)i = hT , ϕ(x + a)i, in particular, δ(x − a) = δa . hδ(x − b) , ϕ(x)i = hδ(x) , ϕ(x + b)i = ϕ(0 + b) = ϕ(b) = hδb , ϕi . Example 16.9 (a) δ ∗ S = S ∗ δ = S for all S ∈ D′ . The existence is clear since δ has compact support. h(δ ∗ S) , ϕi = lim hδ(x) ⊗ S(y) , ϕ(x + y)ηk (x, y)i k→∞ = lim hS(y) , ϕ(y)ηk (0, y)i = hS , ϕi k→∞ (b) δa ∗ S = S(x − a). Indeed, (δa ∗ S)(ϕ) = lim hδa ⊗ S , ϕ(x + y)ηk (x, y)i k→∞ = lim hS(y) , ϕ(a + y)ηk (a, y)i = hS(y) , ϕ(a + y)i = hS(y − a) , ϕi . k→∞ Inparticular δa ∗ δb = δa+b . (c) Let ̺ ∈ L1loc (Rn ) and supp Tf is compact. 1 Case n = 2 f (x) = log kxk ∈ L1loc (R2 ). We call V (x) = (̺ ∗ f )(x) = surface potential with density ̺. Case n ≥ 3 f (x) = 1 kxkn−2 ZZ ∈ L1loc (Rn ). We call V (x) = (̺ ∗ f )(x) = Z vector potential with density ̺. (d) For α > 0 and x ∈ R put fα (x) = 2 x √1 e− 2α2 . α 2π R2 Rn ̺(y) log ̺(y) 1 dy kx − yk 1 dy kx − ykn−2 Then fα ∗ fβ = f√α2 +β 2 . 16 Distributions 438 16.3.5 Fundamental Solutions Suppose that L[u] is a linear differential operator on Rn , X L[u] = cα (x)D α u, | α |≤k where cα ∈ C∞ (Rn ). Definition 16.14 A distribution E ∈ D′ (Rn ) is said to be a fundamental solution of the differential operator L if L(E) = δ. Note that E ∈ D′ (Rn ) need not to be unique. It is a general result due to Malgrange and Ehrenpreis (1952) that any linear partial differential operator with constant coefficients possesses a fundamental solution. (a) ODE We start with an example from the theory of ordinary differential equations. Recall that H = χ(0,+∞) denotes the Heaviside function. 
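The Gaussian semigroup identity $f_\alpha * f_\beta = f_{\sqrt{\alpha^2+\beta^2}}$ of Example 16.9 (d) can be verified numerically by discrete convolution; a Python sketch (grid parameters are our choice):

```python
import numpy as np

def f(alpha, x):
    # f_alpha(x) = exp(-x^2 / (2 alpha^2)) / (alpha sqrt(2 pi))
    return np.exp(-x**2 / (2 * alpha**2)) / (alpha * np.sqrt(2 * np.pi))

x = np.linspace(-15, 15, 6001)   # odd length, so mode="same" aligns with x
h = x[1] - x[0]
alpha, beta = 1.0, 2.0

conv = np.convolve(f(alpha, x), f(beta, x), mode="same") * h
expected = f(np.sqrt(alpha**2 + beta**2), x)
err = np.max(np.abs(conv - expected))
print(err)  # close to zero
```

Letting $\alpha,\beta\to 0$ in this identity is the smooth-function analogue of $\delta_a * \delta_b = \delta_{a+b}$ from Example 16.9 (b).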
Lemma 16.7 Suppose that $u(t)$ is a solution of the following initial value problem for the ODE
$$ L[u] = u^{(m)} + a_1(t)u^{(m-1)} + \dots + a_m(t)u = 0, $$
$$ u(0) = u'(0) = \dots = u^{(m-2)}(0) = 0, \qquad u^{(m-1)}(0) = 1. $$
Then $\mathcal E = T_{H(t)u(t)}$ is a fundamental solution of $L$, that is, it satisfies $L(\mathcal E) = \delta$.

Proof. Using Leibniz' rule, Example 16.5 (b), and $u(0) = 0$, we find
$$ \mathcal E' = u(0)\delta + T_{Hu'} = T_{Hu'}. $$
Similarly, one has
$$ \mathcal E'' = T_{Hu''},\ \dots,\ \mathcal E^{(m-1)} = T_{Hu^{(m-1)}}, \qquad \mathcal E^{(m)} = T_{Hu^{(m)}} + \delta(t), $$
where in the last step the jump $u^{(m-1)}(0) = 1$ produces the $\delta$ term. This yields
$$ L(\mathcal E) = \mathcal E^{(m)} + a_1(t)\mathcal E^{(m-1)} + \dots + a_m(t)\mathcal E = T_{H(t)L[u(t)]} + \delta = T_0 + \delta = \delta. $$

Example 16.10 We have the following examples of fundamental solutions:
$$ y' + ay = 0:\quad \mathcal E = T_{H(x)e^{-ax}}; \qquad y'' + a^2 y = 0:\quad \mathcal E = T_{H(x)\frac{\sin ax}{a}}. $$

(b) PDE

Here is the main application of the convolution product: knowing a fundamental solution of a partial differential operator $L$, one immediately knows a weak solution of the inhomogeneous equation $L(u) = f$ for $f\in\mathcal D'(\mathbb R^n)$.

Theorem 16.8 Suppose that $L[u] = \sum_{|\alpha|\le k} c_\alpha D^\alpha u$ is a linear differential operator in $\mathbb R^n$ with constant coefficients $c_\alpha$. Suppose further that $\mathcal E\in\mathcal D'(\mathbb R^n)$ is a fundamental solution of $L$, and let $f\in\mathcal D'(\mathbb R^n)$ be a distribution such that the convolution product $S = \mathcal E * f$ exists. Then $L(S) = f$ in $\mathcal D'$. In the set of distributions of $\mathcal D'$ which possess a convolution with $\mathcal E$, $S$ is the unique solution of $L(S) = f$.

Proof. By Remark 16.8 (c) we have
$$ L(S) = \sum_{|\alpha|\le k} c_\alpha D^\alpha(\mathcal E * f) = \Bigl(\sum_{|\alpha|\le k} c_\alpha D^\alpha \mathcal E\Bigr) * f = L(\mathcal E) * f = \delta * f = f. $$
Suppose that $S_1$ and $S_2$ are both solutions of $L(S) = f$, i.e. $L(S_1) = L(S_2) = f$. Then
$$ S_1 - S_2 = (S_1 - S_2) * \delta = (S_1 - S_2) * L(\mathcal E) = L(S_1 - S_2) * \mathcal E = (f - f) * \mathcal E = 0. \tag{16.9} $$

16.4 Fourier Transformation in $\mathcal S(\mathbb R^n)$ and $\mathcal S'(\mathbb R^n)$

We want to define the Fourier transformation for test functions $\varphi$ as well as for distributions.
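Lemma 16.7 can be tested numerically for the operator $L = \frac{d^2}{dt^2} + a^2$ of Example 16.10: $L(\mathcal E) = \delta$ means $\langle \mathcal E,\ \varphi'' + a^2\varphi\rangle = \varphi(0)$ for every test function $\varphi$. A Python sketch (we use a Gaussian as a rapidly decreasing stand-in for a test function; all names are ours):

```python
import numpy as np
from scipy.integrate import quad

a = 3.0
E = lambda t: np.sin(a * t) / a                   # E = H(t) sin(at)/a, written on t >= 0
phi = lambda t: np.exp(-t**2)                     # smooth, rapidly decreasing "test" function
phi2 = lambda t: (4 * t**2 - 2) * np.exp(-t**2)   # phi''

# <E'' + a^2 E, phi> = <E, phi'' + a^2 phi>, which must equal phi(0) = 1
val, _ = quad(lambda t: E(t) * (phi2(t) + a**2 * phi(t)), 0, 20)
print(val)  # close to 1
```

Integrating by parts twice shows analytically why this works: the boundary terms produce exactly $u'(0)\varphi(0) = \varphi(0)$, while the remaining integral cancels against $a^2\langle \mathcal E,\varphi\rangle$.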
The problem with D(Rn ) is that its Fourier transformation Z e−ixξ ϕ(x) dx Fϕ(ξ) = αn R of ϕ is an entire (analytic) function with real support R. That is, Fϕ does not have compact support. The only test function in D which is analytic is 0. To overcome this problem, we enlarge the space of test function D ⊂ S in such a way that S becomes invariant under the Fourier transformation F(S ) ⊂ S . R Lemma 16.9 Let ϕ ∈ D(R). Then the Fourier transform g(z) = αn R e−itz ϕ(t) dt is holomorphic in the whole complex plane and bounded in any half-plane Ha = {z ∈ C | Im (z) ≤ a}. 16 Distributions 440 Proof. (a) We show that the complex limit limh→0(g(z + h) − g(z))/h exists for all z ∈ C. Indeed, Z g(z + h) − g(z) e−iht − 1 = αn ϕ(t) dt. e−izt h h R −izt e−iht −1 Since e ϕ(t) ≤ C for all x ∈ supp (ϕ), h ∈ C, | h | ≤ 1, we can apply Lebesgue’s h Dominated Convergence theorem: Z Z g(z + h) − g(z) e−iht − 1 −izt = αn ϕ(t) dt = αn e lim e−izt (−it)ϕ(t) dt = F(−itϕ(t)). lim h→0 h→0 h h R R (b) Suppose that Im (z) ≤ a. Then Z Z −it Re (z) t Im (z) | g(z) | ≤ αn e e ϕ(t) dt ≤ αn sup | ϕ(t) | eta dt, R t∈K K where K is a compact set which contains supp ϕ. R) 16.4.1 The Space S ( n Definition 16.15 S (Rn ) is the set of all functions f ∈ C∞ (Rn ) such that for all multi-indices α and β pα,β (f ) = sup xβ D α f (x) < ∞. Rn x∈ S is called the Schwartz space or the space of rapidly decreasing functions. S (Rn ) = {f ∈ C∞ (Rn ) | ∀ α, β : Pα,β (f ) < ∞}. Roughly speaking, a Schwartz space function is a function decreasing to 0 (together with all its 1 partial derivatives) faster than any rational function as x → ∞. In place of pα,β one can P (x) also use the norms X pk,l (ϕ) = pα,β (ϕ), k, l ∈ Z+ | α |≤k, | β |≤l, to describe S (Rn ). The set S (Rn ) is a linear space and pα,β are norms on S . 2 For example, P (x) 6∈ S for any non-zero polynomial P (x); however e−kxk ∈ S (Rn ). S (Rn ) is an algebra. Indeed, the generalized Leibniz rule ensures pkl (ϕ · ψ) < ∞. 
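The seminorms $p_{\alpha,\beta}$ of Definition 16.15 can be estimated on a grid to see the dividing line between Schwartz functions and merely decaying functions; a Python sketch (grid and function names are ours):

```python
import numpy as np

x = np.linspace(-50, 50, 200001)

def p(beta, f):
    # seminorm p_{0,beta}(f) = sup |x^beta f(x)|, approximated on the grid
    return np.max(np.abs(x**beta * f(x)))

gauss = lambda t: np.exp(-t**2)
rational = lambda t: 1 / (1 + t**2)

print([round(p(b, gauss), 4) for b in range(6)])  # bounded for every beta: gauss is in S
print(p(3, rational))  # roughly max|x| on the grid, growing without bound: not in S
```

The second line illustrates why $1/(1+x^2)$, although integrable, fails to be rapidly decreasing: $x^3/(1+x^2)$ is unbounded.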
For 2 example, f (x) = p(x)e−ax +bx+c , a > 0, belongs to S (R) for p is a polynomial; g(x) = e−| x | is not differentiable at 0 and hence not in S (R). Convergence in S Definition 16.16 Let ϕn , ϕ ∈ S . We say that the sequence (ϕn ) converges in S to ϕ, abbreviated by ϕn −→ ϕ, if one of the following equivalent conditions is satisfied for all multi-indices S 16.4 Fourier Transformation in S (Rn ) and S ′ (Rn ) 441 α and β: pα,β (ϕ − ϕn ) −→ 0; n→∞ −→ 0, uniformly on Rn ; x D (ϕ − ϕn ) − −→ −→ xβ D α ϕ, uniformly on Rn . −→ xβ D α ϕn − β α Remarks 16.9 (a) In quantum mechanics one defines the position and momentum operators Qk and Pk , k = 1, . . . , n, by (Qk ϕ)(x) = xk ϕ(x), (Pk ϕ)(x) = −i ∂ϕ , ∂xk respectively. The space S is invariant under both operators Qk and Pk ; that is xβ D α ϕ(x) ∈ S (Rn ) for all ϕ ∈ S (Rn ). (b) S (Rn ) ⊂ L1 (Rn ). Recall that a rational function P (x)/Q(x) is integrable over [1, +∞) if and only if Q(x) 6= 0 for x ≥ 1 and deg Q ≥ deg P + 2. Indeed, C/x2 is then an integrable upper bound. We want to find a condition on m such that Z dx < ∞. Rn (1 + kxk2 )m For, we use that any non-zero x ∈ Rn can uniquely be written as x = r y where r = kxk and y is on the unit sphere Sn−1 . One can show that dx1 dx2 · · · dxm = r n−1 dr dS where dS is the surface element of the unit sphere Sn−1 . Using this and Fubini’s theorem, Z Z ∞Z Z ∞ n−1 dx r n−1 dr dS r dr = ωn−1 , 2 m = 2 m (1 + r 2 )m 0 0 Rn (1 + kxk ) Sn−1 (1 + r ) where ωn−1 is the (n − 1)-dimensional measure of the unit sphere Sn−1 . By the above criterion, the integral is finite if and only if 2m − n + 1 > 1 if and only if m > n/2. In particular, Z dx < ∞. Rn 1 + kxkn+1 In case n = 1 the integral is π. By the above argument Z Z | ϕ(x) | dx = Rn Rn (1 + kxk2n )ϕ(x) ≤ C p0,2n (ϕ) Z Rn dx 1 + kxk2n dx < ∞. 1 + kxk2n (c) D(Rn ) ⊂ S (Rn ); indeed, the supremum pα,β (ϕ) of any test function ϕ ∈ D(Rn ) is finite since the supremum of a continuous function over a compact set is finite. 
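The integrability criterion $m > n/2$ for $\int_{\mathbb R^n} (1+\|x\|^2)^{-m}\,dx$ can be checked numerically via the radial reduction used above; a Python sketch for two cases (variable names are ours):

```python
import numpy as np
from scipy.integrate import quad

# n = 1, m = 1 (> 1/2): the text's example, value pi
v1, _ = quad(lambda x: 1 / (1 + x**2), -np.inf, np.inf)

# n = 3, m = 2 (> 3/2): reduce to the radial integral, with omega_2 = 4*pi
radial, _ = quad(lambda r: r**2 / (1 + r**2)**2, 0, np.inf)
v3 = 4 * np.pi * radial
print(v1, v3)  # pi and pi^2 (the radial integral equals pi/4)
```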
On the other hand, 2 D ( S since e−kxk is in S but not in D. (d) In contrast to D(Rn ), the rapidly decreasing functions S (Rn ) form a metric space. Indeed, S (Rn ) is a locally convex space, that is a linear space V such that the topology is given by a 16 Distributions 442 set of semi-norms pα separating the elements of V , i. e. pα (x) = 0 for all α implies x = 0. Any locally convex linear space where the topology is given by a countable set of semi-norms is metrizable. Let (pn )n∈N be the defining family of semi-norms. Then ∞ X 1 pn (ϕ − ψ) d(ϕ, ψ) = , 2n 1 + pn (ϕ − ψ) n=1 ϕ, ψ ∈ V defines a metric on V describing the same topology. (In our case, use Cantor’s first diagonal method to write the norms pkl , k, l ∈ N, from th array into a sequence pn .) Definition 16.17 Let f (x) ∈ L1 (Rn ), then the Fourier transform Ff of the function f (x) is given by Z 1 ˆ e−ix·ξ f (x) dx, Ff (ξ) = f (ξ) = √ n 2π Rn P where x = (x1 , . . . , xn ), ξ = (ξ1 , . . . , ξn ), and x · ξ = nk=1 xk ξk . 1 Let us abbreviate the normalization factor, αn = √2π n . Caution, Wladimirow, [Wla72] uses +iξ·x under the integral and normalization factor 1 in place of αn . another convention with e R Note that Ff (0) = αn Rn f (x) dx. Example 16.11 We calculate the Fourier transform Fϕ 2 1 ϕ(x) = e−kxk /2 = e− 2 x·x , x ∈ Rn . (a) n = 1. From complex analysis, Lemma 14.30, we know Z x2 ξ2 x2 1 − 2 F e (ξ) = √ e− 2 e−ixξ dx = e− 2 . 2π R of the function (16.10) (b) Arbitrary n. Thus, Fϕ(ξ) = ϕ̂(ξ) = αn = αn = Z Rn k=1 n Y (16.10) Z 1 Rn 1 e− 2 Pn k=1 x2k e−i Pn k=1 xk ξk dx 2 e− 2 xk −ixk ξk dx1 · · · dxn 1 2 αn e− 2 xk −ixk ξk dxk k=1 Fϕ(ξ) = n Y Z n Y 1 2 1 2 e− 2 ξk = e− 2 ξ . k=1 1 2 Hence, the Fourier transform of e− 2 x is the function itself. It follows via scaling x 7→ cx that c2 x 2 ξ2 1 F e− 2 (ξ) = n e− 2c2 . c Theorem 16.10 Let ϕ, ψ ∈ S (Rn ). Then we have (i) F(xα ϕ(x)) = i| α | D α (Fϕ), that is F ◦Qk = −Pk ◦F, k = 1, . . . , n. 
16.4 Fourier Transformation in S (Rn ) and S ′ (Rn ) 443 (ii) F(D αϕ(x))(ξ) = i| α | ξ α (Fϕ)(ξ), that is F ◦Pk = Qk ◦F, k = 1, . . . , n. (iii) F(ϕ) ∈ S (Rn ), moreover ϕn −→ ϕ implies Fϕn −→ Fϕ, that is, the Fourier transform S S F is a continuous linear operator on S . (iv) F(ϕ ∗ ψ) = αn−1 F(ϕ) F(ψ). (v) F(ϕ · ψ) = αn F(ϕ) ∗ F(ψ) (vi) F(ϕ(Ax + b))(ξ) = 1 −1 eiA b·ξ Fϕ A−⊤ ξ , | det A | where A is a regular n × n matrix and A−⊤ denotes the transpose of A−1 . In particular, ξ 1 , F(ϕ(λx))(ξ) = n (Fϕ) |λ| λ F(ϕ(x + b))(ξ) = eib·ξ (Fϕ)(ξ). Proof. (i) We carry out the proof in case α = (1, 0, . . . , 0). The general case simply follows. Z ∂ ∂ (Fϕ)(ξ) = αn e−iξ·x ϕ(x) dx. ∂ξ1 ∂ξ1 Rn Since ∂ξ∂1 e−iξ·x ϕ(x) = −ix1 e−iξ·xϕ(x) tends to 0 as x → ∞, we can exchange partial differentiation and integration, see Proposition 12.23 Hence, Z ∂ e−iξ·x ix1 ϕ(x) dx = F(−ix1 ϕ(x))(ξ). (Fϕ)(ξ) = −αn ∂ξ1 Rn (ii) Without loss of generality we again assume α = (1, 0, . . . , 0). Using integration by parts, we obtain Z Z ∂ ∂ −iξ·x ϕ(x) (ξ) = αn ϕx1 (x) dx = −αn e F e−iξ·x ϕ(x) dx ∂x1 RZn Rn ∂x1 e−iξ·x ϕ(x) dx = iξ1 (Fϕ)(ξ). = iξ1 αn Rn (iii) By (i) and (ii) we have for | α | ≤ k and | β | ≤ l Z Z α β α β D (x ϕ(x)) dx ≤ c1 ξ D Fϕ ≤ αn R n Z R n (1 + kxkl ) (1 + kxkl+n+1) X | D γ ϕ(x) | dx n+1 n R (1 + kxk ) | γ |≤k X | D γ ϕ(x) | ≤ c3 sup (1 + kxkl+n+1 ) ≤ c2 Rn x∈ ≤ c4 pk,l+n+1(ϕ). | γ |≤k X | γ |≤k | D γ ϕ(x) | dx 16 Distributions 444 This implies Fϕ ∈ S (Rn ) and, moreover F : S → S being continuous. (iv) First note that L1 (Rn ) is an algebra with respect to the convolution product where kf ∗ gkL1 ≤ kf kL1 kgkL1 . Indeed, Z Z Z | f ∗ g | dx = | f (y)g(x − y) | dy dx kf ∗ gkL1 = n n n R R R Z Z ≤ | f (y) | | g(x − y) | dx dy Rn Z Rn | f (y) | dy = kf kL1 kgkL1 . ≤ kgkL1 Rn This in particular shows that | f ∗ g(x) | < ∞ is finite a.e. on Rn . 
By definition and Fubini’s theorem we have Z Z −ix·ξ F(ϕ ∗ ψ)(ξ) = αn e ϕ(y)ψ(x − y) dy dx n n R R Z Z −1 −i(x−y)·ξ = αn ψ(x − y) dx αn e−iy·ξ ϕ(y) dy e αn Rn = z=x−y Rn αn−1 Fψ(ξ) Fϕ(ξ). (v) will be done later, after Proposition 16.11. (vi) is straightforward using hA−1 (y) , ξi = y , A−⊤ (ξ) . Remark 16.10 Similar properties as F has the operator G which is also defined on L1 (Rn ): Z Gϕ(ξ) = ϕ̌(ξ) = αn e+ix·ξ ϕ(x) dx. Rn Put ϕ− (x) := ϕ(−x). Then Gϕ = Fϕ− = F(ϕ) and Fϕ = Gϕ− = G(ϕ). It is easy to see that (iv) holds for G, too, that is G(ϕ ∗ ψ) = αn−1 G(ϕ)G(ψ). Proposition 16.11 (Fourier Inversion Formula) The Fourier transformation is a one-to-one mapping of S (Rn ) onto S (Rn ). The inverse Fourier transformation is given by Gϕ: F(Gϕ) = G(Fϕ) = ϕ, x·x ϕ ∈ S (Rn ). ε2 x2 Proof. Let ψ(x) = e− 2 and Ψ (x) = ψ(εx) = e− 2 . Then Ψε (x) := FΨ (x) = ε1n ψ have Z Z Z 1 x dx = αn αn Ψε (x) dx = αn ψ(x) dx = ψ̂(0) = 1. ψ Rn Rn εn ε Rn x ε . We 16.4 Fourier Transformation in S (Rn ) and S ′ (Rn ) 445 Further, Z Z Z 1 1 1 √ n Ψε (x)ϕ(x) dx = √ n ψ(x)ϕ(εx) dx −→ √ n ψ(x)ϕ(0) dx = ϕ(0). ε→0 2π Rn 2π Rn 2π Rn (16.11) In other words, αn Ψε (x) is a δ-sequence. We compute G(Fϕ Ψ )(x). Using Fubini’s theorem we have Z Z Z 1 1 i x·ξ i x·ξ (Fϕ)(ξ)Ψ (ξ)e dξ = Ψ (ξ)e e−i ξ·y ϕ(y) dydξ G(Fϕ Ψ )(x) = √ n n (2π) Rn 2π ZRn Rn Z 1 1 ϕ(y) √ n e−i(y−x)·ξ Ψ (ξ)dξ dy =√ n 2π ZRn 2π Rn 1 ϕ(y) (FΨ )(y − x) dy =√ n 2π Rn Z Z 1 1 = √ n FΨ (z) ϕ(z + x) dz = √ n Ψε (z) ϕ(z + x) dz z:=y−x 2π Rn 2π Rn −→ ϕ(x). as ε → 0, see (16.11) On the other hand, by Lebesgue’s dominated convergence theorem, Z Z i x·ξ (Fϕ)(ξ)ψ(0)e dξ = αn (Fϕ)(ξ)ei x·ξ dξ = G(Fϕ)(x). lim G(Fϕ · Ψ )(x) = αn Rn ε→0 Rn This proves the first part. The second part F(Gϕ) = ϕ follows from G(ϕ) = F(ϕ), F(ϕ) = G(ϕ), and the first part. We are now going to complete the proof of Theorem 16.10 (v). For, let ϕ = Gϕ1 and ψ = Gψ1 with ϕ1 , ψ1 ∈ S . By (iv) we have F(ϕ · ψ) = F(G(ϕ1 )G(ψ1 )) = F(αn G(ϕ1 ∗ ψ1 )) = αn ϕ1 ∗ ψ1 = αn Fϕ ∗ Fψ. 
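The inversion formula $G(F\varphi) = \varphi$ of Proposition 16.11 can be checked numerically on a Gaussian, whose transform is known in closed form from the scaling of (16.10). A Python sketch (the helper `G` implements the inverse transform by quadrature; names are ours):

```python
import numpy as np
from scipy.integrate import quad

c = 2.0
phi = lambda x: np.exp(-c**2 * x**2 / 2)
phi_hat = lambda xi: np.exp(-xi**2 / (2 * c**2)) / c  # F(phi), by the scaling of (16.10)

def G(f, x):
    # (G f)(x) = (1/sqrt(2 pi)) * integral of e^{+i x xi} f(xi) dxi; f is real and even here
    re, _ = quad(lambda xi: np.cos(x * xi) * f(xi), -np.inf, np.inf)
    return re / np.sqrt(2 * np.pi)

for t in (0.0, 0.5, 1.2):
    print(G(phi_hat, t), phi(t))  # pairs agree: G(F(phi)) = phi
```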
Proposition 16.12 (Fourier–Plancherel formula) For ϕ, ψ ∈ S (Rn ) we have Z Z ϕ ψ dx = F(ϕ) F(ψ) dx, Rn Rn In particular, kϕkL2 (Rn ) = kF(ϕ)kL2 (Rn ) Proof. First note that F(ψ)(−y) = αn Z ix·y R e n ψ(x) dx = αn Z R e−ix·y ψ(x) dx = F(ψ)(y). By Theorem 16.10 (v), Z ϕ(x) ψ(x) dx = αn−1 F(ϕ · ψ)(0) = (F(ϕ) ∗ F(ψ))(0) Rn Z Z = F(ϕ)(y) F(ψ)(0 − y) dy = Rn (16.12) n (16.12) Rn F(ϕ)(y) F(ψ)(y) dy. 16 Distributions 446 Remark 16.11 S (Rn ) ⊂ L2 (Rn ) is dense. Thus, the Fourier transformation has a unique extension to a unitary operator F : L2 (Rn ) → L2 (Rn ). (To a given f ∈ L2 choose a sequence ϕn ∈ S converging to f in the L2 -norm. Since F preserves the L2 -norm, kFϕn − Fϕm k = kϕn − ϕm k and (ϕn ) is a Cauchy sequence in L2 , (Fϕn ) is a Cauchy sequence, too; hence it converges to some g ∈ L2 . We define F(f ) = g.) R) 16.4.2 The Space S ′ ( n Definition 16.18 A tempered distribution (or slowly increasing distribution) is a continuous linear functional T on the space S (Rn ). The set of all tempered distributions is denoted by S ′ (Rn ). A linear functional T on S (Rn ) is continuous if for all sequences ϕn ∈ S with ϕn −→ 0, S hT , ϕn i → 0. For ϕn ∈ D with ϕn −→ 0 it follows that ϕn −→ 0. So, every continuous linear functional S D on S (restricted to D) is continuous on D. Moreover, the mapping ι : S ′ → D′ , ι(T ) = T ↾D is injective since T (ϕ) = 0 for all ϕ ∈ D implies T (ψ) = 0 for all ψ ∈ S which follows from D ⊂ S is dense and continuity of T . Using the injection ι, we can identify S ′ with as a subspace of D′ , S ′ ⊆ D′ . That is, every tempered distribution “is” a distribution. Lemma 16.13 (Characterization of S ′ ) A linear functional T defined on S (Rn ) belongs to S ′ (Rn ) if and only if there exist non-negative integers k and l and a positive number C such that for all ϕ ∈ S (Rn ) | T (ϕ) | ≤ C pkl (ϕ), X where pkl (ϕ) = pαβ (ϕ). 
| α |≤k | β |≤l, Remarks 16.12 (a) With the usual identification f ↔ Tf of functions and regular distributions, L1 (Rn ) ⊆ S ′ (Rn ), L2 (Rn ) ⊆ S ′ (Rn ). 2 (b) L1loc 6⊂ S ′ , for example Tf 6∈ S ′ ( R), f (x) = ex , since Tf (ϕ) is not well-defined for all 2 Schwartz function ϕ, for example Tex2 e−x = +∞. (c) If T ∈ D′ (Rn ) and supp T is compact then T ∈ S ′ (Rn ). (d) Let f (x) be measurable. If there exist C > 0 and m ∈ N such that | f (x) | ≤ C(1 + kxk2 )m a. e. x ∈ Rn . Then Tf ∈ S ′ . Indeed, the above estimate and Remark 16.9 imply Z Z 1 f (x)ϕ(x) dx ≤ C (1 + kxk2 )m (1 + kxk2 )n | hTf , ϕi | = | ϕ(x) | dx n (1 + kxk2 )n Rn R Z dx ≤ Cp0,2m+2n (ϕ) . Rn (1 + kxk2 )n By Lemma 16.13, f is a tempered regular distribution, f (x) ∈ S ′ . 16.4 Fourier Transformation in S (Rn ) and S ′ (Rn ) 447 Operations on S ′ The operations are defined in the same way as in case of D′ . One has to show that the result is again in the (smaller) space S ′ . If T ∈ S ′ then (a) D α T ∈ S ′ for all multi-indices α. (b) f ·T ∈ S ′ for all f ∈ C∞ (Rn ) such that D α f growth at most polynomially at infinity for all multi-indices α, (i. e. for all multi-indices α there exist Cα > 0 and kα such that | D α f (x) | ≤ Cα (1 + kxk)kα . ) (c) T (Ax + b) ∈ S ′ for any regular real n × n- matrix and b ∈ Rn . (d) T ∈ S ′ (Rn ) and S ∈ S ′ (Rm ) implies T ⊗ S ∈ S ′ (Rn+m ). (e) Let T ∈ S ′ (Rn ), ψ ∈ S (Rn ). Define the convolution product hψ ∗ T , ϕi = h(1(x) ⊗ T (y)) , ψ(x)ϕ(x + y)i , Z = T, ψ(x)ϕ(x + y) dx ϕ ∈ S (Rn ) Rn Note that this definition coincides with the more general Definition 16.12 since lim ψ(x)ϕ(x + y)ηk (x, y) = ψ(x)ϕ(x + y) ∈ S (R2n ). k→∞ R) 16.4.3 Fourier Transformation in S ′ ( n We are following our guiding principle to define the Fourier transform of a distribution T ∈ S ′ : First consider the case of a regular tempered distribution. We want to define F(Tf ) := TFf . Suppose that f (x) ∈ L1 (Rn ) is integrable. 
Then its Fourier transformation Ff exists and is a bounded continuous function: Z Z iξ·x | Ff (ξ) | ≤ αn | f (x) | dx = αn kTf kL1 < ∞. e f (x) dx = αn Rn Rn By Remark 16.12 (d), Ff defines a distribution in S ′ . By Fubini’s theorem ZZ Z hTFf , ϕi = f (x)e−iξ·xϕ(ξ)dξ dx Ff (ξ)ϕ(ξ)dξ = αn Rn = Z Rn R2n f (x) Fϕ(x) dx = hTf , Fϕi . Hence, hTFf , ϕi = hTf , Fϕi. We take this equation as the definition of the Fourier transformation of a distribution T ∈ S ′ . Definition 16.19 For T ∈ S ′ and ϕ ∈ S define hFT , ϕi = hT , Fϕi . We call FT the Fourier transform of the distribution T . (16.13) 16 Distributions 448 Since Fϕ ∈ S (Rn ), FT it is well-defined on S . Further, F is a linear operator and T a linear functional, hence FT is again a linear functional. We show that FT is a continuous linear functional on S . For, let ϕn −→ ϕ in S . By Theorem 16.10, Fϕn −→ Fϕ. Since T is S S continuous, hFT , ϕn i = hT , Fϕn i −→ hT , Fϕi = hFT , ϕi which proves continuity. Lemma 16.14 The Fourier transformation F : S ′ (Rn ) → S ′ (Rn ) is a continuous bijection. Proof. (a) We show continuity (see also homework 53.3). Suppose that Tn → T in S ′ ; that is for all ϕ ∈ S , hTn , ϕi → hT , ϕi. Hence, hFTn , ϕi = hTn , Fϕi −→ hT , Fϕi = hFT , ϕi n→∞ which proves the assertion. (b) We define a second transformation G : S ′ → S ′ via hGT , ϕi := hT , Gϕi and show that F ◦G = G◦F = id on S ′ . Taking into account Proposition 16.11 we have hG(FT ) , ϕi = hFT , Gϕi = hT , F(Gϕ)i = hT , ϕi ; thus, G◦F = id . The proof of the direction F ◦G = id is similar; hence, F is a bijection. Remark 16.13 All the properties of the Fourier transformation as stated in Theorem 16.10 (i), (ii), (iii), (iv), and (v) remain valid in case of S ′ . In particular, F(xα T ) = i| α | D α (FT ). Indeed, for ϕ ∈ S (Rn ), by Theorem 16.10 (ii) F(xα T )(ϕ) = hxα T , Fϕi = hT , xα Fϕi = T , (−i)| α | F(D α ϕ) = (−1)| α | (−i)| α | hD α (FT ) , ϕi = i| α | D α T , ϕ . Example 16.12 (a) Let a ∈ Rn . We compute F δa . 
For ϕ ∈ S (Rn ), Z e−ix·a ϕ(x) dx = Tαn e−ix·a (ϕ). Fδa (ϕ) = δa (Fϕ) = (Fϕ)(a) = αn Rn Hence, Fδa is the regular distribution corresponding to f (x) = αn e−ix·a . In particular, F(δ) = 1 Tαn 1 is the constant function. Note that F(δ) = G(δ) = √2π n T1 . Moreover, F(T1 ) = G(T1 ) = −1 αn δ. (b) n = 1, b > 0. Z e−ixξ H(b − | x |) dx F(H(b − | x |)) = α1 R Z b 2 sin(bξ) = α1 e−ixξ dx = √ . ξ 2π −b 16.4 Fourier Transformation in S (Rn ) and S ′ (Rn ) 449 (c) The single-layer distribution. Suppose that S is a compact, regular, piecewise differentiable, non self-intersecting surface in R3 and ̺(x) ∈ L1loc (R3 ) is a function on S (a density function or distribution—in the physical sense). We define the distribution ̺δS by the scalar surface integral ZZ h̺δS , ϕi = ̺(x)ϕ(x) dS. S The support of ̺δS is S, a set of measure zero with respect to the 3-dimensional Lebesgue measure. Hence, ̺δS is a singular distribution. Similarly, one defines the double-layer distribution (which comes from dipoles) by ZZ ∂ ∂ϕ(x) − (̺δS ) , ϕ = dS, ̺(x) ∂~n ∂~n S where ~n denotes the unit normal vector to the surface. We compute the Fourier transformation of the single layer F(̺δS ) in case of a sphere of radius r, Sr = {x ∈ R3 | kxk = r} and density ̺ = 1. By Fubini’s theorem, Z Z Z 1 −ix·ξ hFδSr , ϕi = δSr (0) , Fϕ √ 3 ϕ(x) dx dSξ e 3 Sr R 2π Z ZZ 1 (cos(x · ξ) − i sin(x · ξ)) dSξ ϕ(x) dx =√ 3 | {z } 3 Sr 2π R is 0 Using spherical coordinates on Sr , where x is fixed to be the z-axis and ϑ is the angle between x and ξ ∈ Sr , we have dS = r 2 sin ϑ dϕ dϑ and x · ξ = r kxk cos ϑ. Hence, the inner (surface) integral reads = Z 2π 0 = 2π Z Z π cos(kxk r cos ϑ)r 2 sin ϑdϑdϕ, 0 −kxkr kxkr − cos s s = kxk r cos ϑ, ds = − kxk r sin ϑdϑ r r ds = 4π sin(kxk r). kxk kxk Hence, 2r hFδSr , ϕi = √ 2π Z R3 ϕ(x) sin(r kxk) dx; kxk the Fourier transformation of δSr is the regular distribution 2r sin(r kxk) FδSr (x) = √ . kxk 2π (d) The Resolvent of the Laplacian −∆. 
Consider the Hilbert space H = L2 (Rn ) and its dense subspace S (Rn ). For ϕ ∈ S there is defined the Laplacian −∆ϕ. Recall that the resolvent of a linear operator A at λ is the bounded linear operator on H, given by Rλ (A) = (A − λI)−1 . Given f ∈ H we are looking for u ∈ H with Rλ (A) f = u. This is equivalent to solve 16 Distributions 450 f = (A − λI)(u) for u. In case of A = −∆ we can apply the Fourier transformation to solve this equation. By Theorem 16.10 (ii) ! n 2 X ∂ −∆u − λu = f, −F u − λFu = Ff, ∂x2k k=1 n X k=1 ξk2 (Fu)(ξ) − λFu(ξ) = (Fu)(ξ)(ξ 2 − λ) = Ff (ξ) Ff (ξ) ξ2 − λ 1 Ff (ξ) (x) u(x) = G 2 ξ −λ Fu(ξ) = Hence, Rλ (−∆) = F −1◦ 1 ◦F, ξ2 − λ where in the middle is the multiplication operator by the function 1/(ξ 2 − λ). One can see that this operator is bounded in H if and only if λ ∈ C \ R+ such that the spectrum of −∆ satisfies σ(−∆) ⊂ R+ . 16.5 Appendix—More about Convolutions Since the following proposition is used in several places, we make the statement explicit. Proposition 16.15 Let T (x, t) and S(x, t) be distributions in D′ (Rn+1 ) with supp T ⊂ Rn × R+ and supp S ⊂ Γ+ (0, 0). Here Γ+ (0, 0) = {(x, t) ∈ Rn+1 | kxk ≤ at} denotes the forward light cone at the origin. Then the convolution T ∗ S exists in D′ (Rn+1 ) and can be written as hT ∗ S , ϕi = hT (x, t) ⊗ S(y, s) , η(t)η(s)η(as − kyk) ϕ(x + y, t + s)i , (16.14) ϕ ∈ D(Rn+1 ), where η ∈ D(R) with η(t) = 1 for t > −ε and ε > 0 is any fixed positive number. The convolution (T ∗ S)(x, t) vanishes for t < 0 and is continuous in both components, that is (a) If Tk → T in D′ (Rn+1 ) and supp fk , f ⊆ Rn × R, then Tk ∗ S → T ∗ S in D′ (Rn+1 ). (b) If Sk → S in D′ (Rn+1 ) and supp Sk , S ⊆ Γ+ (0, 0), then T ∗ Sk → T ∗ S in D′ (Rn+1 ). Proof. Since η ∈ D(R), there exists δ > 0 with η(x) = 0 for x < −δ. Let ϕ(x, t) ∈ D(Rn+1 ) with supp ϕ ∈ UR (0) for some R > 0. Let ηK (x, t, y, s), K → R2n+2 , be a sequence in D(R2n+2 ) converging to 1 in R2n+2 , see before Definition 16.12. 
For sufficiently large K we then have ψK := η(s)η(t)η(as − kyk)ηK (x, t, y, s)ϕ(x + y, t + s) = η(s)η(t)η(at − kyk)ϕ(x + y, t + s) =: ψ. (16.15) 16.5 Appendix—More about Convolutions 451 To prove this it suffices to show that ψ ∈ D(R2n+2 ). Indeed, ψ is arbitrarily often differentiable and its support is contained in {(x, t, y, s) | s, t ≥ −δ, as − kyk ≥ −δ, kx + yk2 + | r + s |2 ≤ R2 }, which is a bounded set. Since η(t) = 1 in a neighborhood of supp T and η(s)η(as − kyk) = 1 in a neighborhood of supp S, T (x, t) = η(x)T (x, t) and S(y, s) = η(s)η(as − kyk)S(y, s). Using (16.15) we have hT ∗ S , ϕi = = lim2n+2 hT (x, t) ⊗ S(y, s) , ηK (x, t, y, s)ϕ(x + y, t + s)i K→ R lim2n+2 hT (x, t) ⊗ S(y, s) , ψK i , K→ R ϕ ∈ D(R2n+2 ). This proves the first assertion. We now prove that the right hand side of (16.14) defines a continuous linear functional on D(Rn+1 ). Let ϕk −→ ϕ as k → ∞. Then D ψk := η(t)η(s)η(as − kyk) ϕk (x + y, t + s) −→ ψ D as k → ∞. Hence, hT ∗ S , ϕk i = hT (x, t) ⊗ S(y, s) , ψk i → hT (x, t) ⊗ S(y, s) , ψi = hT ∗ S , ϕi , k → ∞, and T ∗ S is continuous. We show that T ∗ S vanishes for t < 0. For, let ϕ ∈ D(Rn+1 ) with supp ϕ ⊆ Rn × (−∞, −δ1 ]. Choosing δ > δ1 /2 one has η(t)η(s)η(as − kyk)ϕ(x + y, t + s) = 0, such that hT ∗ S , ϕi = 0. Continuity of the convolution product follows from the continuity of the tensor product. 452 16 Distributions Chapter 17 PDE II — The Equations of Mathematical Physics In this chapter we study in detail the Laplace equation, wave equation as well as the heat equation. Firstly, for all space dimensions n we determine the fundamental solutions to the corresponding differential operators; then we consider initial value problems and initial boundary value problems. We study eigenvalue problems for the Laplace equation. Recall Green’s identities, see Proposition 10.2, ZZ ZZZ ∂u ∂v dS, −v u (u∆(v) − v∆(u)) dxdydz = ∂~n ∂~n G ZZZ Z∂GZ ∂u ∆(u) dxdydz = dS. 
We also need that for $x\in\mathbb{R}^n\setminus\{0\}$,
$$\Delta\Big(\frac{1}{\|x\|^{n-2}}\Big) = 0,\quad n\ge 3, \qquad \Delta\big(\log\|x\|\big) = 0,\quad n=2,$$
see Example 7.5.

17.1 Fundamental Solutions

17.1.1 The Laplace Equation

Let us denote by $\omega_n$ the measure of the unit sphere $S^{n-1}$ in $\mathbb{R}^n$, that is, $\omega_2 = 2\pi$, $\omega_3 = 4\pi$.

Theorem 17.1 The function
$$E_n(x) = \begin{cases} \dfrac{1}{2\pi}\log\|x\|, & n=2,\\[1ex] -\dfrac{1}{(n-2)\,\omega_n}\,\dfrac{1}{\|x\|^{n-2}}, & n\ge 3, \end{cases}$$
is locally integrable; the corresponding regular distribution $E_n$ satisfies the equation $\Delta E_n = \delta$, and hence is a fundamental solution for the Laplacian in $\mathbb{R}^n$.

Proof. Step 1. Example 7.5 shows that $\Delta(E_n(x)) = 0$ if $x\ne 0$.
Step 2. By homework 50.4, $\log\|x\|$ is in $L^1_{loc}(\mathbb{R}^2)$ and $1/\|x\|^\alpha \in L^1_{loc}(\mathbb{R}^n)$ if and only if $\alpha < n$. Hence $E_n$, $n\ge 2$, defines a regular distribution in $\mathbb{R}^n$.
Let $n\ge 3$ and $\varphi\in\mathcal{D}(\mathbb{R}^n)$. Using that $1/\|x\|^{n-2}$ is locally integrable and Example 12.6 (a), $\psi\in\mathcal{D}$ implies (with $r = \|x\|$)
$$\int_{\mathbb{R}^n} \frac{\psi(x)}{r^{n-2}}\,dx = \lim_{\varepsilon\to 0}\int_{\|x\|\ge\varepsilon} \frac{\psi(x)}{r^{n-2}}\,dx.$$
Abbreviating $\beta_n = -1/((n-2)\omega_n)$ we have
$$\langle\Delta E_n,\,\varphi\rangle = \beta_n\int_{\mathbb{R}^n} \frac{\Delta\varphi(x)}{\|x\|^{n-2}}\,dx = \beta_n\lim_{\varepsilon\to 0}\int_{\|x\|\ge\varepsilon} \frac{\Delta\varphi(x)}{\|x\|^{n-2}}\,dx.$$
We compute the integral on the right using $v(x) = \dfrac{1}{r^{n-2}}$, which is harmonic for $\|x\|\ge\varepsilon$, $\Delta v = 0$. Applying Green's identity, we have
$$\beta_n\int_{\|x\|\ge\varepsilon} \frac{\Delta\varphi(x)}{\|x\|^{n-2}}\,dx = \beta_n\int_{\|x\|=\varepsilon}\Big(\frac{1}{r^{n-2}}\,\frac{\partial\varphi}{\partial\vec n}(x) - \varphi(x)\,\frac{\partial}{\partial\vec n}\frac{1}{r^{n-2}}\Big)\,dS.$$
Let us consider the first integral as $\varepsilon\to 0$. Note that $\varphi$ and $\operatorname{grad}\varphi$ are both bounded by a constant $C$ since $\varphi$ is a test function. We make use of the estimate
$$\left|\int_{\|x\|=\varepsilon} \frac{1}{r^{n-2}}\,\frac{\partial\varphi(x)}{\partial\vec n}\,dS\right| \le \frac{c}{\varepsilon^{n-2}}\int_{\|x\|=\varepsilon} dS = \frac{c'\,\varepsilon^{n-1}}{\varepsilon^{n-2}} = c'\varepsilon,$$
which tends to $0$ as $\varepsilon\to 0$. Hence we are left with computing the second integral. Note that the outer unit normal vector to the sphere (with respect to the region $\|x\|\ge\varepsilon$) is $\vec n = -\frac{x}{\|x\|}$, such that $\frac{\partial}{\partial\vec n}\frac{1}{\|x\|^{n-2}} = \frac{n-2}{r^{n-1}}$, and we have
$$-\beta_n\,(n-2)\,\frac{1}{\varepsilon^{n-1}}\int_{\|x\|=\varepsilon}\varphi(x)\,dS = \frac{1}{\omega_n\,\varepsilon^{n-1}}\int_{\|x\|=\varepsilon}\varphi(x)\,dS.$$
Note that $\omega_n\varepsilon^{n-1}$ is exactly the $(n-1)$-dimensional measure of the sphere of radius $\varepsilon$. So the integral is the mean value of $\varphi$ over the sphere of radius $\varepsilon$.
Since $\varphi$ is continuous at $0$, the mean value tends to $\varphi(0)$, that is, $\langle\Delta E_n,\varphi\rangle = \varphi(0)$. This proves the assertion in case $n\ge 3$. The proof in case $n=2$ is quite analogous.

Corollary 17.2 Suppose that $f(x)$ is a continuous function with compact support. Then $S = E*f$ is a regular distribution and we have $\Delta S = f$ in $\mathcal{D}'$. In particular,
$$S(x) = \frac{1}{2\pi}\iint_{\mathbb{R}^2} \log\|x-y\|\,f(y)\,dy, \quad n=2; \qquad S(x) = -\frac{1}{4\pi}\iiint_{\mathbb{R}^3} \frac{f(y)}{\|x-y\|}\,dy, \quad n=3. \tag{17.2}$$

Proof. By Theorem 16.8, $S = E*f$ is a solution of $Lu = f$ if $E$ is a fundamental solution of the differential operator $L$. Inserting the fundamental solution of the Laplacian for $n=2$ and $n=3$ and using that $f$ has compact support, the assertion follows.

Remarks 17.1 (a) The given solution (17.2) is even a classical solution of the Poisson equation. Indeed, we can differentiate the parameter integral as usual.
(b) The function $G(x,y) = E_n(x-y)$ is called the Green's function of the Laplace equation.

17.1.2 The Heat Equation

Proposition 17.3 The function
$$F(x,t) = \frac{H(t)}{(4\pi a^2 t)^{n/2}}\;e^{-\frac{\|x\|^2}{4a^2t}}$$
defines a regular distribution $E = T_F$ and a fundamental solution of the heat equation $u_t - a^2\Delta u = 0$, that is,
$$E_t - a^2\Delta_x E = \delta(x)\otimes\delta(t). \tag{17.3}$$

Proof. Step 1. The function $F(x,t)$ is locally integrable since $F=0$ for $t\le 0$, $F\ge 0$ for $t>0$, and
$$\int_{\mathbb{R}^n} F(x,t)\,dx = \frac{1}{(4\pi a^2t)^{n/2}}\int_{\mathbb{R}^n} e^{-\frac{r^2}{4a^2t}}\,dx = \prod_{k=1}^n \frac{1}{\sqrt\pi}\int_{\mathbb{R}} e^{-\xi_k^2}\,d\xi_k = 1. \tag{17.4}$$
Step 2. For $t>0$, $F\in C^\infty$ and therefore
$$\frac{\partial F}{\partial t} = \Big(\frac{x^2}{4a^2t^2} - \frac{n}{2t}\Big)F, \qquad \frac{\partial F}{\partial x_i} = -\frac{x_i}{2a^2t}\,F, \qquad \frac{\partial^2 F}{\partial x_i^2} = \Big(\frac{x_i^2}{4a^4t^2} - \frac{1}{2a^2t}\Big)F, \tag{17.5}$$
such that $\frac{\partial F}{\partial t} - a^2\Delta F = 0$. See also homework 59.2.

We give a proof using the Fourier transformation with respect to the spatial variables. Let $E(\xi,t) = (\mathcal{F}_x F)(\xi,t)$. We apply the Fourier transformation to (17.3) and obtain a first order ODE with respect to the time variable $t$:
$$\frac{\partial}{\partial t}E(\xi,t) + a^2\xi^2\,E(\xi,t) = \alpha_n\,1(\xi)\,\delta(t).$$
Recall from Example 16.10 that $u' + bu = \delta$ has the fundamental solution $u(t) = H(t)\,e^{-bt}$; hence
$$E(\xi,t) = H(t)\,\alpha_n\,e^{-a^2\xi^2t}.$$
We want to apply the inverse Fourier transformation with respect to the spatial variables. For this, note that by Example 16.11,
$$\mathcal{F}^{-1}\Big(e^{-\frac{\xi^2}{2c^2}}\Big) = c^n\,e^{-\frac{c^2x^2}{2}},$$
where, in our case, $\frac{1}{2c^2} = a^2t$, i.e. $c = \frac{1}{\sqrt{2a^2t}}$. Hence, for $t > 0$,
$$E(x,t) = H(t)\,\alpha_n\,\mathcal{F}^{-1}\big(e^{-a^2\xi^2t}\big) = \frac{1}{(2\pi)^{n/2}}\cdot\frac{1}{(2a^2t)^{n/2}}\,e^{-\frac{x^2}{2\cdot 2a^2t}} = \frac{1}{(4\pi a^2t)^{n/2}}\,e^{-\frac{x^2}{4a^2t}}.$$

Corollary 17.4 Suppose that $f(x,t)$ is a continuous function on $\mathbb{R}^n\times\mathbb{R}_+$ with compact support. Let
$$V(x,t) = \frac{H(t)}{(4a^2\pi)^{n/2}}\int_0^t\!\!\int_{\mathbb{R}^n} \frac{e^{-\frac{\|x-y\|^2}{4a^2(t-s)}}}{(t-s)^{n/2}}\;f(y,s)\,dy\,ds.$$
Then $V(x,t)$ is a regular distribution in $\mathcal{D}'(\mathbb{R}^n\times\mathbb{R}_+)$ and a solution of $u_t - a^2\Delta u = f$ in $\mathcal{D}'(\mathbb{R}^n\times\mathbb{R}_+)$.

Proof. This follows from Theorem 16.8.

17.1.3 The Wave Equation

We shall determine the fundamental solutions for the wave equation in dimensions $n=3$, $n=2$, and $n=1$. In case $n=3$ we again apply the Fourier transformation. For the other dimensions we use the method of descent.

(a) Case $n=3$

Proposition 17.5 The distribution
$$E(x,t) = \delta_{S_{at}}\otimes\frac{H(t)}{4\pi a^2t} \in \mathcal{D}'(\mathbb{R}^4)$$
is a fundamental solution for the wave operator $\Box_a u = u_{tt} - a^2(u_{x_1x_1} + u_{x_2x_2} + u_{x_3x_3})$, where $\delta_{S_{at}}$ denotes the single-layer distribution of the sphere of radius $a\cdot t$ around $0$.

Proof. As in the case of the heat equation let $E(\xi,t) = \mathcal{F}E(\xi,t)$ be the Fourier transform of the fundamental solution $E(x,t)$. Then $E(\xi,t)$ satisfies
$$\frac{\partial^2}{\partial t^2}E + a^2\xi^2E = \alpha_3\,1(\xi)\,\delta(t).$$
Again, this is an ODE of order $2$ in $t$. Recall from Example 16.10 that $u'' + a^2u = \delta$, $a\ne 0$, has a solution $u(t) = H(t)\frac{\sin at}{a}$. Thus,
$$E(\xi,t) = \alpha_3\,H(t)\,\frac{\sin(a\|\xi\|t)}{a\|\xi\|},$$
where $\xi$ is thought of as a parameter. Apply the inverse Fourier transformation $\mathcal{F}_x^{-1}$ to this function. Recall from Example 16.12 (b) that the Fourier transform of the single layer of the sphere of radius $at$ around $0$ is
$$\mathcal{F}\delta_{S_{at}}(\xi) = \frac{2at}{\sqrt{2\pi}}\,\frac{\sin(at\|\xi\|)}{\|\xi\|}.$$
This shows
$$E(x,t) = \frac{1}{2\pi}\cdot\frac{1}{2at}\cdot\frac{1}{a}\,H(t)\,\delta_{S_{at}}(x) = \frac{H(t)}{4\pi a^2t}\,\delta_{S_{at}}(x).$$
Let us evaluate $\langle E_3,\,\varphi(x,t)\rangle$. Using $dx_1\,dx_2\,dx_3 = dS_r\,dr$, where $x = (x_1,x_2,x_3)$, $r = \|x\|$, and $dS$ is the surface element of the sphere $S_r(0)$, as well as the transformation $r = at$, $dr = a\,dt$, we obtain
$$\langle E_3,\,\varphi(x,t)\rangle = \frac{1}{4\pi a^2}\int_0^\infty \frac{1}{t}\iint_{S_{at}} \varphi(x,t)\,dS\,dt \tag{17.6}$$
$$= \frac{1}{4\pi a^2}\int_0^\infty \frac{a}{r}\iint_{S_r} \varphi\Big(x,\frac{r}{a}\Big)\,dS\,\frac{dr}{a} = \frac{1}{4\pi a^2}\int_{\mathbb{R}^3} \frac{\varphi\big(x,\frac{\|x\|}{a}\big)}{\|x\|}\,dx. \tag{17.7}$$

(b) The Dimensions $n=2$ and $n=1$

To construct the fundamental solution $E_2(x,t)$, $x = (x_1,x_2)$, we use the so-called method of descent.

Lemma 17.6 A fundamental solution $E_2$ of the 2-dimensional wave operator $\Box_{a,2}$ is given by
$$\langle E_2,\,\varphi(x_1,x_2,t)\rangle = \lim_{k\to\infty}\langle E_3(x_1,x_2,x_3,t),\ \varphi(x_1,x_2,t)\,\eta_k(x_3)\rangle,$$
where $E_3$ denotes a fundamental solution of the 3-dimensional wave operator $\Box_{a,3}$ and $\eta_k\in\mathcal{D}(\mathbb{R})$ is a sequence of functions converging to $1$ as $k\to\infty$.

Proof. Let $\varphi\in\mathcal{D}(\mathbb{R}^3)$. Noting that $\eta_k''\to 0$ uniformly on $\mathbb{R}$ as $k\to\infty$, we get
$$\langle\Box_{a,2}E_2,\,\varphi(x_1,x_2,t)\rangle = \langle E_2,\,\Box_{a,2}\varphi(x_1,x_2,t)\rangle = \lim_{k\to\infty}\langle E_3,\ \Box_{a,2}\varphi(x_1,x_2,t)\cdot\eta_k(x_3)\rangle$$
$$= \lim_{k\to\infty}\langle E_3,\ \Box_{a,3}\big(\varphi(x_1,x_2,t)\,\eta_k(x_3)\big) + a^2\varphi\cdot\eta_k''(x_3)\rangle = \lim_{k\to\infty}\langle\Box_{a,3}E_3,\ \varphi(x_1,x_2,t)\,\eta_k(x_3)\rangle$$
$$= \lim_{k\to\infty}\langle\delta(x_1,x_2,x_3)\,\delta(t),\ \varphi(x_1,x_2,t)\,\eta_k(x_3)\rangle = \varphi(0,0,0) = \langle\delta(x_1,x_2,t),\ \varphi(x_1,x_2,t)\rangle.$$
In the third line we used $\Delta_{x_1,x_2,x_3}\big(\varphi(x_1,x_2,t)\,\eta_k(x_3)\big) = \big(\Delta_{x_1,x_2}\varphi(x_1,x_2,t)\big)\,\eta_k + \varphi\,\eta_k''(x_3)$, and that the term with $\eta_k''$ tends to $0$.

Proposition 17.7 (a) For $x = (x_1,x_2)\in\mathbb{R}^2$ and $t\in\mathbb{R}$, the regular distribution
$$E_2(x,t) = \frac{1}{2\pi a}\,\frac{H(at-\|x\|)}{\sqrt{a^2t^2 - \|x\|^2}} = \begin{cases} \dfrac{1}{2\pi a}\,\dfrac{1}{\sqrt{a^2t^2 - \|x\|^2}}, & at > \|x\|,\\[1ex] 0, & at \le \|x\|, \end{cases}$$
is a fundamental solution of the 2-dimensional wave operator.
(b) The regular distribution
$$E_1(x,t) = \frac{1}{2a}\,H(at - |x|) = \begin{cases} \dfrac{1}{2a}, & |x| < at,\\[1ex] 0, & |x| \ge at, \end{cases}$$
is a fundamental solution of the one-dimensional wave operator.

Proof. (a) By the above lemma,
$$\langle E_2,\,\varphi(x_1,x_2,t)\rangle = \langle E_3,\ \varphi(x_1,x_2,t)\,1(x_3)\rangle = \frac{1}{4\pi a^2}\int_0^\infty \frac{1}{t}\iint_{S_{at}} \varphi(x_1,x_2,t)\,dS\,dt.$$
We compute the surface element of the sphere of radius $at$ around $0$ in terms of $x_1, x_2$. The upper half-sphere is the graph of the function $x_3 = f(x_1,x_2) = \sqrt{a^2t^2 - x_1^2 - x_2^2}$. By the formula before Example 10.4, $dS = \sqrt{1 + f_{x_1}^2 + f_{x_2}^2}\,dx_1\,dx_2$. In case of the sphere we have
$$dS = \frac{at\,dx_1\,dx_2}{\sqrt{a^2t^2 - x_1^2 - x_2^2}}.$$
Integration over both the upper and the lower half-sphere yields a factor $2$:
$$\langle E_2,\,\varphi\rangle = 2\cdot\frac{1}{4\pi a^2}\int_0^\infty \frac{1}{t}\iint_{x_1^2+x_2^2\le a^2t^2} \frac{at\,\varphi(x_1,x_2,t)}{\sqrt{a^2t^2 - x_1^2 - x_2^2}}\,dx_1\,dx_2\,dt = \frac{1}{2\pi a}\int_0^\infty \iint_{\|x\|\le at} \frac{\varphi(x_1,x_2,t)}{\sqrt{a^2t^2 - x_1^2 - x_2^2}}\,dx_1\,dx_2\,dt.$$
This shows that $E_2(x,t)$ is a regular distribution of the above form. One can show directly that $E_2\in L^1_{loc}(\mathbb{R}^3)$: indeed, $\iiint_{\mathbb{R}^2\times[-R,R]} E_2(x_1,x_2,t)\,dx\,dt < \infty$ for all $R>0$.
(b) It was already shown in homework 57.2 that $E_1$ is the fundamental solution of the one-dimensional wave operator. A short proof is to be found in [Wla72, II. §6.5, Example g)]. One can also use the method of descent: let $\varphi\in\mathcal{D}(\mathbb{R}^2)$. Since $\int_{\mathbb{R}} |E_2(x_1,x_2,t)|\,dx_2 < \infty$ and $\int_{\mathbb{R}} E_2(x_1,x_2,t)\,dx_2$ again defines a locally integrable function, we have as in Lemma 17.6
$$\langle E_1,\,\varphi\rangle = \lim_{k\to\infty}\langle E_2(x_1,x_2,t),\ \varphi(x_1,t)\,\eta_k(x_2)\rangle = \lim_{k\to\infty}\int_{\mathbb{R}^3} E_2(x_1,x_2,t)\,\eta_k(x_2)\,\varphi(x_1,t)\,dx_1\,dx_2\,dt.$$
Hence, the fundamental solution $E_1$ is the regular distribution
$$E_1(x_1,t) = \int_{-\infty}^\infty E_2(x_1,x_2,t)\,dx_2.$$

17.2 The Cauchy Problem

In this section we formulate and study the classical and generalized Cauchy problems for the wave equation and for the heat equation.

17.2.1 Motivation of the Method

To explain the method, we first apply the theory of distributions to solve an initial value problem for a linear second order ODE. Consider the Cauchy problem
$$u''(t) + a^2u(t) = f(t), \qquad u|_{t=0+} = u_0, \quad u'|_{t=0+} = u_1, \tag{17.8}$$
where $f\in C(\mathbb{R}_+)$. We extend the solution $u(t)$ as well as $f(t)$ by $0$ to negative values of $t$, $t<0$. We denote the new functions by $\tilde u$ and $\tilde f$, respectively.
Since $\tilde u$ has a jump of height $u_0$ at $0$, by Example 16.6, $\tilde u'(t) = \{u'(t)\} + u_0\delta(t)$. Similarly, $u'(t)$ jumps at $0$ by $u_1$, such that $\tilde u''(t) = \{u''(t)\} + u_0\delta'(t) + u_1\delta(t)$. Hence, $\tilde u$ satisfies on $\mathbb{R}$ the equation
$$\tilde u'' + a^2\tilde u = \tilde f(t) + u_0\delta'(t) + u_1\delta(t). \tag{17.9}$$
We construct the solution $\tilde u$. Since the fundamental solution $E(t) = H(t)\sin(at)/a$ as well as the right hand side of (17.9) have positive support, the convolution product exists and equals
$$\tilde u = E*\big(\tilde f + u_0\delta'(t) + u_1\delta(t)\big) = E*\tilde f + u_0E'(t) + u_1E(t) = \frac{1}{a}\int_0^t f(\tau)\sin a(t-\tau)\,d\tau + u_0E'(t) + u_1E(t).$$
Since for $t>0$, $\tilde u$ satisfies (17.9) and the solution of the Cauchy problem is unique, the above formula gives the classical solution for $t>0$, that is,
$$u(t) = \frac{1}{a}\int_0^t f(\tau)\sin a(t-\tau)\,d\tau + u_0\cos at + u_1\,\frac{\sin at}{a}.$$

17.2.2 The Wave Equation

(a) The Classical and the Generalized Initial Value Problem—Existence, Uniqueness, and Continuity

Definition 17.1 (a) The problem
$$\Box_a u = f(x,t), \qquad x\in\mathbb{R}^n,\ t>0, \tag{17.10}$$
$$u|_{t=0+} = u_0(x), \tag{17.11}$$
$$\frac{\partial u}{\partial t}\Big|_{t=0+} = u_1(x), \tag{17.12}$$
where we assume that $f\in C(\mathbb{R}^n\times\mathbb{R}_+)$, $u_0\in C^1(\mathbb{R}^n)$, $u_1\in C(\mathbb{R}^n)$, is called the classical initial value problem (CIVP, for short) for the wave equation. A function $u(x,t)$ is called a classical solution of the CIVP if $u(x,t)\in C^2(\mathbb{R}^n\times(0,+\infty))\cap C^1(\mathbb{R}^n\times[0,+\infty))$, and $u(x,t)$ satisfies the wave equation (17.10) for $t>0$ and the initial conditions (17.11) and (17.12) as $t\to 0+$.
(b) The problem
$$\Box_a U = F(x,t) + U_0(x)\otimes\delta'(t) + U_1(x)\otimes\delta(t)$$
with $F\in\mathcal{D}'(\mathbb{R}^{n+1})$, $U_0, U_1\in\mathcal{D}'(\mathbb{R}^n)$, and $\operatorname{supp} F\subset\mathbb{R}^n\times[0,+\infty)$ is called the generalized initial value problem (GIVP). A generalized function $U\in\mathcal{D}'(\mathbb{R}^{n+1})$ with $\operatorname{supp} U\subset\mathbb{R}^n\times[0,+\infty)$ which satisfies the above equation is called a (generalized, weak) solution of the GIVP.

Proposition 17.8 (a) Suppose that $u(x,t)$ is a solution of the CIVP with the given data $f$, $u_0$, and $u_1$.
Then the regular distribution $T_u$ is a solution of the GIVP with the right hand side $T_f + T_{u_0}\otimes\delta'(t) + T_{u_1}\otimes\delta(t)$, provided that $f(x,t)$ and $u(x,t)$ are extended by $0$ into the domain $\{(x,t)\in\mathbb{R}^{n+1} \mid t<0\}$.
(b) Conversely, suppose that $U$ is a solution of the GIVP. Let the distributions $F = T_f$, $U_0 = T_{u_0}$, $U_1 = T_{u_1}$, and $U = T_u$ be regular and satisfy the regularity assumptions of the CIVP. Then $u(x,t)$ is a solution of the CIVP.

Proof. (b) Suppose that $U$ is a solution of the GIVP; let $\varphi\in\mathcal{D}(\mathbb{R}^{n+1})$. By definition of the tensor product and the derivative,
$$\langle U_{tt} - a^2\Delta U,\,\varphi\rangle = \langle F,\,\varphi\rangle + \langle U_0\otimes\delta',\,\varphi\rangle + \langle U_1\otimes\delta,\,\varphi\rangle$$
$$= \int_0^\infty\!\!\int_{\mathbb{R}^n} f(x,t)\,\varphi(x,t)\,dx\,dt - \int_{\mathbb{R}^n} u_0(x)\,\frac{\partial\varphi}{\partial t}(x,0)\,dx + \int_{\mathbb{R}^n} u_1(x)\,\varphi(x,0)\,dx. \tag{17.13}$$
Applying integration by parts with respect to $t$ twice, we find
$$\int_0^\infty u\,\varphi_{tt}\,dt = u\varphi_t\big|_0^\infty - \int_0^\infty u_t\varphi_t\,dt = -u(x,0)\varphi_t(x,0) - u_t\varphi\big|_0^\infty + \int_0^\infty u_{tt}\varphi\,dt = -u(x,0)\varphi_t(x,0) + u_t(x,0)\varphi(x,0) + \int_0^\infty u_{tt}\varphi\,dt.$$
Since $\mathbb{R}^n$ has no boundary and $\varphi$ has compact support, integration by parts with respect to the spatial variables $x$ yields no boundary terms. Hence, by the above formula and $\iint u\,\Delta\varphi\,dt\,dx = \iint \Delta u\,\varphi\,dt\,dx$, we obtain
$$\langle U_{tt} - a^2\Delta U,\,\varphi\rangle = \langle U,\,\varphi_{tt} - a^2\Delta\varphi\rangle = \int_{\mathbb{R}^n}\int_0^\infty u(x,t)\big(\varphi_{tt} - a^2\Delta\varphi\big)\,dt\,dx$$
$$= \int_{\mathbb{R}^n}\int_0^\infty \big(u_{tt} - a^2\Delta u\big)\,\varphi(x,t)\,dt\,dx - \int_{\mathbb{R}^n}\big(u(x,0)\,\varphi_t(x,0) - u_t(x,0)\,\varphi(x,0)\big)\,dx. \tag{17.14}$$
For any $\varphi\in\mathcal{D}(\mathbb{R}^n\times\mathbb{R}_+)$, $\operatorname{supp}\varphi$ is contained in $\mathbb{R}^n\times(0,+\infty)$, such that $\varphi(x,0) = \varphi_t(x,0) = 0$. From (17.13) and (17.14) it follows that
$$\int_{\mathbb{R}^n}\int_0^\infty \big(f(x,t) - u_{tt} + a^2\Delta u\big)\,\varphi(x,t)\,dt\,dx = 0.$$
By Lemma 16.2 (du Bois-Reymond) it follows that $u_{tt} - a^2\Delta u = f$ on $\mathbb{R}^n\times\mathbb{R}_+$. Inserting this into (17.13) and (17.14) we have
$$\int_{\mathbb{R}^n}\big(u_0(x) - u(x,0)\big)\,\varphi_t(x,0)\,dx - \int_{\mathbb{R}^n}\big(u_1(x) - u_t(x,0)\big)\,\varphi(x,0)\,dx = 0.$$
If we set $\varphi(x,t) = \psi(x)\,\eta(t)$, where $\eta\in\mathcal{D}(\mathbb{R})$ and $\eta(t) = 1$ is constant in a neighborhood of $0$, then $\varphi_t(x,0) = 0$ and therefore
$$\int_{\mathbb{R}^n}\big(u_1(x) - u_t(x,0)\big)\,\psi(x)\,dx = 0, \qquad \psi\in\mathcal{D}(\mathbb{R}^n).$$
Moreover,
$$\int_{\mathbb{R}^n}\big(u_0(x) - u(x,0)\big)\,\psi(x)\,dx = 0, \qquad \psi\in\mathcal{D}(\mathbb{R}^n),$$
if we set $\varphi(x,t) = t\,\eta(t)\,\psi(x)$. Again, Lemma 16.2 yields $u_0(x) = u(x,0)$, $u_1(x) = u_t(x,0)$, and $u(x,t)$ is a solution of the CIVP.
(a) Conversely, if $u(x,t)$ is a solution of the CIVP then (17.14) holds with $U(x,t) = H(t)\,u(x,t)$. Comparing this with (17.13) it is seen that $U_{tt} - a^2\Delta U = F + U_0\otimes\delta' + U_1\otimes\delta$ where $F(x,t) = H(t)f(x,t)$, $U_0(x) = u(x,0)$, and $U_1(x) = u_t(x,0)$.

Corollary 17.9 Suppose that $F$, $U_0$, and $U_1$ are data of the GIVP. Then there exists a unique solution $U$ of the GIVP. It can be written as $U = V + V^{(0)} + V^{(1)}$ where
$$V = E_n*F, \qquad V^{(1)} = E_n*_x U_1, \qquad V^{(0)} = \frac{\partial}{\partial t}\big(E_n*_x U_0\big).$$
Here $E_n*_x U_1 := E_n*(U_1(x)\otimes\delta(t))$ denotes the convolution product with respect to the spatial variables $x$ only, and $E_n$ denotes the fundamental solution of the $n$-dimensional wave operator $\Box_{a,n}$. The solution $U$ depends continuously, in the sense of convergence in $\mathcal{D}'$, on $F$, $U_0$, and $U_1$.

Proof. The supports of the distributions $U_0\otimes\delta'$ and $U_1\otimes\delta$ are contained in the hyperplane $\{(x,t)\in\mathbb{R}^{n+1} \mid t=0\}$. Hence the support of the distribution $F + U_0\otimes\delta' + U_1\otimes\delta$ is contained in the half space $\mathbb{R}^n\times\mathbb{R}_+$. It follows from Proposition 16.15 that the convolution product $U = E_n*(F + U_0\otimes\delta' + U_1\otimes\delta)$ exists and has support in the half space $t\ge 0$. It follows from Theorem 16.8 that $U$ is a solution of the GIVP. On the other hand, any solution of the GIVP has support in $\mathbb{R}^n\times\mathbb{R}_+$ and therefore, by Proposition 16.15, possesses the convolution with $E_n$. By Theorem 16.8, the solution $U$ is unique. Finally, if the data converge, say $U_k\to U$ as $k\to\infty$ in $\mathcal{D}'(\mathbb{R}^{n+1})$, then $E_n*U_k\to E_n*U$ by the continuity of the convolution product in $\mathcal{D}'$ (see Proposition 16.15).

(b) Explicit Solutions for $n=1,2,3$

We will make the above formulas from Corollary 17.9 explicit, that is, we compute the above convolutions to obtain the potentials $V$, $V^{(0)}$, and $V^{(1)}$.
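Before doing so, note that the one-dimensional ODE formula from Section 17.2.1 can be checked symbolically. The following Python/sympy snippet (an added illustration with an arbitrary sample choice $f(t) = e^t$, $a = 2$, $u_0 = 1$, $u_1 = -1$) confirms that $u(t) = \frac{1}{a}\int_0^t f(\tau)\sin a(t-\tau)\,d\tau + u_0\cos at + u_1\frac{\sin at}{a}$ solves $u'' + a^2u = f$, $u(0) = u_0$, $u'(0) = u_1$:

```python
import sympy as sp

t, tau = sp.symbols('t tau', nonnegative=True)
a, u0, u1 = 2, 1, -1
f = sp.exp(tau)

# u(t) = (1/a) * int_0^t f(tau) sin(a (t - tau)) dtau + u0 cos(at) + u1 sin(at)/a
u = (sp.integrate(f * sp.sin(a*(t - tau)), (tau, 0, t)) / a
     + u0*sp.cos(a*t) + u1*sp.sin(a*t)/a)

# check the ODE and both initial conditions
residual = sp.simplify(sp.diff(u, t, 2) + a**2*u - sp.exp(t))
assert residual == 0
assert sp.simplify(u.subs(t, 0) - u0) == 0
assert sp.simplify(sp.diff(u, t).subs(t, 0) - u1) == 0
print("ODE Cauchy formula verified")
```

The same convolution-with-the-fundamental-solution structure is what Kirchhoff's, Poisson's, and d'Alembert's formulas below make explicit in space dimensions $3$, $2$, and $1$.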
Proposition 17.10 Let $f\in C^2(\mathbb{R}^n\times\mathbb{R}_+)$, $u_0\in C^3(\mathbb{R}^n)$, and $u_1\in C^2(\mathbb{R}^n)$ for $n=2,3$; let $f\in C^1(\mathbb{R}\times\mathbb{R}_+)$, $u_0\in C^2(\mathbb{R})$, and $u_1\in C^1(\mathbb{R})$ in case $n=1$. Then there exists a unique solution of the CIVP. In case $n=3$ it is given by Kirchhoff's formula
$$u(x,t) = \frac{1}{4\pi a^2}\iiint_{U_{at}(x)} \frac{f\big(y,\,t-\frac{\|x-y\|}{a}\big)}{\|x-y\|}\,dy + \frac{1}{4\pi a^2}\,\frac{1}{t}\iint_{S_{at}(x)} u_1(y)\,dS_y + \frac{1}{4\pi a^2}\,\frac{\partial}{\partial t}\Big(\frac{1}{t}\iint_{S_{at}(x)} u_0(y)\,dS_y\Big). \tag{17.15}$$
The first term $V$ is called the retarded potential. In case $n=2$, $x=(x_1,x_2)$, $y=(y_1,y_2)$, it is given by Poisson's formula
$$u(x,t) = \frac{1}{2\pi a}\int_0^t\iint_{U_{a(t-s)}(x)} \frac{f(y,s)\,dy\,ds}{\sqrt{a^2(t-s)^2 - \|x-y\|^2}} + \frac{1}{2\pi a}\iint_{U_{at}(x)} \frac{u_1(y)\,dy}{\sqrt{a^2t^2 - \|x-y\|^2}} + \frac{1}{2\pi a}\,\frac{\partial}{\partial t}\iint_{U_{at}(x)} \frac{u_0(y)\,dy}{\sqrt{a^2t^2 - \|x-y\|^2}}. \tag{17.16}$$
In case $n=1$ it is given by d'Alembert's formula
$$u(x,t) = \frac{1}{2a}\int_0^t\int_{x-a(t-s)}^{x+a(t-s)} f(y,s)\,dy\,ds + \frac{1}{2a}\int_{x-at}^{x+at} u_1(y)\,dy + \frac{1}{2}\big(u_0(x+at) + u_0(x-at)\big). \tag{17.17}$$
The solution $u(x,t)$ depends continuously on $u_0$, $u_1$, and $f$ in the following sense: If
$$|f - \tilde f| < \varepsilon, \qquad |u_0 - \tilde u_0| < \varepsilon_0, \qquad |u_1 - \tilde u_1| < \varepsilon_1, \qquad \|\operatorname{grad}(u_0 - \tilde u_0)\| < \varepsilon_0'$$
(where we impose the last inequality only in cases $n=3$ and $n=2$), then the corresponding solutions $u(x,t)$ and $\tilde u(x,t)$ satisfy in a strip $0\le t\le T$
$$|u(x,t) - \tilde u(x,t)| < \frac{1}{2}\,T^2\varepsilon + T\varepsilon_1 + \varepsilon_0 + aT\varepsilon_0',$$
where the last term is omitted in case $n=1$.

Proof (idea of proof). We show Kirchhoff's formula.
(a) The potential term with $f$. By Proposition 16.15, the convolution product $E_3*f$ exists. It is shown in [Wla72, p. 153] that for a locally integrable function $f\in L^1_{loc}(\mathbb{R}^{n+1})$ with $\operatorname{supp} f\subset\mathbb{R}^n\times\mathbb{R}_+$, $E_n*T_f$ is again a locally integrable function. Formally, the convolution product is given by
$$(E_3*f)(x,t) = \int_{\mathbb{R}^4} E_3(y,s)\,f(x-y,\,t-s)\,dy\,ds = \int_{\mathbb{R}^4} E_3(x-y,\,t-s)\,f(y,s)\,dy\,ds,$$
where the integral is to be understood as the evaluation of $E_3(y,s)$ on the shifted function $f(x-y,\,t-s)$.
Since $f$ has support on the positive time axis, one can restrict oneself to $s>0$ and $t-s>0$, that is, to $0<s<t$. Then formula (17.6) gives
$$(E_3*f)(x,t) = \frac{1}{4\pi a^2}\int_0^t \frac{1}{s}\iint_{S_{as}} f(x-y,\,t-s)\,dS(y)\,ds.$$
Using $r = as$, $dr = a\,ds$, we obtain
$$V(x,t) = \frac{1}{4\pi a^2}\int_0^{at}\iint_{S_r} \frac{1}{r}\,f\Big(x-y,\,t-\frac{r}{a}\Big)\,dS(y)\,dr.$$
Using $dy_1\,dy_2\,dy_3 = dr\,dS$ as well as $\|y\| = r = as$ we can proceed
$$V(x,t) = \frac{1}{4\pi a^2}\iiint_{U_{at}(0)} \frac{f\big(x-y,\,t-\frac{\|y\|}{a}\big)}{\|y\|}\,dy_1\,dy_2\,dy_3.$$
The shift $z = x-y$, $dz_1\,dz_2\,dz_3 = dy_1\,dy_2\,dy_3$, finally yields
$$V(x,t) = \frac{1}{4\pi a^2}\iiint_{U_{at}(x)} \frac{f\big(z,\,t-\frac{\|x-z\|}{a}\big)}{\|x-z\|}\,dz.$$
This is the first potential term of Kirchhoff's formula.
(b) We compute $V^{(1)}(x,t)$. By definition, $V^{(1)} = E_3*(u_1\otimes\delta) = E_3*_x u_1$. Formally, this is given by
$$V^{(1)}(x,t) = \frac{1}{4\pi a^2t}\iiint_{\mathbb{R}^3} \delta_{S_{at}}(y)\,u_1(x-y)\,dy = \frac{1}{4\pi a^2t}\iint_{S_{at}} u_1(x-y)\,dS(y) = \frac{1}{4\pi a^2t}\iint_{S_{at}(x)} u_1(y)\,dS(y).$$
(c) Recall that $(D^\alpha S)*T = D^\alpha(S*T)$, by Remark 16.8 (b). In particular,
$$E_3*(u_0\otimes\delta') = \frac{\partial}{\partial t}\big(E_3*_x u_0\big),$$
which immediately gives (c) in view of (b).

Remark 17.2 (a) The stronger regularity (differentiability) conditions on $f, u_0, u_1$ are necessary to prove $u\in C^2(\mathbb{R}^n\times\mathbb{R}_+)$ and to show stability.
(b) Proposition 17.10 and Corollary 17.9 show that the GIVP for the wave equation is a well-posed problem (existence, uniqueness, stability).

17.2.3 The Heat Equation

Definition 17.2 (a) The problem
$$u_t - a^2\Delta u = f(x,t), \qquad x\in\mathbb{R}^n,\ t>0, \tag{17.18}$$
$$u(x,0) = u_0(x), \tag{17.19}$$
where we assume that $f\in C(\mathbb{R}^n\times\mathbb{R}_+)$, $u_0\in C(\mathbb{R}^n)$, is called the classical initial value problem (CIVP, for short) for the heat equation. A function $u(x,t)$ is called a classical solution of the CIVP if $u(x,t)\in C^2(\mathbb{R}^n\times(0,+\infty))\cap C(\mathbb{R}^n\times[0,+\infty))$, and $u(x,t)$ satisfies the heat equation (17.18) and the initial condition (17.19).
(b) The problem
$$U_t - a^2\Delta U = F + U_0\otimes\delta$$
with $F\in\mathcal{D}'(\mathbb{R}^{n+1})$, $U_0\in\mathcal{D}'(\mathbb{R}^n)$, and $\operatorname{supp} F\subset\mathbb{R}^n\times\mathbb{R}_+$ is called the generalized initial value problem (GIVP). A generalized function $U\in\mathcal{D}'(\mathbb{R}^{n+1})$ with $\operatorname{supp} U\subset\mathbb{R}^n\times\mathbb{R}_+$ which satisfies the above equation is called a generalized solution of the GIVP.

The fundamental solution of the heat operator has the following properties:
$$\int_{\mathbb{R}^n} E(x,t)\,dx = 1 \quad (t>0), \qquad E(x,t)\longrightarrow\delta(x) \quad\text{as } t\to 0+.$$
The fundamental solution describes the heat distribution of a point source at the origin $(0,0)$. Since $E(x,t)>0$ for all $t>0$ and all $x\in\mathbb{R}^n$, the heat propagates with infinite speed. This is in contrast to our experience. However, for short distances the heat equation gives sufficiently good results; for long distances one uses the transport equation. We summarize the results, which are similar to those for the wave equation.

Proposition 17.11 (a) Suppose that $u(x,t)$ is a solution of the CIVP with the given data $f$ and $u_0$. Then the regular distribution $T_{\tilde u}$ is a solution of the GIVP with the right hand side $T_{\tilde f} + T_{u_0}\otimes\delta$, provided that $f(x,t)$ and $u(x,t)$ are extended to $\tilde f(x,t)$ and $\tilde u(x,t)$ by $0$ into the left half-space $\{(x,t)\in\mathbb{R}^{n+1} \mid t<0\}$.
(b) Conversely, suppose that $U$ is a solution of the GIVP. Let the distributions $F = T_f$, $U_0 = T_{u_0}$, and $U = T_u$ be regular and satisfy the regularity assumptions of the CIVP. Then $u(x,t)$ is a solution of the CIVP.

Proposition 17.12 Suppose that $F$ and $U_0$ are data of the GIVP, and suppose further that $F$ and $U_0$ both have compact support. Then there exists a solution $U$ of the GIVP which can be written as $U = V + V^{(0)}$ where
$$V = E*F, \qquad V^{(0)} = E*_x U_0.$$
The solution $U$ varies continuously with $F$ and $U_0$.

Remark 17.3 The theorem differs from the corresponding result for the wave equation in that there is no proposition on uniqueness. It turns out that the GIVP cannot be solved uniquely. A.
Friedman, Partial Differential Equations of Parabolic Type, gave an example of a non-vanishing distribution which solves the GIVP with $F=0$ and $U_0=0$. However, if all distributions are regular and we place additional requirements on the growth of the regular distribution as $t, \|x\|\to\infty$, uniqueness can be achieved. For existence and uniqueness we introduce the following classes of functions:
$$M = \{f\in C(\mathbb{R}^n\times\mathbb{R}_+) \mid f \text{ is bounded on the strip } \mathbb{R}^n\times[0,T] \text{ for all } T>0\},$$
$$C_b(\mathbb{R}^n) = \{f\in C(\mathbb{R}^n) \mid f \text{ is bounded on } \mathbb{R}^n\}.$$

Corollary 17.13 (a) Let $f\in M$ and $u_0\in C_b(\mathbb{R}^n)$. Then the two potentials $V(x,t)$ as in Corollary 17.4 and
$$V^{(0)}(x,t) = E*\big(T_{u_0}\otimes\delta\big) = \frac{H(t)}{(4\pi a^2t)^{n/2}}\int_{\mathbb{R}^n} u_0(y)\,e^{-\frac{\|x-y\|^2}{4a^2t}}\,dy$$
are regular distributions and $u = V + V^{(0)}$ is a solution of the GIVP.
(b) In case $f\in C^2(\mathbb{R}^n\times\mathbb{R}_+)$ with $D^\alpha f\in M$ for all $\alpha$ with $|\alpha|\le 1$ (first order partial derivatives), the solution in (a) is a solution of the CIVP. In particular, $V^{(0)}(x,t)\to u_0(x)$ as $t\to 0+$.
(c) The solution $u$ of the GIVP is unique in the class $M$.
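Both the normalization (17.4) and the limit $V^{(0)}(x,t)\to u_0(x)$ as $t\to 0+$ from Corollary 17.13 (b) can be observed numerically. The following Python/numpy sketch (an added illustration; the choices $n=1$, $a=1$, and the sample initial datum are arbitrary) discretizes the convolution $V^{(0)} = E*_x u_0$ by a Riemann sum:

```python
import numpy as np

a = 1.0
x = np.linspace(-20.0, 20.0, 4001)
dx = x[1] - x[0]

def E(x, t):
    # fundamental solution of u_t = a^2 u_xx for n = 1 and t > 0
    return np.exp(-x**2 / (4*a**2*t)) / np.sqrt(4*np.pi*a**2*t)

# (17.4): the total mass is 1 for every t > 0
for t in (0.01, 0.1, 1.0):
    assert abs(E(x, t).sum()*dx - 1.0) < 1e-6

# V0(.,t) = E(.,t) *_x u0 approaches u0 as t -> 0+
u0 = np.cos(x) * np.exp(-x**2 / 50.0)
pts = x[::100]                      # a coarse set of evaluation points
def V0(t):
    return np.array([(E(p - x, t) * u0).sum()*dx for p in pts])

errs = [np.max(np.abs(V0(t) - u0[::100])) for t in (0.5, 0.1, 0.01)]
assert errs[0] > errs[1] > errs[2] and errs[2] < 2e-2
print("V0(.,t) -> u0 as t -> 0+")
```

The monotone decay of the error as $t\to 0+$ reflects the fact that convolving with the heat kernel smooths the data at the length scale $\sqrt{2a^2t}$.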
17.2.4 Physical Interpretation of the Results

[Figure: the backward light cone $\Gamma_-(x,t)$ and the forward light cone $\Gamma_+(x,t)$ over the $(y_1,y_2)$-plane; radius $= at$.]

Definition 17.3 We introduce the two cones in $\mathbb{R}^{n+1}$
$$\Gamma_-(x,t) = \{(y,s) \mid \|x-y\| < a(t-s),\ s<t\}, \qquad \Gamma_+(x,t) = \{(y,s) \mid \|x-y\| < a(s-t),\ s>t\},$$
which are called the domain of dependence (backward light cone) and the domain of influence (forward light cone), respectively. Recall that the boundaries $\partial\Gamma_+$ and $\partial\Gamma_-$ are characteristic surfaces of the wave equation.

(a) Propagation of Waves in Space

Consider the fundamental solution
$$E_3(x,t) = \frac{1}{4\pi a^2}\,\delta_{S_{at}}\otimes T_{\frac{1}{t}}$$
of the 3-dimensional wave equation. It shows that the disturbance at time $t>0$ effected by a point source $\delta(x)\delta(t)$ at the origin is located on a sphere of radius $at$ around $0$. The disturbance moves like a spherical wave, $\|x\| = at$, with velocity $a$. In the beginning there is silence, then disturbance (on the sphere), and afterwards again silence. This is called Huygens' principle.

[Figure: the disturbance is on a sphere $S_{at}$ of radius $at$.]

It follows by the superposition principle that the solution $u(x_0,t_0)$ of an initial disturbance $u_0(x)\delta'(t) + u_1(x)\delta(t)$ is completely determined by the values of $u_0$ and $u_1$ on the sphere cut out at $t=0$ by the backward light cone, that is, by the values $u_0(x)$ and $u_1(x)$ at all $x$ with $\|x-x_0\| = at_0$.

Now let the disturbance be situated in a compact set $K$ rather than at a single point. Suppose that $d$ and $D$ are the minimal and maximal distances of a point $x$ from $K$. Then the disturbance starts to act at $x$ at time $t_0 = d/a$; it lasts for the time $(D-d)/a$; and again, for $t > D/a = t_1$, there is silence at $x$. Therefore, we can observe a forward wave front at time $t_0$ and a backward wave front at time $t_1$.

[Figure: silence — disturbance — silence: the domain of influence $M(K)$ of a compact set $K$.]

This shows that the domain of influence $M(K)$ of the compact set $K$ is the union of all boundaries of forward light cones $\Gamma_+(y,0)$ with $y\in K$ at time $t=0$:
$$M(K) = \{(y,s) \mid \exists\,x\in K:\ \|x-y\| = as\}.$$

(b) Propagation of Plane Waves

Consider the fundamental solution
$$E_2(x,t) = \frac{H(at-\|x\|)}{2\pi a\sqrt{a^2t^2 - \|x\|^2}}, \qquad x = (x_1,x_2),$$
of the 2-dimensional wave equation.

[Figure: the disturbed region at times $t = 0, 1, 2, 3, 4$: growing discs of radius $at$.]
It shows that the disturbance effected by a point source $\delta(x)\delta(t)$ at the origin at time $0$ fills the whole disc $U_{at}$ of radius $at$ around $0$. One observes a forward wave front moving with speed $a$. In contrast to the 3-dimensional picture, there exists no back front; the disturbance is permanently present from $t_0$ on. We speak of wave diffusion; Huygens' principle does not hold. Diffusion can also be observed in case of an arbitrary initial disturbance $u_0(x)\delta'(t) + u_1(x)\delta(t)$. Indeed, the superposition principle shows that the domain of influence of a compact initial disturbance $K$ is the union of all discs $U_{at}(y)$ with $y\in K$.

(c) Propagation on a Line

Recall that $E_1(x,t) = \frac{1}{2a}H(at-|x|)$. The disturbance at time $t>0$ which is effected by a point source $\delta(x)\delta(t)$ is the whole closed interval $[-at,at]$. We have two forward wave "fronts", one at the point $x = at$ and one at $x = -at$, one moving to the right and one moving to the left. As in the plane case there does not exist a back wave front; we observe diffusion. For more details, see the discussion in Wladimirow, [Wla72, p. 155–159].

17.3 Fourier Method for Boundary Value Problems

A good, easily accessible introduction to the Fourier method is to be found in [KK71]. In this section we use Fourier series to solve boundary value problems for the Laplace equation as well as initial boundary value problems for the wave and heat equations. Recall that the following sets are CNOSs in the Hilbert space $H$:
$$\Big\{\tfrac{1}{\sqrt{2\pi}}\,e^{int} \ \Big|\ n\in\mathbb{Z}\Big\}, \qquad H = L^2(a,\,a+2\pi);$$
$$\Big\{\tfrac{1}{\sqrt{b-a}}\,e^{\frac{2\pi i nt}{b-a}} \ \Big|\ n\in\mathbb{Z}\Big\}, \qquad H = L^2(a,b);$$
$$\Big\{\tfrac{1}{\sqrt{2\pi}},\ \tfrac{1}{\sqrt\pi}\sin(nt),\ \tfrac{1}{\sqrt\pi}\cos(nt) \ \Big|\ n\in\mathbb{N}\Big\}, \qquad H = L^2(a,\,a+2\pi);$$
$$\Big\{\tfrac{1}{\sqrt{b-a}},\ \sqrt{\tfrac{2}{b-a}}\sin\Big(\tfrac{2\pi nt}{b-a}\Big),\ \sqrt{\tfrac{2}{b-a}}\cos\Big(\tfrac{2\pi nt}{b-a}\Big) \ \Big|\ n\in\mathbb{N}\Big\}, \qquad H = L^2(a,b).$$
For any function $f\in L^1(0,2\pi)$ one has an associated Fourier series
$$f \sim \sum_{n\in\mathbb{Z}} c_n\,\frac{e^{int}}{\sqrt{2\pi}}, \qquad c_n = \frac{1}{\sqrt{2\pi}}\int_0^{2\pi} f(t)\,e^{-int}\,dt.$$

Lemma 17.14 Each of the following two sets forms a CNOS in $H = L^2(0,\pi)$ (on the half interval):
$$\Big\{\sqrt{\tfrac{2}{\pi}}\sin(nt) \ \Big|\ n\in\mathbb{N}\Big\}, \qquad \Big\{\tfrac{1}{\sqrt{\pi}}\Big\}\cup\Big\{\sqrt{\tfrac{2}{\pi}}\cos(nt) \ \Big|\ n\in\mathbb{N}\Big\}.$$

Proof. To check that they form an NOS is left to the reader. We show completeness of the first set. Let $f\in L^2(0,\pi)$. Extend $f$ to an odd function $\tilde f\in L^2(-\pi,\pi)$, that is, $\tilde f(x) = f(x)$ and $\tilde f(-x) = -f(x)$ for $x\in(0,\pi)$. Since $\tilde f$ is an odd function, in its Fourier series
$$\frac{a_0}{2} + \sum_{n=1}^\infty a_n\cos(nt) + b_n\sin(nt)$$
we have $a_n = 0$ for all $n$. Since the Fourier series $\sum_{n=1}^\infty b_n\sin(nt)$ converges to $\tilde f$ in $L^2(-\pi,\pi)$, it converges to $f$ in $L^2(0,\pi)$. Thus the sine system is complete. The proof for the cosine system (using the even extension) is analogous.

17.3.1 Initial Boundary Value Problems

(a) The Homogeneous Heat Equation, Periodic Boundary Conditions

We consider heat conduction in a closed wire loop of length $2\pi$. Let $u(x,t)$ be the temperature of the wire at position $x$ and time $t$. Since the wire is closed (a loop), $u(x,t) = u(x+2\pi,t)$; $u$ is thought of as a $2\pi$-periodic function on $\mathbb{R}$ for every fixed $t$. Thus, we have the following periodic boundary conditions:
$$u(0,t) = u(2\pi,t), \qquad u_x(0,t) = u_x(2\pi,t), \qquad t\in\mathbb{R}_+. \tag{PBC}$$
The initial temperature distribution at time $t=0$ is given, such that the BIVP reads
$$u_t - a^2u_{xx} = 0, \quad x\in\mathbb{R},\ t>0; \qquad u(x,0) = u_0(x), \quad x\in\mathbb{R}; \qquad \text{(PBC)}.$$
Separation of variables. We seek solutions of the form $u(x,t) = f(x)\cdot g(t)$, ignoring the initial conditions for a while. The heat equation then takes the form
$$f(x)\,g'(t) = a^2f''(x)\,g(t) \iff \frac{1}{a^2}\,\frac{g'(t)}{g(t)} = \frac{f''(x)}{f(x)} = \kappa = \text{const}.$$
We obtain a system of two independent ODEs coupled only by $\kappa$:
$$f''(x) - \kappa f(x) = 0, \qquad g'(t) - a^2\kappa\,g(t) = 0. \tag{17.20}$$
The periodic boundary conditions imply $f(0)g(t) = f(2\pi)g(t)$ and $f'(0)g(t) = f'(2\pi)g(t)$, which for nontrivial $g$ gives $f(0) = f(2\pi)$ and $f'(0) = f'(2\pi)$. In case $\kappa = 0$, $f''(x) = 0$ has the general solution $f(x) = c_1x + c_2$; the only periodic solutions are the constants $f(x) = c_2$. Suppose now that $\kappa = -\nu^2 < 0$.
Then the general solution of the second-order ODE is f(x) = c₁cos(νx) + c₂sin(νx). Since f is periodic with period 2π, only a discrete set of values ν is possible, namely νₙ = n, n ∈ ℤ; this implies κₙ = −n², n ∈ ℕ. Finally, in case κ = ν² > 0, the general solution f(x) = c₁e^{νx} + c₂e^{−νx} provides no periodic solutions f. So far we have obtained a set of solutions
\[ f_n(x) = a_n\cos(nx) + b_n\sin(nx), \quad n\in\mathbb{N}_0, \]
corresponding to κₙ = −n². The ODE for gₙ(t) now reads gₙ′(t) + a²n²gₙ(t) = 0; its solution is gₙ(t) = c\,e^{−a²n²t}, n ∈ ℕ. Hence the solutions of the BVP are given by
\[ u_n(x,t) = e^{-a^2n^2 t}\bigl(a_n\cos(nx) + b_n\sin(nx)\bigr),\ n\in\mathbb{N}; \qquad u_0(x,t) = \frac{a_0}{2}; \]
and finite or "infinite" linear combinations
\[ u(x,t) = \frac{a_0}{2} + \sum_{n=1}^\infty e^{-a^2n^2 t}\bigl(a_n\cos(nx)+b_n\sin(nx)\bigr). \tag{17.21} \]
Consider now the initial value, that is, t = 0. The corresponding series is
\[ u(x,0) = \frac{a_0}{2} + \sum_{n=1}^\infty a_n\cos(nx)+b_n\sin(nx), \tag{17.22} \]
which is the ordinary Fourier series of u₀(x). That is, the Fourier coefficients aₙ and bₙ of the initial function u₀(x) formally give a solution u(x,t).
1. If the Fourier series of u₀ converges pointwise to u₀, the initial conditions are satisfied by the function u(x,t) given in (17.21).
2. If the Fourier series of u₀ is twice differentiable (with respect to x), so is the function u(x,t) given by (17.21).

Lemma 17.15 Consider the BIVP (17.20).
(a) Existence. Suppose that u₀ ∈ C⁴(ℝ) is periodic. Then the function u(x,t) given by (17.21) is in C^{2,1}_{x,t}([0,2π]×ℝ₊) and solves the classical BIVP (17.20).
(b) Uniqueness and Stability. In the class of functions C^{2,1}_{x,t}([0,2π]×ℝ₊) the solution of the above BIVP is unique.

Proof. (a) The Fourier coefficients of u₀⁽⁴⁾ are bounded, so that the Fourier coefficients of u₀ decay like 1/n⁴ (integrate the Fourier series of u₀⁽⁴⁾ four times). Then the series for u_{xx}(x,t) and u_t(x,t) are both dominated by the convergent series $\sum_{n=1}^\infty \frac{1}{n^2}$; hence they converge uniformly.
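The term-by-term differentiability just established can be illustrated numerically. The following sketch is my own and not part of the text: it builds a truncated series of the form (17.21) with the rapidly decaying coefficients aₙ = 1/n⁴, bₙ = 0 (and an assumed diffusivity a = 0.5), and checks the heat-equation residual u_t − a²u_xx with central finite differences at an arbitrary interior point.

```python
import math

# Truncated series of the form (17.21) with a_n = 1/n^4, b_n = 0.
# Both a (the diffusivity) and the truncation order N are assumed values.
a = 0.5
N = 20

def u(x, t):
    return 0.5 + sum(math.exp(-a * a * n * n * t) * math.cos(n * x) / n**4
                     for n in range(1, N + 1))

# central finite differences for u_t and u_xx at an interior point
x0, t0, h = 1.3, 0.7, 1e-3
ut  = (u(x0, t0 + h) - u(x0, t0 - h)) / (2 * h)
uxx = (u(x0 + h, t0) - 2 * u(x0, t0) + u(x0 - h, t0)) / (h * h)
print(abs(ut - a * a * uxx))   # small: only finite-difference error remains
```

Since the coefficients decay like 1/n⁴, the twice-differentiated series is still dominated by Σ1/n², which is exactly the uniform-convergence argument of the proof.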
This shows that the series u(x,t) can be differentiated term by term twice w.r.t. x and once w.r.t. t.
(b) For any fixed t ≥ 0, u(x,t) is continuous in x. Consider v(t) := ‖u(·,t)‖²_{L²(0,2π)}. Then
\[ v'(t) = \frac{d}{dt}\int_0^{2\pi} u(x,t)^2\,dx = 2\int_0^{2\pi} u\,u_t\,dx = 2\int_0^{2\pi} u\,a^2 u_{xx}\,dx = 2a^2\, u\,u_x\Big|_0^{2\pi} - 2a^2\int_0^{2\pi} u_x^2\,dx = -2a^2\int_0^{2\pi} u_x^2\,dx \le 0, \]
the boundary term vanishing by periodicity. This shows that v(t) is monotonically decreasing in t. Suppose now that u₁ and u₂ both solve the BIVP in the given class. Then u = u₁ − u₂ solves the BIVP in this class with homogeneous initial values, that is, u(x,0) = 0; hence v(0) = ‖u(·,0)‖²_{L²} = 0. Since v(t) is decreasing for t ≥ 0 and non-negative, v(t) = 0 for all t ≥ 0; hence u(x,t) = 0 in L²(0,2π) for all t ≥ 0. Since u(x,t) is continuous in x, this implies u(x,t) = 0 for all x and t. Thus u₁(x,t) = u₂(x,t); the solution is unique.
Stability. Since v(t) is decreasing,
\[ \sup_{t\in\mathbb{R}_+} \|u(\cdot,t)\|_{L^2(0,2\pi)} \le \|u_0\|_{L^2(0,2\pi)}. \]
This shows that small changes in the initial condition u₀ imply small changes in the solution u(x,t): the problem is well posed.

(b) The Inhomogeneous Heat Equation, Periodic Boundary Conditions

We study the IBVP
\[ u_t - a^2 u_{xx} = f(x,t), \qquad u(x,0) = 0, \qquad \text{(PBC)}. \tag{17.23} \]
Solution. Let eₙ(x) = e^{inx}/√(2π), n ∈ ℤ, be the CNOS in L²(0,2π). These functions are all eigenfunctions of the differential operator d²/dx²: eₙ″(x) = −n²eₙ(x). Let t > 0 be fixed and let
\[ f(x,t) \sim \sum_{n\in\mathbb{Z}} c_n(t)\, e_n(x) \]
be the Fourier series of f(x,t) with coefficients cₙ(t). For u we try the following ansatz:
\[ u(x,t) \sim \sum_{n\in\mathbb{Z}} d_n(t)\, e_n(x). \tag{17.24} \]
If f(x,t) is continuous in x and piecewise continuously differentiable with respect to x, its Fourier series converges pointwise, and we have
\[ \sum_{n\in\mathbb{Z}} \bigl( d_n'(t) + a^2 n^2 d_n(t)\bigr)\, e_n = u_t - a^2 u_{xx} = f(x,t) = \sum_{n\in\mathbb{Z}} c_n(t)\, e_n. \]
For each n this is an ODE in t:
\[ d_n'(t) + a^2 n^2 d_n(t) = c_n(t), \qquad d_n(0) = 0. \]
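This ODE has the standard variation-of-constants (Duhamel) solution, stated next in the text. As a quick numerical cross-check (my own illustration; the constants a, n, c, t below are arbitrary assumed values), one can compare the closed form obtained for a constant source cₙ(s) = c with direct quadrature of the Duhamel integral:

```python
import math

# For c_n(s) = c the solution of d' + a^2 n^2 d = c, d(0) = 0 is
#   d_n(t) = e^{-a^2 n^2 t} * int_0^t e^{a^2 n^2 s} c ds
#          = c (1 - e^{-a^2 n^2 t}) / (a^2 n^2).
a, n, c, t = 1.0, 3, 2.0, 0.5   # assumed values
k = a * a * n * n

def d_quadrature(t, m=20000):
    # midpoint rule for the Duhamel integral
    h = t / m
    s = sum(math.exp(k * (j + 0.5) * h) * c for j in range(m)) * h
    return math.exp(-k * t) * s

d_closed = c * (1 - math.exp(-k * t)) / k
print(abs(d_quadrature(t) - d_closed))   # ~0 up to quadrature error
```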
From ODE theory the solution is well known:
\[ d_n(t) = e^{-a^2 n^2 t}\int_0^t e^{a^2 n^2 s}\, c_n(s)\,ds. \]
Under certain regularity and growth conditions on f, (17.24) solves the inhomogeneous IBVP.

(c) The Homogeneous Wave Equation with Dirichlet Conditions

Consider the initial boundary value problem of the vibrating string of length π:
\[ \text{(E)}\quad u_{tt} - a^2 u_{xx} = 0, \quad 0 < x < \pi,\ t > 0; \qquad \text{(BC)}\quad u(0,t) = u(\pi,t) = 0; \qquad \text{(IC)}\quad u(x,0) = \varphi(x),\ u_t(x,0) = \psi(x), \quad 0 < x < \pi. \]
The ansatz u(x,t) = f(x)g(t) yields
\[ \frac{f''}{f} = \kappa = \frac{g''}{a^2 g}, \qquad f''(x) = \kappa f(x), \quad g'' = \kappa a^2 g. \]
The boundary conditions imply f(0) = f(π) = 0. Hence the first ODE has the only solutions fₙ(x) = cₙ sin(nx), κₙ = −n², n ∈ ℕ. The corresponding ODEs for g then read gₙ″ + n²a²gₙ = 0, with general solution aₙcos(nat) + bₙsin(nat). Hence
\[ u(x,t) = \sum_{n=1}^\infty \bigl(a_n\cos(nat)+b_n\sin(nat)\bigr)\sin(nx) \]
solves the boundary value problem in the sense of D′(ℝ²) (choose any aₙ, bₙ of polynomial growth). Now insert the initial conditions, t = 0:
\[ u(x,0) = \sum_{n=1}^\infty a_n\sin(nx) \overset{!}{=} \varphi(x), \qquad u_t(x,0) = \sum_{n=1}^\infty na\,b_n\sin(nx) \overset{!}{=} \psi(x). \]
Since $\{\sqrt{2/\pi}\,\sin(nx) \mid n\in\mathbb{N}\}$ is a CNOS in L²(0,π), we can determine the Fourier coefficients of φ and ψ with respect to this CNOS and obtain aₙ and bₙ, respectively.
Regularity. Suppose that φ ∈ C⁴([0,π]) and ψ ∈ C³([0,π]). Then the Fourier sine coefficients aₙ and na bₙ of φ and ψ decay like 1/n⁴ and 1/n³, respectively. Hence the series
\[ u(x,t) = \sum_{n=1}^\infty \bigl(a_n\cos(nat)+b_n\sin(nat)\bigr)\sin(nx) \tag{17.25} \]
can be differentiated twice with respect to x or t, since the differentiated series have a summable upper bound c/n². Hence (17.25) solves the IBVP.

(d) The Wave Equation with Inhomogeneous Boundary Conditions

Consider the following problem in Ω ⊂ ℝⁿ:
\[ u_{tt} - a^2\Delta u = 0, \qquad u(x,0) = u_t(x,0) = 0, \qquad u|_{\partial\Omega} = w(x,t). \]
Idea. Find an extension v(x,t) of w(x,t), v ∈ C²(Ω̄ × ℝ₊), and look for the function ũ = u − v.
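Determining the aₙ from φ in practice amounts to computing Fourier sine coefficients. The following sketch is my own illustration (the string shape φ(x) = x(π − x) is an assumed example, for which the odd-order coefficients are known in closed form, aₙ = 8/(πn³)):

```python
import math

# Fourier sine coefficients a_n of phi(x) = x*(pi - x) on (0, pi),
# phi(x) ~ sum a_n sin(n x), a_n = (2/pi) int_0^pi phi(x) sin(nx) dx.
phi = lambda x: x * (math.pi - x)

def sine_coeff(n, m=4000):
    # midpoint-rule quadrature on (0, pi)
    h = math.pi / m
    return (2 / math.pi) * h * sum(
        phi((j + 0.5) * h) * math.sin(n * (j + 0.5) * h) for j in range(m))

coeffs = [sine_coeff(n) for n in range(1, 40)]

def partial_sum(x):
    return sum(an * math.sin((n + 1) * x) for n, an in enumerate(coeffs))

# Known: a_n = 8/(pi n^3) for odd n and a_n = 0 for even n.
print(abs(coeffs[0] - 8 / math.pi))        # ~0
print(abs(partial_sum(1.0) - phi(1.0)))    # small truncation error
```

The 1/n³ decay visible here is exactly the decay rate the regularity discussion above attaches to C-smoothness of the initial data.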
Then ũ has homogeneous boundary conditions and satisfies the IBVP
\[ \tilde u_{tt} - a^2\Delta\tilde u = -v_{tt} + a^2\Delta v, \qquad \tilde u(x,0) = -v(x,0), \quad \tilde u_t(x,0) = -v_t(x,0), \qquad \tilde u|_{\partial\Omega} = 0. \]
This problem can be split into two problems, one with zero initial conditions and one with homogeneous wave equation.

17.3.2 Eigenvalue Problems for the Laplace Equation

In the previous subsection we have seen that BIVPs treated by Fourier's method often lead to boundary eigenvalue problems (BEVPs) for the Laplace equation. We formulate the problems. Let n = 1 and Ω = (0, l). One considers the following types of BEVPs for the Laplace equation f″ = λf:
• Dirichlet boundary conditions: f(0) = f(l) = 0.
• Neumann boundary conditions: f′(0) = f′(l) = 0.
• Periodic boundary conditions: f(0) = f(l), f′(0) = f′(l).
• Mixed boundary conditions: α₁f(0) + α₂f′(0) = 0, β₁f(l) + β₂f′(l) = 0.
• Symmetric boundary conditions: if u and v are functions satisfying these boundary conditions, then (u′v − uv′)|₀ˡ = 0. In this case integration by parts gives
\[ \int_0^l (u''v - uv'')\,dx = (u'v - v'u)\Big|_0^l - \int_0^l (u'v' - v'u')\,dx = 0. \]
That is, ∫u″·v dx = ∫u·v″ dx, and the Laplace operator becomes symmetric.

Proposition 17.16 Let Ω ⊂ ℝⁿ. The BEVP with Dirichlet conditions
\[ \Delta u = \lambda u, \quad u\in C^2(\Omega)\cap C^1(\overline\Omega), \qquad u|_{\partial\Omega} = 0, \tag{17.26} \]
has countably many eigenvalues λₖ. All eigenvalues are negative and of finite multiplicity. If 0 > λ₁ > λ₂ > ⋯, then the sequence (1/λₖ) tends to 0. The eigenfunctions uₖ corresponding to λₖ form a CNOS in L²(Ω).

Sketch of proof. (a) Let H = L²(Ω). We use Green's first formula with u = v and u|_{∂Ω} = 0,
\[ \int_\Omega u\,\Delta u\,dx + \int_\Omega (\nabla u)^2\,dx = 0, \]
to show that all eigenvalues of Δ are negative. Let Δu = λu. First note that λ = 0 is not an eigenvalue of Δ: suppose to the contrary that Δu = 0, that is, u is harmonic; since u|_{∂Ω} = 0, the uniqueness theorem for the Dirichlet problem gives u = 0 in Ω. Then
\[ \lambda\|u\|^2 = \lambda\langle u, u\rangle = \langle\lambda u, u\rangle = \langle\Delta u, u\rangle = \int_\Omega u\,\Delta u\,dx = -\int_\Omega (\nabla u)^2\,dx < 0. \]
Hence λ is negative.
(b) Assume that a Green's function G for Ω exists. By (17.34), that is,
\[ u(y) = \int_\Omega G(x,y)\,\Delta u(x)\,dx + \int_{\partial\Omega} u(x)\,\frac{\partial G(x,y)}{\partial\vec n_x}\,dS(x), \]
u|_{∂Ω} = 0 implies
\[ u(y) = \int_\Omega G(x,y)\,\Delta u(x)\,dx. \]
This shows that the integral operator A : L²(Ω) → L²(Ω) defined by
\[ (Av)(y) := \int_\Omega G(x,y)\,v(x)\,dx \]
is inverse to the Laplacian. Since G(x,y) = G(y,x) is real, A is self-adjoint. By (a), its eigenvalues 1/λₖ are all negative. If
\[ \iint_{\Omega\times\Omega} |G(x,y)|^2\,dx\,dy < \infty, \]
A is a compact operator.
We want to justify the last statement. Let (Kf)(x) = ∫_Ω k(x,y)f(y) dy be an integral operator on H = L²(Ω) with kernel k(x,y) ∈ 𝓗 = L²(Ω×Ω). Let {uₙ | n ∈ ℕ} be a CNOS in H; then {uₙ(x)u_m(y) | n, m ∈ ℕ} is a CNOS in 𝓗. Let k_{nm} be the Fourier coefficients of k with respect to the basis {uₙ(x)u_m(y)} in 𝓗. Then
\[ (Kf)(x) = \int_\Omega f(y)\sum_{n,m} k_{nm}\,u_n(x)\,u_m(y)\,dy = \sum_n u_n(x)\sum_m k_{nm}\int_\Omega u_m(y)f(y)\,dy = \sum_n \Bigl(\sum_m k_{nm}\langle f, u_m\rangle\Bigr) u_n(x). \]
This in particular shows, by the Cauchy–Schwarz inequality and Bessel's equality,
\[ \|Kf\|^2 = \sum_n \Bigl(\sum_m k_{nm}\langle f, u_m\rangle\Bigr)^2 \le \sum_{n,m} k_{nm}^2\;\sum_m \langle f, u_m\rangle^2 = \sum_{n,m} k_{nm}^2\,\|f\|^2, \qquad \|K\|^2 \le \sum_{n,m} k_{nm}^2 = \iint_{\Omega\times\Omega} k(x,y)^2\,dx\,dy. \]
We show that K is approximated by the sequence (K_N) of finite-rank operators defined by
\[ K_N f = \sum_m \sum_{r=1}^{N} k_{rm}\,\langle f, u_m\rangle\, u_r. \]
Indeed,
\[ \|(K-K_N)f\|^2 = \sum_{r=N+1}^\infty \Bigl(\sum_m k_{rm}\langle f, u_m\rangle\Bigr)^2 \le \sum_{r=N+1}^\infty \sum_m k_{rm}^2\;\|f\|^2, \]
so that
\[ \|K - K_N\|^2 \le \sum_{r=N+1}^\infty \sum_m k_{rm}^2 \longrightarrow 0 \quad\text{as } N\to\infty, \]
being the tail of a convergent series. Hence K is compact.
(c) By (a) and (b), A is a negative, compact, self-adjoint operator. By the spectral theorem for compact self-adjoint operators, Theorem 13.33, there exists an NOS (uₖ) of eigenfunctions of A to the eigenvalues 1/λₖ. The NOS (uₖ) is complete since 0 is not an eigenvalue of A.

Example 17.1 Dirichlet Conditions on the Square. Let Q = (0,π) × (0,π) ⊂ ℝ².
The Laplace operator with Dirichlet boundary conditions on Q has the eigenfunctions
\[ u_{mn}(x,y) = \frac{2}{\pi}\,\sin(mx)\sin(ny), \]
corresponding to the eigenvalues λ_{mn} = −(m² + n²). The eigenfunctions {u_{mn} | m, n ∈ ℕ} form a CNOS in the Hilbert space L²(Q).

Example 17.2 Dirichlet Conditions on the ball U₁(0) in ℝ². We consider the BEVP with Dirichlet boundary conditions on the ball,
\[ -\Delta u = \lambda u, \qquad u|_{S_1(0)} = 0. \]
In polar coordinates u(x,y) = ũ(r,φ) this reads
\[ \frac{1}{r}\frac{\partial}{\partial r}\bigl(r\,\tilde u_r\bigr) + \frac{1}{r^2}\,\tilde u_{\varphi\varphi} = -\lambda\tilde u, \qquad 0 < r < 1,\ 0\le\varphi<2\pi. \]
Separation of variables. We try the ansatz ũ(r,φ) = R(r)Φ(φ). We have the boundary condition R(1) = 0, R(r) is bounded in a neighborhood of r = 0, and Φ is periodic. Then ∂ũ/∂r = R′Φ and
\[ \frac{\partial}{\partial r}\bigl(r\tilde u_r\bigr) = \frac{\partial}{\partial r}\bigl(rR'\Phi\bigr) = (R' + rR'')\Phi, \qquad \tilde u_{\varphi\varphi} = R\,\Phi''. \]
Hence −Δũ = λũ now reads
\[ \Bigl(\frac{R'}{r} + R''\Bigr)\Phi + \frac{R}{r^2}\,\Phi'' = -\lambda R\Phi \iff \frac{rR' + r^2R''}{R} + \lambda r^2 = -\frac{\Phi''}{\Phi} = \mu. \]
In this way we obtain the two one-dimensional problems
\[ \Phi'' + \mu\Phi = 0,\ \ \Phi(0) = \Phi(2\pi); \qquad r^2R'' + rR' + (\lambda r^2 - \mu)R = 0,\ \ |R(0)| < \infty,\ R(1) = 0. \tag{17.27} \]
The eigenvalues and eigenfunctions of the first problem are μₖ = k², Φₖ(φ) = e^{ikφ}, k ∈ ℤ. Equation (17.27) is the Bessel ODE. For μ = k² the solution of (17.27) bounded at r = 0 is given by the Bessel function Jₖ(√λ r). Recall from homework 21.2 that
\[ J_k(x) = \sum_{n=0}^\infty \frac{(-1)^n \left(\frac{x}{2}\right)^{2n+k}}{n!\,(n+k)!}, \qquad k\in\mathbb{N}_0. \]
To determine the eigenvalues λ we use the boundary condition R(1) = 0 in (17.27), namely Jₖ(√λ) = 0. Hence √λ = μ_{kj}, where μ_{kj}, j = 1, 2, …, denote the positive zeros of Jₖ. We obtain
\[ \lambda_{kj} = \mu_{kj}^2, \qquad R_{kj}(r) = J_k(\mu_{kj}\, r), \quad j = 1, 2, \dots \]
The solutions of the BEVP are
\[ \lambda_{kj} = \mu_{kj}^2, \qquad u_{kj}(x) = J_{|k|}(\mu_{|k|j}\, r)\, e^{ik\varphi}, \qquad k\in\mathbb{Z},\ j = 1, 2, \dots \]
Note that for each fixed k the Bessel system {Jₖ(μ_{kj} r) | j ∈ ℕ} forms a complete OS in L²((0,1), r dr), and {e^{ikt} | k ∈ ℤ} forms a complete OS in L²(0,2π). Hence the OS {u_{kj} | k ∈ ℤ, j ∈ ℕ} is a complete OS in L²(U₁(0)).
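The eigenvalues λ_{kj} are thus determined by the zeros μ_{kj} of the Bessel functions. A short sketch (my own check, not part of the text) evaluates Jₖ by the series above and locates the first positive zero μ_{0,1} of J₀ by bisection; the smallest Dirichlet eigenvalue of −Δ on the unit disc is then λ_{0,1} = μ_{0,1}².

```python
import math

# J_k via its power series, as given above; the series converges for all x.
def J(k, x, terms=40):
    return sum((-1) ** n * (x / 2) ** (2 * n + k)
               / (math.factorial(n) * math.factorial(n + k))
               for n in range(terms))

# bisection for the first positive zero mu_{0,1} of J_0:
# J_0(2) > 0 > J_0(3), so a zero lies in (2, 3)
lo, hi = 2.0, 3.0
for _ in range(60):
    mid = (lo + hi) / 2
    if J(0, lo) * J(0, mid) <= 0:
        hi = mid
    else:
        lo = mid
print(mid)   # ~2.404826
```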
Thus, there are no further solutions to the given BEVP. For more details on Bessel functions, see [FK98, p. 383].

17.4 Boundary Value Problems for the Laplace and the Poisson Equations

Throughout this section (if nothing else is stated) we will assume that Ω is a bounded region in ℝⁿ, n ≥ 2. We suppose further that Ω belongs to the class C², that is, the boundary ∂Ω consists of finitely many twice continuously differentiable hypersurfaces; Ω′ := ℝⁿ \ Ω̄ is assumed to be connected (i.e. it is a region, too). All functions are assumed to be real valued.

17.4.1 Formulation of Boundary Value Problems

(a) The Inner Dirichlet Problem: Given φ ∈ C(∂Ω) and f ∈ C(Ω), find u ∈ C(Ω̄) ∩ C²(Ω) such that
\[ \Delta u(x) = f(x)\quad \forall x\in\Omega, \qquad u(y) = \varphi(y)\quad \forall y\in\partial\Omega. \]
(b) The Exterior Dirichlet Problem: Given φ ∈ C(∂Ω) and f ∈ C(Ω′), find u ∈ C(Ω̄′) ∩ C²(Ω′) such that
\[ \Delta u(x) = f(x)\quad \forall x\in\Omega', \qquad u(y) = \varphi(y)\quad \forall y\in\partial\Omega, \qquad \lim_{|x|\to\infty} u(x) = 0. \]
(c) The Inner Neumann Problem: Given φ ∈ C(∂Ω) and f ∈ C(Ω), find u ∈ C¹(Ω̄) ∩ C²(Ω) such that
\[ \Delta u(x) = f(x)\quad \forall x\in\Omega, \qquad \frac{\partial u}{\partial\vec n_-}(y) = \varphi(y)\quad \forall y\in\partial\Omega. \]
Here ∂u/∂n⃗₋(y) denotes the limit of the directional derivative
\[ \frac{\partial u}{\partial\vec n_-}(y) = \lim_{t\to 0+0} \vec n(y)\cdot\operatorname{grad} u\bigl(y - t\vec n(y)\bigr), \]
where n⃗(y) is the outer normal to Ω at y ∈ ∂Ω. That is, x = y − t n⃗(y) ∈ Ω approaches y ∈ ∂Ω in the direction of the normal vector n⃗(y). We assume that this limit exists for all boundary points y ∈ ∂Ω.
(d) The Exterior Neumann Problem: Given φ ∈ C(∂Ω) and f ∈ C(Ω′), find u ∈ C¹(Ω̄′) ∩ C²(Ω′) such that
\[ \Delta u(x) = f(x)\quad \forall x\in\Omega', \qquad \frac{\partial u}{\partial\vec n_+}(y) = \varphi(y)\quad \forall y\in\partial\Omega, \qquad \lim_{|x|\to\infty} u(x) = 0. \]
Here ∂u/∂n⃗₊(y) denotes the limit of the directional derivative
\[ \frac{\partial u}{\partial\vec n_+}(y) = \lim_{t\to 0+0} \vec n(y)\cdot\operatorname{grad} u\bigl(y + t\vec n(y)\bigr), \]
where n⃗(y) is the outer normal to Ω at y ∈ ∂Ω, so that x = y + t n⃗(y) ∈ Ω′. We assume that this limit exists for all boundary points y ∈ ∂Ω.
In both Neumann problems one can also look for a function u ∈ C²(Ω) ∩ C(Ω̄) or u ∈ C²(Ω′) ∩ C(Ω̄′), respectively, provided the above limits exist and define continuous functions on the boundary. These four problems are intimately connected with each other, and we will obtain solutions to all of them simultaneously.

17.4.2 Basic Properties of Harmonic Functions

Recall that a function u ∈ C²(Ω) is said to be harmonic if Δu = 0 in Ω. We say that an operator L on a function space V over ℝⁿ is invariant under an affine transformation T, T(x) = A(x) + b with A ∈ L(ℝⁿ) and b ∈ ℝⁿ, if L∘T* = T*∘L, where T*: V → V is given by (T*f)(x) = f(T(x)), f ∈ V. It follows that the Laplacian is invariant under translations (T(x) = x + b) and rotations T (i.e. T⊤T = TT⊤ = I). Indeed, the second order coefficient matrix A transforms into Ã = BAB⊤, where B is the identity in the case of a translation and B = T⁻¹ in the case of a rotation; since A = I for the Laplacian, in both cases Ã = A = I, so the Laplacian is invariant.

In this section we assume that Ω ⊂ ℝⁿ is a region for which Gauß' divergence theorem is valid for all vector fields f ∈ C¹(Ω) ∩ C(Ω̄):
\[ \int_\Omega \operatorname{div} f(x)\,dx = \int_{\partial\Omega} f(y)\cdot d\vec S(y), \]
where the dot · denotes the inner product in ℝⁿ. The term under the surface integral can be written as
\[ \omega(y) = f(y)\cdot d\vec S = (f_1(y),\dots,f_n(y))\cdot\bigl( dy_2\wedge dy_3\wedge\dots\wedge dy_n,\ -dy_1\wedge dy_3\wedge\dots\wedge dy_n,\ \dots,\ (-1)^{n-1}\,dy_1\wedge\dots\wedge dy_{n-1}\bigr) = \sum_{k=1}^n (-1)^{k-1} f_k(y)\, dy_1\wedge\dots\wedge\widehat{dy_k}\wedge\dots\wedge dy_n, \]
where the hat means omission of this factor. In this way ω(y) becomes a differential (n−1)-form. Using differentiation of forms (see Definition 11.7) we obtain
\[ d\omega = \operatorname{div} f(y)\; dy_1\wedge dy_2\wedge\dots\wedge dy_n. \]
This establishes the above generalized form of Gauß' divergence theorem. For a continuous scalar function U: ∂Ω → ℝ one defines U(y) dS(y) := U(y) n⃗(y)·dS⃗(y), where n⃗ is the outer unit normal vector to the surface ∂Ω.
Recall that we obtain Green's first formula by inserting f(x) = v(x)∇u(x), u, v ∈ C²(Ω̄):
\[ \int_\Omega v(x)\,\Delta u(x)\,dx + \int_\Omega \nabla u(x)\cdot\nabla v(x)\,dx = \int_{\partial\Omega} v(y)\,\frac{\partial u}{\partial\vec n}(y)\,dS(y). \]
Interchanging the roles of u and v and taking the difference, we obtain Green's second formula
\[ \int_\Omega \bigl(v(x)\Delta u(x) - u(x)\Delta v(x)\bigr)\,dx = \int_{\partial\Omega}\Bigl( v(y)\frac{\partial u}{\partial\vec n}(y) - u(y)\frac{\partial v}{\partial\vec n}(y)\Bigr)\,dS(y). \tag{17.28} \]
Recall that
\[ E_2(x) = \frac{1}{2\pi}\log\|x\|,\quad n = 2; \qquad E_n(x) = -\frac{1}{(n-2)\,\omega_n}\,\|x\|^{-n+2},\quad n\ge 3, \]
are the fundamental solutions of the Laplacian in ℝⁿ.

Theorem 17.17 (Green's representation formula) Let u ∈ C²(Ω̄). Then for x ∈ Ω we have
\[ u(x) = \int_\Omega E_n(x-y)\,\Delta u(y)\,dy + \int_{\partial\Omega}\Bigl( u(y)\,\frac{\partial E_n}{\partial\vec n_y}(x-y) - E_n(x-y)\,\frac{\partial u}{\partial\vec n}(y)\Bigr)\,dS(y). \tag{17.29} \]
Here ∂/∂n⃗_y denotes the derivative in the direction of the outer normal with respect to the variable y. Note that the distributions {Δu}, (∂u/∂n⃗) δ_{∂Ω}, and (∂/∂n⃗)(u δ_{∂Ω}) have compact support, so that the convolution products with Eₙ exist.

Proof (idea). For sufficiently small ε > 0, U_ε(x) ⊂ Ω, since Ω is open. We apply Green's second formula with v(y) = Eₙ(x−y) and Ω \ U_ε(x) in place of Ω. Since Eₙ(x−y) is harmonic with respect to the variable y in Ω \ {x} (recall from Example 7.5 that Eₙ(x) is harmonic in ℝⁿ \ {0}), we obtain
\[ \int_{\Omega\setminus U_\varepsilon(x)} E_n(x-y)\,\Delta u(y)\,dy = \int_{\partial\Omega}\Bigl( E_n(x-y)\frac{\partial u}{\partial\vec n}(y) - u(y)\frac{\partial E_n(x-y)}{\partial\vec n_y}\Bigr) dS(y) + \int_{\partial U_\varepsilon(x)}\Bigl( E_n(x-y)\frac{\partial u}{\partial\vec n}(y) - u(y)\frac{\partial E_n(x-y)}{\partial\vec n_y}\Bigr) dS(y). \tag{17.30} \]
In the second integral n⃗ denotes the outer normal of Ω \ U_ε(x), hence the inner normal of U_ε(x). We wish to evaluate the limits of the individual integrals in this formula as ε → 0. Consider the left-hand side of (17.30). Since u ∈ C²(Ω̄), Δu is bounded; since Eₙ(x−y) is locally integrable, the left-hand side converges to
\[ \int_\Omega E_n(x-y)\,\Delta u(y)\,dy. \]
On ∂U_ε(x) we have Eₙ(x−y) = βₙ ε^{−n+2} with βₙ = −1/(ωₙ(n−2)).
Thus, as ε → 0,
\[ \Bigl|\int_{\partial U_\varepsilon(x)} E_n(x-y)\,\frac{\partial u}{\partial\vec n}(y)\,dS\Bigr| \le \frac{|\beta_n|}{\varepsilon^{n-2}}\int_{\partial U_\varepsilon(x)} \Bigl|\frac{\partial u}{\partial\vec n}(y)\Bigr|\,dS \le \frac{1}{(n-2)\,\omega_n\,\varepsilon^{n-2}}\;\omega_n\varepsilon^{n-1}\,\sup_{\overline{U_\varepsilon(x)}}\Bigl|\frac{\partial u}{\partial\vec n}\Bigr| = C\varepsilon \longrightarrow 0. \]
Furthermore, since n⃗ is the interior normal of the ball U_ε(x), the same calculations as in the proof of Theorem 17.1 show that ∂Eₙ(x−y)/∂n⃗_y = −βₙ (d/dε)(ε^{−n+2}) = −ε^{−n+1}/ωₙ. We obtain
\[ -\int_{\partial U_\varepsilon(x)} u(y)\,\frac{\partial E_n(x-y)}{\partial\vec n_y}\,dS(y) = \underbrace{\frac{1}{\omega_n\varepsilon^{n-1}}\int_{S_\varepsilon(x)} u(y)\,dS(y)}_{\text{spherical mean}} \longrightarrow u(x). \]
In the last line we used that the integral is the mean value of u over the sphere S_ε(x) and that u is continuous at x.

Remarks 17.4 (a) Green's representation formula is also true for functions u ∈ C²(Ω) ∩ C¹(Ω̄). To prove this, consider Green's representation theorem on smaller regions Ω_ε ⊂ Ω such that Ω̄_ε ⊂ Ω.
(b) Applying Green's representation formula to a test function φ ∈ D(Ω) (see Definition 16.1) with φ(y) = ∂φ/∂n⃗(y) = 0 for y ∈ ∂Ω, we obtain
\[ \varphi(x) = \int_\Omega E_n(x-y)\,\Delta\varphi(y)\,dy. \]
(c) We may now draw the following consequence from Green's representation formula: if one knows Δu, then u is completely determined by its values and those of its normal derivative on ∂Ω. In particular, a harmonic function on Ω can be reconstructed from its boundary data. One may ask conversely whether one can construct a harmonic function for arbitrarily given values of u and ∂u/∂n⃗ on ∂Ω. Ignoring regularity conditions, we will find out that this is not possible in general. Roughly speaking, only one of these data is sufficient to describe u completely.
(d) In case of a harmonic function u ∈ C²(Ω) ∩ C¹(Ω̄), Δu = 0, Green's representation formula reads (n = 3):
\[ u(x) = \frac{1}{4\pi}\int_{\partial\Omega}\Bigl( \frac{1}{\|x-y\|}\,\frac{\partial u(y)}{\partial\vec n} - u(y)\,\frac{\partial}{\partial\vec n_y}\frac{1}{\|x-y\|}\Bigr)\,dS(y). \tag{17.31} \]
In particular, the surface potentials V⁽⁰⁾(x) and V⁽¹⁾(x) can be differentiated arbitrarily often for x ∈ Ω. Outside ∂Ω, V⁽⁰⁾ and V⁽¹⁾ are harmonic.
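The spherical mean used in the proof above (and, for harmonic functions, the mean value property stated in Proposition 17.18 below) can be checked numerically. The following sketch is my own illustration for n = 2, with the harmonic function u = x² − y² as an assumed example:

```python
import math

# u = x^2 - y^2 is harmonic in the plane; its mean over any circle S_R(x0)
# should equal u(x0).
def u(x, y):
    return x * x - y * y

def circle_mean(x0, y0, R, m=1000):
    # equally spaced points approximate (1/(2 pi R)) * int_{S_R(x0)} u dS
    return sum(u(x0 + R * math.cos(2 * math.pi * j / m),
                 y0 + R * math.sin(2 * math.pi * j / m))
               for j in range(m)) / m

print(abs(circle_mean(0.3, -0.2, 1.5) - u(0.3, -0.2)))   # ~0
```

Since the integrand is a trigonometric polynomial in the angle, the equally spaced quadrature here is exact up to rounding.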
It follows from (17.31) that any harmonic function is a C∞-function.

Spherical Means and Ball Means

First of all, note that u ∈ C¹(Ω̄) and u harmonic in Ω imply
\[ \int_{\partial\Omega} \frac{\partial u(y)}{\partial\vec n}\,dS = 0. \tag{17.32} \]
Indeed, this follows from Green's first formula by inserting v = 1 and u harmonic, Δu = 0.

Proposition 17.18 (Mean Value Property) Suppose that u is harmonic in U_R(x₀) and continuous in U̅_R(x₀).
(a) Then u(x₀) coincides with its spherical mean over the sphere S_R(x₀):
\[ u(x_0) = \frac{1}{\omega_n R^{n-1}}\int_{S_R(x_0)} u(y)\,dS(y) \qquad\text{(spherical mean)}. \tag{17.33} \]
(b) Further,
\[ u(x_0) = \frac{n}{\omega_n R^n}\int_{U_R(x_0)} u(x)\,dx \qquad\text{(ball mean)}. \]
Proof. (a) For simplicity we consider only the case n = 3 and x₀ = 0. Apply Green's representation formula (17.31) to a ball Ω = U_ρ(0) with ρ < R. Noting (17.32), from (17.31) it follows that
\[ u(0) = \frac{1}{4\pi}\Bigl( \frac{1}{\rho}\int_{S_\rho(0)} \frac{\partial u(y)}{\partial\vec n}\,dS - \int_{S_\rho(0)} u(y)\,\frac{\partial}{\partial\vec n_y}\frac{1}{\|y\|}\,dS\Bigr) = -\frac{1}{4\pi}\int_{S_\rho(0)} u(y)\,\frac{\partial}{\partial\vec n_y}\frac{1}{\|y\|}\,dS = \frac{1}{4\pi}\int_{S_\rho(0)} u(y)\,\frac{1}{\rho^2}\,dS = \frac{1}{4\pi\rho^2}\int_{S_\rho(0)} u(y)\,dS. \]
Since u is continuous on the closed ball of radius R, the formula remains valid as ρ → R.
(b) Use dx = dx₁⋯dxₙ = dr dS_r, where ‖x‖ = r. Multiply both sides of (17.33) by r^{n−1} dr and integrate with respect to r from 0 to R:
\[ \int_0^R r^{n-1}\,u(x_0)\,dr = \int_0^R \frac{1}{\omega_n}\int_{S_r(x_0)} u(y)\,dS\,dr \ \Longrightarrow\ \frac{1}{n}\,R^n\,u(x_0) = \frac{1}{\omega_n}\int_{U_R(x_0)} u(x)\,dx. \]
The assertion follows. Note that Rⁿωₙ/n is exactly the n-dimensional volume of U_R(x₀). The proof in case n = 2 is similar.

Proposition 17.19 (Minimum–Maximum Principle) Let u be harmonic in Ω and continuous in Ω̄. Then
\[ \max_{x\in\overline\Omega} u(x) = \max_{x\in\partial\Omega} u(x); \]
i.e. u attains its maximum on the boundary ∂Ω. The same is true for the minimum.

Proof. Suppose to the contrary that M = u(x₀) = max_{x∈Ω̄} u(x) is attained at an inner point x₀ ∈ Ω, and that M > m = max_{x∈∂Ω} u(x) = u(y₀), y₀ ∈ ∂Ω.
(a) We show that u(x) = M is constant in any ball U_ε(x₀) ⊂ Ω around x₀. Suppose to the contrary that u(x₁) < M for some x₁ ∈ U_ε(x₀).
By continuity of u, u(x) < M for all x ∈ U_ε(x₀) ∩ U_η(x₁) for some small η > 0. In particular,
\[ M = u(x_0) = \frac{n}{\omega_n\varepsilon^n}\int_{U_\varepsilon(x_0)} u(x)\,dx < \frac{n}{\omega_n\varepsilon^n}\int_{U_\varepsilon(x_0)} M\,dy = M; \]
this is a contradiction; u is constant in U_ε(x₀).
(b) u(x) = M is constant in Ω. Let x₁ ∈ Ω; we will show that u(x₁) = M. Since Ω is connected and bounded, there exists a path from x₀ to x₁ which can be covered by a chain of finitely many balls in Ω. In all balls, starting with the ball around x₀ from (a), u(x) = M is constant. Hence u is constant in Ω. Since u is continuous, u is constant in Ω̄. This contradicts the assumption; hence the maximum is attained on the boundary ∂Ω. Passing from u to −u, the statement about the minimum follows.

Remarks 17.5 (a) A stronger proposition holds with "local maximum" in place of "maximum".
(b) Another, stricter version of the maximum principle is: Let u ∈ C²(Ω) ∩ C(Ω̄) and Δu ≥ 0 in Ω. Then either u is constant, or u(y) < max_{x∈∂Ω} u(x) for all y ∈ Ω.

Corollary 17.20 (Uniqueness) The inner and the outer Dirichlet problem each have at most one solution.

Proof. Suppose that u₁ and u₂ are both solutions of the Dirichlet problem, Δu₁ = Δu₂ = f. Put u = u₁ − u₂. Then Δu(x) = 0 for all x ∈ Ω and u(y) = 0 on the boundary y ∈ ∂Ω.
(a) Inner problem. By the maximum principle, u(x) = 0 for all x ∈ Ω; that is, u₁ = u₂.
(b) Exterior problem. Suppose that u ≢ 0. Without loss of generality we may assume that u(x₁) = α > 0 for some x₁ ∈ Ω′. By assumption, |u(x)| → 0 as ‖x‖ → ∞. Hence there exists r > 0 such that |u(x)| < α/2 for all ‖x‖ ≥ r.
Choosing r so large that Ω̄ ∪ {x₁} ⊂ U_r(0), u is harmonic in U_r(0) \ Ω̄, and the maximum principle yields
\[ \alpha = u(x_1) \le \max_{x\in S_r(0)\cup\partial\Omega} u(x) \le \alpha/2, \]
a contradiction.

Corollary 17.21 (Stability) Suppose that u₁ and u₂ are solutions of the inner Dirichlet problem Δu₁ = Δu₂ = f with boundary values φ₁(y) and φ₂(y) on ∂Ω, respectively. Suppose further that
\[ |\varphi_1(y) - \varphi_2(y)| \le \varepsilon \quad \forall y\in\partial\Omega. \]
Then |u₁(x) − u₂(x)| ≤ ε for all x ∈ Ω. A similar statement is true for the exterior Dirichlet problem.

Proof. Put u = u₁ − u₂. Then Δu = 0 and |u(y)| ≤ ε for all y ∈ ∂Ω. By the maximum principle, |u(x)| ≤ ε for all x ∈ Ω.

Lemma 17.22 Suppose that u is a non-constant harmonic function on Ω and the maximum of u(x) is attained at y ∈ ∂Ω. Then ∂u/∂n⃗(y) > 0.
For the proof see [Tri92, 3.4.2 Theorem, p. 174].

Proposition 17.23 (Uniqueness) (a) The exterior Neumann problem has at most one solution.
(b) A necessary condition for solvability of the inner Neumann problem is
\[ \int_{\partial\Omega}\varphi\,dS = \int_\Omega f(x)\,dx. \]
Two solutions of the inner Neumann problem differ by a constant.

Proof. (a) Suppose that u₁ and u₂ are solutions of the exterior Neumann problem; then u = u₁ − u₂ satisfies Δu = 0 and ∂u/∂n⃗(y) = 0. The above lemma shows that u(x) = c is constant in Ω′. Since lim_{|x|→∞} u(x) = 0, the constant c is 0; hence u₁ = u₂.
(b) Inner problem. The uniqueness follows as in (a). The necessity of the formula follows from (17.28) with v ≡ 1, Δu = f, ∂v/∂n⃗ = 0.

Proposition 17.24 (Converse Mean Value Theorem) Suppose that u ∈ C(Ω) and that whenever x₀ ∈ Ω is such that U̅_r(x₀) ⊂ Ω, we have the mean value property
\[ u(x_0) = \frac{1}{\omega_n r^{n-1}}\int_{S_r(x_0)} u(y)\,dS(y) = \frac{1}{\omega_n}\int_{S_1(0)} u(x_0 + ry)\,dS(y). \]
Then u ∈ C∞(Ω) and u is harmonic in Ω.

Proof. (a) We show that u ∈ C∞(Ω). The mean value property ensures that the mollification h_ε ∗ u equals u as long as U_ε(x₀) ⊂ Ω; that is, the mollification does not change u. By homework 49.4, u ∈ C∞ since h_ε is. We prove h_ε ∗ u = u. Here g denotes the 1-dimensional bump function, see page 419:
\[ u_\varepsilon(x) = (u*h_\varepsilon)(x) = \int_{\mathbb{R}^n} u(y)\,h_\varepsilon(x-y)\,dy = \int_{\mathbb{R}^n} u(x-y)\,h_\varepsilon(y)\,dy = \int_{U_\varepsilon(0)} u(x-y)\,h(y/\varepsilon)\,\varepsilon^{-n}\,dy \overset{y_i=\varepsilon z_i}{=} \int_{U_1(0)} u(x-\varepsilon z)\,h(z)\,dz = \int_0^1\!\int_{S_1(0)} u(x-r\varepsilon y)\,g(r)\,r^{n-1}\,dS(y)\,dr = \omega_n\,u(x)\int_0^1 g(r)\,r^{n-1}\,dr = u(x)\int_{\mathbb{R}^n} h(y)\,dy = u(x). \]
(b) Second part. Differentiating the mean value identity with respect to r yields ∫_{U_r(x)} Δu(y) dy = 0 for any ball in Ω, since the left-hand side u(x) does not depend on r:
\[ 0 = \frac{d}{dr}\int_{S_1(0)} u(x+ry)\,dS(y) = \int_{S_1(0)} y\cdot\nabla u(x+ry)\,dS(y) = \int_{S_r(0)} (r^{-1}z)\cdot\nabla u(x+z)\,r^{1-n}\,dS(z) = r^{-n}\int_{S_r(0)} \vec n(z)\cdot\nabla u(x+z)\,dS(z) = r^{-n}\int_{S_r(x)} \frac{\partial u(y)}{\partial\vec n}\,dS(y) = r^{-n}\int_{U_r(x)} \Delta u(y)\,dy. \]
In the last step we used Green's second formula with v = 1. Thus Δu = 0: suppose to the contrary that Δu(x₀) ≠ 0, say Δu(x₀) > 0. By continuity of Δu, Δu(x) > 0 for x ∈ U_ε(x₀). Hence ∫_{U_ε(x₀)} Δu(x) dx > 0, which contradicts the above equation. We conclude that u is harmonic in Ω.

Remark 17.6 A regular distribution u ∈ D′(Ω) is called harmonic if Δu = 0, that is,
\[ \langle\Delta u, \varphi\rangle = \int_\Omega u(x)\,\Delta\varphi(x)\,dx = 0, \qquad \varphi\in D(\Omega). \]
Weyl's Lemma: any harmonic regular distribution is a harmonic function; in particular, u ∈ C∞(Ω).

Example 17.3 Solve
\[ \Delta u = -2, \quad (x,y)\in\Omega = (0,a)\times(-b/2,\, b/2), \qquad u|_{\partial\Omega} = 0. \]
Ansatz: u = w + v, where w is a particular solution of Δw = −2; for example, choose w = −x² + ax. Then Δv = 0 with boundary conditions v(0,y) = v(a,y) = 0, v(x,−b/2) = v(x,b/2) = x² − ax. Use separation of variables, v(x,y) = X(x)Y(y), to solve the problem.

17.5 Appendix

17.5.1 Existence of Solutions to the Boundary Value Problems

(a) Green's Function

Let u ∈ C²(Ω) ∩ C¹(Ω̄). Let us combine Green's representation formula and Green's second formula with a harmonic function v(x) = v_y(x), x ∈ Ω, where y ∈ Ω is thought of as a parameter:
\[ u(y) = \int_\Omega E_n(x-y)\,\Delta u(x)\,dx + \int_{\partial\Omega}\Bigl( u(x)\,\frac{\partial E_n}{\partial\vec n_x}(x-y) - E_n(x-y)\,\frac{\partial u}{\partial\vec n}(x)\Bigr)\,dS(x), \]
\[ 0 = \int_\Omega v_y(x)\,\Delta u(x)\,dx + \int_{\partial\Omega}\Bigl( u(x)\,\frac{\partial v_y}{\partial\vec n}(x) - v_y(x)\,\frac{\partial u}{\partial\vec n}(x)\Bigr)\,dS(x). \]
Adding up these two lines and denoting G(x,y) = Eₙ(x−y) + v_y(x), we get
\[ u(y) = \int_\Omega G(x,y)\,\Delta u(x)\,dx + \int_{\partial\Omega}\Bigl( u(x)\,\frac{\partial G(x,y)}{\partial\vec n_x} - G(x,y)\,\frac{\partial u}{\partial\vec n}(x)\Bigr)\,dS(x). \]
Suppose now that G(x,y) vanishes for all x ∈ ∂Ω; then the last surface integral is 0 and
\[ u(y) = \int_\Omega G(x,y)\,\Delta u(x)\,dx + \int_{\partial\Omega} u(x)\,\frac{\partial G(x,y)}{\partial\vec n_x}\,dS(x). \tag{17.34} \]
In the above formula u is completely determined by its boundary values and Δu in Ω. This motivates the following definition.

Definition 17.4 A function G: Ω̄ × Ω → ℝ satisfying
(a) G(x,y) = 0 for all x ∈ ∂Ω, y ∈ Ω, x ≠ y,
(b) v_y(x) = G(x,y) − Eₙ(x−y) is harmonic in x ∈ Ω for all y ∈ Ω,
is called a Green's function of Ω. More precisely, G(x,y) is a Green's function for the inner Dirichlet problem on Ω.

Remarks 17.7 (a) The function v_y(x) is in particular harmonic at x = y. Since Eₙ(x−y) has a pole at x = y, G(x,y) has a pole of the same order at x = y, so that G(x,y) − Eₙ(x−y) has no singularity. If such a function G(x,y) exists, then for all u ∈ C²(Ω̄) we have (17.34).
In particular, if in addition u is harmonic in Ω,
\[ u(y) = \int_{\partial\Omega} u(x)\,\frac{\partial G(x,y)}{\partial\vec n_x}\,dS(x). \tag{17.35} \]
This is the so-called Poisson formula for Ω.
(b) In general it is difficult to find Green's function. For most regions Ω it is even impossible to give G(x,y) explicitly. However, if Ω has some kind of symmetry, one can use the reflection principle to construct G(x,y) explicitly. Nevertheless, G(x,y) exists for all "well-behaved" Ω (the boundary is a C²-set and Gauß' divergence theorem holds for Ω).

(c) The Reflection Principle

This is a method to calculate Green's function explicitly for domains Ω with the following property: using repeated reflections on spheres and on hyperplanes occurring as boundaries of Ω and of its reflections, the whole ℝⁿ can be filled up without overlapping.

Example 17.4 Green's function on a ball U_R(0). Here we use the reflection on the sphere S_R(0). For y ∈ ℝⁿ put
\[ \bar y := \begin{cases} \dfrac{R^2}{\|y\|^2}\,y, & y\ne 0,\\[1ex] \infty, & y = 0. \end{cases} \]
Note that this map has the properties ȳ·y = R² and ‖y‖²ȳ = R²y, and points on the sphere S_R(0) are fixed under it, ȳ = y. Let Eₙ: ℝ₊ → ℝ denote the radial scalar function corresponding to Eₙ, with Eₙ(x) = Eₙ(‖x‖), that is, E₂(r) = (1/2π) log r and Eₙ(r) = −1/((n−2)ωₙ r^{n−2}) for n ≥ 3. Then we put
\[ G(x,y) = \begin{cases} E_n(\|x-y\|) - E_n\Bigl(\dfrac{\|y\|}{R}\,\|x-\bar y\|\Bigr), & y\ne 0,\\[1ex] E_n(\|x\|) - E_n(R), & y = 0. \end{cases} \tag{17.36} \]
For x ≠ y, G(x,y) is harmonic in x, since ‖y‖ < R implies ‖ȳ‖ > R and therefore x − ȳ ≠ 0. The function G(x,y) has only one singularity in U_R(0), namely at x = y, and this is the same as that of Eₙ(x−y). Therefore
\[ v_y(x) = G(x,y) - E_n(x-y) = \begin{cases} -E_n\Bigl(\dfrac{\|y\|}{R}\,\|x-\bar y\|\Bigr), & y\ne 0,\\[1ex] -E_n(R), & y = 0, \end{cases} \]
is harmonic for all x ∈ Ω. For x ∈ ∂Ω = S_R(0) and y ≠ 0 we have
\[ G(x,y) = E_n\bigl((\|x\|^2 + \|y\|^2 - 2x\cdot y)^{\frac12}\bigr) - E_n\Bigl(\frac{\|y\|}{R}\bigl(\|x\|^2 + \|\bar y\|^2 - 2x\cdot\bar y\bigr)^{\frac12}\Bigr) = E_n\bigl((R^2 + \|y\|^2 - 2x\cdot y)^{\frac12}\bigr) - E_n\bigl((\|y\|^2 + R^2 - 2x\cdot y)^{\frac12}\bigr) = 0, \]
where we used ‖x‖ = R, ‖ȳ‖² = R⁴/‖y‖², and x·ȳ = (R²/‖y‖²) x·y, so that (‖y‖²/R²)(R² + ‖ȳ‖² − 2x·ȳ) = ‖y‖² + R² − 2x·y.
For y = 0 we have, with ‖x‖ = R,

    G(x,0) = E_n(\|x\|) - E_n(R) = E_n(R) - E_n(R) = 0.

This proves that G(x, y) is a Green's function for U_R(0). In particular, the above calculation shows that r = ‖x − y‖ and r̄ = (‖y‖/R)‖x − ȳ‖ are equal if x ∈ ∂Ω.

One can show that the Green's function is symmetric, G(x, y) = G(y, x); this is a general property of Green's functions.

To apply formula (17.34) we have to compute the normal derivative ∂G(x,y)/∂n_x. Note first that for any constant z ∈ Rⁿ and x ∈ S_R(0),

    \frac{\partial}{\partial\vec n_x} f(\|x-z\|) = \vec n\cdot\nabla f(\|x-z\|) = \frac{x}{\|x\|}\cdot\frac{x-z}{\|x-z\|}\, f'(\|x-z\|).

Note further that for ‖x‖ = R we have

    r = \|x-y\| = \frac{\|y\|}{R}\,\|x-\bar y\|,    (17.37)

    (x-y)\cdot x - \frac{\|y\|^2}{R^2}\,(x-\bar y)\cdot x = R^2 - \|y\|^2.    (17.38)

Hence, for y ≠ 0,

    \frac{\partial G}{\partial\vec n_x}(x,y) = -\frac{1}{(n-2)\omega_n}\,\frac{\partial}{\partial\vec n_x}\Big(\|x-y\|^{-n+2} - \Big(\frac{\|y\|}{R}\Big)^{-n+2}\|x-\bar y\|^{-n+2}\Big)

    = \frac{1}{\omega_n}\Big(\|x-y\|^{-n+1}\,\frac{x-y}{\|x-y\|}\cdot\frac{x}{\|x\|} - \Big(\frac{\|y\|}{R}\Big)^{-n+2}\|x-\bar y\|^{-n+1}\,\frac{x-\bar y}{\|x-\bar y\|}\cdot\frac{x}{\|x\|}\Big)

    = \frac{1}{\omega_n r^n R}\Big((x-y)\cdot x - \frac{\|y\|^2}{R^2}\,(x-\bar y)\cdot x\Big),

where (17.37) was used in the last step. By (17.38), the expression in brackets is R² − ‖y‖². Hence,

    \frac{\partial G}{\partial\vec n_x}(x,y) = \frac{1}{\omega_n R}\,\frac{R^2-\|y\|^2}{\|x-y\|^n}.

This formula also holds in the case y = 0. Inserting it into (17.34), for any harmonic function u ∈ C²(U_R(0)) ∩ C(Ū_R(0)) we have

    u(y) = \frac{R^2-\|y\|^2}{\omega_n R}\int_{S_R(0)} \frac{u(x)}{\|x-y\|^n}\,dS(x).    (17.39)

This is the so-called Poisson formula for the ball U_R(0).

Proposition 17.25 Let n ≥ 2. Consider the inner Dirichlet problem in Ω = U_R(0) with f = 0. The function

    u(y) = \begin{cases} \dfrac{R^2-\|y\|^2}{\omega_n R}\displaystyle\int_{S_R(0)}\frac{\varphi(x)}{\|x-y\|^n}\,dS(x), & \|y\| < R,\\ \varphi(y), & \|y\| = R, \end{cases}

is continuous on the closed ball Ū_R(0) and harmonic in U_R(0). In the case n = 2 the function u(y) can be written in the form

    u(y) = \mathrm{Re}\,\frac{1}{2\pi i}\int_{S_R(0)} \varphi(z)\,\frac{z+y}{z-y}\,\frac{dz}{z}, \quad y \in U_R(0) \subset \mathbb C.

For the proof of the general statement with n ≥ 2, see [Jos02, Theorem 1.1.2] or [Joh82, p. 107]. We show the last statement for n = 2. Since ȳz − yz̄ is purely imaginary,

    \mathrm{Re}\,\frac{z+y}{z-y} = \mathrm{Re}\,\frac{(z+y)(\bar z-\bar y)}{(z-y)(\bar z-\bar y)} = \mathrm{Re}\,\frac{|z|^2-|y|^2+\bar y z-y\bar z}{|z-y|^2} = \frac{R^2-|y|^2}{|z-y|^2}.

Using the parametrization z = Re^{it}, dt = dz/(iz), we obtain

    \mathrm{Re}\,\frac{1}{2\pi}\int_{S_R(0)} \varphi(z)\,\frac{z+y}{z-y}\,\frac{dz}{iz} = \frac{1}{2\pi}\int_0^{2\pi} \frac{R^2-|y|^2}{|z-y|^2}\,\varphi(z)\,dt = \frac{R^2-|y|^2}{2\pi R}\int_{S_R(0)} \frac{\varphi(x)}{|x-y|^2}\,|dx|.

In the last line we have a (real) line integral of the first kind, using x = (x₁, x₂) = x₁ + ix₂ = z, x ∈ S_R(0), and |dx| = R dt on the circle.

Other Examples. (a) n = 3, the half-space Ω = {(x₁, x₂, x₃) ∈ R³ | x₃ > 0}. We use the ordinary reflection with respect to the plane x₃ = 0, given by y = (y₁, y₂, y₃) ↦ y′ = (y₁, y₂, −y₃). Then the Green's function of Ω is

    G(x,y) = E_3(x-y) - E_3(x-y') = \frac{1}{4\pi}\Big(\frac{1}{\|x-y'\|} - \frac{1}{\|x-y\|}\Big)

(see homework 57.1).

(b) n = 3, the half ball Ω = {(x₁, x₂, x₃) ∈ R³ | ‖x‖ < R, x₃ > 0}. We use the reflections y → y′ and y → ȳ (reflection with respect to the sphere S_R(0)). Then

    G(x,y) = E_3(x-y) - \frac{R}{\|y\|}\,E_3(x-\bar y) - E_3(x-y') + \frac{R}{\|y\|}\,E_3(x-\bar y\,')

is the Green's function of Ω.

(c) n = 3, Ω = {(x₁, x₂, x₃) ∈ R³ | x₂ > 0, x₃ > 0}. We introduce the reflection y = (y₁, y₂, y₃) ↦ y* = (y₁, −y₂, y₃). Then the Green's function of Ω is

    G(x,y) = E_3(x-y) - E_3(x-y') - E_3(x-y^*) + E_3(x-(y^*)').

Consider now the Neumann problem and, as in the case of the Dirichlet problem, the ansatz

    u(y) = \int_\Omega H(x,y)\,\Delta u(x)\,dx + \int_{\partial\Omega}\Big( u(x)\,\frac{\partial H(x,y)}{\partial\vec n_x} - H(x,y)\,\frac{\partial u}{\partial\vec n}(x)\Big)\,dS(x).    (17.40)

We want to choose a Green's function of the second kind H(x, y) in such a way that essentially only the last surface integral remains present. Note that ∂H(x,y)/∂n_x cannot vanish identically on ∂Ω: inserting u = 1 into the above formula gives

    1 = \int_{\partial\Omega} \frac{\partial H(x,y)}{\partial\vec n_x}\,dS(x).

Imposing ∂H(x,y)/∂n_x = α = const., this constant must be α = 1/vol(∂Ω). Accordingly, one defines a Green's function for the Neumann problem by replacing the condition G(x, y) = 0 on ∂Ω by ∂H(x,y)/∂n_x = const. The Green's function of the second kind (Neumann problem) for the ball U_R(0) of radius R in R³ is:
    H(x,y) = -\frac{1}{4\pi}\left(\frac{1}{\|x-y\|} + \frac{R}{\|y\|\,\|x-\bar y\|} + \frac{1}{R}\,\log\frac{2R^2}{R^2 - x\cdot y + \|y\|\,\|x-\bar y\|}\right).

(c) Existence Theorems

The aim of these considerations is to sketch the method of proving existence of solutions of the four BVPs.

Definition. Suppose that Ω ⊂ Rⁿ, n ≥ 3, and let μ(y) be a continuous function on the boundary ∂Ω, that is, μ ∈ C(∂Ω). We call

    w(x) = \int_{\partial\Omega} \mu(y)\,E_n(x-y)\,dS(y), \quad x \in \mathbb R^n,    (17.41)

a single-layer potential and

    v(x) = \int_{\partial\Omega} \mu(y)\,\frac{\partial E_n}{\partial\vec n_y}(x-y)\,dS(y), \quad x \in \mathbb R^n,    (17.42)

a double-layer potential.

Remarks 17.8 (a) For x ∉ ∂Ω the integrals (17.41) and (17.42) exist. (b) The single-layer potential w(x) is continuous on Rⁿ. The double-layer potential jumps at y₀ ∈ ∂Ω by μ(y₀) as x approaches y₀, see (17.43) below.

Theorem 17.26 Let Ω be a connected bounded region in Rⁿ of class C² and let Ω′ = Rⁿ \ Ω̄ also be connected. Then the interior Dirichlet problem for the Laplace equation has a unique solution; it can be represented in the form of a double-layer potential. The exterior Neumann problem likewise has a unique solution, which can be represented in the form of a single-layer potential.

Theorem 17.27 Under the same assumptions as in the previous theorem, the inner Neumann problem for the Laplace equation has a solution if and only if ∫_{∂Ω} ϕ(y) dS(y) = 0. If this condition is satisfied, the solution is unique up to a constant. The exterior Dirichlet problem has a unique solution.

Remark. Let μ_ID denote the continuous function which produces the solution v(x) of the interior Dirichlet problem, i.e. v(x) = ∫_{∂Ω} μ_ID(y) K(x, y) dS(y), where K(x, y) = ∂E_n/∂n_y (x − y). Because of the jump relation for v(x) at x₀ ∈ ∂Ω,

    \lim_{x\to x_0,\,x\in\Omega} v(x) - \frac12\,\mu(x_0) = v(x_0) = \lim_{x\to x_0,\,x\in\Omega'} v(x) + \frac12\,\mu(x_0),    (17.43)

μ_ID satisfies the integral equation

    \varphi(x) = \frac12\,\mu_{ID}(x) + \int_{\partial\Omega} \mu_{ID}(y)\,K(x,y)\,dS(y), \quad x \in \partial\Omega.

This equation can be written as ϕ = (A + ½I)μ_ID, where A is the above integral operator in L²(∂Ω).
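The integral-equation approach can be tried out numerically. On the unit circle the double-layer kernel K(x, y) = (1/2π)(y − x)·n_y/‖y − x‖² reduces to the constant 1/(4π), so the equation ϕ = (A + ½I)μ can be solved directly by integrating it once over the circle. A sketch under these assumptions; the boundary data ϕ = Re z² and the node count N are choices made for this test:

```python
import math

N = 400
h = 2 * math.pi / N
thetas = [j * h for j in range(N)]
ys = [(math.cos(t), math.sin(t)) for t in thetas]  # nodes on the unit circle
phi = [math.cos(2 * t) for t in thetas]            # boundary data: Re z^2

# Since K ≡ 1/(4π) on the circle, (Aμ)(x) = (1/4π) ∫ μ dS; integrating
# ϕ = ½μ + Aμ over the circle gives ∫μ dS = ∫ϕ dS, hence μ directly:
S = sum(p * h for p in phi)                        # = ∫ μ dS = ∫ ϕ dS
mu = [2 * (p - S / (4 * math.pi)) for p in phi]

def v(x):
    """Double-layer potential at an interior point x (trapezoidal rule)."""
    total = 0.0
    for (y1, y2), m in zip(ys, mu):
        d1, d2 = y1 - x[0], y2 - x[1]
        r2 = d1 * d1 + d2 * d2
        K = (d1 * y1 + d2 * y2) / (2 * math.pi * r2)  # n_y = y on the unit circle
        total += m * K * h
    return total

print(v((0.3, 0.4)))  # harmonic extension of Re z^2 is x² − y², so ≈ −0.07
```

For a general boundary the kernel is no longer constant and one solves the discretized system (½I + A_h)μ = ϕ by linear algebra; the structure of the computation is the same.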
One can prove the following facts: A is compact, A + ½I is injective and surjective, and ϕ continuous implies μ_ID continuous. For details, see [Tri92, 3.4].

Application to the Poisson Equation

Consider the inner Dirichlet problem ∆u = f and u = ϕ on ∂Ω. We suppose that f ∈ C(Ω̄) ∩ C¹(Ω). We already know that

    w(x) = (E_n * f)(x) = -\frac{1}{(n-2)\omega_n}\int_{\mathbb R^n} \frac{f(y)}{\|x-y\|^{n-2}}\,dy

is a distributive solution of the Poisson equation, ∆w = f. By the assumptions on f, w ∈ C²(Ω), and therefore w is a classical solution. To solve the problem we try the ansatz u = w + v. Then ∆u = ∆w + ∆v = f + ∆v. Hence ∆u = f if and only if ∆v = 0. Thus the inner Dirichlet problem for the Poisson equation reduces to the inner Dirichlet problem for the Laplace equation, ∆v = 0, with boundary values

    v(y) = u(y) - w(y) = \varphi(y) - w(y) =: \tilde\varphi(y), \quad y \in \partial\Omega.

Since ϕ and w are continuous on ∂Ω, so is ϕ̃.

17.5.2 Extremal Properties of Harmonic Functions and the Dirichlet Principle

(a) The Dirichlet Principle

Consider the inner Dirichlet problem for the Poisson equation with given data f ∈ C(Ω̄) and ϕ ∈ C(∂Ω). Put

    C^1_\varphi(\overline\Omega) := \{v \in C^1(\overline\Omega) \mid v = \varphi \text{ on } \partial\Omega\}.

On this space define the Dirichlet integral by

    E(v) = \frac12\int_\Omega \|\nabla v\|^2\,dx + \int_\Omega f\,v\,dx, \quad v \in C^1_\varphi(\overline\Omega).    (17.44)

This integral is also called the energy integral. The Dirichlet principle says that among all functions v with the given boundary values ϕ, a function u with ∆u = f minimizes the energy integral E.

Proposition 17.28 A function u ∈ C¹_ϕ(Ω̄) ∩ C²(Ω) is a solution of the inner Dirichlet problem if and only if the energy integral E attains its minimum on C¹_ϕ(Ω̄) at u.

Proof. (a) Suppose first that u ∈ C¹_ϕ(Ω̄) ∩ C²(Ω) is a solution of the inner Dirichlet problem, ∆u = f. For v ∈ C¹_ϕ(Ω̄) let w = v − u. Then

    E(v) = E(u+w) = \frac12\int_\Omega (\nabla u+\nabla w)\cdot(\nabla u+\nabla w)\,dx + \int_\Omega (u+w)f\,dx
         = \frac12\int_\Omega \|\nabla u\|^2\,dx + \frac12\int_\Omega \|\nabla w\|^2\,dx + \int_\Omega \nabla u\cdot\nabla w\,dx + \int_\Omega (u+w)f\,dx.

Since u and v satisfy the same boundary conditions, w|_∂Ω = 0. Further, ∆u = f.
By Green's 1st formula and w|_∂Ω = 0,

    \int_\Omega \nabla u\cdot\nabla w\,dx = \int_{\partial\Omega} \frac{\partial u}{\partial\vec n}\,w\,dS - \int_\Omega (\Delta u)\,w\,dx = -\int_\Omega f\,w\,dx.

Inserting this into the above equation, we have

    E(v) = \frac12\int_\Omega \|\nabla u\|^2\,dx + \frac12\int_\Omega \|\nabla w\|^2\,dx - \int_\Omega f\,w\,dx + \int_\Omega (u+w)f\,dx
         = E(u) + \frac12\int_\Omega \|\nabla w\|^2\,dx \ge E(u).

This shows that E(u) is minimal.

(b) Conversely, let u ∈ C¹_ϕ(Ω̄) ∩ C²(Ω) minimize the energy integral. In particular, for any test function ψ ∈ D(Ω) — note that ψ has zero boundary values — the function

    g(t) = E(u+t\psi) = E(u) + t\int_\Omega (\nabla u\cdot\nabla\psi + f\psi)\,dx + \frac12\,t^2\int_\Omega \|\nabla\psi\|^2\,dx

has a local minimum at t = 0. Hence g′(0) = 0, which, again by Green's 1st formula and ψ|_∂Ω = 0, is equivalent to

    0 = \int_\Omega (\nabla u\cdot\nabla\psi + f\psi)\,dx = \int_\Omega (-\Delta u + f)\,\psi\,dx.

By the fundamental lemma of the calculus of variations, ∆u = f almost everywhere on Ω. Since both ∆u and f are continuous, this equation holds pointwise for all x ∈ Ω.

(b) Hilbert Space Methods

We want to give another reformulation of the Dirichlet problem. Consider the problem

    \Delta u = -f, \quad u|_{\partial\Omega} = 0.

On C¹(Ω̄) define a bilinear map

    \langle u, v\rangle_E = \int_\Omega \nabla u\cdot\nabla v\,dx.

C¹(Ω̄) is not yet an inner product space, since ⟨u, u⟩_E = 0 for any non-vanishing constant function u. Denote by C¹₀(Ω̄) the subspace of functions in C¹(Ω̄) vanishing on the boundary ∂Ω. Now ⟨u, v⟩_E is an inner product on C¹₀(Ω̄); the positive definiteness is a consequence of the Poincaré inequality below. Its corresponding norm is ‖u‖²_E = ∫_Ω ‖∇u‖² dx.

Let u be a solution of the above Dirichlet problem. Then for any v ∈ C¹₀(Ω̄), by Green's 1st formula,

    \langle v, u\rangle_E = \int_\Omega \nabla v\cdot\nabla u\,dx = -\int_\Omega v\,\Delta u\,dx = \int_\Omega v\,f\,dx = \langle v, f\rangle_{L^2}.

This suggests that u can be found by representing the known linear functional

    F(v) = \int_\Omega v\,f\,dx

as an inner product ⟨v, u⟩_E. To make use of Riesz's representation theorem, Theorem 13.8, we have to complete C¹₀(Ω̄) into a Hilbert space W with respect to the energy norm ‖·‖_E and to prove that the above linear functional F is bounded with respect to the energy norm.
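The weak formulation ⟨v, u⟩_E = ⟨v, f⟩_{L²} can be made concrete in a minimal one-dimensional setting. Assuming Ω = (0, 1) and f ≡ 1 (choices made for this sketch), and testing against the orthogonal system e_n(x) = sin(nπx), the weak equation determines the sine coefficients of u explicitly: ⟨e_n, u⟩_E = (nπ)² b_n/2 and ⟨e_n, f⟩_{L²} = (1 − (−1)ⁿ)/(nπ), so b_n = 2(1 − (−1)ⁿ)/(nπ)³.

```python
import math

# Weak solution of Δu = −f, i.e. −u'' = f, on (0,1) with f ≡ 1 and
# zero boundary values, represented in the basis e_n(x) = sin(nπx).
def u(x, N=2001):
    s = 0.0
    for n in range(1, N + 1):
        b = 2 * (1 - (-1) ** n) / (n * math.pi) ** 3  # b_n from the weak equation
        s += b * math.sin(n * math.pi * x)
    return s

# The classical solution of −u'' = 1, u(0) = u(1) = 0 is u(x) = x(1−x)/2.
print(u(0.5))  # ≈ 0.125 = 0.5·(1−0.5)/2
```

This is exactly the pattern that Riesz's theorem formalizes: the functional F(v) = ∫ v f dx is represented against an orthogonal family in the energy inner product.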
The boundedness is a consequence of the next lemma. We make the same assumptions on Ω as at the beginning of Section 17.4.

Lemma 17.29 (Poincaré inequality) Let Ω ⊂ Rⁿ be bounded. Then there exists C > 0 such that for all u ∈ C¹₀(Ω̄),

    \|u\|_{L^2(\Omega)} \le C\,\|u\|_E.

Proof. Let Ω be contained in the cube Γ = {x ∈ Rⁿ | |x_i| ≤ a, i = 1, …, n}. We extend u by zero outside Ω. For any x = (x₁, …, x_n), by the Fundamental Theorem of Calculus and the Cauchy–Schwarz inequality,

    u(x)^2 = \Big(\int_{-a}^{x_1} 1\cdot u_{x_1}(y_1,x_2,\dots,x_n)\,dy_1\Big)^2
           \le \int_{-a}^{x_1} dy_1 \int_{-a}^{x_1} u_{x_1}^2\,dy_1 = (x_1+a)\int_{-a}^{x_1} u_{x_1}^2\,dy_1 \le 2a\int_{-a}^{a} u_{x_1}^2\,dy_1.

Since the last integral does not depend on x₁, integration with respect to x₁ gives

    \int_{-a}^{a} u(x)^2\,dx_1 \le 4a^2\int_{-a}^{a} u_{x_1}^2\,dy_1.

Integrating over x₂, …, x_n from −a to a we find

    \int_\Gamma u^2\,dx \le 4a^2\int_\Gamma u_{x_1}^2\,dx.

The same inequality holds with x_i, i = 2, …, n, in place of x₁, so that

    \|u\|_{L^2}^2 = \int_\Gamma u^2\,dx \le \frac{4a^2}{n}\int_\Gamma \|\nabla u\|^2\,dx = C^2\,\|u\|_E^2,

where C = 2a/√n.

The Poincaré inequality is sometimes called the Poincaré–Friedrichs inequality. It remains true for functions u in the completion W.

Let us discuss the elements of W in more detail. By definition, f ∈ W if there is a Cauchy sequence (f_n) in C¹₀(Ω̄) which "converges to f" in the energy norm. By the Poincaré inequality, (f_n) is also an L²-Cauchy sequence. Since L²(Ω) is complete, (f_n) has an L²-limit f. This shows W ⊆ L²(Ω). For simplicity, let Ω ⊂ R¹. By the definition of the energy norm, ∫_Ω |(f_n − f_m)′|² dx → 0 as m, n → ∞; that is, (f_n′) is an L²-Cauchy sequence, too. Hence (f_n′) also has an L²-limit, say g ∈ L²(Ω). So far,

    \|f_n - f\|_{L^2} \longrightarrow 0, \qquad \|f_n' - g\|_{L^2} \longrightarrow 0.    (17.45)

We will show that these limits imply f′ = g in D′(Ω). Indeed, by (17.45) and the Cauchy–Schwarz inequality, for all ϕ ∈ D(Ω),

    \Big|\int_\Omega (f_n-f)\,\varphi'\,dx\Big| \le \Big(\int_\Omega |f_n-f|^2\,dx\Big)^{1/2}\Big(\int_\Omega |\varphi'|^2\,dx\Big)^{1/2} \longrightarrow 0,

    \Big|\int_\Omega (f_n'-g)\,\varphi\,dx\Big| \le \Big(\int_\Omega |f_n'-g|^2\,dx\Big)^{1/2}\Big(\int_\Omega |\varphi|^2\,dx\Big)^{1/2} \longrightarrow 0.

Hence,

    \int_\Omega f\,\varphi'\,dx = \lim_{n\to\infty}\int_\Omega f_n\,\varphi'\,dx = -\lim_{n\to\infty}\int_\Omega f_n'\,\varphi\,dx = -\int_\Omega g\,\varphi\,dx.

This shows f′ = g in D′(Ω). One says that the elements of W possess weak derivatives, that is, their distributive derivative is an L²-function (and hence a regular distribution). Also, the inner product ⟨·,·⟩_E is positive definite, since the L²-inner product is. It turns out that W is a separable Hilbert space. W is the so-called Sobolev space W₀^{1,2}(Ω), sometimes also denoted by H¹₀(Ω). The upper indices 1 and 2 in W₀^{1,2}(Ω) refer to the highest order of partial derivatives (|α| = 1) and to the Lᵖ-space (p = 2) in the definition of W, respectively. The lower index 0 refers to the so-called generalized boundary values 0. For further reading on Sobolev spaces, see [Fol95, Chapter 6].

Corollary 17.30 F(v) = ⟨v, f⟩_{L²} = ∫_Ω f v dx defines a bounded linear functional on W.

Proof. By the Cauchy–Schwarz and Poincaré inequalities,

    |F(v)| \le \int_\Omega |f\,v|\,dx \le \|f\|_{L^2}\,\|v\|_{L^2} \le C\,\|f\|_{L^2}\,\|v\|_E.

Hence F is bounded with ‖F‖ ≤ C‖f‖_{L²}.

Corollary 17.31 Let f ∈ C(Ω̄). Then there exists a unique u ∈ W such that

    \langle v, u\rangle_E = \langle v, f\rangle_{L^2} \quad \forall\, v \in W.

This u solves ∆u = −f in D′(Ω).

The first statement is a consequence of Riesz's representation theorem; note that F is a bounded linear functional on the Hilbert space W. The last statement follows from D(Ω) ⊂ C¹₀(Ω̄) and ⟨u, ϕ⟩_E = −⟨∆u, ϕ⟩ = ⟨f, ϕ⟩_{L²}. This is the so-called modified Dirichlet problem. It remains to identify the solution u ∈ W with an ordinary function u ∈ C²(Ω).

17.5.3 Numerical Methods

(a) Difference Methods

Since most ODE and PDE cannot be solved in closed form, there are many methods for finding approximate solutions to a given equation or problem. A general principle is discretization. One replaces the derivative u′(x) by one of its difference quotients

    \partial^+ u(x) = \frac{u(x+h)-u(x)}{h}, \qquad \partial^- u(x) = \frac{u(x)-u(x-h)}{h},

where h is called the step size.
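The convergence orders of these quotients are easy to check numerically. A short sketch (the test function sin, the point x = 1, and the step sizes are arbitrary choices):

```python
import math

# Check the accuracy of the difference quotients for f = sin at x = 1:
# ∂⁺ and ∂⁻ are first-order accurate, the symmetric quotient is
# second-order, and the composition ∂⁻∂⁺ approximates f''.
f, x = math.sin, 1.0
for h in (1e-1, 1e-2, 1e-3):
    fwd = (f(x + h) - f(x)) / h                    # ∂⁺f(x), error O(h)
    bwd = (f(x) - f(x - h)) / h                    # ∂⁻f(x), error O(h)
    sym = (f(x + h) - f(x - h)) / (2 * h)          # symmetric, error O(h²)
    d2 = (f(x + h) - 2 * f(x) + f(x - h)) / h**2   # ∂⁻∂⁺f(x) ≈ f''(x)
    print(h, fwd - math.cos(x), sym - math.cos(x), d2 + math.sin(x))
```

The composition ∂⁻∂⁺ used here is exactly the building block of the five-point formula for the Laplacian discussed next.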
One can also use the symmetric difference quotient (u(x+h) − u(x−h))/(2h). The five-point formula for the Laplacian in R² is then given by

    \Delta_h u(x,y) := (\partial_x^-\partial_x^+ + \partial_y^-\partial_y^+)\,u(x,y) = \frac{u(x-h,y)+u(x+h,y)+u(x,y-h)+u(x,y+h)-4u(x,y)}{h^2}.

Besides the equation, the domain Ω as well as its boundary ∂Ω undergo a discretization: if Ω = (0,1) × (0,1), then

    \Omega_h = \{(nh, mh) \in \Omega \mid n, m \in \mathbb N\}, \qquad \partial\Omega_h = \{(nh, mh) \in \partial\Omega \mid n, m \in \mathbb Z\}.

The discretization of the inner Dirichlet problem then reads

    \Delta_h u = f, \quad x \in \Omega_h, \qquad u|_{\partial\Omega_h} = \varphi.

Neumann problems have discretizations as well, see [Hac92, Chapter 4].

(b) The Ritz–Galerkin Method

Suppose we have a boundary value problem in its variational formulation:

    Find u ∈ V such that ⟨u, v⟩_E = F(v) for all v ∈ V,

where we are thinking of the Sobolev space V = W from the previous paragraph. Of course, F is assumed to be bounded. Difference methods arise by discretizing the differential operator. Now we wish to leave the differential operator, which is hidden in ⟨·,·⟩_E, unchanged. The Ritz–Galerkin method consists in replacing the infinite-dimensional space V by a finite-dimensional space V_N,

    V_N \subset V, \qquad \dim V_N = N < \infty.

V_N equipped with the norm ‖·‖_E is still a Banach space. Since V_N ⊆ V, both the inner product ⟨·,·⟩_E and F are defined for u, v ∈ V_N. Thus we may pose the problem:

    Find u_N ∈ V_N such that ⟨u_N, v⟩_E = F(v) for all v ∈ V_N.

The solution of this problem, if it exists, is called the Ritz–Galerkin solution (belonging to V_N). An introductory example is to be found in [Hac92, 8.1.11, p. 164]; see also [Bra01, Chapter 2].

Bibliography

[AF01] I. Agricola and T. Friedrich. Globale Analysis (in German). Friedr. Vieweg & Sohn, Braunschweig, 2001.

[Ahl78] L. V. Ahlfors. Complex analysis. An introduction to the theory of analytic functions of one complex variable. International Series in Pure and Applied Mathematics. McGraw-Hill Book Co., New York, 3rd edition, 1978.

[Arn04] V. I. Arnold.
Lectures in Partial Differential Equations. Universitext. Springer and Phasis, Berlin/Moscow, 2004.

[Bra01] D. Braess. Finite elements. Theory, fast solvers, and applications in solid mechanics. Cambridge University Press, Cambridge, 2001.

[Bre97] G. E. Bredon. Topology and geometry. Number 139 in Graduate Texts in Mathematics. Springer-Verlag, New York, 1997.

[Brö92] Th. Bröcker. Analysis II (in German). B. I. Wissenschaftsverlag, Mannheim, 1992.

[Con78] J. B. Conway. Functions of one complex variable. Number 11 in Graduate Texts in Mathematics. Springer-Verlag, New York, 1978.

[Con90] J. B. Conway. A course in functional analysis. Number 96 in Graduate Texts in Mathematics. Springer-Verlag, New York, 1990.

[Cou88] R. Courant. Differential and integral calculus I–II. Wiley Classics Library. John Wiley & Sons, New York etc., 1988.

[Die93] J. Dieudonné. Treatise on analysis. Volume I–IX. Pure and Applied Mathematics. Academic Press, Boston, 1993.

[Els02] J. Elstrodt. Maß- und Integrationstheorie. Springer, Berlin, third edition, 2002.

[Eva98] L. C. Evans. Partial differential equations. Number 19 in Graduate Studies in Mathematics. AMS, Providence, 1998.

[FB93] E. Freitag and R. Busam. Funktionentheorie. Springer-Lehrbuch. Springer-Verlag, Berlin, 1993.

[FK98] H. Fischer and H. Kaul. Mathematik für Physiker. Band 2. Teubner Studienbücher: Mathematik. Teubner, Stuttgart, 1998.

[FL88] W. Fischer and I. Lieb. Ausgewählte Kapitel der Funktionentheorie. Number 48 in Vieweg Studium. Friedrich Vieweg & Sohn, Braunschweig, 1988.

[Fol95] G. B. Folland. Introduction to partial differential equations. Princeton University Press, Princeton, 1995.

[For81] O. Forster. Analysis 3 (in German). Vieweg Studium: Aufbaukurs Mathematik. Vieweg, Braunschweig, 1981.

[For01] O. Forster. Analysis 1–3 (in German). Vieweg Studium: Grundkurs Mathematik. Vieweg, Braunschweig, 2001.

[GS64] I. M. Gelfand and G. E. Schilow. Verallgemeinerte Funktionen (Distributionen).
III: Einige Fragen zur Theorie der Differentialgleichungen (in German). Number 49 in Hochschulbücher für Mathematik. VEB Deutscher Verlag der Wissenschaften, Berlin, 1964.

[GS69] I. M. Gelfand and G. E. Schilow. Verallgemeinerte Funktionen (Distributionen). I: Verallgemeinerte Funktionen und das Rechnen mit ihnen (in German). Number 47 in Hochschulbücher für Mathematik. VEB Deutscher Verlag der Wissenschaften, Berlin, 1969.

[Hac92] W. Hackbusch. Elliptic differential equations. Theory and numerical treatment. Number 18 in Springer Series in Computational Mathematics. Springer-Verlag, Berlin, 1992.

[Hen88] P. Henrici. Applied and computational complex analysis. Vol. 1. Wiley Classics Library. John Wiley & Sons, Inc., New York, 1988.

[How03] J. M. Howie. Complex analysis. Springer Undergraduate Mathematics Series. Springer-Verlag, London, 2003.

[HS91] F. Hirzebruch and W. Scharlau. Einführung in die Funktionalanalysis. Number 296 in BI-Hochschultaschenbücher. BI-Wissenschaftsverlag, Mannheim, 1991.

[HW96] E. Hairer and G. Wanner. Analysis by its history. Undergraduate Texts in Mathematics. Readings in Mathematics. Springer-Verlag, New York, 1996.

[Jän93] K. Jänich. Funktionentheorie. Springer-Lehrbuch. Springer-Verlag, Berlin, third edition, 1993.

[Joh82] F. John. Partial differential equations. Number 1 in Applied Mathematical Sciences. Springer-Verlag, New York, 1982.

[Jos02] J. Jost. Partial differential equations. Number 214 in Graduate Texts in Mathematics. Springer-Verlag, New York, 2002.

[KK71] A. Kufner and J. Kadlec. Fourier Series (translated by G. A. Toombs). Iliffe Books, London, 1971.

[Kno78] K. Knopp. Elemente der Funktionentheorie. Number 2124 in Sammlung Göschen. Walter de Gruyter, Berlin/New York, 9th edition, 1978.

[Kön90] K. Königsberger. Analysis 1 (English). Springer-Verlag, Berlin, Heidelberg, New York, 1990.

[Lan89] S. Lang. Undergraduate Analysis. Undergraduate Texts in Mathematics. Springer, New York-Heidelberg, second edition, 1989.

[MW85] J. Marsden and A. Weinstein. Calculus.
I, II, III. Undergraduate Texts in Mathematics. Springer-Verlag, New York etc., 1985.

[Nee97] T. Needham. Visual complex analysis. Oxford University Press, New York, 1997.

[O'N75] P. V. O'Neil. Advanced calculus. Collier Macmillan Publishing Co., London, 1975.

[RS80] M. Reed and B. Simon. Methods of modern mathematical physics. I. Functional Analysis. Academic Press, Inc., New York, 1980.

[Rud66] W. Rudin. Real and Complex Analysis. International Student Edition. McGraw-Hill Book Co., New York-Toronto, 1966.

[Rud76] W. Rudin. Principles of mathematical analysis. International Series in Pure and Applied Mathematics. McGraw-Hill Book Co., New York-Auckland-Düsseldorf, third edition, 1976.

[Rüh83] F. Rühs. Funktionentheorie. Hochschulbücher für Mathematik. VEB Deutscher Verlag der Wissenschaften, Berlin, 4th edition, 1983.

[Spi65] M. Spivak. Calculus on manifolds. W. A. Benjamin, New York, Amsterdam, 1965.

[Spi80] M. Spivak. Calculus. Publish or Perish, Inc., Berkeley, California, 1980.

[Str92] W. A. Strauss. Partial differential equations. John Wiley & Sons, New York, 1992.

[Tri92] H. Triebel. Higher analysis. Hochschulbücher für Mathematik. Johann Ambrosius Barth Verlag GmbH, Leipzig, 1992.

[vW81] C. von Westenholz. Differential forms in mathematical physics. Number 3 in Studies in Mathematics and its Applications. North-Holland Publishing Co., Amsterdam-New York, second edition, 1981.

[Wal74] W. Walter. Einführung in die Theorie der Distributionen (in German). Bibliographisches Institut, B.I.-Wissenschaftsverlag, Mannheim-Wien-Zürich, 1974.

[Wal02] W. Walter. Analysis 1–2 (in German). Springer-Lehrbuch. Springer, Berlin, fifth edition, 2002.

[Wla72] V. S. Wladimirow. Gleichungen der mathematischen Physik (in German). Number 74 in Hochschulbücher für Mathematik. VEB Deutscher Verlag der Wissenschaften, Berlin, 1972.

PD Dr. A. Schüler
Mathematisches Institut
Universität Leipzig
04009 Leipzig
[email protected]