Classical Theoretical Physics
Alexander Altland

Contents

1 Electrodynamics
  1.1 Magnetic field energy
  1.2 Electromagnetic gauge field
    1.2.1 Covariant formulation
  1.3 Fourier transformation
    1.3.1 Fourier transform (finite space)
    1.3.2 Fourier transform (infinite space)
    1.3.3 Fourier transform and solution of linear differential equations
    1.3.4 Algebraic interpretation of Fourier transform
  1.4 Electromagnetic waves in vacuum
    1.4.1 Solution of the homogeneous wave equations
    1.4.2 Polarization
  1.5 Green function of electrodynamics
    1.5.1 Physical meaning of the Green function
    1.5.2 Electromagnetic gauge field and Green functions
  1.6 Field energy and momentum
    1.6.1 Field energy
    1.6.2 Field momentum*
    1.6.3 Energy and momentum of plane electromagnetic waves
  1.7 Electromagnetic radiation

2 Macroscopic electrodynamics
  2.1 Macroscopic Maxwell equations
    2.1.1 Averaged sources
  2.2 Applications of macroscopic electrodynamics
    2.2.1 Electrostatics in the presence of matter*
    2.2.2 Magnetostatics in the presence of matter
  2.3 Wave propagation in media
    2.3.1 Model dielectric function
    2.3.2 Plane waves in matter
    2.3.3 Dispersion
    2.3.4 Electric conductivity

3 Relativistic invariance
  3.1 Introduction
    3.1.1 Galilei invariance and its limitations
    3.1.2 Einstein's postulates
    3.1.3 First consequences
  3.2 The mathematics of special relativity I: Lorentz group
    3.2.1 Background
  3.3 Aspects of relativistic dynamics
    3.3.1 Proper time
    3.3.2 Relativistic mechanics
  3.4 Mathematics of special relativity II: co- and contravariance
    3.4.1 Covariant and contravariant vectors
    3.4.2 The relativistic covariance of electrodynamics

4 Lagrangian mechanics
  4.1 Variational principles
    4.1.1 Definitions
    4.1.2 Euler–Lagrange equations
    4.1.3 Coordinate invariance of Euler–Lagrange equations
  4.2 Lagrangian mechanics
    4.2.1 The idea
    4.2.2 Hamilton's principle
    4.2.3 Lagrange mechanics and symmetries
    4.2.4 Noether theorem
    4.2.5 Examples

5 Hamiltonian mechanics
  5.1 Hamiltonian mechanics
    5.1.1 Legendre transform
    5.1.2 Hamiltonian function
  5.2 Phase space
    5.2.1 Phase space and structure of the Hamilton equations
    5.2.2 Hamiltonian flow
    5.2.3 Phase space portraits
    5.2.4 Poisson brackets
    5.2.5 Variational principle
  5.3 Canonical transformations
    5.3.1 When is a coordinate transformation "canonical"?
    5.3.2 Canonical transformations: why?

Chapter 1: Electrodynamics

1.1 Magnetic field energy

As a prelude to our discussion of the full set of Maxwell equations, let us address a question which, in principle, should have been answered in the previous chapter: what is the energy stored in a static magnetic field? In section ??, the analogous question for the electric field was answered in a constructive manner: we computed the mechanical energy required to build up a system of charges. It turned out that the answer could be formulated entirely in terms of the electric field, without explicit reference to the charge distribution creating it. By symmetry, one might expect a similar prescription to work in the magnetic case. Here, one would ask for the energy needed to build up a current distribution against the magnetic field created by those elements of the current distribution that have already been put in place. A moment's thought, however, shows that this strategy is not quite as straightforwardly implemented as in the electric case: no matter how slowly we move a 'current loop' in a magnetic field, an electric field acting on the charge carriers maintaining the current in the loop will be induced (the law of induction). Work has to be done to maintain a constant current, and it is this work that essentially enters the energy balance of the current distribution. To make this picture quantitative, consider a current loop carrying a current I.
We may think of this loop as being consecutively built up by importing small current loops (each carrying the current I) from infinity (see the figure). The currents flowing along adjacent segments of these loops eventually cancel, so that only the net current I flowing around the boundary remains. Let us, then, compute the work that needs to be done to bring in one of these loops from infinity.

Consider, first, an ordinary point particle kept at a (mechanical) potential U. The rate at which this potential changes if the particle changes its position is

  d_t U = d_t U(x(t)) = ∇U · d_t x = −F · v,

where F is the force acting on the particle. Specifically, for the charged particles moving in our prototypical current loops, F = qE, where E is the electric field induced by the variation of the magnetic flux through the loop as it approaches from infinity. Next consider an infinitesimal volume element d³x inside the loop. Assuming that the charge carriers move at a velocity v, the charge of the volume element is given by q = ρ d³x and the rate of its potential change by d_t φ = −ρ d³x v·E = −d³x j·E. We may now use the induction law to express the electric field in terms of magnetic quantities:

  ∮_{∂S} ds·E = −c⁻¹ ∫_S dσ n·d_t B = −c⁻¹ ∫_S dσ n·d_t (∇×A) = −c⁻¹ ∮_{∂S} ds·d_t A.

Since this holds regardless of the geometry of the loop, we have E = −c⁻¹ d_t A, where d_t A is the change in vector potential due to the movement of the loop. Thus, the rate at which the potential of a volume element changes is given by d_t φ = c⁻¹ d³x j·d_t A. Integrating over time and space, we find that the total potential energy of the loop due to the presence of a vector potential is given by

  E = (1/c) ∫ d³x j·A.    (1.1)

Although derived for the specific case of a current loop, Eq. (1.1) holds for general current distributions subject to a magnetic field. (For example, for the current density carried by a point particle at x(t), j = qδ(x − x(t))v(t), we obtain E = (q/c) v(t)·A(x(t)), i.e. the familiar Lorentz-force contribution to the Lagrangian of a charged particle.)

Now assume that we shift the loop at fixed current against the magnetic field. The change in potential energy corresponding to a small shift is given by δE = c⁻¹ ∫ d³x j·δA, where ∇×δA = δB denotes the change in the field strength. Using that ∇×H = 4πc⁻¹ j, we represent δE as

  δE = (1/4π) ∫ d³x (∇×H)·δA = (1/4π) ∫ d³x ε_ijk (∂_j H_k) δA_i =
     = −(1/4π) ∫ d³x H_k ε_ijk ∂_j δA_i = (1/4π) ∫ d³x H·δB,

where in the integration by parts we noted that, due to the spatial decay of the fields, no surface terms at infinity arise. Due to the linear relation H = μ₀⁻¹ B, we may write H·δB = δ(H·B)/2, i.e. δE = (1/8π) δ ∫ d³x H·B. Finally, summing over all shifts required to bring the current loop in from infinity, we obtain

  E = (1/8π) ∫ d³x H·B    (1.2)

for the magnetic field energy. Notice (a) that we have again managed to express the energy of the system entirely in terms of the fields, i.e. without explicit reference to the sources creating these fields, and (b) the structural similarity to the electric field energy (??). Finally, (c) the above construction shows that a separation of electromagnetism into static and dynamic phenomena is not quite as unambiguous as one might have expected: the derivation of the magnetostatic field energy relied on the dynamic law of induction.
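For orientation, Eq. (1.2) is easily evaluated for concrete numbers. A minimal Python sketch (the field strength and volume are made-up sample values; in vacuum H = B, and everything is in Gaussian units):

```python
import numpy as np

# Magnetic field energy, Eq. (1.2), for a uniform field in vacuum (H = B):
# E = B^2 V / (8 pi). Sample values: a 1 tesla field filling one liter.
B = 1e4          # field strength in gauss (1 T = 10^4 G)
V = 1e3          # volume in cm^3 (1 liter)
energy = B**2 * V / (8 * np.pi)          # result in erg
print(f"field energy: {energy:.3e} erg = {energy * 1e-7:.1f} J")  # ~4e2 J
```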
We next proceed to embed the general static theory – as described by the concept of electric and magnetic potentials – into the larger framework of electrodynamics. From now on, then, we deal with the full set of Maxwell equations,

  ∇·D = 4πρ,                         (1.3)
  ∇×H − (1/c) ∂_t D = (4π/c) j,      (1.4)
  ∇×E + (1/c) ∂_t B = 0,             (1.5)
  ∇·B = 0.                           (1.6)

1.2 Electromagnetic gauge field

Consider the full set of Maxwell equations in vacuum (E = D and B = H),

  ∇·E = 4πρ,
  ∇×B − (1/c) ∂_t E = (4π/c) j,
  ∇×E + (1/c) ∂_t B = 0,
  ∇·B = 0.                           (1.7)

As in previous sections, we will try to use constraints inherent to these equations to compactify them into a smaller set of equations. As in section ??, the equation ∇·B = 0 implies

  B = ∇×A.                           (1.8)

(However, we may no longer expect A to be time independent.) Now, substitute this representation into the law of induction: ∇×(E + c⁻¹ ∂_t A) = 0. This implies that E + c⁻¹ ∂_t A = −∇φ can be written as the gradient of a scalar potential, or

  E = −∇φ − (1/c) ∂_t A.             (1.9)

We have thus managed to represent the electromagnetic fields in terms of derivatives of a generalized four-component potential A = (φ, A). (The negative sign multiplying A has been introduced for later reference.) Substituting Eqs. (1.8) and (1.9) into the inhomogeneous Maxwell equations, we obtain

  −Δφ − (1/c) ∂_t ∇·A = 4πρ,
  −ΔA + (1/c²) ∂_t² A + ∇(∇·A) + (1/c) ∂_t ∇φ = (4π/c) j.    (1.10)

These equations do not look particularly inviting. However, as in section ??, we may observe that the choice of the generalized vector potential A is not unique; this freedom can be used to transform Eq. (1.10) into a more manageable form. Consider an arbitrary function f: R³ × R → R, (x, t) ↦ f(x, t). The transformation A → A + ∇f leaves the magnetic field unchanged, while the electric field changes according to E → E − c⁻¹ ∂_t ∇f. If, however, we synchronously redefine the scalar potential as φ → φ − c⁻¹ ∂_t f, the electric field, too, will not be affected by the transformation. Summarizing, the generalized gauge transformation

  A → A + ∇f,   φ → φ − (1/c) ∂_t f    (1.11)

leaves the electromagnetic fields (1.8) and (1.9) unchanged.

The gauge freedom may be used to transform the vector potential into one of several convenient representations. Of particular relevance to the solution of the time-dependent Maxwell equations is the so-called Lorentz gauge

  ∇·A + (1/c) ∂_t φ = 0.             (1.12)

Importantly, it is always possible to satisfy this condition by a suitable gauge transformation. Indeed, if (φ₀, A₀) does not obey the Lorentz condition, i.e. ∇·A₀ + (1/c) ∂_t φ₀ ≠ 0, we may define a transformed potential (φ, A) through φ = φ₀ − c⁻¹ ∂_t f and A = A₀ + ∇f. We now require that the newly defined potential obey (1.12). This is equivalent to the condition

  (Δ − (1/c²) ∂_t²) f = −(∇·A₀ + (1/c) ∂_t φ₀),

i.e. a wave equation with inhomogeneity −(∇·A₀ + c⁻¹ ∂_t φ₀). We will see in a moment that such equations can always be solved, i.e. we may always choose a potential (φ, A) so as to obey the Lorentz gauge condition. In the Lorentz gauge, the Maxwell equations assume the simplified form

  (Δ − (1/c²) ∂_t²) φ = −4πρ,
  (Δ − (1/c²) ∂_t²) A = −(4π/c) j.    (1.13)

In combination with the gauge condition (1.12), Eqs. (1.13) are fully equivalent to the set of Maxwell equations (1.7).
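The invariance of the fields under the gauge transformation (1.11) can also be checked symbolically. A minimal sympy sketch (the potentials φ, A and the gauge function f are arbitrary symbolic functions):

```python
import sympy as sp

x, y, z, t = sp.symbols('x y z t', real=True)
c = sp.symbols('c', positive=True)

# arbitrary smooth potentials and gauge function
phi = sp.Function('phi')(x, y, z, t)
A = sp.Matrix([sp.Function(f'A{i}')(x, y, z, t) for i in range(3)])
f = sp.Function('f')(x, y, z, t)

def grad(s):
    return sp.Matrix([sp.diff(s, q) for q in (x, y, z)])

def curl(V):
    return sp.Matrix([sp.diff(V[2], y) - sp.diff(V[1], z),
                      sp.diff(V[0], z) - sp.diff(V[2], x),
                      sp.diff(V[1], x) - sp.diff(V[0], y)])

def fields(phi, A):
    E = -grad(phi) - sp.diff(A, t) / c   # Eq. (1.9)
    B = curl(A)                          # Eq. (1.8)
    return E, B

E1, B1 = fields(phi, A)
E2, B2 = fields(phi - sp.diff(f, t) / c, A + grad(f))  # transform (1.11)

print(sp.simplify(E2 - E1))   # zero vector
print(sp.simplify(B2 - B1))   # zero vector
```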
1.2.1 Covariant formulation

The formulation introduced above is mathematically efficient; however, it has an unsatisfactory touch to it: the gauge transformation (1.11) couples the "zeroth component" φ with the "vectorial" components A. While this indicates that the four components (φ, A) should form an entity, the equations above treat φ and A separately. In the following, we show that there exists a much more concise and elegant covariant formulation of electrodynamics that transcends this separation. At this point, we introduce the covariant notation merely as a mnemonic aid. Later in the text, we will understand the physical principles behind the new formulation.

Definitions

Consider a vector x ∈ R⁴. We represent this object in terms of four components {x^μ}, μ = 0, 1, 2, 3, where x⁰ will be called the time-like component and x^i, i = 1, 2, 3, the space-like components.[1] Four-component objects of this structure will be called contravariant vectors. To a contravariant vector x^μ, we associate a covariant vector x_μ (notice the positioning of the indices as superscripts and subscripts, resp.) through

  x₀ = x⁰,   x_i = −x^i.             (1.14)

A co- and a contravariant vector x^μ and x′_μ can be "contracted" to produce a number:[2] x^μ x′_μ ≡ x⁰x′⁰ − x^i x′^i. Here, we are employing the Einstein summation convention, according to which pairs of indices are summed over, x^μ x′_μ ≡ Σ_{μ=0}^{3} x^μ x′_μ, etc. Notice that x^μ x′_μ = x_μ x′^μ.

We next introduce a number of examples of co- and contravariant vectors that will be of importance below:

- Consider a point in space-time (t, r) defined by a time argument and three real-space components r^i, i = 1, 2, 3. We store this information in a so-called contravariant four-component vector x^μ, μ = 0, 1, 2, 3, as x⁰ = ct, x^i = r^i, i = 1, 2, 3. Occasionally, we write x ≡ {x^μ} = (ct, r) for short. The associated covariant vector has components {x_μ} = (ct, −r).
- Derivatives taken w.r.t. the components of the contravariant 4-vector are denoted by ∂_μ ≡ ∂/∂x^μ = (c⁻¹∂_t, ∂_r). Notice that this is a covariant object. Its contravariant partner is given by {∂^μ} = (c⁻¹∂_t, −∂_r).
- The 4-current j = {j^μ} is defined through j ≡ (cρ, j), where ρ and j are the charge density and the vectorial current density, resp.
- The 4-vector potential A ≡ {A^μ} is defined through A = (φ, A).

[1] Following a widespread convention, we denote four-component indices μ, ν, ... = 0, 1, 2, 3 by Greek letters (from the middle of the alphabet). "Space-like" indices i, j, ... = 1, 2, 3 are denoted by (mid-alphabet) Latin letters.
[2] Although one may formally consider sums x^μ x′^μ, objects of this type do not make sense in the present formalism.

We next engage these definitions to introduce the

Covariant representation of electrodynamics

Above, we saw that the freedom to represent the potentials (φ, A) in different gauges may be used to obtain convenient formulations of electrodynamics. Within the covariant formalism, the general gauge transformation (1.11) assumes the form

  A^μ → A^μ − ∂^μ f.                  (1.15)

(Exercise: convince yourself that this is equivalent to (1.11).) Specifically, this freedom may be used to realize the Lorentz gauge,

  ∂_μ A^μ = 0.                        (1.16)

Once more, we verify that any 4-potential may be transformed to one that obeys the Lorentz condition: for if A′^μ obeys ∂_μ A′^μ ≡ λ, where λ is some function, we may define the transformed potential A^μ ≡ A′^μ − ∂^μ f. Imposing on A^μ the Lorentz condition,

  0 = ∂_μ A^μ = ∂_μ (A′^μ − ∂^μ f) = λ − ∂_μ ∂^μ f,

we obtain the equivalent condition □f = λ. Here, we have introduced the d'Alembert operator

  □ ≡ ∂_μ ∂^μ = (1/c²) ∂_t² − Δ.      (1.17)

(In a sense, this is a four-dimensional generalization of the Laplace operator, hence the four-cornered square.) This is the covariant formulation of the condition discussed above.
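To get used to the index conventions, here is a small numpy sketch (the four-vector entries are arbitrary sample numbers, in units with c = 1) of lowering an index with the metric η = diag(1, −1, −1, −1) implicit in Eq. (1.14), and of contracting two vectors:

```python
import numpy as np

# Minkowski metric with signature (+, -, -, -), implementing Eq. (1.14)
eta = np.diag([1.0, -1.0, -1.0, -1.0])

# two sample contravariant four-vectors x^mu = (ct, r); numbers are arbitrary
x  = np.array([2.0, 1.0, 2.0, 3.0])
xp = np.array([1.0, 4.0, 5.0, 6.0])

x_lower = eta @ x            # covariant components x_mu = (ct, -r)
print(x_lower)               # [ 2. -1. -2. -3.]

# contraction x^mu x'_mu = x^0 x'^0 - sum_i x^i x'^i
print(x @ (eta @ xp), x[0] * xp[0] - x[1:] @ xp[1:])   # same number twice
```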
A gauge transformation A^μ → A^μ + ∂^μ f of a potential A^μ in the Lorentz gauge class, ∂_μ A^μ = 0, is compatible with the Lorentz condition iff □f = 0. In other words, the class of Lorentz gauge vector potentials still contains a huge gauge freedom.

The general representation of the Maxwell equations through potentials, (1.10), affords the 4-representation

  □A^μ − ∂^μ (∂_ν A^ν) = (4π/c) j^μ.

We may act on this equation with ∂_μ to obtain the constraint

  ∂_μ j^μ = 0,                        (1.18)

which we identify as the covariant formulation of charge conservation, ∂_μ j^μ = ∂_t ρ + ∇·j = 0: the Maxwell equations imply current conservation. Finally, imposing the Lorentz condition ∂_μ A^μ = 0, we obtain the simplified representation

  ∂_ν ∂^ν A^μ = (4π/c) j^μ,   ∂_μ A^μ = 0.    (1.19)

Before turning to the discussion of the solution of these equations, a few general remarks are in order:

- The Lorentz condition is the prevalent gauge choice in electrodynamics because (a) it brings the Maxwell equations into a maximally simple form, and (b) it will turn out below to be invariant under the most general class of coordinate transformations, the Lorentz transformations. (It is worthwhile to note that the Lorentz condition does not unambiguously fix the gauge of the vector potential. Indeed, we may modify any Lorentz gauge vector potential A^μ as A^μ → A^μ + ∂^μ f, where f satisfies the homogeneous wave equation ∂^μ ∂_μ f = 0; this does not alter the condition ∂_μ A^μ = 0.) Other gauge conditions frequently employed include
- the Coulomb gauge or radiation gauge, ∇·A = 0 (employed earlier in section ??). The advantage of this gauge is that the scalar Maxwell equation assumes the form of a simple Poisson equation, Δφ(x, t) = −4πρ(x, t), which is solved by φ(x, t) = ∫ d³x′ ρ(x′, t)/|x − x′|, i.e. by an instantaneous 'Coulomb potential' (hence the name Coulomb gauge). This gauge representation has also proven advantageous within the context of quantum electrodynamics. However, we won't discuss it any further in this text.
- The passage from the physical fields E, B to the potentials A^μ as fundamental variables implies a convenient reduction of variables. Working with fields, we have 2 × 3 vectorial components which, however, are not independent: equations such as ∇·B = 0 represent constraints. This should be compared to the four components of the potential A^μ. The gauge freedom implies that of the four components, only three are independent (think about this point). In general, no further reduction in the number of variables is possible. To see this, notice that the electromagnetic fields are determined by the "sources" of the theory, i.e. the 4-current j^μ. The continuity equation ∂_μ j^μ = 0 reduces the number of independent sources to three. We certainly need three independent field components to describe the field patterns created by the three independent sources. (The electromagnetic field in a source-free environment, j^μ = 0, is described in terms of only two independent components. This can be seen, e.g., by going to the Coulomb gauge. The above equation for the time-like component of the potential is then uniquely solved by φ = 0. Thus, only the two independent components of A, constrained by ∇·A = 0, remain.)

1.3 Fourier transformation

In this – purely mathematical – section, we introduce the concept of Fourier transformation. We start with a formal definition of the Fourier transform as an abstract integral transformation.
Later in the section, we will discuss the application of Fourier transformation to the solution of linear differential equations, and its conceptual interpretation as a basis change in function space.

1.3.1 Fourier transform (finite space)

Consider a function f: [0, L] → C on an interval [0, L] ⊂ R. Conceptually, the Fourier transformation is a representation of f as a superposition (a sum) of plane waves:

  f(x) = (1/L) Σ_k e^{ikx} f_k,       (1.20)

where the sum runs over all values k = 2πn/L, n ∈ Z. Series representations of this type are summarily called Fourier series representations. At this point, we haven't proven that the function f can actually be represented by a series of this type. However, if the representation exists, the coefficients f_k can be computed with the help of the identity (prove it)

  (1/L) ∫₀^L dx e^{−ikx} e^{ik′x} = δ_{kk′},    (1.21)

where k = 2πn/L, k′ = 2πn′/L and δ_{kk′} ≡ δ_{nn′}. We just need to multiply Eq. (1.20) by the function e^{−ikx} and integrate over x to obtain

  ∫₀^L dx e^{−ikx} f(x) = (1/L) Σ_{k′} f_{k′} ∫₀^L dx e^{−ikx} e^{ik′x} = f_k,

where in the first step we used (1.20) and in the second (1.21). We call the numbers f_k ∈ C the Fourier coefficients of the function f. The relation

  f_k = ∫₀^L dx e^{−ikx} f(x),        (1.22)

connecting a function with its Fourier coefficients, is called the Fourier transform of f. The 'inverse' identity (1.20), expressing f through its Fourier coefficients, is called the inverse Fourier transform. To repeat: at this point, we haven't proven the existence of the inverse transform, i.e. the possibility to 'reconstruct' a function f from its Fourier coefficients f_k. A criterion for the existence of the Fourier transform is that the insertion of (1.22) into the r.h.s. of (1.20) gives back the original function f:

  f(x) =! (1/L) Σ_k e^{ikx} ∫₀^L dx′ e^{−ikx′} f(x′) = ∫₀^L dx′ [(1/L) Σ_k e^{ik(x−x′)}] f(x′).

A necessary and sufficient condition for this to hold is that

  (1/L) Σ_k e^{ik(x−x′)} = δ(x − x′),    (1.23)

where δ(x) is the Dirac δ-function. For reasons that will become transparent below, Eq. (1.23) is called a completeness relation. Referring for a proof of this relation to the exercise below, we here merely conclude that Eq. (1.23) establishes Eq. (1.22) and its inverse (1.20) as an integral transform, i.e. the full information carried by f(x) is contained in the set of all Fourier coefficients {f_k}. By way of example, figure 1.1 illustrates the approximation of a sawtooth-shaped function f by Fourier series truncated after a finite number of terms.

[Figure 1.1: Representation of a function (thick line) by Fourier series expansion. The graphs show series expansions up to (1, 3, 5, 7, 9) terms.]

EXERCISE Prove the completeness relation. To this end, add an infinitesimal "convergence generating factor" δ > 0 to the sum:

  (1/L) Σ_k e^{ik(x−x′) − δ|k|} = (1/L) Σ_{n=−∞}^{∞} e^{2πin(x−x′)/L − 2π|n|δ/L}.

Next compute the sum as a convergent geometric series. Take the limit δ ↘ 0 and convince yourself that you obtain a representation of the Dirac δ-function.
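The truncated series expansions shown in figure 1.1 are easily reproduced numerically. A Python sketch (the sawtooth profile and the truncation orders are illustrative choices):

```python
import numpy as np

L = 2 * np.pi
x = np.linspace(0.0, L, 2000)
f = x / L - 0.5                       # sawtooth-like test function on [0, L]

# Fourier coefficients f_k, Eq. (1.22), by numerical quadrature
def coeff(n):
    k = 2 * np.pi * n / L
    return np.trapz(np.exp(-1j * k * x) * f, x)

# truncated Fourier series, Eq. (1.20): keep the modes |n| <= N
for N in (1, 3, 5, 9):
    fN = sum(coeff(n) * np.exp(2j * np.pi * n * x / L)
             for n in range(-N, N + 1)) / L
    print(N, np.mean(np.abs(fN.real - f)))   # mean error decreases with N
```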
It is straightforward to generalize the formulae above to functions defined on higher-dimensional spaces: consider a function f: M → C, where M = [0, L₁] × [0, L₂] × ··· × [0, L_d] is a d-dimensional cuboid. It is straightforward to check that the generalization of the formulae above reads

  f(x) = (Π_{l=1}^d L_l)⁻¹ Σ_k e^{ik·x} f_k,
  f_k = ∫_M d^d x e^{−ik·x} f(x),     (1.24)

where the sum runs over all vectors k = (k₁, ..., k_d), k_l = 2πn_l/L_l, n_l ∈ Z. Eqs. (1.21) and (1.23) generalize to

  ∫_M d^d x e^{i(k−k′)·x} = (Π_{l=1}^d L_l) δ_{k,k′},
  (Π_{l=1}^d L_l)⁻¹ Σ_k e^{ik·(x−x′)} = δ(x − x′),

where δ_{k,k′} ≡ Π_{l=1}^d δ_{k_l,k′_l}. Finally, the Fourier transform of vector-valued functions f: M → Cⁿ is defined through the Fourier transforms of its components, i.e. f_k is an n-dimensional vector whose components f_k^l are given by the Fourier transforms of the component functions f^l(x), l = 1, ..., n. This completes the definition of Fourier transformation on finite integration domains. At this point, a few general remarks are in order:

- Why is Fourier transformation useful? There are many different answers to this question, which underpins the fact that we are dealing with an important concept. In many areas of electrical engineering, sound engineering, etc., it is customary to work with wave-like signals, i.e. signals of harmonic modulation ~ Re e^{ikx} = cos(kx). The (inverse) Fourier transform is but the decomposition of a general signal into harmonic constituents. Similarly, many "signals" encountered in nature (water or sound waves reflected from a complex pattern of obstacles, light reflected from rough surfaces, etc.) can be interpreted in terms of a superposition of planar waves. Again, this is a situation where a Fourier 'mode' decomposition is useful. On a different note, Fourier transformation is an extremely potent tool in the solution of linear differential equations. This is a subject to which we will turn below.
- There is some freedom in the definition of Fourier transforms. For example, we might have chosen f(x) = c Σ_k e^{ikx} f_k with f_k = (cL)⁻¹ ∫ dx e^{−ikx} f(x) and an arbitrary constant c. Or f(x) = (1/L) Σ_k e^{−ikx} f_k with f_k = ∫ dx e^{ikx} f(x) (inverted sign of the phases), or combinations thereof. The actual choice of convention is often a matter of convenience.

1.3.2 Fourier transform (infinite space)

In many applications, we want to consider functions f: R^d → C without explicit reference to some finite, cuboid-shaped integration domain. Formally, the extension to infinite space may be effected by sending the extension of the integration interval, L, to infinity. In the limit L → ∞, the spacing of consecutive Fourier modes, Δk = 2π/L, approaches zero, and Eq. (1.20) is identified as the Riemann sum representation of an integral,

  f(x) = lim_{L→∞} (1/L) Σ_k e^{ikx} f_k = lim_{Δk→0} (1/2π) Σ_{n∈Z} Δk e^{iΔk n x} f_{Δk n} = (1/2π) ∫ dk e^{ikx} f(k).

Following a widespread convention, we denote the Fourier representation of functions in infinite space by f(k) instead of f_k. Explicit reference to the argument "k" will prevent us from confusing the Fourier representation f(k) with the original function f(x). The infinite-space version of the inverse transform reads

  f(k) = ∫_{−∞}^{∞} dx e^{−ikx} f(x).    (1.25)

While this definition is formally consistent, it has a problem: only functions decaying sufficiently fast at infinity, so as to make f(x) e^{ikx} integrable, can be transformed in this way. This makes numerous function classes of applied interest – polynomials, various types of transcendental functions, etc. – non-transformable and severely restricts the usefulness of the formalism. However, by a little trick, we may upgrade the definition to one that is far more useful. Let us modify the transform according to

  f(k) = lim_{δ↘0} ∫ dx e^{−ikx − δx²} f(x).    (1.26)

Due to the inclusion of the convergence generating factor δ, any function that does not increase faster than exponentially can be transformed!
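To see the trick at work, consider the extreme case f(x) = 1, which is certainly not integrable against e^{−ikx} alone. A sympy sketch of Eq. (1.26):

```python
import sympy as sp

x, k = sp.symbols('x k', real=True)
delta = sp.symbols('delta', positive=True)

# regularized transform (1.26) of f(x) = 1: a Gaussian of width ~ 1/sqrt(delta)
fk = sp.integrate(sp.exp(-sp.I * k * x - delta * x**2), (x, -sp.oo, sp.oo))
print(sp.simplify(fk))                    # sqrt(pi/delta)*exp(-k**2/(4*delta))

# its weight in k is 2*pi for every delta, so for delta -> 0 the result is
# an ever sharper peak of fixed weight: the distribution 2*pi*delta(k)
print(sp.integrate(fk, (k, -sp.oo, sp.oo)))   # 2*pi
```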
Similarly, it will be useful to include a convergence generating factor in the inverse transform, i.e.

  f(x) = lim_{δ↘0} (1/2π) ∫ dk e^{ikx − δk²} f(k).

Finally, upon straightforward extension of these formulae to multi-dimensional functions f: R^d → C, we arrive at the definition of the Fourier transform in infinite space:

  f(x) = lim_{δ↘0} ∫ (d^d k/(2π)^d) e^{ik·x − δk²} f(k),
  f(k) = lim_{δ↘0} ∫ d^d x e^{−ik·x − δx²} f(x).    (1.27)

In these formulae, x² ≡ x·x is the squared norm of the vector x, etc. A few more general remarks on these definitions:

- To keep the notation simple, we will suppress the explicit reference to the convergence generating factor unless convergence is an issue.
- Occasionally, it will be useful to work with convergence generating factors different from the one above. For example, in one dimension it may be convenient to effect convergence by the definition

  f(x) = lim_{δ↘0} (1/2π) ∫ dk e^{ikx − δ|k|} f(k),
  f(k) = lim_{δ↘0} ∫ dx e^{−ikx − δ|x|} f(x).

- For later reference, we note that the generalization of the completeness relation to infinite space reads

  ∫ (d^d k/(2π)^d) e^{ik·(x−x′)} = δ(x − x′).    (1.28)

- Finally, one may wonder whether the inclusion of the convergence generating factor might not spoil the consistency of the transform. To check that this is not the case, we need to verify the infinite-space variant of the completeness relation (1.23), i.e.

  lim_{δ↘0} ∫ (d^d k/(2π)^d) e^{ik·(x−x′) − δk²} = δ(x − x′).    (1.29)

(Only if this relation holds do the two equations in (1.27) form a consistent pair; cf. the argument in section 1.3.1.)

EXERCISE Prove relation (1.29). Do the Gaussian integral over k (hint: notice that the integral factorizes into d one-dimensional Gaussian integrals) to obtain a result that in the limit δ ↘ 0 asymptotes to a representation of the δ-function.

1.3.3 Fourier transform and solution of linear differential equations

One of the most striking features of the Fourier transform is that it turns derivative operations (complicated) into algebraic multiplications (simple).[3] Consider the function ∂_x f(x). Without much loss of generality, we assume the function f: [0, L] → C to obey periodic boundary conditions, f(0) = f(L). A straightforward integration by parts shows that the Fourier coefficients of ∂_x f are given by ik f_k. In other words, denoting the passage from a function to its Fourier transform by an arrow,

  f(x) → f_k   ⇒   ∂_x f(x) → ik f_k.

This operation can be iterated to give ∂_x^n f(x) → (ik)^n f_k.

[3] Although our example is one-dimensional, we write ∂_x instead of d/dx. This is in anticipation of our higher-dimensional applications below.

In a higher-dimensional setting, these identities generalize to (think why!) ∂_l f(x) → (ik_l) f_k, where ∂_l ≡ ∂/∂x_l and k_l is the l-th component of the vector k. Specifically, the operations of vector analysis become purely algebraic:

  ∇f(x) → ik f_k,
  ∇·f(x) → ik·f_k,
  ∇×f(x) → ik×f_k,
  Δf(x) → −(k·k) f_k,                 (1.30)

etc. Relations of this type are of invaluable use in the solution of linear differential equations (of which electrodynamics is full!). To begin with, consider a linear differential equation with constant coefficients. This is a differential equation of the structure

  (c⁽⁰⁾ + c_i⁽¹⁾ ∂_i + c_ij⁽²⁾ ∂_i ∂_j + ...) ψ(x) = φ(x),

where the c⁽ⁿ⁾_{i,j,...} ∈ C are constants. We may now Fourier transform both sides of the equation to obtain

  (c⁽⁰⁾ + i c_i⁽¹⁾ k_i − c_ij⁽²⁾ k_i k_j + ...) ψ_k = φ_k.
This is an algebraic equation, readily solved as

  ψ_k = φ_k / (c⁽⁰⁾ + i c_i⁽¹⁾ k_i − c_ij⁽²⁾ k_i k_j + ...).

The solution ψ(x) can now be obtained by computation of the inverse Fourier transform,

  ψ(x) = ∫ (d^d k/(2π)^d) e^{ik·x} φ_k / (c⁽⁰⁾ + i c_i⁽¹⁾ k_i − c_ij⁽²⁾ k_i k_j + ...).    (1.31)

Fourier transform methods thus reduce the solution of linear partial differential equations with constant coefficients to the computation of a definite integral. Technically, this integral may be difficult to compute; conceptually, however, the task of solving the differential equation is under control.

EXAMPLE By way of example, consider the second-order ordinary differential equation

  (−∂_x² + λ⁻²) ψ(x) = c δ(x − x₀),

where λ, c and x₀ are constants.[4] Application of formula (1.31) gives the solution

  ψ(x) = c ∫ (dk/2π) e^{ik(x−x₀)} / (k² + λ⁻²).

[4] The solution is not unique: addition of a solution ψ₀ of the homogeneous equation (−∂_x² + λ⁻²) ψ₀(x) = 0 gives another solution ψ(x) → ψ(x) + ψ₀(x) of the inhomogeneous differential equation.

The integral above can be done by the method of residues (or looked up in a table). As a result, we obtain the solution

  ψ(x) = (cλ/2) e^{−|x−x₀|/λ}.

(Exercise: convince yourself that ψ is a solution of the differential equation.) In the limit λ → ∞, the differential equation collapses to a one-dimensional variant of the Poisson equation, −∂_x² ψ(x) = c δ(x − x₀), for a charge c/4π at x₀. Our solution asymptotes to ψ(x) = −(c/2)|x − x₀| (up to an additive constant), which we may interpret as a one-dimensional version of the Coulomb potential.

However, the usefulness of Fourier transform methods has its limits. The point is that the Fourier transform does not only turn complicated operations (derivatives) into simple ones; the converse is also true. The crux of the matter is expressed by the so-called convolution theorem: straightforward application of the definitions (1.27) and of the completeness relation (1.29) shows that

  f(x) g(x) → (f ∗ g)(k) ≡ ∫ (d^d p/(2π)^d) f(k − p) g(p).    (1.32)

In other words, the Fourier transform turns a simple product operation into a non-local integral convolution. Specifically, this means that the Fourier-transform representation of (linear) differential equations with spatially varying coefficient functions will contain convolution operations. Needless to say, the solution of the ensuing integral equations can be complicated.

Finally, notice the mathematical similarity between the Fourier transform f(x) → f(k) and its reverse f(k) → f(x). Apart from a sign change in the phase and the constant prefactor (2π)^{−d}, the two operations in (1.27) are identical. This means that identities such as (1.30) or (1.32) all have a reverse version. For example, ∂_k f(k) → −ix f(x), and a convolution

  (f ∗ g)(x) ≡ ∫ d^d y f(x − y) g(y) → f(k) g(k)

transforms into a simple product, etc. In practice, one often needs to compromise and figure out whether the real-space or the Fourier-transformed variant of a problem offers the best perspective for mathematical attack.
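The example above can equally well be solved numerically along exactly these lines: transform, divide by the polynomial in k, transform back. A Python sketch using numpy's FFT (grid size, λ, c and x₀ are arbitrary choices; the periodic grid approximates the infinite line as long as λ is much smaller than the box):

```python
import numpy as np

# Solve (-d^2/dx^2 + 1/lam^2) psi = c*delta(x - x0) on a periodic grid
L, N = 40.0, 4096
lam, c, x0 = 1.0, 1.0, 0.0
x = np.linspace(-L / 2, L / 2, N, endpoint=False)
dx = L / N

rhs = np.zeros(N)
rhs[np.argmin(np.abs(x - x0))] = c / dx      # discretized delta inhomogeneity

k = 2 * np.pi * np.fft.fftfreq(N, d=dx)      # wave numbers of the grid
psi_k = np.fft.fft(rhs) / (k**2 + 1 / lam**2)  # algebraic division, cf. (1.31)
psi = np.fft.ifft(psi_k).real

exact = (c * lam / 2) * np.exp(-np.abs(x - x0) / lam)
print(np.max(np.abs(psi - exact)))           # small, up to discretization error
```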
1.3.4 Algebraic interpretation of Fourier transform

While it is perfectly fine to apply the Fourier transform as a purely operational tool, new perspectives become available once we understand the conceptual meaning of the transform. The Fourier transform is but one member of an entire family of integral transforms, and all of these transforms rest on a common basis, which we introduce below. The understanding of these principles will elucidate the mathematical structure of the formulas introduced above and be of considerable practical use.

A word of caution: this section won't contain difficult formulae. Nonetheless, it doesn't make for easy reading. Please do carefully think about the meaning of the structures introduced below.

Function spaces as vector spaces

Consider the space V of L² functions f: [0, L] → C, i.e. the space of functions for which ∫₀^L dx |f|² exists. We call this a 'space' of functions because V is a vector space: for two elements f, g ∈ V and a, b ∈ C, also af + bg ∈ V, the defining criterion of a complex vector space. However, unlike the finite-dimensional spaces studied in ordinary linear algebra, the dimension of V is infinite. Heuristically, we may think of the dimension of a complex vector space as the number of linearly independent complex components required to uniquely specify a vector. In the case of functions, we need the infinitely many "components" f(x), x ∈ [0, L], for a full characterization.

In the following, we aim to develop some intuition for the "linear algebra" of function spaces. To this end, it will be useful to think of V as the limit of finite-dimensional vector spaces. Let x_i = iδ, δ = L/N, be a discretization of the interval [0, L] into N segments. Provided N is sufficiently large, the data set f_N ≡ (f₁, ..., f_N), f_i ≡ f(x_i), will be an accurate representative of a (smooth) function f (see the figure for a discretization of a smooth function into N = 30 data points). We think of the N-component complex vectors f_N generated in this way as elements of an N-dimensional complex vector space V_N. Heuristically, it should be clear that in the limit of infinitely fine discretization, N → ∞, the vectors f_N carry the full information on their reference functions, and V_N approaches a representative of the function space, "lim_{N→∞} V_N = V".[5]

[5] The quotes are quite appropriate here. Many objects standard in linear algebra – traces, determinants, etc. – do not afford a straightforward extrapolation to the infinite-dimensional case. However, in the present context, the ensuing structures – the subject of functional analysis – do not play an essential role, and a naive approach will be sufficient.

On V_N, we may define a complex scalar product ⟨ , ⟩: V_N × V_N → C through[6]

  ⟨f, g⟩ ≡ (1/N) Σ_{i=1}^N f_i* g_i.

Here, the factor N⁻¹ has been included to keep the scalar product roughly constant as N increases. In the limit N → ∞, the operation ⟨ , ⟩ evolves into a scalar product on the function space V: to two functions f, g ∈ V, we assign

  ⟨f, g⟩ ≡ lim_{N→∞} (1/N) Σ_i f_i* g_i = ∫₀^L dx f*(x) g(x).

[6] If no confusion is possible, we omit the subscript N in f_N.

Corresponding to the scalar product on V_N, we have a natural orthonormal basis {e_i | i = 1, ..., N}, where e_i = (0, ..., 0, N, 0, ...) and the entry N sits at the i-th position. With this definition, we have ⟨e_i, e_j⟩ = N δ_ij. The N-normalization of the basis vectors e_i implies that the coefficients of a general vector f are simply obtained as

  f_i = ⟨e_i, f⟩.                     (1.33)

What is the N → ∞ limit of these expressions? At N → ∞, the vectors e_i become representatives of functions that are "infinitely sharply peaked" around a single point (see the figure above). Increasing N, and choosing a sequence of indices i that corresponds to a fixed point y ∈ [0, L],[7] we denote this function by δ_y. Equation (1.33) then asymptotes to

  f(y) = ⟨δ_y, f⟩ = ∫₀^L dx δ_y(x) f(x),

where we noted that δ_y is real. This equation identifies δ_y(x) = δ(y − x) as the δ-function.[8]

[7] For example, i = [Ny/L], where [x] is the largest integer smaller than x ∈ R.
[8] Strictly speaking, δ_y is not a "function"; rather, it is what in mathematics is called a distribution. However, we will continue to use the loose terminology "δ-function".
This consideration identifies the δ-functions {δ_y | y ∈ [0, L]} as the function space analog of the standard basis associated with the canonical scalar product. Notice that there are "infinitely many" basis vectors. Much like we need to distinguish between vectors v ∈ V_N and the components of vectors, v_i ∈ C, we have to distinguish between functions f ∈ V and their "coefficients" or function values, f(x) = ⟨δ_x, f⟩ ∈ C.

Fourier transformation is a change of basis

Why did we bother to introduce these formal analogies? The point is that, much as in linear algebra, we may consider bases different from the standard one. In fact, there is often good motivation to switch to another basis. Imagine one aims to mathematically describe a problem displaying a degree of "symmetry"; think, for example, of the Poisson equation of a rotationally symmetric charge distribution. It will certainly be a good idea to seek solutions that reflect this symmetry, i.e. one might aim to represent the solution in terms of a basis of functions that show definite behavior under rotations. Specifically, the Fourier transformation is a change to a function basis that behaves conveniently when acted upon by derivative operations.

To start with, let us briefly recall the linear algebra of changes of basis in finite-dimensional vector spaces. Let {w_α | α = 1, ..., N} be a second basis of the space V_N (besides the reference basis {e_i | i = 1, ..., N}). For simplicity, we presume that the new basis is orthonormal with respect to the given scalar product, i.e. ⟨w_α, w_β⟩ = δ_αβ. Under these conditions, any vector v ∈ V_N may be expanded as

  v = Σ_α v′_α w_α.                   (1.34)

Written out for the individual components v_i = ⟨e_i, v⟩, this reads

  v_i = Σ_α v′_α w_{α,i},             (1.35)

where w_{α,i} ≡ ⟨e_i, w_α⟩ are the components of the new basis vectors in the old basis. The coefficients v′_α of the expansion of v in the new basis are obtained by taking the scalar product with w_α: ⟨w_α, v⟩ = Σ_β v′_β ⟨w_α, w_β⟩ = v′_α, or

  v′_α = (1/N) Σ_j w*_{α,j} v_j.      (1.36)

Substitution of this expansion into (1.35) obtains v_i = N⁻¹ Σ_j Σ_α w*_{α,j} w_{α,i} v_j. This relation holds for arbitrary component sets {v_i}, which is only possible if

  δ_ij = (1/N) Σ_α w*_{α,j} w_{α,i}.  (1.37)

This is the finite-dimensional variant of a completeness relation. The completeness relation holds only for orthonormalized bases. However, the converse is also true: any system of vectors {w_α | α = 1, ..., N} satisfying the completeness criterion (1.37) is orthonormal and "complete" in the sense that it can be used to represent arbitrary vectors as in (1.34). We may also write the completeness relation in a less index-oriented notation,

  ⟨e_i, e_j⟩ = Σ_α ⟨e_i, w_α⟩ ⟨w_α, e_j⟩.    (1.38)

EXERCISE Prove that the completeness relation (1.37) represents a criterion for the orthonormality of a basis.
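The discretized plane-wave basis provides a nice test case for these relations. A numpy sketch (N and L are arbitrary choices) checking orthonormality, completeness (1.37), and the expansion formulae for the choice w_{α,i} = e^{ik_α x_i}:

```python
import numpy as np

N, L = 32, 2 * np.pi
x = np.arange(N) * L / N                 # grid points x_i
k = 2 * np.pi * np.arange(N) / L         # N admissible wave numbers k_alpha
W = np.exp(1j * np.outer(k, x))          # W[a, i] = w_{alpha,i} = e^{i k_a x_i}

# orthonormality: <w_a, w_b> = (1/N) sum_i w*_{a,i} w_{b,i} = delta_ab
print(np.allclose(W.conj() @ W.T / N, np.eye(N)))

# completeness, Eq. (1.37): (1/N) sum_a w*_{a,j} w_{a,i} = delta_ij
print(np.allclose(W.conj().T @ W / N, np.eye(N)))

# expansion: coefficients v'_a from Eq. (1.36) reconstruct v via Eq. (1.35)
v = np.random.randn(N) + 1j * np.random.randn(N)
vp = W.conj() @ v / N
print(np.allclose(W.T @ vp, v))
```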
Let us now turn to the infinite-dimensional generalization of these concepts. The inverse Fourier transform (1.20) states that an arbitrary element f ∈ V of function space (a "vector") can be expanded in a basis of functions {e_k ∈ V | k ∈ 2πL⁻¹Z}. The function values ("components in the standard basis") of these candidate basis functions are given by

  e_k(x) = ⟨δ_x, e_k⟩ = e^{ikx}.

The functions e_k are properly orthonormalized (cf. Eq. (1.21)):[9]

  ⟨e_k, e_k′⟩ = ∫₀^L dx e_k*(x) e_k′(x) = L δ_{kk′}.

[9] However, this by itself does not yet prove their completeness as a basis!

The inverse Fourier transform states that an arbitrary function can be expanded as

  f = (1/L) Σ_k f_k e_k.              (1.39)

This is the analog of (1.34). In components,

  f(x) = (1/L) Σ_k f_k e_k(x),        (1.40)

which is the analog of (1.35). Substitution of e_k(x) = e^{ikx} gives (1.20). The coefficients can be obtained by taking the scalar product ⟨e_k, ·⟩ of (1.39), i.e.

  f_k = ⟨e_k, f⟩ = ∫₀^L dx e^{−ikx} f(x),

which is the Fourier transform. Finally, completeness holds if the substitution of these coefficients into the inverse transform (1.40) reproduces the function f(x). Since this must hold for arbitrary functions, we are led to the condition

  (1/L) Σ_k e^{ikx} e^{−iky} = δ(x − y).

The abstract representation of this relation (i.e. the analog of (1.38)) reads (check it!)

  ⟨δ_x, δ_y⟩ = (1/L) Σ_k ⟨δ_x, e_k⟩ ⟨e_k, δ_y⟩.

For the convenience of the reader, the most important relations revolving around completeness and basis change are summarized in table 1.1. The column "invariant" contains abstract definitions, while "components" refers to the coefficients of vectors/function values. (For better readability, the normalization factors N of our specific definition of the scalar product are omitted in the table.)

                  | vector space                          | function space
                  | invariant          | components       | invariant           | components
  elements        | vector v           | v_i              | function f          | value f(x)
  scalar product  | ⟨v,w⟩              | v_i* w_i         | ⟨f,g⟩               | ∫dx f*(x) g(x)
  standard basis  | e_i                | e_{i,j} = δ_ij   | δ_x                 | δ_x(y) = δ(x−y)
  alternate basis | w_α                | w_{α,i}          | e_k                 | e_k(x) = e^{ikx}
  orthonormality  | ⟨w_α,w_β⟩ = δ_αβ   | w*_{α,i} w_{β,i} = δ_αβ | ⟨e_k,e_k′⟩ = Lδ_{kk′} | ∫dx e^{i(k′−k)x} = Lδ_{kk′}
  expansion       | v = v′_α w_α       | v_i = v′_α w_{α,i} | f = (1/L)Σ_k f_k e_k | f(x) = (1/L)Σ_k f_k e^{ikx}
  coefficients    | v′_α = ⟨w_α,v⟩     | v′_α = w*_{α,i} v_i | f_k = ⟨e_k,f⟩     | f_k = ∫dx e^{−ikx} f(x)
  completeness    | ⟨e_i,e_j⟩ = ⟨e_i,w_α⟩⟨w_α,e_j⟩ | δ_ij = w*_{α,i} w_{α,j} | ⟨δ_x,δ_y⟩ = (1/L)Σ_k ⟨δ_x,e_k⟩⟨e_k,δ_y⟩ | δ(x−y) = (1/L)Σ_k e^{ik(x−y)}

Table 1.1: Summary of the linear algebraic interpretation of basis changes in function space. (In all formulas, a double-index summation convention is implied.)

With this background, we are in a position to formulate the motivation for Fourier transformation from a different perspective. Above, we argued that "Fourier transformation transforms derivatives into something simple". The derivative operation ∂_x: V → V can be viewed as a linear operator in function space: derivatives map functions onto functions, f → ∂_x f, and ∂_x(af + bg) = a ∂_x f + b ∂_x g, a, b ∈ C, the defining criterion for a linear map. In linear algebra, we learn that given a linear operator A: V_N → V_N, it may be a good idea to look at its eigenvectors w_α and eigenvalues λ_α, i.e. A w_α = λ_α w_α. In the specific case where A is hermitean, ⟨v, Aw⟩ = ⟨Av, w⟩ for arbitrary v, w, we even know that the set of eigenvectors {w_α} can be arranged to form an orthonormal basis. These structures extend to function space. Specifically, consider the operator −i∂_x, where the factor (−i) has been added for convenience. A straightforward integration by parts shows that this operator is hermitean:

  ⟨f, (−i∂_x)g⟩ = ∫ dx f*(x)(−i∂_x)g(x) = ∫ dx ((−i∂_x f)(x))* g(x) = ⟨(−i∂_x)f, g⟩.

Second, we have (−i∂_x e_k)(x) = −i∂_x exp(ikx) = k exp(ikx) = k e_k(x), or (−i∂_x) e_k = k e_k for short.
This demonstrates that the exponential functions exp(ikx) are eigenfunctions of the derivative operator, with k the corresponding eigenvalue. This observation implies the conclusion: Fourier transformation is an expansion of functions in the "eigenbasis" of the derivative operator, i.e. the basis of exponential functions {e_k}. This expansion is appropriate whenever we need a "simple" representation of the derivative operator. In the sections to follow, we will explore how Fourier transformation works in practice.

1.4 Electromagnetic waves in vacuum

As a warmup to our discussion of the full problem posed by the solution of Eqs. (1.13), we consider the vacuum problem, i.e. a situation where no sources are present, j^μ = 0. Taking the curl of the second of Eqs. (1.13) and using (1.8), we then find

  (Δ − (1/c²) ∂_t²) B = 0.

Similarly, adding to the gradient of the first equation c⁻¹ times the time derivative of the second, and using Eq. (1.9), we obtain

  (Δ − (1/c²) ∂_t²) E = 0,

i.e. in vacuum both the electric field and the magnetic field obey homogeneous wave equations.

1.4.1 Solution of the homogeneous wave equations

The homogeneous wave equations are conveniently solved by Fourier transformation. To this end, we define a four-dimensional variant of Eq. (1.27),

  f(t, x) = ∫ (d³k dω/(2π)⁴) f(ω, k) e^{ik·x − iωt},
  f(ω, k) = ∫ d³x dt f(t, x) e^{−ik·x + iωt}.    (1.41)

The one difference to Eq. (1.27) is that the sign convention in the exponent of the (ω/t) sector of the transform differs from that in the (k, x) sector. We next subject the homogeneous wave equation

  (Δ − (1/c²) ∂_t²) ψ(t, x) = 0

(where ψ is meant to represent any of the components of the E- or B-field) to this transformation and obtain

  (k² − (ω/c)²) ψ(ω, k) = 0,

where k ≡ |k|. Evidently, the solution ψ must vanish for all values (ω, k) except those for which the factor k² − (ω/c)² = 0. We may thus write

  ψ(ω, k) = 2π c₊(k) δ(ω − kc) + 2π c₋(k) δ(ω + kc),

where c_± ∈ C are arbitrary complex functions of the wave vector k. Substituting this representation into the inverse transform, we obtain the general solution of the scalar homogeneous wave equation

  ψ(t, x) = ∫ (d³k/(2π)³) [c₊(k) e^{i(k·x − ckt)} + c₋(k) e^{i(k·x + ckt)}].    (1.42)

In words:

The general solution of the homogeneous wave equation is obtained by linear superposition of elementary plane waves, e^{i(k·x ∓ ckt)}, where each wave is weighted with an arbitrary coefficient c_±(k).

The elementary constituents are called waves because, for any fixed point in space x (or instant of time t), they depend harmonically on the complementary argument, time t (or position vector x). The waves are planar in the sense that for all points in the plane fixed by the condition k·x = const., the phase of the wave is identical, i.e. the set of points k·x = const. defines a 'wave front' perpendicular to the wave vector k. The spacing between consecutive wave fronts of equal phase arg(e^{i(k·x − ckt)}) is given by Δx = 2πk⁻¹ ≡ λ, where λ is the wavelength and k = 2π/λ the wave number. The temporal oscillation period of the wave fronts is set by T = 2π/(ck).
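A quick symbolic check that the elementary constituents of (1.42) indeed solve the wave equation (a sympy sketch with a generic wave vector):

```python
import sympy as sp

t, c = sp.symbols('t c', positive=True)
x, y, z, k1, k2, k3 = sp.symbols('x y z k1 k2 k3', real=True)

k = sp.Matrix([k1, k2, k3])
r = sp.Matrix([x, y, z])
knorm = sp.sqrt(k.dot(k))

# elementary plane wave e^{i(k.x - c k t)} from Eq. (1.42)
psi = sp.exp(sp.I * (k.dot(r) - c * knorm * t))

laplacian = sum(sp.diff(psi, q, 2) for q in (x, y, z))
print(sp.simplify(laplacian - sp.diff(psi, t, 2) / c**2))   # -> 0
```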
Focusing on a fixed wave vector k, we next generalize our results to the vectorial problem posed by the homogeneous wave equations. Since every component of the fields E and B is subject to its own independent wave equation, we may write down the prototypical solutions

  E(x, t) = E₀ e^{i(k·x − ωt)},   B(x, t) = B₀ e^{i(k·x − ωt)},    (1.43)

where we introduced the abbreviation ω = ck, and E₀, B₀ ∈ C³ are constant coefficient vectors. The Maxwell equations ∇·B = 0 and (in vacuum) ∇·E = 0 imply the condition k·E₀ = k·B₀ = 0, i.e. the coefficient vectors are orthogonal to the wave vector. Waves of this type, oscillating in a direction perpendicular to the wave vector, are called transverse waves. Finally, evaluating Eqs. (1.43) on the law of induction, ∇×E + c⁻¹ ∂_t B = 0, we obtain the additional equation B₀ = n_k × E₀, where n_k ≡ k/k, i.e. the vector B₀ is perpendicular to both k and E₀, and of equal magnitude as E₀. Summarizing, the vectors

  k ⊥ E ⊥ B ⊥ k,   |B| = |E|          (1.44)

form an orthogonal system, and B is uniquely determined by E (and vice versa). At first sight, it may seem that we have been too liberal in formulating the solution (1.43): while the physical electromagnetic field is a real vector field, the solutions (1.43) are manifestly complex. The simple resolution of this conflict is to identify Re E and Re B with the physical fields.[10]

[10] One may ask why, then, we introduced complex notation at all. The reason is that working with exponentiated phases is far more convenient than explicitly having to distinguish between the sin and cos functions that arise after real parts have been taken.

1.4.2 Polarization

In the following, we discuss a number of physically different realizations of plane electromagnetic waves. Since B is uniquely determined by E, we focus attention on the latter. Let us choose a coordinate system such that e₃ ∥ k. We may then write

  E(x, t) = (E₁e₁ + E₂e₂) e^{ik·x − iωt},

where E_i = |E_i| exp(iφ_i). Depending on the choice of the complex coefficients E_i, a number of physically different wave types can be distinguished.

Linearly polarized waves

For identical phases, φ₁ = φ₂ = φ, we obtain

  Re E = (|E₁|e₁ + |E₂|e₂) cos(k·x − ωt + φ),

i.e. a vector field linearly polarized in the direction of the vector |E₁|e₁ + |E₂|e₂. The corresponding magnetic field is given by

  Re B = (|E₁|e₂ − |E₂|e₁) cos(k·x − ωt + φ),

i.e. a vector perpendicular to both E and k, and phase-synchronous with E.

INFO Linear polarization is a hallmark of many artificial light sources; e.g., laser light is usually linearly polarized. Likewise, the radiation emitted by many antennae shows approximately linear polarization. In nature, the light reflected from optically dense media – water surfaces, window panes, etc. – tends to exhibit partial linear polarization.[11]

[11] This phenomenon motivates the application of polarization filters in photography. A polarization filter blocks light of a pre-defined linear polarization. This direction can be adjusted so as to suppress light reflections. As a result, one obtains images exhibiting less glare, more intense colouring, etc.

Circularly polarized waves

Next consider the case of a phase difference, φ₁ = φ₂ ∓ π/2 ≡ φ, and equal amplitudes, |E₁| = |E₂| = E:

  Re E(x, t) = E (e₁ cos(k·x − ωt + φ) ∓ e₂ sin(k·x − ωt + φ)).

Evidently, the tip of the vector E moves on a circle: E is circularly polarized. Regarded as a function of x (for any fixed instant of time), E traces out a spiral whose characteristic period is set by the wavelength λ = 2π/k (see Fig. ??). The sense of orientation of this spiral is determined by the sign of the phase mismatch. For φ₁ = φ₂ + π/2 (φ₁ = φ₂ − π/2), we speak of a right (left) circularly polarized wave, or a wave of positive (negative) helicity.

INFO As with classical point particles, electromagnetic fields can be subjected to a quantization procedure.
In quantum electrodynamics, it is shown that the quanta of the electromagnetic field, the photons, carry definite helicity, i.e. they represent the minimal quanta of circularly polarized waves.

Elliptically polarized waves

Circular and linear polarization represent limiting cases of a more general form of polarization. Indeed, the minimal geometric structure capable of continuously interpolating between a line segment and a circle is the ellipse. To conveniently see the appearance of ellipses, consider the basis change

  e_± = (1/√2)(e₁ ± ie₂),

and represent the electric field as

  E(x, t) = (E₊e₊ + E₋e₋) e^{ik·x − iωt}.

In this representation, a circularly polarized wave corresponds to the limit E₊ = 0 (positive helicity) or E₋ = 0 (negative helicity). Linear polarization is obtained by setting E₊ = ±E₋. It is straightforward to verify that for generic values of the ratio |E₋/E₊| ≡ r, one obtains an elliptically polarized wave where the ratio between the major and the minor axis is set by |(1 + r)/(1 − r)|. The tilt angle α of the ellipse w.r.t. the 1-axis is given by α = arg(E₋/E₊)/2.

This concludes our discussion of the polarization of electromagnetic radiation. It is important to keep in mind that the different types of polarization discussed above represent limiting cases of what in general is only partially polarized or completely unpolarized radiation. Radiation of a given frequency ω usually involves the superposition of waves of different wave vectors k (at fixed wave number k = |k| = ωc⁻¹). Only if the amplitudes of all partial waves share a definite phase/amplitude relation do we obtain a polarized signal. The degree of polarization of a wave can be determined by computing the so-called Stokes parameters. However, we will not discuss this concept in more detail in this text.
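The statements about axis ratio and tilt angle are easy to test numerically: trace the tip of Re E over one period for given amplitudes E₊, E₋ and read off the ellipse. A numpy sketch (the amplitudes are arbitrary test values):

```python
import numpy as np

# complex amplitudes in the helical basis e_± = (e1 ± i*e2)/sqrt(2);
# arbitrary test choices, here r = |E-/E+| = 0.5
Ep, Em = 1.0, 0.5 * np.exp(0.8j)

theta = np.linspace(0, 2 * np.pi, 20001)   # phase k.x - w*t over one period
# cartesian components of Re E for E = (Ep*e_+ + Em*e_-) e^{i theta}
E1 = np.real((Ep + Em) * np.exp(1j * theta)) / np.sqrt(2)
E2 = np.real(1j * (Ep - Em) * np.exp(1j * theta)) / np.sqrt(2)

rho = np.hypot(E1, E2)                     # distance of the field tip from origin
r = abs(Em / Ep)
print(rho.max() / rho.min(), abs((1 + r) / (1 - r)))    # axis ratio: agree

alpha = np.arctan2(E2, E1)[np.argmax(rho)] % np.pi      # direction of major axis
print(alpha, (np.angle(Em / Ep) / 2) % np.pi)           # tilt angle: agree
```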
This concludes our discussion of the polarization of electromagnetic radiation. It is important to keep in mind that the different types of polarization discussed above represent limiting cases of what in general is only partially polarized or completely un-polarized radiation. Radiation of a given frequency, ω, usually involves the superposition of waves of different wave vector k (at fixed wave number k = |k| = ωc⁻¹). Only if the amplitudes of all partial waves share a definite phase/amplitude relation do we obtain a polarized signal. The degree of polarization of a wave can be determined by computing the so-called Stokes parameters. However, we will not discuss this concept in more detail in this text.

1.5 Green function of electrodynamics

To prepare our discussion of the full problem (1.19), let us consider the inhomogeneous scalar wave equation

(Δ − c⁻²∂t²) f = −4πg,   (1.45)

where the inhomogeneity g and the solution f are scalar functions. As with the Poisson equation (??), the redeeming feature of the wave equation is its linearity. We may, therefore, again employ the concept of Green functions to simplify the solution of the problem.

INFO It may be worthwhile to rephrase the concept of the solution of linear differential equations by Green function methods in the present dynamical context: think of a generic inhomogeneity 4πg(x, t) as an additive superposition of δ-inhomogeneities, δ₍y,s₎(x, t) ≡ δ(x − y)δ(t − s), concentrated in space and time around y and s, respectively (a finite discretization of space-time is understood):

4πg(x, t) = ∫d³y ds (4πg(y, s)) δ₍y,s₎(x, t).

We may suppress the arguments (x, t) to write this identity as 4πg = ∫d³y ds (4πg(y, s)) δ₍y,s₎. This formula emphasizes the interpretation of g as a weighted sum of the particular functions δ₍y,s₎, with constant "coefficients" (i.e. numbers independent of (x, t)) 4πg(y, s). Denote the solution of the wave equation for an individual point source δ₍y,s₎ by G₍y,s₎. The linearity of the wave equation then means that the solution f of the equation for the inhomogeneity g is given by the correspondingly weighted sum of point source solutions, f = ∫d³y ds (4πg(y, s)) G₍y,s₎. (To check that this is a solution, act with the wave operator Δ − c⁻²∂t² on both sides and use that (Δ − c⁻²∂t²) G₍y,s₎(x, t) = −δ₍y,s₎(x, t) ≡ −δ(x − y)δ(t − s).) Or, reinstating the arguments (x, t),

f(x, t) = 4π ∫d³y ds G(x, t; y, s) g(y, s).   (1.46)

The functions G₍y,s₎(x, t) ≡ G(x, t; y, s) are the Green functions of the wave equation. Knowledge of the Green function is equivalent to a full solution of the general equation, up to the final integration (1.46).

The Green function of the scalar wave equation (1.45), G(x, t; x′, t′), is defined by the differential equation

(Δ − c⁻²∂t²) G(x, t; x′, t′) = −δ(x − x′)δ(t − t′).   (1.47)

Once the solution of this equation has been obtained (which requires specification of a set of boundary conditions), the solution of (1.45) becomes a matter of a straightforward integration:

f(x, t) = 4π ∫d³x′ dt′ G(x, t; x′, t′) g(x′, t′).   (1.48)

Assuming vanishing boundary conditions at infinity, G(x, t; x′, t′) → 0 for |x − x′|, |t − t′| → ∞, we next turn to the solution of Eq. (1.47).

INFO In fact, the most elegant and efficient solution strategy utilizes methods of the theory of complex functions. However, we do not assume familiarity of the reader with this piece of mathematics and will use a more elementary technique. We first note that for the chosen set of boundary conditions, the Green function G(x − x′, t − t′) will depend on the difference of its arguments only. We next Fourier transform Eq. (1.47) in the temporal variable, i.e. we act on the equation with the integral transform f(ω) = (1/2π) ∫dt e^{iωt} f(t) (whose inverse is f(t) = ∫dω e^{−iωt} f(ω)). The temporally transformed equation is given by

(Δ + k²) G(x, ω) = −(1/2π) δ(x),   (1.49)

where we defined k ≡ ω/c. If it were not for the constant k (and the trivial scaling factor (2π)⁻¹ on the r.h.s.), this equation, known in the literature as the Helmholtz equation, would be equivalent to the Poisson equation of electrostatics. Indeed, it is straightforward to verify that the solution of (1.49) is given by

G±(x, ω) = (1/8π²) e^{±ik|x|}/|x|,   (1.50)

where the sign ambiguity needs to be fixed on physical grounds.

INFO To prove Eq. (1.50), we introduce polar coordinates centered around the source point and act with the radial part of the Laplace operator (cf. section ??), Δᵣ = r⁻²∂ᵣr²∂ᵣ = ∂ᵣ² + (2/r)∂ᵣ, on G±(r, ω) = (1/8π²) e^{±ikr}/r. Using Δ(4πr)⁻¹ = −δ(x) (the equation of the Green function of electrostatics), we obtain

(Δ + k²) G±(r, ω) = (1/8π²) [e^{±ikr} Δᵣ(1/r) + 2(∂ᵣe^{±ikr})(∂ᵣ r⁻¹) + r⁻¹(∂ᵣ² + 2r⁻¹∂ᵣ + k²) e^{±ikr}]
= −(1/2π) δ(x) + (1/8π²)(∓2ik r⁻² ± 2ik r⁻²) e^{±ikr} = −(1/2π) δ(x),

as required. Undoing the temporal Fourier transform, we obtain

G±(x, t) = ∫dω e^{−iωt} G±(x, ω) = (1/8π²|x|) ∫dω e^{−iω(t ∓ |x|/c)} = (1/4π) δ(t ∓ c⁻¹|x|)/|x|,

or

G±(x − x′, t − t′) = (1/4π) δ(t − t′ ∓ c⁻¹|x − x′|)/|x − x′|.   (1.51)

For reasons to become clear momentarily, we call G⁺ (G⁻) the retarded (advanced) Green function of the wave equation.
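The radial computation above is mechanical enough to hand to a computer algebra system. The following sympy sketch confirms that, away from the origin (where the δ-contribution lives and which a purely symbolic check cannot see), the Helmholtz operator annihilates e^{±ikr}/r:

import sympy as sp

# Radial check of Eq. (1.50): for r > 0, the radial Laplacian applied to
# exp(±ikr)/r cancels exactly against k^2 exp(±ikr)/r.
r, k = sp.symbols('r k', positive=True)
for sign in (+1, -1):
    G = sp.exp(sign*sp.I*k*r)/r
    residue = sp.diff(G, r, 2) + (2/r)*sp.diff(G, r) + k**2*G
    print(sp.simplify(residue))   # prints 0 for both signs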
Figure 1.2: Wave front propagation of the pulse created by a point source at the origin, schematically plotted as a function of two-dimensional space and time. The width of the front is set by δx = cδt, where δt is the duration of the time pulse. Its intensity decays as ∼ x⁻¹.

1.5.1 Physical meaning of the Green function

Retarded Green function

To understand the physical significance of the retarded Green function G⁺, we substitute the r.h.s. of Eq. (1.51) into (1.48) and obtain

f(x, t) = ∫d³x′ g(x′, t − |x − x′|/c)/|x − x′|.   (1.52)

For any fixed instance of space and time, (x, t), the solution f(x, t) is affected by the sources g(x′, t′) at all points in space and fixed earlier times t′ = t − |x − x′|/c. Put differently, a time t − t′ = |x − x′|/c has to pass before the amplitude of the source g(x′, t′) may cause an effect at the observation point x at time t – the signal received at (x, t) is subject to a retardation mechanism. When the signal is received, it is so at a strength g(x′, t′)/|x − x′|, similar to the Coulomb potential in electrostatics. (Indeed, for a time-independent source g(x′, t′) = g(x′), we may enter the wave equation with a time-independent ansatz f(x), whereupon it reduces to the Poisson equation.) Summarizing, the sources act as 'instantaneous Coulomb charges' which are of strength g(x′, t′) and felt at times t = t′ + |x − x′|/c.

EXAMPLE By way of example, consider a point source at the origin which 'broadcasts' a signal for a short instance of time at t′ ≃ 0, g(x′, t′) = δ(x′)F(t′), where the function F is sharply peaked around t′ = 0 and describes the temporal profile of the source. The signal is then given by f(x, t) = F(t − |x|/c)/|x|, i.e. we obtain a pattern of outward-moving spherical waves whose amplitude diminishes as |x|⁻¹ or, equivalently, as ∼ 1/(ct) (see Fig. 1.2).
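A minimal numerical illustration of this example (assuming a Gaussian profile F of width δt and units with c = 1): the pulse registered at increasing distances r arrives at t = r/c with amplitude 1/r, as in Fig. 1.2.

import numpy as np

# Signal f(x,t) = F(t - |x|/c)/|x| of a point source with a Gaussian
# time profile F (assumed example profile).
c, dt = 1.0, 0.05
F = lambda t: np.exp(-t**2/(2*dt**2))
t = np.linspace(0.0, 4.0, 4001)
for r in (1.0, 2.0, 3.0):
    f = F(t - r/c)/r
    print(f"r={r}: peak at t={t[np.argmax(f)]:.2f}, height={f.max():.3f}")
# Output: fronts arriving at t = r/c, damped as 1/r.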
Advanced Green function

Consider now the solution we would have obtained from the advanced Green function, f(x, t) = ∫d³x′ g(x′, t + |x − x′|/c)/|x − x′|. Here, the signal responds to the behaviour of the source in the future. The principle of cause-and-effect, or causality, is violated, implying that the advanced Green function, though mathematically a legitimate solution of the wave equation, does not carry physical meaning. Two more remarks are in order: (a) when solving the wave equation by techniques borrowed from the theory of complex functions, the causality principle is built in from the outset, and the retarded Green function is automatically selected. (b) Although the advanced Green function does not carry immanent physical meaning, it is not a senseless object altogether. However, the utility of this object discloses itself only in quantum theories.

1.5.2 Electromagnetic gauge field and Green functions

So far we have solved the wave equation for an arbitrary scalar source. Let us now specialize to the wave equations (1.19) for the components of the electromagnetic potential, A^µ. To a first approximation, these are four independent scalar wave equations for the four sources j^µ. We may, therefore, just copy the prototypical solution to obtain the retarded potentials of electrodynamics

φ(x, t) = ∫d³x′ ρ(x′, t − |x − x′|/c)/|x − x′|,
A(x, t) = (1/c) ∫d³x′ j(x′, t − |x − x′|/c)/|x − x′|.   (1.53)

INFO There is, however, one important consistency check that needs to be performed: recalling that the wave equations (1.19) hold only in the Lorentz gauge, ∂_µA^µ = 0 ⇔ c⁻¹∂tφ + ∇·A = 0, we need to check that the solutions (1.53) actually meet the Lorentz gauge condition. Relatedly, we have to keep in mind that the sources are not quite independent; they obey the continuity equation ∂_µj^µ = 0 ⇔ ∂tρ + ∇·j = 0.

As in section ??, it will be most convenient to probe the gauge behaviour of the vector potential in a fully developed Fourier language. Also, we will use 4-vector notation throughout. In this notation, the Fourier transformation (1.41) assumes the form

f(k) = ∫ d⁴x/(2π)⁴ f(x) e^{ix^µ k_µ},   (1.54)

where a factor of c⁻¹ has been absorbed in the integration measure and k_µ = (k₀, k) with k₀ = ω/c. (Recall that x⁰ = ct, i.e. k_µx^µ = k₀x⁰ − k·x = ωt − k·x.) The Fourier transform of the scalar wave equation (1.45) becomes k_µk^µ f(k) = −4πg(k). Specifically, the Green function obeys the equation k_µk^µ G(k) = −(2π)⁻⁴, which is solved by G(k) = −((2π)⁴ k_µk^µ)⁻¹. (One may check explicitly that this is the Fourier transform of our solution (1.51); however, for our present purposes there is no need to do so.) The solution of the general scalar wave equation (1.48) obtains by convolution of the Green function and the source,

f(x) = 4π (G ∗ g)(x) ≡ 4π ∫d⁴x′ G(x − x′) g(x′).

Using the convolution theorem, this transforms to f(k) = (2π)⁴ 4π G(k) g(k) = −4πg(k)/(k_µk^µ). Specifically, for the vector potential, we obtain

A^µ(k) = −4π j^µ(k)/(k_νk^ν).   (1.55)

This is all we need to check that the gauge condition is fulfilled: the Fourier transform of the Lorentz gauge relation ∂_µA^µ = 0 is given by k_µA^µ = k^µA_µ = 0. Probing this relation on (1.55), we obtain k_µA^µ ∝ k_µj^µ = 0, where we noted that k_µj^µ = 0 is the continuity relation. We have thus shown that the solution (1.53) conforms with the gauge constraints. As an important byproduct, our proof above reveals an intimate connection between the gauge behaviour of the electromagnetic potential and current conservation.

Eqs. (1.53) generalize Eqs. (??) and (??) to the case of dynamically varying sources. In the next sections, we will explore various aspects of the physical content of these equations.
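The Fourier-space gauge argument can be mimicked numerically: build a (random, complex) current obeying the continuity relation k_µj^µ = 0, form A^µ according to (1.55), and verify k_µA^µ = 0. A sketch (the metric convention and all numbers are illustrative):

import numpy as np

# Check of the gauge condition for Eq. (1.55); signature (+,-,-,-).
rng = np.random.default_rng(0)
eta = np.diag([1.0, -1.0, -1.0, -1.0])
k = np.array([0.7, 0.3, -0.2, 0.5])                 # k_mu k^mu != 0 here
j_sp = rng.normal(size=3) + 1j*rng.normal(size=3)   # random spatial current
j = np.concatenate(([k[1:] @ j_sp / k[0]], j_sp))   # j^0 fixed by continuity
A = -4*np.pi*j/(k @ eta @ k)                        # Eq. (1.55)
print(abs(k @ eta @ A))                             # 0 up to round-off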
1.6 Field energy and momentum

In sections ?? and 1.1 we have seen that the static electric and the static magnetic field, respectively, carry energy. How does the concept of field energy generalize to the dynamical case, where the electric and the magnetic field form an inseparable entity? Naively, one may speculate that the energy, E, carried by an electromagnetic field is given by the sum of its electric and magnetic components, E = (1/8π) ∫d³x (E·D + B·H). As we will show below, this expectation actually turns out to be correct. However, there is more to the electromagnetic field energy than the formula above. For example, we know from daily experience that electromagnetic energy can flow, and that it can be converted to mechanical energy. (Think of a microwave oven, where radiation flows into whatever fast food you put into it, to next deposit its energy into [mechanical] heating.) In section 1.6.1, we explore the balance between bulk electromagnetic energy, energy flow (current), and mechanical energy. In section 1.6.2, we will see that the concept of field energy can be generalized to one of "field momentum". Much like energy, the momentum carried by the electromagnetic field can flow, get converted to mechanical momentum, etc.

1.6.1 Field energy

We begin our discussion of electromagnetic field energy with a few formal operations on the Maxwell equations: multiplying Eq. (1.4) by E· and Eq. (1.5) by −H·, and adding the results to each other, we obtain

E·(∇×H) − H·(∇×E) − (1/c)(E·∂tD + H·∂tB) = (4π/c) j·E.

Using that for general vector fields v and w (check!) ∇·(v×w) = w·(∇×v) − v·(∇×w), as well as E·∂tD = ∂t(E·D)/2 and H·∂tB = ∂t(H·B)/2, this equation can be rewritten as

(1/8π) ∂t(E·D + B·H) + (c/4π) ∇·(E×H) + j·E = 0.   (1.56)

To understand the significance of Eq. (1.56), let us integrate it over a test volume V:

dt ∫_V d³x (1/8π)(E·D + B·H) = −(c/4π) ∫_{S(V)} dσ n·(E×H) − ∫_V d³x j·E,   (1.57)

where use of Gauß's theorem has been made. The first term is the sum of the electric and the magnetic field energy density, as derived earlier for static field configurations. We now interpret this term as the energy stored in general electromagnetic field configurations, and

w ≡ (1/8π)(E·D + B·H)   (1.58)

as the electromagnetic energy density. What is new in electrodynamics is that the field energy may change in time. The r.h.s. of the equation tells us that there are two mechanisms whereby the electromagnetic field energy may be altered. First, there is a surface integral over the so-called Poynting vector field

S ≡ (c/4π) E×H.   (1.59)

We interpret the integral over this vector as the energy current passing through the surface S(V), and the Poynting vector field as the energy current density. Thus, the pair (energy density w)/(Poynting vector S) plays a role similar to the pair (charge density ρ)/(current density j) for the matter field. However, unlike with matter, the energy of the electromagnetic field is not conserved. Rather, the balance equation of electromagnetic field energy above states that

∂tw + ∇·S = −j·E.   (1.60)

The r.h.s. of this equation contains the matter field j, which suggests that it describes the conversion of electromagnetic into mechanical energy. Indeed, the temporal change of the energy of a charged point particle in an electromagnetic field (cf. section 1.1 for a similar line of reasoning) is given by dtU(x(t)) = −F·ẋ = −q(E + c⁻¹v×B)·v = −qE·v, where we observed that the magnetic component of the Lorentz force is perpendicular to the velocity and, therefore, does not do work on the particle. Recalling that the current density of a point particle is given by j = qδ(x − x(t))v, this expression may be rewritten as dtU = −∫d³x j·E. The r.h.s. of Eq. (1.60) is the generalization of this expression to arbitrary current densities. Energy conservation implies that the work done on a current of charged particles has to be taken from the electromagnetic field. This explains the appearance of the mechanical energy on the r.h.s. of the balance equation (1.60).
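The vector identity invoked in this derivation (the "(check!)") can be delegated to sympy; a sketch for generic smooth fields v, w:

import sympy as sp
from sympy.vector import CoordSys3D, divergence, curl

# Check div(v x w) = w . curl(v) - v . curl(w) for generic fields v, w.
N = CoordSys3D('N')
def field(name):
    comps = [sp.Function(name + c)(N.x, N.y, N.z) for c in 'xyz']
    return comps[0]*N.i + comps[1]*N.j + comps[2]*N.k
v, w = field('v'), field('w')
identity = divergence(v.cross(w)) - (w.dot(curl(v)) - v.dot(curl(w)))
print(sp.simplify(identity))   # prints 0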
INFO Equations of the structure

∂tρ + ∇·j = σ   (1.61)

generally represent continuity equations: suppose we have some quantity – energy, oxygen molecules, voters of some political party – that may be described by a density distribution ρ. (In the case of discrete quantities (the voters, say), ρ(x, t)d³x is the average number of "particles" present in a volume d³x, where the reference volume d³x must be chosen large enough to make this number bigger than unity on average.) The contents of our quantity in a small reference volume d³x may change due to (i) flow of the quantity to somewhere else. Mathematically, this is described by a current density j, where j may be a function of ρ. However, (ii) changes may also occur as a consequence of a conversion of ρ into something else (the chemical reaction of oxygen to carbon dioxide, voters changing their party, etc.). In (1.61), the rate at which this happens is described by σ. If σ equals zero, we speak of a conserved quantity. Any conservation law can be represented in differential form, Eq. (1.61), or in the equivalent integral form

dt ∫_V d³x ρ(x, t) + ∫_S dS·j = ∫_V d³x σ(x, t).   (1.62)

It is useful to memorize the form of the continuity equation. For if a mathematical equation of the form ∂t(function no. 1) + ∇·(vector field) = function no. 2 crosses our way, we know that we have encountered a conservation law, and that the vector field must be "the current" associated to function no. 1. This can be non-trivial information in contexts where the identification of densities/currents from first principles is difficult. For example, the structure of (1.60) identifies w as a density and S as its current.

1.6.2 Field momentum ∗

Unlike in the previous sections on conservation laws, we here identify B = H, D = E, i.e. we consider the vacuum case. (The discussion of the field momentum in matter (cf. ??) turns out to be a delicate matter, wherefore we prefer to stay on the safe ground of the vacuum theory.) As a second example of a physical quantity that can be exchanged between matter and electromagnetic fields, we consider momentum. According to Newton's equations, the change of the total mechanical momentum P_mech carried by the particles inside a volume B is given by the integrated force density, i.e.

dt P_mech = ∫_B d³x f = ∫_B d³x (ρE + (1/c) j×B),

where in the second equality we inserted the Lorentz force density. Using Eqs. (??) and (??) to eliminate the sources, we obtain

dt P_mech = (1/4π) ∫_B d³x ((∇·E)E − B×(∇×B) + c⁻¹ B×Ė).

Now, using that B×Ė = dt(B×E) − Ḃ×E = dt(B×E) − cE×(∇×E), and adding 0 = B(∇·B) to the r.h.s., we obtain the symmetric expression

dt P_mech = (1/4π) ∫_B d³x ((∇·E)E − E×(∇×E) + B(∇·B) − B×(∇×B) + c⁻¹ dt(B×E)),

which may be reorganized as

dt (P_mech − (1/4πc) ∫_B d³x B×E) = (1/4π) ∫_B d³x ((∇·E)E − E×(∇×E) + B(∇·B) − B×(∇×B)).

This equation is of the form dt(something) = (something else). Comparing to our earlier discussion of conservation laws, we are led to interpret the 'something' on the l.h.s. as a conserved quantity. Presently, this quantity is the sum of the total mechanical momentum P_mech and the integral

P_field = ∫d³x g,   g ≡ (1/4πc) E×B = S/c².   (1.63)

The structure of this expression suggests to interpret P_field as the momentum carried by the electromagnetic field and g as the momentum density (which happens to be given by c⁻² times the Poynting vector). If our tentative interpretation of the equation above as a conservation law is to make sense, we must be able to identify its r.h.s. as a surface integral. This in turn requires that the components of the (vector-valued) integrand be representable as Xⱼ = ∂ᵢTᵢⱼ, i.e. as the divergence of a vector field Tⱼ with components Tᵢⱼ. (Here, j = 1, 2, 3 plays the role of a spectator index.) If this is the case, we may, indeed, transform the integral to a surface integral, ∫_B d³x Xⱼ = ∫_B d³x ∂ᵢTᵢⱼ = ∫_{∂B} dσ nᵢTᵢⱼ. Indeed, it is not difficult to verify that

[(∇·X)X − X×(∇×X)]ⱼ = ∂ᵢ (XᵢXⱼ − (δᵢⱼ/2) X·X).
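This identity is tedious by hand but straightforward to confirm symbolically; a sympy sketch in explicit index notation:

import sympy as sp

# Check [(div X)X - X x (curl X)]_j = d_i(X_i X_j - (1/2) delta_ij X.X)
# for a generic field X(x1, x2, x3).
xs = sp.symbols('x1:4')
X = [sp.Function(f'X{i}')(*xs) for i in range(3)]
d = lambda f, i: sp.diff(f, xs[i])
divX = sum(d(X[i], i) for i in range(3))
curlX = [d(X[(i+2) % 3], (i+1) % 3) - d(X[(i+1) % 3], (i+2) % 3)
         for i in range(3)]
for j in range(3):
    lhs = divX*X[j] - (X[(j+1) % 3]*curlX[(j+2) % 3]
                       - X[(j+2) % 3]*curlX[(j+1) % 3])
    Tcol = [X[i]*X[j] - sp.Rational(1, 2)*sp.KroneckerDelta(i, j)
            * sum(Xk**2 for Xk in X) for i in range(3)]
    rhs = sum(d(Tcol[i], i) for i in range(3))
    print(sp.simplify(lhs - rhs))   # prints 0 three times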
Identifying X = E, B and introducing the components of the Maxwell stress tensor as

Tᵢⱼ = (1/4π) (EᵢEⱼ + BᵢBⱼ − (δᵢⱼ/2)(E·E + B·B)),   (1.64)

the r.h.s. of the conservation law assumes the form ∫_B d³x ∂ᵢTᵢⱼ = ∫_{∂B} dσ nᵢTᵢⱼ, where nᵢ are the components of the normal vector field of the system boundary. The law of the conservation of momentum thus assumes the final form

dt (P_mech + P_field)ⱼ = ∫_{∂B} dσ nᵢTᵢⱼ.   (1.65)

Physically, dσ nᵢTᵢⱼ dt is the (jth component of the) momentum that gets pushed through dσ in time dt. Thus, dσ nᵢTᵢⱼ is the momentum per time, or force, exerted on dσ, and nᵢTᵢⱼ the force per area, or radiation pressure, due to the change of linear momentum in the system.

It is straightforward to generalize the discussion above to the conservation of angular momentum: the angular momentum carried by a mechanical system of charged particles may be converted into angular momentum of the electromagnetic field. It is evident from Eq. (1.63) that the angular momentum density of the field is given by

l = x×g = (1/4πc) x×(E×B).

However, in this course we will not discuss angular momentum conservation any further.

1.6.3 Energy and momentum of plane electromagnetic waves

Consider a plane wave in vacuum. Assuming that the wave propagates in 3-direction, the physical electric field, E_phys, is given by

E_phys(x, t) = Re E(x, t) = Re (E₁e₁ + E₂e₂) e^{i(kx₃ − ωt)} = r₁u₁(x₃, t) e₁ + r₂u₂(x₃, t) e₂,

where uᵢ(x₃, t) = cos(φᵢ + kx₃ − ωt) and we defined Eᵢ = rᵢ exp(iφᵢ) with real rᵢ. Similarly, the magnetic field B_phys is given by

B_phys(x, t) = Re (e₃ × E(x, t)) = r₁u₁(x₃, t) e₂ − r₂u₂(x₃, t) e₁.

From these relations we obtain the energy density and Poynting vector as

w(x, t) = (1/8π)(E² + B²) = (1/4π)((r₁u₁)² + (r₂u₂)²)(x₃, t),
S(x, t) = c w(x, t) e₃,

where we omitted the subscript 'phys.' for notational simplicity.

EXERCISE Check that w and S above comply with the conservation law (1.60).
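A finite-difference version of this exercise (a sketch; units with c = 1, arbitrary example amplitudes r₁, r₂ and phases, and j = 0 in vacuum):

import numpy as np

# For the plane wave above: dt w + div S = 0, with S = c w e_3.
c = k = om = 1.0                        # om = c k
r1, r2, p1, p2 = 1.0, 0.7, 0.3, 1.1
u = lambda p, x3, t: np.cos(p + k*x3 - om*t)
w = lambda x3, t: ((r1*u(p1, x3, t))**2 + (r2*u(p2, x3, t))**2)/(4*np.pi)
x3, t, h = 0.4, 0.9, 1e-5
dt_w = (w(x3, t+h) - w(x3, t-h))/(2*h)
div_S = c*(w(x3+h, t) - w(x3-h, t))/(2*h)   # only the x3 derivative survives
print(dt_w + div_S)                          # ~0 (up to O(h^2) error)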
1.7 Electromagnetic radiation

To better understand the physics of the expressions (1.53), let us consider the case where the sources are confined to within a region in space of typical extension d. Without loss of generality, we assume the time dependence of the sources to be harmonic, with a certain characteristic frequency ω, j^µ(x, t) = j^µ(x) exp(−iωt). (The signal generated by sources of more general time dependence can always be obtained by superposition of harmonic signals.) As we shall see, all other quantities of relevance to us (potentials, fields, etc.) will inherit the same time dependence, X(x, t) = X(x) exp(−iωt). Specifically, Eq. (1.53) implies that the electromagnetic potentials, too, oscillate in time, A^µ(x, t) = A^µ(x) exp(−iωt). Substituting this ansatz into Eq. (1.53), we obtain A^µ(x) = (1/c) ∫d³x′ j^µ(x′) e^{ik|x−x′|}/|x − x′|.

As in our earlier analyses of multipole fields, we assume that the observation point x is far away from the source, r = |x| ≫ d. Under these circumstances, we may focus attention on the spatial components of the vector potential,

A(x) = (1/c) ∫d³x′ j(x′) e^{ik|x−x′|}/|x − x′|,   (1.66)

where we have substituted the time-oscillatory ansatz for sources and potential into (1.53) and divided out the time-dependent phases. From (1.66), the magnetic and electric fields are obtained as

B = ∇×A,   E = (i/k) ∇×(∇×A),   (1.67)

where in the second identity we used the source-free variant of the law of magnetic circulation, c⁻¹∂tE = −ikE = ∇×B. We may now use the smallness of the ratio d/r to expand |x − x′| ≃ r − n·x′ + …, where n is the unit vector in x-direction. Substituting this expansion into (1.66) and expanding the result in powers of r⁻¹, we obtain a series similar in spirit to the electric and magnetic multipole series discussed above. For simplicity, we here focus attention on the dominant contribution to the series, obtained by approximating |x − x′| ≃ r. For reasons to become clear momentarily, this term,

A(x) ≃ (e^{ikr}/cr) ∫d³x′ j(x′),   (1.68)

generates electric dipole radiation. In section ?? we have seen that for a static current distribution, the integral ∫d³x j vanishes. However, for a dynamic distribution, we may engage the Fourier transform of the continuity relation, iωρ = ∇·j, to obtain

∫d³x jᵢ = ∫d³x (∇xᵢ)·j = −∫d³x xᵢ (∇·j) = −iω ∫d³x xᵢ ρ.

Substituting this result into (1.68) and recalling the definition of the electric dipole moment of a charge distribution, d ≡ ∫d³x x ρ(x), we conclude that

A(x) ≃ −ikd e^{ikr}/r   (1.69)

is indeed controlled by the electric dipole moment. Besides d and r, another characteristic length scale in the problem is the wavelength λ ≡ 2π/k = 2πc/ω. For simplicity, we assume that the wavelength λ ≫ d is much larger than the extent of the source.

INFO This latter condition is usually met in practice. E.g. for radiation in the MHz range, ω ∼ 10⁶ s⁻¹, the wavelength is given by λ = 3·10⁸ m s⁻¹/10⁶ s⁻¹ ≃ 300 m, much larger than the extension of typical antennae.

We then need to distinguish between three different regimes (see Fig. ??): the near zone, d ≪ r ≪ λ, the intermediate zone, d ≪ λ ∼ r, and the far zone, d ≪ λ ≪ r. We next discuss these regions in turn.

Near zone

For r ≪ λ, or kr ≪ 1, the exponential in (1.68) may be approximated by unity and we obtain

A(x, t) ≃ −i (k/r) d e^{−iωt},

where we have re-introduced the time dependence of the sources. Using Eq. (1.67) to compute the electric and magnetic field, we get

B(x, t) = i (k/r²) n×d e^{−iωt},
E(x, t) = ((3n(n·d) − d)/r³) e^{−iωt}.

The electric field equals the dipole field (??) created by a dipole with time-dependent moment d exp(−iωt) (cf. the figure, where the numbers on the ordinate indicate the separation from the radiation center in units of the wavelength and the colour coding is a measure of the field intensity). This motivates the name 'electric dipole radiation'. The magnetic field is by a factor kr ≪ 1 smaller than the electric field, i.e. in the near zone the electromagnetic field is dominantly electric. In the limit k → 0, the magnetic field vanishes and the electric field reduces to that of a static dipole.

The analysis of the intermediate zone, kr ∼ 1, is complicated inasmuch as all powers in an expansion of the exponential in (1.68) in kr must be retained. For a discussion of the resulting field distribution, we refer to [1].

Far zone

For kr ≫ 1, it suffices to let the derivative operations in (1.67) act on the argument kr of the exponential function only. Carrying out the derivatives, we obtain

B = k² n×d e^{i(kr−ωt)}/r,   E = −n×B.   (1.70)
These asymptotic field distributions have much in common with the vacuum electromagnetic fields above. Indeed, one would expect that far away from the source (i.e. many wavelengths away from the source), the electromagnetic field resembles a spherical wave (by which we mean that the surfaces of constant phase form spheres). For observation points sufficiently far from the center of the wave, its curvature will be nearly invisible, and the wave will look approximately planar. These expectations are met by the field distributions (1.70): neglecting all derivatives other than those acting on the combination kr (i.e. neglecting corrections of O((kr)⁻¹)), the components of E and B obey the wave equation, i.e. the Maxwell equations in vacuum. (Indeed, one may verify (do it!) that the characteristic factors r⁻¹e^{i(kr−ωt)} are exact solutions of the wave equation; they describe the space-time profile of a spherical wave.) The wave fronts are spheres, outwardly propagating at speed c. The vectors n ⊥ E ⊥ B ⊥ n form an orthogonal set, as we saw is characteristic for a vacuum plane wave.

Finally, it is instructive to compute the energy current carried by the wave. To this end, we recall that the physically realized values of the electromagnetic field obtain by taking the real part of (1.70). We may then compute the Poynting vector (1.59) as

S = (ck⁴/4πr²) n (d² − (n·d)²) cos²(kr − ωt)  →  (ck⁴/8πr²) n (d² − (n·d)²),

where the last expression is the energy current temporally averaged over several oscillation periods ω⁻¹. The energy current is maximal in the plane perpendicular to the dipole moment of the source and decays according to an inverse square law. It is also instructive to compute the total power radiated by the source, i.e. the energy current integrated over spheres of constant radius r. (Recall that the integrated energy current accounts for the change of energy inside the reference volume per time, i.e. the power radiated by the source.) Choosing the z-axis of the coordinate system to be colinear with the dipole moment, we have d² − (n·d)² = d² sin²θ and

P ≡ ∫_S dσ n·S = (ck⁴d²/8π) ∫₀^{2π} dφ ∫₀^π dθ sinθ sin²θ = ck⁴d²/3.

Notice that the radiated power does not depend on the radius of the reference sphere, i.e. the work done by the source is entirely converted into radiation and not, say, into a steadily increasing density of vacuum field energy.
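The angular integral above is readily reproduced numerically; the following sketch (illustrative parameter values) also makes the r-independence of P explicit:

import numpy as np

# Integrate the averaged flux n.S = c k^4 d^2 sin^2(theta)/(8 pi r^2)
# over a sphere of radius r; the result must equal P = c k^4 d^2/3.
c, k, d, r = 1.0, 2.0, 0.5, 7.0
theta = np.linspace(0.0, np.pi, 20001)
dtheta = theta[1] - theta[0]
flux = c*k**4*d**2*np.sin(theta)**2/(8*np.pi*r**2)
P = np.sum(flux*(2*np.pi*r**2)*np.sin(theta))*dtheta   # surface integral
print(P, c*k**4*d**2/3)                                # both ~1.3333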
INFO As a concrete example of a radiation source, let us consider a center-fed linear antenna, i.e. a piece of wire of length a carrying an AC current that is maximal at the center of the wire. We model the current flow by the distribution I(z, t) = I₀(1 − 2|z|a⁻¹) exp(−iωt), where a ≪ λ = 2πc/ω and alignment with the z-axis has been assumed. Using the continuity equation, ∂zI(z, t) + ∂tρ(z, t) = 0, we obtain the charge density (charge per unit length) in the wire as ρ(z, t) = 2iI₀(ωa)⁻¹ sgn(z) exp(−iωt). The dipole moment of the source is thus given by d = e_z ∫_{−a/2}^{a/2} dz z ρ(z) = i e_z I₀a/(2ω), and the radiated power by

P = ((ka)²/12c) I₀².

The coefficient R_rad ≡ (ka)²/12c of the factor I₀² is called the radiation resistance of the antenna. To understand the origin of this designation, notice that [R_rad] = [c⁻¹] indeed has the dimension of a resistance (exercise). Next recall that the power required to drive a current through a conventional resistor R is given by P = UI = RI². Comparison with the expression above suggests to interpret R_rad as the 'resistance' of the antenna. However, this resistance has nothing to do with dissipative energy losses inside the antenna (i.e. the losses that are responsible for the DC resistivity of metals). Rather, work has to be done to feed energy into electromagnetic radiation. This work determines the radiation resistance. Also notice that R_rad ∼ k² ∼ ω², i.e. the radiation losses increase quadratically with frequency. This latter fact is of great technological importance.

Our results above relied on a first-order expansion in the ratio d/r between the extent of the sources and the distance of the observation point. We saw that at this order of the expansion, the source couples to the electromagnetic field by its electric dipole moment. A more sophisticated description of the radiation field may be obtained by driving the expansion in d/r to higher order. E.g. at next order, the electric quadrupole moment and the magnetic dipole moment enter the stage. This leads to the generation of electric quadrupole radiation and magnetic dipole radiation. For an in-depth discussion of these types of radiation we refer to [1].

Chapter 2

Macroscopic electrodynamics

Suppose we want to understand the electromagnetic fields in an environment containing extended pieces of matter. From a purist point of view, we would need to regard the O(10²³) carriers of electric and magnetic moments – electrons, protons, neutrons – comprising the medium as sources. Throughout, we will denote the fields e, …, h created by this system of sources as microscopic fields. (Small characters are used to distinguish the microscopic fields from the effective macroscopic fields to be introduced momentarily.) The 'microscopic Maxwell equations' read as

∇·e = 4πρµ,
∇×b − (1/c)∂t e = (4π/c) jµ,
∇×e + (1/c)∂t b = 0,
∇·b = 0,

where ρµ and jµ are the microscopic charge and current density, respectively. Now, for several reasons, it does not make much sense to consider these equations as they stand: First, it is clear that attempts to get a system of O(10²³) dynamical sources under control are bound to fail. Second, the dynamics of the microscopic sources is governed by quantum effects and cannot be described in terms of a classical vector field j. Finally, we aren't even interested in knowing the microscopic fields. Rather, (in classical electrodynamics) we want to understand the behaviour of fields on 'classical' length scales, which generally exceed the atomic scales by far. (For example, the wavelength of visible light is about 600 nm, whilst the typical extension of a molecule is 0.1 nm. Thus, 'classical' length scales of interest are about three to four orders of magnitude larger than the microscopic scales.) In the following, we develop a more praxis-oriented theory of electromagnetism in matter. We aim to lump the influence of the microscopic degrees of freedom of the matter environment into a few effective modifiers of the Maxwell equations.

Figure 2.1: Schematic of the spatially averaged system of sources. A zoom into a solid (left) reveals a pattern of ions or atoms or molecules (center left). Individual ions/atoms/molecules centered at lattice coordinate x_m may carry a net charge q_m and a dipole element d_m (center right). A dipole element forms if sub-atomic constituents (electrons/nuclei) are shifted asymmetrically w.r.t. the center coordinate. In an ionic system, mobile electrons (at coordinates xᵢ) contribute to the charge balance.
2.1 Macroscopic Maxwell equations

Let us, then, introduce macroscopic fields by averaging the microscopic fields as E ≡ ⟨e⟩, B ≡ ⟨b⟩, where the averaging procedure is defined by ⟨f(x)⟩ ≡ ∫d³x′ f(x − x′) g(x′), and g is a weight function that is unit-normalized, ∫d³x g(x) = 1, and decays over sufficiently large regions in space. Since the averaging procedure commutes with taking derivatives w.r.t. both space and time, ∂_{t,x}E = ⟨∂_{t,x}e⟩ (and the same for B), the averaged Maxwell equations read as

∇·E = 4π⟨ρµ⟩,
∇×B − (1/c)∂t E = (4π/c)⟨jµ⟩,
∇×E + (1/c)∂t B = 0,
∇·B = 0.

2.1.1 Averaged sources

To better understand the effect of the averaging on the sources, we need to take a (superficial) look at the atomic structure of a typical solid. Generally, two different types of charges in solids have to be distinguished: First, there are charges – nuclei and valence electrons – that are displaced by no more than a vector a_{m,j} from the center coordinate x_m of a molecule (or atom, for that matter). Second, (in metallic systems) there are free conduction electrons, which may abide at arbitrary coordinates xᵢ(t) in the system. Denoting the two contributions by ρ_b and ρ_f, respectively, we have ρµ = ρ_b + ρ_f, where ρ_b(x, t) = Σ_{m,j} q_{m,j} δ(x − x_m(t) − a_{m,j}(t)) and ρ_f = q_e Σᵢ δ(x − xᵢ(t)). Here, q_{m,j} is the charge of the jth constituent of the mth molecule and q_e the electron charge.

Averaged charge density

The density of the bound charges, averaged over macroscopic proportions, is given by

⟨ρ_b(x)⟩ = ∫d³x′ g(x′) Σ_{m,j} q_{m,j} δ(x − x′ − x_m − a_{m,j}) = Σ_{m,j} g(x − x_m − a_{m,j}) q_{m,j}
≃ Σ_m q_m g(x − x_m) − ∇·Σ_m g(x − x_m) d_m = Σ_m q_m ⟨δ(x − x_m)⟩ − ∇·P(x).

In the second line, we used the fact that the range over which the weight function g changes exceeds atomic extensions by far to Taylor expand to first order in the offsets a. The zeroth-order term of the expansion is oblivious to the molecular structure, i.e. it contains only the total charge q_m = Σ_j q_{m,j} of the molecules and their center positions. The first-order term contains the molecular dipole moments, d_m ≡ Σ_j q_{m,j} a_{m,j}. By the symbol

P(x, t) ≡ ⟨Σ_m δ(x − x_m(t)) d_m(t)⟩   (2.1)

we denote the average polarization of the medium. (Here, we allow for dynamical atomic or molecular center coordinates, x_m = x_m(t). In this way, the definition of P becomes general enough to encompass non-crystalline or liquid substances in which atoms/molecules are not tied to a crystalline matrix.) Evidently, P is a measure of the density of the dipole moments carried by the molecules in the system.

The average density of the mobile carriers in the system is given by ⟨ρ_f(x, t)⟩ = ⟨q_e Σᵢ δ(x − xᵢ(t))⟩, so that we obtain the average charge density as

⟨ρµ(x, t)⟩ = ρ_av(x, t) − ∇·P(x, t),   (2.2)

where we defined the effective or macroscopic charge density in the system,

ρ_av(x, t) ≡ ⟨Σ_m q_m δ(x − x_m(t)) + q_e Σᵢ δ(x − xᵢ(t))⟩.   (2.3)

Notice that the molecules/atoms present in the medium enter the quantity ρ_av(x, t) as point-like entities. Their finite polarizability is accounted for by the second term contributing to ⟨ρµ⟩. Also notice that most solids are electrically neutral on average, i.e. the positive charge of ions accounted for in the first term in (2.3) cancels against the negative charge of mobile electrons (the second term), meaning that ρ_av(x, t) = 0. Under these circumstances,

⟨ρµ(x, t)⟩ = −∇·P(x, t),   (2.4)

where P is given by the averaged polarization (2.1).
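A one-dimensional caricature of this computation can be run numerically (all numbers below are illustrative): smoothing a finite chain of point dipoles over a scale large compared to the dipole size reproduces −∂ₓP in the bulk and leaves uncompensated charge layers at the ends of the chain, anticipating the surface-charge discussion below.

import numpy as np

# Chain of point dipoles: charges +q at x_m + a/2 and -q at x_m - a/2.
n, dx = 6000, 0.01                          # grid covering a length of 60
q, a = 1.0, 0.1
rho_mic, p_mic = np.zeros(n), np.zeros(n)
for xm in np.arange(10.0, 50.0, 1.0):       # molecular positions x_m
    rho_mic[int(round((xm + a/2)/dx))] += q/dx
    rho_mic[int(round((xm - a/2)/dx))] -= q/dx
    p_mic[int(round(xm/dx))] += q*a/dx      # dipole moment q*a at x_m
xg = (np.arange(n) - n//2)*dx
g = np.exp(-xg**2/(2*3.0**2)); g /= g.sum()*dx    # weight, range 3 >> a
smooth = lambda f: np.convolve(f, g, mode='same')*dx
rho_avg = smooth(rho_mic)                   # <rho_b>
minus_dP = -np.gradient(smooth(p_mic), dx)  # -dP/dx
print(np.max(np.abs(rho_avg - minus_dP)))   # tiny compared to the peaks:
print(rho_avg.max(), rho_avg.min())         # charge layers at the two ends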
The electric field E in the medium is caused by the sum of the averaged microscopic density, ⟨ρµ⟩, and an 'external' charge density ρ,

∇·E = 4π(ρ + ⟨ρµ⟩),   (2.5)

where ρ is the external charge density. (For example, for a macroscopic system placed into a capacitor, the role of the external charges will be taken by the charged capacitor plates, etc.)

Averaged current density

Averaging the current density proceeds in much the same way as the charge density procedure outlined above. As the result of a somewhat more tedious calculation (see the info block below), we obtain

⟨jµ⟩ ≃ j_av + Ṗ + c∇×M,   (2.6)

where

M(x) = ⟨Σ_m δ(x − x_m) (1/2c) Σ_j q_{m,j} a_{m,j} × ȧ_{m,j}⟩   (2.7)

is the average density of magnetic dipole moments in the system. The quantity

j_av ≡ ⟨q_e Σᵢ ẋᵢ δ(x − xᵢ)⟩ + ⟨Σ_m q_m ẋ_m δ(x − x_m)⟩

is the current carried by the free charge carriers and the point-like approximation of the molecules. Much like the average charge density ρ_av, this quantity vanishes in the majority of cases – solids usually do not support non-vanishing current densities on average, i.e. j_av = 0 – meaning that the average of the microscopic current density is given by

⟨jµ⟩ = Ṗ + c∇×M.   (2.8)

In analogy to Eq. (2.5), the magnetic induction has the sum of ⟨jµ⟩ and an 'external' current density, j, as its source:

∇×B − (1/c)∂t E = (4π/c)(⟨jµ⟩ + j).   (2.9)

INFO To compute the average of the microscopic current density, we decompose the current density into a bound and a free part, jµ = j_b + j_f. With j_b(x) = Σ_{m,j} q_{m,j}(ẋ_m + ȧ_{m,j}) δ(x − x_m − a_{m,j}), the former is averaged as

⟨j_b(x)⟩ = ∫d³x′ g(x′) Σ_{m,j} q_{m,j}(ẋ_m + ȧ_{m,j}) δ(x − x′ − x_m − a_{m,j}) = Σ_{m,j} g(x − x_m − a_{m,j})(ẋ_m + ȧ_{m,j}) q_{m,j}
≃ Σ_{m,j} q_{m,j} [g(x − x_m) − ∇g(x − x_m)·a_{m,j}] (ẋ_m + ȧ_{m,j}).

We next consider the different orders in a contributing to this expansion in turn. At zeroth order, we obtain

⟨j_b(x)⟩⁽⁰⁾ = Σ_m q_m g(x − x_m) ẋ_m = ⟨Σ_m q_m ẋ_m δ(x − x_m)⟩,

i.e. the current carried by the molecules in a point-like approximation. The first-order term is given by

⟨j_b(x)⟩⁽¹⁾ = Σ_m [g(x − x_m) ḋ_m − (∇g(x − x_m)·d_m) ẋ_m].

The form of this expression suggests to compare it with the time derivative of the polarization vector,

Ṗ = d_t Σ_m g(x − x_m) d_m = Σ_m [g(x − x_m) ḋ_m − (∇g(x − x_m)·ẋ_m) d_m].

The difference of these two expressions is given by

X ≡ ⟨j_b(x)⟩⁽¹⁾ − Ṗ = Σ_m ∇g(x − x_m) × (d_m × ẋ_m).

By a dimensional argument, one may show that this quantity is negligibly small: the magnitude |ẋ_m| ∼ v is of the order of the typical velocity of the atomic or electronic compounds inside the solid. We thus obtain the rough estimate |X(q, ω)| ∼ |v(∇P)(q, ω)| ∼ vq|P|, where in the second step we generously neglected the differences in the vectorial structure of P and X, respectively. However, |Ṗ(q, ω)| = ω|P(q, ω)|. This means that |X|/|Ṗ| ∼ vq/ω ∼ v/c is of the order of the ratio of typical velocities of non-relativistic matter and the speed of light. This ratio is so small that it (a) over-compensates the crudeness of our estimates above by far and (b) justifies neglecting the difference X. We thus conclude that ⟨j_b⟩⁽¹⁾ ≃ Ṗ.
The second-order term contributing to the average current density is given by

⟨j_b⟩⁽²⁾ = −Σ_{m,j} q_{m,j} ȧ_{m,j} (a_{m,j}·∇g(x − x_m)).

This expression is related to the density of magnetic dipole moments carried by the molecules. The latter is given by (cf. Eq. (??) and Eq. (2.7))

M = Σ_{m,j} (1/2c) q_{m,j} a_{m,j} × ȧ_{m,j} g(x − x_m).

As the result of a straightforward calculation, we find that the curl of this expression is given by

c∇×M = ⟨j_b⟩⁽²⁾ + (1/2) d_t Σ_{m,j} q_{m,j} a_{m,j} (a_{m,j}·∇g(x − x_m)).

The second term on the r.h.s. of this expression engages the time derivative of the electric quadrupole moments ∼ Σ q a_a a_b, a, b = 1, 2, 3, carried by the molecules. The fields generated by these terms are arguably very weak, so that ⟨j_b⟩⁽²⁾ ≃ c∇×M. Adding to the results above the contribution of the free carriers, ⟨j_f(x)⟩ = q_e⟨Σᵢ ẋᵢ δ(x − xᵢ)⟩, we arrive at Eq. (2.6).

Interpretation of the sources I: charge and current densities

Eqs. (2.4) and (2.8) describe the spatial average of the microscopic sources of electromagnetic fields in an extended medium. The expressions appearing on the r.h.s. of these equations can be interpreted in two different ways, both useful in their own right.

One option is to look at the r.h.s. of, say, Eq. (2.4) as an effective charge density. This density obtains as the divergence of a vector field P describing the average polarization in the medium (cf. Eq. (2.1)). To understand the meaning of this expression, consider the cartoon of a solid shown in Fig. 2.2. Assume that the microscopic constituents of the medium are 'polarized' in a certain direction. (Such a polarization may form spontaneously or, more frequently, be induced externally, cf. the info block below.) Formally, this polarization is described by a vector field P, as indicated by the horizontal cut through the medium in Fig. 2.2. In the bulk of the medium, the 'positive' and 'negative ends' of the molecules carrying the polarization mutually neutralize, and no net charge density remains. (Remember that the medium is assumed to be electrically neutral on average.) However, the boundaries of the system carry layers of uncompensated positive and negative polarization, and this is where a non-vanishing charge density will form (indicated by a solid and a dashed line at the system boundaries). Formally, the vector field P begins and ends (has its 'sources') at the boundaries, i.e. ∇·P is non-vanishing at the boundaries, where it represents the effective surface charge density. This explains the appearance of ∇·P as a source term in the Maxwell equations.

Dynamical changes of the polarization, ∂tP ≠ 0 – due to, for example, fluctuations in the orientation of polar molecules – correspond to the movement of charges parallel to P. The rate ∂tP of charge movement is a current flow in P-direction. (Notice that a current flow j = ∂tP is not in conflict with the condition of electro-neutrality.) Thus, ∂tP must appear as a bulk current source in the Maxwell equations.

Finally, a non-vanishing magnetization M (cf. the vertical cut through the system) corresponds to the flow of circular currents in a plane perpendicular to M. For homogeneous M, the currents mutually cancel in the bulk of the system, but a non-vanishing surface current remains. Formally, this surface current density is represented by the curl of M (Exercise: compute the curl of a vector field that is perpendicular to the cut plane and homogeneous across the system.), i.e. c∇×M is a current source in the effective Maxwell equations.

Figure 2.2: Cartoon illustrating the origin of (surface) charge and current densities forming in a solid with non-vanishing polarization and/or magnetization.
Interpretation of the sources II: fields

Above, we have interpreted ∇·P, ∂tP and ∇×M as charge and current sources in the Maxwell equations. But how do we know these sources? It stands to reason that the polarization, P, and magnetization, M, of a bulk material will sensitively depend on the electric field, E, and the magnetic field, B, present in the system. For in the absence of such fields, E = 0, B = 0, only few materials will 'spontaneously' build up a non-vanishing polarization or magnetization. (Materials disobeying this rule are called ferroelectric (spontaneous formation of polarization) or ferromagnetic (spontaneous magnetization), respectively. In spite of their relative scarcity, ferroelectric and -magnetic materials are of huge applied importance. Application fields include storage and display technology, sensor technology, general electronics, and, of course, magnetism.) Usually, it takes an external electric field to orient the microscopic constituents of a solid so as to add up to a bulk polarization. Similarly, a magnetic field is usually needed to generate a bulk magnetization.

This means that the solution of the Maxwell equations poses us with a self-consistency problem: an external charge distribution, ρ_ext, immersed into a solid gives rise to an electric field. This field in turn creates a bulk polarization, P[E], which in turn alters the charge distribution, ρ → ρ_eff[E] ≡ ρ_ext − ∇·P[E]. (The notation indicates that the polarization depends on E.) Our task then is to self-consistently solve the equation ∇·E = 4πρ_eff[E], with a charge distribution that depends on the sought-for electric field E. Similarly, the current distribution j = j_eff[B, E] in a solid depends on the electric and the magnetic field, which means that Ampère's law, too, poses a self-consistency problem.

In this section we re-interpret polarization and magnetization in a manner tailor-made for the self-consistent solution of the Maxwell equations. We start by substituting Eq. (2.2) into (2.5). Rearranging terms, we obtain

∇·(E + 4πP) = 4πρ.

This equation suggests to introduce a field

D ≡ E + 4πP.   (2.10)

The source of this displacement field is the external charge density,

∇·D = 4πρ   (2.11)

(while the electric field has the spatial average of the full charge system, external plus microscopic, as its source, ∇·E = 4π(ρ + ⟨ρµ⟩)). Notice the slightly different perspectives from which Eqs. (2.5) and (2.10) interpret the phenomenon of matter polarization. Eq. (2.5) places the emphasis on the intrinsic charge density ⟨ρµ⟩. The actual electric field, E, is caused by the sum of external charges and polarization charges. In Eq. (2.10), we consider a fictitious field D, whose sources are purely external. The field D differs from the actual electric field E by (4π times) the polarization, P, building up in the system. Of course, the two alternative interpretations of polarization are physically equivalent. (Think about this point.)

The magnetization, M, can be treated in similar terms: substituting Eq. (2.6) into (2.9), we obtain

∇×(B − 4πM) − (1/c)∂t(E + 4πP) = (4π/c) j.

The form of this equation motivates the definition of the magnetic field

H = B − 4πM.   (2.12)

Expressed in terms of this quantity, the Maxwell equation assumes the form

∇×H − (1/c)∂t D = (4π/c) j,   (2.13)

where j is the external current density. According to this equation, H plays a role similar to D: the sources of the magnetic field H are purely external. The field H differs from the induction B (i.e. the field a magnetometer would actually measure in the system) by (−4π times) the magnetization M. In conceptual analogy to the electric field, E, the induction B has the sum of external and intrinsic charge and current densities as its source (cf. Eq. (2.9)).
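The self-consistency problem formulated above is typically solved by iteration. A zero-dimensional caricature (a sketch assuming a linear response P = χE with an arbitrarily chosen χ; the simple fixed-point iteration converges for 4πχ < 1):

import numpy as np

# D is fixed by the external charges; E must satisfy E = D - 4*pi*P[E]
# with P[E] = chi*E. The fixed point is E = D/eps, eps = 1 + 4*pi*chi.
chi, D = 0.05, 1.0
E = D                                   # initial guess: no polarization
for step in range(40):
    P = chi*E                           # medium responds to the current E
    E = D - 4*np.pi*P                   # field of external + bound charges
print(E, D/(1 + 4*np.pi*chi))           # iteration has converged to D/eps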
Electric and magnetic susceptibility

To make progress with Eqs. (2.10) and (2.11), we need to say more about the polarization. Above, we have anticipated that a finite polarization is usually induced by an electric field, i.e. P = P[E]. Now, external electric fields triggering a non-vanishing polarization are usually much weaker than the intrinsic microscopic fields keeping the constituents of a solid together. In practice, this means that the polarization may be approximated by a linear functional of the electric field. The most general form of a linear functional reads as

P(x, t) = (1/(2π)⁴) ∫d³x′ ∫dt′ χ(x, t; x′, t′) E(x′, t′),   (2.14)

where the integral kernel χ is called the electric susceptibility of the medium and the factor (2π)⁻⁴ has been introduced for convenience. The susceptibility χ(x, t; x′, t′) describes how an electric field amplitude at x′ and t′ < t affects the polarization at (x, t). (By causality, χ(x, t; x′, t′) ∝ Θ(t − t′). Yet more generally, one might allow for a non-colinear dependence of the polarization on the field vector, Pᵢ(x, t) = (1/(2π)⁴) ∫d³x′ ∫dt′ χᵢⱼ(x, t; x′, t′) Eⱼ(x′, t′), where χ = {χᵢⱼ} is a 3×3 matrix field.) Notice that the polarization obtains by convolution of χ and E. In Fourier space, the equation assumes the simpler form P(q) = χ(q)E(q), where q = (ω/c, q) as usual. Accordingly, the electric field and the displacement field are connected by the linear relation

D(q) = ε(q)E(q),   ε(q) = 1 + 4πχ(q),   (2.15)

where the function ε(q) is known as the dielectric function.

INFO On a microscopic level, at least two different mechanisms causing field-induced polarization need to be distinguished: Firstly, a finite electric field may cause a distortion of the electron clouds surrounding otherwise unpolarized atoms or molecules. The relative shift of the electron clouds against the nuclear centers causes a finite molecular dipole moment, which integrates to a macroscopic polarization P. Second, in many substances (mostly of organic chemical origin), the molecules (a) carry a permanent intrinsic dipole moment and (b) are free to move. A finite electric field will cause spatial alignment of the otherwise dis-oriented molecules. This leads to a polarization of the medium as a whole.

Information on the frequency/momentum dependence of the dielectric function has to be inferred from outside electrodynamics, typically from condensed matter physics or plasma physics. (A plasma is a gas of charged particles.) In many cases of physical interest – e.g. in the physics of plasmas or the physics of metallic systems – the dependence of ε on both q and ω has to be taken seriously. Sometimes, only the frequency dependence, or only the dependence on the spatial components of the momentum, is of importance. However, the crudest approximation of all (often adopted in this course) is to approximate the function ε(q) ≃ ε by a constant. (Which amounts to approximating χ(x, t; x′, t′) ∝ δ(x − x′)δ(t − t′) by a function infinitely short-ranged in space and time.) If not mentioned otherwise, we will thus assume D = εE, where ε is a material constant.
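The causality constraint χ(t) ∝ Θ(t) noted above can be made explicit for a concrete model response. The sketch below assumes a damped-oscillator form for χ(ω) (an example choice, anticipating the model dielectric function discussed later) and Fourier-transforms it numerically; the resulting time-domain kernel vanishes for t < 0:

import numpy as np

# Model susceptibility chi(w) = 1/(w0^2 - w^2 - i*gamma*w). Convention:
# f(t) = (1/2pi) \int dw e^{-iwt} f(w), discretized by an FFT.
w0, gamma = 1.0, 0.2
n, dt = 2**16, 0.01
w = 2*np.pi*np.fft.fftfreq(n, d=dt)
chi_w = 1.0/(w0**2 - w**2 - 1j*gamma*w)
chi_t = np.fft.fft(chi_w)/(n*dt)              # chi(t_m), t_m = m*dt (wraps)
t = np.arange(n)*dt
wt = np.sqrt(w0**2 - gamma**2/4)
kernel = np.exp(-gamma*t/2)*np.sin(wt*t)/wt   # retarded kernel for t > 0
print(np.max(np.abs(chi_t[:2000].real - kernel[:2000])))  # small
print(np.max(np.abs(chi_t[-2000:])))          # t < 0: essentially zero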
As with the electric sector of the theory, the magnetic fields created by externally imposed macroscopic current distributions (a) 'magnetize' solids, i.e. generate finite fields M, and (b) are generally much weaker than the intrinsic microscopic fields inside a solid. This means that we can write B = H + 4πM[H], where M[H] may be assumed to be linear in H. In analogy to Eq. (2.14), we thus define

M(x, t) = (1/(2π)⁴) ∫d³x′ ∫dt′ χ_m(x, t; x′, t′) H(x′, t′),   (2.16)

where the function χ_m is called the magnetic susceptibility of the system. The magnetic susceptibility describes how a magnetic field at (x′, t′) causes magnetization at (x, t). Everything we said above about the electric susceptibility applies in a similar manner to the magnetic susceptibility. Specifically, we have the Fourier-space relations M(q) = χ_m(q)H(q) and

B(q) = µ(q)H(q),   µ(q) = 1 + 4πχ_m(q),   (2.17)

where µ(q) is called the magnetic permeability. In cases where the momentum dependence of the susceptibility is negligible, we obtain the simplified relation

B = µH,   (2.18)

where µ is a constant. As in the electric case, a number of different microscopic mechanisms causing deviations of µ off unity can be distinguished. Specifically, for molecules carrying no intrinsic magnetic moment, the presence of an external magnetic field may induce molecular currents and, thus, a non-vanishing magnetization. This magnetization is generally directed opposite to the external field, i.e. µ < 1. Substances of this type are called diamagnetic. (Even for the most diamagnetic substance known, bismuth, 1 − µ = O(10⁻⁴), i.e. diamagnetism is a very weak effect.) For molecules carrying a non-vanishing magnetic moment, an external field will cause alignment of the latter and, thus, an effective amplification of the field, µ > 1. Materials of this type are called paramagnetic. Typical values of paramagnetic permeabilities are of the order of µ − 1 = O(10⁻⁵)–O(10⁻²).

Before leaving this section, it is worthwhile to take a final look at the general structure of the Maxwell equations in matter. We note that

. The external charge densities and currents are sources of the electric displacement field D and the magnetic field H, respectively.
. These fields are related to the electric field E and the magnetic induction B by Eqs. (2.15) and (2.17), respectively.
. The actual fields one would measure in an experiment are E and B; these fields determine the coupling (fields ↔ matter) as described by the Lorentz force law.
. Similarly, the (matter-unrelated) homogeneous Maxwell equations contain the fields E and B.

2.2 Applications of macroscopic electrodynamics

In this section, we will explore a number of phenomena caused by the joint presence of matter and electromagnetic fields. The organization of the section parallels that of the vacuum part of the course, i.e. we will begin by studying static electric phenomena, then advance to the static magnetic case, and finally discuss a few applications in electrodynamics.

2.2.1 Electrostatics in the presence of matter *

Generalities

Consider the macroscopic Maxwell equations of electrostatics,

∇·D = 4πρ,   ∇×E = 0,   D = εE,   (2.19)

where in the last equation we assumed a dielectric constant ε for simplicity. Now assume that we wish to explore the fields in the vicinity of a boundary separating two regions containing different types of matter. In general, the dielectric constants characterizing the two domains will be different, i.e.
we have D = εE in region #1 and D = ε′E in region #2. (For ε′ = 1, we describe the limiting case of a matter/vacuum interface.) Optionally, the system boundary may carry a finite surface charge density η. Arguing as in section ??, we find that the first and the second of the Maxwell equations imply the boundary conditions

(D(x) − D′(x))·n(x) = 0,   (E(x) − E′(x))×n(x) = 0,   (2.20)

where the unprimed/primed quantities refer to the fields in regions #1 and #2, respectively, and n(x) is a unit vector normal to the boundary at x. (If the interface between #1 and #2 carries external surface charges, the first of these equations generalizes to (D(x) − D′(x))·n(x) = 4πη(x), where η is the surface charge density. In this case, the normal component of the displacement field jumps by an amount set by the surface charge density.) Thus, the tangential component of the electric field and the normal component of the displacement field are continuous at the interface.

Example: dielectric sphere in a homogeneous electric field

To illustrate the computation of electric fields in static environments with matter, consider the example of a massive sphere of radius R and dielectric constant ε placed in a homogeneous external displacement field D. We wish to compute the electric field E in the entire medium. Since there are no external charges present, the electric potential inside and outside the sphere obeys the Laplace equation, Δφ = 0. Further, choosing polar coordinates such that the z-axis is aligned with the orientation of the external field, we expect the potential to be azimuthally symmetric, φ(r, θ, φ) = φ(r, θ). Expressed in terms of this potential, the boundary conditions (2.20) assume the form

ε ∂rφ|_{r=R−0} = ∂rφ|_{r=R+0},   (2.21)
∂θφ|_{r=R−0} = ∂θφ|_{r=R+0},   (2.22)
φ|_{r=R−0} = φ|_{r=R+0},   (2.23)

where R ± 0 ≡ lim_{η→0} R ± η. To evaluate these conditions, we expand the potential inside and outside the sphere in a series of spherical harmonics (??). Due to the azimuthal symmetry of the problem, only the φ-independent functions Y_{l,0} = ((2l+1)/4π)^{1/2} P_l(cos θ) contribute, i.e.

φ(r, θ) = Σ_l P_l(cos θ) × { A_l r^l, r < R;   B_l r^l + C_l r^{−(l+1)}, r ≥ R },

where we have absorbed the normalization factors ((2l+1)/4π)^{1/2} into the expansion coefficients A_l, B_l, C_l. At spatial infinity, the potential must approach the potential φ = −Dz = −Dr cos θ = −Dr P₁(cos θ) of the uniform external field D = D e_z. Comparison with the series above shows that B₁ = −D and B_{l≠1} = 0. To determine the as yet unknown coefficients A_l and C_l, we consider the boundary conditions above. Specifically, Eq. (2.21) translates to the condition

Σ_l P_l(cos θ) (ε l A_l R^{l−1} + (l+1) C_l R^{−(l+2)} + D δ_{l,1}) = 0.

Now, the Legendre polynomials are a complete set of functions, i.e. the vanishing of the l.h.s. of the equation implies that all series coefficients must vanish individually:

C₀ = 0,   εA₁ + 2R⁻³C₁ + D = 0,   ε l A_l R^{l−1} + (l+1) C_l R^{−(l+2)} = 0, l > 1.

A second set of equations is obtained from Eq. (2.22): ∂θ Σ_l P_l(cos θ)((A_l − B_l)R^l − C_l R^{−(l+1)}) = 0, or Σ_{l>0} P_l(cos θ)((A_l − B_l)R^l − C_l R^{−(l+1)}) = const. The latter condition implies (think why!) (A_l − B_l)R^l − C_l R^{−(l+1)} = 0 for all l > 0. Substituting this condition into Eq. (2.23), we finally obtain A₀ = 0. To summarize, the continuity conditions lead to the set of equations

A₀ = 0,   (A₁ + D)R − C₁R⁻² = 0,   A_l R^l − C_l R^{−(l+1)} = 0, l > 1.
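Combining the two l = 1 conditions, the coefficients follow from elementary linear algebra; as a check, a short sympy computation:

import sympy as sp

# l = 1 coefficients from eps*A1 + 2*C1/R**3 + D = 0 (normal D continuous)
# and (A1 + D)*R - C1/R**2 = 0 (potential continuous).
A1, C1, D, R, eps = sp.symbols('A1 C1 D R epsilon')
sol = sp.solve([eps*A1 + 2*C1/R**3 + D, (A1 + D)*R - C1/R**2],
               [A1, C1], dict=True)[0]
print(sp.simplify(sol[A1]))   # -3*D/(epsilon + 2)
print(sp.simplify(sol[C1]))   # D*R**3*(epsilon - 1)/(epsilon + 2)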
Figure 2.3: Polarization of a sphere by an external electric field D (labels in the figure: surface charge densities η > 0 and η < 0 on opposite hemispheres).

It is straightforward to verify that these equations are equivalent to the conditions

C₁ = DR³ (ε − 1)/(ε + 2),   A₁ = −3D/(ε + 2),

while C_{l>1} = A_{l>1} = 0. The potential distorted by the presence of the sphere is thus given by

φ(r, θ) = −rE₀ cos θ × { 3/(ε + 2), r < R;   1 − ((ε − 1)/(ε + 2))(R/r)³, r ≥ R },   (2.24)

where we write E₀ = D for the amplitude of the external field. The result (2.24) affords a very intuitive physical interpretation:

. Inside the sphere, the electric field E = −∇φ = (3/(ε+2)) E₀ ∇z = (3/(ε+2)) E₀ e_z is parallel to the external electric field, but weaker in magnitude (as long as ε > 1, which usually is the case). This indicates the buildup of a polarization P = ((ε−1)/4π) E = (3/4π)((ε−1)/(ε+2)) E₀.

. Outside the sphere, the electric field is given by a superposition of the external field and the field generated by a dipole potential φ = d·x/r³ = d cos θ/r², where the dipole moment is given by d = V P and V = 4πR³/3 is the volume of the sphere.

To understand the origin of this dipole moment, notice that in a polarizable medium, and in the absence of external charges, 0 = ∇·D = ∇·E + 4π∇·P, while the averaged microscopic charge density, determined by ∇·E = 4π⟨ρ_mic⟩, need not necessarily vanish. In cases where ρ_mic does not vanish upon averaging, we denote ⟨ρ_mic⟩ = ρ_pol the polarization charge. The polarization charge is determined by the condition ∇·P = −ρ_pol. Turning back to the sphere, the equation 0 = ∇·D = (ε/χ)∇·P tells us that there is no polarization charge in the bulk of the sphere (nor, of course, outside the sphere).
However, arguing as in section ??, we find that the sphere (or any boundary between two media with different dielectric constants) carries a surface polarization charge, η_ind = P′_n − P_n. Specifically, in our problem, P′_n = 0, so that

$$\eta = -P_n = -\frac{3}{4\pi}\,\frac{\epsilon-1}{\epsilon+2}\,E_0\cos\theta.$$

Qualitatively, the origin of this charge is easy enough to understand: while in the bulk of the sphere the induced dipole moments cancel each other upon spatial averaging (see the figure), uncompensated charges remain at the surface of the medium. These surface charges add up to a finite and quite macroscopic dipole moment; the field generated by this dipole inside the sphere is aligned opposite to E₀ (which is why the interior field is weakened) and acts as a source of the electric field (but not of the displacement field).

2.2.2 Magnetostatics in the presence of matter

Generalities

The derivation of 'magnetic boundary conditions' in the presence of materials with a finite permeability largely parallels the electric case: the macroscopic equations of magnetostatics read

$$\nabla\times\mathbf H = \frac{4\pi}{c}\,\mathbf j,\qquad \nabla\cdot\mathbf B = 0,\qquad \mathbf H = \mu^{-1}\mathbf B.\tag{2.25}$$

Again, we wish to explore the behaviour of the fields in the vicinity of a boundary between two media with different permeabilities. Proceeding in the usual manner, i.e. by application of infinitesimal variants of Gauss's and Stokes' law, respectively, we derive the equations

$$(\mathbf B(\mathbf x) - \mathbf B'(\mathbf x))\cdot\mathbf n = 0,\qquad (\mathbf H(\mathbf x) - \mathbf H'(\mathbf x))\times\mathbf n = \frac{4\pi}{c}\,\mathbf j_s,\tag{2.26}$$

where j_s is an optional surface current.

Example: Paramagnetic sphere in a homogeneous magnetic field

Consider a sphere of radius R and permeability μ > 1 in the presence of an external magnetic field H. We wish to compute the magnetic induction B inside and outside the sphere. In principle, this can be done as in the analogous problem of the dielectric sphere, i.e. by Legendre polynomial series expansion. However, comparing the magnetic Maxwell equations to the electric Maxwell equations considered earlier,

$$\begin{aligned} \nabla\cdot\mathbf B = 0 \quad&\leftrightarrow\quad \nabla\cdot\mathbf D = 0,\\ \nabla\times\mathbf H = 0 \quad&\leftrightarrow\quad \nabla\times\mathbf E = 0,\\ \mathbf B = \mu\mathbf H \quad&\leftrightarrow\quad \mathbf D = \epsilon\mathbf E,\\ \mathbf M = \frac{\mu-1}{4\pi}\,\mathbf H \quad&\leftrightarrow\quad \mathbf P = \frac{\epsilon-1}{4\pi}\,\mathbf E,\end{aligned}$$

we notice that there is actually no need to do so: formally identifying B ↔ D, H ↔ E, μ ↔ ε, M ↔ P, the two sets of equations become equivalent, and we may just copy the solution obtained above. Thus, (i) the magnetic field outside the sphere is a superposition of the external field and a magnetic dipole field, where (ii) the magnetization $\mathbf M = \frac{3}{4\pi}\frac{\mu-1}{\mu+2}\mathbf H$ is parallel to the external field. This magnetic moment is caused by the orientation of the intrinsic moments along the external field axis; consequently, the actually felt magnetic field (the magnetic induction) B = H + 4πM exceeds the external field.

2.3 Wave propagation in media

What happens if an electromagnetic wave impinges upon a medium characterized by a non-trivial dielectric function?[11] And how does the propagation behaviour of waves relate to actual physical processes inside the medium? To prepare the discussion of these questions, we will first introduce a physically motivated model for the dependence of the dielectric function on its arguments. This modelling will establish a concrete link between the amplitude of the electromagnetic waves and the dynamics of the charge carriers inside the medium.

[11] We emphasize the dielectric function because, as shall be seen below, the magnetic permeability has a much lesser impact on electromagnetic wave propagation.

2.3.1 Model dielectric function

In Fourier space, the dielectric function ε(q, ω) is a function of both wave vector and frequency. It owes its frequency dependence to the fact that a wave in matter creates excitations of the molecular degrees of freedom. The feedback of these excitations to the electromagnetic wave is described by the frequency dependence of the function ε.
To get an idea of the relevant frequency scales, notice that typical excitation energies in solids are of the order ℏω < 1 eV. Noting that Planck's constant ℏ ≃ 0.6 × 10⁻¹⁵ eV s, we find that the characteristic frequencies at which solids dominantly respond to electromagnetic waves are of the order ω < 10¹⁵ s⁻¹. However, the wave lengths corresponding to these frequencies, λ = 2π/|q| ∼ 2πc/ω ∼ 10⁻⁶ m, are much larger than the range of a few interatomic spacings over which we expect the non-local feedback of the external field into the polarization to be 'screened' (the range of the susceptibility χ). This means (think why!) that the function ε(q, ω) ≃ ε(ω) is largely momentum independent at the momentum scales of physical relevance.

To obtain a crude model for the function ε(ω), we imagine that the electrons surrounding the molecules inside the solid are harmonically bound to the nuclear center coordinates. An individual electron at spatial coordinate x (measured w.r.t. a nuclear position) will then be subject to the equation of motion

$$m\left(d_t^2 + \gamma\,d_t + \omega_0^2\right)\mathbf x(t) = e\mathbf E(t),\tag{2.27}$$

where ω₀ is the characteristic frequency of the oscillator motion of the electron, γ a relaxation constant, and the variation of the external field over the extent of the molecule has been neglected (cf. the discussion above). Fourier transforming this equation, we obtain the dipole moment of a single electron, $\mathbf d \equiv e\mathbf x(\omega) = e^2\mathbf E(\omega)\,m^{-1}/(\omega_0^2 - \omega^2 - i\gamma\omega)$. The dipole moment d(ω) of a molecule with fᵢ electrons oscillating at frequency ωᵢ and damping rate γᵢ is given by $\mathbf d(\omega) = \sum_i f_i\,\mathbf d(\omega)\big|_{\omega_0\to\omega_i,\,\gamma\to\gamma_i}$. Denoting the density of molecules in the medium by n, we thus obtain (cf. Eqs. (2.1), (2.10), and (2.15))

$$\epsilon(\omega) = 1 + \frac{4\pi n e^2}{m}\sum_i\frac{f_i}{\omega_i^2 - \omega^2 - i\gamma_i\omega}.\tag{2.28}$$

With suitable quantum mechanical definitions of the material constants fᵢ, ωᵢ and γᵢ, the model dielectric function (2.28) provides an accurate description of the molecular contribution to the dielectric behaviour of solids. A schematic of the typical behaviour of the real and imaginary parts of the dielectric function is shown in Fig. 2.4. A few remarks on the profile of these functions:

. The material constant ε employed in the 'static' sections of this chapter is given by ε = ε(0). The imaginary part of the zero frequency dielectric function is negligible.

. For reasons to become clear later on, we call

$$\alpha(\omega) = \frac{2\omega}{c}\,\mathrm{Im}\sqrt{\epsilon(\omega)\mu(\omega)}\tag{2.29}$$

the (frequency dependent) absorption coefficient, and

$$n(\omega) = \mathrm{Re}\sqrt{\epsilon(\omega)\mu(\omega)}\tag{2.30}$$

the (frequency dependent) refraction coefficient of the system.

. For most frequencies, i.e. everywhere except for the immediate vicinity of a resonant frequency ωᵢ, the absorption coefficient is negligibly small. In these regions, (d/dω) Re ε(ω) > 0. The positivity of this derivative defines a region of normal dispersion.

. In the immediate vicinity of a resonant frequency, |ω − ωᵢ| < γᵢ/2, the absorption coefficient shoots up while the derivative changes sign, (d/dω) Re ε(ω) < 0. As we shall discuss momentarily, the physics of these frequency windows of anomalous dispersion is governed by resonant absorption processes.

2.3.2 Plane waves in matter

To study the impact of the non-uniform dielectric function on the behaviour of electromagnetic waves in matter, we consider the spatio-temporal Fourier transform of the Maxwell equations in a medium free of extraneous sources, ρ = 0, j = 0.
Using that D(k) = ε(k)E(k) and H(k) = μ⁻¹(k)B(k), where k = (ω/c, k), we have

$$\begin{aligned} \mathbf k\cdot(\epsilon\mathbf E) &= 0,\\ \mathbf k\times(\mu^{-1}\mathbf B) + \frac{\omega}{c}\,(\epsilon\mathbf E) &= 0,\\ \mathbf k\times\mathbf E - \frac{\omega}{c}\,\mathbf B &= 0,\\ \mathbf k\cdot\mathbf B &= 0,\end{aligned}$$

where we omitted momentum arguments for notational clarity. We next compute k × (third equation) + (ωμ/c) · (second equation) and (ωε/c) · (third equation) − k × (second equation) to obtain the Fourier representation of the wave equation,

$$\left(k^2 - \frac{\omega^2}{c(\omega)^2}\right)\times\begin{Bmatrix}\mathbf E(\mathbf k,\omega)\\ \mathbf B(\mathbf k,\omega)\end{Bmatrix} = 0,\tag{2.31}$$

where

$$c(\omega) = \frac{c}{\sqrt{\epsilon(\omega)\mu(\omega)}}\tag{2.32}$$

is the effective velocity of light in matter.

[Figure 2.4: Schematic of the functional profile of the function ε(ω). Each resonance frequency ωᵢ is the center of a peak in the imaginary part of ε and of a resonant swing in the real part. The width of these structures is determined by the damping rate γᵢ.]

Comparing with our discussion of section 1.4, we conclude that a plane electric wave in matter is mathematically described by the function E(x, t) = E₀ exp(i(k·x − ωt)), where k = k n and k = ω/c(ω) is a (generally complex) function of the wave frequency. Specifically, for Im(k·x) > 0, the wave will be exponentially damped. Splitting the wave number k into its real and imaginary parts, we have

$$k = \mathrm{Re}\,k + i\,\mathrm{Im}\,k = \frac{\omega}{c}\,n(\omega) + \frac{i}{2}\,\alpha(\omega),$$

where n and α are the refraction and absorption coefficients, respectively.

To better understand the meaning of the absorption coefficient, let us compute the Poynting vector of the electromagnetic wave. As in section 1.4, the Maxwell equation 0 = ∇·D = ε∇·E ∝ k·E₀ implies transversality of the electric field. (Here we rely on the approximate x-independence of the dielectric function.) Choosing coordinates such that n = e₃ and assuming planar polarization for simplicity, we set E₀ = E₀e₁ with a real coefficient E₀. The — physically relevant — real part of the electric field is thus given by

$$\mathrm{Re}\,\mathbf E = E_0\,\mathbf e_1\cos(\omega(zn/c - t))\,e^{-\alpha z/2}.$$

From the Maxwell equations ∇·B = 0 and ∇×E + c⁻¹∂ₜB = 0, we further obtain (exercise)

$$\mathrm{Re}\,\mathbf B = E_0\,\mathbf e_2\cos(\omega(zn/c - t))\,e^{-\alpha z/2}.$$

Assuming that the permeability is close to unity, μ ≃ 1 or B ≃ H, we thus obtain for the magnitude of the Poynting vector

$$|\mathbf S| = \frac{c}{4\pi}\,|\mathbf E\times\mathbf H| = \frac{cE_0^2}{4\pi}\cos^2(\omega(zn/c - t))\,e^{-\alpha z}\ \xrightarrow{\ \langle\cdots\rangle_t\ }\ \frac{cE_0^2}{8\pi}\,e^{-\alpha z},$$

where in the last step we have averaged over several intervals of the oscillation period 2π/ω. According to this result, the electromagnetic energy current inside a medium decays at a rate set by the absorption coefficient. This phenomenon is easy enough to interpret: according to our semi-phenomenological model of the dielectric function above, the absorption coefficient is non-vanishing in the immediate vicinity of a resonant frequency ωᵢ, i.e. a frequency at which molecular degrees of freedom oscillate. The energy stored in an electromagnetic wave of this frequency may thus get converted into mechanical oscillator energy. This goes along with a loss of field intensity, i.e. a diminishing of the Poynting vector or, equivalently, a loss of electromagnetic energy density, w.

2.3.3 Dispersion

In the previous section we focused on the imaginary part of the dielectric function and on its attenuating impact on individual plane waves propagating in matter. We next turn to the discussion of the — equally important — role played by the real part. To introduce the relevant physical principles in a maximally simple environment, we will focus on a one-dimensional model of wave propagation throughout.
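Before specializing to one dimension, it may help to see the previous two sections at work numerically. The following minimal sketch (our own toy parameters, assuming NumPy is available) evaluates the single-resonance version of (2.28) together with the derived coefficients (2.29) and (2.30):

```python
# Single-resonance model dielectric function and the coefficients n, alpha.
import numpy as np

w0, gamma, wp2 = 1.0, 0.05, 0.5   # resonance, damping, oscillator weight (toy units)
c = 1.0                           # speed of light in these units

def eps(w):
    # model dielectric function, Eq. (2.28), with a single resonance
    return 1.0 + wp2 / (w0**2 - w**2 - 1j * gamma * w)

for w in (0.5, 1.0, 1.5):
    root = np.sqrt(eps(w))              # mu = 1 assumed
    n = root.real                       # refraction coefficient, Eq. (2.30)
    alpha = 2 * w / c * root.imag       # absorption coefficient, Eq. (2.29)
    print(f"w = {w}: n = {n:.3f}, alpha = {alpha:.4f}")
# alpha is sharply peaked near w = w0 (the anomalous dispersion window),
# while away from the resonance absorption is negligible, as in Fig. 2.4.
```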
Consider, thus, the one-dimensional variant of the wave equation (2.31),

$$\left(k^2 - \frac{\omega^2}{c(\omega)^2}\right)\psi(k,\omega) = 0,\tag{2.33}$$

where k is a one-dimensional wave 'vector' and c(ω) = c/√ε(ω) as before. (For simplicity, we set μ = 1 from the outset.) Plane wave solutions to this equation are given by ψ(x, t) = ψ₀ exp(ik(ω)x − iωt), where k(ω) = ω/c(ω).[12] Notice that an alternative representation of the plane wave configurations reads ψ(x, t) = ψ₀ exp(ikx − iω(k)t), where ω(k) is defined implicitly, viz. as a solution of the equation ω/c(ω) = k.

[12] There is also the 'left moving' solution, ψ(x, t) = ψ₀ exp(ik(ω)x + iωt), but for our purposes it will be sufficient to consider right moving waves.

To understand the consequences of frequency dependent variations in the real part of the dielectric function, we need, however, to go beyond the level of isolated plane waves. Rather, let us consider a superposition of plane waves (cf. Eq. (1.42)),

$$\psi(x,t) = \int dk\,\psi_0(k)\,e^{i(kx - \omega(k)t)},\tag{2.34}$$

where ψ₀(k) is an arbitrary function. (Exercise: check that this superposition solves the wave equation.) Specifically, let us choose the function ψ₀(k) such that, at initial time t = 0, the distribution ψ(x, t = 0) is localized in a finite region of space. Configurations of this type are called wave packets. While there is a lot of freedom in choosing such spatially localizing envelope functions, a convenient (and for our purposes sufficiently general) choice of the weight function ψ₀ is a Gaussian,

$$\psi_0(k) = \psi_0\exp\left(-(k - k_0)^2\,\frac{\xi^2}{4}\right),$$

where ψ₀ ∈ C is a fixed amplitude coefficient and ξ a coefficient of dimensionality 'length' that determines the spatial extent of the wave packet (at time t = 0). To check the latter assertion, let us compute the spatial profile ψ(x, 0) of the wave packet at the initial time:

$$\psi(x,0) = \psi_0\int dk\,e^{-(k-k_0)^2\frac{\xi^2}{4} + ikx} = \psi_0\,e^{ik_0x}\int dk\,e^{-k^2\frac{\xi^2}{4} + ikx} = \frac{2\sqrt\pi}{\xi}\,\psi_0\,e^{ik_0x}\,e^{-(x/\xi)^2}.$$

The function ψ(x, 0) is concentrated in a volume of extension ξ; it describes the profile of a small wave 'packet' at initial time t = 0.

[Figure 2.5: Left: Dispersive propagation of an initially sharply focused wave packet. Right: A more shallow wave packet suffers less drastically from dispersive deformation. (Labels: amplitude ψ, initial width ξ, center coordinate v_g t.)]

What will happen to this wave packet as time goes on? To answer this question, let us assume that the scales over which the function ω(k) varies are much larger than the extension, ∼ ξ⁻¹, of the wave packet in wave number space. It is, then, a good idea to expand ω(k) around k₀:

$$\omega(k) = \omega_0 + v_g(k - k_0) + \frac{a}{2}(k - k_0)^2 + \dots,$$

where we introduced the abbreviations ω₀ ≡ ω(k₀), v_g ≡ ω′(k₀) and a ≡ ω″(k₀). Substituting this expansion into Eq. (2.34) and doing the Gaussian integral over wave numbers, we obtain

$$\psi(x,t) = \psi_0\int dk\,e^{-(k-k_0)^2\xi^2/4 + i(kx - \omega(k)t)} \simeq \psi_0\,e^{ik_0(x - v_pt)}\int dk\,e^{-(\xi^2 + 2iat)(k - k_0)^2/4 + i(k-k_0)(x - v_gt)} = \frac{2\sqrt\pi}{\xi(t)}\,\psi_0\,e^{ik_0(x - v_pt)}\,e^{-\left((x - v_gt)/\xi(t)\right)^2},$$

where we introduced the function $\xi(t) = \sqrt{\xi^2 + 2iat}$ and the abbreviation v_p ≡ ω(k₀)/k₀. Let us try to understand the main characteristics of this result:

. The center of the wave packet moves to the right with the characteristic velocity v_g = ∂ₖω(k₀). In as much as v_g determines the effective center velocity of a superposition of a continuum or 'group' of plane waves, v_g is called the group velocity of the system.
Only in vacuum, where c(ω) = c is independent of frequency, do we have ω = ck, and the group velocity v_g = ∂ₖω = c coincides with the vacuum velocity of light. A customary yet far less useful notion is that of a phase velocity: an individual plane wave behaves as ∼ exp(i(kx − ω(k)t)) = exp(ik(x − (ω(k)/k)t)). In view of the structure of the exponent, one may call v_p ≡ ω(k)/k the 'velocity' of the wave. Since, however, the wave extends over the entire system anyway, the phase velocity is not of direct physical relevance. Notice that the phase velocity v_p = ω(k)/k = c(k) = c/n(ω), where the refraction coefficient has been introduced in (2.30). For most frequencies and in almost all substances, n > 1 ⇒ v_p < c. In principle, however, the phase velocity may become larger than the speed of light.[13] The group velocity is

$$v_g = \partial_k\omega(k) = \left(\partial_\omega k(\omega)\right)^{-1} = \left(\partial_\omega\frac{\omega n(\omega)}{c}\right)^{-1} = \frac{c}{n(\omega) + \omega\,\partial_\omega n(\omega)} = v_p\left(1 + n^{-1}\omega\,\partial_\omega n\right)^{-1}.$$

In regions of normal dispersion, ∂_ω n > 0, implying that v_g < v_p. For the discussion of anomalous cases, where v_g may exceed the phase velocity or even the vacuum velocity of light, see [1].

[13] However, there is no need to worry that such anomalies conflict with Einstein's principle of relativity, to be discussed in chapter 3 below: the dynamics of a uniform wave train is not linked to the transport of energy or other observable physical quantities, i.e. there is no actual physical entity that is transported at a velocity larger than the speed of light.

. The width of the wave packet, Δx, changes in time. Inspection of the Gaussian factor controlling the envelope of the packet shows that $\mathrm{Re}\,\psi \sim e^{-(x - v_gt)^2\left(\xi(t)^{-2} + \xi(t)^{*\,-2}\right)/2}$, where the symbol '∼' indicates that only the exponential dependence of the packet is taken into account. Thus, the width of the packet is given by

$$\Delta x(t) = \xi\left(1 + (2at/\xi^2)^2\right)^{1/2}.$$

According to this formula, the rate at which Δx(t) increases (the wave packet flows apart) is the larger, the sharper the spatial concentration of the initial wave packet was (cf. Fig. 2.5). For large times, t ≫ ξ²/a, the width of the wave packet increases as Δx ∼ t.

The disintegration of the wave packet is caused by a phenomenon called dispersion: the form of the packet at t = 0 is obtained by (Fourier) superposition of a large number of plane waves. If all these plane waves propagated with the same phase velocity (i.e. if ω(k)/k were k-independent), the form of the wave packet would remain unchanged. However, due to the fact that in a medium plane waves of different wave number k generally propagate at different velocities, the Fourier spectrum of the packet at non-zero times differs from that at time zero, which in turn means that the integrity of the form of the wave packet gets gradually lost.

The dispersive spreading of electromagnetic wave packets is a phenomenon of immense applied importance. For example, dispersive deformation is one of the main factors limiting the information load that can be pushed through fiber optic cables. (For too high loads, the 'wave packets' constituting the bit stream through such fibers begin to overlap, thus losing their identity.) The construction of ever more sophisticated countermeasures optimizing the data capacity of optical fibers represents a major stream of applied research.

2.3.4 Electric conductivity

As a last phenomenon relating to macroscopic electrodynamics, we discuss the electric conduction properties of metallic systems. Empirically, we know that in a metal, a finite electric field will cause current flow.
In its most general form, the relation between field and current assumes a form similar to that between field and polarization discussed above:

$$j_i(\mathbf x, t) = \int d^3x'\int_{-\infty}^{t}dt'\,\sigma_{ij}(\mathbf x - \mathbf x', t - t')\,E_j(\mathbf x', t'),\tag{2.35}$$

where σ = {σᵢⱼ(x, t)} is called the conductivity tensor. This equation states that a field may cause current flow at different spatial locations and at later times. Equally, a field may cause current flow in directions different from the field vector. (For example, in a system subject to a magnetic field in z-direction, an electric field in x-direction will cause current flow in both x- and y-direction — why?) As with the electric susceptibility, the non-local space dependence of the conductivity may often be neglected, σ(x, t) ∝ δ(x), or σ(q, ω) = σ(ω) in Fourier space. However, except for very low frequencies, the ω-dependence is generally important. Generally, one calls the finite-frequency conductivity the 'AC conductivity', where 'AC' stands for 'alternating current'. For ω → 0, the conductivity crosses over to the 'DC conductivity', where 'DC' stands for 'direct current'. In the DC limit, and for an isotropic medium, the field–current relation assumes the form of Ohm's law,

$$\mathbf j = \sigma\mathbf E.\tag{2.36}$$

In the following, we wish to understand how the phenomenon of electrical conduction can be explained from our previous considerations. There are two different ways to specify the dynamical response of a metal to external fields: we may either postulate a current–field relation in the spirit of Ohm's law, or we may determine a dielectric function which (in a more microscopic way) describes the response of the mobile charge carriers in the system to the presence of a field. However, since that response generally implies current flow, the simultaneous specification of both a frequency dependent dielectric function and a current–field relation would over-determine the problem.[14] In the following, we discuss the two alternatives in more detail.

[14] As we shall see below, a dielectric constant does not lead to current flow.

The phenomenological route

Let us impose Eq. (2.36) as a condition extraneous to the Maxwell equations; we simply postulate that an electric field drives a current whose temporal and spatial profile is rigidly locked to the electric field. As indicated above, this postulate largely determines the dynamical response of the system to the external field, i.e. the dielectric function has to be chosen constant, ε(ω) = ε. Substitution of this ansatz into the temporal Fourier transform of the Maxwell equation (??) obtains

$$\nabla\times\mathbf H(\mathbf x,\omega) = \frac{1}{c}\left(-i\omega\epsilon + 4\pi\sigma\right)\mathbf E(\mathbf x,\omega).\tag{2.37}$$

For the moment, we leave this result as it is and turn to

The microscopic route

We want to describe the electromagnetic response of a metal in terms of a model dielectric function. The dielectric function (2.28) constructed above describes the response of charge carriers bound to molecular center coordinates by a harmonic potential ∼ mωᵢ²x²/2 and subject to an effective damping or friction mechanism. Now, the conduction electrons of a
metal may be thought of as charge carriers whose confining potential is infinitely weak, ωᵢ = 0, so that their motion is not tied to a reference coordinate. Still, the conduction electrons will be subject to friction mechanisms. (E.g., scattering off atomic imperfections will impede their ballistic motion through the solid.) We thus model the dielectric function of a metal as

$$\epsilon(\omega) = 1 - \frac{4\pi n_e e^2}{m\,\omega(\omega + i/\tau)} + \frac{4\pi n e^2}{m}\sum_{i>0}\frac{f_i}{\omega_i^2 - \omega^2 - i\gamma_i\omega} \;\simeq\; \bar\epsilon - \frac{4\pi n_e e^2}{m\,\omega(\omega + i/\tau)},\tag{2.38}$$

where n_e ≡ n f₀ is a measure of the concentration of conduction electrons, the friction coefficient of the electrons has been denoted by τ⁻¹,[15] and in the last step we noted that for frequencies well below the oscillator frequencies ωᵢ of the valence electrons, the frequency dependence of the bound-electron contribution to the dielectric function may be neglected; we thus lump that contribution into an effective material constant ε̄. Keep in mind that the free electron contribution to the dielectric function exhaustively describes the acceleration of charge carriers by the field (the onset of current flow), i.e. no extraneous current–field relation must be imposed. Substitution of Eq. (2.38) into the Maxwell equation (??) obtains the result

$$\nabla\times\mathbf H(\mathbf x,\omega) = -\frac{i\omega\,\epsilon(\omega)}{c}\,\mathbf E(\mathbf x,\omega) = \frac{1}{c}\left(-i\omega\bar\epsilon + \frac{4\pi n_e e^2}{m(-i\omega + \tau^{-1})}\right)\mathbf E(\mathbf x,\omega).\tag{2.39}$$

[15] ... alluding to the fact that the attenuation of the electrons is due to collisions with impurities, which take place at a rate τ⁻¹.

Comparison to the phenomenological result (2.37) yields the identification

$$\sigma(\omega) = \frac{n_e e^2}{m(-i\omega + \tau^{-1})},\tag{2.40}$$

i.e. a formula for the AC conductivity in terms of electron density and mass, and the impurity collision rate. Eq. (2.40) affords a very intuitive physical interpretation:

. For high frequencies, ω ≫ τ⁻¹, we may approximate σ ∼ (iω)⁻¹, or ωj ∼ E. In time space, this means ∂ₜj ∼ ρv̇ ∼ E. This formula describes the ballistic acceleration of electrons by an electric field: on time scales t ≪ τ, which, in a Fourier sense, correspond to large frequencies ω ≫ τ⁻¹, the motion of electrons is not hindered by impurity collisions and they accelerate as free particles.

. For low frequencies, ω ≪ τ⁻¹, the conductivity may be approximated by a constant, σ ≃ n_e e²τ/m ∼ τ. In this regime, the motion of the electrons is impeded by repeated impurity collisions. As a result they diffuse, with a constant drift induced by the electric field.

Chapter 3

Relativistic invariance

3.1 Introduction

Consult an entry level physics textbook to re-familiarize yourself with the basic notions of special relativity!

3.1.1 Galilei invariance and its limitations

The laws of mechanics are the same in two coordinate systems K and K′ moving at constant velocity relative to one another. Within classical Newtonian mechanics, the space-time coordinates (t, x) and (t′, x′) of two such systems are related by a Galilei transformation,

$$t' = t,\qquad \mathbf x' = \mathbf x - \mathbf v t.$$

Substituting this transformation into Newton's equations of motion as formulated in system K,

$$K:\quad \frac{d^2\mathbf x_i}{dt^2} = -\nabla_{\mathbf x_i}\sum_j V_{ij}(\mathbf x_i - \mathbf x_j)$$

(xᵢ are particle coordinates in system K, and Vᵢⱼ is a pair potential which, crucially, depends only on the differences between particle coordinates but not on any distinguished reference point in the universe), we find that

$$K':\quad \frac{d^2\mathbf x'_i}{dt'^2} = -\nabla_{\mathbf x'_i}\sum_j V_{ij}(\mathbf x'_i - \mathbf x'_j),$$

i.e. the equations remain form invariant. This is what is meant when we say that classical mechanics is Galilei invariant.[1]
[1] More precisely, Galilei invariance implies invariance under all transformations of the Galilei group: the uniform motions above, rotations of space, and uniform translations of space and time.

Now, suppose a certain physical phenomenon in K (the dynamics of a water surface, say) is effectively described by a wave equation,

$$K:\quad \left(\Delta - \frac{1}{c^2}\,\partial_t^2\right)f(\mathbf x, t) = 0.$$

Substitution of the Galilei transformation into this equation (using ∂ₜ = ∂ₜ′ − v·∇′ and ∇ = ∇′) obtains

$$K':\quad \left(\Delta' - \frac{1}{c^2}\,\partial_{t'}^2 + \frac{2}{c^2}\,(\mathbf v\cdot\nabla')\,\partial_{t'} - \frac{1}{c^2}\,(\mathbf v\cdot\nabla')^2\right)f(\mathbf x', t') = 0,$$

i.e. an equation of modified form. At first sight this result may look worrisome. After all, many waves (water waves, sound waves, etc.) are of purely mechanical origin, which means that wave propagation ought to be a Galilei invariant phenomenon. The resolution of this problem lies with the fact that matter waves generally propagate in a host medium (water, say). While this medium is at rest in K, it isn't in K′. But a wave propagating in a non-stationary medium must be controlled by a different equation of motion, i.e. there is no a priori contradiction to the principle of Galilei invariance. Equivalently, one may observe that a wave emitted by a stationary point source in K will propagate with velocity c (in K). However, an observer in K′ will observe wave fronts propagating at different velocities. Mathematically, this distortion of the wave pattern is described by K′'s 'wave equation'.

But what about electromagnetic waves? So far, we have never talked about a 'host medium' supporting the propagation of electromagnetic radiation. The lack of Galilean invariance of Maxwell's equations then leaves us with three possibilities:

1. Nature is Galilean invariant, but Maxwell's equations are incorrect (or, to put it more mildly, incomplete).

2. An electromagnetic host medium (let's call it the ether) does, indeed, exist.

3. Nature is invariant under a group of space-time transformations different from the Galilei group. If so, Newton's equations of motion are in need of modification.

These questions became pressing in the second half of the 19th century, after Maxwell had brought the theory of electromagnetism to completion, and the exploration of all kinds of electromagnetic wave phenomena was a focal point of theoretical and experimental research. In view of the sensational success of Maxwell's theory, the first answer was not really considered an option. Since Newtonian mechanics wasn't a concept to be sacrificed lightheartedly either, the existence of an ether was postulated. However, it was soon realized that the ether medium would have to have quite eccentric physical properties. Since no one had ever actually observed an ether, this medium would have to be infinitely light and fully non-interacting with matter. (Its presumed etheric properties actually gave it its name.) Equally problematic, electromagnetic waves would uniformly propagate with the speed of light, c, only in the distinguished ether rest frame. The ether postulate was tantamount to the abandonment of the idea of 'no preferential inertial frames in nature', a concept close at heart to the development of modern physics.

[Figure 3.1: Measurement of Newton's law of gravitational attraction (any other law would be just as good) in two frames that are relatively inertial. The equations describing the force between masses will be the same in both frames.]
Towards the end of the nineteenth century, experimental evidence was mounting that the vacuum velocity of light was, independently of the observation frame, given by c; the idea of an ether became increasingly difficult to sustain.

3.1.2 Einstein's postulates

In essence, this summarizes the state of affairs before Einstein entered the debate. In view of the arguments outlined above, Einstein decided that possibility 3 had to be the correct option. Specifically, he put forward two fundamental postulates:

. The postulate of relativity: this postulate is best formulated in negative form: there is no such thing as 'absolute rest', an 'absolute velocity', or an 'absolute point in space'. No absolute frame exists in which the laws of nature assume a distinguished form. To formulate this principle in positive form, we define two coordinate frames to be inertial relative to each other if one is related to the other by a translation in space and/or time, a constant rotation, a uniform motion, or a combination of these operations. The postulate then states that the laws of nature will assume the same form (be expressed by identical fundamental equations) in both systems. As a corollary, this implies that physical laws must never make reference to absolute coordinates, times, angles, etc.

. The postulate of the constancy of the speed of light: the speed of light is independent of the motion of its source. What makes this postulate — which superficially might seem to be no more than an innocent tribute to experimental observation[2] — so revolutionary is that it abandons the notion of 'absolute time'. For example, two events that are simultaneous in one inertial frame will in general no longer be simultaneous in other inertial frames. While, more than one hundred years later, we have grown used to its implications, the revolutionary character of Einstein's second postulate cannot be exaggerated.

[2] ... although the constancy of the speed of light had not yet been fully established experimentally when Einstein made his postulate.

3.1.3 First consequences

As we shall see below, the notion of an 'absolute presence', which makes perfect sense in Newtonian mechanics,[3] becomes meaningless in Einstein's theory of relativity. When we monitor physical processes, we must be careful to assign to each physical event its own space and time coordinates x = (ct, x) (where the factor of c has been included for later convenience). An event is canonical in that it can be observed in all frames (no matter whether they are relatively inertial or not). However, both its space and time coordinates are non-canonical, i.e. when observed from a different frame K′ it will have coordinates x′ = (ct′, x′), where t′ may be different from t.

[3] Two events that occur simultaneously in one frame will remain simultaneous under Galilei transformations.

When we talk of a 'body' or a 'particle' in relativity, what we actually mean is the continuous sequence of events x = (ct, x) defined by its instantaneous position x at time t (both monitored in a specific frame). The assembly of these events obtains the world line of the body, a (directed) curve in a space-time coordinate frame (cf. the margin figure, which schematically shows, in the (x, ct) plane, the (1+1)-dimensional world line of a body in uniform motion (left) and of an accelerated body (right)).

The most important characteristic of the relative motion of inertial frames is that no acceleration is involved. This implies, in particular, that a uniform motion in one frame will stay uniform in the other; straight world lines transform into straight world lines.
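The frame dependence of simultaneity mentioned above is easily made quantitative. A minimal numerical sketch (assuming NumPy; it anticipates the explicit boost matrix (3.9) derived in section 3.2 below; units with c = 1):

```python
# Two events simultaneous in K are not simultaneous in K'.
import numpy as np

beta = 0.6
gamma = 1.0 / np.sqrt(1.0 - beta**2)

# special Lorentz transformation along e1, acting on (ct, x1)
L = np.array([[gamma, -beta * gamma],
              [-beta * gamma, gamma]])

event_a = np.array([0.0, 0.0])   # (ct, x1): t = 0 at the origin
event_b = np.array([0.0, 1.0])   # (ct, x1): t = 0, one length unit away

print(L @ event_a)   # -> [ 0.    0.  ]
print(L @ event_b)   # -> [-0.75  1.25]: t' differs from 0 in K'
```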
Parameterizing a general uniform motion in system K by x = (ct, wt + b), where w is the velocity and b an offset vector, we must require that the coordinate representation x′ in any inertial system K′ is again a straight line. Now, the most general family of transformations mapping straight lines onto straight lines are the affine transformations

$$x' = \Lambda x + a,\tag{3.1}$$

where Λ : R^{d+1} → R^{d+1} is a linear map from (d+1)-dimensional space-time into itself[4] and a ∈ R^{d+1} is a vector describing the displacement of the origin of the coordinate systems in space and/or time. For a = 0, we speak of a homogeneous coordinate transformation. Since any transformation may be represented as a succession of a homogeneous transformation and a trivial translation, we will focus on the former class throughout.

[4] Although we will be foremost interested in the case d = 3, it is occasionally useful to generalize to other dimensions. Specifically, space-time diagrams are easiest to draw/imagine in (1+1) dimensions.

INFO: Above, we have set up criteria for two frames being relatively inertial. However, in the theory of relativity, one meets with the notion of (absolutely) inertial frames (no reference to another frame). But how can we tell whether a frame is inertial or not? And what, for that matter, does the attribute 'inertial' mean, if no transformation is involved? Newton approached this question by proposing the concept of an 'absolute space'. Systems at rest in this space, or at constant motion relative to it, were 'inertial'. For practical purposes, it was proposed that a reference system tied to the position of the fixed stars was (a good approximation of) an absolute system. Nowadays, we know that these concepts are problematic. The fixed stars aren't fixed (nor, for that matter, is any known constituent of the universe), i.e. we are not aware of a suitable 'anchor' reference system. What is more, the declaration of an absolute frame is at odds with the ideas of relativity. In the late nineteenth century, the above definition of inertial frames was superseded by a purely operational one. A modern formulation of the alternative definition might read:

An inertial frame of reference is one in which the motion of a particle not subject to forces is in a straight line at constant speed.

The meaning of this definition is best exposed by considering its negative: imagine an observer tied to a rotating disc. A particle thrown by that observer will not move in a straight line (relative to the disc). If the observer does not know that he/she is on a rotating frame, he/she will have to attribute the bending of the trajectory to some 'fictitious' forces. The presence of such forces is proof that the disc system is not inertial. However, in most physical contexts, questions concerning the 'absolute' status of a system do not arise.[5] Far more frequently, we care about what happens as we pass from one frame (the laboratory, say) to another (that of a particle moving relative to the lab). Thus, the truly important issue is whether systems are relatively inertial or not.

[5] Exceptions include high precision measurements of gravitational forces.

What conditions will the transformation matrix Λ have to fulfill in order to qualify as a transformation between inertial frames? Referring for a rigorous discussion of this question to Ref.
[2], we here merely note that the symmetry conditions implied by the principle of relativity nearly, but not completely, specify the class of permissible transformations. This class includes the Galilei transformations, which, as we saw, are incompatible with the physics of electromagnetic wave propagation. At this point, Einstein's second postulate enters the game. Consider two inertial frames K and K′ related by a homogeneous coordinate transformation. At time t = 0 the origins of the two systems coalesce. Let us assume, however, that K′ moves relative to K at constant velocity v. Now consider the event: at time t = 0 a light source at x = 0 emits a signal. In K, the wave fronts of the light signal then trace out world lines propagating at the vacuum speed of light, i.e. the spatial coordinates x of a wave front obey the relation x² − c²t² = 0. The crucial point now is that, in spite of its motion relative to the point of emanation of the light signal, an observer in K′, too, will observe a light signal moving with velocity c in all directions. (Notice the difference to, say, a sound wave: from K′'s point of view, fronts moving along v would be slowed down, v′_sound = v_sound − v, while those moving in the opposite direction would propagate at higher speed.) This means that the light front coordinates in K′ obey the same relation, x′² − c²t′² = 0. This additional condition unambiguously fixes the set of permissible transformations. To formulate it in the language of linear coordinate transformations (which, in view of the linearity of transformations between inertial frames, is the adequate one), let us define the matrix

$$g = \{g_{\mu\nu}\} \equiv \begin{pmatrix}1 & & & \\ & -1 & & \\ & & -1 & \\ & & & -1\end{pmatrix},\tag{3.2}$$

where all non-diagonal entries are zero. The constancy of the speed of light may then be expressed in the concise form x^T g x = 0 ⇒ x′^T g x′ = 0. This condition suggests to focus on coordinate transformations for which the bilinear form x^T g x is conserved,[6]

$$x^Tgx = x'^Tgx'.\tag{3.3}$$

Substituting x′ = Λx, we find that Λ must obey the condition

$$\Lambda^Tg\Lambda = g.\tag{3.4}$$

Linear coordinate transformations satisfying this condition are called Lorentz transformations. We thus postulate that:

All laws of nature must be invariant under affine coordinate transformations (3.1) whose homogeneous part obeys the Lorentz condition (3.4).

The statement above characterizes the set of legitimate transformations in implicit terms. In the next section, we aim at a more explicit description of the Lorentz transformations. These results will form the basis for our discussion of Lorentz invariant electrodynamics further down.

3.2 The mathematics of special relativity I: Lorentz group

In this section, we will develop the mathematical framework required to describe transformations between inertial frames. We will show that these transformations form a continuous group, the Lorentz group.

[6] Notice the gap in the logic of the argument: Einstein's second postulate requires Eq. (3.3) only for those space-time vectors for which x^T g x = 0; we now suggest to declare it a universal condition, i.e. one that holds irrespective of the value of x^T g x. To show that the weaker condition implies the stronger, one may first prove (cf. Ref. [2]) that relatively straightforward symmetry considerations suffice to fix the class of allowed transformations up to one undetermined scalar constant.
The value of that constant is then fixed by the 'weak condition' above. Once it has been fixed, one finds that Eq. (3.3) holds in general.

[Figure 3.2: Schematic of the decomposition of space-time, shown in the (x, ct) plane, into a time-like region and a space-like region. The two regions are separated by a 'light cone' of light-like vectors (a 'cone' because in space dimensions d > 1 it acquires a conical structure). The forward/backward part of the light cone extends to positive/negative times.]

3.2.1 Background

The bilinear form introduced above defines a scalar product on R⁴:

$$g:\ \mathbb R^4\times\mathbb R^4 \to \mathbb R,\tag{3.5}$$
$$(x, y)\mapsto x^Tgy \equiv x^\mu g_{\mu\nu}y^\nu.\tag{3.6}$$

A crucial feature of this scalar product is that it is not positive definite. For reasons to be discussed below, we call vectors whose norm is positive, x^T g x = (ct)² − x² > 0, time-like vectors. Vectors with negative norm will be called space-like, and vectors of vanishing norm light-like.

We are interested in Lorentz transformations, i.e. linear transformations Λ : R⁴ → R⁴, x ↦ Λx, that leave the scalar product g invariant, Λ^T g Λ = g. (Similarly to, say, the orthogonal transformations, which leave the standard scalar product invariant.) Before exploring the structure of these transformations in detail, let us observe a number of general properties. First notice that the set of Lorentz transformations forms a group, the Lorentz group, L. For if Λ and Λ′ are Lorentz transformations, so is the composition ΛΛ′. The identity transformation obviously obeys the Lorentz condition. Finally, the equation Λ^T g Λ = g implies that Λ is non-singular (why?), i.e. it possesses an inverse, Λ⁻¹. We have thus shown that the Lorentz transformations form a group.

Global considerations on the Lorentz group

What more can be said about this group? Taking the determinant of the invariance relation, we find det(Λ^T g Λ) = det(Λ)² det(g) = det(g), i.e. det(Λ)² = 1, which means that det(Λ) ∈ {−1, +1}. (Λ is a real matrix, i.e. det(Λ) ∈ R.) We may further observe that the absolute value of the component Λ₀₀ is always larger than or equal to unity. This is shown by inspection of the 00-element of the invariance relation: $1 = g_{00} = (\Lambda^Tg\Lambda)_{00} = \Lambda_{00}^2 - \sum_i\Lambda_{i0}^2$. Since the subtracted term is positive (or zero), we have |Λ₀₀| ≥ 1. We thus conclude that L contains four components, defined by

$$\begin{aligned} L_+^\uparrow &:\quad \det\Lambda = +1,\quad \Lambda_{00}\ge +1,\\ L_+^\downarrow &:\quad \det\Lambda = +1,\quad \Lambda_{00}\le -1,\\ L_-^\uparrow &:\quad \det\Lambda = -1,\quad \Lambda_{00}\ge +1,\\ L_-^\downarrow &:\quad \det\Lambda = -1,\quad \Lambda_{00}\le -1.\end{aligned}\tag{3.7}$$

Transformations belonging to any one of these subsets cannot be continuously deformed into a transformation of a different subset (why?), i.e. the subsets are truly disjoint. In the following, we introduce a few distinguished transformations belonging to the different components of the Lorentz group. Space reflection, or parity, P: (ct, x) → (ct, −x), is a Lorentz transformation; it belongs to the component L↑₋. Time inversion, T: (ct, x) → (−ct, x), belongs to the component L↓₋. The product of space reflection and time inversion, PT: (ct, x) → (−ct, −x), belongs to the component L↓₊. Generally, the product of two transformations belonging to a given component no longer belongs to that component, i.e. the subsets do not form subgroups of the Lorentz group. However, the component L↑₊ is exceptional in that it is a subgroup. It is called the proper orthochronous Lorentz group, or restricted Lorentz group.
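As a quick concrete check of the classification (3.7), one can verify the component membership of P, T and PT numerically; a minimal sketch, assuming NumPy (conventions as in the text):

```python
# P, T and PT satisfy the Lorentz condition and fall into distinct components.
import numpy as np

g = np.diag([1.0, -1.0, -1.0, -1.0])   # Lorentz metric (3.2)
P = np.diag([1.0, -1.0, -1.0, -1.0])   # parity
T = np.diag([-1.0, 1.0, 1.0, 1.0])     # time inversion

for name, L in [("P", P), ("T", T), ("PT", P @ T)]:
    assert np.allclose(L.T @ g @ L, g)  # Lorentz condition (3.4)
    print(name, "det =", np.linalg.det(L), "Lambda_00 =", L[0, 0])
# P : det = -1, Lambda_00 = +1
# T : det = -1, Lambda_00 = -1
# PT: det = +1, Lambda_00 = -1
```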
The attribute 'proper' indicates that it does not contain exceptional transformations (such as parity and time inversion) which cannot be continuously deformed back to the identity transformation. It is called 'orthochronous' because it maps the positive light cone into itself, i.e. it respects the 'direction' of time. To see that L↑₊ is a group, notice that it contains the identity transformation. Further, its elements can be continuously contracted to unity; if this property holds for two elements Λ, Λ′ ∈ L↑₊, then it also holds for the product ΛΛ′, i.e. the product is again in L↑₊.

The restricted Lorentz group

The restricted Lorentz group contains those transformations which we foremost associate with coordinate transformations between inertial frames. For example, rotations of space,

$$\Lambda_R \equiv \begin{pmatrix}1 & \\ & R\end{pmatrix},$$

where R is a three-dimensional rotation matrix, trivially fall into this subgroup. However, it also contains the second large family of homogeneous transformations between inertial frames: uniform motions.

[Figure: the frames K and K′ with parallel axes (x₁, x₂, x₃) and (x′₁, x′₂, x′₃); the origin of K′ is displaced by vt, along the common x₁-direction in the upper panel and along a general direction v in the lower one.]

Consider two inertial frames K and K′ whose origins coalesce at time t = 0 (which means that the affine coordinate transformation (3.1) will be homogeneous, a = 0). We assume that K′ moves at constant velocity v relative to K. Lorentz transformations of this type, Λ_v, are called special Lorentz transformations. Physically, they describe what is sometimes called a Lorentz boost, i.e. the passage from a stationary to a moving frame. (Although we are talking about a mere coordinate transformation, i.e. no acceleration process is described.) We wish to explore the mathematical and physical properties of these transformations.

Without loss of generality, we assume that the vector v describing the motion is parallel to the e₁-direction of K (and hence to the e′₁-direction of K′): any general velocity vector v may be rotated to v ∥ e₁ by a space-like rotation R. This means that a generic special Lorentz transformation Λ_v may be obtained from the prototypical one, Λ_{ve₁}, by a spatial rotation, Λ_v = Λ_R Λ_{ve₁} Λ_R⁻¹ (cf. the figure).

A special transformation in e₁-direction will leave the coordinates x₂ = x′₂ and x₃ = x′₃ unchanged.[7] The sought-for transformation thus assumes the form

$$\Lambda_{v\mathbf e_1} = \begin{pmatrix}A_v & & \\ & 1 & \\ & & 1\end{pmatrix},$$

where the 2×2 matrix A_v operates in (ct, x₁)-space.

[7] To actually prove this, notice that, from the point of view of an observer in K′, we consider a special transformation with velocity −ve′₁. Then apply symmetry arguments (notably the condition of parity invariance: a mirrored universe should not differ in an observable manner in its transformation behaviour from ours) to show that any change in these coordinates would lead to a contradiction.

There are many ways to determine the matrix A_v. We here use a group-theory-inspired method which may come across as somewhat abstract, but has the advantage of straightforward applicability to many other physical problems. Being an element of a continuous group of transformations (a Lie group), the matrix A_v can be written in an exponential parameterization, $A_v = \exp(\sum_i\lambda_iT_i)$. Here, the λᵢ are real parameters and the Tᵢ fixed two-dimensional 'generator matrices'. (Cf. the conventional rotation group: the Tᵢ's play the role of angular momentum generator matrices, and the λᵢ's are generalized real-valued 'angles'.) To determine the group generators, we consider the transformation A_v for infinitesimal angles λᵢ. Substituting the matrix A_v into the defining equation of the Lorentz group, using that the restriction of the metric to (ct, x₁)-space assumes the form g → σ₃ ≡ diag(1, −1), and expanding to first order in λᵢ, we obtain

$$A_v^T\sigma_3A_v \simeq \Big(1 + \sum_i\lambda_iT_i^T\Big)\,\sigma_3\,\Big(1 + \sum_i\lambda_iT_i\Big) \simeq \sigma_3 + \sum_i\lambda_i\left(T_i^T\sigma_3 + \sigma_3T_i\right) \stackrel{!}{=} \sigma_3.$$
This relation must hold regardless of the value of λᵢ, i.e. the matrices Tᵢ must obey the condition $T_i^T\sigma_3 = -\sigma_3T_i$. Up to normalization, there is only one matrix that meets this condition, viz.

$$T = \sigma_1 \equiv \begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}.$$

The most general form of the matrix A_v is thus given by

$$A_v = \exp\left(\lambda\begin{pmatrix}0 & 1\\ 1 & 0\end{pmatrix}\right) = \begin{pmatrix}\cosh\lambda & \sinh\lambda\\ \sinh\lambda & \cosh\lambda\end{pmatrix},$$

where we denoted the single remaining parameter by λ, and in the second equality expanded the exponential in a power series. We must now determine the parameter λ = λ(v) in such a way that A_v actually describes our v-dependent transformation. We first substitute A_v into the transformation law x′ = Λ_v x, or

$$\begin{pmatrix}x'^0\\ x'^1\end{pmatrix} = A_v\begin{pmatrix}x^0\\ x^1\end{pmatrix}.$$

Now, the origin of K′ (having coordinate x′¹ = 0) is at coordinate x¹ = vt. Substituting this condition into the equation above, we obtain tanh λ = −v/c. Using this result and introducing the (standard) notation

$$\beta = |\mathbf v|/c,\qquad \gamma \equiv (1 - \beta^2)^{-1/2},\tag{3.8}$$

the special Lorentz transformation assumes the form

$$\Lambda_{v\mathbf e_1} = \begin{pmatrix}\gamma & -\beta\gamma & & \\ -\beta\gamma & \gamma & & \\ & & 1 & \\ & & & 1\end{pmatrix}.\tag{3.9}$$

EXERCISE: Repeat the argument above for the full transformation group, i.e. not just the set of special Lorentz transformations along a certain axis. To this end, introduce an exponential representation $\Lambda = \exp(\sum_i\lambda_iT_i)$ for general elements of the restricted Lorentz group. Use the defining condition of the group to identify six linearly independent matrix generators.[8] Among the matrices Tᵢ, identify three as the generators Jᵢ of the spatial rotation group. Three others generate special Lorentz transformations along the three coordinate axes of K. (Linear combinations of these generators can be used to describe arbitrary special Lorentz transformations.)

[8] Matrices are elements of a certain vector space (the space of linear transformations of a given vector space). A set of matrices A₁, ..., A_k is linearly independent if no non-vanishing linear combination $\sum_i x_iA_i = 0$ exists.

A few more remarks on the transformations of the restricted Lorentz group:

. In the limit of small β, the transformation Λ_{ve₁} asymptotes to the Galilei transformation, t′ = t and x′¹ = x¹ − vt.

. We repeat that a general special Lorentz transformation can be obtained from the transformation along e₁ by a space-like rotation, Λ_v = Λ_R Λ_{ve₁} Λ_R⁻¹, where R is a rotation mapping e₁ onto the unit vector in v-direction.

. Without proof (however, the proof is by and large implied by the exercise above), we mention that a general restricted Lorentz transformation can always be represented as Λ = Λ_R Λ_v, i.e. as the product of a rotation and a special Lorentz transformation.

3.3 Aspects of relativistic dynamics

In this section we explore a few of the physical consequences deriving from Einstein's postulates. Our discussion will be embarrassingly superficial; key topics relating to the theory of relativity aren't mentioned at all. Rather, the prime objective of this section is to provide the core background material required to discuss the relativistic invariance of electrodynamics below.
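Before turning to dynamics, a quick numerical check of the boost (3.9) may be useful. The sketch below (assuming NumPy; units with c = 1) verifies the Lorentz condition (3.4) and the additivity of the 'rapidity' λ, from which the standard relativistic velocity composition rule follows:

```python
# The boost (3.9): Lorentz condition and composition of two boosts along e1.
import numpy as np

g = np.diag([1.0, -1.0, -1.0, -1.0])

def boost(beta):
    """Special Lorentz transformation along e1, Eq. (3.9)."""
    gamma = 1.0 / np.sqrt(1.0 - beta**2)
    L = np.eye(4)
    L[:2, :2] = [[gamma, -beta * gamma],
                 [-beta * gamma, gamma]]
    return L

b1, b2 = 0.5, 0.3
assert np.allclose(boost(b1).T @ g @ boost(b1), g)     # Lorentz condition (3.4)

# tanh(l1 + l2) = (tanh l1 + tanh l2)/(1 + tanh l1 tanh l2), with tanh l = -beta
b12 = (b1 + b2) / (1 + b1 * b2)
assert np.allclose(boost(b1) @ boost(b2), boost(b12))  # rapidities add
print("checks passed; composite velocity:", b12)
```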
3.3.1 Proper time

Consider the world line of a particle moving in space in a (not necessarily uniform) manner. At any instant of time t, the particle will be at rest in an inertial system K′ whose origin coincides with x(t) and whose instantaneous velocity w.r.t. K is given by dₜx(t). At this point, it is important to avoid confusion: of course, the coordinate system whose origin globally coincides with x(t) is not inertial w.r.t. K (unless the motion is uniform). Put differently, the origin of the inertial system K′ defined by the instantaneous position x(t) and the instantaneous velocity dₜx(t) coincides with the particle's position only at the very instant t. The t′ axis of K′ is parallel to the tangent of the world line at x(t). Also, we know that the t′ axis cannot be tilted w.r.t. the t axis by more than an angle π/4, i.e. it must stay within the instantaneous light cone attached to the reference coordinate x.

[Margin figure: a world line in the (x, ct) plane, with the instantaneous rest frame axes attached to a point of the curve.]

Now, imagine an observer traveling along the world line x(t) and carrying a watch. At time t (corresponding to time zero in the instantaneous rest frame) the watch is reset. We wish to identify the coordinates of the event 'watch shows (infinitesimal) time dτ' in both K and K′. In K′, the answer to this question is formulated easily enough: the event has coordinates (c dτ, 0), where we noted that, due to the orthogonality of the space-like coordinate axes to the world line, the watch will remain at coordinate x′ = 0. To identify the K-coordinates, we first note that the origin of K′ is translated w.r.t. that of K by a = (ct, x(t)). Thus, the coordinate transformation between the two frames assumes the general affine form x = Λx′ + a. In K, the event takes place at some (as yet undetermined) time dt later than t. Its spatial coordinates will be x(t) + v dt, where v = dₜx is the instantaneous velocity of the moving observer. We thus have the identification (c dτ, 0) ↔ (c(t + dt), x + v dt). Comparing this with the general form of the coordinate transformation, we realize that (c dτ, 0) and (c dt, v dt) are related by a homogeneous Lorentz transformation. This implies, in particular, that (c dτ)² = (c dt)² − (v dt)², or

$$d\tau = dt\,\sqrt{1 - \mathbf v^2/c^2}.\tag{3.10}$$

The time between events of finite duration may be obtained by integration,

$$\tau = \int dt\,\sqrt{1 - \mathbf v^2/c^2}.$$

This equation expresses the famous phenomenon of the relativity of time: for a moving observer, time proceeds more slowly than for an observer at rest. The time measured by the propagating watch, τ, is called proper time. The attribute 'proper' indicates that τ is defined with reference to a coordinate system (the instantaneous rest frame) that can be canonically defined: while the time t associated to a given point on the world line x(t) = (ct, x(t)) will change from system to system (as we just saw), the proper time remains the same. The proper time can thus be used to parameterize the space-time curve in an invariant manner, as x(τ) = (ct(τ), x(τ)).

3.3.2 Relativistic mechanics

Relativistic energy momentum relation

In section 3.1.2 we argued that Einstein solved the puzzle posed by the Lorentz invariance of electrodynamics by postulating that Newtonian mechanics was in need of generalization. We will begin our discussion of the physical ramifications of Lorentz invariance by developing this extended picture. Again, we start from postulates motivated by physical reasoning:
. The extended variant of Newtonian mechanics (let's call it relativistic mechanics) ought to be invariant under Lorentz transformations, i.e. the generalization of Newton's equations should assume the same form in all inertial frames.

. In the rest frame of a particle subject to mechanical forces, or in frames moving at low velocity v/c ≪ 1 relative to the rest frame, the equations must asymptote to Newton's equations.

Newton's equations in the rest frame K are of course given by

$$m\,\frac{d^2\mathbf x}{d\tau^2} = \mathbf f,$$

where m is the particle's mass (in its rest frame — as we shall see, the mass is not an invariant quantity; we had thus better speak of a particle's rest mass), and f is the force. As always in relativity, a spatial vector (such as x or f) must be interpreted as the space-like component of a four-vector. The zeroth component of x is given by x⁰ = cτ, so that the rest frame generalization of the Newton equation reads m d²_τ x^μ = f^μ, where f = (0, f) and we noted that for the zeroth component m d²_τ x⁰ = 0, so that f⁰ = 0. It is useful to define the four-momentum of the particle by p^μ = m d_τ x^μ. (Notice that both τ and the rest mass m are invariant quantities: they are not affected by Lorentz transformations. The transformation behaviour of p is thus determined by that of x.) The zeroth component of the momentum, m d_τ x⁰, carries the dimension [energy]/[velocity]. This suggests to write

$$p = \begin{pmatrix}E/c\\ \mathbf p\end{pmatrix},$$

where E is some characteristic energy. (In the instantaneous rest frame of the particle, E = mc² and p = 0.) Expressed in terms of the four-momentum, the Newton equation assumes the simple form

$$d_\tau p = f.\tag{3.11}$$

Let us explore what happens to the four-momentum as we boost the particle from its rest frame to a frame moving with velocity vector v = ve₁. Qualitatively, one will expect that in the moving frame the particle carries a finite space-like momentum (which will be proportional to the velocity of the boost, v). We should also expect a renormalization of its energy (from the point of view of the moving frame, the particle has acquired kinetic energy). Focusing on the 0- and 1-components of the momentum (the others won't change), Eq. (3.9) implies

$$\begin{pmatrix}mc^2/c\\ 0\end{pmatrix} \to \begin{pmatrix}E'/c\\ p'^1\end{pmatrix} = \begin{pmatrix}\gamma mc^2/c\\ -(\gamma m)v\end{pmatrix}.$$

This equation may be interpreted in two different ways. Substituting the explicit formula E = mc², we find E′ = (γm)c² and p′¹ = −(γm)v. In Newtonian mechanics, we would have expected that a particle at rest in K carries momentum p′¹ = −mv in K′. The difference to the relativistic result is the renormalization of the mass factor: a particle that has mass m in its rest frame appears to carry a velocity dependent mass

$$m(v) = \gamma m = \frac{m}{\sqrt{1 - (v/c)^2}}$$

in K′. For v → c, it becomes infinitely heavy, and its further acceleration becomes progressively more difficult: particles of finite rest frame mass cannot move faster than with the speed of light.

The energy in K′ is given by E′ = (mγ)c², and is again affected by mass renormalization. However, a more useful representation is obtained by noting that p^T g p is a conserved quantity. In K, p^T g p = (E/c)² = (mc)². Comparing with the bilinear form in K′, we find (mc)² = (E′/c)² − p′², or

$$E' = \sqrt{(mc^2)^2 + (\mathbf p'c)^2}.\tag{3.12}$$

This relativistic energy–momentum relation determines the relation between energy and momentum of a free particle. Its most important implication is that even a free particle at rest carries energy E = mc², Einstein's world-famous result.
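As a quick numerical consistency check of (3.12), one can boost the rest frame four-momentum explicitly and verify the invariance of p^T g p; a minimal sketch, assuming NumPy (units with c = 1 and toy values for m and β):

```python
# Boosting p = (mc, 0, 0, 0) with (3.9) and checking p^T g p = (mc)^2.
import numpy as np

g = np.diag([1.0, -1.0, -1.0, -1.0])
m, beta = 2.0, 0.6
gamma = 1.0 / np.sqrt(1.0 - beta**2)
L = np.eye(4)
L[:2, :2] = [[gamma, -beta * gamma], [-beta * gamma, gamma]]

p_rest = np.array([m, 0.0, 0.0, 0.0])   # (E/c, p) in the rest frame
p_mov = L @ p_rest                       # -> (gamma m c, -gamma m v, 0, 0)
print(p_mov[:2])                         # [ 2.5 -1.5]
print(p_mov @ g @ p_mov, m**2)           # both 4.0: the invariant behind (3.12)
```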
However, this energy is usually not observable (unless it gets released in nuclear processes of mass fragmentation, with all the known consequences). What we do observe in daily life are the changes in energy as we observe the particle in different frames. This suggests to define the kinetic energy of a particle by

$$T \equiv E - mc^2 = \sqrt{(mc^2)^2 + (\mathbf pc)^2} - mc^2.$$

For velocities v ≪ c, the kinetic energy T ≃ p²/2m indeed reduces to the familiar non-relativistic result. Deviations from this relation become sizeable at velocities comparable to the speed of light.

Particle subject to a Lorentz force

We next turn back to the discussion of the relativistically invariant Newton equation d_τ p^μ = f^μ. Both p^μ and the components of the force, f^μ, transform as vectors under Lorentz transformations. Specifically, we wish to explore this transformation behaviour in an example of obvious relevance to electrodynamics: the Newton equation of a particle of velocity v subject to an electric and a magnetic field. Unlike in much of our previous discussion, our observer frame K is not the rest frame of the particle. The four-momentum of the particle is thus given by p = (mγc, mγv). What we take for granted is the Lorentz force relation,

$$\frac{d}{dt}\,\mathbf p = q\left(\mathbf E + \frac{\mathbf v}{c}\times\mathbf B\right).$$

In view of our discussion above, it is important to appreciate the meaning of the non-invariant constituents of this equation. Specifically, t is the time in the observer frame, and p is the space-like component of the actual momentum. This quantity is different from mv; rather, it is given by the product (observed mass) × (observed velocity), where the observed mass differs from the rest mass, as discussed above. We now want to bring this equation into the form of an invariant Newton equation (3.11). Noting that (cf. Eq. (3.10)) d/dt = γ⁻¹ d/dτ, we obtain

$$\frac{d}{d\tau}\,\mathbf p = \mathbf f,$$

where the force acting on the particle is given by f = qγ(E + (v/c) × B). To complete our derivation of the Newton equation in K, we need to identify the zeroth component of the force, f⁰. The zeroth component of the left hand side of the Newton equation is given by (γ/c)dₜE, i.e. (γ/c) times the rate at which the kinetic energy of the particle changes. However, we have seen before that the rate of potential energy change is given by dₜU = −f·v = −qE·v. Energy conservation implies that this energy gets converted into kinetic energy, dₜT = dₜE = +qE·v. This implies the identification f⁰ = (γq/c)E·v. The generalized form of Newton's equations is thus given by

$$m\,d_\tau\begin{pmatrix}\gamma c\\ \gamma\mathbf v\end{pmatrix} = \frac{q}{c}\begin{pmatrix}\mathbf E\cdot(\gamma\mathbf v)\\ \gamma c\,\mathbf E + (\gamma\mathbf v)\times\mathbf B\end{pmatrix}.$$

The form of these equations suggests to introduce the four-velocity vector

$$v \equiv \begin{pmatrix}\gamma c\\ \gamma\mathbf v\end{pmatrix}$$

(in terms of which the four-momentum is given by p = mv, i.e. by multiplication by the rest mass). The equations of motion can then be expressed as

$$m\,d_\tau v^\mu = \frac{q}{c}\,F^{\mu\nu}v_\nu,\tag{3.13}$$

where we used the index raising and lowering convention originally introduced in chapter ??, i.e. v⁰ = v₀ and vⁱ = −vᵢ, and the matrix F is defined by

$$F = \{F^{\mu\nu}\} = \begin{pmatrix}0 & -E_1 & -E_2 & -E_3\\ E_1 & 0 & -B_3 & B_2\\ E_2 & B_3 & 0 & -B_1\\ E_3 & -B_2 & B_1 & 0\end{pmatrix}.\tag{3.14}$$

The significance of Eq. (3.13) goes much beyond that of a mere reformulation of Newton's equations: the electromagnetic field enters the equation through a matrix, the so-called field strength tensor. This signals that the transformation behaviour of the electromagnetic field will be different from that of vectors. Throughout much of the course, we treated the electric field as if it were a (space-like) vector.
Naively, one might have expected that in the theory of relativity this field gets augmented by a fourth component to become a four-vector. However, a moment's thought shows that this picture cannot be correct. Consider, say, a charged particle at rest in $K$. This particle will create a Coulomb electric field. However, in a frame $K'$ moving relative to $K$, the charge will be in motion. Thus, an observer in $K'$ will see an electric current plus the corresponding magnetic field. This tells us that under inertial transformations, electric and magnetic fields get transformed into one another. We thus need to find a relativistically invariant object accommodating the six components of the electric and magnetic field 'vectors'. Eq. (3.14) provides a tentative answer to that problem. The idea is that the fields enter the theory as a matrix and, therefore, transform in a way fundamentally different from that of vectors.

3.4 Mathematics of special relativity II: Co- and Contravariance

So far, we have focused on the Lorentz transformation behaviour of vectors in $\mathbb{R}^4$. However, our observations made at the end of the previous section suggest to include other objects (such as matrices) into our discussion. We will begin by providing some essential mathematical background and then, finally, turn to the discussion of Lorentz invariance of electrodynamics.

3.4.1 Covariant and Contravariant vectors

Generalities

Let $x = \{x^\mu\}$ be a four component object that transforms under homogeneous Lorentz transformations as $x \to \Lambda x$, or $x^\mu \to \Lambda^\mu{}_\nu x^\nu$ in components. (By $\Lambda^\mu{}_\nu$ we denote the components of the Lorentz transformation matrix. For the up/down arrangement of indices, see the discussion below.) Quantities of this transformation behaviour are called contravariant vectors (or contravariant tensors of first rank). Now, let us define another four-component object, $x_\mu \equiv g_{\mu\nu}x^\nu$. Under a Lorentz transformation, $x_\mu \to g_{\mu\nu}\Lambda^\nu{}_\rho x^\rho = (\Lambda^{T-1})_\mu{}^\nu x_\nu$. Quantities transforming in this way will be denoted as covariant vectors (or covariant tensors of first rank). Defining $g^{\mu\nu}$ to be the inverse of the Lorentz metric (in a basis where $g$ is diagonal, $g = g^{-1}$), the contravariant ancestor of $x_\mu$ may be obtained back by 'raising the indices' as $x^\mu = g^{\mu\nu}x_\nu$.

Critical readers may find these 'definitions' unsatisfactory. Formulated in a given basis, one may actually wonder whether they are definitions at all. For a discussion of the basis invariant meaning behind the notions of co- and contravariance, we refer to the info block below. However, we here proceed in a pragmatic way and continue to explore the consequences of the definitions (which, in fact, they are) above.

INFO Consider a general real vector space $V$. Recall that the dual space $V^*$ is the linear space of all mappings $\tilde v: V \to \mathbb{R},\ v \mapsto \tilde v(v)$. (This space is a vector space by itself, which is why we denote its elements by vector-like symbols, $\tilde v$.) For a given basis $\{e_\mu\}$ of $V$, a dual basis $\{\tilde e^\mu\}$ of $V^*$ may be defined by the condition $\tilde e^\mu(e_\nu) = \delta^\mu_\nu$. With the expansions $v = \sum_\mu v^\mu e_\mu$ and $\tilde v = \sum_\mu \tilde v_\mu\tilde e^\mu$, respectively, we have $\tilde v(v) = \tilde v_\mu v^\mu$. (Notice that components of objects in dual space will be indexed by subscripts throughout.) In the literature of relativity, elements of the vector space, then to be identified with space–time (see below), are usually called contravariant vectors, while their partners in dual space are called covariant vectors.

EXERCISE Let $A: V \to V$ be a linear map defining a basis change in $V$, i.e.
$e_\mu = A_\mu{}^\nu e'_\nu$, where $\{A_\mu{}^\nu\}$ are the matrix elements of $A$. Show that:

– The components of a contravariant vector $v$ transform as $v^\mu \to v'^\mu = (A^T)^\mu{}_\nu v^\nu$.
– The components of a covariant vector $\tilde w$ transform as $\tilde w_\mu \to \tilde w'_\mu = (A^{-1})_\mu{}^\nu \tilde w_\nu$.
– The action of $\tilde w$ on $v$ remains invariant, i.e. $\tilde w(v) = \tilde w_\mu v^\mu = \tilde w'_\mu v'^\mu$ does not change. (Of course, the action of the linear map $\tilde w$ on vectors must not depend on the choice of a particular basis.)

In physics it is widespread (if somewhat tautological) practice to define co- or contravariant vectors by their transformation behaviour. For example, a set of $d$ components $\{w_\mu\}$ is denoted a covariant vector if these components change under linear transformations as $w_\mu \to A_\mu{}^\nu w_\nu$.

Now, let us assume that $V$ is a vector space with scalar product $g: V\times V \to \mathbb{R},\ (v, w) \mapsto v^Tgw \equiv \langle v, w\rangle$. A special feature of vector spaces with scalar product is the existence of a canonical mapping $V \to V^*,\ v \mapsto \tilde v$, i.e. a mapping that to each vector $v$ canonically (without reference to a specific basis) assigns a dual element $\tilde v$. The vector $\tilde v$ is implicitly defined by the condition $\forall w \in V:\ \tilde v(w) \overset{!}{=} \langle v, w\rangle$. For a given basis $\{e_\mu\}$ of $V$, the components of $\tilde v = \sum_\mu\tilde v_\mu\tilde e^\mu$ may be obtained as $\tilde v_\mu = \tilde v(e_\mu) = \langle v, e_\mu\rangle = v^\nu g_{\nu\mu} = g_{\mu\nu}v^\nu$, where in the last step we used the symmetry of $g$. With the so-called index lowering convention

$$v_\mu = g_{\mu\nu}v^\nu, \qquad (3.15)$$

we are led to the identification $\tilde v_\mu = v_\mu$ and $\tilde v(w) = v_\mu w^\mu$. Before carrying on, let us introduce some more notation: by $g^{\mu\nu} \equiv (g^{-1})^{\mu\nu}$ we denote the components of the inverse of the metric (which, in the case of the Lorentz metric in its standard diagonal representation, equals the metric itself; but that's an exception). Then $g^{\mu\nu}g_{\nu\rho} = \delta^\mu_\rho$, where $\delta^\mu_\rho$ is the standard Kronecker symbol. With this definition, we may introduce an index raising convention as $v^\mu \equiv g^{\mu\nu}v_\nu$. Indeed, $g^{\mu\nu}v_\nu = g^{\mu\nu}g_{\nu\rho}v^\rho = \delta^\mu_\rho v^\rho = v^\mu$. (Notice that indices that are summed over or 'contracted' always appear crosswise, one upper and one lower index.)

Consider a map $A: V \to V,\ v \mapsto Av$. In general, the action of the canonical covariant vector $\widetilde{Av}$ on transformed vectors $Aw$ need not equal the original value $\tilde v(w)$. However, it is natural to focus on those transformations that are compatible with the canonical assignment, i.e. $\widetilde{Av}(Aw) = \tilde v(w)$. In a component language, this condition translates to $(A^T)_\mu{}^\sigma g_{\sigma\rho}A^\rho{}_\nu = g_{\mu\nu}$, or $A^TgA = g$. I.e. transformations compatible with the canonical identification (vector space) ↔ (dual space) have to respect the metric. E.g., in the case of the Lorentz metric, the 'good' transformations belong to the Lorentz group. Summarizing, we have seen that the invariant meaning behind contra- and covariant vectors is that of vectors and dual vectors, respectively. Under Lorentz transformations these objects behave as is required by the component definitions given in the main text. (A general tensor of degree $(n, m)$ is an element of $(\otimes_1^n V)\otimes(\otimes_1^m V^*)$.)

The definitions above may be extended to objects of higher complexity. A two component quantity $W^{\mu\nu}$ is called a contravariant tensor of second degree if it transforms under Lorentz transformations as $W^{\mu\nu} \to \Lambda^\mu{}_{\mu'}\Lambda^\nu{}_{\nu'}W^{\mu'\nu'}$. Similarly, a covariant tensor of second degree transforms as $W_{\mu\nu} \to (\Lambda^{T-1})_\mu{}^{\mu'}(\Lambda^{T-1})_\nu{}^{\nu'}W_{\mu'\nu'}$. Covariant and contravariant tensors are related to each other by index raising/lowering, e.g. $W_{\mu\nu} = g_{\mu\mu'}g_{\nu\nu'}W^{\mu'\nu'}$.
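INFO The index conventions are conveniently rehearsed on the computer. The sketch below (my own illustration, not from the text) lowers and raises indices with the Lorentz metric and checks the compatibility condition $\Lambda^Tg\Lambda = g$ for a boost, together with the invariance of the contraction $x_\mu x^\mu$:

```python
import numpy as np

g = np.diag([1.0, -1.0, -1.0, -1.0])     # Lorentz metric, diagonal basis
g_inv = np.linalg.inv(g)                 # here equal to g itself

x_up = np.array([2.0, 1.0, -0.5, 0.3])   # contravariant components x^mu
x_dn = g @ x_up                          # covariant components x_mu
assert np.allclose(g_inv @ x_dn, x_up)   # raising undoes lowering

beta = 0.8
gamma = 1.0 / np.sqrt(1 - beta**2)
Lam = np.eye(4)
Lam[:2, :2] = [[gamma, -gamma * beta],
               [-gamma * beta, gamma]]   # boost along the 1-axis

# the 'good' transformations respect the metric: Lambda^T g Lambda = g
assert np.allclose(Lam.T @ g @ Lam, g)

# contractions are Lorentz scalars: x_mu x^mu is unchanged by the boost
x_up2 = Lam @ x_up
assert np.isclose(x_up @ g @ x_up, x_up2 @ g @ x_up2)
print("all identities hold")
```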
A mixed second rank tensor $W_\mu{}^\nu$ transforms as $W_\mu{}^\nu \to (\Lambda^{T-1})_\mu{}^{\mu'}\Lambda^\nu{}_{\nu'}W_{\mu'}{}^{\nu'}$. The generalization of these definitions to tensors of higher degree should be obvious. Finally, we define a contravariant vector field (as opposed to a fixed vector) as a field $v^\mu(x)$ that transforms under Lorentz transformations as $v^\mu(x) \to v'^\mu(x') = \Lambda^\mu{}_\nu v^\nu(x)$. A Lorentz scalar (field), or tensor of degree zero, is one that does not actively transform under Lorentz transformations: $\phi(x) \to \phi'(x') = \phi(x)$. Covariant vector fields, tensor fields, etc. are defined in an analogous manner.

The summation over a contravariant and a covariant index is called a contraction. For example, $v^\mu w_\mu \equiv \phi$ is the contraction of a contravariant and a covariant vector (field). As a result, we obtain a Lorentz scalar, i.e. an object that does not change under Lorentz transformations. The contraction of two indices lowers the tensor degree of an object by two. For example, the contraction of a mixed tensor of degree two and a contravariant tensor of degree one, $A^\mu{}_\nu v^\nu \equiv w^\mu$, obtains a contravariant tensor of degree one, etc.

3.4.2 The relativistic covariance of electrodynamics

We now have everything in store to prove the relativistic covariance of electrodynamics, i.e. the form-invariance of its basic equations under Lorentz transformations. Basically, what we need to do is assign a definite transformation behaviour (scalar, co-/contravariant tensor, etc.) to the building blocks of electrodynamics.

The coordinate vector $x^\mu$ is a contravariant vector. In electrodynamics we frequently differentiate w.r.t. the components $x^\mu$, i.e. the next question to ask is whether the four-dimensional gradient $\{\partial_{x^\mu}\}$ is co- or contravariant. According to the chain rule,

$$\frac{\partial}{\partial x'^{\mu'}} = \frac{\partial x^\mu}{\partial x'^{\mu'}}\frac{\partial}{\partial x^\mu}.$$

Using that $x^\mu = (\Lambda^{-1})^\mu{}_{\mu'}x'^{\mu'}$, we find that

$$\frac{\partial}{\partial x'^{\mu'}} = (\Lambda^{T-1})_{\mu'}{}^\mu\frac{\partial}{\partial x^\mu},$$

i.e. $\frac{\partial}{\partial x^\mu} \equiv \partial_\mu$ transforms as a covariant vector. In components, $\partial_\mu = (c^{-1}\partial_t, \nabla)$, $\partial^\mu = (c^{-1}\partial_t, -\nabla)$.

The four-current vector

We begin by showing that the four-component object $j = (c\rho, \mathbf{j})$ is a Lorentz vector. Consider the current density carried by a point particle at coordinate $\mathbf{x}(t)$ in some coordinate system $K$. The four-current density carried by the particle is given by $j(x) = q\,(c, d_t\mathbf{x}(t))\,\delta(\mathbf{x} - \mathbf{x}(t))$, or $j^\mu(x) = q\,d_tx^\mu(t)\,\delta(\mathbf{x} - \mathbf{x}(t))$, where $x^\mu(t) = (ct, \mathbf{x}(t))$. To show that this is a contravariant vector, we introduce a dummy integration,

$$j^\mu(x) = q\int d\tilde t\,\frac{dx^\mu(\tilde t)}{d\tilde t}\,\delta(\mathbf{x} - \mathbf{x}(\tilde t))\,\delta(t - \tilde t) = qc\int d\tau\,\frac{dx^\mu}{d\tau}\,\delta^4(x - x(\tau)),$$

where in the last step we switched to an integration over the proper time $\tau = \tau(\tilde t)$ uniquely assigned to the world line parameterization $x(\tilde t) = (c\tilde t, \mathbf{x}(\tilde t))$ in $K$. Now, the four-component $\delta$-distribution, $\delta^4(x) = \delta(\mathbf{x})\delta(ct)$, is a Lorentz scalar (why?). The proper time, $\tau$, also is a Lorentz scalar. Thus, the transformation behaviour of $j^\mu$ is dictated by that of $x^\mu$, i.e. $j^\mu$ defines a contravariant vector. As an important corollary we note that the continuity equation, being the contraction of the covariant vector $\partial_\mu$ with the contravariant vector $j^\mu$, is Lorentz invariant: $\partial_\mu j^\mu = 0$ is a Lorentz scalar.

Electric field strength tensor and vector potential

In section 3.3.2 we obtained Eq. (3.13) for the Lorentz invariant generalization of Newton's equations. Since $v_\nu$ and $v^\mu$ transform as covariant and contravariant vectors, respectively, the matrix $F^{\mu\nu}$ must transform as a contravariant tensor of rank two.
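INFO The claim that $F^{\mu\nu}$ transforms as a rank-two tensor can be made tangible numerically: under $F \to \Lambda F\Lambda^T$, a pure electric field in $K$ acquires a magnetic component in the boosted frame, exactly as anticipated at the beginning of this section. A minimal Python sketch (assumed field values, units with $c = 1$):

```python
import numpy as np

def field_tensor(E, B):
    """Field strength tensor F^{mu nu}, Eq. (3.14)."""
    return np.array([
        [0.0,   -E[0], -E[1], -E[2]],
        [E[0],   0.0,  -B[2],  B[1]],
        [E[1],   B[2],  0.0,  -B[0]],
        [E[2],  -B[1],  B[0],  0.0]])

E = np.array([0.0, 1.0, 0.0])        # pure electric field in K
B = np.zeros(3)

beta = 0.5
gamma = 1.0 / np.sqrt(1 - beta**2)
Lam = np.eye(4)
Lam[:2, :2] = [[gamma, -gamma * beta],
               [-gamma * beta, gamma]]

F2 = Lam @ field_tensor(E, B) @ Lam.T   # rank-two tensor transformation

E2 = np.array([F2[1, 0], F2[2, 0], F2[3, 0]])
B2 = np.array([F2[3, 2], F2[1, 3], F2[2, 1]])
print(E2)   # [0, gamma, 0]: the transverse E field is enhanced
print(B2)   # [0, 0, -gamma*beta]: a magnetic field has appeared
```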
Now, let us define the four-vector potential as $\{A^\mu\} = (\phi, \mathbf{A})$. It is then a straightforward exercise to verify that the field strength tensor is obtained from the vector potential as

$$F^{\mu\nu} = \partial^\mu A^\nu - \partial^\nu A^\mu. \qquad (3.16)$$

(Just work out the antisymmetric combination of derivatives and compare to the definitions $\mathbf{B} = \nabla\times\mathbf{A}$, $\mathbf{E} = -\nabla\phi - c^{-1}\partial_t\mathbf{A}$.) This implies that the four-component object $A^\mu$ indeed transforms as a contravariant vector. Now, we have seen in chapter 1.2 that the Lorentz condition can be expressed as $\partial_\mu A^\mu = 0$, i.e. in a Lorentz invariant form. In the Lorentz gauge, the inhomogeneous wave equations assume the form

$$\Box A^\mu = \frac{4\pi}{c}j^\mu, \qquad (3.17)$$

where $\Box = \partial_\mu\partial^\mu$ is the Lorentz invariant wave operator. Formally, this completes the proof of the Lorentz invariance of the theory. The combination Lorentz condition/wave equations, which we saw carries the same information as the Maxwell equations, has been proven to be invariant. However, to stay in closer contact with the original formulation of the theory, we next express Maxwell's equations themselves in a manifestly invariant form.

Invariant formulation of Maxwell's equations

One may verify by direct inspection that the two inhomogeneous Maxwell equations can be formulated in a covariant manner as

$$\partial_\mu F^{\mu\nu} = \frac{4\pi}{c}j^\nu. \qquad (3.18)$$

To obtain the covariant formulation of the homogeneous equations, a bit of preparatory work is required. Let us define the fourth rank antisymmetric tensor as

$$\epsilon^{\mu\nu\rho\sigma} \equiv \begin{cases} 1, & (\mu,\nu,\rho,\sigma) = (0,1,2,3) \text{ or an even permutation thereof}, \\ -1 & \text{for an odd permutation}, \\ 0 & \text{else}. \end{cases} \qquad (3.19)$$

One may show (do it!) that $\epsilon^{\mu\nu\rho\sigma}$ transforms as a contravariant tensor of rank four (under the transformations of the unit-determinant subgroup $L_+$). Contracting this object with the covariant field strength tensor $F_{\mu\nu}$, we obtain a contravariant tensor of rank two,

$$\mathcal{F}^{\mu\nu} \equiv \frac{1}{2}\epsilon^{\mu\nu\rho\sigma}F_{\rho\sigma},$$

known as the dual field strength tensor. It is straightforward to verify that

$$\mathcal{F} = \{\mathcal{F}^{\mu\nu}\} = \begin{pmatrix} 0 & -B_1 & -B_2 & -B_3 \\ B_1 & 0 & E_3 & -E_2 \\ B_2 & -E_3 & 0 & E_1 \\ B_3 & E_2 & -E_1 & 0 \end{pmatrix}, \qquad (3.20)$$

i.e. that $\mathcal{F}$ is obtained from $F$ by the replacement $\mathbf{E}\to\mathbf{B}$, $\mathbf{B}\to-\mathbf{E}$. One also verifies that the homogeneous equations assume the covariant form

$$\partial_\mu\mathcal{F}^{\mu\nu} = 0. \qquad (3.21)$$

This completes the invariance proof. Critical readers may find the definition of the dual tensor somewhat unmotivated. Also, the excessive use of indices, something one normally tries to avoid in physics, does not help to make the structure of the theory transparent. Indeed, there exists a much more satisfactory, coordinate independent formulation of the theory in terms of differential forms. However, as we do not assume familiarity of the reader with the theory of forms, all we can do is encourage him/her to learn it and to advance to the truly invariant formulation of electrodynamics not discussed in this text ...

Chapter 4

Lagrangian mechanics

Newton's equations of motion provide a complete description of mechanical motion. Even from today's perspective, their scope is limited only by velocity (at high velocities $v \sim c$, Newtonian mechanics has to be replaced by its relativistic generalization) and classicality (the dynamics of small bodies is affected by quantum effects). Otherwise, they remain fully applicable, which is remarkable for a theory that old. Newtonian theory had its first striking successes in celestial mechanics. But how useful is this theory in a context more worldly than that of a planet in open space?
To see the justification of this question, consider the system shown in the figure, a setup known as the Atwood machine: two bodies subject to gravitational force are tied to each other by an idealized massless string running over an idealized frictionless pulley. What makes this problem different from those considered in celestial mechanics is that the participating bodies (the two masses) are constrained in their motion. The question then arises how this constraint can be incorporated into (Newton's) mechanical equations of motion, and how the ensuing equations can be solved. This problem is of profound applied relevance: of the hundreds of motions taking place in, say, the engine of a car, practically all are constrained. In the eighteenth century, the era initiating the age of engineering and industrialization, the availability of a formalism capable of efficiently formulating problems subject to mechanical constraints became pressing. Early solution schemes in terms of Newtonian mechanics relied on the concept of "constraining forces". The strategy there was to formulate a problem in terms of its basal unconstrained variables (e.g., the real space coordinates of the two masses in the figure). In a second step one would then introduce (infinitely strong) constraining forces serving to reduce the number of free coordinates (e.g., down to the one coordinate measuring the height of one of the masses in the figure). However, strategies of this type soon turned out to be operationally sub-optimal. The need to find more efficient formulations was motivation for intensive research activity, which eventually culminated in the modern formulations of classical mechanics, Lagrangian and Hamiltonian mechanics.

INFO There are a number of conceptually different types of mechanical constraints: constraints that can be expressed in terms of equalities such as $f(q_1, \ldots, q_n) = 0$ are called holonomic. Here, $q_1, \ldots, q_n$ are the basal coordinates of the problem, and $\dot q_1, \ldots, \dot q_n$ the corresponding velocities. (The constraint of the Atwood machine belongs to this category: $x_1 + x_2 = l$, where $l$ is a constant, and $x_i$, $i = 1, 2$, are the heights of the two bodies measured with respect to a common reference height.) This has to be contrasted to non-holonomic constraints, i.e. constraints formulated by inequalities or by non-integrable conditions on the velocities. (Think of the molecules moving in a piston. Their coordinates obey the constraint $0 \le q_j \le L_j$, $j = 1, 2, 3$, where $L_j$ are the extensions of the piston.) Constraints explicitly involving time, $f(q_1, \ldots, q_n; t) = 0$, are called rheonomic. For example, a particle constrained to move on a moving surface is subject to a rheonomic constraint. Constraints void of explicit time dependence are called scleronomic.

In this chapter, we will introduce the concept of variational principles to derive Lagrangian (and later Hamiltonian) mechanics from their Newtonian ancestor. Our construction falls short of elucidating the beautiful and important historical developments that eventually led to the modern formulation. Also, it is difficult to motivate in advance. However, within the framework of a short introductory course, these shortcomings are outweighed by the brevity of the derivation. Still, it is highly recommended to consult a textbook on classical mechanics to learn more about the historical developments that led from Newtonian to Lagrangian mechanics.
Let us begin with a few simple and seemingly uninspired manipulations of Newton's equation $m\ddot{\mathbf{q}} = \mathbf{F}$ of a single particle (the generalization to many particles will be obvious) subject to a conservative force $\mathbf{F}$. Notice that the l.h.s. of the equation may be written as $m\ddot{\mathbf{q}} = d_t\partial_{\dot{\mathbf{q}}}T$, where $T = T(\dot{\mathbf{q}}) = \frac{m}{2}\dot{\mathbf{q}}^2$ is the particle's kinetic energy. By definition, the conservative force affords a representation $\mathbf{F} = -\partial_{\mathbf{q}}U$, where $U = U(\mathbf{q})$ is the potential. This means that Newton's equation may be equivalently represented as

$$(d_t\partial_{\dot{\mathbf{q}}} - \partial_{\mathbf{q}})L(\mathbf{q}, \dot{\mathbf{q}}) = 0, \qquad (4.1)$$

where we have defined the Lagrangian function

$$L = T - U. \qquad (4.2)$$

But what is this reformulation good for? To appreciate the meaning of the mathematical structure of Eq. (4.1), we need to introduce the purely mathematical concept of

4.1 Variational principles

In standard calculus, one is concerned with functions $F(v)$ that take vectors $v \in \mathbb{R}^n$ as arguments. Variational calculus generalizes standard calculus, in that one considers "functions" $F[f]$ taking functions as arguments. Now, "function of a function" does not sound nice. This may be the reason why the "function" $F$ is actually called a functional. Similarly, it is customary to indicate the argument of a functional in square brackets. The generalization $v \to f$ is not in the least mysterious: as we have seen in previous chapters, one may discretize a function (cf. the discussion in section 1.3.4), $f \to \{f_i\,|\,i = 1, \ldots, N\}$, thus interpreting it as the limiting case of an $N$-dimensional vector; in many aspects, variational calculus amounts to a straightforward generalization of standard calculus. At any rate, we will see that one may work with functionals much like with ordinary functions.

[Figure 4.1: On the mathematical setting of functionals on curves (discussion, see text)]

4.1.1 Definitions

In this chapter we will focus on the class of functionals relevant to classical mechanics, viz. functionals $F[\gamma]$ taking curves in (subsets of) $\mathbb{R}^n$ as arguments. (We have actually met with more general functionals before: the electric susceptibility $\chi[\mathbf{E}]$ is a functional of the electric field $\mathbf{E}: \mathbb{R}^4 \to \mathbb{R}^3$.) To be specific, consider the set of all curves $M \equiv \{\gamma: I \equiv [t_0, t_1] \to U\}$ mapping an interval $I$ into a subset $U \subset \mathbb{R}^n$ of $n$-dimensional space (see Fig. 4.1). (The set $U$ may actually be a lower dimensional submanifold of $\mathbb{R}^n$. For example, we might consider curves in a two-dimensional plane embedded in $\mathbb{R}^3$, etc.) Now, consider a mapping

$$\Phi: M \to \mathbb{R}, \quad \gamma \mapsto \Phi[\gamma], \qquad (4.3)$$

assigning to each curve $\gamma$ a real number, i.e. a (real) functional on $M$.

EXAMPLE The length of a curve is defined as

$$L[\gamma] \equiv \int_{t_0}^{t_1}dt\left(\sum_{i=1}^n\dot\gamma_i(t)\dot\gamma_i(t)\right)^{1/2}. \qquad (4.4)$$
It assigns to each curve its Euclidean length. Some readers may question the consistency of the notation: on the l.h.s. we have the symbol $\gamma$ (no derivatives), and on the r.h.s. $\dot\gamma$ (temporal derivatives). However, there is no contradiction here. The notation $[\gamma]$ indicates dependence on the curve as a geometric object. By definition, this contains the full information on the curve, including all derivatives. The r.h.s. indicates that the functional $L$ reads only partial information on the curve, viz. that contained in first derivatives.

Consider now two curves $\gamma, \gamma' \in M$ that lie "close" to each other. (For example, we may require that $|\gamma(t) - \gamma'(t)| \equiv \left(\sum_i(\gamma_i(t) - \gamma'_i(t))^2\right)^{1/2} < \epsilon$ for all $t$ and some positive $\epsilon$.) We are interested in the increment $\Phi[\gamma] - \Phi[\gamma']$. Defining $\gamma' = \gamma + h$, the functional $\Phi$ is called differentiable iff

$$\Phi[\gamma + h] - \Phi[\gamma] = F|_\gamma[h] + O(h^2), \qquad (4.5)$$

where $F|_\gamma[h]$ is a linear functional of $h$, i.e. a functional obeying $F|_\gamma[c_1h_1 + c_2h_2] = c_1F|_\gamma[h_1] + c_2F|_\gamma[h_2]$ for $c_1, c_2 \in \mathbb{R}$ and $h_{1,2} \in M$. In (4.5), $O(h^2)$ indicates residual contributions of order $h^2$. For example, if $|h(t)| < \epsilon$ for all $t$, these terms would be of $O(\epsilon^2)$. The functional $F|_\gamma$ is called the differential of the functional $\Phi$ at $\gamma$. Notice that $F|_\gamma$ need not depend linearly on $\gamma$. The differential generalizes the notion of a derivative to functionals. Similarly, we may think of $\Phi[\gamma + h] = \Phi[\gamma] + F|_\gamma[h] + O(h^2)$ as a generalized Taylor expansion. The linear functional $F|_\gamma$ describes the behavior of $\Phi$ in the vicinity of the reference curve $\gamma$. A curve $\gamma$ is called an extremal curve of $\Phi$ if $F|_\gamma = 0$.

EXERCISE Re-familiarize yourself with the definition of the derivative $f'(x)$ of higher dimensional functions $f: \mathbb{R}^k \to \mathbb{R}$. Interpret the functional $\Phi[\gamma]$ as the limit of a function $\Phi: \mathbb{R}^N \to \mathbb{R},\ \{\gamma_i\} \mapsto \Phi(\{\gamma_i\})$, where the vector $\{\gamma_i\,|\,i = 1, \ldots, N\}$ is a discrete approximation of the curve $\gamma$. Think how the definition (4.5) generalizes the notion of differentiability and how $F|_\gamma \leftrightarrow f'(x)$ generalizes the definition of a derivative.

This is about as much as we need to say/define in most general terms. In the next section we will learn how to determine the extremal curves for an extremely important sub-family of functionals.

4.1.2 Euler–Lagrange equations

In the following, we will focus on functionals that afford a "local representation"

$$S[\gamma] = \int_{t_0}^{t_1}dt\,L(\gamma(t), \dot\gamma(t), t), \qquad (4.6)$$

where $L: \mathbb{R}^n\oplus\mathbb{R}^n\oplus\mathbb{R} \to \mathbb{R}$ is a function. The functional $S$ is "local" in that the integral kernel $L$ does not depend on points on the curve at different times. Local functionals play an important role in applications. (For example, the length functional (4.4) belongs to this family.) We will consider the restriction of local functionals to the set of all curves $\gamma \in M$ that begin and end at common terminal points: $\gamma(t_0) \equiv \gamma_0$ and $\gamma(t_1) = \gamma_1$ with fixed $\gamma_0$ and $\gamma_1$ (see the figure). Again, this is a restriction motivated by the applications below. To keep the notation slim, we will denote the space of all curves thus restricted again by $M$. We now prove the following important fact: the local functional $S[\gamma]$ is differentiable and its derivative is given by

$$F|_\gamma[h] = \int_{t_0}^{t_1}dt\,(\partial_\gamma L - d_t\partial_{\dot\gamma}L)\cdot h. \qquad (4.7)$$

Here, we are using the shorthand notation $\partial_\gamma L\cdot h \equiv \sum_{i=1}^n\partial_{x_i}L(x, \dot\gamma, t)\big|_{x_i = \gamma_i}h_i$, and analogously for $\partial_{\dot\gamma}L\cdot h$. Eq. (4.7) is verified by straightforward Taylor series expansion:

$$S[\gamma + h] - S[\gamma] = \int_{t_0}^{t_1}dt\,\big[L(\gamma + h, \dot\gamma + \dot h, t) - L(\gamma, \dot\gamma, t)\big] = \int_{t_0}^{t_1}dt\,\big[\partial_\gamma L\cdot h + \partial_{\dot\gamma}L\cdot\dot h\big] + O(h^2) =$$
$$= \int_{t_0}^{t_1}dt\,\big[\partial_\gamma L - d_t(\partial_{\dot\gamma}L)\big]\cdot h + \partial_{\dot\gamma}L\cdot h\Big|_{t_0}^{t_1} + O(h^2),$$

where in the last step we have integrated by parts. The surface term vanishes because $h(t_0) = h(t_1) = 0$, on account of the condition $\gamma(t_i) = (\gamma + h)(t_i) = \gamma_i$, $i = 0, 1$. Comparison with the definition (4.5) then readily gets us to (4.7).

Eq. (4.7) entails an important corollary: the local functional $S$ is extremal on all curves obeying the so-called Euler–Lagrange equations

$$\frac{d}{dt}\frac{\partial L}{\partial\dot\gamma_i} - \frac{\partial L}{\partial\gamma_i} = 0, \qquad i = 1, \ldots, N.$$

The reason is that if and only if these $N$ conditions hold will the linear functional (4.7) vanish on arbitrary curves $h$. (Exercise: assuming that one or several of the conditions above are violated, construct a curve $h$ on which the functional (4.7) will not vanish.)
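INFO The Euler–Lagrange machinery can be automated with a computer algebra system. The following sympy sketch (my own illustration, not part of the text) derives the Euler–Lagrange equation for the one-dimensional Lagrangian $L = \frac{m}{2}\dot x^2 - \frac{k}{2}x^2$ and recovers the harmonic oscillator equation $m\ddot x + kx = 0$:

```python
import sympy as sp

t = sp.symbols('t')
m, k = sp.symbols('m k', positive=True)
x = sp.Function('x')

# Lagrangian L = T - U of the one-dimensional harmonic oscillator
L = m / 2 * sp.diff(x(t), t)**2 - k / 2 * x(t)**2

# Euler-Lagrange expression: d/dt (dL/dx') - dL/dx
el = sp.diff(L, sp.diff(x(t), t)).diff(t) - sp.diff(L, x(t))
print(sp.simplify(el))   # prints m*x''(t) + k*x(t)
```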
Let us summarize what we have got: for a given function $L: \mathbb{R}^n\oplus\mathbb{R}^n\oplus\mathbb{R} \to \mathbb{R}$, the local functional

$$S: M \to \mathbb{R}, \quad \gamma \mapsto S[\gamma] \equiv \int_{t_0}^{t_1}dt\,L(\gamma(t), \dot\gamma(t), t), \qquad (4.8)$$

defined on the set $M = \{\gamma: [t_0, t_1] \to \mathbb{R}^n\,|\,\gamma(t_0) = \gamma_0, \gamma(t_1) = \gamma_1\}$, is extremal on curves obeying the conditions

$$\frac{d}{dt}\frac{\partial L}{\partial\dot\gamma_i} - \frac{\partial L}{\partial\gamma_i} = 0, \qquad i = 1, \ldots, N. \qquad (4.9)$$

These equations are called the Euler–Lagrange equations of the functional $S$. The function $L$ is often called the Lagrangian (function) and $S$ an action functional. At this stage, one may observe a suspicious structural similarity between the Euler–Lagrange equations and our early reformulation of Newton's equations (4.1). However, before shedding more light on this connection, it is worthwhile to illustrate the usage of Euler–Lagrange calculus on an

EXAMPLE Let us ask what curve between fixed initial and final points $\gamma_0$ and $\gamma_1$, resp., will have extremal curve length. Here, extremal means "shortest"; there is no "longest" curve. To answer this question, we need to formulate and solve the Euler–Lagrange equations of the functional (4.4). Differentiating the Lagrangian $L(\dot\gamma) = \left(\sum_{i=1}^n\dot\gamma_i\dot\gamma_i\right)^{1/2}$, we obtain (show it)

$$\frac{d}{dt}\frac{\partial L}{\partial\dot\gamma_i} - \frac{\partial L}{\partial\gamma_i} = \frac{\ddot\gamma_i}{L(\dot\gamma)} - \frac{\dot\gamma_i(\dot\gamma_j\ddot\gamma_j)}{L(\dot\gamma)^3}.$$

These equations are solved by all curves connecting $\gamma_0$ and $\gamma_1$ as a straight line. For example, the "constant velocity" curve $\gamma(t) = \gamma_0\frac{t - t_1}{t_0 - t_1} + \gamma_1\frac{t - t_0}{t_1 - t_0}$ has $\ddot\gamma_i = 0$, thus solving the equation. (Exercise: the most general curve connecting $\gamma_0$ and $\gamma_1$ by a straight line is given by $\gamma(t) = \gamma_0 + f(t)(\gamma_1 - \gamma_0)$, where $f(t_0) = 0$ and $f(t_1) = 1$. In general, these curves describe "accelerated motion", $\ddot\gamma \ne 0$. Still, they solve the Euler–Lagrange equations, i.e. they are of shortest geometric length. Show it!)

4.1.3 Coordinate invariance of Euler–Lagrange equations

What we actually mean when we write $\frac{\partial L}{\partial\gamma_i(t)}$ is: differentiate the function $L(\gamma, \dot\gamma, t)$ w.r.t. the $i$th coordinate of the curve $\gamma$ at time $t$. However, the same curve can be represented in different coordinates! For example, a three dimensional curve $\gamma$ will have different representations depending on whether we work in cartesian coordinates $\gamma(t) \leftrightarrow (x_1(t), x_2(t), x_3(t))$ or spherical coordinates $\gamma(t) \leftrightarrow (r(t), \theta(t), \phi(t))$.

[Figure 4.2: On the representation of curves and functionals in different coordinates]

Yet, nowhere in our derivation of the Euler–Lagrange equations did we make reference to specific properties of the coordinate system. This suggests that the Euler–Lagrange equations assume the same form, Eq. (4.9), in all coordinate systems. (To appreciate the meaning of this statement, compare with the Newton equations, which assume their canonical form $\ddot x_i = f_i(x)$ only in cartesian coordinates.) Coordinate changes play an extremely important role in analytical mechanics, especially when it comes to problems with constraints. Hence, anticipating that the Euler–Lagrange formalism is slowly revealing itself as a replacement of Newton's formulation, it is well invested time to take a close look at the role of coordinates.

Both a curve $\gamma$ and the functional $S[\gamma]$ are canonical objects; no reference to coordinates is made here. The curve is simply a map $\gamma: I \to U$, and $S[\gamma]$ assigns to that map a number. However, most of the time when we actually need to work with a curve, we do so in a system of coordinates. Mathematically, a coordinate system of $U$ is a diffeomorphic map (loosely speaking, an invertible and smooth, i.e. differentiable, map)

$$\phi: U \to V, \quad x \mapsto \phi(x) \equiv \mathbf{x} \equiv (x_1, \ldots, x_m),$$
where the coordinate domain $V$ is an open subset of $\mathbb{R}^m$ and $m$ is the dimensionality of $U \subset \mathbb{R}^n$. (The set $U$ may be of lower dimension than the embedding space $\mathbb{R}^n$.) For example, spherical coordinates $]0, \pi[\,\times\,]0, 2\pi[\;\ni\;(\theta, \phi)$ assign to each coordinate pair $(\theta, \phi)$ a point on the two-dimensional sphere $S^2 \subset \mathbb{R}^3$, etc. (We here avoid the discussion of the complications arising when $U$ cannot be covered by a single coordinate "chart" $\phi$. For example, the sphere $S^2$ cannot be fully covered by a single coordinate mapping. Think why.)

Given a coordinate system $\phi$, the abstract curve $\gamma$ defines a curve $\mathbf{x} \equiv \phi\circ\gamma: I \to V;\ t \mapsto \mathbf{x}(t) = \phi(\gamma(t))$ in coordinate space (see Fig. 4.2). What we actually mean when we refer to $\gamma_i(t)$ in the previous sections are the coordinates $x_i(t)$ of $\gamma$ in the coordinate system $\phi$. (In cases where unambiguous reference to one coordinate system is made, it is customary to avoid the introduction of extra symbols $(x_i)$ and to write $\gamma_i$ instead. As long as one knows what one is doing, no harm is done.) Similarly, the abstract functional $S[\gamma]$ defines a functional $S_c[\mathbf{x}] \equiv S[\phi^{-1}\circ\mathbf{x}]$ on curves in the coordinate domain. In our derivation of the Euler–Lagrange equations we have been making tacit use of a coordinate representation of this kind. In other words, the Euler–Lagrange equations were actually derived for the representation $S_c[\mathbf{x}]$. (Again, it is customary to simply write $S[\gamma]$ if unambiguous reference to a coordinate representation is made. Other customary notations include $S[x]$ (omitting the subscript $c$), or just $S[\gamma]$, where $\gamma$ is tacitly identified with a specific coordinate representation.)

Now, suppose we are given another coordinate representation of $U$, that is, a diffeomorphism $\phi': U \to V',\ x \mapsto \phi'(x) = \mathbf{x}' \equiv (x'_1, \ldots, x'_m)$. A point on the curve, $\gamma(t)$, now has two different coordinate representations, $\mathbf{x}(t) = \phi(\gamma(t))$ and $\mathbf{x}'(t) = \phi'(\gamma(t))$. The coordinate transformation between these representations is mediated by the map

$$\phi'\circ\phi^{-1}: V \to V', \quad \mathbf{x} \mapsto \mathbf{x}' = \phi'\circ\phi^{-1}(\mathbf{x}).$$

By construction, this is a smooth and invertible map between open subsets of $\mathbb{R}^m$. For example, if $\mathbf{x} = (x_1, x_2, x_3)$ are cartesian coordinates and $\mathbf{x}' = (r, \theta, \phi)$ are spherical coordinates, we would have $x'_1(\mathbf{x}) = (x_1^2 + x_2^2 + x_3^2)^{1/2}$, etc. The most important point is that $\mathbf{x}(t)$ and $\mathbf{x}'(t)$ describe the same curve, $\gamma(t)$, only in different representations. Specifically, if the reference curve $\gamma$ is an extremal curve, the coordinate representations $\mathbf{x}: I \to V$ and $\mathbf{x}': I \to V'$ will be extremal curves, too. According to our discussion in section 4.1.2, both curves must be solutions of the Euler–Lagrange equations, i.e. we can draw the conclusion:

$$\gamma \text{ extremal} \;\Rightarrow\; \frac{d}{dt}\frac{\partial L}{\partial\dot x_i} - \frac{\partial L}{\partial x_i} = 0, \qquad \frac{d}{dt}\frac{\partial L'}{\partial\dot x'_i} - \frac{\partial L'}{\partial x'_i} = 0, \qquad (4.10)$$

where $L'(\mathbf{x}', \dot{\mathbf{x}}', t) \equiv L(\mathbf{x}(\mathbf{x}'), \dot{\mathbf{x}}(\mathbf{x}', \dot{\mathbf{x}}'), t)$ is the Lagrangian in the $\phi'$-coordinate representation. In other words:

The Euler–Lagrange equations are coordinate invariant. They assume the same form in all coordinate systems.

EXERCISE Above we have shown the coordinate invariance of the Euler–Lagrange equations by conceptual reasoning. However, it must be possible to obtain the same invariance properties by brute force computation.
To show that the second line in (4.10) follows from the first, use the chain rule, $d_tx'_i = \sum_j\frac{\partial x'_i}{\partial x_j}d_tx_j$, and its immediate consequence $\frac{\partial\dot x'_i}{\partial\dot x_j} = \frac{\partial x'_i}{\partial x_j}$.

EXAMPLE Let us illustrate the coordinate invariance of the variational formalism on the example of the functional "curve length" discussed on p. 86. Considering the case of curves in the plane, $n = 2$, we might get the idea to attack this problem in polar coordinates, $\phi(\mathbf{x}) = (r, \varphi)$. The polar coordinate representation of the cartesian Lagrangian $L(x_1, x_2, \dot x_1, \dot x_2) = (\dot x_1^2 + \dot x_2^2)^{1/2}$ reads (verify it) $L(r, \varphi, \dot r, \dot\varphi) = (\dot r^2 + r^2\dot\varphi^2)^{1/2}$. It is now straightforward to compute the Euler–Lagrange equations

$$\frac{d}{dt}\frac{\partial L}{\partial\dot r} - \frac{\partial L}{\partial r} = \dot\varphi(\ldots) \overset{!}{=} 0, \qquad \frac{d}{dt}\frac{\partial L}{\partial\dot\varphi} - \frac{\partial L}{\partial\varphi} = \dot\varphi(\ldots) \overset{!}{=} 0.$$

Here, the notation $\dot\varphi(\ldots)$ indicates that we are getting a lengthy list of terms which, however, are all weighted by $\dot\varphi$. Putting the initial point into the origin, $\phi(\gamma_0) = (0, 0)$, and the final point somewhere into the plane, $\phi(\gamma_1) = (r_1, \varphi_1)$, we thus conclude that the straight line connectors, $\phi(\gamma(t)) = (r(t), \varphi_0)$, are solutions of the Euler–Lagrange equations. (It is less straightforward to show that these are the only solutions.)

With these preparations in store, we are now in a very good position to apply variational calculus to a new and powerful reformulation of Newtonian mechanics.

4.2 Lagrangian mechanics

4.2.1 The idea

Imagine a mechanical problem subject to constraints. For definiteness, we may consider the system shown on the right: a bead sliding on a string and subject to the gravitational force $\mathbf{F}_g$. In principle, we may describe the situation in terms of Newton's equations. These equations must contain an "infinitely strong" force $\mathbf{F}_c$ whose sole function is to keep the bead on the string. About this force we do not know much (other than that it acts perpendicular to the string, and vanishes right on the string). According to the structures outlined above, we may now reformulate Newton's equations as (4.1), where the potential $V = V_g + V_c$ in the Lagrangian $L = T - V$ contains two contributions, one accounting for the gravitational force,
The above limitation thus entails that the constraint forces will never explicitly appear in the variational procedure; this is an enormous simplification of the problem. Also, our problem has effectively become one–dimensional. The Lagrangian evaluated on curves in S is a function L(s(t), ṡ(t)). It is much simpler than the original Lagrangian L(q, q̇(t)) with its constraint force potential. Before generalizing these ideas to a general solution strategy of problems with constraints, let us illustrate its potential on a concrete problem. EXAMPLE We consider the Atwood machine depicted on p81. The Lagrangian of this problem reads L(q1 , q2 , q̇1 , q̇2 ) = m1 2 m2 2 q̇ + q̇ − m1 gx1 − m2 gx2 . 2 1 2 2 Here, qi is the position of mass i, i = 1, 2 and xi is its height. In principle, the sought for solution γ(t) = (q1 (t), q2 (t)) is a curve in six dimensional space. We assume that the masses are released at rest at initial coordinates (xi (t0 ), yi (t0 ), zi (t0 )). The coordinates yi and zi will not change in the process, i.e. effectively we are seeking for solution curves in the two–dimensional space of coordinates (x1 , x2 ). The constraint now implies that x ≡ x1 = l − x2 , or x2 = l − x. Thus, our solution curves are uniquely parameterized by the “generalized coordinate” x as (x1 , x2 ) = (x, l − x). We now enter with this parameterization into the Lagrangian above to obtain L(x, ẋ) = m1 + m2 2 ẋ − (m1 − m2 )gx + const. 2 It is important to realize that this function uniquely specifies the action Z t1 S[x] = dt L(x(t), ẋ(t)) t0 of physically allowed curves. The extremal curve, which then describes the actual motion of the two–body system, will be solution of the equation dt ∂L ∂L − = (m1 + m2 )ẍ + (m1 − m2 )g = 0. ∂ ẋ ∂x 4.2. LAGRANGIAN MECHANICS 91 For the given initial conditions x(t0 ) = x1 (t0 ) and ẋ(t0 ) = 0, this equation is solved by x(t) = x(t0 ) − m1 − m2 g (t − t0 )2 . m1 + m2 2 Substitution of this solution into q1 = (x, y1 , z1 ) and q2 = (l − x, y2 , z2 ) solves our problem. 4.2.2 Hamilton’s principle After this preparation, we are in a position to formulate a new approach to solving mechanical problems. Suppose, we are given an N –particle setup specified by the following data: (i) a Lagrangian function L : R6N +1 → R, (x1 , . . . , xN , ẋ1 , . . . , ẋN , t) 7→ L(x1 , . . . , xN , ẋ1 , . . . , ẋN , t) defined in the 6N –dimensional space needed to register the coordinates and velocities of N – particles, and (ii) a set of constraints limiting the motion of particles to an f –dimensional 5 submanifold of RN . Mathematically, a (holonomic) set of constraints will be implemented through 3N − f equations Fj (x1 , . . . , xN , t) = 0, j = 1, . . . , 3N − f. (4.11) The number f is called the number of degrees of freedom of the problem. Hamiltons’s principle states that such a problem is to be solved by a three–step algorithm: . Resolve the constraints (4.11) in terms of f parameters q ≡ (q1 , . . . , qf ), i.e. find a representation xl (q), l = 1, . . . , N , such that the constraints Fj (x1 (q), . . . , xN (q)) = 0 are resolved for all j = 1, . . . , 3N − f . The parameters qi are called generalized coordinates of the problem. The maximal set of parameter configurations q ≡ (q1 , . . . qf ) compatible with the constraint defines a subset V ⊂ Rf and the map V → R3N , q 7→ (x1 (q), . . . , xN (q)) defines an f –dimensional submanifold of R3N . . Reduce the Lagrangian of the problem to an effective Lagrangian L(q, q̇, t) ≡ L(x1 (q), . . . , xN (q), ẋ1 (q), . . . , ẋN (q), t). 
In practice, this amounts to a substitution of $\mathbf{x}_i(t) = \mathbf{x}_i(q(t))$ into the original Lagrangian. The effective Lagrangian is a function $L: V\times\mathbb{R}^f\times\mathbb{R} \to \mathbb{R},\ (q, \dot q, t) \mapsto L(q, \dot q, t)$.

3. Finally, formulate and solve the Euler–Lagrange equations

$$\frac{d}{dt}\frac{\partial L(q, \dot q, t)}{\partial\dot q_i} - \frac{\partial L(q, \dot q, t)}{\partial q_i} = 0, \qquad i = 1, \ldots, f. \qquad (4.12)$$

The prescription above is equivalent to the following statement, which is known as Hamilton's principle:

Consider a mechanical problem formulated in terms of $f$ generalized coordinates $q = (q_1, \ldots, q_f)$ and a Lagrangian $L(q, \dot q, t) = (T - U)(q, \dot q, t)$. Let $q(t)$ be a solution of the Euler–Lagrange equations $(d_t\partial_{\dot q_i} - \partial_{q_i})L(q, \dot q, t) = 0$, $i = 1, \ldots, f$, at given initial and final configurations $q(t_0) = q_0$ and $q(t_1) = q_1$. This curve describes the physical motion of the system. It is an extremal curve of the action functional

$$S[q] = \int_{t_0}^{t_1}dt\,L(q, \dot q, t).$$

To conclude this section, let us transfer a number of important physical quantities from Newtonian mechanics to the more general framework of Lagrangian mechanics: the generalized momentum associated to a generalized coordinate $q_i$ is defined as

$$p_i = \frac{\partial L(q, \dot q, t)}{\partial\dot q_i}. \qquad (4.13)$$

Notice that for cartesian coordinates, $p_i = \partial_{\dot q_i}L = \partial_{\dot q_i}T = m\dot q_i$ reduces to the familiar momentum variable. We call a coordinate $q_i$ a cyclic variable if it does not enter the Lagrangian function, that is, if $\partial_{q_i}L = 0$. These two definitions imply that the generalized momentum corresponding to a cyclic variable is conserved, $d_tp_i = 0$. This follows from $d_tp_i = d_t\partial_{\dot q_i}L = \partial_{q_i}L = 0$. In general, we call the derivative $\partial_{q_i}L \equiv F_i$ a generalized force. In the language of generalized momenta and forces, the Lagrange equations (formally) assume the form of Newton-like equations, $d_tp_i = F_i$. For the convenience of the reader, the most important quantities revolving around the Lagrangian formalism are summarized in table 4.1.

Table 4.1: Basic definitions of Lagrangian mechanics
  generalized coordinate: $q_i$
  generalized momentum: $p_i = \partial_{\dot q_i}L$
  generalized force: $F_i = \partial_{q_i}L$
  Lagrangian: $L(q, \dot q, t) = (T - U)(q, \dot q, t)$
  action (functional): $S[q] = \int_{t_0}^{t_1}dt\,L(q, \dot q, t)$
  Euler–Lagrange equations: $(d_t\partial_{\dot q_i} - \partial_{q_i})L = 0$

4.2.3 Lagrange mechanics and symmetries

Above, we have emphasized the capacity of the Lagrange formalism to handle problems with constraints. However, an advantage of equal importance is its flexibility in the choice of problem adjusted coordinates: unlike the Newton equations, the Lagrange equations maintain their form in all coordinate systems. The choice of "good coordinates" becomes instrumental in problems with symmetries. From experience we know that a symmetry (think of $z$-axis rotational invariance) entails the conservation of a physical variable (the $z$-component of angular momentum), and that it is important to work in coordinates reflecting the symmetry ($z$-axis cylindrical coordinates). But how do we actually define the term "symmetry"? And how can we find the ensuing conservation laws? Finally, how do we obtain a system of symmetry-adjusted coordinates? In this section, we will provide answers to these questions.
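INFO Here is the computational sketch of the three-step algorithm promised above, applied to the Atwood machine of section 4.2.1 (a sympy illustration of my own; variable names are arbitrary). Step 1 resolves the constraint $x_1 + x_2 = l$ by the generalized coordinate $x$, step 2 reduces the Lagrangian, and step 3 solves the Euler–Lagrange equation:

```python
import sympy as sp

t = sp.symbols('t')
m1, m2, g, l = sp.symbols('m1 m2 g l', positive=True)
x = sp.Function('x')

# step 1: resolve the constraint x1 + x2 = l
x1, x2 = x(t), l - x(t)

# step 2: reduce L = T - U to the generalized coordinate x
L = m1 / 2 * sp.diff(x1, t)**2 + m2 / 2 * sp.diff(x2, t)**2 \
    - m1 * g * x1 - m2 * g * x2

# step 3: Euler-Lagrange equation and the resulting acceleration
el = sp.diff(L, sp.diff(x(t), t)).diff(t) - sp.diff(L, x(t))
acc = sp.solve(sp.Eq(el, 0), sp.diff(x(t), t, 2))[0]
print(sp.simplify(acc))   # -(m1 - m2)*g/(m1 + m2), as in the example
```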
4.2.4 Noether theorem

Consider a family of mappings, $h_s$, of the coordinate manifold of a mechanical system into itself,

$$h_s: V \to V, \quad q \mapsto h_s(q). \qquad (4.14)$$

Here, $s \in \mathbb{R}$ is a control parameter, and we require that $h_0 = \mathrm{id}$ is the identity transform. By way of example, consider the map $h_s(r, \theta, \phi) \equiv (r, \theta, \phi + s)$, describing a rotation around the $z$-axis in the language of spherical coordinates, etc. For each curve $q: I \to V,\ t \mapsto q(t)$, the map $h_s$ gives us a new curve $h_s\circ q: I \to V,\ t \mapsto h_s(q(t))$ (see the figure). We call the transformation $h_s$ a symmetry (transformation) of a mechanical system iff

$$S[h_s\circ q] = S[q]$$

for all $s$ and all curves $q$. The action is then said to be invariant under the symmetry transformation. Suppose we have found a symmetry and its representation in terms of a family of invariant transformations. Associated to that symmetry there is a quantity that is conserved during the dynamical evolution of the system. The correspondence between a symmetry and its conservation law is established by a famous result due to Emmy Noether:

Noether theorem (1915–18): Let the action $S[q]$ be invariant under the transformation $q \mapsto h_s(q)$. Let $q(t)$ be a solution curve of the system (a solution of the Euler–Lagrange equations). Then, the quantity

$$I(q, \dot q) \equiv \sum_{i=1}^f p_i\,\frac{d}{ds}(h_s(q))_i \qquad (4.15)$$

is dynamically conserved: $d_tI(q, \dot q) = 0$. Here, $p_i = \partial_{\dot q_i}L\big|_{(q, \dot q) = (h_s(q), \dot h_s(q))}$ is the generalized momentum of the $i$th coordinate, and the quantity $I(q, \dot q)$ is known as the Noether momentum.

The proof of Noether's theorem is straightforward: the requirement of action invariance under the transformation $h_s$ is equivalent to the condition $d_sS[h_s(q)] = 0$ for all values of $s$. We now consider the action

$$S[q] = \int_{t_0}^{t_f}dt\,L(q, \dot q, t)$$

a solution curve samples during a time interval $[t_0, t_f]$. Here, $t_f$ is a free variable, and the transformed value $h_s(q(t_f))$ may differ from $q(t_f)$. We next explore what the condition of action invariance tells us about the Lagrangian of the theory:

$$0 = d_s\int_{t_0}^{t_f}dt\,L(h_s(q), \dot h_s(q), t) = \int_{t_0}^{t_f}dt\,\Big[\partial_{q_i}L\big|_{q = h_s(q)}\,d_s(h_s(q))_i + \partial_{\dot q_i}L\big|_{\dot q = \dot h_s(q)}\,d_s(\dot h_s(q))_i\Big] =$$
$$= \int_{t_0}^{t_f}dt\,\big(\partial_{q_i}L - d_t\partial_{\dot q_i}L\big)\Big|_{(h_s(q), \dot h_s(q))}\,d_s(h_s(q))_i + \partial_{\dot q_i}L\big|_{\dot q = \dot h_s(q)}\,d_s(h_s(q))_i\Big|_{t_0}^{t_f}.$$

Now, our reference curve is a solution curve, which means that the integrand in the last line vanishes. We are thus led to the conclusion that $\partial_{\dot q_i}L\big|_{\dot q = \dot h_s(q)}\,d_s(h_s(q))_i\Big|_{t_0}^{t_f} = 0$ for all values $t_f$, and this is equivalent to the constancy of the Noether momentum.

It is often convenient to evaluate both the symmetry transformation $h_s$ and the Noether momentum $I = I(s)$ at infinitesimal values of the control parameter $s$. In practice, this means that we fix a pair $(q, \dot q)$ comprising coordinate and velocity of a solution curve. We then consider an infinitesimal transformation $h_\epsilon(q)$, where $\epsilon$ is infinitesimally small. The Noether momentum is then obtained as

$$I(q, \dot q) = \sum_{i=1}^f p_i\,\frac{d}{d\epsilon}(h_\epsilon(q))_i\bigg|_{\epsilon = 0}. \qquad (4.16)$$

It is usually convenient to work in symmetry adjusted coordinates, i.e. in coordinates where the transformation $h_s$ assumes its simplest possible form. These are coordinates where $h_s$ acts by translation in one coordinate direction, that is, $(h_s(q))_i = q_i + s\delta_{ij}$, where $j$ is the affected coordinate direction. Coordinates adjusted to a symmetry are cyclic (think about this point), and the Noether momentum, $I = p_j$, collapses to the standard momentum of the symmetry coordinate.

4.2.5 Examples

In this section, we will discuss two prominent examples of symmetries and their conservation laws.
Translational invariance ↔ conservation of momentum

Consider a mechanical system that is invariant under translation in some direction. Without loss of generality, we choose cartesian coordinates $q = (q_1, q_2, q_3)$ in such a way that the coordinate $q_1$ parameterizes the invariant direction. The symmetry transformation $h_s$ then translates in this direction: $h_s(q) = q + s\mathbf{e}_1$, or $h_s(q) = (q_1 + s, q_2, q_3)$. With $d_sh_s(q) = \mathbf{e}_1$, we readily obtain

$$I(q, \dot q) = \frac{\partial L}{\partial\dot q_1} = p_1,$$

where $p_1 = m\dot q_1$ is the ordinary cartesian momentum of the particle. We are thus led to the conclusion:

Translational invariance entails the conservation of cartesian momentum.

EXERCISE Generalize the construction above to a system of $N$ particles in the absence of external potentials. The (interaction) potential of the system then depends only on coordinate differences $\mathbf{q}_i - \mathbf{q}_j$. Show that translation in arbitrary directions is a symmetry of the system and that this implies the conservation of the total momentum $\mathbf{P} = \sum_{j=1}^N m_j\dot{\mathbf{q}}_j$.

Rotational invariance ↔ conservation of angular momentum

As a second example, consider a system invariant under rotations around the 3-axis. In cartesian coordinates, the corresponding symmetry transformation is described by $q \mapsto h_\phi(q) \equiv R^{(3)}_\phi q$, where the angle $\phi$ serves as a control parameter (i.e. $\phi$ assumes the role of the parameter $s$ above), and

$$R^{(3)}_\phi \equiv \begin{pmatrix} \cos\phi & \sin\phi & 0 \\ -\sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

is an $O(3)$ matrix generating rotations by the angle $\phi$ around the 3-axis. The infinitesimal variant of a rotation is described by

$$R^{(3)}_\epsilon \equiv \begin{pmatrix} 1 & \epsilon & 0 \\ -\epsilon & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} + O(\epsilon^2).$$

From this representation, we obtain the Noether momentum as

$$I(q, \dot q) = \frac{\partial L}{\partial\dot q_i}\,\frac{d}{d\epsilon}(R^{(3)}_\epsilon q)_i\bigg|_{\epsilon = 0} = m(\dot q_1q_2 - \dot q_2q_1).$$

This is (the negative of) the 3-component of the particle's angular momentum. Since the choice of the 3-axis as a reference axis was arbitrary, we have established the result:

Rotational invariance around a symmetry axis entails the conservation of the angular momentum component along that axis.

Now, we have argued above that in cases with symmetries one should employ adjusted coordinates. Presently, this means coordinates that are organized around the 3-axis: spherical or cylindrical coordinates. Choosing cylindrical coordinates for definiteness, the Lagrangian assumes the form

$$L(r, \phi, \dot r, \dot\phi, \dot z) = \frac{m}{2}(\dot r^2 + r^2\dot\phi^2 + \dot z^2) - U(r, z), \qquad (4.17)$$

where we noted that the Lagrangian of a rotationally invariant system does not depend on $\phi$. The symmetry transformation now simply acts by translation, $h_s(r, z, \phi) = (r, z, \phi + s)$, and the Noether momentum is given by

$$I \equiv l_3 = \frac{\partial L}{\partial\dot\phi} = mr^2\dot\phi. \qquad (4.18)$$

We recognize this as the cylindrical coordinate representation of the 3-component of angular momentum.

EXAMPLE Let us briefly extend the discussion above to re-derive the symmetry optimized representation of a particle in a central potential. We choose cylindrical coordinates such that (a) the force center lies in the origin, and (b) at time $t = 0$, both $\mathbf{q}$ and $\dot{\mathbf{q}}$ lie in the $z = 0$ plane. Under these conditions, the motion will stay in the $z = 0$ plane (exercise: show this from the Euler–Lagrange equations), and the cylindrical coordinates $(r, \phi, z)$ can be reduced to the polar coordinates $(r, \phi)$ of the invariant $z = 0$ plane. The reduced Lagrangian reads

$$L(r, \dot r, \dot\phi) = \frac{m}{2}(\dot r^2 + r^2\dot\phi^2) - U(r).$$

From our discussion above, we know that $l \equiv l_3 = mr^2\dot\phi$ is a constant, and this enables us to express the angular velocity $\dot\phi = l_3/mr^2$ in terms of the radial coordinate.
This leads us to the effective Lagrangian of the radial coordinate,

$$L(r, \dot r) = \frac{m}{2}\dot r^2 + \frac{l^2}{2mr^2} - U(r).$$

The solution of its Euler–Lagrange equation,

$$m\ddot r = -\partial_rU - \frac{l^2}{mr^3},$$

has been discussed in section *.

Chapter 5

Hamiltonian mechanics

The Lagrangian approach has introduced a whole new degree of flexibility into mechanical theory building. Still, there is room for further development. To see how, notice that in the last chapter we have consistently characterized the state of a mechanical system in terms of the $2f$ variables $(q_1, \ldots, q_f, \dot q_1, \ldots, \dot q_f)$, the generalized coordinates and velocities. It is clear that we need $2f$ variables to specify the state of a mechanical system. But are coordinates and velocities necessarily the best variables? The answer is: often yes, but not always. In our discussion of symmetries in section 4.2.3 above, we have argued that in problems with symmetries one should work with variables $q_i$ that transform in the simplest possible way (i.e. additively) under symmetry transformations. If so, the corresponding momentum $p_i = \partial_{\dot q_i}L$ is conserved. Now, if it were possible to express the velocities uniquely in terms of coordinates and momenta, $\dot q = \dot q(q, p)$, we would be in possession of an alternative set of variables $(q, p)$ such that in a situation with symmetries a fraction of our variables stays constant. And that's the simplest type of dynamics a variable can show. In this chapter, we show that the reformulation of mechanics in terms of coordinates and momenta as independent variables is an option. The resulting description is called Hamiltonian mechanics. Important advantages of this new approach include:

– It exhibits a maximal degree of flexibility in the choice of problem adjusted coordinates.
– The variables $(q, p)$ live in a mathematical space, so-called "phase space", that carries a high degree of mathematical structure (much more than an ordinary vector space). This structure turns out to be of invaluable use in the solution of complex problems. Relatedly,
– Hamiltonian theory is the method of choice in the solution of advanced mechanical problems. For example, the theory of (conservative) chaotic systems is almost exclusively formulated in the Hamiltonian approach.
– Hamiltonian mechanics is the gateway into quantum mechanics. Virtually all concepts introduced below have a direct quantum mechanical extension.

However, this does not mean that Hamiltonian mechanics is "better" than the Lagrangian theory. The Hamiltonian approach has its specific advantages. However, in some cases it may be preferable to stay on the level of the Lagrangian theory. At any rate, Lagrangian and Hamiltonian mechanics form a pair that represents the conceptual basis of many modern theories of physics, not just mechanics. For example, electrodynamics (classical and quantum) can be formulated in terms of a Lagrangian and a Hamiltonian theory, and these formulations are strikingly powerful.

5.1 Hamiltonian mechanics

The Lagrangian $L = L(q, \dot q, t)$ is a function of coordinates and velocities. What we are after is a function $H = H(q, p, t)$ that is a function of coordinates and momenta, where the momenta and velocities are related to each other as $p_i = \partial_{\dot q_i}L = p_i(q, \dot q, t)$. Once in possession of the new function $H$, there must be a way to express the key information carriers of the theory, the Euler–Lagrange equations, in the language of the new variables.
In mathematics, there exist different ways of formulating variable changes of this type, and one of them is known as the

5.1.1 Legendre transform

Reducing the notation to a necessary minimum, the task formulated above amounts to the following problem: given a function $f(x)$ (here, $f$ assumes the role of $L$ and $x$ represents $\dot q_i$), consider the variable $z = \partial_xf(x) = z(x)$ ($z$ represents the momentum $p_i$). If this equation is invertible, i.e. if a representation $x = x(z)$ exists, find a partner function $g(z)$ such that $g$ carries the same amount of information as $f$. The latter condition means that we are looking for a transformation, i.e. a mapping between functions $f(x) \to g(z)$ that can be inverted, $g(z) \to f(x)$, so that the function $f(x)$ can be reconstructed in unique terms from $g(z)$. If this latter condition is met, it must be possible to express any mathematical operation formulated for the function $f$ in terms of a corresponding operation on the function $g$ (cf. the analogous situation with the Fourier transform).

Let us now try to find a transformation that meets the criteria above. The most obvious guess might be a direct variable substitution: compute $z = \partial_xf(x) = z(x)$, invert to $x = x(z)$, and substitute this into $f$, i.e. $g(z) \overset{?}{=} f(x(z))$. This idea goes in the right direction but is not quite good enough. The problem is that in this way information stored in the function $f$ may get lost. To exemplify the situation, consider the function $f(x) = C\exp(x)$, where $C$ is a constant. Now, $z = \partial_xf(x) = C\exp(x)$, which means $x = \ln(z/C)$; substitution back into $f$ gets us to $g(z) = C\exp(\ln(z/C)) = z$. The function $g$ no longer knows about $C$, so information on this constant has been irretrievably lost. (Knowledge of $g(z)$ is not sufficient to reconstruct $f(x)$.)

However, it turns out that a slight extension of the above idea does the trick. Namely, consider the so-called Legendre transform of $f(x)$,

$$g(z) \equiv f(x(z)) - zx(z). \qquad (5.1)$$

We claim that $g(z)$ defines a transformation of functions. To verify this, we need to show that the function $f(x)$ can be obtained from $g(z)$ by some suitable inverse transformation. It turns out that (up to a harmless sign change) the Legendre transform is self-inverse; just apply it once more and you get back to the function $f$. Indeed, let us define the variable

$$y \equiv \partial_zg(z) = \partial_xf(x(z))\,\partial_zx(z) - z\,\partial_zx(z) - x(z).$$

Now, by definition of the function $z(x)$, we have $\partial_xf(x(z)) = z(x(z)) = z$. Thus, the first two terms cancel, and we have $y(z) = -x(z)$. We next compute the Legendre transform of $g(z)$:

$$h(y) = f(x(z(y))) - x(z(y))\,z(y) - z(y)\,y.$$

Evaluating the relation $y(z) = -x(z)$ on the specific argument $z(y)$, we get $y = y(z(y)) = -x(z(y))$, i.e. $f(x(z(y))) = f(-y)$, and the last two terms in the definition of $h(y)$ are seen to cancel. We thus obtain $h(y) = f(-y)$: the Legendre transform is (almost) self-inverse, and this means that by passing from $f(x)$ to $g(z)$ no information has been lost.

The abstract definition of the Legendre transform with its nested variable dependencies can be somewhat confusing. However, when applied to concrete functions, the transform is actually easy to handle. Let us illustrate this on the example considered above: with $f(x) = C\exp(x)$ and $x = \ln(z/C)$, we get

$$g(z) = C\exp(\ln(z/C)) - \ln(z/C)\,z = z(1 - \ln(z/C)).$$

(Notice that $g(z)$ "looks" very different from $f(x)$, but this need not worry us.) Now, let us apply the transform once more: define $y = \partial_zg(z) = -\ln(z/C)$, or $z(y) = C\exp(-y)$.
This gives

$$h(y) = C\exp(-y)(1 + y) - C\exp(-y)\,y = C\exp(-y),$$

in accordance with the general result $h(y) = f(-y)$. The Legendre transform of multivariate functions $f(x)$ is obtained by application of the rule to all variables: compute $z_i \equiv \partial_{x_i}f(x)$, then construct the inverse $x = x(z)$, and define

$$g(z) = f(x(z)) - \sum_iz_ix_i(z). \qquad (5.2)$$

5.1.2 Hamiltonian function

Definition

We now apply the construction above to compute the Legendre transform of the Lagrange function. We thus apply Eq. (5.2) to the function $L(q, \dot q, t)$ and identify variables as $x \leftrightarrow \dot q$ and $z \leftrightarrow p$. (Notice that the coordinates, $q$, themselves play the role of spectator variables. The Legendre transform is in the variables $\dot q$!) Let us now formulate the few steps it takes to pass to the Legendre transform of the Lagrange function:

1. Compute the $f$ variables

$$p_i = \partial_{\dot q_i}L(q, \dot q, t). \qquad (5.3)$$

2. Invert these relations to obtain $\dot q_i = \dot q_i(q, p, t)$.

3. Define a function

$$H(q, p, t) \equiv \sum_ip_i\dot q_i(q, p, t) - L(q, \dot q(q, p, t), t). \qquad (5.4)$$

Technically, this is the negative of $L$'s Legendre transform. Of course, this function carries the same information as the Legendre transform itself. The function $H$ is known as the Hamiltonian function of the system. It is usually called the "Hamiltonian" for short (much like the Lagrangian function is called the "Lagrangian"). The notation above emphasizes the variable dependencies of the quantities $\dot q_i$, $p_i$, etc. One usually keeps the notation more compact, e.g. by writing $H = \sum_ip_i\dot q_i - L$. However, it is important to remember that in both $L$ and $\sum_i\dot q_ip_i$, the variable $\dot q_i$ has to be expressed as a function of $q$ and $p$. It doesn't make sense to write down formulae such as $H = \ldots$ if the right hand side contains $\dot q_i$'s as fundamental variables!

Hamilton equations

Now we need to do something with the Hamiltonian function. Our goal will be to transcribe the Euler–Lagrange equations to equations defined in terms of the Hamiltonian. The hope then is that these new equations contain operational advantages over the Euler–Lagrange equations. Now, the Euler–Lagrange equations probe changes (derivatives) of the function $L$. It will therefore be a good idea to explore what happens if we ask similar questions of the function $H$. In doing so, we should keep in mind that our ultimate goal is to obtain unambiguous information on the evolution of particle trajectories. Let us then compute

$$\partial_{q_i}H(q, p, t) = p_j\,\partial_{q_i}\dot q_j(q, p) - \partial_{q_i}L(q, \dot q(q, p)) - \partial_{\dot q_j}L(q, \dot q(q, p))\,\partial_{q_i}\dot q_j(q, p).$$

The first and the third term cancel, because $\partial_{\dot q_j}L = p_j$. Now, iff the curve $q(t)$ is a solution curve, the middle term equals $-d_t\partial_{\dot q_i}L = -d_tp_i$. We thus conclude that the $(q, p)$-representation of solution curves must obey the equation

$$\dot p_i = -\partial_{q_i}H(q, p, t).$$

Similarly,

$$\partial_{p_i}H(q, p, t) = \dot q_i(q, p) + p_j\,\partial_{p_i}\dot q_j(q, p, t) - \partial_{\dot q_j}L(q, \dot q(q, p))\,\partial_{p_i}\dot q_j(q, p, t).$$

The last two terms cancel, and we have

$$\dot q_i = \partial_{p_i}H(q, p, t).$$

Finally, let us compute the partial time derivative:

$$\partial_tH(q, p, t) = p_i\,\partial_t\dot q_i(q, p, t) - \partial_{\dot q_i}L(q, \dot q(q, p, t))\,\partial_t\dot q_i(q, p, t) - \partial_tL(q, \dot q, t).$$

Again, two terms cancel and we have

$$\partial_tH(q, p, t) = -\partial_tL(q, \dot q(q, p, t), t), \qquad (5.5)$$

where the time derivative on the r.h.s. acts only on the third argument of $L(\,.\,,\,.\,, t)$. (Here it is important to be very clear about what we are doing: in the present context, $\dot q$ is a variable in the Lagrange function; we could also name it $v$ or $z$, or whatever. It is considered a free variable with no time dependence, unless the transformation $\dot q_i = \dot q_i(q, p, t)$ becomes explicitly time dependent. Remembering the origin of this transformation, we see that this may happen if the function $L(q, \dot q, t)$ contains explicit time dependence, e.g. in the case of rheonomic constraints or time dependent potentials.)

Let us summarize where we are: the solution curve $q(t)$ of a mechanical system defines a $2f$-dimensional "curve", $(q(t), \dot q(t))$. Eq. (5.3) then defines a $2f$-dimensional curve $(q(t), p(t))$.
Hamilton equations

Now we need to do something with the Hamiltonian function. Our goal will be to transcribe the Euler–Lagrange equations into equations defined in terms of the Hamiltonian, in the hope that these new equations offer operational advantages over the Euler–Lagrange equations. The Euler–Lagrange equations probe changes (derivatives) of the function L; it is therefore a good idea to explore what happens if we ask similar questions of the function H. In doing so, we should keep in mind that our ultimate goal is to obtain unambiguous information on the evolution of particle trajectories. Let us then compute (summation over the repeated index j is implied)

   ∂_{q_i} H(q, p, t) = p_j ∂_{q_i} q̇_j(q, p) − ∂_{q_i} L(q, q̇(q, p)) − ∂_{q̇_j} L(q, q̇(q, p)) ∂_{q_i} q̇_j(q, p).

The first and the third term cancel because ∂_{q̇_j} L = p_j. Now, if the curve q(t) is a solution curve, the middle term equals −d_t ∂_{q̇_i} L = −d_t p_i. We thus conclude that the (q, p)-representation of solution curves must obey the equation

   ṗ_i = −∂_{q_i} H(q, p, t).

Similarly,

   ∂_{p_i} H(q, p, t) = q̇_i(q, p) + p_j ∂_{p_i} q̇_j(q, p, t) − ∂_{q̇_j} L(q, q̇(q, p)) ∂_{p_i} q̇_j(q, p, t).

The last two terms cancel, and we have

   q̇_i = ∂_{p_i} H(q, p, t).¹

Finally, let us compute the partial time derivative of H:

   ∂_t H(q, p, t) = p_i ∂_t q̇_i(q, p, t) − ∂_{q̇_i} L(q, q̇(q, p, t), t) ∂_t q̇_i(q, p, t) − ∂_t L(q, q̇, t).

Again, two terms cancel, and we have

   ∂_t H(q, p, t) = −∂_t L(q, q̇(q, p, t), t),   (5.5)

where the time derivative on the r.h.s. acts only on the third argument of L( . , . , t).

¹ Here it is important to be very clear about what we are doing: in the present context, q̇ is a variable of the Lagrange function (we could also name it v, or z, or whatever). It is considered a free variable, with no time dependence of its own, unless the transformation q̇_i = q̇_i(q, p, t) becomes explicitly time dependent. Remembering the origin of this transformation, we see that this may happen if the function L(q, q̇, t) contains an explicit time dependence, as it does, e.g., in the case of rheonomic constraints or time-dependent potentials.

Let us summarize where we are: the solution curve q(t) of a mechanical system defines a 2f-dimensional "curve" (q(t), q̇(t)), and Eq. (5.3) then defines a 2f-dimensional curve (q(t), p(t)). The invertibility of the relation p ↔ q̇ implies that either representation faithfully describes the curve. The derivation above then shows that

The solution curves (q, p)(t) of mechanical problems obey the so-called Hamilton equations

   q̇_i = ∂_{p_i} H,   ṗ_i = −∂_{q_i} H.   (5.6)

Some readers may worry about the number of variables employed in the description of curves: in principle, a curve is uniquely represented by the f variables q(t). However, we have now decided to represent curves in terms of the 2f variables (q, p). Is there some redundancy here? No, there is not, and the reason can be understood in different ways. First notice that in Lagrangian mechanics, solution curves are obtained from second-order differential equations in time (the Euler–Lagrange equations contain terms q̈_i). In contrast, the Hamilton equations² are first order in time. The price to be paid for this reduction is the introduction of a second set of variables, p. Another way to understand the rationale behind the introduction of 2f variables is to note that the solution of the (second-order differential) Euler–Lagrange equations requires the specification of 2f boundary conditions. These can be the 2f conditions stored in the specification q(t₀) = q₀ and q(t₁) = q₁ of an initial and a final point,³ or the specification q(t₀) = q₀ and q̇(t₀) = v₀ of an initial configuration and velocity. In contrast, the Hamilton equations are uniquely solvable once an initial configuration (q(t₀), p(t₀)) = (q₀, p₀) has been specified.⁴ Either way, we need 2f boundary conditions, and hence 2f variables, to uniquely specify the state of a mechanical system.

² Ordinary differential equations of nth order can be transformed into systems of n ordinary differential equations of first order. The passage L → H is an example of such an order reduction.

³ But notice that conditions of this type do not always uniquely specify a solution – think of examples!

⁴ Formally, this follows from a result of the theory of ordinary differential equations: a system of n first-order differential equations ẋ_i = f_i(x₁, . . . , x_n), i = 1, . . . , n, affords a unique solution once n initial conditions x_i(0) have been specified.

EXAMPLE To make the new approach more concrete, let us formulate the Hamilton equations of a point particle in Cartesian coordinates. The Lagrangian is given by

   L(q, q̇, t) = (m/2) q̇² − U(q, t),

where we allow for a time-dependent potential. Thus p_i = m q̇_i, which is inverted as q̇_i(p) = p_i/m. This leads to

   H(q, p, t) = Σ_{i=1}^3 p_i (p_i/m) − L(q, p/m, t) = p²/(2m) + U(q, t).

The Hamilton equations assume the form

   q̇ = p/m,   ṗ = −∂_q U(q, t).

Further, ∂_t H = −∂_t L = ∂_t U inherits its time dependence from the time dependence of the potential. The two Hamilton equations are recognized as a reformulation of the Newton equation: substituting the time derivative of the first equation, q̈ = ṗ/m, into the second, we obtain mq̈ = −∂_q U. Notice that the Hamiltonian function H = p²/(2m) + U(q, t) equals the energy of the particle! In the next section, we will discuss this connection in more general terms.

EXAMPLE Let us now solve these equations for the very elementary example of the one-dimensional harmonic oscillator. In this case, V(q) = mω²q²/2, and the Hamilton equations assume the form

   q̇ = p/m,   ṗ = −mω²q.

For a given initial configuration x(0) = (q(0), p(0)), these equations afford the unique solution

   q(t) = q(0) cos ωt + (p(0)/mω) sin ωt,
   p(t) = p(0) cos ωt − mω q(0) sin ωt.   (5.7)
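INFO A quick numerical cross-check of Eq. (5.7): the Python sketch below (numpy/scipy; the parameter values are arbitrary choices of ours) integrates the Hamilton equations of the oscillator and compares the result with the closed-form solution.

import numpy as np
from scipy.integrate import solve_ivp

m, omega = 1.0, 2.0
q0, p0 = 1.0, 0.5

def rhs(t, x):                       # Hamilton equations (5.6) for the oscillator
    q, p = x
    return [p/m, -m*omega**2*q]

ts = np.linspace(0.0, 10.0, 501)
sol = solve_ivp(rhs, (0.0, 10.0), [q0, p0], t_eval=ts, rtol=1e-10, atol=1e-12)

q_exact = q0*np.cos(omega*ts) + p0/(m*omega)*np.sin(omega*ts)   # Eq. (5.7)
p_exact = p0*np.cos(omega*ts) - m*omega*q0*np.sin(omega*ts)
print(max(abs(sol.y[0] - q_exact).max(), abs(sol.y[1] - p_exact).max()))  # ~1e-8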
Physical meaning of the Hamilton function

The Lagrangian L = T − U did not carry immediate physical meaning; its essential role was that of a generator of the Euler–Lagrange equations. With the Hamiltonian, however, the situation is different. To understand its physical interpretation, let us compute the full time derivative d_t H = d_t H(q(t), p(t), t) of the Hamiltonian evaluated on a solution curve (q(t), p(t)), i.e. a solution of the Hamilton equations (5.6):

   d_t H(q, p, t) = Σ_{i=1}^f ( ∂H/∂q_i q̇_i + ∂H/∂p_i ṗ_i ) + ∂H/∂t = ∂H/∂t,

where in the second equality the Hamilton equations (5.6) were used: with q̇_i = ∂H/∂p_i and ṗ_i = −∂H/∂q_i, the terms under the sum cancel pairwise. Now, we have seen above (cf. Eq. (5.5)) that the Hamiltonian inherits its explicit time dependence, ∂_t H = −∂_t L, from the explicit time dependence of the Lagrangian. In problems without explicitly time-dependent potentials (they are called autonomous problems), the Hamiltonian therefore stays constant on solution curves. We have thus found that the function H(q, p) (the missing time argument indicates that we are now looking at an autonomous situation) defines a constant of motion of extremely general nature.

What is its physical meaning? To answer this question, we consider a general N-body system in an arbitrary potential. (This is the most general conservative autonomous setup one may imagine.) In Cartesian coordinates, its Lagrangian is given by

   L(q, q̇) = Σ_{j=1}^N (m_j/2) q̇_j² − U(q₁, . . . , q_N),

where U is the N-body potential and m_j the mass of the jth particle. Proceeding as in the point-particle example above, we readily find

   H(q, p) = Σ_{j=1}^N p_j²/(2m_j) + U(q₁, . . . , q_N).   (5.8)

The expression on the r.h.s. we recognize as the energy of the system. We are thus led to the following important conclusion:

The Hamiltonian H(q, p) of an autonomous problem (a problem with time-independent potentials) is dynamically conserved: H(q(t), p(t)) = E = const. on solution curves (q(t), p(t)).

However, our discussion above implies another very important corollary. It has shown that

The Hamiltonian H = T + U is given by the sum of the potential energy, U(q, t), and the kinetic energy, T(q, p, t), expressed as a function of coordinates and momenta.

We have established this connection in Cartesian coordinates. However, the coordinate invariance of the theory implies that the identification holds in general coordinate systems. Also, the identification H = T + U did not rely on the time-independence of the potential; it extends to potentials U(q, t).

INFO In principle, the identification H = T + U provides an option to access the Hamiltonian without reference to the Lagrangian. In cases where the expression T = T(q, p, t) of the kinetic energy in terms of the coordinates and momenta of the theory is known, we may just add the two terms, H(q, p, t) = T(q, p, t) + U(q, t). In such circumstances, there is no need to compute the Hamiltonian via the Legendre route L(q, q̇, t) → H(q, p, t). Often, however, the identification of the momenta of the theory is not so obvious, and one is better advised to proceed via the Legendre transform.
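INFO Energy conservation along solution curves is easy to observe numerically. The sketch below (Python; the quartic potential and initial data are arbitrary choices of ours) integrates the Hamilton equations for H = p²/(2m) + U(q) and monitors H along the trajectory:

import numpy as np
from scipy.integrate import solve_ivp

m = 1.0
U  = lambda q: q**4/4 - q**2/2       # some autonomous potential
dU = lambda q: q**3 - q
H  = lambda q, p: p**2/(2*m) + U(q)

sol = solve_ivp(lambda t, x: [x[1]/m, -dU(x[0])], (0.0, 50.0), [1.7, 0.0],
                t_eval=np.linspace(0.0, 50.0, 2001), rtol=1e-11, atol=1e-12)
E = H(sol.y[0], sol.y[1])
print(E.max() - E.min())             # ~1e-9: H(q(t), p(t)) = const.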
5.2 Phase space

In this section, we will introduce phase space as the basic "arena" of Hamiltonian mechanics. We will start from an innocent definition of phase space as a 2f-dimensional coordinate space comprising configuration space coordinates, q, and momenta, p. However, as we go along, we will realize that this working definition is but the tip of an iceberg: the coordinate spaces of Hamiltonian mechanics are endowed with very deep mathematical structure, and this is one way of telling why Hamiltonian mechanics is so powerful.

5.2.1 Phase space and structure of the Hamilton equations

To keep the notation simple, we will suppress reference to an optional explicit time dependence of the Hamilton function throughout this section, i.e. we just write H(. . . ) instead of H(. . . , t). The Hamilton equations are coupled equations for the coordinates q and the momenta p. As was argued above, the coordinate pair (q, p) contains sufficient information to uniquely encode the state of a mechanical system. This suggests to consider the 2f-component objects

   x ≡ (q, p)ᵀ   (5.9)

as the new fundamental variables of the theory. Given a coordinate system, we may think of x as an element of a 2f-dimensional vector space. However, all that has been said in section 4.1.3 about the curves of mechanical systems and their coordinate representations carries over to the present context: in abstract terms, the pair "(configuration space point, momentum)" defines a mathematical space known as phase space, Γ. The formulation of coordinate-invariant descriptions of phase space is a subject beyond the scope of the present course.⁵ However, locally it is always possible to parameterize Γ in terms of coordinates, and to describe its elements through 2f-component objects such as (5.9). This means that, locally, phase space can be identified with a 2f-dimensional vector space. However, different coordinate representations correspond to different coordinate vector spaces, and sometimes it is important to recall that the identification (phase space) ↔ (vector space) may fail globally.⁶

⁵ Which means that Γ has the status of a 2f-dimensional manifold.

⁶ For example, there are mechanical systems whose phase space is a two-sphere, and the sphere is not a vector space. (However, locally it can be represented in terms of vector-space coordinates.)

Keeping the above words of caution in mind, we will temporarily identify phase space Γ with a 2f-dimensional vector space. We will soon see that this space contains a very particular sort of "scalar product". This additional structure makes phase space a mathematical object far more interesting than an ordinary vector space.

Let us start with a rewriting of the Hamilton equations. The identification x_i = q_i and x_{f+i} = p_i, i = 1, . . . , f, enables us to rewrite Eq. (5.6) as

   ẋ_i = ∂_{x_{i+f}} H,   ẋ_{i+f} = −∂_{x_i} H,   i = 1, . . . , f.

We can express this in a more compact form as

   ẋ_i = I_{ij} ∂_{x_j} H,   i = 1, . . . , 2f,   (5.10)

or, in vectorial notation,

   ẋ = I ∂_x H.   (5.11)

Here, the (2f) × (2f) matrix I is defined as

   I = (  0_f   1_f
        −1_f   0_f ),   (5.12)

where 1_f is the f-dimensional unit matrix. The matrix I is sometimes called the symplectic unit matrix.

5.2.2 Hamiltonian flow

For any x, the quantity I ∂_x H(x) = {I_{ij} ∂_{x_j} H(x)} is a vector in phase space. This means that the prescription

   X_H : Γ → R^{2f},   x ↦ I ∂_x H,   (5.13)

defines a vector field in phase space, the so-called Hamiltonian vector field.
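INFO For a concrete picture (cf. Fig. 5.1 below), one may evaluate X_H on a grid of phase space points and plot the resulting arrow field. A minimal Python sketch for the one-dimensional oscillator, H = p²/(2m) + mω²q²/2 (grid and plotting choices are ours):

import numpy as np
import matplotlib.pyplot as plt

m, omega = 1.0, 1.0
Q, P = np.meshgrid(np.linspace(-2, 2, 15), np.linspace(-2, 2, 15))

Xq = P/m               # first component of X_H:   dH/dp
Xp = -m*omega**2*Q     # second component of X_H: -dH/dq

plt.quiver(Q, P, Xq, Xp)
plt.xlabel('q'); plt.ylabel('p'); plt.show()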
The form of the Hamilton equations,

   ẋ = X_H(x),   (5.14)

suggests an interpretation in terms of the "flow lines" of the Hamiltonian vector field: at each point x in phase space, the field defines a vector X_H(x), and the Hamilton equations state that the solution curve x(t) is tangent to that vector, ẋ(t) = X_H(x(t)). One may visualize the situation in terms of the streamlines of a fluid (see Fig. 5.1). Within that analogy, the value X_H(x) is a measure of the local current flow. If one injected a drop of colored ink into the fluid, its trace would be a representation of the curve x(t).

Figure 5.1: Visualization of the Hamiltonian vector field and its flow lines.

For a general vector field v : U → R^N, x ↦ v(x), where U ⊂ R^N, one may define a parameter-dependent map

   Φ : U × R → U,   (x, t) ↦ Φ(x, t),   (5.15)

through the condition ∂_t Φ(x, t) = v(Φ(x, t)), together with Φ(x, 0) = x. The map Φ is called the flow of the vector field v. Specifically, the flow of the Hamiltonian vector field X_H is defined by the prescription

   Φ(x, t) ≡ x(t),   (5.16)

where x(t) is the solution curve with initial condition x(t = 0) = x. Equation (5.16) is a proper definition because, for a given x(0) ≡ x, the solution x(t) is uniquely defined. Further, ∂_t Φ(x, t) = d_t x(t) = X_H(x(t)) satisfies the flow condition. The map Φ(x, t) is called the Hamiltonian flow.

There is a corollary to the uniqueness of the curve x(t) emanating from a given initial condition x(0):

Phase space curves never cross.

For if they did, the crossing point would be the initial point of the two outgoing stretches of the curve, and this would be in contradiction to the uniqueness of the solution for a given initial configuration. There is, however, one subtle caveat to the statement above: although phase space curves do not cross, they may actually touch each other in common terminal points. (For an example, see the discussion in section 5.2.3 below.)

5.2.3 Phase space portraits

Graphical representations of the phase flow, so-called phase space portraits, are very useful in the qualitative characterization of mechanical motion. The only problem is that the phase flow is defined in 2f-dimensional space, which for f > 1 we cannot draw. What can be drawn, however, is the projection of a curve onto any of the 2f(2f − 1)/2 coordinate planes (x_i, x_j), 1 ≤ i < j ≤ 2f (think of the projection of a curve in three-dimensional space onto one of its coordinate planes). However, it takes some experience to work with such representations, and we will not discuss them any further.

The concept of phase space portraits becomes truly useful in problems with one degree of freedom, f = 1. In this case, the conservation of energy on individual curves, H(q, p) = E = const., enables us to compute an explicit representation q = q(p, E), or p = p(q, E), just by solving the relation H(q, p) = E for q or p. This does not "solve" the mechanical problem under consideration. (For that, one would need to know the time dependence at which the curves are traversed.) Nonetheless, it provides a far-reaching characterization of the motion: for any phase space point (q, p) we can construct the curve running through it without solving differential equations.
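INFO The defining property of the flow map can be checked numerically. The Python sketch below (our own construction; the pendulum-type Hamiltonian H = p²/2 − cos q is an arbitrary test case) computes Φ(x, t) by integrating the Hamilton equations and verifies the composition property Φ(x, s + t) = Φ(Φ(x, s), t), which, for autonomous H, is a direct consequence of the uniqueness statement above:

import numpy as np
from scipy.integrate import solve_ivp

rhs = lambda t, x: [x[1], -np.sin(x[0])]     # Hamilton equations for H = p^2/2 - cos q

def Phi(x, t):
    # numerical flow map, cf. Eq. (5.16)
    if t == 0.0:
        return np.asarray(x, float)
    return solve_ivp(rhs, (0.0, t), x, rtol=1e-11, atol=1e-12).y[:, -1]

x0 = [0.3, 0.8]
print(np.abs(Phi(x0, 1.7) - Phi(Phi(x0, 1.0), 0.7)).max())   # ~1e-9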
The projection of these curves onto configuration space, (q(t), p(t)) ↦ q(t), tells us something about the evolution of the configuration space coordinate. Let us illustrate these statements on an example where a closed solution is possible, the harmonic oscillator. From the Hamiltonian

   H(q, p) = p²/(2m) + mω²q²/2 = E,

we find

   p = p(q, E) = ±√( 2m (E − mω²q²/2) ).

Examples of these (elliptic) curves are shown in the figure. It is straightforward to show (do it!) that they are tangential to the Hamiltonian vector field

   X_H = (p/m, −mω²q)ᵀ.

At the turning points, p = 0, the energy of the curve, E = mω²q²/2, equals the potential energy.

Now, the harmonic oscillator may be an example a little too basic to illustrate the usefulness of phase space portraits. Instead, consider the problem defined by the potential shown in Fig. 5.2. The Hamilton equations can now no longer be solved in closed form. However, we may still compute curves by solving H(q, p) = E ⇒ p = p(q, E), and the results are shown qualitatively for three different values of E in the figure. These curves give a far-reaching impression of the motion of a point particle in the potential landscape.

Figure 5.2: Phase space portrait of a "non-trivial" one-dimensional potential. Notice the "critical point" at the threshold energy E₂.

Specifically, notice the threshold energy E₂ corresponding to the local potential maximum. At first sight, the corresponding phase space curve appears to violate the criterion of the non-existence of crossing points (cf. the "critical point" at p = 0 beneath the potential maximum). In fact, however, this is not a crossing point. Rather, the two incoming curves "terminate" at this point: coming from the left or right, it takes infinite time for the particle to climb the potential maximum; eventually it will rest at the maximum in an unstable equilibrium position. Similarly, the outgoing trajectories "begin" at this point: starting at zero velocity (corresponding to zero initial momentum), it takes infinite time to accelerate and move downhill. In this sense, the curves touching (not crossing!) at the critical point represent idealizations that are never actually realized. Trajectories of this type are called separatrices. Separatrices are important in that they separate phase space into regions of qualitatively different types of motion (presently, curves that make it over the hill and those that do not).

EXERCISE Show that a trajectory at energy E = V* ≡ V(q*), equal to the potential maximum at q = q*, needs infinite time to climb the potential hill. To this end, use that the potential maximum at q* can be locally modelled by an inverted harmonic-oscillator potential, V(q) ≃ V* − C(q − q*)², where C is a positive constant. Consider the local approximation of the Hamiltonian function, H = p²/(2m) − C(q − q*)² + const., formulate and solve the corresponding equations of motion (hint: compare to the standard oscillator discussed above), and compute the time it takes to reach the maximum if E = V(q*).

EXERCISE Take a few minutes to familiarize yourself with the phase space portrait and to learn how to "read" such representations. Sketch the periodic potential V(q) = cos(2πq/a), a = const., and its phase space portrait.
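INFO As a cross-check for the last exercise, the level curves H(q, p) = E can be drawn numerically. A minimal Python sketch (we set m = a = 1 for convenience; the separatrix is the level E = 1, the maximum of the potential):

import numpy as np
import matplotlib.pyplot as plt

m, a = 1.0, 1.0
Q, P = np.meshgrid(np.linspace(-1.5, 1.5, 400), np.linspace(-4.0, 4.0, 400))
H = P**2/(2*m) + np.cos(2*np.pi*Q/a)

# phase space curves are the level sets of H; E = 1 is the separatrix level
plt.contour(Q, P, H, levels=[-0.5, 0.0, 0.5, 1.0, 2.0, 3.0])
plt.xlabel('q'); plt.ylabel('p'); plt.show()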
5.2.4 Poisson brackets

Consider a (differentiable) function on phase space, g : Γ → R, x ↦ g(x). A natural question to ask is how g(x) evolves as x ≡ x(0) ↦ x(t) traces out a phase space trajectory; in other words, we ask for the time evolution of g(x(t)) with initial condition g(x(0)) = g(x). The answer is obtained by straightforward time differentiation:

   d/dt g(q(t), p(t), t) = Σ_{i=1}^f ( ∂g/∂q_i dq_i/dt + ∂g/∂p_i dp_i/dt ) + ∂_t g
                         = Σ_{i=1}^f ( ∂g/∂q_i ∂H/∂p_i − ∂g/∂p_i ∂H/∂q_i ) + ∂_t g,

where all derivatives are evaluated on the curve (q(t), p(t)), the Hamilton equations were inserted in the second line, and the term ∂_t g accounts for an optional explicit time dependence of g. The characteristic combination of derivatives governing this expression appears frequently in Hamiltonian mechanics. It motivates the introduction of a shorthand notation,

   {f, g} ≡ Σ_{i=1}^f ( ∂f/∂p_i ∂g/∂q_i − ∂f/∂q_i ∂g/∂p_i ).   (5.17)

This expression is called the Poisson bracket of the two phase space functions f and g. In the invariant notation of phase space coordinates x, it assumes the form

   {f, g} = −(∂_x f)ᵀ I ∂_x g,   (5.18)

where I is the symplectic unit matrix defined above. The time evolution of functions may be concisely expressed in terms of the Poisson bracket of the two functions H and g:

   d_t g = {H, g} + ∂_t g.   (5.19)

Notice that the Poisson bracket operation bears some similarity to a scalar product of functions. Let C_Γ be the space of smooth (and integrable) functions on phase space. Then

   { , } : C_Γ × C_Γ → R,   (f, g) ↦ {f, g},   (5.20)

defines a bilinear operation which comes with a number of characteristic algebraic properties:

- {f, g} = −{g, f} (skew symmetry),
- {cf + c′f′, g} = c{f, g} + c′{f′, g} (linearity),
- {c, g} = 0,
- {ff′, g} = f{f′, g} + {f, g}f′ (product rule),
- {f, {g, h}} + {h, {f, g}} + {g, {h, f}} = 0 (Jacobi identity),

where c, c′ ∈ R are constants. The first three of these relations are immediate consequences of the definition, and the fourth follows from the product rule of differentiation. The proof of the Jacobi identity amounts to a straightforward, if tedious, exercise in differentiation. At this point, the Poisson bracket appears to be little more than convenient notation. However, as we go along, we will see that it encapsulates important information on the mathematical structure of phase space.

5.2.5 Variational principle

The solutions x(t) of the Hamilton equations can be interpreted as extremal curves of a certain action functional. This connection will be a gateway to the further development of the theory. Let us consider the set

   M = {x : [t₀, t₁] → Γ, q(t₀) = q₀, q(t₁) = q₁}

of all phase space curves beginning and ending at common configuration space points q₀ and q₁. We next define the local functional

   S : M → R,   x ↦ S[x] = ∫_{t₀}^{t₁} dt ( Σ_{i=1}^f p_i q̇_i − H(q, p, t) ).   (5.21)

We now claim that the extremal curves of this functional are but the solutions of the Hamilton equations (5.6). To see this, define the function F(x, ẋ, t) ≡ Σ_{i=1}^f p_i q̇_i − H(q, p, t), in terms of which S[x] = ∫_{t₀}^{t₁} dt F(x, ẋ, t). Now, according to the general discussion of section 4.1.2, the extremal curves are solutions of the Euler–Lagrange equations

   (d_t ∂_{ẋ_i} − ∂_{x_i}) F(x, ẋ, t) = 0,   i = 1, . . . , 2f.

Evaluating these equations for i = 1, . . . , f and i = f + 1, . . . , 2f, respectively, we obtain the first and the second set of the equations (5.6).
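INFO The algebraic properties above are easily verified for concrete functions. A small Python/sympy sketch for one degree of freedom (the test functions f1, f2, f3 are arbitrary choices of ours), implementing the convention of Eq. (5.17):

import sympy as sp

q, p = sp.symbols('q p', real=True)
m, w = sp.symbols('m omega', positive=True)

def pb(f, g):
    # Poisson bracket, Eq. (5.17), for f = 1 degree of freedom
    return sp.expand(sp.diff(f, p)*sp.diff(g, q) - sp.diff(f, q)*sp.diff(g, p))

H = p**2/(2*m) + m*w**2*q**2/2

print(pb(p, q))                        # 1: canonical pair
print(sp.simplify(pb(H, q) - p/m))     # 0: d_t q = {H, q}, cf. Eq. (5.19)

f1, f2, f3 = q**2*p, sp.sin(q)*p, q + p**3
print(sp.simplify(pb(f1, pb(f2, f3)) + pb(f3, pb(f1, f2)) + pb(f2, pb(f3, f1))))  # 0: Jacobi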
5.3 Canonical transformations

In the previous chapter, we emphasized the flexibility of the Lagrangian approach in the choice of problem-adjusted coordinates: any diffeomorphic transformation q_i ↦ q_i′ = q_i′(q), i = 1, . . . , f, leaves the Euler–Lagrange equations form-invariant. Now, in the Hamiltonian formalism, we have twice as many variables x_i, i = 1, . . . , 2f. Of course, we may consider "restricted" transformations q_i ↦ q_i′(q), where only the configuration space coordinates are transformed (the transformation of the p_i's then follows). However, things get more interesting when we set out to explore the full freedom of coordinate transformations "mixing" configuration space coordinates and momenta. This would include, for example, a mapping q_i ↦ q_i′ ≡ p_i, p_i ↦ p_i′ = q_i (a diffeomorphism!) that exchanges the roles of coordinates and momenta. Now, this newly gained freedom raises two questions: (i) what might such generalized transformations be good for, and (ii) does every diffeomorphic mapping x_i ↦ x_i′ qualify as a good coordinate transformation? Let us address these two questions in turn, beginning with the second one.

5.3.1 When is a coordinate transformation "canonical"?

In the literature on variable transformations in Hamiltonian dynamics it is customary to denote the "new variables" X ≡ x′ by capital letters; we will follow this convention. Following our general paradigm of valuing "form-invariant equations", we deem a coordinate transformation "canonical" if the following condition is met:

A diffeomorphism x ↦ X(x) defines a canonical transformation if there exists a function H′(X, t) such that the representation of solution curves (i.e. solutions of the equations ẋ = I∂_x H) in the new coordinates X = X(x) solves the transformed Hamilton equations

   Ẋ = I ∂_X H′(X, t).   (5.22)

Now, this definition may leave us a bit at a loss: it does not tell us how the "new" Hamiltonian H′ relates to the old one, H, and in this sense it seems strangely "under-defined". However, before exploring the ways in which canonical transformations can be constructively obtained, let us take a look at possible motivations for switching to new coordinates.

5.3.2 Canonical transformations: why?

Why would one start thinking about generalized coordinate transformations mixing configuration space coordinates and momenta? In previous sections, we argued that coordinates should relate to the symmetries of a problem. Symmetries, in turn, generate conservation laws (Noether): for each symmetry one obtains a quantity I that is dynamically conserved. Within the framework of phase space dynamics, this means that there is a function I(x) such that d_t I(x(t)) = 0, where x(t) is a solution curve. We may think of I(x) as a function that is constant along the Hamiltonian flow lines.

Now, suppose we have s ≤ f symmetries and, accordingly, s conserved functions I_i, i = 1, . . . , s. Further assume that we find a canonical transformation to phase space coordinates x ↦ X = (Q, P), where

   P = (I₁, . . . , I_s, P_{s+1}, . . . , P_f),   Q = (φ₁, . . . , φ_s, Q_{s+1}, . . . , Q_f).

In words: the first s of the new momenta are the functions I_i. Their conjugate configuration space coordinates are called φ_i.⁸ The new Hamiltonian will be some function H′(φ₁, . . . , φ_s, Q_{s+1}, . . . , Q_f, I₁, . . . , I_s, P_{s+1}, . . . , P_f) of the new coordinates and momenta. Now let us take a look at the Hamilton equations associated to the variables (φ_i, I_i):

   d_t φ_i = ∂_{I_i} H′,
   d_t I_i = −∂_{φ_i} H′ = 0,

where the second line holds because the I_i are conserved. It tells us that H′ must be independent of the variables φ_i:

The coordinates φ_i corresponding to conserved momenta I_i are cyclic coordinates, ∂H′/∂φ_i = 0.

⁸ It is natural to designate the conserved quantities as "momenta": in section 4.2.4 we saw that the conserved quantity associated to the "natural" coordinate of a symmetry (a coordinate transforming additively under the symmetry operation) is the Noether momentum of that coordinate.
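INFO For the harmonic oscillator, such a transformation can be written down explicitly: the standard action–angle map q = √(2I/mω) sin φ, p = √(2Imω) cos φ (quoted here as an illustration; it is not derived in the text) trades (q, p) for a conserved momentum I and a cyclic angle φ. A sympy one-liner confirms that the Hamiltonian then depends on I alone:

import sympy as sp

m, w, I = sp.symbols('m omega I', positive=True)
phi = sp.symbols('phi', real=True)

q = sp.sqrt(2*I/(m*w)) * sp.sin(phi)
p = sp.sqrt(2*I*m*w) * sp.cos(phi)

H = p**2/(2*m) + m*w**2*q**2/2
print(sp.simplify(H))   # I*omega: phi is cyclic, and d_t phi = dH'/dI = omega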
At this point, it should have become clear that coordinate sets containing conserved momenta, I_i, and their conjugate coordinates, φ_i, are tailored to the optimal formulation of mechanical problems. The power of these coordinates becomes fully evident in the extreme case s = f, i.e. when we have as many conserved quantities as degrees of freedom. In this case H′ = H′(I₁, . . . , I_f), and the Hamilton equations assume the form

   d_t φ_i = ∂_{I_i} H′ ≡ ω_i,
   d_t I_i = −∂_{φ_i} H′ = 0.

These equations can now be trivially solved:⁹

   φ_i(t) = ω_i t + α_i,   I_i(t) = I_i = const.

⁹ The problem remains solvable even if H′(I₁, . . . , I_f, t) contains an explicit time dependence. In this case, φ̇_i = ∂_{I_i} H′(I₁, . . . , I_f, t) ≡ ω_i(t), where ω_i(t) is a function of known time dependence. (The time dependence is known because the variables I_i are constant and we are left with the externally imposed dependence on time.) These equations – ordinary first-order differential equations in time – are solved by φ_i(t) = ∫_{t₀}^t dt′ ω_i(t′).

Let us now return to the question left open in section 5.3.1: how does the new Hamiltonian H′ relate to the old one? A very natural guess would be H′(X, t) ≟ H(x(X), t), i.e. the "old" Hamiltonian expressed in the new coordinates. However, we will see that this ansatz can be too restrictive. Progress is made once we remember that solution curves x are extrema of an action functional S[x] (cf. section 5.2.5). This functional is the x-representation of an "abstract" functional; the same object can be expressed in terms of X-coordinates (cf. our discussion in section 4.1.3) as S′[X] = S[x]. The form of the functional S′ follows from the condition that the X-representation of the curve be a solution of the equations (5.22). This is equivalent to the condition

   S′[X] = ∫_{t₀}^{t₁} dt ( Σ_{i=1}^f P_i Q̇_i − H′(Q, P, t) ) + const.,

where "const." is a contribution whose significance we will discuss in a moment. Indeed, the variation of S′[X] generates Euler–Lagrange equations which, as we saw in section 5.2.5, are just the Hamilton equations (5.22).
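INFO To see concretely that the guess H′(X, t) = H(x(X), t) is too restrictive, consider the coordinate–momentum exchange Q = p, P = q mentioned at the beginning of this section. A short sympy check (our own construction) shows that this map is canonical, but with H′(Q, P) = −H(P, Q), i.e. the negative of the old Hamiltonian expressed in the new coordinates; the naive choice +H(P, Q) produces the Hamilton equations with the wrong signs.

import sympy as sp

q, p, Q, P = sp.symbols('q p Q P')
H = sp.Function('H')(q, p)          # arbitrary Hamiltonian

Hp = -H.subs({q: P, p: Q}, simultaneous=True)   # candidate H'(Q,P) = -H(P,Q)

# old equations: qdot = dH/dp, pdot = -dH/dq; under Q = p, P = q this means
Qdot = -sp.diff(H, q)               # Qdot = pdot
Pdot =  sp.diff(H, p)               # Pdot = qdot

# transformed Hamilton equations, Eq. (5.22), mapped back to (q, p):
r1 =  sp.diff(Hp, P).subs({P: q, Q: p}, simultaneous=True) - Qdot
r2 = -sp.diff(Hp, Q).subs({P: q, Q: p}, simultaneous=True) - Pdot
print(sp.simplify(r1), sp.simplify(r2))   # 0 0: Eq. (5.22) holds with H' = -H(P,Q)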