Axiomatic and constructive quantum field theory

Thesis for the master's in Mathematical Physics
Sohail Sheikh (student number 0481289)
Thesis advisor: prof. dr. R.H. Dijkgraaf
Second reader: dr. H.B. Posthuma
Korteweg-de Vries Institute (KdVI), Universiteit van Amsterdam (UvA), FNWI
August 2013

Abstract

We investigate the mathematical structure of quantum field theory. For this purpose, we first develop the mathematical framework for a relativistic quantum theory in terms of a Hilbert space on which a unitary representation of the universal covering group of the restricted Poincaré group is defined. We then discuss how quantum fields are used in physics to compute physical quantities for scattering processes, such as scattering amplitudes. After this discussion of the use of quantum fields in physics, we analyze two different axiom systems for mathematically rigorous quantum field theory, namely the Wightman axiom system and the Haag-Kastler axiom system. Finally, we look at some results in constructive quantum field theory (CQFT), whose goal is to construct concrete non-trivial examples of models that satisfy the Wightman axioms or the Haag-Kastler axioms.

Contents

1 Introduction
  1.1 Overview
  1.2 Conventions and notation
  1.3 Acknowledgements
2 Special relativity and quantum theory
  2.1 Special relativity
    2.1.1 Minkowski spacetime
    2.1.2 The Lorentz group, causal structure and the Poincaré group
  2.2 Quantum theory
    2.2.1 States and observables
    2.2.2 The general framework of quantum theory
    2.2.3 Symmetries in quantum theory
    2.2.4 Poincaré invariance and one-particle states
    2.2.5 Many-particle states and Fock space
3 The physics of quantum fields
  3.1 The interaction picture and scattering theory
  3.2 The use of free quantum fields in scattering theory
  3.3 Calculation of the S-matrix using perturbation theory
  3.4 Obtaining V from a Lagrangian
  3.5 Some remarks on the physics of quantum fields
4 The mathematics of quantum fields
  4.1 The Wightman formulation of quantum field theory
    4.1.1 Mathematical preliminaries: Distributions and operator-valued distributions
    4.1.2 The Wightman axioms
    4.1.3 Wightman functions
    4.1.4 Important theorems
    4.1.5 Example: The free Hermitian scalar field
    4.1.6 Haag-Ruelle scattering theory
  4.2 The Haag-Kastler formulation of quantum field theory
    4.2.1 The algebraic approach to quantum theory
    4.2.2 The Haag-Kastler axioms
    4.2.3 Vacuum states in the Haag-Kastler framework
5 Constructive quantum field theory
  5.1 The Hamiltonian approach
    5.1.1 The (λφ⁴)₂-model as a Haag-Kastler model
    5.1.2 The physical vacuum for the (λφ⁴)₂-model
    5.1.3 The P(φ)₂-model and verification of some of the Wightman axioms
    5.1.4 Similar methods for other models
  5.2 The Euclidean approach
    5.2.1 Euclidean fields and probability theory
    5.2.2 An alternative method: The Osterwalder-Schrader theory
    5.2.3 The P(φ)₂-model as a Wightman model
A Hilbert space theory
  A.1 Direct sums and integrals of Hilbert spaces
  A.2 Self-adjoint operators and the spectral theorem
B Examples of free fields
  B.1 The (0, 0)-field (or scalar field)
  B.2 The (1/2, 1/2)-field (or vector field)
  B.3 The (1/2, 0)-field and the (0, 1/2)-field
  B.4 The (1/2, 0) ⊕ (0, 1/2)-field (or Dirac field)
References
Popular summary (English)
Popular summary (Dutch)

1 Introduction

Quantum field theory (QFT) is the physical theory that emerged when physicists tried to construct a quantum theory compatible with special relativity. Although the theory is used to describe high-energy subatomic particles, its fundamental objects are fields. The theoretical predictions of QFT have been tested many times against experimental results, and these predictions turned out to be highly accurate.
For this reason, QFT should be viewed as a very important physical theory. When I first came to study QFT, there were some aspects of the theory that puzzled me. For example, I was used to the systematic approach of non-relativistic quantum mechanics, where for each system one can specify very precisely in what Hilbert space the physical states live. In fact, in non-relativistic quantum mechanics the choice of Hilbert space is completely determined by the number of degrees of freedom, by virtue of the Stone-von Neumann theorem. However, in QFT the only systems for which the Hilbert spaces are specified are the free systems, i.e. the systems that describe particles that do not interact with each other. I found it very confusing that in QFT one describes a physical system quantum mechanically without ever mentioning the Hilbert space. After all, in non-relativistic quantum mechanics the starting point for any quantum mechanical description is the Hilbert space. Another thing that I found difficult about QFT is the fact that it is built around perturbation theory. In non-relativistic quantum mechanics one often starts with an exact theory for describing a particular system and then uses perturbation theory to approximate the solution for the system. In QFT the starting point is already a perturbative expression, such as the Dyson series, without mention of which exact mathematical expression is being approximated. In my first meeting with professor Dijkgraaf I explained that I had some difficulties in understanding QFT and that I would like to work on a master's thesis that would take these difficulties away. It was in this context that we came up with the idea to examine the mathematical structure of QFT in terms of the Wightman framework and the Haag-Kastler framework.
Professor Dijkgraaf emphasized that he wanted me to motivate these mathematical frameworks by using arguments from physics, and that he also wanted to see some results from constructive quantum field theory (CQFT). Apart from these two demands, I was completely free to organize the thesis according to my own taste. This master's thesis should be readable for anyone with a healthy knowledge of real and complex analysis, measure and integration theory, functional analysis, operator algebras, Lie groups and Lie algebras. Since I have followed several courses in these fields during my study, I did not bother explaining anything concerning these topics. For instance, I do not give the definition of C*-algebras and von Neumann algebras, and I do not explain the Gelfand-Naimark-Segal construction for general C*-algebras or the Gelfand-Naimark theorem for abelian C*-algebras. On the other hand, since I was not familiar with the theory of distributions and operator-valued distributions, I did write a subsection on these subjects. Furthermore, since the theory of unbounded operators on a Hilbert space is not part of any course that one can follow at the University of Amsterdam, I also included an appendix on unbounded (self-adjoint) operators. Strictly speaking, no knowledge of physics is required for reading this thesis, although the material will feel more natural to readers with some background in quantum physics.

1.1 Overview

This thesis consists of four chapters, excluding the present introduction. These chapters consist of several sections, some of which are further divided into subsections. The content of each of the four chapters can be briefly summarized as follows.

Footnote: In the national Mastermath programme there were two excellent courses given by M. Müger and N.P. Landsman on C*-algebras and operator algebras, respectively.
I am very grateful to these two teachers from the Radboud Universiteit Nijmegen, not only for giving these courses, but also for helping me very much in finding good literature for this thesis.

In chapter 2 we will consider the two main ingredients of quantum field theory, namely special relativity and quantum theory. The first section of this chapter, on special relativity, commences with the introduction of certain physical concepts such as inertial observers and the invariance of the speed of light. These physical concepts will then be used to motivate the structure of the mathematical model that will be used for the description of spacetime. In the remainder of the section we will investigate the properties of this mathematical model, including a detailed discussion of the Poincaré group and its universal covering. In the second section, on quantum theory, we define the notions of a state and an observable from the physical point of view. We will then explain that in quantum theory these states and observables are represented mathematically in terms of Hilbert space objects, and we will consider time evolution in both the Heisenberg picture and the Schrödinger picture. Because any quantum theory that is consistent with special relativity should be invariant under Poincaré transformations, the next logical step is to study how symmetries are implemented in a quantum theory. The important result in this context is Wigner's theorem, which states that, in a quantum system without superselection rules, any symmetry can be represented either by a unitary operator or by an anti-unitary operator. Wigner's theorem can then be applied to Poincaré transformations, which will eventually lead to the conclusion that the Hilbert space of any relativistic quantum system must carry a unitary representation of the universal covering group of the restricted Poincaré group.
If this representation is irreducible, the corresponding Hilbert space is interpreted as the pure state space of a single-particle system. The concepts of mass and spin of a particle arise very naturally in the study of these irreducible representations. In the last part of the section we will construct the Hilbert spaces for systems consisting of more than one particle. These spaces are tensor products of single-particle Hilbert spaces, although in general not all states in these tensor products are physically realizable. Finally, we will consider spaces in which the total number of particles is not constant.

In chapter 3 we will give a brief overview of the use of quantum fields in physics. The concept of a quantum field will be introduced as a computational tool in the perturbative calculations that are carried out in the quantitative description of experiments in which high-energy particles collide with each other. This point of view is adopted from Weinberg's book [35]. Once the quantum fields are defined, we will sketch how they can be used to compute physically interesting quantities. These computations are done using perturbation theory, which is made somewhat easier by the use of Feynman diagrams and the corresponding Feynman rules. However, we will not explain the precise content of the Feynman rules, nor will we consider the process of renormalization, which is necessary to transform infinite quantities into finite ones. After considering methods of computation in quantum field theory, we will show how one can obtain a quantum field theory from a classical Lagrangian field theory, since this is the route that is followed in practice. Finally, we will try to motivate which aspects of the physical theory should be included in any mathematically rigorous treatment of quantum fields.

In chapter 4 we will discuss two different axiom systems for quantum field theory. In the first section of this chapter we will consider the Wightman axioms.
These axioms will be motivated by the physical structure of quantum field theory as described in chapter 3. However, before we can formulate the Wightman axioms, we first need to study the theory of distributions and operator-valued distributions. After these mathematical concepts have been studied and the axioms have been formulated, we will prove several properties that are shared by all Wightman theories. Among these properties are the spin-statistics theorem and the PCT theorem, which are very important from the physical point of view. As an easy example of a Wightman theory we will consider the free Hermitian scalar field. We will treat this example in some detail, because the results will also be needed when we construct an interacting quantum field theory in the next chapter. To close our discussion of the Wightman theory, we will show that under certain conditions Wightman theories allow an interpretation in terms of particles. These conditions define what is called Haag-Ruelle scattering theory. In the second section of chapter 4 we will consider an alternative axiom system, namely the Haag-Kastler system. Again we will try to motivate the content of the axioms by using our knowledge from previous discussions. Because Haag-Kastler systems are formulated in terms of abstract algebras rather than concrete Hilbert spaces, we will begin with a treatment of algebraic quantum theory. This treatment includes topics such as physical representations, superselection rules and symmetries. After we have considered algebraic quantum theory in some detail, we will formulate the Haag-Kastler axioms and consider some important results concerning Haag-Kastler systems.

In chapter 5 we will describe some of the early developments in the constructive quantum field theory programme, which started in the 1960s.
In constructive quantum field theory (CQFT) the purpose is to construct concrete examples of non-trivial models that satisfy all axioms of one or both of the two axiom systems mentioned above. This is a very difficult task, and for that reason people started with the simplest possible models in 2- and 3-dimensional spacetime. We will consider some of the results that were obtained for these 'easy' models. Because the detailed proofs of these results are almost always very technical (and not very fun), we will focus only on the main arguments and constructions.

1.2 Conventions and notation

Physicists and mathematicians often use very different notations for the same mathematical object, so we had to make some choices. The most important conventions and notations that we will use are the following.

- We will use the Einstein summation convention: if in some equation a Greek letter appears once as a lower index and once as an upper index, then that index should be summed over all its possible values.
- Inner products ⟨·, ·⟩ : V × V → ℂ on a complex vector space will always be linear in the first argument and conjugate-linear in the second. This coincides with the convention used in the mathematics literature on linear algebra and functional analysis, but is opposite to the convention used in the physics literature on quantum theory. In particular, the bra-ket notation of Dirac will not be used here.
- The complex conjugate of a complex number z is denoted by z̄ (not by z*) and the adjoint of an operator A is denoted by A* (not by A†).

1.3 Acknowledgements

I would like to thank professor Dijkgraaf for all his time and effort in supervising this thesis. I really appreciate that he always took time to answer my questions, despite the fact that his agenda was always overfull during his tenure as president of the KNAW.
Even when he became director of the Institute for Advanced Study (IAS) in Princeton, he still made time for me during the short periods that he was in the Netherlands. In this context I would also like to thank his personal assistant, Ms. Corina de Boer, who always arranged my appointments with the professor and who brought me into contact with professor Dijkgraaf in the first place. I am also very grateful to the second reader of this thesis, doctor Posthuma, for all the time he spent reading the text and for all his feedback. I would also like to thank my family for all their support during my study. My parents always inspire and motivate me in everything I do, and, as the youngest child, I have the privilege that I can always take an example from my brother and sister.

2 Special relativity and quantum theory

In this chapter we will describe the two main ingredients of quantum field theory, namely special relativity and quantum theory. At the end of the chapter we will combine the two theories to obtain the general framework for a relativistic quantum theory. This chapter is rather long and detailed, because it is only after we have developed a proper comprehension of relativistic quantum theory that we can introduce quantum fields.

2.1 Special relativity

In the absence of gravity, freely moving macroscopic objects (i.e. macroscopic objects on which no external agencies act) have constant relative velocities. This empirical fact allows us to define a special class of observers, namely those for which all freely moving macroscopic objects move with constant velocities. Such observers are called inertial observers. For convenience, we assume that all inertial observers construct an orthogonal three-dimensional coordinate system in precisely the same manner.
For example, we might agree that the origin of this coordinate system is always the center of mass of the observer's body and that the x¹-axis runs from left to right, the x²-axis from back to front and the x³-axis from bottom to top; note that this defines a right-handed coordinate system. Distances along any of these axes are measured using light rays. We also assume that all inertial observers are equipped with the same kind of clock. Thus, we may in fact define an inertial observer to be a right-handed three-dimensional coordinate system moving through space with constant velocity relative to all freely moving macroscopic objects, together with a clock at the origin. Two such coordinate systems that coincide but carry clocks that do not share the same t = 0 moment are considered to be different inertial observers. By using light rays, any two clocks that are at rest with respect to the same inertial observer can be synchronized in the familiar way. We may therefore imagine that there is a clock at every point of the coordinate system of any inertial observer and that all these clocks are synchronized. This allows a coordinatization of space and time by four numbers (t, x¹, x², x³) for any inertial observer. It is an empirical fact that all inertial observers are physically equivalent, in the sense that they all obtain the same outcome whenever they conduct the same experiment. This is called the principle of special relativity. In order to fully understand this principle, we first introduce some terminology concerning physical experiments; this terminology is borrowed from chapter 1 of [1]. The part of the physical world that is studied in a particular measurement is called the measured object under consideration, and the measurements that can be carried out on the measured object are called physical quantities.
In any measurement process, the measured object interacts for some time with a measuring apparatus, after which the interaction stops and the measuring apparatus immediately indicates the measured value. In any measurement, the measured object is prepared during a preparation process. By definition, this preparation process ends at the moment when the interaction between the measured object and the measuring apparatus stops and the measuring apparatus indicates the measured value. In what follows, we will denote measured objects by Greek letters α, β, . . ., and these Greek letters should be interpreted as a complete description of how the measured object is prepared, in terms of the space and time coordinates of all constituent parts of the measured object during the preparation process. We will often denote these spacetime coordinates symbolically as α(x). Physical quantities will be denoted by capital letters A, B, . . ., and these should be interpreted as a complete description of how to perform a certain measurement process, in terms of the space and time coordinates of all constituent parts of all measuring apparatuses during the measurement process; by definition, the measurement process ends at the moment that the measured value is produced, i.e. at the same moment that the preparation process for the measured object ends. As for measured objects, we will often write A(x) to symbolically denote the spacetime coordinates of all parts of the measuring apparatuses. The outcome of the measurement of a physical quantity A for a measured object α, i.e. the measured value, is denoted by M(α, A). Measured values are assumed to be Borel subsets of ℝⁿ, where n = n_A depends on the physical quantity A. The reason that measured values are Borel subsets of ℝⁿ, rather than elements of ℝⁿ, is that measurements always involve some errors.
Consider now some inertial observer O measuring the physical quantity A(y) for the measured object α(x), where x and y symbolically represent the spacetime coordinates of all parts of the measured object and of all parts of the measuring apparatuses, respectively. Note, in particular, that this implies that x and y are such that the end of the preparation process α(x) and the end of the measuring process A(y) coincide. We say that a second inertial observer O′ carries out a similar experiment as observer O if this second observer measures the physical quantity A(y′) for the measured object α(x′), where x′ and y′ are the coordinates with respect to O′ and these coordinates have the same numerical values as the coordinates x and y. Now consider the situation where N different inertial observers carry out similar experiments. If N(B) denotes the number of observers that find a measured value in the Borel subset B ⊂ ℝⁿ, then it is an empirical fact that for any such B the fraction N(B)/N approaches some definite value as N becomes large enough. This suggests that the similar experiments carried out by the different inertial observers should be interpreted as repetitions of the same probabilistic experiment. This is the form of the principle of special relativity that we will need in what follows. In most texts on special relativity this principle is stated not in probabilistic form but in deterministic form, since these texts are often concerned only with classical dynamics. The deterministic form is expressed by the equation M(α(x), A(y)) = M(α(x′), A(y′)), i.e. the measured values are the same for all inertial observers carrying out a similar experiment. In other words, in deterministic form the principle of special relativity can be loosely stated as follows: inertial observers carrying out similar experiments will obtain the same measured value.
When we apply (the deterministic form of) the principle of special relativity to experiments in classical electromagnetism, it follows that all inertial observers measure the same speed of light. This fact can be used to derive the coordinate transformations between different inertial frames, as explained in any introductory text on special relativity. Here we will merely state the results for later reference. We already mentioned above that an inertial observer can coordinatize the points of spacetime by four numbers (x⁰, x¹, x², x³), where x⁰ = t represents the time coordinate and the other components represent the spatial coordinates. Therefore, in any particular inertial frame O, spacetime can be identified with the four-dimensional vector space ℝ⁴. If a second inertial observer O′ is at rest with respect to the observer O, is standing at the point in space with spatial coordinates (a¹, a², a³) with respect to the frame O, and is oriented parallel to observer O, then any point in space with coordinates (x¹, x², x³) with respect to O has coordinates (x′¹, x′², x′³) = (x¹ − a¹, x² − a², x³ − a³) with respect to O′. Furthermore, if the time-zero moment x′⁰ = 0 of the clock of observer O′ takes place at time x⁰ = a⁰ with respect to O, then the time coordinate of any point in spacetime with respect to O′ is x′⁰ = x⁰ − a⁰, where x⁰ is the time coordinate of the point with respect to O. Thus, in this case the coordinates with respect to O′ are related to those with respect to O by

x′^µ = x^µ − a^µ.  (2.1)

Such coordinate transformations are called spacetime translations. Now consider another observer O″ that is standing at the same point in space as observer O with a clock that is synchronized with the clock of O, but suppose that the orientation of O″ is obtained from the orientation of O by rotating counterclockwise (as seen from a point with positive x³-coordinate) over an angle θ in the x¹x²-plane.
Then the coordinates with respect to O″ of a point in spacetime are related to those with respect to O by

(x″⁰, x″¹, x″², x″³) = (x⁰, x¹ cos θ + x² sin θ, −x¹ sin θ + x² cos θ, x³).  (2.2)

A very similar expression is obtained for rotations in the other two spatial planes. Rotations around arbitrary axes are more complicated, but we will come back to them later. Coordinate transformations of this form are called (spatial) rotations. Finally, consider yet another observer O‴ that has the same spatial orientation as O but moves with velocity v in the positive x¹-direction (we will always use units in which the speed of light c is equal to 1; otherwise it would have been more natural to define x⁰ = ct), and suppose that at the unique moment where the two observers are at the same point in space their clocks are both at time zero, i.e. x⁰ = x‴⁰ = 0 at that moment. Then their coordinates are related by

(x‴⁰, x‴¹, x‴², x‴³) = (γ(v)(x⁰ − vx¹), γ(v)(x¹ − vx⁰), x², x³),  (2.3)

where γ(v) = (1 − v²)^(−1/2) is the Lorentz factor. Similar expressions are obtained when the observer O‴ moves along one of the other two spatial axes. For more general directions the expression becomes more complicated, as we will discuss later. These coordinate transformations are called (Lorentz) boosts. It should be clear that the coordinate transformation between any two inertial frames can be obtained as a composition of translations, rotations and boosts. In the situations above, where an inertial observer passes from the coordinates of his own frame to the coordinates of another inertial observer's frame, in order to see how the other observer observes all objects in spacetime, we speak of a passive transformation. However, the observer can obtain the same result by keeping his own coordinates and transforming all objects in spacetime.
For example, observer O above could translate all objects in spacetime over the vector −a^µ to obtain the same point of view as observer O′. Such transformations are called active transformations. From the active viewpoint, the principle of special relativity may be stated as follows: if we move all objects in spacetime (along with all measuring apparatuses) according to a transformation that relates two inertial frames, then the outcome of any measurement remains unchanged. Note that, in either form, this principle makes it possible to 'repeat' an experiment at different times and places, but also at different velocities. Our goal for the rest of this section is to formulate a mathematical theory of space and time that agrees with the facts described above. In the first subsection we will give the mathematical definition of spacetime and in the second subsection we will describe the transformations between different inertial reference frames.

2.1.1 Minkowski spacetime

As discussed above, each inertial observer can identify spacetime with the vector space ℝ⁴, and the coordinate transformations between different inertial frames are generated by translations, rotations and boosts. However, we would like to define spacetime mathematically in a manner that describes its intrinsic properties, independent of a choice of inertial frame. For example, since two inertial frames that are related by a spacetime translation have different origins for their coordinate frames, it is clear that spacetime should not be represented mathematically by a vector space, but rather by an affine space. Furthermore, under any coordinate transformation between inertial frames the quantity

(∆x) · (∆x) = ((∆x)⁰)² − ((∆x)¹)² − ((∆x)²)² − ((∆x)³)²,  (2.4)

is left invariant, so this quantity should play an important role in the mathematical definition of spacetime; here ∆x denotes the difference between two points in spacetime.
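As a quick numerical sanity check on this invariance claim, the following sketch (illustrative only, not part of the thesis; the function names are my own) applies the boost (2.3) to a sample difference vector and verifies that the quantity (2.4) is unchanged:

```python
import math

def interval(dx):
    """The invariant quantity (2.4): (dx^0)^2 minus the sum of the squared spatial components."""
    return dx[0]**2 - sum(c**2 for c in dx[1:])

def boost_x1(dx, v):
    """The Lorentz boost (2.3) with velocity v along the x^1-axis (units with c = 1)."""
    g = 1.0 / math.sqrt(1.0 - v**2)  # Lorentz factor gamma(v) = (1 - v^2)^(-1/2)
    return (g * (dx[0] - v * dx[1]), g * (dx[1] - v * dx[0]), dx[2], dx[3])

dx = (2.0, 1.0, -0.5, 3.0)
# The interval is preserved up to floating-point rounding.
assert abs(interval(dx) - interval(boost_x1(dx, 0.6))) < 1e-12
```

The same check works for the rotation (2.2) and the translation (2.1), since the latter does not change difference vectors at all.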
In fact, we can identify the quantity in (2.4) as some kind of metric (analogous to the Euclidean metric describing distances in Euclidean space) that acts on differences of two spacetime points. Thus, spacetime should be represented by a four-dimensional affine space together with some kind of metric acting on difference vectors, and the coordinate transformations between two inertial frames are then represented by transformations that preserve this metric. Before defining the precise mathematical model for spacetime (which will be called Minkowski spacetime, or simply Minkowski space), we first recall the definitions of an affine space and of symmetric nondegenerate bilinear forms.

Definition 2.1 An affine space is a triple $(A, V, \ell)$ consisting of a set $A$, a vector space $V$ and a map $\ell : V \times A \to A$ such that
(1) $\ell(0, a) = a$ for all $a \in A$;
(2) $\ell(v, \ell(w, a)) = \ell(v + w, a)$ for all $v, w \in V$ and $a \in A$;
(3) for each $a \in A$ the map $\ell_a : V \to A$ defined by $\ell_a(v) = \ell(v, a)$ is a bijection.
The dimension of the affine space is defined to be the dimension of the vector space $V$. Note that (3) implies that $A$ and $V$ are in bijective correspondence as sets. Instead of $\ell(v, a)$ we also write $v + a$. If $a_1, a_2 \in A$ then, according to condition (3) in the definition, there exists a unique $v \in V$ such that $a_2 = \ell_{a_1}(v) = \ell(v, a_1) = v + a_1$, which we rewrite as $a_2 - a_1 = v$. In this sense we can subtract points in $A$ to obtain elements of $V$, so $V$ can be interpreted as the set of differences between points of $A$. Recall from (multi)linear algebra that a multilinear map $T : (V^*)^{\times k} \times V^{\times l} \to \mathbb{R}$ on a vector space $V$ is called a $(k, l)$-tensor on $V$. If $T : V \times V \to \mathbb{R}$ is a $(0, 2)$-tensor on a real vector space $V$, then we say that $T$ is symmetric if $T(v, w) = T(w, v)$ for all $v, w \in V$ and antisymmetric if $T(v, w) = -T(w, v)$ for all $v, w \in V$. If $T$ is either symmetric or antisymmetric, then $T$ is called nondegenerate if $T(v, w) = 0$ for all $w \in V$ implies that $v = 0$.
The following theorem on nondegenerate symmetric bilinear forms, which we state without proof, will be very useful for our purposes.

Theorem 2.2 Let $V$ be an $n$-dimensional real vector space on which a nondegenerate symmetric $(0, 2)$-tensor $T : V \times V \to \mathbb{R}$ is defined. Then there exists a basis $\{e_j\}_{j=1}^n$ of $V$ such that $T(e_i, e_j) = 0$ if $i \neq j$ and $T(e_i, e_i) = \pm 1$ for $i = 1, \ldots, n$. Moreover, the number of basis vectors $e_i$ for which $T(e_i, e_i) = -1$ is the same for any such basis (this number is called the index of $T$).

Such a basis as described in the theorem is called an orthonormal basis with respect to $T$. Now that we have considered affine spaces and symmetric nondegenerate bilinear forms, we can define our mathematical model for spacetime as follows.

Definition 2.3 Let $(M, V, \ell)$ be a 4-dimensional real affine space equipped with a nondegenerate symmetric $(0, 2)$-tensor $\eta : V \times V \to \mathbb{R}$ of index 3. Then $\eta$ is called a Minkowski metric on $M$, and $(M, V, \ell, \eta)$ is called Minkowski spacetime. The action of $\eta$ on two vectors in $V$, $\eta(v, w) = v \cdot w$, is called the inner product (or scalar/dot product) of $v$ and $w$. Two vectors $v, w \in V$ for which $v \cdot w = 0$ are called orthogonal. The norm of a vector $v \in V$ is $v \cdot v$ (not the square root of this quantity), and we say that a vector $v \in V$ is timelike if its norm is positive, lightlike or null if its norm is zero, and spacelike if its norm is negative.

Definition 2.4 We define the light cone $C_x$ at a point $x \in M$ in Minkowski spacetime by $C_x = \{y \in M : \eta(y - x, y - x) = 0\}$. If $y \in M$ with $\eta(y - x, y - x) > 0$, we say that $y$ lies inside the light cone at $x$; if $\eta(y - x, y - x) < 0$, we say that $y$ lies outside the light cone at $x$.

Note that a point $y \in M$ lies on the light cone at $x$ if and only if $y - x$ is a null vector, whereas $y \in M$ lies inside (or outside) the light cone at $x$ if and only if $y - x$ is a timelike (or spacelike) vector. We will denote Minkowski spacetime simply by $M$ instead of $(M, V, \ell, \eta)$.
Furthermore, with abuse of notation, more often than not we will denote the vector space $V$ also by $M$. In fact, from now on $M$ will always denote the vector space $V$ in the triple $(M, V, \ell)$; whenever we are considering the affine space $M$, we will state this explicitly. By theorem 2.2, there exists a basis $\{e_\mu\}_{\mu=0}^3$ of $M$ with $\eta(e_0, e_0) = 1$, $\eta(e_i, e_i) = -1$ for $i = 1, 2, 3$ and $\eta(e_\mu, e_\nu) = 0$ if $\mu \neq \nu$. We will always work in an orthonormal basis from now on, and we will always label the orthonormal basis vectors such that $\eta(e_0, e_0) = +1$. If $\{e^\mu\}_{\mu=0}^3$ denotes the dual basis of $\{e_\mu\}$, i.e. $\{e^\mu\}$ is a basis of the dual vector space $M^*$ of $M$ such that $e^\mu(e_\nu) = \delta^\mu_\nu$, then we can write $\eta = \eta_{\mu\nu}\, e^\mu \otimes e^\nu$, where $\eta_{\mu\nu} = \eta(e_\mu, e_\nu)$. Of course, in our particular choice of basis we have $\eta_{00} = -\eta_{11} = -\eta_{22} = -\eta_{33} = 1$ and all other components are 0. The metric $\eta : M \times M \to \mathbb{R}$ defines a map
\[\tilde{\eta} : M \to M^*, \qquad v \mapsto \eta(e_\mu, v)\, e^\mu.\]
In physics it is more convenient to write $\tilde{\eta}(v) = v_\mu e^\mu$ rather than $\tilde{\eta}(v) = [\tilde{\eta}(v)]_\mu e^\mu$, and for this reason physicists often refer to the map $v \mapsto \tilde{\eta}(v)$ as 'lowering the indices of $v$'; in their notation this map is simply $v^\mu \mapsto v_\mu = \eta_{\mu\nu} v^\nu$. Because $\tilde{\eta}(e_0) = e^0$ and $\tilde{\eta}(e_j) = -e^j$ for $j = 1, 2, 3$, it is clear that $\tilde{\eta}$ has an inverse $\tilde{\eta}^{-1} : M^* \to M$, and this inverse can be considered as a map that 'raises indices': $f_\mu \mapsto f^\mu$. This map, in turn, defines a $(2, 0)$-tensor $\eta^{-1} : M^* \times M^* \to \mathbb{R}$ on $M$ given by $\eta^{-1}(f, g) = g(\tilde{\eta}^{-1}(f)) = f(\tilde{\eta}^{-1}(g))$ for $f, g \in M^*$. In components we have $\eta^{-1}(f, g) = g_\mu f^\mu = f_\mu g^\mu$. We call $\eta^{-1}$ the inverse Minkowski metric; its nonzero components $\eta^{\mu\nu} = \eta^{-1}(e^\mu, e^\nu)$ are $\eta^{00} = -\eta^{11} = -\eta^{22} = -\eta^{33} = 1$. Note that for $f \in M^*$ we can now write the raised components $f^\mu$ in terms of $\eta^{\mu\nu}$ as $f^\mu = \eta^{\mu\nu} f_\nu$. Also, for $f, g \in M^*$ we have $\eta^{\mu\nu} f_\mu g_\nu = f_\mu g^\mu = \eta_{\mu\nu} f^\mu g^\nu$. For this reason we will often write, with abuse of notation, $f \cdot g$ instead of $\eta^{-1}(f, g)$.
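A small illustrative sketch (the editor's own, not part of the thesis; all names are invented) of index lowering and raising in components, using $\eta = \mathrm{diag}(1, -1, -1, -1)$:

```python
# Lowering and raising indices: v_mu = eta_{mu nu} v^nu flips the sign of the
# spatial components; the inverse metric has the same diagonal components.

ETA_DIAG = [1.0, -1.0, -1.0, -1.0]   # diagonal of the Minkowski metric

def lower(v):
    """v^mu -> v_mu = eta_{mu nu} v^nu."""
    return [ETA_DIAG[mu] * v[mu] for mu in range(4)]

def raise_(f):
    """f_mu -> f^mu = eta^{mu nu} f_nu."""
    return [ETA_DIAG[mu] * f[mu] for mu in range(4)]

v = [3.0, 1.0, -2.0, 0.5]
assert raise_(lower(v)) == v                      # eta-tilde inverse of itself
# The pairing f(v) = f_mu v^mu with f = lower(v) reproduces eta(v, v):
assert sum(lower(v)[m] * v[m] for m in range(4)) == 9.0 - 1.0 - 4.0 - 0.25
```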
Similarly, because for $f \in M^*$ and $v \in M$ we have $f(v) = f_\mu v^\mu = \eta_{\mu\nu} f^\nu v^\mu$, we will also write $f \cdot v$ instead of $f(v)$. In other words, because we can move vectors back and forth between $M$ and $M^*$, we will not make a sharp distinction between $M$ and $M^*$, and we write all the scalars $\eta(v, w)$, $\eta^{-1}(f, g)$ and $f(v)$ as dot products.

2.1.2 The Lorentz group, causal structure and the Poincaré group

As stated above, the coordinate transformations between different inertial frames are transformations that leave the Minkowski metric invariant. Therefore, we will now study such transformations.

Definition 2.5 A linear map $L : M \to M$ is said to be an orthogonal transformation of $M$ if $\eta(Lv, Lw) = \eta(v, w)$ for all $v, w \in M$. An orthogonal transformation is also called a (general) Lorentz transformation.

In this subsection we will investigate the properties of these Lorentz transformations. We will also find that the Lorentz transformations form a Lie group, and we will study some important properties of this Lie group.

Algebraic properties of Lorentz transformations and causal structure of M

The following discussion is largely based on the first chapter of [26]. If $L : M \to M$ is an orthogonal transformation and $Lv = 0$ for some $v \in M$, then for all $w \in M$ we have $\eta(v, w) = \eta(Lv, Lw) = \eta(0, Lw) = 0$, so that $v$ must be the zero vector in $M$ since $\eta$ is nondegenerate. This means that $L$ is injective, and hence we can conclude that orthogonal transformations are linear automorphisms of $M$. In particular, an orthogonal transformation $L$ is invertible and we have $\eta(L^{-1}v, L^{-1}w) = \eta(LL^{-1}v, LL^{-1}w) = \eta(v, w)$, so that its inverse is also an orthogonal transformation. We have the following characterization of orthogonal transformations:

Lemma 2.6 Let $L : M \to M$ be a linear map on Minkowski space. Then the following statements are equivalent:
(a) $L$ is an orthogonal transformation, i.e. $\eta(Lv, Lw) = \eta(v, w)$ for all $v, w \in M$.
(b) $\eta(Lv, Lv) = \eta(v, v)$ for all $v \in M$.
(c) $L$ carries an orthonormal basis of $M$ onto another orthonormal basis of $M$.

Proof. That (a) implies (b) is trivial. The opposite implication follows from the identity
\[4\eta(v, w) = \eta(v + w, v + w) - \eta(v - w, v - w) = \eta(L(v + w), L(v + w)) - \eta(L(v - w), L(v - w)) = \eta(Lv + Lw, Lv + Lw) - \eta(Lv - Lw, Lv - Lw) = 4\eta(Lv, Lw).\]
To see that (a) implies (c), let $\{e_\mu\}$ be an orthonormal basis of $M$. Because $L$ is an automorphism of $M$, $\{Le_\mu\}$ is also a basis of $M$, and from $\eta(Le_\mu, Le_\nu) = \eta(e_\mu, e_\nu)$ it follows that $\{Le_\mu\}$ is an orthonormal basis of $M$. Finally, to prove that (c) implies (b), let $\{e_\mu\}$ be an orthonormal basis of $M$. By assumption, $\{Le_\mu\}$ is also an orthonormal basis of $M$, so that (using our convention that orthonormal bases are always labeled such that $\eta(e_0, e_0) = +1$) $\eta(Le_\mu, Le_\nu) = \eta(e_\mu, e_\nu)$. This immediately implies that $\eta(Lv, Lv) = \eta(v, v)$ for all $v \in M$.

If $L : M \to M$ is a linear map and $\{e_\mu\}$ is an orthonormal basis of $M$, we define $L^\mu{}_\nu$ for $\mu, \nu \in \{0, 1, 2, 3\}$ by $L^\mu{}_\nu = (Le_\nu)^\mu$. Then $Le_\nu = (Le_\nu)^\mu e_\mu = L^\mu{}_\nu e_\mu$, so for all $v \in M$ we have $Lv = v^\nu Le_\nu = v^\nu L^\mu{}_\nu e_\mu$, and hence the components of $Lv$ can be expressed in terms of the $L^\mu{}_\nu$ as $(Lv)^\mu = L^\mu{}_\nu v^\nu$. In case $L$ is an orthogonal transformation, the constants $L^\mu{}_\nu$ satisfy a special property, namely
\[\eta_{\mu\nu} = \eta(Le_\mu, Le_\nu) = \eta(L^\rho{}_\mu e_\rho, L^\sigma{}_\nu e_\sigma) = L^\rho{}_\mu L^\sigma{}_\nu\, \eta(e_\rho, e_\sigma) = L^\rho{}_\mu L^\sigma{}_\nu\, \eta_{\rho\sigma} \tag{2.5}\]
or, equivalently,
\[\eta^{\mu\nu} = L^\mu{}_\rho L^\nu{}_\sigma\, \eta^{\rho\sigma}. \tag{2.6}\]
Conversely, it is also true that if $L : M \to M$ is a linear map such that the constants $L^\mu{}_\nu = (Le_\nu)^\mu$ (with $\{e_\mu\}$ an orthonormal basis of $M$) satisfy (2.5), then $L : M \to M$ is an orthogonal transformation. Thus, the identity above gives a characterization of orthogonal transformations on Minkowski space $M$ in terms of the constants $L^\mu{}_\nu = (Le_\nu)^\mu$. Of course, the $L^\mu{}_\nu$ are nothing else than the matrix coefficients of $L$ with respect to the orthonormal basis $\{e_\mu\}$, where $\mu$ denotes the row index of the matrix $[L]$ of $L$ and $\nu$ denotes the column index. When we define the matrix $[\eta]$ by $[\eta]_{\mu\nu} = \eta_{\mu\nu}$, we can write (2.5) in matrix form as $[L]^T[\eta][L] = [\eta]$.
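The matrix identity $[L]^T[\eta][L] = [\eta]$ can be verified numerically for a sample boost (an illustrative check by the editor, not part of the thesis; the helper names are invented):

```python
import math

# Verify [L]^T [eta] [L] = [eta] for the boost matrix along x1 with speed v.

ETA = [[1, 0, 0, 0], [0, -1, 0, 0], [0, 0, -1, 0], [0, 0, 0, -1]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def transpose(A):
    return [[A[j][i] for j in range(4)] for i in range(4)]

def boost_matrix(v):
    """Matrix of the boost (2.3) with respect to an orthonormal basis."""
    g = 1.0 / math.sqrt(1 - v*v)
    return [[g, -g*v, 0, 0], [-g*v, g, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

L = boost_matrix(0.8)
lhs = matmul(transpose(L), matmul(ETA, L))
assert all(abs(lhs[i][j] - ETA[i][j]) < 1e-12 for i in range(4) for j in range(4))
```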
From this matrix identity, the following proposition follows immediately by taking the determinant on both sides.

Proposition 2.7 Let $L : M \to M$ be an orthogonal transformation and let $[L]$ be its matrix with respect to some orthonormal basis $\{e_\mu\}$ of $M$. Then $\det([L]) = \pm 1$.

An orthogonal transformation for which the determinant of its matrix is $+1$ (or $-1$) is called proper (or improper). Because (1) the composition of two orthogonal transformations is again an orthogonal transformation, (2) the composition of linear maps on $M$ is associative, (3) the identity map is orthogonal and (4) the inverse of an orthogonal transformation is again an orthogonal transformation (note that it follows from the identity $[L]^T[\eta][L] = [\eta]$ that the inverse matrix $[L]^{-1} = [L^{-1}]$ of $[L]$ can be expressed as $[L]^{-1} = [\eta][L]^T[\eta]$), it follows that the orthogonal transformations of $M$ form a group under composition of maps. This group is called the Lorentz group and is denoted by $L$. Because the determinant is multiplicative, the set of proper elements in $L$ forms a subgroup of $L$. It is called the proper Lorentz group and is denoted by $L_+$. In practice, we will often identify the Lorentz group $L$ with the group of $4 \times 4$-matrices $\Lambda$ satisfying $\Lambda^T[\eta]\Lambda = [\eta]$; the proper Lorentz group $L_+$ is then identified with the set of those elements $\Lambda \in L$ that also satisfy $\det(\Lambda) = 1$. Apart from the determinant being either $+1$ or $-1$, the elements of $L$ have another important property:

Proposition 2.8 Let $L : M \to M$ be an element of $L$. Then either $L^0{}_0 \geq 1$ or $L^0{}_0 \leq -1$.

Proof. Substitution of $\mu = \nu = 0$ in (2.5) gives $(L^0{}_0)^2 - \sum_{k=1}^3 (L^k{}_0)^2 = 1$, or $(L^0{}_0)^2 = 1 + \sum_{k=1}^3 (L^k{}_0)^2 \geq 1$, so that $|L^0{}_0| \geq 1$.

An element $L \in L$ is called orthochronous (or nonorthochronous) if $L^0{}_0 \geq 1$ (or $L^0{}_0 \leq -1$). To understand the properties of orthochronous elements of $L$, we first need to define the notions of past and future.
For this, we need the following theorem.

Theorem 2.9 Suppose that $v \in M$ is timelike and $w \in M$ is either timelike or else a nonzero null vector. Let $\{e_\mu\}$ be an orthonormal basis for $M$ and write $v = v^\mu e_\mu$ and $w = w^\mu e_\mu$. Then either (a) $v^0 w^0 > 0$, in which case $\eta(v, w) > 0$, or (b) $v^0 w^0 < 0$, in which case $\eta(v, w) < 0$. In particular, $v^0 w^0 \neq 0$ and $\eta(v, w) \neq 0$.

Proof. By assumption, $(v^0)^2 - (v^1)^2 - (v^2)^2 - (v^3)^2 = \eta(v, v) > 0$ and $(w^0)^2 - (w^1)^2 - (w^2)^2 - (w^3)^2 = \eta(w, w) \geq 0$, so
\[(v^0 w^0)^2 = (v^0)^2 (w^0)^2 > \big((v^1)^2 + (v^2)^2 + (v^3)^2\big)\big((w^1)^2 + (w^2)^2 + (w^3)^2\big) \geq (v^1 w^1 + v^2 w^2 + v^3 w^3)^2,\]
where we have used the Cauchy-Schwarz inequality for $\mathbb{R}^3$. Thus we find that $|v^0 w^0| > |v^1 w^1 + v^2 w^2 + v^3 w^3|$, so in particular $v^0 w^0 \neq 0$ and, moreover, $\eta(v, w) \neq 0$. Suppose that $v^0 w^0 > 0$. Then $v^0 w^0 = |v^0 w^0| > |v^1 w^1 + v^2 w^2 + v^3 w^3| \geq v^1 w^1 + v^2 w^2 + v^3 w^3$, and so $v^0 w^0 - v^1 w^1 - v^2 w^2 - v^3 w^3 > 0$, i.e. $\eta(v, w) > 0$. On the other hand, if $v^0 w^0 < 0$, then $\eta(v, -w) > 0$, so $\eta(v, w) < 0$.

Corollary 2.10 If a nonzero vector in $M$ is orthogonal to a timelike vector, then it must be spacelike.

We denote by $\tau$ the collection of all timelike vectors in $M$ and define a relation $\sim$ on $\tau$ by $v \sim w$ if and only if $\eta(v, w) > 0$ (so that, by theorem 2.9, $v^0$ and $w^0$ have the same sign in any orthonormal basis). The relation $\sim$ on $\tau$ is an equivalence relation with exactly two equivalence classes. We denote the two equivalence classes of $\sim$ on $\tau$ (in an arbitrary way) by $\tau^+$ and $\tau^-$ and refer to the elements of $\tau^+$ as future-directed timelike vectors, whereas we refer to the elements of $\tau^-$ as past-directed. Then, given some orthonormal basis $\{e_\mu\}$, we have that either $v^0 < 0$ for all $v \in \tau^+$ (and $v^0 > 0$ for all $v \in \tau^-$) or else $v^0 > 0$ for all $v \in \tau^+$ (and $v^0 < 0$ for all $v \in \tau^-$). This follows immediately by considering the equivalence class of the timelike vector $e_0$.
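Theorem 2.9 lends itself to a quick numerical illustration (the editor's own sketch, not part of the thesis; the sample vectors are arbitrary): for a timelike $v$ and a timelike or nonzero null $w$, the signs of $v^0 w^0$ and $\eta(v, w)$ agree.

```python
# Sign agreement of v^0 w^0 and eta(v, w) for timelike/null vectors.

def eta_form(v, w):
    return v[0]*w[0] - v[1]*w[1] - v[2]*w[2] - v[3]*w[3]

v = [2.0, 1.0, 0.5, 0.0]     # timelike: eta(v, v) = 2.75 > 0
w = [1.0, 1.0, 0.0, 0.0]     # nonzero null: eta(w, w) = 0
assert eta_form(v, v) > 0 and eta_form(w, w) == 0
assert v[0]*w[0] > 0 and eta_form(v, w) > 0      # case (a) of theorem 2.9

u = [-3.0, 0.5, 0.0, 1.0]    # timelike with negative time component
assert eta_form(u, u) > 0
assert v[0]*u[0] < 0 and eta_form(v, u) < 0      # case (b) of theorem 2.9
```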
Clearly, both $\tau^+$ and $\tau^-$ are cones, which means that if $v, w \in \tau^\pm$ and $r > 0$, then $rv \in \tau^\pm$ and $v + w \in \tau^\pm$. Now that we have obtained the notion of past- and future-directed timelike vectors, we will define past- and future-directed null vectors. For this we need the following lemma.

Lemma 2.11 If $n \in M$ is a nonzero null vector, then $n \cdot v$ has the same sign for all $v \in \tau^+$.

Proof. Suppose that $v_1, v_2 \in \tau^+$ with $n \cdot v_1 < 0$ and $n \cdot v_2 > 0$. Define $v_1' := \frac{n \cdot v_2}{|n \cdot v_1|} v_1$. Because $n \cdot v_2 > 0$, it follows from the fact that $\tau^+$ is a cone that $v_1' \in \tau^+$, and we have $n \cdot v_1' = \frac{n \cdot v_2}{|n \cdot v_1|}\, n \cdot v_1 = -\,n \cdot v_2$. Thus $0 = n \cdot v_1' + n \cdot v_2 = n \cdot (v_1' + v_2)$. But again using the fact that $\tau^+$ is a cone, $v_1' + v_2 \in \tau^+$; in particular $v_1' + v_2$ is timelike. Since $n$ is nonzero and null, this contradicts corollary 2.10.

From this lemma it follows that the following definition makes sense.

Definition 2.12 Let $n \in M$ be a nonzero null vector. Then $n$ is called future-directed if $n \cdot v > 0$ for all $v \in \tau^+$ and past-directed if $n \cdot v < 0$ for all $v \in \tau^+$.

Proposition 2.13 Let $n_1, n_2 \in M$ be two nonzero null vectors. Then $n_1$ and $n_2$ have the same time orientation (i.e. both future-directed or both past-directed) if and only if $(n_1)^0$ has the same sign as $(n_2)^0$ relative to any orthonormal basis for $M$.

Proof. Suppose that $(n_1)^0$ and $(n_2)^0$ have the same sign with respect to any orthonormal basis. Choose an arbitrary orthonormal basis $\{e_\mu\}$ and let $\lambda \in \{-1, 1\}$ be such that $v := \lambda e_0 \in \tau^+$. Then the two inner products $v \cdot n_1 = \lambda (n_1)^0$ and $v \cdot n_2 = \lambda (n_2)^0$ have the same sign by assumption. By the previous lemma, this is not only true for $v$, but for all vectors in $\tau^+$. Thus, $n_1$ and $n_2$ have the same time orientation. For the converse statement, assume that $n_1$ and $n_2$ have the same time orientation and let $\{e_\mu\}$ be an orthonormal basis. Again we let $\lambda \in \{-1, 1\}$ be such that $\lambda e_0 \in \tau^+$.
Because $n_1$ and $n_2$ have the same time orientation, $\lambda e_0 \cdot n_1$ and $\lambda e_0 \cdot n_2$ have the same sign, which implies that $(n_1)^0$ and $(n_2)^0$ have the same sign.

Definition 2.14 In the affine space $M$ we define the future light cone at a point $x \in M$ by $C_x^+ = \{y \in C_x : y - x \text{ is future-directed}\}$ and the past light cone by $C_x^- = \{y \in C_x : y - x \text{ is past-directed}\}$.

Now that we have introduced past- and future-directed timelike and null vectors, we can interpret the orthochronous elements of $L$. This interpretation is given in the following theorem.

Theorem 2.15 Let $L \in L$ and let $\{e_\mu\}$ be an orthonormal basis for $M$. Then the following are equivalent:
(a) $L$ is orthochronous.
(b) $L$ preserves the time orientation of all nonzero null vectors.
(c) $L$ preserves the time orientation of all timelike vectors.

Before we can prove the theorem, we first need a little fact. Let $L \in L$. Substituting $\mu = \nu = 0$ in (2.6) gives $(L^0{}_0)^2 - \sum_{k=1}^3 (L^0{}_k)^2 = 1$, and so $(L^0{}_0)^2 > \sum_{k=1}^3 (L^0{}_k)^2$. Now let $v = v^\mu e_\mu \in M$ be either timelike or else null and nonzero, so $(v^0)^2 \geq \sum_{k=1}^3 (v^k)^2$ (note that $v^0 \neq 0$, since otherwise $v = 0$). Using these two inequalities and the Cauchy-Schwarz inequality for $\mathbb{R}^3$ (and the fact that $v^0 \neq 0$), we get
\[\left(\sum_{k=1}^3 L^0{}_k v^k\right)^2 \leq \left(\sum_{k=1}^3 (L^0{}_k)^2\right)\left(\sum_{k=1}^3 (v^k)^2\right) < (L^0{}_0)^2 (v^0)^2 = (L^0{}_0 v^0)^2.\]
We may rewrite this as
\[0 > \left(\sum_{k=1}^3 L^0{}_k v^k\right)^2 - (L^0{}_0 v^0)^2 = \left(\sum_{k=1}^3 L^0{}_k v^k - L^0{}_0 v^0\right)\left(\sum_{k=1}^3 L^0{}_k v^k + L^0{}_0 v^0\right) = -\left[\left(\sum_{k=0}^3 L^0{}_k e_k\right) \cdot v\right] L^0{}_\mu v^\mu = -(w \cdot v)(Lv)^0,\]
where we have defined the timelike vector $w = \sum_{k=0}^3 L^0{}_k e_k$. Thus $(w \cdot v)(Lv)^0 > 0$, so we conclude that $w \cdot v$ and $(Lv)^0$ have the same sign. To summarize: if $v \in M$ is either timelike or else null and nonzero and if $L \in L$, then $(Lv)^0$ and $w \cdot v$ (with $w$ the timelike vector defined above) have the same sign. We will use this fact to prove the theorem.

Proof. Let $v \in M$ again be a timelike or nonzero null vector and let $w$ be the timelike vector defined above. Assume $L^0{}_0 \geq 1$ ($L$ orthochronous).
We separate two cases. In case $v^0 > 0$ we have $w^0 v^0 = L^0{}_0 v^0 > 0$, so by theorem 2.9 we have $v \cdot w > 0$. Thus $(Lv)^0 > 0$, by the discussion above. In case $v^0 < 0$ we have $w^0 v^0 = L^0{}_0 v^0 < 0$, so by theorem 2.9 we have $v \cdot w < 0$. Thus $(Lv)^0 < 0$, by the discussion above. So we conclude that if $L^0{}_0 \geq 1$, then $v^0$ and $(Lv)^0$ always have the same sign; i.e. we have proved that (a) implies (b) and that (a) implies (c). Now assume $L^0{}_0 \leq -1$ ($L$ nonorthochronous). Again we separate two cases. In case $v^0 > 0$ we have $w^0 v^0 = L^0{}_0 v^0 < 0$, so by theorem 2.9 we have $v \cdot w < 0$. Thus $(Lv)^0 < 0$, by the discussion above. In case $v^0 < 0$ we have $w^0 v^0 = L^0{}_0 v^0 > 0$, so by theorem 2.9 we have $v \cdot w > 0$. Thus $(Lv)^0 > 0$, by the discussion above. So we conclude that if $L^0{}_0 \leq -1$, then $v^0$ and $(Lv)^0$ always have opposite signs; i.e. we have proved that (b) implies (a) and that (c) implies (a).

Corollary 2.16 If $L \in L$ is nonorthochronous, it reverses the time orientation of all timelike and nonzero null vectors.

It is now clear how to interpret the orthochronous elements of $L$: they are precisely those elements of $L$ that preserve the causal structure of Minkowski space. Now suppose that $L_1, L_2 \in L$ are both orthochronous. If $v \in M$ is a timelike or nonzero null vector that is future-directed (or past-directed), then by theorem 2.15 the vector $w = L_1 v$ is also a future-directed (or past-directed) timelike or nonzero null vector in $M$, and by the same argument, so is $L_2 w = L_2 L_1 v$. Thus, the element $L_2 L_1 \in L$ preserves the time orientation of all timelike and all nonzero null vectors. Using theorem 2.15 again, we conclude that $L_2 L_1 \in L$ is orthochronous. Furthermore, it is clear that the identity map $I : M \to M$ is orthochronous since $I^0{}_0 = 1$. Finally, if $L \in L$ were orthochronous while $L^{-1} \in L$ were nonorthochronous, then $I = L^{-1}L$ would reverse the time orientation of all timelike and all nonzero null vectors, which is absurd. This shows that $L^{-1}$ must in fact be orthochronous whenever $L$ is orthochronous.
Thus, the set of orthochronous elements of $L$ forms a subgroup $L^\uparrow$ of $L$, called the orthochronous Lorentz group. The intersection $L^\uparrow_+ := L_+ \cap L^\uparrow$ of the two subgroups $L_+$ and $L^\uparrow$ of $L$ is again a subgroup of $L$ and is called the restricted Lorentz group. We also define the subsets $L_-$ and $L^\downarrow$, consisting of the Lorentz transformations $L$ with $\det(L) = -1$ and $L^0{}_0 \leq -1$, respectively. Of course these subsets cannot be subgroups of $L$, since they do not contain the identity element.

The Lie group structure of the Lorentz group

The Lorentz group $L$ can be viewed as a subgroup of the matrix Lie group $GL(4, \mathbb{R})$, and it can in fact be shown that $L$ is itself a six-dimensional matrix Lie group. It has four connected components, namely $L^\uparrow_+$, $L^\downarrow_+ := L_+ \cap L^\downarrow$, $L^\uparrow_- := L_- \cap L^\uparrow$ and $L^\downarrow_- := L_- \cap L^\downarrow$. Typical elements of $L^\uparrow_-$, $L^\downarrow_-$ and $L^\downarrow_+$ are the space inversion $I_s$, the time reversal $I_t$ and the spacetime inversion $I_{st} = I_s I_t$, respectively, where $I_s$ is defined by $I_s(x^0, x^1, x^2, x^3) = (x^0, -x^1, -x^2, -x^3)$ and $I_t$ is defined by $I_t(x^0, x^1, x^2, x^3) = (-x^0, x^1, x^2, x^3)$. The transformation $I_s$ defines a bijection $L^\uparrow_- \to L^\uparrow_+$ by $L \mapsto I_s L$. Similarly, $I_t$ defines a bijection $L^\downarrow_- \to L^\uparrow_+$ and $I_{st}$ defines a bijection $L^\downarrow_+ \to L^\uparrow_+$. In other words, the orthochronous Lorentz group $L^\uparrow = L^\uparrow_+ \cup L^\uparrow_-$ is generated by $L^\uparrow_+ \cup \{I_s\}$ and the proper Lorentz group $L_+ = L^\uparrow_+ \cup L^\downarrow_+$ is generated by $L^\uparrow_+ \cup \{I_{st}\}$. It also follows that the subgroup $L_0 := L^\uparrow_+ \cup L^\downarrow_-$, called the orthochorous Lorentz group, is generated by $L^\uparrow_+ \cup \{I_t\}$.

The Lie algebra $\mathfrak{l}$ of the Lorentz group $L$ is by definition the set of all transformations $X : M \to M$ such that $e^{tX} \in L$ for all $t \in \mathbb{R}$ or, in terms of matrices, the set of all $4 \times 4$-matrices $X$ such that $[\eta] = (e^{tX})^T[\eta]e^{tX}$, which is equivalent to $(e^{tX})^{-1} = [\eta]^{-1}(e^{tX})^T[\eta] = [\eta](e^{tX})^T[\eta]$.
But $(e^{tX})^T = e^{tX^T}$ and $(e^{tX})^{-1} = e^{-tX}$, so a $4 \times 4$-matrix $X$ is in $\mathfrak{l}$ if and only if
\[e^{-tX} = [\eta]e^{tX^T}[\eta] = e^{t[\eta]X^T[\eta]},\]
where we have used that for each $4 \times 4$-matrix $M$ and each $4 \times 4$-matrix $G$ satisfying $G^2 = I$ we have
\[e^{tGMG} = \sum_{k=0}^\infty \frac{t^k (GMG)^k}{k!} = \sum_{k=0}^\infty \frac{t^k GM^kG}{k!} = G\left(\sum_{k=0}^\infty \frac{t^k M^k}{k!}\right)G = Ge^{tM}G.\]
Thus, $X \in \mathfrak{l}$ if and only if $[\eta]X^T[\eta] = -X$, or $X^T[\eta] + [\eta]X = 0$. We have thus found that
\[\mathfrak{l} = \{X \in M_4(\mathbb{R}) : X^T[\eta] + [\eta]X = 0\}.\]
Now let $\{e_{\mu\nu}\}$ be the standard basis of $M_4(\mathbb{R})$ (i.e. $(e_{\mu\nu})_{\rho\sigma} = 1$ if $(\rho, \sigma) = (\mu, \nu)$ and $(e_{\mu\nu})_{\rho\sigma} = 0$ for other values of $\rho$ and $\sigma$) and define for $j, k = 1, 2, 3$ the matrices
\[X_{jk} = e_{kj} - e_{jk}, \qquad X_{k0} = -X_{0k} = e_{k0} + e_{0k}.\]
The basis matrices $e_{\mu\nu}$ of $M_4(\mathbb{R})$ cannot be defined covariantly with the lower indices $\mu$ and $\nu$ acting as Lorentz indices, by which we mean that for $L \in L$ the matrices $e'_{\rho\sigma} := L^\mu{}_\rho L^\nu{}_\sigma e_{\mu\nu}$ are not the same as the matrices $e_{\mu\nu}$ defined above (the $e_{\mu\nu}$ are given by $(e_{\mu\nu})_{\alpha\beta} = \delta_{\mu\alpha}\delta_{\nu\beta}$, where $\delta_{\mu\nu}$ is the non-covariant Kronecker delta with lower indices; note that it would in fact be possible to define a standard basis $\{e_\mu{}^\nu\}$ by using the covariant Kronecker deltas $\delta^\mu_\nu$). However, if we define $X_{00} = 0$, then the matrices $X_{00}$, $X_{k0}$ and $X_{jk}$ can be given by the covariant expression
\[(X_{\mu\nu})^\alpha{}_\beta = \delta^\alpha_\mu \eta_{\nu\beta} - \delta^\alpha_\nu \eta_{\mu\beta},\]
which is antisymmetric in $\mu$ and $\nu$. The matrices $X_{\mu\nu}$ satisfy the commutation relations
\[[X_{\mu\nu}, X_{\rho\sigma}] = \eta_{\mu\rho}X_{\sigma\nu} + \eta_{\nu\rho}X_{\mu\sigma} + \eta_{\nu\sigma}X_{\rho\mu} + \eta_{\mu\sigma}X_{\nu\rho}, \tag{2.7}\]
which follow easily from writing out $[X_{\mu\nu}, X_{\rho\sigma}]^\alpha{}_\beta$ using the covariant expression for $(X_{\mu\nu})^\alpha{}_\beta$ above. In other words, we have $[X_{\mu\nu}, X_{\rho\sigma}] = 0$ whenever the sets $\{\mu, \nu\}$ and $\{\rho, \sigma\}$ are either equal or disjoint, and we have $[X_{\mu\nu}, X_{\nu\sigma}] = \eta_{\nu\nu}X_{\mu\sigma}$ if $\mu \neq \nu$ and $\nu \neq \sigma$ (no sum over $\nu$). The matrices $\{X_{\mu\nu}\}_{\mu<\nu}$ form a basis of $\mathfrak{l}$, and their commutation relations are obtained from (2.7) by replacing $X_{\kappa\lambda} \to -X_{\lambda\kappa}$ on the right-hand side whenever $\kappa > \lambda$. We mention furthermore that the elements of the form $e^{tX_{\mu\nu}}$ generate $L^\uparrow_+$. We will now focus on the restricted Lorentz group $L^\uparrow_+$. The restricted Lorentz group is a connected six-dimensional Lie group that is not simply connected, i.e. not every closed path in this group can be contracted continuously to a point.
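The covariant expression for the generators and the bracket relation (2.7) can be checked numerically. The following sketch is the editor's own (not part of the thesis; all names invented): it builds $(X_{\mu\nu})^\alpha{}_\beta = \delta^\alpha_\mu\eta_{\nu\beta} - \delta^\alpha_\nu\eta_{\mu\beta}$, verifies $X^T[\eta] + [\eta]X = 0$, and checks the instance $[X_{01}, X_{12}] = \eta_{11}X_{02} = -X_{02}$.

```python
# Generators X_{mu nu} of the Lorentz Lie algebra and their bracket relations.

ETA = [1, -1, -1, -1]  # diagonal components eta_{mu mu}

def X(mu, nu):
    """Matrix of X_{mu nu}: (X_{mu nu})^a_b = d^a_mu eta_{nu b} - d^a_nu eta_{mu b}."""
    return [[(1 if a == mu else 0) * (ETA[nu] if b == nu else 0)
             - (1 if a == nu else 0) * (ETA[mu] if b == mu else 0)
             for b in range(4)] for a in range(4)]

def in_lorentz_algebra(M):
    """Entrywise check of (X^T eta)_{ab} + (eta X)_{ab} = 0 (eta diagonal)."""
    return all(M[b][a] * ETA[b] + ETA[a] * M[a][b] == 0
               for a in range(4) for b in range(4))

def mm4(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def bracket(A, B):
    AB, BA = mm4(A, B), mm4(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(4)] for i in range(4)]

assert all(in_lorentz_algebra(X(m, n)) for m in range(4) for n in range(4))
minus_X02 = [[-e for e in row] for row in X(0, 2)]
assert bracket(X(0, 1), X(1, 2)) == minus_X02   # eta_{11} X_{02} = -X_{02}
```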
According to the theory of Lie groups, there exists for each connected Lie group $G$ a simply connected Lie group $\tilde{G}$ together with a Lie group homomorphism $\Phi : \tilde{G} \to G$ (a smooth map of Lie groups that is also a group homomorphism) such that the associated Lie algebra homomorphism $\phi : \tilde{\mathfrak{g}} \to \mathfrak{g}$ (the unique map $\phi : \tilde{\mathfrak{g}} \to \mathfrak{g}$ satisfying $\Phi(e^X) = e^{\phi(X)}$ for all $X \in \tilde{\mathfrak{g}}$) is a Lie algebra isomorphism. The Lie group $\tilde{G}$ is called a universal covering group of $G$ and the homomorphism $\Phi$ is called the covering homomorphism. The universal covering group is unique in the following sense: if $(\tilde{G}_1, \Phi_1)$ and $(\tilde{G}_2, \Phi_2)$ are universal covers of $G$, then there exists a Lie group isomorphism $\Psi : \tilde{G}_1 \to \tilde{G}_2$ such that $\Phi_2 \circ \Psi = \Phi_1$. The universal covering group $\tilde{L}^\uparrow_+$ of $L^\uparrow_+$ is $SL(2, \mathbb{C})$, the group of all $2 \times 2$ complex matrices with unit determinant. The covering homomorphism $\Phi : SL(2, \mathbb{C}) \to L^\uparrow_+$ can be obtained as follows. Let $H(2, \mathbb{C})$ be the set of $2 \times 2$ complex Hermitian matrices and, given an orthonormal basis $\{e_\mu\}$ of $M$, define the $\mathbb{R}$-linear isomorphism $\psi : M \to H(2, \mathbb{C})$ by
\[\psi(x) = \sum_{\mu=0}^3 x^\mu \sigma^\mu = \begin{pmatrix} x^0 + x^3 & x^1 - ix^2 \\ x^1 + ix^2 & x^0 - x^3 \end{pmatrix},\]
where $\sigma^0$ is the $2 \times 2$ identity matrix and the $\sigma^j$ with $j = 1, 2, 3$ are the Pauli matrices
\[\sigma^1 = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \qquad \sigma^2 = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \qquad \sigma^3 = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}.\]
Note that this definition of $\psi$ depends on the choice of the orthonormal basis $\{e_\mu\}$ of $M$; furthermore, once this choice of basis has been made (so that $\psi$ is determined), we cannot use the same formula to compute $\psi(x)$ with respect to another orthonormal basis, because the $\sigma^\mu$ are not supposed to transform in any way. The important property of $\psi$ is that
\[\det(\psi(x)) = (x^0)^2 - \sum_{k=1}^3 (x^k)^2 = x \cdot x.\]
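This determinant property is easy to verify numerically (an illustrative check by the editor, not part of the thesis; the helper names are invented):

```python
# Check det(psi(x)) = (x^0)^2 - (x^1)^2 - (x^2)^2 - (x^3)^2 and that
# psi(x) is Hermitian, for psi(x) = sum_mu x^mu sigma^mu.

def psi(x):
    return [[x[0] + x[3], x[1] - 1j*x[2]],
            [x[1] + 1j*x[2], x[0] - x[3]]]

def det2(M):
    return M[0][0]*M[1][1] - M[0][1]*M[1][0]

x = [2.0, 1.0, -0.5, 0.25]
norm = x[0]**2 - x[1]**2 - x[2]**2 - x[3]**2
assert abs(det2(psi(x)) - norm) < 1e-12

M = psi(x)
assert M[0][1].conjugate() == M[1][0]            # off-diagonal Hermiticity
assert M[0][0].imag == 0.0 == M[1][1].imag       # real diagonal
```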
The proof of the existence and uniqueness of the map $\phi$ mentioned above (and that such a map is a Lie algebra homomorphism, i.e. $\phi([X, Y]_{\tilde{\mathfrak{g}}}) = [\phi(X), \phi(Y)]_{\mathfrak{g}}$ for all $X, Y \in \tilde{\mathfrak{g}}$) can be found in [22], theorem 2.21. If $A \in SL(2, \mathbb{C})$ and $X \in H(2, \mathbb{C})$, then $(AXA^*)^* = (A^*)^*X^*A^* = AXA^*$, so $AXA^* \in H(2, \mathbb{C})$. Also, $\det(AXA^*) = \det(A)\det(X)\det(A^*) = \det(X)$, since for all $A \in SL(2, \mathbb{C})$ we have $\det(A) = \det(A^*) = 1$. Thus, each element $A \in SL(2, \mathbb{C})$ defines a map $\Psi_A : H(2, \mathbb{C}) \to H(2, \mathbb{C})$, $\Psi_A(X) = AXA^*$, that preserves the determinant. Under the correspondence $\psi : M \to H(2, \mathbb{C})$, this determinant-preserving map on $H(2, \mathbb{C})$ corresponds to a norm-preserving linear map $\Phi(A) := \psi^{-1} \circ \Psi_A \circ \psi : M \to M$ given by
\[\Phi(A)x = (\psi^{-1} \circ \Psi_A \circ \psi)(x^\mu e_\mu) = (\psi^{-1} \circ \Psi_A)\left(\sum_{\mu=0}^3 x^\mu \sigma^\mu\right) = \psi^{-1}\left(A\sum_{\mu=0}^3 x^\mu \sigma^\mu A^*\right) = \psi^{-1}\left(\sum_{\mu=0}^3 x^\mu A\sigma^\mu A^*\right) = \frac{1}{2}\sum_{\mu=0}^3\sum_{\nu=0}^3 x^\mu\,\mathrm{Tr}(A\sigma^\mu A^*\sigma^\nu)\,e_\nu,\]
where we have used that the inverse of the $\mathbb{R}$-linear isomorphism $\psi : M \to H(2, \mathbb{C})$ is given by $\psi^{-1} : X \mapsto \frac{1}{2}\sum_{\mu=0}^3 \mathrm{Tr}(X\sigma^\mu)\,e_\mu$. Thus we obtain a map $\Phi : SL(2, \mathbb{C}) \to L$, where
\[\Phi(A)^\mu{}_\nu = \frac{1}{2}\mathrm{Tr}(A\sigma^\nu A^*\sigma^\mu).\]
Note that the different index placement on the two sides reflects the fact that the map $\Phi$ is not defined in a covariant way. We can rewrite the equation $\Phi(A)x = (\psi^{-1} \circ \Psi_A \circ \psi)(x) = \psi^{-1}(A\psi(x)A^*)$ as $\psi(\Phi(A)x) = A\psi(x)A^*$. Using this equation, we find that for $A, B \in SL(2, \mathbb{C})$ and each $x \in M$ we have
\[\psi(\Phi(AB)x) = AB\psi(x)B^*A^* = A\psi(\Phi(B)x)A^* = \psi(\Phi(A)\Phi(B)x).\]
Using the invertibility of $\psi$, we conclude that for each $x \in M$ we have $\Phi(AB)x = \Phi(A)\Phi(B)x$, and thus that $\Phi(AB) = \Phi(A)\Phi(B)$. So $\Phi : SL(2, \mathbb{C}) \to L$ is a group homomorphism. It is also clear from the formula for $\Phi(A)^\mu{}_\nu$ that this map is smooth, so $\Phi$ is in fact a Lie group homomorphism. In particular, since $\Phi$ is continuous and $SL(2, \mathbb{C})$ is a connected (even simply connected) Lie group, the image $\Phi(SL(2, \mathbb{C})) \subset L$ must be connected.
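The formula $\Phi(A)^\mu{}_\nu = \frac{1}{2}\mathrm{Tr}(A\sigma^\nu A^*\sigma^\mu)$ invites a direct numerical check. In the sketch below (the editor's own construction, not from the thesis), the diagonal matrix $A = \mathrm{diag}(e^{t/2}, e^{-t/2}) \in SL(2, \mathbb{C})$ is sent to a boost with rapidity $t$ in the $x^3$-direction, and $\Phi(-A) = \Phi(A)$ exhibits the two-to-one property mentioned below.

```python
import cmath
import math

SIGMA = [
    [[1, 0], [0, 1]],      # sigma^0 (identity)
    [[0, 1], [1, 0]],      # sigma^1
    [[0, -1j], [1j, 0]],   # sigma^2
    [[1, 0], [0, -1]],     # sigma^3
]

def mm(A, B):
    """2x2 complex matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(A):
    """Conjugate transpose."""
    return [[complex(A[j][i]).conjugate() for j in range(2)] for i in range(2)]

def Phi(A):
    """Components Phi(A)^mu_nu = (1/2) Tr(A sigma^nu A* sigma^mu)."""
    Ad = dagger(A)
    out = []
    for mu in range(4):
        row = []
        for nu in range(4):
            P = mm(mm(A, SIGMA[nu]), mm(Ad, SIGMA[mu]))
            row.append(0.5 * (P[0][0] + P[1][1]).real)
        out.append(row)
    return out

t = 0.4
A = [[cmath.exp(t / 2), 0], [0, cmath.exp(-t / 2)]]
L = Phi(A)
assert abs(L[0][0] - math.cosh(t)) < 1e-12 and abs(L[0][3] - math.sinh(t)) < 1e-12
assert abs(L[1][1] - 1.0) < 1e-12 and abs(L[2][2] - 1.0) < 1e-12
minus_A = [[-A[i][j] for j in range(2)] for i in range(2)]
assert all(abs(Phi(minus_A)[m][n] - L[m][n]) < 1e-12
           for m in range(4) for n in range(4))
```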
Because the identity of $L$ is contained in this image, the image must lie in the connected component of the identity, i.e. $\Phi(SL(2, \mathbb{C})) \subset L^\uparrow_+$. The Lie group homomorphism $\Phi : SL(2, \mathbb{C}) \to L^\uparrow_+$ induces a homomorphism $\phi : \mathfrak{sl}(2, \mathbb{C}) \to \mathfrak{l}$ of the associated Lie algebras. Note that because $L^\uparrow_+$ is the connected component of the identity in $L$, the Lie algebra associated to $L^\uparrow_+$ coincides with the Lie algebra $\mathfrak{l}$ associated with $L$. To see what $\phi$ is, we first need some information about $\mathfrak{sl}(2, \mathbb{C})$. The Lie algebra $\mathfrak{sl}(2, \mathbb{C})$ consists of all complex $2 \times 2$-matrices with zero trace. It can be viewed as a three-dimensional complex Lie algebra, but for our purposes it is more convenient to consider it as a six-dimensional real Lie algebra. A basis of $\mathfrak{sl}(2, \mathbb{C})$ is given by the six matrices $\{\frac{1}{2}\sigma_j, \frac{1}{2i}\sigma_j\}_{j=1,2,3}$, where the $\sigma_j$ denote the Pauli matrices. Note that the $\frac{1}{2i}\sigma_j$ span the Lie algebra $\mathfrak{su}(2) \subset \mathfrak{sl}(2, \mathbb{C})$; they satisfy $[\frac{1}{2i}\sigma_j, \frac{1}{2i}\sigma_k] = \frac{1}{2i}\sigma_l$, where $(j, k, l)$ is a cyclic permutation of $(1, 2, 3)$. In terms of this basis for $\mathfrak{sl}(2, \mathbb{C})$, the Lie algebra homomorphism $\phi : \mathfrak{sl}(2, \mathbb{C}) \to \mathfrak{l}$ is given by
\[\phi\left(\tfrac{1}{2i}\sigma_j\right) = X_{kl} \quad \text{for } (j, k, l) \text{ a cyclic permutation of } (1, 2, 3), \qquad \phi\left(\tfrac{1}{2}\sigma_k\right) = X_{0k}.\]
Since $\phi$ maps a basis of $\mathfrak{sl}(2, \mathbb{C})$ onto a basis of $\mathfrak{l}$, it is clear that $\phi : \mathfrak{sl}(2, \mathbb{C}) \to \mathfrak{l}$ is an isomorphism of Lie algebras, and by the definition of $\phi$ we have
\[\Phi\big(e^{\frac{t}{2i}\sigma_j}\big) = e^{t\phi(\frac{1}{2i}\sigma_j)} = e^{tX_{kl}}, \qquad \Phi\big(e^{\frac{t}{2}\sigma_k}\big) = e^{t\phi(\frac{1}{2}\sigma_k)} = e^{tX_{0k}}, \tag{2.8}\]
where in the first expression $(j, k, l)$ is a cyclic permutation of $(1, 2, 3)$. Because the elements $e^{tX_{jk}}$ and $e^{tX_{0k}}$ on the right-hand sides generate $L^\uparrow_+$, it follows immediately that $\Phi$ is surjective. Thus, $\Phi : SL(2, \mathbb{C}) \to L^\uparrow_+$ has all the right properties of the universal covering map. We note furthermore that the map $\Phi : SL(2, \mathbb{C}) \to L^\uparrow_+$ is two-to-one: for each $L \in L^\uparrow_+$ the inverse image $\Phi^{-1}(L)$ is a set of the form $\{A, -A\}$.
The Poincaré group

So far we have given the definition of Minkowski spacetime as a 4-dimensional affine space, and we have studied the group of transformations $L : M \to M$ with $\eta(Lv, Lw) = \eta(v, w)$, where $v$ and $w$ are elements of the vector space (and not the affine space) $M$. However, we are actually interested in transformations $P : M \to M$ of the affine space $M$ that satisfy $\eta(Px - Py, Px - Py) = \eta(x - y, x - y)$ for all $x$ and $y$ in the affine space. Such transformations are called Poincaré transformations. In order to formulate the general form of a Poincaré transformation, note that if we choose a fixed point $x_0 \in M$ in the affine space, then we can write any $x$ in the affine space as $x = x_0 + (x - x_0)$, where $x - x_0$ lies in the vector space $M$. Once we have agreed on such a point $x_0$, we can actually identify the affine space $M$ with the vector space $M$ by identifying a point $x$ in the affine space with the point $x - x_0$ in the vector space; note that the point $x_0$ in the affine space is then identified with the origin of the vector space. With this identification, a general Poincaré transformation can be written as $P_{a,L}(x) = Lx + a$, with $L$ a Lorentz transformation and $a$ an element of the vector space $M$. If we take $L$ to be the identity map $1 \in L$, we obtain the map $T_a(x) := P_{a,1}(x) = x + a$, which is a spacetime translation. If we take $a$ to be the zero vector, we obtain the map $P_{0,L}(x) = Lx$, which is a Lorentz transformation. A general Poincaré transformation can thus be written as the composition of a Lorentz transformation and a spacetime translation: $P_{a,L}(x) = Lx + a = (T_a \circ L)(x)$. From now on we will always write $(T_a, L)$, or simply $(a, L)$, to denote the Poincaré transformation $P_{a,L}$.
The composition of two Poincaré transformations $(a_1, L_1)$ and $(a_2, L_2)$ is again a Poincaré transformation, and its action is given by
\[((a_1, L_1) \circ (a_2, L_2))x = (a_1, L_1)(L_2x + a_2) = L_1(L_2x + a_2) + a_1 = L_1L_2x + (L_1a_2 + a_1),\]
so we have found the rule $(a_1, L_1) \circ (a_2, L_2) = (a_1 + L_1a_2, L_1L_2)$. In particular, we have for any Poincaré transformation $(a, L)$
\[(0, 1) \circ (a, L) = (a, L) \circ (0, 1) = (a, L),\]
where $1 \in L$ is the identity map. Furthermore, for any isometry $(a, L)$ we have $(a, L) \circ (-L^{-1}a, L^{-1}) = (-L^{-1}a, L^{-1}) \circ (a, L) = (0, 1)$. This shows that the set of Poincaré transformations forms a group under composition of maps, with multiplication given by $(a_1, L_1)(a_2, L_2) = (a_1 + L_1a_2, L_1L_2)$, unit element $(0, 1)$ and inverse $(a, L)^{-1} = (-L^{-1}a, L^{-1})$. This group is called the Poincaré group and is denoted by $P$. From the considerations above, the group $P$ is a semidirect product of the additive group $\mathbb{R}^4$ and the Lorentz group $L$. We obtain the subgroups $P^\uparrow_+$, $P_+$ and $P^\uparrow$ of $P$ by demanding that the Lorentz transformation $L$ in $(a, L) \in P$ lie in $L^\uparrow_+$, $L_+$ or $L^\uparrow$, respectively. The subgroups $P^\uparrow_+$, $P_+$ and $P^\uparrow$ are called the restricted Poincaré group, the proper Poincaré group and the orthochronous Poincaré group, respectively. In a similar way we also define the subsets $P^\downarrow_+$, $P^\uparrow_-$ and $P^\downarrow_-$ by demanding that $L$ lie in $L^\downarrow_+$, $L^\uparrow_-$ or $L^\downarrow_-$. The Poincaré group $P$ is a ten-dimensional real Lie group with connected components $P^\uparrow_+$, $P^\downarrow_+$, $P^\uparrow_-$ and $P^\downarrow_-$. The Lie algebra $\mathfrak{p}$ of the Poincaré group contains the Lie algebra $\mathfrak{l}$ of the Lorentz group as a Lie subalgebra. The Lie algebra $\mathfrak{p}$ is spanned by the basis elements $\{X_{\mu\nu}\}_{\mu<\nu}$ of $\mathfrak{l}$ together with four elements $\{Y_\mu\}$ with $\mu = 0, 1, 2, 3$. The Lie bracket in $\mathfrak{p}$ is given in terms of these basis elements $X_{\mu\nu}$ and $Y_\mu$ by
\[[X_{\mu\nu}, X_{\rho\sigma}] = \eta_{\mu\rho}X_{\sigma\nu} + \eta_{\nu\rho}X_{\mu\sigma} + \eta_{\nu\sigma}X_{\rho\mu} + \eta_{\mu\sigma}X_{\nu\rho}, \qquad [X_{\mu\nu}, Y_\rho] = -(\eta_{\nu\rho}Y_\mu - \eta_{\mu\rho}Y_\nu), \qquad [Y_\mu, Y_\nu] = 0.\]
Because the subgroup $P^\uparrow_+$ of $P$ is connected, it has a universal covering group $\tilde{P}^\uparrow_+$.
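The composition rule just derived can be checked directly in code (an illustrative sketch by the editor, not part of the thesis; the pair representation and names are invented): acting with the composite $(a_1 + L_1a_2, L_1L_2)$ gives the same result as acting with $(a_2, L_2)$ and then $(a_1, L_1)$.

```python
# Poincare elements as pairs (a, L) acting by x -> L x + a.

def apply_L(L, x):
    return [sum(L[m][n] * x[n] for n in range(4)) for m in range(4)]

def act(p, x):
    a, L = p
    return [c + ai for c, ai in zip(apply_L(L, x), a)]

def compose(p1, p2):
    """(a1, L1)(a2, L2) = (a1 + L1 a2, L1 L2)."""
    (a1, L1), (a2, L2) = p1, p2
    return ([u + v for u, v in zip(a1, apply_L(L1, a2))],
            [[sum(L1[i][k] * L2[k][j] for k in range(4)) for j in range(4)]
             for i in range(4)])

I4 = [[float(i == j) for j in range(4)] for i in range(4)]
p1 = ([1.0, 0.0, 0.0, 2.0], I4)                                  # translation
p2 = ([0.0, -1.0, 0.0, 0.0],
      [[1, 0, 0, 0], [0, 0, -1, 0], [0, 1, 0, 0], [0, 0, 0, 1]])  # rotation + shift
x = [0.5, 1.0, 2.0, -1.0]
assert act(compose(p1, p2), x) == act(p1, act(p2, x))
```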
This universal covering group P̃+↑ consists of all pairs (a, A), where a ∈ M and A ∈ SL(2, C). The multiplication in P̃+↑ is given by

$$(a_1, A_1)(a_2, A_2) = (a_1 + \Phi(A_1)a_2, A_1 A_2),$$

where Φ denotes the covering homomorphism Φ : SL(2, C) → L+↑. The covering homomorphism Π : P̃+↑ → P+↑ is given by Π((a, A)) = (a, Φ(A)), so that Π((a, A)) acts on any x ∈ M as

$$\Pi((a, A))x = \Phi(A)x + a = \frac{1}{2}\sum_{\mu,\nu} \mathrm{Tr}(A\sigma^\nu A^*\sigma^\mu)\, x_\mu e_\nu + a.$$

Physical interpretation of the Poincaré transformations

At the beginning of this section on special relativity we gave the explicit form of some important coordinate transformations, namely spacetime translations, spatial rotations (around the x3-axis) and Lorentz boosts (in the x1-direction). We will now relate these coordinate transformations to the Poincaré transformations (a, L). It is clear that the spacetime translation in (2.1) can be written as x′ = (−a, 1)x. To rewrite the spatial rotation in (2.2), note that

$$e^{tX_{12}} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & \cos(t) & -\sin(t) & 0 \\ 0 & \sin(t) & \cos(t) & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

We can thus write (2.2) as x′′ = (0, e^{−θX₁₂})x. More generally, if we define X = (X23, X31, X12), then a rotation of the coordinate axes of observer O′′ over an angle θ around the unit vector θ̂ (according to the right-hand rule) with respect to the axes of observer O gives the transformation rule x′′ = (0, e^{−θθ̂·X})x = (0, e^{−θ·X})x, where θ = θθ̂. Finally, because

$$e^{tX_{01}} = \begin{pmatrix} \cosh(t) & -\sinh(t) & 0 & 0 \\ -\sinh(t) & \cosh(t) & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix},$$

we can write (2.3) as x′′′ = (0, e^{φᵥX₀₁})x, where sinh(φv) = γ(v)v and cosh(φv) = γ(v), i.e.

$$\varphi_v = \tfrac{1}{2}\ln\frac{1+v}{1-v}.$$

We call φv the boost parameter corresponding to the speed v. More generally, if observer O′′′ moves with velocity v with respect to O and if we define X̃ = (X01, X02, X03), then the coordinate transformation is given by x′′′ = (0, e^{φ|v| v̂·X̃})x = (0, e^{φᵥ·X̃})x, where we have defined φv = φ|v| v̂.
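The identities sinh(φv) = γ(v)v and cosh(φv) = γ(v) for the boost parameter, and the fact that collinear boost parameters add (unlike velocities), can be checked directly. A small numerical sketch (the velocity values are arbitrary, with units c = 1):

```python
# Check: phi_v = (1/2) ln((1+v)/(1-v)) satisfies sinh(phi_v) = gamma(v) v
# and cosh(phi_v) = gamma(v), and boost parameters add under composition
# of collinear boosts.
import math

def rapidity(v):
    """Boost parameter for speed v (|v| < 1, units with c = 1)."""
    return 0.5 * math.log((1 + v) / (1 - v))

for v in (0.1, 0.5, 0.9, 0.99):
    gamma = 1.0 / math.sqrt(1 - v * v)
    phi = rapidity(v)
    assert math.isclose(math.sinh(phi), gamma * v)
    assert math.isclose(math.cosh(phi), gamma)

# Collinear boost parameters add, while velocities combine nonlinearly:
v1, v2 = 0.5, 0.8
v_composed = (v1 + v2) / (1 + v1 * v2)   # relativistic velocity addition
assert math.isclose(rapidity(v1) + rapidity(v2), rapidity(v_composed))
```

The last assertion is the additivity e^{φ₁X₀₁} e^{φ₂X₀₁} = e^{(φ₁+φ₂)X₀₁}, which is one reason the boost parameter is the natural coordinate on the one-parameter boost subgroup.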
Earlier we mentioned that the elements of the form e^{tXμν} generate the restricted Lorentz group L+↑, so the elements of the form (a, e^{tXμν}) generate the restricted Poincaré group P+↑. In other words, P+↑ is generated by translations, spatial rotations and Lorentz boosts. The restricted Poincaré group P+↑ is therefore precisely the group of transformations that relates different inertial frames. The physically important group is thus P+↑, rather than the entire Poincaré group P.

2.2 Quantum theory

In this section we will describe quantum theory. In the first subsection we introduce the definitions of states and observables and some of their properties. In the second subsection we formulate the general mathematical structure of quantum theory. In the third subsection we discuss how symmetries are described in the mathematical framework of quantum theory. In the fourth subsection we will consider a particular form of symmetry, namely relativistic symmetry; this automatically leads to the theory of unitary representations of the universal covering group of the Poincaré group. The irreducible representations will lead naturally to a mathematical definition of a particle. Finally, in the fifth subsection we will describe the state spaces associated with many-particle states.

2.2.1 States and observables

As stated at the beginning of the previous section on special relativity⁸, if we repeat an experiment N times, and if N(B) denotes the number of times that we find a measured value in the Borel subset B ⊂ Rⁿ, then it is an empirical fact that for any such B the fraction N(B)/N approaches some definite value as N becomes large enough. Therefore, we assume that for any measured object α and for any physical quantity A there exists some theoretical probability

$$P_\alpha^A(B) := \lim_{N\to\infty} \frac{N(B)}{N}$$

that the measured value M(α, A) lies in the Borel set B ⊂ R^{n_A}.
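The empirical stabilization of the fractions N(B)/N can be illustrated with a toy simulation; the distribution below is an arbitrary stand-in for a measurement process, not any specific experiment from the text:

```python
# Toy illustration: relative frequencies N(B)/N stabilize as N grows.
# The "measured value" is drawn from an arbitrary fixed distribution.
import random

random.seed(0)

def measure():
    return random.gauss(0.0, 1.0)   # stand-in for one measurement outcome

def frequency(N, B=(0.0, 1.0)):
    """Fraction of N repetitions whose outcome lands in the interval B."""
    hits = sum(1 for _ in range(N) if B[0] <= measure() < B[1])
    return hits / N

# For a standard normal, P(0 <= Z < 1) is about 0.3413; with N = 100000
# the empirical fraction is already close to this limiting value.
f_large = frequency(100_000)
assert abs(f_large - 0.3413) < 0.01
```

The fluctuations of N(B)/N around the limiting value shrink roughly like 1/√N, which is the statistical content behind treating P_α^A(B) as a well-defined limit.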
Note that if we write α(x) and A(y) to symbolically denote the spacetime components of all parts of the measured object and of all parts of the measuring apparatuses, then according to the principle of special relativity these probabilities satisfy

$$P_{\alpha(x')}^{A(y')}(B) = P_{\alpha(x)}^{A(y)}(B), \qquad (2.9)$$

where x′ and y′ denote spacetime components with respect to another inertial observer that are numerically equal to x and y.

States and observables in the physical world

If for two measured objects α and β we have that P_α^A(B) = P_β^A(B) for all physical quantities A and all Borel sets B, then the measured objects α and β cannot be distinguished by any experiment. This defines an equivalence relation on the set of all measured objects, the corresponding equivalence classes of which are called states. With abuse of notation,⁸ we will denote the equivalence class (i.e. the state) of a measured object α also by α, and for the probabilities we correspondingly write P_α^A(B), where α now denotes the state rather than a measured object. If for two physical quantities A and B we have that P_α^A(B′) = P_α^B(B′) for all states α and all Borel sets B′, then the physical quantities A and B cannot be distinguished by any experiment. This defines an equivalence relation on the set of physical quantities, and the corresponding equivalence classes are called observables. We will use the same letter for a physical quantity as for its equivalence class (i.e. the observable) and we use the notation P_α^A(B), where α is a state and A is an observable. It is clear from the definition of a state that a state α is completely characterized by the set of probabilities {P_α^A(B)}_{A,B}, where A runs over all possible observables. Now we would like to define the notion of simultaneously measurable observables.

⁸ Like our discussion at the beginning of the previous section, the present discussion is also largely inspired by the first chapter of [1].
For any observable A, we can define for each Borel-measurable function f : R^{n_A} → R^m an observable f(A); namely, if some kind of measuring process results in a measured value in the Borel subset B ⊂ R^{n_A} for the observable A, then the Borel subset f(B) ⊂ R^m represents the measured value for the observable f(A). We say that the observables A1, …, Am are simultaneously measurable in state α if there exists an observable B and functions f_j : R^{n_B} → R^{n_{A_j}} such that for all j = 1, …, m the observables A_j are indistinguishable from f_j(B) in state α, i.e. if P_α^{A_j}(B′) = P_α^{f_j(B)}(B′) for all Borel sets B′. Note that for such a set {A1, …, Am} of simultaneously measurable observables in state α we can now define the observable g(A1, …, Am) for state α for arbitrary measurable functions g : R^{n_{A_1}+…+n_{A_m}} → R^k by g(A1, …, Am) = g(f1(B), …, fm(B)). In particular, we can define sums and products of such observables to obtain new observables.

States and observables in physical theories

In physical theories, the states {α} and observables {A} defined above are represented by certain mathematical objects {α̂} and {Â}, which are also called states and observables, respectively. For each inertial observer there are bijective correspondences

Ts : α ↦ α̂ and To : A ↦ Â,

and for each inertial observer these correspondences are defined in identical ways in terms of the coordinate system. It seems reasonable to suspect that the mathematical objects corresponding to the states and observables at one particular instant of time exhaust the entire set of mathematical objects necessary to describe the physical world. Thus, at each moment of time we can use the same set of mathematical objects. However, it is not true that two states or two observables that are related by a time translation will automatically be mapped to the same mathematical object. We will come back to this below, when we consider the time evolution of a system.
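The earlier construction of simultaneously measurable observables as functions f_j(B) of a single observable B has a transparent finite-dimensional analogue, where "functions of an observable" are given by the spectral calculus. A toy numerical sketch (our own illustration, not part of the thesis):

```python
# Finite-dimensional sketch: if A1 = f1(B) and A2 = f2(B) for a single
# self-adjoint B, then A1 and A2 commute, and g(A1, A2) is again a
# function of B.
import numpy as np

rng = np.random.default_rng(3)
M = rng.normal(size=(4, 4))
B = (M + M.T) / 2                          # a self-adjoint "observable" on R^4
evals, V = np.linalg.eigh(B)

def func_of_B(f):
    """f(B) via the spectral calculus: apply f to the eigenvalues of B."""
    return V @ np.diag(f(evals)) @ V.T

A1 = func_of_B(lambda x: x ** 2)           # A1 = f1(B) = B^2
A2 = func_of_B(np.sin)                     # A2 = f2(B) = sin(B)
assert np.allclose(A1 @ A2, A2 @ A1)       # functions of one B commute

# g(A1, A2) = A1 + A1 A2 is again a function of B:
g = A1 + A1 @ A2
assert np.allclose(g, func_of_B(lambda x: x ** 2 + x ** 2 * np.sin(x)))
```

This is exactly the mechanism by which sums and products of simultaneously measurable observables are again observables: everything is expressed through the one underlying observable B.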
Although the correspondences Ts and To are completely determined for a given inertial observer, there might be mathematical objects in the theory, other than states and observables, that are not completely determined for a given inertial observer. For instance, in electrodynamics each observer can choose any particular gauge for the electromagnetic potentials.

For each pair (Ts(α), To(A)) and for each Borel set B, a physical theory should provide a number P_{Ts(α)}^{To(A)}(B) that represents the probability P_α^A(B); this is the minimum requirement that any physical theory should satisfy in order to be consistent with empirical data. There is also another consistency requirement for physical theories: the theory should be consistent with the probabilistic form of the principle of special relativity. What this means can be understood as follows. Suppose that a particular inertial observer O is interested in the possible outcome of a measurement of the observable A for some state α.⁹ For this, the observer O uses the correspondences Ts/o to obtain the mathematical objects Ts(α) and To(A), and finds that the theory predicts the probabilities P_{Ts(α)}^{To(A)}(B). Now suppose that O actually performs the corresponding measurement and suppose that a second observer O′ watches O performing the measurement. Then O′ will conclude, from his or her point of view, that a measurement was made of the observable A′ for a state α′. If O′ wants to know the probabilities for the possible outcomes of the measurement of O, he or she should consider the mathematical objects Ts(α′) and To(A′). Note in particular that this procedure defines bijections

F′_{O→O′} : Ts(α) ↦ Ts(α′) and F_{O→O′} : To(A) ↦ To(A′)

of the mathematical objects.

⁹ It is important to emphasize at this point that the descriptions of α and A are complete: there are no external forces that are not included in the descriptions of α and A.
The observer O′ now concludes that the theory predicts the probabilities P_{Ts(α′)}^{To(A′)}(B). The requirement that the theory must be consistent with the probabilistic form of the principle of special relativity can thus be stated as

$$P_{T_s(\alpha)}^{T_o(A)}(B) = P_{T_s(\alpha')}^{T_o(A')}(B) = P_{F'_{O\to O'}T_s(\alpha)}^{F_{O\to O'}T_o(A)}(B), \qquad (2.10)$$

where α′ and A′ are the state and observable α and A of observer O, as seen from observer O′.

Symmetries of theories and of systems

In discussing the principle of special relativity, we argued that there is a pair (s′, s) of bijections on the sets of mathematical objects representing states and observables, respectively. Such a pair of bijections is called a symmetry of the theory. However, in physics one often studies the symmetries of a particular physical system, which can only be found in a limited set of states and for which only a limited set of observables can be measured. The phrase 'symmetry of a physical system' actually refers to the symmetries of a subsystem with respect to the entire system. For example, if some inertial observer O places a fixed point charge Q at the origin of his/her coordinate system and he/she wants to study the motion of some test charge q in the field of Q, it would be easier to make use of the spherical symmetry. By spherical symmetry we mean the following. Suppose for the moment that observer O does not know that there is a charge Q at the origin and suppose that he/she develops a theory that describes the subsystem consisting of the test charge q. This theory assigns mathematical objects to the states {α_q} and observables {A_q} of the test charge q in a similar manner as we discussed above. Now consider a second observer O′ whose coordinate axes are rotated with respect to the coordinate system of O, and suppose that this second observer uses the same physical theory as O and uses the same correspondence to assign mathematical objects to the states and observables of q.
Then spherical symmetry means that there is a relation like (2.10) between the mathematical objects of O and O′, only this time we consider only the possible states and observables of the test charge q, rather than the set of all states and observables. In particular, there is also a pair (s′, s) of bijections of the mathematical objects as above. Generalization of these results to arbitrary subsystems motivates us to define a symmetry of a (sub)system to be a pair of bijections (s′, s) of the mathematical objects representing the states and observables of that particular system. However, in what follows we will only be concerned with symmetries of the theory.

In the following subsection we will explain precisely how states and observables are represented mathematically in quantum theory, and how these (mathematical representations of) states and observables can be paired to obtain probabilities.

2.2.2 The general framework of quantum theory

The content of this subsection can be found in very many places; we have mainly used [34]. For some background information on self-adjoint operators on a Hilbert space one can consult appendix A.

The mathematical description of a quantum system is characterized by a pair (H, A) consisting of a separable Hilbert space H and a set A of self-adjoint operators on H, the elements of which are called observables. Until subsection 4.2.1 we will always assume that A is the set of all self-adjoint operators on H; in subsection 4.2.1 we will discuss the more general case in which A does not contain all self-adjoint operators on the physical Hilbert space. The subset A0 = A ∩ B(H) of A is called the set of bounded observables and B(H) is called the algebra of bounded observables. The set of states S consists of all linear functionals ρ : B(H) → C of the form ρ(A) = Tr(ρ̂A), where ρ̂ ∈ B1(H) is a trace-class operator with Tr(ρ̂) = 1.
Such an operator is called a density operator and it is clear that any convex combination ρ̂ := λρ̂1 + (1 − λ)ρ̂2 (with 0 ≤ λ ≤ 1) of two density operators ρ̂1, ρ̂2 is again a density operator and hence (using linearity of the trace) defines a state given by

$$\rho(A) = \mathrm{Tr}(\hat\rho A) = \lambda\,\mathrm{Tr}(\hat\rho_1 A) + (1-\lambda)\,\mathrm{Tr}(\hat\rho_2 A) = \lambda\rho_1(A) + (1-\lambda)\rho_2(A),$$

where ρi ∈ S denotes the state corresponding to ρ̂i. In particular, this shows that S is a convex set (just read the equation from right to left and use the fact that ρ̂ = λρ̂1 + (1 − λ)ρ̂2 indeed defines a density operator). The extremal points of the convex set S are called pure states, and we denote the set of all pure states by PS. The elements in S\PS are called mixed states. Any density operator is a countable sum of the form

$$\hat\rho = \sum_i \lambda_i E_i, \qquad (2.11)$$

with Ei one-dimensional projections on H and λi ≥ 0 with Σi λi = 1. Conversely, any operator of the form (2.11) defines a density operator. As a special case of (2.11), when ρ̂ is a one-dimensional projection onto the (one-dimensional) subspace V1 of H, it is easy to see, by choosing an orthonormal basis {ei}i for H containing a unit vector Ψ ∈ V1 (so ek = Ψ for some k), that for such a state we have for all A ∈ B(H)

$$\rho(A) = \mathrm{Tr}(\hat\rho A) = \sum_i \langle \hat\rho A e_i, e_i\rangle = \langle A\Psi, \Psi\rangle, \qquad (2.12)$$

where we used that ρ̂Aei ∈ CΨ for each i. Such a state is also called a vector state and we often write such a vector state as ρΨ. Combining (2.11) and (2.12), we see that each state ρ ∈ S is a countable convex combination of vector states and therefore we can write the action of ρ on A ∈ B(H) as

$$\rho(A) = \sum_i \lambda_i \langle A\Psi_i, \Psi_i\rangle \qquad (2.13)$$

(with 0 ≤ λi ≤ 1 and Σi λi = 1 as before) for some countable collection {Ψi}i of unit vectors in H. An immediate consequence of (2.13) is that all pure states must be vector states.
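The density-operator formalism just described can be made concrete in finite dimensions, where every trace-class condition is automatic. A minimal sketch (assumption: H = C³, with arbitrarily chosen vectors):

```python
# Finite-dimensional sketch of the state formalism: a density operator is a
# positive trace-one operator, a state acts as A -> Tr(rho_hat A), and a
# convex combination of density operators gives the convex combination of
# the corresponding states.
import numpy as np

rng = np.random.default_rng(0)

def projector(psi):
    """One-dimensional projection onto the ray through psi."""
    psi = psi / np.linalg.norm(psi)
    return np.outer(psi, psi.conj())

# Two vector states and a mixture of them.
psi1 = rng.normal(size=3) + 1j * rng.normal(size=3)
psi2 = rng.normal(size=3) + 1j * rng.normal(size=3)
lam = 0.3
rho = lam * projector(psi1) + (1 - lam) * projector(psi2)
assert np.isclose(np.trace(rho).real, 1.0)          # trace one
assert np.all(np.linalg.eigvalsh(rho) >= -1e-12)    # positive

# A self-adjoint observable A; for a vector state, Tr(rho A) = <A psi, psi>.
M = rng.normal(size=(3, 3)) + 1j * rng.normal(size=(3, 3))
A = (M + M.conj().T) / 2
n1 = psi1 / np.linalg.norm(psi1)
n2 = psi2 / np.linalg.norm(psi2)
expect_mixed = np.trace(rho @ A).real
assert np.isclose(expect_mixed,
                  lam * (n1.conj() @ A @ n1).real
                  + (1 - lam) * (n2.conj() @ A @ n2).real)
```

The last assertion is the finite-dimensional instance of (2.13): the mixed state evaluates observables as a convex combination of the vector-state expectations.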
The converse of this statement is also true¹⁰ and therefore the set of pure states coincides exactly with the set of vector states, which in turn is in one-to-one correspondence with the set of unit vectors in H modulo phase factors. Thus the set PS of all pure states is in one-to-one correspondence with the set of all unit rays R(Ψ) = {e^{iθ}Ψ : ‖Ψ‖ = 1, 0 ≤ θ < 2π} in H. Note that R(Ψ1) = R(Ψ2) if and only if Ψ2 = e^{iθ}Ψ1 for some θ ∈ [0, 2π).

¹⁰ This will no longer be the case in the more general situation of subsection 4.2.1, where the algebra of observables is allowed to be a proper subalgebra of B(H).

Let P(R) denote the set of probability measures on R. We then define a map A × S → P(R) as follows. If A ∈ A with spectral resolution A = ∫_R λ dE_A(λ) and if ρ ∈ S, we define a probability measure µ_{A,ρ} on R by

$$\mu_{A,\rho}(B) = \rho(E_A(B)) = \mathrm{Tr}(\hat\rho E_A(B))$$

for any Borel set B ⊂ R. The quantity µ_{A,ρ}(B) then represents the probability that, when the quantum system is in the state ρ, the result of a measurement of the observable A lies in the Borel set B. The expectation value ⟨A⟩_ρ of the observable A in the state ρ of the system is then

$$\langle A\rangle_\rho = \int_{\mathbb R} \lambda\, d\mu_{A,\rho}(\lambda)$$

and the variance of A in this state is σ²_ρ(A) = ⟨(A − ⟨A⟩_ρ 1_H)²⟩_ρ = ⟨A²⟩_ρ − ⟨A⟩²_ρ. In particular, if ρ ∈ S is a pure state given by ρ(·) = ⟨·Ψ, Ψ⟩, then µ_{A,ρ}(B) = ⟨E_A(B)Ψ, Ψ⟩ for any observable A and for all Borel sets B ⊂ R, and

$$\langle A\rangle_\rho = \int_{\mathbb R} \lambda\, d\langle E_A(\lambda)\Psi, \Psi\rangle = \langle A\Psi, \Psi\rangle,$$

where the last equality makes sense only when Ψ lies in the domain of A. Also, ⟨A²⟩_ρ = ⟨A²Ψ, Ψ⟩ = ⟨AΨ, AΨ⟩ = ‖AΨ‖², so σ²_ρ(A) = ‖AΨ‖² − ⟨AΨ, Ψ⟩².

If A = {A1, …, An} is a finite set of observables that pairwise commute¹¹, we can define a projection-valued measure E_A on Rⁿ by setting

$$E_{\mathcal A}(B_1 \times \cdots \times B_n) = E_{A_1}(B_1)\cdots E_{A_n}(B_n)$$

for any Borel sets B1, …, Bn ⊂ R, and this will then define E_A for all Borel subsets of Rⁿ.
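In finite dimensions the spectral resolution is just the eigendecomposition, and the measure µ_{A,ρ} becomes a finite probability vector over the eigenvalues. A numerical sketch of this correspondence (our own illustration, with an arbitrary observable and pure state):

```python
# Finite-dimensional sketch of (A, rho) -> mu_{A,rho}: the eigendecomposition
# of a symmetric matrix gives the spectral projections, and
# mu_{A,rho}({lambda_k}) = Tr(rho_hat E_k) is a probability distribution on
# the spectrum with mean Tr(rho_hat A).
import numpy as np

rng = np.random.default_rng(1)
M = rng.normal(size=(4, 4))
A = (M + M.T) / 2                        # a self-adjoint observable on R^4
eigvals, eigvecs = np.linalg.eigh(A)

psi = rng.normal(size=4)
psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi)                 # a pure state as a density matrix

# Spectral projections and the induced probabilities.
probs = np.array([np.trace(rho @ np.outer(eigvecs[:, k], eigvecs[:, k]))
                  for k in range(4)])

assert np.all(probs >= -1e-12)
assert np.isclose(probs.sum(), 1.0)      # mu_{A,rho} is a probability measure
# <A>_rho = sum of lambda * mu({lambda}) = Tr(rho A).
assert np.isclose((eigvals * probs).sum(), np.trace(rho @ A))
```

For the pure state ρ = ρ_Ψ the k-th probability is just |⟨Ψ, e_k⟩|² for the eigenvector e_k, which is the familiar Born rule as a special case of µ_{A,ρ}(B) = Tr(ρ̂E_A(B)).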
If the system is in state ρ, this will define a probability measure µ_{A,ρ} on Rⁿ by µ_{A,ρ}(B) = ρ(E_A(B)) = Tr(ρ̂E_A(B)) for any Borel set B ⊂ Rⁿ, and the quantity µ_{A,ρ}(B) is to be interpreted as the probability that the result (a1, …, an) of a simultaneous measurement of the observables A1, …, An belongs to the Borel set B ⊂ Rⁿ. When two observables do not commute, it is impossible to measure them simultaneously.

For each quantum system there is a special observable H ∈ A, called the Hamiltonian operator, or simply the Hamiltonian of the quantum system. Given this Hamiltonian H (with dense domain D(H)), we define a strongly continuous one-parameter unitary group U_H(t) on H by¹²

$$U_H(t) = e^{-itH}.$$

For any vector Ψ ∈ D(H) we then have i(d/dt)U_H(t)Ψ = U_H(t)HΨ = HU_H(t)Ψ, where in the last equality we have assumed that each U_H(t) leaves D(H) invariant.

In quantum theory we have the so-called Heisenberg picture and Schrödinger picture to describe the dynamics of a quantum system. In the Heisenberg picture the observables depend on time, whereas the states do not. Here we will only consider quantum systems for which the Hamiltonian does not depend explicitly on time. When we have a quantum system in state ρ ∈ S at time t = 0, the time evolution A(t) of an observable with A(0) = A ∈ A is given by

$$A(t) = U_H(-t)AU_H(t) \in \mathcal A,$$

and hence the probability that the result of a measurement at time t of an observable A lies in the Borel set B ⊂ R is µ_{A(t),ρ}(B) = Tr(ρ̂E_{A(t)}(B)). Now assume that Ψ ∈ D(H) is such that U_H(t′)Ψ ∈ D(A) for all t′ in some neighborhood of t ∈ R; then

$$\frac{d}{dt}A(t)\Psi = \frac{d}{dt'}\Big|_{t'=t} U_H(-t')AU_H(t)\Psi + \frac{d}{dt'}\Big|_{t'=t} U_H(-t)AU_H(t')\Psi = iHU_H(-t)AU_H(t)\Psi - iU_H(-t)AU_H(t)H\Psi = i(HA(t) - A(t)H)\Psi.$$

In this sense, we can say that A(t) satisfies the Heisenberg equation of motion

$$\frac{dA}{dt}(t) = i[H, A(t)].$$

¹¹ We say that two self-adjoint elements S and T commute if E_S(B1)E_T(B2) = E_T(B2)E_S(B1) for all Borel sets B1, B2 ⊂ R.
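The Heisenberg equation can be verified numerically in finite dimensions, where U_H(t) = e^{−itH} is computed from the spectral decomposition of H. A sketch with ℏ = 1 and an arbitrary Hamiltonian on C³ (our own toy model):

```python
# Finite-dimensional sketch of Heisenberg evolution: with U(t) = exp(-itH),
# the observable A(t) = U(-t) A U(t) satisfies dA/dt = i[H, A(t)],
# checked here by a central finite difference.
import numpy as np

rng = np.random.default_rng(2)

def hermitian(n):
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (M + M.conj().T) / 2

H = hermitian(3)
A = hermitian(3)
evals, V = np.linalg.eigh(H)

def U(t):
    """U_H(t) = exp(-itH), via the spectral decomposition of H."""
    return V @ np.diag(np.exp(-1j * t * evals)) @ V.conj().T

def A_t(t):
    return U(-t) @ A @ U(t)          # Heisenberg-picture observable

t, eps = 0.7, 1e-6
dA_dt = (A_t(t + eps) - A_t(t - eps)) / (2 * eps)
assert np.allclose(dA_dt, 1j * (H @ A_t(t) - A_t(t) @ H), atol=1e-6)

# An observable built as a function of H commutes with H, so it does not
# evolve: B(t) = B for all t.
B = V @ np.diag(rng.normal(size=3)) @ V.conj().T
assert np.allclose(U(-t) @ B @ U(t), B)
```

The second assertion is the finite-dimensional shadow of the statement about constants of motion: commuting with H (here, being a function of H) freezes the Heisenberg evolution.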
¹² We will always use units in which ℏ = 1; otherwise there would have been a factor ℏ⁻¹ in the exponent.

An observable A ∈ A is called a quantum integral of motion (or constant of motion) if A(t) = A for all t ∈ R. So an observable A is a constant of motion if and only if it commutes with all U_H(t), which happens precisely when it commutes with H. Hence constants of motion are observables that commute with the Hamiltonian.

In the Schrödinger picture, states depend on time and observables do not. When we have a quantum system which is in state ρ at t = 0, the time evolution of the state is described in terms of the time evolution of the density operator as

$$\hat\rho(t) = U_H(t)\hat\rho U_H(-t) \in B_1(\mathcal H),$$

and hence the probability that the result of a measurement at time t of an observable A lies in the Borel set B ⊂ R is µ_{A,ρ(t)}(B) = Tr(ρ̂(t)E_A(B)). For any Ψ ∈ D(H) we then have

$$\frac{d}{dt}\hat\rho(t)\Psi = \frac{d}{dt'}\Big|_{t'=t} U_H(t')\hat\rho U_H(-t)\Psi + \frac{d}{dt'}\Big|_{t'=t} U_H(t)\hat\rho U_H(-t')\Psi = -iHU_H(t)\hat\rho U_H(-t)\Psi + iU_H(t)\hat\rho U_H(-t)H\Psi = -i(H\hat\rho(t) - \hat\rho(t)H)\Psi.$$

In this sense, ρ̂(t) satisfies the Schrödinger equation of motion

$$\frac{d\hat\rho}{dt}(t) = -i[H, \hat\rho(t)].$$

A state ρ ∈ S is called stationary when ρ̂(t) = ρ̂ for all t ∈ R. Thus a state ρ is stationary if and only if ρ̂ commutes with H. If ρ is a pure state given by ρ̂ = E_Ψ, then for any Φ ∈ H we have

$$\hat\rho(t)\Phi = U_H(t)E_\Psi(U_H(-t)\Phi) = U_H(t)\big(\langle U_H(-t)\Phi, \Psi\rangle\Psi\big) = \langle\Phi, U_H(t)\Psi\rangle U_H(t)\Psi = E_{\Psi(t)}\Phi,$$

where Ψ(t) = U_H(t)Ψ. Now if Ψ ∈ D(H), we get (dΨ/dt)(t) = (d/dt)U_H(t)Ψ = −iHU_H(t)Ψ = −iHΨ(t), so Ψ(t) satisfies the Schrödinger equation

$$i\frac{d\Psi}{dt}(t) = H\Psi(t).$$

2.2.3 Symmetries in quantum theory

The global structure of this subsection is borrowed from section 2.4 of [1], but the proofs presented here are more detailed than in [1].
In quantum theory, a symmetry of a quantum system (H, A) is a pair (s, s′) of bijections s : A0 → A0 and s′ : S → S on the set of bounded observables and the set of states, respectively, satisfying

$$(s'\rho)(sA) = \rho(A), \qquad s(f(A)) = f(s(A))$$

for all ρ ∈ S, f ∈ C(σ(A)) and A ∈ A0. Also, if (s1, s1′) and (s2, s2′) are two symmetries, it can be shown that s1 = s2 ⇔ s1′ = s2′; in other words, a symmetry (s, s′) is completely determined once we know either s or s′. If ρ = Σi λi ρi ∈ S is a convex combination of states ρi ∈ S, then for each A ∈ A0 we have

$$(s'\rho)(sA) = \rho(A) = \sum_i \lambda_i \rho_i(A) = \sum_i \lambda_i (s'\rho_i)(sA).$$

Because s is a bijection, this implies that for each A ∈ A0 we have

$$(s'\rho)(A) = (s'\rho)(s(s^{-1}A)) = \sum_i \lambda_i (s'\rho_i)(s(s^{-1}A)) = \sum_i \lambda_i (s'\rho_i)(A),$$

so s′ρ = Σi λi s′ρi, and hence s′ preserves the convex structure of S. In particular, the set of extreme points is mapped bijectively onto itself, so s′ gives a bijection from the set of pure states onto itself. Because the set of pure states is in one-to-one correspondence with the set of unit rays in H (since we have assumed that A is the set of all self-adjoint operators on H), this means that (s, s′) induces a bijection ŝ : {R(Ψ)}Ψ∈H → {R(Ψ)}Ψ∈H from the set of unit rays of H onto itself.

If ρ1, ρ2 ∈ PS are the two pure states corresponding to the unit rays R(Ψ1) and R(Ψ2), respectively, then we say that the pure states ρ1 and ρ2 are orthogonal if R(Ψ1) ⊥ R(Ψ2) (i.e. if the vectors in the unit ray R(Ψ1) are orthogonal to the vectors in the unit ray R(Ψ2)), and we write ρ1 ⊥ ρ2. Orthogonal pure states can also be characterized as follows.

Lemma 2.17 Two pure states ρ1, ρ2 ∈ PS are orthogonal if and only if there exists a projection operator E satisfying ρ1(E) = 1 and ρ2(E) = 0.

Proof If such a projection operator exists, then ‖(1 − E)Ψ1‖² = ρ1(1 − E) = 0 and ‖EΨ2‖² = ρ2(E) = 0, where Ψ1 and Ψ2 are unit vectors in the unit rays corresponding to the pure states ρ1 and ρ2, respectively.
Thus (1 − E)Ψ1 = 0 and EΨ2 = 0, so

$$\langle\Psi_1, \Psi_2\rangle = \langle(1 - E)\Psi_1, \Psi_2\rangle + \langle E\Psi_1, \Psi_2\rangle = 0 + \langle\Psi_1, E\Psi_2\rangle = 0.$$

Now suppose that ⟨Ψ1, Ψ2⟩ = 0. If E is the one-dimensional projection onto CΨ1 then EΨ1 = Ψ1 and EΨ2 = 0, so ρ1(E) = ⟨EΨ1, Ψ1⟩ = 1 and ρ2(E) = ⟨EΨ2, Ψ2⟩ = 0. □

Lemma 2.18 If (s, s′) is a symmetry and ρ1, ρ2 ∈ PS are pure states, then ρ1 ⊥ ρ2 if and only if s′ρ1 ⊥ s′ρ2. In other words, if R1 and R2 are the unit rays corresponding to ρ1 and ρ2, then R1 ⊥ R2 if and only if ŝ(R1) ⊥ ŝ(R2).

Proof Suppose that ρ1 ⊥ ρ2. Let E ∈ A0 be a projection with ρ1(E) = 1 and ρ2(E) = 0. Because (sE)² = s(E²) = sE, sE is also a projection and we have (s′ρ1)(sE) = ρ1(E) = 1 and (s′ρ2)(sE) = ρ2(E) = 0. Hence s′ρ1 ⊥ s′ρ2, by the previous lemma. Now suppose that s′ρ1 ⊥ s′ρ2. Because (s⁻¹, (s′)⁻¹) is also a symmetry, the argument above gives that (s′)⁻¹s′ρ1 ⊥ (s′)⁻¹s′ρ2, i.e. ρ1 ⊥ ρ2. □

Definition 2.19 If ρ1 and ρ2 are two pure states corresponding to the unit rays R(Ψ1) and R(Ψ2), respectively, then ⟨ρ1, ρ2⟩ := |⟨Ψ1, Ψ2⟩|² is called the transition probability between the states ρ1 and ρ2. Physically, this transition probability represents the probability that a system in state ρ1 is found to be in state ρ2 after measurement, or vice versa.

Lemma 2.20 If (s, s′) is a symmetry, then for all pure states ρ1, ρ2 ∈ PS we have ⟨s′ρ1, s′ρ2⟩ = ⟨ρ1, ρ2⟩. As a consequence, if R1 and R2 are the unit rays corresponding to ρ1 and ρ2 respectively, then for arbitrary Ψj ∈ Rj and Ψ′j ∈ ŝ(Rj) we have |⟨Ψ′1, Ψ′2⟩|² = |⟨Ψ1, Ψ2⟩|².

Proof Let Ψ1 and Ψ2 be two unit vectors in the unit rays corresponding to ρ1 and ρ2, respectively, and let E_{ρ1} be the one-dimensional projection onto CΨ1. Then sE_{ρ1} is a projection, (s′ρ1)(sE_{ρ1}) = ρ1(E_{ρ1}) = 1, and for any ρ ∈ PS with ρ ⊥ ρ1 we have (s′ρ)(sE_{ρ1}) = ρ(E_{ρ1}) = 0. This implies that sE_{ρ1} = E_{s′ρ1}.
We also have

$$\langle\rho_1, \rho_2\rangle = |\langle\Psi_1, \Psi_2\rangle|^2 = |\langle\Psi_1, E_{\rho_1}\Psi_2\rangle|^2 = \|E_{\rho_1}\Psi_2\|^2 = \langle E_{\rho_1}\Psi_2, E_{\rho_1}\Psi_2\rangle = \langle E_{\rho_1}\Psi_2, \Psi_2\rangle = \rho_2(E_{\rho_1}),$$

and similarly ⟨s′ρ1, s′ρ2⟩ = (s′ρ2)(E_{s′ρ1}), so we get

$$\langle s'\rho_1, s'\rho_2\rangle = (s'\rho_2)(E_{s'\rho_1}) = (s'\rho_2)(sE_{\rho_1}) = \rho_2(E_{\rho_1}) = \langle\rho_1, \rho_2\rangle. \qquad \Box$$

This lemma has a nice implication. Let {Ψn}n be an orthonormal basis of H, let (s, s′) be a symmetry and let {Ψ′n}n be a set of vectors with Ψ′n ∈ ŝ(R(Ψn)). Then, according to the lemma, for arbitrary m, n we have |⟨Ψ′m, Ψ′n⟩|² = |⟨Ψm, Ψn⟩|² = δmn. By positive definiteness of the inner product, this implies that ⟨Ψ′m, Ψ′n⟩ = δmn, so {Ψ′n}n is an orthonormal set in H. Now suppose that there would exist a unit vector Ψ′ ∈ H with Ψ′ ⊥ Ψ′n for all n. Then for each Ψ ∈ ŝ⁻¹(R(Ψ′)) we have that |⟨Ψ, Ψn⟩|² = |⟨Ψ′, Ψ′n⟩|² = 0 for all n. Because {Ψn}n is an orthonormal basis of H, this implies that Ψ = 0, which contradicts the fact that Ψ is a unit vector. Hence, we conclude that no unit vector Ψ′ ∈ H satisfying Ψ′ ⊥ Ψ′n for all n can exist. This shows that {Ψ′n}n is also an orthonormal basis of H.

Lemma 2.21 Let Ψ1, Ψ2 ∈ H be unit vectors with Ψ1 ⊥ Ψ2 and let R(Ψ1) and R(Ψ2) be their corresponding unit rays. For λ, µ ∈ C with |λ|² + |µ|² = 1 we define the unit vector Ψ_{λ,µ} := λΨ1 + µΨ2 with corresponding unit ray R(Ψ_{λ,µ}). If (s, s′) is a symmetry and Ψ′ ∈ ŝ(R(Ψ_{c1,c2})) with c1c2 ≠ 0, then there exist Ψ′1 ∈ ŝ(R(Ψ1)) and Ψ′2 ∈ ŝ(R(Ψ2)) satisfying Ψ′ = c1Ψ′1 + c2Ψ′2.

Proof If we define R := {R(Ψ_{λ,µ}) : |λ|² + |µ|² = 1}, then R is precisely the set of all unit rays which are orthogonal to any ray which is orthogonal to both R(Ψ1) and R(Ψ2). By lemma 2.18, ŝ(R) := {ŝ(R) : R ∈ R} is then precisely the set of all unit rays which are orthogonal to any ray which is orthogonal to both ŝ(R(Ψ1)) and ŝ(R(Ψ2)). But this implies that we can write ŝ(R) as ŝ(R) = {R(λ′Ψ″1 + µ′Ψ″2) : |λ′|² + |µ′|² = 1}, where Ψ″1 and Ψ″2 are some fixed vectors in ŝ(R(Ψ1)) and ŝ(R(Ψ2)).
Thus for each R(Ψ_{λ,µ}) ∈ R we can write ŝ(R(Ψ_{λ,µ})) as ŝ(R(Ψ_{λ,µ})) = R(λ′Ψ″1 + µ′Ψ″2) for some λ′, µ′ ∈ C with |λ′|² + |µ′|² = 1. Now let Ψ′ ∈ ŝ(R(Ψ_{c1,c2})) with c1c2 ≠ 0. Because ŝ(R(Ψ_{c1,c2})) = R(c′1Ψ″1 + c′2Ψ″2) for some c′1, c′2 ∈ C with |c′1|² + |c′2|² = 1, there exists a θ ∈ [0, 2π) such that

$$\Psi' = e^{i\theta}(c_1'\Psi_1'' + c_2'\Psi_2'') = c_1''\Psi_1'' + c_2''\Psi_2'',$$

where c″j := e^{iθ}c′j. Because |c″j|² = |⟨Ψ′, Ψ″j⟩|² = |⟨Ψ_{c1,c2}, Ψj⟩|² = |cj|², we can write c″j = cj e^{iθj} for some θj ∈ [0, 2π). But then

$$\Psi' = c_1 e^{i\theta_1}\Psi_1'' + c_2 e^{i\theta_2}\Psi_2'' = c_1\Psi_1' + c_2\Psi_2',$$

where Ψ′j := e^{iθj}Ψ″j ∈ ŝ(R(Ψj)). □

We will now formulate and prove the main theorem concerning symmetries in quantum theory. This theorem, due to Wigner, states that we can represent any symmetry transformation of a quantum system by a unitary or antiunitary operator on the corresponding Hilbert space. The (constructive) proof of Wigner's theorem that is given below is a mixture of the proofs in [35] and [1].

Theorem 2.22 (Wigner's theorem) Let (H, A) be a quantum system and let ŝ be a bijection of the set of unit rays onto itself that conserves transition probabilities. Then there exists a map U : H → H that is either linear and unitary or antilinear and antiunitary, satisfying UΨ ∈ ŝ(R(Ψ)) for all unit vectors Ψ ∈ H. Furthermore, such a map U is uniquely determined up to a phase factor.

Proof We divide the proof into several steps.

• Step 1: Define U on an orthonormal basis {Ψn}n≥1 of H.
Let {Ψn}n≥1 be an orthonormal basis of H (here we use the assumption that Hilbert spaces in quantum mechanics are separable) and write Rn := R(Ψn) for the corresponding unit rays. For n ≠ 1 we define the unit vectors

$$\Phi_n = \tfrac{1}{\sqrt 2}(\Psi_1 + \Psi_n).$$

We now choose an arbitrary vector in ŝ(R1) and call it UΨ1. For arbitrary Φ′n ∈ ŝ(R(Φn)) we have |⟨Φ′n, UΨ1⟩|² = |⟨Φn, Ψ1⟩|² = 1/2, so there exists a unique UΦn ∈ ŝ(R(Φn)) such that ⟨UΦn, UΨ1⟩ = 1/√2.
For n ≠ 1 we then define

$$U\Psi_n := \sqrt{2}\, U\Phi_n - U\Psi_1.$$

To see that UΨn ∈ ŝ(Rn), observe that, according to lemma 2.21, there exist Ψ′1 ∈ ŝ(R1) and Ψ′n ∈ ŝ(Rn) satisfying UΦn = (1/√2)(Ψ′1 + Ψ′n). Because Ψ′1, UΨ1 ∈ ŝ(R1), there exists a θ ∈ [0, 2π) such that Ψ′1 = e^{iθ}UΨ1. Then UΨn can be written as UΨn = √2 UΦn − UΨ1 = (e^{iθ} − 1)UΨ1 + Ψ′n. Because UΨ1 ⊥ Ψ′n, we find that

$$e^{i\theta} - 1 = \langle U\Psi_n, U\Psi_1\rangle = \sqrt{2}\,\langle U\Phi_n, U\Psi_1\rangle - \langle U\Psi_1, U\Psi_1\rangle = 0.$$

Hence we have UΨn = (e^{iθ} − 1)UΨ1 + Ψ′n = Ψ′n, so indeed UΨn ∈ ŝ(Rn). We have now defined U on the basis elements Ψn and on the elements Φn = (1/√2)(Ψ1 + Ψn). According to the discussion after lemma 2.18, {UΨn}n forms an orthonormal basis of H, and our definition of UΦn does not spoil the possibility to make U an R-linear operator, since

$$U\Big(\tfrac{1}{\sqrt 2}(\Psi_1 + \Psi_n)\Big) = U\Phi_n = \tfrac{1}{\sqrt 2}(U\Psi_1 + U\Psi_n).$$

• Step 2: If Ψ = Σn cnΨn is a unit vector with c1 ≠ 0 and if Ψ′ = Σn c′nUΨn ∈ ŝ(R(Ψ)), then either cn/c1 = c′n/c′1 for all n, or cn/c1 = \overline{c′n/c′1} for all n.
Let Ψ = Σn cnΨn ∈ H be a unit vector with c1 ≠ 0. If Ψ′ ∈ ŝ(R(Ψ)), we may write it as Ψ′ = Σn c′nUΨn. Note that for all n we have |cn|² = |⟨Ψ, Ψn⟩|² = |⟨Ψ′, UΨn⟩|² = |c′n|². Also,

$$|c_1 + c_n|^2 = 2|\langle\Psi, \Phi_n\rangle|^2 = 2|\langle\Psi', U\Phi_n\rangle|^2 = |c_1' + c_n'|^2.$$

For arbitrary complex numbers a, b ∈ C with a ≠ 0 we have |a + b|² = |a|² + |b|² + 2|a|²Re(b/a), so the equality |c1 + cn|² = |c′1 + c′n|² implies that

$$|c_1|^2 + |c_n|^2 + 2|c_1|^2\,\mathrm{Re}\frac{c_n}{c_1} = |c_1'|^2 + |c_n'|^2 + 2|c_1'|^2\,\mathrm{Re}\frac{c_n'}{c_1'}.$$

Together with |cn|² = |c′n|² and |c1|² = |c′1|² this implies that

$$\mathrm{Re}\frac{c_n}{c_1} = \mathrm{Re}\frac{c_n'}{c_1'}.$$

Also, [Im(cn/c1)]² = |cn/c1|² − [Re(cn/c1)]² = |c′n/c′1|² − [Re(c′n/c′1)]² = [Im(c′n/c′1)]², hence Im(cn/c1) = ±Im(c′n/c′1). Thus we conclude that we either have

$$c_n/c_1 = c_n'/c_1' \qquad (2.14)$$

or

$$c_n/c_1 = \overline{c_n'/c_1'}. \qquad (2.15)$$

We will now show that for each n, the same choice between (2.14) and (2.15) must be made.
Suppose that for some k we have ck/c1 = c′k/c′1 and that for some l we have cl/c1 = \overline{c′l/c′1}, and that both ratios are not real. Note that this requires that k, l > 1 and k ≠ l. Let

$$\Upsilon := \tfrac{1}{\sqrt 3}(\Psi_1 + \Psi_k + \Psi_l)$$

and let Υ′ be an arbitrary vector in ŝ(R(Υ)). When we write Υ = d1Ψ1 + dkΨk + dlΨl and Υ′ = d′1UΨ1 + d′kUΨk + d′lUΨl, we have dk/d1, dl/d1 ∈ R, so we must have dk/d1 = d′k/d′1 and dl/d1 = d′l/d′1, and hence we can write

$$\Upsilon' = \alpha(d_1 U\Psi_1 + d_k U\Psi_k + d_l U\Psi_l) = \tfrac{\alpha}{\sqrt 3}(U\Psi_1 + U\Psi_k + U\Psi_l)$$

for some α ∈ C with |α| = 1. It then follows from conservation of transition probability that

$$\Big|1 + \frac{c_k'}{c_1'} + \frac{c_l'}{c_1'}\Big|^2 = \frac{3|\langle\Psi', \Upsilon'\rangle|^2}{|c_1'|^2} = \frac{3|\langle\Psi, \Upsilon\rangle|^2}{|c_1|^2} = \Big|1 + \frac{c_k}{c_1} + \frac{c_l}{c_1}\Big|^2.$$

By assumption, we have c′k/c′1 = ck/c1 and c′l/c′1 = \overline{cl/c1}, so

$$\Big|1 + \frac{c_k}{c_1} + \overline{\frac{c_l}{c_1}}\Big|^2 = \Big|1 + \frac{c_k}{c_1} + \frac{c_l}{c_1}\Big|^2.$$

For arbitrary complex numbers a, b ∈ C the equality |a + b̄|² = |a + b|² implies that Re(ab) = Re(ab̄), which is equivalent to Re(a)Re(b) − Im(a)Im(b) = Re(a)Re(b) + Im(a)Im(b), or Im(a)Im(b) = 0. When we apply this to a = 1 + ck/c1 and b = cl/c1, we find that Im(1 + ck/c1)Im(cl/c1) = 0. But Im(1 + ck/c1) = Im(ck/c1), so Im(ck/c1)Im(cl/c1) = 0. This implies that at least one of the two ratios ck/c1 and cl/c1 is real, which contradicts our assumption that both ratios are not real. We thus conclude that if Ψ = Σn cnΨn is a unit vector with c1 ≠ 0 and if Ψ′ = Σn c′nUΨn ∈ ŝ(R(Ψ)), then either cn/c1 = c′n/c′1 for all n or else cn/c1 = \overline{c′n/c′1} for all n.

• Step 3: Define U on unit vectors Ψ = Σn cnΨn with c1 ≠ 0 and for which it is not true that cn/c1 ∈ R for all n.
If cn/c1 = c′n/c′1 for all n and if it is not true that cn/c1 ∈ R for all n, then we define UΨ to be the unique vector in ŝ(R(Ψ)) for which c′1 = c1. In that case we get cn = (cn/c1)c1 = (c′n/c′1)c′1 = c′n for all n.
If cn/c1 = \overline{c′n/c′1} for all n and if it is not true that cn/c1 ∈ R for all n, then we define UΨ to be the unique vector in ŝ(R(Ψ)) for which c′1 = c̄1.
In that case we get c_n* = (c_n/c_1)* c_1* = (c′_n/c′_1)c′_1 = c′_n for all n. If c_n/c_1 ∈ ℝ for all n, we will not define U Ψ yet.

•Step 4: Define U on unit vectors Ψ = Σ_n c_n Ψ_n with c_1 = 0 and for which it is not true that c_n ∈ ℝ for all n ≥ 2.

Now suppose that Ψ = Σ_n c_n Ψ_n ∈ H is a unit vector with c_1 = 0. We then define

Ψ̃ := (1/√2)(Ψ_1 + Ψ) = Σ_n c̃_n Ψ_n,

where c̃_1 = 1/√2 and c̃_n = c_n/√2 for n ≠ 1. If it is not true that c_n ∈ ℝ for all n ≥ 2, then it is also not true that c̃_n/c̃_1 ∈ ℝ for all n, so the procedure in step 3 defines U Ψ̃ = Σ_n c̃′_n U Ψ_n ∈ ŝ(R(Ψ̃)) with either c̃′_n = c̃_n for all n or else c̃′_n = c̃_n* for all n. We then define

U Ψ := √2 U Ψ̃ − U Ψ_1 = Σ_n c′_n U Ψ_n,

where c′_1 = 0 and c′_n = √2 c̃′_n for n ≠ 1. Thus, we either have c′_n = c_n for all n or else c′_n = c_n* for all n.

We have now defined U Ψ for all unit vectors Ψ = Σ_n c_n Ψ_n for which not all coefficients have the same phase (since we assumed that it is not true that c_n/c_1 ∈ ℝ for all n), and for such unit vectors we either have

U Ψ = U(Σ_n c_n Ψ_n) = Σ_n c_n U Ψ_n   (2.16)

or

U Ψ = U(Σ_n c_n Ψ_n) = Σ_n c_n* U Ψ_n.   (2.17)

Furthermore, for such unit vectors our choice between (2.16) and (2.17) is the only choice that is possible, since for such unit vectors we either have Σ_n c_n U Ψ_n ∈ ŝ(R(Ψ)) or Σ_n c_n* U Ψ_n ∈ ŝ(R(Ψ)), but not both. If Ψ = Σ_n c_n Ψ_n is a unit vector for which all coefficients have the same phase, then also all coefficients of an arbitrary vector Ψ′ = Σ_n c′_n U Ψ_n in ŝ(R(Ψ)) have the same phase. This means that for such Ψ we are free to choose whether we want U Ψ to satisfy (2.16) or (2.17), since both Σ_n c_n U Ψ_n and Σ_n c_n* U Ψ_n are in ŝ(R(Ψ)). As stated before, we will not make this choice yet.

•Step 5: The choice between (2.16) and (2.17) must be the same for all unit vectors Ψ for which we have defined U Ψ (i.e. the unit vectors whose coefficients do not all have the same phase).
Let Υ_1 = Σ_n a_n Ψ_n and Υ_2 = Σ_n b_n Ψ_n be two unit vectors for which U Υ_1 and U Υ_2 are already defined by steps 3 and 4 above, and such that U Υ_1 satisfies equation (2.16) and U Υ_2 satisfies equation (2.17); in particular, this implies that U Υ_1 ≠ U Υ_2. Conservation of transition probability gives

|Σ_n a_n b_n*|² = |⟨Σ_k a_k Ψ_k, Σ_l b_l Ψ_l⟩|² = |⟨Υ_1, Υ_2⟩|² = |⟨U Υ_1, U Υ_2⟩|² = |⟨Σ_k a_k U Ψ_k, Σ_l b_l* U Ψ_l⟩|² = |Σ_n a_n b_n|².

Using this equality, we find that

Σ_{k,l} [Re(b_k b_l*)Re(a_k a_l*) + Im(b_k b_l*)Im(a_k a_l*)] = Σ_{k,l} Re[(b_k* b_l)(a_k a_l*)] = Re[(Σ_k a_k b_k*)(Σ_l a_l* b_l)] = |Σ_n a_n b_n*|² = |Σ_n a_n b_n|² = Re[(Σ_k a_k b_k)(Σ_l a_l* b_l*)] = Σ_{k,l} Re[(b_k b_l*)(a_k a_l*)] = Σ_{k,l} [Re(b_k b_l*)Re(a_k a_l*) − Im(b_k b_l*)Im(a_k a_l*)].

Thus, we find that U Υ_1 and U Υ_2 satisfy (2.16) and (2.17), respectively, if and only if

Σ_{k,l} Im(b_k b_l*) Im(a_k a_l*) = 0.   (2.18)

We now separate two cases.

Case 1. If Υ_1 and Υ_2 lie in the same unit ray, then there exists a θ ∈ (0, 2π) such that b_n = a_n e^{iθ} for all n. But then b_k b_l* = a_k e^{iθ}(a_l e^{iθ})* = a_k a_l* for all k, l, so equation (2.18) gives Σ_{k,l} [Im(b_k b_l*)]² = 0 = Σ_{k,l} [Im(a_k a_l*)]², which implies that b_k b_l*, a_k a_l* ∈ ℝ for all k, l. This in turn implies that all a_n have the same phase and all b_n have the same phase. But this contradicts our assumption that U Υ_1 and U Υ_2 were already defined by our procedure. Hence, if Υ_1 and Υ_2 are in the same unit ray, then the same choice between (2.16) and (2.17) must be made for U Υ_1 and U Υ_2.

Case 2. Now suppose that Υ_1 and Υ_2 are not in the same unit ray. Because we have assumed that U Υ_1 and U Υ_2 are defined, not all a_k a_l* and not all b_k b_l* are real (since otherwise all a_n would have the same phase, as well as all b_n). We again separate two cases.

Case 2a. If there exists a pair (i, j) such that both a_i a_j* and b_i b_j* are not real, then define a unit vector Υ = Σ_n c_n Ψ_n = (1/√2)(e^{iλ} Ψ_i + e^{iμ} Ψ_j) with 0 < λ < μ < 2π.
Then Σ_{k,l} Im(c_k c_l*) Im(a_k a_l*) ≠ 0 and Σ_{k,l} Im(c_k c_l*) Im(b_k b_l*) ≠ 0, so for U Υ and U Υ_1 the same choice between (2.16) and (2.17) must be made, and for U Υ and U Υ_2 the same choice between (2.16) and (2.17) must be made. Hence the same choice must be made for U Υ_1 and U Υ_2.

Case 2b. Now suppose that there is no such pair (i, j). Then we choose a pair (i, j) for which a_i a_j* is not real, and we choose a different pair (m, n) (possibly with {m, n} ∩ {i, j} ≠ ∅) for which b_m b_n* is not real. Now take a unit vector Υ = Σ_n c_n Ψ_n = Σ_{k∈{i,j,m,n}} c_k Ψ_k such that all (three or four) coefficients c_i, c_j, c_m, c_n have different phases. Then Σ_{k,l} Im(c_k c_l*) Im(a_k a_l*) ≠ 0 and Σ_{k,l} Im(c_k c_l*) Im(b_k b_l*) ≠ 0, so again we conclude that the same choice between (2.16) and (2.17) must be made for U Υ_1 and U Υ_2.

Thus for all unit vectors Ψ for which we have already defined U Ψ, the same choice between (2.16) and (2.17) must be made.

•Step 6: Define U for those unit vectors Ψ for which U Ψ was not yet defined by the previous steps.

As stated before, for those unit vectors Ψ = Σ_n c_n Ψ_n for which we have not yet defined U Ψ, we have that both Σ_n c_n U Ψ_n and Σ_n c_n* U Ψ_n are in the unit ray ŝ(R(Ψ)). We will now define U Ψ for these vectors as follows. If U satisfies (2.16) for all unit vectors for which U is defined, then we define U Ψ := Σ_n c_n U Ψ_n. If U satisfies (2.17) for all unit vectors for which U is defined, then we define U Ψ := Σ_n c_n* U Ψ_n.

•Step 7: Define U for Ψ ∈ H with ‖Ψ‖ ≠ 1.

We have now defined U Ψ for all unit vectors Ψ in H, and U either satisfies (2.16) for all unit vectors, or else U satisfies (2.17) for all unit vectors. In both cases we can extend U to a map U : H → H by defining U Ψ := ‖Ψ‖ U(Ψ/‖Ψ‖) for all Ψ ∈ H. In the first case, U becomes a linear map on H, and for arbitrary Ψ = Σ_n a_n Ψ_n and Φ = Σ_n b_n Ψ_n in H we have

⟨U Ψ, U Φ⟩ = Σ_{k,l} a_k b_l* ⟨U Ψ_k, U Ψ_l⟩ = Σ_{k,l} a_k b_l* ⟨Ψ_k, Ψ_l⟩ = Σ_n a_n b_n* = ⟨Ψ, Φ⟩,

so U is unitary.
In the second case, U becomes an antilinear map on H, and for arbitrary Ψ = Σ_n a_n Ψ_n and Φ = Σ_n b_n Ψ_n in H we have

⟨U Ψ, U Φ⟩ = Σ_{k,l} a_k* b_l ⟨U Ψ_k, U Ψ_l⟩ = Σ_{k,l} a_k* b_l ⟨Ψ_k, Ψ_l⟩ = Σ_n a_n* b_n = ⟨Φ, Ψ⟩,

so U is antiunitary.

•Step 8: Uniqueness of U up to a phase factor.

Now suppose that U′ : H → H is another (anti)linear (anti)unitary map satisfying U′Ψ ∈ ŝ(R(Ψ)) for all unit vectors Ψ ∈ H. Choose an arbitrary unit vector Ψ_0 ∈ H. Because U Ψ_0, U′Ψ_0 ∈ ŝ(R(Ψ_0)), there exists a λ_0 ∈ [0, 2π) such that U′Ψ_0 = e^{iλ_0} U Ψ_0. Let R be a unit ray in H that is not orthogonal to the unit ray R(Ψ_0), and let Ψ ∈ R be the unique unit vector with ⟨Ψ, Ψ_0⟩ ∈ ℝ_{>0}. Since U Ψ and U′Ψ lie in the same unit ray, we can write U′Ψ = e^{iλ} U Ψ for some λ ∈ [0, 2π). But, because U and U′ preserve real inner products,

⟨U Ψ, U Ψ_0⟩ = ⟨Ψ, Ψ_0⟩ = ⟨U′Ψ, U′Ψ_0⟩ = e^{i(λ−λ_0)} ⟨U Ψ, U Ψ_0⟩,

so λ = λ_0 and thus U′Ψ = e^{iλ_0} U Ψ. Using the fact that U and U′ are both (anti)linear, we find that this holds not only for Ψ, but for all vectors in ℂΨ. We have thus found that U′Υ = e^{iλ_0} U Υ for all Υ ∈ H with ⟨Υ, Ψ_0⟩ ≠ 0. Now let Ψ ∈ H with ⟨Ψ, Ψ_0⟩ = 0 and define Φ = Ψ + Ψ_0. Because Φ is not orthogonal to Ψ_0, it follows from the discussion above that U′Φ = e^{iλ_0} U Φ. Because U and U′ are ℝ-linear, we have U Φ = U Ψ + U Ψ_0 and U′Φ = U′Ψ + U′Ψ_0. This gives us

U′Ψ = U′Φ − U′Ψ_0 = e^{iλ_0}(U Φ − U Ψ_0) = e^{iλ_0} U Ψ.

We thus conclude that U′ = e^{iλ_0} U.

In case U is unitary, we have for all Ψ ∈ H and any observable A

⟨AΨ, Ψ⟩ = ρ_Ψ(A) = (s′ρ_Ψ)(sA) = ⟨(sA)U Ψ, U Ψ⟩ = ⟨U*(sA)U Ψ, Ψ⟩,

which implies that A = U*(sA)U, or

sA = U A U^{−1},   (2.19)

where we have used that unitarity of a linear map U : H → H is equivalent to U* = U^{−1}. Note that the expression for sA in (2.19) does not depend on the arbitrary phase factor in U, and the bijection s : A_0 → A_0 so defined is ℝ-linear and satisfies s(A²) = (sA)².
We may as well extend (2.19) to a bijection s : B(H) → B(H), and this bijection is in fact an automorphism of the C*-algebra B(H) (we will discuss this in more detail in subsection 4.2.1). Now consider the case where U is anti-unitary. Because the definition of the adjoint U* of an anti-linear map is given by the condition ⟨U Ψ_1, Ψ_2⟩ = ⟨Ψ_1, U*Ψ_2⟩, we now obtain the equality

⟨AΨ, Ψ⟩ = ρ_Ψ(A) = (s′ρ_Ψ)(sA) = ⟨(sA)U Ψ, U Ψ⟩ = ⟨U*(sA)U Ψ, Ψ⟩ = ⟨Ψ, U*(sA)U Ψ⟩

for all Ψ ∈ H and any observable A. This implies that A* = U*(sA)U, or

sA = U A* U^{−1},   (2.20)

where we have used that anti-unitarity of an anti-linear map U : H → H is equivalent to U* = U^{−1}. Again this expression does not depend on the arbitrary phase factor in U, and the map s : A_0 → A_0 in (2.20) is ℝ-linear and satisfies s(A²) = (sA)². When we extend s in (2.20) to a map s : B(H) → B(H), we obtain an anti-automorphism of the C*-algebra B(H) (i.e. a *-preserving vector space isomorphism satisfying s(AB) = (sB)(sA)). We will come back to this in subsection 4.2.1.

It is clear that the set of all symmetries of a quantum system forms a group under the composition of (bijective) maps; it is called the symmetry group of the system. Now suppose that (s_1, s′_1) and (s_2, s′_2) are two symmetries of a quantum system with corresponding unit ray transformations ŝ_1 and ŝ_2. Then the composition (s_2 ∘ s_1, s′_2 ∘ s′_1) is also a symmetry of the system, with corresponding unit ray transformation ŝ_2 ∘ ŝ_1. By Wigner's theorem there exist operators U_1, U_2, U_{2∘1} : H → H, each of which is either linear and unitary or antilinear and antiunitary, and such that

U_1 Ψ ∈ ŝ_1(R(Ψ)),  U_2 Ψ ∈ ŝ_2(R(Ψ)),  U_{2∘1} Ψ ∈ ŝ_2 ∘ ŝ_1(R(Ψ))

for all unit vectors Ψ ∈ H. Hence we must have

U_2 ∘ U_1 = λ(ŝ_1, ŝ_2) U_{2∘1},

where λ(ŝ_1, ŝ_2) is some complex number with |λ(ŝ_1, ŝ_2)| = 1.
In other words, we may conclude that if G is (a subgroup of) the symmetry group of the quantum system and if for each g ∈ G we have chosen an operator U(g) as in Wigner's theorem, then for all g_1, g_2 ∈ G we have

U(g_1)U(g_2) = λ(g_1, g_2)U(g_1 g_2),

where λ : G × G → ℂ with |λ(g_1, g_2)| = 1 for all g_1, g_2 ∈ G. We say that U : G → B(H) is a ray representation of G: it is a representation of G up to a phase factor. Because B(H) is an associative algebra, we must have (U(g_1)U(g_2))U(g_3) = U(g_1)(U(g_2)U(g_3)) for all g_1, g_2, g_3 ∈ G, which implies that λ must satisfy

λ(g_1, g_2)λ(g_1 g_2, g_3) = λ(g_1, g_2 g_3)λ(g_2, g_3);

a function λ : G × G → ℂ with |λ(g, h)| = 1 for all g, h ∈ G satisfying this equation is called a 2-cocycle of G. Because the operators U(g) are only determined up to a phase factor, we can redefine them by letting U′(g) := µ(g)U(g), where µ : G → ℂ with |µ(g)| = 1 for all g ∈ G. By considering all such functions µ, we obtain all possible ray representations U′ : G → B(H) giving rise to the same unit ray transformations. A natural question to ask is whether we can choose µ such that λ(g_1, g_2) = 1 for all g_1, g_2 ∈ G; in that case we get U(g_1)U(g_2) = U(g_1 g_2) for all g_1, g_2 ∈ G, so U becomes an ordinary representation of G instead of a ray representation. In the next subsection we will answer this question for the case where G is the restricted Poincaré group P^↑_+.

2.2.4 Poincaré invariance and one-particle states

In relativistic quantum theory it is believed that the laws of nature are the same for two observers whose reference frames are related by a restricted Poincaré transformation, so every restricted Poincaré transformation on spacetime must give rise to a symmetry of the quantum system. For notational simplicity we will identify a restricted Poincaré transformation g ∈ P^↑_+ with its associated symmetry, by writing g instead of ŝ(g) for the corresponding ray transformation.
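As a standalone illustration of a nontrivial 2-cocycle (the example is not from the thesis): the Pauli matrices form a ray representation of the Klein four-group ℤ₂ × ℤ₂ that cannot come from an ordinary representation, since the σ_j anticommute while ℤ₂ × ℤ₂ is abelian. A short Python sketch verifying |λ| = 1 and the 2-cocycle identity:

```python
# Pauli matrices as a ray representation of {e, x, y, z} with x*x = e,
# x*y = z, etc.: U(g1) U(g2) = lambda(g1, g2) U(g1*g2).
U = {'e': [[1, 0], [0, 1]],
     'x': [[0, 1], [1, 0]],       # sigma_1
     'y': [[0, -1j], [1j, 0]],    # sigma_2
     'z': [[1, 0], [0, -1]]}      # sigma_3

def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def grp(g, h):  # group multiplication in Z2 x Z2
    if g == 'e':
        return h
    if h == 'e':
        return g
    if g == h:
        return 'e'
    return ({'x', 'y', 'z'} - {g, h}).pop()

def cocycle(g, h):
    prod, rep = mul(U[g], U[h]), U[grp(g, h)]
    for i in range(2):          # prod = lambda * rep; read lambda off any
        for j in range(2):      # entry of rep of modulus 1
            if abs(rep[i][j]) > 0.5:
                return prod[i][j] / rep[i][j]

G = ['e', 'x', 'y', 'z']
assert all(abs(abs(cocycle(g, h)) - 1) < 1e-12 for g in G for h in G)
assert abs(cocycle('x', 'y') - 1j) < 1e-12   # sigma_1 sigma_2 = i sigma_3
# The 2-cocycle identity, forced by associativity:
for g1 in G:
    for g2 in G:
        for g3 in G:
            lhs = cocycle(g1, g2) * cocycle(grp(g1, g2), g3)
            rhs = cocycle(g1, grp(g2, g3)) * cocycle(g2, g3)
            assert abs(lhs - rhs) < 1e-12
```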
We can then say that in relativistic quantum theory P^↑_+ must be a subgroup of the symmetry group of the theory. By the remarks in the previous subsection, this means that the action of P^↑_+ on a quantum system is given by a ray representation U : P^↑_+ → B(H) of P^↑_+ on H. In a sufficiently small neighborhood N ⊂ P^↑_+ of the identity 1 ∈ P^↑_+, each element g ∈ N can be written as g = h² with h ∈ P^↑_+. But then U(g) = λ(h, h)^{−1}U(h)U(h), which is linear and unitary, regardless of whether U(h) is linear and unitary or antilinear and antiunitary. Hence for all g ∈ N the operator U(g) is linear and unitary. Because each element g ∈ P^↑_+ can be written as a finite product of elements in N (since P^↑_+ is a connected Lie group), this in turn implies that U(g) is linear and unitary for all g ∈ P^↑_+. In other words, the action of P^↑_+ on a quantum system is given by a unitary ray representation U : P^↑_+ → B(H) of P^↑_+ on H.

Now suppose that U_ray : P^↑_+ → B(H) is a unitary ray representation of P^↑_+. If Φ : P̃^↑_+ → P^↑_+ is the covering map, then Ũ_ray := U_ray ∘ Φ : P̃^↑_+ → B(H) is a unitary ray representation of P̃^↑_+. For Ũ_ray the following theorem applies.

Theorem 2.23 Any unitary ray representation of P̃^↑_+ can, by a suitable choice of phase factors, be made into a unitary representation of P̃^↑_+.¹³

Thus there exists a function µ : P̃^↑_+ → ℂ with |µ(g̃)| = 1 for all g̃ ∈ P̃^↑_+ such that g̃ ↦ µ(g̃)Ũ_ray(g̃) is a unitary representation of P̃^↑_+; we denote this unitary representation of P̃^↑_+ by Ũ. The ray representation U_ray of P^↑_+ can thus be described by the unitary representation Ũ : P̃^↑_+ → B(H) of the universal covering group, since R(U_ray(g)Ψ) = R(Ũ(g̃)Ψ), where g̃ is one of the two elements of the set Φ^{−1}({g}).

Conversely, each unitary representation of P̃^↑_+ also gives rise to a ray representation of P^↑_+.
To see this, suppose that Ũ is a unitary representation of P̃^↑_+. Because Ũ(1) = 1_H and because Ũ(−1)Ũ(−1) = Ũ(1) = 1_H, we must have Ũ(−1) = ±1_H. Then for each g̃ ∈ P̃^↑_+ we have Ũ(−g̃) = Ũ(−1)Ũ(g̃) = ±Ũ(g̃), so R(Ũ(−g̃)Ψ) = R(Ũ(g̃)Ψ). This shows that Ũ gives rise to a unitary ray representation U_ray of P^↑_+ by choosing U_ray(g) ∈ {Ũ(±g̃)} for all g ∈ P^↑_+, where g̃ is one of the two elements of Φ^{−1}({g}). We can thus conclude that for each unitary ray representation of P^↑_+ there is a unitary representation of P̃^↑_+ that gives rise to the same transformation of unit rays of H, and that for each unitary representation of P̃^↑_+ there exists a unitary ray representation of P^↑_+ that gives rise to the same transformation of unit rays of H.

Classification of the irreducible representations of P̃^↑_+

We will now study the irreducible unitary representations of P̃^↑_+ in some detail. There are several texts where one can find a discussion of these representations, but each of these texts misses some of the information that can be found in one of the other texts. Here we have tried to include as much (relevant) information as possible, based on [1], [3], [2], [8], [20], [29], [32] and [35]. Let U be an irreducible unitary representation of P̃^↑_+ in the Hilbert space H. We will always assume that such representations are continuous with respect to the weak operator topology on B(H). So to each element (a, L) ∈ P̃^↑_+ there corresponds a unitary operator U(a, L) on H. Before we proceed we have to choose some convention concerning the physical interpretation of the elements of SL(2, ℂ). In our discussion we will assume that we have chosen a fixed basis {e_µ} of M. We then let Φ : SL(2, ℂ) → L^↑_+ be precisely the (basis-dependent) covering map that was defined in section 2.1.2. So, for instance, the element e^{(t/2i)σ_j} corresponds to a spatial rotation around the x_j-axis (over an angle t) in the chosen basis.
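The double-cover property just used can be made concrete with the map ψ(x) = x⁰1 + x·σ from section 2.1.2, through which Φ is determined by ψ(Φ(A)x) = Aψ(x)A*. A small numerical sketch (the angle and the test vector are arbitrary), checking that A = e^{(t/2i)σ_3} and −A induce the same rotation around the x_3-axis and preserve the Minkowski norm:

```python
import cmath, math

def psi(x):  # x^0 * 1 + x.sigma as a hermitian 2x2 matrix
    x0, x1, x2, x3 = x
    return [[x0 + x3, complex(x1, -x2)], [complex(x1, x2), x0 - x3]]

def psi_inv(h):  # inverse of psi on hermitian matrices
    return [(h[0][0].real + h[1][1].real) / 2, h[1][0].real,
            h[1][0].imag, (h[0][0].real - h[1][1].real) / 2]

def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def lorentz(A, x):  # Phi(A) acting on a 4-vector x
    return psi_inv(mul(mul(A, psi(x)), dagger(A)))

t = 0.4
# A = exp(t sigma_3 / 2i) = diag(e^{-it/2}, e^{it/2})
A = [[cmath.exp(-1j * t / 2), 0], [0, cmath.exp(1j * t / 2)]]
negA = [[-A[0][0], 0], [0, -A[1][1]]]

x = [2.0, 1.0, 0.0, 0.5]
y = lorentz(A, x)
# A and -A determine the same Lorentz transformation:
assert all(abs(a - b) < 1e-12 for a, b in zip(y, lorentz(negA, x)))
# A rotation about x3: x0 and x3 are untouched, and (with the conventions
# chosen here) e1 is taken to cos(t) e1 + sin(t) e2.
assert abs(y[0] - x[0]) < 1e-12 and abs(y[3] - x[3]) < 1e-12
assert abs(y[1] - math.cos(t)) < 1e-12 and abs(y[2] - math.sin(t)) < 1e-12
minkowski = lambda v: v[0]**2 - v[1]**2 - v[2]**2 - v[3]**2
assert abs(minkowski(y) - minkowski(x)) < 1e-12
```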
We will now study the representations in several steps.

•Step 1: Decomposition according to the translation subgroup

The abelian subgroup {(a, 1)}_{a∈M} ⊂ P̃^↑_+ of translations gives rise to a continuous 4-parameter group U(a, 1) of commuting unitary operators on H. The four parameters correspond to the decomposition of the vectors a ∈ M with respect to our chosen orthonormal basis {e_µ}³_{µ=0} of M. According to a generalization of Stone's theorem on strongly continuous 1-parameter unitary groups in a Hilbert space (see also section X.5 of [5]), called the SNAG theorem (for Stone, Naimark, Ambrose and Godement), there exist four pairwise commuting self-adjoint operators P^µ (µ = 0, 1, 2, 3), defined on a common dense domain D_P ⊂ H, such that U(a, 1) = e^{ia·P} for all a ∈ M, where we have used the notation P := (P⁰, P¹, P², P³). The operator P^µ is called the generator of translations in the x^µ-direction. Under an SL(2, ℂ) transformation these operators transform according to U(0, A) P^µ U(0, A)^{−1} = Φ(A)_ν^µ P^ν, which follows from

U(0, A) e^{ia_µ P^µ} U(0, A)^{−1} = U(Φ(A)a, 1) = e^{ia_µ Φ(A)_ν^µ P^ν},

which implies that U(0, A) P^µ U(0, A)^{−1} = Φ(A)_ν^µ P^ν. Let E_P denote the joint spectral measure of the operators P^µ, as defined in appendix A.2. Then the operators P^µ and U(a, 1) on H can be written as

P^µ = ∫_M p^µ dE_P(p)   and   U(a, 1) = ∫_M e^{ia·p} dE_P(p).

Here we write M (Minkowski space) instead of ℝ⁴, because we need the Minkowski metric η on this space. As stated in appendix A.2, we can represent H as a direct integral

∫_M^⊕ H(p) dµ(p)

of Hilbert spaces H(p), corresponding to the operators P^µ. If we denote the decomposition of an element Ψ ∈ H with respect to this direct integral decomposition as a function Ψ(p), where Ψ(p) ∈ H(p) for all p ∈ M and

∫_M ‖Ψ(p)‖²_{H(p)} dµ(p) < ∞,

then for each Ψ ∈ D_P we have (P^µΨ)(p) = p^µΨ(p), and for each Ψ ∈ H we have (U(a, 1)Ψ)(p) = e^{ia·p}Ψ(p).

¹³ This theorem is not true when we replace P̃^↑_+ by P^↑_+.
Of course one is always free to change the values of the functions Ψ(p) on a set of µ-measure zero, but we will simply ignore this fact in the following. In other words, we always pretend that we have chosen some particular representative in the equivalence class of functions Ψ(p) corresponding to some Ψ ∈ H.

•Step 2: Lorentz generators

According to Stone's theorem, there exist self-adjoint operators {M^j}³_{j=1} and {N^j}³_{j=1} on H such that the one-parameter unitary groups {U(0, e^{(t/2i)σ_j})}³_{j=1} and {U(0, e^{(t/2)σ_j})}³_{j=1} can be written as {e^{itM^j}}³_{j=1} and {e^{itN^j}}³_{j=1}, respectively. For obvious reasons we will call the operator M^j the generator of a rotation around the x_j-axis and the operator N^j the generator of a Lorentz boost in the x_j-direction. Now define a set of operators {M^{µν}}³_{µ,ν=0} by

M^{µν} = −M^{νµ},  M^{jk} = M^l,  M^{0j} = N^j,

where in the second equation (j, k, l) is a cyclic permutation of (1, 2, 3). The operators M^{µν} satisfy

U(0, A) M^{µν} U(0, A)^{−1} = Φ(A)_ρ^µ Φ(A)_σ^ν M^{ρσ}.

It follows from the definition of M^{µν} above that the operators iM_{µν} and iP_µ satisfy the same commutation relations as the Lie algebra basis elements X_{µν} and Y_µ, so

[M^{µν}, M^{ρσ}] = −i(η^{µρ} M^{σν} + η^{νρ} M^{µσ} + η^{νσ} M^{ρµ} + η^{µσ} M^{νρ}),
[M^{µν}, P^ρ] = i(η^{νρ} P^µ − η^{µρ} P^ν),
[P^µ, P^ν] = 0.

An immediate consequence of these relations is that the operator P² := P_µ P^µ commutes with all generators P^µ and M^{µν}. Because U is an irreducible representation of P̃^↑_+ on H, it follows from Schur's lemma that P² is a scalar multiple of the identity operator, i.e. P² = c_1 1_H. In particular, this implies that the measure µ is supported in a subset of M of which all elements p have the same value of p · p; we will come back to this, in more detail, in step 5.

From the generators P^µ and M^{µν} we can construct four new operators

W^µ := −(1/2) ε^{µνρσ} M_{νρ} P_σ,

called the Pauli-Lubanski operators.
Here ε^{µνρσ} is the completely antisymmetric tensor, normalized such that ε^{0123} = −ε_{0123} = 1; in some textbooks (for example in [2]) the sign of this antisymmetric tensor is reversed, i.e. ε^{0123} = −ε_{0123} = −1, and in those texts there is no minus sign in the definition of W^µ. The Pauli-Lubanski operators satisfy P_µ W^µ = 0. The Pauli-Lubanski operators all commute with the operators P^µ, i.e. [W^µ, P^ν] = 0, so the action of W^µ on a vector Ψ ∈ H may be described by (W^µΨ)(p) = W^µ(p)Ψ(p) for some set of operators {W^µ(p) : H(p) → H(p)}_{p∈M}. Also, the operator W² = W_µ W^µ commutes with all generators P^µ and M^{µν}, so it must be a scalar multiple of the identity operator, i.e. W² = c_2 1_H.

•Step 3: The action of SL(2, ℂ) on the spectral measures

Because U(0, A)U(a, 1)U(0, A)* = U((0, A)(a, 1)(0, A^{−1})) = U(Φ(A)a, 1) for all A ∈ SL(2, ℂ), the spectral measures (defined in step 1) satisfy

U(0, A) E_P(∆) U(0, A)* = E_P(Φ(A)∆)   (2.21)

for all Borel sets ∆ ⊂ M. Because U(0, A)* is unitary, we have U(0, A)*H = H, so (2.21) implies that U(0, A)E_P(∆)H = E_P(Φ(A)∆)H. If we use the notation H_∆ := E_P(∆)H, then this can be rewritten as

U(0, A) H_∆ = H_{Φ(A)∆}.   (2.22)

In particular, this implies that we are free to identify the spaces H(p_1) and H(p_2) with each other if p_1 and p_2 are related by a restricted Lorentz transformation.

•Step 4: The support of the measure µ is an L^↑_+-invariant subset of M

Because U(0, A) is unitary, we have for each Borel set ∆ ⊂ M and for all vectors Ψ_1, Ψ_2 ∈ H_∆

∫_∆ ⟨Ψ_1(p), Ψ_2(p)⟩_{H(p)} dµ(p) = ⟨Ψ_1, Ψ_2⟩ = ⟨U(0, A)Ψ_1, U(0, A)Ψ_2⟩ = ∫_{Φ(A)∆} ⟨(U(0, A)Ψ_1)(p), (U(0, A)Ψ_2)(p)⟩_{H(p)} dµ(p),

where ⟨·, ·⟩_{H(p)} denotes the inner product on H(p). Now let q ∈ M be a point outside the support of the measure µ, which means that there exists an open neighborhood V_q of q with µ(V_q) = 0. For all Ψ_1, Ψ_2 ∈ H_{V_q} we then have

0 = ∫_{V_q} ⟨Ψ_1(p), Ψ_2(p)⟩_{H(p)} dµ(p) = ∫_{Φ(A)V_q} ⟨(U(0, A)Ψ_1)(p), (U(0, A)Ψ_2)(p)⟩_{H(p)} dµ(p).
In particular, when we choose Ψ_1 and Ψ_2 such that the integrand on the right-hand side is strictly positive on Φ(A)V_q, we find that µ(Φ(A)V_q) = 0. But Φ(A)V_q is an open neighborhood of the point Φ(A)q, so the point Φ(A)q also lies outside the support of µ. Because this is true for all A ∈ SL(2, ℂ) and because Φ(SL(2, ℂ)) = L^↑_+, it follows that all elements of the form Lq with L ∈ L^↑_+ are outside the support of µ. This shows that the support of µ must in fact be invariant under L^↑_+.

•Step 5: Orbits and their relation to irreducible representations

For p ∈ M we call the set {Lp}_{L∈L^↑_+} the orbit of the point p. We can define an equivalence relation on M by declaring two points in M to be equivalent if and only if they have the same orbit; in particular, this gives rise to a partition of M into disjoint orbits. It is clear that {0} is an orbit. We will now characterize all other orbits, so in the following discussion the orbits are assumed to be different from {0}. Note that the elements of one orbit all have the same value of p² = p · p. When this value is nonnegative, we can write it as p² = m² with m ≥ 0. In case this value is negative, we can write it as p² = (im)² with m ≥ 0. So to each orbit we assign a number m (or im) in this manner. In the nonnegative case, it follows from our results of section 2.1.2 that for all elements p in the same orbit the component p⁰ has the same sign (note that p⁰ ≠ 0 because we have excluded the orbit {0} from our discussion), so to orbits with p² ≥ 0 we can also assign a sign in {+, −}. The label (m, ±) completely characterizes the orbits with p² ≥ 0. For an orbit with p² < 0, the label im completely characterizes the orbit. Thus M is partitioned into the following orbits (where m ≥ 0):

O^+_m = {p ∈ M : p² = m², p⁰ > 0};
O^−_m = {p ∈ M : p² = m², p⁰ < 0};
O_{im} = {p ∈ M : p² = −m²};
{0}.

Because the support of the measure µ is L^↑_+-invariant, the support of µ is a union of complete orbits in M.
If it contains two or more orbits, then we can always construct an open subset W ⊂ M that is invariant under L^↑_+, contains at least one orbit in the support of µ, and excludes at least one orbit in the support of µ. Using the results that we derived earlier, we then find that U(0, A)H_W = H_{Φ(A)W} = H_W and

U(a, 1)H_W = (∫_M e^{ia·p} dE_P(p)) E_P(W)H = ∫_W e^{ia·p} dE_P(p)H ⊂ H_W,

so H_W is an invariant subspace of H, contradicting the irreducibility of U. We thus conclude that the support of µ consists of exactly one orbit, and the operator P² = P_µP^µ = c_1 1_H that we defined in step 2 is given by P² = m² 1_H. Therefore each irreducible unitary representation of P̃^↑_+ is (partly) characterized by the labels of the corresponding orbit. In particular, H can be represented as the direct integral

∫_O^⊕ H(p) dµ(p),

where O denotes the corresponding orbit of the irreducible representation U. As we shall see later, the only orbits of physical relevance are the orbits O^+_m (with m ≥ 0) and {0}, so from now on we will only consider these orbits.

•Step 6: Representations corresponding to the orbit {0}

In this case the irreducible representations are either one-dimensional (the trivial representation) or infinite-dimensional. In physics, only the trivial representation will be relevant: U(a, A)Ψ = Ψ for all Ψ ∈ H, where H is one-dimensional.

•Step 7: Representations corresponding to the orbits O^+_m with m ≥ 0

We now fix some orbit O^+_m and derive some general properties of the corresponding irreducible representations. The Hilbert space H can be decomposed according to

H = ∫_{O^+_m}^⊕ H(p) dµ_m(p).

It can be shown that the requirements that the support of the measure µ_m must equal O^+_m and that µ_m must be L^↑_+-invariant determine µ_m uniquely up to a positive factor.
We will choose this factor in such a way that the measure is given (on O^+_m) by

dµ_m(p) = d³p / (2p⁰),   (2.23)

where we write p = (p⁰, 𝐩) for all p ∈ O^+_m, which of course implies that p⁰ = √(m² + 𝐩²). A nice property of this normalization of µ_m is that for each function f : M → ℝ we have

∫_M δ(p² − m²) θ(p⁰) f(p) d⁴p = ∫_{O^+_m} f(√(m² + 𝐩²), 𝐩) dµ_m(p),

where θ : ℝ → {0, 1} denotes the step function. Note that the elements of our Hilbert space can now be represented as (∪_{p∈O^+_m} H(p))-valued functions Ψ on O^+_m with Ψ(p) ∈ H(p) for all p ∈ O^+_m and

∫_{O^+_m} ‖Ψ(p)‖²_{H(p)} dµ_m(p) < ∞.

Because U(0, A)H_∆ = H_{Φ(A)∆} for all Borel sets ∆ and because Φ(A) : M → M is bijective, we can now define for each p ∈ O^+_m in the orbit the vector space isomorphisms (but not Hilbert space isomorphisms) U_{p→Φ(A)p}(0, A) : H(p) → H(Φ(A)p) such that

U_{p→Φ(A)p}(0, A)(Ψ(p)) := (U(0, A)Ψ)(Φ(A)p)   (2.24)

for all p ∈ O^+_m. Sometimes we will simply write U_p(0, A) instead of U_{p→Φ(A)p}(0, A) to save some space whenever the equations get too long. Because U(0, A) is unitary, these mappings are indeed isomorphisms of vector spaces for all p ∈ O^+_m. They are not isomorphisms of Hilbert spaces (i.e. inner product preserving) because of the p⁰ in the denominator of equation (2.23); however, the map √(p⁰/(Φ(A)p)⁰) U_{p→Φ(A)p}(0, A) : H(p) → H(Φ(A)p) is in fact a Hilbert space isomorphism. This follows from the fact that

‖Ψ(p)‖²_{H(p)} d³p/(2p⁰) = ‖U_{p→Φ(A)p}(0, A)Ψ(p)‖²_{H(Φ(A)p)} d³p/(2(Φ(A)p)⁰)

(which follows from the unitarity of U(0, A)), which in turn implies that

‖Ψ(p)‖²_{H(p)} = (p⁰/(Φ(A)p)⁰) ‖U_{p→Φ(A)p}(0, A)Ψ(p)‖²_{H(Φ(A)p)} = ‖√(p⁰/(Φ(A)p)⁰) U_{p→Φ(A)p}(0, A)Ψ(p)‖²_{H(Φ(A)p)}.

We will use the Hilbert space isomorphisms √(p⁰/(Φ(A)p)⁰) U_{p→Φ(A)p}(0, A) later, when we define orthonormal bases on the spaces H(p). Note that the (vector space) isomorphisms U_{p→Φ(A)p}(0, A) satisfy

U_{Φ(AB)^{−1}p→p}(0, AB)(Ψ(Φ(AB)^{−1}p)) = U_{Φ(A)^{−1}p→p}(0, A) U_{Φ(AB)^{−1}p→Φ(A)^{−1}p}(0, B)(Ψ(Φ(AB)^{−1}p)).
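The Lorentz invariance of the measure (2.23) can be checked numerically: under a boost along the x_3-axis the spatial momentum transforms with Jacobian determinant p′⁰/p⁰, so that d³p/2p⁰ is invariant. A finite-difference sketch (the mass, velocity and momentum values are arbitrary):

```python
import math

m, beta = 1.0, 0.6
gamma = 1.0 / math.sqrt(1.0 - beta * beta)

def energy(p):  # p^0 on the mass shell
    return math.sqrt(m * m + sum(q * q for q in p))

def boost(p):  # boost along x3: p'3 = gamma (p3 + beta p0)
    return (p[0], p[1], gamma * (p[2] + beta * energy(p)))

p = (0.3, -0.8, 1.1)
h = 1e-6

def shifted(j, s):
    return tuple(p[i] + (s * h if i == j else 0.0) for i in range(3))

# Jacobian of the spatial map p -> p' by central finite differences
J = [[(boost(shifted(j, +1))[i] - boost(shifted(j, -1))[i]) / (2 * h)
     for j in range(3)] for i in range(3)]
det = (J[0][0] * (J[1][1] * J[2][2] - J[1][2] * J[2][1])
       - J[0][1] * (J[1][0] * J[2][2] - J[1][2] * J[2][0])
       + J[0][2] * (J[1][0] * J[2][1] - J[1][1] * J[2][0]))

# d^3p'/d^3p = p'^0 / p^0, hence d^3p / 2p^0 is invariant.
ratio = energy(boost(p)) / energy(p)
assert abs(det - ratio) < 1e-5
```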
In particular, if Φ(A), Φ(B) ∈ L^↑_+ are such that Φ(A)p = Φ(B)p = p, then this becomes

U_{p→p}(0, AB)(Ψ(p)) = U_{p→p}(0, A) U_{p→p}(0, B)(Ψ(p)).   (2.25)

Now fix an element k ∈ O^+_m in the orbit and choose¹⁴ an orthonormal basis {e_σ(k)}_σ of H(k). We will now use the operators U_{p→Φ(A)p}(0, A) to define an orthonormal basis for the other points of O^+_m. First we fix for each p ∈ O^+_m an element M_p ∈ SL(2, ℂ) such that p = Φ(M_p)k. Then for each p ∈ O^+_m we define an orthonormal basis {e_σ(p)}_σ of H(p) by

e_σ(p) := √(k⁰/p⁰) U_{k→p}(0, M_p) e_σ(k).

Here we use that √(k⁰/p⁰) U_{k→p}(0, M_p) is a Hilbert space isomorphism, as shown above. With this basis, we can write each Ψ(p) ∈ H(p) as Ψ(p) = Σ_σ Ψ(p, σ) e_σ(p), and we can identify a vector Ψ ∈ H with the function Ψ(p, σ). We will see in a moment that the spaces H(p) are finite-dimensional in all physically relevant cases, so the index σ takes on a finite number of values. The Hilbert space H can thus be realized as a finite direct sum ⊕_σ L²(O^+_m, µ_m) of copies of L²(O^+_m, µ_m). The inner product is thus given by

⟨Ψ_1, Ψ_2⟩ = Σ_σ ∫_{O^+_m} Ψ_1(p, σ) Ψ_2(p, σ)* dµ_m(p) = Σ_σ ∫_{ℝ³} Ψ_1((p⁰, 𝐩), σ) Ψ_2((p⁰, 𝐩), σ)* d³p/(2p⁰),

where in the last expression p⁰ = √(m² + 𝐩²). Of course we could just as well have defined functions Ψ(𝐩, σ) with 𝐩 in ℝ³ (or ℝ³\{0} for massless particles) and realize H as ⊕_σ L²(ℝ³, d³p/(2p⁰)), and we will in fact do this later. However, for the moment we will stick with Ψ(p, σ) for notational convenience.

Now that we have defined these bases of H(p) (given some basis of H(k)), we can express the action of U_{p→Φ(A)p}(0, A) for any A ∈ SL(2, ℂ) as

U_{p→Φ(A)p}(0, A) e_σ(p) = √(k⁰/p⁰) U_{p→Φ(A)p}(0, A) U_{k→p}(0, M_p) e_σ(k)
= √(k⁰/p⁰) U_{k→Φ(A)p}(0, A M_p) e_σ(k)
= √(k⁰/p⁰) U_{k→Φ(A)p}(0, M_{Φ(A)p} M_{Φ(A)p}^{−1} A M_p) e_σ(k)
= √(k⁰/p⁰) U_{k→Φ(A)p}(0, M_{Φ(A)p}) U_{k→k}(0, M_{Φ(A)p}^{−1} A M_p) e_σ(k),   (2.26)

where in the last step we used that

Φ(M_{Φ(A)p}^{−1} A M_p)k = Φ(M_{Φ(A)p})^{−1} Φ(A) Φ(M_p)k = Φ(M_{Φ(A)p})^{−1} Φ(A)p = k.
To understand (2.26) we introduce some terminology. For any point p ∈ O^+_m in the orbit we define a subgroup G_p ⊂ SL(2, ℂ) by

G_p := {A ∈ SL(2, ℂ) : Φ(A)p = p}.

Clearly, if p′ is another point in the orbit O^+_m and A ∈ SL(2, ℂ) is such that Φ(A)p = p′, then the groups G_p and G_{p′} are isomorphic, and an isomorphism from G_p to G_{p′} is given by B ↦ ABA^{−1}. The isomorphic subgroups {G_p}_{p∈O^+_m} are called the little group of the orbit O^+_m.

In this terminology, the transformation M_{Φ(A)p}^{−1} A M_p ∈ SL(2, ℂ) in (2.26) is an element of the little group G_k. In fact, it follows from (2.25) that U induces a unitary representation of G_k on the Hilbert space H(k) by A ↦ U_{k→k}(0, A) for A ∈ G_k. For A ∈ G_k we write [U_{k→k}(0, A)]_{σ,σ′} for the matrix components of the unitary operator U_{k→k}(0, A) on H(k) with respect to the orthonormal basis {e_σ(k)}_σ. With this notation we can write (2.26) as

U_{p→Φ(A)p}(0, A) e_σ(p) = √(k⁰/p⁰) U_{k→Φ(A)p}(0, M_{Φ(A)p}) (Σ_{σ′} [U_{k→k}(0, M_{Φ(A)p}^{−1} A M_p)]_{σ′,σ} e_{σ′}(k))
= √(k⁰/p⁰) Σ_{σ′} [U_{k→k}(0, M_{Φ(A)p}^{−1} A M_p)]_{σ′,σ} U_{k→Φ(A)p}(0, M_{Φ(A)p}) e_{σ′}(k)
= √((Φ(A)p)⁰/p⁰) Σ_{σ′} [U_{k→k}(0, M_{Φ(A)p}^{−1} A M_p)]_{σ′,σ} e_{σ′}(Φ(A)p),   (2.27)

where in the last step we used that U_{k→Φ(A)p}(0, M_{Φ(A)p}) e_{σ′}(k) = √((Φ(A)p)⁰/k⁰) e_{σ′}(Φ(A)p). Using (2.24) with Φ(A)^{−1}p instead of p, we then find that

(U(0, A)Ψ)(p) = U_{Φ(A)^{−1}p→p}(0, A)(Ψ(Φ(A)^{−1}p))
= Σ_σ Ψ(Φ(A)^{−1}p, σ) U_{Φ(A)^{−1}p→p}(0, A) e_σ(Φ(A)^{−1}p)
= Σ_σ Ψ(Φ(A)^{−1}p, σ) √(p⁰/(Φ(A)^{−1}p)⁰) Σ_{σ′} [U_{k→k}(0, M_p^{−1} A M_{Φ(A)^{−1}p})]_{σ′,σ} e_{σ′}(p)
= Σ_{σ′} {√(p⁰/(Φ(A)^{−1}p)⁰) Σ_σ [U_{k→k}(0, M_p^{−1} A M_{Φ(A)^{−1}p})]_{σ′,σ} Ψ(Φ(A)^{−1}p, σ)} e_{σ′}(p).

Because we also have (U(0, A)Ψ)(p) = Σ_σ (U(0, A)Ψ)(p, σ) e_σ(p), we thus conclude that

(U(0, A)Ψ)(p, σ) = √(p⁰/(Φ(A)^{−1}p)⁰) Σ_{σ′} [U_{k→k}(0, M_p^{−1} A M_{Φ(A)^{−1}p})]_{σ,σ′} Ψ(Φ(A)^{−1}p, σ′).

This shows that the action of U(0, A) on H is completely determined once we know the action of U_{k→k}(0, A) on H(k) for all little group elements A ∈ G_k.

¹⁴ In steps 7a and 7b we will show how to choose such a basis for the cases m > 0 and m = 0 separately.
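That M_{Φ(A)p}^{−1} A M_p indeed lies in the little group G_k can be checked numerically in the massive case. The sketch below assumes one concrete (standard) choice of the maps M_p, namely the positive hermitian square root of ψ(p)/m, a pure boost with Φ(M_p)k = p for k = (m, 0, 0, 0); the resulting little group element is then unitary with unit determinant, in accordance with G_k = SU(2) as derived in step 7a:

```python
import cmath, math

def mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dagger(a):
    return [[a[j][i].conjugate() for j in range(2)] for i in range(2)]

def inv(a):  # inverse of a 2x2 matrix
    d = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / d, -a[0][1] / d], [-a[1][0] / d, a[0][0] / d]]

def psi(x):  # x^0 * 1 + x.sigma
    return [[x[0] + x[3], complex(x[1], -x[2])],
            [complex(x[1], x[2]), x[0] - x[3]]]

def psi_inv(h):
    return [(h[0][0].real + h[1][1].real) / 2, h[1][0].real,
            h[1][0].imag, (h[0][0].real - h[1][1].real) / 2]

def std_boost(p, m):
    # M_p = sqrt(psi(p)/m): for a positive 2x2 matrix H,
    # sqrt(H) = (H + sqrt(det H) 1) / sqrt(tr H + 2 sqrt(det H)).
    h = [[e / m for e in row] for row in psi(p)]
    s = cmath.sqrt(h[0][0] * h[1][1] - h[0][1] * h[1][0])
    t = cmath.sqrt(h[0][0] + h[1][1] + 2 * s)
    return [[(h[i][j] + (s if i == j else 0)) / t for j in range(2)]
            for i in range(2)]

m = 1.0
p = [math.sqrt(m * m + 0.3 ** 2 + 0.7 ** 2), 0.3, 0.0, 0.7]  # a point in O+_m
A = mul([[math.exp(0.4), 0], [0, math.exp(-0.4)]],            # boost along x3 ...
        [[math.cos(0.3), -math.sin(0.3)],
         [math.sin(0.3), math.cos(0.3)]])                     # ... times a rotation

pprime = psi_inv(mul(mul(A, psi(p)), dagger(A)))  # Phi(A)p
W = mul(inv(std_boost(pprime, m)), mul(A, std_boost(p, m)))  # Wigner rotation

WWd = mul(W, dagger(W))
assert all(abs(WWd[i][j] - (1 if i == j else 0)) < 1e-9
           for i in range(2) for j in range(2))    # W is unitary ...
detW = W[0][0] * W[1][1] - W[0][1] * W[1][0]
assert abs(detW - 1) < 1e-9                        # ... with det W = 1
```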
In other words, we have reduced the problem of finding irreducible unitary representations of P̃^↑_+ to the problem of finding irreducible unitary representations of G_k on the Hilbert space H(k). We will now briefly discuss these representations of G_k. We separate two cases: m > 0 and m = 0.

•Step 7a: m > 0

If m > 0, then the little group G_k is SU(2), the double cover of the rotation group SO(3). This can be seen easily by choosing k = (m, 0, 0, 0). The only restricted Lorentz transformations that leave (m, 0, 0, 0) invariant are rotations, i.e. elements of SO(3) (note that rotations are the only restricted Lorentz transformations that leave the zeroth component of a vector invariant, so there cannot be any other restricted Lorentz transformations that leave (m, 0, 0, 0) invariant). Therefore, the little group in this case is G_k = Φ^{−1}(SO(3)) = SU(2). This can also be seen directly. The image of k under the map ψ : M → H(2, ℂ) is ψ(k) = m 1_{ℂ²}, so an element A ∈ SL(2, ℂ) is in G_k if and only if m 1_{ℂ²} = A(m 1_{ℂ²})A* = mAA*, or A^{−1} = A*. This shows that A ∈ SL(2, ℂ) must be unitary, i.e. A ∈ SU(2), and hence G_k = SU(2).

The irreducible representations of SU(2) are finite-dimensional and are labelled by the parameter s ∈ {0, 1/2, 1, 3/2, ...}, where 2s + 1 is the dimension of the representation. Because SU(2) is a simply-connected Lie group, all these representations can be characterized by the irreducible (2s + 1)-dimensional representations D^(s) : su(2) → End(V_{2s+1}) of its Lie algebra su(2). If we choose a basis of the (2s + 1)-dimensional vector space V_{2s+1} in which D^(s)((1/2i)σ³) is diagonal, we can specify D^(s) by

[D^(s)((1/2i)σ³)]_{σ′,σ} = −iσ δ_{σ,σ′},
[D^(s)((1/2i)σ¹)]_{σ′,σ} = −(i/2)(δ_{σ′,σ+1} √((s − σ)(s + σ + 1)) + δ_{σ′,σ−1} √((s + σ)(s − σ + 1))),
[D^(s)((1/2i)σ²)]_{σ′,σ} = −(1/2)(δ_{σ′,σ+1} √((s − σ)(s + σ + 1)) − δ_{σ′,σ−1} √((s + σ)(s − σ + 1))),

where the row and column indices σ′ and σ run from s to −s.
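The matrix elements of D^(s) just given can be checked directly: the hermitian matrices S^j := iD^(s)((1/2i)σ^j) built from these ladder-type expressions satisfy the su(2) commutation relations [S^a, S^b] = iε^{abc}S^c and the Casimir relation Σ_j (S^j)² = s(s+1)1. A Python sketch (rows and columns ordered from σ = s down to σ = −s):

```python
import math

def spin_matrices(s):
    # Hermitian matrices S^j = i D^{(s)}((1/2i) sigma^j) in the basis where
    # S^3 is diagonal with eigenvalues s, s-1, ..., -s.
    n = int(round(2 * s)) + 1
    labels = [s - i for i in range(n)]
    S1 = [[0j] * n for _ in range(n)]
    S2 = [[0j] * n for _ in range(n)]
    S3 = [[complex(labels[i]) if i == j else 0j for j in range(n)]
          for i in range(n)]
    for col, sg in enumerate(labels):
        up = math.sqrt((s - sg) * (s + sg + 1))  # element sigma -> sigma+1
        dn = math.sqrt((s + sg) * (s - sg + 1))  # element sigma -> sigma-1
        if col > 0:
            S1[col - 1][col] += up / 2
            S2[col - 1][col] += -0.5j * up
        if col < n - 1:
            S1[col + 1][col] += dn / 2
            S2[col + 1][col] += 0.5j * dn
    return S1, S2, S3

def mul(a, b):
    n = len(a)
    return [[sum(a[i][q] * b[q][j] for q in range(n)) for j in range(n)]
            for i in range(n)]

def close(a, b, tol=1e-12):
    n = len(a)
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(n) for j in range(n))

for s in (0.5, 1.0, 1.5):
    S = spin_matrices(s)
    n = int(round(2 * s)) + 1
    for a, b, c in ((0, 1, 2), (1, 2, 0), (2, 0, 1)):  # [S^a, S^b] = i S^c
        ab, ba = mul(S[a], S[b]), mul(S[b], S[a])
        comm = [[ab[i][j] - ba[i][j] for j in range(n)] for i in range(n)]
        assert close(comm, [[1j * S[c][i][j] for j in range(n)] for i in range(n)])
    sq = [mul(Sj, Sj) for Sj in S]  # Casimir: sum_j (S^j)^2 = s(s+1) 1
    total = [[sq[0][i][j] + sq[1][i][j] + sq[2][i][j] for j in range(n)]
             for i in range(n)]
    target = [[s * (s + 1) * (1 if i == j else 0) for j in range(n)]
              for i in range(n)]
    assert close(total, target)
```

For s = 1/2 these matrices reduce to the familiar σ^j/2.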
We will denote the representation of SU (2) corresponding to the representation D(s) of su(2) by D(s) . We thus conclude that the Hilbert space H(k) is equal to C2s+1 for some j ∈ 12 Z≥0 and that Uk→k (0, A) = D(s) (A) for A ∈ SU (2). Note that we have implicitly chosen the orthonormal basis {eσ (k)}σ on H(k) = C2s+1 to be a set of 1 3 σ ), and (as described above) this also defines orthonormal bases {eσ (p)}σ eigenvectors of D(s) ( 2i 2s+1 + . The representation U is now given by for H(p) = C at all other points p ∈ Om s (Φ(A)p)0 X (s) −1 Up→Φ(A)p (0, A)eσ (p) = [D (MΦ(A)p AMp )]σ0 σ eσ0 (Φ(A)p). p0 0 σ In terms of the functions Ψ(p, σ) this reads s X p0 [D(s) (Mp−1 AMΦ(A)−1 p )]σ,σ0 Ψ(Φ(A)−1 p, σ 0 ). (U (0, A)Ψ)(p, σ) = (Φ(A)−1 p)0 0 σ On H(k) ' C2s+1 , we define the hermitian operators S (s),j (k) = iD (s) 1 j σ 2i (2.28) for j = 1, 2, 3, which we will call the spin operators at the point k. They satisfy [S (s) 2 (k)] = 3 h X S (s),j (k) i2 = s(s + 1)1H(k) j=1 and [S (s),a (k), S (s),b (k)] = iabc S (s),c (k). (2.29) The three generators M j commute with P 0 and the operators P j are zero operators on H(k) so they commute trivially with the M j , so the M j leave the space H(k) invariant. We can therefore define the operators M j (k) : H(k) → H(k) in the obvious way. We now find eitS (s),j (k) = eiD (s) ( t σ j ) 2i t j = Uk→k (0, e 2i σ ) = eitM j (k) . So S (s),j (k) = M j (k). We will now use this to show that S (s),j (k) is proportional to W j (k). Because Pµ W µ = 0, it follows that W 0 (k) = 0. For the other components of W (k) we find that 1 iνρσ 1 iνρσ 1 i W (k) = − Mνρ Pσ (k) = − Mνρ kσ (k) = − mijl0 Mjl (k) 2 2 2 1 0ijl m X 0ijl jl = m Mjl (k) = M (k) = mM i 2 2 j,l = mS (s),i (k). 40 In other words, S(s) (k) = 1 W(k). m (2.30) In particular, this implies that W (k)2 = −[W(k)]2 = −m2 [S(s) (k)]2 = −m2 s(s + 1)1H(k) and therefore that W 2 = −m2 s(s + 1)1H , since we already knew that W 2 is a scalar multiple of 1H . 
We now define the spin operator S (s),j on H by 1 W 0P j (s),j j S = W − . (2.31) m m + P0 It is clear that, since this operator is constructed from the P µ and W µ , it leaves all spaces H(p) + . The reason invariant, and therefore we can define spin operators S (s),j (p) at any point p ∈ Om (s) for the definition (2.31) is that the three operator components of S form a (pseudo-)vector, i.e. + this definition reproduces (2.30) and at each [M a , S (s),b ] = iabc S (s),c . Also, at the point k ∈ Om point p in the orbit the commutation relations (2.29) hold. It can in fact be shown that this is the unique operator that is a linear combination of the W µ with coefficients that are functions of P µ and that satisfies all these properties, see section 7.2C of [2]. The action of S (s),3 on a function Ψ(p, σ) is given by (S (s),3 Ψ)(p, σ) = σΨ(p, σ), but we will not prove this. •Step 7b: m = 0 For m = 0 we choose k to be the vector k = (1, 0, 0, 1). Under the map ψ : M → H(2, C) as defined in section 2.1.2 the vector k corresponds to the hermitean matrix 0 2 0 k + k 3 k 1 − ik 2 . = ψ(k) = 0 0 k 1 + ik 2 k 0 − k 3 For an arbitrary 2 × 2-matrix A with components Aij (i, j = 1, 2) the condition Aψ(k)A∗ = ψ(k) implies that |A11 |2 = 1 and A21 = 0. If A is also in SL(2, C) then we must have 1 = det(A) = A11 A22 − A12 A21 = A11 A22 , which implies that A22 = A−1 11 = A11 . Thus, A ∈ SL(2, C) is in the little group Gk of k if and only if it is of the form iα e z Aα,z = 0 e−iα with α ∈ R and z = z1 + iz2 ∈ C. If α = 0 we can obtain A0,z by 1 A0,z = ez1 ( 2 σ 1 − 1 σ 2 )+z (− 1 σ 1 − 1 σ 2 ) 2 2i 2i 2 (2.32) and if α 6= 0 and α 6= π, we can obtain Aα,z by 1 Aα,z = e−2α 2i σ 3+ α sin α [z1 ( 12 σ1 − 2i1 σ2 )+z2 (− 2i1 σ1 − 12 σ2 )] . (2.33) Here we used that for c = (c1 , c2 , c3 ) ∈ C3 we have that ec·σ = if c21 + c22 + c23 6= 0 and p q sinh c21 + c22 + c23 p cosh c21 + c22 + c23 1C2 + c·σ c21 + c22 + c23 ec·σ = 1C2 + c · σ 41 (2.34) (2.35) if c21 + c22 + c23 = 0. 
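Both closed forms for \(e^{c\cdot\sigma}\), as well as the invariance condition \(A_{\alpha,z}\psi(k)A_{\alpha,z}^* = \psi(k)\) for \(k = (1,0,0,1)\), are easy to verify numerically. This sketch is illustrative only (it assumes NumPy and checks the exponential against a truncated power series):

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

def expm2(M, terms=40):
    """Matrix exponential via truncated power series (adequate for small 2x2 M)."""
    out, term = I2.copy(), I2.copy()
    for n in range(1, terms):
        term = term @ M / n
        out = out + term
    return out

def dot_sigma(c):
    return c[0]*sx + c[1]*sy + c[2]*sz

# generic case (2.34): c.c != 0
c = np.array([0.3, 0.2j, 0.1])
r = np.sqrt(np.sum(c*c) + 0j)
print(np.allclose(expm2(dot_sigma(c)),
                  np.cosh(r)*I2 + (np.sinh(r)/r)*dot_sigma(c)))   # True

# nilpotent case (2.35): c.c == 0 gives e^{c.sigma} = 1 + c.sigma
c0 = np.array([1.0, 1j, 0.0])
print(np.allclose(expm2(dot_sigma(c0)), I2 + dot_sigma(c0)))      # True

# every A_{alpha,z} leaves psi(k) = diag(2, 0) invariant
def A(alpha, z):
    return np.array([[np.exp(1j*alpha), z], [0, np.exp(-1j*alpha)]])

psi_k = np.diag([2.0, 0.0]).astype(complex)
M = A(0.7, 0.3 - 0.2j)
print(np.allclose(M @ psi_k @ M.conj().T, psi_k))                 # True
```

The closed forms follow from \((c\cdot\sigma)^2 = (c\cdot c)\,1_{\mathbb{C}^2}\), which also explains why the nilpotent case truncates after the linear term.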
In the case α = 0 we chose c = ( z2 , iz2 , 0) and applied (2.35); in the case where zα izα α 6= 0 and α 6= π we chose c = ( 2 sin α , 2 sin α , iα) and applied (2.34). Note that the elements Aα,z of Gk satisfy the algebraic properties Aα1 ,0 Aα2 ,0 = Aα1 +α2 ,0 A0,z1 A0,z2 = A0,z1 +z2 Aα,0 A0,z A−α,0 = A0,ze2iα . In order to understand this group, we need to recall the definition of the group E+ (2), the proper Euclidean group in two dimensions. The group E+ (2) acts on the plane R2 and is generated by the translations T (~v ) over a vector ~v ∈ R2 and the rotations R(θ) around an angle θ ∈ [0, 2π). These generators satisfy R(θ1 )R(θ2 ) = R(θ1 + θ2 ) T (~v1 )T (~v2 ) = T (~v1 + ~v2 ) R(θ)T (~v )R(−θ) = T (R(θ)~v ). e+ (2) of E+ (2). The elements Comparing these two groups, we observe that Gk is the double cover E A 1 α,0 and A0,z satisfy the same algebraic properties as R(θ) and T (~v ), respectively, only the range 2 of α runs from 0 to 4π, while the range of θ runs from 0 to 2π. The only finite-dimensional irreducible unitary representations of Gk are one-dimensional, and are given by D(σ) (Aα,z ) = e2iσα 1H(k) for σ ∈ 21 Z. Here H(k) ' C is one-dimensional. All other representations are infinite-dimensional, but they turn out to be physically irrelevant. Thus, U is given by s −1 (Φ(A)p)0 2iσα(MΦ(A)p AMp ) Up→Φ(A)p (0, A)eσ (p) = eσ (Φ(A)p), (2.36) e 0 p where the index σ can only take on one value, since the representation of Gk is one-dimensional, and α(M ) denotes the angle α in M = Aα,z for M ∈ Gk . In terms of the functions Ψ(p, σ) this reads s −1 p0 2iσα(MΦ(A)p AMp ) (U (0, A)Ψ)(p, σ) = e Ψ(Φ(A)−1 p, σ). (Φ(A)−1 p)0 e+ (2) is spanned by the elements It follows from (2.32) and (2.33) that the Lie algebra of Gk ' E 1 3 σ , 2i 1 := − σ 2 + 2i 1 := − σ 1 − 2i R := T1 T2 1 1 σ 2 1 2 σ . 2 The Lie algebra representation D(σ) induced by D(σ) maps the basis element R to −iσ1H(k) , because 1 3 (σ) 1 3 e−2αD ( 2i σ ) = D(σ) (e−2α 2i σ ) = D(σ) (Aα,0 ) = e2iσα . 
A similar calculation shows that the other two basis elements of the Lie algebra are mapped to 0 by D(σ) . On the space H(k) we define the operator 1 3 (σ) σ = σ1H(k) , λ(k) := iD 2i which we will call the helicity operator at the point k. Because [M 3 , P µ ] = [M 12 , P µ ] = i(η 2µ P 1 − η 1µ P 2 ) and because P 1 and P 2 are the zero operators on H(k), we find that M 3 leaves the space 42 H(k) invariant, so we can define an operator M 3 (k) : H(k) → H(k) in the obvious way. By the same reasoning as for m > 0 we then find that λ(k) = M 3 (k). The Pauli-Lubanski operator at the point k is W µ (k) = − 21 µνρσ Mνρ kσ = − 21 (µνρ0 − µνρ3 )Mνρ , where we have used that k3 = −k 3 = −1. Writing out these expressions gives15 W 0 (k) = M 3 (k) = λ(k) W 1 (k) = (M 1 + N 2 )(k) = 0 W 2 (k) = (M 2 − N 1 )(k) = 0 W 3 (k) = M 3 (k) = λ(k), so W µ (k) = k µ λ(k) = σk µ 1H(k) . We have thus found that W µ (k) is proportional to k µ and, in particular, that [W (k)]2 = Wµ (k)W µ (k) = 0. Because W 2 = Wµ W µ is a scalar multiple of the identity operator on H, this gives W 2 = Wµ W µ = 0. Because we always have Pµ W µ = 0 and because P 2 = m2 1H = 0, we conclude that W µ must be proportional to P µ . Since W µ (k) = σk µ 1H(k) , the proportionality constant is σ and we obtain W µ = σP µ . On the Hilbert space H we now define the helicity operator by λ= M·P W0 = . P0 |P| We will now briefly discuss the image of Gk ⊂ SL(2, C) under the covering map Φ : SL(2, C) → because this is not found in any of the literature that we have used. The image of an arbitrary element Aα,z is easily obtained from ψ(Φ(Aα,z )x) = Aα,z ψ(x)A∗α,z , or (Φ(Aα,z )x)0 + (Φ(Aα,z )x)3 (Φ(Aα,z )x)1 − i(Φ(Aα,z )x)2 (Φ(Aα,z )x)1 + i(Φ(Aα,z )x)2 (Φ(Aα,z )x)0 − (Φ(Aα,z )x)3 (1 + |z|2 )x0 + 2Re[zeiα (x1 − ix2 )] + (1 − |z|2 )x3 e2iα (x1 − ix2 ) + zeiα (x0 − x3 ) = . 
e−2iα (x1 + ix2 ) + ze−iα (x0 − x3 ) x0 − x3 L↑+ , After some straightforward computations this gives z 2 +z 2 z 2 +z 2 1+ 12 2 z1 cos α + z2 sin α z1 sin α − z2 cos α − 12 2 cos 2α sin 2α −z1 cos α + z2 sin α z cos α − z2 sin α Φ(Aα,z ) = 1 − sin 2α cos 2α z1 sin α + z2 cos α −z1 sin α − z2 cos α z12 +z22 2 = R3 (−α) 1+ z12 +z22 2 z1 −z2 z12 +z22 2 z1 cos α + z2 sin α z1 sin α − z2 cos α z 2 +z 2 z1 −z2 − 1 2 2 1 0 −z1 R3 (−α), 0 1 z2 b21 +b22 z1 −z2 1 − 2 1− z12 +z22 2 where R(α) denotes a rotation around the x3 -axis over an angle α (counterclockwise, as seen from a point with positive x3 -coordinate). In particular, this computation shows that Aα,0 is mapped onto the rotation R3 (−2α). Note that it was to be expected that {R3 (α)}α∈[0,2π) would be a subgroup of Φ(Gk ) because k = (1, 0, 0, 1) is invariant under rotations around the x3 -axis. In the book [35] of Weinberg the group Φ(Gk ) ⊂ L↑+ is computed directly as follows16 . Let W ∈ L↑+ be such that W k = k. Then for the vector t = (1, 0, 0, 0) we have 1 = t·k = (W t)·(W k) = 15 Here we use that eit(M 1 +N 2 )(k) = Uk→k (0, e−tT2 ) = e−D −D (σ) (T1 ) e (σ) (T2 ) = 1H(k) and eit(M 2 −N 1 )(k) = Uk→k (0, e−tT1 ) = = 1H(k) . Weinberg does not use the universal covering group at all. As a consequence, he needs to consider double-valued representations, i.e. representations up to a sign, of the little groups SO(3) and E+ (2). 16 43 (W t) · k = (W t)0 − (W t)3 , so we can write W t as W t = (1 + ct , at , bt , ct ) with at , bt , ct ∈ R. Also 1 = t · t = (W t) · (W t) = (1 + ct )2 − a2t − b2t − c2t , from which it follows that ct can be expressed in terms of at and bt as ct = ct (at , bt ) = (a2t +b2t )/2. Now define the restricted Lorentz transformations Sa,b by 1 + c(a, b) a b −c(a, b) a 1 0 −a ∈ L↑ , Sa,b = + b 0 1 −b c(a, b) a b 1 − c(a, b) where c(a, b) := (a2 + b2 )/2. 
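The covering-map computation above can be checked numerically. The following sketch (not from the thesis; it assumes NumPy) reconstructs \(\Phi(A)\) column by column from \(\psi(\Phi(A)x) = A\psi(x)A^*\) and confirms both that \(A_{\alpha,0}\) is mapped onto the rotation \(R_3(-2\alpha)\) and that a general \(A_{\alpha,z}\) covers a Lorentz transformation fixing \(k = (1,0,0,1)\):

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)
sig = [I2, sx, sy, sz]

def psi(p):
    return sum(p[mu]*sig[mu] for mu in range(4))

def from_psi(H):
    """Invert psi: p^mu = tr(H sigma_mu)/2 recovers the four-vector."""
    return np.array([np.trace(H @ s).real/2 for s in sig])

def Phi(A):
    """Covering map SL(2,C) -> restricted Lorentz group, column by column."""
    cols = [from_psi(A @ psi(e) @ A.conj().T) for e in np.eye(4)]
    return np.column_stack(cols)

alpha = 0.6
A_rot = np.array([[np.exp(1j*alpha), 0], [0, np.exp(-1j*alpha)]])

def R3(theta):
    """Counterclockwise rotation about the x^3-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0, 0, 0], [0, c, -s, 0], [0, s, c, 0], [0, 0, 0, 1]])

print(np.allclose(Phi(A_rot), R3(-2*alpha)))   # A_{alpha,0} covers R3(-2*alpha)

A_gen = np.array([[np.exp(1j*alpha), 0.3 - 0.2j], [0, np.exp(-1j*alpha)]])
k = np.array([1.0, 0.0, 0.0, 1.0])
print(np.allclose(Phi(A_gen) @ k, k))          # Phi(A_{alpha,z}) fixes k
```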
Using that c(a + a0 , b + b0 ) = c(a, b) + c(a0 , b0 ) + aa0 + bb0 , it follows easily that Sa,b Sa0 ,b0 = Sa+a0 ,b+b0 , so the Sa,b form an abelian subgroup of L↑+ . Note that Sat ,bt t = (1 + ct , at , bt , ct ) = W t and thus that Sa−1 W t = t, which shows that t ,bt Sa−1 W ∈ Φ(Gt ) = SO(3). t ,bt W k = k, so Sa−1 W ∈ SO(3) must be a Because Sa,b k = k for all a, b ∈ R, we also have Sa−1 t ,bt t ,bt rotation that leaves the x3 -component invariant, i.e. it must be a rotation R3 (θ) around the 3-axis. We thus conclude that W = Sat ,bt R3 (θ). A general element in L↑+ that leaves k invariant is thus of the form Wa,b,θ = Sa,b R3 (θ), and these elements do indeed form a group. As we have already seen above, the elements Wa,b,0 form an abelian subgroup of L↑+ with multiplication given by Wa,b,0 Wa0 ,b0 ,0 = Wa+a0 ,b+b0 ,0 . The elements W0,0,θ also from an abelian subgroup with multiplication given by W0,0,θ W0,0,θ0 = W0,0,θ+θ0 . Furthermore, we also have W0,0,θ Wa,b,0 W0,0,−θ = Wa cos θ+b sin θ,−a sin θ+b cos θ,0 . Thus the group formed by the elements Wa,b,θ is isomorphic to the two dimensional Euclidean group E+ (2), i.e. the group of translations and rotations in the plane; the isomorphism is given by identifying Wa,b,0 with a translation in the plane over the vector (a, b) and identifying W0,0,θ with a rotation in the plane around an angle θ. Physical interpretation of the irreducible representations e↑ that are relevant in We have now fully classified those irreducible unitary representations of P + physics. Quantum systems in which the pure state vectors Ψ ∈ H transform as such representae↑ are interpreted as one-particle states. The label m is interpreted as the mass of the tions of P + particle and the operators P µ are interpreted as the four-momentum operators corresponding to the one-particle system (this last fact is made more rigorous in [1], section 3.6). 
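Weinberg's matrices \(S_{a,b}\) introduced above can likewise be checked by direct computation. The sketch below (illustrative, assuming NumPy) verifies that they fix \(k = (1,0,0,1)\), preserve the Minkowski metric, and compose according to \(S_{a,b}S_{a',b'} = S_{a+a',b+b'}\):

```python
import numpy as np

def S(a, b):
    """Weinberg's restricted Lorentz transformations S_{a,b} fixing k = (1,0,0,1)."""
    c = (a*a + b*b) / 2
    return np.array([[1 + c, a, b, -c],
                     [a,     1, 0, -a],
                     [b,     0, 1, -b],
                     [c,     a, b, 1 - c]])

k = np.array([1.0, 0.0, 0.0, 1.0])
eta = np.diag([1.0, -1.0, -1.0, -1.0])

print(np.allclose(S(0.3, -0.7) @ k, k))                       # S_{a,b} k = k
print(np.allclose(S(0.3, -0.7).T @ eta @ S(0.3, -0.7), eta))  # Lorentz condition
print(np.allclose(S(0.3, -0.7) @ S(0.1, 0.4), S(0.4, -0.3)))  # abelian composition
```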
We found that the one-particle states are ∪p∈Om + H(p)-valued functions of the four-momentum p with Ψ(p) ∈ H(p), p 0 but since p = m2 + p2 , we will from now on write the one-particle states as functions Ψ(p) of the three-momentum p. The generators M j of rotations around the xj -axis are interpreted as the xj -component of the angular momentum of the particle. If m > 0 and if the representation of SU (2) is 2s + 1-dimensional, then the state vectors of the particle are functions Ψ(p, σ), where p ∈ R3 and σ ∈ {−s, . . . , s}. The label s ∈ 21 Z≥0 is called the spin of the particle and because (S (s),3 Ψ)(p, σ) = σΨ(p, σ), the label σ ∈ {−s, . . . , s} denotes the spin component along the x3 -direction. The operators S(s) contribute to the total angular momentum M of the particle and for a particle at rest we have 44 in fact M = S(s) . For a massive particle with spin s the probability of finding the particle with three-momentum p in some Borel set B ⊂ R3 and spin x3 -component σ is Z d3 p , |Ψ(p, σ)|2 2ωp B where we defined ωp := p m 2 + p2 . For this reason, we may interpret 1 ψ(p, σ) := p Ψ(p, σ) 2ωp as the momentum-spin wave function of the particle. This momentum-spin wave function ψ is d3 p square-integrable with respect to d3 p, not with respect to 2ω . We will denote the Hilbert space of p such momentum-spin wave functions by H, instead of H. Thus, we can describe one-particle states either by elements in Ψ in H or by elements ψ in H, both descriptions being unitarily equivalent (and hence physically equivalent) to each other. The map J : Ψ 7→ √ 1 Ψ = ψ which relates two 2ωp physically equivalent elements with each other, provides the unitary map from H onto H. This map should be interpreted as some kind of change of variables on the space of functions. The e↑ on H can easily be made into a representation u of P e↑ on H by using the representation U of P + + map J: p 1 [U (a, A) ( 2ωp ψ)](p, σ). 
[u(a, A)ψ](p, σ) = [JU (a, A)J −1 ψ](p, σ) = p | {z } 2ωp ∈H From now on we will write both representations U and u as U ; it will always be clear from the context which one is meant. In the following chapters we will need to switch often between both Hilbert spaces H and H to describe one-particle states, but we will always make a very explicit distinction between the two descriptions. The Fourier transform of the momentum-spin wave function can be interpreted as the position (and spin) wave function. The position operators X j act on the momentum space wave function as the operators X j = i ∂p∂ j . Therefore, the action of X j on Ψ(p, σ) is given by the operator Xj = p 1 ∂ 2p0 i j p . ∂p 2p0 If m = 0 then the Hilbert spaces H(p) are all one-dimensional and therefore the state vectors are functions Ψ(p) of the three-momentum. The action of the helicity operator λ on Ψ is just a multiplication by σ, i.e. λΨ = σΨ; here σ denotes the label that occurs in the classification of the one-dimensional representations of (the double cover of) E+ (2). The absolute value |σ| of the label σ is called the spin of the particle and σ itself is called the helicity of the particle. Because for the helicity we have M·P σ1H = , |P| the helicity measures the angular momentum of the particle in the direction of its three-momentum p. Unlike the spin components σ for massive particles, the quantity σ for massless particles is fixed for a given particle type. For example, if neutrinos are massless (which might not be the case), then they would have σ = − 21 and anti-neutrinos would have σ = 12 . However, as we will see after equation (2.39) below, there are cases where the particles with helicities σ and −σ should be identified. In that case H is a direct sum of two irreducible representations, corresponding to helicity σ and −σ, and the state vectors are described by functions Ψ(p, σ) with σ taking on the two values ±σ. 
From now on we will always use the notation Ψ(p, σ) to denote the state vector of a massless particle, even if σ can only take on one value. As for massive particles, we also define the Hilbert space H of momentum-spin wave functions ψ(p, σ) for massless particles. However, for massless particles it is not possible to give a satisfactory definition of position operators.

Space inversion and time reversal

Some quantum systems are not only invariant under \(\widetilde{\mathcal{P}}^\uparrow_+\), but also under the action of a space inversion \(I_s : (x^0, \mathbf{x}) \mapsto (x^0, -\mathbf{x})\) or a time reversal \(I_t : (x^0, \mathbf{x}) \mapsto (-x^0, \mathbf{x})\), or the combination \(I_s I_t : x \mapsto -x\). By Wigner's theorem, these transformations can then be represented by unitary or antiunitary operators P, T and PT on the Hilbert space H corresponding to the quantum system. It can be shown (see section 2.6 of [35]) that in order to avoid the existence of negative-energy states, we must choose P to be linear and unitary and T to be antilinear and antiunitary. Without any derivation, we will now simply give the action of P and T on states transforming irreducibly under \(\widetilde{\mathcal{P}}^\uparrow_+\), i.e. on one-particle states.

For massive particles the action of P is given by
\[ P e_\sigma(p) = \xi\, e_\sigma(I_s p), \tag{2.37} \]
where ξ is a phase factor that only depends on the species of particle; it is called the intrinsic parity of the particle. The action of T is given by
\[ T e_\sigma(p) = \zeta \cdot (-1)^{s-\sigma}\, e_{-\sigma}(I_s p), \tag{2.38} \]
where ζ is a phase factor that only depends on the species of particle and s is the spin of the particle. In contrast to the intrinsic parity ξ, the phase factor ζ has no physical significance, because when we redefine \(e_\sigma(p)\) by \(\zeta^{1/2} e_\sigma(p)\), the factor ζ cancels out in equation (2.38). This trick does not work for the intrinsic parity ξ because P is linear, rather than antilinear.
For massless particles the action of P is given by
\[ P e_\sigma(p) = \xi_\sigma \cdot e^{i\pi\epsilon(p)\sigma}\, e_{-\sigma}(I_s p), \tag{2.39} \]
where \(\xi_\sigma\) is a phase factor and \(\epsilon(p) \in \{-1, 1\}\) is the sign of the \(x^2\)-component of p. Thus, if a theory is invariant under space inversions, then massless particles in this theory with some σ should be identified with the particles obtained by substituting σ → −σ. This happens for instance in quantum electrodynamics, where the massless particles with σ = 1 and σ = −1 are identified and are both referred to as photons. The action of T is given by
\[ T e_\sigma(p) = \zeta_\sigma \cdot e^{i\pi\epsilon(p)\sigma}\, e_\sigma(I_s p), \tag{2.40} \]
where \(\zeta_\sigma\) is a phase factor and \(\epsilon(p)\) is as above.

2.2.5 Many-particle states and Fock space

Most of the material covered in this subsection can be found in one of the texts [1], [2] or [8]. Suppose that we have a system consisting of n non-interacting distinguishable particles; here distinguishable means that all particles are of a different type. If the individual particles are described by one-particle states in Hilbert spaces^{17} H_1, . . . , H_n, then the total system is described by the Hilbert space H_1 ⊗ . . . ⊗ H_n, and the algebra of observables of the total system is the tensor product of the algebras of observables of the individual particles. If the system consists of n non-interacting particles of the same type, then the Hilbert space of the system is still H^{⊗n}, with H the Hilbert space for a single particle of the given type, but not all unit rays in this Hilbert space represent physically realizable states. This comes from the fact that in quantum mechanics two particles of the same type cannot be distinguished when they form a single system; there is no way of keeping track of which particle is which. Mathematically, this means the following.
^{17} Here and in the rest of this section the one-particle Hilbert spaces H either all represent the spaces of state functions Ψ(p, σ) or else they all represent the spaces of momentum-spin wave functions ψ(p, σ), but one should be consistent in this choice. Thus, in all definitions we either work with state functions throughout, or with momentum-spin wave functions throughout.

If \(S_n\) denotes the symmetric group on n objects, we define for each \(\sigma \in S_n\) a unitary operator \(R^{(n)}(\sigma) : H^{\otimes n} \to H^{\otimes n}\) by
\[ R^{(n)}(\sigma)(h_1 \otimes \ldots \otimes h_n) = h_{\sigma(1)} \otimes \ldots \otimes h_{\sigma(n)}. \]
Note that this defines a unitary representation of \(S_n\) on \(H^{\otimes n}\). The statement that the particles are truly indistinguishable is then equivalent to saying that for any physically realizable pure state \(h \in H^{\otimes n}\) and for any \(\sigma \in S_n\) we must have^{18}
\[ R^{(n)}(\sigma)h = \lambda(\sigma, n, h)h \tag{2.41} \]
with \(\lambda(\sigma, n, h)\) a complex number of absolute value 1. Note that it follows from the linearity of \(R^{(n)}\) that for any nonzero complex number c we have \(R^{(n)}(\sigma)(ch) = \lambda(\sigma, n, h)ch\), so that λ satisfies \(\lambda(\sigma, n, ch) = \lambda(\sigma, n, h)\); this holds in particular whenever |c| = 1, and therefore \(\lambda(\sigma, n, h)\) is independent of the choice of unit vector in the unit ray R(h). For \(\sigma_1, \sigma_2 \in S_n\) and for any physically realizable state \(h \in H^{\otimes n}\) we have
\[ \lambda(\sigma_1\sigma_2, n, h)h = R^{(n)}(\sigma_1\sigma_2)h = R^{(n)}(\sigma_1)R^{(n)}(\sigma_2)h = \lambda(\sigma_1, n, h)\lambda(\sigma_2, n, h)h, \]
so for any physically realizable state \(h \in H^{\otimes n}\) the map \(\lambda(\,\cdot\,, n, h) : S_n \to U(1)\) defines a 1-dimensional representation of \(S_n\) on the space \(\mathbb{C}h\). But the only two 1-dimensional representations of \(S_n\) are the completely symmetric one, \(\lambda_S(\sigma) = 1\) for all \(\sigma \in S_n\), and the completely antisymmetric one, \(\lambda_A(\sigma) = \epsilon(\sigma)\) for all \(\sigma \in S_n\), where \(\epsilon : S_n \to \{-1, 1\}\) denotes the sign of the permutation. Thus, for any physically realizable state \(h \in H^{\otimes n}\) we either have \(\lambda(\sigma, n, h) = \lambda_S(\sigma)\) or \(\lambda(\sigma, n, h) = \lambda_A(\sigma)\). In the first case we say that h is a completely symmetric state and in the second case we say that h is a completely antisymmetric state.
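The conclusion that λ must be the trivial or the sign representation rests on the sign being multiplicative. A small illustrative check (plain Python, not from the thesis) that \(\epsilon(\sigma_1\sigma_2) = \epsilon(\sigma_1)\epsilon(\sigma_2)\) over all of \(S_4\):

```python
import itertools

def sign(perm):
    """Sign of a permutation given as a tuple of 0..n-1 (cycle-sort by transpositions)."""
    s, perm = 1, list(perm)
    for i in range(len(perm)):
        while perm[i] != i:          # each swap is a transposition, flipping the sign
            j = perm[i]
            perm[i], perm[j] = perm[j], perm[i]
            s = -s
    return s

def compose(s1, s2):
    """(s1 s2)(i) = s1(s2(i))."""
    return tuple(s1[s2[i]] for i in range(len(s1)))

perms = list(itertools.permutations(range(4)))
print(all(sign(compose(p, q)) == sign(p) * sign(q)
          for p in perms for q in perms))   # True
```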
The set of completely symmetric state vectors forms a linear subspace of \(H^{\otimes n}\) which we denote by \(F^n_+(H)\), and the set of completely antisymmetric state vectors forms a linear subspace of \(H^{\otimes n}\) which we denote by \(F^n_-(H)\). The orthogonal projections \(P^+_n : H^{\otimes n} \to F^n_+(H)\) and \(P^-_n : H^{\otimes n} \to F^n_-(H)\) onto \(F^n_\pm(H)\) are given by
\[ P^+_n = \frac{1}{n!}\sum_{\sigma \in S_n} R(\sigma), \qquad P^-_n = \frac{1}{n!}\sum_{\sigma \in S_n} \epsilon(\sigma)R(\sigma). \]
It is obvious that a superposition of a state in \(F^n_+(H)\) with a state in \(F^n_-(H)\) does not satisfy (2.41) and is therefore not physically realizable. It turns out that for any given particle type occurring in nature, the state vectors of an n-particle system of particles of that type are either always symmetric or else always antisymmetric. Furthermore, the choice between the symmetric and the antisymmetric case is the same for each n. So for any given particle type we either have that the space of physically realizable states of n particles is \(F^n_+(H)\) and that \(\lambda(\sigma, n, h) = \lambda_S(\sigma)\) for all \(h \in F^n_+(H)\), or else we have that the space of physically realizable states of n particles is \(F^n_-(H)\) and that \(\lambda(\sigma, n, h) = \lambda_A(\sigma)\) for all \(h \in F^n_-(H)\). In the first case we say that the given particle is a boson and in the second case we say that it is a fermion.

^{18} There are more general possibilities here than a phase factor λ, the more general condition being that \(R^{(n)}(\sigma)h\) is physically indistinguishable from h. Instead of by unit rays, physical states are then mathematically described by (higher-dimensional) generalized unit rays; see also [24]. Below we will see that λ defines a 1-dimensional representation of the permutation group. In the more general case one then also considers higher-dimensional representations of the permutation group \(S_n\) induced by \(R^{(n)}\) on \(H^{\otimes n}\). This more general theory is also referred to as parastatistics and is of an entirely different nature than the generalizations that one encounters in 3-dimensional spacetime (i.e. braid statistics, anyons, etc.).
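The projections \(P^\pm_n\) can be realized concretely as matrices on \((\mathbb{C}^d)^{\otimes n}\). The sketch below (assuming NumPy; purely illustrative) builds the operators \(R(\sigma)\), forms the group averages, and checks that they are projections onto mutually orthogonal subspaces of the expected dimensions \(\binom{d+n-1}{n}\) and \(\binom{d}{n}\):

```python
import itertools
import numpy as np

def R(perm, d):
    """R(perm) on (C^d)^{tensor n}: sends e_{i_1} x ... x e_{i_n} to the permuted basis vector."""
    n = len(perm)
    M = np.zeros((d**n, d**n))
    for idx in itertools.product(range(d), repeat=n):
        src = sum(i * d**(n-1-k) for k, i in enumerate(idx))
        out = tuple(idx[perm[k]] for k in range(n))
        dst = sum(i * d**(n-1-k) for k, i in enumerate(out))
        M[dst, src] = 1.0
    return M

def sgn(perm):
    """Sign of a permutation, via the determinant of its permutation matrix."""
    P = np.zeros((len(perm), len(perm)))
    for i, j in enumerate(perm):
        P[i, j] = 1.0
    return int(round(np.linalg.det(P)))

n, d = 3, 3
perms = list(itertools.permutations(range(n)))
Pp = sum(R(p, d) for p in perms) / len(perms)            # symmetrizer P_n^+
Pm = sum(sgn(p) * R(p, d) for p in perms) / len(perms)   # antisymmetrizer P_n^-

print(np.allclose(Pp @ Pp, Pp), np.allclose(Pm @ Pm, Pm))  # both are projections
print(np.allclose(Pp @ Pm, 0))                              # onto orthogonal subspaces
print(int(round(np.trace(Pp))), int(round(np.trace(Pm))))   # dims 10 and 1 for n = d = 3
```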
Note that in the case of two particles (n = 2) there is actually no generalization because H ⊗2 decomposes precisely into a direct sum of the two subspaces corresponding to the two 1-dimensional representations of S2 . This is related to the fact that we can write v ⊗ w as 12 (v ⊗ w + w ⊗ v) + 12 (v ⊗ w − w ⊗ v). 47 Because we often have to consider systems of identical particles in which the number of particles may change, we introduce the Hilbert spaces F± (H) = ∞ M n F± (H), n=0 0 (H) ' C represents the vacuum state, i.e. the state with no particles. We will choose a where F± 0 (H) and call it the Fock vacuum. For each one-particle state vector h ∈ H we unit vector Ω ∈ F± define a (densely defined) operator A∗± (h) : F± (H) → F± (H) by A∗± (h)Ω = h √ ± A∗± (h)Pn± (h1 ⊗ . . . ⊗ hn ) = n + 1Pn+1 (h ⊗ h1 ⊗ . . . ⊗ hn ). (2.42) (2.43) n (H) into F n+1 (H) by ’creating’ an extra particle with state vector h. For This operator maps F± ± ∗ this reason A± (h) is called a creation operator. Note that it is defined on the dense subspace ± D = ∞ M n [ j F± (H) n=0 j=0 of F± (H), and that it leaves this subspace invariant. Furthermore, it can be shown that the operator A∗± (h) is closable. Finally, note that the mapping h 7→ A∗± (h) is linear. For vectors h1 , . . . , hn ∈ H it follows easily that √ A∗± (h1 ) . . . A∗± (hn )Ω = n!Pn± (h1 ⊗ . . . ⊗ hn ). The inner product on D± can be expressed as ± hPn± (h1 ⊗ . . . ⊗ hn ), Pm (g1 ⊗ . . . ⊗ gm )iD± = 1 n!m! X ± (σ)± (σ 0 ) σ∈Sn ,σ 0 ∈Sm hhσ(1) ⊗ . . . ⊗ hσ(n) , gσ0 (1) ⊗ . . . ⊗ gσ0 (m) i δnm X ± = (σσ 0 )hhσ(1) , gσ0 (1) i . . . hhσ(n) , gσ0 (n) i (n!)2 0 σ,σ ∈Sn = = δnm (n!)2 X σ,σ 0 ∈Sn ± (σσ 0 ) hh1 , gσ0 σ−1 (1) i . . . hh1 , gσ0 σ−1 (n) i | {z } =± (σ 0 σ −1 ) δmn X ± (σ)hh1 , gσ(1) i . . . hhn , gσ(n) i, n! σ∈Sn where + (σ) = 1 and − (σ) = (σ) for all σ ∈ Sn . The action of the adjoint operator A± (h) : F± (H) → F± (H) of A∗± (h) on the subspace D± is given by A± (h)Ω = 0 A± (h)Pn± (h1 ⊗ . . . 
⊗ hn ) = 1 √ n (2.44) n X ± (±1)j−1 hhj , hiPn−1 (h1 ⊗ . . . , hj−1 ⊗ hj+1 , . . . hn ). (2.45) j=1 n (H) into F n−1 (H), it is called an annihilation operator. Because Because this operator maps F± ± A± (h) is the adjoint of a densely defined operator, it is a closed operator. Unlike h 7→ A∗± (h), the mapping h 7→ A± (h) is antilinear. furthermore that A± (h) is the restriction to the space L∞ Note L∞ ⊗n ⊗n F± (H) of the operator B(h) : n=0 H → n=0 H , defined by √ B(h)(h1 ⊗ . . . ⊗ hn ) = nhh1 , hih2 ⊗ . . . ⊗ hn . 48 Indeed, when we apply B(h) to the vector Pn± (h1 ⊗ . . . ⊗ hn ), we get precisely the right-hand side of (2.45). The creation and annihilation operators satisfy the following relations: [A∗± (h1 ), A∗± (h2 )]∓ = [A± (h1 ), A± (h2 )]∓ = 0 [A± (h1 ), A∗± (h2 )]∓ = hh2 , h1 i1F± (H) , where [X, Y ]± = XY ± Y X. Finally, we note that when we have an operator L on the one-particle Hilbert space H, we can define an operator Γ± (L) on D± by Γ± (L)Pn± (h1 ⊗ . . . ⊗ hn ) = Pn± (Lh1 ⊗ . . . ⊗ Lhn ). n (H) invariant. Here, by definition, we set Γ± (L)Ω = Ω. The operator Γ± (L) thus leaves all F± Note that if L is invertible, then Γ± (L)A∗± (h)Γ± (L)−1 Pn± (h1 ⊗ . . . ⊗ hn ) = Γ± (L)A∗± (h)Pn± (L−1 h1 ⊗ . . . ⊗ L−1 hn ) √ ± = n + 1Γ± (L)Pn+1 (h ⊗ L−1 h1 ⊗ . . . ⊗ L−1 hn ) √ ± = n + 1Pn+1 (Lh ⊗ h1 ⊗ . . . ⊗ hn ) = A∗± (Lh)Pn± (h1 ⊗ . . . ⊗ hn ), so if L is invertible then Γ± (L)A∗± (h)Γ± (L)−1 = A∗± (Lh) (2.46) on D± . In case L is also unitary, so that L−1 = L∗ , we have Γ± (L)A± (h)Γ± (L)−1 Pn± (h1 ⊗ . . . ⊗ hn ) = Γ± (L)A± (h)Pn± (L−1 h1 ⊗ . . . ⊗ L−1 hn ) n X 1 ± = √ Γ (L) (±1)j−1 hh, L−1 hj i n = j=1 ± −1 Pn−1 (L h1 ⊗ . . . n 1 X j−1 √ n (±1) ⊗ L−1 hj−1 ⊗ L−1 hj+1 ⊗ . . . ⊗ L−1 hn ) hLh, hj i j=1 ± (h1 ⊗ . . . ⊗ hj−1 ⊗ hj+1 ⊗ . . . ⊗ hn ) Pn−1 = A± (Lh)Pn± (h1 ⊗ . . . ⊗ hn ), so if L is unitary then Γ± (L)A± (h)Γ± (L)−1 = A± (Lh) (2.47) on D± . If L and K are operators on H then Γ± (LK) = Γ± (L)Γ± (K), and we also have Γ± (1H ) = 1F± (H) . 
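For a finite-dimensional one-particle space the fermionic relations \([A_-(h_1), A^*_-(h_2)]_+ = \langle h_2, h_1\rangle 1\) can be realized by explicit matrices. The following sketch uses the Jordan-Wigner construction (not discussed in the thesis; it assumes NumPy) to build creation operators for d fermionic modes on the \(2^d\)-dimensional Fock space \(F_-(\mathbb{C}^d)\) and verifies the canonical anticommutation relations:

```python
import numpy as np
from functools import reduce

def kron_list(mats):
    return reduce(np.kron, mats)

def fermion_ops(d):
    """Creation operators a*_1..a*_d on the fermionic Fock space over C^d
    (dimension 2^d), via the Jordan-Wigner sign strings."""
    ap = np.array([[0, 0], [1, 0]], dtype=complex)    # single-mode creation
    Z = np.diag([1.0, -1.0]).astype(complex)          # sign string factor
    I = np.eye(2, dtype=complex)
    return [kron_list([Z]*j + [ap] + [I]*(d-1-j)) for j in range(d)]

d = 3
A = fermion_ops(d)
Id = np.eye(2**d)
anti = lambda X, Y: X @ Y + Y @ X

ok1 = all(np.allclose(anti(A[i], A[j]), 0) for i in range(d) for j in range(d))
ok2 = all(np.allclose(anti(A[i].conj().T, A[j]), (1.0 if i == j else 0.0) * Id)
          for i in range(d) for j in range(d))
print(ok1, ok2)   # True True
```

In the thesis's notation this is \([A^*_-(e_i), A^*_-(e_j)]_+ = 0\) and \([A_-(e_i), A^*_-(e_j)]_+ = \delta_{ij} 1\) for an orthonormal basis \(\{e_i\}\) of the one-particle space.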
Thus, if φ : G → B(H) is a (unitary) representation of a group G on H, then Γ± (φ) := Γ± ◦ φ defines a (unitary) representation of G on F± (H). In particular, this holds for the unitary e↑ on H, so we get a unitary representation Γ± (U ) on F± (H). This representation19 U of P + representation is clearly reducible and the only vector in F± (H) which is invariant under Γ± (U ) e↑ we have is the vacuum vector Ω. Using (2.46) and (2.47) we also find that for any (a, A) ∈ P + Γ± (U (a, A))A∗± (h)Γ± (U (a, A))−1 = A∗± (U (a, A)h) Γ± (U (a, A))A± (h)Γ± (U (a, A))−1 = A± (U (a, A)h). These are the transformation properties of the creation and annihilation operators under the e↑ and we will need them several times in the following chapters. Of course, if there are group P + also parity and time reversal operators P and T defined on the one-particle space H, then we also obtain operators Γ± (P) and Γ± (T) on F± (H). e↑ on Recall from section 2.2.4 that we use the same notation (namely U ) to denote the representations of P + H = H and H = H. 19 49 In case we have a system that contains different types {τ }τ ∈T of particles that are not interacting, we proceed as follows. Let H [τ ] denote the Hilbert space of one-particle states for the particle type τ . So the vectors in this Hilbert space transform according to the irreducible unitary repree↑ with mass mτ and spin sτ (or helicity στ ), as described in the previous section. sentation of P + We then partition the set T of all particle types into two disjoint subsets TB and TF , consisting of all particle types that are bosons or fermions, respectively. Then the Hilbert spaces H1B and H1F of boson, respectively fermion, one-particle state vectors are defined by M M H1B := H [τ ] , H1F := H [τ ] . τ ∈TB τ ∈TF B/F If we write TB = {τB,1 , . . . , τB,kB } and TF = {τF,1 , . . . , τF,kF }, then an arbitrary vector h in H1 can be written as a sum h(τB/F,1 ) ⊕ · · · ⊕ h(τB/F,kB/F ) with h(τB/F,j ) ∈ H [τB/F,j ] . 
Because each h(τB/F,j ) is in turn a function of p and σ, we can write B/F h ∈ H1 as a function20 h(τ, p, σ), where the set of possible values of σ depends on τ . The inner product of two such functions h and g is then given by X hh, giH B/F = hh(τ ), g(τ )iH [τ ] 1 τ ∈TB/F = X XZ τ ∈TB/F σ∈Iτ h(τ, p, σ)g(τ, p, σ)dλ(p) (2.48) R3 where Iτ = {−sτ , . . . , sτ } if τ is a massive particle with spin sτ , Iτ = {στ } if τ is a massless particle with helicity στ and Iτ = {−στ , στ } if τ is a massless particle with possible helicities ±στ (see also d3 p the discussion above about parity); the volume element dλ(p) is either d3 p or 2ω , depending on p B/F B/F B/F ⊗n whether h, g ∈ H1 or h, gΨ ∈ H1 , respectively. The n-fold tensor product H1 can be identified with the closed linear span of all product functions h1 ⊗ . . . ⊗ hn ' h1 (τ1 , p1 , σ1 )h2 (τ2 , p2 , σ2 ) . . . hn (τn , pn , σn ) B/F with all hj ∈ H1 . We can then construct the n-fold symmetrized (respectively antisymmetrized) n (H B/F ) by using the projection operators P ± : the space F n (H B/F ) is the closed tensor products F± n ± 1 1 linear span of all functions of the form 1 X ± (ρ)hρ(1) (τ1 , p1 , σ1 )hρ(2) (τ2 , p2 , σ2 ) . . . hρ(n) (τn , pn , σn ). Pn (h1 ⊗ . . . ⊗ hn ) = n! ρ∈Sn B/F We then take the direct sum of all these spaces to obtain the Fock spaces F± (H1 ). On B/F these spaces F± (H1 ) we can define, as in (2.42)-(2.45), the creation and annihilation operaB/F B/F tors A∗± (h) and A± (h) for vectors h ∈ H1 . Because we can write each such h ∈ H1 as a direct sum of vectors h(τ ) ∈ H [τ ] with all τ ∈ TB/F , it is useful to introduce a special notation for creation and annihilation operators corresponding to a single particle species. We will write A∗ (τ, h) and A(τ, h) to denote the creation and annihilation operators corresponding to a vec(∗) B/F tor h = h(p, σ) ∈ H[τ ] . In other words, A(∗) (τ, h) = A± (g), where g ∈ H1 is the function L 0 (∗) 0 g(τ , p, σ) = τ ∈TB/F δτ τ h(p, σ). 
Note that we suppress the subindex ± in A (τ, .), because the choice between + and − follows from τ . The Fock space corresponding to the entire system of particles T = TB ∪ TF is the tensor product HFock := F+ (H1B ) ⊗ F− (H1F ) (2.49) B/F 20 These functions h(τ, p, σ) can be either state functions Ψ ∈ H1 B/F ψ ∈ H1 . 50 or else momentum-spin wave functions of the boson Fock space F+ (H1B ) and the fermion Fock space F− (H1F ). If ΩB and ΩF denote the vacuum vectors of F+ (H1B ) and F+ (H1F ), then the vector ΩFock := ΩB ⊗ ΩF is called the vacuum vector of HFock . When τ is a boson, respectively fermion, we can let the operators A(∗) (τ, h) act on the entire space HFock by taking the tensor product A(∗) (τ, h) ⊗ 1F− (H F ) , respectively 1 1F+ (H B ) ⊗ A(∗) (τ, h). If τ and τ 0 are both bosons or both fermions, so either τ, τ 0 ∈ TB or 1 τ, τ 0 ∈ TF , then these operators satisfy the relations [A∗ (τ, h1 ), A∗ (τ 0 , h2 )]∓ = [A(τ, h1 ), A(τ 0 , h2 )]∓ = 0 [A(τ, h1 ), A∗ (τ 0 , h2 )]∓ = δτ,τ 0 hh2 , h1 i1HFock , where the upper sign corresponds to the boson case and the lower sign to the fermion case. Note that interchanging two operators corresponding to different fermion types costs a minus sign. If one of the particles τ and τ 0 is a boson and the other is a fermion, then their creation and annihilation operators commute with each other. We finally note that the unitary representations e↑ on the one-particle spaces H [τ ] define unitary representations UB := L {Uτ }τ ∈T of P + τ ∈TB Uτ and L ↑ B F e on H and H , respectively. These can then be used to define a unitary UF := Uτ of P τ ∈TF + 1 1 representation UFock := Γ+ (UB ) ⊗ Γ− (UF ) on HFock . If all one-particle spaces H [τ ] also contain parity and time reversal operators, then a similar construction gives us parity and time reversal operators on HFock . 
As emphasized above, all definitions in this section hold both for the Hilbert space $H$ of one-particle state functions $\Psi(p, \sigma)$ and for the Hilbert space $\mathcal{H}$ of one-particle momentum-spin wave functions $\psi(p, \sigma)$. In the first case (when the one-particle space is taken to be $H$, so $H^{[\tau]}$, etc.) we obtain the Fock spaces $F_\pm(H)$ and $H_{\mathrm{Fock}}$, and in this case we will write the creation and annihilation operators with a capital letter $A$: $A_\pm^{(*)}(\Psi)$, $A^{(*)}(\tau, \Psi)$, etc. In the second case (the one-particle space $\mathcal{H}$, $\mathcal{H}^{[\tau]}$, etc.) we obtain the Fock spaces $F_\pm(\mathcal{H})$ and $\mathcal{H}_{\mathrm{Fock}}$, and in this case we will always write the creation and annihilation operators with a small letter $a$: $a_\pm^{(*)}(\psi)$, $a^{(*)}(\tau, \psi)$, etc.

In the previous section we defined the unitary map $J : H \to \mathcal{H}$ that relates the two Hilbert spaces. This map naturally extends to a family of maps $\Gamma_n(J) : H^{\otimes n} \to \mathcal{H}^{\otimes n}$ by defining $\Gamma_n(J)(\Psi_1 \otimes \dots \otimes \Psi_n) = (J\Psi_1) \otimes \dots \otimes (J\Psi_n)$. For each $n$, the restriction of $\Gamma_n(J)$ to $F_\pm^n(H)$ defines a unitary map $\Gamma_\pm^n(J) : F_\pm^n(H) \to F_\pm^n(\mathcal{H})$. Hence, there is a unitary map $\Gamma_\pm(J) : F_\pm(H) \to F_\pm(\mathcal{H})$ that relates the two Fock spaces. For $\Upsilon \in F_\pm(H)$, the element $\Gamma_\pm(J)\Upsilon$ is physically equivalent to $\Upsilon$ in a similar sense that, for $\Psi \in H$, the element $J\Psi$ is physically equivalent to $\Psi$. Using $\Gamma_\pm(J)$, we can express the relation between $A^{(*)}$ and $a^{(*)}$ as
$$A_\pm^{(*)}(\Psi) = \Gamma_\pm(J)^{-1}\, a_\pm^{(*)}(J\Psi)\, \Gamma_\pm(J), \qquad a_\pm^{(*)}(\psi) = \Gamma_\pm(J)\, A_\pm^{(*)}(J^{-1}\psi)\, \Gamma_\pm(J)^{-1}.$$
Thus $A_\pm^{(*)}(\Psi)$ and $a_\pm^{(*)}(J\Psi)$ represent the same physics in the sense that if $\Upsilon \in F_\pm(H)$ and if $\upsilon = \Gamma_\pm(J)\Upsilon \in F_\pm(\mathcal{H})$ (which is physically equivalent to $\Upsilon$), then $A_\pm^{(*)}(\Psi)\Upsilon$ is physically equivalent to $a_\pm^{(*)}(J\Psi)\upsilon$. Especially in the next chapter, on the physics of quantum fields, the description in terms of $\mathcal{H}$ will be most convenient. This coincides with the notation in most of the physics literature, where small letters $a$ are used to denote creation and annihilation operators.
3 The physics of quantum fields

In this chapter we will give a brief overview of the use of quantum fields in physics in the so-called canonical formalism (as opposed to the path-integral formalism). We will do this by following the main arguments stated in chapters 3, 4 and 5 of Weinberg's book [35]. In contrast to the previous chapters (and the following ones), this chapter is not mathematically rigorous. Also, we will not provide any derivations of the results in this chapter, since they can all be found in [35]. The purpose of this chapter is merely to give some physical background on the mathematical constructions in the next chapter and to motivate the content of the axioms of the two mathematical frameworks that will be discussed there.

3.1 The interaction picture and scattering theory

For any quantum system with Hamiltonian $H$ the time-evolution operator is given by $U_H(t) = e^{-itH}$; see also subsection 2.2.2. Now assume that this Hamiltonian can be written as $H = H_0 + V$, where $H_0$ is a Hamiltonian which corresponds to an easy and well-understood quantum system and $V$ is some extra term which makes the system more complicated. We thus assume that we already know the evolution operator $U_{H_0}(t) = e^{-itH_0}$, and we want to express $U_H(t)$ in terms of $U_{H_0}(t)$ and $V$. For this purpose we introduce the so-called interaction picture, which lies in between the Heisenberg picture and the Schrödinger picture.

The interaction picture

In the interaction picture an observable $A$ evolves according to $A_I(t) = U_{H_0}(-t)\, A\, U_{H_0}(t)$ and a state vector $\Psi$ evolves according to $\Psi_I(t) = \Omega(t)^* \Psi$, where
$$\Omega(t) := U_H(-t)\, U_{H_0}(t) \quad \text{and hence} \quad \Omega(t)^* = U_{H_0}(-t)\, U_H(t).$$
It is easy to see that the interaction picture is physically equivalent to the Heisenberg picture (which is in turn physically equivalent to the Schrödinger picture), since
$$\langle A_I(t)\Psi_I(t), \Psi_I(t)\rangle = \langle U_{H_0}(-t) A\, U_{H_0}(t) U_{H_0}(-t) U_H(t)\Psi,\; U_{H_0}(-t) U_H(t)\Psi\rangle = \langle U_{H_0}(-t) A\, U_H(t)\Psi,\; U_{H_0}(-t) U_H(t)\Psi\rangle = \langle A\, U_H(t)\Psi, U_H(t)\Psi\rangle = \langle U_H(-t) A\, U_H(t)\Psi, \Psi\rangle.$$
Since we are interested in $U_H(t)$ and because $U_H(t) = U_{H_0}(t)\Omega(t)^*$, we must thus find a way to calculate $\Omega(t)^*$. It follows directly from the expression $\Omega(t)^* = U_{H_0}(-t) U_H(t)$ that $\Omega(t)^*$ satisfies
$$\frac{d\Omega(t)^*}{dt} = \frac{1}{i}\, V_I(t)\, \Omega(t)^*,$$
where $V_I(t)$ denotes the interaction picture evolution of $V$. This allows us to write
$$\Omega(t)^* = \Omega(0)^* + \int_0^t \frac{d\Omega(\tau_1)^*}{d\tau_1}\, d\tau_1 = 1 + \frac{1}{i}\int_0^t V_I(\tau_1)\, \Omega(\tau_1)^*\, d\tau_1.$$
We can then insert this expression for $\Omega(t)^*$ into the right-hand side of this same equation and repeat this procedure over and over again, so that finally we obtain
$$\Omega(t)^* \sim 1 + \sum_{n=1}^\infty \frac{1}{i^n} \int_0^t \int_0^{\tau_n} \cdots \int_0^{\tau_2} V_I(\tau_n) V_I(\tau_{n-1}) \cdots V_I(\tau_1)\, d\tau_1 \cdots d\tau_{n-1}\, d\tau_n$$
$$= 1 + \sum_{n=1}^\infty \frac{1}{i^n} \int_{t > \tau_n > \tau_{n-1} > \dots > \tau_1 > 0} V_I(\tau_n) V_I(\tau_{n-1}) \cdots V_I(\tau_1)\, d\tau_1 \cdots d\tau_{n-1}\, d\tau_n$$
$$= 1 + \sum_{n=1}^\infty \frac{1}{i^n\, n!} \int_0^t \cdots \int_0^t T\{V_I(\tau_n) V_I(\tau_{n-1}) \cdots V_I(\tau_1)\}\, d\tau_1 \cdots d\tau_{n-1}\, d\tau_n, \qquad (3.1)$$
where $T\{\,\}$ denotes the time-ordered product (i.e. the operators are ordered in such a way that the time variables of the operators are in anti-chronological order from left to right). This is the expression for $\Omega(t)^*$ that we needed. We will now apply these results to the case of a scattering experiment.

Scattering experiments

In a typical scattering experiment, particles approach each other from very large mutual distances. They will then interact with each other in some small region in space, and finally a collection of particles (not necessarily the same ones) comes out of this region and moves apart to large mutual distances. At the beginning and at the end of such an experiment the particles are so far apart that they do not interact with each other.
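The repeated substitution that produced (3.1) can be checked numerically for finite-dimensional toy Hamiltonians, where $\Omega(t)^* = e^{itH_0}e^{-itH}$ is computable exactly. The sketch below (my own illustration under these assumptions, not part of the thesis) iterates the integral equation for $\Omega(t)^*$ on a time grid, each pass adding one order of the series, and compares with the exact result.

```python
import numpy as np

rng = np.random.default_rng(0)

def herm(n):
    """A random Hermitian n x n matrix."""
    m = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    return (m + m.conj().T) / 2

def U(Hm, t):
    """exp(-i t Hm) for Hermitian Hm, via eigendecomposition."""
    w, P = np.linalg.eigh(Hm)
    return (P * np.exp(-1j * t * w)) @ P.conj().T

n, t_final, steps = 3, 1.0, 2000
H0, V = herm(n), 0.3 * herm(n)
H = H0 + V
ts = np.linspace(0.0, t_final, steps + 1)
VI = [U(H0, -t) @ V @ U(H0, t) for t in ts]   # interaction-picture V_I(t)

# repeated substitution: Omega* <- 1 + (1/i) \int_0^t V_I(tau) Omega*(tau) dtau
Omega = [np.eye(n, dtype=complex) for _ in ts]
for _ in range(12):                            # each pass adds one order
    new = [np.eye(n, dtype=complex)]
    integ = np.zeros((n, n), dtype=complex)
    for k in range(steps):
        dt = ts[k + 1] - ts[k]
        integ = integ + 0.5 * dt * (VI[k] @ Omega[k] + VI[k + 1] @ Omega[k + 1])
        new.append(np.eye(n, dtype=complex) + integ / 1j)
    Omega = new

exact = U(H0, -t_final) @ U(H, t_final)        # Omega(t)* = e^{itH0} e^{-itH}
print(np.max(np.abs(Omega[-1] - exact)))       # small: the series has converged
```

For this weak-coupling toy model the iteration converges rapidly; the convergence questions that make this expansion delicate in genuine quantum field theory are of course invisible in finite dimensions.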
Therefore, if $H$ denotes the Hilbert space corresponding to the scattering experiment and $\Psi \in H$ is a pure state vector (in the Heisenberg picture), the transformed state vectors $e^{-iHt}\Psi$ must in some sense 'look like' state vectors in a free-particle Fock space $\mathcal{H}_{\mathrm{Fock}}$ when $t \to \pm\infty$; here $\mathcal{H}_{\mathrm{Fock}}$ stands for either of the two (physically equivalent) Fock spaces of section 2.2.5. Mathematically, this means that we have two linear isometric embeddings $\Omega^{\mathrm{in}}, \Omega^{\mathrm{out}} : \mathcal{H}_{\mathrm{Fock}} \to H$ from the free-particle Fock space into $H$. We then define $H^{\mathrm{in}}, H^{\mathrm{out}} \subset H$ by $H^{\mathrm{in}} = \Omega^{\mathrm{in}}\mathcal{H}_{\mathrm{Fock}}$ and $H^{\mathrm{out}} = \Omega^{\mathrm{out}}\mathcal{H}_{\mathrm{Fock}}$; these are called the spaces of asymptotic states of incoming and outgoing particles, respectively. Physically, the maps $\Omega^{\mathrm{in}}$ and $\Omega^{\mathrm{out}}$ should be interpreted as follows. If $h \in \mathcal{H}_{\mathrm{Fock}}$ is a state corresponding to a collection of non-interacting particles with some specified momenta and spin components, then $\Omega^{\mathrm{in}} h \in H$ represents the state vector in a scattering experiment where the state of the incoming particles (when they are not yet interacting) is given by $h$. The physical interpretation of $\Omega^{\mathrm{out}}$ is analogous. The linear operator
$$S = (\Omega^{\mathrm{out}})^*\, \Omega^{\mathrm{in}} : \mathcal{H}_{\mathrm{Fock}} \to \mathcal{H}_{\mathrm{Fock}}$$
is called the scattering operator. In physics it is often assumed that $H^{\mathrm{in}} = H^{\mathrm{out}}$ (so-called asymptotic completeness), which is equivalent to the requirement that $S$ is unitary. The scattering matrix, or S-matrix, is defined by
$$S_{\beta\alpha} := \langle S h_\alpha, h_\beta\rangle_{\mathcal{H}_{\mathrm{Fock}}} = \langle \Omega^{\mathrm{in}} h_\alpha, \Omega^{\mathrm{out}} h_\beta\rangle_H,$$
where $h_\alpha, h_\beta \in \mathcal{H}_{\mathrm{Fock}}$ are free-particle states. Note that it represents the transition amplitude for the transition $\Omega^{\mathrm{in}} h_\alpha \to \Omega^{\mathrm{out}} h_\beta$. Now recall the definition of the unitary representation $U_{\mathrm{Fock}}$ of $\widetilde{\mathcal{P}}_+^\uparrow$ on $\mathcal{H}_{\mathrm{Fock}}$ as given in subsection 2.2.5. The representation $U_{\mathrm{Fock}}$ induces two unitary representations $U^{\mathrm{in}} := \Omega^{\mathrm{in}}\, U_{\mathrm{Fock}}\, (\Omega^{\mathrm{in}})^*$ and $U^{\mathrm{out}} := \Omega^{\mathrm{out}}\, U_{\mathrm{Fock}}\, (\Omega^{\mathrm{out}})^*$ of $\widetilde{\mathcal{P}}_+^\uparrow$ on $H^{\mathrm{in}} = H^{\mathrm{out}}$. The theory will be Poincaré invariant if $U^{\mathrm{in}} = U^{\mathrm{out}}$, which is equivalent to $S U_{\mathrm{Fock}} = U_{\mathrm{Fock}} S$.
In physics textbooks it is often assumed that the free-particle states and the asymptotic states both live in the same Hilbert space $\mathcal{H}_{\mathrm{Fock}}$ of many-particle momentum-spin wave functions, so in the rest of this chapter we will take $\mathcal{H}_{\mathrm{Fock}}$ to be the wave-function Fock space (rather than the state-function Fock space). Also, the Hamiltonian $H$ of the system is assumed to be a sum $H = H_0 + V$ of a free-particle Hamiltonian $H_0$ and an interaction term $V$. Then the operator $\Omega(t) = U_H(-t)\, U_{H_0}(t) = e^{itH} e^{-itH_0}$ is defined, and $\Omega(\mp\infty) = \lim_{t\to\mp\infty}\Omega(t)$ correspond to $\Omega^{\mathrm{in}}$ and $\Omega^{\mathrm{out}}$, respectively, so that
$$S = \lim_{t\to\infty}\,\lim_{t'\to-\infty}\, \Omega(t)^*\, \Omega(t').$$
A similar calculation as the one which led to equation (3.1) gives that the operator $\Omega(t)^*\Omega(t')$ can be written as
$$\Omega(t)^*\Omega(t') = 1 + \sum_{n=1}^\infty \frac{1}{i^n\, n!} \int_{t'}^t \cdots \int_{t'}^t T\{V_I(\tau_n) V_I(\tau_{n-1})\cdots V_I(\tau_1)\}\, d\tau_1 \cdots d\tau_{n-1}\, d\tau_n.$$
The S-operator can then be written as
$$S = 1 + \sum_{n=1}^\infty \frac{(-i)^n}{n!} \int_{-\infty}^\infty \cdots \int_{-\infty}^\infty dt_1 \cdots dt_n\; T\{V_I(t_1)\cdots V_I(t_n)\}.$$
As a first step to guarantee that the S-matrix will be Lorentz invariant, it is assumed that $V_I(t)$ is of the form
$$V_I(t) = \int_{\mathbb{R}^3} \mathscr{H}(t, \mathbf{x})\, d^3x, \qquad (3.2)$$
with $\mathscr{H}(x)$ a scalar in the sense that
$$U_{\mathrm{Fock}}(a, A)\, \mathscr{H}(x)\, U_{\mathrm{Fock}}(a, A)^{-1} = \mathscr{H}(\Phi(A)x + a), \qquad (3.3)$$
where $\Phi : \widetilde{\mathcal{P}}_+^\uparrow \to \mathcal{P}_+^\uparrow$ denotes the covering map, as usual$^{21}$. The S-operator may then be written as
$$S = 1 + \sum_{n=1}^\infty \frac{(-i)^n}{n!} \int_M \cdots \int_M d^4x_1 \cdots d^4x_n\; T\{\mathscr{H}(x_1)\cdots\mathscr{H}(x_n)\}. \qquad (3.4)$$
The time ordering of two points in spacetime is only Lorentz invariant when the two points are not spacelike separated, so to obtain Lorentz invariance we must have that $\mathscr{H}(x)\mathscr{H}(y) = \mathscr{H}(y)\mathscr{H}(x)$ whenever $x$ and $y$ are spacelike separated. Thus, $\mathscr{H}(x)$ satisfies
$$[\mathscr{H}(x), \mathscr{H}(y)] = 0 \qquad (3.5)$$
when $(x-y)^2 < 0$. Actually, to avoid singularities when $x = y$, it is assumed that this holds even when $(x-y)^2 \leq 0$.

In physics it is assumed that different experiments that are carried out at large spatial distances from each other cannot influence one another. This is called the cluster decomposition principle.
In terms of the S-matrix, this principle implies that for multiparticle scattering processes $\alpha_1 \to \beta_1, \dots, \alpha_N \to \beta_N$ that are carried out at $N$ different laboratories that are far apart, the S-matrix element of the composite experiment will factorize:
$$S_{\beta_1 + \dots + \beta_N,\; \alpha_1 + \dots + \alpha_N} \to S_{\beta_1\alpha_1} \cdots S_{\beta_N\alpha_N}.$$
To see what kind of Hamiltonians will give rise to an S-matrix that satisfies this property, we need to consider the creation and annihilation operators $a^{(*)}(\tau, \psi)$ on the space $\mathcal{H}_{\mathrm{Fock}}$ defined in subsection 2.2.5. In the notation where we write a distribution $F$ as a function $F(x)$ in the sense that $F(f) = \int F(x) f(x)\, dx$, we can write the creation and annihilation operators $a^{(*)}(\tau, \cdot)$ as (operator-valued) functions $a^{(*)}(\tau, p, \sigma)$ in the sense that$^{22}$
$$a^*(\tau, \psi) = \sum_{\sigma \in I_\tau} \int_{\mathbb{R}^3} d^3p\; \psi(\tau, p, \sigma)\, a^*(\tau, p, \sigma),$$
$$a(\tau, \psi) = \sum_{\sigma \in I_\tau} \int_{\mathbb{R}^3} d^3p\; \overline{\psi(\tau, p, \sigma)}\, a(\tau, p, \sigma),$$
where we used that $a(\tau, \psi)$ depends conjugate-linearly on $\psi$. It is useful to introduce the so-called definite momentum-spin wave functions $\psi_{\tau', p', \sigma'}$. They are defined by
$$\psi_{\tau', p', \sigma'}(\tau, p, \sigma) = \delta_{\tau\tau'}\, \delta_{\sigma\sigma'}\, \delta(p - p').$$
The corresponding definite momentum-spin state functions $\Psi_{\tau', p', \sigma'}(\tau, p, \sigma)$ are then given by
$$\Psi_{\tau', p', \sigma'}(\tau, p, \sigma) = \sqrt{2\omega_{p'}}\, \delta_{\tau\tau'}\, \delta_{\sigma\sigma'}\, \delta(p - p'),$$
but they will not be very useful. The "inner product" of two definite-momentum wave functions is given by
$$\sum_\tau \sum_{\sigma \in I_\tau} \int_{\mathbb{R}^3} \psi_{\tau', p', \sigma'}(\tau, p, \sigma)\, \overline{\psi_{\tau'', p'', \sigma''}(\tau, p, \sigma)}\, d^3p = \delta_{\tau'\tau''}\, \delta_{\sigma'\sigma''} \int_{\mathbb{R}^3} d^3p\; \delta(p - p')\, \delta(p - p'') = \delta_{\tau'\tau''}\, \delta_{\sigma'\sigma''}\, \delta(p' - p'').$$

$^{21}$In some (gauge) theories both (3.2) and (3.3) are only approximately true, but this will not affect the Lorentz invariance. In such theories the interaction is obtained from a Lorentz invariant Lagrangian, which will guarantee Lorentz invariance in a more general way than equations (3.2) and (3.3). We will come back to Lagrangians later.
$^{22}$Here and in the rest of this chapter the index set $I_\tau$ is defined as in (2.48).
With the use of these definite momentum-spin wave functions, we can define the operators $a^{(*)}(\tau, p, \sigma)$ formally as $a^{(*)}(\psi_{\tau, p, \sigma})$, since
$$a^{(*)}(\tau', p', \sigma') = \sum_\tau \sum_{\sigma \in I_\tau} \int_{\mathbb{R}^3} d^3p\; \delta_{\tau\tau'}\, \delta_{\sigma\sigma'}\, \delta(p - p')\, a^{(*)}(\tau, p, \sigma) = \sum_\tau \sum_{\sigma \in I_\tau} \int_{\mathbb{R}^3} d^3p\; \psi_{\tau', p', \sigma'}(\tau, p, \sigma)\, a^{(*)}(\tau, p, \sigma) = a^{(*)}(\psi_{\tau', p', \sigma'}).$$
Since we are not being mathematically rigorous in this chapter, we can thus describe the action of these operators as
$$a^*(\tau, p, \sigma)\, P_n^\pm(\psi_1 \otimes \dots \otimes \psi_n) = \sqrt{n+1}\; P_{n+1}^\pm(\psi_{\tau, p, \sigma} \otimes \psi_1 \otimes \dots \otimes \psi_n),$$
$$a(\tau, p, \sigma)\, P_n^\pm(\psi_1 \otimes \dots \otimes \psi_n) = \frac{1}{\sqrt n} \sum_{j=1}^n (\pm 1)^{j-1}\, \langle \psi_j, \psi_{\tau, p, \sigma}\rangle\; P_{n-1}^\pm(\psi_1 \otimes \dots \otimes \psi_{j-1} \otimes \psi_{j+1} \otimes \dots \otimes \psi_n)$$
$$= \frac{1}{\sqrt n} \sum_{j=1}^n (\pm 1)^{j-1}\, \psi_j(\tau, p, \sigma)\; P_{n-1}^\pm(\psi_1 \otimes \dots \otimes \psi_{j-1} \otimes \psi_{j+1} \otimes \dots \otimes \psi_n).$$
The operator $a(\tau, p, \sigma)$ is the restriction to the subspace $F_\pm(\mathcal{H})$ of the operator $b(\tau, p, \sigma)$, defined on $\bigoplus_{n=0}^\infty \mathcal{H}^{\otimes n}$ by
$$b(\tau, p, \sigma)(\psi_1 \otimes \dots \otimes \psi_n) = \sqrt n\, \langle \psi_1, \psi_{\tau, p, \sigma}\rangle\, \psi_2 \otimes \dots \otimes \psi_n = \sqrt n\, \psi_1(\tau, p, \sigma)\, \psi_2 \otimes \dots \otimes \psi_n.$$
For an arbitrary function $\psi^{(n)} \in \mathcal{H}^{\otimes n}$ this means that
$$[b(\tau, p, \sigma)\psi^{(n)}](\tau_1, p_1, \sigma_1, \dots, \tau_{n-1}, p_{n-1}, \sigma_{n-1}) = \sqrt n\; \psi^{(n)}(\tau, p, \sigma, \tau_1, p_1, \sigma_1, \dots, \tau_{n-1}, p_{n-1}, \sigma_{n-1}).$$
If $\tau$ is a massive particle of spin $s_\tau$, then the operators $a^{(*)}(\tau, p, \sigma)$ transform under a transformation $(b, A) \in \widetilde{\mathcal{P}}_+^\uparrow$ as$^{23}$
$$U_{\mathrm{Fock}}(b, A)\, a_\tau(p, \sigma)\, U_{\mathrm{Fock}}(b, A)^{-1} = e^{-ib\cdot\Phi(A)p}\, \sqrt{\frac{(\Phi(A)p)^0}{p^0}}\, \sum_{\sigma' = -s_\tau}^{s_\tau} \big[D^{(s_\tau)}(M_{\Phi(A)p}^{-1} A M_p)^{-1}\big]_{\sigma\sigma'}\; a_\tau(\overrightarrow{\Phi(A)p}, \sigma'), \qquad (3.6)$$
$$U_{\mathrm{Fock}}(b, A)\, a_\tau^*(p, \sigma)\, U_{\mathrm{Fock}}(b, A)^{-1} = e^{ib\cdot\Phi(A)p}\, \sqrt{\frac{(\Phi(A)p)^0}{p^0}}\, \sum_{\sigma' = -s_\tau}^{s_\tau} \overline{\big[D^{(s_\tau)}(M_{\Phi(A)p}^{-1} A M_p)^{-1}\big]_{\sigma\sigma'}}\; a_\tau^*(\overrightarrow{\Phi(A)p}, \sigma'), \qquad (3.7)$$
where $\overrightarrow{\Phi(A)p}$ denotes the spatial part of $\Phi(A)p$.

$^{23}$In order to save space, we will from now on always write $a_\tau^{(*)}(p, \sigma)$ instead of $a^{(*)}(\tau, p, \sigma)$.
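The action of $a^*$ and $a$ on (anti)symmetrized product vectors given above can be tested directly in finite dimensions. The sketch below is my own illustration, not from the thesis: a toy $d$-dimensional one-particle space replaces $\mathcal{H}$, the momentum-spin label is replaced by a basis index, and the bosonic case is implemented with symmetric tensors. It checks the resulting commutator $[a(\varphi), a^*(\psi)] = \langle\psi, \varphi\rangle\, 1$ on a two-particle symmetric state.

```python
import math
import numpy as np
from itertools import permutations

rng = np.random.default_rng(1)
d = 3  # toy one-particle space C^d

def sym(T):
    """Symmetrizer P_n^+ on a rank-n tensor."""
    perms = list(permutations(range(T.ndim)))
    return sum(np.transpose(T, p) for p in perms) / len(perms)

def create(psi, T):
    """a*(psi) T = sqrt(n+1) P^+_{n+1}(psi x T) on a symmetric rank-n tensor T."""
    return math.sqrt(T.ndim + 1) * sym(np.tensordot(psi, T, axes=0))

def annihilate(phi, T):
    """a(phi) T = sqrt(n) <first slot of T, phi>: contract conj(phi) into slot 1."""
    return math.sqrt(T.ndim) * np.tensordot(np.conj(phi), T, axes=([0], [0]))

psi = rng.normal(size=d) + 1j * rng.normal(size=d)
phi = rng.normal(size=d) + 1j * rng.normal(size=d)
T = sym(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))  # 2-boson state

lhs = annihilate(phi, create(psi, T)) - create(psi, annihilate(phi, T))
rhs = np.vdot(phi, psi) * T   # <psi, phi> T, with <u,v> = sum_i u_i conj(v_i)
print(np.allclose(lhs, rhs))
```

Replacing the symmetrizer by the antisymmetrizer and inserting the $(\pm 1)^{j-1}$ signs gives the corresponding fermionic check with the anticommutator.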
If $\tau$ is a massless particle with (possible) helicity $\sigma_\tau$, these transformations become
$$U_{\mathrm{Fock}}(b, A)\, a_\tau(p, \sigma_\tau)\, U_{\mathrm{Fock}}(b, A)^{-1} = e^{-ib\cdot\Phi(A)p}\, \sqrt{\frac{(\Phi(A)p)^0}{p^0}}\; e^{-i\sigma_\tau\, \alpha(M_{\Phi(A)p}^{-1} A M_p)}\; a_\tau(\overrightarrow{\Phi(A)p}, \sigma_\tau), \qquad (3.8)$$
$$U_{\mathrm{Fock}}(b, A)\, a_\tau^*(p, \sigma_\tau)\, U_{\mathrm{Fock}}(b, A)^{-1} = e^{ib\cdot\Phi(A)p}\, \sqrt{\frac{(\Phi(A)p)^0}{p^0}}\; e^{i\sigma_\tau\, \alpha(M_{\Phi(A)p}^{-1} A M_p)}\; a_\tau^*(\overrightarrow{\Phi(A)p}, \sigma_\tau), \qquad (3.9)$$
where $\alpha(M)$ is defined in the same manner as in (2.36). The annihilation and creation operators, for both massive and massless particles, satisfy the (anti)commutation relations
$$[a_\tau(p', \sigma'), a_{\tau'}^*(p, \sigma)]_\pm = \delta_{\tau\tau'}\, \delta_{\sigma'\sigma}\, \delta(p' - p), \qquad (3.10)$$
$$[a_\tau(p', \sigma'), a_{\tau'}(p, \sigma)]_\pm = 0, \qquad (3.11)$$
$$[a_\tau^*(p', \sigma'), a_{\tau'}^*(p, \sigma)]_\pm = 0, \qquad (3.12)$$
where the minus sign holds whenever $\tau$ and $\tau'$ are both bosons and the plus sign holds whenever $\tau$ and $\tau'$ are both fermions. When one of the particles $\tau$ or $\tau'$ is a boson and the other is a fermion, then all creation and annihilation operators commute. Instead of $a_\tau(p, \sigma)$ we will often simply write $a(q)$. Each operator $A$ can then be represented in the form
$$A \sim \widetilde{A} := \sum_{N=0}^\infty \sum_{M=0}^\infty \int dq_1' \dots dq_N'\, dq_1 \dots dq_M\; A_{NM}(q', q)\; a^*(q_1') \cdots a^*(q_N')\, a(q_M) \cdots a(q_1)$$
in the sense that $\langle A\Psi_1, \Psi_2\rangle = \langle \widetilde{A}\Psi_1, \Psi_2\rangle$ for all $\Psi_1, \Psi_2 \in \mathcal{H}_{\mathrm{Fock}}$. Here the $\{A_{NM}(q', q)\}_{N,M}$ are complex-valued functions in the variables $q_1', \dots, q_N', q_1, \dots, q_M$. In particular, we can write the Hamiltonian as
$$H = \sum_{N=0}^\infty \sum_{M=0}^\infty \int dq_1' \dots dq_N'\, dq_1 \dots dq_M\; H_{NM}(q', q)\; a^*(q_1') \cdots a^*(q_N')\, a(q_M) \cdots a(q_1). \qquad (3.13)$$
It can be shown that the cluster decomposition principle will be satisfied if the coefficients $H_{NM}(q', q)$ are of the form
$$H_{NM}(q', q) = \delta\Big(\sum_{i=1}^N p_i' - \sum_{j=1}^M p_j\Big)\, \widetilde{H}_{NM}(q', q),$$
where $\widetilde{H}_{NM}(q', q)$ contains no further delta functions. Because the free-particle Hamiltonian is always of the form
$$H_0 = \int dq\; E(q)\, a^*(q)\, a(q),$$
with $E(q) = E(\tau, p, \sigma) = \sqrt{m_\tau^2 + p^2}$, it follows that the interaction $V = H - H_0$ must be of the form
$$V = \sum_{N=0}^\infty \sum_{M=0}^\infty \int dq_1' \dots dq_N'\, dq_1 \dots dq_M\; V_{NM}(q', q)\; a^*(q_1') \cdots a^*(q_N')\, a(q_M) \cdots a(q_1), \qquad (3.14)$$
with $V_{NM}$ of the form $\delta\big(\sum_{i=1}^N p_i' - \sum_{j=1}^M p_j\big)\, \widetilde{V}_{NM}(q', q)$, where $\widetilde{V}_{NM}(q', q)$ contains no further delta functions. Note furthermore that $V$ will be hermitian if and only if the coefficients satisfy
$$V_{NM}(q_1', \dots, q_N', q_1, \dots, q_M) = \overline{V_{MN}(q_1, \dots, q_M, q_1', \dots, q_N')}.$$

3.2 The use of free quantum fields in scattering theory

To summarize the results of the previous section: for Lorentz invariance the operator $V$ must be of the form (3.2) with $\mathscr{H}(x)$ satisfying (3.3) and (3.5), and for the S-matrix to satisfy the cluster decomposition principle, $V$ must be of the form (3.14) with coefficients $V_{NM}(q', q)$ as described above. To satisfy both of these conditions, we will (at the end of this section) construct a scalar density$^{24}$
$$\mathscr{H}(x) = \sum_{N,M} \sum_{j_1, \dots, j_N} \sum_{k_1, \dots, k_M} g_{j_1 \dots j_N, k_1 \dots k_M}\; (\phi^{\tau_1})^-_{j_1}(x) \cdots (\phi^{\tau_N})^-_{j_N}(x)\, (\phi^{\tau_1'})^+_{k_1}(x) \cdots (\phi^{\tau_M'})^+_{k_M}(x)$$
out of annihilation fields $\{(\phi^\tau)^+_j(x)\}_{j,\tau}$ and creation fields $\{(\phi^\tau)^-_j(x)\}_{j,\tau}$:
$$(\phi^\tau)^+_j(x) = \sum_{\sigma \in I_\tau} \int d^3p\; u_j(x; p, \sigma, \tau)\, a_\tau(p, \sigma),$$
$$(\phi^\tau)^-_j(x) = \sum_{\sigma \in I_\tau} \int d^3p\; v_j(x; p, \sigma, \tau)\, a_\tau^*(p, \sigma).$$
Here the coefficients $u_j$ and $v_j$ are chosen so that under a transformation $(b, M) \in \widetilde{\mathcal{P}}_+^\uparrow$ the fields transform as
$$U_{\mathrm{Fock}}(b, M)\, (\phi^\tau)^\pm_j(x)\, U_{\mathrm{Fock}}(b, M)^{-1} = \sum_k D(M^{-1})_{jk}\, (\phi^\tau)^\pm_k(\Phi(M)x + b), \qquad (3.15)$$
with $\{D(M^{-1})_{jk}\}_{j,k}$ some collection of numbers depending on $M^{-1}$. When we take $M = 1$ and compare this expression with (3.6), (3.7), (3.8) and (3.9), we find that the coefficients are of the form
$$u_j(x; p, \sigma, \tau) = (2\pi)^{-3/2}\, u_j(p, \sigma, \tau)\, e^{-ip\cdot x}, \qquad v_j(x; p, \sigma, \tau) = (2\pi)^{-3/2}\, v_j(p, \sigma, \tau)\, e^{ip\cdot x},$$
where $p^0 = \sqrt{m_\tau^2 + p^2}$. Furthermore, by applying two successive transformations as in (3.15), it can easily be seen that the matrices $D(M)$ form a representation of $SL(2, \mathbb{C})$. Therefore, before we can construct such fields we first need to consider the representations of $SL(2, \mathbb{C})$.
Representations of $SL(2, \mathbb{C})$

Every representation of $SL(2, \mathbb{C})$ can be written as a direct sum of irreducible representations, and because $SL(2, \mathbb{C})$ is simply connected, any representation $D$ of $SL(2, \mathbb{C})$ is completely determined by the corresponding representation $\varphi_D$ of its Lie algebra $sl(2, \mathbb{C})$. Recall that the six matrices $\{\tfrac{1}{2}\sigma_j, \tfrac{1}{2i}\sigma_j\}_{j=1,2,3}$ form a basis (over $\mathbb{R}$) for the Lie algebra $sl(2, \mathbb{C})$ of $SL(2, \mathbb{C})$. If $\varphi : sl(2, \mathbb{C}) \to gl(V)$ is any representation of $sl(2, \mathbb{C})$ in some complex vector space $V$, then we define on $V$ the six linear maps $J_k := i\varphi(\tfrac{1}{2i}\sigma_k)$ and $K_k := i\varphi(\tfrac{1}{2}\sigma_k)$, $k = 1, 2, 3$. These linear maps satisfy the commutation relations
$$[J_i, J_j] = i\epsilon_{ijk} J_k, \qquad [J_i, K_j] = i\epsilon_{ijk} K_k, \qquad [K_i, K_j] = -i\epsilon_{ijk} J_k.$$
We now define another six linear maps
$$A_k = \frac{1}{2}(J_k + iK_k), \qquad B_k = \frac{1}{2}(J_k - iK_k),$$
which satisfy the commutation relations
$$[A_i, A_j] = i\epsilon_{ijk} A_k, \qquad [B_i, B_j] = i\epsilon_{ijk} B_k, \qquad [A_i, B_j] = 0.$$
From these relations it follows that $V$ can be written as a tensor product $V = V_A \otimes V_B$ and that the action of $A_k$ and $B_k$ on $V$ is given by
$$A_k.(v_A \otimes v_B) = (S_k^{(A)} v_A) \otimes v_B, \qquad B_k.(v_A \otimes v_B) = v_A \otimes (S_k^{(B)} v_B),$$
where $v_A \in V_A$, $v_B \in V_B$, $A, B \in \tfrac{1}{2}\mathbb{Z}_{\geq 0}$ and $S_k^{(A)}, S_k^{(B)}$ are the spin operators corresponding to spin $A$ and $B$, respectively; see also equation (2.28) for the definition of spin operators. Note that, in particular, the dimensions of the spaces $V_A$ and $V_B$ are $2A+1$ and $2B+1$, respectively, and the dimension of $V$ is $(2A+1)(2B+1)$. We will label these representations as $\varphi^{(A,B)}$ and we will write $V^{(A,B)}$, $V_A^{(A)}$ and $V_B^{(B)}$ instead of $V$, $V_A$ and $V_B$. These representations of $sl(2, \mathbb{C})$ give rise to irreducible representations $D^{(A,B)}$ of $SL(2, \mathbb{C})$, which we will call the $(A, B)$-representation of $SL(2, \mathbb{C})$. This representation is given by
$$D^{(A,B)}(M)(v_A \otimes v_B) = D^{(A)}(M) v_A \otimes D^{(B)}(M) v_B,$$
where the $D^{(\cdot)}$ denote the $SU(2)$ representations discussed in subsection 2.2.4.

$^{24}$Here $\tau$ denotes the particle species, as usual.
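The commutation relations above can be verified in the defining representation, $\varphi = \mathrm{id}$ on $V = \mathbb{C}^2$, where $J_k = \tfrac{1}{2}\sigma_k$ and $K_k = \tfrac{i}{2}\sigma_k$ (note that with these sign conventions $A_k = 0$ and $B_k = J_k$, so the defining representation is of type $(0, \tfrac{1}{2})$). A minimal sketch (my own check, not from the thesis):

```python
import numpy as np

# Pauli matrices sigma_1, sigma_2, sigma_3
s = [np.array([[0, 1], [1, 0]], complex),
     np.array([[0, -1j], [1j, 0]], complex),
     np.array([[1, 0], [0, -1]], complex)]

# Defining representation phi = id on C^2:
# J_k = i*phi(sigma_k / 2i) = sigma_k / 2,  K_k = i*phi(sigma_k / 2) = i*sigma_k / 2
J = [m / 2 for m in s]
K = [1j * m / 2 for m in s]
A = [(J[k] + 1j * K[k]) / 2 for k in range(3)]
B = [(J[k] - 1j * K[k]) / 2 for k in range(3)]

def comm(X, Y):
    return X @ Y - Y @ X

# the stated relations, for (i,j,k) = (1,2,3) i.e. indices (0,1,2)
print(np.allclose(comm(J[0], J[1]), 1j * J[2]),    # [J1,J2] =  i J3
      np.allclose(comm(J[0], K[1]), 1j * K[2]),    # [J1,K2] =  i K3
      np.allclose(comm(K[0], K[1]), -1j * J[2]),   # [K1,K2] = -i J3
      np.allclose(comm(A[0], B[1]), np.zeros((2, 2))),  # [A1,B2] = 0
      np.allclose(A[0], np.zeros((2, 2))))         # defining rep: A_k = 0
```

Taking instead $\varphi$ on the conjugate representation flips the sign of $K_k$ and interchanges the roles of $A_k$ and $B_k$, giving the $(\tfrac{1}{2}, 0)$ type.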
Note that the $(A, B)$-representations are not unitary, due to the fact that $SL(2, \mathbb{C})$ is non-compact. However, the compact subgroup $SU(2) \subset SL(2, \mathbb{C})$ is represented unitarily, with generators $J_k = A_k + B_k$. It follows that under an $SU(2)$ transformation a vector $v \in V^{(A,B)} = V_A^{(A)} \otimes V_B^{(B)}$ transforms as the direct sum of spin-$j$ objects, with $j = A+B, A+B-1, \dots, |A-B|$. Finally, we mention that in case any (not necessarily irreducible) representation can be extended to a representation that includes space inversion, there is an operator $\beta$ that satisfies $\beta J_k \beta^{-1} = J_k$ and $\beta K_k \beta^{-1} = -K_k$, and hence $\beta A_k \beta^{-1} = B_k$ and $\beta B_k \beta^{-1} = A_k$. Clearly, such an operator $\beta$ only makes sense in the $(A, A)$-representation and the $(A, B) \oplus (B, A)$-representation.

Construction of general free fields

Now that we have found the representations of $SL(2, \mathbb{C})$, we can write the annihilation and creation fields corresponding to the $(A, B)$-representation as$^{25}$
$$(\phi^\tau_{A,B})^+_{ab}(x) = (2\pi)^{-3/2} \sum_{\sigma \in I_\tau} \int d^3p\; (u_{A,B})_{ab}(p, \sigma, \tau)\, e^{-ip\cdot x}\, a_\tau(p, \sigma), \qquad (3.16)$$
$$(\phi^\tau_{A,B})^-_{ab}(x) = (2\pi)^{-3/2} \sum_{\sigma \in I_\tau} \int d^3p\; (v_{A,B})_{ab}(p, \sigma, \tau)\, e^{ip\cdot x}\, a_\tau^*(p, \sigma), \qquad (3.17)$$
where $a = -A, -A+1, \dots, A$ and $b = -B, -B+1, \dots, B$. We will sometimes suppress the capital letters $A$ and $B$ in the coefficients $u$ and $v$ to prevent the equations from becoming too wide. The annihilation and creation fields transform according to
$$U_{\mathrm{Fock}}(b, M)\, (\phi^\tau_{A,B})^\pm_{ab}(x)\, U_{\mathrm{Fock}}(b, M)^{-1} = \sum_{a', b'} \big[D^{(A,B)}(M^{-1})\big]_{ab, a'b'}\; (\phi^\tau_{A,B})^\pm_{a'b'}(\Phi(M)x + b). \qquad (3.18)$$

$^{25}$At this point we do not make any statements about the existence of such fields, i.e. about the question whether (for any particle species $\tau$) components $(u_{A,B})_{ab}(p, \sigma, \tau)$ and $(v_{A,B})_{ab}(p, \sigma, \tau)$ can be found that transform properly. Later we will answer this question for massive particles and massless particles separately.
The (anti)commutation relations (3.10), (3.11) and (3.12) imply that the fields satisfy the (anti)commutation relations
$$[(\phi^\tau_{A,B})^+_{ab}(x), (\phi^{\tau'}_{A',B'})^+_{a'b'}(y)]_\pm = 0, \qquad [(\phi^\tau_{A,B})^-_{ab}(x), (\phi^{\tau'}_{A',B'})^-_{a'b'}(y)]_\pm = 0,$$
$$[(\phi^\tau_{A,B})^+_{ab}(x), (\phi^{\tau'}_{A',B'})^-_{a'b'}(y)]_\pm = \frac{\delta_{\tau\tau'}}{(2\pi)^3} \sum_{\sigma \in I_\tau} \int d^3p\; (u_{A,B})_{ab}(p, \sigma, \tau)\, (v_{A',B'})_{a'b'}(p, \sigma, \tau)\, e^{-ip\cdot(x-y)},$$
where the plus sign corresponds to the case where $\tau$ and $\tau'$ are both fermions and the minus sign to the case where at least one of the particles $\tau$ and $\tau'$ is a boson. In order to obtain quantities that either commute or anticommute at spacelike distances (for reasons that will become clear when we come back to equation (3.5)), we will construct linear combinations
$$(\phi^\tau_{A,B})_{ab}(x) := \kappa\, (\phi^\tau_{A,B})^+_{ab}(x) + \lambda\, (\phi^\tau_{A,B})^-_{ab}(x) \qquad (3.19)$$
$$= (2\pi)^{-3/2} \sum_{\sigma \in I_\tau} \int d^3p\; \big(\kappa\, e^{-ip\cdot x}\, u_{ab}(p, \sigma)\, a_\tau(p, \sigma) + \lambda\, e^{ip\cdot x}\, v_{ab}(p, \sigma)\, a_\tau^*(p, \sigma)\big),$$
where $\kappa$ and $\lambda$ (depending on $\tau$ and on the pair $(A, B)$, but not on the components $a$ and $b$) are chosen so that if $x - y$ is spacelike, we have
$$[(\phi^\tau_{A,B})_{ab}(x), (\phi^\tau_{A',B'})_{a'b'}(y)]_\pm = 0, \qquad (3.20)$$
$$[(\phi^\tau_{A,B})_{ab}(x), (\phi^\tau_{A',B'})^*_{a'b'}(y)]_\pm = 0, \qquad (3.21)$$
for all pairs $(A, B)$ and $(A', B')$ for which these fields exist (again, we will come back to the existence later). Note that instead of using the annihilation and creation fields $\{(\phi^\tau_{A,B})^\pm(x)\}_{A,B}$ for each particle $\tau$ occurring in the scattering experiment, we will from now on use the fields $\{(\phi^\tau_{A,B})(x)\}_{A,B}$ and their adjoints $\{(\phi^\tau_{A,B})^*(x)\}_{A,B}$. The correct values of $\kappa$ and $\lambda$ will be given later, but first we have to consider another problem. It might happen that particles that are created and annihilated by these fields carry conserved quantum numbers, such as electric charge, in which case we must be sure that $\mathscr{H}(x)$ commutes with the operator that corresponds to the conserved quantity.
If $Q$ is an operator corresponding to some conserved quantum number (for example electric charge) and if $q(\tau)$ is the value of the conserved quantum number for particles of type $\tau$, then we have the commutation relations
$$[Q, a_\tau(p, \sigma)] = -q(\tau)\, a_\tau(p, \sigma), \qquad [Q, a_\tau^*(p, \sigma)] = q(\tau)\, a_\tau^*(p, \sigma),$$
which show that the commutation relations between $Q$ and the field (3.19) are not so pretty, in the sense that they do not allow an easy way to construct an interaction $\mathscr{H}(x)$ out of fields of the form (3.19) that also commutes with $Q$. To solve this problem, we postulate that for each particle type $\tau$ there exists another particle type $\tau^C$, called the antiparticle of $\tau$, which has the same mass but carries opposite values for all conserved quantum numbers. It can also happen that $\tau^C = \tau$, in which case the antiparticle and the particle are identical and in this case there is no problem in using the field (3.19). In case $\tau^C \neq \tau$, we define the annihilation and creation fields $(\phi^{\tau^C}_{A,B})^+_{ab}(x)$ and $(\phi^{\tau^C}_{A,B})^-_{ab}(x)$ to be the same as those in (3.16) and (3.17) but with every $\tau$ replaced with $\tau^C$. In this case we replace equation (3.19) with
$$(\phi^\tau_{A,B})_{ab}(x) := \kappa\, (\phi^\tau_{A,B})^+_{ab}(x) + \lambda\, (\phi^{\tau^C}_{A,B})^-_{ab}(x) \qquad (3.22)$$
$$= (2\pi)^{-3/2} \sum_{\sigma \in I_\tau} \int d^3p\; \big(\kappa\, e^{-ip\cdot x}\, u_{ab}(p, \sigma)\, a_\tau(p, \sigma) + \lambda\, e^{ip\cdot x}\, v_{ab}(p, \sigma)\, a_{\tau^C}^*(p, \sigma)\big),$$
with $\kappa$ and $\lambda$ still such that (3.20) and (3.21) are satisfied. If $Q$ and $q(\tau)$ are as above, then we now have the simple commutation relations
$$[Q, (\phi^\tau_{A,B})_{ab}(x)] = -q(\tau)\, (\phi^\tau_{A,B})_{ab}(x), \qquad (3.23)$$
$$[Q, (\phi^\tau_{A,B})^*_{ab}(x)] = q(\tau)\, (\phi^\tau_{A,B})^*_{ab}(x). \qquad (3.24)$$
Note that these fields thus have the nice property that
$$[Q, (\phi^\tau_{A,B})(x)(\phi^\tau_{A,B})^*(x)] = [Q, (\phi^\tau_{A,B})(x)]\, (\phi^\tau_{A,B})^*(x) + (\phi^\tau_{A,B})(x)\, [Q, (\phi^\tau_{A,B})^*(x)] = -q(\tau)\, (\phi^\tau_{A,B})(x)(\phi^\tau_{A,B})^*(x) + q(\tau)\, (\phi^\tau_{A,B})(x)(\phi^\tau_{A,B})^*(x) = 0,$$
where we have suppressed the indices $a$ and $b$. Later we will use a generalization of this property to ensure that the interaction commutes with $Q$.
To summarize our results so far: for each particle type $\tau$ occurring in the scattering experiment we construct (for certain pairs $(A, B)$ depending on the particle species $\tau$) fields as in (3.19) if $\tau^C = \tau$ and fields as in (3.22) if $\tau^C \neq \tau$. However, we will not construct fields $(\phi^{\tau^C}_{A,B})(x)$, so for each particle-antiparticle pair we must agree which of the two is the particle and which is the antiparticle. Note that the partial derivatives of the fields (3.19) and (3.22) are given by
$$\partial_\mu (\phi^\tau_{A,B})_{ab}(x) = -ip_\mu\, \kappa\, (\phi^\tau_{A,B})^+_{ab}(x) + ip_\mu\, \lambda\, (\phi^{\tau/\tau^C}_{A,B})^-_{ab}(x),$$
which we can regard as a different field in its own right, since it is also of the form (3.19) or (3.22). By taking partial derivatives again, it follows immediately that each component $(\phi^\tau_{A,B})_{ab}(x)$ of the fields satisfies the Klein-Gordon equation:
$$(\Box + m_\tau^2)\, (\phi^\tau_{A,B})_{ab}(x) = 0.$$
For notational convenience we will combine all the irreducible fields $\{(\phi^\tau_{A,B})(x)\}_{\tau,(A,B)}$ that occur in the description of a scattering experiment into a single field $\phi$ with components $\phi_j$. The so-obtained field $\phi$ of course no longer transforms irreducibly. Along with the field $\phi$, we will also consider its adjoint $\phi^*$. We can then construct very general Hamiltonian densities
$$\mathscr{H}(x) = \sum_{N,M} \sum_{j_1, \dots, j_N} \sum_{k_1, \dots, k_M} g_{j_1 \dots j_N, k_1 \dots k_M}\; :\phi_{j_1}(x) \cdots \phi_{j_N}(x)\, \phi^*_{k_1}(x) \cdots \phi^*_{k_M}(x):\,, \qquad (3.25)$$
where the colons $:\ :$ indicate normal ordering, i.e. the expression obtained by moving all creation operators to the left of all annihilation operators while including a minus sign whenever two fermion operators are interchanged. When we choose the coefficients $g_{j_1 \dots j_N, k_1 \dots k_M}$ to transform properly, $\mathscr{H}(x)$ will automatically satisfy the scalar condition (3.3). If each term in this interaction contains an even number of fermion fields, then the interaction will also commute with itself at spacelike distances.
In equations (3.23) and (3.24) we have seen that for an operator $Q$ that corresponds to some conserved quantum number (such as electric charge) our fields satisfy commutation relations of the form $[Q, \phi_j] = -q(j)\phi_j$ and $[Q, \phi^*_j] = q(j)\phi^*_j$, so if each term in $\mathscr{H}(x)$ consists of a product $\phi_{j_1}(x) \cdots \phi_{j_N}(x)\, \phi^*_{k_1}(x) \cdots \phi^*_{k_M}(x)$ of fields and adjoint fields for which
$$q(j_1) + \dots + q(j_N) - q(k_1) - \dots - q(k_M) = 0,$$
then $\mathscr{H}(x)$ will satisfy $[Q, \mathscr{H}(x)] = 0$.

Before closing this section, we will give the explicit construction of the fields described above. In order to find the coefficients $u_{ab}$ and $v_{ab}$, as well as the constants $\kappa$ and $\lambda$, we have to consider the fields of massive particles and the fields of massless particles separately.

Massive particle fields

If $\tau$ is a massive particle, then the index $\sigma$ takes values in $\{-s_\tau, \dots, s_\tau\}$ and the coefficients are given by
$$u_{ab}(p, \sigma) = \frac{1}{\sqrt{2\omega_p}} \sum_{a', b'} \big[e^{\theta\cdot S^{(A)}}\big]_{aa'}\, \big[e^{-\theta\cdot S^{(B)}}\big]_{bb'}\; C_{AB}(s_\tau, \sigma; a', b'), \qquad (3.26)$$
$$v_{ab}(p, \sigma) = (-1)^{s_\tau + \sigma}\, u_{ab}(p, -\sigma),$$
where $\theta$ is the boost parameter corresponding to the boost that maps $(m, 0, 0, 0)$ to $(\omega_p, p)$, and $C_{AB}(j, \sigma; a, b)$ are Clebsch-Gordan coefficients defined by
$$v^{(s_1, s_2)}_{s,m} = \sum_{m_1, m_2} C_{s_1 s_2}(s, m; m_1, m_2)\; v_{s_1, m_1} \otimes v_{s_2, m_2},$$
where on the right-hand side the vectors $v_{s_i, m_i} \in V^{(s_i)}$ denote the joint eigenvectors of $[S^{(s_i)}]^2$ and $S_3^{(s_i)}$ (in physics these vectors are written as $|s_i m_i\rangle$), and on the left-hand side the vectors $v^{(s_1, s_2)}_{s,m} \in V^{(s_1)} \otimes V^{(s_2)}$ denote the joint eigenvectors of $[S^{(s)}]^2$ and $S_3^{(s)}$, with $S_k^{(s)} := S_k^{(s_1)} \otimes 1 + 1 \otimes S_k^{(s_2)}$.

Before we can go on, we first make some remarks concerning the relationship between particles and the fields that describe them. It is not true that any $(A, B)$-field can describe some given massive particle species $\tau$. By considering the transformation properties of the $(A, B)$-field under rotations, it can be shown that it can only describe particles with spin $A+B, A+B-1, \dots, |A-B|$.
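The Clebsch-Gordan coefficients $C_{s_1 s_2}(s, m; m_1, m_2)$ defined above are available symbolically, for instance in SymPy. A small sketch (an aside of mine, not from the thesis; it assumes SymPy's `CG(j1, m1, j2, m2, j3, m3)` uses the same Condon-Shortley phase convention): the coupling of two spin-$\tfrac{1}{2}$ states to the singlet $s = 0$, i.e. $v^{(1/2,1/2)}_{0,0} = \tfrac{1}{\sqrt 2}\big(v_{\frac12,\frac12}\otimes v_{\frac12,-\frac12} - v_{\frac12,-\frac12}\otimes v_{\frac12,\frac12}\big)$.

```python
from sympy import S, sqrt, simplify
from sympy.physics.quantum.cg import CG

half = S(1) / 2

# C_{1/2,1/2}(0, 0; m1, m2): the singlet coefficients
c_up_down = CG(half, half, half, -half, 0, 0).doit()    # expect  1/sqrt(2)
c_down_up = CG(half, -half, half, half, 0, 0).doit()    # expect -1/sqrt(2)

print(c_up_down, c_down_up)

# normalization: the singlet state has unit norm
norm2 = simplify(c_up_down**2 + c_down_up**2)
print(norm2)  # 1
```

The same call with the third pair of arguments set to $(1, m)$ gives the triplet coefficients, and $C_{AB}(s_\tau, \sigma; a', b')$ in (3.26) is obtained by taking $(j_1, j_2) = (A, B)$.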
Thus, for a given particle species $\tau$ we can only construct $(A, B)$-fields for which $|A - B| \leq s_\tau \leq A + B$ and $A + B - s_\tau \in \mathbb{Z}_{\geq 0}$. These different fields for $\tau$ are not physically distinct, however. For example, when we take the derivative $\partial_\mu \phi_{0,0}(x)$ of the $(0, 0)$-field (or scalar field) $\phi_{0,0}(x)$, we obtain a field that transforms as a $(\tfrac12, \tfrac12)$-field (or vector field). In general, any $(A, B)$-field for a given particle type $\tau$ can be expressed either as a rank $2B$ differential operator acting on an $(s_\tau, 0)$-field or as a rank $2A$ differential operator acting on a $(0, s_\tau)$-field.

For reasons that will become clear later, we want all $(A, B)$-fields that describe a single type of particle $\tau$ to commute with each other when $\tau$ is a boson and to anticommute with each other when $\tau$ is a fermion. To achieve this, we must choose the constants $\kappa$ and $\lambda$ in (3.22) such that$^{26}$ $\lambda = (-1)^{2B}\kappa$. By adjusting the overall scale of the field, we can then write it as
$$(\phi^\tau_{A,B})_{ab}(x) = (2\pi)^{-3/2} \sum_{\sigma = -s_\tau}^{s_\tau} \int d^3p\; u_{ab}(p, \sigma, \tau)\, \big(e^{-ip\cdot x}\, a_\tau(p, \sigma) + (-1)^{2B + s_\tau + \sigma}\, e^{ip\cdot x}\, a^*_{\tau^C}(p, -\sigma)\big).$$
As shown in section 5.7 of [35] (in particular, equation (5.7.53)), the adjoints of the components of an $(A, B)$-field for a particle with $\tau^C = \tau$ are proportional to the components of the $(B, A)$-field for the particle $\tau$, so if $\tau^C = \tau$ then the adjoint fields do not give rise to new kinds of objects. Finally, we will mention the transformation properties of these fields under space inversions. In the $(A, A)$-representation the field transforms according to
$$\mathsf{P}\, \phi^\tau_{A,A}(x)\, \mathsf{P}^{-1} = (-1)^{2A - s_\tau}\, \xi_\tau\, \phi^\tau_{A,A}(I_s x),$$
with $\xi_\tau$ the intrinsic parity defined in the previous chapter, while in the $(A, B) \oplus (B, A)$-representation it transforms according to
$$\mathsf{P} \begin{pmatrix} \phi^\tau_{A,B}(x) \\ \phi^\tau_{B,A}(x) \end{pmatrix} \mathsf{P}^{-1} = (-1)^{A + B - s_\tau}\, \xi_\tau \begin{pmatrix} \phi^\tau_{B,A}(I_s x) \\ \phi^\tau_{A,B}(I_s x) \end{pmatrix}. \qquad (3.27)$$
So under space inversion, the $(A, B) \oplus (B, A)$-representation becomes the $(B, A) \oplus (A, B)$-representation.
In appendix B we give some examples of free massive fields that can be obtained by explicit calculation of the coefficients $u_{ab}(p, \sigma, \tau)$ for some representation $(A, B)$. As is shown in that appendix, the $(\tfrac12, 0) \oplus (0, \tfrac12)$-field automatically satisfies the Dirac equation.

Massless particle fields

As we have already mentioned when we discussed parity, it is sometimes necessary to identify two massless particles that only differ from each other in the fact that they have opposite helicities. We will first consider the case where the massless particle $\tau$ can have only one helicity $\sigma_\tau$. In this case the field has the form
$$(\phi^\tau_{A,B})_{ab}(x) = (2\pi)^{-3/2} \int d^3p\; \big(\kappa\, e^{-ip\cdot x}\, u_{ab}(p, \sigma_\tau)\, a_\tau(p, \sigma_\tau) + \lambda\, e^{ip\cdot x}\, v_{ab}(p, \sigma_{\tau^C})\, a^*_{\tau^C}(p, \sigma_{\tau^C})\big).$$
Just as the $(A, B)$-field for massive particles could only describe particles with spin $s_\tau$ for which $|A - B| \leq s_\tau \leq A + B$, the $(A, B)$-field for massless particles can only describe particles for which $\sigma_\tau = A - B$ and $\sigma_{\tau^C} = -\sigma_\tau$. Thus the simplest field that one can construct for a massless particle $\tau$ is the $(\sigma_\tau, 0)$-field if $\sigma_\tau \geq 0$ and the $(0, |\sigma_\tau|)$-field if $\sigma_\tau < 0$. The $(B + \sigma_\tau, B)$-fields are $2B$-th order derivatives of the $(\sigma_\tau, 0)$-field and the $(A, A + |\sigma_\tau|)$-fields are $2A$-th order derivatives of the $(0, |\sigma_\tau|)$-field. As stated before, the only irreducible representations $(A, B)$ that can be extended to a representation that also contains space inversion are the representations for which $A = B$.

$^{26}$In deriving this result (see section 5.7 of [35]) it becomes clear that fields which describe particles with integer spin commute, while fields that describe particles with half-odd-integer spin anticommute. In other words, particles with integer spin are bosons and particles with half-odd-integer spin are fermions. This is the content of the famous spin-statistics theorem. We will come back to this theorem in the next chapter.
In the present case of massless particles this representation can only be chosen if σ_τ = 0, and in that case we have A = B = 0, i.e. a scalar field. For σ_τ ≠ 0 we again have to consider fields of type (A, B) ⊕ (B, A). However, in order to obtain a transformation law for massless particles analogous to (3.27), we must identify the particle type τ with the particle type which is obtained by substituting σ_τ → −σ_τ. Otherwise the two τ's on the right-hand side of (3.27) cannot be the same as the ones on the left-hand side, because the τ's necessarily have opposite helicity (because A and B are switched).

3.3 Calculation of the S-matrix using perturbation theory

In this section we will give a brief overview of how physicists use perturbation theory to calculate the S-matrix elements for some scattering process. If ψ_α, ψ_β ∈ H_Fock are two Fock space vectors given by $\psi_\alpha = \prod_{j=1}^N a^*(q_j^{in})\Omega_{Fock}$ and $\psi_\beta = \prod_{j=1}^M a^*(q_j^{out})\Omega_{Fock}$, with the q_j specifying particle species, momentum and spin, then it follows from equation (3.4) that the S-matrix element ⟨Sψ_α, ψ_β⟩ is equal to
$$
\sum_{n=0}^\infty \frac{1}{i^n n!} \int_{\mathbb{R}^{4n}} \left\langle \prod_{j=1}^M a(q_j^{out})\, T\{\mathcal{H}(x_1)\cdots\mathcal{H}(x_n)\} \prod_{j=1}^N a^*(q_j^{in})\,\Omega_{Fock},\; \Omega_{Fock} \right\rangle dx_1 \cdots dx_n.
$$
We can write the interaction density (3.25) as
$$
\mathcal{H}(x) = \sum_i g_i\, \mathcal{H}_i(x),
$$
where i denotes a multi-index and each H_i(x) is a normal-ordered product of fields and field adjoints. The task is therefore to calculate expressions of the form
$$
\left\langle \prod_{j=1}^M a(q_j^{out})\, T\{\mathcal{H}_{i_1}(x_1)\cdots\mathcal{H}_{i_n}(x_n)\} \prod_{j=1}^N a^*(q_j^{in})\,\Omega_{Fock},\; \Omega_{Fock} \right\rangle
$$
or, in terms of the fields φ,
$$
\left\langle \prod_{j=1}^M a(q_j^{out})\, T\{\, {:}\phi^{(*)}_{i_1,1}(x_1)\cdots\phi^{(*)}_{i_1,k(1)}(x_1){:}\; \cdots\; {:}\phi^{(*)}_{i_n,1}(x_n)\cdots\phi^{(*)}_{i_n,k(n)}(x_n){:}\,\} \prod_{j=1}^N a^*(q_j^{in})\,\Omega_{Fock},\; \Omega_{Fock} \right\rangle,
$$
where the double indices of the fields are defined by
$$
\mathcal{H}_{i_m}(x) = \; {:}\phi^{(*)}_{i_m,1}(x)\cdots\phi^{(*)}_{i_m,k(i_m)}(x){:}\,.
$$
For notational convenience, we will write such expressions simply as
$$
\langle T\{A_1 \cdots A_n\}\,\Omega_{Fock},\; \Omega_{Fock}\rangle,
$$
where the A_j are either fields/field adjoints or creation/annihilation operators and, by definition, the time ordering has no effect on the a(q_j^{out}) and a^*(q_j^{in}). For any pair (A_j, A_k) we define the contraction by
$$
\overbrace{A_j A_k} := T\{A_j A_k\} - \,{:}A_j A_k{:}
$$
(time-ordering minus normal-ordering), where T{A_j A_k} = A_j A_k whenever at least one of the two operators does not depend on time. Contractions happen to be always scalar multiples of the identity operator, and we will identify the contraction with this scalar (thus forgetting about the identity operator). Then, according to Wick's theorem, we have for n even
$$
\langle T\{A_1 \cdots A_n\}\,\Omega_{Fock},\; \Omega_{Fock}\rangle = \sum_P \epsilon(P)\; \overbrace{A_{j_1} A_{j_2}} \cdots \overbrace{A_{j_{n-1}} A_{j_n}}, \qquad (3.28)
$$
where the sum runs over all groupings P of the A_j's into n/2 pairs with the ordering in each pair the same as on the left-hand side, so j_1 < j_2, j_3 < j_4, ..., j_{n−1} < j_n. The sign ε(P) depends on the number of fermion interchanges that were needed to bring the two elements in each pair next to each other. For n odd, the expression on the left-hand side is simply zero. Recalling that the A_j are actually fields or creation/annihilation operators, we see that only a few contractions are nonzero.
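The combinatorics of the sum in (3.28) is that of perfect pairings: for n operators there are (n − 1)!! = 1 · 3 · · · (n − 1) groupings into n/2 ordered pairs. A sketch that enumerates them (ignoring the fermionic sign ε(P); names are ours):

```python
def pairings(elems):
    """Yield all groupings of an even-length list into pairs, with each
    pair ordered as in the original list (j1 < j2, j3 < j4, ...)."""
    if not elems:
        yield []
        return
    first, rest = elems[0], elems[1:]
    for i in range(len(rest)):
        # Pair the first element with every later element, then recurse.
        for tail in pairings(rest[:i] + rest[i + 1:]):
            yield [(first, rest[i])] + tail

# For n = 4 there are 3 pairings; for n = 6 there are 5 * 3 * 1 = 15.
n4 = sum(1 for _ in pairings([1, 2, 3, 4]))
n6 = sum(1 for _ in pairings([1, 2, 3, 4, 5, 6]))
```

Fixing the first element of each pair as the earliest remaining operator is what enforces the ordering condition j_1 < j_2, etc., from the text.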
Their contributions are^27:
$$
\overbrace{a_{\tau^{(C)}}(p_j^{out}, \sigma_j^{out})\, a^*_{\tau^{(C)}}(p_k^{in}, \sigma_k^{in})} = \delta_{\sigma_j^{out} \sigma_k^{in}}\, \delta(p_j^{out} - p_k^{in})
$$
$$
\overbrace{a_\tau(p_j^{out}, \sigma_j^{out})\, (\phi^\tau)^*_\ell(x)} = (2\pi)^{-3/2}\, e^{i p_j^{out}\cdot x}\, u_\ell(p_j^{out}, \sigma_j^{out})
$$
$$
\overbrace{a_{\tau^C}(p_j^{out}, \sigma_j^{out})\, (\phi^\tau)_\ell(x)} = (2\pi)^{-3/2}\, e^{i p_j^{out}\cdot x}\, v_\ell(p_j^{out}, \sigma_j^{out})
$$
$$
\overbrace{(\phi^\tau)_\ell(x)\, a^*_\tau(p_j^{in}, \sigma_j^{in})} = (2\pi)^{-3/2}\, e^{-i p_j^{in}\cdot x}\, u_\ell(p_j^{in}, \sigma_j^{in})
$$
$$
\overbrace{(\phi^\tau)^*_\ell(x)\, a^*_{\tau^C}(p_j^{in}, \sigma_j^{in})} = (2\pi)^{-3/2}\, e^{-i p_j^{in}\cdot x}\, v_\ell(p_j^{in}, \sigma_j^{in})
$$
$$
\overbrace{\phi_j(x_m)\, \phi^*_k(x_{m+l})} = \theta(x^0_m - x^0_{m+l})\,[\phi^+_j(x_m), (\phi^+_k)^*(x_{m+l})]_\mp \pm \theta(x^0_{m+l} - x^0_m)\,[(\phi^-_k)^*(x_{m+l}), \phi^-_j(x_m)]_\mp =: -i\Delta_{jk}(x_m, x_{m+l})
$$
$$
\overbrace{\phi^*_j(x_m)\, \phi_k(x_{m+l})} = \theta(x^0_m - x^0_{m+l})\,[(\phi^-_j)^*(x_m), \phi^-_k(x_{m+l})]_\mp \pm \theta(x^0_{m+l} - x^0_m)\,[\phi^+_k(x_{m+l}), (\phi^+_j)^*(x_m)]_\mp = \mp i\Delta_{kj}(x_{m+l}, x_m)
$$
where in the last two equalities m, l ≥ 1, θ : R → {0, 1} denotes the step function, and the lower signs correspond to the case where both components φ_j and φ_k are fermionic. Furthermore, the φ^±_j(x) refer to the decomposition φ_j(x) = φ^+_j(x) + φ^-_j(x) of the field into an annihilation field and a creation field. The quantities Δ_{ij}(x, y) are called propagators. The nonzero terms in (3.28) are products of pairings of the forms above, and such products of pairings can be graphically represented by Feynman diagrams. In these diagrams the initial particles are represented by vertices at the bottom of the diagram and the final particles are represented by vertices at the top of the diagram, and these external vertices are labeled by their corresponding particle species, momenta and spin states. Each H_j(x_k) is represented by a vertex that is drawn between the initial and final vertices, and each such internal vertex is labeled by j and x_k. Each pairing is then represented by a line connecting the two vertices that represent the two paired objects.
Here we mean, in particular, that if a field or adjoint field in H_j(x_k) is paired with some other object, then we draw a line between that object and the vertex labeled by j, x_k. Of course this implies that if H_j(x_k) is a product of K fields and adjoint fields, then there are K lines that connect the vertex j, x_k with other vertices. Each line that connects the external vertex of a particle carries an upward arrow, while each line that connects the external vertex of an antiparticle carries a downward arrow. Internal lines, representing a pairing of a field with an adjoint field, carry an arrow that points from the vertex of the adjoint field to the vertex of the field. All lines in the diagram are labeled by the value of the corresponding contraction. The term in (3.28) that corresponds to the diagram can now be recovered from the diagram as follows. For each internal vertex j, x_k we write a factor −ig_j, and for each line we write a factor that is equal to its label (i.e. to the corresponding contraction). The result is then integrated over all coordinates x_k, yielding the desired term in (3.28). In practice, one uses the Feynman diagrams to calculate all terms (up to some order) in the expansion of the S-matrix elements. However, we will not discuss the details of this procedure here, nor will we discuss the renormalization procedure that is needed whenever the obtained expressions are infinite.

^27 Here τ^C denotes the antiparticle of τ, as usual. In each line where there are two τ's, these two τ's are the same. In the first line τ^(C) is either τ or τ^C, but the same choice must be made for both factors in the first line. Furthermore, the field φ can be decomposed into irreducible components, each of which corresponds to a single particle (not antiparticle) species. If we write (φ^τ)_ℓ instead of φ_ℓ, we mean that the ℓth component of φ belongs to an irreducible representation corresponding to particle species τ.
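The structure of the perturbation series from this section can be made concrete in a toy setting: for a time-independent 2 × 2 "Hamiltonian" the time-ordered integrals collapse to (Ht)^n, so the partial sums of the Dyson series are Σ_n (−it)^n H^n/n!, converging to the exact evolution exp(−iHt). A sketch under these simplifying assumptions (all names are ours):

```python
import math

def matmul(X, Y):
    """2x2 complex matrix product."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def dyson_sum(H, t, order):
    """Partial sum  sum_{n=0}^{order} (-i t)^n / n! * H^n  of the Dyson
    series for a time-independent H (time ordering is then trivial and
    the n-fold integral over [0, t]^n gives t^n H^n)."""
    Hn = [[1, 0], [0, 1]]                                  # H^0 = identity
    S = [[complex(Hn[i][j]) for j in range(2)] for i in range(2)]
    for n in range(1, order + 1):
        Hn = matmul(Hn, H)
        c = (-1j * t) ** n / math.factorial(n)
        for i in range(2):
            for j in range(2):
                S[i][j] += c * Hn[i][j]
    return S

# H = sigma_x, for which exp(-i t sigma_x) = cos(t) I - i sin(t) sigma_x.
t = 0.3
S = dyson_sum([[0, 1], [1, 0]], t, 25)
exact = [[math.cos(t), -1j * math.sin(t)],
         [-1j * math.sin(t), math.cos(t)]]
```

In the field-theoretic series of this section the integrals do not collapse, of course; this only illustrates how the 1/(i^n n!) expansion reassembles the exact evolution.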
3.4 Obtaining V from a Lagrangian

In the previous section we mentioned that perturbation theory can be used to compute S-matrix elements when we are given an expression for the interaction V (in the interaction picture) in terms of the free fields. In this section we will show how this expression for V can be obtained from a Lagrangian. In practice, this is very useful because Lagrangians are often easier to guess than Hamiltonians. Furthermore, if there is Poincaré invariance in the Lagrangian formalism, then the S-matrix will automatically be Poincaré invariant, even if the requirements (3.2) and (3.3) are not exactly satisfied; see section 7.4 of [35]. These Lagrangians are given in the context of a classical Lagrangian field theory, so we will first discuss classical Lagrangian field theory. Next we will make the transition to Hamiltonian classical field theory, and then we will quantize the theory to obtain Heisenberg-picture quantum fields. After quantization we can then make the transition from the Heisenberg picture to the interaction picture, and finally derive the expression for V. Although the main structure of this chapter is based on [35], some parts of the present section are based more on [6].

Classical relativistic field theory

In classical relativistic field theory, the field components φ_1(x), ..., φ_n(x) are complex-valued^28 functions on spacetime that transform under a transformation $(a, A) \in \widetilde{\mathcal{P}}^\uparrow_+$ as^29
$$
\phi_j(x) \to \sum_{k=1}^n [D(A)]_{jk}\, \phi_k(\Phi(A)x + a),
$$
where D is a representation of SL(2, C) and Φ : SL(2, C) → L^↑_+ is the covering map. Note that the partial derivatives ∂_μ φ_j(x) also transform in such a way, namely as the tensor product of the representation D with the representation A ↦ Φ(A) (which transforms the index μ in ∂_μ). For fixed time x^0 = t, the fields φ_j(t, x) and their time derivatives φ̇_j(t, x) are functions of the space coordinates x. We will call the space of all such possible functions the configuration space of the system.
This configuration space is thus a function space, the elements of which are ordered 2n-tuples (q_1(x), ..., q_n(x), q̇_1(x), ..., q̇_n(x)) of functions q_j and q̇_j with certain smoothness conditions (and perhaps some boundary conditions as well); the notation q̇_j has nothing to do with the time derivative of q_j (which does not even depend on time). The Lagrangian L is a functional L[q_j, q̇_j] of these 2n functions. We will always assume that there is a smooth function $\mathcal{L}(a_j, b_k, c_l)$ on C^{n+3n+n} = C^{5n} such that L[q_j, q̇_j] can be written as
$$
L[q_j, \dot q_j] = \int_{\mathbb{R}^3} \mathcal{L}(q_j(\mathbf{x}), \nabla q_j(\mathbf{x}), \dot q_j(\mathbf{x}))\, d^3x.
$$
In particular, when these functions q_j(x) and q̇_j(x) are taken to be the values of the field components φ_j(x) and their time derivatives ∂_0 φ_j(x) at a fixed moment x^0 = t of time, we can define a functional L(t) by
$$
L(t) = L[\phi(t, \cdot), \partial_0\phi(t, \cdot)] = \int_{\mathbb{R}^3} \mathcal{L}(\phi(t,\mathbf{x}), \nabla\phi(t,\mathbf{x}), \partial_0\phi(t,\mathbf{x}))\, d^3x = \int_{\mathbb{R}^3} \mathcal{L}(\phi(t,\mathbf{x}), \partial_\mu\phi(t,\mathbf{x}))\, d^3x,
$$
where φ(t, ·) and ∂_0φ(t, ·) denote the functions x ↦ φ(t, x) and x ↦ ∂_0φ(t, x). In the last step we simply used a different convention for writing the dependence of $\mathcal{L}$ on its arguments. For given φ_j(x), the action S is now defined as the time integral of L(t); it is a functional of the fields φ_j(x), but not of their time derivatives, since these can be calculated once we know the fields (at all times):
$$
S = S[\phi(\cdot)] = \int_M \mathcal{L}(\phi(x), \partial_\mu\phi(x))\, d^4x.
$$
The action is manifestly invariant under spacetime translations φ_j(x) → φ_j(x + a), and it is assumed that it is also a real-valued Lorentz scalar.

^28 Here we consider real-valued functions as a special subclass of complex-valued functions.
^29 This transformation property is only approximately true for components that represent gauge fields, where the transformation rule may contain gauge transformations. Such extra terms have no effect on the Poincaré invariance of the theory because the action S (we will define actions below) is always considered to be invariant under such gauge transformations.
The equations of motion (or field equations) are obtained by demanding that the action is stationary under variations; these equations are
$$
\frac{\partial \mathcal{L}}{\partial \phi_j} - \partial_\mu \frac{\partial \mathcal{L}}{\partial(\partial_\mu \phi_j)} = 0, \qquad j = 1, \ldots, n,
$$
and are called the Euler-Lagrange equations. Suppose that the action S is invariant (by which we mean that δS = 0) under an infinitesimal variation φ_j(x) → φ_j(x) + εF_j(x), where ε is a constant and the F_j(x) are functions of the field components and their derivatives at the point x. We also call this an infinitesimal symmetry of the action. Then there exists a current J^μ(x) that is conserved, i.e. ∂_μ J^μ(x) = 0. In particular, this implies that we can define a quantity
$$
Q(t) := \int_{\mathbb{R}^3} J^0(t, \mathbf{x})\, d^3x
$$
that is conserved in the sense that dQ/dt ≡ 0; in other words, an infinitesimal symmetry of the action S implies the existence of a conserved quantity. This statement is also known as Noether's theorem. In cases where not only the action S but also the Lagrangian is invariant under a certain infinitesimal variation of the fields (this is the case for spatial translations and rotations), one can write down an explicit expression for the conserved quantity Q in terms of the Lagrangian L and the functions F_j(x). If the Lagrangian density $\mathcal{L}$ is also invariant under the variation, then one can even find an explicit expression for the current J^μ(x) in terms of the Lagrangian density and the functions F_j(x). These expressions can be found in section 7.3 of [35]. As already stated above, the action is invariant under spacetime translations, which can be described in infinitesimal form as φ_j(x) → φ_j(x) + ε^μ ∂_μ φ_j(x). These are in fact four independent symmetries (one for each spacetime dimension) and hence there are four conserved currents T^{μ0}(x), ..., T^{μ3}(x). Because the Lagrangian density is not invariant under translations, one cannot use the explicit expression for the currents that we mentioned above.
However, it is possible to derive an expression for these currents by more direct means; see section 7.3 of [35]. The result is
$$
T^{\mu\nu} := \sum_{j=1}^n \frac{\partial \mathcal{L}}{\partial(\partial_\mu \phi_j)}\, \partial^\nu \phi_j - \eta^{\mu\nu} \mathcal{L},
$$
which is also called the energy-momentum tensor (the index ν is a Lorentz index, so this is indeed a tensor), and the corresponding conserved quantities are
$$
P^\mu = \int_{\mathbb{R}^3} T^{0\mu}\, d^3x = \int_{\mathbb{R}^3} \left[\sum_{j=1}^n \frac{\partial \mathcal{L}}{\partial(\partial_0 \phi_j)}\, \partial^\mu \phi_j - \eta^{0\mu} \mathcal{L}\right] d^3x.
$$
The conserved quantities P^μ are interpreted as the four-momentum. The conserved currents {M^{μνρ}}_{ν,ρ} corresponding to Lorentz invariance are given by
$$
M^{\mu\rho\sigma} = \frac{1}{2} \sum_{j,k=1}^n \frac{\partial \mathcal{L}}{\partial(\partial_\mu \phi_j)}\, [\mathcal{D}^{\rho\sigma}]_{jk}\, \phi_k - \frac{1}{2}\left(T^{\mu\rho} x^\sigma - T^{\mu\sigma} x^\rho\right),
$$
where $\mathcal{D}$ is the Lie algebra representation of $\mathfrak{l} \simeq \mathfrak{sl}(2, \mathbb{C})$ induced by D, and $\mathcal{D}^{\rho\sigma} = \mathcal{D}(X^{\rho\sigma})$ with $X^{\mu\nu} \in \mathfrak{l}$ as defined before. The corresponding conserved quantities J^{ρσ} are then
$$
J^{\rho\sigma} = \int_{\mathbb{R}^3} M^{0\rho\sigma}\, d^3x.
$$
If (i, j, k) is a cyclic permutation of (1, 2, 3), then J^{ij} is interpreted as the x^k-component of the angular momentum. Although the Lagrangian formulation of classical fields is very useful for describing symmetries, it will also be necessary to construct the Hamiltonian formalism from a given Lagrangian. The reason for this is that the Hamiltonian formalism involves Poisson (and Dirac) brackets, which can be used to postulate the commutation relations of the field components in the corresponding quantum theory. Also, after quantizing in the Hamiltonian formalism, the time evolution of the (quantized) fields can be stated in a simple form. As described above, the Lagrangian L is a functional on the configuration space, and this configuration space consists of 2n-tuples of functions (q_j(x), q̇_j(x)).
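Noether's theorem above can be illustrated numerically: for a discretized 1+1-dimensional Klein-Gordon field, time-translation invariance makes the energy P^0 (kinetic plus gradient plus mass terms) a conserved quantity of the evolution. A sketch under our own discretization choices (unit lattice spacing, periodic boundary conditions, leapfrog integrator):

```python
import math

def force(phi, m2):
    """Discrete Laplacian minus mass term, i.e. the Euler-Lagrange force."""
    N = len(phi)
    return [phi[(i + 1) % N] - 2 * phi[i] + phi[(i - 1) % N] - m2 * phi[i]
            for i in range(N)]

def step(phi, pi, dt, m2):
    """One leapfrog (kick-drift-kick) step for the discretized field."""
    pi_h = [p + 0.5 * dt * f for p, f in zip(pi, force(phi, m2))]
    phi = [q + dt * p for q, p in zip(phi, pi_h)]
    pi = [p + 0.5 * dt * f for p, f in zip(pi_h, force(phi, m2))]
    return phi, pi

def energy(phi, pi, m2):
    """Discrete analogue of P^0 = int (pi^2 + (grad phi)^2 + m^2 phi^2)/2."""
    N = len(phi)
    return sum(0.5 * pi[i] ** 2
               + 0.5 * (phi[(i + 1) % N] - phi[i]) ** 2
               + 0.5 * m2 * phi[i] ** 2 for i in range(N))

N, dt, m2 = 32, 0.01, 1.0
phi = [math.sin(2 * math.pi * i / N) for i in range(N)]
pi = [0.0] * N
E0 = energy(phi, pi, m2)
for _ in range(1000):
    phi, pi = step(phi, pi, dt, m2)
E1 = energy(phi, pi, m2)
```

The relative energy drift stays tiny over many steps, mirroring dQ/dt ≡ 0 in the continuum theory.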
We now define a second space, called the phase space, which also consists of 2n-tuples (q_j(x), p_j(x)), where the first n functions are of the same class as the first n functions in the elements of configuration space, but the last n objects p_j(x) are defined as linear functionals
$$
\dot q \mapsto \int_{\mathbb{R}^3} p_j(\mathbf{x})\, \dot q(\mathbf{x})\, d^3x
$$
on the space of all allowed functions q̇(x) (note that this is very similar to the case of finitely many particles, where the configuration space is the tangent bundle of some smooth manifold and the phase space is the cotangent bundle of the same manifold). In order to define the Poisson brackets we need the notion of a functional derivative. Let A be a functional of k functions f_1(x), ..., f_k(x), i.e. A is a map (f_1, ..., f_k) ↦ A[f_1, ..., f_k]. We then define the functional derivative of A with respect to f_j at the point (g_1, ..., g_k) as the linear map
$$
h \mapsto \frac{d}{d\epsilon}\, A[g_1, \ldots, g_{j-1}, g_j + \epsilon h, g_{j+1}, \ldots, g_k]\Big|_{\epsilon=0},
$$
where h(x) is a function in the same class as the f_j. The integral kernel of this map is denoted by δA/δf_j:
$$
\frac{d}{d\epsilon}\, A[g_1, \ldots, g_{j-1}, g_j + \epsilon h, g_{j+1}, \ldots, g_k]\Big|_{\epsilon=0} = \int_{\mathbb{R}^3} \frac{\delta A[f_1, \ldots, f_k]}{\delta f_j(\mathbf{x})}\bigg|_{(g_1,\ldots,g_k)} h(\mathbf{x})\, d^3x,
$$
and this integral kernel will also be called the functional derivative of A with respect to f_j. We may interpret this functional derivative as a functional
$$
(g_1, \ldots, g_k, h) \mapsto \int_{\mathbb{R}^3} \frac{\delta A[f_1, \ldots, f_k]}{\delta f_j(\mathbf{x})}\bigg|_{(g_1,\ldots,g_k)} h(\mathbf{x})\, d^3x,
$$
which is linear in h, as already stated above. Recall that in writing partial derivatives of ordinary functions, we often denote the partial derivative of f(x_1, ..., x_n) with respect to x_j at the point (y_1, ..., y_n) as ∂f(y_1, ..., y_n)/∂y_j instead of ∂f(x_1, ..., x_n)/∂x_j evaluated at (y_1, ..., y_n). Similarly, we will often write the functional derivative of the functional A[f_1, ..., f_k] above with respect to f_j at the point (g_1, ..., g_k) as δA[g_1, ..., g_k]/δg_j(x). When the functional A depends in a well-behaved manner on the functions f_j, it might happen that these functional derivatives depend on x in a well-behaved manner, and in that case we can consider them as a map
$$
(\mathbf{x}, g_1, \ldots, g_k) \mapsto \frac{\delta A[g_1, \ldots, g_k]}{\delta g_j(\mathbf{x})},
$$
i.e. as a functional of the g_j, depending in a well-behaved manner on the variable x. We now apply these definitions as follows. Let F[q, p] and G[q, p] be two functionals on phase space. We then define the Poisson bracket {F, G}_P of F and G by
$$
\{F, G\}_P := \sum_{j=1}^n \int_{\mathbb{R}^3} \left[\frac{\delta F[q,p]}{\delta q_j(\mathbf{x})}\, \frac{\delta G[q,p]}{\delta p_j(\mathbf{x})} - \frac{\delta G[q,p]}{\delta q_j(\mathbf{x})}\, \frac{\delta F[q,p]}{\delta p_j(\mathbf{x})}\right] d^3x.
$$
In particular, for the functionals F_{j,x}[q, p] = q_j(x) and G_{k,y}[q, p] = p_k(y) we find that their Poisson bracket is δ_{jk} δ(x − y). We will now use the Lagrangian L[q, q̇] to define a map from configuration space to phase space by
$$
(q_1(\mathbf{x}), \ldots, q_n(\mathbf{x}), \dot q_1(\mathbf{x}), \ldots, \dot q_n(\mathbf{x})) \mapsto (q_1(\mathbf{x}), \ldots, q_n(\mathbf{x}), \pi_1[q, \dot q](\mathbf{x}), \ldots, \pi_n[q, \dot q](\mathbf{x})), \qquad (3.29)
$$
where the π_j[q, q̇] are given by
$$
\pi_j[q, \dot q](\mathbf{x}) = \frac{\delta L[q, \dot q]}{\delta \dot q_j(\mathbf{x})}.
$$
A priori the x-dependence should be interpreted in the sense of linear functionals as described above, but because the Lagrangian L is a space integral of a Lagrangian density $\mathcal{L}(a_j, b_k, c_l)$ (with j, l ∈ {1, ..., n} and k ∈ {1, ..., 3n}), the π_j[q, q̇](x) are well-behaved functions given by
$$
\pi_j[q, \dot q](\mathbf{x}) = \frac{\partial \mathcal{L}(q(\mathbf{x}), (\nabla q)(\mathbf{x}), c)}{\partial c_j}\bigg|_{c = \dot q(\mathbf{x})}.
$$
It might happen that the map (3.29) gives rise to certain constraints; e.g. if $\mathcal{L}(a_j, b_k, c_l)$ does not depend on c_m for some m ∈ {1, ..., n}, then the image of configuration space under the map (3.29) only contains 2n-tuples of the form (q_1, ..., q_n, p_1, ..., p_{m−1}, 0, p_{m+1}, ..., p_n). In this case we can write the constraint as p_m ≡ 0. In general, we will consider constraints of the form α_{m,x}[q, p] = 0, where for each pair (m, x) ∈ {1, ..., M} × R^3, the functional α_{m,x}[q, p] is a function of the q's and p's and their derivatives at the point x.
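In finitely many dimensions the Poisson bracket above reduces to {F, G}_P = Σ_j (∂F/∂q_j ∂G/∂p_j − ∂G/∂q_j ∂F/∂p_j), and the canonical relations {q_j, p_k}_P = δ_{jk} are easy to check numerically. A sketch with central-difference derivatives (all names are ours):

```python
def poisson(F, G, q, p, h=1e-6):
    """Finite-dimensional Poisson bracket {F, G}_P at the point (q, p),
    with partial derivatives approximated by central differences."""
    def dq(fun, i):
        qp, qm = list(q), list(q)
        qp[i] += h; qm[i] -= h
        return (fun(qp, p) - fun(qm, p)) / (2 * h)
    def dp(fun, i):
        pp, pm = list(p), list(p)
        pp[i] += h; pm[i] -= h
        return (fun(q, pp) - fun(q, pm)) / (2 * h)
    return sum(dq(F, i) * dp(G, i) - dq(G, i) * dp(F, i) for i in range(len(q)))

q0 = lambda q, p: q[0]
p0 = lambda q, p: p[0]
p1 = lambda q, p: p[1]
H = lambda q, p: 0.5 * sum(x * x for x in p) + 0.5 * sum(x * x for x in q)

point = ([0.3, -1.2], [0.7, 0.4])
b1 = poisson(q0, p0, *point)   # canonical bracket {q0, p0}_P = 1
b2 = poisson(q0, p1, *point)   # {q0, p1}_P = 0
b3 = poisson(H, H, *point)     # any bracket of F with itself vanishes
```

The field-theoretic bracket replaces the sum over j by a sum and an integral, and δ_{jk} by δ_{jk} δ(x − y), exactly as in the formula above.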
These constraints are called primary constraints because they already follow from the definition of the functions π_j, without using equations of motion. We will assume that these primary constraints are all independent of one another. We now define a functional H′[q, q̇] on configuration space by
$$
H'[q, \dot q] = \int_{\mathbb{R}^3} \sum_j \dot q_j(\mathbf{x})\, \pi_j[q, \dot q](\mathbf{x})\, d^3x - L[q, \dot q].
$$
It can be shown that H′[q, q̇] is actually a functional H′[q, π[q, q̇]], i.e. that the dependence on q̇ is only via π. If there are primary constraints, we cannot extend H′ uniquely to a functional H[q, p] on the entire phase space. Indeed, if H[q, p] is such that H[q, π[q, q̇]] = H′[q, q̇], then
$$
H_T[q, p] = H[q, p] + \sum_{m=1}^M \int_{\mathbb{R}^3} u_{m,\mathbf{x}}[q, p]\, \alpha_{m,\mathbf{x}}[q, p]\, d^3x
$$
also satisfies this property for any set of functionals u_{m,x}[q, p], since α_{m,x}[q, p] vanishes when we substitute π[q, q̇] for p. However, as we will explain below, not all H_T of the form above will give rise to Hamiltonian equations of motion in the phase space that are consistent with the primary constraints. To see this, we first have to introduce the concept of time evolution of the q's and p's. Analogously to the theory of Hamiltonian particle dynamics, the time evolution in Hamiltonian field theory is described by maps t ↦ (q^{(t)}, p^{(t)}), where for each t ∈ R, (q^{(t)}, p^{(t)}) is an element of phase space. This time evolution is given by the Hamiltonian equations of motion
$$
\frac{\partial q_j^{(t)}(\mathbf{x})}{\partial t} = \frac{\partial}{\partial t} F_{j,\mathbf{x}}[q^{(t)}, p^{(t)}] \approx \{F_{j,\mathbf{x}}[q^{(t)}, p^{(t)}],\, H_T[q^{(t)}, p^{(t)}]\}_P,
$$
$$
\frac{\partial p_j^{(t)}(\mathbf{x})}{\partial t} = \frac{\partial}{\partial t} G_{j,\mathbf{x}}[q^{(t)}, p^{(t)}] \approx \{G_{j,\mathbf{x}}[q^{(t)}, p^{(t)}],\, H_T[q^{(t)}, p^{(t)}]\}_P,
$$
where F_{j,x}[q, p] = q_j(x) and G_{j,x}[q, p] = p_j(x), and ≈ means that the equality only holds for those q's and p's that satisfy the constraints. This time evolution of the q's and p's determines the time evolution of any functional g[q, p] by t ↦ g[q^{(t)}, p^{(t)}] =: g^{(t)}[q, p], which is equivalent to
$$
\frac{d}{dt}\, g^{(t)}[q, p] \approx \{g^{(t)}[q, p],\, H_T[q, p]\}_P.
$$
In particular, for g = H_T we find that
$$
\frac{d}{dt}\, H_T^{(t)}[q, p] \approx 0.
$$
Therefore, we often suppress the time dependence of H_T. We can also take g to be a constraint functional α_{m,x}. Because the constraints must be satisfied for all time, we have the equations
$$
\{\alpha_{m,\mathbf{x}}[q, p],\, H_T[q, p]\}_P \approx 0 \qquad (3.30)
$$
for m = 1, ..., M. There are now three options for any one of these equations: (1) the equation is trivially true because it follows from the primary constraints; (2) the equation reduces to an equation not involving the u_{m,x}[q, p], in which case we obtain a new constraint of the form β_x[q, p] ≈ 0, called a secondary constraint; (3) the equation does not reduce in either of the two manners described above, in which case we obtain an equation that describes restrictions on the u_{m,x}, namely
$$
\{\alpha_{m,\mathbf{x}}[q, p],\, H[q, p]\}_P + \sum_{m'=1}^M \int_{\mathbb{R}^3} d^3y\, u_{m',\mathbf{y}}[q, p]\, \{\alpha_{m,\mathbf{x}}[q, p],\, \alpha_{m',\mathbf{y}}[q, p]\}_P \approx 0. \qquad (3.31)
$$
The procedure is now as follows. For each value of m we check which of the three options is satisfied. If it is (1), then we obtain nothing new and we move on to the next m. If it is (2), then we have obtained a new constraint β_x[q, p] ≈ 0 and we must demand consistency of this new constraint by substituting it into (3.30) instead of α_{m,x}[q, p]:
$$
\{\beta_{\mathbf{x}}[q, p],\, H_T[q, p]\}_P \approx 0.
$$
We then have to check again which of the three options is satisfied for this new equation. Finally, if option (3) is satisfied, then we have obtained a constraint on the u_{m,x}[q, p] and we move on to the next m. The final result of this procedure is that we are left with M primary constraints α_{m,x}[q, p] ≈ 0, K secondary constraints β_{k,x}[q, p] ≈ 0 and L constraints on the u_{m,x} of the form (3.31). Because the distinction between primary and secondary constraints is not really necessary, from now on we will use the letter χ for both primary and secondary constraints: χ_{m,x} := α_{m,x} for m = 1, ..., M and χ_{M+k,x} := β_{k,x} for k = 1, ..., K, so we can write the set of primary and secondary constraints as χ_{n,x}[q, p] ≈ 0, with n = 1, ..., M + K. The constraints on the u_{m,x}[q, p] are now of the form
$$
\{\chi_{n,\mathbf{x}}[q, p],\, H[q, p]\}_P + \sum_{m=1}^M \int_{\mathbb{R}^3} d^3y\, u_{m,\mathbf{y}}[q, p]\, \{\chi_{n,\mathbf{x}}[q, p],\, \chi_{m,\mathbf{y}}[q, p]\}_P \approx 0
$$
for some of the n ∈ {1, ..., M + K}. We can interpret this as a system of non-homogeneous linear equations in the unknown variables u_{m,x}[q, p]. If U_{m,x}[q, p] is any particular solution of this system, then the general solution can be written as
$$
u_{m,\mathbf{x}}[q, p] = U_{m,\mathbf{x}}[q, p] + \sum_{a=1}^A \int_{\mathbb{R}^3} d^3z\, v_{a,\mathbf{z}}[q, p]\, (V^{(a,\mathbf{z})})_{m,\mathbf{x}}[q, p],
$$
where the v_{a,z}[q, p] are arbitrary functionals and the (V^{(a,z)})_{m,x}[q, p] are all independent solutions of the corresponding homogeneous system:
$$
\sum_{m=1}^M \int_{\mathbb{R}^3} d^3y\, (V^{(a,\mathbf{z})})_{m,\mathbf{y}}[q, p]\, \{\chi_{n,\mathbf{x}}[q, p],\, \chi_{m,\mathbf{y}}[q, p]\}_P \approx 0.
$$
The index (a, z), which labels the solutions of the homogeneous system, in general takes on fewer values than the index (m, x), so the constraint equations on u_{m,x}[q, p] have reduced the arbitrariness of the Hamiltonian H_T somewhat. The Hamiltonian H_T can now be written as
$$
H_T[q, p] = H[q, p] + \sum_{m=1}^M \int_{\mathbb{R}^3} d^3x \left[U_{m,\mathbf{x}}[q, p] + \sum_{a=1}^A \int_{\mathbb{R}^3} d^3z\, v_{a,\mathbf{z}}[q, p]\, (V^{(a,\mathbf{z})})_{m,\mathbf{x}}[q, p]\right] \chi_{m,\mathbf{x}}[q, p]
$$
$$
= H[q, p] + \sum_{m=1}^M \int_{\mathbb{R}^3} d^3x\, U_{m,\mathbf{x}}[q, p]\, \chi_{m,\mathbf{x}}[q, p] + \sum_{a=1}^A \int_{\mathbb{R}^3} d^3z\, v_{a,\mathbf{z}}[q, p] \left(\sum_{m=1}^M \int_{\mathbb{R}^3} d^3x\, (V^{(a,\mathbf{z})})_{m,\mathbf{x}}[q, p]\, \chi_{m,\mathbf{x}}[q, p]\right)
$$
$$
=: \widetilde{H}[q, p] + \sum_{a=1}^A \int_{\mathbb{R}^3} d^3z\, v_{a,\mathbf{z}}[q, p]\, \widetilde{\chi}_{a,\mathbf{z}}[q, p].
$$
We have thus obtained an expression for the Hamiltonian in which the arbitrariness is made very explicit. The equations of motion for any functional g^{(t)}[q, p] can be written in terms of this Hamiltonian as
$$
\frac{d}{dt}\, g^{(t)}[q, p] \approx \{g^{(t)}[q, p],\, \widetilde{H}[q, p]\}_P + \sum_{a=1}^A \int_{\mathbb{R}^3} d^3z\, v^{(t)}_{a,\mathbf{z}}[q, p]\, \{g^{(t)}[q, p],\, \widetilde{\chi}_{a,\mathbf{z}}[q, p]\}_P,
$$
where the time dependence of the v^{(t)}_{a,z}[q, p] is arbitrary. Because of this arbitrariness in the equations of motion, the time evolution of g^{(t)}[q, p] is not uniquely defined.
The physical interpretation of this is that different choices for v^{(t)}_{a,z}[q, p] correspond to the same physical situation; the system has some gauge freedom. The infinitesimal gauge transformations are of the form
$$
g[q, p] \to g[q, p] + \sum_{a=1}^A \int_{\mathbb{R}^3} d^3z\, \epsilon_{a,\mathbf{z}}\, \{g[q, p],\, \widetilde{\chi}_{a,\mathbf{z}}[q, p]\}_P. \qquad (3.32)
$$
Transformations of this kind do not change the physical state. The χ̃_{a,z}[q, p] are called generating functionals for the infinitesimal gauge transformations. It can be shown that the Poisson bracket {χ̃_{a,z}[q, p], χ̃_{a′,z′}[q, p]}_P of two generating functionals is again a generating functional. In general, this gives rise to new gauge transformations, other than (3.32). For a better understanding of these new gauge transformations, we introduce some useful terminology. Any functional A[q, p] for which {A[q, p], χ_{n,x}[q, p]}_P ≈ 0 for every pair (n, x) is called first class, and it is easy to see that the Poisson bracket of two first-class functionals is again first class. Any functional that is not first class is called second class. We can apply this terminology to the constraint functionals χ_{n,x}[q, p] themselves. In this manner the constraints can be divided into first-class constraints and second-class constraints. The constraint functionals χ̃_{a,z}[q, p], which are generating functionals for gauge transformations of the form (3.32), are all first class (and primary). Because the Poisson bracket of two first-class functionals is again first class, we find that the {χ̃_{a,z}[q, p], χ̃_{a′,z′}[q, p]}_P are first-class constraints. However, it might happen that some of these are not primary (and hence secondary), and in this case we obtain gauge transformations other than (3.32). The corresponding first-class secondary constraints can be added to the Hamiltonian in a similar manner as the χ̃_{a,z}[q, p] without changing the physics; this new (and more general) Hamiltonian is called the extended Hamiltonian.
In general it is not true that every first-class secondary constraint generates a gauge transformation (so we should not add them all to the Hamiltonian), but in all physically interesting models this turns out to be the case. In these models all first-class constraints can be eliminated by choosing a gauge, so from now on we will only need to focus on second-class constraints. In particular, we will only discuss the quantization of a system with second-class constraints.

Quantization

For a system with second-class constraints χ_{n,x}[q, p] ≈ 0 we define the "matrix"
$$
C_{(n_1,\mathbf{x}_1),(n_2,\mathbf{x}_2)}[q, p] = \{\chi_{n_1,\mathbf{x}_1}[q, p],\, \chi_{n_2,\mathbf{x}_2}[q, p]\}_P.
$$
Because the constraints are second class, this matrix has an inverse (C^{−1})_{(n_1,x_1),(n_2,x_2)}[q, p] (recall that we are not trying to be mathematically rigorous in this chapter). We then define the Dirac bracket of two functionals by
$$
\{A[q, p], B[q, p]\}_D := \{A[q, p], B[q, p]\}_P - \sum_{n_1,n_2} \int d^3x_1\, d^3x_2\, \{A[q, p], \chi_{n_1,\mathbf{x}_1}[q, p]\}_P\, (C^{-1})_{(n_1,\mathbf{x}_1),(n_2,\mathbf{x}_2)}[q, p]\, \{\chi_{n_2,\mathbf{x}_2}[q, p], B[q, p]\}_P.
$$
In quantizing this system, the functionals become operators, and we impose on them the commutation relations [A, B] = i{A, B}_D; the time evolution in the Heisenberg picture of the corresponding quantum system is given by the Heisenberg equations of motion, as usual. The reason for choosing these commutation relations can be explained as follows for the case of finitely many^30 degrees of freedom. Recall that for unconstrained systems the commutation relations are defined to be i times the Poisson bracket. If we have a classical system with only second-class constraints, then it can be shown that there exists a canonical transformation^31 that transforms the coordinates q_1, ..., q_n and their canonical conjugates p_1, ..., p_n to a set of coordinates Q_1, ..., Q_{n−r} and Q̂_1, ..., Q̂_r with canonical conjugates P_1, ..., P_{n−r} and P̂_1, ..., P̂_r, respectively, such that the constraints take the form Q̂_j = P̂_j = 0 with j = 1, ..., r. The Q's and the P's then form an unconstrained system, which we know how to quantize, namely by taking the commutators to be i times the Poisson brackets (in terms of the unconstrained P's and Q's, of course). But it can be shown that the Dirac bracket, calculated in the original coordinates q_j and p_j, coincides with the Poisson bracket calculated in terms of the Q's and P's, so using the Dirac bracket indeed gives the desired result. More details about this quantization procedure can be found in section 7.6 of [35].

^30 Taking finitely many degrees of freedom makes the argument a little easier.
^31 This is a coordinate transformation for which the new coordinates have the same Poisson brackets as the old ones. Here the Poisson brackets of the new coordinates are computed with respect to the old coordinates.

Transition to the interaction picture

Many (or perhaps even all) of the free fields that we constructed at the beginning of this chapter can be obtained by quantizing a classical Lagrangian field theory according to the procedure described above, and in these cases the Lagrangian is known explicitly. However, the expansion of the free fields in terms of creation and annihilation operators does not arise in any way from the quantization procedure described above. Instead, this expansion can be obtained by solving the classical equations of motion (which are linear differential equations for these free-field Lagrangians) and writing the general solution as a Fourier expansion. The Fourier coefficients are then replaced with the creation and annihilation operators in the corresponding quantized field. Since the commutation relations of the quantized free fields are dictated by the quantization procedure above, the commutation relations of the creation and annihilation operators are also determined by the quantization procedure. However, these commutation relations turn out to be precisely the ones that we had before (so there is some consistency here).
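The finite-dimensional argument above is easy to check numerically: for a toy system with the second-class constraints χ_1 = q_2 ≈ 0 and χ_2 = p_2 ≈ 0, the Dirac bracket of the constrained pair vanishes while the canonical bracket of the unconstrained pair survives. A sketch restricted to exactly two constraints (all names are ours):

```python
def poisson(F, G, q, p, h=1e-6):
    """Finite-dimensional Poisson bracket via central differences."""
    def dq(fun, i):
        qp, qm = list(q), list(q); qp[i] += h; qm[i] -= h
        return (fun(qp, p) - fun(qm, p)) / (2 * h)
    def dp(fun, i):
        pp, pm = list(p), list(p); pp[i] += h; pm[i] -= h
        return (fun(q, pp) - fun(q, pm)) / (2 * h)
    return sum(dq(F, i) * dp(G, i) - dq(G, i) * dp(F, i) for i in range(len(q)))

def dirac(A, B, chis, q, p):
    """Dirac bracket {A, B}_D for exactly two second-class constraints:
    {A, B}_P minus {A, chi_a} (C^-1)_{ab} {chi_b, B}, C_ab = {chi_a, chi_b}_P."""
    C = [[poisson(ca, cb, q, p) for cb in chis] for ca in chis]
    det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
    Cinv = [[C[1][1] / det, -C[0][1] / det],
            [-C[1][0] / det, C[0][0] / det]]
    corr = sum(poisson(A, chis[a], q, p) * Cinv[a][b] * poisson(chis[b], B, q, p)
               for a in range(2) for b in range(2))
    return poisson(A, B, q, p) - corr

chi = [lambda q, p: q[1], lambda q, p: p[1]]   # second class: {q2, p2}_P = 1
pt = ([0.2, 0.0], [0.5, 0.0])
d1 = dirac(lambda q, p: q[0], lambda q, p: p[0], chi, *pt)  # {q1, p1}_D = 1
d2 = dirac(lambda q, p: q[1], lambda q, p: p[1], chi, *pt)  # {q2, p2}_D = 0
```

This mirrors the statement in the text: in the transformed coordinates (Q, P, Q̂, P̂) the Dirac bracket of the q's and p's is just the Poisson bracket of the unconstrained pairs.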
Furthermore, the free Hamiltonian that is derived from the Lagrangian is also of the correct form when we write it in terms of the creation and annihilation operators. Suppose now that we are given a Lagrangian for an interacting field theory. Once we have derived the Hamiltonian from the Lagrangian by the procedure described above, we will split the Hamiltonian into a sum H = H0 + V of a free part H0 and an interaction part V . Finding the correct free part H0 of H is not a very difficult task, because for all physically relevant free fields we know the explicit form of the Lagrangian (and hence also of the Hamiltonian). Once we have the decomposition H = H0 +V , we make the transition to the interaction picture at time t = 0. So at t = 0 the interaction picture fields and their canonical conjugates coincide with the Heisenberg fields and their canonical conjugates, but they evolve from t = 0 in a different manner. The next task is to express the canonical conjugates of the interaction picture fields in terms of the fields and their time derivatives. This is done by taking the functional derivative of H0 (not H) with respect to the canonically conjugate fields. The result is an expression for the interaction term V in the interaction picture in terms of the free fields, as desired. 3.5 Some remarks on the physics of quantum fields It is clear that the entire framework described in the preceding sections was based on the fact that we want to calculate the S-matrix elements for a given scattering experiment. These S-matrix elements describe the scattering experiment in terms of the incoming particles and the resulting outgoing particles, but they give no insight into the processes during the period that the particles are interacting. The free field expansion of the S-operator is not very useful to examine this. Also, the methods above give us no information about how to describe arbitrary relativistic quantum systems (beyond scattering experiments). 
One could argue that for many practical purposes this is unnecessary, but in the end any satisfactory fundamental theory of nature should in principle be able to describe any system. Therefore, we must extend our discussion of the preceding sections. The obvious extension would be to somehow give meaning to Lagrangian theories in terms of interacting Heisenberg fields, and to assume that these fields describe the exact evolution of the system. For scattering experiments, these fields then interpolate between the fields of incoming and outgoing free particles. It is difficult to imagine what these fields would be like from a mathematical point of view, since in general we cannot even solve the (non-linear) classical equations of motion corresponding to these fields. However, there are some properties that we must expect these fields to have. For example, they should transform under Poincaré transformations in such a way that the theory is Poincaré invariant, and any physical quantity that can be measured in some bounded spacetime region must be compatible with the physical quantities that can be measured in some other spacelike-separated region. The latter demand can most easily be implemented by assuming that the fields either commute or anticommute at spacelike separated points, just as the free fields described earlier. These considerations will all be used to motivate the two mathematical frameworks for quantum field theory that we will discuss in the next chapter.

4 The mathematics of quantum fields

In this chapter we will discuss the mathematical structure of quantum field theory. In the first section we will introduce the Wightman axioms and we will motivate these axioms by referring back to several physical aspects that we discussed in the previous chapters. After introducing these axioms we will formulate some of the important theorems that can be proven within the Wightman formalism.
In the second section of this chapter we will briefly discuss an alternative approach to the problem of finding a mathematical framework for quantum field theory, namely the theory of local observables, also called algebraic quantum field theory. The axioms of the theory of local observables are often referred to as the Haag-Kastler axioms.

4.1 The Wightman formulation of quantum field theory

Before we can understand the Wightman formalism, we first need to develop some mathematical background. This mathematical background consists mainly of the theory of distributions and operator-valued distributions and will be summarized in the following subsection, which is based on chapter 2 of [32] and section 2.7 of [2]. The rest of this section on the Wightman formulation is also mainly based on these two books.

4.1.1 Mathematical preliminaries: Distributions and operator-valued distributions

Let $C^\infty(\mathbb{R}^N)$ be the space of infinitely differentiable complex-valued functions $f(x_1,\dots,x_N)$ on $\mathbb{R}^N$. For any sequence $k = (k_1,\dots,k_N)$ with $k_j \in \mathbb{Z}_{\geq 0}$ we define a function $x = (x_1,\dots,x_N) \mapsto x^k$ in $C^\infty(\mathbb{R}^N)$, where $x^k$ is given by $x^k := x_1^{k_1}\cdots x_N^{k_N}$. Also, for any such sequence $k$ we define the differential operator $D^k$ on $C^\infty(\mathbb{R}^N)$ by
\[
D^k := \frac{\partial^{|k|}}{(\partial x_1)^{k_1}\cdots(\partial x_N)^{k_N}},
\]
where $|k| = k_1 + \dots + k_N$. For each $f \in C^\infty(\mathbb{R}^N)$ we then define for any $r,s \in \mathbb{Z}_{\geq 0}$
\[
\|f\|_{r,s} = \sum_{\{k:|k|\leq r\}}\ \sum_{\{l:|l|\leq s\}}\ \sup_x |x^k D^l f(x)|,
\]
which is either a non-negative real number or else $+\infty$. For any fixed $r,s$ the restriction of $\|\cdot\|_{r,s} : C^\infty(\mathbb{R}^N) \to \mathbb{R}$ to the linear subspace $C^\infty(\mathbb{R}^N)_{r,s}$ of $C^\infty(\mathbb{R}^N)$ consisting of all functions $f$ for which $\|f\|_{r,s} < \infty$ is a norm.

Definition 4.1 The function space $S(\mathbb{R}^N)$ is defined to be the set of all $f \in C^\infty(\mathbb{R}^N)$ for which $\|f\|_{r,s} < \infty$ for all $r,s \in \mathbb{Z}_{\geq 0}$. The space $S(\mathbb{R}^N)$ is also called the Schwartz space.
In particular, since the $\|\cdot\|_{r,s}$ are norms (and hence semi-norms) on the Schwartz space, we can give $S(\mathbb{R}^N)$ the structure of a locally convex space by taking as a subbasis for the topology the sets of the form
\[
B_{r,s}(f_0,\epsilon) = \{f \in S(\mathbb{R}^N) : \|f - f_0\|_{r,s} < \epsilon\},
\]
where $f_0 \in S(\mathbb{R}^N)$, $r,s \in \mathbb{Z}_{\geq 0}$ and $\epsilon > 0$. Thus a set $U \subset S(\mathbb{R}^N)$ is open if and only if for each $f_0 \in U$ there are $r_1,\dots,r_n$, $s_1,\dots,s_n$ and $\epsilon_1,\dots,\epsilon_n > 0$ such that $\bigcap_{j=1}^n B_{r_j,s_j}(f_0,\epsilon_j) \subset U$. As in any topological space, we say that a sequence $(f_n)$ of elements in $S(\mathbb{R}^N)$ converges to $f \in S(\mathbb{R}^N)$ if for each open neighborhood $U$ of $f$ there exists a positive integer $N_U$ such that $f_n \in U$ for all $n \geq N_U$. In particular, this condition must hold for any open neighborhood of the form $B_{r,s}(f,\epsilon)$, so if $(f_n)$ converges to $f$ then for every $r,s \in \mathbb{Z}_{\geq 0}$ and every $\epsilon > 0$ there exists a positive integer $N_{r,s,\epsilon}$ such that $\|f_n - f\|_{r,s} < \epsilon$ for all $n \geq N_{r,s,\epsilon}$. Conversely, if $(f_n)$ is a sequence with this property, then $(f_n)$ must converge to $f$, because each open neighborhood $U$ of $f$ contains an open subset of the form $\bigcap_{j=1}^n B_{r_j,s_j}(f,\epsilon_j)$. Therefore, we conclude that a sequence $(f_n)$ in the Schwartz space converges to some $f$ in the Schwartz space if and only if for all $r,s \in \mathbb{Z}_{\geq 0}$ we have $\lim_{n\to\infty}\|f_n - f\|_{r,s} = 0$.

Now that we have defined the Schwartz space we will introduce the notion of a distribution on this space.

Definition 4.2 A continuous linear functional $T : S(\mathbb{R}^N) \to \mathbb{C}$ on the Schwartz space is called a tempered distribution. We will denote the space of tempered distributions by $S'(\mathbb{R}^N)$.
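As a concrete illustration of the seminorms $\|f\|_{r,s}$ (a numerical sketch, not part of the text): for the Gaussian $f(x) = e^{-x^2/2}$ on $\mathbb{R}$ every seminorm is finite, and for instance $\sup_x |x\,e^{-x^2/2}| = e^{-1/2}$, attained at $x = \pm 1$.

```python
import numpy as np

# Seminorms ||f||_{r,s} for the Gaussian f(x) = exp(-x^2/2) on R (N = 1),
# using analytic derivatives; the sup over R is approximated by a sup over a grid.
x = np.linspace(-10, 10, 200001)
f = np.exp(-x**2 / 2)
derivs = [f, -x * f, (x**2 - 1) * f]   # f, f', f''

def seminorm(r, s):
    # ||f||_{r,s} = sum_{k<=r} sum_{l<=s} sup_x |x^k D^l f(x)|
    return sum(np.max(np.abs(x**k * derivs[l]))
               for k in range(r + 1) for l in range(s + 1))

print(abs(np.max(np.abs(x * f)) - np.exp(-0.5)) < 1e-8)   # sup |x f(x)| = e^{-1/2}
print(all(np.isfinite(seminorm(r, s)) for r in range(3) for s in range(3)))
```

A function like $1/(1+x^2)$ would fail the same test: $x^3/(1+x^2)$ is unbounded, so not all seminorms are finite and the function is not in $S(\mathbb{R})$.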
Because the Schwartz space $S(\mathbb{R}^N)$ is metrizable³², the continuity of a linear functional $T : S(\mathbb{R}^N) \to \mathbb{C}$ can be expressed in terms of sequences (see for instance [25], theorem 21.3): the linear functional $T$ is continuous if and only if for each sequence $(f_n)$ converging to $f$ we have that $T(f_n)$ converges to $T(f)$, which is equivalent to the statement that $\lim_{n\to\infty}\|f_n - f\|_{r,s} = 0$ for all $r,s \in \mathbb{Z}_{\geq 0}$ implies $\lim_{n\to\infty}|T(f_n) - T(f)| = 0$.

The most natural topology to define on $S'(\mathbb{R}^N)$ is the weak*-topology, which is the topology that is defined by the seminorms $p_f : T \mapsto |T(f)|$. With respect to this topology, a sequence $(T_n)$ of tempered distributions converges to a tempered distribution $T$ if and only if $\lim_{n\to\infty}|T_n(f) - T(f)| = 0$ for all $f \in S(\mathbb{R}^N)$.

We now introduce some terminology. We say that a distribution $T \in S'(\mathbb{R}^N)$ vanishes in an open set $U \subset \mathbb{R}^N$ if $T(f) = 0$ for all $f \in S(\mathbb{R}^N)$ for which $\mathrm{supp}(f) \subset U$. Here $\mathrm{supp}(f)$ denotes the support of $f$, i.e. the complement of the largest open set contained in $\{x \in \mathbb{R}^N : f(x) = 0\}$. We then define the support $\mathrm{supp}(T)$ of the distribution $T$ to be the complement of the largest open set on which $T$ vanishes.

An important example of a tempered distribution is
\[
T(f) = \sum_{\{k:|k|\leq s\}} \int_{\mathbb{R}^N} t_k(x_1,\dots,x_N)\, D^k f(x_1,\dots,x_N)\, dx_1\cdots dx_N, \tag{4.1}
\]
where $k = (k_1,\dots,k_N)$ (with $k_j \in \mathbb{Z}_{\geq 0}$) and the functions $t_k$ are continuous and satisfy $|t_k(x)| \leq C_k(1 + \|x\|^{j_k})$ for some $C_k \geq 0$ and $j_k \in \mathbb{Z}_{\geq 0}$. A particularly nice case of (4.1) is when $T$ is of the form $T(f) = \int_{\mathbb{R}^N} t(x) f(x)\, d^N x$. In that case we say that the tempered distribution $T$ is a function. However, it is convenient to write any distribution $T$ as $T(x)$, even though it is not a function.

For any non-singular linear transformation $L : \mathbb{R}^N \to \mathbb{R}^N$ and any vector $a \in \mathbb{R}^N$ we can define the diffeomorphism $\phi_{(a,L)}(x) = Lx + a$ on $\mathbb{R}^N$. For any function $f \in S(\mathbb{R}^N)$ we then define a new function $f_{(a,L)}$ by
\[
f_{(a,L)}(x) := f(\phi_{(a,L)}^{-1}(x)) = f(L^{-1}(x - a)).
\]
For a distribution $T \in S'(\mathbb{R}^N)$ we define a new distribution $T_{(a,L)}$ by
\[
T_{(a,L)}(f) := |\det(L)|^{-1}\, T(f_{(a,L)})
\]
for all $f \in S(\mathbb{R}^N)$. If we define variables $(y_1,\dots,y_N) = (\phi^{-1}_1(x),\dots,\phi^{-1}_N(x))$, then we find that the volume element in $\mathbb{R}^N$ satisfies $d^N x = |\det(L)|\, d^N y$. Now suppose that $T$ is a distribution that is given by $T(f) = \int_{\mathbb{R}^N} t(x) f(x)\, d^N x$, as a special case of (4.1). Then
\[
\begin{aligned}
T_{(a,L)}(f) = |\det(L)|^{-1}\, T(f_{(a,L)}) &= |\det(L)|^{-1} \int_{\mathbb{R}^N} t(x)\, f(\phi^{-1}(x))\, d^N x \\
&= |\det(L)|^{-1} \int_{\mathbb{R}^N} (t\circ\phi)(\phi^{-1}(x))\, f(\phi^{-1}(x))\, d^N x \\
&= |\det(L)|^{-1} \int_{\mathbb{R}^N} (t\circ\phi)(y)\, f(y)\, |\det(L)|\, d^N y \\
&= \int_{\mathbb{R}^N} t(Lx + a)\, f(x)\, d^N x, \qquad (4.2)
\end{aligned}
\]
where in the last step we changed dummy variables. As we have stated before, distributions are often written as $T(x)$, even though they are not functions in general. In view of equation (4.2) above, the distribution $T_{(a,L)}$ is then denoted by $T(Lx + a)$.

[Footnote 32: This is because the topology of $S(\mathbb{R}^N)$ is determined by countably many seminorms; see also proposition IV.2.1 of [5] for this argument. It can be shown that $S(\mathbb{R}^N)$ is complete as a metric space, so that it is in fact a Fréchet space, but we will not need this fact.]

With the definition of $T_{(a,L)}$ at hand, we can now define the partial derivative of a tempered distribution $T$ as
\[
\frac{\partial T}{\partial x_j} := \lim_{h\to 0} h^{-1}\big(T_{(h e_j,\,1)} - T\big),
\]
where $\{e_j\}_{j=1}^N$ is the standard basis for $\mathbb{R}^N$. By definition of convergence in $S'(\mathbb{R}^N)$ this means that for each $f \in S(\mathbb{R}^N)$ we have
\[
\frac{\partial T}{\partial x_j}(f) = \lim_{h\to 0} h^{-1}\big(T_{(h e_j,1)} - T\big)(f) = \lim_{h\to 0} h^{-1}\big[T(f_{(h e_j,1)}) - T(f)\big] = \lim_{h\to 0} T\big[h^{-1}(f_{(h e_j,1)} - f)\big] = T\Big[\lim_{h\to 0} h^{-1}(f_{(h e_j,1)} - f)\Big] = -T\Big(\frac{\partial f}{\partial x_j}\Big),
\]
where in the second last step we used that $h^{-1}(f_{(h e_j,1)} - f)$ converges in $S(\mathbb{R}^N)$ and that $T$ is continuous. For higher-order derivatives we have $(D^k T)(f) = (-1)^{|k|}\, T(D^k f)$.

Suppose that for some $n \geq 2$ we have a distribution $T \in S'(\mathbb{R}^{n\cdot N})$ and let $f_1,\dots,f_n \in S(\mathbb{R}^N)$. Then the product function $f_1 \cdot \ldots \cdot f_n$ is an element of $S(\mathbb{R}^{n\cdot N})$, so we can consider $T(f_1 \cdot \ldots \cdot f_n)$.
This expression is linear in each of the $f_j$ in the sense that
\[
T(f_1\cdot\ldots\cdot(\lambda f_j' + \mu f_j'')\cdot\ldots\cdot f_n) = \lambda\, T(f_1\cdot\ldots\cdot f_j'\cdot\ldots\cdot f_n) + \mu\, T(f_1\cdot\ldots\cdot f_j''\cdot\ldots\cdot f_n),
\]
and it depends continuously on each of the $f_j$ in the sense that
\[
\lim_{l\to\infty} T(f_1\cdot\ldots\cdot f_{j,l}\cdot\ldots\cdot f_n) = T(f_1\cdot\ldots\cdot f_j\cdot\ldots\cdot f_n)
\]
if $\lim_{l\to\infty} f_{j,l} = f_j$. Thus, $T \in S'(\mathbb{R}^{n\cdot N})$ defines a multilinear functional on $S(\mathbb{R}^N)^{\times n}$ that is separately continuous in each of its arguments. Conversely, it is also true that each multilinear functional on $S(\mathbb{R}^N)^{\times n}$ which is separately continuous in each of its arguments can be derived from a (unique) tempered distribution on $S(\mathbb{R}^{n\cdot N})$ as above. This is the content of the nuclear theorem, which can be found in section 1.3 of [9].

Theorem 4.3 (Nuclear theorem) Let $\widehat T : S(\mathbb{R}^N)^{\times n} \to \mathbb{C}$ be a multilinear functional which is separately continuous in each of its arguments. Then there exists a unique tempered distribution $T \in S'(\mathbb{R}^{n\cdot N})$ such that for all $f_1,\dots,f_n \in S(\mathbb{R}^N)$ we have $\widehat T(f_1,f_2,\dots,f_n) = T(f_1\cdot f_2\cdot\ldots\cdot f_n)$.

We will now discuss the Fourier transform on distributions. Recall that the Fourier transform and the inverse Fourier transform of a Schwartz function $f$ are defined by
\[
(F_B f)(p) = \Big(\frac{1}{\sqrt{2\pi}}\Big)^{N} \int_{\mathbb{R}^N} e^{-iB(p,x)}\, f(x)\, d^N x
\]
and
\[
(\overline F_B f)(p) = \Big(\frac{1}{\sqrt{2\pi}}\Big)^{N} \int_{\mathbb{R}^N} e^{iB(p,x)}\, f(x)\, d^N x,
\]
respectively, where $B(\cdot,\cdot)$ denotes a non-degenerate symmetric bilinear form (for example the Euclidean or the Minkowskian form). Now let $f,g \in S(\mathbb{R}^N)$. Because Schwartz functions behave very nicely, we can use Fubini's theorem to conclude that the Fourier transform satisfies
\[
\int_{\mathbb{R}^N} (F_B f)(p)\, g(p)\, d^N p = \int_{\mathbb{R}^N} f(x)\, (F_B g)(x)\, d^N x.
\]
We will use this in the following case. Suppose that $T \in S'(\mathbb{R}^N)$ is a tempered distribution of the form $T(f) = \int_{\mathbb{R}^N} h(x) f(x)\, d^N x$ with $h \in S(\mathbb{R}^N)$ a Schwartz function. We will write $T_h$ instead of $T$ to denote the dependence of $T$ on $h$.
Then the equality above implies that we have the identity
\[
T_{F_B h}(g) = \int_{\mathbb{R}^N} (F_B h)(p)\, g(p)\, d^N p = \int_{\mathbb{R}^N} h(x)\, (F_B g)(x)\, d^N x = T_h(F_B g).
\]
Similarly, we also have the identity $T_{\overline F_B h}(g) = T_h(\overline F_B g)$ for the inverse Fourier transform. The left-hand sides of these two equations can be used as a definition of the Fourier transform and its inverse, respectively, of the tempered distribution $T_h$. This motivates the following definition of the Fourier transform and the inverse Fourier transform for tempered distributions.

Definition 4.4 Let $T \in S'(\mathbb{R}^N)$ be a tempered distribution. Then the Fourier transform $F_B T$ of $T$ is defined by $(F_B T)(f) = T(F_B f)$. The inverse Fourier transform $\overline F_B T$ of $T$ is defined by $(\overline F_B T)(f) = T(\overline F_B f)$.

In order to also define the Laplace transform on distributions, it is convenient to start with a larger class of distributions than $S'(\mathbb{R}^N)$. Let $D(\mathbb{R}^N) \subset C^\infty(\mathbb{R}^N)$ denote the set of all $C^\infty$-functions with compact support. By definition, a sequence $(f_n)$ in $D(\mathbb{R}^N)$ converges to $f \in D(\mathbb{R}^N)$ if the supports of all $f_n$ lie in a single compact set $K$, if $f_n \to f$ uniformly in $K$ and if all derivatives of $f_n$ converge uniformly in $K$ to the corresponding derivatives of $f$. It is clear that $D(\mathbb{R}^N) \subset S(\mathbb{R}^N)$ as sets, so every tempered distribution defines a linear functional on $D(\mathbb{R}^N)$, and because convergence of a sequence in $D(\mathbb{R}^N)$ implies convergence of the same sequence in $S(\mathbb{R}^N)$, we see that any tempered distribution is in fact continuous with respect to the topology on $D(\mathbb{R}^N)$. So if $D'(\mathbb{R}^N)$ denotes the space of all continuous linear functionals on $D(\mathbb{R}^N)$, then we have the inclusion $S'(\mathbb{R}^N) \subset D'(\mathbb{R}^N)$. In general this inclusion will be strict, and we have thus obtained a class of distributions $D'(\mathbb{R}^N)$ that is larger than $S'(\mathbb{R}^N)$. Now if $T \in D'(\mathbb{R}^N)$, then for each $g \in C^\infty(\mathbb{R}^N)$ we can define a distribution $gT$ by $(gT)(f) = T(fg)$ for $f \in D(\mathbb{R}^N)$. It is easy to see that $gT \in D'(\mathbb{R}^N)$. Sometimes it can happen that $gT$ is even a tempered distribution.
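Definition 4.4 is modeled on the duality identity above, which can be checked numerically in one dimension with $B(p,x) = px$ (an illustration with Gaussian test functions, not taken from the text; grid and tolerances are arbitrary choices):

```python
import numpy as np

# Check T_{F_B h}(g) = T_h(F_B g) in one dimension, B(p, x) = p*x,
# approximating all integrals by Riemann sums on a common grid.
x = np.linspace(-20.0, 20.0, 2001)
dx = x[1] - x[0]

def fourier(f_vals):
    # (F f)(p) = (2 pi)^{-1/2} * integral of exp(-i p x) f(x) dx, p on the same grid
    return np.array([np.sum(np.exp(-1j * p * x) * f_vals) for p in x]) * dx / np.sqrt(2 * np.pi)

h = np.exp(-x**2 / 2)            # the Gaussian is (numerically) its own transform
g = np.exp(-(x - 1)**2 / 4)

lhs = np.sum(fourier(h) * g) * dx    # T_{F h}(g)
rhs = np.sum(h * fourier(g)) * dx    # T_h(F g)
print(abs(lhs - rhs) < 1e-8)
print(np.allclose(fourier(h), h, atol=1e-6))
```

Both sides reduce to the same double sum over the grid because $e^{-ipx}$ is symmetric in $p$ and $x$, so they agree to rounding error; the second check confirms that the Gaussian is fixed by $F_B$.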
Definition 4.5 For each $T \in D'(\mathbb{R}^N)$ we define a set $\Gamma(T) \subset \mathbb{R}^N$ by
\[
\Gamma(T) = \{\gamma \in \mathbb{R}^N : e^{-B(\,\cdot\,,\gamma)}\, T \in S'(\mathbb{R}^N)\},
\]
where $B$ denotes a non-degenerate symmetric bilinear form and $e^{-B(\,\cdot\,,\gamma)}$ denotes the $C^\infty$-function $x \mapsto e^{-B(x,\gamma)}$ on $\mathbb{R}^N$.

It can be shown (see section 2.3 of [32]) that $\Gamma(T)$ is convex, i.e. if $\gamma_1,\gamma_2 \in \Gamma(T)$ then also $t\gamma_1 + (1-t)\gamma_2 \in \Gamma(T)$ for all $0 < t < 1$. Note that this does not exclude the possibility that $\Gamma(T)$ is empty. However, if $T \in S'(\mathbb{R}^N)$ then $0 \in \Gamma(T)$, so in that case $\Gamma(T)$ is certainly non-empty. In general, whenever $T \in D'(\mathbb{R}^N)$ is such that $\Gamma(T)$ is non-empty and such that the support of $T$ lies in some half-space of the form
\[
H^{(B)}_{\alpha,r} := \{x \in \mathbb{R}^N : B(x,\alpha) > r\}
\]
with $\alpha \in \mathbb{R}^N$ and $r \in \mathbb{R}$, the following theorem gives some information about $\Gamma(T)$.

Theorem 4.6 If $T \in D'(\mathbb{R}^N)$ with $\mathrm{supp}(T) \subset H^{(B)}_{\alpha,r}$ for some $\alpha \in \mathbb{R}^N$ and $r \in \mathbb{R}$, then $\Gamma(T)$ contains all points of the form $\gamma + t\alpha$ with $\gamma \in \Gamma(T)$ and $t \geq 0$.

This is theorem 2.7 of [32]. Note that the actual value of $r \in \mathbb{R}$ is not important here. We now define the Laplace transform of a distribution.

Definition 4.7 Let $T \in D'(\mathbb{R}^N)$. For each $\gamma \in \Gamma(T)$ we define the Laplace transform $L_B(T)_\gamma \in S'(\mathbb{R}^N)$ by
\[
L_B(T)_\gamma = F_B\big(e^{-B(\,\cdot\,,\gamma)}\, T\big).
\]

If $T$ is given by $T(f) = \int_{\mathbb{R}^N} t(x) f(x)\, d^N x$, then its Laplace transform is given by $L_B(T)_\gamma(f) = \int_{\mathbb{R}^N} L_B(T)_\gamma(p)\, f(p)\, d^N p$, with the function $L_B(T)_\gamma(p)$ given by
\[
L_B(T)_\gamma(p) = \Big(\frac{1}{\sqrt{2\pi}}\Big)^{N} \int_{\mathbb{R}^N} e^{-iB(p,x)}\, e^{-B(x,\gamma)}\, t(x)\, d^N x = \Big(\frac{1}{\sqrt{2\pi}}\Big)^{N} \int_{\mathbb{R}^N} e^{-iB(x,\,p-i\gamma)}\, t(x)\, d^N x,
\]
where we have extended $B$ to complex vectors by making $B$ a $\mathbb{C}$-bilinear form (not a sesquilinear form). We will often identify the Laplace transform $L_B(T)_\gamma$ with the function $L_B(T)_\gamma(p)$. The expression for $L_B(T)_\gamma(p)$ gives the impression that the Laplace transform of a distribution $T$ of the form above depends on the complex variables $p - i\gamma = (p_1 - i\gamma_1,\dots,p_N - i\gamma_N)$ in a nice way.
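As a one-dimensional sketch of Definition 4.7 (with $B(x,\gamma) = x\gamma$; the specific $t$ is a choice made here for illustration): for $t(x) = \theta(x)e^{-x}$ the support lies in the half-space $\{x > 0\}$, $\Gamma(T) = (-1,\infty)$, and the Laplace transform has the closed form $L_B(T)_\gamma(p) = (2\pi)^{-1/2}/(1 + \gamma + ip)$, visibly a holomorphic function of $p - i\gamma$:

```python
import numpy as np

# Laplace transform of t(x) = theta(x) * exp(-x) in one dimension,
# compared with the exact formula (2*pi)^{-1/2} / (1 + gamma + i*p).
x = np.linspace(0.0, 40.0, 400001)
dx = x[1] - x[0]
t = np.exp(-x)

def laplace(p, gamma):
    # (2*pi)^{-1/2} * integral over x > 0 of exp(-i*p*x) exp(-gamma*x) t(x) dx
    return np.sum(np.exp(-1j * p * x - gamma * x) * t) * dx / np.sqrt(2 * np.pi)

for p, gamma in [(0.0, 0.0), (1.5, 0.5), (2.0, -0.5)]:   # gamma stays inside (-1, inf)
    exact = 1.0 / (np.sqrt(2 * np.pi) * (1 + gamma + 1j * p))
    print(abs(laplace(p, gamma) - exact) < 1e-3)
```

Note that $\gamma = -0.5$ lies outside $\Gamma$ of the plain Fourier transform's domain of absolute convergence shifted by nothing, yet inside $\Gamma(T)$; this is exactly the extra room the half-space support buys, in line with Theorem 4.6.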
In fact, as stated in the following theorem, which is theorem 2.6 in [32], this is even true for general distributions in $D'(\mathbb{R}^N)$.

Theorem 4.8 Let $\Gamma \subset \mathbb{R}^N$ be a convex open set and let $T \in D'(\mathbb{R}^N)$ be such that $\Gamma \subset \Gamma(T)$. Then the Laplace transform $L_B(T)_\gamma$ is a holomorphic function $L_B(T)(p - i\gamma)$ on the tube $\mathbb{R}^N - i\Gamma \subset \mathbb{C}^N$.

We will now apply the theorems above to distributions on Minkowski space. For the inside of the future light cone we will use the notation $V_+ = \{x \in M : \eta(x,x) > 0,\ x^0 > 0\}$, where we have assumed that we have already chosen an orthonormal basis in $M$. The closure of this set is $\overline V_+ = \{x \in M : \eta(x,x) \geq 0,\ x^0 \geq 0\}$ and is just the union of $V_+$ with the future light cone $C_+$. For each $a \in V_+$ we have that $\eta(x,a) \geq 0$ for all $x \in \overline V_+$, so for each $a \in V_+$ we have the inclusion $\overline V_+ \subset H^{(\eta)}_{a,-\epsilon}$ for any $\epsilon > 0$.

Now consider the $n$-fold product $M^n \simeq \mathbb{R}^{4n}$ of $M$. On $M^n$ we define a non-degenerate symmetric bilinear form $\eta^{(n)}$ by
\[
\eta^{(n)}\big((x_1,\dots,x_n),(y_1,\dots,y_n)\big) = \sum_{j=1}^n \eta(x_j,y_j) = \sum_{j=1}^n \eta_{\mu\nu}\, x_j^\mu\, y_j^\nu.
\]
From now on, we will denote the bilinear form $\eta^{(n)}$ simply by $\eta$. Then, analogous to the inclusion above, we have for all $a \in (V_+)^n$
\[
(\overline V_+)^n \subset H^{(\eta)}_{a,-\epsilon}
\]
for any $\epsilon > 0$. Thus, if $T \in S'(M^n)$ (so that $0 \in \Gamma(T)$) has support in $(\overline V_+)^n$, then $ta \in \Gamma(T)$ for all $a \in (V_+)^n$ and $t \geq 0$, i.e. $(V_+)^n \subset \Gamma(T)$. Because $(V_+)^n$ is open and convex in $M^n$, the Laplace transform of $T$ is holomorphic on the tube $M^n - i(V_+)^n =: T_n$. This is summarized in the first part of the following theorem. The second part is theorem 2.9 in [32].

Theorem 4.9 If $T \in S'(M^n)$ with $\mathrm{supp}(T) \subset (\overline V_+)^n$, then the Laplace transform $L_\eta(T)(p - i\gamma)$ is a holomorphic function on the tube $T_n = M^n - i(V_+)^n$. Also, for each $f \in S(M^n)$ we have
\[
\lim_{\gamma\to 0} \int_{M^n} L_\eta(T)(p - i\gamma)\, f(p)\, d^{4n} p = (F_\eta(T))(f).
\]

We will now introduce the notion of a vector-valued distribution. Our purpose for using vector-valued distributions will be to define operator-valued distributions.
Vector-valued distributions can be defined to take on values in any locally convex space, but for us it will be enough to restrict ourselves to Hilbert spaces.

Definition 4.10 Let $H$ be a Hilbert space. Then a linear map $T : S(\mathbb{R}^N) \to H$ is called a vector-valued distribution if for all $\Psi \in D$, with $D \subset H$ some dense linear subspace, the linear functional $f \mapsto \langle T(f), \Psi\rangle$ is continuous.

Thus, a vector-valued distribution is a linear map $T : S(\mathbb{R}^N) \to H$ such that $f \mapsto \langle T(f),\Psi\rangle$ is a tempered distribution for all $\Psi$ in some dense linear subspace $D \subset H$. With this definition, the definition of an operator-valued distribution can be given as follows.

Definition 4.11 Let $T$ be a linear map from the Schwartz space $S(\mathbb{R}^N)$ to the set of closable³³ operators on a Hilbert space $H$ which are all defined on the same dense linear subspace $D \subset H$. Then the map $T$ is called an operator-valued distribution in $H$ if for all $\Psi \in D$ the correspondence $f \mapsto T(f)\Psi$ is a vector-valued distribution.

4.1.2 The Wightman axioms

In section 2.2.4 we showed that in any quantum theory that is Poincaré invariant we should have a unitary representation of the double cover $\widetilde{\mathcal P}^\uparrow_+$ of the restricted Poincaré group $\mathcal P^\uparrow_+$ on the Hilbert space $H$ of pure states. Therefore, before we can even start discussing quantum fields the following axiom must be satisfied.

Axiom 0: Relativistic quantum theory
If $H$ denotes the Hilbert space of pure states for some quantum system in Minkowski spacetime, then there is a unitary representation $U : \widetilde{\mathcal P}^\uparrow_+ \to B(H)$ of the double cover of the restricted Poincaré group on $H$ describing the transformation of states and operators under a Poincaré transformation. In particular, any spacetime translation $(a,1) \in \widetilde{\mathcal P}^\uparrow_+$ is represented by a unitary operator of the form $U(a,1) = e^{ia\cdot P}$, where the self-adjoint operators $P = (P^0,P^1,P^2,P^3)$ are interpreted as the energy-momentum operators of the system.
The points in the joint spectrum of these operators lie on or inside the positive light cone in momentum space (positive energy condition), i.e. the operators $P^0$ and $M^2 = P\cdot P$ are both positive operators.

Our description of quantum fields in the previous chapter seems to suggest that quantum fields are objects which assign an operator to each point in spacetime. However, the fields at a spacetime point are too singular to be a well-defined operator. Therefore, we assume that quantum fields only define well-defined operators after they are smeared out with some rapidly decreasing test function over spacetime. The quantum fields are thus operator-valued distributions. This motivates the following axiom.

Axiom 1: Quantum field
There is an object $\phi = (\phi_1,\dots,\phi_N)$, called a quantum field, whose components are operator-valued distributions mapping each function $f$ in the Schwartz space $S(M)$ of functions on Minkowski spacetime to operators $\phi_1(f),\dots,\phi_N(f)$ on $H$ whose domains all contain the same dense subspace $D \subset H$ and which satisfy $\phi_j(f)D \subset D$. The adjoints $\phi_j(f)^*$ are also operators whose domains contain $D$ and which satisfy $\phi_j(f)^* D \subset D$; the adjoint field $\phi^* = (\phi_1^*,\dots,\phi_N^*)$ is then defined by $\phi_j^*(f) = \phi_j(f)^*$. Furthermore, the dense subset $D$ is left invariant by $U$, i.e. $U(a,A)D \subset D$ for any $(a,A) \in \widetilde{\mathcal P}^\uparrow_+$.

[Footnote 33: An operator $A : H \to H$ is called closable if it has a closed extension, i.e. if it has an extension whose graph is closed in $H \oplus H$.]

Here $S(M)$ is of course the same as $S(\mathbb{R}^4)$. Note that the fact that $D$ is invariant under the fields and their adjoints implies that for any $\Psi \in D$ we can let arbitrary products of smeared fields and their adjoints act on $\Psi$. Equation (3.15) in the previous chapter shows that the quantum field components should transform according to a representation of $\widetilde{\mathcal P}^\uparrow_+$. This is stated in the following axiom.

Axiom 2: Transformation law of the field
For each $f \in S(M)$ we have the operator identity on $D$
\[
U(a,A)\,\phi_j(f)\,U(a,A)^{-1} = \sum_{k=1}^N S(A^{-1})_{jk}\, \phi_k(f_{(a,A)}),
\]
where $f_{(a,A)}(x) := f(\Phi(A)^{-1}(x-a))$ (here $\Phi : \widetilde{\mathcal P}^\uparrow_+ \to \mathcal P^\uparrow_+$ is the covering map) and $S : SL(2,\mathbb{C}) \to GL(\mathbb{C}^N)$ is a representation of $SL(2,\mathbb{C})$ on $\mathbb{C}^N$.

Note that this implies that for the adjoints of the smeared fields we have the transformation law
\[
U(a,A)\,\phi_j(f)^*\,U(a,A)^{-1} = \sum_{k=1}^N \overline{S(A^{-1})_{jk}}\; \phi_k(f_{(a,A)})^*,
\]
which follows easily by taking the adjoint of the transformation law of the fields.

The representation $S$ in axiom 2 is not assumed to be irreducible; in general it will be a direct sum $S = S^{(\kappa_1)} \oplus \dots \oplus S^{(\kappa_\ell)}$ of irreducible representations $S^{(\kappa_j)}$ of $SL(2,\mathbb{C})$. Correspondingly, the field $\phi$ can be decomposed into irreducible fields as $\phi = (\phi^{(\kappa_1)},\dots,\phi^{(\kappa_\ell)})$. Each of these irreducible fields has components $\phi^{(\kappa_j)}_{ab}$ with $a = -A_j, -A_j+1, \dots, A_j$ and $b = -B_j, -B_j+1, \dots, B_j$, where $A_j$ and $B_j$ are the labels characterizing the irreducible representation $S^{(\kappa_j)}$ of $SL(2,\mathbb{C})$ as in the previous chapter. Of course, the $A_j, B_j$ satisfy $\sum_{j=1}^\ell (2A_j+1)(2B_j+1) = N$. Although the parameters $\{A_j,B_j\}_{j=1}^\ell$ give us information about how the different components of the field $\phi$ are related to each other, we cannot say anything about the complete form of any single component $\phi_j$ as an operator-valued distribution. For instance, they do not have to satisfy the Klein-Gordon equation, which was satisfied by our (free) field components in the previous chapter. To say more about the field components, we also need to know about the representation $U(a,A)$ of $\widetilde{\mathcal P}^\uparrow_+$.

To obtain a Lorentz invariant S-matrix it was also necessary that the Hamiltonian density commutes with itself at spacelike distances, see equation (3.5).
This was then translated to the requirement that the fields and their adjoints should in fact commute or anticommute with each other as in equation (3.20), and in section 3.5 we argued that this property should probably remain valid beyond scattering theory. In terms of operator-valued distributions this can be formulated by using Schwartz functions whose supports are spacelike separated. We say that the supports of two Schwartz functions $f$ and $g$ are spacelike separated if $f(x)g(y) = 0$ whenever $(x-y)^2 \geq 0$, i.e. whenever $x - y$ is not spacelike.

Axiom 3: Local commutativity or microscopic causality
If $f$ and $g$ are Schwartz functions on Minkowski spacetime whose supports are spacelike separated, then for any $j,k \in \{1,\dots,N\}$ the corresponding smeared operators either commute or anticommute, i.e.
\[
[\phi_j(f),\, \phi_k(g)]_\pm = 0
\]
as an operator identity on $D$. Similarly, we also have $[\phi_j(f),\, \phi_k(g)^*]_\pm = 0$.

From the Poincaré covariance of the fields we expect that the components of an irreducible field $\phi^{(\kappa_j)}$ either all commute with each other at spacelike distances, or else all anticommute with each other at spacelike distances.

Finally, we will also assume that each quantum field theory has a unique vacuum state and that the entire Hilbert space $H$ of pure states can be constructed by acting on the vacuum state with polynomials in the smeared fields.

Axiom 4: Vacuum state
There exists a unique³⁴ vector $\Omega \in D$, called the vacuum state vector, which is invariant under the action of $\widetilde{\mathcal P}^\uparrow_+$,
\[
U(a,A)\,\Omega = \Omega,
\]
and which is cyclic for the smeared fields, i.e. the set $P(\phi_1,\dots,\phi_N)\Omega$ of polynomials in the smeared fields acting on the vacuum vector forms a dense subspace of $H$.

A quantum theory which satisfies axioms 0-4 is called a quantum field theory. It is characterized by the objects $(H, D, U, \phi, \Omega)$. The free fields discussed in the previous chapter provide examples of quantum field theories. We will prove this for the free hermitean scalar field in subsection 4.1.5.
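To indicate what such a verification amounts to for the free hermitean scalar field (the conventions below are the standard ones and may differ from those of the thesis by factors of $2\pi$): the field is completely determined by its two-point function, and all higher vacuum expectation values reduce to it.

```latex
% Two-point Wightman function of the free hermitean scalar field of mass m:
\[
  W(x - y) \;=\; \langle \varphi(x)\varphi(y)\,\Omega,\,\Omega\rangle
  \;=\; \int_{\mathbb{R}^3} \frac{d^3 p}{(2\pi)^3\, 2\omega_{\mathbf p}}\,
        e^{-i p\cdot(x-y)},
  \qquad \omega_{\mathbf p} = \sqrt{{\mathbf p}^2 + m^2},\quad p^0 = \omega_{\mathbf p},
\]
% and by Wick's theorem the n-point functions are
\[
  W(x_1,\dots,x_n) \;=\;
  \begin{cases}
    \displaystyle\sum_{\text{pairings}}\ \prod_{\text{pairs } (k,l)} W(x_k - x_l),
      & n \text{ even},\\[4pt]
    0, & n \text{ odd},
  \end{cases}
\]
% so the axioms only need to be checked on these explicit distributions.
```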
The existence of these examples implies that the axioms above must be compatible with each other. It can also be shown that these axioms are independent of each other, i.e. that one can find theories that satisfy only a proper subset of these axioms, but we will not discuss this here; see also section 3.2 of [32].

Finally, we want to make a remark related to the cyclicity of the vacuum state. We say that the smeared fields form an irreducible set of operators in the Hilbert space if for $A \in B(H)$ the condition
\[
\langle A\,\phi_j(f)\Psi_1,\, \Psi_2\rangle = \langle A\Psi_1,\, \phi_j(f)^*\Psi_2\rangle
\]
for all $\Psi_1,\Psi_2 \in D$, all $f \in S(M)$ and all $j$, implies that $A$ is a constant multiple of the identity. We mention without proof that the cyclicity of the vacuum implies that in any quantum field theory the fields form an irreducible set of operators.

[Footnote 34: Here we mean uniqueness up to a phase factor, of course.]

4.1.3 Wightman functions

Given any quantum field theory with field components $\{\phi_j\}_{j=1}^N$ and their corresponding adjoints, we can define for each $n \geq 0$ maps of the form
\[
w_{i_1^{(*)}\dots i_n^{(*)}} : (f_1,\dots,f_n) \mapsto \big\langle \phi^{(*)}_{i_1}(f_1)\,\phi^{(*)}_{i_2}(f_2)\cdots\phi^{(*)}_{i_n}(f_n)\,\Omega,\ \Omega\big\rangle =: \big\langle \phi_{i_1^{(*)}}(f_1)\,\phi_{i_2^{(*)}}(f_2)\cdots\phi_{i_n^{(*)}}(f_n)\,\Omega,\ \Omega\big\rangle
\]
from $S(M)^{\times n}$ to $\mathbb{C}$. Here $\phi_j^{(*)}$ refers to either taking the adjoint of $\phi_j$ or not. Note that in the second expression we also introduce the notation $\phi_{j^*}$ to denote the adjoint field $\phi_j^*$. The benefit of this notation is that we can refer to adjoint fields in expressions where only the field indices occur, such as in $w_{i_1^{(*)}\dots i_n^{(*)}}$. However, from now on we will often suppress the $(*)$ unless it is really necessary.

The maps $w_{i_1\dots i_n}$ are separately continuous in each of their $n$ arguments, so according to the nuclear theorem in the previous subsection there exist unique tempered distributions $W_{i_1\dots i_n}$ on $S(M^{\times n})$ that satisfy
\[
W_{i_1\dots i_n}(f_1\cdot f_2\cdot\ldots\cdot f_n) = w_{i_1\dots i_n}(f_1,\dots,f_n)
\]
for all $f_j \in S(M)$. Here the arguments of each of the functions $f_1,\dots,f_n$ in the product $f_1\cdot f_2\cdot\ldots\cdot f_n$ lie in different copies of $M$, so the product indeed defines a function in $S(M^{\times n})$. The distributions $W_{i_1\dots i_n}$ are called ($n$-point) vacuum expectation values or Wightman functions. As stated in the previous subsection, we often write distributions as if they were functions. Thus we will often write the Wightman functions as $W_{i_1\dots i_n}(x_1,\dots,x_n)$, where each of the variables $x_j$ denotes a four-vector with components $x_j^\mu$. The Wightman functions satisfy some nice properties, as stated in the following theorem. A detailed proof can be found in section 3.3 of [32].

Theorem 4.12 In any quantum field theory the Wightman functions are tempered distributions which satisfy the following properties.

(a) (Relativistic transformation law). Under Poincaré transformations the Wightman functions transform as
\[
\sum_{j_1,\dots,j_n} S(A^{-1})_{i_1^{(*)} j_1^{(*)}}\cdots S(A^{-1})_{i_n^{(*)} j_n^{(*)}}\ W_{j_1^{(*)}\dots j_n^{(*)}}(\Phi(A)x_1 + a,\ \dots,\ \Phi(A)x_n + a) = W_{i_1^{(*)}\dots i_n^{(*)}}(x_1,\dots,x_n),
\]
where $S(A^{-1})_{i_k^* j_k^*} := \overline{S(A^{-1})_{i_k j_k}}$. So they are translation invariant and Lorentz covariant. For each $n$, let $\xi_j = x_j - x_{j+1}$ for $j = 1,\dots,n-1$. Then translation invariance implies that there exist tempered distributions $V_{j_1\dots j_n}(\xi_1,\dots,\xi_{n-1})$ such that
\[
W_{j_1\dots j_n}(x_1,\dots,x_n) = V_{j_1\dots j_n}(\xi_1,\dots,\xi_{n-1}).
\]

(b) (Spectral conditions). The (inverse) Fourier transforms $\widehat W_{j_1\dots j_n} = \overline F_\eta(W_{j_1\dots j_n})$ and $\widehat V_{j_1\dots j_n} = \overline F_\eta(V_{j_1\dots j_n})$ of $W_{j_1\dots j_n}$ and $V_{j_1\dots j_n}$ are tempered distributions and are related by
\[
\widehat W_{j_1\dots j_n}(p_1,\dots,p_n) = (2\pi)^4\, \delta\Big(\sum_{j=1}^n p_j\Big)\, \widehat V_{j_1\dots j_n}(p_1,\ p_1+p_2,\ \dots,\ p_1+p_2+\dots+p_{n-1}).
\]
Also, $\widehat V_{j_1\dots j_n}(q_1,\dots,q_{n-1}) = 0$ if any $q_j$ is not in the joint spectrum of the operators $P^\mu$.

(c) (Hermiticity conditions). The Wightman functions satisfy
\[
\overline{W_{i_1\dots i_n}(x_1,\dots,x_n)} = W_{i_n^*\dots i_1^*}(x_n,\dots,x_1),
\]
where $i_k^*$ refers to the field obtained by taking the adjoint of the (adjoint) field that is referred to by the index $i_k \equiv i_k^{(*)}$.

(d) (Local commutativity conditions). If $(x_k - x_{k+1})^2 < 0$ for some $k \in \{1,\dots,n-1\}$, then
\[
W_{j_1\dots j_{k+1} j_k\dots j_n}(x_1,\dots,x_{k+1},x_k,\dots,x_n) = \mp\, W_{j_1\dots j_k j_{k+1}\dots j_n}(x_1,\dots,x_k,x_{k+1},\dots,x_n),
\]
where the signs $\mp$ correspond to the two cases $[\phi_{j_k},\phi_{j_{k+1}}]_\pm$, respectively.

(e) (Positive definiteness conditions). For any sequence $\{f_{i_1,\dots,i_n}\}_{n=0}^\infty$ with $f_{i_1,\dots,i_n}(x_1,\dots,x_n)$ in $S(M^n)$ and with $f_{i_1,\dots,i_n} \equiv 0$ for all but finitely many values of $n$ and of the multi-indices $(i_1,\dots,i_n)$, we have the inequality
\[
\sum_{m,n=0}^\infty\ \sum_{(i_1^*,\dots,i_m^*)}\ \sum_{(j_1,\dots,j_n)} \int_{M^{m+n}} W_{i_m^*,\dots,i_1^*,j_1,\dots,j_n}(x_m,\dots,x_1,x_1',\dots,x_n')\ \overline{f_{i_1,\dots,i_m}(x_1,\dots,x_m)}\ f_{j_1,\dots,j_n}(x_1',\dots,x_n')\ d^4x_1\cdots d^4x_m\, d^4x_1'\cdots d^4x_n' \;\geq\; 0.
\]

(f) (Cluster decomposition property). For any spacelike vector $a \in M$ and for any $m \in \{1,\dots,n\}$ we have
\[
\lim_{\lambda\to\infty} W_{j_1\dots j_n}(x_1,\dots,x_m,\ x_{m+1}+\lambda a,\ x_{m+2}+\lambda a,\ \dots,\ x_n+\lambda a) = W_{j_1\dots j_m}(x_1,\dots,x_m)\ W_{j_{m+1}\dots j_n}(x_{m+1},\dots,x_n),
\]
where the limit is taken in the topology of $S'(M^n)$.

Conversely, if we have a set of tempered distributions satisfying all the properties in the theorem, then there exists a unique quantum field theory for which these distributions are the Wightman functions. This is also called the reconstruction theorem; see for example section 3.4 of [32].

As stated in part (b) of the theorem above, the support of the distribution $\widehat V_{j_1,\dots,j_n}(q_1,\dots,q_{n-1})$ lies in $(\overline V_+)^{n-1}$. Therefore, according to theorem 4.9, the Laplace transform $L_\eta(\widehat V_{j_1,\dots,j_n})$ is holomorphic on the tube $T_{n-1} = M^{n-1} - i(V_+)^{n-1}$ and for each $f \in S(M^{n-1})$ we have
\[
\lim_{\gamma\to 0} \int_{M^{n-1}} L_\eta(\widehat V_{j_1,\dots,j_n})(x_1 - i\gamma_1,\dots,x_{n-1} - i\gamma_{n-1})\ f(x_1,\dots,x_{n-1})\ d^{4(n-1)}x = (F_\eta(\widehat V_{j_1,\dots,j_n}))(f) = (F_\eta(\overline F_\eta(V_{j_1,\dots,j_n})))(f) = V_{j_1,\dots,j_n}(f),
\]
so in this sense $V_{j_1,\dots,j_n}$ is the boundary value of a holomorphic function defined on the tube $T_{n-1}$, namely $L_\eta(\widehat V_{j_1,\dots,j_n})$. We will denote this holomorphic function by $V^{\mathrm{hol}}_{j_1,\dots,j_n}$ from now on. Thus,
\[
V_{j_1,\dots,j_n}(x_1,\dots,x_{n-1}) = \lim_{\gamma\to 0} V^{\mathrm{hol}}_{j_1,\dots,j_n}(x_1 - i\gamma_1,\dots,x_{n-1} - i\gamma_{n-1}),
\]
where the convergence is in $S'(M^{n-1})$ and $\gamma_j \in V_+$. On the set
\[
T_n := \{(x_1 - i\gamma_1,\dots,x_n - i\gamma_n) \in M^n + iM^n : \gamma_j - \gamma_{j+1} \in V_+\}
\]
we can define another holomorphic function by
\[
W^{\mathrm{hol}}_{j_1,\dots,j_n}(x_1 - i\gamma_1,\dots,x_n - i\gamma_n) := V^{\mathrm{hol}}_{j_1,\dots,j_n}\big(x_1 - x_2 - i(\gamma_1-\gamma_2),\ \dots,\ x_{n-1} - x_n - i(\gamma_{n-1}-\gamma_n)\big),
\]
where $\gamma_j - \gamma_{j+1} \in V_+$, and the Wightman functions $W_{j_1,\dots,j_n}(x_1,\dots,x_n)$ are boundary values of these functions.

From part (a) of the theorem above it follows that under an $SL(2,\mathbb{C})$ transformation the distributions $V$ transform according to
\[
\sum_{j_1,\dots,j_n} S(A^{-1})_{i_1 j_1}\cdots S(A^{-1})_{i_n j_n}\ V_{j_1\dots j_n}(\Phi(A)x_1,\dots,\Phi(A)x_{n-1}) = V_{i_1\dots i_n}(x_1,\dots,x_{n-1}).
\]
Now consider the holomorphic function
\[
V^{\mathrm{hol}}_{i_1\dots i_n}(\xi_1,\dots,\xi_{n-1}) - \sum_{j_1,\dots,j_n} S(A^{-1})_{i_1 j_1}\cdots S(A^{-1})_{i_n j_n}\ V^{\mathrm{hol}}_{j_1\dots j_n}(\Phi(A)\xi_1,\dots,\Phi(A)\xi_{n-1})
\]
on the tube $T_{n-1}$. From the transformation properties of $V_{i_1,\dots,i_n}$ it follows that the boundary value in $S'(M^{n-1})$ of this holomorphic function is zero. According to the generalized uniqueness theorem for holomorphic functions of several complex variables (see for example theorem B.10 of [2]) this holomorphic function must then be identically zero on the tube $T_{n-1}$. This shows that $V^{\mathrm{hol}}_{j_1,\dots,j_n}$ has the same transformation properties on the tube $T_{n-1}$ as $V_{j_1,\dots,j_n}$ has on $M^{n-1}$.

In order to understand the following important theorem, we have to introduce the notion of a complex Lorentz transformation.
Let M_C := M + iM ≅ C⁴ be complex Minkowski spacetime with the Minkowski metric η_C(z, w) = z⁰w⁰ − Σ_{j=1}^{3} z^j w^j for z, w ∈ C⁴. We define a complex Lorentz transformation to be a linear map L : M_C → M_C that preserves the metric η_C. The set L(C) of complex Lorentz transformations forms a group, called the complex Lorentz group. As for ordinary Lorentz transformations, we have det(L) = ±1 for complex Lorentz transformations, and we define the proper complex Lorentz group L₊(C) to be the set of those complex Lorentz transformations L with det(L) = +1. The group L₊(C) is connected (unlike L₊) and its universal covering group is SL(2, C) × SL(2, C). The covering map Φ_C : SL(2, C) × SL(2, C) → L₊(C) is defined in a manner very similar to the definition of the map Φ : SL(2, C) → L↑₊. We begin with a bijection ψ_C : M_C → M₂(C) that maps each element z ∈ M_C to a matrix ψ_C(z) with det(ψ_C(z)) = η_C(z, z); the matrix ψ_C(z) is defined by precisely the same formula as ψ(x) in subsection 2.1.2. Then for (A, B) ∈ SL(2, C) × SL(2, C) we define the determinant preserving map Ψ_{A,B} : M₂(C) → M₂(C) by

Ψ_{A,B}(Z) = A Z Bᵀ.

Under the bijection ψ_C this determinant preserving map corresponds to a metric preserving map Φ_C(A, B) : M_C → M_C, so Φ_C(A, B) ∈ L(C). Arguments similar to those in the real case show that Φ_C : SL(2, C) × SL(2, C) → L(C) is in fact a surjective Lie group homomorphism onto L₊(C). The elements of the form (A, Ā) ∈ SL(2, C) × SL(2, C) form a subgroup isomorphic to SL(2, C), and for such elements we have

Φ_C(A, Ā) = Φ(A),

which simply follows from Āᵀ = A∗. So if we have a representation D of SL(2, C), then we can interpret it as a representation of the subgroup {(A, Ā) : A ∈ SL(2, C)} of SL(2, C) × SL(2, C). According to the discussion in section 9.1A of [2], this representation can be uniquely extended by analyticity to a representation D_C of SL(2, C) × SL(2, C). We now apply these ideas to the Wightman functions.
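Since ψ_C and Ψ_{A,B} are given by explicit matrix formulas, the construction of Φ_C can be checked numerically. The following sketch assumes the standard Pauli-matrix form of ψ from subsection 2.1.2 (the text only refers back to it); it verifies that Ψ_{A,B}(Z) = AZBᵀ preserves η_C, and that pairs of the form (A, Ā) preserve the real subspace M ⊂ M_C:

```python
import numpy as np

rng = np.random.default_rng(0)

def psi(z):
    # psi_C : M_C -> M_2(C) with det(psi_C(z)) = eta_C(z, z);
    # assumed to be the same 2x2 formula as psi(x) in subsection 2.1.2
    z0, z1, z2, z3 = z
    return np.array([[z0 + z3, z1 - 1j * z2],
                     [z1 + 1j * z2, z0 - z3]])

def psi_inv(Z):
    # inverse of psi_C, recovering the (complex) 4-vector from the matrix
    return np.array([Z[0, 0] + Z[1, 1], Z[0, 1] + Z[1, 0],
                     1j * (Z[0, 1] - Z[1, 0]), Z[0, 0] - Z[1, 1]]) / 2

def eta(z, w):
    # complexified Minkowski metric eta_C
    return z[0] * w[0] - z[1] * w[1] - z[2] * w[2] - z[3] * w[3]

def random_sl2c():
    # a random 2x2 complex matrix rescaled to determinant 1
    m = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
    return m / np.sqrt(np.linalg.det(m))

A, B = random_sl2c(), random_sl2c()
z = rng.normal(size=4) + 1j * rng.normal(size=4)

# Phi_C(A, B) acts by conjugating psi_C(z) with Psi_{A,B}(Z) = A Z B^T
zp = psi_inv(A @ psi(z) @ B.T)
assert np.isclose(np.linalg.det(psi(z)), eta(z, z))   # det(psi_C) = eta_C
assert np.isclose(eta(zp, zp), eta(z, z))             # Phi_C(A, B) preserves eta_C

# for B = conj(A), real vectors stay real: Phi_C(A, conj(A)) = Phi(A),
# since conj(A)^T = A^* and A psi(x) A^* is again hermitean
x = rng.normal(size=4)
xp = psi_inv(A @ psi(x) @ np.conj(A).T)
assert np.allclose(xp.imag, 0)
assert np.isclose(eta(xp.real, xp.real), eta(x, x))
```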
According to our discussion above, the holomorphic functions V hol satisfy a transformation law of the form X D(A−1 )jk Vkhol (Φ(A)z1 , . . . , Φ(A)zn ) = Vjhol (z1 , . . . , zn ), k where we write only a single index. This index corresponds to some basis for the vector space obtained by taking the n-fold tensor product of the N -dimensional vector space on which the representation S acts (here N is the number of field components in the theory and S is the representation as in the Wightman axioms). We can decompose the representation D into irreducible representations, which are of the form D (A,B) as we already noticed when we constructed general free fields in the previous chapter. If there are any representations with A + B a half-odd integer, then the corresponding components must be zero as follows from substituting A = −1 in the transformation law for Vjhol and using that D (A,B) (−1) = (−1)A+B . The non-trivial irreducible components of Vjhol thus transform according to single-valued representations of the restricted Lorentz group L↑+ . Also, the analytic continuation of D to a representation of SL(2, C) × SL(2, C) will define a single-valued representation of the proper complex Lorentz group L+ (C). We are now ready to state the following theorem of Bargmann, Hall and Wightman, the proof of which can be found in section 9.1B of [2] (theorem 9.1) or section 2.4 of [32] (theorem 2.11). Theorem 4.13 (Bargmann-Hall-Wightman) Let Fj (z1 , . . . , zn ) with j = 1, . . . , N be a set of holomorphic functions defined on the tube Tn that satisfies X D(A−1 )jk Fk (Φ(A)z1 , . . . , Φ(A)zn ) = Fj (z1 , . . . , zn ) k for A ∈ SL(2, C) and with D a representation of SL(2, C) the irreducible components of which are of the form D (A,B) with A + B ∈ Z≥0 . Then the Fj can be uniquely extended by analytic continuation to a holomorphic function on the so-called extended tube [ Tn0 := LTn L∈L+ (C) and this extension satisfies X DC (A−1 , B −1 )jk Fk (ΦC (A, B)z1 , . . . 
, Φ_C(A, B)z_n) = F_j(z₁, . . . , z_n), where the sum is over k, for all (A, B) ∈ SL(2, C) × SL(2, C).

In view of our discussion preceding the theorem, the theorem states that every L↑₊-covariant holomorphic function on the tube has a unique analytic continuation to the extended tube which is L₊(C)-covariant. We can apply the theorem to the functions V^hol_j(z₁, . . . , z_n) to conclude that they are in fact L₊(C)-covariant holomorphic functions on the extended tube. Similarly, the holomorphic Wightman functions W^hol(z₁, . . . , z_n) can be extended to L₊(C)-covariant holomorphic functions on the extended tube

T′_n := ⋃_{L∈L₊(C)} L T_n.

Although the tube T_n did not contain any real points, the extended tube T′_n does. These points are called Jost points. There is a simple characterization of Jost points, which can be found in section 2.4 of [32] (theorem 2.12) or in section 9.1C of [2] (proposition 9.5), but we will not need it.

4.1.4 Important theorems

We will now discuss some of the famous theorems that can be proved for any quantum field theory, such as the spin-statistics theorem and the PCT theorem. To understand the main arguments in the proofs of these theorems, it is useful to know something about polynomial algebras of operators corresponding to open sets in Minkowski spacetime. In a quantum field theory with field φ = (φ₁, . . . , φ_N) we define for each open set O ⊂ M the set P(O) consisting of all operators of the form

c·1_H + Σ_{k=1}^{M} φ_{j₁}(f_{k,1}) ⋯ φ_{j_k}(f_{k,k})

with c ∈ C, M ∈ Z≥0 and the f_{k,l} (with 1 ≤ l ≤ k and 1 ≤ k ≤ M) functions in S(M) with support contained in O. It is clear that P(O) is a ∗-algebra; it is called the polynomial algebra of O. According to the following theorem, the vacuum vector Ω ∈ H is cyclic for any P(O) with O ⊂ M non-empty and open.
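Before turning to the theorem, the bare notion of cyclicity can be illustrated in a finite-dimensional caricature (this is not the actual polynomial algebra of a quantum field theory, only a toy model): on a truncated one-mode Fock space, the vectors obtained by applying polynomials in a single hermitean "field" operator to the vacuum already span the whole space.

```python
import numpy as np

N = 6                                    # dimension of the truncated Fock space
a = np.diag(np.sqrt(np.arange(1, N)), 1) # annihilation operator, a|n> = sqrt(n)|n-1>
phi = a + a.T                            # a single hermitean "field" operator

omega = np.zeros(N)
omega[0] = 1.0                           # the vacuum vector

# the vectors phi^k Omega, k = 0, ..., N-1: each power reaches one level higher,
# so together they span the whole truncated space
vecs = np.column_stack([np.linalg.matrix_power(phi, k) @ omega for k in range(N)])
assert np.linalg.matrix_rank(vecs) == N  # Omega is cyclic for polynomials in phi
```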
We will only give a sketch of the proof, since some of its details require more knowledge of holomorphic functions than is relevant for our purposes; the full proof can be found in section 4.2 of [32].

Theorem 4.14 (Reeh-Schlieder) Given some quantum field theory (H, D, U, φ, Ω), let O ⊂ M be a non-empty open set and let Ψ_c ∈ H be cyclic for P(M). Then Ψ_c is also cyclic for P(O).

Proof sketch Let Ψ ∈ H be a vector which is orthogonal to the set {AΨ_c}_{A∈P(O)}. The first step in the proof consists of defining tempered distributions F_{i₁^(∗)...i_n^(∗)} by

F_{i₁^(∗)...i_n^(∗)}(−x₁, x₁ − x₂, . . . , x_{n−1} − x_n) = ⟨φ_{i₁^(∗)}(x₁) ⋯ φ_{i_n^(∗)}(x_n)Ψ_c, Ψ⟩

and of arguing that the inverse Fourier transforms of these distributions vanish unless all of the variables lie in the joint spectrum of the operators P^μ (which is a subset of V⁺). Then theorem 4.9 is used to define a function F^hol_{i₁...i_n} which is holomorphic in the tube T_n in the complex variables (−x₁) − iγ₁, (x₁ − x₂) − iγ₂, . . . , (x_{n−1} − x_n) − iγ_n and which converges to F_{i₁...i_n} as γ₁, . . . , γ_n → 0 in V⁺. By definition of Ψ, the supports of the distributions F_{i₁...i_n} lie in the complement of the set {(−x₁, x₁ − x₂, . . . , x_{n−1} − x_n) ∈ M^n : x₁, . . . , x_n ∈ O}, which in turn implies that F^hol_{i₁...i_n} vanishes on the whole tube T_n (this is a non-trivial argument from the theory of holomorphic functions in several complex variables). But then the distributions F_{i₁...i_n} vanish on the whole space M^n. From the definition of these distributions it then follows that Ψ is in fact orthogonal to the set {AΨ_c}_{A∈P(M)}. Because Ψ_c was cyclic for P(M), we must have Ψ = 0. This proves that Ψ_c is also cyclic for P(O).

For any open set O ⊂ M we define an open set O^∨ ⊂ M by

O^∨ = {x ∈ M : (x − y)² < 0 for all y ∈ O}°,

where we use the notation A° to denote the interior of a set A. Note that if O is also bounded, then O^∨ will be non-empty. In that case the following theorem applies.
Theorem 4.15 Given some quantum field theory (H, D, U, φ, Ω), let O ⊂ M be a non-empty open set with O^∨ ≠ ∅ and let A ∈ P(O) be a monomial with AΩ = 0. Then A = 0.

Proof Let Ψ ∈ D and let T^∨ ∈ P(O^∨). Then

⟨A∗Ψ, T^∨Ω⟩ = ⟨Ψ, A T^∨ Ω⟩ = ±⟨Ψ, T^∨ A Ω⟩ = 0,

where we used AΩ = 0 together with the fact that A either commutes or anticommutes with each term in the polynomial T^∨, since O and O^∨ are spacelike separated. By the previous theorem, {T^∨Ω}_{T^∨∈P(O^∨)} is dense in H, so we conclude that A∗Ψ = 0 for all Ψ ∈ D. For Ψ₁, Ψ₂ ∈ D we then have ⟨AΨ₁, Ψ₂⟩ = ⟨Ψ₁, A∗Ψ₂⟩ = 0. Because D is dense in H, this implies that AΨ = 0 for all Ψ ∈ D, i.e. A = 0.

The (anti)commutator satisfies [A∗, B∗]_± = (BA ± AB)∗ = ±([A, B]_±)∗, which implies that if the field components φ_j and φ_k (anti)commute at spacelike distances, then the adjoint components φ∗_j and φ∗_k also (anti)commute at spacelike distances. Using the theorem above, we can also show that if the field components φ_j and φ_k (anti)commute at spacelike distances, then the field components φ_j and φ∗_k also (anti)commute at spacelike distances.

Theorem 4.16 (Dell'Antonio). Let (H, D, U, (φ_i)_{i=1}^N, Ω) be a quantum field theory and let j, k ∈ {1, . . . , N}. If at spacelike distances we have [φ_j, φ_k]_± = 0, while [φ_j, φ∗_k]_∓ = 0, then either φ_j or φ_k vanishes.

Proof For any non-zero f, g ∈ S(M) with spacelike separated supports we have for Ψ ∈ D

φ_j(f)∗ φ_k(g)∗ φ_k(g) φ_j(f) Ψ = ±φ_k(g)∗ φ_j(f)∗ φ_k(g) φ_j(f) Ψ = ∓±φ_k(g)∗ φ_k(g) φ_j(f)∗ φ_j(f) Ψ = −φ_k(g)∗ φ_k(g) φ_j(f)∗ φ_j(f) Ψ.

Applying this to the vacuum vector Ω ∈ D we find the inequality

0 ≥ −‖φ_k(g)φ_j(f)Ω‖² = −⟨φ_j(f)∗φ_k(g)∗φ_k(g)φ_j(f)Ω, Ω⟩ = ⟨φ_k(g)∗φ_k(g)φ_j(f)∗φ_j(f)Ω, Ω⟩.

Suppose now that the supports K(f), K(g) ⊂ M of f and g are compact and non-empty (and still spacelike separated). Let a ∈ M be a spacelike vector such that the compact set K_λ(g) := K(g) + λa remains spacelike separated from K(f) for all λ > 0, and let g_λ be the function g_λ(x) = g(x − λa).
Then the support of g_λ is clearly K_λ(g), and for each λ ≥ 0 the inequality above gives

⟨φ_k(g_λ)∗φ_k(g_λ)φ_j(f)∗φ_j(f)Ω, Ω⟩ ≤ 0.

By the cluster decomposition property of the Wightman functions, we have

lim_{λ→∞} ⟨φ_k(g_λ)∗φ_k(g_λ)φ_j(f)∗φ_j(f)Ω, Ω⟩ = ⟨φ_k(g)∗φ_k(g)Ω, Ω⟩⟨φ_j(f)∗φ_j(f)Ω, Ω⟩ = ‖φ_k(g)Ω‖² ‖φ_j(f)Ω‖² ≥ 0.

Together, these inequalities imply that ‖φ_k(g)Ω‖² ‖φ_j(f)Ω‖² = 0, so either φ_j(f)Ω = 0 or φ_k(g)Ω = 0. According to the previous theorem, this in turn implies that either φ_j(f) = 0 or φ_k(g) = 0. We thus conclude that for all f, g ∈ S(M) with spacelike separated non-empty compact supports we have either φ_j(f) = 0 or φ_k(g) = 0. Suppose that φ_j does not vanish. Then there exists a function h ∈ S(M) with non-empty compact support K(h) such that φ_j(h) ≠ 0. Then for any function p ∈ S(M) with compact support K(p) which is spacelike separated from K(h), we have φ_k(p) = 0. By considering different functions h_i ∈ S(M) with compact supports K(h_i) ⊂ K(h) and repeating the same argument, we find that φ_k(p) = 0 for all p ∈ S(M) with compact support. Because the set of all such functions p is dense in S(M), this implies that φ_k vanishes. Similarly, assuming that φ_k does not vanish will imply that φ_j vanishes.

As discussed before, in any quantum field theory we can decompose the field into fields which transform as irreducible representations of P̃↑₊. In the previous chapter we found that irreducible fields which transform according to the (A, B)-representation can only describe particles with spin j ∈ {|A − B|, |A − B| + 1, . . . , A + B − 1, A + B}. Therefore, an irreducible field which transforms according to the (A, B)-representation will be called a field of integer spin if A + B is an integer and a field of half-odd integer spin if A + B is a half-odd integer.
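The dichotomy between integer and half-odd integer A + B is visible already in how the element −1 ∈ SL(2, C) acts: on tensor powers of the fundamental representation and its conjugate, −1 acts by the scalar (−1)^{2(A+B)}. A small numerical sketch (the irreducible D^(A,B) sits inside these tensor powers, on which −1 acts by the same scalar, so this suffices to read off the sign):

```python
import numpy as np

def kron_power(m, k):
    # k-fold Kronecker power of a matrix (k = 0 gives the 1x1 identity)
    out = np.eye(1)
    for _ in range(k):
        out = np.kron(out, m)
    return out

minus_one = -np.eye(2)   # the nontrivial element of the kernel of SL(2,C) -> restricted Lorentz group

def d_sign(twoA, twoB):
    # action of -1 on (C^2)^{tensor 2A} x (conjugate C^2)^{tensor 2B};
    # D(-1) is a multiple of the identity, so read off the scalar
    D = np.kron(kron_power(minus_one, twoA), kron_power(np.conj(minus_one), twoB))
    return D[0, 0].real

# integer A + B, e.g. (A,B) = (1/2, 1/2) (the vector representation): D(-1) = +1
assert d_sign(1, 1) == 1.0
# half-odd integer A + B, e.g. (A,B) = (1/2, 0) (a Weyl spinor): D(-1) = -1
assert d_sign(1, 0) == -1.0
```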
The following theorem shows that the components of an irreducible field of integer spin must commute with each other at spacelike distances and that the components of an irreducible field of half-odd integer spin must anticommute with each other at spacelike distances.

Theorem 4.17 (Spin-statistics theorem). Let (H, D, U, φ, Ω) be a quantum field theory and let φ^(κ) be an irreducible field in the decomposition of φ into irreducible fields. Suppose that φ_j is a component of φ which belongs to φ^(κ). Then, if φ^(κ) is of integer spin and φ_j satisfies [φ_j(x), φ∗_j(y)]₊ = 0 for (x − y)² < 0, or if φ^(κ) is of half-odd integer spin and φ_j satisfies [φ_j(x), φ∗_j(y)]₋ = 0 for (x − y)² < 0, then φ_j and φ∗_j vanish.

Proof sketch Suppose that φ_j satisfies one of the two alternatives stated in the theorem. Then

V_{jj∗}(x − y) + (−1)^ε V_{j∗j}(−(x − y)) = ⟨φ_j(x)φ∗_j(y)Ω, Ω⟩ + (−1)^ε ⟨φ∗_j(y)φ_j(x)Ω, Ω⟩ = ⟨(φ_j(x)φ∗_j(y) + (−1)^ε φ∗_j(y)φ_j(x))Ω, Ω⟩ = 0,

where ε = 0 for integer spin and ε = 1 for half-odd integer spin. This implies that the corresponding holomorphic functions satisfy

V^hol_{jj∗}(ξ) + (−1)^ε V^hol_{j∗j}(−ξ) = 0    (4.3)

on the tube T₁. It can be shown (see theorem 2.11 of [32]) that there exists a single-valued analytic continuation of the holomorphic functions V^hol_{jj∗}(ξ) and V^hol_{j∗j}(−ξ) to the extended tube T₁′,35 and that

V^hol_{j∗j}(ξ) = (−1)^ε V^hol_{j∗j}(−ξ),

where ε is as before. Combining this with (4.3) gives

0 = V^hol_{jj∗}(ξ) + (−1)^ε (−1)^ε V^hol_{j∗j}(ξ) = V^hol_{jj∗}(ξ) + V^hol_{j∗j}(ξ).

Passing to the boundary, we obtain

0 = V_{jj∗}(x − y) + V_{j∗j}(x − y) = V_{jj∗}(x − y) + V_{j∗j}((−y) − (−x)) = ⟨φ_j(x)φ∗_j(y)Ω, Ω⟩ + ⟨φ∗_j(−y)φ_j(−x)Ω, Ω⟩.    (4.4)

35 The extended tube T₁′ is the set of all points ξ ∈ C⁴ of the form ξ = Λζ with ζ ∈ T₁ and Λ a complex Lorentz transformation, i.e. a 4 × 4 complex matrix which satisfies Λᵀ η Λ = η.
Now let f ∈ S(M) and define f⁻(x) := f(−x). Then

‖φ_j(f)∗Ω‖² + ‖φ_j(f⁻)Ω‖² = ⟨φ_j(f)φ_j(f)∗Ω, Ω⟩ + ⟨φ_j(f⁻)∗φ_j(f⁻)Ω, Ω⟩ = ∫_{M²} f(x)\overline{f(y)} (⟨φ_j(x)φ∗_j(y)Ω, Ω⟩ + ⟨φ∗_j(−y)φ_j(−x)Ω, Ω⟩) dx dy = 0,

where the last equality follows from (4.4). This implies both φ_j(f)∗Ω = 0 and φ_j(f⁻)Ω = 0. Thus, because f ∈ S(M) was arbitrary, we have φ∗_j(x)Ω = 0 and φ_j(x)Ω = 0. Theorem 4.15 then implies that for all f ∈ S(M) with compact support we have φ∗_j(f) = 0 and φ_j(f) = 0, which in turn implies that φ∗_j and φ_j vanish.

Together with our assumption that the components of an irreducible field either all commute or else all anticommute with each other at spacelike distances, this theorem implies that at spacelike distances the components of an irreducible field all commute with each other if the irreducible field is of integer spin and all anticommute with each other if the irreducible field is of half-odd integer spin. This is the famous connection between spin and statistics: if we identify commuting fields with bosons and anticommuting fields with fermions, then bosons are described by fields of integer spin and fermions by fields of half-odd integer spin. The theorem gives no information about whether we should choose a commutator or an anticommutator when we are dealing with components belonging to two different irreducible fields, so there is some freedom here. We say that a quantum field theory has a normal connection between spin and statistics if every component of a boson field commutes with every other field component in the theory and if two components of (different) fermion fields always anticommute with each other. It can be shown that in any quantum field theory in which there is no normal connection between spin and statistics, there is always a transformation of the fields, called a Klein transformation, which transforms the fields into new fields with a normal connection between spin and statistics.
Therefore, we may as well assume from now on that all quantum field theories have a normal connection between spin and statistics. Then the following theorem applies. Theorem 4.18 (PCT-theorem). Let (H, D, U, φ, Ω) be a quantum field theory with normal connection between spin and statistics and let φ = (φ(κ1 ) j1 , . . . , φ(κ` ) j` ) be a decomposition of the field into irreducible fields φ(κj ) transforming according to the (Aj , Bj )-representation of SL(2, C). Then there exists a unique anti-unitary operator Θ on H which leaves the vacuum vector Ω invariant and satisfies ∗ Θφ(κj ) (x)Θ−1 = (−1)2Aj ij φ(κj ) (−x), where j = 0 if Aj + Bj is an integer and j = 1 if Aj + Bj is a half-odd integer. Proof sketch In the first part of the proof it is shown that in any quantum field theory with normal connection between spin and statistics we have Pk l=1 jl hφj1 (x1 ) . . . φjk (xk )Ω, Ωi = i (−1)2 Pk l=1 Ajl hφ∗j1 (−x1 ) . . . φ∗jk (−xk )Ω, Ωi, (4.5) where jl = 0 (or jl = 1) if φjl is a component of a boson (or fermion) field transforming according to the (Ajl , Bjl )-representation of SL(2, C). For functions fl ∈ S(M) this means that Pk hφj1 (f1 ) . . . φjk (fk )Ω, Ωi = i l=1 jl (−1)2 Pk l=1 Ajl hφj1 (fb1 )∗ . . . φjk (fbk )∗ Ω, Ωi, where fb(x) = f (−x). From this it easily follows that hφj1 (f1 )∗ . . . φjk (fk )∗ Ω, Ωi = hΩ, φjk (fk ) . . . φj1 (f1 )Ωi = hφjk (fk ) . . . φj1 (f1 )Ω, Ωi = i Pk l=1 jl (−1)2 Pk = (−i) l=1 jl Pk = (−i) l=1 jl 87 Pk l=1 Ajl hφjk (fbk )∗ . . . φj1 (fb1 )∗ Ω, Ωi (−1)2 Pk Ajl hφjk (fbk )∗ . . . φj1 (fb1 )∗ Ω, Ωi (−1)2 Pk Ajl hφj1 (fb1 ) . . . φjk (fbk )Ω, Ωi. l=1 l=1 So just like (4.5) we also get Pk hφ∗j1 (x1 ) . . . φ∗jk (xk )Ω, Ωi = (−i) l=1 jl (−1)2 Pk l=1 Ajl hφj1 (−x1 ) . . . φjk (−xk )Ω, Ωi. (4.6) Of course we can also make combinations of (4.5) and (4.6). 
The difference is that replacing a field component by its adjoint gives a factor ij (−1)2Aj , while replacing an adjoint field component by the corresponding field component gives a factor (−i)j (−1)2Aj . The next step in the proof consists of showing that the antilinear extension of Θφj1 (f1 ) . . . φjk (fk )Ω := (−i) Pk l=1 jl (−1)2 Pk l=1 Ajl φ∗j1 (fb1 ) . . . φ∗jk (fbk )Ω defines the anti-unitary operator with the desired properties. In showing this, the identities above are used to derive the anti-unitarity. In physics it is often convenient to consider quantum fields at a given time, for example when one wants to study equal-time commutation relations for the fields. According to the Wightman axioms, however, the fields are operator-valued distributions on Minkowski spacetime and therefore only the smeared fields φj (f ) for f ∈ S(M) define operators on the Hilbert space. In other words, the Wightman fields must be smeared out both in time and space. Suppose now that in addition to the Wightman axioms, we also assume that the fields φj define for each t and each f ∈ S(R3 ) a well-defined operator φj (t, f ) on the dense set D ∈ H such that for all u ∈ S(R) we have Z φj (f u) = φj (t, f )u(t)dt, R where f u ∈ S(M) is the function defined by f u(t, x) = f (x)u(t). Then the fields φj can also be considered as operator-valued distributions on S(R3 ) depending on a parameter t. To prevent bad t-dependence, we assume that for each f ∈ S(R3 ) and each Ψ ∈ D the norm of the vector φj (t, f )Ψ is a bounded function of |t|. When a quantum field theory satisfies these properties we will simply say that it satisfies the sharp-time axiom. For the following theorem we need the definition of the Euclidean group in three dimensions. The group E+ (3) of proper Euclidean motions in R3 is generated by translations and rotations in e+ (3) is therefore a semi-direct product of R3 and SU (2) (compare R3 . 
Its universal covering group E e↑ , which was a semi-direct product of R4 and SL(2, C)) and the multiplication law is this with P + given by (a1 , R1 )(a2 , R2 ) = (a1 + R1 a2 , R1 R2 ). Theorem 4.19 Let {(φ1 )j (t, .)}nj=1 and {(φ2 )j (t, .)}nj=1 be two sets of operator-valued distributions on S(R3 ), depending on a parameter t, that act on Hilbert spaces H1 and H2 , respectively, and assume that the operator-valued distributions at any time t form an irreducible set of operators36 . e+ (3) such Suppose further for i = 1, 2 that on Hi there are defined unitary representations Ui of E that n X −1 Ui (a, R)(φi )j (t, f )Ui (a, R) = S(R−1 )jk (φi )k (t, f(a,R) ) k=1 for all f ∈ where S is a matrix representation of SU (2). Finally suppose that for some t0 there exists a unitary operator V : H1 → H2 such that S(R3 ), (φ2 )j (t0 , .) = V (φ1 )j (t0 , .)V −1 . Then (a) the representations U1 and U2 are unitarily equivalent: U2 (a, R) = V U1 (a, R)V −1 ; (b) if there exists in H1 a unique (up to a phase) normalized vector Ω1 that is invariant under U1 , then there also exists in H2 a unique (up to a phase) normalized vector, namely Ω2 = V Ω1 , that is invariant under U2 . 36 See the end of subsection 4.1.2 for the definition of an irreducible set of operators. 88 Proof For t = t0 we have for f ∈ S(R3 ) U2 (a, R)V (φ1 )j (t0 , f )V −1 U2 (a, R)−1 = U2 (a, R)(φ2 )j (t0 , f )U2 (a, R)−1 n X = S(R−1 )jk (φ2 )k (t0 , f(a,R) ) k=1 = V n X ! S(R −1 0 )jk (φ1 )k (t , f(a,R) ) V −1 k=1 = V U1 (a, R)(φ1 )j (t0 , f )U1 (a, R)−1 V −1 , which is equivalent to (φ1 )j (t0 , f )V −1 U2 (a, R)−1 = V −1 U2 (a, R)−1 V U1 (a, R)(φ1 )j (t0 , f )U1 (a, R)−1 V −1 , which in turn is equivalent to (φ1 )j (t0 , f )V −1 U2 (a, R)−1 V U1 (a, R) = V −1 U2 (a, R)−1 V U1 (a, R)(φ1 )j (t0 , f ). 
Thus, the operator V −1 U2 (a, R)−1 V U1 (a, R) on H1 commutes with all the (φ1 )j (t, f ) and is therefore a (nonzero) multiple of the identity operator: V −1 U2 (a, R)−1 V U1 (a, R) = ω(a, R)−1 1H1 or U2 (a, R) = ω(a, R)V U1 (a, R)V −1 , where ω(a, R) is a complex number depending on (a, R). Now for any T1 = (a1 , R1 ) and T2 = (a2 , R2 ) we have ω(T1 T2 )V U1 (T1 T2 )V −1 = U2 (T1 T2 ) = U2 (T1 )U2 (T2 ) = ω(T1 )ω(T2 )V U1 (T1 )V −1 V U1 (T2 )V −1 = ω(T1 )ω(T2 )V U1 (T1 T2 )V −1 , e+ (3) and hence ω(a, R) ≡ 1. This proves part so ω is in fact a one-dimensional representation of E (a). Part (b) follows directly from the unitary equivalence of U1 and U2 . We will now state (a generalization of) Haag’s theorem. For simplicity, we will only state it for scalar fields. Theorem 4.20 (Generalized Haag’s theorem) Let (H1 , D1 , U1 , φ1 , Ω1 ) and (H2 , D2 , U2 , φ2 , Ω2 ) be two scalar quantum field theories which satisfy the sharp-time axiom and the fields of which have well-defined time-derivatives at each time t. For i = 1, 2, suppose that for each t the fields φi (t, .) and ∂t φi (t, .) together form an irreducible set of fields on Hi . Suppose also that for some instant t0 there exists a unitary operator V : H1 → H2 such that φ2 (t, .) = V φ1 (t, .)V −1 , ∂t φ2 (t, .) = V ∂t φ1 (t, .)V −1 . Then (a) the first four Wightman functions are the same in both quantum field theories; (b) if φ1 is a free field of mass m ≥ 0, then φ2 is also a free field of mass m and both theories are unitarily equivalent. Part (b) of the theorem is the original theorem of Haag, and its truth follows from the first part because the two-point Wightman functions in a free scalar field theory completely determine the other Wightman functions. A proof of this theorem can be found in [2], theorem 9.28. 89 4.1.5 Example: The free hermitean scalar field 3 Let H = L2 (R3 , d2pp0 ) be the one-particle state space for a spinless particle with mass m ≥ 0 that is equal to its own antiparticle. 
We denote the corresponding Fock space by F+ (H) and on this space we define the creation and annihilation operators A∗ (Ψ) := A∗+ (Ψ) and A(Ψ) := A+ (Ψ) for each e↑ with corresponding Ψ ∈ H. On the Fock space we also have a unitary representation UFock of P + energy-momentum operators P µ that satisfy the positive energy condition in axiom 0 and we also have a vacuum vector Ω that is the unique unit vector (up to a phase) in F+ (H) that is invariant under UFock . We will now construct a hermitean scalar field in F+ (H) that satisfies the remaining Wightman axioms. R 1 ip·x d4 x. For each Schwartz function f ∈ S(M) let fb denote its Fourier transform fb(p) = (2π) 2 M f (x)e Because fb is again a Schwartz function, its restriction fb| + to the orbit O+ is an element of Om 3 m L2 (R3 , d2pp0 ) = H. We can thus define a map R : S(M) → H by R(f ) = fb|Om +. Explicitly, Z 1 0 (Rf )(p) = f (x)ei(ωp x −p·x) d4 x. (4.7) 2 (2π) M Because R(f ) ∈ H, the operators A∗ (Rf ) and A(Rf ) are well-defined and we can use them to define for each real-valued f ∈ S(M) the operators √ φ(f ) = 2π(A(Rf ) + A∗ (Rf )). For complex-valued f = f1 + if2 ∈ S(M), with f1 and f2 the real and imaginary parts of f , we define φ(f ) := φ(f1 ) + iφ(f2 ). The reason for not defining φ(f ) by the same formula as for real-valued functions is that fields should depend linearly on the Schwartz function f (recall that annihilation operators A(Ψ) depend anti-linearly on Ψ). Because for each Ψ ∈ H the operators A(∗) (Ψ) are defined on the dense subspace D+ ⊂ F+ (H) (which was defined in subsection 2.2.5), the operators φ(f ) are defined on the dense subspace D+ for any f ∈ S(M). Also, because the A(∗) (Ψ) all leave D+ invariant, the operators φ(f ) also leave D+ invariant. Furthermore, for any Ψ1 , Ψ2 ∈ D+ the map S(M) → C, given by f 7→ hφ(f )Ψ1 , Ψ2 i, is a tempered distribution. 
Thus, f ↦ φ(f) is an operator-valued distribution and each such φ(f) is defined on the dense subspace D₊ ⊂ F₊(H) and leaves this subspace invariant. For each f the adjoint φ(f)∗ is defined on D₊, so axiom 1 is satisfied. From the transformation properties of the creation and annihilation operators under P̃↑₊ (as derived in subsection 2.2.5), it follows that φ transforms as U(a, A)φ(f)U(a, A)⁻¹ = φ(f_{(a,A)}), so axiom 2 is also satisfied. For real-valued f, g ∈ S(M) we have

[A(Rf), A∗(Rg)] = ⟨Rg, Rf⟩ = ∫_{R³} (Rg)(p) \overline{(Rf)(p)} d³p/(2ω_p)
= (2π)⁻⁴ ∫_{R³} d³p/(2ω_p) (∫_M e^{i(ω_p y⁰ − p·y)} g(y) d⁴y) (∫_M e^{−i(ω_p x⁰ − p·x)} f(x) d⁴x)
= (2π)⁻⁴ ∫_M ∫_M (∫_{R³} e^{−i(ω_p (x−y)⁰ − p·(x−y))} d³p/(2ω_p)) f(x) g(y) d⁴x d⁴y,

so for real-valued f, g ∈ S(M) we have

[φ(f), φ(g)] = 2π([A(Rf), A∗(Rg)] + [A∗(Rf), A(Rg)]) = 2π([A(Rf), A∗(Rg)] − [A(Rf), A∗(Rg)]∗)
= −(2i/(2π)³) ∫_M ∫_M (∫_{R³} sin(ω_p (x−y)⁰ − p·(x−y)) d³p/(2ω_p)) f(x) g(y) d⁴x d⁴y.

For complex-valued f = f₁ + if₂ and g = g₁ + ig₂ we then find

[φ(f), φ(g)] = [φ(f₁) + iφ(f₂), φ(g₁) + iφ(g₂)]
= −(2i/(2π)³) ∫_M ∫_M (∫_{R³} sin(ω_p (x−y)⁰ − p·(x−y)) d³p/(2ω_p)) [f₁(x)g₁(y) − f₂(x)g₂(y) + if₁(x)g₂(y) + if₂(x)g₁(y)] d⁴x d⁴y
= −(2i/(2π)³) ∫_M ∫_M (∫_{R³} sin(ω_p (x−y)⁰ − p·(x−y)) d³p/(2ω_p)) f(x) g(y) d⁴x d⁴y.

The p-integral between the brackets is a distribution in the variable x − y and vanishes at points where x − y is spacelike. Therefore, if the supports of f and g are mutually spacelike separated, then [φ(f), φ(g)] = 0. So axiom 3 is also satisfied. It can also be shown that the Fock vacuum vector Ω is cyclic for the field operators φ(f), so axiom 4 is also satisfied. Thus all Wightman axioms are satisfied; see section 8.4 of [2] for more details.
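The key algebraic identity used above, [A(Ψ), A∗(Φ)] = ⟨Φ, Ψ⟩ (with the inner product linear in its first slot), can be checked in a truncated Fock space over a finite-dimensional one-particle space. This is only a toy sketch, not the actual space H = L²(R³, d³p/2p⁰); note that on the vacuum the identity holds exactly even after truncation.

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4                                      # occupation levels kept per mode
a = np.diag(np.sqrt(np.arange(1, d)), 1)   # one-mode annihilation operator
I = np.eye(d)
a1, a2 = np.kron(a, I), np.kron(I, a)      # two-mode truncated Fock space

def A(psi):
    # annihilation operator A(psi), antilinear in the one-particle vector psi
    return np.conj(psi[0]) * a1 + np.conj(psi[1]) * a2

def Astar(phi):
    # creation operator A*(phi), linear in phi
    return phi[0] * a1.conj().T + phi[1] * a2.conj().T

psi = rng.normal(size=2) + 1j * rng.normal(size=2)
phi = rng.normal(size=2) + 1j * rng.normal(size=2)

omega = np.zeros(d * d)
omega[0] = 1.0                             # the Fock vacuum

comm = A(psi) @ Astar(phi) - Astar(phi) @ A(psi)
inner = np.sum(phi * np.conj(psi))         # <phi, psi>, linear in the first slot

# [A(psi), A*(phi)] Omega = <phi, psi> Omega, exactly even in the truncation
assert np.allclose(comm @ omega, inner * omega)
```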
Note that for the 2-point Wightman function we have hφ(f )φ(g)Ω, Ωi = 2πhA(Rf )A∗ (Rg)Ω, Ωi = 2πh[A(Rf ), A∗ (Rg)]Ω, Ωi Z Z Z 3 1 −i(ωp (x−y)0 −p·(x−y)) d p = e f (x)g(y)d4 xd4 y, (2π)3 M M R3 2ωp or 1 W (x, y) = (2π)3 Z e−i(ωp (x−y) 0 −p·(x−y)) R3 d3 p . 2ωp (4.8) For odd n the n-point Wightman functions are zero and for even n one can express the n-point function in terms of the n−2 point function and the 2-point function, and hence the 2-point function determines the other n-point functions. This was also mentioned briefly when we discussed Haag’s theorem, but we will not prove it here; these statements about the n-point functions can (for example) be found in section 8.4 of [2] or in section 3.3 of [32]. For any Schwartz function f ∈ S(M) the Fourier transform [(∂ 2 + m2 )f ]∧ of (∂ 2 + m2 )f is given by Z 1 2 2 ∧ [(∂ + m )f ] = eip·x (∂ 2 + m2 )f (x)d4 x (2π)2 M Z 1 = − [∂ 2 eip·x ] f (x)d4 x + m2 fb(p) (2π)2 M | {z } =p2 eip·x = (m2 − p2 )fb(p). + = {p ∈ M : p2 = m2 , p0 > 0} is identically So the restriction of this Fourier transform to Om 2 2 zero; in other words, R((∂ + m )f ) ≡ 0. This implies that the field φ satisfies the Klein-Gordon equation: [(∂ 2 + m2 )φ](f ) = φ((∂ 2 + m2 )f ) = 0 for any f ∈ S(M). We will now write the field φ in terms of the creation and annihilation operators a(∗) defined on the Fock space F+ (H). As indicated in subsection 2.2.5, the physical equivalent of A(∗) (Ψ) is a(∗) (JΨ), with J : H 3 Ψ 7→ √ 1 Ψ ∈ H. In the present case this means that we should define 2ωp the map r : S(M) → H by rf = JRf , so (rf )(p) = 1 p (2π)2 2ωp Z M 91 f (x)ei(ωp x 0 −p·x) d4 x. On F+ (H) the field φ(f ) for real-valued f ∈ S(M) is now given by √ √ φ(f ) = 2π(a∗ (JRf ) + a(JRf )) = 2π(a∗ (rf ) + a(rf )) h i √ Z = 2π d3 p (rf )(p)a∗ (p) + (rf )(p)a(p) R3 Z Z Z d3 p i(ωp x0 −p·x) 4 ∗ −i(ωp x0 −p·x) 4 −3/2 p f (x)e d x a (p) + f (x)e d x a(p) = (2π) 2ωp R3 M M ) Z Z ( i 3p h d 0 0 p = (2π)−3/2 e−i(ωp x −p·x) a(p) + ei(ωp x −p·x) a∗ (p) f (x)d4 x. 
2ωp R3 M For this reason we also write φ(x) = (2π)−3/2 Z R3 i d3 p h −i(ωp x0 −p·x) 0 p e a(p) + ei(ωp x −p·x) a∗ (p) . 2ωp As stated in the previous chapter, the a∗ (p) and a(p) are not well-defined operators on the Fock space, but since we are smearing them out this is no problem. However, as we will show when we will discuss the (λφ4 )2 -model in the next chapter, it is not true that a∗ (p) and a(p) cannot be given any mathematical meaning without smearing them out. We will now define the notion of the field φ at a fixed moment in time, on the Fock space F+ (H). Analogous to (4.7) we define for each t ∈ R and for each Schwartz function f ∈ S(R3 ) on R3 a map Rt : S(R3 ) → H by Z Z f (x)e−ip·x d3 x. f (x)ei(ωp t−p·x) d3 x = (2π)−3/2 eiωp t (Rt f )(p) = (2π)−3/2 R3 R3 Then, for each t ∈ R and each real-valued f ∈ S(R3 ) we can define an operator φt (f ) = A∗ (Rt f ) + A(Rt f ) on the Fock space. We then extend φt to complex-valued functions f = f1 + if2 by defining φt (f ) = φt (f1 ) + iφ(f2 ). The operators φt (f ) are defined on D+ and it can be shown that the map t 7→ hφt (f )Ψ1 , Ψ2 i is smooth for any f ∈ S(R3 ) and Ψ1 , Ψ2 ∈ H. We will now investigate the relationship between φt and φ. For each Schwartz function u ∈ S(R) on R we find that for any f ∈ S(R3 ) Z Z Z −3/2 ip0 t −ip·x 3 f (x)e e d x u(t)dt (Rt f )(p)u(t)dt = (2π) R R R3 Z f (x)u(x0 )eip·x d4 x = (2π)−3/2 M √ = 2π[R(f · u)](p). For real-valued f ∈ S(R3 ) and real-valued u ∈ S(R), this implies that Z √ dtu(t)A(∗) (Rt f ) = 2πA(∗) (R(f · u)), R and therefore for real-valued f ∈ S(R3 ) and real-valued u ∈ S(R) we find that Z dtu(t)φt (f ) = φ(f · u). R This can then be extended by linearity to complex-valued f and u. This establishes the relationship between φt and φ: the operator-valued distribution φt on S(R3 ) is nothing else than the operatorvalued distribution φ on S(M) at fixed time t. 
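We saw above that the field satisfies the Klein-Gordon equation because [(∂² + m²)f]^ = (m² − p²)f̂ vanishes on the mass shell. The underlying Fourier identity can be checked numerically in a one-dimensional toy model (the mass m and the Gaussian test function below are arbitrary choices, and the normalization of the transform drops out of the identity):

```python
import numpy as np

m = 1.3
x = np.linspace(-12.0, 12.0, 4001)
dx = x[1] - x[0]
f = np.exp(-x**2 / 2)            # Gaussian stand-in for a Schwartz test function
d2f = (x**2 - 1) * f             # its exact second derivative

def ft(g, p):
    # Riemann-sum Fourier transform with kernel e^{ipx}
    return np.sum(g * np.exp(1j * p * x)) * dx

for p in [0.0, 0.7, m, 2.5]:
    lhs = ft(d2f + m**2 * f, p)      # [(d^2/dx^2 + m^2) f]^
    rhs = (m**2 - p**2) * ft(f, p)   # (m^2 - p^2) f^
    assert abs(lhs - rhs) < 1e-8     # in particular lhs = 0 on the "mass shell" p = m
```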
For the time-derivative of the field at time t we find ∂t φt (f ) = A(∂t Rt f )∗ + A(∂t Rt f ) = A(iωp Rt f )∗ + A(iωp Rt f ). 92 So in order to obtain the commutators of the φt and their derivatives, we must compute commutators of the form [A(h1 · Rt f ), A(h2 · Rt g)∗ ], where the hj = hj (p) are either identically one, or else iωp . For these commutators we find [A(h1 · Rt f ), A(h2 · Rt g)∗ ] = hh2 · Rt g, h1 · Rt f i Z Z d3 p 1 −ip·y 3 iωp t g(y)e d y h2 (p)e = (2π)3 R3 2ωp R3 Z iω t −ip·x 3 p h1 (p)e f (x)e d x R3 " # Z Z Z 1 h1 (p)h2 (p) ip·(x−y) 3 = e d p f (x)g(y)d3 xd3 y. (2π)3 R3 R3 R3 2ωp The commutator of φt with itself now follows from choosing h1 = h2 ≡ 1, [φt (f ), φt (g)] = [A(Rt f ), A∗ (Rt g)] − [A(Rt f ), A∗ (Rt g)]∗ Z Z Z sin[p · (x − y)] 3 2i d p f (x)g(y)d3 xd3 y = (2π)3 R3 R3 R3 2ωp = 0. The commutator of ∂t φt with itself follows from choosing h1 = h2 = iωp , [∂t φt (f ), ∂t φt (g)] = [A(iωp Rt f ), A∗ (iωp Rt g)] − [A(iωp Rt f ), A∗ (iωp Rt g)]∗ Z Z Z ωp sin[p · (x − y)] 3 2i = d p f (x)g(y)d3 xd3 y (2π)3 R3 R3 R3 2 = 0. Finally, the commutator of φt with ∂t φt is obtained by taking h1 ≡ 1 and h2 = iωp , [φt (f ), ∂t φt (g)] = [A(Rt f ), A∗ (iωp Rt g)] − [A(Rt f ), A∗ (iωp Rt g)]∗ Z Z Z i 3 = cos[p · (x − y)]d p f (x)g(y)d3 xd3 y (2π)3 R3 R3 R3 Z Z = i δ(x − y)f (x)g(y)d3 xd3 y. R3 R3 We have thus found the commutation relations [φt (x), φt (y)] = 0 = [∂t φt (x), ∂t φt (y)] [φt (x), ∂t φt (y)] = iδ(x − y). As we did for the field φ, we can also define the field φt on F+ (H). This is done by using the map rt : S(R3 ) → H which is given by Z 1 iωp t p (rt f )(p) = e f (x)e−ip·x d3 x. 3 (2π)3/2 2ωp R The field φt on F+ (H) is then defined by φt (f ) = a∗ (rt f ) + a(rt f ) for real-valued f ∈ S(R3 ). The result is ) Z ( Z i 3p h d p φt (f ) = (2π)−3/2 e−i(ωp t−p·x) a(p) + ei(ωp t−p·x) a∗ (p) f (x)d3 p. 2ωp R3 R3 This suggests that we should write Z φt (x) = (2π)−3/2 R3 i d3 p h −i(ωp t−p·x) p e a(p) + ei(ωp t−p·x) a∗ (p) . 
The right-hand side is the same as φ(x) with x^0 = t, which reflects the fact that φ_t is precisely the field φ at time t.

4.1.6 Haag-Ruelle scattering theory

In the previous chapter we showed that quantum fields arise quite naturally in the (perturbative) calculations in scattering theory. Before we introduced the perturbation theory, we mentioned that there must be some embeddings Ω^in and Ω^out from the Fock space H_Fock (describing free particles) into the physical Hilbert space H corresponding to the scattering experiment. We will now show that under some additional conditions, a quantum field theory (i.e. a theory satisfying the Wightman axioms) gives rise to such embeddings³⁷ and can therefore describe scattering experiments. The additional conditions are the following.

Haag-Ruelle axiom 1 The joint spectrum of the operators P^μ lies in the set {0} ∪ V_μ^+, where V_μ^+ = {p ∈ M : p·p > μ and p^0 ≥ 0}.

Haag-Ruelle axiom 2 The Hilbert space H contains countably many mutually orthogonal subspaces {H^{[τ]}}_{τ∈T} (so-called one-particle subspaces for particles of type τ) which transform according to irreducible unitary representations (m_τ, s_τ) of P̃₊↑ and are taken into H^{[τ^C]} under PCT-transformations. Furthermore, for each particle type τ ∈ T in the theory there exists an operator A_τ ∈ P(M) in the polynomial algebra such that A_τΩ is a non-zero vector in H^{[τ]} and A^*_τΩ ∈ H^{[τ^C]}.

Given the subspaces {H^{[τ]}}_{τ∈T}, the operators A_τ are called solutions of the one-particle problem of the quantum field theory. Sometimes the one-particle problem has a simple solution; for instance, this is the case if the mass m > 0 of the particle is an isolated point of the spectrum of the mass operator.

In a theory satisfying the Wightman axioms and the Haag-Ruelle axioms we define for each particle type τ ∈ T the linear span B^{[τ]} of all operators of the form U(a,L) A_τ U(a,L)^{-1} with (a,L) ∈ P̃₊↑. We then define the space A^{[τ]} := B^{[τ]} + (B^{[τ^C]})^*.
Then A^{[τ]} is a linear subspace of P(M) that is taken to A^{[τ^C]} under hermitean conjugation, is invariant with respect to restricted Poincaré transformations and is such that D^{[τ]} := A^{[τ]}Ω is dense in H^{[τ]}. Thus, for each particle type τ ∈ T essentially all one-particle states Ψ ∈ H^{[τ]} can be constructed by letting an operator in the polynomial algebra act on the vacuum vector, and when the adjoint of this operator acts on the vacuum vector this gives a one-particle state of the corresponding antiparticle τ^C.

For each operator A ∈ A^{[τ]} we define a family of operators {A^t}_{t∈R} by
\[
A^t = \int_{x^0 = t} \left[ A(x)\, \frac{\partial}{\partial x^0} D_{m_\tau}(x) - D_{m_\tau}(x)\, \frac{\partial}{\partial x^0} A(x) \right] d^3 x,
\]
where A(x) := U(x,1)\,A\,U(x,1)^{-1} and
\[
D_m(x) := 2\pi i \int_M \epsilon(p^0)\, \delta(p^2 - m^2)\, e^{-i p\cdot x}\, \frac{d^4 p}{(2\pi)^4},
\]
with ε(p^0) the sign of p^0. An important property of the family {A^t}_{t∈R} is that each element A^t acts in the same way on the vacuum as A,
\[
A^t \Omega = A\Omega. \tag{4.9}
\]
The first part of the main result of Haag-Ruelle theory is that for A_j ∈ A^{[τ_j]} with j = 1, ..., n the limits lim_{t→∓∞} A_1^t ⋯ A_n^t Ω exist in H. These limits are denoted by Ψ^in and Ψ^out:
\[
\Psi^{in}(A_1, \ldots, A_n) = \lim_{t\to-\infty} A_1^t \cdots A_n^t\, \Omega,
\qquad
\Psi^{out}(A_1, \ldots, A_n) = \lim_{t\to+\infty} A_1^t \cdots A_n^t\, \Omega.
\]
To understand the second part of the result, let H_Fock be the Fock space describing a free system of particles of types τ ∈ T. We will identify the vacuum vector Ω_Fock ∈ H_Fock with the vacuum vector Ω ∈ H and the one-particle states in H_Fock with the one-particle states in H. Then for each A ∈ A^{[τ]} we can interpret AΩ and A^*Ω as one-particle states in the Fock space H_Fock, describing a free particle of type τ and of type τ^C, respectively. Using the creation and annihilation operators defined in subsection 2.2.5, we define the operator
\[
\hat A := A^*_\tau(A\Omega) + A^*_{\tau^C}(A^*\Omega)
\]
on the Fock space.

³⁷ We will not provide the proofs of the relevant theorems here, since very detailed versions can be found in chapter 12 of [2]. Another good source for Haag-Ruelle theory is [20], sections II.3 and II.4.
For each n ∈ N and each n-tuple (A_1, ..., A_n) with A_j ∈ A^{[τ_j]} we then define a Fock space vector Ψ_Fock(A_1, ..., A_n) by
\[
\Psi_{\mathrm{Fock}}(A_1, \ldots, A_n) = \hat A_1 \cdots \hat A_n\, \Omega_{\mathrm{Fock}},
\]
and the closed linear span of all such vectors is the entire Fock space. A nice property of Ψ_Fock is that it satisfies
\[
U_{\mathrm{Fock}}(a,L)\, \Psi_{\mathrm{Fock}}(A_1, \ldots, A_n) = \Psi_{\mathrm{Fock}}\big(U(a,L) A_1 U(a,L)^{-1}, \ldots, U(a,L) A_n U(a,L)^{-1}\big),
\]
where (a,L) ∈ P̃₊↑ and U_Fock is the representation of P̃₊↑ on H_Fock. The second part of the main result of Haag-Ruelle theory now states that there exist two linear isometries Ω^in, Ω^out : H_Fock → H satisfying
\[
\Omega^{in/out}\big(\Psi_{\mathrm{Fock}}(A_1, \ldots, A_n)\big) = \Psi^{in/out}(A_1, \ldots, A_n),
\]
with A_j ∈ A^{[τ_j]}. Furthermore, this property determines Ω^in and Ω^out uniquely, and these maps are Poincaré invariant, i.e.
\[
U(a,L)\, \Omega^{in/out} = \Omega^{in/out}\, U_{\mathrm{Fock}}(a,L)
\]
for all (a,L) ∈ P̃₊↑, which in turn implies that the S-operator is Poincaré invariant:
\[
U_{\mathrm{Fock}}(a,L)\, S\, U_{\mathrm{Fock}}(a,L)^{-1}
= U_{\mathrm{Fock}}(a,L)\, (\Omega^{out})^*\, \Omega^{in}\, U_{\mathrm{Fock}}(a,L)^{-1}
= \big[(\Omega^{out})^* U(a,L)\big]\big[U(a,L)^{-1} \Omega^{in}\big]
= (\Omega^{out})^* \Omega^{in} = S, \tag{4.10}
\]
where we used the intertwining relation above in the form U_Fock(a,L)(Ω^out)^* = (Ω^out)^* U(a,L) and U(a,L)^{-1} Ω^in = Ω^in U_Fock(a,L)^{-1}. From (4.9) it follows that Ψ^in(A) = Ψ^out(A) for any A ∈ ∪_{τ∈T} A^{[τ]}, and thus that Ω^in Ψ_Fock(A) = Ω^out Ψ_Fock(A) for any such A. This implies that the S-operator satisfies
\[
S\, \Psi_{\mathrm{Fock}}(A) = (\Omega^{out})^* \Omega^{in}\, \Psi_{\mathrm{Fock}}(A) = (\Omega^{out})^* \Omega^{out}\, \Psi_{\mathrm{Fock}}(A) = \Psi_{\mathrm{Fock}}(A),
\]
where in the last step we used that Ω^out is an isometry. Thus, the S-operator leaves one-particle states invariant.

4.2 The Haag-Kastler formulation of quantum field theory

In this section we discuss the Haag-Kastler axioms as an alternative to the Wightman axioms. The Haag-Kastler framework is often called algebraic quantum field theory, because it makes use of abstract C^*-algebras, rather than concrete operators on a Hilbert space.

4.2.1 The algebraic approach to quantum theory

To discuss the Haag-Kastler formulation of quantum field theory, we first need to reformulate the quantum theory that we discussed in section 2.2.
In that section we assumed that the states and observables of a quantum system are given in terms of a concrete Hilbert space. In particular, the algebra of observables was B(H) and the states were given in terms of density matrices on H. In the algebraic approach to quantum theory, which we will introduce now³⁸, the algebra of observables corresponding to a quantum system is given as an abstract unital C^*-algebra U, the hermitian elements of which are called bounded observables. The set of states of this C^*-algebra, i.e. the normalized positive linear functionals on U, is denoted by S(U); this set will be too large for physical purposes, and we will therefore define the smaller set of physical states below. In the meantime, it will be convenient to introduce some terminology concerning the set S(U).

The transition probability ω_1 · ω_2 between two pure states ω_1, ω_2 ∈ PS(U) is defined as
\[
\omega_1 \cdot \omega_2 = 1 - \tfrac{1}{4}\|\omega_1 - \omega_2\|^2,
\]
where ‖·‖ denotes the norm on the dual space U^*, of which S(U) is a subset. Because 0 ≤ ‖ω_1 − ω_2‖ ≤ ‖ω_1‖ + ‖ω_2‖ = 2, it is clear that ω_1 · ω_2 ∈ [0,1], and it follows from the positive-definiteness of ‖·‖ that ω_1 · ω_2 = 1 if and only if ω_1 = ω_2. When ω_1 · ω_2 = 0, we say that the states ω_1 and ω_2 are orthogonal, and two subsets S_1, S_2 ⊂ PS(U) of pure states are called mutually orthogonal if ω_1 · ω_2 = 0 for all ω_1 ∈ S_1 and ω_2 ∈ S_2. A non-empty subset S ⊂ PS(U) is called indecomposable if it cannot be written as the disjoint union of two non-empty mutually orthogonal subsets. Using this definition, we define a relation ∼ on PS(U) as follows: ω_1 ∼ ω_2 if and only if there exists an indecomposable set S ⊂ PS(U) with ω_1, ω_2 ∈ S.

Proposition 4.21 The relation ∼ is an equivalence relation on PS(U).

Proof By considering the indecomposable set {ω}, it is clear that ω ∼ ω (reflexivity) for all ω ∈ PS(U).

³⁸ Good sources for the algebraic approach are chapter 6 of [2] and chapter 2 of [7].
Because the definition of ω_1 ∼ ω_2 is manifestly symmetric in ω_1 and ω_2, it is also clear that ω_1 ∼ ω_2 ⇒ ω_2 ∼ ω_1 (symmetry) for all ω_1, ω_2 ∈ PS(U). To prove transitivity, assume that ω_1 ∼ ω_2 and ω_2 ∼ ω_3. Then there exist indecomposable sets S_1, S_2 ⊂ PS(U) with ω_1, ω_2 ∈ S_1 and ω_2, ω_3 ∈ S_2. If the union S := S_1 ∪ S_2 were not indecomposable, there would exist two disjoint non-empty mutually orthogonal subsets S', S'' ⊂ PS(U) with S = S' ∪ S'', and hence we could write S_j = (S_j ∩ S') ∪ (S_j ∩ S'') for j = 1, 2. Note that either ω_2 ∈ S' or ω_2 ∈ S''. Assuming that ω_2 ∈ S' would immediately lead to S_j ∩ S'' = ∅ for j = 1, 2 (since the S_j are indecomposable), and thus also to S'' = ∅. Similarly, assuming that ω_2 ∈ S'' would lead to S' = ∅. This contradiction shows that S must in fact be indecomposable. Because ω_1, ω_2, ω_3 ∈ S, this implies that ω_1 ∼ ω_3, and thus that ∼ is indeed an equivalence relation.

Now consider an equivalence class C ⊂ PS(U) under ∼. We will show that C is indecomposable. If this were not true, there would be disjoint mutually orthogonal non-empty sets C_1, C_2 ⊂ PS(U) with C = C_1 ∪ C_2. Now if ω_1, ω_2 ∈ C with ω_j ∈ C_j, then (since ω_1 ∼ ω_2) there exists an indecomposable set S ⊂ PS(U) with ω_1, ω_2 ∈ S. Because all elements of S are equivalent under ∼, we must have S ⊂ C. But then S decomposes into the disjoint mutually orthogonal non-empty subsets S ∩ C_1 and S ∩ C_2, contradicting the indecomposability of S. Thus we conclude that the equivalence classes are indecomposable subsets of PS(U). Now suppose that for some equivalence class C we have an indecomposable set C' with C ⊂ C'. Then all elements in C' are equivalent under ∼ and hence we also have C' ⊂ C, which implies that C' = C. This shows that the equivalence classes are maximal indecomposable subsets of PS(U); we will call these sets sectors.
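As a heuristic aside (not part of the formalism above), in a finite toy model the sectors can be pictured as the connected components of the graph whose edges link pure states with non-zero transition probability: an edge realizes an indecomposable two-element set, and a connected component is a maximal set that cannot be split into mutually orthogonal halves. The Python sketch below uses pure states of the block-diagonal algebra M_2(C) ⊕ M_2(C), for which the transition probability is |⟨v, w⟩|² within a block and 0 across blocks; the particular vectors and the random seed are arbitrary choices.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)

def normalize(v):
    return v / np.linalg.norm(v)

# Each pure state is (block index, unit vector in C^2).
states = [(0, normalize(np.array([1.0, 0.0]))),
          (0, normalize(np.array([1.0, 1.0]))),
          (1, normalize(np.array([0.0, 1.0]))),
          (1, normalize(rng.standard_normal(2)))]

def transition(s1, s2):
    (b1, v1), (b2, v2) = s1, s2
    if b1 != b2:
        return 0.0                       # states of different blocks are orthogonal
    return abs(np.vdot(v1, v2))**2

# Sectors as connected components of the non-orthogonality graph (BFS):
n = len(states)
seen, sectors = set(), []
for i in range(n):
    if i in seen:
        continue
    comp, queue = [], deque([i])
    seen.add(i)
    while queue:
        j = queue.popleft()
        comp.append(j)
        for k in range(n):
            if k not in seen and transition(states[j], states[k]) > 1e-12:
                seen.add(k)
                queue.append(k)
    sectors.append(sorted(comp))

print(sectors)   # states of the first block and of the second block form two sectors
```

The two blocks come out as two mutually orthogonal sectors, matching the general statement that distinct sectors are mutually orthogonal.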
Furthermore, note that if ω_1, ω_2 ∈ PS(U) with ω_1 · ω_2 ≠ 0, then the set {ω_1, ω_2} is indecomposable and hence ω_1 ∼ ω_2. This shows that the different sectors must be mutually orthogonal.

General facts about representations

In order to physically describe a system with abstract algebra of observables U, we choose an appropriate representation π : U → B(H) of the algebra of observables in some Hilbert space H. In this context we call π the physical representation and H the physical Hilbert space. If the system has finitely many coordinates and momenta that must satisfy the canonical commutation relations, the choice of representation is uniquely determined (up to unitary equivalence) by the Stone-Von Neumann theorem. However, if the system has infinitely many degrees of freedom, as in quantum field theory, the Stone-Von Neumann theorem is no longer applicable, and in such cases there are many unitarily inequivalent representations of the canonical commutation relations. Therefore, for such systems the physical representation π should be chosen carefully, depending on the particular dynamics of the system at hand³⁹; for instance, the Fock representation cannot be used for interacting fields.

Without loss of generality, we may always assume that the physical representation π is faithful, i.e. that π is injective. The reason for this is as follows. Suppose that it were possible to physically describe a quantum system by using a non-faithful representation π of the algebra of observables U. Then the representation π defines a representation π̂ of the quotient C^*-algebra U/ker(π), and we could just as well have started with this quotient algebra (as the algebra of observables) from the beginning.

Given the physical representation π : U → B(H), we define for each unit vector Ψ ∈ H a state on the C^*-algebra U by
\[
U \ni A \mapsto \langle \pi(A)\Psi, \Psi\rangle =: \rho_\Psi(A). \tag{4.11}
\]
We call this state the vector state associated with π corresponding to the vector Ψ ∈ H.
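For the defining (irreducible) representation of U = M_2(C), the vector states (4.11) are pure, and the abstract transition probability 1 − ¼‖ω_1 − ω_2‖² reduces to the familiar quantum-mechanical overlap |⟨Ψ_1, Ψ_2⟩|². The following Python sketch checks this numerically, using the standard fact that the dual norm of the hermitian functional A ↦ Tr(XA) on M_n(C) is the trace norm of X; the random seed is an arbitrary choice.

```python
import numpy as np

rng = np.random.default_rng(1)

def random_unit(n):
    v = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    return v / np.linalg.norm(v)

# Two vector states on U = M2(C) in its defining representation:
# rho_Psi(A) = <A Psi, Psi> = Tr(P_Psi A), with P_Psi the projection onto C.Psi.
psi1, psi2 = random_unit(2), random_unit(2)
P1 = np.outer(psi1, psi1.conj())
P2 = np.outer(psi2, psi2.conj())

# Dual norm of rho_1 - rho_2 = trace norm of P1 - P2 (sum of |eigenvalues|,
# legitimate here because P1 - P2 is hermitian):
trace_norm = np.abs(np.linalg.eigvalsh(P1 - P2)).sum()

transition = 1.0 - 0.25 * trace_norm**2
overlap = abs(np.vdot(psi2, psi1))**2

print(transition, overlap)   # the two numbers agree
```

This also makes the orthogonality statement concrete: ω_1 · ω_2 = 0 exactly when ‖ω_1 − ω_2‖ attains its maximal value 2, i.e. when Ψ_1 ⊥ Ψ_2.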
If π is irreducible, then ρ_Ψ always defines a pure state in S(U), and in that case the set {ρ_Ψ}_{Ψ∈H} of all vector states associated with π coincides precisely with a sector C in PS(U); moreover, if π' is another irreducible representation of U whose vector states correspond to some sector C', then C' = C if and only if π' is unitarily equivalent to π. Also, for each sector C ⊂ PS(U) there exists an irreducible representation π : U → B(H) such that C = {ρ_Ψ}_{Ψ∈H}, so we conclude that the sectors of PS(U) are in one-to-one correspondence with the irreducible representations of U modulo unitary equivalence. The proof of these facts can be found in section 6.1 (proposition 6.2) of [2].

As stated at the beginning of this subsection, the space S(U) is unnecessarily large for physical purposes. For a physical representation π : U → B(H), we define the set of physical states to be the set of all states in S(U) of the form
\[
\rho(A) = \mathrm{Tr}(\hat\rho\, \pi(A)), \qquad A \in U, \tag{4.12}
\]
with ρ̂ a density operator on H. To emphasize that this set of physical states depends on the representation π, we will denote it by S_π. In general, S_π is a proper subset of the set of all states S(U). Note that we can now characterize a quantum system by the pair (U, π), instead of by the pair (H, A) as we did in subsection 2.2.2.

By the same reasoning as in subsection 2.2.2, we find that the vector state ρ_Ψ defined in (4.11) is obtained as a special case of (4.12) by taking ρ̂ to be the one-dimensional projection onto CΨ. Also, as in subsection 2.2.2, any ρ ∈ S_π can be written as a countable convex combination of vector states. Again this shows that any pure state in S_π must be a vector state. However, because in general π(U) is not equal to B(H), the converse is not necessarily true. To illustrate this, suppose that the physical Hilbert space is a direct sum H = H_1 ⊕ H_2 of Hilbert spaces and that π(U) = B(H_1) ⊕ B(H_2).
Let Ψ_i ∈ H_i for i = 1, 2 be two unit vectors that define vector states ρ_{Ψ_i} which are different from each other, and define the unit vector Ψ := (Ψ_1 + Ψ_2)/√2 ∈ H. Because π(A)Ψ_i ∈ H_i, and hence π(A)Ψ_i ⊥ Ψ_j for (i,j) ∈ {(1,2),(2,1)}, for every A ∈ U, the vector state defined by Ψ satisfies
\[
\rho_\Psi(A) = \tfrac{1}{2}\langle \pi(A)(\Psi_1 + \Psi_2),\, \Psi_1 + \Psi_2\rangle
= \tfrac{1}{2}\langle \pi(A)\Psi_1, \Psi_1\rangle + \tfrac{1}{2}\langle \pi(A)\Psi_2, \Psi_2\rangle
= \tfrac{1}{2}\rho_{\Psi_1}(A) + \tfrac{1}{2}\rho_{\Psi_2}(A),
\]
which shows that the vector state ρ_Ψ is a convex combination of two different states and is therefore not pure. We note furthermore that although each state in S_π is a countable convex combination of vector states, it is not necessarily true that each state is a countable convex combination of pure states.

We now introduce some terminology concerning representations. Two representations π_1 : U → B(H_1) and π_2 : U → B(H_2) are called phenomenologically equivalent if S_{π_1} = S_{π_2}. Note that two unitarily equivalent representations are in particular phenomenologically equivalent. A representation π : U → B(H) is called factorial of type I if π is a direct sum of a (possibly infinite) number of copies of some irreducible representation π̃ : U → B(H̃), so H = H̃^{⊕K} and π = π̃^{⊕K}. Two factorial representations of type I are called disjoint if they are multiples of irreducible representations that are unitarily inequivalent. Without proof we mention that a representation π is phenomenologically equivalent to some irreducible representation π̃ if and only if π is a direct sum of copies of π̃ (and is thus factorial of type I).

Once we have chosen the physical representation π, we can consider the closure of the algebra π(U) ⊂ B(H) in the σ-weak topology.

³⁹ In contrast to the case of finitely many degrees of freedom, where the chosen representation only depends on the number of degrees of freedom (i.e. H ≅ L²(R^N) for N degrees of freedom) and not on the specific dynamics of the system.
By Von Neumann's bicommutant theorem, this closure is a Von Neumann algebra and equals π(U)''; it is called the Von Neumann algebra of observables of the quantum system. To be able to consider observables which are represented by unbounded operators, we proceed as follows. We say that a (possibly unbounded) self-adjoint operator A on H is affiliated to the Von Neumann algebra of observables π(U)'' if all spectral projection operators E_A(Δ), with Δ a Borel set in R, belong to π(U)''. The set of observables of the system is then defined to be the set of all self-adjoint operators on H which are affiliated to π(U)''.

Superselection rules

For a quantum system (U, π) the elements in the commutant π(U)' are called superselection operators, and a set of operators in B(H) that generates π(U)' is referred to as the superselection rules of the system. Of course, we always have C1_H ⊂ π(U)', and in case the inclusion is strict we say that the system has non-trivial superselection rules.

If H is a Hilbert space and V ⊂ H is a subset of non-zero vectors in H, then V is called a linked system of vectors if V cannot be written as a disjoint union of two non-empty mutually orthogonal subsets. In particular, for any linear subspace V ⊂ H the set of unit vectors in V forms a linked system. Now let W ⊂ H be a set of non-zero vectors that is total in H, i.e. the closed linear span of W is equal to H. We then define a relation ∼ on W as follows: Ψ_1 ∼ Ψ_2 if and only if there exists a linked system L ⊂ W with Ψ_1, Ψ_2 ∈ L. By using similar arguments as for indecomposable sets in PS(U) (see above), we find that ∼ defines an equivalence relation on W. The equivalence classes give rise to a partitioning of W into mutually orthogonal maximal linked systems {W_ν}_{ν∈N}, where the index set N may be uncountable. For each ν ∈ N we define a subspace H_ν ⊂ H as the closed linear span of W_ν. Because W is total in H, we then have
\[
H = \bigoplus_{\nu\in N} H_\nu.
\]
So we conclude that if we have a total subset W in a Hilbert space H, then H decomposes into a direct sum of non-zero subspaces H_ν such that W_ν = W ∩ H_ν. Note that although N might be uncountable, the direct sum is still discrete in the sense that the measure on the index set N is discrete, in contrast to the general case of a direct integral, when the set N is equipped with a more general measure and we would really have to write a direct integral ∫^⊕ instead of ⊕.

We will now apply this in the following way. Let π : U → B(H) be a representation of the C^*-algebra U and suppose that the set P ⊂ H of all vectors in H that define pure states on U forms a total subset of H. Then according to the discussion above we can decompose H into a direct sum H = ⊕_{ν∈N} H_ν of non-zero subspaces H_ν with P_ν = H_ν ∩ P, where {P_ν}_{ν∈N} are the maximal linked systems in P as above. Now fix some ν_0 ∈ N and choose a Ψ_0 ∈ P_{ν_0}. If A ∈ U with π(A)Ψ_0 ≠ 0, then it can be shown⁴⁰ that the state on U defined by the unit vector π(A)Ψ_0/‖π(A)Ψ_0‖ is pure, so the unit vectors of π(U)Ψ_0 form a subset of P. Because π(U)Ψ_0 is a linear subspace, the unit vectors in π(U)Ψ_0 form a linked system. Thus the set of unit vectors of π(U)Ψ_0 is a subset of P_{ν_1} for some ν_1 ∈ N. But Ψ_0 ∈ P_{ν_0} ∩ P_{ν_1}, so we must in fact have ν_1 = ν_0, and thus the unit vectors of π(U)Ψ_0 lie in P_{ν_0}. Since Ψ_0 ∈ P_{ν_0} was arbitrary, this implies that π(U)P_{ν_0} ⊂ H_{ν_0}; since ν_0 ∈ N was arbitrary, we then have π(U)P_ν ⊂ H_ν for all ν ∈ N. Because for each ν the set P_ν is total in H_ν, this in turn implies that π(U) leaves all the subspaces H_ν invariant. Without proof⁴¹ we mention furthermore that the subrepresentations of π(U) on the subspaces H_ν are all factorial of type I and are pairwise disjoint. We can thus write H as a double direct sum
\[
H = \bigoplus_{\nu\in N} H_\nu = \bigoplus_{\nu\in N} \tilde H_\nu^{\oplus M_\nu} \tag{4.13}
\]
and we can write π as
\[
\pi = \bigoplus_{\nu\in N} \pi_\nu = \bigoplus_{\nu\in N} \tilde\pi_\nu^{\oplus M_\nu}, \tag{4.14}
\]
where the π̃_ν : U → B(H̃_ν) are irreducible representations.

⁴⁰ See exercise 6.10 of [2].
This decomposition into a (discrete) direct sum of irreducible representations was possible because of the assumption that P is total in H; this assumption about P is therefore called the hypothesis of discrete superselection rules. For representations satisfying this hypothesis the following proposition holds, which can be found in section 6.2 (proposition 6.6) of [2].

Proposition 4.22 Let π : U → B(H) be a representation of a C^*-algebra U, let P ⊂ H be the set of all vectors that define pure states, and suppose that π satisfies the hypothesis of discrete superselection rules. Then the following statements are equivalent:
(1) The elements of PS_π are in one-to-one correspondence with the elements of P.
(2) The representations π_ν : U → B(H_ν) in the decompositions (4.13) and (4.14) are irreducible (i.e. M_ν = 1 for all ν ∈ N).
(3) P = {Ψ ∈ ∪_{ν∈N} H_ν : ‖Ψ‖ = 1}.
(4) The commutant π(U)' of π(U) is abelian.

Note that if (2) is satisfied, the representation is still phenomenologically equivalent to a representation where some (or all) of the M_ν are larger than 1 (including the case where some M_ν is infinite). So demanding that the physical representation π satisfies (2) does not restrict the possibilities for the state space S_π, but it has the benefit that it simplifies the representation π. For this reason it is often assumed that a system (U, π) that satisfies the hypothesis of discrete superselection rules also satisfies the equivalent statements in the proposition above. Because of (4), this assumption is called the hypothesis of commutative (discrete) superselection rules. So for a system (U, π) that satisfies the hypothesis of commutative discrete superselection rules, the representation decomposes into a direct sum of unitarily inequivalent irreducible representations of U.
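The content of statement (4) can be made concrete in a finite-dimensional toy model (the 2 ⊕ 2 block sizes below are an arbitrary illustrative choice, not taken from the text): for the block-diagonal representation π(U) = {A ⊕ B : A, B ∈ M_2(C)} on C⁴ = H_1 ⊕ H_2, the commutant can be computed numerically as the null space of X ↦ (XM − MX) over the generators M, and it comes out two-dimensional and abelian, spanned by 1_{H_1} ⊕ 0 and 0 ⊕ 1_{H_2}.

```python
import numpy as np

# pi(U) = {A ⊕ B : A, B in M2(C)} acting on C^4 = H1 ⊕ H2.
def embed(block, pos):          # pos 0 -> A ⊕ 0, pos 1 -> 0 ⊕ B
    M = np.zeros((4, 4), dtype=complex)
    M[2*pos:2*pos+2, 2*pos:2*pos+2] = block
    return M

# Generators: the matrix units E_ij of each block.
gens = []
for pos in (0, 1):
    for i in range(2):
        for j in range(2):
            E = np.zeros((2, 2))
            E[i, j] = 1.0
            gens.append(embed(E, pos))

# The commutant is the joint null space of X -> XM - MX. With row-major
# vectorization, vec(XM) = (I ⊗ M^T) vec(X) and vec(MX) = (M ⊗ I) vec(X).
rows = [np.kron(np.eye(4), M.T) - np.kron(M, np.eye(4)) for M in gens]
L = np.vstack(rows)             # shape (8*16, 16)

_, s, Vh = np.linalg.svd(L)
null_dim = int(np.sum(s < 1e-10))
comm = [Vh[k].conj().reshape(4, 4) for k in range(16 - null_dim, 16)]

X, Y = comm
print(null_dim)                          # 2: spanned by 1 ⊕ 0 and 0 ⊕ 1
print(np.allclose(X @ Y, Y @ X))         # True: the commutant is abelian
```

Every element of the computed null space has the form diag(a, a, b, b), i.e. a scalar on each coherent subspace, which is exactly the abelian commutant ⊕_ν C1_ν described in the surrounding text.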
As stated earlier, all unit vectors in an irreducible representation define pure states on U, so in each of the spaces H_ν in the direct sum we have the unrestricted superposition principle, i.e. the superposition of two pure states again defines a pure state. On the entire space H we then have the following restricted version of the superposition principle: a normalized linear combination of two vectors defining pure states again defines a pure state if the two vectors belong to the same space H_ν. For this reason the subspaces H_ν are called coherent subspaces of H. For a system (U, π) with commutative discrete superselection rules, the commutant is given by
\[
\pi(U)' = \bigoplus_{\nu\in N} \pi_\nu(U)' = \bigoplus_{\nu\in N} \mathbb{C}\,1_\nu,
\]
where the first equality already holds if only the hypothesis of discrete superselection rules is satisfied; the second equality follows from the irreducibility of the π_ν. The Von Neumann algebra of observables is now clearly
\[
\pi(U)'' = \bigoplus_{\nu\in N} B(H_\nu).
\]

Symmetries in the algebraic approach

We will now discuss symmetries in the algebraic approach. The definition of a symmetry of the quantum system (H, A) that was given in 2.2.3 can be restated for a quantum system (U, π) in the algebraic approach. A symmetry of a quantum system (U, π) is defined to be a pair (s, s') of bijections s : U_s-a → U_s-a and s' : S_π → S_π on the set of self-adjoint (s-a) elements of U and on the set of physical states, respectively, satisfying
\[
(s'\rho)(sA) = \rho(A) \tag{4.15}
\]
for all ρ ∈ S_π and A ∈ U_s-a. The map s : U_s-a → U_s-a is continuous in the norm topology (it even preserves the norm) and consequently the map s' : S_π → S_π is weak*-continuous. The proofs of these facts can be found in [2], proposition 6.7 and the paragraph preceding that proposition. Because π is assumed to be faithful, S_π can be shown⁴² to be weak*-dense in S(U).

⁴¹ See proposition 6.5 of [2] for a proof.
Thus we can extend s' uniquely by weak*-continuity to a map s' : S(U) → S(U); therefore we may assume that s' is a map from S(U) into itself. It is in fact a bijection, with inverse given by the extension of (s')^{-1} : S_π → S_π. By the same reasoning as in section 2.2.3, the map s' : S(U) → S(U) preserves the convex structure of S(U) and therefore maps pure states onto pure states. Thus, s' : S(U) → S(U) is a weak*-continuous affine bijection which maps S_π onto itself. Furthermore, s' preserves transition probabilities, and as a consequence the image of a sector of S(U) under the map s' is again a sector. Conversely, given a weak*-continuous affine map s' that maps S_π onto itself, we can define a unique symmetry (s, s').

Before we close our discussion of the map s' and go over to a discussion of the map s, we define the notion of an invariant state. A state ρ ∈ S_π is called an invariant state under (s, s') if (s'ρ)(A) = ρ(A) for all A ∈ U_s-a, or equivalently (s'ρ)(sA) = (s'ρ)(A) for all A ∈ U_s-a.

Now that we have discussed s' in some detail, we will discuss some properties of s. To see that s : U_s-a → U_s-a is R-linear, we note that for λ, μ ∈ R and A, B ∈ U_s-a we have
\[
(s'\rho)(s(\lambda A + \mu B)) = \rho(\lambda A + \mu B) = \lambda\rho(A) + \mu\rho(B) = \lambda(s'\rho)(sA) + \mu(s'\rho)(sB) = (s'\rho)(\lambda\, sA + \mu\, sB) \tag{4.16}
\]
for all ρ ∈ S(U), which implies that s(λA + μB) = λsA + μsB, since S(U) separates the points of U_s-a. Furthermore, it can also be shown that s satisfies s(A²) = s(A)² for all A ∈ U_s-a (see for instance proposition 6.10 of [2]), which is equivalent to the property that s(AB + BA) = s(A)s(B) + s(B)s(A) for all A, B ∈ U_s-a. We will now extend s to a map s : U → U by demanding that condition (4.15) also holds for all A ∈ U. Then the first step in (4.16) also makes sense for λ, μ ∈ C, and it follows that s : U → U is C-linear, and is hence a vector space automorphism.
Using the fact that each A ∈ U can be written as a linear combination
\[
A = \tfrac{1}{2}(A + A^*) + i\,\tfrac{1}{2i}(A - A^*) =: \mathrm{Re}(A) + i\,\mathrm{Im}(A)
\]
of self-adjoint elements, it is also easy to see that s(A²) = s(A)² for all A ∈ U. Finally, for each A ∈ U we also have
\[
s(A^*) = s(\mathrm{Re}(A) - i\,\mathrm{Im}(A)) = s(\mathrm{Re}(A)) - i\,s(\mathrm{Im}(A)) = [s(\mathrm{Re}(A)) + i\,s(\mathrm{Im}(A))]^* = s(A)^*,
\]
where in the last step we used that s is C-linear. If U_1 and U_2 are C^*-algebras, a linear map s : U_1 → U_2 satisfying s(A²) = s(A)² and s(A*) = s(A)* for all A ∈ U_1 is called a Jordan*-homomorphism. Thus we have found that the symmetries of a quantum system (U, π) must be Jordan*-automorphisms of U. Conversely, if s : U → U is a Jordan*-automorphism, then we get a pair (s, s') of bijections where (s'ρ)(A) = ρ(s^{-1}A) for A ∈ U; however, this pair (s, s') will only define a symmetry if s' maps S_π onto itself. The set J(U) of all Jordan*-automorphisms inherits a topology from the weak*-topology of U^*. Together with the composition law of Jordan*-automorphisms this gives J(U) the structure of a topological group. Note that a Jordan*-automorphism is more general than a C^*-isomorphism, because a Jordan*-isomorphism is not necessarily multiplicative. For instance, a C^*-anti-automorphism of U (i.e. a vector space automorphism s : U → U which preserves the *-operation and satisfies s(AB) = s(B)s(A) for all A, B ∈ U) is also a Jordan*-automorphism. The following theorem of Kadison, which can be found in section 2.2 of [7] (theorem II.2.1), gives us more insight into the nature of a given Jordan*-automorphism.

Theorem 4.23 Let U_1 be an abstract C^*-algebra and let U_2 be a C^*-algebra of operators on a Hilbert space H.

⁴² See sections 6.1 and 6.3 of [2]. Here it is shown that if π is faithful, S_π distinguishes the positive elements of U (i.e. ρ(A) ≥ 0 for all ρ ∈ S_π implies A ≥ 0), which in turn (by a result of Kadison which also uses that S_π is convex) implies that S_π is weak*-dense in S(U).
Then a linear *-preserving surjection α : U_1 → U_2 is a Jordan*-homomorphism if and only if there exists a projection operator E ∈ (U_2)'' ∩ (U_2)' such that
\[
\alpha(AB)\,E = \alpha(A)\alpha(B)\,E \quad\text{and}\quad \alpha(AB)\,(1 - E) = \alpha(B)\alpha(A)\,(1 - E)
\]
for all A, B ∈ U_1.

We can apply the theorem as follows. If s : U → U is a Jordan*-automorphism that defines a symmetry of the system (U, π), then the map π_s : U → π(U) defined by A ↦ π(sA) is a surjective Jordan*-homomorphism. According to the theorem, there exists a projection E ∈ π(U)'' ∩ π(U)' with π_s(AB)E = π_s(A)π_s(B)E and π_s(AB)(1 − E) = π_s(B)π_s(A)(1 − E) for all A, B ∈ U. In other words, if H denotes the representation space corresponding to π, then H decomposes into a direct sum H = H_1 ⊕ H_2 of subspaces H_1 = EH and H_2 = (1 − E)H which are invariant under π(U) (this follows from the fact that E ∈ π(U)'), and such that the Jordan*-automorphism⁴³ π(A) ↦ π_s(A) on π(U) is a direct sum of a C^*-automorphism of the algebra π(U)E ≅ π(U)|_{H_1} and a C^*-anti-automorphism of the algebra π(U)(1 − E) ≅ π(U)|_{H_2}. As a special case, if π(U)'' ∩ π(U)' = C1, then the Jordan automorphism π(A) ↦ π_s(A) is either a C^*-automorphism or else a C^*-anti-automorphism, and we have thus obtained the following corollary to the theorem above.

Corollary 4.24 Let U be a C^*-algebra and let π : U → B(H) be a representation. If s : U → U is a Jordan*-automorphism such that s(ker(π)) ⊂ ker(π), then π(A) ↦ π(sA) =: π_s(A) is a Jordan*-automorphism of π(U). Furthermore, there exists a projection operator E ∈ π(U)'' ∩ π(U)' such that π(U) leaves the subspaces H_1 = EH and H_2 = (1 − E)H invariant and such that π(A) ↦ π_s(A) decomposes into a direct sum of a C^*-automorphism of π(U)|_{H_1} and a C^*-anti-automorphism of π(U)|_{H_2}.

The relationship between this corollary and Wigner's theorem in section 2.2.3 is as follows. Assume that (U, π) satisfies the hypotheses of commutative and discrete superselection rules.
Then the physical Hilbert space decomposes into a direct sum of Hilbert spaces H_ν,
\[
H = \bigoplus_{\nu\in N} H_\nu,
\]
and the physical representation π decomposes accordingly into a direct sum π = ⊕_{ν∈N} π_ν of irreducible representations π_ν : U → B(H_ν). Because the Hilbert spaces H_1 = EH and H_2 = (1 − E)H are invariant under π(U), the index set N can be written as a disjoint union N = N_1 ∪ N_2 such that
\[
H_1 = \bigoplus_{\nu\in N_1} H_\nu \quad\text{and}\quad H_2 = \bigoplus_{\nu\in N_2} H_\nu.
\]
Note that it is allowed that one of the N_i is empty. It can now be shown that the C^*-automorphism of π(U)|_{H_1} can be represented by a unitary operator U_1 : H_1 → H_1 as
\[
\pi_s(A)|_{H_1} = U_1\, \pi(A)|_{H_1}\, U_1^{-1} \tag{4.17}
\]
and that there exists a bijection b_1 : N_1 → N_1 such that this operator U_1 maps the coherent subspace H_ν with ν ∈ N_1 unitarily onto the coherent subspace H_{b_1(ν)}. Similarly, the C^*-anti-automorphism of π(U)|_{H_2} can be represented by an anti-unitary operator U_2 : H_2 → H_2 as
\[
\pi_s(A)|_{H_2} = U_2\, \pi(A)^*|_{H_2}\, U_2^{-1} \tag{4.18}
\]
and there exists a bijection b_2 : N_2 → N_2 such that U_2 maps the coherent subspace H_ν with ν ∈ N_2 anti-unitarily onto the coherent subspace H_{b_2(ν)}. By comparing equations (4.17) and (4.18) with equations (2.19) and (2.20), the relationship with Wigner's theorem is clear. In fact, we have obtained a generalization of Wigner's theorem: a symmetry in the algebraic approach can be represented in the physical Hilbert space as a direct sum of a unitary operator and an anti-unitary operator, each of which maps coherent subspaces unitarily, resp. anti-unitarily, onto coherent subspaces. The proof of all these facts can be found in section 6.3 (proposition 6.11) of [2]. In the absence of commutative and discrete superselection rules, we cannot always make the step from corollary 4.24 to unitary and anti-unitary operators on H.

⁴³ Here we assume that s(ker(π)) ⊂ ker(π), which is certainly the case when π is faithful.
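A minimal numerical illustration of the automorphism/anti-automorphism dichotomy (a spot check on random matrices, not a proof): on U = M_2(C), the transpose map s(A) = Aᵀ (without complex conjugation) is a C^*-anti-automorphism, hence a Jordan*-automorphism that fails to be multiplicative; in the language of theorem 4.23 this is the case E = 0.

```python
import numpy as np

rng = np.random.default_rng(2)

def rand_mat():
    return rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))

s = lambda A: A.T        # transpose WITHOUT conjugation
A, B = rand_mat(), rand_mat()

# Jordan*-homomorphism properties: s(A^2) = s(A)^2 and s(A*) = s(A)*.
assert np.allclose(s(A @ A), s(A) @ s(A))
assert np.allclose(s(A.conj().T), s(A).conj().T)
# Equivalent Jordan property: s(AB + BA) = s(A)s(B) + s(B)s(A).
assert np.allclose(s(A @ B + B @ A), s(A) @ s(B) + s(B) @ s(A))

# ... but s reverses products (anti-automorphism), so it is not multiplicative:
assert np.allclose(s(A @ B), s(B) @ s(A))
print(np.allclose(s(A @ B), s(A) @ s(B)))   # False for generic A, B
```

Since M_2(C) acts irreducibly on C², its commutant is C1, so by corollary 4.24 any Jordan*-automorphism of M_2(C) is either an automorphism or, as here, an anti-automorphism; a mixture of the two only occurs in the presence of a non-trivial central projection E.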
In particular, when a Jordan*-automorphism s : U → U defines a C^*-automorphism of π(U) (rather than a direct sum of an automorphism and an anti-automorphism), it is not always true that there exists a unitary operator U : H → H such that π(sA) = Uπ(A)U^{-1}. When such an operator U does exist, we say that the symmetry s is implementable. An important example, which we will need in the following subsection when we discuss vacuum states, is obtained in the case of a GNS-representation corresponding to a state which is invariant under a symmetry (s, s') for which s is a C^*-automorphism:

Theorem 4.25 Let U be a C^*-algebra and let (s, s') be a symmetry for which the Jordan*-automorphism s : U → U is a C^*-automorphism, and suppose that the state ρ ∈ S(U) is invariant under (s, s'). Let π_ρ : U → B(H_ρ) be the GNS-representation associated to the state ρ and suppose that s(ker(π_ρ)) ⊂ ker(π_ρ). Then there exists a unique unitary operator U_s : H_ρ → H_ρ on the representation space H_ρ that satisfies
\[
U_s\, \pi_\rho(A)\, U_s^{-1} = \pi_\rho(sA), \qquad U_s\Omega_\rho = \Omega_\rho \tag{4.19}
\]
for each A ∈ U, where Ω_ρ ∈ H_ρ denotes the cyclic vector corresponding to π_ρ.

Proof Because s(ker(π_ρ)) ⊂ ker(π_ρ), we see that if π_ρ(A)Ω_ρ = π_ρ(B)Ω_ρ for some A, B ∈ U, then also π_ρ(sA)Ω_ρ = π_ρ(sB)Ω_ρ. Thus, on the dense subset π_ρ(U)Ω_ρ of H_ρ we can define a linear operator U_s by U_s π_ρ(A)Ω_ρ := π_ρ(sA)Ω_ρ. This operator satisfies
\[
\langle U_s\pi_\rho(A)\Omega_\rho,\, U_s\pi_\rho(B)\Omega_\rho\rangle
= \langle \pi_\rho(sA)\Omega_\rho,\, \pi_\rho(sB)\Omega_\rho\rangle
= \langle \pi_\rho(s(B^*A))\Omega_\rho,\, \Omega_\rho\rangle
= \rho(s(B^*A)) = \rho(B^*A)
= \langle \pi_\rho(A)\Omega_\rho,\, \pi_\rho(B)\Omega_\rho\rangle,
\]
where we have used that ρ is invariant under the symmetry. Because s is bijective, we have U_s π_ρ(U)Ω_ρ = π_ρ(U)Ω_ρ, so U_s is indeed unitary. If U has a unit, then the second equation in (4.19) follows from the fact that s1 = 1, since this implies U_s Ω_ρ = π_ρ(s1)Ω_ρ = Ω_ρ. If U has no unit, then the identity follows by taking an approximate unit e_ν of U.
The first equation in (4.19) follows from

Us πρ(A)Us⁻¹ [πρ(sB)Ωρ] = Us πρ(A)Us⁻¹ [Us πρ(B)Ωρ] = Us πρ(AB)Ωρ = πρ(s(AB))Ωρ = πρ(sA)πρ(sB)Ωρ

and from the fact that the set {πρ(sB)Ωρ}_{B∈U} is dense in Hρ. To show uniqueness, suppose that Us′ is a linear operator satisfying the two equations in (4.19). Then for all A ∈ U we have

Us′ πρ(A)Ωρ = Us′ πρ(A)(Us′)⁻¹ Us′ Ωρ = πρ(sA)Ωρ = Us πρ(A)Ωρ,

so Us′ coincides with Us on πρ(U)Ωρ. Hence Us′ = Us.

Now that we have discussed individual symmetries, we will consider symmetry groups. Recall that after proving Wigner's theorem we argued that the elements of a connected Lie group of symmetries must be represented by unitary operators rather than anti-unitary ones. In view of this, we expect that in the algebraic approach connected symmetry groups should be represented by C∗-automorphisms. In the algebraic approach we say that a topological group G is a symmetry group of the system if there is a morphism α : G → J(U) of topological groups. As demonstrated in section 2.2 of [7] (theorem II.2.4), it is indeed the case that if a topological group G is a connected symmetry group then the elements αg := α(g) ∈ J(U) are all C∗-automorphisms. In particular, in relativistic quantum systems the elements (a, L) of the restricted Poincaré group P₊↑ give rise to C∗-automorphisms α(a,L) : U → U.
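Theorem 4.25 can be checked concretely in the simplest setting: a finite-dimensional C∗-algebra U = Mₙ(C) with an inner automorphism s(A) = V A V∗ and an s-invariant faithful state ρ(A) = Tr(DA). For a faithful state on Mₙ the GNS space can be realized as Mₙ itself with inner product ⟨A, B⟩ = ρ(B∗A), left multiplication as πρ and Ω = 1; the implementing unitary Us is then s itself. A numerical sketch (all concrete choices below are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3

# A unitary V and a density matrix D that commute: pick a common eigenbasis W.
X = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
W, _ = np.linalg.qr(X)
V = W @ np.diag(np.exp(1j * rng.uniform(0, 2 * np.pi, n))) @ W.conj().T
d = rng.uniform(1, 2, n); d /= d.sum()
D = W @ np.diag(d) @ W.conj().T            # faithful density matrix, [D, V] = 0

s = lambda A: V @ A @ V.conj().T           # C*-automorphism of M_n(C)
sinv = lambda A: V.conj().T @ A @ V        # its inverse
rho = lambda A: np.trace(D @ A)            # s-invariant state

# GNS data for the faithful state rho: H_rho = M_n with <A,B> = rho(B* A),
# pi(A) = left multiplication by A, cyclic vector Omega = 1, and U_s = s.
inner = lambda A, B: np.trace(D @ B.conj().T @ A)
Omega = np.eye(n)
Us = s

A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

assert np.isclose(rho(s(A)), rho(A))                 # rho is invariant under s
assert np.isclose(inner(Us(A), Us(B)), inner(A, B))  # U_s preserves <.,.>
assert np.allclose(Us(A @ sinv(B)), s(A) @ B)        # U_s pi(A) U_s^{-1} = pi(sA)
assert np.allclose(Us(Omega), Omega)                 # U_s Omega = Omega
```

The third assertion is exactly the first equation of (4.19) evaluated on the vector πρ(B)Ωρ; density of πρ(U)Ωρ is automatic here because ρ is faithful.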
Using these operators which generate one-particle states, it was possible to construct the in- and out-states that one needs in scattering theory, as well as the Poincaré-invariant isometries Ωin/out : HFock → H. It seems that in the Haag-Ruelle theory the quantum fields only play a role in the background: they were needed to obtain the correspondence O ↦ P(O) from spacetime domains to ∗-algebras of operators, and they were needed in the proofs of the mathematical statements (although we did not consider these proofs in section 4.1.6; see for instance section 12.2 of [2] for detailed proofs). It turns out that when in a quantum theory a correspondence between spacetime domains and operators is chosen properly (i.e. as in the Haag-Kastler framework that we will introduce now), the results of the Haag-Ruelle theory can be derived without using quantum fields; see also chapter 5 of [1]. This fact should be considered as one of the reasons for discussing the Haag-Kastler theory. Apart from the fact that this framework is capable of incorporating the Haag-Ruelle scattering theory, Haag and Kastler give (in their paper [21]) as a motivation for introducing their theory that the true essence of quantum field theory is that it gives rise to the notion of observables which can be measured in some spacetime region, and that the observables corresponding to spacelike separated regions are compatible. In this sense it should be expected that the Haag-Kastler theory, which focuses on the assignment of observables to spacetime domains, is more general than quantum field theory. Our discussion in this and the following subsection is inspired by the books [1] and [20] and, of course, by the article [21]. As stated in the previous subsection, the Haag-Kastler theory is formulated in the setting of algebraic quantum theory, so we should begin with the following axiom.

Axiom 0: Algebra of observables
There is a C∗-algebra U, called the algebra of observables.
Notice that there is no mention of the choice of the (faithful) physical representation here. The reason for this, as explained in the article [21], is as follows. Suppose that we have a number of physical systems that are prepared in identical ways and suppose that for each of these systems we make a measurement of a set of simultaneously measurable observables A1, ..., An, in order to obtain some knowledge about the unknown state α ∈ S(U) of the identical systems. More specifically, these measurements provide us with estimates p(Aj, B) ∈ [0, 1] of the probabilities P_α^{Aj}(B) that a measurement of observable Aj will result in a value in the Borel set B for the system in state α. However, each measurement always involves some error and we can only prepare a finite number of identical systems, so after our measurements we can only conclude that the system is in some state α ∈ S(U) for which the following inequalities hold:

|P_α^{Aj}(B) − p(Aj, B)| < εj.

These inequalities do not specify the point α ∈ S(U) exactly, but rather define some neighborhood in S(U) with respect to the weak topology on S(U). A particular physical representation π should thus be considered as adequate for describing the system if all open neighborhoods in S(U) that can be obtained from an experiment (in the way described above) contain an element of Sπ. In view of these considerations, it seems natural to introduce the following definition. Two physical representations π1 and π2 are called physically equivalent if every weakly open neighborhood in Sπ1 contains an element of Sπ2, and vice versa. Note that Sπ1 and Sπ2 are both subsets of the larger space S(U), so that the definition makes sense. An important result⁴⁴ in the theory of C∗-algebras now states that any two representations π1 and π2 with ker(π1) = ker(π2) are physically equivalent in the sense defined above.
In particular, any two faithful representations are physically equivalent and for this reason it is not necessary to specify the particular choice of the (faithful) physical representation in the axioms. The correspondence between spacetime domains and operators is now established by assuming that the algebra of observables has a substructure which is given in terms of spacetime domains.

Axiom 1: Local algebras
For each bounded⁴⁵ open set O ⊂ M in Minkowski spacetime there is a C∗-subalgebra U(O) ⊂ U.

The self-adjoint elements of U(O) are interpreted as the observables that can be measured in the spacetime region O, also called local observables. A local observable which is measurable in some subset of Minkowski spacetime should also be measurable in any larger subset of Minkowski spacetime. This is expressed in the following axiom.

Axiom 2: Monotonicity
If O1, O2 ⊂ M are bounded open sets satisfying O1 ⊂ O2, then U(O1) ⊂ U(O2).

When we have some bounded open set O ⊂ M and some observable A ∈ U(O) which can be measured in O, then a restricted Poincaré transformation g ∈ P₊↑ should have the effect of mapping A to some observable which can be measured in gO. So an element g ∈ P₊↑ defines for each bounded open set O ⊂ M a map αg^(O) : U(O) → U(gO). Because restricted Poincaré transformations are assumed to be symmetries of any relativistic quantum system, this map must in fact be an isomorphism of C∗-algebras.

Axiom 3: Covariance
For each restricted Poincaré transformation g ∈ P₊↑ we have an automorphism αg : U → U such that for each bounded open subset O ⊂ M the restriction of αg to U(O) is a C∗-isomorphism αg : U(O) → U(gO). For a fixed observable A ∈ U, the map g ↦ αg(A) is continuous in g.

If two regions of spacetime are spacelike separated, then no physical process in one of the two regions can affect a physical process in the other region.
In particular, this means that we can perform simultaneous measurements in both regions and therefore the local observables corresponding to one of the two regions must in fact commute with all local observables corresponding to the other region.

Axiom 4: Locality
If the bounded open subsets O1, O2 ⊂ M are spacelike separated, then the algebras U(O1) and U(O2) commute.

Finally, we assume that the algebra U of observables is the smallest C∗-algebra containing all the local observables. This emphasizes the importance of local observables.

Axiom 5: Generating property
The algebra ∪_O U(O) is dense in U. Here the union is taken over all bounded open subsets of M.

⁴⁴ Haag and Kastler call this result Fell's equivalence theorem. In their article [21] there is a reference to the relevant article of J.M.G. Fell.
⁴⁵ By a bounded set in M we mean a set with compact closure.

4.2.3 Vacuum states in the Haag-Kastler framework

A large difference with the Wightman theory is that there is no mention of a vacuum state in the Haag-Kastler axioms. As we will show now, there is in fact a notion of vacuum states in the Haag-Kastler framework. To this end, we first consider a physical representation π : U → B(H) of the algebra of observables and we assume that there is some unitary representation T(x) of the translation group on the physical Hilbert space H with corresponding energy-momentum generators Pμ. Let EP denote the joint spectral measure of the operators Pμ. Then for any Ψ ∈ H we can define a measure μΨ on Minkowski space M by

μΨ(B) = ⟨EP(B)Ψ, Ψ⟩

for any Borel set B ⊂ M. In case Ψ is a one-particle state, the support of the measure μΨ is of course precisely the support of the wave function of Ψ in energy-momentum space. In general, it is given the following name.
Definition 4.26 Let H be the Hilbert space of a physical system and let T(x) be a unitary representation of the translation group on H with corresponding energy-momentum generators Pμ which have the joint spectral measure EP. Then for a vector Ψ ∈ H the support of the measure μΨ : B ↦ ⟨EP(B)Ψ, Ψ⟩ on M is called the energy-momentum spectrum of Ψ.

We now have the following lemma, which states that (the representation of) some operators in the algebra U can shift the energy-momentum spectrum of vectors in the physical Hilbert space. This lemma is lemma 4.1 in [1].

Lemma 4.27 Let f ∈ C∞(M) be such that its Fourier transform f̂(p) = ∫_M f(x) e^{ip·x} d⁴x has bounded support ∆ ⊂ M, and for Q ∈ U define Q(f) = ∫_M α_{(x,1)}(Q) f(x) d⁴x as a Bochner integral⁴⁶. Let π : U → B(H) be a representation and suppose that there is a unitary representation T(x) of the translation group on H with energy-momentum generators Pμ. If the energy-momentum spectrum FΨ ⊂ M of a vector Ψ ∈ H is a closed set, then π(Q(f))Ψ has energy-momentum spectrum FΨ + ∆.

For this reason, we say that the operator Q(f) ∈ U, with f as in the lemma, increases the energy-momentum by ∆. Now define for a future-directed timelike vector e ∈ M the set

M−(e) = {p : p · e < 0}.

Note that e lies along the time-axis in some particular inertial frame, so the set M−(e) contains all energy-momentum vectors which have negative energy in this particular inertial frame. If ∆ ⊂ M−(e), then the lemma implies that according to this inertial observer the operator π(Q(f)) decreases the energy of any vector Ψ ∈ H. Thus, if we want some vector Ψ0 ∈ H to represent a vacuum vector (which has the lowest possible energy in any inertial frame) then we must have π(Q(f))Ψ0 = 0 for any Q ∈ U and any smooth f whose Fourier transform has support ∆ ⊂ M−(e) for some future-directed timelike vector e ∈ M. For such functions f we thus have

ρΨ0(Q(f)∗Q(f)) = ⟨π(Q(f)∗Q(f))Ψ0, Ψ0⟩ = 0.
When we translate this back into the language of the abstract algebra, we obtain the following definition.

Definition 4.28 A state ω ∈ S(U) on a C∗-algebra U is called a vacuum state if ω(Q(f)∗Q(f)) = 0 for all Q ∈ U and for any smooth function f whose Fourier transform has bounded support ∆ ⊂ M−(e) for some future-directed timelike vector e ∈ M.

⁴⁶ A Bochner integral is an integral of a function on a measure space with values in a Banach space. Its definition is very similar to that of a Lebesgue integral of a complex-valued function on a measure space.

Let V̄₊ = {p : p · p ≥ 0, p⁰ ≥ 0} denote the closed forward light cone in momentum space. Then clearly M−(e) ⊂ M∖V̄₊ for any future-directed timelike vector e ∈ M, so if ω ∈ S(U) is a state which satisfies ω(Q(f)∗Q(f)) = 0 for all Q ∈ U and for any smooth function f whose Fourier transform has bounded support ∆ ⊂ M∖V̄₊, then ω is a vacuum state. Conversely, suppose that ω is a vacuum state and suppose that f is a smooth function whose Fourier transform f̂ has bounded support ∆ ⊂ M∖V̄₊. If V₊ ⊂ M denotes the set of all future-directed timelike vectors, then M∖V̄₊ = ∪_{e∈V₊} M−(e); because each M−(e) is open, we have obtained an open cover {M−(e)}_{e∈V₊} of M∖V̄₊ and hence also of ∆. But ∆ is compact, so there exists a finite subcover {M−(ej)}_{j=1}^{n} (with e1, ..., en ∈ V₊) of ∆. Now let {gj}_{j=1}^{n} be a partition of unity subordinate to the cover {M−(ej)}_{j=1}^{n}, i.e. each gj is a smooth function with support in M−(ej) and Σ_{j=1}^{n} gj(p) = 1 for all p ∈ ∪_{j=1}^{n} M−(ej). Now define the smooth functions {f̂j}_{j=1}^{n} by f̂j = gj f̂. Then the support of each f̂j lies in M−(ej) and Σ_{j=1}^{n} f̂j = f̂. If we denote the inverse Fourier transform of f̂j by fj, then we find that

0 ≤ ω(Q(f)∗Q(f)) = Σ_{j=1}^{n} Σ_{k=1}^{n} ω(Q(fj)∗Q(fk)) ≤ Σ_{j=1}^{n} Σ_{k=1}^{n} |ω(Q(fj)∗Q(fk))| ≤ 0,

where in the last step we used the property |ω(A∗B)| ≤ √(ω(A∗A)) √(ω(B∗B)) of states.
Thus ω(Q(f)∗Q(f)) = 0 and we have proved the following proposition.

Proposition 4.29 A state ω ∈ S(U) on a C∗-algebra U is a vacuum state if and only if ω(Q(f)∗Q(f)) = 0 for all Q ∈ U and any smooth function f whose Fourier transform has bounded support ∆ ⊂ M∖V̄₊.

5 Constructive quantum field theory

After our investigation of the Wightman and Haag-Kastler axiom schemes we might wonder whether there are any concrete models that satisfy all axioms of one or both of these axiom schemes. Of course, we have already seen that the free field theories are examples of such concrete models. There are also some other models that were constructed at a very early stage in the development of rigorous quantum field theory, such as the Schwinger and Thirring models, but these models turned out to be trivial in the sense that the corresponding fields could be expressed as functions of free fields. The goal of the constructive quantum field theory program that emerged in the 1960s was to prove that concrete non-trivial models exist within the Wightman and/or Haag-Kastler axiom scheme. In this chapter we will discuss some of the earliest results that were obtained in constructive quantum field theory. We have used the historical notes [23] and [33] as a guide through the literature, especially concerning the chronology of the results. The two main strategies for constructive quantum field theory were the Hamiltonian strategy and the Euclidean strategy. We will discuss both of them in separate sections, with a special focus on the scalar boson models with a self-interaction. Because the proofs of almost all theorems that we will be needing are very long and technical, we have decided not to include them here. Instead, we will focus on the main arguments and we will specify how the different mathematical objects are constructed, without proving that the construction makes sense mathematically.
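Before turning to constructions, note that the Cauchy-Schwarz property of states used in the proof of proposition 4.29, |ω(A∗B)| ≤ √(ω(A∗A))√(ω(B∗B)), and its consequence that ω(A∗A) = 0 forces ω(A∗B) = 0, can be checked numerically for states of the form ω(A) = Tr(DA) on a matrix algebra (a finite-dimensional stand-in, not the QFT situation; all concrete matrices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
adj = lambda A: A.conj().T

def rnd():
    return rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# A faithful state omega(A) = Tr(D A): Cauchy-Schwarz holds for every A, B.
d = rng.uniform(0.1, 1.0, n); d /= d.sum()
D = np.diag(d)
omega = lambda A: np.trace(D @ A)
for _ in range(100):
    A, B = rnd(), rnd()
    lhs = abs(omega(adj(A) @ B))
    rhs = np.sqrt(omega(adj(A) @ A).real) * np.sqrt(omega(adj(B) @ B).real)
    assert lhs <= rhs + 1e-10

# A non-faithful state: omega(A*A) = 0 forces omega(A*B) = 0 -- the step
# that killed the mixed terms omega(Q(f_j)* Q(f_k)) above.
D = np.diag([0.0, 0.3, 0.3, 0.4])
A = np.zeros((n, n), dtype=complex)
A[:, 0] = rng.standard_normal(n)      # supported where the state vanishes
B = rnd()
assert np.isclose(omega(adj(A) @ A), 0)
assert np.isclose(abs(omega(adj(A) @ B)), 0)
```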
5.1 The Hamiltonian approach

In the Hamiltonian approach one begins with a free field theory on Fock space and uses cutoffs in order to make sense of the interaction term in the Hamiltonian of some interacting field theory. The methods that are used in this approach are of a functional-analytic nature.

5.1.1 The (λφ⁴)₂-model as a Haag-Kastler model

The scalar quantum field theory in 2-dimensional spacetime with a quartic self-interaction was one of the first non-trivial models that people tried to construct in the 1960s, because it is probably the simplest of all non-trivial models. The Hamiltonian for this model is given formally by⁴⁷

H = H0 + λ ∫_R :φ0⁴(x): dx,  (5.1)

where H0 is the free field Hamiltonian (see also equation (5.2) below), λ is the coupling constant and φ0(x) denotes the free field at time t = 0. This model is called the (λφ⁴)₂-model, where the subindex 2 refers to the number of spacetime dimensions. Since the interaction term is not well-defined, we will introduce a cutoff version of this interaction. However, we will begin with a discussion of the free field system in two spacetime dimensions.

Description of the free field
Let H ≅ L²(R, dp) be the Hilbert space of one-particle momentum-spin wave functions in 2-dimensional Minkowski spacetime M2 for a particle with mass m and spin s = 0 that is equal to its own antiparticle. Since the spacetime is 2-dimensional, these wave functions ψ(p) depend only on a single real variable⁴⁸ p, and analogous to the 4-dimensional case we define ωp = √(m² + p²). Let F ≡ F₊(H) be the boson Fock space corresponding to H, and let φ be the free scalar field, defined on real-valued f ∈ S(M2) by

φ(f) = √(2π) (a∗(rf) + a(rf))

as defined before, where for f ∈ S(M2) we define

(rf)(p) = (1/(2π)) (1/√(2ωp)) ∫_{R²} f(t, x) e^{i(ωp t − px)} dt dx.

In terms of the ill-defined operators a^(∗)(p) we can express the field as

φ(t, x) = (1/√(2π)) ∫_R (dp/√(2ωp)) [e^{i(ωp t − px)} a∗(p) + e^{−i(ωp t − px)} a(p)].

We also define the sharp-time field φt and its derivative πt := ∂t φt in the same way as in the 4-dimensional case. Of special importance to us will be the t = 0 field

φ0(x) = (1/√(2π)) ∫_R (dp/√(2ωp)) e^{−ipx} [a∗(p) + a(−p)]

and its canonical conjugate π0. For real-valued f ∈ S(R), both φ0(f) and π0(f) are essentially self-adjoint operators that are defined on the subspace D ≡ D₊ of F consisting of all finite particle states, as defined in subsection 2.2.5:

D = {ψ = (ψ⁰, ψ¹, ψ², ...) ∈ F : ∃N with ψⁿ = 0 for all n ≥ N},

the algebraic direct sum of the spaces Fⁿ, where Fⁿ ≡ F₊ⁿ(H) is the symmetric n-particle Hilbert space, consisting of all square-integrable functions ψⁿ(p1, ..., pn) with ψⁿ(p_{σ(1)}, ..., p_{σ(n)}) = ψⁿ(p1, ..., pn) for all σ ∈ Sn; also, we have used the notation where we write an element ψ ∈ F as a sequence ψ = (ψ⁰, ψ¹, ...) with ψⁿ ∈ Fⁿ for all n. Because φ0(f) and π0(f) are essentially self-adjoint for real-valued f ∈ S(R), their closures φ0(f)⁻ and π0(f)⁻ are self-adjoint, and by the spectral theorem they define spectral measures E_{φ0(f)} and E_{π0(f)}. If O ⊂ R is a bounded open set⁴⁹, let

D_R(O) = {f ∈ S(R) : f real-valued, supp(f) ⊂ O}.

If B_R denotes the Borel σ-algebra on R, then we define the set

A(O) = {E_{φ0(f)}(∆) : ∆ ∈ B_R, f ∈ D_R(O)} ∪ {E_{π0(f)}(∆) : ∆ ∈ B_R, f ∈ D_R(O)}.

The Von Neumann algebra 𝒜(O) (notice the difference between the letters 𝒜 and A) is then defined to be the Von Neumann algebra generated by the set A(O) ⊂ B(F). Equivalently, 𝒜(O) is the Von Neumann algebra generated by the unitary elements e^{iφ0(f)} and e^{iπ0(f)} with f ∈ D_R(O). We will now show how the operators a^(∗)(p) can be defined rigorously.

⁴⁷ Here we use a boldface letter x to denote a single real variable, because the notation x is already used to denote spacetime vectors, which have two components in this 2-dimensional model.
⁴⁸ We will write p instead of p for the same reason that we write x instead of x.
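Before doing so, the combinatorics of a and a∗ can be made concrete in a single-mode, particle-number-truncated Fock space, where a is an honest matrix and a∗ its adjoint. The truncation is purely an illustration device (a cutoff dimension chosen here for convenience), not part of the construction in the text:

```python
import numpy as np

N = 12                                        # particle-number cutoff
a = np.diag(np.sqrt(np.arange(1.0, N)), k=1)  # a |n> = sqrt(n) |n-1>
ad = a.conj().T                               # creation operator = adjoint of a

e = np.eye(N)                                 # e[n] plays the n-particle vector
assert np.allclose(a @ e[5], np.sqrt(5) * e[4])   # the sqrt(n) of (a(p)psi)^{n-1}
assert np.allclose(ad @ e[5], np.sqrt(6) * e[6])

# Number operator and canonical commutation relation (exact below the cutoff).
assert np.allclose(ad @ a, np.diag(np.arange(N)))
comm = a @ ad - ad @ a
assert np.allclose(comm[:N-1, :N-1], np.eye(N-1))
```

In the continuum the creation operator only exists as a bilinear form, as explained next; the matrix picture survives after smearing with test functions.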
We define the subset 𝒟 ⊂ F as the set of all elements ψ = (ψ⁰, ψ¹, ...) in D for which ψⁿ is a Schwartz function for all n:

𝒟 = {ψ = (ψ⁰, ψ¹, ...) ∈ D : ψⁿ ∈ S(Rⁿ) for all n}.

The annihilation operator a(p) can now be defined as a map a(p) : 𝒟 → 𝒟. The action of a(p) on an element ψ = (ψ⁰, ψ¹, ...) ∈ 𝒟 is

(a(p)ψ)ⁿ⁻¹(p1, ..., p_{n−1}) = √n ψⁿ(p, p1, p2, ..., p_{n−1}).

Because a(p) maps 𝒟 into itself, we can let an arbitrary product a(p1) ... a(pn) act on 𝒟 and hence such products are well-defined operators on 𝒟. Furthermore, for any ψ, υ ∈ 𝒟 such a product gives rise to a Schwartz function

(p1, ..., pn) ↦ ⟨a(p1) ... a(pn)ψ, υ⟩.

Unfortunately, the creation operators a∗(p) are not so well-behaved as the annihilation operators a(p). Formally, their action on an element ψ = (ψ⁰, ψ¹, ...) ∈ 𝒟 is

(a∗(p)ψ)ⁿ⁺¹(p1, ..., p_{n+1}) = (1/√(n+1)) Σ_{j=1}^{n+1} δ(p − pj) ψⁿ(p1, ..., p_{j−1}, p_{j+1}, ..., p_{n+1}).

The delta function makes it impossible to define a∗(p) as an operator on a non-trivial subspace of F, but the fact that for any ψ ∈ H the operator a∗(ψ) is the adjoint of a(ψ) suggests that we can make sense of a∗(p) as a bilinear form on 𝒟 × 𝒟,

𝒟 × 𝒟 ∋ (ψ, υ) ↦ ⟨a∗(p)ψ, υ⟩ := ⟨ψ, a(p)υ⟩.

More generally, for any product a∗(p1) ... a∗(pn) a(p′1) ... a(p′m) we can define a bilinear form on 𝒟 × 𝒟 by

(ψ, υ) ↦ ⟨a∗(p1) ... a∗(pn) a(p′1) ... a(p′m)ψ, υ⟩ := ⟨a(p′1) ... a(p′m)ψ, a(pn) ... a(p1)υ⟩.

For fixed ψ, υ ∈ 𝒟, the right-hand side is a Schwartz function in the variables pi, p′j, i.e. f_{ψ,υ} ∈ S(R^{n+m}), where

f_{ψ,υ}(p1, ..., pn, p′1, ..., p′m) = ⟨a∗(p1) ... a∗(pn) a(p′1) ... a(p′m)ψ, υ⟩.

If F ∈ S′(R^{n+m}) is a tempered distribution and if we write this distribution as a function F(p1, ..., pn, p′1, ..., p′m), then the action of F on f_{ψ,υ} can be written as

F(f_{ψ,υ}) = ∫_{R^{n+m}} F(p1, ..., pn, p′1, ..., p′m) f_{ψ,υ}(p1, ..., pn, p′1, ..., p′m) dⁿp dᵐp′
= ∫_{R^{n+m}} F(p1, ..., pn, p′1, ..., p′m) ⟨a∗(p1) ... a∗(pn) a(p′1) ... a(p′m)ψ, υ⟩ dⁿp dᵐp′.

In this sense, we may say that for each distribution F(p1, ..., pn, p′1, ..., p′m) ∈ S′(R^{n+m}) we can define the integral

∫_{R^{n+m}} F(p1, ..., pn, p′1, ..., p′m) a∗(p1) ... a∗(pn) a(p′1) ... a(p′m) dⁿp dᵐp′.

Since ωpⁿ δ(p − p′) is a tempered distribution in the variables p and p′, i.e. ωpⁿ δ(p − p′) ∈ S′(R²), we can use this to define for each n the operator

Nn := ∫_{R²} ωpⁿ δ(p − p′) a∗(p) a(p′) dp dp′ = ∫_R ωpⁿ a∗(p) a(p) dp.

For n = 0 this gives the number operator N0 = N and for n = 1 this gives the free Hamiltonian N1 = H0,

H0 = ∫_R ωp a∗(p) a(p) dp.  (5.2)

This bilinear form gives rise to a self-adjoint operator, the domain of which we denote by D(H0).

The interaction term
For g ∈ S(R), let ĝ ∈ S(R) denote its Fourier transform. Then, as a bilinear form, we may define

V(g) := Σ_{n=0}^{4} (4 choose n) ∫_{R⁴} dp1 ... dp4 [ĝ(p1 + ... + p4) / ((2π)^{3/2} √(2ωp1) ··· √(2ωp4))] a∗(p1) ... a∗(pn) a(−p_{n+1}) ... a(−p4).

This bilinear form defines a self-adjoint operator (with domain D(V(g))), which we will also denote by V(g), that is essentially self-adjoint on the subspace

D0 = ∩_{n=0}^{∞} D(H0ⁿ).

The right-hand side in the definition of V(g) can be further rewritten, resulting in

V(g) = Σ_{n=0}^{4} (4 choose n) ∫_{R⁴} dp1 ... dp4 [(1/√(2π)) ∫_R g(x) e^{−i(p1+...+p4)x} dx / ((2π)^{3/2} √(2ω(p1)) ··· √(2ω(p4)))] a∗(p1) ... a∗(pn) a(−p_{n+1}) ... a(−p4)
= ∫_R { ∫_{R⁴} (e^{−ip1x} dp1)/(√(2π)√(2ωp1)) ··· (e^{−ip4x} dp4)/(√(2π)√(2ωp4)) : [a∗(p1) + a(−p1)] ··· [a∗(p4) + a(−p4)] : } g(x) dx
= ∫_R : [ ∫_R (e^{−ipx} dp)/(√(2π)√(2ωp)) (a∗(p) + a(−p)) ]⁴ : g(x) dx
= ∫_R :φ0⁴(x): g(x) dx.

So V(g) is in fact a smeared out version of the interaction term in the total Hamiltonian (5.1).

⁴⁹ We will write open spatial sets (which are subsets of R) in boldface letters to distinguish them from open sets in two-dimensional spacetime.
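The normal ordering appearing in :φ0⁴(x): obeys, for a single mode x = a + a∗, the Wick identity x⁴ = :x⁴: + 6:x²: + 3, where :xᵏ: = Σₙ C(k,n) (a∗)ⁿ a^{k−n}. This is the same combinatorial expansion as the Σₙ (4 choose n) a∗ⁿ a^{4−n} structure of V(g), and it can be verified on truncated matrices; matrix elements well below the cutoff are exact, and the cutoff value below is an arbitrary illustrative choice:

```python
import numpy as np
from math import comb

N = 16
a = np.diag(np.sqrt(np.arange(1.0, N)), k=1)
ad = a.conj().T
x = a + ad                                    # single-mode stand-in for the field

def wick(k):
    """Normal-ordered power :x^k: = sum_n C(k,n) (a*)^n a^(k-n)."""
    return sum(comb(k, n)
               * np.linalg.matrix_power(ad, n) @ np.linalg.matrix_power(a, k - n)
               for n in range(k + 1))

lhs = np.linalg.matrix_power(x, 4)            # x^4
rhs = wick(4) + 6 * wick(2) + 3 * np.eye(N)   # Wick's theorem for x^4
K = N - 4                                     # rows/columns unaffected by cutoff
assert np.allclose(lhs[:K, :K], rhs[:K, :K])
```

The constants 6 and 3 count the single and double contractions of four identical field factors; subtracting them is what makes :φ0⁴: a densely defined bilinear form.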
We define the cut-off Hamiltonian H(g) to be H(g) = H0 + V(g). This cut-off Hamiltonian is self-adjoint with domain D(H0) ∩ D(V(g)). For any bounded open set O ⊂ R we define the set

Ot = {x ∈ R : dist(x, O) < |t|}.

With this notation, Glimm and Jaffe show in the article [11] that the free Hamiltonian satisfies

e^{itH0} 𝒜(O) e^{−itH0} ⊂ 𝒜(Ot),  (5.3)

where 𝒜(O) is the Von Neumann algebra that was defined above. Now let E_{φ0(f)} be the spectral measure of the closure of φ0(f), which was already used in the definitions of A(O) and 𝒜(O). We then define M to be the Von Neumann algebra generated by the set of projections

{E_{φ0(f)}(∆) : ∆ ∈ B_R, f ∈ S(R)}.

Note that the functions f are not assumed to have a bounded support this time. Perhaps the most important result in the article [11] is the following theorem, which will allow us to remove the cut-off in the time-evolution of a local observable.

Theorem 5.1 Let O ⊂ R be an open interval. If g ∈ S(R) is real-valued with supp(g) ⊂ O, then

e^{itV(g)} ∈ 𝒜(O) ∩ M.  (5.4)

In what follows we choose O to be of the form O = {x ∈ R : |x| < M}. Now let A ∈ 𝒜(O), and fix some n ∈ N and some g ∈ S(R). For 1 ≤ k ≤ n and t ∈ R we then define

A_{n,k}(t) := [e^{(it/n)H0} e^{(it/n)V(g)}]^k A [e^{−(it/n)V(g)} e^{−(it/n)H0}]^k.

Let ε > 0. We can write g as a sum g = g1 + g2, where g1 and g2 are smooth and satisfy supp(g1) ⊂ O_ε and supp(g2) ∩ O_{ε/2} = ∅. Then V(g) = V(g1) + V(g2) and it follows from the definition of V(g) that V(g1) and V(g2) commute. Thus, for any t ∈ R we have

e^{(it/n)V(g)} = e^{(it/n)V(g1)} e^{(it/n)V(g2)}.

Because supp(g2) ∩ O_{ε/2} = ∅, e^{(it/n)V(g2)} commutes with⁵⁰ 𝒜(O_{ε/4}) and hence, in particular, with our operator A ∈ 𝒜(O) ⊂ 𝒜(O_{ε/4}). From this it follows that for A_{n,1}(t) we have

A_{n,1}(t) = e^{(it/n)H0} e^{(it/n)V(g)} A e^{−(it/n)V(g)} e^{−(it/n)H0}
= e^{(it/n)H0} e^{(it/n)V(g1)} e^{(it/n)V(g2)} A e^{−(it/n)V(g2)} e^{−(it/n)V(g1)} e^{−(it/n)H0}
= e^{(it/n)H0} e^{(it/n)V(g1)} A e^{−(it/n)V(g1)} e^{−(it/n)H0}.
Thus, A_{n,1}(t) only depends on g1; in other words, A_{n,1}(t) depends only on the value of g in the region O_ε. Theorem 5.1 implies that e^{(it/n)V(g1)} A e^{−(it/n)V(g1)} ∈ 𝒜(O_ε) and equation (5.3) then implies that

A_{n,1}(t) = e^{(it/n)H0} e^{(it/n)V(g1)} A e^{−(it/n)V(g1)} e^{−(it/n)H0} ∈ 𝒜((O_ε)_{t/n}) = 𝒜(O_{ε+t/n}).

Because A_{n,k}(t) = e^{(it/n)H0} e^{(it/n)V(g)} A_{n,k−1}(t) e^{−(it/n)V(g)} e^{−(it/n)H0}, we can repeat the procedure above (with A_{n,k−1}(t) instead of A) by choosing in each step an appropriate decomposition g = g1 + g2. The result is that for each 1 ≤ k ≤ n we have A_{n,k}(t) ∈ 𝒜(O_{kt/n+kε}) and that A_{n,k}(t) depends only on the value of g in O_{kt/n+kε}. In particular, for k = n we find that A_{n,n}(t) ∈ 𝒜(O_{t+nε}) and that A_{n,n}(t) depends only on the value of g in O_{t+nε}. Because ε was arbitrary, A_{n,n}(t) depends only on the value of g in Ōt (the closure of Ot), and A_{n,n}(t) ∈ ∩_{ε>0} 𝒜(O_{t+ε}). Thus, if O′ ⊂ R is an open region with O′ ∩ Ōt = ∅ then A_{n,n}(t) commutes with every observable B ∈ 𝒜(O′). Because n was arbitrary, the statements above hold for all n and thus also for

σt(A) := e^{itH(g)} A e^{−itH(g)} = strong lim_{n→∞} A_{n,n}(t),

where we have used the Trotter product formula, which states that if S and T are self-adjoint and S + T is essentially self-adjoint on D(S) ∩ D(T), then for each ψ ∈ H we have

e^{i(S+T)} ψ = lim_{n→∞} (e^{iS/n} e^{iT/n})ⁿ ψ.

In the present case this means that

e^{itH(g)} = strong lim_{n→∞} (e^{(it/n)H0} e^{(it/n)V(g)})ⁿ.

In particular, σt(A) depends only on the value of g in the region Ōt. The idea is now to take g ∈ S(R) to be a nonnegative function such that it equals the coupling constant λ in the region Ōt. The time evolution σt(A) of A ∈ 𝒜(O) is then determined by the value of g in the region where it equals λ, and hence the cut-off has been removed. This was the main result of the article [11]. We will now turn to the second article, namely [12], of Glimm and Jaffe on the (λφ⁴)₂-model.
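The Trotter product formula invoked above can be checked numerically in finite dimensions, where every Hermitian matrix is self-adjoint and the domain questions disappear. A sketch (the dimension, seed and error threshold are arbitrary illustrative choices; the matrices are mere stand-ins for H0 and V(g)):

```python
import numpy as np

rng = np.random.default_rng(2)
k = 6

def herm():
    X = rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k))
    return (X + X.conj().T) / 2

def expi(H, t=1.0):
    """e^{i t H} for a Hermitian matrix H, via its eigendecomposition."""
    lam, W = np.linalg.eigh(H)
    return (W * np.exp(1j * t * lam)) @ W.conj().T

S, T = herm(), herm()                       # stand-ins for H0 and V(g)
exact = expi(S + T)
errs = []
for n in (1, 10, 100, 1000):
    approx = np.linalg.matrix_power(expi(S, 1 / n) @ expi(T, 1 / n), n)
    errs.append(np.linalg.norm(approx - exact))
# First-order Trotter: the error decays like 1/n.
assert errs[-1] < 0.1 and errs[-1] < errs[-2]
```

The 1/n decay rate is governed by the commutator [S, T], which is exactly why the argument above has to track how conjugation by the Trotter factors spreads localization regions step by step.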
The ground state of H(g) and the field operators
The Hamiltonian H(g) defined above is bounded from below, i.e. its spectrum has an infimum Eg ∈ R. This was shown by Nelson in [27], and later in a more general context by Glimm in [10]. There exists a vector Ωg ∈ F that satisfies H(g)Ωg = Eg Ωg and ‖Ωg‖ = 1 and this vector is uniquely determined up to a phase factor. This phase factor is fixed by the requirement that ⟨Ωg, ΩFock⟩ > 0. The existence and uniqueness of the ground state Ωg are the content of theorems 2.2.1 and 2.3.1 of the article [12]. Using the t = 0 field φ0(x), we define the field φg(t, x) by

φg(t, x) = e^{itH(g)} φ0(x) e^{−itH(g)}

as a bilinear form on some subset of F, and if ψ ∈ F lies in this subset, then the function (t, x) ↦ ⟨φg(t, x)ψ, ψ⟩ is continuous. For each f ∈ S(R²) we then define another bilinear form A_{g,f}(t) by

A_{g,f}(t) = ∫_R φg(t, x) f(t, x) dx.

This bilinear form gives rise to a self-adjoint operator which we will also denote by A_{g,f}(t). In a similar fashion, we obtain a self-adjoint operator from the bilinear form

B_{g,f}(t) = ∫_R πg(t, x) f(t, x) dx,

where πg(t, x) = e^{itH(g)} π0(x) e^{−itH(g)}. Using these self-adjoint operators, we then define the integrals

φg(f)ψ := ∫_R A_{g,f}(t)ψ dt
πg(f)ψ := ∫_R B_{g,f}(t)ψ dt,

which in turn define closed symmetric operators φg(f) and πg(f). Under certain conditions on f, which we will not discuss here (see section 3.2 of the article [12] for the details), these operators satisfy

(∂t φg)(f) = πg(f) = [iH(g), φg(f)]

on a certain subset of F. A similar reasoning then also gives that (∂t² φg)(f) = [iH(g), πg(f)].

⁵⁰ This is argued by Glimm and Jaffe in the proof of theorem 5.1 above.
The commutator on the right-hand side is a bilinear form equal to

(∂x² φg)(f) − m² φg(f) − 4 ∫_{R²} :φg³(t, x): f(t, x) g(x) dx dt,

where :φg³(t, x): is a shorthand for e^{itH(g)} :φ0³(x): e^{−itH(g)}, so φg satisfies the differential equation

(□ + m²) φg(f) = −4λ :φg³:(f),

where the equality is in the sense of bilinear forms. If f has compact support, the operator φg(f) is self-adjoint and the differential equation above can be interpreted as an equality of self-adjoint operator-valued distributions. Also, if f has compact support, say supp(f) ⊂ O for some bounded open region O in R², then we can remove the cutoff g in a similar manner as we did for operators A ∈ 𝒜(O) above, i.e. by choosing a function g^{(O)} ∈ S(R) such that g^{(O)}(x) = λ on an interval I of R whose causal shadow contains O. Thus for each bounded open subset O we can define a field φ^{(O)}(t, x) without any cutoff, where φ^{(O)}(f) = φ_{g^{(O)}}(f). We now want to patch such φ^{(O)}(t, x) together to form a field φ′(t, x) without cutoffs. To accomplish this, we divide R² into overlapping squares Sj and we define a partition of unity subordinate to this open cover of overlapping squares, see also section 3.4 of [12]. Thus, we define a set of functions ζj : R² → [0, 1] with supp(ζj) ⊂ Sj and with Σj ζj = 1_{R²}. A function f ∈ S(R²) can then be written as f = Σj f ζj =: Σj fj, where supp(fj) ⊂ Sj. The idea is now to define a bilinear form

A′_{g,f}(t) = Σj ∫_R φ^{(Sj)}(t, x) fj(t, x) dx,

which gives rise to a self-adjoint operator which we will also denote by A′_{g,f}(t). In a similar manner we also obtain a self-adjoint operator B′_{g,f}(t) by replacing φ^{(Sj)} by π^{(Sj)}. We then define the integrals

φ′(f)ψ := ∫_R A′_{g,f}(t)ψ dt  (5.5)
π′(f)ψ := ∫_R B′_{g,f}(t)ψ dt,  (5.6)

which give rise to closed symmetric operators φ′(f) and π′(f) for f ∈ S(R²). We write φ′ and π′ (instead of φ and π) to distinguish these objects from the free field φ and its time-derivative π.
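The partition-of-unity construction used here (and earlier, in the proof of proposition 4.29) can be sketched in one dimension with explicit bump functions; the cover, grid and test function below are arbitrary illustrative choices:

```python
import numpy as np

def bump(x, lo, hi):
    """A smooth function supported in the open interval (lo, hi)."""
    y = np.zeros_like(x)
    m = (x > lo) & (x < hi)
    t = 2 * (x[m] - lo) / (hi - lo) - 1          # map (lo, hi) to (-1, 1)
    y[m] = np.exp(-1.0 / (1.0 - t ** 2))
    return y

x = np.linspace(-3, 3, 1201)
cover = [(-3.5, -1.0), (-2.0, 1.0), (0.0, 2.5), (1.5, 3.5)]   # overlapping cover
raw = np.array([bump(x, lo, hi) for lo, hi in cover])
total = raw.sum(axis=0)
assert (total > 0).all()                         # the sets really cover [-3, 3]

zeta = raw / total                               # partition of unity: sum = 1
assert np.allclose(zeta.sum(axis=0), 1.0)
for (lo, hi), z in zip(cover, zeta):
    assert (z[(x <= lo) | (x >= hi)] == 0).all() # supp(zeta_j) inside its set

# Decomposing a function as f = sum_j f * zeta_j, as in the text.
f = np.exp(-x ** 2)
assert np.allclose(sum(f * z for z in zeta), f)
```

Dividing by the strictly positive sum is the standard trick for turning a family of bumps subordinate to a cover into functions that add up to one.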
The field φ′ is local in the sense that φ′(f1) and φ′(f2) commute whenever the supports of f1 and f2 are mutually spacelike separated.

The algebra of local observables
For each bounded open subset O ⊂ R² of spacetime, we define U(O) to be the Von Neumann algebra generated by the set

{e^{iφ′(f)} : supp(f) ⊂ O, f real-valued}

of (bounded) operators on F. We will show that these algebras satisfy the Haag-Kastler axioms. It is clear that for bounded open sets O1 ⊂ O2 ⊂ R² we have U(O1) ⊂ U(O2), so monotonicity is satisfied. By construction of the field φ′, we can find for any bounded open O ⊂ R² a function g which equals λ on an interval of R and is such that for all f ∈ S(R²) with supp(f) ⊂ O we have

e^{i∆tH(g)} φ′(f) e^{−i∆tH(g)} = φ′(f_{∆t,0}),  (5.7)

where f_{∆t,∆x}(t, x) = f(t − ∆t, x − ∆x). Because supp(f_{∆t,0}) ⊂ O + (∆t, 0), we see that the spectral measures of φ′(f_{∆t,0}) are in U(O + (∆t, 0)). So (5.7) induces a map U(O) → U(O + (∆t, 0)) for each bounded open region O ⊂ R² and this map is in fact a C∗-isomorphism. Thus, this map gives us a transformation of the algebras U(O) under time translations that is of the form required by the Haag-Kastler axioms. For space translations, we first consider the free field generator P of space translations. Because φ′(0, x) = φ0(x), we have

e^{−i∆xP} φ′(0, x) e^{i∆xP} = φ′(0, x + ∆x).

If one now chooses a cutoff function g such that the interval where g = λ is large enough, we get

e^{−i∆xP} φ′(t, x) e^{i∆xP} = e^{−i∆xP} e^{itH(g)} φ′(0, x) e^{−itH(g)} e^{i∆xP}
= e^{itH(g)} e^{−i∆xP} φ′(0, x) e^{i∆xP} e^{−itH(g)}
= e^{itH(g)} φ′(0, x + ∆x) e^{−itH(g)}
= φ′(t, x + ∆x),

see section 3.6 of [12]. In particular, this shows that e^{−i∆xP} φ′(f) e^{i∆xP} = φ′(f_{0,∆x}) for any f ∈ S(R²) with bounded support. Applying the same reasoning as for time translations, we find a map that transforms U(O) to U(O + (0, ∆x)) as required by the Haag-Kastler axioms.
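The axioms being verified here, monotonicity, locality and translation covariance, are easy to exhibit in a finite toy net, e.g. on a spin chain where U(O) consists of matrices acting only on the sites in O. This is only an analogy (a lattice, not a continuum field), with all sizes chosen for illustration:

```python
import numpy as np
from functools import reduce

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])
Z = np.diag([1.0, -1.0])
L = 5                                            # lattice sites 0, ..., 4

def local(op, site):
    """op acting on one site, identity elsewhere: an element of U({site})."""
    return reduce(np.kron, [op if j == site else I2 for j in range(L)])

# Locality: observables in disjoint regions O1 = {1,2}, O2 = {3,4} commute.
A = local(X, 1) @ local(Z, 2)
B = local(Z, 3) + 2.0 * local(X, 4)
assert np.allclose(A @ B, B @ A)

# Translation covariance: the cyclic-shift unitary maps U(O) onto U(O + 1).
dim = 2 ** L
S = np.zeros((dim, dim))
for n in range(dim):
    bits = [(n >> (L - 1 - j)) & 1 for j in range(L)]   # site 0 = leading bit
    rot = [bits[-1]] + bits[:-1]                        # shift sites by +1
    m = sum(b << (L - 1 - j) for j, b in enumerate(rot))
    S[m, n] = 1.0
assert np.allclose(S @ S.T, np.eye(dim))                # S is unitary
assert np.allclose(S @ local(X, 1) @ S.T, local(X, 2))  # alpha_g: U(O) -> U(gO)
```

Monotonicity is automatic in this picture: an operator acting only on sites in O1 trivially acts only on sites in any O2 ⊃ O1. The hard continuum analogue, as the text explains next, is Lorentz-boost covariance.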
Showing that there also exists a transformation of the algebras $U(O)$ under Lorentz boosts with all the desired properties is much harder. In fact, Cannon and Jaffe have written an article of 61 pages to prove this covariance under Lorentz boosts, see [4]. The starting point of their solution is to consider the expression for the Lorentz boost generator as used in physics, i.e. the expression in terms of the energy-momentum tensor of the field. Analogously to the interaction term in the Hamiltonian, they introduce cutoff functions to obtain a well-defined local version of the boost generator. This local boost generator then defines a transformation of the algebras $U(O)$ under Lorentz boosts, and this transformation can be shown to have the desired properties. This, together with the results about spacetime translations above, completes the verification of the covariance axiom. So for each Poincaré transformation $(a,L)$ in two-dimensional spacetime, we have a $C^*$-automorphism $\sigma_{(a,L)}:U\to U$ such that for each bounded region $O$ of spacetime the restriction of $\sigma_{(a,L)}$ to $U(O)$ defines a $C^*$-isomorphism
\[
\sigma_{(a,L)}: U(O)\to U((a,L)O). \qquad (5.8)
\]
Finally, the locality axiom is also satisfied because we already noticed that $\phi'(f_1)$ and $\phi'(f_2)$ commute whenever the supports of the two compactly supported functions $f_1$ and $f_2$ are spacelike separated, and the algebras $U(O)$ are constructed from the smeared fields $\phi'(f)$ with $f$ compactly supported. The algebra of observables $U$ is now obtained by taking the norm completion of $\bigcup_O U(O)$.
5.1.2 The physical vacuum for the $(\lambda\phi^4)_2$-model
As stated above, the cutoff Hamiltonian $H(g)$ has an infimum $E_g\in\mathbb{R}$ and there exists a unique unit vector $\Omega_g\in\mathcal{F}$, up to a phase factor, that satisfies $H(g)\Omega_g = E_g\Omega_g$. This vector was called the ground state of $H(g)$. In order to obtain an operator whose ground state has eigenvalue $0$, we define $H_g := H(g) - E_g$.
For this operator the vector Ωg is the unique vector, up to a phase factor, with the property Hg Ωg = 0. Corresponding to the vector Ωg we can define a linear functional ωg : U → C, with U the algebra of observables defined above, by ωg (A) = hAΩg , Ωg i. This linear functional is a state in the sense of C ∗ -algebras, as is any vector state. In the article [13], Glimm and Jaffe use this state ωg to construct a physical vacuum state. They begin with a cutoff function g that equals the coupling constant λ in some interval of the form [−M, M ] and then they define the sequence (gn ) of functions gn (x) = g(x/n). If σx : U → U denotes the transformation of U corresponding to a space translation over the vector x, then they define a sequence of states (ωn ) by Z h(x/n) ωn (A) = hσx (A)Ωgn , Ωgn idx, (5.9) n R where h is a smooth nonnegative function with support in [−1, 1] and Z h(x)dx = 1, R which also implies that x 7→ h(x/n)/n integrates to 1. This sequence (ωn ) of states can be shown to have a weakly-convergent subsequence (ωnk ), the limit of which is denoted by ω, i.e. for all A ∈ U we have lim ωnk (A) = ω(A). (5.10) k→∞ The obtained state ω can now be used to define a physical vacuum state in a physical Hilbert space, via the Gelfand-Naimark-Segal construction. Thus we consider the GNS-representation πω : U → B(Hω ) of the C ∗ -algebra U in the Hilbert space Hω , in which the state ω is given by ω(A) = hπω (A)Ωω , Ωω i, where the unit vector Ωω ∈ Hω is uniquely determined up to a phase factor. As shown in theorem 2.1 of [13], on the Hilbert space Hω there exists a unitary representation U (a) of the translation group and this representation satisfies U (a)πω (A)U (a)∗ = πω (σ(a,1) (A)) U (a)Ωω = Ωω , where A ∈ U and σ(a,L) is as in equation (5.8). The existence of U (a) follows from the fact that ω is a translation-invariant state, i.e. ω(σ(a,1) (A)) = ω(A). 
The representation $U(a)$ is strongly continuous, so according to the SNAG theorem (which we also used in step 1 of the classification of the irreps of $\widetilde{\mathcal{P}}_+^\uparrow$) there exist two commuting self-adjoint operators $H$ and $\mathrm{P}$ such that
\[
U(a) = e^{ia\cdot P},
\]
where $P = (H,\mathrm{P})$. The operator $H$ is positive, which is a consequence of (5.10) and of the fact that each $H_{g_n}$ is positive; see also the end of the proof of theorem 2.1 of [13].
In the proof of theorem 2.2 of [13], it is shown that $\mathcal{H}_\omega$ is a separable Hilbert space and that for each bounded region $O$ of spacetime, there exists a unitary operator $U_O:\mathcal{F}\to\mathcal{H}_\omega$ such that for all $A\in U(O)$
\[
\pi_\omega(A) = U_O A U_O^*.
\]
This is also true if $O$ is replaced by a bounded region of space at time $t=0$. So locally the representation is unitarily equivalent to the local algebra of Fock space operators. For this reason, the representation $(\mathcal{H}_\omega,\pi_\omega)$ is called locally Fock. This property can be used to construct fields on the Hilbert space $\mathcal{H}_\omega$ as follows. Let $O$ be a bounded open region of spacetime and let $f\in\mathcal{S}(\mathbb{R}^2)$ be a real-valued function with support in $O$. Then $\phi'(f)$ is self-adjoint on the Fock space $\mathcal{F}$, and hence $s\mapsto e^{is\phi'(f)}$ defines a strongly continuous one-parameter unitary group on $\mathcal{F}$. Using the unitary map $U_O$ described above, we then obtain a strongly continuous one-parameter unitary group
\[
s\mapsto \pi_\omega(e^{is\phi'(f)}) = U_O\, e^{is\phi'(f)}\, U_O^*
\]
on the Hilbert space $\mathcal{H}_\omega$. According to Stone's theorem there exists a self-adjoint operator $\phi_\omega(f)$ on $\mathcal{H}_\omega$ that generates this unitary group. In Stone's theorem the self-adjoint operator is constructed explicitly in terms of the derivative of the unitary group with respect to the parameter, and from this construction it easily follows that the generators of the two unitary groups on $\mathcal{F}$ and $\mathcal{H}_\omega$ are related by
\[
\phi_\omega(f) = U_O\, \phi'(f)\, U_O^*.
\]
In the previous subsection we showed how a partition of unity can be used to define the smeared fields $\phi'(f)$ for $f\in\mathcal{S}(\mathbb{R}^2)$.
This same technique can also be used to define φω (f ) for arbitrary Schwartz functions, see also the last pages of [15]. 5.1.3 The P(φ)2 -model and verification of some of the Wightman axioms At the beginning of the 1970s, all results in the previous two sections on the (λφ4 )2 -model were rederived for the more general P(φ)2 -model, characterized (formally) by the Hamiltonian Z H = H0 + λ : P(φ0 (x)) : dx, R where P is a polynomial that is bounded from below. So for the P(φ)2 -model the Haag-Kastler axioms were established, as well as the existence of a vacuum state ω that gives rise to a locally Fock representation (Hω , πω ) of the algebra of observables and on Hω there is a unitary representation of the translation group with corresponding energy-momentum operators. Also, the locally Fock representation allows the construction of fields φω as in the (λφ4 )2 -model. The problem of verifying the Wightman axioms for the (λφ4 )2 -model could thus be investigated in the more general context of the P(φ)2 -model. Some of the first (new) results in this more general context were derived in the articles [14] and [15]. In these articles, Glimm and Jaffe show that the energy-momentum spectrum lies in the forward light cone for the P(φ)2 -model, as required by the Wightman axioms (spectral condition), and that HΩω = 0 = P Ωω . They also show that, under the assumption that the model has a mass gap, the vacuum vector Ωω is unique and the vacuum expectation values exist; we will come back to the vacuum expectation values later. What is not established in these articles is the Lorentz invariance of the vacuum, i.e. ω(σ(0,L) (A)) = ω(A) for all A ∈ U. The state of affairs for the P(φ)2 -model at this point of history (i.e. 1970/1971) is also summarized in part I of the lecture notes [16], which can also be found in volume 1 of the two-volume book ’collected papers’ of Glimm and Jaffe. 
However, it soon became clear that there was a gap in the proof of the spectral condition, as was pointed out by Fröhlich and Faris. So the spectral condition was no longer established for the $P(\phi)_2$-model. In the meantime, Streater proved in the article [31] that if one could prove the spectral condition, then the Lorentz covariance of the Wightman functions would follow automatically (given the results that were already established at that point). The existence of these Wightman functions (as tempered distributions) was established in the article [17] of Glimm and Jaffe, which was published later than the article [31] of Streater, but Streater explains that Glimm and Jaffe communicated some of their results before their article was published. In [17] Glimm and Jaffe begin with the quantities of the form
\[
\langle \phi'(f_1)\cdots\phi'(f_m)\Omega_g, \Omega_g\rangle \qquad (5.11)
\]
and they show that these quantities can be bounded in absolute value by a product of Schwartz space norms $\|f_1\|_1\cdots\|f_m\|_m$,
\[
|\langle \phi'(f_1)\cdots\phi'(f_m)\Omega_g, \Omega_g\rangle| \leq \|f_1\|_1\cdots\|f_m\|_m,
\]
independently of the cutoff $g$, and such that each of the norms is translation invariant (i.e. a translation of a function $f$ does not change the norm of $f$). In a similar manner as in equation (5.9) (with $n=1$), they then average the quantities in (5.11) over space translations:
\[
\int_{\mathbb{R}} h(x)\,\langle \phi'((f_1)_{((0,x),1)})\cdots\phi'((f_m)_{((0,x),1)})\Omega_g, \Omega_g\rangle\,dx,
\]
where $h$ is a similar function as in (5.9) and $f_{(a,L)}(x) = f(L^{-1}(x-a))$ as usual. Due to translation invariance of the norms described above, this averaged quantity is also bounded by the same product of norms. By considering sequences $(g_n)$ as in (5.9) and by taking a convergent subsequence, we then obtain quantities that we denote by
\[
\langle \phi_\omega(f_1)\cdots\phi_\omega(f_m)\Omega_\omega, \Omega_\omega\rangle.
\]
Remarkably, this procedure allows us to define the vacuum expectation values for the field $\phi_\omega$, even though we are not sure whether the expressions $\phi_\omega(f_1)\cdots\phi_\omega(f_m)\Omega_\omega$ are well-defined.
The bounds above still hold for these vacuum expectation values, which shows that they are separately continuous and therefore, by the nuclear theorem, define tempered distributions on the Schwartz space $\mathcal{S}(\mathbb{R}^{2m})$. Although it is not yet clear whether these vacuum expectation values satisfy all the properties of Wightman functions, we can still use the reconstruction theorem to construct a Hilbert space with quantum fields, but this theory might not satisfy all the Wightman axioms. For instance, it is not clear whether the spectrum condition is satisfied or whether the vacuum is unique. Also Lorentz covariance is not established, but as explained above this follows once the spectrum condition holds. A summary of all results for the $P(\phi)_2$-model up to this moment in history (i.e. 1972) is given in the notes [18], which can also be found in volume 1 of the 'collected papers'.
5.1.4 Similar methods for other models
Without any further details we mention that, up to the beginning of the 1970s, techniques similar to those used for the $P(\phi)_2$-model were also used to establish some results for other models. Among these models were the two-dimensional Yukawa model, or $Y_2$-model, and the two-dimensional model with exponential bosonic self-interaction. However, the results for these models did not go as far as those for the $P(\phi)_2$-model. For a summary of the results for the $Y_2$-model, one can consult part II of the notes [16].
5.2 The Euclidean approach
Despite the hard work of the constructive field theorists described above, the results at the beginning of the 1970s were still modest. For that reason there was a clear need for a new approach to the constructive field theory program, other than the brute-force methods above. This new approach was Euclidean quantum field theory. To understand Euclidean quantum field theory, recall that the Wightman functions are boundary values of holomorphic functions $W^{\mathrm{hol}}_{i_1\ldots i_n}(z_1,\ldots,z_n)$ defined on the extended tube $T'_n$.
Here we consider the general case of a quantum field theory in $d$-dimensional Minkowski spacetime $M_d$. It can be shown that in a quantum field theory (in the sense of Wightman) with a normal connection between spin and statistics, these holomorphic functions can be analytically continued to the symmetrized tube
\[
(T'_n)^S := \bigcup_{\sigma\in S_n} \sigma T'_n,
\]
where $\sigma\in S_n$ permutes the $n$ variables of $(z_1,\ldots,z_n)\in T'_n$ in the obvious way. This is (part of) the content of theorem 9.6 in [2]. To each $x\in M_d$ we now assign an element $x'\in\mathbb{C}^d$ given by
\[
x' = (ix^0, \mathbf{x}). \qquad (5.12)
\]
A point $(z_1,\ldots,z_n)\in\mathbb{C}^{dn}$ where each of the $z_j$ is of the form (5.12) is called a Euclidean point. If such a Euclidean point satisfies the property that $z_j\neq z_k$ for $j\neq k$, then we speak of a non-exceptional (or non-coincident) Euclidean point. The important step toward Euclidean quantum field theory is the statement that $(T'_n)^S$ contains all non-exceptional Euclidean points, which is proposition 9.10 in [2]. As a consequence, the $W^{\mathrm{hol}}_{i_1\ldots i_n}(z_1,\ldots,z_n)$ are holomorphic on the set of non-exceptional Euclidean points. We can use this property as follows. Define the set of points
\[
(\mathbb{R}^d)^n_{\neq} := \{(x_1,\ldots,x_n)\in(\mathbb{R}^d)^n : x_j\neq x_k \text{ for all } j\neq k\}
\]
and let $x\mapsto x'$ be as in (5.12). Then we can define the Schwinger functions
\[
S_{i_1\ldots i_n}(x_1,\ldots,x_n) := W^{\mathrm{hol}}_{i_1\ldots i_n}(x'_1,\ldots,x'_n),
\]
which are holomorphic functions on $(\mathbb{R}^d)^n_{\neq}$. The most important property of the Schwinger functions is that they are $E_+(d)$-covariant, i.e. covariant in the Euclidean sense.
5.2.1 Euclidean fields and probability theory
At the beginning of the 1970s Edward Nelson developed a framework for Euclidean quantum fields in terms of certain stochastic processes. For a good comprehension of these ideas we first recall some terminology from probability theory. The entire content of this subsection can be found in [30] and [28], but the order in which we present the material is quite different from these references.
Probability spaces
A probability space is a measure space $(X,\mathcal{A},\mu)$ with $\mu(X)=1$. The $\sigma$-algebra $\mathcal{A}$ has the structure of a ring when we define addition by $A\Delta B = (A\backslash B)\cup(B\backslash A)$ and multiplication by $A\cap B$. If $N_\mu$ denotes the collection of all sets in $\mathcal{A}$ with $\mu$-measure zero, then $N_\mu$ is an ideal in $\mathcal{A}$ and we can define the quotient ring $\mathcal{A}/N_\mu$; we denote the equivalence class of $A\in\mathcal{A}$ by $[A]$. The measure $\mu$ then defines a function $[\mu]$ on this quotient in the obvious way. Two probability spaces $(X,\mathcal{A},\mu)$ and $(X',\mathcal{A}',\mu')$ are called isomorphic if there exists a ring isomorphism $\psi:\mathcal{A}/N_\mu\to\mathcal{A}'/N_{\mu'}$ such that for all $[A]\in\mathcal{A}/N_\mu$ we have $[\mu'](\psi([A]))=[\mu]([A])$.
Let $(X,\mathcal{A},\mu)$ and $(X',\mathcal{A}',\mu')$ be two probability spaces and let $T:X\to X'$ be $(\mathcal{A},\mathcal{A}')$-measurable, i.e. $T^{-1}(A')\in\mathcal{A}$ for all $A'\in\mathcal{A}'$. Then $T$ is called a measure-preserving transformation if $\mu(T^{-1}(A'))=\mu'(A')$ for all $A'\in\mathcal{A}'$. If $T$ is bijective and if its inverse $T^{-1}:X'\to X$ is also measure-preserving, then $T$ is called an invertible measure-preserving transformation. In particular, we can apply this terminology to the case where the two probability spaces coincide. In this case the invertible measure-preserving transformations form a group under composition, which will be denoted by $\mathcal{T}(X,\mathcal{A},\mu)$, or simply $\mathcal{T}$ when there is no confusion about the probability space.
Random variables
A function $f:X\to\mathbb{R}$ on a probability space $(X,\mathcal{A},\mu)$ is called a random variable if it is $(\mathcal{A},\mathcal{B}_{\mathbb{R}})$-measurable, where $\mathcal{B}_{\mathbb{R}}$ denotes the Borel $\sigma$-algebra on $\mathbb{R}$. We denote the set of all random variables on $(X,\mathcal{A},\mu)$ by $L_{\mathbb{R}}(X,\mathcal{A})$. A random variable $f\in L_{\mathbb{R}}(X,\mathcal{A})$ defines a probability measure $\mu_f$, called the probability distribution of $f$, on the measurable space $(\mathbb{R},\mathcal{B}_{\mathbb{R}})$ by $\mu_f(B):=\mu(f^{-1}(B))$. The Fourier transform $c_f$ of $\mu_f$,
\[
c_f(t) := \int_{\mathbb{R}} e^{itx}\,d\mu_f(x) = \int_X e^{itf}\,d\mu,
\]
is called the characteristic function of $f$. The expectation value of an integrable random variable $f\in L_{\mathbb{R}}(X,\mathcal{A})$ is defined by
\[
E_\mu(f) := \int_X f\,d\mu = \int_{\mathbb{R}} x\,d\mu_f(x).
\]
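As a toy illustration of these definitions (an example of our own, not taken from [30]), consider the finite probability space of a fair die: $X=\{1,\ldots,6\}$ with the uniform measure and $f$ the identity map. The characteristic function and expectation value can be computed directly, and the expectation value can also be recovered by differentiating $c_f$ at $t=0$, as discussed below.

```python
import cmath

# Finite probability space: a fair die. X = {1,...,6}, mu uniform, f = identity.
outcomes = [1, 2, 3, 4, 5, 6]
p = 1.0 / 6.0

def c_f(t):
    """Characteristic function c_f(t) = integral of e^{itf} d(mu)."""
    return sum(p * cmath.exp(1j * t * x) for x in outcomes)

assert abs(c_f(0) - 1.0) < 1e-12  # c_f(0) = mu(X) = 1

# Expectation value E_mu(f) = integral of x d(mu_f)(x) = 3.5.
E = sum(p * x for x in outcomes)
assert abs(E - 3.5) < 1e-12

# The first moment recovered from the characteristic function:
# E_mu(f) = -i * c_f'(0), approximated here by a central difference.
h = 1e-6
deriv = (c_f(h) - c_f(-h)) / (2 * h)
assert abs((-1j * deriv).real - 3.5) < 1e-6
```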
Often, we will also write $\langle f\rangle_\mu$ to denote the expectation value of $f$. If $f\in L_{\mathbb{R}}(X,\mathcal{A})$ is a random variable with $f^n$ integrable, then the $n$-th moment of $f$ is defined as
\[
\int_X f^n\,d\mu = \int_{\mathbb{R}} x^n\,d\mu_f(x).
\]
If the characteristic function $c_f$ is $C^\infty$, then $f$ has moments of all orders and we can obtain these moments by differentiation of $c_f$:
\[
\int_X f^n\,d\mu = (-i)^n \left(\frac{d}{dt}\right)^{\!n} c_f\Big|_{t=0}.
\]
If we have two isomorphic probability spaces $(X,\mathcal{A},\mu)$ and $(X',\mathcal{A}',\mu')$ with isomorphism $\psi:\mathcal{A}/N_\mu\to\mathcal{A}'/N_{\mu'}$, then we say that two random variables $f\in L_{\mathbb{R}}(X,\mathcal{A})$ and $f'\in L_{\mathbb{R}}(X',\mathcal{A}')$ correspond under the isomorphism $\psi$ if for all $B\in\mathcal{B}_{\mathbb{R}}$ we have $\psi([f^{-1}(B)])=[(f')^{-1}(B)]$. When we come to define Markov fields, we need the following theorem, a proof of which can be found in section III.3 of [30] (theorem III.7).
Theorem 5.2 Let $(X,\mathcal{A},\mu)$ be a probability space and let $\mathcal{A}_0\subset\mathcal{A}$ be a $\sigma$-subalgebra. We write $\mu_0$ for the restriction of $\mu$ to $\mathcal{A}_0$. If $f\in L_{\mathbb{R}}(X,\mathcal{A})$ is an integrable random variable, then there exists a unique function $(f|\mathcal{A}_0)\in L_{\mathbb{R}}(X,\mathcal{A}_0)$ such that for all $g\in L^\infty(X,\mathcal{A}_0,\mu_0)$ we have
\[
\int_X g\cdot(f|\mathcal{A}_0)\,d\mu_0 = \int_X gf\,d\mu,
\]
i.e. $E_{\mu_0}(g\cdot(f|\mathcal{A}_0)) = E_\mu(gf)$.
Finally, we want to define the notion of a representation on a probability space. Let $(X,\mathcal{A},\mu)$ be a probability space and let $\mathcal{T}$ be the group of invertible measure-preserving transformations on $(X,\mathcal{A},\mu)$, as defined above. Note that any transformation $T\in\mathcal{T}$ defines a map from $L_{\mathbb{R}}(X,\mathcal{A})$ to itself, which we will also denote by $T$, given by $(Tf)(x):=f(T^{-1}(x))$. A representation of a group $G$ on the probability space $(X,\mathcal{A},\mu)$ is a homomorphism $T:G\to\mathcal{T}$. We often write $T_g$ rather than $T(g)$ in this context. In case $G$ is a topological group, we also assume that a representation $T$ is 'continuous' in the following sense. If $g_n\to g$ with respect to the topology in $G$ and if $f\in L_{\mathbb{R}}(X,\mathcal{A})$, then $T_{g_n}f\to T_g f$ in measure, which means that for all $\epsilon>0$ we have $\lim_{n\to\infty}\mu(\{x\in X : |T_{g_n}f(x)-T_g f(x)|\geq\epsilon\})=0$.
Sets of random variables
If $f_1$,
$\ldots, f_n\in L_{\mathbb{R}}(X,\mathcal{A})$ are random variables on a probability space $(X,\mathcal{A},\mu)$, then we define their joint probability distribution $\mu_{f_1,\ldots,f_n}$ on $(\mathbb{R}^n,\mathcal{B}_{\mathbb{R}^n})$ by
\[
\mu_{f_1,\ldots,f_n}(B) = \mu((f_1,\ldots,f_n)^{-1}(B))
\]
for $B\in\mathcal{B}_{\mathbb{R}^n}$, where on the right-hand side $(f_1,\ldots,f_n):X\to\mathbb{R}^n$ is given by $(f_1,\ldots,f_n)(x) = (f_1(x),\ldots,f_n(x))$. Their joint characteristic function $c_{f_1,\ldots,f_n}$ is defined by
\[
c_{f_1,\ldots,f_n}(t_1,\ldots,t_n) = \int_{\mathbb{R}^n} e^{i(t_1x_1+\ldots+t_nx_n)}\,d\mu_{f_1,\ldots,f_n}(x) = \int_X e^{i(t_1f_1+\ldots+t_nf_n)}\,d\mu.
\]
We will now define the notion of a $\sigma$-algebra generated by a set of random variables. Let $S\subset L_{\mathbb{R}}(X,\mathcal{A})$ be a set of random variables on some probability space $(X,\mathcal{A},\mu)$. Then the $\sigma$-subalgebra of $\mathcal{A}$ generated by the collection
\[
\{f^{-1}(B) : B\in\mathcal{B}_{\mathbb{R}},\ f\in S\}\subset\mathcal{A}
\]
is called the $\sigma$-algebra generated by the collection $S$ of random variables and we will denote it by $\mathcal{A}_S$ (note that it is the smallest $\sigma$-subalgebra with respect to which all $f\in S$ are measurable). The restriction $\mu_S$ of the measure $\mu$ to this $\sigma$-subalgebra defines a probability space $(X,\mathcal{A}_S,\mu_S)$. On this probability space we can again define the set of $\mu_S$-measure zero sets in $\mathcal{A}_S$, which we will denote by $N_{\mu_S}$. In case the corresponding quotient ring $\mathcal{A}_S/N_{\mu_S}$ happens to coincide with $\mathcal{A}/N_\mu$, we say that the set $S$ is full.
Now fix some set of random variables $f_1,\ldots,f_k\in L_{\mathbb{R}}(X,\mathcal{A})$ on a probability space $(X,\mathcal{A},\mu)$. Then we can consider formal power series
\[
\sum_{(n_1,\ldots,n_k)} a_{n_1\ldots n_k}\, f_1^{n_1}\cdots f_k^{n_k}
\]
in the random variables $f_j$, with addition and multiplication of two such series defined in the obvious way. By 'formal' we mean that we do not bother about the convergence and we do not substitute actual relations that are satisfied for the $f_j$ (for instance, if $f_1\equiv 1$ then we still consider $f_1$ and $f_1^2$ as two different formal power series). We define partial derivatives of these formal power series by
\[
\frac{\partial}{\partial f_j}\sum_{(n_1,\ldots,n_k)} a_{n_1\ldots n_k}\, f_1^{n_1}\cdots f_k^{n_k} := \sum_{(n_1,\ldots,n_k)} n_j\, a_{n_1\ldots n_k}\, f_1^{n_1}\cdots f_j^{n_j-1}\cdots f_k^{n_k},
\]
where we use the convention that $f_j^{0-1}=0$. With these formal power series we can define Wick products of random variables as follows. For each $(n_1,\ldots,n_k)\in(\mathbb{Z}_{\geq 0})^k$ the Wick product $:f_1^{n_1}\cdots f_k^{n_k}:$ is the unique formal power series in the $f_j$ that is defined recursively in $n = n_1+\ldots+n_k$ by the following relations
\[
:f_1^0\cdots f_k^0:\ = 1
\]
\[
\frac{\partial}{\partial f_j}\, :f_1^{n_1}\cdots f_k^{n_k}:\ = n_j\, :f_1^{n_1}\cdots f_j^{n_j-1}\cdots f_k^{n_k}:
\]
\[
\langle\, :f_1^{n_1}\cdots f_k^{n_k}:\,\rangle_\mu = 0,
\]
where $\langle\,.\,\rangle_\mu$ denotes the expectation value as usual. It follows from the first two relations that $:f_1^{n_1}\cdots f_k^{n_k}:$ is a power series of degree $n_1$ in $f_1$, of degree $n_2$ in $f_2$, and so on. If we have computed all Wick products with $n = n_1+\ldots+n_k\leq m$ for some $m\in\mathbb{Z}_{\geq 0}$, then the second relation tells us that the Wick products with $n = m+1$ can be obtained by computing anti-derivatives of the Wick products with $n = m$. The third relation then fixes the constant term in the power series expansion of the Wick product with $n = m+1$ ('the constant of integration').
Random processes, random fields, Markov fields and Euclidean fields
If $T$ is a set and $(X,\mathcal{A},\mu)$ is a probability space, then a map $\rho:T\to L_{\mathbb{R}}(X,\mathcal{A})$ is called a random process indexed by $T$. If $V$ is a vector space and $(X,\mathcal{A},\mu)$ is a probability space, then a linear map $\lambda:V\to L_{\mathbb{R}}(X,\mathcal{A})$ is called a linear random process indexed by $V$. In case $V$ is also a topological vector space, we also assume that $v_n\to v$ implies that the sequence $(\lambda(v_n))$ in $L_{\mathbb{R}}(X,\mathcal{A})$ converges in measure to $\lambda(v)$. In case $V = \mathcal{D}(\mathbb{R}^d)$ (see subsection 4.1.1 for the definition), we call $\lambda$ a random field. On $\mathcal{D}(\mathbb{R}^d)$ we define for $m>0$ and $q\in\mathbb{R}_{\geq 0}$ an inner product $\langle\,.\,,.\,\rangle_{-q,m}$ by
\[
\langle f,g\rangle_{-q,m} := \langle(-\Delta+m^2)^{-q}f,\, g\rangle_{L^2} = \int_{\mathbb{R}^d} [(-\Delta+m^2)^{-q}f](x)\,g(x)\,d^dx = \int_{\mathbb{R}^d} \frac{\hat{f}(k)\overline{\hat{g}(k)}}{[G(k,k)+m^2]^q}\,d^dk,
\]
where $\Delta = \sum_{j=1}^d \frac{\partial^2}{\partial x_j^2}$ is the Laplacian and $G$ is the Euclidean inner product on $\mathbb{R}^d$. Let $H^{-q}_m(\mathbb{R}^d)$ be the Hilbert space obtained as the completion of this space.
It can be shown that the embedding of D(Rd ) 119 in Hm−1 (Rd ) is continuous, so every linear random process λ : Hm−1 (Rd ) → LR (X, A) defines a random field when we restrict λ to D(Rd ). For this reason, we will call a linear random process λ : Hm−1 (Rd ) → LR (X, A) a random field, from now on. These random fields do not exhaust the set of all random fields, but they will suffice for our purposes, so we will restrict our attention to these random fields. Let (X, A, µ) be a probability space and let λ : Hm−1 (Rd ) → LR (X, A) be a random field. If K ⊂ Rd , let Aλ,K ⊂ A be the σ-subalgebra generated by the set of random variables {λ(h) ∈ LR (X, A) : h ∈ Hm−1 (Rd ), supp(h) ⊂ K}. Then the random field λ is called a Markov field over Hm−1 (Rd ) if for all open sets U ⊂ Rd and for every positive random variable f ∈ LR (X, Aλ,U ) ⊂ LR (X, A) we have the Markov property: Eµ (f |Aλ,U c ) = Eµ (f |Aλ,∂U ), where U c = Rd \U and ∂U is the boundary of U . Now let (X, A, µ) be a probability space. A Euclidean field over Hm−1 (Rd ) is a Markov field λ : Hm−1 (Rd ) → LR (X, A) together with a representation T of the Euclidean group E(d) on (X, A, µ) such that for all h ∈ Hm−1 (Rd ) and all g ∈ E(d) we have Tg (λ(h)) = λ(h ◦ g −1 ). This property is called Euclidean covariance. It is convenient to assume that for any Euclidean field the set {λ(h)}h is a full set of random variables. This situation can always be obtained by making the σ-algebra A smaller. For s ∈ R we will use the notation Ys to denote the subset {(x1 , . . . , xd ) ∈ Rd : x1 = s}. Theorem 5.3 Let λ : Hm−1 (Rd ) → LR (X, A) be a Euclidean field with corresponding representation T of E(d) on the probability space (X, A, µ) and let E0 : L2 (X, A, µ) → L2 (X, Aλ,Y0 , µ|Aλ,Y0 ) be defined by E0 (f ) = (f |Aλ,Y0 ). If Tt ∈ E(d) denotes the translation (x1 , . . . , xd ) 7→ (x1 + t, . . . , xd ), which in turn defines a transformation on L(X, A), then we can define the operator E0 Tt E0 on L2 (X, A, µ). 
If the restriction of this operator to $L^2(X,\mathcal{A}_{\lambda,Y_0},\mu|_{\mathcal{A}_{\lambda,Y_0}})$ is written as $P^t$, then there exists a positive self-adjoint operator $H$ on $L^2(X,\mathcal{A}_{\lambda,Y_0},\mu|_{\mathcal{A}_{\lambda,Y_0}})$ such that $P^t = e^{-|t|H}$. The operator $H$ plays an important role in Nelson's axiom scheme for Euclidean field theory, as we will see later.
Gaussian random variables and Gaussian random processes
If $(X,\mathcal{A},\mu)$ is a probability space, then a random variable $f\in L_{\mathbb{R}}(X,\mathcal{A})$ is called a Gaussian random variable (G.r.v.) if its characteristic function has the form
\[
c_f(t) = e^{-\frac{1}{2}at^2}
\]
with $a\geq 0$. If $a=0$ then $\mu_f$ is a Dirac distribution at the origin, while $\mu_f$ is a Gaussian distribution whenever $a>0$. A finite set $f_1,\ldots,f_n\in L_{\mathbb{R}}(X,\mathcal{A})$ of random variables is called jointly Gaussian if their joint characteristic function has the form
\[
c_{f_1,\ldots,f_n}(t_1,\ldots,t_n) = e^{-\frac{1}{2}\sum_{i,j} a_{ij}t_it_j}
\]
with $(a_{ij})$ a symmetric real positive-definite $n\times n$-matrix. We now have the following important result, which can be found in section I.1 of [30] (proposition I.2).
Proposition 5.4 (Wick's theorem) Let $(X,\mathcal{A},\mu)$ be a probability space and let $f_1,\ldots,f_{2n}\in L_{\mathbb{R}}(X,\mathcal{A})$ be (not necessarily distinct) jointly Gaussian random variables. Then
\[
\langle f_1\cdots f_{2n}\rangle_\mu = \sum_{\text{pairings}} \langle f_{i_1}f_{j_1}\rangle_\mu \cdots \langle f_{i_n}f_{j_n}\rangle_\mu,
\]
where $\langle\,.\,\rangle_\mu := E_\mu(.)$ and the sum is over all distinct ways of partitioning the set of indices $\{1,\ldots,2n\}$ into $n$ two-element subsets $\{i_1,j_1\},\ldots,\{i_n,j_n\}$.
If $(X,\mathcal{A},\mu)$ is a probability space and $V$ is a real vector space, then a linear random process $\gamma:V\to L_{\mathbb{R}}(X,\mathcal{A})$ is called a Gaussian random process indexed by $V$ if each $\gamma(v)$ is a G.r.v. and if the set $\{\gamma(v)\in L_{\mathbb{R}}(X,\mathcal{A}) : v\in V\}$ is a full set of random variables. Note that $\gamma$ defines a semi-inner product $\langle\,.\,,.\,\rangle_\gamma: V\times V\to\mathbb{R}$ by $\langle v,w\rangle_\gamma := \langle\gamma(v)\gamma(w)\rangle_\mu$. If we define the linear subspace $N_\gamma := \{v\in V : \langle v,v\rangle_\gamma = 0\}$, we can form the quotient vector space $V/N_\gamma$ to obtain an inner product space, and then complete it to obtain a real Hilbert space $H_V$.
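Proposition 5.4 can be checked directly on small cases by enumerating the pairings. A minimal sketch (our own illustration): given the covariance matrix of jointly Gaussian, mean-zero random variables, the moment is the sum over pairings of products of covariances. For a single standard Gaussian every pairing contributes $1$, so the $2n$-th moment is the number of pairings, $(2n-1)!!$.

```python
# Wick's theorem by brute-force enumeration of pairings.
def pairings(positions):
    """All partitions of a list of positions into unordered pairs."""
    if not positions:
        yield []
        return
    first = positions[0]
    for j in range(1, len(positions)):
        pair = (first, positions[j])
        rest = positions[1:j] + positions[j + 1:]
        for p in pairings(rest):
            yield [pair] + p

def gaussian_moment(cov, idx):
    """E[f_{idx[0]} ... f_{idx[-1]}] for jointly Gaussian mean-zero f's."""
    total = 0.0
    for p in pairings(list(range(len(idx)))):
        term = 1.0
        for (a, b) in p:
            term *= cov[idx[a]][idx[b]]
        total += term
    return total

# Single standard Gaussian f: E[f^4] = 3, E[f^6] = 15 (the (2n-1)!! pattern).
cov1 = [[1.0]]
assert gaussian_moment(cov1, [0] * 4) == 3.0
assert gaussian_moment(cov1, [0] * 6) == 15.0

# Two variables with Var(f1)=1, Var(f2)=2, Cov(f1,f2)=0.5:
# E[f1^2 f2^2] = Var1*Var2 + 2*Cov^2 = 2.5 (three pairings).
cov2 = [[1.0, 0.5], [0.5, 2.0]]
assert gaussian_moment(cov2, [0, 0, 1, 1]) == 2.5
```

Odd moments come out as $0$ automatically, since a list of odd length admits no pairing.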
Note that v ∈ Nγ if and only if the probability distribution µγ(v) of the Gaussian random variable γ(v) : X → R is a Dirac distribution, since hv, viγ = hγ(v)2 iµ is the variance of γ(v) (which can only be zero for Dirac distributions). In terms of the random variable itself, the condition hv, viγ = 0 is equivalent to γ(v) ≡ 0. We now give the definition of a Gaussian random process over a real Hilbert space, which is more restrictive than the definition of a Gaussian random process over a real vector space. Definition 5.5 Let (X, A, µ) be a probability space and let H be a real Hilbert space. A linear random process γ : H → LR (X, A) is called a Gaussian random process indexed by H if (1) each γ(v) is a G.r.v., (2) the set {γ(v) ∈ LR (X, A) : v ∈ V } is a full set of random variables, (3) hv, wiγ = CG hv, wiH with CG ∈ R>0 fixed (by some convention) for all v, w ∈ H. In [30] the convention is that CG = 12 , but we will keep things general here. It follows from the positive-definiteness of h., .iH that hγ(v)2 iµ = 0 if and only if v = 0, so for the Gaussian random variable γ(v) we have γ(v) ≡ 0 if and only if v = 0. The following important theorem, which is theorem I.6 in [30], states that Gaussian random processes over a real Hilbert space are unique in a sense. Theorem 5.6 Let (X, A, µ) and (X 0 , A0 , µ0 ) be two probability spaces and let H be a real Hilbert space. If γ : H → LR (X, A) and γ 0 : H → LR (X 0 , A0 ) are Gaussian random processes indexed by H, then there exists an isomorphism ψ : A/Nµ → A0 /Nµ0 of the probability spaces such that for every h ∈ H the random variables γ(h) and γ 0 (h) correspond under the isomorphism ψ, i.e. ψ([γ(h)−1 (B)]) = [γ 0 (h)−1 (B)] for all B ∈ BR . This result allows us to speak of ’the’ Gaussian random process γH indexed by the real Hilbert space H. 
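For finite-dimensional $H$ one concrete realization is easy to write down, which also illustrates property (3) of Definition 5.5. A Monte Carlo sketch (our own, with an arbitrary choice $H=\mathbb{R}^3$ and the convention $C_G=\tfrac{1}{2}$): set $\gamma(v)=\sqrt{C_G}\sum_i v_i g_i$ with $g_i$ i.i.d. standard normal, so that $\gamma$ is linear, each $\gamma(v)$ is Gaussian, and $\langle\gamma(v)\gamma(w)\rangle = C_G\langle v,w\rangle_H$.

```python
import random

random.seed(0)
C_G = 0.5  # normalization convention, as in [30]

# Gaussian random process indexed by H = R^3:
# gamma(v) = sqrt(C_G) * sum_i v_i g_i, with g_i i.i.d. N(0,1).
dim, n_samples = 3, 200_000
samples = [[random.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_samples)]

def gamma(v, g):
    return C_G ** 0.5 * sum(vi * gi for vi, gi in zip(v, g))

def covariance(v, w):
    """Monte Carlo estimate of <gamma(v) gamma(w)>."""
    return sum(gamma(v, g) * gamma(w, g) for g in samples) / n_samples

v, w = [1.0, 2.0, 0.0], [0.0, 1.0, -1.0]
inner = sum(vi * wi for vi, wi in zip(v, w))  # <v, w>_H = 2
# Property (3): <gamma(v) gamma(w)> = C_G <v, w>_H = 0.5 * 2 = 1.
assert abs(covariance(v, w) - C_G * inner) < 0.05
```

By Theorem 5.6 any other Gaussian random process indexed by $\mathbb{R}^3$ with the same convention is isomorphic to this one in the sense described above.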
The existence of this Gaussian random process is shown in section I.2 of [30], where there are given several explicit constructions (which are of course equivalent to each other in the sense of the theorem). For a real Hilbert space H, we will write the underlying probability space for the Gaussian random process as (QH , AH , µH ), which is often called Q-space in the literature. For fixed n, the Wick products : γH (v1 ) . . . γH (vn ) : are square-integrable and we denote their linear span by n . For n = 0 we define W 0 to be the 1-dimensional space spanned by 1 WH QH . The algebraic (i.e. H L(alg) n 2 uncompleted) direct sum WH := n∈Z≥0 WH is a dense subspace in L (QH , AH , µH ), so L2 (QH , AH , µH ) = M n∈Z≥0 121 n WH . Most of the time we will interpret the random variables γH (h) as multiplication operators on the Hilbert space L2 (QH , AH , µH ): f 7→ γH (h)f. In this way, the Gaussian random variables γH (h) become the same kind of mathematical object as quantum fields, namely operators on a Hilbert space. n by If A ∈ B(H) is a bounded operator, then we can define a linear operator Γn (A) on WH Γn (A) : γH (v1 ) . . . γH (vn ) : = : γH (Av1 ) . . . γH (Avn ) : . It can be shown that this definition is algebraically consistent and that the norm of this operator L(alg) n is ≤ kAkn . So if kAk ≤ 1, then we can extend this to a linear operator Γ(A) on n on WH n∈Z≥0 WH and for this operator we have kΓ(A)k ≤ 1. So Γ(A) is continuous, and we can extend it to all of L2 (QH , AH , µH ). We thus conclude that any A ∈ B(H) with kAk ≤ 1 defines an operator Γ(A) ∈ B(L2 (QH , AH , µH )) with kΓ(A)k ≤ 1. For each operator A on H with domain D(A) ⊂ H, we can also define an operator dΓn (A) on n by : γH (D(A)) . . . γH (D(A)) :⊂ WH n dΓ (A) : γH (v1 ) . . . γH (vn ) := n X : γH (v1 ) . . . γH (Avj ) . . . γH (vn ) :, j=1 L(alg) for v1 , . . . , vn ∈ D(A). The extension of this operator to n∈Z≥0 : γH (D(A)) . . . 
γH (D(A)) : is denoted by dΓ(A) and is sometimes called the second quantization of A. Fock space and Gaussian random processes over Hilbert spaces Let H be a real Hilbert space and let HC be its complexification. Let F(HC ) be the symmetric Fock space with the creation and annihilation operators A(∗) (h) defined as before for h ∈ H (not HC ). On F(HC ) we define for h ∈ H the Segal field p b φ(h) := CG (A∗ (h) + A(h)), defined as an operator on the dense subspace D of finite-particle states, which was defined several b times before. The operators {φ(h)} h∈H all commute with each other in the sense that their spectral measures Eφ(h) commute. We denote the abelian Von Neumann algebra in B(F(HC )) generated b by these spectral measures by M. According to the Gelfand-Naimark theorem we can represent (the unital abelian C ∗ -algebra) M as C(Σ(M)), where (the compact Hausdorff space) Σ(M) is the Gelfand spectrum of M. We will write α : M → C(Σ(M)) to denote the corresponding C ∗ isomorphism. The state A 7→ hAΩ, Ωi, with Ω the vacuum vector in F(HC ), defines a probability measure µΩ on Σ(M) (with the Borel σ-algebra BΣ(M) on the topological space Σ(M)) such that for all A ∈ M we have Z hAΩ, Ωi = α(A)dµΩ . Σ(M) b Although the φ(h) are not in M, there is a natural way in which they can be represented as Borel-measurable functions (and hence random variables) on Σ(M) by extending the continuous functional calculus to the Borel functional calculus. With abuse of notation we will write these b b random variables as α(φ(h)). Because on the Fock space we have the equality heiφ(h) Ω, Ωi = 1 2 b e− 4 khkH , the random variables α(φ(h)) are in fact Gaussian: Z 1 2 2 b b cα(φ(h)) (t) = eitα(φ(h)) dµΩ = heitφ(h) Ω, Ωi = e− 4 t khkH . 
b Σ(M) b The set {α(φ(h))} h∈H is a full set of random variables on the measurable space (Σ(M), BΣ(M) ) 122 and, finally, we also have Z hh1 , h2 iα◦φb b 1 ))α(φ(h b 2 ))iµ = = hα(φ(h Ω b 1 ))α(φ(h b 2 ))dµΩ α(φ(h Σ(M) b 1 )φ(h b 2 )Ω, ΩiF (H ) = hφ(h b 2 )Ω, φ(h b 1 )ΩiH = hφ(h C = CG hA∗ (h2 )Ω, A∗ (h1 )ΩiH = CG hh2 , h1 iH = CG hh1 , h2 iH , so by uniqueness of the Gaussian random process indexed by H, γH := α◦φb : H → LR (Σ(M), BΣ(M) ) must be the Gaussian random process indexed by H. b For h ∈ HC L we can define φ(h) by linearity, as we did when we defined the free√ field. ⊗n 2 Now define U : → L (Σ(M), BΣ(M) , µΩ ) by U (h1 ⊗ . . . ⊗ hn ) = (n!)−1/2 ( 2)n : n∈Z≥0 HC γH (h1 ) . . . γH (hn ) : and U Ω = 1ΣM . The restriction of U to Fock space is a unitary operator U : F(HC ) → L2 (Σ(M), BΣ(M) , µΩ ) that satisfies U Ω = 1Σ(M) −1 b U φ(h)U = γH (h) n U Fn (HC ) = WH . This shows that there is a very intimate relation between the Gaussian random process indexed by a real Hilbert space H and the Segal field on the Fock space F(HC ). As operators on a Hilbert space, the Gaussian random variables γH (h) are just Segal fields on the Fock space F(HC ). The free Euclidean field in two dimensions Let N be the Hilbert space obtained by completing the set of real-valued elements in D(R) with respect to the inner product h., .iN := C1G h., .i−1,m and let γN : N → LR (QN , AN ) be the Gaussian random process indexed by N . An element g ∈ E(2) in the Euclidean group acts naturally as a linear operator on an element h ∈ N by (u(g)h)(x) := h(g −1 x). (5.13) The operator u(g) preserves the inner product h., .iN on N and therefore ku(g)k = 1. We can thus define an operator U (g) := Γ(u(g)) on L2 (QN , AN , µN ), and this operator is unitary. Now U (g), in turn, defines an invertible measure-preserving map Tg : QN → QN , but we will not show this. 
We merely mention that the correspondence $g \mapsto T_g$ defines a representation of $E(2)$ on $(Q_N, \mathcal{A}_N, \mu_N)$; see also section III.1 of [30]. Finally, it also turns out that $\gamma_N$ satisfies the Markov property, so the Gaussian random process $\gamma_N$ is in fact automatically a Euclidean field. We will call it the free Euclidean field in two dimensions. This free Euclidean field has the additional property that the translation subgroup of $E(d)$ acts ergodically, which means that the only random variables that are left invariant by all translations are the constant random variables. Note that proposition 5.4 above implies that $\gamma_N$ satisfies
$$\langle \gamma_N(h_1) \ldots \gamma_N(h_{2n}) \rangle_{\mu_N} = \sum_{\text{pairings}} \langle \gamma_N(h_{i_1})\gamma_N(h_{j_1}) \rangle_{\mu_N} \ldots \langle \gamma_N(h_{i_n})\gamma_N(h_{j_n}) \rangle_{\mu_N} \qquad (5.14)$$
for all $h_1, \ldots, h_{2n} \in N$.

The connection with the free hermitean scalar quantum field

We will now argue that there is a very natural connection between the free hermitean scalar quantum field in 2-dimensional spacetime and the free Euclidean field $\gamma_N$ defined above. Consider the Schwinger functions $S(x_1, \ldots, x_n)$ for the free hermitean scalar field in 2-dimensional spacetime. If $n$ is odd, these functions are identically zero because the same is true for the Wightman functions. The even Schwinger functions $S(x_1, \ldots, x_{2n})$ can all be obtained from $S(x_1, x_2)$ by the recurrence relation
$$S(x_1, \ldots, x_{2n}) = \sum_{\text{pairings}} S(x_{i_1}, x_{j_1}) \ldots S(x_{i_n}, x_{j_n}),$$
where the sum is defined as in proposition 5.4 above. In particular, $S(x_1, x_2)$ is symmetric in its two arguments, and so all $S$ are symmetric under permutations of their arguments. This symmetry of the Schwinger functions is in fact present in any boson quantum field theory. The explicit form of $S(x_1, x_2)$ is
$$S(x_1, x_2) = \frac{1}{(2\pi)^2} \int_{\mathbb{R}^2} \frac{e^{iG(k,\,x_1 - x_2)}}{G(k,k) + m^2}\, d^2k,$$
where $G$ denotes the Euclidean inner product on $\mathbb{R}^2$ and $m$ is the mass of the free field. This formula can be formally derived as follows.
The two-point Wightman function for the free scalar quantum field in 2-dimensional spacetime follows easily from the 4-dimensional case (4.8) by removing two 1 space variables and by removing two factors 2π . With some additional formal manipulations we then find51 Z 1 −i[ωp (x−y)0 −p(x−y)] 1 1 p e dp W (x, y) = 2 2π R 2 p + m2 Z Z 1 −i[ωp (x−y)0 −p(x−y)] 1 dp0 1 dp = e 2π R 2 π R (p0 )2 + p2 + m2 Z 0 e−i[ωp (x−y) −p(x−y)] 2 1 d p. = (2π)2 R2 (p0 )2 + p2 + m2 Replacing the spacetime vectors xj = ((xj )0 , xj ) by x0j = (i(xj )0 , xj ) gives the desired result for S(x1 , x2 ). Although S(x1 , x2 ) is a function on (R2 )26= , it is sufficiently well-behaved to define a distribution on S(R4 ) by Z S(f ) := S(x1 , x2 )f (x1 , x2 )d4 x, R4 R2 . where x1 , x2 ∈ From this and the recurrence relation above it also follows that S(x1 , . . . , xn ) defines a distribution on S(R2n ). In terms of these distributions we can formulate the recurrence relation above as X S(f1 · . . . · f2n ) = S(fi1 · fj1 ) . . . S(fin · fjn ), (5.15) pairings where f1 , . . . , f2n ∈ S(R2 ) and (f1 · . . . · fj )(x1 , . . . , xj ) := f1 (x1 ) . . . fj (xj ) for any 2 ≤ j ≤ 2n. Now notice the resemblence between equations (5.14) and (5.15). In fact it goes much further than a mere resemblence, as we will now show. First observe that for any real-valued f1 , f2 ∈ S(R2 ) ⊂ N we have hγN (f1 )γN (f2 )iµN = hf1 , f2 iγN = CG hf1 , f2 iN Z 1 = CG [(∆ + m2 )−1 f1 ](x)f2 (x)d2 x CG R2 Z fb1 (k)fb2 (k) 2 = d k 2 R2 G(k, k) + m Z Z 1 eiG(k,x1 −x2 ) = f1 (x1 )f2 (x2 )d4 xd2 k (2π)2 R2 R4 G(k, k) + m2 = S(f1 · f2 ), where we have used that fb2 is not necessarily real-valued when f2 is real-valued. When we combine 51 The reason for including this non-rigorous derivation is that we can check that all factors of 2π are correct. 124 this equality with equations (5.14) and (5.15) we find that Z γN (f1 ) . . . γN (f2n )dµN = hγN (f1 ) . . . γN (f2n )iµN QN = X hγN (fi1 )γN (fj1 )iµN . . . 
hγN (fin )γN (fjn )iµN pairings = X S(fi1 · fj1 ) . . . S(fin · fjn ) pairings = S(f1 · . . . · f2n ). If we write δ · f for the function (δ · f )(t, x) = δ(t)f (x), then the formula above implies that hγN (δ · f1 ) . . . γN (δ · fn )ΩN , ΩN i = W ((δ · f1 ) · . . . · (δ · fn )). (5.16) Note that for the free quantum field it is allowed to smear out over a product of a function and a delta-function (for general quantum fields this might not be the case). Despite these beautiful relations between the free Euclidean field in two dimensions and the free hermitean scalar quantum field in 2-dimensional spacetime, the two are not the same. Of course this was to be expected, since the smeared quantum fields φ(f ) do not commute with each other in the general case where f is complex-valued, whereas random variables always commute with each other. As we will demonstrate now, it is possible to represent the time-zero quantum field φ0 as a Gaussian random process, but this Gaussian random process is not the two dimensional free Euclidean field. On the space DR (R) of real-vaued functions in D(R) we define for m > 0 the inner product h., .iF := CF h., .i− 1 ,m , where CF is fixed by some convention (in [30] the convention 2 is CF = 1), and let the Hilbert space F be the completion of this inner product space. On this b = (−∆ + m2 ) 21 . Let γF : F → LR (QF , AF ) be the Hilbert space we define the operator D b of D b as Gaussian random process indexed by F and write H0 for the second quantization dΓ(D) defined above. For each f ∈ S(R2 ) we then define Z 0 γF (f ) := eitH0 γF (ft )e−itH0 dt, R where ft is the function on R defined by ft (x) = f (t, x). By defining ΩF ∈ L2 (QF , AF , µF ) to be the random variable that is 1 everywhere, it can be shown that the quantities Z 0 0 0 0 hγF (f1 ) . . . γF (fn )ΩF , ΩF i := γF (f1 ) . . . γF (fn )dµF (5.17) QF are equal to the smeared Wightman functions W (f1 · . . . · fn ), see also theorem II.17 in [30]. 
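As an aside, the pairing sums appearing in (5.14) and (5.15) run over all $(2n-1)!!$ ways of partitioning $\{1, \ldots, 2n\}$ into unordered pairs. A small combinatorial sketch (with hypothetical covariance values) that enumerates these pairings and recovers the familiar Gaussian moments:

```python
def pairings(items):
    # all partitions of `items` into unordered pairs
    if not items:
        return [[]]
    first, rest = items[0], items[1:]
    result = []
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            result.append([(first, partner)] + tail)
    return result

def gaussian_moment(cov, indices):
    # Isserlis/Wick: E[X_{i1} ... X_{i2n}] = sum over pairings of products of covariances
    total = 0.0
    for p in pairings(list(indices)):
        prod = 1.0
        for i, j in p:
            prod *= cov[i][j]
        total += prod
    return total

cov1 = [[1.0]]  # single standard Gaussian: E[X^{2n}] = (2n-1)!!
print(len(pairings([0, 1, 2, 3])))                # 3 pairings of 4 elements
print(gaussian_moment(cov1, [0, 0, 0, 0]))        # E[X^4] = 3
print(gaussian_moment(cov1, [0, 0, 0, 0, 0, 0]))  # E[X^6] = 15

# two correlated variables (hypothetical covariances):
# E[X0 X0 X1 X1] = C00*C11 + 2*C01^2 = 2 + 0.5 = 2.5
print(gaussian_moment([[1.0, 0.5], [0.5, 2.0]], [0, 0, 1, 1]))
```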
Thus, for the smeared Wightman functions we have Z W (f1 · . . . · fn ) = dt1 . . . dtn heit1 H0 γF ((f1 )t1 )e−it1 H0 . . . eitn H0 γF ((fn )tn )e−itn H0 ΩF , ΩF i n R Z = dt1 . . . dtn hγF ((f1 )t1 )ei(t2 −t1 )H0 . . . ei(tn −tn−1 )H0 γF ((fn )tn )ΩF , ΩF i, Rn where in the last step we used that eitH0 ΩF = ΩF . In this sense we may interpret L2 (QF , AF , µF ) 0 as the free quantum field φ. In particular as the Fock space for a scalar particle of mass m, and γF the Gaussian process γF itself can be interpreted as the time-zero quantum field. Now that we know that the free Euclidean field and the time-zero free quantum field are Gaussian processes γN and γF , respectively, it is time to compare them. For each r ∈ R we define a linear map jr : F → N by p (jr f )(s, x) := 2CG CF δr (s)f (x), 125 where δr is the delta function concentrated at r, i.e. δr (s) = δ(s − r). The Fourier transform of jr f is √ √ Z 2CG CF 2CG CF b −i(us+kx) c f (k)e−iru , jr f (u, k) = f (x)δr (s)e dsdx = √ 2π 2π R2 so for the norm of jr f we find that kjr f k2N = = = = = 1 CG Z c jc r f (u, k)jr f (u, k) dudk k2 + u2 + m2 R2 Z |fb(k)|2 1 2CG CF dudk 2 2 2 CG 2π 2 k + u + m R Z CF π √ |fb(k)|2 dk π R k2 + m2 Z |fb(k)|2 √ dk CF k2 + m2 R kf k2F , so jr is an isometry. If gt ∈ E(2) is the time translation over t, then we will write ut to denote u(gt ) as defined in (5.13). Then p p (ut (jr f ))(s, x) = (jr f )(s−t, x) = 2CG CF δr (s−t)f (x) = 2CG CF δ(s−t−r)f (x) = (jr+t f )(s, x), b b is the which shows that ut jr = jr+t . Another property of jr is that jr∗1 jr2 = e−|r2 −r1 |D , where D differential operator on F defined above. In the same way as we did before, we can define a map Jr := Γ(jr ) : L2 (QF , AF , µF ) → L2 (QN , AN , µN ). This map is also an isometry and it satisfies Ut Jr = Jr+t and Jr∗ Jt = e−|r−t|H0 . One can now prove the Feynman-Kac-Nelson formula, see theorem III.6 of [30]. Theorem 5.7 (The free field FKN formula) Let f1 , . . . , fn ∈ F , let F0 , . . . 
, Fk : Rn → P C be bounded functions and let t1 , . . . , tk ≥ 0 be fixed. Let s0 ∈ R be arbitrary and let sj = s0 + ji=1 ti for 1 ≤ j ≤ k. Then hF0 (γF (f1 ), . . . , γF (fn ))e−t1 H0 F1 (γF (f1 ), . . . , γF (fn )) . . . e−tk H0 Fk (γF (f1 ), . . . , γF (f1 ))ΩF , ΩF i Z k Y = Fl (γN (jsl f1 ), . . . , γN (jsl fn ))dµN . QN l=0 As Simon shows in theorem III.19 of [30], the FKN formula remains valid if the Fj are polynomially bounded. Now consider the following special case of the FKN formula. Let f1 , . . . , fn ∈ F and let Fm (x1 , . . . , xn ) = xm+1 for 0 ≤ m ≤ n − 1 (so k = n − 1), which are polynomially bounded. Then the FKN formula reads Z −t1 H0 −tn H0 hγF (f1 )e ...e γF (fn )ΩF , ΩF i = γN (js0 f1 ) . . . γN (jsn−1 fn )dµN QN = hγN (js0 f1 ) . . . γN (jsn−1 fn )ΩN , ΩN i, where we have chosen s0 = 0 and ΩN ∈ L2 (QN , AN , µN ) is the function that is identically 1. Nelson’s axioms for Euclidean field theory As stated in equation (5.16), the free Euclidean field can be used to obtain the Wightman functions for the free quantum field. This immediately leads one to ask under what conditions a (non-free) Euclidean field will define a Wightman quantum field theory. In his article [28], Nelson defines a set of conditions on a Euclidean field that will guarantee that the Euclidean field gives rise to a Wightman quantum field theory. These conditions are known as Nelson’s axioms. The idea behind the axioms is that we try to carry out the same construction for a (non-free) Euclidean field as for 126 the free Euclidean field, and then look what extra conditions are needed to satisfy all characteristic properties of the Wightman functions. Thus, given a Euclidean field λ : Hm−1 (Rd ) → LR (X, A) with associated representation T of E(d) on the probability space (X, A, µ), we begin by defining a map jr : S(Rd−1 ) → Hm−1 (Rd ) by (jr f )(s, x) = δr (s)f (x). 
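The special case of the FKN formula derived above can be made concrete in a zero-space-dimensional toy model, a single harmonic-oscillator mode: with $H_0 = \omega a^* a$ and the "field" $x = (a^* + a)/\sqrt{2}$, the quantity $\langle x\, e^{-tH_0} x\, \Omega, \Omega \rangle$ equals the Ornstein-Uhlenbeck covariance $e^{-\omega t}/2$, the one-mode analogue of $\langle \gamma_N(j_0 f) \gamma_N(j_t f) \rangle_{\mu_N}$. The mode count and normalization here are simplifications for illustration only.

```python
import math

# One bosonic mode: H0 = omega * a*a is diagonal in the number basis, and
# x = (a + a*)/sqrt(2) has matrix elements <m|x|n> = sqrt(n+1)/sqrt(2) for m = n+1.
N = 12
omega = 1.0

def x_elem(m, n):
    if m == n + 1:
        return math.sqrt(n + 1) / math.sqrt(2)
    if n == m + 1:
        return math.sqrt(m + 1) / math.sqrt(2)
    return 0.0

def two_point(t):
    # <x e^{-t H0} x Omega, Omega> = sum_n <0|x|n> e^{-t omega n} <n|x|0>
    return sum(x_elem(0, n) * math.exp(-t * omega * n) * x_elem(n, 0) for n in range(N))

for t in (0.0, 0.5, 1.0):
    print(two_point(t), math.exp(-omega * t) / 2)  # the two columns agree
```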
Then we define a time-zero field $\lambda_0 : \mathcal{S}(\mathbb{R}^{d-1}) \to L_{\mathbb{R}}(X, \mathcal{A})$ by $\lambda_0(f) := \lambda(j_0 f)$, and for $t \in \mathbb{R}$ we define $\lambda_t(f) = e^{itH} \lambda_0(f) e^{-itH}$, where $H$ is the positive self-adjoint operator that satisfies $P^t = e^{-|t|H}$. Finally, for $f \in \mathcal{S}(\mathbb{R}^d)$ we define
$$\theta(f) := \int_{\mathbb{R}} \lambda_t(f_t)\, dt,$$
where $f_t(\mathbf{x}) = f(t, \mathbf{x})$. The candidate Wightman functions are now $\langle \theta(f_1) \ldots \theta(f_n) \Omega, \Omega \rangle$, where $\Omega$ is the function that is identically 1 on $X$. The problem is thus reduced to formulating conditions that guarantee that these distributions are indeed the Wightman functions of some quantum field theory. One of these conditions is that the translation subgroup of $E(d)$ acts ergodically, a property that was also present in the free Euclidean field. Besides this ergodicity there is also a regularity condition, but we will not discuss it here. To summarize, the Nelson axioms require the existence of a Euclidean field, together with a regularity condition and the ergodicity property. This guarantees that the construction above yields a Wightman quantum field theory.

5.2.2 An alternative method: the Osterwalder-Schrader theory

When the Schwinger functions are computed for some given Wightman quantum field theory, the Wightman functions can be recovered by analytic continuation of the Schwinger functions, and hence, by the reconstruction theorem, the entire quantum field theory can be recovered. These ideas led in the early 1970s to the question whether it is possible to begin with a set of functions that can be shown to be the Schwinger functions of some quantum field theory, and then recover the Wightman functions from these Schwinger functions. Of course, one should be able to recognize whether a given set of functions is indeed the set of Schwinger functions of some quantum field theory, i.e. one should have a set of conditions that these functions must satisfy in order to be the Schwinger functions of some quantum field theory.
Soon after Nelson developed his axiom system for Euclidean field theory, Osterwalder and Schrader developed another axiom system, the Osterwalder-Schrader axioms, whose axioms describe properties of Schwinger functions that guarantee the existence of a corresponding Wightman quantum field theory. We will not discuss the Osterwalder-Schrader axioms further here.

5.2.3 The $P(\phi)_2$-model as a Wightman model

Now that we have seen two general axiomatic frameworks for a Euclidean approach, we briefly mention how the Euclidean framework can be used in the construction of the $P(\phi)_2$-model. The first task is of course to make sense of the formal expression
$$V = \lambda \int_{\mathbb{R}} :P(\phi(x)):\, dx$$
for the interaction of the $P(\phi)_2$-model within the Euclidean framework. In the Hamiltonian approach this was achieved by considering the interaction as a perturbation of the free field theory and by introducing a cutoff version of the interaction. The first of these two points was manifest in the fact that we began with the free-particle Fock space as our Hilbert space, on which we would later define the local observables for the interacting theory. The direct translation of this to Nelson's framework is to begin the Euclidean construction of the $P(\phi)_2$-model with the Gaussian random process $\gamma_N$, which we know gives rise to the Schwinger functions of the free quantum field:
$$S(x_1 \cdot \ldots \cdot x_n) = \int_{Q_N} \gamma_N(x_1) \ldots \gamma_N(x_n)\, d\mu_N.$$
Path-integral arguments from physics then suggest that the Schwinger functions for the interacting theory are formally given by
$$S_{\mathrm{int}}(x_1 \cdot \ldots \cdot x_n) = \frac{\int_{Q_N} \gamma_N(x_1) \ldots \gamma_N(x_n)\, e^{-\lambda \int_{\mathbb{R}^2} :P(\gamma_N(x)):\, dx}\, d\mu_N}{\int_{Q_N} e^{-\lambda \int_{\mathbb{R}^2} :P(\gamma_N(x)):\, dx}\, d\mu_N}.$$
Recall that on the free-particle Fock space we defined the normal ordering $:\phi^m(x):$ of an unsmeared field by writing all creation operators to the left of all annihilation operators. For random variables we also defined the Wick ordering $:f_1^{n_1} \cdots$
$f_k^{n_k}:$ of a product of powers of random variables, which can in particular be applied to define Wick products $:\gamma_N(f_1) \ldots \gamma_N(f_m):$ of the free Euclidean field $\gamma_N$. The unsmeared version of these products can be written formally as $:\gamma_N(x_1) \ldots \gamma_N(x_m):$. However, these objects are not what is meant when we write $:\gamma_N^m(x):$ as in $:P(\gamma_N(x)):$ above, which should be clear from the fact that we write only one variable $x$ rather than $m$ variables $x_1, \ldots, x_m$. To understand what is meant here, first consider the expression $\gamma(x)$ for a Gaussian random process $\gamma$. Formally, by this expression we mean something like
$$\gamma(x) = \gamma(\delta_x) = \int \gamma(y) \delta_x(y)\, dy,$$
but of course smearing out over a $\delta$-function is not allowed. Mathematically we should thus replace this with $\gamma_h(x) = \int \gamma(y) h(x-y)\, dy$, where $h$ is a smooth function that looks like a $\delta$-function concentrated at the origin. Then $\gamma_h(x)$ is a random variable for any $x$, and we can even define random fields by $g \mapsto \int \gamma_h(x)^m g(x)\, dx$. Since $\gamma_h(x)$ is a well-defined random variable, we can also take the Wick product $:\gamma_h(x)^m:$ and define a random field by $g \mapsto \int :\gamma_h(x)^m:\, g(x)\, dx =\, :\gamma_h^m(g):$. By taking a limit in which $h \to \delta$, we then obtain the expression that is meant by $:\gamma^m(g):$; its unsmeared version is denoted $:\gamma^m(x):$. This defines $:P(\gamma_N(x)):$. For details about the precise conditions on $\gamma$ under which this definition is possible, see section V.1 of [30].

The cutoff for the interaction term in the Hamiltonian approach can easily be translated to a cutoff in the Euclidean theory:
$$U(g) = \lambda \int_{\mathbb{R}^2} :P(\gamma_N(x)):\, g(x)\, dx,$$
where $g$ is a cutoff function that equals 1 on a large region; only this time the cutoff function is a function on $\mathbb{R}^2$ instead of on $\mathbb{R}$. The corresponding cutoff Schwinger functions are
$$S_g(x_1, \ldots, x_n) = \frac{\int_{Q_N} \gamma_N(x_1) \ldots \gamma_N(x_n)\, e^{-U(g)}\, d\mu_N}{\int_{Q_N} e^{-U(g)}\, d\mu_N}.$$
When we define the measure $d\nu_g$ by
$$d\nu_g = \frac{e^{-U(g)}\, d\mu_N}{\int_{Q_N} e^{-U(g)}\, d\mu_N},$$
we can write the cutoff Schwinger functions more simply as
$$S_g(x_1, \ldots, x_n) = \int_{Q_N} \gamma_N(x_1) \ldots \gamma_N(x_n)\, d\nu_g.$$
The idea is now to show that in the limit $g \to 1$ the cutoff Schwinger functions converge to a set of functions $S_{\mathrm{int}}(x_1, \ldots, x_n)$ that satisfy all the Osterwalder-Schrader axioms, and that these functions can be identified as the Schwinger functions of the $P(\phi)_2$-model. This is one of the main results of Glimm, Jaffe and Spencer in their article [19]. They prove the result under the assumption that $\lambda/m^2$ is sufficiently small. They also prove that the theory has a mass gap, that the infimum of the set obtained by removing the point $\{0\}$ from the spectrum of the mass operator $(H^2 - \mathbf{P}^2)^{1/2}$ is an isolated point $\{m_r\}$ of that spectrum, and that the restriction of the unitary representation of $\widetilde{\mathcal{P}}^{\uparrow}_{+}$ to the subspace corresponding to $\{m_r\}$ is irreducible (and thus describes a one-particle state). In particular, the Haag-Ruelle theory can be applied to the $P(\phi)_2$-model, so this model has a particle structure. The techniques used to derive these properties are directly inspired by techniques from statistical mechanics. This can be understood by realizing that the expression for the Schwinger functions looks very much like the correlation functions that one encounters in statistical mechanics. This analogy with statistical mechanics also gave rise to the study of phase transitions and other quantities from statistical mechanics for the $P(\phi)_2$-model. Unfortunately, there is not enough time to discuss all these interesting developments in this thesis.

A Hilbert space theory

In this appendix we discuss some facts of Hilbert space theory that are usually not covered in an elementary course in functional analysis, such as direct integrals and unbounded operators.
We will not discuss the more basic topics such as bounded operators on Hilbert space. A.1 Direct sums and integrals of Hilbert spaces k The to be the direct sum Lk direct sum of a finite number of Hilbert spaces {Hn }n=1 is defined k with inner product of two elements (hn )n=1 and (gn )kn=1 defined by n=1 Hn (of vector spaces) Lk P k h(hn )kn=1 , (gn )kn=1 i = n=1 Hn with this inner product n=1 hhn , gn iHn . It follows easily that becomes a Hilbert space. If we have a countably infinite collection of Hilbert spaces {Hn }∞ n=1 , then their direct sum is defined to be ∞ M Hn := {(hn )∞ n=1 : hn ∈ Hn for all n and n=1 ∞ X khn k2Hn < ∞}, n=1 with similar addition and scalar multiplication as P in the case of a finite number of Hilbert L∞spaces and inner product defined by h(hn )kn=1 , (gn )kn=1 i = ∞ hh , g i . It follows easily that n=1 n n Hn n=1 Hn is a Hilbert space with this inner product. We will now generalize this notion of a direct sum of Hilbert spaces to direct integrals of Hilbert spaces. Let (X, A, µ) be a measure space with positive measure µ, and suppose that for each x ∈ X we are given a Hilbert space H(x) of dimension n(x) ∈ N ∪ {∞} in such a way that the function n : X → N ∪ {∞} is A-measurable. We will then define the direct integral of the Hilbert spaces H(x) as follows. First we partition the set X into (disjoint) subsets {Xn }n∈N∪{∞} , where Xn = {x ∈ X : dim(H(x)) = n}. For any n we may then identify all Hilbert spaces {H(x)}x∈Xn with some particular Hilbert space H(n) with dim(H(n) ) = n. Now let Hn be the set of functions52 h : Xn → H(n) for whichR the function x 7→ hh(x), giH(n) from Xn to C is A-measurable for all g ∈ H(n) and such that Xn kh(x)k2H(n) dµ(x) < ∞. We then define a vector space structure on Hn by (h + g)(x) = h(x) + g(x) and R (λh)(x) = λh(x) for h, g ∈ Hn and λ ∈ C, and we define an inner product on Hn by hh, giHn = Xn hh(x), g(x)iH(n) dµ(x). 
It can be shown that $H_n$ is in fact a Hilbert space, which we will denote by $\int_{X_n}^{\oplus} H(x)\, d\mu(x)$. Finally, we then define the direct integral of the Hilbert spaces $\{H(x)\}_{x \in X}$ by
$$\int_X^{\oplus} H(x)\, d\mu(x) = \bigoplus_n \int_{X_n}^{\oplus} H(x)\, d\mu(x).$$
When we are given some Hilbert space $H$, we say that $H$ can be represented as a direct integral $\int_X^{\oplus} H(x)\, d\mu(x)$ of the Hilbert spaces $H(x)$, with $(X, \mu)$ some measure space, if there exists an isomorphism $\alpha : H \to \int_X^{\oplus} H(x)\, d\mu(x)$ of Hilbert spaces.

[Footnote 52: Actually equivalence classes of functions, where the equivalence relation is given by $h \sim g \Leftrightarrow h = g$ $\mu$-almost everywhere. Here we pretend that we have already chosen a particular representative of each class, so that we can work with functions instead of equivalence classes of functions.]

A.2 Self-adjoint operators and the spectral theorem

If $H$ is a Hilbert space and $D \subset H$ is a dense linear subspace of $H$, then we call an operator $A : D \to H$ (which is not necessarily bounded) a densely defined operator. If $A : D \to H$ is a densely defined operator, we define a linear subspace $D^* \subset H$ by
$$D^* = \{k \in H : h \mapsto \langle Ah, k \rangle \text{ is a bounded linear functional on } D\}.$$
If $k \in D^*$, then $h \mapsto \langle Ah, k \rangle$ is a bounded linear functional on the dense subspace $D \subset H$ and can therefore be extended to a bounded linear functional $F$ on $H$. By the Riesz representation theorem there exists a unique vector $k^* \in H$ such that $F(h) = \langle h, k^* \rangle$ for all $h \in H$. In particular, we have $\langle Ah, k \rangle = \langle h, k^* \rangle$ for all $h \in D$. We then define a map $A^* : D^* \to H$ by $A^* k = k^*$.

Definition A.1 Let $A : D \to H$ be a densely defined operator in the Hilbert space $H$.
(i) We call $A$ symmetric if $\langle Ah, g \rangle = \langle h, Ag \rangle$ for all $g, h \in D$.
(ii) We call $A$ self-adjoint if $A = A^*$.
It is clear that every self-adjoint operator is also symmetric, but the converse is false since in general we do not have $D = D^*$. When $A$ is bounded, the converse is in fact true.
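In the bounded, finite-dimensional case the adjoint construction reduces to the conjugate transpose, and symmetric and self-adjoint coincide. A sketch with a hypothetical operator on $\mathbb{C}^2$ (inner product taken linear in the first argument, conjugate-linear in the second):

```python
def inner(u, v):
    # <u, v> linear in the first slot, conjugate-linear in the second
    return sum(a * b.conjugate() for a, b in zip(u, v))

def apply(A, h):
    return [sum(A[i][j] * h[j] for j in range(len(h))) for i in range(len(A))]

def adjoint(A):
    # conjugate transpose: the finite-dimensional adjoint
    n = len(A)
    return [[A[j][i].conjugate() for j in range(n)] for i in range(n)]

A = [[1 + 2j, 3j], [0.5, 2 - 1j]]  # an arbitrary (hypothetical) operator on C^2
h = [1j, 2.0]
g = [3.0, -1j]

lhs = inner(apply(A, h), g)          # <Ah, g>
rhs = inner(h, apply(adjoint(A), g)) # <h, A*g>
print(lhs, rhs)  # equal

# a Hermitian matrix (B = B*) is symmetric: <Bh, g> = <h, Bg>
B = [[2.0, 1 - 1j], [1 + 1j, -0.5]]
print(inner(apply(B, h), g), inner(h, apply(B, g)))
```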
In order to formulate the spectral theorem for self-adjoint operators, we first need to recall the definition of a spectral measure and some facts about integration with respect to spectral measures. Definition A.2 If (X, Ω) is a measurable space and H is a Hilbert space, then a spectral measure for (X, Ω, H) is a function E : Ω → B(H) such that: (1) for each ∆ ∈ Ω the operator E(∆) ∈ B(H) is an orthogonal projection; (2) E(∅) = 0 and E(X) = 1H ; (3) E(∆1 ∩ ∆2 ) = E(∆1 )E(∆2 ) for ∆1 , ∆2 ∈ Ω; (4) if {∆n }∞ n=1 are pairwise disjoint sets in Ω, then ! ∞ ∞ X [ E(∆n ). E ∆n = n=1 n=1 From (2) and (3) it follows that if ∆1 , ∆2 ∈ Ω with ∆1 ∩∆2 = ∅ then E(∆1 )E(∆2 ) = E(∆2 )E(∆1 ) = 0. Also, for each h ∈ H the map µ : Ω → R given by ∆ 7→ hE(∆)h, hi is a positive measure on (X, Ω). We will denote this measure by µh . Let E be a spectral measure for (X, Ω, H) and let t : X → C be a simple function on the measurable space (X, A). We can write the simple function t as t = α1 1∆1 + . . . + αn 1∆n with αj ∈ C and all ∆j ∈ Ω disjoint. We then define the integral Z n X t(x)dE(x) := αj E(∆j ). X j=1 Since for j 6= k the sets ∆j and ∆k are disjoint, E(∆j )h and E(∆k )h are orthogonal for all h ∈ H. We may thus use the Pythagoras theorem: 2 Z n n X X 2 t(x)dE(x)h = |α | hE(∆ )h, E(∆ )hi = |αj |2 hE(∆j )h, hi j j j X j=1 = n X j=1 |αj |2 µh (∆j ) = R X |t(x)|2 dµh (x). (A.1) X j=1 Since Z dµh = µh (X) = hE(X)h, hi = khk2 it follows that Z 2 t(x)dE(x)h ≤ khk2 sup |t(x)|2 = khk2 max |t(x)|2 . x∈X X x∈X R Hence X t(x)dE(x) is a bounded operator with norm ≤ maxx∈X |t(x)|. An arbitrary bounded measurable function f : X → C can be approximated uniformly by a sequence {tn } of simple functions. According to the estimate above it follows that Z 2 Z 2 Z ≤ khk2 sup |tn (x)−tm (x)|2 . 
tn (x)dE(x)h − t (x)dE(x)h = [t (x) − t (x)]dE(x)h n m m X X x∈X X R Because this goes to zero uniformly on the unit ball in H for m, n → ∞, the sequence { X tn (x)dE(x)}n is a Cauchy sequence in the Banach space B(H) and thus converges to an element in B(H). We then define Z Z f (x)dE(x) := lim tn (x)dE(x) ∈ B(H), X n→∞ X 131 where the limit is taken in the norm-topology of B(H). The limit does not depend on the chosen sequence of simple functions. Because for each converging sequence {An }n in B(H) with limit A ∈ B(H)R it is in particular true that R limn→∞ (An h) = Ah for all h ∈ H, we have in the present case that X f (x)dE(x)h = limn→∞ ( X tn (x)dE(x)h). Using (A.1), we then find that 2 Z 2 Z Z 2 f (x)dE(x)h = lim tn (x)dE(x)h = lim tn (x)dE(x)h n n X X Z X Z = lim |tn (x)|2 dµh (x) = |f (x)|2 dµh (x). n X (A.2) X R Now that we have defined the integral X f (x)dE(x) for a spectral measure E for (X, Ω, H) and a bounded measurable function f : X → C, we will do the same for unbounded measurable functions. Let E be a spectral measure for (X, Ω, H), let g : X → C be a measurable function and let h ∈ H be such that g ∈ L2 (X, Ω, µh ). We can approximate g ∈ L2 (X, A, µh ) (in the L2 -norm) by a sequence of bounded measurable functions. For instance, we can take the sequence {gn }n , defined by g(x) if |g(x)| ≤ n gn (x) = 0 if |g(x)| > n. Because the functions gn are bounded, we can use the identity (A.2): Z 2 Z 2 Z gn (x)dE(x)h − gm (x)dE(x)h = [gn (x) − gm (x)]dE(x)h X X X Z = |gn (x) − gm (x)|2 dµh (x). X Because this goes to zero for n, m → ∞ (this follows from the fact that the R sequence {gn }n 2 converges to g in the L -norm and is hence a Cauchy sequence), the sequence { X gn (x)dE(x)h}n is a Cauchy sequence in H and therefore converges in H. We now define Z Z g(x)dE(x)h := lim gn (x)dE(x)h. n→∞ X X This definition does not depend on the choice of the sequence {gn } and analogously to (A.2) we now also have Z 2 Z g(x)dE(x)h = |g(x)|2 dµh (x). 
(A.3) X X R In general, the operator X g(x)dE(x) is not bounded and it is only defined for those h ∈ H for which g ∈ L2 (X, Ω, µh ), i.e. for those h ∈ H for which the right-hand side of (A.3) is finite. Note that (A.3) implies that Z 2 Z xdE(x)h = |x|2 dµh (x) X X for all h ∈ H for which the right-hand side is finite. We now formulate the spectral theorem for self-adjoint operators. Theorem A.3 Let A : D → H be a self-adjoint operator in a separable Hilbert space H. Then there exists a unique spectral measure EA for (R, BR , H), with BR the Borel σ-algebra on R, such that Z A= x dEA (x). R We call EA the spectral measure generated by A. The domain of A can then be expressed as R dom(A) = {h ∈ H : R |x|2 dµh (x) < ∞}. 132 We can slightly generalize the spectral theorem as follows. If {Ak }nk=1 is a system of pairwise commuting self-adjoint operators defined on some common dense domain in H, then there exists a unique spectral measure EA for (Rn , BRn , H), with BRn the Borel σ-algebra on Rn , such that Z Ak = xk dEA (x) Rn for k = 1, . . . , n, where xk denotes the k-th component of the integration variable x. We will call EA the joint spectral measure generated by the system {Ak }nk=1 . Note that we can apply our results about integration with respect to spectral measures in particular to the spectral measure EA . Thus, for each measurable function g : Rn → C we can define the operator Z g(x)dEA (x). (A.4) g(A1 , . . . , An ) := Rn The domain of this operator is the set {h ∈ H : R 2 R |g(x)| dµh (x) < ∞}. Definition A.4 If A : D → H is a self-adjoint operator in a (separable) Hilbert space H and f ∈ H, we define Hf to be the smallest closed subspace in H that contains all vectors of the form EA (∆)f with ∆ ∈ BR , and we call Hf the cyclic subspace in H generated by f . Note that if h ∈ H with h ⊥ Hf , then for g ∈ Hf we have hEA (∆)h, gi = hh, EA (∆)gi = 0, so EA (∆)h ⊥ Hf . This shows that Hh ⊥ Hf whenever h ⊥ Hf . 
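In finite dimensions the spectral theorem reduces to the eigendecomposition: $E_A(\Delta)$ is the orthogonal projection onto the sum of the eigenspaces with eigenvalue in $\Delta$. A sketch with a hypothetical $2 \times 2$ self-adjoint matrix, checking the reconstruction $A = \sum_j \lambda_j E_A(\{\lambda_j\})$ and the identity (A.1):

```python
# Hypothetical example: A = [[2, 1], [1, 2]] has eigenvalues 1 and 3
A = [[2.0, 1.0], [1.0, 2.0]]

# orthogonal projections onto the eigenspaces,
# spanned by (1,-1)/sqrt(2) and (1,1)/sqrt(2) respectively
E = {
    1.0: [[0.5, -0.5], [-0.5, 0.5]],
    3.0: [[0.5, 0.5], [0.5, 0.5]],
}

def apply(M, h):
    return [sum(M[i][j] * h[j] for j in range(2)) for i in range(2)]

def inner(u, v):
    return sum(a * b for a, b in zip(u, v))

# spectral reconstruction: A = sum over eigenvalues of lam * E({lam})
recon = [[sum(lam * P[i][j] for lam, P in E.items()) for j in range(2)] for i in range(2)]
print(recon)  # equals A

# identity (A.1): ||A h||^2 = integral of |x|^2 dmu_h, with mu_h({lam}) = <E({lam})h, h>
h = [1.0, 0.0]
lhs = inner(apply(A, h), apply(A, h))
rhs = sum(lam ** 2 * inner(apply(P, h), h) for lam, P in E.items())
print(lhs, rhs)  # both equal 5.0
```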
We will now state a theorem that uses the spectral theorem for self-adjoint operators to represent the Hilbert space as a certain direct integral of Hilbert spaces. We will not prove the theorem in detail, but we will give a sketch of the proof. The reason for not omitting the proof altogether is that the construction is of great importance in quantum physics. Theorem A.5 Let A : D → H be a self-adjoint operator in a separable Hilbert space H. Then H can be represented as a direct integral Z ⊕ H(x)dµ(x) R of Hilbert spaces H(x) relative to a positive measure µ on R such that the action of A is given by multiplication by x. Proof sketch Because H is separable, we can choose a countable dense set {f1 , f2 . . .} in H. Define H1 := Hf1 = {EA (∆)f1 }∆∈BR and suppose that for some n ≥ 1 we have already constructed cyclic subspaces H1 , . . . , Hn in H that are pairwise orthogonal, and write Hn := H1 ⊕ . . . ⊕ Hn for their direct sum. L n / Hn }. If Hn = H, then we can write H = k=1 Hk . If this is not the case, let kn = min{k : fk ∈ n We choose a unit vector hn+1 in the subspace of H spanned by H and fkn such that hn+1 ⊥ Hn ; we then define Hn+1 := Hhn+1 . Then Hn+1 is orthogonal to H1 , . . . , Hn and fkn ∈ H1 ⊕. . .⊕Hn+1 , and since {f1 , f2 , . . .} is dense in H, this construction gives rise to a decomposition M H= Hn n of H into an orthogonal direct sum of (a finite or countably infinite number of) cyclic subspaces. It can be shown (although we will not prove this here) that the spaces Hn can be realized as function spaces L2µn (R), where µn (∆) = hEA (∆)hn , hn i with hn as defined above. The corresponding isomorphism πn : Hn →L ˜ 2µn (R) is defined 53 by EA (∆)hn 7→ 1∆ with 1∆ the characteristic function of the set ∆; in particular, hn 7→ 1R . The corresponding action of A on 53 The elements of L2 are actually equivalence classes of functions, not functions, but often we will work with representatives whenever this is possible. 
So in this particular case we actually mean that EA (∆)hn is mapped to the equivalence class of χ∆ . 133 R {f ∈ L L2µn (R) : R x2 |f (x)|2 dµn (x) < ∞} ⊂ L2µn (R) is then given by πn (Af ) = idR · πn (f ). Since H = n Hn , we can thus realize each g = (g1 , g2 , . . .) ∈ H as a sequence (π1 (g1 ), π2 (g2 ), . . .) of functions πn (gn ) ∈ L2µn (R) g ∼ π(g) := (π1 (g1 ), π2 (g2 ), . . .) ∈ M L2µn (R) n and if g ∈ D then Ag ∼ π(Ag) = (idR · π1 (g1 ), idR · π2 (g2 ), . . .) ∈ M L2µn (R). n Now define a measure µ on R by ∞ X µ(∆) = 2−n µn (∆). n=1 Note that µn (∆) = hEA (∆)hn , hn i ≤ khn k2 = 1, so the sum converges for each Borel set ∆, i.e. µ(∆) < ∞ for each Borel set ∆. It is clear that if µ(∆) = 0, then µn (∆) = 0 for all n, so we have µn << µ for all n. By the Radon-Nikodym theorem from measure R theory it then follows that for each n there exists a nonnegative function ϕn such that µn (∆) = ∆ ϕn (x)dµ(x). Now let gn ∈ Hn √ and let πn (gn ) be the corresponding function in L2µn (R). Then the function π̂n (gn ) := ϕn πn (gn ) is in L2µ (R) and Z Z Z 2 2 2 |π̂n (gn )(x)| dµ(x) = |πn (gn )(x)| ϕn (x)dµ(x) = |πn (gn )(x)|2 dµn (x) kπ̂n (gn )kL2µ (R) = R R = kπn (gn )k2L2µ n (R) R = kgn k2Hn , i.e. the mapping gn 7→ π̂n (gn ) is an isometry of Hn into L2µ (R). Now define Xn ⊂ R by Xn = {x ∈ R : ϕn (x) > 0}; then the above mapping π̂n : Hn → L2µ (R) defines an isomorphism π en : Hn →L ˜ 2µ (Xn ). We thus have an isomorphism π e : H→ ˜ M L2µ (Xn ). n Define a function n : R → N ∪ {∞} by n(x) = #{m : x ∈ Xm }. Then n(x) is measurable and if we write Bn := {x ∈ R : n(x) = n} then clearly µ(B0 ) = 0. For x ∈ Bn weL write m1 (x) < m2 (x) < . . . < mn (x) for the values of m for which x ∈ Xm . If ge = (e g1 , ge2 , . . .) ∈ n L2µ (Xn ), we (n) (n) (n) define for each n a set of functions ϕk (e g ) ∈ L2µ (Bn ) (k = 1, . . . , n) by ϕk (e g )(x) = gemk (x) (x), so L 2 2 ⊕n for each n this defines a map αn : k Lµ (Xk ) → Lµ (Bn ) given by ge = (e g1 , ge2 , . . .) 
7→ ϕ(n) (e g ) := L L 2 (n) (n) 2 ⊕n (ϕ1 (e g ), . . . , ϕn (e g )). This, in turn, gives rise to a map α : n Lµ (Xn ) → n Lµ (Bn ) given by (1) (2) ge 7→ (ϕ (e g ), ϕ (e g ), . . .). Because ke g k2L 2 n Lµ (Xn ) = X ke gn k2L2µ (Xn ) = n = n = n XZ X XZ n X 2 |e gn (x)| dµ(x) = Xn n (n) |ϕk (e g )(x)|2 dµ(x) = Bn k=1 (n) kϕk (e g )(x)k2L2µ (Bn )⊕n XZ n XX n X Bn k=1 |e gmk (x) (x)|2 dµ(x) (n) kϕk (e g )(x)k2L2µ (Bn ) n k=1 = kα(e g )kLn L2µ (Bn )⊕n , n we see that the map α is isometric. For each x ∈ R we now choose a Hilbert space H(x) with dim(H(x)) = nx , where nx is the unique index in N ∪ {∞} such that x ∈ Bnx , and for x each x ∈ H we also specify a basis {ek (x)}nk=1 for H(x). For each n we can then represent R⊕ 2 ⊕n the Hilbert space Lµ (Bn ) as a direct integral Bn H(x)dµ(x) of the spaces H(x). Thus, we 134 now have isomorphisms L 2 L R⊕ 2 ⊕n ' n Lµ (Xn ) ' n Lµ (Bn ) n Bn H(x)dµ(x), which are given by (1) (2) (2) (n) (n) 7→ (ϕ1 (e g )e1 , ϕ1 (e g )e1 + ϕ2 (e g )e2 , . . . , ϕ1 (e g )e1 + . . . + ϕn (e g )en , . . .). L ge 7→ (ϕ(1) (e g ), ϕ(2) (e g ), . . .) It now follows from the definition of the direct integral that H can be represented as direct integral Z ⊕ H(x)dµ(x). R This theorem can be slightly generalized as follows. When we have a finite system {Ak }nk=1 of pairwise commuting self-adjoint operators Ak on a Hilbert space H (i.e. the spectral measures EAl and EAm of Al and Am commute for all l, m), then H can be represented as a direct integral Z ⊕ H(x)dµ(x) Rn of Hilbert spaces H(x) relative to a positive measure µ on Rn such that the action of Ak is given by multiplication by xk , where xk denotes the k-th component of the integration variable x ∈ Rn . B Examples of free fields We will now construct some of the fields for massive particles. 
The computation of the coefficients in (3.26) is done by using the identities
\[ \left[e^{\theta\cdot J^{(\frac12)}}\right]_{jk} = \left[e^{\theta\cdot\frac{\sigma}{2}}\right]_{jk} = [2m(p^0+m)]^{-\frac12}\left[(p^0+m)\delta_{jk} + \sum_{l=1}^{3} p^l\,[\sigma^l]_{jk}\right], \]
\[ \left[e^{-\theta\cdot J^{(\frac12)}}\right]_{jk} = \left[e^{-\theta\cdot\frac{\sigma}{2}}\right]_{jk} = [2m(p^0+m)]^{-\frac12}\left[(p^0+m)\delta_{jk} - \sum_{l=1}^{3} p^l\,[\sigma^l]_{jk}\right]. \]

B.1 The (0, 0)-field (or scalar field)

The $(0,0)$-field (or scalar field) can only describe particles $\tau$ with spin $j_\tau = 0$. According to (3.26) the coefficients $u(p)$ are given by $u(p) = (2p^0)^{-1/2} C_{00}(0,0;0,0) = (2p^0)^{-1/2}$, so (using (3.22)) the scalar field is
\[ (\psi^\tau_{0,0})_{0,0}(x) = \int \frac{d^3p}{(2\pi)^{3/2}(2p^0)^{1/2}} \left[e^{-ip\cdot x} a_\tau(p) + e^{ip\cdot x} a^*_{\tau^C}(p)\right], \tag{B.1} \]
with $p^0 = \sqrt{m_\tau^2 + \mathbf{p}^2}$. Note that in case the particle $\tau$ coincides with its antiparticle $\tau^C$ we have $\psi^*_{0,0}(x) = \psi_{0,0}(x)$. For this reason, such a field will also be called real.

B.2 The (1/2, 1/2)-field (or vector field)

We will now consider the $(\frac12,\frac12)$-field (or vector field). This field can only describe particles $\tau$ with spin $j_\tau = 0$ or $j_\tau = 1$.

Spin 0

For spin 0 the coefficients are
\[ \begin{pmatrix} u_{-\frac12,-\frac12}(p) \\ u_{-\frac12,\frac12}(p) \\ u_{\frac12,-\frac12}(p) \\ u_{\frac12,\frac12}(p) \end{pmatrix} = \frac{1}{2m\sqrt{p^0}} \begin{pmatrix} p^1+ip^2 \\ -p^0+p^3 \\ p^0+p^3 \\ -p^1+ip^2 \end{pmatrix}. \]
Now define new coefficients
\[ \begin{pmatrix} u^0(p) \\ u^1(p) \\ u^2(p) \\ u^3(p) \end{pmatrix} := \frac{-im}{\sqrt{2}} \begin{pmatrix} u_{\frac12,-\frac12} - u_{-\frac12,\frac12} \\ u_{-\frac12,-\frac12} - u_{\frac12,\frac12} \\ -i[u_{-\frac12,-\frac12} + u_{\frac12,\frac12}] \\ u_{-\frac12,\frac12} + u_{\frac12,-\frac12} \end{pmatrix} = -i(2p^0)^{-1/2} \begin{pmatrix} p^0 \\ p^1 \\ p^2 \\ p^3 \end{pmatrix}. \]
This new choice of coefficients corresponds to a basis transformation in the space $V^{(\frac12,\frac12)} = V_A^{(\frac12)} \otimes V_B^{(\frac12)}$. With respect to this new basis, the vector field for a particle of spin 0 becomes
\begin{align*}
(\psi^\tau_{\frac12,\frac12})^\mu(x) &= (2\pi)^{-3/2} \int d^3p\; u^\mu(p) \left[e^{-ip\cdot x} a_\tau(p) - e^{ip\cdot x} a^*_{\tau^C}(p)\right] \\
&= (2\pi)^{-3/2} \int d^3p\; (2p^0)^{-1/2} \left[-ip^\mu e^{-ip\cdot x} a_\tau(p) + ip^\mu e^{ip\cdot x} a^*_{\tau^C}(p)\right] \\
&= \partial^\mu (\psi^\tau_{0,0})_{0,0}(x),
\end{align*}
where in the first line we used that $\lambda = (-1)^{2B}\kappa = -\kappa$.
Spin 1

For spin 1 the coefficients are
\[ \begin{pmatrix} u_{-\frac12,-\frac12}(p,1) \\ u_{-\frac12,\frac12}(p,1) \\ u_{\frac12,-\frac12}(p,1) \\ u_{\frac12,\frac12}(p,1) \end{pmatrix} = \frac{[2m(p^0+m)]^{-1}}{\sqrt{2p^0}} \begin{pmatrix} -(p^1+ip^2)^2 \\ (p^0+m-p^3)(p^1+ip^2) \\ -(p^0+m+p^3)(p^1+ip^2) \\ (p^0+m)^2-(p^3)^2 \end{pmatrix}, \]
\[ \begin{pmatrix} u_{-\frac12,-\frac12}(p,0) \\ u_{-\frac12,\frac12}(p,0) \\ u_{\frac12,-\frac12}(p,0) \\ u_{\frac12,\frac12}(p,0) \end{pmatrix} = \frac{[2m(p^0+m)]^{-1}}{2\sqrt{p^0}} \begin{pmatrix} 2(p^1+ip^2)p^3 \\ (p^0+m-p^3)^2-(p^1)^2-(p^2)^2 \\ (p^0+m+p^3)^2-(p^1)^2-(p^2)^2 \\ 2(-p^1+ip^2)p^3 \end{pmatrix}, \]
\[ \begin{pmatrix} u_{-\frac12,-\frac12}(p,-1) \\ u_{-\frac12,\frac12}(p,-1) \\ u_{\frac12,-\frac12}(p,-1) \\ u_{\frac12,\frac12}(p,-1) \end{pmatrix} = \frac{[2m(p^0+m)]^{-1}}{\sqrt{2p^0}} \begin{pmatrix} (p^0+m)^2-(p^3)^2 \\ (p^0+m-p^3)(-p^1+ip^2) \\ (p^0+m+p^3)(p^1-ip^2) \\ -(p^1-ip^2)^2 \end{pmatrix}. \]
For $\sigma \in \{1, 0, -1\}$ we define
\[ \begin{pmatrix} e^0(p,\sigma) \\ e^1(p,\sigma) \\ e^2(p,\sigma) \\ e^3(p,\sigma) \end{pmatrix} = \sqrt{p^0} \begin{pmatrix} u_{\frac12,-\frac12}(p,\sigma) - u_{-\frac12,\frac12}(p,\sigma) \\ u_{-\frac12,-\frac12}(p,\sigma) - u_{\frac12,\frac12}(p,\sigma) \\ -i[u_{-\frac12,-\frac12}(p,\sigma) + u_{\frac12,\frac12}(p,\sigma)] \\ u_{-\frac12,\frac12}(p,\sigma) + u_{\frac12,-\frac12}(p,\sigma) \end{pmatrix}. \]
Again, this corresponds to a basis transformation in the space $V^{(\frac12,\frac12)} = V_A^{(\frac12)} \otimes V_B^{(\frac12)}$. However, the new coefficients are not the $e^\mu(p,\sigma)$, but rather $\frac{1}{\sqrt{p^0}}\,e^\mu(p,\sigma)$ (since any new basis must be a linear combination of the old basis, with scalar coefficients). Explicitly, the $e^\mu(p,\sigma)$ are
\[ \begin{pmatrix} e^0(p,1) \\ e^1(p,1) \\ e^2(p,1) \\ e^3(p,1) \end{pmatrix} = -\frac{1}{\sqrt{2}}\,[m(p^0+m)]^{-1} \begin{pmatrix} (p^1+ip^2)(p^0+m) \\ m(p^0+m)+(p^1)^2+ip^1p^2 \\ p^1p^2+im(p^0+m)+i(p^2)^2 \\ (p^1+ip^2)p^3 \end{pmatrix}, \]
\[ \begin{pmatrix} e^0(p,0) \\ e^1(p,0) \\ e^2(p,0) \\ e^3(p,0) \end{pmatrix} = [m(p^0+m)]^{-1} \begin{pmatrix} (p^0+m)p^3 \\ p^1p^3 \\ p^2p^3 \\ (p^3)^2+m(p^0+m) \end{pmatrix}, \]
\[ \begin{pmatrix} e^0(p,-1) \\ e^1(p,-1) \\ e^2(p,-1) \\ e^3(p,-1) \end{pmatrix} = \frac{1}{\sqrt{2}}\,[m(p^0+m)]^{-1} \begin{pmatrix} (p^1-ip^2)(p^0+m) \\ (p^1)^2+m(p^0+m)-ip^1p^2 \\ p^1p^2-im(p^0+m)-i(p^2)^2 \\ (p^1-ip^2)p^3 \end{pmatrix}. \]
Note that for zero momentum these coefficients are
\[ e^\mu(0,1) = -\frac{1}{\sqrt{2}} \begin{pmatrix} 0 \\ 1 \\ i \\ 0 \end{pmatrix}, \qquad e^\mu(0,0) = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}, \qquad e^\mu(0,-1) = \frac{1}{\sqrt{2}} \begin{pmatrix} 0 \\ 1 \\ -i \\ 0 \end{pmatrix}, \]
and that the $e^\mu(p,\sigma)$ are related to these by
\[ e^\mu(p,\sigma) = L(p)^\mu{}_\nu\, e^\nu(0,\sigma), \]
where $L(p)$ is the standard boost that maps the momentum vector $(m, \mathbf{0})$ to $(\sqrt{m^2+\mathbf{p}^2}, \mathbf{p})$:
\[ L(p)^\mu{}_\nu = [m(p^0+m)]^{-1} \begin{pmatrix} p^0(p^0+m) & p^1(p^0+m) & p^2(p^0+m) & p^3(p^0+m) \\ p^1(p^0+m) & m(p^0+m)+(p^1)^2 & p^1p^2 & p^1p^3 \\ p^2(p^0+m) & p^1p^2 & m(p^0+m)+(p^2)^2 & p^2p^3 \\ p^3(p^0+m) & p^1p^3 & p^2p^3 & m(p^0+m)+(p^3)^2 \end{pmatrix}. \tag{B.2} \]
The field is
\[ (\psi^\tau_{\frac12,\frac12})^\mu(x) = (2\pi)^{-3/2} \sum_{\sigma=-1}^{1} \int \frac{d^3p}{\sqrt{2p^0}}\; e^\mu(p,\sigma) \left[e^{-ip\cdot x} a_\tau(p,\sigma) + (-1)^\sigma e^{ip\cdot x} a^*_{\tau^C}(p,-\sigma)\right]. \]

B.3 The (1/2, 0)-field and the (0, 1/2)-field

These fields can only describe particles with spin $\frac12$. The coefficients for the $(\frac12,0)$-field are given by
\[ \begin{pmatrix} u_{\frac12 0}(p,\tfrac12) \\ u_{-\frac12 0}(p,\tfrac12) \end{pmatrix} = [4mp^0(p^0+m)]^{-\frac12} \begin{pmatrix} p^0+m+p^3 \\ p^1+ip^2 \end{pmatrix}, \qquad \begin{pmatrix} u_{\frac12 0}(p,-\tfrac12) \\ u_{-\frac12 0}(p,-\tfrac12) \end{pmatrix} = [4mp^0(p^0+m)]^{-\frac12} \begin{pmatrix} p^1-ip^2 \\ p^0+m-p^3 \end{pmatrix}. \]
So the field is given by
\begin{align*}
(\psi^\tau_{\frac12,0})_{a0}(x) &= (2\pi)^{-3/2} \sum_{\sigma=-\frac12}^{\frac12} \int d^3p\; u_{a0}(p,\sigma) \left[e^{-ip\cdot x} a_\tau(p,\sigma) + (-1)^{\frac12-\sigma} e^{ip\cdot x} a^*_{\tau^C}(p,-\sigma)\right] \\
&= (2\pi)^{-3/2} \sum_{\sigma=-\frac12}^{\frac12} \int d^3p \left[u_{a0}(p,\sigma)e^{-ip\cdot x} a_\tau(p,\sigma) + v_{a0}(p,\sigma)e^{ip\cdot x} a^*_{\tau^C}(p,\sigma)\right],
\end{align*}
where in the last line we have restored the coefficients $v_{a0}(p,\sigma) = (-1)^{\frac12+\sigma}\,u_{a0}(p,-\sigma)$ and we have used that $\lambda = (-1)^{2B}\kappa = \kappa$. The coefficients for the $(0,\frac12)$-field are given by
\[ \begin{pmatrix} u_{0\frac12}(p,\tfrac12) \\ u_{0,-\frac12}(p,\tfrac12) \end{pmatrix} = [4mp^0(p^0+m)]^{-\frac12} \begin{pmatrix} p^0+m-p^3 \\ -p^1-ip^2 \end{pmatrix}, \qquad \begin{pmatrix} u_{0\frac12}(p,-\tfrac12) \\ u_{0,-\frac12}(p,-\tfrac12) \end{pmatrix} = [4mp^0(p^0+m)]^{-\frac12} \begin{pmatrix} -p^1+ip^2 \\ p^0+m+p^3 \end{pmatrix}. \]
So the field is given by
\begin{align*}
(\psi^\tau_{0,\frac12})_{0b}(x) &= (2\pi)^{-3/2} \sum_{\sigma=-\frac12}^{\frac12} \int d^3p\; u_{0b}(p,\sigma) \left[e^{-ip\cdot x} a_\tau(p,\sigma) + (-1)^{\frac32-\sigma} e^{ip\cdot x} a^*_{\tau^C}(p,-\sigma)\right] \\
&= (2\pi)^{-3/2} \sum_{\sigma=-\frac12}^{\frac12} \int d^3p \left[u_{0b}(p,\sigma)e^{-ip\cdot x} a_\tau(p,\sigma) - v_{0b}(p,\sigma)e^{ip\cdot x} a^*_{\tau^C}(p,\sigma)\right],
\end{align*}
where $v_{0b}(p,\sigma) = (-1)^{\frac12+\sigma}\,u_{0b}(p,-\sigma)$ and the minus sign in the second term comes from $\lambda = (-1)^{2B}\kappa = -\kappa$.

B.4 The (1/2, 0) ⊕ (0, 1/2)-field (or Dirac field)

This field is just
\begin{align*}
\begin{pmatrix} (\psi^\tau_{\frac12,0})_{a0}(x) \\ (\psi^\tau_{0,\frac12})_{0b}(x) \end{pmatrix} &= (2\pi)^{-3/2} \sum_{\sigma=-\frac12}^{\frac12} \int d^3p \left[\begin{pmatrix} u_{a0}(p,\sigma) \\ u_{0b}(p,\sigma) \end{pmatrix} e^{-ip\cdot x} a_\tau(p,\sigma) + \begin{pmatrix} v_{a0}(p,\sigma) \\ -v_{0b}(p,\sigma) \end{pmatrix} e^{ip\cdot x} a^*_{\tau^C}(p,\sigma)\right] \\
&= (2\pi)^{-3/2} \sum_{\sigma=-\frac12}^{\frac12} \int d^3p \left[u(p,\sigma)e^{-ip\cdot x} a_\tau(p,\sigma) + v(p,\sigma)e^{ip\cdot x} a^*_{\tau^C}(p,\sigma)\right],
\end{align*}
where
\[ u(p,\tfrac12) = [4mp^0(p^0+m)]^{-\frac12} \begin{pmatrix} p^0+m+p^3 \\ p^1+ip^2 \\ p^0+m-p^3 \\ -p^1-ip^2 \end{pmatrix}, \qquad u(p,-\tfrac12) = [4mp^0(p^0+m)]^{-\frac12} \begin{pmatrix} p^1-ip^2 \\ p^0+m-p^3 \\ -p^1+ip^2 \\ p^0+m+p^3 \end{pmatrix}, \]
\[ v(p,\tfrac12) = [4mp^0(p^0+m)]^{-\frac12} \begin{pmatrix} -p^1+ip^2 \\ -p^0-m+p^3 \\ -p^1+ip^2 \\ p^0+m+p^3 \end{pmatrix}, \qquad v(p,-\tfrac12) = [4mp^0(p^0+m)]^{-\frac12} \begin{pmatrix} p^0+m+p^3 \\ p^1+ip^2 \\ -p^0-m+p^3 \\ p^1+ip^2 \end{pmatrix}. \]
If we define the $4\times4$-matrix $M$ by
\[ M = \begin{pmatrix} 0 & 0 & p^0+p^3 & p^1-ip^2 \\ 0 & 0 & p^1+ip^2 & p^0-p^3 \\ p^0-p^3 & -p^1+ip^2 & 0 & 0 \\ -p^1-ip^2 & p^0+p^3 & 0 & 0 \end{pmatrix}, \]
then it is easily seen that $Mu(p,\sigma) = mu(p,\sigma)$ and $Mv(p,\sigma) = -mv(p,\sigma)$. If we now define the $4\times4$-matrices
\[ \gamma^0 = \begin{pmatrix} 0 & 1_{\mathbb{C}^2} \\ 1_{\mathbb{C}^2} & 0 \end{pmatrix}, \qquad \gamma^i = \begin{pmatrix} 0 & -\sigma^i \\ \sigma^i & 0 \end{pmatrix}, \]
where the $\sigma^i$ denote the Pauli matrices, then $M = \gamma^\mu p_\mu$ and these equations can be rewritten as $(\gamma^\mu p_\mu - m)u(p,\sigma) = 0$ and $(\gamma^\mu p_\mu + m)v(p,\sigma) = 0$. Using that $i\partial_\mu\, u(p,\sigma)e^{-ip\cdot x} = p_\mu\, u(p,\sigma)e^{-ip\cdot x}$ and $i\partial_\mu\, v(p,\sigma)e^{ip\cdot x} = -p_\mu\, v(p,\sigma)e^{ip\cdot x}$, this in turn implies that the field satisfies the Dirac equation
\[ (i\gamma^\mu\partial_\mu - m) \begin{pmatrix} (\psi^\tau_{\frac12,0})_{a0}(x) \\ (\psi^\tau_{0,\frac12})_{0b}(x) \end{pmatrix} = 0. \]
Note furthermore that under a space inversion the field transforms with its two components interchanged:
\[ P \begin{pmatrix} (\psi^\tau_{\frac12,0})_{a0}(x) \\ (\psi^\tau_{0,\frac12})_{0b}(x) \end{pmatrix} P^{-1} = \begin{pmatrix} (\psi^\tau_{0,\frac12})_{0b}(Px) \\ (\psi^\tau_{\frac12,0})_{a0}(Px) \end{pmatrix}. \]

References

[1] H. Araki. Mathematical theory of quantum fields, Oxford University Press, Oxford, 2000.
[2] N.N. Bogolubov, A.A. Logunov, A.I. Oksak, I.T. Todorov. General principles of quantum field theory, Kluwer, Dordrecht, 1990.
[3] N.N. Bogolubov, A.A. Logunov, I.T. Todorov. Introduction to axiomatic quantum field theory, W.A. Benjamin, Inc., Massachusetts, 1975.
[4] J. Cannon, A. Jaffe. Lorentz covariance of the λ(φ⁴)₂ quantum field theory, Comm. Math. Phys. 17 (1970), 261-321.
[5] J.B. Conway. A course in functional analysis (2nd edition), Springer, New York, 1990.
[6] P.A.M. Dirac. Lectures on quantum mechanics, Yeshiva University, New York, 1964.
[7] G.G. Emch. Algebraic methods in statistical mechanics and quantum field theory, John Wiley and Sons, Inc., New York, 1972.
[8] G.B. Folland. Quantum field theory: A tourist guide for mathematicians, American Mathematical Society, Providence, 2008.
[9] I.M. Gel'fand, N.Ya. Vilenkin. Generalized functions, Volume 4: Applications of harmonic analysis, Academic Press, New York, 1964.
[10] J. Glimm. Boson fields with non-linear self-interaction in two dimensions, Comm. Math. Phys. 8 (1968), 12-25.
[11] J. Glimm, A. Jaffe. A λφ⁴ quantum field theory without cut-offs I, Phys. Rev. 176 (1968), 1945-1961.
[12] J. Glimm, A. Jaffe. The λφ⁴ quantum field theory without cut-offs II. The field operators and the approximate vacuum, Ann. Math. 91 (1970), 362-401.
[13] J. Glimm, A. Jaffe. The λφ⁴ quantum field theory without cut-offs III. The physical vacuum, Acta Math. 125 (1970), 204-267.
[14] J. Glimm, A. Jaffe. The energy momentum spectrum and vacuum expectation values in quantum field theory I, J. Math. Phys. 11 (1970), 3335-3338.
[15] J. Glimm, A. Jaffe. The energy momentum spectrum and vacuum expectation values in quantum field theory II, Comm. Math. Phys.
22 (1971), 1-22.
[16] J. Glimm, A. Jaffe. Quantum field theory models, in Statistical mechanics and quantum field theory, C. De Witt and R. Stora, editors, Gordon and Breach, New York, 1971.
[17] J. Glimm, A. Jaffe. The λφ⁴ quantum field theory without cut-offs IV. Perturbations of the Hamiltonian, J. Math. Phys. 13 (1972), 1568-1584.
[18] J. Glimm, A. Jaffe. Boson quantum field models, in Mathematics of contemporary physics, R. Streater, editor, Academic Press, London, 1972.
[19] J. Glimm, A. Jaffe, T. Spencer. The Wightman axioms and particle structure in the P(φ)₂ quantum field model, Ann. Math. 100 (1974), 585-632.
[20] R. Haag. Local quantum physics: Fields, particles, algebras, Springer, Berlin, 1996.
[21] R. Haag, D. Kastler. An algebraic approach to quantum field theory, J. Math. Phys. 5 (1964), 848-861.
[22] B.C. Hall. Lie groups, Lie algebras, and representations, Springer, New York, 2003.
[23] A. Jaffe. Constructive quantum field theory, unpublished notes available at www.arthurjaffe.com.
[24] A.M.L. Messiah, O.W. Greenberg. Symmetrization postulate and its experimental foundation, Phys. Rev. 136 (1964), 248-267.
[25] J.R. Munkres. Topology (2nd edition), Prentice Hall, Inc., New Jersey, 2000.
[26] G.L. Naber. The geometry of Minkowski spacetime: an introduction to the mathematics of the special theory of relativity, Springer, New York, 1992.
[27] E. Nelson. A quartic interaction in two dimensions, in Mathematical theory of elementary particles, R. Goodman and I. Segal, editors, M.I.T. Press, Cambridge, 1966.
[28] E. Nelson. Construction of quantum fields from Markoff fields, J. Func. Anal. 12 (1973), 97-112.
[29] L.H. Ryder. Quantum field theory (2nd edition), Cambridge University Press, Cambridge, 1996.
[30] B. Simon. The P(φ)₂ Euclidean (quantum) field theory, Princeton University Press, Princeton, 1974.
[31] R.F. Streater. Connection between the spectrum condition and the Lorentz invariance of P(φ)₂, Comm. Math. Phys. 26 (1972), 109-120.
[32] R.F. Streater, A.S. Wightman.
PCT, spin and statistics, and all that (2nd revised printing), Benjamin, New York, 1978; reprinted by Princeton University Press, Princeton, 2000.
[33] S.J. Summers. A perspective on constructive quantum field theory, arXiv:1203.3991v1.
[34] L.A. Takhtajan. Quantum mechanics for mathematicians, American Mathematical Society, Providence, 2008.
[35] S. Weinberg. The quantum theory of fields I: Foundations, Cambridge University Press, Cambridge, 1995.

Popular summary (English)

The special theory of relativity provides a mathematical model for space and time in the absence of gravity; this mathematical model is also called Minkowski spacetime. The advantage of Minkowski spacetime over the more classical Newtonian spacetime is that Minkowski spacetime remains accurate when objects move at (almost) the speed of light. Quantum theory provides a mathematical model for the behaviour of microscopically small objects, such as atoms or elementary particles. When one speaks of quantum mechanics, one usually means the quantum theory in which the microscopically small objects live in a spacetime described by the Newtonian model of spacetime. However, in order to make accurate predictions for systems consisting of microscopically small objects that travel at almost the speed of light (think, for instance, of the situation in a particle accelerator), it becomes necessary to develop a quantum theory that uses Minkowski spacetime. In the development of such a theory it soon becomes very natural to introduce fields, and for this reason the corresponding theory is called quantum field theory. Quantum field theory is a very successful theory in the sense that the predictions that can be made within the theory are in very good agreement with experimental data. From the point of view of a physicist, quantum field theory is therefore a very good theory.
However, from a mathematical point of view quantum field theory is very ill-defined, because the precise nature of the 'mathematical' objects that are used in the theory is often unclear. In this thesis we investigate to what extent quantum field theory can be described as a mathematical theory. We will find that there are interesting formulations of what quantum field theory should be mathematically, but that it is not so easy to prove that quantum field theory, as it is used by physicists, is actually of the form described by these mathematical formulations.⁵⁴ That this is not so easy will follow from the difficulties that we encounter when we prove this for much simpler (and non-realistic) versions of quantum field theory.

⁵⁴ This problem, also called the quantum Yang-Mills problem, is one of the seven Millennium Prize Problems. Whoever manages to solve such a problem receives a million dollars from the Clay Mathematics Institute. Thus far, only one of these seven problems has been solved, namely the Poincaré conjecture.

Popular summary (Dutch)

The special theory of relativity provides a mathematical model for space and time in the absence of gravity; this mathematical model is also called Minkowski spacetime. The advantage of Minkowski spacetime over the more old-fashioned Newtonian spacetime is that Minkowski spacetime also remains accurate as objects move at (almost) the speed of light. Quantum theory provides a mathematical model for the behaviour of microscopically small objects, such as atoms or elementary particles. When one speaks of quantum mechanics, one usually means the quantum theory in which the microscopically small objects are located in a spacetime described by the Newtonian model of spacetime.
However, in order to make accurate predictions for systems consisting of microscopically small objects that move at almost the speed of light (think, for instance, of the situation in a particle accelerator), it is necessary to develop a quantum theory that starts from Minkowski spacetime. In the development of such a theory it soon becomes very natural to introduce fields, and for this reason the theory in question is called quantum field theory. Quantum field theory is a very successful theory in the sense that the predictions that can be made with the theory agree with great accuracy with the experimental data. From the point of view of physics, quantum field theory is therefore a very good theory. Mathematically, however, quantum field theory is a very poorly defined theory, because it is often unclear what the precise nature is of the 'mathematical' objects that are used. In this thesis we investigate to what extent quantum field theory can be described as a mathematical theory. It will turn out that there are interesting formulations of what quantum field theory should be mathematically, but that it is not so easy to prove that quantum field theory, as it is used by physicists, is actually of the form described in these mathematical formulations.⁵⁵ That this is not easy will become apparent from the difficulties we encounter when we prove the same for much simpler (and non-realistic) versions of quantum field theory.

⁵⁵ This problem, also called the quantum Yang-Mills problem, is one of the seven Millennium Prize Problems. Whoever manages to solve such a problem receives a million dollars from the Clay Mathematics Institute. Thus far, only one of these seven problems has been solved, namely the Poincaré conjecture.