Axiomatic and constructive quantum field theory
Thesis for the
master’s in Mathematical physics
Sohail Sheikh
Student number: 0481289
Thesis advisor : prof.dr. R.H. Dijkgraaf
Second reader : dr. H.B. Posthuma
Korteweg-de Vries Institute (KdVI)
Universiteit van Amsterdam (UvA)
FNWI
August 2013
Abstract
We investigate the mathematical structure of quantum field theory. For this purpose, we
first have to develop the mathematical framework for a relativistic quantum theory in terms
of a Hilbert space carrying a unitary representation of the universal covering group of the
restricted Poincaré group. We then discuss how quantum fields can be used in
physics to compute physical quantities for scattering processes, such as scattering amplitudes.
After this discussion about the use of quantum fields in physics, we analyze two different axiom
systems for mathematically rigorous quantum field theory, namely the Wightman axiom system
and the Haag-Kastler axiom system. Finally, we look at some results in constructive quantum
field theory (CQFT). In CQFT the goal is to construct concrete non-trivial examples of models
that satisfy the Wightman axioms or Haag-Kastler axioms.
Contents

1 Introduction 3
  1.1 Overview 3
  1.2 Conventions and notation 5
  1.3 Acknowledgements 5

2 Special relativity and quantum theory 6
  2.1 Special relativity 6
    2.1.1 Minkowski spacetime 8
    2.1.2 The Lorentz group, causal structure and the Poincaré group 10
  2.2 Quantum theory 19
    2.2.1 States and observables 19
    2.2.2 The general framework of quantum theory 21
    2.2.3 Symmetries in quantum theory 24
    2.2.4 Poincaré invariance and one-particle states 32
    2.2.5 Many-particle states and Fock space 46

3 The physics of quantum fields 52
  3.1 The interaction picture and scattering theory 52
  3.2 The use of free quantum fields in scattering theory 57
  3.3 Calculation of the S-matrix using perturbation theory 62
  3.4 Obtaining V from a Lagrangian 64
  3.5 Some remarks on the physics of quantum fields 71

4 The mathematics of quantum fields 73
  4.1 The Wightman formulation of quantum field theory 73
    4.1.1 Mathematical preliminaries: Distributions and operator-valued distributions 73
    4.1.2 The Wightman axioms 78
    4.1.3 Wightman functions 80
    4.1.4 Important theorems 84
    4.1.5 Example: The free hermitean scalar field 90
    4.1.6 Haag-Ruelle scattering theory 94
  4.2 The Haag-Kastler formulation of quantum field theory 95
    4.2.1 The algebraic approach to quantum theory 95
    4.2.2 The Haag-Kastler axioms 103
    4.2.3 Vacuum states in the Haag-Kastler framework 105

5 Constructive quantum field theory 107
  5.1 The Hamiltonian approach 107
    5.1.1 The (λφ⁴)₂-model as a Haag-Kastler model 107
    5.1.2 The physical vacuum for the (λφ⁴)₂-model 114
    5.1.3 The P(φ)₂-model and verification of some of the Wightman axioms 115
    5.1.4 Similar methods for other models 116
  5.2 The Euclidean approach 116
    5.2.1 Euclidean fields and probability theory 117
    5.2.2 An alternative method: The Osterwalder-Schrader theory 127
    5.2.3 The P(φ)₂-model as a Wightman model 127

A Hilbert space theory 130
  A.1 Direct sums and integrals of Hilbert spaces 130
  A.2 Self-adjoint operators and the spectral theorem 130

B Examples of free fields 135
  B.1 The (0, 0)-field (or scalar field) 135
  B.2 The (1/2, 1/2)-field (or vector field) 135
  B.3 The (1/2, 0)-field and the (0, 1/2)-field 137
  B.4 The (1/2, 0) ⊕ (0, 1/2)-field (or Dirac field) 138

References 140

Popular summary (English) 142

Popular summary (Dutch) 143
1 Introduction
Quantum field theory (QFT) is the physical theory that emerged when physicists tried to construct
a quantum theory that would be compatible with special relativity. Although this theory is used
to describe high-energy sub-atomic particles, the fundamental objects of the theory are fields.
The theoretical predictions that can be made with QFT have been tested several times against
experimental results, and these predictions turned out to be highly accurate. For this reason, QFT
should be viewed as a very important physical theory.
When I first came to study QFT, there were some aspects of the theory that puzzled me. For
example, I was very used to the systematic approach of non-relativistic quantum mechanics, where
for each system one can specify very precisely in what Hilbert space the physical states live. In fact,
in non-relativistic quantum mechanics the choice of Hilbert space is completely determined by the
number of degrees of freedom, by virtue of the Stone-Von Neumann theorem. However, in QFT
the only systems for which the Hilbert spaces are specified are the free systems, i.e. the systems
that describe particles that do not interact with each other. I found it very confusing that in QFT
one describes a physical system quantum mechanically without ever mentioning the Hilbert space.
After all, in non-relativistic quantum mechanics the starting point for any quantum mechanical
description was the Hilbert space. Another thing that I found very difficult about QFT is the
fact that it is built around perturbation theory. In non-relativistic quantum mechanics one often
starts with an exact theory for describing a particular system and then uses perturbation theory
to approximate the solution for the system. In QFT the starting point is already a perturbative
expression, such as the Dyson series, without any mention of what exact mathematical expression
we are approximating.
In my first meeting with professor Dijkgraaf I explained to him that I had some difficulties
in understanding QFT and that I would like to work on a master thesis that would take these
difficulties away. It was in this context that we came up with the idea to examine the mathematical
structure of QFT in terms of the Wightman framework and the Haag-Kastler framework. Professor
Dijkgraaf emphasized that he wanted me to motivate these mathematical frameworks by using
arguments from physics, and that he also wanted to see some results from constructive quantum
field theory (CQFT). Besides these two demands, I was completely free to organize the thesis
according to my own taste.
This master's thesis should be accessible to anyone with a working knowledge of real and complex
analysis, measure and integration theory, functional analysis, operator algebras, and Lie groups and Lie
algebras. Since I have followed several courses¹ in these fields during my study, I did not bother
explaining anything concerning these topics. For instance, I do not give the definition of C∗-algebras and von Neumann algebras, and I do not explain the Gelfand-Naimark-Segal construction
for general C∗-algebras or the Gelfand-Naimark theorem for abelian C∗-algebras. On the other
hand, since I was not familiar with the theory of distributions and operator-valued distributions,
I did write a subsection on these subjects. Furthermore, since the theory of unbounded operators
on a Hilbert space is not part of any course that one can follow at the University of Amsterdam, I
also included an appendix on unbounded (self-adjoint) operators. Strictly speaking, no knowledge
of physics is required for reading this thesis, although the material would be more natural if one
already has some background knowledge in quantum physics.
1.1 Overview
This thesis consists of four chapters, excluding the present introduction. These chapters consist of
several sections, some of which are further decomposed into subsections. The content of each of
the four chapters can be briefly summarized as follows.
¹ In the national Mastermath program there were two excellent courses given by M. Müger and N.P. Landsman
on C∗-algebras and operator algebras, respectively. I am very grateful to these two teachers from the Radboud
Universiteit Nijmegen, not only for giving these courses, but also for helping me very much in
finding good literature for this thesis.
In chapter 2 we will consider the two main ingredients of quantum field theory, namely special
relativity and quantum theory. The first section of this chapter, on special relativity, commences
with the introduction of certain physical concepts such as inertial observers and the invariance
of the speed of light. These physical concepts will then be used to motivate the structure of the
mathematical model that will be used for the description of spacetime. In the remainder of the
section we will investigate the properties of this mathematical model, including a detailed discussion
of the Poincaré group and its universal covering. In the second section, on quantum theory, we
define the notion of a state and an observable from the physical point of view. We will then explain
that in quantum theory these states and observables are represented mathematically in terms of
Hilbert space objects, and we will consider time evolution in both the Heisenberg picture and
Schrödinger picture. Because any quantum theory that is consistent with special relativity should
be invariant under Poincaré transformations, the next logical step is to study how symmetries are
implemented in a quantum theory. The important result in this context is Wigner’s theorem, which
states that, in a quantum system without superselection rules, any symmetry can be represented
either by a unitary operator or by an anti-unitary operator. Wigner’s theorem can then be applied
to Poincaré transformations, which will eventually lead to the conclusion that the Hilbert space
of any relativistic quantum system must contain a unitary representation of the universal covering
group of the restricted Poincaré group. In case this representation is irreducible, the corresponding
Hilbert space is interpreted as the pure state space of a single-particle system. The concepts of
mass and spin of a particle arise very naturally in the study of these irreducible representations.
In the last part of the section we will construct the Hilbert spaces for systems consisting of more
than one particle. These spaces are tensor products of single particle Hilbert spaces, but in general
not all states in these tensor products are physically realizable. Finally, we will consider spaces in
which the total number of particles is not constant.
In chapter 3 we will give a brief overview of the use of quantum fields in physics. The concept
of a quantum field will be introduced as a computational tool in the perturbative calculations that
are carried out in the quantitative description of experiments in which high-energy particles collide
with each other. This point of view is adopted from Weinberg’s book [35]. Once the quantum
fields are defined, we will sketch how they can be used to compute physically interesting quantities.
These computations are done by using perturbation theory, which is made somewhat easier by the
use of Feynman diagrams and the corresponding Feynman rules. However, we will not explain
the precise content of the Feynman rules, nor will we consider the process of renormalization
which is necessary to transform infinite quantities into finite ones. After considering methods of
computation in quantum field theory, we will show how one can obtain a quantum field theory from
a classical Lagrangian field theory, since this is the route that is followed in practice. Finally, we
will try to motivate what aspects of the physical theory should be included in any mathematically
rigorous treatment of quantum fields.
In chapter 4 we will discuss two different axiom systems for quantum field theory. In the first
section of this chapter we will consider the Wightman axioms. These axioms will be motivated
by the physical structure of quantum field theory as described in chapter 3. However, before
we can formulate the Wightman axioms, we first need to study the theory of distributions and
operator-valued distributions. After these mathematical concepts have been studied and after the
axioms have been formulated, we will prove several properties that are shared by all Wightman
theories. Among these properties are the spin-statistics theorem and the PCT-theorem, which are
very important from the physical point of view. As an easy example of a Wightman theory we
will consider the free hermitean scalar field. We will treat this example in some detail, because the
results will also be necessary when we will construct an interacting quantum field theory in the
next chapter. To close our discussion of the Wightman theory, we will show that under certain
conditions the Wightman theories allow an interpretation in terms of particles. These conditions
define what is called the Haag-Ruelle scattering theory. In the second section of chapter 4 we
will consider an alternative axiom system, namely the Haag-Kastler system. Again we will try to
motivate the content of the axioms by using our knowledge from previous discussions. Because the
Haag-Kastler systems are formulated in terms of abstract algebras rather than concrete Hilbert
spaces, we will begin with a treatment of algebraic quantum theory. This treatment includes topics
such as physical representations, superselection rules and symmetries. After we have considered
algebraic quantum theory in some detail, we will formulate the Haag-Kastler axioms and we will
consider some important results concerning Haag-Kastler systems.
In chapter 5 we will describe some of the early developments in the constructive quantum field
theory program, which started in the 1960s. In constructive quantum field theory (CQFT) the
purpose is to construct concrete examples of non-trivial models that satisfy all axioms of one or
both of the two axiom systems mentioned above. This is a very difficult task, and for that reason
people started with the simplest possible models in 2- and 3-dimensional spacetime. We will consider
some of the results that were obtained for these 'easy' models. Because the detailed proofs of
these results are almost always very technical (and not very fun), we will only focus on the main
arguments and constructions.
1.2 Conventions and notation
Physicists and mathematicians often use very different notations for the same mathematical object, so we had to make some choices. The most important conventions and notations that we will
use are the following.
- We will use the Einstein summation convention: if in some equation a Greek letter appears once
as a lower index and once as an upper index, then that index should be summed over all possible
values of the index.
- Inner products ⟨·, ·⟩ : V × V → C on a complex vector space will always be linear in the first
argument and conjugate-linear in the second. This coincides with the convention used in the mathematics literature on linear algebra and functional analysis, but is opposite to the convention used
in the physics literature on quantum theory. In particular, the bra-ket notation of Dirac will not
be used here.
- The complex conjugate of a complex number z is denoted by z̄ (not by z∗) and the adjoint of an
operator A is denoted by A∗ (not by A†).
1.3 Acknowledgements
I would like to thank professor Dijkgraaf for all his time and effort in supervising this thesis. I
really appreciate that he always took the time to answer my questions, despite the fact that his
agenda was always overfull during his term as president of the KNAW. Even when he became
director of the Institute for Advanced Study (IAS) in Princeton, he still made time for me during
the short periods that he was in the Netherlands. In this context I would also like to thank his
personal assistant Ms. Corina de Boer, who always arranged my appointments with the professor
and who brought me into contact with professor Dijkgraaf in the first place. I am also very grateful
to the second reader of this thesis, doctor Posthuma, for all his time spent on reading the text and
for all his feedback.
I would also like to thank my family for all their support during my study. My parents always
inspire and motivate me in everything I do, and, as the youngest child, I have the privilege that I
can always take an example from my brother and sister.
2 Special relativity and quantum theory
In this chapter we will describe the two main ingredients for quantum field theory, namely special
relativity and quantum theory. At the end of this chapter we will combine the two theories to
obtain the general framework for a relativistic quantum theory. This chapter is rather long and
detailed, because it is only after we have developed a proper comprehension of relativistic quantum
theory that we can introduce quantum fields.
2.1 Special relativity
In the absence of gravity, freely moving macroscopic objects (i.e. macroscopic objects on which no
external agencies act) have constant relative velocities. This empirical fact allows us to define a
special class of observers, namely those for which all freely moving macroscopic objects move with
constant velocities. Such observers are called inertial observers. For convenience, we assume that
all inertial observers construct an orthogonal three-dimensional coordinate system in precisely the
same manner. For example, we might agree that the origin of this coordinate system is always the
center of mass of the observer’s body and that the x¹-axis runs from left to right, the x²-axis runs
from back to front and the x³-axis runs from down to up; note that this defines a right-handed
coordinate system. Distances along any of these axes are measured by using light rays. We also
assume that all inertial observers are equipped with the same kind of clock. Thus, we may in
fact define an inertial observer to be a right-handed three-dimensional coordinate system moving
through space with constant velocity relative to all freely moving macroscopic objects, together
with a clock at the origin. Two such coordinate systems that coincide but carry clocks that do not
have the same t = 0 moment, are considered as different inertial observers. By using light rays, any
two clocks that are at rest with respect to the same inertial observer can be synchronized in the
familiar way. We may therefore imagine that there is a clock at every point of the coordinate system
of any inertial observer and that all these clocks are synchronized. This allows a coordinatization
of space and time by four numbers (t, x¹, x², x³) for any inertial observer.
It is an empirical fact that all inertial observers are physically equivalent in the sense that
they all obtain the same outcome whenever they conduct the same experiment. This is called the
principle of special relativity. In order to fully understand this principle, we first introduce some
terminology concerning physical experiments; this terminology is borrowed from chapter 1 of [1].
The part of the physical world that is studied in a particular measurement is called the measured
object under consideration and the measurements that can be carried out on the measured object
are called physical quantities. In any measurement process, the measured object is interacting
for some time with a measuring apparatus, after which the interaction stops and the measuring
apparatus immediately indicates the measured value. In any measurement, the measured object is
prepared during a preparation process. By definition, this preparation process ends at the moment
when the interaction between the measured object and the measuring apparatus stops and when
the measuring apparatus indicates the measured value. In what follows, we will denote measured
objects by Greek letters α, β, . . ., and these Greek letters should be interpreted as a complete
description of how the measured object is prepared, in terms of the space and time coordinates of
all constituent parts of the measured object during the preparation process. We will often denote
these spacetime coordinates symbolically as α(x). Physical quantities will be denoted by capital
letters A, B, . . ., and these should be interpreted as a complete description of how to perform a
certain measurement process, in terms of the space and time coordinates of all constituent parts
of all measuring apparatuses during the measurement process; by definition, the measurement
process ends at the moment that the measured value is produced, i.e. at the same moment that
the preparation process for the measured object ends. As for measured objects, we will often write
A(x) to symbolically denote the spacetime coordinates of all parts of the measuring apparatuses.
The outcome of the measurement of a physical quantity A for a measured object α, i.e. the
measured value, is denoted by M(α, A). Measured values are assumed to be Borel subsets of
Rⁿ, where n = n_A depends on the physical quantity A. The reason that measured values are
Borel subsets of Rⁿ, rather than elements of Rⁿ, is that measurements always involve some errors.
Consider now some inertial observer O measuring the physical quantity A(y) for the measured
object α(x), where x and y symbolically represent the spacetime coordinates of all parts of the
measured object and of all parts of the measuring apparatuses, respectively. Note, in particular,
that this implies that x and y are such that the end of the preparation process α(x) and the end
of the measuring process A(y) coincide. We say that a second inertial observer O′ carries out
a similar experiment as observer O if this second observer measures the physical quantity A(y′)
for the measured object α(x′), where x′ and y′ are the coordinates with respect to O′ and these
coordinates have the same numerical values as the coordinates x and y. Now consider the situation
where N different inertial observers carry out similar experiments. If N (B) denotes the number
of observers that find a measured value in the Borel subset B ⊂ Rⁿ, then it is an empirical fact
that for any such B the fraction N (B)/N approaches some definite value as N becomes large
enough. This suggests that the similar experiments carried out by the different inertial observers
should be interpreted as repetitions of the same probabilistic experiment. This is the form of the
principle of special relativity that we will need in the following. In most texts on special relativity
this principle is not stated in probabilistic form but in deterministic form, since these texts are
often concerned only with classical dynamics. The deterministic form is stated by the equation
M(α(x), A(y)) = M(α(x′), A(y′)), i.e. the measured values are the same for all inertial observers
carrying out a similar experiment. In other words, in deterministic form the principle of special
relativity can be loosely stated as follows: inertial observers carrying out similar experiments will
obtain the same measured value.
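The stabilization of the fraction N(B)/N can be illustrated with a toy probabilistic experiment. In the sketch below the Gaussian outcome distribution is purely an illustrative assumption, standing in for whatever distribution the real measured values have:

```python
import random

random.seed(0)  # fixed seed so the toy experiment is reproducible

def measured_value():
    # Hypothetical measurement outcome standing in for M(alpha, A); the
    # standard Gaussian distribution is an illustrative assumption.
    return random.gauss(0.0, 1.0)

def frequency(N, B=(-1.0, 1.0)):
    # The fraction N(B)/N of observers whose outcome lies in the Borel set B,
    # here the interval [-1, 1].
    hits = sum(B[0] <= measured_value() <= B[1] for _ in range(N))
    return hits / N

# As N grows, N(B)/N stabilizes near a definite value (about 0.683 for one
# standard deviation of a standard Gaussian).
small, large = frequency(100), frequency(100_000)
assert abs(large - 0.683) < 0.02
```

The fraction for N = 100 still fluctuates noticeably from run to run; the content of the principle is that the large-N limit is the same definite value for all inertial observers carrying out similar experiments.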
When we apply (the deterministic form of) the principle of special relativity to experiments in
classical electromagnetism, it follows that all inertial observers measure the same speed of light.
This fact can be used to derive the coordinate transformations between different inertial frames,
as explained in any introductory text on special relativity. Here we will merely state the results
for later reference. We already mentioned above that an inertial observer can coordinatize the
points of spacetime by four numbers (x⁰, x¹, x², x³), where x⁰ = t represents the time coordinate²
and the other components represent the spatial coordinates. Therefore, in any particular inertial
frame O, spacetime can be identified with the four-dimensional vector space R⁴. If a second
inertial observer O′ is at rest with respect to the observer O and is standing at the point in space
with spatial coordinates (a¹, a², a³) with respect to the frame O and if O′ is oriented parallel to
observer O, then any point in space with coordinates (x¹, x², x³) with respect to O has coordinates
((x′)¹, (x′)², (x′)³) = (x¹ − a¹, x² − a², x³ − a³) with respect to O′. Furthermore, if the time zero
moment (x′)⁰ = 0 of the clock of observer O′ takes place at time x⁰ = a⁰ with respect to O, then
the time coordinate of any point in spacetime with respect to O′ is (x′)⁰ = x⁰ − a⁰, where x⁰ is the
time coordinate of the point with respect to O. Thus, in this case the coordinates with respect to
O′ are related to those with respect to O by

(x′)^μ = x^μ − a^μ.    (2.1)

Such coordinate transformations are called spacetime translations. Now consider another observer
O′′ that is standing at the same point in space as observer O with a clock that is synchronized
with the clock of O, but suppose that the orientation of O′′ is obtained from the orientation of O
by rotating counterclockwise (as seen from a point with positive x³-coordinate) over an angle θ in
the x¹x²-plane. Then the coordinates with respect to O′′ of a point in spacetime are related to
those with respect to O by

((x′′)⁰, (x′′)¹, (x′′)², (x′′)³) = (x⁰, x¹ cos θ + x² sin θ, −x¹ sin θ + x² cos θ, x³).    (2.2)
A very similar expression is obtained for rotations in the other two spatial planes. Rotations
around arbitrary axes are more complicated, but we will come back to them later. Coordinate
transformations of this form are called (spatial) rotations. Finally, consider yet another observer
O′′′ that has the same spatial orientation as O but moves with velocity v in the positive x¹-direction,
² We will always use units in which the speed of light c is equal to 1. Otherwise it would have been more natural
to define x⁰ = ct.
and suppose that at the unique moment where the two observers are at the same point in space
their clocks are both at time zero, i.e. x⁰ = (x′′′)⁰ = 0 at that moment. Then their coordinates
are related by

((x′′′)⁰, (x′′′)¹, (x′′′)², (x′′′)³) = (γ(v)(x⁰ − vx¹), γ(v)(x¹ − vx⁰), x², x³),    (2.3)

where γ(v) = (1 − v²)^{−1/2} is the Lorentz factor. Similar expressions are obtained when the observer
O000 moves along one of the other two spatial axes. For more general directions, the expression
becomes more complicated as we will discuss later. These coordinate transformations are called
(Lorentz) boosts. It should be clear that the coordinate transformations between any two inertial
frames can be obtained by a composition of translations, rotations and boosts.
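The transformations above lend themselves to a quick numerical check. The sketch below (plain Python; the helper names are ad hoc) writes the rotation (2.2) and the boost (2.3) as 4×4 matrices, verifies that a boost is undone by the opposite boost, and verifies that both transformations preserve the combination (x⁰)² − (x¹)² − (x²)² − (x³)², whose role is taken up in the next subsection:

```python
import math

def boost_x1(v):
    # Lorentz boost (2.3) along the x^1-axis, written as a 4x4 matrix (c = 1).
    g = 1.0 / math.sqrt(1.0 - v * v)  # the Lorentz factor gamma(v)
    return [[g, -g * v, 0.0, 0.0],
            [-g * v, g, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def rotation_x1x2(theta):
    # Spatial rotation (2.2) in the x^1 x^2-plane.
    c, s = math.cos(theta), math.sin(theta)
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, c, s, 0.0],
            [0.0, -s, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def apply(M, x):
    # Matrix-vector multiplication: the transformation in active form.
    return [sum(M[i][j] * x[j] for j in range(4)) for i in range(4)]

def interval(x):
    # The combination left unchanged by rotations and boosts.
    return x[0] ** 2 - x[1] ** 2 - x[2] ** 2 - x[3] ** 2

x = [2.0, 0.5, -1.0, 3.0]

# Boosting with velocity v and then with -v returns the original coordinates.
y = apply(boost_x1(-0.6), apply(boost_x1(0.6), x))
assert all(abs(yi - xi) < 1e-12 for yi, xi in zip(y, x))

# Rotations and boosts both preserve (x^0)^2 - (x^1)^2 - (x^2)^2 - (x^3)^2.
z = apply(rotation_x1x2(0.3), apply(boost_x1(0.6), x))
assert abs(interval(z) - interval(x)) < 1e-12
```

General rotations and boosts are compositions of such matrices, so the same invariance holds for any coordinate transformation built from them.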
In the situations above, where an inertial observer passes from the coordinates of his own frame
to the coordinates of another inertial observer’s frame, in order to see how the other observer
observes all objects in spacetime, we speak of a passive transformation. However, the observer
can obtain the same result by keeping his own coordinates and by transforming all objects in
spacetime. For example, observer O above could translate all objects in spacetime over the vector
−a^μ to obtain the same point of view as observer O′. Such transformations are called active
transformations. From the active viewpoint, the principle of special relativity may be stated as
follows: if we move all objects in spacetime (along with all measuring apparatuses) according to
a transformation that relates two inertial frames, then the outcome of any measurement remains
unchanged. Note that, in either form, this principle makes it possible to ’repeat’ an experiment at
different times and places, but also at different velocities.
Our goal for the rest of this section is to formulate a mathematical theory of space and time
that agrees with the data described above. In the first subsection we will give the mathematical
definition of spacetime and in the second subsection we will describe the transformations between
different inertial reference frames.
2.1.1 Minkowski spacetime
As discussed above, each inertial observer can identify spacetime with the vector space R⁴ and
the coordinate transformations between different inertial frames are generated by translations,
rotations and boosts. However, we would like to define spacetime mathematically in a manner
that describes its intrinsic properties, independent of a choice of inertial frame. For example,
since two inertial frames that are related by a spacetime translation have different origins for their
coordinate frames, it is clear that spacetime should not be represented mathematically by a vector
space, but rather by an affine space. Furthermore, under any coordinate transformation between
inertial frames the quantity
(∆x) · (∆x) = ((∆x)⁰)² − ∑_{j=1}^{3} ((∆x)ʲ)²,    (2.4)
is left invariant, so this quantity should play an important role in the mathematical definition
of spacetime; here ∆x denotes the difference between two points in spacetime. In fact, we can
identify the quantity in (2.4) as some kind of metric (analogous to the Euclidean metric describing
distances in Euclidean space) that acts on differences of two spacetime points. Thus, spacetime
should be represented by a four-dimensional affine space together with some kind of metric acting
on difference vectors, and the coordinate transformations between two inertial frames are then
represented by transformations that preserve the metric. Before defining the precise mathematical
model for spacetime (which will be called Minkowski spacetime, or simply Minkowski space), we
will first recall the definition of an affine space and of symmetric nondegenerate bilinear forms.
Definition 2.1 An affine space is a triple (A, V, `) consisting of a set A, a vector space V and a
map ` : V × A → A such that
(1) `(0, a) = a for all a ∈ A;
(2) `(v, `(w, a)) = `(v + w, a) for all v, w ∈ V and a ∈ A;
(3) for each a ∈ A the map `a : V → A defined by `a (v) = `(v, a) is a bijection.
The dimension of the affine space is defined to be the dimension of the vector space V .
Note that (3) implies that A and V are in bijective correspondence as sets. Instead of `(v, a) we also write
v + a. If a1 , a2 ∈ A then, according to condition (3) in the definition, there exists a unique v ∈ V
such that a2 = `a1 (v) = `(v, a1 ) = v + a1 , which we rewrite as a2 − a1 = v. In this sense, we
can subtract points in A to obtain elements in V , so V can be interpreted as the set of differences
between points in A.
Recall from (multi-)linear algebra that a multilinear map T : (V ∗ )×k × V ×l → R on a vector
space V is called a (k, l)-tensor on V . If T : V × V → R is a (0, 2)-tensor on a real vector space
V , then we say that T is symmetric if T (v, w) = T (w, v) for all v, w ∈ V and antisymmetric
if T (v, w) = −T (w, v) for all v, w ∈ V . If T is either symmetric or antisymmetric, then T is
called nondegenerate if T (v, w) = 0 for all w ∈ V implies that v = 0. The following theorem on
nondegenerate symmetric bilinear forms, which we will state without proof, will be very useful for
our purposes.
Theorem 2.2 Let V be an n-dimensional real vector space on which a nondegenerate symmetric
(0, 2)-tensor T : V × V → R is defined. Then there exists a basis {e_j}_{j=1}^{n} of V such that T (ei , ej ) = 0 if i ≠ j and T (ei , ei ) = ±1 for i = 1, . . . , n. Moreover, the number of basis vectors ei for which
T (ei , ei ) = −1 is the same for any such basis (this number is called the index of T ).
Such a basis as described in the theorem is called an orthonormal basis with respect to T . Now
that we have considered affine spaces and symmetric nondegenerate bilinear forms, we can define
our mathematical model for spacetime as follows.
Definition 2.3 Let (M, V, `) be a 4-dimensional real affine space equipped with a nondegenerate
symmetric (0, 2)-tensor η : V × V → R of index 3. Then η is called a Minkowski metric on M,
and (M, V, `, η) is called Minkowski spacetime.
The action of η on two vectors in V , η(v, w) = v · w, is called the inner product (or scalar/dot
product) of v and w. Two vectors v, w ∈ V for which v · w = 0 are called orthogonal. The norm
of a vector v ∈ V is v · v (not the square root of this quantity), and we say that a vector v ∈ V
is timelike if its norm is positive, lightlike or null if its norm is zero and spacelike if its norm is
negative.
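The index in theorem 2.2 and the timelike/null/spacelike classification can be spot-checked numerically. The following sketch (numpy; the sample vectors are our own illustrative choices, not notation from the text) computes the index of the Minkowski metric and the norms of three vectors:

```python
import numpy as np

# Minkowski metric in an orthonormal basis; by theorem 2.2 its index
# (the number of -1 eigenvalues) is 3, independent of the chosen basis.
eta = np.diag([1.0, -1.0, -1.0, -1.0])
index = int(np.sum(np.linalg.eigvalsh(eta) < 0))
print(index)  # 3

def norm(v):
    # eta(v, v): positive for timelike, zero for null, negative for spacelike
    return v @ eta @ v

timelike = np.array([2.0, 1.0, 0.0, 0.0])   # norm 4 - 1 = 3 > 0
null = np.array([1.0, 1.0, 0.0, 0.0])       # norm 1 - 1 = 0
spacelike = np.array([1.0, 2.0, 0.0, 0.0])  # norm 1 - 4 = -3 < 0
print(norm(timelike), norm(null), norm(spacelike))  # 3.0 0.0 -3.0
```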
Definition 2.4 We define the light cone Cx at a point x ∈ M in Minkowski spacetime by
Cx = {y ∈ M : η(y − x, y − x) = 0}.
If y ∈ M with η(y − x, y − x) > 0, we say that y lies inside the lightcone at x; if η(y − x, y − x) < 0,
we say that y lies outside the light cone at x.
Note that a point y ∈ M lies on the light cone at x if and only if y − x is a null vector, whereas
y ∈ M lies inside (or outside) the light cone at x if and only if y − x is a timelike (or spacelike)
vector.
We will denote Minkowski spacetime simply by M instead of (M, V, `, η). Furthermore, with
abuse of notation, more often than not we will denote the vector space V also by M. In fact,
from now on M will always denote the vector space V in the triple (M, V, `); whenever we are
considering the affine space M, we will state this explicitly. By theorem 2.2, there exists a basis
{e_µ}_{µ=0}^{3} of M with η(e0 , e0 ) = 1, η(ei , ei ) = −1 for i = 1, 2, 3 and η(eµ , eν ) = 0 if µ ≠ ν. We will
always work in an orthonormal basis from now on, and we will always label the orthonormal basis
vectors such that η(e0 , e0 ) = +1.
If {e^µ }_{µ=0}^{3} denotes the dual basis of {eµ }, i.e. {e^µ } is a basis of the dual vector space M∗ of
M such that e^µ (eν ) = δ^µ_ν , then we can write η = ηµν e^µ ⊗ e^ν , where ηµν = η(eµ , eν ). Of course, in
our particular choice of basis we have η00 = −η11 = −η22 = −η33 = 1 and all other coefficients are
0. The metric η : M × M → R defines a map

η̃ : M → M∗ ,    v ↦ η(eµ , v)e^µ .

In physics it is more convenient to write η̃(v) = vµ e^µ rather than η̃(v) = [η̃(v)]µ e^µ , and for this
reason physicists often refer to the map v ↦ η̃(v) as 'lowering the indices of v'; in their
notation this map is simply v^µ ↦ vµ = ηµν v^ν . Because η̃(e0 ) = e^0 and η̃(ej ) = −e^j for j = 1, 2, 3,
it is clear that η̃ has an inverse η̃^{−1} : M∗ → M and this inverse can be considered as a map
that 'raises indices': fµ ↦ f^µ . This map, in turn, defines a (2, 0)-tensor η^{−1} : M∗ × M∗ → R
on M given by η −1 (f, g) = g(η̃ −1 (f )) = f (η̃ −1 (g)) for f, g ∈ M∗ . In components we have
η^{−1}(f, g) = gµ f^µ = fµ g^µ . We call η^{−1} the inverse Minkowski metric and its nonzero components
η^{µν} = η^{−1}(e^µ , e^ν ) are η^{00} = −η^{11} = −η^{22} = −η^{33} = 1. Note that for f ∈ M∗ we can now
write the raised components f µ in terms of η µν as f µ = η µν fν . Also, for f, g ∈ M∗ we have
η µν fµ gν = fµ g µ = ηµν f µ g ν . For this reason we will often write, with abuse of notation, f ·g instead
of η −1 (f, g). Similarly, because for f ∈ M∗ and v ∈ M we have that f (v) = fµ v µ = ηµν f ν v µ , we
will also write f · v instead of f (v). In other words, because we can move vectors back and forth
between M and M∗ , we will not make a clear distinction between M and M∗ and we write all
scalars η(v, w), η −1 (f, g) and f (v) as a dot product.
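The lowering and raising of indices described here can be made concrete in components. A minimal numerical sketch (numpy; the component values are illustrative choices of ours):

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])  # eta_{mu nu} in an orthonormal basis
eta_inv = np.linalg.inv(eta)            # eta^{mu nu}; numerically equal to eta

v_up = np.array([2.0, 1.0, 4.0, 3.0])   # components v^mu of a vector
v_down = eta @ v_up                     # lowering: v_mu = eta_{mu nu} v^nu
print(v_down)                           # [ 2. -1. -4. -3.]

# Raising the index again recovers the original components.
print(np.allclose(eta_inv @ v_down, v_up))  # True

# f(v) = f_mu v^mu equals eta^{mu nu} f_mu v_nu, as in the dot-product notation.
f_down = np.array([1.0, 2.0, 3.0, 4.0])     # components f_mu of a covector
lhs = f_down @ v_up                         # f_mu v^mu
rhs = f_down @ eta_inv @ v_down             # eta^{mu nu} f_mu v_nu
print(np.isclose(lhs, rhs))  # True
```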
2.1.2 The Lorentz group, causal structure and the Poincaré group
As stated above, the coordinate transformations between different inertial frames are transformations that leave the Minkowski metric invariant. Therefore, we will now study such transformations.
Definition 2.5 A linear map L : M → M is said to be an orthogonal transformation of M if
η(Lv, Lw) = η(v, w) for all v, w ∈ M. An orthogonal transformation is also called a (general)
Lorentz transformation.
In this subsection we will investigate the properties of these Lorentz transformations. We will
also find that the Lorentz transformations form a Lie group, and we will study some important
properties of this Lie group.
Algebraic properties of Lorentz transformations and causal structure of M
The following discussion is largely based on the first chapter of [26]. If L : M → M is an orthogonal transformation and Lv = 0 for some v ∈ M, then for all w ∈ M we have η(v, w) =
η(Lv, Lw) = η(0, Lw) = 0, so that v must be the zero vector in M since η is nondegenerate. This
means that L is injective and hence we can conclude that orthogonal transformations are linear
automorphisms of M. In particular, an orthogonal transformation L is invertible and we have
η(L−1 v, L−1 w) = η(LL−1 v, LL−1 w) = η(v, w), so that its inverse is also an orthogonal transformation. We have the following characterization of orthogonal transformations:
Lemma 2.6 Let L : M → M be a linear map in Minkowski space. Then the following statements
are equivalent:
(a) L is an orthogonal transformation, i.e. η(Lv, Lw) = η(v, w) for all v, w ∈ M.
(b) η(Lv, Lv) = η(v, v) for all v ∈ M.
(c) L carries an orthonormal basis of M onto another orthonormal basis of M.
Proof
That (a) implies (b) is trivial. The opposite implication follows from the identity
4η(v, w) = η(v + w, v + w) − η(v − w, v − w)
= η(L(v + w), L(v + w)) − η(L(v − w), L(v − w))
= η(Lv + Lw, Lv + Lw) − η(Lv − Lw, Lv − Lw)
= 4η(Lv, Lw).
To see that (a) implies (c), let {eµ } be an orthonormal basis of M. Because L is an automorphism
of M, {Leµ } is also a basis of M, and from η(Leµ , Leν ) = η(eµ , eν ) it follows that {Leµ } is an
orthonormal basis of M. Finally, to prove that (c) implies (b), let {eµ } be an orthonormal basis
of M. By assumption, {Leµ } is also an orthonormal basis of M, so that3 η(Leµ , Leν ) = η(eµ , eν ).
This immediately implies that η(Lv, Lv) = η(v, v) for all v ∈ M.
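The polarization identity used in this proof is easy to spot-check numerically (a sketch with randomly chosen vectors; the helper name `ip` is our own):

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])

def ip(v, w):
    # eta(v, w) in an orthonormal basis
    return v @ eta @ w

rng = np.random.default_rng(1)
v, w = rng.normal(size=4), rng.normal(size=4)

# Polarization identity from the proof of lemma 2.6:
# 4 eta(v, w) = eta(v + w, v + w) - eta(v - w, v - w).
lhs = 4 * ip(v, w)
rhs = ip(v + w, v + w) - ip(v - w, v - w)
print(np.isclose(lhs, rhs))  # True
```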
If L : M → M is a linear map and {eµ } is an orthonormal basis of M, we define Lµ ν for
µ, ν ∈ {0, 1, 2, 3} by Lµ ν = (Leν )µ . Then Leν = (Leν )µ eµ = Lµ ν eµ , so for all v ∈ M we have
Lv = v ν Leν = v ν Lµ ν eµ , and hence the components of Lv can be expressed in terms of the Lµ ν
as (Lv)µ = Lµ ν v ν . In case L is an orthogonal transformation, the constants Lµ ν satisfy a special
property, namely,
ηµν = η(Leµ , Leν ) = η(L^ρ_µ eρ , L^σ_ν eσ ) = L^ρ_µ L^σ_ν η(eρ , eσ ) = L^ρ_µ L^σ_ν ηρσ    (2.5)
or, equivalently,
η^{µν} = L^µ_ρ L^ν_σ η^{ρσ} .    (2.6)
Conversely, it is also true that if L : M → M is a linear map such that the constants Lµ ν =
(Leν )µ (with {eµ } an orthonormal basis of M) satisfy (2.5), then L : M → M is an orthogonal
transformation. Thus, the identity above gives a characterization of orthogonal transformations
on Minkowski space M in terms of the constants Lµ ν = (Leν )µ . Of course, the Lµ ν are nothing
else than the matrix coefficients of L with respect to the orthonormal basis {eµ }, where µ denotes
the row index of the matrix [L] of L and ν denotes the column index. When we define the matrix
[η] by [η]µν = ηµν , we can write (2.5) in matrix form as
[L]T [η][L] = [η].
From this matrix identity, the following proposition follows immediately by taking the determinant
on both sides.
Proposition 2.7 Let L : M → M be an orthogonal transformation and let [L] be its matrix with
respect to some orthonormal basis {eµ } of M. Then det([L]) = ±1.
An orthogonal transformation for which the determinant of its matrix is +1 (or −1) is called proper
(or improper ).
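A concrete example: a boost with rapidity s (the value below is an arbitrary illustrative choice) satisfies the matrix identity [L]^T [η][L] = [η] and is proper. A numerical sketch:

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])

# A Lorentz boost in the x^1-direction with rapidity s.
s = 0.7
L = np.eye(4)
L[0, 0] = L[1, 1] = np.cosh(s)
L[0, 1] = L[1, 0] = np.sinh(s)

# The defining property [L]^T [eta] [L] = [eta] ...
print(np.allclose(L.T @ eta @ L, eta))  # True
# ... and this boost is proper (det = +1) with L^0_0 = cosh(s) >= 1.
print(np.isclose(np.linalg.det(L), 1.0), L[0, 0] >= 1)  # True True
```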
Because (1) the composition of two orthogonal transformations is again an orthogonal transformation, (2) the composition of linear maps on M is associative, (3) the identity map is orthogonal
and (4) the inverse4 of an orthogonal transformation is again an orthogonal transformation, it follows that the orthogonal transformations on M form a group under the composition of maps. This
group is called the Lorentz group and is denoted by L. Because the determinant is multiplicative,
the set of proper elements in L forms a subgroup of L. It is called the proper Lorentz group and
is denoted by L+ . In practice, we will often identify the Lorentz group L with the 4 × 4-matrices
Λ satisfying ΛT [η]Λ = [η]; the proper Lorentz group L+ is then identified with the set of those
elements Λ ∈ L that also satisfy det(Λ) = 1. Apart from the determinant being either +1 or −1,
the elements of L have another important property:
Proposition 2.8 Let L : M → M be an element of L. Then either L0 0 ≥ 1 or L0 0 ≤ −1.
Proof
Substitution of µ = ν = 0 in (2.5) gives (L^0_0 )^2 − Σ_{k=1}^{3} (L^k_0 )^2 = 1, or (L^0_0 )^2 = 1 + Σ_{k=1}^{3} (L^k_0 )^2 ≥ 1,
so that |L^0_0 | ≥ 1.
3 Here we use our convention that orthonormal bases are always labeled such that η(e0 , e0 ) = +1.
4 Note that it follows from the identity [L]^T [η][L] = [η] that the inverse matrix [L]^{−1} = [L^{−1}] of [L] can be expressed as [L]^{−1} = [η][L]^T [η].
An element L ∈ L is called orthochronous (or nonorthochronous) if L0 0 ≥ 1 (or L0 0 ≤ −1).
To understand the properties of orthochronous elements of L, we first need to define the notion of
a past and future. For this, we need the following theorem.
Theorem 2.9 Suppose that v ∈ M is timelike and w ∈ M is either timelike or else a nonzero
null vector. Let {eµ } be an orthonormal basis for M and write v = v µ eµ and w = wµ eµ . Then
either
(a) v 0 w0 > 0, in which case η(v, w) > 0, or
(b) v 0 w0 < 0, in which case η(v, w) < 0.
In particular, v^0 w^0 ≠ 0 and η(v, w) ≠ 0.
Proof
By assumption, (v^0 )^2 − (v^1 )^2 − (v^2 )^2 − (v^3 )^2 = η(v, v) > 0 and (w^0 )^2 − (w^1 )^2 − (w^2 )^2 − (w^3 )^2 = η(w, w) ≥ 0, so (v^0 w^0 )^2 = (v^0 )^2 (w^0 )^2 > ((v^1 )^2 + (v^2 )^2 + (v^3 )^2)((w^1 )^2 + (w^2 )^2 + (w^3 )^2) ≥ (v^1 w^1 + v^2 w^2 + v^3 w^3 )^2 , where we have used the Cauchy-Schwarz inequality for R^3 . Thus, we find that
|v^0 w^0 | > |v^1 w^1 + v^2 w^2 + v^3 w^3 |,
so in particular v^0 w^0 ≠ 0 and, moreover, η(v, w) ≠ 0. Suppose that v^0 w^0 > 0. Then v^0 w^0 =
|v^0 w^0 | > |v^1 w^1 + v^2 w^2 + v^3 w^3 | ≥ v^1 w^1 + v^2 w^2 + v^3 w^3 and so v^0 w^0 − v^1 w^1 − v^2 w^2 − v^3 w^3 > 0, i.e.
η(v, w) > 0. On the other hand, if v^0 w^0 < 0, then η(v, −w) > 0, so η(v, w) < 0.
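Theorem 2.9 can be illustrated with explicit vectors (the sample values below are our own):

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])

def ip(v, w):
    return v @ eta @ w

# Two timelike vectors whose time components have the same sign.
v = np.array([2.0, 1.0, 0.5, 0.0])   # ip(v, v) = 4 - 1 - 0.25 > 0
w = np.array([3.0, -1.0, 1.0, 2.0])  # ip(w, w) = 9 - 1 - 1 - 4 > 0
assert ip(v, v) > 0 and ip(w, w) > 0

# v^0 w^0 > 0, so theorem 2.9 predicts eta(v, w) > 0.
print(v[0] * w[0] > 0, ip(v, w) > 0)  # True True
```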
Corollary 2.10 If a nonzero vector in M is orthogonal to a timelike vector, then it must be
spacelike.
We denote by τ the collection of all timelike vectors in M and define a relation ∼ on τ by v ∼ w if
and only if η(v, w) > 0 (so that, by theorem 2.9, v 0 and w0 have the same sign in any orthonormal
basis). The relation ∼ on τ is an equivalence relation with exactly two equivalence classes. We
denote the two equivalence classes of ∼ on τ (in an arbitrary way) by τ + and τ − and refer to
the elements of τ + as future-directed timelike vectors, whereas we refer to the elements of τ − as
past-directed. Then, given some orthonormal basis {eµ }, we have that either v 0 < 0 for all v ∈ τ +
(and v 0 > 0 for all v ∈ τ − ) or else that v 0 > 0 for all v ∈ τ + (and v 0 < 0 for all v ∈ τ − ). This
follows immediately by considering the equivalence class of the timelike vector e0 . Clearly, both
τ + and τ − are cones, which means that if v, w ∈ τ ± and r > 0, then rv ∈ τ ± and v + w ∈ τ ± .
Now that we have obtained the notion of past- and future-directed timelike vectors, we will
define past- and future-directed null vectors. For this we need the following lemma.
Lemma 2.11 If n ∈ M is a nonzero null vector, then n · v has the same sign for all v ∈ τ + .
Proof
Suppose that v1 , v2 ∈ τ + with n · v1 < 0 and n · v2 > 0. Define v1′ := (n · v2 / |n · v1 |) v1 . Because n · v2 > 0, it
follows from the fact that τ + is a cone that v1′ ∈ τ + , and we have n · v1′ = (n · v2 / |n · v1 |) n · v1 = −n · v2 .
Thus 0 = n · v1′ + n · v2 = n · (v1′ + v2 ). But again using the fact that τ + is a cone,
v1′ + v2 ∈ τ + ; in particular v1′ + v2 is timelike. Since n is nonzero and null this contradicts corollary
2.10.
From this lemma it follows that the following definition makes sense.
Definition 2.12 Let n ∈ M be a nonzero null vector. Then n is called future-directed if n · v > 0
for all v ∈ τ + and past-directed if n · v < 0 for all v ∈ τ + .
Proposition 2.13 Let n1 , n2 ∈ M be two nonzero null vectors. Then n1 and n2 have the same
time orientation (i.e. future-directed or past-directed) if and only if (n1 )0 has the same sign as
(n2 )0 relative to any orthonormal basis for M.
Proof
Suppose that (n1 )0 and (n2 )0 have the same sign with respect to any orthonormal basis. Choose
an arbitrary orthonormal basis {eµ } and let λ ∈ {−1, 1} be such that v := λe0 ∈ τ + . Then the
two inner products v · n1 = λ(n1 )0 and v · n2 = λ(n2 )0 have the same sign by assumption. By the
previous lemma, this is not only true for v, but for all vectors in τ + . Thus, n1 and n2 have the
same time orientation. For the converse statement, assume that n1 and n2 have the same time
orientation and let {eµ } be an orthonormal basis. Again we let λ ∈ {−1, 1} be such that λe0 ∈ τ + .
Because n1 and n2 have the same time orientation, λe0 · n1 and λe0 · n2 have the same sign, which
implies that (n1 )0 and (n2 )0 have the same sign.
Definition 2.14 In the affine space M we define the future light cone at a point x ∈ M by
Cx+ = {y ∈ Cx : y − x is future-directed}
and the past light cone by
Cx− = {y ∈ Cx : y − x is past-directed}.
Now that we have introduced past- and future-directed timelike and null vectors, we can interpret
the orthochronous elements of L. This interpretation is given in the following theorem.
Theorem 2.15 Let L ∈ L and let {eµ } be an orthonormal basis for M. Then the following are
equivalent:
(a) L is orthochronous.
(b) L preserves the time orientation of all nonzero null vectors.
(c) L preserves the time orientation of all timelike vectors.
Before we can prove the theorem, we first need a little fact. Let L ∈ L. Substituting µ = ν = 0
in (2.6) gives (L^0_0 )^2 − Σ_{k=1}^{3} (L^0_k )^2 = 1 and so (L^0_0 )^2 > Σ_{k=1}^{3} (L^0_k )^2 . Now let v = v^µ eµ ∈ M be
either timelike or else null and nonzero, so (v^0 )^2 ≥ Σ_{k=1}^{3} (v^k )^2 (note that v^0 ≠ 0, since otherwise
v = 0). Using these two inequalities and the Cauchy-Schwarz inequality for R^3 (and the fact that
v^0 ≠ 0), we get

(Σ_{k=1}^{3} L^0_k v^k )^2 ≤ (Σ_{k=1}^{3} (L^0_k )^2)(Σ_{k=1}^{3} (v^k )^2) < (L^0_0 )^2 (v^0 )^2 = (L^0_0 v^0 )^2 .

We may rewrite this as

0 > (Σ_{k=1}^{3} L^0_k v^k )^2 − (L^0_0 v^0 )^2 = (Σ_{k=1}^{3} L^0_k v^k − L^0_0 v^0 )(Σ_{k=1}^{3} L^0_k v^k + L^0_0 v^0 )
  = −[(Σ_{k=0}^{3} L^0_k ek ) · v] L^0_µ v^µ = −(w · v)(Lv)^0 ,

where we have defined the timelike vector w = Σ_{k=0}^{3} L^0_k ek . Thus (w · v)(Lv)^0 > 0, so we conclude
that w · v and (Lv)^0 have the same sign. To summarize, if v ∈ M is either timelike or else null
and nonzero and if L ∈ L, then (Lv)^0 and w · v (with w the timelike vector defined above) have
the same sign. We will use this fact to prove the theorem.
Proof
Let v ∈ M be again a timelike or nonzero null vector and let w be the timelike vector defined
above.
Assume L0 0 ≥ 1 (L orthochronous). We separate two cases. In case v 0 > 0 we have w0 v 0 =
L0 0 v 0 > 0, so by theorem 2.9 we have v · w > 0. Thus (Lv)0 > 0, by the discussion above. In case
v 0 < 0 we have w0 v 0 = L0 0 v 0 < 0, so by theorem 2.9 we have v · w < 0. Thus (Lv)0 < 0, by the
discussion above. So we conclude that if L0 0 ≥ 1, then v 0 and (Lv)0 always have the same sign,
i.e. we have proved that (a) implies (b) and that (a) implies (c).
Assume L0 0 ≤ −1 (L nonorthochronous). We separate two cases. In case v 0 > 0 we have
w0 v 0 = L0 0 v 0 < 0, so by theorem 2.9 we have v · w < 0. Thus (Lv)0 < 0, by the discussion above.
In case v 0 < 0 we have w0 v 0 = L0 0 v 0 > 0, so by theorem 2.9 we have v · w > 0. Thus (Lv)0 > 0,
by the discussion above. So we conclude that if L0 0 ≤ −1, then v 0 and (Lv)0 always have opposite
signs, i.e. we have proved that (b) implies (a) and that (c) implies (a).
Corollary 2.16 If L ∈ L is nonorthochronous, it reverses the time orientation of all timelike and
nonzero null vectors.
It is now clear how to interpret the orthochronous elements of L: they are precisely those elements
of L that preserve the causal structure of Minkowski space. Now suppose that L1 , L2 ∈ L are
both orthochronous. If v ∈ M is a timelike or nonzero null vector that is future-directed (or past-directed), then by theorem 2.15 the vector w = L1 v is also a future-directed (or past-directed)
timelike or nonzero null vector in M, and by the same argument, so is L2 w = L2 L1 v. Thus, the
element L2 L1 ∈ L preserves the time orientation of all timelike and all nonzero null vectors. Using
theorem 2.15 again, we conclude that L2 L1 ∈ L is orthochronous. Furthermore, it is clear that
the identity map I : M → M is orthochronous since I 0 0 = 1. Finally, if L ∈ L is orthochronous,
and L−1 ∈ L would be nonorthochronous, then I = L−1 L would reverse the time orientation
of all timelike and all nonzero null vectors. This shows that L−1 must in fact be orthochronous
whenever L is orthochronous. Thus, the set of orthochronous elements of L forms a subgroup L↑
of L and is called the orthochronous Lorentz group. The intersection L↑+ := L+ ∩ L↑ of the two
subgroups L+ and L↑ of L is again a subgroup of L and is called the restricted Lorentz group. We
also define the subsets L− and L↓ , consisting of the Lorentz transformations L with det(L) = −1
and L0 0 ≤ −1, respectively. Of course these subsets cannot be subgroups of L since they do not
contain the identity element.
The Lie group structure of the Lorentz group
The Lorentz group L can be viewed as a subgroup of the matrix Lie group GL(4, R), and it
can in fact be shown that L is itself a six-dimensional matrix Lie group. It has four connected
components, namely L↑+ , L↓+ := L+ ∩ L↓ , L↑− := L− ∩ L↑ and L↓− := L− ∩ L↓ . Typical elements of L↑− , L↓− and L↓+ are the space-inversion Is , time-reversal It and spacetime-inversion
Ist = Is It , respectively, where Is is defined by Is (x0 , x1 , x2 , x3 ) = (x0 , −x1 , −x2 , −x3 ) and It is defined by It (x0 , x1 , x2 , x3 ) = (−x0 , x1 , x2 , x3 ). The transformation Is defines a bijection L↑− → L↑+
by L 7→ Is L. Similarly, It defines a bijection L↓− → L↑+ and Ist defines a bijection L↓+ → L↑+ . In
other words, the orthochronous Lorentz group L↑ = L↑+ ∪ L↑− is generated by L↑+ ∪ {Is } and the
proper Lorentz group L+ = L↑+ ∪ L↓+ is generated by L↑+ ∪ {Ist }. It also follows that the subgroup
L0 := L↑+ ∪ L↓− , called the orthochorous Lorentz group, is generated by L↑+ ∪ {It }.
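The classification of the four components by det(L) and the sign of L^0_0 can be checked on the discrete transformations Is, It and Ist. A numerical sketch:

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])
I_s = np.diag([1.0, -1.0, -1.0, -1.0])   # space inversion
I_t = np.diag([-1.0, 1.0, 1.0, 1.0])     # time reversal
I_st = I_s @ I_t                         # spacetime inversion

for name, L in [("I_s", I_s), ("I_t", I_t), ("I_st", I_st)]:
    assert np.allclose(L.T @ eta @ L, eta)  # all three are Lorentz
    print(name, int(round(np.linalg.det(L))), L[0, 0])
# I_s  : det = -1, L^0_0 = +1  -> lies in the improper orthochronous component
# I_t  : det = -1, L^0_0 = -1  -> improper nonorthochronous
# I_st : det = +1, L^0_0 = -1  -> proper nonorthochronous
```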
The Lie algebra l of the Lorentz group L is by definition the set of all transformations X :
M → M such that e^{tX} ∈ L for all t ∈ R, or, in terms of matrices, the set of all 4 × 4-matrices X
such that [η] = (e^{tX})^T [η] e^{tX}, which is equivalent to (e^{tX})^{−1} = [η]^{−1}(e^{tX})^T [η] = [η](e^{tX})^T [η]. But
(e^{tX})^T = e^{tX^T} and (e^{tX})^{−1} = e^{−tX}, so a 4 × 4-matrix X is in l if and only if

e^{−tX} = [η] e^{tX^T} [η] = e^{t[η]X^T[η]} ,

where we have used that for each 4 × 4-matrix M and each 4 × 4-matrix G satisfying G^2 = I we
have e^{tGMG} = Σ_{k=0}^{∞} t^k (GMG)^k / k! = Σ_{k=0}^{∞} t^k G M^k G / k! = G (Σ_{k=0}^{∞} t^k M^k / k!) G = G e^{tM} G. Thus, X ∈ l if
and only if [η]X^T[η] = −X, or X^T[η] + [η]X = 0. We have thus found that

l = {X ∈ M4 (R) : X^T [η] + [η]X = 0}.
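The condition X^T [η] + [η]X = 0, and the fact that e^{tX} is then a Lorentz transformation, can be verified numerically. A sketch using a boost generator and a truncated power series for the matrix exponential (our own helper; a library routine such as scipy's `expm` would do the same job):

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])

def expm(X, terms=40):
    # Truncated power series for the matrix exponential (sufficient here).
    result, term = np.eye(4), np.eye(4)
    for k in range(1, terms):
        term = term @ X / k
        result = result + term
    return result

# Generator of boosts in the x^1-direction, via the covariant formula
# (X_{mu nu})^a_b = delta^a_mu eta_{nu b} - delta^a_nu eta_{mu b}.
X01 = np.zeros((4, 4))
X01[0, 1] = X01[1, 0] = -1.0

# X satisfies the Lie algebra condition X^T [eta] + [eta] X = 0 ...
print(np.allclose(X01.T @ eta + eta @ X01, 0))  # True
# ... so e^{tX} is a Lorentz transformation for every t.
t = 0.3
L = expm(t * X01)
print(np.allclose(L.T @ eta @ L, eta))  # True
```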
Now let {eµν } be the standard basis of M4 (R) (i.e. (eµν )ρσ = 1 if (ρ, σ) = (µ, ν) and (eµν )ρσ = 0
for other values of ρ and σ) and define for j, k = 1, 2, 3 the matrices
Xjk = ekj − ejk
Xk0 = −X0k = ek0 + e0k .
The basis matrices eµν of M4 (R) cannot be defined covariantly with the lower indices µ and ν
acting as Lorentz indices, by which we mean that for L ∈ L the matrices e0ρσ := Lµ ρ Lν σ eµν are
not the same as the matrices eµν defined above5 . However, if we define X00 = 0 then the matrices
X00 , Xk0 and Xjk can be given by the covariant expression (Xµν )α β = δµα ηνβ − δνα ηµβ , which is
antisymmetric in µ and ν. The matrices Xµν satisfy the commutation relations
[Xµν , Xρσ ] = ηµρ Xσν + ηνρ Xµσ + ηνσ Xρµ + ηµσ Xνρ ,
(2.7)
which easily follow from writing out [Xµν , Xρσ ]α β by using the covariant expression for (Xµν )α β
above. In other words, we have [Xµν , Xρσ ] = 0 whenever the sets {µ, ν} and {ρ, σ} are either equal
or disjoint, and we have [Xµν , Xνσ ] = ηνν Xµσ if µ ≠ ν and ν ≠ σ. The matrices {Xµν }µ<ν form a
basis of l and their commutation relations are obtained from (2.7) by replacing Xκλ → −Xλκ on
the right-hand side whenever κ > λ. We mention furthermore that the elements of the form etXµν
generate L↑+ .
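The commutation relations (2.7) follow from the covariant expression for (Xµν)^α_β and can be verified by brute force over all index values. A numerical sketch (function names are ours):

```python
import numpy as np

eta = np.diag([1.0, -1.0, -1.0, -1.0])
delta = np.eye(4)

def X(mu, nu):
    # (X_{mu nu})^a_b = delta^a_mu eta_{nu b} - delta^a_nu eta_{mu b}
    return np.outer(delta[:, mu], eta[nu, :]) - np.outer(delta[:, nu], eta[mu, :])

def comm(A, B):
    return A @ B - B @ A

# Check (2.7) for every combination of indices.
for mu in range(4):
    for nu in range(4):
        for rho in range(4):
            for sigma in range(4):
                lhs = comm(X(mu, nu), X(rho, sigma))
                rhs = (eta[mu, rho] * X(sigma, nu) + eta[nu, rho] * X(mu, sigma)
                       + eta[nu, sigma] * X(rho, mu) + eta[mu, sigma] * X(nu, rho))
                assert np.allclose(lhs, rhs)
print("relations (2.7) verified")
```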
We will now focus on the restricted Lorentz group L↑+ . The restricted Lorentz group is a
connected six-dimensional Lie group that is not simply connected, i.e. not every closed path in
this group can be contracted continuously to a point. According to the theory of Lie groups, there
exists for each connected Lie group G a simply-connected Lie group G̃ together with a Lie group
homomorphism6 Φ : G̃ → G such that the associated Lie algebra homomorphism7 φ : g̃ → g is
a Lie algebra isomorphism. The Lie group G̃ is called a universal covering group of G and the
homomorphism Φ is called the covering homomorphism. The universal covering group is unique
in the following sense: if (G̃1 , Φ1 ) and (G̃2 , Φ2 ) are universal covers of G then there exists a Lie
group isomorphism Ψ : G̃1 → G̃2 such that Φ2 ◦ Ψ = Φ1 . The universal covering group L̃↑+ of
L↑+ is SL(2, C), the group of all 2 × 2 complex matrices with unit determinant. The covering
homomorphism Φ : SL(2, C) → L↑+ can be obtained as follows. Let H(2, C) be the set of 2 × 2
complex Hermitian matrices and, given an orthonormal basis {eµ } of M, define the R-linear
isomorphism ψ : M → H(2, C) by

ψ(x) = Σ_{µ=0}^{3} x^µ σ^µ = ( x^0 + x^3    x^1 − ix^2 )
                            ( x^1 + ix^2   x^0 − x^3  ),

where σ^0 is the 2 × 2 identity matrix and the σ^j with j = 1, 2, 3 are the Pauli matrices

σ^1 = ( 0 1 ),    σ^2 = ( 0 −i ),    σ^3 = ( 1  0 ).
      ( 1 0 )           ( i  0 )           ( 0 −1 )
Note that this definition of ψ depends on the choice of the orthonormal basis {eµ } of M; furthermore, once this choice of basis has been made (so that ψ is determined), we cannot use the
same formula to compute ψ(x) with respect to another orthonormal basis because the σ µ are not
supposed to transform in any way. The important property of ψ is that
det(ψ(x)) = (x^0 )^2 − Σ_{k=1}^{3} (x^k )^2 = x · x.
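The determinant identity for ψ is easy to confirm numerically (the sample components are our own choice):

```python
import numpy as np

# Pauli matrices, with sigma^0 the 2x2 identity.
sigma = [np.eye(2, dtype=complex),
         np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def psi(x):
    # psi(x) = sum_mu x^mu sigma^mu, a Hermitian 2x2 matrix
    return sum(x[mu] * sigma[mu] for mu in range(4))

x = np.array([2.0, 0.5, -1.0, 0.3])
H = psi(x)
print(np.allclose(H, H.conj().T))  # True: psi(x) is Hermitian

minkowski_norm = x[0]**2 - x[1]**2 - x[2]**2 - x[3]**2
print(np.isclose(np.linalg.det(H).real, minkowski_norm))  # True
```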
5 The eµν are given by (eµν )αβ = δµα δνβ , where δµν is the (non-covariant) Kronecker delta with lower indices. Note that it would in fact be possible to define a standard basis {eµ ν } by using the covariant Kronecker deltas δνµ .
6 This is a smooth map of Lie groups that is also a group homomorphism.
7 This is the unique map φ : g̃ → g satisfying Φ(e^X ) = e^{φ(X)} for all X ∈ g̃. The proof of the existence and uniqueness of such a map (and that such a map is a Lie algebra homomorphism, i.e. φ([X, Y ]_{g̃} ) = [φ(X), φ(Y )]_{g} for all X, Y ∈ g̃) can be found in [22], theorem 2.21.
If A ∈ SL(2, C) and X ∈ H(2, C), then (AXA∗ )∗ = (A∗ )∗ X ∗ A∗ = AXA∗ , so AXA∗ ∈ H(2, C).
Also, det(AXA∗ ) = det(A) det(X) det(A∗ ) = det(X), since for all A ∈ SL(2, C) we have det(A) =
det(A∗ ) = 1. Thus, each element A ∈ SL(2, C) defines a map ΨA : H(2, C) → H(2, C) by
ΨA (X) = AXA∗
that preserves the determinant. Under the correspondence ψ : M → H(2, C), this determinant-preserving map on H(2, C) corresponds to a norm-preserving linear map Φ(A) := ψ^{−1} ◦ Ψ_A ◦ ψ :
M → M given by

Φ(A)x = (ψ^{−1} ◦ Ψ_A ◦ ψ)(x^µ eµ ) = (ψ^{−1} ◦ Ψ_A )(Σ_{µ=0}^{3} x^µ σ^µ )
      = ψ^{−1}(A (Σ_{µ=0}^{3} x^µ σ^µ ) A∗ ) = ψ^{−1}(Σ_{µ=0}^{3} x^µ A σ^µ A∗ )
      = ½ Σ_{ν=0}^{3} Tr((Σ_{µ=0}^{3} x^µ A σ^µ A∗ ) σ^ν ) eν
      = ½ Σ_{ν=0}^{3} Σ_{µ=0}^{3} x^µ Tr(A σ^µ A∗ σ^ν ) eν ,

where we have used that the inverse of the R-linear isomorphism ψ : M → H(2, C) is given by
ψ^{−1} : X ↦ ½ Σ_{µ=0}^{3} Tr(X σ^µ ) eµ . Thus we obtain a map Φ : SL(2, C) → L, where

Φ(A)^µ_ν = ½ Tr(A σ^ν A∗ σ^µ ).
Note that the different index placement on both sides reflects the fact that the map Φ is not defined
in a covariant way. We can rewrite the equation Φ(A)x = (ψ −1 ◦ ΨA ◦ ψ)(x) = ψ −1 (Aψ(x)A∗ ) as
ψ(Φ(A)x) = Aψ(x)A∗ .
Using this equation, we find that for A, B ∈ SL(2, C) we have for each x ∈ M
ψ(Φ(AB)x) = ABψ(x)B ∗ A∗ = Aψ(Φ(B)x)A∗ = ψ(Φ(A)Φ(B)x).
Using the invertibility of ψ, we conclude that for each x ∈ M we have Φ(AB)x = Φ(A)Φ(B)x, and
thus that Φ(AB) = Φ(A)Φ(B). So Φ : SL(2, C) → L is a group homomorphism. It is also clear
from the formula for Φ(A)µ ν that this map is smooth, so Φ is in fact a Lie group homomorphism. In
particular, since Φ is continuous and SL(2, C) is a connected (even simply connected) Lie group,
the image Φ(SL(2, C)) ⊂ L must be connected. Because the identity of L is contained in this
image, the image must lie in the connected component of the identity, i.e. Φ(SL(2, C)) ⊂ L↑+ .
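The construction of Φ can be carried out numerically for a concrete A ∈ SL(2, C) (a diagonal, boost-like choice of ours), confirming that Φ(A) lands in L↑+ and that Φ(−A) = Φ(A):

```python
import numpy as np

sigma = [np.eye(2, dtype=complex),
         np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def Phi(A):
    # Phi(A)^mu_nu = (1/2) Tr(A sigma^nu A* sigma^mu)
    L = np.empty((4, 4))
    for mu in range(4):
        for nu in range(4):
            L[mu, nu] = 0.5 * np.trace(A @ sigma[nu] @ A.conj().T @ sigma[mu]).real
    return L

eta = np.diag([1.0, -1.0, -1.0, -1.0])

# An element of SL(2, C): det A = 1.
A = np.diag([np.exp(0.4), np.exp(-0.4)]).astype(complex)
assert np.isclose(np.linalg.det(A).real, 1.0)

L = Phi(A)
print(np.allclose(L.T @ eta @ L, eta))                   # True: L is a Lorentz transformation
print(np.isclose(np.linalg.det(L), 1.0), L[0, 0] >= 1)   # True True: L lies in the restricted group
print(np.allclose(Phi(-A), L))                           # True: A and -A have the same image
```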
The Lie group homomorphism Φ : SL(2, C) → L↑+ induces a homomorphism φ : sl(2, C) → l of
the associated Lie algebras. Note that because L↑+ is the connected component of the identity in
L, the Lie algebra associated to L↑+ coincides with the Lie algebra l associated with L. To see what
φ is, we first need some information about sl(2, C). The Lie algebra sl(2, C) consists of all complex
2 × 2-matrices with zero trace. It can be viewed as a three dimensional complex Lie algebra, but
for our purposes it is more convenient to consider it as a six-dimensional real Lie algebra. A basis
of sl(2, C) is given by the six matrices {(1/2)σj , (1/2i)σj }j=1,2,3 , where the σj denote the Pauli matrices.
Note that the (1/2i)σj 's span the Lie algebra su(2) ⊂ sl(2, C); they satisfy [(1/2i)σj , (1/2i)σk ] = (1/2i)σl , where
(j, k, l) is a cyclic permutation of (1, 2, 3). In terms of this basis for sl(2, C), the Lie algebra
homomorphism φ : sl(2, C) → l is given by

φ((1/2i)σj ) = Xkl    for (j, k, l) a cyclic permutation of (1, 2, 3),
φ((1/2)σk ) = X0k .
Since φ maps a basis of sl(2, C) onto a basis of l, it is clear that φ : sl(2, C) → l is an isomorphism
of Lie algebras, and by definition of φ we have
Φ(e^{(t/2i)σj }) = e^{tφ((1/2i)σj )} = e^{tXkl }    (2.8)
Φ(e^{(t/2)σk }) = e^{tφ((1/2)σk )} = e^{tX0k },
where in the first expression (j, k, l) is a cyclic permutation of (1, 2, 3). Because the elements
etXjk and etX0k on the right-hand sides generate L↑+ , it follows immediately that Φ is surjective.
Thus, Φ : SL(2, C) → L↑+ satisfies all the right properties of the universal covering map. We note
furthermore that the map Φ : SL(2, C) → L↑+ is two-to-one: for each L ∈ L↑+ the inverse image
Φ−1 (L) is a set of the form {A, −A}.
The Poincaré group
So far we have given the definition of Minkowski spacetime as a 4-dimensional affine space and we
have studied the group of transformations L : M → M with η(Lv, Lw) = η(v, w), where v and
w are elements in the vector space (and not the affine space) M. However, we are actually interested in transformations P : M → M of the affine space M that satisfy η(P x − P y, P x − P y) =
η(x − y, x − y) for all x and y in the affine space. Such transformations are called Poincaré transformations. In order to formulate the general form of a Poincaré transformation, note that if we
choose a fixed point x0 ∈ M in the affine space then we can write any x in the affine space as
x = x0 + (x − x0 ), where x − x0 lies in the vector space M. When we have agreed on such a point x0 ,
we can actually identify the affine space M with the vector space M by identifying a point x in
the affine space with the point x − x0 in the vector space; note that the point x0 in the affine space
is then identified with the origin of the vector space. With this identification, a general Poincaré
transformation can then be written as
Pa,L (x) = Lx + a,
with L a Lorentz transformation and a an element of the vector space M. If we take L to be the
identity map 1 ∈ L, we obtain the map Ta (x) := Pa,1 (x) = x + a, which is a spacetime translation. If we
take a to be the zero vector, we obtain the map P0,L (x) = Lx, which is a Lorentz transformation. A
general Poincaré transformation can thus be written as the composition of a Lorentz transformation
and a spacetime translation:
Pa,L (x) = Lx + a = (Ta ◦ L)(x).
From now on we will always write (Ta , L), or simply (a, L), to denote the Poincaré transformation
Pa,L . The composition of two Poincaré transformations (a1 , L1 ) and (a2 , L2 ) is again a Poincaré
transformation and its action is given by
((a1 , L1 ) ◦ (a2 , L2 ))x = (a1 , L1 )(L2 x + a2 ) = L1 (L2 x + a2 ) + a1 = L1 L2 x + (L1 a2 + a1 ),
so we have found the rule (a1 , L1 ) ◦ (a2 , L2 ) = (a1 + L1 a2 , L1 L2 ). In particular, we have for any
Poincaré transformation (a, L)
(0, 1) ◦ (a, L) = (a, L) ◦ (0, 1) = (a, L),
where 1 ∈ L is the identity map. Furthermore, for any isometry (a, L) we have
(a, L) ◦ (−L−1 a, L−1 ) = (−L−1 a, L−1 ) ◦ (a, L) = (0, 1).
This shows that the set of Poincaré transformations forms a group under composition of maps with
multiplication given by (a1 , L1 )(a2 , L2 ) = (a1 + L1 a2 , L1 L2 ), unit element (0, 1) and (a, L)−1 =
(−L⁻¹a, L⁻¹). This group is called the Poincaré group and is denoted by P. From the considerations above, the group P is a semi-direct product of the additive group R⁴ and the Lorentz group L. We obtain the subgroups P↑+, P+ and P↑ of P by demanding that the Lorentz transformation L in (a, L) ∈ P lies in L↑+, L+ or L↑, respectively. The subgroups P↑+, P+ and P↑ are called the restricted Poincaré group, the proper Poincaré group and the orthochronous Poincaré group, respectively. In a similar way we also define the subsets P↓+, P↑− and P↓− by demanding that L lies in L↓+, L↑− or L↓−.
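The composition and inverse rules derived above can be checked numerically. The following sketch (illustrative Python, not part of the thesis) represents a Poincaré transformation as a pair (a, L) and uses a boost in the x¹-direction as the sample Lorentz matrix; all variable names are illustrative.

```python
import math

def mat_vec(L, x):
    # apply a 4x4 matrix to a 4-vector
    return [sum(L[i][j] * x[j] for j in range(4)) for i in range(4)]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def boost(phi):
    # Lorentz boost in the x^1-direction with boost parameter phi
    c, s = math.cosh(phi), math.sinh(phi)
    return [[c, -s, 0, 0], [-s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def apply(t, x):
    # P_{a,L}(x) = Lx + a
    a, L = t
    return [v + ai for v, ai in zip(mat_vec(L, x), a)]

def compose(t1, t2):
    # (a1, L1)(a2, L2) = (a1 + L1 a2, L1 L2)
    (a1, L1), (a2, L2) = t1, t2
    return ([ai + v for ai, v in zip(a1, mat_vec(L1, a2))], mat_mul(L1, L2))

a = [1.0, 2.0, 0.0, -3.0]
g = (a, boost(0.5))
# inverse: (-L^{-1} a, L^{-1}); for a boost, L^{-1} is the boost with -phi
Linv = boost(-0.5)
ginv = ([-v for v in mat_vec(Linv, a)], Linv)

x = [0.3, -1.2, 4.0, 0.7]
# applying g then its inverse returns x, and compose(ginv, g) is the identity (0, 1)
y = apply(ginv, apply(g, x))
assert all(abs(yi - xi) < 1e-12 for yi, xi in zip(y, x))
a0, L0 = compose(ginv, g)
assert all(abs(v) < 1e-12 for v in a0)
assert all(abs(L0[i][j] - (1 if i == j else 0)) < 1e-12
           for i in range(4) for j in range(4))
```

The same helpers can be reused to check associativity of the group law for any three sample transformations.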
The Poincaré group P is a ten-dimensional real Lie group with connected components P↑+, P↓+, P↑− and P↓−. The Lie algebra p of the Poincaré group contains the Lie algebra l of the Lorentz
group as a Lie subalgebra. The Lie algebra p is spanned by the basis elements {Xµν }µ<ν of l
together with four elements {Yµ } with µ = 0, 1, 2, 3. The Lie bracket in p is given in terms of these
basis elements Xµν and Yµ by
[Xµν , Xρσ ] = ηµρ Xσν + ηνρ Xµσ + ηνσ Xρµ + ηµσ Xνρ
[Xµν , Yρ ] = −(ηνρ Yµ − ηµρ Yν )
[Yµ , Yν ] = 0.
Because the subgroup P↑+ of P is connected, it has a universal covering group P̃↑+. This universal covering group P̃↑+ consists of all pairs (a, A), where a ∈ M and A ∈ SL(2, C). The multiplication in P̃↑+ is given by

(a1, A1)(a2, A2) = (a1 + Φ(A1)a2, A1A2),

where Φ denotes the covering homomorphism Φ : SL(2, C) → L↑+. The covering homomorphism Π : P̃↑+ → P↑+ is given by

Π((a, A)) = (a, Φ(A)),

so that Π((a, A)) acts on any x ∈ M as

Π((a, A))x = Φ(A)x + a = (1/2) Σμ,ν Tr(A σν A* σμ) xμ eν + a.
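The action of Φ(A) can also be checked in the equivalent form x ↦ A x̃ A*, where x̃ = x⁰σ⁰ + x·σ is the Hermitian 2×2 matrix associated with x; since det x̃ equals the Minkowski norm of x and det A = 1, this exhibits Φ(A) as a Lorentz transformation. The sketch below (illustrative Python, equivalent to the trace formula above up to index conventions; the sample A is an arbitrary choice) verifies the invariance.

```python
import cmath

def mat2_mul(A, B):
    return [[A[0][0]*B[0][0] + A[0][1]*B[1][0], A[0][0]*B[0][1] + A[0][1]*B[1][1]],
            [A[1][0]*B[0][0] + A[1][1]*B[1][0], A[1][0]*B[0][1] + A[1][1]*B[1][1]]]

def dagger(A):
    # conjugate transpose A*
    return [[A[0][0].conjugate(), A[1][0].conjugate()],
            [A[0][1].conjugate(), A[1][1].conjugate()]]

def to_matrix(x):
    # x~ = x^0 sigma^0 + x^1 sigma^1 + x^2 sigma^2 + x^3 sigma^3
    t, x1, x2, x3 = x
    return [[t + x3, x1 - 1j * x2],
            [x1 + 1j * x2, t - x3]]

def det2(A):
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

# a sample A in SL(2, C): diagonal with a complex parameter, so det A = 1
lam = 0.3 + 0.2j
A = [[cmath.exp(lam), 0], [0, cmath.exp(-lam)]]
assert abs(det2(A) - 1) < 1e-12

x = [1.0, 0.5, -0.3, 2.0]
Xt = to_matrix(x)
Xt2 = mat2_mul(mat2_mul(A, Xt), dagger(A))
# det x~ = (x^0)^2 - (x^1)^2 - (x^2)^2 - (x^3)^2 is preserved,
# so x -> Phi(A)x preserves the Minkowski norm
assert abs(det2(Xt2) - det2(Xt)) < 1e-10
```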
Physical interpretation of the Poincaré transformations
At the beginning of this section on special relativity we gave the explicit form of some important
coordinate transformations, namely spacetime translations, spatial rotations (around the x3 -axis)
and Lorentz boosts (in the x1 -direction). We will now relate these coordinate transformations
to the Poincaré transformations (a, L). It is clear that the spacetime translation in (2.1) can be written as

x′ = (−a, 1)x.
To rewrite the spatial rotation in (2.2), note that

e^{tX12} =
[ 1      0         0       0 ]
[ 0   cos(t)   −sin(t)     0 ]
[ 0   sin(t)    cos(t)     0 ]
[ 0      0         0       1 ].

We can thus write (2.2) as

x″ = (0, e^{−θX12})x.
More generally, if we define X = (X23, X31, X12), then a rotation of the coordinate axes of observer O″ over an angle θ around the unit vector θ̂ (according to the right-hand rule) with respect to the axes of observer O gives the transformation rule x″ = (0, e^{−θ θ̂·X})x = (0, e^{−θ·X})x, where θ = θθ̂.
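The claim that e^{tX12} is the rotation matrix above can be verified by exponentiating the generator directly; the sketch below (illustrative Python, in the convention used above) computes the matrix exponential by a truncated power series, which converges quickly for small matrices.

```python
import math

def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(X, terms=30):
    # matrix exponential via truncated power series sum_k X^k / k!
    n = len(X)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [[float(i == j) for j in range(n)] for i in range(n)]
    for k in range(1, terms):
        term = mat_mul(term, X)
        term = [[v / k for v in row] for row in term]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result

# generator of rotations in the x^1-x^2 plane
X12 = [[0, 0, 0, 0],
       [0, 0, -1, 0],
       [0, 1, 0, 0],
       [0, 0, 0, 0]]

t = 0.7
R = expm([[t * v for v in row] for row in X12])
expected = [[1, 0, 0, 0],
            [0, math.cos(t), -math.sin(t), 0],
            [0, math.sin(t), math.cos(t), 0],
            [0, 0, 0, 1]]
assert all(abs(R[i][j] - expected[i][j]) < 1e-9
           for i in range(4) for j in range(4))
```

The same `expm` helper, applied to tX01, reproduces the boost matrix given next.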
Finally, because

e^{tX01} =
[  cosh(t)  −sinh(t)   0   0 ]
[ −sinh(t)   cosh(t)   0   0 ]
[     0         0      1   0 ]
[     0         0      0   1 ],

we can write (2.3) as

x‴ = (0, e^{φv X01})x,
where sinh(φv) = γ(v)v and cosh(φv) = γ(v), i.e. φv = (1/2) ln((1 + v)/(1 − v)). We call φv the boost parameter corresponding to the speed v. More generally, if observer O‴ moves with velocity v with respect to O and if we define X̃ = (X01, X02, X03), then the coordinate transformation is given by x‴ = (0, e^{φ|v| v̂·X̃})x = (0, e^{φv·X̃})x, where we have defined φv = φ|v| v̂.
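The defining relations of the boost parameter, and the fact that rapidities add under composition of collinear boosts (which reproduces the relativistic velocity-addition formula), can be checked as follows (illustrative Python, in units with c = 1):

```python
import math

def gamma(v):
    return 1.0 / math.sqrt(1.0 - v * v)

def rapidity(v):
    # phi_v = (1/2) ln((1+v)/(1-v)) = artanh(v)
    return 0.5 * math.log((1.0 + v) / (1.0 - v))

v = 0.6
phi = rapidity(v)
# sinh(phi_v) = gamma(v) v and cosh(phi_v) = gamma(v)
assert abs(math.cosh(phi) - gamma(v)) < 1e-12
assert abs(math.sinh(phi) - gamma(v) * v) < 1e-12

# collinear boosts compose by adding rapidities, reproducing
# the velocity-addition formula (v1 + v2)/(1 + v1 v2)
v1, v2 = 0.5, 0.4
assert abs(math.tanh(rapidity(v1) + rapidity(v2))
           - (v1 + v2) / (1 + v1 * v2)) < 1e-12
```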
Earlier we mentioned that the elements of the form e^{tXμν} generate the restricted Lorentz group L↑+, so the elements of the form (a, e^{tXμν}) generate the restricted Poincaré group P↑+. In other words, P↑+ is generated by translations, spatial rotations and Lorentz boosts. The restricted Poincaré group P↑+ is therefore precisely the group of transformations that relates different inertial frames. The physically important group is thus P↑+, rather than the entire Poincaré group P.
2.2 Quantum theory
In this section we will describe quantum theory. In the first subsection we introduce the definitions
of states and observables and some of their properties. In the second subsection we formulate the
general mathematical structure of quantum theory. In the third subsection we discuss how symmetries are described in the mathematical framework of quantum theory. In the fourth subsection we
will consider a particular form of symmetry, namely relativistic symmetry; this automatically leads
to the theory of unitary representations of the universal covering group of the Poincaré group. The
irreducible representations will lead naturally to a mathematical definition of a particle. Finally,
in the fifth subsection we will describe the state spaces associated with many-particle states.
2.2.1 States and observables
As stated at the beginning of the previous section on special relativity⁸, if we repeat an experiment N times, and if N(B) denotes the number of times that we find a measured value in the Borel subset B ⊂ Rⁿ, then it is an empirical fact that for any such B the fraction N(B)/N approaches some definite value as N becomes large enough. Therefore, we assume that for any measured object α and for any physical quantity A there exists some theoretical probability

P^A_α(B) := lim_{N→∞} N(B)/N
that the measured value M(α, A) lies in the Borel set B ⊂ R^{nA}. Note that if we write α(x) and A(y) to symbolically denote the spacetime components of all parts of the measured object and of all parts of the measuring apparatuses, then according to the principle of special relativity these probabilities satisfy

P^{A(y′)}_{α(x′)}(B) = P^{A(y)}_{α(x)}(B)    (2.9)

where x′ and y′ denote spacetime components with respect to another inertial observer that are numerically equal to x and y.
States and observables in the physical world
If for two measured objects α and β we have that
P^A_α(B) = P^A_β(B)
for all physical quantities A and all Borel sets B, then the measured objects α and β cannot be
distinguished by any experiment. This defines an equivalence relation on the set of all measured
objects, the corresponding equivalence classes of which are called states. With abuse of notation,
⁸ Like our discussion at the beginning of the previous section, the present discussion is also largely inspired by the first chapter of [1].
we will denote the equivalence class (i.e. the state) of a measured object α also by α, and for the probabilities we correspondingly write P^A_α(B), where α now denotes the state rather than a
measured object. If for two physical quantities A and B we have that
P^A_α(B) = P^B_α(B)
for all states α and all Borel sets B, then the physical quantities A and B cannot be distinguished
by any experiment. This defines an equivalence relation on the set of physical quantities, and the
corresponding equivalence classes are called observables. We will use the same letter for a physical
quantity as for its equivalence class (i.e. the observable) and we use the notation PαA (B) where
α is a state and A is an observable. It is clear from the definition of a state that a state α is
completely characterized by the set of probabilities {PαA (B)}A,B , where A runs over all possible
observables.
Now we would like to define the notion of simultaneously measurable observables. For any observable A, we can define for each Borel-measurable function f : R^{nA} → R^m an observable f(A); namely, if some kind of measuring process results in a measured value in the Borel subset B ⊂ R^{nA} for the observable A, then the Borel subset f(B) ⊂ R^m represents the measured value for the observable f(A). We say that the observables A1, . . . , Am are simultaneously measurable in state α if there exists an observable B and functions fj : R^{nB} → R^{nAj} such that for all j = 1, . . . , m the observables Aj are indistinguishable from fj(B) in state α, i.e. if P^{Aj}_α(B′) = P^{fj(B)}_α(B′) for all Borel sets B′. Note that for such a set {A1, . . . , Am} of simultaneously measurable observables in state α we can now define the observable g(A1, . . . , Am) for state α for arbitrary measurable functions g : R^{nA1+···+nAm} → R^k by g(A1, . . . , Am) = g(f1(B), . . . , fm(B)). In particular, we can define sums and products of such observables to obtain new observables.
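In finite dimensions the construction is transparent: if A1 and A2 are both functions of a single observable B, then one measurement of B yields simultaneous values for both, and g(A1, A2) is again a function of B. A minimal sketch (illustrative Python; the functions f1, f2 are arbitrary choices, acting on the eigenvalues of a diagonal B):

```python
# B has eigenvalues 1, 2, 3; A1 = f1(B) and A2 = f2(B) act on those eigenvalues
f1 = lambda b: b * b      # hypothetical functions, chosen only for illustration
f2 = lambda b: b % 2

B_eigs = [1, 2, 3]
A1_eigs = [f1(b) for b in B_eigs]   # eigenvalues of A1 = f1(B)
A2_eigs = [f2(b) for b in B_eigs]   # eigenvalues of A2 = f2(B)
assert A1_eigs == [1, 4, 9]
assert A2_eigs == [1, 0, 1]

# g(A1, A2) := g(f1(B), f2(B)) is again a function of B, e.g. the sum A1 + A2
g_eigs = [a1 + a2 for a1, a2 in zip(A1_eigs, A2_eigs)]
assert g_eigs == [2, 4, 10]
```

Since A1 and A2 are simultaneously diagonal, they commute, which anticipates the spectral characterization of commuting observables given in the next subsection.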
States and observables in physical theories
In physical theories, the states {α} and observables {A} defined above are represented by certain mathematical objects {α̂} and {Â}, which are also called states and observables, respectively. For each inertial observer there are bijective correspondences

Ts : α ↦ α̂    and    To : A ↦ Â,
and for each inertial observer these correspondences are defined in identical ways in terms of the
coordinate system. It seems reasonable to suspect that the mathematical objects corresponding to
the states and observables at one particular instant of time exhaust the entire set of mathematical
objects necessary to describe the physical world. Thus, at each moment of time we can use the
same set of mathematical objects. However, it is not true that two states or two observables that are
related by a time translation will automatically be mapped to the same mathematical object. We
will come back to this below, when we will consider the time evolution of a system.
Although the correspondences Ts and To are completely determined for a given inertial observer, there might be mathematical objects in the theory, other than states and observables, that
are not completely determined for a given inertial observer. For instance, in electrodynamics each
observer can choose any particular gauge for the electromagnetic potentials.
For each pair (Ts(α), To(A)) and for each Borel set B, a physical theory should provide a number P^{To(A)}_{Ts(α)}(B) that represents the probability P^A_α(B); this is the minimum requirement that any physical theory should satisfy in order to be consistent with empirical data. There is also another consistency requirement for physical theories: the theory should be consistent with the probabilistic form of the principle of special relativity. What this means can be understood as follows. Suppose that a particular inertial observer O is interested in the possible outcome of a measurement of the observable A for some state α.⁹ For this, the observer O uses the correspondences Ts/o to obtain the mathematical objects Ts(α) and To(A), and finds that the theory predicts the
⁹ It is important to emphasize at this point that the descriptions of α and A are complete: there are no external forces that are not included in the descriptions of α and A.
probabilities P^{To(A)}_{Ts(α)}(B). Now suppose that O actually performs the corresponding measurement and suppose that a second observer O′ watches O performing the measurement. Then O′ will conclude, from his or her point of view, that a measurement was made of the observable A′ for a state α′. If O′ wants to know the probabilities for the possible outcomes of the measurement of O, he or she should consider the mathematical objects Ts(α′) and To(A′). Note in particular that this procedure defines bijections

F′_{O→O′} : Ts(α) ↦ Ts(α′)    and    F_{O→O′} : To(A) ↦ To(A′)

of the mathematical objects. The observer O′ now concludes that the theory predicts the probabilities P^{To(A′)}_{Ts(α′)}(B). The requirement that the theory must be consistent with the probabilistic form of the principle of special relativity can thus be stated as

P^{To(A)}_{Ts(α)}(B) = P^{To(A′)}_{Ts(α′)}(B) = P^{F_{O→O′}To(A)}_{F′_{O→O′}Ts(α)}(B),    (2.10)

where α′ and A′ are the state and observable α and A of observer O, as seen from observer O′.
Symmetries of theories and of systems
In discussing the principle of special relativity, we argued that there is a pair (s′, s) of bijections on the sets of mathematical objects representing states and observables, respectively. Such a pair of bijections is called a symmetry of the theory. However, in physics one often studies the symmetries
of a particular physical system, which can only be found in a limited set of states and for which
only a limited set of observables can be measured. The phrase ’symmetry of a physical system’
actually refers to the symmetries of a subsystem with respect to the entire system. For example, if
some inertial observer O places a fixed point charge Q at the origin of his/her coordinate system
and he/she wants to study the motion of some test charge q in the field of Q, it would be easier to
make use of the spherical symmetry. By spherical symmetry we mean the following. Suppose for
the moment that observer O does not know that there is a charge Q at the origin and suppose that
he/she develops a theory that describes the subsystem consisting of the test charge q. This theory
assigns mathematical objects to the states {αq } and observables {Aq } of the test charge q in a
similar manner as we discussed above. Now consider a second observer O0 whose coordinate axes
are rotated with respect to the coordinate system of O, and suppose that this second observer uses
the same physical theory as O and uses the same correspondence to assign mathematical objects to
the states and observables of q. Then spherical symmetry means that there is a relation like (2.10)
between the mathematical objects of O and O0 , only this time we consider only the possible states
and observables of the test charge q, rather than the set of all states and observables. In particular,
there is also a pair (s′, s) of bijections of the mathematical objects as above. Generalization of these results to arbitrary subsystems motivates us to define a symmetry of a (sub)system to be a pair of bijections (s′, s) of the mathematical objects representing the states and observables of that particular system. However, in what follows we will only be concerned with symmetries of the theory.
In the following subsection we will explain precisely how states and observables are represented
mathematically in quantum theory, and how these (mathematical representations of) states and
observables can be paired to obtain probabilities.
2.2.2 The general framework of quantum theory
The content of this subsection can be found in very many places; we have mainly used [34]. For
some background information on self-adjoint operators on a Hilbert space one can consult appendix
A. The mathematical description of a quantum system is characterized by a pair (H, A) consisting
of a separable Hilbert space H and a set A of self-adjoint operators on H, the elements of which are
called observables. Until subsection 4.2.1 we will always assume that A is the set of all self-adjoint
operators in H; in subsection 4.2.1 we will discuss the more general case in which A does not
contain all self-adjoint operators on the physical Hilbert space. The subset A0 = A ∩ B(H) of A is
called the set of bounded observables and B(H) is called the algebra of bounded observables. The
set of states S consists of all linear functionals ρ : B(H) → C of the form
ρ(A) = Tr(ρ̂A),

where ρ̂ ∈ B1(H) is a trace-class operator with Tr(ρ̂) = 1. Such an operator is called a density operator, and it is clear that any convex combination ρ̂ := λρ̂1 + (1 − λ)ρ̂2 (with 0 ≤ λ ≤ 1) of two density operators ρ̂1, ρ̂2 is again a density operator and hence (using linearity of the trace) defines a state given by

ρ(A) = Tr(ρ̂A) = λ Tr(ρ̂1A) + (1 − λ) Tr(ρ̂2A) = λρ1(A) + (1 − λ)ρ2(A),

where ρi ∈ S denotes the state corresponding to ρ̂i. In particular, this shows that S is a convex set (just read the equation from right to left and use the fact that ρ̂ = λρ̂1 + (1 − λ)ρ̂2 indeed
defines a density operator). The extremal points of the convex set S are called pure states, and
we denote the set of all pure states by P S. The elements in S\P S are called mixed states. Any
density operator is a countable sum of the form

ρ̂ = Σi λi Ei,    (2.11)

with Ei one-dimensional projections on H and λi ≥ 0 with Σi λi = 1. Conversely, any operator of the form (2.11) defines a density operator. As a special case of (2.11), when ρ̂ is a one-dimensional projection onto the (one-dimensional) subspace V1 of H, it is easy to see, by choosing an orthonormal basis {ei}i for H containing a unit vector Ψ ∈ V1 (so ek = Ψ for some k), that for such a state we have that for all A ∈ B(H)

ρ(A) = Tr(ρ̂A) = Σi ⟨ρ̂Aei, ei⟩ = ⟨AΨ, Ψ⟩.    (2.12)
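The trace-one condition, the convexity computation and the vector-state formula (2.12) can be checked concretely on C² (illustrative Python; the particular vectors and the observable A are arbitrary sample choices):

```python
def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def trace(A):
    return A[0][0] + A[1][1]

def projector(psi):
    # one-dimensional projection onto the unit vector psi
    return [[psi[i] * psi[j].conjugate() for j in range(2)] for i in range(2)]

E1 = projector([1.0 + 0j, 0j])            # density operator of a vector state
E2 = projector([0.6 + 0j, 0.8 + 0j])      # |0.6|^2 + |0.8|^2 = 1
A = [[2.0 + 0j, 1j], [-1j, -1.0 + 0j]]    # a bounded self-adjoint observable

lam = 0.3
rho = [[lam * E1[i][j] + (1 - lam) * E2[i][j] for j in range(2)]
       for i in range(2)]
assert abs(trace(rho) - 1) < 1e-12  # convex combinations have trace one

# rho(A) = lam rho_1(A) + (1 - lam) rho_2(A), by linearity of the trace
lhs = trace(mul(rho, A))
rhs = lam * trace(mul(E1, A)) + (1 - lam) * trace(mul(E2, A))
assert abs(lhs - rhs) < 1e-12

# for the vector state with density operator E2, Tr(E2 A) = <A Psi, Psi>
psi = [0.6 + 0j, 0.8 + 0j]
Apsi = [A[0][0]*psi[0] + A[0][1]*psi[1], A[1][0]*psi[0] + A[1][1]*psi[1]]
expect = Apsi[0]*psi[0].conjugate() + Apsi[1]*psi[1].conjugate()
assert abs(trace(mul(E2, A)) - expect) < 1e-12
```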
Such a state is also called a vector state and we often write such a vector state as ρΨ . Combining
(2.11) and (2.12), we see that each state ρ ∈ S is a countable convex combination of vector states
and therefore we can write the action of ρ on A ∈ B(H) as
ρ(A) = Σi λi ⟨AΨi, Ψi⟩    (2.13)

(with 0 ≤ λi ≤ 1 and Σi λi = 1 as before) for some countable collection {Ψi}i of unit vectors in
H. An immediate consequence of (2.13) is that all pure states must be vector states. The converse
of this statement is also true¹⁰ and therefore the set of pure states coincides exactly with the set
of vector states, which in turn is in one-to-one correspondence with the set of unit vectors in H
modulo phase factors. Thus the set P S of all pure states is in one-to-one correspondence with the
set of all unit rays
R(Ψ) = {e^{iθ}Ψ : ‖Ψ‖ = 1, 0 ≤ θ < 2π}

in H. Note that R(Ψ1) = R(Ψ2) if and only if Ψ2 = e^{iθ}Ψ1 for some θ ∈ [0, 2π).
Let P(R) denote the set of probability measures on R. We then define a map A × S → P(R) as follows. If A ∈ A has spectral resolution A = ∫_R λ dE_A(λ) and if ρ ∈ S, we define a probability measure μ_{A,ρ} on R by

μ_{A,ρ}(B) = ρ(E_A(B)) = Tr(ρ̂ E_A(B))

for any Borel set B ⊂ R. The quantity μ_{A,ρ}(B) then represents the probability that, when the quantum system is in the state ρ, the result of a measurement of the observable A lies in the Borel set B. The expectation value ⟨A⟩_ρ of the observable A in the state ρ of the system is then

⟨A⟩_ρ = ∫_R λ dμ_{A,ρ}(λ)
¹⁰ This will no longer be the case in the more general situation of subsection 4.2.1, where the algebra of observables is allowed to be a proper subalgebra of B(H).
and the variance of A in this state is

σ²_ρ(A) = ⟨(A − ⟨A⟩_ρ 1_H)²⟩_ρ = ⟨A²⟩_ρ − ⟨A⟩²_ρ.

In particular, if ρ ∈ S is a pure state given by ρ(·) = ⟨·Ψ, Ψ⟩, then μ_{A,ρ}(B) = ⟨E_A(B)Ψ, Ψ⟩ for any observable A and for all Borel sets B ⊂ R, and

⟨A⟩_ρ = ∫_R λ d⟨E_A(λ)Ψ, Ψ⟩ = ⟨AΨ, Ψ⟩,

where the last equality makes sense only when Ψ lies in the domain of A. Also,

⟨A²⟩_ρ = ⟨A²Ψ, Ψ⟩ = ⟨AΨ, AΨ⟩ = ‖AΨ‖²,
so σ²_ρ(A) = ‖AΨ‖² − ⟨AΨ, Ψ⟩². If A = {A1, . . . , An} is a finite set of observables that pairwise commute¹¹, we can define a projection-valued measure E_A on Rⁿ by setting E_A(B1 × · · · × Bn) = E_{A1}(B1) · · · E_{An}(Bn) for any Borel sets B1, . . . , Bn ⊂ R, and this will then define E_A for all Borel subsets of Rⁿ. If the system is in state ρ, this will define a probability measure μ_{A,ρ} on Rⁿ by μ_{A,ρ}(B) = ρ(E_A(B)) = Tr(ρ̂ E_A(B)) for any Borel set B ⊂ Rⁿ, and the quantity μ_{A,ρ}(B) is to be interpreted as the probability that the result (a1, . . . , an) of a simultaneous measurement of the observables A1, . . . , An belongs to the Borel set B ⊂ Rⁿ. When two observables do not commute, it is impossible to measure them simultaneously.
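For a finite-dimensional observable the spectral measure is a finite sum over eigenprojections, so the formulas above for μ_{A,ρ}, ⟨A⟩_ρ and σ²_ρ(A) reduce to finite sums. A sketch for a two-level system in a vector state (illustrative Python, with arbitrary sample values, working in the eigenbasis of A):

```python
import math

# observable A with eigenvalues a_i, diagonal in the chosen basis
eigenvalues = [1.0, -2.0]
# vector state Psi = (cos t, sin t) expanded in the eigenbasis of A
t = 0.4
amps = [math.cos(t), math.sin(t)]

# mu_{A,rho}({a_i}) = <E_A({a_i}) Psi, Psi> = |<Psi, e_i>|^2
probs = [c * c for c in amps]
assert abs(sum(probs) - 1) < 1e-12  # a probability measure

mean = sum(a * p for a, p in zip(eigenvalues, probs))         # <A>_rho
mean_sq = sum(a * a * p for a, p in zip(eigenvalues, probs))  # <A^2>_rho
var = mean_sq - mean ** 2                                     # sigma_rho^2(A)
assert var >= 0

# the variance also equals ||A Psi||^2 - <A Psi, Psi>^2 for this vector state
norm_APsi_sq = sum((a * c) ** 2 for a, c in zip(eigenvalues, amps))
assert abs(var - (norm_APsi_sq - mean ** 2)) < 1e-12
```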
For each quantum system there is a special observable H ∈ A, called the Hamiltonian operator,
or simply the Hamiltonian of the quantum system. Given this Hamiltonian H (with dense domain
D(H)), we define a strongly continuous one-parameter unitary group U_H(t) on H by¹²

U_H(t) = e^{−itH}.

For any vector Ψ ∈ D(H) we then have i (d/dt)U_H(t)Ψ = U_H(t)HΨ = HU_H(t)Ψ, where in the last equality we have assumed that each U_H(t) leaves D(H) invariant. In quantum theory we have
the so-called Heisenberg picture and Schrödinger picture to describe the dynamics of a quantum
system.
In the Heisenberg picture, the observables depend on time, whereas the states do not depend
on time. Here we will only consider quantum systems for which the Hamiltonian does not depend
explicitly on time. When we have a quantum system in state ρ ∈ S at time t = 0, then the time
evolution A(t) of an observable with A(0) = A ∈ A is given by

A(t) = U_H(−t)AU_H(t) ∈ A,

and hence the probability that the result of a measurement at time t of an observable A lies in the Borel set B ⊂ R is

μ_{A(t),ρ}(B) = Tr(ρ̂ E_{A(t)}(B)).
Now assume that Ψ ∈ D(H) is such that U_H(t′)Ψ ∈ D(A) for all t′ in some neighborhood of t ∈ R. Then

(d/dt)A(t)Ψ = (d/dt′)|_{t′=t} U_H(−t′)AU_H(t)Ψ + (d/dt′)|_{t′=t} U_H(−t)AU_H(t′)Ψ
            = iHU_H(−t)AU_H(t)Ψ − iU_H(−t)AU_H(t)HΨ
            = i(HA(t) − A(t)H)Ψ.
In this sense, we can say that A(t) satisfies the Heisenberg equation of motion

(dA/dt)(t) = i[H, A(t)].
¹¹ We say that two self-adjoint elements S and T commute if E_S(B1)E_T(B2) = E_T(B2)E_S(B1) for all Borel sets B1, B2 ⊂ R.
¹² We will always use units in which ℏ = 1. Otherwise there would have been a factor ℏ⁻¹ in the exponent.
An observable A ∈ A is called a quantum integral of motion (or constant of motion) if A(t) = A
for all t ∈ R. So an observable A is a constant of motion if and only if it commutes with all UH (t),
which happens precisely when it commutes with H. Hence constants of motion are observables
that commute with the Hamiltonian.
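Both the Heisenberg equation and the characterization of constants of motion can be checked on C² with a diagonal Hamiltonian, where U_H(t) = e^{−itH} is explicit. The sketch below (illustrative Python, arbitrary sample values) approximates dA/dt by a central finite difference.

```python
import cmath

def mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

H_eigs = [1.0, 3.0]  # Hamiltonian H = diag(1, 3) in its eigenbasis

def U(t):
    # U_H(t) = e^{-itH}, diagonal in the eigenbasis of H
    return [[cmath.exp(-1j * t * H_eigs[0]), 0],
            [0, cmath.exp(-1j * t * H_eigs[1])]]

A = [[0, 1], [1, 0]]  # a self-adjoint observable not commuting with H

def A_t(t):
    # Heisenberg picture: A(t) = U_H(-t) A U_H(t)
    return mul(mul(U(-t), A), U(t))

# check dA/dt = i[H, A(t)] at t = 0.5 by a central finite difference
t, h = 0.5, 1e-6
dA = [[(A_t(t + h)[i][j] - A_t(t - h)[i][j]) / (2 * h) for j in range(2)]
      for i in range(2)]
Hm = [[H_eigs[0], 0], [0, H_eigs[1]]]
comm = [[1j * (mul(Hm, A_t(t))[i][j] - mul(A_t(t), Hm)[i][j])
         for j in range(2)] for i in range(2)]
assert all(abs(dA[i][j] - comm[i][j]) < 1e-5 for i in range(2) for j in range(2))

# H itself commutes with every U_H(t), so it is a constant of motion
assert all(abs(mul(mul(U(-t), Hm), U(t))[i][j] - Hm[i][j]) < 1e-12
           for i in range(2) for j in range(2))
```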
In the Schrödinger picture, states depend on time and observables do not depend on time.
When we have a quantum system which is in state ρ at t = 0, then the time evolution of the state is described in terms of the time evolution of the density operator as

ρ̂(t) = U_H(t) ρ̂ U_H(−t) ∈ B1(H),

and hence the probability that the result of a measurement at time t of an observable A lies in the Borel set B ⊂ R is

μ_{A,ρ(t)}(B) = Tr(ρ̂(t)E_A(B)).
For any Ψ ∈ D(H) we then have

(d/dt)ρ̂(t)Ψ = (d/dt′)|_{t′=t} U_H(t′) ρ̂ U_H(−t)Ψ + (d/dt′)|_{t′=t} U_H(t) ρ̂ U_H(−t′)Ψ
            = −iHU_H(t) ρ̂ U_H(−t)Ψ + iU_H(t) ρ̂ U_H(−t)HΨ
            = −i(H ρ̂(t) − ρ̂(t)H)Ψ.

In this sense, ρ̂(t) satisfies the Schrödinger equation of motion

(dρ̂/dt)(t) = −i[H, ρ̂(t)].
A state ρ ∈ S is called stationary when ρ̂(t) = ρ̂ for all t ∈ R. Thus a state ρ is stationary if and only if ρ̂ commutes with H. If ρ is a pure state given by ρ̂ = E_Ψ, then for any Φ ∈ H we have

ρ̂(t)Φ = U_H(t)E_Ψ(U_H(−t)Φ) = U_H(t)(⟨U_H(−t)Φ, Ψ⟩Ψ) = ⟨Φ, U_H(t)Ψ⟩U_H(t)Ψ = E_{Ψ(t)}Φ,

where Ψ(t) = U_H(t)Ψ. Now if Ψ ∈ D(H), we get (dΨ/dt)(t) = (d/dt)U_H(t)Ψ = −iHU_H(t)Ψ = −iHΨ(t), so then Ψ(t) satisfies the Schrödinger equation

i (dΨ/dt)(t) = HΨ(t).
2.2.3 Symmetries in quantum theory
The global structure of this subsection is borrowed from section 2.4 of [1], but the proofs presented here are more detailed than in [1]. In quantum theory, a symmetry of a quantum system (H, A) is a pair (s, s′) of bijections s : A0 → A0 and s′ : S → S on the set of bounded observables and the set of states, respectively, satisfying

(s′ρ)(sA) = ρ(A),    s(f(A)) = f(s(A))

for all ρ ∈ S, f ∈ C(σ(A)) and A ∈ A0. Also, if (s1, s′1) and (s2, s′2) are two symmetries, it can be shown that s1 = s2 ⇔ s′1 = s′2; in other words, a symmetry (s, s′) is completely determined once we know either s or s′. If ρ = Σi λiρi ∈ S is a convex combination of states ρi ∈ S, then for each A ∈ A0 we have

(s′ρ)(sA) = ρ(A) = Σi λiρi(A) = Σi λi(s′ρi)(sA).

Because s is a bijection, this implies that for each A ∈ A0 we have

(s′ρ)(A) = (s′ρ)(s(s⁻¹A)) = Σi λi(s′ρi)(s(s⁻¹A)) = Σi λi(s′ρi)(A),

so s′ρ = Σi λi s′ρi, and hence s′ preserves the convex structure of S. In particular, the set of extreme points is mapped bijectively onto itself, so s′ gives a bijection from the set of pure states onto itself. Because the set of pure states is in one-to-one correspondence with the set of unit rays in H (since we have assumed that A is the set of all self-adjoint operators on H), this means that (s, s′) induces a bijection

ŝ : {R(Ψ)}_{Ψ∈H} → {R(Ψ)}_{Ψ∈H}

from the set of unit rays of H onto itself.
If ρ1, ρ2 ∈ P S are the two pure states corresponding to the unit rays R(Ψ1) and R(Ψ2), respectively, then we say that the pure states ρ1 and ρ2 are orthogonal if R(Ψ1) ⊥ R(Ψ2) (i.e. if the vectors in the unit ray R(Ψ1) are orthogonal to the vectors in the unit ray R(Ψ2)), and we write ρ1 ⊥ ρ2. Orthogonal pure states can also be characterized as follows.

Lemma 2.17 Two pure states ρ1, ρ2 ∈ P S are orthogonal if and only if there exists a projection operator E satisfying ρ1(E) = 1 and ρ2(E) = 0.
Proof
If such a projection operator exists, then ‖(1 − E)Ψ1‖² = ρ1(1 − E) = 0 and ‖EΨ2‖² = ρ2(E) = 0, where Ψ1 and Ψ2 are unit vectors in the unit rays corresponding to the pure states ρ1 and ρ2, respectively. Thus (1 − E)Ψ1 = 0 and EΨ2 = 0, so

⟨Ψ1, Ψ2⟩ = ⟨(1 − E)Ψ1, Ψ2⟩ + ⟨EΨ1, Ψ2⟩ = 0 + ⟨Ψ1, EΨ2⟩ = 0.

Now suppose that ⟨Ψ1, Ψ2⟩ = 0. If E is the one-dimensional projection onto CΨ1 then EΨ1 = Ψ1 and EΨ2 = 0, so ρ1(E) = ⟨EΨ1, Ψ1⟩ = 1 and ρ2(E) = ⟨EΨ2, Ψ2⟩ = 0.
Lemma 2.18 If (s, s′) is a symmetry and ρ1, ρ2 ∈ P S are pure states, then ρ1 ⊥ ρ2 if and only if s′ρ1 ⊥ s′ρ2. In other words, if R1 and R2 are the unit rays corresponding to ρ1 and ρ2, then R1 ⊥ R2 if and only if ŝ(R1) ⊥ ŝ(R2).

Proof
Suppose that ρ1 ⊥ ρ2. Let E ∈ A0 be a projection with ρ1(E) = 1 and ρ2(E) = 0. Because (sE)² = s(E²) = sE, sE is also a projection, and we have (s′ρ1)(sE) = ρ1(E) = 1 and (s′ρ2)(sE) = ρ2(E) = 0. Hence s′ρ1 ⊥ s′ρ2, by the previous lemma. Now suppose that s′ρ1 ⊥ s′ρ2. Because (s⁻¹, (s′)⁻¹) is also a symmetry, the argument above gives that (s′)⁻¹s′ρ1 ⊥ (s′)⁻¹s′ρ2, i.e. ρ1 ⊥ ρ2.
Definition 2.19 If ρ1 and ρ2 are two pure states corresponding to the unit rays R(Ψ1) and R(Ψ2), respectively, then

⟨ρ1, ρ2⟩ := |⟨Ψ1, Ψ2⟩|²

is called the transition probability between the states ρ1 and ρ2.

Physically, this transition probability represents the probability that a system in state ρ1 is found to be in state ρ2 after measurement, or vice versa.
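The transition probability depends only on the rays: multiplying a vector by a phase changes nothing, and both unitary maps and the antiunitary operation of componentwise conjugation preserve it. A sketch on C² (illustrative Python, arbitrary sample vectors):

```python
import cmath

def inner(u, v):
    # <u, v>, linear in the first argument, conjugate-linear in the second
    return sum(ui * vi.conjugate() for ui, vi in zip(u, v))

def transition(u, v):
    # |<u, v>|^2, the transition probability between the corresponding rays
    return abs(inner(u, v)) ** 2

psi1 = [0.6 + 0j, 0.8j]
psi2 = [1 / 2 ** 0.5 + 0j, 1j / 2 ** 0.5]

# a phase change leaves the ray, hence the transition probability, unchanged
phase = cmath.exp(0.9j)
assert abs(transition([phase * c for c in psi1], psi2)
           - transition(psi1, psi2)) < 1e-12

# componentwise conjugation is antiunitary, and it too preserves
# transition probabilities, since <u*, v*> is the conjugate of <u, v>
conj1 = [c.conjugate() for c in psi1]
conj2 = [c.conjugate() for c in psi2]
assert abs(transition(conj1, conj2) - transition(psi1, psi2)) < 1e-12
```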
Lemma 2.20 If (s, s′) is a symmetry, then for all pure states ρ1, ρ2 ∈ P S we have

⟨s′ρ1, s′ρ2⟩ = ⟨ρ1, ρ2⟩.

As a consequence, if R1 and R2 are the unit rays corresponding to ρ1 and ρ2 respectively, then for arbitrary Ψj ∈ Rj and Ψ′j ∈ ŝ(Rj) we have

|⟨Ψ′1, Ψ′2⟩|² = |⟨Ψ1, Ψ2⟩|².

Proof
Let Ψ1 and Ψ2 be two unit vectors in the unit rays corresponding to ρ1 and ρ2, respectively, and let E_{ρ1} be the one-dimensional projection onto CΨ1. Then sE_{ρ1} is a projection, (s′ρ1)(sE_{ρ1}) = ρ1(E_{ρ1}) = 1 and for any ρ ∈ P S with ρ ⊥ ρ1 we have (s′ρ)(sE_{ρ1}) = ρ(E_{ρ1}) = 0. This implies that

sE_{ρ1} = E_{s′ρ1}.

We also have

⟨ρ1, ρ2⟩ = |⟨Ψ1, Ψ2⟩|² = |⟨Ψ1, E_{ρ1}Ψ2⟩|² = ‖E_{ρ1}Ψ2‖² = ⟨E_{ρ1}Ψ2, E_{ρ1}Ψ2⟩ = ⟨E_{ρ1}Ψ2, Ψ2⟩ = ρ2(E_{ρ1}),

and similarly, ⟨s′ρ1, s′ρ2⟩ = (s′ρ2)(E_{s′ρ1}), so we get

⟨s′ρ1, s′ρ2⟩ = (s′ρ2)(E_{s′ρ1}) = (s′ρ2)(sE_{ρ1}) = ρ2(E_{ρ1}) = ⟨ρ1, ρ2⟩.
This lemma has a nice implication. Let {Ψn}n be an orthonormal basis of H, let (s, s′) be a symmetry and let {Ψ′n}n be a set of vectors with Ψ′n ∈ ŝ(R(Ψn)). Then, according to the lemma, for arbitrary m, n we have |⟨Ψ′m, Ψ′n⟩|² = |⟨Ψm, Ψn⟩|² = δmn. By positive definiteness of the inner product, this implies that

⟨Ψ′m, Ψ′n⟩ = δmn.

So {Ψ′n}n is an orthonormal set in H. Now suppose that there exists a unit vector Ψ′ ∈ H with Ψ′ ⊥ Ψ′n for all n. Then for each Ψ ∈ ŝ⁻¹(R(Ψ′)) we have that |⟨Ψ, Ψn⟩|² = |⟨Ψ′, Ψ′n⟩|² = 0 for all n. Because {Ψn}n is an orthonormal basis of H, this implies that Ψ = 0. This contradicts the fact that Ψ is a unit vector. Hence, we conclude that no unit vector Ψ′ ∈ H satisfying Ψ′ ⊥ Ψ′n for all n can exist. This shows that {Ψ′n}n is also an orthonormal basis of H.
Lemma 2.21 Let Ψ1, Ψ2 ∈ H be unit vectors with Ψ1 ⊥ Ψ2 and let R(Ψ1) and R(Ψ2) be their corresponding unit rays. For λ, μ ∈ C with |λ|² + |μ|² = 1 we define the unit vector Ψ_{λ,μ} := λΨ1 + μΨ2 with corresponding unit ray R(Ψ_{λ,μ}). If (s, s′) is a symmetry and Ψ′ ∈ ŝ(R(Ψ_{c1,c2})) with c1c2 ≠ 0, then there exist Ψ′1 ∈ ŝ(R(Ψ1)) and Ψ′2 ∈ ŝ(R(Ψ2)) satisfying

Ψ′ = c1Ψ′1 + c2Ψ′2.

Proof
If we define R := {R(Ψ_{λ,μ}) : |λ|² + |μ|² = 1}, then R is precisely the set of all unit rays which are orthogonal to any ray which is orthogonal to both R(Ψ1) and R(Ψ2). By lemma 2.18, ŝ(R) := {ŝ(R) : R ∈ R} is then precisely the set of all unit rays which are orthogonal to any ray which is orthogonal to both ŝ(R(Ψ1)) and ŝ(R(Ψ2)). But this implies that we can write ŝ(R) as

ŝ(R) = {R(λ′Ψ″1 + μ′Ψ″2) : |λ′|² + |μ′|² = 1},

where Ψ″1 and Ψ″2 are some fixed vectors in ŝ(R(Ψ1)) and ŝ(R(Ψ2)). Thus for each R(Ψ_{λ,μ}) ∈ R we can write ŝ(R(Ψ_{λ,μ})) = R(λ′Ψ″1 + μ′Ψ″2) for some λ′, μ′ ∈ C with |λ′|² + |μ′|² = 1. Now let Ψ′ ∈ ŝ(R(Ψ_{c1,c2})) with c1c2 ≠ 0. Because ŝ(R(Ψ_{c1,c2})) = R(c′1Ψ″1 + c′2Ψ″2) for some c′1, c′2 ∈ C with |c′1|² + |c′2|² = 1, there exists a θ ∈ [0, 2π) such that

Ψ′ = e^{iθ}(c′1Ψ″1 + c′2Ψ″2) = c″1Ψ″1 + c″2Ψ″2,

where c″j := e^{iθ}c′j. Because |c″j|² = |⟨Ψ′, Ψ″j⟩|² = |⟨Ψ_{c1,c2}, Ψj⟩|² = |cj|², we can write c″j = cj e^{iθj} for some θj ∈ [0, 2π). But then

Ψ′ = c1e^{iθ1}Ψ″1 + c2e^{iθ2}Ψ″2 = c1Ψ′1 + c2Ψ′2,

where Ψ′j := e^{iθj}Ψ″j ∈ ŝ(R(Ψj)).
We will now formulate and prove the main theorem concerning symmetries in quantum theory. This theorem, due to Wigner, states that we can represent any symmetry transformation of a quantum system by a unitary or antiunitary operator on the corresponding Hilbert space. The (constructive) proof of Wigner's theorem that is given below is a mixture of the proofs in [35] and [1].

Theorem 2.22 (Wigner's theorem) Let (H, A) be a quantum system and let ŝ be a bijection of the set of unit rays onto itself that conserves transition probabilities. Then there exists a map U : H → H that is either linear and unitary or antilinear and antiunitary, satisfying

UΨ ∈ ŝ(R(Ψ))

for all unit vectors Ψ ∈ H. Furthermore, such a map U is uniquely determined up to a phase factor.
Proof
We divide the proof into several steps.

• Step 1: Define U on an orthonormal basis {Ψn}n≥1 of H.
Let {Ψn}n≥1 be an orthonormal basis of H (here we use the assumption that Hilbert spaces in quantum mechanics are separable) and write Rn := R(Ψn) for the corresponding unit rays. For n ≠ 1 we define the unit vectors

Φn = (1/√2)(Ψ1 + Ψn).

We now choose an arbitrary vector in ŝ(R1) and call it UΨ1. For arbitrary Φ′n ∈ ŝ(R(Φn)) we have |⟨Φ′n, UΨ1⟩|² = |⟨Φn, Ψ1⟩|² = 1/2, so there exists a unique UΦn ∈ ŝ(R(Φn)) such that ⟨UΦn, UΨ1⟩ = 1/√2. For n ≠ 1 we then define

UΨn := √2 UΦn − UΨ1.

To see that UΨn ∈ ŝ(Rn), observe that, according to lemma 2.21, there exist Ψ′1 ∈ ŝ(R1) and Ψ′n ∈ ŝ(Rn) satisfying UΦn = (1/√2)(Ψ′1 + Ψ′n). Because Ψ′1, UΨ1 ∈ ŝ(R1), there exists a θ ∈ [0, 2π) such that Ψ′1 = e^{iθ}UΨ1. Then UΨn can be written as UΨn = √2 UΦn − UΨ1 = (e^{iθ} − 1)UΨ1 + Ψ′n. Because UΨ1 ⊥ Ψ′n, we find that e^{iθ} − 1 = ⟨UΨn, UΨ1⟩ = √2⟨UΦn, UΨ1⟩ − ⟨UΨ1, UΨ1⟩ = 0. Hence we have UΨn = (e^{iθ} − 1)UΨ1 + Ψ′n = Ψ′n, so indeed UΨn ∈ ŝ(Rn). We have now defined U on the basis elements Ψn and on the elements Φn = (1/√2)(Ψ1 + Ψn). According to the discussion after lemma 2.18, {UΨn}n forms an orthonormal basis of H, and our definition of UΦn does not spoil the possibility to make U an R-linear operator, since

U((1/√2)(Ψ1 + Ψn)) = UΦn = (1/√2)(UΨ1 + UΨn).
P
P
•Step 2: If Ψ = n cn Ψn is a unit vector with c1 6= 0 and if Ψ0 = n c0n U Ψn ∈ sb(R(Ψ)), then
either cn /c1 = c0n /c01 for all n or cn /c1 = c0n /c01 for all n.
Let
X
Ψ=
cn Ψn ∈ H
n
be a unit vector with c1 6= 0. If
Ψ0
∈ sb(R(Ψ)), we may write it as
X
Ψ0 =
c0n U Ψn .
n
27
Note
that for all
n we have |cn |2 = |hΨ, Ψn i|2 = |hΨ0 , U Ψn i|2 = |c0n |2 . Also, |c1 + cn |2 =
√
√
2|hΨ, Φn i|2 = 2|hΨ0 , U Φn i|2 = |c01 + c02 |2. For arbitrary complex numbers a, b ∈ C with a 6= 0
we have |a + b|2 = |a|2 +|b|2 + 2|a|2 Re ab , so theequality
|c1 + cn |2 = |c01 + c02 |2 implies that
0
|c1 |2 + |c2 |2 + 2|c1 |2 Re ccn1 = |c01 |2 + |c02 |2 + 2|c01 |2 Re ccn0 . Together with |cn |2 = |c0n |2 this implies
1
that
0 c
cn
= Re n0 .
Re
c1
c1
h i2 2 h i2 0 2 h 0 i2 h 0 i2
Also, Im ccn1
= Im ccn0
. Hence,
= ccn1 − Re ccn1
= ccn0 − Re ccn0
1
Im
cn
c1
1
= ±Im
c0n
c01
1
.
Thus we conclude that we either have
cn /c1 = c0n /c01
(2.14)
cn /c1 = c0n /c01 .
(2.15)
or
We will now show that for each n, the same choice between (2.14) and (2.15) must be made.
Suppose that for some k we have ck/c1 = c′k/c′1, that for some l we have c̄l/c̄1 = c′l/c′1, and that
both ratios are not real. Note that this requires that k, l > 1 and k ≠ l. Let
    Υ := (Ψ1 + Ψk + Ψl)/√3
and let Υ′ be an arbitrary vector in ŝ(R(Υ)). When we write Υ = d1 Ψ1 + dk Ψk + dl Ψl and
Υ′ = d′1 UΨ1 + d′k UΨk + d′l UΨl, we have dk/d1, dl/d1 ∈ R, so we must have dk/d1 = d′k/d′1 and
dl/d1 = d′l/d′1, so we can write Υ′ = α(d1 UΨ1 + dk UΨk + dl UΨl) = (α/√3)(UΨ1 + UΨk + UΨl)
for some α ∈ C with |α| = 1. It then follows from conservation of transition probability that
    |1 + c′k/c′1 + c′l/c′1|² = (3/|c′1|²)|⟨Ψ′, Υ′⟩|² = (3/|c1|²)|⟨Ψ, Υ⟩|² = |1 + ck/c1 + cl/c1|².
By assumption, we have ck/c1 = c′k/c′1 and c̄l/c̄1 = c′l/c′1, so
    |1 + ck/c1 + c̄l/c̄1|² = |1 + ck/c1 + cl/c1|².
For arbitrary complex numbers a, b ∈ C the equality |a + b̄|² = |a + b|² implies that Re(ab) = Re(ab̄),
which is equivalent to Re(a)Re(b) − Im(a)Im(b) = Re(a)Re(b) + Im(a)Im(b), or Im(a)Im(b) = 0.
When we apply this to a = 1 + ck/c1 and b = cl/c1, we find that Im(1 + ck/c1)Im(cl/c1) = 0. But
Im(1 + ck/c1) = Im(ck/c1), so
    Im(ck/c1)Im(cl/c1) = 0.
This implies that at least one of the two ratios ck/c1 and cl/c1 is real, which contradicts our
assumption that both ratios are not real. We thus conclude that if Ψ = Σn cn Ψn is a unit vector
with c1 ≠ 0 and if Ψ′ = Σn c′n UΨn ∈ ŝ(R(Ψ)), then either cn/c1 = c′n/c′1 for all n or else
c̄n/c̄1 = c′n/c′1 for all n.
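The elementary identities driving this step are easy to check numerically. The sketch below (plain NumPy; the random numbers are purely illustrative and no part of the proof) verifies the identity |a + b|² = |a|² + |b|² + 2|a|² Re(b/a) and the fact that conjugating the coefficients preserves the real part of the ratio cn/c1 while flipping the sign of its imaginary part:

```python
import numpy as np

rng = np.random.default_rng(0)
a = complex(rng.normal(), rng.normal())
b = complex(rng.normal(), rng.normal())

# |a + b|^2 = |a|^2 + |b|^2 + 2|a|^2 Re(b/a), valid for a != 0
lhs = abs(a + b) ** 2
rhs = abs(a) ** 2 + abs(b) ** 2 + 2 * abs(a) ** 2 * (b / a).real
assert np.isclose(lhs, rhs)

# Conjugated coefficients satisfy the same moduli constraints and realize
# the sign choice Im(cn/c1) = -Im(c'n/c'1):
c1, cn = a, b
c1p, cnp = np.conj(c1), np.conj(cn)
assert np.isclose(abs(c1 + cn), abs(c1p + cnp))
r, rp = cn / c1, cnp / c1p
assert np.isclose(r.real, rp.real)
assert np.isclose(r.imag, -rp.imag)
```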
•Step 3: Define U on unit vectors Ψ = Σn cn Ψn with c1 ≠ 0 and for which it is not true that
cn/c1 ∈ R for all n.
If cn/c1 = c′n/c′1 for all n and if it is not true that cn/c1 ∈ R for all n, then we define UΨ to be the
unique vector in ŝ(R(Ψ)) for which c′1 = c1. In that case we get cn = (cn/c1)c1 = (c′n/c′1)c′1 = c′n
for all n.
If c̄n/c̄1 = c′n/c′1 for all n and if it is not true that cn/c1 ∈ R for all n, then we define UΨ to be
the unique vector in ŝ(R(Ψ)) for which c′1 = c̄1. In that case we get c̄n = (c̄n/c̄1)c̄1 = (c′n/c′1)c′1 = c′n
for all n.
If cn/c1 ∈ R for all n, we will not define UΨ yet.
•Step 4: Define U on unit vectors Ψ = Σn cn Ψn with c1 = 0 and for which it is not true that
cn ∈ R for all n ≥ 2.
Now suppose that
    Ψ = Σn cn Ψn ∈ H
is a unit vector with c1 = 0. We then define
    Ψ̃ := (Ψ1 + Ψ)/√2 = Σn c̃n Ψn,
where c̃1 = 1/√2 and c̃n = cn/√2 for n ≠ 1. If it is not true that cn ∈ R for all n ≥ 2, then it is also
not true that c̃n/c̃1 ∈ R for all n, so the procedure in step 3 defines UΨ̃ = Σn c̃′n UΨn ∈ ŝ(R(Ψ̃)) with
either c̃′n = c̃n for all n or else c̃′n equal to the complex conjugate of c̃n for all n. We then define
    UΨ := √2 UΨ̃ − UΨ1 = Σn c′n UΨn,
where c′1 = 0 and c′n = √2 c̃′n for n ≠ 1. Thus, we either have c′n = cn for all n or else c′n = c̄n for
all n.
We have now defined UΨ for all unit vectors Ψ = Σn cn Ψn for which not all coefficients have
the same phase (since we assumed that it is not true that cn/c1 ∈ R for all n), and for such unit
vectors we either have
    UΨ = U(Σn cn Ψn) = Σn cn UΨn                                           (2.16)
or
    UΨ = U(Σn cn Ψn) = Σn c̄n UΨn.                                          (2.17)
Furthermore, for such unit vectors our choice between (2.16) and (2.17) is the only choice that is
possible, since for such unit vectors we either have Σn cn UΨn ∈ ŝ(R(Ψ)) or Σn c̄n UΨn ∈ ŝ(R(Ψ)),
but not both. If Ψ = Σn cn Ψn is a unit vector for which all coefficients have the same phase, then
also all coefficients of an arbitrary vector Ψ′ = Σn c′n UΨn in ŝ(R(Ψ)) have the same phase. This
means that for such Ψ we are free to choose whether we want UΨ to satisfy (2.16) or (2.17), since
both Σn cn UΨn and Σn c̄n UΨn are in ŝ(R(Ψ)). As stated before, we will not make this choice yet.
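The dichotomy just used, namely that Σn cn UΨn and Σn c̄n UΨn lie in the same unit ray precisely when all coefficients share a phase, can be illustrated in a finite-dimensional toy model (an assumption made purely for illustration: the orthonormal vectors UΨn are replaced by the standard basis of C²):

```python
import numpy as np

def same_ray(v, w):
    """Unit vectors v, w lie in the same ray iff w = e^{i theta} v,
    i.e. iff |<v, w>| = 1."""
    return np.isclose(abs(np.vdot(v, w)), 1.0)

# Coefficients with a common phase: conjugation only changes the overall phase.
c = np.exp(1j * 0.7) * np.array([0.6, 0.8])
assert same_ray(c, np.conj(c))

# Coefficients with different phases: c and its conjugate define distinct rays.
d = np.array([1.0, 1j]) / np.sqrt(2)
assert not same_ray(d, np.conj(d))
```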
•Step 5: The choice between (2.16) and (2.17) must be the same for all unit vectors Ψ for which
we have defined UΨ (i.e. the unit vectors where not all coefficients have the same phase).
Let Υ1 = Σn an Ψn and Υ2 = Σn bn Ψn be two unit vectors for which UΥ1 and UΥ2 are already
defined by steps 3 and 4 above, and such that UΥ1 satisfies equation (2.16) and UΥ2 satisfies
equation (2.17); in particular, this implies that UΥ1 ≠ UΥ2. Conservation of transition probability
gives
    |Σn an b̄n|² = |⟨Σk ak Ψk, Σl bl Ψl⟩|² = |⟨Υ1, Υ2⟩|²
                = |⟨UΥ1, UΥ2⟩|² = |⟨Σk ak UΨk, Σl b̄l UΨl⟩|²
                = |Σn an bn|².
Using this equality, we find that
    Σ_{k,l} [Re(bk b̄l)Re(ak āl) + Im(bk b̄l)Im(ak āl)] = Σ_{k,l} Re[(b̄k bl)(ak āl)]
        = Re[(Σk ak b̄k)(Σl āl bl)] = Re(|Σn an b̄n|²) = |Σn an b̄n|²
        = |Σn an bn|² = Re(|Σn an bn|²) = Re[(Σk ak bk)(Σl āl b̄l)]
        = Σ_{k,l} Re[(bk b̄l)(ak āl)]
        = Σ_{k,l} [Re(bk b̄l)Re(ak āl) − Im(bk b̄l)Im(ak āl)].
Thus, we find that UΥ1 and UΥ2 can satisfy (2.16) and (2.17), respectively, if and only if
    Σ_{k,l} Im(bk b̄l)Im(ak āl) = 0.                                        (2.18)
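In coordinates, the chain of equalities above reduces to the purely algebraic identity |Σn an b̄n|² − |Σn an bn|² = 2 Σ_{k,l} Im(ak āl) Im(bk b̄l), so (2.18) is exactly the vanishing of this difference. A numerical spot-check (random vectors; illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.normal(size=4) + 1j * rng.normal(size=4)
b = rng.normal(size=4) + 1j * rng.normal(size=4)

# |sum a_n conj(b_n)|^2 - |sum a_n b_n|^2  (np.vdot conjugates its first argument)
lhs = abs(np.vdot(a, b)) ** 2 - abs(np.dot(a, b)) ** 2

# Im(a_k conj(a_l)) and Im(b_k conj(b_l)) as 4x4 arrays:
ima = np.imag(np.outer(a, np.conj(a)))
imb = np.imag(np.outer(b, np.conj(b)))
rhs = 2 * np.sum(ima * imb)
assert np.isclose(lhs, rhs)
# Equation (2.18) states that this difference vanishes.
```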
We now separate two cases.
Case 1. If Υ1 and Υ2 lie in the same unit ray, then there exists a θ ∈ (0, 2π) such that for all
n we have bn = an e^{iθ}. But then bk b̄l = ak e^{iθ} āl e^{−iθ} = ak āl for all k, l, so equation (2.18) gives
Σ_{k,l} [Im(bk b̄l)]² = 0 = Σ_{k,l} [Im(ak āl)]², which implies that bk b̄l, ak āl ∈ R for all k, l. This in turn
implies that all an's have the same phase and all bn's have the same phase. But this contradicts
our assumption that UΥ1 and UΥ2 were already defined by our procedure. Hence, if Υ1 and Υ2
are in the same unit ray then the same choice between (2.16) and (2.17) must be made for UΥ1
and UΥ2.
Case 2. Now suppose that Υ1 and Υ2 are not in the same unit ray. Because we have assumed that
UΥ1 and UΥ2 are defined, not all ak āl and not all bk b̄l are real (since otherwise all an would have
the same phase, as would all bn). We again separate two cases.
Case 2a. If there exists a pair (i, j) such that both ai āj and bi b̄j are not real, then define a unit
vector Υ = Σn cn Ψn = (e^{iλ}Ψi + e^{iµ}Ψj)/√2 with 0 < λ < µ < 2π. Then Σ_{k,l} Im(ck c̄l)Im(ak āl) ≠ 0
and Σ_{k,l} Im(ck c̄l)Im(bk b̄l) ≠ 0, so for UΥ and UΥ1 the same choice between (2.16) and (2.17)
must be made, and for UΥ and UΥ2 the same choice between (2.16) and (2.17) must be made.
Hence the same choice must be made for UΥ1 and UΥ2.
Case 2b. Now suppose that there is no such pair (i, j). Then we choose a pair (i, j) for which
ai āj is not real and we choose a different pair (m, n) (possibly with {m, n} ∩ {i, j} ≠ ∅) for
which bm b̄n is not real. Now take a unit vector Υ = Σn cn Ψn = Σ_{k∈{i,j,m,n}} ck Ψk such that all
(three or four) coefficients ci, cj, cm, cn have different phases. Then Σ_{k,l} Im(ck c̄l)Im(ak āl) ≠ 0 and
Σ_{k,l} Im(ck c̄l)Im(bk b̄l) ≠ 0, so again we conclude that the same choice between (2.16) and (2.17)
must be made for UΥ1 and UΥ2.
Thus for all unit vectors Ψ for which we have already defined UΨ, the same choice between
(2.16) and (2.17) must be made.
•Step 6: Define U for those unit vectors Ψ for which UΨ was not yet defined by the previous
steps.
As stated before, for those unit vectors Ψ = Σn cn Ψn for which we have not yet defined UΨ, we
have that both Σn cn UΨn and Σn c̄n UΨn are in the unit ray ŝ(R(Ψ)). We will now define UΨ
for these vectors as follows. If U satisfies (2.16) for all unit vectors for which U is defined, then we
define UΨ := Σn cn UΨn. If U satisfies (2.17) for all unit vectors for which U is defined, then we
define UΨ := Σn c̄n UΨn.
•Step 7: Define U for Ψ ∈ H with ‖Ψ‖ ≠ 1.
We have now defined UΨ for all unit vectors Ψ in H, and U either satisfies (2.16) for all unit
vectors, or else U satisfies (2.17) for all unit vectors. In both cases we can extend U to a map
U : H → H by defining UΨ := ‖Ψ‖ U(Ψ/‖Ψ‖) for all Ψ ∈ H. In the first case, U becomes a linear
map on H and for arbitrary Ψ = Σn an Ψn and Φ = Σn bn Ψn in H we have
    ⟨UΨ, UΦ⟩ = Σ_{k,l} ak b̄l ⟨UΨk, UΨl⟩ = Σ_{k,l} ak b̄l ⟨Ψk, Ψl⟩ = Σn an b̄n = ⟨Ψ, Φ⟩,
so U is unitary. In the second case, U becomes an antilinear map on H and for arbitrary Ψ =
Σn an Ψn and Φ = Σn bn Ψn in H we have
    ⟨UΨ, UΦ⟩ = Σ_{k,l} āk bl ⟨UΨk, UΨl⟩ = Σ_{k,l} āk bl ⟨Ψk, Ψl⟩ = Σn ān bn = ⟨Φ, Ψ⟩,
so U is antiunitary.
•Step 8: Uniqueness of U up to a phase factor.
Now suppose that U′ : H → H is another (anti)linear (anti)unitary map satisfying U′Ψ ∈ ŝ(R(Ψ))
for all unit vectors Ψ ∈ H. Choose an arbitrary unit vector Ψ0 ∈ H. Because UΨ0, U′Ψ0 ∈
ŝ(R(Ψ0)), there exists a λ0 ∈ [0, 2π) such that U′Ψ0 = e^{iλ0} UΨ0.
Let R be a unit ray in H that is not orthogonal to the unit ray R(Ψ0), and let Ψ ∈ R be the
unique unit vector with ⟨Ψ, Ψ0⟩ ∈ R>0. Since UΨ and U′Ψ lie in the same unit ray, we can write
U′Ψ = e^{iλ} UΨ for some λ ∈ [0, 2π). But, because U and U′ preserve real inner products,
    ⟨UΨ, UΨ0⟩ = ⟨Ψ, Ψ0⟩ = ⟨U′Ψ, U′Ψ0⟩ = e^{i(λ−λ0)} ⟨UΨ, UΨ0⟩,
so λ = λ0 and thus U′Ψ = e^{iλ0} UΨ. Using the fact that U and U′ are both (anti)linear, we find
that this holds not only for Ψ, but for all vectors in CΨ. We have thus found that U′Υ = e^{iλ0} UΥ
for all Υ ∈ H with ⟨Υ, Ψ0⟩ ≠ 0.
Now let Ψ ∈ H with ⟨Ψ, Ψ0⟩ = 0. Define Φ = Ψ + Ψ0. Because Φ is not orthogonal to Ψ0, it
follows from the discussion above that U′Φ = e^{iλ0} UΦ. Because U and U′ are R-linear, we have
that UΦ = UΨ + UΨ0 and U′Φ = U′Ψ + U′Ψ0. This gives us
    U′Ψ = U′Φ − U′Ψ0 = e^{iλ0}(UΦ − UΨ0) = e^{iλ0} UΨ.
We thus conclude that U′ = e^{iλ0} U.
In case U is unitary, we have for all Ψ ∈ H and any observable A
    ⟨AΨ, Ψ⟩ = ρΨ(A) = (s′ρΨ)(sA) = ⟨(sA)UΨ, UΨ⟩ = ⟨U*(sA)UΨ, Ψ⟩,
which implies that A = U*(sA)U, or
    sA = UAU⁻¹,                                                            (2.19)
where we have used that unitarity of a linear map U : H → H is equivalent to U* = U⁻¹. Note
that the expression for sA in (2.19) does not depend on the arbitrary phase factor in U, and the
bijection s : A0 → A0 so defined is R-linear and satisfies sA² = (sA)². We may as well extend
(2.19) to a bijection s : B(H) → B(H), and this bijection is in fact an automorphism of the
C*-algebra B(H) (we will discuss this in more detail in subsection 4.2.1). Now consider the case
where U is anti-unitary. Because the definition of the adjoint U* of an anti-linear map is given by
the condition ⟨UΨ1, Ψ2⟩ = ⟨Ψ1, U*Ψ2⟩, we now obtain the equality
    ⟨AΨ, Ψ⟩ = ρΨ(A) = (s′ρΨ)(sA) = ⟨(sA)UΨ, UΨ⟩ = ⟨U*(sA)UΨ, Ψ⟩ = ⟨Ψ, U*(sA)UΨ⟩
for all Ψ ∈ H and any observable A. This implies that A* = U*(sA)U, or
    sA = UA*U⁻¹,                                                           (2.20)
where we have used that anti-unitarity of an anti-linear map U : H → H is equivalent to U* = U⁻¹.
Again this expression does not depend on the arbitrary phase factor in U, and the map
s : A0 → A0 in (2.20) is R-linear and satisfies sA² = (sA)². When we extend s in (2.20) to a map
s : B(H) → B(H), we obtain an anti-automorphism of the C*-algebra B(H) (i.e. a ∗-preserving
vector space isomorphism satisfying s(AB) = (sB)(sA)). We will come back to this in subsection 4.2.1.
It is clear that the set of all symmetries of a quantum system forms a group under the composition
of (bijective) maps; it is called the symmetry group of the system. Now suppose that
(s1, s′1) and (s2, s′2) are two symmetries of a quantum system with corresponding unit ray
transformations ŝ1 and ŝ2. Then the composition (s2 ∘ s1, s′2 ∘ s′1) is also a symmetry of the system
with corresponding unit ray transformation ŝ2 ∘ ŝ1. By Wigner's theorem there exist operators
U1, U2, U2∘1 : H → H, each of which is either linear and unitary or antilinear and antiunitary,
such that U1Ψ ∈ ŝ1(R(Ψ)), U2Ψ ∈ ŝ2(R(Ψ)) and U2∘1Ψ ∈ ŝ2 ∘ ŝ1(R(Ψ)) for all unit vectors
Ψ ∈ H. Hence we must have U2 ∘ U1 = λ(ŝ1, ŝ2)U2∘1, where λ(ŝ1, ŝ2) is some complex number with
|λ(ŝ1, ŝ2)| = 1. In other words, we may conclude that if G is (a subgroup of) the symmetry group
of the quantum system and if for each g ∈ G we have chosen an operator U(g) as in Wigner's
theorem, then for all g1, g2 ∈ G we have
    U(g1)U(g2) = λ(g1, g2)U(g1g2),
where λ : G × G → C with |λ(g1, g2)| = 1 for all g1, g2 ∈ G. We say that U : G → B(H) is a
ray representation of G: it is a representation of G up to a phase factor. Because B(H) is an
associative algebra, we must have (U(g1)U(g2))U(g3) = U(g1)(U(g2)U(g3)) for all g1, g2, g3 ∈ G,
which implies that λ must satisfy
    λ(g1, g2)λ(g1g2, g3) = λ(g1, g2g3)λ(g2, g3);
a function λ : G × G → C with |λ(g, h)| = 1 for all g, h ∈ G satisfying this equation is called
a 2-cocycle of G. Because the operators U(g) are only determined up to a phase factor, we can
redefine them by letting U′(g) := µ(g)U(g), where µ : G → C with |µ(g)| = 1 for all g ∈ G. By
considering all such functions µ, we obtain all possible ray representations U′ : G → B(H) that
give rise to the same ray transformations. A natural question to ask is whether we can choose µ
such that λ(g1, g2) = 1 for all g1, g2 ∈ G; in that case we get U(g1)U(g2) = U(g1g2) for all
g1, g2 ∈ G, so U becomes an ordinary representation of G instead of a ray representation. In the
next subsection we will answer this question for the case where G is the restricted Poincaré
group P↑+.
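A λ obtained by re-phasing an honest representation, λ(g, h) = µ(g)µ(h)/µ(gh), automatically satisfies the 2-cocycle identity. The sketch below checks this for such a "coboundary" on the cyclic group Z₆ with random phases µ (a toy group chosen only to make the check concrete; it has no relation to the Poincaré group):

```python
import numpy as np
from itertools import product

n = 6                                                  # work in the cyclic group Z_n
rng = np.random.default_rng(2)
mu = np.exp(1j * rng.uniform(0, 2 * np.pi, size=n))    # arbitrary phases mu(g)

def lam(g, h):
    """Coboundary lambda(g,h) = mu(g)mu(h)/mu(gh); |lambda| = 1 automatically."""
    return mu[g] * mu[h] / mu[(g + h) % n]

# 2-cocycle identity: lambda(g1,g2) lambda(g1 g2, g3) = lambda(g1, g2 g3) lambda(g2, g3)
for g1, g2, g3 in product(range(n), repeat=3):
    assert np.isclose(lam(g1, g2) * lam((g1 + g2) % n, g3),
                      lam(g1, (g2 + g3) % n) * lam(g2, g3))
```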
2.2.4

Poincaré invariance and one-particle states

In relativistic quantum theory it is believed that the laws of nature are the same for two observers
whose reference frames are related by a restricted Poincaré transformation, so every restricted
Poincaré transformation on spacetime must give rise to a symmetry of the quantum system. For
notational simplicity we will identify a restricted Poincaré transformation g ∈ P↑+ with its
associated symmetry by writing g instead of ŝ(g) for the corresponding ray transformation. We can
then say that in relativistic quantum theory P↑+ must be a subgroup of the symmetry group of
the theory. By the remarks in the previous subsection, this means that the action of P↑+ on a
quantum system is given by a ray representation U : P↑+ → B(H) of P↑+ on H. In a sufficiently
small neighborhood N ⊂ P↑+ of the identity 1 ∈ P↑+, each element g ∈ N can be written as g = h²
with h ∈ P↑+. But then U(g) = λ(h, h)⁻¹U(h)U(h), which is linear and unitary, regardless of
whether U(h) is linear and unitary or antilinear and antiunitary. Hence for all g ∈ N the operator
U(g) is linear and unitary. Because each element g ∈ P↑+ can be written as a finite product of
elements in N (since P↑+ is a connected Lie group), this in turn implies that U(g) is linear and
unitary for all g ∈ P↑+. In other words, the action of P↑+ on a quantum system is given by a
unitary ray representation U : P↑+ → B(H) of P↑+ on H.
Now suppose that Uray : P↑+ → B(H) is a unitary ray representation of P↑+. If Φ : P̃↑+ → P↑+ is
the covering map, then Ũray := Uray ∘ Φ : P̃↑+ → B(H) is a unitary ray representation of P̃↑+. For
Ũray the following theorem applies.

Theorem 2.23 Any unitary ray representation of P̃↑+ can, by a suitable choice of phase factors,
be made into a unitary representation of P̃↑+.¹³

Thus there exists a function µ : P̃↑+ → C with |µ(g̃)| = 1 for all g̃ ∈ P̃↑+ such that g̃ ↦ µ(g̃)Ũray(g̃)
is a unitary representation of P̃↑+; we denote this unitary representation of P̃↑+ by Ũ. The ray
representation Uray of P↑+ can thus be described by the unitary representation Ũ : P̃↑+ → B(H) of
the universal covering group, since R(Uray(g)Ψ) = R(Ũ(g̃)Ψ), where g̃ is one of the two elements
of the set Φ⁻¹({g}).
Conversely, each unitary representation of P̃↑+ also gives rise to a ray representation of P↑+.
To see this, suppose that Ũ is a unitary representation of P̃↑+. Because Ũ(1) = 1_H and because
Ũ(−1)Ũ(−1) = Ũ(1) = 1_H, we must have Ũ(−1) = ±1_H. Then for each g̃ ∈ P̃↑+ we have
Ũ(−g̃) = Ũ(−1)Ũ(g̃) = ±Ũ(g̃), so R(Ũ(−g̃)Ψ) = R(Ũ(g̃)Ψ). This shows that Ũ gives rise to a
unitary ray representation Uray of P↑+ by choosing Uray(g) ∈ {Ũ(±g̃)} for all g ∈ P↑+, where g̃ is
one of the two elements of Φ⁻¹({g}). We can thus conclude that for each unitary ray representation
of P↑+ there is a unitary representation of P̃↑+ that gives rise to the same transformation of unit
rays of H, and that for each unitary representation of P̃↑+ there exists a unitary ray representation
of P↑+ that gives rise to the same transformation of unit rays of H.
Classification of the irreducible representations of P̃↑+

We will now study the irreducible unitary representations of P̃↑+ in some detail. There are several
texts where one can find a discussion of these representations, but each of these texts misses
some of the information that can be found in one of the other texts. Here we have tried to include
as much (relevant) information as possible, based on [1], [3], [2], [8], [20], [29], [32] and [35]. Let
U be an irreducible unitary representation of P̃↑+ in the Hilbert space H. We will always assume
that such representations are continuous with respect to the weak operator topology on B(H).
So to each element (a, L) ∈ P̃↑+ there corresponds a unitary operator U(a, L) on H. Before we
proceed we have to choose some convention concerning the physical interpretation of the elements
of SL(2, C). In our discussion we will assume that we have chosen a fixed basis {eµ} of M. We
then let Φ : SL(2, C) → L↑+ be precisely the (basis-dependent) covering map that was defined
in section 2.1.2. So for instance, the element e^{(it/2)σj} corresponds to a spatial rotation around the
xj-axis (over an angle t) in the chosen basis.
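Concretely, with the linear isomorphism ψ(x) = x⁰1 + Σj x^j σj between M and the Hermitian 2×2 matrices, the covering map is determined by ψ(Φ(A)x) = Aψ(x)A*. The sketch below (NumPy; it assumes this standard form of ψ, which matches section 2.1.2 only up to sign and ordering conventions) recovers Φ(A) for A = e^{(it/2)σ3}, checks that it is a rotation, and checks the two-to-one property Φ(A) = Φ(−A):

```python
import numpy as np

sig = [np.eye(2, dtype=complex),
       np.array([[0, 1], [1, 0]], dtype=complex),
       np.array([[0, -1j], [1j, 0]]),
       np.array([[1, 0], [0, -1]], dtype=complex)]

def psi(x):
    """psi(x) = x^0 * 1 + x.sigma, a Hermitian 2x2 matrix for real x."""
    return sum(x[mu] * sig[mu] for mu in range(4))

def psi_inv(H):
    """Invert psi using tr(sigma_mu sigma_nu) = 2 delta_{mu nu}."""
    return np.array([np.trace(sig[mu] @ H).real / 2 for mu in range(4)])

def Phi(A):
    """Covering map: the Lorentz matrix with columns psi^{-1}(A psi(e_nu) A*)."""
    return np.column_stack([psi_inv(A @ psi(e) @ A.conj().T) for e in np.eye(4)])

t = 0.4
A = np.array([[np.exp(1j * t / 2), 0], [0, np.exp(-1j * t / 2)]])  # e^{(it/2) sigma_3}
L = Phi(A)

x = np.array([1.0, 0.3, -0.2, 0.5])
eta = np.diag([1.0, -1.0, -1.0, -1.0])
assert np.isclose((L @ x) @ eta @ (L @ x), x @ eta @ x)       # Minkowski product preserved
assert np.isclose(L[0, 0], 1.0) and np.allclose(L[0, 1:], 0)  # x^0 fixed: a rotation
assert np.allclose(Phi(A), Phi(-A))                           # double cover: Phi(A) = Phi(-A)
```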
We will now study the representations in several steps.

•Step 1: Decomposition according to the translation subgroup
The abelian subgroup {(a, 1)}a∈M ⊂ P̃↑+ of translations gives rise to a continuous 4-parameter
unitary group U(a, 1) of commuting unitary operators on H. The four parameters correspond to
the decomposition of the vectors a ∈ M with respect to our chosen orthonormal basis {eµ}³µ=0
of M. According to a generalization of Stone's theorem on strongly continuous 1-parameter
unitary groups in a Hilbert space (see also section X.5 of [5]), called the SNAG theorem (for Stone,
Naimark, Ambrose and Godement), there exist four pairwise commuting self-adjoint operators P^µ
(µ = 0, 1, 2, 3) defined on a common dense domain DP ⊂ H such that
    U(a, 1) = e^{ia·P}
for all a ∈ M, where we have used the notation P := (P⁰, P¹, P², P³). The operator P^µ is
called the generator of translations in the x^µ-direction. Under an SL(2, C) transformation these
operators transform according to
    U(0, A) P^µ U(0, A)⁻¹ = Φ(A)_ν^µ P^ν,
which follows from
    U(0, A) e^{ia^µ P_µ} U(0, A)⁻¹ = U(Φ(A)a, 1) = e^{ia^µ Φ(A)^ν_µ P_ν},
which implies that U(0, A) P_µ U(0, A)⁻¹ = Φ(A)^ν_µ P_ν. Let EP denote the joint spectral measure
of the operators P^µ as defined in appendix A.2. Then the operators P^µ and U(a, 1) on H can be
written as
    P^µ = ∫_M p^µ dEP(p)    and    U(a, 1) = ∫_M e^{ia·p} dEP(p).
Here we write M (Minkowski space) instead of R⁴, because we need the Minkowski metric η on
this space. As stated in appendix A.2 we can represent H as a direct integral
    ∫⊕_M H(p) dµ(p)
of Hilbert spaces H(p), corresponding to the operators P^µ. If we denote the decomposition of
an element Ψ ∈ H with respect to this direct integral decomposition as a function Ψ(p), where
Ψ(p) ∈ H(p) for all p ∈ M and
    ∫_M ‖Ψ(p)‖²_{H(p)} dµ(p) < ∞,
then for each Ψ ∈ DP we have (P^µΨ)(p) = p^µΨ(p) and for each Ψ ∈ H we have (U(a, 1)Ψ)(p) =
e^{ia·p}Ψ(p). Of course one is always free to change the values of the functions Ψ(p) on a set of
µ-measure zero, but we will simply ignore this fact in what follows. In other words, we always
pretend that we have chosen some particular representative of the equivalence class of functions
Ψ(p) corresponding to a given Ψ ∈ H.

¹³This theorem is not true when we replace P̃↑+ by P↑+.
•Step 2: Lorentz generators
According to Stone's theorem, there exist self-adjoint operators {M^j}³j=1 and {N^j}³j=1 on H such
that the one-parameter unitary groups {U(0, e^{(it/2)σj})}³j=1 and {U(0, e^{(t/2)σj})}³j=1 can be
written as {e^{itM^j}}³j=1 and {e^{itN^j}}³j=1, respectively. For obvious reasons we will call the
operator M^j the generator of a rotation around the x^j-axis and the operator N^j the generator of
a Lorentz boost in the x^j-direction. Now define a set of operators {M^{µν}}³µ,ν=0 by
    M^{µν} = −M^{νµ},    M^{jk} = M^l,    M^{0j} = N^j,
where in the second equation (j, k, l) is a cyclic permutation of (1, 2, 3). The operators M^{µν} satisfy
    U(0, A) M^{µν} U(0, A)⁻¹ = Φ(A)^µ_ρ Φ(A)^ν_σ M^{ρσ}.
It follows from the definition of M^{µν} above that the operators iM^{µν} and iP^µ satisfy the same
commutation relations as the Lie algebra basis elements X_{µν} and Y_µ, so
    [M^{µν}, M^{ρσ}] = −i(η^{µρ}M^{σν} + η^{νρ}M^{µσ} + η^{νσ}M^{ρµ} + η^{µσ}M^{νρ}),
    [M^{µν}, P^ρ] = i(η^{νρ}P^µ − η^{µρ}P^ν),
    [P^µ, P^ν] = 0.
An immediate consequence of these relations is that the operator P² := P_µ P^µ commutes with all
generators P^µ and M^{µν}. Because U is an irreducible representation of P̃↑+ on H, it follows from
Schur's lemma that P² is a scalar multiple of the identity operator, i.e. P² = c1 1_H. In particular,
this implies that the measure µ is supported in a subset of M of which all elements p have the
same value of p · p; we will come back to this in more detail in step 5. From the generators
P^µ and M^{µν} we can construct four new operators
    W^µ := −(1/2) ε^{µνρσ} M_{νρ} P_σ,
called Pauli-Lubanski operators. Here ε^{µνρσ} is the completely antisymmetric tensor, normalized
such that ε^{0123} = −ε_{0123} = 1; in some textbooks (for example in [2]) the sign of this antisymmetric
tensor is reversed, i.e. ε^{0123} = −ε_{0123} = −1, and in those texts there is no minus sign in the
definition of W^µ. The Pauli-Lubanski operators satisfy
    P_µ W^µ = 0.
The Pauli-Lubanski operators all commute with the operators P^µ, i.e. [W^µ, P^ν] = 0, so the action
of W^µ on a vector Ψ ∈ H may be described by (W^µΨ)(p) = W^µ(p)Ψ(p) for some set of operators
{W^µ(p) : H(p) → H(p)}_{p∈M}. Also, the operator W² = W_µ W^µ commutes with all generators P^µ
and M^{µν}, so it must be a scalar multiple of the identity operator, i.e. W² = c2 1_H.
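The identity P_µ W^µ = 0 follows purely from the antisymmetry of ε^{µνρσ} contracted with the symmetric combination P_σ P_µ (the P_µ commute). A numerical sketch, replacing the operators P_µ by numbers p_µ and M_{νρ} by an arbitrary antisymmetric array (an obvious simplification: operator ordering plays no role in this particular identity):

```python
import numpy as np
from itertools import permutations

# epsilon^{mu nu rho sigma} with epsilon^{0123} = +1
eps = np.zeros((4, 4, 4, 4))
for perm in permutations(range(4)):
    inv = sum(1 for i in range(4) for j in range(i + 1, 4) if perm[i] > perm[j])
    eps[perm] = (-1) ** inv              # parity of the permutation

rng = np.random.default_rng(3)
M_low = rng.normal(size=(4, 4))
M_low = M_low - M_low.T                  # antisymmetric placeholder for M_{nu rho}
p_low = rng.normal(size=4)               # covariant components p_sigma

# W^mu = -(1/2) eps^{mu nu rho sigma} M_{nu rho} p_sigma
W_up = -0.5 * np.einsum('mnrs,nr,s->m', eps, M_low, p_low)

# p_mu W^mu vanishes identically, by antisymmetry of eps in (mu, sigma):
assert np.isclose(p_low @ W_up, 0.0)
```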
•Step 3: The action of SL(2, C) on the spectral measures
Because U(0, A)U(a, 1)U(0, A)* = U((0, A)(a, 1)(0, A⁻¹)) = U(Φ(A)a, 1) for all A ∈ SL(2, C), the
spectral measures (defined in step 1) satisfy
    U(0, A) EP(∆) U(0, A)* = EP(Φ(A)∆)                                     (2.21)
for all Borel sets ∆ ⊂ M. Because U(0, A)* is unitary, we have U(0, A)*H = H, so (2.21) implies
that
    U(0, A) EP(∆) H = EP(Φ(A)∆) H.
If we use the notation H∆ := EP(∆)H, then this can be rewritten as
    U(0, A) H∆ = H_{Φ(A)∆}.                                                (2.22)
In particular, this implies that we are free to identify the spaces H(p1) and H(p2) with each other
if p1 and p2 are related by a restricted Lorentz transformation.
•Step 4: The support of the measure µ is an L↑+-invariant subset of M
Because U(0, A) is unitary, we have for each Borel set ∆ ⊂ M and for all vectors Ψ1, Ψ2 ∈ H∆
    ∫_∆ ⟨Ψ1(p), Ψ2(p)⟩_{H(p)} dµ(p) = ⟨Ψ1, Ψ2⟩ = ⟨U(0, A)Ψ1, U(0, A)Ψ2⟩
        = ∫_{Φ(A)∆} ⟨(U(0, A)Ψ1)(p), (U(0, A)Ψ2)(p)⟩_{H(p)} dµ(p),
where ⟨·, ·⟩_{H(p)} denotes the inner product on H(p). Now let q ∈ M be a point outside the support
of the measure µ, which means that there exists an open neighborhood Vq of q with µ(Vq) = 0.
For all Ψ1, Ψ2 ∈ H_{Vq} we then have
    0 = ∫_{Vq} ⟨Ψ1(p), Ψ2(p)⟩_{H(p)} dµ(p) = ∫_{Φ(A)Vq} ⟨(U(0, A)Ψ1)(p), (U(0, A)Ψ2)(p)⟩_{H(p)} dµ(p).
In particular, when we choose Ψ1 and Ψ2 such that the integrand on the right-hand side is strictly
positive on Φ(A)Vq, we find that µ(Φ(A)Vq) = 0. But Φ(A)Vq is an open neighborhood of the
point Φ(A)q, so the point Φ(A)q also lies outside the support of µ. Because this is true for all
A ∈ SL(2, C) and because Φ(SL(2, C)) = L↑+, it follows that all elements of the form Lq with
L ∈ L↑+ lie outside the support of µ. This shows that the support of µ must in fact be invariant
under L↑+.
•Step 5: Orbits and their relation to irreducible representations
For p ∈ M we call the set {Lp}_{L∈L↑+} the orbit of the point p. We can define an equivalence
relation on M by defining two points in M to be equivalent if and only if they have the same orbit; in
particular, this gives rise to a partition of M into disjoint orbits. It is clear that {0} is an orbit.
We will now characterize all other orbits, so in the following discussion the orbits are assumed to
be different from {0}. Note that the elements of one orbit all have the same value of p² = p · p.
When this value is nonnegative, we can write it as p² = m² with m ≥ 0. In case this value is
negative, we can write it as p² = (im)² with m > 0. So to each orbit we assign a number m (or
im) in this manner. In the nonnegative case, it follows from our results of section 2.1.2 that for
all elements p in the same orbit the component p⁰ has the same sign (note that p⁰ ≠ 0 because
we have excluded the orbit {0} from our discussion), so to orbits with p² ≥ 0 we can also assign
a sign ε ∈ {+, −}. The label (m, ε) completely characterizes the orbits with p² ≥ 0. For an orbit
with p² < 0, the label im completely characterizes the orbit. Thus M is partitioned into the
following orbits (where m ≥ 0):
    O⁺m = {p ∈ M : p² = m², p⁰ > 0};
    O⁻m = {p ∈ M : p² = m², p⁰ < 0};
    O_im = {p ∈ M : p² = −m²};
    {0}.
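This partition can be phrased as a small classification routine; the sketch below labels a point of M by its orbit using only the invariants p·p and the sign of p⁰ (labels are ad hoc strings chosen for this illustration):

```python
import numpy as np

def classify_orbit(p, tol=1e-12):
    """Return the orbit label of p in Minkowski space (signature +,-,-,-)."""
    p = np.asarray(p, dtype=float)
    p2 = p[0] ** 2 - np.dot(p[1:], p[1:])          # the invariant p . p
    if np.allclose(p, 0):
        return "{0}"
    if p2 >= -tol:                                 # p^2 = m^2 >= 0, p != 0
        m = np.sqrt(max(p2, 0.0))
        return f"O^+_{m:.3g}" if p[0] > 0 else f"O^-_{m:.3g}"
    m = np.sqrt(-p2)                               # p^2 = (im)^2 < 0
    return f"O_i{m:.3g}"

assert classify_orbit([0, 0, 0, 0]) == "{0}"
assert classify_orbit([5, 0, 0, 3]).startswith("O^+_4")   # m = 4, p0 > 0
assert classify_orbit([-5, 0, 0, 3]).startswith("O^-_4")
assert classify_orbit([0, 1, 0, 0]).startswith("O_i1")
assert classify_orbit([2, 2, 0, 0]).startswith("O^+_0")   # the forward light cone
```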
Because the support of the measure µ is L↑+-invariant, the support of µ is a union of complete
orbits in M. If it contains two or more orbits, then we can always construct an open subset W
that is invariant under L↑+ and is such that it contains at least one orbit in the support of µ and
such that it excludes at least one orbit in the support of µ. Using the results that we derived
earlier, we then find that
    U(0, A) H_W = H_{Φ(A)W} = H_W
and
    U(a, 1) H_W = (∫_M e^{ia·p} dEP(p)) EP(W)H = (∫_W e^{ia·p} dEP(p)) H ⊂ H_W,
so H_W is an invariant subspace of H, contradicting the irreducibility of U. We thus conclude that
the support of µ consists of exactly one orbit, and the operator P² = P_µ P^µ = c1 1_H that we
defined in step 2 is given by
    P² = m² 1_H.
Therefore each irreducible unitary representation of P̃↑+ is (partly) characterized by the labels of
the corresponding orbit. In particular, H can be represented as the direct integral
    ∫⊕_O H(p) dµ(p),
where O denotes the corresponding orbit of the irreducible representation U. As we shall see later,
the only orbits of physical relevance are the orbits O⁺m (with m ≥ 0) and {0}, so from now on we
will only consider these orbits.

•Step 6: Representations corresponding to the orbit {0}
In this case the irreducible representations are either one-dimensional (the trivial representation)
or infinite-dimensional. In physics, only the trivial representation will be relevant:
    U(a, A)Ψ = Ψ
for all Ψ ∈ H, where H is one-dimensional.
•Step 7: Representations corresponding to the orbits O⁺m with m ≥ 0
We now fix some orbit O⁺m and derive some general properties of the corresponding irreducible
representations. The Hilbert space H can be decomposed according to
    H = ∫⊕_{O⁺m} H(p) dµm(p).
It can be shown that the requirements that the support of the measure µm must equal O⁺m and
that µm must be L↑+-invariant determine µm uniquely up to a positive factor. We will choose this
factor in such a way that the measure is given (on O⁺m) by
    dµm(p) = d³p/(2p⁰),                                                    (2.23)
where we write p = (p⁰, p) for all p ∈ O⁺m, which of course implies that p⁰ = √(m² + p²). A nice
property of this normalization of µm is that for each function f : M → R we have
    ∫_M δ(p² − m²) θ(p⁰) f(p) d⁴p = ∫_{O⁺m} f(√(m² + p²), p) dµm(p),
where θ : R → {0, 1} denotes the step function. Note that the elements of our Hilbert space can
now be represented as (∪_{p∈O⁺m} H(p))-valued functions Ψ on O⁺m with Ψ(p) ∈ H(p) for all
p ∈ O⁺m and
    ∫_{O⁺m} ‖Ψ(p)‖²_{H(p)} dµm(p) < ∞.
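The Lorentz invariance of the measure d³p/(2p⁰) can be made plausible numerically: under a boost along the z-axis, p′z = γ(pz + βp⁰) and p′⁰ = γ(p⁰ + βpz), and the Jacobian ∂p′z/∂pz equals p′⁰/p⁰, so dp′z/p′⁰ = dpz/p⁰. A finite-difference sketch (the parameter values are illustrative):

```python
import numpy as np

m, beta = 1.0, 0.6
gamma = 1.0 / np.sqrt(1.0 - beta ** 2)
px, py = 0.3, -0.1                     # transverse momenta, unchanged by a z-boost

def energy(pz):
    return np.sqrt(m ** 2 + px ** 2 + py ** 2 + pz ** 2)

def boosted_pz(pz):
    return gamma * (pz + beta * energy(pz))

pz, h = 0.7, 1e-6
jac = (boosted_pz(pz + h) - boosted_pz(pz - h)) / (2 * h)   # d pz'/d pz numerically
p0 = energy(pz)
p0_prime = gamma * (p0 + beta * pz)                         # boosted energy p'^0

# d pz'/d pz = p'^0 / p^0, hence d^3 p'/(2 p'^0) = d^3 p/(2 p^0):
assert np.isclose(jac, p0_prime / p0, rtol=1e-6)
```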
Because U(0, A)H∆ = H_{Φ(A)∆} for all Borel sets ∆ and because Φ(A) : M → M is bijective,
we can now define for each p ∈ O⁺m in the orbit the vector space isomorphisms (but not Hilbert
space isomorphisms) U_{p→Φ(A)p}(0, A) : H(p) → H(Φ(A)p) such that
    U_{p→Φ(A)p}(0, A)(Ψ(p)) := (U(0, A)Ψ)(Φ(A)p)                           (2.24)
for all p ∈ O⁺m. Sometimes we will simply write U_p(0, A) instead of U_{p→Φ(A)p}(0, A) to save some
space whenever the equations get too long. Because U(0, A) is unitary, these mappings are indeed
isomorphisms of vector spaces for all p ∈ O⁺m. They are not isomorphisms of Hilbert spaces (i.e.
inner product preserving) because of the p⁰ in the denominator of equation (2.23); however, the
map √(p⁰/(Φ(A)p)⁰) U_{p→Φ(A)p}(0, A) : H(p) → H(Φ(A)p) is in fact a Hilbert space isomorphism.
This follows from the fact that
    ‖Ψ(p)‖²_{H(p)} d³p/(2p⁰) = ‖U_{p→Φ(A)p}(0, A)Ψ(p)‖²_{H(Φ(A)p)} d³p/(2(Φ(A)p)⁰)
(which follows from the unitarity of U(0, A)), which in turn implies that
    ‖Ψ(p)‖²_{H(p)} = (p⁰/(Φ(A)p)⁰) ‖U_{p→Φ(A)p}(0, A)Ψ(p)‖²_{H(Φ(A)p)}
                 = ‖√(p⁰/(Φ(A)p)⁰) U_{p→Φ(A)p}(0, A)Ψ(p)‖²_{H(Φ(A)p)}.
We will use the Hilbert space isomorphisms √(p⁰/(Φ(A)p)⁰) U_{p→Φ(A)p}(0, A) later when we define
orthonormal bases on the spaces H(p).
Note that the (vector space) isomorphisms U_{p→Φ(A)p}(0, A) satisfy
    U_{Φ(AB)⁻¹p}(0, AB)(Ψ(Φ(AB)⁻¹p)) = U_{Φ(A)⁻¹p}(0, A) U_{Φ(AB)⁻¹p}(0, B)(Ψ(Φ(AB)⁻¹p)).
In particular, if Φ(A), Φ(B) ∈ L↑+ are such that Φ(A)p = Φ(B)p = p, then this becomes
    U_{p→p}(0, AB)(Ψ(p)) = U_{p→p}(0, A) U_{p→p}(0, B)(Ψ(p)).              (2.25)
Now fix an element k ∈ O⁺m in the orbit and choose¹⁴ an orthonormal basis {eσ(k)}σ of H(k).
We will now use the operators U_{p→Φ(A)p}(0, A) to define an orthonormal basis for the other points
of O⁺m. First we fix for each p ∈ O⁺m an element Mp ∈ SL(2, C) such that p = Φ(Mp)k. Then for
each p ∈ O⁺m we define an orthonormal basis {eσ(p)}σ of H(p) by
    eσ(p) := √(k⁰/p⁰) U_{k→p}(0, Mp) eσ(k).
Here we use that √(k⁰/p⁰) U_{k→p}(0, Mp) is a Hilbert space isomorphism, as shown above. With
this basis we can write each Ψ(p) ∈ H(p) as Ψ(p) = Σσ Ψ(p, σ) eσ(p), and we can identify a vector
Ψ ∈ H with the function Ψ(p, σ). We will see in a moment that the spaces H(p) are
finite-dimensional in all physically relevant cases, so the index σ takes on a finite number of values.
The Hilbert space H can thus be realized as a finite direct sum ⊕σ L²(O⁺m, µm) of copies of
L²(O⁺m, µm). The inner product is thus given by
    ⟨Ψ1, Ψ2⟩ = Σσ ∫_{O⁺m} Ψ1(p, σ) Ψ̄2(p, σ) dµm(p)
             = Σσ ∫_{R³} Ψ1((p⁰, p), σ) Ψ̄2((p⁰, p), σ) d³p/(2p⁰),
where in the last line p⁰ = √(m² + p²). Of course we could just as well have defined functions
Ψ(p, σ) with p in R³ (or R³\{0} for massless particles) and realize H as ⊕σ L²(R³, d³p/(2p⁰)), and
we will in fact do this later. However, for the moment we will stick with Ψ(p, σ) for notational
convenience.
Now that we have defined these bases of H(p) (given some basis of H(k)), we can express the
action of Up→Φ(A)p (0, A) for any A ∈ SL(2, C) as
s
k0
U
(0, A)Uk→p (0, Mp )eσ (k)
p0 p→Φ(A)p
s
k0
U
(0, AMp )eσ (k)
p0 k→Φ(A)p
s
k0
−1
U
(0, MΦ(A)p MΦ(A)p
AMp )eσ (k)
p0 k→Φ(A)p
s
k0
−1
U
(0, MΦ(A)p )Uk→k (0, MΦ(A)p
AMp )eσ (k),
p0 k→Φ(A)p
Up→Φ(A)p (0, A)eσ (p) =
=
=
=
(2.26)
where in the last step we used that
−1
−1
−1
Φ(MΦ(A)p
AMp )k = Φ(MΦ(A)p
)Φ(A)Φ(Mp )k = Φ(MΦ(A)p
)Φ(A)p = k.
+ in the orbit we define
To understand (2.26) we introduce some terminology. For any point p ∈ Om
a subgroup Gp ⊂ SL(2, C) by
Gp := {A ∈ SL(2, C) : Φ(A)p = p}.
14
In steps 7a and 7b we will show how to choose such basis for the cases m > 0 and m = 0 separately.
38
+ and A ∈ SL(2, C) is such that Φ(A)p = p0 , then the
Clearly, if p0 is another point in the orbit Om
groups Gp and Gp0 are isomorphic and the isomorphism from Gp to Gp0 is given by B 7→ ABA−1 .
+
The isomorphic subgroups {Gp }p∈Om
+ are called the little group of the orbit Om .
−1
In this terminology, the transformation MΦ(A)p
AMp ∈ SL(2, C) in (2.26) is an element of the
little group Gk . In fact, it follows from (2.25) that U induces a unitary representation of Gk on the
Hilbert space H(k) by A 7→ Uk→k (0, A) for A ∈ Gk . For A ∈ Gk we write [Uk→k (0, A)]σ,σ0 for the
matrix components of the unitary operator Uk→k (0, A) on H(k) with respect to the orthonormal
basis eσ (k). With this notation we can write (2.26) as
s
!
X
k0
−1
Up→Φ(A)p (0, A)eσ (p) =
U
(0, MΦ(A)p )
[Uk→k (0, MΦ(A)p AMp )]σ0 ,σ eσ0 (k)
p0 k→Φ(A)p
0
σ
s
0
X
k
−1
[Uk→k (0, MΦ(A)p
AMp )]σ0 ,σ Uk→Φ(A)p (0, MΦ(A)p )eσ0 (k)
=
p0 0
|
{z
}
σ
r
=
s
=
(Φ(A)p)0
eσ0 (Φ(A)p)
k0
(Φ(A)p)0 X
−1
[Uk→k (0, MΦ(A)p
AMp )]σ0 ,σ eσ0 (Φ(A)p)
p0
0
(2.27)
σ
Using (2.24) with Φ(A)⁻¹p instead of p, we then find that
\[
\begin{aligned}
(U(0,A)\Psi)(p) &= U_{\Phi(A)^{-1}p\to p}(0,A)\bigl(\Psi(\Phi(A)^{-1}p)\bigr) \\
&= \sum_\sigma \Psi(\Phi(A)^{-1}p,\sigma)\, U_{\Phi(A)^{-1}p\to p}(0,A)\, e_\sigma(\Phi(A)^{-1}p) \\
&= \sum_\sigma \Psi(\Phi(A)^{-1}p,\sigma)\, \sqrt{\tfrac{p^0}{(\Phi(A)^{-1}p)^0}} \sum_{\sigma'} [U_{k\to k}(0,M_p^{-1}AM_{\Phi(A)^{-1}p})]_{\sigma',\sigma}\, e_{\sigma'}(p) \\
&= \sqrt{\tfrac{p^0}{(\Phi(A)^{-1}p)^0}} \sum_{\sigma'} \Bigl\{ \sum_\sigma [U_{k\to k}(0,M_p^{-1}AM_{\Phi(A)^{-1}p})]_{\sigma',\sigma}\, \Psi(\Phi(A)^{-1}p,\sigma) \Bigr\}\, e_{\sigma'}(p).
\end{aligned}
\]
Because we also have (U(0,A)Ψ)(p) = Σ_σ (U(0,A)Ψ)(p, σ)e_σ(p), we thus conclude that
\[
(U(0,A)\Psi)(p,\sigma) = \sqrt{\tfrac{p^0}{(\Phi(A)^{-1}p)^0}} \sum_{\sigma'} [U_{k\to k}(0,M_p^{-1}AM_{\Phi(A)^{-1}p})]_{\sigma,\sigma'}\, \Psi(\Phi(A)^{-1}p,\sigma').
\]
This shows that the action of U(0, A) on H is completely determined once we know the action of U_{k→k}(0, A) on H(k) for all little group elements A ∈ G_k. In other words, we have reduced the problem of finding irreducible unitary representations of P̃↑₊ to the problem of finding irreducible unitary representations of G_k on the Hilbert space H(k). We will now briefly discuss these representations of G_k. We separate two cases: m > 0 and m = 0.
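The reduction to the little group can also be checked numerically. The sketch below is our own illustration (not from the thesis): the helper names `psi`, `Phi` and `standard_boost` are assumptions, with the conventions ψ(x) = x⁰1 + x·σ for the map to hermitian matrices and M_p the positive hermitian square root of ψ(p)/m as the standard boost. It verifies that the Wigner element M_{Φ(A)p}⁻¹AM_p lies in SU(2) and fixes k = (m, 0, 0, 0):

```python
import numpy as np

# Pauli matrices, with SIGMA[0] the identity: psi(x) = sum_mu x^mu SIGMA[mu]
SIGMA = [np.eye(2, dtype=complex),
         np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def psi(x):
    """Map a four-vector x to the hermitian 2x2 matrix x^0*1 + x.sigma."""
    return sum(x[mu] * SIGMA[mu] for mu in range(4))

def Phi(A):
    """Covering map SL(2,C) -> restricted Lorentz group: psi(Phi(A)x) = A psi(x) A*."""
    L = np.zeros((4, 4))
    for nu in range(4):
        M = A @ SIGMA[nu] @ A.conj().T
        L[0, nu] = 0.5 * np.trace(M).real                 # coefficient of SIGMA[0]
        for i in (1, 2, 3):
            L[i, nu] = 0.5 * np.trace(SIGMA[i] @ M).real  # tr(sigma_i sigma_j) = 2 delta_ij
    return L

def standard_boost(p, m):
    """M_p: the positive hermitian square root of psi(p)/m, so that Phi(M_p)k = p."""
    w, V = np.linalg.eigh(psi(p))
    return V @ np.diag(np.sqrt(w / m)) @ V.conj().T

m = 2.0
k = np.array([m, 0.0, 0.0, 0.0])
pvec = np.array([0.3, -1.1, 0.7])
p = np.concatenate(([np.sqrt(m**2 + pvec @ pvec)], pvec))

rng = np.random.default_rng(0)
A = rng.normal(size=(2, 2)) + 1j * rng.normal(size=(2, 2))
A = A / np.sqrt(np.linalg.det(A))         # normalize so det A = 1, i.e. A in SL(2,C)

q = Phi(A) @ p                            # q = Phi(A)p, again on the mass shell
W = np.linalg.inv(standard_boost(q, m)) @ A @ standard_boost(p, m)   # Wigner rotation
```

Here `W` is precisely the little-group element appearing in (2.26); for m > 0 it is a genuine SU(2) matrix, which is what step 7a exploits.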
•Step 7a: m > 0
If m > 0, then the little group Gk is SU (2), the double cover of the rotation group SO(3). This can
be seen easily by choosing k = (m, 0, 0, 0). The only restricted Lorentz transformations that leave
(m, 0, 0, 0) invariant are rotations, i.e. elements of SO(3) (note that these are the only restricted Lorentz transformations that leave the zeroth component of a vector invariant, so there cannot be any other restricted Lorentz transformations that leave (m, 0, 0, 0) invariant). Therefore, the little
group in this case is Gk = Φ−1 (SO(3)) = SU (2). This can also be seen directly. The image of k
under the map ψ : M → H(2, C) is ψ(k) = m1C2 , so an element A ∈ SL(2, C) is in Gk if and only
if m1C2 = A(m1C2 )A∗ = mAA∗ , or A−1 = A∗ . This shows that A ∈ SL(2, C) must be unitary, i.e.
A ∈ SU (2) and hence Gk = SU (2).
The irreducible representations of SU(2) are finite-dimensional and are labelled by the parameter s ∈ {0, 1/2, 1, 3/2, ...}, where 2s + 1 is the dimension of the representation. Because SU(2) is a simply-connected Lie group, all these representations can be characterized by the irreducible (2s+1)-dimensional representations D^{(s)} : su(2) → V_{2s+1} of its Lie algebra su(2). If we choose a basis of the (2s+1)-dimensional vector space V_{2s+1} in which D^{(s)}(\tfrac{1}{2i}σ³) is diagonal, we can specify D^{(s)} by
\[
\begin{aligned}
\Bigl[D^{(s)}\bigl(\tfrac{1}{2i}\sigma^3\bigr)\Bigr]_{\sigma\sigma'} &= -i\sigma\,\delta_{\sigma,\sigma'} \\
\Bigl[D^{(s)}\bigl(\tfrac{1}{2i}\sigma^1\bigr)\Bigr]_{\sigma\sigma'} &= -\tfrac{i}{2}\Bigl(\delta_{\sigma',\sigma+1}\sqrt{(s-\sigma)(s+\sigma+1)} + \delta_{\sigma',\sigma-1}\sqrt{(s+\sigma)(s-\sigma+1)}\Bigr) \\
\Bigl[D^{(s)}\bigl(\tfrac{1}{2i}\sigma^2\bigr)\Bigr]_{\sigma\sigma'} &= -\tfrac{1}{2}\Bigl(\delta_{\sigma',\sigma+1}\sqrt{(s-\sigma)(s+\sigma+1)} - \delta_{\sigma',\sigma-1}\sqrt{(s+\sigma)(s-\sigma+1)}\Bigr)
\end{aligned}
\]
where the row and column indices σ and σ′ run from s to −s. We will denote the representation of SU(2) corresponding to the representation D^{(s)} of su(2) by D^{(s)}. We thus conclude that the Hilbert space H(k) is equal to C^{2s+1} for some s ∈ ½Z≥0 and that U_{k→k}(0, A) = D^{(s)}(A) for A ∈ SU(2). Note that we have implicitly chosen the orthonormal basis {e_σ(k)}_σ on H(k) = C^{2s+1} to be a set of eigenvectors of D^{(s)}(\tfrac{1}{2i}σ³), and (as described above) this also defines orthonormal bases {e_σ(p)}_σ for H(p) = C^{2s+1} at all other points p ∈ O_m^+. The representation U is now given by
\[
U_{p\to\Phi(A)p}(0,A)e_\sigma(p) = \sqrt{\tfrac{(\Phi(A)p)^0}{p^0}} \sum_{\sigma'} [D^{(s)}(M_{\Phi(A)p}^{-1}AM_p)]_{\sigma'\sigma}\, e_{\sigma'}(\Phi(A)p).
\]
In terms of the functions Ψ(p, σ) this reads
\[
(U(0,A)\Psi)(p,\sigma) = \sqrt{\tfrac{p^0}{(\Phi(A)^{-1}p)^0}} \sum_{\sigma'} [D^{(s)}(M_p^{-1}AM_{\Phi(A)^{-1}p})]_{\sigma,\sigma'}\, \Psi(\Phi(A)^{-1}p,\sigma').
\]
On H(k) ≅ C^{2s+1}, we define the hermitian operators
\[
S^{(s),j}(k) = iD^{(s)}\bigl(\tfrac{1}{2i}\sigma^j\bigr)
\tag{2.28}
\]
for j = 1, 2, 3, which we will call the spin operators at the point k. They satisfy
\[
[S^{(s)}(k)]^2 = \sum_{j=1}^{3}\bigl[S^{(s),j}(k)\bigr]^2 = s(s+1)\,1_{H(k)}
\]
and
\[
[S^{(s),a}(k), S^{(s),b}(k)] = i\epsilon_{abc}\,S^{(s),c}(k).
\tag{2.29}
\]
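These relations are easy to check numerically for any fixed s. The following sketch is our own illustration (the function name `spin_matrices` is an assumption, not the thesis's notation); it builds the hermitian spin matrices from the standard raising and lowering operators in the basis where the third component is diagonal:

```python
import numpy as np

def spin_matrices(s):
    """Return the hermitian (2s+1)-dimensional spin matrices S1, S2, S3,
    in the basis e_s, e_{s-1}, ..., e_{-s} where S3 is diagonal."""
    d = int(round(2 * s)) + 1
    sigma = np.arange(s, -s - 1, -1)      # diagonal of S3, from s down to -s
    S3 = np.diag(sigma).astype(complex)
    Sp = np.zeros((d, d), dtype=complex)  # raising operator J^+
    for j in range(1, d):                 # J^+ e_sigma = sqrt((s-sigma)(s+sigma+1)) e_{sigma+1}
        Sp[j - 1, j] = np.sqrt((s - sigma[j]) * (s + sigma[j] + 1))
    Sm = Sp.conj().T                      # lowering operator J^-
    return 0.5 * (Sp + Sm), (Sp - Sm) / 2j, S3

s = 1.5
S1, S2, S3 = spin_matrices(s)
casimir = S1 @ S1 + S2 @ S2 + S3 @ S3    # should equal s(s+1) times the identity
```

The commutation relations (2.29) and the Casimir value s(s+1) come out exactly, for any half-integer s.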
The three generators M^j commute with P⁰, and the operators P^j are zero operators on H(k), so they commute trivially with the M^j; hence the M^j leave the space H(k) invariant. We can therefore define the operators M^j(k) : H(k) → H(k) in the obvious way. We now find
\[
e^{itS^{(s),j}(k)} = e^{-tD^{(s)}(\frac{1}{2i}\sigma^j)} = U_{k\to k}\bigl(0, e^{-\frac{t}{2i}\sigma^j}\bigr) = e^{itM^j(k)}.
\]
So S (s),j (k) = M j (k). We will now use this to show that S (s),j (k) is proportional to W j (k).
Because P_μW^μ = 0, it follows that W⁰(k) = 0. For the other components of W(k) we find that
\[
\begin{aligned}
W^i(k) &= -\tfrac{1}{2}\epsilon^{i\nu\rho\sigma}(M_{\nu\rho}P_\sigma)(k) = -\tfrac{1}{2}\epsilon^{i\nu\rho\sigma}M_{\nu\rho}(k)\,k_\sigma = -\tfrac{1}{2}m\,\epsilon^{ijl0}M_{jl}(k) \\
&= \tfrac{1}{2}m\,\epsilon^{0ijl}M_{jl}(k) = \tfrac{m}{2}\sum_{j,l}\epsilon^{0ijl}M^{jl}(k) = mM^i(k) = mS^{(s),i}(k).
\end{aligned}
\]
In other words,
\[
\mathbf{S}^{(s)}(k) = \tfrac{1}{m}\mathbf{W}(k).
\tag{2.30}
\]
In particular, this implies that W(k)² = −[\mathbf{W}(k)]² = −m²[\mathbf{S}^{(s)}(k)]² = −m²s(s+1)1_{H(k)} and therefore that
\[
W^2 = -m^2 s(s+1)\,1_H,
\]
since we already knew that W² is a scalar multiple of 1_H. We now define the spin operator S^{(s),j} on H by
\[
S^{(s),j} = \frac{1}{m}\Bigl(W^j - \frac{W^0 P^j}{m + P^0}\Bigr).
\tag{2.31}
\]
It is clear that, since this operator is constructed from the P^μ and W^μ, it leaves all spaces H(p) invariant, and therefore we can define spin operators S^{(s),j}(p) at any point p ∈ O_m^+. The reason for the definition (2.31) is that the three operator components of S^{(s)} form a (pseudo-)vector, i.e. [M^a, S^{(s),b}] = iε_{abc}S^{(s),c}. Also, at the point k ∈ O_m^+ this definition reproduces (2.30), and at each point p in the orbit the commutation relations (2.29) hold. It can in fact be shown that this is the unique operator that is a linear combination of the W^μ with coefficients that are functions of the P^μ and that satisfies all these properties; see section 7.2C of [2]. The action of S^{(s),3} on a function Ψ(p, σ) is given by
\[
(S^{(s),3}\Psi)(p,\sigma) = \sigma\Psi(p,\sigma),
\]
but we will not prove this.
•Step 7b: m = 0
For m = 0 we choose k to be the vector k = (1, 0, 0, 1). Under the map ψ : M → H(2,C) as defined in section 2.1.2 the vector k corresponds to the hermitian matrix
\[
\psi(k) = \begin{pmatrix} k^0 + k^3 & k^1 - ik^2 \\ k^1 + ik^2 & k^0 - k^3 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}.
\]
For an arbitrary 2 × 2 matrix A with components A_{ij} (i, j = 1, 2) the condition Aψ(k)A* = ψ(k) implies that |A₁₁|² = 1 and A₂₁ = 0. If A is also in SL(2,C) then we must have 1 = det(A) = A₁₁A₂₂ − A₁₂A₂₁ = A₁₁A₂₂, which implies that A₂₂ = A₁₁⁻¹ = \overline{A_{11}}. Thus, A ∈ SL(2,C) is in the little group G_k of k if and only if it is of the form
\[
A_{\alpha,z} = \begin{pmatrix} e^{i\alpha} & z \\ 0 & e^{-i\alpha} \end{pmatrix}
\]
with α ∈ R and z = z₁ + iz₂ ∈ C. If α = 0 we can obtain A_{0,z} by
\[
A_{0,z} = e^{z_1(\frac{1}{2}\sigma^1 - \frac{1}{2i}\sigma^2) + z_2(-\frac{1}{2i}\sigma^1 - \frac{1}{2}\sigma^2)}
\tag{2.32}
\]
and if α ≠ 0 and α ≠ π, we can obtain A_{α,z} by
\[
A_{\alpha,z} = e^{-2\alpha\frac{1}{2i}\sigma^3 + \frac{\alpha}{\sin\alpha}\bigl[z_1(\frac{1}{2}\sigma^1 - \frac{1}{2i}\sigma^2) + z_2(-\frac{1}{2i}\sigma^1 - \frac{1}{2}\sigma^2)\bigr]}.
\tag{2.33}
\]
Here we used that for c = (c₁, c₂, c₃) ∈ C³ we have that
\[
e^{c\cdot\sigma} = \cosh\Bigl(\sqrt{c_1^2+c_2^2+c_3^2}\Bigr)1_{\mathbb{C}^2} + \frac{\sinh\bigl(\sqrt{c_1^2+c_2^2+c_3^2}\bigr)}{\sqrt{c_1^2+c_2^2+c_3^2}}\, c\cdot\sigma
\tag{2.34}
\]
if c₁² + c₂² + c₃² ≠ 0 and
\[
e^{c\cdot\sigma} = 1_{\mathbb{C}^2} + c\cdot\sigma
\tag{2.35}
\]
if c₁² + c₂² + c₃² = 0. In the case α = 0 we chose c = (z/2, iz/2, 0) and applied (2.35); in the case where α ≠ 0 and α ≠ π we chose c = (zα/(2 sin α), izα/(2 sin α), iα) and applied (2.34). Note that the elements A_{α,z} of G_k satisfy the algebraic properties
\[
\begin{aligned}
A_{\alpha_1,0}A_{\alpha_2,0} &= A_{\alpha_1+\alpha_2,0} \\
A_{0,z_1}A_{0,z_2} &= A_{0,z_1+z_2} \\
A_{\alpha,0}A_{0,z}A_{-\alpha,0} &= A_{0,ze^{2i\alpha}}.
\end{aligned}
\]
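The matrix form of A_{α,z}, its group laws, and the exponential formulas (2.32)–(2.35) can all be verified directly. The sketch below is our own illustration (the helper names `A` and `exp_c_sigma` are assumptions); `exp_c_sigma` implements exactly the closed forms (2.34) and (2.35):

```python
import numpy as np

S1 = np.array([[0, 1], [1, 0]], dtype=complex)
S2 = np.array([[0, -1j], [1j, 0]], dtype=complex)
S3 = np.array([[1, 0], [0, -1]], dtype=complex)
psi_k = np.array([[2, 0], [0, 0]], dtype=complex)   # psi(k) for k = (1, 0, 0, 1)

def A(alpha, z):
    """The little-group element A_{alpha,z} in SL(2,C)."""
    return np.array([[np.exp(1j * alpha), z], [0, np.exp(-1j * alpha)]])

def exp_c_sigma(c):
    """e^{c.sigma} for complex c = (c1, c2, c3), via (2.34) and (2.35)."""
    csig = c[0] * S1 + c[1] * S2 + c[2] * S3
    c2 = c[0]**2 + c[1]**2 + c[2]**2
    if abs(c2) < 1e-14:                              # nilpotent case (2.35)
        return np.eye(2) + csig
    r = np.sqrt(c2 + 0j)
    return np.cosh(r) * np.eye(2) + (np.sinh(r) / r) * csig

alpha, z = 0.8, 1.3 - 0.4j
# the exponent of (2.33) corresponds to c = (z a/(2 sin a), i z a/(2 sin a), i a)
c = np.array([z * alpha / (2 * np.sin(alpha)),
              1j * z * alpha / (2 * np.sin(alpha)),
              1j * alpha])
```

Note that here c₁² + c₂² + c₃² = −α², so the hyperbolic functions in (2.34) become cos α and sin α/α, which is how the e^{±iα} diagonal of A_{α,z} arises.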
In order to understand this group, we need to recall the definition of the group E+ (2), the proper
Euclidean group in two dimensions. The group E+ (2) acts on the plane R2 and is generated by
the translations T (~v ) over a vector ~v ∈ R2 and the rotations R(θ) around an angle θ ∈ [0, 2π).
These generators satisfy
R(θ1 )R(θ2 ) = R(θ1 + θ2 )
T (~v1 )T (~v2 ) = T (~v1 + ~v2 )
R(θ)T (~v )R(−θ) = T (R(θ)~v ).
Comparing these two groups, we observe that G_k is the double cover Ẽ₊(2) of E₊(2). The elements A_{α/2,0} and A_{0,z} satisfy the same algebraic properties as R(θ) and T(v⃗), respectively, only the range of α runs from 0 to 4π, while the range of θ runs from 0 to 2π. The only finite-dimensional irreducible unitary representations of G_k are one-dimensional, and are given by
\[
D^{(\sigma)}(A_{\alpha,z}) = e^{2i\sigma\alpha}\,1_{H(k)}
\]
for σ ∈ ½Z. Here H(k) ≅ C is one-dimensional. All other representations are infinite-dimensional, but they turn out to be physically irrelevant. Thus, U is given by
\[
U_{p\to\Phi(A)p}(0,A)e_\sigma(p) = \sqrt{\tfrac{(\Phi(A)p)^0}{p^0}}\, e^{2i\sigma\alpha(M_{\Phi(A)p}^{-1}AM_p)}\, e_\sigma(\Phi(A)p),
\tag{2.36}
\]
where the index σ can only take on one value, since the representation of G_k is one-dimensional, and α(M) denotes the angle α in M = A_{α,z} for M ∈ G_k. In terms of the functions Ψ(p, σ) this reads
\[
(U(0,A)\Psi)(p,\sigma) = \sqrt{\tfrac{p^0}{(\Phi(A)^{-1}p)^0}}\, e^{2i\sigma\alpha(M_p^{-1}AM_{\Phi(A)^{-1}p})}\, \Psi(\Phi(A)^{-1}p,\sigma).
\]
It follows from (2.32) and (2.33) that the Lie algebra of G_k ≅ Ẽ₊(2) is spanned by the elements
\[
\begin{aligned}
R &:= \tfrac{1}{2i}\sigma^3, \\
T_1 &:= -\tfrac{1}{2i}\sigma^2 + \tfrac{1}{2}\sigma^1, \\
T_2 &:= -\tfrac{1}{2i}\sigma^1 - \tfrac{1}{2}\sigma^2.
\end{aligned}
\]
The Lie algebra representation D^{(σ)} induced by D^{(σ)} maps the basis element R to −iσ1_{H(k)}, because
\[
e^{-2\alpha D^{(\sigma)}(\frac{1}{2i}\sigma^3)} = D^{(\sigma)}\bigl(e^{-2\alpha\frac{1}{2i}\sigma^3}\bigr) = D^{(\sigma)}(A_{\alpha,0}) = e^{2i\sigma\alpha}.
\]
A similar calculation shows that the other two basis elements of the Lie algebra are mapped to 0 by D^{(σ)}. On the space H(k) we define the operator
\[
\lambda(k) := iD^{(\sigma)}\bigl(\tfrac{1}{2i}\sigma^3\bigr) = \sigma\,1_{H(k)},
\]
which we will call the helicity operator at the point k.
which we will call the helicity operator at the point k. Because [M 3 , P µ ] = [M 12 , P µ ] = i(η 2µ P 1 −
η 1µ P 2 ) and because P 1 and P 2 are the zero operators on H(k), we find that M 3 leaves the space
H(k) invariant, so we can define an operator M 3 (k) : H(k) → H(k) in the obvious way. By the
same reasoning as for m > 0 we then find that λ(k) = M³(k). The Pauli-Lubanski operator at the point k is W^μ(k) = −½ε^{μνρσ}M_{νρ}(k)k_σ = −½(ε^{μνρ0} − ε^{μνρ3})M_{νρ}(k), where we have used that k₃ = −k³ = −1. Writing out these expressions gives¹⁵
W 0 (k) = M 3 (k) = λ(k)
W 1 (k) = (M 1 + N 2 )(k) = 0
W 2 (k) = (M 2 − N 1 )(k) = 0
W 3 (k) = M 3 (k) = λ(k),
so W µ (k) = k µ λ(k) = σk µ 1H(k) . We have thus found that W µ (k) is proportional to k µ and, in
particular, that [W (k)]2 = Wµ (k)W µ (k) = 0. Because W 2 = Wµ W µ is a scalar multiple of the
identity operator on H, this gives
W 2 = Wµ W µ = 0.
Because we always have Pµ W µ = 0 and because P 2 = m2 1H = 0, we conclude that W µ must be
proportional to P µ . Since W µ (k) = σk µ 1H(k) , the proportionality constant is σ and we obtain
W µ = σP µ .
On the Hilbert space H we now define the helicity operator by
\[
\lambda = \frac{W^0}{P^0} = \frac{\mathbf{M}\cdot\mathbf{P}}{|\mathbf{P}|}.
\]
We will now briefly discuss the image of G_k ⊂ SL(2,C) under the covering map Φ : SL(2,C) → L↑₊, because this is not found in any of the literature that we have used. The image of an arbitrary element A_{α,z} is easily obtained from ψ(Φ(A_{α,z})x) = A_{α,z}ψ(x)A*_{α,z}, or
\[
\begin{pmatrix}
(\Phi(A_{\alpha,z})x)^0 + (\Phi(A_{\alpha,z})x)^3 & (\Phi(A_{\alpha,z})x)^1 - i(\Phi(A_{\alpha,z})x)^2 \\
(\Phi(A_{\alpha,z})x)^1 + i(\Phi(A_{\alpha,z})x)^2 & (\Phi(A_{\alpha,z})x)^0 - (\Phi(A_{\alpha,z})x)^3
\end{pmatrix}
=
\begin{pmatrix}
(1+|z|^2)x^0 + 2\mathrm{Re}[\bar z e^{i\alpha}(x^1 - ix^2)] + (1-|z|^2)x^3 & e^{2i\alpha}(x^1 - ix^2) + ze^{i\alpha}(x^0 - x^3) \\
e^{-2i\alpha}(x^1 + ix^2) + \bar z e^{-i\alpha}(x^0 - x^3) & x^0 - x^3
\end{pmatrix}.
\]
After some straightforward computations this gives
\[
\Phi(A_{\alpha,z}) =
\begin{pmatrix}
1 + \frac{z_1^2+z_2^2}{2} & z_1\cos\alpha + z_2\sin\alpha & z_1\sin\alpha - z_2\cos\alpha & -\frac{z_1^2+z_2^2}{2} \\
z_1\cos\alpha - z_2\sin\alpha & \cos 2\alpha & \sin 2\alpha & -z_1\cos\alpha + z_2\sin\alpha \\
-z_1\sin\alpha - z_2\cos\alpha & -\sin 2\alpha & \cos 2\alpha & z_1\sin\alpha + z_2\cos\alpha \\
\frac{z_1^2+z_2^2}{2} & z_1\cos\alpha + z_2\sin\alpha & z_1\sin\alpha - z_2\cos\alpha & 1 - \frac{z_1^2+z_2^2}{2}
\end{pmatrix}
= R_3(-\alpha)
\begin{pmatrix}
1 + \frac{z_1^2+z_2^2}{2} & z_1 & -z_2 & -\frac{z_1^2+z_2^2}{2} \\
z_1 & 1 & 0 & -z_1 \\
-z_2 & 0 & 1 & z_2 \\
\frac{z_1^2+z_2^2}{2} & z_1 & -z_2 & 1 - \frac{z_1^2+z_2^2}{2}
\end{pmatrix}
R_3(-\alpha),
\]
where R₃(α) denotes a rotation around the x³-axis over an angle α (counterclockwise, as seen from a point with positive x³-coordinate). In particular, this computation shows that A_{α,0} is mapped onto the rotation R₃(−2α). Note that it was to be expected that {R₃(α)}_{α∈[0,2π)} would be a subgroup of Φ(G_k), because k = (1, 0, 0, 1) is invariant under rotations around the x³-axis.
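The 4×4 matrix Φ(A_{α,z}) and its factorization through R₃(−α) can be cross-checked by computing Φ directly from ψ(Φ(A)x) = Aψ(x)A*. The sketch below is our own illustration (the names `Phi`, `A_little` and `R3` are assumptions):

```python
import numpy as np

SIGMA = [np.eye(2, dtype=complex),
         np.array([[0, 1], [1, 0]], dtype=complex),
         np.array([[0, -1j], [1j, 0]], dtype=complex),
         np.array([[1, 0], [0, -1]], dtype=complex)]

def Phi(A):
    """Covering map: psi(Phi(A)x) = A psi(x) A*, extracted column by column."""
    L = np.zeros((4, 4))
    for nu in range(4):
        M = A @ SIGMA[nu] @ A.conj().T
        L[0, nu] = 0.5 * np.trace(M).real
        for i in (1, 2, 3):
            L[i, nu] = 0.5 * np.trace(SIGMA[i] @ M).real
    return L

def A_little(alpha, z):
    """The little-group element A_{alpha,z}."""
    return np.array([[np.exp(1j * alpha), z], [0, np.exp(-1j * alpha)]])

def R3(theta):
    """Rotation about the x3-axis (counterclockwise seen from positive x3)."""
    ct, st = np.cos(theta), np.sin(theta)
    return np.array([[1, 0, 0, 0], [0, ct, -st, 0], [0, st, ct, 0], [0, 0, 0, 1.0]])

alpha, z = 0.6, 0.9 + 0.2j
k = np.array([1.0, 0.0, 0.0, 1.0])
L = Phi(A_little(alpha, z))
```

This confirms both that Φ(G_k) stabilizes k and that A_{α,0} ↦ R₃(−2α), as stated above.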
¹⁵Here we use that e^{it(M¹+N²)(k)} = U_{k→k}(0, e^{−tT₂}) = e^{−tD^{(σ)}(T₂)} = 1_{H(k)} and e^{it(M²−N¹)(k)} = U_{k→k}(0, e^{−tT₁}) = e^{−tD^{(σ)}(T₁)} = 1_{H(k)}.
¹⁶Weinberg does not use the universal covering group at all. As a consequence, he needs to consider double-valued representations, i.e. representations up to a sign, of the little groups SO(3) and E₊(2).
In the book [35] of Weinberg the group Φ(G_k) ⊂ L↑₊ is computed directly as follows¹⁶. Let W ∈ L↑₊ be such that Wk = k. Then for the vector t = (1, 0, 0, 0) we have 1 = t·k = (Wt)·(Wk) =
(W t) · k = (W t)0 − (W t)3 , so we can write W t as W t = (1 + ct , at , bt , ct ) with at , bt , ct ∈ R. Also
1 = t · t = (W t) · (W t) = (1 + ct )2 − a2t − b2t − c2t , from which it follows that ct can be expressed in
terms of at and bt as ct = ct (at , bt ) = (a2t +b2t )/2. Now define the restricted Lorentz transformations
S_{a,b} by
\[
S_{a,b} = \begin{pmatrix}
1 + c(a,b) & a & b & -c(a,b) \\
a & 1 & 0 & -a \\
b & 0 & 1 & -b \\
c(a,b) & a & b & 1 - c(a,b)
\end{pmatrix} \in L^\uparrow_+,
\]
where c(a, b) := (a² + b²)/2. Using that c(a + a′, b + b′) = c(a, b) + c(a′, b′) + aa′ + bb′, it follows easily that S_{a,b}S_{a′,b′} = S_{a+a′,b+b′}, so the S_{a,b} form an abelian subgroup of L↑₊. Note that S_{a_t,b_t}t = (1 + c_t, a_t, b_t, c_t) = Wt and thus that S_{a_t,b_t}⁻¹Wt = t, which shows that
\[
S_{a_t,b_t}^{-1}W \in \Phi(G_t) = SO(3).
\]
Because S_{a,b}k = k for all a, b ∈ R, we also have S_{a_t,b_t}⁻¹Wk = k, so S_{a_t,b_t}⁻¹W ∈ SO(3) must be a rotation that leaves the x³-component invariant, i.e. it must be a rotation R₃(θ) around the 3-axis. We thus conclude that W = S_{a_t,b_t}R₃(θ). A general element in L↑₊ that leaves k invariant is thus of the form W_{a,b,θ} = S_{a,b}R₃(θ), and these elements do indeed form a group. As we have already seen above, the elements W_{a,b,0} form an abelian subgroup of L↑₊ with multiplication given by
\[
W_{a,b,0}W_{a',b',0} = W_{a+a',b+b',0}.
\]
The elements W_{0,0,θ} also form an abelian subgroup with multiplication given by
\[
W_{0,0,\theta}W_{0,0,\theta'} = W_{0,0,\theta+\theta'}.
\]
Furthermore, we also have
\[
W_{0,0,\theta}W_{a,b,0}W_{0,0,-\theta} = W_{a\cos\theta + b\sin\theta,\, -a\sin\theta + b\cos\theta,\, 0}.
\]
Thus the group formed by the elements W_{a,b,θ} is isomorphic to the two-dimensional Euclidean group E₊(2), i.e. the group of translations and rotations in the plane; the isomorphism is given by identifying W_{a,b,0} with a translation in the plane over the vector (a, b) and identifying W_{0,0,θ} with a rotation in the plane around an angle θ.
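Weinberg's matrices S_{a,b} are easy to verify in this explicit form. The following sketch is our own illustration (the names `S` and `R3` are assumptions; R₃(θ) is taken counterclockwise as seen from positive x³, so with this convention the conjugation rule rotates the translation vector (a, b) by θ):

```python
import numpy as np

def c(a, b):
    return (a * a + b * b) / 2

def S(a, b):
    """Weinberg's null-translation S_{a,b}, a restricted Lorentz transformation."""
    return np.array([[1 + c(a, b), a, b, -c(a, b)],
                     [a,           1, 0, -a],
                     [b,           0, 1, -b],
                     [c(a, b),     a, b, 1 - c(a, b)]])

def R3(theta):
    """Rotation around the x3-axis."""
    ct, st = np.cos(theta), np.sin(theta)
    return np.array([[1, 0, 0, 0], [0, ct, -st, 0], [0, st, ct, 0], [0, 0, 0, 1.0]])

eta = np.diag([1.0, -1.0, -1.0, -1.0])   # Minkowski metric
k = np.array([1.0, 0.0, 0.0, 1.0])
a, b, a2, b2, th = 0.7, -0.3, 0.4, 1.1, 0.5
```

The checks below confirm the abelian group law, that each S_{a,b} is a Lorentz transformation fixing k, and the rotation-conjugation rule for the little group W_{a,b,θ} = S_{a,b}R₃(θ).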
Physical interpretation of the irreducible representations
We have now fully classified those irreducible unitary representations of P̃↑₊ that are relevant in physics. Quantum systems in which the pure state vectors Ψ ∈ H transform as such representations of P̃↑₊ are interpreted as one-particle states. The label m is interpreted as the mass of the particle and the operators P^μ are interpreted as the four-momentum operators corresponding to the one-particle system (this last fact is made more rigorous in [1], section 3.6). We found that the one-particle states are ∪_{p∈O_m^+}H(p)-valued functions of the four-momentum p with Ψ(p) ∈ H(p), but since p⁰ = √(m² + p²), we will from now on write the one-particle states as functions Ψ(p) of the three-momentum p. The generators M^j of rotations around the x^j-axis are interpreted as the x^j-component of the angular momentum of the particle.
If m > 0 and if the representation of SU(2) is (2s+1)-dimensional, then the state vectors of the particle are functions Ψ(p, σ), where p ∈ R³ and σ ∈ {−s, . . . , s}. The label s ∈ ½Z≥0 is called
the spin of the particle and because
(S (s),3 Ψ)(p, σ) = σΨ(p, σ),
the label σ ∈ {−s, . . . , s} denotes the spin component along the x3 -direction. The operators S(s)
contribute to the total angular momentum M of the particle and for a particle at rest we have
in fact M = S^{(s)}. For a massive particle with spin s the probability of finding the particle with three-momentum p in some Borel set B ⊂ R³ and spin x³-component σ is
\[
\int_B |\Psi(\mathbf{p},\sigma)|^2\, \frac{d^3\mathbf{p}}{2\omega_{\mathbf{p}}},
\]
where we defined
\[
\omega_{\mathbf{p}} := \sqrt{m^2 + \mathbf{p}^2}.
\]
For this reason, we may interpret
\[
\psi(\mathbf{p},\sigma) := \frac{1}{\sqrt{2\omega_{\mathbf{p}}}}\Psi(\mathbf{p},\sigma)
\]
as the momentum-spin wave function of the particle. This momentum-spin wave function ψ is square-integrable with respect to d³p, not with respect to d³p/(2ω_p). We will denote the Hilbert space of such momentum-spin wave functions by ℋ, instead of H. Thus, we can describe one-particle states either by elements Ψ in H or by elements ψ in ℋ, both descriptions being unitarily equivalent (and hence physically equivalent) to each other. The map J : Ψ ↦ (1/√(2ω_p))Ψ = ψ, which relates two physically equivalent elements with each other, provides the unitary map from H onto ℋ. This map should be interpreted as some kind of change of variables on the space of functions. The representation U of P̃↑₊ on H can easily be made into a representation u of P̃↑₊ on ℋ by using the map J:
\[
[u(a,A)\psi](\mathbf{p},\sigma) = [JU(a,A)J^{-1}\psi](\mathbf{p},\sigma) = \frac{1}{\sqrt{2\omega_{\mathbf{p}}}}\bigl[U(a,A)\bigl(\sqrt{2\omega_{\mathbf{p}}}\,\psi\bigr)\bigr](\mathbf{p},\sigma).
\]
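That J is unitary is essentially a change of variables: the factor 1/(2ω_p) is moved from the measure into the function. A minimal numerical illustration of this (our own, on a finite momentum grid):

```python
import numpy as np

m = 1.0
n, Lbox = 20, 6.0
ax = np.linspace(-Lbox, Lbox, n)
d3p = (ax[1] - ax[0])**3                                   # grid volume element
P1, P2, P3 = np.meshgrid(ax, ax, ax, indexing="ij")
omega = np.sqrt(m**2 + P1**2 + P2**2 + P3**2)              # omega_p on the grid

Psi = np.exp(-(P1**2 + P2**2 + P3**2)) * (1 + 0.3j * P1)   # some state function Psi(p)
psi = Psi / np.sqrt(2 * omega)                             # the map J: Psi -> psi

norm_state = np.sum(np.abs(Psi)**2 / (2 * omega)) * d3p    # ||Psi||^2 with measure d3p/(2 omega)
norm_wave = np.sum(np.abs(psi)**2) * d3p                   # ||psi||^2 with measure d3p
```

The two norms agree pointwise by construction, which is exactly the statement that J is unitary from H onto ℋ.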
From now on we will write both representations U and u as U; it will always be clear from the context which one is meant. In the following chapters we will often need to switch between both Hilbert spaces H and ℋ to describe one-particle states, but we will always make a very explicit distinction between the two descriptions. The Fourier transform of the momentum-spin wave function can be interpreted as the position (and spin) wave function. The position operators X^j act on the momentum-spin wave function as the operators X^j = i∂/∂p^j. Therefore, the action of X^j on Ψ(p, σ) is given by the operator
\[
X^j = \sqrt{2p^0}\; i\frac{\partial}{\partial p^j}\,\frac{1}{\sqrt{2p^0}}.
\]
If m = 0 then the Hilbert spaces H(p) are all one-dimensional and therefore the state vectors
are functions Ψ(p) of the three-momentum. The action of the helicity operator λ on Ψ is just a
multiplication by σ, i.e. λΨ = σΨ; here σ denotes the label that occurs in the classification of
the one-dimensional representations of (the double cover of) E+ (2). The absolute value |σ| of the
label σ is called the spin of the particle and σ itself is called the helicity of the particle. Because
for the helicity we have
\[
\sigma 1_H = \frac{\mathbf{M}\cdot\mathbf{P}}{|\mathbf{P}|},
\]
the helicity measures the angular momentum of the particle in the direction of its three-momentum
p. Unlike the spin components σ for massive particles, the quantity σ for massless particles is fixed
for a given particle type. For example, if neutrinos are massless (which might not be the case),
then they would have σ = −½ and anti-neutrinos would have σ = ½. However, as we will see
after equation (2.39) below, there are cases where the particles with helicities σ and −σ should
be identified. In that case H is a direct sum of two irreducible representations, corresponding to
helicity σ and −σ, and the state vectors are described by functions Ψ(p, σ) with σ taking on the
two values ±σ. From now on we will always use the notation Ψ(p, σ) to denote the state vector of
a massless particle, even if σ can only take on one value. As for massive particles, we also define
the Hilbert space ℋ of momentum-spin wave functions ψ(p, σ) for massless particles. However, for
massless particles it is not possible to give a satisfactory definition of position operators.
Space inversion and time reversal
Some quantum systems are not only invariant under P̃↑₊, but also under the action of a space inversion I_s : (x⁰, x) ↦ (x⁰, −x) or time reversal I_t : (x⁰, x) ↦ (−x⁰, x) or the combination
Is It : x 7→ −x. By Wigner’s theorem, these transformations can then be represented by unitary or
antiunitary operators P, T and PT on the Hilbert space H corresponding to the quantum system.
It can be shown (see section 2.6 of [35]) that in order to avoid the existence of negative-energy
states, we must choose P to be linear and unitary and we must choose T to be antilinear and
antiunitary. Without any derivation, we will now simply give the action of P and T on states transforming irreducibly under P̃↑₊, i.e. on states that represent one-particle states.
For massive particles the action of P is given by
\[
\mathsf{P}e_\sigma(p) = \xi\, e_\sigma(I_s p),
\tag{2.37}
\]
where ξ is a phase factor that only depends on the species of particle; it is called the intrinsic
parity of the particle. The action of T is given by
\[
\mathsf{T}e_\sigma(p) = \zeta\,(-1)^{s-\sigma}\, e_{-\sigma}(I_s p),
\tag{2.38}
\]
where ζ is a phase factor that only depends on the species of particle and s is the spin of the particle.
In contrast to the intrinsic parity ξ, the phase factor ζ has no physical significance because when
we redefine eσ (p) by ζ 1/2 eσ (p), the factor ζ cancels out in equation (2.38). This trick does not
work for the intrinsic parity ξ because P is linear, rather than antilinear.
For massless particles the action of P is given by
\[
\mathsf{P}e_\sigma(p) = \xi_\sigma\, e^{i\pi\epsilon(p)\sigma}\, e_{-\sigma}(I_s p),
\tag{2.39}
\]
where ξ_σ is a phase factor and ε(p) ∈ {−1, 1} is the sign of the x²-component of p. Thus, if a
theory is invariant under space inversions then massless particles in this theory with some σ should be identified with those particles obtained by substituting σ → −σ. This happens for instance in quantum electrodynamics, where the massless particles with σ = 1 and σ = −1 are identified and are both referred to as photons. The action of T is given by
\[
\mathsf{T}e_\sigma(p) = \zeta_\sigma\, e^{i\pi\epsilon(p)\sigma}\, e_\sigma(I_s p),
\tag{2.40}
\]
where ζ_σ is a phase factor and ε(p) is as above.
2.2.5
Many-particle states and Fock space
Most of the material covered in this subsection can be found in one of the texts [1], [2] or [8].
Suppose that we have a system consisting of n non-interacting distinguishable particles; here
distinguishable means that all particles are of a different type. If the individual particles are
described by one-particle states in Hilbert spaces17 H1 , . . . , Hn , then the total system is described
by the Hilbert space H1 ⊗ . . . ⊗ Hn , and the algebra of observables of the total system is the
tensor product of the algebras of observables on the one-particle states. If the system consists of n
non-interacting particles of the same type then the Hilbert space of the system is still H ⊗n , with H
the Hilbert space for a single particle of the given type, but not all unit rays in this Hilbert space
represent physically realizable states. This last fact comes from the fact that in quantum mechanics
two particles of the same type cannot be distinguished when they form a single system; there is
no way of keeping track of which particle is which. Mathematically, this means the following.
¹⁷Here and in the rest of this section all the one-particle Hilbert spaces H either represent the spaces of state functions Ψ(p, σ) or else they all represent the spaces of momentum-spin wave functions ψ(p, σ), but one should be consistent in this choice. Thus, we either have H = H (state functions) in all definitions, or else H = ℋ (momentum-spin wave functions) in all definitions.
If Sn denotes the symmetric group on n objects, we define for each σ ∈ Sn a unitary operator
R(n) (σ) : H ⊗n → H ⊗n by
R(n) (σ)(h1 ⊗ . . . ⊗ hn ) = hσ(1) ⊗ . . . ⊗ hσ(n) .
Note that this defines a unitary representation of Sn in H ⊗n . The statement that the particles are
truly indistinguishable is then equivalent to saying that for any physically realizable pure state
h ∈ H ⊗n and for any σ ∈ Sn we must have18
R(n) (σ)h = λ(σ, n, h)h
(2.41)
with λ(σ, n, h) a complex number of absolute value 1. Note that it follows from the linearity
of R(n) that for any nonzero complex number c we have R(n) (σ)(ch) = λ(σ, n, h)ch, so that λ
satisfies λ(σ, n, ch) = λ(σ, n, h); this holds in particular whenever |c| = 1 and therefore λ(σ, n, h)
is independent of the choice of unit vector in the unit ray R(h). For σ1 , σ2 ∈ Sn and for any
physically realizable state h ∈ H ⊗n we have that
λ(σ1 σ2 , n, h)h = R(n) (σ1 σ2 )h = R(n) (σ1 )R(n) (σ2 )h = λ(σ1 , n, h)λ(σ2 , n, h)h,
so for any physically realizable state h ∈ H ⊗n the map λ(·, n, h) : Sn → U(1) defines a 1-dimensional representation of Sn on the space Ch. But the only two 1-dimensional representations
of Sn are the completely symmetric one, λS (σ) = 1 for all σ ∈ Sn , and the completely antisymmetric
one, λA(σ) = ε(σ) for all σ ∈ Sn, where ε : Sn → {−1, 1} denotes the sign of the permutation. Thus,
for any physically realizable state h ∈ H ⊗n we either have λ(σ, n, h) = λS (σ) or λ(σ, n, h) = λA (σ).
In the first case we say that h is a completely symmetric state and in the second case we say that
h is a completely antisymmetric state. The set of symmetric state vectors forms a linear subspace of H⊗n which we denote by F₊ⁿ(H), and the set of antisymmetric state vectors forms a linear subspace of H⊗n which we denote by F₋ⁿ(H). The orthogonal projections Pₙ⁺ : H⊗n → F₊ⁿ(H) and Pₙ⁻ : H⊗n → F₋ⁿ(H) onto F±ⁿ(H) are given by
\[
P_n^+ = \frac{1}{n!}\sum_{\sigma\in S_n} R(\sigma), \qquad
P_n^- = \frac{1}{n!}\sum_{\sigma\in S_n} \epsilon(\sigma)R(\sigma).
\]
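The projections Pₙ± can be realized concretely by letting R(σ) permute the axes of a rank-n tensor. The following sketch is our own illustration (the helper names `perm_sign` and `P` are assumptions); it checks idempotence, the orthogonality of the two ranges, and that antisymmetrizing h ⊗ h gives zero:

```python
import math
import numpy as np
from itertools import permutations

def perm_sign(p):
    """Sign of the permutation p (given as a tuple), via its cycle structure."""
    sign, seen = 1, [False] * len(p)
    for i in range(len(p)):
        if not seen[i]:
            j, length = i, 0
            while not seen[j]:
                seen[j] = True
                j = p[j]
                length += 1
            if length % 2 == 0:     # a cycle of even length is an odd permutation
                sign = -sign
    return sign

def P(T, plus=True):
    """P_n^+ (symmetrizer) or P_n^- (antisymmetrizer) on a rank-n tensor T;
    R(sigma) is realized as a permutation of the tensor axes."""
    n = T.ndim
    out = np.zeros_like(T)
    for p in permutations(range(n)):
        out = out + (1 if plus else perm_sign(p)) * T.transpose(p)
    return out / math.factorial(n)

rng = np.random.default_rng(2)
T = rng.normal(size=(3, 3, 3)) + 1j * rng.normal(size=(3, 3, 3))  # a vector in H^{x3}, dim H = 3
v = rng.normal(size=3)
```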
It is obvious that a superposition of a state in F₊ⁿ(H) with a state in F₋ⁿ(H) does not satisfy (2.41) and is therefore not physically realizable. It turns out that for any given particle type occurring in nature, the state vectors of an n-particle system of particles of that type are either always symmetric or else always antisymmetric. Furthermore, the choice between the symmetric and antisymmetric case is the same for each n, so for any given particle type we either have that the space of physically realizable states of n particles is F₊ⁿ(H) and that λ(σ, n, h) = λ_S(σ) for all h ∈ F₊ⁿ(H), or else we have that the space of physically realizable states of n particles is F₋ⁿ(H) and that λ(σ, n, h) = λ_A(σ) for all h ∈ F₋ⁿ(H). In the first case we say that the given particle is a boson and in the second case we say that it is a fermion.
¹⁸There are more general possibilities here than a phase factor λ, the more general condition being that R⁽ⁿ⁾(σ)h is physically indistinguishable from h. Instead of by unit rays, physical states are then mathematically described by the (higher-dimensional) generalized unit rays; see also [24]. Below we will see that λ defines a 1-dimensional representation of the permutation group. In the more general case one then also considers higher-dimensional representations of the permutation group Sn induced by R⁽ⁿ⁾ on H⊗n. This more general theory is also referred to as parastatistics and is of an entirely different nature than the generalizations that one encounters in 3-dimensional spacetime (i.e. braid statistics, anyons, etc.). Note that in the case of two particles (n = 2) there is actually no generalization, because H⊗² decomposes precisely into a direct sum of the two subspaces corresponding to the two 1-dimensional representations of S₂. This is related to the fact that we can write v ⊗ w as ½(v ⊗ w + w ⊗ v) + ½(v ⊗ w − w ⊗ v).
Because we often have to consider systems of identical particles in which the number of particles may change, we introduce the Hilbert spaces
\[
F_\pm(H) = \bigoplus_{n=0}^{\infty} F_\pm^n(H),
\]
where F±⁰(H) ≅ C represents the vacuum state, i.e. the state with no particles. We will choose a unit vector Ω ∈ F±⁰(H) and call it the Fock vacuum. For each one-particle state vector h ∈ H we define a (densely defined) operator A*±(h) : F±(H) → F±(H) by
\[
A_\pm^*(h)\Omega = h
\tag{2.42}
\]
\[
A_\pm^*(h)P_n^\pm(h_1\otimes\ldots\otimes h_n) = \sqrt{n+1}\,P_{n+1}^\pm(h\otimes h_1\otimes\ldots\otimes h_n).
\tag{2.43}
\]
This operator maps F±ⁿ(H) into F±ⁿ⁺¹(H) by 'creating' an extra particle with state vector h. For this reason A*±(h) is called a creation operator. Note that it is defined on the dense subspace
\[
D^\pm = \bigcup_{n=0}^{\infty}\bigoplus_{j=0}^{n} F_\pm^j(H)
\]
of F±(H), and that it leaves this subspace invariant. Furthermore, it can be shown that the operator A*±(h) is closable. Finally, note that the mapping h ↦ A*±(h) is linear. For vectors h₁, . . . , hₙ ∈ H it follows easily that
\[
A_\pm^*(h_1)\ldots A_\pm^*(h_n)\Omega = \sqrt{n!}\,P_n^\pm(h_1\otimes\ldots\otimes h_n).
\]
The inner product on D± can be expressed as
\[
\begin{aligned}
\langle P_n^\pm(h_1\otimes\ldots\otimes h_n),\, P_m^\pm(g_1\otimes\ldots\otimes g_m)\rangle_{D^\pm}
&= \frac{1}{n!\,m!}\sum_{\sigma\in S_n,\,\sigma'\in S_m} \epsilon^\pm(\sigma)\epsilon^\pm(\sigma')\,\langle h_{\sigma(1)}\otimes\ldots\otimes h_{\sigma(n)},\, g_{\sigma'(1)}\otimes\ldots\otimes g_{\sigma'(m)}\rangle \\
&= \frac{\delta_{nm}}{(n!)^2}\sum_{\sigma,\sigma'\in S_n} \epsilon^\pm(\sigma\sigma')\,\langle h_{\sigma(1)}, g_{\sigma'(1)}\rangle\ldots\langle h_{\sigma(n)}, g_{\sigma'(n)}\rangle \\
&= \frac{\delta_{nm}}{(n!)^2}\sum_{\sigma,\sigma'\in S_n} \underbrace{\epsilon^\pm(\sigma\sigma')}_{=\epsilon^\pm(\sigma'\sigma^{-1})}\,\langle h_1, g_{\sigma'\sigma^{-1}(1)}\rangle\ldots\langle h_n, g_{\sigma'\sigma^{-1}(n)}\rangle \\
&= \frac{\delta_{nm}}{n!}\sum_{\sigma\in S_n} \epsilon^\pm(\sigma)\,\langle h_1, g_{\sigma(1)}\rangle\ldots\langle h_n, g_{\sigma(n)}\rangle,
\end{aligned}
\]
where ε⁺(σ) = 1 and ε⁻(σ) = ε(σ) for all σ ∈ Sₙ. The action of the adjoint operator A±(h) : F±(H) → F±(H) of A*±(h) on the subspace D± is given by
\[
A_\pm(h)\Omega = 0
\tag{2.44}
\]
\[
A_\pm(h)P_n^\pm(h_1\otimes\ldots\otimes h_n) = \frac{1}{\sqrt{n}}\sum_{j=1}^{n} (\pm 1)^{j-1}\langle h_j, h\rangle\, P_{n-1}^\pm(h_1\otimes\ldots\otimes h_{j-1}\otimes h_{j+1}\otimes\ldots\otimes h_n).
\tag{2.45}
\]
Because this operator maps F±ⁿ(H) into F±ⁿ⁻¹(H), it is called an annihilation operator. Because A±(h) is the adjoint of a densely defined operator, it is a closed operator. Unlike h ↦ A*±(h), the mapping h ↦ A±(h) is antilinear.
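The closed form of the inner product derived above says that ⟨Pₙ(h₁⊗…⊗hₙ), Pₙ(g₁⊗…⊗gₙ)⟩ is 1/n! times the permanent (for +) or determinant (for −) of the Gram matrix ⟨h_i, g_j⟩. A numerical cross-check of this (our own sketch; we take the inner product linear in the first argument, matching the formulas above):

```python
import math
import numpy as np
from itertools import permutations

def perm_sign(p):
    sign, seen = 1, [False] * len(p)
    for i in range(len(p)):
        if not seen[i]:
            j, length = i, 0
            while not seen[j]:
                seen[j] = True
                j = p[j]
                length += 1
            if length % 2 == 0:
                sign = -sign
    return sign

def P(T, plus):
    """(Anti)symmetrizer P_n^{+-}, with R(sigma) as an axis permutation."""
    n = T.ndim
    out = np.zeros_like(T)
    for p in permutations(range(n)):
        out = out + (1 if plus else perm_sign(p)) * T.transpose(p)
    return out / math.factorial(n)

def tensor(vectors):
    """h_1 x h_2 x ... x h_n as a rank-n array."""
    T = vectors[0]
    for v in vectors[1:]:
        T = np.multiply.outer(T, v)
    return T

rng = np.random.default_rng(3)
n, d = 3, 4
hs = [rng.normal(size=d) + 1j * rng.normal(size=d) for _ in range(n)]
gs = [rng.normal(size=d) + 1j * rng.normal(size=d) for _ in range(n)]
gram = np.array([[np.sum(h * g.conj()) for g in gs] for h in hs])   # <h_i, g_j>

def formula(plus):
    """(1/n!) sum_sigma eps(sigma) <h_1, g_sigma(1)> ... <h_n, g_sigma(n)>."""
    total = 0
    for p in permutations(range(n)):
        term = 1 if plus else perm_sign(p)
        for i in range(n):
            term = term * gram[i, p[i]]
        total += term
    return total / math.factorial(n)

def direct(plus):
    """<P_n(h1 x...x hn), P_n(g1 x...x gn)> computed on the tensors themselves."""
    A, B = P(tensor(hs), plus), P(tensor(gs), plus)
    return np.sum(A * B.conj())
```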
Note furthermore that A±(h) is the restriction to the space F±(H) of the operator B(h) : ⊕_{n=0}^∞ H⊗n → ⊕_{n=0}^∞ H⊗n, defined by
\[
B(h)(h_1\otimes\ldots\otimes h_n) = \sqrt{n}\,\langle h_1, h\rangle\, h_2\otimes\ldots\otimes h_n.
\]
Indeed, when we apply B(h) to the vector Pₙ±(h₁ ⊗ . . . ⊗ hₙ), we get precisely the right-hand side of (2.45). The creation and annihilation operators satisfy the following relations:
\[
[A_\pm^*(h_1), A_\pm^*(h_2)]_\mp = [A_\pm(h_1), A_\pm(h_2)]_\mp = 0, \qquad
[A_\pm(h_1), A_\pm^*(h_2)]_\mp = \langle h_2, h_1\rangle\,1_{F_\pm(H)},
\]
where [X, Y]± = XY ± YX.
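These commutation relations can be tested on a truncated Fock space. The following self-contained sketch is our own illustration (the names `project`, `create`, `annihilate` are assumptions); it implements (2.42)–(2.45) for dim H = 2 with at most two particles and checks [A(f), A*(g)]∓ = ⟨g, f⟩1 on states with at most one particle, where the cutoff is invisible:

```python
import math
import numpy as np
from itertools import permutations

d, N = 2, 2                      # dim of one-particle space, particle-number cutoff

def perm_sign(p):
    sign, seen = 1, [False] * len(p)
    for i in range(len(p)):
        if not seen[i]:
            j, length = i, 0
            while not seen[j]:
                seen[j] = True
                j = p[j]
                length += 1
            if length % 2 == 0:
                sign = -sign
    return sign

def project(T, plus):
    """(Anti)symmetrize a rank-n tensor (the projection P_n^{+-})."""
    n = T.ndim
    if n <= 1:
        return T
    out = np.zeros_like(T)
    for p in permutations(range(n)):
        out = out + (1 if plus else perm_sign(p)) * T.transpose(p)
    return out / math.factorial(n)

def create(h, state, plus):
    """A*(h): component n -> sqrt(n+1) P_{n+1}(h x ...), cf. (2.42)-(2.43)."""
    new = [np.zeros((d,) * n, dtype=complex) for n in range(N + 1)]
    for n in range(N):
        new[n + 1] = np.sqrt(n + 1) * project(np.multiply.outer(h, state[n]), plus)
    return new

def annihilate(h, state, plus):
    """A(h): restriction of B(h), i.e. sqrt(n) <h_1, h> h_2 x ... x h_n, cf. (2.45)."""
    new = [np.zeros((d,) * n, dtype=complex) for n in range(N + 1)]
    for n in range(1, N + 1):
        new[n - 1] = np.sqrt(n) * np.tensordot(state[n], h.conj(), axes=([0], [0]))
    return new

rng = np.random.default_rng(4)
f = rng.normal(size=d) + 1j * rng.normal(size=d)
g = rng.normal(size=d) + 1j * rng.normal(size=d)
# a state with vacuum and one-particle components only, so the cutoff N = 2 is invisible
v = [np.array(rng.normal() + 0j),
     rng.normal(size=d) + 1j * rng.normal(size=d),
     np.zeros((d, d), dtype=complex)]

def comm(plus):
    """[A(f), A*(g)] for bosons (plus=True) or {A(f), A*(g)} for fermions."""
    x = annihilate(f, create(g, v, plus), plus)
    y = create(g, annihilate(f, v, plus), plus)
    s = -1 if plus else +1
    return [xi + s * yi for xi, yi in zip(x, y)]
```

Both the bosonic commutator and the fermionic anticommutator come out as multiplication by ⟨g, f⟩, in agreement with the relations above.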
Finally, we note that when we have an operator L on the one-particle Hilbert space H, we can
define an operator Γ± (L) on D± by
Γ± (L)Pn± (h1 ⊗ . . . ⊗ hn ) = Pn± (Lh1 ⊗ . . . ⊗ Lhn ).
Here, by definition, we set Γ±(L)Ω = Ω. The operator Γ±(L) thus leaves all F±ⁿ(H) invariant.
Note that if L is invertible, then
\[
\begin{aligned}
\Gamma_\pm(L)A_\pm^*(h)\Gamma_\pm(L)^{-1}P_n^\pm(h_1\otimes\ldots\otimes h_n)
&= \Gamma_\pm(L)A_\pm^*(h)P_n^\pm(L^{-1}h_1\otimes\ldots\otimes L^{-1}h_n) \\
&= \sqrt{n+1}\,\Gamma_\pm(L)P_{n+1}^\pm(h\otimes L^{-1}h_1\otimes\ldots\otimes L^{-1}h_n) \\
&= \sqrt{n+1}\,P_{n+1}^\pm(Lh\otimes h_1\otimes\ldots\otimes h_n) \\
&= A_\pm^*(Lh)P_n^\pm(h_1\otimes\ldots\otimes h_n),
\end{aligned}
\]
so if L is invertible then
\[
\Gamma_\pm(L)A_\pm^*(h)\Gamma_\pm(L)^{-1} = A_\pm^*(Lh)
\tag{2.46}
\]
on D±.
on D± . In case L is also unitary, so that L−1 = L∗ , we have
Γ± (L)A± (h)Γ± (L)−1 Pn± (h1 ⊗ . . . ⊗ hn ) = Γ± (L)A± (h)Pn± (L−1 h1 ⊗ . . . ⊗ L−1 hn )
n
X
1 ±
= √ Γ (L)
(±1)j−1 hh, L−1 hj i
n
=
j=1
±
−1
Pn−1 (L h1 ⊗ . . .
n
1 X
j−1
√
n
(±1)
⊗ L−1 hj−1 ⊗ L−1 hj+1 ⊗ . . . ⊗ L−1 hn )
hLh, hj i
j=1
±
(h1 ⊗ . . . ⊗ hj−1 ⊗ hj+1 ⊗ . . . ⊗ hn )
Pn−1
= A± (Lh)Pn± (h1 ⊗ . . . ⊗ hn ),
so if L is unitary then
Γ± (L)A± (h)Γ± (L)−1 = A± (Lh)
(2.47)
on D± . If L and K are operators on H then Γ± (LK) = Γ± (L)Γ± (K), and we also have Γ± (1H ) =
1F± (H) . Thus, if φ : G → B(H) is a (unitary) representation of a group G on H, then Γ± (φ) :=
Γ± ◦ φ defines a (unitary) representation of G on F± (H). In particular, this holds for the unitary
e↑ on H, so we get a unitary representation Γ± (U ) on F± (H). This
representation19 U of P
+
representation is clearly reducible and the only vector in F± (H) which is invariant under Γ± (U )
e↑ we have
is the vacuum vector Ω. Using (2.46) and (2.47) we also find that for any (a, A) ∈ P
+
Γ± (U (a, A))A∗± (h)Γ± (U (a, A))−1 = A∗± (U (a, A)h)
Γ± (U (a, A))A± (h)Γ± (U (a, A))−1 = A± (U (a, A)h).
These are the transformation properties of the creation and annihilation operators under the
e↑ and we will need them several times in the following chapters. Of course, if there are
group P
+
also parity and time reversal operators P and T defined on the one-particle space H, then we also
obtain operators Γ± (P) and Γ± (T) on F± (H).
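The covariance relation (2.46) can be checked with the same kind of truncated-Fock sketch (our own illustration; `gamma` applies L to every tensor factor, which is exactly the definition of Γ±(L)). We verify Γ(L)A*(h)Γ(L)⁻¹ = A*(Lh) for a random invertible L on a bosonic state:

```python
import math
import numpy as np
from itertools import permutations

d, N = 2, 3

def perm_sign(p):
    sign, seen = 1, [False] * len(p)
    for i in range(len(p)):
        if not seen[i]:
            j, length = i, 0
            while not seen[j]:
                seen[j] = True
                j = p[j]
                length += 1
            if length % 2 == 0:
                sign = -sign
    return sign

def project(T, plus=True):
    n = T.ndim
    if n <= 1:
        return T
    out = np.zeros_like(T)
    for p in permutations(range(n)):
        out = out + (1 if plus else perm_sign(p)) * T.transpose(p)
    return out / math.factorial(n)

def create(h, state):
    """Bosonic A*(h) on a truncated Fock vector, cf. (2.42)-(2.43)."""
    new = [np.zeros((d,) * n, dtype=complex) for n in range(N + 1)]
    for n in range(N):
        new[n + 1] = np.sqrt(n + 1) * project(np.multiply.outer(h, state[n]))
    return new

def gamma(L, state):
    """Gamma(L): apply L to every tensor factor of every component."""
    new = []
    for T in state:
        for ax in range(T.ndim):
            T = np.moveaxis(np.tensordot(L, T, axes=(1, ax)), 0, ax)
        new.append(T)
    return new

rng = np.random.default_rng(5)
L = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))   # generically invertible
h = rng.normal(size=d) + 1j * rng.normal(size=d)
v = [np.array(rng.normal() + 0j),
     rng.normal(size=d) + 1j * rng.normal(size=d),
     rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)),
     np.zeros((d, d, d), dtype=complex)]
v = [project(T) for T in v]      # a bosonic state with n <= 2 (cutoff N = 3)

lhs = gamma(L, create(h, gamma(np.linalg.inv(L), v)))
rhs = create(L @ h, v)
```

The same computation with a unitary L illustrates (2.47) for the annihilation operators, since A±(h) is the adjoint of A*±(h).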
¹⁹Recall from section 2.2.4 that we use the same notation (namely U) to denote the representations of P̃↑₊ on H and on ℋ.
In case we have a system that contains different types {τ}_{τ∈T} of particles that are not interacting, we proceed as follows. Let H^{[τ]} denote the Hilbert space of one-particle states for the particle type τ. So the vectors in this Hilbert space transform according to the irreducible unitary representation of P̃↑₊ with mass m_τ and spin s_τ (or helicity σ_τ), as described in the previous section. We then partition the set T of all particle types into two disjoint subsets T_B and T_F, consisting of all particle types that are bosons or fermions, respectively. Then the Hilbert spaces H₁^B and H₁^F of boson, respectively fermion, one-particle state vectors are defined by
\[
H_1^B := \bigoplus_{\tau\in T_B} H^{[\tau]}, \qquad H_1^F := \bigoplus_{\tau\in T_F} H^{[\tau]}.
\]
If we write T_B = {τ_{B,1}, . . . , τ_{B,k_B}} and T_F = {τ_{F,1}, . . . , τ_{F,k_F}}, then an arbitrary vector h in H₁^{B/F} can be written as a sum
\[
h(\tau_{B/F,1}) \oplus \cdots \oplus h(\tau_{B/F,k_{B/F}})
\]
with h(τ_{B/F,j}) ∈ H^{[τ_{B/F,j}]}. Because each h(τ_{B/F,j}) is in turn a function of p and σ, we can write h ∈ H₁^{B/F} as a function²⁰ h(τ, p, σ), where the set of possible values of σ depends on τ. The inner product of two such functions h and g is then given by
\[
\langle h, g\rangle_{H_1^{B/F}} = \sum_{\tau\in T_{B/F}} \langle h(\tau), g(\tau)\rangle_{H^{[\tau]}}
= \sum_{\tau\in T_{B/F}} \sum_{\sigma\in I_\tau} \int_{\mathbb{R}^3} h(\tau,\mathbf{p},\sigma)\, \overline{g(\tau,\mathbf{p},\sigma)}\, d\lambda(\mathbf{p}),
\tag{2.48}
\]
where I_τ = {−s_τ, . . . , s_τ} if τ is a massive particle with spin s_τ, I_τ = {σ_τ} if τ is a massless particle with helicity σ_τ, and I_τ = {−σ_τ, σ_τ} if τ is a massless particle with possible helicities ±σ_τ (see also the discussion above about parity); the volume element dλ(p) is either d³p/(2ω_p) or d³p, depending on whether h and g are state functions Ψ or momentum-spin wave functions ψ, respectively. The n-fold tensor product (H₁^{B/F})⊗ⁿ can be identified with the closed linear span of all product functions
\[
h_1\otimes\ldots\otimes h_n \simeq h_1(\tau_1,\mathbf{p}_1,\sigma_1)\,h_2(\tau_2,\mathbf{p}_2,\sigma_2)\ldots h_n(\tau_n,\mathbf{p}_n,\sigma_n)
\]
with all h_j ∈ H₁^{B/F}. We can then construct the n-fold symmetrized (respectively antisymmetrized) tensor products F±ⁿ(H₁^{B/F}) by using the projection operators Pₙ±: the space F±ⁿ(H₁^{B/F}) is the closed linear span of all functions of the form
\[
P_n^\pm(h_1\otimes\ldots\otimes h_n) = \frac{1}{n!}\sum_{\rho\in S_n} \epsilon^\pm(\rho)\, h_{\rho(1)}(\tau_1,\mathbf{p}_1,\sigma_1)\,h_{\rho(2)}(\tau_2,\mathbf{p}_2,\sigma_2)\ldots h_{\rho(n)}(\tau_n,\mathbf{p}_n,\sigma_n).
\]
We then take the direct sum of all these spaces to obtain the Fock spaces F_±(H_1^{B/F}). On these spaces F_±(H_1^{B/F}) we can define, as in (2.42)-(2.45), the creation and annihilation operators A_±^*(h) and A_±(h) for vectors h ∈ H_1^{B/F}. Because we can write each such h ∈ H_1^{B/F} as a direct sum of vectors h(τ) ∈ H^[τ] with all τ ∈ T_{B/F}, it is useful to introduce a special notation for creation and annihilation operators corresponding to a single particle species. We will write A^*(τ, h) and A(τ, h) to denote the creation and annihilation operators corresponding to a vector h = h(p, σ) ∈ H^[τ]. In other words, A^(∗)(τ, h) = A_±^(∗)(g), where g ∈ H_1^{B/F} is the function whose τ-component is h and whose other components vanish, i.e. g(τ′, p, σ) = δ_{ττ′} h(p, σ). Note that we suppress the subindex ± in A^(∗)(τ, ·), because the choice between + and − follows from τ.
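The operators A^(∗)(τ, h) and the (anti)commutation relations they satisfy can be made concrete in a finite-dimensional toy model. The sketch below (an illustration, not part of the thesis; the Jordan-Wigner-type matrices are a standard construction) builds explicit creation operators for k fermionic modes on the 2^k-dimensional Fock space and verifies the canonical anticommutation relations displayed below.

```python
import numpy as np

# Fermionic Fock space for k modes has dimension 2**k.  The Jordan-Wigner
# construction gives explicit matrices for the creation operators.
def fermion_creation_ops(k):
    sz = np.array([[1.0, 0.0], [0.0, -1.0]])   # sign factor carried past earlier modes
    ad = np.array([[0.0, 0.0], [1.0, 0.0]])    # single-mode creator: |0> -> |1>
    ops = []
    for j in range(k):
        factors = [sz] * j + [ad] + [np.eye(2)] * (k - j - 1)
        m = factors[0]
        for f in factors[1:]:
            m = np.kron(m, f)
        ops.append(m)
    return ops

k = 3
creators = fermion_creation_ops(k)
annihilators = [m.T for m in creators]          # real matrices: adjoint = transpose

def anti(x, y):
    return x @ y + y @ x

# Canonical anticommutation relations: {a_i, a*_j} = delta_ij, {a*_i, a*_j} = 0.
for i in range(k):
    for j in range(k):
        assert np.allclose(anti(annihilators[i], creators[j]),
                           (1.0 if i == j else 0.0) * np.eye(2 ** k))
        assert np.allclose(anti(creators[i], creators[j]), 0.0)
print("CAR verified for", k, "modes")
```

The bosonic case works the same way on truncated number states, except that commutators replace anticommutators and no sign-carrying σ_z factors are needed.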
The Fock space corresponding to the entire system of particles T = T_B ∪ T_F is the tensor product

    H_Fock := F_+(H_1^B) ⊗ F_−(H_1^F)    (2.49)

of the boson Fock space F_+(H_1^B) and the fermion Fock space F_−(H_1^F).

[20] These functions h(τ, p, σ) can be either state functions Ψ ∈ H_1^{B/F} or else momentum-spin wave functions ψ ∈ ℋ_1^{B/F}.

If Ω_B and Ω_F denote the vacuum vectors of F_+(H_1^B) and F_−(H_1^F), then the vector Ω_Fock := Ω_B ⊗ Ω_F is called the vacuum vector of H_Fock. When τ is a boson, respectively fermion, we can let the operators A^(∗)(τ, h) act on the entire space H_Fock by taking the tensor product A^(∗)(τ, h) ⊗ 1_{F_−(H_1^F)}, respectively 1_{F_+(H_1^B)} ⊗ A^(∗)(τ, h). If τ and τ′ are both bosons or both fermions, so either τ, τ′ ∈ T_B or τ, τ′ ∈ T_F, then these operators satisfy the relations

    [A^*(τ, h_1), A^*(τ′, h_2)]_∓ = [A(τ, h_1), A(τ′, h_2)]_∓ = 0,
    [A(τ, h_1), A^*(τ′, h_2)]_∓ = δ_{ττ′} ⟨h_2, h_1⟩ 1_{H_Fock},
where the upper sign corresponds to the boson case and the lower sign to the fermion case. Note
that interchanging two operators corresponding to different fermion types costs a minus sign.
If one of the particles τ and τ′ is a boson and the other is a fermion, then their creation and annihilation operators commute with each other. We finally note that the unitary representations {U_τ}_{τ∈T} of P̃₊↑ on the one-particle spaces H^[τ] define unitary representations

    U_B := ⊕_{τ∈T_B} U_τ        and        U_F := ⊕_{τ∈T_F} U_τ

of P̃₊↑ on H_1^B and H_1^F, respectively. These can then be used to define a unitary representation U_Fock := Γ_+(U_B) ⊗ Γ_−(U_F) on H_Fock. If all one-particle spaces H^[τ] also contain parity and time reversal operators, then a similar construction gives us parity and time reversal operators on H_Fock.
As emphasized above, all definitions in this section hold for both the Hilbert space H of one-particle state functions Ψ(p, σ), as well as for the Hilbert space ℋ of one-particle momentum-spin wave functions ψ(p, σ). In the first case (i.e. in the case H = H, H^[τ] = H^[τ], etc.) we obtain the Fock spaces F_±(H) and H_Fock and in this case we will write the creation and annihilation operators with a capital letter A,

    A_±^(∗)(Ψ),        A^(∗)(τ, Ψ),        etc.,

whereas in the second case (i.e. in the case H = ℋ, H^[τ] = ℋ^[τ], etc.) we obtain Fock spaces F_±(ℋ) and ℋ_Fock and in this case we will always write the creation and annihilation operators with a small letter a,

    a_±^(∗)(ψ),        a^(∗)(τ, ψ),        etc.
In the previous section we defined the unitary map J : H → ℋ that relates the two Hilbert spaces. This map naturally extends to a family of maps Γ_n(J) : H^{⊗n} → ℋ^{⊗n} by defining Γ_n(J)(Ψ_1 ⊗ ··· ⊗ Ψ_n) = (JΨ_1) ⊗ ··· ⊗ (JΨ_n). For each n, the restriction of Γ_n(J) to F_±^n(H) then defines a unitary map Γ_±^n(J) : F_±^n(H) → F_±^n(ℋ). Hence, there is a unitary map

    Γ_±(J) : F_±(H) → F_±(ℋ)

that relates the two Fock spaces. For Υ ∈ F_±(H), the element Γ_±(J)Υ is physically equivalent to Υ in a similar sense that, for Ψ ∈ H, the element JΨ is physically equivalent to Ψ. Using Γ_±(J), we can express the relation between A^(∗) and a^(∗) as

    A_±^(∗)(Ψ) = Γ_±(J)^{−1} a_±^(∗)(JΨ) Γ_±(J),
    a_±^(∗)(ψ) = Γ_±(J) A_±^(∗)(J^{−1}ψ) Γ_±(J)^{−1}.

Thus A_±^(∗)(Ψ) and a_±^(∗)(JΨ) represent the same physics in the sense that if Υ ∈ F_±(H) and if υ = Γ_±(J)Υ ∈ F_±(ℋ) (which is physically equivalent to Υ), then A_±^(∗)(Ψ)Υ is physically equivalent to a_±^(∗)(JΨ)υ.
Especially in the next chapter, on the physics of quantum fields, the description in terms of ℋ will be most convenient. This coincides with the notation in most of the physics literature, where small letters a are used to denote creation and annihilation operators.
3  The physics of quantum fields
In this chapter we will give a brief overview of the use of quantum fields in physics in the so-called
canonical formalism (as opposed to the path-integral formalism). We will do this by following the
main arguments that are stated in chapters 3, 4 and 5 of Weinberg’s book [35]. In contrast to
the previous chapters (and the following ones) this chapter will not be mathematically rigorous.
Also, we will not provide any derivations of the results in this chapter since they can all be
found in [35]. The purpose of this chapter is merely to give some physical background on the
mathematical constructions in the next chapter and to motivate the content of the axioms of the
two mathematical frameworks that will be discussed in the next chapter.
3.1
The interaction picture and scattering theory
For any quantum system with Hamiltonian H the time-evolution operator is given by UH (t) =
e−itH , see also subsection 2.2.2. Now assume that this Hamiltonian can be written as H = H0 + V ,
where H0 is a Hamiltonian which corresponds to an easy and well-understood quantum system and
V is some extra term which makes the system more complicated. We thus assume that we already
know the evolution operator UH0 (t) = e−itH0 , and we want to express UH (t) in terms of UH0 (t)
and V . For this purpose we introduce the so-called interaction picture, which lies in between the
Heisenberg picture and the Schrödinger picture.
The interaction picture
In the interaction picture an observable A evolves according to

    A_I(t) = U_{H_0}(−t) A U_{H_0}(t)

and a state vector Ψ evolves according to
    Ψ_I(t) = Ω(t)^* Ψ,

where Ω(t) := U_H(−t) U_{H_0}(t) and hence Ω(t)^* = U_{H_0}(−t) U_H(t). It is easy to see that the interaction picture is physically equivalent to the Heisenberg picture (which is in turn physically equivalent to the Schrödinger picture), since

    ⟨A_I(t)Ψ_I(t), Ψ_I(t)⟩ = ⟨U_{H_0}(−t) A U_{H_0}(t) U_{H_0}(−t) U_H(t)Ψ, U_{H_0}(−t) U_H(t)Ψ⟩
                           = ⟨U_{H_0}(−t) A U_H(t)Ψ, U_{H_0}(−t) U_H(t)Ψ⟩ = ⟨A U_H(t)Ψ, U_H(t)Ψ⟩
                           = ⟨U_H(−t) A U_H(t)Ψ, Ψ⟩.
Since we are interested in U_H(t) and because U_H(t) = U_{H_0}(t) Ω(t)^*, we must thus find a way to calculate Ω(t)^*. It follows directly from the expression Ω(t)^* = U_{H_0}(−t) U_H(t) that Ω(t)^* satisfies dΩ(t)^*/dt = (1/i) V_I(t) Ω(t)^*, where V_I(t) denotes the interaction picture evolution of V. This allows us to write

    Ω(t)^* = Ω(0)^* + ∫_0^t (dΩ(τ_1)^*/dτ_1) dτ_1 = 1 + (1/i) ∫_0^t V_I(τ_1) Ω(τ_1)^* dτ_1.

We can then insert this expression for Ω(t)^* into the right-hand side of this same equation and repeat this procedure over and over again, so that finally we obtain

    Ω(t)^* ∼ 1 + Σ_{n=1}^∞ (1/i^n) ∫_0^t ∫_0^{τ_n} ··· ∫_0^{τ_2} V_I(τ_n) V_I(τ_{n−1}) ··· V_I(τ_1) dτ_1 ··· dτ_{n−1} dτ_n
           = 1 + Σ_{n=1}^∞ (1/i^n) ∫_{t>τ_n>τ_{n−1}>···>τ_1>0} V_I(τ_n) V_I(τ_{n−1}) ··· V_I(τ_1) dτ_1 ··· dτ_{n−1} dτ_n
           = 1 + Σ_{n=1}^∞ (1/(i^n n!)) ∫_0^t ··· ∫_0^t T{V_I(τ_n) V_I(τ_{n−1}) ··· V_I(τ_1)} dτ_1 ··· dτ_{n−1} dτ_n,    (3.1)
where T { } denotes the time-ordered product (i.e. the operators are ordered in such a way that
the time variables of the operators are in anti-chronological order from left to right). This is the
expression for Ω(t)∗ that we needed. We will now apply these results to the case of a scattering
experiment.
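The iteration that produced (3.1) can be checked numerically in a toy model. The sketch below (not from the thesis; the 2×2 matrices H_0 and V are arbitrary choices) integrates dΩ(t)^*/dt = (1/i)V_I(t)Ω(t)^* with a Runge-Kutta scheme and compares the result with the closed form Ω(t)^* = U_{H_0}(−t)U_H(t) = e^{iH_0 t}e^{−iHt}.

```python
import numpy as np

def expm_h(h, t):
    # matrix exponential exp(-i*t*h) for a Hermitian matrix h, via eigendecomposition
    w, v = np.linalg.eigh(h)
    return (v * np.exp(-1j * t * w)) @ v.conj().T

H0 = np.diag([0.0, 1.0])                       # free Hamiltonian (toy choice)
V = 0.3 * np.array([[0.0, 1.0], [1.0, 0.0]])   # interaction term (toy choice)
H = H0 + V
t_final, n_steps = 2.0, 4000
dt = t_final / n_steps

# Right-hand side of dOmega*/dt = -i V_I(t) Omega*(t)
def rhs(t, om):
    u0 = expm_h(H0, -t)            # e^{+i H0 t}
    v_int = u0 @ V @ u0.conj().T   # interaction-picture V_I(t)
    return -1j * v_int @ om

# 4th-order Runge-Kutta integration starting from Omega*(0) = 1
om = np.eye(2, dtype=complex)
for n in range(n_steps):
    t = n * dt
    k1 = rhs(t, om)
    k2 = rhs(t + dt / 2, om + dt / 2 * k1)
    k3 = rhs(t + dt / 2, om + dt / 2 * k2)
    k4 = rhs(t + dt, om + dt * k3)
    om = om + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)

exact = expm_h(H0, -t_final) @ expm_h(H, t_final)  # e^{i H0 t} e^{-i H t}
assert np.allclose(om, exact, atol=1e-8)
```

The same integration, truncated at a fixed order n, reproduces the partial sums of the series (3.1).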
Scattering experiments
In a typical scattering experiment particles approach each other from very large mutual distances. They will then interact with each other in some small region in space, and finally a collection of particles (not necessarily the same ones) comes out of this region and moves apart to large mutual distances. At the beginning and at the end of such an experiment the particles are so far apart that they do not interact with each other. Therefore, if H denotes the Hilbert space corresponding to the scattering experiment and Ψ ∈ H is a pure state vector (in the Heisenberg picture), the transformed state vectors e^{−iHt}Ψ must in some sense 'look like' state vectors in a free particle Fock space H_Fock when t → ±∞; here H_Fock stands for either the Fock space of state functions or the Fock space ℋ_Fock of momentum-spin wave functions, see also section 2.2.5. Mathematically, this means that we have two linear isometric embeddings Ω^in, Ω^out : H_Fock → H from the free particle Fock space into H. We then define H^in, H^out ⊂ H by H^in = Ω^in H_Fock and H^out = Ω^out H_Fock; these are called the spaces of asymptotic states of incoming and outgoing particles, respectively.

Physically, the maps Ω^in and Ω^out should be interpreted as follows. If h ∈ H_Fock is a state corresponding to a collection of non-interacting particles with some specified momenta and spin 3-components, then Ω^in h ∈ H represents the state vector in a scattering experiment where the state of the incoming particles (when they are not yet interacting) is given by h. The physical interpretation of Ω^out is analogous.
The linear operator S = (Ω^out)^* Ω^in : H_Fock → H_Fock is called the scattering operator. In physics it is often assumed that H^in = H^out (so-called asymptotic completeness), which is equivalent to the requirement that S is unitary. The scattering matrix, or S-matrix, is defined by

    S_{βα} := ⟨S h_α, h_β⟩_{H_Fock} = ⟨Ω^in h_α, Ω^out h_β⟩_H,

where h_α, h_β ∈ H_Fock are free-particle states. Note that it represents the transition amplitude for the transition Ω^in h_α → Ω^out h_β. Now recall the definition of the unitary representation U_Fock of P̃₊↑ on H_Fock as given in subsection 2.2.5. The representation U_Fock induces two unitary representations U^in := Ω^in U_Fock (Ω^in)^* and U^out := Ω^out U_Fock (Ω^out)^* of P̃₊↑ on H^in = H^out. The theory will be Poincaré invariant if U^in = U^out, which is equivalent to S U_Fock = U_Fock S.
In physics textbooks, it is often assumed that the free particle states and the asymptotic states both live in the same Hilbert space, namely the space ℋ_Fock of many-particle momentum-spin wave functions, so in the rest of this chapter we will take H_Fock = ℋ_Fock (rather than the Fock space of state functions). Also, the Hamiltonian H of the system is assumed to be a sum H = H_0 + V of a free-particle Hamiltonian H_0 and an interaction term V. Then the operator

    Ω(t) = U_H(−t) U_{H_0}(t) = e^{iHt} e^{−iH_0 t}

is defined, and Ω(∓∞) = lim_{t→∓∞} Ω(t) correspond to Ω^in and Ω^out, respectively, so that

    S = lim_{t→∞} lim_{t′→−∞} Ω(t)^* Ω(t′).
A similar calculation as the one which led to equation (3.1) gives that the operator Ω(t)^* Ω(t′) can be written as

    Ω(t)^* Ω(t′) = 1 + Σ_{n=1}^∞ (1/(i^n n!)) ∫_{t′}^t ··· ∫_{t′}^t T{V_I(τ_n) V_I(τ_{n−1}) ··· V_I(τ_1)} dτ_1 ··· dτ_{n−1} dτ_n.
The S-operator can then be written as

    S = 1 + Σ_{n=1}^∞ ((−i)^n/n!) ∫_{−∞}^∞ ··· ∫_{−∞}^∞ dt_1 ··· dt_n T{V_I(t_1) ··· V_I(t_n)}.
As a first step to guarantee that the S-matrix will be Lorentz-invariant, it is assumed that V_I(t) is of the form

    V_I(t) = ∫_{R³} 𝓗(t, x) d³x,    (3.2)

with 𝓗(x) a scalar in the sense that

    U_Fock(a, A) 𝓗(x) U_Fock(a, A)^{−1} = 𝓗(Φ(A)x + a),    (3.3)

where Φ : P̃₊↑ → P₊↑ denotes the covering map, as usual[21]. The S-operator may then be written as

    S = 1 + Σ_{n=1}^∞ ((−i)^n/n!) ∫_M ··· ∫_M d⁴x_1 ··· d⁴x_n T{𝓗(x_1) ··· 𝓗(x_n)}.    (3.4)

The time ordering of two points in spacetime is only Lorentz-invariant when the two points are not spacelike separated, so to obtain Lorentz invariance we must have that 𝓗(x_1)𝓗(x_2) = 𝓗(x_2)𝓗(x_1) whenever x_1 and x_2 are spacelike separated. Thus, 𝓗(x) satisfies

    [𝓗(x_1), 𝓗(x_2)] = 0    (3.5)

when (x_1 − x_2)² < 0. Actually, to avoid singularities when x_1 = x_2, it is assumed that this holds even when (x_1 − x_2)² ≤ 0.
In physics it is assumed that different experiments that are carried out at large spatial distances
from each other cannot influence one another. This is called the cluster decomposition principle.
In terms of the S-matrix, this principle implies that for multiparticle scattering processes α_1 → β_1, ..., α_N → β_N that are carried out at N different laboratories that are far apart, the S-matrix element of the composite experiment will factorize:

    S_{β_1+···+β_N, α_1+···+α_N} → S_{β_1α_1} · ··· · S_{β_Nα_N}.
To see what kind of Hamiltonians will give rise to an S-matrix that satisfies this property, we need to consider the creation and annihilation operators a^(∗)(τ, ψ) on the space ℋ_Fock defined in subsection 2.2.5. In the notation where we write a distribution F as a function F(x) in the sense that F(f) = ∫ F(x)f(x)dx, we can write the creation and annihilation operators a^(∗)(τ, ·) as (operator-valued) functions a^(∗)(τ, p, σ) in the sense that[22]

    a^*(τ, ψ) = Σ_{σ∈I_τ} ∫_{R³} d³p ψ(τ, p, σ) a^*(τ, p, σ),
    a(τ, ψ) = Σ_{σ∈I_τ} ∫_{R³} d³p \overline{ψ(τ, p, σ)} a(τ, p, σ),

where we used that a(τ, ψ) depends conjugate-linearly on ψ. It is useful to introduce the so-called
definite momentum-spin wave functions ψ_{τ′,p′,σ′}. They are defined by

    ψ_{τ′,p′,σ′}(τ, p, σ) = δ_{ττ′} δ_{σσ′} δ(p − p′).

The corresponding definite momentum-spin state functions Ψ_{τ′,p′,σ′}(τ, p, σ) are then given by

    Ψ_{τ′,p′,σ′}(τ, p, σ) = √(2ω_{p′}) δ_{ττ′} δ_{σσ′} δ(p − p′),

but they will not be very useful. The "inner product" of two definite-momentum wave functions is given by

    Σ_τ Σ_{σ∈I_τ} ∫_{R³} ψ_{τ′,p′,σ′}(τ, p, σ) \overline{ψ_{τ″,p″,σ″}(τ, p, σ)} d³p = δ_{τ′τ″} δ_{σ′σ″} ∫_{R³} δ(p − p′) δ(p − p″) d³p
                                                                                 = δ_{τ′τ″} δ_{σ′σ″} δ(p′ − p″).

[21] In some (gauge) theories both (3.2) and (3.3) are only approximately true, but this will not affect the Lorentz invariance. In such theories the interaction is obtained from a Lorentz invariant Lagrangian, which will guarantee Lorentz invariance in a more general way than equations (3.2) and (3.3). We will come back to Lagrangians later.
[22] Here and in the rest of this chapter the index set I_τ is defined as in (2.48).
With the use of these definite momentum-spin wave functions, we can define the operators a^(∗)(τ, p, σ) formally as a^(∗)(ψ_{τ,p,σ}), since

    a^(∗)(τ′, p′, σ′) = Σ_τ Σ_{σ∈I_τ} ∫_{R³} d³p δ_{ττ′} δ_{σσ′} δ(p − p′) a^(∗)(τ, p, σ)
                      = Σ_τ Σ_{σ∈I_τ} ∫_{R³} d³p ψ_{τ′,p′,σ′}(τ, p, σ) a^(∗)(τ, p, σ)
                      = a^(∗)(ψ_{τ′,p′,σ′}).
Since we are not being mathematically rigorous in this chapter, we can thus describe the action of these operators as

    a^*(τ, p, σ) P_n^±(ψ_1 ⊗ ··· ⊗ ψ_n) = √(n+1) P_{n+1}^±(ψ_{τ,p,σ} ⊗ ψ_1 ⊗ ··· ⊗ ψ_n),

    a(τ, p, σ) P_n^±(ψ_1 ⊗ ··· ⊗ ψ_n) = (1/√n) Σ_{j=1}^n (±1)^{j−1} ⟨ψ_j, ψ_{τ,p,σ}⟩ P_{n−1}^±(ψ_1 ⊗ ··· ⊗ ψ_{j−1} ⊗ ψ_{j+1} ⊗ ··· ⊗ ψ_n)
                                      = (1/√n) Σ_{j=1}^n (±1)^{j−1} ψ_j(τ, p, σ) P_{n−1}^±(ψ_1 ⊗ ··· ⊗ ψ_{j−1} ⊗ ψ_{j+1} ⊗ ··· ⊗ ψ_n).
The operator a(τ, p, σ) is the restriction to the subspace F_±(ℋ) of the operator b(τ, p, σ), defined on ⊕_{n=0}^∞ ℋ^{⊗n} by

    b(τ, p, σ)(ψ_1 ⊗ ··· ⊗ ψ_n) = √n ⟨ψ_1, ψ_{τ,p,σ}⟩ ψ_2 ⊗ ··· ⊗ ψ_n = √n ψ_1(τ, p, σ) (ψ_2 ⊗ ··· ⊗ ψ_n).

For an arbitrary function ψ^(n) ∈ ℋ^{⊗n} this means that

    [b(τ, p, σ)ψ^(n)](τ_1, p_1, σ_1, ..., τ_{n−1}, p_{n−1}, σ_{n−1}) = √n ψ^(n)(τ, p, σ, τ_1, p_1, σ_1, ..., τ_{n−1}, p_{n−1}, σ_{n−1}).
If τ is a massive particle of spin s_τ, then the operators a^(∗)(τ, p, σ) transform under a transformation (b, A) ∈ P̃₊↑ as[23]

    U_Fock(b, A) a_τ(p, σ) U_Fock(b, A)^{−1}
        = e^{−ib·Φ(A)p} √((Φ(A)p)^0/p^0) Σ_{σ′=−s_τ}^{s_τ} [D^(s_τ)(M_{Φ(A)p}^{−1} A M_p)^{−1}]_{σσ′} a_τ(Φ(A)p, σ′),    (3.6)

    U_Fock(b, A) a_τ^*(p, σ) U_Fock(b, A)^{−1}
        = e^{ib·Φ(A)p} √((Φ(A)p)^0/p^0) Σ_{σ′=−s_τ}^{s_τ} \overline{[D^(s_τ)(M_{Φ(A)p}^{−1} A M_p)^{−1}]_{σσ′}} a_τ^*(Φ(A)p, σ′),    (3.7)

where Φ(A)p in the argument of a_τ^(∗) denotes the spatial part of Φ(A)p, and the bar in (3.7) denotes complex conjugation. If τ is a massless particle with (possible) helicity σ_τ, these transformations become

    U_Fock(b, A) a_τ(p, σ_τ) U_Fock(b, A)^{−1} = e^{−ib·Φ(A)p} √((Φ(A)p)^0/p^0) e^{−iσ_τ α(M_{Φ(A)p}^{−1} A M_p)} a_τ(Φ(A)p, σ_τ),    (3.8)

    U_Fock(b, A) a_τ^*(p, σ_τ) U_Fock(b, A)^{−1} = e^{ib·Φ(A)p} √((Φ(A)p)^0/p^0) e^{iσ_τ α(M_{Φ(A)p}^{−1} A M_p)} a_τ^*(Φ(A)p, σ_τ),    (3.9)

where α(M) is defined in the same manner as in (2.36).

[23] In order to save space, we will from now on always write a_τ^(∗)(p, σ) instead of a^(∗)(τ, p, σ).

The annihilation and creation operators,
for both massive and massless particles, satisfy the (anti)commutation relations

    [a_τ(p′, σ′), a_{τ′}^*(p, σ)]_± = δ_{ττ′} δ_{σ′σ} δ(p′ − p),    (3.10)
    [a_τ^*(p′, σ′), a_{τ′}^*(p, σ)]_± = 0,    (3.11)
    [a_τ(p′, σ′), a_{τ′}(p, σ)]_± = 0,    (3.12)

where the minus sign holds whenever τ and τ′ are both bosons and the plus sign holds whenever τ and τ′ are both fermions. When one of the particles τ or τ′ is a boson and the other is a fermion, then all creation and annihilation operators commute. Instead of a_τ(p, σ) we will often simply
write a(q). Each operator A can then be represented in the form

    A ∼ Ã := Σ_{N=0}^∞ Σ_{M=0}^∞ ∫ dq′_1 ··· dq′_N dq_1 ··· dq_M A_{NM}(q′, q) a^*(q′_1) ··· a^*(q′_N) a(q_M) ··· a(q_1)

in the sense that ⟨AΨ_1, Ψ_2⟩ = ⟨ÃΨ_1, Ψ_2⟩ for all Ψ_1, Ψ_2 ∈ H. Here the {A_{NM}(q′, q)}_{N,M} are complex-valued functions in the variables q′_1, ..., q′_N, q_1, ..., q_M. In particular, we can write the Hamiltonian as

    H = Σ_{N=0}^∞ Σ_{M=0}^∞ ∫ dq′_1 ··· dq′_N dq_1 ··· dq_M H_{NM}(q′, q) a^*(q′_1) ··· a^*(q′_N) a(q_M) ··· a(q_1).    (3.13)
It can be shown that the cluster decomposition principle will be satisfied if the coefficients H_{NM}(q′, q) are of the form

    H_{NM}(q′, q) = δ(Σ_{i=1}^N p′_i − Σ_{j=1}^M p_j) H̃_{NM}(q′, q),

where H̃_{NM}(q′, q) contains no further delta functions. Because the free particle Hamiltonian is always of the form

    H_0 = ∫ dq E(q) a^*(q) a(q),

with E(q) = E(τ, p, σ) = √(m_τ² + p²), it follows that the interaction V = H − H_0 must be of the form

    V = Σ_{N=0}^∞ Σ_{M=0}^∞ ∫ dq′_1 ··· dq′_N dq_1 ··· dq_M V_{NM}(q′, q) a^*(q′_1) ··· a^*(q′_N) a(q_M) ··· a(q_1)    (3.14)

with V_{NM} of the form δ(Σ_{i=1}^N p′_i − Σ_{j=1}^M p_j) Ṽ_{NM}(q′, q), where Ṽ_{NM}(q′, q) contains no further delta functions. Note furthermore that V will be hermitian if and only if the coefficients satisfy

    V_{NM}(q′_1, ..., q′_N, q_1, ..., q_M) = \overline{V_{MN}(q_1, ..., q_M, q′_1, ..., q′_N)}.
3.2  The use of free quantum fields in scattering theory
To summarize the results of the previous section, for Lorentz invariance the operator V must be of the form (3.2) with 𝓗(x) satisfying (3.3) and (3.5), and for the S-matrix to satisfy the cluster decomposition principle, V must be of the form (3.14) with coefficients V_{NM}(q′, q) as described above. To satisfy both of these conditions, we will (at the end of this section) construct a scalar density[24]

    𝓗(x) = Σ_{N,M} Σ_{j_1,...,j_N} Σ_{k_1,...,k_M} g_{j_1...j_N,k_1...k_M} (φ^{τ_1})⁻_{j_1}(x) ··· (φ^{τ_N})⁻_{j_N}(x) (φ^{τ′_1})⁺_{k_1}(x) ··· (φ^{τ′_M})⁺_{k_M}(x)

out of annihilation fields {(φ^τ)⁺_j(x)}_{j,τ} and creation fields {(φ^τ)⁻_j(x)}_{j,τ}:

    (φ^τ)⁺_j(x) = Σ_{σ∈I_τ} ∫ d³p u_j(x; p, σ, τ) a_τ(p, σ),
    (φ^τ)⁻_j(x) = Σ_{σ∈I_τ} ∫ d³p v_j(x; p, σ, τ) a_τ^*(p, σ).

Here the coefficients u_j and v_j are chosen so that under a transformation (b, M) ∈ P̃₊↑ the fields transform as

    U_Fock(b, M) (φ^τ)^±_j(x) U_Fock(b, M)^{−1} = Σ_k D(M^{−1})_{jk} (φ^τ)^±_k(Φ(M)x + b),    (3.15)

with {D(M^{−1})_{jk}}_{j,k} some collection of numbers depending on M^{−1}. When we take M = 1 and compare this expression with (3.6), (3.7), (3.8) and (3.9), we find that the coefficients are of the form

    u_j(x; p, σ, τ) = (2π)^{−3/2} u_j(p, σ, τ) e^{−ip·x},
    v_j(x; p, σ, τ) = (2π)^{−3/2} v_j(p, σ, τ) e^{ip·x},

where p^0 = √(m_τ² + p²). Furthermore, by applying two successive transformations as in (3.15), it can easily be seen that the matrices D(M) form a representation of SL(2, C). Therefore, before we can construct such fields we first need to consider the representations of SL(2, C).
Representations of SL(2, C)
Every representation of SL(2, C) can be written as a direct sum of irreducible representations and because SL(2, C) is simply-connected, any representation D of SL(2, C) is completely determined by the corresponding representation ϕ_D of its Lie algebra sl(2, C). Recall that the six matrices {(1/2)σ_j, (1/2i)σ_j}_{j=1,2,3} form a basis (over R) for the Lie algebra sl(2, C) of SL(2, C). If ϕ : sl(2, C) → gl(V) is any representation of sl(2, C) in some complex vector space V, then we define on V the six linear maps J_k := iϕ((1/2i)σ_k) and K_k := iϕ((1/2)σ_k), k = 1, 2, 3. These linear maps satisfy the commutation relations

    [J_i, J_j] = iε_{ijk} J_k,    [J_i, K_j] = iε_{ijk} K_k,    [K_i, K_j] = −iε_{ijk} J_k.

We now define another six linear maps

    A_k = (1/2)(J_k + iK_k),
    B_k = (1/2)(J_k − iK_k),

which satisfy the commutation relations

    [A_i, A_j] = iε_{ijk} A_k,    [B_i, B_j] = iε_{ijk} B_k,    [A_i, B_j] = 0.

From these relations it follows that V can be written as a tensor product

    V = V_A ⊗ V_B

and that the action of A_k and B_k on V is given by

    A_k.(v_A ⊗ v_B) = (S_k^(A) v_A) ⊗ v_B,
    B_k.(v_A ⊗ v_B) = v_A ⊗ (S_k^(B) v_B),

where v_A ∈ V_A, v_B ∈ V_B, A, B ∈ (1/2)Z_{≥0} and S_k^(A), S_k^(B) are the spin operators corresponding to spin A and B, respectively; see also equation (2.28) for the definition of spin operators. Note that, in particular, the dimensions of the spaces V_A and V_B are 2A + 1 and 2B + 1, respectively, and the dimension of V is (2A + 1)(2B + 1). We will label these representations as ϕ^(A,B) and we will write V^(A,B), V_A^(A) and V_B^(B) instead of V, V_A and V_B. These representations of sl(2, C) give rise to irreducible representations D^(A,B) of SL(2, C), which we will call the (A, B)-representation of SL(2, C). This representation is given by

    D^(A,B)(M)(v_A ⊗ v_B) = D^(A)(M)v_A ⊗ D^(B)(M)v_B,

where the D^(.) denote the SU(2) representations discussed in subsection 2.2.4. Note that the (A, B)-representations are not unitary, due to the fact that SL(2, C) is non-compact. However, the compact subgroup SU(2) ⊂ SL(2, C) is represented unitarily, with generators

    J_k = A_k + B_k.

It follows that under an SU(2) transformation a vector v ∈ V^(A,B) = V_A^(A) ⊗ V_B^(B) transforms as the direct sum of spin j objects, with j = A + B, A + B − 1, ..., |A − B|. Finally, we mention that in case any (not necessarily irreducible) representation can be extended to a representation that includes space inversion, there is an operator β that satisfies βJ_kβ^{−1} = J_k and βK_kβ^{−1} = −K_k, and hence βA_kβ^{−1} = B_k and βB_kβ^{−1} = A_k. Clearly, such an operator β only makes sense in the (A, A)-representation and the (A, B) ⊕ (B, A)-representation.

[24] Here τ denotes the particle species, as usual.
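The statements above are easy to verify numerically. The sketch below (an illustration, not part of the thesis; the choice (A, B) = (1, 1/2) is arbitrary) realizes A_k = S_k^(A) ⊗ 1 and B_k = 1 ⊗ S_k^(B) with standard spin matrices, checks the commutation relations, and confirms that under rotations generated by J_k = A_k + B_k the space decomposes into spins A + B, ..., |A − B|.

```python
import numpy as np

def spin_matrices(s):
    # spin-s matrices (S1, S2, S3) in the basis m = s, s-1, ..., -s
    d = int(round(2 * s)) + 1
    m = s - np.arange(d)
    sp = np.zeros((d, d), dtype=complex)   # raising operator S_+
    for k in range(1, d):
        sp[k - 1, k] = np.sqrt(s * (s + 1) - m[k] * (m[k] + 1))
    s1 = (sp + sp.conj().T) / 2
    s2 = (sp - sp.conj().T) / (2 * 1j)
    s3 = np.diag(m).astype(complex)
    return s1, s2, s3

A_spin, B_spin = 1.0, 0.5   # the (1, 1/2)-representation, dimension (2A+1)(2B+1) = 6
SA, SB = spin_matrices(A_spin), spin_matrices(B_spin)
dA, dB = SA[0].shape[0], SB[0].shape[0]
A = [np.kron(Sk, np.eye(dB)) for Sk in SA]
B = [np.kron(np.eye(dA), Sk) for Sk in SB]

eps = np.zeros((3, 3, 3))
for i, jj, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
    eps[i, jj, k], eps[jj, i, k] = 1, -1
comm = lambda x, y: x @ y - y @ x

# [A_i, A_j] = i eps_ijk A_k, [B_i, B_j] = i eps_ijk B_k, [A_i, B_j] = 0
for i in range(3):
    for jj in range(3):
        assert np.allclose(comm(A[i], A[jj]), 1j * sum(eps[i, jj, k] * A[k] for k in range(3)))
        assert np.allclose(comm(B[i], B[jj]), 1j * sum(eps[i, jj, k] * B[k] for k in range(3)))
        assert np.allclose(comm(A[i], B[jj]), 0)

# Under rotations the space carries spins A+B, ..., |A-B|: eigenvalues of J^2 are
# s(s+1) for s = 3/2 (multiplicity 4) and s = 1/2 (multiplicity 2).
J = [A[k] + B[k] for k in range(3)]
J2 = sum(Jk @ Jk for Jk in J)
vals = np.round(np.linalg.eigvalsh(J2), 6)
assert np.allclose(np.sort(vals), np.array([0.75] * 2 + [3.75] * 4))
```

The same construction with A = B gives the tensor-product spaces on which the operator β of the space-inversion discussion can act.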
Construction of general free fields
Now that we have found the representations of SL(2, C), we can write the annihilation and creation fields corresponding to the (A, B)-representation as[25]

    (φ^τ_{A,B})⁺_{ab}(x) = (2π)^{−3/2} Σ_{σ∈I_τ} ∫ d³p (u_{A,B})_{ab}(p, σ, τ) e^{−ip·x} a_τ(p, σ),    (3.16)
    (φ^τ_{A,B})⁻_{ab}(x) = (2π)^{−3/2} Σ_{σ∈I_τ} ∫ d³p (v_{A,B})_{ab}(p, σ, τ) e^{ip·x} a_τ^*(p, σ),    (3.17)

where a = −A, −A + 1, ..., A and b = −B, −B + 1, ..., B. We will sometimes suppress the capital letters A and B in the coefficients u and v to prevent that the equations become too wide. The annihilation and creation fields transform according to

    U_Fock(b, M) (φ^τ_{A,B})^±_{ab}(x) U_Fock(b, M)^{−1} = Σ_{a′,b′} [D^(A,B)(M^{−1})]_{ab,a′b′} (φ^τ_{A,B})^±_{a′b′}(Φ(M)x + b).    (3.18)

[25] At this point we do not make any statements about the existence of such fields, i.e. about the question whether (for any particle species τ) components (u_{A,B})_{ab}(p, σ, τ) and (v_{A,B})_{ab}(p, σ, τ) can be found that transform properly. Later we will answer this question for massive particles and massless particles separately.
The (anti)commutation relations (3.10), (3.11) and (3.12) imply that the fields satisfy the (anti)commutation relations

    [(φ^τ_{A,B})⁺_{ab}(x), (φ^{τ′}_{A′,B′})⁺_{a′b′}(y)]_± = 0,
    [(φ^τ_{A,B})⁻_{ab}(x), (φ^{τ′}_{A′,B′})⁻_{a′b′}(y)]_± = 0,
    [(φ^τ_{A,B})⁺_{ab}(x), (φ^{τ′}_{A′,B′})⁻_{a′b′}(y)]_± = (δ_{ττ′}/(2π)³) Σ_{σ∈I_τ} ∫ d³p (u_{A,B})_{ab}(p, σ, τ) (v_{A′,B′})_{a′b′}(p, σ, τ) e^{−ip·(x−y)},
where the plus sign corresponds to the case where τ and τ′ are both fermions and the minus sign to the case where at least one of the particles τ and τ′ is a boson. In order to obtain quantities that either commute or anticommute at space-like distances (for reasons that will become clear when we come back to equation (3.5)), we will construct linear combinations

    (φ^τ_{A,B})_{ab}(x) := κ(φ^τ_{A,B})⁺_{ab}(x) + λ(φ^τ_{A,B})⁻_{ab}(x)    (3.19)
                         = (2π)^{−3/2} Σ_{σ∈I_τ} ∫ d³p [κ e^{−ip·x} u_{ab}(p, σ) a_τ(p, σ) + λ e^{ip·x} v_{ab}(p, σ) a_τ^*(p, σ)],

where κ and λ (depending on τ and on the pair (A, B), but not on the components a and b) are chosen so that if x − y is spacelike, we have

    [(φ^τ_{A,B})_{ab}(x), (φ^τ_{A′,B′})_{a′b′}(y)]_± = 0,    (3.20)
    [(φ^τ_{A,B})_{ab}(x), (φ^τ_{A′,B′})^*_{a′b′}(y)]_± = 0,    (3.21)

for all pairs (A, B) and (A′, B′) for which these fields exist (again, we will come back to the existence later). Note that instead of using the annihilation and creation fields {(φ^τ_{A,B})^±(x)}_{A,B} for each particle τ occurring in the scattering experiment, we will from now on use the fields {(φ^τ_{A,B})(x)}_{A,B} and their adjoints {(φ^τ_{A,B})^*(x)}_{A,B}. The correct values of κ and λ will be given later, but first we
have to consider another problem. It might happen that particles that are created and annihilated by these fields carry conserved quantum numbers, such as electric charge, in which case we must be sure that 𝓗(x) commutes with the operator that corresponds to the conserved quantity. If Q is an operator corresponding to some conserved quantum number (for example electric charge) and if q(τ) is the value of the conserved quantum number for particles of type τ, then we have the commutation relations

    [Q, a_τ(p, σ)] = −q(τ) a_τ(p, σ),
    [Q, a_τ^*(p, σ)] = q(τ) a_τ^*(p, σ),

which show that the commutation relations between Q and the field (3.19) are not so pretty in the sense that they do not allow an easy way to construct an interaction 𝓗(x) out of fields of the form (3.19) that also commutes with Q. To solve this problem, we postulate that for each particle type τ there exists another particle type τ^C, called the antiparticle of τ, which has the same mass but carries opposite values for all conserved quantum numbers. It can also happen that τ^C = τ, in which case the antiparticle and the particle are identical and in this case there is no problem in using the field (3.19). In case τ^C ≠ τ, we define the annihilation and creation fields (φ^{τ^C}_{A,B})⁺_{ab}(x) and (φ^{τ^C}_{A,B})⁻_{ab}(x) to be the same as those in (3.16) and (3.17) but with every τ replaced with τ^C. In this case we replace equation (3.19) with

    (φ^τ_{A,B})_{ab}(x) := κ(φ^τ_{A,B})⁺_{ab}(x) + λ(φ^{τ^C}_{A,B})⁻_{ab}(x)    (3.22)
                         = (2π)^{−3/2} Σ_{σ∈I_τ} ∫ d³p [κ e^{−ip·x} u_{ab}(p, σ) a_τ(p, σ) + λ e^{ip·x} v_{ab}(p, σ) a^*_{τ^C}(p, σ)],
with κ and λ still such that (3.20) and (3.21) are satisfied. If Q and q(τ ) are as above, then we
now have the simple commutation relations

    [Q, (φ^τ_{A,B})_{ab}(x)] = −q(τ) (φ^τ_{A,B})_{ab}(x),    (3.23)
    [Q, (φ^τ_{A,B})^*_{ab}(x)] = q(τ) (φ^τ_{A,B})^*_{ab}(x).    (3.24)

Note that these fields thus have the nice property that

    [Q, (φ^τ_{A,B})(x)(φ^τ_{A,B})^*(x)] = [Q, (φ^τ_{A,B})(x)](φ^τ_{A,B})^*(x) + (φ^τ_{A,B})(x)[Q, (φ^τ_{A,B})^*(x)]
                                        = −q(τ)(φ^τ_{A,B})(x)(φ^τ_{A,B})^*(x) + q(τ)(φ^τ_{A,B})(x)(φ^τ_{A,B})^*(x)
                                        = 0,

where we have suppressed the indices a and b. Later we will use a generalization of this property
to ensure that the interaction commutes with Q.
To summarize our results so far, for each particle type τ occurring in the scattering experiment we construct (for certain pairs (A, B) depending on the particle species τ) fields as in (3.19) if τ^C = τ and fields as in (3.22) if τ^C ≠ τ. However, we will not construct fields (φ^{τ^C}_{A,B})(x), so for each particle-antiparticle pair we must agree which of the two is the particle and which is the antiparticle. Note that the partial derivatives of the fields (3.19) and (3.22) are given by

    ∂_μ(φ^τ_{A,B})_{ab}(x) = −ip_μ κ(φ^τ_{A,B})⁺_{ab}(x) + ip_μ λ(φ^{τ/τ^C}_{A,B})⁻_{ab}(x)

(with p_μ understood under the momentum integrals), which we can regard as a different field on its own since it is also of the form (3.19) or (3.22). By taking partial derivatives again, it follows immediately that each component (φ^τ_{A,B})_{ab}(x) of the fields satisfies the Klein-Gordon equation:

    (□ + m_τ²)(φ^τ_{A,B})_{ab}(x) = 0.
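That the Klein-Gordon equation holds is immediate from the mass-shell relation p^0 = √(m_τ² + p²) satisfied by each plane wave e^{∓ip·x}; a quick symbolic check (an illustration, not part of the thesis):

```python
import sympy as sp

t, x, y, z, m = sp.symbols('t x y z m', real=True)
p1, p2, p3 = sp.symbols('p1 p2 p3', real=True)
p0 = sp.sqrt(m**2 + p1**2 + p2**2 + p3**2)   # on-shell energy

# plane wave e^{-i p.x} with metric signature (+, -, -, -)
phase = sp.exp(-sp.I * (p0 * t - p1 * x - p2 * y - p3 * z))

# Klein-Gordon operator (box + m^2), box = d_t^2 - d_x^2 - d_y^2 - d_z^2
kg = (sp.diff(phase, t, 2) - sp.diff(phase, x, 2)
      - sp.diff(phase, y, 2) - sp.diff(phase, z, 2) + m**2 * phase)
assert sp.simplify(kg) == 0
```

The conjugate wave e^{ip·x} passes the same check, so the linear combination (3.19) does as well.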
For notational convenience we will combine all the irreducible fields {(φ^τ_{A,B})(x)}_{τ,(A,B)} that occur in the description of a scattering experiment into a single field φ with components φ_j. The so-obtained field φ of course no longer transforms irreducibly. Along with the field φ, we will also consider its adjoint φ^*. We can then construct very general Hamiltonian densities

    𝓗(x) = Σ_{N,M} Σ_{j_1,...,j_N} Σ_{k_1,...,k_M} g_{j_1...j_N,k_1...k_M} : φ_{j_1}(x) ··· φ_{j_N}(x) φ^*_{k_1}(x) ··· φ^*_{k_M}(x) :    (3.25)

where the colons : : indicate normal ordering, i.e. the expression obtained by moving all creation operators to the left of all annihilation operators while including a minus sign whenever two fermion operators are interchanged. When we choose the coefficients g_{j_1...j_N,k_1...k_M} to transform properly, 𝓗(x) will automatically satisfy the scalar condition (3.3). If each term in this interaction contains an even number of fermion fields, then this interaction will also commute with itself at spacelike distances. In equations (3.23) and (3.24) we have seen that for an operator Q that corresponds to some conserved quantum number (such as electric charge) our fields satisfy commutation relations of the form [Q, φ_j] = −q(j)φ_j and [Q, φ^*_j] = q(j)φ^*_j, so if each term in 𝓗(x) consists of a product φ_{j_1}(x) ··· φ_{j_N}(x) φ^*_{k_1}(x) ··· φ^*_{k_M}(x) of fields and adjoint fields for which q(j_1) + ··· + q(j_N) − q(k_1) − ··· − q(k_M) = 0, then 𝓗(x) will satisfy [Q, 𝓗(x)] = 0.
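Normal ordering can be carried out mechanically with the relation a a^* = a^* a + 1. The sketch below (a single bosonic mode; the encoding of words with 'c' for a^* and 'a' for a is a hypothetical convention, not from the thesis) moves all creation operators to the left, collecting the contraction terms:

```python
from collections import defaultdict

# Normal-order a word in a single bosonic mode, using a a* = a* a + 1.
# 'c' denotes the creation operator a*, 'a' the annihilation operator a.
def normal_order(word, coeff=1):
    for i in range(len(word) - 1):
        if word[i] == 'a' and word[i + 1] == 'c':
            swapped = word[:i] + 'ca' + word[i + 2:]     # a* a term
            contracted = word[:i] + word[i + 2:]         # contraction (the "+1")
            result = defaultdict(int)
            for w, c in normal_order(swapped, coeff).items():
                result[w] += c
            for w, c in normal_order(contracted, coeff).items():
                result[w] += c
            return dict(result)
    return {word: coeff}   # already normal ordered: all 'c' before all 'a'

# a a* a a*  ->  a* a* a a + 3 a* a + 1
print(normal_order('acac'))
```

For example, a a^* a a^* normal-orders to a^*a^*aa + 3 a^*a + 1. In the fermionic case one would instead use the anticommutation relation and track the minus signs mentioned above.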
Before closing this section, we will give the explicit construction of the fields described above.
In order to find the coefficients uab and vab , as well as the constants κ and λ, we have to consider
the fields of massive particles and the fields of massless particles separately.
Massive particle fields
If τ is a massive particle, then the index σ takes values in {−s_τ, ..., s_τ} and the coefficients are given by

    u_{ab}(p, σ) = (1/√(2ω_p)) Σ_{a′,b′} [e^{θ·S^(A)}]_{aa′} [e^{−θ·S^(B)}]_{bb′} C_{AB}(s_τ, σ; a′, b′),    (3.26)
    v_{ab}(p, σ) = (−1)^{s_τ+σ} u_{ab}(p, −σ),

where θ is the boost parameter corresponding to the boost that maps (m, 0, 0, 0) to (ω_p, p) and C_{AB}(j, σ; a, b) are Clebsch-Gordan coefficients defined by

    v^{(s_1,s_2)}_{s,m} = Σ_{m_1,m_2} C_{s_1 s_2}(s, m; m_1, m_2) v_{s_1,m_1} ⊗ v_{s_2,m_2},

where on the right-hand side the vectors v_{s_i,m_i} ∈ V^(s_i) denote the joint eigenvectors of [S^(s_i)]² and S_3^(s_i) (in physics these vectors are written as |s_i m_i⟩), and on the left-hand side the vectors v^{(s_1,s_2)}_{s,m} ∈ V^(s_1) ⊗ V^(s_2) denote the joint eigenvectors of [S^(s)]² and S_3^(s) with S_k^(s) := S_k^(s_1) ⊗ 1 + 1 ⊗ S_k^(s_2).
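The coefficients C_{AB}(s, σ; a, b) can be computed with a computer algebra system; under the definition just given they should correspond to SymPy's Clebsch-Gordan class as CG(A, a, B, b, s, σ) (this identification, and the Condon-Shortley phase convention behind it, is an assumption, not a statement from the thesis). A small check for two spin-1/2 factors:

```python
from sympy import Rational, sqrt
from sympy.physics.quantum.cg import CG

half = Rational(1, 2)
# Singlet combination of two spin-1/2 particles: |0,0> = (|up down> - |down up>)/sqrt(2)
assert CG(half, half, half, -half, 0, 0).doit() == sqrt(2) / 2
assert CG(half, -half, half, half, 0, 0).doit() == -sqrt(2) / 2
# Highest-weight state of the triplet is a pure product state:
assert CG(half, half, half, half, 1, 1).doit() == 1
```

The CG arguments are ordered as (j1, m1, j2, m2, j3, m3), and doit() evaluates the coefficient to an exact symbolic number.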
Before we can go on, we first make some remarks concerning the relationship between particles and
the fields that describe them. It is not true that any (A, B)-field can describe some given massive
particle species τ . By considering the transformation properties of the (A, B)-field under rotations,
it can be shown that it can only describe particles with spin A+B, A+B −1, . . . , |A−B|. Thus, for
a given particle species τ we can only construct (A, B)-fields for which |A − B| ≤ sτ ≤ A + B and
A + B − sτ ∈ Z≥0 . These different fields for τ are not physically distinct, however. For example,
when we take the derivative ∂µ φ0,0 (x) of the (0, 0)-field (or scalar field) φ0,0 (x), we obtain a field
that transforms as a ( 12 , 12 )-field (or vector field). In general, any (A, B)-field for a given particle
type τ can be expressed either as a rank 2B differential operator acting on a (sτ , 0)-field or as a
rank 2A differential operator acting on a (0, sτ)-field. For reasons that will become clear later, we require that all (A, B)-fields that describe a single type of particle τ commute with each other when τ is a boson and anticommute with each other when τ is a fermion. To achieve this, we must choose the constants κ and λ in (3.22) such that²⁶ λ = (−1)^{2B} κ. By adjusting the overall scale of
the field, we can then write it as
(φ^τ_{A,B})_{ab}(x) = (2π)^{−3/2} ∫ d³p Σ_{σ=−sτ}^{sτ} u_{ab}(p, σ, τ) ( e^{−ip·x} a_τ(p, σ) + (−1)^{2B+sτ+σ} e^{ip·x} a*_{τ^C}(p, −σ) ).
As shown in section 5.7 of [35] (in particular, equation (5.7.53)), the adjoints of the components of
an (A, B)-field for a particle with τ C = τ are proportional to the components of the (B, A)-field
for the particle τ , so if τ C = τ then the adjoint fields do not give rise to new kinds of objects.
Finally, we will mention the transformation properties of these fields under space inversions.
In the (A, A)-representation the field transforms according to
P φ^τ_{A,A}(x) P⁻¹ = (−1)^{2A−sτ} ξτ φ^τ_{A,A}(I_s x),
with ξτ the intrinsic parity defined in the previous chapter, while in the (A, B)⊕(B, A)-representation
it transforms according to
P ( φ^τ_{A,B}(x), φ^τ_{B,A}(x) )ᵀ P⁻¹ = (−1)^{A+B−sτ} ξτ ( φ^τ_{B,A}(I_s x), φ^τ_{A,B}(I_s x) )ᵀ.   (3.27)
So under space inversion, the (A, B) ⊕ (B, A)-representation becomes the (B, A) ⊕ (A, B)-representation. In appendix B we give some examples of free massive fields that can be obtained by explicit calculation of the coefficients u_{ab}(p, σ, τ) for some representation (A, B). As is shown in that appendix, the (½, 0) ⊕ (0, ½)-field automatically satisfies the Dirac equation.
Massless particle fields
As we have already mentioned when we discussed parity, it is sometimes necessary to identify two
massless particles that only differ from each other in the fact that they have opposite helicities.
We will first consider the case where the massless particle τ can have only one helicity στ . In this
case the field has the form
(φ^τ_{A,B})_{ab}(x) = (2π)^{−3/2} ∫ d³p ( κ e^{−ip·x} u_{ab}(p, στ) a_τ(p, στ) + λ e^{ip·x} v_{ab}(p, στC) a*_{τ^C}(p, στC) ).
Just as the (A, B)-field for massive particles could only describe particles with spin sτ for which |A − B| ≤ sτ ≤ A + B, the (A, B)-field for massless particles can only describe particles for which στ = A − B and στC = −στ. Thus the simplest field that one can construct for a massless particle τ is the (στ, 0)-field if στ ≥ 0 and the (0, |στ|)-field if στ < 0. The (B + στ, B)-fields are 2Bth order derivatives of the (στ, 0)-field and the (A, A + |στ|)-fields are 2Ath order derivatives of the (0, |στ|)-field. As stated before, the only irreducible representations (A, B) that can be extended to a representation that also contains space inversion are the representations for which A = B. In the present case of massless particles this representation can only be chosen if στ = 0 and in that case we have A = B = 0, i.e. a scalar field. For στ ≠ 0 we again have to consider fields of type (A, B) ⊕ (B, A). However, in order to obtain a transformation law for massless particles analogous to (3.27), we must identify the particle type τ with the particle type which is obtained by substituting στ → −στ. Otherwise the two τ's on the right-hand side of (3.27) cannot be the same as the ones on the left-hand side, because the τ's necessarily have opposite helicity (because A and B are switched).

²⁶ In deriving this result (see section 5.7 of [35]) it becomes clear that fields which describe particles with integer spin commute, while fields that describe particles with half-odd integer spin anticommute. In other words, particles with integer spin are bosons and particles with half-odd integer spin are fermions. This is the content of the famous spin-statistics theorem. We will come back to this theorem in the next chapter.
3.3  Calculation of the S-matrix using perturbation theory
In this section we will give a brief overview of how physicists use perturbation theory to calculate
the S-matrix elements for some scattering process. If ψα, ψβ ∈ H_Fock are two Fock space vectors given by ψα = Π_{j=1}^{N} a*(q_j^{in}) Ω_Fock and ψβ = Π_{j=1}^{M} a*(q_j^{out}) Ω_Fock, with the q_j specifying particle species, momentum and spin, then it follows from equation (3.4) that the S-matrix element ⟨Sψα, ψβ⟩ is equal to

Σ_{n=0}^{∞} (1/(iⁿ n!)) ∫_{R⁴ⁿ} ⟨ Π_{j=1}^{M} a(q_j^{out}) T{H(x₁) . . . H(xₙ)} Π_{j=1}^{N} a*(q_j^{in}) Ω_Fock, Ω_Fock ⟩ dx₁ . . . dxₙ.
We can write the interaction density (3.25) as

H(x) = Σ_i g_i H_i(x),
where i denotes a multi-index and each Hi (x) is a normal-ordered product of fields and field
adjoints. The task is therefore to calculate expressions of the form
⟨ Π_{j=1}^{M} a(q_j^{out}) T{H_{i₁}(x₁) . . . H_{iₙ}(xₙ)} Π_{j=1}^{N} a*(q_j^{in}) Ω_Fock, Ω_Fock ⟩
or, in terms of the field φ,

⟨ Π_{j=1}^{M} a(q_j^{out}) T{ :φ^{(∗)}_{i₁,1}(x₁) . . . φ^{(∗)}_{i₁,k(1)}(x₁): . . . :φ^{(∗)}_{iₙ,1}(xₙ) . . . φ^{(∗)}_{iₙ,k(n)}(xₙ): } Π_{j=1}^{N} a*(q_j^{in}) Ω_Fock, Ω_Fock ⟩,
where the double indices of the fields are defined by

H_{i_m}(x) = :φ^{(∗)}_{i_m,1}(x) . . . φ^{(∗)}_{i_m,k(i_m)}(x): .
For notational convenience, we will write such expressions simply as
⟨T{A₁ . . . Aₙ}Ω_Fock, Ω_Fock⟩,
where the Aj are either fields/field adjoints or creation/annihilation operators and, by definition,
the time ordering has no effect on the a(q_j^{out}) and a*(q_j^{in}). For any pair (A_j, A_k) we define the contraction, which we will write as [A_j A_k]_c, by

[A_j A_k]_c := T{A_j A_k} − :A_j A_k:

(time-ordering minus normal-ordering), where T{A_j A_k} = A_j A_k whenever at least one of the two operators does not depend on time. Contractions are always scalar multiples of the identity operator, and we will identify the contraction with this scalar (thus forgetting about the
identity operator). Then according to Wick’s theorem we have for n even
⟨T{A₁ . . . Aₙ}Ω_Fock, Ω_Fock⟩ = Σ_P ε(P) [A_{j₁} A_{j₂}]_c ⋯ [A_{j_{n−1}} A_{jₙ}]_c,   (3.28)

(writing [A B]_c for the contraction of A with B)
where the sum runs over all groupings P of the Aj ’s into n/2 pairs with the ordering in each pair
the same as on the left-hand side, so j₁ < j₂, j₃ < j₄, . . . , j_{n−1} < jₙ. The sign ε(P) depends on
the number of fermion interchanges that were needed to bring the two elements in each pair next
to each other. For n odd, the expression on the left-hand side is simply zero. Recalling that the
Aj are actually fields or creation/annihilation operators, we see that only a few contractions are
nonzero. Their contributions are²⁷ (writing [X Y]_c for the contraction of X with Y):

[a_{τ^{(C)}}(p_j^{out}, σ_j^{out}) a*_{τ^{(C)}}(p_k^{in}, σ_k^{in})]_c = δ_{σ_j^{out} σ_k^{in}} δ(p_j^{out} − p_k^{in})

[a_τ(p_j^{out}, σ_j^{out}) (φ^τ)*_ℓ(x)]_c = (2π)^{−3/2} e^{i p_j^{out}·x} u_ℓ(p_j^{out}, σ_j^{out})

[a_{τ^C}(p_j^{out}, σ_j^{out}) (φ^τ)_ℓ(x)]_c = (2π)^{−3/2} e^{i p_j^{out}·x} v_ℓ(p_j^{out}, σ_j^{out})

[(φ^τ)_ℓ(x) a*_τ(p_j^{in}, σ_j^{in})]_c = (2π)^{−3/2} e^{−i p_j^{in}·x} u_ℓ(p_j^{in}, σ_j^{in})

[(φ^τ)*_ℓ(x) a*_{τ^C}(p_j^{in}, σ_j^{in})]_c = (2π)^{−3/2} e^{−i p_j^{in}·x} v_ℓ(p_j^{in}, σ_j^{in})

[φ_j(x_m) φ*_k(x_{m+l})]_c = θ(x⁰_m − x⁰_{m+l}) [φ⁺_j(x_m), (φ⁺_k)*(x_{m+l})]_∓ ± θ(x⁰_{m+l} − x⁰_m) [(φ⁻_k)*(x_{m+l}), φ⁻_j(x_m)]_∓ =: −i∆_{jk}(x_m, x_{m+l})

[φ*_j(x_m) φ_k(x_{m+l})]_c = θ(x⁰_m − x⁰_{m+l}) [(φ⁻_j)*(x_m), φ⁻_k(x_{m+l})]_∓ ± θ(x⁰_{m+l} − x⁰_m) [φ⁺_k(x_{m+l}), (φ⁺_j)*(x_m)]_∓ = ∓i∆_{kj}(x_{m+l}, x_m)
where in the last two equalities m, l ≥ 1 and θ : R → {0, 1} denotes the step function and the
lower signs correspond to the case where both components φj and φk are fermionic. Furthermore,
the φ^±_j(x) refer to the decomposition φ_j(x) = φ⁺_j(x) + φ⁻_j(x) of the field into an annihilation field and a creation field. The quantities ∆_{ij}(x, y) are called propagators.
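The combinatorial content of (3.28), a sum over all groupings of A₁, . . . , Aₙ into ordered pairs weighted by a sign, can be made concrete in a short script. This is an illustrative sketch only: no particular field content is assumed, and the sign is computed as if all the A_j were fermionic.

```python
# Enumerate the pairings P in Wick's theorem (3.28) and the sign eps(P).
def pairings(indices):
    """Yield all perfect matchings as lists of (smaller, larger) pairs."""
    if not indices:
        yield []
        return
    first, rest = indices[0], indices[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for sub in pairings(remaining):
            yield [(first, partner)] + sub

def sign(matching):
    """Parity of the permutation (j1, j2, j3, j4, ...) of (1, ..., n)."""
    flat = [j for pair in matching for j in pair]
    inv = sum(1 for a in range(len(flat)) for b in range(a + 1, len(flat))
              if flat[a] > flat[b])
    return -1 if inv % 2 else 1

ms = list(pairings(list(range(1, 7))))
assert len(ms) == 15                      # (n-1)!! = 5*3*1 pairings for n = 6
assert sign([(1, 2), (3, 4), (5, 6)]) == 1
assert sign([(1, 3), (2, 4), (5, 6)]) == -1   # one fermion interchange
```

Already for a single term the contraction sum contains (n − 1)!! groupings before the vanishing ones are discarded, which is one reason the number of Feynman diagrams grows so quickly with the order n.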
The nonzero terms in (3.28) are products of pairings of the forms above and such products
of pairings can be graphically represented by Feynman diagrams. In these diagrams the initial
particles are represented by vertices at the bottom of the diagram and the final particles are
represented by vertices at the top of the diagram, and these external vertices are labeled by their
corresponding particle species, momenta and spin states. Each Hj (xk ) is represented by a vertex
that is drawn between the initial and final vertices, and each such internal vertex is labeled by
j and xk . Each pairing is then represented by a line connecting the two vertices that represent
the two paired objects. Here we mean, in particular, that if a field or adjoint field in Hj (xk ) is
paired with some other object then we draw a line between that object and the vertex labeled by
j, xk . Of course this implies that if Hj (xk ) is a product of K fields and adjoint fields, then there
are K lines that connect the vertex j, xk with other vertices. Each line that connects the external
vertex of a particle carries an upward arrow, while each line that connects the external vertex of
an antiparticle carries a downward arrow. Internal lines, representing a pairing of a field with an
adjoint field, carry an arrow that points from the vertex of the adjoint field to the vertex of the field.

²⁷ Here τ^C denotes the antiparticle of τ, as usual. In each line where there are two τ's, these two τ's are the same. In the first line τ^{(C)} is either τ or τ^C, but the same choice must be made for both factors in the first line. Furthermore, the field φ can be decomposed into irreducible components each of which corresponds to a single particle (not antiparticle) species. If we write (φ^τ)_ℓ instead of φ_ℓ we mean that the ℓth component of φ belongs to an irreducible representation corresponding to particle species τ.

All lines in the diagram are labeled by the value of the corresponding contraction. The term
in (3.28) that corresponds to the diagram can now be recovered from the diagram as follows. For
each internal vertex j, xk we write a factor −igj and for each line we write a factor that is equal to
its label (i.e. to the corresponding contraction). The result is then integrated over all coordinates
xk , yielding the desired term in (3.28). In practice, one uses the Feynman diagrams to calculate all
terms (up to some order) in the expansion of the S-matrix elements. However, we will not discuss
the details of this procedure here, nor will we discuss the renormalization procedure that is needed
whenever the obtained expressions are infinite.
3.4  Obtaining V from a Lagrangian
In the previous section we mentioned that perturbation theory can be used to compute S-matrix
elements when we are given an expression for the interaction V (in the interaction picture) in terms
of the free fields. In this section we will show how this expression for V can be obtained from
a Lagrangian. In practice, this is very useful because Lagrangians are often easier to guess than
Hamiltonians. Furthermore, if there is Poincaré invariance in the Lagrangian formalism, then the
S-matrix will automatically be Poincaré invariant, even if the requirements (3.2) and (3.3) are not
exactly satisfied; see section 7.4 of [35]. These Lagrangians are given in the context of a classical
Lagrangian field theory, so we will first discuss classical Lagrangian field theory. Next we will make
the transition to Hamiltonian classical field theory, and then we will quantize the theory to obtain
Heisenberg picture quantum fields. After quantization we can then make the transition from the
Heisenberg picture to the interaction picture, and finally derive the expression for V . Although
the main structure of this chapter is based on [35], some parts of the present section are based
more on [6].
Classical relativistic field theory
In classical relativistic field theory, the field components φ₁(x), . . . , φₙ(x) are complex-valued²⁸ functions on spacetime that transform under a transformation (a, A) ∈ P̃↑₊ as²⁹

φ_j(x) → Σ_{k=1}^{n} [D(A)]_{jk} φ_k(Φ(A)x + a),
where D is a representation of SL(2, C) and Φ : SL(2, C) → L↑+ is the covering map. Note that
the partial derivatives ∂µ φj (x) also transform in such a way, namely as the tensor product of the
representation D with the representation A ↦ Φ(A) (which transforms the index μ in ∂_μ). For
fixed time x0 = t, the fields φj (t, x) and their time derivatives φ̇(t, x) are functions of the space
coordinates x. We will call the space of all such possible functions the configuration space of the
system. This configuration space is thus a function space, the elements of which are ordered 2n-tuples (q₁(x), . . . , qₙ(x), q̇₁(x), . . . , q̇ₙ(x)) of functions q_j and q̇_j with certain smoothness conditions (and perhaps some boundary conditions as well); the notation q̇_j has nothing to do with the time-derivative of q_j (which does not even depend on time). The Lagrangian L is a functional L[q_j, q̇_j]
of these 2n functions. We will always assume that there is a smooth function L(a_j, b_k, c_l) on C^{n+3n+n} = C^{5n} such that L[q_j, q̇_j] can be written as
L[q_j, q̇_j] = ∫_{R³} L(q_j(x), ∇q_j(x), q̇_j(x)) d³x.
²⁸ Here we consider real-valued functions as a special subclass of complex-valued functions.
²⁹ This transformation property is only approximately true for components that represent gauge fields, where the transformation rule may contain gauge transformations. Such extra terms have no effect on the Poincaré invariance of the theory because the action S (we will define actions below) is always considered to be invariant under such gauge transformations.

In particular, when these functions q_j(x) and q̇_j(x) are taken to be the values of the field components φ_j(x) and their time-derivatives ∂₀φ(x) at a fixed moment x⁰ = t of time, we can define a
functional L(t) by:
L(t) = L[φ(t, ·), ∂₀φ(t, ·)] = ∫_{R³} L(φ(t, x), ∇φ(t, x), ∂₀φ(t, x)) d³x = ∫_{R³} L(φ(t, x), ∂_μφ(t, x)) d³x,
where φ(t, ·) and ∂₀φ(t, ·) denote the functions x ↦ φ(t, x) and x ↦ ∂₀φ(t, x). In the last step
we simply used a different convention for writing the dependence of L on its arguments. For given
φj (x), the action S is now defined as the time integral of L(t) and it is a functional of the fields
φj (x), but not of their time-derivatives, since these can be calculated once we know the fields (at
all times),
S = S[φ(·)] = ∫_M L(φ(x), ∂_μφ(x)) d⁴x.
The action is manifestly invariant under spacetime translations φj (x) → φj (x+a) and it is assumed
that it is also a real-valued Lorentz-scalar. The equations of motion (or field equations) are obtained
by demanding that the action is stationary under variations; these equations are
∂L/∂φ_j − ∂_μ ( ∂L/∂(∂_μφ_j) ) = 0,
for j = 1, . . . , n, and are called the Euler-Lagrange equations.
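As a concrete illustration, the Euler-Lagrange equations can be derived symbolically. The free real scalar Lagrangian density used below, L = ½ ∂_μφ ∂^μφ − ½ m²φ² in signature (+, −, −, −), is an assumption made for the example (it is not written out in this section); for it the Euler-Lagrange equation reduces to the Klein-Gordon equation (□ + m²)φ = 0.

```python
# Euler-Lagrange equation for the free real scalar field, done with sympy.
import sympy as sp

t, x, y, z, m = sp.symbols('t x y z m')
coords = (t, x, y, z)
eta = (1, -1, -1, -1)                      # diagonal Minkowski metric
phi = sp.Function('phi')(*coords)
dphi = [sp.diff(phi, c) for c in coords]   # d_mu phi

L = sp.Rational(1, 2) * sum(eta[mu] * dphi[mu]**2 for mu in range(4)) \
    - sp.Rational(1, 2) * m**2 * phi**2

# dL/dphi - d_mu ( dL/d(d_mu phi) ):
eom = sp.diff(L, phi) - sum(sp.diff(sp.diff(L, dphi[mu]), coords[mu])
                            for mu in range(4))

# The result is -(box + m^2) phi, i.e. the Klein-Gordon equation:
box = sum(eta[mu] * sp.diff(phi, coords[mu], 2) for mu in range(4))
assert sp.simplify(eom + box + m**2 * phi) == 0
```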
Suppose that the action S is invariant (by which we mean that δS = 0) under an infinitesimal variation

φ_j(x) → φ_j(x) + ε F_j(x),

where ε is a constant and the F_j(x) are functions of the field components and their derivatives at
the point x. We also call this an infinitesimal symmetry of the action. Then there exists a current
J µ (x) that is conserved, i.e. ∂µ J µ (x) = 0. In particular, this implies that we can define a quantity
Q(t) := ∫_{R³} J⁰(t, x) d³x
that is conserved in the sense that dQ/dt ≡ 0; in other words, an infinitesimal symmetry of the
action S implies the existence of a conserved quantity. This statement is also known as Noether’s
theorem. In cases where not only the action S is invariant under a certain infinitesimal variation
of the fields, but also the Lagrangian (this is the case for spatial translations and rotations), one
can write down an explicit expression for the conserved quantity Q in terms of the Lagrangian L
and the functions Fj (x). If the Lagrangian density L is also invariant under the variation, then
one can even find an explicit expression for the current J µ (x) in terms of the Lagrangian density
and the functions Fj (x). These expressions can be found in section 7.3 of [35]. As already stated
above, the action is invariant under spacetime translations, which can be described in infinitesimal
form as φ_j(x) → φ_j(x) + ε^μ ∂_μφ_j(x). These are in fact four independent symmetries (one for each
spacetime dimension) and hence there are four conserved currents T µ0 (x), . . . , T µ3 (x). Because
the Lagrangian density is not invariant under translations, one cannot use the explicit expression
for the currents that we mentioned above. However, it is possible to derive an expression for these
currents by more direct means, see section 7.3 of [35]; the result is
T^{μν} := Σ_{j=1}^{n} ( ∂L/∂(∂_μφ_j) ) ∂^νφ_j − η^{μν} L,
which is also called the energy-momentum tensor (the index ν is a Lorentz index, so this is indeed
a tensor), and the corresponding conserved quantities are
P^μ = ∫ T^{0μ} d³x = ∫ ( Σ_{j=1}^{n} ( ∂L/∂(∂₀φ_j) ) ∂^μφ_j − η^{0μ} L ) d³x.
The conserved quantities P µ are interpreted as the four-momentum. The conserved currents
{M µνρ }ν,ρ corresponding to Lorentz invariance are given by
M^{μρσ} = (1/2) Σ_{j,k=1}^{n} ( ∂L/∂(∂_μφ_j) ) [D^{ρσ}]_{jk} φ_k − (1/2) (T^{μρ} x^σ − T^{μσ} x^ρ),
where D is the Lie algebra representation of l ≅ sl(2, C) induced by D, and D^{ρσ} = D(X^{ρσ}) with
X µν ∈ l as defined before. The corresponding conserved quantities J ρσ are then
J^{ρσ} = ∫_{R³} M^{0ρσ} d³x.
If (i, j, k) is a cyclic permutation of (1, 2, 3), then J ij is interpreted as the xk -component of the
angular momentum.
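The conservation law ∂_μT^{μν} = 0 can be checked on an explicit solution. The sketch below assumes the free scalar Lagrangian in one spatial dimension and a plane-wave solution on the mass shell; both choices are ours, made for illustration only.

```python
# For the free scalar field dL/d(d_mu phi) = d^mu phi, so the tensor above
# becomes T^{mu nu} = d^mu phi d^nu phi - eta^{mu nu} L. We verify
# d_mu T^{mu nu} = 0 on phi = cos(k x - w t) with w^2 = k^2 + m^2.
import sympy as sp

t, x, m, k = sp.symbols('t x m k', real=True)
w = sp.sqrt(k**2 + m**2)                   # mass-shell relation
phi = sp.cos(k * x - w * t)

coords = (t, x)
eta = (1, -1)                              # diagonal Minkowski metric
d = [sp.diff(phi, c) for c in coords]      # d_mu phi (lower index)
L = sp.Rational(1, 2) * sum(eta[mu] * d[mu]**2 for mu in range(2)) \
    - sp.Rational(1, 2) * m**2 * phi**2

T = [[eta[mu] * d[mu] * eta[nu] * d[nu] - (eta[mu] if mu == nu else 0) * L
      for nu in range(2)] for mu in range(2)]

for nu in range(2):
    div = sum(sp.diff(T[mu][nu], coords[mu]) for mu in range(2))
    assert sp.simplify(div) == 0           # conservation holds on-shell
```

Off the mass shell the divergence would not vanish, mirroring the fact that Noether's theorem only yields conserved currents on solutions of the field equations.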
Although the Lagrangian formulation of classical fields is very useful in order to describe symmetries, it will also be necessary to construct the Hamiltonian formalism from a given Lagrangian.
The reason for this is that the Hamiltonian formalism involves Poisson (and Dirac) brackets, which
can be used to postulate the commutation relations of the field components in the corresponding
quantum theory. Also, after quantizing in the Hamiltonian formalism, the time evolution of the
(quantized) fields can be stated in a simple form. As described above, the Lagrangian L is a
functional on the configuration space and this configuration space consists of 2n-tuples of functions (qj (x), q̇j (x)). We now define a second space, called the phase space, which also consists of
2n-tuples (qj (x), pj (x)), where the first n functions are allowed to be functions of the same class as
the first n functions in the elements of configuration space, but the last n objects pj (x) are defined
as linear functionals
q̇ ↦ ∫_{R³} p_j(x) q̇(x) d³x
on the space of all allowed functions q̇(x) (note that this is very similar to the case of finitely many
particles, where the configuration space is the tangent bundle of some smooth manifold and the
phase space is the cotangent bundle of the same manifold). In order to define the Poisson brackets
we need to define the notion of a functional derivative of a functional. Let A be a functional
of k functions f1 (x), . . . , fk (x), i.e. A is a map (f1 , . . . , fk ) 7→ A[f1 , . . . , fk ]. We then define the
functional derivative of A with respect to fj at the point (g1 , . . . , gk ) as the linear map
h ↦ (d/dε) A[g₁, . . . , g_{j−1}, g_j + εh, g_{j+1}, . . . , g_k] |_{ε=0},
where h(x) is a function in the same class as the fj . The integral kernel of this map is denoted by
δA/δfj :
(d/dε) A[g₁, . . . , g_{j−1}, g_j + εh, g_{j+1}, . . . , g_k] |_{ε=0} = ∫_{R³} ( δA[f₁, . . . , f_k]/δf_j(x) ) |_{(g₁,...,g_k)} h(x) d³x,
and this integral kernel will also be called the functional derivative of A with respect to fj . We
may interpret this functional derivative as a functional
(g₁, . . . , g_k, h) ↦ ∫_{R³} ( δA[f₁, . . . , f_k]/δf_j(x) ) |_{(g₁,...,g_k)} h(x) d³x,
which is linear in h, as already stated above. Recall that in writing partial derivatives of ordinary
functions, we often denote the partial derivative of f(x₁, . . . , xₙ) with respect to x_j at the point (y₁, . . . , yₙ) as ∂f(y₁, . . . , yₙ)/∂y_j instead of ∂f(x₁, . . . , xₙ)/∂x_j |_{(y₁,...,yₙ)}. Similarly, we will often write the functional derivative of the functional A[f₁, . . . , f_k] above with respect to f_j at the point (g₁, . . . , g_k) as

δA[g₁, . . . , g_k]/δg_j(x).
When the functional A depends in a well-behaved manner on the functions fj , then it might
happen that these functional derivatives depend on x in a well-behaved manner, and in that case
we can consider them as a map
(x, g₁, . . . , g_k) ↦ δA[g₁, . . . , g_k]/δg_j(x),
i.e. as a functional of the gj , depending in a well-behaved manner on the variable x. We now
apply these definitions as follows. Let F [q, p] and G[q, p] be two functionals on phase space. We
then define the Poisson bracket {F, G}P of F and G by
{F, G}_P := Σ_{j=1}^{n} ∫_{R³} ( (δF[q, p]/δq_j(x)) (δG[q, p]/δp_j(x)) − (δG[q, p]/δq_j(x)) (δF[q, p]/δp_j(x)) ) d³x.
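These formal definitions can be sanity-checked numerically. On a one-dimensional lattice with spacing dx, the functional derivative δF/δq(x_i) becomes (∂F/∂q_i)/dx and the space integral becomes a sum weighted by dx; the bracket of the evaluation functionals then reproduces the discrete delta function. This discretization is our own illustration and is not taken from [35].

```python
# Discretized Poisson bracket {F, G}_P on a 1d lattice; checks that
# {q_j(x), p_k(y)}_P = delta_jk delta(x - y), whose lattice value at x = y
# is 1/dx.
import numpy as np

N, dx, eps = 8, 0.1, 1e-6

def func_grad(H, q, p, wrt_q):
    """Discretized functional derivative of H[q, p] via central differences."""
    g = np.zeros(N)
    for i in range(N):
        e = np.zeros(N); e[i] = eps
        g[i] = ((H(q + e, p) - H(q - e, p)) if wrt_q
                else (H(q, p + e) - H(q, p - e))) / (2 * eps)
    return g / dx                          # delta F / delta q(x_i)

def poisson(F, G, q, p):
    dFdq, dFdp = func_grad(F, q, p, True), func_grad(F, q, p, False)
    dGdq, dGdp = func_grad(G, q, p, True), func_grad(G, q, p, False)
    return dx * np.sum(dFdq * dGdp - dGdq * dFdp)

q, p = np.random.rand(N), np.random.rand(N)
F = lambda q, p: q[2]                      # F_{j,x}[q, p] = q_j(x_2)
G = lambda q, p: p[2]                      # G_{k,y}[q, p] = p_k(x_2)
assert np.isclose(poisson(F, G, q, p), 1.0 / dx)   # discrete delta(x - y)
assert np.isclose(poisson(F, F, q, p), 0.0)
```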
In particular, for the functionals Fj,x [q, p] = qj (x) and Gk,y [q, p] = pk (y) we find that their Poisson
bracket is δjk δ(x−y). We will now use the Lagrangian L[q, q̇] to define a function from configuration
space to phase space by
(q₁(x), . . . , qₙ(x), q̇₁(x), . . . , q̇ₙ(x)) ↦ (q₁(x), . . . , qₙ(x), π₁[q, q̇](x), . . . , πₙ[q, q̇](x)),   (3.29)
where the πj [q, q̇] are given by
π_j[q, q̇](x) = δL[q, q̇]/δq̇_j(x).
A priori the x-dependence should be interpreted in the sense of linear functionals as described
above, but because the Lagrangian L is a space integral of a Lagrangian density L(aj , bk , cl ) (with
j, l ∈ {1, . . . , n} and k ∈ {1, . . . , 3n}), the πj [q, q̇](x) are well-behaved functions given by
π_j[q, q̇](x) = ∂L(q(x), (∇q)(x), c)/∂c_j |_{c=q̇(x)}.
It might happen that the map (3.29) gives rise to certain constraints, e.g. if L(aj , bk , cl ) does not
depend on cm for some m ∈ {1, . . . , n} then the image of configuration space under the map (3.29)
only contains 2n-tuples of the form (q1 , . . . , qn , p1 , . . . , pm−1 , 0, pm+1 , . . . , pn ). In this case we can
write the constraint as pm ≡ 0. In general, we will consider constraints of the form
αm,x [q, p] = 0,
where for each pair (m, x) ∈ {1, . . . , M } × R3 , the functional αm,x [q, p] is a function of the qs and
ps and their derivatives at the point x. These constraints are called primary constraints because
they already follow from the definition of the functions πj , without using equations of motion. We
will assume that these primary constraints are all independent of one another. We now define a
functional H′[q, q̇] on configuration space by

H′[q, q̇] := ∫_{R³} Σ_j q̇_j(x) π_j[q, q̇](x) d³x − L[q, q̇].
It can be shown that H′[q, q̇] is actually a functional H′[q, π[q, q̇]], i.e. that the dependence on q̇ is only via π. If there are primary constraints, we cannot extend H′ uniquely to a functional H[q, p] on the entire phase space. Indeed, if H[q, p] is such that H[q, π[q, q̇]] = H′[q, q̇], then
H_T[q, p] = H[q, p] + Σ_{m=1}^{M} ∫_{R³} u_{m,x}[q, p] α_{m,x}[q, p] d³x
also satisfies this property for any set of functionals um,x [q, p], since αj,x [q, p] vanishes when we
substitute π[q, q̇] for p. However, as we will explain below, not all of the HT of the form above
will give rise to Hamiltonian equations of motion in the phase space that are consistent with
the primary constraints. To see this, we first have to introduce the concept of time evolution of
the qs and ps. Analogously to the theory of Hamiltonian particle dynamics, the time evolution in
Hamiltonian field theory is described by maps
t ↦ (q^{(t)}, p^{(t)}),
where for each t ∈ R, (q^{(t)}, p^{(t)}) is an element in phase space. This time evolution is given by the
Hamiltonian equations of motion
∂q_j^{(t)}(x)/∂t = (∂/∂t) F_{j,x}[q^{(t)}, p^{(t)}] ≈ {F_{j,x}[q^{(t)}, p^{(t)}], H_T[q^{(t)}, p^{(t)}]}_P

∂p_j^{(t)}(x)/∂t = (∂/∂t) G_{j,x}[q^{(t)}, p^{(t)}] ≈ {G_{j,x}[q^{(t)}, p^{(t)}], H_T[q^{(t)}, p^{(t)}]}_P,
where Fj,x [q, p] = qj (x) and Gj,x [q, p] = pj (x), and ≈ means that the equality only holds for those
qs and ps that satisfy the constraints. This time evolution of the qs and ps determines the time
evolution for any functional g[q, p] by t ↦ g[q^{(t)}, p^{(t)}] =: g^{(t)}[q, p], which is equivalent to
(d/dt) g^{(t)}[q, p] ≈ {g^{(t)}[q, p], H_T^{(t)}[q, p]}_P.
In particular, for g = H_T we find that

(d/dt) H_T^{(t)}[q, p] ≈ 0.
Therefore, we often suppress the time dependence of HT . We can also take g to be a constraint
functional αj,x . Because the constraints must be satisfied for all time, we have the following
equations
{αm,x [q, p], HT [q, p]}P ≈ 0
(3.30)
for m = 1, . . . , M . There are now three options for any one of these equations:
(1) The equation is trivially true because it follows from the primary constraints;
(2) The equation reduces to an equation not involving the um,x [q, p]. In this case we obtain a new
constraint of the form βx [q, p] ≈ 0, which is called a secondary constraint;
(3) The equation does not reduce in any of the two manners described above. In this case we
obtain an equation that describes restrictions on the um,x , namely
{α_{m,x}[q, p], H[q, p]}_P + Σ_{m′=1}^{M} ∫_{R³} d³y u_{m′,y}[q, p] {α_{m,x}[q, p], α_{m′,y}[q, p]}_P ≈ 0.   (3.31)
The procedure is now as follows. For each value of m we check which of the three options is
satisfied. If it is (1), then we obtain nothing new and we move on to the next m. If it is (2),
then we have obtained a new constraint βx [q, p] ≈ 0 and we must demand consistency of this new
constraint by substituting it into (3.30) instead of αm,x [q, p]:
{βm,x [q, p], HT [q, p]}P ≈ 0.
We then have to check again which of the three options is satisfied for this new equation. Finally,
if option (3) is satisfied then we have obtained a constraint on the um,x [q, p] and we move on to
the next m. The final result of this procedure is that we are left with M primary constraints
αm,x [q, p] ≈ 0, K secondary constraints βk,x [q, p] ≈ 0 and L constraints on the um,x of the form
(3.31). Because the distinction between primary and secondary constraints is not really necessary,
we will use the letter χ for both primary and secondary constraints from now on, χm,x := αm,x
for m = 1, . . . , M and χM +k,x = βk,x for k = 1, . . . , K, so we can write the set of primary and
secondary constraints as
χn,x [q, p] ≈ 0,
with n = 1, . . . , M + K. The constraints on the um,x [q, p] are now of the form
{χ_{n,x}[q, p], H[q, p]}_P + Σ_{m=1}^{M} ∫_{R³} d³y u_{m,y}[q, p] {χ_{n,x}[q, p], χ_{m,y}[q, p]}_P ≈ 0
for some of the n ∈ {1, . . . , M + K}. We can interpret this as a system of non-homogeneous linear
equations in the unknown variables um,x [q, p]. If Um,x [q, p] is any particular solution of this system,
then the general solution can be written as
u_{m,x}[q, p] = U_{m,x}[q, p] + Σ_{a=1}^{A} ∫_{R³} d³z v_{a,z}[q, p] (V^{(a,z)})_{m,x}[q, p],
where the va,x [q, p] are arbitrary functionals and the (V (a,z) )m,x [q, p] are all independent solutions
of the corresponding homogeneous system:
Σ_{m=1}^{M} ∫_{R³} d³y (V^{(a,z)})_{m,y}[q, p] {χ_{n,x}[q, p], χ_{m,y}[q, p]}_P ≈ 0.
The index (a, z), which labels the solutions to the homogeneous system, in general takes on fewer values than the index (m, x), so the constraint equations on u_{m,x}[q, p] have reduced the arbitrariness
of the Hamiltonian HT somewhat. The Hamiltonian HT can now be written as
H_T[q, p] = H[q, p] + Σ_{m=1}^{M} ∫_{R³} d³x ( U_{m,x}[q, p] + Σ_{a=1}^{A} ∫_{R³} d³z v_{a,z}[q, p] (V^{(a,z)})_{m,x}[q, p] ) χ_{m,x}[q, p]

= H[q, p] + Σ_{m=1}^{M} ∫_{R³} d³x U_{m,x}[q, p] χ_{m,x}[q, p] + Σ_{a=1}^{A} ∫_{R³} d³z v_{a,z}[q, p] ( Σ_{m=1}^{M} ∫_{R³} d³x (V^{(a,z)})_{m,x}[q, p] χ_{m,x}[q, p] )

=: H̃[q, p] + Σ_{a=1}^{A} ∫_{R³} d³z v_{a,z}[q, p] χ̃_{a,z}[q, p].
We have thus obtained an expression for the Hamiltonian in which the arbitrariness is made very
explicit. The equations of motion for any functional g (t) [q, p] can be written in terms of this
Hamiltonian as
(d/dt) g^{(t)}[q, p] ≈ {g^{(t)}[q, p], H̃[q, p]}_P + Σ_{a=1}^{A} ∫_{R³} d³z v^{(t)}_{a,z}[q, p] {g^{(t)}[q, p], χ̃_{a,z}[q, p]}_P,

where the time dependence of the v^{(t)}_{a,z}[q, p] is arbitrary. Because of this arbitrariness in the equations of motion, the time evolution of g^{(t)}[q, p] is not uniquely defined. The physical interpretation of this is that different choices for v^{(t)}_{a,z}[q, p] correspond to the same physical situation; the system
has some gauge freedom. The infinitesimal gauge transformations are of the form
g[q, p] → g[q, p] + Σ_{a=1}^{A} ∫_{R³} d³z ε_{a,z} {g[q, p], χ̃_{a,z}[q, p]}_P.   (3.32)
Transformations of this kind do not change the physical state. The χ̃_{a,z}[q, p] are called generating functionals for the infinitesimal gauge transformations. It can be shown that the Poisson bracket {χ̃_{a,z}[q, p], χ̃_{a′,z′}[q, p]}_P of two generating functionals is again a generating functional. In general, this gives rise to new gauge transformations, other than (3.32). For a better understanding of these new gauge transformations, we introduce some useful terminology. Any functional A[q, p]
for which
{A[q, p], χn,x [q, p]}P ≈ 0
for any pair (n, x) is called first class, and it is easy to see that the Poisson bracket of two first
class functionals is again first class. Any functional that is not first class is called second class. We
can apply this terminology to the constraint functionals χn,x [q, p] themselves. In this manner the
constraints can be divided into first class constraints and second class constraints. The constraint
functionals χ̃_{a,z}[q, p], which are generating functionals for gauge transformations of the form (3.32), are all first class (and primary). Because the Poisson bracket of two first class functionals is again first class, we find that the {χ̃_{a,y}[q, p], χ̃_{a′,y′}[q, p]}_P are first class constraints. However, it might happen that some of these are not primary (and hence secondary) and in this case we obtain gauge transformations other than (3.32). The corresponding first class secondary constraints can be added to the Hamiltonian in a similar manner as the χ̃_{a,y}[q, p] without changing the physics;
this new (and more general) Hamiltonian is called the extended Hamiltonian. In general it is not
true that every first class secondary constraint generates a gauge transformation (so we should not
add them all to the Hamiltonian), but in all physically interesting models this turns out to be the
case. In these models all first class constraints can be eliminated by choosing a gauge, so from now on we will only need to focus on second class constraints. In particular, we will only discuss
the quantization of a system with second class constraints.
Quantization
For a system with second class constraints χn,x [q, p] ≈ 0 we define the “matrix”
C_{(n₁,x₁),(n₂,x₂)}[q, p] = {χ_{n₁,x₁}[q, p], χ_{n₂,x₂}[q, p]}_P.
Because the constraints are second class, this matrix has an inverse (C −1 )(n1 ,x1 ),(n2 ,x2 ) [q, p] (recall
that we are not trying to be mathematically rigorous in this chapter). We then define the Dirac
bracket of two functionals by
{A[q, p], B[q, p]}_D := {A[q, p], B[q, p]}_P − Σ_{n₁,n₂} ∫ d³x₁ d³x₂ {A[q, p], χ_{n₁,x₁}[q, p]}_P (C⁻¹)_{(n₁,x₁),(n₂,x₂)}[q, p] {χ_{n₂,x₂}[q, p], B[q, p]}_P.
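A finite-dimensional toy model (our own illustration, not from [35]) makes the definition concrete: with two degrees of freedom and constraints χ₁ = q₁, χ₂ = p₁, which are second class since {χ₁, χ₂}_P = 1, the Dirac bracket decouples the constrained pair while leaving the other pair canonical.

```python
# Dirac bracket for a toy system with second-class constraints chi_1 = q1,
# chi_2 = p1. The matrix C_ab = {chi_a, chi_b}_P is invertible, and the
# Dirac bracket of the constrained pair vanishes.
import sympy as sp

q1, q2, p1, p2 = sp.symbols('q1 q2 p1 p2')
qs, ps = (q1, q2), (p1, p2)

def pb(A, B):
    """Canonical Poisson bracket {A, B}_P."""
    return sum(sp.diff(A, q) * sp.diff(B, p) - sp.diff(B, q) * sp.diff(A, p)
               for q, p in zip(qs, ps))

chi = [q1, p1]                             # second-class constraints
C = sp.Matrix(2, 2, lambda a, b: pb(chi[a], chi[b]))
Cinv = C.inv()                             # exists because C is second class

def dirac(A, B):
    """Dirac bracket {A, B}_D."""
    corr = sum(pb(A, chi[a]) * Cinv[a, b] * pb(chi[b], B)
               for a in range(2) for b in range(2))
    return sp.simplify(pb(A, B) - corr)

assert dirac(q1, p1) == 0                  # constrained pair decouples
assert dirac(q2, p2) == 1                  # unconstrained pair stays canonical
```

Setting [A, B] = i{A, B}_D then imposes the constraints consistently as operator equations, since the Dirac bracket of anything with χ₁ or χ₂ vanishes identically by construction.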
In quantizing this system, the functionals become operators and we impose on them the commutation relations
[A, B] = i{A, B}D
and the time evolution in the Heisenberg picture of the corresponding quantum system is given
by the Heisenberg equations of motions, as usual. The reason for choosing these commutation
relations can be explained as follows for the case of finitely many³⁰ degrees of freedom. Recall that for unconstrained systems the commutation relations are defined to be i times the Poisson bracket. If we have a classical system with only second class constraints, then it can be shown that there exists a canonical transformation³¹ that transforms the coordinates q₁, . . . , qₙ and their canonical conjugates p₁, . . . , pₙ to a set of coordinates Q₁, . . . , Q_{n−r} and Q̂₁, . . . , Q̂ᵣ with canonical conjugates P₁, . . . , P_{n−r} and P̂₁, . . . , P̂ᵣ, respectively, such that the constraints take the form Q̂_j = P̂_j = 0 with j = 1, . . . , r. The Qs and the P s then form an unconstrained system which we know how to quantize, namely by taking the commutators to be i times the Poisson brackets (in terms of the unconstrained P s and Qs of course). But it can be shown that the Dirac bracket, calculated in the original coordinates q_j and p_j, coincides with the Poisson bracket calculated in terms of the Qs and P s, so using the Dirac bracket indeed gives the desired result. More details about this quantization procedure can be found in section 7.6 of [35].

³⁰ Taking finitely many degrees of freedom makes the argument a little easier.
³¹ This is a coordinate transformation for which the new coordinates have the same Poisson brackets as the old ones. Here the Poisson brackets of the new coordinates are computed with respect to the old coordinates.
Transition to the interaction picture
Many (or perhaps even all) of the free fields that we constructed at the beginning of this chapter can be obtained by quantizing a classical Lagrangian field theory according to the procedure
described above, and in these cases the Lagrangian is known explicitly. However, the expansion
of the free fields in terms of creation and annihilation operators does not arise in any way from
the quantization procedure described above. Instead, this expansion can be obtained by solving
the classical equations of motion (which are linear differential equations for these free-field Lagrangians) and writing the general solution as a Fourier expansion. The Fourier coefficients are then
replaced with the creation and annihilation operators in the corresponding quantized field. Since
the commutation relations of the quantized free fields are dictated by the quantization procedure
above, the commutation relations of the creation and annihilation operators are also determined
by the quantization procedure. However, these commutation relations turn out to be precisely the
ones that we had before (so there is some consistency here). Furthermore, the free Hamiltonian
that is derived from the Lagrangian is also of the correct form when we write it in terms of the
creation and annihilation operators.
Suppose now that we are given a Lagrangian for an interacting field theory. Once we have
derived the Hamiltonian from the Lagrangian by the procedure described above, we will split the
Hamiltonian into a sum H = H0 + V of a free part H0 and an interaction part V . Finding the
correct free part H0 of H is not a very difficult task, because for all physically relevant free fields
we know the explicit form of the Lagrangian (and hence also of the Hamiltonian). Once we have
the decomposition H = H0 +V , we make the transition to the interaction picture at time t = 0. So
at t = 0 the interaction picture fields and their canonical conjugates coincide with the Heisenberg
fields and their canonical conjugates, but they evolve from t = 0 in a different manner. The next
task is to express the canonical conjugates of the interaction picture fields in terms of the fields
and their time derivatives. This is done by taking the functional derivative of H0 (not H) with
respect to the canonically conjugate fields. The result is an expression for the interaction term V
in the interaction picture in terms of the free fields, as desired.
3.5 Some remarks on the physics of quantum fields
It is clear that the entire framework described in the preceding sections was based on the fact that
we want to calculate the S-matrix elements for a given scattering experiment. These S-matrix
elements describe the scattering experiment in terms of the incoming particles and the resulting
outgoing particles, but they give no insight into the processes during the period that the particles
are interacting. The free field expansion of the S-operator is not very useful to examine this. Also,
the methods above give us no information about how to describe arbitrary relativistic quantum
systems (beyond scattering experiments). One could argue that for many practical purposes this
is unnecessary, but in the end any satisfactory fundamental theory of nature should in principle be
able to describe any system. Therefore, we must extend our discussion of the preceding sections.
The obvious extension would be to somehow give meaning to Lagrangian theories in terms of
interacting Heisenberg fields, and to assume that these fields describe the exact evolution of the
system. For scattering experiments, these fields then interpolate between the fields of incoming and
outgoing free particles. It is difficult to imagine what these fields would be like from a mathematical
point of view, since in general we cannot even solve the (non-linear) classical equations of motion
corresponding to these fields. However, there are some properties that we must expect these fields
to have. For example, they should transform under Poincaré transformations in such a way that
the theory is Poincaré invariant, and any physical quantity that can be measured in some bounded
spacetime region must be compatible with the physical quantities that can be measured in some
other spacelike-separated region. The latter demand can most easily be implemented by assuming
that the fields either commute or anticommute at spacelike separated points, just as the free
fields described earlier. These considerations will all be used to motivate the two mathematical
frameworks for quantum field theory that we will discuss in the next chapter.
4 The mathematics of quantum fields
In this chapter we will discuss the mathematical structure of quantum field theory. In the first
section we will introduce the Wightman axioms and we will motivate these axioms by referring
back to several physical aspects that we discussed in the previous chapters. After introducing
these axioms we will formulate some of the important theorems that can be proven within the
Wightman formalism. In the second section of this chapter we will briefly discuss an alternative
approach to the problem of finding a mathematical framework for quantum field theory, namely
the theory of local observables, also called algebraic quantum field theory. The axioms of the
theory of local observables are often referred to as the Haag-Kastler axioms.
4.1 The Wightman formulation of quantum field theory
Before we can understand the Wightman formalism, however, we first need to develop some mathematical background. This background consists mainly of the theory of distributions
and operator-valued distributions and will be summarized in the following subsection, which is
based on chapter 2 of [32] and section 2.7 of [2]. The rest of this section on the Wightman
formulation is also mainly based on these two books.
4.1.1 Mathematical preliminaries: Distributions and operator-valued distributions
Let C ∞ (RN ) be the space of infinitely differentiable complex-valued functions f (x1 , . . . , xN ) on
$\mathbb{R}^N$. For any sequence $k = (k_1,\ldots,k_N)$ with $k_j \in \mathbb{Z}_{\geq 0}$ we define a function $x = (x_1,\ldots,x_N) \mapsto x^k$
in $C^\infty(\mathbb{R}^N)$, where $x^k$ is given by
$$x^k := x_1^{k_1} \cdots x_N^{k_N}.$$
Also, for any such sequence $k$ we define the differential operator $D^k$ on $C^\infty(\mathbb{R}^N)$ by
$$D^k := \frac{\partial^{|k|}}{(\partial x_1)^{k_1} \cdots (\partial x_N)^{k_N}},$$
where $|k| = k_1 + \ldots + k_N$. For each $f \in C^\infty(\mathbb{R}^N)$ we then define for any $r, s \in \mathbb{Z}_{\geq 0}$
$$\|f\|_{r,s} = \sum_{\{k : |k| \leq r\}} \sum_{\{l : |l| \leq s\}} \sup_x |x^k D^l f(x)|,$$
which is either a non-negative real number or else $+\infty$. For any fixed $r, s$ the restriction of
$\| \cdot \|_{r,s} : C^\infty(\mathbb{R}^N) \to \mathbb{R}$ to the linear subspace $C^\infty(\mathbb{R}^N)_{r,s}$ of $C^\infty(\mathbb{R}^N)$ consisting of all functions $f$
for which $\|f\|_{r,s} < \infty$ is a norm.
Definition 4.1 The function space $S(\mathbb{R}^N)$ is defined to be the set of all $f \in C^\infty(\mathbb{R}^N)$ for which
$\|f\|_{r,s} < \infty$ for all $r, s \in \mathbb{Z}_{\geq 0}$. The space $S(\mathbb{R}^N)$ is also called the Schwartz space.
In particular, since the $\| \cdot \|_{r,s}$ are norms (and hence semi-norms) on the Schwartz space, we can
give $S(\mathbb{R}^N)$ the structure of a locally convex space by taking as a subbasis for the topology the
sets of the form
$$B_{r,s}(f_0, \epsilon) = \{ f \in S(\mathbb{R}^N) : \|f - f_0\|_{r,s} < \epsilon \},$$
where $f_0 \in S(\mathbb{R}^N)$, $r, s \in \mathbb{Z}_{\geq 0}$ and $\epsilon > 0$. Thus a set $U \subset S(\mathbb{R}^N)$ is open if and only if for each
$f_0 \in U$ there are $r_1,\ldots,r_n$, $s_1,\ldots,s_n$ and $\epsilon_1,\ldots,\epsilon_n > 0$ such that $\bigcap_{j=1}^n B_{r_j,s_j}(f_0, \epsilon_j) \subset U$. As in
any topological space, we say that a sequence $(f_n)$ of elements in $S(\mathbb{R}^N)$ converges to $f \in S(\mathbb{R}^N)$
if for each open neighborhood $U$ of $f$ there exists a positive integer $N_U$ such that $f_n \in U$ for all
$n \geq N_U$. In particular, this condition for $U$ must hold for any open neighborhood of the form
$B_{r,s}(f, \epsilon)$, so if $(f_n)$ converges to $f$ then for every $r, s \in \mathbb{Z}_{\geq 0}$ and for every $\epsilon > 0$ there exists
a positive integer $N_{r,s,\epsilon}$ such that $\|f_n - f\|_{r,s} < \epsilon$ for all $n \geq N_{r,s,\epsilon}$. Conversely, if $(f_n)$ is a
sequence such that for every $r, s \in \mathbb{Z}_{\geq 0}$ and for every $\epsilon > 0$ there exists a positive integer $N_{r,s,\epsilon}$
such that $\|f_n - f\|_{r,s} < \epsilon$ for all $n \geq N_{r,s,\epsilon}$, then $(f_n)$ must converge to $f$. This is because each
open neighborhood $U$ of $f$ contains an open subset of the form $\bigcap_{j=1}^n B_{r_j,s_j}(f, \epsilon_j)$. Therefore, we
conclude that a sequence $(f_n)$ in the Schwartz space converges to some $f$ in the Schwartz space if
and only if for all $r, s \in \mathbb{Z}_{\geq 0}$ we have $\lim_{n\to\infty} \|f_n - f\|_{r,s} = 0$.
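As a numerical illustration (not taken from the thesis; the grid, the Gaussian, and the use of finite differences are assumptions of this sketch), one can approximate the seminorms $\|f\|_{r,s}$ for $N = 1$ on a grid and confirm that they are finite for a Gaussian, the prototypical Schwartz function:

```python
import numpy as np

x = np.linspace(-10, 10, 200001)
f = np.exp(-x**2)   # a Schwartz function

def norm_rs(f, x, r, s):
    # grid approximation of ||f||_{r,s} = sum_{|k|<=r} sum_{|l|<=s} sup_x |x^k d^l f|  (N = 1)
    total = 0.0
    for l in range(s + 1):
        d = f.copy()
        for _ in range(l):
            d = np.gradient(d, x)      # numerical l-th derivative
        for k in range(r + 1):
            total += np.max(np.abs(x**k * d))
    return total

print(norm_rs(f, x, 0, 0))   # sup |f| = 1 for the Gaussian
print(norm_rs(f, x, 2, 2))   # finite, as required for membership in S(R)
```

For a function growing polynomially, such as $f(x) = x^2$, the corresponding suprema would diverge as the grid is enlarged, which is exactly why such functions fail to be Schwartz functions.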
Now that we have defined the Schwartz space we will introduce the notion of a distribution on
this space.
Definition 4.2 A continuous linear functional T : S(RN ) → C on the Schwartz space is called a
tempered distribution. We will denote the space of tempered distributions by S 0 (RN ).
Because the Schwartz space $S(\mathbb{R}^N)$ is metrizable³², the continuity of a linear functional $T :
S(\mathbb{R}^N) \to \mathbb{C}$ can be expressed in terms of sequences (see for instance [25], theorem 21.3): the
linear functional $T$ is continuous if and only if for each sequence $(f_n)$ converging to $f$ we have that
$T(f_n)$ converges to $T(f)$, which is equivalent to the statement that $\lim_{n\to\infty} \|f_n - f\|_{r,s} = 0$ for all
$r, s \in \mathbb{Z}_{\geq 0}$ implies that $\lim_{n\to\infty} |T(f_n) - T(f)| = 0$.
The most natural topology to define on $S'(\mathbb{R}^N)$ is the weak*-topology, which is the topology
that is defined by the seminorms $p_f : T \mapsto |T(f)|$. With respect to this topology, a sequence $(T_n)$
of tempered distributions converges to a tempered distribution $T$ if and only if $\lim_{n\to\infty} |T_n(f) -
T(f)| = 0$ for all $f \in S(\mathbb{R}^N)$.
We now introduce some terminology. We say that a distribution T ∈ S 0 (RN ) vanishes in an
open set U ⊂ RN if T (f ) = 0 for all f ∈ S(RN ) for which supp(f ) ⊂ U . Here supp(f ) denotes the
support of f , i.e. the complement of the largest open set contained in {x ∈ RN : f (x) = 0}. We
then define the support supp(T ) of the distribution T to be the complement of the largest open
set on which T vanishes.
An important example of a tempered distribution is
$$T(f) = \sum_{\{k : |k| \leq s\}} \int_{\mathbb{R}^N} t_k(x_1,\ldots,x_N)\, D^k f(x_1,\ldots,x_N)\, dx_1 \ldots dx_N, \qquad (4.1)$$
where $k = (k_1,\ldots,k_N)$ (with $k_j \in \mathbb{Z}_{\geq 0}$) and the functions $t_k$ are continuous and satisfy $|t_k(x)| \leq
C_k (1 + \|x\|)^{j_k}$ for some $C_k \geq 0$ and $j_k \in \mathbb{Z}_{\geq 0}$. A particularly nice case of (4.1) is when $T$ is of the
form $T(f) = \int_{\mathbb{R}^N} t(x) f(x)\, d^N x$. In that case we say that the tempered distribution $T$ is a function.
However, it is convenient to write any distribution $T$ as $T(x)$, even though it is not a function.
For any non-singular linear transformation $L : \mathbb{R}^N \to \mathbb{R}^N$ and any vector $a \in \mathbb{R}^N$ we can define
the diffeomorphism $\phi_{(a,L)}(x) = Lx + a$ on $\mathbb{R}^N$. For any function $f \in S(\mathbb{R}^N)$ we then define a new
function $f_{(a,L)}$ by
$$f_{(a,L)}(x) := f(\phi_{(a,L)}^{-1}(x)) = f(L^{-1}(x - a)).$$
For a distribution $T \in S'(\mathbb{R}^N)$ we define a new distribution $T_{(a,L)}$ by
$$T_{(a,L)}(f) := |\det(L)|^{-1}\, T(f_{(a,L)})$$
for all $f \in S(\mathbb{R}^N)$. If we define variables $(y_1,\ldots,y_N) = (\phi^{-1}_1(x),\ldots,\phi^{-1}_N(x))$, then we find that
the volume element in $\mathbb{R}^N$ satisfies $d^N x = |\det(L)|\, d^N y$. Now suppose that $T$ is a distribution that
is given by $T(f) = \int_{\mathbb{R}^N} t(x) f(x)\, d^N x$, as a special case of (4.1). Then
$$\begin{aligned}
T_{(a,L)}(f) = |\det(L)|^{-1}\, T(f_{(a,L)}) &= |\det(L)|^{-1} \int_{\mathbb{R}^N} t(x)\, f(\phi^{-1}(x))\, d^N x \\
&= |\det(L)|^{-1} \int_{\mathbb{R}^N} (t \circ \phi)(\phi^{-1}(x))\, f(\phi^{-1}(x))\, d^N x \\
&= |\det(L)|^{-1} \int_{\mathbb{R}^N} (t \circ \phi)(y)\, f(y)\, |\det(L)|\, d^N y \\
&= \int_{\mathbb{R}^N} t(Lx + a)\, f(x)\, d^N x, \qquad (4.2)
\end{aligned}$$
³² This is because the topology of $S(\mathbb{R}^N)$ is determined by countably many seminorms, see also proposition IV.2.1 of [5] for this argument. It can be shown that $S(\mathbb{R}^N)$ is complete as a metric space, so that it is in fact a Fréchet space, but we will not need this fact.
where in the last step we changed dummy variables. As we have stated before, distributions are
often written as T (x), even though they are not functions in general. In view of equation (4.2)
above, the distribution T(a,L) is then denoted by T (Lx + a). With the definition of T(a,L) at hand,
we can now define the partial derivative of a tempered distribution $T$ as
$$\frac{\partial T}{\partial x_j} := \lim_{h \to 0} h^{-1}\,(T_{(h e_j, 1)} - T),$$
where $\{e_j\}_{j=1}^N$ is the standard basis for $\mathbb{R}^N$. By definition of convergence in $S'(\mathbb{R}^N)$ this means
that for each $f \in S(\mathbb{R}^N)$ we have
$$\begin{aligned}
\frac{\partial T}{\partial x_j}(f) &= \lim_{h \to 0} h^{-1}(T_{(h e_j, 1)} - T)(f) = \lim_{h \to 0} h^{-1}\left[ T(f_{(h e_j, 1)}) - T(f) \right] \\
&= \lim_{h \to 0} T\left[ h^{-1}(f_{(h e_j, 1)} - f) \right] = T\left[ \lim_{h \to 0} h^{-1}(f_{(h e_j, 1)} - f) \right] \\
&= -T\left( \frac{\partial f}{\partial x_j} \right),
\end{aligned}$$
where in the second to last step we used that $h^{-1}(f_{(h e_j, 1)} - f)$ converges in $S(\mathbb{R}^N)$ and that $T$ is
continuous. For higher-order derivatives we have
$$(D^k T)(f) = (-1)^{|k|}\, T(D^k f).$$
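For a distribution $T_h$ given by a smooth, rapidly decreasing function $h$, the relation above reduces to integration by parts, and this can be checked numerically. In the following sketch the grid and the choice of $h$ and $f$ are illustrative assumptions; it compares $-T_h(f')$ with the direct integral $\int h'(x) f(x)\, dx$:

```python
import numpy as np

x = np.linspace(-10, 10, 100001); dx = x[1] - x[0]
h = np.exp(-x**2)            # T_h(f) = integral of h f dx, a tempered distribution
f = x * np.exp(-x**2 / 2)    # a Schwartz test function

T = lambda g: np.sum(h * g) * dx                 # T_h acting on a test function
dT_of_f = -T(np.gradient(f, x))                  # (dT/dx)(f) := -T(df/dx)
direct = np.sum(np.gradient(h, x) * f) * dx      # integral of h'(x) f(x) dx

print(dT_of_f, direct)   # the two agree: the derivative of T_h is T_{h'}
```

The agreement of the two numbers reflects the fact that for distributions given by differentiable functions, the distributional derivative coincides with the ordinary one.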
Suppose that for some n ≥ 2 we have a distribution T ∈ S 0 (Rn·N ) and let f1 , . . . , fn ∈ S(RN ).
Then the product function f1 · . . . · fn is an element of S(Rn·N ), so we can consider T (f1 · . . . · fn ).
This expression is linear in each of the fj in the sense that
$$T(f_1 \cdots (\lambda f_j' + \mu f_j'') \cdots f_n) = \lambda\, T(f_1 \cdots f_j' \cdots f_n) + \mu\, T(f_1 \cdots f_j'' \cdots f_n),$$
and it depends continuously on each of the $f_j$ in the sense that
$$\lim_{l \to \infty} T(f_1 \cdots f_{j,l} \cdots f_n) = T(f_1 \cdots f_j \cdots f_n)$$
if $\lim_{l \to \infty} f_{j,l} = f_j$. Thus, $T \in S'(\mathbb{R}^{n \cdot N})$ defines a multilinear functional on $S(\mathbb{R}^N)^{\times n}$ that is
separately continuous in each of its arguments. Conversely, it is also true that each multilinear
functional on S(RN )×n which is continuous in each of its arguments can be derived from a (unique)
tempered distribution on S(Rn·N ) as above. This is the content of the nuclear theorem, which can
be found in section 1.3 of [9].
Theorem 4.3 (Nuclear theorem) Let $\widehat{T} : S(\mathbb{R}^N)^{\times n} \to \mathbb{C}$ be a multilinear functional which is
separately continuous in each of its arguments. Then there exists a unique tempered distribution
$T \in S'(\mathbb{R}^{n \cdot N})$ such that for all $f_1, \ldots, f_n \in S(\mathbb{R}^N)$ we have
$$\widehat{T}(f_1, f_2, \ldots, f_n) = T(f_1 \cdot f_2 \cdot \ldots \cdot f_n).$$
We will now discuss the Fourier transform on distributions. Recall that the Fourier transform
and the inverse Fourier transform of a Schwartz function $f$ are defined by
$$(\mathcal{F}_B f)(p) = \left( \frac{1}{\sqrt{2\pi}} \right)^N \int_{\mathbb{R}^N} e^{-iB(p,x)} f(x)\, d^N x$$
and
$$(\mathcal{F}^B f)(p) = \left( \frac{1}{\sqrt{2\pi}} \right)^N \int_{\mathbb{R}^N} e^{iB(p,x)} f(x)\, d^N x,$$
respectively, where $B(\cdot,\cdot)$ denotes a non-degenerate symmetric bilinear form (for example the
Euclidean or the Minkowskian form). Now let $f, g \in S(\mathbb{R}^N)$. Because Schwartz functions behave
very nicely, we can use Fubini's theorem to conclude that the Fourier transform satisfies
$$\int_{\mathbb{R}^N} (\mathcal{F}_B f)(p)\, g(p)\, d^N p = \int_{\mathbb{R}^N} f(x)\, (\mathcal{F}_B g)(x)\, d^N x.$$
We will use this in the following case. Suppose that $T \in S'(\mathbb{R}^N)$ is a tempered distribution of the
form $T(f) = \int_{\mathbb{R}^N} h(x) f(x)\, d^N x$ with $h \in S(\mathbb{R}^N)$ a Schwartz function. We will write $T_h$ instead of
$T$ to denote the dependence of $T$ on $h$. Then the equality above implies that we have the identity
$$T_{\mathcal{F}_B h}(g) = \int_{\mathbb{R}^N} (\mathcal{F}_B h)(p)\, g(p)\, d^N p = \int_{\mathbb{R}^N} h(x)\, (\mathcal{F}_B g)(x)\, d^N x = T_h(\mathcal{F}_B g).$$
Similarly, we also have the identity
$$T_{\mathcal{F}^B h}(g) = T_h(\mathcal{F}^B g)$$
for the inverse Fourier transform. The left-hand sides of these two equations can be used as a
definition of the Fourier transform and its inverse, respectively, of the tempered distribution Th .
This motivates the following definition for the Fourier transform and the inverse Fourier transform
for tempered distributions.
Definition 4.4 Let $T \in S'(\mathbb{R}^N)$ be a tempered distribution. Then the Fourier transform $\mathcal{F}_B T$ of
$T$ is defined by
$$(\mathcal{F}_B T)(f) = T(\mathcal{F}_B f).$$
The inverse Fourier transform $\mathcal{F}^B T$ of $T$ is defined by
$$(\mathcal{F}^B T)(f) = T(\mathcal{F}^B f).$$
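The defining duality $(\mathcal{F}_B T)(f) = T(\mathcal{F}_B f)$ is modeled on the integral identity above, and for distributions given by Schwartz functions the two sides can be compared directly. The following numerical sketch for $N = 1$ with Euclidean $B(p,x) = px$ is an illustration only; the grids and the Gaussian functions are assumptions:

```python
import numpy as np

x = np.linspace(-20, 20, 4001); dx = x[1] - x[0]
p = np.linspace(-10, 10, 2001); dp = p[1] - p[0]

def F(vals, src, dst):
    # (F_B f)(q) = (2*pi)^{-1/2} * integral of exp(-i q s) f(s) ds  (N = 1, Euclidean B)
    ds = src[1] - src[0]
    return np.array([np.sum(np.exp(-1j * q * src) * vals) * ds for q in dst]) / np.sqrt(2 * np.pi)

h = np.exp(-x**2)            # h in S(R), defining the distribution T_h
g_x = np.exp(-x**2 / 4)      # test function g on the x grid
g_p = np.exp(-p**2 / 4)      # the same g on the p grid

lhs = (np.sum(F(h, x, p) * g_p) * dp).real   # (F_B T_h)(g) = integral of (F_B h) g dp
rhs = (np.sum(h * F(g_x, x, x)) * dx).real   # T_h(F_B g)   = integral of h (F_B g) dx

print(lhs, rhs)   # the two sides of the duality agree
```

Both sides evaluate the same double integral with the factor $e^{-ipx}$ attached to either $h$ or $g$, which is exactly the Fubini argument used above.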
In order to also define the Laplace transform on distributions, it is convenient to start with a
larger class of distributions than S 0 (RN ). Let D(RN ) ⊂ C ∞ (RN ) denote the set of all C ∞ -functions
with compact support. By definition, a sequence (fn ) in D(RN ) converges to f ∈ D(RN ) if the
supports of all fn lie in a single compact set K, if fn → f uniformly in K and if all derivatives of
fn converge uniformly in K to the derivatives of f . It is clear that D(RN ) ⊂ S(RN ) as sets, so
every tempered distribution defines a linear functional on D(RN ), and because convergence of a
sequence in D(RN ) implies convergence of the same sequence in S(RN ), we see that any tempered
distribution is in fact continuous with respect to the topology on D(RN ). So if D0 (RN ) denotes the
space of all continuous linear functionals on D(RN ), then we have the inclusion S 0 (RN ) ⊂ D0 (RN ).
In general this inclusion will be strict and we have thus obtained a class of distributions D0 (RN )
that is larger than S 0 (RN ).
Now if T ∈ D0 (RN ) then for each g ∈ C ∞ (RN ) we can define a distribution gT by (gT )(f ) =
T (f g) for f ∈ D(RN ). It is easy to see that gT ∈ D0 (RN ). Sometimes it can happen that gT is
even a tempered distribution.
Definition 4.5 For each $T \in D'(\mathbb{R}^N)$ we define a set $\Gamma(T) \subset \mathbb{R}^N$ by
$$\Gamma(T) = \{ \gamma \in \mathbb{R}^N : e^{-B(\,\cdot\,,\gamma)}\, T \in S'(\mathbb{R}^N) \},$$
where $B$ denotes a non-degenerate symmetric bilinear form and $e^{-B(\,\cdot\,,\gamma)}$ denotes the $C^\infty$-function
$x \mapsto e^{-B(x,\gamma)}$ on $\mathbb{R}^N$.
It can be shown (see section 2.3 of [32]) that Γ(T ) is convex, i.e. if γ1 , γ2 ∈ Γ(T ) then also
tγ1 + (1 − t)γ2 ∈ Γ(T ) for all 0 < t < 1. Note that this does not exclude the case Γ(T ) = ∅, so Γ(T )
might still be empty. However, if T ∈ S 0 (RN ) then 0 ∈ Γ(T ), so then Γ(T ) is certainly non-empty.
In general, whenever T ∈ D0 (RN ) is such that Γ(T ) is non-empty and such that the support of T
lies in some half-space of the form
$$H^{(B)}_{\alpha,r} := \{ x \in \mathbb{R}^N : B(x,\alpha) > r \}$$
with $\alpha \in \mathbb{R}^N$ and $r \in \mathbb{R}$, the following theorem gives some information about $\Gamma(T)$.
Theorem 4.6 If $T \in D'(\mathbb{R}^N)$ with $\mathrm{supp}(T) \subset H^{(B)}_{\alpha,r}$ for some $\alpha \in \mathbb{R}^N$ and $r \in \mathbb{R}$, then $\Gamma(T)$
contains all points of the form $\gamma + t\alpha$ with $\gamma \in \Gamma(T)$ and $t \geq 0$.
This is theorem 2.7 of [32]. Note that the actual value of r ∈ R is not important here. We now
define the Laplace transform of a distribution.
Definition 4.7 Let $T \in D'(\mathbb{R}^N)$. For each $\gamma \in \Gamma(T)$ we define the Laplace transform $\mathcal{L}_B(T)_\gamma \in
S'(\mathbb{R}^N)$ by
$$\mathcal{L}_B(T)_\gamma = \mathcal{F}_B(e^{-B(\,\cdot\,,\gamma)}\, T).$$
If $T$ is given by $T(f) = \int_{\mathbb{R}^N} t(x) f(x)\, d^N x$, then its Laplace transform is given by $\mathcal{L}_B(T)_\gamma(f) =
\int_{\mathbb{R}^N} \mathcal{L}_B(T)_\gamma(p) f(p)\, d^N p$, with the function $\mathcal{L}_B(T)_\gamma(p)$ given by
$$\begin{aligned}
\mathcal{L}_B(T)_\gamma(p) &= \left( \frac{1}{\sqrt{2\pi}} \right)^N \int_{\mathbb{R}^N} e^{-iB(p,x)}\, e^{-B(x,\gamma)}\, t(x)\, d^N x \\
&= \left( \frac{1}{\sqrt{2\pi}} \right)^N \int_{\mathbb{R}^N} e^{-iB(x,\, p - i\gamma)}\, t(x)\, d^N x,
\end{aligned}$$
where we have extended B to complex vectors by making B a C-bilinear form (not a sesquilinear
form). We will often identify the Laplace transform LB (T )γ with the function LB (T )γ (p). The
expression for LB (T )γ (p) gives the impression that the Laplace transform of a distribution T of
the form above depends on the complex variables p − iγ = (p1 − iγ1 , . . . , pN − iγN ) in a nice way.
In fact, as stated in the following theorem, which is theorem 2.6 in [32], this is even true for general
distributions in D0 (RN ).
Theorem 4.8 Let $\Gamma \subset \mathbb{R}^N$ be a convex open set and let $T \in D'(\mathbb{R}^N)$ be such that $\Gamma \subset \Gamma(T)$. Then
the Laplace transform $\mathcal{L}_B(T)_\gamma$ is a holomorphic function $\mathcal{L}_B(T)(p - i\gamma)$ on the tube $\mathbb{R}^N - i\Gamma \subset \mathbb{C}^N$.
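A minimal numerical sketch of this construction (illustrative assumptions: $N = 1$, $B(p,x) = px$, and $t$ the Heaviside step function, so that $\mathrm{supp}(T)$ lies in the half-space $x > 0$ and, by theorem 4.6 with $\alpha = 1$, every $\gamma > 0$ lies in $\Gamma(T)$): the quadrature value of $\mathcal{L}_B(T)_\gamma(p)$ matches the closed form $1/(\sqrt{2\pi}\,(\gamma + ip))$, which is indeed holomorphic in $p - i\gamma$.

```python
import numpy as np

x = np.linspace(0, 60, 600001); dx = x[1] - x[0]
t_vals = np.ones_like(x)          # t = Heaviside step; supp(T) lies in {x > 0}
gamma, p0 = 0.5, 1.3              # any gamma > 0 lies in Gamma(T)

# L_B(T)_gamma(p) = (2*pi)^{-1/2} * integral of exp(-i p x) exp(-gamma x) t(x) dx, N = 1
num = np.sum(np.exp(-(1j * p0 + gamma) * x) * t_vals) * dx / np.sqrt(2 * np.pi)
exact = 1 / (np.sqrt(2 * np.pi) * (gamma + 1j * p0))   # closed form, holomorphic in p - i*gamma

print(num, exact)
```

Note that $T$ itself (integration against the step function) is tempered, so $0 \in \Gamma(T)$ as well, and the closed form shows explicitly how the boundary value at $\gamma \to 0^+$ develops a singularity at $p = 0$.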
We will now apply the theorems above to distributions on Minkowski space. For the inside of
the future light cone we will use the notation
$$V^+ = \{ x \in M : \eta(x,x) > 0,\ x^0 > 0 \},$$
where we have assumed that we have already chosen an orthonormal basis in $M$. The closure of
this set is $\overline{V^+} = \{ x \in M : \eta(x,x) \geq 0,\ x^0 \geq 0 \}$ and is just the union of $V^+$ with the future light
cone $C^+$. For each $a \in \overline{V^+}$ we have that $\eta(x,a) \geq 0$ for all $x \in \overline{V^+}$, so for each $a \in \overline{V^+}$ we have the
inclusion
$$\overline{V^+} \subset H^{(\eta)}_{a,-\epsilon}$$
for any $\epsilon > 0$. Now consider the $n$-fold product $M^n \simeq \mathbb{R}^{4n}$ of $M$. On $M^n$ we define a non-degenerate
symmetric bilinear form $\eta^{(n)}$ by
$$\eta^{(n)}((x_1,\ldots,x_n),(y_1,\ldots,y_n)) = \sum_{j=1}^n \eta(x_j, y_j) = \sum_{j=1}^n \eta_{\mu\nu}\, x_j^\mu y_j^\nu.$$
From now on, we will denote the bilinear form $\eta^{(n)}$ simply by $\eta$. Then, analogous to the inclusion
above, we have for all $a \in (\overline{V^+})^n$
$$(\overline{V^+})^n \subset H^{(\eta)}_{a,-\epsilon}$$
for any $\epsilon > 0$. Thus, if $T \in S'(M^n)$ (so that $0 \in \Gamma(T)$) has support in $(\overline{V^+})^n$ then $ta \in \Gamma(T)$ for all
$a \in (\overline{V^+})^n$ and $t \geq 0$, i.e. $(\overline{V^+})^n \subset \Gamma(T)$. Because $(V^+)^n$ is open and convex in $M^n$, the Laplace
transform of $T$ is holomorphic on the tube $M^n - i(V^+)^n =: \mathcal{T}_n$. This is summarized in the first
part of the following theorem. The second part is theorem 2.9 in [32].
Theorem 4.9 If $T \in S'(M^n)$ with $\mathrm{supp}(T) \subset (\overline{V^+})^n$ then the Laplace transform $\mathcal{L}_\eta(T)(p - i\gamma)$ is
a holomorphic function on the tube $\mathcal{T}_n = M^n - i(V^+)^n$. Also, for each $f \in S(M^n)$ we have
$$\lim_{\gamma \to 0} \int_{M^n} \mathcal{L}_\eta(T)(p - i\gamma)\, f(p)\, d^{4n} p = (\mathcal{F}_\eta(T))(f).$$
We will now introduce the notion of a vector-valued distribution. Our purpose for using vector-valued distributions will be to define operator-valued distributions. Vector-valued distributions can
be defined to take on values in any locally convex space, but for us it will be enough to restrict
ourselves to Hilbert spaces.
Definition 4.10 Let $H$ be a Hilbert space. Then a linear map $T : S(\mathbb{R}^N) \to H$ is called a vector-valued distribution if for all $\Psi \in D$, with $D \subset H$ some dense linear subspace, the linear functional
$f \mapsto \langle T(f), \Psi \rangle$ is continuous. Thus, a vector-valued distribution is a linear map $T : S(\mathbb{R}^N) \to H$
such that $f \mapsto \langle T(f), \Psi \rangle$ is a tempered distribution for all $\Psi$ in some dense linear subspace $D \subset H$.
With this definition, the definition of an operator-valued distribution can be given as follows.
Definition 4.11 Let T be a linear map from the Schwartz space S(RN ) to the set of closable33
operators on a Hilbert space H which are all defined on the same dense linear subspace D ⊂ H.
Then the map T is called an operator-valued distribution in H if for all Ψ ∈ D the correspondence
f 7→ T (f )Ψ is a vector-valued distribution.
4.1.2 The Wightman axioms
In section 2.2.4 we showed that in any quantum theory that is Poincaré invariant we should have a
unitary representation of the double cover $\widetilde{P}^\uparrow_+$ of the restricted Poincaré group $P^\uparrow_+$ on the Hilbert
space $H$ of pure states. Therefore, before we can even start discussing quantum fields the following
axiom must be satisfied:
Axiom 0: Relativistic quantum theory
If $H$ denotes the Hilbert space of pure states for some quantum system in Minkowski spacetime,
then there is a unitary representation $U : \widetilde{P}^\uparrow_+ \to B(H)$ of the double cover of the restricted Poincaré
group on $H$ describing the transformation of states and operators under a Poincaré transformation.
In particular, any spacetime translation $(a, 1) \in \widetilde{P}^\uparrow_+$ is represented by a unitary operator of
the form
$$U(a, 1) = e^{i a \cdot P},$$
where the self-adjoint operators $P = (P^0, P^1, P^2, P^3)$ are interpreted as the energy-momentum
operators of the system. The points in the joint spectrum of these operators lie on or inside the
positive light cone in momentum space (positive energy condition), i.e. the operators $P^0$ and
$M^2 = P \cdot P$ are both positive operators.
Our description of quantum fields in the previous chapter seems to suggest that quantum fields are
objects which assign an operator to each point in spacetime. However, the fields at a spacetime
point are too singular to be a well-defined operator. Therefore, we assume that quantum fields only
define well-defined operators after they are smeared out with some rapidly decreasing test function
over spacetime. The quantum fields are thus operator-valued distributions. This motivates the
following axiom.
Axiom 1: Quantum field
There is an object φ = (φ1 , . . . , φN ), called a quantum field, whose components are operator-valued
distributions mapping each function f in the Schwartz space S(M) of functions on Minkowski
spacetime to operators φ1 (f ), . . . , φN (f ) on H whose domains all contain the same dense subspace
D ⊂ H and which satisfy φj (f )D ⊂ D. The adjoints φj (f )∗ are also operators whose domains
contain D and which satisfy φj (f )∗ D ⊂ D; the adjoint field φ∗ = (φ∗1 , . . . , φ∗N ) is then defined by
$\phi^*_j(f) = \phi_j(f)^*$. Furthermore, the dense subset $D$ is left invariant by $U$, i.e. $U(a,A) D \subset D$ for
any $(a,A) \in \widetilde{P}^\uparrow_+$.
³³ An operator $A : H \to H$ is called closable if it has a closed extension, i.e. if it has an extension whose graph is closed in $H \oplus H$.
Here S(M) is of course the same as S(R4 ). Note that the fact that D is invariant under the
fields and their adjoints implies that for any Ψ ∈ D we can let arbitrary products of smeared fields
and their adjoints act on Ψ.
Equation (3.15) in the previous chapter shows that the quantum field components should transform according to a representation of $\widetilde{P}^\uparrow_+$. This is stated in the following axiom.
Axiom 2: Transformation law of the field
For each $f \in S(M)$ we have the operator identity on $D$
$$U(a,A)\, \phi_j(f)\, U(a,A)^{-1} = \sum_{k=1}^N S(A^{-1})_{jk}\, \phi_k(f_{(a,A)}),$$
where $f_{(a,A)}(x) := f(\Phi(A)^{-1}(x - a))$ (here $\Phi : \widetilde{P}^\uparrow_+ \to P^\uparrow_+$ is the covering map) and $S : SL(2,\mathbb{C}) \to
GL(\mathbb{C}^N)$ is a representation of $SL(2,\mathbb{C})$ on $\mathbb{C}^N$.
Note that this implies that for the adjoints of the smeared fields we have the transformation
law
$$U(a,A)\, \phi_j(f)^*\, U(a,A)^{-1} = \sum_{k=1}^N \overline{S(A^{-1})_{jk}}\, \phi_k(f_{(a,A)})^*,$$
which follows easily by taking the adjoint of the transformation law of the fields. The representation
$S$ in axiom 2 is not assumed to be irreducible; in general it will be a direct sum $S = S^{(\kappa_1)} \oplus \ldots \oplus S^{(\kappa_\ell)}$
of irreducible representations $S^{(\kappa_j)}$ of $SL(2,\mathbb{C})$. Correspondingly, the field $\phi$ can be decomposed
into irreducible fields as $\phi = (\phi^{(\kappa_1)}, \ldots, \phi^{(\kappa_\ell)})$. Each of these irreducible fields has components
$\phi^{(\kappa_j)}_{ab}$ with $a = -A_j, -A_j + 1, \ldots, A_j$ and $b = -B_j, -B_j + 1, \ldots, B_j$, where $A_j$ and $B_j$ are the
labels characterizing the irreducible representation $S^{(\kappa_j)}$ of $SL(2,\mathbb{C})$ as in the previous chapter.
Of course, the $A_j, B_j$ satisfy $\sum_{j=1}^\ell (2A_j + 1)(2B_j + 1) = N$. Although the parameters $\{A_j, B_j\}_{j=1}^\ell$
give us information about how the different components of the field $\phi$ are related to each other,
we cannot say anything about the complete form of any single component $\phi_j$ as operator-valued
distribution. For instance, they do not have to satisfy the Klein-Gordon equation, which was
satisfied for our (free) field components in the previous chapter. To say more about the field
components, we also need to know about the representation $U(a,A)$ of $\widetilde{P}^\uparrow_+$.
To obtain a Lorentz invariant S-matrix it was also necessary that the Hamiltonian density
commutes with itself at spacelike distances, see equation (3.5). This was then translated to the
requirement that the fields and their adjoints should in fact commute or anticommute with each
other as in equation (3.20), and in section 3.5 we argued that this property should probably remain
valid beyond scattering theory. In terms of operator-valued distributions this can be formulated
by using Schwartz functions whose supports are spacelike separated. We say that the supports of
two Schwartz functions f and g are spacelike separated if f (x)g(y) = 0 whenever (x − y)2 ≤ 0.
Axiom 3: Local commutativity or microscopic causality
If f and g are Schwartz functions on Minkowski spacetime whose supports are spacelike separated,
then for any j, k ∈ {1, . . . , N } the corresponding smeared operators either commute or anticommute, i.e.
[φj (f ), φk (g)]± = 0
as operator identity on D. Similarly, we also have
[φj (f ), φk (g)∗ ]± = 0.
From the Poincaré covariance of the fields we expect that the components of an irreducible field
φ(κj ) either all commute with each other at spacelike distances, or else they all anticommute with
each other at spacelike distances.
Finally, we will also assume that each quantum field theory has a unique vacuum state and
that the entire Hilbert space H of pure states can be constructed by acting on the vacuum state
with polynomials in the smeared fields.
Axiom 4: Vacuum state
There exists a unique³⁴ vector $\Omega \in D$, called the vacuum state vector, which is invariant under the
action of $\widetilde{P}^\uparrow_+$,
$$U(a, A)\, \Omega = \Omega,$$
and which is cyclic for the smeared fields, i.e. the set $P(\phi_1, \ldots, \phi_N)\Omega$ of polynomials in the smeared
fields acting on the vacuum vector forms a dense subspace in $H$.
A quantum theory which satisfies the axioms 0-4 is called a quantum field theory. It is characterized by the objects (H, D, U, φ, Ω). The free fields discussed in the previous chapter provide
examples of quantum field theories. We will prove this for the free hermitian scalar field in subsection 4.1.5. The existence of these examples implies that the axioms above must be compatible
with each other. It can also be shown that these axioms are independent of each other, i.e. that
one can find theories that satisfy only a proper subset of these axioms, but we will not discuss this
here; see also section 3.2 of [32].
Finally, we want to make a remark related to the cyclicity of the vacuum state. We say that
the smeared fields form an irreducible set of operators in the Hilbert space if for A ∈ B(H) the
condition
$$\langle A\, \phi_j(f)\, \Psi_1, \Psi_2 \rangle = \langle A\, \Psi_1, \phi_j(f)^*\, \Psi_2 \rangle$$
for all Ψ1 , Ψ2 ∈ D, all f ∈ S(M) and all j, implies that A is a constant multiple of the identity. We
mention without proof that the cyclicity of the vacuum implies that in any quantum field theory
the fields form an irreducible set of operators.
4.1.3 Wightman functions
Given any quantum field theory with field components $\{\phi_j\}_{j=1}^N$ and their corresponding adjoints,
we can define for each $n \geq 0$ maps of the form
$$w_{i_1^{(*)} \ldots i_n^{(*)}} : (f_1, \ldots, f_n) \mapsto \left\langle \phi_{i_1}^{(*)}(f_1)\, \phi_{i_2}^{(*)}(f_2) \cdots \phi_{i_n}^{(*)}(f_n)\, \Omega,\ \Omega \right\rangle =: \left\langle \phi_{i_1^{(*)}}(f_1)\, \phi_{i_2^{(*)}}(f_2) \cdots \phi_{i_n^{(*)}}(f_n)\, \Omega,\ \Omega \right\rangle$$
from $S(M)^{\times n}$ to $\mathbb{C}$. Here $\phi_j^{(*)}$ refers to either taking the adjoint of $\phi_j$ or not. Note that in the
second line we also introduce the notation $\phi_{j^*}$ to denote the adjoint field $\phi_j^*$. The benefit of this
notation is that we can refer to adjoint fields in expressions where only the field indices occur, such
as in $w_{i_1^{(*)} \ldots i_n^{(*)}}$. However, from now on we will often suppress the $(*)$ unless it is really necessary.
The maps $w_{i_1 \ldots i_n}$ are separately continuous in each of their $n$ arguments, so according to the
nuclear theorem in the previous subsection, there exist unique tempered distributions $W_{i_1 \ldots i_n}$ on
$S(M^{\times n})$ that satisfy
$$W_{i_1 \ldots i_n}(f_1 \cdot f_2 \cdot \ldots \cdot f_n) = w_{i_1 \ldots i_n}(f_1, \ldots, f_n)$$
for all fj ∈ S(M). Here the arguments of each of the functions f1 , . . . , fn in the product f1 ·f2 ·. . .·fn
lie in different copies of M, so the product indeed defines a function in S(M×n ). The distributions
Wi1 ...in are called (n-point) vacuum expectation values or Wightman functions. As stated in the
previous subsection, we often write distributions as if they were functions. Thus we will often
write the Wightman functions as Wi1 ...in (x1 , . . . , xn ), where each of the variables xj denotes a
four-vector with components xµj . The Wightman functions satisfy some nice properties, as stated
in the following theorem. A detailed proof can be found in section 3.3 of [32].
³⁴ Here we mean uniqueness up to a phase factor, of course.
Theorem 4.12 In any quantum field theory the Wightman functions are tempered distributions
which satisfy the following properties.
(a) (Relativistic transformation law). Under Poincaré transformations the Wightman functions transform as
$$\sum_{j_1,\ldots,j_n} S(A^{-1})_{i_1^{(*)} j_1^{(*)}} \cdots S(A^{-1})_{i_n^{(*)} j_n^{(*)}}\, W_{j_1^{(*)} \ldots j_n^{(*)}}(\Phi(A)x_1 + a, \ldots, \Phi(A)x_n + a) = W_{i_1^{(*)} \ldots i_n^{(*)}}(x_1, \ldots, x_n),$$
where $S(A^{-1})_{i_k^* j_k^*} := \overline{S(A^{-1})_{i_k j_k}}$. So they are translation invariant and Lorentz covariant. For
each $n$, let $\xi_j = x_j - x_{j+1}$ for $j = 1, \ldots, n-1$. Then translation invariance implies that there exist
tempered distributions $V_{j_1 \ldots j_n}(\xi_1, \ldots, \xi_{n-1})$ such that
$$W_{j_1 \ldots j_n}(x_1, \ldots, x_n) = V_{j_1 \ldots j_n}(\xi_1, \ldots, \xi_{n-1}).$$
(b) (Spectral conditions). The (inverse) Fourier transforms $\widehat{W}_{j_1 \ldots j_n} = \mathcal{F}^\eta(W_{j_1 \ldots j_n})$ and $\widehat{V}_{j_1 \ldots j_n} =
\mathcal{F}^\eta(V_{j_1 \ldots j_n})$ of $W_{j_1 \ldots j_n}$ and $V_{j_1 \ldots j_n}$ are tempered distributions and are related by
$$\widehat{W}_{j_1 \ldots j_n}(p_1, \ldots, p_n) = (2\pi)^4\, \delta\!\left( \sum_{j=1}^n p_j \right) \widehat{V}_{j_1 \ldots j_n}(p_1,\ p_1 + p_2,\ \ldots,\ p_1 + p_2 + \ldots + p_{n-1}).$$
Also, $\widehat{V}_{j_1 \ldots j_n}(q_1, \ldots, q_{n-1}) = 0$ if any $q_j$ is not in the joint spectrum of the operators $P^\mu$.
(c) (Hermiticity conditions). The Wightman functions satisfy
$$\overline{W_{i_1 \ldots i_n}(x_1, \ldots, x_n)} = W_{i_n^* \ldots i_1^*}(x_n, \ldots, x_1),$$
where $i_k^*$ refers to the field obtained by taking the adjoint of the (adjoint) field that is referred to
by the index $i_k \equiv i_k^{(*)}$.
(d) (Local commutativity conditions). If $(x_j - x_{j+1})^2 < 0$, then for $j = 1, \ldots, n-1$ we have
$$W_{j_1 \ldots j_n}(x_1, \ldots, x_{j+1}, x_j, \ldots, x_n) = \mp W_{j_1 \ldots j_n}(x_1, \ldots, x_j, x_{j+1}, \ldots, x_n),$$
where the signs $\mp$ correspond to the two cases $[\phi_j, \phi_{j+1}]_\pm$, respectively.
(e) (Positive definiteness conditions). For any sequence $\{f_{i_1\ldots i_n}\}_{n=0}^\infty$ with $f_{i_1\ldots i_n}(x_1,\ldots,x_n)$ in $\mathcal{S}(M^n)$ and with $f_n \equiv 0$ for all but finitely many values of the multi-indices $(i_1,\ldots,i_n)$, we have the inequality
$$\sum_{m,n=0}^\infty\ \sum_{(i_1^*,\ldots,i_m^*)}\ \sum_{(j_1,\ldots,j_n)}\ \int_{M^{m+n}} W_{i_m^*\ldots i_1^*j_1\ldots j_n}(x_m,\ldots,x_1,x_1',\ldots,x_n')\,\overline{f_{i_1\ldots i_m}(x_1,\ldots,x_m)}\,f_{j_1\ldots j_n}(x_1',\ldots,x_n')\,d^4x_1\cdots d^4x_m\,d^4x_1'\cdots d^4x_n' \geq 0.$$
(f) (Cluster decomposition property). For any spacelike vector $a \in M$ and for any $m \in \{1,\ldots,n\}$ we have
$$\lim_{\lambda\to\infty} W_{j_1\ldots j_n}(x_1,\ldots,x_m,x_{m+1}+\lambda a,x_{m+2}+\lambda a,\ldots,x_n+\lambda a) = W_{j_1\ldots j_m}(x_1,\ldots,x_m)\,W_{j_{m+1}\ldots j_n}(x_{m+1},\ldots,x_n),$$
where the limit is taken in the topology of $\mathcal{S}'(M^n)$.
Conversely, if we have a set of tempered distributions satisfying all the properties in the theorem,
then there exists a unique quantum field theory for which these distributions are the Wightman
functions. This is also called the reconstruction theorem, see for example section 3.4 of [32].
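Condition (e) can be made concrete in its simplest sector. The sketch below checks it numerically for the one-field part of a free scalar field in a 1+1-dimensional analogue with a momentum cutoff — the kernel $W(\xi) = \int dp\, e^{ip\xi}/(4\pi\omega_p)$ and all numerical parameters are assumptions of this illustration, not formulas from the text. Any such symmetric discretization of the kernel remains positive definite, so the smeared quadratic form must come out non-negative:

```python
import cmath, math, random

m = 1.0  # toy mass (an assumption of this illustration)

def W(xi, pmax=50.0, n=4001):
    # equal-time two-point kernel W(xi) = \int dp e^{i p xi} / (4 pi omega_p),
    # discretized by a symmetric trapezoidal rule with a momentum cutoff
    s = 0.0 + 0.0j
    dp = 2 * pmax / (n - 1)
    for k in range(n):
        p = -pmax + k * dp
        w = (1.0 if 0 < k < n - 1 else 0.5) * dp
        s += w * cmath.exp(1j * p * xi) / (4 * math.pi * math.sqrt(p * p + m * m))
    return s

random.seed(0)
xs = [-1.0, -0.3, 0.2, 0.8]  # arbitrary equal-time sample points
c = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in xs]

# quadratic form sum_{a,b} conj(c_a) c_b W(x_a - x_b); the analogue of
# condition (e) predicts a non-negative real value
Q = sum(c[a].conjugate() * c[b] * W(xs[a] - xs[b])
        for a in range(len(xs)) for b in range(len(xs)))
print(Q.real, abs(Q.imag))
```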
As stated in part (b) of the theorem above, the support of the distribution $\widehat{V}_{j_1\ldots j_n}(q_1,\ldots,q_{n-1})$ lies in $(V^+)^{n-1}$. Therefore, according to theorem 4.9, the Laplace transform $L^\eta(\widehat{V}_{j_1\ldots j_n})$ is holomorphic on the tube $T_{n-1} = M^{n-1} - i(V^+)^{n-1}$ and for each $f \in \mathcal{S}(M^{n-1})$ we have
$$\lim_{\gamma\to0}\int_{M^{n-1}} L^\eta(\widehat{V}_{j_1\ldots j_n})(x_1-i\gamma_1,\ldots,x_{n-1}-i\gamma_{n-1})\,f(x_1,\ldots,x_{n-1})\,d^{4(n-1)}x = (F_\eta(\widehat{V}_{j_1\ldots j_n}))(f) = (F_\eta(F^\eta(V_{j_1\ldots j_n})))(f) = V_{j_1\ldots j_n}(f),$$
so in this sense $V_{j_1\ldots j_n}$ is the boundary value of a holomorphic function defined on the tube $T_{n-1}$, namely $L^\eta(\widehat{V}_{j_1\ldots j_n})$. We will denote this holomorphic function by $V^{hol}_{j_1\ldots j_n}$ from now on. Thus,
$$V_{j_1\ldots j_n}(x_1,\ldots,x_{n-1}) = \lim_{\gamma\to0} V^{hol}_{j_1\ldots j_n}(x_1-i\gamma_1,\ldots,x_{n-1}-i\gamma_{n-1}),$$
where the convergence is in $\mathcal{S}'(M^{n-1})$ and $\gamma_j \in V^+$. On the set $T^n := \{(x_1-i\gamma_1,\ldots,x_n-i\gamma_n) \in M^n + iM^n : \gamma_j-\gamma_{j+1} \in V^+\}$ we can define another holomorphic function by
$$W^{hol}_{j_1\ldots j_n}(x_1-i\gamma_1,\ldots,x_n-i\gamma_n) := V^{hol}_{j_1\ldots j_n}(x_1-x_2-i(\gamma_1-\gamma_2),\ldots,x_{n-1}-x_n-i(\gamma_{n-1}-\gamma_n)),$$
where $\gamma_j-\gamma_{j+1} \in V^+$, and the Wightman functions $W_{j_1\ldots j_n}(x_1,\ldots,x_n)$ are boundary values of these functions.
From part (a) of the theorem above it follows that under an $SL(2,\mathbb{C})$ transformation the distributions $V$ transform according to
$$\sum_{j_1,\ldots,j_n} S(A^{-1})_{i_1j_1}\cdots S(A^{-1})_{i_nj_n}\,V_{j_1\ldots j_n}(\Phi(A)x_1,\ldots,\Phi(A)x_{n-1}) = V_{i_1\ldots i_n}(x_1,\ldots,x_{n-1}).$$
Now consider the holomorphic function
$$V^{hol}_{i_1\ldots i_n}(\xi_1,\ldots,\xi_{n-1}) - \sum_{j_1,\ldots,j_n} S(A^{-1})_{i_1j_1}\cdots S(A^{-1})_{i_nj_n}\,V^{hol}_{j_1\ldots j_n}(\Phi(A)\xi_1,\ldots,\Phi(A)\xi_{n-1})$$
on the tube $T_{n-1}$. From the transformation properties of $V_{i_1\ldots i_n}$ it follows that the boundary value $b_{i_1\ldots i_n}(x_1,\ldots,x_{n-1}) \in \mathcal{S}'(M^{n-1})$ of this holomorphic function is zero. According to the generalized uniqueness theorem for holomorphic functions of several complex variables (see for example theorem B.10 of [2]) this holomorphic function must then be identically zero on the tube $T_{n-1}$. This shows that $V^{hol}_{j_1\ldots j_n}$ has the same transformation properties on the tube $T_{n-1}$ as $V_{j_1\ldots j_n}$ has on $M^{n-1}$.
In order to understand the following important theorem, we have to introduce the notion of a complex Lorentz transformation. Let $M_\mathbb{C} := M + iM \simeq \mathbb{C}^4$ be complex Minkowski spacetime with Minkowski metric $\eta_\mathbb{C}(z,w) = z^0w^0 - \sum_{j=1}^3 z^jw^j$ for $w,z \in \mathbb{C}^4$. Then we define a complex Lorentz
transformation to be a linear map L : MC → MC that preserves the metric ηC . The set L(C) of
complex Lorentz transformations forms a group, called the complex Lorentz group. As for ordinary
Lorentz transformations, we have det(L) = ±1 for complex Lorentz transformations, and we define
the proper complex Lorentz group L+ (C) to be the set of those complex Lorentz transformations
L with det(L) = +1. This group L+ (C) is connected (unlike L+ ) and its universal covering group
is SL(2, C) × SL(2, C). The covering map ΦC : SL(2, C) × SL(2, C) → L+ (C) is defined in the
following manner, which is very similar to the definition of the map Φ : SL(2, C) → L↑+ . We
begin with a bijection ψC : MC → M2 (C) that maps each element z ∈ MC to a matrix ψC (z)
with det(ψC (z)) = ηC (z, z); the matrix ψC (z) is defined by precisely the same formula as ψ(x) in
subsection 2.1.2. Then for (A, B) ∈ SL(2, C) × SL(2, C) we define the determinant preserving map
$\Psi_{A,B} : M_2(\mathbb{C}) \to M_2(\mathbb{C})$ by
$$\Psi_{A,B}(Z) = AZB^T.$$
Under the bijection ψC this determinant preserving map corresponds to a metric preserving map
ΦC (A, B) : MC → MC , so ΦC (A, B) ∈ L(C). Similar arguments as for the real case show that
ΦC : SL(2, C) × SL(2, C) → L(C) is actually a surjective Lie group homomorphism onto L+ (C).
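That $\Psi_{A,B}$ preserves determinants — and hence that $\Phi_\mathbb{C}(A,B)$ preserves $\eta_\mathbb{C}$ — follows from $\det(A) = \det(B) = 1$. A quick numerical sanity check, with hand-picked $SL(2,\mathbb{C})$ matrices chosen only for this illustration:

```python
# Determinant preservation of Psi_{A,B}(Z) = A Z B^T for A, B in SL(2, C):
# det(A Z B^T) = det(A) det(Z) det(B) = det(Z). 2x2 matrices as nested lists.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det(X):
    return X[0][0] * X[1][1] - X[0][1] * X[1][0]

def transpose(X):
    return [[X[0][0], X[1][0]], [X[0][1], X[1][1]]]

# two matrices with determinant 1 (elements of SL(2, C))
A = [[1 + 2j, 0.5j], [0, 1 / (1 + 2j)]]
B = [[2, 1 + 1j], [1, (2 + 1j) / 2]]    # chosen so that det(B) = 1
Z = [[0.3 + 1j, -2], [4j, 1 - 1j]]      # an arbitrary Z = psi_C(z)

assert abs(det(A) - 1) < 1e-12 and abs(det(B) - 1) < 1e-12
PsiZ = matmul(matmul(A, Z), transpose(B))
print(abs(det(PsiZ) - det(Z)) < 1e-9)  # True: eta_C(z, z) is preserved
```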
The elements of the form (A, A) ∈ SL(2, C) × SL(2, C) form a subgroup isomorphic to SL(2, C)
and for such elements we have
$$\Phi_\mathbb{C}(A,A) = \Phi(A),$$
which simply follows from $\bar{A}^T = A^*$. So if we have a representation $D$ of $SL(2,\mathbb{C})$, then we can interpret this as a representation of the subgroup $\{(A,A) : A \in SL(2,\mathbb{C})\}$ of $SL(2,\mathbb{C}) \times SL(2,\mathbb{C})$.
According to the discussion in section 9.1A of [2], this representation can be uniquely extended
by analyticity to a representation DC of SL(2, C) × SL(2, C). We now apply these ideas to the
Wightman functions. According to our discussion above, the holomorphic functions $V^{hol}$ satisfy a transformation law of the form
$$\sum_k D(A^{-1})_{jk}\,V^{hol}_k(\Phi(A)z_1,\ldots,\Phi(A)z_n) = V^{hol}_j(z_1,\ldots,z_n),$$
where we write only a single index. This index corresponds to some basis for the vector space obtained by taking the $n$-fold tensor product of the $N$-dimensional vector space on which the representation $S$ acts (here $N$ is the number of field components in the theory and $S$ is the representation as in the Wightman axioms). We can decompose the representation $D$ into irreducible representations, which are of the form $D^{(A,B)}$, as we already noticed when we constructed general free fields in the previous chapter. If there are any representations with $A+B$ a half-odd integer, then the corresponding components must be zero, as follows from substituting $A = -1$ in the transformation law for $V^{hol}_j$ and using that $D^{(A,B)}(-1) = (-1)^{2(A+B)}$. The non-trivial irreducible components of $V^{hol}_j$ thus transform according to single-valued representations of the restricted Lorentz group $L^\uparrow_+$. Also, the analytic continuation of $D$ to a representation of $SL(2,\mathbb{C}) \times SL(2,\mathbb{C})$ will define a single-valued representation of the proper complex Lorentz group $L_+(\mathbb{C})$. We are now ready to state the following theorem of Bargmann, Hall and Wightman, the proof of which can be found in section 9.1B of [2] (theorem 9.1) or section 2.4 of [32] (theorem 2.11).
Theorem 4.13 (Bargmann-Hall-Wightman) Let $F_j(z_1,\ldots,z_n)$ with $j = 1,\ldots,N$ be a set of holomorphic functions defined on the tube $T_n$ that satisfies
$$\sum_k D(A^{-1})_{jk}\,F_k(\Phi(A)z_1,\ldots,\Phi(A)z_n) = F_j(z_1,\ldots,z_n)$$
for $A \in SL(2,\mathbb{C})$ and with $D$ a representation of $SL(2,\mathbb{C})$ the irreducible components of which are of the form $D^{(A,B)}$ with $A+B \in \mathbb{Z}_{\geq0}$. Then the $F_j$ can be uniquely extended by analytic continuation to a holomorphic function on the so-called extended tube
$$T'_n := \bigcup_{L\in L_+(\mathbb{C})} LT_n$$
and this extension satisfies
$$\sum_k D_\mathbb{C}(A^{-1},B^{-1})_{jk}\,F_k(\Phi_\mathbb{C}(A,B)z_1,\ldots,\Phi_\mathbb{C}(A,B)z_n) = F_j(z_1,\ldots,z_n)$$
for $(A,B) \in SL(2,\mathbb{C}) \times SL(2,\mathbb{C})$.
In view of our discussion preceding the theorem, the theorem states that every $L^\uparrow_+$-covariant holomorphic function on the tube has a unique analytic continuation to the extended tube which is $L_+(\mathbb{C})$-covariant. We can apply the theorem to the functions $V^{hol}_j(z_1,\ldots,z_n)$ to conclude that they are actually $L_+(\mathbb{C})$-covariant holomorphic functions on the extended tube. Similarly, the holomorphic Wightman functions $W^{hol}(z_1,\ldots,z_n)$ can be extended to $L_+(\mathbb{C})$-covariant holomorphic functions on the extended tube $T'_n := \bigcup_{L\in L_+(\mathbb{C})} LT_n$.
Although the tube $T_n$ did not contain any real points, the extended tube $T'_n$ does. These points are called Jost points. There is a simple characterization of Jost points, which can be found in section 2.4 of [32] (theorem 2.12) or in section 9.1C of [2] (proposition 9.5), but we will not need it.
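To see how real points can arise, here is a numerical sketch in a 1+1-dimensional analogue (the complex boost and the point $\zeta$ are hand-picked for this illustration): the complex boost of rapidity $i\pi/2$ preserves the complexified metric and maps $\zeta = -ie_0 \in T_1$ to a real spacelike point, i.e. a Jost point.

```python
import cmath

# complex boost of rapidity theta in 1+1 dimensions:
# L(theta) = [[cosh th, sinh th], [sinh th, cosh th]]
def boost(theta):
    return [[cmath.cosh(theta), cmath.sinh(theta)],
            [cmath.sinh(theta), cmath.cosh(theta)]]

def apply(L, z):
    return [L[0][0] * z[0] + L[0][1] * z[1],
            L[1][0] * z[0] + L[1][1] * z[1]]

def eta(z, w):                       # complexified Minkowski metric
    return z[0] * w[0] - z[1] * w[1]

L = boost(1j * cmath.pi / 2)         # a complex Lorentz transformation
zeta = [-1j, 0]                      # x = 0, gamma = (1, 0) in V+, so zeta in T_1

# L preserves eta_C ...
z, w = [0.3 + 1j, -2j], [1, 4 - 0.5j]
assert abs(eta(apply(L, z), apply(L, w)) - eta(z, w)) < 1e-12

# ... and maps zeta to a real spacelike point (a Jost point):
xi = apply(L, zeta)
print(xi)  # approximately [0, 1]: real, with eta(xi, xi) = -1 < 0
```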
4.1.4 Important theorems
We will now discuss some of the famous theorems that can be proved for any quantum field theory,
such as the spin-statistics theorem and the PCT theorem. To understand the main arguments
in the proofs of these theorems, it is useful to know something about polynomial algebras of
operators corresponding to open sets in Minkowski spacetime. In a quantum field theory with field
$\phi = (\phi_1,\ldots,\phi_N)$ we define for each open set $O \subset M$ the set $\mathcal{P}(O)$ consisting of all operators of the form
$$c\,1_H + \sum_{k=0}^M \phi_{j_1}(f_{k,1})\cdots\phi_{j_k}(f_{k,k})$$
with $c \in \mathbb{C}$, $M \in \mathbb{Z}_{\geq0}$ and the $f_{k,l}$ (with $1 \leq l \leq k$ and $0 \leq k \leq M$) functions in $\mathcal{S}(M)$ with support contained in $O$. It is clear that $\mathcal{P}(O)$ is a $*$-algebra; it is called the polynomial algebra
of O. According to the following theorem, the vacuum vector Ω ∈ H is cyclic for any P(O) with
O ⊂ M open. We will only give a sketch of the proof, since some of the details of the proof require
some more knowledge of holomorphic functions which will not be very relevant for our purposes;
the full proof can be found in section 4.2 of [32].
Theorem 4.14 (Reeh-Schlieder) Given some quantum field theory (H, D, U, φ, Ω), let O ⊂ M
be a non-empty open set and let Ψc ∈ H be cyclic for P(M). Then Ψc is also cyclic for P(O).
Proof sketch
Let $\Psi \in H$ be a vector which is orthogonal to the set $\{A\Psi_c\}_{A\in\mathcal{P}(O)}$. The first step in the proof consists of defining tempered distributions $F_{i_1^{(*)}\ldots i_n^{(*)}}$ by
$$F_{i_1^{(*)}\ldots i_n^{(*)}}(-x_1,\,x_1-x_2,\,\ldots,\,x_{n-1}-x_n) = \langle\phi_{i_1^{(*)}}(x_1)\cdots\phi_{i_n^{(*)}}(x_n)\Psi_c,\Psi\rangle$$
and by arguing that the inverse Fourier transforms of these distributions vanish unless all of the variables lie in the joint spectrum of the operators $P^\mu$ (which is a subset of $V^+$). Then theorem 4.9 is used to define a function $F^{hol}_{i_1\ldots i_n}$ which is holomorphic in the tube $T_n$ in the complex variables $(-x_1)-i\gamma_1,\,(x_1-x_2)-i\gamma_2,\,\ldots,\,(x_{n-1}-x_n)-i\gamma_n$ and which converges to $F_{i_1\ldots i_n}$ as $\gamma_1,\ldots,\gamma_n \to 0$ in $V^+$. By definition of $\Psi$, the supports of the distributions $F_{i_1\ldots i_n}$ lie in the complement of $\{(-x_1,x_1-x_2,\ldots,x_{n-1}-x_n) \in M^n : x_1,\ldots,x_n \in O\}$, which in turn implies that $F^{hol}_{i_1\ldots i_n}$ vanishes on the whole tube $T_n$ (this is a non-trivial argument from the theory of holomorphic functions in several complex variables). But then the distributions $F_{i_1\ldots i_n}$ vanish on the whole space $M^n$. From the definition of these distributions it then follows that $\Psi$ is in fact orthogonal to the set $\{A\Psi_c\}_{A\in\mathcal{P}(M)}$. Because $\Psi_c$ was cyclic for $\mathcal{P}(M)$, we must have $\Psi = 0$. This proves that $\Psi_c$ is also cyclic for $\mathcal{P}(O)$.
For any open set O ⊂ M we define an open set O∨ ⊂ M by
O∨ = {x ∈ M : (x − y)2 < 0 for all y ∈ O}◦ ,
where we use the notation A◦ to denote the interior of a set A. Note that if O is also bounded,
then O∨ will be non-empty. In that case the following theorem applies.
Theorem 4.15 Given some quantum field theory $(H, D, U, \phi, \Omega)$, let $O \subset M$ be a non-empty open set with $O^\vee \neq \emptyset$ and let $A \in \mathcal{P}(O)$ be a monomial with $A\Omega = 0$. Then $A = 0$.
Proof
Let $\Psi \in D$ and let $T^\vee \in \mathcal{P}(O^\vee)$. Then
$$\langle A^*\Psi, T^\vee\Omega\rangle = \langle\Psi, AT^\vee\Omega\rangle = 0,$$
where in the last step we used that $A$ either commutes or anti-commutes with each term in the polynomial $T^\vee$, since $O$ and $O^\vee$ are spacelike separated, so that $AT^\vee\Omega$ is a sum of terms of the form $\pm TA\Omega = 0$. By the previous theorem, $\{T^\vee\Omega\}_{T^\vee\in\mathcal{P}(O^\vee)}$ is dense in $H$, so we conclude that $A^*\Psi = 0$ for all $\Psi \in D$. For $\Psi_1,\Psi_2 \in D$ we then have $\langle A\Psi_1,\Psi_2\rangle = \langle\Psi_1, A^*\Psi_2\rangle = 0$. Because $D$ is dense in $H$, this implies that $A\Psi = 0$ for all $\Psi \in D$.
The (anti)commutator satisfies [A∗ , B ∗ ]± = (BA ± AB)∗ = ±[A, B]∗± , which implies that if the
field components φj and φk (anti)commute at spacelike distances then the adjoint components φ∗j
and φ∗k also (anti)commute at spacelike distances. Using the theorem above, we can also show that
if the field components φj and φk (anti)commute at spacelike distances then the field components
φj and φ∗k also (anti)commute at spacelike distances.
Theorem 4.16 (Dell'Antonio). Let $(H, D, U, (\phi_i)_{i=1}^N, \Omega)$ be a quantum field theory and let $j, k \in \{1,\ldots,N\}$. If we have at spacelike distances that
$$[\phi_j,\phi_k]_\pm = 0,$$
while
$$[\phi_j,\phi_k^*]_\mp = 0,$$
then either $\phi_j$ or $\phi_k$ vanishes.
Proof
For any non-zero $f, g \in \mathcal{S}(M)$ with spacelike separated supports we have, for $\Psi \in D$,
$$\phi_j(f)^*\phi_k(g)^*\phi_k(g)\phi_j(f)\Psi = \pm\,\phi_k(g)^*\phi_j(f)^*\phi_k(g)\phi_j(f)\Psi = (\mp)(\pm)\,\phi_k(g)^*\phi_k(g)\phi_j(f)^*\phi_j(f)\Psi = -\,\phi_k(g)^*\phi_k(g)\phi_j(f)^*\phi_j(f)\Psi.$$
Applying this to the vacuum vector $\Omega \in D$ we find the inequality
$$0 \geq -\|\phi_k(g)\phi_j(f)\Omega\|^2 = -\langle\phi_j(f)^*\phi_k(g)^*\phi_k(g)\phi_j(f)\Omega,\Omega\rangle = \langle\phi_k(g)^*\phi_k(g)\phi_j(f)^*\phi_j(f)\Omega,\Omega\rangle.$$
Suppose now that the supports K(f ), K(g) ⊂ M of f and g are compact and non-empty (and still
spacelike separated). Let a ∈ M be a spacelike vector such that the compact set Kλ (g) := K(g)+λa
remains spacelike separated from K(f ) for all λ > 0, and let gλ be the function gλ (x) = g(x − λa).
Then the support of $g_\lambda$ is clearly $K_\lambda(g)$, and for each $\lambda \geq 0$ the inequality above gives
$$\langle\phi_k(g_\lambda)^*\phi_k(g_\lambda)\phi_j(f)^*\phi_j(f)\Omega,\Omega\rangle \leq 0.$$
By the cluster decomposition property of Wightman functions, we have
$$\lim_{\lambda\to\infty}\langle\phi_k(g_\lambda)^*\phi_k(g_\lambda)\phi_j(f)^*\phi_j(f)\Omega,\Omega\rangle = \langle\phi_k(g)^*\phi_k(g)\Omega,\Omega\rangle\,\langle\phi_j(f)^*\phi_j(f)\Omega,\Omega\rangle = \|\phi_k(g)\Omega\|^2\,\|\phi_j(f)\Omega\|^2 \geq 0.$$
Together, these inequalities imply that $\|\phi_k(g)\Omega\|^2\|\phi_j(f)\Omega\|^2 = 0$, so either $\phi_j(f)\Omega = 0$ or $\phi_k(g)\Omega = 0$. According to the previous theorem, this in turn implies that either $\phi_j(f) = 0$ or $\phi_k(g) = 0$.
We thus conclude that for all f, g ∈ S(M) with spacelike separated non-empty compact supports
we have either φj (f ) = 0 or φk (g) = 0. Suppose that φj does not vanish. Then there exists a
function h ∈ S(M) with non-empty compact support K(h) such that φj (h) 6= 0. Then for any
function p ∈ S(M) with compact support K(p) which is spacelike separated from K(h), we have
φk (p) = 0. By considering different functions hi ∈ S(M) with compact supports K(hi ) ⊂ K(h)
and repeating the same argument, we find that φk (p) = 0 for all p ∈ S(M) with compact support.
Because the set of all such functions p is dense in S(M), this implies that φk vanishes. Similarly,
assuming that φk does not vanish will imply that φj vanishes.
As discussed before, in any quantum field theory we can decompose the field into fields which transform as an irreducible representation of $\widetilde{\mathcal{P}}^\uparrow_+$. In the previous chapter we found that irreducible fields which transform according to the $(A,B)$-representation can only describe particles with spin $j \in \{|A-B|,\,|A-B|+1,\,\ldots,\,A+B-1,\,A+B\}$. Therefore, an irreducible field which transforms according to the $(A,B)$-representation will be called a field of integer spin if $A+B$ is an integer and a field of half-odd integer spin if $A+B$ is a half-odd integer.
The following theorem shows that the components of an irreducible field of integer spin must commute with each other at spacelike distances and that the components of an irreducible field of half-odd integer spin must anticommute with each other at spacelike distances.
Theorem 4.17 (Spin-statistics theorem). Let $(H, D, U, \phi, \Omega)$ be a quantum field theory and let $\phi^{(\kappa)}$ be an irreducible field in the decomposition of $\phi$ into irreducible fields. Suppose that $\phi_j$ is a component of $\phi$ which belongs to $\phi^{(\kappa)}$. Then, if $\phi^{(\kappa)}$ is of integer spin and $\phi_j$ satisfies
$$[\phi_j(x),\phi_j^*(y)]_+ = 0$$
for $(x-y)^2 < 0$, or if $\phi^{(\kappa)}$ is of half-odd integer spin and $\phi_j$ satisfies
$$[\phi_j(x),\phi_j^*(y)]_- = 0$$
for $(x-y)^2 < 0$, then $\phi_j$ and $\phi_j^*$ vanish.
Proof sketch
Suppose that $\phi_j$ satisfies one of the two alternatives stated in the theorem. Then
$$V_{jj^*}(x-y) + (-1)^\epsilon V_{j^*j}(-(x-y)) = \langle\phi_j(x)\phi_j^*(y)\Omega,\Omega\rangle + (-1)^\epsilon\langle\phi_j^*(y)\phi_j(x)\Omega,\Omega\rangle = \langle(\phi_j(x)\phi_j^*(y) + (-1)^\epsilon\phi_j^*(y)\phi_j(x))\Omega,\Omega\rangle = 0,$$
where $\epsilon = 0$ for integer spin and $\epsilon = 1$ for half-odd integer spin. This implies that the corresponding holomorphic functions satisfy
$$V^{hol}_{jj^*}(\xi) + (-1)^\epsilon V^{hol}_{j^*j}(-\xi) = 0 \qquad (4.3)$$
on the tube $T_1$. It can be shown (see theorem 2.11 of [32]) that there exists a single-valued analytic continuation of the holomorphic functions $V^{hol}_{jj^*}(\xi)$ and $V^{hol}_{j^*j}(-\xi)$ to the extended tube$^{35}$ $T'_1$, and that
$$V^{hol}_{j^*j}(\xi) = (-1)^\epsilon V^{hol}_{j^*j}(-\xi),$$
where $\epsilon$ is as before. Combining this with (4.3) gives
$$0 = V^{hol}_{jj^*}(\xi) + (-1)^\epsilon(-1)^\epsilon V^{hol}_{j^*j}(\xi) = V^{hol}_{jj^*}(\xi) + V^{hol}_{j^*j}(\xi).$$
Passing to the boundary, we obtain
$$0 = V_{jj^*}(x-y) + V_{j^*j}(x-y) = V_{jj^*}(x-y) + V_{j^*j}(-y-(-x)) = \langle\phi_j(x)\phi_j^*(y)\Omega,\Omega\rangle + \langle\phi_j^*(-y)\phi_j(-x)\Omega,\Omega\rangle. \qquad (4.4)$$
Now let $f \in \mathcal{S}(M)$ and define $f^-(x) := f(-x)$. Then
$$\|\phi_j(f)^*\Omega\|^2 + \|\phi_j(f^-)\Omega\|^2 = \langle\phi_j(f)\phi_j(f)^*\Omega,\Omega\rangle + \langle\phi_j(f^-)^*\phi_j(f^-)\Omega,\Omega\rangle = \int_{M^2} f(x)\overline{f(y)}\,\big(\langle\phi_j(x)\phi_j^*(y)\Omega,\Omega\rangle + \langle\phi_j^*(-y)\phi_j(-x)\Omega,\Omega\rangle\big)\,dx\,dy = 0.$$
This implies both $\phi_j(f)^*\Omega = 0$ and $\phi_j(f^-)\Omega = 0$. Thus, because $f \in \mathcal{S}(M)$ was arbitrary, we have $\phi_j^*(x)\Omega = 0$ and $\phi_j(x)\Omega = 0$. Theorem 4.15 then implies that for all $f \in \mathcal{S}(M)$ with compact support we have $\phi_j^*(f) = 0$ and $\phi_j(f) = 0$, which in turn implies that $\phi_j^*$ and $\phi_j$ vanish.

$^{35}$ The extended tube $T'_1$ is the set of all points $\xi \in \mathbb{C}^4$ of the form $\xi = \Lambda\zeta$ with $\zeta \in T_1$ and $\Lambda$ a complex Lorentz transformation. A complex Lorentz transformation $\Lambda$ is a $4\times4$ complex matrix which satisfies $\Lambda^T\eta\Lambda = \eta$.
Together with our assumption that the components of an irreducible field either all commute
or else all anticommute with each other at spacelike distances, this theorem implies that at spacelike distances the components of an irreducible field all commute with each other if the irreducible
field is of integer spin and anticommute with each other if the irreducible field is of half-odd integer
spin. This is the famous connection between spin and statistics: if we identify commuting fields
with bosons and anticommuting fields with fermions, then bosons are described by fields of integer
spin and fermions by fields of half-odd integer spin.
The theorem gives no information about whether we should choose a commutator or anticommutator when we are dealing with components belonging to two different irreducible fields, so
there is some freedom here. We say that in a quantum field theory we have a normal connection
between spin and statistics if every component of a boson field commutes with any other field
component in the theory and if two components of (different) fermion fields always anticommute
with each other. It can be shown that in any quantum field theory in which there is no normal
connection between spin and statistics, there is always a transformation of the fields, called a Klein
transformation, which transforms the fields into new fields with a normal connection between spin
and statistics. Therefore, we may as well assume from now on that all quantum field theories have
a normal connection between spin and statistics. Then the following theorem applies.
Theorem 4.18 (PCT-theorem). Let $(H, D, U, \phi, \Omega)$ be a quantum field theory with normal connection between spin and statistics and let $\phi = (\phi^{(\kappa_1)}_{j_1},\ldots,\phi^{(\kappa_\ell)}_{j_\ell})$ be a decomposition of the field into irreducible fields $\phi^{(\kappa_j)}$ transforming according to the $(A_j,B_j)$-representation of $SL(2,\mathbb{C})$. Then there exists a unique anti-unitary operator $\Theta$ on $H$ which leaves the vacuum vector $\Omega$ invariant and satisfies
$$\Theta\phi^{(\kappa_j)}(x)\Theta^{-1} = (-1)^{2A_j}i^{\epsilon_j}\,\phi^{(\kappa_j)*}(-x),$$
where $\epsilon_j = 0$ if $A_j+B_j$ is an integer and $\epsilon_j = 1$ if $A_j+B_j$ is a half-odd integer.
Proof sketch
In the first part of the proof it is shown that in any quantum field theory with normal connection between spin and statistics we have
$$\langle\phi_{j_1}(x_1)\cdots\phi_{j_k}(x_k)\Omega,\Omega\rangle = i^{\sum_{l=1}^k\epsilon_{j_l}}(-1)^{2\sum_{l=1}^kA_{j_l}}\langle\phi_{j_1}^*(-x_1)\cdots\phi_{j_k}^*(-x_k)\Omega,\Omega\rangle, \qquad (4.5)$$
where $\epsilon_{j_l} = 0$ (or $\epsilon_{j_l} = 1$) if $\phi_{j_l}$ is a component of a boson (or fermion) field transforming according to the $(A_{j_l},B_{j_l})$-representation of $SL(2,\mathbb{C})$. For functions $f_l \in \mathcal{S}(M)$ this means that
$$\langle\phi_{j_1}(f_1)\cdots\phi_{j_k}(f_k)\Omega,\Omega\rangle = i^{\sum_{l=1}^k\epsilon_{j_l}}(-1)^{2\sum_{l=1}^kA_{j_l}}\langle\phi_{j_1}(\widehat{f}_1)^*\cdots\phi_{j_k}(\widehat{f}_k)^*\Omega,\Omega\rangle,$$
where $\widehat{f}(x) = \overline{f(-x)}$. From this it easily follows that
$$\begin{aligned}\langle\phi_{j_1}(f_1)^*\cdots\phi_{j_k}(f_k)^*\Omega,\Omega\rangle &= \langle\Omega,\phi_{j_k}(f_k)\cdots\phi_{j_1}(f_1)\Omega\rangle = \overline{\langle\phi_{j_k}(f_k)\cdots\phi_{j_1}(f_1)\Omega,\Omega\rangle}\\ &= \overline{i^{\sum_{l=1}^k\epsilon_{j_l}}(-1)^{2\sum_{l=1}^kA_{j_l}}\langle\phi_{j_k}(\widehat{f}_k)^*\cdots\phi_{j_1}(\widehat{f}_1)^*\Omega,\Omega\rangle}\\ &= (-i)^{\sum_{l=1}^k\epsilon_{j_l}}(-1)^{2\sum_{l=1}^kA_{j_l}}\,\overline{\langle\phi_{j_k}(\widehat{f}_k)^*\cdots\phi_{j_1}(\widehat{f}_1)^*\Omega,\Omega\rangle}\\ &= (-i)^{\sum_{l=1}^k\epsilon_{j_l}}(-1)^{2\sum_{l=1}^kA_{j_l}}\langle\phi_{j_1}(\widehat{f}_1)\cdots\phi_{j_k}(\widehat{f}_k)\Omega,\Omega\rangle.\end{aligned}$$
So just like (4.5) we also get
$$\langle\phi_{j_1}^*(x_1)\cdots\phi_{j_k}^*(x_k)\Omega,\Omega\rangle = (-i)^{\sum_{l=1}^k\epsilon_{j_l}}(-1)^{2\sum_{l=1}^kA_{j_l}}\langle\phi_{j_1}(-x_1)\cdots\phi_{j_k}(-x_k)\Omega,\Omega\rangle. \qquad (4.6)$$
Of course we can also make combinations of (4.5) and (4.6). The difference is that replacing a field component by its adjoint gives a factor $i^{\epsilon_j}(-1)^{2A_j}$, while replacing an adjoint field component by the corresponding field component gives a factor $(-i)^{\epsilon_j}(-1)^{2A_j}$. The next step in the proof consists of showing that the antilinear extension of
$$\Theta\phi_{j_1}(f_1)\cdots\phi_{j_k}(f_k)\Omega := (-i)^{\sum_{l=1}^k\epsilon_{j_l}}(-1)^{2\sum_{l=1}^kA_{j_l}}\,\phi_{j_1}^*(\widehat{f}_1)\cdots\phi_{j_k}^*(\widehat{f}_k)\Omega$$
defines the anti-unitary operator with the desired properties. In showing this, the identities above are used to derive the anti-unitarity.
In physics it is often convenient to consider quantum fields at a given time, for example when
one wants to study equal-time commutation relations for the fields. According to the Wightman
axioms, however, the fields are operator-valued distributions on Minkowski spacetime and therefore only the smeared fields φj (f ) for f ∈ S(M) define operators on the Hilbert space. In other
words, the Wightman fields must be smeared out both in time and space. Suppose now that in
addition to the Wightman axioms, we also assume that the fields $\phi_j$ define for each $t$ and each $f \in \mathcal{S}(\mathbb{R}^3)$ a well-defined operator $\phi_j(t,f)$ on the dense set $D \subset H$ such that for all $u \in \mathcal{S}(\mathbb{R})$ we have
$$\phi_j(fu) = \int_\mathbb{R}\phi_j(t,f)u(t)\,dt,$$
where $fu \in \mathcal{S}(M)$ is the function defined by $(fu)(t,\mathbf{x}) = f(\mathbf{x})u(t)$. Then the fields $\phi_j$ can also be considered as operator-valued distributions on $\mathcal{S}(\mathbb{R}^3)$ depending on a parameter $t$. To prevent bad $t$-dependence, we assume that for each $f \in \mathcal{S}(\mathbb{R}^3)$ and each $\Psi \in D$ the norm of the vector $\phi_j(t,f)\Psi$ is a bounded function of $t$. When a quantum field theory satisfies these properties we will simply say that it satisfies the sharp-time axiom.
For the following theorem we need the definition of the Euclidean group in three dimensions. The group $E_+(3)$ of proper Euclidean motions in $\mathbb{R}^3$ is generated by translations and rotations in $\mathbb{R}^3$. Its universal covering group $\widetilde{E}_+(3)$ is therefore a semi-direct product of $\mathbb{R}^3$ and $SU(2)$ (compare this with $\widetilde{\mathcal{P}}^\uparrow_+$, which was a semi-direct product of $\mathbb{R}^4$ and $SL(2,\mathbb{C})$) and the multiplication law is given by $(a_1,R_1)(a_2,R_2) = (a_1+R_1a_2, R_1R_2)$.
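This multiplication law is exactly the composition rule of the motions $x \mapsto Rx + a$, which can be confirmed numerically (a small sketch; the particular rotations and vectors are arbitrary choices):

```python
import math

def rot_z(t):  # rotation about the z-axis
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0], [s, c, 0], [0, 0, 1]]

def rot_x(t):  # rotation about the x-axis
    c, s = math.cos(t), math.sin(t)
    return [[1, 0, 0], [0, c, -s], [0, s, c]]

def matvec(R, v):
    return [sum(R[i][j] * v[j] for j in range(3)) for i in range(3)]

def matmul(R, S):
    return [[sum(R[i][k] * S[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def act(a, R, x):  # the Euclidean motion x -> R x + a
    return [c + d for c, d in zip(matvec(R, x), a)]

a1, R1 = [1.0, 0.0, -2.0], rot_z(0.7)
a2, R2 = [0.5, 3.0, 1.0], rot_x(-1.2)
x = [2.0, -1.0, 0.3]

# composing the two motions ...
lhs = act(a1, R1, act(a2, R2, x))
# ... agrees with the single motion given by (a1 + R1 a2, R1 R2):
rhs = act([c + d for c, d in zip(a1, matvec(R1, a2))], matmul(R1, R2), x)
print(all(abs(l - r) < 1e-12 for l, r in zip(lhs, rhs)))  # True
```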
Theorem 4.19 Let $\{(\phi_1)_j(t,\cdot)\}_{j=1}^n$ and $\{(\phi_2)_j(t,\cdot)\}_{j=1}^n$ be two sets of operator-valued distributions on $\mathcal{S}(\mathbb{R}^3)$, depending on a parameter $t$, that act on Hilbert spaces $H_1$ and $H_2$, respectively, and assume that the operator-valued distributions at any time $t$ form an irreducible set of operators$^{36}$. Suppose further for $i = 1,2$ that on $H_i$ there are defined unitary representations $U_i$ of $\widetilde{E}_+(3)$ such that
$$U_i(a,R)(\phi_i)_j(t,f)U_i(a,R)^{-1} = \sum_{k=1}^n S(R^{-1})_{jk}(\phi_i)_k(t,f_{(a,R)})$$
for all $f \in \mathcal{S}(\mathbb{R}^3)$, where $S$ is a matrix representation of $SU(2)$. Finally suppose that for some $t_0$ there exists a unitary operator $V : H_1 \to H_2$ such that
$$(\phi_2)_j(t_0,\cdot) = V(\phi_1)_j(t_0,\cdot)V^{-1}.$$
Then
(a) the representations $U_1$ and $U_2$ are unitarily equivalent:
$$U_2(a,R) = VU_1(a,R)V^{-1};$$
(b) if there exists in $H_1$ a unique (up to a phase) normalized vector $\Omega_1$ that is invariant under $U_1$, then there also exists in $H_2$ a unique (up to a phase) normalized vector, namely $\Omega_2 = V\Omega_1$, that is invariant under $U_2$.

$^{36}$ See the end of subsection 4.1.2 for the definition of an irreducible set of operators.
Proof
For $t = t_0$ we have for $f \in \mathcal{S}(\mathbb{R}^3)$
$$\begin{aligned}U_2(a,R)V(\phi_1)_j(t_0,f)V^{-1}U_2(a,R)^{-1} &= U_2(a,R)(\phi_2)_j(t_0,f)U_2(a,R)^{-1}\\ &= \sum_{k=1}^n S(R^{-1})_{jk}(\phi_2)_k(t_0,f_{(a,R)})\\ &= V\left(\sum_{k=1}^n S(R^{-1})_{jk}(\phi_1)_k(t_0,f_{(a,R)})\right)V^{-1}\\ &= VU_1(a,R)(\phi_1)_j(t_0,f)U_1(a,R)^{-1}V^{-1},\end{aligned}$$
which is equivalent to
$$(\phi_1)_j(t_0,f)V^{-1}U_2(a,R)^{-1} = V^{-1}U_2(a,R)^{-1}VU_1(a,R)(\phi_1)_j(t_0,f)U_1(a,R)^{-1}V^{-1},$$
which in turn is equivalent to
$$(\phi_1)_j(t_0,f)V^{-1}U_2(a,R)^{-1}VU_1(a,R) = V^{-1}U_2(a,R)^{-1}VU_1(a,R)(\phi_1)_j(t_0,f).$$
Thus, the operator $V^{-1}U_2(a,R)^{-1}VU_1(a,R)$ on $H_1$ commutes with all the $(\phi_1)_j(t_0,f)$ and is therefore a (non-zero) multiple of the identity operator:
$$V^{-1}U_2(a,R)^{-1}VU_1(a,R) = \omega(a,R)^{-1}1_{H_1}$$
or
$$U_2(a,R) = \omega(a,R)VU_1(a,R)V^{-1},$$
where $\omega(a,R)$ is a complex number depending on $(a,R)$. Now for any $T_1 = (a_1,R_1)$ and $T_2 = (a_2,R_2)$ we have
$$\omega(T_1T_2)VU_1(T_1T_2)V^{-1} = U_2(T_1T_2) = U_2(T_1)U_2(T_2) = \omega(T_1)\omega(T_2)VU_1(T_1)V^{-1}VU_1(T_2)V^{-1} = \omega(T_1)\omega(T_2)VU_1(T_1T_2)V^{-1},$$
so $\omega$ is in fact a one-dimensional representation of $\widetilde{E}_+(3)$ and hence $\omega(a,R) \equiv 1$. This proves part (a). Part (b) follows directly from the unitary equivalence of $U_1$ and $U_2$.
We will now state (a generalization of) Haag’s theorem. For simplicity, we will only state it
for scalar fields.
Theorem 4.20 (Generalized Haag's theorem) Let $(H_1,D_1,U_1,\phi_1,\Omega_1)$ and $(H_2,D_2,U_2,\phi_2,\Omega_2)$ be two scalar quantum field theories which satisfy the sharp-time axiom and the fields of which have well-defined time-derivatives at each time $t$. For $i = 1,2$, suppose that for each $t$ the fields $\phi_i(t,\cdot)$ and $\partial_t\phi_i(t,\cdot)$ together form an irreducible set of fields on $H_i$. Suppose also that for some instant $t_0$ there exists a unitary operator $V : H_1 \to H_2$ such that
$$\phi_2(t_0,\cdot) = V\phi_1(t_0,\cdot)V^{-1}, \qquad \partial_t\phi_2(t_0,\cdot) = V\partial_t\phi_1(t_0,\cdot)V^{-1}.$$
Then
(a) the first four Wightman functions are the same in both quantum field theories;
(b) if $\phi_1$ is a free field of mass $m \geq 0$, then $\phi_2$ is also a free field of mass $m$ and both theories are unitarily equivalent.
Part (b) of the theorem is the original theorem of Haag, and its truth follows from the first part
because the two-point Wightman functions in a free scalar field theory completely determine the
other Wightman functions. A proof of this theorem can be found in [2], theorem 9.28.
4.1.5 Example: The free hermitean scalar field
Let $H = L^2(\mathbb{R}^3,\frac{d^3p}{2p^0})$ be the one-particle state space for a spinless particle with mass $m \geq 0$ that is equal to its own antiparticle. We denote the corresponding Fock space by $\mathcal{F}_+(H)$ and on this space we define the creation and annihilation operators $A^*(\Psi) := A^*_+(\Psi)$ and $A(\Psi) := A_+(\Psi)$ for each $\Psi \in H$. On the Fock space we also have a unitary representation $U_{Fock}$ of $\widetilde{\mathcal{P}}^\uparrow_+$ with corresponding energy-momentum operators $P^\mu$ that satisfy the positive energy condition in axiom 0 and we also have a vacuum vector $\Omega$ that is the unique unit vector (up to a phase) in $\mathcal{F}_+(H)$ that is invariant under $U_{Fock}$. We will now construct a hermitean scalar field in $\mathcal{F}_+(H)$ that satisfies the remaining Wightman axioms.
For each Schwartz function $f \in \mathcal{S}(M)$ let $\widehat{f}$ denote its Fourier transform $\widehat{f}(p) = \frac{1}{(2\pi)^2}\int_M f(x)e^{ip\cdot x}\,d^4x$. Because $\widehat{f}$ is again a Schwartz function, its restriction $\widehat{f}|_{O_m^+}$ to the orbit $O_m^+$ is an element of $L^2(\mathbb{R}^3,\frac{d^3p}{2p^0}) = H$. We can thus define a map $R : \mathcal{S}(M) \to H$ by
$$R(f) = \widehat{f}|_{O_m^+}.$$
Explicitly,
$$(Rf)(p) = \frac{1}{(2\pi)^2}\int_M f(x)e^{i(\omega_px^0-\mathbf{p}\cdot\mathbf{x})}\,d^4x. \qquad (4.7)$$
Because $R(f) \in H$, the operators $A^*(Rf)$ and $A(Rf)$ are well-defined and we can use them to define for each real-valued $f \in \mathcal{S}(M)$ the operator
$$\phi(f) = \sqrt{2\pi}\,(A(Rf) + A^*(Rf)).$$
For complex-valued f = f1 + if2 ∈ S(M), with f1 and f2 the real and imaginary parts of f ,
we define φ(f ) := φ(f1 ) + iφ(f2 ). The reason for not defining φ(f ) by the same formula as for
real-valued functions is that fields should depend linearly on the Schwartz function f (recall that
annihilation operators A(Ψ) depend anti-linearly on Ψ). Because for each Ψ ∈ H the operators
A(∗) (Ψ) are defined on the dense subspace D+ ⊂ F+ (H) (which was defined in subsection 2.2.5),
the operators φ(f ) are defined on the dense subspace D+ for any f ∈ S(M). Also, because the
A(∗) (Ψ) all leave D+ invariant, the operators φ(f ) also leave D+ invariant. Furthermore, for any
$\Psi_1,\Psi_2 \in D_+$ the map $\mathcal{S}(M) \to \mathbb{C}$, given by
$$f \mapsto \langle\phi(f)\Psi_1,\Psi_2\rangle,$$
is a tempered distribution. Thus, $f \mapsto \phi(f)$ is an operator-valued distribution and each such $\phi(f)$
is defined on the dense subspace D+ ⊂ F+ (H) and leaves this subspace invariant. For each f the
adjoint $\phi(f)^*$ is defined on $D_+$, so axiom 1 is satisfied. From the transformation properties of the creation and annihilation operators under $\widetilde{\mathcal{P}}^\uparrow_+$ (as derived in subsection 2.2.5), it follows that $\phi$ transforms as
$$U(a,A)\phi(f)U(a,A)^{-1} = \phi(f_{(a,A)}),$$
so axiom 2 is also satisfied. For real-valued $f, g \in \mathcal{S}(M)$ we have
$$\begin{aligned}[A(Rf),A^*(Rg)] &= \langle Rg,Rf\rangle = \int_{\mathbb{R}^3}(Rg)(p)\overline{(Rf)(p)}\,\frac{d^3p}{2\omega_p}\\ &= \frac{1}{(2\pi)^4}\int_{\mathbb{R}^3}\left(\int_M e^{i(\omega_py^0-\mathbf{p}\cdot\mathbf{y})}g(y)\,d^4y\right)\left(\int_M e^{-i(\omega_px^0-\mathbf{p}\cdot\mathbf{x})}f(x)\,d^4x\right)\frac{d^3p}{2\omega_p}\\ &= \frac{1}{(2\pi)^4}\int_M\int_M\left(\int_{\mathbb{R}^3}e^{-i(\omega_p(x-y)^0-\mathbf{p}\cdot(\mathbf{x}-\mathbf{y}))}\frac{d^3p}{2\omega_p}\right)f(x)g(y)\,d^4x\,d^4y,\end{aligned}$$
so for real-valued $f, g \in \mathcal{S}(M)$ we have
$$\begin{aligned}[\phi(f),\phi(g)] &= 2\pi\left([A(Rf),A^*(Rg)] + [A^*(Rf),A(Rg)]\right) = 2\pi\left([A(Rf),A^*(Rg)] - [A(Rf),A^*(Rg)]^*\right)\\ &= -\frac{2i}{(2\pi)^3}\int_M\int_M\left(\int_{\mathbb{R}^3}\sin(\omega_p(x-y)^0-\mathbf{p}\cdot(\mathbf{x}-\mathbf{y}))\frac{d^3p}{2\omega_p}\right)f(x)g(y)\,d^4x\,d^4y.\end{aligned}$$
For complex-valued $f = f_1+if_2$ and $g = g_1+ig_2$ we then find
$$\begin{aligned}[\phi(f),\phi(g)] &= [\phi(f_1)+i\phi(f_2),\,\phi(g_1)+i\phi(g_2)]\\ &= -\frac{2i}{(2\pi)^3}\int_M\int_M\int_{\mathbb{R}^3}\sin(\omega_p(x-y)^0-\mathbf{p}\cdot(\mathbf{x}-\mathbf{y}))\frac{d^3p}{2\omega_p}\,[f_1(x)g_1(y)-f_2(x)g_2(y)+if_1(x)g_2(y)+if_2(x)g_1(y)]\,d^4x\,d^4y\\ &= -\frac{2i}{(2\pi)^3}\int_M\int_M\int_{\mathbb{R}^3}\sin(\omega_p(x-y)^0-\mathbf{p}\cdot(\mathbf{x}-\mathbf{y}))\frac{d^3p}{2\omega_p}\,f(x)g(y)\,d^4x\,d^4y.\end{aligned}$$
The inner $\mathbf{p}$-integral is a distribution in the variable $x-y$ and vanishes at points where $x-y$ is spacelike. Therefore, if the supports of $f$ and $g$ are mutually spacelike separated then $[\phi(f),\phi(g)] = 0$. So axiom 3 is also satisfied. It can also be shown that the Fock vacuum vector $\Omega$ is cyclic for the field operators $\phi(f)$, so axiom 4 is also satisfied. Thus all Wightman axioms are satisfied; see section 8.4 of [2] for more details. Note that for the 2-point Wightman function we have
$$\langle\phi(f)\phi(g)\Omega,\Omega\rangle = 2\pi\langle A(Rf)A^*(Rg)\Omega,\Omega\rangle = 2\pi\langle[A(Rf),A^*(Rg)]\Omega,\Omega\rangle = \frac{1}{(2\pi)^3}\int_M\int_M\int_{\mathbb{R}^3}e^{-i(\omega_p(x-y)^0-\mathbf{p}\cdot(\mathbf{x}-\mathbf{y}))}\frac{d^3p}{2\omega_p}\,f(x)g(y)\,d^4x\,d^4y,$$
or
$$W(x,y) = \frac{1}{(2\pi)^3}\int_{\mathbb{R}^3}e^{-i(\omega_p(x-y)^0-\mathbf{p}\cdot(\mathbf{x}-\mathbf{y}))}\frac{d^3p}{2\omega_p}. \qquad (4.8)$$
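The vanishing of the commutator part of the two-point function at spacelike separation can be seen numerically in a 1+1-dimensional analogue of (4.8) — an assumption of this sketch, with a small imaginary time shift implementing the tube boundary-value prescription. The imaginary part of $W$, which carries the commutator, is negligible at a spacelike point but not at a timelike one:

```python
import cmath, math

m, eps = 1.0, 0.05   # mass and the boundary-value regulator t -> t - i*eps

def W(t, x, pmax=200.0, n=40001):
    # 1+1-dimensional analogue of (4.8):
    # W(t, x) = (1/(4 pi)) \int dp e^{-i(omega_p t - p x)} / omega_p,
    # evaluated at t - i*eps and discretized by a trapezoidal rule
    s = 0.0 + 0.0j
    dp = 2 * pmax / (n - 1)
    for k in range(n):
        p = -pmax + k * dp
        w = (1.0 if 0 < k < n - 1 else 0.5) * dp
        om = math.sqrt(p * p + m * m)
        s += w * cmath.exp(-1j * (om * (t - 1j * eps) - p * x)) / om
    return s / (4 * math.pi)

Wspace = W(0.4, 2.0)   # spacelike separation: commutator part should vanish
Wtime = W(2.0, 0.4)    # timelike separation: commutator part is non-zero
print(abs(Wspace.imag), abs(Wtime.imag))
```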
For odd n the n-point Wightman functions are zero and for even n one can express the n-point
function in terms of the n−2 point function and the 2-point function, and hence the 2-point function
determines the other n-point functions. This was also mentioned briefly when we discussed Haag’s
theorem, but we will not prove it here; these statements about the n-point functions can (for
example) be found in section 8.4 of [2] or in section 3.3 of [32].
For any Schwartz function $f \in \mathcal{S}(M)$ the Fourier transform $[(\partial^2+m^2)f]^\wedge$ of $(\partial^2+m^2)f$ is given by
$$[(\partial^2+m^2)f]^\wedge(p) = \frac{1}{(2\pi)^2}\int_M e^{ip\cdot x}(\partial^2+m^2)f(x)\,d^4x = \frac{1}{(2\pi)^2}\int_M\underbrace{[\partial^2e^{ip\cdot x}]}_{=-p^2e^{ip\cdot x}}f(x)\,d^4x + m^2\widehat{f}(p) = (m^2-p^2)\widehat{f}(p).$$
So the restriction of this Fourier transform to $O_m^+ = \{p \in M : p^2 = m^2,\ p^0 > 0\}$ is identically zero; in other words, $R((\partial^2+m^2)f) \equiv 0$. This implies that the field $\phi$ satisfies the Klein-Gordon equation:
$$[(\partial^2+m^2)\phi](f) = \phi((\partial^2+m^2)f) = 0$$
for any $f \in \mathcal{S}(M)$.
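The identity $[(\partial^2+m^2)f]^\wedge = (m^2-p^2)\widehat{f}$, and hence the vanishing of its mass-shell restriction, can be checked numerically — a sketch in 1+1 dimensions with a Gaussian $f$, finite-difference derivatives and a plain Riemann sum for the Fourier transform; grid and test momentum are arbitrary choices of this illustration:

```python
import cmath, math

m = 1.0
h, N = 0.05, 201                        # grid spacing and points per axis
grid = [-5.0 + h * i for i in range(N)]

def f(t, x):                            # a Gaussian test function
    return math.exp(-t * t - x * x)

def box_f(t, x):                        # (d_t^2 - d_x^2) f by central differences
    return ((f(t + h, x) - 2 * f(t, x) + f(t - h, x))
            - (f(t, x + h) - 2 * f(t, x) + f(t, x - h))) / (h * h)

def ft(g, p0, p1):                      # (1/(2 pi)) \int g(t,x) e^{i(p0 t - p1 x)} dt dx
    s = sum(g(t, x) * cmath.exp(1j * (p0 * t - p1 * x))
            for t in grid for x in grid)
    return s * h * h / (2 * math.pi)

k = 0.7
omega = math.sqrt(k * k + m * m)        # on-shell energy, so that p^2 = m^2

# off the mass shell the transform of (box + m^2) f is of order one ...
off = ft(lambda t, x: box_f(t, x) + m * m * f(t, x), 2.0, k)
# ... but on the mass shell it vanishes, up to discretization error
on = ft(lambda t, x: box_f(t, x) + m * m * f(t, x), omega, k)
print(abs(off), abs(on))
```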
We will now write the field $\phi$ in terms of the creation and annihilation operators $a^{(*)}$ defined on the Fock space $\mathcal{F}_+(H)$. As indicated in subsection 2.2.5, the physical equivalent of $A^{(*)}(\Psi)$ is $a^{(*)}(J\Psi)$, with $J : H \ni \Psi \mapsto \frac{1}{\sqrt{2\omega_p}}\Psi \in H$. In the present case this means that we should define the map $r : \mathcal{S}(M) \to H$ by $rf = JRf$, so
$$(rf)(p) = \frac{1}{(2\pi)^2\sqrt{2\omega_p}}\int_M f(x)e^{i(\omega_px^0-\mathbf{p}\cdot\mathbf{x})}\,d^4x.$$
On F₊(H) the field φ(f) for real-valued f ∈ S(M) is now given by
\[
\phi(f) = \sqrt{2\pi}\,\bigl(a^*(JRf) + a(JRf)\bigr) = \sqrt{2\pi}\,\bigl(a^*(rf) + a(rf)\bigr)
= \sqrt{2\pi}\int_{\mathbb{R}^3} d^3p\,\Bigl[(rf)(p)\,a^*(p) + \overline{(rf)(p)}\,a(p)\Bigr]
\]
\[
= (2\pi)^{-3/2}\int_{\mathbb{R}^3}\frac{d^3p}{\sqrt{2\omega_p}}\Bigl[\Bigl(\int_M f(x)e^{i(\omega_p x^0-p\cdot x)}\,d^4x\Bigr)a^*(p) + \Bigl(\int_M f(x)e^{-i(\omega_p x^0-p\cdot x)}\,d^4x\Bigr)a(p)\Bigr]
\]
\[
= \int_M\Bigl\{(2\pi)^{-3/2}\int_{\mathbb{R}^3}\frac{d^3p}{\sqrt{2\omega_p}}\bigl[e^{-i(\omega_p x^0-p\cdot x)}a(p) + e^{i(\omega_p x^0-p\cdot x)}a^*(p)\bigr]\Bigr\}f(x)\,d^4x.
\]
For this reason we also write
\[
\phi(x) = (2\pi)^{-3/2}\int_{\mathbb{R}^3}\frac{d^3p}{\sqrt{2\omega_p}}\bigl[e^{-i(\omega_p x^0-p\cdot x)}a(p) + e^{i(\omega_p x^0-p\cdot x)}a^*(p)\bigr].
\]
As stated in the previous chapter, the a∗(p) and a(p) are not well-defined operators on the Fock space, but since we are smearing them out this is no problem. However, as we will show when we discuss the (λφ⁴)₂-model in the next chapter, this does not mean that a∗(p) and a(p) cannot be given any mathematical meaning without smearing them out.
We will now define the notion of the field φ at a fixed moment in time, on the Fock space
F+ (H). Analogous to (4.7) we define for each t ∈ R and for each Schwartz function f ∈ S(R3 ) on
R³ a map Rt : S(R³) → H by
\[
(R_tf)(p) = (2\pi)^{-3/2}\int_{\mathbb{R}^3} f(x)\,e^{i(\omega_pt-p\cdot x)}\,d^3x = (2\pi)^{-3/2}\,e^{i\omega_pt}\int_{\mathbb{R}^3} f(x)\,e^{-ip\cdot x}\,d^3x.
\]
Then, for each t ∈ R and each real-valued f ∈ S(R3 ) we can define an operator
φt (f ) = A∗ (Rt f ) + A(Rt f )
on the Fock space. We then extend φt to complex-valued functions f = f1 + if2 by defining
φt(f) = φt(f1) + iφt(f2). The operators φt(f) are defined on D₊ and it can be shown that the
map t 7→ hφt (f )Ψ1 , Ψ2 i is smooth for any f ∈ S(R3 ) and Ψ1 , Ψ2 ∈ H. We will now investigate the
relationship between φt and φ. For each Schwartz function u ∈ S(R) on R we find that for any
f ∈ S(R3 )
\[
\int_{\mathbb{R}} (R_tf)(p)\,u(t)\,dt = (2\pi)^{-3/2}\int_{\mathbb{R}}\Bigl[\int_{\mathbb{R}^3} f(x)\,e^{ip^0t}e^{-ip\cdot x}\,d^3x\Bigr]u(t)\,dt
= (2\pi)^{-3/2}\int_M f(x)u(x^0)\,e^{ip\cdot x}\,d^4x
= \sqrt{2\pi}\,[R(f\cdot u)](p).
\]
For real-valued f ∈ S(R3 ) and real-valued u ∈ S(R), this implies that
\[
\int_{\mathbb{R}} u(t)\,A^{(*)}(R_tf)\,dt = \sqrt{2\pi}\,A^{(*)}(R(f\cdot u)),
\]
and therefore for real-valued f ∈ S(R³) and real-valued u ∈ S(R) we find that
\[
\int_{\mathbb{R}} u(t)\,\phi_t(f)\,dt = \phi(f\cdot u).
\]
This can then be extended by linearity to complex-valued f and u. This establishes the relationship between φt and φ: the operator-valued distribution φt on S(R³) is nothing else than the operator-valued distribution φ on S(M) at fixed time t. For the time-derivative of the field at time t we find
\[
\partial_t\phi_t(f) = A(\partial_tR_tf)^* + A(\partial_tR_tf) = A(i\omega_pR_tf)^* + A(i\omega_pR_tf).
\]
So in order to obtain the commutators of the φt and their derivatives, we must compute commutators of the form [A(h₁·Rtf), A(h₂·Rtg)∗], where the hj = hj(p) are either identically one, or else iωp. For these commutators we find
\[
[A(h_1\cdot R_tf),A(h_2\cdot R_tg)^*] = \langle h_2\cdot R_tg,\,h_1\cdot R_tf\rangle
= \frac{1}{(2\pi)^3}\int_{\mathbb{R}^3}\frac{d^3p}{2\omega_p}\,h_2(p)e^{i\omega_pt}\Bigl(\int_{\mathbb{R}^3}g(y)e^{-ip\cdot y}\,d^3y\Bigr)\,\overline{h_1(p)e^{i\omega_pt}\int_{\mathbb{R}^3}f(x)e^{-ip\cdot x}\,d^3x}
\]
\[
= \frac{1}{(2\pi)^3}\int_{\mathbb{R}^3}\int_{\mathbb{R}^3}\Bigl[\int_{\mathbb{R}^3}\frac{\overline{h_1(p)}\,h_2(p)}{2\omega_p}\,e^{ip\cdot(x-y)}\,d^3p\Bigr]f(x)g(y)\,d^3x\,d^3y.
\]
The commutator of φt with itself now follows from choosing h₁ = h₂ ≡ 1:
\[
[\phi_t(f),\phi_t(g)] = [A(R_tf),A^*(R_tg)] - [A(R_tf),A^*(R_tg)]^*
= \frac{2i}{(2\pi)^3}\int_{\mathbb{R}^3}\int_{\mathbb{R}^3}\Bigl[\int_{\mathbb{R}^3}\frac{\sin[p\cdot(x-y)]}{2\omega_p}\,d^3p\Bigr]f(x)g(y)\,d^3x\,d^3y = 0.
\]
The commutator of ∂tφt with itself follows from choosing h₁ = h₂ = iωp:
\[
[\partial_t\phi_t(f),\partial_t\phi_t(g)] = [A(i\omega_pR_tf),A^*(i\omega_pR_tg)] - [A(i\omega_pR_tf),A^*(i\omega_pR_tg)]^*
= \frac{2i}{(2\pi)^3}\int_{\mathbb{R}^3}\int_{\mathbb{R}^3}\Bigl[\int_{\mathbb{R}^3}\frac{\omega_p\sin[p\cdot(x-y)]}{2}\,d^3p\Bigr]f(x)g(y)\,d^3x\,d^3y = 0.
\]
Finally, the commutator of φt with ∂tφt is obtained by taking h₁ ≡ 1 and h₂ = iωp:
\[
[\phi_t(f),\partial_t\phi_t(g)] = [A(R_tf),A^*(i\omega_pR_tg)] - [A(R_tf),A^*(i\omega_pR_tg)]^*
= \frac{i}{(2\pi)^3}\int_{\mathbb{R}^3}\int_{\mathbb{R}^3}\Bigl[\int_{\mathbb{R}^3}\cos[p\cdot(x-y)]\,d^3p\Bigr]f(x)g(y)\,d^3x\,d^3y
= i\int_{\mathbb{R}^3}\int_{\mathbb{R}^3}\delta(x-y)f(x)g(y)\,d^3x\,d^3y.
\]
We have thus found the commutation relations
\[
[\phi_t(x),\phi_t(y)] = 0 = [\partial_t\phi_t(x),\partial_t\phi_t(y)],\qquad
[\phi_t(x),\partial_t\phi_t(y)] = i\delta(x-y).
\]
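These relations can be illustrated in a truncated single-mode Fock space (a toy analogue of our own devising; the field-theoretic statement involves distributions and is not captured by finite matrices): with φ = (a + a∗)/√2 and π = i(a∗ − a)/√2 one expects [φ, π] = i away from the truncation level.

```python
import numpy as np

# Single-mode analogue of [phi_t(x), d_t phi_t(y)] = i delta(x - y):
# truncate Fock space to N levels; the relation [phi, pi] = i then holds
# on the sub-block unaffected by the cutoff.
N = 12
a = np.diag(np.sqrt(np.arange(1, N)), k=1)   # truncated annihilation operator
adag = a.conj().T

phi = (a + adag) / np.sqrt(2)
pi = 1j * (adag - a) / np.sqrt(2)

comm = phi @ pi - pi @ phi                   # equals i * [a, a^dagger]
err = np.max(np.abs(comm[:N-1, :N-1] - 1j * np.eye(N)[:N-1, :N-1]))
assert err < 1e-12
```

The bottom-right entry of the commutator is distorted by the truncation (it equals −i(N−1)), which is why only the top-left block is compared.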
As we did for the field φ, we can also define the field φt on F+ (H). This is done by using the map
rt : S(R3 ) → H which is given by
\[
(r_tf)(p) = \frac{1}{(2\pi)^{3/2}\sqrt{2\omega_p}}\,e^{i\omega_pt}\int_{\mathbb{R}^3} f(x)\,e^{-ip\cdot x}\,d^3x.
\]
The field φt on F+ (H) is then defined by
φt (f ) = a∗ (rt f ) + a(rt f )
for real-valued f ∈ S(R3 ). The result is
\[
\phi_t(f) = \int_{\mathbb{R}^3}\Bigl\{(2\pi)^{-3/2}\int_{\mathbb{R}^3}\frac{d^3p}{\sqrt{2\omega_p}}\bigl[e^{-i(\omega_pt-p\cdot x)}a(p) + e^{i(\omega_pt-p\cdot x)}a^*(p)\bigr]\Bigr\}f(x)\,d^3x.
\]
This suggests that we should write
\[
\phi_t(x) = (2\pi)^{-3/2}\int_{\mathbb{R}^3}\frac{d^3p}{\sqrt{2\omega_p}}\bigl[e^{-i(\omega_pt-p\cdot x)}a(p) + e^{i(\omega_pt-p\cdot x)}a^*(p)\bigr].
\]
The right-hand side is the same as φ(x) with x0 = t, which reflects the fact that φt is precisely the
field φ at time t.
4.1.6 Haag-Ruelle scattering theory
In the previous chapter we showed that quantum fields arise quite naturally in the (perturbative) calculations in scattering theory. Before we introduced the perturbation theory, we mentioned that there must be some embeddings Ωin and Ωout from the Fock space HFock (describing free particles) into the physical Hilbert space H corresponding to the scattering experiment. We will now show that under some additional conditions, a quantum field theory (i.e. a theory satisfying the Wightman axioms) gives rise to such embeddings (we will not provide the proofs of the relevant theorems, since very detailed versions can be found in chapter 12 of [2]; another good source for Haag-Ruelle theory is [20], sections II.3 and II.4) and can therefore describe scattering experiments. The additional conditions are the following.
Haag-Ruelle axiom 1
The joint spectrum of the operators P^µ lies in the set {0} ∪ V⁺_µ, where V⁺_µ = {p ∈ M : p · p > µ and p⁰ ≥ 0}.
Haag-Ruelle axiom 2
The Hilbert space H contains countably many mutually orthogonal subspaces {H[τ]}τ∈T (so-called one-particle subspaces for particles of type τ) which transform according to irreducible unitary representations (mτ, sτ) of P̃₊↑ and are taken into H[τ^C] under PCT-transformations. Furthermore, for each particle type τ ∈ T in the theory there exists an operator Aτ ∈ P(M) in the polynomial algebra such that Aτ Ω is a non-zero vector in H[τ] and A∗τ Ω ∈ H[τ^C].
Given the subspaces {H[τ ] }τ ∈T , the operators Aτ are called the solutions of the quantum field
problem of one-particle states. Sometimes the one-particle problem has a simple solution; for instance, this is the case if the mass m > 0 of the particle is an isolated point of the spectrum of the
mass operator.
In a theory satisfying the Wightman axioms and the Haag-Ruelle axioms we define for each particle type τ ∈ T the linear span B[τ] of all operators of the form U(a, L)Aτ U(a, L)⁻¹ with (a, L) ∈ P̃₊↑. We then define the space A[τ] := B[τ] + (B[τ^C])∗. Then A[τ] is a linear subspace of P(M) that is taken to A[τ^C] under hermitean conjugation, is invariant with respect to restricted Poincaré transformations and is such that D[τ] := A[τ]Ω is dense in H[τ]. Thus, for each particle type τ ∈ T essentially all one-particle states Ψ ∈ H[τ] can be constructed by letting an operator in the polynomial algebra act on the vacuum vector, and when the adjoint of this operator acts on the vacuum vector then this gives a one-particle state of the corresponding antiparticle τ^C.
For each operator A ∈ A[τ] we define a family of operators {Aᵗ}t∈R by
\[
A^t = \int_{x^0=t}\Bigl[A(x)\frac{\partial}{\partial x^0}D_{m_\tau}(x) - D_{m_\tau}(x)\frac{\partial}{\partial x^0}A(x)\Bigr]\,d^3x,
\]
where A(x) := U(x, 1)AU(x, 1)⁻¹ and D_m(x) := 2πi ∫_M ε(p⁰)δ(p² − m²)e^{−ip·x} d⁴p/(2π)⁴, with ε(p⁰) the sign of p⁰. An important property of the family {Aᵗ}t∈R is that each element Aᵗ acts in the same way on the vacuum as A:
\[
A^t\Omega = A\Omega. \tag{4.9}
\]
The first part of the main result of Haag-Ruelle theory is that for Aj ∈ A[τj] with j = 1, …, n the limits limt→∓∞ A₁ᵗ···Aₙᵗ Ω exist in H. These limits are denoted by Ψin and Ψout:
\[
\Psi^{\mathrm{in}}(A_1,\ldots,A_n) = \lim_{t\to-\infty} A_1^t\cdots A_n^t\,\Omega,\qquad
\Psi^{\mathrm{out}}(A_1,\ldots,A_n) = \lim_{t\to+\infty} A_1^t\cdots A_n^t\,\Omega.
\]
To understand the second part of the result, let HFock be the Fock space describing a free system
of particles of types τ ∈ T . We will identify the vacuum vector ΩFock ∈ HFock with the vacuum
vector Ω ∈ H and the one-particle states in HFock with the one-particle states in H. Then for each
A ∈ A[τ ] we can interpret AΩ and A∗ Ω as one-particle states in the Fock space HFock , describing a
free particle of type τ and of type τ C , respectively. Using the creation and annihilation operators
defined in subsection 2.2.5, we define the operator
\[
\hat A := A^*_\tau(A\Omega) + A^*_{\tau^C}(A^*\Omega)
\]
on the Fock space. For each n ∈ N and each n-tuple (A1 , . . . , An ) with Aj ∈ A[τj ] we then define
a Fock space vector ΨFock (A1 , . . . , An ) by
\[
\Psi_{\mathrm{Fock}}(A_1,\ldots,A_n) = \hat A_1\cdots\hat A_n\,\Omega_{\mathrm{Fock}},
\]
and the closed linear span of all such vectors is the entire Fock space. A nice property of ΨFock is
that it satisfies
\[
U_{\mathrm{Fock}}(a,L)\Psi_{\mathrm{Fock}}(A_1,\ldots,A_n) = \Psi_{\mathrm{Fock}}\bigl(U(a,L)A_1U(a,L)^{-1},\ldots,U(a,L)A_nU(a,L)^{-1}\bigr),
\]
where (a, L) ∈ P̃₊↑ and UFock is the representation of P̃₊↑ on HFock. The second part of the main result of Haag-Ruelle theory now states that there exist two linear isometries Ωin, Ωout : HFock → H satisfying
\[
\Omega^{\mathrm{in/out}}\bigl(\Psi_{\mathrm{Fock}}(A_1,\ldots,A_n)\bigr) = \Psi^{\mathrm{in/out}}(A_1,\ldots,A_n),
\]
with Aj ∈ A[τj]. Furthermore, this property determines Ωin and Ωout uniquely and these maps are Poincaré invariant, i.e.
\[
U(a,L)\,\Omega^{\mathrm{in/out}} = \Omega^{\mathrm{in/out}}\,U_{\mathrm{Fock}}(a,L)
\]
for all (a, L) ∈ P̃₊↑, which in turn implies that the S-operator is Poincaré invariant:
\[
U_{\mathrm{Fock}}(a,L)\,S\,U_{\mathrm{Fock}}(a,L)^{-1} = U_{\mathrm{Fock}}(a,L)(\Omega^{\mathrm{out}})^*\Omega^{\mathrm{in}}U_{\mathrm{Fock}}(a,L)^{-1}
\]
\[
= U_{\mathrm{Fock}}(a,L)\bigl[U_{\mathrm{Fock}}(a,L)^{-1}(\Omega^{\mathrm{out}})^*U(a,L)\bigr]\bigl[U(a,L)^{-1}\Omega^{\mathrm{in}}U_{\mathrm{Fock}}(a,L)\bigr]U_{\mathrm{Fock}}(a,L)^{-1} = S. \tag{4.10}
\]
From (4.9) it follows that Ψin(A) = Ψout(A) for any A ∈ ⋃τ∈T A[τ], and thus that Ωin ΨFock(A) = Ωout ΨFock(A) for any such A. This implies that the S-operator satisfies
\[
S\,\Psi_{\mathrm{Fock}}(A) = (\Omega^{\mathrm{out}})^*\Omega^{\mathrm{in}}\Psi_{\mathrm{Fock}}(A) = (\Omega^{\mathrm{out}})^*\Omega^{\mathrm{out}}\Psi_{\mathrm{Fock}}(A) = \Psi_{\mathrm{Fock}}(A),
\]
where in the last step we used that Ωout is an isometry. Thus, the S-operator leaves one-particle
states invariant.
4.2 The Haag-Kastler formulation of quantum field theory
In this section we discuss the Haag-Kastler axioms as an alternative to the Wightman axioms.
The Haag-Kastler framework is often called algebraic quantum field theory, because it makes use
of abstract C ∗ -algebras, rather than concrete operators on a Hilbert space.
4.2.1 The algebraic approach to quantum theory
To discuss the Haag-Kastler formulation of quantum field theory, we first need to reformulate the
quantum theory that we discussed in section 2.2. In that section we assumed that the states and
observables of a quantum system are given in terms of a concrete Hilbert space. In particular,
the algebra of observables was B(H) and the states were given in terms of density matrices on
H. In the algebraic approach to quantum theory, which we will introduce now (good sources for this approach are chapter 6 of [2] and chapter 2 of [7]), the algebra of
observables corresponding to a quantum system is given as an abstract unital C ∗ -algebra U, the
hermitian elements of which are called bounded observables. The set of states of this C ∗ -algebra,
i.e. the normalized positive linear functionals on U, is denoted by S(U), but this set will be too
large for physical purposes and we will therefore define the smaller set of physical states below. In
the meantime, it will be convenient to introduce some terminology concerning the set S(U). The
transition probability ω1 · ω2 between two pure states ω1 , ω2 ∈ P S(U) is defined as
\[
\omega_1\cdot\omega_2 = 1 - \tfrac{1}{4}\|\omega_1-\omega_2\|^2,
\]
where ‖·‖ denotes the norm on the dual space of U. Because 0 ≤ ‖ω1 − ω2‖ ≤ ‖ω1‖ + ‖ω2‖ = 2, it is clear that ω1 · ω2 ∈ [0, 1] and it follows from the positive-definiteness of ‖·‖ that ω1 · ω2 = 1 if and only if ω1 = ω2. When ω1 · ω2 = 0, we say that the states ω1 and ω2 are orthogonal, and two subsets S1, S2 ⊂ PS(U) of pure states are called mutually orthogonal if ω1 · ω2 = 0 for all ω1 ∈ S1 and ω2 ∈ S2. A non-empty subset S ⊂ PS(U) is called indecomposable if it cannot be written as the disjoint union of two non-empty mutually orthogonal subsets. Using this definition, we define a relation ∼ on PS(U) as follows: ω1 ∼ ω2 if and only if there exists an indecomposable set S ⊂ PS(U) with ω1, ω2 ∈ S.
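For pure vector states on a finite matrix algebra, the norm ‖ω1 − ω2‖ equals the trace norm of the difference of the corresponding density matrices, and the formula above then reduces to the familiar overlap |⟨Ψ1, Ψ2⟩|². A numerical sketch (a finite-dimensional illustration of our own, not taken from the text):

```python
import numpy as np

# For pure states omega_i(A) = <A psi_i, psi_i> on M_n(C):
# ||omega_1 - omega_2|| = trace norm of rho_1 - rho_2, and
# 1 - ||omega_1 - omega_2||^2 / 4 = |<psi_1, psi_2>|^2.
rng = np.random.default_rng(0)

def random_unit(n):
    v = rng.normal(size=n) + 1j * rng.normal(size=n)
    return v / np.linalg.norm(v)

n = 4
psi1, psi2 = random_unit(n), random_unit(n)
rho1 = np.outer(psi1, psi1.conj())
rho2 = np.outer(psi2, psi2.conj())

# trace norm = sum of singular values
norm = np.sum(np.linalg.svd(rho1 - rho2, compute_uv=False))
transition = 1 - norm**2 / 4
overlap = abs(np.vdot(psi1, psi2))**2

assert abs(transition - overlap) < 1e-10
```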
Proposition 4.21 The relation ∼ is an equivalence relation on P S(U).
Proof
By considering the indecomposable set {ω}, it is clear that ω ∼ ω (reflexivity) for all ω ∈ P S(U).
Because the definition of ω1 ∼ ω2 is manifestly symmetric in ω1 and ω2 , it is also clear that
ω1 ∼ ω2 ⇒ ω2 ∼ ω1 (symmetry) for all ω1 , ω2 ∈ P S(U). To prove transitivity, assume that
ω1 ∼ ω2 and ω2 ∼ ω3 . Then there exist indecomposable sets S1 , S2 ⊂ P S(U) with ω1 , ω2 ∈ S1 and
ω2, ω3 ∈ S2. If the union S := S1 ∪ S2 were not indecomposable, there would exist two disjoint non-empty mutually orthogonal subsets S′, S′′ ⊂ PS(U) with S = S′ ∪ S′′, and hence we could write Sj = (Sj ∩ S′) ∪ (Sj ∩ S′′) for j = 1, 2. Note that either ω2 ∈ S′ or ω2 ∈ S′′. Assuming that ω2 ∈ S′ would immediately lead to Sj ∩ S′′ = ∅ for j = 1, 2 (since the Sj are indecomposable), and thus also to S′′ = ∅. Similarly, assuming that ω2 ∈ S′′ would lead to S′ = ∅. This contradiction shows that S must in fact be indecomposable. Because ω1, ω2, ω3 ∈ S, this implies that ω1 ∼ ω3, and thus that ∼ is indeed an equivalence relation.
Now consider an equivalence class C ⊂ PS(U) under ∼. We will show that C is indecomposable. If this were not true, there would be disjoint mutually orthogonal non-empty sets C1, C2 ⊂ PS(U) with C = C1 ∪ C2. Now if ω1, ω2 ∈ C with ωj ∈ Cj, then (since ω1 ∼ ω2) there exists an indecomposable set S ⊂ PS(U) with ω1, ω2 ∈ S. Because all elements of S are equivalent under ∼, we must have S ⊂ C. But then S decomposes into the disjoint mutually orthogonal non-empty subsets S ∩ C1 and S ∩ C2, contradicting the indecomposability of S. Thus we conclude that the equivalence classes are indecomposable subsets of PS(U). Now suppose that for some equivalence class C we have an indecomposable set C′ with C ⊂ C′. Then all elements in C′ are equivalent under ∼ and hence we also have C′ ⊂ C, which implies that C′ = C. This shows that the equivalence classes are maximal indecomposable subsets of PS(U); we will call these sets sectors. Furthermore, note that if ω1, ω2 ∈ PS(U) with ω1 · ω2 ≠ 0, then the set {ω1, ω2} is indecomposable and hence ω1 ∼ ω2. This shows that the different sectors must be mutually orthogonal.
General facts about representations
In order to physically describe a system with abstract algebra of observables U, we choose an appropriate representation π : U → B(H) of the algebra of observables in some Hilbert space H. In
this context we call π the physical representation and H the physical Hilbert space. If the system
has finitely many coordinates and momenta that must satisfy the canonical commutation relations,
the choice of representation is uniquely determined (up to unitary equivalence) by the Stone-Von
Neumann theorem. However, if the system has infinitely many degrees of freedom, as in quantum
field theory, the Stone-Von Neumann theorem is no longer applicable and in such cases there are
many unitarily inequivalent representations of the canonical commutation relations. Therefore, for
such systems the physical representation π should be chosen carefully, depending on the particular
dynamics of the system at hand (in contrast to the case of finitely many degrees of freedom, where the chosen representation depends only on the number of degrees of freedom, i.e. H ≃ L²(Rᴺ) for N degrees of freedom, and not on the specific dynamics of the system); for instance, the Fock representation cannot be used for interacting fields. Without loss of generality, we may always assume that the physical representation
π is faithful, i.e. that π is injective. The reason for this is as follows. Suppose that it would have
been possible to physically describe a quantum system by using a non-faithful representation π of
the algebra of observables U. Then the representation π defines a representation π
b of the quotient
∗
C -algebra U/ ker(π), and we could just as well have started with this quotient algebra (as the
algebra of observables) from the beginning.
Given the physical representation π : U → B(H) we define for each unit vector Ψ ∈ H a state
on the C ∗ -algebra U by
\[
\mathcal{U}\ni A\mapsto\langle\pi(A)\Psi,\Psi\rangle =: \rho_\Psi(A). \tag{4.11}
\]
We call this state the vector state associated with π corresponding to the vector Ψ ∈ H. If π is irreducible then this always defines a pure state in S(U), and in that case the set {ρΨ}Ψ∈H of all vector states associated with π coincides precisely with a sector C in PS(U); if π′ is another irreducible representation of U whose vector states correspond to some sector C′, then C′ = C if and only if π′ is unitarily equivalent to π. Also, for each sector C ⊂ PS(U) there exists an irreducible representation π : U → B(H) such that C = {ρΨ}Ψ∈H, so we conclude that the sectors of PS(U) are in one-to-one correspondence with the irreducible representations of U modulo unitary equivalence. The proof of these facts can be found in section 6.1 (proposition 6.2) of [2].
As stated at the beginning of this subsection, the space S(U) is unnecessarily large for physical
purposes. For a physical representation π : U → B(H), we define the set of physical states to be
the set of all states in S(U) of the form
\[
\rho(A) = \mathrm{Tr}\bigl(\hat\rho\,\pi(A)\bigr),\qquad A\in\mathcal{U}, \tag{4.12}
\]
with ρ̂ a density operator on H. To emphasize that this set of physical states depends on the
representation π, we will denote it by Sπ . In general, Sπ is a proper subset of the set of all states
S(U). Note that we can now characterize a quantum system by the pair (U, π), instead of by the
pair (H, A) as we did in subsection 2.2.2. By the same reasoning as in subsection 2.2.2, we find
that the vector state ρΨ defined in (4.11) is obtained as a special case of (4.12) by taking ρ̂ to be
the one-dimensional projection onto CΨ. Also, as in subsection 2.2.2, any ρ ∈ Sπ can be written
as a countable convex combination of vector states. Again this shows that any pure state in Sπ
must be a vector state. However, because in general π(U) is not equal to B(H), the converse is
not necessarily true. To illustrate this, suppose that the physical Hilbert space is a direct sum
H = H1 ⊕ H2 of Hilbert spaces and that π(U) = B(H1 ) ⊕ B(H2 ). Let Ψi ∈ Hi for i = 1, 2
be two unit vectors that define vector states ρΨᵢ which are different from each other, and define the unit vector Ψ := (Ψ₁ + Ψ₂)/√2 ∈ H. Because π(A)Ψᵢ ∈ Hᵢ, and hence π(A)Ψᵢ ⊥ Ψⱼ for
(i, j) ∈ {(1, 2), (2, 1)}, for every A ∈ U, the vector state defined by Ψ satisfies
\[
\rho_\Psi(A) = \tfrac{1}{2}\langle\pi(A)(\Psi_1+\Psi_2),\Psi_1+\Psi_2\rangle = \tfrac{1}{2}\langle\pi(A)\Psi_1,\Psi_1\rangle + \tfrac{1}{2}\langle\pi(A)\Psi_2,\Psi_2\rangle = \tfrac{1}{2}\rho_{\Psi_1}(A) + \tfrac{1}{2}\rho_{\Psi_2}(A),
\]
which shows that the vector state ρΨ is a convex combination of two different states and is therefore
not pure. We note furthermore that although each state in Sπ is a countable convex combination
of vector states, it is not necessarily true that each state is a countable convex combination of pure
states.
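The direct-sum argument above can be mimicked with block-diagonal matrices (a hypothetical finite-dimensional setup of our own): the vector state of Ψ = (Ψ1 + Ψ2)/√2 is the average of the two vector states, hence not pure.

```python
import numpy as np

# H = C^2 (+) C^2, pi(U) = block-diagonal matrices. The vector state of
# Psi = (Psi1 + Psi2)/sqrt(2) equals (rho_Psi1 + rho_Psi2)/2 because the
# cross terms <pi(A) Psi_i, Psi_j> vanish for i != j.
rng = np.random.default_rng(1)

psi1 = np.array([1, 0, 0, 0], dtype=complex)   # unit vector in H1
psi2 = np.array([0, 0, 1, 0], dtype=complex)   # unit vector in H2
psi = (psi1 + psi2) / np.sqrt(2)

A1 = rng.normal(size=(2, 2))
A2 = rng.normal(size=(2, 2))
piA = np.block([[A1, np.zeros((2, 2))],
                [np.zeros((2, 2)), A2]])       # a block-diagonal "observable"

rho = lambda v: np.vdot(v, piA @ v)            # vector state <pi(A)v, v>
mix_err = abs(rho(psi) - 0.5 * (rho(psi1) + rho(psi2)))
assert mix_err < 1e-12
```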
We now introduce some terminology concerning representations. Two representations π1 :
U → B(H1 ) and π2 : U → B(H2 ) are called phenomenologically equivalent if Sπ1 = Sπ2 . Note
that two unitarily equivalent representations are in particular phenomenologically equivalent. A
representation π : U → B(H) is called factorial of type I if π is a direct sum of a (possibly infinite) number of copies of some irreducible representation π̃ : U → B(H̃), so H = H̃^{⊕K} and π = π̃^{⊕K}. Two factorial representations of type I are called disjoint if they are multiples of irreducible representations that are unitarily inequivalent. Without proof we mention that a representation π is phenomenologically equivalent to some irreducible representation π̃ if and only if π is a direct sum of copies of π̃ (and is thus factorial of type I).
Once we have chosen the physical representation π, we can consider the closure of the algebra π(U) ⊂ B(H) in the σ-weak topology. By Von Neumann's bicommutant theorem, this closure is a Von Neumann algebra and coincides with the double commutant π(U)′′; it is called the Von Neumann algebra of
observables of the quantum system. To be able to consider observables which are represented by unbounded operators, we proceed as follows. We say that a (possibly unbounded) self-adjoint
operator A on H is affiliated to the Von Neumann algebra of observables π(U)′′ if all spectral projection operators E_A(∆), with ∆ a Borel set in R, belong to π(U)′′. The set of observables of the system is then defined to be the set of all self-adjoint operators on H which are affiliated to π(U)′′.
Superselection rules
For a quantum system (U, π) the elements in the commutant π(U)′ are called superselection operators and a set of operators in B(H) that generates π(U)′ is referred to as the superselection rules of the system. Of course, we always have C1_H ⊂ π(U)′ and in case the inclusion is strict we say that the system has non-trivial superselection rules.
If H is a Hilbert space and V ⊂ H is a subset of non-zero vectors in H, then V is called a linked
system of vectors if V cannot be written as a disjoint union of two non-empty mutually orthogonal
subsets. In particular, for any linear subspace V ⊂ H the set of unit vectors in V forms a linked
system. Now let W ⊂ H be a set of nonzero vectors that is total in H, i.e. the closed linear span of
W is equal to H. We then define a relation ∼ on W as follows: Ψ1 ∼ Ψ2 if and only if there exists
a linked system L ⊂ W with Ψ1 , Ψ2 ∈ L. By using similar arguments as for indecomposable sets of
P S(U) (see above), we find that ∼ defines an equivalence relation on W . The equivalence classes
give rise to a partitioning of W into mutually orthogonal maximal linked systems {Wν }ν∈N , where
the index set N may be uncountable. For each ν ∈ N we define a subspace Hν ⊂ H as the closed
linear span of Wν . Because W is total in H, we then have
\[
\mathcal{H} = \bigoplus_{\nu\in N}\mathcal{H}_\nu.
\]
So we conclude that if we have a total subset W in a Hilbert space H, then H decomposes into a
direct sum of non-zero subspaces Hν such that Wν = W ∩ Hν . Note that although N might be
uncountable, the direct sum is still discrete in the sense that the measure on the index set N is discrete, in contrast to the general case of a direct integral, when the set N is equipped with a more general measure and in which case we would really have to write a direct integral ∫⊕ instead of a direct sum ⊕.
We will now apply this in the following way. Let π : U → B(H) be a representation of the
C∗-algebra U and suppose that the set P ⊂ H of all vectors in H that define pure states on U forms a total subset of H. Then according to the discussion above we can decompose H into a direct sum H = ⊕ν∈N Hν of non-zero subspaces Hν with Pν = Hν ∩ P, where {Pν}ν∈N are the
maximal linked systems in P as above. Now fix some ν0 ∈ N and choose a Ψ0 ∈ Pν0 . If A ∈ U with
π(A)Ψ₀ ≠ 0, then it can be shown (see exercise 6.10 of [2]) that the state on U defined by the unit vector π(A)Ψ₀/‖π(A)Ψ₀‖ is
pure, so the unit vectors of π(U)Ψ0 form a subset of P. Because π(U)Ψ0 is a linear subspace, the
unit vectors in π(U)Ψ0 form a linked system. Thus the set of unit vectors of π(U)Ψ0 is a subset
of Pν1 for some ν1 ∈ N . But Ψ0 ∈ Pν0 ∩ Pν1 , so we must in fact have ν1 = ν0 and thus the unit
vectors of π(U)Ψ0 lie in Pν0 . Since Ψ0 ∈ Pν0 was arbitrary, this implies that π(U)Pν0 ⊂ Hν0 ; since
ν0 ∈ N was arbitrary, we then have π(U)Pν ⊂ Hν for all ν ∈ N . Because for each ν the set Pν is
total in Hν, this in turn implies that π(U) leaves all the subspaces Hν invariant. Without proof (see proposition 6.5 of [2]) we mention furthermore that the subrepresentations of π(U) on the subspaces Hν are all factorial of type I and are pairwise disjoint. We can thus write H as a double direct sum
\[
\mathcal{H} = \bigoplus_{\nu\in N}\mathcal{H}_\nu = \bigoplus_{\nu\in N}\tilde{\mathcal{H}}_\nu^{\oplus M_\nu} \tag{4.13}
\]
and we can write π as
\[
\pi = \bigoplus_{\nu\in N}\pi_\nu = \bigoplus_{\nu\in N}\tilde\pi_\nu^{\oplus M_\nu}, \tag{4.14}
\]
where π̃ν : U → B(H̃ν) are irreducible representations. This decomposition into a (discrete) direct sum of irreducible representations was possible because of the assumption that P is total in H; this assumption about P is therefore called the hypothesis of discrete superselection rules. For representations satisfying this hypothesis the following proposition holds, which can be found in section 6.2 (proposition 6.6) of [2].
Proposition 4.22 Let π : U → B(H) be a representation of a C ∗ -algebra U, let P ⊂ H be the
set of all vectors that define pure states and suppose that π satisfies the hypothesis of discrete
superselection rules. Then the following statements are equivalent:
(1) The elements of PSπ are in one-to-one correspondence with the elements of P.
(2) The representations πν : U → B(Hν) in the decompositions (4.13) and (4.14) are irreducible (i.e. Mν = 1 for all ν ∈ N).
(3) P = {Ψ ∈ ⋃ν∈N Hν : ‖Ψ‖ = 1}.
(4) The commutant π(U)′ of π(U) is abelian.
Note that if (2) is satisfied, the representation is still phenomenologically equivalent to a representation where some (or all) of the Mν are larger than 1 (including the case where some Mν
is infinite). So demanding that the physical representation π satisfies (2) does not restrict the
possibilities for the state space Sπ , but it has the benefit that it simplifies the representation π.
For this reason it is often assumed that a system (U, π) that satisfies the hypothesis of discrete
superselection rules, also satisfies the equivalent statements in the proposition above. Because of
(4), this assumption is called the hypothesis of commutative (discrete) superselection rules. So for
a system (U, π) that satisfies the hypothesis of commutative discrete superselection rules the representation decomposes into a direct sum of unitarily inequivalent representations of U. As stated
earlier, all unit vectors in an irreducible representation define pure states on U, so in each of the
spaces Hν in the direct sum we have the unrestricted superposition principle, i.e. the superposition
of two pure states again defines a pure state. On the entire space H we then have the following
restricted version of the superposition principle: a normalized linear combination of two vectors
defining pure states again defines a pure state if the two vectors belong to the same space Hν .
For this reason the subspaces Hν are called coherent subspaces of H. For a system (U, π) with
commutative discrete superselection rules, the commutant is given by
\[
\pi(\mathcal{U})' = \bigoplus_{\nu\in N}\pi_\nu(\mathcal{U})' = \bigoplus_{\nu\in N}\mathbb{C}\,1_\nu,
\]
where the first equality already holds if only the hypothesis of discrete superselection rules is
satisfied. The second equality follows from the irreducibility of the πν . The Von Neumann algebra
of observables is now clearly
\[
\pi(\mathcal{U})'' = \bigoplus_{\nu\in N}\mathcal{B}(\mathcal{H}_\nu).
\]
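The structure π(U)′ = ⊕ν C1ν can be checked numerically in a small example (a construction of our own, not from the text): for the block-diagonal algebra B(C²) ⊕ B(C³), the commutant, computed as the joint nullspace of X ↦ [X, Mk] over a spanning set {Mk}, is two-dimensional, i.e. C1 ⊕ C1.

```python
import numpy as np
from itertools import product

# Commutant of pi(U) = B(C^2) (+) B(C^3) acting block-diagonally on C^5.
# With row-major flattening, vec(MX - XM) = (M (x) I - I (x) M^T) vec(X),
# so the commutant is the nullspace of the stacked matrices below.
d1, d2, n = 2, 3, 5

def unit(i, j):
    M = np.zeros((n, n))
    M[i, j] = 1.0
    return M

gens = [unit(i, j) for i, j in product(range(d1), range(d1))]
gens += [unit(d1 + i, d1 + j) for i, j in product(range(d2), range(d2))]

rows = [np.kron(M, np.eye(n)) - np.kron(np.eye(n), M.T) for M in gens]
K = np.vstack(rows)

sv = np.linalg.svd(K, compute_uv=False)
dim_commutant = int(np.sum(sv < 1e-10))   # nullspace dimension
assert dim_commutant == 2                 # one scalar per irreducible block
```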
Symmetries in the algebraic approach
We will now discuss symmetries in the algebraic approach. The definition of a symmetry of the
quantum system (H, A) that was given in 2.2.3 can be restated for a quantum system (U, π) in
the algebraic approach. A symmetry of a quantum system (U, π) is defined to be a pair (s, s′) of bijections s : Us-a → Us-a and s′ : Sπ → Sπ on the set of self-adjoint (s-a) elements of U and on the
set of physical states, respectively, satisfying
\[
(s'\rho)(sA) = \rho(A) \tag{4.15}
\]
for all ρ ∈ Sπ and A ∈ Us-a.
The map s : Us-a → Us-a is continuous in the norm topology (it even preserves the norm) and consequently the map s′ : Sπ → Sπ is weak*-continuous. The proofs of these facts can be found in [2], proposition 6.7 and the paragraph preceding that proposition. Because π is assumed to be faithful, Sπ can be shown to be weak*-dense in S(U) (see sections 6.1 and 6.3 of [2]: if π is faithful then Sπ distinguishes the positive elements of U, i.e. ρ(A) ≥ 0 for all ρ ∈ Sπ implies A ≥ 0, which in turn, by a result of Kadison which also uses that Sπ is convex, implies that Sπ is weak*-dense in S(U)). Thus we can extend s′ uniquely by weak*-continuity to a map s′ : S(U) → S(U); therefore we may assume that s′ is a map from S(U) into itself; it is in fact a bijection, with inverse given by the extension of (s′)⁻¹ : Sπ → Sπ. By using the same reasoning as in section 2.2.3, the map s′ : S(U) → S(U) preserves the convex structure of S(U) and therefore maps pure states onto pure states. Thus, s′ : S(U) → S(U) is a weak*-continuous affine bijection which maps Sπ onto itself. Furthermore, s′ preserves transition probabilities and as a consequence the image of a sector of S(U) under the map s′ is again a sector. Conversely, given a weak*-continuous affine map s′ that maps Sπ onto itself, we can define a unique symmetry (s, s′). Before we close our discussion of the map s′ and go over to a discussion of the map s, we define the notion of an invariant state. A state ρ ∈ Sπ is called an invariant state under (s, s′) if
\[
(s'\rho)(A) = \rho(A)
\]
for all A ∈ Us-a, or equivalently (s′ρ)(sA) = (s′ρ)(A) for all A ∈ Us-a.
Now that we have discussed s′ in some detail, we will discuss some properties of s. To see that s : Us-a → Us-a is R-linear, we note that for λ, µ ∈ R and A, B ∈ Us-a we have
\[
(s'\rho)(s(\lambda A+\mu B)) = \rho(\lambda A+\mu B) = \lambda\rho(A)+\mu\rho(B) = \lambda(s'\rho)(sA)+\mu(s'\rho)(sB) = (s'\rho)(\lambda sA+\mu sB) \tag{4.16}
\]
for all ρ ∈ S(U), which implies that s(λA + µB) = λsA + µsB since S(U) separates the points of Us-a.
Furthermore, it can also be shown that s satisfies s(A²) = s(A)² for all A ∈ Us-a (see for instance proposition 6.10 of [2]), which is equivalent to the property that s(AB + BA) = s(A)s(B) + s(B)s(A) for all A, B ∈ Us-a. We will now extend s to a map s : U → U by demanding that condition (4.15) also holds for all A ∈ U. Then the first step in (4.16) also makes sense for λ, µ ∈ C and it follows that s : U → U is C-linear, and is hence a vector space automorphism. Using the fact that each A ∈ U can be written as a linear combination A = ½(A + A∗) + i·(1/2i)(A − A∗) =: Re(A) + i Im(A) of self-adjoint elements, it is also easy to see that s(A²) = s(A)² for all A ∈ U. Finally, for each A ∈ U we also have
\[
s(A^*) = s(\mathrm{Re}(A) - i\,\mathrm{Im}(A)) = s(\mathrm{Re}(A)) - i\,s(\mathrm{Im}(A)) = [s(\mathrm{Re}(A)) + i\,s(\mathrm{Im}(A))]^* = s(A)^*,
\]
where in the last step we used that s is C-linear. If U1 and U2 are C∗-algebras, a linear map s : U1 → U2 satisfying s(A²) = s(A)² and s(A∗) = s(A)∗ for all A ∈ U1 is called a Jordan*-homomorphism. Thus we have found that the symmetries of a quantum system (U, π) must be Jordan*-automorphisms of U. Conversely, if s : U → U is a Jordan*-automorphism then we get a pair (s, s′) of bijections where (s′ρ)(A) = ρ(s⁻¹A) for A ∈ U; however, this pair (s, s′) will only define a symmetry if s′ maps Sπ onto itself. The set J(U) of all Jordan*-automorphisms inherits a topology from the weak*-topology of U∗. Together with the composition law of Jordan*-automorphisms this gives J(U) the structure of a topological group. Note that a Jordan*-automorphism is more general than a C∗-isomorphism because a Jordan*-isomorphism is not
necessarily multiplicative. For instance, a C ∗ -anti-automorphism of U (i.e. a vector space automorphism s : U → U which preserves the ∗-operation and satisfies s(AB) = s(B)s(A) for all
A, B ∈ U) is also a Jordan*-automorphism. The following theorem of Kadison, which can be
found in section 2.2 of [7] (theorem II.2.1), gives us more insight in the nature of a given Jordan*automorphism.
Theorem 4.23 Let U1 be an abstract C ∗ -algebra and let U2 be a C ∗ -algebra of operators on a
Hilbert space H. Then a linear ∗-preserving surjection α : U1 → U2 is a Jordan*-homomorphism if
and only if there exists a projection operator E ∈ (U2)′′ ∩ (U2)′ such that α(AB)E = α(A)α(B)E
and α(AB)(1 − E) = α(B)α(A)(1 − E) for all A, B ∈ U1 .
We can apply the theorem as follows. If s : U → U is a Jordan*-automorphism that defines a symmetry of the system (U, π), then the map πs : U → π(U) defined by A ↦ π(sA) is a surjective Jordan*-homomorphism. According to the theorem, there exists a projection E ∈ π(U)′′ ∩ π(U)′ with πs(AB)E = πs(A)πs(B)E and πs(AB)(1 − E) = πs(B)πs(A)(1 − E) for all A, B ∈ U. In other words, if H denotes the representation space corresponding to π then H decomposes into a direct sum H = H1 ⊕ H2 of subspaces H1 = EH and H2 = (1 − E)H which are invariant under π(U) (this follows from the fact that E ∈ π(U)′) and such that the Jordan*-automorphism π(A) ↦ πs(A) on π(U) (here we assume that s(ker(π)) ⊂ ker(π), which is certainly the case when π is faithful) is a direct sum of a C∗-automorphism of the algebra π(U)E ≃ π(U)|H1 and a C∗-anti-automorphism of the algebra π(U)(1 − E) ≃ π(U)|H2. As a special case, if π(U)′′ ∩ π(U)′ = C1 then the Jordan automorphism π(A) ↦ πs(A) is either a C∗-automorphism or else a C∗-anti-automorphism, and we have thus obtained the following corollary to the theorem above.
Corollary 4.24 Let U be a C ∗ -algebra and let π : U → B(H) be a representation. If s : U → U
is a Jordan*-automorphism such that s(ker(π)) ⊂ ker(π), then π(A) 7→ π(sA) =: πs (A) is a
Jordan*-automorphism of π(U). Furthermore, there exists a projection operator E ∈ π(U)00 ∩ π(U)0 such
that π(U) leaves the subspaces H1 = EH and H2 = (1−E)H invariant and such that π(A) 7→ πs (A)
decomposes into a direct sum of a C ∗ -automorphism of π(U)|H1 and a C ∗ -anti-automorphism of
π(U)|H2 .
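The dichotomy in corollary 4.24 can be made concrete in a small matrix model (an illustrative sketch with made-up finite-dimensional data, not an object from the text): on M2 (C) ⊕ M2 (C), the map s(A ⊕ B) = uAu∗ ⊕ B^T is ∗-preserving, multiplicative on the first summand and anti-multiplicative (transposition) on the second, and therefore preserves the Jordan product A ◦ B = (AB + BA)/2.

```python
import numpy as np

rng = np.random.default_rng(0)

def rand_mat():
    return rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))

# A unitary u for the automorphism part.
theta = 0.7
u = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]], dtype=complex)

def s(pair):
    A, B = pair
    # automorphism on the first block, anti-automorphism (transpose) on the second
    return (u @ A @ u.conj().T, B.T)

def jordan(pair1, pair2):
    # Jordan product (AB + BA)/2, taken blockwise
    (A1, B1), (A2, B2) = pair1, pair2
    return ((A1 @ A2 + A2 @ A1) / 2, (B1 @ B2 + B2 @ B1) / 2)

X, Y = (rand_mat(), rand_mat()), (rand_mat(), rand_mat())
lhs = s(jordan(X, Y))          # s(X ∘ Y)
rhs = jordan(s(X), s(Y))       # s(X) ∘ s(Y)
assert all(np.allclose(a, b) for a, b in zip(lhs, rhs))

# s preserves the *-operation blockwise
star = lambda p: (p[0].conj().T, p[1].conj().T)
assert all(np.allclose(a, b) for a, b in zip(s(star(X)), star(s(X))))
print("s is a *-preserving Jordan automorphism on M2 + M2")
```

Here the projection onto the first summand plays the role of E, and conjugation by the fixed unitary u plays the role of the C ∗ -automorphism part.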
The relationship between this corollary and Wigner’s theorem in section 2.2.3 is as follows.
Assume that (U, π) satisfies the hypotheses of commutative and discrete superselection rules. Then
the physical Hilbert space decomposes into a direct sum of Hilbert spaces Hν ,

H = ⊕_{ν∈N} Hν ,

and the physical representation π decomposes accordingly into a direct sum π = ⊕_{ν∈N} πν of irreducible representations πν : U → B(Hν ). Because the Hilbert spaces H1 = EH and H2 = (1 − E)H are invariant under π(U), the index set N can be written as a disjoint union N = N1 ∪ N2 such that

H1 = ⊕_{ν∈N1} Hν    and    H2 = ⊕_{ν∈N2} Hν .
Note that it is allowed that one of the Ni is empty. It can now be shown that the C ∗ -automorphism
of π(U)|H1 can be represented by a unitary operator U1 : H1 → H1 as

πs (A)|H1 = U1 π(A)|H1 U1^{−1}    (4.17)

and that there exists a bijection b1 : N1 → N1 such that this operator U1 maps the coherent subspace Hν with ν ∈ N1 unitarily onto the coherent subspace Hb1 (ν) . Similarly, the C ∗ -anti-automorphism of π(U)|H2 can be represented by an anti-unitary operator U2 : H2 → H2 as

πs (A)|H2 = U2 π(A)∗ |H2 U2^{−1}    (4.18)

43 Here we assume that s(ker(π)) ⊂ ker(π), which is certainly the case when π is faithful.

and there exists a bijection b2 : N2 → N2 such that U2 maps the coherent subspace Hν with ν ∈ N2 anti-unitarily onto the coherent subspace Hb2 (ν) . By comparing equations (4.17) and
(4.18) with equations (2.19) and (2.20), the relationship with Wigner’s theorem is clear. In fact,
we have obtained a generalization of Wigner’s theorem: a symmetry in the algebraic approach
can be represented in the physical Hilbert space as a direct sum of a unitary operator and an
anti-unitary operator, each of which maps coherent subspaces unitarily, resp. anti-unitarily, onto
coherent subspaces. The proof of all these facts can be found in section 6.3 (proposition 6.11)
of [2]. In the absence of commutative and discrete superselection rules, we cannot always make
the step from corollary 4.24 to unitary and anti-unitary operators on H. In particular, when
a Jordan*-automorphism s : U → U defines a C ∗ -automorphism of π(U) (rather than a direct
sum of an automorphism and an anti-automorphism), it is not always true that there exists a
unitary operator U : H → H such that π(sA) = U π(A)U −1 . When such an operator U does
exist, we say that the symmetry s is implementable. An important example, which we will need
in the following subsection when we discuss vacuum states, is obtained in the case of a GNS-representation corresponding to a state which is invariant under a symmetry (s, s0 ) for which s is
a C ∗ -automorphism:
Theorem 4.25 Let U be a C ∗ -algebra and let (s, s0 ) be a symmetry for which the Jordan*-automorphism s : U → U is a C ∗ -automorphism and suppose that the state ρ ∈ S(U) is invariant under (s, s0 ). Let πρ : U → B(Hρ ) be the GNS-representation associated to the state ρ and suppose that s(ker(πρ )) ⊂ ker(πρ ). Then there exists a unique unitary operator Us : Hρ → Hρ on the
representation space Hρ that satisfies
Us πρ (A)Us^{−1} = πρ (sA),    Us Ωρ = Ωρ    (4.19)
for each A ∈ U, where Ωρ ∈ Hρ denotes the cyclic vector corresponding to πρ .
Proof
Because s(ker(πρ )) ⊂ ker(πρ ), we see that if πρ (A)Ωρ = πρ (B)Ωρ for some A, B ∈ U then also
πρ (sA)Ωρ = πρ (sB)Ωρ . Thus, on the dense subset πρ (U)Ωρ of Hρ we can define a linear operator
Us by
Us πρ (A)Ωρ := πρ (sA)Ωρ .
This operator satisfies
hUs πρ (A)Ωρ , Us πρ (B)Ωρ i = hπρ (sA)Ωρ , πρ (sB)Ωρ i = hπρ (sB ∗ A)Ωρ , Ωρ i = ρ(sB ∗ A) = ρ(B ∗ A)
= hπρ (A)Ωρ , πρ (B)Ωρ i,
where we have used that ρ is invariant under the symmetry. Because s is bijective, we have
Us πρ (U)Ωρ = πρ (U)Ωρ , so Us is indeed unitary. If U has a unit, then the second equation in (4.19)
follows from the fact that s1 = 1, since this implies πρ (s1)Ωρ = Ωρ . If U has no unit, then the
identity follows by taking an approximate unit eν of U. The first equation in (4.19) follows from
Us πρ (A)Us−1 [πρ (sB)Ωρ ] = Us πρ (A)Us−1 [Us πρ (B)Ωρ ] = Us πρ (AB)Ωρ = πρ (sAB)Ωρ
= πρ (sA)πρ (sB)Ωρ
and from the fact that the set {πρ (sB)Ωρ }B∈U is dense in Hρ . To show uniqueness, suppose that
Us0 is a linear operator satisfying the two equations in (4.19). Then for all A ∈ U we have
Us0 πρ (A)Ωρ = Us0 πρ (A)(Us0 )−1 Us0 Ωρ = πρ (sA)Ωρ = Us πρ (A)Ωρ ,
so Us0 coincides with Us on πρ (U)Ωρ . Hence Us0 = Us .
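Theorem 4.25 can be verified numerically in a finite-dimensional toy model (hypothetical data: U = M2 (C), the tracial state ρ(A) = tr(A)/2, and s(A) = uAu∗ for a unitary u; since ρ is faithful, the GNS space can be realized as U itself with hA, Bi = ρ(B ∗ A), πρ (A)B = AB and Ωρ = 1, so the implementer acts as Us (B) = s(B)):

```python
import numpy as np

# a unitary u and an s-invariant faithful state: the tracial state rho(A) = tr(A)/2
theta = 0.4
u = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]], dtype=complex)
D = 0.5 * np.eye(2, dtype=complex)

rho = lambda A: np.trace(D @ A)
s = lambda A: u @ A @ u.conj().T
s_inv = lambda A: u.conj().T @ A @ u

# GNS inner product on H_rho = M2(C): <A, B> = rho(B* A)
inner = lambda A, B: rho(B.conj().T @ A)

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))
B = rng.standard_normal((2, 2)) + 1j * rng.standard_normal((2, 2))

# invariance of the state under s
assert np.isclose(rho(s(A)), rho(A))

# U_s(B) := s(B) preserves the GNS inner product (unitarity) ...
assert np.isclose(inner(s(A), s(B)), inner(A, B))

# ... fixes the cyclic vector Omega = 1 ...
assert np.allclose(s(np.eye(2)), np.eye(2))

# ... and implements s: U_s pi(A) U_s^{-1} = pi(sA), i.e. s(A s^{-1}(B)) = s(A) B
assert np.allclose(s(A @ s_inv(B)), s(A) @ B)
print("U_s implements the symmetry in the GNS representation")
```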
Now that we have discussed individual symmetries, we will consider symmetry groups. Recall
that after proving Wigner’s theorem we argued that the elements of a connected Lie group of
symmetries must be represented by unitary operators rather than anti-unitary ones. In view of
this, we expect that in the algebraic approach connected symmetry groups should be represented
by C ∗ -automorphisms. In the algebraic approach we say that a topological group G is a symmetry
group of the system if there is a morphism α : G → J (U) of topological groups. As demonstrated
in section 2.2 of [7] (theorem II.2.4), it is indeed the case that if a topological group G is a connected
symmetry group then the elements αg := α(g) ∈ J (U) are all C ∗ -automorphisms. In particular,
in relativistic quantum systems the elements (a, L) of the restricted Poincaré group P+↑ give rise to C ∗ -automorphisms α(a,L) : U → U.
4.2.2 The Haag-Kastler axioms
We will now state and motivate the axioms of the Haag-Kastler framework. In section 4.1.6 we
showed that under some extra assumptions (the Haag-Ruelle axioms) the Wightman framework
is capable of describing particles in a scattering experiment. These extra assumptions concerned
certain properties of the spectrum of the energy-momentum operator and also the existence of one-particle subspaces in the Hilbert space H and the existence of certain operators in the polynomial
algebra P(M) which generate one-particle states from the vacuum state. Using these operators
which generate one-particle states, it was possible to construct the in- and out-states that one
needs in scattering theory, as well as the Poincaré-invariant isometries Ωin/out : HFock → H. It
seems that in the Haag-Ruelle theory the quantum fields only play a role in the background: they
were needed to obtain the correspondence O → P(O) from spacetime domains to ∗-algebras of
operators, and they were needed in the proofs of the mathematical statements (although we did not
consider these proofs in section 4.1.6; see for instance section 12.2 of [2] for detailed proofs). It turns
out that when in a quantum theory a correspondence between spacetime domains and operators
is chosen properly (i.e. as in the Haag-Kastler framework that we will introduce now), the results
of the Haag-Ruelle theory can be derived without using quantum fields, see also chapter 5 of [1].
This fact should be considered as one of the reasons for discussing the Haag-Kastler theory. Apart
from the fact that this framework is capable of incorporating the Haag-Ruelle scattering theory,
Haag and Kastler give (in their paper [21]) as a motivation for introducing their theory that the
true essence of quantum field theory is that it gives rise to the notion of observables which can be
measured in some spacetime region and that the observables corresponding to spacelike separated
regions are compatible. In this sense it should be expected that the Haag-Kastler theory, which
focusses on the assignment of observables to spacetime domains, is more general than quantum
field theory. Our discussion in this and the following subsection is inspired by the books [1] and
[20] and, of course, by the article [21].
As stated in the previous subsection, the Haag-Kastler theory is formulated in the setting of
algebraic quantum theory, so we should begin with the following axiom.
Axiom 0: Algebra of observables
There is a C ∗ -algebra U, called the algebra of observables.
Notice that there is no mention of the choice of the (faithful) physical representation here. The
reason for this, as explained in the article [21], is as follows. Suppose that we have a number of
physical systems that are prepared in identical ways and suppose that for each of these systems
we make a measurement of a set of simultaneously measurable observables A1 , . . . , An , in order to
obtain some knowledge about the unknown state α ∈ S(U) of the identical systems. More specifically, these measurements provide us with estimates p(Aj , B) ∈ [0, 1] of the probabilities Pα^{Aj} (B)
that a measurement of observable Aj will result in a value in the Borel set B for the system in
state α. However, each measurement always involves some error and we can only prepare a finite
number of identical systems, so after our measurements we can only conclude that the system is
in some state α ∈ S(U) for which the following inequalities hold:
|Pα^{Aj} (B) − p(Aj , B)| < εj .
These inequalities do not specify the point α ∈ S(U) exactly, but rather define some neighborhood
in S(U) with respect to the weak topology on S(U). A particular physical representation π should
thus be considered as adequate for describing the system if all open neighborhoods in S(U) that
can be obtained from an experiment (in the way described above) contain an element of Sπ . In
view of these considerations, it seems natural to introduce the following definition. Two physical
representations π1 and π2 are called physically equivalent if every weakly open neighborhood in Sπ1
contains an element of Sπ2 , and vice versa. Note that Sπ1 and Sπ2 are both subsets of the larger
space S(U), so that the definition makes sense. An important result44 in the theory of C ∗ -algebras
now states that any two representations π1 and π2 with ker(π1 ) = ker(π2 ) are physically equivalent
in the sense defined above. In particular, any two faithful representations are physically equivalent
and for this reason it is not necessary to specify the particular choice of the (faithful) physical
representation in the axioms.
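The estimation argument can be sketched in a classical caricature (illustrative only; the probability, sample size and confidence radius are invented): a finite measurement run determines each probability only up to an interval, so it singles out a neighborhood of compatible states rather than a single point.

```python
import numpy as np

rng = np.random.default_rng(3)
p_true = 0.37          # the unknown probability of a toy yes/no observable
n_samples = 2000       # finitely many identically prepared systems

outcomes = rng.random(n_samples) < p_true
p_est = outcomes.mean()                       # the experimental estimate p(A, B)

# a confidence radius epsilon: every state whose predicted probability lies
# within epsilon of the estimate is compatible with this measurement run
eps = 4 * np.sqrt(p_est * (1 - p_est) / n_samples)
assert 0 < p_est < 1
print(f"estimate {p_est:.3f}; compatible states form a neighborhood of radius {eps:.3f}")
```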
The correspondence between spacetime domains and operators is now established by assuming
that the algebra of observables has a substructure which is given in terms of spacetime domains.
Axiom 1: Local algebras
For each bounded45 open set O ⊂ M in Minkowski spacetime there is a C ∗ -subalgebra U(O) ⊂ U.
The self-adjoint elements of U(O) are interpreted as the observables that can be measured in
the spacetime region O, also called local observables. A local observable which is measurable in
some subset of Minkowski spacetime, should also be measurable in some larger subset of Minkowski
spacetime. This is expressed in the following axiom.
Axiom 2: Monotonicity
If O1 , O2 ⊂ M are bounded open sets satisfying O1 ⊂ O2 , then U(O1 ) ⊂ U(O2 ).
When we have some bounded open set O ⊂ M and some observable A ∈ U(O) which can be
measured in O, then a restricted Poincaré transformation g ∈ P+↑ should have the effect of mapping A to some observable which can be measured in gO. So an element g ∈ P+↑ defines for each bounded open set O ⊂ M a map αg^{(O)} : U(O) → U(gO). Because restricted Poincaré transformations are assumed to be symmetries of any relativistic quantum system, this map must in fact be an isomorphism of C ∗ -algebras.
Axiom 3: Covariance
For each restricted Poincaré transformation g ∈ P+↑ we have an automorphism αg : U → U such
that for each bounded open subset O ⊂ M the restriction of αg to U(O) is a C ∗ -isomorphism
αg : U(O) → U(gO). For a fixed observable A ∈ U, the map g 7→ αg (A) is continuous in g.
If two regions of spacetime are spacelike separated, then no physical process in one of the two
regions can affect a physical process in the other region. In particular, this means that we can perform simultaneous measurements in both regions and therefore the local observables corresponding
to one of the two regions must in fact commute with all local observables corresponding to the
other region.
Axiom 4: Locality
If the bounded open subsets O1 , O2 ⊂ M are spacelike separated, then the algebras U(O1 ) and U(O2 )
commute.
44
Haag and Kastler call this result Fell’s equivalence theorem. In their article [21] there is a reference to the
relevant article of J.M.G. Fell.
45
By a bounded set in M we mean a set with compact closure.
Finally, we assume that the algebra U of observables is the smallest C ∗ -algebra containing all
the local observables. This emphasizes the importance of local observables.
Axiom 5: Generating property
The algebra ∪_O U(O) is dense in U. Here the union is taken over all bounded open subsets of M.
4.2.3 Vacuum states in the Haag-Kastler framework
A major difference from the Wightman theory is that there is no mention of a vacuum state in
the Haag-Kastler axioms. As we will show now, there is in fact a notion of vacuum states in the
Haag-Kastler framework. To this end, we first consider a physical representation π : U → B(H)
of the algebra of observables and we assume that there is some unitary representation T (x) of the
translation group on the physical Hilbert space H with corresponding energy-momentum generators
P µ . Let EP denote the joint spectral measure of the operators P µ . Then for any Ψ ∈ H we can
define a measure µΨ on Minkowski space M by µΨ (B) = hEP (B)Ψ, Ψi for any Borel set B ⊂ M.
The support of the measure µΨ is of course precisely the support of the wave function of Ψ in
energy-momentum space in case Ψ is a one-particle state. In general, it is given the following
name.
Definition 4.26 Let H be the Hilbert space of a physical system and let T (x) be a unitary
representation of the translation group on H with corresponding energy-momentum generators P µ
which have the joint spectral measure EP . Then for a vector Ψ ∈ H the support of the measure
µΨ : B 7→ hEP (B)Ψ, Ψi on M is called the energy-momentum spectrum of Ψ.
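Definition 4.26 has a simple discrete caricature (illustrative: the lattice Z_N in place of M, with the cyclic shift as translation): the Fourier modes diagonalize the translation operator, and the energy-momentum spectrum of Ψ is just the set of Fourier modes on which Ψ has nonzero weight.

```python
import numpy as np

N = 8
# cyclic shift ("translation by one lattice site"): (T psi)[n] = psi[n-1]
T = np.roll(np.eye(N), 1, axis=0)

# the Fourier modes v_k diagonalize T, so the spectral measure of the momentum
# generator assigns to psi the weights |<v_k, psi>|^2 = |fft(psi)[k]|^2 / N
k = 2
v = np.exp(2j * np.pi * k * np.arange(N) / N) / np.sqrt(N)
assert np.allclose(T @ v, np.exp(-2j * np.pi * k / N) * v)

psi = np.zeros(N, dtype=complex)
psi[0] = psi[1] = 1 / np.sqrt(2)
weights = np.abs(np.fft.fft(psi)) ** 2 / N
spectrum = sorted(kk for kk in range(N) if weights[kk] > 1e-12)

assert np.isclose(weights.sum(), 1.0)     # mu_psi is a probability measure
assert 4 not in spectrum                  # the mode k = 4 carries zero weight here
print("energy-momentum spectrum of psi:", spectrum)
```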
We now have the following lemma, which states that (the representation of) some operators in the
algebra U can shift the energy-momentum spectrum of vectors in the physical Hilbert space. This
lemma is lemma 4.1 in [1].
Lemma 4.27 Let f ∈ C ∞ (M) be such that its Fourier transform fb(p) = ∫_M f (x)e^{ip·x} d^4 x has bounded support ∆ ⊂ M, and for Q ∈ U define Q(f ) = ∫ α(x,1) (Q)f (x) d^4 x as a Bochner integral46 .
Let π : U → B(H) be a representation and suppose that there is a unitary representation T (x) of the
translation group on H with energy-momentum generators P µ . If the energy-momentum spectrum
FΨ ⊂ M of a vector Ψ ∈ H is a closed set, then π(Q(f ))Ψ has energy-momentum spectrum FΨ +∆.
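At its core the lemma says that Fourier supports add. A discrete one-dimensional caricature of this mechanism (illustrative only; the lattice size and the supports are made up, and pointwise multiplication of functions stands in for the operator Q(f )) can be checked with the FFT:

```python
import numpy as np

N = 256
Delta = range(3, 6)        # support of f-hat
F = range(10, 15)          # "energy-momentum spectrum" of psi
fhat = np.zeros(N, dtype=complex)
psihat = np.zeros(N, dtype=complex)
fhat[list(Delta)] = [1.0, 2.0, -1.0]
psihat[list(F)] = [1.0, 1j, 0.5, -2.0, 0.3]

f, psi = np.fft.ifft(fhat), np.fft.ifft(psihat)

# pointwise product in position space corresponds to circular convolution
# of the transforms, so the Fourier supports add
prod_hat = np.fft.fft(f * psi)

support = {k for k in range(N) if abs(prod_hat[k]) > 1e-10}
allowed = {(a + b) % N for a in Delta for b in F}   # the sum set F + Delta
assert support <= allowed
print("Fourier support of f*psi lies in F + Delta:", sorted(support))
```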
For this reason, we say that the operator Q(f ) ∈ U, with f as in the lemma, increases the energy-momentum by ∆. Now define for a future-directed timelike vector e ∈ M the set M− (e) = {p :
p · e < 0}. Note that e lies along the time-axis in some particular inertial frame, so the set M− (e)
contains all energy-momentum vectors which have negative energy in this particular inertial frame.
If ∆ ⊂ M− (e), then the lemma implies that according to this inertial observer the operator π(Q(f ))
decreases the energy of any vector Ψ ∈ H. Thus, if we want some vector Ψ0 ∈ H to represent
a vacuum vector (which has the lowest possible energy in any inertial frame) then we must have
π(Q(f ))Ψ0 = 0 for any Q ∈ U and any smooth f whose Fourier transform has support ∆ ⊂ M− (e)
for some future-directed timelike vector e ∈ M. For such functions f we thus have
ρΨ0 (Q(f )∗ Q(f )) = hπ(Q(f )∗ Q(f ))Ψ0 , Ψ0 i = 0.
When we translate this back into the language of the abstract algebra, we obtain the following
definition.
Definition 4.28 A state ω ∈ S(U) on a C ∗ -algebra U is called a vacuum state if ω(Q(f )∗ Q(f )) = 0
for all Q ∈ U and for any smooth function f whose Fourier transform has bounded support
∆ ⊂ M− (e) for some future-directed timelike vector e ∈ M.
46
A Bochner integral is an integral of a function on a measure space with values in a Banach space. Its definition
is very similar to that of a Lebesgue integral of a complex-valued function on a measure space.
Let V+ = {p : p · p ≥ 0, p0 ≥ 0} denote the closed forward light cone in momentum space. Then
clearly M− (e) ⊂ M\V+ for any future-directed timelike vector e ∈ M, so if ω ∈ S(U) is a state
which satisfies ω(Q(f )∗ Q(f )) = 0 for all Q ∈ U and for any smooth function f whose Fourier
transform has bounded support ∆ ⊂ M\V+ , then ω is a vacuum state.
Conversely, suppose that ω is a vacuum state and suppose that f is a smooth function whose
Fourier transform fb has bounded support ∆ ⊂ M\V+ . If V+ ⊂ M denotes the set of all future-directed timelike vectors, then M\V+ = ∪_{e∈V+} M− (e); because each M− (e) is open, we have obtained an open cover {M− (e)}_{e∈V+} of M\V+ and hence also of ∆. But ∆ is compact, so there exists a finite subcover {M− (ej )}_{j=1}^n (with e1 , . . . , en ∈ V+ ) of ∆. Now let {gj }_{j=1}^n be a partition of unity subordinate to the cover {M− (ej )}_{j=1}^n , i.e. each gj is a smooth function with support in M− (ej ) and Σ_{j=1}^n gj (p) = 1 for all p ∈ ∪_{j=1}^n M− (ej ). Now define the smooth functions {fbj }_{j=1}^n by fbj = gj fb. Then the support of each fbj lies in M− (ej ) and Σ_{j=1}^n fbj = fb. If we denote the inverse Fourier transform of fbj by fj , then we find that
0 ≤ ω(Q(f )∗ Q(f )) = Σ_{j=1}^n Σ_{k=1}^n ω(Q(fj )∗ Q(fk )) ≤ Σ_{j=1}^n Σ_{k=1}^n |ω(Q(fj )∗ Q(fk ))| ≤ 0,

where in the last step we used the property |ω(A∗ B)| ≤ √(ω(A∗ A)) √(ω(B ∗ B)) of states. Thus ω(Q(f )∗ Q(f )) = 0 and we have proved the following proposition.
Proposition 4.29 A state ω ∈ S(U) on a C ∗ -algebra U is a vacuum state if and only if ω(Q(f )∗ Q(f )) =
0 for all Q ∈ U and any smooth function f whose Fourier transform has bounded support ∆ ⊂
M\V+ .
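The finite-subcover-plus-partition-of-unity step used in the proof can be imitated numerically in one dimension (a sketch with made-up cover data: bump functions subordinate to a finite cover of a compact interval, normalized so that they sum to 1 on it):

```python
import numpy as np

def bump(x, center, radius):
    """Smooth function supported in (center - radius, center + radius)."""
    t = (x - center) / radius
    out = np.zeros_like(x)
    inside = np.abs(t) < 1
    out[inside] = np.exp(-1.0 / (1.0 - t[inside] ** 2))
    return out

# a finite cover of the compact set Delta = [0, 1] by three open intervals
centers, radius = [0.0, 0.5, 1.0], 0.45
x = np.linspace(0.0, 1.0, 1001)           # sample points of Delta

raw = np.array([bump(x, c, radius) for c in centers])
total = raw.sum(axis=0)
assert np.all(total > 0)                  # the three intervals really cover Delta
g = raw / total                           # partition of unity subordinate to the cover

assert np.allclose(g.sum(axis=0), 1.0)    # sum_j g_j = 1 on Delta
assert np.all(g >= 0) and np.all(g <= 1)
print("partition of unity: max deviation from 1 =", abs(g.sum(axis=0) - 1).max())
```

Each g_j inherits the support of its bump, exactly as each fb_j = g_j fb inherits support in M− (e_j ) above.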
5 Constructive quantum field theory
After our investigation of the Wightman and Haag-Kastler axiom schemes we might wonder
whether there are any concrete models that satisfy all axioms of one or both of these axiom
schemes. Of course, we have already seen that the free field theories are examples of such concrete
models. There are also some other models that were constructed at a very early stage in the development of rigorous quantum field theory, such as the Schwinger and Thirring models, but these
models turned out to be trivial in the sense that the corresponding fields could be expressed as
functions of free fields. The goal of the constructive quantum field theory program that emerged
in the 1960s was to prove that concrete non-trivial models exist within the Wightman and/or
Haag-Kastler axiom scheme.
In this chapter we will discuss some of the earliest results that were obtained in constructive
quantum field theory. We have used the historical notes [23] and [33] as a guide through the literature, especially concerning the chronology of the results. The two main strategies for constructive
quantum field theory were the Hamiltonian and Euclidean strategy. We will discuss both of them
in separate sections, with a special focus on the scalar boson models with a self-interaction. Because the proofs of almost all theorems that we will be needing are very long and technical, we have
decided not to include them here. Instead, we will focus on the main arguments and we will specify how the different mathematical objects are constructed, without proving that the construction
makes sense mathematically.
5.1 The Hamiltonian approach
In the Hamiltonian approach one begins with a free field theory on Fock space and uses cutoffs in
order to make sense of the interaction term in the Hamiltonian of some interacting field theory.
The methods that are used in this approach are of a functional-analytic nature.
5.1.1 The (λφ4 )2 -model as a Haag-Kastler model
The scalar quantum field theory in 2-dimensional spacetime with a quartic self-interaction was one
of the first non-trivial models that people tried to construct in the 1960s, because it is probably
the simplest of all non-trivial models. The Hamiltonian for this model is given formally by47
H = H0 + λ ∫_R : φ0^4 (x) : dx,    (5.1)
where H0 is the free field Hamiltonian (see also equation (5.2) below), λ is the coupling constant
and φ0 (x) denotes the free field at time t = 0. This model is called the (λφ4 )2 -model, where
the subindex 2 refers to the number of spacetime dimensions. Since the interaction term is not
well-defined, we will introduce a cutoff version of this interaction. However, we will begin with a
discussion of the free field system in two spacetime dimensions.
Description of the free field
Let H ' L2 (R, dp) be the Hilbert space of one-particle momentum-spin wave functions in 2-dimensional Minkowski spacetime M2 for a particle with mass m and spin s = 0 that is equal to its own antiparticle. Since the spacetime is 2-dimensional, these wave functions ψ(p) depend only on a single real variable48 p, and analogous to the 4-dimensional case we define ωp = √(m2 + p2 ).
Let F ≡ F+ (H) be the boson Fock space corresponding to H, and let φ be the free scalar field,
defined on real-valued f ∈ S(M2 ) by
φ(f ) = √(2π) (a∗ (rf ) + a(rf ))
47
Here we use a boldface letter x to denote a single real variable, because the notation x is already used to denote
spacetime vectors, which have two components in this 2-dimensional model.
48
We will write p instead of p for the same reason that we write x instead of x.
as defined before, where for f ∈ S(M2 ) we define (rf )(p) = (1/2π)(2ωp )^{−1/2} ∫_{R2} f (t, x)e^{i(ωp t−px)} dt dx. In
terms of the ill-defined operators a(∗) (p) we can express the field as
φ(t, x) = (1/√2π) ∫_R (dp/√(2ωp )) [e^{i(ωp t−px)} a∗ (p) + e^{−i(ωp t−px)} a(p)].
We also define the sharp-time field φt and its derivative πt := ∂t φt in the same way as in the
4-dimensional case. Of special importance to us will be the t = 0 field
φ0 (x) = (1/√2π) ∫_R (dp/√(2ωp )) e^{−ipx} [a∗ (p) + a(−p)]
and its canonical conjugate π0 . For real-valued f ∈ S(R), both φ0 (f ) and π0 (f ) are essentially
self-adjoint operators that are defined on the subspace D ≡ D+ of F consisting of all finite particle
states, as defined in subsection 2.2.5:
D = ∪_N ⊕_{n=0}^N F^n = {ψ = (ψ 0 , ψ 1 , ψ 2 , . . .) ∈ F : ∃N with ψ n = 0 for all n ≥ N },
where F^n ≡ F+^n (H) is the symmetric n-particle Hilbert space, consisting of all square-integrable
functions ψ n (p1 , . . . , pn ) with ψ n (pσ(1) , . . . , pσ(n) ) = ψ n (p1 , . . . , pn ) for all σ ∈ Sn ; also, we have
used the notation where we write an element ψ ∈ F as a sequence ψ = (ψ 0 , ψ 1 , . . .) with ψ n ∈ F n
for all n. Because φ0 (f ) and π0 (f ) are essentially self-adjoint for real-valued f ∈ S(R), their
closures φ0 (f )− and π0 (f )− are self-adjoint, and by the spectral theorem they define spectral
measures Eφ0 (f ) and Eπ0 (f ) . If O ⊂ R is a bounded open set49 , let
DR (O) = {f ∈ S(R) : f real-valued, supp(f ) ⊂ O}.
If BR denotes the Borel σ-algebra on R, then we define the set

A(O) = ( ∪_{∆∈BR , f ∈DR (O)} Eφ0 (f ) (∆) ) ∪ ( ∪_{∆∈BR , f ∈DR (O)} Eπ0 (f ) (∆) ).
The Von Neumann algebra A(O) (not to be confused with the set A(O) just defined) is then defined
to be the Von Neumann algebra generated by the set A(O) ⊂ B(F). Equivalently, A(O) is the
Von Neumann algebra generated by the unitary elements eiφ0 (f ) and eiπ0 (f ) with f ∈ DR (O).
We will now show how the operators a(∗) (p) can be defined rigorously. We define the subset
D ⊂ F as the set of all elements ψ = (ψ 0 , ψ 1 , . . .) in D for which ψ n is a Schwartz function for all
n:
D = {ψ = (ψ 0 , ψ 1 , . . .) ∈ D : ψ n ∈ S(Rn ) for all n}.
The annihilation operator a(p) can now be defined as a map a(p) : D → D. The action of a(p)
on an element ψ = (ψ 0 , ψ 1 , . . .) ∈ D is
(a(p)ψ)^{n−1} (p1 , . . . , pn−1 ) = √n ψ^n (p, p1 , p2 , . . . , pn−1 ).
Because a(p) maps D into itself, we can let an arbitrary product a(p1 ) . . . a(pn ) act on D and
hence such products are well-defined operators on D. Furthermore, for any ψ, υ ∈ D such a
product gives rise to a Schwartz function
(p1 , . . . , pn ) 7→ ha(p1 ) . . . a(pn )ψ, υi.
49
We will write open spatial sets (which are subsets of R) by boldface letters to distinguish them from open sets
in two-dimensional spacetime.
Unfortunately, the creation operators a∗ (p) are not so well-behaved as the annihilation operators
a(p). Formally, their action on an element ψ = (ψ 0 , ψ 1 , . . .) ∈ D is
(a∗ (p)ψ)^{n+1} (p1 , . . . , pn+1 ) = (1/√(n + 1)) Σ_{j=1}^{n+1} δ(p − pj )ψ^n (p1 , . . . , pj−1 , pj+1 , . . . , pn+1 ).
The delta function makes it impossible to define a∗ (p) as an operator on a non-trivial subspace of
F, but the fact that for any ψ ∈ H the operator a∗ (ψ) is the adjoint of a(ψ) suggests that we can
make sense of a∗ (p) as a bilinear form on D × D,
D × D ∋ (ψ, υ) 7→ ha∗ (p)ψ, υi := hψ, a(p)υi.
More generally, for any product a∗ (p1 ) . . . a∗ (pn )a(p01 ) . . . a(p0m ) we can define a bilinear form on
D × D by
(ψ, υ) 7→ ha∗ (p1 ) . . . a∗ (pn )a(p01 ) . . . a(p0m )ψ, υi
:= ha(p01 ) . . . a(p0m )ψ, a(pn ) . . . a(p1 )υi.
For fixed ψ, υ ∈ D, the right-hand side is a Schwartz function in the variables pi , p0j , i.e. fψ,υ ∈
S(Rn+m ), where
fψ,υ (p1 , . . . , pn , p01 , . . . , p0m ) = ha∗ (p1 ) . . . a∗ (pn )a(p01 ) . . . a(p0m )ψ, υi.
If F ∈ S 0 (Rn+m ) is a tempered distribution and if we write this distribution as a function
F (p1 , . . . , pn , p01 , . . . , p0m ), then the action of F on fψ,υ can be written as
F (fψ,υ ) = ∫_{R^{n+m}} F (p1 , . . . , pn , p01 , . . . , p0m ) fψ,υ (p1 , . . . , pn , p01 , . . . , p0m ) d^n p d^m p0
= ∫_{R^{n+m}} F (p1 , . . . , pn , p01 , . . . , p0m ) ha∗ (p1 ) . . . a∗ (pn )a(p01 ) . . . a(p0m )ψ, υi d^n p d^m p0 .
In this sense, we may say that for each distribution F (p1 , . . . , pn , p01 , . . . , p0m ) ∈ S 0 (Rn+m ) we can
define the integral
∫_{R^{n+m}} F (p1 , . . . , pn , p01 , . . . , p0m ) a∗ (p1 ) . . . a∗ (pn )a(p01 ) . . . a(p0m ) d^n p d^m p0 .
Since ωp δ(p − p0 ) is a Schwartz distribution in the variables p and p0 , i.e. ωp δ(p − p0 ) ∈ S 0 (R2 ),
we can use this to define for each n the operator
Nn := ∫_{R2} ωp^n δ(p − p0 )a∗ (p)a(p0 ) dp dp0 = ∫_R ωp^n a∗ (p)a(p) dp.

For n = 0 this gives the number operator N0 = N and for n = 1 this gives the free Hamiltonian N1 = H0 ,

H0 = ∫_R ω(p)a∗ (p)a(p) dp.    (5.2)
This bilinear form gives rise to a self-adjoint operator, the domain of which we denote by D(H0 ).
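The operators a(p), a∗ (p), N and H0 have a transparent finite analogue for a single momentum mode (an illustrative sketch: one mode of energy ωp and a hypothetical truncation of the Fock space at n_max particles):

```python
import numpy as np

n_max = 12                      # truncation of the single-mode Fock space
omega = 1.7                     # energy of this one momentum mode (arbitrary value)

# lowering operator: a |n> = sqrt(n) |n-1>
a = np.diag(np.sqrt(np.arange(1, n_max)), k=1)
adag = a.T.conj()

N = adag @ a                    # number operator
H0 = omega * N                  # one-mode analogue of H0 = integral of w(p) a*(p)a(p) dp

# canonical commutation relation [a, a*] = 1 (exact away from the truncation edge)
comm = a @ adag - adag @ a
assert np.allclose(np.diag(comm)[:-1], 1.0)

# N has eigenvalues 0, 1, ..., n_max - 1, so H0 is bounded below with ground state |0>
assert np.allclose(np.sort(np.linalg.eigvalsh(H0)), omega * np.arange(n_max))
vac = np.zeros(n_max); vac[0] = 1.0
assert np.allclose(H0 @ vac, 0.0)
print("one-mode H0 spectrum starts:", np.diag(H0)[:4])
```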
The interaction term
For g ∈ S(R), let gb ∈ S(R) denote its Fourier transform. Then, as a bilinear form, we may define
V (g) := Σ_{n=0}^{4} (4 choose n) ∫_{R4} dp1 . . . dp4 [ gb(p1 + . . . + p4 ) / ( (2π)^{3/2} √ωp1 . . . √ωp4 ) ] a∗ (p1 ) . . . a∗ (pn )a(−pn+1 ) . . . a(−p4 ).
This bilinear form defines a self-adjoint operator (with domain D(V (g))), which we will also denote
by V (g), that is essentially self-adjoint on the subspace
D0 = ∩_{n=0}^∞ D(H0^n ).
The right-hand side in the definition of V (g) can be further rewritten, resulting in
V (g) = Σ_{n=0}^{4} (4 choose n) ∫_{R4} dp1 . . . dp4 [ (1/√2π) ∫_R g(x)e^{−i(p1 +...+p4 )x} dx ] / [ (2π)^{3/2} √ω(p1 ) . . . √ω(p4 ) ] a∗ (p1 ) . . . a∗ (pn )a(−pn+1 ) . . . a(−p4 )

= ∫_R { ∫_{R4} [ e^{−ip1 x} dp1 / (√2π √ωp1 ) ] . . . [ e^{−ip4 x} dp4 / (√2π √ωp4 ) ] : [a∗ (p1 ) + a(−p1 )] . . . [a∗ (p4 ) + a(−p4 )] : } g(x)dx

= ∫_R : [ ∫_R ( e^{−ipx} dp / (√2π √ωp ) ) (a∗ (p) + a(−p)) ]^4 : g(x)dx

= ∫_R : φ0^4 (x) : g(x)dx.
So V (g) is in fact a smeared out version of the interaction term in the total Hamiltonian (5.1). We
define the cut-off Hamiltonian H(g) to be
H(g) = H0 + V (g).
This cut-off Hamiltonian is self-adjoint with domain D(H0 ) ∩ D(V (g)).
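The role of the Wick ordering in V (g) can be seen in the same one-mode truncation (illustrative, with made-up parameters): writing φ ∝ a + a∗ , the normally ordered power :φ^4: = Σ_{n=0}^{4} (4 choose n)(a∗ )^n a^{4−n} has all creators on the left, mirroring the n-sum in the definition of V (g), and its vacuum expectation value vanishes, unlike that of the unordered φ^4.

```python
import numpy as np
from math import comb
from numpy.linalg import matrix_power

n_max = 20
a = np.diag(np.sqrt(np.arange(1, n_max)), k=1)
adag = a.T.conj()
phi = a + adag                  # one-mode stand-in for the field at a point

# :phi^4: = sum_n C(4, n) (a*)^n a^(4-n)  (all creators moved to the left)
phi4_normal = sum(comb(4, n) * matrix_power(adag, n) @ matrix_power(a, 4 - n)
                  for n in range(5))

vac = np.zeros(n_max); vac[0] = 1.0
# the vacuum expectation of the Wick-ordered power vanishes ...
assert np.isclose(vac @ phi4_normal @ vac, 0.0)
# ... while <0| phi^4 |0> = 3 for the unordered power
assert np.isclose(vac @ matrix_power(phi, 4) @ vac, 3.0)
print("<0|:phi^4:|0> = 0,  <0|phi^4|0> = 3")
```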
For any bounded open set O ⊂ R we define the set
Ot = {x ∈ R : dist(x, O) < |t|}.
With this notation, Glimm and Jaffe show in the article [11] that the free Hamiltonian satisfies
e^{itH0} A(O)e^{−itH0} ⊂ A(Ot ),    (5.3)
where A(O) is the Von Neumann algebra that was defined above. Now let Eφ0 (f ) be the spectral
measure of the closure of φ0 (f ), which was already used in the definitions of A(O) and A(O). We
then define M to be the Von Neumann algebra generated by the set of projections
∪_{∆∈BR , f ∈S(R)} Eφ0 (f ) (∆).
Note that the functions f are not assumed to have a bounded support this time. Perhaps the
most important result in the article [11] is the following theorem, which will allow us to remove
the cut-off in the time-evolution of a local observable.
Theorem 5.1 Let O ⊂ R be an open interval. If g ∈ S(R) is real-valued with supp(g) ⊂ O, then
e^{itV (g)} ∈ A(O) ∩ M.    (5.4)
In what follows we choose O to be of the form O = {x ∈ R : |x| < M }. Now let A ∈ A(O), and
fix some n ∈ N and some g ∈ S(R). For 1 ≤ k ≤ n and t ∈ R we then define
An,k (t) := [e^{(it/n)H0} e^{(it/n)V (g)}]^k A [e^{−(it/n)V (g)} e^{−(it/n)H0}]^k .
Let ε > 0. We can write g as a sum g = g1 + g2 , where g1 and g2 are smooth and satisfy supp(g1 ) ⊂ Oε and supp(g2 ) ∩ Oε/2 = ∅. Then V (g) = V (g1 ) + V (g2 ) and it follows from the definition of V (g) that V (g1 ) and V (g2 ) commute. Thus, for any t ∈ R we have

e^{(it/n)V (g)} = e^{(it/n)V (g1 )} e^{(it/n)V (g2 )} .
Because supp(g2 ) ∩ Oε/2 = ∅, e^{(it/n)V (g2 )} commutes with50 A(Oε/4 ) and hence, in particular, with our operator A ∈ A(O) ⊂ A(Oε/4 ). From this it follows that for An,1 (t) we have

An,1 (t) = e^{(it/n)H0} e^{(it/n)V (g)} A e^{−(it/n)V (g)} e^{−(it/n)H0}
= e^{(it/n)H0} e^{(it/n)V (g1 )} e^{(it/n)V (g2 )} A e^{−(it/n)V (g2 )} e^{−(it/n)V (g1 )} e^{−(it/n)H0}
= e^{(it/n)H0} e^{(it/n)V (g1 )} A e^{−(it/n)V (g1 )} e^{−(it/n)H0} .
Thus, An,1 (t) only depends on g1 ; in other words, An,1 (t) depends only on the value of g in the region Oε . Theorem 5.1 implies that e^{(it/n)V (g1 )} A e^{−(it/n)V (g1 )} ∈ A(Oε ) and equation (5.3) then implies that

An,1 (t) = e^{(it/n)H0} e^{(it/n)V (g1 )} A e^{−(it/n)V (g1 )} e^{−(it/n)H0} ∈ A((Oε )t/n ) = A(Oε+t/n ).

Because An,k (t) = e^{(it/n)H0} e^{(it/n)V (g)} An,k−1 (t) e^{−(it/n)V (g)} e^{−(it/n)H0} , we can repeat the procedure above (with An,k−1 (t) instead of A) by choosing in each step an appropriate decomposition g = g1 + g2 . The result is that for each 1 ≤ k ≤ n we have An,k (t) ∈ A(Okt/n+kε ) and that An,k (t) depends only on the value of g in Okt/n+kε . In particular, for k = n we find that An,n (t) ∈ A(Ot+nε ) and that An,n (t) depends only on the value of g in Ot+nε . Because ε was arbitrary, An,n (t) depends only on the value of g in Ōt (the closure of Ot ), and An,n (t) ∈ ∩_{ε>0} A(Ot+ε ). Thus, if O0 ⊂ R is an open region with O0 ∩ Ot = ∅ then An,n (t) commutes with every observable B ∈ A(O0 ). Because n was arbitrary, the statements above hold for all n and thus also for
σt (A) := e^{itH(g)} A e^{−itH(g)} = strong-lim_{n→∞} An,n (t),

where we have used the Trotter product formula, which states that if S and T are self-adjoint and S + T is essentially self-adjoint on D(S) ∩ D(T ), then for each ψ ∈ H we have

e^{i(S+T )} ψ = lim_{n→∞} ( e^{(i/n)S} e^{(i/n)T } )^n ψ.

In the present case this means that

e^{itH(g)} = strong-lim_{n→∞} ( e^{(it/n)H0} e^{(it/n)V (g)} )^n .
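The Trotter product formula invoked here can be checked directly for finite-dimensional self-adjoint operators, where essential self-adjointness is automatic (a numerical sketch; the two random Hermitian matrices merely stand in for H0 and V (g)):

```python
import numpy as np

def exp_i(H, t=1.0):
    """Unitary e^{itH} for a Hermitian matrix H, via the spectral theorem."""
    w, v = np.linalg.eigh(H)
    return (v * np.exp(1j * t * w)) @ v.conj().T

rng = np.random.default_rng(2)
def herm(d=4):
    X = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    return (X + X.conj().T) / 2

S, T = herm(), herm()            # stand-ins for H0 and V(g)
exact = exp_i(S + T)

for n in [1, 10, 100, 1000]:
    trotter = np.linalg.matrix_power(exp_i(S, 1 / n) @ exp_i(T, 1 / n), n)
    err = np.linalg.norm(trotter - exact)
    print(f"n = {n:5d}   error = {err:.2e}")

# the error decreases like 1/n; at n = 1000 it is already small
n = 1000
trotter = np.linalg.matrix_power(exp_i(S, 1 / n) @ exp_i(T, 1 / n), n)
assert np.linalg.norm(trotter - exact) < 5e-2
```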
In particular, σt (A) depends only on the value of g in the region Ot . The idea is now to take
g ∈ S(R) to be a nonnegative function such that it equals the coupling constant λ in the region
Ot . The time evolution σt (A) of A ∈ A(O) is then determined by the value of g in the region
where it equals λ, and hence the cut-off has been removed. This was the main result of the article
[11]. We will now turn to the second article, namely [12], of Glimm and Jaffe on the (λφ4 )2 -model.
The ground state of H(g) and the field operators
The Hamiltonian H(g) defined above is bounded from below, i.e. its spectrum has an infimum
Eg ∈ R. This was shown by Nelson in [27], and later in a more general context by Glimm in
[10]. There exists a vector Ωg ∈ F that satisfies H(g)Ωg = Eg Ωg and kΩg k = 1 and this vector
is uniquely determined up to a phase factor. This phase factor is fixed by the requirement that
hΩg , ΩFock i > 0. The existence and uniqueness of the ground state Ωg are the content of theorems
2.2.1 and 2.3.1 of the article [12].
Using the t = 0 field φ0 (x), we define the field φg (t, x) by
φg (t, x) = eitH(g) φ0 (x)e−itH(g)
as a bilinear form on some subset of F, and if ψ ∈ F lies in this subset, then the function
(t, x) ↦ ⟨φg (t, x)ψ, ψ⟩ is continuous. For each f ∈ S(R2) we then define another bilinear form
Ag,f (t) by
Ag,f (t) = ∫R φg (t, x)f (t, x) dx.
This bilinear form gives rise to a self-adjoint operator, which we will also denote by Ag,f (t); that this is so is argued by Glimm and Jaffe in the proof of theorem 5.1 above. In a
similar fashion, we obtain a self-adjoint operator from the bilinear form
Bg,f (t) = ∫R πg (t, x)f (t, x) dx,
where πg (t, x) = eitH(g) π0 (x)e−itH(g) . Using these self-adjoint operators, we then define the integrals
φg (f )ψ := ∫R Ag,f (t)ψ dt
πg (f )ψ := ∫R Bg,f (t)ψ dt,
which in turn define closed symmetric operators φg (f ) and πg (f ). Under certain conditions on f ,
which we will not discuss here (see section 3.2 of the article [12] for the details), these operators
satisfy
(∂t φg )(f ) = πg (f ) = [iH(g), φg (f )]
on a certain subset of F. A similar reasoning then also gives that
(∂t2 φg )(f ) = [iH(g), πg (f )].
The commutator on the right-hand side is a bilinear form equal to
(∂x2 φg )(f ) − m2 φg (f ) − 4 ∫R2 : φg3 (t, x) : f (t, x)g(x) dxdt,
where : φg3 (t, x) : is a shorthand for eitH(g) : φ03 (x) : e−itH(g), so φg satisfies the differential equation
(□ + m2 )φg (f ) = −4λ : φg3 : (f ),
where the equality is in the sense of bilinear forms. If f has compact support, the operator φg (f )
is self-adjoint and the differential equation above can be interpreted as an equality of self-adjoint
operator-valued distributions. Also, if f has compact support, say supp(f ) ⊂ O for some bounded
open region in R2 , then we can remove the cutoff g in a similar manner as we did for operators
A ∈ A(O) above, i.e. by choosing a function g (O) ∈ S(R) such that g (O) (x) = λ on an interval I
of R whose causal shadow contains O. Thus for each bounded open subset O we can define a field
φ(O) (t, x) without any cutoff, where φ(O) (f ) = φg(O) (f ). We now want to patch such fields φ(O) (t, x)
together to form a field φ′(t, x) without cutoffs. To accomplish this, we divide R2 into overlapping
squares Sj and we define a partition of unity subordinate to this open cover of overlapping squares,
see also section 3.4 of [12]. Thus, we define a set of functions ζj : R2 → [0, 1] with supp(ζj ) ⊂ Sj
and with Σj ζj = 1R2. A function f ∈ S(R2) can then be written as f = Σj f ζj =: Σj fj , where
supp(fj ) ⊂ Sj . The idea is now to define a bilinear form
A′g,f (t) = Σj ∫R φ(Sj ) (t, x)fj (t, x) dx,
which gives rise to a self-adjoint operator which we will also denote by A′g,f (t). In a similar
manner we also obtain a self-adjoint operator B′g,f (t) by replacing φ(Sj ) by π(Sj ). We then define
the integrals
φ′(f )ψ := ∫R A′g,f (t)ψ dt    (5.5)
π′(f )ψ := ∫R B′g,f (t)ψ dt,    (5.6)
which give rise to closed symmetric operators φ′(f ) and π′(f ) for f ∈ S(R2). We write φ′ and π′
(instead of φ and π) to distinguish these objects from the free field φ and its time-derivative π.
The field φ′ is local in the sense that φ′(f1 ) and φ′(f2 ) commute whenever the supports of f1 and
f2 are mutually spacelike separated.
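The only properties of the functions ζj that the patching argument uses are smoothness, the support condition, and the fact that they sum to 1. A one-dimensional sketch (my own illustration, not the construction of [12]), with overlapping intervals Sj = (j−1, j+1) in place of squares:

```python
# Partition of unity subordinate to the overlapping intervals S_j = (j-1, j+1):
# normalize the translates of a smooth bump function supported in (-1, 1).
import math

def bump(x):
    # smooth on R, strictly positive on (-1, 1), zero outside
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def zeta(j, x):
    # only translates with |x - k| < 1 contribute to the normalization
    total = sum(bump(x - k) for k in range(math.floor(x) - 1, math.floor(x) + 2))
    return bump(x - j) / total

for x in (-2.3, 0.0, 0.5, 3.7):
    nearby = range(math.floor(x) - 1, math.floor(x) + 2)
    print(x, sum(zeta(j, x) for j in nearby))   # each sum is 1 up to rounding
```

Each ζj inherits smoothness from the bump, is supported in Sj, and the translates sum to 1 everywhere, so any f decomposes as f = Σj f ζj with supp(f ζj) ⊂ Sj.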
The algebra of local observables
For each bounded open subset O ⊂ R2 of spacetime, we define U(O) to be the von Neumann
algebra generated by the set
{eiφ′(f ) : supp(f ) ⊂ O, f = f̄ }
of (bounded) operators on F. We will show that these algebras satisfy the Haag-Kastler axioms.
It is clear that for bounded open sets O1 ⊂ O2 ⊂ R2 we have U(O1 ) ⊂ U(O2 ), so monotonicity is
satisfied. By construction of the field φ′, we can find for any bounded open O ⊂ R2 a function g
which equals λ on an interval of R and is such that for all f ∈ S(R2) with supp(f ) ⊂ O we have
ei∆tH(g) φ′(f ) e−i∆tH(g) = φ′(f∆t,0 ),    (5.7)
where f∆t,∆x (t, x) = f (t−∆t, x−∆x). Because supp(f∆t,0 ) ⊂ O + (∆t, 0), we see that the spectral
measures of φ′(f∆t,0 ) are in U(O + (∆t, 0)). So (5.7) induces a map U(O) → U(O + (∆t, 0)) for each
bounded open region O ⊂ R2 and this map is in fact a C∗-isomorphism. Thus, this map gives us
a transformation of the algebras U(O) under time translations that is of the form required by the
Haag-Kastler axioms. For space translations, we first consider the free field generator P of space
translations. Because φ′(0, x) = φ0 (x), we have
e−i∆xP φ′(0, x) ei∆xP = φ′(0, x + ∆x).
If one now chooses a cutoff function g such that the interval where g = λ is large enough, we get
e−i∆xP φ′(t, x) ei∆xP = e−i∆xP eitH(g) φ′(0, x) e−itH(g) ei∆xP
= eitH(g) e−i∆xP φ′(0, x) ei∆xP e−itH(g)
= eitH(g) φ′(0, x + ∆x) e−itH(g)
= φ′(t, x + ∆x),
see section 3.6 of [12]. In particular, this shows that
e−i∆xP φ′(f ) ei∆xP = φ′(f0,∆x )
for any f ∈ S(R2 ) with bounded support. Applying the same reasoning as for time translations,
we find a map that transforms U(O) to U(O + (0, ∆x)) as required by the Haag-Kastler axioms.
Showing that there also exists a transformation of the algebras U(O) under Lorentz boosts with
all the desired properties is much harder. In fact, Cannon and Jaffe have written an article of 61
pages to prove this covariance under Lorentz boosts, see [4]. The starting point of their solution is
to consider the expression for the Lorentz boost generator as used in physics, i.e. the expression
in terms of the energy-momentum tensor of the field. Analogously to the interaction term in the
Hamiltonian, they introduce cutoff functions to obtain a well-defined local version of the boost
generator. This local boost generator then defines a transformation of the algebras U(O) under
Lorentz boosts, and this transformation can be shown to have the desired properties. This, together
with the results about spacetime translations above, completes the verification of the covariance
axiom. So for each Poincaré transformation (a, L) in two-dimensional spacetime, we have a C∗-automorphism σ(a,L) : U → U such that for each bounded region O of spacetime the restriction of σ(a,L) to U(O) defines a C∗-isomorphism
σ(a,L) : U(O) → U((a, L)O).    (5.8)
Finally, the locality axiom is also satisfied because we already noticed that φ′(f1 ) and φ′(f2 )
commute whenever the supports of the two compactly supported functions f1 and f2 are spacelike
separated, and the algebras U(O) are constructed from the smeared fields φ′(f ) with f compactly
supported. The algebra of observables U is now obtained by taking the norm completion of ⋃O U(O).
5.1.2  The physical vacuum for the (λφ4 )2 -model
As stated above, the spectrum of the cutoff Hamiltonian H(g) has an infimum Eg ∈ R and there exists a unique
unit vector Ωg ∈ F, up to a phase factor, that satisfies H(g)Ωg = Eg Ωg . This vector was called
the ground state of H(g). In order to obtain an operator such that the ground state has eigenvalue
0, we define
Hg := H(g) − Eg .
For this operator the vector Ωg is the unique vector, up to a phase factor, with the property
Hg Ωg = 0. Corresponding to the vector Ωg we can define a linear functional ωg : U → C, with U
the algebra of observables defined above, by
ωg (A) = ⟨AΩg, Ωg⟩.
This linear functional is a state in the sense of C ∗ -algebras, as is any vector state. In the article
[13], Glimm and Jaffe use this state ωg to construct a physical vacuum state. They begin with
a cutoff function g that equals the coupling constant λ in some interval of the form [−M, M ]
and then they define the sequence (gn ) of functions gn (x) = g(x/n). If σx : U → U denotes the
transformation of U corresponding to a space translation over the vector x, then they define a
sequence of states (ωn ) by
ωn (A) = ∫R (h(x/n)/n) ⟨σx (A)Ωgn, Ωgn⟩ dx,    (5.9)
where h is a smooth nonnegative function with support in [−1, 1] and ∫R h(x) dx = 1,
which also implies that x ↦ h(x/n)/n integrates to 1. This sequence (ωn ) of states can be shown
to have a weakly-convergent subsequence (ωnk ), the limit of which is denoted by ω, i.e. for all
A ∈ U we have
limk→∞ ωnk (A) = ω(A).    (5.10)
The obtained state ω can now be used to define a physical vacuum state in a physical Hilbert
space, via the Gelfand-Naimark-Segal construction. Thus we consider the GNS-representation
πω : U → B(Hω ) of the C ∗ -algebra U in the Hilbert space Hω , in which the state ω is given by
ω(A) = ⟨πω (A)Ωω, Ωω⟩,
where the unit vector Ωω ∈ Hω is uniquely determined up to a phase factor. As shown in theorem
2.1 of [13], on the Hilbert space Hω there exists a unitary representation U (a) of the translation
group and this representation satisfies
U (a)πω (A)U (a)∗ = πω (σ(a,1) (A))
U (a)Ωω = Ωω ,
where A ∈ U and σ(a,L) is as in equation (5.8). The existence of U (a) follows from the fact that
ω is a translation-invariant state, i.e. ω(σ(a,1) (A)) = ω(A). The representation U (a) is strongly
continuous, so according to the SNAG-theorem (which we also used in step 1 of the classification
of the irreps of P̃+↑) there exist two commuting self-adjoint operators H and P such that
U (a) = eia·P,
where P = (H, P). The operator H is positive, which is a consequence of (5.10) and of the fact
that each Hgn is positive; see also the end of the proof of theorem 2.1 of [13].
In the proof of theorem 2.2 of [13], it is shown that Hω is a separable Hilbert space and that
for each bounded region O of spacetime, there exists a unitary operator UO : F → Hω such that
for all A ∈ U(O)
πω (A) = UO AUO∗ .
This is also true if O is replaced by a bounded region O of space at time t = 0. So locally the
representation is unitarily equivalent to the local algebra of Fock space operators. For this reason,
the representation (Hω , πω ) is called locally Fock. This property can be used to construct fields on
the Hilbert space Hω as follows. Let O be a bounded open region of spacetime and let f ∈ S(R2 )
be a real-valued function with support in O. Then φ′(f ) is self-adjoint on the Fock space F, and
hence s ↦ eisφ′(f ) defines a strongly-continuous one-parameter unitary group on F. Using the
unitary map UO described above, we then obtain a strongly-continuous one-parameter unitary
group
s ↦ πω (eisφ′(f ) ) = UO eisφ′(f ) UO∗
on the Hilbert space Hω . According to Stone’s theorem there exists a self-adjoint operator φω (f ) on
Hω that generates this unitary group. In Stone’s theorem the self-adjoint operator is constructed
explicitly in terms of the derivative of the unitary group with respect to the parameter, and from
this construction it easily follows that the generators of the two unitary groups on F and Hω are
related by
φω (f ) = UO φ′(f ) UO∗.
In the previous subsection we showed how a partition of unity can be used to define the smeared
fields φ′(f ) for f ∈ S(R2). This same technique can also be used to define φω (f ) for arbitrary
Schwartz functions, see also the last pages of [15].
5.1.3  The P(φ)2 -model and verification of some of the Wightman axioms
At the beginning of the 1970s, all results in the previous two sections on the (λφ4 )2 -model were
rederived for the more general P(φ)2 -model, characterized (formally) by the Hamiltonian
H = H0 + λ ∫R : P(φ0 (x)) : dx,
where P is a polynomial that is bounded from below. So for the P(φ)2 -model the Haag-Kastler
axioms were established, as well as the existence of a vacuum state ω that gives rise to a locally Fock
representation (Hω, πω ) of the algebra of observables; on Hω there is a unitary representation
of the translation group with corresponding energy-momentum operators. Also, the locally Fock
representation allows the construction of fields φω as in the (λφ4 )2 -model. The problem of verifying
the Wightman axioms for the (λφ4 )2 -model could thus be investigated in the more general context
of the P(φ)2 -model.
Some of the first (new) results in this more general context were derived in the articles [14]
and [15]. In these articles, Glimm and Jaffe show that the energy-momentum spectrum lies in the
forward light cone for the P(φ)2 -model, as required by the Wightman axioms (spectral condition),
and that HΩω = 0 = P Ωω . They also show that, under the assumption that the model has
a mass gap, the vacuum vector Ωω is unique and the vacuum expectation values exist; we will
come back to the vacuum expectation values later. What is not established in these articles is
the Lorentz invariance of the vacuum, i.e. ω(σ(0,L) (A)) = ω(A) for all A ∈ U. The state of
affairs for the P(φ)2 -model at this point of history (i.e. 1970/1971) is also summarized in part I
of the lecture notes [16], which can also be found in volume 1 of the two-volume book ’collected
papers’ of Glimm and Jaffe. However, it soon became clear that there was a gap in the proof of the
spectral condition, as was pointed out by Fröhlich and Faris. So the spectral condition was no longer
established for the P(φ)2 -model. In the meantime, Streater proved in the article [31] that if one
could prove the spectral condition, then the Lorentz covariance of the Wightman functions would
follow automatically (given the results that were already established at that point). The existence
of these Wightman functions (as tempered distributions) was established in the article [17] of
115
Glimm and Jaffe which was published later than the article [31] of Streater, but Streater explains
that Glimm and Jaffe communicated some of their results before their article was published. In
[17] Glimm and Jaffe begin with the quantities of the form
⟨φ′(f1 ) . . . φ′(fm )Ωg, Ωg⟩    (5.11)
and they show that these quantities can be bounded in absolute value by a product of Schwartz
space norms ‖f1‖1 . . . ‖fm‖m,
|⟨φ′(f1 ) . . . φ′(fm )Ωg, Ωg⟩| ≤ ‖f1‖1 . . . ‖fm‖m,
independently of the cutoff g, and such that each of the norms is translation invariant (i.e. a
translation of a function f does not change the norm of f ). In a similar manner as in equation
(5.9) (with n = 1), they then average the quantities in (5.11) over space translations:
∫R h(x) ⟨φ′((f1 )((0,x),1) ) . . . φ′((fm )((0,x),1) )Ωg, Ωg⟩ dx,
where h is a similar function as in (5.9) and f(a,L) (x) = f (L−1 (x − a)) as usual. Due to translation
invariance of the norms described above, this averaged quantity is also bounded by the same product of norms. By considering sequences (gn ) as in (5.9) and by taking a convergent subsequence,
we then obtain quantities that we denote by
⟨φω (f1 ) . . . φω (fm )Ωω, Ωω⟩.
It is quite nice that this procedure gives us the opportunity to somehow define the vacuum expectation values for the field φω , even though we are not sure whether the expressions φω (f1 ) . . . φω (fm )Ωω
are well-defined. The bounds above still hold for these vacuum expectation values, which shows
that they are separately continuous and therefore, by the nuclear theorem, define tempered distributions on the Schwartz space S(R2m ). Although it is not yet clear whether these vacuum
expectation values satisfy all the properties of Wightman functions, we can still use the reconstruction theorem to construct a Hilbert space with quantum fields, but this theory might not
satisfy all the Wightman axioms. For instance, it is not clear whether the spectrum condition
is satisfied or whether the vacuum is unique. Also Lorentz covariance is not established, but as
explained above this follows once the spectrum condition holds. A summary of all results for the
P(φ)2 -model up to this moment in history (i.e. 1972) is given in the notes [18], which can also be
found in volume 1 of the ’collected papers’.
5.1.4  Similar methods for other models
Without any further details we mention that, up to the beginning of the 1970s, techniques similar to those used for the P(φ)2 -model were also used to establish some results for other
models. Among these models were the two-dimensional Yukawa model, or Y2 -model, and the two-dimensional model with exponential bosonic self-interaction. However, the results for these models
did not go as far as those for the P(φ)2 -model. For a summary of the results for the Y2 -model,
one can consult part II of the notes [16].
5.2  The Euclidean approach
Despite the hard work of the constructive field theorists described above, the results at the
beginning of the 1970s were still very limited. For that reason there was a great need for a new
approach to the constructive field theory program, other than the brute-force methods above. This new
approach was Euclidean quantum field theory.
To understand Euclidean quantum field theory, recall that the Wightman functions are boundary values of holomorphic functions Wi1...in hol (z1 , . . . , zn ) defined on the extended tube Tn′. Here we
consider the general case of a quantum field theory in d-dimensional Minkowski spacetime Md. It
can be shown that in a quantum field theory (in the sense of Wightman) with a normal connection between spin and statistics, these holomorphic functions can be analytically continued to the
symmetrized tube
(Tn′)S := ⋃σ∈Sn σTn′,
where σ ∈ Sn permutes the n variables of (z1 , . . . , zn ) ∈ Tn′ in the obvious way. This is (part of)
the content of theorem 9.6 in [2]. To each x ∈ Md we now assign an element x′ ∈ Cd given by
x′ = (ix0, x).    (5.12)
A point (z1 , . . . , zn ) ∈ Cdn where each of the zj is of the form (5.12) is called a Euclidean point.
If such a Euclidean point satisfies the property that zj ≠ zk for j ≠ k, then we speak of a non-exceptional (or non-coincident) Euclidean point. The important step toward Euclidean quantum
field theory is the statement that (Tn′)S contains all non-exceptional Euclidean points, which is
proposition 9.10 in [2]. As a consequence, the Wi1...in hol (z1 , . . . , zn ) are holomorphic on the set of
non-exceptional Euclidean points. We can use this property as follows. Define the set of points
(Rd )n≠ := {(x1 , . . . , xn ) ∈ (Rd )n : xj ≠ xk for all j ≠ k}
and let x ↦ x′ be as in (5.12). Then we can define the Schwinger functions
Si1...in (x1 , . . . , xn ) := Wi1...in hol (x1′, . . . , xn′),
which are holomorphic functions on (Rd )n≠. The most important property of the Schwinger functions is that they are E+ (d)-covariant, i.e. covariant in the Euclidean sense.
5.2.1  Euclidean fields and probability theory
At the beginning of the 1970s Edward Nelson developed a framework for Euclidean quantum fields
in terms of certain stochastic processes. For a good comprehension of these ideas we first recall
some terminology from probability theory. The entire content of this subsection can be found in
[30] and [28], but the order in which we present the material is quite different from these references.
Probability spaces
A probability space is a measure space (X, A, µ) with µ(X) = 1. The σ-algebra A has the structure
of a ring when we define addition by A∆B = (A\B) ∪ (B\A) and multiplication by A ∩ B. If Nµ
denotes the collection of all sets in A with µ-measure zero, then Nµ is an ideal in A and we can
define the quotient ring A/Nµ ; we denote the equivalence class of A ∈ A by [A]. The measure µ
then defines a function [µ] on this quotient in the obvious way. Two probability spaces (X, A, µ)
and (X′, A′, µ′) are called isomorphic if there exists a ring isomorphism ψ : A/Nµ → A′/Nµ′ such
that for all [A] ∈ A/Nµ we have [µ′](ψ([A])) = [µ]([A]).
Let (X, A, µ) and (X′, A′, µ′) be two probability spaces and let T : X → X′ be (A, A′)-measurable, i.e. T−1 (A′) ∈ A for all A′ ∈ A′. Then T is called a measure-preserving transformation if µ(T−1 (A′)) = µ′(A′) for all A′ ∈ A′. If T is bijective and if its inverse T−1 : X′ → X
is also measure-preserving, then T is called an invertible measure-preserving transformation. In
particular, we can apply this terminology to the case where the two probability spaces coincide. In
this case the invertible measure-preserving transformations form a group under composition, which
will be denoted by T (X, A, µ), or simply T when there is no confusion about the probability space.
Random variables
A function f : X → R on a probability space (X, A, µ) is called a random variable if it is (A, BR)-measurable, where BR denotes the Borel σ-algebra on R. We denote the set of all random variables
on (X, A, µ) by LR (X, A). A random variable f ∈ LR (X, A) defines a probability measure µf ,
called the probability distribution of f , on the measurable space (R, BR ) by
µf (B) := µ(f −1 (B)).
The Fourier transform cf of µf ,
cf (t) := ∫R eitx dµf (x) = ∫X eitf dµ,
is called the characteristic function of f . The expectation value of an integrable random variable
f ∈ LR (X, A) is defined by
Eµ (f ) := ∫X f dµ = ∫R x dµf (x).
Often, we will also write ⟨f⟩µ to denote the expectation value of f . If f ∈ LR (X, A) is a random
variable with fn integrable, then the n-th moment of f is defined as
∫X fn dµ = ∫R xn dµf (x).
If the characteristic function cf is C∞, then f has moments of all orders and we can obtain these
moments by differentiation of cf :
∫X fn dµ = (−i)n (d/dt)n cf |t=0.
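As a quick sanity check of the differentiation formula (my own illustration, not from the text), take f with values ±1, each with probability 1/2, so that cf (t) = cos t; finite differences recover the first two moments 0 and 1:

```python
# Moments from the characteristic function: for f = +/-1 with probability 1/2
# each, c_f(t) = (1/2)e^{it} + (1/2)e^{-it} = cos(t), and central finite
# differences approximate its derivatives at t = 0.
import cmath

def cf(t):
    return 0.5 * cmath.exp(1j * t) + 0.5 * cmath.exp(-1j * t)

h = 1e-3
d1 = (cf(h) - cf(-h)) / (2 * h)                 # first derivative at 0
d2 = (cf(h) - 2 * cf(0.0) + cf(-h)) / (h * h)   # second derivative at 0

m1 = (-1j) ** 1 * d1    # E[f]   = 0
m2 = (-1j) ** 2 * d2    # E[f^2] = 1
print(m1.real, m2.real)
```

The first moment vanishes by symmetry, and the second moment comes out as 1 up to the O(h2) error of the finite differences.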
If we have two isomorphic probability spaces (X, A, µ) and (X′, A′, µ′) with isomorphism ψ :
A/Nµ → A′/Nµ′, then we say that two random variables f ∈ LR (X, A) and f′ ∈ LR (X′, A′)
correspond under the isomorphism ψ if for all B ∈ BR we have ψ([f−1 (B)]) = [(f′)−1 (B)]. When
we come to define Markov fields, we need the following theorem, a proof of which can be found in
section III.3 of [30] (theorem III.7).
Theorem 5.2 Let (X, A, µ) be a probability space and let A′ ⊂ A be a σ-subalgebra. We write µ′
for the restriction of µ to A′. If f ∈ LR (X, A) is an integrable random variable, then there exists
a unique function (f |A′) ∈ LR (X, A′) such that for all g ∈ L∞ (X, A′, µ′) we have
∫X g · (f |A′) dµ′ = ∫X g f dµ,
i.e. Eµ′ (g · (f |A′)) = Eµ (gf ).
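On a finite probability space the theorem is elementary: the sub-σ-algebra generated by a partition of X forces (f |A′) to be constant on each block, and the defining identity fixes it to be the µ-weighted block average. A minimal numerical check (the space, measure and functions below are made up for illustration):

```python
# Conditional expectation with respect to the sub-sigma-algebra generated by
# a partition: (f|A') is the mu-weighted average of f over each block, and
# E[g * (f|A')] = E[g f] holds for every A'-measurable g.
X = [0, 1, 2, 3]
mu = {0: 0.1, 1: 0.4, 2: 0.2, 3: 0.3}      # probability measure on X
partition = [[0, 1], [2, 3]]               # blocks generating A'
f = {0: 5.0, 1: -1.0, 2: 2.0, 3: 4.0}

def expect(h):
    return sum(h[x] * mu[x] for x in X)

cond = {}                                  # the conditional expectation (f|A')
for block in partition:
    weight = sum(mu[x] for x in block)
    average = sum(f[x] * mu[x] for x in block) / weight
    for x in block:
        cond[x] = average

# g is A'-measurable exactly when it is constant on each block
g = {0: 2.0, 1: 2.0, 2: -3.0, 3: -3.0}
lhs = expect({x: g[x] * cond[x] for x in X})
rhs = expect({x: g[x] * f[x] for x in X})
print(lhs, rhs)   # the two expectations agree up to rounding
```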
Finally, we want to define the notion of a representation on a probability space. Let (X, A, µ)
be a probability space and let T be the group of invertible measure-preserving transformations on
(X, A, µ), as defined above. Note that any transformation T ∈ T defines a map from LR (X, A) to
itself, which we will also denote by T , given by
(T f )(x) := f (T −1 (x)).
A representation of a group G on the probability space (X, A, µ) is a homomorphism T : G → T .
We often write Tg rather than T (g) in this context. In case G is a topological group, we also
assume that a representation T is ’continuous’ in the following sense. If gn → g with respect to
the topology in G and if f ∈ LR (X, A), then Tgn f → Tg f in measure, which means that for all
ε > 0 we have limn→∞ µ({x ∈ X : |Tgn f (x) − Tg f (x)| ≥ ε}) = 0.
Sets of random variables
If f1 , . . . , fn ∈ LR (X, A) are random variables on a probability space (X, A, µ), then we define
their joint probability distribution µf1 ,...,fn on (Rn , BRn ) by
µf1 ,...,fn (B) = µ((f1 , . . . , fn )−1 (B))
for B ∈ BRn , where on the right-hand side (f1 , . . . , fn ) : X → Rn is given by (f1 , . . . , fn )(x) =
(f1 (x), . . . , fn (x)). Their joint characteristic function cf1 ,...,fn is defined by
cf1,...,fn (t1 , . . . , tn ) = ∫Rn ei(t1 x1 +...+tn xn ) dµf1,...,fn (x) = ∫X ei(t1 f1 +...+tn fn ) dµ.
We will now define the notion of a σ-algebra generated by a set of random variables. Let S ⊂
LR (X, A) be a set of random variables on some probability space (X, A, µ). Then the σ-subalgebra
of A generated by the collection {f −1 (B) : B ∈ BR , f ∈ S} ⊂ A is called the σ-algebra generated
by the collection S of random variables and we will denote it by AS (note that it is the smallest
σ-subalgebra with respect to which all f ∈ S are measurable). The restriction µS of the measure
µ to this σ-subalgebra defines a probability space (X, AS , µS ). On this probability space we can
again define the set of µS -measure zero sets in AS , which we will denote by NµS . In case the
corresponding quotient ring AS /NµS happens to coincide with A/Nµ , we say that the set S is full.
Now fix some set of random variables f1 , . . . , fk ∈ LR (X, A) on a probability space (X, A, µ).
Then we can consider formal power series
Σ(n1,...,nk) an1...nk f1n1 · · · fknk
in the random variables fj , with addition and multiplication of two such series defined in the
obvious way. By ’formal’ we mean that we do not bother about the convergence and we do not
substitute actual relations that are satisfied for the fj (for instance, if f1 ≡ 1 then we still consider
f1 and f1 2 as two different formal power series). We define partial derivatives of these formal
power series by
(∂/∂fj ) Σ(n1,...,nk) an1...nk f1n1 · · · fknk := Σ(n1,...,nk) nj an1...nk f1n1 · · · fjnj−1 · · · fknk,
where we use the convention that fj 0−1 = 0. With these formal power series we can define
Wick products of random variables as follows. For each (n1 , . . . , nk ) ∈ (Z≥0 )k the Wick product
: f1 n1 . . . fk nk : is the unique formal power series in the fj that is defined recursively in n =
n1 + . . . + nk by the following relations
: f10 · · · fk0 : = 1
(∂/∂fj ) : f1n1 · · · fknk : = nj : f1n1 · · · fjnj−1 · · · fknk :
⟨: f1n1 · · · fknk :⟩µ = 0,
where ⟨.⟩µ denotes the expectation value as usual. It follows from the first two relations that
: f1n1 · · · fknk : is a power series of degree n1 in f1 , of degree n2 in f2 , and so on. If we have computed all Wick products with n = n1 + . . . + nk ≤ m for some m ∈ Z≥0 , then the second relation
tells us that the Wick products with n = m + 1 can be obtained by computing anti-derivatives
of the Wick products with n = m. The third relation then fixes the constant term in the power
series expansion of the Wick product with n = m + 1 (’the constant of integration’).
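For a single random variable the recursion can be carried out mechanically on polynomial coefficients. The sketch below (my own implementation, assuming f is a mean-zero Gaussian of variance a, so that its moments are (k−1)!! a^(k/2)) reproduces the familiar identities :f2: = f2 − a and :f3: = f3 − 3af:

```python
# Wick powers of a single random variable f via the recursion in the text:
# antidifferentiate n * :f^{n-1}: and fix the constant of integration by
# requiring <:f^n:> = 0. The expectation is taken for a mean-zero Gaussian
# of variance a (an assumption of this example, not forced by the recursion).

def moment(k, a):
    # k-th moment of a mean-zero Gaussian with variance a: (k-1)!! a^(k/2)
    if k % 2 == 1:
        return 0.0
    m, j = 1.0, k - 1
    while j > 0:
        m *= j
        j -= 2
    return m * a ** (k // 2)

def wick_power(n, a):
    # returns coefficients c[0..n] with :f^n: = sum_k c[k] f^k
    coeffs = [1.0]                                   # :f^0: = 1
    for m in range(1, n + 1):
        # antiderivative of m * :f^{m-1}:, term by term
        new = [0.0] + [m * c / (k + 1) for k, c in enumerate(coeffs)]
        # fix the constant term by requiring <:f^m:> = 0
        new[0] = 0.0 - sum(new[k] * moment(k, a) for k in range(1, m + 1))
        coeffs = new
    return coeffs

a = 2.0
print(wick_power(2, a))   # f^2 - a:    [-2.0, 0.0, 1.0]
print(wick_power(3, a))   # f^3 - 3af:  [0.0, -6.0, 0.0, 1.0]
```

These are the Hermite-type polynomials one expects for a Gaussian variable; for a non-Gaussian f the same recursion would produce different constant terms.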
Random processes, random fields, Markov fields and Euclidean fields
If T is a set and (X, A, µ) is a probability space, then a map ρ : T → LR (X, A) is called a random
process indexed by T . If V is a vector space and (X, A, µ) is a probability space, then a linear map
λ : V → LR (X, A) is called a linear random process indexed by V . In case V is also a topological
vector space, we also assume that vn → v implies that the sequence (λ(vn )) in LR (X, A) converges
in measure to λ(v). In case V = D(Rd ) (see subsection 4.1.1 for the definition), we call λ a random
field. On D(Rd ) we define for m > 0 and q ∈ R≥0 an inner product ⟨., .⟩−q,m by
⟨f, g⟩−q,m := ⟨(−∆ + m2 )−q f, g⟩L2 = ∫Rd [(−∆ + m2 )−q f ](x)g(x) ddx = ∫Rd f̂(k)ĝ(k)/[G(k, k) + m2 ]q ddk,
where ∆ = Σdj=1 ∂2/∂xj2 is the Laplacian and G is the Euclidean inner product on Rd. Let the Hilbert
space Hm−q (Rd ) be the completion of this space. It can be shown that the embedding of D(Rd )
in Hm−1 (Rd ) is continuous, so every linear random process λ : Hm−1 (Rd ) → LR (X, A) defines a
random field when we restrict λ to D(Rd ). For this reason, we will call a linear random process
λ : Hm−1 (Rd ) → LR (X, A) a random field, from now on. These random fields do not exhaust the
set of all random fields, but they will suffice for our purposes, so we will restrict our attention to
these random fields.
Let (X, A, µ) be a probability space and let λ : Hm−1 (Rd ) → LR (X, A) be a random field. If
K ⊂ Rd , let Aλ,K ⊂ A be the σ-subalgebra generated by the set of random variables
{λ(h) ∈ LR (X, A) : h ∈ Hm−1 (Rd ), supp(h) ⊂ K}.
Then the random field λ is called a Markov field over Hm−1 (Rd ) if for all open sets U ⊂ Rd and
for every positive random variable f ∈ LR (X, Aλ,U ) ⊂ LR (X, A) we have the Markov property:
Eµ (f |Aλ,U c ) = Eµ (f |Aλ,∂U ),
where U c = Rd \U and ∂U is the boundary of U .
Now let (X, A, µ) be a probability space. A Euclidean field over Hm−1 (Rd ) is a Markov field
λ : Hm−1 (Rd ) → LR (X, A) together with a representation T of the Euclidean group E(d) on
(X, A, µ) such that for all h ∈ Hm−1 (Rd ) and all g ∈ E(d) we have
Tg (λ(h)) = λ(h ◦ g −1 ).
This property is called Euclidean covariance. It is convenient to assume that for any Euclidean
field the set {λ(h)}h is a full set of random variables. This situation can always be obtained by
making the σ-algebra A smaller. For s ∈ R we will use the notation Ys to denote the subset
{(x1 , . . . , xd ) ∈ Rd : x1 = s}.
Theorem 5.3 Let λ : Hm−1 (Rd ) → LR (X, A) be a Euclidean field with corresponding representation T of E(d) on the probability space (X, A, µ) and let E0 : L2 (X, A, µ) → L2 (X, Aλ,Y0 , µ|Aλ,Y0 ) be
defined by E0 (f ) = (f |Aλ,Y0 ). If Tt ∈ E(d) denotes the translation (x1 , . . . , xd ) ↦ (x1 + t, . . . , xd ),
which in turn defines a transformation on L(X, A), then we can define the operator E0 Tt E0 on
L2 (X, A, µ). If the restriction of this operator to L2 (X, Aλ,Y0 , µ|Aλ,Y0 ) is written as Pt, then there
exists a positive self-adjoint operator H on L2 (X, Aλ,Y0 , µ|Aλ,Y0 ) such that
Pt = e−|t|H.
The operator H plays an important role in Nelson’s axiom scheme for Euclidean field theory, as
we will see later.
Gaussian random variables and Gaussian random processes
If (X, A, µ) is a probability space, then a random variable f ∈ LR (X, A) is called a Gaussian
random variable (G.r.v.) if its characteristic function has the form
cf (t) = e−(1/2)at2
with a ≥ 0. If a = 0 then µf is a Dirac distribution at the origin, while µf is a Gaussian distribution
whenever a > 0. A finite set f1 , . . . , fn ∈ LR (X, A) of random variables is called jointly Gaussian
if their joint characteristic function has the form
cf1,...,fn (t1 , . . . , tn ) = e−(1/2) Σi,j aij ti tj
with (aij ) a symmetric real positive-definite n × n-matrix. We now have the following important
result, which can be found in section I.1 of [30] (proposition I.2).
Proposition 5.4 (Wick’s theorem) Let (X, A, µ) be a probability space and let f1 , . . . , f2n ∈
LR (X, A) be (not necessarily distinct) jointly Gaussian random variables. Then
⟨f1 . . . f2n⟩µ = Σpairings ⟨fi1 fj1⟩µ . . . ⟨fin fjn⟩µ,
where ⟨.⟩µ := Eµ (.) and the sum is over all distinct ways of partitioning the set of indices
{1, . . . , 2n} into n two-element subsets {i1 , j1 }, . . . , {in , jn }.
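The combinatorics of the pairing sum can be checked directly in the special case f1 = . . . = f2n = f with f a mean-zero Gaussian of variance a (my own illustration, not from [30]): every pairing contributes an, so the pairing sum equals (number of pairings) · an, and comparing with the known moment ⟨f2n⟩µ = (2n − 1)!! an shows in particular that the number of pairings is (2n − 1)!!:

```python
# Enumerate all partitions of {0, ..., 2n-1} into unordered pairs and verify
# Wick's theorem for identical Gaussian factors: the pairing sum must equal
# the Gaussian moment (2n-1)!! a^n.

def pairings(indices):
    # all partitions of a list of indices into unordered pairs
    if not indices:
        return [[]]
    first, rest = indices[0], indices[1:]
    result = []
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            result.append([(first, partner)] + tail)
    return result

def double_factorial(k):
    out = 1
    while k > 1:
        out *= k
        k -= 2
    return out

a = 0.5   # variance of f, so <f f> = a for every pair
for n in (1, 2, 3, 4):
    P = pairings(list(range(2 * n)))
    pairing_sum = sum(a ** n for _ in P)       # each pairing contributes a^n
    moment = double_factorial(2 * n - 1) * a ** n
    print(n, len(P), pairing_sum, moment)      # len(P) = (2n-1)!!
```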
If (X, A, µ) is a probability space and V is a real vector space, then a linear random process
γ : V → LR (X, A) is called a Gaussian random process indexed by V if each γ(v) is a G.r.v. and
if the set {γ(v) ∈ LR (X, A) : v ∈ V } is a full set of random variables. Note that γ defines a
semi-inner product ⟨., .⟩γ : V × V → R by
⟨v, w⟩γ := ⟨γ(v)γ(w)⟩µ.
If we define the linear subspace Nγ := {v ∈ V : ⟨v, v⟩γ = 0}, we can form the quotient vector space
V /Nγ to obtain an inner product space, and then complete it to obtain a real Hilbert space HV .
Note that v ∈ Nγ if and only if the probability distribution µγ(v) of the Gaussian random variable
γ(v) : X → R is a Dirac distribution, since ⟨v, v⟩γ = ⟨γ(v)2⟩µ is the variance of γ(v) (which can only
be zero for Dirac distributions). In terms of the random variable itself, the condition ⟨v, v⟩γ = 0
is equivalent to γ(v) ≡ 0. We now give the definition of a Gaussian random process over a real
Hilbert space, which is more restrictive than the definition of a Gaussian random process over a
real vector space.
Definition 5.5 Let (X, A, µ) be a probability space and let H be a real Hilbert space. A linear
random process γ : H → LR (X, A) is called a Gaussian random process indexed by H if
(1) each γ(v) is a G.r.v.,
(2) the set {γ(v) ∈ LR (X, A) : v ∈ H} is a full set of random variables,
(3) ⟨v, w⟩γ = CG ⟨v, w⟩H with CG ∈ R>0 fixed (by some convention) for all v, w ∈ H.
In [30] the convention is that CG = 1/2, but we will keep things general here. It follows from the
positive-definiteness of ⟨., .⟩H that ⟨γ(v)2⟩µ = 0 if and only if v = 0, so for the Gaussian random
variable γ(v) we have γ(v) ≡ 0 if and only if v = 0. The following important theorem, which is
theorem I.6 in [30], states that Gaussian random processes over a real Hilbert space are unique in
a sense.
Theorem 5.6 Let (X, A, µ) and (X′, A′, µ′) be two probability spaces and let H be a real Hilbert
space. If γ : H → LR (X, A) and γ′ : H → LR (X′, A′) are Gaussian random processes indexed
by H, then there exists an isomorphism ψ : A/Nµ → A′/Nµ′ of the probability spaces such that
for every h ∈ H the random variables γ(h) and γ′(h) correspond under the isomorphism ψ, i.e.
ψ([γ(h)−1 (B)]) = [γ′(h)−1 (B)] for all B ∈ BR.
This result allows us to speak of ’the’ Gaussian random process γH indexed by the real Hilbert
space H. The existence of this Gaussian random process is shown in section I.2 of [30], where there
are given several explicit constructions (which are of course equivalent to each other in the sense
of the theorem).
For a real Hilbert space $H$, we will write the underlying probability space for the Gaussian random process as $(Q_H,\mathcal A_H,\mu_H)$, which is often called Q-space in the literature. For fixed $n$, the Wick products $:\gamma_H(v_1)\cdots\gamma_H(v_n):$ are square-integrable and we denote their linear span by $W_H^n$. For $n = 0$ we define $W_H^0$ to be the 1-dimensional space spanned by $1_{Q_H}$. The algebraic (i.e. uncompleted) direct sum $\bigoplus^{\mathrm{(alg)}}_{n\in\mathbb Z_{\ge0}} W_H^n$ is a dense subspace of $L^2(Q_H,\mathcal A_H,\mu_H)$, so
$$L^2(Q_H,\mathcal A_H,\mu_H) = \bigoplus_{n\in\mathbb Z_{\ge0}} W_H^n.$$
Most of the time we will interpret the random variables $\gamma_H(h)$ as multiplication operators on the Hilbert space $L^2(Q_H,\mathcal A_H,\mu_H)$:
$$f \mapsto \gamma_H(h)f.$$
In this way, the Gaussian random variables $\gamma_H(h)$ become the same kind of mathematical object as quantum fields, namely operators on a Hilbert space.
If $A \in B(H)$ is a bounded operator, then we can define a linear operator $\Gamma_n(A)$ on $W_H^n$ by
$$\Gamma_n(A)\,{:}\gamma_H(v_1)\cdots\gamma_H(v_n){:} \;=\; {:}\gamma_H(Av_1)\cdots\gamma_H(Av_n){:}.$$
It can be shown that this definition is algebraically consistent and that the norm of this operator on $W_H^n$ is $\le \|A\|^n$. So if $\|A\| \le 1$, then we can extend this to a linear operator $\Gamma(A)$ on $\bigoplus^{\mathrm{(alg)}}_{n\in\mathbb Z_{\ge0}} W_H^n$, and for this operator we have $\|\Gamma(A)\| \le 1$. So $\Gamma(A)$ is continuous, and we can extend it to all of $L^2(Q_H,\mathcal A_H,\mu_H)$. We thus conclude that any $A \in B(H)$ with $\|A\| \le 1$ defines an operator $\Gamma(A) \in B(L^2(Q_H,\mathcal A_H,\mu_H))$ with $\|\Gamma(A)\| \le 1$.
For each operator $A$ on $H$ with domain $D(A) \subset H$, we can also define an operator $d\Gamma_n(A)$ on ${:}\gamma_H(D(A))\cdots\gamma_H(D(A)){:} \subset W_H^n$ by
$$d\Gamma_n(A)\,{:}\gamma_H(v_1)\cdots\gamma_H(v_n){:} \;=\; \sum_{j=1}^n {:}\gamma_H(v_1)\cdots\gamma_H(Av_j)\cdots\gamma_H(v_n){:}$$
for $v_1,\ldots,v_n \in D(A)$. The extension of this operator to $\bigoplus^{\mathrm{(alg)}}_{n\in\mathbb Z_{\ge0}} {:}\gamma_H(D(A))\cdots\gamma_H(D(A)){:}$ is denoted by $d\Gamma(A)$ and is sometimes called the second quantization of $A$.
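To make the maps $\Gamma_n(A)$ and $d\Gamma_n(A)$ concrete, here is a small finite-dimensional sketch (an illustrative toy with arbitrary test data, not the construction in the text): the level-$n$ space is modelled by the plain $n$-fold tensor power, on which $\Gamma_n(A)$ acts as $A\otimes\cdots\otimes A$ and $d\Gamma_n(A)$ as the corresponding sum of one-slot actions, so that $\Gamma_n(e^{tA}) = e^{t\,d\Gamma_n(A)}$ level by level.

```python
import numpy as np
from scipy.linalg import expm

def gamma_n(A, n):
    """Gamma_n(A) = A (x) ... (x) A on the n-fold tensor power."""
    G = np.eye(1)
    for _ in range(n):
        G = np.kron(G, A)
    return G

def dgamma_n(A, n):
    """dGamma_n(A) = sum_j 1 (x) ... (x) A (x) ... (x) 1, with A in slot j."""
    d = A.shape[0]
    total = np.zeros((d**n, d**n))
    for j in range(n):
        term = np.eye(1)
        for k in range(n):
            term = np.kron(term, A if k == j else np.eye(d))
        total += term
    return total

A = np.array([[0.0, 1.0], [1.0, 0.0]])  # an arbitrary self-adjoint "one-particle" operator
n, t = 3, 0.7
# Second quantization exponentiates: Gamma_n(e^{tA}) = e^{t dGamma_n(A)}
lhs = gamma_n(expm(t * A), n)
rhs = expm(t * dgamma_n(A, n))
print(np.allclose(lhs, rhs))  # True
```

The toy also exhibits the norm bound from the text: $\|\Gamma_n(A)\| = \|A\|^n$ for this self-adjoint $A$.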
Fock space and Gaussian random processes over Hilbert spaces
Let $H$ be a real Hilbert space and let $H_{\mathbb C}$ be its complexification. Let $\mathcal F(H_{\mathbb C})$ be the symmetric Fock space with the creation and annihilation operators $A^{(*)}(h)$ defined as before for $h \in H$ (not $H_{\mathbb C}$). On $\mathcal F(H_{\mathbb C})$ we define for $h \in H$ the Segal field
$$\hat\phi(h) := \sqrt{C_G}\,\bigl(A^*(h) + A(h)\bigr),$$
defined as an operator on the dense subspace $D$ of finite-particle states, which was defined several times before. The operators $\{\hat\phi(h)\}_{h\in H}$ all commute with each other in the sense that their spectral measures $E_{\hat\phi(h)}$ commute. We denote the abelian von Neumann algebra in $B(\mathcal F(H_{\mathbb C}))$ generated by these spectral measures by $\mathcal M$. According to the Gelfand–Naimark theorem we can represent (the unital abelian $C^*$-algebra) $\mathcal M$ as $C(\Sigma(\mathcal M))$, where (the compact Hausdorff space) $\Sigma(\mathcal M)$ is the Gelfand spectrum of $\mathcal M$. We will write $\alpha : \mathcal M \to C(\Sigma(\mathcal M))$ to denote the corresponding $C^*$-isomorphism. The state $A \mapsto \langle A\Omega,\Omega\rangle$, with $\Omega$ the vacuum vector in $\mathcal F(H_{\mathbb C})$, defines a probability measure $\mu_\Omega$ on $\Sigma(\mathcal M)$ (with the Borel $\sigma$-algebra $\mathcal B_{\Sigma(\mathcal M)}$ on the topological space $\Sigma(\mathcal M)$) such that for all $A \in \mathcal M$ we have
$$\langle A\Omega,\Omega\rangle = \int_{\Sigma(\mathcal M)}\alpha(A)\,d\mu_\Omega.$$
Although the $\hat\phi(h)$ are not in $\mathcal M$, there is a natural way in which they can be represented as Borel-measurable functions (and hence random variables) on $\Sigma(\mathcal M)$ by extending the continuous functional calculus to the Borel functional calculus. With abuse of notation we will write these random variables as $\alpha(\hat\phi(h))$. Because on the Fock space we have the equality $\langle e^{i\hat\phi(h)}\Omega,\Omega\rangle = e^{-\frac14\|h\|_H^2}$, the random variables $\alpha(\hat\phi(h))$ are in fact Gaussian:
$$c_{\alpha(\hat\phi(h))}(t) = \int_{\Sigma(\mathcal M)} e^{it\alpha(\hat\phi(h))}\,d\mu_\Omega = \langle e^{it\hat\phi(h)}\Omega,\Omega\rangle = e^{-\frac14 t^2\|h\|_H^2}.$$
The set $\{\alpha(\hat\phi(h))\}_{h\in H}$ is a full set of random variables on the measurable space $(\Sigma(\mathcal M),\mathcal B_{\Sigma(\mathcal M)})$
and, finally, we also have
$$\begin{aligned}\langle h_1,h_2\rangle_{\alpha\circ\hat\phi} &= \langle\alpha(\hat\phi(h_1))\alpha(\hat\phi(h_2))\rangle_{\mu_\Omega} = \int_{\Sigma(\mathcal M)}\alpha(\hat\phi(h_1))\alpha(\hat\phi(h_2))\,d\mu_\Omega\\ &= \langle\hat\phi(h_1)\hat\phi(h_2)\Omega,\Omega\rangle_{\mathcal F(H_{\mathbb C})} = \langle\hat\phi(h_2)\Omega,\hat\phi(h_1)\Omega\rangle_{\mathcal F(H_{\mathbb C})}\\ &= C_G\langle A^*(h_2)\Omega,A^*(h_1)\Omega\rangle_{\mathcal F(H_{\mathbb C})} = C_G\langle h_2,h_1\rangle_H = C_G\langle h_1,h_2\rangle_H,\end{aligned}$$
so by uniqueness of the Gaussian random process indexed by $H$, $\gamma_H := \alpha\circ\hat\phi : H \to L_{\mathbb R}(\Sigma(\mathcal M),\mathcal B_{\Sigma(\mathcal M)})$ must be the Gaussian random process indexed by $H$.
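The characteristic-function identity above can be illustrated numerically. The following sketch (using the convention $C_G = \frac12$ of [30], with an arbitrary test value standing in for $\|h\|_H^2$) computes the expectation of $e^{itX}$ for a centered Gaussian $X$ of variance $\frac12\|h\|_H^2$ by Gauss–Hermite quadrature and compares it with $e^{-\frac14 t^2\|h\|_H^2}$:

```python
import numpy as np

def char_fn(t, s2, order=80):
    """E[exp(i t X)] for X ~ N(0, s2), by probabilists' Gauss-Hermite quadrature."""
    x, w = np.polynomial.hermite_e.hermegauss(order)  # weight exp(-x^2/2)
    x = x * np.sqrt(s2)                               # rescale nodes to variance s2
    return np.sum(w * np.exp(1j * t * x)) / np.sqrt(2 * np.pi)

h_norm2 = 1.3          # stands in for ||h||_H^2 (an arbitrary test value)
t = 0.9
approx = char_fn(t, h_norm2 / 2)          # variance C_G ||h||^2 with C_G = 1/2
exact = np.exp(-t**2 * h_norm2 / 4)       # e^{-t^2 ||h||^2 / 4}
print(abs(approx - exact) < 1e-10)  # True
```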
For $h \in H_{\mathbb C}$ we can define $\hat\phi(h)$ by linearity, as we did when we defined the free field. Now define $U : \bigoplus_{n\in\mathbb Z_{\ge0}} H_{\mathbb C}^{\otimes n} \to L^2(\Sigma(\mathcal M),\mathcal B_{\Sigma(\mathcal M)},\mu_\Omega)$ by
$$U(h_1\otimes\ldots\otimes h_n) = (n!)^{-1/2}(\sqrt2)^n\,{:}\gamma_H(h_1)\cdots\gamma_H(h_n){:}$$
and $U\Omega = 1_{\Sigma(\mathcal M)}$. The restriction of $U$ to Fock space is a unitary operator $U : \mathcal F(H_{\mathbb C}) \to L^2(\Sigma(\mathcal M),\mathcal B_{\Sigma(\mathcal M)},\mu_\Omega)$ that satisfies
$$U\Omega = 1_{\Sigma(\mathcal M)},\qquad U\hat\phi(h)U^{-1} = \gamma_H(h),\qquad U\mathcal F_n(H_{\mathbb C}) = W_H^n.$$
This shows that there is a very intimate relation between the Gaussian random process indexed by a real Hilbert space $H$ and the Segal field on the Fock space $\mathcal F(H_{\mathbb C})$. As operators on a Hilbert space, the Gaussian random variables $\gamma_H(h)$ are just Segal fields on the Fock space $\mathcal F(H_{\mathbb C})$.
The free Euclidean field in two dimensions
Let $N$ be the Hilbert space obtained by completing the set of real-valued elements in $\mathcal D(\mathbb R^2)$ with respect to the inner product $\langle\cdot,\cdot\rangle_N := \frac{1}{C_G}\langle\cdot,\cdot\rangle_{-1,m}$, and let $\gamma_N : N \to L_{\mathbb R}(Q_N,\mathcal A_N)$ be the Gaussian random process indexed by $N$. An element $g \in E(2)$ of the Euclidean group acts naturally as a linear operator on an element $h \in N$ by
$$(u(g)h)(x) := h(g^{-1}x). \qquad (5.13)$$
The operator $u(g)$ preserves the inner product $\langle\cdot,\cdot\rangle_N$ on $N$ and therefore $\|u(g)\| = 1$. We can thus define an operator $U(g) := \Gamma(u(g))$ on $L^2(Q_N,\mathcal A_N,\mu_N)$, and this operator is unitary. Now $U(g)$, in turn, defines an invertible measure-preserving map $T_g : Q_N \to Q_N$, but we will not show this. We merely mention that the correspondence $g \mapsto T_g$ defines a representation of $E(2)$ on $(Q_N,\mathcal A_N,\mu_N)$; see also section III.1 of [30]. Finally, it also turns out that $\gamma_N$ satisfies the Markov property, so the Gaussian random process $\gamma_N$ is in fact automatically a Euclidean field. We will call it the free Euclidean field in two dimensions. This free Euclidean field has the additional property that the translation subgroup of $E(2)$ acts ergodically, which means that the only random variables that are left invariant by all translations are the constant random variables. Note that proposition 5.4 above implies that $\gamma_N$ satisfies
$$\langle\gamma_N(h_1)\cdots\gamma_N(h_{2n})\rangle_{\mu_N} = \sum_{\mathrm{pairings}}\langle\gamma_N(h_{i_1})\gamma_N(h_{j_1})\rangle_{\mu_N}\cdots\langle\gamma_N(h_{i_n})\gamma_N(h_{j_n})\rangle_{\mu_N} \qquad (5.14)$$
for all $h_1,\ldots,h_{2n} \in N$.
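Equation (5.14) is the Wick (Isserlis) formula for Gaussian moments. The following sketch (with hypothetical covariance values, not tied to $\mu_N$) enumerates the pairings explicitly and checks two consequences: the number of pairings of $2n$ points is $(2n-1)!!$, and for a single Gaussian of variance $s^2$ the formula reproduces the even moments $\langle X^{2n}\rangle = (2n-1)!!\,s^{2n}$, here cross-checked by Gauss–Hermite quadrature.

```python
import numpy as np

def pairings(idx):
    """Yield all perfect pairings (lists of index pairs) of an even-length list."""
    if not idx:
        yield []
        return
    a, rest = idx[0], idx[1:]
    for i in range(len(rest)):
        for tail in pairings(rest[:i] + rest[i + 1:]):
            yield [(a, rest[i])] + tail

def gaussian_moment(C, idx):
    """Right-hand side of (5.14): sum over pairings of products of two-point functions."""
    return sum(np.prod([C[a][b] for a, b in p]) for p in pairings(list(idx)))

# The number of pairings of 6 points is 5!! = 15
print(sum(1 for _ in pairings(list(range(6)))))  # 15

# Single variable of variance s2: <X^6> = 15 s2^3, verified by quadrature
s2 = 0.8
moment = gaussian_moment([[s2]], [0] * 6)
x, w = np.polynomial.hermite_e.hermegauss(60)
quad_moment = np.sum(w * (np.sqrt(s2) * x) ** 6) / np.sqrt(2 * np.pi)
print(abs(moment - 15 * s2**3) < 1e-12, abs(quad_moment - moment) < 1e-9)  # True True
```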
The connection with the free hermitean scalar quantum field
We will now argue that there is a very natural connection between the free hermitean scalar quantum field in 2-dimensional spacetime and the free Euclidean field $\gamma_N$ defined above. Consider the Schwinger functions $S(x_1,\ldots,x_n)$ for the free hermitean scalar field in 2-dimensional spacetime. If $n$ is odd, these functions are identically zero because the same is true for the Wightman functions. The even Schwinger functions $S(x_1,\ldots,x_{2n})$ can all be obtained from $S(x_1,x_2)$ by the recurrence relation
$$S(x_1,\ldots,x_{2n}) = \sum_{\mathrm{pairings}} S(x_{i_1},x_{j_1})\cdots S(x_{i_n},x_{j_n}),$$
where the sum is defined as in proposition 5.4 above. In particular, $S(x_1,x_2)$ is symmetric in its two arguments, and so all $S$ are symmetric under permutations of the arguments. This symmetry of the Schwinger functions is in fact present in any boson quantum field theory. The explicit form of $S(x_1,x_2)$ is given by
$$S(x_1,x_2) = \frac{1}{(2\pi)^2}\int_{\mathbb R^2}\frac{e^{iG(k,x_1-x_2)}}{G(k,k)+m^2}\,d^2k,$$
where $G$ denotes the Euclidean inner product in $\mathbb R^2$ and $m$ is the mass of the free field. This formula can be formally derived as follows. The two-point Wightman function for the free scalar quantum field in 2-dimensional spacetime follows easily from the 4-dimensional case (4.8) by removing two space variables and by removing two factors $\frac{1}{2\pi}$. With some additional formal manipulations we then find${}^{51}$
$$\begin{aligned}W(x,y) &= \frac{1}{2\pi}\int_{\mathbb R} e^{-i[\omega_p(x-y)^0-p(x-y)]}\,\frac{1}{2\sqrt{p^2+m^2}}\,dp\\ &= \frac{1}{2\pi}\int_{\mathbb R} e^{-i[\omega_p(x-y)^0-p(x-y)]}\cdot\frac12\cdot\frac1\pi\int_{\mathbb R}\frac{dp^0}{(p^0)^2+p^2+m^2}\,dp\\ &= \frac{1}{(2\pi)^2}\int_{\mathbb R^2}\frac{e^{-i[\omega_p(x-y)^0-p(x-y)]}}{(p^0)^2+p^2+m^2}\,d^2p.\end{aligned}$$
Replacing the spacetime vectors $x_j = ((x_j)^0,\mathbf x_j)$ by $x_j' = (i(x_j)^0,\mathbf x_j)$ gives the desired result for $S(x_1,x_2)$. Although $S(x_1,x_2)$ is a function on $(\mathbb R^2)^2_{\neq}$, it is sufficiently well-behaved to define a distribution on $\mathcal S(\mathbb R^4)$ by
$$S(f) := \int_{\mathbb R^4} S(x_1,x_2)f(x_1,x_2)\,d^4x,$$
where $x_1,x_2 \in \mathbb R^2$. From this and the recurrence relation above it also follows that $S(x_1,\ldots,x_n)$ defines a distribution on $\mathcal S(\mathbb R^{2n})$. In terms of these distributions we can formulate the recurrence relation above as
$$S(f_1\cdot\ldots\cdot f_{2n}) = \sum_{\mathrm{pairings}} S(f_{i_1}\cdot f_{j_1})\cdots S(f_{i_n}\cdot f_{j_n}), \qquad (5.15)$$
where $f_1,\ldots,f_{2n} \in \mathcal S(\mathbb R^2)$ and $(f_1\cdot\ldots\cdot f_j)(x_1,\ldots,x_j) := f_1(x_1)\cdots f_j(x_j)$ for any $2 \le j \le 2n$. Now
notice the resemblance between equations (5.14) and (5.15). In fact it goes much further than a mere resemblance, as we will now show. First observe that for any real-valued $f_1,f_2 \in \mathcal S(\mathbb R^2) \subset N$ we have
$$\begin{aligned}\langle\gamma_N(f_1)\gamma_N(f_2)\rangle_{\mu_N} &= \langle f_1,f_2\rangle_{\gamma_N} = C_G\langle f_1,f_2\rangle_N\\ &= C_G\,\frac{1}{C_G}\int_{\mathbb R^2}\bigl[(-\Delta+m^2)^{-1}f_1\bigr](x)\,f_2(x)\,d^2x\\ &= \int_{\mathbb R^2}\frac{\overline{\hat f_1(k)}\,\hat f_2(k)}{G(k,k)+m^2}\,d^2k\\ &= \frac{1}{(2\pi)^2}\int_{\mathbb R^2}\int_{\mathbb R^4}\frac{e^{iG(k,x_1-x_2)}}{G(k,k)+m^2}\,f_1(x_1)f_2(x_2)\,d^4x\,d^2k\\ &= S(f_1\cdot f_2),\end{aligned}$$
where we note that $\hat f_2$ is not necessarily real-valued even when $f_2$ is real-valued. When we combine
${}^{51}$The reason for including this non-rigorous derivation is that we can check that all factors of $2\pi$ are correct.
this equality with equations (5.14) and (5.15), we find that
$$\begin{aligned}\int_{Q_N}\gamma_N(f_1)\cdots\gamma_N(f_{2n})\,d\mu_N &= \langle\gamma_N(f_1)\cdots\gamma_N(f_{2n})\rangle_{\mu_N}\\ &= \sum_{\mathrm{pairings}}\langle\gamma_N(f_{i_1})\gamma_N(f_{j_1})\rangle_{\mu_N}\cdots\langle\gamma_N(f_{i_n})\gamma_N(f_{j_n})\rangle_{\mu_N}\\ &= \sum_{\mathrm{pairings}} S(f_{i_1}\cdot f_{j_1})\cdots S(f_{i_n}\cdot f_{j_n})\\ &= S(f_1\cdot\ldots\cdot f_{2n}).\end{aligned}$$
If we write $\delta\cdot f$ for the function $(\delta\cdot f)(t,x) = \delta(t)f(x)$, then the formula above implies that
$$\langle\gamma_N(\delta\cdot f_1)\cdots\gamma_N(\delta\cdot f_n)\Omega_N,\Omega_N\rangle = W((\delta\cdot f_1)\cdot\ldots\cdot(\delta\cdot f_n)). \qquad (5.16)$$
Note that for the free quantum field it is allowed to smear out over a product of a function and a delta-function (for general quantum fields this might not be the case). Despite these beautiful relations between the free Euclidean field in two dimensions and the free hermitean scalar quantum field in 2-dimensional spacetime, the two are not the same. Of course this was to be expected, since the smeared quantum fields $\phi(f)$ do not commute with each other in the general case where $f$ is complex-valued, whereas random variables always commute with each other.
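As a numerical sanity check on the explicit Fourier formula for $S(x_1,x_2)$ (a sketch, not part of the rigorous development): evaluating the integral in polar coordinates turns the angular integration into a Bessel function, $S = \frac{1}{2\pi}\int_0^\infty k\,J_0(kr)/(k^2+m^2)\,dk$ with $r = |x_1-x_2|$, which should agree with the known closed form $\frac{1}{2\pi}K_0(mr)$ of the two-dimensional Euclidean propagator.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import j0, k0

def S_two_point(r, m):
    """S(x1, x2) for |x1 - x2| = r, via the radial form of the Fourier integral.

    The slowly decaying oscillatory tail is truncated at k = 200 and the range is
    integrated in chunks so that each quad call sees only a few oscillations.
    """
    integrand = lambda k: k * j0(k * r) / (k**2 + m**2)
    val = sum(quad(integrand, a, a + 10.0)[0] for a in range(0, 200, 10))
    return val / (2 * np.pi)

m, r = 1.0, 1.0
closed_form = k0(m * r) / (2 * np.pi)   # known closed form K_0(m r) / (2 pi)
print(abs(S_two_point(r, m) - closed_form))
```

The truncation error is of the order of the integrand's envelope at the cutoff, well below $10^{-3}$ for these parameter values.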
As we will demonstrate now, it is possible to represent the time-zero quantum field $\phi_0$ as a Gaussian random process, but this Gaussian random process is not the two-dimensional free Euclidean field. On the space $\mathcal D_{\mathbb R}(\mathbb R)$ of real-valued functions in $\mathcal D(\mathbb R)$ we define for $m > 0$ the inner product $\langle\cdot,\cdot\rangle_F := C_F\langle\cdot,\cdot\rangle_{-\frac12,m}$, where $C_F$ is fixed by some convention (in [30] the convention is $C_F = 1$), and let the Hilbert space $F$ be the completion of this inner product space. On this Hilbert space we define the operator $\hat D = (-\Delta+m^2)^{\frac12}$. Let $\gamma_F : F \to L_{\mathbb R}(Q_F,\mathcal A_F)$ be the Gaussian random process indexed by $F$ and write $H_0$ for the second quantization $d\Gamma(\hat D)$ of $\hat D$ as defined above. For each $f \in \mathcal S(\mathbb R^2)$ we then define
$$\gamma_F^0(f) := \int_{\mathbb R} e^{itH_0}\gamma_F(f_t)e^{-itH_0}\,dt,$$
where $f_t$ is the function on $\mathbb R$ defined by $f_t(x) = f(t,x)$. By defining $\Omega_F \in L^2(Q_F,\mathcal A_F,\mu_F)$ to be the random variable that is 1 everywhere, it can be shown that the quantities
$$\langle\gamma_F^0(f_1)\cdots\gamma_F^0(f_n)\Omega_F,\Omega_F\rangle := \int_{Q_F}\gamma_F^0(f_1)\cdots\gamma_F^0(f_n)\,d\mu_F \qquad (5.17)$$
are equal to the smeared Wightman functions $W(f_1\cdot\ldots\cdot f_n)$; see also theorem II.17 in [30]. Thus,
for the smeared Wightman functions we have
$$\begin{aligned}W(f_1\cdot\ldots\cdot f_n) &= \int_{\mathbb R^n} dt_1\cdots dt_n\,\langle e^{it_1H_0}\gamma_F((f_1)_{t_1})e^{-it_1H_0}\cdots e^{it_nH_0}\gamma_F((f_n)_{t_n})e^{-it_nH_0}\Omega_F,\Omega_F\rangle\\ &= \int_{\mathbb R^n} dt_1\cdots dt_n\,\langle\gamma_F((f_1)_{t_1})e^{i(t_2-t_1)H_0}\cdots e^{i(t_n-t_{n-1})H_0}\gamma_F((f_n)_{t_n})\Omega_F,\Omega_F\rangle,\end{aligned}$$
where in the last step we used that $e^{itH_0}\Omega_F = \Omega_F$. In this sense we may interpret $L^2(Q_F,\mathcal A_F,\mu_F)$ as the Fock space for a scalar particle of mass $m$, and $\gamma_F^0$ as the free quantum field $\phi$. In particular, the Gaussian process $\gamma_F$ itself can be interpreted as the time-zero quantum field.
Now that we know that the free Euclidean field and the time-zero free quantum field are Gaussian processes $\gamma_N$ and $\gamma_F$, respectively, it is time to compare them. For each $r \in \mathbb R$ we define a linear map $j_r : F \to N$ by
$$(j_rf)(s,x) := \sqrt{2C_GC_F}\,\delta_r(s)f(x),$$
where $\delta_r$ is the delta function concentrated at $r$, i.e. $\delta_r(s) = \delta(s-r)$. The Fourier transform of $j_rf$ is
$$\widehat{j_rf}(u,k) = \frac{\sqrt{2C_GC_F}}{2\pi}\int_{\mathbb R^2} f(x)\delta_r(s)e^{-i(us+kx)}\,ds\,dx = \frac{\sqrt{2C_GC_F}}{\sqrt{2\pi}}\,\hat f(k)e^{-iru},$$
so for the norm of $j_rf$ we find that
$$\begin{aligned}\|j_rf\|_N^2 &= \frac{1}{C_G}\int_{\mathbb R^2}\frac{\overline{\widehat{j_rf}(u,k)}\,\widehat{j_rf}(u,k)}{k^2+u^2+m^2}\,du\,dk\\ &= \frac{1}{C_G}\,\frac{2C_GC_F}{2\pi}\int_{\mathbb R^2}\frac{|\hat f(k)|^2}{k^2+u^2+m^2}\,du\,dk\\ &= \frac{C_F}{\pi}\int_{\mathbb R}\frac{\pi}{\sqrt{k^2+m^2}}\,|\hat f(k)|^2\,dk\\ &= C_F\int_{\mathbb R}\frac{|\hat f(k)|^2}{\sqrt{k^2+m^2}}\,dk\\ &= \|f\|_F^2,\end{aligned}$$
so $j_r$ is an isometry. If $g_t \in E(2)$ is the time translation over $t$, then we will write $u_t$ to denote $u(g_t)$ as defined in (5.13). Then
$$(u_t(j_rf))(s,x) = (j_rf)(s-t,x) = \sqrt{2C_GC_F}\,\delta_r(s-t)f(x) = \sqrt{2C_GC_F}\,\delta(s-t-r)f(x) = (j_{r+t}f)(s,x),$$
which shows that $u_tj_r = j_{r+t}$. Another property of $j_r$ is that $j_{r_1}^*j_{r_2} = e^{-|r_2-r_1|\hat D}$, where $\hat D$ is the differential operator on $F$ defined above. In the same way as we did before, we can define a map $J_r := \Gamma(j_r) : L^2(Q_F,\mathcal A_F,\mu_F) \to L^2(Q_N,\mathcal A_N,\mu_N)$. This map is also an isometry and it satisfies $U_tJ_r = J_{r+t}$ and $J_r^*J_t = e^{-|r-t|H_0}$. One can now prove the Feynman–Kac–Nelson formula; see theorem III.6 of [30].
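The key elementary step in the isometry computation above, converting the $N$-norm kernel into the $F$-norm kernel, is the integral $\int_{\mathbb R}\frac{du}{k^2+u^2+m^2} = \frac{\pi}{\sqrt{k^2+m^2}}$. A quick numerical check (the parameter values are arbitrary):

```python
import numpy as np
from scipy.integrate import quad

def u_integral(k, m):
    """int_R du / (k^2 + u^2 + m^2), a Lorentzian integral with value pi / sqrt(k^2 + m^2)."""
    val, _ = quad(lambda u: 1.0 / (k**2 + u**2 + m**2), -np.inf, np.inf)
    return val

k, m = 1.5, 1.0
print(abs(u_integral(k, m) - np.pi / np.sqrt(k**2 + m**2)) < 1e-8)  # True
```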
Theorem 5.7 (The free field FKN formula) Let $f_1,\ldots,f_n \in F$, let $F_0,\ldots,F_k : \mathbb R^n \to \mathbb C$ be bounded functions and let $t_1,\ldots,t_k \ge 0$ be fixed. Let $s_0 \in \mathbb R$ be arbitrary and let $s_j = s_0 + \sum_{i=1}^j t_i$ for $1 \le j \le k$. Then
$$\langle F_0(\gamma_F(f_1),\ldots,\gamma_F(f_n))\,e^{-t_1H_0}F_1(\gamma_F(f_1),\ldots,\gamma_F(f_n))\cdots e^{-t_kH_0}F_k(\gamma_F(f_1),\ldots,\gamma_F(f_n))\,\Omega_F,\Omega_F\rangle = \int_{Q_N}\prod_{l=0}^k F_l(\gamma_N(j_{s_l}f_1),\ldots,\gamma_N(j_{s_l}f_n))\,d\mu_N.$$
As Simon shows in theorem III.19 of [30], the FKN formula remains valid if the $F_j$ are polynomially bounded. Now consider the following special case of the FKN formula. Let $f_1,\ldots,f_n \in F$ and let $F_m(x_1,\ldots,x_n) = x_{m+1}$ for $0 \le m \le n-1$ (so $k = n-1$), which are polynomially bounded. Then the FKN formula reads
$$\langle\gamma_F(f_1)e^{-t_1H_0}\cdots e^{-t_{n-1}H_0}\gamma_F(f_n)\Omega_F,\Omega_F\rangle = \int_{Q_N}\gamma_N(j_{s_0}f_1)\cdots\gamma_N(j_{s_{n-1}}f_n)\,d\mu_N = \langle\gamma_N(j_{s_0}f_1)\cdots\gamma_N(j_{s_{n-1}}f_n)\Omega_N,\Omega_N\rangle,$$
where we have chosen $s_0 = 0$ and $\Omega_N \in L^2(Q_N,\mathcal A_N,\mu_N)$ is the function that is identically 1.
Nelson’s axioms for Euclidean field theory
As stated in equation (5.16), the free Euclidean field can be used to obtain the Wightman functions for the free quantum field. This immediately leads one to ask under what conditions a (non-free) Euclidean field will define a Wightman quantum field theory. In his article [28], Nelson defines a set of conditions on a Euclidean field that guarantee that the Euclidean field gives rise to a Wightman quantum field theory. These conditions are known as Nelson's axioms. The idea behind the axioms is that we try to carry out the same construction for a (non-free) Euclidean field as for the free Euclidean field, and then see what extra conditions are needed to satisfy all characteristic properties of the Wightman functions. Thus, given a Euclidean field $\lambda : H_m^{-1}(\mathbb R^d) \to L_{\mathbb R}(X,\mathcal A)$ with associated representation $T$ of $E(d)$ on the probability space $(X,\mathcal A,\mu)$, we begin by defining a map $j_r : \mathcal S(\mathbb R^{d-1}) \to H_m^{-1}(\mathbb R^d)$ by $(j_rf)(s,x) = \delta_r(s)f(x)$. Then we define a time-zero field $\lambda_0 : \mathcal S(\mathbb R^{d-1}) \to L_{\mathbb R}(X,\mathcal A)$ by
$$\lambda_0(f) := \lambda(j_0f)$$
and for $t \in \mathbb R$ we define
$$\lambda_t(f) = e^{itH}\lambda_0(f)e^{-itH},$$
where $H$ is the positive self-adjoint operator that satisfies $P^t = e^{-|t|H}$. Finally, for $f \in \mathcal S(\mathbb R^d)$ we define
$$\theta(f) := \int_{\mathbb R}\lambda_t(f_t)\,dt,$$
where $f_t(x) = f(t,x)$. The candidate Wightman functions are now
$$\langle\theta(f_1)\cdots\theta(f_n)\Omega,\Omega\rangle,$$
where $\Omega$ is the function that is identically 1 on $X$. The problem is now reduced to formulating conditions that guarantee that these distributions are indeed the Wightman functions of some quantum field theory. One of these conditions is that the translation subgroup of $E(d)$ acts ergodically, a property that was also present in the free Euclidean field. Besides this ergodicity, there is also a regularity condition, but we will not discuss it here. To summarize, Nelson's axioms require the existence of some Euclidean field, together with a regularity condition and the ergodicity property. This guarantees that the construction above yields a Wightman quantum field theory.
5.2.2 An alternative method: The Osterwalder–Schrader theory
When the Schwinger functions are computed for some given Wightman quantum field theory, the Wightman functions can be recovered by analytic continuation of the Schwinger functions, and hence the entire quantum field theory can also be recovered, by the reconstruction theorem. These ideas led in the early 1970s to the question of whether it is possible to begin with a set of functions which can be shown to be the Schwinger functions of some quantum field theory, and then recover the Wightman functions from these Schwinger functions. Of course, one should be able to recognize whether some given set of functions is indeed the set of Schwinger functions of some quantum field theory, i.e. one should have a set of conditions that such functions must satisfy in order to be the Schwinger functions of some quantum field theory.
Soon after Nelson developed his axiom system for Euclidean field theory, Osterwalder and Schrader developed another axiom system, called the Osterwalder–Schrader axioms, in which the axioms describe properties of Schwinger functions that guarantee the existence of a corresponding Wightman quantum field theory. We will not discuss the Osterwalder–Schrader axioms further here.
5.2.3 The $\mathcal P(\phi)_2$-model as a Wightman model
Now that we have seen two general axiomatic frameworks for a Euclidean approach, we briefly mention how the Euclidean framework can be used in the construction of the $\mathcal P(\phi)_2$-model. The first task is of course to make sense of the formal expression
$$V = \lambda\int_{\mathbb R} {:}\mathcal P(\phi(x)){:}\,dx$$
for the interaction of the $\mathcal P(\phi)_2$-model within the Euclidean framework. In the Hamiltonian approach this was achieved by considering the interaction as a perturbation of the free field theory and by introducing a cutoff version of the interaction. The first of these two points was manifest in the fact that we began with the free particle Fock space as our Hilbert space, on which we would later define the local observables for the interacting theory. The direct translation of this to Nelson's framework would be to begin the Euclidean construction of the $\mathcal P(\phi)_2$-model with the Gaussian random process $\gamma_N$, of which we know that it gives rise to the Schwinger functions of the free quantum field:
$$S(x_1\cdot\ldots\cdot x_n) = \int_{Q_N}\gamma_N(x_1)\cdots\gamma_N(x_n)\,d\mu_N.$$
Path-integral arguments from physics then suggest that the Schwinger functions for the interacting theory are formally given by
$$S_{\mathrm{int}}(x_1\cdot\ldots\cdot x_n) = \frac{\int_{Q_N}\gamma_N(x_1)\cdots\gamma_N(x_n)\,e^{-\lambda\int_{\mathbb R^2}{:}\mathcal P(\gamma_N(x)){:}\,dx}\,d\mu_N}{\int_{Q_N} e^{-\lambda\int_{\mathbb R^2}{:}\mathcal P(\gamma_N(x)){:}\,dx}\,d\mu_N}.$$
Recall that on the free particle Fock space we defined the normal ordering ${:}\phi^m(x){:}$ of an unsmeared field by writing all creation operators to the left of all annihilation operators. For random variables we also defined the Wick ordering ${:}f_1^{n_1}\cdots f_k^{n_k}{:}$ of a product of powers of random variables, which can in particular be applied to define Wick products ${:}\gamma_N(f_1)\cdots\gamma_N(f_m){:}$ of the free Euclidean field $\gamma_N(f)$. The unsmeared version of these products can be written formally as ${:}\gamma_N(x_1)\cdots\gamma_N(x_m){:}$. However, these objects are not what is meant when we write ${:}\gamma_N^m(x){:}$ as in ${:}\mathcal P(\gamma_N(x)){:}$ above, which should be clear from the fact that we write only one variable $x$ rather than $m$ variables $x_1,\ldots,x_m$. To understand what is meant here, first consider the expression $\gamma(x)$ for a Gaussian random process $\gamma$. Formally, by this expression we mean something like
$$\gamma(x) = \gamma(\delta_x) = \int\gamma(y)\delta_x(y)\,dy,$$
but of course smearing out over a $\delta$-function is not allowed. Mathematically we should thus replace this with $\gamma_h(x) = \int\gamma(y)h(x-y)\,dy$, where $h$ is a smooth function that looks like a $\delta$-function concentrated at the origin. Then $\gamma_h(x)$ is a random variable for any $x$, and we can even define random fields by $g \mapsto \int\gamma_h(x)^m g(x)\,dx$. Since $\gamma_h(x)$ is a well-defined random variable, we can also take the Wick product ${:}\gamma_h(x)^m{:}$ and define a random field by $g \mapsto \int {:}\gamma_h(x)^m{:}\,g(x)\,dx =: {:}\gamma_h^m(g){:}$. By taking some limit in which $h \to \delta$, we then obtain the expression that is meant by ${:}\gamma^m(g){:}$. The unsmeared version is then denoted by ${:}\gamma^m(x){:}$. This defines ${:}\mathcal P(\gamma_N(x)){:}$. For details about the precise conditions on $\gamma$ under which this definition is possible, see section V.1 of [30].
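For a single Gaussian random variable the Wick powers have a concrete description in terms of probabilists' Hermite polynomials: ${:}X^n{:} = \sigma^n\,\mathrm{He}_n(X/\sigma)$ when $X$ has variance $\sigma^2$ (a standard identity, used here as the basis of the sketch). The following toy checks, by Gauss–Hermite quadrature, that Wick powers have mean zero and are orthogonal with $\langle({:}X^n{:})^2\rangle = n!\,\sigma^{2n}$:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

def wick_power(xs, n, s2):
    """:X^n: for a centered Gaussian of variance s2, via :X^n: = s^n He_n(X/s)."""
    s = np.sqrt(s2)
    coeffs = np.zeros(n + 1)
    coeffs[n] = 1.0
    return s**n * hermeval(xs / s, coeffs)

# Expectations over the Gaussian measure by Gauss-Hermite quadrature
x, w = hermegauss(80)
def expect(vals):
    return np.sum(w * vals) / np.sqrt(2 * np.pi)

s2 = 2.0
xs = np.sqrt(s2) * x           # quadrature nodes rescaled to variance s2
print(abs(expect(wick_power(xs, 3, s2))) < 1e-9,                             # mean zero
      abs(expect(wick_power(xs, 2, s2) * wick_power(xs, 3, s2))) < 1e-9,     # orthogonality
      abs(expect(wick_power(xs, 3, s2)**2) - 6 * s2**3) < 1e-6)              # 3! * s2^3
# True True True
```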
The cutoff for the interaction term in the Hamiltonian approach can easily be translated to a cutoff in the Euclidean theory:
$$U(g) = \lambda\int_{\mathbb R^2} {:}\mathcal P(\gamma_N(x)){:}\,g(x)\,dx,$$
where $g$ is a cutoff function that equals 1 on a large region, only this time the cutoff function is a function on $\mathbb R^2$ instead of on $\mathbb R$. The corresponding cutoff Schwinger functions are
$$S_g(x_1,\ldots,x_n) = \frac{\int_{Q_N}\gamma_N(x_1)\cdots\gamma_N(x_n)\,e^{-U(g)}\,d\mu_N}{\int_{Q_N} e^{-U(g)}\,d\mu_N}.$$
When we define the measure $d\nu_g$ by
$$d\nu_g = \frac{e^{-U(g)}\,d\mu_N}{\int_{Q_N} e^{-U(g)}\,d\mu_N},$$
we can write the cutoff Schwinger functions more simply as
$$S_g(x_1,\ldots,x_n) = \int_{Q_N}\gamma_N(x_1)\cdots\gamma_N(x_n)\,d\nu_g.$$
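A zero-dimensional caricature of $d\nu_g$ (purely illustrative: $\mu_N$ is replaced by a single standard Gaussian and ${:}\mathcal P(\phi){:}$ by the Wick-ordered quartic ${:}x^4{:} = x^4 - 6x^2 + 3$) shows the structure of the construction as a normalized, exponentially tilted Gaussian measure. One consequence that is guaranteed by Jensen's inequality, and hence safe to assert, is that the normalization $\int e^{-\lambda{:}x^4{:}}\,d\mu \ge e^0 = 1$:

```python
import numpy as np
from scipy.integrate import quad

lam = 0.1   # an arbitrary small coupling

def density(x):
    """Unnormalized density of the tilted measure: e^{-lam :x^4:} times the Gaussian."""
    wick_quartic = x**4 - 6 * x**2 + 3           # :x^4: for a standard Gaussian
    return np.exp(-lam * wick_quartic) * np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

Z, _ = quad(density, -np.inf, np.inf)            # the normalization (partition function)
two_point, _ = quad(lambda x: x**2 * density(x), -np.inf, np.inf)
two_point /= Z                                   # the "interacting two-point function"
print(Z > 1.0, two_point > 0.0)  # True True  (Z >= 1 by Jensen, since <:x^4:> = 0)
```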
The idea is now to show that in the limit where $g \to 1$, the cutoff Schwinger functions become a set of functions $S_{\mathrm{int}}(x_1,\ldots,x_n)$ that satisfy all the Osterwalder–Schrader axioms, and that these functions can be identified as the Schwinger functions of the $\mathcal P(\phi)_2$-model. This is one of the main results of Glimm, Jaffe and Spencer in their article [19]. They prove the result under the assumption that $\lambda/m^2$ is sufficiently small. They also prove that the theory has a mass gap, that the infimum of the set that is obtained by removing the point $\{0\}$ from the spectrum of the mass operator $(H^2-\mathbf P^2)^{1/2}$ is an isolated point $\{m_r\}$ in the spectrum of the mass operator, and that the restriction of the unitary representation of $\widetilde{\mathcal P}_+^\uparrow$ to the subspace corresponding to $\{m_r\}$ is irreducible (and thus describes a one-particle state). In particular, the Haag–Ruelle theory can be applied to the $\mathcal P(\phi)_2$-model, and this model thus has a particle structure. The techniques that were used to derive these properties are directly inspired by techniques of statistical mechanics. This can be understood by realizing that the expression for the Schwinger functions looks very much like the correlation functions that one encounters in statistical mechanics. This analogy with statistical mechanics also gave rise to the study of phase transitions and other quantities from statistical mechanics for the $\mathcal P(\phi)_2$-model. Unfortunately, there is not enough time to discuss all these interesting developments in this thesis.
A Hilbert space theory
In this appendix we discuss some facts of Hilbert space theory that are usually not covered in an
elementary course in functional analysis, such as direct integrals and unbounded operators. We
will not discuss the more basic topics such as bounded operators on Hilbert space.
A.1 Direct sums and integrals of Hilbert spaces
The direct sum of a finite number of Hilbert spaces $\{H_n\}_{n=1}^k$ is defined to be the direct sum $\bigoplus_{n=1}^k H_n$ (of vector spaces) with the inner product of two elements $(h_n)_{n=1}^k$ and $(g_n)_{n=1}^k$ defined by $\langle(h_n)_{n=1}^k,(g_n)_{n=1}^k\rangle = \sum_{n=1}^k\langle h_n,g_n\rangle_{H_n}$. It follows easily that $\bigoplus_{n=1}^k H_n$ with this inner product becomes a Hilbert space. If we have a countably infinite collection of Hilbert spaces $\{H_n\}_{n=1}^\infty$, then their direct sum is defined to be
$$\bigoplus_{n=1}^\infty H_n := \Bigl\{(h_n)_{n=1}^\infty : h_n \in H_n \text{ for all } n \text{ and } \sum_{n=1}^\infty\|h_n\|_{H_n}^2 < \infty\Bigr\},$$
with addition and scalar multiplication defined as in the case of a finite number of Hilbert spaces and with inner product defined by $\langle(h_n)_{n=1}^\infty,(g_n)_{n=1}^\infty\rangle = \sum_{n=1}^\infty\langle h_n,g_n\rangle_{H_n}$. It follows easily that $\bigoplus_{n=1}^\infty H_n$ is a Hilbert space with this inner product.
We will now generalize this notion of a direct sum of Hilbert spaces to direct integrals of Hilbert spaces. Let $(X,\mathcal A,\mu)$ be a measure space with positive measure $\mu$, and suppose that for each $x \in X$ we are given a Hilbert space $H(x)$ of dimension $n(x) \in \mathbb N\cup\{\infty\}$ in such a way that the function $n : X \to \mathbb N\cup\{\infty\}$ is $\mathcal A$-measurable. We will then define the direct integral of the Hilbert spaces $H(x)$ as follows. First we partition the set $X$ into (disjoint) subsets $\{X_n\}_{n\in\mathbb N\cup\{\infty\}}$, where $X_n = \{x \in X : \dim(H(x)) = n\}$. For any $n$ we may then identify all Hilbert spaces $\{H(x)\}_{x\in X_n}$ with some particular Hilbert space $H^{(n)}$ with $\dim(H^{(n)}) = n$. Now let $\mathcal H_n$ be the set of functions${}^{52}$ $h : X_n \to H^{(n)}$ for which the function $x \mapsto \langle h(x),g\rangle_{H^{(n)}}$ from $X_n$ to $\mathbb C$ is $\mathcal A$-measurable for all $g \in H^{(n)}$ and such that $\int_{X_n}\|h(x)\|_{H^{(n)}}^2\,d\mu(x) < \infty$. We then define a vector space structure on $\mathcal H_n$ by $(h+g)(x) = h(x)+g(x)$ and $(\lambda h)(x) = \lambda h(x)$ for $h,g \in \mathcal H_n$ and $\lambda \in \mathbb C$, and we define an inner product on $\mathcal H_n$ by $\langle h,g\rangle_{\mathcal H_n} = \int_{X_n}\langle h(x),g(x)\rangle_{H^{(n)}}\,d\mu(x)$. It can be shown that $\mathcal H_n$ is in fact a Hilbert space, which we will denote by $\int_{X_n}^\oplus H(x)\,d\mu(x)$. Finally, we then define the direct integral of the Hilbert spaces $\{H(x)\}_{x\in X}$ by
$$\int_X^\oplus H(x)\,d\mu(x) = \bigoplus_n\int_{X_n}^\oplus H(x)\,d\mu(x).$$
When we are given some Hilbert space $H$, we say that $H$ can be represented as a direct integral $\int_X^\oplus H(x)\,d\mu(x)$ of the Hilbert spaces $H(x)$, with $(X,\mu)$ some measure space, if there exists an isomorphism $\alpha : H \to \int_X^\oplus H(x)\,d\mu(x)$ of Hilbert spaces.
A.2 Self-adjoint operators and the spectral theorem
If $H$ is a Hilbert space and $D \subset H$ is a dense linear subspace of $H$, then we call an operator $A : D \to H$ (which is not necessarily bounded) a densely defined operator. If $A : D \to H$ is a densely defined operator, we define a linear subspace $D^* \subset H$ by
$$D^* = \{k \in H : h \mapsto \langle Ah,k\rangle \text{ is a bounded linear functional on } D\}.$$
If $k \in D^*$, then $h \mapsto \langle Ah,k\rangle$ is a bounded linear functional on the dense subspace $D \subset H$ and can therefore be extended to a bounded linear functional $F$ on $H$. By the Riesz representation theorem, there exists a unique vector $k^* \in H$ such that $F(h) = \langle h,k^*\rangle$ for all $h \in H$. In particular, we have $\langle Ah,k\rangle = \langle h,k^*\rangle$ for all $h \in D$. We then define a map $A^* : D^* \to H$ by $A^*k = k^*$.

${}^{52}$Actually equivalence classes of functions, where the equivalence relation is given by $h \sim g \Leftrightarrow h = g$ $\mu$-almost everywhere. Here we pretend as if we have already chosen a particular representative of each class, so we can work with functions instead of equivalence classes of functions.
Definition A.1 Let $A : D \to H$ be a densely defined operator in the Hilbert space $H$.
(i) We call $A$ symmetric if $\langle Ah,g\rangle = \langle h,Ag\rangle$ for all $g,h \in D$.
(ii) We call $A$ self-adjoint if $A = A^*$.
It is clear that every self-adjoint operator $A$ is also symmetric, but the converse is false since in general we do not have $D = D^*$. When $A$ is bounded, the converse is in fact true.
In order to formulate the spectral theorem for self-adjoint operators, we first need to recall the definition of a spectral measure and some facts about integration with respect to spectral measures.
Definition A.2 If $(X,\Omega)$ is a measurable space and $H$ is a Hilbert space, then a spectral measure for $(X,\Omega,H)$ is a function $E : \Omega \to B(H)$ such that:
(1) for each $\Delta \in \Omega$ the operator $E(\Delta) \in B(H)$ is an orthogonal projection;
(2) $E(\emptyset) = 0$ and $E(X) = 1_H$;
(3) $E(\Delta_1\cap\Delta_2) = E(\Delta_1)E(\Delta_2)$ for $\Delta_1,\Delta_2 \in \Omega$;
(4) if $\{\Delta_n\}_{n=1}^\infty$ are pairwise disjoint sets in $\Omega$, then
$$E\Bigl(\bigcup_{n=1}^\infty\Delta_n\Bigr) = \sum_{n=1}^\infty E(\Delta_n).$$
From (2) and (3) it follows that if $\Delta_1,\Delta_2 \in \Omega$ with $\Delta_1\cap\Delta_2 = \emptyset$, then $E(\Delta_1)E(\Delta_2) = E(\Delta_2)E(\Delta_1) = 0$. Also, for each $h \in H$ the map $\mu : \Omega \to \mathbb R$ given by $\Delta \mapsto \langle E(\Delta)h,h\rangle$ is a positive measure on $(X,\Omega)$. We will denote this measure by $\mu_h$.
Let $E$ be a spectral measure for $(X,\Omega,H)$ and let $t : X \to \mathbb C$ be a simple function on the measurable space $(X,\Omega)$. We can write the simple function $t$ as
$$t = \alpha_1 1_{\Delta_1} + \ldots + \alpha_n 1_{\Delta_n}$$
with $\alpha_j \in \mathbb C$ and all $\Delta_j \in \Omega$ disjoint. We then define the integral
$$\int_X t(x)\,dE(x) := \sum_{j=1}^n \alpha_j E(\Delta_j).$$
Since for $j \ne k$ the sets $\Delta_j$ and $\Delta_k$ are disjoint, $E(\Delta_j)h$ and $E(\Delta_k)h$ are orthogonal for all $h \in H$. We may thus use the Pythagorean theorem:
$$\Bigl\|\int_X t(x)\,dE(x)\,h\Bigr\|^2 = \sum_{j=1}^n |\alpha_j|^2\langle E(\Delta_j)h,E(\Delta_j)h\rangle = \sum_{j=1}^n |\alpha_j|^2\langle E(\Delta_j)h,h\rangle = \sum_{j=1}^n |\alpha_j|^2\mu_h(\Delta_j) = \int_X |t(x)|^2\,d\mu_h(x). \qquad (A.1)$$
Since $\int_X d\mu_h = \mu_h(X) = \langle E(X)h,h\rangle = \|h\|^2$, it follows that
$$\Bigl\|\int_X t(x)\,dE(x)\,h\Bigr\|^2 \le \|h\|^2\sup_{x\in X}|t(x)|^2 = \|h\|^2\max_{x\in X}|t(x)|^2.$$
Hence $\int_X t(x)\,dE(x)$ is a bounded operator with norm $\le \max_{x\in X}|t(x)|$.
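In finite dimensions the spectral measure of a Hermitian matrix is simply the family of eigenprojections, and identity (A.1), together with the norm bound that follows from it, can be checked directly. A small numpy sketch with an arbitrary test matrix:

```python
import numpy as np

# Spectral measure of a Hermitian matrix: E({lambda}) is the eigenprojection,
# and int t dE = sum_j t(lambda_j) E({lambda_j}).
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 2.0, 0.0],
              [0.0, 0.0, 5.0]])
eigvals, eigvecs = np.linalg.eigh(A)
projs = [np.outer(v, v) for v in eigvecs.T]      # rank-one spectral projections

t = lambda x: x**2 - 3.0                         # a "simple function" on the spectrum
int_t_dE = sum(t(l) * P for l, P in zip(eigvals, projs))

h = np.array([1.0, 2.0, -1.0])
mu_h = [P @ h @ h for P in projs]                # mu_h({lambda}) = <E({lambda})h, h>
lhs = np.linalg.norm(int_t_dE @ h) ** 2          # || int t dE h ||^2
rhs = sum(t(l)**2 * m for l, m in zip(eigvals, mu_h))   # int |t|^2 dmu_h
print(np.isclose(lhs, rhs))  # True
```

The operator norm of `int_t_dE` equals $\max_j |t(\lambda_j)|$, in line with the bound above.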
An arbitrary bounded measurable function $f : X \to \mathbb C$ can be approximated uniformly by a sequence $\{t_n\}$ of simple functions. According to the estimate above it follows that
$$\Bigl\|\int_X t_n(x)\,dE(x)\,h - \int_X t_m(x)\,dE(x)\,h\Bigr\|^2 = \Bigl\|\int_X [t_n(x)-t_m(x)]\,dE(x)\,h\Bigr\|^2 \le \|h\|^2\sup_{x\in X}|t_n(x)-t_m(x)|^2.$$
Because this goes to zero uniformly on the unit ball in $H$ as $m,n \to \infty$, the sequence $\{\int_X t_n(x)\,dE(x)\}_n$ is a Cauchy sequence in the Banach space $B(H)$ and thus converges to an element of $B(H)$. We then define
$$\int_X f(x)\,dE(x) := \lim_{n\to\infty}\int_X t_n(x)\,dE(x) \in B(H),$$
where the limit is taken in the norm topology of $B(H)$. The limit does not depend on the chosen sequence of simple functions. Because for each converging sequence $\{A_n\}_n$ in $B(H)$ with limit $A \in B(H)$ it is in particular true that $\lim_{n\to\infty}(A_nh) = Ah$ for all $h \in H$, we have in the present case that $\int_X f(x)\,dE(x)\,h = \lim_{n\to\infty}\bigl(\int_X t_n(x)\,dE(x)\,h\bigr)$. Using (A.1), we then find that
$$\Bigl\|\int_X f(x)\,dE(x)\,h\Bigr\|^2 = \lim_n\Bigl\|\int_X t_n(x)\,dE(x)\,h\Bigr\|^2 = \lim_n\int_X |t_n(x)|^2\,d\mu_h(x) = \int_X |f(x)|^2\,d\mu_h(x). \qquad (A.2)$$
Now that we have defined the integral $\int_X f(x)\,dE(x)$ for a spectral measure $E$ for $(X,\Omega,H)$ and a bounded measurable function $f : X \to \mathbb C$, we will do the same for unbounded measurable functions. Let $E$ be a spectral measure for $(X,\Omega,H)$, let $g : X \to \mathbb C$ be a measurable function and let $h \in H$ be such that $g \in L^2(X,\Omega,\mu_h)$. We can approximate $g$ (in the $L^2$-norm) by a sequence of bounded measurable functions. For instance, we can take the sequence $\{g_n\}_n$ defined by
$$g_n(x) = \begin{cases} g(x) & \text{if } |g(x)| \le n,\\ 0 & \text{if } |g(x)| > n.\end{cases}$$
Because the functions $g_n$ are bounded, we can use the identity (A.2):
$$\Bigl\|\int_X g_n(x)\,dE(x)\,h - \int_X g_m(x)\,dE(x)\,h\Bigr\|^2 = \Bigl\|\int_X [g_n(x)-g_m(x)]\,dE(x)\,h\Bigr\|^2 = \int_X |g_n(x)-g_m(x)|^2\,d\mu_h(x).$$
Because this goes to zero as $n,m \to \infty$ (this follows from the fact that the sequence $\{g_n\}_n$ converges to $g$ in the $L^2$-norm and is hence a Cauchy sequence), the sequence $\{\int_X g_n(x)\,dE(x)\,h\}_n$ is a Cauchy sequence in $H$ and therefore converges in $H$. We now define
$$\int_X g(x)\,dE(x)\,h := \lim_{n\to\infty}\int_X g_n(x)\,dE(x)\,h.$$
This definition does not depend on the choice of the sequence $\{g_n\}$, and analogously to (A.2) we now also have
$$\Bigl\|\int_X g(x)\,dE(x)\,h\Bigr\|^2 = \int_X |g(x)|^2\,d\mu_h(x). \qquad (A.3)$$
In general, the operator $\int_X g(x)\,dE(x)$ is not bounded and it is only defined for those $h \in H$ for which $g \in L^2(X,\Omega,\mu_h)$, i.e. for those $h \in H$ for which the right-hand side of (A.3) is finite. Note that (A.3) implies that
$$\Bigl\|\int_X x\,dE(x)\,h\Bigr\|^2 = \int_X |x|^2\,d\mu_h(x)$$
for all $h \in H$ for which the right-hand side is finite.
We now formulate the spectral theorem for self-adjoint operators.
Theorem A.3 Let $A : D \to H$ be a self-adjoint operator in a separable Hilbert space $H$. Then there exists a unique spectral measure $E_A$ for $(\mathbb R,\mathcal B_{\mathbb R},H)$, with $\mathcal B_{\mathbb R}$ the Borel $\sigma$-algebra on $\mathbb R$, such that
$$A = \int_{\mathbb R} x\,dE_A(x).$$
We call $E_A$ the spectral measure generated by $A$. The domain of $A$ can then be expressed as $\mathrm{dom}(A) = \{h \in H : \int_{\mathbb R}|x|^2\,d\mu_h(x) < \infty\}$.
We can slightly generalize the spectral theorem as follows. If {A_k}_{k=1}^n is a system of pairwise commuting self-adjoint operators defined on some common dense domain in H, then there exists a unique spectral measure E_A for (R^n, B_{R^n}, H), with B_{R^n} the Borel σ-algebra on R^n, such that
\[
A_k = \int_{\mathbb{R}^n} x_k\,dE_A(x)
\]
for k = 1, ..., n, where x_k denotes the k-th component of the integration variable x. We will call E_A the joint spectral measure generated by the system {A_k}_{k=1}^n. Note that we can apply our results about integration with respect to spectral measures in particular to the spectral measure E_A. Thus, for each measurable function g : R^n → C we can define the operator
\[
g(A_1,\dots,A_n) := \int_{\mathbb{R}^n} g(x)\,dE_A(x). \tag{A.4}
\]
The domain of this operator is the set {h ∈ H : ∫_{R^n} |g(x)|² dµ_h(x) < ∞}.
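In finite dimensions the joint calculus (A.4) is equally easy to check. The sketch below (a toy analogue, not from the thesis) builds two commuting self-adjoint matrices from a common eigenbasis, so that the joint spectral measure assigns to each basis vector a pair of eigenvalues and (A.4) becomes a finite sum:

```python
import numpy as np

# Two commuting self-adjoint matrices share an orthonormal eigenbasis
# {q_k}; the joint spectral measure attaches to each q_k the pair of
# eigenvalues (x1_k, x2_k), and (A.4) becomes a sum over k.
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((4, 4)))   # common eigenbasis
d1, d2 = rng.standard_normal(4), rng.standard_normal(4)
A1, A2 = Q @ np.diag(d1) @ Q.T, Q @ np.diag(d2) @ Q.T
assert np.allclose(A1 @ A2, A2 @ A1)               # the system commutes

def joint_func(g):
    """g(A1, A2) := sum_k g(x1_k, x2_k) q_k q_k^T, the discrete (A.4)."""
    return sum(g(x1, x2) * np.outer(q, q) for x1, x2, q in zip(d1, d2, Q.T))

# The coordinate functions recover A1 and A2, and g(x1, x2) = x1 * x2
# recovers the operator product A1 @ A2:
assert np.allclose(joint_func(lambda x1, x2: x1), A1)
assert np.allclose(joint_func(lambda x1, x2: x2), A2)
assert np.allclose(joint_func(lambda x1, x2: x1 * x2), A1 @ A2)
```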
Definition A.4 If A : D → H is a self-adjoint operator in a (separable) Hilbert space H and f ∈ H, we define H_f to be the smallest closed subspace in H that contains all vectors of the form E_A(∆)f with ∆ ∈ B_R, and we call H_f the cyclic subspace in H generated by f.

Note that if h ∈ H with h ⊥ H_f, then for g ∈ H_f we have ⟨E_A(∆)h, g⟩ = ⟨h, E_A(∆)g⟩ = 0, so E_A(∆)h ⊥ H_f. This shows that H_h ⊥ H_f whenever h ⊥ H_f.
We will now state a theorem that uses the spectral theorem for self-adjoint operators to represent the Hilbert space as a certain direct integral of Hilbert spaces. We will not prove the theorem in detail, but we do give a sketch of the proof, because the construction is of great importance in quantum physics.
Theorem A.5 Let A : D → H be a self-adjoint operator in a separable Hilbert space H. Then H can be represented as a direct integral
\[
\int_{\mathbb{R}}^{\oplus} H(x)\,d\mu(x)
\]
of Hilbert spaces H(x) relative to a positive measure µ on R such that the action of A is given by multiplication by x.
Proof sketch
Because H is separable, we can choose a countable dense set {f_1, f_2, ...} in H. Define H_1 := H_{f_1}, the closed span of {E_A(∆)f_1 : ∆ ∈ B_R}, and suppose that for some n ≥ 1 we have already constructed cyclic subspaces H_1, ..., H_n in H that are pairwise orthogonal; write H^n := H_1 ⊕ ... ⊕ H_n for their direct sum. If H^n = H, then we can write H = ⊕_{k=1}^n H_k. If this is not the case, let k_n = min{k : f_k ∉ H^n}. We choose a unit vector h_{n+1} in the subspace of H spanned by H^n and f_{k_n} such that h_{n+1} ⊥ H^n; we then define H_{n+1} := H_{h_{n+1}}. Then H_{n+1} is orthogonal to H_1, ..., H_n and f_{k_n} ∈ H_1 ⊕ ... ⊕ H_{n+1}, and since {f_1, f_2, ...} is dense in H, this construction gives rise to a decomposition
\[
H = \bigoplus_n H_n
\]
of H into an orthogonal direct sum of (a finite or countably infinite number of) cyclic subspaces. It can be shown (although we will not prove this here) that the spaces H_n can be realized as function spaces L²_{µ_n}(R), where µ_n(∆) = ⟨E_A(∆)h_n, h_n⟩ with h_n as defined above. The corresponding isomorphism π_n : H_n → L²_{µ_n}(R) is defined⁵³ by E_A(∆)h_n ↦ 1_∆, with 1_∆ the characteristic function of the set ∆; in particular, h_n ↦ 1_R. The corresponding action of A on
⁵³ The elements of L² are actually equivalence classes of functions, not functions, but we will often work with representatives whenever this is possible. So in this particular case we actually mean that E_A(∆)h_n is mapped to the equivalence class of 1_∆.
the subspace {f ∈ L²_{µ_n}(R) : ∫_R x²|f(x)|² dµ_n(x) < ∞} ⊂ L²_{µ_n}(R) is then given by π_n(Af) = id_R · π_n(f). Since H = ⊕_n H_n, we can thus realize each g = (g_1, g_2, ...) ∈ H as a sequence (π_1(g_1), π_2(g_2), ...) of functions π_n(g_n) ∈ L²_{µ_n}(R):
\[
g \sim \pi(g) := (\pi_1(g_1), \pi_2(g_2), \dots) \in \bigoplus_n L^2_{\mu_n}(\mathbb{R}),
\]
and if g ∈ D then
\[
Ag \sim \pi(Ag) = (\mathrm{id}_{\mathbb{R}} \cdot \pi_1(g_1), \mathrm{id}_{\mathbb{R}} \cdot \pi_2(g_2), \dots) \in \bigoplus_n L^2_{\mu_n}(\mathbb{R}).
\]
Now define a measure µ on R by
\[
\mu(\Delta) = \sum_{n=1}^{\infty} 2^{-n} \mu_n(\Delta).
\]
Note that µ_n(∆) = ⟨E_A(∆)h_n, h_n⟩ ≤ ‖h_n‖² = 1, so the sum converges for each Borel set ∆, i.e. µ(∆) < ∞ for each Borel set ∆. It is clear that if µ(∆) = 0, then µ_n(∆) = 0 for all n, so we have
µ_n ≪ µ for all n. By the Radon-Nikodym theorem from measure theory it then follows that for each n there exists a nonnegative function φ_n such that µ_n(∆) = ∫_∆ φ_n(x) dµ(x). Now let g_n ∈ H_n and let π_n(g_n) be the corresponding function in L²_{µ_n}(R). Then the function π̂_n(g_n) := √φ_n · π_n(g_n) is in L²_µ(R) and
\[
\|\hat{\pi}_n(g_n)\|^2_{L^2_\mu(\mathbb{R})} = \int_{\mathbb{R}} |\hat{\pi}_n(g_n)(x)|^2\,d\mu(x) = \int_{\mathbb{R}} |\pi_n(g_n)(x)|^2 \varphi_n(x)\,d\mu(x) = \int_{\mathbb{R}} |\pi_n(g_n)(x)|^2\,d\mu_n(x) = \|\pi_n(g_n)\|^2_{L^2_{\mu_n}(\mathbb{R})} = \|g_n\|^2_{H_n},
\]
i.e. the mapping g_n ↦ π̂_n(g_n) is an isometry of H_n into L²_µ(R). Now define X_n ⊂ R by X_n = {x ∈ R : φ_n(x) > 0}; then the above mapping π̂_n : H_n → L²_µ(R) defines an isomorphism π̃_n : H_n → L²_µ(X_n). We thus have an isomorphism
\[
\tilde{\pi} : H \;\tilde{\to}\; \bigoplus_n L^2_\mu(X_n).
\]
Define a function n : R → N ∪ {∞} by n(x) = #{m : x ∈ X_m}. Then n(x) is measurable, and if we write B_n := {x ∈ R : n(x) = n}, then clearly µ(B_0) = 0. For x ∈ B_n we write m_1(x) < m_2(x) < ... < m_n(x) for the values of m for which x ∈ X_m. If g̃ = (g̃_1, g̃_2, ...) ∈ ⊕_n L²_µ(X_n), we define for each n a set of functions φ_k^{(n)}(g̃) ∈ L²_µ(B_n) (k = 1, ..., n) by φ_k^{(n)}(g̃)(x) = g̃_{m_k(x)}(x), so for each n this defines a map α_n : ⊕_k L²_µ(X_k) → L²_µ(B_n)^{⊕n} given by g̃ = (g̃_1, g̃_2, ...) ↦ φ^{(n)}(g̃) := (φ_1^{(n)}(g̃), ..., φ_n^{(n)}(g̃)). This, in turn, gives rise to a map α : ⊕_n L²_µ(X_n) → ⊕_n L²_µ(B_n)^{⊕n} given by g̃ ↦ (φ^{(1)}(g̃), φ^{(2)}(g̃), ...). Because
\[
\|\tilde{g}\|^2_{\bigoplus_n L^2_\mu(X_n)} = \sum_n \|\tilde{g}_n\|^2_{L^2_\mu(X_n)} = \sum_n \int_{X_n} |\tilde{g}_n(x)|^2\,d\mu(x) = \sum_n \int_{B_n} \sum_{k=1}^{n} |\tilde{g}_{m_k(x)}(x)|^2\,d\mu(x) = \sum_n \int_{B_n} \sum_{k=1}^{n} |\varphi_k^{(n)}(\tilde{g})(x)|^2\,d\mu(x) = \sum_n \|\varphi^{(n)}(\tilde{g})\|^2_{L^2_\mu(B_n)^{\oplus n}} = \|\alpha(\tilde{g})\|^2_{\bigoplus_n L^2_\mu(B_n)^{\oplus n}},
\]
we see that the map α is isometric. For each x ∈ R we now choose a Hilbert space H(x) with dim(H(x)) = n_x, where n_x is the unique index in N ∪ {∞} such that x ∈ B_{n_x}, and for each x ∈ R we also specify a basis {e_k(x)}_{k=1}^{n_x} for H(x). For each n we can then represent the Hilbert space L²_µ(B_n)^{⊕n} as a direct integral ∫_{B_n}^⊕ H(x) dµ(x) of the spaces H(x). Thus, we now have isomorphisms
\[
\bigoplus_n L^2_\mu(X_n) \;\simeq\; \bigoplus_n L^2_\mu(B_n)^{\oplus n} \;\simeq\; \bigoplus_n \int_{B_n}^{\oplus} H(x)\,d\mu(x),
\]
which are given by
\[
\tilde{g} \mapsto (\varphi^{(1)}(\tilde{g}), \varphi^{(2)}(\tilde{g}), \dots) \mapsto \left(\varphi_1^{(1)}(\tilde{g})e_1,\; \varphi_1^{(2)}(\tilde{g})e_1 + \varphi_2^{(2)}(\tilde{g})e_2,\; \dots,\; \varphi_1^{(n)}(\tilde{g})e_1 + \dots + \varphi_n^{(n)}(\tilde{g})e_n,\; \dots\right).
\]
It now follows from the definition of the direct integral that H can be represented as the direct integral
\[
\int_{\mathbb{R}}^{\oplus} H(x)\,d\mu(x).
\]
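The measure-theoretic core of this construction (the measure µ = Σ_n 2^{-n}µ_n, the Radon-Nikodym derivatives φ_n = dµ_n/dµ, and the isometry f ↦ √φ_n f) can be illustrated on a finite set, where measures are just nonnegative weight vectors and φ_n is a pointwise quotient. This is a toy sketch, not part of the proof:

```python
import numpy as np

# Discrete sketch: on a finite set X, measures are weight vectors,
# mu = sum_n 2^{-n} mu_n, and the Radon-Nikodym derivative
# phi_n = d mu_n / d mu is a pointwise quotient (here mu > 0 everywhere).
rng = np.random.default_rng(2)
N, pts = 3, 6                              # three measures on a 6-point space X
mu_n = rng.random((N, pts)) + 0.1          # mu_1, ..., mu_N, strictly positive
mu = sum(2.0 ** -(n + 1) * mu_n[n] for n in range(N))

phi = mu_n / mu                            # phi_n = d mu_n / d mu
assert np.allclose(phi * mu, mu_n)         # mu_n(Delta) = int_Delta phi_n d mu

# The map f |-> sqrt(phi_n) f is an isometry L^2(mu_n) -> L^2(mu):
f = rng.standard_normal(pts)
for n in range(N):
    norm_in_mu = np.sum(phi[n] * f ** 2 * mu)     # ||sqrt(phi_n) f||^2 in L^2(mu)
    norm_in_mu_n = np.sum(f ** 2 * mu_n[n])       # ||f||^2 in L^2(mu_n)
    assert np.isclose(norm_in_mu, norm_in_mu_n)
```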
This theorem can be slightly generalized as follows. When we have a finite system {A_k}_{k=1}^n of pairwise commuting self-adjoint operators A_k on a Hilbert space H (i.e. the spectral measures E_{A_l} and E_{A_m} commute for all l, m), then H can be represented as a direct integral
\[
\int_{\mathbb{R}^n}^{\oplus} H(x)\,d\mu(x)
\]
of Hilbert spaces H(x) relative to a positive measure µ on R^n such that the action of A_k is given by multiplication by x_k, where x_k denotes the k-th component of the integration variable x ∈ R^n.
B Examples of free fields
We will now construct some of the fields for massive particles. The computation of the coefficients in (3.26) is done by using the identities
\[
\left[e^{\theta\cdot J^{(\frac{1}{2})}}\right]_{jk} = \left[e^{\theta\cdot\frac{\sigma}{2}}\right]_{jk} = [2m(p^0+m)]^{-\frac{1}{2}}\left[(p^0+m)\,\delta_{jk} + \sum_{l=1}^{3} p^l\,[\sigma^l]_{jk}\right],
\]
\[
\left[e^{-\theta\cdot J^{(\frac{1}{2})}}\right]_{jk} = \left[e^{-\theta\cdot\frac{\sigma}{2}}\right]_{jk} = [2m(p^0+m)]^{-\frac{1}{2}}\left[(p^0+m)\,\delta_{jk} - \sum_{l=1}^{3} p^l\,[\sigma^l]_{jk}\right].
\]

B.1 The (0, 0)-field (or scalar field)
The (0, 0)-field (or scalar field) can only describe particles τ with spin j_τ = 0. According to (3.26) the coefficients u(p) are given by u(p) = (2p^0)^{-1/2} C_{00}(0, 0; 0, 0) = (2p^0)^{-1/2}, so (using (3.22)) the scalar field is
\[
(\psi^\tau_{0,0})_{0,0}(x) = \int \frac{d^3p}{(2\pi)^{3/2}(2p^0)^{1/2}} \left[ e^{-ip\cdot x}\,a_\tau(p) + e^{ip\cdot x}\,a^*_{\tau^C}(p) \right], \tag{B.1}
\]
with p^0 = √(m_τ² + p²). Note that in case the particle τ coincides with its antiparticle τ^C, then ψ*_{0,0}(x) = ψ_{0,0}(x). For this reason, such a field will also be called real.
B.2 The (1/2, 1/2)-field (or vector field)

We will now consider the (1/2, 1/2)-field (or vector field). This field can only describe particles τ with spin j_τ = 0 or j_τ = 1.
Spin 0
For spin 0 the coefficients are
\[
\begin{pmatrix} u_{-\frac12\,-\frac12}(p) \\ u_{-\frac12\,\frac12}(p) \\ u_{\frac12\,-\frac12}(p) \\ u_{\frac12\,\frac12}(p) \end{pmatrix} = \frac{1}{2m\sqrt{p^0}} \begin{pmatrix} p^1+ip^2 \\ -p^0+p^3 \\ p^0+p^3 \\ -p^1+ip^2 \end{pmatrix}.
\]
Now define new coefficients
\[
\begin{pmatrix} u^0(p) \\ u^1(p) \\ u^2(p) \\ u^3(p) \end{pmatrix} := \frac{-im}{\sqrt{2}} \begin{pmatrix} u_{\frac12\,-\frac12} - u_{-\frac12\,\frac12} \\ u_{-\frac12\,-\frac12} - u_{\frac12\,\frac12} \\ -i\,[u_{-\frac12\,-\frac12} + u_{\frac12\,\frac12}] \\ u_{-\frac12\,\frac12} + u_{\frac12\,-\frac12} \end{pmatrix} = -i\,(2p^0)^{-1/2} \begin{pmatrix} p^0 \\ p^1 \\ p^2 \\ p^3 \end{pmatrix}.
\]
This new choice of coefficients corresponds to a basis transformation in the space V^{(1/2,1/2)} = V_A^{(1/2)} ⊗ V_B^{(1/2)}. With respect to this new basis, the vector field for a particle of spin 0 becomes
\[
(\psi^\tau_{\frac12,\frac12})^\mu(x) = (2\pi)^{-3/2} \int d^3p\; u^\mu(p) \left[ e^{-ip\cdot x}\,a_\tau(p) - e^{ip\cdot x}\,a^*_{\tau^C}(p) \right] = (2\pi)^{-3/2} \int \frac{d^3p}{(2p^0)^{1/2}} \left[ -ip^\mu e^{-ip\cdot x}\,a_\tau(p) + ip^\mu e^{ip\cdot x}\,a^*_{\tau^C}(p) \right] = \partial^\mu (\psi^\tau_{0,0})_{0,0}(x),
\]
where in the first line we used that λ = (−1)^{2B} κ = −κ.
Spin 1
For spin 1 the coefficients are
\[
\begin{pmatrix} u_{-\frac12\,-\frac12}(p,1) \\ u_{-\frac12\,\frac12}(p,1) \\ u_{\frac12\,-\frac12}(p,1) \\ u_{\frac12\,\frac12}(p,1) \end{pmatrix} = \frac{[2m(p^0+m)]^{-1}}{\sqrt{2p^0}} \begin{pmatrix} -(p^1+ip^2)^2 \\ (p^0+m-p^3)(p^1+ip^2) \\ -(p^0+m+p^3)(p^1+ip^2) \\ (p^0+m)^2-(p^3)^2 \end{pmatrix},
\]
\[
\begin{pmatrix} u_{-\frac12\,-\frac12}(p,0) \\ u_{-\frac12\,\frac12}(p,0) \\ u_{\frac12\,-\frac12}(p,0) \\ u_{\frac12\,\frac12}(p,0) \end{pmatrix} = \frac{[2m(p^0+m)]^{-1}}{2\sqrt{p^0}} \begin{pmatrix} 2(p^1+ip^2)p^3 \\ (p^0+m-p^3)^2-(p^1)^2-(p^2)^2 \\ (p^0+m+p^3)^2-(p^1)^2-(p^2)^2 \\ 2(-p^1+ip^2)p^3 \end{pmatrix},
\]
\[
\begin{pmatrix} u_{-\frac12\,-\frac12}(p,-1) \\ u_{-\frac12\,\frac12}(p,-1) \\ u_{\frac12\,-\frac12}(p,-1) \\ u_{\frac12\,\frac12}(p,-1) \end{pmatrix} = \frac{[2m(p^0+m)]^{-1}}{\sqrt{2p^0}} \begin{pmatrix} (p^0+m)^2-(p^3)^2 \\ (p^0+m-p^3)(-p^1+ip^2) \\ (p^0+m+p^3)(p^1-ip^2) \\ -(p^1-ip^2)^2 \end{pmatrix}.
\]
For σ ∈ {1, 0, −1} we define
\[
\begin{pmatrix} e^0(p,\sigma) \\ e^1(p,\sigma) \\ e^2(p,\sigma) \\ e^3(p,\sigma) \end{pmatrix} := \sqrt{p^0} \begin{pmatrix} u_{\frac12\,-\frac12}(p,\sigma) - u_{-\frac12\,\frac12}(p,\sigma) \\ u_{-\frac12\,-\frac12}(p,\sigma) - u_{\frac12\,\frac12}(p,\sigma) \\ -i\,[u_{-\frac12\,-\frac12}(p,\sigma) + u_{\frac12\,\frac12}(p,\sigma)] \\ u_{-\frac12\,\frac12}(p,\sigma) + u_{\frac12\,-\frac12}(p,\sigma) \end{pmatrix}.
\]
Again, this corresponds to a basis transformation in the space V^{(1/2,1/2)} = V_A^{(1/2)} ⊗ V_B^{(1/2)}. However, the new coefficients are not the e^µ(p,σ), but rather (p^0)^{-1/2} e^µ(p,σ) (since any new basis must be a linear
combination of the old basis, with scalar coefficients). Explicitly, the coefficients e^µ(p,σ) are
\[
\begin{pmatrix} e^0(p,1) \\ e^1(p,1) \\ e^2(p,1) \\ e^3(p,1) \end{pmatrix} = -\frac{1}{\sqrt{2}}\,[m(p^0+m)]^{-1} \begin{pmatrix} (p^1+ip^2)(p^0+m) \\ m(p^0+m)+(p^1)^2+ip^1p^2 \\ p^1p^2+im(p^0+m)+i(p^2)^2 \\ (p^1+ip^2)p^3 \end{pmatrix},
\]
\[
\begin{pmatrix} e^0(p,0) \\ e^1(p,0) \\ e^2(p,0) \\ e^3(p,0) \end{pmatrix} = [m(p^0+m)]^{-1} \begin{pmatrix} (p^0+m)p^3 \\ p^1p^3 \\ p^2p^3 \\ (p^3)^2+m(p^0+m) \end{pmatrix},
\]
\[
\begin{pmatrix} e^0(p,-1) \\ e^1(p,-1) \\ e^2(p,-1) \\ e^3(p,-1) \end{pmatrix} = \frac{1}{\sqrt{2}}\,[m(p^0+m)]^{-1} \begin{pmatrix} (p^1-ip^2)(p^0+m) \\ (p^1)^2+m(p^0+m)-ip^1p^2 \\ p^1p^2-im(p^0+m)-i(p^2)^2 \\ (p^1-ip^2)p^3 \end{pmatrix}.
\]
Note that for zero momentum these coefficients are
\[
e^\mu(0,1) = -\frac{1}{\sqrt{2}}\begin{pmatrix} 0 \\ 1 \\ i \\ 0 \end{pmatrix}, \qquad e^\mu(0,0) = \begin{pmatrix} 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}, \qquad e^\mu(0,-1) = \frac{1}{\sqrt{2}}\begin{pmatrix} 0 \\ 1 \\ -i \\ 0 \end{pmatrix},
\]
and that the e^µ(p,σ) are related to these by
\[
e^\mu(p,\sigma) = L(p)^\mu{}_\nu\, e^\nu(0,\sigma),
\]
where L(p) is the standard boost that maps the momentum vector (m, 0) to (√(m² + p²), p):
\[
L(p)^\mu{}_\nu = [m(p^0+m)]^{-1} \begin{pmatrix} p^0(p^0+m) & p^1(p^0+m) & p^2(p^0+m) & p^3(p^0+m) \\ p^1(p^0+m) & m(p^0+m)+(p^1)^2 & p^1p^2 & p^1p^3 \\ p^2(p^0+m) & p^1p^2 & m(p^0+m)+(p^2)^2 & p^2p^3 \\ p^3(p^0+m) & p^1p^3 & p^2p^3 & m(p^0+m)+(p^3)^2 \end{pmatrix}. \tag{B.2}
\]
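Both properties of (B.2) — that L(p) is a Lorentz transformation mapping (m, 0) to (p^0, p), and that it boosts the rest-frame polarization vector e^µ(0, 1) to e^µ(p, 1) — can be checked numerically. This is a verification sketch, not part of the thesis; the values of m and p are arbitrary test inputs:

```python
import numpy as np

# Numerical check of the standard boost (B.2).
m, p = 1.3, np.array([0.4, -0.7, 0.2])
p0 = np.sqrt(m ** 2 + p @ p)

c = m * (p0 + m)
L = np.empty((4, 4))
L[0, 0] = p0 * (p0 + m)
L[0, 1:] = L[1:, 0] = p * (p0 + m)
L[1:, 1:] = c * np.eye(3) + np.outer(p, p)   # m(p0+m) delta_ij + p^i p^j
L /= c

eta = np.diag([1.0, -1.0, -1.0, -1.0])       # Minkowski metric
assert np.allclose(L @ np.array([m, 0, 0, 0]), np.array([p0, *p]))
assert np.allclose(L.T @ eta @ L, eta)       # L is a Lorentz transformation

# e^mu(p, 1) from the explicit formula above vs. the boost of e^mu(0, 1):
e0_1 = -np.array([0, 1, 1j, 0]) / np.sqrt(2)
p1, p2, p3 = p
e_p_1 = -np.array([(p1 + 1j * p2) * (p0 + m),
                   c + p1 ** 2 + 1j * p1 * p2,
                   p1 * p2 + 1j * c + 1j * p2 ** 2,
                   (p1 + 1j * p2) * p3]) / (np.sqrt(2) * c)
assert np.allclose(L @ e0_1, e_p_1)
```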
The field is
\[
(\psi^\tau_{\frac12,\frac12})^\mu(x) = (2\pi)^{-3/2} \sum_{\sigma=-1}^{1} \int \frac{d^3p}{\sqrt{2p^0}}\; e^\mu(p,\sigma) \left[ e^{-ip\cdot x}\,a_\tau(p,\sigma) + (-1)^\sigma e^{ip\cdot x}\,a^*_{\tau^C}(p,-\sigma) \right].
\]
B.3 The (1/2, 0)-field and the (0, 1/2)-field
These fields can only describe particles with spin 1/2. The coefficients for the (1/2, 0)-field are given by
\[
\begin{pmatrix} u_{\frac12\,0}(p,\tfrac12) \\ u_{-\frac12\,0}(p,\tfrac12) \end{pmatrix} = [4mp^0(p^0+m)]^{-\frac12} \begin{pmatrix} p^0+m+p^3 \\ p^1+ip^2 \end{pmatrix}, \qquad \begin{pmatrix} u_{\frac12\,0}(p,-\tfrac12) \\ u_{-\frac12\,0}(p,-\tfrac12) \end{pmatrix} = [4mp^0(p^0+m)]^{-\frac12} \begin{pmatrix} p^1-ip^2 \\ p^0+m-p^3 \end{pmatrix}.
\]
So the field is given by
\[
(\psi^\tau_{\frac12,0})_{a0}(x) = (2\pi)^{-3/2} \sum_{\sigma=-\frac12}^{\frac12} \int d^3p\; u_{a0}(p,\sigma) \left[ e^{-ip\cdot x}\,a_\tau(p,\sigma) + (-1)^{\frac12-\sigma} e^{ip\cdot x}\,a^*_{\tau^C}(p,-\sigma) \right] = (2\pi)^{-3/2} \sum_{\sigma=-\frac12}^{\frac12} \int d^3p \left[ u_{a0}(p,\sigma) e^{-ip\cdot x}\,a_\tau(p,\sigma) + v_{a0}(p,\sigma) e^{ip\cdot x}\,a^*_{\tau^C}(p,\sigma) \right],
\]
where in the last line we have restored the coefficients v_{a0}(p,σ) = (−1)^{1/2+σ} u_{a0}(p,−σ) and we have used that λ = (−1)^{2B} κ = κ. The coefficients for the (0, 1/2)-field are given by
\[
\begin{pmatrix} u_{0\,\frac12}(p,\tfrac12) \\ u_{0\,-\frac12}(p,\tfrac12) \end{pmatrix} = [4mp^0(p^0+m)]^{-\frac12} \begin{pmatrix} p^0+m-p^3 \\ -p^1-ip^2 \end{pmatrix}, \qquad \begin{pmatrix} u_{0\,\frac12}(p,-\tfrac12) \\ u_{0\,-\frac12}(p,-\tfrac12) \end{pmatrix} = [4mp^0(p^0+m)]^{-\frac12} \begin{pmatrix} -p^1+ip^2 \\ p^0+m+p^3 \end{pmatrix}.
\]
So the field is given by
\[
(\psi^\tau_{0,\frac12})_{0b}(x) = (2\pi)^{-3/2} \sum_{\sigma=-\frac12}^{\frac12} \int d^3p\; u_{0b}(p,\sigma) \left[ e^{-ip\cdot x}\,a_\tau(p,\sigma) + (-1)^{\frac32-\sigma} e^{ip\cdot x}\,a^*_{\tau^C}(p,-\sigma) \right] = (2\pi)^{-3/2} \sum_{\sigma=-\frac12}^{\frac12} \int d^3p \left[ u_{0b}(p,\sigma) e^{-ip\cdot x}\,a_\tau(p,\sigma) - v_{0b}(p,\sigma) e^{ip\cdot x}\,a^*_{\tau^C}(p,\sigma) \right],
\]
where v_{0b}(p,σ) = (−1)^{1/2+σ} u_{0b}(p,−σ) and the minus sign in the second term comes from λ = (−1)^{2B} κ = −κ.
B.4 The (1/2, 0) ⊕ (0, 1/2)-field (or Dirac field)
This field is just
\[
\begin{pmatrix} (\psi^\tau_{\frac12,0})_{a0}(x) \\ (\psi^\tau_{0,\frac12})_{0b}(x) \end{pmatrix} = (2\pi)^{-3/2} \sum_{\sigma=-\frac12}^{\frac12} \int d^3p \left[ \begin{pmatrix} u_{a0}(p,\sigma) \\ u_{0b}(p,\sigma) \end{pmatrix} e^{-ip\cdot x}\,a_\tau(p,\sigma) + \begin{pmatrix} v_{a0}(p,\sigma) \\ -v_{0b}(p,\sigma) \end{pmatrix} e^{ip\cdot x}\,a^*_{\tau^C}(p,\sigma) \right] = (2\pi)^{-3/2} \sum_{\sigma=-\frac12}^{\frac12} \int d^3p \left[ u(p,\sigma) e^{-ip\cdot x}\,a_\tau(p,\sigma) + v(p,\sigma) e^{ip\cdot x}\,a^*_{\tau^C}(p,\sigma) \right],
\]
where
\[
u(p,\tfrac12) = \begin{pmatrix} u_{\frac12\,0}(p,\tfrac12) \\ u_{-\frac12\,0}(p,\tfrac12) \\ u_{0\,\frac12}(p,\tfrac12) \\ u_{0\,-\frac12}(p,\tfrac12) \end{pmatrix} = [4mp^0(p^0+m)]^{-\frac12} \begin{pmatrix} p^0+m+p^3 \\ p^1+ip^2 \\ p^0+m-p^3 \\ -p^1-ip^2 \end{pmatrix},
\]
\[
u(p,-\tfrac12) = \begin{pmatrix} u_{\frac12\,0}(p,-\tfrac12) \\ u_{-\frac12\,0}(p,-\tfrac12) \\ u_{0\,\frac12}(p,-\tfrac12) \\ u_{0\,-\frac12}(p,-\tfrac12) \end{pmatrix} = [4mp^0(p^0+m)]^{-\frac12} \begin{pmatrix} p^1-ip^2 \\ p^0+m-p^3 \\ -p^1+ip^2 \\ p^0+m+p^3 \end{pmatrix},
\]
\[
v(p,\tfrac12) = \begin{pmatrix} v_{\frac12\,0}(p,\tfrac12) \\ v_{-\frac12\,0}(p,\tfrac12) \\ -v_{0\,\frac12}(p,\tfrac12) \\ -v_{0\,-\frac12}(p,\tfrac12) \end{pmatrix} = [4mp^0(p^0+m)]^{-\frac12} \begin{pmatrix} -p^1+ip^2 \\ -p^0-m+p^3 \\ -p^1+ip^2 \\ p^0+m+p^3 \end{pmatrix},
\]
\[
v(p,-\tfrac12) = \begin{pmatrix} v_{\frac12\,0}(p,-\tfrac12) \\ v_{-\frac12\,0}(p,-\tfrac12) \\ -v_{0\,\frac12}(p,-\tfrac12) \\ -v_{0\,-\frac12}(p,-\tfrac12) \end{pmatrix} = [4mp^0(p^0+m)]^{-\frac12} \begin{pmatrix} p^0+m+p^3 \\ p^1+ip^2 \\ -p^0-m+p^3 \\ p^1+ip^2 \end{pmatrix}.
\]
If we define the 4 × 4-matrix M by
\[
M = \begin{pmatrix} 0 & 0 & p^0+p^3 & p^1-ip^2 \\ 0 & 0 & p^1+ip^2 & p^0-p^3 \\ p^0-p^3 & -p^1+ip^2 & 0 & 0 \\ -p^1-ip^2 & p^0+p^3 & 0 & 0 \end{pmatrix},
\]
then it is easily seen that M u(p,σ) = m u(p,σ) and M v(p,σ) = −m v(p,σ). If we now define the 4 × 4-matrices
\[
\gamma^0 = \begin{pmatrix} 0 & 1_{\mathbb{C}^2} \\ 1_{\mathbb{C}^2} & 0 \end{pmatrix}, \qquad \gamma^i = \begin{pmatrix} 0 & -\sigma^i \\ \sigma^i & 0 \end{pmatrix},
\]
where the σ^i denote the Pauli matrices, these equations can be rewritten as (γ^µ p_µ − m) u(p,σ) = 0 and (γ^µ p_µ + m) v(p,σ) = 0. Using that i∂_µ u(p,σ)e^{−ip·x} = p_µ u(p,σ)e^{−ip·x} and i∂_µ v(p,σ)e^{ip·x} = −p_µ v(p,σ)e^{ip·x}, this in turn implies that the field satisfies the Dirac equation
\[
(i\gamma^\mu \partial_\mu - m) \begin{pmatrix} (\psi^\tau_{\frac12,0})_{a0}(x) \\ (\psi^\tau_{0,\frac12})_{0b}(x) \end{pmatrix} = 0.
\]
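These algebraic identities can be verified numerically: the sketch below (not part of the thesis; m and p are arbitrary test inputs) builds the explicit spinors and γ-matrices given above and checks that γ^µp_µ − m annihilates u(p,σ) and γ^µp_µ + m annihilates v(p,σ):

```python
import numpy as np

# Check (gamma.p - m) u = 0 and (gamma.p + m) v = 0 for the explicit
# spinors above, with gamma-matrices in the block basis of the text.
m, (p1, p2, p3) = 1.0, (0.3, -0.5, 0.8)
p0 = np.sqrt(m ** 2 + p1 ** 2 + p2 ** 2 + p3 ** 2)
N = (4 * m * p0 * (p0 + m)) ** -0.5           # normalization [4mp0(p0+m)]^{-1/2}

u_plus = N * np.array([p0 + m + p3, p1 + 1j * p2, p0 + m - p3, -p1 - 1j * p2])
u_minus = N * np.array([p1 - 1j * p2, p0 + m - p3, -p1 + 1j * p2, p0 + m + p3])
v_plus = N * np.array([-p1 + 1j * p2, -p0 - m + p3, -p1 + 1j * p2, p0 + m + p3])
v_minus = N * np.array([p0 + m + p3, p1 + 1j * p2, -p0 - m + p3, p1 + 1j * p2])

sig = [np.array([[0, 1], [1, 0]]), np.array([[0, -1j], [1j, 0]]),
       np.array([[1, 0], [0, -1]])]           # Pauli matrices
Z, I2 = np.zeros((2, 2)), np.eye(2)
gamma0 = np.block([[Z, I2], [I2, Z]])
gammas = [np.block([[Z, -s], [s, Z]]) for s in sig]

# gamma^mu p_mu = gamma^0 p^0 - sum_i gamma^i p^i  (signature +---)
pslash = gamma0 * p0 - sum(g * pi for g, pi in zip(gammas, (p1, p2, p3)))

for u in (u_plus, u_minus):
    assert np.allclose(pslash @ u, m * u)     # (gamma.p - m) u = 0
for v in (v_plus, v_minus):
    assert np.allclose(pslash @ v, -m * v)    # (gamma.p + m) v = 0
```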
Note furthermore that under a space inversion the field transforms as
\[
P \begin{pmatrix} (\psi^\tau_{\frac12,0})_{a0}(x) \\ (\psi^\tau_{0,\frac12})_{0b}(x) \end{pmatrix} P^{-1} = \begin{pmatrix} (\psi^\tau_{0,\frac12})_{0b}(x) \\ (\psi^\tau_{\frac12,0})_{a0}(x) \end{pmatrix}.
\]
Popular summary (English)
The special theory of relativity provides a mathematical model for space and time in the absence of gravity; this mathematical model is also called Minkowski spacetime. The benefit of Minkowski spacetime over the more classical Newtonian spacetime is that Minkowski spacetime remains accurate when objects move at (almost) the speed of light. Quantum theory provides a mathematical model for the behaviour of microscopically small objects, such as atoms or elementary particles. If one speaks of quantum mechanics, one often means the quantum theory in which the microscopically small objects are in a spacetime that is described by the Newtonian model of spacetime. However, in order to make accurate predictions for systems consisting of microscopically small objects that travel at almost the speed of light (think for instance of the situation in a particle accelerator), it becomes necessary to develop a quantum theory that uses Minkowski spacetime. In the development of such a theory it soon becomes very natural to introduce fields, and for this reason the corresponding theory is called quantum field theory.

Quantum field theory is a very successful theory in the sense that the predictions that can be made within the theory are in very good agreement with experimental data. From the point of view of a physicist, quantum field theory is therefore a very good theory. However, from a mathematical point of view quantum field theory is very ill-defined, because the precise nature of the 'mathematical' objects that are used in the theory is often unclear. In this thesis we investigate to what extent quantum field theory can be described as a mathematical theory. We will find that there are interesting formulations of what quantum field theory should be mathematically, but that it is not so easy to prove that quantum field theory, as it is used by physicists, is actually of the form described by these mathematical formulations⁵⁴. That this is not so easy will follow from the difficulties that we encounter when we prove this for much simpler (and non-realistic) versions of quantum field theory.

⁵⁴ This problem, also called the quantum Yang-Mills problem (officially, Yang-Mills existence and mass gap), is one of the seven Millennium Prize Problems. Whoever manages to solve such a problem receives a million dollars from the Clay Mathematics Institute. Thus far, only one of these seven problems has been solved, namely the Poincaré conjecture.