9. The Relativistic Topology
9.1 In the Neighborhood
Nothing puzzles me more than time and space; and yet nothing troubles
me less, as I never think about them.
Charles Lamb (1775-1834)
It's customary to treat the relativistic spacetime manifold as an ordinary topological space
with the same topology as a four-dimensional Euclidean manifold, denoted by R4. This is
typically justified by noting that the points of spacetime can be parameterized by a set of
four coordinates x,y,z,t, and defining the "neighborhood" of a point somewhat informally
as follows (quoted from Ohanian and Ruffini):
...the neighborhood of a given point is the set of all points such that their
coordinates differ only a little from those of the given point.
Of course, the neighborhoods given by this definition are not Lorentz-invariant, because
the amount by which the coordinates of two points differ is highly dependent on the
frame of reference. Consider, for example, two spacetime points in the xt plane with the
coordinates {0,0} and {1,1} with respect to a particular system of inertial coordinates. If
we consider these same two points with respect to the frame of an observer moving in the
positive x direction with speed v (and such that the origin coincides with the former
coordinate origin), the differences in both the space and time coordinates are reduced by
a factor of √[(1−v)/(1+v)], which can range anywhere between 0 and ∞. Thus there exist
valid inertial reference systems with respect to which both of the coordinates of these
points differ (simultaneously) by as little or as much as we choose. Based on the above
definition of neighborhood (i.e., points whose coordinates “differ only a little”), how can
we decide if these two points are in the same neighborhood?
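To make this concrete, here is a small numerical sketch (Python, in units with c = 1, using nothing but the standard Lorentz transformation) showing how the coordinate differences between the events {0,0} and {1,1} can be made as small or as large as we please by a suitable choice of v:

    import math

    def boosted_differences(dt, dx, v):
        """Coordinate differences of an interval as seen from a frame moving with speed v (c = 1)."""
        gamma = 1.0 / math.sqrt(1.0 - v * v)
        return gamma * (dt - v * dx), gamma * (dx - v * dt)

    # The two events {0,0} and {1,1}: both coordinate differences equal 1.
    for v in (-0.99, -0.5, 0.0, 0.5, 0.99):
        dt_p, dx_p = boosted_differences(1.0, 1.0, v)
        print(f"v = {v:+.2f}:  dt' = {dt_p:.4f}  dx' = {dx_p:.4f}")
    # Both differences equal sqrt((1-v)/(1+v)), ranging from nearly zero to
    # arbitrarily large values, while the invariant interval remains zero.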
It might be argued that the same objection could be raised against this coordinate-based
definition of neighborhoods in Euclidean space, since we're free to scale our coordinates
arbitrarily, which implies that the numerical amount by which the coordinates of two
given (distinct) points differ is arbitrary. However, in Euclidean space this objection is
unimportant, because we will arrive at the same definition of limit points, and thus the
same topology, regardless of what scale factor we choose. In fact, the same applies even
if we choose unequal scale factors in different directions, provided those scale factors are
all finite and non-zero.
From a strictly mathematical standpoint, the usual way of expressing the arbitrariness of
metrical scale factors for defining a topology on a set of points is to say that if two
systems of coordinates are related by a diffeomorphism (a differentiable mapping that
possesses a differentiable inverse), then the definition of neighborhoods in terms of
"coordinates that differ only a little" will yield the same limit points and thus the same
topology. However, from the standpoint of a physical theory it's legitimate to ask
whether the set of distinct points (i.e., labels) under our chosen coordinate system
actually corresponds one-to-one with the distinct physical entities whose connectivities
we are trying to infer. For example, we can represent formal fractions x/y for real values
of x and y as points on a Euclidean plane with coordinates (x,y), and conclude that the
topology of formal fractions is R2, but of course the value of every fraction lying along a
single line through the origin is the same, and the values of fractions have the natural
topology of R1 (because the reals are closed under division, aside from divisions by zero).
If the meanings assigned to our labels are arbitrary, then these are simply two different
manifolds with their own topologies, but for a physical theory we may wish to decide
whether the true objects of our study - the objects with ontological status in our theory - are formal fractions or the values of fractions. When trying to infer the natural physical
topology of the points of spacetime induced by the Minkowski metric we face a similar
problem of identifying the actual physical entities whose mutual connectivities we are
trying to infer, and the problem is complicated by the fact that the "Minkowski metric" is
not really a metric at all (as explained below).
Recall that for many years after general relativity was first proposed by Einstein there
was widespread confusion and misunderstanding among leading scientists (including
Einstein himself) regarding various kinds of singularities. The main source of confusion
was the failure to clearly distinguish between singularities of coordinate systems as
opposed to actual singularities of the manifold/field. This illustrates how we can be
misled by the belief that the local topology of a physical manifold corresponds to the
local topology of any particular system of coordinates that we may assign to that physical
manifold. It’s entirely possible for the “manifold of coordinates” to have a different
topology than the physical manifold to which those coordinates are applied. With this in
mind, it’s worthwhile to consider carefully whether the most physically meaningful local
topology of spacetime is necessarily the same as the topology of the usual four-dimensional systems of coordinates that are conventionally applied to it.
Before examining the possible topologies of Minkowski spacetime in detail, it's
worthwhile to begin with a review of the basic definitions of point set topologies and
topological spaces. Given a set S, let P(S) denote the set of all subsets of S. A topology
for the set S is a mapping T from the Cartesian product {S × P(S)} to the discrete set
{0,1}. In other words, given any element e of S, and any subset A of S, the mapping
T(A,e) returns either 0 or 1. In the usual language of topology, we say that e is a limit
point of A if and only if T(A,e) = 1.
As an example, we can define a topology on the set of points of 2D Euclidean space
equipped with the usual Pythagorean metric
d(a,b) = √[(xa − xb)² + (ya − yb)²]     (1)
by saying that the point e is a limit point of any subset A of points of the plane if and only
if for every positive real number ε there is an element u (other than e) of A such that
d(e,u) < ε. Clearly this definition relies on prior knowledge of the "topology" of the real
numbers, which is denoted by R1. The topology of 2D Euclidean space is called R2, since
it is just the Cartesian product R1 × R1.
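As an informal illustration (a Python sketch only; a finite sample of points stands in for the infinite subset A, so the ε-criterion is checked for just a few values of ε), the definition of a limit point can be transcribed almost verbatim:

    import math

    def d(p, q):
        """Pythagorean distance (1) between two points of the plane."""
        return math.sqrt((p[0] - q[0])**2 + (p[1] - q[1])**2)

    def is_limit_point(e, sample_of_A, eps_values=(1.0, 0.1, 0.01, 0.001)):
        """The epsilon-criterion, tested against a finite sample of the set A."""
        return all(any(u != e and d(e, u) < eps for u in sample_of_A)
                   for eps in eps_values)

    # The origin is a limit point of A = {(1/n, 0)}, even though it is not in A.
    A = [(1.0 / n, 0.0) for n in range(1, 10000)]
    print(is_limit_point((0.0, 0.0), A))   # True
    print(is_limit_point((5.0, 0.0), A))   # False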
The topology of a Euclidean space described above is actually a very special kind of
topology, called a topological space. The distinguishing characteristic of a topological
space S,T is that S contains a collection of subsets, called the open sets (including S itself
and the empty set) which is closed under unions and finite intersections, and such that a
point p is a limit point of a subset A of S if and only if every open set containing p also
contains a point of A distinct from p. For example, if we define the collection of open
spherical regions in Euclidean space, together with any regions that can be formed by the
union or finite intersection of such spherical regions, as our open sets, then we arrive at
the same definition of limit points as given previously. Therefore, the topology we've
described for the points of Euclidean space constitutes a topological space. However, it's
important to realize that not every topology is a topological space.
The basic sets that we used to generate the Euclidean topology were spherical regions
defined in terms of the usual Pythagorean metric, but the same topology would also be
generated by any other metric. In general, a basis for a topological space on the set S is a
collection B of subsets of S whose union comprises all of S and such that if p is in the
intersection of two elements Bi and Bj of B, then there is another element Bk of B which
contains p and which is entirely contained in the intersection of Bi and Bj, as illustrated
below for circular regions on a plane.
Given a basis B on the set S, the unions of elements of B satisfy the conditions for open
sets, and hence serve to define a topological space. (This relies on the fact that we can
represent non-circular regions, such as the intersection of two circular open sets, as the
union of an infinite number of circular regions of arbitrary sizes.)
If we were to substitute the metric
d(a,b) = |xa − xb| + |ya − yb|
in place of the Pythagorean metric, then the basis sets, defined as loci of points whose
"distances" from a fixed point p are less than some specified real number r, would be
square-shaped diamonds instead of circles, but we would arrive at the same topology, i.e.,
the same definition of limit points for the subsets of the Euclidean plane E2. In general,
any true metric will induce this same local topology on a manifold. Recall that a metric
is defined as a distance function d(a,b) for any two points a,b in the space satisfying the
three axioms
(1) d(a,b) = 0 if and only if a = b
(2) d(a,b) = d(b,a) for each a,b
(3) d(a,c) ≤ d(a,b) + d(b,c) for all a,b,c
It follows that d(a,b) ≥ 0 for all a,b. Any distance function that satisfies the conditions of
a metric will induce the same (local) topology on a set of points, and this will be a
topological space.
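As a quick numerical check of these remarks (a Python sketch, not a proof), one can verify the three axioms for the "taxicab" distance on a random sample of points, along with the uniform comparability of the two distances that underlies the equivalence of the induced topologies:

    import math
    import random

    def euclid(a, b):
        return math.sqrt((a[0] - b[0])**2 + (a[1] - b[1])**2)

    def taxicab(a, b):
        return abs(a[0] - b[0]) + abs(a[1] - b[1])

    random.seed(0)
    pts = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(40)]

    # Axioms (1)-(3) for the taxicab distance on the sampled points.
    assert all((taxicab(a, b) == 0) == (a == b) for a in pts for b in pts)
    assert all(taxicab(a, b) == taxicab(b, a) for a in pts for b in pts)
    assert all(taxicab(a, c) <= taxicab(a, b) + taxicab(b, c) + 1e-12
               for a in pts for b in pts for c in pts)

    # Each distance is within a factor of sqrt(2) of the other, so every Euclidean
    # epsilon-ball contains a taxicab ball about the same center and vice versa;
    # this is why both metrics yield the same limit points.
    assert all(euclid(a, b) <= taxicab(a, b) + 1e-12 and
               taxicab(a, b) <= math.sqrt(2) * euclid(a, b) + 1e-12
               for a in pts for b in pts)
    print("metric axioms and ball-nesting inequalities verified")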
However, it's possible to conceive of more general "distance functions" that do not satisfy
all the axioms of a metric. For example, we can define a distance function that is
commutative (axiom 2) and satisfies the triangle inequality (axiom 3), but that allows
d(a,b) = 0 for distinct points a,b. Thus we replace axiom (1) with the weaker requirement
d(a,a) = 0. Such a distance function is called a pseudometric. Obviously if a,b are any
two points with d(a,b) = 0 we must have d(a,c) = d(b,c) for every point c, because
otherwise the points a,b,c would violate the triangle inequality. Thus a pseudometric
partitions the points of the set into equivalence classes, and the distance relations between
these equivalence classes must be metrical. We've already seen a situation in which a
pseudometric arises naturally, if we define the distance between two points in the plane
of formal fractions as the absolute value of the difference in slopes of the lines from the
origin to those two points. The distance between any two points on a single line through
the origin is therefore zero, and these lines represent the equivalence classes induced by
the pseudometric. Of course, the distances between the slopes satisfy the requirements of
a metric. Therefore, the absolute difference of value is a pseudometric for the space of
formal fractions.
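A minimal sketch of this pseudometric (Python; formal fractions are represented as ordered pairs, and the distance is taken to be the absolute difference of their values, as in the text):

    from fractions import Fraction

    def pseudo_d(p, q):
        """Pseudometric on formal fractions p = (x1, y1) and q = (x2, y2): the
        absolute difference of their values.  All formal fractions lying on one
        line through the origin of the plane have the same value, hence distance zero."""
        return abs(Fraction(p[0], p[1]) - Fraction(q[0], q[1]))

    print(pseudo_d((1, 3), (2, 6)))   # 0: distinct formal fractions, zero distance
    print(pseudo_d((1, 3), (1, 2)))   # 1/6
    # Axiom (1) of a metric fails (distinct points at zero distance), while symmetry
    # and the triangle inequality still hold; the equivalence classes are exactly
    # the lines of constant value through the origin.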
Now, we know that the points of a two-dimensional plane can be assigned the R2
topology, and the values of fractions can be assigned the R1 topology, but what kind of
local topology is induced on the two-dimensional space of formal fractions by the
pseudometric? We can use our pseudometric distance function to define a basis, just as
with a metrical distance function, and arrive at a topological space, but this space will not
generally possess all the separation properties that we commonly expect for distinct
points of a topological space.
It's convenient to classify the separation properties of topological spaces according to the
"trennungsaxioms", also called the Ti axioms, introduced by Alexandroff and Hopf.
These represent a sequence of progressively stronger separation axioms to be met by the
points of a topological space. A space is said to be T0 if for any two distinct points at
least one of them is in a neighborhood that does not include the other. If each point is
contained in a neighborhood that does not include the other, then the space is called T1.
If the space satisfies the even stronger condition that any two points are contained in
disjoint open sets, then the space is called T2, also known as a Hausdorff space. There
are still more stringent separation axioms that can be applied, corresponding to T3
(regular), T4 (normal), and so on.
Many topologists will not even consider a topological space which is not at least T2 (and
some aren't interested in anything which is not at least T4), and yet it's clear that the
topology of the space of formal fractions induced by the pseudometric of absolute values
is not even T0, because two distinct fractions with the same value (such as 1/3 and 2/6)
cannot be separated into different neighborhoods by the pseudometric. Nevertheless, we
can still define the limit points of the set of formal fractions based on the pseudometric
distance function, thereby establishing a perfectly valid topology. This just illustrates
that the distinct points of a topology need not exhibit all the separation properties that we
usually associate with distinct points of a Hausdorff space (for example).
Now let's consider 1+1 dimensional Minkowski spacetime, which is physically
characterized by an invariant spacetime interval whose magnitude is
d(a,b) = |(ta − tb)² − (xa − xb)²|     (2)
Empirically this appears to be the correct measure of absolute separation between the
points of spacetime, i.e., it corresponds to what clocks measure along timelike intervals
and what rulers measure along spacelike intervals. However, this distance function
clearly does not satisfy the definition of a metric, because it can equal zero for distinct
points. Moreover, it is not even a pseudo-metric, because the interval between points a
and b can be greater than the sum of the intervals from a to c and from c to b,
contradicting the triangle inequality. For example, it's quite possible in Minkowski
spacetime to have two sides of a "triangle" equal to zero while the remaining side is
billions of light years in length. Thus, the absolute interval of space-time does not
provide a metrical measure of distance in the strict sense. Nevertheless, in other ways the
magnitude of the interval d(a,b) is quite analogous to a metrical distance, so it's
customary to refer to it loosely as a "metric", even though it is neither a true metric nor
even a pseudometric. We emphasize this fact to remind ourselves not to prejudge the
topology induced by this distance function on the points of Minkowski spacetime, and
not to assume that distinct events possess the separation properties or connectivities of a
topological space.
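A short numerical illustration of how badly the triangle inequality fails for the interval (2) (a Python sketch, with c = 1 and an arbitrarily chosen trio of events):

    def interval(a, b):
        """Absolute Minkowski interval (2) between events a and b, each given as (t, x)."""
        return abs((a[0] - b[0])**2 - (a[1] - b[1])**2)

    A = (0.0, 0.0)
    C = (1.0e9, 1.0e9)        # on the light line through A
    B = (2.0e9, 0.0)          # on the light line through C

    print(interval(A, C))     # 0.0   (null side)
    print(interval(C, B))     # 0.0   (null side)
    print(interval(A, B))     # 4e18  (the "remaining side" is enormous)
    # Two sides of the "triangle" vanish while the third does not, so the
    # requirement d(A,B) <= d(A,C) + d(C,B) fails: not even a pseudometric.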
The ε-neighborhood of a point p in the Euclidean plane based on the Pythagorean metric
(1) consists of the points q such that d(p,q) < ε. Thus the ε-neighborhoods of two points
in the plane are circular regions centered on the respective points, as shown in the left-hand illustration below. In contrast, the ε-neighborhoods of two points in Minkowski
spacetime induced by the Lorentz-invariant distance function (2) are the regions bounded
by the hyperbolic envelope containing the light lines emanating from those points, as
shown in the right-hand illustration below.
This illustrates the important fact that the concept of "nearness" implied by the
Minkowski metric is non-transitive. In a metric (or even a pseudometric) space, the
triangle inequality ensures that if A and B are close together, and B and C are close
together, then A and C cannot be very far apart. This transitivity obviously doesn't apply
to the absolute magnitudes of the spacetime intervals between events, because it's
possible for A and B to be null-separated, and for B and C to be null separated, while A
and C are arbitrarily far apart.
Interestingly, it is often suggested that the usual Euclidean topology of spacetime might
break down on some sufficiently small scale, such as over distances on the order of the
Planck length of roughly 10^−35 meters, but the system of reference for evaluating that
scale is usually not specified. As noted previously, the spatial and temporal components
of two null-separated events can both simultaneously be regarded as arbitrarily large or
arbitrarily small (including less than 10^−35 meters), depending on which system of inertial
coordinates we choose. This null-separation condition permeates the whole of spacetime
(recall Section 1.10 on Null Coordinates), so if we take seriously the possibility of non-Euclidean topology on the Planck scale, we can hardly avoid considering the possibility
that the effective physical topology ("connectedness") of the points of spacetime may be
non-Euclidean along null intervals in their entirety, which span all scales of spacetime.
It's certainly true that the topology induced by a direct application of the Minkowski
distance function (2) is not even a topological space, let alone Euclidean. To generate
this topology, we simply say that the point e is a limit point of any subset A of points of
Minkowski spacetime if and only if for every positive real number ε there is an element u
(other than e) of A such that d(e,u) < ε. This is a perfectly valid topology, and arguably
the one most consistent with the non-transitive absolute intervals that seem to physically
characterize spacetime, but it is not a topological space. To see this, recall that in order
for a topology to be a topological space it must be possible to express the limit point
mapping in terms of open sets such that a point e is a limit point of a subset A of S if and
only if every open set containing e also contains a point of A distinct from e. If we define
our topological neighborhoods in terms of the Minkowski absolute intervals, our open
sets would naturally include complete Minkowski neighborhoods, but these regions don't
satisfy the condition for a topological space, as illustrated below, where e is a limit point
of A, but e is also contained in Minkowski neighborhoods containing no point of A.
The idea of a truly Minkowskian topology seems unsatisfactory to many people, because
they worry that it implies every two events are mutually "co-local" (i.e., their local
neighborhoods intersect), and so the entire concept of "locality" becomes meaningless.
However, the fact that a set of points possesses a non-positive-definite line element does
not imply that the set degenerates into a featureless point (which is fortunate, considering
that the spacetime we inhabit is characterized by just such a line element). It simply
implies that we need to apply a more subtle understanding of the concept of locality,
taking account of its non-transitive aspect. In fact, the overlapping of topological
neighborhoods in spacetime suggests a very plausible approach to explaining the "non-local" quantum correlations that seem so mysterious when viewed from the viewpoint of
Euclidean topology. We'll consider this in more detail in subsequent chapters.
It is, of course, possible to assign the Euclidean topology to Minkowski spacetime, but
only by ignoring the non-transitive null structure implied by the Lorentz-invariant
distance function. To do this, we can simply take as our basis sets all the finite
intersections of Minkowski neighborhoods. Since the contents of an ε-neighborhood of a
given point are invariant under Lorentz transformations, it follows that the contents of the
intersection of the ε-neighborhoods of two given points are also invariant. Thus we can
define each basis set by specifying a finite collection of events with a specific value of ε
for each one, and the resulting set of points is invariant under Lorentz transformations.
This is a more satisfactory approach than defining neighborhoods as the set of points
whose coordinates (with respect to some arbitrary system of coordinates) differ only a
little, but the fact remains that by adopting this approach we are still tacitly abandoning
the Lorentz-invariant sense of nearness and connectedness, because we are segregating
null-separated events into disjoint open sets. This is analogous to saying, for the plane of
formal fractions, that 4/6 is not a limit point of every set containing 2/3, which is
certainly true on the formal level, but it ignores the natural topology possessed by the
values of fractions. In formulating a physical theory of fractions we would need to
decide at some point whether the observable physical phenomena actually correspond to
pairings of numerators and denominators, or to the values of fractions, and then select the
appropriate topology. In the case of a spacetime theory, we need to consider whether the
temporal and spatial components of intervals have absolute significance, or whether it is
only the absolute intervals themselves that are significant.
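The Lorentz invariance of these basis sets is easy to check numerically. The sketch below (Python, c = 1; the particular events, ε values, and boost speed are arbitrary choices for illustration) tests membership in the intersection of two Minkowski ε-neighborhoods, and confirms that boosting the entire configuration never changes the answer, since the test involves only invariant intervals:

    import math

    def interval(a, b):
        return abs((a[0] - b[0])**2 - (a[1] - b[1])**2)

    def boost(e, v):
        g = 1.0 / math.sqrt(1.0 - v * v)
        return (g * (e[0] - v * e[1]), g * (e[1] - v * e[0]))

    def in_basis_set(e, centers_and_eps):
        """Membership in the intersection of Minkowski eps-neighborhoods."""
        return all(interval(e, c) < eps for c, eps in centers_and_eps)

    basis = [((0.0, 0.0), 0.5), ((3.0, 0.0), 0.5)]      # two events, eps = 0.5 each
    tests = [(1.5, 1.4), (1.5, 0.3), (0.1, 2.0)]

    for e in tests:
        before = in_basis_set(e, basis)
        after = in_basis_set(boost(e, 0.6),
                             [(boost(c, 0.6), eps) for c, eps in basis])
        print(before, after, before == after)           # membership is frame-independent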
It's worth reviewing why we ever developed the Euclidean notion of locality in the first
place, and why it's so deeply engrained in our thought processes, when the spacetime
which we inhabit actually possesses a Minkowskian structure. This is easily attributed to
the fact that our conscious experience is almost exclusively focused on the behavior of
macro-objects whose overall world-lines are nearly parallel relative to the characteristic
of the metric. In other words, we're used to dealing with objects whose mutual velocities
are small relative to c, and for such objects the structure of spacetime does approach very
near to being Euclidean. On the scales of space and time relevant to macro human
experience the trajectories of incoming and outgoing light rays through any given point
are virtually indistinguishable, so it isn't surprising that our intuition reflects a Euclidean
topology. (Compare this with the discussion of Postulates and Principles in Chapter 3.1.)
Another important consequence of the non-positive-definite character of Minkowski
spacetime concerns the qualitative nature of geodesic paths. In a genuine metric space
the geodesics are typically the shortest paths from place to place, but in Minkowski
spacetime the timelike geodesics are the longest paths, in terms of the absolute value of
the invariant intervals. Of course, if we allow curvature, there may be multiple distinct
"maximal" paths between two given events. For example, if we shoot a rocket straight up
(with less than escape velocity), and it passes an orbiting satellite on the way up, and
passes the same satellite again on the way back down, then each of them has followed a
geodesic path between their meetings, but they have followed very different paths.
From one perspective, it's not surprising that the longest paths in spacetime correspond to
physically interesting phenomena, because the shortest path between any two points in
Minkowski spacetime is identically zero. Hence the structure of events was bound to
involve the longest paths. However, it seems rash to conclude that the shortest paths play
no significant role in physical phenomena. The shortest absolute timelike path between
two events follows a "dog leg" path, staying as close as possible to the null cones
emanating from the two events. Every two points in spacetime are connected by a
contiguous set of lightlike intervals whose absolute magnitudes are zero.
Minkowski spacetime provides an opportunity to reconsider the famous "limit paradox"
from freshman calculus in a new context. Recall the standard paradox begins with a two-part path in the xy plane from point A to point C by way of point B as shown below:
If the real segment AC has length 1, then the dog-leg path ABC has length √2, as does
each of the zig-zag paths ADEFC, AghiEjklC, and so on. As we continue to subdivide
the path into more and smaller zigzags the envelope of the path converges on the straight
line from A to C. The "paradox" is that the limiting zigzag path still has length √2,
whereas the line to which it converges (and from which we might suppose it is indistinguishable) has length 1. Needless to say, this is not a true paradox, because the limit of a
set of convergents does not necessarily possess all the properties of the convergents.
However, from a physical standpoint it teaches a valuable lesson, which is that we can't
necessarily assess the length of a path by assuming it equals the length of some curve
from which it never differs by any measurable amount.
To place this in the context of Minkowski spacetime, we can simply replace the y axis
with the time axis, and replace the Euclidean metric with the Minkowski pseudo-metric.
We can still assume the length of the interval AC is 1, but now each of the diagonal
segments is a null interval, so the total path length along any of the zigzag paths is
identically zero. In the limit, with an infinite number of infinitely small zigzags, the
jagged "null path" is everywhere practically coincident with the timelike geodesic path
AC, and yet its total length remains zero. Of course, the oscillating acceleration required
to propel a massive particle on a path approaching these light-like segments would be
enormous, as would the frequency of oscillation.
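The arithmetic of both versions of the paradox can be checked directly. The sketch below (Python, c = 1) builds the zigzag between A = (0,0) and C = (1,0) in the (t,x) plane out of 2n segments at 45 degrees, and sums the segment lengths under the Euclidean metric and under the absolute Minkowski interval:

    import math

    def zigzag_segments(n):
        """2n equal segments at 45 degrees taking A = (0,0) to C = (1,0) in the (t,x) plane."""
        h = 1.0 / (2 * n)
        return [(h, h if k % 2 == 0 else -h) for k in range(2 * n)]

    def euclid_len(dt, dx):
        return math.sqrt(dt * dt + dx * dx)

    def mink_len(dt, dx):
        return math.sqrt(abs(dt * dt - dx * dx))

    for n in (1, 10, 1000):
        segs = zigzag_segments(n)
        print(n,
              sum(euclid_len(dt, dx) for dt, dx in segs),   # always sqrt(2) = 1.414...
              sum(mink_len(dt, dx) for dt, dx in segs),     # always 0 (every segment is null)
              1.0 / (2 * n))                                # max deviation from AC -> 0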
9.2 Up To Diffeomorphism
The mind of man is more intuitive than logical, and comprehends more
than it can coordinate.
Vauvenargues, 1746
Einstein seems to have been strongly wedded to the concept of the continuum described
by partial differential equations as the only satisfactory framework for physics. He was
certainly not the first to hold this view. For example, in 1860 Riemann wrote
As is well known, physics became a science only after the invention of
differential calculus. It was only after realizing that natural phenomena are
continuous that attempts to construct abstract models were successful… In the
first period, only certain abstract cases were treated: the mass of a body was
considered to be concentrated at its center, the planets were mathematical
points… so the passage from the infinitely near to the finite was made only in one
variable, the time [i.e., by means of total differential equations]. In general,
however, this passage has to be done in several variables… Such passages lead to
partial differential equations… In all physical theories, partial differential
equations constitute the only verifiable basis. These facts, established by
induction, must also hold a priori. True basic laws can only hold in the small and
must be formulated as partial differential equations.
Compare this with Einstein’s comments (see Section 3.2) over 70 years later about the
unsatisfactory dualism inherent in Lorentz’s theory, which expressed the laws of motion
of particles in the form of total differential equations while describing the
electromagnetic field by means of partial differential equations. Interestingly, Riemann
asserted that the continuous nature of physical phenomena was “established by
induction”, but immediately went on to say it must also hold a priori, referring somewhat
obscurely to the idea that “true basic laws can only hold in the infinitely small”. He may
have been trying to convey by these words his rejection of “action at a distance”. Einstein
attributed this insight to the special theory of relativity, but of course the Newtonian
concept of instantaneous action at a distance had always been viewed skeptically, so it
isn’t surprising that Riemann in 1860 – like his contemporary Maxwell – adopted the
impossibility of distant action as a fundamental principle. (It’s interesting to consider
whether Einstein might have taken this, rather than the invariance of light speed, as one
of the founding principles of special relativity, since it immediately leads to the
impossibility of rigid bodies, etc.) In his autobiographical notes (1949) Einstein wrote
There is no such thing as simultaneity of distant events; consequently, there is
also no such thing as immediate action at a distance in the sense of Newtonian
mechanics. Although the introduction of actions at a distance, which propagate at
the speed of light, remains feasible according to this theory, it appears unnatural;
for in such a theory there could be no reasonable expression for the principle of
conservation of energy. It therefore appears unavoidable that physical reality
must be described in terms of continuous functions in space.
It’s worth noting that while Riemann and Maxwell had expressed their objections in
terms of “action at a (spatial) distance”, Einstein can justly claim that special relativity
revealed that the actual concept to be rejected was instantaneous action at a distance. He
acknowledged that “distant action” propagating at the speed of light – which is to say,
action over null intervals – remains feasible. In fact, one could argue that such “distant
action” was made more feasible by special relativity, especially in the context of
Minkowski’s spacetime, in which the null (light-like) intervals have zero absolute
magnitude. For any two light-like separated events there exist perfectly valid systems of
inertial coordinates in terms of which both the spatial and the temporal measures of
distance are arbitrarily small. It doesn’t seem to have troubled Einstein (nor many later
scientists) that the existence of non-trivial null intervals potentially undermines the
identification of the topology of pseudo-metrical spacetime with that of a true metric
space. Thus Einstein could still write that the coordinates of general relativity express the
“neighborliness” of events “whose coordinates differ but little from each other”. As
argued in Section 9.1, the assumption that the physically most meaningful topology of a
pseudo-metric space is the same as the topology of continuous coordinates assigned to
that space, even though there are singularities in the invariant measures based on those
coordinates, is questionable. Given Einstein’s aversion to singularities of any kind,
including even the coordinate singularity at the Schwarzschild radius, it’s somewhat
ironic that he never seems to have worried about the coordinate singularity of every
lightlike interval and the non-transitive nature of “null separation” in ordinary Minkowski
spacetime.
Apparently unconcerned about the topological implications of Minkowski spacetime,
Einstein inferred from the special theory that “physical reality must be described in terms
of continuous functions in space”. Of course, years earlier he had already considered
some of the possible objections to this point of view. In his 1936 essay on “Physics and
Reality” he considered the “already terrifying” prospect of quantum field theory, i.e., the
application of the method of quantum mechanics to continuous fields with infinitely
many degrees of freedom, and he wrote
To be sure, it has been pointed out that the introduction of a space-time
continuum may be considered as contrary to nature in view of the molecular
structure of everything which happens on a small scale. It is maintained that
perhaps the success of the Heisenberg method points to a purely algebraical
method of description of nature, that is to the elimination of continuous functions
from physics. Then, however, we must also give up, on principle, the space-time
continuum. It is not unimaginable that human ingenuity will some day find
methods which will make it possible to proceed along such a path. At the present
time, however, such a program looks like an attempt to breathe in empty space.
In his later search for something beyond general relativity that would encompass
quantum phenomena, he maintained that the theory must be invariant under a group that
at least contains all continuous transformations (represented by the symmetric tensor), but
he hoped to enlarge this group.
It would be most beautiful if one were to succeed in expanding the group once
more in analogy to the step that led from special relativity to general relativity.
More specifically, I have attempted to draw upon the group of complex
transformations of the coordinates. All such endeavours were unsuccessful. I also
gave up an open or concealed increase in the number of dimensions, an endeavor
that … even today has its adherents.
The reference to complex transformations is an interesting fore-runner of more recent
efforts, notably Penrose’s twistor program, to exploit the properties of complex functions
(cf Section 9.9). The comment about increasing the number of dimensions certainly has
relevance to current “string theory” research. Of course, as Einstein observed in an
appendix to his Princeton lectures, “In this case one must explain why the continuum is
apparently restricted to four dimensions”. He also mentioned the possibility of field
equations of higher order, but he thought that such ideas should be pursued “only if there
exist empirical reasons to do so”. On this basis he concluded
We shall limit ourselves to the four-dimensional space and to the group of
continuous real transformations of the coordinates.
He went on to describe what he (then) considered to be the “logically most satisfying
idea” (involving a non-symmetric tensor), but added a footnote that revealed his lack of
conviction, saying he thought the theory had a fair probability of being valid “if the way
to an exhaustive description of physical reality on the basis of the continuum turns out to
be at all feasible”. A few years later he told Abraham Pais that he “was not sure
differential geometry was to be the framework for further progress”, and later still, in
1954, just a year before his death, he wrote to his old friend Besso (quoted in Section 3.8)
that he considered it quite possible that physics cannot be based on continuous structures.
The dilemma was summed up at the conclusion of his Princeton lectures, where he said
One can give good reasons why reality cannot at all be represented by a
continuous field. From the quantum phenomena it appears to follow with certainty
that a finite system of finite energy can be completely described by a finite set of
numbers… but this does not seem to be in accordance with a continuum theory,
and must lead to an attempt to find a purely algebraic theory for the description of
reality. But nobody knows how to obtain the basis of such a theory.
The area of current research involving “spin networks” might be regarded as attempts to
obtain an algebraic basis for a theory of space and time, but so far these efforts have not
achieved much success. The current field of “string theory” has some algebraic aspects,
but it seems to entail much the same kind of dualism that Einstein found so objectionable
in Lorentz’s theory. Of course, most modern research into fundamental physics is based
on quantum field theory, about which Einstein was never enthusiastic – to put it mildly.
(Bargmann told Pais that Einstein once “asked him for a private survey of quantum field
theory, beginning with second quantization. Bargmann did so for about a month.
Thereafter Einstein’s interest waned.”)
Of all the various directions that Einstein and others have explored, one of the most
intriguing (at least from the standpoint of relativity theory) was the idea of “expanding
the group once more in analogy to the step that led from special relativity to general
relativity”. However, there are many different ways in which this might conceivably be
done. Einstein referred to allowing complex transformations, or non-symmetric, or
increasing the number of dimensions, etc., but all these retain the continuum hypothesis.
He doesn’t seem to have seriously considered relaxing this assumption, and allowing
completely arbitrary transformations (unless this is what he had in mind when he referred
to an “algebraic theory”). Ironically in his expositions of general relativity he often
proudly explained that it gave an expression of physical laws valid for completely
arbitrary transformations of the coordinates, but of course he meant arbitrary only up to
diffeomorphism, which in the absolute sense is not very arbitrary at all.
We mentioned in the previous section that diffeomorphically equivalent sets can be
assigned the same topology, but from the standpoint of a physical theory it isn't self-evident which diffeomorphism is the right one (assuming there is one) for a particular set
of physical entities, such as the events of spacetime. Suppose we're able to establish a 1-to-1 correspondence between certain physical events and the sets of four real-valued
numbers (x0,x1,x2,x3). (As always, the superscripts are indices, not exponents.) This is
already a very strong supposition, because the real numbers are uncountable, even over a
finite range, so we are supposing that physical events are also uncountable. However,
I've intentionally not characterized these physical events as points in a certain contiguous
region of a smooth continuous manifold, because the ability to place those events in a
one-to-one correspondence with the coordinate sets does not, by itself, imply any
particular arrangement of those events. (We use the word arrangement here to signify
the notions of order and nearness associated with a specific topology.) In particular, it
doesn't imply an arrangement similar to that of the coordinate sets interpreted as points in
the four-dimensional space denoted by R4.
To illustrate why the ability to map events with real coordinates does not, by itself, imply
a particular arrangement of those events, consider the coordinates of a single event,
normalized to the range 0-1, and expressed in the form of their decimal representations,
where xmn denotes the nth most significant digit of the mth coordinate, as shown below
x0 = 0. x01 x02 x03 x04  x05 x06 x07 x08  ...
x1 = 0. x11 x12 x13 x14  x15 x16 x17 x18  ...
x2 = 0. x21 x22 x23 x24  x25 x26 x27 x28  ...
x3 = 0. x31 x32 x33 x34  x35 x36 x37 x38  ...
We could, as an example, assign each such set of coordinates to a point in an ordinary
four-dimensional space with the coordinates (y0,y1,y2,y3) given by the diagonal sets of
digits from the corresponding x coordinates, taken in blocks of four, as shown below
y0 = 0. x01 x12 x23 x34  x05 x16 x27 x38  ...
y1 = 0. x02 x13 x24 x31  x06 x17 x28 x35  ...
y2 = 0. x03 x14 x21 x32  x07 x18 x25 x36  ...
y3 = 0. x04 x11 x22 x33  x08 x15 x26 x37  ...
We could also transpose each consecutive pair of blocks, or scramble the digits in any
number of other ways, provided only that we ensure a 1-to-1 mapping. We could even
imagine that the y space has (say) eight dimensions instead of four, and we could
construct those eight coordinates from the odd and even numbered digits of the four x
coordinates. It's easy to imagine numerous 1-to-1 mappings between a set of abstract
events and sets of coordinates such that the actual arrangement of the events (if indeed
they possess one) bears no direct resemblance to the arrangement of the coordinate sets in
their natural space.
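A concrete version of such a scrambled mapping is easy to write down. The sketch below (Python; the particular rule, a cyclic shift of each digit column, is an arbitrary example rather than the diagonal scheme pictured above) redistributes decimal digits among four normalized coordinates, producing a relabeling that is one-to-one on the digit strings but bears no resemblance to a continuous transformation:

    def to_digits(x, n_digits=12):
        """First n decimal digits of a coordinate normalized to the range [0, 1)."""
        return [int(d) for d in f"{x:.{n_digits}f}"[2:]]

    def from_digits(digits):
        return sum(d * 10.0 ** -(k + 1) for k, d in enumerate(digits))

    def scramble(coords, n_digits=12):
        """Redistribute digits: column n of the 4 x n_digits digit array is
        cyclically shifted by n places.  One-to-one on the digit strings, but
        not a continuous map of the coordinates."""
        D = [to_digits(x, n_digits) for x in coords]
        Y = [[D[(m + n) % 4][n] for n in range(n_digits)] for m in range(4)]
        return [from_digits(row) for row in Y]

    x  = (0.500000000000, 0.123456789012, 0.314159265358, 0.271828182845)
    xp = (0.499999999999, 0.123456789012, 0.314159265358, 0.271828182845)
    print(scramble(x))
    print(scramble(xp))
    # x and xp differ by 10^-12 in one coordinate, yet their images differ in early
    # digits of several y coordinates, because the digit strings of 0.5000... and
    # 0.4999... disagree in every position.  No continuous transformation (and
    # hence no diffeomorphism) relates the x labels to the y labels.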
So, returning to our task, we've assigned coordinates to a set of events, and we now wish
to assert some relationship between those events that remains invariant under a particular
kind of transformation of the coordinates. Specifically, we limit ourselves to coordinate
mappings that can be reached from our original x mapping by means of a differentiable
transformation applied on the natural space of x. In other words, we wish to consider
transformations from x to X given by a set of four continuous functions f i with
continuous partial first derivatives. Thus we have
X0 = f 0 (x0 , x1 , x2 , x3)
X1 = f 1 (x0 , x1 , x2 , x3)
X2 = f 2 (x0 , x1 , x2 , x3)
X3 = f 3 (x0 , x1 , x2 , x3)
Further, we require this transformation to possess a differentiable inverse, i.e., there exist
differentiable functions Fi such that
x0 = F0 (X0 , X1 , X2 , X3)
x1 = F1 (X0 , X1 , X2 , X3)
x2 = F2 (X0 , X1 , X2 , X3)
x3 = F3 (X0 , X1 , X2 , X3)
A mapping of this kind is called a diffeomorphism, and two sets are said to be equivalent
up to diffeomorphism if there is such a mapping from one to the other. Any physical
theory, such as general relativity, formulated in terms of tensor fields in spacetime
automatically possesses the freedom to choose the coordinate system from among a
complete class of diffeomorphically equivalent systems. From one point of view this can
be seen as a tremendous generality and freedom from dependence on arbitrary coordinate
systems. However, as noted above, there are infinitely many systems of coordinates that
are not diffeomorphically equivalent, so the limitation to equivalent systems up to
diffeomorphism can also be seen as quite restrictive.
For example, no such functions can possibly reproduce the digit-scrambling
transformations discussed previously, such as the mapping from x to y, because those
mappings are everywhere discontinuous. Thus we cannot get from x coordinates to y
coordinates (or vice versa) by means of continuous transformations. By restricting
ourselves to differentiable transformations we're implicitly focusing our attention on one
particular equivalence class of coordinate systems, with no a priori guarantee that this
class of systems includes the most natural parameterization of physical events. In fact,
we don't even know if physical events possess a natural parameterization, or if they do,
whether it is unique.
Recall that the special theory of relativity assumes the existence and identifiability of a
preferred equivalence class of coordinate systems called the inertial systems. The laws of
physics, according to special relativity, should be the same when expressed with respect
to any inertial system of coordinates, but not necessarily with respect to non-inertial
systems of reference. It was dissatisfaction with having given a preferred role to a
particular class of coordinate systems that led Einstein to generalize the "gage freedom"
of general relativity, by formulating physical laws in pure tensor form (general
covariance) so that they apply to any system of coordinates from a much larger
equivalence class, namely, those that are equivalent to an inertial coordinate system up to
diffeomorphism. This entails accelerated coordinate systems (over suitably restricted
regions) that are outside the class of inertial systems. Impressive though this
achievement is, we should not forget that general relativity is still restricted to a preferred
class of coordinate systems, which comprise only an infinitesimal fraction of all
conceivable mappings of physical events, because it still excludes non-diffeomorphic
transformations.
It's interesting to consider how we arrive at (and agree upon) our preferred equivalence
class of coordinate systems. Even from the standpoint of special relativity the
identification of an inertial coordinate system is far from trivial (even though it's often
taken for granted). When we proceed to the general theory we have a great deal more
freedom, but we're still confined to a single topology, a single pattern of coherence. How
is this coherence apprehended by our senses? Is it conceivable that a different set of
senses might have led us to apprehend a different coherent structure in the physical
world? More to the point, would it be possible to formulate physical laws in such a way
that they remain applicable under completely arbitrary transformations?
9.3 Higher-Order Metrics
A similar path to the same goal could also be taken in those manifolds in
which the line element is expressed in a less simple way, e.g., by a fourth
root of a differential expression of the fourth degree…
Riemann, 1854
Given three points A,B,C, let dx1 denote the distance between A and B, and let dx2
denote the distance between B and C. Can we express the distance ds between A and C in
terms of dx1 and dx2? Since dx1, dx2, and ds all represent distances with commensurate
units, it's clear that any formula relating them must be homogeneous in these quantities,
i.e., they must appear to the same power. One possibility is to assume that ds is a linear
combination of dx1 and dx2 as follows
ds = g1 dx1 + g2 dx2     (1)
where g1 and g2 are constants. In a simple one-dimensional manifold this would indeed
be the correct formula for ds, with |g1| = |g2| = 1, except for the fact that it might give a
negative sign for ds, contrary to the idea of an interval as a positive magnitude. To ensure
the correct sign for ds, we might take the absolute value of the right hand side, which
suggests that the fundamental equality actually involves the squares of the two sides of
the above equation, i.e., the quantities ds, dx1, dx2 satisfy the relation
(ds)² = g11 (dx1)² + 2 g12 dx1 dx2 + g22 (dx2)²     (2)
where we have put gij = gi gj. Thus we have g11 g22 − (g12)² = 0, which is the condition
for factorability of the expanded form as the square of a linear expression. This will be
the case in a one-dimensional manifold, but in more general circumstances we find that
the values of the gij in the expanded form of (2) are such that the expression is not
factorable into linear terms with real coefficients. In this way we arrive at the second-order metric form, which is the basis of Riemannian geometry.
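As a check on this reasoning, here is a small symbolic sketch (Python with the sympy library, which is assumed to be available) expanding the square of the linear form and verifying that the resulting second-order coefficients satisfy the factorability condition:

    import sympy as sp

    g1, g2, dx1, dx2 = sp.symbols('g1 g2 dx1 dx2')

    ds_squared = sp.expand((g1 * dx1 + g2 * dx2)**2)
    print(ds_squared)       # dx1**2*g1**2 + 2*dx1*dx2*g1*g2 + dx2**2*g2**2

    # Reading off gij = gi*gj from the expanded form (2):
    g11, g12, g22 = g1**2, g1 * g2, g2**2
    print(sp.simplify(g11 * g22 - g12**2))   # 0: the form factors as a perfect square

    # A generic symmetric form with g11*g22 - g12**2 != 0 does not factor this way,
    # which is how one passes from (1) to genuinely second-order (Riemannian) metrics.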
Of course, by allowing the second-order coefficients gij to be arbitrary, we make it
possible for (ds)² to be negative, analogous to the fact that ds in equation (1) could be
negative, which is what prompted us to square both sides of (1), leading to equation (2).
Now that (ds)² can be negative, we're naturally led to consider the possibility that the
fundamental relation is actually the equality of the squares of both sides of (2). This
gives
(ds)⁴ = Σ gαβγδ dxα dxβ dxγ dxδ
where the sum is evaluated for α, β, γ, δ each ranging from 1 to n, where n is the
dimension of the manifold. Once again, having arrived at this form, we immediately
dispense with the assumption of factorability, and allow general fourth-order metrics.
These are non-Riemannian metrics, although Riemann actually alluded to the possibility
of fourth and higher order metrics in his famous inaugural dissertation. He noted that
The line element in this more general case would not be reducible to the square
root of a quadratic sum of differential expressions, and therefore in the expression
for the square of the line element the deviation from flatness would be an
infinitely small quantity of degree two, whereas for the former manifolds [i.e.,
those whose squared line elements are sums of squares] it was an infinitely small
quantity of degree four. This peculiarity [i.e., this quantity of the second degree] in
the latter manifolds therefore might well be called the planeness in the smallest
parts…
It's clear even from his brief comments that he had given this possibility considerable
thought, but he never published any extensive work on it. Finsler wrote a dissertation on
this subject in 1918, so such metrics are now often called Finsler metrics.
To visualize the effect of higher order metrics, recall that for a second-order metric the
locus of points at a fixed distance ds from the origin must be a conic, i.e., an ellipse,
hyperbola, or parabola. In contrast, a fourth-order metric allows more complicated loci of
equi-distant points. When applied in the context of Minkowskian metrics, these higher-order forms raise some intriguing possibilities. For example, instead of a spacetime
structure with a single light-like characteristic c, we could imagine a structure with two
null characteristics, c1 and c2. Letting x and t denote the spacelike and timelike
coordinates respectively, this means (ds/dt)⁴ vanishes for two values (up to sign) of dx/dt.
Thus there are four roots, given by ±c1 and ±c2, and we have
(ds/dt)⁴ = (dx/dt − c1)(dx/dt + c1)(dx/dt − c2)(dx/dt + c2)
The resulting metric is
(ds)⁴ = [(dx)² − c1²(dt)²] [(dx)² − c2²(dt)²]     (3)
The physical significance of this "metric" naturally depends on the physical meaning of
the coordinates x and t. In Minkowski spacetime these represent what physical rulers and
clocks measure, and we can translate these coordinates from one inertial system to
another according to the Lorentz transformations while always preserving the form of the
Minkowski metric with a fixed numerical value of c. The coordinates x and t are defined
in such a way that c remains invariant, and this definition happily coincides with the
physical measures of rulers and clocks. However, with two distinct light-like
"eigenvalues", it's no longer possible for a single family of spacetime decompositions to
preserve the values of both c1 and c2. Consequently, the metric will take the form of (3)
only with respect to one particular system of xt coordinates. In any other frame of
reference at least one of c1 and c2 must be different.
Suppose that with respect to a particular inertial system of coordinates x,t the spacetime
metric is given by (3) with c1 = 1 and c2 = 2. We might also suppose that c1 corresponds
to the null surfaces of electromagnetic wave propagation, just as in Minkowski
spacetime. Now, with respect to any other system of coordinates x',t' moving with speed
v relative to the x,t coordinates, we can decompose the absolute intervals into space and
time components such that c1 = 1, but then the values of the other lightlines
(corresponding to c2') must be (v + c2)/(1 + v c2) and (v  c2)/(1  v c2). Consequently,
for states of motion far from the one in which the metric takes the special form (3), the
metric will become progressively more asymmetrical. This is illustrated in the figure
below, which shows contours of constant magnitude of the squared interval.
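The transformed null speeds quoted above are just the relativistic composition of velocities applied to ±c1 and ±c2. A brief numerical sketch (Python, c = 1, with c2 = 2 as in the example) shows that the c1 = 1 pair of light lines is preserved in every frame while the c2 pair becomes progressively more asymmetrical:

    def compose(u, v):
        """Relativistic composition of velocities (c = 1)."""
        return (u + v) / (1 + u * v)

    c1, c2 = 1.0, 2.0
    for v in (0.0, 0.3, 0.6, 0.9):
        print(f"v = {v}:",
              compose(+c1, v), compose(-c1, v),    # always +1 and -1
              compose(+c2, v), compose(-c2, v))    # e.g. 1.4375 and -4.25 at v = 0.3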
Clearly this metric does not correspond to the observed spacetime structure, even in the
symmetrical case with v = 0, because it is not Lorentz-invariant. As an alternative to this
structure containing "super-light" null surfaces we might consider metrics with some
finite number of "sub-light" null surfaces, but the failure to exhibit even approximate
Lorentz-invariance would remain.
However, it is possible to construct infinite-order metrics with infinitely many super-light
and/or sub-light null surfaces, and in so doing recover a structure that in many respects is
virtually identical to Minkowski spacetime, except for a set (of spacetime trajectories) of
measure zero. This can be done by generalizing (3) to include infinitely many discrete
factors
(ds)²ⁿ = [(dx)² − c1²(dt)²] [(dx)² − c2²(dt)²] ... [(dx)² − cn²(dt)²]     (4)
where the values of ci represent an infinite family of sub-light parameters given by
A plot showing how this spacetime structure develops as n increases is shown below.
This illustrates how, as the number of sub-light cones goes to infinity, the structure of the
manifold goes over to the usual Minkowski pseudometric, except for the discrete null
sub-light surfaces which are distributed throughout the interior of the future and past light
cones, and which accumulate on the light cones. The sub-light null surfaces become so
thin that they no longer show up on these contour plots for large n, but they remain
present to all orders. In the limit as n approaches infinity they become discrete null
trajectories embedded in what amounts to ordinary Minkowski spacetime. To see this,
notice that if none of the factors on the right hand side of (4) is exactly zero we can take
the natural log of both sides to give
2n ln|ds| = Σ ln|(dx)² − ci²(dt)²|
Thus the natural log of (ds)² is the asymptotic average of the natural logs of the quantities
(dx)² − ci²(dt)². Since the values of ci accumulate on 1, it's clear that this converges on
the usual Minkowski metric (provided we are not precisely on any of the discrete sub-light null surfaces).
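The convergence claim is easy to probe numerically. The sketch below (Python) uses an assumed family of sub-light parameters accumulating on 1 (the text's specific formula for the ci is not reproduced above, so ci = i/(i+1) is adopted purely for illustration) and compares the geometric-mean form of (4) with the ordinary Minkowski value:

    import math

    def ds2_geometric(dx, dt, n):
        """exp of the average of ln|(dx)^2 - ci^2 (dt)^2|, i.e. (ds)^2 from (4)."""
        cs = [i / (i + 1.0) for i in range(1, n + 1)]   # assumed sub-light family -> 1
        logs = [math.log(abs(dx * dx - c * c * dt * dt)) for c in cs]
        return math.exp(sum(logs) / n)

    dx, dt = 0.3, 1.0          # a timelike displacement, not on any of the null surfaces
    minkowski = abs(dx * dx - dt * dt)
    for n in (5, 50, 5000):
        print(n, ds2_geometric(dx, dt, n), minkowski)   # approaches |dx^2 - dt^2| = 0.91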
The preceding metric was based purely on sub-light null surfaces. We could also include
n super-light null surfaces along with the n sub-light null surfaces, yielding an asymptotic
family of metrics which, again, goes over to the usual Minkowski metric as n goes to
infinity (except for the discrete null surface structure). This metric is given by the
formula
where the values of ci are generated as before. The results for various values of n are
illustrated in the figure below.
Notice that the quasi Lorentz-invariance of this metric has a subtle periodicity, because
any one of the sublight null surfaces can be aligned with the time axis by a suitable
choice of velocity, or the time axis can be placed "in between" two null surfaces. In a 1+1
dimensional spacetime the structure is perfectly symmetrical modulo this cycle from one
null surface to the next. In other words, the set of exactly equivalent reference systems
corresponds to a cycle with a period of , which is the increment between each ci and
ci+1. However, with more spatial dimensions the sub-light null structure is subtly less
symmetrical, because each null surface represents a discrete cone, which associates two
of the trajectories in the xt plane as the sides of a single cone. Thus there must be an
absolutely innermost cone, in the topological sense, even though that cone may be far off
center, i.e., far from the selected time axis. Similarly for the super-light cones (or
spheres), there would be a single state of motion with respect to which all of those null
surfaces would be spherically symmetrical. Only the accumulation shell, i.e., the actual
light-cone itself, would be spherically symmetrical with respect to all states of motion.
9.4 Polarization and Spin
Every ray of light has therefore two opposite sides… And since the crystal
by this disposition or virtue does not act upon the rays except when one of
their sides of unusual refraction looks toward that coast, this argues a
virtue or disposition in those sides of the rays which answers to and
sympathizes with that virtue or disposition of the crystal, as the poles of
two magnets answer to one another…
Newton, 1717
A transparent crystalline substance, now known as calcite, was discovered by a naval
expedition to Iceland in 1668, and samples of this “Iceland crystal” were examined by the
Danish scientist Erasmus Bartholin, who noticed that a double image appeared when
objects were viewed through this crystal. He found that rays of light passing through
calcite are split into two refracted rays. Some of the incoming light is always refracted at
the normal angle of refraction for the density of the substance and a given angle of
incidence, but some of the incoming light is refracted at a different angle. If the incident
ray is perpendicular to the face of the crystal, the ordinary ray undergoes no refraction
and passes straight through, just as we would expect, but the extraordinary ray is
refracted upon entering the crystal and again upon departing the crystal. Bartholin noted
that the direction in which the extraordinary ray diverges from the perpendicular as it
passes into the crystal depends on the orientation of the crystal about the incident axis.
Thus by rotating the crystal about the incident axis, the second image appearing through
the crystal revolves around the first image. This phenomenon could have been observed
at any time in human history, and might not have been regarded as terribly significant,
but by the middle of the 17th century the study of optics had reached a point where the
occurrence of two distinct refracted rays was a clear anomaly. Bartholin called this “one
of the greatest wonders that nature has produced”. (It’s interesting that Bartholin’s
daughter, Anne Marie, married Ole Roemer, whose discovery of the finite speed of light
was discussed in Section 3.3.)
Christiaan Huygens had previously derived the ordinary law of refraction from his wave
theory of light by assuming that the speed of light in a refracting substance is the same in all
directions, i.e., isotropic. When Huygens learned of the double refraction in the Iceland
crystal (also known as Iceland spar) he concluded that the crystal must contain two
different media interspersed, and that the speed of light is isotropic in one of these media
but anisotropic in the other. Hence he imagined that two distinct wave fronts emanated
from each point, one spherical and the other ellipsoidal, and the directions of the two rays
were normal to these two wave fronts. He didn’t explain why part of the light propagated
purely in one of the media, and part of the light purely in the other. Moreover, he
discovered another very remarkable phenomenon related to this double refraction that was
even more difficult to reconcile with his implicitly longitudinal conception of light
waves. He found that if a ray of light, after passing through an Iceland crystal, is passed
through a second crystal aligned parallel with the first, then all of the ordinary ray passes
through the second crystal without refraction, and all of the extraordinary ray is refracted
in the second crystal just as it was in the first. On the other hand, if the second crystal
is aligned perpendicular to the first, the refracted ray from the first crystal is not
refracted at all in the second crystal, whereas the unrefracted ray from the first crystal
undergoes refraction in the second. These two cases are depicted in the figures below.
Huygens was unable to account for this behavior in any way that was consistent with his
conception of light as a longitudinal wave with radial symmetry. He conceded
…it seems that one is obliged to conclude that the waves of light, after having
passed through the first crystal, acquire a certain form or disposition in virtue of
which, when meeting the texture of the second crystal, in certain positions, they
can move the two different kinds of matter which serve for the two species of
refraction; and when meeting the second crystal in another position are able to
move only one of these kinds of matter. But to tell how this occurs, I have hitherto
found nothing which satisfies me.
Newton considered this phenomenon to be inexplicable “if light be nothing other than
pression or motion through an aether”, and argued that “the unusual refraction is [due to]
an original property of the rays”, namely, an axial asymmetry or sidedness, which he
thought must be regarded as an intrinsic property of individual corpuscles of light. At the
beginning of the 19th century the “sidedness” of Newton was reconciled with the wave
concept of Huygens by the idea of light as a transverse (rather than longitudinal) wave.
Later this transverse wave was found to be a feature of the electromagnetic waves
predicted by Maxwell’s equations, according to which the electric and magnetic fields
oscillate transversely in the plane normal to the direction of motion (and perpendicular to
each other). Thus an electromagnetic wave "looks" something like this:
where E signifies the oscillating electric field and B the magnetic field. The wave is said
to be polarized in the direction of E. The planes of oscillation are perpendicular to each
other, but their orientations are not necessarily fixed - it's possible for them to rotate like
a windmill about the axis of propagation. In general the electric field of a plane wave of
frequency ω propagating along the z axis of Cartesian coordinates can be resolved into
two perpendicular components that can be written as

    Ex = Cx cos(kz − ωt)      Ey = Cy cos(kz − ωt + δ)

where k is the wave number, δ is the phase difference between the two components, and Cx and Cy are the
constant amplitudes. If the amplitudes of these two components both equal a single
constant E0, and if the phase difference is −π/2, then remembering the trigonometric
identity sin(u) = cos(u − π/2), we have

    Ex = E0 cos(kz − ωt)      Ey = E0 sin(kz − ωt)
In this case the amplitude of the overall wave is constant, and, as can be seen in the figure
below, the electric field vector at constant z rotates (at the angular speed ω) in the
clockwise direction as seen by an observer looking back toward the approaching wave.
This is conventionally called right-circularly polarized light. On the other hand, if the
amplitudes are equal but the phase difference is +π/2, then remembering the
trigonometric identity cos(u + π/2) = −sin(u), the two components are

    Ex = E0 cos(kz − ωt)      Ey = −E0 sin(kz − ωt)

so the direction of the electric field rotates in the counter-clockwise direction. This is
called left-circularly polarized light. If we superimpose left and right circularly polarized
waves (with the same frequency and phase), the result is simply

    Ex = 2E0 cos(kz − ωt)      Ey = 0
which represents a linearly polarized wave, since the electric field oscillates entirely in
the xz plane. By combining left and right circularly polarized light in other proportions
and with other phase relations, we can also produce what are called elliptically polarized
light waves, which are intermediate between the extremes of circularly polarized and
linearly polarized light. Conversely, a circularly polarized light wave can be produced by
combining two perpendicular linearly polarized waves.
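As a rough numerical check of these decompositions, the following sketch (written in Python with the NumPy library; the unit amplitude, frequency, and wave number are merely illustrative choices) superimposes right- and left-circular components of equal amplitude and confirms that the y component cancels, leaving a wave oscillating entirely in the xz plane:

    import numpy as np

    # Illustrative values: unit amplitude, angular frequency, and wave number,
    # sampled over one full period at the fixed position z = 0.
    E0, omega, k, z = 1.0, 1.0, 1.0, 0.0
    t = np.linspace(0.0, 2*np.pi, 1000)
    phase = k*z - omega*t

    # Circular components with the sign conventions used above.
    Ex_R, Ey_R = E0*np.cos(phase),  E0*np.sin(phase)   # "right" circular
    Ex_L, Ey_L = E0*np.cos(phase), -E0*np.sin(phase)   # "left" circular

    Ex, Ey = Ex_R + Ex_L, Ey_R + Ey_L                  # superposition

    print(np.max(np.abs(Ey)))                          # ~0: nothing out of the xz plane
    print(np.allclose(Ex, 2*E0*np.cos(phase)))         # True: linear polarization along x

Running the same arithmetic in reverse splits any linearly polarized wave into its two circular components.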
A typical plane wave of ordinary light (such as from the Sun) consists of components
with all possible polarizations mixed together, but it can be decomposed into left and
right circularly polarized waves, and this can be done relative to any orthogonal set of
axes. Calcite crystals (as well as some other substances) have an isotropic index of
refraction for light whose electric field oscillates in one particular plane, but an
anisotropic index of refraction for light whose electric field oscillates in the perpendicular
plane. Hence it acts as a filter, decomposing the incident wave (normal to the surface)
into perpendicular linearly polarized waves aligned with the characteristic axis of the
crystal. As the wave enters the crystal, only the component whose electric plane of
oscillation encounters anisotropic refractivity is subjected to refraction. This is the
classical account of the phenomena observed by Bartholin and Huygens.
It could be argued that the classical account of polarization phenomena is incomplete,
because it relies on the assumption that a superposition of plane waves can be
decomposed into an arbitrary set of orthogonal components, and that the interactions of
those components with matter will yield the same results, regardless of the chosen basis
of decomposition. The difficulty can be seen by considering how a polarizing crystal
prevents exactly half of the waves from passing straight through while allowing the other
half to pass. The incident beam consists of waves whose polarization axes are distributed
uniformly in all directions, so one might expect to find that only a very small fraction of
the waves would pass through a perfect polarizing substance. In fact, the fraction of
waves from a uniform distribution with polarizations exactly aligned with the polarizing
axis of a substance should be vanishingly small. Likewise it isn’t easy to explain, from
the standpoint of classical electrodynamics, why half of the incident wave energy is
diverted in one discrete direction, rather than being distributed over a range of refraction
angles. The process seems to be discretely binary, i.e., each bit of incident energy must
go in one of just two directions, even though the polarization angles of the incident
energy are uniformly distributed over all directions. The precise mechanism for how this
comes about requires a detailed understanding of the interactions between matter and
electromagnetic radiation – something which classical electrodynamics was never able to
provide.
If we discard the extraordinary ray emerging from a calcite polarizing prism, the crystal
functions as a filter, producing a beam of a linearly polarized light. The thickness of a
polarizing filter isn't crucial (assuming the polarization axis is perfectly uniform
throughout the substance), because the first surface effectively "selects" the suitably
aligned waves, which then pass freely through the rest of the substance. The light
emerging from the other side is plane-polarized with half the intensity of the incident
light. Now, as noted above, if we pass this polarized beam through another polarizing
filter oriented parallel to the first, then all the energy of the incident polarized beam will
be passed through the second filter. On the other hand, if the second filter is oriented
perpendicular to the first, none of the polarized beam's energy will get through the second
filter. For intermediate angles, Etienne Malus (a captain in the army of Napoleon
Bonaparte) discovered in 1809 that the intensity of the beam emerging from the second
polarizing filter is I cos²(θ), where I is the intensity of the beam emerging from the first
filter and θ is the angle between the two filters.
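This cos²(θ) relation is easy to explore numerically. The sketch below (Python; the unit incident intensity and the particular angles are arbitrary choices) evaluates the transmitted intensity for a few orientations of the second filter, and also illustrates the perhaps surprising consequence that inserting a third filter at 45 degrees between two crossed filters allows some light through again:

    import numpy as np

    def malus(intensity, theta):
        # Intensity passed by an ideal polarizer whose axis makes angle
        # theta (radians) with the polarization of the incident beam.
        return intensity * np.cos(theta)**2

    I = 1.0                          # intensity emerging from the first filter
    print(malus(I, 0.0))             # parallel filters: all of it passes
    print(malus(I, np.pi/2))         # crossed filters: essentially nothing passes
    print(malus(I, np.pi/4))         # 45 degrees: half passes

    # A third filter at 45 degrees placed *between* two crossed filters:
    print(malus(malus(I, np.pi/4), np.pi/4))   # 0.25 rather than 0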
Incidentally, it’s possible to convert circularly polarized incident light into plane-polarized light of the same intensity. The traditional method is to use a "quarter-wave
plate" thickness of a crystal substance such as mica. In this case we're not masking the
non-aligned components, but rather introducing a relative phase shift between them so as
to force them into alignment. Of course, a particular thickness of plate only "works" this
way for a particular frequency.
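The action of such a plate can be sketched with the Jones-vector formalism (a standard bookkeeping device, used here purely for illustration; the choice of fast axis along x and the sign convention for the circular input are assumptions):

    import numpy as np

    # Quarter-wave plate with its fast axis along x: it retards the y component
    # by a quarter cycle relative to the x component.
    qwp = np.array([[1, 0],
                    [0, 1j]])

    # Circularly polarized light, written as a normalized Jones vector.
    circular = np.array([1, -1j]) / np.sqrt(2)

    out = qwp @ circular
    # The two output components are now in phase, i.e., the light is linearly
    # polarized (at 45 degrees to the axes) with the same total intensity.
    print(np.allclose(out, np.array([1, 1]) / np.sqrt(2)))   # True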
In 1922 the German physicists Otto Stern and Walther Gerlach made a discovery
remarkably similar to that of Erasmus Bartholin, but instead of light rays their discovery
involved the trajectories of elementary particles of matter. They passed a beam of
particles (atoms of silver) through an oriented magnetic field, and found that the beam
split into two beams, with about half the particles in each beam, one deflected up (relative
to the direction of the magnetic field) and the other down. This is depicted in the figure
below.
Ultimately this behavior was recognized as being a consequence of the intrinsic spin of
elementary particles. The idea of intrinsic spin was introduced by Uhlenbeck and
Goudsmit in 1925, and was soon incorporated (albeit in a somewhat ad hoc way) into the
formalism of quantum mechanics by postulating that the wave function of a particle can
be decomposed into two components, which we might label UP and DOWN, relative to
any given orientation of the magnetic field. These components are weighted and the sum
of the squares of the weights equals 1. (Of course, the overall state-vector for the particle
can be expressed as the Cartesian product of a non-spin vector times the spin vector.) The
observable "spin" then corresponds to three operators that are proportional to the Pauli
spin matrices:

    Sx = (ħ/2) | 0  1 |      Sy = (ħ/2) | 0  −i |      Sz = (ħ/2) | 1   0 |
               | 1  0 |                 | i   0 |                 | 0  −1 |

These operators satisfy the commutation relations

    [Sx, Sy] = iħ Sz      [Sy, Sz] = iħ Sx      [Sz, Sx] = iħ Sy

as we would expect by the correspondence principle from ordinary (classical) angular
momentum. Not surprisingly, this non-commutation is closely related to the non-commutation of ordinary spatial rotations of a classical particle, in the sense that they're
both related to the cross-product of orthogonal vectors. Given an orthogonal coordinate
system [x,y,z] the angular momentum of a classical particle with momentum [px, py, pz] is
(in component form)

    Lx = y pz − z py      Ly = z px − x pz      Lz = x py − y px

Guided by the correspondence principle, we replace the classical components px, py, pz
with their quantum mechanical equivalents, the differential operators

    px → −iħ ∂/∂x      py → −iħ ∂/∂y      pz → −iħ ∂/∂z

leading to the S operators noted above. Although this works, it is not entirely satisfactory,
first because it is ad hoc, and second because it is not relativistic. Both of these
shortcomings were eliminated by Dirac in 1928 when he developed the first relativistic
equation for an elementary massive particle. The Dirac equation is one of the greatest
examples of the heuristic power of the principle of relativity, leading not only to an
understanding of the necessity of intrinsic spin, but also to the prediction of anti-matter,
and ultimately to quantum field theory. Recall from Section 2.3 that the invariant
spacetime interval along the path of a timelike particle of mass m in special relativity is

    (dτ)² = (dt)² − (dx)² − (dy)² − (dz)²

and if we multiply through by m²/(dτ)² and make the identifications E = m(dt/dτ), px =
m(dx/dτ), etc., this gives

    E² − px² − py² − pz² = m²            (1)
Also, if we postulate that the particle is described by a wave function ψ(t,x,y,z) we can
differentiate with respect to τ to give

    dψ/dτ = (∂ψ/∂t)(dt/dτ) + (∂ψ/∂x)(dx/dτ) + (∂ψ/∂y)(dy/dτ) + (∂ψ/∂z)(dz/dτ)

multiplying through by m and making the identifications for E, px, py, pz, we get

    m(dψ/dτ) = E(∂ψ/∂t) + px(∂ψ/∂x) + py(∂ψ/∂y) + pz(∂ψ/∂z)            (2)

This relation would be equivalent to equation (1) if we put

    ∂ψ/∂t = −i(E/κ)ψ     ∂ψ/∂x = i(px/κ)ψ     ∂ψ/∂y = i(py/κ)ψ     ∂ψ/∂z = i(pz/κ)ψ     dψ/dτ = −i(m/κ)ψ

where κ is a constant. This suggests the operator correspondences

    E ↔ iκ ∂/∂t      px ↔ −iκ ∂/∂x      py ↔ −iκ ∂/∂y      pz ↔ −iκ ∂/∂z

on the basis of which equation (2) can be re-written as

    −κ² (∂²ψ/∂t² − ∂²ψ/∂x² − ∂²ψ/∂y² − ∂²ψ/∂z²) = m² ψ            (3)

which, if we identify κ with the reduced Planck constant h/(2π), is the Klein-Gordon
wave equation. Unfortunately the solutions of this equation do not give positive-definite
probabilities, so it was ruled out as a possible quantum mechanical wave equation for a
massive particle. Schrödinger’s provisional alternative was to base his wave mechanics
on the non-relativistic version of equation (1), which is simply E = p²/(2m). This led to
the familiar Schrödinger equation, whose solutions do give positive-definite probabilities,
and which was highly successful in non-relativistic contexts. Still, Dirac was dissatisfied,
and sought a relativistic wave equation with positive-definite probabilities. Dirac’s
solution was to factor the quadratic equation (1) into linear factors, and take one of those
factors as the basis of the quantum mechanical wave equation. Of course, equation (1)
doesn’t factor if we restrict ourselves to the set of real numbers, but it can be factored in
different classes of mathematical entities, just as x² + 1 can’t be factored if we are
restricted to real numbers, but it factors as (x+i)(x−i) if we allow imaginary as well as
real units.
In order to factor equation (1), Dirac postulated a set of basis variables γ0, γ1, γ2, and γ3
(not necessarily commuting) such that

    E² − px² − py² − pz² − m² = (γ0 E + γ1 px + γ2 py + γ3 pz + m)(γ0 E + γ1 px + γ2 py + γ3 pz − m)

Expanding the product and collecting terms, we find that this is a valid equality if and
only if the four variables γj satisfy the relations

    γ0² = 1      γ1² = γ2² = γ3² = −1      γi γj + γj γi = 0
for all i,j = 0,1,2,3 with i ≠ j. These four quantities, along with unity, form the basis of
what is called a Clifford algebra. The natural representation of these “quantities” is as
4x4 matrices. Equation (1) is solved provided either of the factors equals zero. Setting the
first factor equal to zero and making the operator substitutions for energy and
momentum, we arrive at Dirac’s equation

    iκ (γ0 ∂ψ/∂t − γ1 ∂ψ/∂x − γ2 ∂ψ/∂y − γ3 ∂ψ/∂z) + m ψ = 0            (4)

Since the operator is four-dimensional, the wave function must be a vector with four
components, i.e., we have

    ψ = (ψ1, ψ2, ψ3, ψ4)
The four components encode the different possible intrinsic spin states of the particle,
subsuming the earlier ad hoc two-dimensional state vector. The appearance of four
components instead of just two is due to the fact that these state vectors also encompass
negative energy states as well as positive energy states. This was inevitable, considering
that the relativistic equation (1) is satisfied equally well by –E as well as +E.
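A concrete check that quantities with the required properties actually exist can be made in one standard 4x4 representation (the so-called Dirac representation; the particular matrices below are an assumed choice, since any representation satisfying the Clifford algebra relations would serve equally well):

    import numpy as np

    I2   = np.eye(2)
    zero = np.zeros((2, 2), dtype=complex)
    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sy = np.array([[0, -1j], [1j, 0]])
    sz = np.array([[1, 0], [0, -1]], dtype=complex)

    g0 = np.block([[I2, zero], [zero, -I2]])
    g1 = np.block([[zero, sx], [-sx, zero]])
    g2 = np.block([[zero, sy], [-sy, zero]])
    g3 = np.block([[zero, sz], [-sz, zero]])
    gammas = [g0, g1, g2, g3]

    # Anti-commutation for distinct indices ...
    for i in range(4):
        for j in range(4):
            if i != j:
                assert np.allclose(gammas[i] @ gammas[j] + gammas[j] @ gammas[i], 0)

    # ... and the required squares.
    assert np.allclose(g0 @ g0, np.eye(4))
    for g in (g1, g2, g3):
        assert np.allclose(g @ g, -np.eye(4))
    print("Clifford algebra relations verified")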
It may be surprising at first that equation (4), which is linear, is nevertheless covariant
under Lorentz transformations. The covariance certainly isn’t obvious, and it is achieved
only by stipulating that the components of ψ transform not as an ordinary four-vector, but
as two spinors. Thus the requirement for Lorentz covariance leads directly to the
existence of intrinsic spin for any massive particle described by Dirac’s equation,
including electrons. (Such particles are said to possess “spin 1/2”, since it can be shown
that the angular momentum represented by their intrinsic spin is ħ/2.) In addition,
when the interaction with an electromagnetic field is included in the Dirac equation, the
requirement for Lorentz covariance leads to the existence of anti-particles. The positron,
which is the anti-particle of the electron, was predicted by Dirac in 1931, and found
experimentally by Carl Anderson the following year. Fundamentally, the Dirac equation introduced, for the
first time, the idea that any relativistic treatment of one particle must inevitably involve
consideration of other particles, and from this emerged the concept of second
quantization and quantum field theory. In effect, quantum field theory requires us to
consider not just the field of a single identified particle, but the field of all such fields.
(It’s interesting to compare this with the view of the metric of spacetime as the “field of
all fields” discussed in Section 4.6.)
One outcome of quantum field theory was a quantization of the electromagnetic field, the
necessity of which had been pointed out by Einstein as early as 1905. On an elementary
level, Maxwell’s equations are inadequate to describe the phenomena of radiation. The
quantum of electromagnetic radiation is called the photon, which behaves in some ways
like an elementary particle, although it is massless, and therefore always propagates at the
speed of light. Hence the "spin axis" of a photon is always parallel to its direction of
motion, pointing either forward or backward, as illustrated below.
These two states correspond to left-handed and right-handed photons. Whenever a photon
is absorbed by an object, an angular momentum of either +ħ or −ħ is imparted to the
object. Each photon is characterized not only by its energy (frequency) and its phase, but
also by its propensity to exhibit each of the two possible states of spin when it interacts
with an object. A beam of light, consisting of a large number of photons, is characterized
by the energies, phase relations, and spin propensities of its constituent photons. This
could be said to vindicate Newton’s belief that rays of light possess a previously unknown
“original property” that affects how they are refracted. Recall that, in classical
electromagnetic theory, the plane of oscillation of the electric field of circularly polarized
light rotates about the axis of propagation (in one direction or the other). When such light
impinges on a surface, it imparts angular momentum due to the rotation of the electric
field. In quantum theory this corresponds to a stream of photons with a high propensity
for being right-handed (or for being left-handed), so that each photon contributes to the
overall angular momentum imparted to the absorbing object.
On the other hand, linearly polarized light (in classical electrodynamics) does not impart
any angular momentum to the absorbing object. This is represented in quantum theory by
a stream of photons, each with equal propensity to exhibit right-handed or left-handed
spin. Each individual interaction, i.e., each absorption of a photon, imparts either +ħ or
−ħ to the absorbing object, so if the intensity of a linearly polarized beam of light is
lowered to the point that only one photon is transmitted at a time, it will appear to be
circularly polarized (either left or right) for each photon, which of course is not predicted
by classical theory. (In a sense, Maxwell’s equations can be regarded as a crude form of
the Schrödinger equation for light, but it obviously does not represent all the quantum
mechanical effects.) However, for a large stream of such photons, the net angular
momentum is essentially zero, because half of the photons interact in the right-handed
sense and half in the left-handed sense. This corresponds (loosely) to the fact in classical
theory that linearly polarized light can be regarded as a superposition of left and right
circularly polarized light.
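A crude Monte Carlo sketch of this picture (Python; the idealized detector, the choice of units with ħ = 1, and the sample size are assumptions of the toy model, not of quantum theory itself) shows how individual transfers of a full unit of angular momentum can still average to essentially zero for a linearly polarized beam:

    import numpy as np

    rng = np.random.default_rng(0)
    hbar = 1.0                     # work in units where hbar = 1
    n_photons = 1_000_000

    # Linearly polarized beam: equal propensity for either spin on each absorption.
    transfers = rng.choice([+hbar, -hbar], size=n_photons)

    print(np.unique(np.abs(transfers)))   # every absorption transfers exactly hbar
    print(transfers.mean())               # net transfer per photon is ~0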
Incidentally, most people have personal "hands on" knowledge of polarized
electromagnetic waves without even realizing it. The waves broadcast by a radio or
television tower are naturally polarized, and if you've ever adjusted the orientation of
"rabbit ears" and found that your reception is better at some orientations than at others,
for a particular station, you've demonstrated the effects of electromagnetic wave
polarization.
The behavior of intrinsic spin of elementary particles can be used to illustrate some
important features of quantum mechanics – features which Einstein famously referred to
as “spooky action at a distance”. This behavior is discussed in the next section.
9.5 Entangled Events
Anyone who is not shocked by quantum theory has not understood it.
Niels Bohr, 1927
A paper written by Einstein, Podolsky, and Rosen (EPR) in 1935 described a thought
experiment which, the authors believed, demonstrated that quantum mechanics does not
provide a complete description of physical reality, at least not if we accept certain
common notions of locality and realism. Subsequently the EPR experiment was refined
by David Bohm (so it is now called the EPRB experiment) and analyzed in detail by John
Bell, who highlighted a fascinating subtlety that Einstein, et al, may have missed. Bell
showed that the outcomes of the EPRB experiment predicted by quantum mechanics are
inherently incompatible with conventional notions of locality and realism combined with
a certain set of assumptions about causality. The precise nature of these causality
assumptions is rather subtle, and Bell found it necessary to revise and clarify his premises
from one paper to the next. In Section 9.6 we discuss Bell's assumptions in detail, but for
the moment we'll focus on the EPRB experiment itself, and the outcomes predicted by
quantum mechanics.
Most actual EPRB experiments are conducted with photons, but in principle the
experiment could be performed with massive particles. The essential features of the
experiment are independent of the kind of particle we use. For simplicity we'll describe a
hypothetical experiment using electrons (although in practice it may not be feasible to
actually perform the necessary measurements on individual electrons). Consider the
decay of a spin-0 particle resulting in two spin-1/2 particles, an electron and a positron,
ejected in opposite directions. If spin measurements are then performed on the two
individual particles, the correlation between the two results is found to depend on the
difference between the two measurement angles. This situation is illustrated below, with
α and β signifying the respective measurement angles at detectors 1 and 2.
Needless to say, the mere existence of a correlation between the measurements on these
two particles is not at all surprising. In fact, this would be expected in most classical
models, as would a variation in the correlation as a function of the absolute difference θ =
|α − β| between the two measurement angles. The essential strangeness of the quantum
mechanical prediction is not the mere existence of a correlation that varies with θ, it is the
non-linearity of the predicted variation.
If the correlation varied linearly as θ ranged from 0 to π, it would be easy to explain in
classical terms. We could simply imagine that the decay of the original spin-0 particle
produced a pair of particles with spin vectors pointing oppositely along some randomly
chosen axis. Then we could imagine that a measurement taken at any particular angle
gives the result UP if the angle is within π/2 of the positive spin axis, and gives the result
DOWN otherwise. This situation is illustrated below:
Since the spin axis is random, each measurement will have an equal probability of being
UP or DOWN. In addition, if the measurements on the two particles are taken in exactly
the same direction, they will always give opposite results (UP/DOWN or DOWN/UP),
and if they are taken in the exact opposite directions they will always give equal results
(UP/UP or DOWN/DOWN). Also, if they are taken at right angles to each other the
results will be completely uncorrelated, meaning they are equally likely to agree or
disagree. In general, if θ denotes the absolute value of the angle between the two spin
measurements, the above model implies that the correlation between these two
measurements would be C(θ) = (2θ/π) − 1, as plotted below.
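This naive model is easy to simulate. The sketch below (Python; the sample size, the random seed, and the coding of UP as +1 and DOWN as −1 are arbitrary choices) draws a random spin axis for each pair, applies the rule just described, and reproduces the linear correlation (2θ/π) − 1 to within statistical noise:

    import numpy as np

    rng = np.random.default_rng(1)

    def classical_correlation(theta, n=200_000):
        lam = rng.uniform(0.0, 2*np.pi, size=n)   # random spin axis for each pair
        alpha, beta = 0.0, theta                  # the two measurement angles
        r1 =  np.sign(np.cos(alpha - lam))        # +1 for UP, -1 for DOWN
        r2 = -np.sign(np.cos(beta - lam))         # partner's spin axis points oppositely
        return np.mean(r1 * r2)

    for theta in (0.0, np.pi/4, np.pi/2, 3*np.pi/4, np.pi):
        print(round(theta, 3),
              round(classical_correlation(theta), 3),   # simulated
              round(2*theta/np.pi - 1, 3))              # predicted (2*theta/pi) - 1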
This linear correlation function is consistent with quantum mechanics (and confirmed by
experiment) if the two measurement angles differ by θ = 0, π/2, or π, giving the
correlations −1, 0, and +1 respectively.
However, for intermediate angles, quantum theory predicts (and experiments confirm)
that the actual correlation function for spin-1/2 particles is not the linear function shown
above, but the non-linear function given by C(θ) = −cos(θ), as shown below
On this basis, the probabilities of the four possible joint outcomes of spin measurements
performed at angles differing by θ are as shown in the table below. (The same table
would apply to spin-1 particles such as photons if we replace θ with 2θ.)

    joint outcome        probability
    UP, UP               (1 − cos θ)/4
    UP, DOWN             (1 + cos θ)/4
    DOWN, UP             (1 + cos θ)/4
    DOWN, DOWN           (1 − cos θ)/4
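These expressions are easily evaluated numerically. The sketch below (Python) assumes the standard singlet-state probabilities consistent with the correlation −cos(θ) quoted above, and confirms that summing them with the appropriate signs recovers that correlation:

    import numpy as np

    def joint_probabilities(theta):
        p_same = (1 - np.cos(theta)) / 4   # UP/UP and DOWN/DOWN
        p_diff = (1 + np.cos(theta)) / 4   # UP/DOWN and DOWN/UP
        return {"UP,UP": p_same, "DOWN,DOWN": p_same,
                "UP,DOWN": p_diff, "DOWN,UP": p_diff}

    for theta in (0.0, np.pi/2, 2*np.pi/3, np.pi):
        p = joint_probabilities(theta)
        corr = p["UP,UP"] + p["DOWN,DOWN"] - p["UP,DOWN"] - p["DOWN,UP"]
        print(round(theta, 3), {k: round(v, 3) for k, v in p.items()},
              "C =", round(corr, 3))       # C reproduces -cos(theta)

For angles differing by 120 degrees this gives an agreement probability of 3/4, the value appearing in the matrix discussed next.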
To understand why the shape of this correlation function defies explanation within the
classical framework of local realism, suppose we confine ourselves to spin measurements
along one of just three axes, at 0, 120, and 240 degrees. For convenience we will denote
these axes by the symbols A, B, and C respectively. Several pairs of particles are
produced and sent off to two distant locations in opposite directions. In both locations a
spin measurement along one of the three allowable axes is performed, and the results are
recorded. Our choices of measurements (A, B, or C) may be arbitrary, e.g., by flipping
coins, or by any other means. In each location it is found that, regardless of which
measurement is made, there is an equal probability of spin UP or spin DOWN, which we
will denote by "1" and "0" respectively. This is all that the experimenters at either site can
determine separately.
However, when all the results are brought together and compared in matched pairs, we
find the following joint correlations

              A       B       C
        A     0      3/4     3/4
        B    3/4      0      3/4
        C    3/4     3/4      0
The numbers in this matrix indicate the fraction of times that the results agreed (both 0 or
both 1) when the indicated measurements were made on the two members of a matched
pair of objects. Notice that if the two distant experimenters happened to have chosen to
make the same measurement for a given pair of particles, the results never agreed, i.e.,
they were always the opposite (1 and 0, or 0 and 1). Also notice that, if both
measurements are selected at random, the overall probability of agreement is 1/2.
The remarkable fact is that there is no way (within the traditional view of physical
processes) to prepare the pairs of particles in advance of the measurements such that they
will give the joint probabilities listed above. To see why, notice that each particle must be
ready to respond to any one of the three measurements, and if it happens to be the same
measurement as is selected on its matched partner, then it must give the opposite answer.
Hence if the particle at one location will answer "0" for measurement A, then the particle
at the other location must be prepared to give the answer "1" for measurement A. There
are similar constraints on the preparations for measurements B and C, so there are really
only eight ways of preparing a pair of particles, corresponding to the eight possible sets of
answers for the first particle, with the second particle programmed to give the opposite
answer in each case:

    particle 1 (A,B,C):   111   110   101   100   011   010   001   000
    particle 2 (A,B,C):   000   001   010   011   100   101   110   111
These preparations - and only these - will yield the required anti-correlation when the
same measurement is applied to both objects. Therefore, assuming the particles are pre-programmed (at the moment when they separate from each other) to give the appropriate
result for any one of the nine possible joint measurements that might be performed on
them, it follows that each pair of particles must be pre-programmed in one of the eight
ways shown above. It only remains now to determine the probabilities of these eight
preparations.
The simplest state of affairs would be for each of the eight possible preparations to be
equally probable, but this yields the measurement correlations shown below

              A       B       C
        A     0      1/2     1/2
        B    1/2      0      1/2
        C    1/2     1/2      0
Not only do the individual joint probabilities differ from the quantum mechanical
predictions, this distribution gives an overall probability of agreement of 1/3, rather than
1/2 (as quantum mechanics says it must be), so clearly the eight possible preparations
cannot be equally likely. Now, we might think some other weighting of these eight
preparation states will give the right overall results, but in fact no such weighting is
possible. The overall preparation process must yield some linear convex combination of
the eight mutually exclusive cases, i.e., each of the eight possible preparations must have
some fixed long-term probability, which we will denote by a, b,.., h, respectively. These
probabilities are all positive values in the range 0 to 1, and the sum of these eight values
is identically 1. It follows that the sum of the six probabilities b through g must be less
than or equal to 1. This is a simple form of "Bell's inequality", which must be satisfied by
any local realistic model of the sort that Bell had in mind. However, the joint
probabilities in the correlation table predicted by quantum mechanics imply

    c + d + e + f = 3/4      b + d + e + g = 3/4      b + c + f + g = 3/4

(one relation for each pair of distinct measurements, the sum in each case running over the
four preparations that yield agreement for that pair).
Adding these three expressions together gives 2(b + c + d + e + f + g) = 9/4, so the sum
of the probabilities b through g is 9/8, which exceeds 1. Hence the results of the EPRB
experiment predicted by quantum mechanics (and empirically confirmed) violate Bell's
inequality. This shows that there does not exist a linear combination of those eight
preparations that can yield the joint probabilities predicted by quantum mechanics, so
there is no way of accounting for the actual experimental results by means of any realistic
local physical model of the sort that Bell had in mind. The observed violations of Bell's
inequality in EPRB experiments imply that Bell's conception of local realism is
inadequate to represent the actual processes of nature.
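The impossibility can also be confirmed by brute force. The sketch below (Python) enumerates the eight admissible preparations and computes, for each one, the fraction of the six different-measurement cells on which the two answers agree; since no preparation exceeds 2/3, no probabilistic mixture of them can reach the 3/4 required by quantum mechanics in every such cell:

    from itertools import product

    # A preparation pre-assigns particle 1's answers (0 or 1) to A, B, C;
    # particle 2 always answers the opposite, so identical measurements
    # always disagree, as required.
    best = 0.0
    for prep in product([0, 1], repeat=3):
        p1 = prep
        p2 = tuple(1 - v for v in prep)
        agreements = [p1[i] == p2[j] for i in range(3) for j in range(3) if i != j]
        best = max(best, sum(agreements) / len(agreements))

    print(best)   # 2/3 -- short of the 3/4 predicted by quantum mechanics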
The causality assumptions underlying Bell's analysis are inherently problematic (see
Section 9.7), but the analysis is still important, because it highlights the fundamental
inconsistency between the predictions of quantum mechanics and certain conventional
ideas about causality and local realism. In order to maintain those conventional ideas, we
would be forced to conclude that information about the choice of measurement basis at
one detector is somehow conveyed to the other detector, influencing the outcome at that
detector, even though the measurement events are space-like separated. For this reason,
some people have been tempted to think that violations of Bell's inequality imply
superluminal communication, contradicting the principles of special relativity. However,
there is actually no effective transfer of information from one measurement to the other in
an EPRB experiment, so the principles of special relativity are safe. One of the most
intriguing aspects of Bell's analysis is that it shows how the workings of quantum
mechanics (and, evidently, nature) involve correlations between space-like separated
events that seemingly could only be explained by the presence of information from
distant locations, even though the separate events themselves give no way of inferring
that information. In the abstract, this is similar to "zero-knowledge proofs" in
mathematics.
To illustrate, consider a "twins paradox" involving a pair of twin brothers who are
separated and sent off to distant locations in opposite directions. When twin #1 reaches
his destination he asks a stranger there to choose a number x1 from 1 to 10, and the twin
writes this number down on a slip of paper along with another number y1 of his own
choosing. Likewise twin #2 asks someone at his destination to choose a number x2, and
he writes this number down along with a number y2 of his own choosing. When the twins
are re-united, we compare their slips of paper and find that |y2 − y1| = (x2 − x1)². This is
really astonishing. Of course, if the correlation was some linear relationship of the form
y2 − y1 = A(x2 − x1) + B for any pre-established constants A and B, the result would be
quite easy to explain. We would simply surmise that the twins had agreed in advance that
twin #1 would write down y1 = Ax1 − B/2, and twin #2 would write down y2 = Ax2 + B/2.
However, no such explanation is possible for the observed non-linear relationship,
because there do not exist functions f1 and f2 such that f2(x2) − f1(x1) = (x2 − x1)². Thus if
we assume the numbers x1 and x2 are independently and freely selected, and there is no
communication between the twins after they are separated, then there is no "locally
realistic" way of accounting for this non-linear correlation. It seems as though one or both
of the twins must have had knowledge of his brother's numbers when writing down his
own number, despite the fact that it is not possible to infer anything about the individual
values of x2 and y2 from the values of x1 and y1 or vice versa.
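The non-existence of such functions is elementary to verify: if f1 and f2 existed, then the fixed number f1(2) − f1(1) would have to equal (x2 − 1)² − (x2 − 2)² for every choice of x2, which is impossible because that quantity changes with x2. The one-line check below (a Python sketch) makes the point explicit:

    # If f2(x2) - f1(x1) = (x2 - x1)**2 held for all x1, x2 in 1..10, then
    # f1(2) - f1(1) would equal (x2-1)**2 - (x2-2)**2 for every x2.
    required = [(x2 - 1)**2 - (x2 - 2)**2 for x2 in range(1, 11)]
    print(required)                 # -1, 1, 3, 5, ... : varies with x2
    print(len(set(required)) == 1)  # False, so no such f1 and f2 exist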
In the same way, the results of EPRB experiments imply a greater degree of interdependence between separate events than can be accounted for by traditional models of
causality. One possible idea for adjusting our conceptual models to accommodate this
aspect of quantum phenomena would be to deny the existence of any correlations until
they become observable. According to the most radical form of this proposal, the
universe is naturally partitioned into causally compact cells, and only when these cells
interact do their respective measurement bases become reconciled, in such a way as to
yield the quantum mechanical correlations. This is an appealing idea in many ways, but
it's far from clear how it could be turned into a realistic model. Another possibility is that
the preparation of the two particles at the emitter and the choices of measurement bases at
the detectors may be mutually influenced by some common antecedent event(s). This can
never be ruled out, as discussed in Section 9.6. Lastly, we mention the possibility that the
preparation of the two particles may be conditioned by the measurements to which they
are subjected. This is discussed in Section 9.10.
9.6 Von Neumann's Postulate and Bell’s Freedom
If I have freedom in my love,
And in my soul am free,
Angels alone, that soar above,
Enjoy such liberty.
Richard Lovelace, 1649
In quantum mechanics the condition of a physical system is represented by a state vector,
which encodes the probabilities of each possible result of whatever measurements we
may perform on the system. Since the probabilities are usually neither 0 nor 1, it follows
that for a given system with a specific state vector, the results of measurements generally
are not uniquely determined. Instead, there is a set (or range) of possible results, each
with a specific probability. Furthermore, according to the conventional interpretation of
quantum mechanics (the so-called Copenhagen Interpretation of Niels Bohr, et al), the
state vector is the most complete possible description of the system, which implies that
nature is fundamentally probabilistic (i.e., non-deterministic). However, some physicists
have questioned whether this interpretation is correct, and whether there might be some
more complete description of a system, such that a fully specified system would respond
deterministically to any measurement we might perform. Such proposals are called
'hidden variable' theories.
In his assessment of hidden variable theories in 1932, John von Neumann pointed out a
set of five assumptions which, if we accept them, imply that no hidden variable theory
can possibly give deterministic results for all measurements. The first four of these
assumptions are fairly unobjectionable, but the fifth seems much more arbitrary, and has
been the subject of much discussion. (The parallel with Euclid's postulates, including the
controversial fifth postulate discussed in Chapter 3.1, is striking.) To understand von
Neumann's fifth postulate, notice that although the conventional interpretation does not
uniquely determine the outcome of a particular measurement for a given state, it does
predict a unique 'expected value' for that measurement. Let's say a measurement of X on
a system with a state vector ψ has an expected value denoted by <X;ψ>, computed by
simply adding up all the possible results multiplied by their respective probabilities. Not
surprisingly, the expected values of observables are additive, in the sense that

    <X+Y; ψ> = <X; ψ> + <Y; ψ>            (1)
In practice we can't generally perform a measurement of X+Y without disturbing the
measurements of X and Y, so we can't measure all three observables on the same system.
However, if we prepare a set of systems, all with the same initial state vector ψ, and
perform measurements of X+Y on some of them, and measurements of X or Y on the
others, then the averages of the measured values of X, Y, and X+Y (over sufficiently
many systems) will be related in accord with (1).
Remember that according to the conventional interpretation the state vector ψ is the most
complete possible description of the system. On the other hand, in a hidden variable
theory the premise is that there are additional variables, and if we specify both the state
vector ψ AND the "hidden vector" H, the result of measuring X on the system is uniquely
determined. In other words, if we let <X;ψ,H> denote the expected value of a
measurement of X on a system in the state (ψ,H), then the claim of the hidden variable
theorist is that the variance of individual measured values around this expected value is
zero.
Now we come to von Neumann's controversial fifth postulate. He assumed that, for any
hidden variable theory, just as in the conventional interpretation, the averages of X+Y, X
and Y evaluated over a set of identical systems are additive. (Compare this with Galileo's
assumption of simple additivity for the composition of incommensurate speeds.)
Symbolically, this is expressed as

    <X+Y; ψ, H> = <X; ψ, H> + <Y; ψ, H>            (2)

for any two observables X and Y. On this basis he proved that the variance ("dispersion")
of at least one observable's measurements must be greater than zero. (Technically, he
showed that there must be an observable X such that <X2> is not equal to <X>2.) Thus,
no hidden variable theory can uniquely determine the results of all possible
measurements, and we are compelled to accept that nature is fundamentally nondeterministic.
However, this is all based on (2), the assumption of additivity for the expectations of
identically prepared systems, so it's important to understand exactly what this assumption
means. Clearly the words "identically prepared" mean something different under the
conventional interpretation than they do in the context of a hidden variable theory.
Conventionally, two systems are said to be identically prepared if they have the same
state vector (ψ), but in a hidden variable theory two states with the same state vector are
not necessarily "identical", because they may have different hidden vectors (H).
Of course, a successful hidden variable theory must satisfy (1) (which has been
experimentally verified), but must it necessarily satisfy (2)? Relation (1) requires only that the
averages of <X;ψ,H>, etc., evaluated over all applicable hidden vectors H, be additive, but
does it necessarily follow that (2) is satisfied for every (or even for ANY) specific value
of H? To give a simple illustration, consider the following trivial set of data:

    System    ψ     H     X     Y     X+Y
       1      ψ3    1     2     5      5
       2      ψ3    1     2     5      5
       3      ψ3    2     4     3      9
       4      ψ3    2     4     3      9

The averages over these four "conventionally indistinguishable" systems are <X;ψ3> = 3,
<Y;ψ3> = 4, and <X+Y;ψ3> = 7, so relation (1) holds. However, if we examine the
"identically prepared" systems taking into account the hidden components of the state, we
really have two different states (those with H=1 and those with H=2), and we find that the
results are not additive (but they are deterministic) in these fully-defined states. Thus,
equation (1) clearly doesn't imply equation (2). (If it did, von Neumann could have said
so, rather than taking it as an axiom.)
Of course, if our hidden variable theory is always going to satisfy (1), we must have some
constraints on the values of H that arise among "conventionally indistinguishable"
systems. For example, in the above table if we happened to get a sequence of systems all
in the same condition as System #1 we would always get the results X=2, Y=5, X+Y=5,
which would violate (1). So, if (2) doesn't hold, then at the very least we need our theory
to ensure a distribution of the hidden variables H that will make the average results over a
set of "conventionally indistinguishable" systems satisfy relation (1). (In the simple
illustration above, we would just need to ensure that the hidden variables are equally
distributed between H=1 and H=2.)
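The arithmetic in this little example can be checked mechanically (a sketch in Python; the numbers are simply the illustrative values from the table above):

    systems = [            # (H, X, Y, X+Y) for the four systems
        (1, 2, 5, 5),
        (1, 2, 5, 5),
        (2, 4, 3, 9),
        (2, 4, 3, 9),
    ]

    avg = lambda vals: sum(vals) / len(vals)
    X, Y, XY = (avg([s[i] for s in systems]) for i in (1, 2, 3))
    print(X, Y, XY, XY == X + Y)          # 3.0 4.0 7.0 True: relation (1) holds

    for H in (1, 2):                      # but within each fully specified state...
        _, x, y, xy = next(s for s in systems if s[0] == H)
        print(H, x + y, xy, xy == x + y)  # ...relation (2) fails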
In Bohm's 1952 theory the hidden variables consist of precise initial positions for the
particles in the system – more precise than the uncertainty relations would typically allow
us to determine - and the distribution of those variables within the uncertainty limits is
governed as a function of the conventional state vector, ψ. It's also worth noting that, in
order to make the theory work, it was necessary for ψ to be related to the values of H for
separate particles instantaneously in an explicitly non-local way. Thus, Bohm's theory is
a counter-example to von Neumann's theorem, but not to Bell's (see below).
Incidentally, it may be worth noting that if a hidden variable theory is valid, and the
variance of all measurements around their expectations are zero, then the terms of (2) are
not only the expectations, they are the unique results of measurements for a given ψ and
H. This implies that they are eigenvalues of the respective operators, whereas the
expectations for those operators are generally not equal to any of the eigenvalues. Thus,
as Bell remarked, "[von Neumann's] 'very general and plausible postulate' is absurd".
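Bell's point is easy to see in the simplest case of two non-commuting observables. The sketch below (Python) uses the Pauli matrices σx and σz: each has eigenvalues ±1, yet their sum has eigenvalues ±√2, so no assignment of definite, dispersion-free values could satisfy the additivity postulate for these three observables:

    import numpy as np

    sx = np.array([[0, 1], [1, 0]], dtype=float)
    sz = np.array([[1, 0], [0, -1]], dtype=float)

    print(np.linalg.eigvalsh(sx))        # [-1.  1.]
    print(np.linalg.eigvalsh(sz))        # [-1.  1.]
    print(np.linalg.eigvalsh(sx + sz))   # about [-1.414  1.414], i.e. +/- sqrt(2)
    # No sum of an eigenvalue of sx and an eigenvalue of sz equals +/- sqrt(2).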
Still, Gleason showed that we can carry through von Neumann's proof even on the
weaker assumption that (2) applies to commuting variables. This weakened assumption
has the advantage of not being self-evidently false. However, careful examination of
Gleason's proof reveals that the non-zero variances again arise only because of the
existence of non-commuting observables, but this time in a "contextual" sense that may
not be obvious at first glance. To illustrate, consider three observables X,Y,Z. If X and Y
commute and X and Z commute, it doesn't follow that Y and Z commute. We may be
able to measure X and Y using one setup, and X and Z using another, but measuring the
value of X and Y simultaneously will disturb the value of Z. Gleason's proof leads to
non-zero variances precisely for measurements in such non-commuting contexts. It's not
hard to understand this, because in a sense the entire non-classical content of quantum
mechanics is the fact that some observables do not commute. Thus it's inevitable that any
"proof" of the inherent non-classicality of quantum mechanics must at some point invoke
non-commuting measurements, but it's precisely at that point where linear additivity can
only be empirically verified on an average basis, not a specific basis. This, in turn, leaves
the door open for hidden variables to govern the individual results.
Notice that in a "contextual" theory the result of an experiment is understood to depend
not only on the deterministic state of the "test particles" but also on the state of the
experimental apparatus used to make the measurements, and these two can influence each
other. Thus, Bohm's 1952 theory escaped the no hidden variable theorems essentially by
allowing the measurements to have an instantaneous effect on the hidden variables,
which, of course, made the theory essentially non-local as well as non-relativistic
(although Bohm and others later worked to relativize his theory).
Ironically, the importance of considering the entire experimental setup (rather than just
the arbitrarily identified "test particles") was emphasized by Niels Bohr himself, and it's a
fundamental feature of quantum mechanics (i.e., objects are influenced by measurements
no less than measurements are influenced by objects). As Bell said, even Gleason's
relatively robust line of reasoning overlooks this basic insight. Of course, it can be argued
that contextual theories are somewhat contrived and not entirely compatible with the
spirit of hidden variable explanations, but, if nothing else, they serve to illustrate how
difficult it is to categorically rule out "all possible" hidden variable theories based simply
on the structure of the quantum mechanical state space.
In 1963 John Bell sought to clarify matters, noting that all previous attempts to prove the
impossibility of hidden variable interpretations of quantum mechanics had been “found
wanting”. His idea was to establish rigorous limits on the kinds of statistical correlations
that could possibly exist between spatially separate events under the assumption of
determinism and what might be called “local realism”, which he took to be the premises
of Einstein, et al. At first Bell thought he had succeeded, but it was soon pointed out that
his derivation implicitly assumed one other crucial ingredient, namely, the possibility of
free choice. To see why this is necessary, notice that any two spatially separate events
share a common causal past, consisting of the intersection of their past light cones. This
implies that we can never categorically rule out some kind of "pre-arranged" correlation
between spacelike-separated events - at least not unless we can introduce information that
is guaranteed to be causally independent of prior events. The appearance of such "new
events" whose information content is at least partially independent of their causal past,
constitutes a free choice. If no free choice is ever possible, then (as Bell acknowledged)
the Bell inequalities do not apply.
In summary, Bell showed that quantum mechanics is incompatible with a quite peculiar
pair of assumptions, the first being that the future behavior of some particles (i.e., the
"entangled" pairs) involved in the experiment is mutually conditioned and coordinated in
advance, and the second being that such advance coordination is in principle impossible
for other particles involved in the experiment (e.g., the measuring apparatus). These are
not quite each other's logical negations, but close to it. One is tempted to suggest that the
mention of quantum mechanics is almost superfluous, because Bell's result essentially
amounts to a proof that the assumption of a strictly deterministic universe is incompatible
with the assumption of a strictly non-deterministic universe. He proved, assuming the
predictions of quantum mechanics are valid (which the experimental evidence strongly
supports), that not all events can be strictly consequences of their causal pasts, and in
order to carry out this proof he found it necessary to introduce the assumption that not all
events are strictly consequences of their causal pasts!
Bell identified three possible positions (aside from “just ignore it”) that he thought could
be taken with respect to the Aspect experiments: (1) detector inefficiencies are keeping us
from seeing that the inequalities are not really violated, (2) there are influences going
faster than light, or (3) the measuring angles are not free variables. Regarding the third
possibility, he wrote:
...if our measurements are not independently variable as we supposed...even if
chosen by apparently free-willed physicists... then Einstein local causality can
survive. But apparently separate parts of the world become deeply entangled, and
our apparent free will is entangled with them.
The third possibility clearly shows that Bell understood the necessity of assuming free
acausal events for his derivation, but since this amounts to assuming precisely that which
he was trying to prove, we must acknowledge that the significance of Bell's inequalities is
less clear than many people originally believed. In effect, after clarifying the lack of
significance of von Neumann's "no hidden variables proof" due to its assumption of what
it was meant to prove, Bell proceeded to repeat the mistake, albeit in a more subtle way.
Perhaps Bell's most perspicacious remark was (in reference to Von Neumann's proof) that
the only thing proved by impossibility proofs is the author's lack of imagination.
This all just illustrates that it's extremely difficult to think clearly about causation, and the
reasons for this can be traced back to the Aristotelian distinction between natural and
violent motion. Natural motion consisted of the motions of non-living objects, such as the
motions of celestial objects, the natural flows of water and wind, etc. These are the kinds
of motion that people (like Bell) apparently have in mind when they think of
determinism. Following the ancients, many people tend to instinctively exempt "violent
motions" – i.e., motions resulting from acts of living volition – when considering
determinism. In fact, when Bell contemplated the possibility that determinism might also
apply to himself and other living beings, he coined a different name for it, calling it
“super-determinism”. Regarding the experimental tests of quantum entanglement he said
One of the ways of understanding this business is to say that the world is super-deterministic. That not only is inanimate nature deterministic, but we, the
experimenters who imagine we can choose to do one experiment rather than
another, are also determined. If so, the difficulty which this experimental result
creates disappears.
But what Bell calls (admittedly on the spur of the moment) super-determinism is nothing
other than what philosophers have always called simply determinism. Ironically, if
confronted with the idea of vitalism, i.e., the notion that living beings are exempt from
the normal laws of physics that apply to inanimate objects, or at least that living beings
also entail some other kind of action transcending the normal laws of physics in
physically observable ways – many physicists would probably be skeptical if not
downright dismissive… and yet hardly any would think to question this very dualistic
assumption underlying Bell’s analysis. Regardless of our conscious beliefs, it's
psychologically very difficult for us to avoid bifurcating the world into inanimate objects
that obey strict laws of causality, and animate objects (like ourselves) that do not. This
dichotomy was historically appealing, and may even have been necessary for the
development of classical physics, but it always left the nagging question of how or why
we (and our constituent atoms) manage to evade the iron hand of determinism that
governs everything else. This view affects our conception of science by suggesting to us
that the experimenter is not himself part of nature, and is exempt from whatever
determinism is postulated for the system being studied. Thus we imagine that we can
"test" whether the universe is behaving deterministically by turning some dials and seeing
how the universe responds, overlooking the fact that we and the dials are also part of the
universe.
This immediately introduces "the measurement problem": Where do we draw the
boundaries between separate phenomena? What is an observation? How do we
distinguish "nature" from "violence", and is this distinction even warranted? When
people say they're talking about a deterministic world, they're almost always not. What
they're usually talking about is a deterministic sub-set of the world that can be subjected
to freely chosen inputs from a non-deterministic "exterior". But just as with the
measurement problem in quantum mechanics, when we think we've figured out the
constraints on how a deterministic test apparatus can behave in response to arbitrary
inputs, someone says "but isn't the whole lab a deterministic system?", and then the
whole building, and so on. At what point does "the collapse of determinism" occur, so
that we can introduce free inputs to test the system? Just as the infinite regress of the
measurement problem in quantum mechanics leads to bewilderment, so too does the
infinite regress of determinism.
The other loop-hole that can never be closed is what Bell called "correlation by post-arrangement" or "backwards causality". I'd prefer to say that the system may violate the
assumption of strong temporal asymmetry, but the point is the same. Clearly the causal
pasts of the spacelike separated arms of an EPR experiment overlap, so all the objects
involved share a common causal past. Therefore, without something to "block off" this
region of common past from the emission and absorption events in the EPR experiment,
we're not justified in asserting causal independence, which is required for Bell's
derivation. The usual and, as far as I know, only way of blocking off the causal past is by
injecting some "other" influence, i.e., an influence other than the deterministic effects
propagating from the causal past. This "other" may be true randomness, free will, or
some other concept of "free occurrence". In any case, Bell's derivation requires us to
assert that each measurement is a "free" action, independent of the causal past, which is
inconsistent with even the most limited construal of determinism.
There is a fascinating parallel between the ancient concepts of natural and violent motion
and the modern quantum mechanical concepts of the linear evolution of the wave
function and the collapse of the wave function. These modern concepts are sometimes
termed U, for unitary evolution of the quantum mechanical state vector, and R, for
reduction of the state vector onto a particular basis of measurement or observation. One
could argue that the U process corresponds closely with Aristotle's natural (inanimate)
evolution, while the R process represents Aristotle's violent evolution, triggered by some
living act. As always, we face the question of whether this is an accurate or meaningful
bifurcation of events. Today there are several "non-collapse" interpretations of quantum
mechanics, including the famous "many worlds" interpretation of Everett and DeWitt.
However, to date, none of these interpretations has succeeded in giving a completely
satisfactory account of quantum mechanical processes, so we are not yet able to dispense
with Aristotle's distinction between natural and violent motion.
9.7 Angels and Archetypes
The Stranger is the preparer of the way of the quaternity which he follows.
Women and children follow him gladly and he sometimes teaches them.
He sees his surroundings and particularly me as ignorant and uneducated.
He is no anti-Christ, but in a certain sense an anti-scientist…
Pauli describing a dream to Anna
Jung, 1950
Impressed by the seemingly limitless scope and precision of Newton’s laws, some of his
successors during the Enlightenment imagined a fully deterministic world. Newton
himself had tried to forestall any such conclusion by often referring to an active role for
the divine spirit in the workings of nature, especially in establishing the astonishingly
felicitous “initial conditions”, but also in restoring the world’s vitality and order from
time to time. Nevertheless, he couldn’t resist demonstrating the apparently perfect
precision with which the observed phenomena (at least in celestial mechanics) adhered to
the mathematical principles expressed by the laws of motion and gravity, and this was the
dominant impression given by his work. One of the most prominent and influential
proponents of Newtonian determinism was Laplace, who famously wrote
Present events are connected with preceding ones by the principle that a thing
cannot occur without a cause which produces it. This axiom, known as the
principle of sufficient reason, extends even to actions which are considered
indifferent… We ought then to regard the present state of the universe as the
effect of the anterior state and as the cause of the one that is to follow… If an
intelligence, for one instant, recognizes all the forces which animate Nature, and
the respective positions of the things which compose it, and if that intelligence is
also sufficiently vast to subject these data to analysis, it will comprehend in one
formula the movements of the largest bodies of the universe as well as those of
the minutest atom. Nothing would be uncertain, and the future, as the past, would
be present to its eyes.
Notice that he initially conceives of determinism as a temporally ordered chain of
implication, but then he describes a Gestalt shift, leading to the view of an atemporal
"block universe" that simply exists. He doesn’t say so, but the concepts of time and
causality in such a universe would be (at most) psychological interpretations, lacking any
active physical significance, because in order for time and causality to be genuinely
active, a degree of freedom is necessary; without freedom there can be no absolute
direction of causal implication. For example, we ordinarily say that electromagnetic
effects propagate at the speed of light, because the state of the electromagnetic field at
any given event (time and place) is fully determined by the state of the field within the
past light cone of that event, but since the laws of electromagnetism are time-symmetrical, the state of the field at any event is just as fully determined by the state of the
field within its future light cone, and by the state of the field on the same time slice as the
event. We don’t conclude from this that electromagnetic effects therefore propagate
instantaneously, let alone backwards in time. This example merely shows that when
considering just a deterministic and time-symmetrical field, there is no unambiguous flow
of information, because there can be no source of information in such a field. In order to
even consider the flow of information, we must introduce a localized source of new
information, i.e., an effect not implied by the field itself. Only then can we examine how
this signal propagates through the field.
Of course, although this signal is independent of the “past”, it is certainly not independent
of the “future”. Owing to the time-symmetry of electromagnetism, we can begin with the
effects and project “backwards” in time using the deterministic field equations to arrive at
the supposedly freely produced signal. So even in this case, one can argue that the
introduction of the signal was not a “free” act at all. We can regard it as a fully
deterministic antecedent of the future, just as other events are regarded as fully
deterministic consequences of the past. Acts that we regard as “free” are characterized
by a kind of singularity, in the sense that when we extrapolate backwards from the effects
to the cause, using the deterministic field laws, we reach a singularity at the cause, and
cannot extrapolate through it back to a state that would precede it according to the
deterministic field laws. The information emanating from a “free act”, when extrapolated
backwards from its effects to the source, must annihilate itself at the source. This is
analogous to extrapolating the ripples in a pond backwards in time according to the laws
of surface wave propagation, until reaching the event of a pebble entering the water, prior
to which there is nothing in the quiet surface of the pond that implies (by the laws of the
surface) the impending disturbance. Such backward accounts seem implausible, because
they require such a highly coordinated arrangement of information from separate
locations in the future, so we ordinarily prefer to conceive of the flow of information in
the opposite direction.
Likewise, even in a block universe, it may be that certain directions are preferred based
on the simplicity with which they can be described and conceptually grasped. For
example, it may be possible to completely specify the universe based on the contents of a
particular cross-sectional slice, together with a simple set of fixed rules for recursively
inferring the contents of neighboring slices in a particular sequence, whereas other
sequences may require a vastly more complicated “rule”. However, in a deterministic
universe this chain of implication is merely a descriptive convenience, not an effective
mechanism by which the events “come into being”.
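As a toy illustration of such a slice-plus-rule description, consider the following minimal Python sketch (the particular rule is an arbitrary choice, and it needs two adjacent slices rather than one): a one-dimensional reversible cellular automaton in which each new slice is fixed by a simple local rule applied to the preceding slices, and the very same rule, run in the opposite direction, recovers the earlier slices, so the preferred direction of inference is purely a descriptive convenience.

    import random

    def step(prev, curr):
        # Second-order rule: next[i] = f(current neighborhood) XOR prev[i].
        # Because XOR is its own inverse, applying the same rule to the pair
        # (later, current) recovers the earlier slice, so the evolution is reversible.
        n = len(curr)
        return [(curr[(i - 1) % n] ^ curr[i] ^ curr[(i + 1) % n]) ^ prev[i]
                for i in range(n)]

    n = 32
    history = [[random.randint(0, 1) for _ in range(n)] for _ in range(2)]
    for _ in range(20):
        history.append(step(history[-2], history[-1]))

    # Run the same rule "backwards" from the last two slices.
    back = [history[-1], history[-2]]
    for _ in range(20):
        back.append(step(back[-2], back[-1]))
    assert back[-1] == history[0]   # the initial slice is recovered exactly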
The concept of a static complete universe is consistent not only with the Newtonian
physics discussed by Laplace, but also with the theory of relativity, in which the
worldlines of objects (through spacetime) can be considered already existent in their
entirety. In fact, it can be argued that this is a necessary interpretation for some general
relativistic phenomena such as genuine black holes merging together in an infinite
universe, because, as discussed in Section 7.2, the trousers model implies that the event
horizons for two such black holes are continuously connected to each other in the future,
as part of the global topology of the universe. There is no way for two black holes that
are not connected to each other in the future to ever merge. This may sound tautological,
but the global topological feature of the spacetime manifold that results in the merging of
two black holes cannot be formed “from the past”, it must already be part of the final
state of the universe. So, in this sense, relativity is perhaps an even more deterministic
theory than Newtonian mechanics. The same conclusion could be reached by considering
the lack of absolute simultaneity in special relativity, which makes it impossible to say
which of two spacelike separated events preceded the other.
Admittedly, the determinism of classical physics (including relativity) has sometimes
been challenged, usually by pointing out that the long-term outcome of a physical process
may be exponentially sensitive to the initial conditions. The concept of classical
determinism relies on each physical variable being a real number (in the mathematical
sense) representing an infinite amount of information. One can argue that this premise is
implausible, and it certainly can’t be proven. We must also consider the possibility of
singularities in classical physics, unless they are simply excluded on principle.
Nevertheless, if the premise of infinite information in each real variable is granted, and if
we exclude singularities, classical physics exhibits the distinctive feature of determinism.
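The sensitivity mentioned above is easy to exhibit numerically. In the following minimal sketch (the map and the initial values are arbitrary illustrative choices), two states of the deterministic logistic map differing initially by one part in 10^12 diverge to a separation of order one within about forty iterations, so any finite-precision specification of the initial condition eventually exhausts its predictive power.

    x, y = 0.3, 0.3 + 1e-12                        # two nearly identical initial conditions
    for step in range(1, 51):
        x, y = 4 * x * (1 - x), 4 * y * (1 - y)    # deterministic logistic map x -> 4x(1-x)
        if step % 10 == 0:
            print(step, abs(x - y))                # separation grows roughly like 2**step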
In contrast, quantum mechanics is widely regarded as decidedly non-deterministic.
Indeed, as we saw in Section 9.6, there is a famous theorem of von Neumann that
purports to rule out determinism (in the form of hidden variables) in the realm of
quantum mechanics. However, as Einstein observed
Whether objective facts are subject to causality is a question whose answer
necessarily depends on the theory from which we start. Therefore, it will never be
possible to decide whether the world is causal or not.
The word “causal” is being used here as a synonym for deterministic, since Einstein had
in mind strict causality, with no free choices, as summarized in his famous remark that
“God does not play dice with the universe”. We've seen that von Neumann’s proof was
based on a premise which is effectively equivalent to what he was trying to prove, nicely
illustrating Einstein’s point that the answer depends on the theory from which we start.
An assertion about what is recursively possible can be meaningful only if we place some
constraints on the allowable recursive "algorithm". For example, the nth state vector of a
system may be the (kn+1)th through k(n+1)th decimal digits of π. This would be a perfectly
deterministic system, but the relations between successive states would be extremely
obscure. In fact, assuming the digits of the two transcendental numbers π and e are
normally distributed (as is widely believed, though not proven), any finite string of
decimal digits occurs infinitely often in their decimal expansions, and each string occurs
with the same frequency in both expansions. (It's been noted that, assuming normality,
the digits of π would make an inexhaustible source of high-quality "random" number
sequences, higher quality than anything we can get out of conventional pseudo-random
number generators). Therefore, given any finite number of digits (observations), we could
never even decide whether the operative “algorithm” was π or e, nor whether we had
correctly identified the relevant occurrence in the expansion. Thus we can easily imagine
a perfectly deterministic universe that is also utterly unpredictable. (Interestingly, the
recent innovation that enables computation of the nth hexadecimal digit of π (with much
less work than required to compute the first n digits) implies that we could present
someone with a sequence of digits and challenge them to determine where it first occurs
in the decimal expansion of π, and it may be practically impossible for them to find the
answer.)
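The innovation alluded to here is the Bailey-Borwein-Plouffe digit-extraction formula, which yields hexadecimal digits of π beginning at an arbitrary position without computing the preceding digits. The following is a minimal Python sketch of the standard algorithm (the tolerance and function name are incidental choices):

    def pi_hex_digit(d):
        """Return the d-th hexadecimal digit of pi after the point (d = 1, 2, ...)."""
        n = d - 1
        def S(j):
            # Fractional part of sum over k >= 0 of 16^(n-k) / (8k + j)
            s = 0.0
            for k in range(n + 1):                                   # head: modular exponentiation
                s = (s + pow(16, n - k, 8 * k + j) / (8 * k + j)) % 1.0
            k, t = n + 1, 0.0
            while True:                                              # tail: terms shrink geometrically
                term = 16.0 ** (n - k) / (8 * k + j)
                if term < 1e-17:
                    return s + t
                t += term
                k += 1
        x = (4 * S(1) - 2 * S(4) - S(5) - S(6)) % 1.0
        return "0123456789abcdef"[int(16 * x)]

    print("".join(pi_hex_digit(d) for d in range(1, 11)))   # 243f6a8885  (pi = 3.243f6a8885... in hex)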
Even worse, there need be no simple rule of any kind relating the events of a
deterministic universe. This highlights the important distinction between determinism and
the concepts of predictability and complexity. There is no requirement for a deterministic
universe to be predictable, or for its complexity to be limited in any way. Thus, we can
never prove that any finite set of observations could only have been produced by a non-deterministic process. In a sense, this is trivially true, because a finite Turing machine
can always be written to generate any given finite string, although the algorithm
necessary to generate a very irregular string may be nearly as long as the string itself.
Since determinism is inherently undecidable, we may try to define a more tractable
notion, such as predictability, in terms of the complexity manifest in our
observations. This could be quantified as the length of the shortest Turing machine
required to reproduce our observations, and we might imagine that in a completely
random universe, the size of the required algorithm would grow in proportion to the
number of observations (as we are forced to include ad hoc modifications to the
algorithm to account for each new observation). On this basis it might seem that we could
eventually assert with certainty that the universe is inherently unpredictable (on some
level of experience), i.e., that the length of the shortest Turing machine required to
duplicate the results grows in proportion with the number of observations. In a sense, this
is what the "no hidden variables" theorems try to do.
However, we can never reach such a conclusion, as shown by Chaitin's proof that there
exists an integer k such that it's impossible to prove that the complexity of any specific
string of binary bits exceeds k (where "complexity" is defined as the length of the
smallest Turing program that generates the string). This is true in spite of the fact that
"almost all" strings have complexity greater than k. Therefore, even if we (sensibly)
restrict our meaningful class of Turing machines to those of complexity less than a fixed
number k, rather than allowing the complexity of our model to increase in proportion to
the number of observations, it's still impossible for any finite set of observations (even if
we continue gathering data forever) to be provably inconsistent with a Turing machine of
complexity less than k. Naturally we must be careful not to confuse the question of
whether "there exist" sequences of complexity greater than k with the question of whether
we can prove that any particular sequence has complexity greater than k.
When Max Born retired from his professorship at the University of Edinburgh in 1953, a
commemorative volume of scientific papers was prepared. Einstein contributed a paper,
in which (as Born put it) Einstein’s “philosophical objection to the statistical
interpretation of quantum mechanics is particularly cogently and clearly expressed”. The
two men took up the subject in their private correspondence (which had started nearly
forty years earlier, when they were close friends in Berlin during the first world war), and the
ensuing argument strained their friendship nearly to the breaking point. Eventually they
appealed to a mutual friend, Wolfgang Pauli, who tried to clarify the issues. Born was
sure that Einstein’s critique of quantum mechanics was focused on the lack of
determinism, but Pauli explained (with the benefit of discussing the matter with Einstein
personally at Princeton) that this was not the case. Pauli wrote to Born that
Einstein does not consider the concept of ‘determinism’ to be as fundamental as it
is frequently held to be (as he told me emphatically many times), and he denied
energetically that he ever put up a postulate such as (your letter, para 3) ‘the
sequence of such conditions must also be objective and real, that is, automatic,
machine-like, deterministic’. In the same way, he disputes that he uses as criterion
for the admissibility of a theory the question ‘Is it rigorously deterministic?’
This should not be surprising, given that Einstein knew it is impossible to ever decide
whether or not the world is deterministic. Pauli went on to explain the position that
Einstein himself had already described in the EPR paper years earlier, i.e., the insistence
on what might be called complete realism. Pauli summarized his understanding of
Einstein’s view, along with his own response to it, in the letter to Born, in which he tried
to explain why he thought it was “misleading to bring the concept of determinism into the
dispute with Einstein”. He wrote
Einstein would demand that the 'complete real description of the System', even
before an observation, must already contain elements which would in some way
correspond with the possible differences in the results of the observations. I think,
on the other hand, that this postulate is inconsistent with the freedom of the
experimenter to select mutually exclusive experimental arrangements…
Born accepted Pauli’s appraisal of the dispute, and conceded that he (Born) had been
wrong in thinking Einstein’s main criterion was determinism. Born’s explanation of his
misunderstanding was that he simply couldn’t believe Einstein would demand a
“complete real description” beyond that which can be perceived. The great lesson that
Born, Heisenberg, and the other pioneers of quantum mechanics had taken from
Einstein’s early work on special relativity was that we must insist on operational
definitions for all the terms of a scientific theory, and deny meaning to concepts or
elements of a theory that have no empirical content. But Einstein did not hold to that
belief, and even chided Born for adopting the positivistic maxim esse est percipi.
There is, however, a certain irony in Pauli’s position, since he asserts the irrelevance of
the concept of determinism, but at the same time criticizes Einstein’s “postulate” by
saying that it is “inconsistent with the freedom of the experimenter to select mutually
exclusive experimental arrangements”. As discussed in the previous section, this freedom
is itself a postulate, an unprovable proposition, and one that is obviously inconsistent
with determinism. Einstein argued that determinism is an undecidable proposition in the
absolute sense, and hence not a suitable criterion for physical theories, whereas Born and
Pauli implicitly demanded non-determinism of a physical theory.
By the way, Pauli and his psychoanalyst Carl Jung spent much time developing a concept
which they called synchronicity, loosely defined as the coincidental occurrence of non-causally related events that nevertheless exhibit seemingly meaningful correlations. This
was presented as a complementary alternative to the more scientific principle of
causation. One notable example of synchronicity was the development of the concept of
synchronicity itself, alongside Einstein’s elucidation of non-classical correlations
between distant events implied by quantum mechanics. But Pauli (like Born) didn’t place
any value on Einstein’s “realist” reasons for rejecting their quantum mechanics as a
satisfactory theory. Pauli wrote to Born
One should no more rack one’s brain about the problem of whether something
one cannot know anything about exists all the same, than about the ancient
question of how many angels are able to sit on the point of a needle. But it seems
to me that Einstein’s questions are ultimately always of this kind.
It’s interesting that Pauli referred to the question of how many angels can sit on the point
of a needle, since one of his most important contributions to quantum mechanics was the
exclusion principle, which in effect answered the question of how many electrons can fit
into a single quantum state. He and Jung might have cited this as an example of the
collective unconscious reaching back to the scholastic theologians. Pauli seems to have given
credence to Jung’s theory of archetypes, according to which the same set of organizing
principles and forms (the “unus mundus”) that govern the physical world also shape the
human mind, so there is a natural harmony between physical laws and human thoughts.
To illustrate this, Pauli wrote an essay on Kepler, which was published along with Jung’s
treatise on synchronicity.
The complementarity interpretation of quantum mechanics, developed by Bohr, can be
seen as an attempted compromise with Einstein over his demand for realism (similar to
Einstein’s effort to reconcile relativity with the language of Lorentz’s ether). Two
requirements of a classical description of phenomena are that they be strictly causal and
that they be expressed in terms of space and time. According to Bohr, these two
requirements are mutually exclusive. As summarized by Heisenberg
There exists a body of exact mathematical laws, but these cannot be interpreted as
expressing simple relationships between objects existing in space and time. The
observable predictions of this theory can be approximately described in such
terms, but not uniquely… This is a direct result of the indeterminateness of the
concept “observation”. It is not possible to decide, other than arbitrarily, what
objects are to be considered as part of the observed system and what as part of the
observer’s apparatus. The concept “observation” … can be carried over to atomic
phenomena only when due regard is paid to the limitations placed on all space-time descriptions by the uncertainty principle.
Thus any description of events in terms of space and time must include acausal aspects,
and conversely any strictly causal description cannot be expressed in terms of space and
time. This of course was antithetical to Einstein, who maintained that the general theory
of relativity tells us something exact about space and time. (He wrote in 1949 that “In my
opinion the equations of general relativity are more likely to tell us something precise
than all other equations of physics”.) To accept that the fundamental laws of physics are
incompatible with space and time would require him to renounce general relativity. He
occasionally contemplated the possibility that this step might be necessary, but never
really came to accept it. He continued to seek a conceptual framework that would allow
for strictly causal descriptions of objects in space and time – even if it required the
descriptions to involve purely hypothetical components. In this respect his attitude
resembled that of Lorentz, who, in his later years, continued to argue for the conceptual
value of the classical ether and absolute time, even though he was forced to concede that
they were undetectable.
9.8 Quaedam Tertia Natura Abscondita
The square root of 9 may be either +3 or -3, because a plus times a plus
or a minus times a minus yields a plus. Therefore the square root of -9 is
neither +3 nor -3, but is a thing of some obscure third nature.
Girolamo
Cardano, 1545
In a certain sense the peculiar aspects of quantum spin measurements in EPR-type
experiments can be regarded as a natural extension of the principle of special relativity.
Classically a particle has an intrinsic spin about some axis with an absolute direction, and
the results of measurements depend on the difference between this absolute spin axis and
the absolute measurement axis. In contrast, quantum theory says there are no absolute
spin angles, only relative spin angles. In other words, the only angles that matter are the
differences between two measurements, whose absolute values have no physical
significance. Furthermore, the relations between measurements vary in a non-linear way,
so it's not possible to refer them to any absolute direction.
This "relativity of angular reference frames" in quantum mechanics closely parallels the
relativity of translational reference frames in special relativity. This shouldn’t be too
surprising, considering that velocity “boosts” are actually rotations through imaginary
angles. Recall from Section 2.4 that the relationship between the frequencies of a given
signal as measured by the emitter and absorber depends on the two individual speeds ve
and va relative to the medium through which the signal propagates at the speed cs, but as
this speed approaches c (the speed of light in a vacuum), the frequency shift becomes
dependent only on a single variable, namely, the mutual speed between the emitter and
absorber relative to each other. This degeneration of dependency from two independent
“absolute” variables down to a single “relative” variable is so familiar today that we take
it for granted, and yet it is impossible to explain in classical Newtonian terms.
Schematically we can illustrate this in terms of three objects in different translational
frames of reference as shown below:
The object B is stationary (corresponding to the presumptive medium of signal
propagation), while objects A and C move relative to B in opposite directions at high
speed. Intuitively we would expect the velocity of A in terms of the rest frame of C (and
vice versa) to equal the sum of the velocities of A and C in terms of the rest frame of B. If
we allowed the directions of motion to be oblique, we would still have the “triangle
inequality” placing limits on how the mutual speeds are related to each other. This could
be regarded as something like a “Bell inequality” for translational frames of reference.
When we measure the velocity of A in terms of the rest frame of C we find that it does
not satisfy this additive property, i.e., it violates "Bell's inequality" for special relativity.
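For definiteness, the composition of velocities that replaces the naive sum is (u + v)/(1 + uv) in units with c = 1. A short numerical sketch (the particular speeds are arbitrary illustrative values) shows how far the observed relative speed falls below the "additive" expectation:

    def compose(u, v):
        # Relativistic velocity composition in units with c = 1
        return (u + v) / (1 + u * v)

    u = v = 0.8                  # speeds of A and C relative to B, in opposite directions
    print(u + v)                 # 1.6      (the naive additive expectation)
    print(compose(u, v))         # 0.9756...  (the speed of A relative to C actually observed)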
Compare the above with the actual Bell's inequality for entangled spin measurements in
quantum mechanics. Two measurements of the separate components of an entangled pair
may be taken at different orientations, say at the angles A and C, relative to the
presumptive common spin axis of the pair, as shown below:
We then determine the correlations between the results for various combinations of
measurement angles at the two ends of the experiment. Just as in the case of frequency
measurements taken at two different boost angles, the classical expectation is that the
correlation between the results will depend on the two measurement angles relative to
some reference direction established by the mechanism. But again we find that the
correlations actually depend only on the single difference between angles A and C, not on
their two individual values relative to some underlying reference.
The close parallel between the “boost inequalities” in special relativity and the Bell
inequalities for spin measurements in quantum mechanics is more than just superficial.
In both cases we find that the assumption of an absolute frame (angular or translational)
leads us to expect a linear relation between observable qualities, and in both cases it turns
out that in fact only the relations between one realized event and another, rather than
between a realized event and some absolute reference, govern the outcomes. Recall from
Section 9.5 that the correlation between the spin measurements (of entangled spin-1/2
particles) is simply -cos() where  is the relative spatial angle between the two
measurements. The usual presumption is that the measurement devices are at rest with
respect to each other, but if they have some non-zero relative velocity v, we can
represent the "boost" as a complex rotation through an angle  = arctanh(v) where arctanh
is the inverse hyperbolic tangent (see Part 6 of the Appendix). By analogy, we might
expect the "correlation" between measurements performed with respect to two basis
systems with this relative angle would be

cos(iφ) = cosh(φ) = 1/√(1 - v²)

which of course is the Lorentz-Fitzgerald factor that scales the transformation of space and
time intervals from one system of inertial coordinates to another, leading to the
relativistic Doppler effect, and so on. In other words, this factor represents the projection
of intervals in one frame onto the basis axes of another frame, just as the correlation
between the particle spin measurements is the projection of the spin vector onto the
respective measurement bases. Thus the "mysterious" and "spooky" correlations of
quantum mechanics can be placed in close analogy with the time dilation and length
contraction effects of special relativity, which once seemed equally counterintuitive. The
spinor representation, which uses complex numbers to naturally combine spatial rotations
and "boosts" into a single elegant formalism, was discussed in Section 2.6. In this
context we can formulate a generalized "EPR experiment" allowing the two measurement
bases to differ not only in spatial orientation but also by a boost factor, i.e., by a state of
relative motion. The resulting unified picture shows that the peculiar aspects of quantum
mechanics can, to a surprising extent, be regarded as aspects of special relativity.
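A quick numerical check of this analogy (with c = 1, and with arbitrarily chosen values of the angle and the speed): the spin correlation is -cos(θ), the boost is a rotation through the imaginary angle φ = arctanh(v), and cosh(φ) reproduces the factor 1/√(1 - v²).

    import math

    def spin_correlation(theta):
        # Quantum correlation for entangled spin-1/2 measurements at relative angle theta
        return -math.cos(theta)

    def boost_factor(v):
        # Boost = rotation through the imaginary angle phi = arctanh(v); cos(i*phi) = cosh(phi)
        return math.cosh(math.atanh(v))

    print(spin_correlation(math.pi / 3))     # -0.5
    print(boost_factor(0.6))                 # 1.25
    print(1 / math.sqrt(1 - 0.6 ** 2))       # 1.25, the Lorentz-Fitzgerald factor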
In a sense, relativity and quantum theory could be summarized as two different strategies
for accommodating the peculiar wave-particle duality of physical phenomena. One of the
problems this duality presented to classical physics was that apparently light could either
be treated as an inertial particle emitted at a fixed speed relative to the source, ala Newton
and Ritz, or it could be treated as a wave with a speed of propagation fixed relative to the
medium and independent of the source, ala Maxwell. But how can it be both? Relativity
essentially answered this question by proposing a unified spacetime structure with an
indefinite metric (viz, a pseudo-Riemannian metric). This is sometimes described by
saying time is imaginary, so its square contributes negatively to the line element, which
yields an invariant null-cone structure for light propagation and hence an invariant light
speed.
But waves and particles also differ with regard to interference effects, i.e., light can be
treated as a stream of inertial particles with no interference (though perhaps with "fits and
starts") ala Newton, or as a wave with fully wavelike interference effects, ala Huygens.
Again the question was how to account for the fact that light exhibits both of these
characteristics. Quantum mechanics essentially answered this question by proposing that
observables are actually expressible in terms of probability amplitudes, and these
amplitudes contain an imaginary component which, upon taking the norm, can contribute
negatively to the probabilities, yielding interference effects.
Thus we see that both of these strategies can be expressed in terms of the introduction of
imaginary (in the mathematical sense) components in the descriptions of physical
phenomena, yielding the possibility of cancellations in, respectively, the spacetime
interval and superposition probabilities (i.e., interference). They both attempt to reconcile
aspects of the wave-particle duality of physical entities. The intimate correspondence
between relativity and quantum theory was not lost on Niels Bohr, who remarked in his
Warsaw lecture in 1938
Even the formalisms, which in both theories within their scope offer adequate
means of comprehending all conceivable experience, exhibit deep-going
analogies. In fact, the astounding simplicity of the generalisation of classical
physical theories, which are obtained by the use of multidimensional [non-positive-definite] geometry and non-commutative algebra, respectively, rests in
both cases essentially on the introduction of the conventional symbol sqrt(-1).
The abstract character of the formalisms concerned is indeed, on closer
examination, as typical of relativity theory as it is of quantum mechanics, and it is
in this respect purely a matter of tradition if the former theory is considered as a
completion of classical physics rather than as a first fundamental step in the
thorough-going revision of our conceptual means of comparing observations,
which the modern development of physics has forced upon us.
Of course, Bernhard Riemann, who founded the mathematical theory of differential
geometry that became general relativity, also contributed profound insights to the theory
of complex functions, the Riemann sphere (Section 2.6), Riemann surfaces, and so on.
(Here too, as in the case of differential geometry, Riemann built on and extended the
ideas of Gauss, who was among the first to conceive of the complex number plane.) More
recently, Roger Penrose has argued that some “complex number magic” seems to be at
work in many of the most fundamental physical processes, and his twistor formalism is
an attempt to find a framework for physics that exploits the special properties of
complex functions at a fundamental level.
Modern scientists are so used to complex numbers that, in some sense, the mystery is
now reversed. Instead of being surprised at the physical manifestations of imaginary and
complex numbers, we should perhaps wonder at the preponderance of realness in the
world. The fact is that, although the components of the state vector in quantum mechanics
are generally complex, the measurement operators are all required – by fiat – to be
Hermitian, meaning that they have strictly real eigenvalues. In other words, while the
state of a physical system is allowed to be complex, the result of any measurement is
always necessarily real. So we can’t claim that nature is indifferent to the distinction
between real and imaginary numbers. This suggests to some people a connection between
the “measurement problem” in quantum mechanics and the ontological status of
imaginary numbers.
The striking similarity between special relativity and quantum mechanics can be traced to
the fact that, in both cases, two concepts that were formerly regarded as distinct and
independent are found not to be so. In the case of special relativity, the two concepts are
space and time, whereas in quantum mechanics the two concepts are position and
momentum. Not surprisingly, these two pairs of concepts are closely linked, with space
corresponding to position, and time corresponding to momentum (the latter representing
the derivative of position with respect to time). Considering the Heisenberg uncertainty
relation, it’s tempting to paraphrase Minkowski’s famous remark, and say that henceforth
position by itself, and momentum by itself, are doomed to fade away into mere shadows,
and only a kind of union of the two will preserve an independent reality.
9.9 Locality and Temporal Asymmetry
All these fifty years of conscious brooding have brought me no nearer to
the answer to the question, 'What are light quanta?' Nowadays every Tom,
Dick and Harry thinks he knows it, but he is mistaken.
Einstein, 1954
We've seen that the concept of locality plays an important role in the EPR thesis and the
interpretation of Bell's inequalities, but what precisely is the meaning of locality,
especially in a quasi-metric spacetime in which the triangle inequality doesn't hold? The
general idea of locality in physics is based on some concept of nearness or proximity, and
the assertion that physical effects are transmitted only between suitably "nearby" events.
From a relativistic standpoint, locality is often defined as the proposition that all causal
effects of a particular event are restricted to the interior (or surface) of the future null
cone of that event, which effectively prohibits communication between spacelike-separated events (i.e., no faster-than-light communication). However, this restriction
clearly goes beyond a limitation based on proximity, because it specifies the future null
cone, thereby asserting a profound temporal asymmetry in the fundamental processes of
nature.
What is the basis of this asymmetry? It certainly is not apparent in the form of the
Minkowski metric, nor in Maxwell's equations. In fact, as far as we know, all the
fundamental processes of nature are perfectly time-symmetric, with the single exception
of certain processes involving the decay of neutral kaons. However, even in that case, the
original experimental evidence in 1964 for violation of temporal symmetry was actually a
demonstration of asymmetry in parity and charge conjugacy, from which temporal
asymmetry is indirectly inferred on the basis of the CPT Theorem. As recently as 1999
there were still active experimental efforts to demonstrate temporal asymmetry directly.
In any case, aside from the single rather subtle peculiarity in the behavior of neutral
kaons, no one has ever found any evidence at all of temporal asymmetry in any
fundamental interaction. How, then, do we justify the explicit temporal asymmetry in our
definition of locality for all physical interactions?
As an example, consider electromagnetic interactions, and recall that the only invariant
measure of proximity (nearness) in Minkowski spacetime is the absolute interval

(Δs)² = (Δt)² - (Δx)² - (Δy)² - (Δz)²

which is zero between the emission and absorption of a photon. Clearly, any claim that
influence can flow from the emission event to the absorption event but not vice versa
cannot be based on an absolute concept of physical nearness. Such a claim amounts to
nothing more or less than an explicit assertion of temporal asymmetry for the most
fundamental interactions, despite the complete lack of justification or evidence for such
asymmetry in photon interactions. Einstein commented on the unnaturalness of
irreversibility in fundamental interactions in a 1909 paper on electromagnetic radiation,
in which he argued that the asymmetry of the elementary process of radiation according
to the classical wave theory of light was inconsistent with what we know of other
elementary processes.
While in the kinetic theory of matter there exists an inverse process for every
process in which only a few elementary particles take part (e.g., for every
molecular collision), according to the wave theory this is not the case for
elementary radiation processes. According to the prevailing theory, an oscillating
ion produces an outwardly propagated spherical wave. The opposite process does
not exist as an elementary process. It is true that the inwardly propagated
spherical wave is mathematically possible, but its approximate realization requires
an enormous number of emitting elementary structures. Thus, the elementary
process of light radiation as such does not possess the character of reversibility.
Here, I believe, our wave theory is off the mark. Concerning this point the
Newtonian emission theory of light seems to contain more truth than does the
wave theory, since according to the former the energy imparted at emission to a
particle of light is not scattered throughout infinite space but remains available for
an elementary process of absorption.
In the same paper he wrote
For the time being the most natural interpretation seems to me to be that the
occurrence of electromagnetic fields of light is associated with singular points just
like the occurrence of electrostatic fields according to the electron theory. It is not
out of the question that in such a theory the entire energy of the electromagnetic
field might be viewed as localized in these singularities, exactly like in the old
theory of action at a distance.
This is a remarkable statement coming from Einstein, considering his deep commitment
to the ideas of locality and the continuum. The paper is also notable for containing his
premonition about the future course of physics:
Today we must regard the ether hypothesis as an obsolete standpoint. It is
undeniable that there is an extensive group of facts concerning radiation that
shows that light possesses certain fundamental properties that can be understood
far more readily from the standpoint of Newton's emission theory of light than
from the standpoint of the wave theory. It is therefore my opinion that the next
stage in the development of theoretical physics will bring us a theory of light that
can be understood as a kind of fusion of the wave and emission theories of light.
Likewise in a brief 1911 paper on the light quantum hypothesis, Einstein presented
reasons for believing that the propagation of light consists of a finite number of energy
quanta which move without dividing, and can be absorbed and generated only as a
whole. Subsequent developments (quantum electrodynamics) have incorporated these
basic insights, leading us to regard a photon (i.e., an elementary interaction) as an
indivisible whole, including the null-separated emission and absorption events on a
symmetrical footing. This view is supported by the fact that once a photon is emitted, its
quantum phase does not advance while "in flight", because quantum phase is proportional
to the absolute spacetime interval, which vanishes along a null path. (As discussed in
Section 2.1, this proportionality is what gives the absolute interval its physical significance.) If we take seriously the spacetime interval as
the absolute measure of proximity, then the transmission of a photon is, in some sense, a
single event, coordinated mutually and symmetrically between the points of emission and
absorption.
This image of a photon as a single unified event with a coordinated emission and
absorption seems unsatisfactory to many people, partly because it doesn't allow for the
concept of a "free photon", i.e., a photon that was never emitted and is never absorbed.
However, it's worth remembering that we have no direct experience of "free photons",
nor of any "free particles", because ultimately all our experience is comprised of
completed interactions. (Whether this extends to gravitational interactions is an open
question.) Another possible objection to the symmetrical view of elementary interactions
is that it doesn't allow for a photon to have wave properties, i.e., to have an evolving state
while "in flight", but this objection is based on a misconception. From the standpoint of
quantum electrodynamics, the wave properties of electromagnetic radiation are actually
wave properties of the emitter. All the potential sources of a photon have a certain
(complex) amplitude for photon emission, and this amplitude evolves in time as we
progress along the emitter's worldline. However, as noted above, once a photon is
emitted, its phase does not advance. In a sense, the ancients who conceived of sight as
something like a blind man's incompressible cane, feeling distant objects, were correct,
because our retinas actually are in "direct" contact, via null intervals, with the sources of
light. The null interval plays the role of the incompressible cane, and the wavelike
properties we "feel" are really the advancing quantum phases of the source.
One might think that the reception amplitude for an individual photon must evolve as a
function of its position, because if we had (counterfactually) encountered a particular
photon one meter further away from its source than we did, we would surely have found
it with a different phase. However, this again is based on a misconception, because the
photon we would have received one meter further away (on the same timeslice) would
necessarily have been emitted one light-meter earlier, carrying the corresponding phase
of the emitter at that point on its worldline. When we consider different spatial locations
relative to the emitter, we have to keep clearly in mind which points they correspond to
along the worldline of the emitter.
Taking another approach, it might seem that we could "look at" a single photon at
different distances from the emitter (trying to show that its phase evolves in flight) by
receding fast enough from the emitter so that the relevant emission event remains
constant, but of course the only way to do this would be to recede at the speed of light
(i.e., along a null interval), which isn't possible. This is just a variation of the young
Einstein's thought experiment about how a "standing wave" of light would appear to
someone riding along side it. The answer, of course, is that it’s not possible for a material
object to move along-side a pulse of light (in vacuum), because light exists only as
completed interactions on null intervals. If we attempted such an experiment, we would
notice that, as our speed of recession from the source gets closer to c, the difference
between the phases of the photons we receive becomes smaller (i.e., the "frequency" of
the light gets red-shifted), and approaches zero, which is just what we should expect
based on the fact that each photon is simply the lightlike null projection of the emitter's
phase at a point on the emitter's worldline. Hence, if we stay on the same projection ray
(null interval), we are necessarily looking at the same phase of the emitter, and this is true
everywhere on that null ray. This leads to the view that the concept of a "free photon" is
meaningless, and a photon is nothing but the communication of an emitter event's phase
to some null-separated absorber event, and vice versa.
More generally, since the Schrodinger wave function propagates at c, it follows that every
fundamental quantum interaction can be regarded as propagating on null surfaces. Dirac
gave an interesting general argument for this strong version of Huygens' Principle in the
context of quantum mechanics. In his "Principles of Quantum Mechanics" he noted that a
measurement of a component of the instantaneous velocity of a free electron must give
the value c, which implies that electrons (and massive particles in general) always
propagate along null intervals, i.e., on the local light cone. At first this may seem to
contradict the fact that we observe massive objects to move at speeds much less than the
speed of light, but Dirac points out that observed velocities are always average velocities
over appreciable time intervals, whereas the equations of motion of the particle show that
its velocity oscillates between +c and -c in such a way that the mean value agrees with
the observed value. He argues that this must be the case in any relativistic theory that
incorporates the uncertainty principle, because in order to measure the velocity of a
particle we must measure its position at two different times, and then divide the change in
position by the elapsed time. To approximate as closely as possible to the instantaneous
velocity, the time interval must go to zero, which implies that the position measurements
must approach infinite precision. However, according to the uncertainty principle, the
extreme precision of the position measurement implies an approach to infinite
indeterminacy in the momentum, which means that almost all values of momentum, from zero to infinity, become equally probable. Hence the momentum is almost certainly
infinite, which corresponds to a speed of c. This is obviously a very general argument,
and applies to all massive particles (not just fermions). This oscillatory propagation on
null cones is discussed further in Section 9.11.
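Dirac's argument can be given a rough numerical gloss. In the sketch below (order-of-magnitude estimates only, using an electron and arbitrarily chosen measurement windows), the momentum scale implied by the uncertainty relation grows as the position window Δx shrinks, and the corresponding relativistic speed v = pc²/E approaches c:

    import math

    hbar = 1.054571817e-34    # reduced Planck constant, J*s
    c    = 2.99792458e8       # speed of light, m/s
    m    = 9.1093837015e-31   # electron mass, kg

    for dx in (1e-9, 1e-12, 1e-15, 1e-18):
        p = hbar / (2 * dx)                          # momentum scale from the uncertainty relation
        E = math.sqrt((m * c**2)**2 + (p * c)**2)    # relativistic energy
        print(f"dx = {dx:.0e} m   v/c = {p * c**2 / E / c:.9f}")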
Another argument that seems to favor a temporally symmetric view of fundamental
interactions comes from consideration of the exchange of virtual photons. (Whether
virtual particles deserve to be called "real" particles is debatable; many people prefer to
regard them only as sometimes useful mathematical artifacts, terms in the expansion of
the quantum field, with no ontological status. On the other hand, it's possible to regard all
fundamental particles that way, so in this respect virtual particles are not unique.) The
emission and absorption points of virtual particles may be space-like separated, and we
therefore can't say unambiguously that one happened "before" the other. The temporal
order is dependent on the reference frame. Surely in these circumstances, when it's not
even possible to say absolutely which side of the interaction was the emission and which
was the absorption, those who maintain that fundamental interactions possess an inherent
temporal asymmetry have a very difficult case to make. Over limited ranges, a similar
argument applies to massive particles, since there is a non-negligible probability of a
particle traversing a spacelike interval if its absolute magnitude is less than about
h²/(2m)², where h is Planck's constant and m is the mass of the particle. So, if virtual
particle interactions are time-symmetric, why not all fundamental particle interactions?
(Needless to say, time-symmetry of fundamental quantum interactions does not preclude
asymmetry for macroscopic processes involving huge numbers of individual quantum
interactions evolving from some, possibly very special, boundary conditions.)
Experimentally, those who argue that the emission of a photon is conditioned by its
absorption can point to the results from tests of Bell's inequalities, because the observed
violations of those inequalities are exactly what the symmetrical model of interactions
would lead us to expect. Nevertheless, the results of those experiments are rarely
interpreted as lending support to the symmetrical model, apparently because temporal
asymmetry is so deeply ingrained in peoples' intuitive conceptions of locality, despite the
fact that there is very little (if any) direct evidence of temporal asymmetry in any
fundamental laws or interactions.
Despite the preceding arguments in favor of symmetrical (reversible) fundamental
processes, there are clearly legitimate reasons for being suspicious of unrestricted
temporal symmetry. If it were possible for general information to be transmitted
efficiently along the past null cone of an event, this would seem to permit both causal
loops and causal interactions with spacelike-separated events, as illustrated below.
On such a basis, it might seem as if the Minkowskian spacetime manifold would be
incapable of supporting any notion of locality at all. The triangle inequality fails in this
manifold, so there are null paths connecting any two points, and this applies even to
spacelike separated points if we allow the free flow of information in either direction
along null surfaces. Indeed this seems to have been the main source of Einstein’s
uneasiness with the “spooky” entanglements entailed by quantum theory. In a 1948 letter
to Max Born, Einstein tried to clearly articulate his concern with entanglement, which he
regarded as incompatible with “the confidence I have in the relativistic group as
representing a heuristic limiting principle”.
It is characteristic of physical objects [in the world of ideas] that they are thought
of as arranged in a space-time continuum. An essential aspect of this arrangement
of things in physics is that they lay claim, at a certain time, to an existence
independent of one another, provided these objects ‘are situated in different parts
of space’. Unless one makes this kind of assumption about the independence of
the existence (the 'being-thus') of objects which are far apart from one another in
space… the idea of the existence of (quasi) isolated systems, and thereby the
postulation of laws which can be checked empirically in the accepted sense,
would become impossible.
In essence, he is arguing that without the assumption that it is possible to localize
physical systems, consistent with the relativistic group, in such a way that they are
causally isolated, we cannot hope to analyze events in any effective way, such that one
thing can be checked against another. After describing how quantum mechanics leads
unavoidably to entanglement of potentially distant objects, and therefore dispenses with
the principle of locality (in Einstein’s view), he says
When I consider the physical phenomena known to me, even those which are
being so successfully encompassed by quantum mechanics, I still cannot find any
fact anywhere which would make it appear likely that the requirement [of
localizability] will have to be abandoned.
At this point the precise sense in which quantum mechanics entails non-classical
“influences” (or rather, correlations) for space-like separated events had not yet been
clearly formulated, and the debate between Born and Einstein suffered (on both sides)
from this lack of clarity. Einstein seems to have intuited that quantum mechanics does
indeed entail distant correlations that are inconsistent with very fundamental classical
notions of causality and independence, but he was unable to formulate those correlations
clearly. For his part, Born outlined a simple illustration of quantum correlations occurring
in the passage of light rays through polarizing filters – which is exactly the kind of
experiment that, twenty years later, provided an example of the very thing that Einstein
said he had been unable to find, i.e., a fact which makes it appear that the requirement of
localizability must be abandoned. It’s unclear to what extent Born grasped the non-classical implications of those phenomena, which isn’t surprising, since the Bell
inequalities had not yet been formulated. Born simply pointed out that quantum
mechanics allows for coherence, and said that “this does not go too much against the
grain with me”.
Born often argued that classical mechanics was just as probabilistic as quantum
mechanics, although his focus was on chaotic behavior in classical physics, i.e.,
exponential sensitivity to initial conditions, rather than on entanglement. Born and
Einstein often seemed to be talking past each other, since Born focused on the issue of
determinism, whereas Einstein’s main concern was localizability. Remarkably, Born
concluded his reply by saying
I believe that even the days of the relativistic group, in the form you gave it, are
numbered.
One might have thought that experimental confirmation of quantum entanglement would
have vindicated Born’s forecast, but we now understand that the distant correlations
implied by quantum mechanics (and confirmed experimentally) are of a subtle kind that
do not violate the “relativistic group”. This seems to be an outcome that neither Einstein
nor Born anticipated; Born was right that the distant entanglement implicit in quantum
mechanics would be proven correct, but Einstein was right that the relativistic group
would emerge unscathed. But how is this possible? Considering that non-classical distant
correlations have now been experimentally established with high confidence, thereby
undermining the classical notion of localizability, how can we account for the continued
ability of physicists to formulate and test physical laws?
The failure of the triangle inequality (actually, the reversal of it) does not necessarily
imply that the manifold is unable to support non-trivial structure. There are absolute
distinctions between the sets of null paths connecting spacelike separated events and the
sets of null paths connecting timelike separated events, and these differences might be
exploited to yield a structure that conforms with the results of observation. There is no
reason this cannot be a "locally realistic" theory, provided we understand that locality in a
quasi-metric manifold is non-transitive. Realism is simply the premise that the results of
our measurements and observations are determined by an objective world, and it's
perfectly possible that the objective world might possess a non-transitive locality,
commensurate with the non-transitive metrical aspects of Minkowski spacetime. Indeed,
even before the advent of quantum mechanics and the tests of Bell's inequality, we should
have learned from special relativity that locality is not transitive, and this should have led
us to expect non-Euclidean connections and correlations between events, not just
metrically, but topologically as well. From this point of view, many of the seeming
paradoxes associated with quantum mechanics and locality are really just manifestations
of the non-intuitive fact that the manifold we inhabit does not obey the triangle inequality
(which is one of our most basic spatial intuitions), and that elementary processes are
temporally reversible.
On the other hand, we should acknowledge that the Bell correlations can't be explained in
a locally realistic way simply by invoking the quasi-metric structure of Minkowski
spacetime, because if the timelike processes of nature were ontologically continuous it
would not be possible to regard them as propagating on null surfaces. We also need our
fundamental physical processes to consist of irreducible discrete interactions, as
discussed in Section 9.10.
9.10 Spacetime Mediation of Quantum Interactions
No reasonable definition of reality could be expected to permit this.
Einstein, Podolsky, and Rosen, 1935
According to general relativity the shape of spacetime determines the motions of objects
while those objects determine (or at least influence) the shape of spacetime. Similarly in
electrodynamics the fields determine the motions of charges in spacetime while the
charges determine the fields in spacetime. This dualistic structure naturally arises when
we replace action-at-a-distance with purely local influences in such a way that the
interactions between "separate" objects are mediated by an entity extending between
them. We must then determine the dynamical attributes of this mediating entity, e.g., the
electromagnetic field in electrodynamics, or spacetime itself in general relativity.
However, many common conceptions regarding the nature and extension of these
mediating entities are called into question by the apparently "non-local" correlations in
quantum mechanics, as highlighted by EPR experiments. The apparent non-locality of
these phenomena arises from the fact that although we regard spacetime as metrically
Minkowskian, we continue to regard it as topologically Euclidean. As discussed in the
preceding sections, the observed phenomena are more consistent with a completely
Minkowskian spacetime, in which physical locality is directly induced by the pseudo-metric of spacetime. According to this view, spacetime operates on matter via
interactions, and matter defines for spacetime the set of allowable interactions, i.e.,
consistent with conservation laws. A quantum interaction is considered to originate on (or
be "mediated" by) the locus of spacetime points that are null-separated from each of the
interacting sites. In general this locus is a quadratic surface in spacetime, and its surface
area is inversely proportional to the mass of the transferred particle.
For two timelike-separated events A and B the mediating locus is a closed surface as
illustrated below (with one of the spatial dimensions suppressed)
The mediating surface is shown here as a dotted circle, but in 4D spacetime it's actually a
closed surface, spherical and purely spacelike relative to the frame of the interval AB.
This type of interaction corresponds to the transit of massive real particles. Of course,
relative to a frame in which A and B are in different spatial locations, the locus of
intersection has both timelike and spacelike extent, and is an ellipse (or rather an
ellipsoidal surface in 4D) as illustrated below
The surface is purely spacelike and isotropic only when evaluated relative to its rest
frame (i.e., the frame of the interval AB), whereas this surface maps to a spatial ellipsoid,
consisting of points that are no longer simultaneous, relative to any relatively moving
frame. The directionally asymmetric aspects of the surface area correspond precisely to
the "relativistic mass" components of the corresponding particles as a function of the
relative velocity of the frames.
The propagation of a free massive particle along a timelike path through spacetime can be
regarded as involving a series of surfaces, from which emanate inward-going "waves"
along the null cones in both the forward and backward direction, deducting the particle
from the past focal point and adding it to the future focal point, as shown below for
particles with different masses.
Recall that the frequency ν of the de Broglie matter wave of a particle of mass m is

ν = √[ (mc²)² + (px c)² + (py c)² + (pz c)² ] / h

where px, py, pz are the components of momentum in the three directions. For a
(relatively) stationary particle the momenta vanish and the frequency is just
ν = (mc²)/h sec⁻¹. Hence the time per cycle is inversely proportional to the mass. So, since
each cycle consists of an advanced and a retarded cone, the surface of intersection is a
sphere (for a stationary mass particle) of radius r = h/mc, because this is how far along
the null cones the wave propagates during one cycle. Of course, h/mc is just the Compton
scattering wavelength of a particle of mass m, which characterizes the spatial expanse
over which a particle tends to "scatter" incident photons in a characteristic way. This can
be regarded as the effective size of a particle when "viewed" by means of gamma-rays.
We may conceive of this effect being due to a high-energy photon getting close enough
to the nominal worldline of the massive particle to interfere with the null surfaces of
propagation, upsetting the phase coherence of the null waves and thereby diverting the
particle from its original path.
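For concreteness, the following sketch evaluates the phase frequency mc²/h and the radius h/mc for an electron, and also for a pion, whose much larger mass gives a correspondingly smaller radius, consistent with the short range of the nuclear force discussed later in this section (the particular particles are merely illustrative choices):

    h    = 6.62607015e-34       # Planck's constant, J*s
    c    = 2.99792458e8         # speed of light, m/s
    m_e  = 9.1093837015e-31     # electron mass, kg
    m_pi = 2.4880e-28           # charged pion mass, kg (about 139.6 MeV/c^2)

    for name, m in (("electron", m_e), ("pion", m_pi)):
        freq = m * c**2 / h     # de Broglie phase frequency of the stationary particle, Hz
        r    = h / (m * c)      # radius h/(m c) of the mediating sphere, meters
        print(f"{name}: frequency = {freq:.3e} Hz, r = h/mc = {r:.3e} m")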
For a massless particle the quantum phase frequency is zero, and a completely free
photon (if such a thing existed) would just be represented by an entire null-cone. On the
other hand, real photons are necessarily emitted and absorbed, so they corresponds to
bounded null intervals. Consistent with quantum electrodynamics, the quantum phase of
photon does not advance while in transit between its emission and absorption (unlike
massive particles). According to this view, the oscillatory nature of macroscopic
electromagnetic waves arises from the advancing phase of the source, rather than from
any phase activity of an actual photon.
The spatial volume swept out by a mediating surface is a maximum when evaluated with
respect to its rest frame. When evaluated relative to any other frame of reference, the
spatial contraction causes the swept volume to be reduced. This is consistent with the
idea that the effective mass of a particle is inversely proportional to the swept volume of
the propagating surface, and it's also consistent with the effective range of mediating
particles being inversely proportional to their mass, since the electromagnetic force
mediated by massless photons has infinite range, whereas the strong nuclear force has a
very limited range because it is mediated by massive particles. Schematics of a stationary
and a moving particle are shown below.
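The inverse relation between range and mass mentioned above is just the familiar estimate of the range of a force mediated by a quantum of mass m; for example, taking the pion as the traditional mediator of the short-range nuclear force:

$$R \sim \frac{\hbar}{mc}, \qquad R_{\pi} \sim \frac{\hbar c}{m_{\pi}c^{2}} \approx \frac{197\ \mathrm{MeV\,fm}}{140\ \mathrm{MeV}} \approx 1.4\ \mathrm{fm},$$

whereas for the massless photon the range is unbounded.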
That schematic illustration is the same one that appeared in the discussion of Lorentz's "corresponding
states" in Section 1.5, although in that context the shells were understood to be just
electromagnetic waves, and Lorentz simply conjectured that all physical phenomena
conform to this same structure and transform similarly. In a sense, the relativistic
Schrödinger wave equation and Dirac's general argument for light-like propagation of all
physical entities based on the combination of relativity and quantum mechanics (as
discussed in Section 9.10) provide the modern justification for Lorentz's conjecture.
Looking back even further, we see that by conceiving of a particle as a sequence of
surfaces of finite extent, it is finally possible to answer Zeno's question about how a
moving particle differs from a stationary particle in "a single instant". The difference is
that the mediating surfaces of a moving particle are skewed in spacetime relative to those
of a stationary particle, corresponding to their respective planes of simultaneity.
Some quantum interactions involve more than two particles. For example, if two coupled
particles separate at point A and interact with particles at points B and C respectively, the
interaction (viewed straight from the side) looks like this:
The mediating surface for the pair AB intersects with the mediating surface for AC at the
two points of intersection of the dotted circles, but in full 4D spacetime the intersection of
the two mediating spheres is a closed circle. (It's worth noting that these two surfaces
intersect if and only if B and C are spacelike separated. This circle enforces a particular
kind of consistency on any coherent waves that are generated on the two mediating
surfaces, and is responsible for "EPR"-type correlation effects.)
The locus of null-separated points for two lightlike-separated events is a degenerate
quadratic surface, namely, a straight line as represented by the segment AB below:
The "surface area" of this locus (the intersection of the two cones) is necessarily zero, so
these interactions represent the transits of massless particles. For two spacelike-separated
events the mediating locus is a two-part hyperboloid surface, represented by the
hyperbola shown at the intersection of two null cones below.
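The three cases (timelike, lightlike, spacelike) can be distinguished by a single short calculation; as a sketch, place A at the origin and B at (T, D, 0, 0), again with c = 1:

$$t^{2} = x^{2}+y^{2}+z^{2}, \qquad (t-T)^{2} = (x-D)^{2}+y^{2}+z^{2}.$$

Subtracting the two null conditions gives the linear constraint 2(Tt − Dx) = T² − D², and the intersection of this hyperplane with the cone t² = x² + y² + z² is a bounded ellipsoidal surface (a sphere in the rest frame) when T² > D² (timelike AB), the degenerate line t = x, y = z = 0 when T² = D² (lightlike AB), and a two-sheeted hyperboloid when T² < D² (spacelike AB), in agreement with the three figures described above.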
This hyperboloid surface has infinite area, which suggests that any interaction between
spacelike separated events would correspond to the transit of an infinitely massive
particle. On this basis it seems that these interactions can be ruled out. There is, however,
a limited sense in which such interactions might be considered. Recall that a
pseudosphere can be represented as a sphere with purely imaginary radius. It's
conceivable that observed interactions involving virtual (conjugate) pairs of particles over
spacelike intervals (within the limits imposed by the uncertainty relations) may
correspond to hyperboloid mediating surfaces.
(It's also been suggested that in a closed universe the "open" hyperboloid surfaces might
need to be regarded as finite, albeit extremely huge. For example, they might be 35 orders
of magnitude larger than the mediating surfaces for timelike interactions. This is related
to vague notions that "h" is in some sense the "inverse" of the size of a finite universe. In
a much smaller closed universe (as existed immediately following the big bang) there
may have been an era in which the "hyperboloid" surfaces had areas comparable to the
ellipsoid surfaces, in which case the distinction between spacelike and timelike
interactions would have been less significant.)
An interesting feature of this interpretation is that, in addition to the usual 3+1
dimensions, spacetime requires two more "curled up" dimensions of angular orientation
to represent the possible directions in space. The need to treat these as dimensions in their
own right arises from the non-transitive topology of the pseudo-Riemannian manifold.
Each point [t,x,y,z] actually consists of a two-dimensional orientation space, which can
be parameterized (for any fixed frame) in terms of ordinary angular coordinates θ and φ.
Then each point in the six-dimensional space with coordinates [x,y,z,t,θ,φ] is a terminus
for a unique pair of spacetime rays, one forward and one backward in time. A simple
mechanistic visualization of this situation is to imagine a tiny computer at each of these
points, reading its input from the two rays and sending (matched conservative) outputs on
the two rays. This is illustrated below in the xyt space:
The point at the origin of these two views is on the mediating surface of events A and B.
Each point in this space acts purely locally on the basis of purely local information.
Specifying a preferred polarity for the two null rays terminating at each point in the 6D
space, we automatically preclude causal loops and restrict information flow to the future
null cone, while still preserving the symmetry of wave propagation. (Note that an
essential feature of spacetime mediation is that both components of a wave-pair are
"advanced", in the sense that they originate on a spherical surface, one emanating forward
and one backward in time, but both converge inward on the particles involved in the
interaction.)
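In terms of the fixed frame's angular coordinates, the ray pair terminating at a given [x,y,z,t,θ,φ] can be written explicitly (a sketch, using units with c = 1 and the convention that both rays lie along the same spatial direction):

$$k_{\pm} = \bigl(\pm 1,\ \sin\theta\cos\phi,\ \sin\theta\sin\phi,\ \cos\theta\bigr), \qquad k_{\pm}\cdot k_{\pm} = (\pm 1)^{2} - 1 = 0,$$

one future-directed and one past-directed null vector for each orientation (θ, φ), so that each point of the six-dimensional space is the terminus of exactly one such pair.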
According to this view, the "unoccupied points" of spacetime are elements of the 6D
space, whereas an event or particle is an element of the 4D space (t,x,y,z). In effect, an
event is the union of all the pairs of rays terminating at a given point (t,x,y,z), i.e., the
union over all orientations (θ,φ). We saw in Section 3.5 that the transformations of θ and
φ under Lorentzian boosts are beautifully handled by linear fractional functions applied to
their stereographic mappings on the complex plane.
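For example, for a boost along the polar axis the action on the stereographic coordinate ζ = tan(θ/2)e^{iφ} reduces to a pure scaling, which is just the standard aberration formula:

$$\zeta \;\mapsto\; \zeta' = \sqrt{\frac{1-v}{1+v}}\;\zeta \qquad \text{i.e.} \qquad \tan\frac{\theta'}{2} = \sqrt{\frac{1-v}{1+v}}\,\tan\frac{\theta}{2}, \quad \phi' = \phi$$

(or the reciprocal scale factor, depending on the direction of the boost), a special case of a linear fractional (Möbius) transformation of the complex plane.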
One common objection to the idea that quantum interactions occur locally between null-separated points is based on the observation that, although every point on the mediating
surface is null-separated from each of the interacting events, they are spacelike-separated
from each other, and hence unable to communicate or coordinate the generation of two
equal and opposite outgoing quantum waves (one forward in time and one backward in
time). The answer to this objection is that no communication is required, because the
"coordination" arises naturally from the context. The points on the mediating locus are
not communicating with each other, but each of them is in receipt of identical bits of
information from the two interaction events A and B. Each point responds independently
based on its local input, but the combined effect of the entire locus responding to the
same information is a coherent pair of waves.
Another objection to the "spacetime mediation" view of quantum mechanics is that it
relies on temporally symmetric propagation of quantum waves. Of course, this objection
can't be made on strictly mathematical grounds, because both Maxwell's equations and
the (relativistic) Schrödinger equation actually are temporally symmetric. The objection
seems to be motivated by the idea that the admittance of temporally symmetric waves
automatically implies that every event is causally implicated in every other event, if not
directly by individual interactions then by a chain of interactions, resulting in a nonsensical mess. However, as we've seen, the spacetime mediation view leads naturally to
the conclusion that interactions between spacelike-separated events are either impossible
or else of a very different (virtual) character than interactions along timelike intervals.
Moreover, the stipulation of a preferred polarity for the ray pairs terminating at each
point is sufficient to preclude causal loops.
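To spell out the mathematical point, identifying the relativistic Schrödinger equation with the Klein-Gordon equation as usual:

$$\frac{1}{c^{2}}\frac{\partial^{2}\psi}{\partial t^{2}} - \nabla^{2}\psi + \left(\frac{mc}{\hbar}\right)^{2}\psi = 0,$$

which contains only the second time derivative and is therefore unchanged by the substitution t → −t; Maxwell's equations are likewise invariant under time reversal (with B → −B), so advanced and retarded solutions are equally admissible on purely mathematical grounds.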