Download 4. Weighty Arguments - The University of Arizona – The Atlas Project

4. Weighty Arguments
4.1 Immovable Spacetime
My argument for the notion of space being really independent of body is
founded on the possibility of the material universe being finite and moveable. 'Tis not enough for this learned writer [Leibniz] to reply that he
thinks it would not have been wise and reasonable for God to have made
the material universe finite and moveable… Neither is it sufficient barely
to repeat his assertion that the motion of a finite material universe would
be nothing, and (for want of other bodies to compare it with) would
produce no discoverable change, unless he could disprove the instance
which I gave of a very great change that would happen, viz., that the parts
would be sensibly shocked by a sudden acceleration or stopping of the
motion of the whole: to which instance, he has not attempted to give any
answer.
Samuel Clarke, 1716
Although the words "relativity" and "relational" share a common root, their meanings are
quite different. The principle of relativity asserts that for any material particle in any state
of motion there exists a system of space and time coordinates in terms of which the
particle is instantaneously at rest and inertia is homogeneous and isotropic. Thus the
natural (inertial) decomposition of spacetime intervals into temporal and spatial
components can be defined only relative to some particular frame of reference. Of course,
the absolute spacetime intervals themselves are invariant, so the "relativity" refers only to
the analytical decomposition of these intervals. (The physical significance of this
particular decomposition is that the quantum phase of any object evolves in proportion to
its "natural" temporal coordinate.) In contrast, the principle of relationism asserts that the
absolute intervals between material objects fully characterize their extrinsic positional
status, without reference to any underlying non-material system of reference which might
be called "absolute space".
The traditional debate between proponents of relational and absolute motion (such as
Leibniz and Clarke, respectively) is of questionable relevance if continuous fields are
accepted as extended physical entities, permeating all of space, because this implies there
are no unoccupied locations. In this context every point in the entire spacetime manifold
is a vertex of actual relations between physical entities, obscuring the distinction between
absolute and relational premises. Moreover, in the context of the general theory of
relativity, the metrical properties of spacetime itself constitute a field, i.e., an extended
physical entity, which not only acts upon material objects but is also acted upon by them,
so the absolute-relational distinction has no clear meaning. However, it remains possible
to regard fields as only representations of effects, and to insist on materiality for
ontological objects, in which case the absolute-relational question remains both relevant
and unresolved.
Physicists have always recognized the appeal of a purely relational theory of motion, but
every such theory has foundered on the same problem, namely, the physicality of
acceleration. For example, one of Newton’s greatest challenges was to account for the
fact that the Moon is relationally stationary with respect to the Earth (i.e., the distance
between Earth and Moon is roughly unchanging), whereas it ought to be accelerating
toward the Earth due to the influence of gravity. What is holding the Moon up? Or, to put
the question differently, why is the Moon not accelerating directly toward the Earth in
accord with the gravitational force that is presumably being applied to it? Newton's
brilliant answer was that the Moon is indeed accelerating directly toward the Earth, and
with precisely the magnitude of acceleration predicted by his gravity formula, but that the
Moon is also moving perpendicularly to the Earth-Moon axis, with a velocity v = ωR,
where R is the Earth-Moon distance and ω is the Moon's angular velocity, i.e., roughly 2π
radians/month. If it were not accelerating toward the Earth, the Moon would just wander
off tangentially away from the Earth, but the force of gravity is modifying its velocity by
adding GM/R² ft/sec toward the Earth each second, which causes the Moon to turn
continually in a roughly circular orbit around the Earth. The centripetal acceleration of an
object revolving in a circle is v²/R = ω²R, and so (Newton reasoned) this must equal the
gravitational acceleration. Thus we have ω²R³ = GM, which of course is Kepler's third
law. This explanation depends on a strictly non-relational concept of motion. In fact, it
might be said that this was the crucial insight of Newtonian dynamics - and it applies no
less in the special theory of relativity. For the purposes of dynamical analysis, motion
must be referred to an absolute background class of rectilinear inertial coordinate
systems, rather than simply to the relations between material bodies, or even classical
fields. Thus we cannot infer everything important about an object's state of motion
simply from its distances to other objects (at least not to nearby objects). In this sense,
both Newtonian and relativistic physics find it necessary to invoke absolute space.
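As a rough numerical check of the relation ω²R³ = GM, the following sketch plugs in modern round values for the Earth-Moon system. The constants (Earth's gravitational parameter, the mean Earth-Moon distance, and the length of the sidereal month) are standard reference figures supplied here for illustration; they are not taken from the text. The circular-orbit treatment also ignores the Moon's own mass and the eccentricity of its orbit, so only approximate agreement should be expected.

```python
import math

# Standard reference values (assumptions for illustration, not from the text).
GM_EARTH = 3.986004e14             # m^3/s^2, Earth's gravitational parameter
R_MOON = 3.844e8                   # m, mean Earth-Moon distance
SIDEREAL_MONTH = 27.32 * 86400.0   # s, time for one revolution of the Moon

omega = 2.0 * math.pi / SIDEREAL_MONTH   # angular velocity: 2*pi radians per month
v = omega * R_MOON                       # tangential speed, v = omega * R

centripetal = v ** 2 / R_MOON            # v^2 / R, which equals omega^2 * R
gravitational = GM_EARTH / R_MOON ** 2   # GM / R^2

# Kepler's third law in the form omega^2 * R^3 = GM; the ratio should be near 1.
ratio = (omega ** 2 * R_MOON ** 3) / GM_EARTH

print(f"centripetal acceleration   = {centripetal:.6e} m/s^2")
print(f"gravitational acceleration = {gravitational:.6e} m/s^2")
print(f"omega^2 R^3 / GM           = {ratio:.4f}")
```

With these inputs the two accelerations agree to within roughly one percent, which is about what the neglected effects would lead one to expect.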
But this concept of absolute space presents us with an ontological puzzle, because we can
empirically verify the physical equivalence of all uniform states of motion, which
suggests that position and velocity have no absolute physical significance, and yet we can
also verify that changes in velocity (i.e., accelerations) do have absolute significance,
independent of the relations between material bodies (at least in a local sense). If the
evident relativity of position and velocity leads us to discard the idea of absolute space,
then how are we to understand the apparent absoluteness of acceleration? Some have
argued that in order for the change in something to be ontologically real, it is necessary
for the thing itself to be real, but of course that's not the case. It's perfectly possible for
"the thing itself" to be an artificial conception, whereas the "change" is the ontological
entity. For example, the Newtonian concept of the physical world is a set of particles,
between which relations exist. The primary ontological entities are the particles, but it's
equally possible to imagine that the separations are the "real" entities, and particles are
merely abstract entities, i.e., a convenient bookkeeping device for organizing the facts of
a set of separations. This raises some interesting questions, such as whether an unordered
multiset of n(n−1)/2 separations suffices to uniquely determine a configuration of n
points in a space of fixed dimension. It isn't difficult to find examples of multisets of
separations that allow for multiple distinct spatial arrangements. For example, given the
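multiset of ten separations
[list of separations not reproduced]
we can construct either of the two five-point configurations shown below.
[Figure: two five-point configurations with the same multiset of separations]
For another example, the following three distinct configurations of eight co-planar points
each have the same multiset of 28 point-to-point separations:
[Figure: three eight-point configurations with the same multiset of separations]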
In fact, of the 12870 possible arrangements of eight points on a 4x4 grid, there are only
1120 distinct multisets of separations. Much of this reduction is due to rotations and
reflections, but not all. Intrinsically distinct configurations of points with the same
multiset of distances are not uncommon. They are sometimes called isospectral sets,
referring to the spectrum of point-to-point distances. Examples such as these may suggest
that unordered separations cannot be the basis of our experience, although we can't rule
out, a priori, the possibility that our interpretation of experience is non-unique, and that
different states of consciousness might perceive a given physical configuration
differently. Even if we reject the possibility of non-unique mapping to our conventional
domain of objects, we could still imagine a separation-based ontology by stipulating an
ordering for those separations. (One hypothetical form which laws of separation might
take is discussed in Section 4.2.) By recognizing the need to specify this ordering, our
focus shifts back to a particle-based ontology.
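The counting argument above can be explored with a short brute-force sketch. It does two things. First, it checks a well-known one-dimensional pair of six-point sets that share all fifteen separations without being mirror images or translates of one another; this pair is a stand-in chosen for illustration and is not one of the configurations referred to in the text. Second, it enumerates every way of placing eight points on a 4x4 grid and counts the distinct multisets of separations, so the result can be compared with the figures quoted above. Squared distances are used so that all arithmetic stays in integers; the function and variable names are hypothetical.

```python
from itertools import combinations

def separation_multiset(points):
    """Return the sorted tuple of squared distances over all pairs of points."""
    return tuple(sorted(
        sum((a - b) ** 2 for a, b in zip(p, q))
        for p, q in combinations(points, 2)
    ))

# A one-dimensional example (illustrative assumption, not from the text).
A = [0, 1, 4, 10, 12, 17]
B = [0, 1, 8, 11, 13, 17]
same_multiset = (separation_multiset([(x,) for x in A])
                 == separation_multiset([(x,) for x in B]))
# Both sets span 0..17, so the only rigid motions to rule out are the
# identity and the reflection x -> 17 - x.
mirror_of_A = sorted(17 - x for x in A)
congruent = (A == B) or (mirror_of_A == B)

# Eight points chosen from the sixteen sites of a 4x4 grid.
grid = [(x, y) for x in range(4) for y in range(4)]
n_arrangements = 0
multisets = set()
for subset in combinations(grid, 8):
    n_arrangements += 1
    multisets.add(separation_multiset(subset))
n_multisets = len(multisets)

print("1-D pair shares its separations:", same_multiset)
print("1-D pair is congruent:", congruent)
print("arrangements of 8 points on a 4x4 grid:", n_arrangements)
print("distinct multisets of separations:", n_multisets)
```

Grouping arrangements by their sorted squared distances treats rotated, reflected, and translated copies as identical automatically, which is why the second count is so much smaller than the first.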
As noted previously, according to both Galilean and Einsteinian (special) relativity,
position and velocity are relative but acceleration is not. However, it can be argued that
the absoluteness of acceleration is incongruous with Galilean spacetime, because if
spacetime was Galilean there would be no reason for acceleration to be absolute. This
was already alluded to in the discussion of Section 1.8, where the cyclic symmetry of the
velocity relations between three Galilean reference systems was noted. In a sense, the
relationist Leibniz was correct in asserting that absolute space and time are inconsistent
with Galilean relativity, citing the “principle of sufficient reason” in support of this claim.
If time and space are separate and distinct (which no one had ever disputed) then there
would be no observable distinction between accelerated and un-accelerated systems of
reference, as revealed by the fact that the concept of a moveable rigid body of arbitrary
size is perfectly consistent with the kinematics of Galilean relativity. Samual Clarke had
argued that if all the material in some finite universe was accelerated in tandem,
maintaining all the intrinsic relations between the particles, this acceleration would still
be physically real, even though no one could observe the acceleration (for lack of
anything to compare with it). Leibniz replied
Motion does not indeed depend upon being observed, but it does depend upon
being possible to be observed. There is no motion when there is no change that
can be observed. And when there is no change that can be observed, there is no
change at all. The contrary opinion is grounded upon the supposition of a real
absolute space, which I have demonstratively confuted by the principle of the
want of a sufficient reason of things.
It is quite right that, in the context of Galilean relativity, the acceleration of all the matter
of the universe in tandem would be strictly unobservable, so Leibniz has a valid point.
However, barring some Machian long-range influence which neither Clarke nor Leibniz
seems to have imagined, the same argument implies that inertia should not exist at all.
Thus Clarke was correct in pointing out that the very existence of inertia refutes Leibniz’s
position. There is indeed an observable distinction between uniform and accelerated motion, i.e.,
inertia does exist. In summary, Leibniz was correct in (effectively) claiming that the
existence of inertia is logically incompatible with the Galilean concept of space and time,
whereas Clarke was correct in pointing out that inertia does actually exist. The only way
out of this impasse would have been to discard the one premise that neither of them ever
questioned, namely, the Galilean concept of space and time. It was to be another 200
years before a viable alternative to Galilean spacetime was recognized.
As explained in Section 1, the spacetime structures of Galileo and Minkowski are
formally identical if the characteristic constant c of the latter is infinite. In that case it
follows that arbitrarily large rigid bodies are possible, so it is conceivable for all the
material in an arbitrarily large region to accelerate in tandem, maintaining all the same
intrinsic spatial relations. However, if c has some finite value, this is no longer the case.
Section 2.9 described the kinematic limitation on the size of a spatial region in which
objects can be accelerated in tandem. Hence the structure of Minkowski spacetime
intrinsically distinguishes uniform motion as the only kind of motion that could be
applied in tandem to all objects throughout space. In this context, Leibniz’s principle of
sufficient reason can be used to argue that different states of uniform motion should not
be regarded as physically different, but it cannot be applied to accelerated motion,
because the very kinematics of Minkowski spacetime do not permit the tandem
acceleration of objects over arbitrarily large regions. It seems justifiable to say that the
existence of inertia implies the Minkowski character of spacetime.
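To give a sense of scale for the kinematic limitation mentioned above, here is a minimal sketch. Section 2.9 is not reproduced here, so the sketch assumes the limit in question is the familiar bound for rigid (Born-rigid) acceleration: if the leading point of a body has proper acceleration a, a point a proper distance d behind it must accelerate at a/(1 − ad/c²), which diverges at d = c²/a. The function names and the choice of one standard gravity as the example acceleration are illustrative assumptions.

```python
C = 299_792_458.0        # m/s, speed of light
G0 = 9.80665             # m/s^2, standard gravity (example acceleration)
LIGHT_YEAR = 9.4607e15   # m

def max_trailing_extent(a_lead):
    """Greatest proper distance behind the leading point that can share the motion."""
    return C ** 2 / a_lead

def trailing_acceleration(a_lead, d):
    """Proper acceleration required at proper distance d behind the leading point."""
    if d >= max_trailing_extent(a_lead):
        raise ValueError("no finite acceleration keeps this point in tandem")
    return a_lead / (1.0 - a_lead * d / C ** 2)

limit = max_trailing_extent(G0)
extent_ly = limit / LIGHT_YEAR
halfway = trailing_acceleration(G0, 0.5 * limit)

print(f"limit for a = 1 g: {limit:.4e} m = {extent_ly:.3f} light-years")
print(f"acceleration needed halfway to the limit: {halfway / G0:.3f} g")
```

For an acceleration of one g the limiting distance comes out a little under one light-year, and it grows without bound as c is taken to infinity, which recovers the Galilean case of arbitrarily large rigid bodies.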
This goes some way towards resolving the epistemological problems that have often been
raised against the principle of inertia. To the question “How are we to distinguish the
inertial coordinate systems from all possible systems of reference?”, we can answer that
the inertial coordinate systems are precisely those in terms of which two objects
separated by an arbitrary distance can be accelerated in tandem. This doesn’t help to
identify inertial coordinate systems in Galilean spacetime, but it fully identifies them in
the context of Minkowski spacetime. So, it can be argued that (from an epistemological
standpoint) Minkowski spacetime is the only satisfactory framework for the principle of
inertia.
Still, there remain some legitimate open issues regarding any (so far) conceived
relativistic spacetime. According to both classical and special relativity, the inertial
coordinate systems are fully symmetrical, and each one is regarded as physically
equivalent (in the absence of matter). In particular, we cannot single out one particular
inertial system and claim that it is the "central" frame, because the equivalence class has
no center, and all ontological qualities are uniformly distributed over the entire class.
Unfortunately, from a purely formal standpoint, a purported uniform distribution over
inertial frames is somewhat problematic, because the inertial systems of reference along a
single line can only be linearly parameterized in terms of a variable that ranges from −∞
to +∞, such as q = log((1+v)/(1-v)), but if each value of q is to be regarded as equally
probable, then we are required to imagine a perfectly uniform density distribution over
the real numbers. Mathematically, no such distribution exists. To illustrate, imagine
trying to select a number randomly from a uniform distribution of all the real numbers.
This is the source of many well-known mathematical conundrums, such as the "High-Low Number" strategy game, whose answer depends on the fact that no perfectly
uniform distribution exists over the real numbers (nor even over the integers). In trying to
understand whether there was any arbitrary choice in the creation of the physical world,
it’s interesting to note that the selection of our particular rest frame cannot have been
perfectly arbitrary from a set of pre-existing alternatives. It might be argued that the
impossibility of a choice between indistinguishable inertial reference frames implies that
only an absolutist framework is intelligible. However, the identity of indiscernibles led
Leibniz and Mach to argue just the opposite, i.e., that the only intelligible way to imagine
the existence of objects, all in roughly the same frame of reference within a perfectly
symmetrical class of possible reference systems, is to imagine that the objects themselves
are in some way responsible for the class, which brings us back to pure relationism.
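The sense in which q = log((1+v)/(1-v)) parameterizes the inertial frames "linearly" can be illustrated with a small sketch. Taking v in units of c (an assumption of the sketch) and using the standard relativistic composition of collinear velocities, the parameter q of the composed velocity is simply the sum of the individual q values, so a distribution that favored no frame would have to be uniform in q over the whole real line. The function names are hypothetical.

```python
import math

def q_of_v(v):
    """The parameter q = log((1+v)/(1-v)) for a speed v with |v| < 1 (c = 1)."""
    return math.log((1.0 + v) / (1.0 - v))

def compose(u, v):
    """Relativistic composition of two collinear velocities (c = 1)."""
    return (u + v) / (1.0 + u * v)

u, v = 0.5, 0.8
w = compose(u, v)   # (0.5 + 0.8) / (1 + 0.4) = 13/14

# Composition of velocities corresponds to addition of q.
additive = math.isclose(q_of_v(w), q_of_v(u) + q_of_v(v))

# q grows without bound as the speed approaches 1.
samples = [(s, q_of_v(s)) for s in (0.9, 0.99, 0.999, 0.9999)]

print("w =", w)
print("q(w) == q(u) + q(v):", additive)
for s, q in samples:
    print(f"v = {s}: q = {q:.3f}")
```

Here q(0.5) = log 3 and q(0.8) = log 9, and the composed velocity 13/14 gives q = log 27, the sum of the two.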
Alas, as we’ve seen, pure relationism has its own problematic implications. For one, there
has traditionally been a close association between relationism and the concept of absolute
simultaneity. This is because the “relations” were regarded as purely spatial, and it was
necessary to posit a unique instant of time in which to evaluate those spatial relations. To
implement a spatial relationist theory in the framework of Minkowski spacetime would
evidently require that whatever laws apply to the spatial relations for one particular
decomposition of spacetime must also apply to all other decompositions. (A simple
example of this is discussed in Section 4.2.) Alternatively, we might say that only
invariant quantities should be subject to the relational laws, but this amounts to the same
thing as requiring that the laws apply to all decompositions.
One common feature of all purely relational models based on Galilean space and time is
their evident non-locality, because (as noted above) there is no way, if we limit ourselves
to local observations, to identify the inertial motions of material objects purely from the
kinematical relations between them. We're forced to attribute the distinction between
inertial and non-inertial motion to some non-material (or non-local) interaction. This is
nicely illustrated by Einstein's thought experiment (based on Newton's famous "spinning
pail") involving two nominally identical fluid globes S1 and S2 floating in an empty
region of space. One of these globes is set rotating (about their common axis) while the
other remains stationary. The rotating globe assumes an oblate shape due to its rotation.
If the globes are mutually stationary and not rotating, they are both spherical and
symmetrical, and we cannot distinguish between them, but if one of the globes is
spinning about their common axis, the principle of inertia leads us to expect that the
spinning globe will bulge at the "equator" and shrink along its axis of rotation due to the
centripetal forces. The "paradox" (for the relationist) is that each globe is spinning with
respect to the other, so they must still be regarded as perfectly symmetrical, and yet their
shapes are no longer congruent. To what can we attribute the asymmetry?
If we look further afield we may notice that the deformed globe is rotating relative to all
the distant stars, whereas the spherical globe is not. A little experimentation shows that a
globe's deformation is strictly a function of its speed of rotation relative to the distant
stars, and presumably this is not a mere coincidence. Newton's explanation for this
coincidence was to argue that the local globes and the distant stars all reside in the same
absolute space, and it is this space that defines absolute (inertial) motion, and likewise the
special relativistic theory invokes an absolutely preferred class of reference frames.
Moreover, in the general theory of relativity, when viewed from a specific cosmological
perspective, there is always a preferred frame of reference, owing to the global boundary
conditions that must be imposed in order to single out a solution. This came as a shock to
Einstein himself at first, since he was originally thinking (hoping) that the field equations
of general relativity represented true relationism, but his conversion began when he
received Schwarzschild's exact solution for spherical symmetry, which of course exhibits
a preferred coordinate system such that the metric coefficients are independent of time,
i.e., the usual Schwarzschild coordinates, which are essentially unique for that particular
solution.
Likewise for any given solution there is some globally unique system of reference singled
out by symmetry or boundary conditions (even for asymptotically flat universes, as
Einstein himself showed). For example, in the Friedmann "big bang" cosmologies there is
a preferred global system of coordinates corresponding to the worldlines with respect to
which the cosmic background radiation is isotropic. Of course, this is not a fresh insight.
The non-relational global aspects of general relativistic cosmologies have been
extensively studied, beginning with Einstein's 1917 paper on the subject, and continuing
with Gödel's rotating empty universes, and so on. Such examples make it clear that
general relativity is not a relational theory of motion. In other words, general relativity
does not correlate all physical effects with the relations between material bodies, but
rather with the relations between objects (including fields) and the absolute background
metric, which is affected by, but is not determined by, the distribution of objects (except
arguably in closed cosmological models). Thus relativity, no less than Newtonian
mechanics, relies on spacetime as an absolute entity in itself, exerting influence on fields
and material bodies. The extra information contained in the metric of spacetime is
typically introduced by means of boundary conditions or "initial values" on a spacelike
foliation, sufficient to fix a solution of the field equations.
In this way relativity very quickly disappointed its early logical-positivist supporters
when it became clear that it was not, and never had been, a relational theory of motion, in
the sense of Leibniz, Berkeley, or Mach. Initially even Einstein was disturbed by the
Schwarzschild and de Sitter solutions (see Section 7.6), which represent complete
metrical manifolds with only one material object or none at all (respectively). These
examples showed that spacetime in the theory of relativity cannot simply be regarded as
the totality of the extrinsic relations between material objects (and non-gravitational
fields), but is a primary physical entity of the theory, with its own absolute properties,
most notably the metric with its related invariants, at each point. Indeed this was
Einstein's eventual answer to Mach's critique of pre-relativity physics. Mach had
complained that it was unacceptable for our theories to contain elements (such as
spacetime) that act on (i.e., have an effect on) other things, but that are not acted upon by
other things. Mach, and the other relationalists before him, naturally expected this to be
resolved by eliminating spacetime, i.e., by denying that an entity called "spacetime" acts
in any physical way. To Mach's surprise (and unhappiness), the theory of relativity
actually did just the opposite - it satisfied Mach's criticism by instead making spacetime a
full-fledged element of theory, acted upon by other objects. By so doing, Einstein
believed he had responded to Mach's critique, but of course Mach hated it, and said so.
Early in his career, Einstein was sympathetic to the idea of relationism, and entertained
hopes of banishing absolute space from physics but, like Newton before him, he was
forced to abandon this hope in order to produce a theory that satisfactorily represents our
observations.
The absolute significance of spacetime in the theory of relativity was already obvious
from trivial considerations of the special theory. The twins paradox is a good illustration
of why relativity cannot be a relational theory, because the relation between the twins is
perfectly symmetrical, i.e., the spatial distance between them starts at zero, increases to
some maximum value, and then decreases back to zero. The distinction between the twins
cannot be expressed in terms of their mutual relations to each other, but only in terms of
how each of their individual worldlines is embedded in the absolute metrical manifold
of spacetime. This becomes even more obvious in the context of general relativity,
because we can then have multiple distinct geodesic paths between two given events,
with different lapses of proper time, so we cannot even appeal to any difference in "felt"
accelerations or local physics of any kind along the two world-paths to account for the
asymmetry. Hopes of accounting for this asymmetry by reference to the distant stars, à la
Mach, were certainly not fulfilled by general relativity, according to which the metric of
spacetime is conditioned by the presence of matter, but only to a very slight degree in
most circumstances. From an overall cosmological standpoint we are unable to attribute
the basic inertial field to the configuration of mass and energy, and we have no choice but
to simply assume a plausible absolute inertial background field, just as in Newtonian
physics, in order to actually make predictions and solve problems. This is necessarily a
separate and largely independent stipulation from our assumed distribution of matter and
energy.
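The special-relativistic version of the twins example can be made concrete with a minimal sketch. It computes the lapse of proper time along a piecewise-inertial worldline by summing sqrt(Δt² − Δx²) over its segments, in units with c = 1. The particular trip (outbound and inbound at 0.6c, ten years of coordinate time in the stay-at-home frame) and the function name are illustrative assumptions, not values from the text.

```python
import math

def proper_time(events):
    """Proper time along a piecewise-inertial worldline given as (t, x) events, c = 1."""
    total = 0.0
    for (t0, x0), (t1, x1) in zip(events, events[1:]):
        dt, dx = t1 - t0, x1 - x0
        total += math.sqrt(dt * dt - dx * dx)   # invariant interval of a timelike segment
    return total

# Events are given in the inertial coordinates of the stay-at-home twin.
home = [(0.0, 0.0), (10.0, 0.0)]
traveler = [(0.0, 0.0), (5.0, 3.0), (10.0, 0.0)]   # out and back at speed 0.6

tau_home = proper_time(home)
tau_traveler = proper_time(traveler)

print("proper time, stay-at-home twin:", tau_home)
print("proper time, traveling twin:   ", tau_traveler)
```

The separation between the twins rises from zero to three and falls back to zero, yet the elapsed proper times come out as 10 and 8: the difference is fixed by how each worldline sits in the metric, not by the mutual distance alone.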
To understand why Galilean relativity is actually more relational than special relativity,
note that the unified spacetime manifold with the lightcone structure of Minkowski
spacetime is more rigid than a pure Cartesian product of a three-dimensional spatial
manifold and an independent one-dimensional temporal manifold. In Galilean spacetime
at a spatial point P₀ and time t₀ there is no restriction at all on the set of spatial points at t₀
+ dt that may "spatially coincide with P₀" with respect to some valid inertial frame of
reference. In other words, an inertial worldline through P₀ at time t₀ can pass through any
point in the entire universe at time t₀ + dt for any positive dt. In contrast, the lightcone
structure of Minkowski spacetime restricts the future of the point P₀ to points inside the
future null cone, i.e., P₀ ± c dt, and as dt goes to zero, this range goes to zero, imposing a
well-defined unique connection from each "infinitesimal" instant to the next, which of
course is what the unification of space and time into a single continuum accomplishes.
We referred above to Newtonian spacetime without distinguishing it from what has come
to be called Galilean spacetime. This is because Newton's laws are manifestly invariant
under Galilean transformations, and in view of this it would seem that Newton should be
counted as an advocate of relativistic spacetime. However, in several famous passages of
the first Scholium of the Principia Newton seems to reject the very relativity on which his
physics is founded, and to insist on distinctly metaphysical conceptions of absolute space
and time. He wrote
I do not define the words time, space, place, and motion, since they are well
known to all. However, I note that people commonly conceive of these quantities
solely in terms of the relations between the objects of sense perception, and this is
the source of certain preconceptions, for the dispelling of which it is useful to
distinguish between absolute and relative, true and apparent, mathematical and
common.
It isn't trivial to unpack the intended significance of these statements, especially because
Newton has supplied three alternate names for each of the two types of quantities that he
wishes us to distinguish. On one hand we have absolute, true, mathematical quantities,
and on the other we have relative, apparent, common quantities. The latter are understood
to be founded on our sense perceptions, so the former presumably are not, which seems
to imply that they are metaphysical. However, Newton also says that this distinction is
useful for dispelling certain prejudices, which suggests that his motives are utilitarian
and/or pedagogical rather than to establish an ontology. He continues
Absolute, true, and mathematical time, in and of itself and of its own nature flows
uniformly (equably), without reference to anything external. By another name it is
called duration. Relative, apparent, and common time is any sensible external
measure of duration by means of motion. Such measures (for example, an hour, a
day, a month, a year) are commonly used instead of true time.
Absolute space, in its own nature, without relation to anything external, remains
always similar and immovable. Relative space is some movable measure of
absolute space, which our senses determine by the positions of bodies... Absolute
and relative space are of the same type (species) and magnitude, but are not
always numerically the same...
Place is a part of space which a body takes up, and is according to the space either
absolute or relative.
Absolute motion is the translation of a body from one absolute place to another,
and relative motion is the translation from one relative place to another.
Newton's insistence on the necessity of referring all true motions to "immovable space"
has often puzzled historians of science, because his definitions of absolute space and time
are plainly metaphysical, and it's easy to see that Newton's actual formulation of the laws
of physics is invariant under Galilean transformations, and the concept of absolute space
plays no role. Indeed, each mention of a "state of rest" in the definitions and laws is
accompanied by the phrase "or uniform motion in a right line", so the system built on
these axioms explicitly does not distinguish between these two concepts. What, then, did
Newton mean when he wrote that true motions must be referred to immovable space?
The introductory Scholium ends with a promise to explain how the true motions of
objects are to be determined, declaring that this was the purpose for which the Principia
was composed, so it's all the more surprising when we find that the subject is never even
mentioned in Books I or II. Only in the concluding Book III, "The System of the World",
does Newton return to this subject, and we finally learn what he means by "immovable
space". Although his motto was "I frame no hypotheses" we find, immediately following
Proposition X in Book III (in the third edition) the singular hypothesis
HYPOTHESIS I: That the centre of the system of the world is immovable.
In support of this remarkable assertion, Newton simply says "This is acknowledged by
all, although some contend that the earth, others that the sun, is fixed in that centre." In
the subsequent proposition XI we finally discover Newton's immovable space. He writes
PROPOSITION XI: That the common centre of gravity of the earth, the sun, and
all the planets, is immovable. For that centre either is at rest or moves uniformly
forwards in a right line; but if that centre moved, the center of the world would
move also, against the Hypothesis.
This makes it clear that Newton's purpose all along has been not to deny Galilean
relativity or the fundamental principle of inertia, but simply to show that a suitable
system of reference for determining true inertial motions need not be centered on some
material body. This was foreshadowed in the first Scholium when he wrote "it may be
that there is no body really at rest, to which the places and motions of others may be
referred". Furthermore, he notes that many people believed the immovable center of the
world was at the center of the Earth, whereas others followed Copernicus in thinking the
Sun was the immovable center. Newton evidently (and rightly) regarded it as one of the
most significant conclusions of his deliberations that the true inertial center of the world
was in neither of those objects, but is instead the center of gravity of the entire solar
system. We recall that Galileo found himself in trouble for claiming that the Earth moves,
whereas both he and Copernicus believed that the Sun was absolutely stationary. Newton
showed that the Sun itself moves, as he continues
PROPOSITION XII: That the sun is agitated by a continual motion, but never
recedes far from the common centre of gravity of all the planets. For since the
quantity of matter in the sun is to the quantity of matter in Jupiter as 1067 to 1,
and the distance to Jupiter from the sun is to the semidiameter of the sun in a
slightly greater proportion, the common center of gravity of Jupiter and the sun
will fall upon a point a little without the surface of the sun.
This was certainly a magnificent discovery, worthy of being called the purpose for which
the Principia was composed, and it is clearly what Newton had in mind when he wrote
the introductory Scholium promising to reveal how immovable space (i.e., the center of
the world) is to be found. In this context we can see that Newton was not claiming the
ability to determine absolute rest, but rather the ability to infer from phenomena a state of
absolute inertial motion, which he identified with the center of gravity of the solar
system. He very conspicuously labels as a Hypothesis (one of only three in the final
edition of the Principia) the conventional statement, "acknowledged by all", that the
center of the world is immovable. By these statements he was trying to justify calling the
solar system's inertial center the center of the world, while specifically acknowledging
that the immovability of this point is conventional, since it could just as well be regarded
as moving "uniformly forwards in a right line".
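The arithmetic behind Proposition XII can be checked in a few lines. The following is only a sketch: the mass ratio 1067 is Newton's figure as quoted above, while the Sun-Jupiter distance and the solar radius are assumed modern round values that do not appear in the text.

```python
# Mass ratio Sun : Jupiter as quoted from Proposition XII.
SUN_TO_JUPITER_MASS_RATIO = 1067.0

# Assumed modern round values, not taken from the text (kilometres).
SUN_JUPITER_DISTANCE_KM = 7.785e8
SUN_RADIUS_KM = 6.957e5


def barycenter_offset(mass_ratio, separation):
    """Distance from the heavier body to the two-body centre of gravity."""
    return separation / (1.0 + mass_ratio)


offset_km = barycenter_offset(SUN_TO_JUPITER_MASS_RATIO, SUN_JUPITER_DISTANCE_KM)
offset_in_solar_radii = offset_km / SUN_RADIUS_KM
distance_in_solar_radii = SUN_JUPITER_DISTANCE_KM / SUN_RADIUS_KM

print(f"distance / solar radius : {distance_in_solar_radii:.0f}")
print(f"barycenter offset       : {offset_in_solar_radii:.3f} solar radii")
```

With these inputs the distance ratio comes out somewhat larger than the mass ratio, so the common centre of gravity falls a little outside the solar surface, as the Proposition states.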
The modern confusion over Newton's first Scholium arises from trying to impose an
ontological interpretation on a 17th century attempt to isolate the concept of pure inertia,
and incidentally to locate the "center of the world". It was essential for Newton to make
sure his readers understood that "uniform motion" and "right lines" cannot generally be
judged with reference to neighboring bodies (such as the Earth's spinning surface),
because those bodies themselves are typically in non-uniform motion. Hence he needed
to convey the fact that the seat of inertia is not the Earth's center, or the Sun, or any other
material body, but is instead absolute space and time - in precisely the same sense that
spacetime is absolute in special relativity. This is distinct from asserting an absolute state
of rest, which Newton explicitly recognized as a matter of convention.
Indeed, we now know the solar system itself revolves around the center of the galaxy,
which itself moves with respect to other galaxies, so under Hypothesis I we must
conclude that Proposition XI is strictly false. Nevertheless, the deviations from true
inertial motion represented by those stellar and galactic motions are so slight that
Newton's "immovable center of the world" is still suitable as the basis of true inertial
motion for nearly all purposes. In a more profound sense, the concept of "immoveable
space" has been carried over into modern relativity because, as Einstein said, spacetime in
general relativity is endowed with physical qualities that enable it to establish the local
inertial frames, but "the idea of motion may not be applied to it".
4.2 Inertial and Gravitational Separations
And I am dumb to tell a weather’s wind
How time has ticked a heaven round the stars.
Dylan Thomas, 1934
The special theory of relativity is formulated as a local theory, so its natural focus is on
the worldlines of individual particles. In addition, special relativity presupposes a
preferred class of worldlines, those representing inertial motion. The idea of a worldline
is inherently “absolute” in the sense that it is nominally defined with reference only to a
system of space and time coordinates, not to any other objects. This is in contrast to a
truly relational theory, which would take the "dual" approach, and regard the separations
between particles as the most natural objects of study. In fact, as mentioned in Section
4.1, we could go to the relationist extreme of regarding separations as the primary
ontological entities, and considering particles to be merely abstract concepts that we use
to psychologically organize and coordinate the separations. The relationist view arguably
has the advantage of not presupposing a fixed background or even a definite
dimensionality of space, since each “separation” could be considered to represent an
independent degree of freedom. Of course, this freedom doesn’t seem to exist in the real
world, since we cannot arrange five particles all mutually equidistant from each other.
Indeed it appears that the n(n−1)/2 separations between n particles can be fully encoded
as just 3n real numbers, and moreover that those real numbers vary continuously as the
individual particles “move”. This is the justification for the idea of particles moving in a
coherent three-dimensional space.
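The remark about five mutually equidistant particles can be illustrated with the Cayley-Menger determinant, a standard tool that is not mentioned in the text and is brought in here only as an assumption of this sketch. For five points lying in three-dimensional Euclidean space, the bordered determinant built from their ten squared separations must vanish, so a non-zero value shows that the separations cannot be realized in three dimensions.

```python
from fractions import Fraction


def determinant(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    det = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n):
                m[r][c] -= factor * m[col][c]
    return det


def cayley_menger(squared_distances):
    """Determinant of the squared-distance matrix bordered by ones."""
    n = len(squared_distances)
    top = [0] + [1] * n
    rows = [[1] + list(squared_distances[i]) for i in range(n)]
    return determinant([top] + rows)


def equidistant(n):
    """Squared separations of n points that are all at unit distance."""
    return [[0 if i == j else 1 for j in range(n)] for i in range(n)]


cm4 = cayley_menger(equidistant(4))   # regular tetrahedron: allowed in 3D
cm5 = cayley_menger(equidistant(5))   # must be zero to fit in 3D
print(cm4, cm5)
```

Four equidistant points give a positive value (a genuine tetrahedron), while five give a non-zero value, so those ten separations are not mutually independent degrees of freedom in a three-dimensional space.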
Nevertheless, it’s interesting to examine the spatial separations that exist between
material particles (as opposed to the space and time coordinates of individual particles),
to see if their behavior can be characterized in a simple way. From this point of view, the
idea of "motion" is secondary; we simply regard separations as abstract entities having
certain properties that may vary with time. In this context, rather than discussing inertial
motion of an individual particle, we consider the spatial separation (as a function of time)
between two inertial particles. However, since we don’t presuppose a background of
absolute inertial motion, we will refer to the particles as being “co-inertial”, meaning
simply that the spatial separation between them behaves like the separation between two
particles in absolute inertial motion, regardless of whether the two particles are actually
in absolute inertial motion.
Is it possible to characterize in a simple way the spatial separations that exist between co-inertial particles? Consider, for example, the spatial separation s(t) as a function of time
between a stationary particle and a particle moving uniformly in a straight line through
space, as depicted in the figure below for the condition when the direction of motion of
the moving particle B is perpendicular to the displacement from the stationary particle A.
Obviously the separation between objects A and B in this configuration is stationary at
this instant, i.e., we have ds/dt = 0, and yet we know from experience that this physical
situation is distinct from one in which the two objects are actually stationary with respect
to each other’s inertial rest frames. For example, the Moon and Earth are separated by
roughly a constant distance, and yet we understand that the Moon is in constant motion
perpendicular to its separation from the Earth. It is this transverse motion that counteracts the effect of gravity and keeps the Moon in its orbit. This is another reason that we
ordinarily find it necessary to describe motion not in purely relational terms, but in terms
of absolutely non-rotating systems of inertial coordinates. Of course, as Mach observed,
the apparent existence of “absolute rotation” doesn’t necessarily refute relationism as a
viable basis for coordinating events. It could also mean that we must take more relations
into account. (For example, the Moon’s motion is always tangential to the Earth, but it is
not always tangential to other bodies, so its orbital motion does show up in the totality of
binary separations.) Whether or not a workable physics could be developed on a purely
relational basis is unclear, but it’s still interesting to examine the class of co-inertial
separations as functions of time. It turns out that co-inertial separations are characterized
by a condition that is nearly identical to the condition for linear gravitational free-fall, as
well as for certain other natural kinds of motion.
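The three orthogonal components x, y, and z of the separation between two particles in
unaccelerated motion relative to a common reference frame must be linear functions of
time, i.e.,
$$ x(t) = a_x t + b_x, \qquad y(t) = a_y t + b_y, \qquad z(t) = a_z t + b_z $$
where the coefficients $a_i$ and $b_i$ are constants. Therefore the magnitude of any "co-inertial
separation" is of the form
$$ s(t) = \sqrt{A t^2 + B t + C} $$
where
$$ A = a_x^2 + a_y^2 + a_z^2, \qquad B = 2(a_x b_x + a_y b_y + a_z b_z), \qquad C = b_x^2 + b_y^2 + b_z^2 $$
Letting the subscript n denote the nth derivative with respect to time (so that $s_0 = s$), the first two derivatives
of s(t) are
$$ s_1 = \frac{2At + B}{2 s_0}, \qquad s_2 = \frac{4AC - B^2}{4 s_0^3} $$
The right hand equation shows that $s_2 s_0^3 = k$ for a constant k, and we can differentiate this again and
divide the result by $s_0^2$ to show that the separation s(t) between any two particles in
relatively unaccelerated (i.e., co-inertial) motion in Galilean spacetime must satisfy the
equation
$$ s_0 s_3 + 3 s_1 s_2 = 0 \qquad (1) $$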
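As a numerical sanity check, the sketch below estimates the derivatives of a co-inertial separation by central finite differences and evaluates the combination $s_0 s_3 + 3 s_1 s_2$, which should vanish, together with the product $s_2 s_0^3$, which should equal the constant $(4AC - B^2)/4$. The relative velocity and offset are arbitrary illustrative values, and the finite-difference step is an assumption of the sketch.

```python
import math


def separation(t, a, b):
    """Distance between two particles whose relative position is a*t + b."""
    return math.sqrt(sum((ai * t + bi) ** 2 for ai, bi in zip(a, b)))


def first_three_derivatives(f, t, h=1e-2):
    """Central-difference estimates of f, f', f'' and f''' at time t."""
    fm2, fm1, f0, fp1, fp2 = (f(t + j * h) for j in (-2, -1, 0, 1, 2))
    d1 = (fp1 - fm1) / (2 * h)
    d2 = (fp1 - 2 * f0 + fm1) / h ** 2
    d3 = (fp2 - 2 * fp1 + 2 * fm1 - fm2) / (2 * h ** 3)
    return f0, d1, d2, d3


# Arbitrary illustrative relative velocity a and offset b (assumed values).
a = (1.0, -0.5, 0.25)
b = (2.0, 1.0, -3.0)

s0, s1, s2, s3 = first_three_derivatives(lambda t: separation(t, a, b), 0.7)
residual = s0 * s3 + 3 * s1 * s2   # should be close to zero
k = s2 * s0 ** 3                   # should be close to (4AC - B^2)/4

print(f"residual = {residual:.2e}, k = {k:.4f}")
```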
Now we consider the separation that characterizes an isolated non-rotating two-body
system in gravitational free-fall. Assume the two bodies are identical particles, each of
mass m. According to Newtonian theory the inertial and gravitational constraints are
coupled together by the auxiliary quantity called "force" by the following equations
$$ F = \frac{m}{2}\, s_2, \qquad F = -\frac{G m^2}{s_0^2} $$
where G is a universal constant. (Note that each particle's "absolute" acceleration is half
of the second derivative of their mutual separation with respect to time.) Equating these
two forces gives $s_2 s_0^2 = -2Gm$. Differentiating this again and dividing through by $s_0$, we
can characterize non-rotating gravitational free-fall by the purely kinematic equation
$$ s_0 s_3 + 2 s_1 s_2 = 0 \qquad (2) $$
The formal similarity between equations (1) and (2) is remarkable, considering that the
former describes strictly inertial separations and the latter describes gravitational
separations. We can show how the two are related by considering general free motion in a
gravitational field. The Newtonian equations of motion are
$$ \frac{d^2 r}{dt^2} - r\omega^2 = -\frac{m}{r^2}, \qquad r\frac{d\omega}{dt} + 2\frac{dr}{dt}\,\omega = 0 $$
where r is the magnitude of the distance from the center of the field and ω is the angular
velocity of the particle. If we solve the left hand equation for ω and differentiate to give
dω/dt, we can substitute these expressions into the right hand equation and re-arrange the
terms to give
$$ r\frac{d^3 r}{dt^3} + 3\frac{dr}{dt}\frac{d^2 r}{dt^2} + \frac{m}{r^2}\frac{dr}{dt} = 0 $$
which applies (in the Newtonian limit) to arbitrary free paths of test particles in a
gravitational field. Obviously if m = 0 this reduces to equation (1), representing free
inertial separations, whereas for purely radial motion we have $d^2r/dt^2 = -m/r^2$, and so this
reduces to equation (2), representing radial gravitational separation.
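The elimination of ω can be checked numerically. The sketch below assumes the equations of motion take the usual polar form, $\ddot r - r\omega^2 = -m/r^2$ and $r\dot\omega + 2\dot r\omega = 0$, uses them to compute the second and third derivatives of r at randomly chosen states, and evaluates the combination $r\,\dddot r + 3\dot r\ddot r + (m/r^2)\dot r$, which should vanish identically. The sampling ranges are arbitrary.

```python
import random


def residual(r, r_dot, omega, m):
    """r*r''' + 3*r'*r'' + (m/r^2)*r' for a freely moving test particle."""
    r_ddot = r * omega ** 2 - m / r ** 2            # radial equation
    omega_dot = -2.0 * r_dot * omega / r            # angular momentum equation
    # Time derivative of the radial equation:
    r_dddot = (r_dot * omega ** 2
               + 2.0 * r * omega * omega_dot
               + 2.0 * m * r_dot / r ** 3)
    return r * r_dddot + 3.0 * r_dot * r_ddot + m * r_dot / r ** 2


random.seed(0)
worst = max(
    abs(residual(random.uniform(0.5, 5.0),    # r
                 random.uniform(-2.0, 2.0),   # dr/dt
                 random.uniform(-2.0, 2.0),   # omega
                 random.uniform(0.0, 3.0)))   # field strength m
    for _ in range(1000)
)
print(f"largest residual over 1000 random states: {worst:.2e}")
```

Setting m = 0 in `residual` leaves the co-inertial combination of equation (1), and setting omega = 0 leaves the radial free-fall combination of equation (2).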
Other classes of physical separations also satisfy a differential equation similar to (1) and
(2). For example, consider a particle of mass m attached to a rod in such a way that it can
slide freely along the rod. If we rotate the rod about some point P then the particle in
general will tend to slide outward along the rod away from the center of rotation in
accord with the basic equation of motion
$$ s_2 = \omega^2 s_0 $$
where s is the distance from the center of rotation to the sliding particle, and ω is the
angular velocity of the rod (taken to be constant). Differentiating and multiplying through by $s_0$ gives
$$ s_0 s_3 = \omega^2 s_0 s_1 $$
Then since $s_2 = \omega^2 s_0$, we see that s(t) satisfies the equation
$$ s_0 s_3 - s_1 s_2 = 0 \qquad (3) $$
So, we have found that arbitrary co-inertial separations, non-rotating gravitational
separations, and rotating radial separations are all characterized by a differential equation
of the form
$$ s_0 s_3 + N s_1 s_2 = 0 \qquad (4) $$
for some constant N. (Among the other solutions of this equation (with N = −1) are the
elementary transcendental functions $e^t$, sin(t), and cos(t).) Solving for N, to isolate the
arbitrary constant, we have
$$ N = -\frac{s_0 s_3}{s_1 s_2} $$
Differentiating this gives the basic equation
$$ s_1 s_2 \left( s_1 s_3 + s_0 s_4 \right) - s_0 s_3 \left( s_2^2 + s_1 s_3 \right) = 0 $$
If none of $s_0$, $s_1$, $s_2$, and $s_3$ is zero, we can divide each term by all of these to give the
interesting form
$$ \frac{s_1}{s_0} - \frac{s_2}{s_1} - \frac{s_3}{s_2} + \frac{s_4}{s_3} = 0 $$
This could be seen as an (admittedly very simplistic) “unification” of a variety of
physically meaningful spatial separation functions under a single equation. The
“symmetry breaking” that leads to the different behavior in different physical situations
arises from the choice of N, which appears as a constant of integration.
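The role of N can be illustrated with a few functions whose derivatives are known in closed form. The sketch below evaluates $N = -s_0 s_3/(s_1 s_2)$ and the ratio combination $s_1/s_0 - s_2/s_1 - s_3/s_2 + s_4/s_3$ for each. The particular test functions (a co-inertial separation $\sqrt{t^2+1}$, the power law $t^{2/3}$, which satisfies $s_2 s_0^2 = \text{constant}$, and the transcendental functions) are illustrative choices, not taken from the text.

```python
import math


def derivs_exp(t):
    e = math.exp(t)
    return (e, e, e, e, e)


def derivs_sin(t):
    s, c = math.sin(t), math.cos(t)
    return (s, c, -s, -c, s)


def derivs_cos(t):
    s, c = math.sin(t), math.cos(t)
    return (c, -s, -c, s, c)


def derivs_power(t, p):
    """s = t**p and its first four derivatives."""
    out, coeff = [], 1.0
    for n in range(5):
        out.append(coeff * t ** (p - n))
        coeff *= (p - n)
    return tuple(out)


def derivs_coinertial(t):
    """s = sqrt(t^2 + 1) and its first four derivatives."""
    s = math.sqrt(t * t + 1.0)
    return (s, t / s, 1.0 / s ** 3, -3.0 * t / s ** 5,
            -3.0 / s ** 5 + 15.0 * t * t / s ** 7)


def constant_N(d):
    s0, s1, s2, s3, _ = d
    return -s0 * s3 / (s1 * s2)


def ratio_form(d):
    s0, s1, s2, s3, s4 = d
    return s1 / s0 - s2 / s1 - s3 / s2 + s4 / s3


t = 0.7
cases = {
    "co-inertial": derivs_coinertial(t),
    "t^(2/3)": derivs_power(t, 2.0 / 3.0),
    "exp": derivs_exp(t),
    "sin": derivs_sin(t),
    "cos": derivs_cos(t),
}
for name, d in cases.items():
    print(f"{name:12s} N = {constant_N(d):+.6f}   ratio form = {ratio_form(d):+.1e}")
```

The ratio combination vanishes in every case, while N takes the value 3, 2, or −1 depending on the kind of separation.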
Incidentally, even though the above has been based on the Galilean spatial separations
between objects as a function of Galilean time, the same conditions can be shown to
apply to the absolute spacetime intervals between inertial particles as a function of their
proper times. Relative to any point on the worldline of one particle, the four components
Δt, Δx, Δy, and Δz of the absolute interval to any other inertially moving particle are all
linear functions of the proper time τ along the latter particle's worldline. Therefore, the
components can be written in the form
$$ \Delta t = a_t \tau + b_t, \qquad \Delta x = a_x \tau + b_x, \qquad \Delta y = a_y \tau + b_y, \qquad \Delta z = a_z \tau + b_z $$
where the coefficients $a_i$ and $b_i$ are constants. It follows that the absolute magnitude of
any "co-inertial separation" is of the form
$$ s(\tau) = \sqrt{A \tau^2 + B \tau + C} $$
where (in units with c = 1)
$$ A = a_t^2 - a_x^2 - a_y^2 - a_z^2, \qquad B = 2(a_t b_t - a_x b_x - a_y b_y - a_z b_z), \qquad C = b_t^2 - b_x^2 - b_y^2 - b_z^2 $$
Thus we have the same formal dependence as before, except now the parameter s
represents the absolute spacetime separation. This shows that the absolute separation
between any fixed point on one inertial worldline and a point advancing along any other
inertial worldline satisfies equation (1), where subscripts denote derivatives with respect
to proper time of the advancing point. Naturally the reciprocal relation also holds, and the
same is true of the absolute separation between two points, each advancing along arbitrary inertial
worldlines, correlated according to their respective proper times.
4.3 Free-Fall Equations
When, therefore, I observe a stone initially at rest falling from an elevated
position and continually acquiring new increments of speed, why should I
not believe that such increases take place in a manner which is
exceedingly simple and rather obvious to everybody?
Galileo
Galilei, 1638
The equation of two-body non-rotating radial free-fall in Newtonian theory is formally
identical to the one-body radial free-fall solution in Einstein's theory (as is Kepler's third
law), provided we identify Newton's radial distance with the Schwarzschild parameter r,
and Newton's time with the proper time of the falling particle. Therefore, it's worthwhile
to explicitly derive the cycloidal form of this solution. From the Newtonian point of view
we can begin with the inverse-square law of gravitation for the radial separation s(t)
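between two identical non-rotating particles of mass m
$$ \ddot{s} = -\frac{2Gm}{s^2} $$
where dots signify derivatives with respect to time. Integrating this over ds from an
arbitrary initial separation s(0) to the separation s(t) at some other time t gives
$$ \int_{s(0)}^{s(t)} \ddot{s}\, ds = -2Gm \int_{s(0)}^{s(t)} \frac{ds}{s^2} $$
Notice that the left hand integral can be rewritten
$$ \int \ddot{s}\, ds = \int \frac{d\dot{s}}{dt}\, ds = \int \dot{s}\, d\dot{s} $$
Therefore, the previous equation can easily be integrated to give
$$ \frac{1}{2}\left[ \dot{s}(t)^2 - \dot{s}(0)^2 \right] = 2Gm \left[ \frac{1}{s(t)} - \frac{1}{s(0)} \right] $$
which shows that the quantity
$$ \frac{1}{2}\dot{s}^2 - \frac{2Gm}{s} $$
is invariant for all t. Solving the equation for $\dot{s}(t)$, we have
$$ \dot{s}(t) = \pm\sqrt{ \dot{s}(0)^2 + 4Gm \left[ \frac{1}{s(t)} - \frac{1}{s(0)} \right] } $$
Rearranging, this gives
$$ dt = \pm \frac{ds}{\sqrt{ \dot{s}(0)^2 + 4Gm \left[ \frac{1}{s} - \frac{1}{s(0)} \right] }} $$
To simplify the expressions, we put $s_0 = s(0)$, $v_0 = \dot{s}(0)$, and $r = s(t)/s_0$. In these terms, the
preceding expression can be written
$$ dt = \pm\sqrt{\frac{s_0^3}{4Gm}}\, \sqrt{\frac{r}{1 - K r}}\; dr, \qquad K = 1 - \frac{s_0 v_0^2}{4Gm} $$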
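There are two cases to consider. If K is positive, then the trajectory is bounded, and there
is some point on the trajectory (the apogee) at which v = 0. Choosing this point as our
time origin t = 0, we have K = 1, and the standard integral gives
$$ t = \sqrt{\frac{s_0^3}{4Gm}} \left[ \sqrt{r(1-r)} + \cos^{-1}\!\sqrt{r} \right] \qquad (1) $$
This equation describes a (scaled) cycloidal relation between t and r, which can be
expressed parametrically in terms of a fictitious angle θ as follows
$$ t = \frac{1}{2}\sqrt{\frac{s_0^3}{4Gm}} \left( \theta + \sin\theta \right), \qquad r = \frac{1}{2}\left( 1 + \cos\theta \right) \qquad (2) $$
To verify that these two equations are equivalent to the preceding equation, we can solve
the second for θ and substitute into the first to give
$$ t = \frac{1}{2}\sqrt{\frac{s_0^3}{4Gm}} \left[ \sin\!\left( \cos^{-1}(2r-1) \right) + \cos^{-1}(2r-1) \right] $$
Using the trigonometric identity
$$ \sin\!\left( \cos^{-1} x \right) = \sqrt{1 - x^2} $$
we see that the first term on the right side is
$$ \frac{1}{2}\sqrt{\frac{s_0^3}{4Gm}}\, \sqrt{1 - (2r-1)^2} = \sqrt{\frac{s_0^3}{4Gm}}\, \sqrt{r(1-r)} $$
Also, letting $\phi = \cos^{-1}(2r-1)$, we can use the trigonometric identity
$$ \cos\frac{\phi}{2} = \sqrt{\frac{1 + \cos\phi}{2}} $$
to show that this angle is
$$ \phi = 2\cos^{-1}\!\sqrt{r} $$
so the second term on the right side of (2) is
$$ \frac{1}{2}\sqrt{\frac{s_0^3}{4Gm}}\; \phi = \sqrt{\frac{s_0^3}{4Gm}}\, \cos^{-1}\!\sqrt{r} $$
which completes the demonstration that the cycloid relation given by (2) is equivalent to
the free-fall relation (1).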
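The equivalence of the closed-form and parametric descriptions can also be confirmed numerically. The sketch below assumes the bound-case relations take the form $t = c\,[\sqrt{r(1-r)} + \cos^{-1}\sqrt{r}]$ and $t = \tfrac{1}{2}c\,(\theta + \sin\theta)$, $r = \tfrac{1}{2}(1 + \cos\theta)$, where the scale factor $c = \sqrt{s_0^3/(4Gm)}$ is set to 1 for convenience.

```python
import math


def fall_time_closed_form(r, scale=1.0):
    """Time since apogee at normalized separation r (0 <= r <= 1)."""
    return scale * (math.sqrt(r * (1.0 - r)) + math.acos(math.sqrt(r)))


def cycloid(theta, scale=1.0):
    """Parametric (t, r) point of the cycloid for the fictitious angle theta."""
    t = 0.5 * scale * (theta + math.sin(theta))
    r = 0.5 * (1.0 + math.cos(theta))
    return t, r


max_gap = 0.0
for i in range(1, 100):
    theta = math.pi * i / 100
    t_param, r = cycloid(theta)
    max_gap = max(max_gap, abs(t_param - fall_time_closed_form(r)))

collapse_time, _ = cycloid(math.pi)   # r reaches zero at theta = pi
print(f"largest disagreement: {max_gap:.1e}")
print(f"time from apogee to r = 0 (in units of the scale factor): {collapse_time:.6f}")
```

The two descriptions agree to rounding error, and the time from apogee to zero separation is π/2 times the scale factor.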
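The second case is when K is negative. For this case we can conveniently express the
equations in terms of the positive parameter k = −K. The standard integral
$$ \int \sqrt{\frac{r}{1 + kr}}\; dr = \frac{1}{k^{3/2}} \left[ \sqrt{kr(1+kr)} - \ln\!\left( \sqrt{kr} + \sqrt{1+kr} \right) \right] $$
tells us that, for any two points $s_0$ and $s_1$ on the trajectory, the time interval is related to
the separations according to
$$ t_1 - t_0 = \sqrt{\frac{s_0^3}{4Gm\,k^3}} \left[ \sqrt{kr(1+kr)} - \ln\!\left( \sqrt{kr} + \sqrt{1+kr} \right) \right]_{r=1}^{r=s_1/s_0} $$
where
$$ k = \frac{s_0 v_0^2}{4Gm} - 1 $$
Notice that if we define $S_0 = s_0/k$ and R = kr, then this becomes
$$ t_1 - t_0 = \sqrt{\frac{S_0^3}{4Gm}} \left[ \sqrt{R(1+R)} - \ln\!\left( \sqrt{R} + \sqrt{1+R} \right) \right]_{R=k}^{R=k s_1/s_0} $$
Thus, if we define the normalized time parameter
$$ T = t\,\sqrt{\frac{4Gm}{S_0^3}} $$
with t measured from the instant at which R = 0, then the normalized equation of motion is
$$ T = \sqrt{R(1+R)} - \ln\!\left( \sqrt{R} + \sqrt{1+R} \right) \qquad (3) $$
This represents the shape of every non-rotating separation between identical particles of
mass m for which k is positive, which means that the absolute value of $v_0$ exceeds
$2\sqrt{Gm/s_0}$. These are the unbound radial orbits for which R goes to infinity, as opposed
to the case when the absolute value of $v_0$ is less than this threshold, which gives bound
radial orbits in the shape of a cycloid in accord with equation (1).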
It's interesting to note the "removable singularity" of (3) at R = 0. Physically the
parameter R is always non-negative by definition, so it abruptly reverses slope at the
origin, even though the position may vary monotonically with respect to an external
coordinate system.
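A short numerical sketch of the unbound case follows. It assumes the normalized relation has the form $T = \sqrt{R(1+R)} - \ln(\sqrt{R} + \sqrt{1+R})$, checks that its slope is $dT/dR = \sqrt{R/(1+R)}$ (the normalized form of the energy integral), and checks that near the origin $T \approx \tfrac{2}{3}R^{3/2}$, so that R as a function of T has a cusp at T = 0. The sample points and step size are arbitrary.

```python
import math


def normalized_time(R):
    """T(R) for the unbound radial trajectory, with T = 0 at R = 0."""
    return math.sqrt(R * (1.0 + R)) - math.log(math.sqrt(R) + math.sqrt(1.0 + R))


def slope(R, h=1e-6):
    """Central-difference estimate of dT/dR."""
    return (normalized_time(R + h) - normalized_time(R - h)) / (2.0 * h)


slope_errors = [abs(slope(R) - math.sqrt(R / (1.0 + R))) for R in (0.1, 1.0, 10.0)]

# Leading behaviour near the origin: T is approximately (2/3) R^(3/2).
R_small = 1e-4
cusp_ratio = normalized_time(R_small) / ((2.0 / 3.0) * R_small ** 1.5)

print(f"largest slope error: {max(slope_errors):.1e}")
print(f"T / ((2/3) R^1.5) at R = {R_small}: {cusp_ratio:.6f}")
```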
4.4 Force, Curvature, and Uncertainty
The atoms, as their own weight bears them down plumb through the void,
at scarce determined times, in scarce determined places, from their course
decline a little - call it, so to speak, mere changed trend. For were it not
their wont thuswise to swerve, down would they fall, each one, like drops
of rain, through the unbottomed void; and then collisions ne'er could be,
nor blows among the primal elements; and thus Nature would never have
created aught.
Lucretius, 50 BC
The trajectory of radial non-rotating gravitational freefall can be expressed by the simple
differential equation
$$ s\ddot{s} + \frac{1}{2}\dot{s}^2 = k \qquad (1) $$
where k is a constant and dots signify derivatives with respect to time. This equation is
valid for both Newtonian gravity and general relativity, provided we identify Newton's
time parameter with the free-falling particle's proper time, and Newton's radial distance
with the radial Schwarzschild coordinate. Notice that no gravitational constant appears in
this equation (k is just a constant of integration determined by the initial conditions), so
equation (1) is a purely kinematic description of gravity. Why did Newton not adopt this
simple kinematic view? Historically the reasons involved considerations of rotating
systems, but the basic problem with the kinematic view is present even with simple non-rotating free-fall.
The problem is that equation (1) has an unrealistic "static solution" at $\dot{s} = \ddot{s} = 0$. This
condition implies that k = 0, and the separation between the two objects has no proper
"trajectory" (i.e., time drops out of the equation), so the equation cannot extrapolate the
position forward or backward in time. Of course, this condition can never arise naturally
from any non-static condition with k ≠ 0, but we can imagine that by the imposition of
some external force we can arrange to have the two objects initially at rest and not
accelerating relative to each other. Then when the objects are released from the "outside"
force we expect them to immediately begin falling toward each other under the influence
of their mutual gravitational attraction. This implies that k, and therefore $\ddot{s}$, must
immediately assume some non-zero values, but equation (1) gives us no information
about these values, because the entire equation identically vanishes at the static solution.
To escape from the static solution, Newtonian mechanics splits the kinematic equation of
motion into two parts, coupled together by the dynamical concepts of force and mass.
Two objects are said to exert (equal and opposite) force on each other proportional to the
inverse of the square of the separation between them, and the second derivative of that
separation is proportional (per mass) to this force. Thus, the relation between separation
and time for two identical particles, each of mass m, is given not by a single kinematic
equation but by two simultaneous equations
$$ F = -\frac{G m^2}{s^2}, \qquad \ddot{s} = \frac{2F}{m} $$
If we combine these two equations by eliminating F, we have
$$ \ddot{s} = -\frac{2Gm}{s^2} \qquad (2) $$
which shows that when the two objects are released, the separation instantly acquires the
second derivative $-2Gm/s^2$. Once this "initialization" has been accomplished, the
subsequent free fall is entirely determined by equation (1), as can be seen by
differentiating (2), written in the form $s^2\ddot{s} = -2Gm$, to give
$$ s^2\,\dddot{s} + 2 s \dot{s} \ddot{s} = 0 $$
which, assuming the separation is not zero, can be divided by s to give
$$ s\,\dddot{s} + 2 \dot{s} \ddot{s} = 0 \qquad (3) $$
the derivative of (1). This shows that, for non-rotating radial free-fall, the coupling
parameters F and m are entirely superfluous except that they serve to establish the proper
initial condition when the two objects are released from rest. Thus, Newton's dual
concepts of force-at-a-distance and the proportionality of acceleration to force serve only
(in this context) to enable us to solve for a non-vanishing $\ddot{s}$ as a function of s when $\dot{s} = 0$,
which equation (1) obviously cannot do.
Furthermore, the constant G does not appear in (1) or (3), even though they give a
complete description of gravitational free-fall except for the singularity at $\dot{s} = 0$. Thus
the gravitational constant is also needed only at this singular point, the "static solution" of
equation (1), which is the only point at which the dynamical concepts of force and mass
are used. Aside from this singular condition, non-rotating radial Newtonian gravity is a
purely kinematical phenomenon.
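The role of the initial condition can be made concrete by integrating the purely kinematic third-order equation $s\,\dddot{s} + 2\dot{s}\ddot{s} = 0$ from two different starting states. This is a sketch under stated assumptions: the integrator is an ordinary fourth-order Runge-Kutta step, the value 2Gm = 1 and the initial separation 1 are arbitrary units, and all function names are illustrative. Started from the static state the solution never moves, whereas once the force law supplies the initial second derivative $-2Gm/s^2$ the same kinematic equation carries the fall forward, keeping both $s^2\ddot{s}$ and $s\ddot{s} + \dot{s}^2/2$ constant.

```python
def kinematic_rhs(state):
    """Right-hand side of s*s''' + 2*s'*s'' = 0 as a first-order system."""
    s, v, a = state
    return (v, a, -2.0 * v * a / s)


def rk4_step(state, dt):
    """One classical fourth-order Runge-Kutta step."""
    def shifted(k, f):
        return tuple(x + f * ki for x, ki in zip(state, k))
    k1 = kinematic_rhs(state)
    k2 = kinematic_rhs(shifted(k1, dt / 2))
    k3 = kinematic_rhs(shifted(k2, dt / 2))
    k4 = kinematic_rhs(shifted(k3, dt))
    return tuple(x + dt / 6 * (p + 2 * q + 2 * u + w)
                 for x, p, q, u, w in zip(state, k1, k2, k3, k4))


def integrate(state, dt, steps):
    for _ in range(steps):
        state = rk4_step(state, dt)
    return state


TWO_G_M = 1.0   # assumed value of 2Gm in arbitrary units
S_INITIAL = 1.0

# Static start: zero velocity and zero acceleration.
static = integrate((S_INITIAL, 0.0, 0.0), 1e-3, 500)

# Released start: zero velocity, acceleration supplied by the force law.
released = integrate((S_INITIAL, 0.0, -TWO_G_M / S_INITIAL ** 2), 1e-3, 500)

s, v, a = released
print("static state after integration  :", static)
print("released state after integration:", released)
print("s^2 * s''         =", s * s * a)
print("s*s'' + s'^2 / 2  =", s * a + 0.5 * v * v)
```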
There are several essentially equivalent formulations of the kinematic equation of non-rotating radial gravitational motion, but all lead to an indeterminate condition at the static
solution. For example, if we set k = ε in equation (1) and multiply through by $2\dot{s}$ we
have
$$ 2 s \dot{s} \ddot{s} + \dot{s}^3 = 2\varepsilon \dot{s} $$
Integrating this over time gives
$$ s \dot{s}^2 = 2\varepsilon s + 2\mu $$
where μ is a constant of integration. Dividing by 2s and rearranging gives
$$ -\frac{\mu}{s} + \frac{1}{2}\dot{s}^2 = \varepsilon $$
which we recognize as expressing the classical conservation of energy, with the first term
representing potential energy and the second term denoting kinetic energy. Taking the
derivative of this gives
$$ \frac{\mu}{s^2}\,\dot{s} + \dot{s}\ddot{s} = 0 \qquad (4) $$
Notice that in each of the preceding equations the condition $\dot{s} = \ddot{s} = 0$ still represents a
solution for any s, even though it is unrealistic. At this point we may be tempted to solve
our problem by dividing through equation (4) by $\dot{s}$ to give
$$ \ddot{s} = -\frac{\mu}{s^2} \qquad (5) $$
which is the Newtonian inverse-square "force" law of gravity. This does indeed
determine the second derivative $\ddot{s}$ as a function of s, and thereby provides the
information needed to depart from the externally imposed static initial condition.
However, notice that the condition which concerns us is precisely when $\dot{s} = 0$, so when
we divided equation (4) by $\dot{s}$ we were essentially just eliminating the singular pole
arbitrarily by dividing by zero. Thus we can't properly say that the "force-at-a-distance"
law (5) follows from equation (1). The removal of the indeterminate singularity actually
represents an independent assumption relative to the basic kinematic equation of motion.
Of course, this assumption is perfectly compatible with the equation of motion, as can be
seen by solving equation (5) for μ/s and substituting into the energy equation to give
$$ s\ddot{s} + \frac{1}{2}\dot{s}^2 = \varepsilon $$
and thus, since ε = k,
$$ s\ddot{s} + \frac{1}{2}\dot{s}^2 = k $$
which is the same as equation (1). This compatibility is a necessary consequence of the
fact that the equation of motion is totally indeterminate when $\dot{s} = 0$, which is the only
condition at which the force law introduces new information not contained in the basic
equation of motion.
In view of the above relations, it is not surprising that in the general theory of relativity
we find gravity expressed without the concept of force. Einstein avoided the problem of
the static solution - without invoking an auxiliary concept such as force - simply by
recasting the phenomena in four-dimensional space-time, within which no material object
is ever static. Every object, even one "at rest" in space, necessarily has a proper
trajectory through spacetime, because it's moving forward in time. Furthermore, if we
allow the spacetime manifold to possess intrinsic curvature, it follows that a purely time-like trajectory can "veer off" and acquire space-like components.
Of course, this tendency to "veer off" depends on the degree of curvature of the spacetime, which general relativity relates to the mass-energy in the region. One of Einstein's
motivations for the general theory was the desire to eliminate arbitrary constants,
particularly the gravitational constant G, from the expressions of physical laws, but in the
general theory it is still necessary to determine the proportionality between mass and
curvature empirically, so the arbitrary gravitational constant remains. In any case, we see
that Newtonian mechanics and general relativity give formally identical relations between
separation and time for non-rotating free-fall, and the conceptual differences between the
two theories can be expressed in terms of the ways in which they escape from or avoid
the static condition.
It's interesting to note that the static solution of (1) is unstable in the direction of collapse.
Given a positive separation s, the signs of $\dot{s}$ and $\ddot{s}$ must be {+,−}, {+,+}, {−,+} or {−,−} in
order to satisfy (1), but considering small perturbations of these derivatives from the state
in which they are both zero, it's clear that {+,−} is unrealistic, because $\dot{s}$ would not go
positive from zero while $\ddot{s}$ was going negative from zero. For similar reasons,
perturbations leading to {+,+} and {−,+} are also excluded. Only the case {−,−}
represents a realistic outcome of a small perturbation from the static solution.
This instability in the direction of collapse suggests another approach to escaping from
(or avoiding) the static solution. The exact velocity and position of the two objects
cannot be known at the quantum level, so, in a sense, the closest that two bodies can
come to a static condition must still allow the equivalent of one quantum of momentum in
their relative velocities. It's tempting to imagine that there might be some way of
deriving the gravitational constant based on the idea that the initial condition for (1) is
determined by the characteristic quantum uncertainty for the separations between massive
particles, since, as we've seen, this initial condition fully determines the trajectory of
radial gravitational free-fall. Simplistically we could note that, for a particle of mass m,
any finite limit L on allowable distances implies two irreducible quantities of energy per
unit mass, one being $(h/2L)^2/(2m^2)$ corresponding to the minimum "observable"
momentum mv = h/2L (where h is Planck's constant) due to the uncertainty principle, and
the other being the minimum gravitational potential energy Gm/L. Identifying these two
energies with each other, and setting L equal to the event horizon radius c/H where c is
the velocity of light and H is Hubble's expansion constant, we have the relation
$$ m^3 = \frac{h^2 H}{8 G c} $$
Inserting the values $h = 6.625 \times 10^{-34}$ J s, $G = 6.673 \times 10^{-11}$ N m²/kg², $c = 2.998 \times 10^{8}$
m/s, and $H = 2.3 \times 10^{-18}$ s⁻¹ gives a value of approximately $1.848 \times 10^{-28}$ kg for the characteristic
mass m, which happens to be about one ninth the mass of a proton. Rough relationships
of this kind between the fundamental physical constants have been discussed by Dirac
and others, including Leopold Infeld, who wrote in 1949
Let us take as an example Maxwell’s equations and try to find their solution on a
cosmological background… In a closed universe the frequency of radiation has a
lowest value [corresponding to the maximum possible wavelength]. The
spectrum, on its red side, cannot reach frequency zero. We obtain characteristic
values for frequencies… a similar situation prevails if we consider Dirac’s
equations upon a cosmological background. The solutions in a closed universe are
different, not because of the metric, but because of the topology of our universe.
Such ideas are intriguing, but they have yet to be incorporated meaningfully into any
successful physical theory.
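The characteristic mass quoted above is easy to reproduce. The sketch below equates the two energies per unit mass, $(h/2L)^2/(2m^2)$ and $Gm/L$ with $L = c/H$, which gives $m^3 = h^2 H/(8Gc)$, and evaluates it with the constants stated in the text. The proton mass used for the comparison is an assumed reference value that does not appear in the text.

```python
# Constants as quoted in the text (SI units).
H_PLANCK = 6.625e-34      # Planck's constant, J s
G_NEWTON = 6.673e-11      # gravitational constant, N m^2 / kg^2
C_LIGHT = 2.998e8         # speed of light, m / s
H_HUBBLE = 2.3e-18        # Hubble's expansion constant, 1 / s

# Assumed reference value for comparison only (not from the text).
PROTON_MASS = 1.6726e-27  # kg


def characteristic_mass(h, G, c, H):
    """Mass for which (h/2L)^2 / (2 m^2) equals G m / L, with L = c / H."""
    return (h * h * H / (8.0 * G * c)) ** (1.0 / 3.0)


m_char = characteristic_mass(H_PLANCK, G_NEWTON, C_LIGHT, H_HUBBLE)
proton_ratio = PROTON_MASS / m_char

print(f"characteristic mass : {m_char:.4e} kg")
print(f"proton mass / m     : {proton_ratio:.2f}")
```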
The above represents a very simplistic sense in which the uncertainty of quantum
mechanics and the spacetime curvature of general relativity can be regarded as two
alternative conceptual strategies for establishing a consistent gravitational coupling. In a
more sophisticated sense, we can find other interesting formal parallels between these
two concepts, both of which fundamentally express non-commutativity. Given a system
of orthogonal xyz coordinates, let A,B,C denote operations which, when applied to any
unit vector emanating from the origin, rotate that vector through a right angle in the positive sense about the x, y,
or z axis respectively. Each of these operations can be represented by a rotation matrix,
such that multiplying any vector by that matrix will effectively rotate the vector
accordingly. As Hamilton realized in his efforts to find a three-dimensional analog of
complex numbers (which represent rotation operators in two-dimensions), the
multiplication (i.e., composition) of two rotations in space is not commutative. This is
easily seen in our example, because if we begin with a vector V emanating from the
origin in the positive z direction, and we first apply rotation A and then rotation B, we
arrive at a vector pointing in the positive y direction, whereas if we begin with V and
apply the rotation B first and then A we arrive at a vector pointing in the negative x
direction. Thus the effect of the combined operation AB is different from the effect of
the combined operation BA, and so the matrix AB − BA does not vanish. This is in
contrast with ordinary scalars and complex numbers, which always satisfy the
commutativity relation ab − ba = 0 for every two numbers a,b.
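The rotation example can be reproduced with explicit matrices. In the sketch below the quarter-turn matrices A and B are written out directly, and the sense of rotation is an assumed convention chosen so that the results match the directions quoted above (A carries the z direction into the y direction); with the opposite convention the resulting vectors change sign but the two orderings still disagree.

```python
# Quarter-turn rotation matrices about the x and y axes (rows listed).
# Assumed sign convention: A maps z to +y, and B maps z to -x.
A = [[1, 0, 0],
     [0, 0, 1],
     [0, -1, 0]]
B = [[0, 0, -1],
     [0, 1, 0],
     [1, 0, 0]]


def apply(matrix, vector):
    """Matrix times column vector."""
    return [sum(matrix[i][j] * vector[j] for j in range(3)) for i in range(3)]


def matmul(p, q):
    """Matrix product p * q."""
    return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


V = [0, 0, 1]                       # unit vector along +z

a_then_b = apply(B, apply(A, V))    # rotation A first, then B
b_then_a = apply(A, apply(B, V))    # rotation B first, then A

product_ab = matmul(A, B)
product_ba = matmul(B, A)
difference = [[product_ab[i][j] - product_ba[i][j] for j in range(3)]
              for i in range(3)]

print("A then B:", a_then_b)
print("B then A:", b_then_a)
print("AB - BA :", difference)
```

Note that applying A first and then B corresponds to the matrix product with B on the left, since the matrix nearest the vector acts first.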
This non-commutativity also appears when dealing with calculus on curved manifolds,
which we will discuss in more detail in Section 5. Just to give a preliminary indication of
how non-commutative relations arise in this context, suppose we have a vector field $T_\alpha$
defined over a given metrical manifold, and we let $T_{\alpha;\mu\nu}$ denote covariant differentiation
of $T_\alpha$ first with respect to the coordinate $x^\mu$ and then with respect to the coordinate $x^\nu$. In
a flat manifold the covariant derivative is identical to the partial derivative, which is
commutative. In other words, the result of differentiation with respect to two coordinates
in succession is independent of the order in which we apply the differentiations.
However, in a curved manifold this is not the case. We find that reversing the order of
the differentiations yields different results, just as when applying two rotations in
succession to a vector. Specifically, we will find that
$$ T_{\alpha;\mu\nu} - T_{\alpha;\nu\mu} = R^{\sigma}{}_{\alpha\mu\nu}\, T_\sigma $$
where $R^{\sigma}{}_{\alpha\mu\nu}$ is the Riemann curvature tensor, to be discussed in detail in Section 5.7.
The vanishing of this tensor is the necessary and sufficient condition for the manifold to
be metrically flat, i.e., free of intrinsic curvature, so this tensor can be regarded as a
measure of the degree of non-commutativity of covariant derivative operators in the
manifold.
Non-commutivity also plays a central role in quantum mechanics, where observables
such as position and momentum are represented by operators, much like the rotation
operators in our previous example, and the possible observed states are eigenvalues of
those operators. If we let X and P denote the position and momentum operators, the
application of one of these operators to the state vector of a given system results in a new
state vector with specific probabilities. This represents a measurement of the respective
observable. The effect of a position measurement followed by a momentum
measurement can be represented by the combined operator XP, and likewise the effect of
a momentum measurement followed by a position measurement can be represented by
PX. Again we find that the commutative property does not generally hold. If two
observable are compatible, such as the X position and the Y position of a particle, then
the operators commute, which means we have XY  YX = 0. However, if two operators
are not compatible, such as position and momentum, their operators do not commute.
This leads to the important relation
This non-commutivity in the measurement of observables implies an inherent limit on the
precision to which the values of the incompatible observables can be jointly measured.
In general it can be shown that if A and B are the operators associated with the physical
quantities a and b, and if a and b denote the expected root mean squares of the
deviations of measured values of a and b from their respective expected values, then
This is Heisenberg's uncertainty relation. The commutator of two observable operators is
invariably a multiple of Planck's constant, so if Planck's constant were zero, all
observables would be compatible, i.e., their operators would commute, just as do all
classical operators. We might say (with some poetic license) that Planck's constant is a
measure of the "curvature" of the manifold of observation. This "curvature" applies only
to incompatible observables, although the term "incompatible" is somewhat misleading,
because it actually signifies that two observables A,B are conjugates, i.e., transformable
into each other by the conjugacy relation $A = UBU^{-1}$ where U is a unitary operator
(analogous to a simple rotation operator).
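As a small numerical illustration of the statements above, the following sketch checks a commutator and the uncertainty bound (product of spreads at least half the magnitude of the expected commutator) for one concrete pair of incompatible observables. The choice of spin-1/2 operators, the units with hbar = 1, and all function names are assumptions made for this example; they do not come from the text.

```python
# Spin-1/2 operators in units with hbar = 1 (an illustrative assumption).
# Each operator is a 2x2 complex matrix stored as nested lists.
SX = [[0, 0.5], [0.5, 0]]
SY = [[0, -0.5j], [0.5j, 0]]
SZ = [[0.5, 0], [0, -0.5]]

def matmul(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def commutator(a, b):
    """The matrix ab - ba; it vanishes exactly when a and b commute."""
    ab, ba = matmul(a, b), matmul(b, a)
    return [[ab[i][j] - ba[i][j] for j in range(2)] for i in range(2)]

def expectation(op, psi):
    """Expected value <psi|op|psi> for a normalized state vector psi."""
    return sum(psi[i].conjugate() * op[i][j] * psi[j]
               for i in range(2) for j in range(2))

def spread(op, psi):
    """Root mean square deviation of op from its expected value in state psi."""
    mean = expectation(op, psi)
    mean_sq = expectation(matmul(op, op), psi)
    return abs(mean_sq - mean * mean) ** 0.5

psi = [1 + 0j, 0j]  # the "up" eigenstate of SZ

lhs = spread(SX, psi) * spread(SY, psi)                 # product of spreads
rhs = 0.5 * abs(expectation(commutator(SX, SY), psi))   # half the expected commutator
print(lhs, rhs, lhs >= rhs - 1e-12)
```

For this particular state the two sides come out equal, so the bound is saturated; other states give a strict inequality.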
4.5 Conventional Wisdom
This, however, is thought to be a mere strain upon the text, for the words
are these: ‘That all true believers break their eggs at the convenient end’,
and which end is the convenient end, seems, in my humble opinion, to be
left to every man’s conscience…
Jonathan Swift, 1726
It is a matter of empirical fact that the speed of light is invariant in terms of inertial
coordinates, and yet the invariance of the speed of light is often said to be a matter of
convention - as indeed it is. The empirical fact refers to the speed of light in terms of
inertial coordinates, but the decision to define speeds in terms of inertial coordinates is
conventional. It’s trivial to define systems of space and time coordinates in terms of
which the speed of light is not invariant, but we ordinarily choose to describe events in
terms of inertial coordinates, partly because of the invariance of light speed based on
those coordinates. Of course, this invariance would be tautological if inertial coordinate
systems were simply defined as the systems in terms of which the speed of light is
invariant. However, as discussed in Section 1.3, the class of inertial coordinate systems is
actually defined in purely mechanical terms, without reference to the propagation of light.
They are the coordinate systems in terms of which mechanical inertia is homogeneous
and isotropic (which are the necessary and sufficient conditions for Newton’s three laws
of motion to be valid, at least quasi-statically). The empirical invariance of light speed
with respect to this class of coordinate systems is a non-trivial empirical fact, but nothing
requires us to define “velocity” in terms of inertial coordinate systems. Such systems
cannot claim to have any a priori status as the “true” class of coordinates. Despite the
undeniable success of the principle of inertia as a basis for organizing our understanding
of the processes of nature, it is nevertheless a convention.
The conventionalist view can be traced back to Poincare, who wrote in "The Measure of
Time" in 1898
... we have no direct intuition about the equality of two time intervals. The
simultaneity of two events or the order of their succession, as well as the equality
of two time intervals, must be defined in such a way that the statements of the
natural laws be as simple as possible.
In the same paper, Poincare described the use of light rays, together with the convention
that the speed of light is invariant and the same in all directions, to give an operational
meaning to the concept of simultaneity. In his book "Science and Hypothesis" (1902) he
summarized his view of time by saying
There is no absolute time. When we say that two periods are equal, the statement
has no meaning, and can only acquire a meaning by a convention.
Poincare's views had a strong influence on the young Einstein, who avidly read "Science
and Hypothesis" with his friends in the self-styled "Olympia Academy". Solovine
remembered that this book "profoundly impressed us, and left us breathless for weeks on
end". Indeed we find in Einstein's 1905 paper on special relativity the statement
We have not defined a common time for A and B, for the latter cannot be defined
at all unless we establish by definition that the time required by light to travel
from A to B equals the time it requires to travel from B to A.
In a later popular exposition, Einstein tried to make the meaning of this definition more
clear by saying
That light requires the same time to traverse the path A to M (the midpoint of AB)
as for the path B to M is in reality neither a supposition nor a hypothesis about the
physical nature of light, but a stipulation which I can make of my own freewill in
order to arrive at a definition of simultaneity.
Of course, this concept of simultaneity is also embodied in Einstein's second "principle",
which asserts the invariance of light speed. Throughout the writings of Poincare,
Einstein, and others, we see the invariance of the speed of light referred to as a
convention, a definition, a stipulation, a free choice, an assumption, a postulate, and a
principle... as well as an empirical fact. There is no conflict between these
characterizations, because the convention (definition, stipulation, free choice, principle)
that Poincare and Einstein were referring to is nothing other than the decision to use
inertial coordinate systems, and once this decision has been made, the invariance of light
speed is an empirical fact. As Poincare said in 1898, we naturally choose our coordinate
systems "in such a way that the statements of the natural laws are as simple as possible",
and this almost invariably means inertial coordinates. It was the great achievement of
Galileo, Descartes, Huygens, and Newton to identify the principle of inertia as the basis
for resolving and coordinating physical phenomena. Unfortunately this insight is often
disguised by the manner in which it is traditionally presented. The beginning physics
student is typically expected to accept uncritically an intuitive notion of "uniformly
moving" time and space coordinate systems, and is then told that Newton's laws of
motion happen to be true with respect to those "inertial" systems. It is more meaningful to
say that we define inertial coordinate systems as those systems in terms of which
Newton's laws of motion are valid. We naturally coordinate events and organize our
perceptions in such a way as to maximize symmetry, and for the motion of material
objects the most important symmetries are the isotropy of inertia, the conservation of
momentum, the law of equal action and re-action, and so on. Newtonian physics is
organized entirely upon the principle of inertia, and the basic underlying hypothesis is
that for any object in any state of motion there exists a system of coordinates in terms of
which the object is instantaneously at rest and inertia is homogeneous and isotropic
(implying that Newton's laws of motion are at least quasi-statically valid).
The empirical validity of this remarkable hypothesis accounts for all the tremendous
success of Newtonian physics. As discussed in Section 1.3, the specification of a
particular state of motion, combined with the requirement for inertia to be homogeneous
and isotropic, completely determines a system of coordinates (up to insignificant scale
factors, rotations, etc.), and such a system is called an inertial system of coordinates.
Such coordinate systems can be established unambiguously by purely mechanical means
(neglecting the equivalence principle and associated complications in the presence of
gravity). The assumption of inertial isotropy with respect to a given state of motion
suffices to establish the loci of inertial simultaneity for that state of motion. Poincare
and Einstein rightly noted the conventionality of this simultaneity definition because they
were not pre-supposing the choice of inertial simultaneity. In other words, we are not
required to use inertial coordinates. We simply choose, of our own free will, to use
inertial coordinates - with the corresponding inertial definition of simultaneity - because
this renders the statement of physical laws and the descriptions of physical phenomena as
simple and perspicuous as possible, by taking advantage of the maximum possible
symmetry.
In this regard it's important to remember that inertial coordinates are not entirely
characterized by the quality of being unaccelerated, i.e., by the requirement that isolated
objects move uniformly in a straight line. It's also necessary to require the unique
simultaneity convention that renders mechanical inertia isotropic (the same in all spatial
directions), which amounts to the stipulation of equal one-way speeds for the propagation
of physically identical actions. These comments are fully applicable to the Newtonian
concept of space, time, and inertial reference frames. Given two objects in relative
motion we can define two systems of inertial coordinates in which the respective objects
are at rest, and we can orient these coordinates so the relative motion is purely in the x
direction. Let t,x and T,X denote these two systems of inertial coordinates. That such
coordinates exist is the main physical hypothesis underlying Galilean physics. An
auxiliary hypothesis – one that was not always clearly recognized – concerns the
relationship between two such systems of inertial coordinates, given that they exist.
Galileo assumed that if the coordinates x,t of an event are known, and if the two inertial
coordinate systems are the rest frames of objects moving with a relative speed of v, then
the coordinates of that event in terms of the other system (with suitable choice of origins)
are $T = t$, $X = x - vt$. Viewed in the abstract, this is a rather peculiar and asymmetrical
assumption, although it is admittedly borne out by experience - at least to the precision of
measurement available to Galileo. However, we now know, empirically, that the relation
between relatively moving systems of inertial coordinates has the symmetrical form
$T = (t - vx)/\gamma$ and $X = (x - vt)/\gamma$, where $\gamma = (1 - v^2)^{1/2}$, when the time and space variables are
expressed in the same units such that the constant $3 \times 10^8$ meters/second equals unity. It
follows that the one-way (not just the two-way) speed of light is invariant and isotropic
with respect to any and every system of inertial coordinates.
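The contrast between the two transformation laws can be checked with a short sketch. It takes a light pulse moving at speed +1 or -1 in one inertial system (units with c = 1) and computes its speed in a second system moving at relative speed v, using first the Galilean form and then the symmetrical form quoted above. The function names and the sample value v = 0.6 are assumptions chosen for illustration.

```python
import math

def galilean(t, x, v):
    """Galilean mapping: T = t, X = x - v t."""
    return t, x - v * t

def lorentz(t, x, v):
    """Symmetrical mapping: T = (t - v x)/g, X = (x - v t)/g, with g = (1 - v^2)^(1/2)."""
    g = math.sqrt(1.0 - v * v)
    return (t - v * x) / g, (x - v * t) / g

def speed_in_new_frame(transform, u, v):
    """Speed X/T of an object that moves as x = u t in the original coordinates.

    The origin maps to the origin under both transformations, so one further
    event on the worldline (taken at t = 1) is enough to fix the new speed.
    """
    T, X = transform(1.0, u, v)
    return X / T

v = 0.6  # relative speed of the two systems, as a fraction of c
for u in (+1.0, -1.0):  # light pulses sent in the two opposite x directions
    print(u, speed_in_new_frame(galilean, u, v), speed_in_new_frame(lorentz, u, v))
```

Under the Galilean mapping the two pulses acquire different speeds (about 0.4 and 1.6 in magnitude), whereas under the symmetrical mapping both keep unit speed, which is the one-way invariance described above.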
The empirical content of this statement is simply that the propagation of light is isotropic
with respect to the same class of coordinate systems in terms of which mechanical inertia
is isotropic. This is consistent with the fact that light itself is an inertial phenomenon, e.g.,
it conveys momentum. In fact, the inertia of light can be seen as a common thread
running through three of the famous papers published by Einstein in 1905. In the paper
entitled "On a Heuristic Point of View Concerning the Production and Transformation of
Light" Einstein advocated a conception of light as tiny quanta of energy and momentum,
somewhat reminiscent of Newton's inertial corpuscles of light. It's clear that Einstein
already understood that the conception of light as a classical wave is incomplete. In the
paper entitled "Does the Inertia of a Body Depend on its Energy Content?" he explicitly
advanced the idea of light as an inertial phenomenon, and of course this was suggested by
the fundamental ideas of the special theory of relativity presented in the paper "On the
Electrodynamics of Moving Bodies".
The Galilean conception of inertial frames assumed that all such frames share a unique
foliation of spacetime into "instants". Thus the relation "in the present of" constituted an
equivalence relation across all frames of reference. If A is in the present of B, and B is in
the present of C, then A is in the present of C. However, special relativity makes it clear
that there are infinitely many distinct loci of inertial simultaneity through any given
event, because inertial simultaneity depends on the velocity of the worldline through the
event. The inertial coordinate systems do induce a temporal ordering on events, but only
a partial one. (See the discussion of total and partial orderings in Section 1.2.) With
respect to any given event we can still partition all the other events of spacetime into
distinct causal regions, including "past", "present" and "future", but in addition we have
the categories "future null" and "past null", and none of these constitute equivalence
classes. For example, it is possible for A to be in the present of B, and B to be in the
present of C, and yet A is not in the present of C. Being "in the present of" is not a
transitive relation.
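The failure of transitivity can be made concrete with three events in one space dimension. The sketch below treats "in the present of" as meaning spacelike separation (units with c = 1); the event coordinates and the function name are hypothetical values chosen only to exhibit the effect.

```python
def in_present_of(a, b):
    """True if events a and b, each given as (t, x), are spacelike separated.

    Spacelike separation means there is some inertial system in which the
    two events are simultaneous.
    """
    dt = b[0] - a[0]
    dx = b[1] - a[1]
    return abs(dx) > abs(dt)

A = (0.0, 0.0)
B = (1.0, 2.0)
C = (2.0, 0.0)

print(in_present_of(A, B))  # A and B are spacelike separated
print(in_present_of(B, C))  # B and C are spacelike separated
print(in_present_of(A, C))  # A and C lie on the same worldline x = 0
```

Here A is in the present of B, and B is in the present of C, yet C lies in the future of A.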
It could be argued that a total unique temporal ordering of events is a more useful
organizing principle than the isotropy of inertia, and so we should adopt a class of
coordinate systems that provides a total ordering. We can certainly do this, as Einstein
himself described in his 1905 paper
To be sure, we could content ourselves with evaluating the time of events by
stationing an observer with a clock at the origin of the coordinates who assigns to
an event to be evaluated the corresponding position of the hands of the clock
when a light signal from that event reaches him through empty space. However,
we know from experience that such a coordination has the drawback of not being
independent of the position of the observer with the clock.
The point of this "drawback" is that there is no physically distinguished "origin" on
which to base the time coordination of all systems of reference, so from the standpoint of
assessing possible causal relations we must still consider the full range of possible
"absolute" temporal orderings. This yields the same partial ordering of events as does the
set of inertial coordinates, so the "total ordering" that we can achieve by imposing a
single temporal foliation on all frames of reference is only formal, and not physically
meaningful. Nevertheless, we could make this choice, especially if we regard the total
temporal ordering of events as a requirement of intelligibility. This seems to have been
the view of Lorentz, who wrote in 1913 about the comparative merits of the traditional
Galilean and the new Einsteinian conceptions of time
It depends to a large extent on the way one is accustomed to think whether one is
attracted to one or another interpretation. As far as this lecturer is concerned, he
finds a certain satisfaction in the older interpretations, according to which... space
and time can be sharply separated, and simultaneity without further specification
can be spoken of... one may perhaps appeal to our ability of imagining arbitrarily
large velocities. In that way one comes very close to the concept of absolute
simultaneity.
Of course, the idea of "arbitrarily large velocities" already pre-supposes a concept of
absolute simultaneity, so Lorentz's rationale is not especially persuasive, but it expresses
the point of view of someone who places great importance on a total temporal ordering,
even at the expense of inertial isotropy. Indeed one of Poincare's criticisms of Lorentz's
early theory was that it sacrificed Newton's third law of equal action and re-action. (This
can be formally salvaged by assigning the unbalanced forces and momentum to an
undetectable ether, but the physical significance of a conservation law that references
undetectable elements is questionable.) Oddly enough, even Poincare sometimes
expressed the opinion that a total temporal ordering would always be useful enough to
out-weigh other considerations, and that it would always remain a safe convention. The
approach taken by Lorentz and most others may be summarized by saying that they
sacrificed the physical principles of inertial relativity, isotropy, and homogeneity in order
to maintain the assumed Galilean composition law. This approach, although technically
serviceable, suffers from a certain inherent lack of conviction, because while asserting the
ontological reality of anisotropy in all but one (unknown) frame of reference, it
unavoidably requires us to disregard that assertion and arbitrarily assume one particular
frame as being "the" rest frame.
Poincare and Einstein recognized that in our descriptions of events in spacetime in terms
of separate space and time coordinates we're free to select our "basis" of decomposition.
This is precisely what one does when converting the description of events from one frame
to another using Galilean relativity, but, as noted above, the Galilean composition law
yields anisotropic results when applied to actual observations. So it appeared (to most
people) that we could no longer maintain isotropy and homogeneity in all inertial frames
together with the ability to transform descriptions from one frame to another by simply
applying the appropriate basis transformation. But Einstein realized this was too
pessimistic, and that the new observations were fully consistent with both isotropy in all
inertial frames and with simple basis transformations between frames, provided we adjust
our assumption about the effective metrical structure of spacetime. In other words, he
brilliantly discerned that Lorentz's anisotropic results totally vanish in the context of a
different metrical structure.
Even a metrical structure is conventional in a sense, because it relies on our ontological
premises. For example, the magnitude of the interval between two events may seem to be
one thing but actually be another, due (perhaps) to variations in our means of observation
and measurement. However, once we have agreed on the physical significance of inertial
coordinate systems, the invariance of the quantity $(dt)^2 - (dx)^2 - (dy)^2 - (dz)^2$ also
becomes physically significant. This shows the crucial importance of the very first
sentence in Section 1 of Einstein's 1905 paper:
Let us take a system of co-ordinates in which the equations of Newtonian
mechanics hold good.
Suitably qualified (as noted in Section 1.3), this immediately establishes not only the
convention of simultaneity, but also the means of operationally establishing it, and its
physical significance. Any observer in any state of inertial motion can throw two
identical particles in opposite directions with equal force (i.e., so there is no net
disturbance of the observer's state of motion), and the convention that those two particles
have the same speed suffices to fully specify an entire system of space and time
coordinates, which we call inertial coordinates. It is then an empirical fact - not a
definition, convention, assumption, stipulation, or postulate - that the speed of light is
isotropic in terms of inertial coordinates. This obviously doesn't imply that inertial
coordinates are "true" in any absolute sense, but the principle of inertia has proven to be
immensely powerful for organizing our knowledge of physical events, and for discerning
and expressing the apparent chains of causation.
If a flash of light emanates from the geometrical midpoint between two spatially separate
particles at rest in an inertial frame, the arrival times of the light wave at those two
particles are simultaneous in terms of that rest frame’s inertial coordinates. Furthermore,
we find empirically that all other physical processes are isotropic with respect to those
inertial coordinates, e.g., if a sound wave emanates from the midpoint of a uniform steel
beam at rest in an inertial frame, the sound reaches the two ends simultaneously in accord
with this definition. If we adopt any other convention we introduce anisotropies in our
descriptions of physical processes, such as sound in a uniform stationary steel beam
propagating more rapidly in one direction than in the other. The isotropy of physical
phenomena - including the propagation of light - is strictly a convention, but it was not
introduced by special relativity, it is one of the fundamental principles which we use to
organize our knowledge, and it leads us to choose inertial coordinates for the description
of events. On the other hand, the isotropy of multiple distinct physical phenomena in
terms of inertial coordinates is not purely conventional, because those coordinates can be
defined in terms of just one of those phenomena. The value of this definition is due to the
fact that a wide variety of phenomena are (empirically) isotropic with respect to the same
class of coordinate systems.
Of course, it could be argued that all these phenomena are, in some sense, “the same”.
For example, the energy conveyed by electromagnetic waves has momentum, so it is an
inertial phenomenon, and therefore it is not surprising that the propagation of such energy
is isotropic in terms of inertial coordinates. From this point of view, the value of the
definition of inertial coordinates is that it reveals the underlying unity of superficially dissimilar phenomena, e.g., the inertia of energy. This illustrates that our conventions and
definitions are not empty, because they represent ways of organizing our knowledge, and
the efficiency and clarity of this organization depends on choosing conventions that
reflect the unity and symmetries of the phenomena. We could, if we wish, organize our
knowledge based on the assumption of a total temporal ordering of events, but then it
would be necessary to introduce a whole array of unobservable anisotropic "corrections"
to the descriptions of physical phenomena.
As we’ve seen, the principle of relativity constrains, but does not uniquely determine, the
form of the mapping from one system of inertial coordinates to another. In order to fix
the observable elements of a spacetime theory with respect to every member of the
equivalence class of inertial frames we require one further postulate, such as the
invariance of light speed (or the inversion symmetry discussed in Chapter 1.8). However,
we should distinguish between the strong and weak forms of the light-speed invariance
postulate. The strong form asserts that the one-way speed of light is invariant with respect
to the natural space-time basis associated with any inertial state of motion, whereas the
weak form asserts only that the round-trip speed of light is invariant. To illustrate the
different implications of these two different assumptions, consider an experiment of the
type conducted by Michelson and Morley in their efforts to detect a directional variation
in the speed of light, due to the motion of the Earth through the aether, with respect to
which the absolute speed of light was presumed to be referred. To measure the speed of
light along a particular axis they effectively measured the elapsed time at the point of
origin for a beam of light to complete a round trip out to a mirror and back. At first we
might think that it would be just as easy to measure the one-way speed of light, by simply
comparing the time of transmission of a pulse of light from one location to the time of
reception at another location, but of course this requires us to have clocks synchronized at
two spatially separate locations, whereas it is precisely this synchronization that is at
issue. Depending on how we choose to synchronize our separate clocks we can measure a
wide range of light speeds. To avoid this ambiguity, we must evaluate the time interval
for a transit of light at a single spatial location (in the coordinate system of interest),
which requires us to measure a round trip, just as Michelson and Morley did.
Incidentally, it might seem that Roemer's method of estimating the speed of light from
the variations in the period between eclipses of Jupiter's moons (see Section 3.3)
constituted a one-way measurement. Similarly people sometimes imagine that the one-way
speed of light could be discerned by (for example) observing, from the center of a
circle, pulses of light emitted uniformly by a light source moving at constant speed
around the perimeter of the circle. Such methods are indeed capable of detecting certain
kinds of anisotropy, but they cannot detect the anisotropy entailed by Lorentz’s ether
theory, nor any of the other theories that are observationally indistinguishable from
Lorentz’s theory (which itself is indistinguishable from special relativity). In any theory
of this class, there is an ambiguity in the definition of a “circle” in motion, because
circles contract to ellipses in the direction of motion. Likewise there is ambiguity in the
definition of “uniformly-timed” pulses from a light source moving around the perimeter
of a moving circle (ellipse). The combined effect of length contraction and time dilation
in a Lorentzian theory is to render the anisotropies unobservable.
The empirical indistinguishability between the theories in this class implies that there is
no unambiguous definition of “the one-way speed of light”. We can measure without
ambiguity only the lapses of time for closed-loop paths, and such measurements cannot
establish the “open-loop” speed. The ambiguity in the one-way speed remains, because
over any closed loop, by definition, the net change in each and every direction is zero.
Hence it is possible to consistently interpret all observations based on the assumption of
non-isotropic light speed. Admittedly the resulting laws take on a somewhat convoluted
appearance, and contain unobservable parameters, but they can't be ruled out empirically.
To illustrate, consider a measurement of the round-trip speed of light, assuming light
travels at a constant speed c relative to some absolute medium with respect to which our
laboratory is moving with a speed v. Under these assumptions, we would expect a pulse
of light to travel with a speed $c+v$ (relative to the lab) in one direction, and $c-v$ in the
opposite direction. So, if we send a beam of light over a distance L out to a mirror in the
"$c+v$" direction, and it bounces back over the same distance in the "$c-v$" direction, the
total elapsed time to complete the round trip of length 2L is

$$t = \frac{L}{c+v} + \frac{L}{c-v} = \frac{2Lc}{c^2 - v^2}$$

Therefore, the average round-trip speed relative to the laboratory would be

$$\frac{2L}{t} = c \left( 1 - \frac{v^2}{c^2} \right)$$

This shows why a round-trip measurement of the speed of light would not be expected to
reveal any dependency on the velocity of the laboratory unless the measurement was
precise enough to resolve second-order effects in v/c. The ability to detect such small
effects was first achieved in the late 19th century with the development of precision
interferometry (exploiting the wave-like properties of light). The experiments of
Michelson and Morley showed that, despite the movement of the Earth in its orbit around
the Sun (to say nothing of the movement of the solar system, and even of the galaxy),
there was no $(v/c)^2$ term in the round-trip speed of light. In other words, they found that
2L/t is always equal to c, at least to the accuracy they could measure, which was more
than adequate to rule out a second-order deviation. Thus we have a firm empirical basis
for asserting that the round-trip speed of light is independent of the motion of the source.
This is the weak form of the invariant light speed postulate, but in his 1905 paper
Einstein asserted something stronger, namely, that we should adopt the convention of
regarding the one-way speed of light as invariant. This stronger postulate doesn't follow
from the results of Michelson and Morley, nor from any other conceivable experiment or
observation - but there is also no conceivable observation that could conflict with it. The
invariant round-trip speed of light fixes the observable elements of the theory, but it does
not uniquely determine the presumed ontological structure, because multiple different
interpretations can be made to fit the same set of appearances. The one-way speed of light
is necessarily an interpretative element of our experience.
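The size of the second-order effect that an ether-drift measurement would need to resolve can be estimated with a few lines of arithmetic. This sketch assumes the pre-relativistic picture described above (light moving at c+v one way and c-v the other, relative to the lab) and uses rounded values for c and for the Earth's orbital speed; those numbers and the function name are assumptions for illustration only.

```python
def round_trip_speed(c, v, L=1.0):
    """Average speed 2L/t for a trip of length L out at c+v and L back at c-v."""
    t = L / (c + v) + L / (c - v)
    return 2.0 * L / t

c = 3.0e8   # speed of light in m/s (rounded)
v = 3.0e4   # roughly the Earth's orbital speed in m/s (rounded)

measured = round_trip_speed(c, v)
fractional_shift = 1.0 - measured / c   # predicted deficit relative to c

print(fractional_shift)   # close to (v/c)**2
print((v / c) ** 2)
```

The predicted deficit is of order one part in a hundred million, which is the effect that Michelson and Morley looked for and did not find.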
To illustrate the ambiguity, notice that we can ensure a null result for the Michelson and
Morley experiment while maintaining non-constant light speed, merely by requiring that
the speeds of light $v_1$ and $v_2$ in the two opposite directions of travel (out and back) satisfy
the relation

$$\frac{1}{v_1} + \frac{1}{v_2} = \frac{2}{c}$$

In other words, a linear round-trip measurement of light speed will yield the constant c in
every direction provided only that the harmonic mean of the one-way speeds in opposite
directions always equals c. This is easily accomplished by defining the one-way velocity
$v_1$ as a function of direction arbitrarily for all directions in one hemisphere, and then
setting the velocities $v_2$ in the opposite directions as
$v_2 = c v_1 / (2 v_1 - c)$. However, we also wish to cover more complicated round-trips,
rather than just back and forth on a single line. To ensure that a circuit of light around an
equilateral triangle with edges of length L yields a round-trip speed of c, the speeds $v_1$,
$v_2$, $v_3$ in the three equally spaced directions must satisfy

$$\frac{1}{v_1} + \frac{1}{v_2} + \frac{1}{v_3} = \frac{3}{c}$$

so again we see that the light speeds must have a harmonic mean of c. In general, to
ensure that every closed loop of light, regardless of the path, yields the average speed c,
it's necessary (and also sufficient) to have light speed $v = C(\theta)$ as a function of angle $\theta$ in
a principal plane such that, for any integer $n \ge 2$,

$$\sum_{j=0}^{n-1} \frac{1}{C(\theta + 2\pi j/n)} = \frac{n}{c}$$

In units with c = 1, we need the n terms on the left side to sum to n, so the velocity
function must be such that $1/C(\theta) = 1 + f(\theta)$ where the function $f(\theta)$ satisfies

$$\sum_{j=0}^{n-1} f(\theta + 2\pi j/n) = 0$$

for all $\theta$. The canonical example of such a function is simply $f(\theta) = k \cos(\theta)$ for any
constant k. Thus if we postulate that the speed of light varies as a function of the angle of
travel $\theta$ relative to some primary axis according to the equation

$$C(\theta) = \frac{c}{1 + k \cos(\theta)} \qquad (1)$$

then we are assured that all closed-loop measurements of the speed of light will yield the
constant c, despite the fact that the one-way speed of light is distinctly non-isotropic (for
non-zero k). This equation describes an ellipse, and no measurement can disprove the
hypothesis that the one-way speed of light actually is (or is not) given by (1). It is, strictly
speaking, a matter of convention. If we choose to believe that light has the same speed in
all directions, then we assume k = 0, and in order to send a synchronizing signal to two
points we would locate ourselves midway between them (i.e., at the location where round
trips between ourselves and those two points take the same amount of time.) On the other
hand, if we choose to believe light travels twice as fast in one direction as in the other,
then we would assume k = 1/3, and we would locate ourselves 2/3 of the way between
them (i.e., twice as far from one as the other, so round trip times are two to one). The
latter case is illustrated in the figure below.
Regardless of what value we assume for k (in the range from -1 to +1), we can
synchronize all clocks according to our belief, and everything will be perfectly consistent
and coherent. Of course, in any case it's necessary to account consistently for the lapse of
time for information to get from one clock to another, but the lapse of time between any
two clocks separated by a distance L can be anything we choose in the range from
virtually 0 to 2L/c. The only real constraint is that the speed be an elliptical function
of the direction angle.
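The closed-loop property can be verified numerically. The sketch below takes the one-way speed profile to be C(theta) = c / (1 + k cos(theta)), which is what 1/C = 1 + k cos(theta) gives in units with c = 1, and computes the average speed around an arbitrary closed polygon. The polygon vertices, the sample values of k, and the function names are assumptions chosen for illustration.

```python
import math

def one_way_speed(theta, k, c=1.0):
    """Direction-dependent speed C(theta) = c / (1 + k cos(theta))."""
    return c / (1.0 + k * math.cos(theta))

def loop_average_speed(points, k, c=1.0):
    """Total path length divided by total travel time around a closed polygon."""
    length = 0.0
    time = 0.0
    n = len(points)
    for i in range(n):
        (x0, y0), (x1, y1) = points[i], points[(i + 1) % n]
        dx, dy = x1 - x0, y1 - y0
        seg = math.hypot(dx, dy)
        theta = math.atan2(dy, dx)  # direction of travel relative to the primary (x) axis
        length += seg
        time += seg / one_way_speed(theta, k, c)
    return length / time

triangle = [(0.0, 0.0), (4.0, 1.0), (1.5, 3.0)]  # an arbitrary, irregular loop
for k in (0.0, 1.0 / 3.0, 0.9):
    print(k, loop_average_speed(triangle, k))

# With k = 1/3 light travels twice as fast against the primary axis as along it.
print(one_way_speed(math.pi, 1.0 / 3.0) / one_way_speed(0.0, 1.0 / 3.0))
```

The reason the average always comes out as c is visible in the loop: each segment contributes a travel time of (seg + k dx)/c, and the dx terms sum to zero around any closed path.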
The velocity profile given by (1) is simply the polar equation of an ellipse (or ellipsoid if
revolved about the major axis), with the pole at one focus, the semi-latus rectum equal to
c, and eccentricity equal to k. This just projects the ellipse given by cutting the light cone
with an oblique plane. Interestingly, there are really two light cones that intersect on this
plane, and they are the light cones of the two events whose projections are the two foci of
the ellipse - for timelike separated events. Recall that all rays emanating from one focus
of an ordinary ellipse and reflecting off the ellipse will re-converge on the other focus,
and that this kind of ray optics is time-symmetrical. In this context our projective ellipse
is the intersection of two null-cones, i.e., it is the locus of all points in spacetime that are
null-separated from both of the "foci events". This was to be expected in view of the
time-symmetry of Maxwell's equations (not to mention the relativistic Schrodinger
equation), as discussed in Section 9.
Our main reason for assuming k = 0 is our preference for symmetry, simplicity, and
consistency with inertial isotropy. Within our empirical constraints, k can be interpreted
as having any value between -1 and +1, but the principle of sufficient reason suggests that
it should not be assigned a non-zero value in the absence of any rational justification.
Nevertheless, it remains a convention (albeit a compelling one), but we should be clear
about what precisely is – and what is not – conventional. The invariance of lightspeed is a
convention, but the invariance of lightspeed in terms of inertial coordinates is an
empirical fact, and this empirical fact is not a formal tautology, because inertial
coordinates are determined by the mechanical inertia of material objects, independent of
the propagation of light.
Recall that Einstein’s 1905 paper states that if a pulse of light is emitted from an
unaccelerated clock at time t1, and is reflected off some distant object at time t2, and is
received back at the original clock at time t3, then the inertial coordinate synchronization
is given by stipulating that
t2 − t1 = t3 − t2
Reichenbach noted that the formally viable simultaneity conventions correspond to the
assumption
t2 − t1 = ε(t3 − t1)
where ε is any constant in the range from 0 to 1. This describes the same class of
“elliptical speed” conventions as discussed above, with ε = (k+1)/2 where k ranges from
−1 to +1. The corresponding coordinate transformation is a simple time skew, i.e., x’ = x,
y’ = y, z’ = z, t’ = t + kx/c. This describes the essence of the Lorentzian “absolutist”
interpretation of special relativity. Beginning with the putative absolute rest frame inertial
coordinates x,t, Lorentz associates with each state of motion v a system of coordinates
x’,t’ related to x,t by a Galilean transformation with parameter v. In other words, x’ = x −
vt and t’ = t. He then re-scales the x’,t’ coordinates to account for what he regards as the
physical contraction of the lengths of stable objects and the slowing of the durations of
stable physical processes, to arrive at the coordinates x” = x’/γ and t” = γt’ where γ =
(1 − v²/c²)^(1/2). These he regards as the proper rest frame coordinates for objects moving
with speed v in terms of the absolute frame. There is nothing logically unacceptable
about these coordinate systems, but we must realize that they do not constitute inertial
coordinate systems in the full sense. Mechanical inertia and the speed of light are not
isotropic in terms of such coordinates, precisely because the time foliation (i.e., the
simultaneity convention) is skewed relative to the ε = 1/2 convention.
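The correspondence between the time skew and Reichenbach's parameter can be checked with a short Python sketch. It assumes units with c = 1 and reads Reichenbach's condition as t2 = t1 + ε(t3 − t1); the function names are illustrative only. Starting from standard (k = 0) coordinates, in which a pulse leaves x = 0 at t = 0 and reaches x = L at t = L/c, the skew t’ = t + kx/c shifts the reflection time to (1 + k)L/c, which should agree with ε(t3 − t1) for ε = (k+1)/2.

```python
def epsilon_from_k(k):
    """Reichenbach's epsilon for anisotropy parameter k, with -1 < k < 1."""
    return (k + 1.0) / 2.0

def reflection_time(t1, t3, eps):
    """Time assigned to the reflection event, given emission t1 and return t3."""
    return t1 + eps * (t3 - t1)

def time_skew(x, t, k, c=1.0):
    """Apply the time skew t' = t + k*x/c, leaving x unchanged."""
    return x, t + k * x / c

def skewed_reflection_time(distance, k, c=1.0):
    """Reflection time after skewing standard (k = 0) coordinates.

    In standard coordinates the pulse leaves x = 0 at t = 0 and
    reaches x = distance at t = distance / c.
    """
    _, t2 = time_skew(distance, distance / c, k, c)
    return t2

if __name__ == "__main__":
    k, L = 1.0 / 3.0, 3.0
    # The clock at x = 0 is unaffected by the skew, so t1 = 0 and t3 = 2L/c.
    print(skewed_reflection_time(L, k), reflection_time(0.0, 2.0 * L, epsilon_from_k(k)))
```

Note that the emission and return times at x = 0 are untouched by the skew, which is why only the reflection time distinguishes one convention from another.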
If we begin with the inertial rest frame coordinates for the state of motion v (which
Lorentz and Einstein agree are related to the putative absolute rest frame coordinates by a
Lorentz transformation), and then apply the time skew transformation with parameter k =
-v/c, we arrive at these Lorentzian rest frame coordinates. Needless to say, our choice of
coordinate systems does not affect the outcome of any physical measurement, except that
the outcome will be expressed in different terms. For example, by the Einsteinian
convention the speed of light is isotropic in terms of the rest frame coordinates of any
material object, whereas by the Lorentzian convention it is not. This difference is simply
due to different definitions of “rest frame coordinates”. If we specify inertial coordinate
systems (i.e., coordinates in terms of which inertia is isotropic and Newton’s laws are
quasi-statically valid) then there is no ambiguity, and both Lorentz and Einstein agree
that the speed of light is isotropic in terms of all inertial coordinate systems.
In subsequent sections we’ll see that the standard formalism of general relativity provides
a convenient means of expressing the relations between spacetime events with respect to
a larger class of coordinate systems, so it may appear that inertial references are less
significant in the general theory. In fact, Einstein once hoped that the general theory
would not rely on the principle of inertia as a primitive element. However, this hope was
not fulfilled, and the underlying physical basis of the spacetime manifold in general
relativity remains the set of primitive inertial paths (geodesics) through spacetime. Not
only do these inertial paths determine the equivalence class of allowable coordinate
systems (up to diffeomorphism), it even remains true that at each event we can construct
a (local) system of inertial coordinates with respect to which the speed of light is c in all
directions. Thus the empirical fact of lightspeed invariance and isotropy with respect to
inertial coordinates remains as a primitive component of the theory. The difference is that
in the general theory the convention of using inertial coordinates is less prevalent,
because in general there is no single global inertial coordinate system, and non-inertial
coordinate systems are often more convenient on a curved manifold.
4.6 The Field of All Fields
Classes and concepts may be conceived as real objects, existing
independently of our definitions and constructions. It seems to me that the
assumption of such objects is quite as legitimate as the assumption of
physical bodies, and there is quite as much reason to believe in their
existence.
Kurt Gödel, 1944
Where is the boundary between the special and general theories of relativity? It is
sometimes said that any invocation of "general covariance" implies general relativity, but
just about any theory can be expressed in a generally covariant form, so this doesn't even
distinguish between general relativity and Newtonian physics, let alone special relativity.
For example, it's perfectly possible to simply transform the special relativistic solution of
a rotating platform into some arbitrary accelerated coordinate system, and although the
result is ugly, it is no less (or more) valid than when it was expressed in terms of
non-accelerating coordinates, because the transformation from one stipulated set of
coordinates to another has no physical content. The key word there is "stipulated",
because the real difference between the special and general theories is in what they take
for granted.
In a sense, special relativity is analogous to "naive set theory" in mathematics. By this I
mean that special relativity is based on certain plausible-sounding premises which
actually are quite serviceable for treating a wide class of problems, but which on close
examination are susceptible to self-referential antinomies. This is most evident with
regard to the assumption of the identifiability of inertial frames. As Einstein remarked,
"in the special theory of relativity there is an inherent epistemological defect", namely,
that the preferred class of reference frames on which the theory relies is circularly
defined. Special relativity asserts that the lapse of proper time between two (timelike-separated)
events is greatest along the inertial worldline connecting those two events - a
seemingly interesting and useful assertion - but if we ask which of the infinitely many
paths connecting those two events is the "inertial" one, we can only answer that it is the
one with the greatest lapse of proper time. If we simply accept this uncritically, and are
willing to naively rely on the testimony of accelerometers as unambiguous indicators of
"inertia", we have a fairly solid basis on which to do physics, and we can certainly work
out correct answers to many questions. However, the epistemological defect was
worrisome to Einstein, and caused him (in a remarkably short time) to abandon special
relativity and global Lorentz invariance as a suitable conceptual framework for the
formulation of physics.
The naive reliance on accelerometers as unambiguous indicators of global inertia in the
context of special relativity is immediately undermined by the equivalence principle,
because we're then required to predicate any application of special relativity on the
absence (or at least the negligibility) of irreducible gravitational fields, and this condition
is simply not verifiable within special relativity itself, because of the circularity in the
principle of inertia. This circularity genuinely troubled Einstein, and was one of the major
motivations (along with the problem of reconciling mass-energy equivalence with the
Equivalence Principle) that led him to abandon special relativity.
Given the recognized limitations of special relativity, and considering how successfully it
was generalized and extended in 1916, we may wonder why it's even necessary to
continue carrying along the special theory as a conceptually distinct entity. Will this
duality persist indefinitely, or will we eventually just say there is a single theory of
relativity (the theory traditionally called general relativity), which subsumes and extends
the earlier theory called special relativity? The reluctance to discard the special theory as
a separate theory may be due largely to the fact that it represents a simple and widely
applicable special case of the general theory, and it's convenient to have a name for this
limiting case. (There are, however, many cases in which the holistic approach of the
general theory is actually much simpler than the traditional special-theory-plus-general-corrections
approach.) Another reason that's sometimes mentioned is the (remote)
possibility that Einstein's general relativity is not the "right" generalization/extension of
the special theory. For example, if observation were ever to conclusively rule out the
existence of gravitational waves (which is admittedly hard to imagine in view of the
available binary star data), it might be necessary to seek another framework within which
to place the special theory. In this sense, we might regard special relativity as roughly
analogous to set theory without the axiom of choice, i.e., a restricted and less ambitious
theory that avoids making use of potentially suspect concepts or premises.
However, it's hard to say exactly which of the fundamental principles of general relativity
is considered to be suspect. We've seen that "general covariance" is a property of almost
any theory, so that can't be a problem. We might doubt the equivalence principle in one
or more of its various flavors, but it happens to be one of the most thoroughly tested
principles in physics. It seems most likely that if general relativity fails, it would be
because one or more of its "simplicities" is inappropriate. For example, the restriction to
2nd order, or the assumption of Riemannian metrics rather than, say, Finsler metrics, or
the naive assumption of R⁴ topology, or maybe even the basic assumption of a
continuum. Still, each of these would also have conceptual implications for the special
theory, so these aren't valid reasons for continuing to regard special relativity as a
separate theory.
Suppose we naively superimpose special relativity on Newtonian physics, and adopt a
naive definition of "inertial worldline", such as a worldline with no locally sensible
acceleration. On that basis we find that there can be multiple distinct "inertial" worldlines
connecting two given events (e.g., intersecting elliptical orbits of different eccentricities),
which conflicts with the special relativistic principle of a unique inertial interval between
any pair of timelike separated events. To press the antinomy analogy further, we could
arrange to have special relativity conclude that each of these worldlines has a lesser lapse
of proper time than each of the others. (If the barber shaves everyone who doesn't shave
himself, who shaves the barber?) Of course, with special relativity (as with set theory) we
can easily block such specific conundrums - once they are pointed out - by imposing one
or more restrictions on the definition of "inertial" (or the definition of a "set"), and in so
doing we make the theory somewhat less naive, but the experience raises legitimate
questions about whether we can be sure we have blocked all possible escapes.
We shouldn't push the analogy too far, since there are obvious differences between a
purely mathematical theory and a physical theory, the latter being exposed to potential
conflict with a much wider class of "external" constraints (such as the requirement to
possess a consistent mapping to a representation of experience). However, when
considering naive set theory's assumption of the existence of sets, and its assertions about
how to manipulate and reason with sets, all in the absence of a comprehensive criterion
for identifying what can legitimately be called a set, there is an interesting parallel with
special relativity's assumption of the existence of inertial frames and how to reason with
them and in them, all in the absence of a comprehensive framework for deciding what
does and what does not constitute an inertial frame.
It might be argued that relativity is a purely formalistic theory, which simply assumes an
inertial frame is specified, without telling how to identify one. Certainly we can
completely insulate special relativity from any and all conflict by simply adopting this
strategy, i.e., asserting that special relativity avers no mapping at all between its elements
and the objects of our experience. However, although this strategy effectively blocks
conflict, it also renders the theory quite unfalsifiable and phenomenologically otiose.
Even recognizing the distinction between logical inconsistency and empirical
falsification, we must also remember that the rules of logic and reason are ultimately
grounded in "observations", albeit of a very abstract nature, and mathematical theories no
less than physical theories are attempts to formalize "observations". As such, they are
comparably subject to upset when they're found to conflict with other observations (e.g.,
barbers, gravity, etc.).
It might be argued that we cannot really attribute any antinomies to special relativity,
because the cases noted above (multiply intersecting elliptical orbits, etc.) arise only from
attempting to apply special relativistic reasoning to a class of entities for which it is not
suited. However, the same is true of naive set theory, i.e., it works perfectly well when
applied to a wide class of sets, but leads to logically impossible conclusions if we attempt
to apply it to a class of sets that "act on themselves"... just as gravity is found to act on
itself in the general theory. In a real sense, gravity in general relativity is a self-referential
phenomenon, as revealed by the non-linearity of the field equations. Notice that our
antinomies in the special theory arise only when trying to reason with "self-referential
inertial frames", i.e., in the presence of irreducible gravitational fields.
The basic point is that although special relativity serves as the local limiting case of the
general theory, it is not able to stand alone, because it cannot identify the applicability of
its premises, which renders it incapable of yielding definite macroscopic conclusions
about the physical world. By placing all the necessary indefinite qualifiers on the scope
of applicability, we effectively remove special relativity from the set of physical theories.
This just re-affirms the point that any application of special relativity is, strictly speaking,
legitimized only within the context of the general theory, which provides the framework
for assessing the validity of the application. One can, of course, still practice the special
theory from a naive standpoint, and be quite successful at it, just as one can practice naive
set theory without running into trouble very often. Naturally none of this implies that
special relativity, by itself, is unfalsifiable. Indeed it is falsifiable, but only when
superimposed on some other framework (such as Newtonian physics) and combined with
some auxiliary assumptions about how to identify inertial frames. In fact, the special
theory of relativity is not only falsifiable, it is falsified, and was superseded in 1916 by a
superior and more comprehensive theory. Nevertheless, strict epistemological scruples
don't have a great deal of relevance to the actual day-to-day practice of science.
From a more formal standpoint, it's interesting to consider the correspondence between
the foundations of set theory and the theories of relativity. The archetypal example of a
problematic concept in naive set theory was the notion of the "set of all sets". It soon
became apparent to Cantor, Russell, and other mathematicians that this plausible-sounding
notion could not consistently be treated as a set in the usual sense. The problem
was recognized to be the self-referential nature of the concept. We can compare this to
the general theory of relativity, which is compelled by the equivalence principle to
represent the metric of spacetime as (so to speak) "the field of all fields". To make this
more precise, recall that Newtonian gravity can be represented by a scalar field φ defined
over a pre-existing metrical space, whose metric we may denote as g. The vacuum field
equation is Lg(φ) = 0 where Lg signifies the Laplacian operator over the space with the
fixed metric g. In general relativity the Laplacian is replaced by a more complicated
operator Rg which, like the Laplacian, is effectively a differential operator whose
components are evaluated on the spacetime with the metric g. However, in general
relativity the field on which Rg operates is nothing but the spacetime metric g itself. In
other words, the vacuum field equations are Rg(g) = 0. The entity Rg(g) is called the Ricci
tensor in differential geometry, usually denoted in covariant form as Rμν.
This highlights the essentially self-referential nature of the Einstein field equations, as
opposed to the Newtonian field equations where the operator and the field being operated
on are completely independent entities. It's interesting to compare this situation to
schematic representations of Gödel's formalization of arithmetic, leading to his proof of
the Incompleteness Theorem. Given a well-defined mapping between single-variable
propositional statements and the natural numbers (which Gödel showed is possible,
though far from trivial), let Pn(w) denote the nth statement applied to the variable w.
Since every possible proposition maps to some natural number, there is a natural number
k such that Pk(w) represents the proposition that Pw(w) has no proof. But then what
happens if we set the variable w equal to k? We see that Pk(k) represents the proposition
that there is no proof of Pk(k), from which it follows that if there is no proof of Pk(k) then
Pk(k) is true, whereas if there is a proof of Pk(k) then Pk(k) is false. Hence, assuming our
system of arithmetic is self-consistent, so that it doesn't contain proofs of false
propositions, we must conclude that Pk(k) is true but unprovable. Obviously the negation
of Pk(k) must also be unprovable, assuming our arithmetic is consistent, so the
proposition is strictly undecidable within the formal system encoded by our numbering
scheme.
The analogy between Gödel propositions Pk(k) and the field equations of general
relativity Rg(g) = 0 should not be pressed too far, but it does hint at the real and profound
subtleties that can arise when we allow self-referential statements. It's interesting that
Einstein seems to have been mindful very early of the eventual necessity of such
statements, although he deferred it for quite some time. Prior to 1905 many physicists
were attempting to construct a purely electromagnetic theory of matter based on
Maxwell's equations, according to which "the particle would be merely a domain
containing an especially high density of field energy". However, in presenting the special
theory of relativity Einstein carefully avoided proposing any particular theory as to the
ultimate structure of matter, and showed that a purely kinematical interpretation could
account for the relation between energy and inertia. He took this approach not because he
was uninterested in the nature of matter, but because he recognized immediately that
Maxwell's equations did not permit the derivation of the equilibrium of the
electricity that constitutes a particle. Only different, nonlinear field equations
could possibly accomplish such a thing. But no method existed for discovering
such field equations without deteriorating into adventurous arbitrariness.
So in 1905 Einstein took the more conservative route and merely(!) redefined the
traditional concepts of time and space. A few years later he himself embarked on an
adventure leading ultimately in 1915 to the non-linear field equations of general
relativity, but even in this he managed to make important progress by sidestepping again
the question of the ultimate constituency of matter and light. As he recalled in his
Autobiographical Notes:
It seemed hopeless to me at that time to venture the attempt of representing the
total field [as opposed to the pure gravitational field] and to ascertain field laws
for it. I preferred, therefore, to set up a preliminary formal frame for the
representation of the entire physical reality; this was necessary in order to be able
to investigate, at least preliminarily, the effectiveness of the basic idea of general
relativity.
In his later years it seems Einstein had decided he had made all the progress that could be
made on this preliminary basis, and set about the attempt to represent the total field. He
wrote the above comments in 1949, after a quarter-century of fruitless efforts to discover
the non-linear equations for the "total field", including electromagnetism and matter, so
he knew only too well the risks of deteriorating into adventurous arbitrariness.
4.7 The Inertia of Twins
We have no direct intuition of simultaneity, nor of the equality of two
durations. People who believe they possess this intuition are dupes of an
illusion... The simultaneity of two events, the order of their succession, and
the equality of two durations, are to be so defined that the enunciation of
the natural laws may be as simple as possible.
Poincaré, The Value of Science, 1905
The most commonly discussed "paradox" associated with the theory of relativity
concerns the differing lapses of proper time along two different paths between two fixed
events. This is often expressed in terms of a pair of twins, one moving inertially from
event A to event B, and the other moving inertially from event A to an intermediate event
M, where he changes his state of motion, and then moves inertially from M to B, where it
is found that the total elapsed time of the first twin exceeds that of the second. Much of
the popular confusion over this sequence of events is simply due to specious reasoning.
For example, if x,t and x',t' denote inertial rest frame coordinates respectively of the first
and second twin (on either the outbound or inbound leg of his journey), some people are
confused by the elementary fact that if those two coordinate systems are related
according to the Lorentz transformation, then the partials ∂t'/∂t (at constant x) and ∂t/∂t' (at constant x') both have
the same value. (For example, the unfortunate Herbert Dingle spent his retirement years
on a pitiful crusade to convince the scientific community that those two partial
derivatives must be the reciprocals of each other, and that therefore special relativity is
logically inconsistent.) Other people struggle with the equally elementary algebraic fact
that the proper time along any given path between two events is invariant under arbitrary
Lorentz transformations. The inability to grasp this has actually led some eccentrics to
waste years in a futile effort to prove special relativity inconsistent by finding a Lorentz
transformation that does not leave the proper time along some path invariant.
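Both of these elementary facts are easy to confirm numerically. The Python sketch below is only an illustration under our own assumptions: units with c = 1, one spatial dimension, and arbitrarily chosen events A = (0, 0), M = (4, 5), B = (0, 10) written as (x, t). It evaluates the two partial derivatives by finite differences (using the fact that the inverse of a Lorentz transformation with parameter v is a Lorentz transformation with parameter −v), and it sums the proper time along each twin's piecewise-inertial path both before and after an arbitrary boost.

```python
import math

def lorentz(x, t, v, c=1.0):
    """Standard Lorentz transformation of the event (x, t) with relative speed v."""
    g = 1.0 / math.sqrt(1.0 - (v / c) ** 2)
    return g * (x - v * t), g * (t - v * x / c ** 2)

def proper_time(events, c=1.0):
    """Total proper time along a piecewise-inertial path through (x, t) events."""
    total = 0.0
    for (x0, t0), (x1, t1) in zip(events, events[1:]):
        dt, dx = t1 - t0, x1 - x0
        total += math.sqrt(dt ** 2 - (dx / c) ** 2)
    return total

def partial_tprime_wrt_t(v, h=1e-6):
    """Finite-difference estimate of dt'/dt with x held fixed (at x = 0)."""
    return (lorentz(0.0, h, v)[1] - lorentz(0.0, 0.0, v)[1]) / h

def partial_t_wrt_tprime(v, h=1e-6):
    """Finite-difference estimate of dt/dt' with x' held fixed (at x' = 0)."""
    return (lorentz(0.0, h, -v)[1] - lorentz(0.0, 0.0, -v)[1]) / h

if __name__ == "__main__":
    stay_home = [(0.0, 0.0), (0.0, 10.0)]            # A -> B
    traveler = [(0.0, 0.0), (4.0, 5.0), (0.0, 10.0)]  # A -> M -> B
    boosted = [lorentz(x, t, 0.5) for x, t in traveler]
    print(proper_time(stay_home), proper_time(traveler), proper_time(boosted))
    print(partial_tprime_wrt_t(0.6), partial_t_wrt_tprime(0.6))
```

For these events the first twin's elapsed proper time is 10 and the second twin's is 6 in either coordinate system, and for v = 0.6 both partials equal 1.25 rather than being reciprocals of each other.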
Despite the obvious fallacies underlying these popular confusions, and despite the
manifest logical consistency of special relativity, it is nevertheless true that the so-called
twins paradox, interpreted in a more profound sense, does highlight a fundamental
epistemological shortcoming of the principle of inertia, on which both Newtonian
mechanics and special relativity are based. Naturally if we simply stipulate that one of the
twins is in inertial motion the entire time and the other is not, then the resolution of the
"paradox" is trivial, but the stipulation of "inertial motion" for one of the twins begs the
very question that motivates the paradox (in its more profound form), namely, how are
inertial worldlines distinguished from the set of all possible worldlines? In a sense, the
only answer special relativity can give is that the inertial worldline between two events is
the one with the greatest lapse of proper time, which is clearly of no help in resolving
which of the twins' worldlines is "inertial", because we don't know a priori which twin
has the greater lapse of proper time - that's what we're trying to determine!
This circularity in the definition of inertia and the inability to justify the privileged
position held by inertial worldlines in special relativity were among the problems that led
Einstein in the years following 1905 to seek a broader and more coherent context for the
laws of physics. The same kind of circular reasoning arises whenever we critically
examine the concept of inertia. For example, when trying to decide if our region of
spacetime is really flat, so that "straight lines" exist, we face the same difficulty. As
Einstein said:
The weakness of the principle of inertia lies in this, that it involves an argument in
a circle: a mass moves without acceleration if it is sufficiently far from other
bodies; we know that it is sufficiently far from other bodies only by the fact that it
moves without acceleration.
We could equally well substitute [has the greatest lapse of proper time] for [is sufficiently
far from other bodies]. In either case the point is the same: special relativity postulates the
existence of inertial frames and assigns to them a preferred role, but it gives no a priori
way of establishing the correct mapping between this concept and anything in reality.
This is what Einstein was referring to when he said "In classical mechanics, and no less
in the special theory of relativity, there is an inherent epistemological defect...". He
illustrates this with a famous thought experiment involving two relatively spinning
globes, discussed in Chapter 4.1. (The term "thought experiment" might be regarded as
an oxymoron, since the epistemological significance of an experiment is its empirical
quality, which a thought experiment obviously doesn't possess. Nevertheless, it's
undeniable that scientists have made good use of this technique - along with occasionally
making bad use of it.) The puzzling asymmetry of the spinning globes is essentially just
another form of the twins paradox, where the twins separate and re-converge (one
accelerates away and back while the other remains stationary), and they end up with
asymmetric lapses of proper time. How can the asymmetry be explained? In 1916
Einstein thought that
The only satisfactory answer must be that the physical system consisting of S1 and
S2 reveals within itself no imaginable cause to which the differing behavior of S1
and S2 can be referred. The cause must therefore lie outside the system. We have
to take it that the general laws of motion...must be such that the mechanical
behavior of S1 and S2 is partly conditioned, in quite essential respects, by distant
masses which we have not included in the system under consideration.
It should be noted that the strongly Machian attitude conveyed by this passage was
subsequently tempered for Einstein when he realized that in the general theory of
relativity it may be necessary to attribute the "essential conditioning" to boundary
conditions rather than distant masses. Nevertheless, this quotation serves to demonstrate
how seriously Einstein took the question, which, of course, is as applicable to the twins
paradox as it is to the two-globe paradox.
The above “weighty argument from the theory of knowledge” was the first reason cited
by Einstein (in 1916) for the need to go beyond special relativity in order to arrive at a
suitable conceptual framework. The second reason was the apparent impossibility of
doing justice, within the context of special relativity, to the equivalence principle relating
gravitation and acceleration. The first of these reasons bears most directly on the twins
paradox, although the problem of reconciling acceleration with gravity inevitably enters
the picture as well, since we can't avoid the issue of gravitation as soon as we
contemplate acceleration, assuming we accept the equivalence principle. From these
considerations it’s clear that special relativity could never have been more than a
transitional theory, since it was not comprehensive enough to justify its own conclusions.
The question of whether general relativity is required to resolve the twins paradox has
long been a subject of spirited debate. On one hand, Einstein wrote a paper in 1918 to
explain how the general theory accounts for the asymmetric aging of the twins by means
of the “gravitational fields” that appear with respect to accelerated coordinates attached to
the traveling twin, and Max Born recounted this analysis in a popular book, concluding
that "the clock paradox is due to a false application of the special theory of relativity,
namely, to a case in which the methods of the general theory should be applied". On the
other hand, many people object vigorously to any suggestion that special relativity is
inadequate to satisfactorily resolve the twins paradox. Ultimately the answer depends on
what sort of satisfaction is being sought, viz., on whether the paradox is being presented
as a challenge to the consistency of special relativity (as in Dingle's fallacy) or to the
completeness of special relativity. If we're willing to accept uncritically the existence and
identifiability of inertial frames, and their preferred status, and if we are willing to
exclude any consideration of gravity or the equivalence principle, then we can reduce the
twins paradox to a trivial exercise in special relativity. However, if it is the completeness
(rather than the consistency) of special relativity that is at issue, then the naive acceptance
of inertial frames is precisely what is being challenged. In this context, we can hardly
justify the exclusion of gravitation, considering that the very same metrical field which
determines the inertial worldlines also represents the gravitational field.
Notice that the typical statement of the twins paradox does not stipulate how the galaxies
in the universe along with the cosmological boundary conditions that determine the
metrical field are dynamically configured relative to the twins. If every galaxy in the
universe were “moving” in tandem with the "traveling twin", which (if either) of the
twins' reference frames would be considered inertial? Obviously special relativity is
silent on this point, and even general relativity does not give an unequivocal answer.
Weinberg asserts that "inertial frames are determined by the mean cosmic gravitational
field, which is in turn determined by the mean mass density of the stars", but the second
clause is not necessarily true, because the field equations generally require some
additional information (such as boundary conditions) in order to yield definite results.
The existence of cosmological models in which the average matter of the universe rotates
(a fact proven by Kurt Gödel) shows that even general relativity is incomplete, in the
sense that it is subject to global conditions with considerable freedom. General relativity
may not even give a unique field for a given (non-spherically symmetric) set of boundary
conditions and mass distribution, which is not surprising in view of the possibility of
gravitational waves. Thus even if we sharpen the statement of the twins paradox to
specify how the twins are moving relative to the rest of the matter in the universe, the
theory of relativity still doesn't enable us to say for sure which twin is inertial.
Furthermore, once we recognize that the inertial and gravitational fields are one and the
same, the twins paradox becomes even more acute, because we must then acknowledge
that within the theory of relativity it's possible to contrive a situation in which two
identical clocks in identical local circumstances (i.e., without comparing their positions to
any external reference) can nevertheless exhibit different lapses in proper time between
two given events. The simplest example is to place the twins in intersecting orbits, one
circular and the other highly elliptical. Each twin is in freefall continuously between their
periodic meetings, and yet they experience different lapses of proper time. Thus the
difference between the twins is not a consequence of local effects; it is a global effect. At
any point along those two geodesic paths the local physics is identical, but the paths are
embedded differently within the global manifold, and it is the different embedding within
the manifold that accounts for the difference in proper length. (The same point can be
made by referring to a flat cylindrical spacetime.) This more general form of the twins
paradox compels us to abandon the view that physical phenomena are governed solely by
locally sensible influences. (Notice, however, that we are forced to this conclusion not by
logical contradiction, but only by our philosophical devotion to the principle of sufficient
cause, which requires us to assign like physical causes to like physical effects.) Likewise
the identification of gravity with local spacetime curvature is untenable, as shown by the
fact that a suitable arrangement of gravitating masses can produce an extended region of
flat spacetime in which the metrical field is nevertheless accelerating in the global sense,
and we surely would not regard such a region as free of gravitation.
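The flat cylindrical case mentioned above lends itself to a quick numerical illustration. The following is only a sketch under assumptions that are not stated in the text: a flat spacetime closed in one spatial direction, with circumference L measured in the one frame where the closure is purely spatial, units with c = 1, one twin at rest in that frame, and the other coasting at constant speed v until they meet again. The function name is hypothetical.

```python
import math

def cylinder_twin_ages(circumference, v):
    """Proper times of two unaccelerated twins between successive meetings
    in a flat spacetime that is periodic in one spatial direction.

    Assumptions (illustrative only): c = 1; `circumference` is measured in
    the frame in which the identification is purely spatial; twin A is at
    rest in that frame; twin B moves at constant speed v with 0 < v < 1.
    """
    if not 0.0 < v < 1.0:
        raise ValueError("v must satisfy 0 < v < 1 in units where c = 1")
    t_meet = circumference / v               # coordinate time for B to circle once
    tau_a = t_meet                           # A is at rest, so proper time equals t
    tau_b = t_meet * math.sqrt(1 - v * v)    # B's elapsed proper time is shorter
    return tau_a, tau_b

tau_a, tau_b = cylinder_twin_ages(circumference=10.0, v=0.6)
print(tau_a, tau_b)
```

Neither twin ever feels any acceleration, yet the two elapsed times differ by the factor sqrt(1 - v^2), so the asymmetry can only come from how each worldline winds around the global manifold.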
It is fundamentally misguided to exercise such epistemological concerns within the
framework of special relativity, because special relativity was always a provisional theory
with recognized epistemological shortcomings. As mentioned above, one of Einstein's
two main reasons for abandoning special relativity as a suitable framework for
physics was the fact that, no less than Newtonian mechanics, special relativity is based on
the unjustified and epistemologically problematical assumption of a preferred class of
reference frames, precisely the issue raised by the twins paradox. Today the "special
theory" exists only (aside from its historical importance) as a convenient set of widely
applicable formulas for important limiting cases of the general theory, but the
phenomenological justification for those formulas can only be found in the general
theory.
This is true even if we posit the absence of gravitational effects, because the question at
issue is essentially the origin of inertia, i.e., why one worldline is inertial while another is
not, and the answer unavoidably involves the origin and significance of the background
metric, even in the absence of curvature. The special theory never claimed, and was never
intended, to address such questions. The general theory attempts to provide a coherent
framework within which to answer such questions, but it's not clear whether the attempt
is successful. The only context in which general relativity can give (at least arguably) a
complete explanation of inertia is a closed, finite, unbounded cosmology, but the
observational evidence doesn't (at present) clearly support this hypothesis, and any
alternative cosmology requires some principle(s) outside of general relativity to
determine the metrical configuration of the universe.
Thus the twins paradox is ultimately about the origin and significance of inertia, and the
existence of a definite metrical structure with a preferred class of worldlines (geodesics).
In the general theory of relativity, spacetime is not simply the totality of all the relations
between material objects. The spacetime metric field is endowed with its own ontological
existence, as is clear from the fact that gravity itself is a source of gravity. In a sense, the
non-linearity of general relativity is an expression of the ontological existence of
spacetime itself. In this context it's not possible to draw the classical distinction between
relational and absolute entities, because spatio-temporal relations themselves are active
elements of the theory.
We should also mention another common objection to the relativistic treatment of the
twins, based not on any empirical disagreement, but on linguistic and metaphysical
preferences. It is pointed out that we can, without logical contradiction, posit the
existence of a unique, absolute, and true metaphysical time at every location, and we can
account for the differences between the elapsed times on clocks that have followed
different paths simply by stipulating that the rate of a clock depends on its absolute state
of motion (defined relative to, for instance, the local frame in which the presumably
global cosmic background radiation is maximally isotropic). Indeed this was essentially
the view advocated by Lorentz. However, as discussed at the end of Section 1.5,
postulating a metaphysical “truth” along with whatever physical laws are necessary to
account for why the observed facts differ from the postulated “truth” is not generally
useful, except as a way of artificially reconciling our experience with any particular
metaphysical truth that we might select. The relativistic point of view is based on purely
local concepts, such as that of an “ideal clock” corrected for all locally sensible
conditions, recommended to us by the empirical fact that all observable aspects of local
physical phenomena – including the rates of temporal progression – exhibit the same
dependence on their state of inertial motion (which is not a locally sensible condition).
This is the physical symmetry presented to us, and we are certainly justified in exploiting
this symmetry to simplify and clarify the enunciation of physical laws.
4.8 The Breakdown of Simultaneity
I have yielded: Instruct my daughter how she shall
persever, that time and place with this deceit so
lawful may prove coherent.
William Shakespeare, 1603
We've seen how the operational time convention enables us to define surfaces of
simultaneity with respect to any given inertial frame. However, if we try to apply this
procedure to a set of accelerating bodies the concept breaks down. The problem is
illustrated in the spacetime diagram shown below.
This drawing shows a family of worldlines, each having the identical history of velocity
as a function of time relative to the inertial coordinates. By sending light beams back and
forth to its neighboring worldlines, an observer following path B can determine that he is
equidistant from A and C. Likewise an observer on C is equidistant between B and D,
and an observer on D is equidistant from C and E. However, due to the change in velocity
of these worldlines, an observer on C cannot conclude that he is equidistant from A and
E. This breakdown of the well-defined locus of simultaneity is unavoidable in
accelerating systems, because the operational procedure defining simultaneity involves a
non-zero lapse of time for spatially separate objects, so the simultaneity relations change
during the performance of the procedure. Of course, the greater the distance between
objects, the greater the change in velocity (and simultaneity relations) during the
performance of a synchronization procedure.
Another illustration of this problem is shown below, where the instantaneous loci of
simultaneity of an abruptly accelerated worldline are seen to intersect each other (on the
left), so that a given distant event is assigned multiple times of occurrence. Furthermore,
events in the region "R" on the right do not properly correspond to any time according to
the accelerating worldline's instantaneous inertial time, because at the instant of
acceleration his locus of simultaneity jumps abruptly.
Obviously any amount of relative "skew" between the planes of simultaneity for a given
worldline will result in interference at some distance, producing non-unique time
coordinates. However, if the velocity of our worldline varies continuously (instead of
abruptly), then for some limited region the planes of simultaneity will be advancing
forward in time faster than they are "tilting" backwards, so over this limited region we
can, if we choose, make use of these planes of simultaneity for the time labels of events.
This situation is illustrated below.
We can easily determine the approximate limit for unique time labels with this kind of
coordinate system by noting that if the velocity changes by amount $dv/c$ during a time
interval $dt$, then the relative slope of the new plane of simultaneity is $c/dv$, so it intersects
with the original plane of simultaneity at a distance $dx = (c\,dt)(c/dv) = c^2/(dv/dt)$. Since
$a = dv/dt$ is the acceleration, we can estimate that this accelerating system of coordinates is
coherent out to distances on the order of $c^2/a$.
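To get a feel for the size of this limit, the estimate c^2/a can be evaluated for a familiar acceleration. This is just an illustrative sketch; the choice of one standard gravity as the acceleration, and the constant and function names, are assumptions made here rather than anything taken from the text.

```python
C = 299_792_458.0                  # speed of light in m/s (exact SI value)
G0 = 9.80665                       # standard gravity in m/s^2 (illustrative choice)
LIGHT_YEAR = 9.4607304725808e15    # metres in one Julian light-year

def coherence_distance(acceleration):
    """Order-of-magnitude range c^2/a over which the planes of simultaneity
    of a uniformly accelerating worldline still give unique time labels."""
    return C ** 2 / acceleration

d = coherence_distance(G0)
print(f"{d:.3e} m = {d / LIGHT_YEAR:.3f} light-years")
```

For an acceleration of one g the coherent region extends roughly one light-year, which is why the breakdown goes unnoticed on laboratory scales.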
As an example of the use of accelerating coordinate systems and the breakdown of
inertial simultaneity, consider a circular Sagnac device as described in Section 2.7. As
we've seen, each point on the rim of the rotating disk can be associated with an
instantaneously co-moving inertial coordinate system, each with its own surfaces of
simultaneity. However, since each point of the disk is accelerating with respect to each
other point, there is no coherent simultaneity (in the inertial sense) shared by any two
points. If we analytically continue the local simultaneity from one point to the next
around the perimeter, the result is an open helical surface as indicated below:
The worldline of a particular point on the rim is shown by the helical curve AB, and the
shallower helix represents the analytically continued surface of inertial simultaneity. (It's
interesting to compare this construction with Riemann surfaces in complex function
analysis.)
Of course, we can dispense with the use of local inertial simultaneity to define our
constant-t coordinate surfaces, and simply define an arbitrary system of space and time
coordinates in terms of which a rotating disk is stationary (for example), but we then
must be careful to correctly account for non-inertial aspects of these accelerating
coordinates, particularly with regard to the meanings of spatial lengths. The usual
intuitive definition of the spatial length of an object (such as the perimeter of the rim) is
the absolute length of a locus of inertially simultaneous points of that object, so it
depends on the establishment of a slice of "inertial simultaneity" over the entire rim. If
we use inertial coordinates this is easy, but if we use non-inertial coordinates (such as
those in which the rotating disk is stationary), then no surface of inertial simultaneity
coincides with our surfaces of constant time parameter. In fact, this is essentially the
definition of non-inertial coordinates. So, we will obviously be unable to define a
coherent locus of inertial simultaneity over the whole disk as a surface of constant time
parameter when working with non-inertial coordinates.
One consequence of this is the fact that the spatial length of a path becomes dependent on
the speed of the path. We are accustomed to this for temporal lengths, i.e., the length of
time around the rim might be 30 seconds or 2 hours or 1 nanosecond, etc., depending on
how fast we are going relative to the disk, how fast the rim is spinning, in which direction
it is spinning, and so on. Likewise the spatial length of a path around the rim (in terms of
some particular coordinates) depends on the speed of the path. This shouldn't be
surprising, because the decomposition of spacetime into separate spatial and temporal
components is not unique, i.e., there are multiple equally self-consistent decompositions.
Since this is often a source of confusion, it's worthwhile to describe how this works in
detail. Let's first establish inertial cylindrical coordinates in 2+1 spacetime, using polar
coordinates $(r,\theta)$ for the space (where $\theta$ is the angular coordinate), and $t$ for time. The
metric in terms of these inertial coordinates is
$$(d\tau)^2 = (dt)^2 - (dr)^2 - r^2 (d\theta)^2$$
and for any fixed time $t$ the purely spatial metric is
$$(ds)^2 = (dr)^2 + r^2 (d\theta)^2$$
So, to find the "length" of any spacelike curve, such as the perimeter of a spinning disk of
radius $r_d$ centered at the origin, we simply integrate $ds$ over this curve at the fixed value
of $t$. For a circular disk, $r = r_d$ is constant, so $dr = 0$, and the spatial metric is simply
$ds = r_d\,d\theta$, which we integrate from $\theta = 0$ to $2\pi$ to give the length $2\pi r_d$.
Now let's look at this situation in terms of a system of coordinates in which the spinning
disk is stationary, i.e., such that a fixed point anywhere on the disk maintains constant
spatial coordinates for all values of the temporal coordinate. Taking the most naive and
simplistic approach, let's define the new coordinates $T, R, \phi$ by the relations
$$T = t, \qquad R = r, \qquad \phi = \theta + \omega t$$
where $\omega$ is a constant, denoting the angular speed of these coordinates with respect to the
inertial $t, r, \theta$ coordinates. We also have the differentials
$$dT = dt, \qquad dR = dr, \qquad d\phi = d\theta + \omega\,dt$$
Substituting these expressions into the metric equation gives
$$(d\tau)^2 = (1 - \omega^2 R^2)(dT)^2 + 2\omega R^2\,dT\,d\phi - (dR)^2 - R^2 (d\phi)^2$$
According to these coordinates, a spatial length $S$ must be given by integrating the
absolute spacelike differential using the metric along some constant-$T$ surface, i.e., with
$dT = 0$, where the metric is
$$(dS)^2 = (dR)^2 + R^2 (d\phi)^2$$
Again for the perimeter of the disk we get $2\pi R_d = 2\pi r_d$. Notice that our constant-$T$
surfaces are also constant-$t$ surfaces, so this perimeter length agrees with our previous
result, and of course it doesn't matter which direction we integrate around the perimeter.
Incidentally, letting $v = R_d\,\omega$ denote the velocity of the rim with respect to the original
inertial coordinates, the full spacetime metric for the rim ($R = R_d$) in terms of the rotating
coordinates is
$$(d\tau)^2 = (1 - v^2)(dT)^2 + 2 v R_d\,dT\,d\phi - R_d^2 (d\phi)^2$$
For a point fixed on the rim we have $d\phi = 0$, and so
$$(d\tau)^2 = (1 - v^2)(dT)^2$$
which confirms that the lapse of proper time for a point fixed on the rim of the rotating
disk is $\sqrt{1 - v^2}$ times the lapse of $T$ (and therefore of $t$).
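The change of coordinates can be checked numerically without any algebra. The sketch below assumes units with c = 1 and takes the rotating angular coordinate to be related to the inertial one by phi = theta + omega*t (the sign convention that agrees with the ratio dS/ds = 1 + w/W used later in this section). The function names are hypothetical. It evaluates the squared proper-time element in both coordinate systems for random displacements, and then checks the rim-clock factor for a point fixed on the disk.

```python
import math
import random

def dtau2_inertial(dt, dr, dtheta, r):
    """Squared proper-time element in inertial polar coordinates (c = 1)."""
    return dt**2 - dr**2 - r**2 * dtheta**2

def dtau2_rotating(dT, dR, dphi, R, omega):
    """The same element in rotating coordinates, assuming phi = theta + omega*t."""
    return ((1 - omega**2 * R**2) * dT**2
            + 2 * omega * R**2 * dT * dphi
            - dR**2
            - R**2 * dphi**2)

random.seed(0)
omega, r = 0.3, 2.0                      # rim speed v = omega * r = 0.6 < 1
for _ in range(5):
    dt, dr, dtheta = (random.uniform(-1, 1) for _ in range(3))
    dT, dR, dphi = dt, dr, dtheta + omega * dt    # transformed differentials
    assert math.isclose(dtau2_inertial(dt, dr, dtheta, r),
                        dtau2_rotating(dT, dR, dphi, r, omega),
                        rel_tol=1e-12, abs_tol=1e-12)

# A point fixed on the rim has dR = dphi = 0; compare with sqrt(1 - v^2).
v = omega * r
dtau = math.sqrt(dtau2_rotating(1.0, 0.0, 0.0, r, omega))
print(dtau, math.sqrt(1 - v**2))
```

The two printed numbers agree, which is just the statement that the rim clock runs slow by sqrt(1 - v^2) relative to T.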
Now let's send light beams around the perimeter in opposite directions. For lightlike
paths we have $d\tau = 0$, so the path of light must satisfy
$$(1 - v^2)(dT)^2 + 2 v R_d\,dT\,d\phi - R_d^2 (d\phi)^2 = 0$$
The purely spatial component is $dS = R_d\,d\phi$, so we can make this substitution and divide
both sides by $(dT)^2$ to give
$$\left(\frac{dS}{dT}\right)^2 - 2v\,\frac{dS}{dT} - (1 - v^2) = 0$$
The quantity $dS/dT$ is the "speed of light" in terms of these rotating non-inertial
coordinates. Also, from the definitions we have
$$\frac{dS}{dT} = R_d\,\frac{d\phi}{dT} = R_d\left(\frac{d\theta}{dt} + \omega\right)$$
where $d\theta/dt$ is the angular velocity of the light at radius $R_d$ with respect to the inertial
coordinates, so it equals $\pm 1/R_d$ (noting that $c = 1$ in our units), with the sign depending on
whether the light is clockwise or counter-clockwise. Substituting into the previous
expression gives
$$\frac{dS}{dT} = v \pm 1$$
Letting $C = |dS/dT|$ denote the speed of light with respect to these rotating non-inertial
coordinates, we therefore have $C = 1 \pm v$, where again the sign depends on the direction
of the light relative to the direction of rotation of the disk.
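As a cross-check on the two light speeds, the lightlike condition on the rim can be solved directly as a quadratic in dS/dT. The sketch below assumes c = 1 and the sign convention phi = theta + omega*t, for which the condition reads (dS/dT)^2 - 2v(dS/dT) - (1 - v^2) = 0. The function name is hypothetical.

```python
import math

def light_speeds_rotating(v):
    """Signed roots of (dS/dT)^2 - 2*v*(dS/dT) - (1 - v^2) = 0, i.e. the
    coordinate velocities of light along the rim in the rotating system."""
    a, b, c = 1.0, -2.0 * v, -(1.0 - v * v)
    disc = math.sqrt(b * b - 4 * a * c)      # discriminant is 4, so this is 2
    return (-b + disc) / (2 * a), (-b - disc) / (2 * a)

v = 0.25
forward, backward = light_speeds_rotating(v)
print(forward, backward)                     # signed values v + 1 and v - 1
print(abs(forward) / abs(backward))          # ratio of speeds (1 + v)/(1 - v)
```

The magnitudes of the two roots are 1 + v and 1 - v, so their ratio reproduces the anisotropy (1 + v)/(1 - v) discussed next.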
Does this analysis lead to some kind of paradox? It indicates that the non-inertial "speed
of light" with respect to these rotating coordinates is not equal to 1, and in fact the ratio of
the speeds in the two directions is $(1+v)/(1-v)$, but of course this doesn't conflict with
special relativity, because these are not inertial coordinates (due to their rotation).
However, suppose we increase $R_d$ and decrease $\omega$ in proportion so that the rim speed $v$
remains constant. The above formulas still apply for arbitrarily large $R_d$ and small angular
speed $\omega$, and yet the speed ratio remains the same, $(1+v)/(1-v)$. Does this conflict with
special relativity in the limit as the radius goes to infinity and the angular speed of the rim
goes to zero? Clearly not, since we saw in Section 2.7 that if $t_1$ and $t_2$ denote the travel
times for light pulses circling the disk in opposite directions, as measured by a clock at a
fixed point on the rim, so that $t_2/t_1 = (1+v)/(1-v)$, then we have $t_2/t_1 - 1 = \Delta\theta/\pi$, where $\Delta\theta$
is the angular travel of the disk during the transit of light. In other words, the observed
ratio of travel times around the rim always differs from 1 by an amount proportional to
the angular travel of the disk during the transit of light. Thus the net acceleration (change
of velocity) of the rim observer during the measurement remains in constant proportion to
the measured anisotropy of the transit times.
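The scaling argument can be illustrated numerically. The sketch below works in the inertial frame with c = 1, and assumes that "the transit of light" refers to the longer of the two transits (the pulse that has to chase the emitter); the rim clock rescales both times by the same factor sqrt(1 - v^2), so the ratio is unaffected. The function name and the sample radii are illustrative choices.

```python
import math

def transit_times(radius, v):
    """Inertial-frame times for light (c = 1) to circle the rim once and
    return to an emitter that rides on the rim at speed v."""
    circumference = 2 * math.pi * radius
    t_short = circumference / (1 + v)    # pulse sent against the rotation
    t_long = circumference / (1 - v)     # pulse sent with the rotation
    return t_short, t_long

v = 0.1
for radius in (1.0, 1e3, 1e6):
    omega = v / radius                   # angular speed shrinks as the disk grows
    t1, t2 = transit_times(radius, v)
    disk_angle = omega * t2              # disk rotation during the longer transit
    print(radius, t2 / t1 - 1, disk_angle / math.pi)
```

However large the disk is made, the last two columns stay equal to 2v/(1 - v): the anisotropy never outruns the rotation accumulated while it is being measured.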
However, even without waiting for the light rays to circle the disk and report their
anisotropy, don't the above formulas imply that the speeds of light in the two directions
are in the ratio of $(1+v)/(1-v)$ instantaneously with respect to our rotating coordinates,
and don't the rotating coordinates approach being inertial coordinates as $R_d$ increases
while holding $v$ constant? Yes and no. Both sets of coordinates use the same time $t = T$,
but they use different space coordinates, $s$ and $S$. For the perimeter of the disk we have
$$ds = r_d\,d\theta, \qquad dS = R_d\,d\phi = r_d\,(d\theta + \omega\,dt) = \left(1 + \frac{\omega}{W}\right) ds$$
where $W = d\theta/dt$. Thus the ratio $dS/ds$ of spatial distances along a given "path" depends
on the angular speed $W$ of the path. Recall that for a signal travelling at $c = 1$ (with
respect to the inertial coordinates) around the perimeter we have $W = \pm 1/r_d$, and so
$$\frac{dS}{ds} = 1 \pm \omega r_d = 1 \pm v$$
This is consistent with the velocity ratio
$$\frac{dS/dT}{ds/dt} = \frac{v \pm 1}{\pm 1} = 1 \pm v$$
This shows that the "spatial distances" around the perimeter are different in the two
directions. But we saw earlier that "the spatial distance" was independent of the direction
in which we integrated around the perimeter, even in the rotating coordinate system, so
does this indicate an inconsistency? No, because, as noted above, the ratio $dS/ds$ along a
given path depends on the speed of the path. We have $dS/ds = 1 + \omega/W$, and for the
perimeter of the disk with rim speed $v = \omega r_d$ and for a path with speed $V = W r_d$, this gives
$$\frac{dS}{ds} = 1 + \frac{v}{V}$$
If the path is lightlike, we have $V = \pm 1$ and so $dS/ds = 1 \pm v$, whereas when we considered
the purely spatial distance around the perimeter we took the "instantaneous" distance, i.e.,
we took a spacelike path with $V = \infty$, in which case $dS/ds = 1$ in both directions. This
explains quantitatively what we mean when we say that we are measuring different
things, depending on what spacetime path is having its "spatial length" evaluated. Just as
the temporal length of a path around the rim depends on the speed of the path, so too does
the spatial length.
By the way, notice that if we integrate the spatial component of a path whose velocity V
(relative to the original inertial coordinates) is the same as the rim speed itself, so that v =
V, then obviously we will never move with respect to the disk in one direction, so dS = 0
and therefore dS/ds = 0, whereas in the other direction we have dS/ds = 2. Similarly if V
= 0 we will never move relative to the original coordinates, i.e., ds = 0 and therefore
dS/ds is infinite along such a path.
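The special cases just described can be tabulated from the single relation dS/ds = 1 + v/V. In the sketch below V is a signed speed relative to the inertial coordinates (c = 1), with the sign chosen so that V = -v is a path riding along with the rim. That sign convention, the function name, and the sample rim speed are assumptions made for illustration.

```python
import math

def length_ratio(v, V):
    """Ratio dS/ds = 1 + v/V of rotating-coordinate to inertial-coordinate
    spatial length along a rim path of signed inertial speed V."""
    if V == 0:
        return math.inf          # path at rest in the inertial coordinates: ds = 0
    return 1 + v / V             # v/V vanishes for an "instantaneous" path (V infinite)

v = 0.5
cases = [
    ("light, one direction", 1.0),
    ("light, other direction", -1.0),
    ("instantaneous (spacelike)", math.inf),
    ("path with V = v", v),
    ("path riding with the rim", -v),
    ("path at rest inertially", 0.0),
]
for label, V in cases:
    print(f"{label:28s} dS/ds = {length_ratio(v, V)}")
```

The table shows the "spatial length" of one and the same circuit ranging from zero through 1 - v, 1, 1 + v and 2 up to infinity, depending only on the speed of the path along which it is evaluated.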