The Miraculous Consilience of Quantum Mechanics
Malcolm R. Forster
Department of Philosophy,
University of Wisconsin-Madison
September 5, 2004
ABSTRACT: Some quantum mechanical phenomena are notoriously hard to
explain in causal terms. But what prior motivation is there for seeking a causal
explanation in the first place, other than the fact that causal explanations have been used
successfully to explain unrelated phenomena? The answer is two-fold. First, the
agreement of independent measurements of probabilities is the mark of successful
causal explanations, and it fails in many quantum mechanical examples. But
secondly, and more importantly, causal explanations fail to replicate the
successful predictions made by quantum mechanics of one phenomenon from other
phenomena of a very different kind.
1. Baby Epistemology
Think back to a time when mirrors were a new experience to us as children. How did we learn
that what we saw in mirrors was not a different world, but a reflection of the world we already
knew? How did we acquire the parsimonious view that reflections were independent views of the
same objects?[1] Presumably, it has something to do with the way that mirror images are
correlated with our more direct perception of the objects.

Consider a different way of illustrating the same idea. Reichenbach (1938) imagined an observer
enclosed in the corner of a cubical world, where objects in the external world cast shadows on the
walls of the enclosure. [Figure 2: Reichenbach’s cubical world.] It seems that our knowledge of the
external world would be better if external objects cast two independent shadows on two walls of the
enclosure, rather than a single shadow. Again, we may ask why we should conclude that they
are shadows of the same object, rather than shadows of different objects.

[1] This mirror example is used in Hung (1997).
There is no good formal theory of how such inferences work. Nevertheless, they seem to
conform to informal principles such as the principle of parsimony, or Occam’s razor, which
states that “Entities are not to be multiplied beyond necessity.”[2] The rule is vague in crucial
ways. For one, what counts as an entity? And under what conditions does it become necessary
to multiply an entity?
Consider the first question more carefully. Do probabilities count as entities? Is there, in
other words, a principle of inference that says that probabilities are not to be multiplied beyond
necessity? In section 3, I argue that such a principle is crucial in causal modeling, for it provides
an account of the time asymmetry of cause and effect that does not rely on the concept of
manipulation or intervention. On the other side of the same coin, there are well known
circumstances in which causal modeling does not work. In section 4 I show that the application
of Occam’s razor to probabilities leads to false predictions in the infamous double-slit
experiment. The question then arises: Does Occam’s razor apply to quantum mechanics (QM)
in some different way? Is QM parsimonious with respect to other kinds of entities? This is a
difficult question, which I attempt to answer affirmatively in sections 5, 6, and 7, at least within
the context of a concrete example.
The concrete example is interesting because it is similar to the example used by Bell (1964,
1971) to falsify local hidden variable interpretations of QM. I provide a more recent shorter
version of Bell’s derivation of a false prediction in section 6, while in section 7 I show that it is
actually very close to the way that QM makes the correct prediction (just replace variables with
operators).
Bell’s theorem can be viewed as an argument that (at least some) quantum mechanical
phenomena have no causal explanation (van Fraassen 1983). This fits with the theses argued in
prior sections: (1) The constancy or invariance of probabilities is an essential component of
[2] The principle is named after William of Ockham (~1280–1347 AD), who stated in Latin “non sunt multiplicanda entia praeter necessitatem.”
causal modeling, and (2) the assumed invariance of probabilities leads to a false prediction in the
double-slit experiment. Therefore, there is no causal explanation of the double-slit phenomena.
The advantage of Bell’s argument is that it is harder to wriggle out of the conclusion (though not
impossible).
In the discussion (section 8) things are tied together in the following way. One way of
making Reichenbach’s cubical world inference precise is in terms of his principle of common
cause (Reichenbach 1956). This principle appears to provide a sufficient condition for
postulating a single hidden variable to explain two effects. This sufficient condition clearly
applies to the QM correlations in section 6, and when the parsimonious hidden variable model is
considered, it leads to a false prediction. The arrow of modus tollens can be pointed in two
directions. One response is to say that it is necessary to multiply the hidden variables, while a
different response is to abandon hidden variables completely. Quantum theory takes the second
option by replacing variables with operators. This is what provides a description of entities to
which Occam’s razor does apply. The lesson is that the theory tells us what entities are not to be
multiplied beyond necessity. Progress in science sometimes depends on radically new ways of
thinking that fail to respect the intuitive standards of explanation and understanding set by
preceding theories or common sense.
I have not attempted to provide a formal account of the cubical world inference, partly because I’m
not sure that there exists a single formal account. In the absence of a precise theory, the best
strategy is to look at disparate examples of science. Nevertheless, a good informal description of
the kind of scientific inferences that use Occam’s razor was published in 1858 by William
Whewell. Section 2 provides a brief introduction to these ideas.
2. Whewell’s Consilience of Inductions
Long before the discovery of quantum mechanics, William Whewell (1858) claimed that
scientific induction proceeds in three steps: (1) the Selection of the Idea, (2) the Construction of
the Conception, and (3) the Determination of the Magnitudes (see Butts 1989, 223-237). In
curve-fitting, for example, the selection of the idea is the selection of the independent variable
(x) from which we hope to predict the dependent variable (y). The construction of the
conception is determined by the choice of the formula (family of curves). And the determination
of magnitudes is what statisticians now refer to as the estimation of the adjustable parameters.
Whewell then makes an insightful claim about curve-fitting:
If we thus take the whole mass of the facts, and remove the errours of actual
observation, by making the curve which expresses the supposed observation regular and
smooth, we have the separate facts corrected by their general tendency. We are put in
possession, as we have said, of something more true than any fact by itself is.
(Whewell, quoted from Butts 1989, p. 227.)
In other words, the purpose of the curve, or the formula, is to capture the general tendency of the
data, the signal behind the noise, and this is something “more true” than the sum of the observed
facts. In another place, he again emphasizes that unification is the key to scientific innovation:
The particular facts are not merely brought together, but there is a New Element added
to the combination by the very act of thought by which they are combined. There is a
Conception of mind introduced in the general proposition, which did not exist in any of
the observed facts. ... The pearls are there, but they will not hang together until some
one provides the string. (Whewell, quoted from Butts 1989, pp. 140, 141.)
In the case of curve-fitting, the conceptual string is the curve, but Whewell also insists that the
same metaphor applies to all instances of scientific induction.
The best examples of science consist in much more than the mere colligation of facts (which
is Whewell’s name for ‘induction’). It also strives towards what he calls the consilience of
inductions (note the plural). The mark of a good theory lies not in the relationship between the
theory and its data in a single narrow application, but in the way it succeeds in ‘tying together’
separate inductions. A good theory is like a tree that puts out runners that grow into new trees,
until there is a huge forest of mutually sustaining trees. The ‘tying together’ can be achieved in
either of two ways: (a) The theory accommodates one set of data, and then predicts data of a
different kind. (b) The theory accommodates two kinds of data separately, and then finds that
the magnitudes in the separate inductions agree, or that the laws that hold in each case ‘jump
together’.[3] Whewell describes both kinds of cases as leading to a consilience of inductions.[4]
[3] When the magnitudes agree, then we have what is more commonly referred to as an agreement of independent measurements. It is no coincidence that Newton scholars, such as Harper (2002), emphasize the importance of the agreement of independent measurements in Newton’s argument for universal gravitation. For Whewell was also primarily concerned with the explication of Newton’s methodology. See also Myrvold and Harper (2002) for an argument that this kind of evidence is not properly taken into account in standard statistical methods of model selection.
The main purpose of this essay is to show how parts of quantum theory can be viewed as
achieving a consilience of many inductions. I believe that this achievement of QM is the starting
point of a realist interpretation of quantum mechanics that is ontologically more parsimonious than
hidden variable interpretations of quantum mechanics.
First, I need to show that there is nothing built into Whewell’s methodology that militates
against the success of causal explanations. Quite the opposite. The consilience of inductions
actually provides a criterion for when causal explanations should and should not be applied. It is
exactly because this criterion can fail in QM that causal explanations should be ruled out for
some, though not all, quantum mechanical phenomena.
3. The Consilience of Correct Causal Models
Consider a very simple example—the jackpot machine. The machine has two input states:
Either one euro coin is placed in the machine, or two euro coins are placed in the machine, and then a
handle is pulled. The output state is either ‘win’ or ‘lose’. For the sake of concreteness, suppose
‘win’ refers to the event that the machine delivers 10 euros as a ‘jackpot’, and a loss leads to a zero
payout. In this example, the input states do not determine the output states. So, the ‘constancy’
built into the machine is described in terms of the constancy or invariance of probabilities.
Introduce two variables, X and Y. Upper case letters are used in statistics to denote variables
that have probabilities assigned to their possible values. Such variables are called random
variables. X represents the input state of the machine in some particular trial of the experiment,
while Y represents the output state in the same trial. The possible states of the machine can be
represented by assigning arbitrary numerical values to these variables. Let X = 1 denote the
event that only one euro is placed in the machine before the handle is pulled, while X = 2 denotes
the event that 2 euros are placed in the machine. Y = 0 denotes the event that there is no payout,
while Y = 1 denotes the event that the jackpot (of 10 euros) is paid out.
The standard ‘forward’ causal model says:

$$P_i(\text{win} \mid 1\ \text{euro}) = \alpha \quad\text{and}\quad P_i(\text{win} \mid 2\ \text{euros}) = \beta, \quad \text{for all trials } i.$$
[4] My purpose is not to argue for some particular historical or exegetical thesis about Whewell’s notion of consilience, but to use (and adapt) Whewell’s idea for the purpose of explaining how probabilistic theories work in general, and how quantum mechanics works in particular.
That is, different trials have something in common; namely, that the values of the forward
conditional probabilities are the same in all cases. Note that the model postulates values for all
forward probabilities:

$$P_i(\text{loss} \mid 1\ \text{euro}) = 1 - \alpha \quad\text{and}\quad P_i(\text{loss} \mid 2\ \text{euros}) = 1 - \beta, \quad \text{for all trials } i.$$

These probabilities are also invariant because they are defined solely in terms of the constants α
and β.
These four probabilities are postulated by the model. They are theoretical entities. Their
measurement, or estimation, is determined from the available data in the same way that any
theoretical quantity is measured. Consider a typical set of data. Suppose that Alice plays the
machine 200 times, and we record the input and output state on each trial. The data consist in a
sequence of data ‘points’, which come in four flavors: (1, 0), (1, 1), (2, 0), or (2, 1). Because the
model says that the temporal order of the trials does not matter, the data are adequately recorded
in the following table of observed frequencies (Alice’s data):

          1 euro   2 euros
  loss      90        80
  win       10        20
Notice that Alice plays the machine with 1 euro half the time, and with 2 euros half the time
(100 trials each). Out of all the times she plays the 1 euro version of the game, she wins the
jackpot 10 times, thus earning 100 euros, which is the same amount that she paid to play. Out of
all the times she pays 2 euros to play, she wins 20 out of the 100 times (twice as often), and earns
a total of 200 euros (twice as much). But she paid twice as much to play, so she still earned the
same as what she paid to play. On the basis of the data, the machine appears to be fair.
Whewell’s three steps in the colligation of facts apply to this example in the following way.
Step 1 consists in the selection of X and Y as the relevant quantities to be considered. Step 2
introduces the formula, which in this case is probabilistic in nature. It introduces a family of
probability distributions parameterized by the adjustable parameters α and β. These determine
the Conception. In this example, the conception is probabilistic in nature. Step 3 is the
determination of the magnitudes α and β from the data. In contemporary statistical theory, this
is achieved by the method of maximum likelihood estimation (MLE).
To understand how MLE works, first note that each pair of values assigned to α and β picks
out a particular probabilistic hypothesis in the model. The fit of each hypothesis with the data is
defined by its likelihood, which is, by definition, the probability of the data given the hypothesis
(this should not be confused with the probability of the hypothesis given the data, which is a
distinctly Bayesian concept). The greater the likelihood of a hypothesis (the more probable it
makes the data) the better the hypothesis fits the data. The hypothesis that fits best is, by
definition, the hypothesis in the model that has the maximum likelihood. For arbitrary values of
α and β, the likelihood is:
$$\text{Likelihood}(\alpha, \beta) = (1-\alpha)^{90}\,\alpha^{10}\,(1-\beta)^{80}\,\beta^{20}.$$
The probabilities are multiplied together because each trial is probabilistically independent of all
the others according to the model, in the same way that coin tosses are independent.
Mathematically speaking, maximizing the likelihood is the same as maximizing the log-likelihood:

$$\log\text{-Likelihood}(\alpha, \beta) = \left[\,90 \log(1-\alpha) + 10 \log \alpha\,\right] + \left[\,80 \log(1-\beta) + 20 \log \beta\,\right].$$
Again, the terms in the square brackets can be maximized separately by differentiating with
respect to α and β and setting the resulting expressions equal to zero. Note that
$d \log x / dx = 1/x$. After multiplying by the factors $\alpha(1-\alpha)$ and $\beta(1-\beta)$, respectively, the
equations simplify to:

$$-90\,\alpha + 10\,(1-\alpha) = 0 \quad\text{and}\quad -80\,\beta + 20\,(1-\beta) = 0.$$
These equations yield the estimates $\hat{\alpha} = 0.1$ and $\hat{\beta} = 0.2$. Accordingly, the theoretically
postulated probabilities are estimated by the natural relative frequencies in the data, just as one
would naïvely expect.
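To make the ‘determination of the magnitudes’ concrete, here is a minimal Python sketch (added for illustration; it is not part of the original text) that recovers the estimates from Alice’s counts, both by the closed-form relative frequencies and by numerically maximizing the log-likelihood written above.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Alice's observed frequencies (from the table in the text)
loss_1, win_1 = 90, 10   # 1-euro trials
loss_2, win_2 = 80, 20   # 2-euro trials

def neg_log_likelihood(p, losses, wins):
    """Negative Bernoulli log-likelihood for a win probability p."""
    return -(losses * np.log(1 - p) + wins * np.log(p))

# Numerical maximization of each bracketed term separately
alpha_hat = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6),
                            args=(loss_1, win_1), method='bounded').x
beta_hat = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6),
                           args=(loss_2, win_2), method='bounded').x

# The closed-form MLEs are just the relative frequencies
print(alpha_hat, win_1 / (loss_1 + win_1))   # both ~0.1
print(beta_hat, win_2 / (loss_2 + win_2))    # both ~0.2
```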
The question is whether the ‘forward’ model, which I have called the standard model, has
evidence in its favor that a ‘backward’ model does not. The backward model seeks to predict
values of X from values of Y, in the same probabilistic sense of ‘prediction’. In other words, it
postulates backward probabilities as follows:
$$P_i(2\ \text{euros} \mid \text{loss}) = \gamma \quad\text{and}\quad P_i(2\ \text{euros} \mid \text{win}) = \delta, \quad \text{for all trials } i.$$
Using the same methods as before, the parameters of the backward model are estimated by the
corresponding relative frequencies in the data. The estimated values are $\hat{\gamma} = 0.47$ and
$\hat{\delta} = 0.67$. There is nothing that points to any empirical difference between the models. In fact,
they cannot be compared with regard to their fit to the data because they tackle very different
prediction tasks. The fit of the forward model is measured in terms of Y-values, while the fit of
the backward model is measured in terms of X-values. In this sense, the two models are
incommensurable.
So, why don’t the two models happily co-exist? There is a sense in which they do co-exist,
for the forward model is capable of making backward predictions if it is provided with
information about the relative frequency of 1 euro versus 2 euro trials. With this information,
together with the estimated forward probabilities, the forward model can calculate the backward
probabilities, and gets the same answer as the backward model. This is because they are effectively
calculated from the table of data in both cases. But this only serves to deepen the puzzle. If both
models can be seen as predictively equivalent, why interpret one as causal and not the other?
This is the familiar puzzle about cause and correlation: The evidence for causation cannot be
exhausted by the correlations in the data, for correlations are symmetric, while causation is not.
Either there is no additional empirical evidence, in which case causal inference is based on non-empirical criteria (or psychological habit!?), or there is other evidence that breaks the symmetry.
As the reader may anticipate, the solution is to look at how the models predict data of a
different kind. Suppose that Bob plays the machine, and he happens to play with 2 euros twice
as often as with 1 euro. Given that the forward model is true, the frequency that Bob wins will
conform to the same forward probabilities, modulo a fluctuation in the data caused by sampling
errors. The noise fluctuations are not relevant to this discussion, so imagine that the number of
data points is very large. Unfortunately, this would make the numbers harder to work with, so I have
more simply assumed that the relative frequencies conform exactly to the ‘true’ values. The data
from Bob’s trials are therefore given by the natural frequencies in the following table (Bob’s data):

          1 euro   2 euros
  lose      90       160
  win       10        40
Given the way that the example is set up, the independent measurements from Alice’s and
Bob’s data of the parameters α and β will agree. The key question is whether the same is true
for the backward model. If it is not, then the symmetry between the models is broken.
A simple calculation shows that the new estimates of the parameters of the backward model
are $\hat{\gamma} = 0.64$ and $\hat{\delta} = 0.80$. These are not close to the previous values, which were $\hat{\gamma} = 0.47$ and
$\hat{\delta} = 0.67$. Therefore, the backward model fails the test of consilience, and the symmetry between
the forward and backward model is broken on empirical grounds (Forster 1984, Sober 1994,
Arntzenius 1997).
There is no such evidence available by considering the goodness-of-fit of the pooled data—
the models are always symmetric with respect to any single data set, as was shown by
considering Alice’s data. The evidence that differentiates the two models is always relational.
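The relational character of this evidence is easy to display in code. The following Python sketch (my illustration, not the author’s) estimates the forward and backward parameters separately from Alice’s and Bob’s tables and shows that the forward estimates agree across the two data sets while the backward estimates do not.

```python
# Each table maps (input, output) to a count; the counts are those given in the text.
alice = {('1e', 'loss'): 90, ('1e', 'win'): 10, ('2e', 'loss'): 80, ('2e', 'win'): 20}
bob   = {('1e', 'loss'): 90, ('1e', 'win'): 10, ('2e', 'loss'): 160, ('2e', 'win'): 40}

def forward(data):
    """Estimate alpha = P(win | 1 euro) and beta = P(win | 2 euros)."""
    a = data[('1e', 'win')] / (data[('1e', 'win')] + data[('1e', 'loss')])
    b = data[('2e', 'win')] / (data[('2e', 'win')] + data[('2e', 'loss')])
    return a, b

def backward(data):
    """Estimate gamma = P(2 euros | loss) and delta = P(2 euros | win)."""
    g = data[('2e', 'loss')] / (data[('2e', 'loss')] + data[('1e', 'loss')])
    d = data[('2e', 'win')] / (data[('2e', 'win')] + data[('1e', 'win')])
    return g, d

print(forward(alice), forward(bob))    # (0.1, 0.2) for both: consilience
print(backward(alice), backward(bob))  # (~0.47, ~0.67) vs (0.64, 0.8): failure
```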
4. A Failed Consilience in the Double-Slit Experiment
The postulation of hidden variables is a way of interpreting the probabilities of quantum
mechanics as ‘measures of ignorance’. Einstein, for example, believed that a future physics
would reveal the existence of such hidden variables, and the decay of radioactive particles, for
example, could then be predicted exactly. Bohr, on the other hand, thought that quantum physics
was complete in the sense that quantum probabilities are here to stay.

At the time of the debate, the most common example used was the double-slit experiment.
[Figure 1: In the double-slit experiment, the assumption that individual photons travel through
either slit A or slit B leads to the false prediction that the double-slit pattern is the sum of the two
single-slit patterns.] Consider a particle of light (photon) that leaves the light source and travels
through either slit A or slit B (and not both). We may represent this event in terms of a random
variable X, which can have one of two values, $x_A$ and $x_B$.[5] For our purposes, it doesn’t matter
what numerical values we use, so let $x_A = 1$ and $x_B = -1$. Now introduce a second random
variable Y such that Y = 1 if the particle is detected in some region C, and Y = 0 otherwise. In a
more natural shorthand notation, let A stand for the event X = 1, B for the event X = −1,
and C for the event Y = 1. The probability of C given A is written $P(C \mid A)$. Similarly,
$P(C \mid B)$ is the probability that the particle arrives at C given that it passes through slit B. $P(A)$ is
the probability that the particle passes through slit A given that it passes through any slit.
Then the probability of C (given that it arrives somewhere on the screen) is, according to the
axioms of probability:

$$P(C) = P(A)\,P(C \mid A) + P(B)\,P(C \mid B).$$

[5] Recall that a random variable is, by definition, any variable that has a probability distribution associated with it.
It is worthwhile working through the proof of this theorem, because it introduces some
elementary concepts of probability theory that are controversial in QM. Let’s suppose that the
particle passes through slit A or B, but not both. Then B is logically equivalent to the event
not-A. Suppose that 50 particles pass through slit A and 50 particles pass through slit B. Of the 50
particles that pass through slit A, 40 arrive at C. Of the 50 particles that pass through slit B, 20
arrive at C. Therefore, a total of 60 out of 100 particles arrive at C. The frequencies are
summarized in the following table:

            C    not-C
  A        40      10
  not-A    20      30

In symbols, $P(A) = 50/100$, $P(C \mid A) = 40/50$, $P(B) = 50/100$, and $P(C \mid B) = 20/50$. Therefore,

$$P(A)\,P(C \mid A) + P(B)\,P(C \mid B) = \frac{50}{100}\cdot\frac{40}{50} + \frac{50}{100}\cdot\frac{20}{50} = \frac{60}{100} = P(C).$$

$P(A, C)$ is the probability that the particle passes through slit A and arrives at C. According
to the table, this joint probability is 40/100. The argument assumes that joint probabilities,
such as $P(A, C)$, exist.
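As a quick arithmetic check (added here for illustration), the following Python lines recompute P(C) from the frequency table above using the law of total probability.

```python
# Counts from the table: each slit maps to (arrivals at C, arrivals elsewhere)
counts = {'A': (40, 10), 'not-A': (20, 30)}

total = sum(sum(v) for v in counts.values())            # 100 particles in all
P_A = sum(counts['A']) / total                          # 0.5
P_B = sum(counts['not-A']) / total                      # 0.5
P_C_given_A = counts['A'][0] / sum(counts['A'])         # 0.8
P_C_given_B = counts['not-A'][0] / sum(counts['not-A']) # 0.4

# The law of total probability reproduces the directly counted value 60/100
print(P_A * P_C_given_A + P_B * P_C_given_B)            # 0.6
```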
So far, there is no problem. But now postulate that the probabilities $P(C \mid A)$ and $P(C \mid B)$
do not depend on whether the other slit is open or closed. This is often called a locality
assumption because it assumes that what happens to a particle passing through one slit is
unaffected by what is happening non-locally (at the other slit in this case). For me, it is an
invariance assumption—it is an attempt to unify the phenomena of the single-slit and double-slit
experiments. It is the attempted consilience, which may also be viewed as an application of
the principle that “Probabilities are not to be multiplied beyond necessity.”

Unification is important because it leads to predictions. But in this case it leads to the
false prediction that the double-slit pattern is the average of the two single slit patterns. If the
slits are the right distance apart, then the double-slit interference pattern has its brightest spot
at C (see Fig. 1), whereas any average of the single slit patterns has the brightest spots
directly in front of the two slits. The invariance of the probabilities between the single-slit
and double-slit experiments provides a potential consilience of inductions, but when the
consilience fails, the intuitive explanation on which it is based is called into question.
There are ways of resisting this conclusion. The first obvious response is that different
photons are interacting with each other after they pass through the slits. This possibility is
highly implausible in light of the fact that the interference pattern is exactly the same if the
intensity of the light is so low that only one photon passes through the slit at a time.
Another possibility is that there is some kind of continuously emitted pilot wave that
guides the particles to their appropriate destinations. The pilot wave is affected by whether
the second slit is open or closed. You can see that this hypothesis is ad hoc if there is no way
of independently detecting the existence of the pilot wave. But it doesn’t lead to any false
predictions. To that extent it solves the problem. On the other hand, it does not fix the failed
consilience either.
The argument that probabilities should be invariant across the single-slit and double-slit
experiments is based on the assumption that the particles pass through one slit or the other in
every instance. It is an important prediction of quantum mechanics that if the precise
trajectory is experimentally determined, then the classical prediction is correct—that is, in
this case the pattern on the screen will be the sum of the two single-slit patterns. For
example, electrons can be detected going through a particular slit by the electric current they
induce when they pass through a wire loop. So if we place wire loops behind the two slits,
then the sum of the two single slit patterns will be observed.
The problem with the classical explanation is that it is sometimes wrong, and this can be
traced back to its use of variables. In classical models, variables serve a dual purpose. On
the one hand, they represent the values of quantities that are actually measured. At the same time,
they continue to represent quantities that are assumed to have precise values even if those
values cannot be determined by the theory in the particular experimental context. In this
role, they are acting as hidden variables. The hidden variables represent the causes of the
values of the observed quantities. The positions and momenta of particles in a gas are
examples of hidden variables in classical mechanics, so the assumption of hidden variables
does not always lead to false predictions in physics.
Quantum mechanics is distinguished from classical mechanics by the fact that it delegates
these two roles to different kinds of mathematical objects. Variables are still used to represent
observed quantities. But variables are not used to represent the quantities that are
“measured”.[6] Instead, they are represented by a quantum mechanical observable, $\hat{X}$. It is
enough for the present purpose to know that quantum mechanical observables are represented
mathematically by operators in an abstract formalism, where these operators have one
important similarity with variables, and one important difference. The similarity is that they
still have a set of possible values assigned to them—called eigenvalues. The eigenvalues are
real numbers, just as the values of variables are real values. There is also a precise
probability distribution assigned to these eigenvalues, so QM observables are also similar to
random variables. But operators have an algebraic property that variables do not have. For
instance, let X represent the observed vertical position of a particle as it passes through the
slit and let P represent the observed vertical component of the momentum of the particle as it
is deflected by the tungsten plate in which the slits are housed. It is mathematically true of
any variables that $XP = PX$. That is, variables always commute when they are multiplied
together. While the corresponding quantum mechanical operators, $\hat{X}$ and $\hat{P}$, can also be
multiplied together, they do not necessarily commute. That is, in general, $\hat{X}\hat{P} \neq \hat{P}\hat{X}$. The
non-commutation of QM observables, when it holds, constrains the possible joint probability
distributions that can be assigned to the two observables in a way that leads to Heisenberg’s
famous uncertainty principle. Observables whose operators fail to commute are called
incompatible observables. The result is that the product of the variances of the two
incompatible observables must be greater than zero. If one of the variances is 0, then the
other must be infinitely large. For those who do not know the definition of the variance of a
random variable, all you need to know is that QM rules out probability distributions that
assign probability values of 0 or 1 to the values of incompatible observables. A theory, such
as classical statistical mechanics, may not in fact assign probabilities of 0 and 1 either, but
the formalism of the theory does not rule it out as a theoretical possibility.
According to the orthodox interpretation of quantum mechanics, this means that
incompatible observables never have precise values simultaneously. For on this
interpretation of QM, an observable has a precise value if and only if the QM variance of the
observable is zero. But according to the uncertainty principle, if the variance of an
observable is close to zero, then the variance of any incompatible observable must be very
large, so their product remains finite. This is loosely expressed by saying that if the particle
has a precise position, then it loses its momentum.

[6] The scare quotes are necessary because the very concept of measurement changes in QM.
Einstein was someone who rejected the orthodox interpretation of QM. But we have
already seen that the conclusion is not easy to reject, for its rejection, coupled with other
plausible assumptions about locality, can lead to a false prediction.
The probability distribution that QM assigns to a QM observable depends on the quantum
mechanical state of the system. The state is represented by a wave function, which in this case
assigns a complex number to every point in the so-called configuration space, which in our
example is the space of all the possible positions that can be assigned to the particles. Again,
the details of the formalism do not matter here. What’s important is that all states determine
probability distributions that conform to the uncertainty principle.
We are now in a position to understand how the apparently more constrained quantum
mechanical formalism can succeed where the hidden variable account fails. Recall that the
hidden variable theory unified the phenomena by assuming that the probability distributions
in the two single slit experiments add together to produce the probability distribution in the
double slit experiment. Quantum mechanics replaces the additivity of probabilities with the
additivity of wave functions. Let’s introduce a random variable Y , where y denotes an
arbitrary value of Y, to represent the possible points on the screen at which the particle may
be detected. Now consider the single slit experiment in which slit B is closed. Suppose a
quantum mechanical model entails a wave function that does not depend on time, and has a
complex number $\psi_A(y)$ associated with each point on the screen. Then the model implies
that the probability that a particle is detected near the point y is proportional to $|\psi_A(y)|^2$.
Note that the square of the magnitude of a complex number is a non-negative real number.
Similarly, the probability of a particle landing near y is $|\psi_B(y)|^2$ in the other single slit
experiment. Then the probability of a particle landing near the same point in the double slit
experiment is:

$$\left| \tfrac{1}{\sqrt{2}}\,\psi_A(y) + \tfrac{1}{\sqrt{2}}\,\psi_B(y) \right|^2 = \tfrac{1}{2}\,|\psi_A(y)|^2 + \tfrac{1}{2}\,|\psi_B(y)|^2 + \text{interference terms}.$$

If the interference terms are zero, then the prediction is the same as the hidden variable
prediction. QM allows for the additivity of probabilities, but when it applies, it is derived
from the additivity of wave functions. The virtue of the quantum mechanical model is that it
succeeds in unifying disparate phenomena. In other words, it leads to a successful
consilience of inductions in all cases.
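To illustrate how adding amplitudes differs from adding probabilities, here is a short Python sketch. The particular plane-wave-times-Gaussian amplitudes are my own toy choice for illustration only; they are not the paper’s model of the slits.

```python
import numpy as np

y = np.linspace(-5, 5, 1001)          # points on the screen
k = 3.0                                # arbitrary wave number for the toy amplitudes

# Toy single-slit amplitudes: Gaussian envelopes with opposite phase gradients
psi_A = np.exp(-0.2 * (y - 1) ** 2) * np.exp(1j * k * y)
psi_B = np.exp(-0.2 * (y + 1) ** 2) * np.exp(-1j * k * y)

# Hidden-variable ("classical") prediction: average of the single-slit probabilities
p_classical = 0.5 * np.abs(psi_A) ** 2 + 0.5 * np.abs(psi_B) ** 2

# Quantum prediction: add the amplitudes first, then take the squared magnitude
p_quantum = np.abs(psi_A / np.sqrt(2) + psi_B / np.sqrt(2)) ** 2

# The difference between the two predictions is exactly the interference term Re(psi_A* psi_B)
interference = np.real(np.conj(psi_A) * psi_B)
print(np.allclose(p_quantum - p_classical, interference))   # True
```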
The point of this section is two-fold. First, modeling in quantum mechanics is very
different from curve fitting because it uses operators in place of variables. Nevertheless, the
reasons for doing this make sense in terms of the general goals of modeling in classical
physics. In some cases, the postulation of hidden variables is harmless. But when
interference effects are present, the postulation of hidden variables leads to a false prediction.
The conceptual innovations of quantum mechanics succeed in making correct predictions in
accordance with a methodology that is consistent with Whewell’s idea that a successful
consilience of inductions is the strongest kind of evidence that we have for any theory.
5. Sequential Spin Measurements on a Single Electron
In the quantum mechanical model of electron spin, there is a QM spin observable corresponding
to every direction in 3-dimensional space. This makes no sense for spin in the classical sense,
which means that we should not try to read too much into the word ‘spin’ in quantum mechanics.
It is just a name given to a new kind of QM observable that does not arise in classical physics.[7]
There are really only two facts about quantum mechanical spin that you need to understand
here. The first is that if an electron passes through a device known as a Stern-Gerlach magnet
(which creates a non-uniform magnetic field), then the electron will exit the device along one of
two possible paths, one called the ‘up’ path and the other called the ‘down’ path. If the electron
is traveling in the z direction and the Stern-Gerlach magnet is aligned, for example, in the x
direction, and the electron is detected in the ‘up’ path, then in common parlance, one says that
the electron has been subjected to a spin “measurement” in the x direction, and the outcome of
the measurement is ‘spin-up in the x direction’.[8]
However, the reader should not assume that the electron already had the property of being
‘spin-up in the x direction’ prior to the “measurement”. This amounts to introducing a classical
variable, which is exactly what leads to false predictions. In classical physics, it is always the
function of “measurements” to reveal the values of hidden variables. It’s not that an electron
cannot have properties prior to the “measurement” in the quantum theory. It’s just that one
should not jump to any hasty conclusions about what those properties are, or how they should be
represented mathematically.

[7] I don’t mean to deny that there are good reasons for using the term ‘spin’.

[8] To avoid a possible confusion about the directions, assume that we straighten the paths of the electron after it passes through the first measurement device so it is always traveling in the z direction.
As we have already mentioned, “measurement” in QM has a different meaning. It is
therefore appropriate to fix some terminology. If an electron is passed through a Stern-Gerlach
magnet, but no particle detector, such as a Geiger counter, is placed in either the ‘up’ or the
‘down’ path, then no measurement has taken place. However, if one Geiger counter is placed in
the ‘up’ path and another in the ‘down’ path, then one or other of the Geiger counters will detect
the particle. In this case we say that a spin measurement has taken place. In particular, if the
Stern-Gerlach magnet is aligned in the x-direction, then a measurement of spin in the x-direction
has been performed. The case in which one Geiger counter is placed in either the ‘up’ or the
‘down’ path but not the other will not be considered here.
Suppose one Stern-Gerlach magnet is placed directly in the path of an electron, and a second
Stern-Gerlach magnet is placed behind it in the ‘up’ path exiting the first magnet, while a third
magnet is placed behind it in the ‘down’ exit path of the first magnet. Now four Geiger counters
are placed in the exit paths of the last two magnets. The particle will be detected by exactly one of
the four Geiger counters. We say in this case that the electron has been subjected to a sequential
spin measurement.
If the frequencies of particles detected by the four Geiger counters are recorded in many
repeated trials of the experiment, then we have collected data that can be used to infer important
properties of spin observables. This procedure plays the role of the ‘determination of the
magnitudes’ in Whewell’s third step in the colligation of facts. After this step is completed, we
are able to make precise predictions about the statistics in other sequential spin measurements, in
which the magnets are aligned in different directions.
One of the basic postulates of quantum mechanics is that the statistical distributions of
Geiger counter readings can be inferred from the quantum mechanical state of the particles, if the
state is known. The state is represented by a vector in a complex valued vector space, known as
a Hilbert space. Since there are two possible outcomes of any spin measurement, we represent
the spin state of an electron as a vector in a 2-dimensional Hilbert space. Using the Dirac
notation, we write this vector as $|x{+}\rangle$, where the x reminds us that the first magnet was aligned in
the x direction, and the + tells us that the electron exited from the magnet in the ‘up’ path. Since
the coordinates of the Hilbert space can be chosen arbitrarily, we may choose to represent any
vector as a linear combination of the basis vectors

$$\begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$

In fact, any vector in the Hilbert space can be written as a complex combination (a superposition)
of these two vectors. Therefore,

$$|x{+}\rangle = c_1 \begin{pmatrix} 1 \\ 0 \end{pmatrix} + c_2 \begin{pmatrix} 0 \\ 1 \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix},$$

where $c_1$ and $c_2$ are complex numbers.
Corresponding to this Hilbert space of column vectors is the so-called dual space of row
vectors, where the basis vectors of the dual space are

$$\begin{pmatrix} 1 & 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 0 & 1 \end{pmatrix}.$$

The column vector $|x{+}\rangle$ corresponds to a row vector in this dual space, written in the Dirac
notation as $\langle x{+}|$. Contrary to what one might expect, $\langle x{+}|$ is not defined as $c_1 \begin{pmatrix} 1 & 0 \end{pmatrix} + c_2 \begin{pmatrix} 0 & 1 \end{pmatrix}$,
but by

$$\langle x{+}| = c_1^* \begin{pmatrix} 1 & 0 \end{pmatrix} + c_2^* \begin{pmatrix} 0 & 1 \end{pmatrix},$$

where $c^*$ is the complex conjugate of the complex number c. That is, if $c = a + ib$, where a and
b are real, then $c^* = a - ib$. Given this convention, there is still a one-to-one mapping between
column vectors and row vectors.
An important property of complex numbers is that the product of a complex number with its
complex conjugate is equal to the squared magnitude (modulus) of the complex number,

$$c^* c = (a - ib)(a + ib) = a^2 + b^2 = |c|^2 = |c^*|^2,$$

which is always a non-negative real number. In fact, this is exactly the reason behind the strange
definition of the dual vector, for one can now obtain the squared magnitude of a vector in a
Hilbert space by multiplying the vector with its dual:

$$\langle x{+} | x{+} \rangle = \left( c_1^* \begin{pmatrix} 1 & 0 \end{pmatrix} + c_2^* \begin{pmatrix} 0 & 1 \end{pmatrix} \right) \left( c_1 \begin{pmatrix} 1 \\ 0 \end{pmatrix} + c_2 \begin{pmatrix} 0 \\ 1 \end{pmatrix} \right) = c_1^* c_1 + c_2^* c_2 = |c_1|^2 + |c_2|^2 = \| x{+} \|^2.$$
$\langle x{+} | x{+} \rangle$ is a bra-ket (bracket), and so $\langle x{+}|$ is called a bra vector and $|x{+}\rangle$ a ket vector.
In an ordinary vector space, the scalar product of two vectors,

$$\begin{pmatrix} c_1 \\ c_2 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} d_1 \\ d_2 \end{pmatrix},$$

is defined as the product of the dual of one of the vectors times the other vector, as follows:

$$\begin{pmatrix} c_1 & c_2 \end{pmatrix} \begin{pmatrix} d_1 \\ d_2 \end{pmatrix} = c_1 d_1 + c_2 d_2.$$

This definition applies to complex vector spaces provided that the dual vector is defined as
above. For example, for two arbitrary ket vectors

$$|\psi\rangle = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} \quad\text{and}\quad |\phi\rangle = \begin{pmatrix} d_1 \\ d_2 \end{pmatrix},$$

$$\langle \psi | \phi \rangle = \begin{pmatrix} c_1^* & c_2^* \end{pmatrix} \begin{pmatrix} d_1 \\ d_2 \end{pmatrix} = c_1^* d_1 + c_2^* d_2.$$

In general, this will be a complex number. But note that if we reverse the order in the bra-ket,
then we get its complex conjugate: $\langle \phi | \psi \rangle = \langle \psi | \phi \rangle^*$. In the special case in which $|\phi\rangle = |\psi\rangle$,
$\langle \psi | \psi \rangle = \langle \psi | \psi \rangle^*$, which proves that $\langle \psi | \psi \rangle$ is always a real number.
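The bra-ket arithmetic above can be checked numerically. The sketch below (added for illustration) uses NumPy’s vdot, which conjugates its first argument and therefore matches the definition of the dual vector.

```python
import numpy as np

psi = np.array([1 + 2j, 3 - 1j])   # arbitrary ket (c1, c2)
phi = np.array([0.5j, 2 + 1j])     # arbitrary ket (d1, d2)

# <psi|phi> = c1* d1 + c2* d2 ; np.vdot conjugates its first argument
bra_ket = np.vdot(psi, phi)

print(bra_ket)
print(np.isclose(np.vdot(phi, psi), np.conj(bra_ket)))  # reversing the order conjugates
print(np.isclose(np.vdot(psi, psi).imag, 0.0))          # <psi|psi> is real
```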
A fundamental postulate of quantum mechanics is that observables are represented by linear
operators that map vectors in the Hilbert space to other vectors in the Hilbert space. An arbitrary
linear operator, $\hat{A}$, in a 2-dimensional Hilbert space is represented by a $2 \times 2$ matrix of the form

$$\hat{A} = \begin{pmatrix} a & b \\ c & d \end{pmatrix},$$
where a, b, c, and d are complex numbers. A second fundamental postulate is that the mean
value of an observable $\hat{A}$ when the system is in a state $|\psi\rangle$ is defined by

$$E_\psi(\hat{A}) \equiv \langle \psi | \hat{A} | \psi \rangle.$$

Note that from the associativity of matrix multiplication, $\langle \psi | \hat{A} | \psi \rangle$ can be read either as
$\langle \psi | \big( \hat{A} | \psi \rangle \big)$ or as $\big( \langle \psi | \hat{A} \big) | \psi \rangle$. A third fundamental postulate is that the mean value of an
observable in any quantum state must be a real number. This restricts the class of operators that
are observables in the following way. First note that, in general,
$$\langle \psi | \hat{A} | \psi \rangle = \begin{pmatrix} c_1^* & c_2^* \end{pmatrix} \begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} c_1^* & c_2^* \end{pmatrix} \begin{pmatrix} a c_1 + b c_2 \\ c c_1 + d c_2 \end{pmatrix} = c_1^* (a c_1 + b c_2) + c_2^* (c c_1 + d c_2).$$

The third postulate therefore requires that the right hand side is real. But

$$c_1^* (a c_1 + b c_2) + c_2^* (c c_1 + d c_2) = a\,|c_1|^2 + b\,(c_1^* c_2) + c\,(c_1 c_2^*) + d\,|c_2|^2.$$
Since $c_1$ and $c_2$ are arbitrary complex numbers, the right hand side can only be real in every case
if a and d are real. Furthermore, we require that $b\,(c_1^* c_2) + c\,(c_1 c_2^*)$ is real. With a little algebraic
manipulation, it is possible to see that this holds for arbitrary complex numbers $c_1$ and $c_2$ only if
$c = b^*$. Therefore, every observable in a 2-dimensional Hilbert space is represented by a matrix
of the form

$$\hat{A} = \begin{pmatrix} a & b \\ b^* & d \end{pmatrix},$$
where a and d are real. Any matrix that has this form is said to be Hermitian, or self-adjoint. In
sum, the fundamental postulates require that observables are Hermitian operators, for otherwise
their mean values might be complex in some quantum states.
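A short numerical check (added for illustration, not from the paper) of the claim that matrices of this Hermitian form give real mean values in every state, while a matrix violating $c = b^*$ generally does not:

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_value(A, psi):
    """<psi| A |psi> for a 2x2 matrix A and a (not necessarily normalized) ket psi."""
    return np.vdot(psi, A @ psi)

hermitian = np.array([[2.0, 1 - 1j],
                      [1 + 1j, -0.5]])       # of the form [[a, b], [b*, d]] with a, d real
non_hermitian = np.array([[1.0, 2.0],
                          [0.0, 1.0]])        # violates c = b*

for _ in range(3):
    psi = rng.normal(size=2) + 1j * rng.normal(size=2)   # random state vector
    print(mean_value(hermitian, psi).imag)       # always ~0
    print(mean_value(non_hermitian, psi).imag)   # generally nonzero
```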
Not only does a linear Hermitian operator have a well defined mean value in every quantum
state, but it also has a well defined dispersion, or variance, in every state (in fact, it has a well
defined probability distribution in every state, but this detail does not concern us at the moment).
The variance of an observable $\hat{A}$ is defined analogously to the variance of a random variable in
standard statistics—namely, as the mean value of the squared difference between the operator and
its mean value. So, let

$$\alpha = E_\psi(\hat{A}) \equiv \langle \psi | \hat{A} | \psi \rangle.$$

Then

$$\mathrm{Var}_\psi(\hat{A}) \equiv \langle \psi | \big( \hat{A} - \alpha \big)^2 | \psi \rangle,$$

for any state $|\psi\rangle$. Note that it is possible to prove that the operator $\big( \hat{A} - \alpha \big)^2$ is Hermitian if $\hat{A}$ is
Hermitian. So, the variance of $\hat{A}$ is a (non-negative) real number in every quantum state.
Now, let us examine the special case in which an observable $\hat{A}$ is dispersion-free in a state
$|\psi\rangle$, where ‘dispersion-free’ is a physicist’s way of saying that the variance is 0. Given an
observable $\hat{A}$, how do we characterize the states $|\psi\rangle$ in which $\mathrm{Var}_\psi(\hat{A}) = 0$? In other words,
which states $|\psi\rangle$ satisfy the equation $\mathrm{Var}_\psi(\hat{A}) = 0$?

Theorem:[9] $\mathrm{Var}_\psi(\hat{A}) = 0$ if and only if $\hat{A}|\psi\rangle = \alpha|\psi\rangle$ for some number α.

Definition: An equation of the form $\hat{A}|\psi\rangle = \lambda|\psi\rangle$ is called the eigenvalue equation for the
operator $\hat{A}$, any $|\psi\rangle$ satisfying this equation is called an eigenvector of $\hat{A}$, and the λ is
called the eigenvalue corresponding to that eigenvector. Note that λ is the mean value of the
quantum observable in the corresponding eigenstate.

The eigenvalue equation is most commonly presented as defining the possible outcomes of
the measurement of a quantum observable, and it does serve this function. But it also
determines the possible values of the observable itself, as if the observable were a variable in
the ordinary mathematical sense. But here it must be remembered that an observable only has a
well defined mean value in every state. It is not assigned a precise value. At least, the QM state
does not determine a precise value unless, as stated in the theorem, it is an eigenstate of the
observable. So, we should be careful to maintain a clear distinction between the ‘observed
value’ and ‘mean value’ of a QM observable.
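The theorem is easy to illustrate numerically. In the sketch below (an added illustration), the variance of a diagonal observable is computed in an eigenstate and in an equal superposition of eigenstates; only the former is dispersion-free.

```python
import numpy as np

def variance(A, psi):
    """Var_psi(A) = <psi|(A - alpha)^2|psi> for a normalized state psi."""
    alpha = np.vdot(psi, A @ psi)
    D = A - alpha * np.eye(len(psi))
    return np.vdot(psi, D @ D @ psi).real

observable = np.array([[1.0, 0.0],
                       [0.0, -1.0]])               # diagonal observable, eigenvalues +1, -1

eigenstate = np.array([1.0, 0.0])                   # an eigenvector, eigenvalue +1
superposition = np.array([1.0, 1.0]) / np.sqrt(2)   # equal superposition of eigenvectors

print(variance(observable, eigenstate))     # 0.0: dispersion-free
print(variance(observable, superposition))  # 1.0: not dispersion-free
```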
Let us return to the example of sequential spin measurements on an electron. The electron has
passed through the magnetic field of a Stern-Gerlach magnet oriented in the x direction. We now
place two more Stern-Gerlach magnets, also oriented in the x direction—one in the ‘up’ path
exiting the first magnet, and the other in the ‘down’ path. Finally, we place particle detectors
(Geiger counters, or whatever devices are used to detect electrons) in the ‘up’ and ‘down’ exit
paths of the last two magnets (4 Geiger counters in all). What we find, experimentally, is that no
electrons follow the ‘up’ and then the ‘down’ path, or the ‘down’ and then the ‘up’ path. That is
to say, repeated spin measurements always produce the same outcomes (‘up’ followed by ‘up’ or
‘down’ followed by ‘down’).
Note that the final detection of the particle retroactively determines the path it took after
exiting the first magnet, even though no particle detectors are placed on those paths between the
first magnet and the second set of magnets. Thus, the outcome of each electron measures the
spin values of the electron at different times. This enables us to estimate the probability
distribution of the second spin observable given the outcome of the first spin measurement.

[9] See, for example, Khinchin (1960), 54-55, for a proof.
Now suppose that we change the experimental setup. We remove the second set of magnets
and place a Geiger counter in the ‘down’ path exiting the first magnet.[10] If we know that a particle is
passing through the apparatus, and the particle detector does not respond, then we know that the
electron is traveling in the ‘up’ path. Now place a second magnet in that path that is oriented in
some direction d, and place two particle detectors on the exit paths to capture the electron in
either the ‘up’ path or the ‘down’ path. Given that the earlier detector was not triggered, how
does quantum mechanics determine the probabilities for two possible responses of the second set
of detectors? Suppose we begin by assuming that the electron is in some unknown state vector
$|\psi\rangle$. Then we can determine the probabilities from the mean value of the spin observable $\sigma_d$,
where d is the alignment of the second magnet. By the fundamental postulates of quantum
mechanics, the mean value of $\sigma_d$ is $\langle \psi | \sigma_d | \psi \rangle$. If we choose the eigenvalues of any spin operator to
be +1 and −1, then we already know that for d = x, $\langle \psi | \sigma_x | \psi \rangle = 1$. But this implies that $|\psi\rangle$ is a
dispersion-free state for the observable $\sigma_x$, and so by the theorem, $|\psi\rangle$ is an eigenstate of $\sigma_x$
with eigenvalue +1, which implies that $|\psi\rangle = |x{+}\rangle$. So, in order to accommodate the first
experimental fact, we need to assume that the state of the electron in the ‘up’ path exiting the
first magnet is $|x{+}\rangle$. This is why we say that the first magnet prepares the state of the electron.
So, if we assume that the state vector of the particle entering the d-aligned magnet is the same,
then we can predict the probabilities for this measurement from the value of $\langle x{+} | \sigma_d | x{+} \rangle$. This
is an example of how prediction and accommodation work in quantum mechanics.
Here is a second consequence of the eigenvalue equation. The identity operator, given by

$$\hat{I} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix},$$

always satisfies the eigenvalue equation with α = 1. Thus, $\hat{I}$ is a dispersion-free observable in
every state, with a mean value of $\langle \psi | \hat{I} | \psi \rangle = \langle \psi | \psi \rangle$. But also note that $\hat{I} |\psi\rangle = +1\,|\psi\rangle$. So, if we
want the mean value of the identity observable to be equal to its eigenvalue in a dispersion-free
state, then we need to assume that all state vectors are normalized. That is, we need to assume
that $\langle \psi | \psi \rangle = \|\psi\|^2 = 1$.

[10] Note that Geiger counters are still placed on all possible exit paths, so a measurement, in our earlier sense, is still being performed.
The next task is to evaluate a quantity like $\langle \psi | \sigma_x | \psi \rangle$ for an arbitrary state vector $|\psi\rangle$. What
we need to know (besides the state vector) is the matrix representation of the operator $\sigma_x$. We
already know that it is represented by some Hermitian operator of the form

$$\sigma_x = \begin{pmatrix} a & b \\ b^* & d \end{pmatrix},$$

where a and d are real numbers. In order to take the analysis further, we need to make some
conventional choices. We know that we can choose any coordinate system we like for the
Hilbert space, so we can choose to represent the eigenvector of the observable $\sigma_x$ associated
with the eigenvalue that represents spin ‘up’ as:

$$|x{+}\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix}.$$
Let us also denote the expected value of spin ‘up’ for $\sigma_x$ in state $|x{+}\rangle$ by $\alpha_+$. Because the
outcome is dispersion-free in that state, $\alpha_+$ also denotes the observed value of the observable
when the electron goes ‘up’. The eigenvalue equation now requires that

$$\begin{pmatrix} a & b \\ b^* & d \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \alpha_+ \begin{pmatrix} 1 \\ 0 \end{pmatrix},$$

which requires that $a = \alpha_+$ and $b^* = 0 = b$. So, the observable is represented by the matrix

$$\sigma_x = \begin{pmatrix} \alpha_+ & 0 \\ 0 & d \end{pmatrix},$$
for some as yet undetermined real value d. What are the other solutions of the equation

$$\begin{pmatrix} \alpha_+ & 0 \\ 0 & d \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \alpha \begin{pmatrix} c_1 \\ c_2 \end{pmatrix}?$$

We know one solution, and we can also see that $|\psi\rangle = 0$ is a trivial solution of every eigenvalue
equation. But what are the non-trivial solutions? By multiplying the matrices, it is easy to see
that the only other solution is given by $c_1 = 0$, $c_2 \neq 0$, and $d = \alpha_-$, for a second eigenvalue $\alpha_-$
$(\neq \alpha_+)$. We make the last requirement because if it were the case that $\alpha_- = \alpha_+$, then $\sigma_x$ would
be proportional to the identity operator, and this would imply that $\sigma_x$ is dispersion-free in every
quantum state, contrary to experimental fact. Thus, we have shown that the observable $\sigma_x$ is
represented by the matrix

$$\sigma_x = \begin{pmatrix} \alpha_+ & 0 \\ 0 & \alpha_- \end{pmatrix},$$

where $\alpha_+ \neq \alpha_-$ are two eigenvalues, and the corresponding eigenvectors are

$$|x{+}\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad\text{and}\quad |x{-}\rangle = \begin{pmatrix} 0 \\ c_2 \end{pmatrix}.$$
Notice that these two eigenvectors are mutually orthogonal in the precise sense that
$\langle x{+} | x{-} \rangle = 0 = \langle x{-} | x{+} \rangle$. We also want $|x{-}\rangle$ to be a unit vector, but this still does not fix the value
of $c_2$ uniquely. The exact choice of $c_2$ makes no difference, so we set $c_2 = 1$. The choice of
eigenvalues is also a matter of convention, so long as they are not equal, so we set their values to
$\alpha_+ = +1$ and $\alpha_- = -1$. With these choices, we finally arrive at

$$\sigma_x = \begin{pmatrix} +1 & 0 \\ 0 & -1 \end{pmatrix}, \qquad |x{+}\rangle = \begin{pmatrix} 1 \\ 0 \end{pmatrix} \quad\text{and}\quad |x{-}\rangle = \begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$

Given these choices, we may verify that $\langle x{+} | \sigma_x | x{+} \rangle = +1$ and $\langle x{-} | \sigma_x | x{-} \rangle = -1$.
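These verifications take only a few lines of Python (added here as an illustration of the matrix arithmetic, using the paper’s labels):

```python
import numpy as np

sigma_x = np.array([[1, 0],
                    [0, -1]])          # the matrix derived above, eigenvalues +1 and -1
x_plus = np.array([1, 0])
x_minus = np.array([0, 1])

print(np.vdot(x_plus, sigma_x @ x_plus))    # +1
print(np.vdot(x_minus, sigma_x @ x_minus))  # -1
```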
Now consider a second experimental fact—namely that if a spin measurement in the x
direction is followed by a spin measurement in the y direction (where x is orthogonal to y), then
the results are random (50% probability for each outcome). Given that we have chosen to
represent spin outcomes by the numbers ±1 , the second experimental fact amounts to saying that
if the electron is prepared in a state $|x{+}\rangle$ or $|x{-}\rangle$, then the expected value of the observable $\sigma_y$ is
0. That is, we require

$$\langle x{+} | \sigma_y | x{+} \rangle = 0 \quad\text{and}\quad \langle x{-} | \sigma_y | x{-} \rangle = 0.$$

In matrix form, these equations are written as

$$\begin{pmatrix} 1 & 0 \end{pmatrix} \begin{pmatrix} a & b \\ b^* & d \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = 0 \quad\text{and}\quad \begin{pmatrix} 0 & 1 \end{pmatrix} \begin{pmatrix} a & b \\ b^* & d \end{pmatrix} \begin{pmatrix} 0 \\ 1 \end{pmatrix} = 0,$$

and they imply that $a = 0 = d$. Therefore,

$$\sigma_y = \begin{pmatrix} 0 & b \\ b^* & 0 \end{pmatrix}.$$
One cannot expect the constraint to result in a unique specification of $\sigma_y$, since there are many
directions orthogonal to x, and each is associated with a different spin observable. On the other
hand, it cannot be that b = 0, for this would imply that $\sigma_y$ is the zero matrix. So, the simplest
choice is to set $b = i$, in which case we arrive at

$$\sigma_y = \begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix}.$$

It is straightforward to show that $\sigma_y$ has eigenvalues equal to ±1, that the corresponding
eigenvectors are orthogonal to each other, and that each is a linear combination (superposition)
of the eigenvectors of $\sigma_x$.
The Hermitian operators $\sigma_x$ and $\sigma_y$ do not commute, as is verified by multiplying the
matrices together in each order:

$$\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} \begin{pmatrix} 0 & b \\ b^* & 0 \end{pmatrix} = \begin{pmatrix} 0 & b \\ -b^* & 0 \end{pmatrix}, \quad\text{whereas}\quad \begin{pmatrix} 0 & b \\ b^* & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = \begin{pmatrix} 0 & -b \\ b^* & 0 \end{pmatrix}.$$

In fact, this shows that $\sigma_x$ and $\sigma_y$ anti-commute in the precise sense that $\sigma_x \sigma_y = -\sigma_y \sigma_x$. This is
the property of spin observables that leads to the correct predictions in the version of the Bell
experiment described in the next section.
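The anti-commutation relation, and the eigenvalue claims just made, can be checked directly. The sketch below is an added illustration using the paper’s choice $b = i$.

```python
import numpy as np

sigma_x = np.array([[1, 0],
                    [0, -1]], dtype=complex)
sigma_y = np.array([[0, 1j],
                    [-1j, 0]])                 # the choice b = i made in the text

# Anti-commutation: sigma_x sigma_y = -sigma_y sigma_x
print(np.allclose(sigma_x @ sigma_y, -(sigma_y @ sigma_x)))   # True

# Eigenvalues of sigma_y are +1 and -1, and its eigenvectors are
# superpositions of the sigma_x eigenvectors (1, 0) and (0, 1)
vals, vecs = np.linalg.eigh(sigma_y)
print(vals)          # [-1.  1.]
print(vecs)          # each column mixes both basis vectors
```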
6. Hidden Variables Again: A Second Failed Consilience
Three electrons fly apart towards 3 widely separated measuring devices, labeled 1, 2, and 3, each
containing a Stern-Gerlach magnet that can be aligned in one of two orthogonal directions, x and
y.[11] The directions x and y are orthogonal to each other and to the direction of motion of the
incoming particle (which is traveling in the local z direction). Each device contains two particle
detectors, one in the ‘up’ path and one in the ‘down’ path, such that if the incoming electron is
detected in the ‘up’ path, the light bulb attached to the device flashes red, and if it is detected in
the ‘down’ path, the same light bulb flashes green. [Figure 2: The GHZ version of Bell’s
experiment.] The particle is always detected by one or other of the particle detectors, and never
both. In common parlance, we say that the electron is spin-up if the light is red and spin-down if
the light is green, in whichever direction the Stern-Gerlach magnet is aligned.
First, consider the following set of experimental facts: When any 2 of the measurement
devices are set to y and the third is set to x (that is, for settings y-y-x, y-x-y, or x-y-y) then there is
always an odd number of red lights flashing in every trial of the experiment—that is, either all 3
red lights flash or one light flashes red and the other two flash green.
A local hidden variable theory accommodates these experimental facts in the following way:
Suppose that each particle, after separation from the others, carries with it a set of properties that
determine which light bulb will flash for every possible setting of the device it enters. Let us
represent the property that the particle approaching device 1 would cause the red bulb to flash if
the device 1 were set to position x by X1 = +1, and the property that the green bulb would flash
were device 1 set to position x by X1 = −1. According to the hidden variable story, the particle
has the property X1 = +1 or the property X1 = −1, but not both.
Note that while the value of the variable determines the experimental outcome, the variable is
not being defined as representing the observable outcome. The existence of these variables is
postulated by the theory to ‘explain’ the observed outcome. The theory also assumes that the
hidden variables have values when they are not measured.
[11] This example is referred to as the GHZ experiment after Greenberger et al. (1989). The treatment of the example in this section and the next follows Mermin (1990) closely.
Similarly, let Y1 = +1 and Y1 = −1 represent the two properties that determine the outcome
when the measuring device 1 is set at y. So, a particle heading towards device 1 will have
exactly one of 4 possible sets of properties, which Mermin (1990) refers to as “instruction sets”:
Either {X1 = +1, Y1 = +1}, {X1 = +1, Y1 = −1}, {X1 = −1, Y1 = +1}, or {X1 = −1, Y1 = −1}. These
are the hidden variables introduced to explain the correlation. There is no effort to make the hidden variable model particularly parsimonious. Separate variables are introduced for each spin
property, but, collectively, they act as a common cause.
Note that in a single run of the experiment, a measurement device cannot be oriented in two
directions simultaneously, so we cannot determine all the hidden variable values by direct
measurement. For instance, if we see the bulb flash red when device 1 is set to y, then we would
only know that the instruction set was either {X1 = +1, Y1 = +1} or {X1 = −1, Y1 = +1}. This is
why the variables are called hidden variables.
The laws or regularities that must hold amongst hidden variables in order for the theory to
accommodate the first set of experimental facts then lead to a prediction. What needs to be
accommodated is the fact that the outcome for the third particle must be R if the first two outcomes are either R-R or G-G. Otherwise, the outcome for the third particle is G. The theoretical laws
that are necessary and sufficient to accommodate all three regularities are:
X1 × Y2 × Y3 = +1, Y1 × X2 × Y3 = +1, and Y1 × Y2 × X3 = +1 .
(1)
For example, if the setting is y-x-y, then the second law tells us that if the outcome for particles 1 and 2 is R-R, then Y1 = 1 = X2, and therefore Y3 = 1, and so the outcome for particle 3 will be R.
To present the theory as a probabilistic theory, think of an ensemble of repeated instances of
this experiment of the following type: First, assume that the electron triples are prepared in the
same way in each case, and denote this description by x1 (here x1 has nothing to do with the
hidden variable X 1 , or the directions of any of the Stern-Gerlach magnets). According to the
hidden variable theory, this description is incomplete in that it does not, by itself, imply or
constrain any of the values of the hidden variables. But this does not prevent the theory from
assigning probabilities to the outcomes based on the information that it has.
The second piece of available information is the alignment of the Stern-Gerlach magnets.
We encode this by x2 , where x2 can take on the values x-y-y, y-x-y, y-y-x, or some other
alignment triple. The value of x2 will be different in different token experiments, and so each
value will define a sub-type within the ensemble. Next, there is the information that electron
detectors are placed at all possible exit paths, and this fact is encoded by y1 . Finally, z1 encodes
the detector responses, which can be any one of eight possible triples ( R, R, R ) , ( R, R, G ) ,
( R, G, R ) , ( R, G, G ) , ( G, R, R ) , ( G, R, G ) , ( G, G, R ) , or ( G, G, G ) .
So, each token of the general type is encoded by a ‘trajectory’ ((x1, x2, y1), z1). The function of the theory is to assign probability values to each of the eight values of z1 conditional on the value of (x1, x2, y1).
As we have described the theory so far, the laws in (1) do not uniquely determine what
probabilities should be assigned. They only imply that 4 of the 8 possible outcomes have
probability 0 (at least when x2 takes on one of the values x-y-y, y-x-y, or y-y-x). So, there are
many hidden variable theories that satisfy the laws in (1), each of which distributes the
probabilities amongst the 4 possible outcomes in a different way. Since the experimental facts of
this example are such that the 4 cases, (R, R, R), (R, G, G), (G, R, G), and (G, G, R), occur with equal frequency (modulo expected sampling errors), I will assume that the hidden variable theory accommodates this fact by assigning a probability of ¼ in each case. In
the following section, we shall show that QM predicts this fact as well.
It should be evident by now that it cannot be required of a probabilistic theory that it assign
these probability distributions ‘in advance’. It is normal that prediction is preceded by
accommodation. However, it is also expected of a good theory that, after a period of
accommodation, it will be able to assign probability distributions to experimental situations that
have never been encountered before. Although Bell did not consider this particular example, it
was Bell (1964, 1971) who first saw that ‘local’ hidden variable theories of the kind I have
described do make some predictions.
If the laws in (1) are assumed to apply in all experimental situations (in which the electron triples are prepared in the same way, as specified by x1), then the following deductive consequence of the laws holds in all situations. First, multiply the 3 equations in (1) together, to obtain
(X1 × Y2 × Y3)(Y1 × X2 × Y3)(Y1 × Y2 × X3) = X1 × X2 × X3 = +1,
(4)
where we have used the fact that Y1² = Y2² = Y3² = 1. The derived law, X1 × X2 × X3 = +1, implies
that there will also be an odd number of red flashes when all 3 devices are set to x-x-x. This
prediction is dramatically different from the prediction made by quantum mechanics, which
predicts that there must be an even number of red flashes in this experimental context! QM is
right and the hidden variable prediction is wrong!12
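The hidden variable reasoning can also be checked by brute force. The following sketch (Python; an illustration, not part of the argument) enumerates all ±1 assignments to the six hidden variables, keeps those that satisfy the laws in (1), and confirms that every surviving instruction-set combination obeys X1 × X2 × X3 = +1:

```python
# Enumerate hidden-variable assignments and verify the derived law X1*X2*X3 = +1.
from itertools import product

consistent = []
for X1, Y1, X2, Y2, X3, Y3 in product([+1, -1], repeat=6):
    # Keep only the assignments that satisfy the three laws in (1).
    if X1 * Y2 * Y3 == +1 and Y1 * X2 * Y3 == +1 and Y1 * Y2 * X3 == +1:
        consistent.append((X1, Y1, X2, Y2, X3, Y3))

print(len(consistent))                                   # 8 surviving instruction-set combinations
print(all(X1 * X2 * X3 == +1
          for X1, Y1, X2, Y2, X3, Y3 in consistent))     # True: the hidden variable prediction
```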
7. The Consilience of Quantum Mechanics
I will now derive the correct prediction from quantum mechanics. Instead of using variables,
quantum mechanics assigns spin observables to each particle in every spin direction. So, for
example, the observable σx1 replaces the hidden variable X1 in the hidden variable theory, and the
observable σy3 replaces the hidden variable Y3, and so on. The outcome probabilities are then
determined from the appropriate observable in the way described in section 5. For example, if
the system were in a +1 eigenstate of the observable σx1, then we would predict with probability
1 that device 1 will flash red. If the system is in a −1 eigenstate of the same observable, then the
green light will flash, and so on. If the system is not in an eigenstate of σx1, then the probability that device 1 flashes red can still be inferred from the mean value of σx1, which is equal to ⟨ψ|σx1|ψ⟩, where |ψ⟩ is the quantum state of the system of three particles. All of this is exactly
what we should expect from the quantum mechanical treatment of sequential spin measurements
(section 5), assuming that the properties of the spin observables do not change from one situation
to the next.
There are 6 spin observables involved in our story: σx1, σy1, σx2, σy2, σx3, σy3. The new
feature of this example (called the GHZ example) is that we can also construct new observables
by considering products and sums of the 6 observables. However, not every function of the 6
observables is a Hermitian operator. Just as in section 5, an operator has to be Hermitian for it to
be guaranteed a real mean value in every state, and a product of two Hermitian operators is
Hermitian if and only if they commute. So, it is important to understand that operators pertaining
to different particles always commute. For example, σx1 commutes with σy2, and the product
observable σy1σx2 commutes with σy3, and so on. By using these facts alone, it follows that every
12
As far as I’m aware, this exact experiment has not been performed. My confidence in making this claim is based
on the proven consilience of QM (next section).
product observable, like (σy1σx2)σy3, is Hermitian. The product observables that correspond to the three random variable products that appear in (1) are (σx1σy2σy3), (σy1σx2σy3), and (σy1σy2σx3).
We have just shown that these are quantum observables, which therefore have well defined mean
values and variances in every quantum state. Analogously to the hidden variable statistics, the
mean values of these product operators will determine the correlations amongst the light flashes.
The quantum mechanical story begins with the assumption that all the electron triples are prepared in exactly the same quantum state, |ψ⟩. The fact that there is always an odd number of red flashes in the settings y-y-x, y-x-y, and x-y-y tells us that the three product observables have dispersion-free distributions in the state |ψ⟩, which implies that |ψ⟩ is an eigenstate of all three product observables. So, the first set of facts is accommodated by the supposition that |ψ⟩ is a +1 eigenstate of all of the product observables (σx1σy2σy3), (σy1σx2σy3), and (σy1σy2σx3).
Therefore, in quantum mechanics, (1) is replaced by the laws:
σx1σy2σy3 |ψ⟩ = +1|ψ⟩,   σy1σx2σy3 |ψ⟩ = +1|ψ⟩,   and   σy1σy2σx3 |ψ⟩ = +1|ψ⟩.
(2)
What predictions can be made from these quantum mechanical laws? By using the anti-commutation property of the spin operators, as derived in section 5, and the fact that any spin operator times itself is equal to the identity operator (as can be verified directly by squaring the matrices derived in section 5), and the fact that operators pertaining to different particles commute, we prove that
(σx1σy2σy3)(σy1σx2σy3)(σy1σy2σx3) = (σx1σy2σy3)(σy3σx2σy1)(σy1σy2σx3) = σx1σy2σx2σy2σx3.
From here, note that
σx1(σy2σx2)σy2σx3 = σx1(−σx2σy2)σy2σx3 = −σx1σx2(σy2σy2)σx3 = −σx1σx2σx3,
where the minus sign arises from the anti-commutation law
σy2σx2 = −σx2σy2.
Recall that the anti-commutation law is required in order to accommodate the facts about
sequential measurements (section 5). But now, from (2), it follows that
(σx1σy2σy3)(σy1σx2σy3)(σy1σy2σx3) |ψ⟩ = +1|ψ⟩,
and therefore,
σx1σx2σx3 |ψ⟩ = −1|ψ⟩.
(3)
I have just proved what I promised: The only way for the quantum model to accommodate
the first set of experimental facts is to assume that (2) is true, which implies (3), which then
implies that there will always be an even number of red flashes when the magnets are set to
x-x-x.
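The operator algebra behind this conclusion is easy to verify numerically. The following sketch (Python with NumPy) represents the observables as 8 × 8 matrices built from Kronecker products, using the 2 × 2 matrices of section 5 with the particular choice b = i, the representation used in the Appendix (any unit-modulus b gives the same result):

```python
# Verify (sx1 sy2 sy3)(sy1 sx2 sy3)(sy1 sy2 sx3) = -(sx1 sx2 sx3) as 8x8 matrices.
import numpy as np

sx = np.array([[1, 0], [0, -1]], dtype=complex)
sy = np.array([[0, 1j], [-1j, 0]], dtype=complex)

def kron3(a, b, c):
    """Operator on the 3-electron space: a acts on particle 1, b on 2, c on 3."""
    return np.kron(np.kron(a, b), c)

A = kron3(sx, sy, sy)      # sigma_x1 sigma_y2 sigma_y3
B = kron3(sy, sx, sy)      # sigma_y1 sigma_x2 sigma_y3
C = kron3(sy, sy, sx)      # sigma_y1 sigma_y2 sigma_x3
D = kron3(sx, sx, sx)      # sigma_x1 sigma_x2 sigma_x3

print(np.allclose(A @ B @ C, -D))      # True
```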
The spin operators of quantum mechanics are projectible in a way that is reminiscent of the
idea that Goodman (1965) had about projectible predicates, except that spin observables are far
better projectors than hidden variables in the GHZ example.
At first sight, the quantum account appears to be flexible about the exact probability distributions assigned by |ψ⟩ to the 6 spin observables. However, a deeper analysis of the example reveals that |ψ⟩ is uniquely determined by the quantum laws in (2). This is worth
proving in detail because it shows that quantum mechanical accommodation of the first three
experimental results leads to far stronger predictions than those provided by the hidden variable
theory (see the Appendix).
In summary, there is only one way for quantum mechanics to accommodate the first three
experimental facts. This unique state vector then determines the full probability distribution for those settings (i.e., that the probability of each of the 4 possible outcomes is ¼). And, remarkably, |ψ⟩ also determines the probabilities for any measurement settings
whatsoever, including any case in which the three magnets are aligned in directions different
from x and y. In contrast, the local hidden variable theory must introduce an entirely new set of
hidden variables for each new measurement direction, and it has discovered no laws connecting
their values to the hidden variables already in play. The hidden variable theory is predictively
impotent in comparison with quantum mechanics, and after all that, the one prediction it does
make is false!
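These claims can be illustrated numerically. The following sketch (Python with NumPy, again with the matrices of section 5 and b = i) constructs the joint +1 eigenspace of the three product observables, confirms that it is one-dimensional, and reads off the outcome probabilities for the y-y-x setting, which come out to ¼ for each of the four allowed outcomes:

```python
# Find the state fixed by the three eigenvalue conditions in (2) and compute
# the outcome probabilities it assigns to the y-y-x setting.
import numpy as np
from itertools import product

sx = np.array([[1, 0], [0, -1]], dtype=complex)
sy = np.array([[0, 1j], [-1j, 0]], dtype=complex)

def kron3(a, b, c):
    return np.kron(np.kron(a, b), c)

A = kron3(sx, sy, sy)
B = kron3(sy, sx, sy)
C = kron3(sy, sy, sx)

# Projector onto the simultaneous +1 eigenspace of A, B, and C.
P = (np.eye(8) + A) @ (np.eye(8) + B) @ (np.eye(8) + C) / 8.0
vals, vecs = np.linalg.eigh(P)
print(np.sum(np.isclose(vals, 1.0)))        # 1: the eigenspace is one-dimensional
psi = vecs[:, np.argmax(vals)]              # the state, unique up to phase

def eigvec(op, outcome):
    """Eigenvector of a 2x2 spin matrix for outcome +1 (red) or -1 (green)."""
    w, v = np.linalg.eigh(op)               # eigenvalues returned in ascending order: -1, +1
    return v[:, 1] if outcome == +1 else v[:, 0]

# Outcome probabilities for the setting y-y-x.
for r1, r2, r3 in product([+1, -1], repeat=3):
    outcome_state = np.kron(np.kron(eigvec(sy, r1), eigvec(sy, r2)), eigvec(sx, r3))
    p = abs(np.vdot(outcome_state, psi)) ** 2
    if p > 1e-9:
        print((r1, r2, r3), round(p, 3))    # only outcomes with an odd number of reds; each 0.25
```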
It is not my intention to disparage hidden variable theories in general. After all, the unknown
microstate of a thermodynamical system in statistical mechanics plays the role of a hidden
variable. Rather, the point is that hidden variable explanations of spin correlations gamble on the
discovery of laws that will constrain the hidden variables. The mere assertion of the existence of
these variables does not lead to very much prediction, even though many philosophers consider
them to be highly explanatory.
Since there is a one-to-one mapping between hidden variables and spin operators, there is
also an infinite number of spin observables in quantum mechanics. But this is no obstacle in
quantum mechanics because it is possible to express all spin observables as a linear combination
of just three of them (the Pauli matrices). So, quantum mechanics is able to use ψ to calculate
the probability distributions of all spin observables, and all Hermitian functions of them.
After working through an example like this, and after seeing how tightly quantum mechanics
ties the phenomena together, wouldn’t it be miraculous for this huge body of data to fit the
probabilities of quantum mechanics so well without there being some way of explaining this fact
in terms of a reality behind the observed phenomena? Unfortunately, the defense of a realist
interpretation of QM is beyond the scope of this particular essay. Suffice it to say that the
argument given here provides the same kind of empirical evidence for a realist view of the QM
properties of spin as provided for the existence of Newtonian mass, for example. In contrast,
local hidden variable models fail the test of consilience.
8. Why Causal Explanation Is Not Universal
Think again about the epistemological problem confronting someone locked in Reichenbach’s cubical world. The simplicity of the analogy encourages us to think narrowly of explaining the correlations in terms of common causes. Reichenbach made his position more
precise in the following way. Consider two physical quantities X and Y that exhibit a statistical
correlation, such that the correlation does not arise from X directly causing Y or Y directly
causing X. Then, the only alternative appears to be that the correlation is explained by a
common cause variable Z, such that Z causes X and Z causes Y.13 This very powerful idea is
severely challenged by the example described in previous sections.
The first point to establish is that the premises for the common cause inference clearly hold
in the GHZ version of the Bell experiment. Consider a version in which Stern-Gerlach magnet
number 1 is placed in Alice’s laboratory, and magnets 2 and 3 are placed in Bob’s laboratory.
Suppose that Bob hooks up a master light bulb to his two devices that flashes red if and only if one of his devices flashes red and the other flashes green. That is, if both his devices flash red or
13
There are a number of other scenarios that need to be ruled out as well. Arntzenius (1993) provides a concise list
of these. Also see Sober (1984), Cartwright (1989), Eells (1991), Hausman (1998), and Woodward (2003) for
extensive discussions of causal inference and explanation, and related philosophical issues.
both flash green, then his master bulb flashes green. If the magnets are all aligned in the x-direction, then there must be an even number of red flashes. This means that Alice’s light flashes red if and only if Bob’s master light flashes red. There is a perfect positive correlation between what Bob sees and what Alice sees, provided that all magnets are aligned in the x-direction. Moreover, if we imagine that Alice’s and Bob’s measurements are space-like separated,
then the special theory of relativity appears to rule out the existence of any direct causal
interaction between the events.
If we postulate the existence of a hidden quantity X which determines the outcome of Alice’s
measurement, and a quantity P that determines the outcome of Bob’s measurement, then we can
explain the observed correlation if and only if we postulate that X = P.
Now imagine that there is another quantity, Q, that Bob could measure by realigning his magnets, which is perfectly correlated with X. Then we would have to add the equation X = Q. Finally, imagine that there is a fourth quantity, Y, that Alice could measure, which is perfectly correlated with P. Then, to accommodate this experimental fact, we would have to add the equation Y = P. But now the equations X = P, X = Q, and Y = P logically imply that Y = Q. This entails the prediction that measurements of Y and Q are perfectly correlated. But now
suppose that the prediction is wrong. What do you do? Each electron triple is postulated to have
definite values of the variables {X, Y, P, Q}, and there is no way of assigning values to these variables that acts as a joint common cause variable that explains all the experimental facts. At least, there is no common cause variable if we assume that the values of the variables are fixed when the electron triple is created, that these values determine the outcomes of the measurements, and that they are independent of Alice’s or Bob’s choice of measurement setting. If we do
allow for strange changes in variable values when the electrons are in flight, or imagine that their
values are determined with a foreknowledge of the settings that Alice and Bob are going to
choose, then we can accommodate the phenomena. But what is the motivation for this ad hoc
way of rescuing hidden variables when quantum theory accounts for the phenomena in a way
that is beautifully predicted from the outcomes of sequential spin measurements?
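The logical step here is elementary, but a short enumeration makes it vivid (a sketch in Python; the variables are the hypothetical hidden quantities just introduced):

```python
# Every +/-1 assignment satisfying X = P, X = Q and Y = P also satisfies Y = Q.
from itertools import product

counterexamples = [(X, Y, P, Q) for X, Y, P, Q in product([+1, -1], repeat=4)
                   if X == P and X == Q and Y == P and Y != Q]
print(counterexamples)    # []: no assignment lets Y and Q disagree
```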
If the best test of a theory is the consilience of inductions—in this case, the consilience of
sequential spin phenomena and triplet spin phenomena—then non-local hidden variable theories
are not only ad hoc, but they also fail to compete with QM. In order to compete, they must not
only accommodate different kinds of phenomena separately, but also predict aspects of one kind
of phenomena from different kinds of phenomena.
The consilience of QM not only explains the growing consensus that the QM strategy of
replacing variables with operators is here to stay, but it also explains why cubical world
inference is so hard to characterize in general terms. For the inference that is ultimately
convincing may depend on the invention of new theoretical concepts and mathematical
formalisms, which are not easy to anticipate in advance. Such is the case in QM.
The good news is that there do appear to be overarching methodological principles that apply equally well to the old and the new physics. As in science itself, successful predictions are impressive. Whewell’s prediction was that the consilience of inductions is the best indicator we have of good science. It appears that the consilience of inductions not only explains why some causal explanations work, but also suggests that other causal explanations will always fail. At least so long as causes are characterized in terms of hidden variables.
Appendix
The purpose of this appendix is to prove that the quantum mechanical accommodation of the first
three experimental facts in the GHZ experiment leads to far stronger predictions than those
provided by the hidden variable theory.
Because there are 8 possible outcomes, the state vector |ψ⟩ is a vector in an 8-dimensional Hilbert space. This space is constructed out of the three 2-dimensional Hilbert spaces that would be used to represent the states of electrons separately. Let us choose the basis vectors for the first Hilbert space to be |x+⟩₁ and |x−⟩₁, etc., where the subscripts keep track of the Hilbert space in question. It is now possible to prove that the eight vector products, |x±⟩₁|x±⟩₂|x±⟩₃, form a natural basis for the 8-dimensional Hilbert space, where the products are formed by generalizing
ordinary matrix multiplication in a natural way. For instance,
$$\begin{pmatrix} 0 \\ 1 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} \;=\; \begin{pmatrix} 0\begin{pmatrix} 1 \\ 0 \end{pmatrix} \\[4pt] 1\begin{pmatrix} 1 \\ 0 \end{pmatrix} \end{pmatrix} \;=\; \begin{pmatrix} 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}$$
is obtained by treating the second column matrix in the product as if it were simply a 1 × 1 matrix with a column vector as its element. By extending the same idea to the product of three 2-vectors, one may easily confirm that the 8 products |x±⟩₁|x±⟩₂|x±⟩₃ are each equal to an eight-
element column matrix in which exactly one element is equal to 1 and the rest are 0. In each of
the 8 cases, the 1 is in a different position, and so these products generate the 8 basis vectors of
an 8-dimensional Hilbert space.
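NumPy’s Kronecker product implements exactly this rule, so the construction can be displayed directly (a minimal sketch, assuming the column-vector representations |x+⟩ = (1, 0) and |x−⟩ = (0, 1)):

```python
# The eight triple products of |x+> and |x-> give the standard basis of an 8-dimensional space.
import numpy as np
from itertools import product

x_plus = np.array([1, 0])
x_minus = np.array([0, 1])

for a, b, c in product([x_plus, x_minus], repeat=3):
    print(np.kron(np.kron(a, b), c))   # each line: a single 1 in a different position, seven 0s
```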
Recall that the matrix form of the spin operators derived in section 5 implies, for example,
$$\sigma_y\,|x+\rangle \;=\; \begin{pmatrix} 0 & i \\ -i & 0 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} \;=\; \begin{pmatrix} 0 \\ -i \end{pmatrix} \;=\; -i\begin{pmatrix} 0 \\ 1 \end{pmatrix} \;=\; -i\,|x-\rangle.$$
In the context of the 3-electron system, a spin operator like σy3 operates on the third part of the vector products, so
σy3 (|x+⟩|x+⟩|x+⟩) = |x+⟩|x+⟩(σy3|x+⟩) = −i |x+⟩|x+⟩|x−⟩,
where I have dropped the subscripts because the order in which the 2-vectors appear in the
product determines which subscript should be understood. We now have everything in place that
we need to apply the eigenvalue equations in (2) to an arbitrary state vector
$$|\psi\rangle \;=\; \sum_{j}\sum_{k}\sum_{l} c_{j,k,l}\, |x^{j}\rangle\,|x^{k}\rangle\,|x^{l}\rangle,$$
where the indices j, k, and l range over the values +1 and −1. Calculating the constraints on the arbitrary complex coefficients $c_{j,k,l}$ is tedious, but it involves nothing more than simple algebra.
By setting $c_{-1,-1,-1} = 1$, and making sure that the resulting state vector is normalized, one proves that
$$|\psi\rangle \;=\; \tfrac{1}{\sqrt{4}}\,|x^{+}\rangle|x^{+}\rangle|x^{-}\rangle \;+\; \tfrac{1}{\sqrt{4}}\,|x^{+}\rangle|x^{-}\rangle|x^{+}\rangle \;+\; \tfrac{1}{\sqrt{4}}\,|x^{-}\rangle|x^{+}\rangle|x^{+}\rangle \;+\; \tfrac{1}{\sqrt{4}}\,|x^{-}\rangle|x^{-}\rangle|x^{-}\rangle.$$
The expression is easy to remember because there is an even number of +’s in each term.14
14
Some readers may worry about the fact that this state vector is symmetric, whereas the state vectors of
indistinguishable fermions are supposed to be anti-symmetric. Here it should be noted that I have only written the
spin component of the state vector. The complete state vector is anti-symmetric because the wavefunction
component (which gives the position probabilities) is anti-symmetric.
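As a final check, the state vector just written down can be verified numerically (a Python/NumPy sketch, with σy represented by the matrix with b = i used above):

```python
# Verify that the displayed state is normalized, satisfies the laws in (2),
# and is a -1 eigenstate of sigma_x1 sigma_x2 sigma_x3, as in (3).
import numpy as np

sx = np.array([[1, 0], [0, -1]], dtype=complex)
sy = np.array([[0, 1j], [-1j, 0]], dtype=complex)
xp = np.array([1, 0], dtype=complex)    # |x+>
xm = np.array([0, 1], dtype=complex)    # |x->

def kron3(a, b, c):
    """Triple Kronecker product; works for 2x2 operators and for 2-vectors alike."""
    return np.kron(np.kron(a, b), c)

psi = 0.5 * (kron3(xp, xp, xm) + kron3(xp, xm, xp) + kron3(xm, xp, xp) + kron3(xm, xm, xm))

print(np.isclose(np.vdot(psi, psi), 1.0))             # True: normalized
print(np.allclose(kron3(sx, sy, sy) @ psi,  psi))     # True: first law in (2)
print(np.allclose(kron3(sy, sx, sy) @ psi,  psi))     # True: second law in (2)
print(np.allclose(kron3(sy, sy, sx) @ psi,  psi))     # True: third law in (2)
print(np.allclose(kron3(sx, sx, sx) @ psi, -psi))     # True: the prediction (3)
```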
References
Arntzenius, Frank (1993): “The Common Cause Principle.” PSA 1992 Volume 2: 227-237. East
Lansing, Michigan: Philosophy of Science Association.
Arntzenius, Frank (1997): “Transition Chances and Causation.” Pacific Philosophical Quarterly 78: 149-168.
Bell, John S. (1964). “On the Einstein-Podolsky-Rosen Paradox”, Physics 1: 195-200.
Bell, John S. (1971), “Introduction to the Hidden Variable Question”, in B. d'Espagnat (ed.)
Foundations of Quantum Mechanics, N. Y.: Academic Press.
Butts, Robert E. (ed.) (1989). William Whewell: Theory of Scientific Method. Hackett Publishing
Company, Indianapolis/Cambridge.
Cartwright, Nancy (1989): Nature’s Capacities and their Measurement.
Oxford: Oxford
University Press.
Eells, Ellery (1991): Probabilistic Causality. Cambridge: Cambridge University Press.
Forster, Malcolm R. (1984): Probabilistic Causality and the Foundations of Modern Science.
Ph.D. Thesis, University of Western Ontario.
Forster, Malcolm R. and Alexey Kryukov (2003), “The Emergence of a Macro-World: A Study
of Intertheory Relations in Classical and Quantum Mechanics,” Philosophy of Science 70:
1039-1051.
Gillespie, Daniel T. (1970): A Quantum Mechanics Primer: An Elementary Introduction to the
Formal Theory of Non-relativistic Quantum Mechanics. Scranton, Pennsylvania:
International Textbook Co.
Goodman, Nelson (1965): Fact, Fiction and Forecast, Second Edition. Harvard University
Press, Cambridge, Mass.
Greenberger, Daniel M., Michael A. Horne and Anton Zeilinger (1989): “Going Beyond Bell’s
Theorem” in M. Kafatos (ed.), Bell’s Theorem, Quantum Theory and Conceptions of the
Universe, Kluwer Academic Publishers, pp. 69-72.
Harper, William L. (2002), “Howard Stein on Isaac Newton: Beyond Hypotheses.” In David B.
Malament (ed.) Reading Natural Philosophy: Essays in the History and Philosophy of
Science and Mathematics. Chicago and La Salle, Illinois: Open Court. 71-112.
Hausman, Daniel M. (1998): Causal Asymmetries. Cambridge University Press.
Hempel, Carl G. (1965): Aspects of Scientific Explanation and Other Essays in the Philosophy of
Science. New York: The Free Press.
Hung, Edwin (1997): The Nature of Science: Problems and Perspectives, Wadsworth Publishing
Co.
Khinchin, A. I. (1960): Mathematical Foundations of Quantum Statistics, Dover Publications,
Inc., Mineola, New York.
Mermin, David N. (1990) “Quantum Mysteries Revisited”, American Journal of Physics, August
1990, pp.731-4.
Myrvold, Wayne and William L. Harper (2002), “Model Selection, Simplicity, and Scientific
Inference”, Philosophy of Science 69: S135-S149.
Reichenbach, Hans (1938): Experience and Prediction. Chicago: University of Chicago Press.
Reichenbach, Hans (1956): The Direction of Time. Berkeley: University of California Press.
Reidy, Michael S. and Malcolm R. Forster (forthcoming), “William Whewell (1794-1866),” in
Thomas Hockey (ed.), Biographical Encyclopedia of Astronomers, Kluwer Academic
Publishers.
Sober, Elliott (1984): The Nature of Selection: Evolutionary Theory in Philosophical Focus.
MIT Press, Cambridge, Mass.
Sober, Elliott (1994): “Temporally Oriented Laws,” in Sober (1994) From A Biological Point of
View - Essays in evolutionary philosophy, Cambridge University Press, pp. 233 - 251.
van Fraassen, Bas (1980), The Scientific Image, Oxford: Oxford University Press.
van Fraassen, B.C. (1983), “The Charybdis of Realism: Epistemological Foundations of Bell's
Inequality Argument”, Synthese.
Whewell, William (1858): Novum Organon Renovatum, Part II of the third edition of The Philosophy of the Inductive Sciences, London, Cass, 1967.
Woodward, James (2003): Making Things Happen: A Theory of Causal Explanation. Oxford and New York: Oxford University Press.