The Miraculous Consilience of Quantum Mechanics

Malcolm R. Forster
Department of Philosophy, University of Wisconsin-Madison
September 5, 2004

ABSTRACT: Some quantum mechanical phenomena are notoriously hard to explain in causal terms. But what prior motivation is there for seeking a causal explanation in the first place, other than the fact that causal explanations have been used successfully to explain unrelated phenomena? The answer is two-fold. First, the agreement of independent measurements of probabilities is the mark of successful causal explanation, and this agreement fails in many quantum mechanical examples. But secondly, and more importantly, causal explanations fail to replicate the successful predictions that quantum mechanics makes of one phenomenon from other phenomena of a very different kind.

1. Baby Epistemology

Think back to a time when mirrors were a new experience to us as children. How did we learn that what we saw in mirrors was not a different world, but a reflection of the world we already knew? How did we acquire the parsimonious view that reflections were independent views of the same objects?1 Presumably, it has something to do with the way that mirror images are correlated with our more direct perception of the objects. Consider a different way of illustrating the same idea. Reichenbach (1938) imagined an observer enclosed in the corner of a cubical world, where objects in the external world cast shadows on the walls of the enclosure.

[Figure 2: Reichenbach’s cubical world.]

It seems that our knowledge of the external world would be better if external objects cast two independent shadows on two walls of the enclosure, rather than a single shadow. Again, we may ask why we should conclude that they are shadows of the same object, rather than shadows of different objects. There is no good formal theory of how such inferences work.

[Footnote 1: This mirror example is used in Hung (1997).]
Nevertheless, they seem to conform to informal principles such as the principle of parsimony, or Occam’s razor, which states that “Entities are not to be multiplied beyond necessity.”2 The rule is vague in crucial ways. For one, what counts as an entity? And under what conditions does it become necessary to multiply an entity? Consider the first question more carefully. Do probabilities count as entities? Is there, in other words, a principle of inference that says that probabilities are not to be multiplied beyond necessity? In section 3, I argue that such a principle is crucial in causal modeling, for it provides an account of the time asymmetry of cause and effect that does not rely on the concept of manipulation or intervention. On the other side of the same coin, there are well known circumstances in which causal modeling does not work. In section 4 I show that the application of Occam’s razor to probabilities leads to false predictions in the infamous double-slit experiment. The question then arises: Does Occam’s razor apply to quantum mechanics (QM) in some different way? Is QM parsimonious with respect to other kinds of entities? This is a difficult question, which I attempt to answer affirmatively in sections 5, 6, and 7, at least within the context of a concrete example. The concrete example is interesting because it is similar to the example used by Bell (1964, 1971) to falsify local hidden variable interpretations of QM. I provide a more recent shorter version of Bell’s derivation of a false prediction in section 6, while in section 7 I show that it is actually very close to the way that QM makes the correct prediction (just replace variables with operators). Bell’s theorem can be viewed as an argument that (at least some) quantum mechanical phenomena have no causal explanation (van Fraassen 1983). 
This fits with the theses argued in prior sections: (1) The constancy or invariance of probabilities is an essential component of causal modeling, and (2) the assumed invariance of probabilities leads to a false prediction in the double-slit experiment. Therefore, there is no causal explanation of the double-slit phenomena. The advantage of Bell’s argument is that it is harder to wriggle out of the conclusion (though not impossible). In the discussion (section 8) things are tied together in the following way. One way of making Reichenbach’s cubical world inference precise is in terms of his principle of common cause (Reichenbach 1956). This principle appears to provide a sufficient condition for postulating a single hidden variable to explain two effects. This sufficient condition clearly applies to the QM correlations in section 6, and when the parsimonious hidden variable model is considered, it leads to a false prediction. The arrow of modus tollens can be pointed in two directions. One response is to say that it is necessary to multiply the hidden variables, while a different response is to abandon hidden variables completely. Quantum theory takes the second option by replacing variables with operators. This is what provides a description of entities to which Occam’s razor does apply. The lesson is that the theory tells us what entities are not to be multiplied beyond necessity. Progress in science sometimes depends on radically new ways of thinking that fail to respect the intuitive standards of explanation and understanding set by preceding theories or common sense. I have not attempted to provide a formal account of the cubical world inference, partly because I’m not sure that there exists a single formal account. In the absence of a precise theory, the best strategy is to look at disparate examples of science.

[Footnote 2: The principle is named after William of Ockham (~1280–1347 AD), who stated in Latin “non sunt multiplicanda entia praeter necessitatem.”]
Nevertheless, a good informal description of the kind of scientific inferences that use Occam’s razor was published in 1858 by William Whewell. Section 2 provides a brief introduction to these ideas.

2. Whewell’s Consilience of Inductions

Long before the discovery of quantum mechanics, William Whewell (1858) claimed that scientific induction proceeds in three steps: (1) the Selection of the Idea, (2) the Construction of the Conception, and (3) the Determination of the Magnitudes (see Butts 1989, 223–237). In curve-fitting, for example, the selection of the idea is the selection of the independent variable (x) from which we hope to predict the dependent variable (y). The construction of the conception is determined by the choice of the formula (family of curves). And the determination of magnitudes is what statisticians now refer to as the estimation of the adjustable parameters. Whewell then makes an insightful claim about curve-fitting:

If we thus take the whole mass of the facts, and remove the errours of actual observation, by making the curve which expresses the supposed observation regular and smooth, we have the separate facts corrected by their general tendency. We are put in possession, as we have said, of something more true than any fact by itself is. (Whewell, quoted from Butts 1989, p. 227.)

In other words, the purpose of the curve, or the formula, is to capture the general tendency of the data, the signal behind the noise, and this is something “more true” than the sum of the observed facts. In another place, he again emphasizes that unification is the key to scientific innovation:

The particular facts are not merely brought together, but there is a New Element added to the combination by the very act of thought by which they are combined. There is a Conception of mind introduced in the general proposition, which did not exist in any of the observed facts. ...
The pearls are there, but they will not hang together until some one provides the string. (Whewell, quoted from Butts 1989, pp. 140, 141.)

In the case of curve-fitting, the conceptual string is the curve, but Whewell also insists that the same metaphor applies to all instances of scientific induction. The best examples of science consist in much more than the mere colligation of facts (which is Whewell’s name for ‘induction’). It also strives towards what he calls the consilience of inductions (note the plural). The mark of a good theory lies not in the relationship between the theory and its data in a single narrow application, but in the way it succeeds in ‘tying together’ separate inductions. A good theory is like a tree that puts out runners that grow into new trees, until there is a huge forest of mutually sustaining trees. The ‘tying together’ can be achieved in either of two ways: (a) The theory accommodates one set of data, and then predicts data of a different kind. (b) The theory accommodates two kinds of data separately, and then finds that the magnitudes in the separate inductions agree, or that the laws that hold in each case ‘jump together’.3 Whewell describes both kinds of cases as leading to a consilience of inductions.4

[Footnote 3: When the magnitudes agree, then we have what is more commonly referred to as an agreement of independent measurements. It is no coincidence that Newton scholars, such as Harper (2002), emphasize the importance of the agreement of independent measurements in Newton’s argument for universal gravitation. For Whewell was also primarily concerned with the explication of Newton’s methodology. See also Myrvold and Harper (2002) for an argument that this kind of evidence is not properly taken into account in standard statistical methods of model selection.]

The main purpose of this essay is to show how parts of quantum theory can be viewed as achieving a consilience of many inductions.
I believe that this achievement of QM is the starting point of a realist interpretation of quantum mechanics that is ontologically more parsimonious than hidden variable interpretations. First, I need to show that there is nothing built into Whewell’s methodology that militates against the success of causal explanations. Quite the opposite. The consilience of inductions actually provides a criterion for when causal explanations should and should not be applied. It is exactly because this criterion can fail in QM that causal explanations should be ruled out for some, though not all, quantum mechanical phenomena.

3. The Consilience of Correct Causal Models

Consider a very simple example—the jackpot machine. The machine has two input states: either one euro coin is placed in the machine, or two euro coins are placed in the machine; then a handle is pulled. The output state is either ‘win’ or ‘lose’. For the sake of concreteness, suppose ‘win’ refers to the event that the machine delivers 10 euros as a ‘jackpot’, and a loss leads to a zero payout. In this example, the input states do not determine the output states. So, the ‘constancy’ built into the machine is described in terms of the constancy or invariance of probabilities. Introduce two variables, X and Y. Upper case letters are used in statistics to denote variables that have probabilities assigned to their possible values. Such variables are called random variables. X represents the input state of the machine in some particular trial of the experiment, while Y represents the output state in the same trial. The possible states of the machine can be represented by assigning arbitrary numerical values to these variables. Let X = 1 denote the event that only one euro is placed in the machine before the handle is pulled, while X = 2 denotes the event that 2 euros are placed in the machine.
Y = 0 denotes the event that there is no payout, while Y = 1 denotes the event that the jackpot (of 10 euros) is paid out. The standard ‘forward’ causal model says:

P(win | 1 euro) = α, and P(win | 2 euros) = β, for all trials i.

That is, different trials have something in common; namely, that the values of the forward conditional probabilities are the same in all cases. Note that the model postulates values for all forward probabilities:

P(loss | 1 euro) = 1 − α, and P(loss | 2 euros) = 1 − β, for all trials i.

These probabilities are also invariant because they are defined solely in terms of the constants α and β. These four probabilities are postulated by the model. They are theoretical entities. Their measurement, or estimation, is determined from the available data in the same way that any theoretical quantity is measured. Consider a typical set of data. Suppose that Alice plays the machine 200 times, and we record the input and output state on each trial. The data consist in a sequence of data ‘points’, which come in four flavors: (1, 0), (1, 1), (2, 0), or (2, 1). Because the model says that the temporal order of the trials does not matter, the data are adequately recorded in the table of observed frequencies below.

(Alice)   1 euro   2 euros
loss        90       80
win         10       20

Notice that Alice plays the machine with 1 euro half the time, and with 2 euros half the time (100 trials each). Out of all the times she plays the 1 euro version of the game, she wins the jackpot 10 times, thus earning 100 euros, which is the same amount that she paid to play.

[Footnote 4: My purpose is not to argue for some particular historical or exegetical thesis about Whewell’s notion of consilience, but to use (and adapt) Whewell’s idea for the purpose of explaining how probabilistic theories work in general, and how quantum mechanics works in particular.]
Out of all the times she pays 2 euros to play, she wins 20 out of the 100 times (twice as often), and earns a total of 200 euros (twice as much). But she paid twice as much to play, so she still earned the same as what she paid to play. On the basis of the data, the machine appears to be fair. Whewell’s three steps in the colligation of facts apply to this example in the following way. Step 1 consists in the selection of X and Y as the relevant quantities to be considered. Step 2 introduces the formula, which in this case is probabilistic in nature. It introduces a family of probability distributions parameterized by the adjustable parameters α and β. These determine the Conception. In this example, the conception is probabilistic in nature. Step 3 is the determination of the magnitudes α and β from the data. In contemporary statistical theory, this is achieved by the method of maximum likelihood estimation (MLE). To understand how MLE works, first note that each pair of values assigned to α and β picks out a particular probabilistic hypothesis in the model. The fit of each hypothesis with the data is defined by its likelihood, which is, by definition, the probability of the data given the hypothesis (this should not be confused with the probability of the hypothesis given the data, which is a distinctly Bayesian concept). The greater the likelihood of a hypothesis (the more probable it makes the data) the better the hypothesis fits the data. The hypothesis that fits best is, by definition, the hypothesis in the model that has the maximum likelihood. For arbitrary values of α and β, the likelihood is:

Likelihood(α, β) = (1 − α)^90 α^10 (1 − β)^80 β^20.

The probabilities are multiplied together because each trial is probabilistically independent of all the others according to the model, in the same way that coin tosses are independent. Mathematically speaking, maximizing the likelihood is the same as maximizing the log-likelihood:
log-Likelihood(α, β) = [90 log(1 − α) + 10 log(α)] + [80 log(1 − β) + 20 log(β)].

The terms in the square brackets can be maximized separately by differentiating with respect to α and β and setting the resulting expressions equal to zero. Note that d(log x)/dx = 1/x. After multiplying by the factors α(1 − α) and β(1 − β), respectively, the equations simplify to:

−90α + 10(1 − α) = 0 and −80β + 20(1 − β) = 0.

These equations yield the estimates: α̂ = 0.1 and β̂ = 0.2. Accordingly, the theoretically postulated probabilities are estimated by the natural relative frequencies in the data, just as one would naïvely expect. The question is whether the ‘forward’ model, which I have called the standard model, has evidence in its favor that a ‘backward’ model does not. The backward model seeks to predict values of X from values of Y, in the same probabilistic sense of ‘prediction’. In other words, it postulates backward probabilities as follows:

P(2 euros | loss) = γ, and P(2 euros | win) = δ, for all trials i.

Using the same methods as before, the parameters of the backward model are estimated by the corresponding relative frequencies in the data. The estimated values are: γ̂ = 0.47 and δ̂ = 0.67. There is nothing that points to any empirical difference between the models. In fact, they cannot be compared with regard to their fit to the data because they tackle very different prediction tasks. The fit of the forward model is measured in terms of Y-values, while the fit of the backward model is measured in terms of X-values. In this sense, the two models are incommensurable. So, why don’t the two models happily co-exist? There is a sense in which they do co-exist, for the forward model is capable of making backward predictions if it is provided with information about the relative frequency of 1 euro versus 2 euro trials.
With this information, together with the estimated forward probabilities, the forward model can calculate the backward probabilities, and gets the same answer as the backward model. This is because they are effectively calculated from the table of data in both cases. But this only serves to deepen the puzzle. If both models can be seen as predictively equivalent, why interpret one as causal and not the other? This is the familiar puzzle about cause and correlation: The evidence for causation cannot be exhausted by the correlations in the data, for correlations are symmetric, while causation is not. Either there is no additional empirical evidence, in which case causal inference is based on nonempirical criteria (or psychological habit!?), or there is other evidence that breaks the symmetry. As the reader may anticipate, the solution is to look at how the models predict data of a different kind. Suppose that Bob plays the machine, and he happens to play with 2 euros twice as often as with 1 euro. Given that the forward model is true, the frequency with which Bob wins will conform to the same forward probabilities, modulo a fluctuation in the data caused by sampling errors. The noise fluctuations are not relevant to this discussion, so imagine that the number of data is very large. Unfortunately, this would make the numbers harder to work with, so I have more simply assumed that the relative frequencies conform exactly to the ‘true’ values. The data from Bob’s trials are therefore given by the natural frequencies in the table below.

(Bob)     1 euro   2 euros
loss        90       160
win         10        40

Given the way that the example is set up, the independent measurements of the parameters α and β from Alice’s and Bob’s data will agree. The key question is whether the same is true for the backward model. If it is not, then the symmetry between the models is broken. A simple calculation shows that the new estimates of the parameters of the backward model are γ̂ = 0.64 and δ̂ = 0.80.
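The arithmetic behind these estimates can be checked with a short script. This is a minimal sketch, not part of the original argument; the function name `estimates` is made up here, and the counts are the hypothetical Alice and Bob tables from the text. Since maximum likelihood estimation reduces to relative frequencies in this model, the code just computes the relevant ratios.

```python
# Estimate the forward (alpha, beta) and backward (gamma, delta)
# parameters from a 2x2 count table of (1 euro / 2 euros) x (loss / win).
def estimates(loss1, win1, loss2, win2):
    alpha = win1 / (win1 + loss1)    # P(win | 1 euro)
    beta  = win2 / (win2 + loss2)    # P(win | 2 euros)
    gamma = loss2 / (loss1 + loss2)  # P(2 euros | loss)
    delta = win2 / (win1 + win2)     # P(2 euros | win)
    return alpha, beta, gamma, delta

alice = estimates(90, 10, 80, 20)    # forward: 0.1, 0.2; backward: ~0.47, ~0.67
bob   = estimates(90, 10, 160, 40)   # forward: 0.1, 0.2; backward: 0.64, 0.80

# The forward parameters agree across the two data sets;
# the backward parameters do not, which breaks the symmetry.
print(alice[:2] == bob[:2])  # True
print(alice[2:] == bob[2:])  # False
```

Running it confirms the relational point in the text: neither data set alone distinguishes the models, but the pair of data sets does.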
These are not close to the previous values, which were γ̂ = 0.47 and δ̂ = 0.67. Therefore, the backward model fails the test of consilience, and the symmetry between the forward and backward models is broken on empirical grounds (Forster 1984, Sober 1994, Arntzenius 1997). There is no such evidence available by considering the goodness-of-fit of the pooled data—the models are always symmetric with respect to any single data set, as was shown by considering Alice’s data. The evidence that differentiates the two models is always relational.

4. A Failed Consilience in the Double-Slit Experiment

The postulation of hidden variables is a way of interpreting the probabilities of quantum mechanics as ‘measures of ignorance’. Einstein, for example, believed that a future physics would reveal the existence of such hidden variables, and that the decay of radioactive particles, for example, could be predicted exactly. Bohr, on the other hand, thought that quantum physics was complete in the sense that quantum probabilities are here to stay. At the time of the debate, the most common example used was the double-slit experiment. Consider a particle of light (photon) that leaves the light source and travels through either slit A or slit B (and not both). We may represent this event in terms of a random variable X, which can have one of two values, x_A and x_B.5 For our purposes, it doesn’t matter what numerical values we use, so let x_A = 1 and x_B = −1. Now introduce a second random variable Y such that Y = 1 if the particle is detected in some region C, and 0 otherwise. In a more natural shorthand notation, let A stand for the event X = 1, B for the event X = −1, and C for the event Y = 1. The probability of C given A is written P(C | A).

[Figure 1: In the double-slit experiment, the assumption that individual photons travel through either slit A or slit B leads to the false prediction that the double-slit pattern is the sum of two single-slit patterns.]
Similarly, P(C | B) is the probability that it arrives at C given that it passes through slit B. P(A) is the probability that the particle passes through slit A given that it passes through one of the slits. Then the probability of C (given that it arrives somewhere on the screen) is, according to the axioms of probability:

P(C) = P(A) P(C | A) + P(B) P(C | B).

[Footnote 5: Recall that a random variable is, by definition, any variable that has a probability distribution associated with it.]

It is worthwhile working through the proof of this theorem, because it introduces some elementary concepts of probability theory that are controversial in QM. Let’s suppose that the particle passes through slit A or B, but not both. Then B is logically equivalent to the event not-A. Suppose that 50 particles pass through slit A and 50 particles pass through slit B. Of the 50 particles that pass through slit A, 40 arrive at C. Of the 50 particles that pass through slit B, 20 arrive at C. Therefore, a total of 60 out of 100 particles arrive at C.

        C    not-C
A      40      10
B      20      30

In symbols, P(A) = 50/100, P(C | A) = 40/50, P(B) = 50/100, and P(C | B) = 20/50. Therefore,

P(A) P(C | A) + P(B) P(C | B) = (50/100)(40/50) + (50/100)(20/50) = 60/100 = P(C).

P(A, C) is the probability that the particle passes through slit A and arrives at C. According to the table, this joint probability is 40/100. The argument assumes that joint probabilities, such as P(A, C), exist. So far, there is no problem. But now postulate that the probabilities P(C | A) and P(C | B) do not depend on whether the other slit is open or closed. This is often called a locality assumption because it assumes that what happens to a particle passing through one slit is unaffected by what is happening non-locally (at the other slit in this case). For me, it is an invariance assumption—it is an attempt to unify the phenomena of the single-slit and double-slit experiments.
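The total-probability arithmetic above can be reproduced exactly with Python’s `fractions` module. This is just a check of the worked example, using the counts from the table.

```python
from fractions import Fraction as F

# Probabilities read off the table: 50 particles through each slit;
# 40 of slit A's particles and 20 of slit B's particles arrive at C.
p_A, p_B = F(50, 100), F(50, 100)
p_C_given_A, p_C_given_B = F(40, 50), F(20, 50)

# Law of total probability: P(C) = P(A)P(C|A) + P(B)P(C|B)
p_C = p_A * p_C_given_A + p_B * p_C_given_B
print(p_C)  # 3/5

# The joint probability P(A, C) from the table: 40/100.
p_AC = p_A * p_C_given_A
print(p_AC)  # 2/5
```

Using exact fractions avoids floating-point noise and makes the agreement with the table (60/100 and 40/100) exact.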
It is the attempted consilience, which may also be viewed as an application of the principle that “Probabilities are not to be multiplied beyond necessity.” Unification is important because it leads to predictions. But in this case it leads to the false prediction that the double-slit pattern is the average of the two single-slit patterns. If the slits are the right distance apart, then the double-slit interference pattern has its brightest spot at C (see Fig. 1), whereas any average of the single-slit patterns has its brightest spots directly in front of the two slits. The invariance of the probabilities between the single-slit and double-slit experiments provides a potential consilience of inductions, but when the consilience fails, the intuitive explanation on which it is based is called into question. There are ways of resisting this conclusion. The first obvious response is that different photons are interacting with each other after they pass through the slits. This possibility is highly implausible in light of the fact that the interference pattern is exactly the same if the intensity of the light is so low that only one photon passes through the slits at a time. Another possibility is that there is some kind of continuously emitted pilot wave that guides the particles to their appropriate destinations. The pilot wave is affected by whether the second slit is open or closed. You can see that this hypothesis is ad hoc if there is no way of independently detecting the existence of the pilot wave. But it doesn’t lead to any false predictions. To that extent it solves the problem. On the other hand, it does not fix the failed consilience either. The argument that probabilities should be invariant across the single-slit and double-slit experiments is based on the assumption that the particles pass through one slit or the other in every instance.
It is an important prediction of quantum mechanics that if the precise trajectory is experimentally determined, then the classical prediction is correct—that is, in this case the pattern on the screen will be the sum of the two single-slit patterns. For example, electrons can be detected going through a particular slit by the electric current they induce when they pass through a wire loop. So if we place wire loops behind the two slits, then the sum of the two single-slit patterns will be observed. The problem with the classical explanation is that it is sometimes wrong, and this can be traced back to its use of variables. In classical models, variables serve a dual purpose. On the one hand, they represent the values of quantities that are actually measured. At the same time, they continue to represent quantities that are assumed to have precise values even if those values cannot be determined by the theory in the particular experimental context. In this role, they are acting as hidden variables. The hidden variables represent the causes of the values of the observed quantities. The positions and momenta of particles in a gas are examples of hidden variables in classical mechanics, so the assumption of hidden variables does not always lead to false predictions in physics. Quantum mechanics is distinguished from classical mechanics by the fact that it delegates these two roles to different kinds of mathematical objects. Variables are still used to represent observed quantities. But variables are not used to represent the quantities that are “measured”.6 Instead, they are represented by a quantum mechanical observable, X̂. It is enough for the present purpose to know that quantum mechanical observables are represented mathematically by operators in an abstract formalism, where these operators have one important similarity with variables, and one important difference. The similarity is that they still have a set of possible values assigned to them—called eigenvalues.
The eigenvalues are real numbers, just as the values of variables are real numbers. There is also a precise probability distribution assigned to these eigenvalues, so QM observables are also similar to random variables. But operators have an algebraic property that variables do not have. For instance, let X represent the observed vertical position of a particle as it passes through the slit, and let P represent the observed vertical component of the momentum of the particle as it is deflected by the tungsten plate in which the slits are housed. It is mathematically true of any variables that XP = PX. That is, variables always commute when they are multiplied together. While the corresponding quantum mechanical operators, X̂ and P̂, can also be multiplied together, they do not necessarily commute. That is, in general, X̂P̂ ≠ P̂X̂. The non-commutation of QM observables, when it holds, constrains the possible joint probability distributions that can be assigned to the two observables in a way that leads to Heisenberg’s famous uncertainty principle. Observables whose operators fail to commute are called incompatible observables. The result is that the product of the variances of the two incompatible observables must be greater than zero. If one of the variances is 0, then the other must be infinitely large. For those who do not know the definition of the variance of a random variable, all you need to know is that QM rules out probability distributions that assign probability values of 0 or 1 to the values of incompatible observables. A theory, such as classical statistical mechanics, may not in fact assign probabilities of 0 and 1 either, but the formalism of the theory does not rule it out as a theoretical possibility. According to the orthodox interpretation of quantum mechanics, this means that incompatible observables never have precise values simultaneously.
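Non-commutation is easy to exhibit concretely. The sketch below is an illustration, not part of the text’s argument: it uses the Pauli matrices, the standard 2x2 operators for the x and y components of electron spin (up to a factor of ħ/2), to show the algebraic property being described. Ordinary numbers multiplied in either order give the same result; these matrices do not.

```python
# Multiply 2x2 complex matrices by hand to show that spin operators,
# unlike ordinary variables, need not commute.
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

sigma_x = [[0, 1], [1, 0]]     # Pauli x matrix
sigma_y = [[0, -1j], [1j, 0]]  # Pauli y matrix

xy = matmul(sigma_x, sigma_y)  # diagonal matrix with entries 1j, -1j
yx = matmul(sigma_y, sigma_x)  # diagonal matrix with entries -1j, 1j

print(xy == yx)  # False
```

The two products differ by a sign, which is exactly the kind of failure of XP = PX that the text attributes to quantum mechanical observables.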
For on this interpretation of QM, an observable has a precise value if and only if the QM variance of the observable is zero. But according to the uncertainty principle, if the variance of an observable is close to zero, then the variance of any incompatible observable must be very large, so their product remains finite. This is loosely expressed by saying that if the particle has a precise position, then it loses its momentum. Einstein was someone who rejected the orthodox interpretation of QM. But we have already seen that the conclusion is not easy to reject, for its rejection, coupled with other plausible assumptions about locality, can lead to a false prediction.

[Footnote 6: The scare quotes are necessary because the very concept of measurement changes in QM.]

The probability distribution that QM assigns to a QM observable depends on the quantum mechanical state of the system. The state is represented by a wave function, which in this case assigns a complex number to every point in the so-called configuration space, which in our example is the space of all the possible positions that can be assigned to the particles. Again, the details of the formalism do not matter here. What’s important is that all states determine probability distributions that conform to the uncertainty principle. We are now in a position to understand how the apparently more constrained quantum mechanical formalism can succeed where the hidden variable account fails. Recall that the hidden variable theory unified the phenomena by assuming that the probability distributions in the two single-slit experiments add together to produce the probability distribution in the double-slit experiment. Quantum mechanics replaces the additivity of probabilities with the additivity of wave functions. Let’s introduce a random variable Y, where y denotes an arbitrary value of Y, to represent the possible points on the screen at which the particle may be detected.
Now consider the single slit experiment in which slit B is closed. Suppose a quantum mechanical model entails a wave function that does not depend on time, and has a complex number ψ_A(y) associated with each point on the screen. Then the model implies that the probability that a particle is detected near the point y is proportional to |ψ_A(y)|². Note that the squared magnitude of a complex number is a non-negative real number. Similarly, the probability of a particle landing near y is |ψ_B(y)|² in the other single slit experiment. Then the probability of a particle landing near the same point in the double slit experiment is:

|(1/√2)ψ_A(y) + (1/√2)ψ_B(y)|² = ½|ψ_A(y)|² + ½|ψ_B(y)|² + interference terms.

If the interference terms are zero, then the prediction is the same as the hidden variable prediction. QM allows for the additivity of probabilities, but when it applies, it is derived from the additivity of wave functions. The virtue of the quantum mechanical model is that it succeeds in unifying disparate phenomena. In other words, it leads to a successful consilience of inductions in all cases. The point of this section is two-fold. First, modeling in quantum mechanics is very different from curve fitting because it uses operators in place of variables. Nevertheless, the reasons for doing this make sense in terms of the general goals of modeling in classical physics. In some cases, the postulation of hidden variables is harmless. But when interference effects are present, the postulation of hidden variables leads to a false prediction. The conceptual innovations of quantum mechanics succeed in making correct predictions in accordance with a methodology that is consistent with Whewell’s idea that a successful consilience of inductions is the strongest kind of evidence that we have for any theory.

5.
Sequential Spin Measurements on a Single Electron

In the quantum mechanical model of electron spin, there is a QM spin observable corresponding to every direction in 3-dimensional space. This makes no sense for spin in the classical sense, which means that we should not try to read too much into the word ‘spin’ in quantum mechanics. It is just a name given to a new kind of QM observable that does not arise in classical physics.7 There are really only two facts about quantum mechanical spin that you need to understand here. The first is that if an electron passes through a device known as a Stern-Gerlach magnet (which creates a non-uniform magnetic field), then the electron will exit the device along one of two possible paths, one called the ‘up’ path and the other called the ‘down’ path. If the electron is traveling in the z direction and the Stern-Gerlach magnet is aligned, for example, in the x direction, and the electron is detected in the ‘up’ path, then in common parlance, one says that the electron has been subjected to a spin “measurement” in the x direction, and the outcome of the measurement is ‘spin-up in the x direction’.8 However, the reader should not assume that the electron already had the property of being ‘spin-up in the x direction’ prior to the “measurement”. This amounts to introducing a classical variable, which is exactly what leads to false predictions. In classical physics, it is always the function of “measurements” to reveal the values of hidden variables. It’s not that an electron cannot have properties prior to the “measurement” in the quantum theory.

7 I don’t mean to deny that there are good reasons for using the term ‘spin’.
8 To avoid a possible confusion about the directions, assume that we straighten the paths of the electron after it passes through the first measurement device so it is always traveling in the z direction.
It’s just that one should not jump to any hasty conclusions about what those properties are, or how they should be represented mathematically. As we have already mentioned, “measurement” in QM has a different meaning. It is therefore appropriate to fix some terminology. If an electron is passed through a Stern-Gerlach magnet, but no particle detector, such as a Geiger counter, is placed in either the ‘up’ or the ‘down’ path, then no measurement has taken place. However, if one Geiger counter is placed in the ‘up’ path and another in the ‘down’ path, then one or other of the Geiger counters will detect the particle. In this case we say that a spin measurement has taken place. In particular, if the Stern-Gerlach magnet is aligned in the x direction, then a measurement of spin in the x direction has been performed. The case in which one Geiger counter is placed in either the ‘up’ or the ‘down’ path but not the other will not be considered here. Suppose one Stern-Gerlach magnet is placed directly in the path of an electron, a second Stern-Gerlach magnet is placed behind it in the ‘up’ path exiting the first magnet, and a third magnet is placed behind it in the ‘down’ exit path of the first magnet. Now four Geiger counters are placed in the exit paths of the last two magnets. The particle will be detected by exactly one of the four Geiger counters. We say in this case that the electron has been subjected to a sequential spin measurement. If the frequencies of particles detected by the four Geiger counters are recorded in many repeated trials of the experiment, then we have collected data that can be used to infer important properties of spin observables. This procedure plays the role of the ‘determination of the magnitudes’ in Whewell’s third step in the colligation of facts. After this step is completed, we are able to make precise predictions about the statistics in other sequential spin measurements, in which the magnets are aligned in different directions.
One of the basic postulates of quantum mechanics is that the statistical distributions of Geiger counter readings can be inferred from the quantum mechanical state of the particles, if the state is known. The state is represented by a vector in a complex vector space, known as a Hilbert space. Since there are two possible outcomes of any spin measurement, we represent the spin state of an electron as a vector in a 2-dimensional Hilbert space. Using the Dirac notation, we write this vector as |x+⟩, where the x reminds us that the first magnet was aligned in the x direction, and the + tells us that the electron exited from the magnet in the ‘up’ path. Since the coordinates of the Hilbert space can be chosen arbitrarily, we may choose to represent any vector as a linear combination of the basis vectors [1 0]^T and [0 1]^T. In fact, any vector in the Hilbert space can be written as a complex combination (a superposition) of these two vectors. Therefore,

|x+⟩ = c1 [1 0]^T + c2 [0 1]^T = [c1 c2]^T,

where c1 and c2 are complex numbers. Corresponding to this Hilbert space of column vectors is the so-called dual space of row vectors, where the basis vectors of the dual space are [1 0] and [0 1]. The column vector |x+⟩ corresponds to a row vector in this dual space, written in the Dirac notation as ⟨x+|. Contrary to what one might expect, ⟨x+| is not defined as c1 [1 0] + c2 [0 1], but by

⟨x+| = c1* [1 0] + c2* [0 1],

where c* is the complex conjugate of the complex number c. That is, if c = a + ib, where a and b are real, then c* = a − ib. Given this convention, there is still a one-to-one mapping between column vectors and row vectors. An important property of complex numbers is that the product of a complex number with its complex conjugate is equal to the squared magnitude (modulus) of the complex number,

c*c = (a − ib)(a + ib) = a² + b² = |c|² = |c*|²,

which is always a non-negative real number.
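The conjugation convention can be checked with a short computation. A minimal sketch, where the coordinates c1 and c2 are arbitrary illustrative choices:

```python
import numpy as np

c1, c2 = 1 + 2j, 3 - 1j           # arbitrary complex coordinates of a ket
ket = np.array([c1, c2])          # the column vector [c1 c2]^T
bra = ket.conj()                  # the dual vector: conjugated components

# c* c = |c|^2 is a non-negative real number for any complex c ...
for c in (c1, c2):
    assert np.isclose(np.conj(c) * c, abs(c) ** 2)

# ... so the bra-times-ket product is the squared magnitude of the vector.
norm2 = bra @ ket
assert np.isclose(norm2.imag, 0) and norm2.real >= 0
print(norm2.real)  # |c1|^2 + |c2|^2 = 15.0
```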
In fact, this is exactly the reason behind the strange definition of the dual vector, for one can now obtain the squared magnitude of a vector in a Hilbert space by multiplying the vector with its dual:

⟨x+|x+⟩ = (c1* [1 0] + c2* [0 1])(c1 [1 0]^T + c2 [0 1]^T) = c1*c1 + c2*c2 = |c1|² + |c2|².

The expression ⟨x+|x+⟩ is a bra-ket (bracket), and so ⟨x+| is called a bra vector and |x+⟩ a ket vector. In an ordinary vector space, the scalar product of two vectors, [c1 c2]^T and [d1 d2]^T, is defined as the product of the dual of one of the vectors times the other vector, as follows:

[c1 c2] [d1 d2]^T = c1d1 + c2d2.

This definition applies to complex vector spaces provided that the dual vector is defined as above. For example, for two arbitrary ket vectors |ψ⟩ = [c1 c2]^T and |φ⟩ = [d1 d2]^T,

⟨ψ|φ⟩ = [c1* c2*] [d1 d2]^T = c1*d1 + c2*d2.

In general, this will be a complex number. But note that if we reverse the order in the bra-ket, then we get its complex conjugate: ⟨φ|ψ⟩ = ⟨ψ|φ⟩*. In the special case in which |φ⟩ = |ψ⟩, ⟨ψ|ψ⟩ = ⟨ψ|ψ⟩*, which proves that ⟨ψ|ψ⟩ is always a real number. A fundamental postulate of quantum mechanics is that observables are represented by linear operators that map vectors in the Hilbert space to other vectors in the Hilbert space. An arbitrary linear operator, Â, in a 2-dimensional Hilbert space is represented by a 2 × 2 matrix of the form

Â = [[a, b], [c, d]],

where a, b, c, and d are complex numbers. A second fundamental postulate is that the mean value of an observable Â when the system is in a state |ψ⟩ is defined by E_ψ(Â) ≡ ⟨ψ|Â|ψ⟩. Note that from the associativity of matrix multiplication, ⟨ψ|Â|ψ⟩ can be read either as ⟨ψ|(Â|ψ⟩) or as (⟨ψ|Â)|ψ⟩. A third fundamental postulate is that the mean value of an observable in any quantum state must be a real number. This restricts the class of operators that are observables in the following way. First note that, in general,

⟨ψ|Â|ψ⟩ = [c1* c2*] [[a, b], [c, d]] [c1 c2]^T = [c1* c2*] [ac1 + bc2, cc1 + dc2]^T = c1*(ac1 + bc2) + c2*(cc1 + dc2).
The third postulate therefore requires that the right hand side is real. But

c1*(ac1 + bc2) + c2*(cc1 + dc2) = a|c1|² + b(c1*c2) + c(c1c2*) + d|c2|².

Since c1 and c2 are arbitrary complex numbers, the right hand side can only be real in every case if a and d are real. Furthermore, we require that b(c1*c2) + c(c1c2*) is real. With a little algebraic manipulation, it is possible to see that this is true for arbitrary complex numbers c1 and c2 only if c = b*. Therefore, every observable in a 2-dimensional Hilbert space is represented by a matrix of the form

Â = [[a, b], [b*, d]],

where a and d are real. Any matrix that has this form is said to be Hermitian, or self-adjoint. In sum, the fundamental postulates require that observables are Hermitian operators, for otherwise their mean values might be complex in some quantum states. Not only does a linear Hermitian operator have a well defined mean value in every quantum state, but it also has a well defined dispersion, or variance, in every state (in fact, it has a well defined probability distribution in every state, but this detail does not concern us at the moment). The variance of an observable Â is defined analogously to the variance of a random variable in standard statistics—namely, as the expected value (mean value) of the operator minus its mean value, all squared. So, let α = E_ψ(Â) ≡ ⟨ψ|Â|ψ⟩. Then

Var_ψ(Â) ≡ ⟨ψ|(Â − α)²|ψ⟩,

for any state |ψ⟩. Note that it is possible to prove that the operator (Â − α)² is Hermitian if Â is Hermitian. So, the variance of Â is a (non-negative) real number in every quantum state. Now, let us examine the special case in which an observable Â is dispersion-free in a state |ψ⟩, where ‘dispersion-free’ is a physicist’s way of saying that the variance is 0. Given an observable Â, how do we characterize the states |ψ⟩ in which Var_ψ(Â) = 0? In other words, which states |ψ⟩ satisfy the equation Var_ψ(Â) = 0?
Theorem:9 Var_ψ(Â) = 0 if and only if Â|ψ⟩ = α|ψ⟩ for some number α.

Definition: An equation of the form Â|ψ⟩ = λ|ψ⟩ is called the eigenvalue equation for the operator Â; any |ψ⟩ satisfying this equation is called an eigenvector of Â, and λ is called the eigenvalue corresponding to that eigenvector.

Note that λ is the mean value of the quantum observable in the corresponding eigenstate. The eigenvalue equation is most commonly presented as defining the possible outcomes of the measurement of a quantum observable, and it does serve this function. But it also determines the possible values of the observable itself, as if the observable were a variable in the ordinary mathematical sense. But here it must be remembered that an observable only has a well defined mean value in every state. It is not assigned a precise value. At least, the QM state does not determine a precise value (unless, as stated in the theorem, it is an eigenstate of the observable). So, we should be careful to maintain a clear distinction between the ‘observed value’ and the ‘mean value’ of a QM observable. Let us return to the example of sequential spin measurements on an electron. The electron has passed through the magnetic field of a Stern-Gerlach magnet oriented in the x direction. We now place two more Stern-Gerlach magnets, also oriented in the x direction—one in the ‘up’ path exiting the first magnet, and the other in the ‘down’ path. Finally, we place particle detectors (Geiger counters, or whatever devices are used to detect electrons) in the ‘up’ and ‘down’ exit paths of the last two magnets (4 Geiger counters in all). What we find, experimentally, is that no electrons follow the ‘up’ and then the ‘down’ path, or the ‘down’ and then the ‘up’ path. That is to say, repeated spin measurements always produce the same outcomes (‘up’ followed by ‘up’ or ‘down’ followed by ‘down’).
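The theorem can be checked numerically for a sample observable. A sketch in numpy, where the Hermitian matrix is an arbitrary illustrative choice:

```python
import numpy as np

# An arbitrary Hermitian observable: real diagonal, conjugate off-diagonal.
A = np.array([[2.0, 1 - 1j], [1 + 1j, -1.0]])
assert np.allclose(A, A.conj().T)

def variance(A, psi):
    """Var_psi(A) = <psi|(A - alpha)^2|psi> for a normalized state psi."""
    alpha = np.vdot(psi, A @ psi)
    D = A - alpha * np.eye(len(psi))
    return np.vdot(psi, D @ D @ psi).real

# eigh returns the eigenvalues and orthonormal eigenvectors (as columns).
eigvals, eigvecs = np.linalg.eigh(A)

# In each eigenstate the observable is dispersion-free ...
for k in range(2):
    assert np.isclose(variance(A, eigvecs[:, k]), 0)

# ... while in a superposition of eigenstates it is not.
psi = (eigvecs[:, 0] + eigvecs[:, 1]) / np.sqrt(2)
assert variance(A, psi) > 0
```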
Note that the final detection of the particle retroactively determines the path it took after exiting the first magnet, even though no particle detectors are placed on those paths between the first magnet and the second set of magnets. Thus, the outcome for each electron measures the spin values of the electron at different times. This enables us to estimate the probability distribution of the second spin observable given the outcome of the first spin measurement. Now suppose that we change the experimental setup. We remove the second set of magnets and place a Geiger counter in the ‘down’ path exiting the first magnet.10 If we know that a particle is passing through the apparatus, and the particle detector does not respond, then we know that the electron is traveling in the ‘up’ path. Now place a second magnet in that path that is oriented in some direction d, and place two particle detectors on the exit paths to capture the electron in either the ‘up’ path or the ‘down’ path. Given that the earlier detector was not triggered, how does quantum mechanics determine the probabilities for the two possible responses of the second set of detectors? Suppose we begin by assuming that the electron is in some unknown state vector |ψ⟩. Then we can determine the probabilities from the mean value of the spin observable σ_d, where d is the alignment of the second magnet. By the fundamental postulates of quantum mechanics, the mean value of σ_d is ⟨ψ|σ_d|ψ⟩. If we choose the eigenvalues of any spin operator to be +1 and −1, then we already know that for d = x, ⟨ψ|σ_x|ψ⟩ = 1. But this implies that |ψ⟩ is a dispersion-free state for the observable σ_x, and so by the theorem, |ψ⟩ is an eigenstate of σ_x, with eigenvalue +1, which implies that |ψ⟩ = |x+⟩. So, in order to accommodate the first experimental fact, we need to assume that the state of the electron in the ‘up’ path exiting the first magnet is |x+⟩.

9 See, for example, Khinchin (1960), 54-55, for a proof.
This is why we say that the first magnet prepares the state of the electron. So, if we assume that the state vector of the particle entering the d-aligned magnet is the same, then we can predict the probabilities for this measurement from the value of ⟨x+|σ_d|x+⟩. This is an example of how prediction and accommodation work in quantum mechanics. Here is a second consequence of the eigenvalue equation. The identity operator, given by

Î = [[1, 0], [0, 1]],

always satisfies the eigenvalue equation with α = 1. Thus, Î is a dispersion-free observable in every state, with a mean value of ⟨ψ|Î|ψ⟩ = ⟨ψ|ψ⟩. But also note that Î|ψ⟩ = +1|ψ⟩. So, if we want the mean value of the identity observable to be equal to its eigenvalue in a dispersion-free state, then we need to assume that all state vectors are normalized. That is, we need to assume that ⟨ψ|ψ⟩ = ‖ψ‖² = 1. The next task is to evaluate a quantity like ⟨ψ|σ_x|ψ⟩ for an arbitrary state vector |ψ⟩. What we need to know (besides the state vector) is the matrix representation of the operator σ_x. We already know that it is represented by some Hermitian operator with the form

σ_x = [[a, b], [b*, d]],

where a and d are real numbers. In order to take the analysis further, we need to make some conventional choices. We know that we can choose any coordinate system we like for the Hilbert space, so we can choose to represent the eigenvector of the observable σ_x associated with the eigenvalue that represents spin ‘up’ as:

|x+⟩ = [1 0]^T.

Let us also denote the expected value of spin ‘up’ for σ_x in state |x+⟩ by α+. Because the outcome is dispersion-free in that state, α+ also denotes the observed value of the observable when the electron goes ‘up’. The eigenvalue equation now requires that

[[a, b], [b*, d]] [1 0]^T = α+ [1 0]^T,

which requires that a = α+ and b* = 0 = b.

10 Note that Geiger counters are still placed on all possible exit paths, so a measurement, in our earlier sense, is still being performed.
So, the observable is represented by the matrix

σ_x = [[α+, 0], [0, d]],

for some as yet undetermined real value d. What are the other solutions of the equation

[[α+, 0], [0, d]] [c1 c2]^T = α [c1 c2]^T ?

We know one solution, and we can also see that |ψ⟩ = 0 is a trivial solution of every eigenvalue equation. But what are the non-trivial solutions? By multiplying the matrices, it is easy to see that the only other solution is given by c1 = 0, c2 ≠ 0, and α = d = α−, for a second eigenvalue α− (≠ α+). We make the last requirement because if it were the case that α− = α+, then σ_x would be proportional to the identity operator, and this would imply that σ_x is dispersion-free in every quantum state, contrary to experimental fact. Thus, we have shown that the observable σ_x is represented by the matrix

σ_x = [[α+, 0], [0, α−]],

where α+ ≠ α− are the two eigenvalues, and the corresponding eigenvectors are

|x+⟩ = [1 0]^T and |x−⟩ = [0 c2]^T.

Notice that these two eigenvectors are mutually orthogonal in the precise sense that ⟨x+|x−⟩ = 0 = ⟨x−|x+⟩. We also want |x−⟩ to be a unit vector, but this still does not fix the value of c2 uniquely. The exact choice of c2 makes no difference, so we set c2 = 1. The choice of eigenvalues is also a matter of convention, so long as they are not equal, so we set their values to α+ = +1 and α− = −1. With these choices, we finally arrive at

σ_x = [[+1, 0], [0, −1]], with |x+⟩ = [1 0]^T and |x−⟩ = [0 1]^T.

Given these choices, we may verify that ⟨x+|σ_x|x+⟩ = +1 and ⟨x−|σ_x|x−⟩ = −1. Now consider a second experimental fact—namely that if a spin measurement in the x direction is followed by a spin measurement in the y direction (where x is orthogonal to y), then the results are random (50% probability for each outcome). Given that we have chosen to represent spin outcomes by the numbers ±1, the second experimental fact amounts to saying that if the electron is prepared in a state |x+⟩ or |x−⟩, then the expected value of the observable σ_y is 0.
That is, we require

⟨x+|σ_y|x+⟩ = 0 and ⟨x−|σ_y|x−⟩ = 0.

In matrix form, these equations are written as

[1 0] [[a, b], [b*, d]] [1 0]^T = 0 and [0 1] [[a, b], [b*, d]] [0 1]^T = 0,

and they imply that a = 0 = d. Therefore,

σ_y = [[0, b], [b*, 0]].

One cannot expect the constraint to result in a unique specification of σ_y, since there are many directions orthogonal to x, and each is associated with a different spin observable. On the other hand, it cannot be that b = 0, for this would imply that σ_y is the zero matrix. So, the simplest choice is to set b = i, in which case we arrive at

σ_y = [[0, i], [−i, 0]].

It is straightforward to show that σ_y has eigenvalues equal to ±1, that the corresponding eigenvectors are orthogonal to each other, and that each is a linear combination (superposition) of the eigenvectors of σ_x. The Hermitian operators σ_x and σ_y do not commute, as is verified by multiplying the matrices together in each order:

σ_x σ_y = [[1, 0], [0, −1]] [[0, b], [b*, 0]] = [[0, b], [−b*, 0]], whereas σ_y σ_x = [[0, b], [b*, 0]] [[1, 0], [0, −1]] = [[0, −b], [b*, 0]].

In fact, this shows that σ_x and σ_y anti-commute in the precise sense that σ_x σ_y = −σ_y σ_x. This is the property of spin observables that leads to the correct predictions in the version of the Bell experiment described in the next section.

6. Hidden Variables Again: A Second Failed Consilience

Three electrons fly apart towards 3 widely separated measuring devices, labeled 1, 2, and 3, each containing a Stern-Gerlach magnet that can be aligned in one of two orthogonal directions, x and y.11 The directions x and y are orthogonal to each other and to the direction of motion of the incoming particle (which is traveling in the local z direction). Each device contains two particle detectors, one in the ‘up’ path and one in the ‘down’ path, such that if the incoming electron is detected in the ‘up’ path, the light bulb attached to the device flashes red, and if it is detected in the ‘down’ path, the same light bulb flashes green.
The particle is always detected by one or other of the particle detectors, and never both. In common parlance, we say that the electron is spin-up if the light is red and spin-down if the light is green, in whichever direction the Stern-Gerlach magnet is aligned.

Figure 2: The GHZ version of Bell’s experiment.

First, consider the following set of experimental facts: When any 2 of the measurement devices are set to y and the third is set to x (that is, for settings y-y-x, y-x-y, or x-y-y), then there is always an odd number of red lights flashing in every trial of the experiment—that is, either all 3 red lights flash or one light flashes red and the other two flash green. A local hidden variable theory accommodates these experimental facts in the following way: Suppose that each particle, after separation from the others, carries with it a set of properties that determine which light bulb will flash for every possible setting of the device it enters. Let us represent the property that the particle approaching device 1 would cause the red bulb to flash if device 1 were set to position x by X1 = +1, and the property that the green bulb would flash were device 1 set to position x by X1 = −1. According to the hidden variable story, the particle has the property X1 = +1 or the property X1 = −1, but not both. Note that while the value of the variable determines the experimental outcome, the variable is not being defined as representing the observable outcome. The existence of these variables is postulated by the theory to ‘explain’ the observed outcome. The theory also assumes that the hidden variables have values when they are not measured. Similarly, let Y1 = +1 and Y1 = −1 represent the two properties that determine the outcome when measuring device 1 is set at y.

11 This example is referred to as the GHZ experiment after Greenberger et al (1989). The treatment of the example in this section and the next follows Mermin (1990) closely.
So, a particle heading towards device 1 will have exactly one of 4 possible sets of properties, which Mermin (1990) refers to as “instruction sets”: either {X1 = +1, Y1 = +1}, {X1 = +1, Y1 = −1}, {X1 = −1, Y1 = +1}, or {X1 = −1, Y1 = −1}. These are the hidden variables introduced to explain the correlation. There is no effort to make the hidden variable model particularly parsimonious. Separate variables are introduced for each spin property, but, collectively, they act as a common cause. Note that in a single run of the experiment, a measurement device cannot be oriented in two directions simultaneously, so we cannot determine all the hidden variable values by direct measurement. For instance, if we see the bulb flash red when device 1 is set to y, then we would only know that the instruction set was either {X1 = +1, Y1 = +1} or {X1 = −1, Y1 = +1}. This is why the variables are called hidden variables. The laws or regularities that must hold amongst the hidden variables in order for the theory to accommodate the first set of experimental facts then lead to a prediction. What needs to be accommodated is the fact that the outcome for the third particle must be R if the first two outcomes are either R-R or G-G. Otherwise, the outcome for the third particle is G. The theoretical laws that are necessary and sufficient to accommodate all three regularities are:

X1 × Y2 × Y3 = +1, Y1 × X2 × Y3 = +1, and Y1 × Y2 × X3 = +1. (1)

For example, if the setting is y-x-y, then the second law tells us that if the outcome for particles 1 and 2 is R-R, then Y1 = 1 = X2, and therefore Y3 = 1, and so the outcome for particle 3 will be R.
To present the theory as a probabilistic theory, think of an ensemble of repeated instances of this experiment of the following type: First, assume that the electron triples are prepared in the same way in each case, and denote this description by x1 (here x1 has nothing to do with the hidden variable X1, or the directions of any of the Stern-Gerlach magnets). According to the hidden variable theory, this description is incomplete in that it does not, by itself, imply or constrain any of the values of the hidden variables. But this does not prevent the theory from assigning probabilities to the outcomes based on the information that it has. The second piece of available information is the alignment of the Stern-Gerlach magnets. We encode this by x2, where x2 can take on the values x-y-y, y-x-y, y-y-x, or some other alignment triple. The value of x2 will be different in different token experiments, and so each value will define a sub-type within the ensemble. Next, there is the information that electron detectors are placed at all possible exit paths, and this fact is encoded by y1. Finally, z1 encodes the detector responses, which can be any one of eight possible triples: (R, R, R), (R, R, G), (R, G, R), (R, G, G), (G, R, R), (G, R, G), (G, G, R), or (G, G, G). So, each token of the general type is encoded by a ‘trajectory’ ((x1, x2, y1), z1). The function of the theory is to assign probability values to each of the eight values of z1 conditional on the value of (x1, x2, y1). As we have described the theory so far, the laws in (1) do not uniquely determine what probabilities should be assigned. They only imply that 4 of the 8 possible outcomes have probability 0 (at least when x2 takes on one of the values x-y-y, y-x-y, or y-y-x). So, there are many hidden variable theories that satisfy the laws in (1), each of which distributes the probabilities amongst the 4 possible outcomes in a different way.
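The hidden variable bookkeeping can be checked by brute force. A minimal sketch that enumerates every joint instruction set and keeps those satisfying the laws in (1):

```python
from itertools import product

# Each particle carries an instruction pair (Xi, Yi); enumerate all
# 2**6 = 64 joint assignments and keep those satisfying the laws in (1).
consistent = [
    (X1, Y1, X2, Y2, X3, Y3)
    for X1, Y1, X2, Y2, X3, Y3 in product([+1, -1], repeat=6)
    if X1 * Y2 * Y3 == +1 and Y1 * X2 * Y3 == +1 and Y1 * Y2 * X3 == +1
]
print(len(consistent))  # 8 joint instruction sets survive

# In the y-x-y setting, the first two outcomes fix the third: this is
# the regularity the laws were built to accommodate.
assert all(Y3 == Y1 * X2 for X1, Y1, X2, Y2, X3, Y3 in consistent)
```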
Since the experimental facts of this example are such that the 4 cases (R, R, R), (R, G, G), (G, R, G), and (G, G, R) occur with equal frequency (modulo expected sampling errors), I will assume that the hidden variable theory accommodates this fact by assigning a probability of ¼ in each case. In the following section, we shall show that QM predicts this fact as well. It should be evident by now that it cannot be required of a probabilistic theory that it assign these probability distributions ‘in advance’. It is normal that prediction is preceded by accommodation. However, it is also expected of a good theory that, after a period of accommodation, it will be able to assign probability distributions to experimental situations that have never been encountered before. Although Bell did not consider this particular example, it was Bell (1964, 1971) who first saw that ‘local’ hidden variable theories of the kind I have described do make some predictions. If the laws in (1) are assumed to apply in all experimental situations (in which the electron triples are prepared in the same way, as specified by x1), then the following deductive consequence of the laws holds in all situations. First, multiply the 3 equations in (1) together, to obtain

(X1 × Y2 × Y3)(Y1 × X2 × Y3)(Y1 × Y2 × X3) = X1 × X2 × X3 = +1, (4)

where we have used the fact that Y1² = Y2² = Y3² = 1. The derived law, X1 × X2 × X3 = +1, implies that there will also be an odd number of red flashes when all 3 devices are set to x-x-x. This prediction is dramatically different from the prediction made by quantum mechanics, which predicts that there must be an even number of red flashes in this experimental context! QM is right and the hidden variable prediction is wrong!12

7. The Consilience of Quantum Mechanics

I will now derive the correct prediction from quantum mechanics. Instead of using variables, quantum mechanics assigns spin observables to each particle in every spin direction.
So, for example, the observable σ_x1 replaces the hidden variable X1 in the hidden variable theory, the observable σ_y3 replaces the hidden variable Y3, and so on. The outcome probabilities are then determined from the appropriate observable in the way described in section 5. For example, if the system were in a +1 eigenstate of the observable σ_x1, then we would predict with probability 1 that device 1 will flash red. If the system is in a −1 eigenstate of the same observable, then the green light will flash, and so on. If the system is not in an eigenstate of σ_x1, then the probability that device 1 flashes red can still be inferred from the mean value of σ_x1, which is equal to ⟨ψ|σ_x1|ψ⟩, where |ψ⟩ is the quantum state of the system of three particles. All of this is exactly what we should expect from the quantum mechanical treatment of sequential spin measurements (section 5), assuming that the properties of the spin observables do not change from one situation to the next. There are 6 spin observables involved in our story: σ_x1, σ_y1, σ_x2, σ_y2, σ_x3, σ_y3. The new feature of this example (called the GHZ example) is that we can also construct new observables by considering products and sums of the 6 observables. However, not every function of the 6 observables is a Hermitian operator. Just as in section 5, an operator has to be Hermitian for it to be guaranteed a real mean value in every state, and a product of two Hermitian operators is Hermitian if and only if they commute. So, it is important to understand that operators pertaining to different particles always commute. For example, σ_x1 commutes with σ_y2, and the product observable σ_y1σ_x2 commutes with σ_y3, and so on. By using these facts alone, it follows that every product observable, like (σ_y1σ_x2)σ_y3, is Hermitian.

12 As far as I’m aware, this exact experiment has not been performed. My confidence in making this claim is based on the proven consilience of QM (next section).
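The claim that operators on different particles commute, and that their products are therefore Hermitian, can be verified with Kronecker products. A numpy sketch in the coordinate convention of section 5, where σ_x is diagonal:

```python
import numpy as np

sx = np.array([[1, 0], [0, -1]])    # sigma_x in the section 5 convention
sy = np.array([[0, 1j], [-1j, 0]])  # sigma_y in the section 5 convention
I = np.eye(2)

def kron3(a, b, c):
    """The operator acting as a on particle 1, b on particle 2, c on particle 3."""
    return np.kron(np.kron(a, b), c)

sx1 = kron3(sx, I, I)   # sigma_x1: acts only on particle 1
sy2 = kron3(I, sy, I)   # sigma_y2: acts only on particle 2

# Operators pertaining to different particles commute ...
assert np.allclose(sx1 @ sy2, sy2 @ sx1)

# ... so their product is Hermitian (equal to its conjugate transpose).
prod = sx1 @ sy2
assert np.allclose(prod, prod.conj().T)
```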
The product observables that correspond to the three random variable products that appear in (1) are (σ_x1σ_y2σ_y3), (σ_y1σ_x2σ_y3), and (σ_y1σ_y2σ_x3). We have just shown that these are quantum observables, which therefore have well defined mean values and variances in every quantum state. Analogously to the hidden variable statistics, the mean values of these product operators will determine the correlations amongst the light flashes. The quantum mechanical story begins with the assumption that all the electron triples are prepared in exactly the same quantum state, |ψ⟩. The fact that there is always an odd number of red flashes in the settings y-y-x, y-x-y, and x-y-y tells us that the three product observables have dispersion-free distributions in the state |ψ⟩, which implies that |ψ⟩ is an eigenstate of all three product observables. So, the first set of facts is accommodated by the supposition that |ψ⟩ is a +1 eigenstate of all of the product observables (σ_x1σ_y2σ_y3), (σ_y1σ_x2σ_y3), and (σ_y1σ_y2σ_x3). Therefore, in quantum mechanics, (1) is replaced by the laws:

σ_x1σ_y2σ_y3|ψ⟩ = +1|ψ⟩, σ_y1σ_x2σ_y3|ψ⟩ = +1|ψ⟩, and σ_y1σ_y2σ_x3|ψ⟩ = +1|ψ⟩. (2)

What predictions can be made from these quantum mechanical laws? By using the anti-commutation property of the spin operators, as derived in section 5, the fact that any spin operator times itself is equal to the identity operator (as can be verified directly by squaring the matrices derived in section 5), and the fact that operators pertaining to different particles commute, we prove that

(σ_x1σ_y2σ_y3)(σ_y1σ_x2σ_y3)(σ_y1σ_y2σ_x3) = (σ_x1σ_y2σ_y3)(σ_y3σ_x2σ_y1)(σ_y1σ_y2σ_x3) = σ_x1σ_y2σ_x2σ_y2σ_x3.

From here, note that

σ_x1(σ_y2σ_x2)σ_y2σ_x3 = σ_x1(−σ_x2σ_y2)σ_y2σ_x3 = −σ_x1σ_x2(σ_y2σ_y2)σ_x3 = −σ_x1σ_x2σ_x3,

where the minus sign arises from the anti-commutation law σ_y2σ_x2 = −σ_x2σ_y2.
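The operator identity just derived can be verified numerically with Kronecker products. A numpy sketch using the σ_x and σ_y matrices of section 5:

```python
import numpy as np

sx = np.array([[1, 0], [0, -1]])    # sigma_x in the section 5 convention
sy = np.array([[0, 1j], [-1j, 0]])  # sigma_y in the section 5 convention

# The section 5 anti-commutation law, and squaring to the identity:
assert np.allclose(sx @ sy, -(sy @ sx))
assert np.allclose(sy @ sy, np.eye(2))

def kron3(a, b, c):
    """The operator acting as a on particle 1, b on particle 2, c on particle 3."""
    return np.kron(np.kron(a, b), c)

# The three product observables that accommodate the y-y-x, y-x-y,
# and x-y-y facts ...
A = kron3(sx, sy, sy)   # sigma_x1 sigma_y2 sigma_y3
B = kron3(sy, sx, sy)   # sigma_y1 sigma_x2 sigma_y3
C = kron3(sy, sy, sx)   # sigma_y1 sigma_y2 sigma_x3

# ... multiply out to MINUS the x-x-x product observable.
assert np.allclose(A @ B @ C, -kron3(sx, sx, sx))
```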
Recall that the anti-commutation law is required in order to accommodate the facts about sequential measurements (section 5). But now, from (2), it follows that

(σx1σy2σy3)(σy1σx2σy3)(σy1σy2σx3) |ψ⟩ = +1 |ψ⟩,

and therefore,

σx1σx2σx3 |ψ⟩ = −1 |ψ⟩.   (3)

I have just proved what I promised: the only way for the quantum model to accommodate the first set of experimental facts is to assume that (2) is true, which implies (3), which then implies that there will always be an even number of red flashes when the magnets are set to x-x-x. The spin operators of quantum mechanics are projectible in a way that is reminiscent of the idea that Goodman (1965) had about projectible predicates, except that spin observables are far better projectors than hidden variables in the GHZ example.

At first sight, the quantum account appears to be flexible about the exact probability distributions assigned by |ψ⟩ to the 6 spin observables. However, a deeper analysis of the example reveals that |ψ⟩ is uniquely determined by the quantum laws in (2). This is worth proving in detail because it shows that the quantum mechanical accommodation of the first three experimental results leads to far stronger predictions than those provided by the hidden variable theory (see the Appendix). In summary, there is only one way for quantum mechanics to accommodate the first three experimental facts. This unique state vector then determines the full probability distribution for those settings (i.e., that the probability of each of the 4 possible outcomes is ¼). And, remarkably, in addition, |ψ⟩ determines the probabilities for any measurement settings whatsoever, including any case in which the three magnets are aligned in directions different from x and y. In contrast, the local hidden variable theory must introduce an entirely new set of hidden variables for each new measurement direction, and no laws have been discovered connecting their values to the hidden variables already in play.
The hidden variable theory is predictively impotent in comparison with quantum mechanics, and after all that, the one prediction it does make is false! It is not my intention to disparage hidden variable theories in general. After all, the unknown microstate of a thermodynamical system in statistical mechanics plays the role of a hidden variable. Rather, the point is that hidden variable explanations of spin correlations gamble on the discovery of laws that will constrain the hidden variables. The mere assertion of the existence of these variables does not lead to very much prediction, even though many philosophers consider them to be highly explanatory.

Since there is a one-to-one mapping between hidden variables and spin operators, there is also an infinite number of spin observables in quantum mechanics. But this is no obstacle in quantum mechanics because it is possible to express all spin observables as a linear combination of just three of them (the Pauli matrices). So, quantum mechanics is able to use |ψ⟩ to calculate the probability distributions of all spin observables, and all Hermitian functions of them.

After working through an example like this, and after seeing how tightly quantum mechanics ties the phenomena together, wouldn't it be miraculous for this huge body of data to fit the probabilities of quantum mechanics so well without there being some way of explaining this fact in terms of a reality behind the observed phenomena? Unfortunately, the defense of a realist interpretation of QM is beyond the scope of this particular essay. Suffice it to say that the argument given here provides the same kind of empirical evidence for a realist view of the QM properties of spin as is provided for the existence of Newtonian mass, for example. In contrast, local hidden variable models fail the test of consilience.

8. Why Causal Explanation Is Not Universal

Think again about the epistemological problem confronting someone locked in Reichenbach's cubical world.
The simplicity of the analogy encourages us to think narrowly in terms of explaining the correlations in terms of common causes. Reichenbach made his position more precise in the following way. Consider two physical quantities X and Y that exhibit a statistical correlation, such that the correlation does not arise from X directly causing Y or Y directly causing X. Then the only alternative appears to be that the correlation is explained by a common cause variable Z, such that Z causes X and Z causes Y.13 This very powerful idea is severely challenged by the example described in previous sections.

The first point to establish is that the premises for the common cause inference clearly hold in the GHZ version of the Bell experiment. Consider a version in which Stern-Gerlach magnet number 1 is placed in Alice's laboratory, and magnets 2 and 3 are placed in Bob's laboratory. Suppose that Bob hooks up a master light bulb to his two devices that flashes red if and only if one of his devices flashes red and the other flashes green. That is, if both his devices flash red or both flash green, then his master bulb flashes green. If the magnets are all aligned in the x-direction, then there must be an even number of red flashes. This means that Alice's light flashes red if and only if Bob's master light flashes red. There is a perfect positive correlation between what Bob sees and what Alice sees, provided that all magnets are aligned in the x-direction. Moreover, if we imagine that Alice's and Bob's measurements are space-like separated, then the special theory of relativity appears to rule out the existence of any direct causal interaction between the events.

13 There are a number of other scenarios that need to be ruled out as well. Arntzenius (1993) provides a concise list of these. Also see Sober (1984), Cartwright (1989), Eells (1991), Hausman (1998), and Woodward (2003) for extensive discussions of causal inference and explanation, and related philosophical issues.
If we postulate the existence of a hidden quantity X that determines the outcome of Alice's measurement, and a quantity P that determines the outcome of Bob's measurement, then we can explain the observed correlation if and only if we postulate that X = P. Now imagine that there is another quantity Q that Bob could measure by realigning his magnets, which is perfectly correlated with X. Then we would have to add the equation X = Q. Finally, imagine that there is a fourth quantity Y that Alice could measure that is perfectly correlated with P. Then, to accommodate this experimental fact, we would have to add the equation Y = P. But now the equations X = P, X = Q, and Y = P logically imply that Y = Q. This entails the prediction that measurements of Y and Q are perfectly correlated. But now suppose that the prediction is wrong. What do you do?

Each electron triple is postulated to have definite values of the variables {X, Y, P, Q}, and there is no way of assigning values to these variables that acts as a joint common cause that explains all the experimental facts. At least, there is no common cause variable if we assume that the values of the variables are fixed when the electron triple is created, that these values determine the outcomes of the measurements, and that their values are independent of Alice's or Bob's choice of measurement setting. If we do allow for strange changes in variable values when the electrons are in flight, or imagine that their values are determined with a foreknowledge of the settings that Alice and Bob are going to choose, then we can accommodate the phenomena. But what is the motivation for this ad hoc way of rescuing hidden variables when quantum theory accounts for the phenomena in a way that is beautifully predicted from the outcomes of sequential spin measurements?
If the best test of a theory is the consilience of inductions (in this case, the consilience of sequential spin phenomena and triplet spin phenomena), then non-local hidden variable theories are not only ad hoc, but they also fail to compete with QM. In order to compete, they must not only accommodate different kinds of phenomena separately, but also predict aspects of one kind of phenomena from different kinds of phenomena.

The consilience of QM not only explains the growing consensus that the QM strategy of replacing variables with operators is here to stay, but it also explains why cubical world inference is so hard to characterize in general terms. For the inference that is ultimately convincing may depend on the invention of new theoretical concepts and mathematical formalisms, which are not easy to anticipate in advance. Such is the case in QM. The good news is that there do appear to be overarching methodological principles that apply equally well to the old and the new physics. As in science itself, successful predictions are impressive. Whewell's prediction was that the consilience of inductions is the best indicator we have of good science. It appears that the consilience of inductions not only explains why some causal explanations work, but it also suggests that other causal explanations will always fail, at least so long as causes are characterized in terms of hidden variables.

Appendix

The purpose of this appendix is to prove that the quantum mechanical accommodation of the first three experimental facts in the GHZ experiment leads to far stronger predictions than those provided by the hidden variable theory. Because there are 8 possible outcomes, the state vector |ψ⟩ is a vector in an 8-dimensional Hilbert space. This space is constructed out of the three 2-dimensional Hilbert spaces that would be used to represent the states of the electrons separately.
Let us choose the basis vectors for the first Hilbert space to be |x+⟩1 and |x−⟩1, and so on, where the subscripts keep track of the Hilbert space in question. It is now possible to prove that the eight vector products |x±⟩1|x±⟩2|x±⟩3 form a natural basis for the 8-dimensional Hilbert space, where the products are formed by generalizing ordinary matrix multiplication in a natural way. For instance, treating |x+⟩ = [1; 0] and |x−⟩ = [0; 1],

|x+⟩|x−⟩ = [1; 0][0; 1] = [1·[0; 1]; 0·[0; 1]] = [0; 1; 0; 0],

which is obtained by treating the second column matrix in the product as if it were simply a 1 × 1 matrix with a column vector as its element. By extending the same idea to the product of three 2-vectors, one may easily confirm that the 8 products |x±⟩1|x±⟩2|x±⟩3 are each equal to an eight-element column matrix in which exactly one element is equal to 1 and the rest are 0. In each of the 8 cases, the 1 is in a different position, and so these products generate the 8 basis vectors of an 8-dimensional Hilbert space.

Recall that the matrix form of the spin operators derived in section 3 implies, for example,

σy |x+⟩ = [0 i; −i 0][1; 0] = [0; −i] = −i [0; 1] = −i |x−⟩.

In the context of the 3-electron system, a spin operator like σy3 operates on the third part of the vector products, so

σy3 |x+⟩|x+⟩|x+⟩ = |x+⟩|x+⟩(σy3|x+⟩) = −i |x+⟩|x+⟩|x−⟩,

where I have dropped the subscripts because the order in which the 2-vectors appear in the product determines which subscript should be understood.

We now have everything in place that we need to apply the eigenvalue equations in (2) to an arbitrary state vector

|ψ⟩ = Σ_{j,k,l} c_{j,k,l} |x_j⟩|x_k⟩|x_l⟩,

where the indices j, k, and l range over the values +1 and −1. Calculating the constraints on the arbitrary complex coefficients c_{j,k,l} is tedious, but it involves nothing more than simple algebra. By setting c_{−1,−1,−1} = 1, and making sure that the resulting state vector is normalized, one proves that:

|ψ⟩ = ½ |x+⟩|x+⟩|x−⟩ + ½ |x+⟩|x−⟩|x+⟩ + ½ |x−⟩|x+⟩|x+⟩ + ½ |x−⟩|x−⟩|x−⟩.

The expression is easy to remember because there is an even number of +'s in each term.14

14 Some readers may worry about the fact that this state vector is symmetric, whereas the state vectors of indistinguishable fermions are supposed to be anti-symmetric. Here it should be noted that I have only written the spin component of the state vector. The complete state vector is anti-symmetric because the wavefunction component (which gives the position probabilities) is anti-symmetric.

References

Arntzenius, Frank (1993): "The Common Cause Principle." PSA 1992, Volume 2: 227-237. East Lansing, Michigan: Philosophy of Science Association.

Arntzenius, Frank (1997): "Transition Chances and Causation." Pacific Philosophical Quarterly 78: 149-168.

Bell, John S. (1964): "On the Einstein-Podolsky-Rosen Paradox", Physics 1: 195-200.

Bell, John S. (1971): "Introduction to the Hidden Variable Question", in B. d'Espagnat (ed.), Foundations of Quantum Mechanics. N.Y.: Academic Press.

Butts, Robert E. (ed.) (1989): William Whewell: Theory of Scientific Method. Hackett Publishing Company, Indianapolis/Cambridge.

Cartwright, Nancy (1989): Nature's Capacities and their Measurement. Oxford: Oxford University Press.

Eells, Ellery (1991): Probabilistic Causality. Cambridge: Cambridge University Press.

Forster, Malcolm R. (1984): Probabilistic Causality and the Foundations of Modern Science. Ph.D. Thesis, University of Western Ontario.

Forster, Malcolm R. and Alexey Kryukov (2003): "The Emergence of a Macro-World: A Study of Intertheory Relations in Classical and Quantum Mechanics," Philosophy of Science 70: 1039-1051.

Gillespie, Daniel T. (1970): A Quantum Mechanics Primer: An Elementary Introduction to the Formal Theory of Non-relativistic Quantum Mechanics. Scranton, Pennsylvania: International Textbook Co.

Goodman, Nelson (1965): Fact, Fiction and Forecast, Second Edition. Harvard University Press, Cambridge, Mass.

Greenberger, Daniel M., Michael A.
Horne and Anton Zeilinger (1989): "Going Beyond Bell's Theorem", in M. Kafatos (ed.), Bell's Theorem, Quantum Theory and Conceptions of the Universe. Kluwer Academic Publishers, pp. 69-72.

Harper, William L. (2002): "Howard Stein on Isaac Newton: Beyond Hypotheses", in David B. Malament (ed.), Reading Natural Philosophy: Essays in the History and Philosophy of Science and Mathematics. Chicago and La Salle, Illinois: Open Court, pp. 71-112.

Hausman, Daniel M. (1998): Causal Asymmetries. Cambridge: Cambridge University Press.

Hempel, Carl G. (1965): Aspects of Scientific Explanation and Other Essays in the Philosophy of Science. New York: The Free Press.

Hung, Edwin (1997): The Nature of Science: Problems and Perspectives. Wadsworth Publishing Co.

Khinchin, A. I. (1960): Mathematical Foundations of Quantum Statistics. Dover Publications, Inc., Mineola, New York.

Mermin, N. David (1990): "Quantum Mysteries Revisited", American Journal of Physics, August 1990: 731-734.

Myrvold, Wayne and William L. Harper (2002): "Model Selection, Simplicity, and Scientific Inference", Philosophy of Science 69: S135-S149.

Reichenbach, Hans (1938): Experience and Prediction. Chicago: University of Chicago Press.

Reichenbach, Hans (1956): The Direction of Time. Berkeley: University of California Press.

Reidy, Michael S. and Malcolm R. Forster (forthcoming): "William Whewell (1794-1866)", in Thomas Hockey (ed.), Biographical Encyclopedia of Astronomers. Kluwer Academic Publishers.

Sober, Elliott (1984): The Nature of Selection: Evolutionary Theory in Philosophical Focus. MIT Press, Cambridge, Mass.

Sober, Elliott (1994): "Temporally Oriented Laws", in From a Biological Point of View: Essays in Evolutionary Philosophy. Cambridge University Press, pp. 233-251.

van Fraassen, Bas C. (1980): The Scientific Image. Oxford: Oxford University Press.

van Fraassen, Bas C. (1983): "The Charybdis of Realism: Epistemological Foundations of Bell's Inequality Argument", Synthese.
Whewell, William (1858): Novum Organon Renovatum, Part II of the third edition of The Philosophy of the Inductive Sciences. London: Cass, 1967.

Woodward, James (2003): Making Things Happen: A Theory of Causal Explanation. Oxford and New York: Oxford University Press.