Aspects of quantum information theory

Von der Gemeinsamen Naturwissenschaftlichen Fakultät der Technischen Universität Carolo-Wilhelmina zu Braunschweig von Michael Keyl aus Berlin angenommene Habilitationsschrift zur Erlangung der Venia Legendi für das Lehrgebiet Theoretische Physik. Braunschweig, 21. Mai 2003

Foreword

The main purpose of this habilitation thesis is to document my research on quantum information theory since 1997. To this end I have divided it into two parts. The first (Part I, "Fundamentals") is of an introductory nature and gives an overview of the foundations of quantum information. It is based on an invited review article I have written for "Physics Reports" [134]. Its main purposes are to make the work self-contained and to show how my own research fits into the whole field, which has become fairly large during the last decade. At the same time it should help readers unfamiliar with quantum information to get easier access to the work. My own research on different aspects of quantum information theory is presented in the second part of the work (Part II, "Advanced topics"). It contains results on quantum channel capacities (Chapter 7), quantum cloning and estimation (Chapters 8-11), quantum game theory (Chapter 12) and infinitely entangled states (Chapter 13). Although most of the results presented here have already been published elsewhere [136, 133, 138, 137, 65, 139, 75], the present work contains several significant new results. This concerns in particular estimation theory for mixed quantum states (Chapter 10) and the discussion of infinite entanglement in Chapter 13. The latter has been submitted [135] but not yet accepted for publication.

Contents

1 Introduction
  1.1 What is quantum information?
  1.2 Tasks of quantum information
  1.3 Experimental realizations

I Fundamentals

2 Basic concepts
  2.1 Systems, States and Effects
    2.1.1 Operator algebras
    2.1.2 Quantum mechanics
    2.1.3 Classical probability
    2.1.4 Observables
  2.2 Composite systems and entangled states
    2.2.1 Tensor products
    2.2.2 Compound and hybrid systems
    2.2.3 Correlations and entanglement
    2.2.4 Bell inequalities
  2.3 Channels
    2.3.1 Completely positive maps
    2.3.2 The Stinespring theorem
    2.3.3 The duality lemma
  2.4 Separability criteria and positive maps
    2.4.1 Positivity
    2.4.2 The partial transpose
    2.4.3 The reduction criterion

3 Basic examples
  3.1 Entanglement
    3.1.1 Maximally entangled states
    3.1.2 Werner states
    3.1.3 Isotropic states
    3.1.4 OO-invariant states
    3.1.5 PPT states
    3.1.6 Multipartite states
  3.2 Channels
    3.2.1 Quantum channels
    3.2.2 Channels under symmetry
    3.2.3 Classical channels
    3.2.4 Observables and preparations
    3.2.5 Instruments and parameter dependent operations
    3.2.6 LOCC and separable channels
  3.3 Quantum mechanics in phase space
    3.3.1 Weyl operators and the CCR
    3.3.2 Gaussian states
    3.3.3 Entangled Gaussians
    3.3.4 Gaussian channels

4 Basic tasks
  4.1 Teleportation and dense coding
    4.1.1 Impossible machines revisited: Classical teleportation
    4.1.2 Entanglement enhanced teleportation
    4.1.3 Dense coding
  4.2 Estimating and copying
    4.2.1 Quantum state estimation
    4.2.2 Approximate cloning
  4.3 Distillation of entanglement
    4.3.1 Distillation of pairs of qubits
    4.3.2 Distillation of isotropic states
    4.3.3 Bound entangled states
  4.4 Quantum error correction
    4.4.1 The theory of Knill and Laflamme
    4.4.2 Graph codes
  4.5 Quantum computing
    4.5.1 The network model of classical computing
    4.5.2 Computational complexity
    4.5.3 Reversible computing
    4.5.4 The network model of a quantum computer
    4.5.5 Simon's problem
  4.6 Quantum cryptography

5 Entanglement measures
  5.1 General properties and definitions
    5.1.1 Axiomatics
    5.1.2 Pure states
    5.1.3 Entanglement measures for mixed states
  5.2 Two qubits
    5.2.1 Pure states
    5.2.2 EOF for Bell diagonal states
    5.2.3 Wootters formula
    5.2.4 Relative entropy for Bell diagonal states
  5.3 Entanglement measures under symmetry
    5.3.1 Entanglement of Formation
    5.3.2 Werner states
    5.3.3 Isotropic states
    5.3.4 OO-invariant states
    5.3.5 Relative Entropy of Entanglement
6 Channel capacity
  6.1 Definition and elementary properties
    6.1.1 The definition
    6.1.2 Elementary properties
    6.1.3 Relations to entanglement measures
  6.2 Coding theorems
    6.2.1 Shannon's theorem
    6.2.2 The classical capacity of a quantum channel
    6.2.3 Entanglement assisted capacity
    6.2.4 The quantum capacity
    6.2.5 Examples

II Advanced topics

7 Continuity of the quantum capacity
  7.1 Discrete to continuous error model
  7.2 Coding by random graphs
  7.3 Results
    7.3.1 Correcting small errors
    7.3.2 Estimating capacity from finite coding solutions
    7.3.3 Error exponents
    7.3.4 Capacity with finite error allowed

8 Multiple inputs
  8.1 Overview and general structure
  8.2 Symmetric cloners and estimators
    8.2.1 Reducing parameters
    8.2.2 Decomposition of tensor products
    8.2.3 Fully symmetric cloning maps
    8.2.4 Fully symmetric estimators
  8.3 Appendix: Representations of unitary groups
    8.3.1 The groups and their Lie algebras
    8.3.2 Representations
    8.3.3 The Casimir invariants

9 Optimal cloning
  9.1 Figures of merit
  9.2 The optimal cloner
  9.3 Testing all clones
    9.3.1 Existence and uniqueness
    9.3.2 Supplementary properties
  9.4 Testing single clones
    9.4.1 Fully symmetric cloners
    9.4.2 The qubit case
    9.4.3 The general case
  9.5 Asymptotic behavior
  9.6 Cloning of mixed states

10 State estimation
  10.1 Estimating pure states
    10.1.1 Relations to optimal cloning
    10.1.2 The optimal estimator
  10.2 Estimating mixed states
    10.2.1 Estimating the spectrum
    10.2.2 Asymptotic behavior
    10.2.3 Estimating the full density matrix
  10.3 Appendix: Large deviation theory

11 Purification
  11.1 Statement of the problem
    11.1.1 Figures of merit
    11.1.2 The optimal purifier
  11.2 Calculating fidelities
    11.2.1 Decomposition of states
    11.2.2 The one qubit fidelity
    11.2.3 The all qubit fidelity
  11.3 Solution of the optimization problems
  11.4 Asymptotic behavior
    11.4.1 The one particle test
    11.4.2 The many particle test

12 Quantum game theory
  12.1 Overview
    12.1.1 Classical games
    12.1.2 Quantum games
  12.2 The quantum Monty Hall problem
    12.2.1 The classical game
    12.2.2 The quantum game
    12.2.3 The classical strategy
    12.2.4 Strategies against classical notepads
    12.2.5 Strategies for quantum notepads
    12.2.6 Alternative versions and quantizations of the game
  12.3 Quantum coin tossing
    12.3.1 Coin tossing protocols
    12.3.2 Classical coin tossing
    12.3.3 The unitary normal form
    12.3.4 A particular example
    12.3.5 Bounds on security

13 Infinitely entangled states
  13.1 Density operators on infinite dimensional Hilbert space
  13.2 Infinite one-copy entanglement
  13.3 Singular states and infinitely many degrees of freedom
    13.3.1 Von Neumann's incomplete infinite tensor product of Hilbert spaces
    13.3.2 Singular states
    13.3.3 Local observable algebras
    13.3.4 Some basic facts about operator algebras
  13.4 Von Neumann algebras with maximal entanglement
    13.4.1 Characterization and basic properties
    13.4.2 Characterization by violations of Bell's inequalities
    13.4.3 Schmidt decomposition and modular theory
    13.4.4 Characterization by the EPR-doubles property
  13.5 The original EPR state
    13.5.1 Definition
    13.5.2 Restriction to the CCR-algebra
    13.5.3 EPR-correlations
    13.5.4 Infinite one-shot entanglement
    13.5.5 EPR states based on two mode Gaussians
    13.5.6 Counterintuitive properties of the restricted states

Chapter 1

Introduction

Quantum information and quantum computation have recently attracted a lot of interest. The promise of new technologies like safe cryptography and new "super computers", capable of handling otherwise intractable problems, has excited not only researchers from many different fields, such as physics, mathematics and computer science, but also a large public audience. On a practical level all these new visions are based on the ability to control the quantum states of (a small number of) micro systems individually and to use them for information transmission and processing. From a more fundamental point of view the crucial point is a reconsideration of the foundations of quantum mechanics in an information theoretical context. The purpose of this work is to document my own contributions to this field. To this end the text is divided into two parts. The first (Part I, "Fundamentals") is of an introductory nature. It takes into account that most of the fundamental concepts and basic ideas of quantum information were developed during the last decade, and are therefore unfamiliar to most physicists. To make the thesis more self-contained and more easily accessible to the non-expert, I have therefore started the work with a detailed review (mainly based on [134]) of the fundamentals of quantum information. Its outline is as follows: The rest of this introduction is devoted to a rough and informal overview of the field, discussing some of its tasks and experimental realizations. Afterwards, in Chapter 2, we will consider the basic formalism which is necessary to present more detailed results. Typical keywords in this context are: systems, states, observables, correlations, entanglement and quantum channels. We then clarify these concepts (in particular entanglement and channels) with several examples in Chapter 3, and in Chapter 4 we discuss the most important tasks of quantum information in greater detail. Chapters 5 and 6 are then devoted to a more quantitative analysis. They discuss entanglement measures and channel capacities. The second part of this thesis (Part II, "Advanced topics") is devoted to my own contributions to quantum information. It starts with Chapter 7, where continuity properties of the quantum capacity of a quantum channel are discussed. These results are based on [139]. The following Chapters 8 to 11 should be regarded as a contiguous part, because they all deal with different aspects of quantum cloning and estimation (general properties in Chapter 8, quantum cloning in Chapter 9, quantum state estimation in Chapter 10 and undoing noise in Chapter 11).
They are mainly based on work published in [136, 138, 133, 137], but Chapter 10 contains significant new results as well (Section 10.2). Chapter 12 is devoted to quantum game theory. It discusses two examples [65, 75] and adds some new ideas about the definition and general structure of quantum games. The last chapter (Chapter 13) finally discusses entanglement theory in the context of systems with infinitely many degrees of freedom. It is based on [135].

1.1 What is quantum information?

Classical information is, roughly speaking, everything which can be transmitted from a sender to a receiver with "letters" from a "classical alphabet", e.g. the two digits "0" and "1" or any other finite set of symbols. In the context of classical information theory, it is completely irrelevant which type of physical system is used to perform the transmission. This abstract approach is successful because it is easy to transform information between different types of carriers like electric currents in a wire, laser pulses in an optical fiber, or symbols on a piece of paper without loss of data; and even if there are losses they are well understood and it is known how to deal with them. However, quantum information theory breaks with this point of view. It studies, loosely speaking, that kind of information ("quantum information") which is transmitted by micro particles from a preparation device (sender) to a measuring apparatus (receiver) in a quantum mechanical experiment – in other words, the distinction between carriers of classical and quantum information becomes essential. This approach is justified by the observation that a lossless conversion of quantum information into classical information is, in the above sense, not possible. Therefore, quantum information is a new kind of information. In order to explain why there is no way from quantum to classical information and back, let us discuss what such a conversion would look like. To convert quantum to classical information we need a device which takes quantum systems as input and produces classical information as output – this is nothing other than a measuring apparatus. The converse translation from classical to quantum information can be rephrased similarly as "parameter dependent preparation", i.e. the classical input to such a device is used to control the state (and possibly the type of system) in which the micro particles should be prepared. A combination of these two elements can be done in two ways. Let us first consider a device which goes from classical to quantum to classical information. This is a possible task and in fact already realized technically. A typical example is the transmission of classical information via an optical fiber. The information transmitted through the fiber is carried by micro particles (photons) and is therefore quantum information (in the sense of our preliminary definition). To send classical information we first have to prepare photons in a certain state, send them through the channel, and measure an appropriate observable at the output side. This is exactly the combination of a classical → quantum with a quantum → classical device just described. The crucial point is now that the converse composition – performing the measurement M first and the preparation P afterwards (cf. Figure 1.1) – is more problematic. Such a process is called classical teleportation, if the particles produced by P are "indistinguishable" from the input systems.
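The statistical criterion behind "indistinguishable" is made precise below (cf. Figure 1.2); the following small Python sketch, added here as an illustration and not part of the original text, shows the kind of failure it rules out. With M fixed to a measurement in the computational basis, the measure-and-re-prepare device reproduces the statistics of that basis but destroys the statistics of a conjugate observable.

```python
import numpy as np

# Pauli matrices; |0>, |1> is the computational (Z) basis.
Z = np.diag([1.0, -1.0])
X = np.array([[0.0, 1.0], [1.0, 0.0]])

def expectation(rho, obs):
    """Expectation value tr(rho * obs) of an observable in the state rho."""
    return np.real(np.trace(rho @ obs))

# Input state prepared by P': the +1 eigenstate of X, psi = (|0> + |1>)/sqrt(2).
psi = np.array([1.0, 1.0]) / np.sqrt(2)
rho_in = np.outer(psi, psi.conj())

# "Classical teleportation" attempt: measure in the Z basis (device M),
# transmit the classical outcome, and re-prepare the corresponding basis
# state (device P).  The resulting output state is
#   rho_out = sum_k <k|rho_in|k> |k><k|.
P0 = np.diag([1.0, 0.0])
P1 = np.diag([0.0, 1.0])
rho_out = sum(expectation(rho_in, P) * P for P in (P0, P1))

# The statistics of Z are reproduced ...
print(expectation(rho_in, Z), expectation(rho_out, Z))   # 0.0  0.0
# ... but the statistics of the conjugate observable X are not:
print(expectation(rho_in, X), expectation(rho_out, X))   # 1.0  0.0
```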
We will show the impossibility of such a device via a hierarchy of other "impossible machines" which traces the problem back to the fundamental structure of quantum mechanics. This will finally prove our statement that quantum information is a new kind of information.¹

¹ The following chain of arguments is taken from [232], where it is presented in greater detail. This concerns in particular the construction of Bell's telephone from a joint measurement, which we have omitted here.

Figure 1.1: Schematic representation of classical teleportation. Here and in the following diagrams a curly arrow stands for quantum systems and a straight one for the flow of classical information.

To start with, we have to clarify the precise meaning of "indistinguishable" in this context. This has to be done in a statistical way, because the only possibility to compare quantum mechanical systems is in terms of statistical experiments. Hence we need an additional preparation device P′ and an additional measuring apparatus M′. Indistinguishable now means that it does not matter whether we perform M′ measurements directly on P′ outputs or whether we switch a teleportation device in between; cf. Figure 1.2. In both cases we should get the same distribution of measuring results for a large number of repetitions of the corresponding experiment. This requirement should hold for any preparation P′ and any measurement M′, but for fixed M and P. The latter means that we are not allowed to use a priori knowledge about P′ or M′ to adapt the teleportation process (otherwise, in the most extreme case, we could always choose P′ for P and the whole discussion would become meaningless).

Figure 1.2: A teleportation process should not affect the results of a statistical experiment with quantum systems. A more precise explanation of the diagram is given in the text.

The second impossible machine we have to consider is a quantum copying machine. This is a device C which takes one quantum system p as input and produces two systems p1, p2 of the same type as output. The limiting condition on C is that p1 and p2 are indistinguishable from the input, where "indistinguishable" has to be understood in the same way as above: any statistical experiment performed with one of the output particles (i.e. always with p1 or always with p2) yields the same result as applied directly to the input p. To get such a device from teleportation is easy: we just have to perform an M measurement on p, make two copies of the classical data obtained, and run the preparation P on each of them; cf. Figure 1.3. Hence if teleportation is possible, copying is possible as well.

Figure 1.3: Constructing a quantum copying machine from a teleportation device.

According to the "no-cloning theorem" of Wootters and Zurek [239], however, a quantum copying machine does not exist, and this basically concludes our proof. However, we will give an easy argument for this theorem in terms of a third impossible machine – a joint measuring device MAB for two arbitrary observables A and B. This is a measuring apparatus which produces, each time it is invoked, a pair (a, b) of classical outputs, where a is a possible output of A and b a possible output of B.

Figure 1.4: Constructing a joint measurement for the observables A and B from a quantum copying machine.
The crucial requirement for MAB is again of a statistical nature: the statistics of the a outcomes is the same as for device A, and similarly for B. It is known from elementary quantum mechanics that many quantum observables are not jointly measurable in this way. The most famous examples are position and momentum, or different components of angular momentum. Nevertheless, a device MAB could be constructed for arbitrary A and B from a quantum copying machine C. We simply have to operate with C on the input system p, producing two outputs p1 and p2, and to perform an A measurement on p1 and a B measurement on p2; cf. Figure 1.4. Since the outputs p1, p2 are, by assumption, indistinguishable from the input p, the overall device constructed this way would give a joint measurement for A and B. Hence a quantum copying machine cannot exist, as stated by the no-cloning theorem. This in turn implies that classical teleportation is impossible, and therefore we cannot transform quantum information losslessly into classical information and back. This concludes our chain of arguments.

1.2 Tasks of quantum information

So we have seen that quantum information is something new, but what can we do with it? There are three answers to this question which we want to present here. First of all, let us remark that in fact all information in a modern data processing environment is carried by micro particles (e.g. electrons or photons). Hence quantum information comes into play automatically. Currently it is safe to ignore this and to use classical information theory to describe all relevant processes. If the size of the structures on a typical circuit decreases below a certain limit, however, this is no longer true and quantum information will become relevant. This leads us to the second answer. Although it is far too early to say which concrete technologies will emerge from quantum information in the future, several interesting proposals show that devices based on quantum information can solve certain practical tasks much better than classical ones. The most well known and exciting one is, without a doubt, quantum computing. The basic idea is, roughly speaking, that a quantum computer can operate not only on one number per register but on superpositions of numbers. This possibility leads to an "exponential speedup" for some computations, which makes problems feasible that are considered intractable for any classical algorithm. This is most impressively demonstrated by Shor's factoring algorithm [192, 193]. A second example, which is quite close to a concrete practical realization (i.e. outside the laboratory; see the next section), is quantum cryptography. The fact that it is impossible to perform a quantum mechanical measurement without disturbing the state of the measured system is used here for the secure transmission of a cryptographic key (i.e. each eavesdropping attempt can be detected with certainty). Together with a subsequent application of a classical encryption method known as the "one-time pad" this leads to a cryptographic scheme with provable security – in contrast to currently used public key systems, whose security relies on possibly doubtful assumptions about (pseudo) random number generators and prime numbers. We will come back to both subjects – quantum computing and quantum cryptography – in Sections 4.5 and 4.6.
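The "one-time pad" mentioned above is a purely classical procedure and easy to state explicitly. The following minimal Python sketch, added here as an illustration and not taken from the thesis, encrypts a message by bitwise XOR with a random key of the same length; its security rests entirely on the key being secret, uniformly random and used only once, which is exactly the resource quantum key distribution is meant to supply.

```python
import secrets

def xor_bytes(data: bytes, key: bytes) -> bytes:
    """Bitwise XOR of a message with a key of the same length (one-time pad)."""
    assert len(data) == len(key)
    return bytes(d ^ k for d, k in zip(data, key))

message = b"quantum"
key = secrets.token_bytes(len(message))   # in QKD this key would be the distributed secret

ciphertext = xor_bytes(message, key)      # encryption
recovered = xor_bytes(ciphertext, key)    # decryption with the same key
assert recovered == message
```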
The third answer to the above question is of a more fundamental nature. The discussion of questions from information theory in the context of quantum mechanics leads to a deeper and in many cases more quantitative understanding of quantum theory. Maybe the most relevant example for this statement is the study of entanglement, i.e. non-classical correlations between quantum systems, which lead to violations of Bell inequalities.² Entanglement is a fundamental aspect of quantum mechanics and demonstrates the differences between quantum and classical physics in the most drastic way – this can be seen from Bell-type experiments, like the one of Aspect et al. [11], and the discussion surrounding them. Nevertheless, for a long time it was only considered an exotic feature of the foundations of quantum mechanics which is not so relevant from a practical point of view. Since quantum information has attracted broader interest, however, this has changed completely. It has turned out that entanglement is an essential resource whenever classical information processing is outperformed by quantum devices. One of the most remarkable examples is the experimental realization of "entanglement enhanced" teleportation [33, 31]. We have argued in Section 1.1 that classical teleportation, i.e. transmission of quantum information through a classical information channel, is impossible. If, however, sender and receiver share an entangled pair of particles (which can be used as an additional resource), the impossible task becomes, most surprisingly, possible [19]! (We will discuss this fact in detail in Section 4.1.) The study of entanglement, and in particular the question of how it can be quantified, is therefore a central topic within quantum information theory (cf. Chapter 5). Further examples of fields where quantum information has led to a deeper and in particular more quantitative insight include "capacities" of quantum information channels and "quantum cloning". A detailed discussion of these topics will be given in Chapters 6 and 8. Finally, let us remark that classical information theory benefits in a similar way from the synthesis with quantum mechanics. Besides the channel capacities just mentioned, this concerns for example the theory of computational complexity, which analyzes the scaling behavior of the time and space consumed by an algorithm as a function of the size of the input data. Quantum information challenges here in particular the fundamental Church-Turing hypothesis [54, 212], which claims that each computation can be simulated "efficiently" on a Turing machine; we come back to this topic in Section 4.5.

² This is only a very rough characterization. A more precise one will be given in Section 2.2.

1.3 Experimental realizations

Although this is a theoretical paper, it is of course necessary to say something about experimental realizations of the ideas of quantum information. Let us consider quantum computing first. Whatever way we go here, we need systems which can be prepared very precisely in a few distinct states (i.e. we need "qubits"), which can be manipulated individually afterwards (we have to realize "quantum gates"), and which can finally be measured with an appropriate observable (we have to "read out" the result). One of the furthest developed approaches to quantum computing is the ion trap technique (see Sections 4.3 and 5.3 in [32] and Section 7.6 of [172] for an overview and further references).
A "quantum register" is realized here by a string of ions kept by electromagnetic fields in high vacuum inside a Paul trap, and two long-lived states of each ion are chosen to represent "0" and "1". A single ion can be manipulated by laser beams, and this allows the implementation of all "one-qubit gates". To get two-qubit gates as well (for a quantum computer we need at least one two-qubit gate together with all one-qubit operations; cf. Section 4.5) the collective motional state of the ions has to be used. A "program" on an ion trap quantum computer now starts with a preparation of the register in an initial state – usually the ground state of the ions. This is done by optical pumping and laser cooling (which is in fact one of the most difficult parts of the whole procedure, in particular if many ions are involved). Then the "network" of quantum gates is applied, in terms of a (complicated) sequence of laser pulses. The readout, finally, is done by laser beams which illuminate the ions one after another. The beams are tuned to a fast transition which affects only one of the qubit states, and the fluorescent light is detected. An overview of recent experimental directions can be found in [86]. A second quite successful technique is NMR quantum computing (see Section 5.4 of [32] and Section 7.7 of [172] together with the references therein for details). NMR stands for "nuclear magnetic resonance" and is the study of transitions between Zeeman levels of an atomic nucleus in a magnetic field. The qubits are in this case different spin states of the nuclei in an appropriate molecule, and quantum gates are realized by high frequency oscillating magnetic fields in pulses of controlled duration. In contrast to ion traps, however, we do not use one molecule but a whole cup of liquid containing some 10^20 of them. This causes a number of problems, concerning in particular the preparation of an initial state, fluctuations in the free time evolution of the molecules, and the readout. There are several ways to overcome these difficulties, and we refer the reader again to [32] and [172] for details. Concrete implementations of NMR quantum computers are capable of using up to seven qubits [213]. A recent review can be found in [87]. The fundamental problem of the two methods for quantum computation discussed so far is their lack of scalability. It is realistic to assume that NMR and ion-trap quantum computers with up to tens of qubits will exist at some point in the future, but not with the thousands of qubits which are necessary for "real world" applications. There are, however, many other alternative proposals available, and some of them might be capable of avoiding this problem. The following is a small (not at all exhaustive) list: atoms in optical lattices [37], semiconductor nanostructures such as quantum dots (there are many works in this area, some recent ones are [209, 40, 28, 38]) and arrays of Josephson junctions [155]. A second group of experiments we want to mention here is centered around quantum communication and quantum cryptography (for a more detailed overview let us refer to [227] and [97]). Realizations of quantum cryptography are fairly well developed, and it is currently possible to span up to 50 km with optical fibers (e.g. [126]). Potentially greater distances can be bridged by "free space cryptography", where the quantum information is transmitted through the air (e.g. [44]). With this technology satellites can be used as some sort of "relays", thus enabling quantum key distribution over arbitrary distances.
In the meantime there are quite a lot of successful implementations. For a detailed discussion we refer the reader to the review of Gisin et al. [97] and the references therein. Other experiments concern the use of entanglement in quantum communication. The creation and detection of entangled photons is here a fundamental building block. Nowadays this is no problem, and the most famous experiment in this context is the one of Aspect et al. [11], where the maximal violation of Bell inequalities was demonstrated with polarization correlated photons. Another spectacular experiment is the creation of entangled photons over a distance of 10 km using standard telecommunication optical fibers by the Geneva group [211]. Among the most exciting applications of entanglement are the realization of entanglement based quantum key distribution [130], the first successful "teleportation" of a photon [33, 31] and the implementation of "dense coding" [159]; cf. Section 4.1.

Part I

Fundamentals

Chapter 2

Basic concepts

After we have got a first, rough impression of the basic ideas and most relevant subjects of quantum information theory, let us start with a more detailed presentation. First we have to introduce the fundamental notions of the theory and their mathematical description. Fortunately, much of the material we have to present here, like Hilbert spaces, tensor products and density matrices, is already known from quantum mechanics, and we can focus our discussion on those concepts which are less familiar, like POV measures, completely positive maps and entangled states.

2.1 Systems, States and Effects

Like classical probability theory, quantum mechanics is a statistical theory. Hence its predictions are of a probabilistic nature and can only be tested if the same experiment is repeated very often and the relative frequencies of the outcomes are calculated. In more operational terms this means: the experiment has to be repeated according to the same procedure, as it could be set out in a detailed laboratory manual. If we consider a somewhat idealized model of such a statistical experiment, we get in fact two different types of procedures: first, preparation procedures, which prepare a certain kind of physical system in a distinguished state, and second, registration procedures, measuring a particular observable. A mathematical description of such a setup basically consists of two sets S and E and a map S × E ∋ (ρ, A) ↦ ρ(A) ∈ [0, 1]. The elements of S describe the states, i.e. preparations, while the A ∈ E represent all yes/no measurements (effects) which can be performed on the system. The probability (i.e. the relative frequency for a large number of repetitions) to get the result "yes", if we are measuring the effect A on a system prepared in the state ρ, is given by ρ(A). This is a very general scheme, applicable not only to quantum mechanics but also to a very broad class of statistical models, containing in particular classical probability. In order to make use of it we have, of course, to specify the precise structure of the sets S and E and the map ρ(A) for the types of systems we want to discuss.

2.1.1 Operator algebras

Throughout this paper we will encounter three different kinds of systems: quantum and classical systems, and hybrid systems which are half classical, half quantum (cf. Subsection 2.2.2).
In this subsection we will describe a general way to define states and effects which is applicable to all three cases and which therefore provides a handy way to discuss all three cases simultaneously (this will become most useful in Section 2.2 and 2.3). The scheme we are going to discuss is based on an algebra A of bounded operators acting on a Hilbert space H. More precisely A is a (closed) linear subspace of B(H), the algebra of bounded operates on H, which contains the identity (1I ∈ A) and is closed under products (A, B ∈ A ⇒ AB ∈ A) and adjoints (A ∈ A ⇒ A∗ ∈ A). For simplicity we will refer to each such A as an observable algebra. The key observation is now that each type of system we will study in the following can be completely characterized by its observable algebra A, i.e. once A is known there is a systematic way to derive the sets S and E and the map (ρ, A) 7→ ρ(A) from it. We frequently make use of this fact by referring to systems in terms of their observable algebra A, or even by identifying them with their algebra and saying that A is the system. 2.1. Systems, States and Effects 19 Although A and H can be infinite dimensional in general, we will consider only finite dimensional Hilbert spaces, as long as nothing else is explicitly stated. Since most research in quantum information is done up to now for finite dimensional systems this is not a too severe loss of generality. (We come back to this point in Chapter 13 where we will discuss the new aspects which arises from infinite dimensional observable algebras.) Hence we can choose H = Cd and B(H) is just the algebra of complex d × d matrices. Since A is a subalgebra of B(H) it operates naturally on H and it inherits from B(H) the operator norm kAk = supkψk=1 kAψk and the operator ordering A ≥ B ⇔ hψ, Aψi ≥ hψ, Bψi ∀ψ ∈ H. Now we can define: S(A) = {ρ ∈ A∗ | ρ ≥ 0, ρ(1I) = 1} (2.1) where A∗ denotes the dual space of A, i.e. the set of all linear functionals on A, and ρ ≥ 0 means ρ(A) ≥ 0 ∀A ≥ 0. Elements of S(A) describe the states of the system in question while effects are given by E(A) = {A ∈ A | A ≥ 0, A ≤ 1I}. (2.2) The probability to measure the effect A in the state ρ is ρ(A). More generally we can look at ρ(A) for an arbitrary A as the expectation value of A in the state ρ. Hence the idea behind Equation (2.1) is to define states in terms of their expectation value functionals. Both spaces are convex, i.e. ρ, σ ∈ S(A) and 0 ≤ λ ≤ 1 implies λρ + (1 − λ)σ ∈ S(A) and similarly for E(A). The extremal points of S(A) respectively E(A), i.e. those elements which do not admit a proper convex decomposition (x = λy+(1−λ)z ⇒ λ = 1 or λ = 0 or y = z = x), play a distinguished role: the extremal points of S(A) are pure states and those of E(A) are the propositions of the system in question. The latter represent those effects which register a property with certainty in contrast to non-extremal effects which admit some “fuzziness”. As a simple example for the latter consider a detector which registers particles not with certainty but only with a probability which is smaller than one. Finally let us note that the complete discussion of this section can be generalized easily to infinite dimensional systems, if we replace H = Cd by an infinite dimensional Hilbert space (e.g. H = L2 (R)). This would require however more material about C* algebras and measure theory than we want to use in this paper. 2.1.2 Quantum mechanics For quantum mechanics we have A = B(H), (2.3) where we have chosen again H = Cd . 
The corresponding systems are called d-level systems or qubits if d =£2 holds. notations we frequently write S(H) ¤ To avoid £ clumsy ¤ and E(H) instead of S B(H) and E B(H) . From Equation (2.2) we immediately see that an operator A ∈ B(H) is an effect iff it is positive and bounded from above by 1I. An element P ∈ E(H) is a propositions iff P is a projection operator (P 2 = P ). States are described in quantum mechanics usually by density matrices, i.e. positive and normalized trace class1 operators. To make contact to the general definition in Equation (2.1) note first that B(H) is a Hilbert space with the HilbertSchmidt scalar product hA, Bi = tr(A∗ B). Hence each linear functional ρ ∈ B(H)∗ 1 On a finite dimensional Hilbert space this attribute is of course redundant, since each operator is of trace class in this case. Nevertheless we will frequently use this terminology, due to greater consistency with the infinite dimensional case. 2. Basic concepts 20 can be expressed in terms of a (trace class) operator ρe by2 A 7→ ρ(A) = tr(e ρA). It is obvious that each ρe defines a unique functional ρ. If we start on the other hand with ρ we can recover the matrix elements of ρe from ρ by ρekj = tr(e ρ|jihk|) = ρ(|jihk|), where |jihk| denotes the canonical basis of B(H) (i.e. |jihk|ab = δja δkb ). More generally we get for ψ, φ ∈ H the relation hφ, ρeψi = ρ(|ψihφ|), where |ψihφ| now denotes the rank one operator which maps η ∈ H to hφ, ηiψ. In the following we drop the ∼ and use the same symbol for the operator and the functional whenever confusion can be avoided. Due to the same abuse of language we will interpret elements of B(H)∗ frequently as (trace class) operators instead of linear functionals (and write tr(ρA) instead of ρ(A)). However we do not identify B(H)∗ with B(H) in general, because the two different notations help to keep track of the distinction between spaces of states and spaces of observables. In addition we equip B ∗ (H) with the trace-norm kρk1 = tr |ρ| instead of the operator norm. Positivity of the functional ρ implies positivity of the operator ρ due to 0 ≤ ρ(|ψihψ|) = hψ, ρψi and the same holds for normalization: 1 = ρ(1I) = tr(ρ). Hence we can identify the state space from Equation (2.1) with the set of density matrices, as expected for quantum mechanics. Pure states of a quantum system are the one dimensional projectors. As usual we will frequently identify the density matrix |ψihψ| with the wave function ψ and call the latter in abuse of language a state. To get a useful parameterization of the state space consider again the HilbertSchmidt scalar product hρ, σi = tr(ρ∗ σ), but now on B ∗ (H). The space of trace free matrices in B ∗ (H) (alternatively the functionals with ρ(1I) = 0) is the corresponding orthocomplement 1I⊥ of the unit operator. If we choose a basis σ1 , . . . , σd2 −1 with hσj , σk i = 2δjk in 1I⊥ we can write each selfadjoint (trace class) operator ρ with tr(ρ) = 1 as 2 ρ= d −1 2 1I 1 1I 1 X + xj σj =: + ~x · ~σ , with ~x ∈ Rd −1 . d 2 j=1 d 2 (2.4) If d = 2 or d = 3 holds, it is most natural to choose the Pauli matrices respectively the Gell-Mann matrices (cf. e.g. Sect. 13.4 of [62]) for the σj . In the qubit case it is easy to see that ρ ≥ 0 holds iff |~x| ≤ 1. Hence the state space S(C2 ) coincides with the Bloch ball {~x ∈ R3 | |~x| ≤ 1}, and the set of pure states with its boundary, the Bloch sphere {~x ∈ R3 | |~x| = 1}. This shows in a very geometric way that the pure states are the extremal points of the convex set S(H). 
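As a concrete check of the Bloch ball picture, the following numpy fragment (an added illustration, not taken from the text; the sample Bloch vectors are arbitrary choices) builds ρ = (1I + ~x · ~σ)/2 for a few vectors ~x and verifies that positivity holds precisely for |~x| ≤ 1, while purity tr(ρ²) = 1 holds precisely on the Bloch sphere.

```python
import numpy as np

# Pauli matrices sigma_1, sigma_2, sigma_3.
sigma = np.array([[[0, 1], [1, 0]],
                  [[0, -1j], [1j, 0]],
                  [[1, 0], [0, -1]]], dtype=complex)

def qubit_state(x):
    """rho = (identity + x . sigma) / 2 for a Bloch vector x in R^3."""
    return 0.5 * (np.eye(2, dtype=complex) + np.einsum('i,ijk->jk', np.asarray(x), sigma))

for x in ([0, 0, 0.5], [0, 0, 1.0], [0.8, 0.8, 0.8]):
    rho = qubit_state(x)
    eigvals = np.linalg.eigvalsh(rho)          # eigenvalues (1 +/- |x|)/2
    purity = np.real(np.trace(rho @ rho))      # equals (1 + |x|^2)/2
    print(round(np.linalg.norm(x), 3), eigvals.round(3), round(purity, 3))

# |x| <= 1 gives nonnegative eigenvalues (a valid state), |x| = 1 gives purity 1,
# and |x| > 1 (last example) produces a negative eigenvalue, i.e. no state at all.
```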
If ρ is more generally a pure state of a d-level system we get 1 = tr(ρ2 ) = p 1 1 2 + |~x| ⇒ |~x| = 2 (1 − 1/d). d 2 (2.5) This implies that all states are contained in the ball with radius 21/2 (1 − 1/d)1/2 , however not all operators in this set are positive. A simple example is d −1 1I±21/2 (1− 1/d)1/2 σj , which is positive only if d = 2 holds. 2.1.3 Classical probability Since the difference between classical and quantum systems is an important issue in this work let us reformulate classical probability theory according to the general scheme from Subsection 2.1.1. The restriction to finite dimensional observable algebras leads now to the assumption that all systems we are considering admit a finite 2 If we consider infinite dimensional systems this is not true. In this case the dual space of the observable algebra is much larger and Equation (2.1) leads to states which are not necessarily given by trace class operators. Such “singular states” play an important role in theories which admit an infinite number of degrees of freedom like quantum statistics and quantum field theory; cf. [35]. This point will be essential in the discussion of infinitely entangled states; cf. Chapter 13. 2.1. Systems, States and Effects 21 set X of elementary events. Typical examples are: throwing a dice X = {1, . . . , 6}, tossing a coin X = {“head”, “number”} or classical bits X = {0, 1}. To simplify the notations we write (as in quantum mechanics) S(X) and E(X) for the spaces of states and effects. The observable algebra A of such a system is the space A = C(X) = {f : X → C} (2.6) of complex valued functions on X. To interpret this as an operator algebra acting on a Hilbert space H (as indicated in Subsection 2.1.1) choose an arbitrary but fixed orthonormal P basis |xi, x ∈ X in H and identify the function f ∈ C(X) with the operator f = x fx |xihx| ∈ B(H) (we use the same symbol for the function and the operator, provided confusion can be avoided). Most frequently we have X = {1, . . . , d} and we can choose H = Cd and the canonical basis for |xi. Hence C(X) becomes the algebra of diagonal d × d matrices. Using Equation (2.2) we immediately see that f ∈ C(X) is an effect iff 0 ≤ fx ≤ 1, ∀x ∈ X. Physically we can interpret fx as the probability that the effect f registers the elementary event x. This makes the distinction between propositions and “fuzzy” effects very transparent: P ∈ E(X) is a proposition iff we have either Px = 1 or Px = 0 for all x ∈ X. Hence the propositions P ∈ C(X) are in one to one correspondence with the subsets ωP = {x ∈ X | Px = 1} ⊂ X which in turn describe the events of the system. Hence P registers the event ωP with certainty, while a fuzzy effect f < P does this only with a probability less then one. Since C(X) is finite dimensional and admits the distinguished basis |xihx|, x ∈ X it is naturally isomorphic to its dual C ∗ (X). More precisely: each linear functional ρ ∈ C ∗ (X) defines Pand is uniquely defined by the function x 7→ ρx = ρ(|xihx|) and we have ρ(f ) = x fx ρx . As in the quantum case we will identify the function ρ with the linear functional and use the same symbol for both, although we keep the notation C ∗ (X) to indicate that we are talking about states rather than observables. Positivity of ρP ∈ C ∗ (X) is given P by ρx ≥ 0 for all x and normalization leads to 1 = ρ(1I) = ρ ( x |xihx|) = x ρx . 
Hence to be a state ρ ∈ C ∗ (X) must be a probability distribution on X and ρx is the probability that the elementary event x occurs during statistical experiments with systems in the state ρ. More generally P ρ(f ) = j ρj fj is the probability to measure the effect f on systems in the state ρ. If P is in particular a proposition, ρ(P ) gives the probability for the event ωP . The pure states of the system are the Dirac measures δx , x ∈ X; with δx (|yihy|) = δxy . Hence each ρ ∈ S(X) can be decomposed in a unique way into a convex linear combination of pure states. 2.1.4 Observables Up to now we have discussed only effects, i.e. yes/no experiments. In this subsection we will have a first short look at more general observables. We will come back to this topic in Section 3.2.4 after we have introduced channels. We can think of an observable E taking its values in a finite set X as a map which associates to each possible outcome x ∈ X the effect Ex ∈ E(A) (if A is the observable algebra of the system in question) which is true if x is measured and false otherwise. If the measurement is performed on systems in the state ρ we get for each x ∈ X the probability px = ρ(Ex ) to measure x. Hence the family of the px should be a probability distribution on X, and this implies that E should be a POV measure on X. Definition 2.1.1 Consider an observable algebra A ⊂ B(H) and a finite 3 set X. A family E = (Ex )x∈X of effects in A (i.e. 0 ≤ Ex ≤ 1I) is called a positive 3 This is if course an artifical restriction and in many situations not justified (cf. in particular the discussion of quantum state estimation in Section 4.2 and Chapter 8). However, it helps us to avoid measure theoretical subtleties; cf. Holevo’s book [111] for a more general discussion. 2. Basic concepts 22 P operator valued measure (POV measure) on X if x∈X Ex = 1I holds. If all Ex are projections, E is called projection valued measure (PV measure). From basic quantum mechanics we know that observables are described by self adjoint operators on a Hilbert space H. But, how does this point of view fit into the previous definition? The answer is given by the spectral theorem (Thm. VIII.6 [186]): Each selfadjoint operator A on a finite dimensional Hilbert space H has P the form A = λ∈σ(A) λPλ where σ(A) denotes the spectrum of A, i.e. the set of eigenvalues and Pλ denotes the projection onto the corresponding eigenspace. Hence there is a unique PV measure P = (Pλ )λ∈σ(A) associated to A which is called the spectral measure of A. It is uniquely characterized by the property that the expectaP tion value λ λρ(Pλ ) of P in the state ρ is given for any state ρ by ρ(A) = tr(ρA); as it is well known from quantum mechanics. Hence the traditional way to define observables within quantum mechanics perfectly fits into the scheme just outlined, however it only covers the projection valued case and therefore admits no fuzziness. For this reason POV measures are sometimes called generalized observables. Finally note that the eigenprojections Pλ of A are elements of an observable algebra A iff A ∈ A. This shows two things: First of all we can consider selfadjoint elements of any *-subalgebra A of B(H) as observables of A-systems, and this is precisely the reason why we have called A observable algebra. Secondly we see why it is essential that A is really a subalgebra of B(H): if it is only a linear subspace of B(H) the relation A ∈ A does not imply Pλ ∈ A. 
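To illustrate Definition 2.1.1 and the spectral measure just described, here is a small numpy sketch (added for illustration; the operator A and the state ρ are arbitrary choices). It computes the eigenprojections Pλ of a selfadjoint matrix, checks that they form a PV measure, and recovers the expectation value tr(ρA) from the outcome probabilities tr(ρPλ).

```python
import numpy as np

# A selfadjoint operator A on C^2 and a density matrix rho (arbitrary examples).
A = np.array([[1.0, 1.0], [1.0, -1.0]])            # eigenvalues +/- sqrt(2)
rho = np.array([[0.75, 0.25], [0.25, 0.25]])       # positive, trace one

# Spectral decomposition: A = sum_lambda lambda * P_lambda.
eigvals, eigvecs = np.linalg.eigh(A)
projectors = [np.outer(v, v.conj()) for v in eigvecs.T]

# The eigenprojections form a PV measure: they sum to the identity.
assert np.allclose(sum(projectors), np.eye(2))

# Outcome probabilities tr(rho P_lambda) and the expectation value of A.
probs = [np.real(np.trace(rho @ P)) for P in projectors]
expectation_from_measure = sum(l * p for l, p in zip(eigvals, probs))
assert np.isclose(expectation_from_measure, np.real(np.trace(rho @ A)))
print(dict(zip(np.round(eigvals, 3), np.round(probs, 3))))
```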
2.2 Composite systems and entangled states Composite systems occur in many places in quantum information theory. A typical example is a register of a quantum computer, which can be regarded as a system consisting of N qubits (if N is the length of the register). The crucial point is that this opens the possibility for correlations and entanglement between subsystems. In particular entanglement is of great importance, because it is a central resource in many applications of quantum information theory like entanglement enhanced teleportation or quantum computing – we already discussed this in Section 1.2 of the introduction. To explain entanglement in greater detail and to introduce some necessary formalism we have to complement the scheme developed in the last section by a procedure which allows us to construct states and observables of the composite system from its subsystems. In quantum mechanics this is done of course in terms of tensor products, and we will review in the following some of the most relevant material. 2.2.1 Tensor products Consider two (finite dimensional) Hilbert spaces H and K. To each pair of vectors ψ1 ∈ H, ψ2 ∈ K we can associate a bilinear form ψ1 ⊗ ψ2 called the tensor product of ψ1 and ψ2 by ψ1 ⊗ ψ2 (φ1 , φ2 ) = hψ1 , φ1 ihψ2 , φ2 i. For two product vectors ψ1 ⊗ ψ2 and η1 ⊗ η2 their scalar product is defined by hψ1 ⊗ ψ2 , η1 ⊗ η2 i = hψ1 , η1 ihψ2 , η2 i and it can be shown that this definition extends in a unique way to the span of all ψ1 ⊗ ψ2 which therefore defines the tensor product H ⊗ K. If we have more than two Hilbert spaces Hj , j = 1, . . . , N their tensor product H1 ⊗ · · · ⊗ HN can be defined similarly. The tensor product A1 ⊗ A2 of two bounded operators A1 ∈ B(H), A2 ∈ B(K) is defined first for product vectors ψ1 ⊗ ψ2 ∈ H ⊗ K by A1 ⊗ A2 (ψ1 ⊗ ψ2 ) = (A1 ψ1 ) ⊗ (A2 ψ2 ) and then extended by linearity. The space B(H ⊗ K) coincides with the span of all A1 ⊗ A2 . If ρ ∈ B(H ⊗ K) is not of product form (and of trace class for infinite dimensional H and K) there is nevertheless a way to define “restrictions” to H respectively K called the partial trace of ρ. It is defined by the equation tr[trK (ρ)A] = tr(ρA ⊗ 1I) ∀A ∈ B(H) (2.7) 2.2. Composite systems and entangled states 23 where the trace on the left hand side is over H and on the right hand side over H ⊗ K. If two orthonormal bases φ1 , . . . , φn and ψ1 , . . . , ψm are given in H respectively K we can consider the product basis P φ1 ⊗ ψ1 , . . . , φn ⊗ ψm in H ⊗ K, and we can expand each Ψ ∈ H ⊗ K as Ψ = jk Ψjk φj ⊗ ψk with Ψjk = hφj ⊗ ψk , Ψi. This procedure works for an arbitrary number of tensor factors. However, if we have exactly a twofold tensor product, there is a more economic way to expand Ψ, called Schmidt decomposition in which only diagonal terms of the form φj ⊗ ψj appear. Proposition 2.2.1 For each element Ψ of the twofold tensor product H ⊗ K there are orthonormal systems φj , j = 1, . . . , n and ψk , k = 1, . . . , n (not necessarily bases, i.e.P n can √ be smaller than dim H and dim K) of H and K respectively such that Ψ = j λj φj ⊗ ψj holds. The φj and ψj are uniquely determined by Ψ. The √ expansion is called Schmidt decomposition and the numbers λj are the Schmidt coefficients. Proof. Consider the partial trace ρ1 = trK (|ΨihΨ|) of the one dimensional projector |ΨihΨ| associated to Ψ. ItPcan be decomposed in terms of its eigenvectors φ n and we get trK (|ΨihΨ|) = ρ1 = n λn |φn ihφn |. Now we can choose an orthonormal basis ψk0 , k = 1, . . . 
, m in K and expand Ψ with respect to the product basis φ_j ⊗ ψ'_k. Carrying out the k summation we get a family of vectors ψ''_j = Σ_k ⟨φ_j ⊗ ψ'_k, Ψ⟩ ψ'_k with the property Ψ = Σ_j φ_j ⊗ ψ''_j. Now we can calculate the partial trace and get for any A ∈ B(H):

Σ_j λ_j ⟨φ_j, Aφ_j⟩ = tr(ρ_1 A) = ⟨Ψ, (A ⊗ 1I)Ψ⟩ = Σ_{j,k} ⟨φ_j, Aφ_k⟩ ⟨ψ''_j, ψ''_k⟩.   (2.8)

Since A is arbitrary we can compare the left and right hand side of this equation term by term and get ⟨ψ''_j, ψ''_k⟩ = δ_jk λ_j. Hence ψ_j = λ_j^{-1/2} ψ''_j is the desired orthonormal system. □

As an immediate application of this result we can show that each mixed state ρ ∈ B∗(H) (of the quantum system B(H)) can be regarded as a pure state on a larger Hilbert space H ⊗ H'. We just have to consider the eigenvalue expansion ρ = Σ_j λ_j |φ_j⟩⟨φ_j| of ρ and to choose an arbitrary orthonormal system ψ_j, j = 1, . . . , n in H'. Using Proposition 2.2.1 we get

Corollary 2.2.2 Each state ρ ∈ B∗(H) can be extended to a pure state Ψ on a larger system with Hilbert space H ⊗ H' such that tr_{H'} |Ψ⟩⟨Ψ| = ρ holds.

2.2.2 Compound and hybrid systems

To discuss the composition of two arbitrary (i.e. classical or quantum) systems it is very convenient to use the scheme developed in Subsection 2.1.1 and to talk about the two subsystems in terms of their observable algebras A ⊂ B(H) and B ⊂ B(K). The observable algebra of the composite system is then simply given by the tensor product of A and B, i.e.

A ⊗ B := span{A ⊗ B | A ∈ A, B ∈ B} ⊂ B(H ⊗ K).   (2.9)

The dual of A ⊗ B is generated by product states, (ρ ⊗ σ)(A ⊗ B) = ρ(A)σ(B), and we therefore write A∗ ⊗ B∗ for (A ⊗ B)∗. The interpretation of the composite system A ⊗ B in terms of states and effects is straightforward and therefore postponed to the next subsection. We will first consider the special cases arising from different choices for A and B. If both systems are quantum (A = B(H) and B = B(K)) we get

B(H) ⊗ B(K) = B(H ⊗ K)   (2.10)

as expected. For two classical systems A = C(X) and B = C(Y) recall that elements of C(X) (respectively C(Y)) are complex valued functions on X (on Y). Hence the tensor product C(X) ⊗ C(Y) consists of complex valued functions on X × Y, i.e. C(X) ⊗ C(Y) = C(X × Y). In other words, states and observables of the composite system C(X) ⊗ C(Y) are, in accordance with classical probability theory, given by probability distributions and random variables on the Cartesian product X × Y.

If only one subsystem is classical and the other is quantum (e.g. a micro particle interacting with a classical measuring device) we have a hybrid system. The elements of its observable algebra C(X) ⊗ B(H) can be regarded as operator valued functions on X, i.e. X ∋ x ↦ A_x ∈ B(H), and A is an effect iff 0 ≤ A_x ≤ 1I holds for all x ∈ X. The elements of the dual C∗(X) ⊗ B∗(H) are in a similar way B∗(H) valued functions X ∋ x ↦ ρ_x ∈ B∗(H), and ρ is a state iff each ρ_x is a positive trace class operator on H and Σ_x tr(ρ_x) = 1. The probability to measure the effect A in the state ρ is Σ_x ρ_x(A_x).

2.2.3 Correlations and entanglement

Let us now consider two effects A ∈ A and B ∈ B; then A ⊗ B is an effect of the composite system A ⊗ B. It is interpreted as the joint measurement of A on the first and B on the second subsystem, where the "yes" outcome means "both effects give yes". In particular A ⊗ 1I means to measure A on the first subsystem and to ignore the second one completely. If ρ is a state of A ⊗ B we can define its restrictions by ρ_A(A) = ρ(A ⊗ 1I) and ρ_B(B) = ρ(1I ⊗ B).
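A short numerical sketch of these constructions may be useful (again Python/numpy; the randomly generated state is purely illustrative): the partial trace of Equation (2.7) is obtained by reshaping the density matrix into a four index tensor and contracting the K indices, and the Schmidt coefficients of Proposition 2.2.1 are recovered as the singular values of the coefficient matrix Ψ_jk, in agreement with the eigenvalues of the restriction.

import numpy as np

dH, dK = 2, 3
rng = np.random.default_rng(0)

# A random pure state Psi in H (x) K, given by its coefficient matrix Psi_jk.
M = rng.normal(size=(dH, dK)) + 1j * rng.normal(size=(dH, dK))
M /= np.linalg.norm(M)
Psi = M.reshape(dH * dK)                     # vector in the product basis phi_j (x) psi_k
rho = np.outer(Psi, Psi.conj())              # the state |Psi><Psi| on H (x) K

# Partial trace over K: reshape to indices (j, k, j', k') and contract k = k'.
def ptrace_K(rho, dH, dK):
    return np.trace(rho.reshape(dH, dK, dH, dK), axis1=1, axis2=3)

rho_H = ptrace_K(rho, dH, dK)

# Defining relation (2.7): tr[tr_K(rho) A] = tr[rho (A (x) 1I)] for all A on H.
A = rng.normal(size=(dH, dH)) + 1j * rng.normal(size=(dH, dH))
assert np.isclose(np.trace(rho_H @ A), np.trace(rho @ np.kron(A, np.eye(dK))))

# Schmidt coefficients: singular values of the coefficient matrix Psi_jk;
# their squares are the eigenvalues of the restriction rho_H.
schmidt = np.linalg.svd(M, compute_uv=False)
assert np.allclose(np.sort(schmidt**2), np.sort(np.linalg.eigvalsh(rho_H)))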
If both systems are quantum the restrictions of ρ are the partial traces, while in the classical case we have to sum over the B, respectively A, variables. For two states ρ_1 ∈ S(A) and ρ_2 ∈ S(B) there is always a state ρ of A ⊗ B such that ρ_1 = ρ_A and ρ_2 = ρ_B holds: we just have to choose the product state ρ_1 ⊗ ρ_2. However, in general we have ρ ≠ ρ_A ⊗ ρ_B, which means nothing else than that ρ also contains correlations between the two subsystems.

Definition 2.2.3 A state ρ of a bipartite system A ⊗ B is called correlated if there are some A ∈ A, B ∈ B such that ρ(A ⊗ B) ≠ ρ_A(A)ρ_B(B) holds.

We immediately see that ρ = ρ_1 ⊗ ρ_2 implies ρ(A ⊗ B) = ρ_1(A)ρ_2(B) = ρ_A(A)ρ_B(B), hence ρ is not correlated. If on the other hand ρ(A ⊗ B) = ρ_A(A)ρ_B(B) holds for all A ∈ A and B ∈ B, we get ρ = ρ_A ⊗ ρ_B. Hence the definition of correlations just given perfectly fits our intuitive considerations. An important issue in quantum information theory is the comparison of correlations between quantum systems on the one hand and classical systems on the other. Hence let us have a closer look at the state space of a system consisting of at least one classical subsystem.

Proposition 2.2.4 Each state ρ of a composite system A ⊗ B consisting of a classical (A = C(X)) and an arbitrary system (B) has the form

ρ = Σ_{j∈X} λ_j ρ_j^A ⊗ ρ_j^B   (2.11)

with positive weights λ_j > 0 and ρ_j^A ∈ S(A), ρ_j^B ∈ S(B).

Proof. Since A = C(X) is classical, there is a basis |j⟩⟨j| ∈ A, j ∈ X, of mutually orthogonal one-dimensional projectors, and we can write each A ∈ A as Σ_j a_j |j⟩⟨j| (cf. Subsection 2.1.3). For each state ρ ∈ S(A ⊗ B) we can now define ρ_j^A ∈ S(A) by ρ_j^A(A) = tr(A|j⟩⟨j|) = a_j and ρ_j^B ∈ S(B) by ρ_j^B(B) = λ_j^{-1} ρ(|j⟩⟨j| ⊗ B), with λ_j = ρ(|j⟩⟨j| ⊗ 1I). Hence we get ρ = Σ_{j∈X} λ_j ρ_j^A ⊗ ρ_j^B with positive λ_j as stated. □

If A and B are two quantum systems it is still possible for them to be correlated in the way just described. We can simply prepare them with a classical random generator which triggers two preparation devices to produce systems in the states ρ_j^A, ρ_j^B with probability λ_j. The overall state produced by this setup is obviously the ρ from Equation (2.11). However, the crucial point is that not all correlations of quantum systems are of this type! This is an immediate consequence of the definition of pure states ρ = |Ψ⟩⟨Ψ|: since there is no proper convex decomposition of ρ, it can be written as in Proposition 2.2.4 iff Ψ is a product vector, i.e. Ψ = φ ⊗ ψ. This observation motivates the following definition.

Definition 2.2.5 A state ρ of the composite system B(H_1) ⊗ B(H_2) is called separable or classically correlated if it can be written as

ρ = Σ_j λ_j ρ_j^(1) ⊗ ρ_j^(2)   (2.12)

with states ρ_j^(k) of B(H_k) and weights λ_j > 0. Otherwise ρ is called entangled. The set of all separable states is denoted by D(H_1 ⊗ H_2), or just D if H_1 and H_2 are understood.

2.2.4 Bell inequalities

We have just seen that it is quite easy for pure states to check whether they are entangled or not. In the mixed case, however, this is a much harder, and in general unsolved, problem. In this subsection we will have a short look at Bell inequalities, which are maybe the oldest criterion for entanglement (for a more detailed review see [233]). Today more powerful methods, most of them based on positivity properties, are available. We will postpone the corresponding discussion to the end of the following section, after we have studied (completely) positive maps (cf. Section 2.4).
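Before we turn to Bell inequalities, here is a minimal numerical illustration (Python/numpy; the particular preparations and the effect P0 are arbitrary choices, not taken from the text) of the preparation scheme behind Proposition 2.2.4 and Definition 2.2.5: a classical random generator with distribution λ_j triggers two preparation devices, and the resulting state is separable by construction but nevertheless correlated in the sense of Definition 2.2.3.

import numpy as np

def dm(psi):
    """Density matrix of a (normalized) state vector."""
    psi = np.asarray(psi, dtype=complex)
    return np.outer(psi, psi.conj()) / np.vdot(psi, psi).real

# Classical random generator: with probability lam_j prepare rho1_j (x) rho2_j.
lam = [0.5, 0.5]
rho1 = [dm([1, 0]), dm([0, 1])]                  # preparations on the first system
rho2 = [dm([1, 0]), dm([0, 1])]                  # preparations on the second system
rho = sum(l * np.kron(r1, r2) for l, r1, r2 in zip(lam, rho1, rho2))

# The state is separable by construction, but it is correlated:
P0 = dm([1, 0])                                  # an effect on a single qubit
rho_A = np.trace(rho.reshape(2, 2, 2, 2), axis1=1, axis2=3)   # restriction to system 1
rho_B = np.trace(rho.reshape(2, 2, 2, 2), axis1=0, axis2=2)   # restriction to system 2
joint = np.trace(rho @ np.kron(P0, P0)).real
product = np.trace(rho_A @ P0).real * np.trace(rho_B @ P0).real
print(joint, product)   # 0.5 vs 0.25: correlated, yet of the separable form (2.12)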
Bell inequalities are traditionally discussed in the framework of “local hidden variable theories”. More precisely we will say that a state ρ of a bipartite system B(H ⊗ K) admits a hidden variable model, if there is a probability space (X, µ) and (measurable) response functions X 3 x 7→ FA (x, k), FB (x, l) ∈ R for all discrete PV measures A = A1 , . . . , AN ∈ B(H) respectively B = B1 , . . . , BM ∈ B(K) such that Z FA (x, k)FB (x, l)µ(dx) = tr(ρAk ⊗ Bl ) (2.13) X holds for all, k, l and A, B. The value of the functions FA (x, k) is interpreted as the probability to get the value k during an A measurement with known “hidden parameter” x. The set of states admitting a hidden variable model is a convex set and as such it can be described by an (infinite) hierarchy of correlation inequalities. Any one of these inequalities is usually called (generalized) Bell inequality. The most well known one is those given by Clauser, Horne, Shimony and Holt [57]: The state ρ satisfies the CHSH-inequality if ¡ ¢ ρ A ⊗ (B + B 0 ) + A0 ⊗ (B − B 0 ) ≤ 2 (2.14) holds for all A, A0 ∈ B(H) respectively B, B 0 ∈ B(K), with −1I ≤ A, A0 ≤ 1I and −1I ≤ B, B 0 ≤ 1I. For the special case of two dichotomic observables the CHSH inequalities are sufficient to characterize the states with a hidden variable model. In the general case the CHSH-inequalities are a necessary but not a sufficient condition and a complete characterization is not known. Pn (1) (2) It is now easy to see that each separable state ρ = ⊗ ρj adj=1 λj ρj mits a hidden variable model: we have to choose X = 1, . . . , n, µ({j}) = λ j , (1) FA (x, k) = ρx (Ak ) and FB analogously. Hence we immediately see that each state of a composite system with at least one classical subsystem satisfies the Bell inequalities (in particular the CHSH version) while this is not the case for pure 2. Basic concepts 26 quantum systems. The most prominent examples are “maximally entangled states” (cf. Subsection 3.1.1) which violate the CHSH inequality (for appropriately chosen √ A, A0 , B, B 0 ) with a maximal value of 2 2. This observation is the starting point for many discussions concerning the interpretation of quantum mechanics, in par√ ticular because the maximal violation of 2 2 was observed in 1982 experimentally by Aspect and coworkers [11]. We do not want to follow this path (see [233] and the the references therein instead). Interesting for us is the fact that Bell inequalities, in particular the CHSH case in Equation (2.14), provide a necessary condition for a state ρ to be separable. However there exist entangled states admitting a hidden variable model [229]. Hence, Bell inequalities are not sufficient for separability. 2.3 Channels Assume now that we have a number of quantum systems, e.g. a string of ions in a trap. To “process” the quantum information they carry we have to perform in general many steps of a quite different nature. Typical examples are: free time evolution, controlled time evolution (e.g. the application of a “quantum gate” in a quantum computer), preparations and measurements. The purpose of this section is to provide a unified framework for the description of all these different operations. The basic idea is to represent each processing step by a “channel”, which converts input systems, described by an observable algebra A into output systems described by a possibly different algebra B. Henceforth we will call A the input and B the output algebra. If we consider e.g. 
the free time evolution, we need quantum systems of the same type on the input and the output side, hence in this case we have A = B = B(H) with an appropriately chosen Hilbert space H. If on the other hand we want to describe a measurement we have to map quantum systems (the measured system) to classical information (the measuring result). Therefore we need in this example A = B(H) for the input and B = C(X) for the output algebra, where X is the set of possible outcomes of the measurement (cf. Subsection 2.1.4). Our aim is now to get a mathematical object which can be used to describe a channel. To this end consider an effect A ∈ B of the output system. If we invoke first a channel which transforms A systems into B systems, and measure A afterwards on the output systems, we end up with a measurement of an effect T (A) on the input systems. Hence we get a map T : E(B) → E(A) which completely describes the channel 4 . Alternatively we can look at the states and interpret a channel as a map T ∗ : S(A) → S(B) which transforms A systems in the state ρ ∈ S(A) into B systems in the state T ∗ (ρ). To distinguish between both maps we can say that T describes the channel in the Heisenberg picture and T ∗ in the Schrödinger picture. On the level of the statistical interpretation both points of view should coincide of course, i.e. the probabilities5 (T ∗ ρ)(A) and ρ(T A) to get the result “yes” during an A measurement on B systems in the state T ∗ ρ, respectively a T A measurement on A systems in the state ρ, should be the same. Since (T ∗ ρ)(A) is linear in A we see immediately that T must be an affine map, i.e. T (λ1 A1 + λ2 A2 ) = λ1 T (A1 ) + λ2 T (A2 ) for each convex linear combination λ1 A1 + λ2 A2 of effects in B, and this in turn implies that T can be extended naturally to a linear map, which we will identify in the following with the channel itself, i.e. we say that T is the channel. 2.3.1 Completely positive maps Let us change now slightly our point of view and start with a linear operator T : A → B. To be a channel, T must map effects to effects, i.e. T has to be positive: 4 Note that the direction of the mapping arrow is reversed compared to the natural ordering of processing. 5 To keep notations more readable we will follow frequently the usual convention to drop the parenthesis around arguments of linear operators. Hence we will write T A and T ∗ ρ instead of T (A) and T ∗ (ρ). Similarly we will simply write T S instead of T ◦ S for compositions. 2.3. Channels 27 T (A) ≥ 0 ∀A ≥ 0 and bounded from above by 1I, i.e. T (1I) ≤ 1I. In addition it is natural to require that two channels in parallel are again a channel. More precisely, if two channels T : A1 → B1 and S : A2 → B2 are given we can consider the map T ⊗S which associates to each A⊗B ∈ A1 ⊗A2 the tensor product T (A)⊗S(B) ∈ B1 ⊗B2 . It is natural to assume that T ⊗ S is a channel which converts composite systems of type A1 ⊗ A2 into B1 ⊗ B2 systems. Hence S ⊗ T should be positive as well [178]. Definition 2.3.1 Consider two observable algebras A, B and a linear map T : A → B ⊂ B(H). 1. T is called positive if T (A) ≥ 0 holds for all positive A ∈ A. 2. T is called completely positive (cp) if T ⊗ Id : A ⊗ B(Cn ) → B(H) ⊗ B(Cn ) is positive for all n ∈ N. Here Id denotes the identity map on B(Cn ). 3. T is called unital if T (1I) = 1I holds. Consider now the map T ∗ : B ∗ → A∗ which is dual to T , i.e. T ∗ ρ(A) = ρ(T A) for all ρ ∈ B ∗ and A ∈ A. 
It is called the Schrödinger picture representation of the channel T , since it maps states to states provided T is unital. (Complete) positivity can be defined in the Schrödinger picture as in the Heisenberg picture and we immediately see that T is (completely) positive iff T ∗ is. It is natural to ask whether the distinction between positivity and complete positivity is really necessary, i.e. whether there are positive maps which are not completely positive. If at least one of the algebras A or B is classical the answer is no: each positive map is completely positive in this case. If both algebras are quantum however complete positivity is not implied by positivity alone. We will discuss explicit examples in Subsection 2.4.2. If item 2 holds only for a fixed n ∈ N the map T is called n-positive. This is obviously a weaker condition then complete positivity. However, n-positivity implies m-positivity for all m ≤ n, and for A = B(Cd ) complete positivity is implied by n-positivity, provided n ≥ d holds. Let us consider now the question whether a channel should be unital or not. We have already mentioned that T (1I) ≤ 1I must hold since effects should be mapped to effects. If T (1I) is not equal to 1I we get ρ(T 1I) = T ∗ ρ(1I) < 1 for the probability to measure the effect 1I on systems in the state T ∗ ρ, but this is impossible for channels which produce an output with certainty, because 1I is the effect which is always true. In other words: If a cp map is not unital it describes a channel which sometimes produces no output at all and T (1I) is the effect which measures whether we have got an output. We will assume in the future that channels are unital if nothing else is explicitly stated. 2.3.2 The Stinespring theorem Consider now channels between quantum systems, i.e. A = B(H1 ) and B = B(H2 ). A fairly simple example (not necessarily unital) is given in terms of an operator V : H1 → H2 by B(H1 ) 3 A 7→ V AV ∗ ∈ B(H2 ). A second example is the restriction to a subsystem, which is given in the Heisenberg picture by B(H) 3 A 7→ A ⊗ 1I K ∈ B(H ⊗ K). Finally the composition S ◦ T = ST of two channels is again a channel. The following theorem, which is the most fundamental structural result about cp maps6 , says that each channel can be represented as a composition of these two examples [202]. 6 Basically there is a more general version of this theorem which works with arbitrary output algebras. It needs however some material from representation theory of C*-algebras which we want to avoid here. See e.g. [178, 115]. 2. Basic concepts 28 Theorem 2.3.2 (Stinespring dilation theorem) Every map T : B(H1 ) → B(H2 ) has the form T (A) = V ∗ (A ⊗ 1IK )V, completely positive (2.15) with an additional Hilbert space K and an operator V : H2 → H1 ⊗ K. Both (i.e. K and V ) can be chosen such that the span of all (A ⊗ 1I)V φ with A ∈ B(H1 ) and φ ∈ H2 is dense in H1 ⊗ K. This particular decomposition is unique (up to unitary equivalence) and called the minimal decomposition. If dim H1 = d1 and dim H2 = d2 the minimal K satisfies dim K ≤ d21 d2 . P By introducing a family |χj ihχj | of one dimensional projectors with j |χj ihχj | = 1I we can define the “Kraus operators” hψ, Vj φi = hψ ⊗ χj , V φi. In terms of them we can rewrite Equation (2.15) in the following form [146]: Corollary 2.3.3 (Kraus form) Every completely positive map T : B(H1 ) → B(H2 ) can be written in the form T (A) = N X Vj∗ AVj (2.16) j=1 with operators Vj : H2 → H1 and N ≤ dim(H1 ) dim(H2 ). 
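The following sketch illustrates the relation between the Kraus form and the Stinespring dilation numerically (Python/numpy; the amplitude damping Kraus operators V1, V2 and the parameter g are a standard textbook example chosen purely for illustration and do not appear in the text). The isometry V is built from the Kraus operators exactly as in the construction above, Vφ = Σ_j (V_jφ) ⊗ χ_j.

import numpy as np

# Illustrative Kraus operators (amplitude damping on a qubit; an assumption of
# this sketch, not an example from the text).
g = 0.3
V1 = np.array([[1, 0], [0, np.sqrt(1 - g)]], dtype=complex)
V2 = np.array([[0, np.sqrt(g)], [0, 0]], dtype=complex)
Vs = [V1, V2]

# Unitality of T in the Heisenberg picture: sum_j V_j^* V_j = 1I.
assert np.allclose(sum(Vj.conj().T @ Vj for Vj in Vs), np.eye(2))

# Stinespring operator V: H -> H (x) K built from the Kraus operators,
# V phi = sum_j (V_j phi) (x) chi_j with an orthonormal basis chi_j of K.
K = len(Vs)
V = sum(np.kron(Vj, np.eye(K)[:, [j]]) for j, Vj in enumerate(Vs))   # shape (2*K, 2)
assert np.allclose(V.conj().T @ V, np.eye(2))                        # V is an isometry

# Heisenberg picture: T(A) = V^*(A (x) 1I)V = sum_j V_j^* A V_j.
A = np.array([[0, 1], [1, 0]], dtype=complex)
T_A = V.conj().T @ np.kron(A, np.eye(K)) @ V
assert np.allclose(T_A, sum(Vj.conj().T @ A @ Vj for Vj in Vs))

# Schroedinger picture: T*(rho) = tr_K(V rho V^*) = sum_j V_j rho V_j^*.
rho = np.array([[0.25, 0], [0, 0.75]], dtype=complex)
out = (V @ rho @ V.conj().T).reshape(2, K, 2, K)
T_rho = np.trace(out, axis1=1, axis2=3)
assert np.allclose(T_rho, sum(Vj @ rho @ Vj.conj().T for Vj in Vs))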
Finally let us state a third result which is closely related to the Stinespring theorem. It characterizes all decompositions of a given completely positive map into completely positive summands. By analogy with results for states on abelian algebras (i.e. probability measures) we will call it a Radon-Nikodym theorem; see [9] for a proof. Theorem 2.3.4 (Radon-Nikodym theorem) Let Tx : B(H1 ) → B(H2 ), x ∈ X be a family of completely positive maps and let V : H2 → H1 ⊗ K be the Stinespring P operator of P T̄ = x Tx , then there are uniquely determined positive operators Fx in B(K) with x Fx = 1I and Tx (A) = V ∗ (A ⊗ Fx )V. (2.17) 2.3.3 The duality lemma We will consider a fundamental relation between positive maps and bipartite systems, which will allow us later on to translate properties of entangled states to properties of channels and vice versa. The basic idea originates from elementary linear algebra: A bilinear form φ on a d-dimensional vector space V can be represented by a d × d-matrix, just as an operator on V . Hence, we can transform φ into an operator simply by reinterpreting the matrix elements. In our situation things are more difficult, because the positivity constraints for states and channels should match up in the right way. Nevertheless we have the following theorem. Theorem 2.3.5 Let ρ be a density operator on H ⊗ H1 . Then there is a Hilbert space K a pure state σ on H ⊗ K and a channel T : B(H1 ) → B(K) with ρ = (Id ⊗T ∗ ) σ, (2.18) where Id denotes the identity map on B ∗ (H). The pure state σ can be chosen such that trH (σ) has no zero eigenvalue. In this case T and σ are uniquely ³ determined ´ e (up to unitary equivalence) by Equation (2.18); i.e. if σ e, T with ρ = Id ⊗Te∗ σ e are ∗ ∗ e given, we have σ e = (1I ⊗ U ) σ(1I ⊗ U ) and T ( · ) = U T ( · )U with an appropriate unitary operator U . 2.4. Separability criteria and positive maps 29 Proof. The state σ is obviously the purification of trH1 (ρ). Hence if λj and ψj are and eigenvectors of trH1 (ρ) we can set σ = |ΨihΨ| with p P eigenvalues Ψ = j λj ψj ⊗ φj where φj is an (arbitrary) orthonormal basis in K. It is clear that σ is uniquely determined up to a unitary. Hence we only have to show that a unique T exists if Ψ is given. To satisfy Equation (2.18) we must have ¡ ¢ ¡ ¢ ® ρ |ψj ⊗ ηk ihψl ⊗ ηl | = Ψ, (Id ⊗T ) |ψj ⊗ ηk ihψl ⊗ ηl | Ψ (2.19) ¡ ¢ ® = Ψ, |ψj ihψl | ⊗ T |ηk ihηp | Ψ (2.20) p ¡ ¢ ® = λj λl φj , T |ηk ihηp | φl , (2.21) where ηk is an (arbitrary) orthonormal basis in H1 . Hence T is uniquely determined by ρ in terms of its matrix elements and we only have to check complete positivity. To this end it is useful to note that the map ρ 7→ T is linear if the λj are fixed. Hence it is sufficient to consider the case ρ = |χihχ|. Inserting this in Equation −1/2 (2.21) we immediately see that T (A) = V ∗ AV with hV φj , ηk i = λj hψj ⊗ ηk , χi holds. Hence T is completely positive. Since normalization T (1I) = 1I follows from the choice of the λj the theorem is proved. 2 2.4 Separability criteria and positive maps We have already stated in Subsection 2.3.1 that positive but not completely positive maps exist, whenever input and output algebra are quantum. No such map represents a valid quantum operation, nevertheless they are of great importance in quantum information theory, due to their deep relations to entanglement properties. Hence, this Section is a continuation of the study of separability criteria which we have started in 2.2.4. 
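Before we continue, a small numerical sketch of the correspondence of Theorem 2.3.5 in its most familiar special case, where σ is the maximally entangled state (Python/numpy; the dephasing channel used for T∗ and the reconstruction formula T∗(X) = d · tr_1[(X^T ⊗ 1I)ρ], i.e. the usual Choi-Jamiolkowski form of the correspondence, are assumptions of this sketch and not formulas from the text). It shows that ρ = (Id ⊗T∗)σ is a state and that T∗ can be read off ρ again, so the assignment is indeed one-to-one.

import numpy as np

d = 2
# Channel T* in Kraus form (illustrative choice): dephasing, T*(rho) = (1-p) rho + p Z rho Z.
p = 0.25
Z = np.diag([1.0, -1.0]).astype(complex)
kraus = [np.sqrt(1 - p) * np.eye(d, dtype=complex), np.sqrt(p) * Z]

# Maximally entangled pure state sigma = |Omega><Omega| on H (x) H.
Omega = np.eye(d).reshape(d * d) / np.sqrt(d)      # sum_j |jj> / sqrt(d)
sigma = np.outer(Omega, Omega.conj())

# rho = (Id (x) T*) sigma: apply the Kraus operators to the second factor only.
rho = sum(np.kron(np.eye(d), Kj) @ sigma @ np.kron(np.eye(d), Kj).conj().T
          for Kj in kraus)
assert np.allclose(rho, rho.conj().T) and np.isclose(np.trace(rho).real, 1.0)
assert np.all(np.linalg.eigvalsh(rho) > -1e-12)    # rho is a state

# Reconstruction of the channel action from rho (Choi-Jamiolkowski form):
# T*(X) = d * tr_1[(X^T (x) 1I) rho].
def apply_from_choi(rho, X, d):
    M = np.kron(X.T, np.eye(d)) @ rho
    return d * np.trace(M.reshape(d, d, d, d), axis1=0, axis2=2)

X = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]], dtype=complex)
direct = sum(Kj @ X @ Kj.conj().T for Kj in kraus)
assert np.allclose(apply_from_choi(rho, X, d), direct)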
In contrast to the rest of this section, all maps are considered in the Schrödinger rather than in the Heisenberg picture. 2.4.1 Positivity Let us consider now an arbitrary positive, but not necessarily completely positive map T ∗ : B ∗ (H) → B ∗ (K). If Id again denotes the identity map, it is easy to see that (Id ⊗T ∗ )(σ2 ⊗σ2 ) = σ1 ⊗T ∗ (σ2 ) ≥ 0 holds for each product state σ1 ⊗σ2 ∈ S(H⊗K). Hence (Id ⊗ T ∗ )ρ ≥ 0 for each positive T ∗ is a necessary condition for ρ to be separable. The following theorem proved in [118] shows that sufficiency holds as well. Theorem 2.4.1 A state ρ ∈ B ∗ (H ⊗ K) is separable iff for any positive map T ∗ : B ∗ (K) → B ∗ (H) the operator (Id ⊗T ∗ )ρ is positive. Proof. We will only give a sketch of the proof see [118] for details. The condition is obviously necessary since (Id ⊗T ∗ )ρ1 ⊗ ρ2 ≥ 0 holds for any product state provided T ∗ is positive. The proof of sufficiency relies on the fact that it is always possible to separate a point ρ (an entangled state) from a convex set D (the set of separable states) by a hyperplane. A precise formulation of this idea leads to the following proposition. Proposition 2.4.2 For any entangled state ρ ∈ S(H⊗K) there is an operator A on H ⊗ K called entanglement witness for ρ, with the property ρ(A) < 0 and σ(A) ≥ 0 for all separable σ ∈ S(H ⊗ K). Proof. Since D ⊂ B ∗ (H⊗K) is a closed convex set, for each ρ ∈ S ⊂ B ∗ (H⊗K) with ρ 6∈ D there exists a linear functional α on B ∗ (H ⊗ K), such that α(ρ) < γ ≤ α(σ) for each σ ∈ D with a constant γ. This holds as well in infinite dimensional Banach spaces and is a consequence of the Hahn-Banach theorem (cf. [187] Theorem 3.4). Without loss of generality we can assume that γ = 0 holds. Otherwise we just have 2. Basic concepts 30 to replace α by α − γ tr. Hence the result follows from the fact that each linear functional on B ∗ (H ⊗ K) has the form α(σ) = tr(Aσ) with A ∈ B(H ⊗ K). 2 To continue the proof of Theorem 2.4.1 associate now to any operator A ∈ B(H ⊗ K) the map TA∗ : B ∗ (K) → B ∗ (H) with tr(Aρ1 ⊗ ρ2 ) = tr(ρT1 TA∗ (ρ2 )), (2.22) where ( · )T denotes the transposition in an arbitrary but fixed orthonormal basis |ji, j = 1, . . . , d. It is easy to see that TA∗ is positive if tr(Aρ1 ⊗ ρ2 ) ≥ 0 for all product states ρ1 ⊗ ρ2 ∈ S(H ⊗ K) [128]. A straightforward calculation [118] shows in addition that ¡ ¢ tr(Aρ) = tr |ΨihΨ|(Id ⊗TA∗ )(ρ) (2.23) P holds, where Ψ = d−1/2 j |ji⊗|ji. Assume now that (Id ⊗T ∗ )ρ ≥ 0 for all positive T ∗ . Since TA∗ is positive this implies that the left hand site of (2.23) is positive, hence tr(Aρ) ≥ 0 provided tr(Aσ) ≥ 0 holds for all separable σ, and the statement follows from Proposition 2.4.2. 2 2.4.2 The partial transpose The most typical example for a positive non-cp map is the transposition ΘA = A T of d × d matrices, which we have just used in the proof of Theorem 2.4.1. Θ is obviously a positive map, but the partial transpose B ∗ (H ⊗ K) 3 ρ 7→ (Id ⊗Θ)(ρ) ∈ B ∗ (H ⊗ K) (2.24) is not. The latter can be easily checked with the maximally entangled state (cf. Subsection 3.1.1). 1 X Ψ= √ |ji ⊗ |ji (2.25) d j where |ji ∈ Cd , j = 1, . . . , d denote the canonical basis vectors. In low dimensions the transposition is basically the only positive map which is not cp. Due to results of Størmer [203] and Woronowicz [240] we have: dim H = 2 and dim K = 2, 3 imply that each positive map T ∗ : B ∗ (H) → B ∗ (K) has the form T ∗ = T1∗ + T2∗ Θ with two cp maps T1∗ , T2∗ and the transposition on B(H). 
This immediately implies that positivity of the partial transpose is necessary and sufficient for separability of a state ρ ∈ S(H ⊗ K) (cf. [118]):

Theorem 2.4.3 Consider a bipartite system B(H ⊗ K) with dim H = 2 and dim K = 2, 3. A state ρ ∈ S(H ⊗ K) is separable iff its partial transpose is positive.

To use positivity of the partial transpose as a separability criterion was proposed for the first time by Peres [180], and he conjectured that it is a necessary and sufficient condition in arbitrary finite dimension. Although it has turned out in the meantime that this conjecture is wrong in general (cf. Subsection 3.1.5), partial transposition has become a crucial tool within entanglement theory and we define:

Definition 2.4.4 A state ρ ∈ B∗(H ⊗ K) of a bipartite quantum system is called a ppt-state if (Id ⊗Θ)ρ ≥ 0 holds and an npt-state otherwise (ppt = "positive partial transpose", npt = "negative partial transpose").

2.4.3 The reduction criterion

Another frequently used example of a positive but non-cp map is B∗(H) ∋ ρ ↦ T∗(ρ) = (tr ρ)1I − ρ ∈ B∗(H). The eigenvalues of T∗(ρ) are given by tr ρ − λ_k, where the λ_k are the eigenvalues of ρ. If ρ ≥ 0 we have λ_k ≥ 0 and therefore tr ρ − λ_k = Σ_{j≠k} λ_j ≥ 0. Hence T∗ is positive. That T∗ is not completely positive follows if we consider again the maximally entangled state |Ψ⟩⟨Ψ| from Equation (2.25): the operator (Id ⊗T∗)|Ψ⟩⟨Ψ| = d^{-1} 1I − |Ψ⟩⟨Ψ| has the negative eigenvalue d^{-1} − 1. Since T∗ is positive, however, the necessity argument of Subsection 2.4.1 shows that (T∗ ⊗ Id)ρ ≥ 0 and (Id ⊗T∗)ρ ≥ 0 must hold whenever ρ is separable, i.e. we get

1I ⊗ tr_2(ρ) − ρ ≥ 0,   tr_1(ρ) ⊗ 1I − ρ ≥ 0   (2.26)

for any separable state ρ ∈ B∗(H ⊗ K). These inequalities constitute another non-trivial separability criterion, which is called the reduction criterion [117, 52]. It is closely related to the ppt criterion, due to the following proposition (see [117] for a proof).

Proposition 2.4.5 Each ppt-state ρ ∈ S(H ⊗ K) satisfies the reduction criterion. If dim H = 2 and dim K = 2, 3 both criteria are equivalent.

Hence we see with Theorem 2.4.3 that a state ρ in 2 × 2 or 2 × 3 dimensions is separable iff it satisfies the reduction criterion.

Chapter 3

Basic examples

After the somewhat abstract discussion in the last chapter we will become more concrete now. In the following we will present a number of examples which help on the one hand to understand the structures just introduced, and which are of fundamental importance within quantum information on the other.

3.1 Entanglement

Although our definition of entanglement (Definition 2.2.5) is applicable in arbitrary dimensions, detailed knowledge about entangled states is available only for low dimensional systems or for states with very special properties. In this section we will discuss some of the most basic examples.

3.1.1 Maximally entangled states

Let us start with a look at pure states of a composite system A ⊗ B and their possible correlations. If one subsystem is classical, i.e. A = C({1, . . . , d}), the state space is given according to Subsection 2.2.2 by S(B)^d, and ρ ∈ S(B)^d is pure iff ρ = (δ_{j1}τ, . . . , δ_{jd}τ) with j = 1, . . . , d and a pure state τ of the B system. Hence the restrictions of ρ to A respectively B are the Dirac measure δ_j ∈ S(X) and the pure state τ ∈ S(B); in other words, both restrictions are pure. This is completely different if A and B are quantum, i.e. A ⊗ B = B(H ⊗ K): consider ρ = |Ψ⟩⟨Ψ| with Ψ ∈ H ⊗ K and Schmidt decomposition (Proposition 2.2.1) Ψ = Σ_j λ_j^{1/2} φ_j ⊗ ψ_j. Calculating the A restriction, i.e. the partial trace over K, we get

tr[tr_K(ρ)A] = tr[|Ψ⟩⟨Ψ| (A ⊗ 1I)] = Σ_{jk} λ_j^{1/2} λ_k^{1/2} ⟨φ_j, Aφ_k⟩ δ_{jk},   (3.1)

hence tr_K(ρ) = Σ_j λ_j |φ_j⟩⟨φ_j| is mixed iff Ψ is entangled.
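The following sketch (Python/numpy; the two test vectors are simple illustrative choices) checks this last observation together with the two criteria of Subsections 2.4.2 and 2.4.3: for a product vector the restriction is pure and both the partial transpose and the reduction operator are positive, while for an entangled vector the restriction is mixed and both criteria are violated.

import numpy as np

def dm(psi):
    psi = np.asarray(psi, dtype=complex)
    return np.outer(psi, psi.conj()) / np.vdot(psi, psi).real

def ptrace_second(rho, d1, d2):
    return np.trace(rho.reshape(d1, d2, d1, d2), axis1=1, axis2=3)

def partial_transpose(rho, d1, d2):
    # transpose the second tensor factor only
    return rho.reshape(d1, d2, d1, d2).transpose(0, 3, 2, 1).reshape(d1 * d2, d1 * d2)

d = 2
product = dm(np.kron([1, 0], [1, 0]))                                       # |00>
entangled = dm((np.kron([1, 0], [1, 0]) + np.kron([0, 1], [0, 1])) / np.sqrt(2))

for name, rho in [("product", product), ("entangled", entangled)]:
    rho_A = ptrace_second(rho, d, d)
    purity = np.trace(rho_A @ rho_A).real                    # 1 iff the restriction is pure
    ppt = np.linalg.eigvalsh(partial_transpose(rho, d, d)).min()
    red = np.linalg.eigvalsh(np.kron(rho_A, np.eye(d)) - rho).min()
    print(name, purity, ppt, red)
# product:   purity 1.0, both minimal eigenvalues >= 0 (ppt and reduction criterion hold)
# entangled: purity 0.5, both minima are -0.5 (both criteria are violated)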
The most extreme case arises if H = K = Cd and trK (ρ) is maximally mixed, i.e. trK (ρ) = 1dI . We get for Ψ d 1 X φj ⊗ ψ j Ψ= √ d j=1 (3.2) with two orthonormal bases φ1 , . . . , φd and ψ1 , . . . , ψd . In 2n × 2n dimensions these states violate maximally the CHSH inequalities, with appropriately chosen operators A, A0 , B, B 0 . Such states are therefore called maximally entangled. The most prominent examples of maximally entangled states are the four “Bell states” for two qubit systems, i.e. H = K = C2 , |1i, |0i denotes the canonical basis and 1 Φ0 = √ (|11i + |00i) , 2 Φj = i(1I ⊗ σj )Φ0 , j = 1, 2, 3 (3.3) where we have used the shorthand notation |jki for |ji ⊗ |ki and the σj denote the Pauli matrices. The Bell states, which form an orthonormal basis of C2 ⊗C2 , are the best studied and most relevant examples of entangled states within quantum information. A mixture of them, i.e. a density matrix ρ ∈ S(C2 ⊗ C2 ) with eigenvectors Φj and P eigenvalues 0 ≤ λj ≤ 1, j λj = 1 is called a Bell diagonal state. It can be shown [24] that ρ is entangled iff maxj λj > 1/2 holds. We omit the proof of this statement here, but we will come back to this point in Chapter 5 within the discussion of entanglement measures. 3.1. Entanglement 33 Let us come back to the general case now and consider an arbitrary ρ ∈ S(H⊗H). Using maximally entangled states, we can introduce another separability criterion in terms of the maximally entangled fraction (cf. [24]) F(ρ) = sup Ψ max. ent. hΨ, ρΨi. (3.4) If ρ is separable the reduction criterion (2.26) implies hΨ, [tr1 (ρ) ⊗ 1I − ρ]Ψi ≥ 0 for any maximally entangled state. Since the partial trace of |ΨihΨ| is d−1 1I we get d−1 = hΨ, tr1 (ρ) ⊗ 1IΨi ≤ hΨ, ρΨi, (3.5) hence F(ρ) ≤ 1/d. This condition is not very sharp however. Using the ppt criterion it can be shown that ρ = λ|Φ1 ihΦ1 | + (1 − λ)|00ih00| (with the Bell state Φ1 ) is entangled for all 0 < λ ≤ 1 but a straightforward calculation shows that F(ρ) ≤ 1/2 holds for λ ≤ 1/2. Finally, we have to mention here a very useful parameterization of the set of pure states on H ⊗ H in terms of maximally entangled states: If Ψ is an arbitrary but fixed maximally entangled state, each φ ∈ H ⊗ H admits (uniquely determined) operators X1 , X2 such that φ = (X1 ⊗ 1I)Ψ = (1I ⊗ X2 )Ψ (3.6) holds. This can be easily checked in a product basis. 3.1.2 Werner states If we consider entanglement of mixed states rather than pure ones, the analysis becomes quite difficult, even if the dimensions of the underlying Hilbert spaces are low. The reason is that the state space S(H1 ⊗ H2 ) of a two-partite system with dim Hi = di is a geometric object in a d21 d22 −1 dimensional space. Hence even in the simplest non-trivial case (two qubits) the dimension of the state space becomes very high (15 dimensions) and naive geometric intuition can be misleading. Therefore it is often useful to look at special classes of model states, which can be characterized by only few parameters. A quite powerful tool is the study of symmetry properties; i.e. to investigate the set of states which is invariant under a group of local unitaries. A general discussion of this scheme can be found in [221]. In this paper we will present only three of the most prominent examples. Consider first a state ρ ∈ S(H ⊗ H) (with H = Cd ) which is invariant under the group of all U ⊗ U with a unitary U on H; i.e. [U ⊗ U, ρ] = 0 for all U . 
Such a ρ is usually called a Werner state [229, 181] and its structure can be analyzed quite easily using a well known result of group theory which goes back to Weyl [237] (see also Theorem IX.11.5 of [195]), and which we will state in detail for later reference: Theorem 3.1.1 Each operator A on the N -fold tensor product H ⊗N of the (finite dimensional) Hilbert space H which commutes with all unitaries of the form U ⊗N P is a linear combination of permutation operators, i.e. A = π λπ Vπ , where the sum is taken over all permutations π of N elements, λπ ∈ C and Vπ is defined by Vπ φ1 ⊗ · · · ⊗ φN = φπ−1 (1) ⊗ · · · ⊗ φπ−1 (N ) . (3.7) In our case (N = 2) there are only two permutations: the identity 1I and the flip F (ψ ⊗ φ) = φ ⊗ ψ. Hence ρ = a1I + bF with appropriate coefficients a, b. Since ρ is a density matrix, a and b are not independent. To get a transparent way to express these constraints, it is reasonable to consider the eigenprojections P± of F rather then 1I and F ; i.e. F P± ψ = ±P± ψ and P± = (1I ± F )/2. The P± are the projections ⊗2 on the subspaces H± ⊂ H ⊗ H of symmetric respectively antisymmetric tensor 3. Basic examples 34 products (Bose- respectively Fermi-subspace). If we write d± = d(d ± 1)/2 for the ⊗2 dimensions of H± we get for each Werner state ρ ρ= λ (1 − λ) P+ + P− , d+ d− λ ∈ [0, 1]. (3.8) On the other hand it is obvious that each state of this form is U ⊗ U invariant, hence a Werner state. If ρ is given, it is very easy to calculate the parameter λ from the expectation value of ρ and the flip tr(ρF ) = 2λ − 1 ∈ [−1, 1]. Therefore we can write for an arbitrary state σ ∈ S(H ⊗ H) PUU (σ) = (1 − tr σF ) tr(σF ) + 1 P+ + P− , 2d+ 2d− (3.9) and this defines a projection from the full state space to the set of Werner states which is called the twirl operation. In many cases it is quite useful that it can be written alternatively as a group average of the form Z (U ⊗ U )σ(U ∗ ⊗ U ∗ )dU, (3.10) PUU (σ) = U(d) where dU denotes the normalized, left invariant Haar measure on U(d). To check this identity note first that its right hand side is indeed U ⊗ U invariant, due to the invariance of the volume element dU . Hence we have to check only that the trace of F times the integral coincides with tr(F σ): # Z " Z tr F U(d) (U ⊗ U )σ(U ∗ ⊗ U ∗ )dU = U(d) tr [F (U ⊗ U )σ(U ∗ ⊗ U ∗ )] dU (3.11) dU = tr(F σ), (3.12) = tr(F σ) Z U(d) where we have used the fact that F commutes with U ⊗ U and the normalization of dU . We can apply PUU obviously to arbitrary operators A ∈ B(H ⊗ H) and, as an integral over unitarily implemented operations, we get a channel. Substituting U → U ∗ in (3.10) and cycling the trace tr(APUU (σ)) we find tr(PUU (A)ρ) = tr(APUU (ρ)), hence PUU has the same form in the Heisenberg and the Schrödinger picture (i.e. ∗ PUU = PUU ). If σ ∈ S(H ⊗ H) is a separable state the integrand of PUU (σ) in Equation (3.10) consists entirely of separable states, hence PUU (σ) is separable. Since each Werner state ρ is the twirl of itself, we see that ρ is separable iff it is the twirl PUU (σ) of a separable state σ ∈ S(H ⊗ H). To determine the set of separable Werner states we therefore have to calculate only the set of all tr(F σ) ∈ [−1, 1] with separable σ. Since each such σ admits a convex decomposition into pure product states it is sufficient to look at hψ ⊗ φ, F ψ ⊗ φi = |hψ, φi|2 (3.13) which ranges from 0 to 1. Hence ρ from Equation (3.8) is separable iff 1/2 ≤ λ ≤ 1 and entangled otherwise (due to λ = (tr(F ρ) + 1)/2). 
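A small numerical sketch of this analysis may be added here (Python/numpy; the dimension d = 3 and the input state |00⟩⟨00| are arbitrary choices). It implements the twirl in the form of Equation (3.9), checks U ⊗ U invariance and preservation of tr(Fσ), and confirms that for Werner states the partial transpose becomes non-positive exactly below the separability threshold λ = 1/2 derived above.

import numpy as np

d = 3
rng = np.random.default_rng(1)
I = np.eye(d * d)
F = np.zeros((d * d, d * d))                       # the flip F(psi (x) phi) = phi (x) psi
for j in range(d):
    for k in range(d):
        F[k * d + j, j * d + k] = 1.0
Pp, Pm = (I + F) / 2, (I - F) / 2                  # symmetric / antisymmetric projections
d_plus, d_minus = d * (d + 1) // 2, d * (d - 1) // 2

def werner(lam):                                   # Eq. (3.8)
    return lam * Pp / d_plus + (1 - lam) * Pm / d_minus

def twirl(sigma):                                  # Eq. (3.9)
    f = np.trace(F @ sigma).real
    return (f + 1) / (2 * d_plus) * Pp + (1 - f) / (2 * d_minus) * Pm

# The twirl of an arbitrary state is U (x) U invariant and preserves tr(F sigma).
sigma = np.zeros((d * d, d * d)); sigma[0, 0] = 1.0           # |00><00|
tw = twirl(sigma)
U = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))[0]
assert np.allclose(np.kron(U, U) @ tw @ np.kron(U, U).conj().T, tw)
assert np.isclose(np.trace(F @ tw).real, np.trace(F @ sigma).real)

# Separability threshold: tr(F rho) = 2*lam - 1 changes sign at lam = 1/2, and for
# Werner states the partial transpose becomes non-positive exactly there.
def pt2(rho):                                      # partial transpose of the 2nd factor
    return rho.reshape(d, d, d, d).transpose(0, 3, 2, 1).reshape(d * d, d * d)

for lam in (0.2, 0.5, 0.8):
    rho = werner(lam)
    print(lam, np.trace(F @ rho).real, np.linalg.eigvalsh(pt2(rho)).min())
# lam = 0.2: tr(F rho) = -0.6, negative PT eigenvalue (entangled)
# lam >= 0.5: tr(F rho) >= 0, PT positive (separable, cf. the threshold derived above)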
If H = C2 holds, each Werner state is Bell diagonal and we recover the result from Subsection 3.1.1 (separable if highest eigenvalue less or equal than 1/2). 3.1.3 Isotropic states To derive a second class of states consider the partial transpose (Id ⊗Θ)ρ (with respect to a distinguished base |ji ∈ H, j = 1, . . . , d) of a Werner state ρ. Since ρ is, by definition, U ⊗U invariant, it is easy to see that (Id ⊗Θ)ρ is U ⊗ Ū invariant, where 3.1. Entanglement 35 Ū denotes component wise complex conjugation in the base |ji (we just have to use that U ∗ = Ū T holds). Each state τ with this kind of symmetry is called an isotropic state [183], and our previous discussion shows that τ is a linear combination of 1I and the partial transpose of the flip, which is the rank one operator Fe = (Id ⊗Θ)F = |ΨihΨ| = d X jk=1 |jjihkk|, (3.14) P where Ψ = j |jji is, up to normalization a maximally entangled state. Hence each isotropic τ can be written as ¶ · µ ¸ d2 1I 1 e λ + (1 − λ)F , λ ∈ 0, 2 , (3.15) τ= d d d −1 where the bounds on λ follow from normalization and positivity. As above we can determine the parameter λ from the expectation value tr(Fe τ ) = 1 − d2 λ+d d (3.16) which ranges from 0 to d and this again leads to a twirl operation: For an arbitrary state σ ∈ S(H ⊗ H) we can define ¶ µ £ ¤ £ ¤ 1 e σ) − d 1I + 1 − d tr(Fe σ) Fe , tr( F (3.17) PUŪ (σ) = d(1 − d2 ) and as for Werner states PUŪ can be rewritten in terms of a group average Z PUŪ (σ) = (U ⊗ Ū )σ(U ∗ ⊗ Ū ∗ )dU. (3.18) U(d) Now we can proceed in the same way as above: PUŪ is a channel with PU∗ Ū = PUŪ , its fixed points PUŪ (τ ) = τ are exactly the isotropic states, and the image of the set of separable states under PUŪ coincides with the set of separable isotropic states. To determine the latter we have to consider the expectation values (cf. Equation (3.13)) ¯ ¯ ¯ ¯ d ¯ ¯X (3.19) ψj φj ¯¯ = |hψ, φ̄i|2 ∈ [0, 1]. hψ ⊗ φ, Fe ψ ⊗ φi = ¯¯ ¯ ¯ j=1 This implies that τ is separable iff d(d − 1) d2 ≤ λ ≤ d2 − 1 d2 − 1 (3.20) holds and entangled otherwise. For λ = 0 we recover the maximally entangled state. For d = 2, again we recover again the special case of Bell diagonal states encountered already in the last subsection. 3.1.4 OO-invariant states Let us combine now Werner states with isotropic states, i.e. we look for density matrices ρ which can be written as ρ = a1I + bF + cFe , or, if we introduce the three mutually orthogonal projection operators p0 = 1e F, d p1 = 1 (1I − F ), 2 1 1 (1I + F ) − Fe 2 d (3.21) as a convex linear combination of tr(pj )−1 pj , j = 0, 1, 2: ρ = (1 − λ1 − λ2 )p0 + λ1 p2 p1 + λ2 , tr(p1 ) tr(p2 ) λ1 , λ2 ≥ 0, λ1 + λ2 ≤ 1 (3.22) 3. Basic examples 36 f tr(F ρ) 3 2 1 0 -1 -1 0 1 2 3 tr(F ρ) Figure 3.1: State space of OO-invariant states (upper triangle) and its partial transpose (lower triangle) for d = 3. The special cases of isotropic and Werner states are drawn as thin lines. Each such operator is invariant under all transformations of the form U ⊗ U if U is a unitary with U = Ū , in other words: U should be a real orthogonal matrix. A little bit representation theory of the orthogonal group shows that in fact all operators with this invariance property have the form given in (3.22); cf. [221]. The corresponding states are therefore called OO-invariant, and we can apply basically the same machinery as in Subsection 3.1.2 if we replace the unitary group U(d) by the orthogonal group O(d). 
This includes in particular the definition of a twirl operation as an average over O(d) (for an arbitrary ρ ∈ S(H ⊗ H)): Z U ⊗ U ρU ⊗ U ∗ dU (3.23) POO (ρ) = O(d) which we can express alternatively in terms of the expectation values tr(F ρ), tr( Fe ρ) by ! à tr(Fe ρ) 1 − tr(F ρ) p2 1 + tr(F ρ) tr(Fe ρ) POO (ρ) = p0 + p1 + − . (3.24) d 2 tr(p1 ) 2 d tr(p2 ) The range of allowed values for tr(F ρ), tr(Fe ρ) is given by −1 ≤ tr(F ρ) ≤ 1, 0 ≤ tr(Fe ρ) ≤ d, tr(F ρ) ≥ 2 tr(Fe ρ) − 1. d (3.25) For d = 3 this is the upper triangle in Figure 3.1. The values in the lower (dotted) triangle belong to partial transpositions of OO-invariant states. The intersection of both, i.e. the gray shaded square Q = 3.1. Entanglement 37 [0, 1]×[0, 1], represents therefore the set of OO-invariant ppt states, and at the same time the set of separable states, since each OO-invariant ppt state is separable. To see the latter note that separable OO-invariant states form a convex subset of Q. Hence, we only have to show that the corners of Q are separable. To ¡ ¢ do this note that 1. POO (ρ) is separable whenever ρ is and 2. that tr F POO (ρ) = tr(F ρ) and ¡ ¢ tr Fe POO (ρ) = tr(F ρ) holds (cf. Equation (3.12)). We can consider pure product ¡ ¢ ¡ ¢ states |φ ⊗ ψihφ ⊗ ψ| for ρ and get |hφ, ψi|2 , hφ, ψ̄i|2 for the tuple tr(F ρ), tr(Fe ρ) . Now the point 1, 1) in Q is obtained if ψ = φ is real, the point (0, 0) is obtained for real and orthogonal φ, ψ and the point (1, 0) belongs to the case ψ = φ and hφ, φ̄i = 0. Symmetrically we get (0, 1) with the same φ and ψ = φ̄. 3.1.5 PPT states We have seen in Theorem 2.4.3 that separable states and ppt states coincide in 2 × 2 and 2×3 dimensions. Another class of examples with this property are OO-invariant states just studied. Nevertheless, separability and a positive partial transpose are not equivalent. An easy way to produce such examples of states which are entangled and ppt is given in terms of unextendible product bases [22]. An orthonormal family φj ∈ H1 ⊗ H2 , j = 1, . . . , N < d1 d2 (with dk = dim Hk ) is called an unextendible product basis1 (UPB) iff 1. all φj are product vectors and 2. there is no product vector orthogonal to all φj . Let us denote the projector to the span of all φj by E, its orthocomplement by E ⊥ , i.e. E ⊥ = 1I−E, and define the state ρ = (d1 d2 −N )−1 E ⊥ . It is entangled because there is by construction no product vector in the support of ρ, and it is ppt. The latter can be seen as follows: The projector E is a sum of the one dimensional projectors |φj ihφj |, j = 1, . . . , N . Since all φj are product vectors the partial transposes of the |φj ihφj | are of the form |φej ihφej |, with another UPB φej , j = 1, . . . , N and the partial transpose (1I ⊗ Θ)E of E is the sum of the |φej ihφej |. Hence (1I ⊗ Θ)E ⊥ = 1I − (1I ⊗ Θ)E is a projector and therefore positive. To construct entangled ppt states we have to find UPBs. The following two examples are taken from [22]. Consider first the five vectors φj = N (cos(2πj/5), sin(2πj/5), h), j = 0, . . . , 4, (3.26) p p √ √ with N = 2/ 5 + 5 and h = 21 1 + 5. They form the apex of a regular pentagonal pyramid with height h. The latter is chosen such that nonadjacent vectors are orthogonal. It is now easy to show that the five vectors Ψj = φj ⊗ φ2jmod5 , j = 0, . . . , 4 (3.27) form a UPB in the Hilbert space H ⊗ H, dim H = 3 (cf. [22]). 
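A numerical check of this construction may be useful (Python/numpy): the sketch below builds the five pyramid vectors of Equation (3.26), verifies that the product vectors (3.27) are orthonormal, and confirms that the complementary state ρ = E⊥/(d_1 d_2 − N) has a positive partial transpose. Its entanglement follows, as explained above, from the absence of product vectors in the support and is not tested numerically.

import numpy as np

# The "Pyramid" unextendible product basis of Eqs. (3.26)/(3.27) in C^3 (x) C^3.
h = 0.5 * np.sqrt(1 + np.sqrt(5))
Nrm = 2 / np.sqrt(5 + np.sqrt(5))
phi = [Nrm * np.array([np.cos(2 * np.pi * j / 5),
                       np.sin(2 * np.pi * j / 5), h]) for j in range(5)]
Psi = [np.kron(phi[j], phi[(2 * j) % 5]) for j in range(5)]

# The five product vectors are orthonormal ...
G = np.array([[np.dot(a, b) for b in Psi] for a in Psi])
assert np.allclose(G, np.eye(5))

# ... and the normalized projector onto their orthogonal complement,
# rho = E_perp / (d1*d2 - N), has a positive partial transpose.
E = sum(np.outer(v, v) for v in Psi)
rho = (np.eye(9) - E) / (9 - 5)
assert np.isclose(np.trace(rho), 1.0)

def pt2(r, d1, d2):          # partial transpose of the second factor
    return r.reshape(d1, d2, d1, d2).transpose(0, 3, 2, 1).reshape(d1 * d2, d1 * d2)

print(np.linalg.eigvalsh(pt2(rho, 3, 3)).min())   # >= 0 up to rounding: rho is ppt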
A second example, again in 3×3 dimensional Hilbert space are the following five vectors (called “Tiles” in [22]): ¢ ¡ ¢ ¡ ¢ 1 1 1 ¡ √ |0i ⊗ |0i − |1i , √ |2i ⊗ |1i − |2i , √ |0i − |1i ⊗ |2i, 2 2 2 ¢ ¢ ¡ ¢ 1 ¡ 1¡ √ |1i − |2i ⊗ |0i, |0i + |1i + |2i ⊗ |0i + |1i + |2i , 3 2 (3.28) where |ki, k = 0, 1, 2 denotes the standard basis in H = C3 . 3.1.6 Multipartite states In many applications of quantum information rather big systems, consisting of a large number of subsystems, occur (e.g. a quantum register of a quantum computer) and it is necessary to study the corresponding correlation and entanglement properties. Since this is a fairly difficult task, there is not much known about – much less 1 This name is somewhat misleading because the φj are not a base of H1 ⊗ H2 . 3. Basic examples 38 as in the two-partite case, which we mainly consider in this paper. Nevertheless, in this subsection we will give a rough outline of some of the most relevant aspects. At the level of pure states the most significant difficulty is the lack of an analog of the Schmidt decomposition [179]. More precisely there are elements in an N -fold tensor product H(1) ⊗ · · · ⊗ H(N ) (with N > 2) which can not be written as2 Ψ= d X j=1 (k) (1) (N ) λ j φj ⊗ · · · ⊗ φ j (3.29) (k) with N orthonormal bases φ1 , . . . , φd of H(k) , k = 1, . . . , N . To get examples for such states in the tri-partite case, note first that any partial trace of |ΨihΨ| with Ψ from Equation (3.29) has separable eigenvectors. Hence, each purification (Corollary 2.2.2) of an entangled, two-partite, mixed state with inseparable eigenvectors (e.g. a Bell diagonal state) does not admit a Schmidt decomposition. This implies on the one hand that there are interesting new properties to be discovered, but on the other we see that many techniques developed for bipartite pure states can be generalized in a straightforward way only for states which are Schmidt decomposable in the sense of Equation (3.29). The most well known representative of this class for a tripartite qubit system is the GHZ state [101] ¢ 1 ¡ Ψ = √ |000i + |111i , 2 (3.30) which has the special property that contradictions between local hidden variable theories and quantum mechanics occur even for non-statistical predictions (as opposed to maximally entangled states of bipartite systems; [101, 163, 162]). A second new aspect arising in the discussion of multiparty entanglement is the fact that several different notions of separability occur. A state ρ of an N -partite system B(H1 ) ⊗ · · · ⊗ B(HN ) is called N -separable if X ρ= (3.31) λ J ρ j1 ⊗ · · · ⊗ ρ jN , J with states ρjk ∈ B ∗ (Hk ) and multi indices J = (j1 , . . . , jk ). Alternatively, however, we can decompose B(H1 ) ⊗ · · · ⊗ B(HN ) in two subsystems (or even into M subsystems if M < N ) and call ρ biseparable if it is separable with respect to this decomposition. It is obvious that N -separability implies biseparability with respect to all possible decompositions. The converse is – not very surprisingly – not true. One way to construct a corresponding counterexample is to use an unextendable product base (cf. Subsection 3.1.5). In [22] it is shown that the tripartite qubit state complementary to the UPB 1 |0, 1, +i, |1, +, 0i, |+, 0, 1i, |−, −, −i with |±i = √ (|0i ± |1i) 2 (3.32) is entangled (i.e. tri-inseparable) but biseparable with respect to any decomposition into two subsystems (cf. [22] for details). 
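A short sketch for the GHZ state may illustrate these remarks (Python/numpy): across the bipartite cut 1|23 it has two nonzero Schmidt coefficients and is therefore entangled, while its two-party restriction is the manifestly separable (and in particular ppt) mixture (|00⟩⟨00| + |11⟩⟨11|)/2, showing that the multipartite entanglement is not visible in the two-party restrictions alone.

import numpy as np

# GHZ state (|000> + |111>)/sqrt(2) as a vector in (C^2)^(x3).
ghz = np.zeros(8); ghz[0] = ghz[7] = 1 / np.sqrt(2)

# Bipartite cut 1|23: the Schmidt coefficients are the singular values of the
# 2 x 4 coefficient matrix; two of them are nonzero, so the cut is entangled.
print(np.linalg.svd(ghz.reshape(2, 4), compute_uv=False))      # [0.707..., 0.707...]

# Two-party restriction rho_12 = tr_3 |GHZ><GHZ|: a classically correlated state.
rho = np.outer(ghz, ghz)
rho12 = np.trace(rho.reshape(4, 2, 4, 2), axis1=1, axis2=3)
print(np.round(rho12, 3))            # diag(1/2, 0, 0, 1/2) = (|00><00| + |11><11|)/2

# It is of the separable form (2.12) and, consistently, ppt:
pt = rho12.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)
print(np.linalg.eigvalsh(pt).min())  # >= 0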
Another, maybe more systematic, way to find examples for multipartite states with interesting properties is the generalization of the methods used for Werner states (Subsection 3.1.2), i.e. to look for density matrices ρ ∈ B ∗ (H⊗N ) which commute with all unitaries of the form U ⊗N . Applying again theorem 3.1.1 we see that each such ρ is a linear combination of permutation unitaries. Hence the (k) (k) 2 There is however the possibility to choose the bases φ such that the number of 1 , . . . , φd summands becomes minimal. For tri-partite systems this “minimal canonical form” is study in [1]. 3.2. Channels 39 structure of the set of all U ⊗N invariant states can be derived from representation theory of the symmetric group (which can be tedious for large N !). For N = 3 this program is carried out in [81] and it turns out that the corresponding set of invariant states is a five dimensional (real) manifold. We skip the details here and refer to [81] instead. 3.2 Channels In Section 2.3 we have introduced channels as very general objects transforming arbitrary types of information (i.e. classical, quantum and mixtures of them) into one another. In the following we will consider some of the most important special cases. 3.2.1 Quantum channels Many tasks of quantum information theory require the transmission of quantum information over long distances, using devices like optical fibers or storing quantum information in some sort of memory. Both situations can be described by a channel or quantum operation T : B(H) → B(H), where T ∗ (ρ) is the quantum information which will be received when ρ was sent, or alternatively: which will be read off the quantum memory when ρ was written. Ideally we would prefer those channels which do not affect the information at all, i.e. T = 1I, or, as the next best choice, a T whose action can be undone by a physical device, i.e. T should be invertible and T −1 is again a channel. The Stinespring Theorem (Theorem 2.3.2) immediately shows that this implies T ∗ ρ = U ρU ∗ with a unitary U ; in other words the systems carrying the information do not interact with the environment. We will call such a kind of channel an ideal channel. In real situations however interaction with the environment, i.e. additional, unobservable degrees of freedom, can not be avoided. The general structure of such a noisy channel is given by ¡ ¢ (3.33) T ∗ (ρ) = trK U (ρ ⊗ ρ0 )U ∗ where U : H ⊗ K → H ⊗ K is a unitary operator describing the common evolution of the system (Hilbert space H) and the environment (Hilbert space K) and ρ 0 ∈ S(K) is the initial state of the environment (cf. Figure 3.2). It is obvious that the quantum information originally stored in ρ ∈ S(H) can not be completely recovered from T ∗ (ρ) if only one system is available. It is an easy consequence of the Stinespring theorem that each channel can be expressed in this form Corollary 3.2.1 (Ancilla form) Assume that T : B(H) → B(H) is a channel. Then there is a Hilbert space K, a pure state ρ0 and a unitary map U : H ⊗ K → H ⊗ K such that Equation (3.33) holds. It is always possible, to choose K such that dim(K) = dim(H)3 holds. Proof. Consider the Stinespring form T (A) = V ∗ (A ⊗ 1I)V with V : H → H ⊗ K of T and choose a vector ψ ∈ K such that U (φ ⊗ ψ) = V (φ) can be extended to a unitary map U : H ⊗ K → H ⊗ K (this is always possible since T is unital and V therefore isometric). If ej ∈ H, j = 1, . . . , d1 and fk ∈ K, k = 1, . . . 
, d2 are orthonormal bases with f1 = ψ we get £ ¤ £ ¤ X hV ρej , (A ⊗ 1I)V ej i (3.34) tr T (A)ρ = tr ρV ∗ (A ⊗ 1I)V = = XD jk j U (ρ ⊗ |ψihψ|)(ej ⊗ fk ), (A ⊗ 1I)U (ej ⊗ fk ) h £ ¤ i = tr trK U (ρ ⊗ |ψihψ|)U ∗ A , which proves the statement. E (3.35) (3.36) 2 3. Basic examples 40 T ∗ (ρ) Unitary ρ A Figure 3.2: Noisy channel Note that there are in general many ways to express a channel this way, e.g. if T is an ideal channel ρ 7→ T ∗ ρ = U ρU ∗ we can rewrite it with an arbitrary unitary U0 : K → K by T ∗ ρ = tr2 (U ⊗ U0 ρ ⊗ ρ0 U ∗ ⊗ U0∗ ). This is the weakness of the ancilla form compared to the Stinespring representation of Theorem 2.3.2. Nevertheless Corollary 3.2.1 shows that each channel which is not an ideal channel is noisy in the described way. The most prominent example for a noisy channel is the depolarizing channel for d-level systems (i.e. H = Cd ) S(H) 3 ρ 7→ ϑρ + (1 − ϑ) 1I ∈ S(H), d 0≤ϑ≤1 (3.37) or in the Heisenberg picture B(H) 3 A 7→ ϑA + (1 − ϑ) tr(A) 1I ∈ B(H). d (3.38) A Stinespring dilation of T (not the minimal one – this can be checked by counting dimensions) is given by K = H ⊗ H ⊕ C and V : H → H ⊗ K = H ⊗3 ⊕ H with # "r d i h√ 1−ϑ X ϑ|ji , (3.39) |ki ⊗ |ki ⊗ |ji ⊕ |ji 7→ V |ji = d k=1 where |ki, k = 1, . . . , d denotes again the canonical basis in H. An ancilla form of T with the same K is given by the (pure) environment state "r # d i h√ 1−ϑ X ψ= ϑ|0i ∈ K (3.40) |ki ⊗ |ki ⊕ d k=1 and the unitary operator U : H ⊗ K → H ⊗ K with U (φ1 ⊗ φ2 ⊗ φ3 ⊕ χ) = φ2 ⊗ φ3 ⊗ φ1 ⊕ χ, (3.41) i.e. U is the direct sum of a permutation unitary and the identity. 3.2.2 Channels under symmetry Similarly to the discussion in Section 3.1 it is often useful to consider channels with special symmetry properties. To be more precise, consider a group G and two unitary representations π1 , π2 on the Hilbert spaces H1 and H2 respectively. A channel T : B(H1 ) → B(H2 ) is called covariant (with respect to π1 and π2 ) if T [π1 (U )Aπ1 (U )∗ ] = π2 (U )T [A]π2 (U )∗ ∀A ∈ B(H1 ) ∀U ∈ G (3.42) 3.2. Channels 41 holds. The general structure of covariant channels is governed by a fairly powerful variant of Stinesprings theorem which we will state below (and which will be very useful for the study of the cloning problem in Chapter 8). Before we do this let us have a short look on a particular class of examples which is closely related to OO-invariant states. Hence consider a channel T : B(H) → B(H) which is covariant with respect to the orthogonal group, i.e. T (U AU ∗ ) = U T (A)U ∗ for all unitaries U on H with Ū = UPin a distinguished basis |ji, j = 1, . . . , d. The maximally entangled state ψ = d−1/2 j |jji is OO-invariant, i.e. U ⊗ U ψ = ψ for all these U . Therefore each state ρ = (Id ⊗T ∗ )|ψihψ| is OO-invariant as well and by the duality lemma (Theorem 2.3.5) T and ψ are uniquely determined (up to unitary equivalence) by ρ. This means we can use the structure of OO-invariant states derived in Subsection 3.1.4 to characterize all orthogonal covariant channels. As a first step consider the linear maps X1 (A) = d tr(A)1I, X2 (A) = dAT and X3 (A) = dA. They are not channels (they are not unital and X2 is not cp) but they have the correct covariance property and it is easy to see that they correspond to the operators 1I, F, Fe ∈ B(H ⊗ H), i.e. (Id ⊗X1 )|ψihψ| = 1I, (Id ⊗X2 )|ψihψ| = F, (Id ⊗X3 )|ψihψ| = Fe . 
(3.43) Using Equation (3.21) we can determine therefore the channels which belong to the three extremal OO-invariant states (the corners of the upper triangle in Figure 3.1): tr(A)1I − AT T0 (A) = A, T1 (A) = d−1 · ¸ ¢ d¡ 2 T T2 (A) = tr(A)1I + A − A d(d + 1) − 2 2 (3.44) (3.45) Each OO-invariant channel is a convex linear combination of these three. Special cases are the channels corresponding to Werner and isotropic states. The latter leads to depolarizing channels T (A) = ϑA + (1 − ϑ)d−1 tr(A)1I with ϑ ∈ [0, d2 /(d2 − 1)]; cf. Equation (3.15), while Werner states correspond to T (A) = ¤ 1 − ϑ£ ¤ ϑ £ tr(A)1I + AT + tr(A)1I − AT , ϑ ∈ [0, 1]; d+1 d−1 (3.46) cf. Equation (3.8). Let us come back now to the general case. We will state here the covariant version of the Stinespring theorem (see [136] for a proof). The basic idea is that all covariant channels are parameterized by representations on the dilation space. Theorem 3.2.2 Let G be a group with finite dimensional unitary representations πj : G → U(Hj ) and T : B(H1 ) → B(H2 ) a π1 , π2 - covariant channel. 1. Then there is a finite dimensional unitary representation π e : G → U(K) and an operator V : H2 → H1 ⊗ K with V π2 (U ) = π1 (U ) ⊗ π e(U )V and T (A) = V ∗ A ⊗ 1IV . P 2. If T = α T α is a decomposition of T inPcompletely positive and covariant summands, there is a decomposition 1I = α F α of the identity operator on K into positive operators F α ∈ B(K) with [F α , π e(g)] = 0 such that T α (X) = ∗ α V (X ⊗ F )V . To get an explicit example consider the dilation of a depolarizing channel given in Equation (3.39). In this case we have π1 (U ) = π2 (U ) = U and π e(U ) = (U ⊗ Ū )⊕1I. The check that the map V has indeed the intertwining property V π2 (U ) = π1 (U ) ⊗ π e(U ) stated in the theorem is left as an exercise to the reader. 3. Basic examples 42 3.2.3 Classical channels The classical analog to a quantum operation is a channel T : C(X) → C(Y ) which describes the transmission or manipulation of classical information. As we have mentioned already in Subsection 2.3.1 positivity and complete positivity are equivalent in this case. Hence we have to assume only that T is positive and unital. Obviously T is characterized by its matrix elements Txy = δy (T |xihx|), where δy ∈ C ∗ (X) denotes the Dirac measure at y ∈ Y and |xihx| ∈ C(X) is the canonical basis in C(X) (cf. Subsection 2.1.3). Positivity and normalization of T imply that 0 ≤ Txy ≤ 1 and ´i X h ³X ¡ ¢ Txy (3.47) |xihx| = 1 = δy (1I) = δy T (1I) = δy T x x holds. Hence the family (Txy )x∈X is a probability distribution on X and Txy is therefore the probability to get the information x ∈ X at the output side of the channel if y ∈ Y was send. Each classical channel is uniquely determined by its matrix of transition probabilities. For X = Y we see that the information is transmitted without error iff Txy = δxy , i.e. T is an ideal channel if T = Id holds and noisy otherwise. 3.2.4 Observables and preparations Let us consider now a channel which transforms quantum information B(H) into classical information C(X). Since positivity and complete positivity are again equivalent, we just have to look at a positive and unital map E : C(X) → B(H). With the canonical basis |xihx|, x ∈ X of C(X) P we get a family Ex = E(|xihx|), x ∈ X of positive operators Ex ∈ B(H) with x∈X Ex = 1I. Hence the Ex form a POV measure, i.e. an observable. If on the other hand a POV measure Ex ∈ B(H), x ∈ X is given we can define a quantum to classical channel E : C(X) → B(H) by E(f ) = X f (x)Ex . 
(3.48) x∈X This shows that the observable Ex , x ∈ X and the channel E can be identified and we say E is the observable. Keeping this interpretation in mind it is possible to have a short look at continuous observables without the need of abstract measure theory: We only have to define the classical algebra C(X) for a set X which is not finite or discrete. To this end assume that X is locally compact space (e.g. an open or closed subset of R d ). We choose for C(X) the space of continuous, complex valued functions vanishing at infinity, i.e. |f (x)| < ² for each ² > 0 provided x lies outside an appropriate compact set. C(X) can be equipped with the sup-norm and becomes an Abelian C*-algebra (cf. [35]). To interpret it as an operator algebra as assumed in Subsection 2.1.1 we have to identify f ∈ C(X) with the corresponding multiplication operator on L2 (X, µ), where µ is an appropriate measure on X (e.g. the Lebesgue measure for X ⊂ Rd ). An observable taking arbitrary values in X can now be defined as a positive map E : C(X) → B(H). The probability to get a result in the open subset ω ⊂ X during an E measurement on systems in the state ρ is Kρ (ω) = sup {tr(E(f )ρ) | f ∈ C(X), 0 ≤ f ≤ 1I, supp f ⊂ ω} (3.49) where supp denotes the support of f . Applying a little bit measure theory (basically the Riesz-Markov theorem [186, Thm. IV.18] together with dominated convergence [186, Thm. I.16] and linearity of the trace) it is easy to see that we can express Kρ (ω) for each ρ by a positive operator E(ω) such that ¡ ¢ Kρ (ω) = tr E(ω)ρ (3.50) 3.2. Channels 43 holds. The family of operators E(ω) we get in this way has typical properties of a measure (e.g. some sort of σ-additivity). Hence we have encountered the continuous version of a POV measure. We do not want to discuss the technical details here (cf. [115, Sect. 2.1] instead). For later use we will only remark here that we can reconstruct the channel f 7→ E(f ) from the measure ω 7→ E(ω) in terms of the integrals Z f (x)E(dx) E(f ) = (3.51) X which should be regarded as the continuous variable analog of Equation (3.48). The most well known example for R valued observables are of course position Q and momentum P of a free particle in one dimension. In this case we have H = L 2 (R) and the channels corresponding to Q and P are (in position representation) given by C(R) 3 f 7→ EQ (f ) ∈ B(H) with EQ (f )ψ = f ψ respectively C(R) 3 f 7→ EP (f ) ∈ b ∨ where ∧ and ∨ denote the Fourier transform and its B(H) with EP (f )ψ = (f ψ) inverse. Let us return now to a finite set X and exchange the role of C(X) and B(H); in other words let us consider a channel R : B(H) → C(X) with a classical input and a quantum output algebra. In the Schrödinger picture we get a family of density matrices ρx := R∗ (δx ) ∈ B ∗ (H), x ∈ X, where δx ∈ C ∗ (X) again denote the Dirac measures (cf. Subsection 2.1.3). Hence we get a parameter dependent preparation which can be used to encode the classical information x ∈ X into the quantum information ρx ∈ B ∗ (H). 3.2.5 Instruments and parameter dependent operations An observable describes only the statistics of measuring results, but does not contain information about the state of the system after the measurement. To get a description which fills this gap we have to consider channels which operates on quantum systems and produces hybrid systems as output, i.e. T : B(H) ⊗ M(X) → B(K). Following Davies [66] we will call such an object an instrument. 
From T we can derive the subchannel C(X) 3 f 7→ T (1I ⊗ f ) ∈ B(K) (3.52) £ ¡ ¢ ¤ which is the observable measured by T , i.e. tr T 1I ⊗ |xihx| ρ is the probability to measure x ∈ X on systems in the state ρ. On the other hand we get for each x ∈ X a quantum channel (which is not unital) B(H) 3 A 7→ Tx (A) = T (A ⊗ |xihx|) ∈ B(K). (3.53) It describes the operation performed by the instrument T if x ∈ X was measured. More precisely if a measurement on systems in the state ρ gives the result x ∈ X we get (up to normalization) the state Tx∗ (ρ) after the measurement (cf. Figure 3.3), while ¡ ¢ tr (Tx∗ (ρ)) = tr (Tx∗ (ρ)1I) = tr ρT (1I ⊗ |xihx|) (3.54) Tx∗ (ρ) ∈ B ∗ (H) ρ ∈ B ∗ (K) T Figure 3.3: Instrument x∈X 3. Basic examples 44 ρ ∈ B ∗ (H) x∈X Tx∗ (ρ) ∈ B ∗ (K) T Figure 3.4: Parameter dependent operation is (again) the probability to measure x ∈ X on ρ. The instrument T can be expressed in terms of the operations Tx by X T (A ⊗ f ) = f (x)Tx (A); (3.55) x hence we can identify T with the family Tx , x ∈ X. Finally we can consider the second marginal of T X Tx (A) ∈ B(K). (3.56) B(H) 3 A 7→ T (A ⊗ 1I) = x∈X It describes the operation we get if the outcome of the measurement is ignored. The most well known example of an instrument is a von Neumann-Lüders measurement associated to a PV measure given by family of projections E x , x = 1, . . . d; e.g. the eigenprojections of a selfadjoint operator A ∈ B(H). It is defined as the channel T : B(H) ⊗ C(X) → B(H) with X = {1, . . . , d} and Tx (A) = Ex AEx , (3.57) Hence we get the final state tr(Ex ρ)−1 Ex ρEx if we measure the value x ∈ X on systems initially in the state ρ – this is well known from quantum mechanics. Let us change now the role of B(H) ⊗ C(X) and B(K); in other words consider a channel T : B(K) → B(H) ⊗ C(X) with hybrid input and quantum output. It describes a device which changes the state of a system depending on additional classical information. As for an instrument, T decomposes into a family of (uniP ∗ p T tal!) channels Tx : B(K) → B(H) such that we get T ∗ (ρ ⊗ p) = x x (ρ) in x the Schrödinger picture. Physically T describes a parameter dependent operation: depending on the classical information x ∈ X the quantum information ρ ∈ B(K) is transformed by the operation Tx (cf. figure 3.4) Finally we can consider a channel T : B(H) ⊗ C(X) → B(K) ⊗ C(Y ) with hybrid input and output to get a parameter dependent instrument (cf. figure 3.5): Similarly to the discussion in the last paragraph we can define a family of instruments T y : ∗ Ty,x (ρ) ∈ B ∗ (K) ρ ∈ B ∗ (H) y∈Y T x∈X Figure 3.5: Parameter dependent instrument 3.2. Channels 45 Alice Bob TB T ∗ ρ ∈ B ∗ (K1 ⊗ K2 ) ρ ∈ B ∗ (H1 ⊗ H2 ) TA Figure 3.6: One way LOCC operation; cf Figure 3.7 for an explanation. P B(H) ⊗ C(X) → B(K), y ∈ Y by the equation T ∗ (ρ ⊗ p) = y py Ty∗ (ρ). Physically T describes the following device: It receives the classical information y ∈ Y and a quantum system in the state ρ ∈ B ∗ (K) as input. Depending on y a measurement with the instrument Ty is performed, which in turn produces the measuring value ∗ x ∈ X and leaves the quantum system in the state (up to normalization) T y,x (ρ); with Ty,x given as in Equation (3.53) by Ty,x (A) = Ty (A ⊗ |xihx|). 3.2.6 LOCC and separable channels Let us consider now channels acting on finite dimensional bipartite systems: T : B(H1 ⊗ K2 ) → B(K1 ⊗ K2 ). In this case we can ask the question whether a channel preserves separability. Simple examples are local operations (LO), i.e. 
T = T A ⊗ T B with two channels T A,B : B(Hj ) → B(Kj ). Physically we think of such a T in terms of two physicists Alice and Bob both performing operations on their own particle but without information transmission neither classical nor quantum. The next difficult step are local operations with one way classical communications (one way LOCC). This means Alice operates on her system with an instrument, communicates the classical measuring result j ∈ X = {1, . . . , N } to Bob and he selects an operation depending on these data. We can write such a channel as a composition T = (T A ⊗ Id)(Id ⊗T B ) of the instrument T A : B(H1 ) ⊗ C(X1 ) → B(K1 ) and the parameter dependent operation T B : B(H2 ) → C(X1 ) ⊗ B(K2 ) (cf. Figure 3.6) Id ⊗T B T A ⊗Id B(H1 ⊗ H2 ) −−−−−→ B(H1 ) ⊗ C(X) ⊗ B(K2 ) −−−−→ B(K1 ⊗ K2 ). (3.58) It is of course possible to continue the chain in Equation (3.58), i.e. instead of just operating on his system, Bob can invoke a parameter dependent instrument depending on Alice’s data j1 ∈ X1 , send the corresponding measuring results j2 ∈ X2 to Alice and so on. To write down the corresponding chain of maps (as in Equation (3.58)) is simple but not very illuminating and therefore omitted; cf. Figure 3.7 instead. If we allow Alice and Bob to drop some of their particles, i.e. the operations they perform need not to be unital, we get a LOCC channel (“local operations and classical communications”). It represents the most general physical process which can be performed on a two partite system if only classical communication (in both directions) is available. LOCC channels play a significant role in entanglement theory (we will see this in Section 4.3), but they are difficult to handle. Fortunately it is often possible to replace them by closely related operations with a more simple structure: A not necessarily unital channel T : B(H1 ⊗ K2 ) → B(K1 ⊗ K2 ) is called separable, if it is 3. Basic examples 46 Alice Bob Alice Bob Figure 3.7: LOCC operation. The upper and lower curly arrows represent Alice’s respectively Bob’s quantum system, while the straight arrows in the middle stand for the classical information Alice and Bob exchange. The boxes symbolize the channels applied by Alice and Bob. a sum of (in general non-unital) local operations, i.e. T = N X j=1 TjA ⊗ TjB . (3.59) It is easy to see that a separable T maps separable states to separable states (up to normalization) and that each LOCC channel is separable (cf. [21]). The converse however is (somewhat surprisingly) not true: there are separable channels which are not LOCC, see [21] for a concrete example. 3.3 Quantum mechanics in phase space Up to now we have considered only finite dimensional systems and even in this extremely idealized situation it is not easy to get nontrivial results. At a first look the discussion of continuous quantum systems seems therefore to be hopeless. If we restrict our attention however to small classes of states and channels, with sufficiently simple structure, many problems become tractable. Phase space quantum mechanics, which will be reviewed in this Section (see Chapter 5 of [111] for details), provides a very powerful tool in this context. Before we start let us add some remarks to the discussion of Chapter 2 which we have restricted to finite dimensional Hilbert spaces. Basically most of the material considered there can be generalized in a straightforward way, as long as topological issues like continuity and convergence arguments are treated carefully enough. 
There are of course some caveats (cf. in particular Footnote 2 of Chapter 2), however they do not lead to problems in the framework we are going to discuss and can therefore be ignored. 3.3.1 Weyl operators and the CCR The kinematical structure of a quantum system with d degrees of freedom is usually described by a separable Hilbert space H and 2d selfadjoint operators Q1 , . . . , Qd , P1 , . . . , Pd satisfying the canonical commutation relations [Qj , Qk ] = 0, [Pj , Pk ] = 0, [Qj , Pk ] = iδjk 1I. The latter can be rewritten in a more compact form as R2j−1 = Qj , R2j = Pj , j = 1, . . . , d, [Rj , Rk ] = −iσjk . (3.60) 3.3. Quantum mechanics in phase space 47 Here σ denotes the symplectic matrix σ = diag(J, . . . , J), J= · 0 −1 1 0 ¸ , (3.61) which plays a crucial role for the geometry of classical mechanics. We will call the pair (V, σ) consisting of σ and the 2d-dimensional real vector space V = R 2d henceforth the classical phase space. The relations in Equation (3.60) are, however, not sufficient to fix the operators Rj up to unitary equivalence. The best way to remove the remaining physical ambiguities is the study of the unitaries W (x) = exp(ix · σ · R), x ∈ V, x · σ · R = 2d X xj σjk Rk (3.62) jk=1 instead of the Rj directly. If the family W (x), x ∈ V is irreducible (i.e. [W (x), A] = 0, ∀x ∈ V implies A = λ1I with λ ∈ C) and satisfies3 µ ¶ i W (x)W (x0 ) = exp − x · σ · x0 W (x + x0 ), (3.63) 2 it is called an (irreducible) representation of the Weyl relations (on (V, σ)) and the operators W (x) are called Weyl operators. By the well known Stone - von Neumann uniqueness theorem all these representations are mutually unitarily equivalent, i.e. if we have two of them W1 (x), W2 (x), there is a unitary operator U with U W1 (x)U ∗ = W2 (x) ∀x ∈ V . This implies that it does not matter from a physical point of view which representation we use. The most well known one is of course the Schrödinger representation where H = L2 (Rd ) and Qj , Pk are the usual position and momentum operators. 3.3.2 Gaussian states A density operator ρ ∈ S(H) has finite second moments if the expectation values tr(ρQ2j ) and tr(ρPj2 ) are finite for all j = 1, . . . , d. In this case we can define the mean m ∈ R2d and the correlation matrix α by £ mj = tr(ρR), αjk + iσjk = 2 tr (Rj − mj )ρ(Rk − mk )]. (3.64) The mean m can be arbitrary, but the correlation matrix α must be real and symmetric and the positivity condition α + iσ ≥ 0 (3.65) must hold (this is an easy consequence of the canonical commutation relations (3.60)). Our aim is now to distinguish exactly one state among all others with the same mean and correlation matrix. This is the point where the Weyl operators come into play. Each state ρ ∈ S(H) £can be characterized uniquely by its quantum character¤ istic function X 3 x 7→ tr W (x)ρ ∈ C which should be regarded as the quantum Fourier transform of ρ and is in fact the Fourier transform of the Wigner function of ρ [228]. We call ρ Gaussian if ¶ µ £ ¤ 1 (3.66) tr W (x)ρ = exp im · x − x · α · x 4 3 Note that the CCR (3.60) are implied by the Weyl relations (3.63) but the converse is, in contrast to popular believe, not true: There are representations of the CCR which are unitarily inequivalent to the Schrödinger representation; cf. [186] Section VIII.5 for particular examples. Hence uniqueness can only be achieved on the level of Weyl operators – which is one major reason to study them. 3. Basic examples 48 holds. 
By differentiation it is easy to check that ρ has indeed mean m and covariance matrix α. The most prominent examples for Gaussian states are the ground state ρ 0 of a system of d harmonic oscillators (where the mean is 0 and α is given by the corresponding classical Hamiltonian) and its phase space translates ρ m = W (m)ρW (−m) (with mean m and the same α as ρ0 ), which are known from quantum optics as coherent states. ρ0 and ρm are pure states and it can be shown that a Gaussian state is pure iff σ −1 α = −1I holds (see [111], Ch. 5). Examples for mixed Gaussians are temperature states of harmonic oscillators. In one degree of freedom this is ¶n ∞ µ N 1 X |nihn| (3.67) ρN = N + 1 n=0 N + 1 where |nihn| denotes the number basis and N is the mean photon number. The characteristic function of ρN is ¸ µ ¶ · ¤ £ 1 1 (3.68) N+ |x|2 , tr W (x)ρN = exp − 2 2 and its correlation matrix is simply α = 2(N + 1/2)1I 3.3.3 Entangled Gaussians Let us now consider bipartite systems. Hence the phase space (V, σ) decomposes into a direct sum V = VA ⊕ VB (where A stands for “Alice” and B for “Bob”) and the symplectic matrix σ = σA ⊕ σB is block diagonal with respect to this decomposition. If WA (x) respectively WB (y) denote Weyl operators, acting on the Hilbert spaces HA , HB , and corresponding to the phase spaces VA and VB , it is easy to see that the tensor product WA (x) ⊗ WB (y) satisfies the Weyl relations with respect to (V, σ). Hence by the Stone - von Neumann uniqueness theorem we can identify W (x ⊕ y), x ⊕ y ∈ Va ⊕ VB = V with WA (x) ⊗ WA (y). This immediately shows that a state ρ on H = HA ⊗ HB is a product state iff its characteristic function factorizes. Separability4 is characterized as follows (we omit the proof, see [234] instead). Theorem 3.3.1 A Gaussian state with covariance matrix α is separable iff there are covariance matrices αA , αB such that · ¸ αA 0 α≥ (3.69) 0 αB holds. This theorem is somewhat similar to Theorem 2.4.1: It provides a useful criterion as long as abstract considerations are concerned, but not for explicit calculations. In contrast to finite dimensional systems, however, separability of Gaussian states can be decided by an operational criterion in terms of nonlinear maps between matrices [93]. To state it we have to introduce some terminology first. The key tool is a sequence of 2n + 2m × 2n + 2m matrices αN , N ∈ N, written in block matrix notation as ¸ · A N CN . (3.70) αN = T BN CN Given α0 the other αN are recursively defined by: AN +1 = BN +1 = AN − Re(XN ) and CN +1 = − Im(XN ) (3.71) 4 In infinite dimensions we have to define separable states (in slight generalization to Definition 2.2.5) as a trace-norm convergent convex sum of product states. 49 3.3. Quantum mechanics in phase space T if αN −iσ ≥ 0 and αN +1 = 0 otherwise. Here we have set XN = CN (BN −iσB )−1 CN 5 and the inverse denotes the pseudo inverse if BN − iσB is not invertible. Now we can state the following theorem (see [93] for a proof): Theorem 3.3.2 Consider a Gaussian state ρ of a bipartite system with correlation matrix α0 and the sequence αN , N ∈ N just defined. 1. If for some N ∈ N we have AN − iσA 6≥ 0 then ρ is not separable. 2. If there is on the other hand an N ∈ N such that AN − kCN k1I − iσA ≥ 0, then the state ρ is separable (kCN k denotes the operator norm of CN ). To check whether a Gaussian state ρ is separable or not we have to iterate through the sequence αN until either condition 1 or 2 holds. In the first case we know that ρ is entangled and separable in the second. 
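The iteration behind Theorem 3.3.2 is straightforward to implement. The sketch below (Python/NumPy; not part of the original text) follows Equations (3.70) and (3.71) literally for one mode on each side and applies the two conditions of the theorem. It is tested on two illustrative covariance matrices: a product of two thermal states, and the standard two-mode squeezed vacuum, whose covariance matrix (in the convention used here, where the vacuum has α = 1I) is an assumption of the example rather than something taken from the text.

import numpy as np

J = np.array([[0.0, 1.0], [-1.0, 0.0]])
sigma = np.block([[J, np.zeros((2, 2))], [np.zeros((2, 2)), J]])

def psd(M, tol=1e-9):
    # "M >= 0" for a (numerically) Hermitian matrix
    return np.min(np.linalg.eigvalsh((M + M.conj().T) / 2)) >= -tol

def gaussian_separability(alpha, max_iter=50):
    """Iterate Eqs. (3.70)-(3.71) for a 1x1 mode system and apply Theorem 3.3.2."""
    A = alpha[:2, :2].astype(complex)
    B = alpha[2:, 2:].astype(complex)
    C = alpha[:2, 2:].astype(complex)
    for _ in range(max_iter):
        if not psd(A - 1j * J):                                    # condition 1
            return "entangled"
        if psd(A - np.linalg.norm(C, 2) * np.eye(2) - 1j * J):     # condition 2
            return "separable"
        alpha_N = np.block([[A, C], [C.conj().T, B]])
        if not psd(alpha_N - 1j * sigma):
            A = B = np.zeros((2, 2), dtype=complex)                # "alpha_{N+1} = 0 otherwise"
            C = np.zeros((2, 2), dtype=complex)
            continue
        X = C @ np.linalg.pinv(B - 1j * J) @ C.T                   # X_N, pseudo inverse as in footnote 5
        A = B = A - X.real
        C = -X.imag
    return "undecided"

# Product of two thermal one-mode states, alpha = 2(N+1/2) 1I on each side:
alpha_product = np.diag([3.0, 3.0, 5.0, 5.0])
print(gaussian_separability(alpha_product))     # separable

# Two-mode squeezed vacuum with squeezing parameter r (entangled for r > 0):
r = 0.8
c, s = np.cosh(2 * r), np.sinh(2 * r)
Z = np.diag([1.0, -1.0])
alpha_tmsv = np.block([[c * np.eye(2), s * Z],
                       [s * Z, c * np.eye(2)]])
print(gaussian_separability(alpha_tmsv))        # entangled

In both cases the iteration stops after at most two steps.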
Hence only the question remains whether the whole procedure terminates after a finite number of iterations. This problem is treated in [93] and it turns out that the set of ρ for which separability is decidable after a finite number of steps is the complement of a measure zero set (in the set of all separable states). Numerical calculations indicate in addition that the method converges usually very fast (typically less than five iterations). To consider ppt states we first have to characterize the transpose for infinite dimensional systems. There are different ways to do that. We will use the fact that the adjoint of a matrix can be regarded as transposition followed by componentwise complex conjugation. Hence we define for any (possibly unbounded) operator A T = CA∗ C, where C : H → H denotes complex conjugation of the wave function in position representation. This implies QTj = Qj for position and PjT = −Pj for momentum operators. If we insert the partial transpose of a bipartite state ρ into Equation (3.64) we see that the correlation matrix α ejk of ρT picks up a minus sign whenever one of the indices belongs to one of Alice’s momentum operators. To be a state α e should satisfy α e + iσ ≥ 0, but this is equivalent to α + ie σ ≥ 0, where in σ e the corresponding components are reversed i.e. σ e = (−σA ) ⊕ σB . Hence we have shown Proposition 3.3.3 A Gaussian state is ppt iff its correlation matrix α satisfies ¸ · −σA 0 . (3.72) α + ie σ ≥ 0 with σ e= 0 σB The interesting question is now whether the ppt criterion is (for a given number of degrees of freedom) equivalent to separability or not. The following theorem which was proved in [197] for 1 × 1 systems and in [234] in 1 × d case gives a complete answer. Theorem 3.3.4 A Gaussian state of a quantum system with 1 × d degrees of freedom (i.e. dim XA = 2 and dim XB = 2d) is separable iff it is ppt; in other words iff the condition of Proposition 3.3.3 holds. For other kinds of systems the ppt criterion may fail which means that there are entangled Gaussian states which are ppt. A systematic way to construct such states can be found in [234]. Roughly speaking, it is based on the idea to go to the boundary of the set of ppt covariance matrices, i.e. α has to satisfy Equation (3.65) and (3.72) and it has to be a minimal matrix with this property. Using this method explicit examples for ppt and entangled Gaussians are constructed for 2 × 2 degrees of freedom (cf. [234] for details). 5 A−1 is the pseudo inverse of a matrix A if AA−1 = A−1 A is the projector onto the range of A. If A is invertible A−1 is the usual inverse. 3. Basic examples 50 3.3.4 Gaussian channels Finally we want to give a short review on a special class of channels for infinite dimensional quantum systems (cf. [116] for details). To explain the basic idea firstly note that each finite set of Weyl operators (W (xj ), j = 1, . . . , N , xj 6= xk for j 6=Pk) is linear independent. This can be checked easily using expectation values of j λj W (xj ) in Gaussian states. Hence linear maps on the space of finite linear combinations of Weyl operators can be defined by T [W (x)] = f (x)W (Ax) where f is a complex valued function on V and A is a 2d × 2d matrix. If we choose A and f carefully enough, such that some continuity properties match T can be extended in a unique way to a linear map on B(H) – which is, however, in general not completely positive. This means we have to consider special choices for A and f . 
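Returning for a moment to Proposition 3.3.3 before the maps A and f are specialized: the ppt condition for Gaussian states is equally easy to test numerically. A minimal sketch (Python/NumPy; the two covariance matrices are the same illustrative assumptions as in the previous example):

import numpy as np

J = np.array([[0.0, 1.0], [-1.0, 0.0]])

def is_ppt_gaussian(alpha, tol=1e-9):
    """Proposition 3.3.3 for one mode on each side:
    alpha + i*sigma_tilde >= 0 with sigma_tilde = (-sigma_A) (+) sigma_B."""
    sigma_tilde = np.block([[-J, np.zeros((2, 2))],
                            [np.zeros((2, 2)), J]])
    M = alpha + 1j * sigma_tilde
    return np.min(np.linalg.eigvalsh((M + M.conj().T) / 2)) >= -tol

alpha_product = np.diag([3.0, 3.0, 5.0, 5.0])
r = 0.8
c, s = np.cosh(2 * r), np.sinh(2 * r)
Z = np.diag([1.0, -1.0])
alpha_tmsv = np.block([[c * np.eye(2), s * Z],
                       [s * Z, c * np.eye(2)]])

# By Theorem 3.3.4 (1 x 1 modes) "ppt" is here equivalent to "separable":
print(is_ppt_gaussian(alpha_product))   # True  (separable)
print(is_ppt_gaussian(alpha_tmsv))      # False (entangled)

Back now to the admissible choices of A and f.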
The most easy case arises if f ≡ 1 and A is a symplectic isomorphism, i.e. AT σA = σ. If this holds the map V 3 x 7→ W (Ax) is a representation of the Weyl relations and therefore unitarily equivalent to the representation we have started with. In other words there is a unitary operator U with T [W (x)] = W (Ax) = U W (x)U ∗ , i.e. T is unitarily implemented, hence completely positive and, in fact, well known as Bogolubov transformation. If A does not preserve the symplectic matrix, f ≡ 1 is no option. Instead we have to choose f such that the matrices ¶ µ i i (3.73) Mjk = f (xj − xk ) exp − xj · σxk + Axj · σAxk 2 2 are positive. Complete positivity of the corresponding T is then a standard result of abstract C*-algebra theory (cf. [67]). If the factor f is in addition a Gaussian, ¢ ¡ i.e. f (x) = exp − 12 x · βx for a positive definite matrix β the cp-map T is called a Gaussian channel. A simple way to construct a Gaussian channel is in terms of an ancilla representation. More precisely, if A : V → V is an arbitrary linear map we can extend it to a symplectic map V 3 x 7→ Ax ⊕ A0 x ∈ V ⊕ V 0 , where the symplectic vector space (V 0 , σ 0 ) now refers to the environment. Consider now the Weyl operator W (x) ⊗ W 0 (x0 ) = W (x, x0 ) on the Hilbert space H ⊗ H0 associated to the phase space element x ⊕ x0 ∈ V ⊕ V 0 . Since A ⊕ A0 is symplectic it admits a unitary Bogolubov transformation U : H ⊗ H0 → H ⊗ H0 with U ∗ W (x, x0 )U = W (Ax, A0 x). If ρ0 denotes now a Gaussian density matrix on H0 describing the initial state of the environment we get a Gaussian channel by £ ¤ £ ¤ £ ¤ £ ¤ tr T ∗ (ρ)W (x) = tr ρ ⊗ ρ0 U ∗ W (x, x0 )U = tr ρW (Ax) tr ρ0 W (A0 x) . (3.74) £ ¤ £ Hence T W (x) = f (x)W (Ax) with f (x) = tr ρ0 W (A0 x)]. Particular examples for Gaussian channels in the case of one degree of freedom are attenuation and amplification channels [113, 116]. They are given in terms of a real parameter k 6= 1 by R2 3 x 7→ Ax = kx ∈ R2 p R2 3 x 7→ A0 x = 1 − k 2 x ∈ R2 < 1, (3.75) for k < 1 and R2 3 (q, p) 7→ A0 (q, p) = (κq, −κp) ∈ R2 with κ = p k2 − 1 (3.76) for k > 1. If the environment is initially in a thermal state ρNe (cf. Equation (3.67)) this leads to · µ 2 ¶ ¸ £ ¤ 1 |k − 1| T W (x) = exp (3.77) + Nc x2 W (kx), 2 2 51 3.3. Quantum mechanics in phase space e . If we start initially with a thermal state ρN it is where we have set Nc = |k 2 − 1|N mapped by T again to a thermal state ρN 0 with mean photon number N 0 given by N 0 = k 2 N + max{0, k 2 − 1} + Nc . (3.78) If Nc = 0 this means that T amplifies (k > 1) or damps (k < 1) the mean photon number, while Nc > 0 leads to additional classical, Gaussian noise. We will reconsider this channel in greater detail in Chapter 6. Chapter 4 Basic tasks After we have discussed the conceptual foundations of quantum information we will now consider some of its basic tasks. The spectrum ranges here from elementary processes, like teleportation 4.1 or error correction 4.4, which are building blocks for more complex applications, up to possible future technologies like quantum cryptography 4.6 and quantum computing 4.5. 4.1 Teleportation and dense coding Maybe the most striking feature of entanglement is the fact that otherwise impossible machines become possible if entangled states are used as an additional resource. The most prominent examples are teleportation and dense coding which we want to discuss in this section. 
4.1.1 Impossible machines revisited: Classical teleportation We have already pointed out in the introduction that classical teleportation, i.e. transmission of quantum information over a classical information channel is impossible. With the material introduced in the last two chapters it is now possible to reconsider this subject in a slightly more mathematical way, which makes the following treatment of entanglement enhanced teleportation more transparent. To “teleport” the state ρ ∈ B ∗ (H) Alice performs a measurement (described by a POV measure E1 , . . . , EN ∈ B(H)) on her system and gets a value x ∈ X = {1, . . . , N } with probability px = tr(Ex ρ). These data she communicates to Bob and he prepares a B(H) system in the state ρx . P Hence the overall state Bob gets if the experiment is repeated many times is: ρe = x∈X tr(Ex ρ)ρx (cf. Figure 1.1). The latter can be rewritten as the composition D∗ E∗ B ∗ (H) −−→ C(X)∗ −−→ B ∗ (H)∗ of the channels C(X) 3 f 7→ E(f ) = X x∈X and C ∗ (X) 3 p 7→ D ∗ (p) = f (x)Ex ∈ B(H) X x∈X px ρx ∈ B ∗ (H), (4.1) (4.2) (4.3) i.e. ρe = D ∗ E ∗ (ρ) and this Equation makes sense even if X is not finite. The teleportation is successful if the output state ρe can not be distinguished from the input state ρ by any statistical experiment, i.e. if D ∗ E ∗ (ρ) = ρ. Hence the impossibility of classical teleportation can be rephrased simply as ED 6= Id for all observables E and all preparations D. 4.1.2 Entanglement enhanced teleportation Let us now change our setup slightly. Assume that Alice wants to send a quantum state ρ ∈ B ∗ (H) to Bob and that she shares an entangled state σ ∈ B ∗ (K ⊗ K) and an ideal classical communication channel C(X) → C(X) with him. Alice can perform a measurement E : C(X) → B(H ⊗ K) on the composite system B(H ⊗ K) consisting of the particle to teleport (B(H)) and her part of the entangled system (B(K)). Then she communicates the classical data x ∈ X to Bob and he operates with the parameter dependent operation D : B(H) → B(K) ⊗ C(X) appropriately on his particle (cf. Figure 4.1). Hence the overall procedure can be described by the 4.1. Teleportation and dense coding 53 channel T = (E ⊗ Id)D, or in analogy to (4.1) E ∗ ⊗Id D∗ B ∗ (H ⊗ K⊗2 ) −−−−→ C ∗ (X) ⊗ B ∗ (K) −−→ B ∗ (H). (4.4) The teleportation of ρ is successful if ¡ ¢ T ∗ (ρ ⊗ σ) := D ∗ (E ∗ ⊗ Id)(ρ ⊗ σ) = ρ (4.5) holds, in other words if there is no statistical measurement which can distinguish the final state T ∗ (ρ ⊗ σ) of Bob’s particle from the initial state ρ of Alice’s input system. The two channels E and D and the entangled state σ form a teleportation scheme if Equation (4.5) holds for all states ρ of the B(H) system, i.e. if each state of a B(H) system can be teleported without loss of quantum information. Assume now that H = K = Cd and X = {0, . . . , d2 − 1} holds. In this case we can define a teleportation scheme as follows: The entangled state shared by Alice and Bob is a maximally entangled state σ = |ΩihΩ| and Alice performs a measurement which is given by the one dimensional projections Ej = |Φj ihΦj |, where Φj ∈ H ⊗ H, j = 0, . . . , d2 − 1 is a basis of maximally entangled vectors. If her result is j = 0, . . . , d2 − 1 Bob has to apply the operation τ 7→ Uj∗ τ Uj on his partner of the entangled pair, where the Uj ∈ B(H), j = 0, . . . , d2 − 1 are an orthonormal family of unitary operators, i.e. tr(Uj∗ Uk ) = dδjk . 
Hence the parameter dependent operation D has the form (in the Schrödinger picture): ∗ ∗ ∗ C (X) ⊗ B (H) 3 (p, τ ) 7→ D (p, τ ) = 2 dX −1 j=0 pj Uj∗ τ Uj ∈ B ∗ (H). Therefore we get for T ∗ (ρ ⊗ σ) from Equation (4.5) £ ¤ £ ¤ tr T ∗ (ρ ⊗ σ)A = tr (E ⊗ Id)∗ (ρ ⊗ σ)D(A) 2 dX −1 i £ tr12 |Φj ihΦj |(ρ ⊗ σ) Uj∗ AUj = tr (4.6) (4.7) (4.8) j=0 = 2 dX −1 j=0 £ ¤ tr (ρ ⊗ σ)|Φj ihΦj | ⊗ (Uj∗ AUj ) (4.9) here tr12 denotes the partial trace over the first two tensor factors (= Alice’s qubits). If Ω, the Φj and the Uj are related by the equation Φj = (Uj ⊗ 1I)Ω Alice ρ E (4.10) Bob x∈X ρ Dx σ Figure 4.1: Entanglement enhanced teleportation 4. Basic tasks 54 it is a straightforward calculation to show that T ∗ (ρ ⊗ σ) = ρ holds as expected [231]. If d = 2 there is basically a unique choice: the Φj , j = 0, . . . , 3 are the four Bell states (cf. Equation (3.3), Ω = Φ0 and the Uj are the identity and the three Pauli matrices. In this way we recover the standard example for teleportation, published for the first time in [19]. The first experimental realizations are [33, 31]. 4.1.3 Dense coding We have just shown how quantum information can be transmitted via a classical channel, if entanglement is available as an additional resource. Now we are looking at the dual procedure: transmission of classical information over a quantum channel. To send the classical information x ∈ X = {1, . . . , n} to Bob, Alice can prepare a d-level quantum system in the state ρx ∈ B ∗ (H), sends it to Bob and he measures an observable given by positive operators E1 , . . . , Em . The probability for Bob to receive the signal y ∈ X if Alice has sent x ∈ X is tr(ρx Ey ) and this defines a classical information channel by (cf. Subsection 3.2.3) C ∗ (X) 3 p 7→ ¡P x∈X p(x) tr(ρx E1 ), . . . , P x∈X p(x) tr(ρx Em ) ¢ ∈ C ∗ (X). (4.11) To get an ideal channel we just have to choose mutually orthogonal pure states ρx = |ψx ihψx |, x = 1, . . . , d on Alice’s side and the corresponding one-dimensional projections Ey = |ψy ihψy |, y = 1, . . . , d on Bob’s. If d = 2 and H = C2 it is possible to send one bit classical information via one qubit quantum information. The crucial point is now that the amount of classical information can be increased (doubled in the qubit case) if Alice shares an entangled state σ ∈ S(H ⊗ H) with Bob. To send the classical information x ∈ X = {1, . . . , n} to Bob, Alice operates on her particle with an operation Dx : B(H) → B(H), sends it through an (ideal) quantum channel to Bob and he performs a measurement E1 , . . . , En ∈ B(H ⊗ H) on both particles. The probability for Bob to measure y ∈ X if Alice has send x ∈ X is given by ¤ £ tr (Dx ⊗ Id)∗ (σ)Ey , (4.12) and this defines the transition matrix of a classical communication channel T . If T is an ideal channel, i.e. if the transition matrix (4.12) is the identity, we will call E, D and σ a dense coding scheme (cf. Figure 4.2). In analogy to Equation (4.4) we can rewrite the channel T defined by (4.12) in terms of the composition D ∗ ⊗Id E∗ C ∗ (X) ⊗ B ∗ (H) ⊗ B ∗ (H) −−−−→ B ∗ (H) ⊗ B ∗ (H) −−→ C ∗ (X) Alice x∈X Bob Dx E σ Figure 4.2: Dense coding x∈X (4.13) 4.2. Estimating and copying 55 of the parameter dependent operation D : C ∗ (X) ⊗ B ∗ (H) → B ∗ (H), p ⊗ τ 7→ n X pj Dj (τ ) (4.14) j=1 and the observable E : C(X) → B(H ⊗ H), p 7→ n X p j Ej , (4.15) j=1 i.e. T ∗ (p) = E ∗ ◦ (D∗ ⊗ Id)(p ⊗ σ). The advantage of this point of view is that it works as well for infinite dimensional Hilbert spaces and continuous observables. 
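Before working out the concrete dense coding scheme, the qubit teleportation scheme of Subsection 4.1.2 can be verified numerically in a few lines. The sketch below (Python/NumPy; not part of the original text) uses Ω = Φ0, the Bell basis Φj = (Uj ⊗ 1I)Ω and the unitaries Uj given by the identity and the three Pauli matrices; Bob's correction is implemented as conjugation with Uj, which for these selfadjoint Uj agrees with the operation stated above. The input state is a randomly chosen pure state.

import numpy as np

I2 = np.eye(2, dtype=complex)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.diag([1.0, -1.0]).astype(complex)
U = [I2, sx, sy, sz]                      # orthonormal unitaries, tr(Uj* Uk) = 2 delta_jk

# Maximally entangled vector Omega = Phi_0 and Bell basis Phi_j = (U_j (x) 1I) Omega
Omega = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
Phi = [np.kron(U[j], I2) @ Omega for j in range(4)]

# Random pure input state rho on qubit 1; shared pair (qubits 2,3) in |Omega><Omega|
rng = np.random.default_rng(0)
psi = rng.normal(size=2) + 1j * rng.normal(size=2)
psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())
total = np.kron(rho, np.outer(Omega, Omega.conj()))        # state of qubits (1,2,3)

# Alice measures qubits (1,2) in the Bell basis; Bob conjugates qubit 3 with U_j.
rho_out = np.zeros((2, 2), dtype=complex)
for j in range(4):
    M = np.kron(np.outer(Phi[j], Phi[j].conj()), I2)       # |Phi_j><Phi_j| (x) 1I
    post = M @ total @ M
    cond = np.einsum('arac->rc', post.reshape(4, 2, 4, 2)) # partial trace over qubits 1,2
    rho_out += U[j] @ cond @ U[j].conj().T                 # Bob's correction

print(np.allclose(rho_out, rho))    # True: the output state equals the input state

Let us now return to dense coding.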
Finally let us again consider the case where H = Cd and X = {1, . . . , d2 }. If we choose as in the last paragraph a maximally entangled vector Ω ∈ H ⊗ H, an orthonormal base Φx ∈ H ⊗ H, j = x, . . . , d2 of maximally entangled vectors and an orthonormal family Ux ∈ B(H ⊗ H), x = 1, . . . , d2 of unitary operators, we can construct a dense coding scheme as follows: Ex = |Φx ihΦx |, Dx (A) = Ux∗ AUx and σ = |ΩihΩ|. If Ω, the Φx and the Ux are related by Equation (4.10) it is easy to see that we really get a dense coding scheme [231]. If d = 2 holds, we have to set again the Bell basis for the Φx , Ω = Φ0 and the identity and the Pauli matrices for the Ux . We recover in this case the standard example of dense coding proposed in [27] and we see that we can transfer two bits via one qubit, as stated above. 4.2 Estimating and copying The impossibility of classical teleportation can be rephrased as follows: It is impossible to get complete information about the state ρ of a quantum system by one measurement on one system. However, if we have many systems, say N , all prepared in the same state ρ it should be possible to get (with a clever measuring strategy) as much information on ρ as possible, provided N is large enough. In this way we can circumvent the impossibility of devices like classical teleportation or quantum copying at least in an approximate way. 4.2.1 Quantum state estimation To discuss this idea in a more detailed way consider a number N of d-level quantum systems, all of them prepared in the same (unknown) state ρ ∈ B ∗ (H). Our aim is to estimate the state ρ by measurements on the compound system ρ ⊗N . This is described in terms of an observable (in the following called “estimator”) E N : C(S) → B(H⊗N ) with values in the quantum state space S = S(H). Since S is not finite, we have to apply here the machinery introduced in Section 3.2.4, i.e. C(S) is the algebra of continuous functions, and the probability to get a measuring value in an open subset ω ⊂ S is given by (cf. Section 3.2.4) ¡ ¢ KN (ω) = sup{tr EN (f )ρ | f ∈ C(X), 0 ≤ f ≤ 1I, supp f ⊂ ω}. (4.16) For many practical purposes it is sufficient to consider only those estimators which admits a finite set of possible outcomes. In this case everything reduces to the finite dimensional setup introduced in Chapter 2 and EN becomes X EN (f ) = f (σ)EN,σ (4.17) σ∈X where X ⊂ S is a finite subset of the quantum state space and EN,σ , σ ∈ X is a POV measure. For such a discrete observable the probability KN (ω) simplifies to X ¡ ¢ KN (ω) = tr EN (ω)ρ⊗N with EN (ω) = EN,σ . (4.18) σ∈XN ∩ω 4. Basic tasks 56 However, to discuss structural problems, e.g. a quantitative analysis like the search for an “optimal estimator” (cf. Chapter 10) a restriction to the special case from Equation (4.17) is inappropriate. The criterion for a good estimator EN is that for any one-particle density operator ρ, the value measured on a state ρ⊗N is likely to be close to ρ, i.e. that the probability KN (ω) is small if ω ⊂ S(H) is the complement of a small ball around ρ. Of course, we will look at this problem for large N . So the task is to find a whole “estimation scheme”, i.e. a sequence of observables EN , N = 1, 2, . . ., which is “asymptotically exact”, i.e. error probabilities should vanish in the limit N → ∞. Variants of this scheme arise if have some a priori knowledge about the input state ρ. E.g. 
if we know that ρ is an element of a distinguished subset Y of the state space S(H) it is sufficient to control the error probabilities for each ρ ∈ Y . Hence we can improve the estimation quality for each ρ ∈ Y at the expense of the usefulness of the estimates for ρ 6∈ Y . The most relevant special case is estimation of pure states (i.e. Y is the set of pure states). It is much better understood than the general problem and it admits a rather simple optimal solution which is closely related to the corresponding cloning problem; we will come back to this circle of questions in a more quantitative way in Chapter 10. Another special case, called “quantum hypothesis testing”, arises if Y is finite. The task is to distinguish between finitely many states in terms of a measurement on N equally prepared systems; cf. [109] for an overview and [173, 107, 166] and the references therein for more recent results. The most direct way to get an asymptotically exact estimation scheme is to perform a sequence of measurements on each of the N input systems separately. A finite set of observables which leads to a successful estimation strategy is usually called a “quorum” (cf. e.g. [150, 226]). E.g. for d = 2 we can perform alternating measurements of the three spin components. If ρ = 21 (1I + ~x · ~σ ) is the Bloch representation of ρ (cf. Subsection 2.1.2) we see that the expectation values of these measurements are given by 12 (1 + xj ). Hence we get an arbitrarily good estimate if N is large enough (we leave the construction of the observable EN associated to this scheme as an easy exercise to the reader). A similar procedure is possible for arbitrary d if we consider the generalized Bloch representation for ρ (see again Subsection 2.1.2). There are however more efficient strategies based on “entangled” measurements (i.e. the EN (σ) can not be decomposed into pure tensor products) on the whole input system ρ⊗N (e.g. [218, 137]). Somewhat in between are “adaptive schemes” [89] consisting of separate measurements but the j th measurement depend on the results of (j − 1)th . We will reconsider this circle of questions in a more quantitative way in Chapter 10. 4.2.2 Approximate cloning By virtue of the no-cloning theorem [239], it is impossible to produce M perfect copies of a d-level quantum system if N < M input systems in the common (unknown) state ρ⊗N are given. More precisely there is no channel TM N : B(H⊗M ) → ⊗N ∗ ) = ρ⊗M holds for all ρ ∈ S(H). Using state estimaB(H⊗N ) such that TM N (ρ tion, however, it is easy to find a device TM N which produces at least approximate copies which become exact in the limit N, M → ∞: If ρ⊗N is given, we measure the observable EN and get the classical data σ ∈ S(H), which we use subsequently to prepare M systems in the state σ ⊗M . In other words, TM N has the form Z ∗ ⊗N B (H ) 3 ρ 7→ σ ⊗M KN (dσ) ∈ B ∗ (H⊗M ) (4.19) S where KN denotes the probability measure from Equation (4.16). If EN is discrete as in Equation (4.17) this channel simplifies to X B ∗ (H⊗N ) 3 ρ 7→ tr(EN,σ ρ)σ ⊗M ∈ B ∗ (H⊗M ). (4.20) σ∈XN 4.3. Distillation of entanglement 57 We immediately see that the probability to get wrong copies coincides exactly with the error probability of the estimator EN . This shows first that we get exact copies in the limit N → ∞ and second that the quality of the copies does not depend on the number M of output systems, i.e. the asymptotic rate limN,M →∞ M/N of output systems per input system can be arbitrary large. 
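For a single qubit the estimate-and-reprepare strategy just described is easy to simulate. The following sketch (Python/NumPy; not part of the original text) measures the three spin components on N/3 copies each, estimates the Bloch vector (rescaling the raw estimate onto the Bloch ball if necessary, which is one of several possible choices), reprepares the estimated state and reports its trace-norm distance to the true state; the error shrinks with N and is manifestly independent of the number M of copies prepared from the estimate.

import numpy as np

rng = np.random.default_rng(1)
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sy = np.array([[0, -1j], [1j, 0]])
sz = np.diag([1.0, -1.0]).astype(complex)
paulis = [sx, sy, sz]

def estimate_bloch(rho, N):
    """Measure sigma_x, sigma_y, sigma_z on N/3 copies each; return the estimated
    Bloch vector, projected back onto the Bloch ball if the estimate leaves it."""
    x_hat = np.zeros(3)
    for j, s in enumerate(paulis):
        p_plus = (1 + np.trace(rho @ s).real) / 2          # Prob(outcome +1)
        outcomes = rng.choice([1, -1], size=N // 3, p=[p_plus, 1 - p_plus])
        x_hat[j] = outcomes.mean()
    return x_hat / max(np.linalg.norm(x_hat), 1.0)

def bloch_state(x):
    return 0.5 * (np.eye(2) + x[0] * sx + x[1] * sy + x[2] * sz)

x_true = np.array([1.0, 1.0, 1.0]) / np.sqrt(3)            # a pure test state
rho_true = bloch_state(x_true)

for N in [30, 300, 30000]:
    sigma = bloch_state(estimate_bloch(rho_true, N))       # the reprepared state
    # Every one of the M output systems is a copy of sigma, so the one-copy error
    # does not depend on M; it only shrinks with the number N of input systems.
    dist = 0.5 * np.abs(np.linalg.eigvalsh(rho_true - sigma)).sum()
    print(N, round(dist, 3))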
Note that the latter (independence of cloning quality from the output number M ) is a special feature of the estimation based cloning scheme just introduced. In Chapter 9 we will encounter cloning maps which are not based on estimation and repreparation and which produce better copies, as long as the required number M of outputs is finite. Similar to the estimation problem we can improve the quality of the outcomes if we can use a priori information about the state ρ to be cloned. The most relevant example arises again if ρ is pure. A detailed discussion of this special case, including the construction of the (unique) optimal pure state cloner, will be given in Chapter 9. The fact that the cloning map from Equation (4.19) uses classical data at an intermediate step allows further generalizations. Instead of just preparing M systems in the state σ detected by the estimator, we can apply first an arbitrary transformation F : S(H) → S(H) on the density matrix σ and prepare F (σ)⊗M instead of σ ⊗M . In this way we get the channel (cf. Figure 4.3) Z ∗ ⊗N F (σ)⊗M KN (dσ) ∈ B ∗ (H⊗M ), (4.21) B (H ) 3 ρ 7→ S i.e. a physically realizable device which approximates the impossible machine F . If the estimator is discrete as in Equation (4.17) we get similarly to (4.20) B ∗ (H⊗N ) 3 τ 7→ X σ∈XN tr(EN,σ τ )F (σ)⊗M ∈ B ∗ (H⊗M ). (4.22) The probability to get a bad approximation of the state F (ρ)⊗M (if the input state was ρ⊗N ) is again given by the error probability of the estimator and we get a perfect realization of F at arbitrary rate as M, N → ∞. There are in particular two interesting tasks which become possible this way: The first is the “universal not gate” which associates to each pure state of a qubit the unique pure state orthogonal to it [46]. This is a special example of a antiunitarily implemented symmetry operation and therefore not completely positive. The second example is the purification of states [55, 138]. Here it is assumed that the input states were once pure but have passed later on a depolarizing channel |φihφ| 7→ ϑ|φihφ| + (1 − ϑ)1I/d. If ϑ > 0 this map is invertible but its inverse does not describe an allowed quantum operation because it maps some density operators to operators with negative eigenvalues. Hence the reversal of noise is not possible with a one shot operation but can be done with high accuracy if enough input systems are available. A detailed quantitative analysis is again postponed to Chapter 11. 4.3 Distillation of entanglement Let us now return to entanglement. We have seen in Section 4.1 that maximally entangled states play a crucial role for processes like teleportation and dense coding. In practice however entanglement is a rather fragile property: If Alice produces a pair of particles in a maximally entangled state |ΩihΩ| ∈ S(HA ⊗ HB ) and distributes one of them over a great distance to Bob, both end up with a mixed state ρ which contains much less entanglement then the original and which can not be used any longer for teleportation. The latter can be seen quite easily if we try to apply the qubit teleportation scheme (Subsection 4.1.2) with a non-maximally entangled isotropic state (Equation (3.15) with λ > 0) instead of Ω. 4. Basic tasks 58 F (σ)⊗M ∈ B ∗ (H⊗M ) Preparation Estimation ρ⊗N ∈ B ∗ (H⊗N ) F classical data σ∈X⊂S F (σ) ∈ S Figure 4.3: Approximating the impossible machine F by state estimation. 
Hence the question arises, whether it is possible to recover |ΩihΩ| from ρ, or, following the reasoning from the last section, at least a small number of (almost) maximally entangled states from a large number N of copies of ρ. However since the distance between Alice and Bob is big (and quantum communication therefore impossible) only LOCC operations (Section 3.2.6) are available for this task (Alice and Bob can only operate on their respective particles, drop some of them and communicate classically with one another). This excludes procedures like the purification scheme just sketched, because we would need “entangled” measurements to get an asymptotically exact estimate for the state ρ. Hence we need a sequence of LOCC channels ⊗N ⊗N ⊗ HB ) TN : B(CdN ⊗ CdN ) → B(HA such that (4.23) kTN∗ (ρ⊗N ) − |ΩN ihΩN |k1 → 0, for N → ∞ dN (4.24) dN holds, with a sequence of maximally entangled vectors ΩN ∈ C ⊗ C . Note that ⊗N ⊗N ∼ we have to use here the natural isomorphism HA ⊗ HB = (HA ⊗ HB )⊗N , i.e. we ⊗N have to reshuffle ρ such that the first N tensor factors belong to Alice (HA ) and the last N to Bob (HB ). If confusion can be avoided we will use this isomorphism in the following without a further note. We will call a sequence of LOCC channels, TN satisfying (4.24) with a state ρ ∈ S(HA ⊗ HB ) a distillation scheme for ρ and ρ is called distillable if it admits a distillation scheme. The asymptotic rate with which maximally entangled states can be distilled with a given protocol is lim inf log2 (dN )/N. n→∞ (4.25) This quantity will become relevant in the framework of entanglement measures (Chapter 5). 4.3.1 Distillation of pairs of qubits Concrete distillation protocols are in general rather complicated procedures. We will sketch in the following how any pair of entangled qubits can be distilled. The first step is a scheme proposed for the first time by Bennett et. al. [20]. It can be applied if the maximally entangled fraction F (Equation (3.4)) is greater than 1/2. As indicated above, we assume that Alice and Bob share a large amount of pairs 4.3. Distillation of entanglement 59 in the state ρ, so that the total state is ρ⊗N . To obtain a smaller number of pairs with a higher F they proceed as follows: 1. First they take two pairs (let us call them pair 1 and pair 2), i.e. ρ ⊗ ρ and apply to each of them the twirl operation PUŪ associated to isotropic states (cf. Equation (3.18)). This can be done by LOCC operations in the following way: Alice selects at random (respecting the Haar measure on U(2)) a unitary operator U applies it to her qubits and sends to Bob which transformation she has chosen; then he applies Ū to his particles. They end up with two isotropic states ρe ⊗ ρe with the same maximally entangled fraction as ρ. 2. Each party performs the unitary transformation UXOR : |ai ⊗ |bi 7→ |ai ⊗ |a + b mod 2i (4.26) on his/her members of the pairs. 3. Finally Alice and Bob perform local measurements in the basis |0i, |1i on pair 1 and discards it afterwards. If the measurements agree, pair 2 is kept and has a higher F. Otherwise pair 2 is discarded as well. If this procedure is repeated over and over again, it is possible to get states with an arbitrarily high F, but we have to sacrifice more and more pairs and the asymptotic rate is zero. To overcome this problem we can apply the scheme above until F(ρ) is high enough such that 1 + tr(ρ ln ρ) ≥ 0 holds and then we continue with another scheme called hashing [24] which leads to a nonvanishing rate. 
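The effect of one round of this recurrence step can be checked by brute force. The sketch below (Python/NumPy; not part of the original text) starts directly from two isotropic qubit pairs, so the twirl of step 1 is already built in, applies the bilateral XOR of step 2 and postselects on agreeing measurement results in step 3. One convention is left implicit above and is fixed here as an assumption: the XOR is taken with the kept pair (pair 2) as its first argument a and the measured pair (pair 1) as its second argument b.

import numpy as np

def isotropic_pair(F):
    """Two-qubit isotropic state with maximally entangled fraction F."""
    Omega = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
    P = np.outer(Omega, Omega.conj())
    return F * P + (1 - F) * (np.eye(4) - P) / 3

def permute_qubits(rho, perm):
    """Reorder the tensor factors of an n-qubit density matrix."""
    n = len(perm)
    R = rho.reshape((2,) * (2 * n))
    R = R.transpose(tuple(perm) + tuple(p + n for p in perm))
    return R.reshape(2 ** n, 2 ** n)

F = 0.75
rho = isotropic_pair(F)
# Total state of (A1,B1,A2,B2), reordered to (A2,A1,B2,B1) so that both local
# XOR gates act on adjacent qubits.
total = permute_qubits(np.kron(rho, rho), [2, 0, 3, 1])

CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)
U = np.kron(CNOT, CNOT)              # Alice: A2 -> A1,  Bob: B2 -> B1
total = U @ total @ U.conj().T

# Step 3: measure A1 and B1 in the computational basis, keep (A2,B2) if they agree.
P_agree = np.zeros((16, 16), dtype=complex)
for m in (0, 1):
    ket = np.zeros(2); ket[m] = 1.0
    proj = np.outer(ket, ket)
    P_agree += np.kron(np.kron(np.eye(2), proj), np.kron(np.eye(2), proj))
post = P_agree @ total @ P_agree
p_success = np.trace(post).real

# Partial trace over A1 and B1: the surviving pair (A2,B2)
kept = np.einsum('akblckdl->abcd', post.reshape((2,) * 8)).reshape(4, 4) / p_success

Omega = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)
F_new = (Omega.conj() @ kept @ Omega).real
print(p_success, F, F_new)           # F_new > F for any F > 1/2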
If finally F(ρ) ≤ 1/2 but ρ is entangled, Alice and Bob can increase F for some of their particles by filtering operations [17, 95]. The basic idea is that Alice applies an instrument T : C(X) ⊗ B(H) → B(H) with two possible outcomes (X = {1, 2}) −1 ∗ to her particles. Hence £ ∗ the ¤ state becomes ρ 7→ px (Tx ⊗ Id) (ρ), x = 1, 2 with probability px = tr Tx (ρ) (cf. Subsection 3.2.5 in particular Equation (3.53) for the definition of Tx ). Alice communicates her measuring result x to Bob and if x = 1 they keep the particle otherwise (x = 2) they discard it. If the instrument T was correctly chosen Alice and Bob end up with a state ρe with higher maximally entangled fraction. To find an appropriate T firstly note that there are ψ ∈ H ⊗ H with hψ, (Id ⊗Θ)ρψi ≤ 0 (this follows from Theorem 2.4.3 since ρ is by assumption entangled) and second that we can write each vector ψ ∈ H ⊗ H as (Xψ ⊗ 1I)Φ0 with the Bell state Φ0 and an appropriately chosen operator Xψ (see Subsection 3.1.1). Now we can define T in terms of the two operations T1 , T2 (cf. Equation (3.55)) with (4.27) T1 (A) = Xψ∗ AXψ−1 , Id −T1 = T2 It is straightforward to check that we end up with ρe = (T ⊗ Id)∗ (ρ) £ x ¤ tr (Tx ⊗ Id)∗ (ρ) (4.28) such that F(e ρ) > 1/2 holds and we can continue with the scheme described in the previous paragraph. 4.3.2 Distillation of isotropic states Consider now an entangled isotropic state ρ in d dimensions, i.e. we have H = C d and 0 ≤ tr(Fe ρ) ≤ 1 (with the operator Fe of Subsection 3.1.3). Each such state is distillable via the following scheme [36, 117]: First Alice and Bob apply a filter operation T : C(X) ⊗ B(H) → B(H) on their respective particle given by T1 (A) = P AP , T2 = 1−T1 where P is the projection onto a two dimensional subspace. If both measure the value 1 they get a qubit pair in the state ρe = (T1 ⊗ T1 )(ρ). Otherwise 4. Basic tasks 60 they discard their particles (this requires classical communication). Obviously the state ρe is entangled (this can be easily checked), hence they can proceed as in the previous Subsection. The scheme just proposed can be used to show that each state ρ which violates the reduction criterion (cf. Subsection 2.4.3) can be distilled [117]. The basic idea is to project ρ with the twirl PUŪ (which is LOCC as we have seen above; cf. Subsection 4.3.1) to an isotropic state PUŪ (ρ) and to apply the procedure from the last paragraph afterwards. We only have to guarantee that PUŪ (ρ) is entangled. To this end use a vector ψ ∈ H ⊗ H with hψ, (1I ⊗ tr1 (ρ) − ρ)ψi < 0 (which exists by assumption since ρ violates the reduction criterion) and to apply the filter operation given by ψ via Equation (4.27). 4.3.3 Bound entangled states It is obvious that separable states are not distillable, because a LOCC operation map separable states to separable states. However is each entangled state distillable? The answer, maybe somewhat surprising, is no and an entangled state which is not distillable is called bound entangled [119] (distillable states are sometimes called free entangled, in analogy to thermodynamics). Examples of bound entangled states are all ppt entangled states [119]: This is an easy consequence of the fact that each separable channel (and therefore each LOCC channel as well) maps ppt states to ppt states (this is easy to check), but a maximally entangled state is never ppt. It is not yet known, whether bound entangled npt states exists, however, there are at least some partial results: 1. It is sufficient to solve this question for Werner states, i.e. 
if we can show that each npt Werner state is distillable it follows that all npt states are distillable [117]. 2. Each npt Gaussian state is distillable [92]. 3. For each N ∈ N there is an npt Werner state ρ which is not “N -copy distillable”, i.e. hψ, ρ⊗N ψi ≥ 0 holds for each pure state ψ with exactly two Schmidt summands [72, 78]. This gives some evidence for the existence of bound entangled npt states because ρ is distillable iff it is N -copy distillability for some N [119, 72, 78]. Since bound entangled states can not be distilled, they can not be used for teleportation. Nevertheless bound entanglement can produce a non-classical effect, called “activation of bound entanglement” [125]. To explain the basic idea, assume that Alice and Bob share one pair of particles in a distillable state ρf and many particles in a bound entangled state ρb . Assume in addition that ρf can not be used for teleportation, or, in other words if ρf is used for teleportation the particle Bob receives is in a state σ 0 which differs from the state σ Alice has send. This problem can not be solved by distillation, since Alice and Bob share only one pair of particles in the state ρf . Nevertheless they can try to apply an appropriate filter operation on ρ to get with a certain probability a new state which leads to a better quality of the teleportation (or, if the filtering fails, to get nothing at all). It can be shown however [120] that there are states ρf such that the error occuring in this process (e.g. measured by the trace norm distance of σ and σ 0 ) is always above a certain threshold. This is the point where the bound entangled states ρ b come into play: If Alice and Bob operate with an appropriate protocol on ρf and many copies of ρb the distance between σ and σ 0 can be made arbitrarily small (although the probability to be successful goes to zero). Another example for an activation of bound entanglement is related to distillability of npt states: If Alice and Bob share a certain ppt-entangled state as additional resource each npt state ρ becomes distillable (even if ρ is bound entangled) [80, 145]. For a more detailed survey of the role of bound entanglement and further references see [123]. 4.4 Quantum error correction If we try to distribute quantum information over large distances or store it for a long time in some sort of “quantum memory” we always have to deal with “de- 4.4. Quantum error correction 61 coherence effects”, i.e. unavoidable interactions with the environment. This results in a significant information loss, which is particularly bad for the functioning of a quantum computer. Similar problems arise as well in a classical computer, but the methods used there to circumvent the problems can not be transferred to the quantum regime. E.g. the most simple strategy to protect classical information against noise is redundancy: instead of storing the information once we make three copies and decide during readout by a majority vote which bit to take. It is easy to see that this reduces the probability of an error from order ² to ²2 . Quantum mechanically however such a procedure is forbidden by the no cloning theorem. Nevertheless quantum error correction is possible although we have to do it in a more subtle way than just copying; this was observed for the first time independently in [48] and [201]. Let us consider first the general scheme and assume that T : B(K) → B(K) is a noisy quantum channel. 
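The classical claim just made, that triple redundancy with a majority vote reduces the error probability from order ε to order ε², is a one-line computation; a quick sketch (Python; purely classical, not part of the original text):

import numpy as np

def majority_error(eps):
    # The majority vote fails iff at least two of the three copies are flipped.
    return 3 * eps**2 * (1 - eps) + eps**3

for eps in [0.1, 0.01, 0.001]:
    print(eps, majority_error(eps))          # roughly 3*eps^2 for small eps

# A Monte Carlo check of the same number:
rng = np.random.default_rng(2)
eps = 0.05
flips = rng.random((100000, 3)) < eps
print(majority_error(eps), (flips.sum(axis=1) >= 2).mean())

Back to the quantum setting with the noisy channel T.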
To send quantum systems of type B(H) undisturbed through T we need an encoding channel E : B(K) → B(H) and a decoding channel D : B(H) → B(K) such that ET D = Id holds, respectively D∗ T ∗ E ∗ = Id in the Schrödinger picture; cf. Figure 4.4. 4.4.1 The theory of Knill and Laflamme To get a more detailed description of the structure of the channels E and D we will give in the following a short review of the theory of error correcting codes in the sense of Knill and Laflamme [143]. To this end start from the error corrector’s dream, namely the situation in which all the errors happen in another part of the system, where we do not keep any of the precious quantum information. This will help us to characterize the structure of the kind of errors which such a scheme may tolerate, or ‘correct’. Of course, the dream is just a dream for the situation we are mainly interested in: several parallel channels, each of which may be affected by errors. But the splitting of the system into subsystems, mathematically the decomposition of the Hilbert space of the total system into a tensor product is something we may change by a suitable unitary transformation. This is then precisely the role of the encoding and decoding operations. The Knill-Laflamme theory is precisely the description of the situation where such a unitary, and hence a coding/decoding scheme exists. Constructing such schemes, however, is another matter, to which we will turn in the next subsection. So consider a system split into H = Hg ⊗ Hb , where the indices g and b stand for ‘good’ and ‘bad’. We prepare the system in a state ρ ⊗ |ΩihΩ|, where ρ is the quantum state we want Pto protect. Now come the errors in the form of a completely positive map T (A) = i Fi∗ AFi . Then according to the error corrector’s dream, we T Id Id Id Decoding Encoding ρ ρ Id Figure 4.4: Five bit quantum code: Encoding one qubit into five and correcting one error. 4. Basic tasks 62 would just have to discard the bad system, and get the same state ρ as before. The hardest demands for realizing this come from pure states ρ = |φihφ|, because the only way that the restriction to the good system can again be |φihφ| is that the state after errors factorizes, i.e. X T ∗ (|φ ⊗ Ωihφ ⊗ Ω|) = |Fi (φ ⊗ Ω)ihFi (φ ⊗ Ω)| = |φihφ| ⊗ σ . (4.29) i This requires that Fi (φ ⊗ Ω) = φ ⊗ Φi , (4.30) where Φi ∈ Hb is some vector, which must be independent of φ if such an equation is to hold for all φ ∈ Hg . Conversely, condition (4.30) implies (4.29) for every pure state |φihφ| and, by convex combination, for every state ρ. Two remarks are in order. Firstly, we have not required that Fi = 1I ⊗ Fi0 . This would be equivalent to demanding that this scheme works with every Ω, or indeed with every (possibly mixed) initial state of the bad system. This would be much too strong for a useful theory of codes. So later on we must insist on a proper initialization of the bad subsystem by a suitable encoding. Secondly, if we have the condition (4.30) for the Kraus operators of some channel T , then it also holds for all channels whose Kraus operators can be written as linear combinations of the F i . In other words, the “set of correctible errors” is naturally identified with the vector space of operators F such that there is a vector Φ ∈ Hb with F (φ ⊗ Ω) = φ ⊗ Φ for all φ ∈ Hg . This space will be called the maximal error space of the coding scheme, and will be denoted by Emax . Usually, a code is designed for a given error space E. 
Then the statement that these given errors are corrected simply becomes E ⊂ Emax . The key observation, however, is that the space of errors is a vector space in a natural way, i.e., if we can correct two types of errors, then we can also correct their superposition. Let us now consider the situation in which we want to send states of a small system with Hilbert space H1 through a channel T : B(H2 ) → B(H2 ). The Kraus operators of T lie in an error space E ⊂ B(H2 ), which we assume to be given. No more assumptions will be made about T . Our task is now to devise coding E and decoding D so that ET D is the identity on B(H1 ). The idea is to realize the error corrector’s dream by suitable encoding. The ‘good’ space in that scenario is, of course, the space H1 . We are looking for a way to write H2 ∼ = H1 ⊗ Hb . Actually, an isomorphism may be asking too much, and we look for an isometry U : H1 ⊗ Hb → H2 . The encoding, written best in the Schrödinger picture, is tensoring with an initial state Ω as before, but now with an additional twist by U : E ∗ (ρ) = U (ρ ⊗ |ΩihΩ|)U ∗ . (4.31) The decoding operation D is again taking the partial trace over the bad space H b , after reversing of U . Since U is only an isometry and not necessarily unitary we need an additional term to make D unit preserving. The whole operation is best written in the Heisenberg picture: D(X) = U (X ⊗ 1I)U ∗ + tr(ρ0 X)(1I − U U ∗ ) , (4.32) where ρ0 is an arbitrary density operator. These transformations are successful, if the error space (transformed by U ) behaves as before, i.e., if for all F ∈ E there are vectors Φ(F ) ∈ Hb such that, for all φ ∈ H1 F U (φ ⊗ Ω) = U (φ ⊗ Φ(F )) (4.33) holds. This equation describes precisely the elements F ∈ Emax of the maximal error space. 4.4. Quantum error correction 63 P To check that we really have ET D = Id for any channel T (A) = i Fi∗ AFi with Fi ∈ Emax , it suffices to consider pure input states |φihφ|, and the measurement of an arbitrary observable X at the output: £ ¤ X £ ¤ tr |φihφ|ET D(X) = tr U |φ ⊗ Ωihφ ⊗ Ω|U ∗ Fi U (X ⊗ 1I)U ∗ Fi (4.34) i X £ ¤ tr |φ ⊗ Φ(Fi )ihφ ⊗ Φ(Fi )|X ⊗ 1I = i = hφ, Xφi X i kΦ(Fi )k2 = hφ, Xφi. (4.35) (4.36) P In the last equation we have used that i kΦ(Fi )k2 = 1, since E, T , and D each map 1I to 1I. The encoding E defined in Equation (4.31) is of the form E ∗ (ρ) = V ρV ∗ with the encoding isometry V : H1 → H2 given by V φ = U (φ ⊗ Ω) . (4.37) If we just know this isometry and the error space we can reconstruct the whole structure, including the decomposition H2 = H1 ⊗Hb ⊕(1I−U U ∗ )H2 , and hence the decoding operation D. A necessary condition for this, first established by Knill and Laflamme [143], is that, for arbitrary φ1 , φ2 ∈ H1 and error operators F1 , F2 ∈ E: hV φ1 , F1∗ F2 V φ2 i = hφ1 , φ2 iω(F1∗ F2 ) (4.38) ω(F1∗ F2 ) independent of φ1 , φ2 . Indeed, from (4.33) we holds with some numbers immediately get this equation with ω(F1∗ F2 ) = hΦ(F1 ), Φ(F2 )i. Conversely, if the Knill-Laflamme condition (4.38) holds, the numbers ω(F1∗ F2 ) serve as a (possibly degenerate) scalar product on E, which upon completion becomes the ‘bad space’ Hb , such that F ∈ E is identified with a Hilbert space vector Φ(F ). The operator U : φ⊗Φ(F ) = F V φ is then an isometry, as used at the beginning of this section. To conclude, the Knill-Laflamme condition is necessary and sufficient for the existence of a decoding operation. Its main virtue is that we can use it without having to construct the decoding explicitly. 
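As a concrete illustration (not taken from the text), condition (4.38) can be checked numerically for the simplest standard example, the three-qubit repetition code V|0⟩ = |000⟩, V|1⟩ = |111⟩, with the error space spanned by the identity and the three single-site bit flips (phase errors are not in this error space and are not corrected by this code). Since the condition depends (anti)linearly on F1 and linearly on F2, checking a spanning set of the error space suffices. A sketch in Python/NumPy:

import numpy as np

I2 = np.eye(2)
X = np.array([[0.0, 1.0], [1.0, 0.0]])

def embed(op, site):
    """op acting on one of three qubits, identity on the others."""
    factors = [I2, I2, I2]
    factors[site] = op
    return np.kron(np.kron(factors[0], factors[1]), factors[2])

# Encoding isometry V: C^2 -> (C^2)^{x3},  |0> -> |000>,  |1> -> |111>
V = np.zeros((8, 2))
V[0, 0] = 1.0     # |000>
V[7, 1] = 1.0     # |111>

# Error space: identity and the three single-qubit bit flips
errors = [np.eye(8)] + [embed(X, k) for k in range(3)]

ok = True
for F1 in errors:
    for F2 in errors:
        M = V.T @ F1.T @ F2 @ V      # V* F1* F2 V  (all operators are real here)
        ok &= np.allclose(M, M[0, 0] * np.eye(2))
print(ok)   # True: Knill-Laflamme holds, with omega(F1* F2) = M[0,0] independent of phi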
The most relevant example of such a scheme arises if we generalize the classical idea of sending multiple copies in a certain sense. This means we encode the quantum information we want to transmit into n systems which can be send separately through multiple copies of a noisy channel; cf. Figure 4.4. In that case the space H2 is the n-fold tensor product of the system H on which the noisy channels under consideration act. Definition 4.4.1 We say that a coding isometry V : H1 → H⊗n corrects f errors, if it satisfies the Knill-Laflamme condition (4.38) for the error space Ef spanned linearly by all operators of the kind X1 ⊗ X2 ⊗ · · · ⊗ Xn , where at most f places we have a tensor factor Xi 6= 1I. When F1 and F2 are both supported on at most f sites, the product F1∗ F2 , which appears in the Knill-Laflamme condition involves 2f sites. Therefore we can paraphrase the condition by saying that hV φ1 , XV φ2 i = hφ1 , φ2 iω(X) (4.39) for X ∈ E2f . From Kraus operators in Ef we can build arbitrary channels of the kind T = T1 ⊗ T2 ⊗ · · · ⊗ Tn , where at most f of the tensor factors Ti are channels different from id. There are several ways to construct error correcting codes (see e.g. [98, 47, 10]). Most appropriate for our purposes are “Graph codes” [190], because they are quite easy to describe and admit a simple way to check the error correction condition. This will be the subject of the next subsection. 4. Basic tasks 64 4.4.2 Graph codes The general scheme of graph codes works not just for qubits, but for any dimension d of one site spaces. The code will have some number m of input systems, which we label by a set X, and, similarly n output systems, labeled by a set Y . The Hilbert space of the system with label x ∈ X ∪ Y will be denoted by Hx although all these are isomorphic to Cd , and are equipped with a special basis |jx i, where jx ∈ Zd is an integer taken modulo d. As a convenient shorthand, we write jX for a tuple of jx ∈ ZdN , specified for every x ∈ X. Thus the |jX i form a basis of the input space space will be called HX = x∈X Hx of the code. An operator F , say, on the output N localized on a subset Z ⊂ Y of systems, if it is some operator on y∈Z Hy , tensored with the identity operators of the remaining sites. The main ingredient of the code construction is now an undirected graph with vertices X ∪ Y . The links of the graph are given by the adjacency matrix, which we will denote by Γ. When we have |X| = m input vertices and |Y | = n output vertices, this is an (n + m) × (n + m) matrix with Γxy = 1 if node x and y are linked and Γxy = 0 othFigure 4.5: Two graph codes. erwise. We do allow multiple edges, so the entries of Γ will in general be integers, which can also be taken modulo d. It is convenient to exclude self-linked vertices, so we always take Γxx = 0. The graph determines an operator V = VΓ : HX → HY by the formula ¶ µ iπ −n/2 (4.40) jX∪Y Γ · jX∪Y , hjY |VΓ |jX i = d exp d where the exponent contains the matrix element of Γ X jx Γxy jy . jX∪Y · Γ · jX∪Y = (4.41) x,y∈X∪Y Because Γ is symmetric, every term in this sum appears twice, hence adding a multiple of d to any jx or Γxy will change the exponent in (4.40) by a multiple of 2π, and thus will not change VΓ . The error correcting properties of VΓ are summarized in the following result [190]. It is just the Knill-Laflamme condition with a special expression for the form ω, for error operators such that F1∗ F2 is localized on a set Z. 
Theorem 4.4.2 Let Γ be a graph, i.e., a symmetric matrix with entries Γ_xy ∈ Z_d, for x, y ∈ X ∪ Y. Consider a subset Z ⊂ Y, and suppose that the (Y\Z) × (X∪Z)-submatrix of Γ is non-singular, i.e.,

    Σ_{x∈X∪Z} Γ_yx h_x ≡ 0 for all y ∈ Y\Z   implies   h_x ≡ 0 for all x ∈ X∪Z,    (4.42)

where the congruences are mod d. Then, for every operator F ∈ B(H_Y) localized on Z, we have

    V_Γ* F V_Γ = d^{−n} tr(F) 1I_X.    (4.43)

Proof. It will be helpful to use the notation for collections of variables, already present in (4.41), more systematically: for any subset W ⊂ X ∪ Y we write j_W for the collection of variables j_y with y ∈ W. The Kronecker delta δ(j_W) is defined to be zero if j_y ≠ 0 for some y ∈ W, and one otherwise. By j_W · Γ_{WW'} · k_{W'} we mean the suitably restricted sum, i.e., Σ_{x∈W, y∈W'} j_x Γ_xy k_y. The important sets to which we apply this notation are X' = X ∪ Z and Y' = Y \ Z. In particular, the condition on Γ can be written as: Γ_{Y'X'} j_{X'} = 0 implies j_{X'} = 0. Consider now the matrix element

    ⟨j_X | V_Γ* F V_Γ | k_X⟩ = Σ_{j_Y, k_Y} ⟨j_X|V_Γ*|j_Y⟩ ⟨j_Y|F|k_Y⟩ ⟨k_Y|V_Γ|k_X⟩    (4.44)
        = Σ_{j_Y, k_Y} d^{−n} exp[ (iπ/d) ( k_{X∪Y}·Γ·k_{X∪Y} − j_{X∪Y}·Γ·j_{X∪Y} ) ] ⟨j_Y|F|k_Y⟩.

Since F is localized on Z, the matrix element contains a factor δ_{j_y, k_y} for every y ∈ Y \ Z = Y', so we can write ⟨j_Y|F|k_Y⟩ = ⟨j_Z|F|k_Z⟩ δ(j_{Y'} − k_{Y'}). Therefore we can compute the sum (4.44) in stages:

    ⟨j_X | V_Γ* F V_Γ | k_X⟩ = Σ_{j_Z, k_Z} ⟨j_Z|F|k_Z⟩ S(j_{X'}, k_{X'}),    (4.45)

where S(j_{X'}, k_{X'}) is the sum over the Y'-variables, which, of course, still depends on the input variables j_X, k_X and on the variables j_Z, k_Z at the error positions:

    S(j_{X'}, k_{X'}) = d^{−n} Σ_{j_{Y'}, k_{Y'}} δ(j_{Y'} − k_{Y'}) exp[ (iπ/d) ( k_{X∪Y}·Γ·k_{X∪Y} − j_{X∪Y}·Γ·j_{X∪Y} ) ].    (4.46)

The sums in the exponent can each be split into four parts according to the decomposition X' vs. Y'. The terms involving Γ_{Y'Y'} cancel because k_{Y'} = j_{Y'}. The terms involving Γ_{X'Y'} and Γ_{Y'X'} are equal because Γ is symmetric, and together give 2 j_{Y'} · Γ_{Y'X'} · (k_{X'} − j_{X'}). The Γ_{X'X'} terms remain unchanged, but only give a phase factor independent of the summation variables. Hence

    S(j_{X'}, k_{X'}) = d^{−n} exp[ (iπ/d) ( k_{X'}·Γ·k_{X'} − j_{X'}·Γ·j_{X'} ) ] Σ_{j_{Y'}} exp[ (2πi/d) j_{Y'} · Γ_{Y'X'} · (k_{X'} − j_{X'}) ]
        = d^{−n+|Y'|} exp[ (iπ/d) ( k_{X'}·Γ·k_{X'} − j_{X'}·Γ·j_{X'} ) ] δ( Γ_{Y'X'} · (k_{X'} − j_{X'}) )
        = d^{−n+|Y'|} δ( k_{X'} − j_{X'} ).    (4.47)

Here we used at the first equality that the sum is a product of geometric series as they appear in discrete Fourier transforms. At the second equality the main condition of the theorem enters: if Σ_{x∈X'} Γ_yx (k_x − j_x) vanishes for all y ∈ Y', as required by the delta function, then (and only then) the vector k_{X'} − j_{X'} must vanish. But then the two terms in the exponent of the phase factor also cancel. Inserting this result into (4.45), and using that δ(h_{X'}) = δ(h_X) δ(h_Z), we find

    ⟨j_X | V_Γ* F V_Γ | k_X⟩ = δ(j_X − k_X) d^{−n+|Y'|} Σ_{j_Z} ⟨j_Z|F|j_Z⟩ = δ(j_X − k_X) d^{−n} Σ_{j_Y} ⟨j_Y|F|j_Y⟩.

Here the error operator is considered in the first expression as an operator on H_Z, and in the second as an operator on H_Y, by tensoring it with 1I_{Y'}; this cancels the dimension factor d^{|Y'|}. □

All that is left to get an error correcting code is to ensure that the conditions of this Theorem are satisfied sufficiently often. This is evident from combining the above Theorem with Definition 4.4.1.
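Because condition (4.42) is purely combinatorial, it can be checked by brute force for small graphs. The sketch below is my own illustration and not part of the text; the particular graph, a pentagon of output vertices with the single input vertex linked to all of them, is only an assumption made for the example (it need not be one of the codes of Figure 4.5). The script verifies (4.42) for every Z ⊂ Y with at most two elements, which by the corollary stated next means that the associated five-qubit code corrects one error.

```python
import numpy as np
from itertools import combinations, product

d = 2                     # qubit case
X = [0]                   # one input vertex
Y = [1, 2, 3, 4, 5]       # five output vertices

# Adjacency matrix (mod d): pentagon 1-2-3-4-5-1 on the output vertices,
# input vertex 0 linked to every output vertex (an assumed example graph).
n_vert = len(X) + len(Y)
Gamma = np.zeros((n_vert, n_vert), dtype=int)
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 1)] + [(0, y) for y in Y]
for a, b in edges:
    Gamma[a, b] = Gamma[b, a] = 1

def condition_4_42(Gamma, X, Y, Z, d):
    """Check: Gamma_{Y\\Z, X+Z} h = 0 (mod d) only for h = 0."""
    Xp = list(X) + list(Z)             # X' = X together with Z
    Yp = [y for y in Y if y not in Z]  # Y' = Y \ Z
    sub = Gamma[np.ix_(Yp, Xp)]
    for h in product(range(d), repeat=len(Xp)):
        if any(h) and not (sub.dot(h) % d).any():
            return False               # non-trivial kernel vector found
    return True

f = 1
ok = all(condition_4_42(Gamma, X, Y, Z, d)
         for r in range(2 * f + 1)
         for Z in combinations(Y, r))
print(ok)   # True: this graph code corrects f = 1 error on the 5 output qubits
```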
Corollary 4.4.3 Let Γ be a graph as in the previous Theorem, and suppose that the (Y \ Z) × (X ∪ Z)-submatrix of Γ is non-singular for all Z ⊂ Y with up to 2f elements. Then the code associated to Γ corrects f errors.

Two particular examples (which are equivalent!) are given in Figure 4.5. In both cases we have m = 1, n = 5 and f = 1, i.e. one input node, which can be chosen arbitrarily, and five output nodes, and the corresponding codes correct one error.

4.5 Quantum computing

Quantum computing is without a doubt the most prominent and most far reaching application of quantum information theory, since it promises, on the one hand, "exponential speedup" for some problems which are "hard to solve" with a classical computer, and gives completely new insights into classical computing and complexity theory on the other. Unfortunately, an exhaustive discussion would require its own review article. Hence we are only able to give a short overview (see Part II of [172] for a more complete presentation and for further references).

4.5.1 The network model of classical computing

Let us start with a brief (and very informal) introduction to classical computing (for a more complete review and hints for further reading see Chapter 3 of [172]). What we need first is a mathematical model for computation. There are in fact several different choices, and the Turing machine [212] is the most prominent one. More appropriate for our purposes is, however, the so-called network model, since it allows an easier generalization to the quantum case. The basic idea is to interpret a classical (deterministic) computation as the evaluation of a map f : B^N → B^M (where B = {0, 1} denotes the field with two elements) which maps N input bits to M output bits. If M = 1 holds, f is called a boolean function, and for many purposes it is sufficient to consider this special case – each general f is in fact a Cartesian product of boolean functions. Particular examples are the three elementary gates AND, OR and NOT defined in Figure 4.6, and arbitrary algebraic expressions constructed from them, e.g. the XOR gate (x, y) ↦ x + y mod 2, which can be written as (x ∨ y) ∧ ¬(x ∧ y). It is now a standard result of boolean algebra that each boolean function can be represented in this way, and there are in general many possibilities to do this. A special case is the disjunctive normal form of f; cf. [225]. To write such an expression down in the form of equations is, however, somewhat confusing. f is therefore expressed most conveniently in graphical form as a circuit or network, i.e. a graph C with nodes representing elementary gates and edges ("wires") which determine how the gates should be composed; cf. Figure 4.7 for an example.

Figure 4.6: Symbols and definitions for the three elementary gates AND, OR and NOT: AND, ∧: (a, b) ↦ c = ab; OR, ∨: (a, b) ↦ c = a + b − ab; NOT, ¬: a ↦ b = 1 − a.

A classical computation can now be defined as a circuit applied to a specified string of input bits. Variants of this model arise if we replace AND, OR and NOT by another (finite) set G of elementary gates. We only have to guarantee that each function f can be expressed as a composition of elements from G. A typical example for G is the set which contains only the NAND gate (x, y) ↦ x ↑ y = ¬(x ∧ y). Since AND, OR and NOT can be rewritten in terms of NAND (e.g. ¬x = x ↑ x) we can calculate each boolean function by a circuit of NAND gates.
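As a small illustration (my own sketch, not part of the text), these rewritings in terms of NAND can be checked mechanically against the algebraic definitions of Figure 4.6:

```python
# Build AND, OR, NOT from the NAND gate alone and verify the truth tables.
def nand(x, y):            # x "up-arrow" y = not (x and y)
    return 1 - x * y

def not_(x):               # not x = x NAND x
    return nand(x, x)

def and_(x, y):            # x and y = not (x NAND y)
    return nand(nand(x, y), nand(x, y))

def or_(x, y):             # de Morgan: x or y = (not x) NAND (not y)
    return nand(not_(x), not_(y))

def xor(x, y):             # (x or y) and not (x and y), as in the text
    return and_(or_(x, y), not_(and_(x, y)))

for x in (0, 1):
    for y in (0, 1):
        assert and_(x, y) == x * y
        assert or_(x, y) == x + y - x * y
        assert xor(x, y) == (x + y) % 2
print("all truth tables reproduced from NAND gates only")
```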
x c x + y mod 2 y Figure 4.7: Half-adder circuit as an example for a boolean network. 4.5.2 Computational complexity One of the most relevant questions within classical computing, and the central subject of computational complexity, is whether a given problem is easy to solve or not, where “easy” is defined in terms of the scaling behavior of the resources needed in dependence of the size of the input data. In the following we will give a rough survey over the most basic aspects of this field, while we refer the reader to [177] for a detailed presentation. To start with, let us specify the basic question in greater detail. First of all the problems we want to analyze are decision problems which only give the two possible values “yes” and “no”. They are mathematically described by boolean functions acting on bit strings of arbitrary size. A well known example is the factoring problem given by the function fac with fac(m, l) = 1 if m (more precisely the natural number represented by m) has a divisor less then l and fac(m, l) = 0 otherwise. Note that many tasks of classical computation can be reformulated this way, so that we do not get a severe loss of generality. The second crucial point we have to clarify is the question what exactly are the resources we have mentioned above and how we have to quantify them. A natural physical quantity which come into mind immediately is the time needed to perform the computation (space is another candidate, which we do not discuss here, however). Hence the question we have to discuss is how the computation time t depends on the size L of the input data x (i.e. the length L of the smallest register needed to represent x as a bit string). However a precise definition of “computation time” is still model dependent. For a Turing machine we can take simply the number of head movements needed to solve the problem, and in the network model we choose the number of steps needed to execute the whole circuit, if gates which operate on different bits are allowed to work simultaneously1 . Even with a fixed type of model the functional behavior of t 1 Note that we have glanced over a lot of technical problems at this point. The crucial difficulty is that each circuit CN allows only the computation of a boolean function fN : BN → B which acts on input data of length N . Since we are interested in answers for arbitrary finite length inputs a sequence CN , N ∈ N of circuits with appropriate uniformity properties is needed; cf. [177] for details. 4. Basic tasks 68 depends on the set of elementary operations we choose, e.g. the set of elementary gates in the network model. It is therefore useful to divide computational problems into complexity classes whose definitions do not suffer under model dependent aspects. The most fundamental one is the class P which contains all problems which can be computed in “polynomial time”, i.e. t is, as a function of L, bounded from above by a polynomial. The model independence of this class is basically the content of the strong Church Turing hypotheses which states, roughly speaking, that each model of computation can be simulated in polynomial time on a probabilistic Turing machine. Problems of class P are considered “easy”, everything else is “hard”. However even if a (decision) problem is hard the situation is not hopeless. E.g. consider the factoring problem fac described above. It is generally believed (although not proved) that this problem is is not in class P. 
But if somebody gives us a divisor p < l of m it is easy to check whether p is really a factor, and if the answer is true we have computed fac(m, l). This example motivates the following definition: A decision problem f is in class NP (“nondeterministic polynomial time”) if there is a boolean function f 0 in class P such that f 0 (x, y) = 1 for some y implies f (x). In our example fac0 is obviously defined by fac0 (m, l, p) = 1 ⇔ p < l and p is a devisor of m. It is obvious that P is a subset of NP the other inclusion however is rather nontrivial. The conjecture is that P 6= NP holds and great parts of complexity theory are based on it. Its proof (or disproof) however represents one of the biggest open questions of theoretical informatics. To introduce a third complexity class we have to generalize our point of view slightly. Instead of a function f : BN → BM we can look at a noisy classical T which sends the input value x ∈ BN to a probability distribution Txy , y ∈ BM on BM (i.e. Txy is the transition matrix of the classical channel T ; cf. Subsection 3.2.3). Roughly speaking, we can interpret such a channel as a probabilistic computation which can be realized as a circuit consisting of “probabilistic gates”. This means there are several different ways to proceed at each step and we use a classical random number generator to decide which of them we have to choose. If we run our device several times on the same input data x we get different results y with probability Txy . The crucial point is now that we can allow some of the outcomes to be wrong as long as there is an easy way (i.e. a class P algorithm) to check the validity of the results. Hence we define BPP (“bounded error probabilistic polynomial time”) as the class of all decision problems which admit a polynomial time probabilistic algorithm with error probability less than 1/2 − ² (for fixed ²). It is obvious that P ⊂ BPP holds but the relation between BPP and NP is not known. 4.5.3 Reversible computing In the last subsection we have discussed the time needed to perform a certain computation. Other physical quantities which seem to be important are space and energy. Space can be treated in a similar way as time and there are in fact spacerelated complexity classes (e.g PSPACE which stands for “polynomial space”). Energy, however, is different, because it turns surprisingly out that it is possible to do any calculation without expending any energy! One source of energy consumption in a usual computer is the intrinsic irreversibility of the basic operations. E.g. a basic gate like AND maps two input bits to one output bit, which obviously implies that the input can not be reconstructed from the output. In other words: one bit of information is erased during the operation of the AND gate, hence a small amount of energy is dissipated to the environment. A thermodynamic analysis, known as Landauer’s principle, shows that this energy loss is at least kB T ln 2, where T is the temperature of the environment [148]. If we want to avoid this kind of energy dissipation we are restricted to reversible processes, i.e. it should be possible to reconstruct the input data from the output 4.5. Quantum computing 69 data. This is called reversible computation and it is performed in terms of reversible gates, which in turn can be described by invertible functions f : BN → BN . 
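A standard example of such a reversible gate is the Toffoli gate (a, b, c) ↦ (a, b, c ⊕ ab): it is its own inverse and, with the third bit initialized to 0, it computes AND while keeping the inputs. The following sketch (my own illustration, not from the text) checks both properties; the same idea underlies the embedding f ↦ f' described next.

```python
from itertools import product

def toffoli(a, b, c):
    """Reversible gate on three bits: (a, b, c) -> (a, b, c xor (a and b))."""
    return a, b, c ^ (a & b)

# Invertibility: applying the gate twice gives the identity on B^3.
assert all(toffoli(*toffoli(a, b, c)) == (a, b, c)
           for a, b, c in product((0, 1), repeat=3))

# With the third bit fixed to 0 the gate computes AND reversibly:
# the inputs survive, and the extra bit carries a AND b.
for a, b in product((0, 1), repeat=2):
    assert toffoli(a, b, 0) == (a, b, a & b)

print("Toffoli is reversible and embeds the AND gate")
```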
This does not restrict the class of problems which can be solved, however: we can repackage a non-invertible function f : B^N → B^M into an invertible one f' : B^{N+M} → B^{N+M} simply by setting f'(x, 0) = (x, f(x)) and extending appropriately to the rest of B^{N+M}. It can even be shown that a reversible computer performs as well as a usual one, i.e. an "irreversible" network can be simulated in polynomial time by a reversible one. This will be of particular importance for quantum computing, because a reversible computer is, as we will see soon, a special case of a quantum computer.

4.5.4 The network model of a quantum computer

Now we are ready to introduce a mathematical model for quantum computation. To this end we will generalize the network model discussed in Subsection 4.5.1 to the network model of quantum computation.

Figure 4.8: Universal sets of quantum gates. One-qubit gate: |x⟩ ↦ U|x⟩. Controlled-U gate: |0x⟩ ↦ |0⟩ ⊗ |x⟩, |1x⟩ ↦ |1⟩ ⊗ U|x⟩. CNOT gate: |0x⟩ ↦ |0⟩ ⊗ |x⟩, |1x⟩ ↦ |1⟩ ⊗ |¬x⟩.

A classical computer operates by a network of gates on a finite number of classical bits. A quantum computer operates on a finite number of qubits in terms of a network of quantum gates – this is the rough idea. To be more precise, consider the Hilbert space H^{⊗N} with H = C^2, which describes a quantum register consisting of N qubits. In H there is a preferred set |0⟩, |1⟩ of orthogonal states, describing the two values a classical bit can have. Hence we can describe each possible value x of a classical register of length N in terms of the computational basis |x⟩ = |x_1⟩ ⊗ ··· ⊗ |x_N⟩, x ∈ B^N. A quantum gate is now nothing else but a unitary operator acting on a small number of qubits (preferably 1 or 2), and a quantum network is a graph representing the composition of elementary gates taken from a small set G of unitaries. A quantum computation can now be defined as the application of such a network to an input state ψ of the quantum register (cf. Figure 4.9 for an example). Similar to the classical case, the set G should be universal, i.e. each unitary operator on a quantum register of arbitrary length can be represented as a composition of elements from G. Since the group of unitaries on a Hilbert space is continuous, it is not possible to do this with a finite set G. However, we can find at least suitably small sets which have the chance to be realizable technically (e.g. in an ion trap) somehow in the future. Particular examples are, on the one hand, the controlled-U operations and, on the other, the set consisting of CNOT and all one-qubit gates (cf. Figure 4.8; for a proof of universality see Section 4.5 of [172]).

Figure 4.9: Quantum circuit for the discrete Fourier transform on a 4-qubit register, built from Hadamard gates H = 2^{−1/2} [[1, 1], [1, −1]] and controlled phase gates U_k = [[1, 0], [0, exp(2^{−k} π i)]].

Basically we could have considered arbitrary quantum operations instead of only unitaries as gates. However, in Subsection 3.2.1 we have seen that we can implement each operation unitarily if we add an ancilla to the system. Hence this kind of generalization is already covered by the model. (As long as non-unitarily implemented operations are a desired feature, that is; decoherence effects due to unavoidable interaction with the environment are a completely different story – we come back to this point at the end of the Subsection.) The same holds for measurements at intermediate steps and subsequent conditioned operations.
In this case we get basically the same result with a different network where all measurements are postponed to the end. (Often it is however very useful to allow measurements at intermediate steps as we will see in the next Subsection.) Having a mathematical model of quantum computers in mind we are now ready to discuss how it would work in principle. 1. The first step is in most cases preprocessing of the input data on a classical computer. E.g. the Shor algorithm for the factoring problem does not work if the input number m is a pure prime power. However in this case there is an efficient classical algorithm. Hence we have to check first whether m is of this particular form and use this classical algorithm where appropriate. 2. In the next step e have to prepare the quantum register based on these preprocessed data. This means in the most simple case to write classical data, i.e. to prepare the state |xi ∈ H⊗N if the (classical) input is x ∈ BN . In many cases however it might be more intelligent to use a superposition of several |xi, e.g. the state 1 X |xi, (4.48) Ψ= √ 2N x∈BN which represents actually the superposition of all numbers the registers can represent – this is indeed the crucial point of quantum computing and we come back to it below. 3. Now we can apply the quantum circuit C to the input state ψ and after the calculation we get the output state U ψ, where U is the unitary represented by C. 4. To read out the data after the calculation we perform a von Neumann measurement in the computational basis, i.e. we measure the observable given by the one dimensional projectors |xihx|, x ∈ BN . Hence we get x ∈ BN with probability PN = |hψ|xi|2 . 71 4.5. Quantum computing 5. Finally we have to postprocess the measured value x on a classical computer to end up with the final result x0 . If, however, the output state U Ψ is a proper superposition of basis vectors |xi (and not just one |xi) the probability p x to get this particular x0 is less than 1. In other words we have performed a probabilistic calculation as described in the last paragraph of Subsection 4.5.2. Hence we have to check the validity of the results (with a class P algorithm on a classical computer) and if they are wrong we have to go back to step 2. So, why is quantum computing potentially useful? First of all, a quantum computer can perform at least as good as a classical computer. This follows immediately from our discussion of reversible computing in Subsection 4.5.3 and the fact that any invertible function f : BN → BN defines a unitary by Uf : |xi 7→ |f (x)i (the quantum CNOT gate in Figure 4.8 arises exactly in this way from the classical CNOT). But, there is on the other hand strong evidence which indicates that a quantum computer can solve problems in polynomial time which a classical computer can not. The most striking example for this fact is the Shor algorithm, which provides a way to solve the factoring problem (which is most probably not in class P) in polynomial time. If we introduce the new complexity class BQP of decision problems which can be solved with high probability and in polynomial time with a quantum computer, we can express this conjecture as BPP 6= BQP. The mechanism which gives a quantum computer its potential power is the ability to operate not just on one value x ∈ BN , but on whole superpositions of values, as already mentioned in step 2 above. E.g. 
consider a, not necessarily invertible, map f : BN → BM and the unitary operator Uf H⊗N ⊗ H⊗M 3 |xi ⊗ |0i 7→ Uf |xi ⊗ |0i = |xi ⊗ |f (x)i ∈ H⊗N ⊗ H⊗M . (4.49) If we let act Uf on a register in the state Ψ ⊗ |0i from Equation (4.48) we get the result 1 X |xi ⊗ |f (x)i. (4.50) Uf (Ψ ⊗ |0i) = √ 2N x∈BN Hence a quantum computer can evaluate the function f on all possible arguments x ∈ BN at the same time! To benefit from this feature – usually called quantum parallelism – is, however, not as easy as it looks like. If we perform a measurement on Uf (Ψ ⊗ |0i) in the computational basis we get the value of f for exactly one argument and the rest of the information originally contained in Uf (Ψ ⊗ |0i) is destroyed. In other words it is not possible to read out all pairs (x, f (x)) from Uf (Ψ ⊗ |0i) and to fill a (classical) lookup table with them. To take advantage from quantum parallelism we have to use a clever algorithm within the quantum computation step (step 3 above). In the next section we will consider a particular example for this. Before we come to this point, let us give some additional comments which link this section to other parts of quantum information. The first point concerns entanglement. The state Uf (Ψ ⊗ |0i) is highly entangled (although Ψ is separable since £ ¤⊗N Ψ = 2−1/2 (|0i + |1i) ), and this fact is essential for the “exponential speedup” of computations we could gain in a quantum computer. In other words, to outperform a classical computer, entanglement is the most crucial resource – this will become more transparent in the next section. The second remark concerns error correction. Up to now we have implicitly assumed that all components of a quantum computer work perfectly without any error. In reality however decoherence effects make it impossible to realize unitarily implemented operations, and we have to deal with noisy channels. Fortunately it is possible within quantum information to correct at least a certain amount of errors, as we have seen in Section 4.4). Hence unlike an 4. Basic tasks 72 analog computer2 a quantum computer can be designed fault tolerant, i.e. it can work with imperfectly manufactured components. 4.5.5 Simons problem We will consider now a particular problem (known as Simons problem; cf. [196]) which shows explicitly how a quantum computer can speed up a problem which is hard to solve with a classical computer. It does not fit however exactly into the general scheme sketched in the last subsection, because a quantum “oracle” is involved, i.e. a black box which performs an (a priori unknown) unitary transformation on an input state given to it. The term “oracle” indicates here that we are not interested in the time the black box needs to perform the calculation but only in the number of times we have to access it. Hence this example does not prove the conjecture BPP 6= BQP stated above. Other quantum algorithms which we have not the room here to discuss include: the Deutsch [69] and Deutsch-Josza problem [70], the Grover search algorithm [103, 102] and of course Shor’s factoring algorithm [192, 193]. Hence let us assume that our black box calculates the unitary Uf from Equation (4.49) with a map f : BN → BN which is two to one and has period a, i.e. f (x) = f (y) iff y = x + a mod 2. The task is to find a. Classically, this problem is hard, i.e. we have to query the oracle exponentially often. 
To see this, note first that we have to find a pair (x, y) with f(x) = f(y), and the probability to get such a pair with two random queries is about 2^{−N} (since for each x there is exactly one y ≠ x with f(x) = f(y)). If we use the box 2^{N/4} times, we get fewer than 2^{N/2} different pairs. Hence the probability to get the correct solution is only of order 2^{−N/2}, i.e. arbitrarily small even with exponentially many queries.

Assume now that we let our box act on a quantum register H^{⊗N} ⊗ H^{⊗N} in the state Ψ ⊗ |0⟩, with Ψ from Equation (4.48), to get U_f(Ψ ⊗ |0⟩) from (4.50). Now we measure the second register. The outcome is one of 2^{N−1} possible values (say f(x_0)), each of which occurs equiprobably. Hence, after the measurement the first register is in the state 2^{−1/2}(|x_0⟩ + |x_0 + a⟩). Now we let a Hadamard gate H (cf. Figure 4.9) act on each qubit of the first register, and the result is (this follows with a short calculation)

    H^{⊗N} 2^{−1/2}( |x_0⟩ + |x_0 + a⟩ ) = 2^{−(N−1)/2} Σ_{y : a·y=0} (−1)^{x_0·y} |y⟩,    (4.51)

where the dot denotes the (B-valued) scalar product in the vector space B^N. Now we perform a measurement on the first register (in the computational basis) and get a y ∈ B^N with the property y·a = 0. If we repeat the whole procedure of order N times and obtain N − 1 linearly independent values y_j, we can determine a as the unique non-zero solution of the system of equations y_1·a = 0, ..., y_{N−1}·a = 0. Each y with y·a = 0 appears as an outcome of the second measurement with probability 2^{1−N}. Therefore the success probability can be made arbitrarily close to one, while the number of times we have to access the box is only linear in N.

² If an analog computer works reliably only with a certain accuracy, we can rewrite the algorithm into a digital one.

4.6 Quantum cryptography

Finally we want to have a short look at quantum cryptography – another, more practical application of quantum information, which has the potential to emerge into technology in the not so distant future (see e.g. [130, 126, 44] for some experimental realizations and [97] for a more detailed overview). Hence let us assume that Alice has a message x ∈ B^N which she wants to send secretly to Bob over a public communication channel. One way to do this is the so-called "one-time pad": Alice generates randomly a second bit-string y ∈ B^N of the same length as x and sends x + y instead of x (a small sketch of this scheme is given below).
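The sketch (my own illustration; the random bit strings are arbitrary) shows the encryption and decryption by bitwise addition mod 2, and also the leakage that occurs when the same key is used twice, a point taken up below.

```python
import secrets

def xor(a, b):
    """Bitwise addition mod 2 of two equal-length bit strings."""
    return [x ^ y for x, y in zip(a, b)]

N = 16
x1 = [secrets.randbelow(2) for _ in range(N)]   # Alice's first message
x2 = [secrets.randbelow(2) for _ in range(N)]   # a second message
y  = [secrets.randbelow(2) for _ in range(N)]   # shared one-time pad key

c1, c2 = xor(x1, y), xor(x2, y)                 # what is sent publicly

assert xor(c1, y) == x1            # Bob recovers x1 by adding the key again
assert xor(c1, c2) == xor(x1, x2)  # key reuse leaks x1 + x2 to everybody
print("one-time pad decrypts correctly; reusing the key leaks x1 + x2")
```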
To use this method, Bob generates a key pair (z, y), keeps his private key (y) at a secure place and sends the public one (z) to Alice over a public channel. Alice encrypts her message with z sends the result to Bob and he can decrypt it with y. The security of this scheme relies on the assumption that the factoring problem is computationally hard, i.e. not in class P, because to calculate y from z requires the factorization of large integers. Since the latter is tractable on quantum computers via Shor’s algorithm, the security of public key systems breaks down if quantum computers become available in the future. Another problem of more fundamental nature is the unproven status of the conjecture that factorization is not solvable in polynomial time. Consequently, security of public key systems is not proven either. The crucial point is now that quantum information provides a way to distribute a cryptographic key y in a secure way, such that y can be used as a one-time pad afterwards. The basic idea is to use the no cloning theorem to detect possible eavesdropping attempts. To make this more transparent, let us consider a particular example here, namely the probably most prominent protocol proposed by Benett and Brassard in 1984 [18]. 1. Assume that Alice wants to transmit bits from the (randomly generated) key y ∈ BN through an ideal quantum channel to Bob. Before they start they settle upon two orthonormal bases e0 , e1 ∈ H, respectively f0 , f1 ∈ H, which are mutually nonorthogonal, i.e. |hej , fk i| ≥ ² > 0 with ² big enough for each j, k = 0, 1. If photons are used as information carrier a typical choice are linearly polarized photons with polarization direction rotated by 45◦ against each other. 2. To send one bit j ∈ B Alice selects now at random one of the two bases, say e0 , e1 and then she sends a qubit in the state |ej ihej | through the channel. Note that neither Bob nor a potential eavesdropper knows which bases she has chosen. 3. When Bob receives the qubit he selects, as Alice before, at random a base and performs the corresponding von Neumann measurement to get one classical bit k ∈ B, which he records together with the measurement method. 4. Both repeat this procedure until the whole string y ∈ BN is transmitted and then Bob tells Alice (through a classical, public communication channel) bit for bit which base he has used for the measurement (but not the result of the measurement). If he has used the same base as Alice both keep the corresponding bit otherwise they discard it. They end up with a bit-string y 0 ∈ BM of a reduced length M . If this is not sufficient they have to continue sending random bits until the key is long enough. For large N the rate of 4. Basic tasks 74 successfully transmitted bits per bits sended is obviously 1/2. Hence Alice has to send approximately twice as many bits as they need. To see why this procedure is secure, assume now that the eavesdropper Eve can listen and modify the information sent through the quantum channel and that she can listen on the classical channel but can not modify it (we come back to this restriction in a minute). Hence Eve can intercept the qubits sent by Alice and make two copies of it. One she forwards to Bob and the other she keeps for later analysis. Due to the no cloning theorem however she has produced errors in both copies and the quality of her own decreases if she tries to make the error in Bob’s as small as possible. 
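This trade-off is easy to simulate, because only two mutually unbiased bases and single qubits are involved: a measurement in the wrong basis simply yields a uniformly random bit. The following sketch (my own illustration, not from the text) runs the protocol with and without a simple intercept-resend attack and reproduces the numbers discussed in the surrounding text: about half of the transmitted bits survive the sifting, and Eve introduces an error rate of about 1/4 on the sifted key.

```python
import random

def measure(bit, prep_basis, meas_basis):
    """Outcome for a qubit prepared as `bit` in `prep_basis`: deterministic in
    the same basis, uniformly random in the conjugate one (the two bases are
    assumed mutually unbiased, |<e_j, f_k>|^2 = 1/2)."""
    return bit if prep_basis == meas_basis else random.randint(0, 1)

def bb84(n, eve=False):
    sifted, errors = 0, 0
    for _ in range(n):
        a_bit, a_basis = random.randint(0, 1), random.randint(0, 1)
        bit, basis = a_bit, a_basis
        if eve:                        # intercept-resend attack
            e_basis = random.randint(0, 1)
            bit, basis = measure(bit, basis, e_basis), e_basis
        b_basis = random.randint(0, 1)
        b_bit = measure(bit, basis, b_basis)
        if b_basis == a_basis:         # kept after the public basis comparison
            sifted += 1
            errors += (b_bit != a_bit)
    return sifted / n, errors / sifted

print(bb84(100_000))             # ~ (0.5, 0.0)  : no eavesdropper
print(bb84(100_000, eve=True))   # ~ (0.5, 0.25) : intercept-resend attack
```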
Even if Eve knows about the two bases e0 , e1 and f0 , f1 she does not know which one Alice uses to send a particular qubit3 . Hence Eve has to decide randomly which base to choose (as Bob). If e0 , e1 and f0 , f1 are chosen optimal, i.e. |hej , fk i|2 = 0.5 it is easy to see that the error rate Eve necessarily produces if she randomly measures in one of the bases is 1/4 for large N . To detect this error Alice and Bob simply have to sacrify portions of the generated key and to compare randomly selected bits using their classical channel. If the error rate they detect is too big they can decide to drop the whole key and restart from the beginning. So let us discuss finally a situation where Eve is able to intercept the quantum and the classical channel. This would imply that she can play Bob’s part for Alice and Alice’s for Bob. As a result she shares a key with Alice and one with Bob. Hence she can decode all secret data Alice sends to Bob, read it, and encode it finally again to forward it to Bob. To secure against such a “woman in the middle attack”, Alice and Bob can use classical authentication protocols which ensure that the correct person is at the other end of the line. This implies that they need a small amount of initial secret material which can be renewed however from the new key they have generated through quantum communication. 3 If Alice and Bob uses only one basis to send the data and Eve knows about it she can produce of course ideal copies of the qubits. This is actually the reason why two nonorthogonal bases are necessary. Chapter 5 Entanglement measures In the last chapter we have seen that entanglement is an essential resource for many tasks of quantum information theory, like teleportation or quantum computation. This means that entangled states are needed for the functioning of many processes and that they are consumed during operation. It is therefore necessary to have measures which tell us whether the entanglement contained in a number of quantum systems is sufficient to perform a certain task. What makes this subject difficult, is the fact that we can not restrict the discussion to systems in a maximally or at least highly entangled pure state. Due to unavoidable decoherence effects realistic applications have to deal with imperfect systems in mixed states, and exactly in this situation the question for the amount of available entanglement is interesting. 5.1 General properties and definitions The difficulties arising if we try to quantify entanglement can be divided, roughly speaking, into two parts: Firstly we have to find a reasonable quantity which describes exactly those properties which we are interested in and secondly we have to calculate it for a given state. In this section we will discuss the first problem and consider several different possibilities to define entanglement measures. 5.1.1 Axiomatics First of all, we will collect some general properties which a reasonable entanglement measure should have (cf. also [24, 216, 215, 217, 121]). To quantify entanglement, means nothing else but to associate a positive real number to each state of (finite dimensional) two-partite systems. Axiom E0 An entanglement measure is a function E which assigns to each state ρ of a finite dimensional bipartite system a positive real number E(ρ) ∈ R + . 
Note that we have glanced over some mathematical subtleties here, because E is not just defined on the state space of B(H ⊗ K) systems for particularly chosen Hilbert spaces H and K – E is defined on any state space for arbitrary finite dimensional H and K. This is expressed mathematically most conveniently by a family of functions which behaves naturally under restrictions (i.e. the restriction to a subspace H0 ⊗ K0 coincides with the function belonging to H0 ⊗ K0 ). However we will see soon that we can safely ignore this problem. The next point concerns the range of E. If ρ is unentangled E(ρ) should be zero of course and it should be maximal on maximally entangled states. But what happens if we allow the dimensions of H and K to grow? To get an answer consider first a pair of qubits in a maximally entangled state ρ. It should contain exactly one bit entanglement i.e. E(ρ) = 1 and N pairs in the state ρ⊗N should contain N bits. If we interpret ρ⊗N as a maximally entangled state of a H ⊗ H system with H = CN we get E(ρ⊗N ) = log2 (dim(H)) = N , where we have to reshuffle in ρ⊗N the tensor factors such that (C2 ⊗ C2 )⊗N becomes (C2 )⊗N ⊗ (C2 )⊗N (i.e. “all Alice particles to the left and all Bob particles to the right”; cf. Section 4.3.) This observation motivates the following. Axiom E1 (Normalization) E vanishes on separable and takes its maximum on maximally entangled states. More precisely, this means that E(σ) ≤ E(ρ) = log 2 (d) for ρ, σ ∈ S(H ⊗ H) and ρ maximally entangled. 5. Entanglement measures 76 One thing an entanglement measure should tell us, is how much quantum information can be maximally teleported with a certain amount of entanglement, where this maximum is taken over all possible teleportation schemes and distillation protocols, hence it can not be increased further by additional LOCC operations on the entangled systems in question. This consideration motivates the following Axiom. Axiom E2 (LOCC monotonicity) E can not increase under LOCC operation, i.e. E[T (ρ)] ≤ E(ρ) for all states ρ and all LOCC channels T . A special case of LOCC operations are of course local unitary operations U ⊗ V . Axiom E2 implies now that E(U ⊗ V ρU ∗ ⊗ V ∗ ) ≤ E(ρ) and on the other hand E(U ∗ ⊗ V ∗ ρeU ⊗ V ) ≤ E(e ρ) hence with ρe = U ⊗ V ρU ∗ ⊗ V we get E(ρ) ≤ E(U ⊗ ∗ ∗ V ρV ⊗U ) therefore E(ρ) = E(U ⊗V ρU ∗ ⊗V ∗ ). We fix this property as a weakened version of Axiom E2: Axiom E2a (Local unitary invariance) E is invariant under local unitaries, i.e. E(U ⊗ V ρU ∗ ⊗ V ∗ ) = E(ρ) for all states ρ and all unitaries U , V . This axiom shows why we do not have to bother about families of functions as mentioned above. If E is defined on S(H ⊗ H) it is automatically defined on S(H1 ⊗ H2 ) for all Hilbert spaces Hk with dim(Hk ) ≤ dim(H), because we can embed H1 ⊗ H2 under this condition unitarily into H ⊗ H. Consider now a convex linear combination λρ + (1 − λ)σ with 0 ≤ λ ≤ 1. Entanglement can not be “generated” by mixing two states, i.e. E(λρ + (1 − λ)σ) ≤ λE(ρ) + (1 − λ)E(σ). Axiom E3 (Convexity) E is a convex function, i.e. E(λρ + (1 − λ)σ) ≤ λE(ρ) + (1 − λ)E(σ) for two states ρ, σ and 0 ≤ λ ≤ 1. The next property concerns the continuity of E, i.e. if we perturb ρ slightly the change of E(ρ) should be small. This can be expressed most conveniently as continuity of E in the trace norm. At this point however it is not quite clear, how we have to handle the fact that E is defined for arbitrary Hilbert spaces. 
The following version is motivated basically by the fact that it is a crucial assumption in Theorem 5.1.2 and 5.1.3. Axiom E4 (Continuity) Consider a sequence of Hilbert spaces HN , N ∈ N and two sequences of states ρN , σN ∈ S(HN ⊗ HN ) with lim kρN − σN k1 = 0. Then we have E(ρN ) − E(σN ) = 0. (5.1) lim N →∞ 1 + log2 (dim HN ) The last point we have to consider here are additivity properties: Since we are looking at entanglement as a resource, it is natural to assume that we can do with two pairs in the state ρ twice as much as with one ρ, or more precisely E(ρ ⊗ ρ) = 2E(ρ) (in ρ ⊗ ρ we have to reshuffle tensor factors again ;see above). Axiom E5 (Additivity) For any pair of two-partite states ρ, σ ∈ S(H ⊗ K) we have E(σ ⊗ ρ) = E(σ) + E(ρ). Unfortunately this rather natural looking axiom seems to be too strong (it excludes reasonable candidates). It should be however always true that entanglement can not increase if we put two pairs together. Axiom E5a (Subadditivity) For any pair of states ρ, σ we have E(ρ ⊗ σ) ≤ E(ρ) + E(σ). There are further modifications of additivity available in the literature. Most frequently used is the following, which restricts Axiom E5 to the case ρ = σ: 5.1. General properties and definitions 77 Axiom E5b (Weak additivity) For any state ρ of a bipartite system we have N −1 E(ρ⊗N ) = E(ρ). Finally, the weakest version of additivity only deals with the behavior of E for large tensor products, i.e. ρ⊗N for N → ∞. Axiom E5c (Existence of a regularization) For each state ρ the limit E(ρ⊗N ) N →∞ N E ∞ (ρ) = lim (5.2) exists. 5.1.2 Pure states Let us consider now a pure state ρ = |ψihψ| ∈ S(H ⊗ K). If it is entangled its partial trace σ = trH |ψihψ| = trK |ψihψ| is mixed and for a maximally entangled state it is maximally mixed. This suggests to use the von Neumann entropy 1 of ρ, which measures how much a state is mixed, as an entanglement measure for mixed states, i.e. we define [17, 24] £ ¤ EvN (ρ) = − tr trH ρ ln(trH ρ) . (5.3) It is easy to deduce from the properties of the von Neumann entropy that E vN satisfies Axioms E0, E1, E3 and E5b. Somewhat more difficult is only Axiom E2 which follows however from a nice theorem of Nielsen [169] which relates LOCC operations (on pure states) to the theory of majorization. To state it here we need first some terminology. Consider two probability distributions λ = (λ 1 , . . . , λM ) and µ = (µ1 , . . . , µN ) both given in decreasing order (i.e. λ1 ≥ . . . ≥ λM and µ1 ≥ . . . ≥ µN ). We say that λ is majorized by µ, in symbols λ ≺ µ, if k X j=1 λj ≤ k X j=1 µj ∀k = 1, . . . , min M, N (5.4) holds. Now we have the following result (see [169] for a proof). P 1/2 0 Theorem 5.1.1 A pure state ψ = j λj ej ⊗ ej ∈ H ⊗ K can be transformed P 1/2 into another pure state φ = j µj fj ⊗ fj0 ∈ H ⊗ K via a LOCC operation, iff the Schmidt coefficients of ψ are majorized by those of φ, i.e. λ ≺ µ. The von Neumann entropy of the restriction trH |ψihψ| can be P immediately calculated from the Schmidt coefficients λ of ψ by EvN (|ψihψ|) = − Pj λj ln(λj ). Axiom E2 follows therefore from the fact that the entropy S(λ) = − j λj ln(λj ) of a probability distribution λ is a Shur concave function, i.e. λ ≺ µ implies S(λ) ≥ S(µ); see [171]. Hence we have seen so far that EvN is one possible candidate for an entanglement measure on pure states. In the following we will see that it is in fact the only candidate which is physically reasonable. There are basically two reasons for this. The first one deals with distillation of entanglement. 
It was shown by Bennett et. al. [17] that each state ψ ∈ H ⊗ K of a bipartite system can be prepared out of (a possibly large number of) systems in an arbitrary entangled state φ by LOCC operations. To be more precise, we can find a sequence of LOCC operations ¤ £ ¤ £ (5.5) TN : B (H ⊗ K)⊗M (N ) → B (H ⊗ K)⊗N such that lim kTN∗ (|φihφ|⊗N ) − |ψihψ|k1 = 0 N →∞ (5.6) 1 We assume here and in the following that the reader is sufficiently familiar with entropies. If this is not the case we refer to [174]. 5. Entanglement measures 78 holds with a nonvanishing rate r = limN →∞ M (N )/N . This is done either by distillation (r < 1 if ψ is higher entangled then φ) or by “diluting” entanglement, i.e. creating many less entangled states from few highly entangled ones (r > 1). All this can be performed in a reversible way: We can start with some maximally entangled qubits dilute them to get many less entangled states which can be distilled afterwards to get the original states back (again only in an asymptotic sense). The crucial point is that the asymptotic rate r of these processes is given in terms of EvN by r = EvN (|φihφ|)/EvN (|ψihψ|). Hence we can say, roughly speaking that EvN (|ψihψ|) describes exactly the amount of maximally entangled qubits which is contained in |ψihψ|. A second somewhat more formal reason is that EvN is the only entanglement measure on the set of pure states which satisfies the axioms formulated above. In other words the following “uniqueness theorem for entanglement measures” holds [182, 217, 74] Theorem 5.1.2 The reduced von Neumann entropy EvN is the only entanglement measure on pure states which satisfies Axioms E0 – E5. 5.1.3 Entanglement measures for mixed states To find reasonable entanglement measures for mixed states is much more difficult. There are in fact many possibilities (e.g. the maximally entangled fraction introduced in Subsection 3.1.1 can be regarded as a simple measure) and we want to present therefore only four of the most reasonable candidates. Among those measures which we do not discuss here are negativity quantities ([220] and the references therein) the “best separable approximation” [151], the base norm associated with the set of separable states [219, 188] and ppt-distillation rates [185]. The first measure we want to present is oriented along the discussion of pure states: We define, roughly speaking, the asymptotic rate with which maximally entangled qubits can be distilled at most out of a state ρ ∈ S(H ⊗ K) as the Entanglement of Distillation ED (ρ) of ρ; cf [20]. To be more precise consider all possible distillation protocols for ρ (cf. Section 4.3), i.e. all sequences of LOCC channels (5.7) TN : B(CdN ⊗ CdN ) → B(H⊗N ⊗ K⊗N ) such that lim kTN∗ (ρ⊗N ) − |ΩN ihΩN | k1 = 0 N →∞ (5.8) holds with a sequence of maximally entangled states ΩN ∈ CdN . Now we can define ED (ρ) = sup lim sup (TN )N ∈N N →∞ log2 (dN ) , N (5.9) where the supremum is taken over all possible distillation protocols (T N )N ∈N . It is not very difficult to see that ED satisfies E0, E1, E2 and E5b. It is not known whether continuity (E4) and convexity (Axiom E3) holds. It can be shown however that ED is not convex (and not additive; Axiom E5) if npt bound entangled states exist (see [194], cf. also Subsection 4.3.3). For pure states we have discussed beside distillation the “dilution” of entanglement and we can use, similar to ED , the asymptotic rate with which bipartite systems in a given state ρ can be prepared out of maximally entangled singlets [108]. 
Hence consider again a sequence of LOCC channels TN : B(H⊗N ⊗ K⊗N ) → B(CdN ⊗ CdN ) dN (5.10) and a sequence of maximally entangled states ΩN ∈ C , N ∈ N, but now with the property lim kρ⊗N − TN∗ (|ΩN ihΩN |) k1 = 0. (5.11) N →∞ 5.2. Two qubits 79 Then we can define the entanglement cost EC (ρ) of ρ as EC (ρ) = inf lim inf (SN )N ∈N N →∞ log2 (dN ) , N (5.12) where the infimum is taken over all dilution protocols SN , N ∈ N. It is again easy to see that EC satisfies E0, E1, E2 and E5b. In contrast to ED however it can be shown that EC is convex (Axiom E3), while it is not known, whether EC is continuous (Axiom E4); cf [108] for proofs. ED and EC are directly based on operational concepts. The remaining two measures we want to discuss here are defined in a more abstract way. The first can be characterized as the minimal convex extension of EvN to mixed states: We define the entanglement of formation EF of ρ as [24] X EF (ρ) = P inf pj EvN (|ψj ihψj |), (5.13) ρ= j pj |ψj ihψj | where the infimum is taken over all decompositions of ρ into a convex sum of pure states. EF satisfies E0 - E4 and E5a (cf. [24] for E2 and [170] for E4 the rest follows directly from the definition). Whether EF is (weakly) additive (Axiom E5b) is not known. Furthermore it is conjectured that EF coincides with EC . However proven is only the identity EF∞ = EC , where the existence of the regularization EF∞ of EF follows directly from subadditivity. Another idea to quantify entanglement is to measure the “distance” of the (entangled) ρ from the set of separable states D. It hat turned out [216] that among all possible distance functions the relative entropy is physically most reasonable. Hence we define the relative entropy of entanglement as £ ¡ ¢¤ ER (ρ) = inf S(ρ|σ), S(ρ|σ) = tr ρ log2 ρ − ρ log2 σ , (5.14) σ∈D where the infimum is taken over all separable states. It can be shown that E R satisfies, as EF the Axioms E0 - E4 and E5a, where E1 and E2 are shown in [216] and E4 in [73]; the rest follows directly from the definition. It is shown in [221] that ∞ of ER does not satisfy E5b; cf. also Subsection 5.3. Hence the regularization ER ER differs from ER . Finally let us give now some comments on the relation between the measures just introduced. On pure states all measures just discussed, coincide with the reduced von Neumann entropy – this follows from Theorem 5.1.2 and the properties stated in the last Subsection. For mixed states the situation is more difficult. It can be shown however that ED ≤ EC holds and that all “reasonable” entanglement measures lie in between [121]. Theorem 5.1.3 For each entanglement measure E satisfying E0, E1, E2 and E5b and each state ρ ∈ S(H ⊗ K) we have ED (ρ) ≤ E(ρ) ≤ EC (ρ). Unfortunately no measure we have discussed in the last Subsection satisfies all the assumptions of the theorem. It is possible however to get a similar statement for the regularization E ∞ with weaker assumptions on E itself (in particular without assuming additivity); cf [74]. 5.2 Two qubits Even more difficult than finding reasonable entanglement measures are explicit calculations. All measures we have discussed above involve optimization processes over spaces which grow exponentially with the dimension of the Hilbert space. A direct numerical calculation for a general state ρ is therefore hopeless. There are however some attempts to get either some bounds on entanglement measures or to get explicit calculations for special classes of states. We will concentrate this discussion to 5. 
Entanglement measures 80 some relevant special cases. On the one hand we will concentrate on EF and ER and on the other we will look at two special classes of states where explicit calculations are possible: Two qubit systems in this section and states with symmetry properties in the next one. 5.2.1 Pure states Assume for the rest of this section that H = C2 holds and consider first a pure state ψ ∈ H ⊗ H. To calculate EvN (ψ) is of course not difficult and it is straightforward to see that (cf. for all material of this and the following subsection [24]): · ³ ´¸ p 1 (5.15) EvN (ψ) = H 1 + 1 − C(ψ)2 2 holds, with H(x) = −x log2 (x) − (1 − x) log2 (1 − x) and the concurrence C(ψ) of ψ which is defined by ¯ ¯ ¯ 3 ¯ 3 X ¯X 2 ¯ ¯ C(ψ) = ¯ αj ¯¯ with ψ = αj Φ j , ¯ j=0 ¯ j=0 (5.16) (5.17) where Φj , j = 0, . . . , 3 denotes the Bell basis (3.3). Since C becomes rather important in the following let us reexpress it as C(ψ) = |hψ, Ξψi|, where ψ 7→ Ξψ denotes complex conjugation in Bell basis. Hence Ξ is an antiunitary operator and it can be written as the tensor product Ξ = ξ ⊗ ξ of the map H 3 φ 7→ σ2 φ̄, where φ̄ denotes complex conjugation in the canonical basis and σ2 is the second Pauli matrix. Hence local unitaries (i.e. those of the form U1 ⊗ U2 ) commute with Ξ and it can be shown that this is not only a necessary but also a sufficient condition for a unitary to be local [222]. We see from Equations (5.15) and (5.17) that C(ψ) ranges from 0 to 1 and that EvN (ψ) is a monotone function in C(ψ). The latter can be considered therefore as an entanglement quantity in its own right. For a Bell state we get in particular C(Φj ) = 1 while a separable state φ1 ⊗ φ2 leads to C(φ1 ⊗ φ2 ) = 0; this can be seen easily with the factorization Ξ = ξ ⊗ ξ. Assume now that one of the αj say α0 satisfies |α0 |2 > 1/2. This implies that C(ψ) can not be zero since ¯ ¯ ¯X ¯ ¯ 3 2¯ ¯ αj ¯¯ ≤ 1 − |α0 |2 (5.18) ¯ ¯ j=1 ¯ must hold. Hence C(ψ) is at least 1 − 2|α0 |2 and this implies for EvN and arbitrary ψ i ( h p ¢ ¡ H 21 + x(1 − x) x ≥ 21 2 . (5.19) EvN (ψ) ≥ h |hΦ0 , ψi| with h(x) = 0 x < 12 This inequality remains valid if we replace Φ0 by any other maximally entangled state Φ ∈ H⊗H. To see this note that two maximally entangled states Φ, Φ 0 ∈ H⊗H are related (up to a phase) by a local unitary transformation U1 ⊗ U2 (this follows immediately from their Schmidt decomposition; cf Subsection 3.1.1). Hence, if we replace the Bell basis in Equation (5.17) by Φ0j = U1 ⊗ U2 Φj , j = 0, . . . , 3 we get for the corresponding C 0 the equation C 0 (ψ) = hU1∗ ⊗ U2∗ ψ, ΞU1∗ ⊗ U2∗ ψi = C(ψ) since Ξ commutes with local unitaries. We can even replace |hΦ0 , ψi|2 with the supremum over all maximally entangled states and get therfore £ ¡ ¢¤ EvN (ψ) ≥ h F |ψihψ| , (5.20) 5.2. Two qubits 81 ¡ ¢ where F |ψihψ| is the maximally entangled fraction of |ψihψ| which we have introduced in Subsection 3.1.1. To see that even equality holds in Equation (5.20) note first that it is sufficient to consider the case ψ = a|00i+b|11i with a, b ≥ 0, a2 +b2 = 1, since each pure state ψ can be brought into this form (this follows again from the Schmidt decomposition) by a local unitary transformation which on the other hand does not change E vN . The maximally state which maximizes |hψ, Φi|2 is in this case Φ0 and we ¡ ¢ entangled 2 get £ F ¡ |ψihψ| ¢¤ = (a + b) /2 = 1/2 + ab. Straightforward calculations now show that h F |ψihψ| = h(1/2 + ab) = EvN (ψ) holds as stated. 
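Relations like (5.15) are easy to verify numerically for random pure states. In the sketch below (my own illustration, not from the text) the antiunitary Ξ is implemented as ψ ↦ (σ₂ ⊗ σ₂) conj(ψ) in the canonical basis, and the reduced von Neumann entropy is computed directly from the partial trace.

```python
import numpy as np

sy = np.array([[0, -1j], [1j, 0]])
M = np.kron(sy, sy).real              # sigma_2 x sigma_2 is a real matrix

def H(x):
    """Binary entropy in bits."""
    x = float(np.clip(x, 0.0, 1.0))
    return 0.0 if x in (0.0, 1.0) else -x*np.log2(x) - (1-x)*np.log2(1-x)

def concurrence_pure(psi):
    """C(psi) = |<psi, Xi psi>| with (Xi psi) = (sigma_2 x sigma_2) conj(psi)."""
    return abs(psi.conj() @ (M @ psi.conj()))

def reduced_entropy(psi):
    """Von Neumann entropy (base 2) of the partial trace of |psi><psi|."""
    rho = np.outer(psi, psi.conj()).reshape(2, 2, 2, 2)
    red = np.trace(rho, axis1=0, axis2=2)        # trace out the first qubit
    return H(np.linalg.eigvalsh(red)[-1].real)

rng = np.random.default_rng(0)
for _ in range(5):
    psi = rng.normal(size=4) + 1j * rng.normal(size=4)
    psi /= np.linalg.norm(psi)
    C = concurrence_pure(psi)
    assert abs(reduced_entropy(psi)
               - H(0.5 * (1 + np.sqrt(max(0.0, 1 - C**2))))) < 1e-10   # (5.15)
print("E_vN(psi) = H[(1 + sqrt(1 - C(psi)^2))/2] holds for random pure states")
```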
5.2.2 EOF for Bell diagonal states It is easy to extend the inequality (5.20) to mixed states if we use the convexity of EF and the fact that EF coincides with EvN on pure states. Hence (5.20) becomes £ ¤ EF (ρ) ≥ h F(ρ) . (5.21) For general two qubit states ¡ this bound is not¢ achieved however. This can be seen with the example ρ = 1/2 |φ1 ihφ1 | + |00ih00| , which we have already considered in the last of Subsection 3.1.1. It is easy to see that F(ρ) = 1/2 holds £ paragraph ¤ hence h F(ρ) = 0 but ρ is entangled. Nevertheless we can show that equality holds P3 in Equation (5.21) if we restrict it to Bell diagonal states ρ = j=0 λj|Φ P j ihΦj |. To prove this statement we have to find a convex decomposition ρ = j µj |Ψj ihΨj | £ ¤ P of such a ρ into pure states |Ψj ihΨj | such that h F(ρ) = j µj EvN (|Ψj ihΨj | £ ¤ holds. Since EF (ρ) can not be smaller than h F(ρ) due to inequality (5.21) this decomposition must be optimal and equality is proven. To find such Ψj assume first that the biggest eigenvalue of ρ is greater than 1/2, and let, without loss of generality, be λ1 this eigenvalue. A good choice for the Ψj are then the eight pure states 3 X p p λ0 Φ0 + i (± λj )Φj (5.22) j=1 The P reduced von Neumann entropy of all these states equals h(λ1 ), hence j µj EvN (|Ψj ihΨj |) = h(λ1 ) and therefore EF (ρ) = h(λ1 ). Since the maximally entangled fraction of ρ is obviously λ1 we see that (5.21) holds with equality. Assume now that the highest eigenvalue is less than 1/2. Then we can find phase P3 factors exp(iφj ) such that j=0 exp(iφj )λj = 0 holds and ρ can be expressed as a convex linear combination of the states 3 X p p eiφ0 /2 λ0 Φ0 + i (±eiφj /2 λj )Φj . (5.23) j=1 The concurrence C of all these states is 0 hence their entanglement is 0 by Equation (5.15), which in turn implies EF (ρ) = 0. Again we see that equality is achieved in (5.21) since the maximally entangled fraction of ρ is less than 1/2. Summarizing this discussion we have shown (cf. Figure 5.1) Proposition 5.2.1 A Bell diagonal state ρ is entangled iff its highest eigenvalue λ is greater than 1/2. In this case the Entanglement of Formation of ρ is given by ¸ · 1 p + λ(1 − λ) . (5.24) EF (ρ) = H 2 5. Entanglement measures 82 1 Entanglement of Formation Relative Entropy EF (ρ) ER (ρ) 0.8 0.6 0.4 0.2 0 0.5 0.55 0.6 0.65 0.7 0.75 0.8 0.85 0.9 0.95 1 Highest eigenvalue λ of ρ Figure 5.1: Entanglement of Formation and Relative Entropy of Entanglement for Bell diagonal states, plotted as a function of the highest eigenvalue λ of ρ 5.2.3 Wootters formula If we have a general two qubit state ρ there is a formula of Wootters [238] which allows an easy calculation of EF . It is based on a generalization of the concurrence C to mixed states. To motivate it rewrite C 2 (ψ) = |hψ, Ξψi| as ¡ ¢ ¡ ¢ C 2 (ψ) = tr |ψihψ||ΞψihΞψ| = tr ρΞρΞ = tr(R2 ) (5.25) with R= q √ √ ρΞρΞ ρ. (5.26) Here we have set ρ = |ψihψ|. The definition of the hermitian matrix R however makes sense for arbitrary ρ as well. If we write λj , j = 1, . . . , 4 for the eigenvalues of R and λ1 is without loss of generality the biggest one we can define the concurrence of an arbitrary two qubit state ρ as [238] ¡ ¢ C(ρ) = max 0, 2λ1 − tr(R) = max(0, λ1 − λ2 − λ3 − λ4 ). (5.27) It is easy to see that C(|ψihψ|) coincides with C(ψ) from (5.17). 
The crucial point is now that Equation (5.15) holds for EF (ρ) if we insert C(ρ) instead of C(ψ): Theorem 5.2.2 (Wootters Formula) The Entanglement of Formation of a two qubit system in a state ρ is given by · ³ ´¸ p 1 EF (ρ) = H 1 + 1 − C(ρ)2 (5.28) 2 where the concurrence of ρ is given in Equation (5.27) and H denotes the binary entropy from (5.16). P To prove this theorem we firstly have to find a convex decomposition ρ = of ρ into pure states Ψj such that the average reduced von Neuj µj |Ψj ihΨj |P mann entropy j µj EvN (Ψj ) coincides with the right hand side of Equation (5.28). Secondly we have to show that we have really found the minimal decomposition. 5.3. Entanglement measures under symmetry 83 Since this is much more involved than the simple case discussed in Subsection 5.2.2 we omit the proof and refer to [238] instead. Note however that Equation (5.28) really coincides with the special cases we have derived for pure and Bell diagonal states. Finally let us add the remark that there is no analogon of Wootters’ formula for higher dimensional Hilbert spaces. It can be shown [222] that the essential properties of the Bell basis Φj , j = 0, .., 3 which would be necessary for such a generalization are available only in 2 × 2 dimensions. 5.2.4 Relative entropy for Bell diagonal states To calculate the Relative Entropy of Entanglement ER for two qubit systems is more difficult. However there is at least an easy formula for Bell diagonal states which we will give in the following; [216]. Proposition 5.2.3 The Relative Entropy of Entanglement for a Bell diagonal state ρ with highest eigenvalue λ is given by (cf. Figure 5.1) ( 1 − H(λ) λ > 21 ER (ρ) = (5.29) 0 λ ≤ 12 Proof. For a Bell diagonal state ρ = P3 j=0 λj |Φj ihΦj | we have to calculate £ ¡ ¢¤ ER (ρ) = inf tr ρ log2 ρ − ρ log2 σ σ∈D 3 X λj hΦj , log2 (σ)Φj i . = tr(ρ log2 ρ) + inf − σ∈D (5.30) (5.31) j=0 Since log is a concave function we have − log 2 hΦj , σΦj i ≤ hΦj , − log2 (σ)Φj i and therefore 3 X ER (ρ) ≥ tr(ρ log2 ρ) + inf − λj log2 hΦj , σΦj i . (5.32) σ∈D j=0 Hence only the diagonal elements of σ in the Bell basis enter the minimization on the right hand side of this inequality and this implies that we can restrict the infimum to the set of separable Bell diagonal state. Since a Bell diagonal state is separable iff all its eigenvalues are less than 1/2 (Proposition 5.2.1) we get 3 3 X X pj = 1. (5.33) λj log2 pj , with ER (ρ) ≥ tr(ρ log2 ρ) + inf − pj ∈[0,1/2] j=0 j=0 This is an optimization problem (with constraints) over only four real parameters and easy to solve. If the highest eigenvalue of ρ is greater than 1/2 we get p 1 = 1/2 and pj = λj /(2 − 2λ), where we have chosen without loss of generality λ = λ1 . We get a lower bound on ER (ρ) which is achieved if we insert the corresponding σ in Equation (5.31). Hence we have proven the statement for λ > 1/2. which completes the proof, since we have already seen that λ ≤ 1/2 implies that ρ is separable (Proposition 5.2.1). 2 5.3 Entanglement measures under symmetry The problems occuring if we try to calculate quantities like ER or EF for general density matrices arise from the fact that we have to solve optimization problems over very high dimensional spaces. One possible strategy to get explicit results is therefore parameter reduction by symmetry arguments. This can be done if the state 5. Entanglement measures 84 in question admits some invariance properties like Werner, isotropic or OO-invariant states; cf. Section 3.1. 
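Before turning to these symmetric families, note that the two-qubit formulas of this section can at least be checked against each other numerically; Bell diagonal states, which form a highly symmetric family themselves, are a convenient test case. The following sketch (my own illustration, not from the text) implements the concurrence (5.27) via the spectrum of R and compares the resulting Entanglement of Formation (5.28) with the expression of Proposition 5.2.1.

```python
import numpy as np

sy = np.array([[0, -1j], [1j, 0]])
YY = np.kron(sy, sy)

def H(x):
    x = float(np.clip(x, 0.0, 1.0))
    return 0.0 if x in (0.0, 1.0) else -x*np.log2(x) - (1-x)*np.log2(1-x)

def concurrence(rho):
    """Wootters concurrence: the eigenvalues of R are the square roots of the
    eigenvalues of rho * (YY conj(rho) YY); C = max(0, l1 - l2 - l3 - l4)."""
    rho_tilde = YY @ rho.conj() @ YY
    ev = np.linalg.eigvals(rho @ rho_tilde)
    l = np.sort(np.sqrt(np.clip(ev.real, 0.0, None)))[::-1]
    return max(0.0, l[0] - l[1] - l[2] - l[3])

def EF_wootters(rho):
    C = concurrence(rho)
    return H(0.5 * (1 + np.sqrt(max(0.0, 1 - C**2))))      # Equation (5.28)

def EF_bell_diagonal(lams):
    """Proposition 5.2.1: EF depends only on the largest eigenvalue."""
    lam = max(lams)
    return H(0.5 + np.sqrt(lam * (1 - lam))) if lam > 0.5 else 0.0

# Bell basis vectors (the phase convention does not matter here).
bell = np.sqrt(0.5) * np.array([[1, 0, 0, 1], [0, 1, 1, 0],
                                [1, 0, 0, -1], [0, 1, -1, 0]], dtype=complex)

rng = np.random.default_rng(1)
for _ in range(5):
    lams = rng.dirichlet(np.ones(4))                  # random Bell spectrum
    rho = sum(l * np.outer(v, v.conj()) for l, v in zip(lams, bell))
    assert abs(EF_wootters(rho) - EF_bell_diagonal(lams)) < 1e-8
print("Wootters formula agrees with Proposition 5.2.1 on Bell diagonal states")
```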
In the following we will give some particular examples for such calculations, while a detailed discussion of the general idea (together with much more examples and further references) can be found in [221]. 5.3.1 Entanglement of Formation Consider a compact group of unitaries G ⊂ B(H ⊗ H) (where H is again arbitrary finite dimensional), the set of G-invariant states, i.e. R all ρ with [V, ρ] = 0 for all V ∈ G and the corresponding twirl operation PG σ = G V σV ∗ dV . Particular examples we are looking at are: 1. Werner states where G consists of all unitaries U ⊗ U 2. Isotropic states where each V ∈ G has the form V = U ⊗ Ū and finally 3. OO-invariant states where G consists of unitaries U ⊗ U with real matrix elements (U = Ū ) and the twirl is given in Equation (3.24). One way to calculate EF for a G-invariant state ρ consists now of the following steps: 1. Determine the set Mρ of pure states Φ such that PG |ΦihΦ| = ρ holds. 2. Calculate the function PG S 3 ρ 7→ ²G (ρ) = inf{EvN (σ) | σ ∈ Mρ } ∈ R, (5.34) where we have denoted the set of G-invariant states with PG S. 3. Determine EF (ρ) then in terms of the convex hull of ², i.e. P EF (ρ) = inf{ j λj ²(σj ) | P P σj ∈ PG S, 0 ≤ λj ≤ 1, ρ = j λj σj , j λj = 1}. (5.35) The equality in the last Equation is of course a non-trivial statement which has to be proved. We skip this point, however, and refer the reader to [221]. The advantage of this scheme relies on the fact that spaces of G invariant states are in general very low dimensional (if G is not too small). Hence the optimization problem contained in step 3 has a much bigger chance to be tractable than the one we have to solve for the original definition of EF . There is of course no guarantee that any of this three steps can be carried out in a concrete situation. For the three examples mentioned above, however, there are results available, which we will present in the following. 5.3.2 Werner states Let us start with Werner states [221]. In this case ρ is uniquely determined by its flip expectation value tr(ρF ) (cf. Subsection 3.1.2). To determine Φ ∈ H ⊗ H such that PUU |ΦihΦ| = ρ holds, we have to solve therefore the equation X (5.36) hΦ, F Φi = Φjk Φkj = tr(F ρ), jk where Φjk denote components of Φ in the canonical basis. On the otherPhand the reduced density matrix ρ = tr1 |ΦihΦ| has the matrix elements ρjk = l Φjl Φkl . By exploiting U ⊗ U invariance we can assume without loss of generality that ρ is diagonal. Hence to get the function ²UU we have to minimize # " X ¡ ¢ X 2 (5.37) S |Φjk | EvN |ΦihΦ| = j k under the constraint (5.36), where S(x) = −x log2 (x) denotes the von Neumann entropy. We skip these calculations here (see [221] instead) and state the results only. For tr(F ρ) ≥ 0 we get ²(ρ) = 0 (as expected since ρ is separable in this case) and with H from (5.16) · ³ ´¸ p 1 2 ²UU (ρ) = H 1 − 1 − tr(F ρ) (5.38) 2 5.3. Entanglement measures under symmetry 85 1 EF (ρ) 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 -1 -0.8 -0.6 -0.4 -0.2 0 tr(ρF ) Figure 5.2: Entanglement of Formation for Werner states plotted as function of the flip expectation. for tr(F ρ) < 0. The minima are taken for Φ where all Φjk except one diagonal element are zero in the case tr(F ρ) ≥ 0 and for Φ with only two (non-diagonal) coefficients Φjk , Φkj , j 6= k nonzero if tr(ρF ) < 0. The function ² is convex and coincides therefore with its convex hull such that we get Proposition 5.3.1 For any Werner state ρ the Entanglement of Formation is given by (cf. 
Figure 5.2)
$$E_F(\rho) = \begin{cases} H\Bigl[\tfrac{1}{2}\bigl(1 - \sqrt{1 - \operatorname{tr}(F\rho)^2}\bigr)\Bigr] & \operatorname{tr}(F\rho) < 0 \\ 0 & \operatorname{tr}(F\rho) \geq 0. \end{cases} \qquad (5.39)$$

5.3.3 Isotropic states

Let us now consider isotropic, i.e. $U\otimes\bar U$ invariant, states. They are determined by the expectation value $\operatorname{tr}(\rho\widetilde F)$ with $\widetilde F$ from Equation (3.14). Hence we have to look first for pure states $\Phi$ with $\langle\Phi, \widetilde F\Phi\rangle = \operatorname{tr}(\rho\widetilde F)$ (since this determines, as for Werner states above, those $\Phi$ with $P_{U\bar U}|\Phi\rangle\langle\Phi| = \rho$). To this end assume that $\Phi$ has the Schmidt decomposition $\Phi = \sum_j \lambda_j f_j\otimes f_j' = U_1\otimes U_2\sum_j\lambda_j e_j\otimes e_j$ with appropriate unitary matrices $U_1, U_2$ and the canonical basis $e_j$, $j = 1, \ldots, d$. Exploiting the $U\otimes\bar U$ invariance of $\rho$ we get
$$\operatorname{tr}(\rho\widetilde F) = \Bigl\langle (1\!\mathrm{I}\otimes V)\sum_j\lambda_j e_j\otimes e_j,\; \widetilde F\,(1\!\mathrm{I}\otimes V)\sum_k\lambda_k e_k\otimes e_k\Bigr\rangle \qquad (5.40)$$
$$= \sum_{j,k,l,m}\lambda_j\lambda_k\,\langle e_j\otimes Ve_j, e_l\otimes e_l\rangle\langle e_m\otimes e_m, e_k\otimes Ve_k\rangle \qquad (5.41)$$
$$= \Bigl|\sum_j\lambda_j\langle e_j, Ve_j\rangle\Bigr|^2 \qquad (5.42)$$
with $V = U_1^T U_2$ and after inserting the definition of $\widetilde F$. Following our general scheme, we have to minimize $E_{vN}(|\Phi\rangle\langle\Phi|)$ under the constraint given in Equation (5.42). This is explicitly done in [210]. We will only state the result here, which leads to the function
$$\epsilon_{U\bar U}(\rho) = \begin{cases} H(\gamma) + (1-\gamma)\log_2(d-1) & \operatorname{tr}(\rho\widetilde F) \geq 1 \\ 0 & \operatorname{tr}(\rho\widetilde F) < 1 \end{cases} \qquad (5.43)$$
with
$$\gamma = \frac{1}{d^2}\Bigl(\sqrt{\operatorname{tr}(\rho\widetilde F)} + \sqrt{(d-1)\bigl(d - \operatorname{tr}(\rho\widetilde F)\bigr)}\Bigr)^2. \qquad (5.44)$$

[Figure 5.3: ε-function for isotropic states (d = 2, 3, 4) plotted as a function of tr(ρF̃). For d > 2 it is not convex near the right endpoint.]

For $d \geq 3$ this function is not convex (cf. Figure 5.3), hence we get

Proposition 5.3.2 For any isotropic state the Entanglement of Formation is given as the convex hull
$$E_F(\rho) = \inf\Bigl\{\sum_j\lambda_j\epsilon_{U\bar U}(\sigma_j) \;\Big|\; \rho = \sum_j\lambda_j\sigma_j,\ P_{U\bar U}\sigma_j = \sigma_j\Bigr\} \qquad (5.45)$$
of the function $\epsilon_{U\bar U}$ in Equation (5.43).

5.3.4 OO-invariant states

The results derived for isotropic and Werner states can now be extended to a large part of the set of OO-invariant states without solving new minimization problems. This is possible because the definition of $E_F$ in Equation (5.13) allows, under some conditions, an easy extension to a suitable set of non-symmetric states. More precisely, if a nontrivial minimizing decomposition $\rho = \sum_j p_j|\psi_j\rangle\langle\psi_j|$ of $\rho$ is known, all states $\rho'$ which are a convex linear combination of the same $|\psi_j\rangle\langle\psi_j|$ but arbitrary $p_j'$ have the same $E_F$ as $\rho$ (see [221] for a proof of this statement). For the general scheme we have presented in Subsection 5.3.1 this implies the following: if we know the pure states $\sigma \in M_\rho$ which solve the minimization problem for $\epsilon(\rho)$ in Equation (5.34), we get a minimizing decomposition of $\rho$ in terms of $U \in G$ translated copies of $\sigma$. This follows from the fact that $\rho$ is, by definition of $M_\rho$, the twirl of $\sigma$. Hence any convex linear combination of pure states $U\sigma U^*$ with $U \in G$ has the same $E_F$ as $\rho$. A detailed analysis of the corresponding optimization problems in the case of Werner and isotropic states (which we have omitted here; see [221, 210] instead) therefore leads to the following results about OO-invariant states: the space of OO-invariant states decomposes into four regions, the separable square and three triangles A, B, C; cf. Figure 5.4. For all states $\rho$ in triangle A we can calculate $E_F(\rho)$ as for Werner states in Proposition 5.3.1, and in triangle B we have to apply the result for isotropic states from Proposition 5.3.2. This implies in particular that $E_F$ depends in A only on $\operatorname{tr}(\rho F)$ and in B only on $\operatorname{tr}(\rho\widetilde F)$ and the dimension.
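The closed-form expressions just quoted are easy to evaluate; the sketch below implements (5.39) and (5.43)–(5.44) directly. The separability thresholds tr(ρF) = 0 and tr(ρF̃) = 1 are taken from the text; note that for isotropic states only the (generally non-convex) function ε is returned, the convex hull required by Proposition 5.3.2 is not computed here.

```python
# Sketch: E_F for Werner states, Eq. (5.39), and eps_{U Ubar} for isotropic
# states, Eqs. (5.43)-(5.44). Convex hull for d >= 3 not included.
import numpy as np

def H2(x):
    """Binary entropy H from (5.16)."""
    return 0.0 if x <= 0.0 or x >= 1.0 else float(-x * np.log2(x) - (1 - x) * np.log2(1 - x))

def ef_werner(flip):                       # flip = tr(rho F) in [-1, 1]
    return 0.0 if flip >= 0 else H2(0.5 * (1 - np.sqrt(1 - flip ** 2)))

def eps_isotropic(f_tilde, d):             # f_tilde = tr(rho Ftilde) in [0, d]
    if f_tilde <= 1:
        return 0.0
    gamma = (np.sqrt(f_tilde) + np.sqrt((d - 1) * (d - f_tilde))) ** 2 / d ** 2
    return H2(gamma) + (1 - gamma) * np.log2(d - 1)

print(ef_werner(-1.0))                     # Werner state with flip expectation -1: E_F = 1
print(eps_isotropic(3.0, 3))               # maximally entangled isotropic state, d = 3: log2(3)
```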
[Figure 5.4: State space of OO-invariant states.]

5.3.5 Relative Entropy of Entanglement

To calculate $E_R(\rho)$ for a symmetric state $\rho$ is even easier than the treatment of $E_F(\rho)$, because we can restrict the minimization in the definition of $E_R(\rho)$ in Equation (5.14) to $G$-invariant separable states, provided $G$ is a group of local unitaries. To see this assume that $\sigma \in D$ minimizes $S(\rho|\sigma)$ for a $G$-invariant state $\rho$. Then we get $S(\rho|U\sigma U^*) = S(\rho|\sigma)$ for all $U \in G$, since the relative entropy $S$ is invariant under unitary transformations of both arguments, and due to its convexity we even get $S(\rho|P_G\sigma) \leq S(\rho|\sigma)$. Hence $P_G\sigma$ minimizes $S(\rho|\,\cdot\,)$ as well, and since $P_G\sigma \in D$ holds for a group $G$ of local unitaries, we get $E_R(\rho) = S(\rho|P_G\sigma)$ as stated.

The sets of Werner and isotropic states are just intervals and the corresponding separable states form subintervals over which we have to perform the optimization. Due to the convexity of the relative entropy in both arguments, however, it is clear that the minimum is attained exactly at the boundary between entangled and separable states. For Werner states this is the state $\sigma_0$ with $\operatorname{tr}(F\sigma_0) = 0$, i.e. it gives equal weight to both minimal projections. To get $E_R(\rho)$ for a Werner state $\rho$ we therefore only have to calculate the relative entropy with respect to this state. Since all Werner states can be simultaneously diagonalized this is easily done and we get:
$$E_R(\rho) = 1 - H\Bigl(\frac{1 + \operatorname{tr}(F\rho)}{2}\Bigr) \qquad (5.46)$$

[Figure 5.5: Relative Entropy of Entanglement for Werner states, plotted as a function of the flip expectation.]

Similarly, the boundary point $\sigma_1$ for isotropic states is given by $\operatorname{tr}(\widetilde F\sigma_1) = 1$, which leads to
$$E_R(\rho) = \log_2 d - \Bigl(1 - \frac{\operatorname{tr}(\widetilde F\rho)}{d}\Bigr)\log_2(d-1) - S\Bigl(\frac{\operatorname{tr}(\widetilde F\rho)}{d},\, 1 - \frac{\operatorname{tr}(\widetilde F\rho)}{d}\Bigr) \qquad (5.47)$$
for each entangled isotropic state $\rho$, and $0$ if $\rho$ is separable. ($S(p_1, p_2)$ denotes here the entropy of the probability vector $(p_1, p_2)$.)

[Figure 5.6: Relative Entropy of Entanglement for isotropic states and d = 2, 3, 4, plotted as a function of tr(ρF̃).]

Let us now consider OO-invariant states. As for the Entanglement of Formation we divide the state space into the separable square and the three triangles A, B, C; cf. Figure 5.4. The state at the coordinates $(1, d)$ is a maximally entangled state and all separable states on the line connecting $(0, 1)$ with $(1, 1)$ minimize the relative entropy for this state. Hence consider a particular state $\sigma$ on this line. The convexity property of the relative entropy immediately shows that $\sigma$ is a minimizer for all states on the line connecting $\sigma$ with the state at $(1, d)$. In this way it is easy to calculate $E_R(\rho)$ for all $\rho$ in A. In a similar way we can treat the triangle B: we just have to draw a line from $\rho$ to the state at $(-1, 0)$ and find the minimizer for $\rho$ at the intersection with the separable border between $(0, 0)$ and $(0, 1)$. For all states in the triangle C the relative entropy is minimized by the separable state at $(0, 1)$.

An application of the scheme just reviewed is a proof that $E_R$ is not additive, i.e. it does not satisfy Axiom E5b. To see this consider the state $\rho = \operatorname{tr}(P_-)^{-1}P_-$, where $P_-$ denotes the projector on the antisymmetric subspace. It is a Werner state with flip expectation $-1$ (i.e. it corresponds to the point $(-1, 0)$ in Figure 5.4). According to our discussion above $S(\rho|\,\cdot\,)$ is minimized in this case by the separable state $\sigma_0$ and we get $E_R(\rho) = 1$ independently of the dimension $d$. The tensor product $\rho^{\otimes 2}$ can be regarded as a state in $S(H^{\otimes 2}\otimes H^{\otimes 2})$ with $U\otimes U\otimes V\otimes V$ symmetry, where $U, V$ are unitaries on $H$. Note that the corresponding state space of $UUVV$-invariant states can be parameterized by the expectations of the three operators $F\otimes 1\!\mathrm{I}$, $1\!\mathrm{I}\otimes F$ and $F\otimes F$ (cf. [221]), and we can apply the machinery just described to get the minimizer $\widetilde\sigma$ of $S(\rho|\,\cdot\,)$. If $d > 2$ holds it turns out that
$$\widetilde\sigma = \frac{d+1}{2d}\,\frac{P_+\otimes P_+}{\operatorname{tr}(P_+)^2} + \frac{d-1}{2d}\,\frac{P_-\otimes P_-}{\operatorname{tr}(P_-)^2} \qquad (5.48)$$
holds (where $P_\pm$ denote the projections onto the symmetric and antisymmetric subspaces of $H\otimes H$), and not $\widetilde\sigma = \sigma_0\otimes\sigma_0$ as one would expect. As a consequence we get the inequality
$$E_R(\rho^{\otimes 2}) = 2 - \log_2\Bigl(\frac{2(d-1)}{d}\Bigr) < 2 = S(\rho^{\otimes 2}|\sigma_0^{\otimes 2}) = 2E_R(\rho). \qquad (5.49)$$
$d = 2$ is a special case, where $\sigma_0^{\otimes 2}$ and $\widetilde\sigma$ (and all their convex linear combinations) give the same value $2$. Hence for $d > 2$ the Relative Entropy of Entanglement is, as stated, not additive.

Chapter 6
Channel capacity

In Section 4.4 we have seen that it is possible to send (quantum) information undisturbed through a noisy quantum channel if we encode one qubit into a (possibly long and highly entangled) string of qubits. This process is wasteful, since we have to use many instances of the channel to send just one qubit of quantum information. It is therefore natural to ask which resources we need at least if we are using the best possible error correction scheme. More precisely the question is: with which maximal rate, i.e. information sent per channel usage, can we transmit quantum information undisturbed through a noisy channel? This question naturally leads to the concept of channel capacities, which we will review in this chapter.

6.1 Definition and elementary properties

A quantum channel $T$ can be used to send quantum as well as classical data. Hence we can associate a classical and a quantum capacity to $T$. The basic ideas behind both quantities are, however, quite similar. In this section we will therefore consider a general definition of capacity which applies to arbitrary channels and both kinds of information. (See also [232] as a general reference for this section.)

6.1.1 The definition

Hence consider two observable algebras $A_1, A_2$ and an arbitrary channel $T : A_1 \to A_2$. To send systems described by a third observable algebra $B$ undisturbed through $T$ we need an encoding channel $E : A_2 \to B$ and a decoding channel $D : B \to A_1$ such that $ETD$ equals the ideal channel $B \to B$, i.e. the identity on $B$. Note that the algebra $B$ describing the systems to send, and the input respectively output algebra of $T$, need not be of the same type, e.g. $B$ can be classical while $A_1, A_2$ are quantum (or vice versa). In general (i.e. for arbitrary $T$ and $B$) it is of course impossible to find such a pair $E$ and $D$. In this case we are interested at least in encodings and decodings which make the error produced during the transmission as small as possible. To make this statement precise we need a measure for this error, and there are in fact many good choices for such a quantity (many of them leading to equivalent results). We will use in the following the "cb-norm difference" $\|ETD - \mathrm{Id}\|_{\mathrm{cb}}$, where Id is the identity (i.e.
ideal) channel on B and k · kcb denotes the norm of complete boundedness (“cb-norm” for short) kT kcb = sup kT ⊗ Idn k, n∈N Idn : B(Cn ) → B(Cn ) (6.1) The cb-norm improves the sometimes annoying property of the unusual operator norm that quantities like kT ⊗ IdB(Cd ) k may increase with the dimension d. On infinite dimensional observable algebras kT kcb can be infinite although each term in the supremum is finite. A particular example for a map with such a behavior is the transposition on an infinite dimensional Hilbert space. A map with finite cb-norm is therefore called completely bounded. In a finite dimensional setup each linear map is completely bounded. For the transposition Θ on Cd we have in particular kΘkcb = d. The cb-norm has some nice features which we will use frequently; this includes its multiplicativity kT1 ⊗T2 kcb = kT1 kcb kT2 kcb and the fact that kT kcb = 1 holds for each (unital) channel. Another useful relation is kT kcb = kT ⊗ IdB(H) k, which holds if T is a map B(H) → B(H). For more properties of the cb-norm let us refer to [178]. 6.1. Definition and elementary properties 91 Now we can associate to a pair of channels T : A1 → A2 and S : B1 → B2 the quantity ∆(T, S) = inf kET D − Skcb , (6.2) E,D where the infimum is taken over all encoding and decoding channels E : A2 → B2 respectively D : B1 → A1 . The map S plays the role of a reference channel and ∆(T, S) is the minimal error we have to take into account if we want to simulate S by T and appropriate encodings and decodings. If we try in particular to transmit B systems through T we have to choose B1 = B2 = B and S = IdB . In this case we write ∆(T, B) = ∆(T, IdB ) = inf kET D − IdB kcb . (6.3) E,D In Section 4.4, we have seen that we can reduce the error if we take M copies of the channel instead of just one. More generally we are interested in the transmission of “codewords of length” N , i.e. B ⊗N systems using M copies of the channel T . Encodings and decodings are in this case channels of the form E : A⊗M → B ⊗N 2 ⊗M ⊗N respectively D : B → A1 . If we increase the number M of channels the error ∆(T ⊗M , B ⊗N ) decreases provided the rate with which N grows as a function of M is not too large. A more precise formulation of this idea leads to the following definition. Definition 6.1.1 A number c ≥ 0 is called achievable rate for a channel T with respect to a reference channel S, if for any pair of sequences Mj , Nj , j ∈ N with Mj → ∞ and lim supj→∞ Nj /Mj < c we have lim ∆(T ⊗Mj , S ⊗Nj ) = 0. j→∞ (6.4) The supremum of all achievable rates is called the capacity of T with respect to S and denoted by C(T, S). If S is the ideal channel on an observable algebra B we write C(T, B) instead of C(T, IdB ). Similarly we write C(A, S) if T is an ideal A channel. Note that by definition c = 0 is an achievable rate hence C(T, S) ≥ 0. If on the other hand each c > 0 is achievable we write C(T, B) = ∞. At a first look it seems cumbersome to check all pairs of sequences with given upper ratio when testing c. Due to some monotonicity properties of ∆, however, it can be shown that it is sufficient to check only one sequence provided the Mj satisfy the additional condition Mj /(Mj+1 ) → 1. This is the subject of the following lemma. Lemma 6.1.2 Let (Mα )α∈N be a strictly increasing sequence of integers such that limα Mα+1 /Mα = 1. Suppose Nα are integers such that limα ∆(T ⊗Mα , S ⊗Na ) = 0. Then any Nα (6.5) c < lim inf α Mα is an admissible rate. 
Moreover, if the errors decrease exponentially, in the sense that ∆(T ⊗Mα , S ⊗Nα ) ≤ µe−λMα (µ, λ ≥ 0), then they decrease exponentially for M → ∞ with rate −1 ln ∆(T ⊗M , S ⊗bcM c ) ≥ λ, (6.6) lim inf M →∞ M where bxc denotes the largest integer smaller then x. Proof. Let us introduce the notation c+ = lim inf α Nα /Mα , so c < c+ . We pick η > 0 such that (1 + η)c < c+ . Then for sufficiently large α ≥ α0 we have (Mα+1 /Mα ) ≤ (1 + η), and (Nα /Mα ) ≥ (1 + η)c. Now let M ≥ Mα0 , and consider the unique index α such that Mα ≤ M ≤ Mα+1 . Then M ≤ (1 + η)Mα and bcM c ≤ cM ≤ c(1 + η)Mα ≤ Nα . (6.7) 6. Channel capacity 92 Clearly, ∆(T ⊗M , S ⊗N ) decreases as M increases, because good coding becomes easier if we have more parallel channels and increases with N , because if a coding scheme works for codewords of length N it also works at least as well for shorter codewords. Hence ∆(T ⊗M , S ⊗bcM c ) ≤ ∆(T ⊗Mα , S ⊗Nα ) → 0. It follows that c is an admissible rate. With the exponential bound on ∆ we find similarly that ∆(T ⊗M , S ⊗bcM c ) ≤ µ e−λMα ≤ µ e−λ/(1+η)M , (6.8) so that the liminf in (6.6) is ≥ λ/(1 + η). Since η was arbitrary, we get the desired result. 2 6.1.2 Elementary properties We see that there are in fact many different capacities of a given channel depending on the type of information we want to transmit. However, there are only two different cases we are interested in: B can be either classical or quantum. We will discuss both special cases in greater detail. Before we do this, however, we will have a short look on some simple calculations which can be done in the general case. To this end it is convenient to introduce the notations Md = B(Cd ) Cd = C({1, . . . , d}) and (6.9) as shorthand notations for B(Cd ) and C({1, . . . , d}) since some notations become otherwise a little bit clumsy. Our first topic are capacities of ideal channels Proposition 6.1.3 The capacities of ideal channels are given by C(Cf , Cd ) = C(Mf , Md ) = C(Mf , Cd ) = log2 f . log2 d (6.10) Proof. It is obvious that we can transmit N d-level systems through M parallel copies of an ideal channel Mf → Mf provided f M > dN . Hence C(Mf , Md ) ≥ log2 f / log2 d. Similar reasoning holds for the other cases; hence we only have to show that no bigger rate can be achieved. To this end assume that c > log 2 f / log2 d is achievable. Then we have by definition ⊗Mj lim ∆(Mf j→∞ ⊗Nj , Md )=0 (6.11) for sequences Mj , Nj , j ∈ N with c > limj→∞ Nj /Mj > log2 f / log2 d. This implies ⊗M ⊗N that there is a j0 ∈ N such that dim Md j > dim Mf j holds for all j > j0 . ⊗Nj Therefore each decoding map D : Md ⊗N Md j ⊗Mj → Mf must have a nontrivial kernel. with D(A) = 0 and kAk = 1. Then we have for any k ∈ N and Let A ∈ B ∈ Mk with kBk = 1: kED − Id kcb ≥ k(ED − Id) ⊗ Id k ≥ k(ED − Id)(A) ⊗ Id(B)k = 1. ⊗M (6.12) ⊗N Hence ∆(Mf j , Md j ) ≥ 1 for all j > j0 in contradiction to (6.11) which implies C(Md , Mf ) = log2 f / log2 d. Similar reasoning holds for C(Cf , Cd ) and C(Mf , Cd ), and the proof is complete1 . 2 In the previous proposition we have excluded the case C(Cf , Md ), i.e. the quantum capacity of an ideal classical channel. From the “no-teleportation theorem” we expect that this quantity is zero. For a proof of this statement it is useful to introduce first a simple upper bound on C(T, Md ) (cf. [116]) 1 For the classical capacity of a quantum channel C(M , C ), it is, however, more difficult f d to derive an analog of the error estimate (6.12). 
We skip this part nevertheless and leave the corresponding details to the reader. 6.1. Definition and elementary properties 93 Lemma 6.1.4 For each channel T we have C(T, Md ) ≤ logd kT Θkcb (6.13) where ΘA = AT denotes the transposition. Proof. We start with the fact that kΘkcb = d if d is the dimension of the Hilbert space on which Θ operates. Assume that Nj /Mj → c ≤ C(T, Md ) and j large N enough such that k Idd j −Ej T ⊗Mj Dj k ≤ ² with appropriate encodings and decodings Ej , Dj . We get (where Idd denotes the identity on Md ) N N dNj = k Idd j Θkcb ≤ kΘ(Idd j −Ej T ⊗Mj Dj )kcb + kΘEj T ⊗Mj Dj kcb ≤ N dNj k Idd j ≤d Nj ²+ −Ej T ⊗Mj Dj kcb + kΘEj Θ(ΘT )⊗Mj Dj kcb M kΘT kcbj , (6.14) (6.15) (6.16) where we have used for the last equation the fact that Dj and ΘEj Θ are channels and that the cb-norm is multiplicative. Taking logarithms on both sides we get Nj logd (1 − ²) + ≤ logd kΘT kcb . Mj Mj (6.17) In the limit j → ∞ this implies c ≤ log d kΘT k and therefore C(T, Md ) ≤ logd kΘT kcb as stated. 2 If, e.g. T is classical we have ΘT = T since the transposition coincides on a classical algebra Cd with the identity (elements of Cd are just diagonal matrices). This implies Cθ (T ) = log2 kΘT kcb = log2 kT kcb = 0, because the cb-norm of a channel is 1. We see therefore that the quantum capacity of a classical channel is 0, as expected. Corollary 6.1.5 for each classical channel T : Cf → Ck we have C(T, Md ) = 0. Now let us consider three channels T1 , T2 , T3 . On the one hand we can simulate the reference channel T1 directly by T3 . On the other we can first simulate T1 by T2 and then T2 by T3 . The next proposition shows that the second approach is potentially wasteful. Proposition 6.1.6 For three channels T1 , T2 , T3 the two step coding inequality holds: C(T3 , T1 ) ≥ C(T2 , T1 )C(T3 , T2 ). (6.18) Proof. Consider the relations kT1⊗N − E1 E2 T3⊗K D2 D1 kcb = kT1⊗N − E1 T2⊗M D1 + E1 T2⊗M D1 − E1 E2 T3⊗K D2 D1 kcb (6.19) ≤ (6.21) ≤ kT1⊗N kT1⊗N − − E1 T2⊗M D1 kcb E1 T2⊗M D1 kcb + + kE1 kcb kT2⊗M − E2 T3⊗K D2 kcb kD1 kcb kT2⊗M − E2 T3⊗K D2 kcb (6.20) where we have used for the last inequality the fact that the cb-norm of a channel is one. If c1 is an achievable rate of T1 with respect to T2 such that limj→∞ Nj /Mj < c1 and c2 is an achievable rate of T2 with respect to T3 such that limj→∞ Mj /Kj < c2 (i.e. the sequences of quotients converge) we see that lim inf j→∞ M j Nj Nj Mk Nj = lim inf ≤ lim lim . j→∞ Kj Mj j→∞ Mj k→∞ Kk Kj (6.22) 6. Channel capacity 94 Hence each c < c1 c2 is achievable. Since C(T1 , T3 ) is the supremum over all achievable rates we get (6.18). 2 As a first application of (6.18), we can relate all capacities C(T, Md ) (and C(T, Cd )) for different d to one another. If we choose T3 = T , T1 = IdMd and 2f T2 = IdMf we get with (6.1.3) C(T, Md ) ≥ log log2 d C(T, Mf ), and exchanging d with f shows that even equality holds. A similar relation can be shown for C(T, C d ). Hence the dimension of the observable algebra B describing the type of information to be transmitted, enters only via a multiplicative constant, i.e. it is only a choice of units and we define the classical capacity Cc (T ) and the quantum capacity Cq (T ) of a channel T as2 Cc (T ) = C(T, C2 ), Cq (T ) = C(T, M2 ). (6.23) A second application of Equation (6.18) is a relation between the classical and the quantum capacity of a channel. Setting T3 = T , T1 = IdC2 and T2 = IdM2 we get again with (6.1.3) Cq (T ) ≤ Cc (T ). (6.24) Note that it is now not possible to interchange the roles of C2 and M2 . 
Hence equality does not hold here. Another useful relation concerns concatenated channels: We transmit information of type B first through a channel T1 and then through a second channel T2 . It is reasonable to assume that the capacity of the composition T2 T1 can not be bigger than capacity of the channel with the smallest bandwidth. Proposition 6.1.7 Two channels T1 , T2 satisfy the “ Bottleneck inequality”: C(T2 T1 , B) ≤ min{C(T1 , B), C(T2 , B)}. (6.25) Proof. To see this consider an encoding and a decoding channel E respectively D for (T2 T1 )⊗M , i.e. in the definition of C(T2 T1 , B) we look at ⊗M ⊗M k Id⊗N Dkcb = k Id⊗N )T1⊗M Dkcb . B −E(T2 T1 ) B −(ET2 (6.26) ET2⊗M This implies that and D are an encoding and a decoding channel for T1 . Something similar holds for D and T1⊗M D with respect to T2 . Hence each achievable rate for T2 T1 is also an achievable rate for T2 and T1 , and this proves Equation (6.25). 2 Finally we want to consider two channels T1 , T2 in parallel, i.e. we consider the tensor product T1 ⊗ T2 . Proposition 6.1.8 The channel capacity is superadditive, i.e. C(T1 ⊗ T2 , B) ≥ C(T1 , B) + C(T2 , B) (6.27) for any pair of channels T1 , T2 . Proof. If Ej , Dj , j = 1, 2 are encoding, respectively decoding channels for T1⊗M ⊗N and T2⊗M such that k IdB j −Ej Tj⊗M Dj kcb ≤ ² holds, we get k Id − Id ⊗(E2 T ⊗M D2 ) + Id ⊗(E2 T ⊗M D2 ) − E1 ⊗ E2 (T1 ⊗ T2 )⊗M D1 ⊗ D2 kcb (6.28) ≤ k Id ⊗(Id −E2 T ⊗M D2 )kcb + k(Id −E1 T1⊗M D1 ) ⊗ E2 T ⊗M D2 kcb ≤ k Id −E2 T ⊗M D2 kcb + k Id −E1 T1⊗M D1 kcb ≤ 2² (6.29) (6.30) 2 There are other possibilities to define the quantum capacity [24, 13, 12] which are at least closely related to our version. It is not yet clear whether equality holds. There might be subtle differences [147]. 6.2. Coding theorems 95 Hence c1 + c2 is achievable for T1 ⊗ T2 if cj is achievable for Tj , which completes the proof. 2 When all channels are ideal, or when all systems involved are classical even equality holds, i.e. channel capacities are additive in this case. However, if quantum channels are considered, it is one of the big open problems of the field, to decide under which conditions additivity holds. 6.1.3 Relations to entanglement measures The duality lemma proved in Subsection 2.3.3 provides an interesting way to derive bounds on channel capacities and capacity like quantities from entanglement measures (and vice versa) [24, 122]: To derive a state of a bipartite system from a channel T we can take a maximally entangled state Ψ ∈ H ⊗ H, send one particle through T and get a less entangled pair in the state ρT = (Id ⊗T ∗ )|ΨihΨ|. If on the other hand an entangled state ρ ∈ S(H ⊗ H) is given, we can use it as a resource for teleportation and get a channel Tρ . The two maps ρ 7→ Tρ and T 7→ ρT are, however, not inverse to one another. This can be seen easily from the duality lemma (Theorem 2.3.5): For each state ρ ∈ S(H ⊗ H) there is a channel T and a pure state Φ ∈ H ⊗ H such that ρ = (Id ⊗T ∗ )|ΦihΦ| holds; but Φ is in general not maximally entangled (and uniquely determined by ρ). Nevertheless, there are special cases in which the state derived from Tρ coincides with ρ: A particular class of examples is given by teleportation channels derived from a Bell-diagonal state. On ρT we can evaluate an entanglement measure E(ρT ) and get in this way a quantity which is related to the capacity of T . A particularly interesting candidate for E is the “one-way LOCC” distillation rate ED,→ . 
It is defined in the same way as the entanglement of distillation ED , except that only one-way LOCC operation are allowed in Equation (5.8). According to [24] ED,→ is related to Cq by the inequalities ED,→ (ρ) ≥ Cq (Tρ ) and ED,→ (ρT ) ≤ Cq (T ). Hence if ρTρ = ρ we can calculate ED,→ (ρ) in terms of Cq (Tρ ) and vice versa. A second interesting example is the transposition bound Cθ (T ) introduced in the last subsection. It is related to the logarithmic negativity [220] Eθ (ρT ) = log2 k(Id ⊗Θ)ρT k1 , (6.31) which measures the degree with which the partial transpose of ρ fails to be positive. Eθ can be regarded as entanglement measure although it has some drawbacks: it is not LOCC monotone (Axiom E2), it is not convex (Axiom E3) and most severe: It does not coincides with the reduced von Neumann entropy on pure states, which we have considered as “the” entanglement measure for pure states. On the other hand it is easy to calculate and it gives bounds on distillation rates and teleportation capacities [220]. In addition Eθ can be used together with the relation between depolarizing channels and isotropic states to derive Equation (6.50) in a very simple way. 6.2 Coding theorems To determine channel capacities directly in terms of Definition 6.1.1 is fairly difficult, because optimization problems in spaces of exponentially fast growing dimensions are involved. This renders in particular each direct numerical approach practically impossible. It is therefore an important task of (quantum) information theory to express channel capacities in terms of quantities which are easier to compute. In this section we will review the most important of these “coding theorems”. 6.2.1 Shannon’s theorem Let us consider first a classical to classical channel T : C(Y ) → C(X). This is basically the situation of classical information theory and we will only have a short 6. Channel capacity 96 look here – mainly to show how this (well known) situation fits into the general scheme described in the last section3 . First of all we have to calculate the error quantity ∆(T, C2 ) defined in Equation (6.2). As stated in Subsection 3.2.3 T is completely determined by its transition probabilities Txy , (x, y) ∈ X × Y describing the probability to receive x ∈ X when y ∈ Y was sent. Since the cb-norm for a classical algebra coincides with the ordinary norm we get (we have set X = Y for this calculation): ¯ ¯ ¯X ¯ ¯ ¯ k Id −T kcb = k Id −T k = sup ¯ (δxy − Txy ) fy ¯ (6.32) ¯ ¯ x,f y = 2 sup (1 − Txx ) (6.33) x where the supremum in the first equation is taken over all f ∈ C(X) with kf k = supy |fy | ≤ 1. We see that the quantity in Equation (6.33) is exactly twice the maximal error probability, i.e. the maximal probability of sending x and getting anything different. Inserting this quantity for ∆ in Definition 6.1.1 applied to a classical channel T and the “bit-algebra” B = C2 , we get exactly Shannon’s classical definition of the capacity of a discrete memoryless channel [191]. Hence we can apply Shannon’s noisy channel coding theorem to calculate C c (T ) for a classical channel. To state it we have to introduce first some terminology. Consider therefore a state p ∈ C ∗ (X) of the classical input algebra C(X) and its image q = T ∗ (p) ∈ C ∗ (Y ) under the channel. p and q are probability distributions on X respectively Y and px canPbe interpreted as the probability that the “letter” x ∈ X was send. 
Similarly qy = x Txy px is the probability that y ∈ Y was received and Pxy = Txy px is the probability that x ∈ X was sent and y ∈ Y was received. The family of all Pxy can be interpreted as a probability distribution P on X × Y and the Txy can be regarded as conditional probability of P under the condition x. Now we can introduce the mutual information ¶ µ X Pxy , (6.34) I(p, T ) = S(p) + S(q) − S(P ) = Pxy log2 px q y (x,y)∈X×Y where S(p), S(q) and S(P ) denote the entropies of p, q and P . The mutual information describes, roughly speaking, the information that p and q contain about each other. E.g. if p and q are completely uncorrelated (i.e. Pxy = px qy ) we get I(p, T ) = 0. If T is on the other hand an ideal bit-channel and p equally distributed we have I(p, T ) = 1. Now we can state Shannon’s Theorem which expresses the classical capacity of T in terms of mutual informations [191]: Theorem 6.2.1 (Shannon) The classical capacity of Cc (T ) of a classical communication channel T : C(Y ) → C(X) is given by Cc (T ) = sup I(p, T ), (6.35) p where the supremum is taken over all states p ∈ C ∗ (X). 6.2.2 The classical capacity of a quantum channel If we transmit classical data through a quantum channel T : B(H) → B(H) the encoding E : B(H) → C2 is a parameter dependent preparation and the decoding D : C2 → B(H) is an observable. Hence the composition ET D is a channel C2 → C2 , i.e. a purely classical channel and we can calculate its capacity in terms of Shannon’s Theorem (Theorem 6.2.1). The corresponding quantity supE,D Cc (ET D) is 3 Please note that this implies in particular that we do not give a complete review of the foundations of classical information theory here; cf [140, 88, 63] instead. 6.2. Coding theorems 97 obviously a lower bound to the classical capacity. It can be defined alternatively in terms of Definition 6.1.1 if we allow only encodings (which are for the classical capacity parameter dependent preparations) E ∗ : C ∗ (X N ) → B ∗ (H⊗M ) which are based on separable signal states and similarly decodings D : C(X N ) → B(H⊗N ) which allow only separable measurements. It is known that the value supE,D Cc (ET D) can be improved if we allow entangled measurements for decodings [112]. The corresponding capacity, which uses separable signal states but arbitrary decodings is called the one-shot classical capacity of T and denoted by Cc,1 (T ). It is not known in general whether entangled encodings would provide further improvements. Therefore we only have 1 Cc,1 (T ⊗M ) (6.36) Cc,1 (T ) ≤ Cc (T ) = sup M M ∈N for a general T . There are however examples for which Cc,1 is additive and coincides therefore with the classical capacity. This concerns in particular the depolarizing channel [142] and all qubit channels [141]. Another reason why Cc,1 (T ) is an interesting quantity relies on the fact that we have, due to the following theorem [112] a computable expression for it. Theorem 6.2.2 The one-shot classical capacity Cc,1 (T ) of a quantum channel T : B(H) → B(H) is given by X X ¡ ¢ Cc,1 (T ) = sup S pj T ∗ [ρj ] − (6.37) pj S T ∗ [ρj ] , pj ,ρj j j where the supremum is taken over all probability distributions pj and collections of density operators ρj . 6.2.3 Entanglement assisted capacity Another classical capacity of a quantum channel arises, if we use dense coding schemes instead of simple encodings and decodings to transmit the data through the channel T . 
In other words we can define the entanglement enhanced classical capacity Ce (T ) in the same way as Cc (T ) but by replacing the encoding and decoding channels in Definition 6.1.1 and Equation (6.2) by dense coding protocols. Note that this implies that the sender Alice and the receiver Bob share an (arbitrary) amount of (maximally) entangled states prior to the transmission. For this quantity a coding theorem was recently proven by Bennett and others [26] which we want to state in the following. To this end assume that we are transmitting systems in the state ρ ∈ B ∗ (H) through the channel and that ρ has the purification Ψ ∈ H ⊗ H, i.e. ρ = tr1 |ΨihΨ| = tr2 |ΨihΨ|. Then we can define the entropy exchange h¡ ¢¡ ¢i S(ρ, T ) = S T ⊗ Id |ΨihΨ| . (6.38) ¡ ¢¡ ¢ The density operator T ⊗ Id |ΨihΨ| has the output state T ∗ (ρ) and the input state ρ as its partial traces. It can be regarded therefore as the quantum analog of the input/output probability distribution Pxy defined in Subsection 6.2.1. Another way to look at S(ρ, T ) is in terms of an ancilla representation of T : If T ∗ (ρ) = trK (U ρ ⊗ ρK U ∗ ) with a unitary U on H ⊗ K and a pure environment state ρK it can be shown [13] that S(ρ, T ) = S [TK∗ ρ] where TK is the channel describing the information transfer into the environment, i.e. TK∗ (ρ) = trH (U ρ ⊗ ρK U ∗ ), in other words S(ρ, T ) is the final entropy of the environment. Now we can define I(ρ, T ) = S(ρ) + S(T ∗ ρ) − S(ρ, T ) (6.39) which is the quantum analog of the mutual information given in Equation (6.34). It has a number of nice properties, in particular positivity, concavity with respect 6. Channel capacity 98 to the input state and additivity [3] and its maximum with respect to ρ coincides actually with Ce (T ) [26]. Theorem 6.2.3 The entanglement assisted capacity Ce (T ) of a quantum channel T : B(H) → B(H) is given by Ce (T ) = sup I(ρ, T ), (6.40) ρ where the supremum is taken over all input states ρ ∈ B ∗ (H). Due to the nice additivity properties of the quantum mutual information I(ρ, T ) the capacity Ce (T ) is known to be additive as well. This implies that it coincides with the corresponding “one-shot” capacity, and this is an essential simplification compared to the classical capacity Cc (T ). 6.2.4 The quantum capacity Although there is no coding theorem for the quantum capacity Cq (T ), there is a fairly good candidate which is related to the coherent information J(ρ, T ) = S(T ∗ ρ) − S(ρ, T ). (6.41) Here S(T ∗ ρ) is the entropy of the output state and S(ρ, T ) is the entropy exchange defined in Equation (6.38). It is argued [13] that J(ρ, T ) plays a role in quantum information theory which is analogous to that of the (classical) mutual information (6.34) in classical information theory. J(ρ, T ) has some nasty properties, however: it can be negative [51] and it is known to be not additive [71]. To relate it to Cq (T ) it is therefore not sufficient to consider a one-shot capacity as in Shannon’s Theorem (Thm 6.2.1). Instead we have to define Cs (T ) = sup N 1 Cs,1 (T ⊗N ) with Cs,1 (T ) = sup J(ρ, T ). N ρ (6.42) In [13] and [14] it is shown that Cs (T ) is an upper bound on Cq (T ). Equality, however, is conjectured but not yet proven, although there are good heuristic arguments [153],[122]. A second interesting quantity which provides an upper bound on the quantum capacity uses the transposition operation Θ on the output systems. We have seen already in Lemma 6.1.4 that Cq (T ) ≤ Cθ (T ) = log2 kT Θkcb (6.43) holds for any channel. 
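The quantities (6.38), (6.39) and (6.41) are straightforward to evaluate numerically once the channel is given in Kraus form in the Schrödinger picture. The following sketch does this; the depolarizing qubit channel and the noise value θ = 0.2 are only illustrative choices (cf. the examples in Subsection 6.2.5), and the purification is constructed via ρ^(1/2).

```python
# Sketch: entropy exchange (6.38), quantum mutual information (6.39) and coherent
# information (6.41) for a channel given by Schroedinger-picture Kraus operators.
import numpy as np

def entropy(rho, eps=1e-12):
    p = np.real(np.linalg.eigvalsh(rho))
    p = p[p > eps]
    return float(-(p * np.log2(p)).sum())

def joint_state(kraus, rho):
    """(T (x) Id)(|Psi><Psi|) with the purification |Psi> = (rho^(1/2) (x) 1) sum_j |jj>."""
    d = rho.shape[0]
    w, v = np.linalg.eigh(rho)
    sqrt_rho = v @ np.diag(np.sqrt(np.clip(w, 0, None))) @ v.conj().T
    psi = sqrt_rho.reshape(d * d)                      # row-major reshape gives this vector
    P = np.outer(psi, psi.conj())
    return sum(np.kron(K, np.eye(d)) @ P @ np.kron(K, np.eye(d)).conj().T for K in kraus)

def mutual_and_coherent(kraus, rho):
    out = sum(K @ rho @ K.conj().T for K in kraus)     # output state T*(rho)
    s_exch = entropy(joint_state(kraus, rho))          # S(rho, T), Eq. (6.38)
    return (entropy(rho) + entropy(out) - s_exch,      # I(rho, T), Eq. (6.39)
            entropy(out) - s_exch)                     # J(rho, T), Eq. (6.41)

theta = 0.2                                            # illustrative depolarizing qubit channel
X, Y, Z = np.array([[0, 1], [1, 0]]), np.array([[0, -1j], [1j, 0]]), np.diag([1.0, -1.0])
kraus = [np.sqrt(1 - 3 * theta / 4) * np.eye(2)] + [np.sqrt(theta / 4) * P for P in (X, Y, Z)]
print(mutual_and_coherent(kraus, np.eye(2) / 2))       # at this theta the maximally mixed input maximizes I and J
```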
Cθ is in many cases a weaker bound than $C_s$; however, it is much easier to calculate and it is in particular useful if we want to identify cases where the quantum capacity is zero (e.g. the quantum capacity of a classical channel discussed in Corollary 6.1.5). Finally we want to mention that lower bounds can be derived in terms of rates which can be achieved by distinguished coding schemes; cf. e.g. [24, 99, 71, 158]. A detailed discussion of this approach is given in the next chapter.

6.2.5 Examples

Although the expressions provided in the coding theorems above are much easier to calculate than the original definitions, they still involve optimization problems over possibly large parameter spaces. Nevertheless there are special cases which allow explicit calculations. As a first example we will consider the "quantum erasure channel", which transmits the $d$-dimensional input state intact with probability $1-\vartheta$, while with probability $\vartheta$ it is replaced by an "erasure symbol", i.e. a $(d+1)$-th pure state $\psi_e$ which is orthogonal to all others [100]. In the Schrödinger picture this is
$$B^*(\mathbb{C}^d) \ni \rho \mapsto T^*(\rho) = (1-\vartheta)\rho + \vartheta\operatorname{tr}(\rho)|\psi_e\rangle\langle\psi_e| \in B^*(\mathbb{C}^{d+1}). \qquad (6.44)$$
This example is very unusual, because all capacities discussed up to now can be calculated explicitly: we get $C_{c,1}(T) = C_c(T) = (1-\vartheta)\log_2(d)$ for the classical, $C_e(T) = 2C_c(T)$ for the entanglement enhanced classical capacity, and $C_q(T) = \max(0, (1-2\vartheta)\log_2(d))$ for the quantum capacity [23, 25]. Hence the gain by entanglement assistance is exactly a factor of two; cf. Figure 6.1.

[Figure 6.1: Capacities of the quantum erasure channel plotted as a function of the error probability.]

Our next example is the depolarizing channel
$$B^*(\mathbb{C}^d) \ni \rho \mapsto T^*(\rho) = (1-\vartheta)\rho + \vartheta\operatorname{tr}(\rho)\frac{1\!\mathrm{I}}{d} \in B^*(\mathbb{C}^d), \qquad (6.45)$$
already discussed in Section 3.2. It is more interesting and more difficult to study. Using the unitary covariance of $T$ (cf. Subsection 3.2.2) we see first that $I(U\rho U^*, T) = I(\rho, T)$ holds for all unitaries $U$ (to calculate $S(U\rho U^*, T)$ note that $U\otimes U\,\Psi$ is a purification of $U\rho U^*$ if $\Psi$ is a purification of $\rho$). Due to the concavity of $I(\rho, T)$ in the first argument we can average over all unitaries and see that the maximum in Equation (6.40) is achieved on the maximally mixed state. A straightforward calculation therefore shows that
$$C_e(T) = \log_2(d^2) + \Bigl(1 - \vartheta\frac{d^2-1}{d^2}\Bigr)\log_2\Bigl(1 - \vartheta\frac{d^2-1}{d^2}\Bigr) + \vartheta\frac{d^2-1}{d^2}\log_2\frac{\vartheta}{d^2} \qquad (6.46)$$
holds, while we have
$$C_{c,1}(T) = \log_2(d) + \Bigl(1 - \vartheta\frac{d-1}{d}\Bigr)\log_2\Bigl(1 - \vartheta\frac{d-1}{d}\Bigr) + \vartheta\frac{d-1}{d}\log_2\frac{\vartheta}{d}, \qquad (6.47)$$
where the maximum in Equation (6.37) is achieved for an ensemble of equiprobable pure states taken from an orthonormal basis in $H$ [114]. This is plausible since the first term under the sup in Equation (6.37) becomes maximal and the second becomes minimal: $\sum_j p_j T^*\rho_j$ is maximally mixed in this case and its entropy is therefore maximal. The entropies of the $T^*\rho_j$ are on the other hand minimal if the $\rho_j$ are pure. In Figure 6.2 we have plotted both capacities as a function of the noise parameter $\vartheta$, and in Figure 6.3 we have plotted the quotient $C_e(T)/C_c(T)$, which gives an upper bound on the gain we get from entanglement assistance. Note in this context that due to a result of King [142] $C_c(T) = C_{c,1}(T)$ holds for the depolarizing channel.

[Figure 6.2: Entanglement enhanced classical and classical capacity of a depolarizing qubit channel.]

[Figure 6.3: Gain of using entanglement assisted versus unassisted classical capacity for a depolarizing qubit channel.]

[Figure 6.4: Cθ(T), Cs(T) and the Hamming bound of a depolarizing qubit channel plotted as a function of the noise parameter ϑ.]

For the quantum capacity of the depolarizing channel precise calculations are not available. Hence let us consider first the coherent information. $J(\rho, T)$ inherits from $T$ its unitary covariance, i.e. we have $J(U\rho U^*, T) = J(\rho, T)$. In contrast to the mutual information, however, it does not have nice concavity properties, which makes the optimization over all input states more difficult to solve. Nevertheless, the calculation of $J(\rho, T)$ is straightforward and we get in the qubit case (if $\vartheta$ is the noise parameter of $T$ and $\lambda$ is the highest eigenvalue of $\rho$):
$$J(\rho, T) = S\Bigl(\lambda(1-\vartheta)+\tfrac{\vartheta}{2}\Bigr) + S\Bigl((1-\lambda)(1-\vartheta)+\tfrac{\vartheta}{2}\Bigr) - S\Bigl(\tfrac{1-\vartheta/2+A}{2}\Bigr) - S\Bigl(\tfrac{1-\vartheta/2-A}{2}\Bigr) - S\Bigl(\tfrac{\lambda\vartheta}{2}\Bigr) - S\Bigl(\tfrac{(1-\lambda)\vartheta}{2}\Bigr) \qquad (6.48)$$
where $S(x) = -x\log_2(x)$ denotes again the entropy function and
$$A = \sqrt{(2\lambda-1)^2(1-\vartheta/2)^2 + 4\lambda(1-\lambda)(1-\vartheta)^2}. \qquad (6.49)$$
Optimization over $\lambda$ can be performed at least numerically (the maximum is attained at the left boundary $\lambda = 1/2$ if $J$ is positive there, and at the right boundary otherwise). The result is plotted together with $C_\theta(T)$ in Figure 6.4 as a function of $\vartheta$. The quantity $C_\theta(T)$ is much easier to compute and we get
$$C_\theta(T) = \max\Bigl\{0,\ \log_2\Bigl(2 - \tfrac{3}{2}\vartheta\Bigr)\Bigr\}. \qquad (6.50)$$

Part II
Advanced topics

Chapter 7
Continuity of the quantum capacity

In Section 6.2 we have stated that a coding theorem for the quantum capacity is not yet available. Nevertheless, there are several subproblems which can be treated independently and which admit simpler solutions. One of them concerns the question we are going to answer in the following: Is it possible to correct small errors with small coding effort? Or more precisely: if a channel $T$ is close to an ideal channel Id, is $C_q(T)$ close to $C_q(\mathrm{Id})$? The arguments in this chapter are based on [139]. Closely related discussions given by other authors are [105, 158].

7.1 Discrete to continuous error model

In Section 4.4 we have described how errors can be corrected which occur only on a small number $k$ of $n > k$ parallel channels. Hence the corresponding schemes correct rare errors rather than small errors which occur on each copy of the parallel channels $T^{\otimes n}$. Nevertheless, the discrete theory can be applied to the situation we are studying in this chapter. This is the content of the following Proposition. It is the appropriate formulation of "reducing the order of errors from $\varepsilon$ to $\varepsilon^{f+1}$".

Proposition 7.1.1 Let $T : B(H) \to B(H)$ be a channel, and let $E, D$ be encoding and decoding channels for coding $m$ systems into $n$ systems. Suppose that this coding scheme corrects $f$ errors (Definition 4.4.1), and that
$$\|T - \mathrm{id}\|_{\mathrm{cb}} \leq (f+1)/(n-f-1). \qquad (7.1)$$
Then
$$\|ET^{\otimes n}D - \mathrm{id}\|_{\mathrm{cb}} \leq \|T - \mathrm{id}\|_{\mathrm{cb}}^{f+1}\,2^{nH_2((f+1)/n)}, \qquad (7.2)$$
where $H_2(r) = -r\log_2 r - (1-r)\log_2(1-r)$ denotes the Shannon entropy of the probability distribution $(r, 1-r)$.

Proof. Into $ET^{\otimes n}D$ we insert the decomposition $T = \mathrm{id} + (T - \mathrm{id})$ and expand the product.
This gives 2n terms, containing tensor products with some number, say k, of tensor factors (T − id) and tensor factors id on the remaining (n − k) sites. Now when k ≤ f , the error correction property makes the term zero. Terms with k > f we estimate by kT − id kkcb . Collecting terms we get kET ⊗n D − id kcb ≤ µ ¶ n X n kT − id kkcb . k (7.3) k=f +1 The rest then follows from the next Lemma (with r = (f + 1)/n). It treats the exponential growth in n for truncated binomial sums. Lemma 7.1.2 Let 0 ≤ r ≤ 1 and a > 0 such that a ≤ r/(1 − r). Then, for all integers n: à n µ ¶ ! X n ¡ 1 (7.4) ak ≤ log ar ) + H2 (r) . log k n k=rn Proof. For λ > 0 we can estimate the step function by an exponential, and get n µ ¶ n µ ¶ X X n k n k λ(k−rn) a ≤ a e k k k=rn k=0 ¡ ¢n = e−λrn 1 + aeλ = M (λ)n (7.5) 7.2. Coding by random graphs 105 ¡ ¢ with M (λ) = e−λr 1 + aeλ . The minimum over all real λ is attained at aeλmin = r/(1−r). We get λmin ≥ 0 precisely when the conditions of the Lemma are satisfied, in which case the bound is computed by evaluating M (λ). 2 2 Suppose now that we find a family of coding schemes with n, m → ∞ with fixed rate r ≈ (m/n) of inputs per output, and a certain fraction f /n ≈ ε of errors being corrected. Then we can apply the Proposition and find that the errors can be estimated above by ´n ¡ ¢ ³ H2 (ε) ∆ T ⊗n , Mm ≤ 2 kT − id kεcb . (7.6) d This goes to zero, and even exponentially to zero, as soon as the expression in parentheses is < 1. This will be the case whenever kT − id kcb is small enough, or, more precisely, kT − id kcb ≤ 2−H2 (ε)/ε . (7.7) Note in addition that we have for all n ∈ N 2H2 (ε)/ε < ² − n1 1−²+ 1 n . (7.8) Hence the bound from Equation (7.1) is implied by (7.7). The function appearing on the right hand side of (7.7) looks rather complicated, so we will often replace it by a simpler one, namely ε ≤ 2−H2 (ε)/ε , e (7.9) where e is the base of natural logarithms; cf. Figure 7.1. The proof of this inequality is left to the reader as exercise in logarithms. The bound is very good (exact to first order) in the range of small ε, in which we are mostly interested anyhow. In any case, from kT − id k ≤ ε/e we can draw the same conclusion as from (7.7): exponentially decreasing errors, provided we can actually find code families correcting a fraction ² of errors. This will be the aim of the next section. 7.2 Coding by random graphs Our aim in this section is to apply the theory of graph codes (Subsection 4.4.2) to construct a family of codes with positive rate. It is not so easy to construct such families explicitly. However, if we are only interested in existence, and do not attempt to get the best possible rates, we can use a simple argument, which shows not only the existence of codes correcting a certain fraction of errors, but even that “typical graph codes” for sufficiently large numbers of inputs and outputs have this property. Here “typical” is in the sense of the probability distribution, defined by simply setting the edges of the graph independently, and each according to the uniform distribution of the possible values of the adjacency matrix. For the random method to work we need the dimension of the underlying one site Hilbert space to be a prime number. This curious condition is most likely an artefact of our method, and will be removed later on. We have seen that a graph code corrects many errors if certain submatrices of the adjacency matrix have maximal rank (Corollary 4.4.3). Therefore we need the following Lemma. 
Lemma 7.2.1 Let $d$ be a prime, $M < N$ integers, and let $X$ be an $N \times M$-matrix with independent and uniformly distributed entries in $\mathbb{Z}_d$. Then $X$ is singular over the field $\mathbb{Z}_d$ with probability at most $d^{-(N-M)}$.

[Figure 7.1: The two bounds from Equation (7.9) plotted as a function of ε.]

Proof. The sum of independent uniformly distributed random variables in $\mathbb{Z}_d$ is again uniformly distributed. Moreover, since $d$ is prime, this distribution is invariant under multiplication by non-zero factors. Hence if $x_j \in \mathbb{Z}_d$ ($j = 1, \ldots, N$) are independent and uniformly distributed, and $\phi_j \in \mathbb{Z}_d$ are non-random constants, not all of which are zero, $\sum_{j=1}^{N}x_j\phi_j$ is uniformly distributed. Hence, for a fixed vector $\phi \in \mathbb{Z}_d^M$, the $N$ components $(X\phi)_k = \sum_{j=1}^{M}X_{kj}\phi_j$ are independent uniformly distributed random variables. Hence the probability for $X\phi = 0$ for some fixed $\phi \neq 0$ is $d^{-N}$. Since there are $d^M - 1$ vectors $\phi$ to be tested, the probability for some $\phi$ to yield $X\phi = 0$ is at most $d^{M-N}$. $\square$

Proposition 7.2.2 Let $d$ be a prime, and let $\Gamma$ be a symmetric $(n+m)\times(n+m)$-matrix with entries in $\mathbb{Z}_d$, chosen at random such that $\Gamma_{kk} = 0$ and that the $\Gamma_{kj}$ with $k > j$ are independent and uniformly distributed. Let $P$ be the probability for the corresponding graph code not to correct $f$ errors (with $2f < n$). Then
$$\frac{1}{n}\log P \leq \Bigl(\frac{m}{n} + \frac{4f}{n} - 1\Bigr)\log d + H_2\Bigl(\frac{2f}{n}\Bigr). \qquad (7.10)$$

Proof. Each error configuration is a $2f$-element subset of the $n$ output nodes. According to Theorem 4.4.2 we have to decide whether the corresponding $(n-2f)\times(m+2f)$-submatrix of $\Gamma$, connecting input and error positions with the remaining output positions, is singular or not. Since this submatrix contains no pairs $\Gamma_{ij}, \Gamma_{ji}$, its entries are independent and satisfy the conditions of the previous Lemma. Hence the probability that a particular error configuration goes uncorrected is at most $d^{(m+2f)-(n-2f)}$. Since there are $\binom{n}{2f}$ possible error configurations among the outputs, we can estimate the probability of any $2f$-site error configuration being undetected as less than $\binom{n}{2f}d^{m-n+4f}$. Using Lemma 7.1.2 we can estimate the binomial as $\log\binom{n}{2f} \leq nH_2(2f/n)$, which leads to the bound stated. $\square$

In particular, if the right hand side of the inequality in (7.10) is negative, we get $P < 1$, so that there must be at least one matrix $\Gamma$ correcting $f$ errors. The crucial point is that this observation does not depend on $n$, but only on the rate-like parameters $m/n$ and $f/n$. Let us make this behavior a Definition:

Definition 7.2.3 Let $d$ be an integer. Then we say a pair $(\mu, \varepsilon)$ consisting of a coding rate $\mu$ and an error rate $\varepsilon$ is achievable, if for every $n$ we can find an encoding $E$ of $\lceil\mu n\rceil$ $d$-level systems into $n$ $d$-level systems correcting $\lfloor\varepsilon n\rfloor$ errors.

Then we can paraphrase the last proposition as saying that all pairs $(\mu, \varepsilon)$ with
$$(1 - \mu - 4\varepsilon)\log_2 d > H_2(2\varepsilon) \qquad (7.11)$$
are achievable. This is all the input we need for the next section, although a better coding scheme, giving larger $\mu$ or larger $\varepsilon$, would also improve the rate estimates proved there. Such improvements are indeed possible. E.g. for the qubit case ($d = 2$) it is shown in [47] that there is always a code which saturates the quantum Gilbert-Varshamov bound $(1 - \mu - 2\varepsilon\log_2(3)) > H_2(2\varepsilon)$, which is slightly better than our result. But there are also known limitations, particularly the so-called Hamming bound, to which we return after the sketch below.
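The achievable region (7.11) is easy to explore numerically. The sketch below evaluates, for a few arbitrary error rates ε, the boundary coding rate µ obtained from random graph coding and, for comparison, the quantum Gilbert-Varshamov-type rate quoted from [47] for d = 2.

```python
# Sketch: boundary of the achievable region (7.11) vs. the Gilbert-Varshamov-type
# bound for qubits ([47]). The grid of error rates eps is an arbitrary choice.
import numpy as np

def H2(x):
    return 0.0 if x <= 0.0 or x >= 1.0 else float(-x * np.log2(x) - (1 - x) * np.log2(1 - x))

def mu_random_graph(eps, d=2):
    """Boundary rate of (7.11); every mu strictly below this value is achievable."""
    return max(0.0, 1.0 - 4 * eps - H2(2 * eps) / np.log2(d))

def mu_gilbert_varshamov(eps):
    """Boundary rate of the qubit bound (1 - mu - 2 eps log2 3) > H2(2 eps) from [47]."""
    return max(0.0, 1.0 - 2 * eps * np.log2(3) - H2(2 * eps))

for eps in (0.01, 0.05, 0.08):
    print(eps, round(mu_random_graph(eps), 3), round(mu_gilbert_varshamov(eps), 3))
```

As expected, the Gilbert-Varshamov-type rate is slightly larger. The Hamming bound, in contrast, limits from above which pairs (µ, ε) can possibly be achieved.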
This is a simple dimension counting argument, based on the error correctors dream: Assuming that the scalar product (F, G) 7→ ω(F ∗ G) on the error space E is nondegenerate, the dimension of the “bad space” is the same as the dimension of the error space. Hence with the notations of Section 4.4 we expect dim H0 · dim E ≤ dim H2 . We now take m input systems and n output systems of dimension d each, so that dim H1 = dm and dim H2 = dn . For the space of errors happening at at most f places we introduce a basis s follows: at each site we choose a basis of B(H) consisting of d2 −1 operators plus the identity. Then a basis of E is given byP all tensor ¡ ¢ products with basis elements 6= 1I placed at j ≤ f sites. Hence dim E = j≤f nj (d2 − 1)j . For large n we estimate this as in Lemma 7.1.1 as log dim E ≈ (f /n) log2 (d2 − 1) + H2 (f /n). Hence the Hamming bound becomes f m log2 d + H2 (²) + log2 (d2 − 1) ≤ log2 d n n (7.12) which (with d2 À 1) is just (7.11) with a factor 1/2 on all errors. If we drop the nondegeneracy condition made above it is possible to find codes which break the Hamming bound [71]. In this case, however, we can consider the weaker singleton bound, which has to be respected by those degenerate codes as well. It reads m f 1− ≥d . (7.13) n n We omit its proof here (see [172] Sect. 12.4 instead). Both bounds are plotted together with the rate achieved by random graph coding in in Figure 7.2 (for d = 2). 7.3 Results We are now ready to apply the results about error correction just derived to the calculation of achievable rates and therefore to lower bounds on the quantum capacity. 7.3.1 Correcting small errors We first look at the problem which motivated our study, namely estimating the capacity of a channel T ≈ Id. Theorem 7.3.1 Let d be a prime, and let T be a channel on d-level systems. Suppose that for some 0 < ε < 1/2, k id −T k < 2−H2 (ε)/ε . (7.14) 7. Continuity of the quantum capacity µ 108 1 Hamming bound Singleton bound Achieved by random graph coding 0.8 0.6 0.4 0.2 0 0 0.1 0.2 0.3 0.4 0.5 ² Figure 7.2: Singleton bound and Hamming bound together with the rate achieved by random graph coding (for d = 2). The allowed regions are below the respective curve. Then Cq (T ) ≥ (1 − 4ε) log2 (d) − H2 (2ε) = g(²) (7.15) Proof. For every n set f = bεnc, and m = bµnc − 1, where µ is, up to a log 2 (d) factor, the right hand side of (7.15), i.e. µ = 1 − 4ε − log2 (d)−1 H2 (2ε). This ensures that the right hand side of (7.10) is strictly negative, so there must be a code for d-level systems, with m inputs and n outputs, and correcting f errors. To this code we apply Proposition 7.1.1, and insert the bound on k id −T k into Equation (7.6). bµnc−1 Thus ∆(T ⊗n , Md ) → 0, even exponentially. This means that any number < µ log2 (d) is an achievable rate. In other words, µ log 2 (d) is a lower bound to the capacity. 2 If ² > 0 is small enough the quantity on the right hand side of Equation (7.15) is strictly positive (cf. the dotted graph in Figure 7.2). Hence each channel which is sufficiently close to the identity allows (asymptotically) perfect error correction. Beyond that we see immediately that Cq (T ) is continuous (in the cb-norm) at T = Id: Since Cq (T ) is smaller than log2 (d) and g(²) is continuous in ² with g(0) = log2 (d) we find for each δ > 0 an ² > 0 exists, such that log 2 (d) − Cq (T ) < ² for all T with kT − Id kcb < ²/e. In other words if T is arbitrarily close to the identity its capacity is arbitrarily close to log 2 (d). 
In Corollary 7.3.3 below we will show the significantly stronger statement that Q is a lower semicontinuous function on the set of all channels. 7.3.2 Estimating capacity from finite coding solutions A crucial consequence of the ability to correct small errors is that we do not actually have to compute the limit defining the capacity: if we have a pretty good coding scheme for a given channel, i.e., one that gives us ET ⊗k D ≈ idd , then we know the errors can actually be brought to zero, and the capacity is close to the nominal rate of this scheme, namely log2 (d)/k. 7.3. Results 109 Theorem 7.3.2 Let T be a channel, not necessarily between systems of the same dimension. Let k, p ∈ N with p a prime number, and suppose there are channels E and D encoding and decoding a p-level system through k parallel uses of T , with 1 error ∆ = k idp −ET ⊗k Dkcb < 2e . Then Cq (T ) ≥ 1 log2 (p) (1 − 4e∆) − H2 (2e∆) . k k (7.16) Moreover, Cq (T ) is the least upper bound on all expressions of this form. Proof. We apply Proposition 7.3.1 to the channel Te = ET ⊗k D. With the random e and D e coding method we thus find a family of coding and decoding channels E from m0 into n0 systems, of p levels each, such that ¡ ¢ 0 e ET ⊗k D ⊗n Dk e cb → 0. k id −E (7.17) 0 This can be reinterpreted as an encoding of pm -dimensional systems through kn0 uses of the channel T (rather than Te), which corresponds to a rate 0 (kn0 )−1 log2 (pm ) = (log2 p/k)(m0 /n0 ). We now argue exactly as in the proof of the previous proposition, with ε = e∆, so that k idp −ET ⊗k Dkcb = ε/e ≤ 2H2 (ε)/ε (7.18) by equation (7.9). By random graph coding we can achieve the coding ratio µ ≈ 0 0 (m0 /n0 ) = 1 − 4ε − log2 (p)−1 H2 (2ε), and have the errors ∆(Te⊗n , Mm p ) go to zero exponentially. Since ¡ ¢ 0 0 0 e cb , e ET ⊗k D ⊗n Dk e⊗n0 , Mm0 ) ≤ k id −E ∆(T ⊗kn , Mm p p ) ≤ ∆(T (7.19) we can apply Lemma 6.1.2 to the channel T (where the sequence Mα is given by Mα = nα) and find that the rate µ(log2 p/k) is achievable. This yields the estimate claimed in Equation (7.16). To prove the second statement consider the function x → p(x) which associates to each real number x ≥ 2 the biggest prime p(x) with p(x) ≤ x. From known bounds on the length of gaps between two consecutive primes [127]1 it follows that limx→∞ x/p(x) = 1 holds, hence we get 2kc /p(2kc ) ≤ 1 + δ 0 for an arbitrary δ 0 > 0, provided n is large enough, but this implies £ ¤ log2 p(2kc ) log2 (1 + δ 0 ) c− < . (7.20) k k Since we can choose an achievable rate c arbitrarily close to the capacity C q (T ) this shows that there is for each δ > 0 a prime p and a positive integer k such that |Cq (T ) − log2 (p)/k| ≤ δ. In addition we can find a coding scheme E, D for T ⊗k such that Equation (7.18) holds, i.e. the right hand side of (7.16) can be arbitrarily close to log2 (p)/k, and this completes the proof. 2 This theorem allows us to derive very easily an important continuity property of the quantum capacity. It is well known that each function F (on a topological space) which is given as the supremum of¡ a set of ¢ real-valued, continuous functions is lower semicontinuous, i.e. the set F −1 (x, ∞] is open for each x ∈ R. Since the right hand side of Equation (7.16) is continuous in T and since Q(T ) is (according to Proposition 7.3.2) the supremum over such quantities, we get: Corollary 7.3.3 T 7→ Cq (T ) is lower semi-continuous in cb-norm. 
¹ If $p_n$ denotes the $n$th prime and $g(p_n)=p_{n+1}-p_n$ is the length of the gap between $p_n$ and $p_{n+1}$, it is shown in [127] that $g(p_n)$ is bounded by $\mathrm{const}\cdot p_n^{5/8+\varepsilon}$.

7.3.3 Error exponents

Another consequence of Theorem 7.3.2 concerns the rate with which the error $\Delta(T^{\otimes n},M_2^{\lfloor cn\rfloor})$ decays in the limit $n\to\infty$. Theorem 7.3.2 says, roughly speaking, that we can achieve each rate $c<C_q(T)$ by combining a coding scheme $E,D$ with subsequent random-graph coding $\widetilde E,\widetilde D$. However, the error $\Delta\bigl[(ET^{\otimes n}D)^{\otimes l},M_p^{k}\bigr]$ decays exponentially according to (7.6) and Proposition 7.2.2. A more precise analysis of this idea leads to the following (cf. also the work of Hamada [105]):

Proposition 7.3.4 If $T$ is a channel with quantum capacity $C_q(T)$ and $c<C_q(T)$, then for sufficiently large $n$ we have

$\Delta(T^{\otimes n},M_2^{\lfloor cn\rfloor})\;\le\;e^{-n\lambda(c)}$,   (7.21)

with a positive constant $\lambda(c)$.

Proof. We start as in Theorem 7.3.2 with the channel $\widetilde T=ET^{\otimes k}D$ and the quantity $\Delta=\|\mathrm{id}_p-ET^{\otimes k}D\|_{\mathrm{cb}}$. However, instead of assuming that $\Delta=\varepsilon/e$ holds, the full range $e\Delta\le\varepsilon\le1/2$ is allowed for the error rate $\varepsilon$. Using the same arguments as in the proof of Theorem 7.3.2 we get an achievable rate

$c(k,p,\varepsilon)=\frac{\log_2(p)}{k}\left(1-4\varepsilon-\frac{H_2(2\varepsilon)}{\log_2(p)}\right)$   (7.22)

and an exponential bound on the coding error:

$\Delta(T^{\otimes kn'},M_p^{m'})\;\le\;\bigl\|\mathrm{id}-\widetilde E\,\bigl(ET^{\otimes k}D\bigr)^{\otimes n'}\widetilde D\bigr\|_{\mathrm{cb}}\;\le\;\bigl(2^{H_2(\varepsilon)}\Delta^{\varepsilon}\bigr)^{n'}$;   (7.23)

cf. Equations (7.6) and (7.19). To calculate the exponential rate $\lambda(c)$ with which the coding error vanishes we have to consider the quantity

$\lambda(c)=\liminf_{n\to\infty}\Bigl(-\tfrac{1}{n}\ln\Delta(T^{\otimes n},M_2^{\lfloor nc\rfloor})\Bigr)\;\ge\;\lim_{n'\to\infty}\Bigl(-\tfrac{1}{kn'}\ln\bigl(2^{H_2(\varepsilon)}\Delta^{\varepsilon}\bigr)^{n'}\Bigr)$   (7.24)

$\phantom{\lambda(c)}\;\ge\;-\frac{\varepsilon}{k}\Bigl(\ln(\Delta)+\frac{H_2(\varepsilon)}{\varepsilon}\ln 2\Bigr)\;=:\;-\varepsilon\,\Lambda(\Delta,\varepsilon)/k$,   (7.25)

where we have inserted inequality (7.23). Now we can apply Lemma 6.1.2 (with the sequence $M_\alpha=k\alpha$), which shows that $\lambda(c)$ is positive if the right hand side of (7.25) is. What remains to show is that $\lambda(c)>0$ holds for each $c<C_q(T)$. To this end we have to choose $k$, $p$, $\Delta$ and $\varepsilon$ such that $c(k,p,\varepsilon)=c$ and $\Lambda(\Delta,\varepsilon)<0$. Hence consider $\delta>0$ such that $c+\delta<C_q(T)$ is an achievable rate. As in the proof of Theorem 7.3.2 we can choose $\log_2(p)/k$ such that $\log_2(p)/k>c+\delta$ holds while $\Delta$ is arbitrarily small. Hence there is an $\varepsilon_0>0$ such that $c(k,p,\varepsilon)=c$ implies $\varepsilon>\varepsilon_0$. The statement therefore follows from the fact that there is a $\Delta_0>0$ with $\Lambda(\Delta,\varepsilon)<0$ for all $0<\Delta<\Delta_0$ and $\varepsilon>\varepsilon_0$. □

In addition to the statement of Proposition 7.3.4 we have just derived a lower bound on the error exponent $\lambda(c)$. Since we cannot express the error rate $\varepsilon$ as a function of $k$, $p$ and $c$, we cannot specify this bound explicitly. However, we can plot it as a parametrized curve (using Equations (7.22) and (7.25) with $\varepsilon$ as the parameter) in the $(c,\lambda)$-plane. In Figure 7.3 this is done for $k=1$, $p=2$ and several values of $\Delta$.

[Figure 7.3: Lower bounds on the error exponent $\lambda(c)$, plotted for $k=1$, $p=2$ and $\Delta=10^{-3},10^{-4},10^{-5},10^{-6}$.]

7.3.4 Capacity with finite error allowed

We can also tolerate finite errors in encoding. Let $C_q^{(\varepsilon)}(T)$ denote the quantity defined exactly like the capacity, but with the weaker requirement that $\Delta(T^{\otimes n},M_2^{\lfloor cn\rfloor})\le\varepsilon$ for large $n$. Obviously we have $C_q^{(\varepsilon)}(T)\ge C_q(T)$ for each $\varepsilon>0$. Regarded as a function of $\varepsilon$ and $T$, this new quantity admits in addition the following continuity property in $\varepsilon$.

Proposition 7.3.5 $\lim_{\varepsilon\to0}C_q^{(\varepsilon)}(T)=C_q(T)$.

Proof.
By definition we can find for each ²0 , δ > 0 a tuple n, p, E and D such that k idp −ET ⊗n Dkcb = ε0 + ² e (7.26) (²) and |Cq (T ) − log2 (p)/n| < δ holds. If ² + ²0 is small enough, however, we find as in Theorem 7.3.2 a random graph coding scheme such that Cq (T ) ≥ ¢ 1 ¡ ¢ log2 (p) ¡ 1 − 4(² + ²0 ) − H2 2(² + ²0 ) = g(² + ²0 ). n n (7.27) Hence the statement follows from continuity of g and the fact that g(0) = log 2 (p)/n holds. 2 For a classical channel Φ even more is known about the similar defined quantity (²) Cc (T ): If ² > 0 is small enough we can not achieve bigger rates by allowing small (²) errors, i.e. C(T ) = Cc (T ). This is called the “strong converse of Shannon’s noisy channel coding theorem” [191]. To check whether a similar statement holds in the quantum case is one of the big open problem of the theory. Chapter 8 Multiple inputs The topic of this and the following three chapters is a quantitative discussion of the circle of questions we have already visited in Section 4.2, i.e. quantum state estimation, quantum copying and other devices which act on a large number of equally prepared inputs. This means that we are following the spirit of Chapter 5 and 6 and ask questions like: How can we measure the error which an approximate cloning machine produces in its outputs? What is the lower bound on this error? Is there a device which achieves this bound and how does it look like? One fundamental difference to similar questions arising within entanglement distillation and calculations of channel capacities is the fact that we are able to give complete answers under quite general conditions. The reason is that the tasks we are going to discuss admit large symmetry groups which can be used to reduce the number of parameters, which makes the corresponding optimization problems more tractable. Since the material we want to present is quite comprehensive, we have broken it up into four chapters: The topic of the present one (which is a significantly extended version of [133]) is an overview and the discussion of some general properties, while the following three treat special cases, namely: optimal pure state cloning (Chapter 9), quantum state estimation (Chapter 10) and optimal purification (Chapter 11). 8.1 Overview and general structure To start with, let us have a short look on the general structure of this particular type of problem. In all cases we are searching for channels T : A → B(H⊗N ) (8.1) which operate on N d-level quantum systems (i.e. H = Cd ) and produce an output of a possibly different type (described by the observable algebra A). There are two different choices for A we are going to consider: If we look at quantum cloning and related tasks T should be a channel which produces M systems of the same type as the inputs, i.e. A = B(H⊗M ). In slight abuse of language we will call all such channels cloning maps (even if the particular problem we are looking at is only loosely related to cloning) and we write T (N, M ) = {T : B(H⊗M ) → B(H⊗N ) | T unital, cp}. (8.2) For state estimation, T should be an observable with values in the quantum state space S = S(H). Hence we have to choose A = C(S) and the set of all such estimators is denoted by T (N, ∞) = {E : C(S) → B(H⊗N ) | E positive, unital}. (8.3) This notation is justified by the fact that state estimation is in a certain sense the limiting case of cloning for infinitely many output systems. We will make this statement more precise in Chapter 10. 
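Before turning to figures of merit, a minimal illustration (my own, not from the thesis) of what membership in $T(N,M)$ means in the smallest case $N=M=1$, $d=2$: an element of $T(1,1)$ is a unital completely positive map, which in the Schrödinger picture is a trace-preserving map with positive semidefinite Choi matrix. The sketch below checks both properties for a depolarizing qubit channel given in Kraus form (the noise parameter is chosen arbitrarily).

```python
import numpy as np

def choi_matrix(kraus, d):
    """Choi matrix of the Schroedinger-picture map rho -> sum_K K rho K^dagger."""
    C = np.zeros((d * d, d * d), dtype=complex)
    for i in range(d):
        for j in range(d):
            Eij = np.zeros((d, d), dtype=complex)
            Eij[i, j] = 1.0
            block = sum(K @ Eij @ K.conj().T for K in kraus)
            C[i * d:(i + 1) * d, j * d:(j + 1) * d] = block
    return C

def in_T_1_1(kraus, d, tol=1e-10):
    """Check the defining properties in (8.2) for N = M = 1: complete positivity
    (Choi matrix >= 0; automatic for a Kraus form, shown as the general criterion)
    and unitality of the Heisenberg-picture map, i.e. sum_K K^dagger K = 1."""
    cp = np.all(np.linalg.eigvalsh(choi_matrix(kraus, d)) > -tol)
    unital = np.allclose(sum(K.conj().T @ K for K in kraus), np.eye(d), atol=tol)
    return cp and unital

q = 0.25   # arbitrary noise parameter for the test case
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
kraus = [np.sqrt(1 - 3 * q / 4) * np.eye(2)] + [np.sqrt(q / 4) * P for P in (X, Y, Z)]
print(in_T_1_1(kraus, d=2))   # True
```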
In both cases the task is to optimize a “figure of merit” ∆(T ) which measures, roughly speaking, the largest deviation of T ∗ (ρ⊗N ) from the target functional β(ρ) ∈ S(A) we want to approximate. In most cases ∆(T ) has the form £ ¤ ∆(T ) = sup δ T ∗ (ρ⊗N ), β(ρ) , (8.4) ρ∈X 8.1. Overview and general structure 113 where δ is a distance measure on the state space S(A) of the algebra A and X ⊂ S(H). If nothing is known about the input state ρ we have to choose X = S(H). If, in contrast to that, X is strictly smaller than S(H) it describes a priori knowledge about ρ. The most relevant special cases arise when X is the set of pure states or if X is finite. The latter corresponds to a cryptographic setup, where Alice and Bob use finitely many signal states ρ1 , . . . , ρn to send classical information through a quantum channel S and Eve tries to eavesdrop on the conversation by copying the quantum information transmitted through S. Both situations require quite different methods and we will concentrate our discussion on the pure state case (for recent results concerning quantum hypothesis testing, i.e. estimation of states from a finite set, cf. e.g. [173, 107, 166] and the references therein). A different kind of a priori knowledge are a priori measures, i.e. instead of knowing that all possible input states lie in a special set X we know for each measurable set X ⊂ S(H) the probability µ(X) for ρ ∈ X. Such a situation typically arises when we are trying to estimate (or copy) states of systems which originate from a source with known characteristics. In this case we can use mean errors Z £ ¤ ¯ ∆(T ) = δ T ∗ (ρ⊗N ), β(ρ) µ(dρ). (8.5) S(H) as a figure of merit. Sometimes they are easier to compute than maximal errors as ¯ and we will in Equation (8.4). Often however ∆ leads to stronger results than ∆ concentrate our discussion therefore on maximal rather than mean errors. Now assume that a particular estimation or cloning problem is given, which is described by a set of channels and an appropriate figure of merit ∆. Then there is a number of characteristic questions we are interested in the following. The first is: • Is there an optimal device Tb which minimizes the error, i.e. ∆(Tb) = inf T ∆(T ), and how does it look like? Since the dimension of the space T (N, M ) grows exponentially with N and M it seems at a first look to be hopeless to search for a closed form solution for arbitrary N and M . We will see however that some problems admit quite simple (symmetry based) arguments which restrict the size of the spaces, in which we have to search for the minimizers, quite significantly. In this way we will be able to give a complete answer for pure state cloning (Chapter 9) and estimation (Section 10.1) and for purification (Chapter 11). In other cases the situation is more difficult and a closed form solution can not be achieved; for us this concerns primarily mixed state estimation and related tasks. In this situation we will concentrate on the asymptotic behavior in the limit of infinitely many input systems (N → ∞). Here we have to distinguish between estimation (M = ∞) and cloning-like tasks (M arbitrary), because in the latter case we have two parameters which can go to infinity. Let us consider first state estimation. Here our main interests are the following • Find sequences EN , N ∈ N with EN ∈ T (N, ∞) such that limN →∞ ∆(EN ) = 0 holds. We have already seen in Section 4.2 that such sequences exist. • Determine the error exponent ν = limN →∞ −N −1 ln ∆(EN ). 
What is the best ν we can achieve (i.e. what is the fastest decay) and how does the corresponding estimation scheme EN , N ∈ N look like? Note that the EN we are looking for in this context are not necessarily optimal for N < ∞ but if ν is finite the error ∆(EN ) vanishes exponentially fast and the difference (measured by ∆) between EN and an optimal scheme becomes already negligible for a quite small number of input systems. The search for a sequence 8. Multiple inputs 114 EN , N ∈ N with minimal ν can be regarded in addition as a search for a scheme which is asymptotically optimal. We will discuss this circle of question in Chapter 10. If we consider cloning maps there are two parameters N and M which can be send to infinity. One possibility is to keep N fixed and look at the limiting case of infinite many output systems (M → ∞). This situation is closely related to estimation, because cloning devices which can be constructed from estimators (cf. Equation (4.19) and the corresponding discussion in Section 4.2) lead to errors which do not depend on the number M of outputs. We will make this statement precise in Section 10.1.1. A more general type of question appears if both parameters N and M goes to infinity, i.e. we have a sequence M (N ) ∈ N, N ∈ N of positive integers such that limN →∞ M (N )/N = c holds (possibly with c = ∞). Then we can ask • Consider a double sequence TN,M ∈ T (N, M ), N, M ∈ N (e.g. a cloning scheme). What is the best asymptotic rate c = limN →∞ M (N )/N we can achieve such that limN →∞ ∆(TN,M (N ) ) = 0? Note that this is precisely the same type of question we have already encountered within the discussion of entanglement distillation (Section 5.1.3) and channel capacities (Section 6.1.1). One difference is that we can combine the search for asymptotic rates with results (where available) about optimal devices, i.e. we can choose for TN,M those channels which minimizes ∆(TN,M ) for given N, M ∈ N. This simplifies the calculation of optimal rates significantly. We will consider this type of questions in detail in Chapter 11. 8.2 Symmetric cloner and estimators All figures of merit we want to discuss in this work admit a large symmetry group G because they describe “universal” and “symmetric” problems which do not prefer a direction in the Hilbert space H or a particular tensor factor of the input or (if applicable) output systems. This fact can be used to simplify the calculation of ∆ and the solution of the corresponding optimization problem. We have already used similar arguments in the context of entanglement measures (cf. Subsection 5.3), and we will see that the simplification of cloning and estimation problems gained from symmetry arguments are even stronger. The purpose of this section is to discuss the group G, its action on T (N, M ) and in particular the set of cloners (respectively estimators) which are invariant under this action. A short summary of notations and terminology from group theory, which we will use throughout this and the following three chapters, is added as an appendix to this chapter (Section 8.3). 8.2.1 Reducing parameters To start with let us consider the set T (N, M ) of cloning maps defined in Equation (8.2), i.e. M < ∞. The symmetry group G is in this case given as the direct product U(d) × SN × SM of the unitary group U(d) and the symmetric groups SN and SM . 
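To make the two ingredients of this group concrete, here is a small numerical sketch (mine, for illustration only, with $d=2$ and $N=3$): it builds the permutation unitaries $V_\sigma$ on $H^{\otimes N}$ and checks that they commute with the tensor-power representation $U\mapsto U^{\otimes N}$, the elementary commutation relation behind Theorem 3.1.1 and the action defined next.

```python
import numpy as np
from itertools import permutations
from functools import reduce

def permutation_unitary(sigma, d, N):
    """V_sigma on (C^d)^{otimes N}: sends the basis vector labelled by the
    multi-index (i_1, ..., i_N) to the one labelled by (i_{sigma(1)}, ..., i_{sigma(N)})."""
    dim = d ** N
    V = np.zeros((dim, dim))
    for idx in np.ndindex(*([d] * N)):
        src = sum(i * d ** (N - 1 - k) for k, i in enumerate(idx))
        dst_idx = tuple(idx[sigma[k]] for k in range(N))
        dst = sum(i * d ** (N - 1 - k) for k, i in enumerate(dst_idx))
        V[dst, src] = 1.0
    return V

d, N = 2, 3
rng = np.random.default_rng(0)
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
U, _ = np.linalg.qr(A)                 # some unitary; its distribution is irrelevant here
UN = reduce(np.kron, [U] * N)          # the representation U -> U^{otimes N}
for sigma in permutations(range(N)):
    V = permutation_unitary(sigma, d, N)
    assert np.allclose(UN @ V, V @ UN)  # [U^{otimes N}, V_sigma] = 0
print("U^{otimes N} commutes with all permutation unitaries V_sigma")
```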
Its action on T (N, M ) is defined by αU,σ,τ (T ) = (αU ασ ατ )(T ) with (U, σ, τ ) ∈ U(d) × SN × SM and (αU T )(A) = U ⊗N T (U ∗⊗M AU ⊗M )U ∗⊗N , (ασ T )(A) = Vσ T (A)Vσ∗ , (ατ T )(A) = T (Vτ AVτ∗ ), (8.6) where Vσ , Vτ are the unitaries associated to the permutations σ and τ ; cf. Equation (3.7). The transformation T 7→ (αU T ) can be interpreted (passively) as a basis change in the one-particle Hilbert space H, while ασ and ατ refer to permutations of input respectively output systems. Now we have the following lemma: Lemma 8.2.1 Consider the space T (N, M ) for M < ∞ and a convex, lower semicontinuous functional ∆ : T (N, M ) → R+ which is invariant under the action of 8.2. Symmetric cloner and estimators 115 G = U(d) × SN × SM defined in Equation (8.6), i.e. ∆(αg T ) = ∆(T ) holds for all g ∈ G. Then there is at least one Tb ∈ T (N, M ) with ∆(Tb) ≤ ∆(T ) ∀T ∈ T (N, M ) and αg Tb = Tb, (8.7) i.e. there exist minimizers which are invariant under the group action α g . Proof. The existence of minimizers is a simple consequence of compactness of T (N, M ) and semicontinuity of ∆. Hence there is an S ∈ T (N, M ) with ∆(S) ≤ ∆(T ) ∀T ∈ T (N, M ). Due to the invariance of ∆ we get ∆(αg S) = ∆(S) ≤ ∆(T ) ∀g ∈ G ∀T ∈ T (N, M ). (8.8) Now we can average over G (this integral is well defined because αg S ∈ T (N, M ) and T (N, M ) is finite dimensional) Z αg Sdg, (8.9) Tb = G where dg denotes the normalized Haar measure on G (which exists, due to compactness of G). Obviously Tb is G-invariant: Since the action αg is affine we get Z Z Z αg Sdg = αh Tb = αh αhg Sdg = αg0 Sdg 0 = Tb with g 0 = hg. (8.10) G G G Exploiting convexity of ∆ and Equation (8.8) we get in addition (for all T ∈ T (N, M )) ¸ Z ·Z Z b ∆(S)dg = ∆(S), (8.11) ∆(αg S)dg = αg Sdg ≤ ∆(T ) = ∆ G G G which proves the lemma. 2 Hence as long as we are only interested to find some (rather than all) optimal devices we can restrict our attention to those channels which are invariant under the operation αU,σ,τ of U(d) × SN × SM introduced above. It is therefore useful to define Definition 8.2.2 A completely positive, (not necessarily unital) map T : B(H⊗N ) → B(H⊗M ) which is invariant under the action αU,σ,τ of U(d) × SN × SM defined in Equation (8.6) is called a fully symmetric cloning map. The space of all fully symmetric elements of T (N, M ) is denoted by Tfs (N, M ). To adopt the previous discussion to state estimation, let us consider now the space T (N, ∞) of estimators (Equation (8.3)). As the set T (N, M ) defined above, T (N, ∞) is convex, however it is infinite dimensional and compactness is therefore topology dependent. An appropriate topology for our purposes is the weak topology on T (N, ∞), i.e. the coarsest topology such that all functions T (N, ∞) 3 E 7→ hψ, E(f )φi with f ∈ C(S), ψ, φ ∈ H⊗N are continuous. It is then an easy consequence of the Banach-Alaoglu Theorem [186, Theorem IV.21] that T (N, ∞) is compact in this topology. In analogy to Equation (8.6) we can define a weakly continuous action of the group SN × U(d) on T (N, ∞): For each (U, τ ) 3 U(d) × SM we define αU,τ E = (αU ατ )(E) with αU E(f ) = U ⊗N E(fU )U ∗⊗N and ατ E(f ) = Vτ E(f )Vτ , (8.12) where Vτ : H⊗N → H⊗N denotes the permutation unitary associated to τ and fU ∈ C(S) is given by fU (ρ) = f (U ρU ∗ ). Now we can follow the proof of Lemma 8. Multiple inputs 116 8.2.1 if we take into account that integrals of T (N, ∞) valued maps should be considered as weak integrals, i.e. 
the average Ē of αg E over the group G = SN × U(d) is defined as Z hψ, Ē(f )φi = hψ, (αg E)(f )φiµ(dg) ∀f ∈ C(S) ∀ψ, φ ∈ H⊗N . (8.13) G Hence we have: Lemma 8.2.3 Consider a convex, lower semicontinuous (with respect to the weak topology) functional ∆ : T (N, ∞) → R+ which is invariant under the action of G = U(d) × SN defined in Equation (8.12), i.e. ∆(αg E) = ∆(E) holds for all b ∈ T (N, ∞) with g ∈ G. Then there is at least one estimator E b ≤ ∆(E) ∀E ∈ T (N, ∞) and αg E b = E, b ∀g ∈ G. ∆(E) (8.14) As in the case M < ∞ discussed above, this lemma shows that we can restrict the search for minimizers to those observables which are invariant under the action αU,τ of U(d) × SN (as long as the figure of merit under consideration has the correct symmetry). In analogy to the M < ∞ case we define therefore Definition 8.2.4 A (completely) positive, unital map E : C(S) → B(H ⊗N ) which is invariant under the action αU,τ of U(d)×SN defined in Equation (8.12) is called a fully symmetric estimator. The set of all fully symmetric E is denotes by Tfs (N, ∞). To make use of these results it is necessary to get a better understanding of the structure of the sets Tfs (N, M ) for N ∈ N and M ∈ N ∪ {∞}. This is the subject of the rest of this chapter. 8.2.2 Decomposition of tensor products The first step is an analysis of the representations U 7→ U ⊗N and σ 7→ Vσ of U(d) respectively SN on the tensor product Hilbert space H⊗N , which play a crucial role in the definition of fully symmetric channels and estimators. The results we are going to review here are well known and go back to Weyl [237, Ch. 4]. To state them we have to introduce some notations from group theory: A Young frame is an arrangement of a finite number of boxes into rows of decreasing length. We represent it by a sequence of integers m1 ≥ m2 ≥ · · · ≥ md ≥ 0 where mk denotes the number of boxes in the k th row. Hence Yd (N ) = {m = (m1 , . . . , md ) ∈ Nd0 | m1 ≥ m2 ≥ . . . ≥ md , d X k=1 mk = N } (8.15) denotes the set of all frames with d rows and N boxes. Each Young frame m ∈ Y d (N ) determines uniquely (up to unitary equivalence) irreducible representations of S N and U(d) which we denote by Πm and πm . In the U(d)-case m gives the highest weight of πm in the basis Ejj = |jihj|, j = 1, .., d of the Cartan subalgebra of the Lie algebra of U(d) (cf. Subsection 8.3.2 for notations and a further discussion). Πm as well as πm can be constructed explicitly from m, but we do not need this information. Theorem 8.2.5 Consider the d-dimensional Hilbert space H = Cd , its N -fold tensor product H⊗N and the representations U(d) 3 U 7→ U ⊗N and SN 3 σ 7→ Vσ on H⊗N . There is a unique decomposition of H⊗N into a direct sum such that M M πm (U ) ⊗ 1I, Hm ⊗ Km , U ⊗N ∼ H⊗N ∼ = = m∈Yd (N ) Vσ ∼ = M m∈Yd (N ) m∈Yd (N ) 1I ⊗ Πm (σ) holds, where ∼ = means “naturally isomorphic”. (8.16) 8.2. Symmetric cloner and estimators 117 For a proof see [195, Sect. IX.11]. This theorem is intimately related to Theorem 3.1.1, where commutativity properties between the U ⊗N and Vσ are discussed. This can be seen if we introduce the algebras AN = M m∈Yd (N ) B(Hm ) ⊗ 1I, BN = M m∈Yd (N ) 1I ⊗ B(Km ). (8.17) It is easy to check that AN and BN are commutants of each other, i.e. we have 0 AN = B N and BN = A0N where the prime denotes the commutant of the corresponding set of operators, i.e. A0N = {B ∈ B(H⊗N ) | [A, B] = 0, ∀A ∈ A} and 0 similarly for BN . On the other hand we have U ⊗N ∈ AN and Vσ ∈ BN for all U ∈ U(d) and σ ∈ SN . 
Hence, irreducibility of the representations πm and Πm implies immediately that each element of AN (of BN ) is a finite linear combination of operators from {U ⊗N | U ∈ U(d)} (respectively of permutation unitaries Vσ ). This shows in particular that each operator which commutes with all U ⊗N is a linear combination of Vσ ’s as stated in Theorem 3.1.1. Note however that Theorem 3.1.1 is not a corollary of Theorem 8.2.5, because the former is used in an essential way in the proof of the latter. Now let us consider the general linear group GL(d, C). Each representation πm of U(d) admits a (unique) analytic continuation and leads therefore to a representation of GL(d, C). We will denote it by πm as well and it is in fact the GL(d, C) representation with highest weight m. Therefore the following Corollary is an easy consequence of Theorem 8.2.5 Corollary 8.2.6 Consider an operator X ∈ GL(d, C) and a Young frame m ∈ Yd (N ) for some N ∈ N then we have Pm X ⊗N Pm = πm (X) ⊗ 1I ∈ B(Hm ) ⊗ B(Km ) (8.18) where the Pm denote the central projections of the algebra AN (respectively BN ), i.e. M φn 7→ Pm φ = φm ∈ Hm ⊗ Km , (8.19) H⊗N 3 φ = n L with n, m ∈ Yd (N ) and φ = n φn denotes the decomposition of φ according to the direct sum from Equation (8.16). A subrepresentation of U ⊗N which is particularly important within pure state cloning and estimation arises if we consider the action of U(d) and SN on the symmetric tensor product, i.e. ⊗N H+ = {SN ψ | ψ ∈ H⊗N }, SN ψ = 1 X Vσ ψ. N! (8.20) σ∈SN ⊗N By definition all permutation unitaries Vσ act as identities on H+ and it is the ⊗N ⊗N biggest subspace of H with this property. In addition it is easy to see that H+ ⊗N is left invariant by the U , i.e. it carries a subrepresentation of U(d). Since the trivial representation of SN is labeled by the Young frame with one row and N ⊗N boxes we get with Theorem 8.2.5 H+ = HN 1 ⊗ KN 1 , where we have used the notation 1 = (1, 0, . . . , 0) ∈ Yd (N ) and N 1 = (N, 0, . . . , 0) ∈ Yd (N ). (8.21) But the trivial representation is one-dimensional, i.e. KN 1 = C and this leads to the following proposition: 8. Multiple inputs 118 ⊗N Proposition 8.2.7 Consider the N -fold symmetric tensor product H + , the cor⊗N N responding projection SN : H⊗N → H+ and the U(d) representation π+ (U ) = ⊗N SN U SN . Then we have, using the notations from Theorem 8.2.5 and Equation (8.21): ⊗N N H+ = πN 1 , SN = P N 1 . (8.22) = H N 1 , π+ 8.2.3 Fully symmetric cloning maps Let us consider now fully symmetric cloning maps. Our aim in this Subsection is to determine the extremal elements of the convex set Tfs (M, N ) and the central tool for this task is Theorem 8.2.5, which we have to apply to the input and output Hilbert space H⊗N and H⊗M . Since the procedure is quite complex we have broken it up into several steps. Proposition 8.2.8 Each fully symmetric channel T : B(H ⊗M ) → B(H⊗N ) can be decomposed into a direct sum M T (A) = (8.23) Tm (A) ⊗ 1IKm , m∈Yd (N ) where Tm are channels from B(H⊗M ) to B(Hm ) with Tm (U ⊗M AU ∗⊗M ) = πm (U )Tm (A)πm (U ∗ ) and Tm (Vτ AVτ∗ ) = Tm (A). (8.24) The set of all such Tm (which we will call again fully symmetric) is denoted by Tfs (Hm , M ). Proof. According to Definition 8.2.2 we have [T (A), Vσ ] = 0 for all A ∈ B(H⊗M ) and all σ ∈ SN . By Theorem 8.2.5 this implies that T (A) ∈ AN holds, where AN denotes the algebra from Equation (8.17). Hence, T is of the given form. 2 The next step applies Theorem 8.2.5 to the output Hilbert space H⊗M . 
This leads to a further decomposition of the spaces Tfs (Hm , M ). Theorem 8.2.9 Consider N, M ∈ N and m ∈ Yd (N ). Each channel T : B(H⊗M ) → B(Hm ) satisfying the covariance condition from Equation (8.24) admits a unique convex decomposition X ¤ £ cn (8.25) Tn trKn (Pn APn ) T (A) = dim Kn n∈X with cn > 0, P n cn = 1 and X = {n ∈ Yd (M ) | ∃A ∈ B(H⊗M ) with T (Pn APn ) 6= 0} (8.26) and the Tn are unital cp-maps Tn : B(Hn ) → B(Hm ) satisfying Tn (πn (U )Aπn (U )∗ ) = πm (U )T (A)πm (U )∗ . (8.27) The set of all channels Tn with this property is denoted by Tfs (Hm , Hn ). Proof. To prove uniqueness it is sufficient to note that each summand in Equation (8.25) equals T (Pn APn ) and is therefore uniquely determined by T . To show that P the corresponding decomposition T (A) = n T (Pn APn ) of T has the given form consider first the dual T ∗ of T . By assumption we have [Vτ , T ∗ (ρ)] = 0 for all ρ ∈ B ∗ (Hm ) and all τ ∈ SM . Due to Theorem 8.2.5 this implies T ∗ (ρ) ∈ A∗M , where A∗M denotes the dual of the algebra from Equation (8.17). T ∗ is therefore a L direct sum T ∗ (ρ) = n Ten∗ (ρ) ⊗ 1I over n ∈ Yd (M ), where Ten∗ is the dual of a cpmap Ten : B(Hn ) → B(Hm ) which satisfies the covariance condition from Equation (8.27). 8.2. Symmetric cloner and estimators 119 Rewriting the decomposition of T ∗ in the Heisenberg picture we get with A ∈ B(H⊗M ) and ρ ∈ B ∗ (Hm ) £ £ ¤ X £ ¤ tr T (A)ρ] = tr AT ∗ (ρ) = (8.28) tr A (Ten∗ (ρ) ⊗ 1IKn ) n X £ ¤ tr trKn (Pn APn )Ten∗ (ρ) = (8.29) n = X £ ¤ tr Ten [trKn (Pn APn )] ρ . (8.30) n Since A and ρ are arbitrary we see that T becomes X £ ¤ T (A) = Ten trKn (Pn APn ) . (8.31) n This is exactly the form from Equation (8.25), except that the Ten are not unital. Hence consider Ten (1I). Due to covariance of Ten we have ¢ ¡ (8.32) πm (U )Ten (1I)πm (U )∗ = Ten πn (U )Pn πn (U )∗ = Ten (1I) For all U ∈ U(d). Irreducibility of πm implies therefore that Ten (1I) = e cn 1I holds, i.e. e Tn is unital up to a positive factor 0 ≤ e cn ≤ 1. In addition we get (since T is unital) from (8.31) the equality X X 1I = dim Kn Ten (1I) = dim Kn e cn 1I, (8.33) n n P e hence n Kn e cn = 1. If n ∈ X we have e cn > 0 such that we can define Tn = e c−1 n Tn and cn = e cn dim Kn , such that Equation (8.25) follows from (8.31). Since Tm is unital and inherits covariance (8.27) from Tem the theorem is proved. 2 Now the final step is to analyze the spaces Tfs (Hm , Hn ) of πm , πn covariant channels. Proposition 8.2.10 The set Tfs (Hm , Hn ) is convex and its extremal elements are of the form T (A) = V ∗ (A ⊗ 1IL )V (8.34) with an isometry V : Hm → Hn ⊗ L into the tensor product of Hm and an auxiliary Hilbert space L such that L carries an irreducible representation π of U(d) and V intertwines πm with πn ⊗ π, i.e. V πm = πn ⊗ πV holds. Proof. Assume first that T from Equation (8.34) admits a convex decomposition T = λT1 + (1 − λ)T2 with T1 , T2 ∈ Tfs (Hm , Hn ) and 0 < λ < 1. By Theorem 3.2.2 this implies that there are two operators F1 , F2 with Tj (A) = V ∗ A ⊗ Fj V , j = 1, 2 and [π(U ), Fj ] = 0. Irreducibility of π implies together with normalization (the Tj are unital by assumption) that F1 = F2 = 1I holds. Hence T1 = T2 = T which implies that T is extremal. To show that each channel T ∈ Tfs (Hm , Hn ) can be decomposed into elements of the given form, note that Theorem 3.2.2 implies that a Stinespring representation T (A) = V ∗ (A ⊗ 1IL )V of T exists such that L carries a representation π of U(d) and V : Hm → Hn ⊗ L is an isometry which intertwines πm with πn ⊗ π. 
If π is irreducible the Theorem is proved (T is extremal in this case); if not we can decompose it into a direct sum M M L= Lj , π = πj (8.35) j∈J j∈J 8. Multiple inputs 120 where J is a finite index set and the πj are irreducible representations on Lj . If the projection from L onto Lj is denoted by Pj we can define operators Vj = (1I ⊗ Pj )V which intertwine πm and πn ⊗πj . Hence Tej (A) = Vj∗ (A⊗1I)Vj is a cp-map satisfying the proposition, except that it is not unital. Irreducibility of πm and covariance of Tej P imply however that Tej (1I) = cj 1I holds with positive constants cj . Due to j Pj = 1I, e we get a convex decomposition T (A) = Σj cj Tj (A) of T with summands Tj = c−1 j Tj of the stated form. 2 Combining Theorem 8.2.9 and Proposition 8.2.10 we get the extremal elements of the set Tfs (Hm , M ) as T (A) = ¢ ¤ £¡ 1 V ∗ trKn (Pn APn ) ⊗ 1I V dim Kn (8.36) with n ∈ Yd (M ) and an isometry V which satisfies the condition from Proposition 8.2.10. Using in addition Proposition 8.2.8 we see that each extremal element of the set Tfs (N, M ) is a direct sum over the set Yd (N ) of channels of the form (8.36). To get a result which is even more explicit, we have to determine for each n and m all admissible intertwining isometries V . For arbitrary but finite d this can be done at least in an algorithmic way and in the special case d = 2 we just have to calculate Clebsch-Gordon coefficients. This shows that the general structure of a fully symmetric cloning map is completely determined by group theoretical data. 8.2.4 Fully Symmetric estimators Our next task is to determine the structure of the set Tfs (N, M ) in the special case M = ∞. Hence consider an E ∈ Tfs (N, M ). As for finite M we have [E(f ), Vσ ] = 0 for allLσ ∈ SN and all f ∈ C(S). This implies that E decomposes into a direct sum E = m∈Yd (N ) Em where the Em are observables Em : C(S) → B(Hm ), with Em (fU ) = πm (U )Em (f )πm (U ∗ ) (8.37) and fU (ρ) = f (U ρU ∗ ). We write Tfs (m, ∞) for the space of all observables satisfying Equation (8.37) and call them again fully symmetric. To analyze the structure of the Em let us state first the following result [66, 111]: Theorem 8.2.11 Consider a compact, unimodular group G which acts transitively on a topological space X by G × X 3 (g, x) 7→ αg (x), and a representation π of G on a Hilbert space H. Each covariant POV measure E : C(X) → B(H) (i.e. E(f ◦ αg ) = π(g)E(f )π(g)∗ holds for all g ∈ G and all f ∈ C(X)) has the form Z f (αg x0 )π(g)Q0 π(g)∗ µ(dg) (8.38) E(f ) = G where x0 ∈ X is an (arbitrary) reference point, µ is the Haar-measure on G and Q0 ∈ B(H) a positive operator which is uniquely determined by validity of Equation (8.38) and the choice of x0 . Unfortunately this theorem is not applicable to our case, because the action of U(d) on S is not transitive. Nevertheless, it tells us how the observables E m look like along the orbits of the U(d) action on S(H). To analyze the behavior of E m transversal to the orbits it is useful to look at the set Σ = {x ∈ [0, 1]d | x1 ≥ x2 ≥ . . . ≥ xd ≥ 0, d X xj = 1} (8.39) j=1 of “ordered spectra” of density operators and the corresponding projection p : S → Σ which associates to each ρ ∈ S its spectrum p(ρ) ∈ Σ. It is easy to see that 8.2. Symmetric cloner and estimators 121 Σ coincides with the orbit space S/ U(d) and p with the canonical projection. If e1 , . . . 
, ed ∈ H denotes an orthonormal basis we can introduce in addition the map Σ 3 x 7→ ρx = Σdj=1 xj |ej ihej | ∈ S, (8.40) which is a section of the projection p : S → Σ, i.e. x 7→ ρx is injective and satisfies p(ρx ) = x for all x. Finally we can define the surjective (but not injective) map Σ × U(d) 3 (x, U ) 7→ U ρx U ∗ ∈ S. Using this terminology we can expect from Theorem 8.2.11 that Em ∈ Tfs (m, ∞) looks (heuristically) like Z f (U ρx U ∗ )πm (U )Q0 (x)πm (U )∗ dU dx (8.41) Em (f ) = U(d)×Σ where Σ 3 x 7→ Q0 (x) ∈ B(Hm ) is an operator valued density. To make this statement precise we only have to take into account that Q0 can have discrete and singular parts. This leads to the following theorem Theorem 8.2.12 Each fully symmetric estimator Em ∈ Tfs (m, ∞) (with m ∈ Yd (N )) has the form Z Em (f ) = πm (U )Qm (fU )πm (U )∗ dU (8.42) U(d) with fU ∈ C(Σ), fU (x) = f (U ρx U ∗ ) and an appropriate POV-measure Qm : C(Σ) → B(Hm ). S Proof. Note first that we can decompose S into a disjoint union S = j∈J Sj of finitely many subsets Sj ⊂ S such that there is for each j ∈ J a homogeneous j space Xj , a transitive operation U(d) × Xj 3 (U, v) 7→ αU v ∈ Xj , a distinguished j element v0 ∈ Xj and a homeomorphism Φj : Σj × Xj → Sj with Σj = p(Sj ) j j and Φj (x, αU v0 ) = U ρx U ∗ . This is a central result from the theory of compact G-manifolds [129]. In our case the Sj are characterized by the degeneracy of the eigenvalues of their elements. The Sj are measurable subsets of S. This implies that we can define for each f ∈ C(Sj ) the integral Z em,j (f ) = E f (ρ)E(dρ). (8.43) Sj em,j is nonzero, covariance of E implies that there is a positive constant If the map E em,j (1I) = λj 1I hence we can define Em,j = λ−1 E em,j and get POV-measures λj with E j P Em,j : C(Sj ) → B(Hm ) with Em (f ) = j λj Em,j (f ¹ Sj ), where f ¹ Sj denotes the restriction of f ∈ C(S) to Sj . It is therefore sufficient to show Equation (8.42) for all Em,j . Hence consider a positive function h ∈ C(Σj ) and the map £ ¤ eh (g) = Em,j (g ⊗ h) ◦ Φ−1 ∈ B(Hm ) C(Xj ) 3 g 7→ E j (8.44) eh (g ◦ αj ) = it is linear and positive and has the covariance property E U ∗ eh (g)πm (U ) . Together with irreducibility of πm the latter implies that πm (U )E eh (1I) = νh 1I holds with a constant νh > 0. In other words E eh is (up to norE malization) a covariant POV measure and Theorem 8.2.11 applies. Hence there is a unique positive operator Qm,j (h) ∈ B(Hm ) such that Z £ ¤ eh (g) = Em,j (g ⊗ h) ◦ Φ−1 = g(U v0j U ∗ )πm (U )Qm,j (h)πm (U )∗ dU (8.45) E j U(d) 8. Multiple inputs 122 holds. It is easy to see that the map C(Σj ) 3 h 7→ Qm,j (h) ∈ B(Hm ) is linear and positive. Normalization of Em,j implies in addition that Qm,j (1I) = 1I. Hence Qm,j is a POV-measure satisfying Equation (8.45) for each function (g ⊗h)◦Φ−1 j ∈ C(Sj ). Linearity and continuity (which is a consequence of positivity) implies therefore Z πm (U )Qm,j (fU )πm (U )∗ dU (8.46) Em,j (f ) = U(d) for all f ∈ C(Sj ). Hence Equation (8.42) follows with Qm = 8.3 P j λj Qm,j . 2 Appendix: Representations of unitary groups Throughout this and the following three chapters many arguments from representation theory of unitary groups are used. In order to fix the notation and to state the most relevant theorems we will recall in this appendix some well known facts from representation theory of Lie groups. General references are the books of Barut and Raczka [15], Zhelobenko [241] and Simon [195]. 
8.3.1 The groups and their Lie algebras Let us consider first the group U(d) of all complex d × d unitary matrices. Its Lie algebra u(d) can be identified with the Lie algebra of all anti-hermitian d × d matrices. The exponential function is then given by the usual matrix exponential X 7→ exp(X). u(d) is a real Lie algebra. Hence we can consider its complexification u(d) ⊗ C which coincides with the set of all d × d matrices and at the same time with the Lie algebra gl(d, C) of the general linear group GL(d, C). In other words u(d) is a real form of gl(d, C). A basis of gl(d, C) is given by the matrices Ejk = |jihk|. The set of elements of U(d) with determinant one forms the subgroup SU(d) of U(d). Its Lie algebra su(d) is the subalgebra of u(d) consisting of the elements with zero trace. Hence the complexification su(d) ⊗ C of su(d) is the Lie algebra of trace-free matrices and coincides therefore with the Lie algebra sl(d, C) of the special linear group SL(d, C). As well as in the U(d) case this means that su(d) is a real form of sl(d, C). The matrices Ejk do not form a basis of sl(d, C) since the Ejj are not trace free. Instead we have to consider Ejk , j 6= k and Hj = Ejj − Ej+1,j+1 , j = 1, . . . , d − 1. The difference between sl(d, C) and gl(d, C) is exactly the center of gl(d, C), i.e. all complex multiples of the identity matrix. In other words we have gl(d, C) = sl(d, C) ⊕ C1I. A similar result holds for the real forms: u(d) = su(d) ⊕ R1I. The (real) span of all iEjj , j = 1, . . . , d is a subalgebra of u(d) which is maximally abelian, i.e. a Cartan subalgebra of u(d). In the following we will denote it by t(d) and its complexification by tC (d) ⊂ gl(d, C). The intersection of t(d) with su(d) results in a Cartan subalgebra st(d) of su(d). We will denote the complexification by stC (d). Again the two algebras t(d) and st(d) differ by the center of u(d) i.e. t(d) = st(d) ⊕ R1I and tC (d) = stC (d) ⊕ C1I in the complexified case. 8.3.2 Representations Consider now a finite-dimensional representation π : U(d) → GL(N, C) of U(d). It is characterized uniquely by the corresponding representation ∂π : u(d) → gl(N, C) of its Lie algebra, i.e. we have π(exp(X)) = exp(∂π(X)). The representation ∂π can be extended by complex linearity to a representation of gl(d, C) which we will denote by ∂π as well. Hence ∂π leads to a representation π of the group GL(d, C). We will adopt similar notations for representations of SU(d) and SL(d, C). Assume now that π is an irreducible representation of GL(d, C). An infinitesimal weight of π (or simply a weight in the following) is an element λ of the dual of t ∗C (d) of tC (d) such that ∂π(X)x = λ(X)x holds for all X ∈ tC (d) and for a nonvanishing x ∈ CN . The linear subspace Vλ ⊂ CN of all such x is called the weight subspace of the weight λ. The set of weights of π is not empty and, due to irreducibility, there is 123 8.3. Appendix: Representations of unitary groups exactly one weight m, called the highest weight, such that ∂π(Ejk )x = 0 for all x in the weight subspace of m and for all j, k = 1, . . . , d with j < k. The representation π is (up to unitary equivalence) uniquely determined by its highest weight. On the other hand the weight m is uniquely determined by its values m(Ejj ) = mj on the basis Ejj of tC (d). We will express this fact in the following as “m = (m1 , . . . , md ) is the highest weight of the representation π”. 
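As a concrete cross-check of this labelling (my own illustration, not part of the text): the highest weights occurring in the decomposition of Theorem 8.2.5 are exactly the Young frames in $Y_d(N)$ from (8.15), and the corresponding dimensions $\dim H_m$ (Weyl dimension formula) and $\dim K_m$ (hook length formula) must satisfy $\sum_m \dim H_m\cdot\dim K_m=d^N$. The sketch below verifies this bookkeeping for $d=2$, $N=4$.

```python
from math import factorial, prod

def young_frames(d, N):
    """All frames m = (m_1 >= ... >= m_d >= 0) with sum(m) = N, cf. (8.15)."""
    def rec(rows, total, cap):
        if rows == 1:
            if total <= cap:
                yield (total,)
            return
        for first in range(min(total, cap), -1, -1):
            for rest in rec(rows - 1, total - first, first):
                yield (first,) + rest
    return list(rec(d, N, N))

def dim_U(m):
    """Dimension of the U(d) irrep with highest weight m (Weyl dimension formula)."""
    d = len(m)
    num = prod(m[j] - m[k] + k - j for j in range(d) for k in range(j + 1, d))
    den = prod(k - j for j in range(d) for k in range(j + 1, d))
    return num // den

def dim_S(m, N):
    """Dimension of the S_N irrep labelled by the frame m (hook length formula)."""
    rows = [r for r in m if r > 0]
    cols = [sum(1 for r in rows if r > j) for j in range(rows[0])]
    hooks = prod(rows[i] - j + cols[j] - i - 1
                 for i in range(len(rows)) for j in range(rows[i]))
    return factorial(N) // hooks

d, N = 2, 4
frames = young_frames(d, N)
print(frames)                                               # [(4, 0), (3, 1), (2, 2)]
print(sum(dim_U(m) * dim_S(m, N) for m in frames), d ** N)  # 16 16
```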
For each analytic representation of GL(d, C) the mj are integers satisfying the inequalities m1 ≥ m2 ≥ · · · ≥ md and the converse is also true: each family of integers with this property defines the highest weight of an analytic, irreducible representation of GL(d, C). In a similar way we can define weights and highest weights for representations of the group SL(d, C) as linear forms on the Cartan subalgebra stC (d). As in the GL(d, C)-case an irreducible representation π of SL(d, C) is characterized uniquely by its highest weight m. However we can not evaluate m on the basis Ejj since these matrices are not trace free. One possibility is to consider an arbitrary extension of m to the algebra tC (d) = stC (d) ⊕ C1I. Obviously this extension is not unique. Therefore the values m(Ejj ) = mj are unique only up to an additive constant. To circumvent this problem we will use usually the normalization condition m d = 0. In this case the integer mj corresponds to the number of boxes in the j th row of the Young tableau usually used to characterize the irreducible representation π. Another possibility to describe the weight m is to use the basis Hj of stC (d). We get a sequence of integers lj = m(Hj ), j = 1, . . . , d − 1. They are related to the mj by lj = mj − mj+1 . Each sequence l1 , . . . , ld−1 defines the highest weight of an irreducible representation of SL(d, C) iff the lj are positive integers. Finally consider the representation π̄ conjugate to π, i.e. π(u) = π(u). If π is irreducible the same is true for π̄. Hence π̄ admits a highest weight which is given by (−md , −md−1 , . . . , −m1 ). If π is a SU(d) representation we can apply the normalization md = 0. Doing this as well for the conjugate representation we get (m1 , m1 − md−1 , . . . , m1 − m2 , 0). In terms of Young tableaus this corresponds to the usual rule to construct the tableau of the conjugate representation: Complete the Young tableau of π to form a d × m1 rectangle. The complementary tableau rotated by 180◦ is the Young tableau of π̄. 8.3.3 The Casimir invariants To each Lie algebra g we can associate its universal algebra G. It is L enveloping ⊗N with the two sided defined as the quotient of the full tensor algebra n∈N0 g ideal I generated by X ⊗ Y − Y ⊗ X − [X, Y ], i.e. G is an associative algebra. The original Lie algebra g can be embedded in its envelopping algebra G by g 3 X 7→ X +I ∈ G. The Lie bracket is then simply given by [X, Y ] = XY −Y X. Moreover G is algebraically generated by g and 1I. Hence each representation ∂π of g generates a unique representation ∂π of G simply by ∂π(X1 · · · Xk ) = ∂π(X1 ) · · · ∂π(Xk ). If ∂π is irreducible the induced representation ∂π is irreducible as well. We are interested not in the whole algebra but only in its center Z(G), i.e. the subalgebra consisting of all Z ∈ G commuting with all elements of G. The elements of Z(G) are called central elements or Casimir elements. If ∂π is a representation of G the representatives ∂π(Z) of Casimir elements commute with all other representatives ∂π(X). This implies for irreducible representations that all ∂π(Z) are multiples of the identity. Consider now the case g = gl(d, C). In this case we can identify the envelopping algebra G with the set of all left invariant differential operators on GL(d, C) (a similar statement is true for any Lie group). Of special interest for us are the Casimir elements belonging to operators of first and second order. Using the standard basis 8. 
$E_{jk}$ of gl(d, C) introduced in Section 8.3.1, they are given by

$C_1=\sum_{j=1}^{d}E_{jj}$  and  $C_2=\sum_{j,k=1}^{d}E_{jk}E_{kj}$.

Of course $C_1^2$ is of second order as well, and it is linearly independent of $C_2$. Hence each second order Casimir element of G is a linear combination of $C_2$ and $C_1^2$. If $\partial\pi$ is an irreducible representation of gl(d, C) with highest weight $(m_1,\dots,m_d)$ it induces, as described above, an irreducible representation $\partial\pi$ of G, and the images $\partial\pi(C_1)$ and $\partial\pi(C_2)$ are multiples of the identity, i.e. $\partial\pi(C_1)=C_1(\pi)\mathbb 1$ and $\partial\pi(C_2)=C_2(\pi)\mathbb 1$ with

$C_1(\pi)=\sum_{j=1}^{d}m_j$  and  $C_2(\pi)=\sum_{j=1}^{d}m_j^2+\sum_{j<k}(m_j-m_k)$.   (8.47)

Let us now discuss the Casimir elements of SL(d, C). Since SL(d, C) is a subgroup of GL(d, C), its enveloping algebra S is a subalgebra of G. However, the corresponding Lie algebras differ only by the center of gl(d, C). Hence the center Z(S) of S is a subalgebra of Z(G). Since sl(d, C) is simple there is no first order Casimir element, and there is only one second order Casimir element $\widetilde C_2$, which is therefore a linear combination $\widetilde C_2=C_2+\alpha C_1^2$ of $C_2$ and $C_1^2$. Obviously the factor $\alpha$ is uniquely determined by the condition that the expression

$\widetilde C_2(\pi)=C_2(\pi)+\alpha\,C_1(\pi)^2=\sum_{j=1}^{d}m_j^2+\sum_{j<k}(m_j-m_k)+\alpha\Bigl(\sum_{j=1}^{d}m_j\Bigr)^2$   (8.48)

with $\partial\pi(\widetilde C_2)=\widetilde C_2(\pi)\mathbb 1$ is invariant under the renormalization $(m_1,\dots,m_d)\mapsto(m_1+\mu,\dots,m_d+\mu)$ with an arbitrary constant $\mu$. A straightforward calculation shows that $\alpha=-\tfrac1d$, i.e. $\widetilde C_2=C_2-\tfrac1d C_1^2$. Hence we get

$\widetilde C_2(\pi)=\frac1d\Bigl((d-1)\sum_{j=1}^{d}m_j^2-\sum_{j\ne k}m_jm_k+d\sum_{j<k}(m_j-m_k)\Bigr)$.   (8.49)

Alternatively, $\widetilde C_2$ can be expressed in terms of a basis $(X_j)_j$ of sl(d, C). In fact there is a symmetric second rank tensor $g^{jk}X_j\otimes X_k\in\mathrm{sl}(d,\mathbb C)\otimes\mathrm{sl}(d,\mathbb C)$ such that $\widetilde C_2$ coincides with the equivalence class of $g^{jk}X_j\otimes X_k$ in S. In other words $\widetilde C_2=\sum_{jk}g^{jk}X_jX_k$ holds, which leads to

$\widetilde C_2(\pi)\mathbb 1=\sum_{jk}g^{jk}\,\partial\pi(X_j)\,\partial\pi(X_k)$   (8.50)

for an irreducible representation $\pi$ of SU(d).

Chapter 9 Optimal Cloning

After the general discussion let us now consider several special problems. The first is optimal cloning of pure states. In other words, we are searching for a device $T\in T(N,M)$ which acts on $N$ $d$-level systems, each of them in the same (unknown) pure state $\rho$, and which yields at its output side an $M$-particle system in a state $T^*(\rho^{\otimes N})$ which approximates the product state $\rho^{\otimes M}$ "as well as possible". This is obviously easy if $N\ge M$ holds, because we only have to drop some particles. Hence we will assume $M>N$ throughout this chapter, as long as nothing else is explicitly stated. The presentation in this chapter is based on [230, 136] and it concerns universal and symmetric cloners, i.e. we are looking at problems which admit symmetry properties as discussed in Section 8.2. Other related work in this direction, in most cases restricted to the qubit case, can be found in [96, 41, 42, 45]. Other approaches to quantum cloning, which are not the subject of this work, include "asymmetric cloning", which arises if we trade the quality of one particular output system against the rest (see [49]), and cloning of Gaussian states [50].

9.1 Figures of merit

To get a figure of merit $\Delta$ which measures the quality of the clones, we can follow the general formula (8.4). Hence we have to choose the set of pure states for $X$ and $\beta(\rho)=\rho^{\otimes M}$ for the target functional.
The remaining freedom is the distance measure δ and there are in fact two physically different choices: We can either check the quality of each clone separately or we can test in addition the correlations between output systems. With the notation ρ(j) = 1I⊗(j−1) ⊗ ρ ⊗ 1I⊗(M −j) ∈ B(H⊗M ) a figure of merit for the first case is given by £ ¡ ¢¤ ∆C 1 − tr T (ρ(j) )ρ⊗N = 1 − F1C (T ) 1 (T ) = sup (9.1) (9.2) ρ pure,j where the supremum is taken over all pure states ρ and j = 1, . . . , N and F1C denotes the “one-particle fidelity” £ ¤ (9.3) F1C (T ) = inf tr T (ρ(j) )ρ⊗N . ρ pure,j ∗ ⊗N ). If we are ∆C 1 measures the worst one particle error of the output state T (σ interested in correlations too, we have to choose ¢¤ £ ¡ ⊗M ⊗N C (T ) (9.4) ∆C )ρ = 1 − Fall all (T ) = sup 1 − tr T (ρ ρ,pure and the corresponding fidelity is ¤ £ C (T ) = inf tr T (ρ⊗M )σ ⊗N . Fall ρ pure (9.5) ∆C all measures again a “worst case” error, but now of the full output with respect to M uncorrelated copies of the input ρ. Note that we can replace the fidelity quantities in Equation (9.4) and (9.2) by other distance measures like trace-norm 9. Optimal Cloning 126 distances1 or relative entropies without changing the results we are going to present significantly (although some proofs might become more difficult). This is however a special feature of the pure state case. For mixed state cloning the correct choice of the figure of merit has to be done much more carefully; cf the discussion in Section 9.6. Another simplification which arises from the restriction to pure input states C concerns the dependency of ∆C 1 and ∆all on the channel T . Since ρ = |ψihψ| with ψ ∈ H holds we have tr[T (A)ρ⊗N ] = hψ ⊗N , T (A)ψ ⊗N i (9.6) for all A ∈ B(H⊗M ). Therefore tr[T (A)ρ⊗N ] depends only on the part of T (A) ⊗N which is supported by the symmetric tensor product H+ ⊂ H⊗N , i.e. we have C C ∆] (T ) = ∆] (T+ ), ] = 1, all with T+ (A) = SN T (A)SN , where SN denotes as ⊗N in Subsection 8.2.2 the projection onto H+ . This implies that it is sufficient to consider channels ⊗N T : B(H⊗M ) → B(H+ ). (9.7) ⊗N Since H+ ⊂ H⊗N we can look at such a T as a cp-map which takes its values in ⊗N ⊗N B(H ), but in this sense it is not unital (since T is unital as map into B(H+ ) i.e. we have T (1I) = SN ). In other words the channel T from Equation (9.7) is not in T (N, M ) and does not fit into the general discussion from Section 8.1. This is, however, an artificial problem, because we can replace T by ⊥ ⊥ tr(SN ASN ) ⊥ SN , Te(A) = T (A) + ⊥ dim SN (9.8) ⊥ where SN denotes the orthocomplement of SN . This new map is obviously completely positive and unital (i.e. in T (N, M )) but the additional term does not change C C e the value of ∆C ] (i.e. ∆] (T ) = ∆] (T )). Whenever it is necessary for formal reasons to consider T from Equation (9.7) as an element of T (N, M ) such an extension is understood. 9.2 The optimal cloner According to the discussion from the last paragraph, the optimal cloning map has ⊗N to take density operators on H+ to operators on H⊗M . An easy way to achieve such a transformation is to tensor the given operator ρ with the identity operators belonging to tensor factors (N + 1) through M , i.e., to take ρ 7→ ρ ⊗ 1I⊗(M −N ) . This breaks the symmetry between the clones, making N perfect copies and (N − M ) states, which are worst possible “copies”. Moreover, it does not map to states on the ⊗M Bose sector H+ , which would certainly be desirable, as the target states ρ⊗M are supported by that subspace. 
An easy way to remedy both defects is to compress the operator to the symmetric subspace with the projection SM . With the appropriate normalization this is our definition of the cloning map, later shown to be optimal: d[N ] Tb∗ (ρ) = SM (ρ ⊗ 1I⊗(M −N ) )SM d[M ] ⊗N where d[N ] (respectively d[M ]) denotes the dimension of H+ , i.e. µ ¶ µ ¶ −d d+N −1 d[N ] = (−1)N = , N N (9.9) (9.10) 1 Trace norm distances would lead in the present case to exactly the same results, including the values of the minimal errors from Proposition 9.2.2. This is easy to check. Other proofs, however, would become more difficult, which is the reason why we have chosen fidelity based quantities. 9.2. The optimal cloner 127 ⊗N which can be checked easily using the occupation number basis of H+ . The channel Tb given in Equation (9.9) produces M clones from N input systems. Sometimes it is useful to have a symbol for Tb which indicates these numbers (i.e. if N and M are not understood from the context) in this case we write TbN →M instead of Tb. The following two propositions summarizes the most elementary properties of Tb. ⊗N Proposition 9.2.1 The map Tb : B(H⊗M ) → B(H+ ) with dual (9.9) is completely positive, unital and fully symmetric. Proof. Full symmetry and complete positivity of Tb are obvious. Hence it remains to show that Tb is unital. With U(d) covariance of T (which is part of full symmetry) we get N N (U )∗ , (U )Tb(1I)π+ Tb(1I) = Tb(U ⊗M 1IU ∗⊗M ) = π+ (9.11) N (U ) = SN U ⊗N SN . Irreducibility of π+ (cf. Proposition 8.2.7) implies where π+ therefore Tb(1I) = λ1I. To determine the value of λ ∈ R+ , let us consider the density matrix τN = d[N ]−1 SN and ¤ ¤ £ £ λ = tr Tb(1I)τN = tr Tb∗ (τN ) · ¸ 1 = tr SM (SN ⊗ 1I⊗(M −N ) )SM d[M ] · ¸ SM = tr = 1. d[M ] Hence Tb is unital, which completes the proof. (9.12) (9.13) (9.14) 2 C Proposition 9.2.2 The one- and all-particle errors ∆C 1 and ∆all of the cloning b map T defined in Equation (9.9) are given by ¯ ¯ d − 1 ¯¯ N M + d ¯¯ 1 − d ¯ N +d M ¯ ¯ ¯ ¯ d[N ] ¯¯ b ¯ 1 − ( T ) = ∆C . all ¯ d[M ] ¯ b ∆C 1 (T ) = (9.15) (9.16) C b C b Proof. Consider ∆C 1 first. By definition we have ∆1 (T ) = 1 − F1 (T ) and F1C (Tb) = σ N £ ¤ 1 X £ (j) b∗ ⊗M ¤ tr T (σ (j) )σ ⊗N = tr ρ T (ρ ) pure,j N j=1 inf (9.17) for an arbitrary pure state ρ. To get£the second equality we have to use the symmetry ¤ properties of Tb which imply that tr T (σ (j) )σ ⊗N does not depend on j and σ. Using this we get M F1C (Tb) = ¤ d[N ] X £ (j) tr ρ SM (ρ⊗N ⊗ 1I⊗(M −N ) )SM . M d[M ] j=1 (9.18) 9. Optimal Cloning 128 2 = SM ) and due to [ Since SM is a projector (SM leads to P j σ (j) , SM ] = 0 this equation M ¤ d[N ] X £ (j) ⊗N tr ρ (ρ ⊗ 1I⊗(M −N ) )SM M d[M ] j=1 £ ¤ d[N ] ³ = N tr SM (ρ⊗N ⊗ 1I⊗(M −N ) )SM M d[M ] £ ¤´ + (M − N ) tr SM (ρ⊗(N +1) ⊗ 1I⊗(M −N −1) )SM F1C (Tb) = = (9.19) (9.20) (9.21) ¤ (M − N ) d[N ] £ ¤ N £ b∗ tr TN →M (ρ⊗N ) + tr TbN∗ +1→M (ρ⊗(N +1) ) (9.22) M M d[N + 1] Where the indices N → M and N + 1 → M indicate that two variants of Tb occur here. One operates on N the other on N + 1 input systems; cf. the corresponding remark above. Inserting Equation (9.10) into (9.22) we get M −N N −1 N + . F1C (Tb) = M M N +d (9.23) C Together with ∆C 1 = 1 − F1 a straightforward computation yields Equation (9.15). To show the other equation note that d[N ] ⊗M F1C (Tb) = inf hψ , SM (ρ⊗N ⊗ 1I⊗(N −M ) )SM ψ ⊗M i, ψ d[M ] (9.24) where ρ = |ψihψ| and the infimum is taken over all normalized ψ ∈ H. 
Since ∗ SM = SM and SM ψ ⊗M = ψ ⊗M we get F1C (Tb) = inf ψ d[N ] d[N ] ⊗N ⊗N ⊗N hψ , ρ ψ ihψ ⊗(M −N ) , ψ ⊗(M −N ) i = . d[M ] d[M ] C Together with ∆C all = 1 − Fall Equation (9.16) follows. (9.25) 2 The significance of Tb lies in the fact that it is the only cloning map which C minimizes ∆C 1 and ∆all . The central result of this section is therefore the following. ⊗N Theorem 9.2.3 For any cloning map T : B(H⊗M ) → B(H+ ) (with M > N , ⊗N d H = C and H+ denotes the N -fold symmetric tensor product) we have ¯ ¯ d − 1 ¯¯ N M + d ¯¯ ∆C (T ) ≤ 1 − (9.26) 1 d ¯ N +d M ¯ ¯ ¯ ¯ d[N ] ¯¯ ¯ 1 − (T ) ≤ ∆C . (9.27) all ¯ d[M ] ¯ In both cases equality holds iff T = Tb from (9.9). The proof of this theorem is rather nontrivial (in particular the part concerning ∆C 1 ) and therefore distributed over the following subsections. The surprising message of this result is that the amount of entanglement we allow between the clones does not influence the best cloning device we can construct. This means that we can not increase the quality of individual clones by increasing at the same time the correlations between them. We will see in Subsection 9.6 that this changes drastically if we consider mixed state cloning. Another aspect where the difference between ∆ C 1 and ∆C all is crucial are asymptotic rates (number of clones per input system) in the limit N → ∞. We will discuss this point in detail in Section 9.5 and 11.4. 9.3. Testing all clones 129 9.3 Testing all clones 9.3.1 Existence and uniqueness The purpose of this subsection is to prove the statements of Theorem 9.2.3 which b concern the all particle error ∆C all , i.e. optimality of T and its uniqueness. To this end let us come back to the discussion of symmetry properties from Section 8.2. As a supremum over affine quantities, ∆C all is convex and lower semicontinuous. It is in addition straightforward to check that ∆C all is invariant under the action αU,σ,τ of the group U(d) × SN × SM defined in Equation (8.6). Hence Lemma 8.2.1 applies. C b To prove optimality of Tb it is therefore sufficient to show that ∆C all (T ) ≥ ∆all (T ) holds for all fully symmetric T . Additional simplification arises from the fact that we are only looking for chan⊗N nels which take their values in the algebra B(H+ ) (rather than B(H⊗N )). This implies that in the direct sum decomposition from Proposition 8.2.8 only the sum⊗N mand with m = N 1 occurs (since H+ = HN 1 holds according to Proposition 8.2.7). Hence we can apply Theorem 8.2.9 and get in the Schrödinger picture (by rewriting Equation (8.25) accordingly) X cm (9.28) T ∗ (η) ⊗ 1IKm T ∗ (η) = dim Km m m∈Yd (M ) ⊗N ⊗N with covariant, unital channels Tm : B(Hm ) → B(H+ ) and η ∈ S(H+ ). C Proposition 9.3.1 The channel Tb minimizes the all particle error ∆C all = 1 − Fall . C Proof. To calculate the all particle fidelity Fall (T ) of¤ a fully symmetric T note that £ ∗ ⊗N U(d) covariance of T implies that tr T (ρ )ρ⊗M does not depend on the pure state ρ ∈ S(H). Hence £ ¤ £ ¤ C Fall (T ) = inf tr T ∗ (σ ⊗N )σ ⊗M = tr T ∗ (ρ⊗N )ρ⊗M (9.29) σpure for an arbitrary pure state ρ. Inserting Equation (9.28) leads to X ¤ £ ∗ ⊗N cm C Fall (T ) = tr Tm (ρ )Pm ρ⊗M Pm . dim Km (9.30) m∈Yd (M ) ⊗M Since ρ⊗M is supported by H+ we have due to PM 1 = SM the equalities ⊗M ⊗M PM 1 ρ PM 1 = ρ and Pm ρ⊗M Pm = 0 for m 6= M 1. Hence only one term remains in the sum (9.29) and we get £ ∗ ¤ C ⊗N ⊗M Fall (T ) = cM 1 tr TM )ρ . 
(9.31) 1 (ρ If T is optimal we must have cM 1 = 1 and therefore £ ∗ ¤ C ⊗N ⊗M T = TM 1 and Fall (T ) = TM )ρ 1 (ρ (9.32) C To get an upper bound on Fall we use positivity of the operator SN − ρ⊗N and U(d) covariance of TM 1 . The latter implies that T ∗ (SN /d[N ]) is a density M M matrix which commutes with π+ (U ). Irreducibility of π+ implies together with ∗ ∗ the fact that TM 1 is trace preserving that TM 1 (SN ) = d[N ]/d[M ]SM . Hence we get according to T = TM 1 £ ¤ 0 ≤ tr T ∗ (SN − ρ⊗N )ρ⊗M (9.33) ¤ £ ¤ d[N ] £ = tr SM ρ⊗M − tr T ∗ (ρ⊗N )ρ⊗M (9.34) d[M ] £ ¤ d[N ] = − tr T ∗ (ρ⊗N )ρ⊗M . (9.35) d[M ] 9. Optimal Cloning 130 C Together with Equation (9.32) we get Fall (T ) ≤ d[N ]/d[M ]. However we already C b know from Proposition 9.2.2 that Fall (T ) = d[N ]/d[M ], hence Tb is optimal. 2 ⊗N Proposition 9.3.2 There is only one channel T : B(H ⊗M ) → B(H+ ) which C C minimizes ∆all (respectively maximizes Fall ). ⊗N Proof. To prove uniqueness consider now a general channel T : B(H ⊗M ) → B(H+ ) C C which minimizes ∆all (respectively maximizes Fall ). By averaging over the groups U(d) and SM we get Z ¤ £ N 1 X ∗ N (9.36) (U )∗ U ⊗M Vτ∗ dU T (η) = (U )ηπ+ Vτ U ⊗M ∗ T ∗ π+ M! U(d) τ ∈SM which is a fully symmetric channel. Due to convexity and invariance of ∆ C all we get C C ∆C (T ) ≤ ∆ (T ) and since T is by assumption already optimal: ∆ (T ) = ∆C all all all all (T ). Hence T is optimal as well and at the same time fully symmetric; cf. the proof of ∗ Lemma 8.2.1. This implies according to Equation (9.32) that T (η) is supported by ∗ ∗ ⊗N ⊗M ). Hence we get for an arbitrary H+ , i.e. T (η) = SM T (η)SM for all η ∈ S(H+ ⊗M vector ψ ∈ H with SM ψ = 0 ∗ 0 = hψ, T (η)ψi Z ® ¤ £ N ⊗M ∗ 1 X N = (U )∗ U ⊗M Vτ∗ ψ dU. (U )ηπ+ U Vτ ψ, T ∗ π+ M! U(d) (9.37) (9.38) τ ∈SM This is a sum of integrals over positive quantities and it vanishes. Hence the integrand has to be zero for all U and all τ , which implies in particular that ∗ ⊗M . hψ, T ∗ (η)ψi = 0. In other words T ∗ (η) is as T (η) supported only by H+ C Since T is optimal we have Fall (T ) = d[N ]/d[M ]. Together with Equation (9.29) and (9.35) this implies £ ∗ ¤ tr T (SN − ρ⊗N )ρ⊗M = 0. (9.39) As in Equation (9.37) we can argue that (9.39) involves an integral over positive quantities which vanish. Hence Equation (9.39) holds as well for T . £ ¤ tr T ∗ (SN − ρ⊗N )ρ⊗M = 0. (9.40) £ ¤ Together with optimality this leads to tr T ∗ (SN )ρ⊗M = d[N ]/d[M ] for all pure ⊗M states ρ. Since the symmetric subspace H+ is generated by tensor products ψ ⊗M ⊗M ∗ and due to the observation that T (η) is supported by H+ we conclude that ∗ T (SN ) = d[N ]/d[M ]SM holds. To further exploit the optimality condition, consider the Stinespring dilation of T ∗ in the form d[N ] ∗ T ∗ (η) = V (η ⊗ 1IK )V (9.41) d[M ] ⊗M ⊗N where V : H+ → H+ ⊗ K for some auxiliary Hilbert space K, and η is an ⊗N arbitrary density matrix on H+ . We have included the factor d[N ]/d[M ] in this definition, so that for an optimal cloner V ∗ V = 1I. The optimality condition (9.40) written in terms of V becomes ¡ ¢ ® ¡ ¢ 0 = ψ ⊗M , V ∗ (SN − ρ⊗N ) ⊗ 1IK V ψ ⊗M = k (SN − ρ⊗N ) ⊗ 1IK V ψ ⊗M k2 (9.42) ¡ where is the one-dimensional projection to ψ ∈ H. Equivalently, (SN − ρ⊗N ) ⊗ ¢ ρ⊗M 1IK V ψ = 0 which is to say that V ψ ⊗M must be in the subspace ψ ⊗N ⊗ K for every ψ. 9.3. Testing all clones 131 So we can write V ψ ⊗M = ψ ⊗N ⊗ ξ(ψ), with ξ(ψ) ∈ K some vector depending in a generally non-linear way on the unit vector ψ ∈ H. 
From the above observation that V must be an isometry we can calculate the scalar products of all the vectors ξ(ψ): hφ, ψiM = hφ⊗M , ψ ⊗M i = hV φ⊗M , V ψ ⊗M i = hφ ⊗N ⊗ ξ(φ), ψ ⊗N (9.43) N ⊗ ξ(ψ)i = hφ, ψi hξ(φ), ξ(ψ)i (9.44) hence we get hξ(φ), ξ(ψ)i = hφ, ψiM −N = hφ⊗(M −N ) , ψ ⊗(M −N ) i. This information is sufficient to compute all matrix elements ¡ ¢ ⊗M ⊗N hψ1⊗M , T ∗ |φ⊗N i 1 ihφ2 | ψ2 i.e., T is uniquely determined and equal to Tb. (9.45) (9.46) 2 9.3.2 Supplementary properties For the rest of this section we will discuss some additional statements about the minimizer of ∆C all which will be important later on (Sections 9.4 and 11.3). The first proposition is an interesting consequence of the uniqueness result just proved. ⊗N Proposition 9.3.3 Each completely positive, unital map T : B(H ⊗M ) → B(H+ ) satisfying the equation T = Tb (where T denotes the averaged channel from Equation (9.36)) coincides with Tb. Proof. Note first that T = Tb implies in the Schrödinger picture Z 1 X ∗ αU,τ T ∗ dU = Tb∗ T = M! (9.47) τ ∈SM where αU,τ is the action defined in Equation2 (8.6). Furthermore we know from ¡ ¢ d[N ] Proposition 9.2.2 and 9.3.1 that tr ρ⊗M T ∗ (ρ⊗N ) ≤ d[M ] is true for all pure states ρ ∈ S(H) and from Proposition 9.3.2 that equality holds iff T = Tb. Consequently we have ¶ µ Z £ ¤ 1 X d[N ] − tr ρ⊗M αU,τ T (ρ⊗N ) dU = M! d[M ] τ ∈SM U(d) ³ ³ ´ ´ d[N ] d[N ] ∗ − tr ρ⊗M T (ρ⊗N ) = − tr ρ⊗M Tb∗ (ρ⊗N ) = 0. (9.48) d[M ] d[M ] Since the integral on the left hand site of this equation is taken over positive quantities the integrand has to vanish for all values of U ∈ U(d) and τ ∈ SM . This implies £ ¤ d[N ] tr ρ⊗M T ∗ (ρ⊗N ) = d[M ] for all pure states ρ ∈ S(H). However this is, according to Proposition 9.3.2 only possible if T = Tb. 2 For later use (Chapter 11) let us temporarily drop our assumption that M > N holds. In the case M ≤ N the optimal “cloner” is easy to achieve: we only have to throw away N − M particles, i.e. we can define TbN∗ →M (η) = trN −M (η), N ≥M (9.49) 2 Since T (A) ∈ B(H⊗N ) (i.e. T (A) is by assumption supported by H⊗N ) the action α of S σ N + + on the range of T is trivial and therefore omitted. 9. Optimal Cloning 132 where trN −M denotes the partial trace over N −M tensor factors. As in the N < M case TbN →M is uniquely determined by optimality. The proof can be done (almost) as in Proposition 9.3.2. Proposition 9.3.4 Assume that N ≥ M holds, then TbN →M from Equation (9.49) ⊗N is the only channel T : B(H⊗M ) → B(H+ ) with ∆C all (T ) = 0. Proof. Assume that T is optimal, i.e. ∆C all (T ) = 0. Then we can show exactly as ⊗N in the proof of Proposition 9.3.2 that T ∗ (η) is supported for all η ∈ B ∗ (H+ ) by ⊗M H+ . Hence we get ¢ ¡ ¢ ¡ ¢ ¡ (9.50) 1 = tr T ∗ (η) = tr T ∗ (η)SM = tr ηT (SM ) ⊗N for each pure state η ∈ B ∗ (H+ ). This implies T (SM ) = SN . Optimality of T C implies in addition 0 = ∆all (T ) = ∆C all (T ), where T denotes the averaged channel from Equation (9.36). Since T is fully symmetric we have £ ¡ ¢¤ £ ¡ ¢¤ 0 = sup 1 − tr T (σ ⊗M )σ ⊗N = 1 − tr T (ρ⊗M )ρ⊗N , (9.51) σ,pure with an arbitrary pure state ρ. The right hand side of this equation should be regarded ¡as in (9.37) as ¢ an integral over positive quantities which vanishes. Hence we get tr T (ρ⊗M )ρ⊗N = 1. Together with (9.50) this implies £ ¤ 0 = tr T (SM − ρ⊗M )ρ⊗N (9.52) in analogy to Equation (9.40). Now we can proceed as in the proof of Proposition 9.3.2, only replacing T ∗ with T and d[N ]/d[M ] with 1. 
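To make Theorem 9.2.3 concrete, the following sketch (plain Python with numpy; not part of the original text) evaluates the closed-form optimal errors and checks them against a direct matrix construction of the 1 → 2 qubit cloner in its symmetric-projection Schrödinger-picture form T̂*(ρ) = (d[1]/d[2]) S₂ (ρ ⊗ 1) S₂, the same form that enters the fidelity computation at the beginning of this section. All function names are ad hoc.

    import numpy as np
    from math import comb

    def sym_dim(d, n):
        """d[n]: dimension of the symmetric subspace of (C^d)^{(x)n}."""
        return comb(n + d - 1, d - 1)

    def delta_1_opt(d, n, m):
        """One-particle error of the optimal N -> M cloner (Theorem 9.2.3)."""
        return (d - 1) / d * (1 - n * (m + d) / ((n + d) * m))

    def delta_all_opt(d, n, m):
        """All-particle error of the optimal cloner, 1 - d[N]/d[M]."""
        return 1 - sym_dim(d, n) / sym_dim(d, m)

    # Direct check for qubits, N = 1 -> M = 2, with the symmetric-projection form
    # T*(rho) = (d[1]/d[2]) S_2 (rho (x) 1) S_2.
    psi = np.array([1.0, 0.0])                       # pure input state
    rho = np.outer(psi, psi.conj())
    SWAP = np.eye(4)[[0, 2, 1, 3]]                   # flip operator on C^2 (x) C^2
    S2 = (np.eye(4) + SWAP) / 2                      # symmetrizer S_2
    out = (sym_dim(2, 1) / sym_dim(2, 2)) * S2 @ np.kron(rho, np.eye(2)) @ S2

    F_all = np.real(np.kron(psi, psi).conj() @ out @ np.kron(psi, psi))
    clone = out.reshape(2, 2, 2, 2).trace(axis1=1, axis2=3)  # reduced state of one clone
    F_1 = np.real(psi.conj() @ clone @ psi)

    print(F_all, 1 - delta_all_opt(2, 1, 2))         # both 2/3
    print(F_1, 1 - delta_1_opt(2, 1, 2))             # both 5/6

For d = 2, N = 1, M = 2 both routes give the familiar values F_all = 2/3 and F_1 = 5/6.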
2 9.4 Testing single clones What remains is the proof of those parts of Theorem 9.2.3 which concerns ∆C 1. The central idea is, as in the last Section, to reduce the whole discussion to fully symmetric cloners. However, the simplifications which arise from this approach are b less strong as for ∆C all , and the proof of optimality of T is much more difficult. We have broken it up therefore into several parts. 9.4.1 Fully symmetric cloners C As ∆C all the error ∆1 is a supremum over affine quantities and therefore convex and lower semicontinuous. Invariance under the group action from Equation (8.6) is easy to prove, hence we can apply Lemma 8.2.1 again to see that it is sufficient ⊗N to search for minimizers among fully symmetric channels T : B(H ⊗M ) → B(H+ ). As in the last Subsection we can apply Theorem 8.2.9 to get a decomposition of T ⊗N into a convex linear combination of channels Tm : B(Hm ) → B(H+ ); cf. Equation (8.25). As we will see in the next to subsection it is reasonable to decompose the Tm further. According to Proposition 8.2.10 we get T (A) = X K(m) X m∈Yd (M ) j=1 ¤ £ cm Tmj trKm (Pm APm ) . dim Km (9.53) where K(m) ∈ N and the Tmj are U(d) covariant channels with ∗ N Tmj (A) = Vmj A ⊗ 1IVmj , Vmj π+ = πm ⊗ π ej Vmj (9.54) where π ej is an irreducible U(d) representation. Note that the π ej are not necessarily different from each other, i.e. we can have π ej = π ek although j 6= k 9.4. Testing single clones 133 C calculate £ To (k) ¤ F1 note that in analogy to Equation (9.29) the quantity ⊗N tr T (σ )σ does not depend on the pure state σ and the index k = 1, . . . , M . Hence we get M X ¤ 1 £ F1C (T ) = tr T (ρ(k) )ρ⊗N (9.55) M k=1 for an arbitrary pure state ρ. Now consider the Lie algebra sl(d, C) of SL(d, C), i.e. the space of trace free d × d matrices with the commutator as the P equipped (k) X is the representation of sl(d, C) Lie bracket. The map sl(d, C) 3 X 7→ k corresponding to U 7→ U ⊗N . Hence we get M X k=1 Pm X (k) Pm = ∂πm (X) ⊗ 1IKm , (9.56) where ∂πm denotes the irreducible sl(d, C) representation associated to πm ; cf. Subsection 8.3.2. With X = 1I/d − ρ and Equations (9.54), (9.55) and (9.56) we get X K(m) X ¤ ¡ ¢ £ 1 1 (9.57) cm tr Tmj ∂πm (X) ρ⊗N . F1C (T ) = − d M j=1 m∈Yd (M ) To further exploit this equation we need the following lemma which helps to calculate Tm,j (∂πm (X)). Lemma 9.4.1 Let π : U(d) → B(Hπ ) be a unitary representation, and let T : ⊗N B(Hπ ) → B(H+ ) be a completely positive, unital and U(d)-covariant map, i.e. ∗ N N T (π(u)Aπ(u) ) = π+ (u)T (A)π+ (u)∗ . Then there is a number ω(T ) such that T [∂π(X)] = ω(T ) N X X (k) , (9.58) k=1 for every trace free X ∈ B(Hπ ). ⊗N Proof. Consider the linear map sl(d, C) 3 X 7→ L(X) = T [∂π(X)] ∈ B(H+ ). It inherits from T and ∂π the covariance property N N L(U XU ∗ ) = π+ (U )L(X)π+ (U )∗ . (9.59) ⊗N ⊗N ⊗N Now note that we can identify B(H+ ) with the tensor product H+ ⊗ H+ . ⊗N Hence the map which associates to each U ∈ SU(d) the operator B(H+ ) 3 X 7→ ⊗N N N π+ (U )Xπ+ (U )∗ ∈ B(H+ ) can be reinterpreted as a unitary representation of ⊗N ⊗N SU(d) on the representation space H+ ⊗ H+ . In fact it is (unitarily equivalent N N to) the tensor product π+ ⊗ π+ . Since SU(d) 3 U 7→ U ( · )U −1 ∈ B((su(d)) is the adjoint representation of SU(d) this implies that each linear map L satisfying (9.59) N N and the adjoint representation Ad. 
Note in addition that the intertwines π+ ⊗ π+ representation ⊗N sl(d, C) 3 X 7→ ∂π+ (X) = ⊗N X j=1 ⊗N X (j) ∈ B(H+ ) (9.60) of the Lie algebra sl(d, C) satisfies Equation (9.59) as well. Hence we have to show that all such intertwiners are proportional, or in other words that Ad is contained N N exactly once. This however is a straightforward application of standard in π+ ⊗ π+ results from group theory. We omit the details here, see [241, § 79, Ex. 4] instead. 2 9. Optimal Cloning 134 Applying this lemma to Equation (9.57) we get with X = 1I/d − ρ and ∆C 1 = 1 − F1C the following proposition. ⊗N Proposition 9.4.2 For each fully symmetric channel T : B(H ⊗M ) → B(H+ ) with a convex decomposition as in Equation (9.54), the one particle error is given by ∆C 1 (T ) = N X d−1 1− cmj ω(Tmj ) . d M mj (9.61) Hence, to find the minimizer we have to maximize ω(Tmj ), and this is in fact the hard part of the proof. Therefore we will explain the idea first in the d = 2 case. 9.4.2 The qubit case For d = 2 the representations of SU(2) are conventionally labeled by their “total angular momentum” α = 0, 1/2, 1, . . ., which is related to the highest weight m = (m1 , m2 ) by α = (m1 − m2 )/2. The irreducible representation πα has dimension + 2α + 1, and is isomorphic to πN with N = 2α in the notation used above. For α = 1 we get the 3-dimensional representation isomorphic to the rotation group, which is responsible for the importance of this group in physics. In a suitable basis X1 , X2 , X3 of the Lie algebra su(2) we get the commutation relations [X1 , X2 ] = X3 , and cyclic permutations of the indices thereof. In the α = 1 representation ∂π1 (Xk ) generates the rotations around the k-axis in 3-space. The Casimir operator e2 = (cf. Subsection 8.3.3) of SU(2) is the square of this vector operator, i.e., C P3 2 k=1 Xk . In the representation πα it is the scalar α(α + 1), i.e., if we extend the representation ∂π of the Lie algebra to the universal enveloping algebra (which e 2 ) = α(α + 1)1I. We also contains polynomials in the generators), we get ∂πα (C can use this to determine ω(Tmj ) from Proposition 9.4.2 for arbitrary irreducible representations. This computation can be seen as an elementary computation of a so-called 6j-symbol, but we will not need to invoke any of the 6j-machinery. Lemma 9.4.3 Consider three irreducible SU(2) representations π α , πβ , πγ with α, β, γ ∈ {0, 1/2, . . .}, an intertwining isometry V πγ = πα ⊗ πβ V and the corresponding channel T (A) = V ∗ (A ⊗ 1I)V ∗ . Then we have ω(T ) = 1 α(α + 1) − β(β + 1) + . 2 2γ(γ + 1) (9.62) Proof. According to Lemma 9.4.1 ω(T ) is defined by ω(T ) · ∂πγ (Xk ) = V ∗ (∂πα (Xk ) ⊗ 1I)V. (9.63) We multiply this equation by ∂πγ (Xk ), use the ¡ ¢ intertwining property of V in the form V ∂πγ (X) = ∂πα (X) ⊗ 1I + 1I ⊗ ∂πβ (X) V , and sum over k to get X ¡ ¢ ¡ ¢ e 2 ) = V ∗ ∂πα (C e 2 ) ⊗ 1I V + V ∗ ∂πα (Xk ) ⊗ ∂πβ (Xk ) V. ω(T ) · ∂πγ (C (9.64) k The tensor product in the second summand can be re-expressed in terms of Casimir operators as X¡ k ¢ 1 X¡ ¢2 ∂πα (Xk ) ⊗ ∂πβ (Xk ) = ∂πα (Xk ) ⊗ 1I + 1I ⊗ ∂πβ (Xk ) − 2 k 1 e 2 ) ⊗ 1Iβ − 1 1Iα ⊗ ∂πα (C e 2 ). ∂πα (C 2 2 (9.65) 9.4. Testing single clones 135 Inserting this into the previous equation, using the intertwining property once again, e 2) ≡ C e2 (π)1I, we find that and inserting the appropriate scalars for ∂π(C and hence e2 (πγ ) = C e2 (πα ) + ω(T ) · C ω(T ) = e2 we find Inserting the value for C ω= ¢ 1¡ e e2 (πα ) − C e2 (πβ ) , C2 (πγ ) − C 2 e2 (πα ) − C e2 (πβ ) 1 C + . 
e 2 2C2 (πγ ) 1 α(α + 1) − β(β + 1) + , 2 2γ(γ + 1) which was to show. (9.66) (9.67) (9.68) 2 e 2 is some fixed Note that we have only used the fact that the Casimir operator C quadratic expression in the generators. This is also true for SU(d). Hence equation (9.67) also holds in the general case; this observation leads directly to Lemma 9.4.4. In particular, we have shown that for the purpose of optimizing ω(Tmj ) for any finite d only the isomorphism types of πα and πβ are relevant, but not the particular intertwiner V . To calculate ω(Tmj ) we have to set γ = N/2 and α is constrained by the condition that πα must be a subrepresentation of U 7→ U ⊗M , which is equivalent to α ≤ M/2. Finally we have π ej = πβ for some β = 0, 1/2, 1, . . . which is constrained by the condition that there must be a non-zero intertwiner between πγ and πα ⊗ πβ . It is well-known that this condition is equivalent to the inequality |α−β| ≤ γ ≤ α + β. This is the same as the “triangle inequality”: the sum of any two of α, β, γ is larger than the third. The area of admissible pairs (α, β) is represented in Fig. 1. Since x 7→ x(x + 1) is increasing for x ≥ 0, we maximize ω with respect to β in equation (9.62) if we choose β as small as possible, i.e., β = |α − γ|. Then the numerator in equation (9.62) becomes α(α + 1) − β(β + 1) = 2αγ − γ 2 + max{γ, 2α − γ}, (9.69) which is strictly increasing in α. Hence the maximum ωmax = M +2 N +2 (9.70) is only attained for α = M/2 and β = (M − N )/2. Note that the seemingly simpler procedure of first maximizing α and then minimizing β to the smallest value consistent with α = M/2 leads to the same result, but is fallacious because it fails to rule out possibly larger values of ω in the lower triangle of the admissible region in Fig. 1. The same problem arises for higher d, and one has to be careful to find a maximization procedure which takes into account all constraints. 9.4.3 The general case Let us generalize now the previous discussion to arbitrary but finite d. The first step is to establish the analog of Equation (9.67). Lemma 9.4.4 Consider two irreducible SU(d) representations π m and πn with N highest weight m, n, an intertwining isometry V π+ = πm ⊗ πn V and the corresponding channel Tmn (A) = V ∗ (A ⊗ 1I)V. (9.71) 9. Optimal Cloning 136 β 6 (M − N )/2 ¡ ¡ ¡ ¡ ¡ ¡ ¡ ¡ ¡ N/2 ¡ ¡ @ ¡ @ @¡ N/2 ¡ ¡ ¡ ¡ ¡ ¡ ¡ ¡ ¡ ¡ ¡ ¡ r ωmax α M/2 Figure 9.1: Area of admissible pairs (α, β). Then we have ω(Tmn ) = e2 (πm ) − C e2 (πn ) 1 C + , N e2 (π ) 2 2C + (9.72) e2 denotes the second order Casimir operator of SU(d); cf. Subsection 8.3.3. where C Proof. In analogy to (9.63) we consider the equation N (X) = V ∗ (∂πm (X) ⊗ 1In )V, ω(Tmn ) · ∂π+ ∀X ∈ su(d), (9.73) which follows from Lemma 9.4.1. Note that equation (9.73) is valid only for X ∈ su(d) (and not for X ∈ u(d) in general). Hence we have to consider the second order e 2 of SU(d) which is given, according to Subsection 8.3.3, by an Casimir operator C e 2 = P g jk Xj Xk . This is all we needed in the derivation expression of the form C jk of equation (9.67) in Lemma 9.4.3. Hence the statement follows. 2 All channels Tmj from Equation (9.54) are of the form (9.71) with some highest weight n. 
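As a sanity check on the maximization just carried out, the short script below (an illustration, not part of the thesis) scans all half-integer pairs (α, β) subject only to the constraints stated above (α ≤ M/2 and the triangle inequality with γ = N/2) and confirms that the maximum of Equation (9.62) equals (M + 2)/(N + 2), attained at α = M/2 and β = (M − N)/2, as claimed in Equation (9.70).

    from fractions import Fraction as Fr

    def omega(alpha, beta, gamma):
        # Eq. (9.62)
        return Fr(1, 2) + (alpha * (alpha + 1) - beta * (beta + 1)) / (2 * gamma * (gamma + 1))

    def omega_max_bruteforce(N, M):
        """Maximize omega over the admissible region of Fig. 9.1 (d = 2)."""
        gamma = Fr(N, 2)
        best = None
        for two_a in range(0, M + 1):              # alpha = 0, 1/2, ..., M/2
            for two_b in range(0, two_a + N + 1):  # beta <= alpha + gamma
                a, b = Fr(two_a, 2), Fr(two_b, 2)
                if abs(a - b) <= gamma <= a + b:   # triangle inequality
                    val = omega(a, b, gamma)
                    if best is None or val > best[0]:
                        best = (val, a, b)
        return best

    for N, M in [(1, 2), (2, 5), (3, 7)]:
        val, a, b = omega_max_bruteforce(N, M)
        # both comparisons print True: maximum (M+2)/(N+2) at alpha = M/2, beta = (M-N)/2
        print((N, M), val == Fr(M + 2, N + 2), (a, b) == (Fr(M, 2), Fr(M - N, 2)))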
Hence the previous lemma shows together with Proposition 9.4.2 and the e2 (π N ) is a positive constant that we have only to maximize the function fact that C + e2 (πm ) − C e2 (πn ) ∈ Z W 3 (m, n) 7→ F (m, n) = C (9.74) N W = {(m, n) ∈ Zd × Zd | m ∈ Yd (M ) and π+ ⊂ πm ⊗ πn }, (9.75) on its domain N N where π+ ⊂ πm ⊗ πn stands for: “π+ is a subrepresentation of πm ⊗ πn ” and the latter is an necessary and sufficient condition for the existence of an intertwining N and πm ⊗ πn . The first step is the following Lemma isometry V between π+ Lemma 9.4.5 The function F from Equation (9.74) is given by F (m, n) = F1 (m, n) − 2M N − N 2 , d (9.76) 9.4. Testing single clones 137 with W 3 (m, n) 7→ F1 (m, n) = C2 (πm ) − C2 (πn ) = d X j=1 (m2j − n2j ) + d X k=1 (d − 2k + 1)(mk − nk ) ∈ Z (9.77) Proof. The first step is to reexpress F (m, n) in terms of the U(d) Casimir operators C2 and C21 . Note in this context that although equation (9.73) is, as already stated, valid only for X ∈ su(d) the representations πm and πn are still U(d) representations e 2 = C2 − 1 C2 given in Section 8.3.3: Hence we can apply the equation C d 1 1 F (m, n) = C2 (πm ) − C2 (πn ) − (C12 (πm ) − C12 (πn )). d (9.78) This rewriting is helpful, because the invariants C1 turn out to be independent of the variational parameters: Since πm is a subrepresentation of U 7→ U ⊗M = π1⊗M (U ) and ∂π1⊗M (1I) = M 1I, we also have C1 (πm ) = M . On the other hand, the existence N = πm ⊗ πn V implies of an intertwining isometry V with V π+ N N (C1 ) = (∂πm (C1 ) ⊗ 1In + 1Im ⊗ ∂πn (C1 )) V )1I = V ∂π+ V C1 (π+ = (C1 (πm )1I + C1 (πn )1I) V (9.79) N N and therefore C1 (π+ ) = C1 (πm ) + C1 (πn ). Since C1 (π+ ) = N and C1 (πm ) = M we get C1 (πn ) = N − M . Inserting this into equation (9.78) the statement follows. 2 We see that only F1 depends on the variational parameter and has to be maximized over W . To do this we have to express the constraints defining the domain W more explicitly. e = (e Lemma 9.4.6 If we introduce for each n ∈ Zd the notation n n1 , . . . , n ed ) = (−nd , . . . , −n1 ), we can express the set W as e = m − µ, and (m, µ) ∈ W1 } W = {(m, n) | n with W1 = {(m, µ) ∈ Yd (M ) × Zd | d X (9.80) µk = N and k=1 0 ≤ µk ≤ mk − mk+1 ∀k = 1, . . . , d − 1}. (9.81) The function F1 is then given by e) = F1 (m, n) = F1 (m, n d X k=1 µk (2mk − 2k − µk ) + (d + 1) with W1 3 (m, µ) 7→ F2 (m, µ) = d X µk = F2 (m, µ) + (d + 1)N (9.82) k=1 d X k=1 µk (2mk − 2k − µk ) ∈ Z. (9.83) Proof. To fix the constraints for n consider the characters χn , χm and χN + of πn , πm N and π+ , i.e. χn (U ) = tr[πn (U )] and similarly in the other cases. They are elements ¡ ¢ of the Hilbert space L2 U(d) and we have [195, Cor. VII.9.6] N ⊂ πm ⊗ πn ⇔ hχN π+ + , χm χn i 6= 0. (9.84) 9. Optimal Cloning 138 N More precisely the scalar product hχN + , χm χn i coincides with the multiplicity of π+ in πm ⊗ πn . Hence N N 0 6= hχN + , χm χn i = hχ+ χn , χn i ⇔ πm ⊂ π+ ⊗ πn . (9.85) e is the highest weight of πn (cf. Subsection 8.3.2) we get Since n N ⊗ πe . πm ⊂ π + n (9.86) N A tensor product of π+ = πN 1 and an irreducible representation πp (with arbitrary highest weight p) can be decomposed explicitly into irreducible components [241, §79, Ex. 4]: X πN 1 ⊗ π p = πp1 +µ1 ,...,pd +µd . (9.87) 0≤µk+1 ≤pk −pk+1 µ1 +···+µd =N e we get Together with Equation (9.86) and the definition of n N π+ ⊂ πm ⊗ πn ⇐⇒ n ek = mk − µk with 0 ≤ µk ≤ mk − mk+1 ∀k = 1, . . . , d − 1 and d X µk = N, (9.88) k=1 and this implies the statement about W . 
To express the function F1 in terms of the ). new variables note that C2 (πn ) = C2 (πn ) = C2 (πe n e ) = F1 (m, m − µ). F1 (m, n) = F1 (m, n Together with equation (9.77) this implies (9.76). (9.89) 2 Hence we have reduced our problem to the following Lemma: Lemma 9.4.7 The function F2 : W1 → Z defined in equation (9.83) attains its maximum for and only for ( (N, 0, . . . , 0) for N ≤ M mmax = (M, 0, . . . , 0) and µmax = (9.90) M, 0, . . . , 0, N − M ) for N ≥ M. Proof. We consider a number of cases in each of which we apply a different strategy for increasing F2 . In these procedures we consider d to be a variable parameter, too, because if µd = md = 0, the further optimization will be treated as a special case of the same problem with d reduced by one. Case A: µd > 0, µi < mi − mi+1 for some i < d. In this case we apply the substitution µi 7→ (µi + 1), µd 7→ (µd − 1), which leads to the change ¡ ¢ ¡ ¢ δF2 = 2 −µi + µd + (d − i − 1) + mi+1 − md ≥ 2 µd + (d − i − 1) > 0 (9.91) in the target functional. In this way we proceed until either all µi with i < d satisfy the upper bound with equality (Case B below) or µd = 0, i.e., Case C or Case D applies. Case B: µd > 0, µi = mi − mi+1 for all i < d. In this case all µk , including µd are determined by the mk and by the normalization (µd = N − m1 + md ). Inserting these values into F2 , and using the normalization conditions, we get F2 (m, n) = F3 (m) − 2(M + dN ) − N 2 with F3 (m) = 2(N + d)m1 constrained by m1 ≥ · · · ≥ md ≥ 0, X k mk = M, and m1 − md ≤ N. (9.92) 9.4. Testing single clones 139 This defines a variational problem in its own right. Any step increasing m 1 at the expense of some other mk increases F2 . This process terminates either, when M = m1 , and all other mk = 0. This is surely the case for M < N , because then µd = N −m1 +md ≥ N −M > 0. This is already the final result claimed in the Lemma. On the other hand, the process may terminate because µd reaches 0 or would become negative. In the former case we get µd = 0, and hence Case C or Case D. The latter case (termination at µd = 1) may occur because the transformation m1 7→ (m1 + 1), md 7→ (md − 1) changes µd = N − m1 + md by −2. There are two basic situations in which changing both m1 and md is the only option for maximizing F3 , namely d = 2 and m1 = m2 = · · · = md . The first case is treated below as Case E. In the latter case we have 1 = N − m1 + md = N . Then the overall variational problem in the Lemma is trivial, because only one term remains, and one only has to maximize the quantity 2mk − 2k − 1, with trivial maximum at k = 1, m1 = M . Case C: µd = 0, md > 0. For µd = 0, the number md does not enter in the function F2 . Therefore, the move md 7→ 0 and m1 7→ m1 + md , increases F2 by µ1 md ≥ 0. Note that this is always compatible with the constraints, and we end up in Case D. Case D: µd = 0, md = 0, d > 2. Set d 7→ (d − 1). Note that we could now use the extra constraint µd0 ≤ md0 , where d0 = d − 1. We will not use it, so in principle we might get a larger maximum. However, since we do find a maximizer satisfying all constraints, we still get a valid maximum. Case E: d = 2, µ1 = m1 − m2 , µ2 = 1. In this case m = (m1 , m2 ) is completely fixed by the constraints. We have: m1 + m2 = M and µ1 + µ2 = m1 − m2 + 1 = N hence m1 −m2 = N −1. This implies 2m1 = M +N −1, 2m2 = M −N +1 and since m2 ≥ 0 we get M ≥ N − 1. If M = N − 1 holds we get m1 = N − 1 = M , m2 = 0 and consequently µ1 = N − 1. 
Together with µ2 = 1 = N − M these are exactly the parameters where F2 should take its maximum according to the Lemma. Hence assume M ≥ N . In this case µ2 = 1 implies that F2 becomes N M − 3N − 4, which is, due to M ≥ N , strictly smaller than F2 (M, 0; N, 0) = 2M N − N 2 − 2N . Uniqueness: In all cases just discussed the manipulations described lead to a strict increase of F2 (m, µ) as long as (m, µ) 6= (mmax , µmax ) holds. The only exception is Case C with µ1 = 0. In this situation there is a 1 < k < d with µk > 0. Hence we can apply the maps d 7→ d − 1 (Case D) and md 7→ 0 and m1 7→ m1 + md (Case C) until we get µd 6= 0 (i.e. d reaches k). Since µ1 = 0 the corresponding (m, µ) is not equal to (mmax , µmax ). Therefore we can apply one of manipulations described in Case A, Case B or Case E which leads to a strict increase of F2 (m, µ). This shows that F2 (m, µ) < F2 (mmax , µmax ) as long as (m, µ) 6= (mmax , µmax ) holds. Consequently the maximum is unique. 2 Now we are ready to prove optimality of Tb with respect to ∆C 1: Proposition 9.4.8 The channel Tb minimizes the one particle error ∆C 1 (T ) = 1 − C Fall . Proof. With Lemma 9.4.7 and Equations (9.72), (9.74), (9.76), (9.82) and (9.83) we can easily calculate ωmax : M +d ωmax = ω(Tb) = N +d C b and with Proposition 9.4.2 we get ∆C 1 (T ) ≥ ∆1 (T ) for all T . 2 The last thing which is missing is the uniqueness proof for ∆C 1. ⊗N Proposition 9.4.9 There is only one channel : B(H ⊗M ) → B(H+ ) which miniC mizes ∆1 . 9. Optimal Cloning 140 Proof. One part of the uniqueness proof is already given above: there is only one optimal fully symmetric cloning map, namely Tb. This follows easily from the uniqueness of the maximum found in Lemma 9.4.7 and from the fact that the representation + + N π+ is contained exactly once in the tensor product πM ⊗ πM −N (cf. Equation (9.87) and the decomposition of a fully symmetric T from Equation (9.53)). Suppose now that T is a non-covariant cloning map, which also attains the best C b value: ∆C 1 (T ) = ∆1 (T ). Then we may consider the average T of T over the group SM × U(d) (cf. Equation (9.36)), which is also optimal and, in addition, fully symmetric. Therefore T = Tb. The uniqueness part of the proof thus follows immediately from Proposition 9.3.3. 2 9.5 Asymptotic behavior Finally let us analyze the asymptotic behavior of the optimal cloner, in other words we have to ask: What is the maximal asymptotic rate r ∈ R+ (number of outputs per input system) such that the cloning error vanishes in the limit of infinitely many input systems. In other words we are searching for r such that b lim ∆C ] (TN →brN c ) = 0 N →∞ (9.93) holds; where bxc denotes the biggest integer smaller than x. Note that this question is related to entanglement of distillation and channel capacities, where asymptotic rates are used as well. In the present case the complete answer to our question is given by the following Theorem Theorem 9.5.1 For each asymptotic rate r ∈ [1, ∞] we have b lim ∆C 1 (TN →brN c ) = 0 N →∞ b lim ∆C all (TN →brN c ) = 1 − N →∞ (9.94) 1 rd−1 (9.95) Proof. Consider first the all particle error. According to Equation (9.15) we have ¯ ¯ N (r − 1)N + d ¯¯ d − 1 ¯¯ b 1 − (9.96) ( T ) ≤ ∆C N →brN c 1 d ¯ N + d (r − 1)N ¯ ¯ ¯ d − 1 ¯¯ 1 (r − 1) + d/N ¯¯ = 1 − . (9.97) d ¯ r − 1 1 + d/N ¯ C b Hence limN →∞ ∆C 1 (TN →brN c ) = 0 as stated. 
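The two limits appearing in Theorem 9.5.1 are easy to see numerically. The sketch below (illustrative only; it evaluates the closed-form errors of the optimal cloner from Theorem 9.2.3 and Equation (9.16)) tabulates both figures of merit at a fixed rate r:

    from math import comb, floor

    def d_sym(d, n):                      # d[n] = C(n+d-1, d-1)
        return comb(n + d - 1, d - 1)

    def delta_1(d, n, m):                 # one-particle error of the optimal cloner
        return (d - 1) / d * (1 - n * (m + d) / ((n + d) * m))

    def delta_all(d, n, m):               # all-particle error, Eq. (9.16)
        return 1 - d_sym(d, n) / d_sym(d, m)

    d, r = 3, 2.0                         # qutrits, rate r = 2 outputs per input
    for N in (10, 100, 1000, 10000):
        M = floor(r * N)
        print(N, round(delta_1(d, N, M), 6), round(delta_all(d, N, M), 6))

For d = 3 and r = 2 the one-particle error tends to zero while the all-particle error approaches 1 − 1/r² = 0.75, in accordance with the theorem.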
The all particle error ∆all is given according to Equation (9.16) and (9.10) by b ∆C all (TN →brN c ) = 1 − (N + 1)(N + 2) · · · (N + d − 1) (brN c + 1)(brN c + 2) · · · (brN c + d − 1) (9.98) hence we get (N + 1)(N + 2) · · · (N + d − 1) N →∞ (rN + 1)(rN + 2) · · · (rN + d − 1) 1 + 1/N 1 + 2/N 1 + (d − 1)/N = 1 − lim ··· N →∞ r + 1/N r + 2/N r + (d − 1)/N 1 = 1 − d−1 r b lim ∆C all (TN →brN c ) = 1 − lim N →∞ This completes the proof. (9.99) (9.100) (9.101) 2 9.6. Cloning of mixed states 141 This results complements Theorem 9.2.3 where we have seen that the one- and all-particle error admits the same (unique) optimal cloner. If we consider the asymptotic behavior we see that both figures behaves very differently: We can produce optimal copies at infinite rate if we measure only the quality of individual clones. If we take in addition correlations into account the rate is, however, zero. 9.6 Cloning of mixed states Up to now we have excluded a discussion of mixed state cloning and related tasks. The reason is that the search for a reasonable figure of merit is much more difficult in this case and not even clarified for classical systems. At first, the latter statement sounds strange, because it is indeed possible to copy classical information without any error. However, cloning mixed states of a classical system does not mean to copy a particular code word (e.g. from the hard drive in the memory of a computer) but to enlarge a sample of N iid random variables to a size M > N . To explain this in greater detail let us consider a finite alphabet X, the corresponding classical observable algebra C(X) and a channel T : C(X M ) → C(X N ). This T can be interpreted as a device which maps codewords of length N to codewords of length M and it is uniquely characterized by the matrix T~xy~ , ~x ∈ X N , ~y ∈ X M of transition probabilities; i.e. T~x~y denotes the probability that the codeword ~x = (x1 , . . . , xN ) is mapped to ~y = (y1 , . . . , yM ). If S denotes in addition a source which produces letters from X independently and identically distributed according to the (unknown) probability distribution p ∈ S(X) (recall the notation and terminology from Subsection 2.1.3), we can describe classical cloning as follows: Draw a sample ~x ∈ X N from S and generate with probability T~xy~ a bigger sequence ~y = (y1 , . . . , yM ) ∈ X M which reflects the statistics of S as good as possible. This means the output distribution T ∗ (p⊗N ) with X (T ∗ ρ⊗N )(|~y ih~y |) = (9.102) T~xy~ px1 . . . pxN ~ x∈X N should be “close” to p⊗M ∈ C(X M ). If we know that p is a pure state, this task can be performed quite easily: A pure state of a classical system is given by a Dirac measure δz ∈ S(X), i.e. δz (x) = δzx . This means that S produces always the same (although unknown) letter z ∈ X. To clone a sample ~x produced by such a source we only have to take the first letter and copy it M times. The corresponding channel Tb is described by the transition probabilities Tb~xy~ = δx1 y1 δx1 y2 · · · δx1 yM . It provides not only the optimal but even the ideal solution of the pure state classical cloning problem, because the output of Tb is in fact indistinguishable from a sample of length M drawn from S. This implies that in the pure state case (i.e. if we know a priori that input state is pure) there is a unique solution which is independent from each reasonable figure of we can choose. If the input state p is arbitrary (i.e. 
not necessarily pure) the situation is much more difficult and different figures of merit leads to different optimal solutions. Classical estimation theory suggests on the other hand that (if nothing is known about p) the best cloning method is to draw a sample ~y ∈ X M which is iid according to the empirical distribution pex = N (x)/N (where N (x) denotes the number of occurrences of x ∈ X in the sample ~x). The corresponding channel Te can be realized most easily in the following way: Generate M random, equally distributed integers 1 ≤ r1 , . . . , rM ≤ N and choose yk = xrk , k = 1, . . . , M for the sequence ~y . Such a procedure is used within the so called “bootstrap program” within classical statistics [79], however it is not known whether Te arises as a solution of an appropriate optimization problem. In other words, a mathematically precise way to say that Te is the optimal cloner, is missing. In the quantum case the definition of optimality 9. Optimal Cloning 142 for mixed state cloners is most probable even more difficult and good proposals are up to now not available. Chapter 10 State estimation Our next topic is quantum estimation theory, i.e. we are looking at measurements on N d-level quantum systems which are all prepared in the same state ρ. There is quite a lot of literature about this topic and we are not able to give a complete discussion here (cf. Hellströms book [109] for an overview and [107, 166, 173, 2, 89, 94, 43, 42, 68] and the references therein for a small number of recent publications). Instead we will follow the symmetry based approach already used in the last two Chapters. Parts of the presentation (Theorem 10.2.4) are based on [137]. Other results (Theorem 10.1.2 and 10.2.6) are not yet published. 10.1 Estimating pure states Consider first the case where we know that the N input systems are all prepared in the same pure but otherwise unknown state. As for optimal cloning this assumption leads to great simplifications and therefore to a quite complete solution of the corresponding optimization problem. 10.1.1 Relations to optimal cloning As already discussed in Section 4.2 cloning and estimation are closely related: If E : C(S) → B(H⊗N ) is an estimator we can construct a cloning map TE : B(H⊗M ) → B(H⊗N ) by (cf. Equation (4.19)) Z σ ⊗M tr[E(dσ)ρ⊗N ]. (10.1) TE∗ (ρ⊗N ) = S In terms of matrix elements this can be rewritten as Z hψ, TE∗ (ρ⊗N )φi = hψ, σ ⊗M φi tr[E(dσ)ρ⊗N ] = tr[E(fψφ )ρ⊗N ], (10.2) S where ψ, φ ∈ H⊗M and fψφ ∈ C(S) is the function given by fψφ (σ) = hψ, σ ⊗M φi. If we insert TE into the figure of merit ∆C 1 we get according to Equation (9.2) £ ¤ ∆C inf tr ρ(j) TE∗ (ρ⊗N ) (10.3) 1 (TE ) = 1 − ρ pure,j Z = 1 − inf tr(ρσ) tr[E(dσ)ρ⊗N ] (10.4) ρ pure S £ ¤ = 1 − inf tr ρhEiρ (10.5) ρ pure where hEiρ denotes the expectation value of E in the state ρ⊗N , i.e. Z £ ¤ hψ, hEiρ φi = hψ, σφi tr[E(dσ)ρ⊗N ] = tr E(hψφ )ρ⊗N , ∀φ, ψ ∈ H (10.6) S with fψφ ∈ C(S), fψφ (ρ) = hψ, ρφi. Hence a possible figure of merit for the estimation of pure states is the biggest deviation of the expectation value from the “true” density matrix ρ, i.e. ¡ £ ¤¢ ∆E . (10.7) p (E) = sup 1 − tr ρhEiρ ρpure If we measure the quality of E with ∆E p we get immediately an upper bound: Since M is completely arbitrary in Equation (10.1) we see from (10.5) that ∆E p (E) 10. State estimation 144 b is bounded from above by the one particle error ∆C 1 (TN →M ) of the optimal cloner for arbitrary M . With Proposition 9.2.2 we get µ ¶ d−1 N d−1 C b ∆E (E) ≤ lim ∆ ( T ) = 1 − = . 
(10.8) N →M p 1 M →∞ d N +d N +d We will see in the next Subsection that this bound can be achieved. Hence optimal estimation of pure states can be regarded as the limiting case of optimal cloning for M → ∞: On the one hand we can construct the optimal N → ∞ cloner from an optimal estimator. On the other we can produce the best possible estimates from a composition of the N → ∞ optimal cloner with a measurement on the infinitely many clones. 10.1.2 The optimal estimator £ ¤ Equation (10.6) shows that the function T (N, ∞) 3 E 7→ tr ρhEiρ is continuous in the weak topology of T (N, ∞) which we have defined in Subsection 8.2.1. As a supremum over affine functions, ∆E p is therefore lower semicontinuous and convex. In addition it is easy to see that ∆E p is invariant under the group action αU,τ from Equation (8.12). In this context note that hEiρ satisfies £ ¤ hψ, hαU Eiρ φi = tr αU E(fψφ )ρ⊗N (10.9) ¤ £ ∗ ⊗N (10.10) = tr E(fU ∗ ψ,U ∗ φ )(U ρU ) = hψ, U hEiU ∗ ρU U ∗ φi (10.11) and therefore hαU Eiρ = U hEiU ∗ ρU U ∗ holds. In the same way we can show that hατ Eiρ = hEiρ holds for each permutation τ ∈ SN . Inserting this in (10.7) we get ¤¢ ¡ £ (10.12) ∆E p (αU,τ T ) = sup 1 − tr ρhαU,τ Eiρ ρ pure ¡ £ ¤¢ (10.13) = sup 1 − tr ρU hEiU ∗ ρU U ∗ ρ pure ¤¢ ¡ £ = sup 1 − tr (U ∗ ρU )hEiU ∗ ρU = ∆E p (E). (10.14) ρ pure We can invoke therefore Lemma 8.2.3 to see that we can search estimators which minimize ∆E p among the covariant ones, and the latter are completely characterized by Theorem 8.2.12. The general structure is still quite complicate. However we know that the input states are pure and this leads to several possible simplifications. First of all ∆E p (E) detects only the part of E which is supported by the symmetric ⊗N . Hence we can restrict the discussion in this subsection to observables subspace H+ of the form ⊗N E : C(S) → B(H+ ). (10.15) This is the same type of assumption we have made already in the last Chapter. In addition it is reasonable to search an optimal pure state estimator among those observables which are concentrated on the set of pure states. But the latter is transitive under the action of U(d). Covariance implies therefore according to Theorem 8.2.11 that we have to look for observables of the form Z f (U σ0 U ∗ )U ⊗N P0 U ⊗N ∗ dU (10.16) E(f ) = U(d) ⊗N where σ0 is a fixed but arbitrary pure state and P0 is a positive operator on H+ . ⊗N The most obvious choice for P0 is just σ0 . Hence we define Z b ) = d[N ] f (U σ0 U ∗ )(U σ0 U ∗ )⊗N dU. (10.17) E(f U(d) 10.1. Estimating pure states 145 Sometimes it is useful to keep track about the number of input systems on which bN instead of E. b the estimator operates. In this case we write E b The map E is obviously positive. To see that it is unital as well (and therefore an observable), note that Z Z (U σU ∗ )⊗N dU = U ⊗N σ ⊗N U ⊗N ∗ dU (10.18) U(d) U(d) ⊗N is an operator which is supported by the symmetric subspace H+ and commutes ⊗N N with all unitaries U . Hence, by irreducibility of π+ it coincides with λSN where λ is a positive constant which can be determined by "Z # Z (U σU ∗ )⊗N dU = λd[N ] = λ tr(SN ) = tr U(d) tr(U σU ∗ )N dU = 1. (10.19) U(d) b is a unital positive map from C(S) to B(H⊗N ). Now we Hence λ = d[N ]−1 and E + have [111, 157, 43] b defined in Equation (10.17) satisfies Theorem 10.1.1 The estimator E d−1 b ∆E p (E) = N +d and is therefore optimal. (10.20) £ ¤ b ρ does not depend on the pure state Proof. Due to covariance the quantity tr ρhEi ρ. 
Hence we have £ ¤ b ρ ψi b = 1 − tr ρhEi b ρ = 1 − hψ, hEi (10.21) ∆E (E) p for an arbitrary but fixed ρ = |ψihψ|. Inserting Equation (10.17) into (10.6) we get Z ¤ £ b ρ = d[N ] tr(ρU σU ∗ ) tr(ρ⊗N (U σU ∗ )⊗N )dU (10.22) tr ρhEi U(d) " = d[N ] tr ρ⊗(N +1) Z # (U σU ∗ )⊗(N +1) dU = U(d) d[N ] d[N + 1] (10.23) where we have used for the last equality the same reasoning as in Equation (10.19). Hence Equation (10.20) follows from the definition of d[N ] in (9.10) and optimality is an immediate consequence of the bound (10.8), derived from optimal cloning. 2 In spite of its close relationship to optimal cloning, the quantity ∆E p provides only the most basic measure for the quality of an estimator, because the probability that concrete estimates are far away from ρ can be quite large although ∆ E (E) is small. There are different ways to handle this problem: One is to study the behavior of variances in the limit of infinitely many input systems. Such an analysis leads to quantum versions of Cramer-Rao inequalities and is carried out by Gill and Massar in [94]. In this work we will follow a different approach which is based on the theory of large deviations (cf. Section 10.3 for a short overview of some material we will b far away from need in this chapter). This means we are discussing the behavior of E the expectation value (in contrast to a Cramer-Rao like analysis which is related to the behavior near the expectation value) and show that the probability to get an estimate outside a small ball around the “true” state vector decays exponentially fast with the number N of input systems. To formulate the corresponding theorem b associates to each measurable subset ω ⊂ S an recall from Section 3.2.4 that E 10. State estimation 146 £ ¤ b b effect E(ω) such that tr ρ⊗N E(ω) is the probability to get an estimate in ω if the b is concentrated on the set N input systems where in the joint state ρ⊗N . Since E P ⊂ S of pure states, only subsets ω of P are interesting here. This leads to the following theorem. b defined in Equation (10.17). The seTheorem 10.1.2 Consider the estimator E quence of probability measures on the set P of pure states £ ¤ ⊗N b KN (ω) = tr E(ω)ρ (10.24) satisfies the large deviation principle with rate function I(σ) = − ln tr(ρσ). Proof. We use Theorem 10.3.5 and show that the probability measures KN satisfy the Laplace principle (cf. Definition 10.3.4). Hence consider a continuous, bounded function f : P → R and Z ∗ 1 e−N f (U σ0 U ) tr(ρU σ0 U ∗ )N dU (10.25) lim N →∞ N U(d) Z ∗ ∗ 1 e−N [f (U σ0 U )−ln tr(ρU σ0 U )] dU (10.26) = lim N →∞ N U(d) £ ¤ (10.27) = − inf f (U σ0 U ∗ ) − ln tr(ρU σ0 U ∗ ) U ∈U(d) £ ¤ = − inf f (σ) − ln tr(ρσ) . (10.28) σ∈P To derive the second Equation we have used Varadhan’s Theorem (Theorem 10.3.2) and the fact that a constant sequence of measures satisfies the large deviation principle with zero rate function. Theorem 10.3.5 now implies that the KN satisfy the large deviation principle as stated. 2 Since the rate function I(σ) = − ln tr(ρσ) is positive and vanishes only for σ = ρ £ ¤ bN (ω) converge weakly to a we see that the probability measures KN (ω) = tr ρ⊗N E point measure concentrated at σ = ρ. This shows that the estimation scheme which bN is asymptotically exact (cf. the is given by the sequence of optimal estimators E corresponding discussion in Section 4.2). 
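As a numerical complement (not taken from the thesis), the covariant estimator from Equation (10.17) can be simulated by replacing the Haar integral with an average over Haar-random pure states. The sketch below checks the optimal mean fidelity tr[ρ⟨Ê⟩_ρ] = 1 − (d − 1)/(N + d) of Theorem 10.1.1 for qubits with N = 5; all names are ad hoc and the sample size is chosen only for a rough check.

    import numpy as np
    from math import comb

    rng = np.random.default_rng(0)

    def haar_pure_state(d):
        """Haar-random pure state on C^d (normalized complex Gaussian vector)."""
        v = rng.normal(size=d) + 1j * rng.normal(size=d)
        return v / np.linalg.norm(v)

    d, N, samples = 2, 5, 200_000
    d_N = comb(N + d - 1, d - 1)                 # d[N]
    psi = haar_pure_state(d)                     # the "true" pure state rho = |psi><psi|

    acc = np.zeros((d, d), dtype=complex)
    for _ in range(samples):
        phi = haar_pure_state(d)
        acc += abs(np.vdot(psi, phi)) ** (2 * N) * np.outer(phi, phi.conj())
    mean_estimate = d_N * acc / samples          # Monte Carlo version of <E>_rho, Eq. (10.6)

    print(np.real(np.vdot(psi, mean_estimate @ psi)),   # approximately (N+1)/(N+d) = 6/7
          1 - (d - 1) / (N + d))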
10.2 Estimating mixed states If no a priori information about the state ρ of the input systems is available, we can try to generalize the figure of merit ∆E p by replacing the supremum over all pure states with the supremum over all density matrices. In addition we have to use a different distance measure which is more appropriate for mixed states. A good choice is the trace-norm distance (for a discussion of fidelities of mixed states consider the corresponding Section in [172]) and we get ∆E m (E) = sup kρ − hEiρ k1 . (10.29) ρ∈S It is easy to see that ∆E m is a convex and lower semicontinuous function on T (N, ∞) which is invariant under the group action from Equation (8.12). Hence we can restrict due to Lemma 8.2.3 our search for optimal estimators to the set Tfs (N, ∞) of fully symmetric ones. In contrast to the pure state case discussed in the last subsection, the simplification which arises from this restriction is now not very strong. From Theorem 8.2.12 we see that the structure of a fully symmetric estimator E is simple only along the orbits of the action U(d) × S 3 (U, ρ) 7→ U ρU ∗ ∈ S, (10.30) 10.2. Estimating mixed states 147 while E can be arbitrary transversal to them. Since the set of all orbits of (10.30) coincides with the set Σ = {x ∈ [0, 1]d | x1 ≥ x2 ≥ . . . ≥ xd ≥ 0, d X xj = 1} (10.31) j=1 of ordered spectra (cf. Equation (8.39)) this observation indicates that the hard part of the estimation problem is estimating the spectrum of a density matrix, while the rest can be covered by methods we know already from pure state estimation. 10.2.1 Estimating the spectrum To follow this idea let us introduce a spectral estimator as an observable F : C(Σ) → B(H⊗N ) (10.32) on N quantum systems with values in the set of ordered spectra. If we denote the natural projection from S to Σ by p : S → Σ (i.e. p(ρ) coincides with the ordered spectrum of ρ) we can construct a spectral estimator from a full estimator E by F (f ) = E(f ◦ p), where f ∈ C(Σ). If E is fully symmetric the corresponding F is invariant under U(d) transformations and permutations, i.e. it satisfies [U ⊗N , F (f )] = [Vτ , F (f )] = 0, ∀f ∈ C(Σ), ∀U ∈ U(d), ∀τ ∈ SN . (10.33) Following Definition 8.2.4 we will denote each spectral estimator with this invariance property fully symmetric. Theorem 8.2.5 implies immediately the following proposition. Proposition 10.2.1 Consider a fully symmetric, spectral estimator, i.e. an observable F : C(Σ) → B(H⊗N ) satisfying Equation (10.33). There is a sequence µm , m ∈ Yd (N ) of probability measures on Σ such that Z X F (f ) = f (x)µm (dx) (10.34) Pm m∈Yd (M ) Σ holds, where Pm are the central projections from Equation (8.19). If we consider in particular projection valued observables the structure of F becomes much simpler. Since the Pm are the only projections which commute with all U ⊗N and all Vτ we see: Corollary 10.2.2 A fully symmetric, projection valued, spectral estimator F is given by a map Yd (N ) 3 m 7→ x(m) ∈ Σ (10.35) such that F (f ) = X m∈Yd (N ) ¡ ¢ f x(m) Pm (10.36) holds. We see that the structure of spectral estimators becomes much easier if we restrict our attention to the projection valued case. To indicate that we can do this without loosing estimation quality, let us have a short look on an optimization problem, which is similar to the one in the last section. To this end consider the expectation value of a (general) spectral estimator F Z £ ¤ x tr ρ⊗N F (dx) , (10.37) hF iρ = Σ 10. 
State estimation 148 and define in analogy to Equation (10.29) 2 ∆E s (F ) = sup khF iρ − p(ρ)k , (10.38) ρ∈S where p(ρ) ∈ Σ denotes again the ordered spectrum of ρ and k · k is the usual norm1 of Rd . In contrast to Theorem 10.1.1 we are not able to state a minimizer of this quantity explicitly. However, we can show at least that it is sufficient to search among projection valued observables. Proposition 10.2.3 The figure of merit ∆E s is minimized by a projection valued estimator. Proof. Using similar reasoning as in the pure state case (cf. Lemma 8.2.3) it is e easy to see that ∆E s is minimized by a fully symmetric estimator F , i.e. we have E e E ∆s (F ) = inf F ∆s (F ). Inserting Equation (10.34) into (10.37) we get Z X £ ¤ e tr Pm ρPm xµm (dx). (10.39) hF i ρ = Σ m∈Yd (N ) If ρ is non degenerate (i.e. has no vanishing eigenvalue) we have ρ ∈ GL(d, C) and get tr[Pm ρPm ] = χm (ρ) dim Km , (10.40) where χm (ρ) = tr[πm (ρ)] is the character of the irreducible GL(d, C) representation with highest weight m. Since the set of non-degenerate matrices is dense in B(H) we have by continuity (of the quantity under the supremum) sup khF iρ − p(ρ)k2 = sup khF iρ − p(ρ)k2 ρ∈S (10.41) ρ∈S det ρ>0 and therefore °2 ° ° ° X ° ° e ° , ° χ (ρ) dim K x(m) − p(ρ) ( F ) = sup ∆E m m s ° ° ρ∈S ° ° m∈Yd (N ) (10.42) det ρ>0 where x(m) are the first moments of the probability measures µm , i.e. Z x(m) = xµm (dx). (10.43) Σ The map m 7→ x(m) from Equation (10.43) defines according to Corollary 10.2.2 a E e projection valued, spectral estimator F which satisfies ∆E 2 s (F ) = ∆s (F ). 10.2.2 Asymptotic behavior To determine an optimal spectral estimator we have to find a function x(m) which minimizes the right hand side of Equation (10.42). Although explicit formulas for χm and dim Km exist, this minimization problem is very difficult to solve. The quantity under the sup in (10.42) is a polynomial in p(ρ) ∈ Σ and its degree increases linearly in N . Hence there is no closed form expression for ∆E s except for trivial cases. We omit therefore a further discussion of the optimization problem given by ∆ E s . Instead we will pass directly to an analog of Theorem 10.1.2 and determine the large 1 There are of course other possible choices for a distance measure, however k . k 2 leads to the most simple quantity, because the term under the sup becomes a polynomial (although its coefficients depends in a difficult way on m and N .) 10.2. Estimating mixed states 149 deviation behavior of an appropriate estimation scheme FbN without considering its optimality for finite N . The result of Proposition 10.2.3 serves here as a motivation for the fact that we choose the spectral estimators FbN among the fully symmetric, projection valued ones. To get a concrete expression for FbN , note that the normalized Young frames itself provide the most simple choice for a function x(m), i.e. Yd (N ) 3 m 7→ x(m) = Hence we define FbN (f ) = X m∈Yd (N ) f m ∈ Σ. N ³m´ N Pm . (10.44) (10.45) It turns out somewhat surprisingly that these FbN form an asymptotically exact estimation scheme, i.e. the probability measures tr[FbN (ω)ρ⊗N ] converge weakly to the point measure at the spectrum p(ρ) of ρ. Explicitly, for each continuous function f on Σ we have Z X ³m´ ¡ ¢ ¡ ¢ tr ρ⊗N Pm = f p(ρ) f (10.46) f (x) tr[FbN (dx)ρ⊗N ] = lim lim N →∞ N →∞ Σ N Y We illustrate this in Figure 10.1, for d = 3, and ρ a density operator with spectrum r = (0.6, 0.3, 0.1). 
Then Σ is a triangle with corners A = (1, 0, 0), B = (1/2, 1/2, 0), and C = (1/3, 1/3, 1/3), and we plot the probabilities tr(ρ⊗N Pm ) over m/N ∈ Σ. This behavior was observed already by Alicki et. al. [5] in the framework of statistical mechanics. We will prove now the following stronger result. Theorem 10.2.4 The sequence of probability measures KN (ω) = tr[FbN (ω)ρ⊗N ] satisfies the large deviation principle on Σ with rate function X Σ 3 x 7→ I(x) = xj (ln xj − ln pj (ρ)) ∈ [0, ∞] (10.47) (10.48) j where pj (ρ) denotes the j th component of the spectrum p(ρ) ∈ Σ of ρ. Proof. The idea of the proof is to use the Gärtner Ellis Theorem (Theorem 10.3.3), which is, however, a statement about measures on vector spaces. Instead of Fb (a measure on Σ) we have to look at its trivial extension to Rd ⊃ Σ, i.e. Rd ⊃ ω 7→ Fb (ω ∩ Σ) ∈ B(H⊗N ) for any measurable subset ω of Rd . Hence we have to analyze the integrals Z 1 ln eN hx,yi tr[FbN (dx)ρ⊗N ]. (10.49) N Rd To simplify the calculations note first that it is, due to U(d) invariance of Fb , sufficient to consider diagonal density matrices, with eigenvalues given in decreasing order. Further simplification arises if we set ρ = eh where h = diag(h1 , . . . , hd ) with h1 ≥ h2 ≥ . . . ≥ hd . Note that we exclude by this choice singular density matrices, i.e. those with zero eigenvalue. However we can retain the latter as a limiting case, if some of the hj goes to infinity. Hence, to restrict the analysis to ρ = eh is no loss of generality. Now we can define Z 1 eN hx,yi tr[FbN (dx)(eh )⊗N ] (10.50) cN (y, h) = ln N Rd X 1 = ehm,yi χm (eh ) dim Km (10.51) ln N m∈Yd (N ) 10. State estimation 150 Figure 10.1: Probability distribution tr(ρ⊗N Pm ) for d = 3, N = 20, 100, 500 and r = (0.6, 0.3, 0.1). The set Σ is the triangle with corners A = (1, 0, 0), B = (1/2, 1/2, 0), C = (1/3, 1/3, 1/3). 10.2. Estimating mixed states 151 where χm denotes again the character of πm . It is easy to see that cn (y, h) exists for each y and h. Hence it remains to show that the limit c(y, h) = limN →∞ cN (y, h) exists and the function y 7→ c(y, h) is differentiable. To get a more explicit expression for χm (ρ), note that h is an element of the Cartan subalgebra tC of gl(d, C). Hence we can calculate λ · h for each weight λ ∈ t∗C of πm (cf. Subsection 8.3.2). If we denote in addition the multiplicity of λ (i.e. the dimension of the weight subspace of λ) by mult(λ) we get X χm (ρ) = mult(λ)eλ·h (10.52) λ where the sum is taken over the set of all weight λ of πm . If the matrix elements hk of h are given (as assumed) in decreasing order, exp(m·h) is the biggest exponential (this is equivalent to the statement that m is the highest weight) and we have em·h ≤ χm (ρ) ≤ dim (Hm ) em·h . (10.53) Using the Weyl dimension formula it can be shown [76] that dim(Hm ) is bounded from above by a polynomial in N , i.e. dim (Hm ) ≤ (a1 + a2 N )a3 (10.54) with positive constants a1 , a2 , a3 . Inserting this in Equation (10.51) we get where e cN (y + h) ≤ cN (y, h) ≤ e cN (y) = 1 ln N a3 ln(a1 + a2 N ) +e cN (y + h) N X m∈Yd (N ) em·y dim Km , (10.55) (10.56) and we have identified here the diagonal matrix h = diag(h1 , . . . , hd ) with the dtuple h = (h1 , . . . , hd ) ∈ Rd . Equation (10.55) implies that c(y, h) = lim cN (y, h) = lim e cN (y + h) N →∞ N →∞ (10.57) holds. In other words we only have to calculate the limit of e c(y) = lim N →∞ e cN (y). 
This can be traced back to the following lemma [76]2 e N , N ∈ N given Lemma 10.2.5 Consider the sequence of probability measures K by Z ³m´ X e N (dx) = B(N )−1 dim Km , (10.58) f (x)K f N Rd m∈Yd (N ) X B(N ) = dim Km . (10.59) m∈Yd (N ) Then the limits b c(y) = lim b cN (y), b cN (y) = N →∞ 1 ln N Z Rd e N (dx) eN hx,yi K (10.60) exist and the function b c(y) is differentiable in y. For y1 ≥ · · · ≥ yd we have d X b c(y) = ln d−1 exp(yj ) . (10.61) j=1 2 If y ≥ · · · ≥ y holds, there is a direct way to calculate e c(y), because we can invoke Equation 1 d (10.57) to show that e c(y) = c(0, y). Using the definition of cN (y, h) in Equation (10.50) we easily Pd get e c(y) = ln exp(yj ); cf. [137]. For a general y ∈ Rd this argument does not work, however. j=1 10. State estimation 152 It is easy to see that e cN (y) = b cN (y) + 1 ln B(N ) N (10.62) holds. To estimate B(N ) note first that B(N ) ≤ dim H ⊗N = dN . On the other hand we have X dN = dim(Hm ) dim(Km ). (10.63) m∈Yd (N ) With Equation (10.54) this leads to ln d − a3 1 ln(a1 + N a2 ) ≤ ln B(N ) ≤ ln d. N N (10.64) Hence (10.62) implies e c(y) = b c(y) + d. Together with Equation (10.57) this shows that c(y, h) exists and is differentiable for all y. In other words the KN from Equation (10.47) ¡ satisfy the¢ large deviation principle with the Legendre transform I(x) = supy y · x − c(y, h) of c(y, h) as the rate function. Using (10.62) and (10.61) we get c(y, h) = e c(y + h) = b c(y + h) + lnd = ln d X pj (ρ) exp(yj ) (10.65) j=1 with ρ = eh , pj (ρ) = exp(hj ) and for all y ∈ Rd with y1 + h1 ≥ · · · ≥ yd + hd . It is now a simple calculus exercise to see that the rate function I(x) is given (for each x ∈ Σ) as in Equation (10.48). This completes the proof. 2 Although Fb is not optimal with respect to ∆E s or any related figure of merit (this can be checked numerically), the last theorem shows that the probability to have a measuring result which is “far away” from the true value of the spectrum of ρ, decreases exponentially with the number N of input systems. In other words we get very good estimates already with fairly small N . However, there is an even stronger argument which indicates that Fb is not only a “good” estimator but even optimal in an asymptotic sense. To explain this remark, assume that we have to estimate the spectrum of a density matrix ρ whose eigenbasis ei , i = 1, . . . , d of ρ is known a priori. In this case we only have to perform a complete von Neumann measurement with respect to the ei and get a classical estimation problem for the probability distribution xj = hej , ρej i j = 1, . . . , d. If no additional knowledge is available about the xj , j = 1, . . . , d, the best possible estimate for this problem is to use the empirical distribution, i.e. the collection of relative frequencies of outcomes e N (ω) denotes the proability to get the empirical distribution drawn from a j. If K sample of length N in the measurable set ω ⊂ Rd , it follows from Sanov’s Theorem e N satisfy the large deviation principle with exactly the same rate [77] that the K function as the KN from Equation (10.47)! This implies that the estimation scheme FbN is (asymptotically) as good as a strategy which uses a priori information about the eigenbasis of the input state ρ. Although this is not a very precise argument, it indicates that the FbN are (in a certain sense) “asymptotically optimal”. 
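The Legendre-transform step just performed is easy to verify numerically. The sketch below (illustrative only; it uses scipy.optimize, which is of course not part of the thesis) maximizes y · x − c(y, h) for the limiting function c of Equation (10.65) and compares the result with the relative-entropy rate function of Equation (10.48):

    import numpy as np
    from scipy.optimize import minimize

    p = np.array([0.6, 0.3, 0.1])      # spectrum of rho (d = 3, as in Figure 10.1)
    x = np.array([0.5, 0.3, 0.2])      # a candidate ordered spectrum x in Sigma

    def c(y):                          # c(y, h) = ln sum_j p_j exp(y_j), Eq. (10.65)
        return np.log(np.sum(p * np.exp(y)))

    # Legendre transform I(x) = sup_y ( <y, x> - c(y) ), computed numerically
    res = minimize(lambda y: -(y @ x - c(y)), x0=np.zeros(len(p)), method="BFGS")
    I_legendre = -res.fun

    I_entropy = np.sum(x * (np.log(x) - np.log(p)))   # rate function of Eq. (10.48)
    print(I_legendre, I_entropy)                      # agree up to optimizer tolerance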
10.2.3 Estimating the full density matrix Now we can combine the results of the last two subsections to get an estimation scheme for the full density matrix. We look for a fully symmetric estimator E : C(S) → B(H⊗N ) whose image under the projection p : S → Σ coincides with Fb from Equation (10.45), i.e. E(h ◦ p) = FbN (h) holds for each h ∈ C(Σ). According to Theorem 8.2.12 this implies that E has the form X Z £ ¤ E(f ) = f (U ρm/N U ∗ )U ⊗N Qm ⊗ 1I U ⊗N ∗ dU, (10.66) m∈Yd (N ) U(d) 10.2. Estimating mixed states 153 where the Qm ∈ B(Hm ) are appropriately chosen operators and ρx = d X j=1 xj |ej ihej |, x∈Σ (10.67) with a distinguished basis ej , j = 1, . . . , d. If h ∈ C(Σ) we get due to irreducibility of πm ³m´ Z X ¤ £ E(h ◦ p) = h (10.68) πm (U )Qm πm (U )∗ ⊗ 1IdU N U(d) m∈Yd (N ) ³m´ X h c m Pm , (10.69) = N m∈Yd (N ) with constants cm . Positivity and normalization implies cm = 1. Hence E(h ◦ p) = Fb (h) as stated. The only freedom we have is therefore the choice of the Qm . In analogy to the pure state estimator from Equation (10.17) we choose Qm = |φm ihφm | where φm ∈ Hm is the heighest weight vector associated to the U (d) representation πm (with heighest weight m). Hence we define "Z # X b )= E(f dim Hm f (U ρm/N U ∗ )πm (U )|φm ihφm |πm (U )∗ dU ⊗ 1I, m∈Yd (N ) U(d) (10.70) where the factors dim(Hm ) are needed for normalization (this is straightforward to check). As for the spectral estimator Fb and in contrast to the pure state case it is not b is optimal for finite N , i.e. whether it minimizes an appropriately clear whether E chosen figure of merit. Nevertheless, we can extend the large deviation result from Theorem 10.2.4 to get the following: bN from Equation (10.70) and a density Theorem 10.2.6 Consider the estimator E matrix ρ. The sequence £ ¤ bN (ω)ρ⊗N KN (ω) = tr E (10.71) satisfies the large deviation principle with a rate function I : S → [0, ∞] which is given by ¸¶ µ · d X pmj (U ∗ ρU ) (10.72) I(U ρx U ∗ ) = xj ln(xj ) − ln pmj−1 (U ∗ ρU ) j=1 where x ∈ Σ, ρx is the density matrix from Equation (10.67), U ∈ U(d) and pmj (σ) denotes: the principal minor of the matrix σ for j = 1, . . . , d and pm0 (σ) = 1 for j = 0. Proof. We will show that the measures KN satisfy the Laplace principle (Definition 10.3.4) which implies according to Theorem 10.3.5 the large deviation principle. Hence we have to consider Z e−N f (ρ) KN (dρ) = S Z X ∗ dim Hm dim Km e−N f (U ρm/N U ) hφm , πm (U ∗ ρU )φm idU, (10.73) m∈Yd (N ) U(d) where we have assumed without loss of generality that ρ is non-degenerate. Now we can express the matrix elements of πm (U ∗ ρU ) with respect to the highest weight vector as follows ([241, § 49] or [195, Sect. IX.8]) hφm , πm (U ∗ ρU )φm i = d Y k=1 pmk (U ∗ ρU )mk −mk+1 (10.74) 10. State estimation 154 where we have set md+1 = 0. Note that the right hand side of this equation makes sense even if the exponents are not integer valued. We can rewrite therefore Equation (10.73) with the probability measure Z X 1 m h(x)LN (dx) = N (10.75) h( ) dim(Hm ) dim(Km ) d N Σ m∈Yd (N ) to get Z e−N f (ρ) KN (dρ) = S = Z Z Σ = Z Z Σ with (10.76) N −N f (U ρx U ∗ ) d e U(d) U(d) d Y pmk (U ∗ ρU )N (xk −xk+1 ) dU LN (dx) (10.77) k=1 ¡ £ ¤¢ exp −N f (U ρx U ∗ ) − ln(d) − I1 (U, x) dU LN (dx) I1 (U, x) = d X k=1 £ ¤ (xk − xk+1 ) ln pmk (U ∗ ρU ) (10.78) (10.79) where we have set xd+1 = 0. Now we can apply Lemma 10.2.5 and Equation (10.54) to see that the LN satisfy the large deviation principle on Σ with rate function3 I0 (x) = ln(d) + d X xj ln(xj ). 
(10.80) j=1 The product measures dU LN (dx) satisfy therefore the large deviation principle as well, with the same rate function, but on U(d) × Σ. Varadhan’s Theorem 10.3.2 implies therefore Z e−N f (σ) KN (dσ) = − inf (f (U ρx U ∗ ) − ln(d) − I1 (U, x) + I0 (x)) (10.81) x,U S d X xj ln(xj ) − I1 (U, x) . (10.82) = − inf f (U ρx U ∗ ) + x,U j=1 Hence the KN satisfy the Laplace principle, provided there is a well defined function I : S → [0, ∞] with I(U ρx U ∗ ) = ln(d) − I1 (U, x) + I0 (x). Lemma 10.2.7 There is a (unique) continuous function I on S such that I(U ρx U ∗ ) = d X j=1 xj ln(xj ) − I1 (U, x) (10.83) holds. I is positive and I(σ) = 0 implies σ = ρ. Proof. To prove that I is well defined we have to show that U1 ρx U1∗ = U2 ρx U2∗ implies I1 (U1 , x) = I2 (U2 , x) holds. This is equivalent to the implication [U, ρx ] = 0 ⇒ I1 (U, x) = I1 (1I, x). To exploit the relation [U, ρx ] = 0 let us introduce k ≤ d integers 1 = j1 < j2 < · · · < jk = d such that xjα > xjα+1 and xj = xjα holds for jα ≤ j < jα+1 . Then we have " # k X pmjα (U ∗ ρU ) xjα ln I1 (U, x) = (10.84) pmjα−1 (U ∗ ρU ) α=1 3 The measures K eN from Lemma 10.2.5 are slightly different. We have to use therefore Theorem 10.3.3 and the same reasoning as in the last paragraph of the proof of Theorem 10.2.4. 10.3. Appendix: Large deviation theory 155 with j0 = 0 and pm0 (σ) = 1. On the other hand [U, ρx ] = 0 implies that U is block diagonal U = diag(U1 , . . . , Uk ) with Uα ∈ U(dα ), dα = jα+1 − jα . Hence we have pmjα (U ∗ ρU ) = pmjα (ρ) for all such U and all α. Together with Equation (10.84) this shows that I is well defined. To prove positivity of I consider a fixed x ∈ Σ. To get a lower bound on − d X xj ln j=1 · pmj (U ∗ ρU ) pmj−1 (U ∗ ρU ) ¸ (10.85) we have to choose U such that the − ln terms are given in increasing order;¤ i.e. the £ reverse ordering of the xj . This implies in particular that − ln pm1 (U ∗ ρU ) should be as small as possible, in other words pm1 (U ∗ ρU ) should be as big as possible. This is achieved if pm1 (U ∗ ρU ) coincides with the biggest eigenvalue λ1 of ρ. In this case the basis vector e1 has to be the eigenvector of U ∗ ρU which corresponds to λ1 . This shows that biggest possible value of pm2 (U ∗ ρU ) is λ1 λ2 , where λ2 is the second biggest eigenvalue of ρ. Again, this implies that e2 is the corresponding eigenvector of U ∗ ρU . In this way we can proceed to see that the quantity in Equation (10.85) is minimized if pmj (U ∗ ρU ) = λ1 λ2 · · · λj , where λj , j = 1, . . . , d are the eigenvalues of ρ in decreasing order. Hence we get I(x, U ) ≥ d X j=1 ¡ ¢ xj ln(xj ) − ln(λj ) , (10.86) and equality holds iff ρ and σ = U ρx U ∗ are simultaneously diagonalizable. Since the left hand side of this inequality is a relative entropy of classical probability distributions, we see that I is positive and I(σ) = 0 holds iff σ = ρ. 2 Now we can invoke Theorem 10.3.5 which implies together with Equation (10.82) and the preceding lemma the Theorem. 2 We see that the measures KN converge weakly to a point measure concentrated bN is asymptotically exact with error probaat ρ. Hence the estimation scheme E bilities which vanish exponentially fast. This is the same result as for the spectral estimator FbN which we have studied in the last section. The rate function I(σ), however, has a more difficult structure. It is in particular not just the relative entropy between ρ and σ, but a closely related quantity. 
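For concreteness, the rate function of Equation (10.72) can be evaluated directly from an eigendecomposition σ = Uρ_xU* and the leading principal minors of U*ρU. The sketch below (an illustration with ad hoc names, not part of the original text) also checks the two properties established in Lemma 10.2.7: I vanishes at σ = ρ, and in the commuting case it reduces to the classical relative entropy of the spectra.

    import numpy as np

    def leading_minor(a, j):
        """pm_j(a): determinant of the upper-left j x j block, with pm_0 = 1."""
        return 1.0 if j == 0 else float(np.linalg.det(a[:j, :j]).real)

    def rate_function(rho, sigma):
        """Rate function I(sigma) from Eq. (10.72), for input state rho."""
        vals, vecs = np.linalg.eigh(sigma)
        order = np.argsort(vals)[::-1]            # ordered spectrum x_1 >= ... >= x_d
        x, U = vals[order], vecs[:, order]        # sigma = U diag(x) U*
        a = U.conj().T @ rho @ U                  # U* rho U
        total = 0.0
        for j in range(1, len(x) + 1):
            ratio = leading_minor(a, j) / leading_minor(a, j - 1)
            total += x[j - 1] * (np.log(x[j - 1]) - np.log(ratio))
        return total

    rho = np.diag([0.6, 0.3, 0.1])
    sigma = np.diag([0.5, 0.3, 0.2])
    print(rate_function(rho, rho))        # ~ 0: the rate function vanishes only at sigma = rho
    print(rate_function(rho, sigma),      # commuting case: classical relative entropy
          sum(xi * np.log(xi / pi) for xi, pi in zip([0.5, 0.3, 0.2], [0.6, 0.3, 0.1])))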
10.3 Appendix: Large deviation theory The purpose of this appendix is to collect some material about large deviation theory which is used throughout this chapter. For a more detailed presentation we refer the reader to monographs like [85] or [77]. Definition 10.3.1 Let KN , N ∈ N be a sequence of probability measures on the Borel subsets of a complete separable metric space E. We say that the KN satisfy the large deviation principle with rate function I : E → [0, ∞] if the following conditions hold: 1. I is lower semicontinuous 2. The set {x ∈ E | I(x) ≥ c} is compact for each c < ∞. 3. For each closed subset ω ⊂ E we have lim sup N →∞ 1 ln KN (ω) ≤ − inf I(x) x∈ω N (10.87) 10. State estimation 156 4. For each open subset ω ⊂ E we have lim inf N →∞ 1 ln KN (ω) ≥ − inf I(x) x∈ω N (10.88) The most relevant consequence of this definition is the following theorem of Varadhan [214], which describes the behavior of some expectation values in the limit N → ∞: Theorem 10.3.2 (Varadhan) Assume that the sequence KN , N ∈ N of probability measures on E satisfies the large deviation principle with rate function I : E → [0, ∞]. If fN : E → R, N ∈ N is a sequence of continuous functions, bounded from above and converging uniformly on bounded subsets to f : E → R, the following equality holds: Z ¡ ¢ 1 ln (10.89) e−N fN (x) KN (dx) = − inf f (x) + I(x) . lim x∈E N →∞ N E Throughout this work we are using two different methods to prove that a given sequence KN , N ∈ N satisfies the large deviation principle. One possibility, which leads to the Gärtner-Ellis Theorem [85, Thm. II.6.1], is to look at the corresponding sequence of Laplace transforms. Theorem 10.3.3 Consider a finite dimensional vector space E with dual E ∗ and a sequence KN , N ∈ N of probability measures on the Borel subsets of E. Define cN : E ∗ → R by Z 1 ln cN (y) = eN y(x) KN (dx), (10.90) N E and assume that 1. cN is finite for all N ∈ N, 2. c(y) = limN →∞ cN (y) exists and is finite for all y ∈ E ∗ , and 3. c(y) is differentiable for all y ∈ E ∗ . Then the sequence KN ,¡ N ∈ N satisfies the large deviation principle with rate ¢ function I(x) = supy∈E ∗ y(x) − c(y) . A second possibility to check the large deviation principle is basically the converse of Varadhans Theorem. To formulate it we need first the following definition. Definition 10.3.4 Let E, KN and I as in Definition 10.3.1. We say that the KN satisfy the Laplace principle with rate function I, if we have Z ¡ ¢ 1 e−N f (x) KN (dx) = − inf f (x) + I(x) . (10.91) ln lim x∈E N →∞ N E for all bounded continuous functions f : E → R. Now we have [77, Theorem 1.2.3] Theorem 10.3.5 The Laplace principle implies the large deviation principle with the same rate function. Chapter 11 Purification A central problem of quantum information processing is to ensure that devices which have been designed to perform certain tasks still work well in the presence of decoherence, i.e., under the combined influences of inaccurate specifications, interaction with further degrees of freedom, and thermal noise. If quantum error correction is not an option or if the protection by the quantum code was insufficient we have to try to recover the original information from the decohered systems. As in the classical case this is impossible for operations working on single systems. 
However, if many (say N ) systems are available, all of which were originally prepared in the same unknown pure state σ, and subsequently exposed to the same decohering process (described by a noisy channel R : B(H) → B(H)), we can perform a measurement on the decohered systems to get an estimate ρ ∈ S(H) for the state R∗ (σ). If R is known and invertible we can calculate (R ∗ )−1 (ρ) = σ e and reprepare arbitrarily many systems in the state σ e, which approximates the (pure) input state σ (we have described such a procedure already in Section 4.2). This is exactly the same type of problem we have described in Chapter 8 and in fact closely related to optimal cloning. The present chapter is based on [138]. A different approach to the same problem can be found in [55]. 11.1 Statement of the problem 11.1.1 Figures of Merit As for optimal cloning we are searching for cloning maps T ∈ T (M, N ) which optimize appropriate figures of merit. To define the latter let us assume that the decoherence can be described by a depolarizing channel 1I R∗ σ = λσ + (1 − λ) . d (11.1) Now we can follow the general structure given in Equation (8.4): We are searching for channels which act on N systems in the state ρ = R ∗ (σ), where σ ∈ S(H) is pure. The target functional we want to approximate is (R ∗ )−1 (ρ) = σ. If we follow the two general options already encountered in Chapter 9 (all-particle error and one-particle error) we get £ ¡ ¢¤ 1 − tr T (σ (j) )(R∗ σ)⊗N = 1 − F1R (T ) (11.2) ∆R 1 (T ) = sup σ pure,j where the supremum is taken over all pure states ρ and j = 1, . . . , N and F1R denotes the “one-particle fidelity” £ ¤ F1R (T ) = inf tr T (σ (j) )(R∗ σ)⊗N . (11.3) σ pure,j Here σ (j) = 1I⊗· · ·⊗σ ⊗· · ·⊗1I denotes the tensor products with (M −1) factors “1I” and one factor σ at the j th position (cf. Section 9.1). ∆R 1 measures the worst one particle error of the output state T ∗ ([R∗ σ]⊗N ). If we are interested in correlations too, we have to choose £ ¡ ¢¤ ⊗M R )(R∗ σ)⊗N = 1 − Fall (T ) (11.4) ∆R all (T ) = sup 1 − tr T (σ σ,pure 11. Purification 158 and the corresponding fidelity is £ ¤ R (T ) = inf tr T (σ ⊗M )(R∗ σ)⊗N . Fall σ pure (11.5) 11.1.2 The optimal purifier As in the last chapters symmetry arguments will play a central role in the following. It is in particular easy to see that the ∆R ] are convex, lower semicontinuous and invariant under the group action from Equation (8.12). Hence we can apply Lemma 8.2.1 to see that it is sufficient to search optimal purifiers among the fully symmetric ones. There is, however, a significant difference between purification and cloning. Since the ∆R ] (T ) are defined in terms of fidelities of T with respect to mixed states (R∗ (σ)) rather than pure ones we have to consider all summands in the direct sum decomposition from Proposition 8.2.8. This makes the representation theory much more difficult than in the last two chapters and we can present complete results only in the qubit case. If nothing else is explicitly stated we will assume throughout this chapter that H = C2 holds. As a first consequence of the restriction to qubits we can relabel the representations πm in terms of angular momentum quantum numbers s, i.e. we can express each Young frame m ∈ Y2 (N ) in terms of N and s = (m1 − m2 )/2, i.e. we have m1 = s + N/2 and m2 = N/2 − s. Hence we can write1 Hm = Hs , Km = KN,s and πm = πs . With ( {0, 1, . . . , N2 } N even (11.6) s ∈ I[N ] = { 12 , 32 . . . , N2 } N odd the decomposition of H⊗N (Section 8.2.2) becomes. M M πs (U ) ⊗ 1I. 
Hs ⊗ KN,s , U ⊗N = H⊗N = (11.7) s∈I[N ] s∈I[N ] The second special feature of the qubit case concerns the rather simple structure of the πs . For each s the Hilbert space Hs is naturally isomorphic to the symmetric ⊗2s 2s tensor product H+ and πs is unitarily equivalent to π+ (the restriction of U 7→ ⊗2s ⊗2s U to H+ ). The decomposition of a fully symmetric T from 8.2.8 therefore L becomes T (A) = T (A) ⊗ 1 I, with fully symmetric channels Ts : B(H⊗M ) → s s ⊗2s B(H+ ). Hence the Ts are exactly of the special form we have studied already in Chapter 9 within optimal cloning. Hence let us define M b : B(H⊗M ) → B(H⊗N ), Q(A) b Tb2s→M (A) ⊗ 1I (11.8) Q = s∈I[N ] with and 2s + 1 ∗ (θ) = Tb2s→M SM (θ ⊗ 1I⊗(M −2s) )SM M +1 ∗ Tb2s→M (θ) = tr2s−M θ 2s ≥ M. 2s < M (11.9) (11.10) b on a system in the state ρ The action of Q can be interpreted as follows: First apply an instrument to the system which produces with probability £ ¤ wN (s) = tr Ps ρ⊗N Ps (11.11) ⊗N 2s particles in the joint state ρs = πs (ρ) χs (ρ) (11.12) 1 Note that the Hilbert space H m = Hs depends (in contrast to KN,s ) only on s and not on N . The same is true if we consider πm (U ) = πs (U ) for U ∈ SU(2). 11.2. Calculating fidelities 159 (where Ps = Pm is the projection from H⊗N to Hs ⊗ Ks,N ). If the number 2s of systems we have got in this first step is bigger than required (2s ≥ M ) we throw away any excess particles. If 2s < M holds we have to apply the optimal 2s → M cloner to produce the required number of outputs. Although this cloning process is b of the output state wasteful we will see in Section 11.3 that the fidelities F]R (Q) b are even the best fidelities we can get for any N → M purifier. produced by Q b therefore the optimal purifier. Hence we will call Q 11.2 Calculating fidelities b This is now much more difficult Our next task is to calculate the fidelities F]R of Q. than in the cloning case. We start therefore with some additional simplifications arising from the assumption d = 2. 11.2.1 Decomposition of states Consider a general qubit density matrix ρ, which can be written in its eigenbasis as (β ≥ 0) ³ σ ´ 1 1 3 = β exp 2β ρ(β) = 2 cosh(β) 2 e + e−β 1 = tanh(β)|ψihψ| + (1 − tanh(β)) 1I, 2 µ ¶ 0 e−β ¶ µ 1 ψ= 0 eβ 0 (11.13) The parametrization of ρ in terms of the “pseudo-temperature” β is chosen here, because it is, as we will see soon, very useful for calculations. The relation to the form of ρ = R∗ σ initially given in Equation (11.1) is obviously λ = tanh(β). (11.14) The N –fold tensor product ρ⊗N can be expressed as exp(2βL3 ) 2 cosh(β))N (11.15) ´ 1³ σ3 ⊗ 1I⊗(N −1) + · · · + 1I⊗(N −1) ⊗ σ3 2 (11.16) ρ(β)⊗N = where B(H⊗N ) 3 L3 = denotes the 3–component of angular momentum in the representation U → 7 U ⊗N . Similarly we get ¡ ¢ πs ρ(β) sinh(β) ¢= ¡ ¢ exp(2βL(s) ρs (β) = ¡ (11.17) 3 ) χs ρ(β) sinh (2s + 1)β (s) where L3 denotes again the 3–component of angular momentum but now in the representation πs . For wN (s) introduced in (11.11) we can write ¡ ¢ ¡ ¢ sinh (2s + 1)β ⊗N dim KN,s . (11.18) wN (s) = tr Ps ρ(β) Ps = sinh(β)(2 cosh(β))N Hence the decomposition of ρ(β)⊗N becomes ρ(β)⊗N = M s∈I[N ] wN (s)ρs (β) ⊗ 1I , dim KN,s (11.19) 11. Purification 160 The quantities wN (s) are closely related to the spectral estimator Fb introduced in Equation (10.45). We only have to identify the set Σ with the interval [0, 1/2] according to the map [0, 1/2] 3 λ 7→ (1/2 + λ, 1/2 − λ) ∈ Σ. Then we get X £ ¤ f (s/N )wN (s). (11.20) tr Fb (f )ρ(β)⊗N = s∈I[N ] This observation will be very useful in PSection 11.4. 
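Numerically, these weights are easy to tabulate. The following sketch is added for illustration only (plain Python, names chosen here); for dim K_{N,s} it uses the standard multiplicity C(N, N/2-s) - C(N, N/2-s-1) of the spin-s representation, which is equivalent to the explicit formula (11.23) derived below. It confirms that the w_N(s) of Equation (11.18) form a probability distribution and that the values 2s/N concentrate near λ, the observation exploited in Section 11.4.

from math import atanh, comb, cosh, sinh

def weights(N, lam):
    """The probabilities w_N(s) of Eq. (11.18) for 0 < lam = tanh(beta) < 1.
    Returns a dict {s: w_N(s)} over the half-integers s in I[N]."""
    beta = atanh(lam)                        # pseudo-temperature, Eq. (11.14)
    w = {}
    for k in range(N // 2 + 1):              # k = N/2 - s
        two_s = N - 2 * k
        dim_K = comb(N, k) - (comb(N, k - 1) if k > 0 else 0)   # multiplicity of spin s
        w[two_s / 2] = sinh((two_s + 1) * beta) * dim_K / (sinh(beta) * (2 * cosh(beta)) ** N)
    return w

N, lam = 100, 0.6
w = weights(N, lam)
print(sum(w.values()))                            # equals 1 up to rounding
print(sum((2 * s / N) * p for s, p in w.items())) # mean of 2s/N, close to lam for large N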
For now, note that the wN (s) define a probability measure. Hence s wN (s) = 1 and 0 ≤ wN (s) ≤ 1. Together with the fact that the multiplicities dim KN,s are independent of β we can extract from Equation (11.18) a generating functional for dim KN,s : X ¡ ¢ 2 sinh(β)(2 cosh(β))N = 2 sinh (2s + 1)β dim KN,s (11.21) s∈I[N ] ¡ β = e −e −β ¢¡ β e +e ¢ −β N X ³ = s∈I[N ] obtaining dim KN,s ´ e(2s+1)β − e−(2s+1)β dim KN,s , µ ¶ 2s + 1 N = , N/2 + s + 1 N/2 − s (11.22) (11.23) provided N/2−s is integer, and zero otherwise. The same result can be derived using representation theory of the symmetric group; see [195], where the more general case dim H = d ∈ N is studied. 11.2.2 The one qubit fidelity b To this end note that due to covariance of the Our next task is to calculate F1R (Q). depolarizing channel R the expression under the infima defining F1R (T ) in Equation (11.3) depends for any fully symmetric purifier T not on σ and i. I.e. we get with R∗ σ = ρ(β): h ¡ ¢i (11.24) F1R (T ) = tr σ (1) T ∗ ρ(β)⊗N with σ = |ψihψ|. Further simplification arises, if we introduce the black cow parameter γ(θ) which is defined for each density matrix θ on H ⊗M by γ(θ) = 1 tr(2L3 θ). M (11.25) To derive the relation of γ to F1R note that full symmetry of T implies equivalently to (11.24) M X ¡ ¢ 1 (11.26) F1R (T ) = tr σ (j) T ∗ ρ(β)⊗N . M j=1 Since σ = (1I + σ3 )/2 holds with the Pauli matrix σ3 we get together with the definition of L3 in Equation (11.16) £ ¤i 1h 1 + γ T ∗ (ρ(β)⊗N ) . (11.27) 2 £ ¤ In other words it is sufficient to calculate γ T ∗ (ρ(β)⊗N ) (which is simpler because SU(2) representation theory is more directly applicable) instead of F1R (T ). Another advantage of γ is its close relation to the parameter λ = tanh(β) defining the operation R∗ in Equation (11.1). In fact we have F1R (T ) = γ(ρ(β)⊗N ) = ¢ ¡ ¢ 1 1 ¡ tr 2L3 ρ(β)⊗N = N tr σ3 ρ(β) = tanh(β) = λ. N N (11.28) 11.2. Calculating fidelities 161 ¡ ¢ In other words the one particle restrictions of the output state T ρ(β)⊗N are given by £ ¤ £ ¤ 1I (11.29) γ T (ρ(β)⊗N ) σ + 1 − γ[T (ρ(β)⊗N )] . 2 £ ¤ This implies that γ T (ρ(β)⊗N ) > λ should hold if T is really a purifier. Now we can prove the following proposition: b of the optimal purifier is given Proposition 11.2.1 The one–qubit fidelity F1R (Q) by X b = F1R (Q) wN (s)f1 (M, β, s) (11.30) s∈I[N ] with 2f1 (M, β, s) − 1 = ¡ ¢ 1 2s + 1 2s coth (2s + 1)β − 2s coth β = ´ ¡ ¢ 1 M + 2³ (2s + 1) coth (2s + 1)β − coth β 2s + 2 M for 2s > M (11.31) for 2s ≤ M . Proof. According to Equation (11.8) and (11.27) we have X £ ¤ ∗ b = 1 1 + F1R (Q) wN (s)γ Tb2s→M (ρs (β)) 2 s∈I[N ] X =: wN (s)f1 (M, β, s), (11.32) (11.33) s∈I[N ] where we have introduced the abbreviation £ ∗ ¤i 1h 1 + γ Tb2s→M (ρs (β)) . f1 (M, β, s) = 2 (11.34) To exploit this equation further we need the following Lemma. ⊗2s Lemma 11.2.2 For each fully symmetric channel T : B(H ⊗M ) → B(H+ ) there (s) is a positive constant ω(T ) such that T (L3 ) = ω(T )L3 holds. In addition we have M for 2s ≥ M 2s max ω(T ) = ω(Tb2s→M ) = (11.35) T M + 2 for 2s < M , 2s + 2 where the supremum is taken over all fully symmetric channels T : B(H ⊗M ) → ⊗2s B(H+ ). (s) Proof. Validity of T (L3 ) = ω(T )L3 follows from Lemma 9.4.1. If 2s < M Equation (11.35) is a consequence of Theorem 9.2.3. For 2s ≥ M note first that the one-qubit b error of Tb2s→M vanishes, i.e. ∆C 1 (T2s→M ) = 0; cf. Equation (9.4). On the other hand we know from Proposition 9.4.2 that ∆C (Tb2s→M ) is related to ω(Tb2s→M ) by 1 b ∆C 1 (T2s→M ) = 1 2 µ 1− ¶ 2s b ω(T2s→M ) . 
M (11.36) Hence ω(Tb2s→M ) = M/2s as stated and this is due to ∆C 1 (T ) ≥ 0 for all T the biggest possible value. 2 11. Purification 162 Now we have and £ ∗ ¤ ¤ 1 £ b 2f1 (M, β, s) − 1 = γ Tb2s→M (ρs (β)) = tr 2T2s→M (L3 )ρs (β) M ω(Tb2s→M )2s ω(Tb2s→M ) (s) tr[2L3 ρs (β)] = γ[ρs (β)]. = M M ¡ (s) (s) ¢ ´ 1 ³ (s) 1 tr 2L3 exp(2βL3 ) γ ρs (β) = tr 2L3 ρs (β) = ¡ ¢ 2s 2s tr exp(2βL(s) ) 3 ¡ 1 d (s) ¢ ln tr exp(2βL3 ) = 2s dβ ¡ ¢ ¢ 1 d ¡ ln sinh (2s + 1)β − ln sinh β = 2s dβ ¡ ¢ 1 2s + 1 coth (2s + 1)β − coth β = 2s 2s ¡ ¢ (11.37) (11.38) (11.39) (11.40) (11.41) (11.42) Inserting the values of ω(Tb2s→M ) from Equation (11.42) we get Equation (11.31). 2 11.2.3 The all qubit fidelity R Similarly to Equation (11.46) the infima defining Fall (T ) in Equation (11.5) does not depend on σ, provided T is a fully symmetric purifier. Hence we have £ ¡ ¢¤ R Fall (T ) = tr σ ⊗M T ∗ ρ(β)⊗N (11.43) with σ = |ψihψ|. Using this relation we can prove the following proposition: R b Proposition 11.2.3 The all–qubit fidelity Fall (Q) of the optimal purifier is given by X R b wN (s)fall (M, β, s) (11.44) Fall (Q) = s∈I[N ] where fall (M, β, s) is given by 2s + 1 1 − e−2β M + 1 1 − e−(4s+2)β µ ¶−1 X µ ¶ fall (M, β, s) = 2s 1 − e−2β K 2β(K−s) e 1 − e−(4s+2)β M M M ≤ 2s (11.45) M > 2s. K b given in Equation (11.8) we get for the optimal Proof. Using the decomposition of Q purifier something similar as in the last subsection: h X ¡ ¢i R b ∗ Fall (Q) = (11.46) wN (s) tr σ ⊗M Tb2s→M ρs (β) . s∈I[N ] However the calculation of h ¡ ¢i ∗ fall (M, β, s) := tr σ ⊗M Tb2s→M ρs (β) (11.47) is now more difficult, since the knowledge of Tb2s→M (L3 ) = ω(Tb2s→M )Ls3 is not sufficient in this case. Hence we have to use the explicit form of Tb2s→M in Equation 11.3. Solution of the optimization problems 163 (11.9) and (11.10). For 2s < M this leads to 2s + 1 M +1 2s + 1 = M +1 2s + 1 = M +1 2s + 1 = M +1 fall (M, β, s) = hψ ⊗M , SM (ρs (β) ⊗ 1I⊗(M −2s) )SM ψ ⊗M i (11.48) hψ ⊗M , (ρs (β) ⊗ 1I⊗(M −2s) )ψ ⊗M i (11.49) hψ ⊗2s , ρs (β)ψ ⊗2s i (11.50) 1 − e−2β 1 − e−(4s+2)β (11.51) For M ≤ 2s we have to calculate i h h ¡ ¢i ∗ ρs (β) = tr Tb2s→M (σ ⊗M )ρs (β) fall (s, M, β) = tr σ ⊗M Tb2s→M ´i h ³ = tr ρs (β) SM [(|ψ ⊗M ihψ ⊗M |) ⊗ 1I⊗(2s−M ) ]SM (11.52) (11.53) We will compute the operator Tb2s→M (σ ⊗M ) in occupation number representation. By definition, the basis vector “|ni” of the occupation number basis is the normalized version of¡S¢M Ψ, where Ψ is a tensor product of n factors ψ and (M − n) factors φ, where φ = 01 denotes obviously the second basis vector. The normalization factor is easily computed to be SM (ψ ⊗n ⊗ φ⊗(M −n) ) = µ M n ¶−1/2 |ni. (11.54) We can now expand the “1I” in Equation (11.53) in product basis, and apply (11.54), to find X µ2s − M ¶µ2s¶−1 ⊗(2s−M ) ⊗M ⊗M SM [(|ψ ihψ |) ⊗ 1I ]SM = |Ki hK|. (11.55) K −M K K Now L3 is diagonal in this basis, with eigenvalues mK = (K − s), K = 0, . . . , (2s). With ρs (β) from (11.13) we get µ ¶µ ¶−1 1 − e−2β X 2s − M 2s fall (M, β, s) = e2β(K−s) K 1 − e−(4s+2)β K K − M for M ≤ 2s. (11.56) Together with µ 2s − M K −M µ ¶−1 µ ¶ ¶µ ¶−1 K!(2s − K)! 2s K 2s (2s − M )! = (11.57) = (K − M )!(2s − K)! (2s)! M M K we get fall (M, β, s) = µ ¶−1 X µ ¶ 2s 1 − e−2β K 2β(K−s) e . −(4s+2)β M M 1−e K Now the statement follows from Equations (11.46), (11.51) and (11.58). 11.3 Solution of the optimization problems Now we are going to prove the following theorem: (11.58) 2 11. 
Purification 164 b maximizes the fidelities F R (T ) and F R (T ) (reTheorem 11.3.1 The purifier Q 1 all spectively minimizes the corresponding errors). Hence the optimal fidelities (] = 1, all) F]max (N, M ) = sup F] (T ), (11.59) T ∈T (N,M ) are given by Equation (11.30) and (11.44). Proof. The figures of merit ∆R ] satisfy (as in the cloning case) the assumption from Lemma 8.2.1. Hence, there is a fully symmetric purifier T which minimizes R max R ∆R (N, M ) = F# (T ). Applying ] , respectively maximizes F] , i.e. we have F ] L Proposition 8.2.8 we get a decomposition T (A) = s Ts (A)⊗1I with fully symmetric ⊗2s channels T : B(H⊗N ) → B(H+ ). With Equations (11.24) and (11.43) we get therefore X ¤ £ 1 (11.60) wN (s)γ Ts∗ (ρs (β)) F1R (T ) = 1 + 2 s∈I[N ] and R Fall (T ) = X s∈I[N ] ¡ ¢¤ £ wN (s) tr σ ⊗M Ts∗ ρs (β) . (11.61) The last two Equations show that we have to optimize each component T s of the purifier T independently. In the one qubit case this is very easy, because we can use ¤ ¡ (s) ¢ £ (s) Lemma 11.2.2 to get£Ts (L3 ) =¢ω(Ts )L3 and γ Ts∗ (ρs (β)) = ω(Ts ) tr L3 ρs (β) . Hence maximizing γ Ts∗ (ρs (β) ] is equivalent to maximizing ω(Ts ). But we have according to Lemma 11.2.2 M for 2s ≥ M 2s b max ω(Ts ) = ω(T2s→M ) = (11.62) T M +2 for 2s < M , 2(s + 1) b holds as stated. which shows that F1max (N, M ) = F1R (Q) For the many qubit–test version the proof is slightly more difficult. However as in the F1R -case we can solve the optimization problem for each summand in Equation (11.61) separately. First of all this means that we can assume without loss ⊗M of generality that Ts∗ takes its values in B(H+ ) because the functional ¡ ⊗M ∗ ¡ ¢¢ (11.63) fs (Ts ) := tr σ Ts ρs (β) which we have to maximize, depends only on this part of the operation. Full symmetry implies in addition that Ts∗ (ρs (β)) is diagonal in occupation number basis (see Equation (11.54)), because Ts∗ (ρs (β)) commutes with each πs0 (U ) (s0 = M/2, U ∈ U(2)) if πs (U ) commutes with ρs (β). If M > 2s this means we have Ts∗ (ρs (β)) = κ∗ σ ⊗M + r∗ where r∗ is a positive operator with σ ⊗M r∗ = r∗ σ ⊗M = 0. Inserting this into (11.63) we see that fs (Ts ) = κ∗ . Hence we have to ¡maximize κ¢∗ . The first step is an upper bound which we get from the fact that tr σ ⊗2s ρs (β) 1I − ρs (β) is a positive operator. Since Ts∗ (1I) = (2s + 1)/(M + 1)1I (another consequence of full symmetry) we have ³ ¡ ´ ¢ ¢ 2s + 1 ¡ ⊗2s 0 ≤ T tr σ ⊗2s ρs (β) 1I − ρs (β) = tr σ ρs (β) 1I − κσ ⊗M − r∗ . (11.64) M +1 Multiplying this Equation with σ ⊗M and taking the trace we get κ∗ ≤ ¢ 2s + 1 ¡ ⊗2s tr σ ρs (β) . M +1 (11.65) 11.4. Asymptotic behavior 165 However calculating fs (Tb2s→M ) we see that this upper bound is achieved, in other words Tb2s→M maximizes fs . If M ≤ 2s holds we have to use slightly different arguments because the estimate (11.65) is to weak in this case. However we can consider in Equation (11.63) the dual Ts instead of Ts∗ and use then similar arguments. In fact for each covariant Ts the quantity Ts (σ ⊗M ) is, due to the same reasons as Ts∗ (ρs (β)) diagonal in the occupation number basis and we get Ts (σ ⊗M ) = κσ ⊗2s +r where r is again a positive P2s−1 operator with r = n=0 rn |nihn| (|ni denotes again the occupation number basis) and κ is a positive constant. Since Ts is unital we get from 1I−σ ⊗M ≥ 0 the estimate 0 ≤ κ ≤ 1 in the same way as Equation (11.65). 
Calculating Tb2s→M (σ ⊗M ) shows again that the upper bound κ = 1 is indeed achieved, however it is now not clear whether maximizing κ is equivalent to maximizing fs (Ts ). Hence let us show first that κ = 1 is necessary for fs (Ts ) to be maximal. This follows basically from the fact that Ts is, up to a multiplicative constant, trace preserving. In fact we have ¢ ¡ ¢ ¡ ¢ ¡ 2s + 1 . tr Ts (σ ⊗M ) = tr Ts (σ ⊗M )1I = tr σ ⊗M Ts∗ (1I) = M +1 (11.66) This means especially that κ + tr(r) = (2s + 1)/(M + 1) holds, i.e. decreasing κ by 0 < ² < 1 is equivalent to increasing tr(r) by¡ the same ¢². Taking into account that P2s ρs (β) = n=0 hn |nihn| holds with hn = exp 2β(n − s) , we see that reducing κ by ² reduces fs (Ts ) at least by ³ ¡ ¡ ¢ ¢ ¡ ¢´ ² tr σ ⊗2s ρs (β) − tr |2s − 1ih2s − 1|ρs (β) = ² e2βs − e(2s−1)β > 0. (11.67) Therefore κ = 1 is necessary. The last question we have to answer, is how the rest term r has to be chosen, for fs (Ts ) to be maximal. To this end let us consider the cloning fidelity of Ts , C i.e. Fall (Ts ). It is in contrast to fs (Ts ) maximized iff κ = 1. However the operation C which maximizes Fall (Ts ) is according to Proposition 9.3.4 unique. This implies that κ = 1 fixes Ts completely. Together with the facts that κ = 1 is necessary for fs (Ts ) to be maximal and κ = 1 is realized for Tb2s→M we conclude that max fs (Ts ) = fs (Tb2s→M ) holds, which proves the assertion. 2 11.4 Asymptotic behavior Now we want to analyze the rate with which nearly perfect purified qubits can be produced in the limit N → ∞. This is more difficult as in the cloning case (Section 9.5), because we have to compute the asymptotic behavior of various expectations involving s. Fortunately we can trace this problem back to the analysis of spectral estimation from Subsection 10.2.2. According to Equation (11.20) and Theorem 10.2.4 the quantities wN (s) define a sequence of probability measures on Σ = [0, 1/2] which converge weakly to a point measure. More precisely we have the following lemma. Lemma 11.4.1 Let fN : (0, 1) → R, N ∈ N be a uniformly bounded sequence of continuous functions, converging uniformly on a neighborhood of λ = tr(ρ(β)σ 3 ) to a continuous function f∞ , and let wN (s) denote the weights in Equation (11.18). Then X lim wN (s)fN (2s/N ) = f∞ (λ). (11.68) N →∞ s∈I[N ] 11.4.1 The one particle test Let us analyze first the behavior of the optimal one–qubit fidelity F1max (N, M ) (cf. Equation (11.59)) in the limit M → ∞. Obviously only the M > 2s case of 11. Purification 166 f1 (M, β, s) is relevant in this situation and we get, together with Equation (11.30), the expression F1max (N, ∞) = · ´¸ X ¡ ¢ 1 1 ³ wN (s) 1 + (2s + 1) coth (2s + 1)β − coth β , 2 2s + 2 (11.69) s∈I[N ] which obviously takes its values between 0 and 1. To take the limit N → ∞ we can write X 2s (11.70) lim F1max (N, ∞) = lim wN (s)fN,∞ ( ) N →∞ N →∞ N s∈I[N ] with · ´¸ ¡ ¢ 1 1 ³ fN,∞ (x) = 1+ (N x + 1) coth (N x + 1)β − coth β . 2 Nx + 2 (11.71) The functions fN,∞ are continuous, bounded and converge on each interval (², 1) with 0 < ² < 1 uniformly to f∞,∞ ≡ 1. Hence the assumptions of Lemma 11.4.1 are fulfilled and we get (11.72) lim F1max (N, ∞) = f∞,∞ (λ) = 1 N →∞ Hence we can produce arbitrarily good purified qubits at infinite rate if we have enough input systems. In other words we have proved the following proposition: Proposition 11.4.2 For each asymptotic rate r > 0 the optimal one-qubit fidelity from Equation (11.59) satisfies lim F1max (N, brN c) = 1. 
N →∞ (11.73) Let us consider now F1max (N, M ) for M < ∞. Since F1max (N, M ) > F1max (N, ∞) we have obviously limN →∞ F1max (N, M ) = 1 for all M . Hence there is no difference between finite and infinite output systems, as long as we are looking only at the limit limN →∞ F1max (N, M ). Our next task is therefore to analyze how fast the quantities F1max (N, M ) approaches 1 as N → ∞. To this end we compare three different quantities F1max (N, ∞), F1max (N, 1) and f1 (1, β, N/2). The latter is the maximal fidelity we can expect for N input systems. It corresponds to a device which produces an output only with probability wN (N/2) and declares failure otherwise (from Lemma 11.4.1 we see that this probability goes to 0 as N → ∞). In slight abuse of notation we write F1max (N, 0) = f1 (1, β, N/2) expressing that this is the case with no demands on output numbers at all. The results are given in the following proposition and plotted in Figure 11.1. Proposition 11.4.3 The leading asymptotic behavior (as N → ∞) of F1max (N, M ) for the cases M = 0, 1, ∞ is of the form µ ¶ cM 1 max +o Fone (N, M ) = 1 − (11.74) 2N N ¡ ¢ where, as usual, o N1 stands for terms going to zero faster than N1 , and with c0 = (1 − λ)/λ c1 = (1 − λ)/λ c∞ = (λ + 1)/λ (11.75) 2 (11.76) 2 (11.77) 11.4. Asymptotic behavior 167 1 F1 (M, N ) 0.95 0.9 0.85 0.8 0.75 0.7 0.65 M =0 M =1 M =∞ 0.6 0.55 0 20 40 60 80 100 120 140 160 180 200 N Figure 11.1: Optimal one-qubit fidelity for M = 0, M = 1 and M = ∞ output systems. Proof. Consider the limit lim N (1 − F1max (N, ∞)) = N →∞ X s∈I[N ] c∞ 2s wN (s)feN,∞ ( ) ≡ N 2 (11.78) with feN,∞ = N (1 − fN,∞ ). The existence of this limit is equivalent to the asymptotic formula (11.74). Lemma 11.4.1 leads to c∞ /2 = fe∞,∞ (λ) with fe∞,∞ = limN →∞ feN,∞ uniformly on (², 1). To calculate fe∞,∞ note that feN,∞ (x) = N N coth β + + Rest Nx + 2 Nx + 2 (11.79) holds, where “Rest” is a term which vanishes exponentially fast as N → ∞. Hence with coth β = 1/λ we get 1+λ . c∞ = 2fe∞,∞ (λ) = λ2 (11.80) The asymptotic behavior of F1max (N, 1) can be analyzed in the same way. The only difference is that we have to consider now the 1 = M ≤ 2s branch of Equation (11.31). In analogy to Equation (11.78) we have to look at lim N (1 − F1max (N, 1)) = N →∞ X s∈I[N ] 2s c1 wN (s)feN,1 ( ) = N 2 with feN,1 = N (1 − fN,1 ) and · ¸ ¡ ¢ ¤ 1 £ 1 1− (N x + 1) coth (N x + 1)β − coth β . fN,1 (x) = 2 Nx For fe∞,1 we get 1 −1 1 fe∞,1 (x) = ( + ). 2 x xλ (11.81) (11.82) (11.83) 11. Purification 168 Using again Lemma 11.4.1 leads to 1−λ . (11.84) λ2 max Finally let us consider Fone (N, 0). Here the situation is easier than in the other cases because we only have to look at one summand (f1 (1, β, N/2)) of F1max (N, 1), i.e. ¸ · ¡ ¢ ¤ 1 1 £ max Fone (N, 0) = 1− (N + 1) coth (N + 1)β − coth β = fN,1 (1). (11.85) 2 N c1 = 2fe∞,1 (λ) = Hence we only need the asymptotic behavior of fN,1 (x) at x = 1. Using Equation (11.83) we get 1−λ 1 max + ··· . (11.86) Fone (N, 0) = 1 − λ 2N This concludes the proof of Equations (11.74) to (11.77). 2 11.4.2 The many particle test We have seen already within optimal cloning (Section 9.5) that the all particle C fidelity Fall behaves in the limit N, M → ∞ quite differently as F1C . This statement generalizes to purification. It is in particular impossible to produce outputs with R Fall (N, M ) → 1 at a non-vanishing rate (as long as λ < 1). More precisely the following proposition holds (cf. 
also Figure 11.2) Proposition 11.4.4 For each rate µ ∈ R+ we have 2λ2 2 max (N, bµN c) = 2λ +2 µ(1 − λ) Φ(µ) = lim Fall 2λ N →∞ µ(1 + λ) if µ ≤ λ (11.87) if µ ≥ λ. Proof. The central part of the proof is the following lemma, which allows us to ¡ 2s ¢−1 P ¡ K ¢ 2β(K−s) term in Equation (11.45). handle the M K M e Lemma 11.4.5 For integers M ≤ K and z ∈ C, define ¶ µ ¶−1 X K µ R K−R K z . Φ(K, M, z) = M M (11.88) R=M Then, for |z| < 1, and c ≥ 1: lim M,K→∞ M/K→c Φ(K, M, z) = 1 . 1 − (1 − c)z (11.89) Proof. We substitute R 7→ (K − R) in the sum, and get Φ(K, M, z) = ∞ X c(K, M, R)z R , (11.90) R=0 where coefficients with M +R > K are defined to be zero. We can write the non-zero coefficients as ¶ µ ¶−1 µ (K − M )!(K − R)! K −R K = (11.91) c(K, M, R) = M K!(K − R − M )! M (K − M − R + 1) (K − M ) (K − M − 1) ··· (11.92) = K (K − 1) (K − R + 1) R−1 Y³ M ´ = 1− . (11.93) K −S S=0 11.4. Asymptotic behavior 169 Since 0 ≤ c(K, M, R) ≤ 1, for all K, M, R, the series for different values of M, K are all dominated by the geometric series, and we can go to the limit termwise, for every R separately. In this limit we have M/(K − S) → c for every S, and hence c(K, M, R) → (1 − c)R . The limit series is again geometric, with quotient (1 − c)z and we get the result. 2 To calculate now Φ(µ) recall that the weights wN (s) approach a point measure in 2s/N =: x concentrated at λ = tr(ρ(β)σ3 ). This means that in Equation (11.44) only the term with 2s = λN survives the limit. Hence if µ ≥ λ we get M ≥ λN = 2s. Using Equation (11.45) and Lemma 11.4.1 we get in this case Φ(µ) = λ (1 − e−2β ). µ (11.94) We see that Φ(µ) → 0 for µ → ∞ and Φ(µ) → 1 − exp(−2β) for µ → λ. If 0 < µ < λ we get M < λN = 2s, which means we have to choose Equation (11.45) for fall (M, β, s). With Lemma 11.4.5 and Lemma 11.4.1 we get Φ(µ) = 1 − e−2β 1 − (1 − µ/λ)e−2β (11.95) which approaches 1 if µ → 0 and 1 − exp(−2β) if µ → λ. Writing this in terms of λ = tanh β, we obtain Equation (11.87). 2 1 theta=0.25 theta=0.50 theta=0.75 theta=1.00 Φ(µ) 0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0 0 0.5 1 1.5 2 µ Figure 11.2: Asymptotic all-qubit fidelity Φ(µ) plotted as function of the rate µ. Chapter 12 Quantum game theory Game theory is a misnomer for “multiperson decision theory” and it studies the decision making process of competing agents in some conflict situations. This description leads immediately to the question what can be “quantum” in game theory. The answer is that quantum game theory studies games which are played with quantum systems and typical applications lie within quantum cryptography, e.g. if Alice and Bob use a quantum key distribution system to exchange a secure cryptographic key they cooperate against a third player – the eavesdropper – who uses quantum measurements to gain as much information about the conversation as possible. Although it is a very new field of research there are already many publication available, mainly focusing on the discussion of examples (a recent list of references can be found in [90]). In the present chapter we will discuss the basic aspects of quantum game theory. To this end we consider after an introductory section (Section 12.1) two particular examples: A quantum version of the “Monty Hall problem” (Section 12.2) and a cryptographic protocol called “quantum coin tossing” (Section 12.3). Most parts of this Chapter are based on publications in [65, 75]. 
12.1 Overview Game theory, originally developed in the 1930’s in the context of economic theory by von Neumann and Morgenstern [224], has become in the meantime a well established theory which is used in many areas such as social sciences, biology and engineering. In this section we will give a brief (informal) survey about the most basic concepts and some (slight) modifications which are necessary to cover quantum games. A nice introduction to (classical) game theory can be found in [165]. 12.1.1 Classical games Our starting point is the definition of a two person 1 normal form game. It consists of two sets Σj (where j = A, B stands – as usual – for Alice and Bob) and two function uj : XA × XB → R. The elements of Xj represent the strategies which are available to player j and uj (sA , sB ) is the payoff player j gains if Alice and Bob play strategy sA and sB respectively. The uj are therefore called payoff or utility functions. A special case arises if uA = −uB , i.e. the win of one player is the loss of the other. This is a real conflict situation and called zero sum game. Alternatively a game can be represented in its extensive form which is more related to the usual picture of a game like chess where both players manipulate turn by turn the information which is represented by the pieces on the game board. Mathematically the extensive form of a game is given by a tree (i.e. a graph where each pair of nodes is connected by exactly one path), an allocation of each node of the tree (except the end notes) to a player and payoffs for each player at the end nodes. If node n belongs to Alice all edges starting in n represent the possible moves Alice has in the situation represented by n. A strategy of a player j is now a complete description of the action she will take at each possible “position” in the game, i.e. a map which associates to each node belonging to j one edge 2 . If a 1 Most of the material presented here can be generalized in a straightforward way to multiperson games – the two-person case is, however, sufficient for our purposes. 2 To be more precise we have to decompose the set of all nodes into equivalence classes (“information sets”) such that all nodes in a class belong to the same player and the same moves are available at each node. Each equivalence class represent the same position although each node in a class is given by a different combination of moves. A strategy is then a map from equivalence 12.1. Overview 171 strategy is given for each player we get a path in the tree which connects the top node with one of the end nodes where the payoffs are given. Hence we can construct the normal form of a game from its extensive form but the converse is not true. The normal form of a game should therefore be regarded as a summary representation – which contains, however, for many purposes enough information. The aim of each player is of course to maximize her (or his) payoff uj (sA , sB ) by judicious choice of the own strategy. In general, however, this has to be done without knowledge about the choice of the opponent. The most important concept to solve this problem is the Nash equilibrium. A pair of strategies (b sA , sbB ) ∈ XA × XB is called pure Nash equilibrium if uA (sA , sbB ) ≤ uA (b sA , sbB ) and uB (b sA , sB ) ≤ uB (b sA , sbB ) (12.1) holds for all alternative strategies sA ∈ XA respectively sB ∈ XB . 
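Condition (12.1) can be checked mechanically for any finite game given by its payoff tables. The following sketch is an added illustration (plain Python; the function and game names are chosen here, not taken from the text): it enumerates all pure Nash equilibria of a two person game. Matching pennies, a zero sum game, is a standard example possessing none, which motivates the mixed strategies discussed next.

import itertools

def pure_nash_equilibria(u_A, u_B):
    """All pure strategy pairs satisfying condition (12.1).
    u_A, u_B: dicts mapping (s_A, s_B) to the payoffs of Alice and Bob."""
    S_A = sorted({s for s, _ in u_A})
    S_B = sorted({t for _, t in u_A})
    equilibria = []
    for sA, sB in itertools.product(S_A, S_B):
        best_A = all(u_A[(s, sB)] <= u_A[(sA, sB)] for s in S_A)   # Alice cannot improve
        best_B = all(u_B[(sA, t)] <= u_B[(sA, sB)] for t in S_B)   # Bob cannot improve
        if best_A and best_B:
            equilibria.append((sA, sB))
    return equilibria

# Matching pennies: u_A = -u_B, no pure equilibrium (only a mixed one exists).
u_A = {("H", "H"): 1, ("H", "T"): -1, ("T", "H"): -1, ("T", "T"): 1}
u_B = {k: -v for k, v in u_A.items()}
print(pure_nash_equilibria(u_A, u_B))    # prints []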
If the game has a unique Nash equilibrium it is natural for both players to choose the corresponding strategy, because it provides the best payoff under the assumption that the other player behaves “rational”, i.e. always tries to maximize his own payoff. This can be seen quite easily for a zero sum game: If Alice plays sbA each deviation of Bob from sbB would decreases his gain. Hence it is natural for Alice to assume that Bob chooses sbB and her best response to sbB is sbA . There are games which do not admit pure state Nash equilibria. This problem can be solved if mixed strategies are taken into account. In other words the same game is played many times and player j uses the pure strategy s ∈ Xj with probability pj (s). Hence mixed strategies are probability distributions on the sets X j and the sets of all strategies are therefore given by S(Xj ), i.e. the state spaces of the classical algebras C(Xj ). The utility functions are obviously elements of C(XA ×XB ) and we can calculate for each pair pj ∈ S(Xj ) the expected payoff X pA (sA )pB (sB )uj (sA , sB ). (12.2) ūj (pA , pB ) = pA ⊗ pB (uj ) = sA ,sB Now we can define a (mixed) Nash equilibrium to a be pair pbA ∈ S(XA ), pbB ∈ S(XB ) such that ūA (pA , pbB ) ≤ ūA (b pA , pbB ) and ūB (b pA , pB ) ≤ ūB (b pA , pbB ) (12.3) holds for all pA ∈ S(XA ), pB ∈ S(XB ). Due to a well known theorem of Nash [167] each finite normal form game has a mixed Nash equilibrium, which is however in general not unique. 12.1.2 Quantum games Let us turn now to quantum games. Roughly speaking a quantum game is nothing else but a usual game (in the sense described above) which is played with quantum systems, i.e. the strategies can be represented by quantum operations. There are several proposals which try to make this informal idea more precise. Most of them are based on the normal form description of a game [84, 156, 149]. It is however quite difficult to provide a definition which describes all relevant physical ideas without excluding interesting examples (e.g. the version of the Monty Hall game described in Section 12.2 is not covered by most of the proposed definitions). We will follow therefore a different, turn based, approach which can be loosely regarded as a generalization of the extensive form. Maybe it is still not general enough, but it covers many relevant examples; in particular those which arise from quantum cryptography (cf. Section 12.3). Hence a quantum game is in the following an interactive process between two players (Alice and Bob) which obeys the following structure: classes to moves rather than nodes to moves. 12. Quantum game theory 172 1. The starting point is a hybrid system described by an observable algebra A = B(H) ⊗ C(X), where the quantum part is related to the “game system” which is manipulated during game-play while X is a classical notepad which is used to store and exchange classical information of private or public nature. 2. Initially the system is prepared in a state ρ which can be either arbitrary (e.g. if it is overridden in the next step) or which is part of the game description. 3. The first player operates on A with a channel T1 : A → A which has to be taken from a distinguished set Σ1 . Since A is a hybrid system, T1 is a parameter dependent instrument (cf. Section 3.2.5). Hence it does not only describe an operation on the quantum system B(H). Instead, it can depend on classical information initially stored in X. After this step the system is in the state T1∗ ρ. 4. 
Now Alice and Bob continue to act alternating on A with channels Tj ∈ Σj until a fixed number N of rounds is reached3 . The sets Σj describe the operations which are allowed at each step. 5. Finally an observable E is measured to determine the payoffs. The rules of the game are obviously implemented by the sets Σj , the initial preparation and the final measurement E, while Alice’s respectively Bob’s strategies are the elements of the sets ΣA = Σ1 × Σ3 × · · · × ΣN −1 and ΣB = Σ2 × Σ4 × · · · × ΣN ; where we have assumed without loss of generality that N is even. Alternatively we can allow one or both of the players to choose the initial preparation and the final measurement as a part of their strategical options. This is however not really a generalization, because we can interpret the state T1∗ ρ of the A-system after the first round as an initial preparation provided by Alice. In the same way we can look at TN E as an observable measured on A by Bob in the N th round. Note that this reinterpretation of the first and last round in the game will be used in the next section. Assume now that Alice chooses the strategy sA = (T1 , T3 , . . . , TN −1 ) ∈ ΣA and Bob sB = (T2 , . . . , TN ) ∈ ΣB . The probability for player j to get a payoff in ωj ⊂ R is then given by ¤ £ (12.4) υ(ωA , ωB ) = tr (TN∗ TN∗ −1 · · · T1∗ ρ)E(ωA × ωB ) and the expected payoffs are Z £ ¤ ῡj (sA , sB ) = xj tr (TN∗ TN∗ −1 · · · T1∗ ρ)E(dxA × dxB ) . (12.5) R2 (1) If the game is repeated many times and Alice uses strategy sA with probability λ (2) and sA with 1 − λ Equation (12.4) becomes (1) (2) (1) (2) λυ(sA , sB ) + (1 − λ)υ(sA , sB ) = υ(λsA + (1 − λ)sA , sB ), (12.6) where (1) (2) (1) λsA + (1 − λ)sA = (λT1 (2) (1) (2) + (1 − λ)T1 , . . . , λTN −1 − (1 − λ)TN −1 ). (12.7) Hence it is natural to assume that ΣA and ΣB are convex sets and its extremal elements are the pure strategies. 3 Alternatively we can use a special condition on the classical system (“checkmate”) which signals the end of the game. This is however more difficult to handle and we do not need this generalization. 12.2. The quantum Monty Hall problem 173 We have reached therefore a normal form description of a quantum game which is very similar to the classical case. The only difference is that the sets Σ j of mixed strategies are not of the form S(Xj ). In many cases, however, statements from game theory only rely on the convex structure and not on the fact that the S(X j ) are simplices. This concerns in particular Nash equilibria: if we use the definition of expected payoffs from (12.5) the definition of a mixed Nash equilibrium from Equation (12.3) can be applied immediately to the quantum case. Nash’s existence proof is based on Kakutani’s fixed point theorem [131], which holds for any compact and convex set. Hence each quantum game of the described form admits a (mixed) Nash equilibrium. Let us turn now to “quantizations” of classical games. Quantizing a game means to enlarge the strategical options of one or both players and to allow them to perform some quantum operations. To be more precise consider a (classical) game G described by the strategy sets XA , XB and utility functions uA , uB . A quantization then consists of a quantum game G0 as just described and maps Ij which associates to each pure strategy sj ∈ Xj a pure strategy Ij (sj ) ∈ Σj such that the pair (IA (sA ), IB (sB )) always leads to the same outcome as (sA , sB ). I.e. 
if two players plays the game G0 but only uses strategies of the form Ij (sj ) (or convex linear combinations of them) it is the same as if they are playing G. An interesting situation occurs, if only one player knows about the additional options, because he can fool the other who bases his decisions on knowledge about G rather G0 . This was first observed by Meyer in [164]. We will study this and other typical behavior of quantum games with two examples which are based on publications in [65] and [75]. The next section treats a quantum version of the Monty Hall problem. This is a model case which is at the one hand quite simple but has on the other hand enough structure to provide some interesting results. The other example – quantum coin tossing – is taken from quantum cryptography and shows therefore how methods from game theory can be applied to more practical problems of quantum information theory. 12.2 The quantum Monty Hall problem The well-known classical Monty Hall problem, also known under various other names [30], is set in the context of a television game show. It can be seen as a two person game, in which a player P tries to win a prize, but a show master (or Quiz master) Q tries to make it difficult for her4 . We will discuss in this section a quantization of this game (mainly based on [65]) which illustrate many interesting features of quantum game theory. Of course quantizations of a game are rarely unique, and depend critically on what is seen as a “key element”, and also on how actions which might change the system are formalized, corresponding to how, in the classical version, information is gained by “looking at something”. The Monty Hall problem is no exception, and there are already quantizations [152, 91]. The version we present in this paper was drafted independently, and indeed we come to a quite different conclusion. We discuss the relation between these two approaches and ours in more detail in Sec. 12.2.6 below. 12.2.1 The classical game The classical Monty Hall problem is set in the context of a television game show. In the last round of the show, the candidates were given a chance to collect their prize (or lose it) in the following game: 1. Before the show the prize is hidden behind one of three closed doors. The show master knows where the prize is but, of course, the candidate does not. 4 In this text, the show master is male, like Monty Hall, the host of the show, where the game first appeared. The player is female, like Marilyn vos Savant, who was the first to fight the public debate for the recognition of the correct solution, and had to take some sexist abuse for that. 12. Quantum game theory 174 2. The candidate is asked to choose one of the three doors, which is, however, not opened at this stage. 3. The show master opens another door, and shows that there is no prize behind it. (He can do this, because he knows where the prize is). 4. The candidate can now open one of the remaining doors to either collect her prize or lose. Of course, the question is: should the candidate stick to her original choice or “change her mind” and pick the other remaining door? As a quick test usually shows, most people will stick to their first choice. After all, before the show master opened a door the two doors were equivalent, and they were not touched (nor was the prize moved). So they should still be equivalent. This argument seems so obvious that trained mathematicians and physicists fall for it almost as easily as anybody else. 
However, the correct solution, by which the candidates can in fact double their chance of winning, is to always choose the other door. The quickest way to convince people of this is to compare the game with another one, in which the show master offers the choice of either staying with your choice or opening both other doors. Anybody would prefer that, especially if the show master courteously offers to open one of the doors for you. But this is precisely what happens in the original game when you always change to the other door.

To catch up with the general discussion from the last section, let us discuss the normal form of this game. The pure strategies of Q are described by the numbers of the doors where the prize is hidden⁵: XQ = {1, 2, 3}. The player P can likewise choose one of the three doors in round 2 and has to decide whether she switches (1) or not (0). Hence XP = {1, 2, 3} × {0, 1}. The game is a zero sum game, i.e. uQ = −uP, and uP has only two possible outcomes: +1 if P wins and −1 if she loses. If j ∈ XQ and (k, l) ∈ XP we can write uP simply as uP(j; k, l) = (−1)^l (2δkj − 1). If the game is repeated very often there are unique optimal strategies for both players. Assume to this end that P has watched every episode of the show and has calculated the probabilities pj with which the prize is hidden behind door j. Then her best option is to choose in round 2 the door with the lowest pj, and her chance to win becomes 1 − min_j pj if she switches at the end to the second unopened door. This is even greater than 2/3 if Q does not use all three doors with equal probability. Hence the best option for Q is to choose the uniform distribution. The pair of strategies "uniform distribution" and "switch to the second door" is therefore a Nash equilibrium.

⁵ If P selects the correct door in step 2, Q has to choose in step 3 between two doors. However, to take this choice as an additional strategical option of Q into account makes things more difficult without leading to new insights.

12.2.2 The quantum game

We will "quantize" only the key parts of the problem. That is, the prize and the players, as well as their publicly announced choices, will remain classical. The quantum version can even be played in a game show on classical TV. The main quantum variable will be the position of the prize. It lies in a 3-dimensional complex Hilbert space H, called the game space. We assume that an orthonormal basis is fixed for this space so that vectors can be identified by their components, but apart from this the basis has no significance for the game.

A second important variable in the game is what we will call the show master's notepad, described by an observable algebra N. This might be classical information describing how the game space was prepared, or it might be a quantum system, entangled with the prize. In the latter case, the show master is able to do a quantum measurement on his notepad, providing him with classical information about the prize, without moving the prize, in the sense that the player's information about the prize is not changed by the mere fact that the show master "consults his notepad". A measurement on an auxiliary quantum system, even if entangled with a system of interest, does not alter the reduced state of the system of interest. After the show master has consulted his notepad, we are in the same situation as if the notepad had been a classical system all along. As in the classical game, the situation for the player might change when the show master, by opening a door, reveals to some extent what he saw in his notepad. Opening a door corresponds to a measurement along a one dimensional projection on H.

Finally we need a classical system which is used by both players to exchange classical information. We will call it the mail box and describe it by a classical observable algebra C(X), where X can be taken as the space of one-dimensional projections in H (i.e. X is the projective space P²(C)). The overall algebra A which has to be used to describe the game according to Subsection 12.1.2 is therefore A = B(H) ⊗ N ⊗ C(X). The game proceeds in the following stages, closely analogous to the classical game:

1. Before the show Q prepares the game space quantum mechanically and stores some information about this preparation in his notepad N. The initial state of the mail box can be arbitrary.

2. The candidate P chooses some one dimensional projection p on H and stores this as classical information in the mailbox. The game space and the show master's notepad (which P cannot access) remain untouched.

3. The show master opens a door, i.e., he chooses a one dimensional projection q, and makes a Lüders/von Neumann measurement with projections q and (1I − q). In order to do this, he is allowed first to consult his notebook. If it is a quantum system, this means that he carries out a measurement on the notebook. The joint state of prize and notebook then changes, but the traced out or reduced state of the prize does not change, as far as the player is concerned. Two rules constrain the show master's choice of q: he must choose "another door" in the sense that q ⊥ p, and he must be certain not to reveal the prize. The purpose of his notepad is to enable him to do this. After these steps, the game space is effectively collapsed to the two-dimensional space (1I − q)H and information about the opened door is stored in the mailbox.

4. The player P reads the mailbox, chooses a one dimensional projection p′ on (1I − q)H, and performs the corresponding measurement on the game space. If it gives "yes" she collects the prize.

Note that we recover the classical game if Q and P are restricted to choose projections along the three coordinate axes. This shows that the proposed scheme is really a quantization as described in Subsection 12.1.2. As in the classical case, the question is: how should the player choose the projection p′ in order to maximize her chance of winning? Perhaps it is best to try out a few options in a simulation, for which a Java applet is available [64]. For the input to the applet, as well as for some of the discussion below, it is easier to use unit vectors rather than one-dimensional projections. As standard notation we will use p = |Φ⟩⟨Φ| for the door chosen by the player, q = |χ⟩⟨χ| for the door opened by Q, and r = |Ψ⟩⟨Ψ| for the initial position of the prize, if that is defined.

From the classical case it seems likely that choosing p′ = p is a bad idea. So let us say that the classical strategy in this game consists of always switching to the orthogonal complement of the previous choice, i.e., to take p′ = 1I − q − p. Note that this is always a projection because, by rule 3, p and q are orthogonal one dimensional projections. We will analyze this strategy in Sec. 12.2.3, which turns out to be possible without any specification of how the show master can guarantee not to stumble on the prize in step 3.
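In the same spirit as the applet [64], the rules above can be tried out in a few lines of code. The following Monte Carlo sketch is an added illustration (it is not the applet and assumes NumPy); it plays the variant in which Q prepares a Haar-random prize vector and keeps a classical note of it (the first of the two scenarios discussed below), while P uses the classical strategy p′ = 1I − q − p. The empirical winning probability comes out close to 2/3.

import numpy as np

rng = np.random.default_rng(0)

def haar_vector(d=3):
    """A Haar-random unit vector in C^d (the show master's random preparation)."""
    v = rng.normal(size=d) + 1j * rng.normal(size=d)
    return v / np.linalg.norm(v)

def safe_door(phi, psi):
    """A unit vector chi orthogonal to both the player's choice phi and the prize psi."""
    chi = np.cross(phi.conj(), psi.conj())   # complex 'cross product' is orthogonal to both
    return chi / np.linalg.norm(chi)

phi = np.ones(3) / np.sqrt(3)                # the player's (fixed) first choice
rounds, wins = 20000, 0.0
for _ in range(rounds):
    psi = haar_vector()                      # prize vector, noted on a classical pad
    chi = safe_door(phi, psi)                # door opened by Q (rule 3: chi is orthogonal to phi and psi)
    # classical strategy: measure p' = 1I - p - q; success probability on |psi>
    # (the chi term vanishes by construction, since Q never opens the prize door)
    wins += 1 - abs(np.vdot(phi, psi))**2 - abs(np.vdot(chi, psi))**2
print(wins / rounds)                         # close to 2/3 for a Haar-random preparation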
As in the classical game, the situation for the player might change when the show master, by opening a door, reveals to some extent what he saw in his notepad. Opening a door corresponds to a measurement along a one dimensional projection on H. Finally we need a classical system which is used by both players to exchange classical information. We will call it the mail box and describe it by a classical observable algebra C(X), where X can be taken as the space of one-dimensional projections in H (i.e. X is the projective space P 2 (C)). The overall algebra A which has to be used to describe the game according to Subsection 12.2.2 is therefore A = B(H) ⊗ N ⊗ C(X). The game proceeds in the following stages, closely analogous to the classical game: 1. Before the show Q prepares the game space quantum mechanically and stores some information about this preparation in his notepad N . The initial state of the mail box can be arbitrary. 2. The candidate P chooses some one dimensional projection p on H and stores this as classical information in the mailbox. The game space and the showmasters notepad (which P can not access) remain untouched. 3. The show master opens a door, i.e., he chooses a one dimensional projection q, and makes a Lüders/von Neumann measurement with projections q and (1I − q). In order to do this, he is allowed first to consult his notebook. If it is a quantum system, this means that he carries out a measurement on the notebook. The joint state of prize and notebook then change, but the traced out or reduced state of the prize does not change, as far as the player is concerned. Two rules constrain the show master’s choice of q: he must choose “another door” in the sense that q ⊥ p; and he must be certain not to reveal the prize. The purpose of his notepad is to enable him to do this. After these steps, the game space is effectively collapsed to the two-dimensional space (1I − q)H and information about the opened door is stored in the mailbox. 4. The player P reads the mailbox, chooses a one dimensional projection p 0 on (1I − q)H, and performs the corresponding measurement on the game space. If it gives “yes” she collects the prize. Note that we recover the classical game if Q and P are restricted to choose projections along the three coordinate axes. This shows that the proposed scheme is really a quantization as described in Subsection 12.2.2. As in the classical case, the question is: how should the player choose the projection p0 in order to maximize her chance of winning? Perhaps it is best to try out a few options in a simulation, for which a Java applet is available [64]. For the input to the applet, as well as for some of the discussion below it is easier to use unit vectors rather than one-dimensional projections. As standard notation we will use for p = |ΦihΦ| for the door chosen by the player, q = |χihχ| for the door opened by Q, and r = |ΨihΨ| for the initial position of the prize, if that is defined. From the classical case it seems likely that choosing p0 = p is a bad idea. So let us say that the classical strategy in this game consists of always switching to the orthogonal complement of the previous choice, i.e., to take p0 = 1I − q − p. Note that this is always a projection because, by rule 3, p and q are orthogonal one 12. Quantum game theory 176 dimensional projections. We will analyze this strategy in Sec. 12.2.3, which turns out to be possible without any specification of how the show master can guarantee not to stumble on the prize in step 3. 
For the show master there are two main ways how he can satisfy the rules. The first is that he chooses randomly the components of a vector in H, and prepares the game space in the corresponding pure state. He can then just take a note of his choice on a classical pad, so that in stage 3 he can compute a vector orthogonal to both the direction of the preparation and the direction chosen by the player. Q’s strategies in this case are discussed in Subsection 12.2.3. The second and more interesting way is to use a quantum notepad, i.e., another system with three dimensional Hilbert space K, and to prepare a “maximally entangled state” on H ⊗ K. Then until stage 3 the position of the prize is completely undetermined in the strong sense only possible in quantum mechanics, but the show master can find a safe door to open on H by making a suitable measurement on K. Q’s strategies in this case are discussed in Subsection 12.2.5. 12.2.3 The classical strategy To explain why the classical strategy works almost as in the classical version of the problem, we look more closely at the end of round 3, i.e. Q has opened one door by measuring along q and the information which q he has chosen is stored in the mailbox system. Q’s notepad is completely irrelevant from this stage on because it is now P’s turn and she can not access it. Hence we have to look at a state ω on the hybrid system B(H) ⊗ C(X). Note that ω depends on p but we suppress this dependency in the notation. For a finite set X we have seen in Section 2.2.2 that ω is given by a probability distribution w(q) on X and family ρq ∈ S(H), q ∈ X of density operators such that expectation values becomes X w(q)f (q) tr[ρq p0 ], p0 ⊗ f ∈ B(H) ⊗ C(X) (12.8) ω(p0 ⊗ f ) = q∈X The ρq are called conditional density operators and they represent, loosely speaking, the density matrix which P has to use for the game space after Q has announced his intention of opening door q. This is usually not the same conditional density operator as the one used by Q: Since Q has more classical information about the system, he may condition on that, leading to finer predictions. In contrast, ρ q is conditioned only on the publicly available information. In our case X is not finite but the set of all one-dimensional projections in H. Therefore Equation (12.8) is not applicable. Fortunately it can be generalized if we replace the sum with an integral and the probability distribution with a probability measure [175]. Z ω(p0 ⊗ f ) = w(dq) tr(ρq p0 )f (q) (12.9) The map q 7→ ρq becomes an element of L1 (X, w) ⊗ B ∗ (H) and is therefore an equivalence class of function with respect to almost everywhere equality. However for our purposes it is safe to ignore this difficulty and to identify the equivalence class with one of its representatives. From w and ρq we can compute the marginal density operator for the quantum subsystem, describing measurements without consideration of the classical variable q. This is the mean density operator Z ρ = w(dq) ρq . (12.10) It will not depend on p and it will be the same as the reduced density operator for the game space before the show master consults his notepad (he is not allowed 177 12.2. The quantum Monty Hall problem to touch the prize), and even before the player chooses p (which cannot affect the prize). From the rules alone we know two things about the conditional density operators: firstly, that tr(ρq q) = 0: the show master must not hit the prize. Secondly, q and p must commute, so it does not matter whichR of the two we measure first. 
Thus a measurement of p responds with probability w(dq) tr(ρq p) = tr(ρ p). Combining these two we get the overall probability wc for winning with the classical strategy as Z ¡ wc = w(dq) tr ρq (1I − p − q)) = 1 − tr(ρ p) . (12.11) If we assume that ρ is known to P, from watching the show sufficiently often, the best strategy for P is to choose initially the p with the smallest expectation with respect to ρ, just as in the classical game with uneven prize distribution it is best to choose initially the door least likely to contain the prize. If Q on the other hand wants to minimize P’s gain, he will choose ρ to be uniform, which in the quantum case means ρ = 31 1I, and hence wc = 2/3. 12.2.4 Strategies against classical notepads In this section we consider the case that the show master records the prepared direction of the prize on a classical notepad. We will denote the one dimensional projection of this preparation by r. Then when he has to open a door q, he needs to choose q ⊥ r and q ⊥ p. This is always possible in a three dimensional space. But unless p = r, he has no choice: q is uniquely determined. This is the same as in the classical case, only that the condition “p = r”, i.e., that the player chooses exactly the prize vector typically has probability zero. Hence Q’s strategic options are not in the choice of q, but rather in the way he randomizes the prize positions r, i.e., in the choice of a probability measure v on the set of pure states. In order to safeguard against the classical strategy he will make certain that the mean density R operator ρ = v(dr) r is unpolarized (= 13 1I). It seems that this is about all he has to do, and that the best the player can do is to use the classical strategy, and win 2/3 of the time. However, this turns out to be completely wrong. Preparing along the axes. — Suppose the show master decides that since the player can win as in the classical case, he might as well play classical as well, and save the cost for an expensive random generator. Thus he fixes a basis and chooses each one of the basis vectors with probability 1/3. Then ρ = 13 1I, and there seems to be no giveaway. In fact, the two can now play the classical version, with P choosing likewise a projection along a basis vector. But suppose she √ does not, and chooses instead the projection along the vector Φ = (1, 1, 1)/ 3. Then if the prize happens to be prepared in the direction Ψ = (1, 0, 0), the show master has no choice but to choose for q the unique projection orthogonal to these two, which is along χ = (0, 1, −1). So when Q announces his choice, P only has to look which component of the vector is zero, to find the prize with certainty! In other words a quantum strategy of P can always beat an opponent, who is restricted to classical strategies. This is exactly the behavior we have mentioned at end of Section 12.1. At a first look the success of P’s strategy seems to be an artifact of the rather minimalistic choice of probability distribution. But suppose that Q has settled for any arbitrary finite collection of vectors Ψα and their probabilities. Then P can choose a vector Φ which lies in none of the two dimensional subspaces spanned by two of the Ψα . This is possible, even with a random choice of Φ, because the union of these two dimensional subspaces has measure zero. Then, when Q announces the projection q, P will be able to reconstruct the prize vector with certainty: at most one of the Ψα can be orthogonal to q. Because if there were two, they would span a 12. 
two dimensional subspace, and together with Φ they would span a three dimensional subspace orthogonal to q, which is a contradiction.

Of course, any choice of vectors announced with floating point precision is a choice from a finite set. Hence the last argument would seem to allow P to win with certainty in every realistic situation. However, this only works if she is permitted to ask for q at any desired precision. So by the same token (fixed length of the floating point mantissa) this advantage is again destroyed. This shows, however, where the miracle strategies come from: by announcing q, the show master has not just given the player log₂ 3 bits of information, but an infinite amount, coded in the digits of the components of q (or the vector χ).

Preparing real vectors. — The discreteness of the probability distribution is not the key point in the previous example. In fact there is another way to economize on random generators, which proves to be just as disastrous for Q. The vectors in H are specified by three complex numbers. So what about choosing them real for simplicity? An overall phase does not matter anyhow, so this restriction does not seem to be very dramatic.

Here the winning strategy for P is to take Φ = (1, i, 0)/√2, or any other vector whose real and imaginary parts are linearly independent. Then the vector χ ⊥ Φ announced by Q will have a similar property, and must also be orthogonal to the real prize vector. But then we can simply compute the prize vector as the cross product of the real and imaginary parts of χ. For the vector Φ specified above we find that if the prize is at Ψ = (Ψ1, Ψ2, Ψ3), with Ψk ∈ ℝ, the unique vector χ orthogonal to Φ and Ψ is connected to Ψ via the transformations

χ ∝ (Ψ3, −iΨ3, −Ψ1 + iΨ2),   (12.12)
Ψ ∝ (−Re χ3, Im χ3, χ1),   (12.13)

where "∝" means "equal up to a factor", and it is understood that the overall phase of χ is chosen to make χ1 real. This is also the convention used in the simulation [64], so Eq. (12.13) can be tried out as a universal cheat against show masters using only real vectors.

Uniform distribution. — The previous two examples have one thing in common: the probability distribution of vectors employed by the show master is concentrated on a rather small set of pure states on H. Clearly, if the distribution is more spread out, it is no longer possible for P to get the prize every time. Hence it is a good idea for Q to choose a distribution which is as uniform as possible. There is a natural definition of a "uniform" distribution in this context, namely the unique probability distribution on the unit vectors which is invariant under arbitrary unitary transformations.

Is this a good strategy for Q? Let us consider the conditional density operator ρq, which depends on the two orthogonal projections p, q. It implicitly contains an average over all prize vectors leading to the same q, given p. Therefore, ρq must be invariant under all unitary rotations of H fixing these two vectors, which means that it must be diagonal in the same basis as p, q, (1I − p − q). Moreover, the eigenvalues cannot depend on p and q, since every pair of orthogonal one dimensional projections can be transformed into any other by a unitary rotation. Since we know the average eigenvalue in the p-direction to be 1/3, we find

ρq = (1/3) p + (2/3) (1I − p − q).   (12.14)

Hence the classical strategy for P is clearly optimal.
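Both claims of this subsection are easy to test numerically. The following sketch (assuming numpy; the random seed and sample size are arbitrary) first checks the cheat of Eqs. (12.12)–(12.13) for a random real prize vector, and then estimates the 2/3 winning probability of the classical strategy against the Haar-uniform preparation:

```python
import numpy as np
rng = np.random.default_rng(0)

# --- Cheat against real prize vectors, Eqs. (12.12)-(12.13) ---
Phi = np.array([1, 1j, 0]) / np.sqrt(2)
Psi = rng.normal(size=3)
Psi /= np.linalg.norm(Psi)               # real prize vector (generic, Psi3 != 0)

# chi has to satisfy <Phi|chi> = 0 and <Psi|chi> = 0, i.e. chi is orthogonal
# (in the Euclidean sense) to conj(Phi) and to Psi; the cross product does this.
chi = np.cross(Phi.conj(), Psi)
chi *= chi[0].conj() / abs(chi[0])       # fix the phase so that chi_1 is real

Psi_rec = np.array([-chi[2].real, chi[2].imag, chi[0].real])   # Eq. (12.13)
Psi_rec /= np.linalg.norm(Psi_rec)
print(np.isclose(abs(Psi @ Psi_rec), 1.0))       # True: same ray as the prize

# --- Classical strategy against the Haar-uniform preparation: P wins 2/3 ---
N = 200_000
v = rng.normal(size=(N, 3)) + 1j * rng.normal(size=(N, 3))
v /= np.linalg.norm(v, axis=1, keepdims=True)    # Haar-random prize vectors
# P chooses p = |e1><e1|; Q opens q orthogonal to p and to the prize;
# switching to 1I - p - q wins with probability 1 - |<e1|prize>|^2 per round.
print((1 - np.abs(v[:, 0]) ** 2).mean())         # ≈ 2/3
```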
In other words, the pair of strategies: “uniform distribution for Q and classical strategy for P” is a Nash equilibrium of the game. We do not know however, whether this equilibrium is unique, in 12.2. The quantum Monty Hall problem 179 other words: If Q does not play precisely by the uniform distribution: can P always improve on the classical strategy? We suppose that the answer to this question is yes; to find a proof of this conjecture has turned out, however, to be a hard problem which is still open. 12.2.5 Strategies for Quantum notepads Assume now that the notepad system N is quantum rather than classical, i.e. N = B(K) with K = C3 . Initially the bipartite system B(K ⊗ H) consisting of notepad and game space is prepared in a maximally entangled state 3 1 X |kki, Ω= √ 3 k=1 (12.15) where |ki, denotes an arbitrary basis in H, respectively in K. To “look” in his notepad now means that Q performs a measurement on N . This is described by a POV measure Fx , x ∈ Y , which we can take for simplicity to be discrete; i.e. Y is a finite set of possible outcomes. How could Q now infer from this result a safe door q for him to open in the game? This would mean that Fx measured on K, and q measured on H never give a simultaneous positive response, when measured in the state Ω, i.e., 0 = hΩ|q ⊗ Fx |Ωi = hΩ|1I ⊗ Fx q T |Ωi = 1 tr(q T Fx ). 3 (12.16) Here q 7→ q T denotes transposition in the basis |ki and we have used the fact that (X ⊗ 1I)Ω = (1I ⊗ X T )Ω holds for any operator X ∈ B(H); cf. Subsection 3.1.1. Since Fx and q T are both positive, this is equivalent to Fx q T = 0. Of course, Q’s choice must also satisfy the constraint q ⊥ p. There are different ways of arranging this, which we discuss in the following. Equivalence if observable is chosen beforehand. — Suppose Q chooses the measurement beforehand, and let us suppose it is discrete, as before. Then for every outcome x and every p he must be able to find a one dimensional projection satisfying both constraints FxT q = 0 and qp = 0. Clearly, this requires that Fx has at least a two dimensional null space, i.e., Fx = |φx ihφx |, with a possibly non-normalized vector φx ∈ H. It will be convenient to take the vectors φx to be normalized, and to define Fx = vx |φx ihφx | with factors vx summing to 3, the dimension of the Hilbert space. We can further simplify this structure, by identifying outcomes x with the same φ x , since for these the same projection q has to be chosen anyhow. We can therefore drop the index “x”, and consider the measure to be defined directly on the set of one dimensional projections. But this is precisely the structure we had used to describe a classical notepad. This is not an accidental analogy: apart from taking transposes this measure has precisely the same strategic meaning as the measure of a classical notepad. This is not surprising: if the observable is chosen beforehand, it does not matter whether the show master actually performs the measurement before or after the player’s choice. But if he does it before P’s choice, we can just as well consider this measurement with its classical output as part of the preparation of a classical notepad, in which the result is recorded. Simplified strategy for Q. — Obviously the full potential of entanglement is used only, when Q chooses his observable after P’s choice. Since the position of the prize is “objectively undetermined” until then, it might seem that there are now ways to beat the 2/3 limit. 
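The identity (X ⊗ 1I)Ω = (1I ⊗ X^T)Ω invoked for Eq. (12.16) is easily checked numerically; a minimal sketch (assuming numpy, with a generic operator X and arbitrarily chosen q and F in place of the projections and POV elements of the game):

```python
import numpy as np
rng = np.random.default_rng(1)

d = 3
I = np.eye(d)
Omega = np.eye(d).reshape(d * d) / np.sqrt(d)   # (1/sqrt(3)) sum_k |k,k>

X = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
print(np.allclose(np.kron(X, I) @ Omega, np.kron(I, X.T) @ Omega))   # True

# Consequence used in Eq. (12.16): <Omega| A ⊗ B |Omega> = (1/d) tr(A^T B).
q = np.diag([1.0, 0.0, 0.0])                    # a one dimensional projection
F = rng.normal(size=(d, d)); F = F @ F.T        # some positive operator
val = Omega @ (np.kron(q, F) @ Omega)
print(np.isclose(val, np.trace(q.T @ F) / d))   # True
```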
However, the arguments for the classical strategy hold in this case as well. So the best Q can hope for are some simplified strategies. For example, he can now get away with something like measuring along axes only, even though for classical notepads using “axes only” was a certain loss for Q. 12. Quantum game theory 180 We can state this in a stronger way, by introducing tougher rules for Q: In this variant P not only picks the direction p, but also two more projections p0 and p00 such that p + p0 + p00 = 1I. Then Q is not only required to open a door q ⊥ p, but we require that either q = p0 or q = p00 . It is obvious how Q can play this game with an entangled notepad: he just uses the transposes of p, p0 , p00 as his observable. Then everything is as in the classical version, and the equilibrium is again at 2/3. 12.2.6 Alternative versions and quantizations of the game Quantizing something is seldom a problem with a unique solution and quantum game theory is no exception in this respect. In the following we will give a brief overview on some games which are closely related to our version. Variants arising already in the classical case. — Some variants of the problem can also be considered in the classical case, and they tend to trivialize the problem, so that P’s final choice becomes equivalent to “Q has prepared a coin, and P guesses heads or tails”. Here are some possibilities, formulated in a way applying both to the classical and the quantum version. • Q is allowed to touch the prize after P made her first choice. Clearly, in this case Q can reshuffle the system, and equalize the odds between the remaining doors. So no matter what P chooses, there will be a 50% chance for getting the prize. • Q is allowed to open the door first chosen by P. Then there is no way P’s first choice enters the rules, and we may analyze the game with stage 2 omitted, which is entirely trivial. • Q may open the door with the prize, in which case the game starts again. Since Q knows where the prize is, this is the same as allowing him to abort the round, whenever he does not like what has happened so far, e.g., if he does not like the relative position of prize and P’s choice. In the classical version he could thus cancel 50% of the cases, where P’s choice is not the prize, thus equalizing the chances for P’s two pure strategies. Similar possibilities apply in the quantum case. Variants in which classical and quantum behave differently. — More interesting cases arises if we focus on the differences between the classical and the quantum game. • Q may open the door with the prize, in which case P gets the prize. In the classical version, revealing the prize is then the worst possible pure strategy, so mixing in a bit of it would seem to make things always worse for Q. Then although increasing Q’s options in principle can only improve things for Q, one would advise him not to use the additional options. This is assuming, though, that in the remaining cases Q sticks to his old strategy. However, even classically, the relaxed rule gives him some new options: He can simply ignore the notepad, and open any door other than p. Then the game becomes effectively “P and Q open a door each, and P gets all prizes”. Assuming uniform initial distribution of prizes this gives the same 2/3 winning chance as in the original game. The corresponding quantum strategy works in the same way. 
Assuming, for simplicity, a uniform mean density operator ρ = 13 1I, Q’s strategy of ignoring his prior information will give the classical 2/3 winning chance for P. But this is a considerable improvement for Q in cases where a non-uniform probability distribution of pure states previously gave Q a 100% chance of winning. So in the quantum case, doing two seemingly stupid things together amounts to a 12.2. The quantum Monty Hall problem 181 good strategy for Q: firstly, sometimes revealing the prize for P, and secondly ignoring all prior information. Note that this strategy is optimal for Q, because the classical strategy still guarantees the 2/3 winning chance for P. This can be seen with the same arguments as in Subsection 12.2.3. The only difference is that tr(ρq q) can be nonzero, since Q may open the door with the prize. However in this case P wins and we get instead of Equation (12.11) Z ¡ wc = w(dq) tr ρq (1I − p − q)) + tr(ρq q) (12.17) = 1 − tr(ρp) = 2 3 (12.18) • As Q opens the door he is allowed to make a complete von Neumann measurement. Classically, it would make no difference if the doors were completely transparent to the show master. He would not even need a pad then, because he could always look where the prize is. But “looking” is never innocent in quantum mechanics, and in this case it is tantamount to moving the prize around. So let us make it difficult for Q, by insisting that the initial preparation is along a fixed vector, known also to P, and that Q not only has to announce the direction q of the door he opens, but also the projections q 0 ⊥ q and q 00 = 1I − q − q 0 entering in the complete von Neumann measurement, which takes an arbitrary density operator ρ to ρ 7→ qρq + q 0 ρq 0 + q 00 ρq 00 . (12.19) Moreover, we require as before, that q is orthogonal both to p and to the prize. The only thing remaining secret is which of the projections q 0 and q 00 has detected the presence of the prize (This simply would allow P to open that door and collect). Q’s simple strategy is now to choose q as before. The position of p is irrelevant for his choice of p0 and p00 : he will just take these directions at 45◦ to the prize vector. This will result in the unpolarized density operator (q 0 + q 00 )/2, and no matter what P chooses, her chances of hitting the prize will be 1/2. She will probably feel cheated, and she is, because even though she knows exactly where the prize was initially, the strategy “choose the prize, and stick to this choice” no longer works. Two published versions. — Finally let us have a short look on two variants of the game which are proposed independently by other authors [152, 91]. • The quantization proposed in [152] is closely related to a version already discussed above: After Q has opened one door he is allowed to perform an arbitrary von Neumann measurement on the remaining two-dimensional subspace – i.e. he “looks” where the prize is. In the classical case this is an allowed (but completely superfluous) step. In the quantum case, however, the prize is shuffled around. In other words, the final result of the game is completely independent of the steps prior to this measurement and the whole game is reduced to a classical coin flip – which is not very interesting. • A completely different quantization of the game is given in [91]. 
In contrast to our approach, the moves available to Q and P are here not preparations and measurements but operations on a tripartite system which is initially in the pure state ψ ∈ HQ ⊗ HP ⊗ HO (and different choices for ψ lead to different variants of the game). The Hilbert spaces HQ , HP and HO describe the doors 12. Quantum game theory 182 where Q hides the prize, which P chooses in the second step and which Q opens afterwards and the gameplay is described by the unitary operator U = (cos γUS + sin γUN )UO (UQ ⊗ UP ⊗ 1I). (12.20) UQ and UP are arbitrary unitaries, describing Q’s and P ’s initial choice, UO is the (fixed) opening box operator and US respectively UN are P’s “switching” and “not-switching” operators. The payoff is finally given as the expectation value of an appropriate observable $ (for a precise definition of UO , US , UN and $ see [91]). The basic idea behind this scheme is quite different from ours and a comparison of results is therefore impossible. Nevertheless, this is a nice example which shows that quantizing a classical game is very non-unique. 12.3 Quantum coin tossing The purpose of this section is to show how game theoretical questions naturally arises within quantum cryptography. To this end we will discuss a particular class of cryptographic protocols called “quantum coin tossing” within the game theoretic setup introduced in Section 12.1. Classical coin tossing was introduced in 1981 by Blum [29] as a solution to the following cryptographic problem (cited from [29]): “Alice and Bob want to flip a coin by telephone. (They have just divorced, live in different cities, want to decide who gets the car.) Bob would not like to tell Alice heads and hear Alice (at the other end of the line) say: Here it goes... I’m flipping the coin... You lost!”. Hence the basic difficulties are: both players (Alice and Bob) distrust each other, there is no trustworthy third person available and the only resource they can use is the communication channel. Although this problem sounds somewhat artificial, coin tossing is a relevant building block which appears in many cryptographic protocols. Within classical cryptography coin tossing protocols are in general based on assumptions about the complexity of certain computational tasks like factoring of large integers, which are unproven and, even worse, break down if quantum computers become available. A subset of classical cryptography which suffers from similar problems is given by public key cryptosystems. In this case however a solution is available in form of quantum key distribution (cf. [97] for a review) whose security is based only on the laws of quantum mechanics and no other assumptions. Hence the natural question is: Does quantum mechanics provide the same service for cointossing, i.e. is there a perfectly secure quantum coin-tossing protocol? Although the answer is, as we will see, “no” [154, 161], quantum coin-tossing provides a reasonable security improvement over classical schemes. 12.3.1 Coin tossing protocols Two players (as usual called Alice and Bob) are separated from each other and want to create a random bit, which can take both possible values with equal probability. However they do not trust each other and there is no trustworthy third person who can flip the coin for them. Hence they only can exchange data until they have agreed on a value 0 or 1 or until one player is convinced that the other is cheating; in this case we will write ∅ for the corresponding outcome. 
To describe such a coin tossing protocol mathematically, we need three observable algebras A, B and M, where A and B represent private information, which is only accessible by Alice and Bob respectively – Alice’s and Bob’s “notepad” – while M is a public area, which is used by both players to exchange data. We will call it in the following the “mailbox”. Each of the three algebras A, B and M contain in general a classical and a quantum part, i.e. we have A = C(XA ) ⊗ B(HA ) and similar for B and M. A typical choice is HA = H⊗n and XA = Bm where H = C2 12.3. Quantum coin tossing 183 TB,1 TA,2 TB,N −1 TA,N ρB,0 Bob Mailbox ρA,0 Alice Figure 12.1: Schematic picture of a quantum coin-tossing protocol. The curly arrows stands for the flow of quantum or classical information or both. and B denotes the field with two elements – in other words Alice’s notepad consists in this case of n qubits and m classical bits. If Alice wants to send data (classical or quantum) to Bob, she has to store them in the mailbox system, where Bob can read them off in the next round. Hence each processing step of the protocol (except the first and the last one) can be described as follows: Alice (or Bob) uses her own private data and the information provided by Bob (via the mailbox) to perform some calculations. Afterwards she writes the results in part to her notepad and in part to the mailbox. An operation of this kind can be described by a completely positive map TA : A ⊗ M → A ⊗ M, or (if executed by Bob) by TB : M ⊗ B → M ⊗ B. Based on these structures we can describe a coin tossing protocol as a special case of the general scheme for a quantum game introduced in Subsection 12.1.2: At the beginning Alice and Bob prepare their private systems in some initial state. Alice uses in addition the mailbox system to share some information about her preparation with Bob, i.e. Alice prepares the system A⊗M in a (possibly entangled, or at least correlated) state ρA,0 , while Bob prepares his notepad in the state ρB,0 . Hence the state of the composite system becomes ρ0 = ρA,0 ⊗ ρB,0 . Now Alice and Bob start to operate alternately6 on the system, as described in the last paragraph, i.e. Alice in terms of operations TA : A ⊗ M → A ⊗ M and Bob with TB : M ⊗ B → M ⊗ B. After N rounds7 the systems ends therefore in the state (cf. Figure 12.1) ∗ ∗ ∗ ∗ ρN = (TA,N ⊗ IdB )(IdA ⊗TB,N −1 ) · · · (TA,2 ⊗ IdB )(IdA ⊗TB,1 )ρ0 , (12.21) where IdA , IdB are the identity maps on A and B. Note that we have assumed here without loss of generality that Alice performs the first (i.e. providing the initial preparation of the mailbox) and the last step (applying the operation TA,N ). It is obvious how we have to change the following discussion if Bob starts the game or if N is odd. To determine the result Alice and Bob perform measurements on their notepads. The corresponding observables EA = (EA,0 , EA,1 , EA,∅ ) and EB = (EB,0 , EB,1 , EB,∅ ) can have the three possible outcomes X = {0, 1, ∅}, which we 6 This means we are considering only turn based protocols. If special relativity, and therefore finite propagation speed for information, is taken into account it can be reasonable to consider simultaneous exchange of information; cf. e.g. [132] for details. 7 Basically N is the maximal number of rounds: After K < N steps Alice (Bob) can apply identity maps, i.e. TA,j = Id for j > K. 12. Quantum game theory 184 have described already above. The tuples sA = (ρA,0 ; TA,2 , . . . , TA,N +2 ; EA ), sB = (ρB,0 ; TB,1 , . . . 
, TB,N +1 ; EB ) (12.22) consists of all parts of the protocol Alice respectively Bob can influence. Hence the sA represent Alice’s and the sB represent Bob’s strategies. As in Subsection 12.1.2 the sets of all strategies of Alice respectively Bob are denoted by ΣA and ΣB . Note that ΣA depends only on the algebras A and M while ΣB depends on B and M. Occasionally it is useful to emphasize this dependency (the number of rounds is kept fixed in this paper). In this case we write ΣA (A, M) and ΣB (B, M) instead of ΣA and ΣB . The probability that Alice gets the result a ∈ X if she applies the strategy sA ∈ ΣA and Bob gets b ∈ X with strategy sB ∈ ΣB is (cf. Equation (12.4)) £ ¤ υ(sA , sB ; a, b) = tr (EA,a ⊗ 1I ⊗ EB,b )ρN . (12.23) If both measurements in the last step yield the same result a = b = 0 or 1 the procedure is successful (and the outcome is a). If the results differ or if one player signals ∅ the protocol fails. As stated above we are interested in protocols which do not fail and which produce 0 and 1 with equal probability. Another crucial requirement concerns security: Neither Alice nor Bob should be able to improve the probabilities of the outcomes 0 or 1 by “cheating”, i.e. selecting strategies which deviate from the predefined protocol. At this point it is crucial to emphasize that we do not make any restricting assumptions about the resources Alice and Bob can use to cheat – they are potentially unlimited. This includes in particular the possibility of arbitrarily large notepads. In the next definition this is expressed by the (arbitrary) algebra R. Definition 12.3.1 A pair of strategies (sA , sB ) ∈ ΣA (A, M) × ΣB (B, M) is called a (strong) coin tossing protocol with bias ² ∈ [0, 1/2] if the following conditions holds for any (finite dimensional) observable algebra R 1. Correctness: υ(sA , sB ; 0, 0) = υ(sA , sB ; 1, 1) = 12 , 2. Security against Alice: ∀s0A ∈ ΣA (R, M) and ∀x ∈ {0, 1} we have υ(s0A , sB ; x, x) ≤ 1 +² 2 (12.24) 3. Security against Bob: ∀s0B ∈ ΣB (R, M) and ∀x ∈ {0, 1} we have υ(sA , s0B ; x, x) ≤ 1 +² 2 (12.25) The two security conditions in this definition imply that neither Alice nor Bob can increase the probability of the outcome 0 or 1 beyond the bound 1/2 + ². However it is more natural to think of coin tossing as a game with payoff defined according to the following table a=b=0 a=b=1 other Alice 1 0 0 Bob 0 1 0 (12.26) This implies that Alice tries to increase only the probability for the outcome 0 and not for 1 while Bob tries to do the contrary, i.e. increase the probability for 1. This motivates the following definition. 12.3. Quantum coin tossing 185 Definition 12.3.2 A pair of strategies (sA , sB ) ∈ ΣA (A, M) × ΣB (B, M) is called a weak coin tossing protocol, if item 1 of Definition 12.3.1 holds, and if items 2 and 3 are replaced by 2’ Weak security against Alice: ∀s0A ∈ ΣA (R, M) we have υ(s0A , sB ; 0, 0) ≤ 1 + ², 2 (12.27) 3’ Weak security against Bob: ∀s0B ∈ ΣB (R, M) we have υ(sA , s0B ; 1, 1) ≤ 1 + ². 2 (12.28) Here R stands again for any finite dimensional (but arbitrarily large) observable algebra. Good coin tossing protocols are of course those with a small bias. Hence the central question is: What is the smallest bias which we have to take into account, and how do the corresponding optimal strategies look like? To get an answer, however, is quite difficult. Up to now there are only partial results available (cf. Section 12.3.5 for a summary). Other but related questions arise if we exploit the game theoretic nature of the problem. 
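The outcome distribution of Eq. (12.23) is straightforward to evaluate once ρN and the final observables are given as matrices. The following sketch (assuming numpy) does this for a deliberately trivial "protocol" whose final state is a perfectly correlated classical coin, so that the correctness condition of Definition 12.3.1 can be read off; all dimensions and operators are illustrative choices only, and the ∅ outcome is omitted for brevity:

```python
import numpy as np

def outcome_distribution(rho_N, E_A, E_B, dM):
    """v(s_A, s_B; a, b) = tr[(E_{A,a} ⊗ 1I_M ⊗ E_{B,b}) rho_N], cf. Eq. (12.23)."""
    I_M = np.eye(dM)
    return {(a, b): np.trace(np.kron(np.kron(Ea, I_M), Eb) @ rho_N).real
            for a, Ea in E_A.items() for b, Eb in E_B.items()}

# Toy final state on A ⊗ M ⊗ B (one qubit each): an equal mixture of
# |0>_A |0>_M |0>_B and |1>_A |1>_M |1>_B.
rho_N = np.zeros((8, 8))
rho_N[0, 0] = 0.5
rho_N[7, 7] = 0.5
P0, P1 = np.diag([1.0, 0.0]), np.diag([0.0, 1.0])
print(outcome_distribution(rho_N, {0: P0, 1: P1}, {0: P0, 1: P1}, dM=2))
# {(0, 0): 0.5, (0, 1): 0.0, (1, 0): 0.0, (1, 1): 0.5}: both results agree and
# each value occurs with probability 1/2, i.e. condition 1 of Definition 12.3.1.
```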
In this context it is reasonable to look at a whole class of quantum games, which arises from the scheme developed up to now. We only have to fix the algebras8 A, B and M and to specify a payoff matrix as in Equation (12.26). The latter, however, has to be done carefully. If we consider instead of (12.26) the payoff a=b=0 a=b=1 other Alice 1 -1 0 Bob -1 1 0 (12.29) we get a zero sum game, which seems at a first look very reasonable. Unfortunately it admits a very simple (and boring) optimal strategy: Bob produces always the outcome 1 on his side while Alice claims always that she has measured 0. Hence they never agree and nobody has to pay. The game from Equation (12.26) does not suffer from this problem, because a draw is for Alice as bad as the case a = b = 1 where Bob wins. 12.3.2 Classical coin tossing Let us now add some short remarks on classical coin tossing, which is included in the general scheme just developed as a special case: We only have to choose classical algebras for A, B and M, i.e. A = C(XA ), B = C(XB ) and M = C(XM ). The completely positive maps TA and TB describing the operations performed by Alice and Bob are in this case given by matrices of transition probabilities (see Sect. 3.2.3) This implies in particular that the strategies in ΣA , ΣB are in general mixed strategies. This is natural – there is of course no classical coin tossing protocol consisting of pure strategies, because it would lead always to the same result (either always 0 or always 1). However, we can decompose each mixed strategy in a unique way into a convex linear combination of pure strategies, and this can be used to show that there is no classical coin tossing protocol, which admits the kind of security contained in Definition 12.3.1 and 12.3.2. 8 In contrast to the security definitions given above this means that we assume limited recourses (notepads) of Alice and Bob. This simplifies the analysis of the problem and should not be a big restriction (from the practical point of view) if the notepads are fixed but very large. 12. Quantum game theory 186 Proposition 12.3.3 There is no (weak) classical coin tossing protocol with bias ² < 12 . Proof. Assume a classical coin tossing protocol (sA , sB ) is given. Since its outcome is by definition probabilistic, sA or sB (or both) are mixed strategies which can be decomposed (in a unique way) into pure strategies. Let us denote the sets of pure strategies appearing in this decomposition by Σ0A , Σ0B . Since the protocol (sA , sB ) is correct, each pair (sA , sB ) ∈ Σ0A × Σ0B leads to a valid outcome, i.e. either 0 or 1 on both sides. Hence there are two possibilities to construct a zero-sum game, either Alice wins if the outcome is 0 and Bob if it is 1 or the other way round. In both cases we get a zero-sum two-person game with perfect information, no chance moves 9 and only two outcomes. In those games one player has a winning strategy (cf. Sect. 15.6, 15.7 of [224]), i.e. if she (or he) follows that strategy she wins with certainty, no matter which strategy the opponent uses. This includes in particular the case where the other player is honest and follows the protocol. If we apply this arguments to both variants of the game, we see that either one player could force both possible outcomes or one bit could be forced by both players. Both cases only fit into the definition of (weak) coin tossing if the bias is 1/2. This proves the proposition. 
2 Note that the proof is not applicable in the quantum case (in fact there are coin tossing protocols with bias less than 1/2 as we will see in Section 12.3.4). One reason is that in the quantum case one does not have perfect information. E.g. if Alice sends a qubit to Bob, he does not know what qubit he got. He could perform a measurement, but if he measures in a wrong basis, he will inevitably change the qubit. Another way to circumvent the negative result of the previous proposition is to weaken the assumption that both players can perform any operation on their data. A possible practical restriction which come into mind immediately is limited computational power, i.e. we can assume that no player is able to solve intractable problems like factorization of large integers in an acceptable time. Within the definition given above this means that Alice and Bob do not have access to all strategies in ΣA and ΣB but only to certain subsets. Of course, such additional restrictions can be imposed as well in the quantum case. To distinguish the much stronger security requirements in Definition 12.3.1 and 12.3.2 a protocol is sometimes called unconditionally secure, if no additional assumptions about the accessible cheating strategies are necessary (loosely speaking: the “laws of quantum mechanics” are the only restriction). 12.3.3 The unitary normal form A special class of quantum coin tossing arises if: 1. all algebras are quantum, i.e. A = B(HA ), B = B(HB ) and M = B(HM ) with Hilbert spaces HA , HB and HM ; 2. the initial preparation is pure: ρA = |ψA ihψA | and ρB = |ψB ihψB | with ψA ∈ HA ⊗ HM and ψB ∈ HB ; 3. the operations TA,j , TB,j are unitarily implemented: ∗ TA,j (ρ) = UA,j ρUA,j with a unitary operator UA,j on HA ⊗ HM and something similar holds for Bob and 4. the observables EA , EB are projection valued. It is easy to see that the corresponding strategies (sA , sB ) ∈ ΣA × ΣB do not admit a proper convex decomposition into other strategies. Hence they represent the pure strategies. In contrast to the classical case it is possible to construct correct coin tossing protocols with pure strategies. The following proposition was stated for the first time (in a less explicit way) in [160] and shows that we can replace a mixed strategy always by a pure one without loosing security. Proposition 12.3.4 For each strategy sA ∈ ΣA (A, M) with A ⊂ B(HA ) there is a e M) with Ae = B(HA ⊗ KA ) such Hilbert space KA and a pure strategy σ eA ∈ ΣA (A, 9 That means there are no outside probability experiments like dice throws. 12.3. Quantum coin tossing 187 that υ(sA , sB ; x, y) = υ(e σA , sB ; x, y) (12.30) holds for all sB ∈ ΣB (B, M) (with arbitrary Bob algebra B) and all x, y ∈ {0, 1, ∅}. A similar statement holds for Bob’s strategies. Proof. Note first that all observable algebras A, B and M are linear subspaces of pure quantum algebras, i.e. A ⊂ B(HA ), B ⊂ B(HB ) and M ⊂ B(HM ). In addition it can be shown that Alice’s operations TA : A ⊗ M → B(HA ) ⊗ B(HM ) can be extended to a channel TeA : B(HA ) ⊗ B(HM ) → B(HA ) ⊗ B(HM ), i.e. a quantum operation [178]; something similar holds for Bob’s operations. Hence we can restrict the proof to the case where all three observable algebras are quantum. Now the statement basically follows from the fact that we can find for each item in the sequence TA = (ρA ; TA,2 , . . . , TA,N ; EA ) a “dilation”. For the operations TA,j this is just the ancilla representation given in Corollary 3.2.1 , i.e. 
¢ ¡ ∗ (12.31) (ρ) = tr2 Vj (ρ ⊗ |φj ihφj |)Vj∗ TA,j with a Hilbert space Lj , a unitary Vj on HA ⊗ Lj and a pure state φj ∈ Lj (and tr2 denotes the partial trace over Lj ). Similarly, there is a Hilbert space L0 and a pure state φ0 ∈ HA ⊗ L0 such that ρA = tr2 (|φ0 ihφ0 |) (12.32) holds (i.e. φ0 is the purification of ρA ; cf. Sect. 2.2), and finally we have a Hilbert space LN +2 , a pure state φN +2 and a projection valued measure F0 , F1 , F∅ ∈ B(HA ⊗ LN +2 ) with ¡ ¢ tr(EA,x ρ) = tr Fx (ρ ⊗ |φN +2 ihφN +2 |) , (12.33) this is another consequence of Stinesprings theorem. Now we can define the pure strategy σ eA as follows: KA = L0 ⊗ L2 ⊗ . . . ⊗ LN ⊗ LN +2 UA,j ψA = φ0 ⊗ φ2 ⊗ · · · ⊗ φN ⊗ φN +2 = 1I0 ⊗ 1I2 ⊗ · · · ⊗ Vj ⊗ · · · ⊗ 1IN ⊗ 1IN +2 eA,x = 1I0 ⊗ · · · ⊗ 1IN ⊗ Fx , E (12.34) (12.35) (12.36) (12.37) where 1Ik denotes the unit operator on Lk and in Equation (12.36) we have implicitly used the canonical isomorphism between HA ⊗KA and L0 ⊗· · ·⊗HA ⊗Lj ⊗. . .⊗LN +2 . It is now easy to show σ eA satisfies Equation (12.30). 2 This result allows us to restrict many discussions to pure strategies. This is very useful for the proof of no-go theorems or for calculations of general bounds on the bias of coin-tossing protocols. This concerns in particular the results in [6] which apply, due to Proposition 12.3.4 immediately to the general case introduced in Section 12.3.1. Many concrete examples (cf. the next section) are however mixed protocols and to rewrite them in a pure form is not necessarily helpful. 12.3.4 A particular example In this section we are giving a concrete example for a strong coin tossing protocol. It has a bias of ² = 0.25 and is derived from a quantum bit commitment protocol (a procedure related to coin tossing) given in [199]. Bit commitment is another two person protocol which is related to coin tossing. It is always possible to construct a coin tossing protocol from a bit commitment protocol but not the other way round (cf. [132]). Hence statements about the security of certain bit commitment protocols can be translated into statements about the bias of the related coin tossing protocols. This shows together with [199] that the given protocol has the claimed bias. 12. Quantum game theory 188 The protocol. — In this protocol we take HA = HM = C3 , HB = C3 ⊗ C3 plus classical parts of at most 2 bits for each notepad. The canonical base in the Hilbert space C3 is denoted by |ii, i = 0, 1, 2 1. preparation step: Alice throws a coin, the result is bA ∈ {0, 1}, with probability 1/2 each. She stores the result and prepares the system B(HA ) ⊗ B(HM ) in the state |ψbA ihψbA |, where |ψ0 i = √12 (|0, 0i + |1, 2i) and |ψ1 i = √1 (|1, 1i + |0, 2i) are orthogonal to each other. Bob throws a similar coin, and 2 stores the result bB . The initial preparation of his quantum part is arbitrary. 2. Bob reads the mailbox (i.e. swaps it with the second part of his Hilbert space) and sends bB to Alice. 3. Alice receives bB and puts her remaining quantum system into the mailbox. 4. Bob reads the mailbox and puts the system into the first slot of this quantum register. 5. results: The result on Alice’s side is bA ⊕ bB , where ⊕ is the addition modulo 2. Bob performs a projective measurement on his quantum system with P 0 = |ψ0 ihψ0 |, P1 = |ψ1 ihψ1 | and P(∅) = 1I − P0 − P1 , with result b0A . If everybody followed the protocol b0A = bA . So the result on Bob’s side is b0A ⊕ bB . Possible cheating strategies. 
— Now we will give possible cheating strategies for each party which lead to the maximal probability of achieving the preferred outcome. For simplicity we just look at the case where Alice prefers the outcome to be 0, whereas Bob prefers it to be 1; cheating strategies for the other cases are easily derivable.

A cheating strategy for Bob is to try to distinguish in step 2 whether Alice has prepared |ψ0⟩ or |ψ1⟩. For this purpose he performs the measurement (|0⟩⟨0|, |1⟩⟨1|, |2⟩⟨2|). If the result cB ≠ 2 (the probability for this is 1/2 in either case) he can identify bA = cB and set bB = cB ⊕ 1 to achieve the overall result 1. If cB = 2 holds, he has not learned anything about bA. In that case he just continues with the protocol and hopes for the desired result, which appears with probability 1/2.¹⁰ So the total probability for Bob to achieve the result 1 is 1/2 + 1/2 · 1/2 = 3/4.

10 After that measurement he is no longer able to figure out which outcome occurs on Alice's side, so he just sets his outcome to 1. A similar situation occurs in the cheating strategy for Alice, but she is in neither case able to predict the outcome on Bob's side with certainty.

A cheating strategy for Alice is to set in the initial step bA = 0 and to prepare the system B(HA) ⊗ B(HM) in the state |ψ̃0⟩ = (1/√6)(|0,0⟩ + |0,1⟩ + 2·|1,2⟩). Then she continues until step 3. If bB = 0 she just continues with the protocol. The probability that Bob's measurement in the last step projects onto |ψ0⟩ (so that his result is 0) equals tr(|ψ̃0⟩⟨ψ̃0| · |ψ0⟩⟨ψ0|) = |⟨ψ0|ψ̃0⟩|² = 3/4. If bB = 1 she first applies a unitary operator, which swaps |0⟩ and |1⟩, to her system before she sends it to Bob. The state on Bob's side is then |ψ̃1⟩⟨ψ̃1| with |ψ̃1⟩ = (1/√6)(|1,0⟩ + |1,1⟩ + 2·|0,2⟩), and the probability that Bob's measurement projects onto |ψ1⟩ (again giving the result 0 on his side) equals tr(|ψ̃1⟩⟨ψ̃1| · |ψ1⟩⟨ψ1|) = |⟨ψ1|ψ̃1⟩|² = 3/4. So the total probability for Alice to get the outcome 0 is 1/2 · 3/4 + 1/2 · 3/4 = 3/4.

12.3.5 Bounds on security

The previous example shows that quantum coin tossing admits, in contrast to the classical case, a nontrivial bias. However, how secure is quantum coin tossing really? Can we reach the optimal case (ε = 0)? The answer actually is "no". This was first proven by Mayers, Salvail and Chiba-Kohno [161]. Later on, Ambainis restated the arguments in a more explicit form [6].¹¹ It is still an open question whether there exist quantum coin tossing protocols with bias arbitrarily close to zero.

Ambainis also shows that a coin tossing protocol with a bias of at most ε must use at least Ω(log log 1/ε) rounds of communication. Although in that paper he gives the proof only for strong coin tossing, it holds in the weak case as well. It follows that a protocol cannot be made arbitrarily secure (i.e. have a sequence of protocols with ε → 0) by just increasing the amount of information exchanged in each step. The number of rounds has to go to infinity (although very slowly).

The strong coin tossing protocol given in Section 12.3.4 has a bias of ε = 0.25. Another one with the same bias is given by Ambainis [6]. No strong protocol with a provably smaller bias is known yet. The best known weak protocol is given by Spekkens and Rudolph [200] and has a bias of ε = 1/√2 − 1/2 = 0.207… Although this is still far from arbitrarily secure, it shows another distinction between classical and quantum information, as in a classical world no protocol with bias smaller than 0.5 is possible.
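The two cheating probabilities of 3/4 computed in Section 12.3.4 can be reproduced with a few lines of linear algebra; a minimal sketch (assuming numpy, with the two qutrits of |ψ_b⟩ ordered as notepad ⊗ mailbox):

```python
import numpy as np

def ket(i, j):                       # |i, j> in C^3 ⊗ C^3, flattened as 3*i + j
    v = np.zeros(9); v[3 * i + j] = 1.0
    return v

psi = {0: (ket(0, 0) + ket(1, 2)) / np.sqrt(2),
       1: (ket(1, 1) + ket(0, 2)) / np.sqrt(2)}

# Bob's cheat: measure the mailbox qutrit in the basis |0>, |1>, |2>.
rho_M = {b: np.trace(np.outer(psi[b], psi[b]).reshape(3, 3, 3, 3),
                     axis1=0, axis2=2) for b in (0, 1)}    # reduced mailbox states
p_identify = 1 - rho_M[0][2, 2]      # P(c_B != 2) = 1/2, the same for both states
p_bob = p_identify * 1.0 + (1 - p_identify) * 0.5
print(p_bob)                         # 0.75

# Alice's cheat: prepare the tilted state and hope that Bob's projective
# measurement ends up in |psi_{b_B}>, so that his result is 0 as well.
psi0_t = (ket(0, 0) + ket(0, 1) + 2 * ket(1, 2)) / np.sqrt(6)
psi1_t = (ket(1, 0) + ket(1, 1) + 2 * ket(0, 2)) / np.sqrt(6)
p_alice = 0.5 * (psi[0] @ psi0_t) ** 2 + 0.5 * (psi[1] @ psi1_t) ** 2
print(p_alice)                       # 0.75
```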
Another interesting topic in quantum coin tossing is the question of cheatsensitivity, that means how much can each player increase the probability of one outcome without risking being caught cheating. For more about this cf e.g. [200] or [106]. 11 The first attempt for a proof, given by Lo and Chau [154]. However, its validity is restricted to the case where ‘cheating’ always influences the probabilities of both valid outcomes. More precisely they demand that the probabilities for the outcomes 0 and 1 are equal, for any cheating strategy. This restriction is too strong, even if Alice and Bob sit together and throw a real coin one of them can always say he (she) does not accept the result (and for example refuses to pay his loss) and so put the probability for one outcome to zero while the probability for the other one and the outcome invalid are 1/2 each. Chapter 13 Infinitely entangled states Many of the concepts of entanglement theory were originally developed for quantum systems described in finite dimensional Hilbert spaces. This restriction is often justified, since we are usually only trying to coherently manipulate a small part of the system. On the other hand, a full description of almost any system, beginning with a single elementary particle, requires an infinite dimensional Hilbert space. Hence if one wants to discuss decoherence mechanisms arising from the coupling of the “qubit part” of the system with the remaining degrees of freedom, it is necessary to widen the context of entanglement theory to infinite dimensions. This is not difficult, since many of the basic notions, e.g. the definitions of entanglement measures, like the reduced von Neumann entropy or entanglement of formation, carry over almost unchanged, merely with finite sums replaced by infinite series. More serious are some technical problems arising from the fact that such entanglement measures can now become infinite, and are no longer continuous functions of the state. Luckily, as shown in recent work of Eisert et. al. [83], these problems can be tamed to a high degree, if one imposes some natural energy constraints on the systems. In this chapter we look at some not-so-tame states, which should be considered as idealized descriptions of situations in which very much entanglement is available. For example, in the study of “entanglement assisted capacity” (Subsection 6.2.3) one assumes that the communicating partners have an unlimited supply of shared maximally entangled singlets. In quantum information problems involving canonical variables it is easily seen that perfect operations can only be expected in the limit of an “infinitely squeezed” two mode gaussian state as entanglement resource (see also Section 13.5). But infinite entanglement is not only a desirable resource, it is also a natural property of some physical systems, such as the vacuum in quantum field theory (see [204, 205] and Section 13.4 below). Our aim is to show that one can analyze these situations by writing down bona fide states on suitably constructed systems. This chapter is mainly based on [135]. Related publications are [83] where entangled density matrices in infinite dimensional Hilbert spaces are studied and [58, 59, 60] concerning EPR states (cf. Section 13.5). 13.1 Density operators on infinite dimensional Hilbert space We will start our discussion with a short look at entanglement properties of density operators on an infinite dimensional but separable1 Hilbert space H ⊗ H. 
1 Another extension of this framework, namely to Hilbert spaces of uncountable dimension (i.e., unseparable ones in the topological sense), is not really interesting with regard to entanglement theory, since any density operator has separable support, i.e., it is zero on all but countably many dimensions.

Most of the definitions of entanglement quantities carry over from the finite dimensional setting without essential change. Since we want to see how these quantities may diverge, let us look mainly at the smallest, the distillable entanglement (cf. Subsection 5.1.3). Effectively, distillation protocols for infinite dimensional systems can be built up by first projecting to a suitable finite dimensional subspace, and subsequently applying finite dimensional procedures to the result. With this in mind we can easily construct pure states with infinite distillable entanglement. Let us consider vectors in Schmidt form, i.e.,

Φ = Σ_n c_n e′_n ⊗ e′′_n,   (13.1)

with orthonormal bases e′_n, e′′_n and positive numbers c_n ≥ 0 satisfying Σ_n |c_n|² = 1. The density operator of the restriction of this state to Alice's subsystem has eigenvalues c_n², and von Neumann entropy −Σ_n c_n² log₂(c_n²), which we can take to be infinite (c_n² = 1/(Z (n+2) [log₂(n+2)]²), with Z the normalization constant, will do). We can distill this by using more and more of the dimensions as labelled by the bases e′_n, e′′_n, and applying the known finite dimensional distillation procedures to get out an arbitrary amount of entanglement per pair.

Once this is done, it is also easy to construct mixed states with large entanglement in the neighborhood of any state ρ, mixed or pure, separable or not. We only have to remember that every state is essentially (i.e., up to errors of small probability) supported on a finite dimensional subspace. Therefore we can consider the mixture ρε = (1 − ε)ρ + εσ with a small fraction of an infinitely entangled pure state σ, which is supported on those parts of the Hilbert space where ρ is nearly zero. Then distillation based on the support of σ will work for ρε and produce arbitrarily large entanglement per ρε pair, in spite of the constant reduction factor ε. For the details of such arguments we refer to [59, 124]. The argument as given here does not quite show that states of infinite distillable entanglement are norm dense, but it certainly establishes the discontinuity of the function "distillable entanglement" with respect to the trace norm topology.

This might appear to show that the approach to distillable entanglement based on finite dimensional systems is fundamentally flawed: if only finitely many dimensions out of the infinitely many providing a full description of the particle/system are used, might not the entanglement be misrepresented completely? Here it helps that states living on a far out subspace in Hilbert space usually also have large or infinite energy. For typical confined systems, the subspaces with bounded energy are finite dimensional, so if we assume a realistic a priori bound on the energy expectation of the states under consideration, continuity can be restored [83].

13.2 Infinite one-copy entanglement

If the distillable entanglement of a state is infinite: how much of that entanglement can we get out? If sufficiently many copies of the state are given, we can of course use a distillation process producing in the long run infinitely many nearly pure singlets per original entangled pair.
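The divergence of the reduced entropy for the coefficient sequence of Section 13.1 can also be seen numerically: the truncated entropies keep growing, although only at a log log rate, in line with the divergence of Σ_n 1/((n+2) log₂(n+2)). A minimal sketch (assuming numpy):

```python
import numpy as np

def truncated_entropy(N):
    """Entropy (in bits) of the normalized first N Schmidt weights
    c_n^2 ∝ 1/((n+2) * log2(n+2)^2)."""
    n = np.arange(1, N + 1, dtype=float)
    w = 1.0 / ((n + 2) * np.log2(n + 2) ** 2)
    w /= w.sum()                              # the normalization constant Z
    return float(-(w * np.log2(w)).sum())

for N in (10**3, 10**5, 10**7):
    print(N, truncated_entropy(N))            # values keep increasing with N
```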
But if the entanglement is infinite, might it not be possible to use only one copy of the state in the first place? In other words,are there states, which can be used as a one time resource, to teleport an arbitrary number of qubits? We will now give a definition of such states. The extraction of entanglement will be described by a sequence of operations resulting in a pair of d-level systems with finite d. The extraction is successful, if this pair is in a nearly maximally entangled state, when one starts from the given input state. The overall operation is then given mathematically by a completely positive, trace preserving map Ed . Of course, we must make sure that the extraction process does not generate entanglement. There are different ways of expressing this mathematically. For example, we could allow E d to be LOCC operations (Subsection 3.2.6). We will also consider a much weaker, and much more easily verified condition, namely that Ed takes pure product states into states with positive partial transpose (“PPtPPT operations” for “pure product to positive partial transpose”). Of course, every LOCC channel is a PPtPPT channel. The success is measured by the fidelity of the output state Ed (ρ) with a fixed maximally entangled state on Cd ⊗ Cd . By pd we denote the projection onto this 13. Infinitely entangled states 192 maximally entangled reference vector. Then a density operator ρ is said to have “infinite one-copy entanglement”, if for any ε > 0 and any d ∈ N there is a PPtPPT channel Ed such that tr(Ed (ρ)pd ) ≥ 1 − ε . (13.2) Then we have the following Theorem, whose proof uses a distillation estimate of Rains [184] developed for the finite dimensional context. Theorem 13.2.1 For any sequence of PPtPPT channels Ed , d ∈ N, and for any fixed density operator ρ we have lim tr(Ed (ρ)pd ) = 0 . d→∞ (13.3) In particular, no density operator with infinite one-copy entanglement exists. Proof. Consider the operators Ad defined by tr(ρAd ) = tr(Ed (ρ)pd ). (13.4) In order to verify that Ad exists, observe that Ed , as a positive operator is automatically norm continuous. Hence the right hand side is a norm continuous linear functional on density matrices ρ. Since the set of bounded operators is the dual Banach space of the set of trace class operators [186, Theorem VI.26] such functionals are indeed of the form (13.4). We now have to show that, for every ρ, we have limd tr(ρAd ) = 0, i.e., that Ad → 0 in the weak*-topology of this dual Banach space. Obviously, 0 ≤ Ad ≤ 1I, and by the Banach-Alaoglu Theorem [186, Theorem IV.21], this set is compact in the topology for which we want to show convergence. Hence the sequence has accumulation points and we only have to show that all accumulation points are zero. Let A∞ denote such a point. Then it suffices to show that tr(σA∞ ) = 0 for all pure product states σ. Indeed, since A∞ ≥ 0, this condition forces A∞ φ ⊗ ψ = 0 for all pairs of vectors φ, ψ, and hence A∞ = 0, because such vectors span the tensor product Hilbert space. On the other hand, our locality condition is strong enough to allow us to compute the limit directly for pure product states σ. We claim that tr(σAd ) = tr(Ed (σ)pd ) = tr(Ed (σ)T2 pTd 2 ) ≤ kpTd 2 k = 1/d (13.5) Here we denote by X T2 the partial transposition with respect to the second tensor factor, of an operator X on the finite dimensional space Cd ⊗ Cd , and use that this operation is unitary with respect to the Hilbert-Schmidt scalar product hX, Y i HS = tr(X ∗ Y ). 
By assumption, Ed (σ)T2 ≥ 0, and since partial transposition preserves the trace, Ed (σ)T2 is even a density operator. Hence the expectation value of pTd 2 in this state is bounded by the norm of this operator. But it is easily verified that p Td 2 is just (1/d) times the unitary operator exchanging the two tensor factors. Hence its norm is (1/d). Taking the limit of this estimate along a sub-net of Ad converging to A∞ , we find tr(σA∞ ) = 0. 2 13.3 Singular states and infinitely many degrees of freedom In this section we will show how to construct a system of infinitely many singlets. It is clear from Theorem 13.2.1 that not all of the well-known features of the finite situation will carry over. Nevertheless, we will stay as closely as possible to the standard constructions trying to pretend that ∞ is finite, and work out the necessary modifications as we go along. The crucial point is, as we will see that the 193 13.3. Singular states and infinitely many degrees of freedom equivalence between descriptions of quantum states in terms density matrices and expectation value functionals, which we have discussed in the finite dimensional case in Section 2.1, breaks down when observable algebras become infinite dimensional (cf. Subsection 13.3.2). 13.3.1 Von Neumann’s incomplete infinite tensor product of Hilbert spaces The first difficulty we encounter is the construction of Hilbert spaces for Alice’s and Bob’s subsystem, respectively, which should be the infinite tensor power (C 2 )⊗∞ of the one qubit space C2 . Let us recall the definition of a tensor product: it is a Hilbert space generated by linear combination and norm limits from basic vectors written N∞ as Φ = j=1 φj , where φj is a vector in the jth tensor factor. All we need to know to construct the tensor product as the completion of formal linear combinations of such vectors are their scalar products, which are, by definition, *∞ + ∞ ∞ O O Y ψj = φj , hφj , ψj i . (13.6) j=1 j=1 j=1 The problem lies in this infinite product, which clearly need not converge for arbitrary choice of vectors φj , ψj . A well-known way out of this dilemma, known as von Neumann’s incomplete tensor product [223] is to restrict the possible sequences of vectors φ1 , φ2 , . . . in the basic product vectors: for each tensor factor, one picks a reference unit vector χj , and only sequences are allowed for which φj = χj holds for all but a finite number of indices. Evidently, if this property holds for both the φ j and the ψj the product in (13.6) contains only a finite number of factors 6= 1, and converges.P By taking norm limits of such vectors we see that also product vectors ∞ for which j=1 kφj − χj k < ∞ are included in the infinite product Hilbert space. However, the choice of reference vectors χj necessarily breaks the full unitary symmetry of the factors, as far as asymptotic properties for j → ∞ are concerned. For the case at hand, i.e., qubit systems, let us choose, for definiteness, the “spin up” vector as χj for every j, and denote the resulting space by H∞ . An important observation about this construction is that all observables of finite tensor product subsystems N∞ act as operators on this infinite tensor product space. In fact, any operator j=1 Aj makes sense on the incomplete tensor product, as long as Aj = 1I for all but finitely many indices. The algebra of such operators is known as the algebra of local observables. It has the structure of a *-algebra, and its closure in operator norm is called quasi-local algebra [35]. 
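The role of the convergence condition behind Eq. (13.6) can be illustrated with a small numerical sketch (assuming numpy; one qubit per factor, with χ_j the "spin up" vector): for a sequence of factors that deviates from the reference vector in a summable way the product of overlaps converges to a nonzero limit, while a fixed tilt in every factor drives it to zero, so that such a product vector is orthogonal, in the limit, to all basic vectors of the incomplete tensor product built on χ.

```python
import numpy as np

chi = np.array([1.0, 0.0])                      # reference "spin up" vector

def overlap_products(phis, psis):
    """Partial products of prod_j <phi_j, psi_j>, cf. Eq. (13.6)."""
    return np.cumprod([np.vdot(p, q) for p, q in zip(phis, psis)])

def tilted(theta):
    return np.array([np.cos(theta), np.sin(theta)])

J = 2000
# Summable deviations ||phi_j - chi|| ~ 1/j^2: the product converges, so the
# corresponding product vector belongs to the incomplete tensor product.
phis = [tilted(1.0 / (j + 1) ** 2) for j in range(J)]
print(overlap_products(phis, [chi] * J)[-1])    # close to a nonzero limit

# A fixed tilt in every factor: the product tends to zero.
phis = [tilted(0.1)] * J
print(overlap_products(phis, [chi] * J)[-1])    # cos(0.1)**2000 ≈ 5e-5
```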
Let us take the space H∞ as Alice’s and Bob’s Hilbert space. Then each of them holds infinitely many qubits, and we can discuss the entanglement contained in a density operator on H∞ ⊗ H∞ . Clearly, there is no general upper bound to this entanglement, since we can take a maximally entangled state on the first M < ∞ factors, complemented by infinitely many spin-up product states on the remaining qubit pairs. But for any fixed density operator the entanglement is limited: for measurements on qubit pairs with sufficiently large j we always get nearly the same expectations as for two uncorrelated spin-up qubits (or whatever the reference states χj dictate). This is just another instance of Theorem 13.2.1: there is no density operator describing infinitely many singlets. 13.3.2 Singular states However, can we not take the limit of states with growing entanglement? To be specific, let ΦM denote the vector which is a product of singlet states for the first M qubit pairs, and a spin-up product for the remaining ones. These vectors do not converge in H∞ ⊗ H∞ , but that need not concern us, if we are only interested in expectation values: for all local observables A (observables depending on only 13. Infinitely entangled states 194 finitely many qubits), the limit ω(A) = limhΦM , AΦM i M (13.7) exists. Thereby we get an expectation value functional for all quasi-local observables, and by the Hahn-Banach Theorem (see e.g. [186, Theorem III.6]), we can extend this expectation value functional to all bounded operators on H ∞ ⊗ H∞ . The extended functional ω has all the properties required by the statistical interpretation of quantum mechanics: linearity in A, ω(A) ≥ 0 for positive A, and ω(1I) = 1. In other words it is a state on the algebra B(H) as we have introduced them in Subsection 2.1.1. By construction, ω describes maximal entanglement for any finite collection of qubit pairs, so it is truly a state of infinitely many singlets. How does this match with Theorem 13.2.1? The crucial point is that that Theorem only speaks of states given by the trace with a density operator, i.e., of functionals of the form ωρ (A) = tr(ρA). Such states are called “normal” and for a finite dimensional algebra each state is normal (cf. Subsection 2.1.2). But in the infinite dimensional case this equivalence between the two different descriptions of quantum states breaks down. In other words there is no density operator for ω: this is a singular state on the algebra of bounded operators. Singular states are not that unusual in quantum mechanics, although they can only be “constructed” by an invocation the Axiom of Choice, usually through the Hahn-Banach Theorem2 . For example, we can think of a non-relativistic particle localized at a sharp point, as witnessed by the expectations of all continuous functions of position. Extending from this algebra to all bounded operators, we get a singular state with sharp position3 , but “infinite momentum”, i.e., the probability assigned to finding the momentum in any given finite interval is zero [235]. This shows that the probability measure on the momentum space induced by such a state is only finitely additive, but not σ-additive. This is typical for singular states. More practical situations involving singular states arise in all systems with infinitely many degrees of freedom, as in quantum field theory and in statistical mechanics in the thermodynamic limit. 
For example, the equilibrium state of a free Bose gas in infinite space at finite density and temperature is singular with respect to Fock space because the probability for finding only a finite number of particles in such a state is zero. In all these cases, one is primarily interested in the expectations of certain meaningful observables (e.g., local observables), and the wilder aspects of singular states are connected only to the extension of the state to all bounded operators. Therefore it is a good strategy to focus on the state as an expectation functional only on the “good” observables. 13.3.3 Local observable algebras If we want to represent a situation with infinitely many singlets, an obvious approach is to take again von Neumann’s incomplete tensor product, but this time the infinite tensor product of pairs rather than single qubits, with the singlet vector chosen as the reference vector χj for every pair. We denote this space by H∞∞ , and by Ω ∈ H∞∞ the infinite tensor product of singlet vectors. Clearly, this is a normal state (with density operator |ΩihΩ|), and we seem to have gotten around Theorem 13.2.1 after all. However, the problem is now to identify the Hilbert spaces of Alice and Bob as tensor factors of H∞∞ . To be sure, the observables measurable by Alice and Bob, respectively, are easily identified. For example, the σx -Pauli matrix for Alice’s 137th 2 Other constructions based on the Axiom of Choice are the application of invariant means, e.g., when averaging expectation values over all translations, or algebraic constructions using maximal ideals. For an application in von Neumann style measurement theory of continuous spectra, see [176] 3 This is not related to improper eigenkets of position, which do not yield normalized states 195 13.3. Singular states and infinitely many degrees of freedom particle is a well defined operator on H∞∞ . Alice’s observable algebra A is generated by the collection of all Alice observables for each pair. Bob’s observable algebra B is generated similarly, and together they generate the local algebra of the pair system. Moreover, the two observable algebras commute elementwise. This is just what we expect from the usual setup, when the total Hilbert space is H = HA ⊗ HB , and Alice’s and Bob’s observable algebras are A = B(HA ) ⊗ 1IB and B = 1IA ⊗ B(HB ). However, the A and B constructed above are definitely not of this form, so H ∞∞ has no corresponding decomposition as HA ⊗ HB . The most direct way of seeing this is to note that H∞∞ contains no product vectors describing an uncorrelated preparation of the two subsystems. If we move to qubit pairs with sufficiently high index, then by construction of the incomplete tensor product, every vector in H ∞∞ will be close to the singlet vector, and in particular, will violate Bell’s inequality nearly maximally (see also Section 13.4.2). Hence we arrive at the following generalized notion of bipartite states, generalizing the finite dimensional one: Alice’s and Bob’s subsystems are identified by their respective observable algebras A and B. We postpone the discussion of the precise technical properties of these algebras. What is important is, on the one hand, that these algebras are part of a larger system, so they are both subalgebras of a larger algebra, typically the algebra B(H) of bounded operators on some Hilbert space. This allows us to consider products and correlations between the two algebras. 
On the other hand, each measurement Alice chooses must be compatible with each one chosen by Bob. This requires that A and B commute elementwise. A bipartite state is then simply a state on the algebra containing both A and B. We can then describe the two ways out of the No-Go Theorem: on the one hand we can allow more general states than density matrices, but on the other hand we can also consider more general observable algebras. In the examples we will discuss, the algebra containing A and B will in fact be of the form B(H), and the states will be given by density matrices on H. So both strategies can be successful by themselves.

13.3.4 Some basic facts about operator algebras

The possibility of going either to singular states or to extended observable algebras is typical of the duality of states and observables in quantum mechanics. There are many contexts where it is useful to extend either the set of states or the set of observables by idealized elements, usually obtained by some limit. However, these two idealizations may not be compatible [235].

There are two types of operator algebras which differ precisely in the strength of the limit procedures under which they are closed [35, 208]. On the one hand there are C*-algebras, which are isomorphic to norm and adjoint closed algebras of operators on a Hilbert space. Norm limits are quite restrictive, so some operations are not possible in this framework. In particular, the spectral projections of a hermitian element of the algebra often do not lie again in the algebra (although all continuous functions of it will). Therefore it is often useful to extend the algebra by all elements obtained as weak limits (meaning that all matrix elements converge). In such von Neumann algebras the spectral theorem holds. Moreover, the limit of an increasing but bounded sequence of elements always converges in the algebra. For these algebras the distinction between normal and singular states becomes relevant. The normal states are simply those which respect such increasing limits, and at the same time those which can be represented by a density operator in the ambient Hilbert space.

A basic operation for von Neumann algebras is the formation of the commutant: for any set M ⊂ B(H) closed under the adjoint operation, we define its commutant as the von Neumann algebra (cf. Subsection 8.2.2)

    M′ = { X ∈ B(H) | [M, X] = 0 for all M ∈ M }.                  (13.8)

Then the Bicommutant Theorem [208] states that M′′ = (M′)′ is the smallest von Neumann algebra containing M. In particular, when M is already an algebra, M′′ is the weak closure of M. Von Neumann algebras are characterized by the property M′′ = M. A von Neumann algebra M with the property that its only elements commuting with all others are the multiples of the identity (i.e., M′ ∩ M′′ = C1I) is called a factor.

It might seem that the two ways out of the No-Go Theorem indicated at the end of the previous section are opposite to each other, but in fact they are closely related. For if ω is a state on a C*-algebra C ⊃ A ∪ B, we can associate with it a Hilbert space Hω, a representation πω : C → B(Hω), and a unit vector Ω ∈ Hω, such that ω(C) = ⟨Ω, πω(C)Ω⟩, and such that the vectors πω(C)Ω are dense in Hω. This is called the Gelfand-Naimark-Segal (GNS) construction [35]. In this new representation the given state ω is given by a density operator (namely |Ω⟩⟨Ω|), and the algebra can naturally be extended to the weak closure πω(C)′′.
The commutativity of the two subalgebras is preserved by the weak closure, so the normal state |Ω⟩⟨Ω| and the two commuting von Neumann subalgebras πω(A)′′ and πω(B)′′ are again a bipartite system, which describes essentially the same situation. The only difference is that some additional idealized observables arise from the weak closure operations, and that some observables in C (those with C ≥ 0 but ω(C) = 0) are represented by zero in πω.

We remark that von Neumann's incomplete infinite tensor product of Hilbert spaces can be seen as a special case of the GNS construction: the infinite tensor product of C*-algebras ⊗_i A_i is well-defined (see [35, Sec. 2.6] for precise conditions), essentially by taking the norm completion of the algebra of local observables, i.e., of elementary tensors ⊗_i A_i with all but finitely many factors A_i ∈ A_i equal to 1I_i. On this algebra the infinite tensor product of states is well-defined, and we get the incomplete tensor product as the GNS Hilbert space of the algebra ⊗_i B(H_i) with respect to the pure product state defined by the reference vectors χ_i.

13.4 Von Neumann algebras with maximal entanglement

13.4.1 Characterization and basic properties

Let us analyze the example given in the last section: the bipartite state obtained from the incomplete tensor product of singlets in H∞∞. We take as Alice's observable algebra A the von Neumann algebra generated by all local Alice operators (and analogously for Bob). The bipartite state on these algebras, given by the reference vector ⊗_i χ_i, then has the following properties:

ME 1 A and B together generate B(H) as a von Neumann algebra, so there are no additional observables of the system beyond those measurable by Alice and Bob.

ME 2 A and B are maximal with respect to mutual commutativity (i.e., A = B′ and B = A′).

ME 3 The overall state is pure, i.e., given by a vector Ω ∈ H.

ME 4 The restriction of this state to either subsystem is a trace, so ω(A₁A₂) = ω(A₂A₁) for A₁, A₂ ∈ A.

ME 5 A is hyperfinite, i.e., it is the weak closure of an increasing family of finite dimensional algebras.

These properties, except perhaps ME 2 (see [7]), are immediately clear from the construction and the properties of the respective local observables. They are also true for finite dimensional maximally entangled states on H = HA ⊗ HB, A = B(HA) ⊗ 1I, and B = 1I ⊗ B(HB). This justifies calling this particular bipartite system maximally entangled as well. There are many free parameters in this construction. For example, we could take arbitrary dimensions d_i < ∞ for the i-th pair. However, all these possibilities lead to the same maximally entangled system:

Theorem 13.4.1 All bipartite states on infinite dimensional systems satisfying conditions ME 1 - ME 5 above are unitarily isomorphic.

Proof. (Sketch). We first remark that A has to be a factor, i.e., A ∩ A′ = C1I. Indeed, using ME 1 and ME 2, we get A ∩ A′ = B′ ∩ A′ = (B ∪ A)′ = B(H)′ = C1I. Now consider the support projection S ∈ A of the restriction of the state to A. Thus 1I − S is the largest projection in A with vanishing expectation. Suppose that this projection does not lie in the center of A, i.e., there is an A ∈ A such that AS ≠ SA. Let X = (1I − S)AS, which must then be nonzero, as AS − SA = ((1I − S) + S)(AS − SA) = X − SA(1I − S). Then, using the trace property, we get ω(X*X) = ω(XX*) ≤ ‖A‖² ω(1I − S) = 0, which implies that the support projection of X*X has vanishing expectation.
But since X*X ≤ ‖A‖²S, this contradicts the maximality of (1I − S). It follows that S lies in the center of A, and that S = 1I because A is a factor. To summarize this argument, ω must be faithful, in the sense that A ∈ A, A ≥ 0, and ω(A) = 0 imply A = 0.

Now consider the subspace spanned by all vectors of the form AΩ with A ∈ A. This subspace is invariant under A, so its orthogonal projection P is in A′ = B. But since (1I − P) obviously has vanishing expectation, the previous arguments, applied to B, imply that P = 1I. This is to say that AΩ is dense in H or, in the jargon of operator algebras, that Ω is cyclic for A. Thus H is unitarily equivalent to the GNS Hilbert space of ω restricted to A, and the form of B = A′ is completely determined by this statement. Now a factor admits at most one trace state, so ω is uniquely determined by the isomorphism type of A as a von Neumann algebra, and it remains to show that A is uniquely determined by the above conditions. A is a factor admitting a faithful normal trace state, so it is a "type II₁ factor" in von Neumann's classification. It is also hyperfinite, so we can invoke a deep result of Alain Connes [61] stating that such a factor is uniquely determined up to isomorphism. □

For the rest of this section we will study further properties of this unique maximally entangled state of infinite entanglement. The items ME 6, ME 7 below are clear from the above proof. ME 8 follows by splitting the infinite tensor product either into a finite product and an infinite tail, or into factors with even and odd labels, respectively. ME 9 - ME 11 are treated in separate subsections as indicated.

ME 6 A and B are factors: A ∩ A′ = C1I.

ME 7 AΩ and BΩ are dense in H.

ME 8 The state contains infinite one-shot entanglement, which is not diminished by extracting entanglement. Moreover, it is unitarily isomorphic to two copies of itself.

ME 9 Every density operator on H maximally violates the Bell-CHSH inequality (see Section 13.4.2).

ME 10 The generalized Schmidt spectrum of Ω is flat (see Section 13.4.3).

ME 11 Every A ∈ A is completely correlated with a "double" B ∈ B (see Section 13.4.4).

13.4.2 Characterization by violations of Bell's inequalities

If we look at systems consisting of two qubits, maximally entangled states can be characterized in terms of maximal violations of Bell inequalities. It is natural to ask whether something similar holds for the infinite dimensional setting introduced in Section 13.4. To answer this question, consider again a bipartite state ω on an algebra containing two mutually commuting algebras A, B describing Alice's and Bob's observables, respectively. We define the Bell correlations with respect to A and B in ω as

    β(ω) = (1/2) sup ω( A₁(B₁ + B₂) + A₂(B₁ − B₂) ),               (13.9)

where the supremum is taken over all selfadjoint Aᵢ ∈ A, Bⱼ ∈ B satisfying −1I ≤ Aᵢ ≤ 1I and −1I ≤ Bⱼ ≤ 1I for i, j = 1, 2. In other words, A₁, A₂ and B₁, B₂ are (appropriately bounded) observables measurable by Alice and Bob, respectively. Of course, a classically correlated (separable) state, or any other state consistent with a local hidden variable model [229], satisfies the Bell-CHSH inequality β(ω) ≤ 1. Exactly as in the standard case, we can show Cirelson's inequality [56, 206, 233], bounding the quantum violations of the inequality as

    β(ω) ≤ √2.                                                     (13.10)

If the upper bound √2 is attained we speak of a maximal violation of Bell's inequality.
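For the familiar two-qubit case, the saturation of Cirelson's bound by the singlet can be checked directly. The following sketch (a minimal example in Python with numpy; the measurement settings are the standard optimal CHSH choice, an assumption of ours rather than anything prescribed in the text) evaluates β of (13.9) for the singlet and, with the same settings, for a product state.

    import numpy as np

    sx = np.array([[0, 1], [1, 0]], dtype=complex)
    sz = np.array([[1, 0], [0, -1]], dtype=complex)
    kron = np.kron

    # Singlet state and the standard optimal CHSH settings.
    singlet = (kron([1, 0], [0, 1]) - kron([0, 1], [1, 0])) / np.sqrt(2)
    A1, A2 = sz, sx
    B1 = -(sz + sx) / np.sqrt(2)
    B2 = (sx - sz) / np.sqrt(2)

    # Bell test operator; beta(omega) of (13.9) is half its expectation.
    T = kron(A1, B1 + B2) + kron(A2, B1 - B2)
    beta = 0.5 * np.real(singlet.conj() @ T @ singlet)
    print(beta, np.sqrt(2))   # both ~1.41421356: Cirelson's bound is saturated

    # A product state with the same settings stays within the classical bound 1.
    prod = kron([1, 0], [1, 0])
    print(0.5 * np.real(prod.conj() @ T @ prod))   # ~ -0.707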
It is clear that the maximally entangled state described above does saturate this bound: in the infinite tensor product construction of H = H∞∞ we only need to take observables Aᵢ, Bᵢ from the first tensor factor. But we could also have chosen similar observables A_{i,k}, B_{i,k} (i = 1, 2) for the k-th qubit pair. Let us denote by

    T_k = A_{1,k}(B_{1,k} + B_{2,k}) + A_{2,k}(B_{1,k} − B_{2,k})          (13.11)

the "test operator" for the k-th qubit pair, whose expectation enters the Bell-CHSH inequality. Then for a dense set of vectors φ ∈ H, namely for those differing from the reference vector in only finitely many positions, we get ⟨φ, T_k φ⟩ = √2 for all sufficiently large k. Since the norms ‖T_k‖ are uniformly bounded, a simple 3ε-argument shows that lim_{k→∞} ⟨φ, T_k φ⟩ = √2 for all φ ∈ H∞∞. By taking mixtures we find

    lim_{k→∞} tr(ρ T_k) = √2                                       (13.12)

for all density operators ρ on H∞∞. This property is clearly impossible in the finite dimensional case: any product state would violate it. This clarifies the statement in Section 13.3.3 that H∞∞ is in no way a tensor product of Hilbert spaces for Alice and Bob. Of course, we can simply define a product state on the algebra of local operators, and then extend it by the Hahn-Banach Theorem to all operators in B(H∞∞). However, just as the reference state of infinitely many singlets is a singular state on B(H∞ ⊗ H∞), any product state will necessarily be singular on B(H∞∞).

It is interesting that bipartite states with property (13.12) naturally arise in quantum field theory, with A and B the algebras of observables measurable in two causally disjoint (but tangent) spacetime regions. This is true under axiomatic assumptions on the structure of local algebras, believed to hold in any free or interacting theory. The only thing that enters is indeed the structure of the local von Neumann algebras, as shown by the following Theorem [204, 205, 206]. Again the maximally entangled state plays a key role.

Theorem 13.4.2 ([205]) Let A, B ⊂ B(H) be mutually commuting von Neumann algebras acting on a separable Hilbert space H. Then the following are equivalent:

(i) For some density operator ρ, which has no zero eigenvalues, we have β(ρ) = √2.

(ii) For every density operator ρ on H we have β(ρ) = √2.

(iii) There is a set T_k of test operators formed from A and B such that (13.12) holds for all density operators ρ.

(iv) There is a unitary isomorphism under which

    H = H∞∞ ⊗ H̃,    A = A₁ ⊗ Ã,    B = B₁ ⊗ B̃,

where A₁, B₁ ⊂ B(H∞∞) are the algebras of Theorem 13.4.1, and Ã, B̃ ⊂ B(H̃) are other von Neumann algebras.

In other words, the maximal violation of Bell's inequalities for all normal states implies that the bipartite system is precisely the maximally entangled state, plus some additional degrees of freedom (Ã, B̃), which do not contribute to the violation of Bell inequalities.

13.4.3 Schmidt decomposition and modular theory

The Schmidt decomposition (Proposition 2.2.1) is a key technique for analyzing bipartite pure states in the standard framework. It represents an arbitrary vector Ω ∈ HA ⊗ HB as

    Ω = Σ_α c_α e_α ⊗ f_α,                                          (13.13)

where the c_α > 0 are positive constants, and {e_α} ⊂ HA and {f_α} ⊂ HB are orthonormal systems. Its analog in the context of von Neumann algebras is a highly developed theory with many applications in quantum field theory and statistical mechanics, known as the modular theory of Tomita and Takesaki [207].
We recommend Chapter 2.5 in [35] for an excellent exposition, and only outline some ideas and indicate the connection to the Schmidt decomposition. Throughout this subsection we will assume that A, B ⊂ B(H) are von Neumann algebras and Ω ∈ H is a unit vector, such that the properties ME 2, ME 3, and ME 7 of Section 13.4.1 hold.

As in the case of the usual Schmidt decomposition, the essential information is already contained in the restriction of the given state to the subalgebra A, i.e., in the linear functional ω(A) = ⟨Ω, AΩ⟩. Indeed, the Hilbert space and the cyclic vector Ω (cf. ME 7) satisfy precisely the conditions for the GNS representation, which is unique up to unitary equivalence. Moreover, condition ME 2 fixes B as the commutant algebra. However, since A often does not admit a trace, we cannot represent ω by a density operator, and therefore we cannot use the spectrum of a density operator to characterize ω.

Surprisingly, it is equilibrium statistical mechanics which provides the notion to generalize. In the finite dimensional context, we can consider every density operator as a canonical equilibrium state, and determine from it the Hamiltonian of the system. This in turn defines a time evolution. Note that the Hamiltonian is only defined up to a constant, so we cannot expect to reconstruct the eigenvalues of H, but only the spectrum of the Liouville operator σ ↦ i[σ, H], which generates the dynamics on density operators and has eigenvalues i(E_n − E_m), where the E_n are the eigenvalues of H. The connection between time evolutions and equilibrium states makes sense also for von Neumann algebras, and can be seen as the physical interpretation of modular theory [35].

We begin the outline of this theory with the anti-linear operator S on H, defined by

    S(AΩ) = A*Ω,    A ∈ A.                                          (13.14)

It turns out to be closable, and we denote its closure by the same letter. As a closed operator, S admits a polar decomposition

    S = J Δ^{1/2},                                                  (13.15)

which defines the anti-unitary modular conjugation J and the positive modular operator Δ.

Let us calculate Δ in the standard situation, where H = K ⊗ K, A = B(K) ⊗ 1I and B = 1I ⊗ B(K), and Ω is in Schmidt form (13.13). Due to assumption ME 7 (cyclicity), the orthonormal systems e_α and f_α even have to be complete (i.e., bases). Now consider (13.14) with A = |e_β⟩⟨e_γ| ⊗ 1I, which becomes

    S( c_γ e_β ⊗ f_γ ) = c_β e_γ ⊗ f_β,                              (13.16)

from which we readily get

    Δ^{1/2} = ρ^{1/2} ⊗ ρ^{−1/2},    and    J = F(Θ ⊗ Θ),             (13.17)

where ρ = Σ_α c_α² |e_α⟩⟨e_α| is the reduced density operator, F φ₁ ⊗ φ₂ = φ₂ ⊗ φ₁ is the flip operator, and Θ denotes complex conjugation in the e_n basis. The time evolution with Hamiltonian H = −log ρ + c1I, for which ω is now the equilibrium state with unit temperature, is then given by E_t(A) ⊗ 1I = Δ^{it}(A ⊗ 1I)Δ^{−it}.

In the case of general von Neumann algebras, the spectrum of Δ need no longer be discrete, and it can be a general positive, but unbounded, selfadjoint operator. It turns out that Δ^{it} still defines a time evolution on the algebra A, the so-called modular evolution. The equilibrium condition cannot be written directly in the Gibbs form ρ ∝ exp(−H), since there is no density matrix any more, but has to be replaced by the so-called KMS condition, a boundary condition for the analytic continuation of correlation functions [35, 104], which links the modular evolution to the state. In the standard situation, the eigenvalue 1 of Δ plays a special role, because it points to degeneracies in the Schmidt spectrum.
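In this finite dimensional situation, (13.15)-(13.17) can be checked directly. The following sketch (a minimal example in Python with numpy; the encoding of vectors on K ⊗ K as coefficient matrices V, with (X ⊗ Y) acting as X V Yᵀ, is a convention of ours) verifies that J Δ^{1/2} reproduces the defining action S(A ⊗ 1I)Ω = (A* ⊗ 1I)Ω for a random A and a full-rank Schmidt vector.

    import numpy as np
    rng = np.random.default_rng(0)

    d = 4
    # A full-rank Schmidt vector Omega = sum_a c_a |a,a>, encoded as the matrix diag(c).
    c = rng.random(d) + 0.1
    c /= np.linalg.norm(c)
    Omega = np.diag(c)

    def J(V):
        """Modular conjugation J = F(Theta x Theta): conjugate entrywise, then flip the factors."""
        return V.conj().T

    def Delta_half(V):
        """Delta^{1/2} = rho^{1/2} (x) rho^{-1/2} acting on a coefficient matrix."""
        return np.diag(c) @ V @ np.diag(1.0 / c)

    A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    lhs = A.conj().T @ Omega              # S(A (x) 1)Omega = (A* (x) 1)Omega
    rhs = J(Delta_half(A @ Omega))        # J Delta^{1/2} (A (x) 1)Omega
    print(np.allclose(lhs, rhs))          # True: S = J Delta^{1/2}, as in (13.15)-(13.17)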
In the extreme case of a maximally entangled state all c_α are equal, and Δ = 1I or, equivalently, S is anti-unitary. This characterization of maximal entanglement carries over to the von Neumann algebra case: S is anti-unitary if and only if, for all A₁, A₂ ∈ A,

    ⟨Ω, A₁A₂Ω⟩ = ⟨A₁*Ω, A₂Ω⟩ = ⟨SA₁Ω, SA₂*Ω⟩ = ⟨A₂*Ω, A₁Ω⟩ = ⟨Ω, A₂A₁Ω⟩.

This is precisely the trace property ME 4.

13.4.4 Characterization by the EPR-doubles property

In the original EPR argument it is crucial that certain observables of Alice and Bob are perfectly correlated, so that Alice can find the values of observables on Bob's side with certainty, without Bob having to carry out this measurement. An approach to studying such correlations was proposed recently by Arens and Varadarajan [8]. The basic idea, stripped of some measure theoretic overhead and extended to the more general bipartite systems considered here [236], rests on the following definition. Let A, B be commuting observable algebras and ω a state on an algebra containing both A and B. Then we say that an element B ∈ B is an EPR-double of A ∈ A, or that A and B are doubles (of each other), if

    ω( (A* − B*)(A − B) ) = ω( (A − B)(A* − B*) ) = 0.               (13.18)

Of course, when A and B are hermitian the two expressions coincide, and in this case there is a simple interpretation of equation (13.18). Since A and B commute, we can consider their joint distribution (measuring the joint spectral resolution of A and B). Then (A − B)² is a positive quantity, which has vanishing expectation if and only if the joint distribution is concentrated on the diagonal, i.e., if the measured values coincide with probability one. Basic properties are summarized in the following Lemma.

Lemma 13.4.3 Let ω be a state on a C*-algebra containing commuting subalgebras A and B. Then

(i) A and B are doubles iff for all C in the ambient observable algebra we have ω(AC) = ω(BC) and ω(CA) = ω(CB).

(ii) If A₁, A₂ have doubles B₁, B₂, then A₁*, A₁ + A₂, and A₁A₂ have doubles B₁*, B₁ + B₂, and B₂B₁, respectively.

(iii) When A and B are normal (AA* = A*A) and doubles of each other, then so are f(A) and f(B), where f is any continuous complex valued function on the spectrum of A and B, evaluated in the functional calculus.

(iv) When A and B are von Neumann algebras, ω is a normal state, and observables A_n with doubles B_n converge in weak*-topology to A, then every cluster point of the sequence B_n is a double of A.

(v) Suppose that ω restricted to B is faithful (i.e., B ∋ B ≥ 0 and ω(B) = 0 imply B = 0). Then every A ∈ A admits at most one double.

Proof. (i) One direction is obvious by setting C = A* − B*. The other direction follows from the Schwarz inequality |ω(X*Y)|² ≤ ω(X*X)ω(Y*Y). The remaining items follow directly from (i). (iii) is obvious from (ii) for polynomials in A and A*, and extends to continuous functions by taking norm limits of the polynomial approximations to f provided by the Stone-Weierstraß approximation theorem. For (iv) one has to use the weak*-continuity of the product in each factor separately (see e.g. [189, Theorem 1.7.8]). □

In the situation we have assumed for modular theory, we can give a detailed characterization of the elements admitting a double:

Proposition 13.4.4 Suppose A and B = A′ are von Neumann algebras on a Hilbert space H, and the state ω is given by a vector Ω ∈ H, which is cyclic for both A and B.
Then for every A ∈ A the following conditions are equivalent:

(i) A has an EPR-double B ∈ B.

(ii) A is in the centralizer of the restricted state, i.e., ω(AA₁) = ω(A₁A) for all A₁ ∈ A.

(iii) A is invariant under the modular evolution: Δ^{it} A Δ^{−it} = A for all t ∈ R.

In this case the double is given by B = JA*J.

Proof. (i)⇒(ii) When A has a double B, we get ω(AA₁) = ω(BA₁) = ω(A₁B) = ω(A₁A) for all A₁ in the ambient observable algebra. (ii)⇔(iii) This is a standard result (see, e.g., [16, Prop. 15.1.7]). (iii)⇒(i) Since Δ^{it}Ω = Ω, (iii) implies Δ^{it}AΩ = AΩ, so AΩ is an eigenvector for eigenvalue 1 of the unitary Δ^{it}, and ΔAΩ = AΩ. By the same token, ΔA*Ω = A*Ω. We claim that in this case B = JA*J ∈ B is a double of A in B: we have BΩ = JA*JΩ = JA*Ω = JSAΩ = ΔAΩ = AΩ and, similarly, B*Ω = A*Ω. From this (i) follows immediately. The formula for B was established in the last part of the proof. Uniqueness follows from Lemma 13.4.3. □

Two special cases are of interest. On the one hand, in the standard case of a pure bipartite state we get a complete characterization of the observables which possess a double: they are exactly the ones commuting with the reduced density operator [8]. On the other hand, we can ask under what circumstances all A ∈ A admit a double. Clearly, this is the case when the centralizer in (ii) of the Proposition is all of A, i.e., if and only if the restricted state is a trace. Again this characterizes the maximally entangled states on finite dimensional algebras, and the unique infinite dimensional one for hyperfinite von Neumann algebras.

13.5 The original EPR state

In their famous 1935 paper [82] Einstein, Podolsky and Rosen studied two quantum particles with perfectly correlated momenta and perfectly anticorrelated positions. It is immediately clear that such a state does not exist in the standard framework of Hilbert space theory: the difference of the positions is a self-adjoint operator with purely absolutely continuous spectrum, so whatever density matrix we choose, the probability distribution of this quantity will have a probability density with respect to Lebesgue measure, and cannot be concentrated on a single point. Consequently, the wave function written in [82] is a pretty wild object. Essentially it is Ψ(x₁, x₂) = c δ(x₁ − x₂ + a), with the Dirac delta function, and c a "normalization factor" which must vanish, because the normalization integral for the delta function is undefined, but infinite if anything.

How could such a profound physical argument be based on such an ill-defined object? The answer is probably that the authors were completely aware that they were really talking about a limiting situation of more and more sharply peaked wave functions. We could model them by a sequence of more and more highly squeezed two mode Gaussian states (cf. Subsection 13.5.5), or some other sequence representation of the delta function. The key point is that the main argument does not depend on the particular approximating sequence. But then we should also be able to discuss the limiting situation directly in a rigorous way, and extract precisely what is common to all approximations of the EPR state.

13.5.1 Definition

In this section we consider a family of singular states which describes quite well what Einstein, Podolsky and Rosen may have had in mind.
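Before setting up this continuous variable construction, the finite dimensional doubles criterion just discussed can be made concrete in a few lines. The sketch below (a minimal example in Python with numpy; the state, the observables, and the encoding of vectors as Schmidt-basis coefficient matrices are choices of ours) checks that an observable commuting with the reduced density operator has an EPR-double, given in this representation by its transpose in the Schmidt basis acting on Bob's side, while a generic observable has none.

    import numpy as np
    rng = np.random.default_rng(1)

    d = 3
    c = np.array([0.8, 0.5, 0.33]); c /= np.linalg.norm(c)
    Omega = np.diag(c)                 # pure state with distinct Schmidt coefficients c_a
    rho = np.diag(c**2)                # reduced density operator

    def defect(A, B):
        """|| (A (x) 1 - 1 (x) B) Omega ||, i.e. omega((A-B)*(A-B)) for the vector state."""
        V = A @ Omega - Omega @ B.T    # (X (x) Y) acts on coefficient matrices as X V Y^T
        return np.linalg.norm(V)

    A_good = np.diag([2.0, -1.0, 0.5])   # commutes with rho (same eigenbasis)
    A_bad = rng.normal(size=(d, d))      # generic, does not commute with rho

    print(defect(A_good, A_good.T))      # ~0: the transpose in the Schmidt basis is a double
    print(defect(A_bad, A_bad.T))        # > 0: no double exists, since [A_bad, rho] != 0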
Throughout we assume we are in the usual Hilbert space H = L²(R²) for describing two canonical degrees of freedom, with position and momentum operators Q₁, Q₂, P₁, P₂. The basic observation is that the operators P₁ + P₂ and Q₁ − Q₂ commute as a consequence of the Heisenberg commutation relations. Therefore we can evaluate in the functional calculus (i.e., using a joint spectral resolution) any function of the form g(P₁ + P₂, Q₁ − Q₂), where g : R² → C is an arbitrary bounded continuous function. We define an EPR state as any state ω such that

    ω( g(P₁ + P₂, Q₁ − Q₂) ) = g(0, a),                             (13.19)

where a is the fixed distance between the particles.

Several comments are in order. First of all, if we take any sequence of vectors to "approximate" the EPR wave function (and adjust normalization on the way), weak*-cluster points of the corresponding sequence of pure states exist by compactness of the state space, and all these will be EPR states in the sense of our definition. Secondly, condition (13.19) does not fix ω uniquely. Indeed, different approximating sequences may lead to different ω. Even for a fixed approximating sequence it is rarely the case that the expectation values of all bounded operators converge, so the sequence will have many different cluster points. Thirdly, the existence of EPR states can also be seen more directly: the algebra of bounded continuous functions on R² is faithfully represented in B(H) (i.e., g(P₁ + P₂, Q₁ − Q₂) = 0 only when g is the zero function). On that algebra the point evaluation at (0, a) is a well defined state, so any Hahn-Banach extension of this state to all of B(H) will be an EPR state⁴.

⁴ The reason for defining EPR states with respect to continuous functions of P₁ + P₂ and Q₁ − Q₂, rather than, say, measurable functions, is that we need faithfulness. The functional calculus is well defined also for measurable functions, but some functions will evaluate to zero. In particular, for the function g(p, x) = 1 for x = a and p = 0, but g(p, x) = 0 for all other points, we get g(P₁ + P₂, Q₁ − Q₂) = 0, because the joint spectrum of these operators is purely absolutely continuous. Hence condition (13.19), extended to measurable functions, would require the expectation of the zero operator to be 1.

In our further analysis we will only look at properties which are common to all EPR states, and which are hence independent of any choice of approximating sequences. The basic technique for extracting such properties from (13.19) is to use positivity of ω in the form of the Schwarz inequality |ω(A*B)|² ≤ ω(A*A)ω(B*B). For example, we get

    ω(X ĝ) = ω(ĝ X) = g(0, a) ω(X),                                  (13.20)

where ĝ is shorthand for g(P₁ + P₂, Q₁ − Q₂) for some bounded continuous function g, and X ∈ B(H) is an arbitrary bounded operator. This is shown by taking A = X* and B = (ĝ − g(0, a)1I) (or A = (ĝ − g(0, a)1I) and B = X) in the Schwarz inequality.

13.5.2 Restriction to the CCR-algebra

Next we consider the expectations of Weyl operators

    W(ξ₁, ξ₂, η₁, η₂) = e^{i(ξ₁P₁ + ξ₂P₂ − η₁Q₁ − η₂Q₂)} = e^{i(ξ⃗·P⃗ − η⃗·Q⃗)}.      (13.21)

Obviously, if ξ₁ = ξ₂ and η₁ = −η₂, which we will abbreviate as (ξ⃗, η⃗) ∈ S, we have W(ξ⃗, η⃗) = ĝ for a uniformly continuous g, so (13.19) determines the expectation. Combining it with Equation (13.20) we get:

    ω( W(ξ⃗, η⃗) X ) = ω( X W(ξ⃗, η⃗) ) = ω(X),    for (ξ⃗, η⃗) ∈ S.      (13.22)

In particular, the state is invariant under all phase space translations by vectors in S. This is already sufficient to conclude that the state is purely singular, i.e., that ω(K) = 0 for every compact operator K, and in particular for all finite dimensional projections. An even stronger statement is that the restrictions to Alice's and Bob's subsystems are purely singular.

Lemma 13.5.1 For any EPR state, and any compact operator K, ω(K ⊗ 1I) = 0.

Proof.
Indeed, the restricted state is invariant under all phase space translations, since we can extend W(ξ, η) to a Weyl operator of the total system, i.e., W′(ξ, η) = W(ξ, ξ, η, −η) ≅ W(ξ, η) ⊗ W(ξ, −η), with (ξ, ξ, η, −η) ∈ S, and

    ω( (W(ξ, η) A W(ξ, η)*) ⊗ 1I ) = ω( W′(ξ, η)(A ⊗ 1I)W′(ξ, η)* ).     (13.23)

Now consider a unit vector χ with bounded support in position space, and let K = |χ⟩⟨χ| be the corresponding one-dimensional projection. Then sufficiently widely spaced translates W(nξ₀, 0)χ are orthogonal, and hence, for all N, the operator K_N = Σ_{n=1}^N W(nξ₀, 0) K W(nξ₀, 0)* is bounded by 1I. Hence N ω(K) = ω(K_N) ≤ ω(1I) = 1, and ω(K) = 0. Since vectors of compact support are norm dense in Hilbert space, the conclusion holds for arbitrary K. □

For other Weyl operators we get the expectations from the Weyl commutation relations

    W(ξ⃗, η⃗) W(ξ⃗′, η⃗′) = e^{iσ/2} W(ξ⃗ + ξ⃗′, η⃗ + η⃗′),    with σ = ξ⃗·η⃗′ − ξ⃗′·η⃗.     (13.24)

This is just a form of the Heisenberg commutation relations. Now S is a so-called maximal isotropic subspace of phase space, which is to say that the commutation phase σ vanishes for (ξ⃗, η⃗), (ξ⃗′, η⃗′) ∈ S, and no subspace of phase space strictly including S has the same property. For a point (ξ⃗, η⃗) in phase space which does not belong to S, we can find some vector (ξ⃗′, η⃗′) ∈ S such that the commutation phase e^{iσ} ≠ 1 is non-trivial. Combining the Weyl relations (13.24) with the invariance (13.22) gives

    ω(W(ξ⃗, η⃗)) = ω(W(ξ⃗′, η⃗′) W(ξ⃗, η⃗)) = e^{iσ} ω(W(ξ⃗, η⃗) W(ξ⃗′, η⃗′)) = e^{iσ} ω(W(ξ⃗, η⃗)),

which implies that the expectation values

    ω( W(ξ⃗, η⃗) ) = 0    for (ξ⃗, η⃗) ∉ S                              (13.25)

must vanish. With equations (13.22) and (13.25) we have a complete characterization of the state ω restricted to the "CCR-algebra", which is just the C*-algebra generated by the Weyl operators. Since this is a well-studied object, one might make these equations the starting point of an investigation of EPR states. However, one can see that (13.19) is strictly stronger: there are states which look like ω on the CCR-algebra, but which give an expectation in (13.19) corresponding to a limit of states going to infinity instead of going to zero.

13.5.3 EPR-correlations

How about the correlation property, which is so important in the EPR argument? The best way to show this is the 'double' formalism of Section 13.4.4, in which we denote by Z the norm closed subalgebra of operators on L²(R) generated by all operators of the form f(ξP + ηQ), where f : R → C is an arbitrary uniformly continuous function evaluated in the functional calculus on a real linear combination ξP + ηQ of position and momentum⁵. This algebra is fairly large: it contains many observables of interest, in particular all Weyl operators and all compact operators.
It is closed under phase space translations, and these act continuously in the sense that, for Z ∈ Z, ‖W(ξ, η) Z W(ξ, η)* − Z‖ → 0 as (ξ, η) → 0⁶.

Theorem 13.5.2 All operators of the form Z ⊗ 1I with Z ∈ Z have doubles in the sense of equation (13.18). Moreover, the double of Z ⊗ 1I is 1I ⊗ Zᵀ, where Zᵀ denotes the transpose (adjoint followed by complex conjugation) in the position representation.

Proof. We only have to show that for f(ξP + ηQ) ⊗ 1I = f(ξP₁ + ηQ₁) we get the double 1I ⊗ f(−ξP + ηQ) = f(−ξP₂ + ηQ₂), when f, ξ, and η are as in the definition of Z. By the general properties of the double construction this will then automatically extend to operator products and norm limits.

Fix ε > 0. Since f is uniformly continuous, there is some δ > 0 such that |f(x) − f(y)| ≤ ε whenever |x − y| ≤ δ. Now pick a continuous function h : R → [0, 1] ⊂ R such that h(0) = 1 and h(t) = 0 for |t| > δ. We consider the operator

    M = ( f(ξP₁ + ηQ₁) − f(−ξP₂ + ηQ₂) ) · h( ξ(P₁ + P₂) + η(Q₁ − Q₂) )
      = F( ξP₁ + ηQ₁, −ξP₂ + ηQ₂ ),

where F(x, y) = (f(x) − f(y)) h(x − y), and this function is evaluated in the functional calculus of the commuting selfadjoint operators (ξP₁ + ηQ₁) and (−ξP₂ + ηQ₂). But the function F satisfies |F(x, y)| ≤ ε for all (x, y): when |x − y| > δ the h-factor vanishes, and on the strip |x − y| ≤ δ we have |f(x) − f(y)| ≤ ε. Therefore ‖M‖ ≤ ε. Let X be an arbitrary operator. Then

    | ω( [f(ξP₁ + ηQ₁) − f(−ξP₂ + ηQ₂)] X ) | = |ω(M X)| ≤ ‖M‖ ‖X‖ ≤ ε ‖X‖.

Here we have inserted the factor h(ξ(P₁ + P₂) + η(Q₁ − Q₂)) at the equality sign, which we may do because of (13.20), and because h is a function of the appropriate operators which equals 1 at the origin. Since this estimate holds for any ε, we conclude that the first relation in Lemma 13.4.3 (i) holds. The argument for the second relation is completely analogous. □

⁵ The same type of operators, although motivated by a different argument, already appears in [60].
⁶ This continuity is crucial in the correspondence theory set out in [235]. We were not able to prove the analogue of Theorem 13.5.2 by only assuming this continuity.

13.5.4 Infinite one-shot entanglement

In order to show that the EPR state is indeed highly entangled, let us verify that it contains infinite one-shot entanglement in the sense forbidden by Theorem 13.2.1. The local operations needed to extract a d-dimensional system will be simply the restriction to a subalgebra. In other words, we will construct subalgebras A_d ⊂ A and B_d ⊂ B such that the state ω restricted to A_d ⊗ B_d will be a maximally entangled pure state of d-dimensional systems.

The matrix algebras A_d, B_d are best seen to be generated by Weyl operators satisfying a discrete version of the canonical commutation relations (13.24), with the addition operation on the right hand side replaced by the addition in a finite group. Let Z_d denote the cyclic group of integers modulo d. With the canonical basis |k, ℓ⟩, k, ℓ ∈ Z_d, we introduce the Weyl operators

    w(n₁, m₁, n₂, m₂) |k, ℓ⟩ = ζ^{n₁(k−m₁) + n₂(ℓ−m₂)} |k − m₁, ℓ − m₂⟩,     (13.26)

where ζ = exp(2πi/d) is the d-th root of unity. These are a basis of the vector space B(C^d ⊗ C^d), which shows that this algebra is generated by the four unitaries u₁ = w(1, 0, 0, 0), v₁ = w(0, 1, 0, 0), u₂ = w(0, 0, 1, 0) and v₂ = w(0, 0, 0, 1). They are defined algebraically by the relations v_k u_k = ζ u_k v_k, k = 1, 2, and u₁^d = u₂^d = v₁^d = v₂^d = 1I.
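A minimal numerical sketch of the clock-and-shift pair underlying (13.26) (Python with numpy; the choice d = 5 is an arbitrary assumption of ours): u₁, v₁ and u₂, v₂ are just these two matrices acting on the respective tensor factors.

    import numpy as np

    d = 5
    zeta = np.exp(2j * np.pi / d)

    # Clock and shift on C^d: u|k> = zeta^k |k>,  v|k> = |k-1 mod d>.
    u = np.diag(zeta ** np.arange(d))
    v = np.roll(np.eye(d), -1, axis=0)

    print(np.allclose(v @ u, zeta * u @ v))                        # v u = zeta u v
    print(np.allclose(np.linalg.matrix_power(u, d), np.eye(d)),
          np.allclose(np.linalg.matrix_power(v, d), np.eye(d)))    # u^d = v^d = 1

    # The d^2 monomials u^n v^m are linearly independent, hence span the d x d matrices.
    basis = np.array([(np.linalg.matrix_power(u, n) @ np.linalg.matrix_power(v, m)).ravel()
                      for n in range(d) for m in range(d)])
    print(np.linalg.matrix_rank(basis) == d * d)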
The one-dimensional projection onto the standard maximally entangled vector Ω = d^{−1/2} Σ_k |kk⟩ can be expressed in the basis (13.26) as

    |Ω⟩⟨Ω| = (1/d²) Σ_{n,m} w(n, m, −n, m) = (1/d²) Σ_{n,m} (u₁u₂^{−1})^n (v₁v₂)^m,      (13.27)

which will be useful for computing fidelity.

In order to define the subalgebras extracting the desired entanglement we first define operators U₁, V₁ in Alice's subalgebra and U₂, V₂ in Bob's, which satisfy the above relations and hence generate two copies of the d × d matrices. It is easy to satisfy the commutation relations V_k U_k = ζ U_k V_k by taking appropriate Weyl operators, say

    Ũ₁ = e^{iQ₁},    Ũ₂ = e^{i(Q₂ − a)},    and    Ṽ_k = e^{iξP_k},        (13.28)

with ξ = 2π/d. The tilde indicates that these are not quite the operators we are looking for yet, because they do not satisfy the periodicity relations: Ũ₁^d = exp(idQ₁) ≠ 1I, and similarly for Ũ₂ and the Ṽ_k. We will denote by Ã the C*-algebra generated by the operators Ũ₁, Ṽ₁ of (13.28). The algebra B̃ is constructed analogously. Then, by virtue of the commutation relations, Ũ₁^d and Ṽ₁^d commute with all other elements of Ã, i.e., they belong to the center C_A ⊂ Ã, which represents the classical variables of the system. In the same manner, Ũ₂^d and Ṽ₂^d generate the center C_B of Bob's algebra B̃⁷.

If we take any continuous function (in the functional calculus) of a hermitian or unitary element of C_A, it will still be in C_A. If we take a measurable (possibly discontinuous) function, the result may fail to be in C_A, but it still commutes with all elements of Ã (and analogously for Bob's algebras). In particular, we construct the operators

    Û_k = ( Ũ_k^d )^{1/d},                                              (13.29)

where the d-th root of numbers on the unit circle is taken with a branch cut on the negative real axis. This branch cut makes the function discontinuous, and also makes this odd-looking combination very different from Ũ_k. We define V̂_k analogously, and set

    U_k = Û_k^{−1} Ũ_k    and    V_k = V̂_k^{−1} Ṽ_k                       (13.30)

for k = 1, 2. Then, since the Û_k, V̂_k commute with Ã ⊗ B̃, the commutation relations V_k U_k = ζ U_k V_k still hold, but in addition we have U_k^d = 1I, because Û_k^d = Ũ_k^d. It remains to show that on the finite dimensional algebras generated by these operators, the given state is a maximally entangled pure state. We will verify this by computing the fidelity, i.e., the expectation of the projection (13.27):

    ω( (1/d²) Σ_{n,m} (U₁U₂^{−1})^n (V₁V₂)^m ) = 1.                      (13.31)

Proof of this equation. We have shown in Section 13.5.3 that Ũ₁ and Ũ₂ are EPR-doubles. This property transfers to arbitrary continuous functions of Ũ₁ and Ũ₂ by Lemma 13.4.3 and uniform approximation of continuous functions by polynomials. However, because the state ω is not normal, it does not transfer automatically to the measurable functional calculus, and hence not automatically to Û₁ and Û₂. We claim that this is true nonetheless.

Denote by r_d(z) = z^{1/d} the d-th root function with the branch cut as described, and let f_ε be a continuous function from the unit circle to the unit interval [0, 1] such that f_ε(z) = 1 except for z in an ε-neighborhood of z = −1 in arclength, and such that f_ε(−1) = 0. Then the function z ↦ f_ε(z) r_d(z) is continuous. Then, since Ũ₁^d and Ũ₂^d are doubles, so are f_ε(Ũ₁^d), f_ε(Ũ₁^d)Û₁ and their counterparts.

⁷ The C*-algebra Ã is isomorphic to the continuous sections in a C*-algebra bundle over the torus, where each fiber is a copy of the algebra A_d.
Such a bundle is called trivial if it is isomorphic to the tensor product A_d ⊗ C_A. This would directly give us the desired subalgebra A_d as a subalgebra of A. However, this bundle is not trivial [34, 110]. In order to "trivialize" the bundle, we are therefore forced to go beyond norm continuous operations, which respect the continuity of bundle sections. Instead we have to go to the measurable functional calculus, and introduce an operation on the fibers which depends discontinuously on the base point, through the introduction of a branch cut.

Note that both of these commute with all other operators involved. Hence (using the notation |X|² = X*X or |X|² = XX*, which coincide in this case)

    ω( f_ε(Ũ₁^d)² |Û₁ − Û₂|² ) = ω( | f_ε(Ũ₁^d)Û₁ − f_ε(Ũ₂^d)Û₂ |² ) = 0,        (13.32)

where the first equality holds by expanding the modulus square, and applying the double property of f_ε(Ũ₁^d) where appropriate. On the other hand, we have

    ω( (1I − f_ε(Ũ₁^d))² |Û₁ − Û₂|² ) ≤ 4 ω( 1I − f_ε(Ũ₁^d)² ) ≤ 4ε/π,            (13.33)

because ‖Û₁ − Û₂‖ ≤ 2 and 0 ≤ f_ε(Ũ₁^d) ≤ 1I. For the estimate we used that f_ε(z)² = 1 for all z on the unit circle except a section of relative size 2ε/(2π), and that the probability distribution for the spectrum of Ũ₁^d is uniform, because the expectation of all powers (Ũ₁^d)^n = exp(indQ₁) vanishes.

Adding (13.32) and (13.33) we find that ω(|Û₁ − Û₂|²) ≤ 4ε/π for every ε, and hence that Û₁ and Û₂ are EPR doubles as claimed. The proof that V̂₁ and V̂₂* are likewise doubles (just as Ṽ₁ and Ṽ₂*) is entirely analogous. Hence U₁ and U₂ as well as V₁ and V₂ are also doubles. Applying this property in the fidelity expression (13.31) we find that every term has expectation one, so that with the prefactor d^{−2} the d² terms add up to one as claimed. □

13.5.5 EPR states based on two mode Gaussians

In this section we will deviate from the announcement that we intended to study only such properties of EPR states which follow from the definition alone, and are hence common to all EPR states. The reason is that there is one particular family which has a lot of additional symmetry, and hence more operators admitting doubles than general EPR states. Moreover, it is very well known. In fact, most people working in quantum optics probably have a very concrete picture of the EPR state, or rather of an approximation to this state: since Gaussian states play a prominent role in the description of lasers, it is natural to consider a Gaussian wave function of the form

    Ψ_λ(x₁, x₂) = (1/√π) exp( − [(1−λ)/(4(1+λ))] (x₁ + x₂)² − [(1+λ)/(4(1−λ))] (x₁ − x₂)² )     (13.34)

    Ψ_λ = √(1 − λ²) Σ_{n=0}^∞ λ^n e_n ⊗ e_n,                                    (13.35)

where e_n denotes the eigenbasis of the harmonic oscillators H_i = (P_i² + Q_i²)/2 (i = 1, 2). This state is also known as the NOPA state, and the parameter λ ∈ [0, 1) is related to the so-called squeezing parameter r by λ = tanh(r). Values around r = 5 are considered a good experimental achievement [144]. Of course, we are interested in the limit r → ∞, or λ → 1. The λ-dependence of the wave function can also be written as

    Ψ_λ(x₁, x₂) = Ψ₀( x₁ cosh η + x₂ sinh η, −x₁ sinh η + x₂ cosh η ),           (13.36)

where the hyperbolic angle η is r/2. It is easy to see that for any wave function Ψ₀ the probability distributions of both Q₁ − Q₂ and P₁ + P₂ scale to point measures at zero.
Hence any cluster point of the associated sequence of states ω_λ(X) = ⟨Ψ_λ, XΨ_λ⟩ is an EPR state in the sense of our definition (with shift parameter a = 0). Note, however, that the family itself does not converge to any state: it is easy to construct observables X for which the expectation ω_λ(X) keeps oscillating between 0 and 1 as λ → 1. Here, as in the general case, a single state can only be obtained by going to a suitable subsequence (or by taking the limit along an ultrafilter).

The virtue of the particular family (13.35) is that it has especially high symmetry: it is immediately clear that

    ( f(H₁) − f(H₂) ) Ψ_λ = 0                                           (13.37)

for all λ, and for all bounded functions f : N → C of the oscillator Hamiltonians H₁, H₂. This implies that f(H₁) and f(H₂) are doubles with respect to the state ω_λ for each λ. Clearly, this property remains valid in the limit along any subsequence, so all EPR states obtained as cluster points of the sequence ω_λ also have f(H₁) in their algebra of doubles. Consequently, the unitaries U_k(t) = exp(itH_k) are also doubles of each other, and the limiting states are invariant under the time evolution U₁₂(t) = U₁(t) ⊗ U₂(−t). This is certainly suggestive, because oscillator time evolutions have an interpretation as linear symplectic transformations on phase space: Q_k ↦ Q_k cos t ± P_k sin t and P_k ↦ ∓Q_k sin t + P_k cos t, where the upper sign holds for k = 1 and the lower for k = 2. The subspace S from Section 13.5.3 is invariant under such rotations, and one readily verifies that the time evolution U₁₂(t) takes EPR states into EPR states. This certainly implies that by averaging we can generate EPR states invariant under this evolution, and we have clearly just constructed a family with this invariance.

As λ → 1, the Schmidt spectrum in (13.35) becomes "flatter", which suggests that exchanging some labels n should also define a unitary with double. Let p : N → N denote an injective (i.e., one-to-one, but not necessarily onto) map. Then we define an isometry V_p by

    V_p e_n = e_{p(n)}                                                   (13.38)

with adjoint

    V_p* e_n = e_{p^{−1}(n)}  if n ∈ p(N),    and    V_p* e_n = 0  if n ∉ p(N).      (13.39)

Let us assume that p has finite distance, i.e., there is a constant ℓ such that |p(n) − n| ≤ ℓ for all n ∈ N. We claim that in this case V_p ⊗ 1I and 1I ⊗ V_p* are doubles in all EPR states constructed from the sequence (13.35). We show this by verifying that the condition holds approximately already for finite λ. Consider the vector

    Δ_λ = ( V_p ⊗ 1I − 1I ⊗ V_p* ) Ψ_λ = √(1 − λ²) Σ_{n=0}^∞ (λ^n − λ^{p(n)}) e_{p(n)} ⊗ e_n,     (13.40)

where in the second summand we changed the summation index from n to p(n), automatically omitting all terms annihilated by V_p* according to (13.39). Since this is a sum of orthogonal vectors, we can readily estimate the norm by writing (λ^n − λ^{p(n)}) = λ^n (1 − λ^{p(n)−n}):

    ‖Δ_λ‖² ≤ max_n |1 − λ^{p(n)−n}|² ≤ |1 − λ^{−ℓ}|²,                     (13.41)

which goes to zero as λ → 1. Therefore

    ω_λ( X ( V_p ⊗ 1I − 1I ⊗ V_p* ) ) = ⟨Ψ_λ, XΔ_λ⟩ → 0                    (13.42)

as λ → 1. Hence V_p ⊗ 1I and 1I ⊗ V_p* are doubles in any state defined by a limit of ω_λ along a subsequence, as claimed.

V_p is an isometry but not necessarily unitary. But it is effectively unitary under an EPR state: since V_p is in the centralizer, we must have ω( (1I − V_pV_p*) ⊗ 1I ) = ω( (1I − V_p*V_p) ⊗ 1I ) = 0, although this operator is non-zero. This is in keeping with the general properties of EPR states, whose restrictions must be purely singular.
In fact, (1I − V_pV_p*) is the projection onto those eigenstates e_n for which n ∉ p(N), and this set is finite: it has at most ℓ elements⁸.

It is interesting to note what happens if one tries to relax the finite distance condition. An extreme case would be the two isometries V_even e_n = e_{2n} and V_odd e_n = e_{2n+1}. These cannot have doubles in any state, because the restriction ω_A of the state to the first factor would then have to satisfy 1 = ω_A(V_even V_even* + V_odd V_odd*) = ω_A(V_even* V_even + V_odd* V_odd) = ω_A(1I + 1I) = 2. On the other hand, the norm of Δ_λ no longer goes to zero, and we get ‖Δ_λ‖² → 1/6 instead.

To get infinite one-shot entanglement is easier than in the case of general EPR states: we can simply combine d-periodic multiplication operators with d-periodic permutation operators to construct a finite Weyl system of doubles⁹. In fact, there is a very quick way to get high fidelity entangled pure states even for λ < 1 (see [168] for an application to Bell inequality violations). Consider the unitary operator U_d : H → H ⊗ C^d given by

    U_d e_{dk+r} = e_k ⊗ e_r^{(d)},                                       (13.43)

for k = 0, 1, ... and r = 0, 1, ..., d − 1. Then

    (U_d ⊗ U_d) Ψ_λ = Ψ_{λ^d} ⊗ Ψ_λ^{(d)}                                 (13.44)

with a λ-dependent normalized vector Ψ_λ^{(d)} ∈ C^d ⊗ C^d proportional to

    Ψ_λ^{(d)} ∝ Σ_{r=1}^d λ^r e_r^{(d)} ⊗ e_r^{(d)}.                       (13.45)

Note that the infinite dimensional factor on the right hand side of (13.44) is again a state of the form (13.35), however a less entangled one, with parameter λ′ = λ^d < λ. The second factor in (13.44) becomes maximally entangled in the limit λ → 1. Therefore the unitary (U_d ⊗ U_d) splits both Alice's and Bob's subsystems, so that the total system is split exactly into a less entangled version of itself and a pure, nearly maximally entangled d-dimensional pair. The local operation extracting entanglement from this state is to discard the infinite dimensional parts. Seen in one of the limit states of the family ω_λ this is maximally entangled, so equation (13.2) is satisfied with ε = 0. Moreover, since the remaining system is of exactly the same type, the process can be repeated arbitrarily often.

⁸ For any N > ℓ, consider the set {1, ..., N}. This has to contain at least the images of {1, ..., N − ℓ}, hence it can contain at most ℓ elements not in p(N).
⁹ This is probably what the authors of [39] are trying to say.

13.5.6 Counterintuitive properties of the restricted states

Basically, subsection 13.5.3 shows that the EPR states constructed here do satisfy the requirements of the EPR argument. However, Einstein, Podolsky and Rosen do not consider the measurement of suitable periodic functions of Q_k or P_k, but measurements of these quantities themselves [82]: what do EPR states have to say about these?

Unfortunately, the "values of momentum" found by Alice or Bob are not quite what we usually mean by "values": they are infinite with probability 1. To see this, recall the remark after eq. (13.22) that EPR states are invariant with respect to phase space translations with W(ξ⃗, η⃗) for (ξ⃗, η⃗) ∈ S. Hence

    ω( W(ξ₁, 0, η₁, 0)(A ⊗ 1I)W(ξ₁, 0, η₁, 0)* )
      = ω( W(ξ₁, ξ₁, η₁, −η₁)(A ⊗ 1I)W(ξ₁, ξ₁, η₁, −η₁)* ) = ω(A ⊗ 1I).      (13.46)

That is, the reduced state is invariant under all phase space translations. Now suppose that for some continuous function f with compact support we have ω(f(Q₁)) = ε ≠ 0.
Then we could add many (say N) sufficiently widely spaced translates of f to get an operator F = Σ_{i=1}^N f(Q₁ + x_i 1I) with ‖F‖ ≤ ‖f‖ and |Nε| = |ω(F)| ≤ ‖f‖, which implies ε = 0. Hence for every function with compact support we must have ω(f(Q₁)) = 0. Note that this is possible only for singular states, since we can easily construct a sequence of compactly supported functions increasing to the identity, whose ω-expectations are all zero, and hence fail to converge to 1.

In spite of being infinite, the "measured values" of Alice and Bob are perfectly correlated, which means that we have to distinguish different kinds of infinity. Such "kinds of infinity" are the subject of the topological theory of compactifications [53, 235]. The basic idea is very simple: consider some C*-algebra of bounded functions on the real line. Then the evaluations of the functions at a point, i.e., the functionals x ↦ f(x), are pure states on such an algebra, but compactness of the state space together with the Kreĭn-Milman Theorem [4] dictates that there are many more pure states. These additional pure states are interpreted as the points at infinity associated with the given observable algebra. The set of all pure states is called the Gel'fand spectrum of the commutative C*-algebra [35, Sec. 2.3.5], and the algebra is known to be isomorphic to the algebra of continuous functions on this compact space. For the algebra of all bounded functions the additional pure states are called free ultrafilters, for the algebra of all continuous bounded functions we get the points of the Stone-Čech compactification, and for the algebra of uniformly continuous functions we get a still coarser notion of points at infinity. According to Section 13.5.3 these are the measured values, which will be perfectly correlated between Alice's and Bob's positions or momenta.

It is not possible to exhibit any such value, because proving their mere existence already requires an argument based on the Axiom of Choice. So do we have to be content with the statement that the measured values lie "out there on the infinite ranges, where the free ultrafilters roam"? Section 13.5.4 shows that for many concrete problems, involving not too large observable algebras, we can use the perfect correlation property quite well. A smaller algebra of observables means that many points of the Gel'fand spectrum become identified, and some of these coarser points may have a direct physical interpretation. So the moral is not so much that compactification points at infinity are wild, pathological objects, but that they describe the way a sequence can go to infinity in the finest possible detail, which is just much finer than we usually want to know. The EPR correlation property holds even for such wild "measured values".

Bibliography

[1] A. Acín, A. Andrianov, L. Costa, E. Jané, J. I. Latorre and R. Tarrach. Schmidt decomposition and classification of three-quantum-bit states. Phys. Rev. Lett. 85, no. 7, 1560–1563 (2000).

[2] A. Acín, R. Tarrach and G. Vidal. Optimal estimation of two-qubit pure-state entanglement. Phys. Rev. A 61, 062307 (2000).

[3] C. Adami and N. J. Cerf. Von Neumann capacity of noisy quantum channels. Phys. Rev. A 56, no. 5, 3470–3483 (1997).

[4] E. M. Alfsen. Compact convex sets and boundary integrals, volume 57 of Ergebnisse der Mathematik und ihrer Grenzgebiete. Springer, New York, Heidelberg, Berlin (1971).

[5] R. Alicki, S. Rudnicki and S. Sadowski. Symmetry properties of product states for the system of N n-level atoms. J. Math.
Phys. 29, no. 5, 1158–1162 (1988). [6] A. Ambainis. A new protocol and lower bounds for quantum coin flipping. In Proceedings of the 33rd Annual Symposium on Theory of Computing 2001, pages 134–142. Association for Computing Machinery, New York (2001). see also the more recent version in quant-ph/0204022. [7] H. Araki and E.J. Woods. A classification of factors. Publ. R.I.M.S, Kyoto Univ. 4, 51–130 (1968). [8] R. Arens and V.S. Varadarajan. On the concept of EPR states and their structure. Jour. Math. Phys. 41, 638–651 (2000). [9] W. Arveson. Subalgebras of C*-algebras. Acta. Math. 123, 141–224 (1969). [10] A. Ashikhmin and E. Knill. Nonbinary quantum stabilizer codes. IEEE T. Inf. Theory 47, no. 7, 3065–3072 (2001). [11] A. Aspect, J. Dalibard and G. Roger. Experimental test of Bell’s inequalities using time-varying analyzers. Phys. Rev. Lett. 49, 1804–1807 (1982). [12] H. Barnum, E. Knill and M. A. Nielsen. On quantum fidelities and channel capacities. IEEE Trans. Inf. Theory 46, 1317–1329 (2000). [13] H. Barnum, M. A. Nielsen and B. Schumacher. Information transmission through a noisy quantum channel. Phys. Rev. A 57, no. 6, 4153–4175 (1998). [14] H. Barnum, J. A. Smolin and B. M. Terhal. Quantum capacity is properly defined without encodings. Phys. Rev A 58, no. 5, 3496–3501 (1998). [15] A. O. Barut and R. Raczka. Theory of group representations and applications. World Scientific, Singapore (1986). [16] H. Baumgärtel and M. Wollenberg. Akademie Verlag, Berlin (1992). Causal nets of operator algebras. [17] C. H. Bennett, H. J. Bernstein, S. Popescu and B. Schumacher. Concentrating partial entanglement by local operations. Phys. Rev. A 53, no. 4, 2046–2052 (1996). Bibliography 212 [18] C. H. Bennett and G. Brassard. Quantum key distribution and coin tossing. In Proc. of IEEE Int. Conf. on Computers, Systems, and Signal Processing (Bangalore, India, 1984), pages 175–179. IEEE, New York (1984). [19] C. H. Bennett, G. Brassard, C. Crépeau, R. Jozsa, A. Peres and W. K. Wootters. Teleporting an unknown quantum state via dual classical and EinsteinPodolsky-Rosen channels. Phys. Rev. Lett. 70, 1895–1899 (1993). [20] C. H. Bennett, G. Brassard, S. Popescu, B. Schumacher, J. A. Smolin and W. K. Wootters. Purification of noisy entanglement and faithful teleportation via noisy channels. Phys. Rev. Lett. 76, no. 5, 722–725 (1996). Erratum: Phys. Rev. Lett. 78, 10, 2031 (1997). [21] C. H. Bennett, D. P. DiVincenzo, C. A. Fuchs, T. Mor, E. M. Rains, P. W. Shor, J. A. Smolin and W. K. Wootters. Quantum nonlocality without entanglement. Phys. Rev. A 59, no. 2, 1070–1091 (1999). [22] C. H. Bennett, D. P. DiVincenzo, T. Mor, P. W. Shor, J. A. Smolin and B. M. Terhal. Unextendible product bases and bound entanglement. Phys. Rev. Lett 82, no. 26, 5385–5388 (1999). [23] C. H. Bennett, D. P. DiVincenzo and J. A. Smolin. Capacities of quantum erasure channels. Phys. Rev. Lett. 78, no. 16, 3217–3220 (1997). [24] C. H. Bennett, D. P. DiVincenzo, J. A. Smolin and W. K. Wootters. Mixedstate entanglement and quantum error correction. Phys. Rev. A 54, no. 4, 3824–3851 (1996). [25] C. H. Bennett, P. W. Shor, J. A. Smolin and A. V. Thapliyal. Entanglementassisted classical capacity of noisy quantum channels. Phys. Rev. Lett. 83, no. 15, 3081–3084 (1999). [26] C. H. Bennett, P. W. Shor, J. A. Smolin and A. V. Thapliyal. Entanglementassisted capacity of a quantum channel and the reverse Shannon theorem. quant-ph/0106052 (2001). [27] C. H. Bennett and S. J. Wiesner. 
Communication via one- and two-particle operators on Einstein-Podolsky-Rosen states. Phys. Rev. Lett. 20, 2881–2884 (1992). [28] E. Biolatti, R. C. Iotti, P. Zanardi and F. Rossi. Quantum information processing with semiconductor macroatoms. Phys. Rev. Lett. 85, no. 26, 5647–5650 (2000). [29] M. Blum. Coin flipping by telephone. A protocol for solving impossible problems. SIGACT News 15, 23–27 (1981). [30] A. Bogomolny. Interactive mathematics miscellany and puzzles. http://www. cut-the-knot.com/hall.html. [31] D. Boschi, S. Branca, F. De Martini, L. Hardy and S. Popescu. Experimental realization of teleporting an unknown pure quantum state via dual classical an Einstein-Podolsky-Rosen channels. Phys. Rev. Lett. 80, no. 6, 1121–1125 (1998). [32] D. Bouwmeester, A. K. Ekert and A. Zeilinger (editors). The physics of quantum information: Quantum cryptography, quantum teleportation, quantum computation. Springer, Berlin (2000). 213 Bibliography [33] D. Bouwmeester, J.-W. Pan, K. Mattle, M. Eibl, H. Weinfurter and A. Zeilinger. Experimental quantum teleportation. Nature 390, 575–579 (1997). [34] O. Bratelli, G.A. Elliott, D.E. Evans and A. Kishimoto. Non-commutative spheres II: Rational rotations. J. Operator Theory 27, 53–85 (1992). [35] O. Bratteli and D. W. Robinson. Operator algebras and quantum statistical mechanics. I+II. Springer, New York (1979, 1997). [36] S. L. Braunstein, C. M. Caves, R. Jozsa, N. Linden, S. Popescu and R. Schack. Separability of very noisy mixed states and implications for NMR quantum computing. Phys. Rev. Lett. 83, no. 5, 1054–1057 (1999). [37] G. K. Brennen, C. M. Caves and I. H. Deutsch F. S. Jessen. Quantum logic gates in optical lattices. Phys. Rev. Lett. 82, no. 5, 1969–1063 (1999). [38] K.R. Brown, D.A. Lidar and K.B. Whaley. Quantum computing with quantum dots on linear supports. quant-ph/0105102 (2001). [39] C. Brukner, M.S. Kim, J-W. Pan and A. Zeilinger. Correspondence between continuous variable and discrete quantum systems of arbitrary dimensions. quant-ph/0208116 (2002). [40] T. A. Brun and H. L. Wang. Coupling nanocrystals to a high-q silica microsphere: Entanglement in quantum dots via photon exchange. Phys. Rev. A 61, 032307 (2000). [41] D. Bruß, D. P. DiVincenzo, A. Ekert, C. A. Fuchs, C. Machiavello and J. A. Smolin. Optimal universal and state-dependent cloning. Phys. Rev. A 57, no. 4, 2368–2378 (1998). [42] D. Bruß, A. K. Ekert and C. Macchiavello. Optimal universal quantum cloning and state estimation. Phys. Rev. Lett. 81, no. 12, 2598–2601 (1998). [43] D. Bruß and C. Macchiavello. Optimal state estimation for d-dimensional quantum systems. Phys. Lett. A253, 249–251 (1999). [44] W. T. Buttler, R.J. Hughes, S.K. Lamoreaux, G.L. Morgan, J.E. Nordholt and C.G. Peterson. Daylight quantum key distribution over 1.6 km. Phys. Rev. Lett 84, 5652–5655 (2000). [45] V. Bužek and M. Hillery. Universal optimal cloning of qubits and quantum registers. Phys. Rev. Lett. 81, no. 22, 5003–5006 (1998). [46] V. Bužek, M. Hillery and R. F. Werner. Optimal manipulations with qubits: Universal-not gate. Phys. Rev. A 60, no. 4, R2626–R2629 (1999). [47] A. R. Calderbank, E. M. Rains, P. W. Shor and N. J. A. Sloane. Quantum error correction and orthogonal geometry. Phys. Rev. Lett. 78, no. 3, 405–408 (1997). [48] A. R. Calderbank and P. W. Shor. Good quantum error-correcting codes exist. Phys. Rev. A 54, 1098–1105 (1996). [49] N. J. Cerf. Asymmetric quantum cloning machines. J.Mod.Opt. 47, 187– (2000). [50] N. J. Cerf. Quantum cloning with continuous variables. 
[51] N. J. Cerf and C. Adami. Negative entropy and information in quantum mechanics. Phys. Rev. Lett. 79, no. 26, 5194–5197 (1997).
[52] N. J. Cerf, C. Adami and R. M. Gingrich. Reduction criterion for separability. Phys. Rev. A 60, no. 2, 898–909 (1999).
[53] R. E. Chandler. Hausdorff compactifications, volume 23 of Lect. Notes Pure Appl. Math. Dekker, New York (1976).
[54] A. Church. An unsolved problem of elementary number theory. Amer. J. Math. 58, 345–363 (1936).
[55] J. I. Cirac, A. K. Ekert and C. Macchiavello. Optimal purification of single qubits. Phys. Rev. Lett. 82, 4344–4347 (1999).
[56] B. S. Cirel’son. Quantum generalizations of Bell’s inequalities. Lett. Math. Phys. 4, 93–100 (1980).
[57] J. F. Clauser, M. A. Horne, A. Shimony and R. A. Holt. Proposed experiment to test local hidden-variable theories. Phys. Rev. Lett. 23, no. 15, 880–884 (1969).
[58] R. Clifton and H. Halvorson. Maximal beable subalgebras of quantum-mechanical observables. Int. J. Theor. Phys. 38, 2441–2484 (1999).
[59] R. Clifton and H. Halvorson. Bipartite mixed states of infinite dimensional systems are generically nonseparable. Phys. Rev. A 61, 012108 (2000).
[60] R. Clifton and H. Halvorson. Reconsidering Bohr’s reply to EPR. quant-ph/0110107 (2001).
[61] A. Connes. Sur la classification des facteurs de type II. C.R. Acad. Sci. Paris Ser. A-B 281, A13–A15 (1975).
[62] J. F. Cornwell. Group theory in physics. II. Academic Press, London et al. (1984).
[63] T. M. Cover and J. A. Thomas. Elements of information theory. Wiley, Chichester (1991).
[64] G. M. D’Ariano, R. D. Gill, M. Keyl, B. Kuemmerer, H. Maassen and R. F. Werner. The quantum Monty Hall problem. http://www.imaph.tubs.de/qi/monty.
[65] G. M. D’Ariano, R. D. Gill, M. Keyl, B. Kuemmerer, H. Maassen and R. F. Werner. The quantum Monty Hall problem. Quantum Inf. Comput. 2, 355–366 (2002).
[66] E. B. Davies. Quantum Theory of Open Systems. Academic Press, London (1976).
[67] B. Demoen, P. Vanheuverzwijn and A. Verbeure. Completely positive maps on the CCR-algebra. Lett. Math. Phys. 2, 161–166 (1977).
[68] R. Derka, V. Bužek and A. K. Ekert. Universal algorithm for optimal estimation of quantum states from finite ensembles via realizable generalized measurements. Phys. Rev. Lett. 80, no. 8, 1571–1575 (1998).
[69] D. Deutsch. Quantum theory, the Church-Turing principle and the universal quantum computer. Proc. R. Soc. Lond. A 400, 97–117 (1985).
[70] D. Deutsch and R. Jozsa. Rapid solution of problems by quantum computation. Proc. R. Soc. Lond. A 439, 553–558 (1992).
[71] D. P. DiVincenzo, P. W. Shor and J. A. Smolin. Quantum-channel capacity of very noisy channels. Phys. Rev. A 57, no. 2, 830–839 (1998). Erratum: Phys. Rev. A 59, no. 2, 1717 (1999).
[72] D. P. DiVincenzo, P. W. Shor, J. A. Smolin, B. M. Terhal and A. V. Thapliyal. Evidence for bound entangled states with negative partial transpose. Phys. Rev. A 61, no. 6, 062312 (2000).
[73] M. J. Donald and M. Horodecki. Continuity of relative entropy of entanglement. Phys. Lett. A 264, no. 4, 257–260 (1999).
[74] M. J. Donald, M. Horodecki and O. Rudolph. The uniqueness theorem for entanglement measures. quant-ph/0105017 (2001).
[75] C. Döscher and M. Keyl. An introduction to quantum coin-tossing. Fluct. Noise Lett. 2, no. 4, R125–R137 (2002).
[76] N. G. Duffield. A large deviation principle for the reduction of product representations. Proc. Amer. Math. Soc. 109, 503–515 (1990).
[77] P. Dupuis and R. S. Ellis. A weak convergence approach to the theory of large deviations. Wiley, New York et al. (199?).
[78] W. Dür, J. I. Cirac, M. Lewenstein and D. Bruss. Distillability and partial transposition in bipartite systems. Phys. Rev. A 61, no. 6, 062313 (2000).
[79] B. Efron and R. J. Tibshirani. An introduction to the bootstrap. Chapman and Hall, New York (1993).
[80] T. Eggeling, K. G. H. Vollbrecht, R. F. Werner and M. M. Wolf. Distillability via protocols respecting the positivity of the partial transpose. Phys. Rev. Lett. 87, 257902 (2001).
[81] T. Eggeling and R. F. Werner. Separability properties of tripartite states with U × U × U-symmetry. Phys. Rev. A 63, no. 4, 042111 (2001).
[82] A. Einstein, B. Podolsky and N. Rosen. Can quantum-mechanical description of physical reality be considered complete? Phys. Rev. 47, 777–780 (1935).
[83] J. Eisert, C. Simon and M. B. Plenio. On the quantification of entanglement in infinite-dimensional quantum systems. quant-ph/0112064 (2001).
[84] J. Eisert, M. Wilkens and M. Lewenstein. Quantum games and quantum strategies. Phys. Rev. Lett. 83, 3077–3080 (1999).
[85] R. S. Ellis. Entropy, large deviations, and statistical mechanics. Springer, Berlin (1985).
[86] D. J. Wineland et al. Quantum information processing with trapped ions. quant-ph/0212079 (2002).
[87] R. Laflamme et al. Introduction to NMR quantum information processing. quant-ph/0207172 (2002). To appear in LA Science.
[88] A. Feinstein. Foundations of Information Theory. McGraw-Hill, New York (1958).
[89] D. G. Fischer and M. Freyberger. Estimating mixed quantum states. Phys. Lett. A 273, 293–302 (2000).
[90] A. P. Flitney and D. Abbott. An introduction to quantum game theory. quant-ph/0208069 (2002).
[91] A. P. Flitney and D. Abbott. Quantum version of the Monty Hall problem. Phys. Rev. A 65, 062318 (2002).
[92] G. Giedke, L.-M. Duan, J. I. Cirac and P. Zoller. Distillability criterion for all bipartite gaussian states. Quant. Inf. Comp. 1, no. 3 (2001).
[93] G. Giedke, B. Kraus, M. Lewenstein and J. I. Cirac. Separability properties of three-mode gaussian states. Phys. Rev. A 64, no. 5, 052303 (2001).
[94] R. D. Gill and S. Massar. State estimation for large ensembles. Phys. Rev. A 61, 2312–2327 (2000).
[95] N. Gisin. Hidden quantum nonlocality revealed by local filters. Phys. Lett. A 210, no. 3, 151–156 (1996).
[96] N. Gisin and S. Massar. Optimal quantum cloning machines. Phys. Rev. Lett. 79, no. 11, 2153–2156 (1997).
[97] N. Gisin, G. Ribordy, W. Tittel and H. Zbinden. Quantum cryptography. Rev. Mod. Phys. 74, no. 1, 145–195 (2002).
[98] D. Gottesman. Class of quantum error-correcting codes saturating the quantum Hamming bound. Phys. Rev. A 54, 1862–1868 (1996).
[99] D. Gottesman. Stabilizer codes and quantum error correction. Ph.D. thesis, California Institute of Technology (1997). quant-ph/9705052.
[100] M. Grassl, T. Beth and T. Pellizzari. Codes for the quantum erasure channel. Phys. Rev. A 56, no. 1, 33–38 (1997).
[101] D. M. Greenberger, M. A. Horne and A. Zeilinger. Going beyond Bell’s theorem. In Bell’s theorem, quantum theory, and conceptions of the universe (M. Kafatos, editor), pages 69–72. Kluwer Academic, Dordrecht (1989).
[102] L. K. Grover. Quantum computers can search arbitrarily large databases by a single query. Phys. Rev. A 56, no. 23, 4709–4712 (1997).
[103] L. K. Grover. Quantum mechanics helps in searching for a needle in a haystack. Phys. Rev. Lett. 79, no. 2, 325–328 (1997).
[104] R. Haag, N. M. Hugenholtz and M. Winnink. On the equilibrium states in quantum statistical mechanics. Commun. Math. Phys. 5, 215–236 (1967).
[105] M. Hamada. Exponential lower bound on the highest fidelity achievable by quantum error-correcting codes. quant-ph/0109114 (2001).
[106] L. Hardy and A. Kent. Cheat sensitive quantum bit commitment. quant-ph/9911043 (1999).
[107] M. Hayashi. Optimal sequence of quantum measurements in the sense of Stein’s lemma in quantum hypothesis testing. quant-ph/020820 (2002). Submitted to J. Phys. A.
[108] P. M. Hayden, M. Horodecki and B. M. Terhal. The asymptotic entanglement cost of preparing a quantum state. J. Phys. A, Math. Gen. 34, no. 35, 6891–6898 (2001).
[109] C. W. Helstrom. Quantum detection and estimation theory. Academic Press, New York (1976).
[110] R. Høegh-Krohn and T. Skjelbred. Classification of C*-algebras admitting ergodic actions of the two-dimensional torus. J. Reine Angew. Math. 328, 1–8 (1981).
[111] A. S. Holevo. Probabilistic and statistical aspects of quantum theory. North-Holland, Amsterdam (1982).
[112] A. S. Holevo. Coding theorems for quantum channels. Tamagawa University Research Review no. 4 (1998). quant-ph/9809023.
[113] A. S. Holevo. Sending quantum information with gaussian states. In Proc. of the 4th Int. Conf. on Quantum Communication, Measurement and Computing (Evanston, 1998) (1998). quant-ph/9809022.
[114] A. S. Holevo. On entanglement-assisted classical capacity. quant-ph/0106075 (2001).
[115] A. S. Holevo. Statistical structure of quantum theory. Springer, Berlin (2001).
[116] A. S. Holevo and R. F. Werner. Evaluating capacities of bosonic gaussian channels. Phys. Rev. A 63, no. 3, 032312 (2001).
[117] M. Horodecki and P. Horodecki. Reduction criterion of separability and limits for a class of distillation protocols. Phys. Rev. A 59, no. 6, 4206–4216 (1999).
[118] M. Horodecki, P. Horodecki and R. Horodecki. Separability of mixed states: Necessary and sufficient conditions. Phys. Lett. A 223, no. 1-2, 1–8 (1996).
[119] M. Horodecki, P. Horodecki and R. Horodecki. Mixed-state entanglement and distillation: Is there a “bound” entanglement in nature? Phys. Rev. Lett. 80, no. 24, 5239–5242 (1998).
[120] M. Horodecki, P. Horodecki and R. Horodecki. General teleportation channel, singlet fraction, and quasidistillation. Phys. Rev. A 60, no. 3, 1888–1898 (1999).
[121] M. Horodecki, P. Horodecki and R. Horodecki. Limits for entanglement measures. Phys. Rev. Lett. 84, no. 9, 2014–2017 (2000).
[122] M. Horodecki, P. Horodecki and R. Horodecki. Unified approach to quantum capacities: Towards quantum noisy coding theorem. Phys. Rev. Lett. 85, no. 2, 433–436 (2000).
[123] M. Horodecki, P. Horodecki and R. Horodecki. Mixed-state entanglement and quantum communication. In Quantum information (G. Alber et al., editor), pages 151–195. Springer (2001).
[124] P. Horodecki, J. I. Cirac and M. Lewenstein. Bound entanglement for continuous variables is a rare phenomenon. quant-ph/0103076 (2001).
[125] P. Horodecki, M. Horodecki and R. Horodecki. Bound entanglement can be activated. Phys. Rev. Lett. 82, no. 5, 1056–1059 (1999).
[126] R. J. Hughes, G. L. Morgan and C. G. Peterson. Quantum key distribution over a 48 km optical fibre network. J. Mod. Opt. 47, no. 2-3, 533–547 (2000).
[127] A. E. Ingham. On the difference between consecutive primes. Quart. J. Math., Oxford Ser. 8, 255–266 (1937).
[128] A. Jamiołkowski. Linear transformations which preserve trace and positive semidefiniteness of operators. Rep. Math. Phys. 3, 275–278 (1972).
[129] K. Jänich. Differenzierbare G-Mannigfaltigkeiten. Lecture Notes in Mathematics, No. 59. Springer-Verlag, Berlin (1968).
[130] T. Jennewein, C. Simon, G. Weihs, H. Weinfurter and A. Zeilinger. Quantum cryptography with entangled photons. Phys. Rev. Lett. 84, 4729–4732 (2000).
[131] S. Kakutani. A generalization of Brouwer’s fixed point theorem. Duke Math. J. 8, 457–459 (1941).
[132] A. Kent. Coin tossing is strictly weaker than bit commitment. Phys. Rev. Lett. 83, 5382–5384 (1999).
[133] M. Keyl. Quantum operation with multiple inputs. In Quantum theory and symmetries (H. D. Doebner, V. K. Dobrev, J.-D. Hennig and W. Lücke, editors), pages 401–405. World Scientific, Singapore (2000).
[134] M. Keyl. Fundamentals of quantum information theory. Phys. Rep. 369, no. 5, 431–548 (2002).
[135] M. Keyl, D. Schlingemann and R. F. Werner. Infinitely entangled states. quant-ph/0212014 (2002).
[136] M. Keyl and R. F. Werner. Optimal cloning of pure states, testing single clones. J. Math. Phys. 40, 3283–3299 (1999).
[137] M. Keyl and R. F. Werner. Estimating the spectrum of a density operator. Phys. Rev. A 64, no. 5, 052311 (2001).
[138] M. Keyl and R. F. Werner. The rate of optimal purification procedures. Ann. H. Poincaré 2, 1–26 (2001).
[139] M. Keyl and R. F. Werner. How to correct small quantum errors. In Coherent evolution in noisy environment (A. Buchleitner and K. Hornberger, editors), volume 611 of Lecture notes in physics, pages 263–286. Springer, Berlin (2002).
[140] A. I. Khinchin. Mathematical Foundations of Information Theory. Dover Publications, New York (1957).
[141] C. King. Additivity for unital qubit channels. J. Math. Phys. 43, no. 10, 4641–4653 (2002).
[142] C. King. The capacity of the quantum depolarizing channel. quant-ph/0204172 (2002).
[143] E. Knill and R. Laflamme. Theory of quantum error-correcting codes. Phys. Rev. A 55, no. 2, 900–911 (1997).
[144] N. Korolkova and G. Leuchs. Multimode quantum correlations. In Coherence and statistics of photons and atoms (J. Perina, editor). Wiley (2001).
[145] B. Kraus, M. Lewenstein and J. I. Cirac. Characterization of distillable and activable states using entanglement witnesses. quant-ph/0110174 (2001).
[146] K. Kraus. States, effects and operations. Springer, Berlin (1983).
[147] D. Kretschmann. Channel capacities quantized. Diploma Thesis, TU-Braunschweig. In preparation.
[148] R. Landauer. Irreversibility and heat generation in the computing process. IBM J. Res. Dev. 5, 183 (1961).
[149] C. F. Lee and N. F. Johnson. Quantum game theory. quant-ph/0207012 (2002).
[150] U. Leonhardt. Measuring the quantum state of light. Cambridge Univ. Press, Cambridge (1997).
[151] M. Lewenstein and A. Sanpera. Separability and entanglement of composite quantum systems. Phys. Rev. Lett. 80, no. 11, 2261–2264 (1998).
[152] C.-F. Li, Y.-S. Zhang, Y.-F. Huang and G.-C. Guo. Quantum strategies of quantum measurement. Phys. Lett. A 280, 257–260 (2000).
[153] S. Lloyd. Capacity of the noisy quantum channel. Phys. Rev. A 55, no. 3, 1613–1622 (1997).
[154] H.-K. Lo and H. F. Chau. Why quantum bit commitment and ideal quantum coin tossing are impossible. Physica D 120, 177–187 (1998).
[155] Y. Makhlin, G. Schön and A. Shnirman. Quantum-state engineering with Josephson-junction devices. Rev. Mod. Phys. 73, no. 2, 357–400 (2001).
[156] L. Marinatto and T. Weber. A quantum approach to static games of complete information. Phys. Lett. A 272, 291–303 (2000).
[157] S. Massar and S. Popescu. Optimal extraction of information from finite quantum ensembles. Phys. Rev. Lett. 74, no. 8, 1259–1263 (1995).
[158] R. Matsumoto and T. Uyematsu. Lower bound for the quantum capacity of a discrete memoryless quantum channel. quant-ph/0105151 (2001).
[159] K. Mattle, H. Weinfurter, P. G. Kwiat and A. Zeilinger. Dense coding in experimental quantum communication. Phys. Rev. Lett. 76, no. 25, 4656–4659 (1996).
[160] D. Mayers. Unconditionally secure quantum bit commitment is impossible. Phys. Rev. Lett. 78, 3414–3417 (1997).
[161] D. Mayers, L. Salvail and Y. Chiba-Kohno. Unconditionally secure quantum coin-tossing. quant-ph/9904078 (1999).
[162] N. D. Mermin. Quantum mysteries revisited. Am. J. Phys. 58, no. 8, 731–734 (1990).
[163] N. D. Mermin. What’s wrong with these elements of reality? Phys. Today 43, no. 6, 9–11 (1990).
[164] D. A. Meyer. Quantum strategies. Phys. Rev. Lett. 82, 1052–1055 (1999).
[165] M. M. Möbius. Introduction to game theory. http://www.courses.fas.harvard.edu/~ec1052/ (2002).
[166] H. Nagaoka and M. Hayashi. An information-spectrum approach to classical and quantum hypothesis testing for simple hypotheses. quant-ph/0206185 (2002).
[167] J. Nash. Non-cooperative games. Ann. of Math., II. Ser. 54, 286–295 (1951).
[168] M. Neumann. Verletzung der Bellschen Ungleichungen für Gaußsche Zustände. Diploma thesis, TU-Braunschweig (2002).
[169] M. A. Nielsen. Conditions for a class of entanglement transformations. Phys. Rev. Lett. 83, no. 2, 436–439 (1999).
[170] M. A. Nielsen. Continuity bounds for entanglement. Phys. Rev. A 61, no. 6, 064301 (2000).
[171] M. A. Nielsen. Characterizing mixing and measurement in quantum mechanics. Phys. Rev. A 63, no. 2, 022114 (2001).
[172] M. A. Nielsen and I. L. Chuang. Quantum computation and quantum information. Cambridge University Press, Cambridge (2000).
[173] T. Ogawa and H. Nagaoka. Strong converse and Stein’s lemma in quantum hypothesis testing. IEEE Trans. Inf. Theory IT-46, 2428–2433 (2000).
[174] M. Ohya and D. Petz. Quantum entropy and its use. Springer, Berlin (1993).
[175] M. Ozawa. Conditional probability and a posteriori states in quantum mechanics. Publ. RIMS Kyoto Univ. 21, 279–295 (1985).
[176] M. Ozawa. Measuring processes and repeatability hypothesis. In Probability theory and mathematical statistics (Kyoto, 1986), volume 1299 of Lect. Notes Math., pages 412–421. Springer, Berlin (1988).
[177] C. H. Papadimitriou. Computational complexity. Addison-Wesley, Reading, Massachusetts (1994).
[178] V. I. Paulsen. Completely bounded maps and dilations. Longman Scientific & Technical (1986).
[179] A. Peres. Higher order Schmidt decompositions. Phys. Lett. A 202, no. 1, 16–17 (1995).
[180] A. Peres. Separability criterion for density matrices. Phys. Rev. Lett. 77, no. 8, 1413–1415 (1996).
[181] S. Popescu. Bell’s inequalities versus teleportation: What is nonlocality? Phys. Rev. Lett. 72, no. 6, 797–799 (1994).
[182] S. Popescu and D. Rohrlich. Thermodynamics and the measure of entanglement. Phys. Rev. A 56, no. 5, R3319–R3321 (1997).
[183] E. M. Rains. Bound on distillable entanglement. Phys. Rev. A 60, no. 1, 179–184 (1999). Erratum: Phys. Rev. A 63, no. 1, 019902(E) (2001).
[184] E. M. Rains. A semidefinite program for distillable entanglement. quant-ph/0008047 (2000).
[185] E. M. Rains. A semidefinite program for distillable entanglement. IEEE Trans. Inf. Theory 47, no. 7, 2921–2933 (2001).
[186] M. Reed and B. Simon. Methods of modern mathematical physics. I. Academic Press, San Diego (1980).
[187] W. Rudin. Functional Analysis. McGraw-Hill, New York (1973).
[188] O. Rudolph. A separability criterion for density operators. J. Phys. A 33, no. 21, 3951–3955 (2000).
[189] S. Sakai. C*-algebras and W*-algebras. Springer, Berlin, Heidelberg, New York (1971).
[190] D. Schlingemann and R. F. Werner. Quantum error-correcting codes associated with graphs. quant-ph/0012111 (2000).
[191] C. E. Shannon. A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423, 623–656 (1948).
[192] P. W. Shor. Algorithms for quantum computation: Discrete logarithms and factoring. In Proc. of the 35th Annual Symposium on the Foundations of Computer Science (S. Goldwasser, editor), pages 124–134. IEEE Computer Society Press, Los Alamitos, California (1994).
[193] P. W. Shor. Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. Soc. Ind. Appl. Math. J. Comp. 26, 1484–1509 (1997).
[194] P. W. Shor, J. A. Smolin and B. M. Terhal. Nonadditivity of bipartite distillable entanglement follows from a conjecture on bound entangled Werner states. Phys. Rev. Lett. 86, no. 12, 2681–2684 (2001).
[195] B. Simon. Representations of finite and compact groups. American Mathematical Society, Providence (1996).
[196] D. Simon. On the power of quantum computation. In Proc. 35th annual symposium on foundations of computer science, pages 124–134. IEEE Computer Society Press, Los Alamitos (1994).
[197] R. Simon. Peres-Horodecki separability criterion for continuous variable systems. Phys. Rev. Lett. 84, no. 12, 2726–2729 (2000).
[198] S. Singh. The code book: The Science of Secrecy from Ancient Egypt to Quantum Cryptography. Fourth Estate, London (1999).
[199] R. W. Spekkens and T. Rudolph. Degrees of concealment and bindingness in quantum bit commitment protocols. Phys. Rev. A 65, 012310 (2002).
[200] R. W. Spekkens and T. Rudolph. A quantum protocol for cheat-sensitive weak coin flipping. quant-ph/0202118 (2002).
[201] A. M. Steane. Multiple particle interference and quantum error correction. Proc. Roy. Soc. Lond. A 452, 2551–2577 (1996).
[202] W. F. Stinespring. Positive functions on C*-algebras. Proc. Amer. Math. Soc. 6, 211–216 (1955).
[203] E. Størmer. Positive linear maps of operator algebras. Acta Math. 110, 233–278 (1963).
[204] S. J. Summers and R. F. Werner. Maximal violation of Bell’s inequality is generic in quantum field theory. Commun. Math. Phys. 110, 247–259 (1987).
[205] S. J. Summers and R. F. Werner. Maximal violation of Bell’s inequalities for algebras of observables in tangent spacetime regions. Ann. Inst. H. Poincaré A 49, 215–243 (1988).
[206] S. J. Summers and R. F. Werner. On Bell’s inequalities and algebraic invariants. Lett. Math. Phys. 33, 321–334 (1995).
[207] M. Takesaki. Tomita’s theory of modular Hilbert algebras and its application, volume 128 of Lect. Notes. Math. Springer, Berlin, Heidelberg, New York (1970).
[208] M. Takesaki. Theory of operator algebras. Springer, New York, Heidelberg, Berlin (1979).
[209] T. Tanamoto. Quantum gates by coupled asymmetric quantum dots and controlled-NOT-gate operation. Phys. Rev. A 61, 022305 (2000).
[210] B. M. Terhal and K. G. H. Vollbrecht. Entanglement of formation for isotropic states. Phys. Rev. Lett. 85, no. 12, 2625–2628 (2000).
[211] W. Tittel, J. Brendel, H. Zbinden and N. Gisin. Violation of Bell inequalities by photons more than 10 km apart. Phys. Rev. Lett. 81, no. 17, 3563–3566 (1998).
[212] A. M. Turing. On computable numbers, with an application to the Entscheidungsproblem. Proc. Lond. Math. Soc. Ser. 2 42, 230–265 (1936).
[213] L. M. K. Vandersypen. Experimental quantum computation with nuclear spins in liquid solution. Ph.D. thesis, Stanford University (2002). quant-ph/0205193.
[214] S. R. S. Varadhan. Asymptotic probabilities and differential equations. Commun. Pure Appl. Math. 19, 261–286 (1966).
[215] V. Vedral and M. B. Plenio. Entanglement measures and purification procedures. Phys. Rev. A 57, no. 3, 1619–1633 (1998).
[216] V. Vedral, M. B. Plenio, M. A. Rippin and P. L. Knight. Quantifying entanglement. Phys. Rev. Lett. 78, no. 12, 2275–2279 (1997).
[217] G. Vidal. Entanglement monotones. J. Mod. Opt. 47, no. 2-3, 355–376 (2000).
[218] G. Vidal, J. I. Latorre, P. Pascual and R. Tarrach. Optimal minimal measurements of mixed states. Phys. Rev. A 60, 126–135 (1999).
[219] G. Vidal and R. Tarrach. Robustness of entanglement. Phys. Rev. A 59, no. 1, 141–155 (1999).
[220] G. Vidal and R. F. Werner. A computable measure of entanglement. quant-ph/0102117 (2001).
[221] K. G. H. Vollbrecht and R. F. Werner. Entanglement measures under symmetry. quant-ph/0010095 (2000).
[222] K. G. H. Vollbrecht and R. F. Werner. Why two qubits are special. J. Math. Phys. 41, no. 10, 6772–6782 (2000).
[223] J. von Neumann. On infinite direct products. Compos. Math. 6, 1–77 (1938). Cf. also Collected Works III, No. 6.
[224] J. von Neumann and O. Morgenstern. Theory of games and economic behavior. Princeton Univ. Press, Princeton (1944).
[225] I. Wegener. The complexity of boolean functions. Teubner, Stuttgart (1987).
[226] S. Weigert. Reconstruction of quantum states and its conceptual implications. In Trends in quantum mechanics (H. D. Doebner, S. T. Ali, M. Keyl and R. F. Werner, editors), pages 146–156. World Scientific, Singapore (2000).
[227] H. Weinfurter and A. Zeilinger. Quantum communication. In Quantum information (G. Alber et al., editor), pages 58–95. Springer (2001).
[228] R. F. Werner. Quantum harmonic analysis on phase space. J. Math. Phys. 25, 1404–1411 (1984).
[229] R. F. Werner. Quantum states with Einstein-Podolsky-Rosen correlations admitting a hidden-variable model. Phys. Rev. A 40, no. 8, 4277–4281 (1989).
[230] R. F. Werner. Optimal cloning of pure states. Phys. Rev. A 58, 980–1003 (1998).
[231] R. F. Werner. All teleportation and dense coding schemes. quant-ph/0003070 (2000).
[232] R. F. Werner. Quantum information theory – an invitation. In Quantum information (G. Alber et al., editor), pages 14–59. Springer (2001).
[233] R. F. Werner and M. M. Wolf. Bell inequalities and entanglement. Quant. Inf. Comp. 1, no. 3, 1–25 (2001).
[234] R. F. Werner and M. M. Wolf. Bound entangled gaussian states. Phys. Rev. Lett. 86, no. 16, 3658–3661 (2001).
[235] R. F. Werner. Physical uniformities on the state space of non-relativistic quantum mechanics. Found. Phys. 13, 859–881 (1983).
[236] R. F. Werner. EPR states for von Neumann algebras. quant-ph/9910077 (1999).
[237] H. Weyl. The classical groups. Princeton Univ. Press, Princeton (1946).
[238] W. K. Wootters. Entanglement of formation of an arbitrary state of two qubits. Phys. Rev. Lett. 80, no. 10, 2245–2248 (1998).
[239] W. K. Wootters and W. H. Zurek. A single quantum cannot be cloned. Nature 299, 802–803 (1982).
[240] S. L. Woronowicz. Positive maps of low dimensional matrix algebras. Rep. Math. Phys. 10, 165–183 (1976).
[241] D. P. Zhelobenko. Compact Lie groups and their representations. American Mathematical Society, Providence (1978).