* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project
Download Schumacher Compression
Renormalization wikipedia , lookup
Topological quantum field theory wikipedia , lookup
Relativistic quantum mechanics wikipedia , lookup
Double-slit experiment wikipedia , lookup
Ensemble interpretation wikipedia , lookup
Basil Hiley wikipedia , lookup
Bohr–Einstein debates wikipedia , lookup
Theoretical and experimental justification for the Schrödinger equation wikipedia , lookup
Scalar field theory wikipedia , lookup
Algorithmic cooling wikipedia , lookup
Particle in a box wikipedia , lookup
Delayed choice quantum eraser wikipedia , lookup
Quantum decoherence wikipedia , lookup
Quantum electrodynamics wikipedia , lookup
Quantum field theory wikipedia , lookup
Probability amplitude wikipedia , lookup
Path integral formulation wikipedia , lookup
Hydrogen atom wikipedia , lookup
Copenhagen interpretation wikipedia , lookup
Quantum dot wikipedia , lookup
Measurement in quantum mechanics wikipedia , lookup
Bell test experiments wikipedia , lookup
Coherent states wikipedia , lookup
Quantum fiction wikipedia , lookup
Many-worlds interpretation wikipedia , lookup
Symmetry in quantum mechanics wikipedia , lookup
Orchestrated objective reduction wikipedia , lookup
Bell's theorem wikipedia , lookup
History of quantum field theory wikipedia , lookup
Density matrix wikipedia , lookup
Quantum computing wikipedia , lookup
Quantum entanglement wikipedia , lookup
Interpretations of quantum mechanics wikipedia , lookup
EPR paradox wikipedia , lookup
Quantum group wikipedia , lookup
Quantum machine learning wikipedia , lookup
Canonical quantization wikipedia , lookup
Quantum channel wikipedia , lookup
Quantum state wikipedia , lookup
Quantum cognition wikipedia , lookup
Hidden variable theory wikipedia , lookup
Quantum teleportation wikipedia , lookup
CHAPTER 17 Schumacher Compression One of the fundamental tasks in classical information theory is the compression of information. Given access to many uses of a noiseless classical channel, what is the best that a sender and receiver can make of this resource for compressed data transmission? Shannon’s compression theorem demonstrates that the Shannon entropy is the fundamental limit for the compression rate in the IID setting (recall the development in Section 13.4). That is, if one compresses at a rate above the Shannon entropy, then it is possible to recover the compressed data perfectly in the asymptotic limit, and otherwise, it is not possible to do so.1 This theorem establishes the prominent role of the entropy in Shannon’s theory of information. In the quantum world, it very well could be that one day a sender and a receiver would have many uses of a noiseless quantum channel available,2 and the sender could use this resource to transmit compressed quantum information. But what exactly does this mean in the quantum setting? A simple model of a quantum information source is an ensemble of quantum states {p X (x) , ∣ψ x ⟩}, i.e., the source outputs the state ∣ψ x ⟩ with probability p X (x), and the states {∣ψ x ⟩} do not necessarily have to form an orthonormal basis. Let’s suppose for the moment that the classical data x is available as well, even though this might not necessarily be the case in practice. A naive strategy for compressing this quantum information source would be to ignore the quantum states coming out, handle the classical data instead, and exploit Shannon’s compression protocol from Section 13.4. That is, the sender compresses the sequence x n emitted from the quantum information source at a rate equal to the Shannon entropy H (X), sends the compressed classical bits over the noiseless quantum channels, the receiver reproduces the classical sequence x n at his end, and finally reconstructs the sequence ∣ψ x n ⟩ of quantum states corresponding to the classical sequence x n . The above strategy will certainly work, but it makes no use of the fact that the noiseless quantum channels are quantum! It is clear that noiseless quantum channels will be expensive in practice, and the above strategy is wasteful in this sense because it could have merely exploited classical channels (channels that cannot preserve superpositions) to achieve the same goals. Schumacher compression 1 Technically, we did not prove the converse part of Shannon’s data compression theorem, but the converse of this chapter suffices for Shannon’s classical theorem as well. 2 How we hope so! If working, coherent fault-tolerant quantum computers come along one day, they stand to benefit from quantum compression protocols. 389 390 CHAPTER 17. SCHUMACHER COMPRESSION is a strategy that makes effective use of noiseless quantum channels to compress a quantum information source down to a rate equal to the von Neumann entropy. This has a great benefit from a practical standpoint—recall from Exercise 11.9.2 that the von Neumann entropy of a quantum information source is strictly lower than the source’s Shannon entropy if the states in the ensemble are non-orthogonal. In order to execute the protocol, the sender and receiver simply need to know the density operator ρ ≡ ∑x p X (x) ∣ψ x ⟩ ⟨ψ x ∣ of the source. Furthermore, Schumacher compression is provably optimal in the sense that any protocol that compresses a quantum information source of the above form at a rate below the von Neumann entropy cannot have a vanishing error in the asymptotic limit. Schumacher compression thus gives an operational interpretation of the von Neumann entropy as the fundamental limit on the rate of quantum data compression. Also, it sets the term “qubit” on a firm foundation in an information-theoretic sense as a measure of the amount of quantum information “contained” in a quantum information source. We begin this chapter by giving the details of the general information processing task corresponding to quantum data compression. We then prove that the von Neumann entropy is an achievable rate of compression and follow by showing that it is optimal (these two respective parts are the direct coding theorem and converse theorem for quantum data compression). We illustrate how much savings one can gain in quantum data compression by detailing a specific example. The final section of the chapter closes with a presentation of more general forms of Schumacher compression. 17.1 The Information Processing Task We first overview the general task that any quantum compression protocol attempts to accomplish. Three parameters n, R, and є corresponding to the length of the original quantum data sequence, the rate, and the error, respectively, characterize any such protocol. An (n, R + δ, є) quantum compression code consists of four steps: state preparation, encoding, transmission, and decoding. Figure 17.1 depicts a general protocol for quantum compression. An State Preparation. The quantum information source outputs a sequence ∣ψ x n ⟩ of quantum states according to the ensemble {p X (x) , ∣ψ x ⟩} where ∣ψ x n ⟩ An ≡ ∣ψ x1 ⟩ 1 ⊗ ⋯ ⊗ ∣ψ x n ⟩ A An . (17.1) The density operator, from the perspective of someone ignorant of the classical sequence x n , is equal to the tensor power state ρ⊗n where ρ ≡ ∑ p X (x) ∣ψ x ⟩ ⟨ψ x ∣ . (17.2) x Also, we can think about the purification of the above density operator. That is, an equivalent mathematical picture is to imagine that the quantum information source produces states of the form ∣φ ρ ⟩RA ≡ ∑ √ p X (x) ∣x⟩ ∣ψ x ⟩ , R A x ©2011 Mark M. Wilde—This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License (17.3) 391 17.2. THE QUANTUM DATA COMPRESSION THEOREM Figure 17.1: The most general protocol for quantum compression. Alice begins with the output of some quantum information source whose density operator is ρ ⊗n on some system An . The inaccessible reference system holds the purification of this density operator. She performs some CPTP encoding map E, sends the compressed qubits through 2nR uses of a noiseless quantum channel, and Bob performs some CPTP decoding map D to decompress the qubits. The scheme is successful if the difference between the initial state and the final state is negligible in the asymptotic limit n → ∞. where R is the label for an inaccessible reference system (not to be confused with the rate R!). The resulting IID state produced is (∣φ ρ ⟩RA)⊗n . n Encoding. Alice encodes the systems An according to some CPTP compression map E A →W where W is a quantum system of size 2nR . Recall that R is the rate of compression: R= 1 log dW − δ, n (17.4) where dW is the dimension of system W and δ is an arbitrarily small positive number. Transmission. Alice transmits the system W to Bob using n (R + δ) noiseless qubit channels. n Decoding. Bob sends the system W through a decompression map D W→Â . The protocol has є error if the compressed and decompressed state is є-close in trace distance to the original state (∣φ ρ ⟩RA)⊗n : ⊗n ∥(φ RA ρ ) n →W − (D W→Â ○ E A n ⊗n )((φ RA ρ ) )∥ ≤ є. 1 (17.5) 17.2 The Quantum Data Compression Theorem We say that a quantum compression rate R is achievable if there exists an (n, R + δ, є) quantum compression code for all δ, є > 0 and all sufficiently large n. Schumacher’s compression theorem establishes the von Neumann entropy as the fundamental limit on quantum data compression. ©2011 Mark M. Wilde—This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License 392 CHAPTER 17. SCHUMACHER COMPRESSION Figure 17.2: Schumacher’s compression protocol. Alice begins with many copies of the output of the quantum information source. She performs a measurement onto the typical subspace corresponding to the state ρ and then performs a compression isometry of the typical subspace to a space of size 2n[H(ρ)+δ] qubits. She transmits these compressed qubits over n [H (ρ) + δ] uses of a noiseless quantum channel. Bob performs the inverse of the isometry to uncompress the qubits. The protocol is successful in the asymptotic limit due to the properties of typical subspaces. Theorem 17.2.1 (Quantum Data Compression). Suppose that ρ A is the density operator of the quantum information source. Then the von Neumann entropy H (A)ρ is the smallest achievable rate R for quantum data compression: inf {R ∶ R is achievable} = H (A)ρ . 17.2.1 (17.6) The Direct Coding Theorem Schumacher’s compression protocol demonstrates that the von Neumann entropy H (A)ρ is an achievable rate for quantum data compression. It is remarkably similar to Shannon’s compression protocol from Section 13.4, but it has some subtle differences that are necessary for the quantum setting. The basic steps of the encoding are to perform a typical subspace measurement and an isometry that compresses the typical subspace. The decoder then performs the inverse of the isometry to decompress the state. The protocol is successful if the typical subspace measurement successfully projects onto the typical subspace, and it fails otherwise. Just like in the classical case, the law of large numbers guarantees that the protocol is successful in the asymptotic limit as n → ∞. Figure 17.2 provides an illustration of the protocol, and we now provide a rigorous argument. ⊗n Alice begins with many copies of the state (φ RA ρ ) . Suppose that the spectral decomposition of ρ is as follows: ρ = ∑ p Z (z) ∣z⟩ ⟨z∣ , (17.7) z where p Z (z) is some probability distribution, and {∣z⟩} is some orthonormal basis. Her first step n n E1A →YA is to perform a typical subspace measurement of the form in (14.1.4) onto the typical subspace of An , where the typical projector is with respect to the density operator ρ. The action of ©2011 Mark M. Wilde—This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License 393 17.2. THE QUANTUM DATA COMPRESSION THEOREM n →An Y E1A n on a general state σ A is n →YAn E1A Y Y (σ A ) ≡ ∣0⟩ ⟨0∣ ⊗ (I − Π δA ) σ A (I − Π δA ) + ∣1⟩ ⟨1∣ ⊗ Π δA σ A Π δA , n n n n n n n (17.8) n and the classically-correlated flag bit Y indicates whether the typical subspace projection Π δA is successful or unsuccessful. Recall from the Shannon compression protocol in Section 13.4 that we exploited a function f that mapped from the set of typical sequences to a set of binary sequences n[H(ρ)+δ] {0, 1} . Now, we can construct an isometry U f that is a coherent version of this classical An W function f . It simply maps the orthonormal basis {∣z n ⟩ } to the basis {∣ f (z n )⟩ }: An U f ≡ ∑ ∣ f (z n )⟩ ⟨z n ∣ . W z n ∈TδZ (17.9) n The above operator is an isometry because the input space is a subspace of size at most 2n[H(ρ)+δ] (recall Property 14.1.2) embedded in a larger space of size 2n (at least for qubits) and the output n space is of size at most 2n[H(ρ)+δ] . So her next step E2YA →YW is to perform the isometric compression n conditional on the flag bit Y being equal to one, and the action of E2YA →YW on a general classicaln n n Y Y quantum state σ YA ≡ ∣0⟩ ⟨0∣ ⊗ σ0A + ∣1⟩ ⟨1∣ ⊗ σ1A is as follows: E2YA n →YW (σ YA ) ≡ ∣0⟩ ⟨0∣Y ⊗ Tr {σ0A } ∣e⟩ ⟨e∣W + ∣1⟩ ⟨1∣Y ⊗ U f σ1A U †f , n n n where ∣e⟩ is some error flag orthogonal to all of the states {∣ f (ϕ x n )⟩ }ϕ W W pletes the details of her encoder E n →YW EA An →YW qn x n ∈Tδ (17.10) . This last step com- , and the action of it on the initial state is ⊗n YA ((φ RA ρ ) ) ≡ (E2 n →YW n →YAn ○ E1A ⊗n )((φ RA ρ ) ). (17.11) Alice then transmits all of the compressed qubits over n [H (ρ) + δ] + 1 uses of the noiseless qubit channel. n Bob’s decoding D YW→A performs the inverse of the isometry conditional on the flag bit being An equal to one and otherwise maps to some other state ∣e⟩ outside of the typical subspace. The action Y Y of the decoder on some general classical-quantum state σ YW ≡ ∣0⟩ ⟨0∣ ⊗ σ0W + ∣1⟩ ⟨1∣ ⊗ σ1W is n Y A Y D1YW→YA (σ YW ) ≡ ∣0⟩ ⟨0∣ ⊗ Tr {σ0W } ∣e⟩ ⟨e∣ + ∣1⟩ ⟨1∣ ⊗ U †f σ1W U f . n (17.12) The final part of the decoder is to discard the classical flag bit: D2YA →A ≡TrY {⋅}. Then D YW→A ≡ n n n D2YA →A ○ D1YW→YA . We now can analyze how this protocol performs with respect to our performance criterion in (17.5). Consider the following chain of inequalities: n ⊗n ∥(φ RA ρ ) n →YW − (D YW→A ○ E A n ⊗n ⊗n ≤ ∥∣1⟩ ⟨1∣ ⊗ (φ RA ρ ) Y ⊗n = ∥∣1⟩ ⟨1∣ ⊗ (φ RA ρ ) Y n − (D1YW→A ○ E A n n →YW − (∣0⟩ ⟨0∣ ⊗ Tr {(I − Y n ⊗n ) ((φ RA ρ ) )∥ (17.13) 1 n →YW YW→A = ∥TrY {∣1⟩ ⟨1∣ ⊗ (φ RA ○ EA ρ ) } − (D Y n ⊗n ) ((φ RA ρ ) )∥ ⊗n (17.14) 1 ) ((φ RA ρ ) )∥ 1 An An RA ⊗n Π δ ) (φ ρ ) } ∣e⟩ ⟨e∣ (17.15) Y + ∣1⟩ ⟨1∣ ⊗ Π δA (φ RA ρ ) n ⊗n Π δA )∥ n 1 (17.16) ©2011 Mark M. Wilde—This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License 394 CHAPTER 17. SCHUMACHER COMPRESSION ⊗n Y The first equality follows by adding a flag bit ∣1⟩ to (φ RA and tracing it out. The first inequality ρ ) follows from monotonicity of trace distance under the discarding of subsystems (Corollary 9.1.2). ⊗n n n The second equality follows by evaluating the map D1YW→A ○ E A →YW on the state (φ RA ρ ) . Continuing, we have ⊗n ≤ ∥∣1⟩ ⟨1∣ ⊗ (φ RA ρ ) Y Y − ∣1⟩ ⟨1∣ ⊗ Π δA (φ RA ρ ) n + ∥∣0⟩ ⟨0∣ ⊗ Tr {(I − ⊗n ⊗n − Π δA (φ RA ρ ) n Π δA ∥ n 1 An An RA ⊗n Π δ ) (φ ρ ) } ∣e⟩ ⟨e∣ ∥ Y = ∥(φ RA ρ ) √ ≤ 2 є+є ⊗n 1 (17.17) ⊗n Π δA ∥ + Tr {(I − Π δA ) (φ RA ρ ) } n n (17.18) 1 (17.19) The first inequality follows from the triangle inequality for trace distance (Lemma 9.1.2). The equality uses the facts ∥ρ ⊗ σ − ω ⊗ σ∥1 = ∥ρ − ω∥1 ∥σ∥1 = ∥ρ − ω∥1 and ∥bρ∥1 = ∣b∣ ∥ρ∥1 for some density operators ρ, σ, and ω and a constant b. The final inequality follows from the first property of typical subspaces: ⊗n n An ⊗n (17.20) Tr {Π δA (φ RA ρ ) } = Tr {Π δ ρ } ≥ 1 − є, and the Gentle Operator Lemma (Lemma 9.4.2). We remark that it is important for the typical subspace measurement in (17.8) to be implemented as a coherent quantum measurement. That is, the only information that this measurement should learn is whether the state is typical or not. Otherwise, there would be too much disturbance to the quantum information, and the protocol would fail at the desired task of compression. Such precise control on so many qubits is possible in principle, but it is of course rather daunting to implement in practice! 17.2.2 The Converse Theorem We now prove the converse theorem for quantum data compression by considering the most general compression protocol that meets the success criterion in (17.5) and demonstrating that such an asymptotically error-free protocol should have its rate of compression above the von Neumann entropy of the source. Alice would like to compress a state σ that lives on a Hilbert space An . The n n purification ϕ R A of this state lives on the joint systems An and R n where R n is the purifying system (again, we should not confuse reference system R n with rate R). If she can compress any system on An and recover it faithfully, then she should be able to do so for the purification of the state. An (n, R + δ, є) compression code has the property that it can compress at a rate R with only error є. The quantum data processing is An E W D Ân , (17.21) Ð → Ð → and the following inequality holds for a successful quantum compression protocol: ∥ω R where ωR n Ân n Ân − ϕR n Ân ∥ ≤ є, ≡ D(E(ϕ R 1 n An (17.22) )). ©2011 Mark M. Wilde—This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License (17.23) 395 17.3. QUANTUM COMPRESSION EXAMPLE Consider the following chain of inequalities: 2nR = log2 (2nR ) + log2 (2nR ) (17.24) ≥ ∣H (W)ω ∣ + ∣H (W∣R )ω ∣ ≥ ∣H (W)ω − H (W∣R n )ω ∣ = I (W; R n )ω n (17.25) (17.26) (17.27) The first inequality follows from the fact that both the quantum entropy H (W) and the conditional quantum entropy H (W∣R n ) cannot be larger than the logarithm of the dimension of the system W (see Property 11.1.3 and Theorem 11.5.1). The second inequality is the triangle inequality. The second equality is from the definition of quantum mutual information. Continuing, we have ≥ I (Ân ; R n )ω (17.28) ≥ I (Ân ; R n )ϕ − nє′ (17.29) = I (An ; R n )ϕ − nє′ (17.30) = H (A )ϕ + H (R )ϕ − H (A R )ϕ − nє n n n n ′ = 2H (An )ϕ − nє′ (17.31) (17.32) The first inequality follows from the quantum data processing inequality (Bob processes W with the decoder to get Ân ). The second inequality follows from applying the Alicki-Fannes’ inequality (see Exercise 11.9.7) to the success criterion in (17.22) and setting є′ ≡ 6єR + 4H2 (є)/n. The first equality follows because the systems Ân and An are isomorphic (they have the same dimension). The second equality is by the definition of quantum mutual information, and the last equality follows because the entropies of each half of a pure, bipartite state are equal and their joint entropy vanishes. In the case that the state ϕ is an IID state of the form (∣φ ρ ⟩RA)⊗n from before, the von Neumann entropy is additive so that H (An )∣φ ρ ⟩⊗n = nH (A)∣φ ρ ⟩ , (17.33) and this shows that the rate R must be greater than the entropy H (A) if the error of the protocol vanishes in the asymptotic limit. 17.3 Quantum Compression Example We now highlight a particular example where Schumacher compression gives a big savings in compression rates if noiseless quantum channels are available. Suppose that the ensemble is of the following form: 1 1 {( , ∣0⟩) , ( , ∣+⟩)} . (17.34) 2 2 This ensemble is known as the Bennett-92 ensemble because it is useful in Bennett’s protocol for quantum key distribution. The naive strategy would be for Alice and Bob to exploit Shannon’s compression protocol. That is, Alice would ignore the quantum nature of the states, and supposing that ©2011 Mark M. Wilde—This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License 396 CHAPTER 17. SCHUMACHER COMPRESSION the classical label for them were available, she would encode the classical label. Though, the entropy of the uniform distribution on two states is equal to one bit, and she would have to transmit classical messages at a rate of one bit per channel use. A far wiser strategy is to employ Schumacher compression. The density operator of the above ensemble is 1 1 ∣0⟩ ⟨0∣ + ∣+⟩ ⟨+∣ , (17.35) 2 2 which has the following spectral decomposition: cos2 (π/8) ∣+′ ⟩ ⟨+′ ∣ + sin2 (π/8) ∣−′ ⟩ ⟨−′ ∣ , (17.36) ∣+′ ⟩ ≡ cos (π/8) ∣0⟩ + sin (π/8) ∣1⟩ , ∣−′ ⟩ ≡ sin (π/8) ∣0⟩ − cos (π/8) ∣1⟩ . (17.37) (17.38) where The binary entropy H2 (cos2 (π/8)) of the distribution [cos2 (π/8) , sin2 (π/8)] is approximately equal to 0.6009 qubits, (17.39) and thus they can save a significant amount in terms of compression rate by employing Schumacher compression. This type of savings will always occur whenever the ensemble includes nonorthogonal quantum states. Exercise 17.3.1 In the above example, suppose that Alice associates a classical label with the states, so that the ensemble instead is 1 1 {( , ∣0⟩ ⟨0∣ ⊗ ∣0⟩ ⟨0∣) , ( , ∣1⟩ ⟨1∣ ⊗ ∣+⟩ ⟨+∣)} . 2 2 (17.40) Does this help in reducing the amount of qubits she has to transmit to Bob? 17.4 Variations on the Schumacher Theme We can propose several variations on the Schumacher compression theme. For example, suppose that the quantum information source corresponds to the following ensemble instead: {p X (x) , ρ xA} , (17.41) where each ρ x is a mixed state. Then the situation is not as “clear-cut” as in the simpler model for a quantum information source because the techniques exploited in the converse proof do not apply here. Thus, the entropy of the source does not serve as a lower bound on the ultimate compressibility rate. Let us consider a special example of the above situation. Suppose that the mixed states ρ x live on orthogonal subspaces, and let ρ A = ∑x p X (x) ρ x denote the expected density operator of the ©2011 Mark M. Wilde—This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License 397 17.4. VARIATIONS ON THE SCHUMACHER THEME ensemble. These states are perfectly distinguishable by a measurement whose projectors project onto the different orthogonal subspaces. Alice could then perform this measurement and associate classical labels with each of the states: ρ XA ≡ ∑ p X (x) ∣x⟩ ⟨x∣ ⊗ ρ xA . X (17.42) x Furthermore, she can do this in principle without disturbing the state in any way, and therefore the entropy of the state ρ XA is equivalent to the original entropy of the state ρ A: H (A)ρ = H (XA)ρ . (17.43) Naively applying Schumacher compression to such a source is actually not a great strategy here. The compression rate would be equal to H (XA)ρ = H (X)ρ + H (A∣X)ρ , (17.44) and in this case, H (A∣X)ρ ≥ 0 because the conditioning system is classical. For this case, a much better strategy than Schumacher compression is for Alice to measure the classical variable X, compress it with Shannon compression, and transmit to Bob so that he can reconstruct the quantum states at his end of the channel. The rate of compression here is equal to the Shannon entropy H (X) which is provably lower than H (XA)ρ for this example. How should we handle the mixed source case in general? Let’s consider the direct coding theorem and the converse theorem. The direct coding theorem for this case is essentially equivalent to Schumacher’s protocol for quantum compression—there does not appear to be a better approach in the general case. The density operator of the source is equal to ρ A = ∑ p X (x) ρ xA . (17.45) x A compression rate R > H (A)ρ is achievable if we form the typical subspace measurement from the n ⊗n typical subspace projector Π δA onto the state (ρ A) . Although the direct coding theorem stays the same, the converse theorem changes somewhat. A purification of the above density operator is as follows: √ RA X X′ XX ′ RA (17.46) ∣ϕ⟩ ≡ ∑ p X (x) ∣x⟩ ∣x⟩ ∣ϕ ρ x ⟩ , x RA where each ∣ϕ ρ x ⟩ is a purification of ρ xA. So the purifying system is the joint system XX ′ R. Let ω XX R Â be the actual state generated by the protocol: ′ ω XX R Â ≡ D(E(ϕ XX RA)). ′ ′ (17.47) We can now provide an alternate converse proof: nR = log2 dW ≥ H (W)ω ≥ H (W)ω − H (W∣X)ω = I (W; X)ω ≥ I (A; X)ω ≥ I (A; X)ϕ − nє′ (17.48) (17.49) (17.50) (17.51) (17.52) (17.53) ©2011 Mark M. Wilde—This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License 398 CHAPTER 17. SCHUMACHER COMPRESSION The first equality follows from evaluating the logarithm of the dimension of system W. The first inequality follows because the von Neumann entropy is less than the logarithm of the dimension of the system. The second inequality follows because H (W∣X)ω ≥ 0—the system X is classical when tracing over X ′ and R. The second equality follows from the definition of quantum mutual information. The third inequality follows from quantum data processing, and the final follows from applying the Alicki-Fannes’ inequality (similar to the way that we did for the converse of the quantum data XX ′ RA compression theorem). Tracing over X ′ and R of ∣ϕ⟩ gives the following state ∑ p X (x) ∣x⟩ ⟨x∣ ⊗ ρ xA , X (17.54) x demonstrating that the ultimate lower bound on the compression rate of a mixed source is the Holevo information of the ensemble. The next exercise asks you to verify that ensembles of mixed states on orthogonal subspaces saturate this bound. Exercise 17.4.1 Show that the Holevo information of an ensemble of mixed states on orthogonal subspaces has its Shannon information equal to its Holevo information. Thus, this is an example of a class of ensembles that meet the above lower bound on compressibility. 17.5 Concluding Remarks Schumacher compression was the first quantum Shannon-theoretic result discovered and is the simplest one that we encounter in this book. The proof is remarkably similar to the proof of Shannon’s noiseless coding theorem, with the main difference that we should be more careful in the quantum case not to be learning any more information than necessary when performing measurements. The intuition that we gain for future quantum protocols is that it often suffices to consider only what happens to a high probability subspace rather than the whole space itself if our primary goal is to have a small probability of error in a communication task. In fact, this intuition is the same needed for understanding information processing tasks such as entanglement concentration, classical communication, private classical communication, and quantum communication. The problem of characterizing the lower and upper bounds for the quantum compression rate of a mixed state quantum information source still remains open, despite considerable efforts in this direction. It is only in special cases, such as the example mentioned in Section 17.4, that we know of a matching lower and upper bound as in Schumacher’s original theorem. 17.6 History and Further Reading Schumacher introduced typical subspaces and proved the quantum data compression theorem in Ref. [203]. Jozsa and Schumacher later generalized this proof [160], and Lo further generalized the theorem to mixed state sources [177]. There are other generalizations in Refs. [139, 13]. Several schemes for universal quantum data compression exist [158, 159, 30], in which the sender does not need to have a description of the quantum information source in order to compress its output. ©2011 Mark M. Wilde—This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License 17.6. HISTORY AND FURTHER READING 399 There are also practical schemes for quantum data compression in work about quantum Huffman codes [45]. ©2011 Mark M. Wilde—This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License 400 CHAPTER 17. SCHUMACHER COMPRESSION ©2011 Mark M. Wilde—This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License