Quantum Information Theory
This web site is intended to present Quantum Information Theory at an undergraduate
level. It is assumed that the reader has a basic understanding of quantum mechanics.
The basics of classical information theory are presented, but for greater depth please
see the references.
Here is a list of topics in quantum information theory, presented in the order in
which you will probably need to read them. Examples are provided for almost every
concept.
1. Quantum Density Matrix
2. Entropy
3. Mutual Information and Conditional Entropy
4. Galois Fields
5. Classical Coding Theory
6. Quantum Coding Theory
Notation used in these pages: a prime following a complex number indicates the
complex conjugate; a prime following a matrix or operator indicates the Hermitian
conjugate. All logarithms in the text are to the base 2, so that information is measured
in units of bits. Because not all web browsers support superscripts and subscripts, an
underscore precedes subscripts and a caret precedes superscripts: X^2 is "X squared"
and X_2 is "X sub 2". Because of the hassle of making angle brackets in HTML,
parentheses are used in Dirac notation, bra = (X| and ket = |X).
Quantum Density Matrix
A quantum density matrix expresses the distribution of quantum states in an ensemble
of particles. The density matrix is Hermitian, has unit trace, and has all of its eigenvalues between 0 and 1.
Let the density matrix for a certain ensemble be D. A particle is drawn from this
ensemble and subjected to a measurement. The probability that the particle will be
found in eigenstate |A) of the measurement is (A|D|A). The density matrix for an
ensemble of particles prepared in the pure state |Y) is |Y)(Y|. The density matrix for
an ensemble that is an incoherent mixture of pure states is the weighted sum of the
density matrices for the pure states.
Example 1
Let the particles in an ensemble have a two dimensional state space spanned by
eigenstates |0) and |1). Each particle may be thought of as a quantum bit, or "qubit". If
all particles are prepared in the pure state |Y) = a |0) + b |1), with a and b complex, aa'
+ bb' = 1, then the quantum density matrix for this ensemble is:
D = |Y)(Y| = | aa'  ab' |
             | ba'  bb' |
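To check this numerically, here is a minimal Python/NumPy sketch; the particular values of a and b are just an illustration, not from the text:

import numpy as np

# Pure state |Y) = a|0) + b|1), with aa' + bb' = 1
a, b = 0.8, 0.6j                # illustrative amplitudes satisfying |a|^2 + |b|^2 = 1
psi = np.array([a, b])          # the ket |Y) as a column vector

# Density matrix D = |Y)(Y| is the outer product with the complex conjugate
D = np.outer(psi, psi.conj())
print(D)                        # [[aa', ab'], [ba', bb']]
print(np.trace(D).real)         # 1.0, as required of a density matrix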
Example 2
Suppose that 3/4 of the particles in an ensemble are prepared in the pure state |A) =
0.8 |0) + 0.6 |1), and the other 1/4 are prepared in the pure state |B) = 0.6 |0) - 0.8i |1).
The quantum density matrix for this ensemble is:
D = 0.75 |A)(A| + 0.25 |B)(B| = | 0.57        0.36+0.12i |
                                | 0.36-0.12i  0.43       |
A particle is drawn from this ensemble and a measurement is performed whose
eigenstates are |0) and |1). There is a 57% chance that the particle will be found in the
state |0) and a 43% chance that the particle will be found in state |1). Another particle
is drawn and a measurement is performed whose eigenstates are |E) = 0.6 |0) + 0.8 |1)
and |F) = 0.8 |0) - 0.6 |1). What are the probabilities that the particle will be found in
these states?
Prob(E) = (E|D|E) = 0.75 (E|A)(A|E) + 0.25 (E|B)(B|E)
        = [ 0.6  0.8 ] | 0.57        0.36+0.12i | | 0.6 |
                       | 0.36-0.12i  0.43       | | 0.8 |
        = 0.826
Prob(F) = (F|D|F) = 0.75 (F|A)(A|F) + 0.25 (F|B)(B|F)
        = [ 0.8  -0.6 ] | 0.57        0.36+0.12i | |  0.8 |
                        | 0.36-0.12i  0.43       | | -0.6 |
        = 0.174
We find that there is an 82.6% chance that the particle will be found in the state |E)
and a 17.4% chance that the particle will be found in the state |F).
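These numbers are easy to reproduce with a short Python/NumPy sketch; it is a direct transcription of the formulas above, and the helper name proj is just illustrative:

import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])

A = 0.8 * ket0 + 0.6 * ket1           # |A)
B = 0.6 * ket0 - 0.8j * ket1          # |B)

def proj(psi):
    # density matrix |psi)(psi| of a pure state
    return np.outer(psi, psi.conj())

# Incoherent mixture: 3/4 of the particles in |A), 1/4 in |B)
D = 0.75 * proj(A) + 0.25 * proj(B)
print(D)                              # [[0.57, 0.36+0.12j], [0.36-0.12j, 0.43]]

# Measurement in the |E), |F) basis
E = 0.6 * ket0 + 0.8 * ket1
F = 0.8 * ket0 - 0.6 * ket1
print((E.conj() @ D @ E).real)        # 0.826
print((F.conj() @ D @ F).real)        # 0.174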
Entropy
The entropy is a measure of the disorder of a system. In information theory, the
entropy represents our ignorance of the state of a system. If our system can be in any
of 2^n possible equally likely states, the entropy is n bits, since it would require
answering n yes/no questions to determine exactly which state the system was in. For
systems like this, the entropy is log(states) [base 2]. Now suppose that we divide the
states of the system into two sets, one containing nine tenths of the states and the
other containing the other tenth. You agree to tell me which of the two sets the actual
state of the system is in (there are N = 2^n states in total). If it is in the smaller part, the
entropy is log(N/10) = log(N) - log(10), so it has decreased by log(10) = 3.32193 bits. If it
is in the larger part, the entropy is log(9N/10) = log(N) - log(10/9), so it has decreased by
log(10/9) = 0.15200
bits. Since the state has a nine tenths chance of being in the larger part, the expected
decrease in entropy is
delta(E) = 0.9 log(10/9) + 0.1 log(10)
         = -0.9 log(0.9) - 0.1 log(0.1)
         = 0.46900 bits
This is the same as having a system which can be in either of two states, one with
probability 0.9 and one with probability 0.1. The Shannon entropy of this system is
defined to be the expected amount of information gained (i.e. entropy lost) by finding
out which of the two states the system is actually in (0.46900 bits). This easily
generalizes to the cases with many states of unequal probability.
Given a discrete random variable X with distribution P_X and letting p_x be the
probability that X=x, the Boltzmann-Gibbs-Shannon entropy of the distribution P_X
is defined as:
S(P_X) = - Sum(x) p_x log p_x
Here and in all these documents log means the logarithm to the base 2, so the entropy
is expressed in bits. Please note that in this computation 0 log 0 = 0 since lim(a->0) a
log a = 0.
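As a quick illustration, the Shannon entropy can be computed directly from this definition; here is a minimal Python sketch (the function name is just illustrative) that reproduces the 0.469-bit example above:

import numpy as np

def shannon_entropy(probs):
    # S(P) = - Sum(x) p_x log2 p_x, with the convention 0 log 0 = 0
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]                          # drop zero probabilities
    return -np.sum(p * np.log2(p))

print(shannon_entropy([0.9, 0.1]))        # ~0.469 bits, as in the example above
print(shannon_entropy([0.5, 0.5]))        # 1 bit: a fair coin
print(shannon_entropy([1.0]))             # 0 bits: no uncertainty at all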
An important property of the entropy defined this way is that when two independent
systems are combined into a larger system, the entropy of the combined system is the
sum of the entropies of the component systems. Put more simply, if I give you N bits
of information and then give you M more bits of information which is completely
unrelated to the first N bits, you now have N+M bits of information about the system.
(This certainly seems like a good thing to demand of the definition of entropy.)
Consider two independent discrete random variables X and Y with distributions P_X and
Q_Y, and let p_x be the probability that X=x and q_y be the probability that Y=y. The
probability that X=x and Y=y is thus p_x q_y. The Boltzmann-Gibbs-Shannon
entropy of the joint distribution P_X Q_Y is:
S(P_X Q_Y) = - Sum(x,y) p_x q_y log(p_x q_y)
           = - Sum(x,y) p_x q_y (log p_x + log q_y)
           = - Sum(x) Sum(y) p_x q_y log p_x - Sum(y) Sum(x) p_x q_y log q_y
           = - (Sum(x) p_x log p_x)(Sum(y) q_y) - (Sum(y) q_y log q_y)(Sum(x) p_x)
           = - Sum(x) p_x log p_x - Sum(y) q_y log q_y
           = S(P_X) + S(Q_Y)
since Sum(x) p_x = Sum(y) q_y = 1.
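A quick numerical check of this additivity property; the two distributions chosen here are arbitrary examples:

import numpy as np

def shannon_entropy(probs):
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

px = np.array([0.9, 0.1])                 # distribution P_X
qy = np.array([0.5, 0.3, 0.2])            # distribution Q_Y, independent of X
joint = np.outer(px, qy)                  # p_x q_y for every pair (x, y)

print(shannon_entropy(joint.ravel()))                   # S(P_X Q_Y)
print(shannon_entropy(px) + shannon_entropy(qy))        # S(P_X) + S(Q_Y), the same value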
An equivalent expression for the quantum entropy has been introduced by von
Neumann in terms of the quantum density matrix D:
S = - Trace( D log D )
To evaluate this expression, write D in the form VEV', where E is a diagonal matrix of the
eigenvalues of D and V is a unitary matrix whose columns are the corresponding eigenvectors
of D. Then log D = V log(E) V', and
S = - Trace( D log D ) = - Trace( VEV' V log(E) V' )
  = - Trace( V (E log E) V' )
  = - Trace( E log E ) = - Sum(i) E_i log E_i
where E_i are the eigenvalues of D.
For an ensemble of particles in a pure state, D will have one eigenvalue equal to 1 and
all the others 0. Since 1 log 1 = 0 log 0 = 0, the entropy of this ensemble is 0, as
expected. Viewed in a basis where D is diagonal, the quantum entropy is exactly the
Shannon entropy, since the eigenvalues of D are the probabilities to find a particle in
each of the eigenstates.
Example
Let us compute the entropy of a particle drawn from the distribution in example 2 of
the Quantum Density Matrix page.
D = | 0.57        0.36+0.12i |
    | 0.36-0.12i  0.43       |
D has eigenvalues 0.885876 and 0.114124.
S = -0.885876 log 0.885876 - 0.114124 log 0.114124
= 0.512232
Compare this to the maximum entropy for a qubit:
S_max = -0.5 log 0.5 - 0.5 log 0.5 = log 2 = 1
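The eigenvalues and the entropy can be checked with a short Python/NumPy sketch, a direct transcription of S = - Sum(i) E_i log E_i:

import numpy as np

D = np.array([[0.57,         0.36 + 0.12j],
              [0.36 - 0.12j, 0.43        ]])

evals = np.linalg.eigvalsh(D)             # eigenvalues of the Hermitian matrix D
print(evals)                              # ~[0.114124, 0.885876]

evals = evals[evals > 0]                  # 0 log 0 = 0
S = -np.sum(evals * np.log2(evals))
print(S)                                  # ~0.5122 bits, well below the 1-bit maximum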
Mutual Information and Conditional Entropy
NOTE: The majority of this page comes from papers written by Nicolas Cerf and
Chris Adami at Caltech. These papers are very readable and highly recommended.
[Nicolas and Chris - if I mess this up, send me some e-mail and I'll correct it. - Michael]
Fundamentals of Probability Theory
Given a sample space S and events E and F in S, the probability of E is written P(E)
and the probability of F is written P(F). The joint probability of E and F is the
probability that both occur, P(E and F).
E and F are independent if and only if P(E and F) = P(E) * P(F).
P(E or F) = P(E) + P(F) - P(E and F)
The conditional probability of E occurring, given that F has occurred is
P(E | F) = P(E and F) / P(F)
Bayes Rule: If (E_1,...,E_n) are n mutually exclusive events whose union is the
sample space S, and E is any arbitrary event of S with nonzero probability, then
P(E_k | E) = P(E_k) P(E | E_k) / P(E)
Conditional Entropy - Classical
The classical conditional entropy is defined as:
H(X | Y) = H(P_XY) - H(P_Y) = - Sum(x,y) p_xy log p_x|y
where p_x|y = p_xy / p_y = probability of X=x conditional on Y=y
In the classical theory H(X|Y) is never negative, because the entropy of a composite
system cannot be lower than the entropy of any of its subsystems. Surprisingly, this is
not true in the quantum case, when X and Y are quantum entangled systems. For a
classical system:
max[H(P_X),H(P_Y)] <= H(P_XY) <= H(P_X) + H(P_Y)
The upper bound is reached if X and Y are independent, and the lower bound is
reached if X and Y are maximally correlated.
Mutual Information - Classical
Given a discrete two dimensional sample space with distribution P_XY, we write
p_xy = Prob(X=x,Y=y). The distributions of X and Y separately are P_X and P_Y,
and the probabilities p_x = Prob(X=x) and p_y = Prob(Y=y) may be calculated in
terms of p_xy:
p_x = Sum(y) p_xy
p_y = Sum(x) p_xy
The entropy of each distribution is:
H(XY) = - Sum(x,y) p_xy log p_xy
H(X)  = - Sum(x) p_x log p_x
H(Y)  = - Sum(y) p_y log p_y
The mutual information of X and Y, written I(X,Y), is the amount of information
gained about X when Y is learned, and vice versa. I(X,Y) = 0 if and only if X and Y
are independent.
I(X,Y) = H(P_X) + H(P_Y) - H(P_XY)
I(X,Y) <= min[H(P_X),H(P_Y)]
The bound on I(X,Y) is twice as large in the quantum case as it is in the classical case.
The maximum for the quantum case occurs for entangled states which have no
classical counterpart.
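Here is a minimal Python sketch of these classical quantities for one illustrative joint distribution p_xy; the particular numbers are my own example, not from the text:

import numpy as np

def H(probs):
    # Shannon entropy in bits, with 0 log 0 = 0
    p = np.asarray(probs, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Joint distribution of two correlated bits (an arbitrary example)
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

p_x = p_xy.sum(axis=1)                    # marginal distribution of X
p_y = p_xy.sum(axis=0)                    # marginal distribution of Y

H_XY, H_X, H_Y = H(p_xy), H(p_x), H(p_y)

print(H_XY - H_Y)                         # H(X|Y) ~ 0.722, never negative classically
print(H_X + H_Y - H_XY)                   # I(X,Y) ~ 0.278
print(H_X + H_Y - H_XY <= min(H_X, H_Y))  # True: the classical bound on I(X,Y)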
Conditional Entropy - Quantum
Cerf and Adami define the conditional density matrix
D_X|Y = D_XY ( 1_X @ D_Y )^-1
where @ represents the tensor product,
and 1_X is the identity matrix on X's Hilbert space.
D_X|Y is not a true density matrix because it does not have trace 1. Because of this,
the conditional entropy can be negative. The quantum conditional entropy is defined
as
S(X|Y) = - Trace( D_XY log D_X|Y ) = S(D_XY) - S(D_Y)
Mutual Information - Quantum
The mutual information of two entities in a quantum system is defined in terms of the
quantum entropy which is expressed in terms of the quantum density matrix.
I(X,Y) = S(D_X) + S(D_Y) - S(D_XY)
I(X,Y) <= 2 min[S(D_X),S(D_Y)]
The classical limit is exceeded in the quantum entangled case and is related to
negative conditional entropy.
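As a numerical illustration of a negative quantum conditional entropy, take the maximally entangled two-qubit state (|00) + |11))/sqrt(2); the sketch below computes S(X|Y) = S(D_XY) - S(D_Y) directly from the density matrices:

import numpy as np

def S(D):
    # von Neumann entropy in bits
    evals = np.linalg.eigvalsh(D)
    evals = evals[evals > 1e-12]
    return -np.sum(evals * np.log2(evals))

bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)    # (|00) + |11)) / sqrt(2)
D_XY = np.outer(bell, bell.conj())                    # a pure state of the pair

# Reduced density matrix of Y: trace out X
D_Y = np.trace(D_XY.reshape(2, 2, 2, 2), axis1=0, axis2=2)

print(S(D_XY))             # 0: the pair as a whole is in a pure state
print(S(D_Y))              # 1: either qubit alone is maximally mixed
print(S(D_XY) - S(D_Y))    # S(X|Y) = -1, impossible for a classical system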
Examples
1. Independent States
2. Classically Correlated States
3. Quantum Entangled States
Galois Fields
A Galois field is a finite field with p^n elements where p is a prime integer. The set of
nonzero elements of the field is a cyclic group under multiplication. A generator of
this cyclic group is called a primitive element of the field. The Galois field can be
generated as the set of polynomials with coefficients in Z_p modulo an irreducible
polynomial of degree n.
Example 1: GF(2)
GF(2) consists of the elements 0 and 1 and is the smallest finite field. It is generated
by polynomials over Z_2 modulo the polynomial x. Its addition and multiplication
tables are as follows:
+ | 0 1
--+----
0 | 0 1
1 | 1 0

* | 0 1
--+----
0 | 0 0
1 | 0 1
Codes often use GF(2) because it is easily represented on a computer by a single bit.
Example 2: GF(3)
GF(3) consists of the elements 0, 1, and -1. It is generated by polynomials over Z_3
modulo the polynomial x. Its addition and multiplication tables are as follows:
 + |  0  1 -1
---+---------
 0 |  0  1 -1
 1 |  1 -1  0
-1 | -1  0  1

 * |  0  1 -1
---+---------
 0 |  0  0  0
 1 |  0  1 -1
-1 |  0 -1  1
Some codes called ternary codes use GF(3).
Example 3: GF(4)
Of particular interest in quantum information theory is GF(4), which is generated by
polynomials over Z_2 modulo the irreducible polynomial x^2 + x + 1. Its elements
are denoted here as (0,1,A,B). Here are the addition and multiplication tables for
GF(4):
+ | 0 1 A B
--+--------
0 | 0 1 A B
1 | 1 0 B A
A | A B 0 1
B | B A 1 0

* | 0 1 A B
--+--------
0 | 0 0 0 0
1 | 0 1 A B
A | 0 A B 1
B | 0 B 1 A
Because of the multiplication table, A is often identified with the complex cube root of unity -1/2 + i sqrt(3)/2. A and B are primitive elements of GF(4).
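The tables can be generated by doing the polynomial arithmetic directly. Here is a small Python sketch that uses the (illustrative) identification A = x and B = x + 1, storing each element c1*x + c0 as the two-bit integer with bits (c1, c0); running it reproduces the two tables above:

NAMES = {0: '0', 1: '1', 2: 'A', 3: 'B'}      # 2 = x, 3 = x + 1

def gf4_add(a, b):
    # addition is coefficient-wise mod 2, i.e. XOR of the bit patterns
    return a ^ b

def gf4_mul(a, b):
    # multiply the polynomials over Z_2, then reduce modulo x^2 + x + 1
    result = 0
    for shift in range(2):
        if (b >> shift) & 1:
            result ^= a << shift
    if result & 0b100:                        # an x^2 term appeared
        result ^= 0b111                       # replace x^2 by x + 1
    return result

for op, symbol in [(gf4_add, '+'), (gf4_mul, '*')]:
    print(symbol, [NAMES[e] for e in range(4)])
    for a in range(4):
        print(NAMES[a], [NAMES[op(a, b)] for b in range(4)])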
Classical Coding Theory
The fundamental problem of coding theory is the reliable transmission of information
in the presence of noise. A message is encoded into a stream of bits and decoded at
the other end. In order to protect against the corruption of the message by noise, some
degree of redundancy is required in the message. The greater the chance of a bit error,
the more redundancy is needed to keep the probability of error low.
A code C over an alphabet A is a set of vectors of fixed length n with entries from A.
A is generally chosen to be a finite field GF(q), and is often in practice just (0,1).
The Hamming distance d(u,v) between two vectors u and v is the number of places in
which they differ. For a vector u over GF(q), define the weight, wt(u), as the number
of nonzero components. Then d(u,v) = wt(u-v). The minimum Hamming distance
between two distinct vectors in a code C is called the minimum distance d. A code can
detect e errors if e < d, and it can correct t errors if 2t + 1 <= d.
Error correction proceeds by computing a syndrome from the received code word
which tells which bits are in error. The error bits are then changed to form a corrected
code word, which will be equal to the transmitted code word if no more than t errors
occurred during transmission.
The rate of a code is the ratio of the number of bits needed to send a message in an
error-free transmission to the number needed to send the message using the code. One
of the goals of coding theory is to find efficient codes, i.e., those that have as large a
rate as possible for a given level of error tolerance.
Example 1: Tell Me Three Times
A simple error correcting scheme which can correct one error is to send each bit three
times and assume at least two of the bits are correct at the receiver. The alphabet here
is (0,1) and the two code words are 000 and 111. The code words differ in three
positions so the minimum distance of this code is 3, thus we can correct one error.
The rate of this code is one third since we send three bits for every bit of the original
message. Suppose we receive the message abc. We form the syndrome (b+c, a+c). If
the syndrome is:




(0,0) - do nothing
(0,1) - flip a
(1,0) - flip b
(1,1) - flip c
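A minimal Python sketch of this code and its syndrome decoder; the function names are just illustrative:

def encode(bit):
    # Tell Me Three Times: repeat the bit
    return [bit, bit, bit]

def decode(received):
    # correct at most one flipped bit using the syndrome (b+c, a+c) mod 2
    a, b, c = received
    syndrome = ((b + c) % 2, (a + c) % 2)
    flip = {(0, 0): None, (0, 1): 0, (1, 0): 1, (1, 1): 2}[syndrome]
    corrected = list(received)
    if flip is not None:
        corrected[flip] ^= 1
    return corrected[0]

# any single error is corrected
for error_pos in range(3):
    word = encode(1)
    word[error_pos] ^= 1                     # one bit flipped in transit
    assert decode(word) == 1
print("all single errors corrected")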
Example 2: Hamming(7,4,3)
The alphabet for this code is (0,1). There are 16 seven bit code words:
0000000, 0100101, 1000011, 1100110,
0001111, 0101010, 1001100, 1101001,
0010110, 0110011, 1010101, 1110000,
0011001, 0111100, 1011010, 1111111
The first four bits of each code word are the message we wish to send, and the
remaining three are parity bits. Given a message abcd we compute, mod 2, e=b+c+d,
f=a+c+d, and g=a+b+d, and send the code word abcdefg. Upon reception we form the
syndrome (d+e+f+g, b+c+f+g, a+c+e+g) and correct bits as follows:
(0,0,0) - do nothing
(0,0,1) - flip a
(0,1,0) - flip b
(0,1,1) - flip c
(1,0,0) - flip d
(1,0,1) - flip e
(1,1,0) - flip f
(1,1,1) - flip g
If there was no more than one bit in error, abcd will be the original message
transmitted. The rate of this code is 4/7.
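Here is a short Python sketch (the helper names are illustrative) that rebuilds the 16 code words from the parity rules above, confirms that the minimum distance is 3, and corrects every single-bit error with the syndrome table:

from itertools import product

def encode(a, b, c, d):
    # append the parity bits e = b+c+d, f = a+c+d, g = a+b+d (mod 2)
    return [a, b, c, d, (b + c + d) % 2, (a + c + d) % 2, (a + b + d) % 2]

def decode(word):
    a, b, c, d, e, f, g = word
    syndrome = ((d + e + f + g) % 2, (b + c + f + g) % 2, (a + c + e + g) % 2)
    table = {(0, 0, 1): 0, (0, 1, 0): 1, (0, 1, 1): 2, (1, 0, 0): 3,
             (1, 0, 1): 4, (1, 1, 0): 5, (1, 1, 1): 6}     # which bit to flip
    corrected = list(word)
    if syndrome in table:
        corrected[table[syndrome]] ^= 1
    return corrected[:4]                                   # the message bits abcd

codewords = [encode(*msg) for msg in product([0, 1], repeat=4)]

dist = lambda u, v: sum(x != y for x, y in zip(u, v))      # Hamming distance
print(min(dist(u, v) for u in codewords for v in codewords if u != v))   # 3

assert all(decode([bit ^ (i == pos) for i, bit in enumerate(cw)]) == cw[:4]
           for cw in codewords for pos in range(7))
print("all single-bit errors corrected; the rate is 4/7")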
Quantum Coding Theory
In the course of a computation in a quantum computer, the transmission of data from
one computing element to another will be disturbed by thermal noise. Classical
computers overcome this by using a large number of particles for each bit sent.
Quantum computers likewise need redundancy in transmission, which is provided by quantum
codes. It is important that the error correcting scheme used return the message to its
original state, which may be a superposition of code words. Because of this, the
correction must take place without performing a measurement of the codeword, which
would destroy the phase information in the superposition.
The basic idea behind quantum error correction is to give the code qubits several
paths to take, each of which corresponds to an error syndrome. In each branch, the
corresponding error is corrected, and all the branches are then merged to form a corrected output. The trick
is to find a Hamiltonian which splits codewords with different syndromes apart from
each other without introducing further noise into the codewords. Great care must be
taken to merge all the branches back together in phase, or the phase relationships of
the original codeword will be altered.
To show a simple example of correction, suppose I wish to transport a vertically
polarized photon to you through a channel which may change the polarization. Upon
receiving my photon, you pass it through a birefringent crystal to separate the
horizontal and vertical polarization components. You place an element into the
horizontal channel which rotates the polarization by 90 degrees, then recombine the
signals. You are now guaranteed to receive a vertically polarized photon. The "error"
has been corrected. This device cannot be used to send a message, because it transmits
no information. You know with certainty that you will receive a vertically polarized
photon. However, it does illustrate the principle of quantum error correction. Below, I
go through the example of the quantum equivalent of Tell-Me-Three-Times which
outlines a potentially useful quantum code.
In a classical system the only type of error is a bit-flip. A quantum bit can have both
phase and flip errors. The general error for a qubit is a 2 by 2 unitary transformation.
It has been shown that if flips, phase errors, and a combination of the two can be
corrected, then any unitary transformation can be corrected. Flips are represented by
the matrix X, and phase errors by Z. Most of the references use Y=XZ as the
flip+phase error, but I prefer to stick to the Pauli matrices:
X = |  0  +1 |        Y = |  0  -i |
    | +1   0 |            | +i   0 |

Z = | +1   0 |        I = | +1   0 |  = identity matrix
    |  0  -1 |            |  0  +1 |
A 2 by 2 unitary matrix U (with an overall phase factored out, so that det U = 1) may be
represented by:
U = t I + ix X + iy Y + iz Z   where t, x, y, z are real
and t^2 + x^2 + y^2 + z^2 = 1
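A small numerical sketch of this decomposition (the sample matrix is an arbitrary rotation, chosen only for illustration): the coefficients can be read off as t = Trace(U)/2, x = Trace(UX)/(2i), and so on, and they do satisfy t^2 + x^2 + y^2 + z^2 = 1.

import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# A sample SU(2) unitary: rotation by angle theta about the unit axis n
theta = 1.1
n = np.array([0.36, 0.48, 0.80])
U = np.cos(theta / 2) * I - 1j * np.sin(theta / 2) * (n[0]*X + n[1]*Y + n[2]*Z)

t = np.trace(U).real / 2
x = (np.trace(U @ X) / 2j).real
y = (np.trace(U @ Y) / 2j).real
z = (np.trace(U @ Z) / 2j).real

print(t, x, y, z)
print(t**2 + x**2 + y**2 + z**2)                     # 1.0
print(np.allclose(U, t*I + 1j*(x*X + y*Y + z*Z)))    # True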
Demonstration that if you can correct X, Y, and Z separately, you can correct any
unitary error in a qubit: Assume the correct state is |A) and that |A), X|A), Y|A), Z|A)
are all orthonormal states. We are trying to send |A), but it has been corrupted by the
unitary operator U.
U|A) = t|A) + ix X|A) + iy Y|A) + iz Z|A)
We separate the four states into four different channels, operate on X|A) with X, Y|A)
with Y, and Z|A) with Z, resulting in:
I:  t |A)      goes to  t |A)
X:  ix X|A)    goes to  ix |A)
Y:  iy Y|A)    goes to  iy |A)
Z:  iz Z|A)    goes to  iz |A)
The four channels are recombined to give the corrected state [t+ix+iy+iz]|A). I am
disturbed by this because it does not have magnitude 1. Is there some way to correct
this?
Examples:
Tell Me Three Times is a single bit error correcting code which will correct only flip
errors.
Tell Me Five Times is a single bit error correcting code which will correct both flip
and phase errors.
References
1. Negative Entropy and Information in Quantum Mechanics
Nicolas J. Cerf and Chris Adami
W. K. Kellogg Radiation Laboratory and Computation and Neural Systems,
California Institute of Technology
Preprint, December 20, 1995
2. Quantum Information Theory of Entanglement
Nicolas J. Cerf and Chris Adami
W. K. Kellogg Radiation Laboratory and Computation and Neural Systems,
California Institute of Technology
Preprint, May 25, 1996
3. Sphere Packings, Lattices and Groups, 2nd Edition
J. H. Conway and N. J. A. Sloane
Springer-Verlag, 1993
4. Coding and Information Theory
Steven Roman
Springer-Verlag, 1992
5. The Theory of Error-Correcting Codes
F. J. MacWilliams and N. J. A. Sloane
North-Holland, 1977