Quantum Information Theory
This web site is intended to present Quantum Information Theory at an undergraduate
level. It is assumed that the reader has a basic understanding of quantum mechanics.
The basics of classical information theory are presented, but for greater depth please
see the references.
Here is a list of topics in quantum information theory, presented in the order in
which you will probably need to read them. Examples are provided for almost every
concept.
1. Quantum Density Matrix
2. Entropy
3. Mutual Information and Conditional Entropy
4. Galois Fields
5. Classical Coding Theory
6. Quantum Coding Theory
Notation used in these pages: a prime following a complex number indicates the
complex conjugate; a prime following a matrix or operator indicates the Hermitian
conjugate. All logarithms in the text are to the base 2, so that information is measured
in units of bits. Because not all web browsers support superscripts and subscripts, an
underscore precedes subscripts and a caret precedes superscripts: X^2 is "X squared"
and X_2 is "X sub 2". Because of the hassle of making angle brackets in HTML,
parentheses are used in Dirac notation, bra = (X| and ket = |X).
Quantum Density Matrix
A quantum density matrix expresses the distribution of quantum states in an ensemble
of particles. The density matrix is Hermitian, has unit trace, and has all of its eigenvalues between 0 and 1.
Let the density matrix for a certain ensemble be D. A particle is drawn from this
ensemble and subjected to a measurement. The probability that the particle will be
found in eigenstate |A) of the measurement is (A|D|A). The density matrix for an
ensemble of particles prepared in the pure state |Y) is |Y)(Y|. The density matrix for
an ensemble that is an incoherent mixture of pure states is the weighted sum of the
density matrices for the pure states.
Example 1
Let the particles in an ensemble have a two dimensional state space spanned by
eigenstates |0) and |1). Each particle may be thought of as a quantum bit, or "qubit". If
all particles are prepared in the pure state |Y) = a |0) + b |1), with a and b complex, aa'
+ bb' = 1, then the quantum density matrix for this ensemble is:
D = |Y)(Y| = | aa'  ab' |
             | ba'  bb' |
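To check this numerically, here is a minimal Python/NumPy sketch; the particular values of a and b are just an illustration, not from the text:

import numpy as np

# Pure state |Y) = a|0) + b|1), with aa' + bb' = 1
a, b = 0.8, 0.6j                # illustrative amplitudes satisfying |a|^2 + |b|^2 = 1
psi = np.array([a, b])          # the ket |Y) as a column vector

# Density matrix D = |Y)(Y| is the outer product with the complex conjugate
D = np.outer(psi, psi.conj())
print(D)                        # [[aa', ab'], [ba', bb']]
print(np.trace(D).real)         # 1.0, as required of a density matrix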
Example 2
Suppose that 3/4 of the particles in an ensemble are prepared in the pure state |A) =
0.8 |0) + 0.6 |1), and the other 1/4 are prepared in the pure state |B) = 0.6 |0) - 0.8i |1).
The quantum density matrix for this ensemble is:
D = 0.75 |A)(A| + 0.25 |B)(B| = | 0.57        0.36+0.12i |
                                | 0.36-0.12i  0.43       |
A particle is drawn from this ensemble and a measurement is performed whose
eigenstates are |0) and |1). There is a 57% chance that the particle will be found in the
state |0) and a 43% chance that the particle will be found in state |1). Another particle
is drawn and a measurement is performed whose eigenstates are |E) = 0.6 |0) + 0.8 |1)
and |F) = 0.8 |0) - 0.6 |1). What are the probabilities that the particle will be found in
these states?
Prob(E) = (E|D|E) = 0.75 (E|A)(A|E) + 0.25 (E|B)(B|E)
        = [ 0.6  0.8 ] | 0.57        0.36+0.12i | | 0.6 |
                       | 0.36-0.12i  0.43       | | 0.8 |
        = 0.826
Prob(F) = (F|D|F) = 0.75 (F|A)(A|F) + 0.25 (F|B)(B|F)
        = [ 0.8  -0.6 ] | 0.57        0.36+0.12i | |  0.8 |
                        | 0.36-0.12i  0.43       | | -0.6 |
        = 0.174
We find that there is an 82.6% chance that the particle will be found in the state |E)
and a 17.4% chance that the particle will be found in the state |F).
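These numbers are easy to reproduce with a short Python/NumPy sketch; it is a direct transcription of the formulas above, and the helper name proj is just illustrative:

import numpy as np

ket0 = np.array([1.0, 0.0])
ket1 = np.array([0.0, 1.0])

A = 0.8 * ket0 + 0.6 * ket1           # |A)
B = 0.6 * ket0 - 0.8j * ket1          # |B)

def proj(psi):
    # density matrix |psi)(psi| of a pure state
    return np.outer(psi, psi.conj())

# Incoherent mixture: 3/4 of the particles in |A), 1/4 in |B)
D = 0.75 * proj(A) + 0.25 * proj(B)
print(D)                              # [[0.57, 0.36+0.12j], [0.36-0.12j, 0.43]]

# Measurement in the |E), |F) basis
E = 0.6 * ket0 + 0.8 * ket1
F = 0.8 * ket0 - 0.6 * ket1
print((E.conj() @ D @ E).real)        # 0.826
print((F.conj() @ D @ F).real)        # 0.174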
Entropy
The entropy is a measure of the disorder of a system. In information theory, the
entropy represents our ignorance of the state of a system. If our system can be in any
of 2^n possible equally likely states, the entropy is n bits, since it would require
answering n yes/no questions to determine exactly which state the system was in. For
systems like this, the entropy is log(states) [base 2]. Now suppose that we divide the
states of the system into two sets, one containing nine tenths of the states and the
other containing the other tenth. You agree to tell me which of the two sets the actual
state of the system is in (there are N = 2^n states in total). If it is in the smaller part, the
entropy is log(N/10) = log(N) - log(10), so it has decreased by log(10) = 3.32193 bits. If it
is in the larger part, the entropy is log(9N/10) = log(N) - log(10/9), so it has decreased by
log(10/9) = 0.15200
bits. Since the state has a nine tenths chance of being in the larger part, the expected
decrease in entropy is
delta(E) = 0.9 log(10/9) + 0.1 log(10)
         = -0.9 log(0.9) - 0.1 log(0.1)
         = 0.46900 bits
This is the same as having a system which can be in either of two states, one with
probability 0.9 and one with probability 0.1. The Shannon entropy of this system is
defined to be the expected amount of information gained (i.e. entropy lost) by finding
out which of the two states the system is actually in (0.46900 bits). This easily
generalizes to the cases with many states of unequal probability.
Given a discrete random variable X with distribution P_X and letting p_x be the
probability that X=x, the Boltzmann-Gibbs-Shannon entropy of the distribution P_X
is defined as:
S(P_X) = - Sum(x) p_x log p_x
Here and in all these documents log means the logarithm to the base 2, so the entropy
is expressed in bits. Please note that in this computation 0 log 0 = 0 since lim(a->0) a
log a = 0.
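As a quick illustration, the Shannon entropy can be computed directly from this definition; here is a minimal Python sketch (the function name is just illustrative) that reproduces the 0.469-bit example above:

import numpy as np

def shannon_entropy(probs):
    # S(P) = - Sum(x) p_x log2 p_x, with the convention 0 log 0 = 0
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]                          # drop zero probabilities
    return -np.sum(p * np.log2(p))

print(shannon_entropy([0.9, 0.1]))        # ~0.469 bits, as in the example above
print(shannon_entropy([0.5, 0.5]))        # 1 bit: a fair coin
print(shannon_entropy([1.0]))             # 0 bits: no uncertainty at all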
An important property of the entropy defined this way is that when two independent
systems are combined into a larger system, the entropy of the combined system is the
sum of the entropies of the component systems. Put more simply, if I give you N bits
of information and then give you M more bits of information which is completely
unrelated to the first N bits, you now have N+M bits of information about the system.
(This certainly seems like a good thing to demand of the definition of entropy.)
Consider two independent discrete random variables X and Y with distributions P_X and
Q_Y, and let p_x be the probability that X=x and q_y be the probability that Y=y. The
probability that X=x and Y=y is thus p_x q_y. The Boltzmann-Gibbs-Shannon
entropy of the joint distribution P_X Q_Y is:
S(P_X Q_Y) = - Sum(x,y) p_x q_y log(p_x q_y)
           = - Sum(x,y) p_x q_y (log p_x + log q_y)
           = - Sum(x) Sum(y) p_x q_y log p_x - Sum(y) Sum(x) p_x q_y log q_y
           = - (Sum(x) p_x log p_x)(Sum(y) q_y) - (Sum(y) q_y log q_y)(Sum(x) p_x)
           = - Sum(x) p_x log p_x - Sum(y) q_y log q_y
           = S(P_X) + S(Q_Y)
since Sum(x) p_x = Sum(y) q_y = 1.
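A quick numerical check of this additivity property; the two distributions chosen here are arbitrary examples:

import numpy as np

def shannon_entropy(probs):
    p = np.asarray(probs, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

px = np.array([0.9, 0.1])                 # distribution P_X
qy = np.array([0.5, 0.3, 0.2])            # distribution Q_Y, independent of X
joint = np.outer(px, qy)                  # p_x q_y for every pair (x, y)

print(shannon_entropy(joint.ravel()))                   # S(P_X Q_Y)
print(shannon_entropy(px) + shannon_entropy(qy))        # S(P_X) + S(Q_Y), the same value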
An equivalent expression for the quantum entropy has been introduced by von
Neumann in terms of the quantum density matrix D:
S = - Trace( D log D )
To evaluate this expression, write D in the form VEV', where E is a diagonal matrix of the
eigenvalues of D and V is a unitary matrix whose columns are the corresponding eigenvectors
of D. Then log D = V log(E) V', and
S = - Trace( D log D ) = - Trace( VEV' V log(E) V' )
  = - Trace( V (E log E) V' )
  = - Trace( E log E ) = - Sum(i) E_i log E_i
where E_i are the eigenvalues of D.
For an ensemble of particles in a pure state, D will have one eigenvalue equal to 1 and
all the others 0. Since 1 log 1 = 0 log 0 = 0, the entropy of this ensemble is 0, as
expected. Viewed in a basis where D is diagonal, the quantum entropy is exactly the
Shannon entropy, since the eigenvalues of D are the probabilities to find a particle in
each of the eigenstates.
Example
Let us compute the entropy of a particle drawn from the distribution in example 2 of
the Quantum Density Matrix page.
D = | 0.57        0.36+0.12i |
    | 0.36-0.12i  0.43       |
D has eigenvalues 0.885876 and 0.114124.
S = -0.885876 log 0.885876 - 0.114124 log 0.114124
= 0.512232
Compare this to the maximum entropy for a qubit:
S_max = -0.5 log 0.5 - 0.5 log 0.5 = log 2 = 1
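The eigenvalues and the entropy can be checked with a short Python/NumPy sketch, a direct transcription of S = - Sum(i) E_i log E_i:

import numpy as np

D = np.array([[0.57,         0.36 + 0.12j],
              [0.36 - 0.12j, 0.43        ]])

evals = np.linalg.eigvalsh(D)             # eigenvalues of the Hermitian matrix D
print(evals)                              # ~[0.114124, 0.885876]

evals = evals[evals > 0]                  # 0 log 0 = 0
S = -np.sum(evals * np.log2(evals))
print(S)                                  # ~0.5122 bits, well below the 1-bit maximum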
Mutual Information and Conditional Entropy
NOTE: The majority of this page comes from papers written by Nicolas Cerf and
Chris Adami at Caltech. These papers are very readable and highly recommended.
[Nicolas and Chris - if I mess this up, send me some e-mail and I'll correct it. - Michael]
Fundamentals of Probability Theory
Given a sample space S and events E and F in S, the probability of E is written P(E)
and the probability of F is written P(F). The joint probability of E and F is the
probability that both occur, P(E and F).
E and F are independent if and only if P(E and F) = P(E) * P(F).
P(E or F) = P(E) + P(F) - P(E and F)
The conditional probability of E occurring, given that F has occurred is
P(E | F) = P(E and F) / P(F)
Bayes Rule: If (E_1,...,E_n) are n mutually exclusive events whose union is the
sample space S, and E is any arbitrary event of S with nonzero probability, then
P(E_k | E) = P(E_k) P(E | E_k) / P(E)
Conditional Entropy - Classical
The classical conditional entropy is defined as:
H(X | Y) = H(P_XY) - H(P_Y) = - Sum(x,y) p_xy log p_x|y
where p_x|y = p_xy / p_y = probability of X=x conditional on Y=y
In the classical theory H(X|Y) is never negative, because the entropy of a composite
system cannot be lower than the entropy of any of its subsystems. Surprisingly, this is
not true in the quantum case, when X and Y are quantum entangled systems. For a
classical system:
max[H(P_X),H(P_Y)] <= H(P_XY) <= H(P_X) + H(P_Y)
The upper bound is reached if X and Y are independent, and the lower bound is
reached if X and Y are maximally correlated.
Mutual Information - Classical
Given a discrete two dimensional sample space with distribution P_XY, we write
p_xy = Prob(X=x,Y=y). The distributions of X and Y separately are P_X and P_Y,
and the probabilities p_x = Prob(X=x) and p_y = Prob(Y=y) may be calculated in
terms of p_xy:
p_x = Sum(y) p_xy
p_y = Sum(x) p_xy
The entropy of each distribution is:
H(XY) = - Sum(x,y) p_xy log p_xy
H(X)  = - Sum(x) p_x log p_x
H(Y)  = - Sum(y) p_y log p_y
The mutual information of X and Y, written I(X,Y), is the amount of information
gained about X when Y is learned, and vice versa. I(X,Y) = 0 if and only if X and Y
are independent.
I(X,Y) = H(P_X) + H(P_Y) - H(P_XY)
I(X,Y) <= min[H(P_X),H(P_Y)]
The bound on I(X,Y) is twice as large in the quantum case as it is in the classical case.
The maximum for the quantum case occurs for entangled states which have no
classical counterpart.
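Here is a minimal Python sketch of these classical quantities for one illustrative joint distribution p_xy; the particular numbers are my own example, not from the text:

import numpy as np

def H(probs):
    # Shannon entropy in bits, with 0 log 0 = 0
    p = np.asarray(probs, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Joint distribution of two correlated bits (an arbitrary example)
p_xy = np.array([[0.4, 0.1],
                 [0.1, 0.4]])

p_x = p_xy.sum(axis=1)                    # marginal distribution of X
p_y = p_xy.sum(axis=0)                    # marginal distribution of Y

H_XY, H_X, H_Y = H(p_xy), H(p_x), H(p_y)

print(H_XY - H_Y)                         # H(X|Y) ~ 0.722, never negative classically
print(H_X + H_Y - H_XY)                   # I(X,Y) ~ 0.278
print(H_X + H_Y - H_XY <= min(H_X, H_Y))  # True: the classical bound on I(X,Y)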
Conditional Entropy - Quantum
Cerf and Adami define the conditional density matrix
D_X|Y = D_XY ( 1_X @ D_Y )^-1
where @ represents the tensor product,
and 1_X is the identity matrix on X's Hilbert space.
D_X|Y is not a true density matrix because it does not have trace 1. Because of this,
the conditional entropy can be negative. The quantum conditional entropy is defined
as
S(X|Y) = - Trace( D_XY log D_X|Y ) = S(D_XY) - S(D_Y)
Mutual Information - Quantum
The mutual information of two entities in a quantum system is defined in terms of the
quantum entropy which is expressed in terms of the quantum density matrix.
I(X,Y) = S(D_X) + S(D_Y) - S(D_XY)
I(X,Y) <= 2 min[S(D_X),S(D_Y)]
The classical limit is exceeded in the quantum entangled case and is related to
negative conditional entropy.
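As a numerical illustration of a negative quantum conditional entropy, take the maximally entangled two-qubit state (|00) + |11))/sqrt(2); the sketch below computes S(X|Y) = S(D_XY) - S(D_Y) directly from the density matrices:

import numpy as np

def S(D):
    # von Neumann entropy in bits
    evals = np.linalg.eigvalsh(D)
    evals = evals[evals > 1e-12]
    return -np.sum(evals * np.log2(evals))

bell = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)    # (|00) + |11)) / sqrt(2)
D_XY = np.outer(bell, bell.conj())                    # a pure state of the pair

# Reduced density matrix of Y: trace out X
D_Y = np.trace(D_XY.reshape(2, 2, 2, 2), axis1=0, axis2=2)

print(S(D_XY))             # 0: the pair as a whole is in a pure state
print(S(D_Y))              # 1: either qubit alone is maximally mixed
print(S(D_XY) - S(D_Y))    # S(X|Y) = -1, impossible for a classical system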
Examples
1. Independent States
2. Classically Correlated States
3. Quantum Entangled States
Galois Fields
A Galois field is a finite field with p^n elements where p is a prime integer. The set of
nonzero elements of the field is a cyclic group under multiplication. A generator of
this cyclic group is called a primitive element of the field. The Galois field can be
generated as the set of polynomials with coefficients in Z_p modulo an irreducible
polynomial of degree n.
Example 1: GF(2)
GF(2) consists of the elements 0 and 1 and is the smallest finite field. It is generated
by polynomials over Z_2 modulo the polynomial x. Its addition and multiplication
tables are as follows:
+ | 0 1
--+----
0 | 0 1
1 | 1 0

* | 0 1
--+----
0 | 0 0
1 | 0 1
Codes often use GF(2) because it is easily represented on a computer by a single bit.
Example 2: GF(3)
GF(3) consists of the elements 0, 1, and -1. It is generated by polynomials over Z_3
modulo the polynomial x. Its addition and multiplication tables are as follows:
 + |  0  1 -1
---+---------
 0 |  0  1 -1
 1 |  1 -1  0
-1 | -1  0  1

 * |  0  1 -1
---+---------
 0 |  0  0  0
 1 |  0  1 -1
-1 |  0 -1  1
Some codes called ternary codes use GF(3).
Example 3: GF(4)
Of particular interest in quantum information theory is GF(4), which is generated by
polynomials over Z_2 modulo the irreducible polynomial x^2 + x + 1. Its elements
are denoted here as (0,1,A,B). Here are the addition and multiplication tables for
GF(4):
+ | 0 1 A B
--+--------
0 | 0 1 A B
1 | 1 0 B A
A | A B 0 1
B | B A 1 0

* | 0 1 A B
--+--------
0 | 0 0 0 0
1 | 0 1 A B
A | 0 A B 1
B | 0 B 1 A
Because of the multiplication table, A is often identified with the complex cube root of unity -1/2 + i sqrt(3)/2. A and B are primitive elements of GF(4).
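The tables can be generated by doing the polynomial arithmetic directly. Here is a small Python sketch that uses the (illustrative) identification A = x and B = x + 1, storing each element c1*x + c0 as the two-bit integer with bits (c1, c0); running it reproduces the two tables above:

NAMES = {0: '0', 1: '1', 2: 'A', 3: 'B'}      # 2 = x, 3 = x + 1

def gf4_add(a, b):
    # addition is coefficient-wise mod 2, i.e. XOR of the bit patterns
    return a ^ b

def gf4_mul(a, b):
    # multiply the polynomials over Z_2, then reduce modulo x^2 + x + 1
    result = 0
    for shift in range(2):
        if (b >> shift) & 1:
            result ^= a << shift
    if result & 0b100:                        # an x^2 term appeared
        result ^= 0b111                       # replace x^2 by x + 1
    return result

for op, symbol in [(gf4_add, '+'), (gf4_mul, '*')]:
    print(symbol, [NAMES[e] for e in range(4)])
    for a in range(4):
        print(NAMES[a], [NAMES[op(a, b)] for b in range(4)])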
Classical Coding Theory
The fundamental problem of coding theory is the reliable transmission of information
in the presence of noise. A message is encoded into a stream of bits and decoded at
the other end. In order to protect against the corruption of the message by noise, some
degree of redundancy is required in the message. The greater the chance of a bit error,
the more redundancy is needed to keep the probability of error low.
A code C over an alphabet A is a set of vectors of fixed length n with entries from A.
A is generally chosen to be a finite field GF(q), and is often in practice just (0,1).
The Hamming distance d(u,v) between two vectors u and v is the number of places in
which they differ. For a vector u over GF(q), define the weight, wt(u), as the number
of nonzero components. Then d(u,v) = wt(u-v). The minimum Hamming distance
between two distinct vectors in a code C is called the minimum distance d. A code can
detect e errors if e < d, and it can correct t errors if 2t + 1 <= d.
Error correction proceeds by computing a syndrome from the received code word
which tells which bits are in error. The error bits are then changed to form a corrected
code word, which will be equal to the transmitted code word if no more than t errors
occurred during transmission.
The rate of a code is the ratio of the number of bits needed to send a message in an
error-free transmission to the number needed to send the message using the code. One
of the goals of coding theory is to find efficient codes, i.e., those that have as large a
rate as possible for a given level of error tolerance.
Example 1: Tell Me Three Times
A simple error correcting scheme which can correct one error is to send each bit three
times and assume at least two of the bits are correct at the receiver. The alphabet here
is (0,1) and the two code words are 000 and 111. The code words differ in three
positions so the minimum distance of this code is 3, thus we can correct one error.
The rate of this code is one third since we send three bits for every bit of the original
message. Suppose we receive the message abc. We form the syndrome (b+c, a+c). If
the syndrome is:




(0,0) - do nothing
(0,1) - flip a
(1,0) - flip b
(1,1) - flip c
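A minimal Python sketch of this code and its syndrome decoder; the function names are just illustrative:

def encode(bit):
    # Tell Me Three Times: repeat the bit
    return [bit, bit, bit]

def decode(received):
    # correct at most one flipped bit using the syndrome (b+c, a+c) mod 2
    a, b, c = received
    syndrome = ((b + c) % 2, (a + c) % 2)
    flip = {(0, 0): None, (0, 1): 0, (1, 0): 1, (1, 1): 2}[syndrome]
    corrected = list(received)
    if flip is not None:
        corrected[flip] ^= 1
    return corrected[0]

# any single error is corrected
for error_pos in range(3):
    word = encode(1)
    word[error_pos] ^= 1                     # one bit flipped in transit
    assert decode(word) == 1
print("all single errors corrected")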
Example 2: Hamming(7,4,3)
The alphabet for this code is (0,1). There are 16 seven bit code words:
0000000, 0100101, 1000011, 1100110,
0001111, 0101010, 1001100, 1101001,
0010110, 0110011, 1010101, 1110000,
0011001, 0111100, 1011010, 1111111
The first four bits of each code word are the message we wish to send, and the
remaining three are parity bits. Given a message abcd we compute, mod 2, e=b+c+d,
f=a+c+d, and g=a+b+d, and send the code word abcdefg. Upon reception we form the
syndrome (d+e+f+g, b+c+f+g, a+c+e+g) and correct bits as follows:
(0,0,0) - do nothing
(0,0,1) - flip a
(0,1,0) - flip b
(0,1,1) - flip c
(1,0,0) - flip d
(1,0,1) - flip e
(1,1,0) - flip f
(1,1,1) - flip g
If there was no more than one bit in error, abcd will be the original message
transmitted. The rate of this code is 4/7.
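Here is a short Python sketch (the helper names are illustrative) that rebuilds the 16 code words from the parity rules above, confirms that the minimum distance is 3, and corrects every single-bit error with the syndrome table:

from itertools import product

def encode(a, b, c, d):
    # append the parity bits e = b+c+d, f = a+c+d, g = a+b+d (mod 2)
    return [a, b, c, d, (b + c + d) % 2, (a + c + d) % 2, (a + b + d) % 2]

def decode(word):
    a, b, c, d, e, f, g = word
    syndrome = ((d + e + f + g) % 2, (b + c + f + g) % 2, (a + c + e + g) % 2)
    table = {(0, 0, 1): 0, (0, 1, 0): 1, (0, 1, 1): 2, (1, 0, 0): 3,
             (1, 0, 1): 4, (1, 1, 0): 5, (1, 1, 1): 6}     # which bit to flip
    corrected = list(word)
    if syndrome in table:
        corrected[table[syndrome]] ^= 1
    return corrected[:4]                                   # the message bits abcd

codewords = [encode(*msg) for msg in product([0, 1], repeat=4)]

dist = lambda u, v: sum(x != y for x, y in zip(u, v))      # Hamming distance
print(min(dist(u, v) for u in codewords for v in codewords if u != v))   # 3

assert all(decode([bit ^ (i == pos) for i, bit in enumerate(cw)]) == cw[:4]
           for cw in codewords for pos in range(7))
print("all single-bit errors corrected; the rate is 4/7")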
Quantum Coding Theory
In the course of a computation in a quantum computer, the transmission of data from
one computing element to another will be disturbed by thermal noise. Classical
computers overcome this by using a large number of particles for each bit sent.
Quantum computers likewise need redundancy in transmission, which is provided by quantum
codes. It is important that the error correcting scheme used return the message to its
original state, which may be a superposition of code words. Because of this, the
correction must take place without performing a measurement of the codeword, which
would destroy the phase information in the superposition.
The basic idea behind quantum error correction is to give the code qubits several
paths to take, each of which corresponds to an error syndrome. In each branch, the
corresponding error is corrected, and all the branches are then merged to form a corrected output. The trick
is to find a Hamiltonian which splits codewords with different syndromes apart from
each other without introducing further noise into the codewords. Great care must be
taken to merge all the branches back together in phase, or the phase relationships of
the original codeword will be altered.
To show a simple example of correction, suppose I wish to transport a vertically
polarized photon to you through a channel which may change the polarization. Upon
receiving my photon, you pass it through a birefringent crystal to separate the
horizontal and vertical polarization components. You place an element into the
horizontal channel which rotates the polarization by 90 degrees, then recombine the
signals. You are now guaranteed to receive a vertically polarized photon. The "error"
has been corrected. This device cannot be used to send a message, because it transmits
no information. You know with certainty that you will receive a vertically polarized
photon. However, it does illustrate the principle of quantum error correction. Below, I
go through the example of the quantum equivalent of Tell-Me-Three-Times which
outlines a potentially useful quantum code.
In a classical system the only type of error is a bit-flip. A quantum bit can have both
phase and flip errors. The general error for a qubit is a 2 by 2 unitary transformation.
It has been shown that if flips, phase errors, and a combination of the two can be
corrected, then any unitary transformation can be corrected. Flips are represented by
the matrix X, and phase errors by Z. Most of the references use Y=XZ as the
flip+phase error, but I prefer to stick to the Pauli matrices:
X = |  0  +1 |        Y = |  0  -i |
    | +1   0 |            | +i   0 |

Z = | +1   0 |        I = | +1   0 |  = identity matrix
    |  0  -1 |            |  0  +1 |
A 2 by 2 unitary matrix U (with an overall phase factored out, so that det U = 1) may be
represented by:
U = t I + ix X + iy Y + iz Z   where t, x, y, z are real
and t^2 + x^2 + y^2 + z^2 = 1
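A small numerical sketch of this decomposition (the sample matrix is an arbitrary rotation, chosen only for illustration): the coefficients can be read off as t = Trace(U)/2, x = Trace(UX)/(2i), and so on, and they do satisfy t^2 + x^2 + y^2 + z^2 = 1.

import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

# A sample SU(2) unitary: rotation by angle theta about the unit axis n
theta = 1.1
n = np.array([0.36, 0.48, 0.80])
U = np.cos(theta / 2) * I - 1j * np.sin(theta / 2) * (n[0]*X + n[1]*Y + n[2]*Z)

t = np.trace(U).real / 2
x = (np.trace(U @ X) / 2j).real
y = (np.trace(U @ Y) / 2j).real
z = (np.trace(U @ Z) / 2j).real

print(t, x, y, z)
print(t**2 + x**2 + y**2 + z**2)                     # 1.0
print(np.allclose(U, t*I + 1j*(x*X + y*Y + z*Z)))    # True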
Demonstration that if you can correct X, Y, and Z separately, you can correct any
unitary error in a qubit: Assume the correct state is |A) and that |A), X|A), Y|A), Z|A)
are all orthonormal states. We are trying to send |A), but it has been corrupted by the
unitary operator U.
U|A) = t|A) + ix X|A) + iy Y|A) + iz Z|A)
We separate the four states into four different channels, operate on X|A) with X, Y|A)
with Y, and Z|A) with Z, resulting in:
I:  t |A)      goes to  t |A)
X:  ix X|A)    goes to  ix |A)
Y:  iy Y|A)    goes to  iy |A)
Z:  iz Z|A)    goes to  iz |A)
The four channels are recombined to give the corrected state [t+ix+iy+iz]|A). I am
disturbed by this because it does not have magnitude 1. Is there some way to correct
this?
Examples:
Tell Me Three Times is a single bit error correcting code which will correct only flip
errors.
Tell Me Five Times is a single bit error correcting code which will correct both flip
and phase errors.
References
1. Negative Entropy and Information in Quantum Mechanics
Nicolas J. Cerf and Chris Adami
W. K. Kellogg Radiation Laboratory and Computation and Neural Systems,
California Institute of Technology
Preprint, December 20, 1995
2. Quantum Information Theory of Entanglement
Nicolas J. Cerf and Chris Adami
W. K. Kellogg Radiation Laboratory and Computation and Neural Systems,
California Institute of Technology
Preprint, May 25, 1996
3. Sphere Packings, Lattices and Groups, 2nd Edition
J. H. Conway and N. J. A. Sloane
Springer-Verlag, 1993
4. Coding and Information Theory
Steven Roman
Springer-Verlag, 1992
5. The Theory of Error-Correcting Codes
F. J. MacWilliams and N. J. A. Sloane
North-Holland, 1977