Chapter 6: Complex Matrices
We assume that the reader has some experience with matrices and determinants.
We can easily extend the basic theory of linear algebra by allowing complex numbers
as matrix entries. However, we should pay particular attention to features unique to complex
matrices, especially the notion of the adjoint, which is the matrix version of the complex conjugate.
Among the first things we learned in linear algebra is the intimate relation between
matrices and linear mappings. To describe this relation within our convention, we need to
identify each vector in Cⁿ with a column, that is, with an n × 1 matrix. Thus a vector in Cⁿ,
say x = (x1, x2, . . . , xn), will be regarded as the same as

x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \equiv [x_1 \; x_2 \; \cdots \; x_n]^\top.
We are safeguarded from confusion by the different types of brackets. From now on, let us
adopt the following rule: things in a row surrounded by the round brackets “(” and “)” are
the same things arranged in a column surrounded by the square brackets “[” and “]”, e.g.

(dog, cat) = \begin{bmatrix} \text{dog} \\ \text{cat} \end{bmatrix}.

We have the following
“Matrix Representation Theorem” A map T from Cⁿ to Cᵐ is linear if
and only if there exists an m × n matrix A such that Tx = Ax for all x ∈ Cⁿ.
Furthermore, the matrix A here is uniquely determined by T. (Recall that a mapping
T from Cⁿ to Cᵐ (we write T : Cⁿ → Cᵐ) is linear if the following identity holds
for all vectors x, y in Cⁿ and all scalars α, β: T(αx + βy) = αTx + βTy.)
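As a minimal NumPy sketch of this theorem, the matrix A can be recovered by applying T to the standard basis vectors; the particular map T below is just a made-up example, not one from the text:

    import numpy as np

    # A hypothetical linear map T : C^3 -> C^2, chosen only for illustration.
    def T(x):
        return np.array([x[0] + 1j * x[1], 2 * x[2] - x[0]])

    # Column k of A is T applied to the k-th standard basis vector.
    n = 3
    A = np.column_stack([T(e) for e in np.eye(n, dtype=complex)])

    # Check that T x = A x on a random complex vector.
    x = np.random.randn(n) + 1j * np.random.randn(n)
    assert np.allclose(T(x), A @ x)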
Given a complex matrix A, we define the adjoint of A, denoted by A∗, to be the
conjugate transpose of A. In other words, A∗ is obtained by taking the complex conjugate of
all entries of A, followed by taking the transpose: A∗ = Ā⊤. Thus
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots &        &        & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\implies
A^* = \begin{bmatrix}
\bar a_{11} & \bar a_{21} & \cdots & \bar a_{m1} \\
\bar a_{12} & \bar a_{22} & \cdots & \bar a_{m2} \\
\vdots      &             &        & \vdots      \\
\bar a_{1n} & \bar a_{2n} & \cdots & \bar a_{mn}
\end{bmatrix}.
As we have mentioned, the adjoint is the matrix version of the complex conjugate.
Example 6.1. Regarding a vector v = (a1, a2, . . . , an) in Cⁿ as a matrix, we have

v = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}, \qquad
v^* = [\bar a_1 \; \bar a_2 \; \cdots \; \bar a_n],

vv^* = \begin{bmatrix}
a_1\bar a_1 & a_1\bar a_2 & \cdots & a_1\bar a_n \\
a_2\bar a_1 & a_2\bar a_2 & \cdots & a_2\bar a_n \\
\vdots      &             &        & \vdots      \\
a_n\bar a_1 & a_n\bar a_2 & \cdots & a_n\bar a_n
\end{bmatrix},

and v∗v = |a1|² + |a2|² + · · · + |an|² = ⟨v, v⟩.
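A quick NumPy check of this example, as a sketch with an arbitrary vector v:

    import numpy as np

    v = np.array([1 + 1j, 2j, 3.0])
    outer = np.outer(v, v.conj())   # the matrix v v*, with (j, k) entry a_j * conj(a_k)
    inner = np.vdot(v, v)           # v* v = |a_1|^2 + ... + |a_n|^2
    assert np.allclose(inner, np.sum(np.abs(v) ** 2))
    assert np.allclose(outer, v[:, None] @ v.conj()[None, :])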
For n × n matrices A and B, and for a complex number α, we have

(A + B)∗ = A∗ + B∗,    (αA)∗ = ᾱA∗,    (AB)∗ = B∗A∗.

The last identity tells us that in general (AB)∗ = A∗B∗ is false.
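These identities are easy to confirm numerically; a minimal NumPy sketch with random matrices:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    alpha = 2.0 - 1.5j

    adj = lambda M: M.conj().T
    assert np.allclose(adj(A + B), adj(A) + adj(B))
    assert np.allclose(adj(alpha * A), np.conj(alpha) * adj(A))
    assert np.allclose(adj(A @ B), adj(B) @ adj(A))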
The following identity is the most basic feature concerning the adjoint of a matrix:
for every n × n matrix A, and all vectors x, y in the complex vector space Cⁿ, we have

⟨Ax, y⟩ = ⟨x, A∗y⟩.
We check this identity only for 2 × 2 matrices. Suppose

A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \qquad
x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \qquad
y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}.

Then

Ax = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 \\ a_{21}x_1 + a_{22}x_2 \end{bmatrix}
\quad\text{and}\quad
A^* y = \begin{bmatrix} \bar a_{11}y_1 + \bar a_{21}y_2 \\ \bar a_{12}y_1 + \bar a_{22}y_2 \end{bmatrix}.

So ⟨Ax, y⟩ = a11 x1 ȳ1 + a12 x2 ȳ1 + a21 x1 ȳ2 + a22 x2 ȳ2 and ⟨x, A∗y⟩ = x1 a11 ȳ1 + x1 a21 ȳ2 +
x2 a12 ȳ1 + x2 a22 ȳ2. Comparing them, we see that ⟨Ax, y⟩ = ⟨x, A∗y⟩.
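The same check can be run numerically for any size; a sketch using the convention ⟨u, w⟩ = Σ uᵢw̄ᵢ from this chapter (note that NumPy's vdot conjugates its first argument, so we spell the sum out explicitly):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 4
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

    inner = lambda u, w: np.sum(u * np.conj(w))   # <u, w> = sum_i u_i conj(w_i)
    assert np.allclose(inner(A @ x, y), inner(x, A.conj().T @ y))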
We say that an n × n matrix A is self-adjoint or Hermitian if A∗ = A. This
identity can be regarded as the matrix version of z̄ = z. So being Hermitian is the
matrix analogue of being real for numbers. We say that a matrix A is unitary if
A∗A = AA∗ = I, that is, the adjoint A∗ is equal to the inverse of A. The identity
A∗A = AA∗ = I is the matrix analogue of z̄z = 1, or |z| = 1. Thus, being unitary is the
matrix analogue of having modulus one for complex numbers. Denote by U(n) the set of
all n × n unitary matrices. It is easy to check that U(n) forms a group under the usual
matrix multiplication. For example, A, B ∈ U(n) implies A∗ A = B ∗ B = I and hence
(AB)(AB)∗ = ABB ∗ A∗ = AIA∗ = AA∗ = I, etc. The group U(n) is called the unitary
group. It plays a basic role in the geometry of the complex vector space Cⁿ.
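A numerical illustration of this group property, as a sketch; the random unitary matrices are generated via a QR factorization, which is one standard way to obtain them:

    import numpy as np

    def random_unitary(n, rng):
        # QR factorization of a random complex matrix yields a unitary Q.
        M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        Q, _ = np.linalg.qr(M)
        return Q

    rng = np.random.default_rng(2)
    A, B = random_unitary(3, rng), random_unitary(3, rng)
    I = np.eye(3)
    assert np.allclose(A.conj().T @ A, I) and np.allclose(B.conj().T @ B, I)
    assert np.allclose((A @ B) @ (A @ B).conj().T, I)   # the product is again unitary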
Let A be an n × n unitary matrix and denote by v1, v2, . . . , vn its column
vectors. Thus we have A = [v1 v2 . . . vn] and hence

A^* = \begin{bmatrix} v_1^* \\ v_2^* \\ \vdots \\ v_n^* \end{bmatrix}, \qquad
A^* A = \begin{bmatrix}
v_1^* v_1 & v_1^* v_2 & \cdots & v_1^* v_n \\
v_2^* v_1 & v_2^* v_2 & \cdots & v_2^* v_n \\
\vdots    &           &        & \vdots    \\
v_n^* v_1 & v_n^* v_2 & \cdots & v_n^* v_n
\end{bmatrix}
= \begin{bmatrix}
\langle v_1, v_1\rangle & \langle v_1, v_2\rangle & \cdots & \langle v_1, v_n\rangle \\
\langle v_2, v_1\rangle & \langle v_2, v_2\rangle & \cdots & \langle v_2, v_n\rangle \\
\vdots                  &                         &        & \vdots                  \\
\langle v_n, v_1\rangle & \langle v_n, v_2\rangle & \cdots & \langle v_n, v_n\rangle
\end{bmatrix}.

Thus A∗A = I tells us that ⟨vj, vk⟩ = δjk, meaning that the columns v1, v2, . . . , vn
form an orthonormal basis in Cⁿ. We have shown that the columns of a unitary matrix
form an orthonormal basis. It turns out that the converse is also true. We have arrived at
the following characterization of unitary matrices:
An n × n matrix is unitary iff its columns form an orthonormal basis in Cⁿ.
Here “iff” stands for “if and only if”, a shorthand invented by Paul Halmos. We also have
the “real version” of the above statement: a real n × n matrix is orthogonal iff its columns
form an orthonormal basis in Rⁿ. Now we give examples of unitary matrices that are
used in practice, for instance in communication theory (exactly how they are used there is
too lengthy to explain here).
Example 6.2. The matrix

H_1 = \frac{1}{\sqrt 2}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix},
\quad\text{with columns}\quad
v_1 = \begin{bmatrix} 1/\sqrt 2 \\ -1/\sqrt 2 \end{bmatrix}, \quad
v_2 = \begin{bmatrix} 1/\sqrt 2 \\ 1/\sqrt 2 \end{bmatrix},

is an orthogonal matrix, since we can check that its columns v1, v2 form an orthonormal
basis in R². Now we describe a process to define the Hadamard matrix Hn. Let
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}

be a 2 × 2 matrix and let B be an n × n matrix. We define their tensor product A ⊗ B
to be the 2n × 2n matrix given by

A \otimes B = \begin{bmatrix} a_{11}B & a_{12}B \\ a_{21}B & a_{22}B \end{bmatrix}.
We have the following basic identities about tensor products of matrices:

(A ⊗ B)∗ = A∗ ⊗ B∗,    aA ⊗ bB = ab(A ⊗ B),    (A ⊗ B)(C ⊗ D) = AC ⊗ BD.    (6.1)
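In NumPy the tensor (Kronecker) product is np.kron; a quick sketch checking the identities (6.1) on random matrices:

    import numpy as np

    rng = np.random.default_rng(3)
    cmat = lambda m, n: rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
    A, B, C, D = cmat(2, 2), cmat(3, 3), cmat(2, 2), cmat(3, 3)
    adj = lambda M: M.conj().T

    assert np.allclose(np.kron(A, B).conj().T, np.kron(adj(A), adj(B)))
    assert np.allclose(np.kron(2 * A, 3 * B), 6 * np.kron(A, B))
    assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))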
A consequence of these identities is: if A and B are unitary (or orthogonal), then so is
A ⊗ B. For example
H_2 \equiv H_1 \otimes H_1
= \frac{1}{\sqrt 2}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}
\otimes \frac{1}{\sqrt 2}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}
= \frac{1}{2}\begin{bmatrix}
 1 &  1 &  1 & 1 \\
-1 &  1 & -1 & 1 \\
-1 & -1 &  1 & 1 \\
 1 & -1 & -1 & 1
\end{bmatrix}.
We can define Hn inductively by putting

H_n = H_1 \otimes H_{n-1} = \frac{1}{\sqrt 2}\begin{bmatrix} H_{n-1} & H_{n-1} \\ -H_{n-1} & H_{n-1} \end{bmatrix},

which is a 2ⁿ × 2ⁿ orthogonal matrix, called the Hadamard matrix. We remark that
tensoring is an important operation used in many areas, such as quantum information and
quantum computation.
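A minimal sketch of this inductive construction; the function name build_hadamard is ours, not from the text:

    import numpy as np

    def build_hadamard(n):
        # H_1 is 2 x 2; H_n = H_1 (tensor) H_{n-1} is 2^n x 2^n.
        H1 = np.array([[1.0, 1.0], [-1.0, 1.0]]) / np.sqrt(2)
        H = H1
        for _ in range(n - 1):
            H = np.kron(H1, H)
        return H

    H3 = build_hadamard(3)
    assert H3.shape == (8, 8)
    assert np.allclose(H3.T @ H3, np.eye(8))   # orthogonal, as claimed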
Example 6.3. Let ω = e^{2πi/n}. The columns of the following matrix form the orthonormal basis of Cⁿ described in Example 5.2 of the last chapter, and hence it is a unitary matrix:

F = \frac{1}{\sqrt n}\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & \cdots & 1 \\
1 & \omega & \omega^2 & \omega^3 & \omega^4 & \cdots & \omega^{n-1} \\
1 & \omega^2 & \omega^4 & \omega^6 & \omega^8 & \cdots & \omega^{2(n-1)} \\
\vdots & & & & & & \vdots \\
1 & \omega^{n-1} & \omega^{2(n-1)} & \omega^{3(n-1)} & \omega^{4(n-1)} & \cdots & \omega^{(n-1)(n-1)}
\end{bmatrix}.
The linear mapping associated with this matrix is called the finite Fourier transform.
Speeding up this transform by special methods has become important in recent years for
reducing the cost of communication networks. The rediscovery of the so-called FFT (Fast
Fourier Transform) has great practical significance; historians can now trace the FFT method
back as far as Gauss.
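A sketch that builds F and checks that it is unitary; note that NumPy's np.fft.fft uses the opposite sign convention in the exponent, so it is the conjugate of F, applied to x and rescaled, that matches np.fft.fft(x):

    import numpy as np

    n = 8
    omega = np.exp(2j * np.pi / n)
    j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    F = omega ** (j * k) / np.sqrt(n)

    assert np.allclose(F.conj().T @ F, np.eye(n))            # F is unitary
    x = np.random.default_rng(4).standard_normal(n)
    assert np.allclose(F.conj() @ x, np.fft.fft(x) / np.sqrt(n))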
The material in the rest of the present chapter is optional.
We say that an n × n complex matrix A is orthogonally diagonalizable if there
is an orthonormal basis E = {e1 , e2 , . . . , en } consisting of eigenvectors of A, that is,
for each k, Aek = λk ek , where λk is the eigenvalue corresponding to the eigenvector ek .
Now we use the basis vectors (considered as columns) in E to form the unitary matrix
U = [e1 e2 . . . en]. In the next step, we make use of Aek = λkek, but as written this
does not fit the matrix formalism: regarding the scalar λk as a 1 × 1 matrix, the
vector ek to its right is n × 1, so the product λkek is not a valid matrix product. To adjust
this, we rewrite λkek as ekλk. Thus
we have Aek = ekλk. Now the way is clear for the following matrix manipulation:
AU = A[e1 e2 . . . en ]
= [Ae1 Ae2 . . . Aen ]
= [e1 λ1 e2 λ2 . . . en λn ]
= [e1 e2 . . . en ]D = U D
where D is the diagonal matrix given by
D = \begin{bmatrix}
\lambda_1 & 0 & 0 & \cdots & 0 \\
0 & \lambda_2 & 0 & \cdots & 0 \\
0 & 0 & \lambda_3 & & \vdots \\
\vdots & & & \ddots & \\
0 & 0 & 0 & \cdots & \lambda_n
\end{bmatrix}.
Thus we have A = UDU⁻¹. The above steps can be reversed. So we have proved:
Fact. A is orthogonally diagonalizable if and only if A = UDU⁻¹ ≡ UDU∗ for some
unitary U and diagonal D.
The identity A = UDU∗ gives A∗ = (U∗)∗D∗U∗ = UD∗U∗ = UD∗U⁻¹ and hence

A∗A = UD∗U⁻¹UDU⁻¹ = UD∗DU⁻¹ = UDD∗U⁻¹ = UDU⁻¹UD∗U⁻¹ = AA∗,

in view of

D^* D = D D^* = \begin{bmatrix}
|\lambda_1|^2 & 0 & 0 & \cdots & 0 \\
0 & |\lambda_2|^2 & 0 & \cdots & 0 \\
0 & 0 & |\lambda_3|^2 & & \vdots \\
\vdots & & & \ddots & \\
0 & 0 & 0 & \cdots & |\lambda_n|^2
\end{bmatrix}.
A matrix A is called a normal matrix if the identity A∗ A = AA∗ holds. We have
shown that orthogonally diagonalizable matrices are normal. An important fact in linear
algebra says that the converse is also true. So we conclude:
Fact. A complex matrix is orthogonally diagonalizable if and only if it is normal.
We do not prove this theorem here because the proof is considerably longer than what we
are prepared to include. Notice that both self-adjoint matrices and unitary matrices are
normal, and hence they are orthogonally diagonalizable.
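A sketch of the Fact for the special case of a self-adjoint matrix, using np.linalg.eigh (which returns an orthonormal set of eigenvectors for Hermitian input):

    import numpy as np

    rng = np.random.default_rng(5)
    B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    A = B + B.conj().T                      # self-adjoint, hence normal
    assert np.allclose(A.conj().T @ A, A @ A.conj().T)

    w, U = np.linalg.eigh(A)                # eigenvalues and orthonormal eigenvectors
    assert np.allclose(U.conj().T @ U, np.eye(4))        # U is unitary
    assert np.allclose(A, U @ np.diag(w) @ U.conj().T)   # A = U D U*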
Denote by SU(2) the set of all 2 × 2 unitary matrices of determinant equal to 1:
SU(2) = {U ∈ U(2) : det(U ) = 1}.
Let U be in SU(2). Write down U and UU∗ explicitly as follows:

U = \begin{bmatrix} z & w \\ u & v \end{bmatrix}
\quad\text{and}\quad
U U^* = \begin{bmatrix} z & w \\ u & v \end{bmatrix}
\begin{bmatrix} \bar z & \bar u \\ \bar w & \bar v \end{bmatrix}
= \begin{bmatrix} |z|^2 + |w|^2 & z\bar u + w\bar v \\ u\bar z + v\bar w & |u|^2 + |v|^2 \end{bmatrix}.
From UU∗ = I we get |z|² + |w|² = 1 and uz̄ + vw̄ = 0. Assume w ≠ 0 and z ≠ 0.
Then we may write u = αw̄ and v = βz̄ for some α and β. Now uz̄ + vw̄ = 0 gives
(α + β)z̄w̄ = 0 and hence α + β = 0. Thus

1 = det(U) = zv − wu = z(βz̄) − w(αw̄) = z(βz̄) − w(−βw̄) = β(|z|² + |w|²) = β.
Therefore U is of the form

U = \begin{bmatrix} z & w \\ -\bar w & \bar z \end{bmatrix},
\quad\text{where } |z|^2 + |w|^2 \equiv \bar z z + \bar w w = 1.    (6.2)

In case z = 0 or w = 0, U has the same form (please check this). We conclude: a 2 × 2
matrix U is in SU(2) if and only if it can be expressed as in (6.2) above.
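A quick numerical sketch of this characterization, building U from an arbitrary pair (z, w) with |z|² + |w|² = 1:

    import numpy as np

    z, w = 0.6 + 0.48j, 0.36 - 0.528j
    z, w = np.array([z, w]) / np.sqrt(abs(z) ** 2 + abs(w) ** 2)   # normalize so |z|^2 + |w|^2 = 1
    U = np.array([[z, w], [-np.conj(w), np.conj(z)]])

    assert np.allclose(U.conj().T @ U, np.eye(2))   # unitary
    assert np.isclose(np.linalg.det(U), 1.0)        # determinant 1, so U is in SU(2)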
Writing z = x0 + ix1 and w = x2 + ix3 in (6.2), we have

U = \begin{bmatrix} z & w \\ -\bar w & \bar z \end{bmatrix}
= \begin{bmatrix} x_0 + ix_1 & x_2 + ix_3 \\ -x_2 + ix_3 & x_0 - ix_1 \end{bmatrix}
= x_0\mathbf{1} + x_1\mathbf{i} + x_2\mathbf{j} + x_3\mathbf{k},    (6.3)

where

\mathbf{1} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad
\mathbf{i} = \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix}, \quad
\mathbf{j} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}, \quad
\mathbf{k} = \begin{bmatrix} 0 & i \\ i & 0 \end{bmatrix}.    (6.4)

Matrix U in (6.2) belongs to SU(2) if and only if

|z|² + |w|² ≡ x0² + x1² + x2² + x3² = 1.
An expression written as the RHS of (6.3), without the condition x0² + x1² + x2² + x3² = 1
imposed, is called a quaternion. Since the theory of quaternions was discovered by
Hamilton, we denote the collection of all quaternions by H. The algebra of quaternions is
determined by the following identities among the basic units 1, i, j, k:

1q = q1 = q,  i² = j² = k² = −1,  ij = −ji = k,  jk = −kj = i,  ki = −ik = j,    (6.5)

where q is any quaternion. These identities can be checked by direct computation. We
usually suppress the unit 1 of the quaternion algebra H and write x0 for x0 1.
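One such direct computation of (6.5), done with the 2 × 2 matrices from (6.4), as a NumPy sketch:

    import numpy as np

    one = np.eye(2, dtype=complex)
    i = np.array([[1j, 0], [0, -1j]])
    j = np.array([[0, 1], [-1, 0]], dtype=complex)
    k = np.array([[0, 1j], [1j, 0]])

    for u in (i, j, k):
        assert np.allclose(u @ u, -one)                        # i^2 = j^2 = k^2 = -1
    assert np.allclose(i @ j, k) and np.allclose(j @ i, -k)    # ij = -ji = k
    assert np.allclose(j @ k, i) and np.allclose(k @ j, -i)    # jk = -kj = i
    assert np.allclose(k @ i, j) and np.allclose(i @ k, -j)    # ki = -ik = j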
Let q be the quaternion given in (6.3), which is a 2 × 2 complex matrix. Its adjoint is given by

q^* = \begin{bmatrix} \bar z & -w \\ \bar w & z \end{bmatrix}
= \begin{bmatrix} x_0 - ix_1 & -x_2 - ix_3 \\ x_2 - ix_3 & x_0 + ix_1 \end{bmatrix}
= x_0 - x_1\mathbf{i} - x_2\mathbf{j} - x_3\mathbf{k},

which is also called the conjugate of q. A direct computation shows

q∗q = qq∗ = (|z|² + |w|²)1 ≡ |z|² + |w|² = det(q) = x0² + x1² + x2² + x3².
The square root of the last expression is called the norm of q and is denoted by ‖q‖. Thus
q∗q = qq∗ = ‖q‖². So, q is in SU(2) if and only if ‖q‖ = 1:

SU(2) = {q = x0 + x1 i + x2 j + x3 k ∈ H : ‖q‖² ≡ x0² + x1² + x2² + x3² = 1}.
Regarding H as the 4-dimensional space with rectangular coordinates x0, x1, x2, x3, we
may identify SU(2) with the 3-dimensional sphere x0² + x1² + x2² + x3² = 1, which will be
simply called the 3-sphere. Notice that, if we write z = x0 + x1 i and w = x2 + x3 i,
then q = x0 + x1 i + x2 j + x3 k can be written as q = z + wj, in view of ij = k.
For a quaternion q = x0 + x1 i + x2 j + x3 k, we often write q = x0 + x, where x0 is
called the scalar part and x = x1 i + x2 j + x3 k is called the vector part. From (6.5) we
see how to multiply “pure vector” quaternions. It is easy to check that the product of two
quaternions q = x0 + x and r = y0 + y is determined by

qr = (x0 + x)(y0 + y) = x0 y0 + x0 y + y0 x + xy,  where  xy = −x · y + x × y.    (6.6)
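A sketch of formula (6.6), comparing the scalar-plus-vector product with the 2 × 2 matrix product; the helper names quat_to_matrix and quat_multiply are ours, not from the text:

    import numpy as np

    def quat_to_matrix(x0, x):
        one = np.eye(2, dtype=complex)
        i = np.array([[1j, 0], [0, -1j]])
        j = np.array([[0, 1], [-1, 0]], dtype=complex)
        k = np.array([[0, 1j], [1j, 0]])
        return x0 * one + x[0] * i + x[1] * j + x[2] * k

    def quat_multiply(x0, x, y0, y):
        # (x0 + x)(y0 + y): scalar part x0*y0 - x.y, vector part x0*y + y0*x + x cross y
        return x0 * y0 - np.dot(x, y), x0 * y + y0 * x + np.cross(x, y)

    x0, x = 1.0, np.array([0.5, -2.0, 3.0])
    y0, y = -0.5, np.array([1.0, 4.0, 0.25])
    z0, z = quat_multiply(x0, x, y0, y)
    assert np.allclose(quat_to_matrix(x0, x) @ quat_to_matrix(y0, y), quat_to_matrix(z0, z))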
The “scalar plus vector” decomposition q = x0 + x of a quaternion is also convenient for
computing its conjugate, as we can easily check that

q∗ = (x0 + x)∗ = x0 − x,    (6.7)

which resembles the identity z̄ = x − iy for a complex number z = x + iy. From (6.7) we see that
a quaternion q is a pure vector if and only if q∗ = −q, that is, q is skew Hermitian.
We identify a pure vector x = x1 i + x2 j + x3 k with the vector x = (x1, x2, x3) in
R³. For each q ∈ SU(2), define a linear transformation R(q) in R³ by putting
R(q)x = q∗xq. We can check that y ≡ R(q)x is indeed in R³:
y∗ = (R(q)x)∗ = (q∗ xq)∗ = q∗ x∗ q = q∗ (−x)q = −q∗ xq = −y.
The most interesting thing about R(q) is that it is an isometry: x and y ≡ R(q)x have
the same length. Indeed,
‖y‖² = y∗y = (q∗xq)∗(q∗xq)
= q∗x∗qq∗xq = q∗x∗xq = q∗‖x‖²q = ‖x‖²q∗q = ‖x‖².
Using some connectedness argument in topology, one can show that R(q) is actually a
rotation (not a reflection) in 3–space. It turns out that every rotation in 3–space can be
written in the form R(q) and we call it the spinor representation of the rotation. Also,
we call SU(2) the spinor group. It is an essential mathematical device for describing
electron spin and for studying aircraft stability. It is also used to explain how a cat can turn
its body 180° in midair in order to achieve a safe landing, without violating the basic
physical law of conservation of angular momentum.
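To close, a NumPy sketch of the spinor representation R(q)x = q∗xq, checking that it preserves length and is a genuine rotation; the helper names and the particular q are ours:

    import numpy as np

    i = np.array([[1j, 0], [0, -1j]])
    j = np.array([[0, 1], [-1, 0]], dtype=complex)
    k = np.array([[0, 1j], [1j, 0]])

    def to_pure(x):                      # vector in R^3 -> pure-vector quaternion (2x2 matrix)
        return x[0] * i + x[1] * j + x[2] * k

    def from_pure(m):                    # read the coordinates back off the matrix entries
        return np.array([m[0, 0].imag, m[0, 1].real, m[0, 1].imag])

    def rotation(q):
        # The 3x3 matrix of x -> q* x q, built column by column from the standard basis.
        cols = [from_pure(q.conj().T @ to_pure(e) @ q) for e in np.eye(3)]
        return np.column_stack(cols)

    theta = 0.7
    q = np.cos(theta) * np.eye(2) + np.sin(theta) * k   # a unit quaternion, so q is in SU(2)
    R = rotation(q)
    assert np.allclose(R.T @ R, np.eye(3))              # R is orthogonal (an isometry)
    assert np.isclose(np.linalg.det(R), 1.0)            # and a rotation, not a reflection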