Chapter 6: Complex Matrices
We assume that the reader has some experience with matrices and determinants.
We can easily extend the basic theory of linear algebra by allowing complex numbers
as matrix entries. However, we should pay particular attention to features unique to complex
matrices, especially the notion of the adjoint, which is the matrix version of the complex conjugate.
Among the first things we learned in linear algebra is the intimate relation between
matrices and linear mappings. To describe this relation within our convention, we need to
identify each vector in Cⁿ with a column, that is, with an n × 1 matrix. Thus a vector in Cⁿ,
say x = (x1, x2, . . . , xn), will be regarded as the same as

x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \equiv [x_1 \; x_2 \; \cdots \; x_n]^\top.
We are safeguarded from confusion by the different types of brackets. From now on, let us
adopt the following rule: things in a row surrounded by the round brackets “(” and “)” are
the same things arranged in a column surrounded by the square brackets “[” and “]”, e.g.

(dog, cat) = \begin{bmatrix} \text{dog} \\ \text{cat} \end{bmatrix}.

We have the following
“Matrix Representation Theorem” A map T from Cⁿ to Cᵐ is linear if
and only if there exists an m × n matrix A such that Tx = Ax for all x ∈ Cⁿ.
Furthermore, the matrix A here is uniquely determined by T. (Recall that a mapping
T from Cⁿ to Cᵐ (we write T : Cⁿ → Cᵐ) is linear if the following identity holds
for all vectors x, y in Cⁿ and all scalars α, β: T(αx + βy) = αTx + βTy.)
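As a minimal NumPy sketch of this theorem, the matrix A can be recovered by applying T to the standard basis vectors; the particular map T below is just a made-up example, not one from the text:

    import numpy as np

    # A hypothetical linear map T : C^3 -> C^2, chosen only for illustration.
    def T(x):
        return np.array([x[0] + 1j * x[1], 2 * x[2] - x[0]])

    # Column k of A is T applied to the k-th standard basis vector.
    n = 3
    A = np.column_stack([T(e) for e in np.eye(n, dtype=complex)])

    # Check that T x = A x on a random complex vector.
    x = np.random.randn(n) + 1j * np.random.randn(n)
    assert np.allclose(T(x), A @ x)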
Given a complex matrix A, we define the adjoint of A, denoted by A∗, to be the
conjugate transpose of A. In other words, A∗ is obtained by taking the complex conjugate of
all entries of A, followed by taking the transpose: A∗ = Ā⊤. Thus
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots &        &        & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\implies
A^* = \begin{bmatrix}
\bar a_{11} & \bar a_{21} & \cdots & \bar a_{m1} \\
\bar a_{12} & \bar a_{22} & \cdots & \bar a_{m2} \\
\vdots      &             &        & \vdots      \\
\bar a_{1n} & \bar a_{2n} & \cdots & \bar a_{mn}
\end{bmatrix}.
As we have mentioned, the adjoint is the matrix version of the complex conjugate.
Example 6.1. Regarding a vector v = (a1, a2, . . . , an) in Cⁿ as a matrix, we have

v = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix}, \qquad
v^* = [\bar a_1 \; \bar a_2 \; \cdots \; \bar a_n],

vv^* = \begin{bmatrix}
a_1\bar a_1 & a_1\bar a_2 & \cdots & a_1\bar a_n \\
a_2\bar a_1 & a_2\bar a_2 & \cdots & a_2\bar a_n \\
\vdots      &             &        & \vdots      \\
a_n\bar a_1 & a_n\bar a_2 & \cdots & a_n\bar a_n
\end{bmatrix},

and v∗v = |a1|² + |a2|² + · · · + |an|² = ⟨v, v⟩.
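A quick NumPy check of this example, as a sketch with an arbitrary vector v:

    import numpy as np

    v = np.array([1 + 1j, 2j, 3.0])
    outer = np.outer(v, v.conj())   # the matrix v v*, with (j, k) entry a_j * conj(a_k)
    inner = np.vdot(v, v)           # v* v = |a_1|^2 + ... + |a_n|^2
    assert np.allclose(inner, np.sum(np.abs(v) ** 2))
    assert np.allclose(outer, v[:, None] @ v.conj()[None, :])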
For n × n matrices A and B, and for a complex number α, we have

(A + B)∗ = A∗ + B∗,    (αA)∗ = ᾱA∗,    (AB)∗ = B∗A∗.

The last identity tells us that in general (AB)∗ = A∗B∗ is false.
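These identities are easy to confirm numerically; a minimal NumPy sketch with random matrices:

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    B = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))
    alpha = 2.0 - 1.5j

    adj = lambda M: M.conj().T
    assert np.allclose(adj(A + B), adj(A) + adj(B))
    assert np.allclose(adj(alpha * A), np.conj(alpha) * adj(A))
    assert np.allclose(adj(A @ B), adj(B) @ adj(A))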
The following identity is the most basic feature concerning the adjoint of a matrix:
for every n × n matrix A, and all vectors x, y in the complex vector space Cⁿ, we have

⟨Ax, y⟩ = ⟨x, A∗y⟩.
We check this identity only for 2 × 2 matrices. Suppose

A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \qquad
x = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \qquad
y = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix}.

Then

Ax = \begin{bmatrix} a_{11}x_1 + a_{12}x_2 \\ a_{21}x_1 + a_{22}x_2 \end{bmatrix}
\quad\text{and}\quad
A^* y = \begin{bmatrix} \bar a_{11}y_1 + \bar a_{21}y_2 \\ \bar a_{12}y_1 + \bar a_{22}y_2 \end{bmatrix}.

So ⟨Ax, y⟩ = a11 x1 ȳ1 + a12 x2 ȳ1 + a21 x1 ȳ2 + a22 x2 ȳ2 and ⟨x, A∗y⟩ = x1 a11 ȳ1 + x1 a21 ȳ2 +
x2 a12 ȳ1 + x2 a22 ȳ2. Comparing them, we see that ⟨Ax, y⟩ = ⟨x, A∗y⟩.
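The same check can be run numerically for any size; a sketch using the convention ⟨u, w⟩ = Σ uᵢw̄ᵢ from this chapter (note that NumPy's vdot conjugates its first argument, so we spell the sum out explicitly):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 4
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    x = rng.standard_normal(n) + 1j * rng.standard_normal(n)
    y = rng.standard_normal(n) + 1j * rng.standard_normal(n)

    inner = lambda u, w: np.sum(u * np.conj(w))   # <u, w> = sum_i u_i conj(w_i)
    assert np.allclose(inner(A @ x, y), inner(x, A.conj().T @ y))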
We say that an n × n matrix A is self-adjoint or Hermitian if A∗ = A. This
identity can be regarded as the matrix version of z̄ = z. So being Hermitian is the
matrix analogue of being real for numbers. We say that a matrix A is unitary if
A∗A = AA∗ = I, that is, the adjoint A∗ is equal to the inverse of A. The identity
A∗A = AA∗ = I is the matrix analogue of z̄z = 1, or |z| = 1. Thus, being unitary is the
matrix analogue of having modulus one for complex numbers. Denote by U(n) the set of
all n × n unitary matrices. It is easy to check that U(n) forms a group under the usual
matrix multiplication. For example, A, B ∈ U(n) implies A∗ A = B ∗ B = I and hence
(AB)(AB)∗ = ABB ∗ A∗ = AIA∗ = AA∗ = I, etc. The group U(n) is called the unitary
group. It plays a basic role in the geometry of the complex vector space Cⁿ.
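A numerical illustration of this group property, as a sketch; the random unitary matrices are generated via a QR factorization, which is one standard way to obtain them:

    import numpy as np

    def random_unitary(n, rng):
        # QR factorization of a random complex matrix yields a unitary Q.
        M = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
        Q, _ = np.linalg.qr(M)
        return Q

    rng = np.random.default_rng(2)
    A, B = random_unitary(3, rng), random_unitary(3, rng)
    I = np.eye(3)
    assert np.allclose(A.conj().T @ A, I) and np.allclose(B.conj().T @ B, I)
    assert np.allclose((A @ B) @ (A @ B).conj().T, I)   # the product is again unitary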
Let A be an n × n unitary matrix and denote by v1, v2, . . . , vn its column
vectors. Thus we have A = [v1 v2 . . . vn] and hence

A^* = \begin{bmatrix} v_1^* \\ v_2^* \\ \vdots \\ v_n^* \end{bmatrix}, \qquad
A^* A = \begin{bmatrix}
v_1^* v_1 & v_1^* v_2 & \cdots & v_1^* v_n \\
v_2^* v_1 & v_2^* v_2 & \cdots & v_2^* v_n \\
\vdots    &           &        & \vdots    \\
v_n^* v_1 & v_n^* v_2 & \cdots & v_n^* v_n
\end{bmatrix}
= \begin{bmatrix}
\langle v_1, v_1\rangle & \langle v_1, v_2\rangle & \cdots & \langle v_1, v_n\rangle \\
\langle v_2, v_1\rangle & \langle v_2, v_2\rangle & \cdots & \langle v_2, v_n\rangle \\
\vdots                  &                         &        & \vdots                  \\
\langle v_n, v_1\rangle & \langle v_n, v_2\rangle & \cdots & \langle v_n, v_n\rangle
\end{bmatrix}.

Thus A∗A = I tells us that ⟨vj, vk⟩ = δjk, meaning that the columns v1, v2, . . . , vn
form an orthonormal basis in Cⁿ. We have shown that the columns of a unitary matrix
form an orthonormal basis. It turns out that the converse is also true. We have arrived at
the following characterization of unitary matrices:
An n × n matrix is unitary iff its columns form an orthonormal basis in Cⁿ.
Here “iff” stands for “if and only if”, a shorthand invented by Paul Halmos. We also have
the “real version” of the above statement: a real n × n matrix is orthogonal iff its columns
form an orthonormal basis in Rⁿ. Now we give examples of unitary matrices that are
used in practice, for instance in communication theory (exactly how they are used there is
too lengthy to explain here).
Example 6.2. The matrix

H_1 = \frac{1}{\sqrt 2}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix},
\quad\text{with columns}\quad
v_1 = \begin{bmatrix} 1/\sqrt 2 \\ -1/\sqrt 2 \end{bmatrix}, \quad
v_2 = \begin{bmatrix} 1/\sqrt 2 \\ 1/\sqrt 2 \end{bmatrix},

is an orthogonal matrix, since we can check that its columns v1, v2 form an orthonormal
basis in R². Now we describe a process to define the Hadamard matrix Hn. Let
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}

be a 2 × 2 matrix and let B be an n × n matrix. We define their tensor product A ⊗ B
to be the 2n × 2n matrix given by

A \otimes B = \begin{bmatrix} a_{11}B & a_{12}B \\ a_{21}B & a_{22}B \end{bmatrix}.
We have the following basic identities about tensor products of matrices:

(A ⊗ B)∗ = A∗ ⊗ B∗,    aA ⊗ bB = ab(A ⊗ B),    (A ⊗ B)(C ⊗ D) = AC ⊗ BD.    (6.1)
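In NumPy the tensor (Kronecker) product is np.kron; a quick sketch checking the identities (6.1) on random matrices:

    import numpy as np

    rng = np.random.default_rng(3)
    cmat = lambda m, n: rng.standard_normal((m, n)) + 1j * rng.standard_normal((m, n))
    A, B, C, D = cmat(2, 2), cmat(3, 3), cmat(2, 2), cmat(3, 3)
    adj = lambda M: M.conj().T

    assert np.allclose(np.kron(A, B).conj().T, np.kron(adj(A), adj(B)))
    assert np.allclose(np.kron(2 * A, 3 * B), 6 * np.kron(A, B))
    assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))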
A consequence of these identities is: if A and B are unitary (or orthogonal), then so is
A ⊗ B. For example
H_2 \equiv H_1 \otimes H_1
= \frac{1}{\sqrt 2}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}
\otimes \frac{1}{\sqrt 2}\begin{bmatrix} 1 & 1 \\ -1 & 1 \end{bmatrix}
= \frac{1}{2}\begin{bmatrix}
 1 &  1 &  1 & 1 \\
-1 &  1 & -1 & 1 \\
-1 & -1 &  1 & 1 \\
 1 & -1 & -1 & 1
\end{bmatrix}.
We can define Hn inductively by putting

H_n = H_1 \otimes H_{n-1} = \frac{1}{\sqrt 2}\begin{bmatrix} H_{n-1} & H_{n-1} \\ -H_{n-1} & H_{n-1} \end{bmatrix},

which is a 2ⁿ × 2ⁿ orthogonal matrix, called the Hadamard matrix. We remark that
tensoring is an important operation used in many areas, such as quantum information and
quantum computation.
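A minimal sketch of this inductive construction; the function name build_hadamard is ours, not from the text:

    import numpy as np

    def build_hadamard(n):
        # H_1 is 2 x 2; H_n = H_1 (tensor) H_{n-1} is 2^n x 2^n.
        H1 = np.array([[1.0, 1.0], [-1.0, 1.0]]) / np.sqrt(2)
        H = H1
        for _ in range(n - 1):
            H = np.kron(H1, H)
        return H

    H3 = build_hadamard(3)
    assert H3.shape == (8, 8)
    assert np.allclose(H3.T @ H3, np.eye(8))   # orthogonal, as claimed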
Example 6.3. Let ω = e^{2πi/n}. The columns of the following matrix form the orthonormal basis of Cⁿ described in Example 5.2 of the last chapter, and hence it is a unitary matrix:

F = \frac{1}{\sqrt n}\begin{bmatrix}
1 & 1 & 1 & 1 & 1 & \cdots & 1 \\
1 & \omega & \omega^2 & \omega^3 & \omega^4 & \cdots & \omega^{n-1} \\
1 & \omega^2 & \omega^4 & \omega^6 & \omega^8 & \cdots & \omega^{2(n-1)} \\
\vdots & & & & & & \vdots \\
1 & \omega^{n-1} & \omega^{2(n-1)} & \omega^{3(n-1)} & \omega^{4(n-1)} & \cdots & \omega^{(n-1)(n-1)}
\end{bmatrix}.
The linear mapping associated with this matrix is called the finite Fourier transform.
Speeding up this transform by special methods has become important in recent years for
reducing the cost of communication networks. The rediscovery of the so-called FFT (Fast
Fourier Transform) has great practical significance; historians can now trace the FFT method
back as far as Gauss.
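A sketch that builds F and checks that it is unitary; note that NumPy's np.fft.fft uses the opposite sign convention in the exponent, so it is the conjugate of F, applied to x and rescaled, that matches np.fft.fft(x):

    import numpy as np

    n = 8
    omega = np.exp(2j * np.pi / n)
    j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    F = omega ** (j * k) / np.sqrt(n)

    assert np.allclose(F.conj().T @ F, np.eye(n))            # F is unitary
    x = np.random.default_rng(4).standard_normal(n)
    assert np.allclose(F.conj() @ x, np.fft.fft(x) / np.sqrt(n))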
The material in the rest of the present chapter is optional.
We say that an n × n complex matrix A is orthogonally diagonalizable if there
is an orthonormal basis E = {e1 , e2 , . . . , en } consisting of eigenvectors of A, that is,
for each k, Aek = λk ek , where λk is the eigenvalue corresponding to the eigenvector ek .
Now we use the basis vectors (considered as columns) in E to form the unitary matrix
U = [e1 e2 . . . en]. In the next step, we make use of Aek = λkek, but as written this
does not fit the matrix formalism: regarding the scalar λk as a 1 × 1 matrix, the
vector ek to its right is n × 1, so the product λkek is not a valid matrix product. To adjust
this, we rewrite λkek as ekλk. Thus
we have Aek = ekλk. Now the way is clear for the following matrix manipulation:
AU = A[e1 e2 . . . en ]
= [Ae1 Ae2 . . . Aen ]
= [e1 λ1 e2 λ2 . . . en λn ]
= [e1 e2 . . . en ]D = U D
where D is the diagonal matrix given by
D = \begin{bmatrix}
\lambda_1 & 0 & 0 & \cdots & 0 \\
0 & \lambda_2 & 0 & \cdots & 0 \\
0 & 0 & \lambda_3 & & \vdots \\
\vdots & & & \ddots & \\
0 & 0 & 0 & \cdots & \lambda_n
\end{bmatrix}.
Thus we have A = UDU⁻¹. The above steps can be reversed. So we have proved:
Fact. A is orthogonally diagonalizable if and only if A = UDU⁻¹ ≡ UDU∗ for some
unitary U and diagonal D.
The identity A = UDU∗ gives A∗ = (U∗)∗D∗U∗ = UD∗U∗ = UD∗U⁻¹ and hence

A∗A = UD∗U⁻¹UDU⁻¹ = UD∗DU⁻¹ = UDD∗U⁻¹ = UDU⁻¹UD∗U⁻¹ = AA∗,

in view of

D^* D = D D^* = \begin{bmatrix}
|\lambda_1|^2 & 0 & 0 & \cdots & 0 \\
0 & |\lambda_2|^2 & 0 & \cdots & 0 \\
0 & 0 & |\lambda_3|^2 & & \vdots \\
\vdots & & & \ddots & \\
0 & 0 & 0 & \cdots & |\lambda_n|^2
\end{bmatrix}.
A matrix A is called a normal matrix if the identity A∗ A = AA∗ holds. We have
shown that orthogonally diagonalizable matrices are normal. An important fact in linear
algebra says that the converse is also true. So we conclude:
Fact. A complex matrix is orthogonally diagonalizable if and only if it is normal.
We do not prove this theorem here because the proof is considerably longer than what we
are prepared to include. Notice that both self-adjoint matrices and unitary matrices are
normal, and hence they are orthogonally diagonalizable.
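A sketch of the Fact for the special case of a self-adjoint matrix, using np.linalg.eigh (which returns an orthonormal set of eigenvectors for Hermitian input):

    import numpy as np

    rng = np.random.default_rng(5)
    B = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    A = B + B.conj().T                      # self-adjoint, hence normal
    assert np.allclose(A.conj().T @ A, A @ A.conj().T)

    w, U = np.linalg.eigh(A)                # eigenvalues and orthonormal eigenvectors
    assert np.allclose(U.conj().T @ U, np.eye(4))        # U is unitary
    assert np.allclose(A, U @ np.diag(w) @ U.conj().T)   # A = U D U*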
Denote by SU(2) the set of all 2 × 2 unitary matrices of determinant equal to 1:
SU(2) = {U ∈ U(2) : det(U ) = 1}.
Let U be in SU(2). Write down U and UU∗ explicitly as follows:

U = \begin{bmatrix} z & w \\ u & v \end{bmatrix}
\quad\text{and}\quad
U U^* = \begin{bmatrix} z & w \\ u & v \end{bmatrix}
\begin{bmatrix} \bar z & \bar u \\ \bar w & \bar v \end{bmatrix}
= \begin{bmatrix} |z|^2 + |w|^2 & z\bar u + w\bar v \\ u\bar z + v\bar w & |u|^2 + |v|^2 \end{bmatrix}.
From UU∗ = I we get |z|² + |w|² = 1 and uz̄ + vw̄ = 0. Assume w ≠ 0 and z ≠ 0.
Then we may write u = αw̄ and v = βz̄ for some α and β. Now uz̄ + vw̄ = 0 gives
(α + β)z̄w̄ = 0 and hence α + β = 0. Thus

1 = det(U) = zv − wu = z(βz̄) − w(αw̄) = z(βz̄) − w(−βw̄) = β(|z|² + |w|²) = β.
Therefore U is of the form

U = \begin{bmatrix} z & w \\ -\bar w & \bar z \end{bmatrix},
\quad\text{where } |z|^2 + |w|^2 \equiv \bar z z + \bar w w = 1.    (6.2)

In case z = 0 or w = 0, U has the same form (please check this). We conclude: a 2 × 2
matrix U is in SU(2) if and only if it can be expressed as in (6.2) above.
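A quick numerical sketch of this characterization, building U from an arbitrary pair (z, w) with |z|² + |w|² = 1:

    import numpy as np

    z, w = 0.6 + 0.48j, 0.36 - 0.528j
    z, w = np.array([z, w]) / np.sqrt(abs(z) ** 2 + abs(w) ** 2)   # normalize so |z|^2 + |w|^2 = 1
    U = np.array([[z, w], [-np.conj(w), np.conj(z)]])

    assert np.allclose(U.conj().T @ U, np.eye(2))   # unitary
    assert np.isclose(np.linalg.det(U), 1.0)        # determinant 1, so U is in SU(2)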
Writing z = x0 + ix1 and w = x2 + ix3 in (6.2), we have

U = \begin{bmatrix} z & w \\ -\bar w & \bar z \end{bmatrix}
= \begin{bmatrix} x_0 + ix_1 & x_2 + ix_3 \\ -x_2 + ix_3 & x_0 - ix_1 \end{bmatrix}
= x_0\mathbf{1} + x_1\mathbf{i} + x_2\mathbf{j} + x_3\mathbf{k},    (6.3)

where

\mathbf{1} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \quad
\mathbf{i} = \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix}, \quad
\mathbf{j} = \begin{bmatrix} 0 & 1 \\ -1 & 0 \end{bmatrix}, \quad
\mathbf{k} = \begin{bmatrix} 0 & i \\ i & 0 \end{bmatrix}.    (6.4)

Matrix U in (6.2) belongs to SU(2) if and only if

|z|² + |w|² ≡ x0² + x1² + x2² + x3² = 1.
An expression written as the RHS of (6.3), without the condition x0² + x1² + x2² + x3² = 1
imposed, is called a quaternion. Since the theory of quaternions was discovered by
Hamilton, we denote the collection of all quaternions by H. The algebra of quaternions is
determined by the following identities among the basic units 1, i, j, k:

1q = q1 = q,  i² = j² = k² = −1,  ij = −ji = k,  jk = −kj = i,  ki = −ik = j,    (6.5)

where q is any quaternion. These identities can be checked by direct computation. We
usually suppress the unit 1 of the quaternion algebra H and write x0 for x0 1.
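One such direct computation of (6.5), done with the 2 × 2 matrices from (6.4), as a NumPy sketch:

    import numpy as np

    one = np.eye(2, dtype=complex)
    i = np.array([[1j, 0], [0, -1j]])
    j = np.array([[0, 1], [-1, 0]], dtype=complex)
    k = np.array([[0, 1j], [1j, 0]])

    for u in (i, j, k):
        assert np.allclose(u @ u, -one)                        # i^2 = j^2 = k^2 = -1
    assert np.allclose(i @ j, k) and np.allclose(j @ i, -k)    # ij = -ji = k
    assert np.allclose(j @ k, i) and np.allclose(k @ j, -i)    # jk = -kj = i
    assert np.allclose(k @ i, j) and np.allclose(i @ k, -j)    # ki = -ik = j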
Let q be the quaternion given in (6.3), which is a 2 × 2 complex matrix. Its adjoint is given by

q^* = \begin{bmatrix} \bar z & -w \\ \bar w & z \end{bmatrix}
= \begin{bmatrix} x_0 - ix_1 & -x_2 - ix_3 \\ x_2 - ix_3 & x_0 + ix_1 \end{bmatrix}
= x_0 - x_1\mathbf{i} - x_2\mathbf{j} - x_3\mathbf{k},

which is also called the conjugate of q. A direct computation shows

q∗q = qq∗ = (|z|² + |w|²)1 ≡ |z|² + |w|² = det(q) = x0² + x1² + x2² + x3².
The square root of the last expression is called the norm of q and is denoted by ‖q‖. Thus
q∗q = qq∗ = ‖q‖². So, q is in SU(2) if and only if ‖q‖ = 1:

SU(2) = {q = x0 + x1 i + x2 j + x3 k ∈ H : ‖q‖² ≡ x0² + x1² + x2² + x3² = 1}.
Regarding H as the 4-dimensional space with rectangular coordinates x0, x1, x2, x3, we
may identify SU(2) with the 3-dimensional sphere x0² + x1² + x2² + x3² = 1, which will be
simply called the 3-sphere. Notice that, if we write z = x0 + x1 i and w = x2 + x3 i,
then q = x0 + x1 i + x2 j + x3 k can be written as q = z + wj, in view of ij = k.
For a quaternion q = x0 + x1 i + x2 j + x3 k, we often write q = x0 + x, where x0 is
called the scalar part and x = x1 i + x2 j + x3 k is called the vector part. From (6.5) we
see how to multiply “pure vector” quaternions. It is easy to check that the product of two
quaternions q = x0 + x and r = y0 + y is determined by

qr = (x0 + x)(y0 + y) = x0 y0 + x0 y + y0 x + xy,  where  xy = −x · y + x × y.    (6.6)
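A sketch of formula (6.6), comparing the scalar-plus-vector product with the 2 × 2 matrix product; the helper names quat_to_matrix and quat_multiply are ours, not from the text:

    import numpy as np

    def quat_to_matrix(x0, x):
        one = np.eye(2, dtype=complex)
        i = np.array([[1j, 0], [0, -1j]])
        j = np.array([[0, 1], [-1, 0]], dtype=complex)
        k = np.array([[0, 1j], [1j, 0]])
        return x0 * one + x[0] * i + x[1] * j + x[2] * k

    def quat_multiply(x0, x, y0, y):
        # (x0 + x)(y0 + y): scalar part x0*y0 - x.y, vector part x0*y + y0*x + x cross y
        return x0 * y0 - np.dot(x, y), x0 * y + y0 * x + np.cross(x, y)

    x0, x = 1.0, np.array([0.5, -2.0, 3.0])
    y0, y = -0.5, np.array([1.0, 4.0, 0.25])
    z0, z = quat_multiply(x0, x, y0, y)
    assert np.allclose(quat_to_matrix(x0, x) @ quat_to_matrix(y0, y), quat_to_matrix(z0, z))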
The “scalar plus vector” decomposition q = x0 + x of a quaternion is also convenient for
computing its conjugate, as we can easily check that

q∗ = (x0 + x)∗ = x0 − x,    (6.7)

which resembles the identity z̄ = x − iy for a complex number z = x + iy. From (6.7) we see that
a quaternion q is a pure vector if and only if q∗ = −q, that is, q is skew Hermitian.
We identify a pure vector x = x1 i + x2 j + x3 k with the vector x = (x1, x2, x3) in
R³. For each q ∈ SU(2), define a linear transformation R(q) in R³ by putting
R(q)x = q∗xq. We can check that y ≡ R(q)x is indeed in R³:
y∗ = (R(q)x)∗ = (q∗ xq)∗ = q∗ x∗ q = q∗ (−x)q = −q∗ xq = −y.
The most interesting thing about R(q) is that it is an isometry: x and y ≡ R(q)x have
the same length. Indeed,
‖y‖² = y∗y = (q∗xq)∗(q∗xq)
= q∗x∗qq∗xq = q∗x∗xq = q∗‖x‖²q = ‖x‖²q∗q = ‖x‖².
Using some connectedness argument in topology, one can show that R(q) is actually a
rotation (not a reflection) in 3–space. It turns out that every rotation in 3–space can be
written in the form R(q) and we call it the spinor representation of the rotation. Also,
we call SU(2) the spinor group. It is an essential mathematical device for describing
electron spin and for studying aircraft stability. It is also used to explain how a cat can turn
its body 180° in midair in order to achieve a safe landing, without violating the basic
physical law of conservation of angular momentum.
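To close, a NumPy sketch of the spinor representation R(q)x = q∗xq, checking that it preserves length and is a genuine rotation; the helper names and the particular q are ours:

    import numpy as np

    i = np.array([[1j, 0], [0, -1j]])
    j = np.array([[0, 1], [-1, 0]], dtype=complex)
    k = np.array([[0, 1j], [1j, 0]])

    def to_pure(x):                      # vector in R^3 -> pure-vector quaternion (2x2 matrix)
        return x[0] * i + x[1] * j + x[2] * k

    def from_pure(m):                    # read the coordinates back off the matrix entries
        return np.array([m[0, 0].imag, m[0, 1].real, m[0, 1].imag])

    def rotation(q):
        # The 3x3 matrix of x -> q* x q, built column by column from the standard basis.
        cols = [from_pure(q.conj().T @ to_pure(e) @ q) for e in np.eye(3)]
        return np.column_stack(cols)

    theta = 0.7
    q = np.cos(theta) * np.eye(2) + np.sin(theta) * k   # a unit quaternion, so q is in SU(2)
    R = rotation(q)
    assert np.allclose(R.T @ R, np.eye(3))              # R is orthogonal (an isometry)
    assert np.isclose(np.linalg.det(R), 1.0)            # and a rotation, not a reflection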