7 Eigenvalues and Eigenvectors

7.1 Introduction

The simplest matrices are the diagonal ones. Thus a linear map will also be easy to handle if its associated matrix is diagonal. Then again, we have seen that the associated matrix depends, to some extent, on the choice of bases. This naturally leads us to the problem of investigating the existence and construction of a suitable basis with respect to which the matrix associated to a given linear transformation is diagonal.

Definition 7.1 An n × n matrix A is called diagonalizable if there exists an invertible n × n matrix M such that M⁻¹AM is a diagonal matrix. A linear map f : V → V is called diagonalizable if the matrix associated to f with respect to some basis is diagonal.

Remark 7.1 (i) Clearly, f is diagonalizable iff the matrix associated to f with respect to some basis (equivalently, any basis) is diagonalizable.
(ii) Let {v1, . . . , vn} be a basis. The matrix Mf of a linear transformation f w.r.t. this basis is diagonal iff f(vi) = λi vi, 1 ≤ i ≤ n, for some scalars λi. Naturally, a subquestion here is: does there exist such a basis for a given linear transformation?

Definition 7.2 Given a linear map f : V → V, we say v ∈ V is an eigenvector of f if v ≠ 0 and f(v) = λv for some λ ∈ K. In that case λ is called an eigenvalue of f. For a square matrix A we say λ is an eigenvalue if there exists a non-zero column vector v such that Av = λv. Of course, v is then called an eigenvector of A corresponding to λ.

Remark 7.2 (i) It is easy to see that the eigenvalues and eigenvectors of a linear transformation are the same as those of the associated matrix.
(ii) Even if a linear map is not diagonalizable, the existence of eigenvectors and eigenvalues itself throws some light on the nature of the linear map. Thus the study of eigenvalues becomes extremely important. They arise naturally in the study of differential equations.
Here we shall use them to address the problem of diagonalization, and then see some geometric applications of diagonalization itself.

7.2 Characteristic Polynomial

Proposition 7.1 (1) The eigenvalues of a square matrix A are the solutions of the equation χA(λ) = det(A − λI) = 0.
(2) The null space of A − λI is equal to the eigenspace EA(λ) := {v : Av = λv} = N(A − λI).

Proof: (1) If v is an eigenvector of A then v ≠ 0 and Av = λv for some scalar λ. Hence (A − λI)v = 0. Thus the nullity of A − λI is positive. Hence rank(A − λI) is less than n, and so det(A − λI) = 0.
(2) EA(λ) = {v ∈ V : Av = λv} = {v ∈ V : (A − λI)v = 0} = N(A − λI). ♠

Definition 7.3 For any square matrix A, the polynomial χA(λ) = det(A − λI) in λ is called the characteristic polynomial of A.

Example 7.1 (1) A = [ 1 2 ; 0 3 ]. To find the eigenvalues of A, we solve the equation

det(A − λI) = det [ 1−λ  2 ; 0  3−λ ] = (1 − λ)(3 − λ) = 0.

Hence the eigenvalues of A are 1 and 3. Let us calculate the eigenspaces E(1) and E(3). By definition, E(1) = {v | (A − I)v = 0} and E(3) = {v | (A − 3I)v = 0}. Now

A − I = [ 0 2 ; 0 2 ].

Hence (x, y)ᵗ ∈ E(1) iff 2y = 0, i.e., y = 0. Thus E(1) = L{(1, 0)}. Next,

A − 3I = [ −2 2 ; 0 0 ].

Suppose [ −2 2 ; 0 0 ](x, y)ᵗ = (−2x + 2y, 0)ᵗ = (0, 0)ᵗ. This is possible iff x = y. Thus E(3) = L({(1, 1)}).

(2) Let A = [ 3 0 0 ; −2 4 2 ; −2 1 5 ]. Then det(A − λI) = (3 − λ)²(6 − λ). Hence the eigenvalues of A are 3 and 6. The eigenvalue λ = 3 is a double root of the characteristic polynomial of A. We say that λ = 3 has algebraic multiplicity 2. Let us find the eigenspaces E(3) and E(6).

λ = 3 : A − 3I = [ 0 0 0 ; −2 1 2 ; −2 1 2 ]. Hence rank(A − 3I) = 1. Thus nullity(A − 3I) = 2. By solving the system (A − 3I)v = 0, we find that N(A − 3I) = EA(3) = L({(1, 0, 1), (1, 2, 0)}). The dimension of EA(λ) is called the geometric multiplicity of λ. Hence the geometric multiplicity of λ = 3 is 2.

λ = 6 : A − 6I = [ −3 0 0 ; −2 −2 2 ; −2 1 −1 ]. Hence rank(A − 6I) = 2. Thus dim EA(6) = 1. (It can be shown that {(0, 1, 1)} is a basis of EA(6).) Thus both the algebraic and geometric multiplicities of the eigenvalue 6 are equal to 1.

(3) A = [ 1 1 ; 0 1 ]. Then det(A − λI) = (1 − λ)². Thus λ = 1 has algebraic multiplicity 2. Now

A − I = [ 0 1 ; 0 0 ].

Hence nullity(A − I) = 1 and EA(1) = L{e1}. In this case the geometric multiplicity is less than the algebraic multiplicity of the eigenvalue 1.

Remark 7.3 (i) Observe that χA(λ) = χ_{M⁻¹AM}(λ). Thus the characteristic polynomial is an invariant of similarity. Hence the characteristic polynomial of any linear map f : V → V is also defined (where V is finite dimensional) by choosing some basis for V and taking the characteristic polynomial of the associated matrix M(f) of f. This definition does not depend upon the choice of the basis.
(ii) If we expand det(A − λI), we see that there is a term (a11 − λ)(a22 − λ) · · · (ann − λ). This is the only term which contributes to λ^n and λ^{n−1}. It follows that the degree of the characteristic polynomial is exactly equal to n, the size of the matrix; moreover, the coefficient of the top degree term is equal to (−1)^n. Thus in general it has n complex roots, some of which may be repeated, some of them real, and so on. All these patterns are going to influence the geometry of the linear map.
(iii) If A is a real matrix then of course χA(λ) is a real polynomial. That, however, does not allow us to conclude that it has real roots. So while discussing eigenvalues we should consider even a real matrix as a complex matrix and keep in mind the associated linear map C^n → C^n. The problem of existence of real eigenvalues and real eigenvectors will be discussed soon.
(iv) Next, the above observation also shows that the coefficient of λ^{n−1} is equal to (−1)^{n−1}(a11 + · · · + ann) = (−1)^{n−1} tr A.

Lemma 7.1 Suppose A is a real matrix with a real eigenvalue λ. Then there exists a real column vector v ≠ 0 such that Av = λv.
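For a 2 × 2 matrix the characteristic equation is the quadratic λ² − (tr A)λ + det A = 0, so the eigenvalues in Example 7.1 can be checked directly. A minimal sketch (the helper name eig2x2 is ours, and it assumes the eigenvalues are real):

```python
import math

def eig2x2(a, b, c, d):
    # Roots of det(A - λI) = λ² - (a+d)λ + (ad - bc) for A = [[a, b], [c, d]];
    # assumes the discriminant is non-negative (real eigenvalues).
    tr, det = a + d, a * d - b * c
    disc = math.sqrt(tr * tr - 4 * det)
    return ((tr - disc) / 2, (tr + disc) / 2)

# Example 7.1(1): A = [[1, 2], [0, 3]] has eigenvalues 1 and 3.
print(eig2x2(1, 2, 0, 3))  # → (1.0, 3.0)

# Example 7.1(3): A = [[1, 1], [0, 1]] has the double eigenvalue 1.
print(eig2x2(1, 1, 0, 1))  # → (1.0, 1.0)
```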
Proof: Start with Aw = λw, where w is a non-zero column vector with complex entries. Write w = v + iv′, where both v, v′ are real vectors. We then have

Av + iAv′ = λ(v + iv′).

Compare the real and imaginary parts: Av = λv and Av′ = λv′. Since w ≠ 0, at least one of v, v′ must be a non-zero vector, and we are done. ♠

Proposition 7.2 Let A be an n × n matrix with eigenvalues λ1, λ2, . . . , λn. Then
(i) tr(A) = λ1 + λ2 + . . . + λn;
(ii) det A = λ1 λ2 . . . λn.

Proof: The characteristic polynomial of A is

det(A − λI) = det [ a11−λ  a12  ···  a1n ; a21  a22−λ  ···  a2n ; ... ; an1  an2  ···  ann−λ ]
            = (−1)^n λ^n + (−1)^{n−1} λ^{n−1}(a11 + . . . + ann) + . . .        (48)

Put λ = 0 to see that det A is the constant term of det(A − λI). Since λ1, λ2, . . . , λn are the roots of det(A − λI) = 0, we have

det(A − λI) = (−1)^n (λ − λ1)(λ − λ2) . . . (λ − λn)                            (49)
            = (−1)^n [λ^n − (λ1 + λ2 + . . . + λn)λ^{n−1} + . . . + (−1)^n λ1 λ2 . . . λn].   (50)

Comparing (48) and (50), we get: the constant term of det(A − λI) is λ1 λ2 . . . λn = det A, and tr(A) = a11 + a22 + . . . + ann = λ1 + λ2 + . . . + λn. ♠

Proposition 7.3 Let v1, v2, . . . , vk be eigenvectors of a matrix A associated to distinct eigenvalues λ1, λ2, . . . , λk. Then v1, v2, . . . , vk are linearly independent.

Proof: Apply induction on k. The case k = 1 is clear. Suppose k ≥ 2 and

c1 v1 + c2 v2 + . . . + ck vk = 0

for some scalars c1, c2, . . . , ck. Applying A, we get c1 λ1 v1 + c2 λ2 v2 + . . . + ck λk vk = 0. Hence

λ1(c1 v1 + c2 v2 + . . . + ck vk) − (λ1 c1 v1 + λ2 c2 v2 + . . . + λk ck vk) = (λ1 − λ2)c2 v2 + (λ1 − λ3)c3 v3 + . . . + (λ1 − λk)ck vk = 0.

By induction, v2, v3, . . . , vk are linearly independent. Hence (λ1 − λj)cj = 0 for j = 2, 3, . . . , k. Since λ1 ≠ λj for j = 2, 3, . . . , k, we get cj = 0 for j = 2, 3, . . . , k. Hence c1 is also zero. Thus v1, v2, . . . , vk are linearly independent. ♠

Proposition 7.4 Suppose A is an n × n matrix with n distinct eigenvalues λ1, λ2, . . . , λn. Let C be the matrix whose column vectors are respectively v1, v2, . . . , vn, where vi is an eigenvector for λi, for i = 1, 2, . . . , n. Then C⁻¹AC = D(λ1, . . . , λn) = D, the diagonal matrix.

Proof: It is enough to prove AC = CD. For i = 1, 2, . . . , n, let C^i (= vi) denote the ith column of C. Then (AC)^i = AC^i = Avi = λi vi. Similarly, (CD)^i = CD^i = λi vi. Hence AC = CD, as required. ♠

7.3 Relation Between Algebraic and Geometric Multiplicities

Recall that:

Definition 7.4 The algebraic multiplicity aA(μ) of an eigenvalue μ of a matrix A is defined to be the multiplicity k of μ as a root of the polynomial χA(λ). This means that (λ − μ)^k divides χA(λ) whereas (λ − μ)^{k+1} does not.

Definition 7.5 The geometric multiplicity of an eigenvalue μ of A is defined to be the dimension of the eigenspace EA(μ): gA(μ) := dim EA(μ).

Proposition 7.5 Both the algebraic and the geometric multiplicity are invariants of similarity.

Proof: We have already seen that for any invertible matrix C, χA(λ) = χ_{C⁻¹AC}(λ). Thus the invariance of the algebraic multiplicity is clear. On the other hand, check that E_{C⁻¹AC}(λ) = C⁻¹(EA(λ)). Therefore dim(E_{C⁻¹AC}(λ)) = dim C⁻¹(EA(λ)) = dim(EA(λ)), the last equality being a consequence of the invertibility of C. ♠

We have observed in a few examples that the geometric multiplicity of an eigenvalue is at most its algebraic multiplicity. This is true in general.

Proposition 7.6 Let A be an n × n matrix. Then the geometric multiplicity of an eigenvalue μ of A is less than or equal to the algebraic multiplicity of μ.

Proof: Put aA(μ) = k. Then (λ − μ)^k divides det(A − λI) but (λ − μ)^{k+1} does not. Let gA(μ) = g be the geometric multiplicity of μ. Then EA(μ) has a basis consisting of g eigenvectors v1, v2, . . . , vg. We can extend this basis of EA(μ) to a basis of C^n, say {v1, v2, . . . , vg, . . . , vn}. Let B be the matrix such that B^j = vj.
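Propositions 7.2 and 7.4 can both be checked on Example 7.1(1). A small sketch in pure Python (integer arithmetic, so the comparisons are exact):

```python
# Checks on Example 7.1(1): A = [[1, 2], [0, 3]] with eigenvalues 1 and 3.
A = [[1, 2], [0, 3]]
C = [[1, 1], [0, 1]]   # columns: eigenvectors (1, 0)ᵗ and (1, 1)ᵗ
D = [[1, 0], [0, 3]]   # D(1, 3)

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Proposition 7.2: tr A = λ1 + λ2 and det A = λ1·λ2.
assert A[0][0] + A[1][1] == 1 + 3
assert A[0][0] * A[1][1] - A[0][1] * A[1][0] == 1 * 3

# Proposition 7.4: AC = CD, hence C⁻¹AC = D.
assert matmul(A, C) == matmul(C, D)
```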
Then B is an invertible matrix and

B⁻¹AB = [ μIg  X ; 0  Y ]

where X is a g × (n − g) matrix and Y is an (n − g) × (n − g) matrix. Therefore

det(A − λI) = det[B⁻¹(A − λI)B] = det(B⁻¹AB − λI) = det((μ − λ)Ig) · det(Y − λI_{n−g}) = (μ − λ)^g det(Y − λI_{n−g}).

Thus (λ − μ)^g divides det(A − λI), and hence g ≤ k. ♠

Remark 7.4 We will now be able to say something about the diagonalizability of a given matrix A. Assuming that there exists B such that B⁻¹AB = D(λ1, . . . , λn), it follows that AB = BD, i.e., AB^i = λi B^i, where B^i denotes the ith column vector of B. Thus we need not hunt for B anywhere but look for eigenvectors of A. Of course the B^i are linearly independent, since B is invertible. Now the problem turns into the question whether we have n linearly independent eigenvectors of A, so that they can be chosen for the columns of B. The previous proposition took care of one such case, viz., when the eigenvalues are distinct. In general, this condition is not forced on us. Observe that the geometric multiplicity and the algebraic multiplicity of each eigenvalue coincide for a diagonal matrix. Since these concepts are similarity invariants, it is necessary that the same is true for any matrix which is diagonalizable. This turns out to be sufficient also. The following theorem gives the correct condition for diagonalization.

Theorem 7.1 An n × n matrix A is diagonalizable if and only if for each eigenvalue μ of A the algebraic and geometric multiplicities are equal: aA(μ) = gA(μ).

Proof: We have already seen the necessity of the condition. To prove the converse, suppose that the two multiplicities coincide for each eigenvalue. Suppose that λ1, λ2, . . . , λk are all the eigenvalues of A, with algebraic multiplicities n1, n2, . . . , nk. Let

B1 = {v11, v12, . . . , v1n1} be a basis of E(λ1),
B2 = {v21, v22, . . . , v2n2} be a basis of E(λ2),
. . .
Bk = {vk1, vk2, . . . , vknk} be a basis of E(λk).

Use induction on k to show that B = B1 ∪ B2 ∪ . . . ∪ Bk is a linearly independent set. (The proof is exactly similar to the proof of Proposition 7.3.) Denote the matrix whose columns are the elements of the basis B also by B itself. Then check that B⁻¹AB is a diagonal matrix. Hence A is diagonalizable. ♠

Lectures 19, 20, 21

7.4 Eigenvalues of Special Matrices

In this section we discuss the eigenvalues of special matrices. We will work in the n-dimensional complex vector space C^n. If u = (u1, u2, . . . , un)ᵗ and v = (v1, v2, . . . , vn)ᵗ ∈ C^n, we have defined their inner product in C^n by

⟨u, v⟩ = u*v = ū1 v1 + ū2 v2 + · · · + ūn vn.

The length of u is given by ‖u‖ = √(|u1|² + · · · + |un|²).

Definition 7.6 Let A be a square matrix with complex entries. A is called (i) Hermitian if A = A*; (ii) skew Hermitian if A = −A*.

Lemma 7.2 A is Hermitian iff for all column vectors v, w we have

(Av)*w = v*Aw,  i.e.,  ⟨Av, w⟩ = ⟨v, Aw⟩.        (52)

Proof: If A is Hermitian then (Av)*w = v*A*w = v*Aw. To see the converse, take v, w to be the standard basis column vectors. ♠

Remark 7.5 (i) If A is real then A = A* means A = Aᵗ. Hence real symmetric matrices are Hermitian. Likewise, a real skew Hermitian matrix is skew symmetric.
(ii) A is Hermitian iff iA is skew Hermitian.

Proposition 7.7 Let A be an n × n Hermitian matrix. Then:
1. For any u ∈ C^n, u*Au is a real number.
2. All eigenvalues of A are real.
3. Eigenvectors of a Hermitian matrix corresponding to distinct eigenvalues are mutually orthogonal.

Proof: (1) Since u*Au is a complex number, to prove it is real we prove that (u*Au)* = u*Au. But (u*Au)* = u*A*(u*)* = u*Au. Hence u*Au is real for all u ∈ C^n.
(2) Suppose λ is an eigenvalue of A and u is an eigenvector for λ. Then u*Au = u*(λu) = λ(u*u) = λ‖u‖². Since u*Au is real and ‖u‖ is a non-zero real number, it follows that λ is real.
(3) Let λ and μ be two distinct eigenvalues of A and let u and v be corresponding eigenvectors. Then Au = λu and Av = μv.
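Proposition 7.7(1) can be illustrated numerically: for a Hermitian A, the number u*Au comes out real whatever u we pick. A sketch (the particular A and u are our own choices):

```python
# A equals its conjugate transpose, so A is Hermitian.
A = [[2, 1 - 1j], [1 + 1j, 3]]
u = [1 + 2j, 3 - 1j]

# Compute u*Au = Σ_i conj(u_i) (Au)_i.
Au = [sum(A[i][j] * u[j] for j in range(2)) for i in range(2)]
quad = sum(u[i].conjugate() * Au[i] for i in range(2))

assert quad.imag == 0  # u*Au is real, as Proposition 7.7(1) predicts
```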
Hence λu*v = (λu)*v = (Au)*v = u*(Av) = u*μv = μ(u*v). Hence (λ − μ)u*v = 0. Since λ ≠ μ, u*v = 0. ♠

Corollary 7.1 Let A be an n × n skew Hermitian matrix. Then:
1. For any u ∈ C^n, u*Au is either zero or a purely imaginary number.
2. Each eigenvalue of A is either zero or a purely imaginary number.
3. Eigenvectors of A corresponding to distinct eigenvalues are mutually orthogonal.

Proof: All this follows straight away from the corresponding statements about Hermitian matrices, once we note that A being skew Hermitian implies iA is Hermitian, and that a complex number c is real iff ic is either zero or purely imaginary. ♠

Definition 7.7 Let A be a square matrix over C. A is called (i) unitary if A*A = I; (ii) orthogonal if A is real and unitary. Thus a real matrix A is orthogonal iff Aᵗ = A⁻¹. Also observe that A is unitary iff Aᵗ is unitary iff Ā is unitary.

Example 7.2 The matrices

U = [ cos θ  sin θ ; −sin θ  cos θ ]  and  V = (1/√2)[ 1 i ; i 1 ]

are orthogonal and unitary, respectively.

Proposition 7.8 Let U be a square matrix. Then the following conditions are equivalent:
(i) U is unitary.
(ii) The rows of U form an orthonormal set of vectors.
(iii) The columns of U form an orthonormal set of vectors.
(iv) U preserves the inner product, i.e., for all vectors x, y ∈ C^n, we have ⟨Ux, Uy⟩ = ⟨x, y⟩.

Proof: Write the matrix U column-wise: U = [u1 u2 . . . un], so that U* is the matrix with rows u1*, u2*, . . . , un*. Hence U*U is the matrix whose (i, j)th entry is ui*uj. Thus U*U = I iff ui*uj = 0 for i ≠ j and ui*ui = 1 for i = 1, 2, . . . , n, iff the column vectors of U form an orthonormal set. This proves (i) ⟺ (iii). Since U*U = I implies UU* = I, the proof of (i) ⟺ (ii) follows. To prove (i) ⟺ (iv), first let U be unitary. Then U*U = I and hence ⟨Ux, Uy⟩ = ⟨x, U*Uy⟩ = ⟨x, y⟩.
Conversely, if U preserves the inner product, take x = ei and y = ej to get ei*(U*U)ej = ei*ej = δij, where δij is the Kronecker symbol (δij = 1 if i = j; δij = 0 otherwise). This means the (i, j)th entry of U*U is δij. Hence U*U = In. ♠

Remark 7.6 Observe that the above theorem is valid for an orthogonal matrix also, by merely applying it to a real matrix.

Corollary 7.2 Let U be a unitary matrix. Then:
(1) For all x, y ∈ C^n, ⟨Ux, Uy⟩ = ⟨x, y⟩. Hence ‖Ux‖ = ‖x‖.
(2) If λ is an eigenvalue of U then |λ| = 1.
(3) Eigenvectors corresponding to different eigenvalues are orthogonal.

Proof: (1) We have ‖Ux‖² = ⟨Ux, Ux⟩ = ⟨x, x⟩ = ‖x‖².
(2) If λ is an eigenvalue of U with eigenvector x, then Ux = λx. Hence ‖x‖ = ‖Ux‖ = |λ| ‖x‖. Hence |λ| = 1.
(3) Let Ux = λx and Uy = μy, where x, y are eigenvectors with distinct eigenvalues λ and μ respectively. Then ⟨x, y⟩ = ⟨Ux, Uy⟩ = ⟨λx, μy⟩ = λ̄μ⟨x, y⟩. Hence λ̄μ = 1 or ⟨x, y⟩ = 0. Since λ̄λ = 1 and λ ≠ μ, we cannot have λ̄μ = 1. Hence ⟨x, y⟩ = 0, i.e., x and y are orthogonal. ♠

Example 7.3 U = [ cos θ  −sin θ ; sin θ  cos θ ] is an orthogonal matrix. The characteristic polynomial of U is

D(λ) = det(U − λI) = det [ cos θ − λ  −sin θ ; sin θ  cos θ − λ ] = λ² − 2λ cos θ + 1.

The roots of D(λ) = 0 are

λ = (2 cos θ ± √(4cos²θ − 4))/2 = cos θ ± i sin θ = e^{±iθ}.

Hence |λ| = 1. Check that the eigenvectors are x = (1, −i)ᵗ for λ = e^{iθ} and y = (1, i)ᵗ for λ = e^{−iθ}. Thus x*y = [1 i](1, i)ᵗ = 1 + i² = 0. Hence x ⊥ y. Normalize the eigenvectors x and y. Therefore, if we take

C = (1/√2)[ 1 1 ; −i i ],

then C⁻¹UC = D(e^{iθ}, e^{−iθ}).

7.5 Spectral Theorem and Applications

Similarity does not necessarily preserve distance. In terms of matrices, this may be noticed in the fact that an arbitrary conjugate C⁻¹AC of a Hermitian matrix may not be Hermitian. Thus the diagonalization problem for special matrices such as Hermitian matrices needs a special treatment, viz., we need to restrict C to those matrices which preserve the inner product.
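Example 7.3 can be checked numerically for a concrete angle: the rotation matrix has eigenvalue e^{iθ} of modulus 1, with eigenvector (1, −i)ᵗ. A sketch (θ = 0.7 is an arbitrary choice of ours):

```python
import cmath
import math

theta = 0.7  # any angle works; 0.7 is an arbitrary choice
U = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

lam = cmath.exp(1j * theta)   # eigenvalue e^{iθ}
x = [1, -1j]                  # claimed eigenvector for λ = e^{iθ}
Ux = [sum(U[i][j] * x[j] for j in range(2)) for i in range(2)]

assert all(abs(Ux[i] - lam * x[i]) < 1e-12 for i in range(2))  # Ux = λx
assert abs(abs(lam) - 1) < 1e-12                               # |λ| = 1
```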
In this section we shall first establish an important result that says that a Hermitian matrix can be diagonalized using a unitary transformation. We shall then see some applications of this to geometry. Closely related to the problem of diagonalization is the problem of triangularization. We shall use this concept as a stepping stone toward the solution of the diagonalization problem.

Definition 7.8 Two n × n matrices A and B are said to be congruent if there exists a unitary matrix C such that C*AC = B.

Definition 7.9 We say A is triangularizable if there exists an invertible matrix C such that C⁻¹AC is upper triangular.

Remark 7.7 Obviously all diagonalizable matrices are triangularizable. The following result says that triangularizability causes the least problem:

Proposition 7.9 Over the complex numbers, every square matrix is congruent to an upper triangular matrix.

Proof: Let A be an n × n matrix with complex entries. We have to find a unitary matrix C such that C*AC is upper triangular. We shall prove this by induction on n. For n = 1 there is nothing to prove. Assume the result for n − 1. Let μ be an eigenvalue of A and let v1 ∈ C^n be such that Av1 = μv1. (Here we need to work with complex numbers; real numbers won't do. Why?) We can choose v1 to be of norm 1. We can then complete it to an orthonormal basis {v1, . . . , vn}. (Here we use Gram-Schmidt.) We then take C1 to be the matrix whose ith column is vi. Then, as seen earlier, C1 is a unitary matrix. Put A1 = C1⁻¹AC1. Then

A1 e1 = C1⁻¹AC1 e1 = C1⁻¹Av1 = μC1⁻¹(v1) = μe1.

This shows that the first column of A1 has all entries zero except the first, which is equal to μ. Let B be the matrix obtained from A1 by cutting off the first row and the first column, so that A1 is a block matrix of the form

A1 = [ μ  * ; 0_{n−1}  B ]

where 0_{n−1} is the column of zeros of size n − 1. By induction there exists an (n − 1) × (n − 1) unitary matrix M such that M⁻¹BM is an upper triangular matrix. Put

M1 = [ 1  0ᵗ_{n−1} ; 0_{n−1}  M ].

Then M1 is unitary and hence C = C1M1 is also unitary. Clearly C⁻¹AC = M1⁻¹C1⁻¹AC1M1 = M1⁻¹A1M1, which is of the form

[ 1  0ᵗ_{n−1} ; 0_{n−1}  M⁻¹ ][ μ  * ; 0_{n−1}  B ][ 1  0ᵗ_{n−1} ; 0_{n−1}  M ]

and hence is upper triangular. ♠

Remark 7.8 Assume now that A is a real matrix with all its eigenvalues real. Then from Lemma 7.1 it follows that we can choose the eigenvector v1 to be a real vector, and then complete this into a basis for R^n. Thus the matrix C1 corresponding to this basis will have real entries. By induction, M will have real entries and hence the product C = C1M1 will also have real entries. Thus we have proved:

Proposition 7.10 For a real square matrix A with all its eigenvalues real, there exists an orthogonal matrix C such that CᵗAC is upper triangular.

Definition 7.10 A square matrix A is called normal if A*A = AA*.

Remark 7.9 (i) Normality is a congruence invariant. This means that if C is unitary and A is normal then C⁻¹AC is also normal. This is easy to verify.
(ii) Any diagonal matrix is normal. Therefore it follows that normality is necessary for diagonalization by a unitary matrix. Amazingly, it turns out to be sufficient. That is the reason to define this concept.
(iii) Observe that the product of two normal matrices may not be normal. For example, take A = [ 0 1 ; 1 0 ] and B = [ 1 0 ; 0 2 ].
(iv) Certainly Hermitian matrices are normal. Of course there are normal matrices which are not Hermitian. For example, take A = [ i 0 ; 0 −i ].

Lemma 7.3 For a normal matrix A we have ‖Ax‖² = ‖A*x‖² for all x ∈ C^n.

Proof: ‖Ax‖² = ⟨Ax, Ax⟩ = (Ax)*(Ax) = x*A*Ax = x*AA*x = (A*x)*(A*x) = ⟨A*x, A*x⟩ = ‖A*x‖². ♠

Lemma 7.4 If A is normal, then v is an eigenvector of A with eigenvalue μ iff v is an eigenvector of A* with eigenvalue μ̄.

Proof: Observe that if A is normal then A − μI is also normal. Now (A − μI)v = 0 iff ‖(A − μI)v‖ = 0 iff ‖(A − μI)*v‖ = 0 iff (A* − μ̄I)v = 0. ♠

Proposition 7.11 An upper triangular normal matrix is diagonal.
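The pair in Remark 7.9(iii) can be checked concretely: A and B are both normal (symmetric, respectively diagonal), but their product is not. A sketch in exact integer arithmetic:

```python
# Remark 7.9(iii): the product of two normal matrices need not be normal.
A = [[0, 1], [1, 0]]   # symmetric, hence normal
B = [[1, 0], [0, 2]]   # diagonal, hence normal

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]

P = matmul(A, B)  # P = [[0, 2], [1, 0]]; P is real, so P* = Pᵗ
assert matmul(P, transpose(P)) != matmul(transpose(P), P)  # P is not normal
```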
Proof: Let A be an upper triangular normal matrix. Inductively we shall show that aij = 0 for j > i. We have Ae1 = a11 e1, hence ‖Ae1‖² = |a11|². On the other hand, by Lemma 7.3 this is equal to ‖A*e1‖² = |a11|² + |a12|² + · · · + |a1n|². Hence a12 = a13 = · · · = a1n = 0. Inductively, suppose we have shown aij = 0 for j > i for all 1 ≤ i ≤ k − 1. Then it follows that Aek = akk ek. Exactly as in the first case, this implies that ‖A*ek‖² = |ak,k|² + |ak,k+1|² + · · · + |ak,n|² = |ak,k|². Hence ak,k+1 = · · · = ak,n = 0. ♠

Why all this fuss? Well, for one thing, we now have a big theorem. All we have to do is to combine Propositions 7.9 and 7.11.

Theorem 7.2 (Spectral Theorem) Given any normal matrix A, there exists a unitary matrix C such that C*AC is a diagonal matrix.

Corollary 7.3 Every Hermitian matrix A is congruent to a diagonal matrix. A real symmetric matrix is real-congruent to a diagonal matrix.

Proof: For the first statement we simply observe that a Hermitian matrix is normal, and apply the above theorem. For the second statement, we first recall that for a real symmetric matrix all eigenvalues are real. Hence Proposition 7.10 is applicable, and so we can choose C to be an orthogonal matrix. Along with Proposition 7.11, this gives the result. ♠

Quadratic forms and their diagonalization

Definition 7.11 Let A = (aij) be an n × n real matrix. The function Q : R^n → R defined by

Q(X) = Σ_{i=1}^n Σ_{j=1}^n aij xi xj,  X = (x1, x2, . . . , xn)ᵗ ∈ R^n,

is called the quadratic form associated with A. If A = diag(λ1, λ2, . . . , λn), then Q(X) = λ1 x1² + λ2 x2² + · · · + λn xn² is called a diagonal form.

Proposition 7.12 Q(X) = XᵗAX, where X = (x1, x2, . . . , xn)ᵗ.

Proof:

[x1, x2, . . . , xn] A (x1, . . . , xn)ᵗ = [x1, . . . , xn] (Σj a1j xj, . . . , Σj anj xj)ᵗ
                                      = (Σj a1j xj) x1 + · · · + (Σj anj xj) xn
                                      = Σi Σj aij xi xj = Q(X). ♠

Example 7.4 (1) A = [ 1 1 ; 3 5 ], X = (x, y)ᵗ. Then

XᵗAX = [x, y][ 1 1 ; 3 5 ](x, y)ᵗ = [x, y](x + y, 3x + 5y)ᵗ = x² + 4xy + 5y².

(2) Let B = [ 1 2 ; 2 5 ], X = (x, y)ᵗ. Then

XᵗBX = [x, y][ 1 2 ; 2 5 ](x, y)ᵗ = [x, y](x + 2y, 2x + 5y)ᵗ = x² + 4xy + 5y².

Notice that A and B give rise to the same Q(X), and B = ½(A + Aᵗ) is a symmetric matrix.

Proposition 7.13 For any n × n matrix A and the column vector X = (x1, x2, . . . , xn)ᵗ,

XᵗAX = XᵗBX,  where B = ½(A + Aᵗ).

Hence every quadratic form is associated with a symmetric matrix.

Proof: XᵗAX is a 1 × 1 matrix. Hence XᵗAX = (XᵗAX)ᵗ = XᵗAᵗX. Hence

XᵗAX = ½ XᵗAX + ½ XᵗAᵗX = Xᵗ (½(A + Aᵗ)) X = XᵗBX. ♠

We now show how the spectral theorem helps us in converting a quadratic form into a diagonal form.

Theorem 7.3 Let XᵗAX be a quadratic form associated with a real symmetric matrix A. Let U be an orthogonal matrix such that UᵗAU = diag(λ1, λ2, . . . , λn). Then

XᵗAX = λ1 y1² + λ2 y2² + · · · + λn yn²,  where X = UY, Y = (y1, y2, . . . , yn)ᵗ.

Proof: Since X = UY,

XᵗAX = (UY)ᵗA(UY) = Yᵗ(UᵗAU)Y = [y1, . . . , yn] diag(λ1, . . . , λn) (y1, . . . , yn)ᵗ = λ1 y1² + λ2 y2² + · · · + λn yn². ♠

Example 7.5 Let us determine the orthogonal matrix U which reduces the quadratic form Q(X) = 2x² + 4xy + 5y² to a diagonal form. We write

Q(X) = [x, y][ 2 2 ; 2 5 ](x, y)ᵗ = XᵗAX.

The symmetric matrix A can be diagonalized. The eigenvalues of A are λ1 = 1 and λ2 = 6. An orthonormal set of eigenvectors for λ1 and λ2 is

v1 = (1/√5)(2, −1)ᵗ  and  v2 = (1/√5)(1, 2)ᵗ.

Hence U = (1/√5)[ 2 1 ; −1 2 ]. The change of variables equations are (x, y)ᵗ = U(u, v)ᵗ. The diagonal form is

[u, v][ 1 0 ; 0 6 ](u, v)ᵗ = u² + 6v².

Check that UᵗAU = diag(1, 6).

7.6 Conic Sections and Quadric Surfaces

A conic section is the locus in the Cartesian plane R² of an equation of the form

ax² + bxy + cy² + dx + ey + f = 0.
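Example 7.4 and Proposition 7.13 can be checked together: the non-symmetric A and its symmetrization B produce the same values of the quadratic form. A small sketch (exact integer arithmetic):

```python
# Example 7.4: A = [[1, 1], [3, 5]] and B = (A + Aᵗ)/2 = [[1, 2], [2, 5]]
# define the same quadratic form x² + 4xy + 5y² (Proposition 7.13).
def quad(M, x, y):
    # XᵗMX for X = (x, y)ᵗ
    return x * (M[0][0] * x + M[0][1] * y) + y * (M[1][0] * x + M[1][1] * y)

A = [[1, 1], [3, 5]]
B = [[1, 2], [2, 5]]

for x, y in [(1, 0), (0, 1), (2, -3), (7, 4)]:
    assert quad(A, x, y) == quad(B, x, y) == x * x + 4 * x * y + 5 * y * y
```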
(53)

It can be proved that this equation represents one of the following: (i) the empty set, (ii) a single point, (iii) one or two straight lines, (iv) an ellipse, (v) a hyperbola, or (vi) a parabola. The second degree part of (53),

Q(x, y) = ax² + bxy + cy²,

is a quadratic form. This determines the type of the conic. We can write equation (53) in matrix form:

[x, y][ a  b/2 ; b/2  c ](x, y)ᵗ + [d, e](x, y)ᵗ + f = 0.        (54)

Write A = [ a  b/2 ; b/2  c ]. Let U = [v1, v2] be an orthogonal matrix whose column vectors v1 and v2 are eigenvectors of A with eigenvalues λ1 and λ2. Apply the change of variables

X = (x, y)ᵗ = U(u, v)ᵗ

to diagonalize the quadratic form Q(x, y) to the diagonal form λ1 u² + λ2 v². The orthonormal basis {v1, v2} determines a new set of coordinate axes, with respect to which the locus of the equation [x, y]A[x, y]ᵗ + B[x, y]ᵗ + f = 0, with B = [d, e], is the same as the locus of the equation

0 = [u, v] diag(λ1, λ2)[u, v]ᵗ + (BU)[u, v]ᵗ + f = λ1 u² + λ2 v² + [d, e][v1, v2][u, v]ᵗ + f.        (55)

If the conic determined by (55) is not degenerate, i.e., not an empty set, a point, nor line(s), then the signs of λ1 and λ2 determine whether it is a parabola, a hyperbola or an ellipse. Equation (53) will represent
(1) an ellipse if λ1 λ2 > 0,
(2) a hyperbola if λ1 λ2 < 0,
(3) a parabola if λ1 λ2 = 0.

Example 7.6 (1) 2x² + 4xy + 5y² + 4x + 13y − 1/4 = 0.

We have earlier diagonalized the quadratic form 2x² + 4xy + 5y². The associated symmetric matrix, the eigenvectors and the eigenvalues are displayed in the diagonalization

UᵗAU = (1/√5)[ 2 −1 ; 1 2 ][ 2 2 ; 2 5 ](1/√5)[ 2 1 ; −1 2 ] = [ 1 0 ; 0 6 ].

Set t = 1/√5 for convenience. Then the change of coordinates equations are

(x, y)ᵗ = [ 2t  t ; −t  2t ](u, v)ᵗ,  i.e.,  x = t(2u + v) and y = t(−u + 2v).

Substitute these into the original equation to get

u² + 6v² − √5 u + 6√5 v − 1/4 = 0.

Complete the square to write this as

(u − ½√5)² + 6(v + ½√5)² = 9.

This is an equation of an ellipse with centre (½√5, −½√5) in the uv-plane. The u-axis and v-axis are determined by the eigenvectors v1 and v2, as indicated in the accompanying figure (omitted here).

(2) 2x² − 4xy − y² − 4x + 10y − 13 = 0.

Here the matrix A = [ 2 −2 ; −2 −1 ] gives the quadratic part of the equation. We write the equation in matrix form as

[x, y][ 2 −2 ; −2 −1 ](x, y)ᵗ + [−4, 10](x, y)ᵗ − 13 = 0.

Let t = 1/√5. The eigenvalues of A are λ1 = 3, λ2 = −2. An orthonormal set of eigenvectors is v1 = t(2, −1)ᵗ and v2 = t(1, 2)ᵗ. Put

U = t[ 2 1 ; −1 2 ]  and  (x, y)ᵗ = U(u, v)ᵗ.

The transformed equation becomes

3u² − 2v² − 4t(2u + v) + 10t(−u + 2v) − 13 = 0,  or  3u² − 2v² − 18tu + 16tv − 13 = 0.

Complete the square in u and v to get

3(u − 3t)² − 2(v − 4t)² = 12,  or  (u − 3t)²/4 − (v − 4t)²/6 = 1.

This represents a hyperbola with centre (3t, 4t) in the uv-plane. The eigenvectors v1 and v2 determine the directions of the positive u and v axes (figure omitted).

(3) 9x² + 24xy + 16y² − 20x + 15y = 0.

The symmetric matrix for the quadratic part is A = [ 9 12 ; 12 16 ]. The eigenvalues are λ1 = 25, λ2 = 0. An orthonormal set of eigenvectors is v1 = a(3, 4)ᵗ, v2 = a(−4, 3)ᵗ, where a = 1/5. An orthogonal diagonalizing matrix is

U = a[ 3 −4 ; 4 3 ].

The equations of the change of coordinates are (x, y)ᵗ = U(u, v)ᵗ, i.e., x = a(3u − 4v), y = a(4u + 3v). The equation in the uv-plane is u² + v = 0. This is an equation of a parabola with its vertex at the origin.

Quadric Surfaces

Let A be a 3 × 3 real symmetric matrix. The locus of the equation

[x, y, z] A (x, y, z)ᵗ + [b1, b2, b3](x, y, z)ᵗ + c = 0        (56)

in three variables is called a quadric surface. We can carry out an analysis similar to the one for quadratic forms in two variables to bring (56) into standard form. Degenerate cases may arise.
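The diagonalization used in Examples 7.5 and 7.6(1) can be verified by multiplying the matrices out; a sketch (floating-point arithmetic, hence the tolerance):

```python
import math

# U = (1/√5)[[2, 1], [-1, 2]] diagonalizes A = [[2, 2], [2, 5]] to diag(1, 6).
t = 1 / math.sqrt(5)
A = [[2, 2], [2, 5]]
U = [[2 * t, t], [-t, 2 * t]]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

Ut = [[U[j][i] for j in range(2)] for i in range(2)]
D = matmul(Ut, matmul(A, U))

expected = [[1, 0], [0, 6]]
assert all(abs(D[i][j] - expected[i][j]) < 1e-12
           for i in range(2) for j in range(2))
```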
But the primary cases are:

Equation                         Surface                  Signs of eigenvalues of A
x²/a² + y²/b² + z²/c² = 1        ellipsoid                all three positive
x²/a² + y²/b² − z/c = 0          elliptic paraboloid      two positive, one zero
x²/a² + y²/b² − z²/c² = 0        elliptic cone            two positive, one negative
x²/a² + y²/b² − z²/c² = 1        1-sheeted hyperboloid    two positive, one negative
x²/a² − y²/b² − z²/c² = 1        2-sheeted hyperboloid    one positive, two negative
x²/a² − y²/b² − z/c = 0          hyperbolic paraboloid    one positive, one negative, one zero

Example 7.7 (1) 7x² + 7y² − 2z² + 20yz − 20zx − 2xy = 36.

The matrix form is

[x, y, z][ 7 −1 −10 ; −1 7 10 ; −10 10 −2 ](x, y, z)ᵗ = 36.

Let A be the 3 × 3 matrix appearing in this equation. The eigenvalues of A are λ1 = 6, λ2 = −12 and λ3 = 18. An orthonormal set of eigenvectors is given by the column vectors of the orthogonal matrix

U = [ 1/√2  1/√6  1/√3 ; 1/√2  −1/√6  −1/√3 ; 0  2/√6  −1/√3 ]

and UᵗAU = diag(6, −12, 18). Consider the change of coordinates given by (x, y, z)ᵗ = U(u, v, w)ᵗ. This change of coordinates transforms the given equation into the form

6u² − 12v² + 18w² = 36,  or  u²/6 − v²/3 + w²/2 = 1.

This is a hyperboloid of one sheet.

(2) Consider the quadric x² + y² + z² + 4yz − 4zx + 4xy = 27. The symmetric matrix for this is

A = [ 1 2 −2 ; 2 1 2 ; −2 2 1 ].

The eigenvalues of A are 3, 3, −3. Since A is diagonalizable, the eigenspace E(3) is 2-dimensional. We find a basis of E(3) first. The eigenvectors for λ = 3 are the solutions of

[ −2 2 −2 ; 2 −2 2 ; −2 2 −2 ](x, y, z)ᵗ = (0, 0, 0)ᵗ.

Hence we obtain the equation x − y + z = 0. Hence E(3) = {(y − z, y, z)ᵗ | y, z ∈ R} = L({u1 = (0, 1, 1)ᵗ, u2 = (−1, 1, 2)ᵗ}). Now we apply the Gram-Schmidt process to get an orthonormal basis of E(3):

v1 = (0, 1/√2, 1/√2)ᵗ  and  v2 = (−√(2/3), −1/√6, 1/√6)ᵗ.

A unit eigenvector for λ = −3 is v3 = (1/√3, −1/√3, 1/√3)ᵗ. We know that ⟨v3, v1⟩ = ⟨v3, v2⟩ = 0, since A is a symmetric matrix. The orthogonal matrix for diagonalization is U = [v1, v2, v3], written column-wise. Under the change of coordinates (x, y, z)ᵗ = U(u, v, w)ᵗ, the quadric reduces to

3u² + 3v² − 3w² = 27,  or  u²/9 + v²/9 − w²/9 = 1.

This is a hyperboloid of one sheet (figure omitted).
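The eigenvalues claimed in Example 7.7(1) can be checked by evaluating the characteristic polynomial, since each eigenvalue must be a root of det(A − λI) = 0. A sketch in exact integer arithmetic:

```python
# Example 7.7(1): λ = 6, -12, 18 are roots of det(A - λI) = 0 for the
# symmetric matrix of 7x² + 7y² - 2z² + 20yz - 20zx - 2xy.
A = [[7, -1, -10], [-1, 7, 10], [-10, 10, -2]]

def char_poly(M, lam):
    # det(M - λI) by cofactor expansion along the first row.
    m = [[M[i][j] - (lam if i == j else 0) for j in range(3)] for i in range(3)]
    a, b, c = m[0]
    return (a * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - b * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + c * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

assert char_poly(A, 6) == 0
assert char_poly(A, -12) == 0
assert char_poly(A, 18) == 0
```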