Chapter 3
Diagonalisation

Eigenvalues and eigenvectors, diagonalisation of a matrix, orthogonal diagonalisation of symmetric matrices

Reading

As in the previous chapter, there is no specific essential reading for this chapter. It is essential that you do some reading, but the topics discussed in this chapter are adequately covered in many texts on linear algebra. The list below gives examples of relevant reading. (For full publication details, see Chapter 1.)

Ostaszewski, A. Mathematics in Economics. Chapter 7, sections 7.2, 7.4, 7.6.
Ostaszewski, A. Advanced Mathematical Methods. Chapter 5, sections 5.1 and 5.2.
Leon, S.J. Linear Algebra with Applications. Chapter 6, sections 6.1 and 6.3.
Simon, C.P. and Blume, L. Mathematics for Economists. Chapter 23, sections 23.1, 23.7.

Introduction

One of the most useful techniques in applications of matrices and linear algebra is diagonalisation. Before discussing this, we have to look at the topic of eigenvalues and eigenvectors. We shall explore a number of applications of diagonalisation in the next chapter of the guide.

Eigenvalues and eigenvectors

Definitions

Suppose that $A$ is a square matrix. The number $\lambda$ is said to be an eigenvalue of $A$ if for some non-zero vector $x$, $Ax = \lambda x$. Any non-zero vector $x$ for which this equation holds is called an eigenvector for eigenvalue $\lambda$ or an eigenvector of $A$ corresponding to eigenvalue $\lambda$.

Finding eigenvalues and eigenvectors

To determine whether $\lambda$ is an eigenvalue of $A$, we need to determine whether there are any non-zero solutions to the matrix equation $Ax = \lambda x$. Note that the matrix equation $Ax = \lambda x$ is not of the standard form, since the right-hand side is not a fixed vector $b$, but depends explicitly on $x$. However, we can rewrite it in standard form. Note that $\lambda x = \lambda I x$, where $I$ is, as usual, the identity matrix. So, the equation is equivalent to $Ax = \lambda I x$, or $Ax - \lambda I x = 0$, which is equivalent to $(A - \lambda I)x = 0$.
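The definition can be checked mechanically. The following is a minimal sketch in plain Python, assuming matrices are stored as lists of rows; the helper names `mat_vec` and `is_eigenpair` and the small diagonal matrix are illustrative choices, not part of the guide.

```python
def mat_vec(A, x):
    """Return the matrix-vector product Ax for a matrix given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def is_eigenpair(A, lam, x):
    """True if x is non-zero and Ax = lam * x, that is, (A - lam I)x = 0."""
    if all(xi == 0 for xi in x):
        return False  # the zero vector is never an eigenvector, by definition
    return mat_vec(A, x) == [lam * xi for xi in x]


# Illustrative matrix (an assumption for this sketch, not taken from the text).
A = [[2, 0],
     [0, 3]]

print(is_eigenpair(A, 2, [1, 0]))  # True
print(is_eigenpair(A, 3, [0, 1]))  # True
print(is_eigenpair(A, 2, [0, 0]))  # False: zero vector excluded
print(is_eigenpair(A, 2, [1, 1]))  # False: A(1,1) = (2,3), not 2*(1,1)
```

The exact comparison with `==` is only appropriate for integer or rational entries; with floating-point entries a tolerance would be needed.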
Now, a square linear system $Bx = 0$ has solutions other than $x = 0$ precisely when $|B| = 0$. Therefore, taking $B = A - \lambda I$, $\lambda$ is an eigenvalue if and only if the determinant of the matrix $A - \lambda I$ is zero. This determinant, $p(\lambda) = |A - \lambda I|$, is known as the characteristic polynomial of $A$, since it is a polynomial in the variable $\lambda$. To find the eigenvalues, we solve the equation $|A - \lambda I| = 0$. Let us illustrate with a very simple $2 \times 2$ example.

Example: Let
\[ A = \begin{pmatrix} 1 & 1 \\ 2 & 2 \end{pmatrix}. \]
Then
\[ A - \lambda I = \begin{pmatrix} 1 & 1 \\ 2 & 2 \end{pmatrix} - \lambda \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = \begin{pmatrix} 1-\lambda & 1 \\ 2 & 2-\lambda \end{pmatrix} \]
and the characteristic polynomial is
\[ |A - \lambda I| = \begin{vmatrix} 1-\lambda & 1 \\ 2 & 2-\lambda \end{vmatrix} = (1-\lambda)(2-\lambda) - 2 = \lambda^2 - 3\lambda + 2 - 2 = \lambda^2 - 3\lambda. \]
So the eigenvalues are the solutions of $\lambda^2 - 3\lambda = 0$. To solve this, one could use either the formula for the solutions to a quadratic, or simply observe that the equation is $\lambda(\lambda - 3) = 0$ with solutions $\lambda = 0$ and $\lambda = 3$. Hence the eigenvalues of $A$ are 0 and 3.

To find an eigenvector for eigenvalue $\lambda$, we have to find a solution to $(A - \lambda I)x = 0$, other than the zero vector. (I stress the fact that eigenvectors cannot be the zero vector because this is a mistake many students make.) This is easy, since for a particular value of $\lambda$, all we need to do is solve a simple linear system. We illustrate by finding the eigenvectors for the matrix of the example just given.

Example: We find eigenvectors of
\[ A = \begin{pmatrix} 1 & 1 \\ 2 & 2 \end{pmatrix}. \]
We have seen that the eigenvalues are 0 and 3. To find an eigenvector for eigenvalue 0 we solve the system $(A - 0I)x = 0$: that is, $Ax = 0$, or
\[ \begin{pmatrix} 1 & 1 \\ 2 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \]
This could be solved using row operations. (Note that it cannot be solved by using inverse matrices since $A$ is not invertible. In fact, inverse matrix techniques or Cramer's rule will never be of use here since $\lambda$ being an eigenvalue means that $A - \lambda I$ is not invertible.) However, we can solve this fairly directly just by looking at the equations. We have to solve
\[ x_1 + x_2 = 0, \qquad 2x_1 + 2x_2 = 0. \]
Clearly both equations are equivalent. From either one, we obtain $x_1 = -x_2$.
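For a $2 \times 2$ matrix the characteristic polynomial can be written as $\lambda^2 - \operatorname{tr}(A)\lambda + \det(A)$, so the eigenvalue calculation above can be reproduced with the quadratic formula. The sketch below is only an illustration: the function name `eigenvalues_2x2` is hypothetical, and it assumes the roots are real.

```python
import math


def eigenvalues_2x2(A):
    """Real roots of |A - lam I| = lam^2 - tr(A) lam + det(A) for a 2x2 matrix."""
    (a, b), (c, d) = A
    trace = a + d
    det = a * d - b * c
    discriminant = trace * trace - 4 * det
    if discriminant < 0:
        # This sketch only handles the real case.
        raise ValueError("characteristic polynomial has no real roots")
    root = math.sqrt(discriminant)
    return (trace - root) / 2, (trace + root) / 2


A = [[1, 1],
     [2, 2]]
print(eigenvalues_2x2(A))  # (0.0, 3.0)
```

Here the trace is 3 and the determinant is 0, which gives back the polynomial $\lambda^2 - 3\lambda$ found by hand.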
We can choose $x_2$ to be any number we like. Let's take $x_2 = 1$; then we need $x_1 = -x_2 = -1$. It follows that an eigenvector for 0 is
\[ x = \begin{pmatrix} -1 \\ 1 \end{pmatrix}. \]
The choice $x_2 = 1$ was arbitrary; we could have chosen any non-zero number, so, for example, the following are eigenvectors for 0:
\[ \begin{pmatrix} 2 \\ -2 \end{pmatrix}, \qquad \begin{pmatrix} 5.2 \\ -5.2 \end{pmatrix}. \]
There are infinitely many eigenvectors for 0: for each $\alpha \neq 0$,
\[ \begin{pmatrix} \alpha \\ -\alpha \end{pmatrix} \]
is an eigenvector for 0. But be careful not to think that you can choose $\alpha = 0$; for then $x$ becomes the zero vector, and this is never an eigenvector, simply by definition.

To find an eigenvector for 3, we solve $(A - 3I)x = 0$, which is
\[ \begin{pmatrix} -2 & 1 \\ 2 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \]
This is equivalent to the equations
\[ -2x_1 + x_2 = 0, \qquad 2x_1 - x_2 = 0, \]
which are together equivalent to the single equation $x_2 = 2x_1$. If we choose $x_1 = 1$, we obtain the eigenvector
\[ x = \begin{pmatrix} 1 \\ 2 \end{pmatrix}. \]
(Again, any non-zero scalar multiple of this vector is also an eigenvector for eigenvalue 3.)

We illustrated with a $2 \times 2$ example just for simplicity, but you should be able to work with $3 \times 3$ matrices. We give three such examples.

Example: Suppose that
\[ A = \begin{pmatrix} 4 & 0 & 4 \\ 0 & 4 & 4 \\ 4 & 4 & 8 \end{pmatrix}. \]
Find the eigenvalues of $A$ and obtain one eigenvector for each eigenvalue.

To find the eigenvalues we solve $|A - \lambda I| = 0$. Now, expanding along the first row,
\[
\begin{aligned}
|A - \lambda I| &= \begin{vmatrix} 4-\lambda & 0 & 4 \\ 0 & 4-\lambda & 4 \\ 4 & 4 & 8-\lambda \end{vmatrix} \\
&= (4-\lambda) \begin{vmatrix} 4-\lambda & 4 \\ 4 & 8-\lambda \end{vmatrix} + 4 \begin{vmatrix} 0 & 4-\lambda \\ 4 & 4 \end{vmatrix} \\
&= (4-\lambda)\left((4-\lambda)(8-\lambda) - 16\right) + 4\left(-4(4-\lambda)\right) \\
&= (4-\lambda)\left((4-\lambda)(8-\lambda) - 16\right) - 16(4-\lambda).
\end{aligned}
\]
Now, we notice that each of the two terms in this expression has $4 - \lambda$ as a factor, so instead of expanding everything, we take $4 - \lambda$ out as a common factor, obtaining
\[
\begin{aligned}
|A - \lambda I| &= (4-\lambda)\left((4-\lambda)(8-\lambda) - 16 - 16\right) \\
&= (4-\lambda)(32 - 12\lambda + \lambda^2 - 32) \\
&= (4-\lambda)(\lambda^2 - 12\lambda) \\
&= (4-\lambda)\lambda(\lambda - 12).
\end{aligned}
\]
It follows that the eigenvalues are 4, 0, 12. (The characteristic polynomial will not always factorise so easily. Here it was simple because of the common factor $(4 - \lambda)$. The next example is more difficult.)

To find an eigenvector for 4, we have to solve the equation $(A - 4I)x = 0$, that is,
\[ \begin{pmatrix} 0 & 0 & 4 \\ 0 & 0 & 4 \\ 4 & 4 & 4 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}. \]
Of course, we could use row operations, but the system is simple enough to solve straight away. The equations are
\[
\begin{aligned}
4x_3 &= 0 \\
4x_3 &= 0 \\
4x_1 + 4x_2 + 4x_3 &= 0,
\end{aligned}
\]
so $x_3 = 0$ and $x_2 = -x_1$. Choosing $x_1 = 1$, we get the eigenvector
\[ \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}. \]
(Again, we can choose $x_1$ to be any non-zero number. So the eigenvectors for eigenvalue 4 are all non-zero multiples of this vector.)

Activity 3.1 Determine eigenvectors for 0 and 12. You should find that for $\lambda = 0$, your eigenvector is a non-zero multiple of
\[ \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix} \]
and that for $\lambda = 12$ your eigenvector is a non-zero multiple of
\[ \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}. \]

Example: Let
\[ A = \begin{pmatrix} -3 & -1 & -2 \\ 1 & -1 & 1 \\ 1 & 1 & 0 \end{pmatrix}. \]
Given that $-1$ is an eigenvalue of $A$, find all the eigenvalues of $A$.

We calculate the characteristic polynomial of $A$:
\[
\begin{aligned}
|A - \lambda I| &= \begin{vmatrix} -3-\lambda & -1 & -2 \\ 1 & -1-\lambda & 1 \\ 1 & 1 & -\lambda \end{vmatrix} \\
&= (-3-\lambda) \begin{vmatrix} -1-\lambda & 1 \\ 1 & -\lambda \end{vmatrix} - (-1) \begin{vmatrix} 1 & 1 \\ 1 & -\lambda \end{vmatrix} - 2 \begin{vmatrix} 1 & -1-\lambda \\ 1 & 1 \end{vmatrix} \\
&= (-3-\lambda)(\lambda^2 + \lambda - 1) + (-\lambda - 1) - 2(2 + \lambda) \\
&= -\lambda^3 - 4\lambda^2 - 5\lambda - 2.
\end{aligned}
\]
Now, the fact that $-1$ is an eigenvalue means that $-1$ is a solution of the equation $|A - \lambda I| = 0$, which means that $(\lambda - (-1))$, that is, $(\lambda + 1)$, is a factor of the characteristic polynomial $|A - \lambda I|$. So this characteristic polynomial can be written in the form $(\lambda + 1)(a\lambda^2 + b\lambda + c)$. Clearly we must have $a = -1$ and $c = -2$ to obtain the correct $\lambda^3$ term and the correct constant. Given this, comparing the coefficients of $\lambda$ gives $b + c = -5$, so $b = -3$. In other words, the characteristic polynomial is
\[ (\lambda + 1)(-\lambda^2 - 3\lambda - 2) = -(\lambda + 1)(\lambda^2 + 3\lambda + 2) = -(\lambda + 1)(\lambda + 2)(\lambda + 1). \]
That is, $|A - \lambda I| = -(\lambda + 1)^2(\lambda + 2)$. The eigenvalues are the solutions to $|A - \lambda I| = 0$, so they are $\lambda = -1$ and $\lambda = -2$. Note that in this case, there are only two eigenvalues (or, the eigenvalue $-1$ is repeated, or has multiplicity 2, as it is sometimes said).

Example: Let
\[ A = \begin{pmatrix} 3 & -1 & 1 \\ 0 & 2 & 0 \\ 1 & -1 & 3 \end{pmatrix}. \]
Then (check this!), the characteristic polynomial is $-\lambda^3 + 8\lambda^2 - 20\lambda + 16$. This factorises (check!) as $-(\lambda - 2)(\lambda - 2)(\lambda - 4)$, so the eigenvalues are 2 and 4. There are only two eigenvalues in this case.
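The eigenvalues quoted in these three examples can be confirmed by evaluating $|A - \lambda I|$ directly at each claimed value. This is a minimal sketch using exact integer arithmetic; the helper names `det3` and `char_poly_at` are hypothetical.

```python
def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def char_poly_at(A, lam):
    """Evaluate |A - lam I| for a 3x3 matrix A at the number lam."""
    shifted = [[A[r][c] - (lam if r == c else 0) for c in range(3)]
               for r in range(3)]
    return det3(shifted)


A1 = [[4, 0, 4], [0, 4, 4], [4, 4, 8]]
A2 = [[-3, -1, -2], [1, -1, 1], [1, 1, 0]]
A3 = [[3, -1, 1], [0, 2, 0], [1, -1, 3]]

print([char_poly_at(A1, lam) for lam in (4, 0, 12)])  # [0, 0, 0]
print([char_poly_at(A2, lam) for lam in (-1, -2)])    # [0, 0]
print([char_poly_at(A3, lam) for lam in (2, 4)])      # [0, 0]
print(char_poly_at(A1, 1))                            # -33, so 1 is not an eigenvalue
```

A check of this kind confirms that a given number is a root, but it does not reveal multiplicities; for those the factorisation is still needed.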
(We sometimes say that the eigenvalue 2 is repeated or has multiplicity 2, because $(\lambda - 2)^2$ is a factor of the characteristic polynomial.) To find an eigenvector for $\lambda = 4$, we have to solve the equation $(A - 4I)x = 0$, that is,
\[ \begin{pmatrix} -1 & -1 & 1 \\ 0 & -2 & 0 \\ 1 & -1 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}. \]
The equations are
\[
\begin{aligned}
-x_1 - x_2 + x_3 &= 0 \\
-2x_2 &= 0 \\
x_1 - x_2 - x_3 &= 0,
\end{aligned}
\]
so $x_2 = 0$ and $x_1 = x_3$. Choosing $x_3 = 1$, we get the eigenvector
\[ \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}. \]
For $\lambda = 2$, we have to solve the equation $(A - 2I)x = 0$, that is,
\[ \begin{pmatrix} 1 & -1 & 1 \\ 0 & 0 & 0 \\ 1 & -1 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}. \]
This system is equivalent to the single equation $x_1 - x_2 + x_3 = 0$. (Convince yourself!) Choosing $x_3 = 1$ and $x_2 = 0$ we have $x_1 = -1$, so we obtain the eigenvector
\[ \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}. \]

Complex eigenvalues

Although we shall only deal in this subject with real matrices (that is, matrices whose entries are real numbers), it is possible for such real matrices to have complex eigenvalues. This is not something you have to spend much time on, but you have to be aware of it. We briefly describe complex numbers. [1]

The complex numbers are based on the complex number $i$, which is defined to be the square root of $-1$. (Of course, no such real number exists.) Any complex number $z$ can be written in the form $z = a + bi$ where $a, b$ are real numbers. We call $a$ the real part and $b$ the imaginary part of $z$. Of course, any real number is a complex number, since $a = a + 0i$.

The following example shows a $2 \times 2$ matrix with complex eigenvalues, and it also demonstrates how to deal with complex numbers.

Example: Consider the matrix
\[ A = \begin{pmatrix} 1 & 1 \\ -9 & 1 \end{pmatrix}. \]
We shall see that it has complex eigenvalues. First, the characteristic polynomial $|A - \lambda I|$ is $(1 - \lambda)^2 + 9 = \lambda^2 - 2\lambda + 10$. (Check this!) Using the formula for the roots of a quadratic equation, the eigenvalues are
\[ \frac{2 \pm \sqrt{-36}}{2}. \]
Now
\[ \sqrt{-36} = \sqrt{(36)(-1)} = \sqrt{36}\sqrt{-1} = 6\sqrt{-1} = 6i. \]
So the eigenvalues are the complex numbers $1 + 3i$ and $1 - 3i$. Let's proceed with finding eigenvectors. To find an eigenvector for $1 + 3i$, we solve $(A - (1 + 3i)I)x = 0$, which is
\[ \begin{pmatrix} -3i & 1 \\ -9 & -3i \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \]
This is equivalent to the equations
\[ -3i x_1 + x_2 = 0, \qquad -9x_1 - 3i x_2 = 0. \]
But the second equation is just $-3i$ times the first, so both are equivalent. Taking the second, we see that $x_1 = -(i/3)x_2$. So an eigenvector is (taking $x_2 = 3$) $(-i, 3)^T$. For $\lambda = 1 - 3i$, we end up solving the system
\[ 3i x_1 + x_2 = 0, \qquad -9x_1 + 3i x_2 = 0, \]
a solution of which is $(i, 3)^T$.

You should be aware, then, that even though we are not dealing with matrices that have complex numbers as their entries, the possibility still exists that eigenvalues (and eigenvectors) will involve complex numbers. However, if a matrix is symmetric (that is, it equals its transpose), then it certainly has real eigenvalues. This useful fact, which we shall prove later, is important when we consider quadratic forms in the next chapter.

[1] For more discussion of complex numbers, see Appendix A3 of Simon and Blume.

Diagonalisation of a square matrix

Square matrices $A$ and $B$ are similar if there is an invertible matrix $P$ such that $P^{-1}AP = B$. The matrix $A$ is diagonalisable if it is similar to a diagonal matrix; in other words, if there is a diagonal matrix $D$ and an invertible matrix $P$ such that $P^{-1}AP = D$.

Suppose that the $n \times n$ matrix $A$ is diagonalisable, and that $P^{-1}AP = D$, where $D$ is a diagonal matrix
\[ D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n) = \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix}. \]
(Note the useful notation for describing the diagonal matrix $D$.) Multiplying both sides of $P^{-1}AP = D$ on the left by $P$, we have $AP = PD$. If the columns of $P$ are the vectors $v_1, v_2, \ldots, v_n$, then
\[ AP = A(v_1 \ \ldots \ v_n) = (Av_1 \ \ldots \ Av_n), \]
and
\[ PD = (v_1 \ \ldots \ v_n) \begin{pmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_n \end{pmatrix} = (\lambda_1 v_1 \ \ldots \ \lambda_n v_n). \]
So this means that
\[ Av_1 = \lambda_1 v_1, \quad Av_2 = \lambda_2 v_2, \quad \ldots, \quad Av_n = \lambda_n v_n. \]
The fact that $P^{-1}$ exists means that none of the vectors $v_i$ is the zero vector. So this means that (for $i = 1, 2, \ldots, n$) $\lambda_i$ is an eigenvalue of $A$ and $v_i$ is a corresponding eigenvector. Since $P$ has an inverse, these eigenvectors are linearly independent.
Therefore, $A$ has $n$ linearly independent eigenvectors. Conversely, if $A$ has $n$ linearly independent eigenvectors, then the matrix $P$ whose columns are these eigenvectors will be invertible, and we will have $P^{-1}AP = D$ where $D$ is a diagonal matrix with entries equal to the eigenvalues of $A$. We have therefore established the following result.

Theorem 3.1 An $n \times n$ matrix $A$ is diagonalisable if and only if it has $n$ linearly independent eigenvectors. Suppose that this is the case, and let $v_1, \ldots, v_n$ be $n$ linearly independent eigenvectors, where $v_i$ is an eigenvector for eigenvalue $\lambda_i$. Then the matrix $P = (v_1 \ \ldots \ v_n)$ is such that $P^{-1}$ exists, and $P^{-1}AP = D$ where $D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$.

There is a more sophisticated way to think about this result, in terms of change of basis and matrix representations of linear transformations. Suppose that $T$ is the linear transformation corresponding to $A$, so that $T(x) = Ax$ for all $x$. Suppose that $A$ has a set of $n$ linearly independent eigenvectors $B = \{x_1, x_2, \ldots, x_n\}$, corresponding (respectively) to the eigenvalues $\lambda_1, \ldots, \lambda_n$. Since this is a linearly independent set of size $n$ in $\mathbb{R}^n$, $B$ is a basis for $\mathbb{R}^n$. By Theorem 2.18, the matrix of $T$ with respect to $B$ is
\[ A_T[B, B] = \left([T(x_1)]_B \ \ldots \ [T(x_n)]_B\right). \]
But $T(x_i) = Ax_i = \lambda_i x_i$, so the coordinate vector of $T(x_i)$ with respect to $B$ is
\[ [T(x_i)]_B = (0, 0, \ldots, 0, \lambda_i, 0, \ldots, 0), \]
which has $\lambda_i$ in entry $i$ and all other entries zero. Therefore $A_T[B, B] = \operatorname{diag}(\lambda_1, \ldots, \lambda_n) = D$. But by Theorem 2.20,
\[ A_T[B, B] = P^{-1} A_T P, \]
where $P = (x_1 \ \ldots \ x_n)$ and $A_T$ is the matrix representing $T$, which in this case is simply $A$ itself. We therefore see that
\[ P^{-1}AP = A_T[B, B] = D, \]
and so the matrix $P$ diagonalises $A$.

Example: Consider again the matrix
\[ A = \begin{pmatrix} 4 & 0 & 4 \\ 0 & 4 & 4 \\ 4 & 4 & 8 \end{pmatrix}. \]
We have seen that it has three distinct eigenvalues 0, 4, 12, and that eigenvectors corresponding to eigenvalues 4, 0, 12 are (in that order)
\[ \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}, \quad \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}. \]
We now form the matrix $P$ whose columns are these eigenvectors:
\[ P = \begin{pmatrix} 1 & 1 & 1 \\ -1 & 1 & 1 \\ 0 & -1 & 2 \end{pmatrix}. \]
Then, according to the theory, $P$ should have an inverse, and we should have
\[ P^{-1}AP = D = \operatorname{diag}(4, 0, 12). \]
To check that this is true, we could calculate $P^{-1}$ and evaluate the product. The inverse may be calculated using either elementary row operations or determinants. (Matrix inversion is not part of this subject: however, it is part of the pre-requisite subject 'Mathematics for economists'. You should therefore know how to invert a matrix.)

Activity 3.2 Calculate $P^{-1}$ and verify that $P^{-1}AP = D$.

Not all $n \times n$ matrices have $n$ linearly independent eigenvectors, as the following example shows.

Example: The $2 \times 2$ matrix
\[ A = \begin{pmatrix} 4 & 1 \\ -1 & 2 \end{pmatrix} \]
has characteristic polynomial $\lambda^2 - 6\lambda + 9 = (\lambda - 3)^2$, so there is only one eigenvalue, $\lambda = 3$. The eigenvectors are the non-zero solutions to $(A - 3I)x = 0$: that is,
\[ \begin{pmatrix} 1 & 1 \\ -1 & -1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}. \]
This is equivalent to the single equation $x_1 + x_2 = 0$, with general solution $x_1 = -x_2$. Setting $x_2 = r$, we see that the eigenvectors are all vectors of the form
\[ \begin{pmatrix} -r \\ r \end{pmatrix} \]
as $r$ runs through all non-zero real numbers. So the eigenvectors are precisely the non-zero scalar multiples of the fixed vector
\[ \begin{pmatrix} -1 \\ 1 \end{pmatrix}. \]
Any two eigenvectors are therefore multiples of each other and hence form a linearly dependent set. In other words, there are not two linearly independent eigenvectors, and the matrix is not diagonalisable.

The following result is useful. It shows that if a matrix has $n$ different eigenvalues then it is diagonalisable. [2]

Theorem 3.2 Eigenvectors corresponding to different eigenvalues are linearly independent. So if an $n \times n$ matrix has $n$ different eigenvalues, then it has a set of $n$ linearly independent eigenvectors and is therefore diagonalisable.

It is not, however, necessary for the eigenvalues to be distinct.
What is needed for diagonalisation is a set of $n$ linearly independent eigenvectors, and this can happen even when there is a 'repeated' eigenvalue (that is, when there are fewer than $n$ different eigenvalues). The following example illustrates this.

[2] For a proof of Theorem 3.2, see Ostaszewski, 'Mathematics in Economics', Section 7.4.

Example: We considered the matrix
\[ A = \begin{pmatrix} 3 & -1 & 1 \\ 0 & 2 & 0 \\ 1 & -1 & 3 \end{pmatrix} \]
above, and we saw that it has only two eigenvalues, 4 and 2. If we want to diagonalise it, we need to find three linearly independent eigenvectors. We found that an eigenvector corresponding to $\lambda = 4$ is
\[ \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix}, \]
and that, for $\lambda = 2$, the eigenvectors are given by the non-zero-vector solutions to the system consisting of just the single equation $x_1 - x_2 + x_3 = 0$. Above, we simply wanted to find an eigenvector, but now we want to find two which, together with the eigenvector for $\lambda = 4$, form a linearly independent set. Now, the system for the eigenvectors corresponding to $\lambda = 2$ has just one equation and is therefore of rank 1; it follows that the solution set is two-dimensional. Let's see exactly what the general solution looks like. We have $x_1 = x_2 - x_3$, and $x_2, x_3$ can be chosen independently of each other. Setting $x_3 = r$ and $x_2 = s$, we see that the general solution is
\[ \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} s - r \\ s \\ r \end{pmatrix} = s \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix} + r \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}, \quad (r, s \in \mathbb{R}). \]
This shows that the solution space (the eigenspace, as it is called in this instance) is spanned by the two linearly independent vectors
\[ \begin{pmatrix} 1 \\ 1 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix}. \]
Now, each of these is an eigenvector corresponding to eigenvalue 2 and, together with our eigenvector for $\lambda = 4$, the three form a linearly independent set. So there are three linearly independent eigenvectors, even though two of them correspond to the same eigenvalue. The matrix is therefore diagonalisable. We may take
\[ P = \begin{pmatrix} 1 & 1 & -1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}. \]
Then (Check!) $P^{-1}AP = D = \operatorname{diag}(4, 2, 2)$.
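One way to carry out the suggested check without computing $P^{-1}$ is to use the fact that $P^{-1}AP = D$ holds exactly when $P$ is invertible and $AP = PD$. The sketch below does this with integer arithmetic for the example just given; the helper names `mat_mul` and `det3` are hypothetical.

```python
def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


A = [[3, -1, 1], [0, 2, 0], [1, -1, 3]]
P = [[1, 1, -1], [0, 1, 0], [1, 0, 1]]   # columns are the three eigenvectors
D = [[4, 0, 0], [0, 2, 0], [0, 0, 2]]    # diag(4, 2, 2), in the same order

print(det3(P))                            # 2, non-zero, so P is invertible
print(mat_mul(A, P) == mat_mul(P, D))     # True, so P^{-1} A P = D
```

The order of the columns of `P` must match the order of the diagonal entries of `D`; permuting one without the other makes the second check fail.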
Orthogonal diagonalisation of symmetric matrices

The matrix we considered above,
\[ A = \begin{pmatrix} 4 & 0 & 4 \\ 0 & 4 & 4 \\ 4 & 4 & 8 \end{pmatrix}, \]
is symmetric: that is, its transpose $A^T$ is equal to itself. It turns out that such matrices are always diagonalisable. They are, furthermore, diagonalisable in a special way.

A matrix $P$ is orthogonal if $P^T P = P P^T = I$: that is, if $P$ has inverse $P^T$. A matrix $A$ is said to be orthogonally diagonalisable if there is an orthogonal matrix $P$ such that $P^T AP = D$ where $D$ is a diagonal matrix.

Note that $P^T = P^{-1}$, so $P^T AP = P^{-1}AP$. The argument given above shows that the columns of $P$ must be $n$ linearly independent eigenvectors of $A$. But the condition that $P^T P = I$ means something else, as we now discuss. Suppose that the columns of $P$ are $x_1, x_2, \ldots, x_n$, so that $P = (x_1 \ x_2 \ \ldots \ x_n)$. Then the rows of the transpose $P^T$ are $x_1^T, \ldots, x_n^T$, so
\[ P^T = \begin{pmatrix} x_1^T \\ x_2^T \\ \vdots \\ x_n^T \end{pmatrix}. \]
Calculating the matrix product $P^T P$, we find that the $(i, j)$-entry of $P^T P$ is $x_i^T x_j$. But, since $P^T P = I$, we must have
\[ x_i^T x_i = 1 \ (i = 1, 2, \ldots, n), \qquad x_i^T x_j = 0 \ (i \neq j). \]
We say that vectors $x, y$ are orthogonal if the matrix product $x^T y$ is 0. (Orthogonality will be discussed in more detail later.) So, any two of the eigenvectors $x_1, \ldots, x_n$ must be orthogonal. Furthermore, for $i = 1, 2, \ldots, n$, $x_i^T x_i = 1$. The length of a vector $x$ is
\[ \|x\| = \sqrt{\sum_{i=1}^{n} x_i^2} = \sqrt{x^T x}. \]
So, not only must any two of these eigenvectors be orthogonal, but each must have length 1.

We shall discuss orthogonality in more detail in the next chapter. For the moment, we have the following result.

Theorem 3.3 If the matrix $A$ is symmetric ($A^T = A$) then eigenvectors corresponding to different eigenvalues are orthogonal.

Proof Suppose that $\lambda$ and $\mu$ are any two different eigenvalues of $A$ and that $x, y$ are corresponding eigenvectors. Then $Ax = \lambda x$ and $Ay = \mu y$. The trick in this proof is to find two different expressions for the product $x^T Ay$ (which then must, of course, be equal to each other).
Note that the matrix product $x^T Ay$ is a $1 \times 1$ matrix or, equivalently, a number. First, since $Ay = \mu y$, we have
\[ x^T Ay = x^T(\mu y) = \mu x^T y. \]
But also, since $Ax = \lambda x$, we have $(Ax)^T = (\lambda x)^T = \lambda x^T$. Now, for any matrices $M, N$, $(MN)^T = N^T M^T$, so $(Ax)^T = x^T A^T$. But $A^T = A$ (because $A$ is symmetric), so $x^T A = \lambda x^T$ and hence
\[ x^T Ay = \lambda x^T y. \]
We therefore have two different expressions for $x^T Ay$: it equals $\mu x^T y$ and $\lambda x^T y$. Hence,
\[ \mu x^T y = \lambda x^T y, \]
or
\[ (\mu - \lambda) x^T y = 0. \]
But since $\lambda \neq \mu$ (they are different eigenvalues), we have $\mu - \lambda \neq 0$. We deduce, therefore, that $x^T y = 0$. But this says precisely that $x$ and $y$ are orthogonal, which is exactly what we wanted to prove.

This is quite a sneaky proof: the trick is to remember to consider $x^T Ay$.

The result just presented shows that if an $n \times n$ symmetric matrix has exactly $n$ different eigenvalues then any $n$ corresponding eigenvectors are orthogonal to one another. Since we may take the eigenvectors to have length 1, this shows that the matrix is orthogonally diagonalisable. The following result makes this precise.

Theorem 3.4 Suppose that $A$ is an $n \times n$ symmetric matrix with $n$ different eigenvalues. Take $n$ corresponding eigenvectors, each of length 1. (Recall that the length of a vector $x$ is just $\|x\| = \sqrt{\sum_{i=1}^{n} x_i^2}$.) Form the matrix $P$ which has these eigenvectors as its columns. Then $P^{-1} = P^T$ (that is, $P$ is an orthogonal matrix) and $P^T AP = D$, the diagonal matrix whose entries are the eigenvalues of $A$.

(Note that we have only shown here that symmetric matrices with $n$ different eigenvalues are orthogonally diagonalisable, but it turns out that all symmetric matrices are orthogonally diagonalisable.)

Example: We work with the same matrix we used earlier,
\[ A = \begin{pmatrix} 4 & 0 & 4 \\ 0 & 4 & 4 \\ 4 & 4 & 8 \end{pmatrix}. \]
As we have already observed, this is symmetric. We have seen that it has three distinct eigenvalues 0, 4, 12. Earlier, we found that eigenvectors for eigenvalues 4, 0, 12 are (in that order)
\[ \begin{pmatrix} 1 \\ -1 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix}, \quad \begin{pmatrix} 1 \\ 1 \\ 2 \end{pmatrix}. \]

Activity 3.3 Convince yourself that any two of these three eigenvectors are orthogonal.

Now, these eigenvectors are not of length 1. For example, the first one has length $\sqrt{1^2 + (-1)^2 + 0^2} = \sqrt{2}$. If we divide each entry of it by $\sqrt{2}$, we will indeed obtain an eigenvector of length 1:
\[ \begin{pmatrix} 1/\sqrt{2} \\ -1/\sqrt{2} \\ 0 \end{pmatrix}. \]
We can similarly normalise the other two vectors, obtaining
\[ \begin{pmatrix} 1/\sqrt{3} \\ 1/\sqrt{3} \\ -1/\sqrt{3} \end{pmatrix}, \quad \begin{pmatrix} 1/\sqrt{6} \\ 1/\sqrt{6} \\ 2/\sqrt{6} \end{pmatrix}. \]

Activity 3.4 Make sure you understand this normalisation.

We now form the matrix $P$ whose columns are these normalised eigenvectors:
\[ P = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{3} & 1/\sqrt{6} \\ -1/\sqrt{2} & 1/\sqrt{3} & 1/\sqrt{6} \\ 0 & -1/\sqrt{3} & 2/\sqrt{6} \end{pmatrix}. \]
Then $P$ is orthogonal and $P^T AP = D = \operatorname{diag}(4, 0, 12)$.

Activity 3.5 Check that $P$ is orthogonal by calculating $P^T P$.

Example: Let
\[ A = \begin{pmatrix} 7 & 0 & 9 \\ 0 & 2 & 0 \\ 9 & 0 & 7 \end{pmatrix}. \]
Note that $A$ is symmetric. We find an orthogonal matrix $P$ such that $P^T AP$ is a diagonal matrix. The characteristic polynomial of $A$ is
\[
\begin{aligned}
|A - \lambda I| &= \begin{vmatrix} 7-\lambda & 0 & 9 \\ 0 & 2-\lambda & 0 \\ 9 & 0 & 7-\lambda \end{vmatrix} \\
&= (2-\lambda)\left[(7-\lambda)(7-\lambda) - 81\right] \\
&= (2-\lambda)(\lambda^2 - 14\lambda - 32) \\
&= (2-\lambda)(\lambda - 16)(\lambda + 2),
\end{aligned}
\]
where we have expanded the determinant using the middle row. So the eigenvalues are 2, 16, $-2$. An eigenvector $(x, y, z)^T$ for $\lambda = 2$ is given by
\[ 5x + 9z = 0, \qquad 9x + 5z = 0. \]
This means $x = z = 0$. So we may take $(0, 1, 0)^T$. This already has length 1 so there is no need to normalise it. (Recall that we need three eigenvectors which are of length 1.) For $\lambda = -2$ we find that an eigenvector is $(-1, 0, 1)^T$ (or some multiple of this). To normalise (that is, to make of length 1), we divide by its length, which is $\sqrt{2}$, obtaining $(1/\sqrt{2})(-1, 0, 1)^T$. For $\lambda = 16$, we find a normalised eigenvector is $(1/\sqrt{2})(1, 0, 1)^T$. It follows that if we let
\[ P = \begin{pmatrix} 0 & -1/\sqrt{2} & 1/\sqrt{2} \\ 1 & 0 & 0 \\ 0 & 1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}, \]
then $P$ is orthogonal and $P^T AP = D = \operatorname{diag}(2, -2, 16)$. Check this!

Learning outcomes

This chapter has discussed eigenvalues and eigenvectors and the very important technique of diagonalisation. We shall see in the next chapter how useful a technique diagonalisation is.
At the end of this chapter and the relevant reading, you should be able to:

• explain what is meant by eigenvectors and eigenvalues, and by diagonalisation
• find eigenvalues and corresponding eigenvectors for a square matrix
• diagonalise a diagonalisable matrix
• recognise what diagonalisation says in terms of change of basis and matrix representation of linear transformations
• perform orthogonal diagonalisation on a symmetric matrix that has distinct eigenvalues.

Sample examination questions

The following are typical exam questions, or parts of questions.

Question 3.1 Find the eigenvalues of the matrix
\[ A = \begin{pmatrix} 0 & 2 & 1 \\ 16 & 4 & -6 \\ -16 & 4 & 10 \end{pmatrix} \]
and find an eigenvector for each eigenvalue. Hence find an invertible matrix $P$ and a diagonal matrix $D$ such that $P^{-1}AP = D$.

Question 3.2 Prove that the matrix
\[ \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \]
is not diagonalisable.

Question 3.3 Let $A$ be any (real) $n \times n$ matrix and suppose $\lambda$ is an eigenvalue of $A$. Show that $\{x : Ax = \lambda x\}$, the set of eigenvectors for eigenvalue $\lambda$ together with the zero-vector $0$, is a subspace of $\mathbb{R}^n$.

Question 3.4 Let
\[ A = \begin{pmatrix} -1 & 1 & 2 \\ -6 & 2 & 6 \\ 0 & 1 & 1 \end{pmatrix}. \]
Show that the vector
\[ x = \begin{pmatrix} 1 \\ 0 \\ 1 \end{pmatrix} \]
is an eigenvector of $A$. What is the corresponding eigenvalue? Find the other eigenvalues of $A$, and an eigenvector for each of them. Find an invertible matrix $P$ and a diagonal matrix $D$ such that $P^{-1}AP = D$.

Question 3.5 Let
\[ A = \begin{pmatrix} 0 & 0 & -2 \\ 1 & 2 & 1 \\ 1 & 0 & 3 \end{pmatrix}. \]
Find an invertible matrix $P$ and a diagonal matrix $D$ such that $P^{-1}AP = D$.

Question 3.6 Let
\[ A = \begin{pmatrix} 2 & 1 & 1 \\ 1 & 1 & 0 \\ 1 & 0 & 1 \end{pmatrix}. \]
Find the eigenvalues of $A$, and an eigenvector for each of them. Find an orthogonal matrix $P$ and a diagonal matrix $D$ such that $P^T AP = D$.

Question 3.7 Suppose that $A$ is a real diagonalisable matrix and that all the eigenvalues of $A$ are non-negative. Prove that there is a matrix $B$ such that $B^2 = A$.

Sketch answers or comments on selected questions

Question 3.1 The characteristic polynomial is $-\lambda^3 + 14\lambda^2 - 48\lambda$, which is easily factorised as $-\lambda(\lambda - 6)(\lambda - 8)$. So the eigenvalues are 0, 6, 8.
Corresponding eigenvectors, respectively, are calculated to be (non-zero multiples of) $(1/2, -1/2, 1)^T$, $(1/2, 1, 1)^T$, $(1/4, 1, 0)^T$. We may therefore take
\[ P = \begin{pmatrix} 1/2 & 1/2 & 1/4 \\ -1/2 & 1 & 1 \\ 1 & 1 & 0 \end{pmatrix}, \quad D = \operatorname{diag}(0, 6, 8). \]

Question 3.2 You can check that the only eigenvalue is 1 and that the corresponding eigenvectors are all the non-zero scalar multiples of $(1, 0)^T$. So there cannot be two linearly independent eigenvectors, and hence the matrix is not diagonalisable.

Question 3.3 Denote the set described by $W$. First, $0 \in W$. Suppose now that $x, y$ are in $W$ and that $\alpha \in \mathbb{R}$. We need to show that $x + y$ and $\alpha x$ are also in $W$. We know that $Ax = \lambda x$ and $Ay = \lambda y$, so
\[ A(x + y) = Ax + Ay = \lambda x + \lambda y = \lambda(x + y) \]
and
\[ A(\alpha x) = \alpha(Ax) = \alpha(\lambda x) = \lambda(\alpha x), \]
so $x + y$ and $\alpha x$ are indeed in $W$.

Question 3.4 It is given that $x$ is an eigenvector. To determine the corresponding eigenvalue we work out $Ax$. This should be $\lambda x$ where $\lambda$ is the required eigenvalue. Performing the calculation, we see that $Ax = x$ and so $\lambda = 1$. The characteristic polynomial of $A$ is $p(\lambda) = -\lambda^3 + 2\lambda^2 + \lambda - 2$. Since $\lambda = 1$ is a root, we know that $(\lambda - 1)$ is a factor. Factorising, we obtain
\[ p(\lambda) = (\lambda - 1)(-\lambda^2 + \lambda + 2) = -(\lambda - 1)(\lambda - 2)(\lambda + 1), \]
so the other eigenvalues are $\lambda = 2, -1$. Corresponding eigenvectors are, respectively, $(1, 1, 1)^T$ and $(0, -2, 1)^T$. We may therefore take
\[ P = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & -2 \\ 1 & 1 & 1 \end{pmatrix}, \quad D = \operatorname{diag}(1, 2, -1). \]

Question 3.5 This is slightly more complicated since there are not 3 distinct eigenvalues. The eigenvalues turn out to be 1 and 2, with 2 occurring 'twice'. An eigenvector for 1 is $(-2, 1, 1)^T$. We need to find a set of 3 linearly independent eigenvectors, so we need another two coming from the eigenspace corresponding to 2. You should find that the eigenspace for $\lambda = 2$ is two-dimensional and has a basis consisting of $(-1, 0, 1)^T$ and $(0, 1, 0)^T$. These two vectors together with $(-2, 1, 1)^T$ do indeed form a linearly independent set. Therefore we may take
\[ P = \begin{pmatrix} -2 & -1 & 0 \\ 1 & 0 & 1 \\ 1 & 1 & 0 \end{pmatrix}, \quad D = \operatorname{diag}(1, 2, 2). \]

Question 3.6 The characteristic polynomial turns out to be $p(\lambda) = -\lambda^3 + 4\lambda^2 - 3\lambda = -\lambda(\lambda - 3)(\lambda - 1)$, so the eigenvalues are 0, 1, 3. Corresponding eigenvectors are, respectively, $(-1, 1, 1)^T$, $(0, -1, 1)^T$, $(2, 1, 1)^T$. To perform orthogonal diagonalisation (rather than simply diagonalisation) we need to normalise these (that is, make them of length 1 by dividing each by its length). The lengths of the vectors are (respectively) $\sqrt{3}, \sqrt{2}, \sqrt{6}$, so the normalised eigenvectors are
\[ (-1/\sqrt{3}, 1/\sqrt{3}, 1/\sqrt{3})^T, \quad (0, -1/\sqrt{2}, 1/\sqrt{2})^T, \quad (2/\sqrt{6}, 1/\sqrt{6}, 1/\sqrt{6})^T. \]
If we take
\[ P = \begin{pmatrix} -1/\sqrt{3} & 0 & 2/\sqrt{6} \\ 1/\sqrt{3} & -1/\sqrt{2} & 1/\sqrt{6} \\ 1/\sqrt{3} & 1/\sqrt{2} & 1/\sqrt{6} \end{pmatrix}, \]
then $P$ is orthogonal and $P^T AP = P^{-1}AP = \operatorname{diag}(0, 1, 3) = D$.

Question 3.7 Since $A$ can be diagonalised, we have $P^{-1}AP = D$ for some $P$, where $D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$, these entries being the eigenvalues of $A$. It is given that all $\lambda_i \geq 0$. We have $A = PDP^{-1}$. Let
\[ B = P \operatorname{diag}\left(\sqrt{\lambda_1}, \sqrt{\lambda_2}, \ldots, \sqrt{\lambda_n}\right) P^{-1}. \]
Then
\[
\begin{aligned}
B^2 &= P \operatorname{diag}\left(\sqrt{\lambda_1}, \sqrt{\lambda_2}, \ldots, \sqrt{\lambda_n}\right) P^{-1} P \operatorname{diag}\left(\sqrt{\lambda_1}, \sqrt{\lambda_2}, \ldots, \sqrt{\lambda_n}\right) P^{-1} \\
&= P \operatorname{diag}\left(\sqrt{\lambda_1}^{\,2}, \sqrt{\lambda_2}^{\,2}, \ldots, \sqrt{\lambda_n}^{\,2}\right) P^{-1} \\
&= PDP^{-1} = A,
\end{aligned}
\]
and we are done.
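As a numerical illustration of the construction in Question 3.7, the sketch below builds $B$ for the symmetric matrix of Question 3.6, whose eigenvalues 0, 1, 3 are non-negative. It assumes the eigenvectors quoted in the answer to Question 3.6 and uses $P^{-1} = P^T$, which is valid here only because $P$ is orthogonal; for a general diagonalisable matrix a genuine inverse would be needed. The helper names are hypothetical.

```python
import math


def mat_mul(X, Y):
    """Product of two matrices stored as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]


def normalise(v):
    """Divide a vector by its length so that the result has length 1."""
    length = math.sqrt(sum(x * x for x in v))
    return [x / length for x in v]


A = [[2, 1, 1], [1, 1, 0], [1, 0, 1]]
eigenvalues = [0, 1, 3]
eigenvectors = [[-1, 1, 1], [0, -1, 1], [2, 1, 1]]  # same order as eigenvalues

# The columns of P are the normalised eigenvectors, so P is orthogonal.
P = transpose([normalise(v) for v in eigenvectors])
sqrt_D = [[math.sqrt(eigenvalues[i]) if i == j else 0.0 for j in range(3)]
          for i in range(3)]

# B = P diag(sqrt(lambda_i)) P^T, using P^{-1} = P^T for orthogonal P.
B = mat_mul(mat_mul(P, sqrt_D), transpose(P))
B_squared = mat_mul(B, B)

max_error = max(abs(B_squared[i][j] - A[i][j])
                for i in range(3) for j in range(3))
print(max_error < 1e-9)  # True, up to floating-point rounding
```

Because the arithmetic is floating-point, the comparison uses a small tolerance rather than exact equality.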