MODULE 11

Topics: Hermitian and symmetric matrices

Setting: $A$ is an $n \times n$ real or complex matrix defined on $\mathbb{C}^n$ with the complex dot product
$$\langle x, y \rangle = \sum_{j=1}^n x_j \overline{y_j}.$$

Notation: $A^* = \overline{A}^T$, i.e., $a^*_{ij} = \overline{a_{ji}}$. We know from Module 4 that
$$\langle Ax, y \rangle = \langle x, A^* y \rangle \quad \text{for all } x, y \in \mathbb{C}^n.$$

Definition: If $A = A^T$ then $A$ is symmetric. If $A = A^*$ then $A$ is Hermitian.

Examples:
$$\begin{pmatrix} 1 & i \\ i & 1 \end{pmatrix} \text{ is symmetric but not Hermitian;} \qquad \begin{pmatrix} 1 & i \\ -i & 1 \end{pmatrix} \text{ is Hermitian but not symmetric.}$$

Note that a real symmetric matrix is Hermitian.

Theorem: If $A$ is Hermitian then $\langle Ax, x \rangle$ is real for all $x \in \mathbb{C}^n$.

Proof: $\langle Ax, x \rangle = \langle x, Ax \rangle = \overline{\langle Ax, x \rangle}$, so the number $\langle Ax, x \rangle$ is real.

Theorem: If $A$ is Hermitian then its eigenvalues are real.

Proof: Suppose that $Au = \lambda u$. Then
$$\lambda \langle u, u \rangle = \langle \lambda u, u \rangle = \langle Au, u \rangle = \langle u, A^* u \rangle = \langle u, Au \rangle = \langle u, \lambda u \rangle = \bar{\lambda} \langle u, u \rangle.$$
It follows that $\lambda = \bar{\lambda}$, so that $\lambda$ is real.

Theorem: If $A$ is Hermitian then eigenvectors corresponding to distinct eigenvalues are orthogonal.

Proof: Suppose that $Au = \lambda u$ and $Av = \mu v$ for $\lambda \neq \mu$. Then
$$\lambda \langle u, v \rangle = \langle Au, v \rangle = \langle u, Av \rangle = \langle u, \mu v \rangle = \bar{\mu} \langle u, v \rangle = \mu \langle u, v \rangle,$$
where the last step uses that the eigenvalue $\mu$ is real by the previous theorem. Since $\mu \neq \lambda$ it follows that $\langle u, v \rangle = 0$.

Suppose that $A$ has $n$ distinct eigenvalues $\{\lambda_1, \dots, \lambda_n\}$ with corresponding orthogonal eigenvectors $\{u_1, \dots, u_n\}$. Let us also agree to scale the eigenvectors so that $\langle u_i, u_j \rangle = \delta_{ij}$, where $\delta_{ij}$ is the so-called Kronecker delta
$$\delta_{ij} = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j. \end{cases}$$

We recall that the eigenvalue equation can be written in matrix form $AU = U\Lambda$, where $U = (u_1\ u_2\ \cdots\ u_n)$ and $\Lambda = \operatorname{diag}\{\lambda_1, \dots, \lambda_n\}$. We now observe from the orthonormality of the eigenvectors that $U^* U = I$. Hence $U^{-1} = U^*$ and consequently
$$A = U \Lambda U^*.$$
In other words, $A$ can be diagonalized in a particularly simple way.

Example: Suppose
$$A = \begin{pmatrix} 1 & i \\ -i & 1 \end{pmatrix}.$$
We saw that $A$ is Hermitian; its eigenvalues are the roots of $(1 - \lambda)^2 - 1 = 0$, so that
$$\lambda_1 = 0, \quad u_1 = (1, i)/\sqrt{2}; \qquad \lambda_2 = 2, \quad u_2 = (1, -i)/\sqrt{2},$$
which shows that $\langle u_1, u_2 \rangle = 0$. Thus
$$U^* U = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & -i \\ 1 & i \end{pmatrix} \cdot \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ i & -i \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$
and
$$A = U \begin{pmatrix} 0 & 0 \\ 0 & 2 \end{pmatrix} U^*.$$
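As a quick numerical sanity check of this example (a minimal sketch, assuming numpy is available; the variable names are ours), numpy's eigendecomposition routine for Hermitian matrices reproduces the real eigenvalues, the orthonormality of the eigenvector basis, and the factorization $A = U \Lambda U^*$:

```python
import numpy as np

# The Hermitian matrix from the example above
A = np.array([[1, 1j],
              [-1j, 1]])

# numpy.linalg.eigh is designed for Hermitian matrices:
# it returns real eigenvalues (ascending) and orthonormal eigenvectors.
eigenvalues, U = np.linalg.eigh(A)
print(eigenvalues)                             # ~[0. 2.] -- real, as the theorem asserts

# Orthonormality of the eigenvector basis: U* U = I
print(np.allclose(U.conj().T @ U, np.eye(2)))  # True

# The diagonalization A = U Lambda U*
Lam = np.diag(eigenvalues)
print(np.allclose(U @ Lam @ U.conj().T, A))    # True
```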
We saw from the example
$$A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$$
that the eigenvector system for $A$ can only be written in the form
$$A \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}.$$
This matrix $A$ cannot be diagonalized because we do not have $n$ linearly independent eigenvectors. However, a Hermitian matrix can always be diagonalized because we can find an orthonormal eigenvector basis of $\mathbb{C}^n$ regardless of whether the eigenvalues are distinct or not.

Theorem: If $A$ is Hermitian then $A$ has $n$ orthonormal eigenvectors $\{u_1, \dots, u_n\}$ and $A = U \Lambda U^*$ where $\Lambda = \operatorname{diag}\{\lambda_1, \dots, \lambda_n\}$.

Proof: If all the eigenvalues are distinct then the result follows from the foregoing discussion. Here we shall outline a proof of this result for the case where $A$ is real, so that the setting is $\mathbb{R}^n$ rather than $\mathbb{C}^n$. Define the function
$$f(y) = \langle Ay, y \rangle.$$
Analysis tells us that this function has a minimum on the set $\langle y, y \rangle = 1$. The minimizer can be found with Lagrange multipliers. We have the Lagrangian
$$L(y) = \langle Ay, y \rangle - \lambda (\langle y, y \rangle - 1),$$
and the minimizer $x_1$ is found from the necessary condition $\nabla L(y) = 0$. Writing $A_k$ for the $k$th row of $A$ and $e_k$ for the $k$th unit vector, we compute
$$\frac{\partial L}{\partial y_k} = \langle A_k, y \rangle + \langle Ay, e_k \rangle - 2\lambda y_k = 2(\langle A_k, y \rangle - \lambda y_k),$$
since the symmetry of $A$ gives $\langle Ay, e_k \rangle = \langle A_k, y \rangle$. Thus (up to the factor 2) we have to solve
$$\nabla L \equiv Ay - \lambda y = 0, \qquad \langle y, y \rangle = 1.$$
The solution to this problem is the eigenvector $x_1$ with eigenvalue $\lambda_1$. If we now apply the Gram–Schmidt method to the set of $n+1$ dependent vectors $\{x_1, e_1, \dots, e_n\}$, we end up with $n$ orthogonal vectors $\{x_1, y_2, \dots, y_n\}$. We now minimize $f(y) = \langle Ay, y \rangle$ subject to $\|y\| = 1$ over all $y \in \operatorname{span}\{y_2, \dots, y_n\}$ and get the next eigenvalue $\lambda_2$ and eigenvector $x_2$, which then necessarily satisfies $\langle x_1, x_2 \rangle = 0$. Next we find an orthogonal basis of $\mathbb{R}^n$ of the form $\{x_1, x_2, y_3, \dots, y_n\}$ and minimize $f(y)$ over $\operatorname{span}\{y_3, \dots, y_n\}$ subject to the constraint $\|y\| = 1$. We keep going and eventually find $n$ eigenvectors, all of which are mutually orthogonal. Afterwards all eigenvectors can be normalized so that we have an orthonormal eigenvector basis of $\mathbb{R}^n$, which we denote by $\{u_1, \dots, u_n\}$.

If $A$ is real and symmetric then this choice of eigenvectors satisfies $U^T U = I$. If $A$ is complex then the orthonormal eigenvectors satisfy $U^* U = I$. In either case the eigenvalue equation $AU = U\Lambda$ can be rewritten as $A = U \Lambda U^*$ or, equivalently, $U^* A U = \Lambda$. We say that $A$ can be diagonalized. We shall have occasion to use this result when we talk about solving first order systems of linear ordinary differential equations.

The diagonalization theorem guarantees that for each eigenvalue one can find an eigenvector which is orthogonal to all the other eigenvectors, regardless of whether the eigenvalue is distinct or not. This means that if an eigenvalue $\mu$ is a repeated root of multiplicity $k$ (meaning that $\det(A - \lambda I) = (\lambda - \mu)^k g(\lambda)$, where $g(\mu) \neq 0$), then $\dim N(A - \mu I) = k$. We simply find an orthogonal basis of this null space. The process is, as usual, the application of Gaussian elimination to $(A - \mu I)x = 0$, finding $k$ linearly independent vectors in the null space which can then be made orthogonal with the Gram–Schmidt process.

Some consequences of this theorem: Throughout we assume that $A$ is Hermitian and that $x$ is not the zero vector.

Theorem: $\langle Ax, x \rangle > 0$ for all $x \neq 0$ if and only if all eigenvalues of $A$ are strictly positive.

Proof: Suppose that $\langle Ax, x \rangle > 0$ for all $x \neq 0$. If $\lambda$ is a non-positive eigenvalue with eigenvector $u$, then $\langle Au, u \rangle = \lambda \langle u, u \rangle \leq 0$, contrary to assumption. Hence the eigenvalues must be positive. Conversely, assume that all eigenvalues are positive. Let $x$ be arbitrary; then the existence of an orthonormal eigenvector basis $\{u_j\}$ assures that
$$x = \alpha_1 u_1 + \cdots + \alpha_n u_n.$$
If we substitute this expansion into $\langle Ax, x \rangle$ we obtain
$$\langle Ax, x \rangle = \sum_{j=1}^n \lambda_j \alpha_j \overline{\alpha_j} = \sum_{j=1}^n \lambda_j |\alpha_j|^2 > 0$$
for any non-zero $x$.

Definition: $A$ is positive definite if $\langle Ax, x \rangle > 0$ for $x \neq 0$. $A$ is positive semi-definite if $\langle Ax, x \rangle \geq 0$ for all $x$. $A$ is negative definite if $\langle Ax, x \rangle < 0$ for $x \neq 0$. $A$ is negative semi-definite if $\langle Ax, x \rangle \leq 0$ for all $x$.

Note that for a positive or negative definite matrix $Ax = 0$ if and only if $x = 0$, while a semi-definite matrix may have a zero eigenvalue.

An application: Consider the real quadratic polynomial
$$P(x, y, z) = a_{11} x^2 + a_{12} xy + a_{13} xz + a_{22} y^2 + a_{23} yz + a_{33} z^2 + b_1 x + b_2 y + b_3 z + c.$$
We "know" from analytic geometry that $P(x, y, z) = 0$ describes an ellipsoid, paraboloid or hyperboloid in $\mathbb{R}^3$. But how do we find out? We observe that $P(x, y, z) = 0$ can be rewritten as
$$P(x, y, z) = \left\langle A \begin{pmatrix} x \\ y \\ z \end{pmatrix}, \begin{pmatrix} x \\ y \\ z \end{pmatrix} \right\rangle + \left\langle b, \begin{pmatrix} x \\ y \\ z \end{pmatrix} \right\rangle + c = 0,$$
where
$$A = \begin{pmatrix} a_{11} & a_{12}/2 & a_{13}/2 \\ a_{12}/2 & a_{22} & a_{23}/2 \\ a_{13}/2 & a_{23}/2 & a_{33} \end{pmatrix}$$
is a symmetric real matrix. Hence $A$ can be diagonalized:
$$A = U \Lambda U^T.$$
Define the new vector (new coordinates)
$$\begin{pmatrix} X \\ Y \\ Z \end{pmatrix} = U^T \begin{pmatrix} x \\ y \\ z \end{pmatrix}.$$
Then
$$P(X, Y, Z) = \left\langle \Lambda \begin{pmatrix} X \\ Y \\ Z \end{pmatrix}, \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} \right\rangle + \left\langle b, U \begin{pmatrix} X \\ Y \\ Z \end{pmatrix} \right\rangle + c = \lambda_1 X^2 + \lambda_2 Y^2 + \lambda_3 Z^2 + \text{lower order terms},$$
where $\{\lambda_j\}$ are the eigenvalues of $A$. If all are positive or all are negative we have an ellipsoid, if all are non-zero but not of the same sign we have a hyperboloid, and if at least one eigenvalue is zero we have a paraboloid. The lower order terms determine where these shapes are centered but not their type.
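This classification is easy to carry out numerically. Below is a minimal sketch (assuming numpy; the sample coefficients are our own illustrative choice, not from the notes) that builds the symmetric matrix of the quadratic part, diagonalizes it, and reads off the type from the signs of the eigenvalues:

```python
import numpy as np

# Sample quadric (our own choice): x^2 + xy + y^2 + z^2 + x - 3 = 0.
# Quadratic coefficients in the order a11, a12, a13, a22, a23, a33:
a11, a12, a13, a22, a23, a33 = 1.0, 1.0, 0.0, 1.0, 0.0, 1.0

# Symmetric matrix of the quadratic part, with cross terms split in half
A = np.array([[a11,     a12 / 2, a13 / 2],
              [a12 / 2, a22,     a23 / 2],
              [a13 / 2, a23 / 2, a33    ]])

# A is real symmetric, so eigh applies and the eigenvalues are real
lam, U = np.linalg.eigh(A)

# Classify by the signs of the eigenvalues, as in the notes
# (a tolerance stands in for "exactly zero" in floating point)
tol = 1e-12
if np.all(lam > tol) or np.all(lam < -tol):
    kind = "ellipsoid"
elif np.all(np.abs(lam) > tol):
    kind = "hyperboloid"
else:
    kind = "paraboloid"

print(lam, kind)   # [0.5 1.  1.5] ellipsoid
```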
Module 11 Homework

1) Suppose that $A$ has the property $A = -A^*$. In this case $A$ is said to be skew-Hermitian.
i) Show that all eigenvalues of $A$ have to be purely imaginary.
ii) Prove or disprove: the eigenvectors corresponding to distinct eigenvalues of a skew-Hermitian matrix are orthogonal with respect to the complex dot product.
iii) If $A$ is real and skew-Hermitian, what does this imply about $A^T$?

2) Let $A$ be a skew-Hermitian matrix.
i) Show that for each $x \in \mathbb{C}^n$ we have $\operatorname{Re}\langle Ax, x \rangle = 0$.
ii) Show that for any matrix $A$ we have $\langle Ax, x \rangle = \langle Bx, x \rangle + \langle Cx, x \rangle$, where $B$ is Hermitian and $C$ is skew-Hermitian.

3) Suppose that $U^* U = I$. What can you say about the eigenvalues of $U$?

4) Find a new coordinate system such that the conic section $x^2 - 2xy + 4y^2 = 6$ is in standard form, so that you can read off whether it is an ellipse, parabola or hyperbola.
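Numerical experiments are no substitute for the requested proofs, but they can suggest what to expect. For instance, the claim in problem 1 i) can be probed with a random skew-Hermitian matrix (a sketch assuming numpy; the construction $A = M - M^*$ is one convenient way to produce such a matrix):

```python
import numpy as np

rng = np.random.default_rng(0)

# Build a random skew-Hermitian matrix: A = M - M* satisfies A* = -A
M = rng.normal(size=(4, 4)) + 1j * rng.normal(size=(4, 4))
A = M - M.conj().T

# Its eigenvalues should be purely imaginary (problem 1 i)
lam = np.linalg.eigvals(A)
print(np.max(np.abs(lam.real)))   # ~1e-15, i.e. zero up to roundoff
```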