NOTES ON MATRIX THEORY

by Christopher Beattie and John Rossi
Department of Mathematics, Virginia Tech

© Christopher Beattie and John Rossi, 2003

Contents

1 Linear equations and elementary matrix algebra - a review
  1.1 Vectors in R^n and C^n
  1.2 Matrix Operations
  1.3 Matrix Inverses
  1.4 The LU Decomposition

2 Vector Spaces and Linear Transformations
  2.1 A Model for General Vector Spaces
  2.2 The Basics of Bases
    2.2.1 Spanning Sets and Linear Independence
    2.2.2 Basis and Dimension
    2.2.3 Change of Basis
  2.3 Linear Transformations and their Representation
    2.3.1 Matrix Representations
    2.3.2 Similarity of Matrices
  2.4 Determinants

3 Inner Products and Best Approximations
  3.1 Inner Products
  3.2 Best approximation and projections
  3.3 Pseudoinverses
  3.4 Orthonormal Bases and the QR Decomposition
  3.5 Unitary Transformations and the Singular Value Decomposition

4 The Eigenvalue Problem
  4.1 Eigenvalues and Eigenvectors
    4.1.1 Eigenvalue Basics
    4.1.2 The Minimal Polynomial
  4.2 Invariant Subspaces and Jordan Forms
    4.2.1 Invariant Subspaces
    4.2.2 Jordan Forms
  4.3 Diagonalization
  4.4 The Schur Decomposition
  4.5 Hermitian and other Normal Matrices
    4.5.1 Hermitian matrices
    4.5.2 Normal Matrices
    4.5.3 Positive Definite Matrices
    4.5.4 Revisiting the Singular Value Decomposition

Chapter 1

Linear equations and elementary matrix algebra - a review

Prerequisites: None, aside from being a careful reader.

Advanced Prerequisites: Familiarity with row reduction for solving linear systems.

Learning Objectives: Review of row reduction and Gauss elimination for solving systems of linear equations. Introduction of matrices as basic bookkeeping devices to organize row reduction. Identification of all possible solutions for a system of linear equations and recognition of when a system fails to have any solution.

The set of equations

    x + 2y + 2z =  3
   3x + 5y + 4z =  5
   2x +  y - 3z = -7                                        (1.1)

is an example of a system of linear equations. The variables x, y, and z appear "linearly" in the sense that they each have unit exponents (that is, they're not raised to a power other than one); they don't multiply one another (for example, no terms like "6xy"); and they don't otherwise participate as arguments of more complicated functions like sin(x). A solution of the system (1.1) is a choice of values for each of x, y, and z which, upon substitution, will satisfy all equations simultaneously. The solution set is then simply the set of all possible solutions.
The usual strategy in solving a linear system such as this is to transform it into a simpler linear system that may then be easily solved. "Simpler" in this context means that each equation of the simpler linear system has only one variable with a nonzero coefficient for which a value has not yet been determined (for which we can then directly solve). For example, the linear system

   2x          = 3
   3x +  y     = 5
   2x +  y + z = 7

is simple in this sense, since the first equation can be solved directly for x; then, knowing x, the second equation can be solved directly for y; and then, knowing both x and y, the final equation can be solved for z. Notice that the feature that makes this particular system of equations simple is that the first equation has zero as a coefficient for both y and z, and the second equation has zero as a coefficient for z. The process of transforming the original linear system into a simpler one then involves systematically introducing zeros as coefficients of certain variables in certain equations. However we might conceive of ways of fiddling with the coefficients, of paramount importance is that we not change the set of solutions in the course of our fiddling. There are many types of transformations that we may apply to a linear system that will not change the set of solutions and that have the additional desired potential for yielding simplified linear systems. We list three here that suffice for all our purposes. These are called elementary transformations:

Type 1: Replace an equation with the sum of the same equation and a multiple of another equation;
Type 2: Interchange two equations;
Type 3: Multiply an equation by a nonzero number.
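These transformations are easy to experiment with numerically. The following Python sketch (our own illustration; the helper names `satisfies`, `type1`, `type2`, and `type3` are not part of the notes) applies one operation of each type to the augmented rows of system (1.1) and checks that a solution still satisfies the transformed system:

```python
from fractions import Fraction

# Augmented rows [a, b, c | d] for system (1.1):
#   x + 2y + 2z = 3,  3x + 5y + 4z = 5,  2x + y - 3z = -7
rows = [[1, 2, 2, 3], [3, 5, 4, 5], [2, 1, -3, -7]]

def satisfies(rows, x, y, z):
    """Check that (x, y, z) satisfies every equation in the list."""
    return all(a * x + b * y + c * z == d for a, b, c, d in rows)

def type1(rows, i, j, m):
    """Replace (equation i) with (equation i) + m * (equation j)."""
    rows = [r[:] for r in rows]
    rows[i] = [u + m * v for u, v in zip(rows[i], rows[j])]
    return rows

def type2(rows, i, j):
    """Interchange equations i and j."""
    rows = [r[:] for r in rows]
    rows[i], rows[j] = rows[j], rows[i]
    return rows

def type3(rows, i, m):
    """Multiply equation i by the nonzero number m."""
    assert m != 0
    rows = [r[:] for r in rows]
    rows[i] = [m * u for u in rows[i]]
    return rows

solution = (-3, 2, 1)   # the solution found later in this section
assert satisfies(rows, *solution)
assert satisfies(type1(rows, 1, 0, -3), *solution)
assert satisfies(type2(rows, 0, 2), *solution)
assert satisfies(type3(rows, 2, Fraction(1, 2)), *solution)
```

The same checks succeed for any other choice of rows and multipliers, which is the point: elementary transformations never disturb the solution set.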
It should be obvious that Type 2 and Type 3 operations will not change the set of solutions to the linear system in any way, since (for Type 2) the set of solutions must be independent of the order in which the equations are presented and (for Type 3) multiplying both sides of a given equation by a nonzero number cannot change the set of values that satisfy that particular equation. It is less obvious perhaps that Type 1 operations cannot change the solution set of the system. Let's consider this in more detail. It is apparent that the set of solutions to a linear system cannot be diminished by a Type 1 operation, since any set of values that satisfies all equations of the original system before the Type 1 operation will satisfy all equations of the modified system after the Type 1 operation as well. But can a Type 1 operation expand the set of solutions to a linear system? That is, could there be solutions to the modified system (after the Type 1 operation) that were not solutions to the original system (before the Type 1 operation)? The key is in observing that the effect of any Type 1 operation can be reversed with another Type 1 operation. For example, the Type 1 operation

   (equation i) + α (equation j) → (equation i)

can be undone by another Type 1 operation,

   (equation i) - α (equation j) → (equation i).

Then the original system could be viewed as resulting from a Type 1 operation applied to the modified system, and so (using the reasoning above) every solution of the original system must also be a solution to the modified system. We conclude that the original and modified systems have precisely the same solutions, and Type 1 operations leave the solution set for a system unchanged.
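The reversibility argument can also be checked directly: applying a Type 1 operation and then its inverse Type 1 operation returns exactly the original system. A small Python sketch (the helper name `add_multiple` is our own):

```python
# Type 1 reversibility: adding m*(equation j) to (equation i) is undone
# by adding -m*(equation j) to (equation i), so the solution set cannot
# change in either direction.
def add_multiple(rows, i, j, m):
    """Replace (row i) with (row i) + m * (row j), leaving the rest alone."""
    rows = [r[:] for r in rows]
    rows[i] = [u + m * v for u, v in zip(rows[i], rows[j])]
    return rows

original = [[1, 2, 2, 3], [3, 5, 4, 5], [2, 1, -3, -7]]
modified = add_multiple(original, 2, 0, 5)    # an arbitrary Type 1 operation
restored = add_multiple(modified, 2, 0, -5)   # the inverse Type 1 operation
assert modified != original
assert restored == original
```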
Brief reflection will make it clear that these three types of elementary transformations only permit summing, subtracting, scaling, and exchanging of equation coefficients with corresponding coefficients of other equations, and that these transformations never allow the coefficients of any given variable to be summed or exchanged with coefficients of a different variable. For this reason, it is possible to introduce a "bookkeeping" mechanism to streamline the solution process. Recognizing that only the relative position of the equation coefficients need be known to apply any of the three elementary transformations, one typically defines an augmented matrix by stripping away all extraneous symbols and letters, leaving only the coefficients and right-hand side constants in place. This represents a linear system of equations without the clutter of symbols, e.g.,

   [ 1  2  2 |  3 ]                 x + 2y + 2z =  3
   [ 3  5  4 |  5 ]   represents   3x + 5y + 4z =  5
   [ 2  1 -3 | -7 ]                2x +  y - 3z = -7

The elementary equation transformations described before now correspond to operations on rows in the augmented matrix and are known as elementary row operations:

Type 1: Replace a row with the sum of the same row and a multiple of another row in the augmented matrix;
Type 2: Interchange two rows; and
Type 3: Multiply a row by a nonzero number.

We proceed to simplify the system by introducing zeros as follows:

   [ 1  2  2 |  3 ]        x + 2y + 2z =  3
   [ 3  5  4 |  5 ]       3x + 5y + 4z =  5
   [ 2  1 -3 | -7 ]       2x +  y - 3z = -7

Type 1: (Row/Eqn 2) + (-3)(Row/Eqn 1) → (Row/Eqn 2)
        (Row/Eqn 3) + (-2)(Row/Eqn 1) → (Row/Eqn 3)

   [ 1  2  2 |   3 ]       x + 2y + 2z =   3
   [ 0 -1 -2 |  -4 ]          -y -  2z =  -4
   [ 0 -3 -7 | -13 ]         -3y -  7z = -13

Type 1: (Row/Eqn 3) + (-3)(Row/Eqn 2) →
(Row/Eqn 3)

   [ 1  2  2 |  3 ]       x + 2y + 2z =  3
   [ 0 -1 -2 | -4 ]          -y -  2z = -4
   [ 0  0 -1 | -1 ]               -z  = -1

The final linear system is "simpler" and straightforward to solve:

   z =  1   (solving the third equation for z)
   y =  2   (solving the second equation for y)
   x = -3   (solving the first equation for x).

We were able to do the reduction to a simpler system here using only Type 1 elementary row operations and then followed this with "back substitution". By using Type 1 and Type 2 elementary operations, one can always achieve a rough "triangular" form similar to this, called the row echelon form. The defining feature of the row echelon form is that the first nonzero entry of each row (called the "pivot" for that row) must occur below and to the right of the pivot of each of the preceding rows above it. Rows consisting of all zeros occur at the bottom of the matrix. Type 1 operations do the work of introducing zeros, while Type 2 operations do the necessary shuffling of rows to get a triangular form. This is an important enough observation to state as

Theorem 1.1. Every m × n matrix can be reduced to row echelon form using only Type 1 and Type 2 elementary operations.

Although terminology can vary, the composite process first of using Type 1 and Type 2 elementary operations to reduce to a row echelon form that represents a "triangular" system, and then of solving the triangular system by back substitution, is called Gauss elimination. This isn't the only strategy available to us. Instead of using back substitution after achieving a row echelon form as we did above, we could have continued the reduction process with elementary operations by next attacking nonzero entries in the upper triangular portion of the augmented matrix, and then finally dividing each row by its first nonzero entry. Continuing with the reduction:

Type 1: (Row/Eqn 1) + 2 (Row/Eqn 2) → (Row/Eqn 1)

   [ 1  0 -2 | -5 ]       x       - 2z = -5
   [ 0 -1 -2 | -4 ]          -y   - 2z = -4
   [ 0  0 -1 | -1 ]               -z   = -1

Type 1: (Row/Eqn 2) + (-2)(Row/Eqn 3) → (Row/Eqn 2)
        (Row/Eqn 1) + (-2)(Row/Eqn 3) →
(Row/Eqn 1)

Type 3: (-1)(Row/Eqn 2) → (Row/Eqn 2)
        (-1)(Row/Eqn 3) → (Row/Eqn 3)

   [ 1  0  0 | -3 ]       x = -3
   [ 0  1  0 |  2 ]       y =  2
   [ 0  0  1 |  1 ]       z =  1

(the same result as before). The additional elementary operations had the effect of the back substitution phase of Gauss elimination. By using all three elementary operations in this way we can always achieve a reduced triangular form called reduced row echelon form. The defining features of a reduced row echelon form are that it be a row echelon form matrix for which, additionally, the first nonzero entry in each row is a "1" (a "leading one") and each column with a leading one has no other nonzero entries. The process of using all the elementary operations in this way to achieve a reduced row echelon form, from which the solution is then available by inspection, is generally called Gauss-Jordan elimination. Although Gauss-Jordan elimination is useful in some circumstances (typically involving hand calculations), Gauss elimination based on what was described above, or in the equivalent "LU" form described in §1.4, is somewhat more efficient and predominates in computer implementations.

Although this collection of methods for solving linear systems usually carries some reference to Gauss in its name, the same basic idea of transforming systems of linear equations systematically to a simpler form to extract a solution was known in the early Han Dynasty in China nearly 2,000 years earlier. Gauss rediscovered this elimination method at the beginning of the 19th century and recommended it as a replacement for the then commonly used Cramer's Rule¹ for practical computation. Gauss and the geodesist Wilhelm Jordan (who later in the century added the refinements useful in hand calculations we described above) used these methods to solve least squares problems in celestial mechanics and surveying calculations, respectively.
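The whole procedure, reduction to row echelon form by Type 1 operations followed by back substitution, can be sketched in a few lines of Python (an illustrative toy of our own, not the Matlab routine requested in a later problem; it uses exact rational arithmetic and assumes a square system in which no zero pivot is encountered, so no Type 2 interchanges are needed):

```python
from fractions import Fraction

def gauss_solve(aug):
    """Solve a square linear system by Gauss elimination + back substitution.
    `aug` is the augmented matrix as a list of rows; assumes a unique
    solution and (for brevity) that every pivot is nonzero."""
    a = [[Fraction(v) for v in row] for row in aug]
    n = len(a)
    for k in range(n):                      # introduce zeros below pivot k
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]           # Type 1 multiplier
            a[i] = [u - m * v for u, v in zip(a[i], a[k])]
    x = [Fraction(0)] * n
    for i in reversed(range(n)):            # back substitution
        s = sum(a[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (a[i][n] - s) / a[i][i]
    return x

# System (1.1):
x = gauss_solve([[1, 2, 2, 3], [3, 5, 4, 5], [2, 1, -3, -7]])
assert x == [-3, 2, 1]   # x = -3, y = 2, z = 1, as found above
```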
A linear system of equations is said to be consistent if there exists at least one choice of variable values that satisfies all equations simultaneously, that is, if there is at least one solution to the system. If no such choice exists, the system is called inconsistent. A linear system of equations is called homogeneous if the constants on the right-hand side of the equations are each equal to zero. If one or more of these constants are nonzero, then the system is nonhomogeneous. A homogeneous system is always consistent, since the trivial or zero solution consisting of all variables set equal to zero is always one possible solution.

¹ Against all odds, Cramer's Rule still survives in linear algebra pedagogy today despite being more difficult to understand, unable to provide any result for systems with multiple solutions, and immensely more expensive even for systems of modest size than either Gauss or Gauss-Jordan elimination. Go figure ...

Consider the linear system of equations

    x1 - 2x2 + 2x3 +  x4             = 1
   3x1 - 6x2 + 7x3 + 3x4 +  x5 + 3x6 = 4
   2x1 - 4x2 + 7x3 + 2x4 + 4x5 + 9x6 = 7
  -2x1 + 4x2 - 2x3 - 2x4 + 3x5 + 6x6 = 2

with its associated augmented matrix representation

   [  1 -2  2  1  0  0 | 1 ]
   [  3 -6  7  3  1  3 | 4 ]
   [  2 -4  7  2  4  9 | 7 ]
   [ -2  4 -2 -2  3  6 | 2 ]

Using Gauss elimination, the final augmented matrix in row echelon form and the linear system of equations it represents are found to be

   [ 1 -2  2  1  0  0 | 1 ]       x1 - 2x2 + 2x3 + x4 = 1
   [ 0  0  1  0  1  3 | 1 ]            x3 + x5 + 3x6  = 1
   [ 0  0  0  0  1  0 | 2 ]                 x5        = 2
   [ 0  0  0  0  0  0 | 0 ]                 (0 = 0)

As we solve backwards starting with x5, notice that x6, x4, and x2 are completely unconstrained, to the extent that whatever values we might choose for them will lead to a valid solution of the linear system provided that values for the remaining variables are chosen consistently with the remaining equations.
If we label the values of x6, x4, and x2, respectively, with the free parameters r, s, and t, say, then we obtain x5 = 2, x6 = r (free), x3 = -1 - 3r, x4 = s (free), x2 = t (free), and x1 = 3 + 6r - s + 2t. Note that the system has infinitely many solutions, since we have an infinite number of possible choices for assigning values to r, s, and t. This example represents a situation occurring for a large class of linear systems, as described in the following theorem.

Theorem 1.2. A consistent system of linear equations with more unknowns than equations always has infinitely many solutions. As an important special case, a homogeneous system of linear equations with more unknowns than equations always has infinitely many nontrivial solutions.

The idea underlying Theorem 1.2 involves first noticing that the pivot in each row is associated with a variable that is solved for in terms of the remaining variables and the right-hand side constants. Since the maximum number of these leading nonzero entries can be no larger than the total number of equations, which by hypothesis is strictly smaller than the total number of variables, there must be "left-over" variables that are associated with free parameters that therefore could take on an infinite number of possible values. Simply stated, there are too few equations to completely specify the unknowns, so there will always be at least one free parameter able to take arbitrary values. Notice that a linear system with more unknowns than equations might have no solutions, and that a linear system having an infinite number of solutions might have fewer unknowns than equations (e.g., consider

     x1 - 2x2 =  1
   -2x1 + 4x2 = -2
    3x1 - 6x2 =  3

which has the family of solutions (1 + 2s, s) with s varying over R). If we have multiple linear systems of equations all having the same left-hand side coefficients but differing only in the right-hand side constants, we do not have to re-solve each system separately from scratch.
Some advantage can be taken from the fact that the elementary row operations do not combine values across columns. For example, suppose we have two linear systems represented in terms of augmented matrices as

   [ 1  3 -1 |  3 ]        [ 1  3 -1 | 2 ]
   [ 2  5 -3 | -4 ]   and  [ 2  5 -3 | 2 ]
   [ 3  9 -4 |  1 ]        [ 3  9 -4 | 3 ]

We can represent both linear systems together in a single "fat" augmented matrix as

   [ 1  3 -1 |  3  2 ]
   [ 2  5 -3 | -4  2 ]
   [ 3  9 -4 |  1  3 ]

and proceed with just a single reduction task instead of the original two that we had faced:

(Type 1)

   [ 1  3 -1 |   3  2 ]
   [ 0 -1 -1 | -10 -2 ]
   [ 0  0 -1 |  -8 -3 ]

(Type 1, Type 3)

   [ 1  0 -4 | -27 -4 ]
   [ 0  1  1 |  10  2 ]
   [ 0  0  1 |   8  3 ]

(Type 1)

   [ 1  0  0 |  5  8 ]
   [ 0  1  0 |  2 -1 ]
   [ 0  0  1 |  8  3 ]

Thus the solution set to the first system is (5, 2, 8) and to the second system is (8, -1, 3).

In exercise problems 1.1 - 1.4, find the solution sets for the given systems.

Exercise 1.1.
    x1 +  x2 + 2x3 = 8        x1 +  x2 + 2x3 = 4
   2x1 - 9x2 + 7x3 = 11      2x1 - 9x2 + 7x3 = 4
    x1 - 2x2 + 3x3 = 1        x1 - 2x2 + 3x3 = 9

Exercise 1.2.
    x1 -  x2 = 2        x1 -  x2 = 7
   2x1 +  x2 = 1       2x1 +  x2 = 5
   3x1 + 2x2 = 1       3x1 + 2x2 = 6

Exercise 1.3.
    x1 - 2x2 +  x3 -  4x4 = 1
   2x1 +  x2 + 8x3 -  2x4 = 3
   2x1 - 9x2 - 4x3 - 14x4 = 1

Exercise 1.4.
   2x1 + 2x2 -  x3       + x5 = 0
   -x1 -  x2 + 2x3 - 3x4 + x5 = 0
    x1 +  x2 - 2x3       - x5 = 0
                x3 +  x4 + x5 = 0

Problem 1.1. How many arithmetic operations should you expect to be necessary to solve a system of m linear equations in n unknowns using Gauss elimination?

1. Give an explicit expression for the leading terms with respect to m and n. Suppose first that the total number of arithmetic operations has the form a n^3 + b n^2 m + c n m^2 + d m^3 + ..., where "..." includes all terms of lower order, like those that are pure second order or less in m or n. Find a, b, c, and d. For convenience, suppose that only Type 1 operations are necessary to reduce the matrix to row echelon form and that every Type 1 operation introduces only a single zero into the modified matrix. Assume that only the basic solution (all free parameters set to zero) is calculated.
It might be useful to recall the elementary formulas

   sum_{k=1}^{n} k = n(n+1)/2        sum_{k=1}^{n} k^2 = n(n+1)(2n+1)/6

2. Write a Matlab routine that accepts an augmented matrix of dimension m × (n+1) representing the system of equations; performs Gauss elimination with the assumptions made in part 1; and returns the basic solution. Using the flops command to count arithmetic operations, plot the number of arithmetic operations against

   m, for values of m ranging from 5, ..., 15 and with fixed n = 10;
   n, for values of n ranging from 5, ..., 15 and with fixed m = 10.

How do your plots compare with your predictions from part 1?

3. How would the number of arithmetic operations change if p > 1 linear systems associated with different right-hand sides but the same coefficient matrix were solved simultaneously? Justify your answer either analytically or experimentally.

1.1 Vectors in R^n and C^n

Prerequisites: Basic knowledge of vectors and Euclidean length. Familiarity with the arithmetic of complex numbers.

Learning Objectives: Familiarity with vector notation. Ability to calculate the Euclidean norm, dot product, angle, and Euclidean distance between vectors in R^n and C^n.

You should already have some idea of the basic algebra of vectors. For completeness, here are some basic definitions.

Definition 1.3 (R^n). We define R^n to be the set of ordered n-tuples of real numbers, written as columns:

   x = [ x1 ]
       [ x2 ]
       [ .. ]
       [ xn ]

We usually call elements of R^n vectors and use boldface, lowercase, Latin letters to represent them. We assume the usual rules for vector addition,

   (x1, x2, ..., xn) + (y1, y2, ..., yn) := (x1 + y1, x2 + y2, ..., xn + yn),

and scalar multiplication,

   α (x1, x2, ..., xn) := (α x1, α x2, ..., α xn),

for any α ∈ R.

Remark 1.4. Some texts use R^n to refer to n-dimensional columns of numbers and R_n to refer to n-dimensional rows of numbers.
We will follow the more common practice of letting R^n refer to both sets, with the obvious map (the transpose) between the two types of objects.

Remark 1.5. The space R^n satisfies the definition of an abstract vector space, which is described elsewhere in the series.

Example 1.6. Let x = (1, -1, 2, 3) and y = (-4, 1, 0, 2). Then

   x + y = (1 + (-4), -1 + 1, 2 + 0, 3 + 2) = (-3, 0, 2, 5)
   x - y = (1 - (-4), -1 - 1, 2 - 0, 3 - 2) = (5, -2, 2, 1)
      3y = (3(-4), 3(1), 3(0), 3(2)) = (-12, 3, 0, 6).

The basic geometric ideas of vectors are contained in the following definitions:

Definition 1.7. We equip R^n with the Euclidean norm

   ||x|| := sqrt(x1^2 + x2^2 + ... + xn^2)

and the dot product

   x · y := x1 y1 + x2 y2 + ... + xn yn.

Note that ||x||^2 = x · x. The angle θ between two nonzero vectors is defined by the identity

   cos θ := (x · y) / (||x|| ||y||).

The distance between two vectors x and y is the norm of the difference, ||x - y||. We define the open ball of radius r ∈ R about the vector x to be

   B_r(x) := { y ∈ R^n | ||x - y|| < r }.

In R^2 we designate two special vectors of unit length, i = (1, 0) and j = (0, 1). In R^3 we write i = (1, 0, 0), j = (0, 1, 0), and k = (0, 0, 1).

Example 1.8. Let x = (1, -1, 2, 3) and y = (-4, 1, 0, 2). Then

   ||x|| = sqrt(1^2 + (-1)^2 + 2^2 + 3^2) = sqrt(15)
   ||y|| = sqrt((-4)^2 + 1^2 + 0^2 + 2^2) = sqrt(21)
   x · y = (1)(-4) + (-1)(1) + (2)(0) + (3)(2) = 1.

The distance between the two vectors is given by

   ||x - y|| = sqrt((1-(-4))^2 + (-1-1)^2 + (2-0)^2 + (3-2)^2) = sqrt(34).

Problem 1.2. Compute the distance between (3, 2) and (7, 4). Compute the angle between them.

The following two theorems state simply that the Euclidean norm and dot product obey the more abstract definitions of a norm and "inner product" discussed elsewhere in the series.

Theorem 1.9. The Euclidean norm in R^n satisfies the following properties.

1. For any x ∈ R^n and any scalar α ∈ R we have ||αx|| = |α| ||x||. (Note that |α| denotes the absolute value of α, while ||x|| denotes the Euclidean norm of x.)

2.
For any x ∈ R^n we have ||x|| ≥ 0, and equality holds if and only if x = 0.

3. For any x ∈ R^n and y ∈ R^n we have the triangle inequality ||x + y|| ≤ ||x|| + ||y||.

Theorem 1.10. The dot product in R^n satisfies the following properties:

1. For any x ∈ R^n, y ∈ R^n, and z ∈ R^n and any scalars α ∈ R and β ∈ R we have (αx + βy) · z = α(x · z) + β(y · z).

2. For any x ∈ R^n and y ∈ R^n, x · y = y · x.

3. For any x ∈ R^n we have x · x ≥ 0, and equality holds if and only if x = 0.

Definition 1.11 (C^n). The space C^n of ordered n-tuples of complex numbers is defined in much the same way as R^n, except that scalars are complex numbers. The Euclidean norm of z ∈ C^n is defined by

   ||z|| := sqrt(|z1|^2 + |z2|^2 + ... + |zn|^2)

and the dot product by

   z · w := z̄1 w1 + z̄2 w2 + ... + z̄n wn.

(Here |zi| is the modulus and z̄i is the complex conjugate of the complex number zi.) Note that ||z||^2 = z · z.

Problem 1.3. Show that the Euclidean norm in C^n satisfies the following properties.

1. For any z ∈ C^n and any scalar α ∈ C we have ||αz|| = |α| ||z||. (Note that |α| denotes the modulus of the complex number α, while ||z|| denotes the Euclidean norm of z.)

2. For any z ∈ C^n we have ||z|| ≥ 0, and equality holds if and only if z = 0.

3. For any z ∈ C^n and w ∈ C^n we have the triangle inequality ||z + w|| ≤ ||z|| + ||w||.

Problem 1.4. Show that the dot product in C^n satisfies the following properties.

1. For any z ∈ C^n, w ∈ C^n, and y ∈ C^n and any scalars α ∈ C and β ∈ C we have z · (αw + βy) = α(z · w) + β(z · y).

2. For any z ∈ C^n and w ∈ C^n, z · w is the complex conjugate of w · z.

3. For any z ∈ C^n, w ∈ C^n, and any scalar α ∈ C we have (αz) · w = ᾱ(z · w).

4. For any z ∈ C^n we have z · z ≥ 0, and equality holds if and only if z = 0.

Remark 1.12. Note that we have defined the complex dot product so that only the first vector sees conjugation and no conjugation occurs on the entries of the second vector. Many texts do exactly the opposite, so you should watch to see which convention is used.

1.2 Matrix Operations

Prerequisites: Familiarity with elementary matrix arithmetic.
Learning Objectives: Familiarity with matrix notation. Review of basic operations such as addition, scalar multiplication, multiplication, and transposition of matrices. Familiarity with block matrix operations. The ability to express elementary row operations as matrix operations.

Recall that an m × n matrix A is a rectangular array of real or complex numbers having m rows and n columns. We denote the entry (or component) of A in the ith row and jth column by aij and write the matrix A represented in terms of its components as [aij]. To add two matrices A = [aij] and B = [bij], both must be of the same size, and we define the sum C = [cij] as

   A + B = C = [aij + bij],

indicating that matrices are added componentwise and that cij = aij + bij for each i and j. We denote the m × n matrix all of whose entries are zero by 0_{m×n}, or simply 0 if the context makes the matrix size unambiguous. Clearly, A + 0 = 0 + A = A. Scalar multiplication of a matrix A by a (real or complex) scalar α is defined by

   αA = [α aij].

If A = [aij] is an m × p matrix and B = [bij] is a p × n matrix, we define the matrix product AB as the m × n matrix C = [cij] with

   cij = sum_{l=1}^{p} a_il b_lj.

Note that cij is just the dot product of the ith row of A with the jth column of B, each considered as a vector in R^p. Recall that the matrix product of two matrices is usually not commutative; that is, AB is not usually equal to BA, even when both products are defined. However, matrix products are always associative, that is, (AB)C = A(BC).

We define the square n × n identity matrix I_n = [δij] so that δij = 0 if i ≠ j and δii = 1. For example,

   I2 = [ 1  0 ]     I3 = [ 1  0  0 ]
        [ 0  1 ]          [ 0  1  0 ]     etc.
                          [ 0  0  1 ]

The Greek letter δ is generally used for the elements of I_n for historical reasons, but this use does serve to avoid confusion, since the expected letter i is so often used as a subscript. In this context, δij is also called the "Kronecker delta."
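The componentwise definitions above translate directly into code. A minimal Python sketch (the helpers `matmul` and `identity` are our own) that checks the identity property A I = I A = A and the associativity (AB)C = A(BC) on small examples:

```python
def matmul(A, B):
    """Multiply an m x p matrix by a p x n matrix: c_ij = sum_l a_il * b_lj."""
    m, p, n = len(A), len(B), len(B[0])
    assert all(len(row) == p for row in A)   # dimensions must be compatible
    return [[sum(A[i][l] * B[l][j] for l in range(p)) for j in range(n)]
            for i in range(m)]

def identity(n):
    """I_n, whose (i, j) entry is the Kronecker delta."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

A = [[1, 2, 2], [3, 5, 4], [2, 1, -3]]   # coefficient matrix of system (1.1)
B = [[0, 1], [1, 0], [2, 5]]
C = [[1, 2], [3, 4]]

assert matmul(A, identity(3)) == A == matmul(identity(3), A)
# Products are associative, (AB)C = A(BC), even though AB != BA in general:
assert matmul(matmul(A, B), C) == matmul(A, matmul(B, C))
```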
If the matrix dimension is unambiguous then we simply write the identity matrix as I. It is easy to see that if A is an m × n matrix then A I_n = I_m A = A.

We will use the convention throughout these notes that n × 1 matrices typically will be denoted by lowercase boldface letters and will usually be referred to as vectors. Thus, defining A, B, and x as

   A = [ 1  2  2 ]     B = [  3 ]     x = [ x ]
       [ 3  5  4 ]         [  5 ]         [ y ]
       [ 2  1 -3 ]         [ -7 ]         [ z ]

our original system (1.1) can then be written compactly with the aid of matrix multiplication as Ax = B.

It is occasionally useful to view the matrix product of two matrices broken down in terms of matrix products of smaller submatrices. Consider the pair of matrices

   A = [ 1  2  3  1 ]       B = [ 3  2 ]
       [ 3  4  1  2 ]           [ 1  1 ]
                                [ 2  1 ]
                                [ 2  3 ]

and label the submatrices

   A11 = [ 1  2 ]   A12 = [ 3  1 ]   B11 = [ 3  2 ]   B21 = [ 2  1 ]
         [ 3  4 ]         [ 1  2 ]         [ 1  1 ]         [ 2  3 ]

A simple calculation will verify that

   AB = [ 13  10 ] = A11 B11 + A12 B21.
        [ 19  17 ]

The right-hand expression is an example of partitioned matrix multiplication. More generally, suppose that the matrices A and B can be partitioned as

   A = [ A11 A12 ... A1p ]        B = [ B11 B12 ... B1r ]
       [ A21 A22 ... A2p ]            [ B21 B22 ... B2r ]
       [  :   :        :  ]           [  :   :        :  ]
       [ Aq1 Aq2 ... Aqp ]            [ Bp1 Bp2 ... Bpr ]

where the Aij and Bij are k × k matrices (so that, in particular, both row and column dimensions of each matrix are divisible by k). Focus on an entry of the matrix product C = AB, say, the (1,2) entry. This is the dot product of the first row of A with the second column of B, which can be broken down into a sum of dot products

   sum_{i=1}^{p} (first row of A1i) · (second column of Bi1).

Furthermore, this entry is exactly the (1,2) entry of the matrix

   C11 = sum_{i=1}^{p} A1i Bi1,

and in fact the entire matrix product C = AB can be partitioned as

   C = [ C11 C12 ... C1r ]
       [ C21 C22 ... C2r ]
       [  :   :        :  ]
       [ Cq1 Cq2 ... Cqr ]

with

   Cij = sum_{l=1}^{p} Ail Blj.

This idea can be extended without difficulty to situations where the submatrices are rectangular and of different sizes, provided only that the required operations on submatrices are all well-defined. A particularly useful, albeit special, example of this arises when C and B are partitioned by columns:

   C = [ C1 C2 ... Cn ]        B = [ B1 B2 ... Bn ].

Then the columns of C = AB can be directly expressed in terms of the columns of B as Ci = A Bi for each i = 1, ..., n.

The transpose of an m × n matrix A may be defined as the n × m matrix T obtained by interchanging rows with columns in A, in effect "twirling" A a half turn about its main diagonal. For example, if

   A = [ 3  2  1 ]    then    T = [ 3  0 ]
       [ 0  4  7 ]                [ 2  4 ]
                                  [ 1  7 ]

The matrix T is typically written as A^t. An alternate, though occasionally more useful, definition uses the dot product u · v = sum_i u_i v_i and gives the transpose matrix T = [tij] of A as that particular n × m matrix such that

   (Tx) · y = x · (Ay)   for all x ∈ R^m and y ∈ R^n.

Taking this alternate definition as a starting point, notice then that

   (Tx) · y = sum_{i=1}^{n} { sum_{j=1}^{m} tij xj } yi
            = sum_{j=1}^{m} xj { sum_{i=1}^{n} aji yi } = x · (Ay),

implying that

   sum_{i=1}^{n} sum_{j=1}^{m} { tij - aji } xj yi = 0.

Since we ask that this be true for all x ∈ R^m and y ∈ R^n, it must be true that tij = aji, which is equivalent to the first (usual) definition of the transpose.

The following properties of the transpose are easily proved:

1. (A^t)^t = A.
2. (A + B)^t = A^t + B^t.
3. (αA)^t = α A^t, α ∈ R.
4. (AB)^t = B^t A^t.

Notice that the dot product itself can be written as x · y = x^t y. Transposition of partitioned matrices is straightforward:

   [ C11 C12 ... C1r ]^t     [ C11^t C21^t ... Cq1^t ]
   [ C21 C22 ... C2r ]    =  [ C12^t C22^t ... Cq2^t ]
   [  :   :        :  ]      [   :     :         :   ]
   [ Cq1 Cq2 ... Cqr ]       [ C1r^t C2r^t ... Cqr^t ]

Because of the observation that

   Row k of the product (BA) = (Row k of B) A,

it immediately follows that elementary row operations satisfy a type of associativity with multiplication that can be expressed as

   rowop(BA) = rowop(B) A,

where "rowop(B)", for example, indicates the outcome of an elementary row operation applied to B. If we take B = I in particular, then for any matrix A,

   rowop(A) = rowop(I) A.

From this, one can see that any of the three types of elementary row operations can be applied to a given matrix through premultiplication by an appropriate elementary matrix (given by rowop(I) above). For example, if A is a matrix with four rows, consider the following:

Type 1 example: "Add (row 2) to (row 4), replacing (row 4) with the result" can be accomplished by premultiplication with

   [ 1  0  0  0 ]
   [ 0  1  0  0 ]
   [ 0  0  1  0 ]
   [ 0  1  0  1 ]

Type 2 example: "Interchange (row 1) and (row 3)" can be accomplished by premultiplication with

   [ 0  0  1  0 ]
   [ 0  1  0  0 ]
   [ 1  0  0  0 ]
   [ 0  0  0  1 ]

Type 3 example: "Multiply (row 3) by α" can be accomplished by premultiplication with

   [ 1  0  0  0 ]
   [ 0  1  0  0 ]
   [ 0  0  α  0 ]
   [ 0  0  0  1 ]

The reduction phase of Gauss elimination on an augmented matrix A then is expressible in terms of a sequence of premultiplications by elementary matrices:

   Ek Ek-1 ... E2 E1 A = R,

where Ei is the elementary matrix that performs the ith elementary operation during the reduction. Here, k is the total number of elementary row operations that were necessary and R is the final row echelon matrix.

What happens if we postmultiply by elementary matrices? A moment's reflection should reveal that now columns are combined with one another via analogous elementary operations.
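The identity rowop(A) = rowop(I) A is easy to verify numerically. In the Python sketch below (the matrix A and helper names are our own illustration), premultiplying a 4-row matrix by the Type 1 elementary matrix above adds row 2 into row 4, while postmultiplying by the same elementary matrix combines columns instead:

```python
def rowop_type1(n, i, j, m=1):
    """Elementary matrix that, on premultiplication, adds m*(row j) to (row i)."""
    E = [[1 if r == c else 0 for c in range(n)] for r in range(n)]
    E[i][j] = m
    return E

def matmul(A, B):
    """Plain matrix product; zip(*B) iterates over the columns of B."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

A = [[1, 2, 3, 4], [4, 3, 2, 1], [2, 4, 3, 1], [0, 1, 0, 2]]
E = rowop_type1(4, 3, 1)      # "add (row 2) to (row 4)", 0-indexed here

EA = matmul(E, A)             # premultiplication acts on rows
assert EA[3] == [a + b for a, b in zip(A[3], A[1])]   # row 4 replaced
assert EA[:3] == A[:3]                                # other rows untouched

AE = matmul(A, E)             # postmultiplication acts on columns:
assert all(row[1] == r[1] + r[3] for row, r in zip(AE, A))  # col 4 added to col 2
```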
Running through the previous examples now for a matrix A with four columns:

Type 1 example: "Add (column 4) to (column 2), replacing (column 2) with the result" can be accomplished by postmultiplication of A with
\[
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix}.
\]

Type 2 example: "Interchange (column 1) and (column 3)" can be accomplished by postmultiplication of A with
\[
\begin{bmatrix} 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.
\]

Type 3 example: "Multiply (column 3) by \(\alpha\)" can be accomplished by postmultiplication of A with
\[
\begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & \alpha & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}.
\]

Problem 1.5. How would you effect the following operations on a \(3 \times 4\) matrix A using only matrix multiplication? In each case give the matrix E that is needed and state whether AE or EA is to be computed. Notice that only the first two correspond to elementary row operations.
1. Add 3 times row 2 to row 3.
2. Interchange rows 1 and 3.
3. Multiply column 2 by 6.
4. Delete row 2 to obtain a \(2 \times 4\) matrix.
Now give the result of the matrix operations described above when performed in the order given, if A is defined as
\[
\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 & 4 \\ 4 & 3 & 2 & 1 \\ 2 & 4 & 3 & 1 \end{bmatrix}.
\]
Write the result also in the form \(\mathbf{E}_1 \mathbf{A} \mathbf{E}_2\), giving both \(\mathbf{E}_1\) and \(\mathbf{E}_2\).

1.3 Matrix Inverses

Prerequisites:
- Linear systems of equations
- Matrix arithmetic
- Gauss elimination
- Elementary matrices

Learning Objectives:
- Familiarity with the definition of left and right inverses of a matrix.
- Familiarity with the inverses of elementary matrices.
- Ability to determine whether a matrix has a left inverse or a right inverse and to compute those matrices when they exist.
- Ability to determine whether a matrix is invertible using several techniques.
- Ability to compute the inverse of an invertible matrix.

The notion of a matrix inverse arises naturally in the very special context of the "reversibility" of elementary row operations, as discussed in §1.1 and §1.2.
In particular, if E is an elementary matrix associated with some elementary row operation and \(\hat{\mathbf{E}}\) is the elementary matrix associated with the elementary row operation that undoes the first, then the product \(\hat{\mathbf{E}}\mathbf{E} = \mathbf{I}\), since the successive application of the two elementary row operations should have no net effect: the second reverses the effect of the first. An exactly analogous argument yields that \(\mathbf{E}\hat{\mathbf{E}} = \mathbf{I}\), too. For example, the elementary row operation "Add (row 2) to (row 4), replacing (row 4) with the result" for a matrix with four rows is associated with the elementary matrix
\[
\mathbf{E} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix}.
\]
The "inverse" elementary row operation, "Subtract (row 2) from (row 4), replacing (row 4) with the result", is then associated with the elementary matrix
\[
\hat{\mathbf{E}} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & -1 & 0 & 1 \end{bmatrix}.
\]
One can multiply out to verify that \(\hat{\mathbf{E}}\mathbf{E} = \mathbf{E}\hat{\mathbf{E}} = \mathbf{I}\).

What about "matrix inverses" for matrices that are not elementary matrices, or maybe not even square? This could be handy, since then matrix inverses could play a role in the solution of linear systems analogous to the role reciprocals play in the solution of scalar equations. In the scalar case, reciprocals are defined so that the product of a scalar and its reciprocal is 1. Then to solve the scalar equation \(\alpha x = y\) for \(x\), multiply both sides of the equation by the reciprocal of \(\alpha\), which we will call \(\beta\) (\(= 1/\alpha\)), to get
\[
\beta(\alpha x) = (\beta\alpha)x = 1 \cdot x = x = \beta y.
\]
But how then to define "reciprocals" for matrix multiplication in general? Inasmuch as the identity matrix is the matrix analog of the scalar "1", and following on with what we observed for elementary matrices, we might try requiring that the product of a matrix A with its "reciprocal" matrix B be the identity matrix, I. However, since \(\mathbf{A}\mathbf{B} \neq \mathbf{B}\mathbf{A}\) typically, either of the plausible requirements \(\mathbf{A}\mathbf{B} = \mathbf{I}\) or \(\mathbf{B}\mathbf{A} = \mathbf{I}\) should be considered independently of the other. Suppose we have a matrix \(\mathbf{A} \in \mathbb{R}^{m \times n}\) and a matrix \(\mathbf{B}_R\) defined in such a way that \(\mathbf{A}\mathbf{B}_R = \mathbf{I}_m\).
\(\mathbf{B}_R\) is called a right inverse of A. Whenever an \(\mathbf{A} \in \mathbb{R}^{m \times n}\) has a right inverse \(\mathbf{B}_R\), then \(\mathbf{B}_R \in \mathbb{R}^{n \times m}\) and the system of equations \(\mathbf{A}x = c\) will be consistent for every possible right-hand side \(c\), since \(\mathbf{B}_R c\) itself will always be one possible solution:
\[
\mathbf{A}(\mathbf{B}_R c) = (\mathbf{A}\mathbf{B}_R)c = \mathbf{I}c = c.
\]
There may be other solutions to the linear system as well, and indeed there could be more than one right inverse to a matrix or, in the other extreme, none at all. For example, with A given by
\[
\mathbf{A} = \begin{bmatrix} 1 & 3 & 1 \\ 2 & 5 & 1 \end{bmatrix},
\]
both matrices
\[
\begin{bmatrix} -5 & 3 \\ 2 & -1 \\ 0 & 0 \end{bmatrix}
\qquad\text{and}\qquad
\begin{bmatrix} -3 & 1 \\ 1 & 0 \\ 1 & -1 \end{bmatrix}
\]
are right inverses for A. Tall, skinny matrices (more rows than columns) can never have right inverses. Indeed, if \(\mathbf{A} \in \mathbb{R}^{m \times n}\) with \(m > n\) has a right inverse \(\mathbf{B}_R\), then \(\mathbf{B}_R \in \mathbb{R}^{n \times m}\) is short and fat (more columns than rows), and we are assured by Theorem 1.2 that the homogeneous system of equations \(\mathbf{B}_R x = 0\) must have a nontrivial solution \(\hat{x} \neq 0\). But then we find ourselves in a dilemma, since now
\[
\hat{x} = \mathbf{I}\hat{x} = \mathbf{A}(\mathbf{B}_R \hat{x}) = \mathbf{A}\,0 = 0.
\]
So it can't happen that an \(\mathbf{A} \in \mathbb{R}^{m \times n}\) with \(m > n\) has a right inverse!

How do we go about calculating right inverses? Partition \(\mathbf{B}_R\) and \(\mathbf{I}_m\) by columns:
\[
\mathbf{B}_R = [\,b_1\; b_2\; \cdots\; b_m\,], \qquad \mathbf{I}_m = [\,e_1\; e_2\; \cdots\; e_m\,].
\]
Then the condition \(\mathbf{A}\mathbf{B}_R = \mathbf{I}_m\) means that the columns of \(\mathbf{B}_R\) are solutions to the multiple linear systems \(\mathbf{A}b_i = e_i\) for \(i = 1, \ldots, m\). We can solve these systems simultaneously with Gauss-Jordan elimination by forming the fat augmented matrix
\[
[\,\mathbf{A} \mid e_1\; e_2\; \cdots\; e_m\,] = [\,\mathbf{A} \mid \mathbf{I}_m\,]
\]
and then reducing to reduced row echelon form. For each choice of the free parameters that arise in the final stage of Gauss-Jordan elimination, a set of solutions is obtained, which then leads to a right inverse. Using as an example the matrix A above, we find
\[
[\,\mathbf{A} \mid \mathbf{I}_m\,] =
\left[\begin{array}{ccc|cc} 1 & 3 & 1 & 1 & 0 \\ 2 & 5 & 1 & 0 & 1 \end{array}\right]
\longrightarrow
\left[\begin{array}{ccc|cc} 1 & 0 & -2 & -5 & 3 \\ 0 & 1 & 1 & 2 & -1 \end{array}\right]
\quad\text{(reduced row echelon form)}.
\]
So, labeling the free parameters as \(s\) and \(t\) for the first and second right-hand side columns respectively, all right inverses are given by
\[
\mathbf{B}_R = \begin{bmatrix} -5 + 2s & 3 + 2t \\ 2 - s & -1 - t \\ s & t \end{bmatrix},
\]
for all the various choices of \(s\) and \(t\).
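The two-parameter family found above can be verified numerically; the NumPy sketch below checks that every member really is a right inverse of A.

```python
import numpy as np

A = np.array([[1., 3., 1.],
              [2., 5., 1.]])

def right_inverse(s, t):
    # The two-parameter family of right inverses found by Gauss-Jordan elimination
    return np.array([[-5 + 2*s, 3 + 2*t],
                     [ 2 -   s, -1 -  t],
                     [       s,       t]])

# A B_R = I_2 for every choice of the free parameters s and t
for s, t in [(0.0, 0.0), (1.0, -1.0), (2.5, 7.0)]:
    assert np.allclose(A @ right_inverse(s, t), np.eye(2))
```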
Theorem 1.13. The following statements are equivalent (whenever one is true, they all must be true; whenever one is false, they all must be false):
- A matrix A has a right inverse.
- The reduced row echelon form for A has a leading one in each row.
- The system of equations \(\mathbf{A}x = b\) is consistent for each possible right-hand side \(b\).

Proof. We proceed by showing that the first statement implies the second, which in turn implies the third, which finally comes around and implies the first. Suppose that A has a right inverse (call it B) but that somehow the reduced row echelon form for A fails to have a leading one in each row, and so instead must have a bottom row of zeros (and maybe other rows of zeros too). This means that there are \(m \times m\) elementary matrices \(\{\mathbf{E}_i\}_{i=1}^{k}\) so that
\[
\mathbf{E}_k \mathbf{E}_{k-1} \cdots \mathbf{E}_2 \mathbf{E}_1 \mathbf{A} = \begin{bmatrix} \mathbf{R} \\ 0 \end{bmatrix}.
\]
Now postmultiply by the hypothesized right inverse of A to get
\[
\mathbf{E}_k \mathbf{E}_{k-1} \cdots \mathbf{E}_2 \mathbf{E}_1 = \mathbf{E}_k \mathbf{E}_{k-1} \cdots \mathbf{E}_2 \mathbf{E}_1 \mathbf{A}\mathbf{B} = \begin{bmatrix} \mathbf{R} \\ 0 \end{bmatrix} \mathbf{B} = \begin{bmatrix} \mathbf{R}\mathbf{B} \\ 0 \end{bmatrix}.
\]
This amounts to saying that the product of the elementary matrices \(\{\mathbf{E}_i\}_{i=1}^{k}\) has a bottom row of all zeros! But this can't happen, since if it were true and \(\{\hat{\mathbf{E}}_i\}_{i=1}^{k}\) denoted the elementary matrices inverse to \(\{\mathbf{E}_i\}_{i=1}^{k}\), then postmultiplication in turn by \(\hat{\mathbf{E}}_1, \hat{\mathbf{E}}_2, \ldots, \hat{\mathbf{E}}_{k-1}, \hat{\mathbf{E}}_k\) yields
\[
\mathbf{I} = \begin{bmatrix} \mathbf{R}\mathbf{B} \\ 0 \end{bmatrix} \hat{\mathbf{E}}_1 \hat{\mathbf{E}}_2 \cdots \hat{\mathbf{E}}_{k-1} \hat{\mathbf{E}}_k
= \begin{bmatrix} \mathbf{R}\mathbf{B}\hat{\mathbf{E}}_1 \hat{\mathbf{E}}_2 \cdots \hat{\mathbf{E}}_{k-1} \hat{\mathbf{E}}_k \\ 0 \end{bmatrix},
\]
which can't be true, since the identity matrix has no rows of zeros. Thus, anytime a matrix A has a right inverse it must also have a reduced row echelon form with a leading one in each row.

Now suppose the reduced row echelon form for \(\mathbf{A} \in \mathbb{R}^{m \times n}\) has a leading one in each row. This means, in particular, that \(m \leq n\) and that every system of equations of the form \(\mathbf{A}x = b\) is consistent for any \(b\) that might be chosen (since after row reduction every equation is associated with a different variable, possibly together with some free parameters).
Consistency for all right-hand sides, in turn, implies that there are solutions to each of the multiple linear systems \(\mathbf{A}b_i = e_i\) for \(i = 1, \ldots, m\), and the condition \(\mathbf{A}\mathbf{B}_R = \mathbf{I}_m\) can be satisfied column by column. Thus, anytime a matrix A has a reduced row echelon form with a leading one in each row, \(\mathbf{A}x = b\) is consistent for every \(b\); and if a matrix A is such that \(\mathbf{A}x = b\) is consistent for every \(b\), then A must have a right inverse. □

Working now from the other side, suppose instead we have a matrix \(\mathbf{B}_L\) so that \(\mathbf{B}_L \mathbf{A} = \mathbf{I}\). Appropriately enough, \(\mathbf{B}_L\) is called a left inverse of A. If \(\mathbf{A} \in \mathbb{R}^{m \times n}\) has a left inverse \(\mathbf{B}_L\), then \(\mathbf{B}_L \in \mathbb{R}^{n \times m}\), and whenever the system of equations \(\mathbf{A}x = b\) is consistent (it might not be) there can only be one unique solution to the system. Indeed, suppose there were two solutions \(x_1\) and \(x_2\) to the linear system \(\mathbf{A}x = b\). Then using \(\mathbf{B}_L \mathbf{A} = \mathbf{I}\) we find
\[
x_1 = \mathbf{I}x_1 = \mathbf{B}_L \mathbf{A} x_1 = \mathbf{B}_L b = \mathbf{B}_L \mathbf{A} x_2 = \mathbf{I}x_2 = x_2
\]
(the two solutions are the same!).

Theorem 1.14. The following statements are equivalent (whenever one is true, they all must be true; whenever one is false, they all must be false):
- A matrix A has a left inverse.
- The reduced row echelon form for A has a leading one in each column.
- Whenever the system of equations \(\mathbf{A}x = b\) is consistent, there can only be one unique solution to the system.

Proof. We will proceed by showing that the first two statements are equivalent and then showing that the second and third statements are equivalent. Suppose the reduced row echelon form for \(\mathbf{A} \in \mathbb{R}^{m \times n}\) with \(m \geq n\) has a leading one in each column. Then there are \(m \times m\) elementary matrices \(\{\mathbf{E}_i\}_{i=1}^{k}\) so that
\[
\mathbf{E}_k \mathbf{E}_{k-1} \cdots \mathbf{E}_2 \mathbf{E}_1 \mathbf{A} = \begin{bmatrix} \mathbf{I}_n \\ 0 \end{bmatrix}.
\]
Define
\[
\mathbf{B}_L = [\,\mathbf{I}_n \;\; \mathbf{Z}\,]\, \mathbf{E}_k \mathbf{E}_{k-1} \cdots \mathbf{E}_2 \mathbf{E}_1
\]
for any \(\mathbf{Z} \in \mathbb{R}^{n \times (m-n)}\). Then direct substitution verifies that \(\mathbf{B}_L \mathbf{A} = \mathbf{I}_n\), and we have exhibited a left inverse for A. Conversely, suppose that A does have a left inverse \(\mathbf{B}_L \in \mathbb{R}^{n \times m}\) but that nonetheless it somehow fails to have a reduced row echelon form with a leading one in each column.
Then Gauss-Jordan elimination on the augmented matrix \([\,\mathbf{A} \mid 0\,]\) representing the homogeneous system of equations \(\mathbf{A}x = 0\) yields
\[
\begin{bmatrix} \mathbf{R} & 0 \\ 0 & 0 \end{bmatrix},
\]
where R is the matrix containing the nontrivial rows, i.e., whatever nonzero entries remain in the reduced row echelon form. Since R contains fewer than \(n\) leading ones, it must be short and fat, and by Theorem 1.2 there will be a nontrivial solution \(\hat{x} \neq 0\) to \(\mathbf{R}x = 0\) and hence to \(\mathbf{A}x = 0\). A conflict arises, since then
\[
\hat{x} = \mathbf{I}\hat{x} = \mathbf{B}_L(\mathbf{A}\hat{x}) = 0.
\]
So whenever A has a left inverse, its reduced row echelon form has a leading one in each column.

To show that the second and third statements are equivalent, observe that the reduced row echelon form of the augmented matrix \([\,\mathbf{A} \mid b\,]\) will allow no free parameters exactly when the reduced row echelon form for A has leading ones in each column. No free parameters in the reduced row echelon form of \([\,\mathbf{A} \mid b\,]\) amounts to the assertion that the solution to \(\mathbf{A}x = b\) is unique, if it exists (the system might still be inconsistent, after all). □

Just as for right inverses, a matrix could have more than one left inverse, or none at all. Short fat matrices cannot have left inverses. Similar to the case of right inverses (but not in exactly the same way), all left inverses of a matrix can be revealed through Gauss elimination. Notice first that if a matrix A has a left inverse \(\mathbf{B}_L\), then \(\mathbf{B}_L^t\) is a right inverse of \(\mathbf{A}^t\). So to compute a left inverse, we go through the process described above to calculate the right inverses of the transposed matrix and transpose the result. For example, the set of left inverses to the matrix A defined as
\[
\mathbf{A} = \begin{bmatrix} 1 & 1 \\ 1 & 1 \\ 2 & 3 \end{bmatrix}
\]
can be obtained from
\[
[\,\mathbf{A}^t \mid \mathbf{I}_2\,] =
\left[\begin{array}{ccc|cc} 1 & 1 & 2 & 1 & 0 \\ 1 & 1 & 3 & 0 & 1 \end{array}\right]
\longrightarrow
\left[\begin{array}{ccc|cc} 1 & 1 & 0 & 3 & -2 \\ 0 & 0 & 1 & -1 & 1 \end{array}\right]
\quad\text{(reduced row echelon form)}.
\]
Labeling the free parameters as \(s\) and \(t\) for the first and second right-hand side columns respectively, all left inverses are given by
\[
\mathbf{B}_L = \begin{bmatrix} 3 - s & s & -1 \\ -2 - t & t & 1 \end{bmatrix},
\]
for all the various choices of \(s\) and \(t\).
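The left-inverse family, and the transpose relationship it came from, can be sanity-checked with a few lines of NumPy:

```python
import numpy as np

A = np.array([[1., 1.],
              [1., 1.],
              [2., 3.]])

def left_inverse(s, t):
    # Two-parameter family obtained by transposing the right inverses of A^t
    return np.array([[ 3 - s, s, -1.],
                     [-2 - t, t,  1.]])

for s, t in [(0.0, 0.0), (1.0, 2.0), (-4.0, 0.5)]:
    B_L = left_inverse(s, t)
    assert np.allclose(B_L @ A, np.eye(2))      # B_L is a left inverse of A
    assert np.allclose(A.T @ B_L.T, np.eye(2))  # so B_L^t is a right inverse of A^t
```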
Note that a solution to the scalar equation \(\alpha x = y\) exists and is unique for each right-hand side \(y\) if and only if \(\alpha \neq 0\), and this is precisely the condition under which \(\alpha\) has a reciprocal \(\beta = 1/\alpha\). By analogy, then, it is natural to consider the circumstances under which a solution \(\hat{x}\) to a system of linear equations \(\mathbf{A}x = b\) is guaranteed both to exist and to be unique for each right-hand side \(b\) to be exactly the circumstances under which we consider A to be "invertible". From the above discussion, such a situation implies that both a left and a right inverse exist for A. Notice this implies that \(m = n\), that is, that A is a square matrix. Furthermore, if both \(\mathbf{B}_R\) and \(\mathbf{B}_L\) exist, then in fact
\[
\mathbf{B}_L = \mathbf{B}_L \mathbf{I} = \mathbf{B}_L \mathbf{A} \mathbf{B}_R = \mathbf{I}\mathbf{B}_R = \mathbf{B}_R
\]
(\(\mathbf{B}_L\) and \(\mathbf{B}_R\) must be equal!). This leads us to the following definition.

An \(n \times n\) matrix A is said to be invertible (or, equivalently, nonsingular) if there exists an \(n \times n\) matrix B such that \(\mathbf{A}\mathbf{B} = \mathbf{I}\) and \(\mathbf{B}\mathbf{A} = \mathbf{I}\). The matrix B is called the inverse of A and is denoted by \(\mathbf{B} = \mathbf{A}^{-1}\). If no such B exists that satisfies both \(\mathbf{A}\mathbf{B} = \mathbf{I}\) and \(\mathbf{B}\mathbf{A} = \mathbf{I}\), we say that A is noninvertible or singular. Again, as we have defined it, only square matrices are candidates for invertibility (though left and right inverses can serve as useful generalizations to rectangular matrices).

Theorem 1.15. The following statements are equivalent (whenever one is true, they all must be true; whenever one is false, they all must be false):
- A matrix A is invertible.
- The matrix A is square and, whenever the system of equations \(\mathbf{A}x = b\) is consistent, there can only be one unique solution to the system.
- The reduced row echelon form for A is the identity matrix, \(\mathbf{I}_n\).
- The matrix A is square and the system of equations \(\mathbf{A}x = b\) is consistent for each possible right-hand side \(b\).

This result has as a pleasant consequence that a square matrix A has a left inverse if and only if it has a right inverse as well.
Furthermore, if \(\mathbf{A}^{(1)}\) and \(\mathbf{A}^{(2)}\) are two invertible matrices of the same size, then the product \(\mathbf{A}^{(1)}\mathbf{A}^{(2)}\) is also invertible and
\[
(\mathbf{A}^{(1)}\mathbf{A}^{(2)})^{-1} = (\mathbf{A}^{(2)})^{-1}(\mathbf{A}^{(1)})^{-1};
\]
that is, the inverse of a product is the product of the inverses taken in reverse order. This is easy to see by writing down
\[
(\mathbf{A}^{(1)}\mathbf{A}^{(2)})(\mathbf{A}^{(2)})^{-1}(\mathbf{A}^{(1)})^{-1}
= \mathbf{A}^{(1)}\bigl(\mathbf{A}^{(2)}(\mathbf{A}^{(2)})^{-1}\bigr)(\mathbf{A}^{(1)})^{-1}
= \mathbf{A}^{(1)}\mathbf{I}(\mathbf{A}^{(1)})^{-1} = \mathbf{I}.
\]
Thus \((\mathbf{A}^{(2)})^{-1}(\mathbf{A}^{(1)})^{-1}\) is a (square) right inverse for \(\mathbf{A}^{(1)}\mathbf{A}^{(2)}\) and hence is also its left inverse.

One can check (by multiplying out) that the matrix
\[
\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 3 \\ 1 & 0 & 8 \end{bmatrix}
\]
is invertible with inverse given by
\[
\mathbf{A}^{-1} = \begin{bmatrix} -40 & 16 & 9 \\ 13 & -5 & -3 \\ 5 & -2 & -1 \end{bmatrix}.
\]
Given A, we compute its inverse by solving the system \(\mathbf{A}\mathbf{B} = \mathbf{I}\), as above. This leads us to the following procedure. Write the fat augmented matrix
\[
\left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 2 & 5 & 3 & 0 & 1 & 0 \\ 1 & 0 & 8 & 0 & 0 & 1 \end{array}\right]
\]
and reduce the left-hand \(3 \times 3\) submatrix to reduced row echelon form using elementary row operations. The resulting right-hand \(3 \times 3\) matrix will be the inverse, if the resulting reduced row echelon form is the identity matrix. We pick up the process from the time the left matrix is in row echelon form:
\[
\left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & -3 & -2 & 1 & 0 \\ 0 & 0 & 1 & 5 & -2 & -1 \end{array}\right]
\longrightarrow
\left[\begin{array}{ccc|ccc} 1 & 2 & 0 & -14 & 6 & 3 \\ 0 & 1 & 0 & 13 & -5 & -3 \\ 0 & 0 & 1 & 5 & -2 & -1 \end{array}\right]
\longrightarrow
\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & -40 & 16 & 9 \\ 0 & 1 & 0 & 13 & -5 & -3 \\ 0 & 0 & 1 & 5 & -2 & -1 \end{array}\right].
\]

Example 1.16. Solve the system
\[
\begin{aligned}
x_1 + 2x_2 + 3x_3 &= 1 \\
2x_1 + 5x_2 + 3x_3 &= 0 \\
x_1 \phantom{{}+2x_2} + 8x_3 &= 1
\end{aligned}
\]
Writing the system as \(\mathbf{A}x = b\), we obtain
\[
x = \begin{bmatrix} -40 & 16 & 9 \\ 13 & -5 & -3 \\ 5 & -2 & -1 \end{bmatrix} \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}.
\]
Multiplying out, the final solution is \(x_1 = -31\), \(x_2 = 10\), and \(x_3 = 4\).

If a matrix is not invertible, the procedure used for computing inverses will exhibit this, just as Gauss elimination applied to an inconsistent system will reveal that inconsistency. For example, let
\[
\mathbf{A} = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 2 & 0 \\ 2 & 3 & 1 \end{bmatrix}.
\]
At one point in the reduction procedure we arrive at the augmented matrix
\[
\left[\begin{array}{ccc|ccc} 1 & 0 & 2 & 2 & -1 & 0 \\ 0 & 1 & -1 & -1 & 1 & 0 \\ 0 & 0 & 0 & -1 & -1 & 1 \end{array}\right].
\]
We can stop here, since the left \(3 \times 3\) matrix is not I but is in reduced row echelon form. Our conclusion is that A has no inverse.
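The worked inverse and Example 1.16 can be reproduced numerically. A NumPy sketch (using the library's built-in inverse rather than the hand reduction above) follows.

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 5., 3.],
              [1., 0., 8.]])

A_inv = np.linalg.inv(A)
expected = np.array([[-40., 16.,  9.],
                     [ 13., -5., -3.],
                     [  5., -2., -1.]])
assert np.allclose(A_inv, expected)

# Example 1.16: solve Ax = (1, 0, 1)^t by multiplying with the inverse
b = np.array([1., 0., 1.])
x = A_inv @ b
assert np.allclose(x, [-31., 10., 4.])

# The singular example: its determinant vanishes, so no inverse exists
S = np.array([[1., 1., 1.],
              [1., 2., 0.],
              [2., 3., 1.]])
assert abs(np.linalg.det(S)) < 1e-10
```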
In Problems 1.6 and 1.7, determine which matrices are invertible. If the matrix is invertible, find its inverse.

Problem 1.6.
\[
\begin{bmatrix} 1 & 3 & 2 \\ 0 & 2 & 1 \\ 3 & 3 & 6 \end{bmatrix}
\]

Problem 1.7.
\[
\begin{bmatrix} 2 & 4 & 1 \\ 4 & 4 & 12 \\ 1 & 3 & 5 \end{bmatrix}
\]

Problem 1.8.
1. A matrix \(\mathbf{D} = [d_{ij}]\) is called diagonal if all entries off the main diagonal are zero: \(d_{ij} = 0\) whenever \(i \neq j\). Show that a square diagonal matrix with no zero entries on the main diagonal (\(d_{ii} \neq 0\)) is invertible and find its inverse. Suppose \(\mathbf{A} = (a_{ij})\) is \(n \times n\). If \(a_{1n}, a_{2,n-1}, \ldots, a_{n1}\) are all nonzero while all the other entries of A are zero, show that A is invertible and find its inverse.
2. A matrix \(\mathbf{T} = [t_{ij}]\) is called upper triangular if all entries below the main diagonal are zero: \(t_{ij} = 0\) whenever \(i > j\). Show that a square upper triangular matrix with no zero entries on the main diagonal (\(t_{ii} \neq 0\)) is invertible and show that its inverse is also upper triangular.

Problem 1.9.
1. Explain carefully why short fat matrices \(\mathbf{A} \in \mathbb{R}^{m \times n}\) with \(m < n\) cannot have left inverses.
2. Suppose \(\mathbf{A}^{(1)}\) and \(\mathbf{A}^{(2)}\) are two matrices dimensioned so that the product \(\mathbf{A}^{(1)}\mathbf{A}^{(2)}\) is well-defined. Show that if \(\mathbf{A}^{(1)}\) and \(\mathbf{A}^{(2)}\) each have left inverses, \(\mathbf{B}_L^{(1)}\) and \(\mathbf{B}_L^{(2)}\) respectively, then the product \(\mathbf{A}^{(1)}\mathbf{A}^{(2)}\) also has a left inverse, given by \(\mathbf{B}_L^{(2)}\mathbf{B}_L^{(1)}\).
3. Give an example of a matrix that has neither a left nor a right inverse.
4. If a matrix \(\mathbf{A} \in \mathbb{R}^{m \times n}\) with \(m < n\) has a right inverse, how many free parameters does the family of all right inverses for A have?

1.4 The LU Decomposition

Prerequisites:
- Linear systems of equations
- Matrix arithmetic
- Gauss elimination
- Elementary matrices
- Matrix inverses

Learning Objectives:
- Familiarity with the definition of the LU decomposition of a matrix.
- Ability to use an LU decomposition to solve a linear system.
- Detailed understanding of the representation of Gauss elimination using elementary matrices.
- Ability to compute the permuted LU decomposition of a matrix.
Matrix decompositions play a fundamental role in modern matrix theory, as they often reveal structural features of the transformations the matrices represent. The LU decomposition is perhaps the most basic of these decompositions and is intimately related to Gaussian elimination. Without yet saying exactly how this would be done from scratch, notice that the original coefficient matrix of (1.1),
\[
\mathbf{A} = \begin{bmatrix} 1 & 2 & 2 \\ 3 & 5 & 4 \\ 2 & 1 & -3 \end{bmatrix},
\]
can be factored into the product of two matrices LU, where U is an upper triangular matrix (that is, nonzero entries are on or above the main diagonal) and L is a unit lower triangular matrix (all nonzero entries are on or below the main diagonal, with ones on the diagonal). In particular, one can multiply out to check that \(\mathbf{A} = \mathbf{L}\mathbf{U}\) with
\[
\mathbf{L} = \begin{bmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 2 & 3 & 1 \end{bmatrix},
\qquad
\mathbf{U} = \begin{bmatrix} 1 & 2 & 2 \\ 0 & -1 & -2 \\ 0 & 0 & -1 \end{bmatrix}.
\]
Assuming that such a decomposition is known, consider the following alternate approach to solving the system of equations \(\mathbf{A}x = b\). Write \(\mathbf{A}x = b\) as \(\mathbf{L}(\mathbf{U}x) = b\) and define \(y = \mathbf{U}x\). Then solve in sequence the two (simple) triangular systems. First, solve \(\mathbf{L}y = b\) for \(y\), or explicitly for (1.1),
\[
\begin{aligned}
y_1 &= 3 \\
3y_1 + y_2 &= 5 \\
2y_1 + 3y_2 + y_3 &= -7
\end{aligned}
\quad\Longrightarrow\quad
\begin{aligned}
y_1 &= 3 \\
y_2 &= -4 \\
y_3 &= -1
\end{aligned}
\]
then solve \(\mathbf{U}x = y\) for \(x\), or explicitly for (1.1),
\[
\begin{aligned}
x_1 + 2x_2 + 2x_3 &= 3 \;(= y_1) \\
-x_2 - 2x_3 &= -4 \;(= y_2) \\
-x_3 &= -1 \;(= y_3)
\end{aligned}
\quad\Longrightarrow\quad
\begin{aligned}
x_1 &= -3 \\
x_2 &= 2 \\
x_3 &= 1
\end{aligned}
\]
A matrix A has an LU decomposition if \(\mathbf{A} = \mathbf{L}\mathbf{U}\) with L a unit lower triangular matrix and U an upper triangular matrix. Not every matrix has an LU decomposition. For example, one can attempt to complete the unknown entries of
\[
\mathbf{A} = \begin{bmatrix} 0 & 1 \\ 1 & 1 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ \ell & 1 \end{bmatrix}
\begin{bmatrix} u_{11} & u_{12} \\ 0 & u_{22} \end{bmatrix}
\]
to see that there is no choice that would satisfy the condition \(\mathbf{A} = \mathbf{L}\mathbf{U}\).
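The two-triangular-solve strategy is easy to sketch in NumPy with explicit forward and back substitution; the numbers below are one consistent instance of the factored system above.

```python
import numpy as np

L = np.array([[1., 0., 0.],
              [3., 1., 0.],
              [2., 3., 1.]])
U = np.array([[1.,  2.,  2.],
              [0., -1., -2.],
              [0.,  0., -1.]])
b = np.array([3., 5., -7.])

assert np.allclose(L @ U, [[1., 2., 2.], [3., 5., 4.], [2., 1., -3.]])

# Forward substitution: solve L y = b (L has unit diagonal)
y = b.copy()
for i in range(3):
    y[i] -= L[i, :i] @ y[:i]

# Back substitution: solve U x = y
x = np.zeros(3)
for i in range(2, -1, -1):
    x[i] = (y[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]

assert np.allclose(y, [3., -4., -1.])
assert np.allclose(x, [-3., 2., 1.])
```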
To see how this decomposition could be computed in circumstances where it does exist, let us consider the row reduction phase of Gauss elimination on the matrix above. In this case, three elementary row operations, all of Type 1, are necessary to reduce A to row echelon form:
- Zero out the \((2,1)\) entry by adding \(-3\,(\text{row }1)\) to (row 2).
- Zero out the \((3,1)\) entry by adding \(-2\,(\text{row }1)\) to (row 3).
- Zero out the \((3,2)\) entry by adding \(-3\,(\text{row }2)\) to (row 3).
When manifested as matrix multiplications this appears as:
\[
\begin{bmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 2 \\ 3 & 5 & 4 \\ 2 & 1 & -3 \end{bmatrix}
= \begin{bmatrix} 1 & 2 & 2 \\ 0 & -1 & -2 \\ 2 & 1 & -3 \end{bmatrix},
\]
\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 2 \\ 0 & -1 & -2 \\ 2 & 1 & -3 \end{bmatrix}
= \begin{bmatrix} 1 & 2 & 2 \\ 0 & -1 & -2 \\ 0 & -3 & -7 \end{bmatrix},
\]
\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & -3 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 2 \\ 0 & -1 & -2 \\ 0 & -3 & -7 \end{bmatrix}
= \begin{bmatrix} 1 & 2 & 2 \\ 0 & -1 & -2 \\ 0 & 0 & -1 \end{bmatrix}.
\]
Or, written altogether,
\[
\mathbf{E}_{3,2}\mathbf{E}_{3,1}\mathbf{E}_{2,1}\mathbf{A} = \mathbf{R},
\]
where we have denoted the Type 1 elementary matrix that introduces a zero at the \((i,j)\) location of A by \(\mathbf{E}_{i,j}\). Observe that
\[
\mathbf{E}_{3,2}\mathbf{E}_{3,1}\mathbf{E}_{2,1} = \begin{bmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ 7 & -3 & 1 \end{bmatrix},
\qquad\text{whereas}\qquad
(\mathbf{E}_{3,2}\mathbf{E}_{3,1}\mathbf{E}_{2,1})^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 2 & 3 & 1 \end{bmatrix}.
\]
This is exactly the LU decomposition we gave above:
\[
\mathbf{A} = (\mathbf{E}_{3,2}\mathbf{E}_{3,1}\mathbf{E}_{2,1})^{-1}\mathbf{R}
= \begin{bmatrix} 1 & 0 & 0 \\ 3 & 1 & 0 \\ 2 & 3 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 2 \\ 0 & -1 & -2 \\ 0 & 0 & -1 \end{bmatrix} = \mathbf{L}\mathbf{U}.
\]
Notice that the multipliers (with changed sign) are placed at the locations in L precisely where those zeros were in the matrix that the multipliers had a role in introducing. To see how this will work in general, consider first the simplest case, where only Type 1 operations (i.e., adding multiples of rows to one another) are sufficient to complete reduction to row echelon form. As before, let \(\mathbf{E}_{i,j}\) denote the Type 1 elementary matrix that introduces a zero into the \((i,j)\) location of the matrix being reduced, and suppose specifically that this is done by adding \(\lambda_{ij}\,(\text{row } j)\) to \((\text{row } i)\). By writing it out, one may verify that \(\mathbf{E}_{i,j}\) has a convenient representation:
\[
\mathbf{E}_{i,j} = \mathbf{I} + \lambda_{ij}\, e_i e_j^t.
\]
Since we only introduce zeros below the pivot entry in a given column, we can always assume that \(\mathbf{E}_{i,j}\) is defined just for \(i > j\). Let's group together the Type 1 elementary matrices that work on the same column of A.
Define the matrices
\[
\mathbf{M}_1 = \mathbf{E}_{n,1}\mathbf{E}_{n-1,1}\cdots\mathbf{E}_{2,1}, \qquad
\mathbf{M}_2 = \mathbf{E}_{n,2}\mathbf{E}_{n-1,2}\cdots\mathbf{E}_{3,2},
\]
and in general
\[
\mathbf{M}_j = \mathbf{E}_{n,j}\mathbf{E}_{n-1,j}\cdots\mathbf{E}_{j+1,j}.
\]
Each of the matrices \(\mathbf{M}_j\) effects the row reduction that introduces zeros into the entries \(j+1, \ldots, n-1, n\) of (column \(j\)) by adding multiples of (row \(j\)). The matrix \(\mathbf{M}_j\) is called a Gauss transformation and can be represented in the following useful way:
\[
\mathbf{M}_j = \mathbf{I} - \ell_j e_j^t
\qquad\text{where}\qquad
\ell_j = [\,\underbrace{0, 0, \ldots, 0}_{j}, -\lambda_{j+1,j}, \ldots, -\lambda_{n-1,j}, -\lambda_{n,j}\,]^t.
\]
This can be seen directly by multiplying out
\[
\mathbf{E}_{n,j}\mathbf{E}_{n-1,j}\cdots\mathbf{E}_{j+1,j}
= \mathbf{I} + (\lambda_{n,j}e_n + \lambda_{n-1,j}e_{n-1} + \cdots + \lambda_{j+1,j}e_{j+1})\,e_j^t
\]
(all other products in the expansion are 0). Then, to complete the reduction to row echelon form, we have
\[
\mathbf{M}_{n-1}\cdots\mathbf{M}_2\mathbf{M}_1\mathbf{A} = \mathbf{R}
\qquad\text{and}\qquad
\mathbf{A} = (\mathbf{M}_1^{-1}\mathbf{M}_2^{-1}\mathbf{M}_3^{-1}\cdots\mathbf{M}_{n-1}^{-1})\,\mathbf{R}.
\]
Now, note that:
1. For each \(j = 1, \ldots, n-1\), we have \(\mathbf{M}_j^{-1} = \mathbf{I} + \ell_j e_j^t\) (just multiply it out to check!).
2. For each \(i < j\),
\[
\mathbf{M}_i^{-1}\mathbf{M}_j^{-1} = \mathbf{I} + \ell_i e_i^t + \ell_j e_j^t + \ell_i \underbrace{(e_i^t \ell_j)}_{=0} e_j^t.
\]
But then
\[
\begin{aligned}
\mathbf{M}_1^{-1}\cdots\mathbf{M}_{n-1}^{-1}
&= (\mathbf{I} + \ell_1 e_1^t)(\mathbf{I} + \ell_2 e_2^t)(\mathbf{I} + \ell_3 e_3^t)\cdots(\mathbf{I} + \ell_{n-1}e_{n-1}^t) \\
&= (\mathbf{I} + \ell_1 e_1^t + \ell_2 e_2^t)(\mathbf{I} + \ell_3 e_3^t)\cdots(\mathbf{I} + \ell_{n-1}e_{n-1}^t) \\
&= (\mathbf{I} + \ell_1 e_1^t + \ell_2 e_2^t + \ell_3 e_3^t)\cdots(\mathbf{I} + \ell_{n-1}e_{n-1}^t) \\
&\;\;\vdots \\
&= \mathbf{I} + \sum_{j=1}^{n-1} \ell_j e_j^t,
\end{aligned}
\]
which is our lower triangular matrix
\[
\mathbf{L} = \begin{bmatrix}
1 & 0 & \cdots & 0 \\
\ell_{21} & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
\ell_{n1} & \ell_{n2} & \cdots & 1
\end{bmatrix}.
\]
This is exactly the pattern of multipliers we saw in the first example. The general case, where A is an \(m \times n\) matrix, is somewhat more complicated. We commented in §1 that it is always possible to achieve row echelon form using only Type 1 and Type 2 elementary operations. But Type 2 operations play a role only when a row interchange is necessary to bring a nonzero entry into the pivot position, in order then to zero out all nonzero entries below it in the same column using subsequent Type 1 elementary operations.
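The Gauss-transformation identities, applied to the worked example of this section, can be verified in a few lines of NumPy:

```python
import numpy as np

A = np.array([[1., 2.,  2.],
              [3., 5.,  4.],
              [2., 1., -3.]])

def gauss_transform(l, j, n=3):
    """M_j = I - l e_j^t, where l carries the sign-changed multipliers."""
    M = np.eye(n)
    M[:, j] -= l        # subtract l e_j^t: only column j is affected
    return M

l1 = np.array([0., 3., 2.])   # multipliers for column 1
l2 = np.array([0., 0., 3.])   # multipliers for column 2

M1 = gauss_transform(l1, 0)
M2 = gauss_transform(l2, 1)

U = M2 @ M1 @ A               # reduction to row echelon form
assert np.allclose(U, [[1., 2., 2.], [0., -1., -2.], [0., 0., -1.]])

# M_j^{-1} = I + l_j e_j^t, and the product of inverses assembles L directly
M1_inv = np.eye(3); M1_inv[:, 0] += l1
M2_inv = np.eye(3); M2_inv[:, 1] += l2
L = M1_inv @ M2_inv
assert np.allclose(L, np.eye(3) + np.outer(l1, [1., 0., 0.]) + np.outer(l2, [0., 1., 0.]))
assert np.allclose(L @ U, A)
```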
For example, the matrix
\[
\begin{bmatrix} 1 & 2 & 2 \\ 3 & 6 & 4 \\ 2 & 1 & -3 \end{bmatrix}
\]
cannot be brought into row echelon form using only Type 1 operations. After two steps we find
\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 2 \\ 3 & 6 & 4 \\ 2 & 1 & -3 \end{bmatrix}
= \begin{bmatrix} 1 & 2 & 2 \\ 0 & 0 & -2 \\ 0 & -3 & -7 \end{bmatrix}.
\]
However, a row interchange at this point leaves us in row echelon form:
\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ -2 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} 1 & 2 & 2 \\ 3 & 6 & 4 \\ 2 & 1 & -3 \end{bmatrix}
= \begin{bmatrix} 1 & 2 & 2 \\ 0 & -3 & -7 \\ 0 & 0 & -2 \end{bmatrix}.
\]
Thus, the following sequence of steps occurs in the course of row reduction in general:
1. (Possible) row interchange of (row 1) with a lower row.
2. Row reduction using (row 1) to introduce zeros into entries \(2, 3, \ldots, n\) of (column 1).
3. (Possible) row interchange of (row 2) with a lower row.
4. Row reduction using (row 2) to introduce zeros into entries \(3, 4, \ldots, n\) of (column 2).
5. \(\ldots\)

In order to represent this reduction process with matrix multiplication, let \(\Pi_j\) denote the elementary matrix that effects a row interchange of (row \(j\)) with some appropriate lower row. The interleaving of Type 1 and Type 2 operations used to reduce to row echelon form as described above then appears as
\[
\mathbf{M}_{n-1}\Pi_{n-1}\cdots\mathbf{M}_2\Pi_2\mathbf{M}_1\Pi_1\,\mathbf{A} = \mathbf{R}.
\]
The matrices \(\Pi_i\) are elementary matrices associated with Type 2 (row interchange) operations; each \(\Pi_i\) is an example of a permutation matrix. In general, permutation matrices are obtained by permuting the columns (or rows) of the identity matrix. As a result, it's not hard to see that any product of permutation matrices is also a permutation matrix, and if P is a permutation matrix then \(\mathbf{P}\mathbf{P}^t = \mathbf{I}\). Since products of permutation matrices are permutation matrices (how would you explain that?) and since products of Gauss transformations are unit lower triangular matrices that lead in a simple way to our L, we'd like to separate out the Type 1 and Type 2 operations and collect each together separately. The key observation that lets us do this is that for \(i > j\),
\[
\Pi_i \mathbf{M}_j = \widehat{\mathbf{M}}_j \Pi_i,
\]
where \(\widehat{\mathbf{M}}_j\) is a Gauss transformation of the same form as \(\mathbf{M}_j\) but with two multipliers interchanged (the \(i\)th and one with index bigger than \(i\), according to \(\Pi_i\)).
Putting this together,
\[
\mathbf{M}_{n-1}\Pi_{n-1}\cdots\mathbf{M}_3\Pi_3\mathbf{M}_2\Pi_2\mathbf{M}_1\Pi_1\,\mathbf{A} = \mathbf{R}
\]
becomes, after commuting the permutations past the Gauss transformations,
\[
(\widehat{\mathbf{M}}_{n-1}\cdots\widehat{\mathbf{M}}_3\widehat{\mathbf{M}}_2\widehat{\mathbf{M}}_1)(\Pi_{n-1}\cdots\Pi_3\Pi_2\Pi_1)\,\mathbf{A} = \mathbf{R},
\]
or, with some rearrangement,
\[
\mathbf{A} = (\Pi_1^{-1}\Pi_2^{-1}\Pi_3^{-1}\cdots\Pi_{n-1}^{-1})(\widehat{\mathbf{M}}_1^{-1}\widehat{\mathbf{M}}_2^{-1}\widehat{\mathbf{M}}_3^{-1}\cdots\widehat{\mathbf{M}}_{n-1}^{-1})\,\mathbf{R}.
\]
If we define
\[
\mathbf{P} = \Pi_{n-1}\cdots\Pi_2\Pi_1
\qquad\text{and}\qquad
\mathbf{L} = \widehat{\mathbf{M}}_1^{-1}\widehat{\mathbf{M}}_2^{-1}\widehat{\mathbf{M}}_3^{-1}\cdots\widehat{\mathbf{M}}_{n-1}^{-1},
\]
then (recalling that \(\mathbf{P}^{-1} = \mathbf{P}^t\)) we have our final conclusion:

Theorem 1.17. Any matrix \(\mathbf{A} \in \mathbb{R}^{m \times n}\) may be decomposed as the product of three matrices
\[
\mathbf{A} = \mathbf{P}^t\mathbf{L}\mathbf{U},
\]
where P is an \(m \times m\) permutation matrix, L is a unit lower triangular \(m \times m\) matrix, and U is an upper triangular \(m \times n\) matrix in row echelon form. This is called the permuted LU decomposition.

Notice that while not every matrix has an LU decomposition, every matrix does have a permuted LU decomposition. A somewhat surprising interpretation of this result is that if we had known what row interchanges would occur in the course of Gauss elimination and had performed them all at the outset, before any reduction occurred (essentially forming PA in the process), we would then complete the reduction with no further Type 2 operations and with exactly the same Type 1 operations, and obtain exactly the same final R as the usual Gauss elimination with interleaved Type 1 and Type 2 operations would produce.

Chapter 2
Vector Spaces and Linear Transformations

2.1 A Model for General Vector Spaces

Prerequisites:
- Basic knowledge of vectors in \(\mathbb{R}^2\) and \(\mathbb{R}^3\)
- Matrix manipulation
- Basic calculus and multivariable calculus skills
- Skills in language and logic; techniques of proof

Advanced Prerequisites:
- Definitions of groups and fields

Learning Objectives:
- Understanding of the basic idea of a vector space.
- Ability to prove elementary consequences of the vector space axioms.
- Familiarity with several examples of vector spaces.
- Familiarity with examples of sets of objects that are not vector spaces.

The concept of vector space permeates virtually all of mathematics and a good chunk of those disciplines that call mathematics into service.
Whence comes such universality? Well-suited vocabulary, more or less. The most powerful feature of linear algebra (and, by association, of matrix theory) is the geometric intuition that our daily three-dimensional experience can lend to much more (sometimes unspeakably!) complicated settings. The language used to describe vector spaces is a vehicle that carries many ideas and focuses thinking.

The first step in pushing past the boundaries of the "physical" vector spaces of two and three dimensions begins with the Cartesian view of two- and three-dimensional space as comprising ordered pairs of numbers (for two-dimensional space) and ordered triples (for three-dimensional space). Suppose \(n\) is a positive integer. An n-vector (usually just called a "vector" if \(n\) is unambiguous or immaterial) is an ordered n-tuple\(^1\) of \(n\) real numbers
\[
a = \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}.
\]
Although we endeavor to write vectors always as columns (as opposed to rows), the notation for matrix transpose allows us to write \(a\) also as \(a = (a_1, a_2, \ldots, a_n)^t\) without squandering quite as much space and without having to make silly distinctions between "row vectors" and "column vectors". Out in the world one sees vectors written (seemingly at random) either as rows or columns.

Let \(u = (u_1, u_2, \ldots, u_n)^t\) and \(v = (v_1, v_2, \ldots, v_n)^t\) be two n-vectors. We say that \(u = v\) if and only if \(u_i = v_i\) for each \(i = 1, 2, \ldots, n\). We define
\[
u + v = (u_1 + v_1,\, u_2 + v_2,\, \ldots,\, u_n + v_n)^t \qquad \{\text{vector addition}\}
\]
and
\[
ku = (ku_1,\, ku_2,\, \ldots,\, ku_n)^t, \quad k \in \mathbb{R} \qquad \{\text{scalar multiplication}\}.
\]
We also define
\[
0 = (0, 0, \ldots, 0)^t \qquad\text{and}\qquad -u = (-u_1, -u_2, \ldots, -u_n)^t.
\]
The set of all such ordered n-tuples, together with the two defined arithmetic operations on elements of the set, vector addition and scalar multiplication, is called Euclidean n-space and is denoted \(\mathbb{R}^n\).
This vector space is the natural generalization of the "physical" vector spaces \(\mathbb{R}^2\) and \(\mathbb{R}^3\) to higher dimension and is the prototype for even more general vector spaces. In general, a vector space is a set of objects (the "vectors"), a set of scalars (usually either the real numbers \(\mathbb{R}\) or complex numbers \(\mathbb{C}\), but other scalar fields are sometimes used), and a pair of operations, vector-vector "addition" and scalar-vector "multiplication", that interact "sensibly" with the usual notions of addition and multiplication for real or complex numbers.

\(^1\)An odd expression extending the already clumsy terms: quadruple (\(n = 4\)), quintuple (\(n = 5\)), sextuple (\(n = 6\)), etc.

The following eight properties for "sensible" vector addition and scalar multiplication should hold for all vectors \(u, v,\) and \(w\) in an (alleged) vector space \(V\) and scalars \(\alpha, \beta\) in \(\mathbb{R}\) or \(\mathbb{C}\). These properties are easy to verify for vectors in \(\mathbb{R}^n\) and scalars \(\alpha, \beta\) in \(\mathbb{R}\).

General properties of vector addition and scalar multiplication:
1. \(u + v = v + u\) {commutativity of vector addition}
2. \(u + (v + w) = (u + v) + w\) {associativity of vector addition}
3. There exists a zero vector "0" such that \(u + 0 = 0 + u = u\).
4. For every vector \(u\) there exists a vector "\(-u\)" such that \(u + (-u) = 0\).
5. \(\alpha(\beta u) = (\alpha\beta)u\) {associativity of scalar-scalar multiplication with scalar-vector multiplication}
6. \(\alpha(u + v) = \alpha u + \alpha v\) {distributivity of scalar-vector multiplication over vector-vector addition}
7. \((\alpha + \beta)u = \alpha u + \beta u\) {distributivity of scalar-vector multiplication over scalar-scalar addition}
8. \(1 \cdot u = u\)

We can define \(\mathbb{C}^n\), complex n-space, in the same way, except that we (usually) take the scalars to be complex numbers, elements of \(\mathbb{C}\). One can check that all eight properties of addition and scalar multiplication still hold.
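For concrete vectors, the eight properties can be spot-checked mechanically; the NumPy sketch below does so for a few random vectors in \(\mathbb{C}^4\) with complex scalars (random sampling illustrates, but of course does not prove, the axioms).

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.standard_normal(4) + 1j * rng.standard_normal(4)   # vectors in C^4
v = rng.standard_normal(4) + 1j * rng.standard_normal(4)
w = rng.standard_normal(4) + 1j * rng.standard_normal(4)
a, b = 2.0 - 1.0j, 0.5 + 3.0j                              # scalars in C

assert np.allclose(u + v, v + u)                 # 1. commutativity
assert np.allclose(u + (v + w), (u + v) + w)     # 2. associativity
assert np.allclose(a * (b * u), (a * b) * u)     # 5. scalar associativity
assert np.allclose(a * (u + v), a * u + a * v)   # 6. distributivity
assert np.allclose((a + b) * u, a * u + b * u)   # 7. distributivity
assert np.allclose(1 * u, u)                     # 8. unit scalar
```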
Now let \(V\) be a set. Suppose we have defined a "vector addition" \(u + v\) so as always to produce a result that ends up in \(V\) for each \(u, v\) starting in \(V\), and a "scalar multiplication" \(\alpha u\) so as to produce a result in \(V\) for each \(u \in V\) and \(\alpha \in \mathbb{R}\) (or, alternatively, for each \(\alpha \in \mathbb{C}\)). (This is succinctly stated by saying that the set \(V\) is closed under vector addition and scalar multiplication.) Further suppose that the eight properties for vector addition and scalar multiplication hold in \(V\). Depending on whether real scalars or complex scalars are being used, \(V\) is called a real or complex vector space.

Examples 2.1.
1. \(V = \mathbb{R}^n\) is the model upon which the general definition of a vector space is based.
2. Let \(V\) be the set of vectors in \(\mathbb{R}^3\) of the form \((x, x, y)^t\), with the addition and scalar multiplication it inherits from \(\mathbb{R}^3\). Clearly properties 1, 2, 5, 6, 7, and 8 hold for \(V\), since they hold for \(\mathbb{R}^3\). Properties 3 and 4 are satisfied with \(0 = (0, 0, 0)^t\) and \(-u = (-x, -x, -y)^t\), where both vectors are in \(V\). We only need to show that \(V\) is closed under vector addition and scalar multiplication. To see this, just note that if \((x, x, y)^t\) and \((s, s, r)^t\) are in \(V\), then \((x + s, x + s, y + r)^t\) and \(\alpha(x, x, y)^t = (\alpha x, \alpha x, \alpha y)^t\) are both in \(V\). Thus \(V\) is a vector space.
3. Let \(V = P_n\) be the set of all polynomials of degree less than or equal to \(n\) with real coefficients. If \(p = a_n z^n + \cdots + a_0\) and \(q = b_n z^n + \cdots + b_0\), we define
\[
p + q = (a_n + b_n)z^n + \cdots + (a_0 + b_0)
\]
and
\[
\alpha p = (\alpha a_n)z^n + (\alpha a_{n-1})z^{n-1} + \cdots + \alpha a_0.
\]
4. Let \(V\) be the set of \(m \times n\) matrices with the usual definitions of matrix addition and scalar multiplication.
5. Let \(V\) be the set of all real-valued functions on an interval \(I\). For \(f, g \in V\), we define the functions \(f + g\) and \(\alpha f\) by
\[
[f + g](x) = f(x) + g(x), \qquad [\alpha f](x) = \alpha f(x).
\]

Problem 2.1. Show that Examples 2, 3, and 4 are vector spaces.

Problem 2.2. Let \(V\) be the set of all polynomials of degree \(n\). Is \(V\) a vector space?

Problem 2.3. Show that in any vector space \(V\), with \(u\) in \(V\) and \(k\) real (or complex):
1. \(k0 = 0\)
2. \(0u = 0\)
3. \(-u = (-1)u\)
-u = (-1)u

A subset W of a vector space V is called a subspace of V if W is itself a vector space with the vector addition and scalar multiplication inherited from V. Given a subset W of a vector space V with the vector addition and scalar multiplication inherited from V, all that is necessary to check in order to assert that W is a subspace is that W is closed under vector addition and scalar multiplication. This follows from Problem 2.3 and the fact that vector addition and scalar multiplication in W are the same as for V, so all the general properties of vector addition and scalar multiplication still hold in W. The following problem asks you to prove this.

Problem 2.4. Show that a subset W of a vector space V, with the vector addition and scalar multiplication inherited from V, is a subspace of V if and only if W is closed under addition and scalar multiplication.

Example 2.2. Let W = { (x, y)^t ∈ R^2 : x = 3y }. Let u and v be in W. Then u = (3s, s)^t and v = (3t, t)^t for some choice of scalars s and t. Then
u + v = (3s + 3t, s + t)^t = (3(s + t), s + t)^t ∈ W
and
αu = (α(3s), αs)^t = (3(αs), αs)^t ∈ W.
W is a subspace of R^2 since W is closed under vector addition and scalar multiplication.

Problem 2.5.
1. Show W = { (x, y, z)^t ∈ R^3 : x + y - 2z = 0 } is a subspace of R^3.
2. Show W = { (x, y)^t ∈ R^2 : x = y + 1 } is not a subspace of R^2.

While there is an infinite variety of subspaces that can be considered, there are two especially important subspaces associated with any given m × n matrix A. Define the kernel of A and the range of A by
Ker(A) = { x ∈ R^n : Ax = 0 }
and
Ran(A) = { y ∈ R^m : there exists an x ∈ R^n such that Ax = y }.
The kernel of a matrix is evidently the set of all possible solutions to the homogeneous system Ax = 0, and the range of a matrix is the set of all right-hand sides, b, for which the linear system of equations Ax = b is consistent.

Theorem 2.3. Let A be an m × n matrix. Then
1. Ker(A) is a subspace of R^n.
2. Ran(A) is a subspace of R^m.
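Both subspaces can be explored numerically. The sketch below is an illustration with an arbitrary matrix (not one from the text), using NumPy: a basis for Ker(A) is read off from the rows of V^t in the singular value decomposition beyond rank(A), and membership of b in Ran(A) is tested by comparing rank([A | b]) with rank(A).

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.]])           # rank 1, so Ker(A) is 2-dimensional

# Basis for Ker(A): rows of Vt beyond rank(A) in the SVD A = U diag(s) Vt.
U, s, Vt = np.linalg.svd(A)
tol = max(A.shape) * np.finfo(float).eps * s.max()
rank = int(np.sum(s > tol))
kernel_basis = Vt[rank:]               # each row x satisfies A x = 0

# Ker(A) is closed under addition and scaling (Theorem 2.3, part 1):
x, y = kernel_basis
assert np.allclose(A @ (x + y), 0)
assert np.allclose(A @ (5.0 * x), 0)

# b lies in Ran(A) exactly when appending b to A does not raise the rank:
def in_range(A, b):
    return np.linalg.matrix_rank(np.column_stack([A, b])) == np.linalg.matrix_rank(A)

assert in_range(A, np.array([1., 2.]))      # b = first column of A
assert not in_range(A, np.array([1., 0.]))  # (1, 0) is not a multiple of (1, 2)
```

The rank comparison is exactly the consistency test for Ax = b described above: b ∈ Ran(A) if and only if the system is consistent.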
Problem 2.6. Prove Theorem 2.3. We end this section with what is probably the most fundamental way of generating a subspace of a vector space. Let fv1 ; v2 ; : : : ; vr g be a set of vectors in a vector space V . We say that a vector u is a linear combination of v1 ; v2 ; : : : ; vr , if there exist scalars 1 ; 2 ; : : : ; r such that u = 1 v1 + 2 v2 + + r vr For example, in V = R2 the vector u = (5; 4)t is a linear combination of v1 = (1; 1)t and v2 = (3; 2)t with 1 = 2 and 2 = 1. The set of all possible linear combinations of a set of vectors v1 ; v2 ; : : : ; vr is called the span of v1 ; v2 ; : : : ; vr . Theorem 2.4. Let fv1 ; v2 ; : : : ; vr g be a set of vectors in a vector space V and let W be the set of all linear combinations of v1 ; v2 ; : : : ; vr . Then W is a subspace of V and furthermore W is the smallest subspace containing v1 ; v2 ; : : : ; vr in the sense that if U was another subspace containing v1 ; v2 ; : : : ; vr then W U. The subspace W in Theorem 2.4 is denoted as W = span(v1 ; v2 ; : : : ; vr ) . 41 General Vector Spaces Proof: Since V is closed under vector addition and scalar multiplication, every linear combination of v1 ; v2 ; : : : ; vr must also be in V , so that W is certainly a subset of V . To show that W is a subspace, it is enough to show that W itself is closed under vector addition and scalar multiplication. Let u; v 2 W . Then u = 1 v1 + 2 v2 + + r vr v = 1 v1 + 2 v2 + + r vr Hence u + v = (1 + 1 )v1 + (2 + 2 )v2 + + (r + r )vr which is a linear combination of v1 ; v2 ; : : : ; vr and hence in W . A similar proof shows that W is closed under scalar multiplication. Now suppose that U is some subspace of V containing the vectors v1 ; v2 ; : : : ; vr : Since U is a subspace it must be closed under vector addition and scalar multiplication and so must contain all vectors of the form 1 v1 + 2 v2 + + r vr ; which is to say, U must contain each vector of W . 2 Problem 2.7. 
Determine whether u and v 2 span(S ) for the set of vectors S given. 1. u = (1; 1; 1; 2)t, v = (1; 0; 12; 5)t and S = (1; 1; 2; 0)t; (0; 0; 1; 2)t; (1; 1; 0; 4)t 2. u = (3; 3; 3)t, v = (4; 2; 6)t and S = (1; 1; 3)t; (2; 4; 0)t 42 Vector Spaces 2.2 The Basics of Bases Prerequisites: Vector spaces. Subspaces. Matrix manipulation, Gauss elimination. Advanced Prerequisites: Fundamental theorem of algebra Learning Objectives: Familiarity with the concepts of linear combination, linear independence, span, basis and dimension. Ability to determine whether a set of vectors is linearly independent. Ability to determine whether a set of vectors spans a given subspace. Ability to determine whether a set of vectors is a basis for a given subspace. Ability to determine the dimension of a given subspace. Ability to compute the coordinates of a vector with respect to a given basis Ability to compute the change of basis matrix between any two bases. Vector bases are distinguished sets of vectors in a vector space with which we may uniquely represent any vector in the vector space. Bases are important for much the same reason any mechanism for representation is important { dierent representations of the same thing can bring out dierent features of the thing. 2.2.1 Spanning Sets and Linear Independence Let fv1 ; v2 ; : : : ; vn g V , for some vector space V . Since span(fv1 ; v2 ; : : : ; vn g) is the smallest subspace containing fv1 ; v2 ; : : : ; vn g ; we know that span fv1 ; v2 ; : : : ; vn g V: If, in fact, it happens that span fv1 ; v2 ; : : : ; vn g = V then we say that fv1 ; : : : ; vn g spans V: Notice that this means that every vector in V is expressible as a linear combination of fv1 ; v2 ; : : : ; vn g : 43 Bases Examples 2.5. 1. The standard example of a spanning set for R3 is i = (1; 0; 0); j = (0; 1; 0); k = (0; 0; 1): 2. We show that S = f(1; 1; 1); (1; 1; 2); (1; 0; 0)g spans R3 . 
This is true if and only if the system of equations
(b1, b2, b3) = x1 (1, 1, 1) + x2 (1, 1, 2) + x3 (1, 0, 0)
is consistent for every vector (b1, b2, b3). Solving the system by Gaussian elimination, we see that x3 = b1 - b2, x2 = b3 - b2, and x1 = 2b2 - b3. Hence S spans R^3.
3. Let S = { (1, 0, 0), (1, 1, 1), (2, 1, 1) }. We show that S does not span R^3, since the system
(b1, b2, b3) = x1 (1, 0, 0) + x2 (1, 1, 1) + x3 (2, 1, 1)
is consistent only if b3 - b2 = 0. Thus, for instance, (1, 0, 1) is not in the span of S.

Problem 2.8. Determine whether the following sets of vectors span R^3.
1. { (1, 1, 1), (2, 2, 0), (3, 0, 0) }
2. { (1, 3, 3), (1, 3, 4), (1, 4, 3), (6, 2, 1) }

Suppose we have n vectors v1, v2, ..., vn which span a vector space V, and suppose vn is a linear combination of v1, v2, ..., v_{n-1}. Then the vector vn is unnecessary insofar as v1, v2, ..., v_{n-1} still span V. Thus it is natural to look for the smallest spanning set. This leads to the following definition. We say that a set S = { v1, v2, ..., vr } is linearly independent if, whenever the "check condition"
c1 v1 + c2 v2 + ··· + cr vr = 0
holds, then c1 = c2 = ··· = cr = 0. Otherwise S is called linearly dependent.

Suppose that S = { v1, v2, ..., vr } is linearly dependent. Then c1 v1 + c2 v2 + ··· + cr vr = 0 has a nontrivial solution (c1, c2, ..., cr) ≠ 0. By re-ordering if necessary, we may assume that cr ≠ 0. Then
vr = -(c1/cr) v1 - (c2/cr) v2 - ··· - (c_{r-1}/cr) v_{r-1}
and so vr is a linear combination of v1, v2, ..., v_{r-1}.

Problem 2.9. Prove the converse of the above argument. That is, show that if vr is a linear combination of v1, v2, ..., v_{r-1}, then v1, v2, ..., vr is linearly dependent.

We see by the problem and the preceding argument that a set of vectors is linearly dependent precisely when some vector in the set is a linear combination of the others. Let
S = { (1, 0)^t, (0, 1)^t, (1, 1)^t }.
Clearly S is a linearly dependent set that spans R^2.
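Span and dependence questions like these reduce to rank computations: a set of vectors spans R^n exactly when the matrix having those vectors as columns has rank n, and the set is linearly independent exactly when that rank equals the number of vectors. A sketch with NumPy (the helper names are our own), applied to the two sets of Examples 2.5 and the set S just defined:

```python
import numpy as np

def spans_Rn(vectors):
    """True iff the vectors span R^n, where n is their common length."""
    M = np.column_stack(vectors)
    return np.linalg.matrix_rank(M) == M.shape[0]

def linearly_independent(vectors):
    """True iff the check condition forces all coefficients to zero."""
    M = np.column_stack(vectors)
    return np.linalg.matrix_rank(M) == M.shape[1]

# Examples 2.5: this set spans R^3 (and is linearly independent) ...
S1 = [np.array([1., 1., 1.]), np.array([1., 1., 2.]), np.array([1., 0., 0.])]
assert spans_Rn(S1) and linearly_independent(S1)

# ... while this one does not span (its span is the plane b3 - b2 = 0).
S2 = [np.array([1., 0., 0.]), np.array([1., 1., 1.]), np.array([2., 1., 1.])]
assert not spans_Rn(S2)

# S = {(1,0), (0,1), (1,1)} spans R^2 but is linearly dependent.
S3 = [np.array([1., 0.]), np.array([0., 1.]), np.array([1., 1.])]
assert spans_Rn(S3) and not linearly_independent(S3)
```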
In fact, every vector of R^2 can be written as a linear combination of the vectors in S in more than one way. For instance,
(1, 1)^t = 1·(1, 1)^t = 1·(1, 0)^t + 1·(0, 1)^t.
This ambiguity disappears if a spanning set S is linearly independent.

Theorem 2.6. Let S = { v1, v2, ..., vr } be a linearly independent set spanning a vector space V. Then every vector in V can be written in a unique way as a linear combination of vectors in S.

Proof: Since S is a spanning set for V, every vector in V has at least one representation as a linear combination of vectors in S. Suppose there were two:
c1 v1 + c2 v2 + ··· + cr vr = u = k1 v1 + k2 v2 + ··· + kr vr.
Then, subtracting left from right,
0 = (c1 - k1) v1 + (c2 - k2) v2 + ··· + (cr - kr) vr.
By linear independence we have that ci - ki = 0 for each i = 1, 2, ..., r. Hence the representation of u is unique. □

Example 2.7. To determine whether (1, 1, 1)^t, (1, 1, 0)^t, and (1, 0, 0)^t are linearly independent vectors, consider the system of equations
x1 (1, 1, 1)^t + x2 (1, 1, 0)^t + x3 (1, 0, 0)^t = (0, 0, 0)^t
and find all possible solutions via Gaussian elimination. We see that x1 = x2 = x3 = 0. Thus, by definition, the vectors are linearly independent.

Problem 2.10. Determine whether the following sets of vectors are linearly independent:
1. { (1, 2, 3)^t, (1, 2, 4)^t, (1, 2, 5)^t }
2. { (1, 1)^t, (2, 3)^t, (3, 2)^t }
3. { (1, 1, 1, 1)^t, (3, 1, 1, 1)^t, (1, 0, 0, 0)^t }

Problem 2.11. Show that any set S which contains the zero vector is linearly dependent.
Problem 2.12. Suppose S consists of 4 vectors in R^3. Explain why S is linearly dependent.
Problem 2.13. Show that if two vectors in R^2 are linearly dependent, they lie on the same line, while three linearly dependent vectors in R^3 lie in the same plane.

2.2.2 Basis and Dimension

If S = { v1, ..., vr } is a set of vectors in a vector space V, then S is called a basis for V if S is a linearly independent set and spans V. Examples 2.8. 1.
The standard example of a basis for Rn or C n is the set S = fe1 ; e2 ; e3 ; : : : ; en g ; where ej is the vector in Rn whose j th coordinate is 1 and whose other coordinates are all 0. It is called the natural or standard basis for Rn or C n. 2. If S is a linearly independent set, then S is a basis for span(S). Problem 2.14. 1. Prove that the natural basis is indeed a basis for Rn or C n 46 Vector Spaces 2. Find another basis for Rn and prove it is one. Our immediate goal is to prove that any two bases of a vector space must each have the same number of vectors. We rst need the following theorem. Theorem 2.9. Let S = fv1 ; : : : ; vr g be a basis for V and let T = fw1 ; : : : ; ws g V; where s > r. Then T is a linearly dependent set. Proof: Since S is a basis for V , we may write w1 = a11 v1 + + ar1 vr (2.1) .. .. .. . . . ws = a1s v1 + + ars vr To prove that T is linearly dependent, set k1 w1 + + ks ws = 0 (2.2) and substitute (2.1). After rearranging terms, we obtain (a11 k1 + + a1s ks )v1 + + (ar1 k1 + + ars ks )vr = 0: Since S is a basis, it is linearly independent and so we obtain the following system of equations in the variables ki . a11 k1 + + a1s ks = 0 .. .. .. . . . ar1k1 + + ars ks = 0 Since s > r, the above system has more unknowns than equations. Thus the system has innitely many solutions for k1 ; : : : ; ks , and in particular, solutions that are nontrivial. By equation 2.2, T must be linearly dependent. 2 We easily obtain the following theorem from Theorem 2.9. Theorem 2.10. If some basis of a vector space V contains n vectors, then every basis of V contains n vectors. Problem 2.15. Prove Theorem 2.10. The number of vectors in a basis of a vector space V is called the dimension of V and denoted by dim(V ). By Theorem 2.10 this number is independent of the particular basis of V we may happen to have and so reects a basic structural feature of the vector space. Remark: dim(Ran(A)) is called the rank of A. dim(Ker(A)) is called the nullity of A. 
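Both quantities are easy to compute for a concrete matrix. A brief NumPy sketch (the matrix is an arbitrary illustration): the rank is computed directly, and the nullity is then obtained as the number of columns minus the rank, a relation taken up in Problem 2.20.

```python
import numpy as np

# An arbitrary 3x3 illustration: the third column is 2*(col 1) + 3*(col 2),
# so Ran(A) is a plane (rank 2) and Ker(A) is a line (nullity 1).
A = np.array([[1., 0., 2.],
              [0., 1., 3.],
              [1., 1., 5.]])

rank = np.linalg.matrix_rank(A)      # dim(Ran(A))
nullity = A.shape[1] - rank          # dim(Ker(A)) = columns minus rank

assert rank == 2 and nullity == 1
```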
Bases 47 Examples 2.11. 1. Since the natural basis for Rn contains n vectors, dim(Rn ) = n. 2. The set S = 1; x; x2 ; : : : ; xn is a basis for Pn (dened in section 2.1). Thus dim Pn = (n + 1). Example 2.12. We nd a basis and the dimension of the space of solutions to the system 2x1 + 2x2 x3 + x5 = 0 x1 x2 + 2x3 3x4 + x5 = 0 x1 + x2 2x3 x5 = 0 x3 + x4 + x5 = 0 By Gaussian elimination, we nd that x1 = s t; x2 = s; x3 = t; x4 = 0; x5 = t So the general solution is ( s t; s; t; 0; t) = s( 1; 1; 0; 0; 0) + t( 1; 0; 1; 0; 1) Clearly S = f( 1; 1; 0; 0; 0); ( 1; 0; 1; 0; 1)g spans the solution space. The set S is also linearly independent since 0 = s( 1; 1; 0; 0; 0) + t( 1; 0; 1; 0; 1) = ( s t; s; t; 0; t) implies that s = 0 and t = 0. Hence S is a basis and the dimension of the space spanned by S is 2. The method we just used to nd a basis for the solution space always yields a spanning set S, since every solution is evidently a linear combination of vectors in S with free parameters s and t functioning as coeÆcients. Linear independence of S obtained in this way is also easy to see since each free parameter will be xed at zero as a consequence of the \check condition" for linear independence. Our next theorem tells us that if we know a priori the dimension of a vector space, V , we need only check either that the set spans V or that the set is linearly independent to determine whether the set is a basis for V . Theorem 2.13. Let V be a vector space of dimension n. If S is a set of n vectors then S is linearly independent if and only if S spans V . Proof: Suppose the set of vectors S = fv1 ; : : : ; vn g is linearly independent. If S does not span V , then we can nd a vector vn+1 in V that is not in the span of S. and in particular, vn+1 is not a linear combination of the vectors in S. Hence S^ = fv1 ; : : : ; vn ; vn+1 g 48 Vector Spaces is linearly independent. But this contradicts Theorem 2.9 because the dimension of V is n and hence it contains a basis of n vectors. 
Now suppose S spans V . If S is not linearly independent, we may discard vectors until we arrive at a linearly independent set S^ with m < n vectors. But S^ still spans V since we've been casting out only redundant vectors, and so S^ is a basis for V with m < n vectors, a contradiction to the initial hypothesis that the dimension of V is n. 2 Example 2.14. To check whether f(1; 2; 3); (4; 5; 6); (2; 2; 2)g is a basis for R3 , we need only check linear independence since dim(R3 ) = 3. We note that this is a bit easier than checking whether the set spans. (Compare Problem 2.8 with Problem 2.10.) Problem 2.16. By inspection explain why f(1; 2); (0; 3); (2; 7)g is not a basis for R2 and why f( 1; 3; 2); (6; 1; 1)g is not a basis for R3 : Problem 2.17. Determine whether f(2; 3; 1); (4; 1; 1); (0; 7; 1)g is a basis for R3 . Problem 2.18. Find a basis for the subspaces of R3 consisting of solutions to the following systems of linear equations. 1. x1 + x2 x3 = 0 2x1 x2 + 2x3 = 0 x1 + x3 = 0 2. 3. x+y+z =0 3x + 2y 2z = 0 4x + 3y z = 0 6x + 5y + z = 0 x+y+z =0 y+z =0 y z=0 Next we tackle the problem of nding a basis for the range of a matrix. Suppose that S = fv1 ; : : : ; vr g spans V but is not linearly independent. As mentioned before, in this case it is possible to discard a vector which is a linear combinations of the other vectors and still have a set which spans V . Then after discarding nitely many vectors we arrive at a subset of S which is a basis for 49 Bases V . We now demonstrate a method which achieves this in the case V Rn . Let A be the matrix whose columns are the vectors in S. If y is in V , then y = x1 v1 + : : : + xr vr : Thus letting x = (x1 ; : : : ; xr ), we obtain Ax = y: Remark:The range of a matrix is equal to the space spanned by its columns. Let us put A in reduced row echelon form B. We note that if Ax = y; then Bx = y1 for some y1 2 V: Hence by the remark above, the vector y is in the range of A if and only if the vector y1 is in the range of B. 
More importantly, both of these vectors are images of the same vector x. By inspection it is easy to see which columns of B are linear combinations of the other columns. Say the ith column is such a column. Ignore it! That is given y1 in the range of B, we can nd a vector x whose ith coordinate is zero such that Bx = y1 : Thus given any y in V we can nd a vector x with ith coordinate zero such that Ax = y: Hence the ith column of A may be discarded. Example 2.15. Let S = (1; 2; 1)t; (2; 3; 3)t; (0; 1; 1)t; (2; 2; 2)t; (1; 0; 1)t We nd a subset of S which spans the same space as S but is linearly independent. Let 2 3 1 2 0 2 1 A = 4 2 3 1 2 0 5: 1 3 1 2 1 The reduced row echelon form for A is 2 3 1 0 0 0 1 B = 4 0 1 0 1 1 5: 0 0 1 1 1 Clearly columns 3 and 4 in B are linear combinations of columns 1,2 and 3 and so we discard columns 3 and 4 from A. The resulting linearly independent spanning set is S1 = (1; 2; 1)t; (2; 3; 3)t; (0; 1; 1)t : By the remark above, S1 is a basis for the range of A. 50 Vector Spaces Problem 2.19. Find a basis for the range of the following matrices: 1. 2. 3. 2 0 A=4 2 1 2 0 2 1 1 0 0 1 0 2 2 2 A=4 2 3 2 4 2 3 5 3 5 3 1 1 1 1 1 A=4 2 3 4 5 6 5 6 5 4 3 2 Problem 2.20. Let A be an m n matrix whose range has dimension r and whose kernel has dimension s. Show that r + s = n. Hint: This is simple for a matrix in reduced row echelon form. (This problem gives an alternate proof of the rank-nullity theorem proved in the next section.) If A is an m n matrix then the range of At is the subspace spanned by the rows of A. This is occasionally referred to as the row space of A. Since elementary row operations only form linear combinations of the rows of A, the row space of a matrix will not change in the course of Gauss elimination. So if A is reduced via elementary operations to a reduced row echelon form, R, the row space of A is identical to the row space of R. 
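Example 2.15 can be reproduced mechanically with SymPy, whose rref method returns both the reduced row echelon form and the pivot-column indices; the pivot columns of A itself (not of the reduced form) then make up the basis for Ran(A). Note that working the example exactly shows the reduced form carries some negative entries, with pivots in columns 1, 2, and 3, so the last two columns of A are the redundant ones.

```python
from sympy import Matrix

# The columns of A are the five vectors of S from Example 2.15.
A = Matrix([[1, 2, 0, 2, 1],
            [2, 3, 1, 2, 0],
            [1, 3, 1, 2, 1]])

B, pivots = A.rref()          # reduced row echelon form and pivot-column indices
assert pivots == (0, 1, 2)    # pivots in columns 1, 2, 3 (SymPy indexes from 0)

# A basis for Ran(A): the pivot columns of A itself.
basis = [A.col(j) for j in pivots]
assert basis == [Matrix([1, 2, 1]), Matrix([2, 3, 3]), Matrix([0, 1, 1])]
```

The reduced form also exhibits the discarded columns as combinations of the basis: column 4 of A equals c2 - c3 and column 5 equals -c1 + c2 - c3, where c1, c2, c3 are the kept columns.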
Since the nonzero rows of either the row echelon form or of the reduced row echelon form are linearly independent, the number of leading ones in the reduced row echelon form is then, evidently the dimension of the row space of A. But the leading ones mark the location of the linearly independent columns of A and so the number of leading ones is also rank(A), i.e., the dimension of the range of A. This leads to Remark:The rank of a matrix is equal to the dimension of its row space. We end this section with a very important theorem. Theorem 2.16. If S = fv1 ; : : : ; vr g is a linearly independent set in a vector space V of dimension n, then S can be enlarged to become a basis of V ; that is there exists a basis of V which contains S as a subset. If S spans V , then r = n and we are done. If not, there exists some vector vr+1 62 span(S). The set S1 = fv1 ; : : : ; vr ; vr+1 g is clearly linearly independent. If S1 spans V , then we are done. Otherwise there exists a vector vr+2 62 span(S1 ). We form S2 = S1 [ fvr+2 g. Since dim(V ) = n, this process must stop. When it does, we have the required basis of V . 51 Bases 2.2.3 Change of Basis Given a basis S = fu1 ; : : : ; un g of a vector space V , we have by Theorem 2.6 that every vector u in V can be written uniquely as u = k1 u1 + + kn un so there is a one-to-one correspondence between vectors u in V and n-tuples (k1 ; : : : ; kn )t . We write this association as (u)S := (k1 ; : : : ; kn )t : The above representation denes the \S-coordinates" of u unambiguously. When V is real or complex Euclidean n-space and S is the natural basis, we sometimes do away with the subscript S and then write u with its usual coordinate representation. Examples 2.17. Let T = f(0; 0; 1); (1; 2; 3); (1; 0; 0)g 1. To nd (u)T where u = (1; 2; 0), we need to solve the system (1; 2; 0) = c1 (0; 0; 1) + c2 (1; 2; 3) + c3 (1; 0; 0) for c1 ; c2 ; c3 . We nd that c1 = 3,c2 = 1, and c3 = 0. So uT = ( 3; 1; 0): 2. 
To find what the vector (2, 4, 1)_T equals in the usual basis, simply note that
(2, 4, 1)_T = 2(0, 0, 1) + 4(1, 2, 3) + 1(1, 0, 0) = (5, 8, 14).

Problem 2.21. Let
S = { (1, 2, 1), (0, 1, 0), (1, 1, 0) },   T = { (1, 1, 1), (1, 1, 0), (1, 0, 0) }.
1. Find (u)_S and (u)_T where u = (1, 2, 1).
2. Give the natural basis coordinate representation for (1, 5, 2)_S and (3, 0, 2)_T.

Problem 2.22. Suppose A ∈ R^{m×n} has linearly independent columns and let the set of vectors A = { a1, a2, ..., an } be constituted from the columns of A. Explain the following statement: x̂ solves the linear system Ax = b if and only if (b)_A = x̂.

Suppose we are given a vector in S coordinates and we wish to write it in T coordinates; that is, we seek to find a relationship between (u)_S and (u)_T. Suppose V is two-dimensional: S = { u1, u2 } and T = { v1, v2 } each are bases for V. We first find (u1)_T and (u2)_T by solving the two linear systems
u1 = a v1 + b v2
u2 = c v1 + d v2.
Hence we will have found that (u1)_T = (a, b) and (u2)_T = (c, d). Now let v be an arbitrary vector in V with (v)_S = (k1, k2). Then
v = k1 u1 + k2 u2 = k1 (a v1 + b v2) + k2 (c v1 + d v2) = (k1 a + k2 c) v1 + (k1 b + k2 d) v2.
So, (v)_T = (k1 a + k2 c, k1 b + k2 d). It is easy to see from this that
(v)_T = [a c; b d] (v)_S = [(u1)_T (u2)_T] (v)_S = B (v)_S   (2.3)
where B = [(u1)_T (u2)_T], with columns (u1)_T and (u2)_T, is called the change of basis matrix from S (the given basis) to T (the new basis). In general, if S = { u1, ..., un } and T = { v1, ..., vn }, the change of basis matrix B from S to T in equation (2.3) is given by
[(u1)_T ... (un)_T].

Example 2.18. Let S be the natural basis { e1, e2 } for R^2 and let T be the set consisting of u = (1, 1) and v = (2, 0). To find the change of basis matrix B from S to T, we first change the basis vectors of S to the T basis. By solving an appropriate system, we see that
e1 = 0·u + (1/2)·v
e2 = 1·u - (1/2)·v.
Thus
B = [(e1)_T (e2)_T] = [0 1; 1/2 -1/2].
Let w = (1, 3)_S.
Then
(w)_T = B (w)_S = [0 1; 1/2 -1/2] (1, 3)^t = (3, -1)^t.

Problem 2.23. Show that if S = { (a1, b1, c1), (a2, b2, c2), (a3, b3, c3) } and T is the natural basis in R^3, then the change of basis matrix from S to T is
[a1 a2 a3; b1 b2 b3; c1 c2 c3].

Suppose the change of basis matrix from S to T is B. Since the columns of B are the coordinate representations of vectors in S, these columns are linearly independent and thus B is invertible. Thus by equation (2.3) we have
B^{-1} (v)_T = (v)_S.
But this means that B^{-1} is the change of basis matrix from T to S.

Problem 2.24. Let S = { (2, 4), (3, 8) } and T = { (1, 1), (0, 2) }. Find
1. (w)_T given that (w)_S = (1, 1)_S.
2. (w)_S given that (w)_T = (1, 1)_T.

Example 2.19. Suppose the vectors e1 and e2 are rotated (counterclockwise) about the z-axis through an angle θ while e3 is left fixed. A simple trigonometric calculation tells us that the rotation transforms the natural basis S of R^3 into a new basis
T = { (cos θ, sin θ, 0)^t, (-sin θ, cos θ, 0)^t, (0, 0, 1)^t }.
The change of basis matrix from T to S is given by
B = [cos θ -sin θ 0; sin θ cos θ 0; 0 0 1].
By rotating back through the angle θ (clockwise), it is easy to see that the change of basis matrix from S to T is simply B^t. One can verify with a separate calculation that in this case B^t = B^{-1}.

2.3 Linear Transformations and their Representation

Prerequisites: Vector spaces. Subspaces. Matrix manipulation, Gaussian elimination. Bases.
Learning Objectives: Ability to identify linear transformations. Familiarity with basic properties of linear transformations. Familiarity with the kernel and range of a linear transformation. Ability to determine the action of a linear transformation from its action on a basis. Familiarity with the "rank plus nullity" theorem. Ability to compute the standard matrix representation for a linear transformation from R^n to R^m. Ability to compute the matrix representation of a linear transformation in any bases.
If a function L maps a vector space V to a vector space W , we denote it by L : V ! W: We call L a linear transformation if the following two properties hold: 1. L(u + v) = L(u) + L(v) 2. L(ku) = kL(u) for all vectors u; v 2 V and k real or complex. Example 2.20. Let L : R2 ! R3 be dened by L(x; y) = (x; x + y; x y) To see that L is linear simply note that L[(x; y) + (u; v)] = L(x + u; y + v) = (x + u; x + u + y + v; x + u y v) = (x; x + y; x y) + (u; u + v; u v) = L(x; y) + L(u; v) 55 Linear Transformations Problem 2.25. Prove that the following are linear transformations. 1. L : R3 ! R2 dened by L(a; b; c) = (a; b) 2. L : R2 ! P3 dened by L(a; b) = ax2 + bx2 + ax + (a + 2b) 3. L : Rn ! Rm dened by where A is an m n matrix. 4. L : V ! W dened by L(u) = Au; L(u) = 0 for all u in V . Here V and W are vector spaces. (The transformation L here is called the 0 transformation.) The linear transformation in the third part of the above problem L(u) = Au is called a matrix transformation. We shall see in the next section that all linear transformations between (nite dimensional) vector spaces can be represented as matrix transformations. Problem 2.26. Determine whether the following functions from R2 to R3 are linear. 1. L(x; y) = (0; 0; y) 2. L(x; y) = (px; x; 0) 3. L(x; y) = (1; x; y) We need the following elementary properties of linear transformations. Theorem 2.21. Let L : V ! W be a linear transformation. Then 1. L(0) = 0 2. L( v ) = L(v) Proof: By linearity L(0) = L(0 + 0) = L(0) + L(0): Hence L(0) = 0. Also L( v) = L( 1v) = 1L(v) = L(v): 56 Vector Spaces Problem 2.27. Prove that if L is linear then L(v w) = L(v) L(w): If L : V ! W is a linear transformation, the set of vectors u in V such that L(u) = 0 is called the kernel of L and denoted Ker(L). The set of vectors w in W such that there exists u in V with L(u) = w is called the range of L and denoted Ran(L). Problem 2.28. 
Prove that the kernel and range of a linear transformation from V to W are subspaces of V and W respectively. We have already used the terms kernel and range for matrices. Indeed, let L : Rn ! Rm be dened by L(u) = Au where A is an m n matrix. Here the kernel of L is precisely the kernel of A. Similarly the range of L is the range of A. Problem 2.29. Dene L(u) = Au where 2 3 4 1 2 3 6 2 1 1 4 77 A=6 4 2 1 1 45 6 0 9 9 1. Find a basis for the kernel and range of L. 2. Determine which of the following vectors are in the range of L: (0; 0; 0; 6); (1; 3; 0; 0); (2; 4; 0; 1): 3. Determine which of the following vectors are in the kernel of L: (3; 9; 2; 0); (0; 0; 0; 1); (0; 4; 1; 0): One of the most important facts about linear transformations is that once Indeed, let L:V !W be linear and let S = fv1 ; : : : vn g be a basis for V . Suppose we are given the values of L(v1 ); : : : ; L(vn ): we know the transformation on a basis, we know it completely. 57 Linear Transformations Then if v 2 V , we have By linearity we obtain v = a1 v1 + + an vn : L(v) = L(a1 v1 + + an vn ) = a1 L(v1 ) + + an L(vn ): Thus L is completely determined. Problem 2.30. Let L : R3 ! R2 be a linear transformation such that L(1; 0; 0) = (1; 1) L(0; 1; 0) = (0; 2) L(0; 0; 1) = (1; 2): Find L(2; 1; 3). If L : V ! W is a linear transformation, the following terminology follows naturally from the matrix case: the dimension of the range of L is the rank of L; the dimension of the kernel of L is the nullity of L. We have the following theorem which relates rank and nullity. Theorem 2.22. If L : V ! W is a linear transformation, then rank(L) + nullity(L) = dim(V ) Proof: Let r = rank(L) and fw1 ; w2 ; ; wr g be a basis for Ran(L). Likewise, let = nullity(L) and fn1 ; n2 ; ; n g be a basis for Ker(L). Then r vectors exist fv1 ; v2 ; ; vr g V so that L(vi ) = wi for each i = 1; 2; ; r. 
Consider the composite set of vectors in V , S = fv1 ; v2 ; ; vr ; n1 ; n2 ; ; n g Every vector in V can be represented as a linear combination of vectors in the set S. Indeed, pick an arbitrary vector v 2 V . Then for some choice of scalars f1 ; 2 ; ; r g L(v) = = r X i=1 r X i=1 i wi i L(vi ) r X = L( P r v ) = 0 and v So that L(v i=1 i i must be scalars f1 ; 2 ; ; g so that v r X i=1 i=1 i vi ) Pr i vi = i=1 i vi X j =1 j nj 2 Ker(L). But, then there 58 Vector Spaces and so, v is a linear combination of vectors in S: v= r X i=1 i vi + X j =1 j nj The vectors of S are linearly independent and so form a basis for V . To see this consider a linear combination of the vectors of S : r X i=1 i vi + X j =1 j nj = 0 Applying L to both sides of the equation yields 0 r X L @ i vi r X i=1 i=1 + i L(vi ) + X j =1 X j =1 1 j nj A = 0 j L(nj ) = 0 r X i=1 i wi = 0 Since fw1 ; w2 ; ; wr g is a basis for Ran(L), i = 0 for each i = 1; 2; ; r. But then, X j =1 j nj = 0 and since fn1 ; n2 ; ; n g is a basis for Ker(L), it must be that also j = 0 for each j = 1; 2; ; . We conclude that S must be linearly independent set of vectors so that dim(V ) = r + , as asserted by the theorem. 2 Problem 2.31. Let L : Rn ! Rn such that L(v) = 3v for all v 2 V . Find the kernel and range of L. Problem 2.32. Let v1 = (1; 2; 3); v2 = (2; 5; 3); v3 = (1; 0; 10) be a basis for R3 . Find a general formula for L(v), a linear transformation on V , in terms of L(v1 ); L(v2 ) and L(v3 ) given that L(v1 ) = (1; 0); L(v2 ) = (1; 0); L(v3 ) = (0; 1) Then nd L(1; 1; 1). 59 Linear Transformations 2.3.1 Matrix Representations Suppose L : R3 ! R2 is a linear transformation such that L(1; 0; 0) = (a11 ; a21 ) L(0; 1; 0) = (a12 ; a22 ) L(0; 0; 1) = (a13 ; a23 ): Consider the matrix A= a11 a12 a13 a21 a22 a23 It is easy to see that Lx = Ax for all x in R3 . 
Indeed if x = (x1 ; x2 ; x3 ) we have by the linearity of L that L(x1 ; x2 ; x3 ) = x1 L(1; 0; 0) + x2 L(0; 1; 0) + x3 L(0; 0; 1) = (x1 a11 ; x1 a21 ) + (x2 a12 ; x2 a22 ) + (x3 a13 ; x3 a23 ) = (x1 a11 + x2 a12 + x3 a13 ; x1 a21 + x2 a22 + x3 a23 ): Clearly x 1 a11 + x2 a12 + x3 a13 A(x) = x a + x a + x a 1 21 2 22 3 23 Thus we have expressed L in terms of the matrix A. Note that to nd the columns of A we have taken the natural basis vectors in R3 applied L to them and expressed the resulting vectors in their natural basis coordinates in R2 . We say then that A is the matrix of L with respect to the bases S = f(1; 0; 0); (0; 1; 0); (0; 0; 1)g and T = f(1; 0); (0; 1)g : Since both bases are natural we also call A the standard or natural matrix of L. Example 2.23. 1. Let L : R3 ! R3 be dened by L(a1 ; a2 ; a3 ) = (a1 ; a2 + a3 ; 0): Since L(1; 0; 0) = (1; 0; 0); L(0; 1; 0) = (0; 1; 0); L(0; 0; 1) = (0; 1; 0) we have that 2 3 1 0 0 A=4 0 1 1 5 0 0 0 2. Let L : R2 ! R2 be dened by L(a1 ; a2 ) = (a2 ; a1 ) 60 Vector Spaces Since we have that L(1; 0) = (0; 1); L(0; 1) = (1; 0); 0 1 1 0 Problem 2.33. In parts 1. and 2. nd the standard matrix for the given linear transformation L. 1. Let L : R2 ! R4 be dened by L(x1 ; x2 ) = ( x2 ; x1 ; x1 + 3x2 ; x1 + x2 ): A= 2. Let L : R4 ! R5 be dened by L(x1 ; x2 ; x3 ; x4 ) = (x4 ; x1 ; x3 ; x2 ; x1 x3 ): 3. Let L : R2 ! R2 map each vector into its symmetric image about the y-axis. Suppose now that L : V ! W is a linear transformation between two nite dimensional vector spaces V and W with bases S = fv1 ; : : : ; vn g and T = fw1 ; : : : ; wm g respectively. To nd the matrix A of L with respect to S and T, we proceed as before. If L(v1 ) = w1 ; : : : ; L(vn ) = wn ; we set the n columns of A to be (w1 )T ; : : : (wn )T : Then if L(v) = w; we have that A(v)S = (w)T : An easy way to remember how to nd A is to write it in the form A = [(L(v1 ))T : : : (L(vn ))T ]: Warning ! 
The relationship between L and A depends completely on the bases S and T. Indeed for A to represent L with respect to S and T, it must consider all its input vectors with respect to the S coordinate system and all its output vectors with respect to the T coordinate system. We also remark that when V = W and S = T, we sometimes call A the matrix of L with respect to S. 61 Linear Transformations ! R be dened by L(x ; x ) = (x + x ; 2x + 4x ): It is easy to see that the standard matrix for L with respect to the natural basis is 1 1 A= 2 4 Example 2.24. Let L : R2 1 2 2 1 2 1 2 Suppose we want to nd the matrix B of L with respect to S = T = fu1 ; u2 g ; where u1 = (1; 1); and u2 = (1; 2): As before we nd that B = [(L(u1 ))T (L(u2 ))T ] Now L(u1 ) = (2; 2) = (2; 0)T ; and L(u2 ) = (3; 6) = (0; 3)T : We obtain B = 20 03 One notes in the above example that although L(1; 0) = (1; 2); we have that B 10 = 20 This is not a contradiction. Indeed B only inputs vectors in the S basis. Hence the vector multiplied by B is actually the vector (1; 0)S which in the natural basis is the vector (1; 1). If L is a linear transformation from Rn to Rm and we want to nd the kernel and range of L, we should rst nd the standard matrix A for L and then nd the kernel and range of A. Problem 2.34. 1. Let L : R2 ! R3 be dened by L(x1 ; x2 ) = (x1 x2 ; x1 ; 0): Find the matrix of L with respect to S = f(1; 3); ( 2; 4)g and T = f(1; 1; 1); (2; 2; 0); (3; 0; 0)g : 62 Vector Spaces 2. Let L : R3 ! R3 be dened by L(x1 ; x2 ; x3 ) = (x1 x2 ; x2 x1 ; x2 x3 ): Find the matrix of L with respect to S = f(1; 0; 1); (0; 1; 1); (1; 1; 0)g Problem 2.35. In Problem 2.34 parts 1. and 2. nd a basis for the kernel and range of L. 2.3.2 Similarity of Matrices Let L : V ! V be a linear transformation on a nite dimensional vector space. Let R and S be two bases for V and let A and B be the matrix of L with respect to R and S respectively. 
As the reader may have suspected, there is a special relationship between A and B. Indeed, let P be the change of basis matrix from S to R. Then if L(v) = w we have

A(v)_R = (w)_R.

This implies that

A P(v)_S = P(w)_S.

But also

B(v)_S = (w)_S.

Since the last two equations are true for every v ∈ V, we must have that

B = P^{-1} A P.     (2.4)

This leads to the following definition. If A and B are two n × n matrices related by equation (2.4), we say that A is similar to B.

Problem 2.36. Show that if A is similar to B then B is similar to A.

If A is similar to B, the preceding problem allows us to say that A and B are similar with no ambiguity. We have just seen that if two n × n matrices A and B represent the same linear transformation, then A and B are similar. Suppose A and B are similar. We would like to show that they can be made to represent a common linear transformation. To see this, define L(v) = Av. Then L is a linear transformation from R^n to itself and A is the standard matrix for L. (Note that since we have not subscripted v with a basis on the right side of the above equation, we are, in keeping with our convention, assuming it has natural basis coordinates.) We define S = {Pe1, …, Pen}, where e1, …, en are the natural basis vectors for R^n. Since P is invertible, S is a linearly independent set and hence a basis for R^n, and P is the change of basis matrix from S to the natural basis.

Problem 2.37. Show that the claims made in the last sentence are true.

Hence B is the matrix of L with respect to S. Indeed, if L(v) = w then

B(v)_S = P^{-1} A P(v)_S = P^{-1} A v = P^{-1} w = (w)_S.

Example 2.25. Let L, S, A, and B be as in Example 2.24. We now know that A and B are similar. In fact, from the above discussion, B = P^{-1} A P, where P is the change of basis matrix from S to the natural basis. Thus

P = [ 1 1 ]
    [ 1 2 ]

2.4 Determinants

Prerequisites: Vector spaces. Subspaces. Matrix manipulation, Gauss elimination.
Bases. Linear transformations.

Learning Objectives: Familiarity with the concept of the determinant as a measure of the distortion of a matrix transformation. Ability to compute the determinant of a matrix by expressing the matrix as a product of elementary matrices.

If A is a 2 × 2 matrix

A = [ a b ]
    [ c d ]

the inverse of A is given by solving AB = I to get

B = 1/(ad − bc) [  d −b ]
                [ −c  a ]

The quantity ad − bc is called the determinant of A and is nonzero if and only if A is invertible. By algebraically calculating the inverse of a general square matrix A, determinants can be defined analogously by factoring out the denominators of the resulting elements. Not surprisingly, determinants have played an important historical role in the development of the theory of linear systems; however, their current role is much diminished and restricted for the most part to use as an analytical tool. Our goal in this section is to give geometric motivation for the definition of the determinant of a square matrix and to show the reader a reasonable way to compute it. We begin by proving the following theorem.

Theorem 2.26. Let A ∈ R^{n×n}, and let Ω ⊂ R^n be a bounded open set with smooth boundary. We let

A(Ω) := {y ∈ R^n | y = Ax for some x ∈ Ω}

denote the image of Ω under the linear transformation A. Then the ratio

vol(A(Ω)) / vol(Ω)

depends only on the matrix A. In particular, it is independent of the set Ω.

To prove this we use the following lemmas.

Lemma 2.27. Any linear transformation L : R^n → R^n maps parallel lines to parallel lines, parallelepipeds to parallelepipeds, and translated sets to translated sets.

Proof. Let ℓ1 and ℓ2 be parallel lines in R^n. Then there are vectors u1, u2, and v such that

ℓi = {ui + tv | t ∈ R},  i = 1, 2.

Now by linearity

L(ℓi) = {L(ui + tv) = L(ui) + tL(v) | t ∈ R},  i = 1, 2.

So the sets L(ℓi) are parallel lines as well. The second assertion follows directly from the first. The third assertion is left as a problem.

Problem 2.38.
Prove that any linear transformation L : R^n → R^n maps translated sets to translated sets; i.e., let Ω ⊂ R^n and let

Ω + v := {v + x | x ∈ Ω}.

Show that L(Ω + v) = L(Ω) + L(v).

Lemma 2.28. Let ai > 0, i = 1, 2, …, n, and let

P(a1, a2, …, an) := {(x1, x2, …, xn) ∈ R^n | 0 ≤ xi ≤ ai, i = 1, 2, …, n}

denote a parallelepiped with sides of length ai along the positive i-th coordinate axes. We let P1 := P(1, 1, …, 1) denote the unit parallelepiped. Then if A ∈ R^{n×n},

vol(A(P(a1, a2, …, an))) = a1 a2 ⋯ an vol(A(P1)).

Proof. We first note that by linearity and the properties of Euclidean length, ‖A(αei)‖ = α‖A(ei)‖ for any α > 0. Thus the length of a side of the deformed parallelepiped is proportional to the length of the original side. We now claim that

vol(A(P(a1, a2, …, an))) = ai vol(A(P(a1, …, a_{i−1}, 1, a_{i+1}, …, an))),  i = 1, 2, …, n.

This is clear in two dimensions, where the area of a parallelogram with sides a1 and a2 is given by a1 a2 sin θ, with θ the acute angle between the two sides. Thus the area scales with the length of any one side. It is also clear in three dimensions, where the volume of a parallelepiped with sides of lengths a1, a2, and a3 is given by a1 a2 a3 sin θ1 sin θ2. Here θ1 is the acute angle between the sides a1 and a2, and θ2 is the acute angle between the side a3 and the plane containing a1 and a2. The volume of a general parallelepiped in higher dimensions is defined similarly, but requires a more general definition of projection and angle, so we skip the proof for these cases. The lemma follows directly from the assertion above.

Proof of Theorem 2.26. It follows immediately from Lemma 2.28 that

vol(A(P(a1, a2, …, an))) / vol(P(a1, a2, …, an)) = vol(A(P1))

for any choice of a1, a2, …, an. Note that by Lemma 2.27 this also holds for translations of the parallelepiped P(a1, a2, …, an). For a general region Ω ⊂ R^n, the volume is defined by integration theory.
That is, the volume of Ω is approximated by the sum of the volumes of small parallelepipeds with disjoint interiors and sides along the coordinate axes. The volume of A(Ω) can then be approximated by the sum of the volumes of the deformed parallelepipeds. Each of the terms in the sum can be expressed as the product of vol(A(P1)) and the volume of the corresponding parallelepiped in Ω. Thus, for every approximating partition of Ω and A(Ω) into a collection of parallelepipeds, the ratio of the two approximations is always vol(A(P1)). Taking the limit gives us the final result. □

Definition 2.29. We call the constant vol(A(P1)) the distortion of A and denote it by

dist(A) := vol(A(P1)).

Problem 2.39. In the following problem, all matrices are assumed to have real entries.
1. Show that if E is a 2 × 2 elementary matrix of Type 1 or 2, then dist(E) = 1.
2. Show that if D is a 2 × 2 diagonal matrix, then dist(D) is the absolute value of the product of the diagonal entries.
3. Show that dist(AB) = dist(A) dist(B).
4. Show that dist(A) = 0 if and only if A is singular.

Problem 2.40. Show that the distortion of A can be computed by reducing it to row echelon form using elementary matrices. Give an example of the process.

Remark 2.30. When integrating vector valued functions in the new coordinate system in R^2 (for example, using Green's Theorem), it is not enough to know the distortion of A; we also need to know whether the orientation of the path is preserved. Specifically, suppose we have a closed curve which is oriented positively, that is, as we traverse the curve, the inside is to our left. If we map this curve under A, does the resulting closed curve have the same property, or is the inside now to the right? In the first case A is called orientation preserving, while in the second case it is called orientation reversing. Without this knowledge the integral is correct only up to sign.

Problem 2.41. The following exercises reveal the orientation properties of 2 × 2 elementary matrices.
1. Let

E = [ 0 1 ]
    [ 1 0 ]
Show that E (an elementary matrix of Type 2) is orientation reversing.
2. Show that any 2 × 2 Type 1 elementary matrix with real entries is orientation preserving.
3. Show that any 2 × 2 nonsingular diagonal matrix with real entries is orientation preserving if and only if the product of the diagonal elements is positive.

Remark 2.31. With the motivation of Problems 2.39 and 2.41, we would like to define the determinant of A, denoted det(A), to be a complex valued function on n × n matrices with complex entries that measures distortion according to the properties of Problem 2.39 and measures orientation according to the properties of Problem 2.41. (The notation |A| is sometimes used to denote the determinant, but we will reserve that notation for the matrix having as its elements the absolute values of the elements of A.) Unfortunately, although the function dist(A) is well defined, it is not at all clear whether the same would be true of det(A) given by the properties above. Indeed, we would require that the determinant of a diagonal matrix D is the product of its diagonal entries. However, if we factor such a matrix into two matrices, say D = AB, can we be sure that det(A) det(B) is not the negative of det(D)? To avoid this problem, we will give a rather mysterious definition of the determinant and then prove that it satisfies the properties suggested by Problems 2.39 and 2.41.

Definition 2.32 (Determinant). If (j1, …, jn) is an ordered n-tuple of integers between 1 and n inclusive, define

s(j1, …, jn) = ∏_{p<q} sgn(jq − jp),

where ∏ denotes a product and sgn(x) is 1 if x is positive, −1 if x is negative, and 0 if x = 0. For an n × n matrix A ∈ C^{n×n} having entries [aij] we define

det(A) := Σ s(j1, …, jn) a_{1,j1} a_{2,j2} ⋯ a_{n,jn},

where the sum extends over all ordered n-tuples of integers (j1, …, jn) with 1 ≤ jℓ ≤ n.

Remark 2.33. Note that s(j1, …, jn) is either 1, −1, or 0, and it changes sign if any two of the j's are interchanged.
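Definition 2.32 can be translated into code essentially verbatim, though summing over all n^n ordered n-tuples makes it hopelessly slow beyond tiny matrices; a sketch (not part of the original notes):

```python
from itertools import product

def sign(j):
    # s(j1,...,jn) = product over p < q of sgn(j_q - j_p);
    # it is 0 exactly when two indices coincide
    s = 1
    for p in range(len(j)):
        for q in range(p + 1, len(j)):
            d = j[q] - j[p]
            if d == 0:
                return 0
            if d < 0:
                s = -s
    return s

def det(A):
    n = len(A)
    # sum over all ordered n-tuples; tuples with a repeated
    # index contribute nothing since s(...) = 0 there
    total = 0
    for j in product(range(n), repeat=n):
        s = sign(j)
        if s:
            term = s
            for i in range(n):
                term *= A[i][j[i]]
            total += term
    return total

assert det([[1, 2], [3, 4]]) == 1 * 4 - 2 * 3   # = -2
```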
Because of this, each nonzero term in the sum defining the determinant consists of a product of entries of A, one from each row and one from each column. Since s(j1, …, jn) = 0 if any two of the j's are equal, there are only n! nonzero terms in all.

Problem 2.42. Show that the formula for the determinant of a 3 × 3 matrix is

a11 a22 a33 − a11 a23 a32 + a12 a23 a31 − a12 a21 a33 + a13 a21 a32 − a13 a22 a31.

Theorem 2.34. The determinant has the following properties.
1. det(E) = 1 if E is an elementary matrix of Type 1.
2. det(E) = −1 if E is an elementary matrix of Type 2.
3. det(D) equals the product of the diagonal elements if D is diagonal.
4. det(AB) = det(A) det(B).
5. det(A) = 0 if and only if A is singular.

Problem 2.43. Use the definition of the determinant to show that the first three properties in Theorem 2.34 hold.

It is a little trickier to prove that the fourth and fifth properties of Theorem 2.34 hold. They are consequences of the following lemma.

Lemma 2.35. For any j = 1, 2, …, n, the determinant is a linear function of the j-th column if all the other columns are left fixed.

We omit the proof (see W. Rudin, Principles of Mathematical Analysis, pp. 232-234).

Problem 2.44. Let E be an elementary matrix of Type 1 or 2. Prove, using the definition of a determinant, that the determinant of E is equal to that of its transpose. Then, using Theorem 2.34, show that this is true for any square matrix.

Problem 2.45. Show that the determinant of an upper triangular matrix

A = [ a11 a12 …  a1n ]
    [  0  a22 …  a2n ]
    [          ⋱     ]
    [  0   0  …  ann ]

is the product of its diagonal entries.

Example 2.36. We now compute a determinant by expressing it as the product of elementary matrices. Note that we do not explicitly write down these elementary matrices, but simply need to keep track of the effect they have on the determinant as we row reduce the matrix to an upper triangular one (i.e., put the matrix in row echelon form).
det [0 1 5; 3 6 9; 2 6 1]
  = −det [3 6 9; 0 1 5; 2 6 1]        (rows 1 and 2 swapped: a Type 2 operation)
  = −3 det [1 2 3; 0 1 5; 2 6 1]      (a factor of 3 removed from row 1)
  = −3 det [1 2 3; 0 1 5; 0 0 −15]    (Type 1 row operations, which leave the determinant unchanged)
  = −3 · (1)(1)(−15) = 45.

Problem 2.46. Compute

det [ 3 6 9 3 ]
    [ 1 0 1 0 ]
    [ 1 2 2 1 ]
    [ 1 3 2 1 ]

Problem 2.47. Compute

det [ 0 2 1 0 ]
    [ 1 0 1 1 ]
    [ 2 1 3 1 ]
    [ 0 1 2 3 ]

Problem 2.48. Show that the system

ax + by = e
cx + dy = f

is consistent for any value of e and f if and only if det(A) ≠ 0, where A = [a b; c d] is the coefficient matrix.

Chapter 3
Inner Products and Best Approximations

3.1 Inner Products

Prerequisites: Vector spaces. Complex arithmetic.
Advanced Prerequisites: Function spaces.

Learning Objectives: Familiarity with the definition of an inner product. Familiarity with the basic examples of inner product spaces. Familiarity with the Cauchy-Schwarz inequality. Familiarity with the concept of orthogonality and the orthogonal complement in inner product spaces.

In R^2, one is able to calculate the angle between vectors using the dot product. Indeed, if u = (u1, u2) and v = (v1, v2), the angle θ between u and v satisfies

cos(θ) = u·v / (‖u‖‖v‖),

where u·v = u1 v1 + u2 v2 is the dot product of u and v, and ‖u‖ = √(u·u) = √(u1² + u2²) is the (usual) Euclidean length of a vector u. In order to extend the geometric notions of "angle between vectors" and "length of a vector" to more general vector spaces, it would seem enough to have a suitable generalization of the dot product. The inner product is such a generalization.

Let V be a complex vector space. An inner product on V is a function that maps pairs of vectors u, v ∈ V to a complex scalar ⟨u, v⟩ in such a way that for all u, v, w ∈ V:
1. ⟨u, u⟩ is real and nonnegative, and ⟨u, u⟩ = 0 if and only if u = 0.
2. ⟨u, v⟩ = conj(⟨v, u⟩).
3.
⟨u, αv + βw⟩ = α⟨u, v⟩ + β⟨u, w⟩ for all scalars α, β.

If V is a real vector space, then for every pair of vectors u, v ∈ V, ⟨u, v⟩ is a real scalar, so that conditions (1) and (3) continue to hold; condition (2) becomes

⟨u, v⟩ = ⟨v, u⟩.

A vector space V on which an inner product is defined is called an inner product space. We define the norm of a vector in an inner product space as

‖u‖ = √⟨u, u⟩.

Examples of inner product spaces:

R^2, the set of real ordered pairs. For vectors u = (u1, u2) and v = (v1, v2) in R^2,

⟨u, v⟩ = u1 v1 + u2 v2 = u^t v

defines an inner product on R^2. This provides a model for generalizations to other vector spaces.

C^n, the set of complex n-tuples. For vectors u = (u1, u2, …, un) and v = (v1, v2, …, vn) in C^n,

⟨u, v⟩ = Σ_{i=1}^n conj(ui) vi = u* v

defines an inner product on C^n. (The conjugate transpose of a matrix A = [aij] is defined elementwise as A* = B = [bij] with bij = conj(aji).)

Pn, the set of (complex) polynomials of degree n or less. Pick n + 1 distinct points {zi}_{i=0}^n in the complex plane. For polynomials p, q ∈ Pn,

⟨p, q⟩ = Σ_{i=0}^n conj(p(zi)) q(zi)

defines an inner product on Pn.

C^{m×n}, the set of m × n matrices with complex entries. For any matrix T ∈ C^{n×n} define trace(T) = Σ_{i=1}^n tii. Then for any A, B ∈ C^{m×n},

⟨A, B⟩ = trace(A* B)

defines an inner product on C^{m×n}. The associated norm,

‖A‖_F = √trace(A* A) = ( Σ_{i,j} |aij|² )^{1/2},

is called variously the Frobenius norm, the Hilbert-Schmidt norm, or the Schatten 2-norm. We adhere to the name "Frobenius" and will hang an "F" on such matrix norms to distinguish them from others yet to come.

C[a, b], the set of real-valued continuous functions on [a, b]. For any functions f, g ∈ C[a, b],

⟨f, g⟩ = ∫_a^b f(x) g(x) dx

defines an inner product on C[a, b].

Problem 3.1. Prove that in each case above, the function ⟨·,·⟩ satisfies all the conditions of an inner product.

It is a subtle but fundamental observation that the defining properties of the inner product are sufficient to guarantee that |⟨u, v⟩| / (‖u‖‖v‖) ≤ 1.
This allows us to extend sensibly the notion of angle between vectors to general vector spaces: the angle θ_uv between the vectors u, v ∈ V is defined so that

cos(θ_uv) = ⟨u, v⟩ / (‖u‖‖v‖).

The following theorem guarantees that the right-hand quantity is bounded by 1 in magnitude; it is known as the Cauchy-Schwarz inequality. (Note that the conditions given for equality translate into the observation that θ_uv is either 0 or π if and only if u and v are collinear.)

Theorem 3.1. Let u, v ∈ V, an inner product space. Then

|⟨u, v⟩|² ≤ ‖u‖² ‖v‖².

Equality holds if and only if u = kv for some scalar k ∈ C.

Proof: Let a = ‖v‖², b = 2|⟨u, v⟩|, and c = ‖u‖². First, if a = 0 then v = 0 (by property 1 of an inner product), which would then imply (by property 3 of an inner product) that b = 0. Clearly in this case the conclusion is true. Now consider the case a > 0 and choose θ ∈ [0, 2π) so that e^{iθ}⟨u, v⟩ is a real nonnegative number. Then

e^{iθ}⟨u, v⟩ = |e^{iθ}⟨u, v⟩| = |⟨u, v⟩|.

Pick a real number t arbitrarily and define z = t e^{iθ}. One may calculate:

0 ≤ ‖u + zv‖² = ⟨u + zv, u + zv⟩ = ‖u‖² + z⟨u, v⟩ + conj(z⟨u, v⟩) + |z|²‖v‖²     (3.1)
  = c + bt + at².

Since this is true for all t ∈ R, the quadratic polynomial at² + bt + c is always nonnegative, has no two distinct real roots, and thus has a nonpositive discriminant b² − 4ac. Thus b² ≤ 4ac, and the conclusion is obtained by taking square roots on each side. Notice that equality in the Cauchy-Schwarz inequality is equivalent to a zero discriminant (b² = 4ac), which means that equality can be attained in (3.1) with t = −b/(2a). But that means in turn that u + zv = 0. □

Let V be an inner product space. Two vectors u and v in V are called orthogonal (or perpendicular) if ⟨u, v⟩ = 0.
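The Cauchy-Schwarz inequality can be spot-checked numerically; a sketch in C^n (not part of the original notes; NumPy's vdot conjugates its first argument, matching the convention used here):

```python
import numpy as np

rng = np.random.default_rng(0)

def inner(u, v):
    # the C^n inner product <u, v> = u* v (conjugate on the first argument)
    return np.vdot(u, v)

# |<u,v>|^2 <= ||u||^2 ||v||^2 on random complex vectors
for _ in range(1000):
    u = rng.normal(size=4) + 1j * rng.normal(size=4)
    v = rng.normal(size=4) + 1j * rng.normal(size=4)
    assert abs(inner(u, v))**2 <= inner(u, u).real * inner(v, v).real + 1e-9

# equality holds exactly when u is a scalar multiple of v
v = np.array([1.0 + 1j, 2.0, -1j])
u = (3 - 2j) * v
assert np.isclose(abs(inner(u, v))**2, inner(u, u).real * inner(v, v).real)
```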
A set of vectors {v1, v2, …, vr} ⊂ V is called orthogonal if

⟨vi, vj⟩ = 0 for i ≠ j.

An orthogonal set is called orthonormal if furthermore

⟨vi, vi⟩ = ‖vi‖² = 1,  i = 1, 2, …, r.

For example, the set {(0, i, (i + 1)/√2), (0, (1 + i)/√2, −1)} is orthogonal, and each vector has length √2, so dividing each vector by √2 produces an orthonormal set.

Let W be a subspace of an inner product space V. The set of vectors

W⊥ = {u ∈ V : ⟨u, w⟩ = 0 for all w ∈ W}

is called the orthogonal complement of W.

Problem 3.2. Show that
(a) W⊥ is a subspace of V,
(b) W ⊂ (W⊥)⊥,
(c) W ∩ W⊥ = {0}.

Problem 3.3. Show that if B is a set of orthogonal vectors, none of which is the zero vector, then B is linearly independent.

Problem 3.4. Show that if B = {w1, w2, …, wℓ} is a set of linearly independent vectors, then the ℓ × ℓ matrix G = [⟨wi, wj⟩] is nonsingular. G is called the Gram matrix for B.

Problem 3.5. Prove the Pythagorean Theorem in inner product spaces: if u and v are orthogonal then ‖u + v‖² = ‖u‖² + ‖v‖².

3.2 Best approximation and projections

Prerequisites: Inner product spaces. Matrix algebra.

Learning Objectives: Familiarity with the concept of (possibly skew) projections. The ability to compute orthogonal projections. Understanding of the role of orthogonal projections in solving best approximation problems.

Very often in application settings a particular subspace W of an inner product space V may have some special properties that make it useful and interesting to approximate any given vector v ∈ V as well as possible with a corresponding vector w ∈ W. That is, find w* ∈ W that solves

min_{w∈W} ‖w − v‖ = ‖w* − v‖.     (3.2)

We have the following characterization of the solution w*.

Theorem 3.2. Let W be a subspace of an inner product space V. The vector w* ∈ W is a solution to (3.2) if and only if w* − v ⊥ W. Furthermore, for any given v ∈ V, there can be no more than one such solution w*.

Proof: Suppose that w* ∈ W is a solution to (3.2) and pick an arbitrary vector w ∈ W.
Choose θ ∈ [0, 2π) so that e^{iθ}⟨w* − v, w⟩ is real and nonnegative. Now, for any real ε > 0 define z = εe^{iθ} and notice that

‖w* − v‖² ≤ ‖(w* − zw) − v‖²     (3.3)
          = ‖w* − v‖² − 2ε|⟨w* − v, w⟩| + ε²‖w‖².

So for all ε > 0,

0 ≤ −2ε|⟨w* − v, w⟩| + ε²‖w‖²,

which means

0 ≤ −2|⟨w* − v, w⟩| + ε‖w‖².

Since we are free to make ε as small as we like, it must be that ⟨w* − v, w⟩ = 0, and this is true for each w ∈ W.

To prove the converse, suppose w* ∈ W satisfies w* − v ⊥ W. Then for any w ∈ W, (w − w*) ∈ W, and

‖w − v‖² = ‖(w − w*) + (w* − v)‖² = ‖w − w*‖² + ‖w* − v‖² ≥ ‖w* − v‖²,

so w* solves (3.2). To prove uniqueness, suppose that there were two solutions to (3.2), say w1 and w2. Then (w1 − v) ∈ W⊥ and (w2 − v) ∈ W⊥. Since W⊥ is a subspace, we find

(w1 − v) − (w2 − v) = (w1 − w2) ∈ W⊥.

On the other hand, (w1 − w2) ∈ W. Since the only vector both in W and in W⊥ is 0 (see Problem 3.2), we find w1 = w2. □

While this is a nice characterization of solutions to best approximation problems, this result leaves open the question of whether a solution w* to (3.2) always exists and, if so, how one might go about calculating it. The following theorem describes one way of obtaining a solution to (3.2).

Theorem 3.3. Let V be an inner product space. If W is a finite-dimensional subspace of V, then every vector v ∈ V can be expressed uniquely as v = w* + w⊥, where w* ∈ W and w⊥ ∈ W⊥. w* is the unique solution to (3.2).

Proof: Let B = {w1, w2, …, wr} be a basis for W. Define [γij] to be the r × r matrix inverse to the Gram matrix [⟨wi, wj⟩] (see Problem 3.4), set

w* = Σ_{i,j=1}^r wi γij ⟨wj, v⟩,

and pick an arbitrary w ∈ W. Then w = a1 w1 + ⋯ + ar wr, where ai ∈ C, i = 1, 2, …, r. We need only prove that v − w* and w are orthogonal regardless of which w ∈ W was chosen, and then we can define w⊥ = v − w*. Using properties of the inner product we see that

⟨w, v − w*⟩ = Σ_{k=1}^r conj(ak) ⟨wk, v − w*⟩ = Σ_{k=1}^r conj(ak) (⟨wk, v⟩ − ⟨wk, w*⟩).

Observe that

⟨wk, w*⟩ = Σ_{i,j=1}^r ⟨wk, wi⟩ γij ⟨wj, v⟩
        = Σ_{j=1}^r ( Σ_{i=1}^r ⟨wk, wi⟩ γij ) ⟨wj, v⟩ = ⟨wk, v⟩,

since [γij] is the inverse of the Gram matrix [⟨wi, wj⟩]. Thus ⟨w, v − w*⟩ = 0. Since w* − v ∈ W⊥, w* is the unique solution to (3.2). □

Corollary 3.4. W = (W⊥)⊥.

Proof: We saw in Problem 3.2 that W ⊂ (W⊥)⊥. Pick any vector v ∈ (W⊥)⊥. Then v = w* + w⊥, and since w* ∈ W ⊂ (W⊥)⊥, one finds that v − w* ∈ (W⊥)⊥. But v − w* = w⊥ ∈ W⊥; so v − w* = 0 and v = w* ∈ W. □

The vector w* in the above decomposition of v is called the orthogonal projection of v onto W. The vector w⊥ is called the component of v orthogonal to W. From the proof of the theorem, one can see that best approximations to a subspace W can be found in a straightforward way if a basis for W is known.

The mapping that carries the vector v to the vector w* that solves (3.2) is called an orthogonal projector, which we will denote as w* = P_W(v). P_W is a linear transformation, since evidently

w* = Σ_{i,j=1}^r wi γij ⟨wj, v⟩

inherits linearity with respect to v from the linearity of the inner product with respect to its second argument. If V = C^n, we have the following convenient matrix representation of P_W in terms of {w1, w2, …, wr}:

P_W = W G^{-1} W*,

where W = [w1, w2, …, wr] and G = W* W is the Gram matrix for {w1, w2, …, wr}.

In general, what characterizes an orthogonal projector?

Theorem 3.5. P_W is an orthogonal projector onto a subspace W of C^n if and only if:
1. Ran(P_W) = W,
2. P_W² = P_W,
3. P_W* = P_W.

Proof: Suppose that P_W represents an orthogonal projector onto W, so that w* = P_W v solves (3.2) for each v ∈ C^n. Then, in particular, for any vector w ∈ W, v = w itself solves (3.2), so w = P_W w, and as a consequence Ran(P_W) = W. Furthermore, for any vector v ∈ C^n, w* = P_W v ∈ W, so

P_W² v = P_W(P_W v) = P_W w* = w* = P_W v,

implying that P_W² = P_W. Finally, for any vectors u, v ∈ C^n, u − P_W u ∈ W⊥ and P_W v ∈ W, so

⟨u − P_W u, P_W v⟩ = 0
⟨u, P_W v⟩ − ⟨P_W u, P_W v⟩ = 0
⟨u, P_W v⟩ − ⟨u, P_W* P_W v⟩ = 0
⟨u, (P_W − P_W* P_W) v⟩ = 0.
Thus, P_W = P_W* P_W and, as a consequence,

P_W* = (P_W* P_W)* = P_W* P_W = P_W.

Conversely, suppose that P_W is a matrix satisfying the three properties above. Then for any vector v ∈ C^n and any vector w ∈ W, we find

⟨v − P_W v, w⟩ = ⟨v, w⟩ − ⟨P_W v, w⟩ = ⟨v, w⟩ − ⟨v, P_W* w⟩ = ⟨v, w⟩ − ⟨v, P_W w⟩ = ⟨v, w⟩ − ⟨v, w⟩ = 0.

Thus, P_W v solves (3.2) for each v ∈ C^n and so represents an orthogonal projection. □

Problem 3.6. If P_W represents an orthogonal projection onto a subspace W of C^n, show that I − P_W represents an orthogonal projection onto W⊥.

Problem 3.7. Given a matrix C ∈ R^{n×r} such that Ker(C) = {0}, show that P_Ran(C) = C(C^t C)^{-1} C^t represents an orthogonal projection onto Ran(C). What role does the assumption on Ker(C) play here?

Problem 3.8. Given a matrix B ∈ R^{r×m} such that Ran(B) = R^r, show that P_Ker(B) = I − B^t (B B^t)^{-1} B represents an orthogonal projection onto Ker(B). What role does the assumption on Ran(B) play here?

Any linear transformation Q_W that satisfies the two properties
1. Ran(Q_W) = W,
2. Q_W² = Q_W,
is called a skew projector (or just a projector) onto the subspace W.

Problem 3.9. Show that if Q_W is a projector (either skew or orthogonal), then Ran(Q_W) = Ker(I − Q_W) and Ker(Q_W) = Ran(I − Q_W).

Problem 3.10. Suppose A ∈ R^{m×n} has a left inverse B_L. Prove that Q = A B_L is a (possibly skew) projector onto Ran(A).

Problem 3.11. Suppose A ∈ R^{m×n} has a right inverse B_R. Prove that Q = I − B_R A is a (possibly skew) projector onto Ker(A).

Let V = R^3 and let W be spanned by (1, 0, 0) and (0, 1, 0); that is, W is the x-y plane. For given u ∈ R^3, the vectors u, w*, and w⊥ form a right triangle with u the hypotenuse, w* in the x-y plane, and w⊥ parallel to the z-axis.

Problem 3.12. Let W be the subspace of R^3 spanned by the vectors v1 = (0, 1, 0), v2 = (−4/5, 0, 3/5). Show that v1 and v2 form an orthonormal set, and find P_W u, where u = (1, 2, 1).

3.3 Pseudoinverses

Prerequisites: Vector spaces. Inner products.
Matrix inverses. Left and right inverses. Projections.

Learning Objectives: Familiarity with pseudosolutions and pseudoinverses.

Finding closest vectors out of subspaces can be used to extend the concepts of left and right inverses. Suppose that A ∈ C^{m×n} has a right inverse and m < n. We know nullity(A) > 0, so A cannot have a left inverse. Although Ax = b is consistent for any b, each right hand side b will be associated with an infinite family of solutions.

Problem 1: Find the smallest solution x̂ to Ax = b.

Now suppose instead that A ∈ C^{m×n} has a left inverse and m > n. We know rank(A) = n < m, so A cannot have a right inverse, and Ax = b will be inconsistent for some b.

Problem 2: Find a vector x̂ that brings Ax as close as possible to b.

In each of these cases, we seek vectors that are in some sense or other the best possible "solution" to Ax = b. Given any (rectangular) matrix A ∈ C^{m×n}, the pseudoinverse of A is a matrix B ∈ C^{n×m} that provides, for each b ∈ C^m, a solution Bb = x* ∈ C^n to the following best approximation problem, which is an aggregate of sorts of Problems 1 and 2:

Problem 3 (= 1 + 2): Find a vector x̂ that solves

min_{x∈C^n} ‖Ax − b‖     (3.4)

such that ‖x̂‖ is minimal.

A variety of notations are found for the pseudoinverse; the most common appears to be B = "A†". The definition (3.4) does little to give insight into what actions are taken to transform b into x*. For that reason, the following prescription for constructing x* may be more useful as a definition of A†:

The Action of the Pseudoinverse

Define P to be the orthogonal projection onto Ran(A) and Q to be the orthogonal projection onto Ker(A)⊥.
1. Find the component of b in Ran(A): Pb = y*.
2. Find any one solution, x̂, to the linear system Ax = y*.
3. Find the component of x̂ in Ker(A)⊥: x* = Qx̂.
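The three-step prescription can be traced numerically on a small rank-deficient example, with the projectors built from explicit bases for Ran(A) and Ker(A)⊥; a sketch (the rank-1 matrix A below is an illustration chosen for this sketch, not from the text):

```python
import numpy as np

# A rank-1 example: Ran(A) is spanned by (1, 2, 3),
# and Ker(A)^perp = Ran(A^T) is spanned by (1, 2).
A = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
u = np.array([[1.0], [2.0], [3.0]])       # basis for Ran(A)
w = np.array([[1.0], [2.0]])              # basis for Ker(A)^perp

P = u @ u.T / (u.T @ u)                   # orthogonal projector onto Ran(A)
Q = w @ w.T / (w.T @ w)                   # orthogonal projector onto Ker(A)^perp

b = np.array([1.0, 1.0, 1.0])
y = P @ b                                 # step 1: component of b in Ran(A)
x_hat = np.linalg.lstsq(A, y, rcond=None)[0]   # step 2: one solution of Ax = y
x_star = Q @ x_hat                        # step 3: component in Ker(A)^perp

# the result agrees with NumPy's pseudoinverse applied to b
assert np.allclose(x_star, np.linalg.pinv(A) @ b)
```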
Two issues immediately emerge: Is the construction above well-defined, to the extent that the final result x* is the same regardless of which intermediate result x̂ was picked? And does x* solve (3.4)?

Theorem 3.6. The construction of x* specified above is well-defined and produces the unique solution to (3.4).

Proof: Step (1) defines y* uniquely as the solution to (3.2) with W = Ran(A) and v = b:

‖y* − b‖ = min_{y ∈ Ran(A)} ‖y − b‖ = min_{x ∈ C^n} ‖Ax − b‖.

Since y* ∈ Ran(A), the linear system Ax = y* of Step (2) must be consistent and has at least one solution, say x̂. Notice that x̂ is a solution to

min_{x ∈ C^n} ‖Ax − b‖ = ‖Ax̂ − b‖,

and in fact any solution to Ax = y* will be a minimizer in the same sense. To show that the outcome of Step (3) is independent of which solution x̂ was picked in Step (2), suppose that two solutions, x̂1 and x̂2, were known that solve Ax = y*, and these happened to produce two outcomes in Step (3): x1* = Qx̂1 and x2* = Qx̂2. Since Ax̂1 = y* = Ax̂2, rearrangement gives A(x̂1 − x̂2) = 0, so that (x̂1 − x̂2) ∈ Ker(A). We find

x1* − x2* = Q(x̂1 − x̂2) = 0.

Thus x1* = x2*, and the outcome of Step (3) is uniquely determined regardless of which solution in Step (2) was used. Furthermore, any such solution x̂ can be decomposed as x̂ = x* + n̂, where n̂ ∈ Ker(A). By the Pythagorean Theorem, ‖x̂‖² = ‖x*‖² + ‖n̂‖², so that out of all possible solutions x̂, the minimal norm solution must occur when n̂ = 0, that is, when x̂ = x*. □

In some cases, the prescription for x* = A†b can be used to find what amounts to a formula for A†.

Theorem 3.7. If A ∈ C^{m×n} has the full rank factorization A = XY*, where X ∈ C^{m×p} and Y ∈ C^{n×p} are both of rank p, then the pseudoinverse is given by

A† = Y(Y*Y)^{-1}(X*X)^{-1}X*.

Proof: We first construct the projections P and Q. Notice that Ran(A) = Ran(X) and that Ker(A)⊥ = Ran(A*) = Ran(Y).
Then, directly,

P = X(X*X)^{-1}X*,  Q = Y(Y*Y)^{-1}Y*.

We now seek a solution to the system of equations

Ax̂ = XY*x̂ = X(X*X)^{-1}X*b = Pb.

Noticing that rank(X) = p implies that Ker(X) = {0}, we can rearrange to find

0 = XY*x̂ − X(X*X)^{-1}X*b
0 = X(Y*x̂ − (X*X)^{-1}X*b)
0 = Y*x̂ − (X*X)^{-1}X*b.

While this does not yield a formula for x̂ itself (recall that there won't generally be just one solution), we may premultiply both sides of the final equation by Y(Y*Y)^{-1} to get

Y(Y*Y)^{-1}Y*x̂ = Y(Y*Y)^{-1}(X*X)^{-1}X*b,

which implies

x* = Qx̂ = Y(Y*Y)^{-1}(X*X)^{-1}X*b.     (3.5)

Problem 3.13. Using the permuted LU factorization, show that for any matrix A ∈ C^{m×n}, if the rank of A is p then there are matrices X ∈ C^{m×p} and Y ∈ C^{n×p}, both of rank p, so that A = XY*.

3.4 Orthonormal Bases and the QR Decomposition

Prerequisites: Vector spaces. Inner products. Matrix inverses. Left and right inverses. Projections.

Learning Objectives: Ability to write vectors as linear combinations of an orthonormal basis. Ability to use the Gram-Schmidt process to convert any basis to an orthonormal basis. Ability to compute the QR decomposition of a matrix.

If S = {w1, …, wr} is a basis for a subspace W of a vector space V and u is an arbitrary vector in W, then u can be uniquely expressed as

u = k1 w1 + ⋯ + kr wr.

In particular, if W is a subspace of C^n, then by lining up the vectors of S as columns of a matrix W = [w1, w2, …, wr] and placing the unknown coefficients into a vector k = [k1, …, kr]^t, the coefficients can be found directly by solving the system of equations Wk = u. If the system is inconsistent, then u was not in the subspace W after all. The next theorem tells us that if the vectors of S are orthonormal then the coefficients are especially easy to find whenever u is in the subspace W, and the case that u is not in the subspace W is also especially easy to discover.

Theorem 3.8.
If W = {w1, w2, …, wr} is an orthonormal basis for a subspace W of an inner product space V, then

Σ_{i=1}^r |⟨wi, u⟩|² ≤ ‖u‖²     ("Bessel's Inequality")

and u ∈ W if and only if

‖u‖² = Σ_{i=1}^r |⟨wi, u⟩|²     ("Parseval Relation"),

in which case

u = ⟨w1, u⟩w1 + ⟨w2, u⟩w2 + ⋯ + ⟨wr, u⟩wr.

Proof: Set αi = ⟨wi, u⟩ and calculate

0 ≤ ‖u − Σ_{i=1}^r αi wi‖²
  = ‖u‖² − ⟨u, Σ_{j=1}^r αj wj⟩ − ⟨Σ_{i=1}^r αi wi, u⟩ + Σ_{i,j=1}^r conj(αi) αj ⟨wi, wj⟩
  = ‖u‖² − Σ_{j=1}^r αj ⟨u, wj⟩ − Σ_{i=1}^r conj(αi) ⟨wi, u⟩ + Σ_{i=1}^r |αi|²
  = ‖u‖² − Σ_{j=1}^r |αj|².

This establishes Bessel's Inequality. The Parseval Relation holds if and only if ‖u‖² = Σ_{j=1}^r |αj|², which in turn occurs precisely when u = Σ_{i=1}^r αi wi. On the other hand, if u ∈ W, then since W is a basis for W, there exist constants k1, …, kr such that u = k1 w1 + k2 w2 + ⋯ + kr wr. Taking the inner product against w1, we obtain that

α1 = ⟨w1, u⟩ = ⟨w1, k1 w1 + k2 w2 + ⋯ + kr wr⟩
   = k1⟨w1, w1⟩ + k2⟨w1, w2⟩ + ⋯ + kr⟨w1, wr⟩
   = k1 · 1 + k2 · 0 + ⋯ + kr · 0 = k1.

Similarly,

α2 = ⟨w2, u⟩ = k2, …, αr = ⟨wr, u⟩ = kr.

Thus u = Σ_{i=1}^r αi wi and the Parseval Relation must hold. □

Notice that any u ∈ V can be written as u = w* + w⊥ with w* ∈ W and w⊥ ⊥ W. Since

⟨wi, u⟩ = ⟨wi, w*⟩ + ⟨wi, w⊥⟩ = ⟨wi, w*⟩,

we have that w* = ⟨w1, u⟩w1 + ⟨w2, u⟩w2 + ⋯ + ⟨wr, u⟩wr, and so through the calculation of the inner products ⟨wi, u⟩ we've solved the best approximation problem

min_{w∈W} ‖w − u‖² = ‖w* − u‖² = ‖w⊥‖² = ‖u‖² − Σ_{i=1}^r |⟨wi, u⟩|².

The degree of "tightness" in Bessel's inequality then indicates how well u can be approximated by vectors in the subspace W.

Problem 3.14. Suppose that W = {(0, 1, 0), (−4/5, 0, 3/5)} is an orthonormal basis for W. Test Bessel's inequality and solve the best approximation problem for each of the following vectors:
a. (1, 1, 1)
b. (2, 1, 0)
c. (3, 2, 1).
Which of the vectors lies closest to W?
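Bessel's inequality and the resulting best approximation formula are easy to check numerically; a sketch (not part of the original notes) using the orthonormal basis of Problem 3.14 and the vector of part (a):

```python
import numpy as np

# orthonormal basis for W, as in Problem 3.14
W = [np.array([0.0, 1.0, 0.0]), np.array([-0.8, 0.0, 0.6])]

def best_approx(u):
    # w* = sum_i <w_i, u> w_i  (orthogonal projection of u onto W)
    return sum(np.dot(w, u) * w for w in W)

u = np.array([1.0, 1.0, 1.0])
coeffs = [np.dot(w, u) for w in W]                 # the <w_i, u>
assert sum(c**2 for c in coeffs) <= np.dot(u, u)   # Bessel's inequality

w_star = best_approx(u)
# squared distance to W: ||u||^2 - sum |<w_i, u>|^2
d2 = np.dot(u, u) - sum(c**2 for c in coeffs)
assert np.isclose(np.dot(u - w_star, u - w_star), d2)
```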
The preceding discussion shows how an orthonormal basis for a subspace can be used to good effect. But how do we go about finding such a basis? The next theorem demonstrates how to construct one starting from any given basis. The construction is called the Gram-Schmidt Orthogonalization Process.

Theorem 3.9. Let $V$ be an inner product space with a basis $S = \{u_1, u_2, \ldots, u_n\}$. Then $V$ has an orthonormal basis $\{q_1, q_2, \ldots, q_n\}$ so that for each $k = 1, \ldots, n$,
$$\mathrm{span}\{u_1, u_2, \ldots, u_k\} = \mathrm{span}\{q_1, q_2, \ldots, q_k\}.$$

Proof: First we note that since $S$ is a basis, $u_1 \ne 0$. Thus we may set
$$q_1 = \frac{u_1}{\|u_1\|}.$$
Then $\|q_1\| = 1$ and $\mathrm{span}\{u_1\} = \mathrm{span}\{q_1\}$. To construct $q_2$, compute the component of $u_2$ orthogonal to $\mathrm{span}\{q_1\}$ as in Theorem 3.3 and divide by its length to produce a vector of length one. So we define
$$w_2 = u_2 - \langle q_1, u_2\rangle q_1.$$
To see that $w_2 \ne 0$, note that if it were zero, then $u_2$ would be a scalar multiple of $q_1$, which is in turn a scalar multiple of $u_1$. This would mean that $\{u_1, u_2\}$ is a linearly dependent set and could not be part of a basis set for $V$, contradicting our initial hypothesis. So $w_2 \ne 0$ and we define
$$q_2 = \frac{w_2}{\|w_2\|}.$$
Clearly $q_2$ has length 1, and since $q_2$ is orthogonal to $\mathrm{span}\{q_1\}$, $\{q_1, q_2\}$ is an orthonormal set. Furthermore, $u_2 = r_{12}q_1 + r_{22}q_2$, where $r_{12} = \langle q_1, u_2\rangle$ and $r_{22} = \|w_2\|$, so that $\mathrm{span}\{u_1, u_2\} = \mathrm{span}\{q_1, q_2\}$.

Now we continue the construction inductively. Suppose for some $k > 1$ we've produced a set of $k-1$ orthonormal vectors $\{q_1, q_2, \ldots, q_{k-1}\}$ so that for each $j = 1, \ldots, k-1$, $\mathrm{span}\{u_1, u_2, \ldots, u_j\} = \mathrm{span}\{q_1, q_2, \ldots, q_j\}$. (We've done this above for $k = 3$.) To construct $q_k$, we will compute the component of $u_k$ orthogonal to $\mathrm{span}\{q_1, q_2, \ldots, q_{k-1}\}$ and then divide by its length, producing a vector of length one. Define
$$w_k = u_k - \sum_{j=1}^{k-1}\langle q_j, u_k\rangle q_j.$$
Again we must check that $w_k \ne 0$.
Were it so, then $u_k$ would be in the span of $\{q_1, q_2, \ldots, q_{k-1}\}$ and hence would be a linear combination of $\{u_1, u_2, \ldots, u_{k-1}\}$. This implies that $\{u_1, u_2, \ldots, u_k\}$ is a linearly dependent set and could not be part of a basis for $V$, contradicting our starting hypothesis. Thus $w_k \ne 0$ and we define
$$q_k = \frac{w_k}{\|w_k\|}.$$
A quick calculation verifies for each $j = 1, \ldots, k-1$:
$$\langle q_j, q_k\rangle = \frac{1}{\|w_k\|}\langle q_j, w_k\rangle = \frac{1}{\|w_k\|}\left\langle q_j,\ u_k - \sum_{\ell=1}^{k-1}\langle q_\ell, u_k\rangle q_\ell\right\rangle \tag{3.6}$$
$$= \frac{1}{\|w_k\|}\left(\langle q_j, u_k\rangle - \sum_{\ell=1}^{k-1}\langle q_\ell, u_k\rangle\langle q_j, q_\ell\rangle\right) = \frac{1}{\|w_k\|}\left(\langle q_j, u_k\rangle - \langle q_j, u_k\rangle\right) = 0,$$
so $\{q_1, q_2, \ldots, q_k\}$ is an orthonormal set. Furthermore, $u_k = \sum_{\ell=1}^k r_{\ell k}q_\ell$, where $r_{\ell k} = \langle q_\ell, u_k\rangle$ for $\ell = 1, \ldots, k-1$ and $r_{kk} = \|w_k\|$. Thus $u_k \in \mathrm{span}\{q_1, q_2, \ldots, q_k\}$. Since we already know that
$$\mathrm{span}\{u_1, u_2, \ldots, u_{k-1}\} = \mathrm{span}\{q_1, q_2, \ldots, q_{k-1}\} \subseteq \mathrm{span}\{q_1, q_2, \ldots, q_k\},$$
we find that $\mathrm{span}\{u_1, u_2, \ldots, u_j\} = \mathrm{span}\{q_1, q_2, \ldots, q_j\}$ for each $j = 1, \ldots, k$, which completes the induction step. $\Box$

Problem 3.15. Apply the Gram-Schmidt process to the vectors $(1, 1, 1), (0, 1, 1), (1, 2, 1)$.

Problem 3.16. Let $W = \mathrm{span}\{(-1, 0, 1, 2), (0, 1, 0, 1)\}$. Find the solution to $\min_{w\in W}\|w - v\|$ where $v = (-1, 2, 6, 0)$. (Hint: First use the Gram-Schmidt process.)

Notice that if the original vectors $\{u_1, u_2, \ldots, u_n\}$ are a basis for a subspace of $\mathbb{C}^m$, then the conclusions of Theorem 3.9 can be interpreted as giving a matrix decomposition, the QR decomposition, for a matrix having $\{u_1, u_2, \ldots, u_n\}$ as columns.

Theorem 3.10. Let $A \in \mathbb{C}^{m\times n}$ have rank $n$. Then there exist matrices $Q \in \mathbb{C}^{m\times n}$ and $R \in \mathbb{C}^{n\times n}$ so that $Q^*Q = I$, $R$ is upper triangular with strictly positive diagonal entries, and
$$A = QR.$$

Proof: The columns of $A$, $\{a_1, a_2, \ldots, a_n\}$, form a basis for $\mathrm{Ran}(A)$.
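The inductive construction in the proof of Theorem 3.9 translates directly into an algorithm. Below is a minimal sketch in plain Python for vectors in $\mathbb{R}^n$ with the standard inner product (function names are ours); it returns the orthonormal $q$'s and raises an error if the input vectors are dependent, mirroring the $w_k \ne 0$ check in the proof.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(vectors):
    """Orthonormalize a list of linearly independent vectors (Theorem 3.9)."""
    qs = []
    for u in vectors:
        # Subtract the component of u lying in span{q_1, ..., q_{k-1}}.
        w = list(u)
        for q in qs:
            c = dot(q, u)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm < 1e-12:   # w_k = 0 would mean the input was dependent
            raise ValueError("input vectors are linearly dependent")
        qs.append([wi / norm for wi in w])
    return qs

# Problem 3.15's vectors as a quick demonstration:
Q = gram_schmidt([(1, 1, 1), (0, 1, 1), (1, 2, 1)])
```

The span-preservation property in the theorem holds because each $q_k$ is built only from $u_1, \ldots, u_k$.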
Applying the Gram-Schmidt process to $\{a_1, a_2, \ldots, a_n\}$ produces orthonormal vectors $\{q_1, q_2, \ldots, q_n\}$ so that for each $k = 1, 2, \ldots, n$,
$$a_k = \sum_{j=1}^k r_{jk}q_j,$$
where, in particular, $r_{kk} = \|w_k\| > 0$ as defined in the proof of Theorem 3.9. This is just a column-by-column description of $A = QR$ with $Q = [q_1, \ldots, q_n]$. Orthonormality of $\{q_1, q_2, \ldots, q_n\}$ is equivalent to $Q^*Q = I$. $\Box$

Problem 3.17. Modify the Gram-Schmidt process so that it will produce an orthonormal basis for an inner product space $V$, starting with any spanning set $\{u_1, u_2, \ldots, u_n\}$ for $V$ (not necessarily a basis). How does this change Theorem 3.10?

3.5 Unitary Transformations and the Singular Value Decomposition

Prerequisites: Vector spaces. Inner products. Matrix inverses. Left and right inverses. Projections.

Learning Objectives: Familiarity with unitary matrices. Familiarity with the concepts of unitarily equivalent and unitarily similar matrices. Familiarity with the singular value decomposition of a matrix.

Consider a matrix $U \in \mathbb{C}^{n\times n}$ satisfying any one of the properties defined below.

$U$ preserves length if for all $x \in \mathbb{C}^n$, $\|Ux\| = \|x\|$.

$U$ preserves inner products if for all $x$ and $y$ in $\mathbb{C}^n$, $\langle Ux, Uy\rangle = \langle x, y\rangle$.

$U$ is a unitary matrix if $U^*U = I$ (that is, if the columns of $U$ are orthonormal vectors in $\mathbb{C}^n$).

The first goal will be to show that if $U$ has any one of these properties, it has the remaining two as well. Since the inner product of two vectors is proportional to the cosine of the angle between them, the equivalence of these three properties amounts to the observation that a length-preserving transformation also preserves angles, and that such a transformation is always associated with a unitary matrix. The action of this transformation is simply a rigid motion of the vectors of $\mathbb{C}^n$. Such a motion involves only rotations and reflections through coordinate planes.
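The proof of Theorem 3.10 is constructive: the Gram-Schmidt coefficients $r_{jk}$ are exactly the entries of $R$. A minimal sketch in plain Python (our own helper names; real matrices, with columns given as tuples) that records them while orthonormalizing:

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def qr_by_gram_schmidt(cols):
    """Return (Q, R) with A = QR, where `cols` are the columns of A (rank n)."""
    n = len(cols)
    Q, R = [], [[0.0] * n for _ in range(n)]
    for k, a in enumerate(cols):
        w = list(a)
        for j, q in enumerate(Q):
            R[j][k] = dot(q, a)          # r_jk = <q_j, a_k>
            w = [wi - R[j][k] * qi for wi, qi in zip(w, q)]
        R[k][k] = math.sqrt(dot(w, w))   # r_kk = ||w_k|| > 0
        Q.append([wi / R[k][k] for wi in w])
    return Q, R

# a_k = sum_j r_jk q_j reassembles each column of A.
Q, R = qr_by_gram_schmidt([(1, 1, 0), (1, 0, 1)])
```

Here $Q$ is stored as a list of columns; $R$ is upper triangular by construction, since $r_{jk}$ is only written for $j \le k$.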
We introduce some standard notation. Let $w = u + iv$ be a complex number with $u$ and $v$ real. We call $u$ and $v$ the real and imaginary parts of $w$ and denote them by $u = \mathrm{Re}(w)$ and $v = \mathrm{Im}(w)$. Note that
$$w + \bar{w} = 2\,\mathrm{Re}(w), \quad w - \bar{w} = 2i\,\mathrm{Im}(w), \quad \mathrm{Re}(-iw) = \mathrm{Im}(w), \quad \text{and} \quad w\bar{w} = \mathrm{Re}(w)^2 + \mathrm{Im}(w)^2 =: |w|^2.$$

Theorem 3.11. An $n\times n$ matrix $U$ preserves length if and only if it preserves inner products, and if and only if it is unitary.

Proof: Suppose first that $U$ preserves inner products. Then
$$\|Ux\|^2 = \langle Ux, Ux\rangle = \langle x, x\rangle = \|x\|^2.$$
Hence $U$ preserves length.

Suppose now that $U$ preserves lengths. We need the following equality, whose proof we leave as a problem.

Problem 3.18. Show for all complex numbers $a$ and $b$, $|a + b|^2 = |a|^2 + \bar{b}a + \bar{a}b + |b|^2$.

From the result of Problem 3.18, if $x = (x_1, \ldots, x_n)$ and $y = (y_1, \ldots, y_n)$, then
$$\|x + y\|^2 = \|x\|^2 + \sum_{i=1}^n\left(\bar{x}_i y_i + x_i\bar{y}_i\right) + \|y\|^2 = \|x\|^2 + 2\,\mathrm{Re}\langle x, y\rangle + \|y\|^2. \tag{3.7}$$
Substituting $-iy$ for $y$ in (3.7) gives
$$\|x - iy\|^2 = \|x\|^2 + 2\,\mathrm{Re}\langle x, -iy\rangle + \|y\|^2 = \|x\|^2 + 2\,\mathrm{Re}[-i\langle x, y\rangle] + \|y\|^2 = \|x\|^2 + 2\,\mathrm{Im}\langle x, y\rangle + \|y\|^2. \tag{3.8}$$
Applying $U$ to (3.7) and (3.8) and using the fact that $U$ preserves lengths gives
$$\|U(x + y)\|^2 = \|Ux\|^2 + 2\,\mathrm{Re}\langle Ux, Uy\rangle + \|Uy\|^2 = \|x\|^2 + 2\,\mathrm{Re}\langle Ux, Uy\rangle + \|y\|^2$$
and
$$\|U(x - iy)\|^2 = \|Ux\|^2 + 2\,\mathrm{Im}\langle Ux, Uy\rangle + \|Uy\|^2 = \|x\|^2 + 2\,\mathrm{Im}\langle Ux, Uy\rangle + \|y\|^2.$$
Thus we see that the real and imaginary parts of $\langle x, y\rangle$ and $\langle Ux, Uy\rangle$ agree. Hence $\langle Ux, Uy\rangle = \langle x, y\rangle$ and $U$ preserves inner products.

If $U$ is unitary, then it preserves length since
$$\|Ux\|^2 = \langle Ux, Ux\rangle = \langle U^*Ux, x\rangle = \langle Ix, x\rangle = \langle x, x\rangle = \|x\|^2.$$
Conversely, if $U$ preserves length (and so, inner products), then for all $x$ and $y$ in $\mathbb{C}^n$,
$$\langle Ix, y\rangle = \langle x, y\rangle = \langle Ux, Uy\rangle = \langle U^*Ux, y\rangle,$$
so that $\langle(U^*U - I)x, y\rangle = 0$ for all $x$ and $y$ in $\mathbb{C}^n$. But this is possible only if $U^*U - I = 0$, which is to say, only if $U$ is unitary. $\Box$

A matrix is unitary if and only if either the rows or columns of the matrix form an orthonormal basis for $\mathbb{C}^n$.
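Theorem 3.11 is easy to sanity-check numerically. A plane rotation is a real instance of a unitary (orthogonal) matrix, and the sketch below (plain Python, with example vectors of our own choosing) confirms that it preserves both lengths and inner products:

```python
import math

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

theta = 0.7   # any angle gives a rotation, hence a real unitary matrix
U = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

x, y = [3.0, -1.0], [0.5, 2.0]
Ux, Uy = matvec(U, x), matvec(U, y)

# Theorem 3.11: length preservation and inner-product preservation go together.
assert abs(dot(Ux, Ux) - dot(x, x)) < 1e-12
assert abs(dot(Ux, Uy) - dot(x, y)) < 1e-12
```

For complex unitary matrices the same check works with the Hermitian inner product $\sum_i \bar{x}_i y_i$.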
Indeed, the columns of the unitary matrix $U$ are the vectors $\{Ue_1, \ldots, Ue_n\}$, where $e_1, \ldots, e_n$ are the natural basis vectors for $\mathbb{C}^n$. By the previous theorem, $U$ preserves both lengths and inner products, and so
$$\langle Ue_j, Ue_k\rangle = \begin{cases} 0 & \text{if } j \ne k, \\ 1 & \text{if } j = k. \end{cases}$$
This says that the columns of $U$ form an orthonormal basis for $\mathbb{C}^n$. We leave the converse to the reader. Notice that $U^*U = I$ implies $U^{-1} = U^*$, so that $UU^* = (U^*)^*U^* = I$ and $U^*$ evidently is unitary as well. But by our preceding discussion this means that the columns of $U^*$ (which are the conjugates of the rows of $U$) form an orthonormal basis for $\mathbb{C}^n$. Hence the rows of $U$ themselves form an orthonormal basis for $\mathbb{C}^n$.

Problem 3.19. Show that the product of unitary matrices is a unitary matrix. Show that $U$ is a unitary matrix if and only if the larger partitioned matrix
$$\begin{bmatrix} I & 0 \\ 0 & U \end{bmatrix}$$
is unitary.

Two matrices $A, B \in \mathbb{C}^{m\times n}$ are said to be unitarily equivalent if there are unitary matrices $U \in \mathbb{C}^{m\times m}$ and $V \in \mathbb{C}^{n\times n}$ such that $B = U^*AV$. Likewise, two square matrices $A, B \in \mathbb{C}^{n\times n}$ are said to be unitarily similar if there is a unitary transformation $U \in \mathbb{C}^{n\times n}$ such that $B = U^*AU = U^{-1}AU$.

Perhaps the single most useful matrix representation in matrix theory is the Singular Value Decomposition (SVD):

Theorem 3.12. Every matrix is unitarily equivalent to a diagonal matrix (of the same size) having nonnegative entries on the diagonal. In particular, suppose $A \in \mathbb{C}^{m\times n}$ and $\mathrm{rank}(A) = r$. There exist unitary matrices $U \in \mathbb{C}^{m\times m}$ and $V \in \mathbb{C}^{n\times n}$ so that
$$A = U\Sigma V^* \tag{3.9}$$
where $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots) \in \mathbb{C}^{m\times n}$ with $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0$ and $\sigma_{r+1} = \cdots = \sigma_p = 0$ for $p = \min\{m, n\}$. The columns of $U = [u_1, u_2, \ldots, u_m]$ are called the left singular vectors; the columns of $V = [v_1, v_2, \ldots, v_n]$ are called the right singular vectors; and $\sigma_1, \sigma_2, \ldots$ are the singular values of $A$.

We'll prove this theorem while discussing some adjacent ideas.
The Frobenius norm of a matrix is invariant under unitary equivalence, since for any unitary matrices $U \in \mathbb{C}^{m\times m}$ and $V \in \mathbb{C}^{n\times n}$,
$$\|U^*AV\|_F^2 = \mathrm{trace}([U^*AV]^*U^*AV) = \mathrm{trace}(V^*A^*UU^*AV) = \mathrm{trace}(V^*A^*AV) = \mathrm{trace}(A^*AVV^*) = \mathrm{trace}(A^*A) = \|A\|_F^2.$$
If $V$ is partitioned by columns as $V = [v_1, v_2, \ldots, v_n]$, notice that
$$\|A\|_F^2 = \|AV\|_F^2 = \sum_{i=1}^n \|Av_i\|^2.$$
While different choices of unitary $V$ won't change the overall sum, they can affect the distribution of magnitudes among the summands. For a given matrix $A$, we will seek to collect the "mass" of the sum as close to the beginning of the summation as possible. In particular, this means we'll seek an orthonormal basis of $\mathbb{C}^n$ (the columns of $V$), $\{v_1, v_2, \ldots, v_n\}$, that maximizes the sequence of quantities:
$$\|Av_1\|^2, \quad \|Av_1\|^2 + \|Av_2\|^2, \quad \|Av_1\|^2 + \|Av_2\|^2 + \|Av_3\|^2, \quad \ldots$$
Although at first blush this may seem hopelessly complicated, notice that the first quantity maximized depends only on $v_1$; the second depends (in effect) only on $v_2$, since we've already gotten the best $v_1$; the third quantity depends only on $v_3$ in the same sense; and so on. At each step we are only concerned with maximizing with respect to the next $v_k$ in line, having already chosen the best values of all previous $v$'s.

The first step proceeds as follows. Define $\sigma_1 = \max_{\|v\|=1}\|Av\|$, let $v_1$ be the maximizing vector, and let $u_1 = \frac{1}{\sigma_1}Av_1$. Now starting with $v_1$, complete an orthonormal basis for $\mathbb{C}^n$ and fill out the associated unitary matrix $V_1$ having $v_1$ as its first column. Likewise, starting with $u_1$, complete an orthonormal basis for $\mathbb{C}^m$ and fill out the associated unitary matrix $U_1$ having $u_1$ as its first column. Examining the partitioned matrix product $U_1^*AV_1$ yields
$$U_1^*AV_1 = \begin{bmatrix} \sigma_1 & w^* \\ 0 & \hat{A}_2 \end{bmatrix}.$$
The $0$ in the $(2,1)$ location comes from the orthogonality of $u_1$ (which is a multiple of $Av_1$) to all the remaining columns of $U_1$. We now will show that $w = 0$. Suppose we define the vectors
$$x = \begin{bmatrix} \sigma_1 \\ w \end{bmatrix} \quad \text{and} \quad v = \frac{1}{\|x\|}x.$$
Then, with $\tilde{A} = U_1^*AV_1$, we find
$$\tilde{A}v = \frac{1}{\|x\|}\begin{bmatrix} \sigma_1^2 + w^*w \\ \text{other stuff} \end{bmatrix}.$$
In particular, we have that
$$\|\tilde{A}v\| \ge \sqrt{\sigma_1^2 + w^*w} \ge \sigma_1 = \max_{\|v\|=1}\|\tilde{A}v\|.$$
But the last expression is the largest possible value that the previous expressions can attain, so in fact all inequalities are actually equalities, which in turn means that it must be that $w = 0$. At this point, we've shown that any matrix is unitarily equivalent to a matrix having first row and column zero except for a nonnegative diagonal entry.

Continuing to the next step, we go through the same construction on $\hat{A}_2$: define $\sigma_2 = \max_{\|v\|=1}\|\hat{A}_2 v\|$, let $\hat{v}_2$ be the maximizing vector, and let $\hat{u}_2 = \frac{1}{\sigma_2}\hat{A}_2\hat{v}_2$. Now starting with $\hat{v}_2$, complete an orthonormal basis for $\mathbb{C}^{n-1}$ and fill out the associated unitary matrix $\hat{V}_2$ having $\hat{v}_2$ as its first column. Likewise, starting with $\hat{u}_2$, complete an orthonormal basis for $\mathbb{C}^{m-1}$ and fill out the associated unitary matrix $\hat{U}_2$ having $\hat{u}_2$ as its first column. Similar reasoning to that found in the first step above reveals
$$\hat{U}_2^*\hat{A}_2\hat{V}_2 = \begin{bmatrix} \sigma_2 & 0 \\ 0 & \hat{A}_3 \end{bmatrix}.$$

Problem 3.20. Explain why $\sigma_2$ as defined above satisfies $\sigma_2 \ge \|Av\|$ for all vectors $v$ with $\|v\| = 1$ and $\langle v, v_1\rangle = 0$.

We finish the second step by constructing
$$V_2 = \begin{bmatrix} I & 0 \\ 0 & \hat{V}_2 \end{bmatrix} \quad \text{and} \quad U_2 = \begin{bmatrix} I & 0 \\ 0 & \hat{U}_2 \end{bmatrix}.$$
Then one has
$$(U_1U_2)^*A(V_1V_2) = \begin{bmatrix} \sigma_1 & 0 & 0 \\ 0 & \sigma_2 & 0 \\ 0 & 0 & \hat{A}_3 \end{bmatrix}.$$
The construction of the SVD continues in this way, so that before we begin step $k$ we have found an orthonormal set of vectors $v_1, v_2, \ldots, v_{k-1}$ and a nonnegative scalar $\sigma_k$ such that $\sigma_k \ge \|Av\|$ for all vectors $v$ with $\|v\| = 1$ and $\langle v, v_i\rangle = 0$ for $i = 1, 2, \ldots, k-1$.
Step $k$ continues by defining unit vectors $v_k$ and $u_k$ so that
$$Av_k = \sigma_k u_k. \qquad \Box$$

Notice that the singular values $\sigma_i$ and (right) singular vectors $v_i$ are constructed in such a way that, as well,
$$\sigma_k \le \|Av\| \quad \text{for all vectors } v \text{ with } \|v\| = 1 \text{ and } v \in \mathrm{span}\{v_1, v_2, \ldots, v_k\}.$$
This is easy to see, since under these circumstances
$$Av = U\Sigma V^*v = \sum_{i=1}^n \sigma_i(v_i^*v)\,u_i = \sum_{i=1}^k \sigma_i(v_i^*v)\,u_i,$$
so
$$\|Av\|^2 = \sum_{i=1}^k \sigma_i^2|v_i^*v|^2 \ge \sigma_k^2\sum_{i=1}^k |v_i^*v|^2 = \sigma_k^2\|v\|^2.$$
The last step comes from an application of the Parseval Relation.

The final set of ideas that we'll explore here center on a result known as the Mirsky-Eckart-Young inequality:

Theorem 3.13. Let $A$ be an $m\times n$ complex matrix with singular values $\sigma_1, \sigma_2, \ldots, \sigma_p$ ($p = \min(m, n)$), left singular vectors $u_1, u_2, \ldots, u_m$, and right singular vectors $v_1, v_2, \ldots, v_n$. If $B$ is any $m\times n$ complex matrix with $\mathrm{rank}(B) = k < p$, then
$$\|A - B\|_F \ge \sqrt{\sum_{i=k+1}^p \sigma_i^2},$$
with equality occurring for the rank-$k$ matrix defined by
$$B_\star = \sum_{i=1}^k \sigma_i u_i v_i^*.$$

Proof: Notice first that
$$\|A - B_\star\|_F = \|U^*(A - B_\star)V\|_F = \|\Sigma - \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k, 0, \ldots, 0)\|_F = \sqrt{\sum_{i=k+1}^p \sigma_i^2},$$
so equality is achieved with $B_\star$, and $B_\star$ is evidently of rank $k$. Now let $B$ be an arbitrary $m\times n$ complex matrix with $\mathrm{rank}(B) = k < p$ and observe that the "rank + nullity" theorem allows us to conclude that $\mathrm{Ker}(B)$ has dimension $n - k$. In the following lines we will generate a distinguished orthonormal basis for $\mathrm{Ker}(B)$ that we will label as $\{z_{k+1}, z_{k+2}, \ldots, z_n\}$. Let's start by considering the sequence of subspaces:
$$Z_{k+1} = \underbrace{\mathrm{span}\{v_1, v_2, \ldots, v_{k+1}\}}_{k+1\ \text{dimensional}} \cap\ \mathrm{Ker}(B), \quad \text{at least 1-dimensional intersection;}$$
$$Z_{k+2} = \underbrace{\mathrm{span}\{v_1, v_2, \ldots, v_{k+2}\}}_{k+2\ \text{dimensional}} \cap\ \mathrm{Ker}(B), \quad \text{at least 2-dimensional intersection;}$$
$$Z_{k+3} = \underbrace{\mathrm{span}\{v_1, \ldots, v_{k+2}, v_{k+3}\}}_{k+3\ \text{dimensional}} \cap\ \mathrm{Ker}(B), \quad \text{at least 3-dimensional intersection;}$$
$$\vdots$$
$$Z_{n-1} = \underbrace{\mathrm{span}\{v_1, v_2, \ldots, v_{n-1}\}}_{n-1\ \text{dimensional}} \cap\ \mathrm{Ker}(B), \quad \text{at least } n-k-1 \text{ dimensional intersection;}$$
$$Z_n = \mathrm{Ker}(B), \quad \text{an } n-k \text{ dimensional subspace.}$$
Notice that the subspaces are nested as
$$Z_{k+1} \subseteq Z_{k+2} \subseteq \cdots \subseteq Z_{n-1} \subseteq \mathrm{Ker}(B),$$
so we can pick a sequence of linearly independent vectors $\{z_{k+1}, z_{k+2}, \ldots, z_n\}$ so that
$$\mathrm{span}\{z_{k+1}\} \subseteq Z_{k+1}, \quad \mathrm{span}\{z_{k+1}, z_{k+2}\} \subseteq Z_{k+2}, \quad \mathrm{span}\{z_{k+1}, z_{k+2}, z_{k+3}\} \subseteq Z_{k+3}, \quad \ldots, \quad \mathrm{span}\{z_{k+1}, z_{k+2}, \ldots, z_n\} = \mathrm{Ker}(B).$$
Although our selection of $\{z_i\}_{i=k+1}^n$ might not be orthonormal, we can apply the Gram-Schmidt process to $\{z_i\}_{i=k+1}^n$ to produce an orthonormal set of vectors with the same spanning properties given above. Thus, we can assume without loss of generality that $\{z_i\}_{i=k+1}^n$ is an orthonormal set (an orthonormal basis for $\mathrm{Ker}(B)$, in fact). We continue by augmenting $\{z_i\}_{i=k+1}^n$ with additional orthonormal vectors $\{z_i\}_{i=1}^k$ that span $\mathrm{Ker}(B)^{\perp}$ to create a full orthonormal basis for $\mathbb{C}^n$. The unitary invariance of the Frobenius norm leads to
$$\|A - B\|_F^2 = \sum_{i=1}^n \|(A - B)z_i\|^2 \ge \sum_{i=k+1}^n \|(A - B)z_i\|^2 = \sum_{i=k+1}^n \|Az_i\|^2 \ge \sum_{i=k+1}^p \sigma_i^2,$$
with the last step justified by the observation that $z_i \in \mathrm{span}\{v_1, v_2, \ldots, v_i\}$, so $\sigma_i \le \|Az_i\|$ for each $i = k+1, k+2, \ldots, n$. This proves the inequality. $\Box$

Chapter 4: The Eigenvalue Problem

4.1 Eigenvalues and Eigenvectors

Prerequisites: Matrix algebra.

Learning Objectives: Familiarity with the basic definitions for the eigenvalue problem. Familiarity with the characteristic polynomial and the minimal polynomial associated with a matrix. The ability to compute eigenvalues by calculating roots of either the characteristic polynomial or the minimal polynomial. Familiarity with cyclic subspaces and Krylov subspaces and understanding their relationship with the minimal polynomial.

Let $A$ be any $n\times n$ complex matrix. The complex number $\lambda$ is an eigenvalue of $A$ if the equation
$$Au = \lambda u \tag{4.1}$$
is satisfied with a nontrivial solution $u \ne 0$. The vector $u$ is called an eigenvector corresponding to $\lambda$.
The pair $(\lambda, u)$ may be referred to as an eigenpair. The set of all points in the complex plane that are eigenvalues for $A$ is called the spectrum of $A$ and commonly denoted as $\sigma(A)$.

4.1.1 Eigenvalue Basics

If we think of $A$ as a transformation taking a vector $x \in \mathbb{C}^n$ to another vector $Ax \in \mathbb{C}^n$, then generally one will expect both the direction and magnitude of $Ax$ to differ from those of the vector $x$. If $x$ happens to be an eigenvector for $A$, then $A$ leaves the direction of $x$ unchanged, at least to the extent that $Ax$ is just a scalar multiple of $x$.

After multiplying both sides of equation (4.1) by any nonzero scalar $\alpha$, one may see immediately that if $(\lambda, u)$ is an eigenpair for $A$ then so is $(\lambda, \alpha u)$. From this one may conclude that the magnitude of an eigenvector is arbitrary (and so, irrelevant), and it is only the direction that is significant. Aside from differences of scalar multiples, a given eigenvalue might have more than one eigenvector associated with it.

A tiny bit of manipulation in (4.1) yields the equivalent
$$(A - \lambda I)u = 0. \tag{4.2}$$
Thus $\lambda$ is an eigenvalue for $A$ if and only if the system of equations (4.1) has a nontrivial solution $u$, which is to say, if and only if $(A - \lambda I)$ is singular. But this can happen, in turn, if and only if
$$\det(A - \lambda I) = 0. \tag{4.3}$$
We define the characteristic polynomial of $A$ as $p_A(t) = \det(tI - A)$. By expanding the determinant, one may see that $p_A(t)$ is an $n$th degree polynomial in $t$ with the leading coefficient of $t^n$ equal to 1 (i.e., it's a monic polynomial of degree $n$). From (4.3), one sees that the eigenvalues of $A$ must occur exactly at the roots of $p_A(t)$. Since the degree of $p_A$ is $n$, this means (from the fundamental theorem of algebra) $A$ has at least one and at most $n$ distinct eigenvalues. Each distinct eigenvalue of a matrix has at least one eigenvector associated with it, since if $\lambda$ satisfies $\det(A - \lambda I) = 0$ then there is at least one nontrivial solution to $(A - \lambda I)u = 0$.
If $p_A(t) = t^n + \cdots + c_1 t + c_0$, then we can factor $p_A$ as
$$p_A(t) = (t - \lambda_1)^{m(\lambda_1)}(t - \lambda_2)^{m(\lambda_2)}\cdots(t - \lambda_N)^{m(\lambda_N)},$$
where $\lambda_1, \lambda_2, \ldots, \lambda_N$ are the distinct zeros of $p_A$ (and hence the distinct eigenvalues of $A$) and $m(\lambda_1), m(\lambda_2), \ldots, m(\lambda_N)$ are positive integers satisfying $m(\lambda_1) + m(\lambda_2) + \cdots + m(\lambda_N) = n$. $m(\lambda_i)$ is the algebraic multiplicity of the eigenvalue $\lambda_i$.

There are two conventions used in labeling eigenvalues. One may label the distinct eigenvalues of the matrix as we did above, or one may label the eigenvalues "counting multiplicity". When we label the eigenvalues of a matrix counting multiplicity, we list each eigenvalue as many times as its multiplicity. For example, if $A$ is a $6\times 6$ matrix with distinct eigenvalues $\lambda_1$ with multiplicity one, $\lambda_2$ with multiplicity three, and $\lambda_3$ with multiplicity two, we may label the eigenvalues of $A$ counting multiplicity as $\{\mu_i\}_{i=1}^6$:
$$\mu_1 = \lambda_1, \quad \mu_2 = \mu_3 = \mu_4 = \lambda_2, \quad \text{and} \quad \mu_5 = \mu_6 = \lambda_3.$$

If the matrix $A$ has all real entries, then $p_A(t) = t^n + \cdots + c_1 t + c_0$ is evidently a polynomial with real coefficients. Thus if $\lambda$ is an eigenvalue of $A$, then
$$p_A(\lambda) = \lambda^n + \cdots + c_1\lambda + c_0 = 0.$$
Taking the complex conjugate of both sides of the above equation and recalling that the $c_i$ are all real gives
$$\bar{\lambda}^n + \cdots + c_1\bar{\lambda} + c_0 = 0.$$
Thus $\bar{\lambda}$ is a zero of $p_A$ and hence an eigenvalue of $A$. Similarly, if $A$ has all real entries and $u$ is an eigenvector corresponding to $\lambda$, then conjugation yields $\bar{A}\bar{u} = \bar{\lambda}\bar{u}$. Using the fact that $\bar{A} = A$ gives $A\bar{u} = \bar{\lambda}\bar{u}$. Hence $\bar{u}$ is an eigenvector of $A$ corresponding to $\bar{\lambda}$. We have proved the following theorem.

Theorem 4.1. Let $A$ be an $n\times n$ matrix with all real entries. Then if $(\lambda, u)$ is an eigenpair for $A$, so is $(\bar{\lambda}, \bar{u})$.

Example 4.2. Let
$$A = \begin{bmatrix} 2 & 1 \\ -5 & -2 \end{bmatrix}.$$
We find that $\det(A - \lambda I) = \lambda^2 + 1$. Thus the eigenvalues are $i, -i$. To find the eigenspace for $\lambda = i$, we solve the system
$$\begin{bmatrix} 2 - i & 1 \\ -5 & -2 - i \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}$$
and obtain that $x_1 = -\frac{i + 2}{5}x_2$. Thus the $\lambda = i$ eigenspace for $A$ is one-dimensional with basis vector $\left(-\frac{i + 2}{5}, 1\right)$.
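For a $2\times 2$ matrix, the characteristic polynomial is $t^2 - \mathrm{trace}(A)\,t + \det(A)$, so its roots come straight from the quadratic formula. A small sketch (plain Python, using the standard-library `cmath` module so that complex eigenvalues such as those in Example 4.2 appear naturally; the function name is ours):

```python
import cmath

def eig2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] as roots of t^2 - (a+d)t + (ad - bc)."""
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)
    return (tr + disc) / 2, (tr - disc) / 2

# Example 4.2: A = [[2, 1], [-5, -2]] has characteristic polynomial t^2 + 1,
# so the eigenvalues are i and -i.
lam1, lam2 = eig2x2(2, 1, -5, -2)
```

Note the conjugate pair, as Theorem 4.1 predicts for a real matrix.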
By Theorem 4.1, the $\lambda = -i$ eigenspace is one-dimensional with basis vector $\left(\frac{i - 2}{5}, 1\right)$.

Example 4.3. Let
$$A = \begin{bmatrix} 3 & -2 & 0 \\ -2 & 3 & 0 \\ 0 & 0 & 5 \end{bmatrix}.$$
We find that $\det(tI - A) = (t - 1)(t - 5)^2$. To compute the $\lambda = 5$ eigenspace, we need to solve the system
$$\begin{bmatrix} -2 & -2 & 0 \\ -2 & -2 & 0 \\ 0 & 0 & 0 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \\ 0 \end{bmatrix}.$$
The eigenspace is two-dimensional with basis vectors $(-1, 1, 0)$, $(0, 0, 1)$. The $\lambda = 1$ eigenspace is one-dimensional, and we leave the calculation to the reader.

Problem 4.1. Find bases for all the eigenspaces of the following matrices:
1. $\begin{bmatrix} 10 & -9 \\ 4 & -2 \end{bmatrix}$
2. $\begin{bmatrix} -2 & 0 & 1 \\ -6 & -2 & 0 \\ 19 & 5 & -4 \end{bmatrix}$

Notice that if $\lambda$ is an eigenvalue of $A$, then both $(A - \lambda I)$ and $(A - \lambda I)^*$ are singular, so in particular there are nontrivial solutions to $(A - \lambda I)^*y = 0$. Such a vector $y$ is called a left eigenvector associated with $\lambda$, since it satisfies $y^*A = \lambda y^*$.

Problem 4.2. Show that if $A$ is upper triangular, then its eigenvalues are precisely its diagonal elements.

Problem 4.3. Show that an $n\times n$ matrix $A$ is singular if and only if $\lambda = 0$ is an eigenvalue for $A$.

Problem 4.4. Show that the eigenvalues of a unitary matrix must have the form $e^{i\theta}$ for $\theta \in [0, 2\pi)$.

Looking back to (4.2) again, $\lambda$ is an eigenvalue of $A$ if and only if $\mathrm{nullity}(A - \lambda I) > 0$, and every nontrivial vector in $\mathrm{Ker}(A - \lambda I)$ is an eigenvector associated with the eigenvalue $\lambda$. The subspace $\mathrm{Ker}(A - \lambda I)$ is called the eigenspace associated with the eigenvalue $\lambda$, and $\mathrm{nullity}(A - \lambda I) = \dim\mathrm{Ker}(A - \lambda I)$ is called the geometric multiplicity of $\lambda$. The geometric multiplicity of an eigenvalue can be different from its algebraic multiplicity. For example, the matrix $A = \begin{bmatrix} 3 & 7 \\ 0 & 3 \end{bmatrix}$ has the characteristic polynomial $p_A(t) = (t - 3)^2$. So 3 is the (only) eigenvalue of $A$, and it has algebraic multiplicity 2. But notice that
$$\mathrm{nullity}(A - 3I) = \mathrm{nullity}\begin{bmatrix} 0 & 7 \\ 0 & 0 \end{bmatrix} = 1,$$
so the geometric multiplicity of 3 is only 1.

Recall that two matrices $A$ and $B$ are similar if there exists a nonsingular matrix $S$ so that $A = S^{-1}BS$.

Problem 4.5. Suppose $S^{-1}BS = A$ where $A$ and $B$ are $n\times n$ matrices.
If $(\lambda, u)$ is an eigenpair for $A$, show that $(\lambda, Su)$ is an eigenpair for $B$.

Theorem 4.4. Similar matrices have the same eigenvalues with the same geometric and algebraic multiplicities.

Proof: Suppose $A$ and $B$ are similar, with $A = S^{-1}BS$. Then
$$S^{-1}(B - \lambda I)S = S^{-1}(BS - \lambda S) = S^{-1}BS - \lambda I = A - \lambda I.$$
Thus
$$\det(A - \lambda I) = \det(S^{-1}(B - \lambda I)S) = \det(S^{-1})\det(B - \lambda I)\det(S) = \det(B - \lambda I).$$
Thus $A$ and $B$ have the same characteristic polynomial (having the same roots with the same multiplicities!). To show the geometric multiplicities of the eigenvalues are the same, recall first that the ranks of similar matrices are the same. Then since
$$\mathrm{nullity}(A - \lambda I) = n - \mathrm{rank}(A - \lambda I) = n - \mathrm{rank}(B - \lambda I) = \mathrm{nullity}(B - \lambda I),$$
$\lambda$ is an eigenvalue of both $A$ and $B$ with the same geometric multiplicity. $\Box$

The geometric multiplicity of an eigenvalue can never exceed the algebraic multiplicity. Indeed, suppose an eigenvalue $\lambda$ of the matrix $A$ has geometric multiplicity $g(\lambda) = \mathrm{nullity}(A - \lambda I)$, and let the columns of the matrix $S_1 \in \mathbb{C}^{n\times g(\lambda)}$ span the eigenspace for $\lambda$: $\mathrm{Ker}(A - \lambda I)$. The columns of $S_1$ form a basis for the eigenspace associated with $\lambda$ that may be extended to a basis for all of $\mathbb{C}^n$ by augmenting $S_1$ with the columns of a second matrix $S_2 \in \mathbb{C}^{n\times[n - g(\lambda)]}$. The partitioned matrix $S = [S_1\ S_2]$ then is invertible and has columns that span $\mathbb{C}^n$, so in particular there will be matrices $\hat{A}_{12}$ and $\hat{A}_{22}$ so that $AS_2 = S_1\hat{A}_{12} + S_2\hat{A}_{22}$. Thus,
$$AS = [AS_1\ \ AS_2] = \left[\lambda S_1\ \ S_1\hat{A}_{12} + S_2\hat{A}_{22}\right] = [S_1\ S_2]\begin{bmatrix} \lambda I_{g(\lambda)} & \hat{A}_{12} \\ 0 & \hat{A}_{22} \end{bmatrix},$$
and so
$$S^{-1}AS = \begin{bmatrix} \lambda I_{g(\lambda)} & \hat{A}_{12} \\ 0 & \hat{A}_{22} \end{bmatrix}.$$
The characteristic polynomial of $A$ then can be expressed as
$$p_A(t) = \det(tI - A) = (t - \lambda)^{g(\lambda)}\det(tI - \hat{A}_{22}).$$
Since $\det(tI - \hat{A}_{22})$ is itself a monic polynomial in $t$ that may have $(t - \lambda)$ as a factor, $p_A(t)$ has $\lambda$ as a root with a multiplicity of at least $g(\lambda)$.

4.1.2 The Minimal Polynomial

As polynomials play a fundamental role in our understanding of eigenvalues, we first recall a useful fact from algebra that will be important in what follows.

Proposition 4.5.
If $p$ and $q$ are two polynomials with $\deg(p) \ge \deg(q)$, then there exist polynomials $\phi$ and $\rho$ with $\deg(\phi) = \deg(p) - \deg(q)$ and $\deg(\rho) < \deg(q)$ such that
$$p(x) = \phi(x)q(x) + \rho(x).$$

Example 4.6. Suppose $p(x) = x^3 - 2x^2 + 3x + 2$ and $q(x) = x^2 - x + 2$. Then if we take $\phi(x) = ax + b$ and $\rho(x) = cx + d$, we find
$$\phi(x)q(x) + \rho(x) = (ax + b)(x^2 - x + 2) + (cx + d) = ax^3 + (b - a)x^2 + (2a - b + c)x + (2b + d) = x^3 - 2x^2 + 3x + 2$$
is satisfied with coefficients $a, b, c,$ and $d$ that solve
$$\begin{bmatrix} 1 & 0 & 0 & 0 \\ -1 & 1 & 0 & 0 \\ 2 & -1 & 1 & 0 \\ 0 & 2 & 0 & 1 \end{bmatrix}\begin{Bmatrix} a \\ b \\ c \\ d \end{Bmatrix} = \begin{Bmatrix} 1 \\ -2 \\ 3 \\ 2 \end{Bmatrix}.$$
One finds $a = 1$, $b = -1$, $c = 0$, and $d = 4$, and indeed one may multiply out to verify:
$$x^3 - 2x^2 + 3x + 2 = (x - 1)(x^2 - x + 2) + 4.$$

Let $q(t) = c_n t^n + \cdots + c_1 t + c_0$ be a polynomial. The matrix $q(A)$ is defined by
$$q(A) = c_n A^n + \cdots + c_1 A + c_0 I.$$
We have the following theorem.

Theorem 4.7. Let $(\lambda, x)$ be an eigenpair for $A$; then $(q(\lambda), x)$ is an eigenpair for $q(A)$.

Proof: Since $(\lambda, x)$ is an eigenpair for $A$, we have $Ax = \lambda x$. Thus
$$A^2x = A(Ax) = A(\lambda x) = \lambda Ax = \lambda^2 x.$$
Continuing, we see that for all $k = 1, 2, \ldots, n$,
$$A^k x = \lambda^k x.$$
Thus
$$q(A)x = c_n A^n x + \cdots + c_1 Ax + c_0 Ix = c_n\lambda^n x + \cdots + c_1\lambda x + c_0 x = q(\lambda)x.$$
Thus $(q(\lambda), x)$ is an eigenpair for $q(A)$. $\Box$

Problem 4.6. Show that the eigenspace of $A$ associated with $\lambda$ is contained in the eigenspace of $q(A)$ associated with $q(\lambda)$, and give an example showing that this containment can be strict.

Pick any vector $v \in \mathbb{C}^n$ and consider the sequence of vectors $\{v, Av, A^2v, \ldots\}$. Such a sequence of vectors is called a Krylov sequence generated by $v$. Since each vector of the sequence is in $\mathbb{C}^n$, not more than $n$ vectors of any Krylov sequence can be linearly independent. Define the subspace $\mathcal{M}$ spanned by the full Krylov sequence:
$$\mathcal{M} = \mathrm{span}\{v, Av, A^2v, \ldots\}.$$
$\mathcal{M}$ is called the cyclic subspace generated by $v$. The first $s \stackrel{\text{def}}{=} \dim(\mathcal{M})$ vectors of the Krylov sequence are linearly independent and so must constitute a basis for $\mathcal{M}$.
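The division in Example 4.6 can be verified mechanically by multiplying the quotient back and adding the remainder. A minimal sketch in plain Python (polynomials represented as coefficient lists in increasing degree, a representation of our choosing):

```python
def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists [c0, c1, c2, ...]."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(p, q):
    n = max(len(p), len(q))
    p, q = p + [0] * (n - len(p)), q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

# Example 4.6: p = x^3 - 2x^2 + 3x + 2, q = x^2 - x + 2,
# quotient phi = x - 1, remainder rho = 4.
q_, phi, rho = [2, -1, 1], [-1, 1], [4]
p_reassembled = poly_add(poly_mul(phi, q_), rho)
# p_reassembled is [2, 3, -2, 1], i.e. 2 + 3x - 2x^2 + x^3, which is p.
```

The same two helpers suffice to check any quotient/remainder pair produced by Proposition 4.5.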
Indeed, if this were not the case, then $A^{\ell}v \in \mathrm{span}\{v, Av, A^2v, \ldots, A^{\ell-1}v\}$ for some $\ell < s$, which would imply for each successive power of $A$:
$$A^{\ell+1}v \in \mathrm{span}\{Av, A^2v, \ldots, A^{\ell}v\} \subseteq \mathrm{span}\{v, Av, A^2v, \ldots, A^{\ell-1}v\}$$
$$A^{\ell+2}v \in \mathrm{span}\{A^2v, A^3v, \ldots, A^{\ell+1}v\} \subseteq \mathrm{span}\{v, Av, A^2v, \ldots, A^{\ell-1}v\}$$
$$\vdots$$
which would imply in turn that $\mathcal{M} \subseteq \mathrm{span}\{v, Av, A^2v, \ldots, A^{\ell-1}v\}$ and that $\dim\mathcal{M} \le \ell$, contradicting the tentative assertion that $\ell < s$.

Pick any vector $v \in \mathbb{C}^n$. From the discussion above we see that if $s$ is the dimension of the cyclic subspace generated by $v$, then $s \le n$ and $A^s v \in \mathrm{span}\{v, Av, A^2v, \ldots, A^{s-1}v\}$. Thus, for some set of coefficients $\{c_0, c_1, \ldots, c_{s-1}\}$,
$$A^s v = c_{s-1}A^{s-1}v + \cdots + c_1 Av + c_0 v,$$
or equivalently,
$$\left(A^s - c_{s-1}A^{s-1} - \cdots - c_1 A - c_0 I\right)v = 0, \quad \text{i.e.,} \quad q(A)v = 0,$$
where $q$ is a polynomial of degree $s$. $q$ is called the minimal polynomial for $v$ and is said to "annihilate" $v$.

Any polynomial that annihilates $v$ will have the minimal polynomial for $v$ as a factor (which is why it's called "minimal"). To see this, suppose that a polynomial $p$ annihilates $v$: $p(A)v = 0$. $p$ has to have degree at least as big as $s$ (the degree of $p$ must be at least as big as that of $q$, otherwise the cyclic subspace that led us to $q$ would have had a smaller dimension). But then we can divide $q$ into $p$ to get
$$p(t) = \phi(t)q(t) + \rho(t),$$
where $\rho(t)$ is the remainder, which must then have degree strictly less than $s$. This leads to trouble, since then
$$0 = p(A)v = \phi(A)q(A)v + \rho(A)v = \rho(A)v,$$
and $\rho$ is a polynomial of degree strictly less than $s = \deg(q)$ that still annihilates $v$. But the same reasoning that led us to conclude that the degree of $p$ had to be at least as big as $s = \deg(q)$ then forces us to a contradiction unless $\rho(t) \equiv 0$. Thus $\rho = 0$ and $q$ is a factor of $p$.
From the fundamental theorem of algebra, any polynomial of degree $s$, and $q(t)$ in particular, has precisely $s$ roots (counting possible multiplicity) and may be factored as
$$q(t) = (t - \tau_1)(t - \tau_2)\cdots(t - \tau_s).$$
At least one of $\{\tau_1, \tau_2, \ldots, \tau_s\}$ must be an eigenvalue of $A$, since $q(A)v = 0$ implies that $q(A)$ is singular, and since
$$q(A) = (A - \tau_1 I)(A - \tau_2 I)\cdots(A - \tau_s I),$$
at least one of the factors $(A - \tau_i I)$ must be singular; if not even one were singular, then $q(A)$ would be nonsingular, leading to a contradiction.

But in fact, they all have to be singular. Suppose to the contrary that one of the factors is nonsingular, and relabel the $\tau$'s so that $(A - \tau_s I)$ is the nonsingular factor. Then we may premultiply $q(A)v = 0$ by $(A - \tau_s I)^{-1}$ to get
$$(A - \tau_1 I)(A - \tau_2 I)\cdots(A - \tau_{s-1}I)v = 0, \quad \text{i.e.,} \quad \left(A^{s-1} - \hat{c}_{s-2}A^{s-2} - \cdots - \hat{c}_1 A - \hat{c}_0 I\right)v = 0$$
for some set of coefficients $\{\hat{c}_0, \hat{c}_1, \ldots, \hat{c}_{s-2}\}$. But this implies that $A^{s-1}v \in \mathrm{span}\{v, Av, A^2v, \ldots, A^{s-2}v\}$ and that the cyclic subspace generated by $v$ has dimension no bigger than $s - 1$, which then conflicts with the way that $s$ was defined. This means that all the roots of the minimal polynomial for a vector $v$ will always be eigenvalues of the matrix $A$.

Example 4.8. Let
$$A = \begin{bmatrix} 3 & -1 \\ 2 & 0 \end{bmatrix}.$$
Start with $v = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$ and form
$$K = \begin{bmatrix} v & Av & A^2v \end{bmatrix} = \begin{bmatrix} 1 & 3 & 7 \\ 0 & 2 & 6 \end{bmatrix}.$$
The reduced row echelon form for $K$ is
$$U = \begin{bmatrix} 1 & 0 & -2 \\ 0 & 1 & 3 \end{bmatrix}.$$
Thus $A^2v = (-2)v + 3Av$, which we rearrange to get
$$\left(A^2 - 3A + 2I\right)v = 0.$$
Then we find that
$$q(\lambda) = \lambda^2 - 3\lambda + 2 = (\lambda - 1)(\lambda - 2).$$
Thus $\lambda_1 = 1$ and $\lambda_2 = 2$ are eigenvalues of $A$. To find the eigenspace associated with $\lambda_1 = 1$, we need to find the solution set of the system $(A - 1I)u = 0$. The solution set for this system is a one-dimensional subspace spanned by the vector $(1, 2)$. Similarly, the eigenspace associated with $\lambda_2 = 2$ is a one-dimensional subspace spanned by $(1, 1)$.

Since there are an infinite number of possibilities for selecting the starting vector $v$, how many different eigenvalues of $A$ could be generated?
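The Krylov computation in Example 4.8 can be scripted directly: build $v$, $Av$, $A^2v$ and solve the $2\times 2$ system expressing $A^2v$ in terms of $v$ and $Av$. A minimal sketch in plain Python (Cramer's rule stands in for the row reduction; names are ours):

```python
def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

# Example 4.8: find c0, c1 with A^2 v = c0 v + c1 (A v), so that the
# minimal polynomial for v is q(t) = t^2 - c1 t - c0.
A = [[3, -1], [2, 0]]
v = [1, 0]
Av = matvec(A, v)
AAv = matvec(A, Av)

# Solve [v | Av] [c0, c1]^T = A^2 v by Cramer's rule (2x2 system).
det = v[0] * Av[1] - Av[0] * v[1]
c0 = (AAv[0] * Av[1] - Av[0] * AAv[1]) / det
c1 = (v[0] * AAv[1] - AAv[0] * v[1]) / det
# q(t) = t^2 - c1 t - c0 = t^2 - 3t + 2 = (t - 1)(t - 2), as in the text.
```

The roots of the resulting quadratic are the eigenvalues 1 and 2 found in the example.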
Well, since the minimal polynomial for any vector can't have degree any larger than $n$, there will be choices of $v$, say $\hat{v}$ is one of them, for which the associated minimal polynomial $\hat{q}$ has maximal degree $m$, so all other minimal polynomials associated with other vectors can have degree no larger than $m$. In fact, all other minimal polynomials will be factors of $\hat{q}$! To see that, pick a second vector $u$ that is not a scalar multiple of $\hat{v}$ and let $q_u$ be the minimal polynomial for $u$. Define $q_{u\hat{v}}$ to be the lowest common multiple of $q_u$ and $\hat{q}$: the lowest-order polynomial that contains both $q_u$ and $\hat{q}$ as factors. Then $q_{u\hat{v}}$ annihilates every vector in the two-dimensional subspace spanned by $\hat{v}$ and $u$:
$$q_{u\hat{v}}(A)(\alpha\hat{v} + \beta u) = 0,$$
and so must contain the minimal polynomial of each vector in $\mathrm{span}\{\hat{v}, u\}$ as a factor. How many distinct such minimal polynomials could there be? $q_{u\hat{v}}$ has degree no bigger than $2m$; minimal polynomials will have degree no bigger than $m$; so we can count the number of ways one may pick $m$ or fewer monomial factors out of a set of $2m$ possible choices. The exact maximum number of possibilities is not so important as the fact that there is a finite number of possibilities, and so there must be only a finite number of possible distinct minimal polynomials for vectors in $\mathrm{span}\{\hat{v}, u\}$. We can partition this two-dimensional subspace into a finite number of equivalence classes, each containing those vectors having the same minimal annihilating polynomial. These equivalence classes can't all be lines, since we can't cover a two-dimensional space with a finite number of lines. Thus it will have to happen that two linearly independent vectors $x, y \in \mathrm{span}\{\hat{v}, u\}$ can be found that have the same minimal polynomial $q_x = q_y$. Now we can go through the previous argument and construct a lowest common multiple $q_{xy}$ which annihilates all vectors in $\mathrm{span}\{x, y\} = \mathrm{span}\{\hat{v}, u\}$.
But now $q_{xy} = q_x = q_y$ has degree no bigger than $m$; it annihilates $\hat v$ and so contains $\hat q$ as a factor; and since both $q_{xy}$ and $\hat q$ have degree $m$, it must be that $q_{xy} = \hat q$. So $\hat q$ annihilates every vector in $\operatorname{span}\{\hat v, u\}$ -- but since $u$ was arbitrarily chosen, $\hat q$ actually annihilates every vector in $\mathbb{C}^n$:
\[ \hat q(A)v = 0 \text{ for all } v \in \mathbb{C}^n, \quad\text{or equivalently,}\quad \hat q(A) = 0. \]
$\hat q$ is called the minimal polynomial for $A$. Since $\hat q$ is itself the minimal polynomial for a vector $\hat v$, there can be no polynomial of lower degree that satisfies $q(A) = 0$. In particular, this means (via arguments similar to what we explored above) that any polynomial $p$ that satisfies $p(A) = 0$ must contain $\hat q$ as a factor.

Problem 4.7. Why is it necessary for the minimal polynomial of $A$ to have each eigenvalue of $A$ as a root?

4.2 Invariant Subspaces and Jordan Forms

Prerequisites:
Matrix algebra. Vector spaces.
Familiarity with minimal polynomials.

Learning Objectives:
Familiarity with invariant subspaces.
Familiarity with spectral projectors and their relationship with the minimal polynomial.
Familiarity with Jordan forms associated with a matrix.

Another (potentially less familiar) background fact from polynomial algebra that will be useful for us gives conditions sufficient to guarantee that two polynomials can be found that, when used as "coefficients" in a linear combination of a given pair of polynomials, cause all powers but the constant term to cancel out. This actually is an outcome of a deeper result giving conditions for the principal ideal generated by a pair of polynomials to be the full set of all polynomials (including the constant polynomial, $p(t) = c$). A proof of this result is not offered here.

Proposition 4.9. If $q_1$ and $q_2$ are polynomials with no roots in common, then there exist polynomials $p_1$ and $p_2$ so that
\[ p_1(x)q_1(x) + p_2(x)q_2(x) = 1. \]

Example 4.10. Suppose $q_1(x) = x - 2$ and $q_2(x) = (x-3)^2$.
Then if we try $p_1(x) = ax + b$ and $p_2(x) = c$ (constant), we find
\begin{align}
1 &= (ax+b)(x-2) + c(x-3)^2 \tag{4.4}\\
  &= (a+c)x^2 + (b - 2a - 6c)x + (-2b + 9c). \tag{4.5}
\end{align}
The coefficients $a$, $b$, and $c$ satisfy the linear system
\[ \begin{bmatrix} 1 & 0 & 1 \\ -2 & 1 & -6 \\ 0 & -2 & 9 \end{bmatrix}\begin{Bmatrix} a \\ b \\ c \end{Bmatrix} = \begin{Bmatrix} 0 \\ 0 \\ 1 \end{Bmatrix}. \]
This leads finally to $a = -1$, $b = 4$, and $c = 1$. One may verify that
\[ (-x+4)(x-2) + (x-3)^2 = 1. \]

4.2.1 Invariant Subspaces

A subspace $\mathcal{U}$ of $\mathbb{C}^n$ is an invariant subspace for $A$ if $x \in \mathcal{U}$ implies that $Ax \in \mathcal{U}$, or more succinctly, if $A\mathcal{U} \subseteq \mathcal{U}$. A matrix $A \in \mathbb{C}^{n\times n}$ will have a variety of invariant subspaces: the trivial subspace $\{0\}$, the whole space $\mathbb{C}^n$, and eigenspaces are all invariant subspaces. For any vector $v \in \mathbb{C}^n$, the cyclic subspace $\mathcal{M}$ generated by $v$ is also an invariant subspace for $A$. Indeed, since any invariant subspace of $A$ that contains $v$ must also contain $A^k v$ for each integer $k > 0$, $\mathcal{M}$ is the smallest invariant subspace of $A$ that contains $v$.

A large variety of invariant subspaces can be associated with the minimal polynomial.

Theorem 4.11. If the minimal polynomial, $q_A$, associated with the matrix $A$ is factored as $q_A(t) = q_1(t)q_2(t)$ so that $q_1$ and $q_2$ have no common factor, then the two subspaces $\mathcal{U}_1 = \operatorname{Ker}(q_1(A))$ and $\mathcal{U}_2 = \operatorname{Ker}(q_2(A))$ satisfy:
1. $\mathbb{C}^n = \mathcal{U}_1 \oplus \mathcal{U}_2$, which means $\mathbb{C}^n = \operatorname{span}(\mathcal{U}_1, \mathcal{U}_2)$ and $\mathcal{U}_1 \cap \mathcal{U}_2 = \{0\}$;
2. $\mathcal{U}_1$ and $\mathcal{U}_2$ are invariant subspaces for $A$; and
3. the polynomials $q_1$ and $q_2$ are the lowest order polynomials that annihilate $\mathcal{U}_1$ and $\mathcal{U}_2$, respectively.

Proof: Since $q_1$ and $q_2$ have no common factor, there exist polynomials $p_1$ and $p_2$ such that $p_1(t)q_1(t) + p_2(t)q_2(t) = 1$, which implies
\[ p_1(A)q_1(A) + p_2(A)q_2(A) = I. \]
Notice that for any $x \in \mathbb{C}^n$,
\[ x = p_1(A)q_1(A)x + p_2(A)q_2(A)x, \]
so defining $x_1 = p_2(A)q_2(A)x$ and $x_2 = p_1(A)q_1(A)x$, observe
\[ q_1(A)x_1 = q_1(A)p_2(A)q_2(A)x = p_2(A)q_A(A)x = 0, \]
thus $x_1 \in \mathcal{U}_1$. A similar argument shows that $x_2 \in \mathcal{U}_2$. This shows that every vector $x \in \mathbb{C}^n$ may be written as $x = x_1 + x_2$ for some $x_1 \in \mathcal{U}_1$ and $x_2 \in \mathcal{U}_2$. Suppose $w \in \mathcal{U}_1 \cap \mathcal{U}_2$.
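The identity claimed at the end of Example 4.10 can be verified with coefficient arithmetic (a sketch; polynomials are stored low-degree-first, and the helper routines are ours):

```python
# Verify (-x + 4)(x - 2) + 1*(x - 3)^2 = 1 by multiplying and adding
# coefficient lists.

def poly_mul(p, q):
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(p, q):
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

p1 = [4, -1]                       # p1(x) = -x + 4
q1 = [-2, 1]                       # q1(x) = x - 2
p2 = [1]                           # p2(x) = 1
q2 = poly_mul([-3, 1], [-3, 1])    # q2(x) = (x - 3)^2 = x^2 - 6x + 9

total = poly_add(poly_mul(p1, q1), poly_mul(p2, q2))
assert total == [1, 0, 0]          # p1*q1 + p2*q2 = 1
```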
One may write
\[ w = p_1(A)q_1(A)w + p_2(A)q_2(A)w = 0, \tag{4.6} \]
thus $\mathcal{U}_1 \cap \mathcal{U}_2 = \{0\}$.

To see that $\mathcal{U}_1$ and $\mathcal{U}_2$ are invariant subspaces for $A$, notice that $x \in \mathcal{U}_1$ implies $q_1(A)x = 0$, which implies $Aq_1(A)x = 0$, which implies $q_1(A)Ax = 0$, which implies $Ax \in \mathcal{U}_1$. A similar argument can be mustered for $\mathcal{U}_2$.

Finally, suppose that $r_1$ is any polynomial that annihilates $\mathcal{U}_1$. Then for each $x \in \mathbb{C}^n$, we have from (4.6)
\[ r_1(A)q_2(A)x = q_2(A)r_1(A)x_1 + r_1(A)q_2(A)x_2 = 0, \]
and so $r_1(A)q_2(A) = 0$. But this implies that $r_1(t)q_2(t)$ must have the minimal polynomial $q_A(t) = q_1(t)q_2(t)$ as a factor -- or equivalently, $r_1(t)$ must have $q_1(t)$ as a factor. The same may be done for any polynomial $r_2$ that annihilates $\mathcal{U}_2$. $\Box$

A matrix $P$ is a spectral projector for a matrix $A$ if
1. $P^2 = P$;
2. $\operatorname{Ker}(P)$ is an invariant subspace for $A$; and
3. $\operatorname{Ran}(P)$ is an invariant subspace for $A$.

Theorem 4.12. The matrices $E_1 = p_2(A)q_2(A)$ and $E_2 = p_1(A)q_1(A)$ are spectral projectors onto the invariant subspaces $\mathcal{U}_1$ and $\mathcal{U}_2$, respectively.

Proof: Immediately, we have that $\operatorname{Ker}(E_1) = \mathcal{U}_2$ and $\operatorname{Ker}(E_2) = \mathcal{U}_1$, which we established as invariant subspaces for $A$ in the previous theorem. We next show that $E_1$ and $E_2$ are idempotent. The previous theorem established that for any $x \in \mathbb{C}^n$, $x = E_1 x + E_2 x$. Applying $E_1$ to both sides yields $E_1 x = E_1^2 x + E_1 E_2 x$, and
\[ E_1 E_2 = p_1(A)p_2(A)q_1(A)q_2(A) = p_1(A)p_2(A)q_A(A) = 0. \]
Thus $E_1 = E_1^2$, and a similar argument gives that $E_2 = E_2^2$. Now note that this means $\operatorname{Ran}(E_1) = \operatorname{Ker}(E_2) = \mathcal{U}_1$ and $\operatorname{Ran}(E_2) = \operatorname{Ker}(E_1) = \mathcal{U}_2$ are also invariant subspaces for $A$. $\Box$

The ascent,¹ $\alpha(\lambda)$, of the eigenvalue $\lambda$ is the smallest integer $\alpha > 0$ so that $\operatorname{Ker}(A-\lambda I)^{\alpha} = \operatorname{Ker}(A-\lambda I)^{\alpha+1}$.

Theorem 4.13. The minimal polynomial, $q_A$, associated with the matrix $A$ can be represented as
\[ q_A(t) = \prod_{\lambda\in\sigma(A)} (t-\lambda)^{\alpha(\lambda)}. \tag{4.7} \]

¹ This is sometimes called the index of an eigenvalue, but we eschew the term as too nondescript.
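Theorem 4.12 can be made concrete on the matrix of Example 4.8, where $q_A(t) = (t-1)(t-2)$. Taking $q_1(t) = t-1$ and $q_2(t) = t-2$, one may use $p_1 = 1$ and $p_2 = -1$, since $1\cdot(t-1) + (-1)\cdot(t-2) = 1$ (a sketch; $p_1, p_2$ were found by inspection):

```python
# E1 = p2(A)q2(A) = -(A - 2I) and E2 = p1(A)q1(A) = A - I are spectral
# projectors: idempotent, complementary, with E1*E2 = 0.

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

A = [[3, 1], [-2, 0]]
I2 = [[1, 0], [0, 1]]
E1 = [[-(A[i][j] - 2 * I2[i][j]) for j in range(2)] for i in range(2)]
E2 = [[A[i][j] - I2[i][j] for j in range(2)] for i in range(2)]

assert matmul(E1, E1) == E1 and matmul(E2, E2) == E2    # idempotent
assert matmul(E1, E2) == [[0, 0], [0, 0]]               # E1 E2 = 0
assert [[E1[i][j] + E2[i][j] for j in range(2)]
        for i in range(2)] == I2                        # E1 + E2 = I
```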
Proof: The distinguishing feature of (4.7) is the ascent as exponent in each factor $(t-\lambda)$. Suppose $\lambda$ is an eigenvalue of $A$ and factor the minimal polynomial as $q_A(t) = q_1(t)q_2(t)$ with $q_1(t) = (t-\lambda)^m$ for some $m$ and $q_2(t)$ such that $q_2(\lambda) \neq 0$. The ascent $\alpha(\lambda)$ is defined so that $(t-\lambda)^{\alpha(\lambda)}$ is the minimal order polynomial annihilating $\operatorname{Ker}(A-\lambda I)^{\ell}$ for all $\ell \geq \alpha(\lambda)$. Since $q_1$ is the minimal order polynomial that annihilates $\operatorname{Ker}(A-\lambda I)^m$, it cannot happen that $m > \alpha(\lambda)$. If it should happen that $m < \alpha(\lambda)$, then we could find $z \in \operatorname{Ker}(A-\lambda I)^{m+1}$ such that $z \notin \operatorname{Ker}(A-\lambda I)^m$. Now, we can use (4.6) to write $z = x_1 + x_2$ for some $x_1 \in \operatorname{Ker}(A-\lambda I)^m$ and $x_2 \in \operatorname{Ker}(q_2(A))$. Since $(t-\lambda)^{m+1}$ annihilates both $z$ and $x_1$, it must annihilate $x_2$ as well. But if $p(t) = (t-\lambda)^{m+1}$ annihilates $x_2$, then $p(t)$ must contain the minimal polynomial for $x_2$, and so $p(t)$ must share a factor with the minimal polynomial for any invariant subspace containing $x_2$ -- in particular, $(t-\lambda)$ must be a factor of $q_2(t)$. This conflicts with the original construction guaranteeing that $q_2(\lambda) \neq 0$. So ultimately, it must be that $m = \alpha(\lambda)$. $\Box$

4.2.2 Jordan Forms

If $A$ has $N$ distinct eigenvalues labeled as $\{\lambda_1, \lambda_2, \dots, \lambda_N\}$, then (4.7) is manifested as
\[ q_A(t) = (t-\lambda_1)^{\alpha(\lambda_1)}(t-\lambda_2)^{\alpha(\lambda_2)}\cdots(t-\lambda_N)^{\alpha(\lambda_N)}. \tag{4.8} \]
Since $\lambda_i \neq \lambda_j$ if $i \neq j$, the individual terms $(t-\lambda_i)^{\alpha(\lambda_i)}$ share no common factors with any of the other terms $(t-\lambda_j)^{\alpha(\lambda_j)}$ when $i \neq j$. A straightforward extension of Theorem 4.11 is evident:

Theorem 4.14. Let $A$ have a minimal polynomial given by (4.8) and define the $N$ subspaces
\[ \mathcal{U}_i = \operatorname{Ker}(A-\lambda_i I)^{\alpha(\lambda_i)} \quad\text{for } i = 1, 2, \dots, N. \]
Then
1. $\mathbb{C}^n = \mathcal{U}_1 \oplus \mathcal{U}_2 \oplus \cdots \oplus \mathcal{U}_N$, which means $\mathbb{C}^n = \operatorname{span}(\mathcal{U}_1, \mathcal{U}_2, \dots, \mathcal{U}_N)$ and $\mathcal{U}_i \cap \mathcal{U}_j = \{0\}$ whenever $i \neq j$;
2. each $\mathcal{U}_i$ is an invariant subspace for $A$; and
3. $q_i(t) = (t-\lambda_i)^{\alpha(\lambda_i)}$ is the lowest order polynomial that annihilates $\mathcal{U}_i$.

The proof is omitted but follows the same lines as the proof of Theorem 4.11.
Theorem 4.14 leads immediately to a fundamental matrix representation called the Jordan Form:

Theorem 4.15. There exists an invertible matrix $S$ such that
\[ S^{-1}AS = \begin{bmatrix} J(\lambda_1) & 0 & \cdots & 0 \\ 0 & J(\lambda_2) & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & J(\lambda_N) \end{bmatrix}
\quad\text{with}\quad
J(\lambda_i) = \begin{bmatrix} \lambda_i I & R_{12}^{(i)} & \cdots & & R_{1\alpha(\lambda_i)}^{(i)} \\ 0 & \lambda_i I & \cdots & & R_{2\alpha(\lambda_i)}^{(i)} \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & \lambda_i I & R_{\alpha(\lambda_i)-1,\alpha(\lambda_i)}^{(i)} \\ 0 & 0 & \cdots & 0 & \lambda_i I \end{bmatrix}, \]
where each matrix $R_{\ell,\ell+1}^{(i)}$ on the superdiagonal has full column rank and
\[ \operatorname{rank}\bigl(R_{\ell,\ell+1}^{(i)}\bigr) = \operatorname{nullity}(A-\lambda_i I)^{\ell+1} - \operatorname{nullity}(A-\lambda_i I)^{\ell}. \]

Proof: For each $i = 1, \dots, N$, let $S_i$ be a matrix whose columns form a basis for $\operatorname{Ker}(A-\lambda_i I)^{\alpha(\lambda_i)}$. Then $S = [S_1, S_2, \dots, S_N]$ has columns that collectively form a basis for $\mathbb{C}^n$, and so $S$ is invertible. Since each $S_i$ spans an invariant subspace for $A$, there exist matrices, call them $J_i$ for the time being, so that
\[ AS_i = S_i J_i. \tag{4.9} \]
Each $J_i$ is a matrix representation of $A$ as a linear transformation of $\mathcal{U}_i$ to $\mathcal{U}_i$ with respect to the basis given by $S_i$. As a result, $J_i$ will always be a square matrix with dimension equal to $\operatorname{nullity}(A-\lambda_i I)^{\alpha(\lambda_i)}$. Notice that different choices of bases for $\operatorname{Ker}(A-\lambda_i I)^{\alpha(\lambda_i)}$ produce different choices for $S_i$, which in turn induce different choices for $J_i$. But for any such choice, (4.9) remains valid and can be written collectively as
\[ AS = S\begin{bmatrix} J_1 & 0 & \cdots & 0 \\ 0 & J_2 & & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & J_N \end{bmatrix}. \tag{4.10} \]
One natural approach to take in constructing a particular basis for $\operatorname{Ker}(A-\lambda_i I)^{\alpha(\lambda_i)}$ begins with an orthonormal basis for $\operatorname{Ker}(A-\lambda_i I)$ as columns of a matrix $S_1^{(i)}$. Since $\operatorname{Ker}(A-\lambda_i I) \subseteq \operatorname{Ker}(A-\lambda_i I)^2$, one may extend the columns of $S_1^{(i)}$ to an orthonormal basis for $\operatorname{Ker}(A-\lambda_i I)^2$ by appending a set of columns, $S_2^{(i)}$. Notice then that $\operatorname{Ran}[(A-\lambda_i I)S_2^{(i)}] \subseteq \operatorname{Ker}(A-\lambda_i I)$, so there must be a matrix $R_{12}^{(i)}$ such that $(A-\lambda_i I)S_2^{(i)} = S_1^{(i)}R_{12}^{(i)}$. Furthermore, $R_{12}^{(i)}$ must have full column rank, for if it didn't there would be a nontrivial vector $z$ such that $R_{12}^{(i)}z = 0$, which would then imply that $(A-\lambda_i I)S_2^{(i)}z = 0$.
This identifies a linear combination of columns of $S_2^{(i)}$ that lies in $\operatorname{Ker}(A-\lambda_i I)$, despite the fact that we constructed $S_2^{(i)}$ so as to have all columns orthogonal to $\operatorname{Ker}(A-\lambda_i I)$. Thus $R_{12}^{(i)}$ must have full column rank. We may continue the process so that at step $\ell$, say, the columns of $[S_1^{(i)}, S_2^{(i)}, \dots, S_{\ell}^{(i)}]$ form an orthonormal basis for $\operatorname{Ker}(A-\lambda_i I)^{\ell}$ and the columns of $S_{\ell}^{(i)}$ include only those basis vectors for $\operatorname{Ker}(A-\lambda_i I)^{\ell}$ that are orthogonal to $\operatorname{Ker}(A-\lambda_i I)^{\ell-1}$. Thus, for any $i = 1, \dots, N$,
\begin{align*}
(A-\lambda_i I)S_1^{(i)} &= 0 \\
(A-\lambda_i I)S_2^{(i)} &= S_1^{(i)}R_{12}^{(i)} \\
(A-\lambda_i I)S_3^{(i)} &= S_2^{(i)}R_{23}^{(i)} + S_1^{(i)}R_{13}^{(i)} \\
&\;\;\vdots \\
(A-\lambda_i I)S_{\alpha(\lambda_i)}^{(i)} &= \sum_{j=1}^{\alpha(\lambda_i)-1} S_j^{(i)}R_{j\alpha(\lambda_i)}^{(i)}.
\end{align*}
These expressions can be rearranged to discover
\begin{align*}
AS_i &= \left[ AS_1^{(i)},\; AS_2^{(i)},\; \dots,\; AS_{\alpha(\lambda_i)}^{(i)} \right] \\
&= \left[ \lambda_i S_1^{(i)},\; \lambda_i S_2^{(i)} + S_1^{(i)}R_{12}^{(i)},\; \dots,\; \lambda_i S_{\alpha(\lambda_i)}^{(i)} + \sum_{j=1}^{\alpha(\lambda_i)-1}S_j^{(i)}R_{j\alpha(\lambda_i)}^{(i)} \right] \\
&= \left[ S_1^{(i)},\; S_2^{(i)},\; \dots,\; S_{\alpha(\lambda_i)}^{(i)} \right]
\begin{bmatrix} \lambda_i I & R_{12}^{(i)} & \cdots & R_{1\alpha(\lambda_i)}^{(i)} \\ 0 & \lambda_i I & \ddots & \vdots \\ \vdots & & \ddots & R_{\alpha(\lambda_i)-1,\alpha(\lambda_i)}^{(i)} \\ 0 & 0 & \cdots & \lambda_i I \end{bmatrix} \\
&= S_i J(\lambda_i).
\end{align*}
Thus, (4.10) holds with $J_i = J(\lambda_i)$. $\Box$

Theorem 4.16. Let $A$ be an $n\times n$ matrix with characteristic polynomial $p_A(t)$ and let $m(\lambda_i)$ denote the algebraic multiplicity of $\lambda_i$ as an eigenvalue of $A$. Then
\[ g(\lambda_i) + \alpha(\lambda_i) - 1 \;\leq\; m(\lambda_i) \;\leq\; g(\lambda_i)\,\alpha(\lambda_i). \]

Proof: If the dimension of $J(\lambda_i)$ is labeled as $d_i$, then $\det(tI - J(\lambda_i)) = (t-\lambda_i)^{d_i}$, and Theorem 4.15 can be used to establish
\[ p_A(t) = \prod_{i=1}^{N}\det(tI - J(\lambda_i)) = \prod_{i=1}^{N}(t-\lambda_i)^{d_i}, \]
indicating that the algebraic multiplicity of $\lambda_i$ is $m(\lambda_i) = d_i$. Now, from the form of $J(\lambda_i)$, one sees immediately that $g(\lambda_i)$ is the dimension of the $(1,1)$ block of $J(\lambda_i)$, the total number of diagonal blocks must be $\alpha(\lambda_i)$, and these blocks cannot be increasing in dimension. So upper and lower bounds for $d_i$ in terms of $g(\lambda_i)$ and $\alpha(\lambda_i)$ are evident by inspection. $\Box$

Corollary 4.17 (The "Cayley-Hamilton Theorem"). $p_A(A) = 0$.

Proof: Since $\alpha(\lambda_i) \leq m(\lambda_i)$, $p_A(t)$ must contain the minimal polynomial of $A$ as a factor, and so must annihilate $A$. $\Box$
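The bounds of Theorem 4.16 can be illustrated on a small nilpotent example with the single eigenvalue $0$ and Jordan blocks of sizes 2 and 1, so that $g(0) = 2$, $\alpha(0) = 2$, $m(0) = 3$ (a sketch; the rational row-reduction routine is our own):

```python
# Check g + alpha - 1 <= m <= g*alpha for A with Jordan structure {2,1} at 0.
from fractions import Fraction

def rank(M):
    M = [[Fraction(x) for x in row] for row in M]
    r = 0
    for c in range(len(M[0])):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        M[r] = [x / M[r][c] for x in M[r]]
        for i in range(len(M)):
            if i != r and M[i][c] != 0:
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        r += 1
    return r

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

A = [[0, 1, 0], [0, 0, 0], [0, 0, 0]]
n = 3
g = n - rank(A)                                      # nullity(A) = 2
alpha = 1 if rank(matmul(A, A)) == rank(A) else 2    # A^2 = 0, so ascent 2
m = 3                                                # p_A(t) = t^3
assert (g, alpha, m) == (2, 2, 3)
assert g + alpha - 1 <= m <= g * alpha               # 3 <= 3 <= 4
```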
4.3 Diagonalization

Prerequisites:
Basic definitions for the eigenvalue problem.
Matrix algebra. Vector spaces.

Learning Objectives:
Familiarity with conditions under which a matrix can be diagonalized.
Familiarity with conditions sufficient to guarantee linearly independent eigenvectors.

Let $A$ be an $n\times n$ matrix. If $A$ is similar to a diagonal matrix, then $A$ is said to be diagonalizable.

Theorem 4.18. The matrix $A \in \mathbb{C}^{n\times n}$ is diagonalizable if and only if $A$ has a set of $n$ linearly independent eigenvectors.

Proof: Suppose $x_1, \dots, x_n$ are $n$ linearly independent eigenvectors of $A$. Let $X$ be the $n\times n$ matrix whose $i$th column is $x_i$. Since the columns of $X$ are linearly independent, we see that $X$ is invertible. Then
\begin{align*}
X^{-1}AX &= X^{-1}A[x_1 \cdots x_n] = X^{-1}[Ax_1 \cdots Ax_n] = X^{-1}[\lambda_1 x_1 \cdots \lambda_n x_n] \\
&= X^{-1}X\begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}
= \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}.
\end{align*}
Conversely, suppose that $A$ is similar to a diagonal matrix
\[ D = \begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}. \]
Then there exists $X$ invertible such that $X^{-1}AX = D$, or equivalently, $AX = XD$; and so if we denote the columns of $X$ by $x_1, \dots, x_n$, we have
\[ A[x_1 \cdots x_n] = [x_1 \cdots x_n]\begin{bmatrix} \lambda_1 & & 0 \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}. \]
This implies that for each $k = 1, 2, \dots, n$,
\[ Ax_k = \lambda_k x_k. \]
Thus $x_1, \dots, x_n$ are eigenvectors for $A$. Since they are the columns of the invertible matrix $X$, they are linearly independent. The theorem is proved. $\Box$

Corollary 4.19. The matrix $A$ is diagonalizable if and only if the ascent of each eigenvalue is 1, which occurs if and only if each eigenvalue appears as a simple root of the minimal polynomial of $A$.

Let $A$ be a diagonalizable matrix that is similar to the diagonal matrix $D$. Clearly $A$ can be thought of as the matrix representation for a linear transformation $L$ with respect to the standard basis. As we see from the preceding proof, $D$ is the matrix representation of $L$ with respect to the basis formed by the $n$ independent eigenvectors of $A$.

Problem 4.8.
Show that $A = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$ is not diagonalizable.

Example 4.20. Let
\[ A = \begin{bmatrix} 3 & -2 & 0 \\ -2 & 3 & 0 \\ 0 & 0 & 5 \end{bmatrix}. \]
We find that $A$ has eigenvalues $\lambda_1 = 1$ with multiplicity one and $\lambda_2 = \lambda_3 = 5$ with multiplicity two. The $\lambda_1 = 1$ eigenspace has basis $\{(1, 1, 0)\}$ and the $\lambda_2 = \lambda_3 = 5$ eigenspace has basis $\{(-1, 1, 0), (0, 0, 1)\}$. Thus, setting
\[ X = \begin{bmatrix} 1 & -1 & 0 \\ 1 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \]
we have
\[ X^{-1}AX = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 5 & 0 \\ 0 & 0 & 5 \end{bmatrix}. \]

Problem 4.9. Diagonalize the following matrices. That is, find $X$ invertible and $D$ diagonal such that $X^{-1}AX = D$.
1. $\begin{bmatrix} 19 & 25 & 17 \\ -9 & -11 & -9 \\ -6 & -9 & -4 \end{bmatrix}$
2.-4. 2 0 0 0 5 3 1 0 0 3 4 0 3 1 3 0 0 0 0 0 0 3 0 1 3 5 3 5 0 2 0 0 0 0 3 1 0 0 0 3 3 7 7 5

Problem 4.10. Let $L : \mathbb{R}^2 \to \mathbb{R}^2$ where $L(x_1, x_2) = (3x_1 + 4x_2,\; 2x_1 + x_2)$. Find a basis for $\mathbb{R}^2$ relative to which the matrix representation of $L$ is diagonal.

Theorem 4.21. Suppose $\lambda_1, \dots, \lambda_k$ are distinct eigenvalues of $A$. If for each $i = 1, \dots, k$, $x_i$ is an eigenvector corresponding to $\lambda_i$, then the set $\{x_1, \dots, x_k\}$ is a linearly independent set.

In particular, Theorem 4.21 implies that if $S_i$ is a basis for the $\lambda_i$-eigenspace, $i = 1, \dots, k$, of a matrix $A$ with distinct eigenvalues $\lambda_1, \dots, \lambda_k$, then $S_1 \cup \cdots \cup S_k$ is still a linearly independent set.

Proof: Suppose that the set $S = \{x_1, \dots, x_k\}$ is linearly dependent. Suppose further that the largest linearly independent subset of $S$ contains $0 < r < k$ vectors, so that by re-ordering if necessary we can assume that $\{x_1, \dots, x_r\}$ is linearly independent while $\{x_1, \dots, x_r, x_{r+1}\}$ is linearly dependent. Then
\[ k_1 x_1 + \cdots + k_r x_r + k_{r+1}x_{r+1} = 0 \tag{4.11} \]
has a nontrivial solution $k_1, \dots, k_r, k_{r+1}$. Applying $A$ to both sides of (4.11) gives
\[ 0 = A(k_1 x_1 + \cdots + k_r x_r + k_{r+1}x_{r+1}) = k_1 Ax_1 + \cdots + k_r Ax_r + k_{r+1}Ax_{r+1} = k_1\lambda_1 x_1 + \cdots + k_r\lambda_r x_r + k_{r+1}\lambda_{r+1}x_{r+1}. \]
Thus
\[ k_1\lambda_1 x_1 + \cdots + k_r\lambda_r x_r + k_{r+1}\lambda_{r+1}x_{r+1} = 0. \tag{4.12} \]
Since $x_1, \dots, x_r$ are independent, we have by (4.11) that $k_{r+1} \neq 0$, and hence that at least one of $k_1, \dots, k_r$ is not zero.
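Example 4.20 can be verified without computing $X^{-1}$: since $X$ is invertible, $X^{-1}AX = D$ is equivalent to $AX = XD$ (a verification sketch):

```python
# Check AX = XD for the matrices of Example 4.20.

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

A = [[3, -2, 0], [-2, 3, 0], [0, 0, 5]]
X = [[1, -1, 0], [1, 1, 0], [0, 0, 1]]
D = [[1, 0, 0], [0, 5, 0], [0, 0, 5]]
assert matmul(A, X) == matmul(X, D)
```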
Multiplying (4.11) by $-\lambda_{r+1}$ and adding to (4.12), we obtain
\[ k_1(\lambda_1 - \lambda_{r+1})x_1 + \cdots + k_r(\lambda_r - \lambda_{r+1})x_r = 0. \]
Since $x_1, \dots, x_r$ are linearly independent, this forces
\[ k_1(\lambda_1 - \lambda_{r+1}) = k_2(\lambda_2 - \lambda_{r+1}) = \cdots = k_r(\lambda_r - \lambda_{r+1}) = 0. \]
But since all the $\lambda_i$ are distinct, this implies that $k_1 = k_2 = \cdots = k_r = 0$, which in turn forces $k_{r+1} = 0$. This contradicts our starting hypothesis, and so the theorem is proved. $\Box$

Theorem 4.21 allows us to give a sufficient condition for a matrix to be diagonalizable.

Theorem 4.22. If the $n\times n$ matrix $A$ has $n$ distinct eigenvalues, then $A$ is diagonalizable.

Proof: Since each of the $n$ distinct eigenvalues gives at least one eigenvector, we have by Theorem 4.21 that the $n$ eigenvectors so obtained are linearly independent. $\Box$

4.4 The Schur Decomposition

Prerequisites:
Basic definitions for the eigenvalue problem.
Matrix algebra.
Unitary matrices and orthogonal bases.

Learning Objectives:
Familiarity with the construction of unitary triangularization.
Understanding in what sense nondiagonalizable matrices are always close to diagonalizable matrices.

We have seen in the previous two sections that not every square matrix is similar to a matrix in diagonal form, although construction of the Jordan form showed that every matrix is similar to a "block diagonal" form having upper triangular blocks. We pursue here the possibility that every matrix could be similar to some matrix in triangular form, though not necessarily "block diagonal" as for the Jordan form. The following theorem is known as Schur's Theorem.

Theorem 4.23. Given an $n\times n$ matrix $A$, there exists a unitary matrix $U$ such that $U^*AU = T$, where $T$ is upper triangular.

Clearly Theorem 4.23 tells us that $A$ is similar to $T$ via the matrix $U$. Since $U$ is unitary, we say that $A$ and $T$ are unitarily equivalent. Also, the eigenvalues of $A$ are the main diagonal entries of $T$.
Indeed, this follows since the eigenvalues of an upper triangular matrix are the entries on the main diagonal and similar matrices have the same eigenvalues.

Proof: We will prove this theorem by induction on the size $n$ of the matrix. Suppose $A$ is a $2\times 2$ matrix. Then $A$ has at least one eigenpair $(\lambda_1, u_1)$ with $u_1$ a unit vector. If the first coordinate of $u_1$ is nonzero, let $S = \{u_1, e_2\}$. If the first coordinate is zero, then the second coordinate is not zero and we let $S = \{u_1, e_1\}$. In either case, $S$ is a basis for $\mathbb{C}^2$. Use the Gram-Schmidt process to get an orthonormal basis $\{u_1, v_2\}$ of $\mathbb{C}^2$. Let $U$ be the unitary matrix with columns $u_1$ and $v_2$. Then $AU$ has columns $Au_1 = \lambda_1 u_1$ and $Av_2$. Thus the matrix $U^*AU$ has first column
\[ \lambda_1 U^*u_1 = \lambda_1\begin{bmatrix} \langle u_1, u_1\rangle \\ \langle u_1, v_2\rangle \end{bmatrix} = \lambda_1\begin{bmatrix} 1 \\ 0 \end{bmatrix}. \]
The matrix $U^*AU$ is upper triangular and we are done in the $n = 2$ case.

If $A$ is a $3\times 3$ matrix, let $(\lambda_1, u_1)$ be an eigenpair as before. Again $u_1$ has at least one nonzero coordinate. If the first coordinate is nonzero, let $S = \{u_1, e_2, e_3\}$. If the first coordinate is zero but the second is nonzero, replace $e_2$ in $S$ by $e_1$. If the first two coordinates of $u_1$ are both zero, replace $e_3$ by $e_1$. In any case, $S$ is a basis for $\mathbb{C}^3$. Using the Gram-Schmidt process we obtain an orthonormal basis $\{u_1, v_2, v_3\}$. Let $V$ be the unitary matrix with these vectors as columns. Then
\[ V^*AV = \begin{bmatrix} \lambda_1 & \ast \\ 0 & B \end{bmatrix}, \]
where $B$ is a $2\times 2$ matrix and the symbol $\ast$ denotes an entry (or block of entries) which may change from occurrence to occurrence, but is inconsequential in the proof of the theorem. We know from the $2\times 2$ case already proved that there exists a $2\times 2$ unitary matrix $W$ such that $W^*BW = T_1$, where $T_1$ is a $2\times 2$ upper triangular matrix. Set
\[ Y = \begin{bmatrix} 1 & 0 \\ 0 & W \end{bmatrix}. \]
We have
\begin{align*}
(VY)^*A(VY) &= Y^*(V^*AV)Y = Y^*\begin{bmatrix} \lambda_1 & \ast \\ 0 & B \end{bmatrix}Y \\
&= \begin{bmatrix} \lambda_1 & \ast \\ 0 & W^*BW \end{bmatrix} = \begin{bmatrix} \lambda_1 & \ast \\ 0 & T_1 \end{bmatrix} = T.
\end{align*}
Since the columns of $Y$ are orthonormal, $Y$ is unitary. Thus setting $U = VY$, we have that $U$ is the product of two unitary matrices and is hence unitary.
Since $U^*AU = T$, we are done in the $3\times 3$ case. For $n = 4$ we proceed as we did in the $n = 3$ case, except that $B$ is $3\times 3$, $Y$ has the vector $(1, 0, 0, 0)$ for its first row and column, and $W$ is the matrix that "triangularizes" $B$. It should now be clear how to proceed for general $n$. $\Box$

Example 4.24. Let
\[ A = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 1 \\ 4 & 0 & 0 \end{bmatrix}. \]
We find that $p_A(t) = t(t-2)(t+2)$ and that $(0, e_2)$ is an eigenpair for $A$. We set $S = \{e_2, e_1, e_3\}$, which coincidentally is already orthonormal. Set
\[ V = \begin{bmatrix} 0 & 1 & 0 \\ 1 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}. \]
We find that
\[ V^*AV = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 1 \\ 0 & 4 & 0 \end{bmatrix}. \]
We now triangularize the $2\times 2$ matrix $B = \begin{bmatrix} 0 & 1 \\ 4 & 0 \end{bmatrix}$. We already know that the diagonal elements of the final $3\times 3$ upper triangular matrix are $0$, $-2$, and $2$. This means that $B$ must have eigenvalues $2$ and $-2$. We find that
\[ u_1 = \left(-1/\sqrt{5},\; 2/\sqrt{5}\right) \]
is an eigenvector corresponding to $-2$. We now set $S = \{u_1, e_2\}$. Applying Gram-Schmidt we obtain the orthonormal basis
\[ \left(-1/\sqrt{5},\; 2/\sqrt{5}\right),\; \left(2/\sqrt{5},\; 1/\sqrt{5}\right). \]
We set
\[ W = \begin{bmatrix} -1/\sqrt{5} & 2/\sqrt{5} \\ 2/\sqrt{5} & 1/\sqrt{5} \end{bmatrix} \quad\text{and}\quad Y = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1/\sqrt{5} & 2/\sqrt{5} \\ 0 & 2/\sqrt{5} & 1/\sqrt{5} \end{bmatrix}. \]
Finally we find that
\[ U = VY = \begin{bmatrix} 0 & -1/\sqrt{5} & 2/\sqrt{5} \\ 1 & 0 & 0 \\ 0 & 2/\sqrt{5} & 1/\sqrt{5} \end{bmatrix} \]
and that
\[ U^*AU = T = \begin{bmatrix} 0 & 2/\sqrt{5} & 1/\sqrt{5} \\ 0 & -2 & 3 \\ 0 & 0 & 2 \end{bmatrix}. \]

Problem 4.11. Show that given an $n\times n$ matrix $A$, there exists a unitary matrix $U$ such that $U^*AU = L$, where $L$ is lower triangular.

Problem 4.12. Find upper triangular matrices unitarily equivalent to the following matrices:
1. $\begin{bmatrix} 1 & 0 \\ 2 & 3 \end{bmatrix}$
2. $\begin{bmatrix} 0 & 1 & 1 \\ 0 & 1 & 1 \\ 0 & 1 & 0 \end{bmatrix}$ (Use the fact that $(0, e_1)$ is an eigenpair.)

Let $A = (a_{ij})$, $B = (b_{ij})$ be two $n\times n$ complex matrices. We define the distance from $A$ to $B$, denoted by $d(A, B)$, to be
\[ d(A, B) = \|A - B\|_F. \]
We say that a sequence of matrices $(A_k)$ converges to a matrix $A$ if $d(A_k, A) \to 0$ as $k \to \infty$.

Theorem 4.25. Let $A$ be an $n\times n$ matrix. There exist diagonalizable matrices arbitrarily close to $A$.

Proof: It is enough to find for each integer $k$ a matrix $A_k$ with distinct eigenvalues such that $d(A, A_k) < 1/k$.
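The final product in Example 4.24 can be checked numerically (a sketch; since $U$ here is real, $U^* = U^T$):

```python
# Verify that U^T A U is upper triangular with diagonal 0, -2, 2.
import math

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

s = math.sqrt(5)
A = [[0, 0, 1], [0, 0, 1], [4, 0, 0]]
U = [[0, -1/s, 2/s], [1, 0, 0], [0, 2/s, 1/s]]
Ut = [list(row) for row in zip(*U)]
T = matmul(Ut, matmul(A, U))

for i in range(3):
    for j in range(i):
        assert abs(T[i][j]) < 1e-12          # strictly lower part vanishes
for i, d in enumerate([0, -2, 2]):
    assert abs(T[i][i] - d) < 1e-12          # eigenvalues on the diagonal
```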
By Schur's Theorem, we can find a unitary matrix $U$ and an upper triangular matrix $T$ such that
\[ A = UTU^* = U\begin{bmatrix} \lambda_1 & & \ast \\ & \ddots & \\ 0 & & \lambda_n \end{bmatrix}U^*. \]
Now define
\[ A_k = UT_1U^* = U\begin{bmatrix} \lambda_1' & & \ast \\ & \ddots & \\ 0 & & \lambda_n' \end{bmatrix}U^*. \]
Here we mean that $T$ and $T_1$ differ only in their diagonal elements, with $\lambda_1', \dots, \lambda_n'$ chosen distinct. It should be clear that we can make $|\lambda_i - \lambda_i'|$ small enough so that $d(A, A_k) < 1/k$. $\Box$

In effect, this amounts to asserting that the set of nondiagonalizable matrices is "thin" in some sense.

Problem 4.13. Recall that the trace of an $n\times n$ matrix $A = [a_{ij}]$ is defined by
\[ \operatorname{trace}(A) = a_{11} + a_{22} + \cdots + a_{nn}. \]
Prove that similar matrices have the same trace. Let $A$ have eigenvalues $\lambda_1, \dots, \lambda_n$, counting multiplicity. Show that
\[ \operatorname{trace}(A) = \lambda_1 + \cdots + \lambda_n. \]
(Comment: note that the characteristic polynomial of $A$, $p_A$, can be written $p_A(t) = (t-\lambda_1)\cdots(t-\lambda_n)$; so the coefficient of $t^{n-1}$ in the characteristic polynomial is $-\operatorname{trace}(A)$.)

Problem 4.14. Show that if $A$ has eigenvalues $\lambda_1, \dots, \lambda_n$, counting multiplicity, then
\[ \det(A) = \lambda_1\lambda_2\cdots\lambda_n \]
and the constant term in the characteristic polynomial $p_A(t)$ is $(-1)^n\det(A)$.

4.5 Hermitian and other Normal Matrices

Prerequisites:
Schur decomposition.
Matrix algebra.

Learning Objectives:
Familiarity with Hermitian, skew-Hermitian, and positive definite matrices.
Familiarity with the characterization of unitarily diagonalizable matrices.
Understanding the relationship of the SVD of $A$ with the unitary diagonalization of $A^*A$.

The Schur decomposition reveals structure for many matrices that have some sort of symmetry or other features that can be described using both the matrix and its transpose.

4.5.1 Hermitian matrices

An $n\times n$ matrix $H$ is called Hermitian or self-adjoint if $H = H^*$. If $H$ is Hermitian and has only real entries, then $H = H^t$, and we say that $H$ is symmetric. We say that an $n\times n$ matrix $A$ is unitarily diagonalizable if there exists a unitary matrix $U$ and a diagonal matrix $D$ such that $U^*AU = D$. As another application of Schur's Theorem we have the following important result.
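The identities in Problems 4.13 and 4.14 are easy to spot-check; here they are verified against the matrix of Example 4.20, whose eigenvalues, counting multiplicity, are 1, 5, 5 (a verification sketch):

```python
# trace(A) = sum of eigenvalues; det(A) = product of eigenvalues.
A = [[3, -2, 0], [-2, 3, 0], [0, 0, 5]]
trace = A[0][0] + A[1][1] + A[2][2]
det = (A[0][0] * (A[1][1]*A[2][2] - A[1][2]*A[2][1])
     - A[0][1] * (A[1][0]*A[2][2] - A[1][2]*A[2][0])
     + A[0][2] * (A[1][0]*A[2][1] - A[1][1]*A[2][0]))
eigs = [1, 5, 5]
assert trace == sum(eigs) == 11
assert det == eigs[0] * eigs[1] * eigs[2] == 25
```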
Theorem 4.26. Let $H$ be Hermitian. Then $H$ is unitarily diagonalizable and has only real eigenvalues.

Proof: By Schur's Theorem, there exists a unitary matrix $U$ and an upper triangular matrix $T$ such that $T = U^*HU$. Taking the adjoint of both sides of this equation and using the fact that $H$ is Hermitian, we obtain
\[ T^* = (U^*HU)^* = U^*H^*U = U^*HU = T. \]
Thus $T = T^*$. Since $T$ is upper triangular and $T^*$ is lower triangular, we see that $T$ is diagonal with only real entries on the main diagonal. Since these diagonal entries are precisely the eigenvalues of $H$, the theorem is proved. $\Box$

Theorem 4.27. Let $H$ be a Hermitian matrix. Eigenvectors belonging to different eigenspaces are orthogonal.

Proof: Suppose that $(\lambda, u)$ and $(\mu, v)$ are eigenpairs for $H$ with $\lambda \neq \mu$. Then
\[ \langle Hu, v\rangle = \lambda\langle u, v\rangle \quad\text{and}\quad \langle u, Hv\rangle = \bar\mu\langle u, v\rangle. \]
But since $H$ is Hermitian,
\[ \langle u, Hv\rangle = \langle u, H^*v\rangle = \langle Hu, v\rangle. \]
Hence $\lambda\langle u, v\rangle = \bar\mu\langle u, v\rangle = \mu\langle u, v\rangle$, since $\mu$ is real by Theorem 4.26. Since $\lambda \neq \mu$, it must be that $\langle u, v\rangle = 0$. The theorem is proved. $\Box$

We know as a consequence of the previous two theorems that we can unitarily diagonalize the Hermitian matrix $H$ as follows. Suppose $\lambda_1, \dots, \lambda_N$ are the distinct eigenvalues of $H$. Find a basis for each $\lambda_i$-eigenspace and orthonormalize it via the Gram-Schmidt process. By Theorem 4.27 distinct eigenspaces are orthogonal, and so we have produced $n$ orthonormal eigenvectors which we use for the columns of $U$.

Problem 4.15. Find a unitary matrix $U$ that diagonalizes
1. $H = \begin{bmatrix} 4 & 2 & 2 \\ 2 & 4 & 2 \\ 2 & 2 & 4 \end{bmatrix}$
2. $H = \begin{bmatrix} 2 & 1-i \\ 1+i & 3 \end{bmatrix}$
3. $H = \begin{bmatrix} 3 & 1 & 0 & 0 & 0 \\ 1 & 3 & 0 & 0 & 0 \\ 0 & 0 & 2 & 1 & 1 \\ 0 & 0 & 1 & 2 & 1 \\ 0 & 0 & 1 & 1 & 2 \end{bmatrix}$ (Note that $p_A(t) = (t-4)^2(t-1)^2(t-2)$.)

4.5.2 Normal Matrices

Which matrices are unitarily diagonalizable?

Theorem 4.28. An $n\times n$ matrix $N$ is unitarily diagonalizable if and only if $N^*N = NN^*$. Such matrices are called normal.

Proof: If $N$ is unitarily diagonalizable, then $N = UDU^*$ for some unitary matrix $U$ and diagonal matrix $D$.
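A small Hermitian example illustrates Theorems 4.26 and 4.27 (a sketch; this particular $H$ and its eigenpairs $(1, v_1)$, $(4, v_2)$ were worked out by hand and are not from the text):

```python
# H is Hermitian; its eigenvalues are real, and eigenvectors belonging to
# different eigenvalues are orthogonal in the complex inner product.
H = [[2, 1 - 1j], [1 + 1j, 3]]
v1, v2 = [1j - 1, 1], [1, 1 + 1j]

def matvec(M, x):
    return [sum(M[i][j] * x[j] for j in range(len(x))) for i in range(len(M))]

assert matvec(H, v1) == [1 * z for z in v1]    # real eigenvalue 1
assert matvec(H, v2) == [4 * z for z in v2]    # real eigenvalue 4
# eigenvectors from different eigenspaces are orthogonal:
assert sum(b.conjugate() * a for a, b in zip(v1, v2)) == 0
```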
Since $D^*D = DD^*$, $N$ must be normal. Conversely, suppose $N$ is a normal matrix and let $N = UTU^*$ be the Schur decomposition of $N$. $N^*N = NN^*$ implies that $T^*T = TT^*$. Now, calculate the $(k,k)$ diagonal entry on each side of this equality:
\[ \sum_{j=1}^{k}|t_{jk}|^2 = \sum_{j=k}^{n}|t_{kj}|^2. \]
For $k = 1$:
\[ |t_{11}|^2 = |t_{11}|^2 + |t_{12}|^2 + \cdots + |t_{1n}|^2 \quad\Longrightarrow\quad 0 = |t_{12}|^2 + \cdots + |t_{1n}|^2, \]
which implies $t_{12} = \cdots = t_{1n} = 0$. For $k = 2$:
\[ |t_{12}|^2 + |t_{22}|^2 = |t_{22}|^2 + |t_{23}|^2 + \cdots + |t_{2n}|^2 \quad\Longrightarrow\quad 0 = |t_{23}|^2 + \cdots + |t_{2n}|^2 \]
(since $t_{12} = 0$), which implies $t_{23} = \cdots = t_{2n} = 0$. For $k = 3$:
\[ |t_{13}|^2 + |t_{23}|^2 + |t_{33}|^2 = |t_{33}|^2 + |t_{34}|^2 + \cdots + |t_{3n}|^2 \quad\Longrightarrow\quad 0 = |t_{34}|^2 + \cdots + |t_{3n}|^2, \]
which implies $t_{34} = \cdots = t_{3n} = 0$. Thus the first three rows of $T$ have only zero off-diagonal entries. One can continue in this way to find ultimately that all rows have only zero off-diagonal entries, so that $T$ is, in fact, a diagonal matrix. $\Box$

Problem 4.16. A matrix $A$ is called skew-Hermitian if $A^* = -A$. Show that skew-Hermitian matrices are normal. Show that if $A$ is skew-Hermitian, then all the entries on the main diagonal are pure imaginary and that all eigenvalues of $A$ are pure imaginary.

Problem 4.17. Show that a unitary matrix is unitarily diagonalizable.

4.5.3 Positive Definite Matrices

An $n\times n$ Hermitian matrix $H$ is called positive definite if $x^*Hx > 0$ for all nonzero $x$.

Theorem 4.29. A Hermitian matrix $H$ is positive definite if and only if all its eigenvalues are positive.

Proof: Suppose $H$ is positive definite and $(\lambda, x)$ is an eigenpair. Then if $x$ is nonzero,
\[ 0 < x^*Hx = \lambda x^*x. \]
Since $x^*x > 0$, we have that $\lambda > 0$. Conversely, assume that all the eigenvalues of $H$ are positive. Then there exists $U$ unitary and $D$ diagonal such that $HU = UD$. Given a nonzero $x$ in $\mathbb{C}^n$, then $y = U^*x$ is nonzero and satisfies $Uy = x$. Let $y = (y_1, \dots, y_n)$.
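Theorem 4.28 is easy to test numerically (a sketch): a real skew-symmetric matrix is normal, while a non-diagonal upper triangular matrix is not -- and so, by the theorem, the latter cannot be unitarily diagonalizable.

```python
# Compare M*M and MM* for a normal and a non-normal example.

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

def adjoint(M):
    return [[M[j][i].conjugate() for j in range(len(M))]
            for i in range(len(M[0]))]

N = [[0, -1], [1, 0]]      # skew-symmetric, hence normal
T = [[1, 1], [0, 2]]       # upper triangular but not diagonal
assert matmul(adjoint(N), N) == matmul(N, adjoint(N))
assert matmul(adjoint(T), T) != matmul(T, adjoint(T))
```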
We have
\begin{align*}
\langle Hx, x\rangle &= \langle HUy, Uy\rangle = \langle UDy, Uy\rangle = \langle U^*UDy, y\rangle \\
&= \langle Dy, y\rangle = \lambda_1|y_1|^2 + \cdots + \lambda_n|y_n|^2 > 0.
\end{align*}
The last inequality follows since all the $\lambda_i$ are positive and at least one of the $y_i$ is nonzero. $\Box$

Let $\{p_1, p_2, \dots, p_r\}$ be a set of vectors in $\mathbb{C}^n$. The Gram matrix of $\{p_1, p_2, \dots, p_r\}$ is the $r\times r$ matrix defined by $G = [p_i^*p_j]$.

Theorem 4.30. The Gram matrix $G = [p_i^*p_j]$ of a set of vectors $\{p_1, p_2, \dots, p_r\}$ is positive definite if and only if the vectors are linearly independent.

Proof: Suppose that $G$ is positive definite. Then if $p = \sum_{j=1}^{r}x_jp_j$,
\[ \|p\|^2 = \left\langle \sum_{i=1}^{r}x_ip_i,\; \sum_{j=1}^{r}x_jp_j \right\rangle = x^*Gx. \]
Thus if $\{p_1, p_2, \dots, p_r\}$ is linearly dependent, so that there is a nontrivial linear combination that yields $p = 0$, then $G$ cannot be positive definite. The converse just reverses the argument and is left as an exercise. $\Box$

As another application where positive definite matrices arise, consider the real valued function $f$ defined on $\Omega$, a domain in $\mathbb{R}^2$. Suppose that the function is differentiable with continuous second and third partials on $\Omega$. The Taylor series for such a function at a point $(x_0, y_0) \in \Omega$ is given by
\begin{align*}
f(x_0+h, y_0+k) &= f(x_0, y_0) + \frac{\partial f}{\partial x}\Big|_{(x_0,y_0)}h + \frac{\partial f}{\partial y}\Big|_{(x_0,y_0)}k \\
&\quad + \frac{1}{2}\left[\frac{\partial^2 f}{\partial x^2}\Big|_{(x_0,y_0)}h^2 + 2\frac{\partial^2 f}{\partial x\partial y}\Big|_{(x_0,y_0)}hk + \frac{\partial^2 f}{\partial y^2}\Big|_{(x_0,y_0)}k^2\right] + R(x_0, y_0, h, k) \\
&= f(x_0, y_0) + \nabla f\begin{bmatrix} h \\ k \end{bmatrix} + \frac{1}{2}\left\langle \nabla^2 f\begin{bmatrix} h \\ k \end{bmatrix}, \begin{bmatrix} h \\ k \end{bmatrix} \right\rangle + R(x_0, y_0, h, k).
\end{align*}
Here
\[ \nabla f = \begin{bmatrix} \frac{\partial f}{\partial x}\big|_{(x_0,y_0)} & \frac{\partial f}{\partial y}\big|_{(x_0,y_0)} \end{bmatrix}, \qquad \nabla^2 f = \begin{bmatrix} \frac{\partial^2 f}{\partial x^2}\big|_{(x_0,y_0)} & \frac{\partial^2 f}{\partial x\partial y}\big|_{(x_0,y_0)} \\ \frac{\partial^2 f}{\partial x\partial y}\big|_{(x_0,y_0)} & \frac{\partial^2 f}{\partial y^2}\big|_{(x_0,y_0)} \end{bmatrix}, \]
and $R$ is a remainder term such that $R/(h^2+k^2) \to 0$ as $(h,k) \to (0,0)$. The $1\times 2$ matrix $\nabla f$ is called the derivative of $f$ at $(x_0, y_0)$ and the $2\times 2$ Hermitian matrix $\nabla^2 f$ is called the Hessian of $f$ at $(x_0, y_0)$. Recall that $(x_0, y_0)$ is called a critical point of $f$ if $\nabla f = 0$. The critical point $(x_0, y_0)$ is called a strict local minimum for $f$ if $f(x_0+h, y_0+k) > f(x_0, y_0)$ for all $(h, k)$ sufficiently close to the origin.
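Theorem 4.30 can be sketched for two vectors in $\mathbb{R}^3$; for a $2\times 2$ Hermitian matrix, positive definiteness is equivalent to a positive $(1,1)$ entry and a positive determinant (the same leading-minor test that appears, for the Hessian, in Problem 4.18 below):

```python
# Gram matrix of independent vectors is positive definite; repeating a
# vector (linear dependence) makes it singular.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

p1, p2 = [1, 0, 1], [1, 1, 0]
G = [[dot(p1, p1), dot(p1, p2)],
     [dot(p2, p1), dot(p2, p2)]]
assert G == [[2, 1], [1, 2]]
assert G[0][0] > 0 and G[0][0]*G[1][1] - G[0][1]*G[1][0] > 0  # positive definite

Gdep = [[dot(p1, p1), dot(p1, p1)],
        [dot(p1, p1), dot(p1, p1)]]     # Gram matrix of {p1, p1}
assert Gdep[0][0]*Gdep[1][1] - Gdep[0][1]*Gdep[1][0] == 0     # singular
```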
Since $\nabla f = 0$ and $R$ is negligible, we see that the critical point $(x_0, y_0)$ is a strict local minimum if (but not only if)
\[ \left\langle \nabla^2 f\begin{bmatrix} h \\ k \end{bmatrix}, \begin{bmatrix} h \\ k \end{bmatrix} \right\rangle > 0 \tag{4.13} \]
for all $(h, k)$ sufficiently close to the origin. Since for all real $\alpha$,
\[ \left\langle \nabla^2 f\begin{bmatrix} \alpha h \\ \alpha k \end{bmatrix}, \begin{bmatrix} \alpha h \\ \alpha k \end{bmatrix} \right\rangle = \alpha^2\left\langle \nabla^2 f\begin{bmatrix} h \\ k \end{bmatrix}, \begin{bmatrix} h \\ k \end{bmatrix} \right\rangle, \]
we see that $(x_0, y_0)$ is a strict local minimum if (4.13) holds for all $(h, k) \in \mathbb{R}^2$. Thus the critical point $(x_0, y_0)$ is a strict local minimum if the Hessian $H = \nabla^2 f$ is positive definite. Similarly, the critical point $(x_0, y_0)$ is a strict local maximum if $-H$ is positive definite. We call such a matrix $H$ negative definite.

Problem 4.18. Show that the Hessian of $f$ at $(x_0, y_0)$ is positive definite if and only if, at the point $(x_0, y_0)$,
\[ \frac{\partial^2 f}{\partial x^2} > 0 \quad\text{and}\quad \left(\frac{\partial^2 f}{\partial x^2}\right)\left(\frac{\partial^2 f}{\partial y^2}\right) - \left(\frac{\partial^2 f}{\partial x\partial y}\right)^2 > 0. \]

4.5.4 Revisiting the Singular Value Decomposition

We learned earlier in this section that a normal matrix is unitarily equivalent to a diagonal matrix -- or, to reinterpret in terms of matrix representations: "Any linear transformation representable by a normal matrix can be represented by a diagonal matrix after an appropriate choice of an orthonormal basis for $\mathbb{C}^n$." This is reminiscent of a more general matrix representation discussed at the end of Chapter 3 known as the singular value decomposition. We provide here a second derivation of the singular value decomposition using the Schur decomposition.

Theorem 4.31. Suppose $A \in \mathbb{C}^{m\times n}$ and $\operatorname{rank}(A) = r$. There exist unitary matrices $U \in \mathbb{C}^{m\times m}$ and $V \in \mathbb{C}^{n\times n}$ so that
\[ A = U\Sigma V^* \tag{4.14} \]
where $\Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \dots) \in \mathbb{C}^{m\times n}$ with $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_r > 0$ and $\sigma_{r+1} = \cdots = \sigma_p = 0$ for $p = \min\{m, n\}$.

Proof: Suppose first that $m \geq n$. The $n\times n$ matrix $A^*A$ is Hermitian and positive semidefinite (i.e.,
$\langle A^*Ax, x\rangle = \|Ax\|^2 \geq 0$ for all $x$) with $\operatorname{rank}(A^*A) = r$, and so $A^*A$ has exactly $r$ nonzero eigenvalues, all of which are positive and so may be written as squares of numbers $\sigma_i \geq 0$:
\[ \sigma_1^2 \geq \sigma_2^2 \geq \cdots \geq \sigma_r^2 > 0 = \sigma_{r+1}^2 = \cdots = \sigma_n^2. \]
Likewise there exists a unitary $n\times n$ matrix $V$ that diagonalizes $A^*A$,
\[ V^*(A^*A)V = \operatorname{diag}(\sigma_i^2) = \begin{bmatrix} \Sigma_r^2 & 0 \\ 0 & 0 \end{bmatrix}. \]
Since the off-diagonal terms of $V^*(A^*A)V = (AV)^*(AV)$ are zero, the columns of $AV$ are mutually orthogonal. Likewise, the norm of the $i$th column is $\sigma_i$. Thus there is an $m\times n$ matrix $U_n$ so that $U_n^*U_n = I$ and
\[ U_n\begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix} = AV. \]
Now, augment the columns of $U_n$ to fill out an orthonormal basis of $\mathbb{C}^m$: $U = \left[U_n\;\; \hat U_n\right]$. Then defining
\[ \Sigma = \begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix} \in \mathbb{C}^{m\times n}, \]
we find
\[ U\Sigma = U_n\begin{bmatrix} \Sigma_r & 0 \\ 0 & 0 \end{bmatrix} = AV \quad\text{and}\quad A = U\Sigma V^*. \]
We have decomposed $A$ into the product of an $m\times m$ unitary matrix, a nonnegative diagonal matrix, and an $n\times n$ unitary matrix. The case $m < n$ can be handled easily by considering $A^*$ instead in the derivation above and interchanging the roles of $U$ and $V$. $\Box$
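A $2\times 2$ instance of this construction can be worked by hand and checked numerically (a sketch; the matrix $A$ and the factors $U$, $\Sigma$, $V$ below are ours, not from the text). Here $A^TA = \begin{bmatrix} 25 & 20 \\ 20 & 25 \end{bmatrix}$ has eigenvalues 45 and 5, giving singular values $3\sqrt{5}$ and $\sqrt{5}$:

```python
# Build U, Sigma, V for A = [[3,0],[4,5]] as in the proof and verify
# A = U * Sigma * V^T to floating-point accuracy.
import math

def matmul(M, N):
    return [[sum(M[i][k] * N[k][j] for k in range(len(N)))
             for j in range(len(N[0]))] for i in range(len(M))]

A = [[3, 0], [4, 5]]
s1, s2 = 3 * math.sqrt(5), math.sqrt(5)
r2, r10 = math.sqrt(2), math.sqrt(10)
V = [[1/r2, 1/r2], [1/r2, -1/r2]]         # orthonormal eigenvectors of A^T A
U = [[1/r10, 3/r10], [3/r10, -1/r10]]     # columns u_i = A v_i / sigma_i
S = [[s1, 0], [0, s2]]
Vt = [list(row) for row in zip(*V)]

B = matmul(U, matmul(S, Vt))
assert all(abs(B[i][j] - A[i][j]) < 1e-12 for i in range(2) for j in range(2))
```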