Linear Algebra Notes

1 Linear Equations

1.1 Introduction to Linear Systems

Definition 1. A system of linear equations is consistent if it has at least one solution. If it has no solutions, it is inconsistent.

1.2 Matrices, Vectors, and Gauss-Jordan Elimination

Definition 2. Given a system of linear equations

    x − 3y = 8
    4x + 9y = −2,

the coefficient matrix of the system contains the coefficients of the system:

    [ 1  −3 ]
    [ 4   9 ],

while the augmented matrix of the system contains the numbers to the right of the equals signs as well:

    [ 1  −3 |  8 ]
    [ 4   9 | −2 ].

Definition 3. A matrix is in reduced row-echelon form (RREF) if it satisfies all of the following conditions:

a) If a row has nonzero entries, then the first nonzero entry is a 1, called a leading 1.
b) If a column contains a leading 1, then all the other entries in that column are 0.
c) If a row contains a leading 1, then each row above it contains a leading 1 further to the left.

Definition 4. An elementary row operation (ERO) is one of the following types of operations:

a) Row swap: swap two rows.
b) Row division: divide a row by a (nonzero) scalar.
c) Row addition: add a multiple of a row to another row.

Theorem 5. For any matrix A, there is a unique matrix rref(A) in RREF which can be obtained by applying a sequence of ERO's to A.

Procedure 6. Gauss-Jordan elimination (GJE) is a procedure for putting a matrix A in RREF by applying a sequence of ERO's. It can be applied to the augmented matrix of a system of linear equations to solve the system. Imagine a cursor moving through the entries of the augmented matrix, starting in the upper-left corner. The procedure ends when the cursor leaves the matrix. Note that steps 2, 3, and 4 consist of elementary row operations.

1. If the cursor column contains all 0's, move the cursor to the right and repeat step 1.
2. If the cursor entry is 0, swap the cursor row with a lower row so that the cursor entry becomes nonzero.
3. If the cursor entry is not 1, divide the cursor row by it to make it 1.
4. If the other entries in the cursor column are nonzero, make them 0 by adding the appropriate multiples of the cursor row to the other rows.
5. Move the cursor down and to the right and go to step 1.

1.3 On the Solutions of Linear Systems; Matrix Algebra

Definition 7. If a column of the RREF of a coefficient matrix contains a leading 1, the corresponding variable of the linear system is called a leading variable. If a column does not contain a leading 1, the corresponding variable is called a free variable.

Definition 8. A row of the form [ 0 0 · · · 0 | 1 ] in the RREF of an augmented matrix is called an inconsistent row, since it signifies the inconsistent equation 0 = 1.

Theorem 9. The number of solutions of a linear system with augmented matrix [ A | b ] can be read from rref[ A | b ]:

    rref[ A | b ]          no free variables    free variable
    no inconsistent rows          1                   ∞
    inconsistent row              0                   0

Definition 10 (1.3.2). The rank of a matrix A, written rank(A), is the number of leading 1's in rref(A).

Theorem 11. For an n × m matrix A, we have rank(A) ≤ n and rank(A) ≤ m.

Proof. Each of the n rows contains at most one leading 1, as does each of the m columns.

Theorem 12 (1.3.4, uniqueness of solution with n equations and n variables). A linear system of n equations in n variables has a unique solution if and only if the rank of its coefficient matrix A is n, in which case

    rref(A) = [ 1 0 0 · · · 0 ]
              [ 0 1 0 · · · 0 ]
              [ 0 0 1 · · · 0 ]
              [ ⋮ ⋮ ⋮  ⋱   ⋮ ]
              [ 0 0 0 · · · 1 ].

Proof. A system Ax = b has a unique solution if and only if it has no free variables and rref[ A | b ] has no inconsistent rows. This happens exactly when each row and each column of rref(A) contain leading 1's, which is equivalent to rank(A) = n. The matrix above is the only n × n matrix in RREF with rank n.

Notation 13. We denote the ijth entry of a matrix A by either aij or Aij or [A]ij .
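The cursor steps of Gauss-Jordan elimination (Procedure 6) can be sketched in code. A minimal sketch, using Python's fractions module for exact arithmetic so that the leading 1's are exact; the function name rref is ours:

```python
from fractions import Fraction

def rref(A):
    """Reduce A to reduced row-echelon form by Gauss-Jordan elimination
    (Procedure 6). A is a list of rows; entries are converted to exact
    Fractions so no round-off occurs."""
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    i = j = 0  # the cursor starts in the upper-left corner
    while i < rows and j < cols:
        # Step 1: if the cursor column is all 0's (from the cursor down),
        # move the cursor to the right.
        pivot = next((r for r in range(i, rows) if M[r][j] != 0), None)
        if pivot is None:
            j += 1
            continue
        # Step 2: swap rows so that the cursor entry becomes nonzero.
        M[i], M[pivot] = M[pivot], M[i]
        # Step 3: divide the cursor row to make the cursor entry a leading 1.
        M[i] = [x / M[i][j] for x in M[i]]
        # Step 4: clear the other entries in the cursor column.
        for r in range(rows):
            if r != i and M[r][j] != 0:
                M[r] = [a - M[r][j] * b for a, b in zip(M[r], M[i])]
        # Step 5: move the cursor down and to the right.
        i += 1
        j += 1
    return M

# Augmented matrix of the system x - 3y = 8, 4x + 9y = -2 (Definition 2);
# the result [[1, 0, 22/7], [0, 1, -34/21]] gives x = 22/7, y = -34/21.
solution = rref([[1, -3, 8], [4, 9, -2]])
```

Applied to an augmented matrix [ A | b ], the number of solutions can then be read off from the result as in Theorem 9.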
(The final notation is convenient when working with a compound matrix such as A + B.)

Definition 14 (1.3.5). If A and B are n × m matrices, then their sum A + B is defined entry-by-entry: [A + B]ij = Aij + Bij . If k is a scalar, then the scalar multiple kA of A is also defined entry-by-entry: [kA]ij = kAij .

Definition 15 (1.3.7, row definition of matrix-vector multiplication). The product Ax of an n × m matrix A with rows w1 , . . . , wn and a vector x ∈ R^m is given by

    Ax = [ − w1 − ]     [ w1 · x ]
         [    ⋮   ] x = [    ⋮   ]
         [ − wn − ]     [ wn · x ].

Notation 16. We denote the ith component of a vector x by either xi or [x]i . (The latter notation is convenient when working with a compound vector such as Ax.)

Theorem 17 (1.3.8, column definition of matrix-vector multiplication). The product Ax of an n × m matrix A with columns v1 , . . . , vm and a vector x ∈ R^m is given by

    Ax = [  |         |  ] [ x1 ]
         [ v1  · · ·  vm ] [  ⋮ ] = x1 v1 + · · · + xm vm .
         [  |         |  ] [ xm ]

Proof. According to the row definition of matrix-vector multiplication, the ith component of Ax is

    [Ax]i = wi · x = ai1 x1 + · · · + aim xm
          = x1 [v1]i + · · · + xm [vm]i
          = [x1 v1]i + · · · + [xm vm]i
          = [x1 v1 + · · · + xm vm]i .

Since Ax and x1 v1 + · · · + xm vm have equal ith components for all i, they are equal vectors.

Definition 18 (1.3.9). A vector b ∈ R^n is called a linear combination of the vectors v1 , . . . , vm in R^n if there exist scalars x1 , . . . , xm such that

    b = x1 v1 + · · · + xm vm .

Note that Ax is a linear combination of the columns of A. By convention, 0 is considered to be the unique linear combination of the empty set of vectors.

Theorem 19 (1.3.10, properties of matrix-vector multiplication). If A, B are n × m matrices, x, y ∈ R^m , and k is a scalar, then

a) A(x + y) = Ax + Ay,
b) (A + B)x = Ax + Bx,
c) A(kx) = k(Ax).

Proof. Let wi and ui be the ith rows of A and B, respectively. We show that the ith components of each side are equal.
    [A(x + y)]i = wi · (x + y) = wi · x + wi · y = [Ax]i + [Ay]i = [Ax + Ay]i ,
    [(A + B)x]i = (ith row of A + B) · x = (wi + ui) · x = wi · x + ui · x = [Ax]i + [Bx]i = [Ax + Bx]i ,
    [A(kx)]i = wi · (kx) = k(wi · x) = k[Ax]i = [k(Ax)]i .

Definition 20 (1.3.11). A linear system with augmented matrix [ A | b ] can be written in matrix form as Ax = b.

2 Linear Transformations

2.1 Introduction to Linear Transformations and Their Inverses

Definition 21 (2.1.1). A function T : R^m → R^n is called a linear transformation if there exists an n × m matrix A such that T(x) = Ax for all vectors x in R^m .

Note 22. If T : R^2 → R^2 , with

    T(x) = y = [ y1 ],    A = [ a11 a12 ],    x = [ x1 ],
               [ y2 ]         [ a21 a22 ]         [ x2 ]

then the linear transformation above can be written as

    [ y1 ] = [ a11 a12 ] [ x1 ]
    [ y2 ]   [ a21 a22 ] [ x2 ],

or

    y1 = a11 x1 + a12 x2
    y2 = a21 x1 + a22 x2 .

Definition 23. The identity matrix of size n is

    In = [ 1 0 0 · · · 0 ]
         [ 0 1 0 · · · 0 ]
         [ 0 0 1 · · · 0 ]
         [ ⋮ ⋮ ⋮  ⋱   ⋮ ]
         [ 0 0 0 · · · 1 ],

and T(x) = In x = x is the identity transformation from R^n to R^n . If the value of n is understood, then we often write just I for In .

Definition 24. The standard (basis) vectors e1 , e2 , . . . , em in R^m are the vectors ei = (0, . . . , 0, 1, 0, . . . , 0), with a 1 in the ith place and 0's elsewhere. Note that for a matrix A with m columns, Aei is the ith column of A.

Theorem 25 (2.1.2, matrix of a linear transformation). For a linear transformation T : R^m → R^n , there is a unique matrix A such that T(x) = Ax, obtained by applying T to the standard basis vectors:

    A = [   |       |             |   ]
        [ T(e1)   T(e2)  · · ·  T(em) ]
        [   |       |             |   ].

It follows that if two n × m matrices A and B satisfy Ax = Bx for all x ∈ R^m , then A = B.

Proof. By the definition of matrix-vector multiplication, the ith column of A is Aei = T(ei). For the second statement, note that if Ax = Bx for all x ∈ R^m , then A and B define the same linear transformation T , so they must be the same matrix by the first part of the theorem.

Theorem 26 (2.1.3, linearity criterion).
A function T : R^m → R^n is a linear transformation if and only if

a) T(v + w) = T(v) + T(w), for all vectors v and w in R^m , and
b) T(kv) = kT(v), for all vectors v in R^m and all scalars k.

Proof. Suppose T is a linear transformation, and let A be a matrix such that T(x) = Ax for all x ∈ R^m . Then

    T(v + w) = A(v + w) = Av + Aw = T(v) + T(w),
    T(kv) = A(kv) = k(Av) = kT(v).

To prove the converse, suppose that a function T : R^m → R^n satisfies (a) and (b). Then for all x ∈ R^m ,

    T(x) = T(x1 e1 + x2 e2 + · · · + xm em)
         = T(x1 e1) + T(x2 e2) + · · · + T(xm em)
         = x1 T(e1) + x2 T(e2) + · · · + xm T(em)
         = [   |       |             |   ] [ x1 ]
           [ T(e1)   T(e2)  · · ·  T(em) ] [  ⋮ ]
           [   |       |             |   ] [ xm ],

so T is a linear transformation.

2.2 Linear Transformations in Geometry

Definition 27. The linear transformation from R^2 to R^2 represented by a matrix of the form

    A = [ k 0 ]
        [ 0 k ]

is called scaling by (a factor of) k.

Definition 28 (2.2.1). Given a line L through the origin in R^2 parallel to the vector w, the orthogonal projection onto L is the linear transformation

    projL(x) = ((x · w)/(w · w)) w,

with matrix

    1/(w1^2 + w2^2) [ w1^2    w1 w2 ]
                    [ w1 w2   w2^2  ].

The projections onto the x- and y-axes are represented by the matrices

    [ 1 0 ]       [ 0 0 ]
    [ 0 0 ]  and  [ 0 1 ],

respectively.

Definition 29 (2.2.2). Given a line L through the origin in R^2 parallel to the vector w, the reflection about L is the linear transformation

    refL(x) = 2 projL(x) − x = 2 ((x · w)/(w · w)) w − x,

with matrix

    1/(w1^2 + w2^2) [ w1^2 − w2^2    2 w1 w2     ]
                    [ 2 w1 w2        w2^2 − w1^2 ].

(For a unit vector w, so that w1^2 + w2^2 = 1, the entries simplify to 2w1^2 − 1, 2w1w2 , and 2w2^2 − 1.) The reflections about the x- and y-axes are represented by the matrices

    [ 1  0 ]       [ −1 0 ]
    [ 0 −1 ]  and  [  0 1 ],

respectively.

Definition 30 (2.2.3). The linear transformation from R^2 to R^2 represented by a matrix of the form

    A = [ cos θ  −sin θ ]
        [ sin θ   cos θ ]

is called counterclockwise rotation through an angle θ (about the origin).

Definition 31 (2.2.5). The linear transformation from R^2 to R^2 represented by a matrix of the form

    A = [ 1 k ]       A = [ 1 0 ]
        [ 0 1 ]  or       [ k 1 ]

is called a horizontal shear or a vertical shear, respectively.
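The matrices of Section 2.2 can be generated and sanity-checked numerically. A minimal sketch with numpy; the helper names are ours:

```python
import numpy as np

def projection(w):
    """Matrix of the orthogonal projection onto the line spanned by
    w = (w1, w2), as in Definition 28."""
    w1, w2 = w
    return np.array([[w1 * w1, w1 * w2],
                     [w1 * w2, w2 * w2]]) / (w1**2 + w2**2)

def reflection(w):
    """Matrix of the reflection about the line spanned by w, built from
    the relation ref_L(x) = 2 proj_L(x) - x of Definition 29."""
    return 2 * projection(w) - np.eye(2)

def rotation(theta):
    """Matrix of counterclockwise rotation through theta (Definition 30)."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

# Sanity checks: projecting twice is the same as projecting once,
# and reflecting twice gives back the identity.
P = projection((1, 2))
R = reflection((1, 2))
assert np.allclose(P @ P, P)
assert np.allclose(R @ R, np.eye(2))
# A quarter turn sends e1 to e2.
assert np.allclose(rotation(np.pi / 2) @ np.array([1.0, 0.0]), [0.0, 1.0])
```

The identities P² = P and R² = I checked here follow directly from the geometric descriptions in Definitions 28 and 29.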
2.3 Matrix Products

Theorem 32. If T : R^m → R^p and S : R^p → R^n are linear transformations, then their composition S ◦ T : R^m → R^n given by (S ◦ T)(x) = S(T(x)) is also a linear transformation.

Proof. We show that if T and S satisfy the linearity criteria, then so does S ◦ T . Let v, w ∈ R^m and k ∈ R. Then

    (S ◦ T)(v + w) = S(T(v + w)) = S(T(v) + T(w)) = S(T(v)) + S(T(w)) = (S ◦ T)(v) + (S ◦ T)(w),
    (S ◦ T)(kv) = S(T(kv)) = S(kT(v)) = k(S(T(v))) = k(S ◦ T)(v).

Definition 33 (2.3.1, matrix multiplication from composition of linear transformations). If B is an n × p matrix and A is a q × m matrix, then the product matrix BA is defined if and only if p = q, in which case it is the matrix of the linear transformation T(x) = B(A(x)). As a result, (BA)x = B(A(x)).

Theorem 34 (2.3.2, matrix multiplication using columns of the matrix on the right). If B is an n × p matrix and A is a p × m matrix with columns v1 , . . . , vm , then

    BA = B [  |    |         |  ]   [  |     |           |  ]
           [ v1   v2  · · ·  vm ] = [ Bv1   Bv2  · · ·  Bvm ]
           [  |    |         |  ]   [  |     |           |  ].

Proof. The ith column of BA is (BA)ei = B(Aei) = Bvi .

Theorem 35 (2.3.4, matrix multiplication entry-by-entry). If B is an n × p matrix with rows w1 , . . . , wn and A is a p × m matrix with columns v1 , . . . , vm , then the ijth entry of

    BA = [ − w1 − ] [  |          |          |  ]
         [    ⋮   ] [ v1  · · ·  vj  · · ·  vm ]
         [ − wi − ] [  |          |          |  ]
         [    ⋮   ]
         [ − wn − ]

is the dot product of the ith row of B with the jth column of A:

    [BA]ij = wi · vj = bi1 a1j + bi2 a2j + · · · + bip apj = Σ_{k=1}^{p} bik akj .

Proof. The ijth entry of BA is the ith component of Bvj , by Theorem 34, which equals wi · vj , by Definition 15.

Theorem 36 (2.3.5, identity matrix). For any n × m matrix A, AIm = A and In A = A.

Proof. Since (AIm)x = A(Im x) = Ax for all x ∈ R^m , we have AIm = A. The proof for In A = A is analogous.

Theorem 37 (2.3.6, multiplication is associative). If AB and BC are defined, then (AB)C = A(BC). We can simply write ABC to indicate this single matrix.

Proof.
Using Definition 33 four times, we get

    ((AB)C)x = (AB)(Cx) = A(B(Cx)) = A((BC)x) = (A(BC))x

for any x of appropriate dimension, so (AB)C = A(BC).

Theorem 38 (2.3.7, multiplication distributes over addition). If A and B are n × p matrices and C and D are p × m matrices, then

    A(C + D) = AC + AD,
    (A + B)C = AC + BC.

Proof. We show that the two sides of the first equation give the same linear transformation, using parts a and b of Theorem 19. Because of Theorem 37, we can suppress some parentheses:

    A(C + D)x = A(Cx + Dx) = ACx + ADx = (AC + AD)x.

Similarly, (A + B)Cx = ACx + BCx = (AC + BC)x.

Theorem 39 (2.3.8, scalar multiplication). If A is an n × p matrix, B is a p × m matrix, and k is a scalar, then k(AB) = (kA)B and k(AB) = A(kB).

Note 40 (2.3.3, multiplication is not commutative). When A is an n × m matrix and B is an m × n matrix, AB and BA are both defined, but they are usually not equal. In fact, they do not even have the same dimensions unless n = m.

2.4 The Inverse of a Linear Transformation

Definition 41. For a function T : X → Y , X is called the domain and Y is called the target.

• A function T : X → Y is called one-to-one if for any y ∈ Y there is at most one input x ∈ X such that T(x) = y (different inputs give different outputs).
• A function T : X → Y is called onto if for any y ∈ Y there is at least one input x ∈ X such that T(x) = y (every target element is an output).
• A function T : X → Y is called invertible if for any y ∈ Y there is exactly one x ∈ X such that T(x) = y.

Note that a function is invertible if and only if it is both one-to-one and onto.

Definition 42 (2.4.1). If T : X → Y is invertible, we can define a unique inverse function T⁻¹ : Y → X by setting T⁻¹(y) to be the unique x ∈ X such that T(x) = y. It follows that T⁻¹(T(x)) = x and T(T⁻¹(y)) = y, so T⁻¹ ◦ T and T ◦ T⁻¹ are identity functions. For any invertible function T , (T⁻¹)⁻¹ = T , so T⁻¹ is also invertible, with inverse function T .

Note 43.
A linear transformation T : R^m → R^n given by T(x) = Ax is invertible if for any y ∈ R^n there is a unique x ∈ R^m such that Ax = y.

Theorem 44 (2.4.2, linearity of the inverse). If a linear transformation T : R^m → R^n is invertible, then its inverse T⁻¹ is also a linear transformation.

Proof. We show that if T satisfies the linearity criteria (Theorem 26), then T⁻¹ : R^n → R^m does also. Let v, w ∈ R^n and k ∈ R. Then

    v + w = T(T⁻¹(v)) + T(T⁻¹(w)) = T(T⁻¹(v) + T⁻¹(w)),

and applying T⁻¹ to each side gives T⁻¹(v + w) = T⁻¹(v) + T⁻¹(w). Similarly,

    kv = kT(T⁻¹(v)) = T(kT⁻¹(v)),

and so T⁻¹(kv) = kT⁻¹(v).

Definition 45 (2.4.2). If T(x) = Ax is invertible, then A is said to be an invertible matrix, and the matrix of T⁻¹ is called the inverse matrix of A, written A⁻¹.

Theorem 46 (2.4.8, the inverse matrix as multiplicative inverse). A is invertible if and only if there exists a matrix B such that BA = I and AB = I. In this case, B = A⁻¹.

Proof. If A is invertible, then, taking B to be A⁻¹,

    (BA)x = (A⁻¹A)x = A⁻¹(Ax) = T⁻¹(T(x)) = x = Ix,
    (AB)y = (AA⁻¹)y = A(A⁻¹y) = T(T⁻¹(y)) = y = Iy

for all x, y of correct dimension, so BA = I and AB = I. Conversely, if we have a matrix B such that BA = I and AB = I, then T(x) = Ax and S(y) = By satisfy

    S(T(x)) = B(Ax) = (BA)x = Ix = x,
    T(S(y)) = A(By) = (AB)y = Iy = y,

for any x, y of correct dimension, so S ◦ T and T ◦ S are identity transformations. Thus S is the inverse transformation of T , A is invertible, and B = A⁻¹.

Theorem 47 (2.4.3, invertibility criteria). If a matrix is not square, then it is not invertible. For an n × n matrix A, the following are equivalent:

1. A is invertible,
2. Ax = b has a unique solution x for any b,
3. rref(A) = In ,
4. rank(A) = n.

Proof. Let A be an n × m matrix with inverse A⁻¹. Since T(x) = Ax maps R^m → R^n , the inverse transformation T⁻¹ maps R^n → R^m , so A⁻¹ is an m × n matrix.
If m > n, then the linear system Ax = 0 has at least one free variable, so it cannot have a unique solution, contradicting the invertibility of A. If n > m, then A⁻¹y = 0 has at least one free variable, so it cannot have a unique solution, contradicting the invertibility of A⁻¹. It follows that n = m.

The equivalence of the first two statements is a restatement of Note 43. Statements 3 and 4 are equivalent to the second one by Theorem 12.

Procedure 48 (2.4.5, computing the inverse of a matrix). To find the inverse of a matrix A (if it exists), compute rref[ A | In ], which is equal to [ rref(A) | B ] for some B.

• If rref(A) ≠ In , then A is not invertible.
• If rref(A) = In , then A is invertible and A⁻¹ = B.

Theorem 49 (2.4.9, inverse of a 2 × 2 matrix). A 2 × 2 matrix

    A = [ a b ]
        [ c d ]

is invertible if and only if ad − bc ≠ 0. The scalar ad − bc is called the determinant of A, written det(A). If A is invertible, then

    A⁻¹ = 1/det(A) [  d  −b ]
                   [ −c   a ].

Theorem 50 (2.4.7, inverse of a product of matrices). If A and B are invertible n × n matrices, then AB is invertible as well, and (AB)⁻¹ = B⁻¹A⁻¹.

Proof. To show that B⁻¹A⁻¹ is the inverse of AB, we check that their product in either order is the identity matrix:

    (AB)(B⁻¹A⁻¹) = A(BB⁻¹)A⁻¹ = A In A⁻¹ = AA⁻¹ = In ,
    (B⁻¹A⁻¹)(AB) = B⁻¹(A⁻¹A)B = B⁻¹ In B = B⁻¹B = In .

3 Subspaces of R^n and Their Dimensions

3.1 Image and Kernel of a Linear Transformation

Definition 51 (3.1.1). The image of a function T : X → Y is its set of outputs: im(T) = {T(x) : x ∈ X}, a subset of the target Y . Note that T is onto if and only if im(T) = Y . For a linear transformation T : R^m → R^n , the image is im(T) = {T(x) : x ∈ R^m}, a subset of the target R^n .

Definition 52 (3.1.2). The set of all linear combinations of the vectors v1 , . . . , vm in R^n is called their span:

    span(v1 , v2 , . . . , vm) = {c1 v1 + · · · + cm vm : c1 , . . . , cm ∈ R}.

If span(v1 , v2 , . . .
, vm) = W for some subset W of R^n , we say that the vectors v1 , . . . , vm span W . Thus span can be used as a noun or as a verb.

Theorem 53 (3.1.3). The image of a linear transformation T(x) = Ax is the span of the column vectors of A. We denote the image of T by im(T) or im(A).

Proof. By the column definition of matrix-vector multiplication,

    T(x) = Ax = [  |         |  ] [ x1 ]
                [ v1  · · ·  vm ] [  ⋮ ] = x1 v1 + · · · + xm vm .
                [  |         |  ] [ xm ]

Thus the image of T consists of all linear combinations of the column vectors v1 , . . . , vm of A.

Note 54. By the preceding theorem, the column vectors v1 , . . . , vm ∈ R^n of an n × m matrix A span R^n if and only if im(A) = R^n , which is equivalent to T(x) = Ax being onto.

Theorem 55 (3.1.4, the image is a subspace). The image of a linear transformation T : R^m → R^n has the following properties:

a) contains zero vector: 0 ∈ im(T).
b) closed under addition: If y1 , y2 ∈ im(T), then y1 + y2 ∈ im(T).
c) closed under scalar multiplication: If y ∈ im(T) and k ∈ R, then ky ∈ im(T).

As we will see in the next section, these three properties mean that im(T) is a subspace.

Proof.
a) 0 = A0 = T(0) ∈ im(T).
b) There exist vectors x1 , x2 ∈ R^m such that T(x1) = y1 , T(x2) = y2 . Since T is linear, y1 + y2 = T(x1) + T(x2) = T(x1 + x2) ∈ im(T).
c) There exists a vector x ∈ R^m such that T(x) = y. Since T is linear, ky = kT(x) = T(kx) ∈ im(T).

Definition 56 (3.1.1). The kernel of a linear transformation T : R^m → R^n is its set of zeros: ker(T) = {x ∈ R^m : T(x) = 0}, a subset of the domain R^m .

Theorem 57 (kernel criterion for one-to-one). A linear transformation T : R^m → R^n is one-to-one if and only if ker(T) = {0}.

Proof. Since T(x) = Ax for some A, we have T(0) = A0 = 0. If T is one-to-one, it follows immediately that 0 is the only solution of T(x) = 0, so ker(T) = {0}. Conversely, suppose ker(T) = {0}, and let x1 , x2 ∈ R^m satisfy T(x1) = y and T(x2) = y.
By the linearity of T ,

    T(x1 − x2) = T(x1) − T(x2) = y − y = 0,

so x1 − x2 = 0 and x1 = x2 , proving that T is one-to-one.

Definition 58 (3.2.6). A linear relation among the vectors v1 , . . . , vm ∈ R^n is an equation of the form

    c1 v1 + · · · + cm vm = 0

for scalars c1 , . . . , cm ∈ R. If c1 = · · · = cm = 0, the relation is called trivial, while if at least one of the ci is nonzero, the relation is nontrivial.

Theorem 59. The kernel of a linear transformation T(x) = Ax is the set of solutions x of the equation Ax = 0, i.e.

    [  |         |  ] [ x1 ]
    [ v1  · · ·  vm ] [  ⋮ ] = 0,
    [  |         |  ] [ xm ]

which corresponds to the set of linear relations x1 v1 + · · · + xm vm = 0 among the column vectors v1 , . . . , vm of A. We denote the kernel of T by ker(T) or ker(A).

Proof. The first statement is immediate from the definition of kernel, while the correspondence with linear relations follows from the column definition of the product Ax.

Theorem 60 (3.1.4, the kernel is a subspace). The kernel of a linear transformation T : R^m → R^n has the following properties:

a) contains zero vector: 0 ∈ ker(T).
b) closed under addition: If x1 , x2 ∈ ker(T), then x1 + x2 ∈ ker(T).
c) closed under scalar multiplication: If x ∈ ker(T) and k ∈ R, then kx ∈ ker(T).

As we will see in the next section, these three properties mean that ker(T) is a subspace.

Proof.
a) T(0) = A0 = 0.
b) Since T is linear, T(x1 + x2) = T(x1) + T(x2) = 0 + 0 = 0.
c) Since T is linear, T(kx) = kT(x) = k0 = 0.

Theorem 61 (3.1.7). For an n × m matrix A, ker(A) = {0} if and only if rank(A) = m.

Proof. The equation Ax = 0 always has the solution x = 0. This is the only solution if and only if there are no free variables (by Theorem 9), meaning that all m variables are leading variables, i.e. rank(A) = m.

Theorem 62 (2.4.3, invertibility criteria). For an n × n matrix A, the following are equivalent:

1. A is invertible,
2. Ax = b has a unique solution x for any b,
3. rref(A) = In ,
4. rank(A) = n,
5. im(A) = R^n ,
6. ker(A) = {0}.
Proof. The equivalence of 1-4 was established by Theorem 47. Statement 5 means that the linear system Ax = b is consistent for any b, which follows immediately from 2.

We show that 5 implies 4 by proving the contrapositive. Suppose 4 is false, so that rank(A) < n. Then [ rref(A) | en ] has an inconsistent row, so rref(A)x = en has no solutions. Applying the steps of Gauss-Jordan elimination on A to the augmented matrix [ rref(A) | en ], but in reverse order, yields an augmented matrix of the form [ A | b ] for some vector b, so Ax = b must also have no solutions. Thus 5 is false, and the contrapositive is proven.

The equivalence of 4 and 6 follows from the case n = m of the preceding theorem.

3.2 Subspaces of R^n ; Bases and Linear Independence

Definition 63 (3.2.1). A subset W of a vector space R^n is called a subspace of R^n if it has the following three properties:

a) contains zero vector: 0 ∈ W .
b) closed under addition: If w1 , w2 ∈ W , then w1 + w2 ∈ W .
c) closed under scalar multiplication: If w ∈ W and k ∈ R, then kw ∈ W .

Property a is needed only to assure that W is nonempty. If W contains any vector w, then it also contains 0w = 0, by property c. Properties b and c are together equivalent to W being closed under linear combinations.

Note 64 (3.2.2). We proved in the preceding section that, for a linear transformation T : R^m → R^n , ker(T) is a subspace of R^m , while im(T) is a subspace of R^n .

Definition 65 (3.2.3). Let v1 , . . . , vm ∈ R^n .

a) A vector vi in the list v1 , . . . , vm is redundant if it is a linear combination of the preceding vectors v1 , . . . , vi−1 . Note that v1 is redundant if and only if it equals 0, the unique linear combination of the empty set of vectors.
b) The vectors v1 , . . . , vm are called linearly independent (LI) if none of them are redundant. Otherwise, they are linearly dependent (LD).
c) The vectors v1 , . . . , vm form a basis of a subspace V of R^n if they span V and are linearly independent.
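Linear independence can be tested concretely: the vectors v1 , . . . , vm are independent exactly when the matrix with those columns has kernel {0}, i.e. rank m (Theorem 61). A minimal sketch with numpy; the function name is ours:

```python
import numpy as np

def linearly_independent(vectors):
    """Return True when the given vectors in R^n are linearly independent,
    by checking that the matrix [v1 ... vm] has rank m (Theorem 61)."""
    A = np.column_stack(vectors)
    return np.linalg.matrix_rank(A) == A.shape[1]

assert linearly_independent([(1, 0, 1), (0, 1, 1)])
# (1, 1, 2) = (1, 0, 1) + (0, 1, 1) is redundant, so this list is dependent.
assert not linearly_independent([(1, 0, 1), (0, 1, 1), (1, 1, 2)])
```

Note that numpy's rank computation is numerical; for matrices with very small entries an exact method (such as row reduction over fractions) is more reliable.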
Theorem 66 (3.2.7, linear dependence criterion). The vectors v1 , . . . , vm ∈ R^n are linearly dependent if and only if there exists a nontrivial (linear) relation among them.

Proof. Suppose v1 , . . . , vm are linearly dependent and let vi = c1 v1 + · · · + ci−1 vi−1 be a redundant vector in this list. Then we obtain a nontrivial relation by subtracting vi from both sides:

    c1 v1 + · · · + ci−1 vi−1 + (−1)vi = 0.

Conversely, if there is a nontrivial relation c1 v1 + · · · + ci vi + · · · + cm vm = 0, where i is the highest index such that ci ≠ 0, then we can solve for vi to show that vi is redundant:

    vi = −(c1/ci) v1 − · · · − (ci−1/ci) vi−1 .

Thus the vectors v1 , . . . , vm are linearly dependent.

Theorem 67 (3.2.8-9, linear independence criteria). For a list v1 , . . . , vm of vectors in R^n , the following statements are equivalent:

1. v1 , . . . , vm are linearly independent.
2. None of v1 , . . . , vm are redundant.
3. There are no nontrivial relations among v1 , . . . , vm , i.e. c1 v1 + · · · + cm vm = 0 implies c1 = · · · = cm = 0.
4. ker[ v1 · · · vm ] = {0}.
5. rank[ v1 · · · vm ] = m.

To prove that vectors are linearly independent, statement 3 is useful in an abstract setting, whereas 5 is convenient when the vectors are given concretely.

Proof. Statement 2 is the definition of 1. The equivalence of 2 and 3 follows immediately from the preceding theorem. There exists a nonzero vector x = (x1 , . . . , xm) in the kernel of [ v1 · · · vm ] if and only if there is a corresponding nontrivial relation x1 v1 + · · · + xm vm = 0, so 3 is equivalent to 4. Theorem 61 implies that 4 and 5 are equivalent.

Note 68. By the equivalence of 1 and 4 in the preceding theorem, and by Theorem 57, the column vectors v1 , . . . , vm ∈ R^n of a matrix A are linearly independent if and only if ker(A) = {0}, which is equivalent to T(x) = Ax being one-to-one.

Theorem 69 (3.2.10, bases and unique representation). The vectors v1 , . . .
, vm form a basis of a subspace V of R^n if and only if every vector v ∈ V can be expressed uniquely as a linear combination

    v = c1 v1 + · · · + cm vm .

Proof. Suppose v1 , . . . , vm is a basis of V ⊂ R^n and let v be any vector in V . Since v1 , . . . , vm span V , v can be expressed as a linear combination of v1 , . . . , vm . Suppose there are two such representations

    v = c1 v1 + · · · + cm vm ,
    v = d1 v1 + · · · + dm vm .

Subtracting the equations yields the linear relation

    0 = (c1 − d1)v1 + · · · + (cm − dm)vm .

Since v1 , . . . , vm are linearly independent, this relation is trivial, meaning that c1 − d1 = · · · = cm − dm = 0, so ci = di for all i. Thus any two representations of v as a linear combination of the basis vectors are in fact identical, so the representation is unique.

Conversely, suppose every vector v ∈ V can be expressed uniquely as a linear combination of v1 , . . . , vm . Applying this statement with v = 0, we see that 0v1 + · · · + 0vm = 0 is the only linear relation among v1 , . . . , vm , so these vectors are linearly independent. Since each v ∈ V is a linear combination of v1 , . . . , vm , these vectors span V . We conclude that v1 , . . . , vm form a basis for V .

3.3 The Dimension of a Subspace of R^n

Theorem 70 (3.3.1). Let V be a subspace of R^n . If the vectors v1 , . . . , vp ∈ V are linearly independent, and the vectors w1 , . . . , wq ∈ V span V , then p ≤ q.

Proof. Define the matrices

    A = [ w1 · · · wq ]  and  B = [ v1 · · · vp ]

with the given vectors as columns. The vectors v1 , . . . , vp are in V = span(w1 , . . . , wq) = im(A), so there exist u1 , . . . , up ∈ R^q such that

    v1 = Au1 , . . . , vp = Aup .

Combining these equations, we get

    B = [ v1 · · · vp ] = A [ u1 · · · up ] = AC,

where C = [ u1 · · · up ]. The kernel of C is a subset of the kernel of B, which equals {0} since v1 , . . . , vp are linearly independent, so ker(C) = {0} as well. By Theorem 61, rank(C) = p, and the rank must be less than or equal to the number of rows, so p ≤ q as claimed.
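The counting results above can be illustrated numerically. A small sketch with numpy; the specific vectors are our own examples:

```python
import numpy as np

# Any three vectors in R^2 are linearly dependent: taking e1, e2 as a
# spanning set, Theorem 70 forces p <= 2 for an independent list.
A = np.column_stack([(1, 2), (3, 4), (5, 6)])
assert np.linalg.matrix_rank(A) < 3  # rank below the number of vectors

# Two different bases of the plane x + y + z = 0 in R^3.
B1 = np.column_stack([(1, -1, 0), (0, 1, -1)])
B2 = np.column_stack([(1, 0, -1), (1, -2, 1)])
# Each list is independent, and together they still span only a
# 2-dimensional subspace, so both bases have the same size (Theorem 71).
assert np.linalg.matrix_rank(B1) == np.linalg.matrix_rank(B2) == 2
assert np.linalg.matrix_rank(np.column_stack([B1, B2])) == 2
```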
Theorem 71 (3.3.2, number of vectors in a basis). All bases of a subspace V of R^n contain the same number of vectors.

Proof. Let v1 , . . . , vp and w1 , . . . , wq be two bases of V . Since v1 , . . . , vp are linearly independent and w1 , . . . , wq span V , we have p ≤ q, by the preceding theorem. On the other hand, since v1 , . . . , vp span V and w1 , . . . , wq are linearly independent, we have p ≥ q, so in fact p = q.

Definition 72 (3.3.3). The number of vectors in a basis of a subspace V of R^n is called the dimension of V , denoted dim(V).

Note 73. It can easily be shown that the standard basis vectors e1 , . . . , en of R^n do in fact form a basis of R^n , so that, as would be expected, dim(R^n) = n.

Theorem 74 (3.3.4, size of linearly independent and spanning sets). Let V be a subspace of R^n with dim(V) = m.

a) Any linearly independent set of vectors in V contains at most m vectors. If it contains exactly m vectors, then it forms a basis of V .
b) Any spanning set of vectors in V contains at least m vectors. If it contains exactly m vectors, then it forms a basis of V .

Proof.
a) Suppose v1 , . . . , vp are linearly independent vectors in V and w1 , . . . , wm form a basis of V . Then w1 , . . . , wm span V , so p ≤ m by Theorem 70. Now let v1 , . . . , vm be linearly independent vectors in V . To prove that v1 , . . . , vm form a basis of V , we must show that any vector v ∈ V is contained in the span of v1 , . . . , vm . By what we have already shown, the m + 1 vectors v1 , . . . , vm , v must be linearly dependent. Since no vi is redundant in this list, v must be redundant, meaning that v is a linear combination of v1 , . . . , vm , as needed.
b) Suppose v1 , . . . , vq span V and w1 , . . . , wm form a basis of V . Then w1 , . . . , wm are linearly independent, so q ≥ m by Theorem 70. Now let v1 , . . . , vm be vectors which span V . To prove that v1 , . . . , vm form a basis of V , we must show that they are linearly independent.
We use proof by contradiction. Suppose that v1 , . . . , vm are linearly dependent, with some redundant vi , so that vi = c1 v1 + · · · + ci−1 vi−1 for some scalars c1 , . . . , ci−1 . In any linear combination v = d1 v1 + · · · + dm vm , we can substitute for vi to rewrite v as a linear combination of the other vectors:

    v = (d1 + di c1)v1 + · · · + (di−1 + di ci−1)vi−1 + di+1 vi+1 + · · · + dm vm .

We conclude that the subspace V = span(v1 , . . . , vm) of dimension m is in fact spanned by just m − 1 vectors, a contradiction.

Procedure 75 (finding a basis of the kernel). The kernel of a matrix A consists of all solutions x to the equation Ax = 0. To find a basis of the kernel of A, solve Ax = 0, using Gauss-Jordan elimination to compute rref[ A | 0 ] = [ rref(A) | 0 ], solving the resulting system of linear equations for the leading variables, and substituting parameters r, s, t, etc. for the free variables. Then write the general solution as a linear combination of constant vectors with the parameters as coefficients. These constant vectors form a basis for ker(A).

Procedure 76 (3.3.5, finding a basis of the image). To obtain a basis of the image of A, take the columns of A corresponding to the columns of rref(A) containing leading 1's.

Definition 77. The nullity of a matrix A, written nullity(A), is the dimension of the kernel of A.

Theorem 78. For any matrix A, rank(A) = dim(im A).

Proof. By Procedure 76, a basis of im(A) contains as many vectors as the number of leading 1's in rref(A), which is the definition of rank(A).

Theorem 79 (3.3.7, Rank-Nullity Theorem). For any n × m matrix A,

    dim(ker A) + dim(im A) = m.

In terms of the linear transformation T : R^m → R^n given by T(x) = Ax, this can be written as

    dim(ker T) + dim(im T) = dim(R^m).

In terms of nullity and rank, we have

    nullity(A) + rank(A) = m.

Proof. From Procedure 75, we know that a basis of ker(A) contains a vector for each free variable of A.
From Procedure 76, we know that a basis of im(A) contains a vector for each leading variable of A. Since the number of free variables plus the number of leading variables equals the total number of variables m, the first equation holds. The final two equations then follow from the definitions and the preceding theorem.

Theorem 80 (3.3.10, invertibility criteria). For an n × n matrix A = [ v1 · · · vn ], the following are equivalent:
1. A is invertible,
2. Ax = b has a unique solution x for any b,
3. rref(A) = In,
4. rank(A) = n,
5. im(A) = Rn,
6. ker(A) = {0},
7. v1, . . . , vn span Rn,
8. v1, . . . , vn are linearly independent,
9. v1, . . . , vn form a basis of Rn.

Proof. Statements 1-6 are equivalent by Theorem 62. Statements 5 and 7 are equivalent by Note 54. Statements 6 and 8 are equivalent by Note 68. Statements 7, 8, and 9 are equivalent by Theorem 74.

3.4 Coordinates

Definition 81 (3.4.1). Consider a basis B = (v1, v2, . . . , vm) of a subspace V of Rn. By Theorem 69, any vector x ∈ V can be written uniquely as
x = c1 v1 + c2 v2 + · · · + cm vm.
The scalars c1, c2, . . . , cm are called the B-coordinates of x, and the column vector [x]B with entries c1, c2, . . . , cm is the B-coordinate vector of x.

Note 82. If we let S = SB = [ v1 · · · vm ], then the relationship between x and [x]B is given by
x = c1 v1 + · · · + cm vm = [ v1 · · · vm ] [x]B = S[x]B.

Note 83. For the standard basis E = (e1, . . . , en) of Rn, the E-coordinate vector of a vector x ∈ Rn is just x itself, since
x = x1 e1 + · · · + xn en.
In terms of the preceding note, SE = [ e1 · · · en ] = I, so that x = I[x]E = [x]E.

Theorem 84 (3.4.2, linearity of coordinates). If B is a basis of a subspace V of Rn, then for all x, y ∈ V and k ∈ R:
a) [x + y]B = [x]B + [y]B,
b) [kx]B = k[x]B.

Proof. Let B = (v1, . . . , vm).
a) If x = c1 v1 + · · · + cm vm and y = d1 v1 + · · · + dm vm, then x + y = (c1 + d1)v1 + · · · + (cm + dm)vm, so that
[x + y]B has entries ci + di, and [x + y]B = [x]B + [y]B.
b) If x = c1 v1 + · · · + cm vm, then kx = kc1 v1 + · · · + kcm vm, so that [kx]B has entries kci, and [kx]B = k[x]B.

Theorem 85 (3.4.3, B-matrix of a linear transformation). Consider a linear transformation T : Rn → Rn and a basis B = (v1, . . . , vn) of Rn. Then for any x ∈ Rn, the B-coordinate vectors of x and of T(x) are related by the equation
[T(x)]B = B[x]B,
where
B = [ [T(v1)]B · · · [T(vn)]B ],
the B-matrix of T. In other words, taking either path in the following diagram yields the same result (we say that the diagram commutes): the top row sends x = c1 v1 + · · · + cn vn to T(x) via T, the bottom row sends [x]B to [T(x)]B via B, and the vertical maps send each vector to its B-coordinate vector.

Proof. Write x as a linear combination x = c1 v1 + · · · + cn vn of the vectors in the basis B. We use the linearity of T to compute
T(x) = T(c1 v1 + · · · + cn vn) = c1 T(v1) + · · · + cn T(vn).
Taking the B-coordinate vector of each side and using the linearity of coordinates, we get
[T(x)]B = [c1 T(v1) + · · · + cn T(vn)]B = c1 [T(v1)]B + · · · + cn [T(vn)]B = [ [T(v1)]B · · · [T(vn)]B ] [x]B = B[x]B.

Note 86. For the standard basis E = (e1, . . . , en) of Rn, the E-matrix of a linear transformation T : Rn → Rn given by T(x) = Ax is just A, the standard matrix of T. In terms of the preceding theorem,
[ [T(e1)]E · · · [T(en)]E ] = [ T(e1) · · · T(en) ] = A.
In this case, the diagram above becomes the corresponding diagram for the standard basis: the top row sends x = x1 e1 + · · · + xn en to T(x) via T, and the bottom row sends [x]E to [T(x)]E via A.

Theorem 87 (3.4.4, standard matrix and B-matrix). Consider a linear transformation T : Rn → Rn and a basis B = (v1, . . . , vn) of Rn. The standard matrix A of T and the B-matrix B of T are related by the equation
AS = SB, where S = [ v1 · · · vn ].
The equation AS = SB can be solved for A or for B to obtain the equivalent equations
A = SBS⁻¹ and B = S⁻¹AS.
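The relation B = S⁻¹AS can be checked numerically. Below is a small sketch with a made-up example (not from the text): T reflects R² across the line y = x, and the basis v1 = (1, 1), v2 = (1, −1) is adapted to that line, so in B-coordinates the reflection becomes diagonal.

```python
# Hypothetical example of Theorem 87: A is the standard matrix of the
# reflection of R^2 across y = x, S has the basis vectors as columns.

def matmul(X, Y):
    """Multiply two 2x2 matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(X):
    """Invert a 2x2 matrix via the adjugate formula."""
    a, b = X[0]
    c, d = X[1]
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[0, 1], [1, 0]]     # standard matrix of the reflection
S = [[1, 1], [1, -1]]    # S = [v1 v2], columns are the basis vectors

B = matmul(inv2(S), matmul(A, S))   # B = S^{-1} A S
print(B)  # diagonal matrix: v1 (on the line) is fixed, v2 is flipped
```

As the theorem predicts, B is the B-matrix of the reflection, and one can verify AS = SB directly with the same helpers.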
The relationship between A and B is illustrated by the following diagram: x x S A −−−−−−−−−→ T (x) x S B [x]B −−−−−−−−−→ [T (x)]B Proof. Applying Note 82 and Theorem 85, we compute T (x) = Ax = A(S[x]B ) = (AS)[x]B , and T (x) = S[T (x)]B = S(B[x]B ) = (SB)[x]B . Thus (AS)[x]B = (SB)[x]B for all [x]B ∈ Rn , which implies that AS = SB. Multiplying on the left of each side by S −1 , we get S −1 AS = B. Multiplying instead on the right of each side by S −1 , we get A = SBS −1 . (We know that S is invertible because its columns form a basis of Rn .) Definition 88 (3.4.5). Given two n × n matrices A and B, we say that A is similar to B, abbreviated A ∼ B, if there exists an invertible matrix S such that AS = SB or, equivalently, B = S −1 AS. Note 89. By Theorem 87, the standard matrix A of a linear transformation T : Rn → Rn is similar to the B-matrix B of T for any basis B of Rn . Theorem 90 (3.4.6). Similarity is an equivalence relation, which means that it satisfies the following three properties for any n × n matrices A, B, and C: a) reflexivity: A ∼ A. b) symmetry: If A ∼ B, then B ∼ A. c) transitivity: If A ∼ B and B ∼ C, then A ∼ C. Proof. a) A = IAI = I −1 AI. b) If A ∼ B, then there exists S such that B = S −1 AS. Multiplying on the left of each side by S and on the right of each side by S −1 , we get SBS −1 = A, or A = SBS −1 = (S −1 )−1 B(S −1 ), which shows that B ∼ A. c) If A ∼ B and B ∼ C, then there exists S such that B = S −1 AS and T such that C = T −1 BT . Substituting for B in the second equation yields C = T −1 (S −1 AS)T = (ST )−1 A(ST ), which shows that A ∼ C. 21 4 Linear Spaces 4.1 Introduction to Linear Spaces Definition 91 (4.1.1). A linear space V (more commonly known as a vector space) is a set V together with an addition rule and a scalar multiplication rule: • For f, g ∈ V , there is an element f + g ∈ V . • For f ∈ V and k ∈ R, there is an element kf ∈ V . which satisfy the following eight properties (for all f, g, h ∈ V and c, k ∈ R): 1. 
addition is associative: (f + g) + h = f + (g + h). 2. addition is commutative: f + g = g + f . 3. an additive identity exists (a neutral element): There is an element n ∈ V such that f + n = f for all f ∈ V . This n is unique and is denoted by 0. 4. additive inverses exist: For each f ∈ V , there exists a g ∈ V such that f + g = 0. This g is unique and is denoted by (−f ). 5. s.m. distributes over addition in V : k(f + g) = kf + kg. 6. s.m. distributes over addition in R: (c + k)f = cf + kf . 7. s.m. is “associative”: c(kf ) = (ck)f . 8. an “identity” exists for s.m.: 1f = f . Note 92. Vector spaces Rn and their subspaces W ⊂ Rn are examples of linear spaces. Linear spaces are generalizations of Rn . Using the addition and scalar multiplication operations, we can construct linear combinations, which then enable us to define the basic notions of linear algebra (which we have already defined for Rn ): subspace, span, linear independence, basis, dimension, coordinates, linear transformation, image, kernel, matrix of a transformation, etc. Note 93. Typically, both Rn and the linear spaces defined above are referred to as vector spaces. If it is necessary to draw a distinction, then the latter are called abstract vector spaces. When one is first learning linear algebra, this terminology is potentially confusing because the elements of most “abstract vector spaces” are not vectors in the traditional sense, but functions, or polynomials, or sequences, etc. Thus we will follow the text in speaking of vector spaces Rn and linear spaces V . Definition 94. • The set F (R, R) of all functions f : R → R (real-valued functions of the real numbers) is a linear space. • The set C ∞ of all functions f : R → R which can be differentiated any number of times (smooth functions) is a linear space. It includes all polynomials, exponential functions, sin(x), cos(x), etc. • The set P of all polynomials (with real coefficients) is a linear space. 
• The set Pn of all polynomials of degree ≤ n is a linear space. • The set Rn×m of n × m matrices with real coefficients is a linear space. Definition 95 (4.1.2). A subset W of a linear space V is called a subspace of V if it satisfies the following three properties: 22 a) contains neutral element: 0 ∈ W . b) closed under addition: If f, g ∈ W , then f + g ∈ W . c) closed under scalar multiplication: If f ∈ W and k ∈ R, then kf ∈ W . Theorem 96. A subspace W of a linear space V is itself a linear space. Proof. Property a guarantees that W contains the neutral element from V , which is property 3 of a linear space. For property 4 of a linear space, first note that for any f ∈ V , 0f = (0 + 0)f = 0f + 0f , and so 0 = 0f . It follows that we can write the additive inverse as −f = (−1)f , since f + (−1)f = 1f + (−1)f = (1 + (−1))f = 0f = 0. Thus if f ∈ W , we have that the element −f = (−1)f ∈ V is also in W , by property c above. Properties b and c imply that addition and scalar multiplication are well defined as operations within W. For properties 1-2 and 5-8 of a linear space, we simply note that all elements of W are also elements of V , so the properties hold automatically. Definition 97 (4.1.3). The terms span, redundant, linearly independent, basis, coordinates, and dimension are defined for linear spaces V just as for vector spaces Rn . In particular, for a basis B = (f1 , . . . , fn ) of a linear space V , any element f ∈ V can be written uniquely as a linear combination f = c1 f1 + · · · + cn fn of the vectors in B. The coefficients c1 , . . . , cn are called the coordinates of f and the vector c1 .. [f ]B = . cn is the B-coordinate vector of f . We define the B-coordinate transformation LB : V → Rn by c1 LB (f ) = [f ]B = ... . cn If the basis B is understood, then we sometimes write just L for LB . The B-coordinate transformation is invertible, with inverse c1 −1 . LB .. = c1 f1 + · · · + cn fn . 
cn −1 n It is easy to check that in fact L−1 B ◦ LB is the identity on V , and LB ◦ LB is the identity on R . n Note that the basis vectors f1 , . . . , fn for V and the standard basis vectors e1 , . . . , en for R are related by LB / e i ∈ Rn , fi ∈ V o L−1 B since 0 .. . LB (fi ) = L(0f1 + · · · + 1fi + · · · + 0fn ) = 1 = ei . . .. 0 23 Theorem 98 (4.1.4, linearity of the coordinate transformation LB ). If B is a basis of a linear space V with dim(V ) = n, then the B-coordinate transformation LB : V → Rn is linear. In other words, for all f, g ∈ V and k ∈ R, a) [f + g]B = [f ]B + [g]B , b) [kf ]B = k[f ]B . Proof. The proof is analogous to that of Theorem 84. Note 99. | v1 · · · | If V = Rn and B = (v1 , . . . , vn ), then L−1 : Rn → V = Rn has standard matrix SB = B | vn (encountered in the preceding section), so that | LB (x) = SB−1 x = [x]B . Theorem 100 (4.1.5, dimension). If a linear space V has a basis with n elements, then all bases of V consist of n elements, and we say that the dimension of V is n: dim(V ) = n. Proof. Consider two bases B = (f1 , . . . , fn ) and C = (g1 , . . . , gm ) of V . We first show that [g1 ]B , . . . , [gm ]B ∈ Rn are linearly independent, which will imply m ≤ n by Theorem 74. Suppose c1 [g1 ]B + · · · + cm [gm ]B = 0. By the preceding theorem, [c1 g1 + · · · + cm gm ]B = 0, so that c1 g1 + · · · + cm gm = 0. Since g1 , . . . , gm are linearly independent, c1 = · · · = cm = 0, as claimed. Similarly, we can show that [f1 ]C , . . . , [fn ]C are linearly independent, so that n ≤ m. We conclude that n = m. Definition 101 (4.1.8). Not every linear space has a (finite) basis. If we allow infinite bases, then every linear space does have a basis, but we will not define infinite bases in this course. A linear space with a (finite) basis is called finite dimensional. A linear space without a (finite) basis is called infinite dimensional. Procedure 102 (4.1.6, finding a basis of a linear space V ). 1. 
Write down a typical element of V in terms of some arbitrary constants (parameters).
2. Express the typical element as a linear combination of some elements of V, using the arbitrary constants as coefficients; these elements then span V.
3. Verify that the elements of V in this linear combination are linearly independent; if so, they form a basis of V.

Theorem 103 (4.1.7, linear differential equations). The solutions of the differential equation
f⁽ⁿ⁾(x) + a_{n−1} f⁽ⁿ⁻¹⁾(x) + · · · + a1 f′(x) + a0 f(x) = 0,
with a0, . . . , a_{n−1} ∈ R, form an n-dimensional subspace of C∞. A differential equation of this form is called an nth-order linear differential equation with constant coefficients.

Proof. This theorem is proven in section 9.3 of the text, which we will not cover in this course.

4.2 Linear Transformations and Isomorphisms

Definition 104 (4.2.1). Let V and W be linear spaces. A function T : V → W is called a linear transformation if, for all f, g ∈ V and k ∈ R,
T(f + g) = T(f) + T(g) and T(kf) = kT(f).
For a linear transformation T : V → W, we let
im(T) = {T(f) : f ∈ V} and ker(T) = {f ∈ V : T(f) = 0}.
Then im(T) is a subspace of the target W and ker(T) is a subspace of the domain V, so im(T) and ker(T) are each linear spaces. If the image of T is finite dimensional, then dim(im T) is called the rank of T, and if the kernel of T is finite dimensional, then dim(ker T) is called the nullity of T.

Theorem 105 (Rank-Nullity Theorem). If V is finite dimensional, then the Rank-Nullity Theorem holds:
dim(V) = dim(im T) + dim(ker T) = rank(T) + nullity(T).

Proof. The proof is a series of exercises in the text.

Definition 106 (4.2.2). An invertible linear transformation T is called an isomorphism (from the Greek for "same structure"). The linear space V is said to be isomorphic to the linear space W, written V ≅ W, if there exists an isomorphism T : V → W.

Theorem 107 (4.2.3, coordinate transformations are isomorphisms).
If B = (f1 , f2 , . . . , fn ) is a basis of a linear space V , then the B-coordinate transformation LB (f ) = [f ]B from V to Rn is an isomorphism. Thus V is isomorphic to Rn . Proof. We showed in the preceding section that LB : V → Rn is an invertible linear transformation: c1 LB / .. f = c1 f1 + · · · + cn fn in V [f ] = . in Rn . B o (LB )−1 cn Note 108. It follows from the preceding theorem that any n-dimensional vector space is isomorphic to Rn . From this perspective, finite dimensional linear spaces are just vector spaces in disguise. An n-dimensional linear space is really just Rn written in another “language.” Theorem 109 (4.2.4, properties of isomorphisms). Let T : V → W be a linear transformation. a) T is an isomorphism if and only if ker(T ) = {0} and im(T ) = W . (study only this part for the quiz) b) Assume V and W are finite dimensional. If any two of the following statements are true, then T is an isomorphism. If T is an isomorphism, then all three statements are true. i. ker(T ) = {0} ii. im(T ) = W iii. dim(V ) = dim(W ) 25 Proof. a) Suppose T is an isomorphism. If T (f ) = 0 for an element f ∈ V , then we can apply T −1 to each side to obtain T −1 (T (f )) = T −1 (0), or f = 0, so ker(T ) = {0}. To see that im(T ) = W , note that any g in W can be written as g = T (T −1 (g)) ∈ im(T ). Now suppose ker(T ) = {0} and im(T ) = W . To show that T is invertible, we must show that T (f ) = g has a unique solution f for each g. Since im(T ) = W , there is at least one solution. If f1 and f2 are two solutions, with T (f1 ) = g and T (f2 ) = g, then T (f1 − f2 ) = T (f1 ) − T (f2 ) = g − g = 0, so that f1 − f2 is in the kernel of T . Since ker(T ) = {0}, we have f1 − f2 = 0 and thus f1 = f2 . b) If i. and ii. hold, then we have shown in part (a) that T is an isomorphism. If i. and iii. hold, then dim(im T ) = dim(V ) − dim(ker T ) = dim(V ) − dim{0} = dim(W ) − 0 = dim(W ). We prove that im(T ) = W by contradiction. 
Suppose that there is some element g ∈ W which is not contained in im(T ). If g1 , . . . , gn form a basis of im(T ), then g ∈ / span(g1 , . . . , gn ) = im(T ), and so g is not redundant in the list of vectors g1 , . . . , gn , g, which are therefore linearly independent. Thus dim(W ) ≥ n + 1 > n = dim(im T ), contradicting our result dim(im T ) = dim(W ) from above. If ii. and iii. hold, then dim(ker T ) = dim(V ) − dim(im T ) = dim(V ) − dim(W ) = 0. The only subspace with dimension 0 is {0}, so ker(T ) = {0}. If T is an isomorphism, then i. and ii. hold by part (a). Statement iii. holds by the Rank-Nullity Theorem and part (a): dim(V ) = dim(ker T ) + dim(im T ) = dim{0} + dim(W ) = 0 + dim(W ) = dim(W ). Theorem 110. If W is a subspace of a finite dimensional linear space V and dim(W ) = dim(V ), then W =V. Proof. Define a linear transformation T : W → V by T (x) = x. Then ker(T ) = {0}, which together with the hypothesis dim(W ) = dim(V ) implies, by the preceding theorem, that im(T ) = V . It follows that every element of V is also an element of W . Theorem 111 (isomorphism is an equivalence relation). Isomorphism of linear spaces is an equivalence relation, which means that it satisfies the following three properties for any linear spaces V , W , and U : a) reflexivity: V ' V . b) symmetry: If V ' W , then W ' V . c) transitivity: If V ' W and W ' U , then V ' U . Proof. a) Any linear space V is isomorphic to itself via the identity transformation I : V → V defined by I(f ) = f , which is its own inverse. 26 b) If V ' W , then there exists an invertible linear transformation T : V → W . The inverse transformation T −1 : W → V is then an isomorphism from W to V , so W ' V . c) If V ' W and W ' U , then there exist invertible linear transformations T : V → W and S : W → U . Composing these transformations, we obtain (S ◦T ) : V → U , with inverse transformation (S ◦T )−1 = T −1 S −1 . Thus S ◦ T is an isomorphism and V ' U . 
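Theorem 107 identifies any n-dimensional linear space with Rⁿ via the coordinate map L_B. A minimal sketch in code for V = P2 with the basis B = (1, x, x²): polynomials are represented here as {power: coefficient} dicts, a representation chosen for illustration only, not from the text.

```python
# Coordinate isomorphism L_B : P2 -> R^3 for the basis B = (1, x, x^2).
# A polynomial p(x) = a + b x + c x^2 corresponds to [p]_B = (a, b, c).

def L_B(p):
    """Map a polynomial (dict {power: coeff}) to its B-coordinate vector."""
    return [p.get(k, 0) for k in range(3)]

def L_B_inv(v):
    """Inverse map: coordinate vector (a, b, c) back to the polynomial."""
    return {k: v[k] for k in range(3) if v[k] != 0}

p = {0: 5, 2: -1}            # p(x) = 5 - x^2
assert L_B(p) == [5, 0, -1]
assert L_B_inv(L_B(p)) == p  # L_B^{-1} ∘ L_B is the identity on V

# Linearity (Theorem 98): [p + q]_B = [p]_B + [q]_B
q = {1: 2, 2: 4}             # q(x) = 2x + 4x^2
p_plus_q = {k: p.get(k, 0) + q.get(k, 0) for k in range(3)}
assert L_B(p_plus_q) == [a + b for a, b in zip(L_B(p), L_B(q))]
```

Since L_B is linear with a linear inverse, it is an isomorphism, so P2 ≅ R³, matching dim(P2) = 3.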
4.3 The Matrix of a Linear Transformation Definition 112 (4.3.1). Let V be an n-dimensional linear space with basis B, and let T : V → V be a linear transformation. The B-matrix B of T is defined to be the standard matrix of the linear transformation n n LB ◦ T ◦ L−1 B : R → R , so that Bx = LB (T (L−1 B (x))) for all x ∈ Rn : T V −−−−−−−−−→ V x LB y L−1 B B Rn −−−−−−−−−→ Rn If f = L−1 B (x), so that x = LB (f ) = [f ]B , then [T (f )]B = B[f ]B for all f ∈ V : T f LB y V −−−−−−−−−→ V LB y LB y T −−−−−−−−−→ T (f ) LB y B B x = [f ]B −−−−−−−−−→ [T (f )]B Rn −−−−−−−−−→ Rn Theorem 113 (4.3.2, B-matrix of a linear transformation). Let V be a linear space with basis B = (f1 , . . . , fn ), and let T : V → V be a linear transformation. Then the B-matrix of T is given by | | B = [T (f1 )]B · · · [T (fn )]B . | | The columns of B are the B-coordinate vectors of the transforms of the elements f1 , . . . , fn in the basis B. Proof. The ith column of B is Bei = B[fi ]B = [T (fi )]B . Definition 114 (4.3.3). Let V be an n-dimensional linear space with bases B and C, and let T : V → V be a linear transformation. The change of basis matrix from B to C, denoted by SB→C , is defined to be the n n standard matrix of the linear transformation LC ◦ L−1 B : R → R , so that SB→C x = LC (L−1 B (x)) for all x ∈ Rn : 27 n jjj4 RO j j j jj jjjj SB→C V jTTTT TTTT T T TT L−1 B Rn LC If f = L−1 B (x), so that x = LB (f ) = [f ]B , then [f ]C = SB→C [f ]B for all f ∈ V : n jj4 RO j j j jjj jjjj SB→C V TTTT TTTT T T TT* LB Rn j4 [f O ]C LC jjjj j j j jjj SB→C f TjTTT TTTT T T * LB x = [f ]B LC −1 −1 Note 115. The inverse matrix of SB→C is SC→B , the standard matrix of LB ◦ L−1 . C = (LC ◦ LB ) Theorem 116 (4.3.3, change of basis matrix). Let V be a linear space with two bases B = (f1 , . . . , fn ) and C. Then the change of basis matrix from B to C is given by | | SB→C = [f1 ]C · · · [fn ]C . | | The columns of SB→C are the C-coordinate vectors of the elements f1 , . . . , fn in the basis B. 
Proof. The ith column of SB→C is SB→C ei = SB→C [fi]B = [fi]C.

Theorem 117 (4.3.4, change of basis in a subspace of Rn). Consider a subspace V of Rn with two bases B = (f1, . . . , fm) and C = (g1, . . . , gm). Then SB = SC SB→C, or
[ f1 · · · fm ] = [ g1 · · · gm ] SB→C.
(Diagram: the coordinate maps LB and LC from V to Rm, together with SB→C from [x]B to [x]C, form a commutative triangle; equivalently, the matrices SB and SC carry [x]B and [x]C back to x.)
In the case n = m, SB and SC become invertible, and we can solve for the change of basis matrix:
SB→C = SC⁻¹ SB.
If, in addition, we take C to be the standard basis E = (e1, . . . , en) of Rn, then we get
SB→E = SE⁻¹ SB = I SB = SB.

Proof. By definition, SB→C x = LC(LB⁻¹(x)) for any x ∈ Rm, and we can apply LC⁻¹ to both sides to obtain LC⁻¹(SB→C x) = LB⁻¹(x). By Note 99, we can rewrite this as SC SB→C x = SB x. Since this holds for any x ∈ Rm, we conclude that SC SB→C = SB. When n = m, SB and SC are n × n matrices whose columns form bases, so they are invertible.

Theorem 118 (4.3.5, change of basis for the matrix of a linear transformation). Consider a linear transformation T : V → V, where V is a finite dimensional linear space with two bases B and C. The relationship between the B-matrix B of T and the C-matrix C of T involves the change of basis matrix S = SB→C:
CS = SB or C = SBS⁻¹ or B = S⁻¹CS.
The first equation comes from the outer rectangle in a commutative diagram whose corners are [f]B, [f]C, [T(f)]B, and [T(f)]C: the matrices B and C act horizontally, the change of basis matrix S = SB→C acts vertically, and the coordinate maps LB and LC relate these corners to f and T(f) in V. The two trapezoids and the two (identical) triangles in that diagram are precisely the commutative diagrams already encountered in this section.

Proof. We prove that CSB→C = SB→C B.
Intuitively, the large rectangle commutes because the two trapezoids and the two triangles inside of it commute. Algebraically, this amounts to the following (we write S for SB→C and x for [f ]B ): −1 CSx = (LC ◦ T ◦ L−1 C )((LC ◦ LB )(x)) −1 = (LC ◦ T ◦ (L−1 C ◦ LC ) ◦ LB )(x) = (LC ◦ T ◦ L−1 B )(x). Similarly, −1 SBx = (LC ◦ L−1 B )((LB ◦ T ◦ LB )(x)) −1 = (LC ◦ (L−1 B ◦ LB ) ◦ T ◦ LB )(x) = (LC ◦ T ◦ L−1 B )(x). Combining these results, CSx = SBx for all x ∈ Rn , so CS = SB as desired. Note 119. If V is a subspace of dimension m in the vector space Rn , then we can write matrices SB and SC in place of linear transformations LB and LC (provided that we change the direction of the corresponding arrows): Rm O KK KK SC KK KK K% SB→C V 9 ⊂ Rn ss ss s sss SB s m R C T B / Rm O s s SC ss s ss yss / V ⊂ Rn SB→C eKK KK KK K K SB K / Rm [x]C K O KK KKSC KK KK K% SB→C 9x s ss s s ss sss SB [x]B 29 C T B / [T (x)]C O s SC s s s s ysss / T (x) SB→C eKKK KKK K K SB / [T (x)]B Finally, if V = Rn and we take C to be the standard basis E = (e1 , . . . , en ) of Rn , then the E-matrix of T equals the standard matrix A of T , SE = I, SB→E = SB , and the picture simplifies to the following, where the outer rectangle gives the familiar formula ASB = SB B from Theorem 87: RO n KKK KKKKK KKKKKK KKKKKK SB V 9 = Rn s s s ss s s ss SB s Rn A T B / Rn ssss O s s s s ssssss ssssss / V = Rn SB eKKK KKK K SB KKK / Rn [x]E K O KKKKK KKKKKK KKKKKK KK SB 9x ss s s sss sss SB [x]B 30 A T B / [T (x)]E ss O s s ssss s s s sssss / T (x) SB eKKK KKK SB KK / [T (x)]B 5 Orthogonality and Least Squares 5.1 Orthogonal Projections and Orthonormal Bases Definition 120 (5.1.1). • Two vectors v, w ∈ Rn are called perpendicular or orthogonal if v · w = 0. • A vector x ∈ Rn is orthogonal to a subspace V ⊂ Rn if x is orthogonal to all vectors v ∈ V . Theorem 121. A vector x ∈ Rn is orthogonal to a subspace V ⊂ Rn with basis v1 , . . . , vm if and only if x is orthogonal to all of the basis vectors v1 , . . . 
, vm.

Proof. If x is orthogonal to V, then x is orthogonal to v1, . . . , vm by definition. Conversely, if x is orthogonal to v1, . . . , vm, then any v ∈ V can be written as a linear combination v = c1 v1 + · · · + cm vm of basis vectors, from which it follows that
x · v = x · (c1 v1 + · · · + cm vm) = x · (c1 v1) + · · · + x · (cm vm) = c1 (x · v1) + · · · + cm (x · vm) = c1 (0) + · · · + cm (0) = 0,
so x is orthogonal to v.

Definition 122 (5.1.1).
• The length (or magnitude or norm) of a vector v ∈ Rn is ||v|| = √(v · v).
• A vector u ∈ Rn is called a unit vector if its length is 1 (i.e., ||u|| = 1, or equivalently u · u = 1).

Theorem 123.
• For any vectors v, w ∈ Rn and scalar k ∈ R,
k(v · w) = (kv) · w = v · (kw) and ||kv|| = |k| ||v||.
• If v ≠ 0, then the vector u = (1/||v||) v is a unit vector in the same direction as v, called the normalization of v.

Proof.
• We compute
(kv) · w = Σᵢ (kv)ᵢ wᵢ = Σᵢ k vᵢ wᵢ = k Σᵢ vᵢ wᵢ = k(v · w),
and similarly
v · (kw) = Σᵢ vᵢ (kw)ᵢ = Σᵢ vᵢ k wᵢ = k Σᵢ vᵢ wᵢ = k(v · w),
where each sum runs over i = 1, . . . , n; this proves the first claim. We then use the definition of length to obtain
||kv|| = √((kv) · (kv)) = √(k²(v · v)) = √(k²) √(v · v) = |k| ||v||.
• To prove that the normalization u is a unit vector, we compute its length:
||u|| = ||(1/||v||) v|| = (1/||v||) ||v|| = 1.

Definition 124 (5.1.2). The vectors u1, . . . , um ∈ Rn are called orthonormal if they are all unit vectors and all orthogonal to each other:
ui · uj = 1 if i = j, and ui · uj = 0 if i ≠ j.

Note 125. The standard basis vectors e1, . . . , en of Rn are orthonormal.

Theorem 126. Orthonormal vectors u1, . . . , um are linearly independent.

Proof. Consider a relation c1 u1 + · · · + ci ui + · · · + cm um = 0. Taking the dot product of each side with ui, we get
(c1 u1 + · · · + ci ui + · · · + cm um) · ui = 0 · ui = 0,
which simplifies to
c1 (u1 · ui) + c2 (u2 · ui) + · · · + ci (ui · ui) + · · · + cm (um · ui) = 0.
Since all of the dot products are 0 except for ui · ui = 1, we have ci = 0. This is true for all i = 1, 2, . . . , m, so u1 , . . . , um are linearly independent. Theorem 127 (5.1.4, orthogonal projection). For any vector x ∈ Rn and any subspace V ⊂ Rn , we can write x = xk + x⊥ for some xk in V and x⊥ perpendicular to V , and this representation is unique. The vector projV (x) = xk is called the orthogonal projection of x onto V and is given by the formula projV (x) = xk = (u1 · x)u1 + · · · + (um · x)um for all x ∈ Rn , where (u1 , . . . , um ) is any orthonormal basis of V . The resulting orthogonal projection transformation projV : Rn → Rn is linear. Proof. Any potential projV (x) = xk ∈ V can be written as a linear combination xk = c1 u1 + · · · + ci ui + · · · + cm um of the basis vectors of V . Then x⊥ = x − xk = x − c1 u1 − · · · − ci ui − · · · − cm um is orthogonal to V if and only if it is orthogonal to all of the basis vectors ui ∈ V : 0 = ui · (x − c1 u1 − · · · − ci ui − · · · − cm um ) = ui · x − ui · (c1 u1 ) − · · · − ui · (ci ui ) − · · · − ui · (cm um ) = ui · x − c1 (ui · u1 ) − · · · − ci (ui · ui ) − · · · − cm (ui · um ) | {z } | {z } | {z } 0 1 = ui · x − ci . 32 0 Thus the unique solution has ci = ui · x for i = 1, . . . , m, which means that xk = (u1 · x)u1 + · · · + (um · x)um and x⊥ = x − (u1 · x)u1 − · · · − (um · x)um . For linearity, take x, y ∈ Rn and k ∈ R. Then x + y = (xk + x⊥ ) + (yk + y⊥ ) = (xk + yk ) + (x⊥ + y⊥ ), with xk + yk in V and x⊥ + y⊥ orthogonal to V , so projV (x + y) = xk + yk = projV (x) + projV (y). Similarly, kx = k(xk + x⊥ ) = kxk + kx⊥ , with kxk in V and kx⊥ orthogonal to V , so projV (kx) = kxk = k projV (x). Note 128. The orthogonal projection of x onto a subspace V ⊂ Rn is obtained by summing the orthogonal projections (ui · x)ui of x onto the lines spanned by the orthonormal basis vectors u1 , . . . , um of V . Theorem 129 (5.1.6, coordinates via orthogonal projection). 
For any orthonormal basis B = (u1, . . . , un) of Rn,
x = (u1 · x)u1 + · · · + (un · x)un for all x ∈ Rn,
so the B-coordinate vector of x is the column vector [x]B with entries u1 · x, . . . , un · x.

Proof. If V = Rn in Theorem 127, then clearly x = x + 0 is a decomposition of x with x in V and 0 orthogonal to V. Thus
x = x∥ = projV(x) = (u1 · x)u1 + · · · + (un · x)un.

Definition 130 (5.1.7). The orthogonal complement V⊥ of a subspace V ⊂ Rn is the set of all vectors x ∈ Rn which are orthogonal to all vectors v ∈ V:
V⊥ = {x ∈ Rn : v · x = 0 for all v ∈ V}.

Theorem 131. For a subspace V ⊂ Rn, the orthogonal complement V⊥ is the kernel of projV. The image of projV is V itself.

Proof. Note that x ∈ V⊥ if and only if x⊥ = x, i.e. projV(x) = x∥ = 0. Any vector in the image of projV is contained in V by definition. Conversely, for any v ∈ V, projV(v) = v, so v is in the image of projV.

Theorem 132 (5.1.8, properties of the orthogonal complement). Let V be a subspace of Rn.
a) V⊥ is a subspace of Rn.
b) V ∩ V⊥ = {0}.
c) dim(V) + dim(V⊥) = n.
d) (V⊥)⊥ = V.

Proof. a) By the preceding theorem, V⊥ is the kernel of the linear transformation projV : Rn → Rn and is therefore a subspace of the domain Rn.
b) Clearly 0 is contained in both V and V⊥. Any vector x in both V and V⊥ is orthogonal to itself, so that x · x = ||x||² = 0 and thus x = 0.
c) Applying the Rank-Nullity Theorem to the linear transformation projV : Rn → Rn, we have
n = dim(im(projV)) + dim(ker(projV)) = dim(V) + dim(V⊥).
d) Note that V ⊂ (V⊥)⊥ because, for any v ∈ V and x ∈ V⊥, x · v = v · x = 0. By part (c),
dim((V⊥)⊥) = n − dim(V⊥) = n − (n − dim(V)) = dim(V),
so Theorem 110 implies that (V⊥)⊥ = V.

Theorem 133 (5.1.9, Pythagorean Theorem). For two vectors x, y ∈ Rn, the equation
||x + y||² = ||x||² + ||y||²
holds if and only if x and y are orthogonal.

Proof.
We compute:

||x + y||^2 = (x + y) · (x + y) = x · x + 2(x · y) + y · y = ||x||^2 + 2(x · y) + ||y||^2,

which equals ||x||^2 + ||y||^2 if and only if x · y = 0.

Theorem 134 (5.1.10). For any vector x ∈ R^n and subspace V ⊂ R^n,

||proj_V(x)|| ≤ ||x||,

with equality if and only if x ∈ V.

Proof. Since proj_V(x) = x^∥ is orthogonal to x^⊥, we can apply the Pythagorean Theorem:

||x||^2 = ||proj_V(x)||^2 + ||x^⊥||^2.

It follows that ||proj_V(x)||^2 ≤ ||x||^2 and thus ||proj_V(x)|| ≤ ||x||. There is equality if and only if ||x^⊥|| = 0, or x^⊥ = 0, which is equivalent to x ∈ V.

Theorem 135 (5.1.11, Cauchy-Schwarz inequality). For two vectors x, y ∈ R^n,

|x · y| ≤ ||x|| ||y||,

with equality if and only if x and y are parallel.

Proof. Let u = (1/||y||) y be the normalization of y, and let V = span(u). Then proj_V(x) = (x · u)u for any x ∈ R^n, so by the preceding theorem,

||x|| ≥ ||proj_V(x)|| = ||(x · u)u|| = |x · u| ||u|| = |x · u| = |x · (1/||y||) y| = (1/||y||) |x · y|.

Multiplying each side by ||y||, we get ||x|| ||y|| ≥ |x · y|.

Definition 136 (5.1.12). By the Cauchy-Schwarz inequality, |x · y| / (||x|| ||y||) ≤ 1, so we may define the angle between two nonzero vectors x, y ∈ R^n to be

θ = arccos( (x · y) / (||x|| ||y||) ).

With this definition, we have the formula x · y = ||x|| ||y|| cos θ for the dot product in terms of the lengths of two vectors and the angle between them.

5.2 Gram-Schmidt Process and QR Factorization

Theorem 137 (5.2.1). For a basis v_1, ..., v_m of a subspace V ⊂ R^n, define subspaces V_j = span(v_1, ..., v_j) ⊂ V for j = 0, 1, ..., m. Note that V_0 = span ∅ = {0} and V_m = V. Let v_j^⊥ be the component of v_j perpendicular to the span V_{j−1} of the preceding basis vectors:

v_j^⊥ = v_j − v_j^∥ = v_j − proj_{V_{j−1}}(v_j).

We can normalize the v_j^⊥ to obtain unit vectors:

u_j = (1/||v_j^⊥||) v_j^⊥.

Then u_1, ..., u_m form an orthonormal basis of V.

Proof. In order to define u_j, we must ensure that v_j^⊥ ≠ 0. This holds because v_j is not redundant in the list v_1, ..., v_j, and thus is not contained in V_{j−1}. The v_j^⊥ are all orthogonal to each other: for i < j we have v_i^⊥ ∈ V_i ⊂ V_{j−1}, while v_j^⊥ is perpendicular to V_{j−1}. Thus the u_j are orthogonal too, since

u_i · u_j = ( (1/||v_i^⊥||) v_i^⊥ ) · ( (1/||v_j^⊥||) v_j^⊥ ) = (1/(||v_i^⊥|| ||v_j^⊥||)) (v_i^⊥ · v_j^⊥) = 0.

Since u_1, ..., u_m ∈ V are orthogonal unit vectors, they are linearly independent. Since dim(V) = m, these vectors form an (orthonormal) basis for V.

Procedure 138 (5.2.1, Gram-Schmidt orthogonalization). We compute the orthonormal basis u_1, ..., u_m of the preceding theorem by performing the following steps for j = 1, 2, ..., m.

1. Let v_j^⊥ = v_j − proj_{V_{j−1}}(v_j) = v_j − (u_1 · v_j)u_1 − ··· − (u_{j−1} · v_j)u_{j−1}.

2. Let u_j = (1/||v_j^⊥||) v_j^⊥.

Note that v_1^⊥ = v_1 − proj_{V_0}(v_1) = v_1 − 0 = v_1.

Note 139. By Theorem 117, the change of basis matrix S_{B→C} from the original basis B = (v_1, ..., v_m) to the orthonormal basis C = (u_1, ..., u_m) satisfies the equation S_B = S_C S_{B→C}, or

[ v_1 ··· v_m ] = [ u_1 ··· u_m ] S_{B→C},
      M                 Q            R

where the matrices are written in terms of their columns. These matrices are customarily named M, Q, and R, as indicated; the equation M = QR is called the QR factorization of M. (The notes include commutative diagrams relating x, [x]_B, and [x]_C first via S_B, S_C, S_{B→C}, and then via M, Q, R.)

Theorem 140 (5.2.2, QR factorization). Let M be an n × m matrix with linearly independent columns v_1, ..., v_m. Then there exists an n × m matrix Q with orthonormal columns u_1, ..., u_m and an upper triangular m × m matrix R with positive diagonal entries such that M = QR. This representation is unique, and r_ij = u_i · v_j for i ≤ j (the diagonal entries can alternately be written in the form r_jj = ||v_j^⊥||):

    [ ||v_1^⊥||   u_1·v_2    ···    u_1·v_m  ]
R = [    0       ||v_2^⊥||   ···    u_2·v_m  ]
    [    ...                 ...       ...   ]
    [    0          0        ···   ||v_m^⊥|| ]

Proof. We obtain u_1, ..., u_m using Gram-Schmidt orthogonalization. To find the jth column of R, we express v_j as a linear combination of u_1, ..., u_j:

v_j = proj_{V_{j−1}}(v_j) + v_j^⊥ = (u_1 · v_j) u_1 + ··· + (u_{j−1} · v_j) u_{j−1} + ||v_j^⊥|| u_j,

where the coefficients of u_1, ..., u_{j−1} are the entries r_1j, ..., r_{j−1,j} and the coefficient of u_j is r_jj. The diagonal entry r_jj of R can be alternately expressed by taking the dot product of u_j with each side of ||v_j^⊥|| u_j = v_j^⊥ to get

||v_j^⊥|| = u_j · v_j^⊥ = u_j · [v_j − (u_1 · v_j)u_1 − ··· − (u_{j−1} · v_j)u_{j−1}] = u_j · v_j.

The uniqueness of the factorization is an exercise in the text.

5.3 Orthogonal Transformations and Orthogonal Matrices

Definition 141 (5.3.1). A linear transformation T : R^n → R^n is called orthogonal if it preserves the length of vectors:

||T(x)|| = ||x|| for all x ∈ R^n.

If T(x) = Ax is an orthogonal transformation, we say that A is an orthogonal matrix.

Theorem 142 (5.3.2, orthogonal transformations preserve orthogonality). Let T : R^n → R^n be an orthogonal linear transformation. If v, w ∈ R^n are orthogonal, then so are T(v), T(w).

Proof. We compute:

||T(v) + T(w)||^2 = ||T(v + w)||^2 = ||v + w||^2 = ||v||^2 + ||w||^2 = ||T(v)||^2 + ||T(w)||^2,

so T(v) is orthogonal to T(w) by the Pythagorean Theorem.

Note 143. In fact, orthogonal transformations preserve all angles, not just right angles: the angle between two nonzero vectors v, w ∈ R^n equals the angle between T(v), T(w). This is a homework problem.

Theorem 144 (orthogonal transformations preserve the dot product). A linear transformation T : R^n → R^n is orthogonal if and only if T preserves the dot product: v · w = T(v) · T(w) for all v, w ∈ R^n.

Proof. Suppose T is orthogonal. Then T preserves the length of v + w, so

||T(v + w)||^2 = ||v + w||^2
(T(v) + T(w)) · (T(v) + T(w)) = (v + w) · (v + w)
T(v) · T(v) + 2 T(v) · T(w) + T(w) · T(w) = v · v + 2 v · w + w · w
||T(v)||^2 + 2 T(v) · T(w) + ||T(w)||^2 = ||v||^2 + 2 v · w + ||w||^2
2 T(v) · T(w) = 2 v · w
T(v) · T(w) = v · w,

where we have used that ||T(v)|| = ||v|| and ||T(w)|| = ||w||.

Conversely, suppose T preserves the dot product. Then

||T(x)||^2 = T(x) · T(x) = x · x = ||x||^2,

so ||T(x)|| = ||x||, and T is orthogonal.
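Procedure 138 and Theorem 140 can be carried out numerically. The following is a minimal pure-Python sketch (the function names `gram_schmidt` and `qr_factor` are ours, not from the text):

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def norm(x):
    return math.sqrt(dot(x, x))

def gram_schmidt(vs):
    """Orthonormalize linearly independent vectors (Procedure 138)."""
    us = []
    for v in vs:
        # Step 1: subtract the projection onto the span of the previous u's.
        w = list(v)
        for u in us:
            c = dot(u, v)
            w = [wi - c * ui for wi, ui in zip(w, u)]
        # Step 2: normalize; norm(w) is nonzero because the v's are independent.
        n = norm(w)
        us.append([wi / n for wi in w])
    return us

def qr_factor(vs):
    """Return Q (as a list of orthonormal columns) and R with
    r_ij = u_i . v_j for i <= j (Theorem 140)."""
    us = gram_schmidt(vs)
    m = len(vs)
    R = [[dot(us[i], vs[j]) if i <= j else 0.0 for j in range(m)]
         for i in range(m)]
    return us, R
```

For v_1 = (3, 4) and v_2 = (1, 0), this gives u_1 = (0.6, 0.8), u_2 = (0.8, −0.6), and r_11 = ||v_1|| = 5, matching the formula r_jj = ||v_j^⊥||.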
Theorem 145 (5.3.3, orthogonal matrices and orthonormal bases). An n × n matrix A is orthogonal if and only if its columns form an orthonormal basis of R^n.

Proof. Define T(x) = Ax and recall that the columns of A are T(e_1), ..., T(e_n).

Suppose A, and hence T, is orthogonal. Because e_1, ..., e_n are orthonormal, their images T(e_1), ..., T(e_n) are also orthonormal, since T preserves length and orthogonality. By Theorem 126, T(e_1), ..., T(e_n) are linearly independent. Since dim(R^n) = n, the columns T(e_1), ..., T(e_n) of A form an (orthonormal) basis of R^n.

Conversely, suppose T(e_1), ..., T(e_n) form an orthonormal basis of R^n. Then for any x = x_1 e_1 + ··· + x_n e_n ∈ R^n,

||T(x)||^2 = ||x_1 T(e_1) + ··· + x_n T(e_n)||^2
          = ||x_1 T(e_1)||^2 + ··· + ||x_n T(e_n)||^2
          = (|x_1| ||T(e_1)||)^2 + ··· + (|x_n| ||T(e_n)||)^2
          = x_1^2 + ··· + x_n^2
          = ||x||^2,

where the second equals sign follows from the Pythagorean Theorem. Then ||T(x)|| = ||x||, and T and A are orthogonal.

Theorem 146 (5.3.4, products and inverses of orthogonal matrices).

a) The product AB of two orthogonal n × n matrices A and B is orthogonal.

b) The inverse A^{−1} of an orthogonal n × n matrix A is orthogonal.

Proof.

a) The linear transformation T(x) = (AB)x is orthogonal because ||T(x)|| = ||A(Bx)|| = ||Bx|| = ||x||.

b) The linear transformation T(x) = A^{−1}x is orthogonal because ||T(x)|| = ||A^{−1}x|| = ||A(A^{−1}x)|| = ||Ix|| = ||x||.

Definition 147 (5.3.5). For an m × n matrix A, the transpose A^T of A is the n × m matrix whose ijth entry is the jith entry of A: [A^T]_ij = A_ji. The rows of A become the columns of A^T, and the columns of A become the rows of A^T. A square matrix A is symmetric if A^T = A and skew-symmetric if A^T = −A.

Note 148 (5.3.6). If v and w are two (column) vectors in R^n, then v · w = v^T w. (Here we choose to ignore the difference between a scalar a and the 1 × 1 matrix [a].)

Theorem 149 (5.3.7, transpose criterion for orthogonal matrices). An n × n matrix A is orthogonal if and only if A^T A = I_n or, equivalently, if A has inverse A^{−1} = A^T.

Proof. If A has columns v_1, ..., v_n, then the rows of A^T are v_1^T, ..., v_n^T, and the ijth entry of A^T A is v_i^T v_j = v_i · v_j:

        [ v_1·v_1   v_1·v_2   ···   v_1·v_n ]
A^T A = [ v_2·v_1   v_2·v_2   ···   v_2·v_n ]
        [   ...       ...     ...     ...   ]
        [ v_n·v_1   v_n·v_2   ···   v_n·v_n ]

This product equals I_n if and only if the columns of A are orthonormal, which is equivalent to A being an orthogonal matrix. If A is orthogonal, it is also invertible, since its columns form an (orthonormal) basis of R^n. Thus A^T A = I_n is equivalent to A^{−1} = A^T by simple matrix algebra.

Theorem 150 (5.3.8, summary: orthogonal matrices). For an n × n matrix A, the following statements are equivalent:

1. A is an orthogonal matrix.
2. ||Ax|| = ||x|| for all x ∈ R^n.
3. The columns of A form an orthonormal basis of R^n.
4. A^T A = I_n.
5. A^{−1} = A^T.

Proof. See Definition 141 and Theorems 145 and 149.

Theorem 151 (5.3.9, properties of the transpose).

a) If A is an n × p matrix and B a p × m matrix (so that AB is defined), then (AB)^T = B^T A^T.

b) If an n × n matrix A is invertible, then so is A^T, and (A^T)^{−1} = (A^{−1})^T.

c) For any matrix A, rank(A) = rank(A^T).

Proof.

a) We check that the ijth entries of the two matrices are equal:

[(AB)^T]_ij = [AB]_ji = (jth row of A) · (ith column of B),
[B^T A^T]_ij = (ith row of B^T) · (jth column of A^T) = (ith column of B) · (jth row of A).

b) Taking the transpose of both sides of AA^{−1} = I and using part (a), we get (AA^{−1})^T = (A^{−1})^T A^T = I. Similarly, A^{−1}A = I implies (A^{−1}A)^T = A^T (A^{−1})^T = I. We conclude that the inverse of A^T is (A^{−1})^T.

c) Suppose A has n columns. Since the vectors in the kernel of A are precisely those vectors in R^n which are orthogonal to all of the rows of A, and hence to the span of the rows of A,

span(rows of A) = (ker A)^⊥.

By the Rank-Nullity Theorem, together with its corollary in Theorem 132 part (c),

rank(A^T) = dim(im A^T) = dim(span(columns of A^T)) = dim(span(rows of A)) = dim((ker A)^⊥) = n − dim(ker A) = dim(im A) = rank(A).

Theorem 152 (invertibility criteria involving rows). For an n × n matrix A with rows w_1, ..., w_n, the following are equivalent:

1. A is invertible,
2. w_1, ..., w_n span R^n,
3. w_1, ..., w_n are linearly independent,
4. w_1, ..., w_n form a basis of R^n.

Proof. By the preceding theorem, A is invertible if and only if A^T is invertible. Statements 2-4 are just the last three invertibility criteria in Theorem 80 applied to A^T, since the columns of A^T are the rows of A.

Theorem 153 (column-row definition of matrix multiplication). Given an n × m matrix A with columns v_1, ..., v_m ∈ R^n and an m × n matrix B with rows w_1, ..., w_m ∈ R^n, think of the v_i as n × 1 matrices and the w_i as 1 × n matrices. Then the product of A and B can be computed as a sum of m matrices of size n × n:

AB = v_1 w_1 + ··· + v_m w_m = Σ_{i=1}^m v_i w_i.

Proof.

[v_1 w_1 + ··· + v_m w_m]_ij = [v_1 w_1]_ij + ··· + [v_m w_m]_ij
                             = [v_1]_i [w_1]_j + ··· + [v_m]_i [w_m]_j
                             = A_i1 B_1j + ··· + A_im B_mj
                             = [AB]_ij.

Theorem 154 (5.3.10, matrix of an orthogonal projection). Let V be a subspace of R^n with orthonormal basis u_1, ..., u_m. Then the matrix of the orthogonal projection onto V is QQ^T, where Q is the n × m matrix with columns u_1, ..., u_m.

Proof. We know from Theorem 127 that, for x ∈ R^n,

proj_V(x) = (u_1 · x)u_1 + ··· + (u_m · x)u_m.

If we view the vector u_i as an n × 1 matrix and the scalar u_i · x as a 1 × 1 matrix, we can write

proj_V(x) = u_1(u_1 · x) + ··· + u_m(u_m · x)
          = u_1(u_1^T x) + ··· + u_m(u_m^T x)
          = (u_1 u_1^T + ··· + u_m u_m^T) x
          = QQ^T x.

The second-to-last equals sign follows from the preceding theorem.

5.4 Least Squares and Data Fitting

Theorem 155 (5.4.1). For any matrix A, (im A)^⊥ = ker(A^T).

Proof. Let A have columns v_1, ..., v_m, so that A^T has rows v_1^T, ..., v_m^T, and recall that im(A) = span(v_1, ..., v_m). Then

(im A)^⊥ = {x ∈ R^n : v · x = 0 for all v ∈ im(A)} = {x ∈ R^n : v_i · x = 0 for i = 1, ..., m} = ker(A^T).

Theorem 156 (5.4.2).

a) If A is an n × m matrix, then ker(A) = ker(A^T A).

b) If A is an n × m matrix with ker(A) = {0}, then A^T A is invertible.

Proof.

a) If Ax = 0, then A^T Ax = 0, so ker(A) ⊂ ker(A^T A). Conversely, if A^T Ax = 0, then Ax ∈ ker(A^T) = (im A)^⊥ and Ax ∈ im(A), so Ax = 0 by Theorem 132 part (b). Thus ker(A^T A) ⊂ ker(A).

b) Since A^T A is an m × m matrix and, by part (a), ker(A^T A) = {0}, A^T A is invertible by Theorem 47.

Theorem 157 (5.4.3, alternative characterization of orthogonal projection). Given a vector x ∈ R^n and a subspace V ⊂ R^n, the orthogonal projection proj_V(x) is the vector in V closest to x, i.e.,

||x − proj_V(x)|| < ||x − v||

for all v ∈ V not equal to proj_V(x).

Proof. Note that x − proj_V(x) = x^⊥ ∈ V^⊥, while proj_V(x) − v ∈ V, so the two vectors are orthogonal. We can therefore apply the Pythagorean Theorem:

||x − proj_V(x)||^2 + ||proj_V(x) − v||^2 = ||x − proj_V(x) + proj_V(x) − v||^2 = ||x − v||^2.

This implies that ||x − proj_V(x)||^2 < ||x − v||^2 unless ||proj_V(x) − v|| = 0, i.e. proj_V(x) = v.

Definition 158 (5.4.4). Let A be an n × m matrix. Then a vector x* ∈ R^m is called a least-squares solution of the system Ax = b if the distance between Ax* and b is as small as possible:

||b − Ax*|| ≤ ||b − Ax|| for all x ∈ R^m.

Note 159. The vector x* is called a "least-squares solution" because it minimizes the sum of the squares of the components of the "error" vector b − Ax. If the system Ax = b is consistent, then the least-squares solutions x* are just the exact solutions, so that the error b − Ax* = 0.

Theorem 160 (5.4.5, the normal equation). The least-squares solutions of the system Ax = b are the exact solutions of the (consistent) system A^T Ax = A^T b, which is called the normal equation of Ax = b.

Proof.
We have the following chain of equivalent statements:

The vector x* is a least-squares solution of the system Ax = b
⇐⇒ ||b − Ax*|| ≤ ||b − Ax|| for all x ∈ R^m
⇐⇒ Ax* = proj_{im A}(b)
⇐⇒ b − Ax* ∈ (im A)^⊥ = ker(A^T)
⇐⇒ A^T(b − Ax*) = 0
⇐⇒ A^T Ax* = A^T b.

Theorem 161 (5.4.6, unique least-squares solution). If ker(A) = {0}, then the linear system Ax = b has the unique least-squares solution

x* = (A^T A)^{−1} A^T b.

Proof. By Theorem 156 part (b), the matrix A^T A is invertible. Multiplying each side of the normal equation on the left by (A^T A)^{−1}, we obtain the result.

Theorem 162 (5.4.7, matrix of an orthogonal projection). Let v_1, ..., v_m be any basis of a subspace V ⊂ R^n, and let A be the matrix with columns v_1, ..., v_m. Then the matrix of the orthogonal projection onto V is A(A^T A)^{−1} A^T.

Proof. Let b be any vector in R^n. If x* is a least-squares solution of Ax = b, then Ax* is the projection of b onto V = im(A). Since the columns of A are linearly independent, ker(A) = {0}, so we have the unique least-squares solution x* = (A^T A)^{−1} A^T b. Multiplying each side by A on the left, we get

proj_V(b) = Ax* = A(A^T A)^{−1} A^T b.

Note 163. If v_1, ..., v_m form an orthonormal basis, then A^T A = I_m and the formula for the matrix of an orthogonal projection simplifies to AA^T, as in Theorem 154.

5.5 Inner Product Spaces

Definition 164 (5.5.1). An inner product on a linear space V is a rule that assigns a scalar, denoted ⟨f, g⟩, to any pair f, g of elements of V, such that the following properties hold for all f, g, h ∈ V and all c ∈ R:

1. symmetry: ⟨f, g⟩ = ⟨g, f⟩
2. preserves addition: ⟨f + h, g⟩ = ⟨f, g⟩ + ⟨h, g⟩
3. preserves scalar multiplication: ⟨cf, g⟩ = c⟨f, g⟩
4. positive definiteness: ⟨f, f⟩ > 0 for all nonzero f ∈ V.

A linear space endowed with an inner product is called an inner product space.

Note 165. Properties 2 and 3 state that, for a fixed g ∈ V, the transformation T : V → R given by T(f) = ⟨f, g⟩ is linear.

Definition 166. We list some examples of inner product spaces:

• Let C[a, b] be the linear space of continuous functions from the interval [a, b] to R. Then

⟨f, g⟩ = ∫_a^b f(t)g(t) dt

defines an inner product on C[a, b].

• Let ℓ^2 be the linear space of all "square-summable" infinite sequences, i.e., sequences

x = (x_0, x_1, x_2, ..., x_n, ...)

such that Σ_{i=0}^∞ x_i^2 = x_0^2 + x_1^2 + ··· converges. Then

⟨x, y⟩ = Σ_{i=0}^∞ x_i y_i = x_0 y_0 + x_1 y_1 + ···

defines an inner product on ℓ^2.

• The trace of a square matrix A, denoted tr(A), is the sum of its diagonal entries. An inner product on the linear space R^{n×m} of all n × m matrices is given by

⟨A, B⟩ = tr(A^T B).

Definition 167 (5.5.2).

• The norm (or magnitude) of an element f of an inner product space is ||f|| = sqrt(⟨f, f⟩).

• Two elements f, g of an inner product space are called orthogonal (or perpendicular) if ⟨f, g⟩ = 0.

• The distance between two elements of an inner product space is defined to be the norm of their difference: dist(f, g) = ||f − g||.

• The angle θ between two elements f, g of an inner product space is defined by the formula

θ = arccos( ⟨f, g⟩ / (||f|| ||g||) ).

Theorem 168 (5.5.3, orthogonal projection). If V is an inner product space with finite-dimensional subspace W, then the orthogonal projection proj_W(f) of an element f ∈ V onto W is defined to be the unique element of W such that f − proj_W(f) is orthogonal to W. Alternately, proj_W(f) is the element of W which minimizes the distance

dist(f, proj_W(f)) = ||f − proj_W(f)||.

If g_1, ..., g_m is an orthonormal basis of W, then

proj_W(f) = ⟨g_1, f⟩ g_1 + ··· + ⟨g_m, f⟩ g_m for all f ∈ V.

Definition 169. Consider the inner product

⟨f, g⟩ = (1/π) ∫_{−π}^{π} f(t)g(t) dt

on the linear space C[−π, π] of continuous functions on the interval [−π, π]. For each positive integer n, define the subspace T_n of C[−π, π] to be

T_n = span(1, sin(t), cos(t), sin(2t), cos(2t), ..., sin(nt), cos(nt)).

Then T_n consists of all functions of the form

f(t) = a + b_1 sin(t) + c_1 cos(t) + ··· + b_n sin(nt) + c_n cos(nt),

called trigonometric polynomials of order ≤ n.

Theorem 170 (5.5.4, an orthonormal basis of T_n). The functions

1/√2, sin(t), cos(t), sin(2t), cos(2t), ..., sin(nt), cos(nt)

form an orthonormal basis of T_n.

Proof. By the "Euler identities," we obtain

⟨sin(pt), cos(mt)⟩ = (1/π) ∫_{−π}^{π} sin(pt) cos(mt) dt = 0, for all integers p, m,
⟨sin(pt), sin(mt)⟩ = (1/π) ∫_{−π}^{π} sin(pt) sin(mt) dt = 0, for distinct integers p, m,
⟨cos(pt), cos(mt)⟩ = (1/π) ∫_{−π}^{π} cos(pt) cos(mt) dt = 0, for distinct integers p, m.

(Note that 1 = cos(0t).) Thus the functions 1, sin(t), cos(t), sin(2t), cos(2t), ..., sin(nt), cos(nt) are orthogonal to each other, and hence linearly independent. Since they clearly span T_n, they form a basis for T_n. To obtain an orthonormal basis for T_n, we normalize the vectors. Since

||1|| = sqrt( (1/π) ∫_{−π}^{π} 1 dt ) = √2,
||sin(mt)|| = sqrt( (1/π) ∫_{−π}^{π} sin^2(mt) dt ) = 1,
||cos(mt)|| = sqrt( (1/π) ∫_{−π}^{π} cos^2(mt) dt ) = 1,

we need only replace the function 1 by 1/||1|| = 1/√2.

Theorem 171 (5.5.5, Fourier coefficients). The best approximation (in the "continuous least-squares" sense) of f ∈ C[−π, π] by a function in the subspace T_n is

f_n(t) = proj_{T_n} f(t) = a_0 (1/√2) + b_1 sin(t) + c_1 cos(t) + ··· + b_n sin(nt) + c_n cos(nt),

where

a_0 = ⟨f(t), 1/√2⟩ = (1/(√2 π)) ∫_{−π}^{π} f(t) dt,
b_k = ⟨f(t), sin(kt)⟩ = (1/π) ∫_{−π}^{π} f(t) sin(kt) dt,
c_k = ⟨f(t), cos(kt)⟩ = (1/π) ∫_{−π}^{π} f(t) cos(kt) dt.

The a_0, b_k, c_k are called the Fourier coefficients of f, and the function f_n is called the nth-order Fourier approximation of f.

Proof. This is a direct application of the formula in Theorem 168 for the orthogonal projection of f ∈ C[−π, π] onto the subspace T_n ⊂ C[−π, π].

Note 172. By the Pythagorean Theorem,

||f_n||^2 = ||a_0 (1/√2)||^2 + ||b_1 sin(t)||^2 + ||c_1 cos(t)||^2 + ··· + ||b_n sin(nt)||^2 + ||c_n cos(nt)||^2
         = a_0^2 + b_1^2 + c_1^2 + ··· + b_n^2 + c_n^2.
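The formulas of Theorem 171 can be checked numerically. Below is an illustrative sketch (the helper `fourier_b` is ours): for f(t) = t, integration by parts gives b_k = 2(−1)^{k+1}/k, and a simple midpoint-rule approximation of the defining integral reproduces these values.

```python
import math

def fourier_b(f, k, steps=200000):
    """b_k = (1/pi) * integral from -pi to pi of f(t) sin(kt) dt,
    approximated by the midpoint rule (Theorem 171)."""
    h = 2 * math.pi / steps
    total = 0.0
    for i in range(steps):
        t = -math.pi + (i + 0.5) * h
        total += f(t) * math.sin(k * t)
    return total * h / math.pi

# For f(t) = t, the exact coefficients are b_k = 2 * (-1)**(k + 1) / k.
b1 = fourier_b(lambda t: t, 1)   # exact value 2
b2 = fourier_b(lambda t: t, 2)   # exact value -1
```

By Note 172, the norm of the nth-order approximation of f(t) = t is then sqrt(4 + 4/4 + ··· + 4/n^2), which grows toward ||t|| as n → ∞.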
Theorem 173 (5.5.6, behavior of f_n as n → ∞). As we take higher and higher order approximations f_n of a function f ∈ C[−π, π], the error approaches zero:

lim_{n→∞} ||f − f_n|| = 0.

Thus lim_{n→∞} ||f_n|| = ||f||, and, combining this fact with the preceding note,

a_0^2 + b_1^2 + c_1^2 + ··· + b_n^2 + c_n^2 + ··· = ||f||^2.

Proof. The first equality is proven using advanced calculus. The second one follows from the first, since

||f − f_n||^2 + ||f_n||^2 = ||f||^2

by the Pythagorean Theorem. For the final equality, we use the preceding note to substitute for ||f_n||^2 in lim_{n→∞} ||f_n||^2 = ||f||^2.

Note 174. Applying the final equation of Theorem 173 to the function f(t) = t in C[−π, π], we obtain

4 + 4/4 + 4/9 + ··· + 4/n^2 + ··· = ||t||^2 = (1/π) ∫_{−π}^{π} t^2 dt = (2/3)π^2,

or

Σ_{n=1}^∞ 1/n^2 = 1 + 1/4 + 1/9 + 1/16 + ··· = π^2/6,

i.e.,

π = sqrt( 6 Σ_{n=1}^∞ 1/n^2 ).

6 Determinants

6.1 Introduction to Determinants

Note 175. A 2 × 2 matrix

A = [ a  b ]
    [ c  d ]

is invertible if and only if det(A) = ad − bc ≠ 0. The geometric reason for this is that (the absolute value of) the determinant measures the area of the parallelogram spanned by the columns of A. In particular,

area of parallelogram spanned by (a, c) and (b, d) = ||(a, c, 0) × (b, d, 0)|| = ||(0, 0, ad − bc)|| = |ad − bc| = |det(A)|.

Thus, we have the following chain of equivalent statements:

A is invertible ⇐⇒ the columns (a, c) and (b, d) of A are linearly independent
⇐⇒ the area of the parallelogram spanned by (a, c) and (b, d) is nonzero
⇐⇒ det(A) = ad − bc ≠ 0.

Definition 176 (6.1.1). Consider the 3 × 3 matrix

    [ u_1  v_1  w_1 ]
A = [ u_2  v_2  w_2 ]
    [ u_3  v_3  w_3 ]

with columns u, v, w. We define the determinant of A to be the volume of the parallelepiped spanned by the column vectors u, v, w of A, namely

det(A) = u · (v × w),

also known as the "triple product" of u, v, w. Then, as in the 2 × 2 case,

A is invertible ⇐⇒ the columns u, v, w of A are linearly independent
⇐⇒ the volume of the parallelepiped spanned by u, v, w is nonzero
⇐⇒ det(A) ≠ 0.
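The triple-product definition of the 3 × 3 determinant can be checked directly in code; the following sketch (helper names `cross` and `det3` are ours) also illustrates that linearly dependent columns force the determinant to zero.

```python
def cross(v, w):
    """The cross product v x w in R^3."""
    return [v[1] * w[2] - v[2] * w[1],
            v[2] * w[0] - v[0] * w[2],
            v[0] * w[1] - v[1] * w[0]]

def det3(u, v, w):
    """det of the 3x3 matrix with columns u, v, w,
    computed as the triple product u . (v x w) (Definition 176)."""
    c = cross(v, w)
    return u[0] * c[0] + u[1] * c[1] + u[2] * c[2]
```

For the diagonal matrix with columns (1,0,0), (0,2,0), (0,0,3) this gives 6, the volume of the spanned box; if two columns are parallel, the result is 0, matching the invertibility criterion.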
In terms of the entries of A,

det(A) = u · (v × w) = (u_1, u_2, u_3) · (v_2 w_3 − v_3 w_2, v_3 w_1 − v_1 w_3, v_1 w_2 − v_2 w_1)
= u_1(v_2 w_3 − v_3 w_2) + u_2(v_3 w_1 − v_1 w_3) + u_3(v_1 w_2 − v_2 w_1)
= u_1 v_2 w_3 − u_1 v_3 w_2 + u_2 v_3 w_1 − u_2 v_1 w_3 + u_3 v_1 w_2 − u_3 v_2 w_1.

Note 177. In the final expression above for det(A), note that each term contains exactly one entry from each row and each column of A. We have written the terms so that u, v, w always occur in the same order; only the indices change. In fact, the indices occur once in each of the 3! = 6 possible permutations. The sign on each term in the determinant formula is determined by how many pairs of indices are "out of order," or "inverted," in the corresponding permutation of the numbers 1, 2, 3. If the number of "inversions" is even, then the sign of the term is positive, and if odd, then negative. For example, in the permutation 3, 1, 2, the pair of numbers 1, 3 is inverted, as is the pair 2, 3. In terms of the entries of the matrix A, this can be visualized by noting that v_1 and w_2 are both above and to the right of u_3. Since the number of inversions is even, the term u_3 v_1 w_2 occurs with a positive sign. Armed with this insight, we can define the determinant of a general n × n matrix.

Definition 178 (6.1.3).

• A pattern in an n × n matrix A is a choice of n entries of the matrix so that one entry is chosen in each row and in each column of A. The product of the entries in a pattern P is denoted prod P.

• Two entries in a pattern are said to be inverted if one of them is above and to the right of the other in the matrix.

• The signature of a pattern P is sgn P = (−1)^(number of inversions in P).

• The determinant of A is then defined to be

det A = Σ_P (sgn P)(prod P),

where the sum is taken over all n! patterns P in the matrix.

Note 179. If we separate the positive terms from the negative terms, we can write:

det A = Σ_{patterns P with an even # of inversions} prod P − Σ_{patterns P with an odd # of inversions} prod P.

Note 180. For a 2 × 2 matrix

A = [ a  b ]
    [ c  d ]

there are two patterns, with products ad and bc. They have 0 and 1 inversions, respectively, so

det A = (−1)^0 ad + (−1)^1 bc = ad − bc,

as expected.

Theorem 181 (6.1.4, determinant of a triangular matrix). The determinant of an upper or lower triangular matrix is the product of the diagonal entries of the matrix. In particular, the determinant of a diagonal matrix is the product of its diagonal entries.

Proof. For an upper triangular n × n matrix A, a pattern with nonzero product must contain a_11, and thus a_22, ..., and finally a_nn, so there is only one pattern with potentially nonzero product. This diagonal pattern has no inversions, so its product is equal to det A. The proof for a lower triangular matrix is analogous, and a diagonal matrix is a special case.

6.2 Properties of the Determinant

Theorem 182 (6.2.1, determinant of the transpose). For any square matrix A, det(A^T) = det A.

Proof. Every pattern in A corresponds to a (transposed) pattern in A^T with the same product and number of inversions. Thus the determinants are equal.

Theorem 183 (6.2.2, linearity of the determinant in each row and column). Let v_1, ..., v_{i−1}, v_{i+1}, ..., v_n be fixed row vectors. Then the function T : R^{1×n} → R given by

T(x) = det of the matrix with rows v_1, ..., v_{i−1}, x, v_{i+1}, ..., v_n

is a linear transformation. We say that the determinant is linear in the ith row. Similarly, let v_1, ..., v_{i−1}, v_{i+1}, ..., v_n be fixed column vectors. Then the function T : R^{n×1} → R given by

T(x) = det [ v_1 ··· v_{i−1}  x  v_{i+1} ··· v_n ]

is a linear transformation. We say that the determinant is linear in the ith column.

Proof. The product prod P of a pattern P is linear in each row and column because it contains exactly one factor from each row and one from each column. Since the determinant is a linear combination of pattern products, it is linear in each row and column as well.

Note 184. The preceding theorem states that T(x + y) = T(x) + T(y) and T(kx) = kT(x). For linearity in a row, writing the matrix by its rows,

det(...; x + y; ...) = det(...; x; ...) + det(...; y; ...) and det(...; kx; ...) = k det(...; x; ...),

where "..." stands for the fixed rows above and below the ith row.

Theorem 185 (6.2.3, elementary row operations and determinants). The elementary row operations have the following effects on the determinant of a matrix.

a) If B is obtained from A by a row swap, then det B = − det A.

b) If B is obtained from A by dividing a row of A by a scalar k, then det B = (1/k) det A.

c) If B is obtained from A by adding a multiple of a row of A to another row, then det B = det A.

Proof.

a) Row swap: Each pattern P in A corresponds to a pattern P_swap in B involving the same numbers. If adjacent rows are swapped, then the number of inversions changes by exactly 1. Swapping any two rows amounts to an odd number of swaps of adjacent rows, so the total change in the number of inversions is odd. Thus sgn P_swap = − sgn P for each pattern P of A, which implies

det B = Σ_P (sgn P_swap)(prod P_swap) = Σ_P (− sgn P)(prod P) = − Σ_P (sgn P)(prod P) = − det A.

b) Row division: This follows immediately from linearity of the determinant in each row.

c) Row addition: Suppose B is obtained by adding k times the ith row v_i of A to the jth row v_j of A. By linearity of the determinant in the jth row,

det B = det A + k det C,

where C is the matrix obtained from A by replacing the jth row with v_i, so that C has two equal rows. If we swap the two equal rows, the result is again C, so that det C = − det C (by part (a)) and det C = 0. Thus det B = det A.

Procedure 186 (6.2.4, using ERO's to compute the determinant).
Use ERO's to reduce the matrix A to a matrix B for which the determinant is known (for example, use GJE to obtain B = rref(A)). If you have swapped rows s times and divided rows by the scalars k_1, k_2, ..., k_r to get from A to B, then

det A = (−1)^s k_1 k_2 ··· k_r (det B).

Note 187. Since det(A) = det(A^T), elementary column operations (ECO's) may also be used to compute the determinant. This is because performing an ECO on A is equivalent to first applying the corresponding ERO to A^T, and then taking the transpose once again.

Theorem 188 (6.2.6, determinant of a product). If A and B are n × n matrices, then det(AB) = (det A)(det B).

Proof.

• Suppose first that A is not invertible. Then im(AB) ⊂ im(A) ≠ R^n, so AB is also not invertible. Thus (det A)(det B) = 0 · (det B) = 0 = det(AB).

• If A is invertible, we begin by showing that rref[ A | AB ] = [ I_n | B ]. It is clear that rref[ A | AB ] = [ rref(A) | C ] = [ I_n | C ] for some matrix C. We can associate to [ A | AB ] a matrix equation AX = AB, where X is a variable n × n matrix. Multiplying each side by A^{−1}, we see that the unique solution is X = B. When we apply elementary row operations to [ A | AB ], the set of solutions of the corresponding matrix equation does not change. Thus the matrix equation I_n X = C also has unique solution X = B, and B = C as needed.

Suppose we swap rows s times and divide rows by k_1, k_2, ..., k_r in computing rref[ A | AB ]. Considering the left and right halves of [ A | AB ] separately, and using Procedure 186, we conclude that

det A = (−1)^s k_1 k_2 ··· k_r and det(AB) = (−1)^s k_1 k_2 ··· k_r (det B) = (det A)(det B).

Theorem 189 (6.2.7, determinants of similar matrices). If A is similar to B, then det A = det B.

Proof. By definition, there exists an invertible matrix S such that AS = SB. By the preceding theorem,

(det A)(det S) = det(AS) = det(SB) = (det S)(det B).

Since S is invertible, det S ≠ 0, so we can divide each side by it to obtain det A = det B.
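Procedure 186 translates directly into code: reduce to an upper triangular matrix using only row swaps and row additions, track the sign flips from the swaps, and apply Theorem 181 to the result. A minimal sketch (the function name `det_by_elimination` is ours):

```python
def det_by_elimination(A):
    """Determinant via ERO's (Procedure 186): row swaps flip the sign,
    row additions leave the determinant unchanged (Theorem 185)."""
    A = [row[:] for row in A]   # work on a copy
    n = len(A)
    sign = 1.0
    for j in range(n):
        # find a row with a nonzero entry in column j and swap it up
        p = next((i for i in range(j, n) if abs(A[i][j]) > 1e-12), None)
        if p is None:
            return 0.0          # a zero column: A is not invertible
        if p != j:
            A[j], A[p] = A[p], A[j]
            sign = -sign        # each swap negates the determinant
        # clear the entries below the pivot by row addition
        for i in range(j + 1, n):
            c = A[i][j] / A[j][j]
            A[i] = [a - c * b for a, b in zip(A[i], A[j])]
    d = sign
    for j in range(n):
        d *= A[j][j]            # Theorem 181: triangular determinant
    return d
```

No row divisions are needed here, so the formula det A = (−1)^s det B specializes Procedure 186 with r = 0.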
Theorem 190 (6.2.8, determinant of an inverse). If A is an invertible matrix, then

det(A^{−1}) = 1/(det A) = (det A)^{−1}.

Proof. Taking the determinant of both sides of AA^{−1} = I_n, we get

det(A) det(A^{−1}) = det(AA^{−1}) = det(I_n) = 1.

We divide both sides by det A ≠ 0 to obtain the result.

Theorem 191 (6.2.10, Laplace expansion). For an n × n matrix A, let A_ij be the matrix obtained by omitting the ith row and the jth column of A. The determinant of the (n − 1) × (n − 1) matrix A_ij is called the ijth minor of A. The determinant of A can be computed by Laplace expansion (or cofactor expansion)

• along the ith row:

det A = Σ_{j=1}^n (−1)^{i+j} a_ij det(A_ij), or

• along the jth column:

det A = Σ_{i=1}^n (−1)^{i+j} a_ij det(A_ij).

Definition 192 (6.2.11, determinant of a linear transformation).

• Let T : R^n → R^n be a linear transformation given by T(x) = Ax. Then the determinant of T is defined to be equal to the determinant of A: det T = det A.

• If V is a finite-dimensional vector space with basis B and T : V → V is a linear transformation, then we define the determinant of T to be equal to the determinant of the B-matrix B of T: det T = det B. If we pick a different basis C of V, then the C-matrix C of T is similar to B, so det C = det B, and there is no ambiguity in the definition. Note that if V = R^n, then A is the E-matrix of T, where E = {e_1, ..., e_n} is the standard basis of R^n, so our two definitions agree.

6.3 Geometrical Interpretations of the Determinant; Cramer's Rule

Theorem 193 (6.3.1, determinant of an orthogonal matrix). The determinant of an orthogonal matrix is either 1 or −1.

Proof. If A is orthogonal, then A^T A = I. Taking the determinant of both sides, we see that

(det A)^2 = det(A^T) det(A) = det(A^T A) = det(I) = 1,

so det A is either 1 or −1.

Definition 194 (6.3.2). An orthogonal matrix A with det A = 1 is called a rotation matrix, and the linear transformation T(x) = Ax is called a rotation.
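The Laplace expansion of Theorem 191 is naturally recursive; expanding always along the first row gives a short (if inefficient, O(n!)) implementation. A sketch, with helper names of our choosing:

```python
def minor(A, i, j):
    """The matrix A with row i and column j removed (0-indexed)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det_laplace(A):
    """Laplace expansion along the first row (Theorem 191).
    With 0-indexing, (-1)**(i+j) for i = 1 becomes (-1)**j."""
    n = len(A)
    if n == 1:
        return A[0][0]
    return sum((-1) ** j * A[0][j] * det_laplace(minor(A, 0, j))
               for j in range(n))
```

For integer matrices the result is exact, which makes this a convenient cross-check against the elimination method of Procedure 186.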
Theorem 195 (6.3.3, the determinant and Gram-Schmidt orthogonalization). If A is an n × n matrix with columns v_1, v_2, ..., v_n, then

|det A| = ||v_1^⊥|| ||v_2^⊥|| ··· ||v_n^⊥||,

where v_k^⊥ is the component of v_k perpendicular to span(v_1, ..., v_{k−1}).

Proof. If A is invertible, then by Theorem 140 we can write A = QR, where Q is an orthogonal matrix and R is an upper triangular matrix with diagonal entries r_jj = ||v_j^⊥||. Thus

|det A| = |det(QR)| = |(det Q)(det R)| = |det Q| |det R| = (1)(r_11 r_22 ··· r_nn) = ||v_1^⊥|| ||v_2^⊥|| ··· ||v_n^⊥||.

If A is not invertible, then some v_k is redundant in the list v_1, ..., v_n, so v_k^⊥ = 0 and

||v_1^⊥|| ||v_2^⊥|| ··· ||v_n^⊥|| = 0 = |det A|.

Note 196. In the special case where A has orthogonal columns, the theorem says that

|det A| = ||v_1|| ||v_2|| ··· ||v_n||.

Definition 197.

• The m-parallelepiped defined by the vectors v_1, ..., v_m ∈ R^n is the set of all vectors in R^n of the form c_1 v_1 + ··· + c_m v_m, where 0 ≤ c_i ≤ 1. A 2-parallelepiped is also called a parallelogram.

• The m-volume V(v_1, ..., v_m) of this m-parallelepiped is defined to be

V(v_1, ..., v_m) = ||v_1^⊥|| ||v_2^⊥|| ··· ||v_m^⊥||.

In the case m = n, this is just |det A|, where A is the square matrix with columns v_1, ..., v_n ∈ R^n.

Theorem 198 (6.3.6, volume of an m-parallelepiped in R^n). The m-volume of the m-parallelepiped defined by v_1, ..., v_m ∈ R^n is

V(v_1, ..., v_m) = sqrt(det(A^T A)),

where A is the n × m matrix with columns v_1, ..., v_m.

Proof. If the columns of A are linearly independent, then consider the QR factorization A = QR. Since the columns of Q are orthonormal, Q^T Q = I_m, so

A^T A = (QR)^T (QR) = R^T Q^T Q R = R^T R,

and

det(A^T A) = det(R^T R) = det(R^T) det(R) = (det R)^2 = ( ||v_1^⊥|| ||v_2^⊥|| ··· ||v_m^⊥|| )^2 = (V(v_1, ..., v_m))^2.

Note 199. If m = n in the preceding theorem, then the m-volume is

sqrt(det(A^T A)) = sqrt(det(A^T) det(A)) = sqrt(det(A) det(A)) = |det(A)|,

as noted above.

Theorem 200 (6.3.7, expansion factor). Let T : R^n → R^n be a linear transformation. The image of the n-parallelepiped Ω defined by vectors v_1, ..., v_n is the n-parallelepiped T(Ω) defined by the vectors T(v_1), ..., T(v_n). The ratio between the n-volumes of T(Ω) and Ω, called the expansion factor of T, is just |det T|:

V(T(v_1), ..., T(v_n)) = |det T| V(v_1, ..., v_n).

Proof. The first statement follows from the linearity of T:

T(c_1 v_1 + ··· + c_n v_n) = c_1 T(v_1) + ··· + c_n T(v_n).

To compute the expansion factor, suppose T(x) = Ax, and let B be the matrix with columns v_1, ..., v_n. Then AB has columns T(v_1), ..., T(v_n), so

V(T(v_1), ..., T(v_n)) = |det(AB)| = |det A| |det B| = |det T| V(v_1, ..., v_n).

Theorem 201 (6.3.8, Cramer's Rule). Given a linear system Ax = b, with A invertible, define A_{b,j} to be the matrix obtained by replacing the jth column of A by b. Then the components x_j of the unique solution vector x are

x_j = det(A_{b,j}) / det(A).

Proof. Write A in terms of its columns, as A = [ v_1 ··· v_j ··· v_n ]. If x is the solution of the system Ax = b, then

det(A_{b,j}) = det[ v_1 ··· b ··· v_n ]
             = det[ v_1 ··· Ax ··· v_n ]
             = det[ v_1 ··· (x_1 v_1 + ··· + x_j v_j + ··· + x_n v_n) ··· v_n ]
             = det[ v_1 ··· x_j v_j ··· v_n ]
             = x_j det[ v_1 ··· v_j ··· v_n ]
             = x_j det A,

where the fourth equals sign uses linearity of the determinant in the jth column together with the fact that a determinant with two equal columns is zero.

Theorem 202 (6.3.9, adjoint and inverse of a matrix). Let A be an invertible n × n matrix. Define the classical adjoint adj(A) of A to be the n × n matrix whose ijth entry is (−1)^{i+j} det(A_ji). Then

A^{−1} = (1/det A) adj(A).

Note 203. In the 2 × 2 case, if

A = [ a  b ]
    [ c  d ]

then we get the familiar formula

A^{−1} = (1/(ad − bc)) [  d  −b ]
                       [ −c   a ]

7 Eigenvalues and Eigenvectors

7.1 Dynamical Systems and Eigenvectors: An Introductory Example

7.2 Finding the Eigenvalues of a Matrix

Definition 204 (7.1.1). Let A be an n × n matrix. A nonzero vector v ∈ R^n is called an eigenvector of A if Av is a scalar multiple of v, i.e.,

Av = λv

for some scalar λ. The scalar λ is called the eigenvalue of A associated with the eigenvector v. We sometimes call v a λ-eigenvector.

Note 205. Eigenvalues may be 0, but eigenvectors may not be 0.
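Definition 204 can be checked directly in code: v is a λ-eigenvector exactly when every component of Av is λ times the corresponding component of v. A small sketch (the helper names `matvec` and `eigenvalue_for` are ours, and the sample matrix is our own illustration):

```python
def matvec(A, v):
    """The product Av for a matrix A (list of rows) and vector v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def eigenvalue_for(A, v, tol=1e-9):
    """If v is an eigenvector of A, return its eigenvalue; otherwise None."""
    Av = matvec(A, v)
    # read off the candidate lambda from a nonzero component of v
    k = next(i for i, x in enumerate(v) if abs(x) > tol)
    lam = Av[k] / v[k]
    # v is an eigenvector iff Av = lam * v holds in every component
    if all(abs(a - lam * x) < tol for a, x in zip(Av, v)):
        return lam
    return None

A = [[2, 1], [1, 2]]
lam = eigenvalue_for(A, [1, 1])   # Av = (3, 3) = 3v, so lam = 3
```

For this A, the vector (1, −1) is also an eigenvector (with eigenvalue 1), while (1, 0) is not an eigenvector at all.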
Eigen is German for “proper” or “characteristic.” Theorem 206 (geometric interpretation). A vector v ∈ Rn is an eigenvector of an n×n matrix A if and only if the line span(v) through the origin in Rn is mapped to itself by the linear transformation T (x) = Ax, i.e., x ∈ span(v) =⇒ Ax ∈ span(v). Proof. Suppose v is a λ-eigenvector of A. Any element of span(v) is equal to kv for some scalar k. We check that A(kv) ∈ span(v): A(kv) = k(Av) = k(λv) = (kλ)v. Conversely, suppose the line span(v) is mapped to itself by T (x) = Ax. Since v ∈ span(v), we must have Av ∈ span(v), so Av = λv for some scalar λ, which means that v is an eigenvector of A. Theorem 207 (7.2.1, finding eigenvalues). A scalar λ is an eigenvalue of an n × n matrix A if and only if det(A − λIn ) = 0. The expression fA (λ) = det(A − λIn ) is called the characteristic polynomial of A. Proof. Note that Av = λv ⇐⇒ Av − λv = 0 ⇐⇒ Av − λ(In v) = 0 ⇐⇒ (A − λIn )v = 0, so that we have the following chain of equivalent statements: λ is an eigenvalue of A ⇐⇒ There exists v 6= 0 such that Av = λv ⇐⇒ There exists v 6= 0 such that (A − λIn )v = 0 ⇐⇒ ker(A − λIn ) 6= {0} ⇐⇒ A − λIn is not invertible ⇐⇒ det(A − λIn ) = 0. 55 Theorem 208 (7.2.2, eigenvalues of a triangular matrix). The eigenvalues of a triangular matrix are its diagonal entries. Proof. If A is an n × n triangular matrix, then so is A − λIn . The characteristic polynomial is therefore det(A − λIn ) = (a11 − λ)(a22 − λ) · · · (ann − λ), with roots a11 , a22 , . . . , ann . Theorem 209 (7.2.5, characteristic polynomial). The characteristic polynomial fA (λ) = det(A − λIn ) is a polynomial of degree n in the variable λ, of the form fA (λ) = (−λ)n + (tr A)(−λ)n−1 + · · · + det A. Proof. We have a11 − λ a21 fA (λ) = det(A − λIn ) = . .. an1 a12 a22 − λ .. . ··· ··· .. . a1n a2n .. . an2 ··· ann − λ . 
The product of each pattern is a product of scalars aij and entries of the form aii − λ, which is a polynomial in λ with degree equal to the number of diagonal entries in the pattern. The determinant, as a sum of these products (or their opposites), is a sum of polynomials, and hence a polynomial.

The diagonal pattern contributes the product

(a11 − λ)(a22 − λ) · · · (ann − λ)
= (−λ)^n + (a11 + a22 + · · · + ann)(−λ)^(n−1) + (lower degree terms)
= (−λ)^n + (tr A)(−λ)^(n−1) + (lower degree terms).

Any other pattern involves at least two entries off the diagonal, so its product is of degree ≤ n − 2. Thus the degree of fA(λ) is n, with the leading two terms as claimed. The constant term is fA(0) = det A.

Definition 210 (7.2.6). An eigenvalue λ0 of a square matrix A has algebraic multiplicity k if λ0 is a root of multiplicity k of the characteristic polynomial fA(λ), meaning that we can write

fA(λ) = (λ0 − λ)^k g(λ)

for some polynomial g(λ) with g(λ0) ≠ 0. We write AM(λ0) = k.

Theorem 211 (7.2.7, number of eigenvalues). An n × n matrix A has at most n real eigenvalues, even if they are counted with their algebraic multiplicities. If n is odd, then A has at least one real eigenvalue. In summary,

Σ_{eigenvalues λ of A} AM(λ) ≤ n,

with Σ_{eigenvalues λ of A} AM(λ) ≥ 1 when n is odd.

Proof. The sum of the algebraic multiplicities of the eigenvalues of A is just the number of linear factors in the complete factorization of the characteristic polynomial fA(λ) (over the real numbers), which is clearly ≤ n.

If n is odd, then

lim_{λ→−∞} fA(λ) = ∞ and lim_{λ→∞} fA(λ) = −∞.

Thus there is some negative number a with fA(a) > 0 and some positive number b with fA(b) < 0. By the Intermediate Value Theorem, there exists a real number c between a and b such that fA(c) = 0, so that c is an eigenvalue of A.

Theorem 212 (7.2.8, determinant and trace in terms of eigenvalues). If the characteristic polynomial of an n × n matrix A factors completely into linear factors, so that A has n eigenvalues λ1, λ2, . . .
, λn (counted with their algebraic multiplicities), then

det A = λ1 λ2 · · · λn and tr A = λ1 + λ2 + · · · + λn.

Proof. Since the characteristic polynomial factors completely, it can be written

fA(λ) = det(A − λIn) = (λ1 − λ)(λ2 − λ) · · · (λn − λ).

Substituting 0 for λ, we get fA(0) = det A = λ1 λ2 · · · λn. The trace result is an exercise: compare the coefficient of (−λ)^(n−1) in the factored form with the coefficient tr A from Theorem 209.

7.3 Finding the Eigenvectors of a Matrix

Definition 213 (7.3.1). Let λ be an eigenvalue of an n × n matrix A. The λ-eigenspace of A, denoted Eλ, is defined to be

Eλ = ker(A − λIn) = {v ∈ Rn : Av = λv} = {λ-eigenvectors of A} ∪ {0}.

Note 214. An eigenspace is a subspace, since it is the kernel of the matrix A − λIn. All of the nonzero vectors in Eλ are λ-eigenvectors.

Definition 215 (7.3.2). The dimension of the λ-eigenspace Eλ = ker(A − λIn) is called the geometric multiplicity of λ, written GM(λ). We have

GM(λ) = dim(Eλ) = dim(ker(A − λIn)) = nullity(A − λIn) = n − rank(A − λIn).

Definition 216 (7.3.3). Let A be an n × n matrix. A basis of Rn consisting of eigenvectors of A is called an eigenbasis for A.

Theorem 217 (eigenvectors with distinct eigenvalues are linearly independent). Let A be a square matrix. If v1, v2, . . . , vs are eigenvectors of A with distinct eigenvalues, then v1, v2, . . . , vs are linearly independent.

Proof. We use proof by contradiction. Suppose v1, . . . , vs are linearly dependent, and let vm be the first redundant vector in this list, with

vm = c1 v1 + · · · + cm−1 vm−1.

Suppose Avi = λi vi. Since the eigenvector vm is not 0, there must be some nonzero coefficient ck. Multiplying the equation

vm = c1 v1 + · · · + ck vk + · · · + cm−1 vm−1

by A, we get

Avm = A(c1 v1 + · · · + ck vk + · · · + cm−1 vm−1)
Avm = c1 Av1 + · · · + ck Avk + · · · + cm−1 Avm−1
λm vm = c1 λ1 v1 + · · · + ck λk vk + · · · + cm−1 λm−1 vm−1.
Multiplying the same equation instead by λm, we get

λm vm = c1 λm v1 + · · · + ck λm vk + · · · + cm−1 λm vm−1,

which, when subtracted from our result above, yields

0 = (λm − λm)vm = c1 (λ1 − λm)v1 + · · · + ck (λk − λm)vk + · · · + cm−1 (λm−1 − λm)vm−1.

Since ck and λk − λm are nonzero, we have a nontrivial linear relation among the vectors v1, . . . , vm−1, contradicting the minimality of m.

Note 218. Part (a) of the following theorem is a generalization of the preceding theorem, allowing multiple (linearly independent) eigenvectors with a single eigenvalue.

Theorem 219 (7.3.4, eigenbases and geometric multiplicities).

a) Let A be an n × n matrix. If we concatenate bases for each eigenspace of A, then the resulting eigenvectors v1, . . . , vs will be linearly independent. (Note that s is the sum of the geometric multiplicities of the eigenvalues of A.)

b) There exists an eigenbasis for an n × n matrix A if and only if the sum of the geometric multiplicities of its eigenvalues equals n:

Σ_{eigenvalues λ of A} GM(λ) = n.

Proof. a) We use proof by contradiction. Suppose v1, . . . , vs are linearly dependent, and let vm be the first redundant vector in this list, with

vm = c1 v1 + · · · + cm−1 vm−1.

Suppose Avi = λi vi. There must be at least one nonzero coefficient ck such that λk ≠ λm, since vm and the other vectors vi with the same eigenvalue have been chosen to be linearly independent. Multiplying the equation

vm = c1 v1 + · · · + ck vk + · · · + cm−1 vm−1

by A, we get

Avm = A(c1 v1 + · · · + ck vk + · · · + cm−1 vm−1)
Avm = c1 Av1 + · · · + ck Avk + · · · + cm−1 Avm−1
λm vm = c1 λ1 v1 + · · · + ck λk vk + · · · + cm−1 λm−1 vm−1.

Multiplying the same equation instead by λm, we get

λm vm = c1 λm v1 + · · · + ck λm vk + · · · + cm−1 λm vm−1,

which, when subtracted from our result above, yields

0 = (λm − λm)vm = c1 (λ1 − λm)v1 + · · · + ck (λk − λm)vk + · · · + cm−1 (λm−1 − λm)vm−1.
Since ck and λk − λm are nonzero, we have a nontrivial linear relation among the vectors v1, . . . , vm−1, contradicting the minimality of m.

b) Any linearly independent set of eigenvectors can contain at most GM(λ) vectors from Eλ, so the sum s of the geometric multiplicities is an upper bound on the size of a linearly independent set of eigenvectors. By part (a), there always exists a linearly independent set of s eigenvectors. These s linearly independent vectors form a basis of Rn if and only if s = dim(Rn) = n.

Theorem 220 (7.3.5, n distinct eigenvalues). If an n × n matrix A has n distinct eigenvalues, then there exists an eigenbasis for A.

Proof. For each of the n eigenvalues, the geometric multiplicity is at least 1 (in fact they must all equal 1 in this case), so the sum of the geometric multiplicities is n. The preceding theorem implies that an eigenbasis exists.

Theorem 221 (7.3.6, eigenvalues of similar matrices). Suppose A is similar to B. Then

a) fA(λ) = fB(λ). (study only this part for the quiz)
b) nullity(A) = nullity(B) and rank(A) = rank(B).
c) A and B have the same eigenvalues, with the same algebraic and geometric multiplicities.
d) det A = det B and tr A = tr B.

Proof. a) If B = S−1 AS and A, B are n × n matrices, then

fB(λ) = det(B − λIn)
      = det(S−1 AS − λS−1 In S)
      = det(S−1 (A − λIn)S)
      = (det S−1)(det(A − λIn))(det S)
      = (det S)−1 (det S)(det(A − λIn))
      = det(A − λIn)
      = fA(λ).

b) Suppose SB = AS. Let p = nullity(B) and consider a basis v1, . . . , vp of ker(B). Then

A(Svi) = S(Bvi) = S(0) = 0,

so Sv1, . . . , Svp ∈ ker(A). Furthermore, we show that Sv1, . . . , Svp are linearly independent. Any linear relation c1 Sv1 + · · · + cp Svp = 0 can be rewritten S(c1 v1 + · · · + cp vp) = 0. Multiplying by S−1 yields a linear relation c1 v1 + · · · + cp vp = S−1 0 = 0, which must be trivial, so c1 = · · · = cp = 0.
We have found p = nullity(B) linearly independent vectors in ker(A), which implies that nullity(A) ≥ nullity(B). A similar argument shows that nullity(A) ≤ nullity(B), so the nullities are equal. For the ranks, we use the Rank-Nullity Theorem:

rank(A) = n − nullity(A) = n − nullity(B) = rank(B).

c) A and B have the same eigenvalues and algebraic multiplicities by part (a). Since A − λIn is similar to B − λIn (see the proof of part (a)), the geometric multiplicities of an eigenvalue λ are equal by part (b): nullity(A − λIn) = nullity(B − λIn).

d) This follows from part (a), since determinant and trace are coefficients of the characteristic polynomial (up to a fixed sign).

Note 222. Similar matrices generally do not have the same eigenvectors.

Theorem 223 (7.3.7, algebraic and geometric multiplicity). If λ is an eigenvalue of A, then GM(λ) ≤ AM(λ). Combining this with earlier results, we get

Σ_{eigenvalues λ of A} GM(λ) ≤ Σ_{eigenvalues λ of A} AM(λ) ≤ n.

7.4 Diagonalization

Theorem 224 (7.4.1, matrix of a linear transformation with respect to an eigenbasis). Let T : Rn → Rn be a linear transformation given by T(x) = Ax. A basis D of Rn is an eigenbasis for A if and only if the D-matrix of T is diagonal.

Proof. Let D = (v1, v2, . . . , vn). The D-matrix of T is diagonal if and only if its ith column, [T(vi)]D = [Avi]D, is equal to λi ei for some λi, i = 1, 2, . . . , n:

[ λ1 e1  λ2 e2  · · ·  λn en ] = [ λ1  0   · · ·  0  ]
                                 [ 0   λ2  · · ·  0  ]
                                 [ .   .   .      .  ]
                                 [ 0   0   · · ·  λn ].

But [Avi]D = λi ei if and only if Avi = λi vi, which is the definition of D being an eigenbasis.

Definition 225 (7.4.2). Consider a linear transformation T : Rn → Rn given by T(x) = Ax.

• T is called diagonalizable if there exists a basis D of Rn such that the D-matrix of T is diagonal.

• A is called diagonalizable if A is similar to some diagonal matrix D, i.e., if there exists an invertible matrix S such that S−1 AS is diagonal.

Theorem 226 (7.4.3, eigenbases and diagonalizability).
For a linear transformation T : Rn → Rn given by T(x) = Ax, the following statements are equivalent:

1. T is diagonalizable.
2. A is diagonalizable.
3. There exists an eigenbasis for A.

Proof. 1 and 2 are equivalent because the D-matrix for T is equal to D = S−1 AS, where the columns of S are the basis vectors in D. 1 and 3 are equivalent by Theorem 224.

Procedure 227 (7.4.4, diagonalizing a matrix). To diagonalize an n × n matrix A (if possible):

1. Find the eigenvalues of A, i.e., the roots of the characteristic polynomial fA(λ) = det(A − λIn).

2. For each eigenvalue λ, find a basis of the eigenspace Eλ = ker(A − λIn).

3. A is diagonalizable if and only if the dimensions of the eigenspaces add up to n. In this case, concatenate the bases of the eigenspaces found in step 2 to obtain an eigenbasis D = (v1, v2, . . . , vn) for A. Then the matrix D = S−1 AS is diagonal, where S = [v1 v2 · · · vn], and the ith diagonal entry of D is the eigenvalue λi associated with vi:

[ λ1  0   · · ·  0  ]
[ 0   λ2  · · ·  0  ]  =  [v1 v2 · · · vn]−1 A [v1 v2 · · · vn].
[ .   .   .      .  ]
[ 0   0   · · ·  λn ]

Theorem 228 (7.4.5, powers of a diagonalizable matrix). Suppose a matrix A is diagonalizable, with

S−1 AS = D = [ λ1  0   · · ·  0  ]
             [ 0   λ2  · · ·  0  ]
             [ .   .   .      .  ]
             [ 0   0   · · ·  λn ].

Then, for any positive integer t,

A^t = S D^t S−1 = S [ λ1^t  0     · · ·  0    ]
                    [ 0     λ2^t  · · ·  0    ]
                    [ .     .     .      .    ]
                    [ 0     0     · · ·  λn^t ] S−1.

Proof. Solving for A in S−1 AS = D, we obtain A = SDS−1. Thus

A^t = (SDS−1)^t = (SDS−1)(SDS−1) · · · (SDS−1)   (t factors)
    = S (DD · · · D) S−1   (the inner factors S−1 S cancel)
    = S D^t S−1.

Definition 229 (7.4.6, eigenvalues of a linear transformation).

• Let V be a linear space and T : V → V a linear transformation.
A nonzero element f ∈ V is called an eigenvector (or an eigenfunction, eigenmatrix, etc., depending on the nature of V) if T(f) is a scalar multiple of f, i.e.,

T(f) = λf

for some scalar λ. The scalar λ is called the eigenvalue associated with the eigenvector f.

• If V is finite dimensional, then a basis D of V consisting of eigenvectors of T is called an eigenbasis for T.

• The transformation T is called diagonalizable if there exists some basis D of V such that the D-matrix of T is diagonal.

Theorem 230 (eigenbases and diagonalization). A linear transformation T : V → V is diagonalizable if and only if there exists an eigenbasis for T.

Proof. Let D = (f1, f2, . . . , fn) be a basis of V. Then the D-matrix of T is diagonal if and only if its ith column [T(fi)]D is equal to λi ei for some λi, i = 1, 2, . . . , n. This condition is equivalent to T(fi) = λi fi, which is the definition of D being an eigenbasis for T.

Procedure 231 (diagonalizing a linear transformation). Let V be a finite dimensional linear space. To diagonalize a linear transformation T : V → V (if possible):

1. Sometimes you can find an eigenbasis D directly, in which case you are done. If not, then choose any basis B = (f1, . . . , fn) of V.

2. Compute the B-matrix of T:

B = [ [T(f1)]B  · · ·  [T(fn)]B ].

3. Find the eigenvalues of B, i.e., the roots of the characteristic polynomial fB(λ) = det(B − λIn).

4. For each eigenvalue λ, find a basis of the eigenspace Eλ = ker(B − λIn).

5. B (and hence T) is diagonalizable if and only if the dimensions of the eigenspaces add up to n. In this case, concatenate the bases of the eigenspaces found in step 4 to obtain an eigenbasis D′ = (v1, v2, . . . , vn) for B.

6. The vi are the B-coordinate vectors of an eigenbasis D = (g1, . . . , gn) for T, that is, [gi]B = vi, or gi = LB−1(vi).
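Procedure 227 and Theorem 228 can be traced through a small numeric example. In the sketch below the 2 × 2 matrix is an illustrative choice, and the eigenbasis was found by hand from ker(A − λI2):

```python
def matmul(A, B):
    """Product of two 2x2 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(S):
    """Inverse of a 2x2 matrix via the formula from Note 203."""
    (a, b), (c, d) = S
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# A has eigenvalues 1 and 3, with eigenvectors (1, -1) and (1, 1).
A = [[2, 1], [1, 2]]
S = [[1, 1], [-1, 1]]                  # columns of S are the eigenbasis

D = matmul(inv2(S), matmul(A, S))      # S^{-1} A S
print(D)                               # [[1.0, 0.0], [0.0, 3.0]]

# Theorem 228: A^t = S D^t S^{-1}, with D^t computed entrywise on the diagonal.
t = 5
Dt = [[D[0][0] ** t, 0.0], [0.0, D[1][1] ** t]]
At = matmul(S, matmul(Dt, inv2(S)))
print(At)                              # [[122.0, 121.0], [121.0, 122.0]]
```

Computing A^5 by repeated multiplication gives the same matrix, but via the diagonal form only the two diagonal entries need to be raised to the fifth power.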
This procedure is illustrated by a pair of commutative diagrams (not reproduced here) relating the maps T : V → V, B : Rn → Rn, and the diagonal matrix via the coordinate isomorphisms LB, LD, and LD′: the eigenvector gi of T corresponds under LB to the eigenvector vi of B, which corresponds under LD′ to the standard vector ei, an eigenvector of the diagonal matrix; each map scales its eigenvector by λi.

7.5 Complex Eigenvalues

Definition 232. A field F is a set F together with an addition rule and a multiplication rule:

• For a, b ∈ F, there is an element a + b ∈ F.
• For a, b ∈ F, there is an element ab ∈ F.

which satisfy the following ten properties for all a, b, c ∈ F:

1. addition is associative: (a + b) + c = a + (b + c).
2. addition is commutative: a + b = b + a.
3. an additive identity exists: there is an element n ∈ F such that a + n = a for all a ∈ F. This n is unique and is denoted by 0.
4. additive inverses exist: for each a ∈ F, there exists a b ∈ F such that a + b = 0. This b is unique and is denoted by (−a).
5. multiplication is associative: a(bc) = (ab)c.
6. multiplication is commutative: ab = ba.
7. a multiplicative identity exists: there is an element e ∈ F such that ae = a for all a ∈ F. This e is unique and is denoted by 1.
8. multiplicative inverses exist: for each nonzero a ∈ F, there exists a b ∈ F such that ab = 1. This b is unique and is denoted by a−1.
9. multiplication distributes over addition: a(b + c) = ab + ac.
10. the identities are distinct: 0 ≠ 1.

Note 233.

• The existence of additive inverses allows us to subtract, while the existence of multiplicative inverses allows us to divide (by nonzero elements).

• In this course, we have studied linear algebra over the field R of real numbers. Other common fields include the complex numbers C and the rational numbers Q. Many other fields exist, such as the field F2 = {0, 1} of two elements, for which 1 + 1 = 0.

• The linear algebraic concepts we have studied in this course make sense over any field of scalars, with the exception of the material in Chapter 5 involving dot products.
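The field F2 from Note 233 is small enough that several of the field axioms of Definition 232 can be checked exhaustively. A quick sketch, modeling F2 with arithmetic modulo 2:

```python
F2 = (0, 1)

def add(a, b):
    return (a + b) % 2   # in particular 1 + 1 = 0, as noted above

def mul(a, b):
    return (a * b) % 2

# Commutativity, associativity, and distributivity hold for every choice
# of elements; with only two elements, we can simply try them all.
for a in F2:
    for b in F2:
        assert add(a, b) == add(b, a) and mul(a, b) == mul(b, a)
        for c in F2:
            assert add(add(a, b), c) == add(a, add(b, c))
            assert mul(mul(a, b), c) == mul(a, mul(b, c))
            assert mul(a, add(b, c)) == add(mul(a, b), mul(a, c))

# 0 and 1 are the additive and multiplicative identities, and the only
# nonzero element, 1, is its own multiplicative inverse.
assert all(add(a, 0) == a and mul(a, 1) == a for a in F2)
assert mul(1, 1) == 1
print(add(1, 1))  # 0
```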
Theorem 234 (7.5.2, fundamental theorem of algebra). Any polynomial p(λ) with complex coefficients splits, meaning that it can be written as a product of linear factors

p(λ) = k(λ − λ1)(λ − λ2) · · · (λ − λn)

for some complex numbers k, λ1, λ2, . . . , λn.

Proof. This is a result in complex analysis.

Theorem 235 (7.5.4, number of complex eigenvalues). A complex n × n matrix A has exactly n complex eigenvalues, if they are counted with their algebraic multiplicities. In other words,

Σ_{eigenvalues λ of A} AM(λ) = n.

Proof. The sum of the algebraic multiplicities is the number of linear factors in the complete factorization of fA(λ), which equals n by the fundamental theorem of algebra.

Theorem 236 (7.5.3, real 2 × 2 matrices with complex eigenvalues). If A is a real 2 × 2 matrix with eigenvalues a ± ib (where b ≠ 0), and if v + iw is an eigenvector of A with eigenvalue a + ib, then

S−1 AS = [ a  −b ],   where S = [ w  v ].
         [ b   a ]

Thus A is similar, over the real numbers, to a rotation-scaling matrix

[ a  −b ] = √(a^2 + b^2) [ cos θ  −sin θ ],
[ b   a ]                [ sin θ   cos θ ]

where cos θ = a/√(a^2 + b^2) and sin θ = b/√(a^2 + b^2).

Proof. By Theorem 226 (applied over C),

P−1 AP = [ a + ib  0      ],   where P = [ v + iw  v − iw ].
         [ 0       a − ib ]

Similarly, we can diagonalize the rotation-scaling matrix above to obtain

R−1 [ a  −b ] R = [ a + ib  0      ],   where R = [ i  −i ].
    [ b   a ]     [ 0       a − ib ]              [ 1   1 ]

Thus

P−1 AP = R−1 [ a  −b ] R,
             [ b   a ]

and

[ a  −b ] = R(P−1 AP)R−1 = S−1 AS,
[ b   a ]

where S = PR−1 and S−1 = (PR−1)−1 = RP−1. We check that

S = PR−1 = (1/2i) [ v + iw  v − iw ] [  1  i ] = [ w  v ].
                                     [ −1  i ]

Theorem 237 (7.5.5, determinant and trace in terms of eigenvalues). For any n × n complex matrix A with complex eigenvalues λ1, λ2, . . . , λn, listed with their algebraic multiplicities,

det A = λ1 λ2 · · · λn and tr A = λ1 + λ2 + · · · + λn.

Proof. The proof is the same as for Theorem 212.
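Theorem 236 can be sanity-checked numerically for a rotation-scaling matrix itself. In the sketch below the values a = 3, b = 4 are illustrative choices:

```python
import cmath
import math

a, b = 3.0, 4.0                  # rotation-scaling matrix [[a, -b], [b, a]]
tr, det = 2 * a, a * a + b * b   # trace = 2a, determinant = a^2 + b^2

# The roots of the characteristic polynomial lambda^2 - tr*lambda + det
# are the complex eigenvalues a +- ib.
root = cmath.sqrt(tr * tr - 4 * det)
lam1, lam2 = (tr + root) / 2, (tr - root) / 2
assert abs(lam1 - complex(a, b)) < 1e-9 and abs(lam2 - complex(a, -b)) < 1e-9

# Scaling factor sqrt(a^2 + b^2) = |a + ib| and rotation angle theta,
# with cos(theta) = a/r and sin(theta) = b/r as in the theorem.
r = math.hypot(a, b)
theta = math.atan2(b, a)
assert abs(r * math.cos(theta) - a) < 1e-12
assert abs(r * math.sin(theta) - b) < 1e-12
print(r)  # 5.0
```

Geometrically, this matrix scales the plane by 5 and rotates it through the angle θ = arctan(4/3), and its eigenvalues 3 ± 4i have absolute value 5, matching the scaling factor.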