LECTURE II: ALGEBRA OF MATRICES AND SYSTEMS OF LINEAR EQUATIONS
MAT 204 - FALL 2006, PRINCETON UNIVERSITY
ALFONSO SORRENTINO

[Read also § 1.1-7, 2.2,4, 4.1-4]

1. Algebra of Matrices

Definition. Let $m, n$ be two positive integers. An $m$ by $n$ real matrix is an ordered set of $mn$ elements, placed on $m$ rows and $n$ columns:
\[ A = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{pmatrix}. \]
We will denote by $M_{m,n}(\mathbb{R})$ the set of $m$ by $n$ real matrices. Obviously, the set $M_{1,1}(\mathbb{R})$ can be identified with $\mathbb{R}$ (the set of real numbers).

NOTATION: Let $A \in M_{m,n}(\mathbb{R})$. To simplify the notation, sometimes we will abbreviate $A = (a_{ij})$. The element $a_{ij}$ is placed on the $i$-th row and $j$-th column; it will also be denoted by $(A)_{ij}$. Moreover, we will denote by $A^{(i)}$ the $i$-th row $(a_{i1}, \dots, a_{in})$ of $A$ and by $A_{(j)} = \begin{pmatrix} a_{1j} \\ \vdots \\ a_{mj} \end{pmatrix}$ the $j$-th column of $A$. Therefore:
\[ A = \begin{pmatrix} A_{(1)} & \dots & A_{(n)} \end{pmatrix} = \begin{pmatrix} A^{(1)} \\ \vdots \\ A^{(m)} \end{pmatrix}. \]

Remark. For any integer $n \ge 1$, we can consider the cartesian product $\mathbb{R}^n$. Obviously, there exists a bijection between $\mathbb{R}^n$ and $M_{1,n}(\mathbb{R})$ or $M_{n,1}(\mathbb{R})$.

Definition. Let $A \in M_{m,n}(\mathbb{R})$. The transpose of $A$ is the matrix $B \in M_{n,m}(\mathbb{R})$ such that:
\[ b_{ij} = a_{ji} \quad \text{for any } i = 1, \dots, n \text{ and } j = 1, \dots, m. \]
The matrix $B$ will be denoted by $A^T$. In few words, the $i$-th row of $A$ is simply the $i$-th column of $A^T$:
\[ (A^T)_{ij} = (A)_{ji}, \qquad (A^T)^{(j)} = (A_{(j)})^T \quad \text{and} \quad (A^T)_{(i)} = (A^{(i)})^T. \]
Moreover, $(A^T)^T = A$.

We will see that the space of matrices can be endowed with a structure of vector space. Let us start by defining two operations in the set of matrices $M_{m,n}(\mathbb{R})$: the addition and the scalar multiplication.

• Addition:
\[ + : M_{m,n}(\mathbb{R}) \times M_{m,n}(\mathbb{R}) \longrightarrow M_{m,n}(\mathbb{R}), \qquad (A, B) \longmapsto A + B, \]
where $A + B$ is such that: $(A + B)_{ij} = (A)_{ij} + (B)_{ij}$.
• Scalar multiplication:
\[ \cdot : \mathbb{R} \times M_{m,n}(\mathbb{R}) \longrightarrow M_{m,n}(\mathbb{R}), \qquad (c, A) \longmapsto cA, \]
where $cA$ is such that: $(cA)_{ij} = c\,(A)_{ij}$.

Proposition 1. $(M_{m,n}(\mathbb{R}), +, \cdot)$ is a vector space. Moreover, its dimension is $mn$.

Proof.
First of all, let us observe that the zero vector is the zero matrix $0$ [i.e., the matrix with all entries equal to zero: $0_{ij} = 0$]; the opposite element (with respect to the addition) of a matrix $A$ will be a matrix $-A$, such that $(-A)_{ij} = -(A)_{ij}$. We leave to the reader to complete the exercise of verifying that all the axioms that define a vector space hold in this setting.

Let us compute its dimension. It suffices to determine a basis. Let us consider the following $mn$ matrices $E^{hk}$ (for $h = 1, \dots, m$ and $k = 1, \dots, n$), such that:
\[ (E^{hk})_{ij} = \begin{cases} 0 & \text{if } (i, j) \ne (h, k), \\ 1 & \text{if } (i, j) = (h, k) \end{cases} \]
[in other words, the only non-zero element of $E^{hk}$ is the one on the $h$-th row and $k$-th column]. It is easy to verify that for any $A = (a_{ij}) \in M_{m,n}(\mathbb{R})$, we have:
\[ A = \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} E^{ij}; \]
therefore this set of $mn$ matrices is a spanning set of $M_{m,n}(\mathbb{R})$. Let us verify that they are also linearly independent. In fact,
\[ \sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij} E^{ij} = 0 \iff a_{ij} = 0 \ \text{ for all } i = 1, \dots, m \text{ and } j = 1, \dots, n. \]
This shows that $\{E^{11}, \dots, E^{mn}\}$ is a basis.

Definition. Let $A \in M_{m,n}(\mathbb{R})$.
• $A$ is a square matrix (of order $n$), if $m = n$; the $n$-tuple $(a_{11}, \dots, a_{nn})$ is called the diagonal of $A$. The set $M_{n,n}(\mathbb{R})$ will be simply denoted by $M_n(\mathbb{R})$.
• A square matrix $A = (a_{ij}) \in M_n(\mathbb{R})$ is said to be upper triangular [resp. lower triangular] if $a_{ij} = 0$ for all $i > j$ [resp. if $a_{ij} = 0$ for all $i < j$].
• A square matrix $A \in M_n(\mathbb{R})$ is called diagonal if it is both upper and lower triangular [i.e., the only non-zero elements are on the diagonal: $a_{ij} = 0$ for all $i \ne j$].
• A diagonal matrix is called scalar, if $a_{11} = a_{22} = \dots = a_{nn}$.
• The scalar matrix with $a_{11} = a_{22} = \dots = a_{nn} = 1$ is called unit matrix, and will be denoted by $I_n$.
• A square matrix $A \in M_n(\mathbb{R})$ is symmetric if $A = A^T$ [therefore, $a_{ij} = a_{ji}$] and it is skew-symmetric if $A = -A^T$ [therefore, $a_{ij} = -a_{ji}$].

Now we come to an important question: how do we multiply two matrices?
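The vector-space operations, the basis $\{E^{hk}\}$ and the symmetry conditions introduced so far can be checked on a small example. The following Python sketch is only an illustration under stated assumptions: matrices are represented as lists of rows, and all helper names (`add`, `scale`, `basis_matrix`, `expand_in_basis`, `is_symmetric`, `is_skew_symmetric`) are hypothetical and not part of the lecture.

```python
def transpose(A):
    """Return A^T: entry (i, j) of the result is entry (j, i) of A."""
    return [list(col) for col in zip(*A)]

def add(A, B):
    """Entrywise sum: (A + B)_ij = A_ij + B_ij."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(c, A):
    """Scalar multiple: (cA)_ij = c * A_ij."""
    return [[c * a for a in row] for row in A]

def basis_matrix(h, k, m, n):
    """E^{hk}: the m-by-n matrix with a single 1 in row h, column k (1-based, as in the text)."""
    return [[1 if (i, j) == (h, k) else 0 for j in range(1, n + 1)]
            for i in range(1, m + 1)]

def expand_in_basis(A):
    """Rebuild A as the sum over (h, k) of a_hk * E^{hk}."""
    m, n = len(A), len(A[0])
    total = [[0] * n for _ in range(m)]
    for h in range(1, m + 1):
        for k in range(1, n + 1):
            total = add(total, scale(A[h - 1][k - 1], basis_matrix(h, k, m, n)))
    return total

def is_symmetric(A):
    return A == transpose(A)

def is_skew_symmetric(A):
    return A == scale(-1, transpose(A))
```

For instance, `expand_in_basis([[1, 2, 3], [4, 5, 6]])` rebuilds the same matrix from the six matrices $E^{hk}$ of $M_{2,3}(\mathbb{R})$, which is the spanning property used in the proof of Proposition 1.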
The first step is defining the product between a row and a column vector.

Definition. Let $A = (a_1 \ \dots \ a_n) \in M_{1,n}(\mathbb{R})$ and $B = \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} \in M_{n,1}(\mathbb{R})$. We define the multiplication between $A$ and $B$ by:
\[ AB = (a_1 \ \dots \ a_n) \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} = a_1 b_1 + \dots + a_n b_n \in \mathbb{R}. \]
More generally, if $A \in M_{m,n}(\mathbb{R})$ and $B \in M_{n,p}(\mathbb{R})$, we define the product:
\[ AB = \begin{pmatrix} A^{(1)} B_{(1)} & A^{(1)} B_{(2)} & \dots & A^{(1)} B_{(p)} \\ A^{(2)} B_{(1)} & A^{(2)} B_{(2)} & \dots & A^{(2)} B_{(p)} \\ \vdots & \vdots & \ddots & \vdots \\ A^{(m)} B_{(1)} & A^{(m)} B_{(2)} & \dots & A^{(m)} B_{(p)} \end{pmatrix} \in M_{m,p}(\mathbb{R}), \]
where $A^{(i)} B_{(j)} = \sum_{k=1}^{n} a_{ik} b_{kj}$, for all $i = 1, \dots, m$ and $j = 1, \dots, p$.

Remark. Observe that this product makes sense only if the number of columns of $A$ is the same as the number of rows of $B$. Obviously, the product is always defined when the two matrices $A$ and $B$ are square and have the same order.

Let us see some properties of these operations (the proof of which is left as an exercise).

Proposition 2. [Exercise]
i) The matrix multiplication is associative; namely:
\[ (AB)C = A(BC) \]
for any $A \in M_{m,n}(\mathbb{R})$, $B \in M_{n,p}(\mathbb{R})$ and $C \in M_{p,q}(\mathbb{R})$.
ii) The following properties hold:
$(A + B)C = AC + BC$, for any $A, B \in M_{m,n}(\mathbb{R})$ and $C \in M_{n,p}(\mathbb{R})$;
$A(B + C) = AB + AC$, for any $A \in M_{m,n}(\mathbb{R})$ and $B, C \in M_{n,p}(\mathbb{R})$;
$A I_n = A = I_m A$, for any $A \in M_{m,n}(\mathbb{R})$;
$(cA)B = c(AB)$, for any $c \in \mathbb{R}$ and $A \in M_{m,n}(\mathbb{R})$, $B \in M_{n,p}(\mathbb{R})$;
$(A + B)^T = A^T + B^T$, for any $A, B \in M_{m,n}(\mathbb{R})$;
$(AB)^T = B^T A^T$, for any $A \in M_{m,n}(\mathbb{R})$ and $B \in M_{n,p}(\mathbb{R})$.

Remark. Given a square matrix $A \in M_n(\mathbb{R})$, it is not true in general that there exists a matrix $B \in M_n(\mathbb{R})$ such that $AB = BA = I_n$. In case it does, we say that $A$ is invertible and we denote its inverse by $A^{-1}$.

Let us consider in $M_n(\mathbb{R})$ the subset of invertible matrices:
\[ GL_n(\mathbb{R}) := \{ A \in M_n(\mathbb{R}) : \text{there exists } B \in M_n(\mathbb{R}) \text{ such that } AB = BA = I_n \}. \]
This set is called the general linear group of order $n$. We leave as an exercise to the reader to verify that the following properties hold.

Proposition 3.
[Exercise]
i) For any $A_1, A_2 \in GL_n(\mathbb{R})$, we have: $(A_1 A_2)^{-1} = A_2^{-1} A_1^{-1}$.
ii) For any $A \in GL_n(\mathbb{R})$, we have: $A^T \in GL_n(\mathbb{R})$ and $(A^T)^{-1} = (A^{-1})^T$.
iii) For any $A \in GL_n(\mathbb{R})$ and $c \in \mathbb{R}$, with $c \ne 0$, we have: $(cA)^{-1} = \frac{1}{c} A^{-1}$. In particular, $(-A)^{-1} = -(A^{-1})$.

An important subset of $GL_n(\mathbb{R})$, that we will use later on, is the set of orthogonal matrices.

Definition. A matrix $A \in M_n(\mathbb{R})$ is called orthogonal, if:
\[ A^T A = I_n = A A^T \qquad [\text{i.e., } A^{-1} = A^T]. \]
The set of the orthogonal matrices of order $n$ is denoted by $O_n(\mathbb{R})$. Moreover, from what observed above (i.e., $A^{-1} = A^T$), it follows that $O_n(\mathbb{R}) \subset GL_n(\mathbb{R})$ (i.e., orthogonal matrices are invertible).

To conclude this section, let us work out a simple exercise, that will provide us with a characterization of $O_2(\mathbb{R})$.

Example. The set $O_2(\mathbb{R})$ consists of all matrices of the form:
\[ R_\alpha = \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix} \quad \text{and} \quad S_\alpha = \begin{pmatrix} \cos\alpha & \sin\alpha \\ \sin\alpha & -\cos\alpha \end{pmatrix}, \]
for all $\alpha \in \mathbb{R}$.

Proof. One can easily verify (with a direct computation), that:
\[ R_\alpha^T R_\alpha = I_2 = R_\alpha R_\alpha^T \quad \text{and} \quad S_\alpha S_\alpha^T = I_2 = S_\alpha^T S_\alpha; \]
therefore, these matrices are orthogonal. We need to show that all orthogonal matrices are of this form. Consider a matrix
\[ A = \begin{pmatrix} a & b \\ c & d \end{pmatrix} \in O_2(\mathbb{R}). \]
Let us show that there exists $\alpha \in \mathbb{R}$ such that $A = R_\alpha$ or $A = S_\alpha$. By definition:
\[ A \in O_2(\mathbb{R}) \iff A^T A = I_2 = A A^T \iff \begin{cases} a^2 + c^2 = 1 = a^2 + b^2 \\ ab + cd = 0 = ac + bd \\ b^2 + d^2 = 1 = c^2 + d^2. \end{cases} \]
From the first line, it follows that $b^2 = c^2$. There are two cases:
\[ \text{i) } b = c \quad \text{or} \quad \text{ii) } b = -c. \]
i) In this case, plugging into the other equations, one gets that $(a + d)c = 0$ and therefore:
\[ \text{i}') \ c = 0 \quad \text{or} \quad \text{i}'') \ a + d = 0. \]
In the case i'):
\[ A = \begin{pmatrix} a & 0 \\ 0 & d \end{pmatrix} \quad \text{with } a^2 = d^2 = 1; \]
namely, $A$ is one of the following matrices:
\[ \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} = R_0, \quad \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix} = S_0, \quad \begin{pmatrix} -1 & 0 \\ 0 & 1 \end{pmatrix} = S_\pi, \quad \begin{pmatrix} -1 & 0 \\ 0 & -1 \end{pmatrix} = R_\pi. \]
In the case i''):
\[ A = \begin{pmatrix} a & b \\ b & -a \end{pmatrix} \quad \text{with } a^2 + b^2 = 1; \]
therefore, $A = S_\alpha$ with $\alpha \in [0, 2\pi)$ such that $(a, b) = (\cos\alpha, \sin\alpha)$ (this is possible since $a^2 + b^2 = 1$).
ii) Proceeding as above, we get that $(a - d)c = 0$ and therefore there are two possibilities:
\[ \text{ii}') \ c = 0 \quad \text{or} \quad \text{ii}'') \ a - d = 0. \]
In the case ii'), we obtain again:
\[ A = \begin{pmatrix} a & 0 \\ 0 & d \end{pmatrix} \quad \text{with } a^2 = d^2 = 1. \]
In the case ii''):
\[ A = \begin{pmatrix} a & -c \\ c & a \end{pmatrix} \quad \text{with } a^2 + c^2 = 1; \]
therefore, $A = R_\alpha$ with $\alpha \in [0, 2\pi)$ such that $(a, c) = (\cos\alpha, \sin\alpha)$ (this is possible since $a^2 + c^2 = 1$).

2. Systems of linear equations I: Gauss-Jordan method

Definition. A system of $m$ linear equations and $n$ unknowns $x_1, \dots, x_n$ (that will be abbreviated $LS$ or $LS(m, n, \mathbb{R})$) is a set of $m$ equations of the form:
\[ \begin{cases} a_{11} x_1 + a_{12} x_2 + \dots + a_{1n} x_n = b_1 \\ a_{21} x_1 + a_{22} x_2 + \dots + a_{2n} x_n = b_2 \\ \quad \vdots \\ a_{m1} x_1 + a_{m2} x_2 + \dots + a_{mn} x_n = b_m, \end{cases} \]
or, in a shorter way:
\[ \sum_{j=1}^{n} a_{ij} x_j = b_i \quad \text{for } i = 1, \dots, m. \]
The elements $a_{ij} \in \mathbb{R}$ are called coefficients and the $b_i \in \mathbb{R}$ are the right-hand sides (or knowns) of the equations of the $LS$.
A solution of such $LS$ is an $n$-tuple $(z_1, \dots, z_n) \in \mathbb{R}^n$ such that
\[ \sum_{j=1}^{n} a_{ij} z_j = b_i \quad \text{for } i = 1, \dots, m. \]
A $LS$ without any solution is said to be incompatible; otherwise it is called compatible or solvable.

A $LS(m, n, \mathbb{R})$ is homogeneous (it will be abbreviated $HLS(m, n, \mathbb{R})$), if $b_i = 0$ for all $i = 1, \dots, m$. Obviously, a $HLS$ is always solvable; in fact, the zero $n$-tuple $(0, \dots, 0)$ is always a solution (the trivial solution); the other non-trivial solutions (if they exist) are called eigensolutions.

Observe that given any $LS(m, n, \mathbb{R})$, it is always possible to obtain a $HLS(m, n, \mathbb{R})$, substituting all $b_i$'s by zeros; this new $HLS$ is called the associated homogeneous system.

Introducing the matrix notation, it is possible to rewrite linear systems in a more compact way. In fact, consider a $LS(m, n, \mathbb{R})$:
\[ \sum_{j=1}^{n} a_{ij} x_j = b_i \quad \text{for } i = 1, \dots, m \]
and let us denote:
• $X = \begin{pmatrix} x_1 \\ \vdots \\ x_n \end{pmatrix}$, the column of unknowns,
• $A = (a_{ij}) \in M_{m,n}(\mathbb{R})$, the matrix of coefficients,
• $b = \begin{pmatrix} b_1 \\ \vdots \\ b_m \end{pmatrix}$
$\in M_{m,1}(\mathbb{R})$, the column of right-hand sides.

From the definition of matrix multiplication, it follows that:
\[ AX = \begin{pmatrix} a_{11} x_1 + a_{12} x_2 + \dots + a_{1n} x_n \\ a_{21} x_1 + a_{22} x_2 + \dots + a_{2n} x_n \\ \vdots \\ a_{m1} x_1 + a_{m2} x_2 + \dots + a_{mn} x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_m \end{pmatrix} = b. \]
Therefore, the $LS$ can be rewritten in the matricial form:
\[ AX = b. \]
The matrix $(A \ b) \in M_{m,n+1}(\mathbb{R})$ is called complete matrix or bordered (or edged) matrix of the system $AX = b$ and identifies it univocally; there is, indeed, a bijection between $M_{m,n+1}(\mathbb{R})$ and $LS(m, n, \mathbb{R})$.

Remark. It is evident that the solutions $(z_1, \dots, z_n) \in \mathbb{R}^n$ of the $LS(m, n, \mathbb{R})$ $AX = b$ are in a 1-1 correspondence with the column matrices $z = \begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix} \in M_{n,1}(\mathbb{R})$ such that $Az = b$. Moreover, observe that any solution $z = (z_1, \dots, z_n)$ of the linear system $AX = b$ expresses $b$ as a linear combination of the columns of $A$; in fact:
\[ b = Az = (A_{(1)}, \dots, A_{(n)}) \begin{pmatrix} z_1 \\ \vdots \\ z_n \end{pmatrix} = \sum_{i=1}^{n} z_i A_{(i)}. \]

Proposition 4. Let $AX = 0$ be a $HLS(m, n, \mathbb{R})$. The set $\Sigma_0$ of its solutions is a vector subspace of $\mathbb{R}^n$.

Proof. It suffices to verify that for any $y, z \in \Sigma_0$ and $a, b \in \mathbb{R}$, then $ay + bz \in \Sigma_0$. In fact (identifying $y$ and $z$ with the corresponding column matrices):
\[ A(ay + bz) = A(ay) + A(bz) = a(Ay) + b(Az) = a0 + b0 = 0; \]
hence, $ay + bz \in \Sigma_0$.

Proposition 5. Let $AX = b$ be a $LS(m, n, \mathbb{R})$ and $\Sigma$ the set of its solutions. Let us denote by $\Sigma_0$ the set of solutions of the associated $HLS$ $AX = 0$. If $\Sigma$ is non-empty (i.e., the $LS$ is solvable), then for any $z_0 \in \Sigma$, we have:
\[ \Sigma = z_0 + \Sigma_0 = \{ z_0 + y, \ \text{for all } y \in \Sigma_0 \}. \]
Hence, there is a bijection between $\Sigma$ and $\Sigma_0$.

Proof. ($\subseteq$) Let $z \in \Sigma$ (i.e., $Az = b$). Since $Az_0 = b$, then:
\[ A(z - z_0) = Az - Az_0 = b - b = 0; \]
consequently, $z - z_0 \in \Sigma_0$. Hence, $z = z_0 + (z - z_0) \in z_0 + \Sigma_0$.
($\supseteq$) Let us verify that for any $y \in \Sigma_0$, $z_0 + y \in \Sigma$. In fact, we have
\[ A(z_0 + y) = Az_0 + Ay = b + 0 = b. \]

Remark. Let $AX = b$ be a solvable non-homogeneous linear system. In this case, $\Sigma$ is never a vector subspace of $\mathbb{R}^n$. In fact, $(0, \dots, 0) \notin \Sigma$; nevertheless it is in a 1-1 correspondence with a vector subspace of $\mathbb{R}^n$ (namely, $\Sigma_0$ associated to $AX = 0$).
Since $\Sigma$ is NOT a vector space, it does not make sense to talk about its dimension, but we can associate to it a sort of dimension, defined in terms of the dimension of $\Sigma_0$. More explicitly, if $\dim(\Sigma_0) = t$, we will say that the $LS$ $AX = b$ (if solvable) has $\infty^t$ solutions. In particular, if $\dim(\Sigma_0) = 0$ (i.e., the associated $HLS$ has only the trivial solution) then $AX = b$, if solvable, has only one solution ($\infty^0 = 1$).

From prop. 5, we deduce that in order to identify all possible solutions of a $LS$ $AX = b$, it is sufficient to find one particular solution and all the solutions of the associated $HLS$ (and sum them up).

Definition. Two linear systems $AX = b$ and $A'X = b'$ with the same number $n$ of unknowns are equivalent, if and only if they have the same solutions (i.e., $\Sigma = \Sigma'$).

We want now to describe an algorithm to solve a linear system: the Gauss-Jordan method. Let us start with a definition.

Definition. A $LS(m, n, \mathbb{R})$ $AX = b$ is called a step system, if $m \le n$ and
\[ a_{ij} = 0 \ \text{ if } i > j, \qquad a_{ii} \ne 0 \ \text{ for } i = 1, \dots, m. \]
In particular, the matrix $A$ is upper triangular, i.e., of the form:
\[ A = \begin{pmatrix} a_{11} & \dots & \dots & \dots & \dots & \dots \\ 0 & a_{22} & \dots & \dots & \dots & \dots \\ \vdots & \ddots & \ddots & & & \vdots \\ 0 & \dots & 0 & a_{m,m} & \dots & \dots \end{pmatrix}. \]

Proposition 6. Every step $LS(m, n, \mathbb{R})$ $AX = b$ is solvable and has $\infty^{n-m}$ solutions.

Proof. If $m = n$, the $LS$ has only one solution $x = (x_1, \dots, x_n)$, that we get in the following way:
• let us start solving the last equation of the system $a_{nn} x_n = b_n$, and we get the component $x_n$;
• plugging this value of $x_n$ in the second-last equation and solving it, we get the component $x_{n-1}$;
• proceeding in the same way, one can find all other components $x_{n-2}, \dots, x_2, x_1$.

Assume now that $m < n$. If we assign to the last $n - m$ components $x_{m+1}, \dots, x_n$ arbitrary values $t_1, \dots, t_{n-m} \in \mathbb{R}$, we obtain a step $LS(m, m, \mathbb{R})$, that, for what observed above, has a unique solution $(x_1, \dots, x_m)$. Therefore, $x = (x_1, \dots, x_m, t_1, \dots, t_{n-m})$ is a solution for the original system (so it is solvable). In other words, we have defined an application
\[ \Phi : \mathbb{R}^{n-m} \longrightarrow \Sigma, \qquad (t_1, \dots, t_{n-m}) \longmapsto x = (x_1, \dots, x_m, t_1, \dots, t_{n-m}). \]
Such application is a bijection. In fact, it is injective since each step $LS(m, m, \mathbb{R})$ has a unique solution; moreover, it is surjective, since for any $z = (z_1, \dots, z_n) \in \Sigma$, we have: $\Phi(z_{m+1}, \dots, z_n) = z$.

From the definition of dimension introduced above, to show that the $LS$ has $\infty^{n-m}$ solutions, we need to verify that $\dim(\Sigma_0) = n - m$ (where $\Sigma_0$ denotes the vector subspace of the solutions of the associated $HLS$). Analogously to what already done above, define
\[ \Phi_0 : \mathbb{R}^{n-m} \longrightarrow \Sigma_0, \]
where $\Sigma_0$ is the solution space associated to $AX = 0$, and denote by $e_1, \dots, e_{n-m}$ the standard basis of $\mathbb{R}^{n-m}$. One can easily verify that $\Phi_0(e_1), \dots, \Phi_0(e_{n-m})$ form a basis for $\Sigma_0$ [EXERCISE]; this completes the proof.

The Gauss-Jordan method consists of transforming (if possible) a given $LS(m, n, \mathbb{R})$ $AX = b$ into an equivalent step linear system (that can be solved using the method of prop. 6).
In order to perform this transformation, several elementary operations (on the equations) are allowed:
• I elementary operation: interchange the position of two equations of the $LS$;
• II elementary operation: multiply both sides of an equation by a non-zero real number;
• III elementary operation: substitute one equation with the sum of the same equation and a multiple of another.
It is evident that these transformations do not change the set of solutions, i.e., they transform the given system in an equivalent one.
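The back-substitution used in the proof of prop. 6 can be sketched in a few lines of Python. This is only an illustrative sketch, under the assumptions that the system is already a step system (so $m \le n$, $a_{ii} \ne 0$ and $a_{ij} = 0$ for $i > j$) and that matrices are lists of rows; the function name `solve_step_system` and the argument `t` (the arbitrary values given to $x_{m+1}, \dots, x_n$) are hypothetical. Exact arithmetic is obtained with the standard `fractions` module.

```python
from fractions import Fraction

def solve_step_system(A, b, t=()):
    """Back-substitution for a step system AX = b.

    Assumes m <= n, A[i][i] != 0 and A[i][j] == 0 for i > j (not checked here).
    t holds the n - m arbitrary values assigned to x_{m+1}, ..., x_n.
    """
    m, n = len(A), len(A[0])
    if len(t) != n - m:
        raise ValueError("expected exactly n - m free values")
    # The last n - m components are the chosen parameters.
    x = [Fraction(0)] * m + [Fraction(v) for v in t]
    # Solve the last equation first, then move upwards.
    for i in range(m - 1, -1, -1):
        known = sum(Fraction(A[i][j]) * x[j] for j in range(i + 1, n))
        x[i] = (Fraction(b[i]) - known) / Fraction(A[i][i])
    return x
```

Calling the function with different values of `t` plays the role of the map $\Phi$ in the proof: each choice of the $n - m$ parameters produces exactly one solution.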
In particular, we can see these operations as operations on the rows of the complete matrix $M = (A \ b)$ of the system; more precisely (remember that $M^{(i)}$ denotes the $i$-th row of $M$):
I: $[M^{(i)} \longleftrightarrow M^{(j)}]$;
II: $[M^{(i)} \longrightarrow cM^{(i)}]$, where $c \in \mathbb{R}$ and $c \ne 0$;
III: $[M^{(i)} \longrightarrow M^{(i)} + cM^{(j)}]$, with $i \ne j$ and $c \in \mathbb{R}$.

Now, let us describe the Gauss-Jordan algorithm (in four steps):
i) we eliminate all zero rows of $M$ (they correspond to the trivial equation $0 = 0$);
ii) we want to make sure that $M_{(1)} \ne 0$ (i.e., $A_{(1)} \ne 0$); in case it is not, we can always pursue it, interchanging two columns (this corresponds to an interchange of 2 variables);
iii) we want to make $a_{11} = 1$. To obtain this, we proceed as follows. If $a_{11} = 0$, we perform operation I $[M^{(1)} \longleftrightarrow M^{(i)}]$ and we get $a_{11} \ne 0$ (it is always possible to do this since, from step ii), it follows that there exists a non-zero element in the column $M_{(1)}$). Now, we can assume that $a_{11} \ne 0$. If $a_{11} \ne 1$, we perform operation II $[M^{(1)} \longrightarrow \frac{1}{a_{11}} M^{(1)}]$ and get $a_{11} = 1$.
iv) Now, we proceed in order to get $a_{21} = a_{31} = \dots = a_{m1} = 0$. It is sufficient to perform operation III $[M^{(i)} \longrightarrow M^{(i)} - a_{i1} M^{(1)}]$ for $i = 2, \dots, m$.

After these transformations, the matrix $A$ looks like:
\[ \begin{pmatrix} 1 & a_{12} & \dots & \dots \\ 0 & a_{22} & \dots & \dots \\ \vdots & \vdots & & \vdots \\ 0 & a_{m',2} & \dots & \dots \end{pmatrix} \]
with $m' \le m$. We can now repeat the algorithm i)-iv), starting from the second row and the second column (i.e., we apply it to the sub-matrix obtained eliminating the first row and the first column) and get:
\[ \begin{pmatrix} 1 & a_{12} & \dots & \dots \\ 0 & 1 & \dots & \dots \\ 0 & 0 & a_{33} & \dots \\ \vdots & \vdots & \vdots & \vdots \\ 0 & 0 & a_{m'',3} & \dots \end{pmatrix} \]
with $m'' \le m'$. We keep on iterating the algorithm (this time starting from the third row and third column, and so on).
• If, during the performance of the algorithm, we obtain a row $(0, \dots, 0, b)$ with $b \ne 0$, then the $LS$ is incompatible (and the Gauss-Jordan method stops);
• otherwise, we end up with a step system that we can solve, as showed in prop. 6.
Observe that, if the method involved any interchange of variables, we need to restore the original order in the final solution (using the opposite interchange).

[READ § 1.5 - Triangular factors and row exchange AND § 1.6 - Inverses and transposes (in particular, how to compute the inverse using Gauss-Jordan method).]

3. Determinant of a square matrix

[READ Ch. 4 on the book, for an introduction to the determinant]

We just summarize some properties of the determinant (see § 4.2).

Proposition 7. Let $A = (a_{ij}) \in M_n(\mathbb{R})$. We have:
i) if $A$ is a diagonal matrix, $\det(A) = \prod_{i=1}^{n} a_{ii}$. In particular, $\det(I_n) = 1$;
ii) if $A$ has a zero row or column, then $\det(A) = 0$;
iii) $\det(A) = \det(A^T)$;
iv) let $A^{(i)} = bU + cV$, where $b, c \in \mathbb{R}$ and $U, V \in M_{1,n}(\mathbb{R})$. Let $B, C \in M_n(\mathbb{R})$ be the matrices obtained from $A$, substituting $A^{(i)}$ with (respectively) $U$ and $V$; we have:
\[ \det(A) = b \det(B) + c \det(C) \]
[an analogous result holds for the columns of $A$];
v) let $B \in M_n(\mathbb{R})$ be the matrix obtained from $A$ by interchanging two rows or columns; then $\det(B) = -\det(A)$;
vi) if $A$ has two proportional rows or columns, then $\det(A) = 0$;
vii) (Binet's Theorem): $\det(AB) = \det(A) \det(B)$ for any $B \in M_n(\mathbb{R})$;
viii) for all $A \in GL_n(\mathbb{R})$:
\[ \det(A^{-1}) = \frac{1}{\det(A)}. \]
In particular, it follows that if $A \in GL_n(\mathbb{R})$, then $\det(A) \ne 0$.

Finally, we want to illustrate another important property of the determinant (Laplace's theorem). Some preliminary definitions are necessary.

Definition. Let $M \in M_{m,n}(\mathbb{R})$ and $p, q$ positive integers, such that $1 \le p \le m$ and $1 \le q \le n$. Let us choose $p$ integers $\{i_1, \dots, i_p\}$ such that
\[ 1 \le i_1 < \dots < i_p \le m, \]
and $q$ integers $\{j_1, \dots, j_q\}$ such that
\[ 1 \le j_1 < \dots < j_q \le n. \]
We define the submatrix of $M$ relative to the rows $i_1, \dots, i_p$ and the columns $j_1, \dots, j_q$, as the matrix obtained intersecting the rows $M^{(i_1)}, \dots, M^{(i_p)}$ and the columns $M_{(j_1)}, \dots, M_{(j_q)}$. This submatrix will be denoted by $M(i_1, \dots, i_p \mid j_1, \dots, j_q)$.
One can verify that the submatrices of $M$ of type $(p, q)$ are exactly:
\[ \binom{m}{p} \binom{n}{q} = \frac{m(m-1) \cdots (m-p+1)}{p!} \cdot \frac{n(n-1) \cdots (n-q+1)}{q!}. \]

Definition. Let $A = (a_{ij}) \in M_n(\mathbb{R})$, with $n \ge 2$. We call cofactor of $a_{ij}$ the element $\alpha_{ij} = \alpha_{ij}^A \in \mathbb{R}$, defined as:
\[ \alpha_{ij} = \alpha_{ij}^A = (-1)^{i+j} \det\big( A(1, \dots, \widehat{i}, \dots, n \mid 1, \dots, \widehat{j}, \dots, n) \big), \]
where the hat means that the row $i$ and the column $j$ are omitted. The matrix formed by all cofactors is called cofactor matrix (or adjoint matrix) of $A$ and it is denoted by $C_A$:
\[ C_A = \begin{pmatrix} \alpha_{11} & \alpha_{12} & \dots & \alpha_{1n} \\ \alpha_{21} & \alpha_{22} & \dots & \alpha_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ \alpha_{n1} & \alpha_{n2} & \dots & \alpha_{nn} \end{pmatrix}. \]

Theorem 1 (Laplace's Theorem). Let $A \in M_n(\mathbb{R})$, with $n \ge 2$. For any $i, j = 1, \dots, n$:
\[ \det(A) = \sum_{t=1}^{n} a_{it} \alpha_{it} \quad \text{and} \quad \det(A) = \sum_{t=1}^{n} a_{tj} \alpha_{tj} \]
(these expressions are called, respectively, expansion of the determinant with respect to the row $A^{(i)}$ and expansion of the determinant with respect to the column $A_{(j)}$).

Remark. This theorem provides a very useful tool for computing the determinant of a matrix. In fact, given a square matrix $A$ of order $n$ and fixed one of its rows or columns, we can rewrite the determinant as a weighted sum of $n$ determinants of submatrices of order $n - 1$; inductively, each of these determinants can be written as a weighted sum of $n - 1$ determinants of submatrices of order $n - 2$, and so on. Obviously, the more zeros there are in the chosen row or column, the easier the computation becomes.

Corollary 1. Let $A \in M_n(\mathbb{R})$. We have:
\[ C_A^T A = \det(A) I_n = A C_A^T. \]
It follows that: if $A \in GL_n(\mathbb{R})$, then
\[ A^{-1} = \frac{1}{\det(A)} C_A^T. \]

Proof. Let us start to verify that:
\[ (C_A^T A)_{ij} = \det(A) \delta_{ij} = \begin{cases} \det(A) & \text{if } i = j \\ 0 & \text{if } i \ne j. \end{cases} \]
In fact,
\[ (C_A^T A)_{ij} = (C_A^T)^{(i)} A_{(j)} = (\alpha_{1i}, \dots, \alpha_{ni}) \begin{pmatrix} a_{1j} \\ \vdots \\ a_{nj} \end{pmatrix} = \sum_{t=1}^{n} \alpha_{ti} a_{tj}. \]
If $i = j$, the above sum is the expansion of $\det(A)$ with respect to the column $A_{(i)}$; therefore, $(C_A^T A)_{ii} = \det(A)$.
If $i \ne j$, let us denote by $B$ the matrix obtained from $A$, by substituting $A_{(i)}$ with $A_{(j)}$. Then $\det(B) = 0$, since $B$ has two equal columns. Let us compute the Laplace expansion of $\det(B)$ with respect to $B_{(i)}$. Observe that $\alpha_{ti}^B = \alpha_{ti}^A$ (in fact, these cofactors do not depend on the $i$-th column, and $A$ and $B$ coincide apart from this column). Hence:
\[ 0 = \det(B) = \sum_{t=1}^{n} b_{ti} \alpha_{ti}^B = \sum_{t=1}^{n} a_{tj} \alpha_{ti}^A = (C_A^T A)_{ij}. \]
It remains to verify that:
\[ (A C_A^T)_{ij} = \det(A) \delta_{ij}. \]
Proceeding as above, one can verify:
\[ (A C_A^T)_{ij} = \sum_{t=1}^{n} \alpha_{jt} a_{it}. \]
If $i = j$, the above sum is the expansion of $\det(A)$ with respect to the row $A^{(i)}$; if $i \ne j$, let us denote by $B$ the matrix obtained from $A$, by substituting $A^{(j)}$ with $A^{(i)}$. One expands $\det(B)$ with respect to $B^{(j)}$ and can proceed as above.

4. Rank of a matrix

[See also definition in § 2.2,4]

In lecture I, we defined the rank of $t$ vectors $v_1, \dots, v_t \in V$ as the dimension of the vector subspace they span; i.e., $\mathrm{rank}(v_1, \dots, v_t) = \dim(\langle v_1, \dots, v_t \rangle)$. Recall that it coincides with the maximum number of linearly independent vectors among $\{v_1, \dots, v_t\}$; moreover, $\mathrm{rank}(v_1, \dots, v_t) \le \min\{t, \dim(V)\}$.

Let $A \in M_{m,n}(\mathbb{R})$. Observe that the matrix $A$ is made of $m$ rows $A^{(1)}, \dots, A^{(m)} \in M_{1,n}(\mathbb{R})$ and $n$ columns $A_{(1)}, \dots, A_{(n)} \in M_{m,1}(\mathbb{R})$; moreover, recall that $M_{1,n}(\mathbb{R})$ and $M_{m,1}(\mathbb{R})$ are two vector spaces, with dimension respectively $n$ and $m$.

Definition. We define the row rank of $A$ as the non-negative integer:
\[ r_A = \mathrm{rank}(A^{(1)}, \dots, A^{(m)}) = \dim(\langle A^{(1)}, \dots, A^{(m)} \rangle). \]
Obviously, $r_A \le \min\{m, n\}$.
Analogously, we define the column rank of $A$ as the non-negative integer:
\[ c_A = \mathrm{rank}(A_{(1)}, \dots, A_{(n)}) = \dim(\langle A_{(1)}, \dots, A_{(n)} \rangle). \]
Also in this case, $c_A \le \min\{m, n\}$.

Theorem 2. For each matrix $A \in M_{m,n}(\mathbb{R})$, it results:
\[ r_A = c_A. \]
For a proof of this theorem, see § 2.4. This common value will be called rank of $A$.

Definition. Let $A \in M_{m,n}(\mathbb{R})$. We will call rank of $A$ the row rank (or equivalently the column rank) of $A$. This is the maximum number of linearly independent rows (or columns) of $A$.

Remark.
i) For any matrix $A \in M_{m,n}(\mathbb{R})$, we have: $\mathrm{rank}(A^T) = \mathrm{rank}(A)$ [in fact, $\mathrm{rank}(A) = r_A = c_{A^T} = \mathrm{rank}(A^T)$].
ii) Performing an elementary operation on the rows or columns of $A$, the rank does not change.

Proposition 8. Let $A \in M_n(\mathbb{R})$. The following conditions are equivalent:
i) $A \in GL_n(\mathbb{R})$;
ii) $\det(A) \ne 0$;
iii) $\mathrm{rank}(A) = n$.

Proof. i) $\Rightarrow$ ii) See prop. 7 (viii).
ii) $\Rightarrow$ iii) If, by contradiction, $\mathrm{rank}(A) < n$, then one row is a linear combination of the others. Let $A^{(i)} = \sum_{j \ne i} c_j A^{(j)}$ and denote by $B_j$ the matrix obtained from $A$ by substituting $A^{(i)}$ with $A^{(j)}$ (for $j = 1, \dots, n$ and $j \ne i$); then $\det(B_j) = 0$, since $B_j$ has two equal rows. Moreover, because of prop. 7 (iv):
\[ \det(A) = \sum_{j \ne i} c_j \det(B_j) = \sum_{j \ne i} c_j \, 0 = 0; \]
contradiction.
iii) $\Rightarrow$ i) Let us denote by $\{E^1, \dots, E^n\}$ the standard basis of $M_{1,n}(\mathbb{R})$ (therefore, $(E^i)_{1j} = \delta_{ij}$). Since, by hypothesis, $\{A^{(1)}, \dots, A^{(n)}\}$ is a basis of $M_{1,n}(\mathbb{R})$:
\[ E^i = \sum_{t=1}^{n} b_{it} A^{(t)} \qquad [i = 1, \dots, n], \]
for suitable $b_{it} \in \mathbb{R}$ [note that $(E^i)_{1j} = \sum_{t=1}^{n} b_{it} a_{tj}$]. If we consider the matrix $B = (b_{it}) \in M_n(\mathbb{R})$, we have:
\[ (BA)_{ij} = B^{(i)} A_{(j)} = \sum_{t=1}^{n} b_{it} a_{tj} = (E^i)_{1j} = \delta_{ij}; \]
hence, $BA = I_n$.
In order to conclude the proof, it suffices to determine a matrix $C \in M_n(\mathbb{R})$ such that $AC = I_n$ (in fact, from $BA = I_n = AC$, it follows easily that $B = C$ and $A \in GL_n(\mathbb{R})$). To obtain $C$, one can proceed as above, expressing the matrices of the standard basis of $M_{n,1}(\mathbb{R})$ in terms of $\{A_{(1)}, \dots, A_{(n)}\}$.

We want to conclude this section, characterizing the rank of a matrix as the greatest order of its non-zero minors. Let us start with a definition.

Definition. Let $A \in M_{m,n}(\mathbb{R})$. We call minor (of order $t$) of $A$ the determinant of a square submatrix of $A$ of order $t$.
We denote with $\rho = \rho(A)$ the greatest order of the non-zero minors of $A$. In other words, $\rho$ is defined by the following conditions:
• there exists at least an invertible square submatrix of $A$ of order $\rho$;
• all square submatrices of $A$ of order $> \rho$ have determinant $= 0$.

Theorem 3. Let $A \in M_{m,n}(\mathbb{R})$. Then, $\mathrm{rank}(A) = \rho(A)$.

We need a lemma.

Lemma 1. Let $B$ be a submatrix of $A$; then $\mathrm{rank}(B) \le \mathrm{rank}(A)$.

Proof. [Lemma 1] Let $B = A(i_1, \dots, i_p \mid j_1, \dots, j_s)$. Consider the matrix $M = A(i_1, \dots, i_p \mid 1, \dots, n)$, of which $B$ is a submatrix. We have:
\[ B = M(1, \dots, p \mid j_1, \dots, j_s). \]
Obviously,
\[ \mathrm{rank}(M) = \dim(\langle A^{(i_1)}, \dots, A^{(i_p)} \rangle) \le r_A = \mathrm{rank}(A). \]
Moreover,
\[ \mathrm{rank}(B) = \dim(\langle M_{(j_1)}, \dots, M_{(j_s)} \rangle) \le c_M = \mathrm{rank}(M). \]
It follows that
\[ \mathrm{rank}(B) \le \mathrm{rank}(M) \le \mathrm{rank}(A). \]

Proof. [Theorem 3] Let us start to verify that $\rho = \rho(A) \le \mathrm{rank}(A)$. Let us choose in $A$ an invertible submatrix $M$ of order $\rho$: $M \in GL_\rho(\mathbb{R})$. Because of prop. 8, $\mathrm{rank}(M) = \rho$; from lemma 1, $\mathrm{rank}(M) \le \mathrm{rank}(A)$. Therefore, $\rho \le \mathrm{rank}(A)$.
Conversely, let us prove that $r = \mathrm{rank}(A) \le \rho(A)$. Let us choose in $A$ $r$ linearly independent rows: $A^{(i_1)}, \dots, A^{(i_r)}$ [with $i_1 < \dots < i_r$]. Let $B$ be the submatrix of $A$ formed by these rows. Obviously, $\mathrm{rank}(B) = r$ and therefore, $B$ has $r$ linearly independent columns: $B_{(j_1)}, \dots, B_{(j_r)}$. If we define $M = A(i_1, \dots, i_r \mid j_1, \dots, j_r)$, we have that $\det(M) \ne 0$ and consequently $\rho(A) \ge r$.

5. Systems of linear equations II: Rouché-Capelli's theorem and Cramer's theorem

Rouché-Capelli's theorem provides a useful tool to determine whether a system is solvable or not.

Theorem 4 (Rouché-Capelli). Let $AX = b$ be a $LS(m, n, \mathbb{R})$. Such $LS$ is solvable if and only if $\mathrm{rank}(A) = \mathrm{rank}(A \ b)$. If such system is solvable, then it admits $\infty^{n - \mathrm{rank}(A)}$ solutions.

Proof.
\[ \begin{aligned} AX = b \text{ is solvable} &\iff \text{there exists } a \in M_{n,1}(\mathbb{R}) \text{ such that } Aa = b \\ &\iff \text{there exists } a = \begin{pmatrix} a_1 \\ \vdots \\ a_n \end{pmatrix} \in M_{n,1}(\mathbb{R}) \text{ such that } \sum_{i=1}^{n} a_i A_{(i)} = b \\ &\iff b \in \langle A_{(1)}, \dots, A_{(n)} \rangle \\ &\iff \langle A_{(1)}, \dots, A_{(n)} \rangle = \langle A_{(1)}, \dots, A_{(n)}, b \rangle \\ &\iff \mathrm{rank}(A) = \mathrm{rank}(A \ b). \end{aligned} \]
Now, suppose that the $LS(m, n, \mathbb{R})$ $AX = b$ is solvable and denote $r = \mathrm{rank}(A) = \mathrm{rank}(A \ b)$. Without any loss of generality (up to interchanging the rows of $A$), we can assume that the first $r$ rows of $A$ are linearly independent. Therefore, also the first $r$ rows of $(A \ b)$ are linearly independent (and the remaining $m - r$ are linear combinations of the former). Let us denote:
\[ A^* = \begin{pmatrix} A^{(1)} \\ \vdots \\ A^{(r)} \end{pmatrix} \quad \text{and} \quad b^* = \begin{pmatrix} b_1 \\ \vdots \\ b_r \end{pmatrix} \]
and consider the $LS(r, n, \mathbb{R})$ $A^* X = b^*$. It is easy to see that this $LS$ is equivalent to the original one; hence, we will solve $A^* X = b^*$, instead of $AX = b$.
If we apply the Gauss-Jordan algorithm to $A^* X = b^*$, since this system is compatible, we will end up with a step $LS$; moreover, since $\mathrm{rank}(A^*) = r$ and the rank is preserved by elementary operations, we won't obtain any zero row! In the end our step system will have exactly $r$ equations and, as observed in prop. 6, it will have $\infty^{n-r}$ solutions.

Proposition 9. Let $A \in M_{m,n}(\mathbb{R})$ and $B \in M_{n,p}(\mathbb{R})$. One has:
\[ \mathrm{rank}(AB) \le \min\{\mathrm{rank}(A), \mathrm{rank}(B)\}. \]

Proof. It is sufficient to verify that $\mathrm{rank}(AB) \le \mathrm{rank}(B)$. In fact, if this inequality is true, it follows that:
\[ \mathrm{rank}(AB) = \mathrm{rank}((AB)^T) = \mathrm{rank}(B^T A^T) \le \mathrm{rank}(A^T) = \mathrm{rank}(A). \]
Let us consider the $i$-th row of $AB$:
\[ (AB)^{(i)} = \big( A^{(i)} B_{(1)} \ \dots \ A^{(i)} B_{(p)} \big) = \Big( \sum_{t=1}^{n} a_{it} b_{t1} \ \dots \ \sum_{t=1}^{n} a_{it} b_{tp} \Big) = \sum_{t=1}^{n} a_{it} (b_{t1} \ \dots \ b_{tp}) = \sum_{t=1}^{n} a_{it} B^{(t)}. \]
Therefore, $(AB)^{(i)} \in \langle B^{(1)}, \dots, B^{(n)} \rangle$ [for $i = 1, \dots, m$]. It follows that
\[ \langle (AB)^{(1)}, \dots, (AB)^{(m)} \rangle \subseteq \langle B^{(1)}, \dots, B^{(n)} \rangle \]
and consequently $\mathrm{rank}(AB) \le \mathrm{rank}(B)$.

Corollary 2. Let $A \in M_{m,n}(\mathbb{R})$. For any $B \in GL_n(\mathbb{R})$ and $C \in GL_m(\mathbb{R})$, one has:
\[ \mathrm{rank}(A) = \mathrm{rank}(AB) = \mathrm{rank}(CA). \]

Proof. We have:
\[ \mathrm{rank}(AB) \le \mathrm{rank}(A) = \mathrm{rank}(ABB^{-1}) \le \mathrm{rank}(AB); \]
therefore, $\mathrm{rank}(AB) = \mathrm{rank}(A)$. Analogously:
\[ \mathrm{rank}(CA) \le \mathrm{rank}(A) = \mathrm{rank}(C^{-1}CA) \le \mathrm{rank}(CA); \]
therefore, $\mathrm{rank}(CA) = \mathrm{rank}(A)$.
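Prop. 9 and Corollary 2 can be checked numerically on small matrices. The Python sketch below is an illustration only: it computes the rank by row reduction with elementary operations I and III (which preserve the rank), using exact fractions; the function names `rank` and `matmul` and the list-of-rows representation are assumptions made for this example, not part of the lecture.

```python
from fractions import Fraction

def rank(A):
    """Rank of A, computed by row reduction (operations I and III only)."""
    M = [[Fraction(x) for x in row] for row in A]
    m, n = len(M), len(M[0])
    r = 0  # number of pivots found so far
    for col in range(n):
        # Look for a non-zero entry in this column, at or below row r.
        pivot = next((i for i in range(r, m) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]          # operation I
        for i in range(r + 1, m):                # operation III
            factor = M[i][col] / M[r][col]
            M[i] = [a - factor * p for a, p in zip(M[i], M[r])]
        r += 1
        if r == m:
            break
    return r

def matmul(A, B):
    """Row-by-column product: (AB)_ij = sum over k of a_ik * b_kj."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]
```

With `A = [[1, 2], [2, 4], [0, 1]]` and `B = [[1, 1, 0], [1, 1, 0]]`, one can compare `rank(matmul(A, B))` against `min(rank(A), rank(B))`; multiplying `A` on the right by an invertible matrix such as `[[1, 1], [0, 1]]` should leave the rank unchanged, as stated in Corollary 2.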
Let us state and prove another method that can be used to solve a $LS(n, n, \mathbb{R})$: Cramer's method.

Theorem 5. Let $AX = b$ be a $LS(n, n, \mathbb{R})$, with $A \in GL_n(\mathbb{R})$. We have:
i) this $LS$ has a unique solution;
ii) for $i = 1, \dots, n$ let us denote with $B_i$ the matrix obtained from $A$ by substituting the $i$-th column $A_{(i)}$ with $b$. The unique solution of this $LS$ is given by:
\[ x = \left( \frac{\det(B_1)}{\det(A)}, \ \dots, \ \frac{\det(B_n)}{\det(A)} \right) \]
(Cramer's formula).

Proof. i) The column $x = A^{-1} b$ is a solution of $AX = b$; in fact:
\[ A(A^{-1} b) = (AA^{-1}) b = I_n b = b. \]
If $y$ is another solution, then:
\[ y = I_n y = (A^{-1} A) y = A^{-1} (Ay) = A^{-1} b; \]
therefore, the solution is unique.
ii) Let us compute the determinant of $B_i$, using Laplace's expansion (cofactor expansion) with respect to the $i$-th column:
\[ \det(B_i) = \sum_{t=1}^{n} b_t \alpha_{ti}^{(B_i)} = \sum_{t=1}^{n} b_t \alpha_{ti}^{(A)} \]
[in fact, the cofactors with respect to the $i$-th column of $A$ and $B_i$ coincide]. Since $A^{-1} = \frac{1}{\det(A)} C_A^T$, then
\[ x = A^{-1} b = \frac{1}{\det(A)} C_A^T b \]
and consequently (for $i = 1, \dots, n$):
\[ x_i = \frac{1}{\det(A)} (C_A^T b)_i = \frac{1}{\det(A)} (C_A^T)^{(i)} b = \frac{1}{\det(A)} (\alpha_{1i} \ \dots \ \alpha_{ni}) \begin{pmatrix} b_1 \\ \vdots \\ b_n \end{pmatrix} = \frac{1}{\det(A)} \sum_{t=1}^{n} b_t \alpha_{ti} = \frac{1}{\det(A)} \det(B_i). \]

From Rouché-Capelli's theorem, it follows that, in order to decide whether a $LS$ $AX = b$ is solvable or not, one has to compute (and then compare) $\mathrm{rank}(A)$ and $\mathrm{rank}(A \ b)$.
To compute the rank of a matrix, the following result (Kronecker's theorem or theorem of the bordered minors) will come in handy, saving a lot of computations! Let us start with a definition.

Definition. Let $B$ be a square submatrix of order $r$ of a matrix $A \in M_{m,n}(\mathbb{R})$. We will call a bordered minor of $B$ the determinant of any square submatrix $C$ of $A$, of order $r + 1$ and such that it contains $B$ as a submatrix. We will say that $C$ is obtained by bordering $B$ with a row and a column of $A$.
Obviously, if $r = \min\{m, n\}$, $B$ has no bordered minors.

Theorem 6 (Kronecker). Let $A \in M_{m,n}(\mathbb{R})$.
We have that $\mathrm{rank}(A) = r$ if and only if the two following conditions are satisfied:
a) there exists in $A$ an invertible square submatrix $B$ of order $r$;
b) all bordered minors of $B$ (if there exist any) are zero.

Proof. ($\Longrightarrow$) If $\mathrm{rank}(A) = r$, then $\rho(A) = r$ and therefore there exists an invertible submatrix of $A$ of order $r$: $B \in GL_r(\mathbb{R})$. Moreover, all minors of order $r + 1$ of $A$ (and in particular the bordered minors of $B$) are zero. Hence, conditions a) and b) are satisfied.
($\Longleftarrow$) To simplify the notation, let us suppose that $B = A(1, \dots, r \mid 1, \dots, r)$. By hypothesis, $\det(B) \ne 0$ (i.e., $\mathrm{rank}(B) = r$). Let $C = (A_{(1)} \ \dots \ A_{(r)})$ be the submatrix of $A$ formed by the first $r$ columns of $A$. Obviously, $\mathrm{rank}(C) \le r$; since $B$ is a submatrix of $C$, then $r = \mathrm{rank}(B) \le \mathrm{rank}(C) \le r$ and consequently, $\mathrm{rank}(C) = r$. Hence, the columns $A_{(1)}, \dots, A_{(r)}$ are linearly independent.
To show the claim (i.e., $\mathrm{rank}(A) = r$), we need to prove that:
\[ A_{(r+1)}, \dots, A_{(n)} \in \langle A_{(1)}, \dots, A_{(r)} \rangle, \]
namely, that for $t = r + 1, \dots, n$, the matrix $(A_{(1)} \ \dots \ A_{(r)} \ A_{(t)})$ has rank $r$. Let us denote such matrix by $D \in M_{m,r+1}(\mathbb{R})$ and consider the submatrix formed by the first $r$ rows:
\[ \begin{pmatrix} D^{(1)} \\ \vdots \\ D^{(r)} \end{pmatrix} \in M_{r,r+1}(\mathbb{R}). \]
This submatrix has rank $r$ (it has $B$ as a submatrix, therefore it must have maximal rank). To prove the claim (i.e., $\mathrm{rank}(D) = r$), it suffices to verify:
\[ D^{(r+1)}, \dots, D^{(m)} \in \langle D^{(1)}, \dots, D^{(r)} \rangle. \]
In fact, for $s = r + 1, \dots, m$, consider
\[ \det \begin{pmatrix} D^{(1)} \\ \vdots \\ D^{(r)} \\ D^{(s)} \end{pmatrix}. \]
This is a bordered minor of $B$, therefore it is zero. This means that its $r + 1$ rows are linearly dependent and, since the first $r$ rows $D^{(1)}, \dots, D^{(r)}$ are linearly independent, we have necessarily:
\[ D^{(s)} \in \langle D^{(1)}, \dots, D^{(r)} \rangle, \]
as we wanted to show.

Remark. Let $A \in M_{m,n}(\mathbb{R})$.
To compute the rank of $A$ one can proceed as follows:
• find in $A$ an invertible square submatrix $B$ of order $t$;
• if $t = \min\{m, n\}$, then $\mathrm{rank}(A) = t$;
• if $t < \min\{m, n\}$, consider all possible bordered minors of $B$;
• if all bordered minors of $B$ are zero, then $\mathrm{rank}(A) = t$;
• otherwise, we have obtained a new invertible square submatrix $C$ of order $t+1$;
• therefore, $\mathrm{rank}(A) \geq t+1$ and we repeat the above procedure.

Without Kronecker's theorem, once we have found an invertible square submatrix $B$ of order $t$, we should check that all possible minors of order $t+1$ are zero; they are $\binom{m}{t+1}\binom{n}{t+1}$, while the bordered minors of $B$ are only $(m-t)(n-t)$. For instance, if $A \in M_{4,6}(\mathbb{R})$ and $B \in GL_2(\mathbb{R})$, the minors of $A$ of order 3 are $\binom{4}{3}\binom{6}{3} = 80$, while the bordered minors are $(4-2)(6-2) = 8$.

Remark. Let $AX = b$ be a given $LS(m, n, \mathbb{R})$. Let $M = (A \; b)$ be its complete matrix and let $r = \mathrm{rank}(A)$. From Rouché-Capelli's theorem, such LS is solvable if and only if $\mathrm{rank}(A \; b) = r$. In this case, it has $\infty^{n-r}$ solutions.

We want to describe now a procedure to find such solutions (without using Gauss-Jordan's method). Choose in $A$ an invertible square submatrix $B$ of order $r$ (that we will call fundamental submatrix of the LS). For instance, let $B = A(i_1, \ldots, i_r \mid j_1, \ldots, j_r)$. Define:
\[ A' = \begin{pmatrix} A^{(i_1)} \\ \vdots \\ A^{(i_r)} \end{pmatrix} \qquad \text{and} \qquad b' = \begin{pmatrix} b_{i_1} \\ \vdots \\ b_{i_r} \end{pmatrix} \]
and consider the new $LS(r, n, \mathbb{R})$: $A'X = b'$. This system is equivalent to the original one, since it has been obtained by eliminating $m-r$ equations, corresponding to the rows of $A$ that could be expressed as linear combinations of the remaining $r$.

Let us solve this new system. Bring the $n-r$ unknowns different from $x_{j_1}, \ldots, x_{j_r}$ to the right-hand side and attribute to them the values $t_1, \ldots, t_{n-r} \in \mathbb{R}$ (arbitrarily chosen). We get in this way a system $LS(r, r, \mathbb{R})$ that admits a unique solution (since the coefficient matrix $B$ is invertible), which can be expressed by Cramer's formula. Varying the $n-r$ parameters $t_1, \ldots, t_{n-r} \in \mathbb{R}$, we get the set $\Sigma$ of the $\infty^{n-r}$ solutions of the LS.

A particularly simple solution of the system is obtained by choosing $t_1 = \ldots = t_{n-r} = 0$; let us denote it by $z_0$. Using prop. 5, $\Sigma = z_0 + \Sigma_0$ (where $\Sigma_0$ denotes the vector space of the solutions of the HLS $A'X = 0$); therefore, instead of computing the generic solution of $\Sigma$, it might be more convenient to compute the generic solution of this HLS and then sum it up to $z_0$. Attributing to $(t_1, \ldots, t_{n-r})$, in this HLS, the values
\[ (1, 0, \ldots, 0), \; (0, 1, 0, \ldots, 0), \; \ldots, \; (0, \ldots, 0, 1) , \]
we get $n-r$ systems $LS(r, r, \mathbb{R})$, each of which admits a unique solution. These $n-r$ solutions $y_1, \ldots, y_{n-r}$ are linearly independent and hence form a basis of $\Sigma_0$. Therefore:
\[ \Sigma = \left\{ z_0 + \sum_{i=1}^{n-r} t_i y_i , \; \text{for all } t_1, \ldots, t_{n-r} \in \mathbb{R} \right\} . \]

Remark. We have already observed that a HLS $AX = 0$ is always solvable (in fact, it has at least the trivial solution). This fact is confirmed by Rouché-Capelli's theorem; in fact, $\mathrm{rank}(A)$ is clearly equal to $\mathrm{rank}(A \; 0)$. From the same theorem, it follows that every $HLS(m, n, \mathbb{R})$ $AX = 0$ has $\infty^{n - \mathrm{rank}(A)}$ solutions. One can verify that a HLS has no eigensolutions (i.e., the only solution is the trivial one) if and only if $n = \mathrm{rank}(A)$ [in fact, $\infty^0 = 1$].

Department of Mathematics, Princeton University
E-mail address : [email protected]
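As a small illustration of Cramer's formula (Theorem 5) and of the bordered-minors procedure for computing the rank (Theorem 6 and the remark following it), here is a minimal Python sketch. It is not part of the original notes: the function names `det`, `cramer_solve`, `submatrix` and `rank_by_bordered_minors` are hypothetical, the choice of exact arithmetic with `fractions.Fraction` is an assumption, and the determinant is computed by plain Laplace expansion, which is only reasonable for small matrices.

```python
from fractions import Fraction
from itertools import product


def det(M):
    """Determinant by Laplace (cofactor) expansion along the first column."""
    n = len(M)
    if n == 1:
        return Fraction(M[0][0])
    total = Fraction(0)
    for t in range(n):
        # Minor obtained by deleting row t and the first column.
        minor = [row[1:] for k, row in enumerate(M) if k != t]
        total += (-1) ** t * M[t][0] * det(minor)
    return total


def cramer_solve(A, b):
    """Solve AX = b for an invertible square A via x_i = det(B_i) / det(A)."""
    d = det(A)
    if d == 0:
        raise ValueError("A is not invertible; Cramer's formula does not apply")
    x = []
    for i in range(len(A)):
        # B_i: the matrix A with its i-th column replaced by b.
        B_i = [row[:i] + [b[k]] + row[i + 1:] for k, row in enumerate(A)]
        x.append(det(B_i) / d)
    return x


def submatrix(A, rows, cols):
    """Submatrix of A on the given row and column indices."""
    return [[A[i][j] for j in cols] for i in rows]


def rank_by_bordered_minors(A):
    """Rank of A, growing an invertible submatrix one bordering at a time."""
    m, n = len(A), len(A[0])
    # Any nonzero entry is an invertible submatrix of order 1.
    start = next(
        ((i, j) for i, j in product(range(m), range(n)) if A[i][j] != 0), None
    )
    if start is None:
        return 0  # zero matrix
    rows, cols = [start[0]], [start[1]]
    while len(rows) < min(m, n):
        # Check only the (m - t)(n - t) bordered minors of the current B.
        for i, j in product(range(m), range(n)):
            if i in rows or j in cols:
                continue
            C = submatrix(A, sorted(rows + [i]), sorted(cols + [j]))
            if det(C) != 0:
                rows.append(i)
                cols.append(j)
                break
        else:
            break  # all bordered minors are zero: the rank is len(rows)
    return len(rows)


# Usage: the system 2x + y = 3, x + 3y = 5 has det(A) = 5.
print(cramer_solve([[2, 1], [1, 3]], [3, 5]))  # [Fraction(4, 5), Fraction(7, 5)]
# The second row is twice the first, so the rank is 2.
print(rank_by_bordered_minors([[1, 2, 3], [2, 4, 6], [1, 0, 1]]))  # 2
```

In the rank routine, each pass of the `while` loop inspects only the bordered minors of the current invertible submatrix, which mirrors the saving described in the remark after Theorem 6; row and column indices are sorted before extracting the submatrix, which can change the sign of the minor but not whether it vanishes.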