MATH10212 • Linear Algebra • Brief lecture notes

Textbook: D. Poole, Linear Algebra: A Modern Introduction. Thomson, 2006. ISBN 0-534-40596-7.

Systems of Linear Equations

Definition. An n-dimensional vector is a row or a column of n numbers (or letters):
\[ [a_1, \dots, a_n] \quad\text{or}\quad \begin{bmatrix} a_1 \\ \vdots \\ a_n \end{bmatrix}. \]
The set of all such vectors (either only rows, or only columns) with real entries is denoted by $\mathbb{R}^n$. Short notation for vectors varies: $\mathbf{a}$, or $\underline{a}$, or $\vec{a}$.

Definition. A linear equation in n variables $x_1, x_2, \dots, x_n$ is an equation
\[ a_1x_1 + a_2x_2 + \dots + a_nx_n = b \]
where the coefficients $a_1, a_2, \dots, a_n$ and the constant term $b$ are constants.

A solution of a linear equation $a_1x_1 + a_2x_2 + \dots + a_nx_n = b$ is a vector $[s_1, s_2, \dots, s_n]$ whose components satisfy the equation when we substitute $x_1 = s_1$, $x_2 = s_2$, ..., $x_n = s_n$, that is, $a_1s_1 + a_2s_2 + \dots + a_ns_n = b$.

A system of linear equations is a finite set of linear equations, each with the same variables. A solution of a system of linear equations is a vector that is simultaneously a solution of each equation in the system. The solution set of a system of linear equations is the set of all solutions of the system.

Definition. A general solution of a linear system (or equation) is an expression of the unknowns in terms of certain parameters that can independently take any values, producing all the solutions of the system (and only solutions).

Two linear systems are equivalent if they have the same solution sets. For example,
\[ \begin{aligned} x + y &= 3 \\ x - y &= -1 \end{aligned} \qquad\text{and}\qquad \begin{aligned} x - y &= -1 \\ y &= 2 \end{aligned} \]
are equivalent, since both have the unique solution $[1, 2]$.

We solve a system of linear equations by transforming it into an equivalent one of a triangular or staircase pattern:
\[ \begin{aligned} x - y - z &= -4 \\ y + 3z &= 11 \\ 5z &= 15 \end{aligned} \]
Using back substitution, we find successively that $z = 3$, $y = 11 - 3 \cdot 3 = 2$, and $x = -4 + 2 + 3 = 1$. So the unique solution is $[1, 2, 3]$.

Warning: in many cases, however, the solution is not unique, or may not exist.
If it does exist, we need to find all solutions.

Another example:
\[ \begin{aligned} x - y + z &= -1 \\ y + z &= 1 \end{aligned} \]
Using back substitution: $y = 1 - z$; $x = y - z - 1 = (1 - z) - z - 1 = -2z$; thus $x = -2t$, $y = 1 - t$, $z = t$, where $t$ is a parameter. So the solution set is $\{[-2t,\ 1 - t,\ t] \mid t \in \mathbb{R}\}$: infinitely many solutions.

Matrices and Echelon Form

The coefficient matrix of a linear system contains the coefficients of the variables, and the augmented matrix is the coefficient matrix augmented by an extra column containing the constant terms. (At the moment, a matrix for us is simply a table of coefficients; no prior knowledge of matrices is assumed; properties of matrices will be studied later.)

For the system
\[ \begin{aligned} 2x + y - z &= 3 \\ x + 5z &= 1 \\ -x + 3y - 2z &= 0 \end{aligned} \]
the coefficient matrix is
\[ \begin{bmatrix} 2 & 1 & -1 \\ 1 & 0 & 5 \\ -1 & 3 & -2 \end{bmatrix} \]
and the augmented matrix is
\[ \left[\begin{array}{ccc|c} 2 & 1 & -1 & 3 \\ 1 & 0 & 5 & 1 \\ -1 & 3 & -2 & 0 \end{array}\right]. \]
If a variable is missing, its coefficient 0 is entered in the appropriate position in the matrix. If we denote the coefficient matrix of a linear system by $A$ and the column vector of constant terms by $\vec{b}$, then the form of the augmented matrix is $[A \mid \vec{b}]$.

Definition. A matrix is in row echelon form (r.e.f.) if:
1. Any rows consisting entirely of zeros are at the bottom.
2. In each nonzero row, the first nonzero entry (called the leading entry) is in a column to the left of any leading entries below it.

Definition. If the augmented matrix of a linear system is in r.e.f., then the leading variables are those corresponding to the leading entries; the free variables are all the remaining variables (possibly, none).

Remark. If the augmented matrix of a linear system is in r.e.f., then it is easy to solve the system (or to see that there are no solutions): namely, there are no solutions if and only if there is a “bad row” $[0, 0, \dots, 0, b]$ with $b \neq 0$ at the bottom. If there is no bad row, then one can solve the system using back substitution: express the leading variable
in the equation corresponding to the lowest non-zero row, substitute into all the upper equations, then express the leading variable from the equation of the next row up, substitute everywhere above, and so on.

Elementary Row Operations

These are the operations used to arrive at r.e.f. when solving linear systems (and they have many other applications).

Definition. The following elementary row operations (e.r.o.s) can be performed on a matrix:
1. Interchange two rows.
2. Multiply a row by a nonzero constant.
3. Add a multiple of a row to another row.

Remark. Observe that dividing a row by a nonzero constant is implied in the above definition, since, for example, dividing a row by 2 is the same as multiplying it by $\frac{1}{2}$. Similarly, subtracting a multiple of a row from another row is the same as adding a negative multiple of a row to another row.

Notation for the three elementary row operations:
1. $R_i \leftrightarrow R_j$ means interchange rows $i$ and $j$.
2. $kR_i$ means multiply row $i$ by $k$ (remember that $k \neq 0$!).
3. $R_i + kR_j$ means add $k$ times row $j$ to row $i$ (and replace row $i$ with the result, so only the $i$th row is changed).

The process of applying elementary row operations to bring a matrix into row echelon form is called row reduction.

Remarks.
• E.r.o.s must be applied only one at a time, consecutively.
• The row echelon form of a matrix is not unique.

Lemma on inverse e.r.o.s. Elementary row operations are reversible by other e.r.o.s: operations 1–3 are undone by
\[ R_i \leftrightarrow R_j, \qquad \tfrac{1}{k}R_i \ (\text{using } k \neq 0), \qquad R_i - kR_j, \]
respectively.

Fundamental Theorem on E.R.O.s for Linear Systems. Elementary row operations applied to the augmented matrix do not alter the solution set of a linear system. (Thus, two linear systems with row equivalent matrices have the same solution set.)

Proof. Suppose that one system (old) is transformed into a new one by an elementary row operation (of one of the types 1, 2, 3).
(Clearly, we only need to consider one e.r.o.) Let $S_1$ be the solution set of the old system, and $S_2$ the solution set of the new one. We need to show that $S_1 = S_2$.

First, it is almost obvious that $S_1 \subseteq S_2$, that is, every solution of the old system is a solution of the new one. Indeed, if the e.r.o. was of type 1, then clearly nothing changes, since the solution set does not depend on the order of the equations.

If it was an e.r.o. of type 2, then only the $i$th equation changes: if
\[ a_{i1}u_1 + a_{i2}u_2 + \dots + a_{in}u_n = b_i \quad \text{(old)}, \]
then
\[ ka_{i1}u_1 + ka_{i2}u_2 + \dots + ka_{in}u_n = k(a_{i1}u_1 + a_{i2}u_2 + \dots + a_{in}u_n) = kb_i \quad \text{(new)}, \]
so a solution $[u_1, \dots, u_n]$ of the old system remains a solution of the new one.

Similarly, if it was of type 3, then only the $i$th equation changes: if $[u_1, \dots, u_n]$ was a solution of the old system, then both
\[ a_{i1}u_1 + a_{i2}u_2 + \dots + a_{in}u_n = b_i \quad\text{and}\quad a_{j1}u_1 + a_{j2}u_2 + \dots + a_{jn}u_n = b_j, \]
whence, by adding $k$ times the second equation to the first and collecting terms, we get
\[ (a_{i1} + ka_{j1})u_1 + (a_{i2} + ka_{j2})u_2 + \dots + (a_{in} + ka_{jn})u_n = b_i + kb_j, \]
so $[u_1, \dots, u_n]$ remains a solution of the new system.

Thus, in each case, $S_1 \subseteq S_2$. But by the Lemma on inverse e.r.o.s each e.r.o. has an inverse, so the old system can also be obtained from the new one by an elementary row operation. Therefore, by the same argument, we also have $S_2 \subseteq S_1$. Since now both $S_2 \subseteq S_1$ and $S_1 \subseteq S_2$, we have $S_2 = S_1$, as required.

This theorem is the theoretical basis of methods of solution by e.r.o.s.

Gaussian Elimination method for solving linear systems

1. Write the augmented matrix of the system of linear equations.
2. Use elementary row operations to reduce the augmented matrix to row echelon form.
3. If there is a “bad row”, then there are no solutions. If there is no “bad row”, then solve the equivalent system that corresponds to the row-reduced matrix, expressing the leading variables via the constant terms and free variables using back substitution.
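The three steps above can be sketched in code. The following is a minimal illustration, not part of the original notes: the function names `row_echelon` and `solve` are hypothetical, exact `Fraction` arithmetic is an assumption made to avoid rounding, and each leading entry is scaled to 1 purely for convenience (r.e.f. does not require it). When free variables exist, the sketch returns only the particular solution obtained by setting them to 0, together with their indices.

```python
from fractions import Fraction


def row_echelon(rows):
    """Return a row echelon form of the matrix, using only e.r.o.s."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), len(m[0])
    top = 0  # rows above index `top` already hold leading entries
    for col in range(n_cols):
        # Find a row at or below `top` with a nonzero entry in this column.
        pivot = next((r for r in range(top, n_rows) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[top], m[pivot] = m[pivot], m[top]        # type 1: R_top <-> R_pivot
        lead = m[top][col]
        m[top] = [x / lead for x in m[top]]        # type 2: (1/k) R_top
        for r in range(top + 1, n_rows):           # type 3: R_r - a R_top
            a = m[r][col]
            m[r] = [x - a * y for x, y in zip(m[r], m[top])]
        top += 1
        if top == n_rows:
            break
    return m


def solve(augmented):
    """Gaussian elimination on [A|b].

    Returns None if there is a bad row; otherwise returns
    (particular solution with free variables set to 0, free-variable indices).
    """
    ref = row_echelon(augmented)
    n = len(ref[0]) - 1  # number of variables
    leading = []         # (row, column) of each leading entry
    for i, row in enumerate(ref):
        col = next((j for j, x in enumerate(row) if x != 0), None)
        if col is None:
            continue     # zero row
        if col == n:
            return None  # bad row [0, ..., 0 | b] with b != 0
        leading.append((i, col))
    lead_cols = {c for _, c in leading}
    free = [j for j in range(n) if j not in lead_cols]
    x = [Fraction(0)] * n
    # Back substitution, starting from the lowest nonzero row.
    for i, col in reversed(leading):
        x[col] = ref[i][n] - sum(ref[i][j] * x[j] for j in range(col + 1, n))
    return x, free


# The system 2x + y - z = 3, x + 5z = 1, -x + 3y - 2z = 0 from the notes:
solution, free = solve([[2, 1, -1, 3], [1, 0, 5, 1], [-1, 3, -2, 0]])
print(solution, free)  # x = 23/18, y = 7/18, z = -1/18; no free variables
```

For the earlier system $x - y + z = -1$, $y + z = 1$, the same sketch reports that the third variable is free and returns $[0, 1, 0]$, which is the general solution $[-2t, 1 - t, t]$ at $t = 0$.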
Remark. When performed by hand, step 2 of Gaussian elimination allows quite a bit of choice. Here are some useful guidelines:
(a) Locate the leftmost column that is not all zeros.
(b) Create a leading entry at the top of this column using a type 1 e.r.o. $R_1 \leftrightarrow R_i$. (It helps if you make this leading entry equal to 1, if necessary using a type 2 e.r.o. $(1/k)R_1$.)
(c) Use the leading entry to create zeros below it: kill off all the entries of this column below the leading entry, using type 3 e.r.o.s $R_i - aR_1$.
(d) Cover (ignore) the first row containing the leading entry, and repeat steps (a), (b), (c) on the remaining submatrix.
...And so on, every time in (d) ignoring the upper rows with the already created leading entries. Stop when the entire matrix is in row echelon form.

It is fairly obvious that this procedure always works. There are no solutions if and only if a “bad row” $[0, 0, \dots, 0, b]$ with $b \neq 0$ appears: indeed, then nothing can satisfy the equation $0x_1 + \dots + 0x_n = b \neq 0$.

Variables corresponding to leading entries are leading variables; all other variables are free variables (possibly none, in which case the solution is unique). Clearly, when we back-substitute, free variables can take any values (“free”), while leading variables are uniquely expressed in terms of free variables and “lower” leading variables, which in turn are expressed in the same way, and so on. So in the final form of the solution the leading variables are uniquely expressed in terms of free variables only, while the free variables can independently take any values. In other words, free variables are equal to independent parameters, and leading variables are expressed in these parameters.

Gauss–Jordan Elimination method for solving linear systems

We can reduce the augmented matrix even further than in Gaussian elimination.

Definition. A matrix is in reduced row echelon form (r.r.e.f.) if:
1. It is in row echelon form.
2. The leading entry in each nonzero row is a 1 (called a leading 1).
3. Each column containing a leading 1 has zeros everywhere else.

Gauss–Jordan Elimination:
1. Write the augmented matrix of the system of linear equations.
2. Use elementary row operations to reduce the augmented matrix to reduced row echelon form. (In addition to (c) above, also kill off all entries (i.e. create zeros) above the leading 1 in the same column.)
3. If there is a “bad row”, then there are no solutions. If there is no “bad row” (i.e. the resulting system is consistent), then express the leading variables in terms of the constant terms and any remaining free variables.

Reaching r.r.e.f. takes a bit more work, but expressing the leading variables in terms of the free variables is then much easier.

The Gaussian (or Gauss–Jordan) elimination methods yield the following

Corollary. Every consistent linear system over $\mathbb{R}$ has either a unique solution (if there are no free variables, so all variables are leading), or infinitely many solutions (when there are free variables, which can take arbitrary values).

(We included “over $\mathbb{R}$” because sometimes linear systems are considered over other number systems, e.g. so-called finite fields, although in this module we work only over $\mathbb{R}$.)

Remark. If one needs a particular solution (that is, just any one solution), simply set the parameters (the free variables) to any values (usually the simplest choice is all 0s). E.g. for the general solution $\{[1 - t + 2u,\ t,\ 3 + u,\ u] \mid t, u \in \mathbb{R}\}$, setting $t = u = 0$ we get the particular solution $[1, 0, 3, 0]$; or we can set, say, $t = 1$ and $u = 2$, and then we get the particular solution $[4, 1, 5, 2]$, etc.

Definition. The rank of a matrix is the number of nonzero rows in its row echelon form. We denote the rank of a matrix $A$ by $\operatorname{rank}(A)$.

Theorem 2.2 (The Rank Theorem). Let $A$ be the coefficient matrix of a system of linear equations with $n$ variables.
If the system is consistent, then
\[ \text{number of free variables} = n - \operatorname{rank}(A). \]

Homogeneous Systems

Definition. A system of linear equations is called homogeneous if the constant term in each equation is zero.

In other words, a homogeneous system has an augmented matrix of the form $[A \mid \vec{0}]$. E.g., the following system is homogeneous:
\[ \begin{aligned} x + 2y - 3z &= 0 \\ -x + y + 2z &= 0 \end{aligned} \]

Remarks.
1) Every homogeneous system is consistent, as it has (at least) the trivial solution $[0, 0, \dots, 0]$.
2) Hence, by the Corollary above, every homogeneous system has either a unique solution (the trivial solution) or infinitely many solutions.

The next theorem says that the latter case must occur if the number of variables is greater than the number of equations.

Theorem 2.3. If $[A \mid \vec{0}]$ is a homogeneous system of $m$ linear equations with $n$ variables, where $m < n$, then the system has infinitely many solutions.

By-product result for matrices

Definition. Matrices $A$ and $B$ are row equivalent if there is a sequence of elementary row operations that converts $A$ into $B$.

For example, the matrices
\[ \begin{bmatrix} 0 & 0 & 0 & 1 \\ 1 & 2 & 3 & 4 \\ 2 & 3 & 4 & 5 \end{bmatrix} \quad\text{and}\quad \begin{bmatrix} 1 & 2 & 3 & 4 \\ 0 & -1 & -2 & -3 \\ 0 & 0 & 0 & 1 \end{bmatrix} \]
are row equivalent: apply $R_1 \leftrightarrow R_2$, then $R_3 - 2R_1$, then $R_2 \leftrightarrow R_3$ to the first matrix.

Theorem 2.1. Matrices $A$ and $B$ are row equivalent if and only if they can be reduced to the same row echelon form.

Spanning Sets, Linear (In)Dependence, Connections with Linear Systems

Linear Combinations, Spans

Recall that the sum of two vectors of the same length is
\[ \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} + \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} = \begin{bmatrix} a_1 + b_1 \\ a_2 + b_2 \\ \vdots \\ a_n + b_n \end{bmatrix}. \]
Multiplication by a scalar $k \in \mathbb{R}$ is:
\[ k \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{bmatrix} = \begin{bmatrix} ka_1 \\ ka_2 \\ \vdots \\ ka_n \end{bmatrix}. \]

Definition. A linear combination of vectors $\vec{v}_1, \vec{v}_2, \dots, \vec{v}_k \in \mathbb{R}^n$ with coefficients $c_1, \dots, c_k \in \mathbb{R}$ is
\[ c_1\vec{v}_1 + c_2\vec{v}_2 + \dots + c_k\vec{v}_k. \]

Theorem 2.4. A system of linear equations with augmented matrix $[A \mid \vec{b}]$ is consistent if and only if $\vec{b}$ is a linear combination of the columns of $A$.
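Theorem 2.4 turns the question "is $\vec{b}$ a linear combination of given vectors?" into a consistency check, which can be sketched as follows. This is an illustration under stated assumptions, not part of the original notes: the function name `is_linear_combination` and the sample vectors are hypothetical, and exact `Fraction` arithmetic is used to avoid rounding.

```python
from fractions import Fraction


def is_linear_combination(vectors, b):
    """True iff b is a linear combination of `vectors` (Theorem 2.4).

    Builds the augmented matrix whose columns are the vectors followed by b,
    row-reduces it, and reports whether a bad row [0, ..., 0 | c], c != 0, appears.
    """
    n = len(b)          # length of each vector = number of equations
    k = len(vectors)    # number of unknown coefficients
    m = [[Fraction(v[i]) for v in vectors] + [Fraction(b[i])] for i in range(n)]
    top = 0
    for col in range(k):  # eliminate in the coefficient columns only
        pivot = next((r for r in range(top, n) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[top], m[pivot] = m[pivot], m[top]               # type 1 e.r.o.
        for r in range(top + 1, n):                       # type 3 e.r.o.s
            factor = m[r][col] / m[top][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[top])]
        top += 1
    # Rows from `top` down are zero in every coefficient column, so the
    # system is consistent exactly when their constant terms are zero too.
    return all(m[r][k] == 0 for r in range(top, n))


# Hypothetical sample data: [1, 2, 3] = 3*[1, 0, 3] + 2*[-1, 1, -3].
print(is_linear_combination([[1, 0, 3], [-1, 1, -3]], [1, 2, 3]))  # True
print(is_linear_combination([[1, 0, 3], [-1, 1, -3]], [1, 2, 4]))  # False
```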
Method for deciding if a vector $\vec{b}$ is a linear combination of vectors $\vec{a}_1, \dots, \vec{a}_k$ (of course all vectors must be of the same length): form the linear system with the augmented matrix whose columns are $\vec{a}_1, \dots, \vec{a}_k, \vec{b}$ (the unknowns of this system are the coefficients of the linear combination). If it is consistent, then $\vec{b}$ is a linear combination of the vectors $\vec{a}_1, \dots, \vec{a}_k$; if it is inconsistent, it is not. If one needs to express $\vec{b}$ as a linear combination of the vectors $\vec{a}_1, \dots, \vec{a}_k$, just produce some particular solution, which gives the required coefficients.

We will often be interested in the collection of all linear combinations of a given set of vectors.

Definition. If $S = \{\vec{v}_1, \vec{v}_2, \dots, \vec{v}_k\}$ is a set of vectors in $\mathbb{R}^n$, then the set of all linear combinations of $\vec{v}_1, \vec{v}_2, \dots, \vec{v}_k$ is called the span of $\vec{v}_1, \vec{v}_2, \dots, \vec{v}_k$ and is denoted by $\operatorname{span}(\vec{v}_1, \vec{v}_2, \dots, \vec{v}_k)$ or $\operatorname{span}(S)$. Thus,
\[ \operatorname{span}(\vec{v}_1, \vec{v}_2, \dots, \vec{v}_k) = \{c_1\vec{v}_1 + c_2\vec{v}_2 + \dots + c_k\vec{v}_k \mid c_i \in \mathbb{R}\}. \]

Definition. If $\operatorname{span}(S) = \mathbb{R}^n$, then $S$ is called a spanning set for $\mathbb{R}^n$.

Obviously, to ask whether a vector $\vec{b}$ belongs to the span of vectors $\vec{v}_1, \dots, \vec{v}_k$ is exactly the same as to ask whether $\vec{b}$ is a linear combination of the vectors $\vec{v}_1, \dots, \vec{v}_k$; see Theorem 2.4 and the method described above.

Linear (in)dependence

Definition. A set of vectors $S = \{\vec{v}_1, \vec{v}_2, \dots, \vec{v}_k\}$ is linearly dependent if there are scalars $c_1, c_2, \dots, c_k$, at least one of which is not zero, such that
\[ c_1\vec{v}_1 + c_2\vec{v}_2 + \dots + c_k\vec{v}_k = \vec{0}. \]
A set of vectors that is not linearly dependent is called linearly independent.

In other words, vectors $\vec{v}_1, \vec{v}_2, \dots, \vec{v}_k$ are linearly independent if the equality $c_1\vec{v}_1 + c_2\vec{v}_2 + \dots + c_k\vec{v}_k = \vec{0}$ implies that all the $c_i$ are zero (or: only the trivial linear combination of the $\vec{v}_i$ is equal to $\vec{0}$).

Remarks.
• In the definition of linear dependence, the requirement that at least one of the scalars $c_1, c_2, \dots
, c_k$ must be nonzero allows for the possibility that some may be zero. In the example above, $\vec{u}$, $\vec{v}$ and $\vec{w}$ are linearly dependent, since $3\vec{u} + 2\vec{v} - \vec{w} = \vec{0}$ and, in fact, all of the scalars are nonzero. On the other hand,
\[ \begin{bmatrix} 2 \\ 6 \end{bmatrix} - 2\begin{bmatrix} 1 \\ 3 \end{bmatrix} + 0\begin{bmatrix} 4 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \]
so $\begin{bmatrix} 2 \\ 6 \end{bmatrix}$, $\begin{bmatrix} 1 \\ 3 \end{bmatrix}$ and $\begin{bmatrix} 4 \\ 1 \end{bmatrix}$ are linearly dependent, since at least one (in fact, two) of the three scalars $1$, $-2$ and $0$ is nonzero. (Note that the actual dependence arises simply from the fact that the first two vectors are multiples of each other.)
• Since $0\vec{v}_1 + 0\vec{v}_2 + \dots + 0\vec{v}_k = \vec{0}$ for any vectors $\vec{v}_1, \vec{v}_2, \dots, \vec{v}_k$, linear dependence essentially says that the zero vector can be expressed as a nontrivial linear combination of $\vec{v}_1, \vec{v}_2, \dots, \vec{v}_k$. Thus, linear independence means that the zero vector can be expressed as a linear combination of $\vec{v}_1, \vec{v}_2, \dots, \vec{v}_k$ only in the trivial way: $c_1\vec{v}_1 + c_2\vec{v}_2 + \dots + c_k\vec{v}_k = \vec{0}$ only if $c_1 = 0, c_2 = 0, \dots, c_k = 0$.

Theorem 2.6. Let $\vec{v}_1, \vec{v}_2, \dots, \vec{v}_m$ be (column) vectors in $\mathbb{R}^n$ and let $A$ be the $n \times m$ matrix
\[ A = [\vec{v}_1 \mid \vec{v}_2 \mid \cdots \mid \vec{v}_m] \]
with these vectors as its columns. Then $\vec{v}_1, \vec{v}_2, \dots, \vec{v}_m$ are linearly dependent if and only if the homogeneous linear system with augmented matrix $[A \mid \vec{0}]$ has a nontrivial solution.

Proof. $\vec{v}_1, \vec{v}_2, \dots, \vec{v}_m$ are linearly dependent if and only if there are scalars $c_1, c_2, \dots, c_m$, not all zero, such that $c_1\vec{v}_1 + c_2\vec{v}_2 + \dots + c_m\vec{v}_m = \vec{0}$. By Theorem 2.4, this is equivalent to saying that the system with the augmented matrix $[\vec{v}_1 \mid \vec{v}_2 \mid \dots \mid \vec{v}_m \mid \vec{0}]$ has a non-trivial solution.

Method for determining if given vectors $\vec{v}_1, \vec{v}_2, \dots, \vec{v}_m$ are linearly dependent: form the homogeneous system as in Theorem 2.6 (the unknowns are the coefficients $c_1, \dots, c_m$). Reduce its augmented matrix to r.e.f. If there are no nontrivial solutions (= no free variables), then the vectors are linearly independent.
If there are free variables, then there are non-trivial solutions and the vectors are dependent. To find a concrete dependence, find a particular non-trivial solution, which gives the required coefficients; for that, set the free variables to 1, say (not all to 0).

Example 2.22. Any set of vectors $\vec{0}, \vec{v}_2, \dots, \vec{v}_m$ containing the zero vector is linearly dependent. For we can find a nontrivial combination of the form $c_1\vec{0} + c_2\vec{v}_2 + \dots + c_m\vec{v}_m = \vec{0}$ by setting $c_1 = 1$ and $c_2 = c_3 = \dots = c_m = 0$.

The relationship between the intuitive notion of dependence and the formal definition is given in the next theorem.

Theorem 2.5. Vectors $\vec{v}_1, \vec{v}_2, \dots, \vec{v}_m$ in $\mathbb{R}^n$ are linearly dependent if and only if at least one of the vectors can be expressed as a linear combination of the others.

Proof. If one of the vectors, say, $\vec{v}_1$, is a linear combination of the others, then there are scalars $c_2, \dots, c_m$ such that $\vec{v}_1 = c_2\vec{v}_2 + \dots + c_m\vec{v}_m$. Rearranging, we obtain
\[ \vec{v}_1 - c_2\vec{v}_2 - \dots - c_m\vec{v}_m = \vec{0}, \]
which implies that $\vec{v}_1, \vec{v}_2, \dots, \vec{v}_m$ are linearly dependent, since at least one of the scalars (namely, the coefficient 1 of $\vec{v}_1$) is nonzero.

Conversely, suppose that $\vec{v}_1, \vec{v}_2, \dots, \vec{v}_m$ are linearly dependent. Then there are scalars $c_1, c_2, \dots, c_m$, not all zero, such that $c_1\vec{v}_1 + c_2\vec{v}_2 + \dots + c_m\vec{v}_m = \vec{0}$. Suppose $c_1 \neq 0$ (renumbering the vectors if necessary). Then
\[ c_1\vec{v}_1 = -c_2\vec{v}_2 - \dots - c_m\vec{v}_m \]
and we may multiply both sides by $\frac{1}{c_1}$ to obtain $\vec{v}_1$ as a linear combination of the other vectors:
\[ \vec{v}_1 = -\left(\frac{c_2}{c_1}\right)\vec{v}_2 - \dots - \left(\frac{c_m}{c_1}\right)\vec{v}_m. \]

Corollary. Two vectors $\vec{u}, \vec{v} \in \mathbb{R}^n$ are linearly dependent if and only if they are proportional.

E.g., the vectors $[1, 2, 1]$ and $[1, 1, 3]$ are linearly independent, as they are not proportional. The vectors $[-1, 2, -1]$ and $[2, -4, 2]$ are linearly dependent, since they are proportional (with coefficient $-2$).

Theorem 2.8. Any set of $m$ vectors in $\mathbb{R}^n$ is linearly dependent if $m > n$.

Proof. Let $\vec{v}_1, \vec{v}_2, \dots
, \vec{v}_m$ be (column) vectors in $\mathbb{R}^n$ and let $A$ be the $n \times m$ matrix $A = [\vec{v}_1 \mid \vec{v}_2 \mid \cdots \mid \vec{v}_m]$ with these vectors as its columns. By Theorem 2.6, $\vec{v}_1, \vec{v}_2, \dots, \vec{v}_m$ are linearly dependent if and only if the homogeneous linear system with augmented matrix $[A \mid \vec{0}]$ has a nontrivial solution. But, according to Theorem 2.3 (not 2.6, which is a misprint in the textbook here), this will always be the case if $A$ has more columns than rows; it is the case here, since the number of columns $m$ is greater than the number of rows $n$. (Note that here $m$ and $n$ have opposite meanings compared to Theorem 2.3.)

Theorem 2.7. Let $\vec{v}_1, \vec{v}_2, \dots, \vec{v}_m$ be (row) vectors in $\mathbb{R}^n$ and let $A$ be the $m \times n$ matrix
\[ A = \begin{bmatrix} \vec{v}_1 \\ \vec{v}_2 \\ \vdots \\ \vec{v}_m \end{bmatrix} \]
with these vectors as its rows. Then $\vec{v}_1, \vec{v}_2, \dots, \vec{v}_m$ are linearly dependent if and only if $\operatorname{rank}(A) < m$.

Note that there is no linear system in Theorem 2.7 (although e.r.o.s must be used to reduce $A$ to r.e.f.; then $\operatorname{rank}(A)$ = number of non-zero rows of this r.e.f.).

Proof. “⇒” If $\vec{v}_1, \vec{v}_2, \dots, \vec{v}_m$ are linearly dependent, then by Theorem 2.5 one of these vectors is equal to a linear combination of the others. Swapping rows by a type 1 e.r.o. if necessary, we can assume that $\vec{v}_m = c_1\vec{v}_1 + \dots + c_{m-1}\vec{v}_{m-1}$. We can now kill off the $m$-th row by e.r.o.s
\[ A \xrightarrow{R_m - c_1R_1} \ \xrightarrow{R_m - c_2R_2} \ \cdots \ \xrightarrow{R_m - c_{m-1}R_{m-1}} ; \]
the resulting matrix will have its $m$-th row consisting of zeros. Next, we apply e.r.o.s to reduce the submatrix consisting of the upper $m - 1$ rows to r.e.f. Clearly, together with the zero $m$-th row it will be an r.e.f. of $A$, with at most $m - 1$ non-zero rows. Thus, $\operatorname{rank}(A) \leq m - 1$.

“⇐” (assumed without proof) The idea is that if $\operatorname{rank}(A) \leq m - 1$, then the r.e.f. of $A$ has a zero row at the bottom. Analysing the e.r.o.s that lead from $A$ to this r.e.f., one can show (we assume this without proof) that one of the rows is a linear combination of the others; see the textbook, Example 2.25.

Row Method for deciding if vectors $\vec{v}_1, \dots
, \vec{v}_m$ are linearly dependent. Form the matrix $A$ with rows $\vec{v}_i$ (even if originally you were given columns, just “lay them down”, rotating by 90° clockwise). Reduce $A$ by e.r.o.s to r.e.f.; the number of non-zero rows in this r.e.f. is $\operatorname{rank}(A)$. The vectors are linearly dependent if and only if $\operatorname{rank}(A) < m$. (Again: note that there is no linear system to solve here; there are no unknowns, and it does not matter if there is a bad row.)

Theorem on e.r.o.s and spans. E.r.o.s do not alter the span of the rows of a matrix.

(Again: there is no linear system here, no unknowns.)

Proof. Let $\vec{v}_1, \vec{v}_2, \dots, \vec{v}_m$ be the rows of a matrix $A$, to which we apply e.r.o.s. Clearly, it is sufficient to prove that the span of the rows is not changed by a single e.r.o. Let $\vec{u}_1, \vec{u}_2, \dots, \vec{u}_m$ be the rows of the new matrix. By the definition of an e.r.o., every $\vec{u}_i$ is a linear combination of the $\vec{v}_j$ (most rows are even the same). Now, in every linear combination $c_1\vec{u}_1 + c_2\vec{u}_2 + \dots + c_m\vec{u}_m$ we can substitute those expressions of the $\vec{u}_i$ via the $\vec{v}_j$. Expanding brackets and collecting terms, this becomes a linear combination of the $\vec{v}_j$. In other words,
\[ \operatorname{span}(\vec{v}_1, \dots, \vec{v}_m) \supseteq \operatorname{span}(\vec{u}_1, \dots, \vec{u}_m). \]
By the Lemma on inverse e.r.o.s, the old matrix is also obtained from the new one by the inverse e.r.o. By the same argument,
\[ \operatorname{span}(\vec{v}_1, \dots, \vec{v}_m) \subseteq \operatorname{span}(\vec{u}_1, \dots, \vec{u}_m). \]
As a result, $\operatorname{span}(\vec{v}_1, \dots, \vec{v}_m) = \operatorname{span}(\vec{u}_1, \dots, \vec{u}_m)$.
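The Row Method above can be sketched in a few lines. This is a minimal illustration rather than part of the original notes: the function names `rank` and `linearly_dependent` are hypothetical, and exact `Fraction` arithmetic is an assumption made to avoid rounding. The test vectors are the ones used in the examples of these notes.

```python
from fractions import Fraction


def rank(rows):
    """Number of nonzero rows in a row echelon form of the matrix."""
    m = [[Fraction(x) for x in row] for row in rows]
    top = 0  # number of leading entries created so far
    for col in range(len(m[0])):
        pivot = next((r for r in range(top, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue
        m[top], m[pivot] = m[pivot], m[top]               # type 1 e.r.o.
        for r in range(top + 1, len(m)):                  # type 3 e.r.o.s
            factor = m[r][col] / m[top][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[top])]
        top += 1
    return top


def linearly_dependent(vectors):
    """Row Method: the vectors (as rows) are dependent iff rank < m."""
    return rank(vectors) < len(vectors)


print(linearly_dependent([[1, 2, 1], [1, 1, 3]]))      # False: not proportional
print(linearly_dependent([[-1, 2, -1], [2, -4, 2]]))   # True: proportional
print(linearly_dependent([[2, 6], [1, 3], [4, 1]]))    # True: 3 vectors in R^2
```

The last call is also an instance of Theorem 2.8: three vectors in $\mathbb{R}^2$ can have rank at most 2, so they must be dependent.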