A Brief on Linear Algebra
by T.S. Angell (© 1999, all rights reserved)

1 Introduction

Those parts of linear algebra that we will use in this course are those concerned with:

(A.) The basic theory of linear homogeneous and inhomogeneous systems of simultaneous algebraic equations, e.g., for n = 3 a system of the form

    a11 x1 + a12 x2 + a13 x3 = b1
    a21 x1 + a22 x2 + a23 x3 = b2
    a31 x1 + a32 x2 + a33 x3 = b3

and the corresponding homogeneous system in which b1 = b2 = b3 = 0. In particular, it is necessary to know when the system has a unique solution, an infinity of solutions, or no solution at all, as well as what the relationship of the solutions is to those of the corresponding homogeneous system.

(B.) The basic ideas of matrix algebra and the way in which matrix algebra can be brought to bear on the finding of answers to the questions raised in (A.) above.

(C.) The basic ideas of a vector space, a subspace of a vector space, a basis, and linear independence of vectors.

(D.) The ideas of eigenvalues and eigenvectors of a matrix and how they are computed.

(E.) The idea of the exponential of a matrix and how it can be computed.

2 Vector Spaces

We start with the notion of a field. A field is simply a set of objects that behave like the real numbers. So there are two ways of combining elements of a field (we usually call them addition (a + b) and multiplication (a · b)), which follow the usual rules of arithmetic. There are lots of these things running around. For example, the set of rational numbers Q is a field, although the integers Z do not form a field (because, for example, 1/2 is not an integer). We will concentrate entirely on two concrete examples, the set of real numbers R and the set of complex numbers C.

We can now define the idea of a vector space over a field, which we will denote by {V, F} or simply by V when the particular field involved is clear.
The usual terminology is that the objects in the set V are called vectors while the elements of F are called scalars. Here is the formal definition, which really is a description of the rules for computation in a vector space.

Definition 2.1 A vector space over F = R or C is a non-empty set V such that for any two elements x, y ∈ V there is a unique element x + y ∈ V (we call x + y the “addition of x and y”) and for every x ∈ V and every α ∈ F there is an element α x ∈ V (the “stretching” of x by α). Furthermore the following axioms must hold:

The Laws of Vector Addition:
(a) Commutative Law for Sums: x + y = y + x.
(b) Associative Law for Sums: (x + y) + z = x + (y + z).
(c) Existence of a Zero Element: There is an element 0 ∈ V such that x + 0 = x.
(d) Existence of Additive Inverses: To every x ∈ V there corresponds an element x′ such that x + x′ = 0. We denote this additive inverse by the symbol −x. (Caution: This is NOT “subtraction”, which is not an operation defined directly in a vector space!)

The Laws of Stretching:
(e) The Distributive Law #1: (α + β)x = αx + βx.
(f) The Distributive Law #2: α(x + y) = αx + αy.
(g) The Associative Property: (αβ)x = α(βx).
(h) Property of 1 ∈ F: For 1 ∈ F, 1x = x.

Surprisingly, perhaps, these are all the rules that are needed. All other facts can be proven from these. Let us point out three simple ones in particular which initially appear to be a little “nitpicky”; the last serves as a good illustration of the remark that subtraction is not defined as a separate operation. First we prove a little result about uniqueness:

Lemma 2.2 If u, w ∈ V and u + w = 0, then w = −u, where this latter vector is the additive inverse of u of axiom (d) above.

Proof: Suppose that the vector w ∈ V satisfies the equation u + w = 0. Then we can write:

    w = w + 0 = w + (u + (−u)) = (w + u) + (−u)   [by axioms (b) and (d)]
      = 0 + (−u) = −u.

Hence w = −u. It follows that additive inverses are unique; there can be no more than one.
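The axioms lend themselves to numerical spot-checking, which can be a useful sanity exercise even though it proves nothing. The following sketch (illustrative only; it assumes the numpy library is available) tests several axioms, and the conclusions of Lemmas 2.3 and 2.4 below, on random vectors in R²:

```python
import numpy as np

rng = np.random.default_rng(0)
x, y, z = rng.standard_normal((3, 2))   # three random vectors in R^2
alpha, beta = 1.7, -0.3                 # two scalars in R

# (a) commutativity and (b) associativity of vector addition
assert np.allclose(x + y, y + x)
assert np.allclose((x + y) + z, x + (y + z))

# (e), (f) the two distributive laws and (g) associativity of stretching
assert np.allclose((alpha + beta) * x, alpha * x + beta * x)
assert np.allclose(alpha * (x + y), alpha * x + alpha * y)
assert np.allclose((alpha * beta) * x, alpha * (beta * x))

# The conclusions of Lemmas 2.3 and 2.4: 0x = 0 and (-1)x = -x
assert np.allclose(0 * x, np.zeros(2))
assert np.allclose((-1) * x, -x)
```

Of course, a finite sample of vectors can only illustrate the axioms; the lemmas that follow show why proofs from the axioms are still required.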
You will notice, if you review the axioms, that there is no specific relationship mentioned between the scalar product 0u for u ∈ V, 0 ∈ F, and the additive identity element 0 ∈ V. That relationship we can establish simply as follows:

Lemma 2.3 For any u ∈ V the scalar product (0)u represents the same vector as the additive identity 0 (see axiom (c) above).

Proof: Since 0 = 0 + 0 in F, the distributive law (e) gives

    0u = (0 + 0)u = 0u + 0u,

and, since 0u ∈ V, this vector has an additive inverse −(0u). Using the property of the additive inverse of any vector,

    0u + [−(0u)] = (0u + 0u) + [−(0u)]
    0 = 0u + (0u + [−(0u)])
    0 = 0u + 0,

which yields 0u = 0.

Using this result we can show the relationship between the element −1 ∈ F and the additive inverse of an element of V.

Lemma 2.4 Let u ∈ V and −1 ∈ F. Then (−1)u = −u.

Proof: First,

    u + [(−1)u] = [1 + (−1)]u = 0u = 0 = u + (−u),

so, adding −u to both sides,

    (−u) + (u + [(−1)u]) = (−u) + [u + (−u)]
    [(−u) + u] + (−1)u = [(−u) + u] + (−u),

where we have used the associative and distributive laws throughout. This then yields

    0 + (−1)u = 0 + (−u).

We conclude that (−1)u = −u.

While, to the beginner, the facts proved in these lemmas seem trivial, almost as if there were nothing to prove, they illustrate the power of the axioms. They are, in fact, properties of various elements of the vector space and relate operations in the scalar field to the operations between vectors; from a logical point of view they need to be proved.

To flesh out the concepts in this section, we need some concrete working examples. Here are some simple examples. Pay particular attention to the fifth one, which gives you a hint about how vector spaces and differential equations are interconnected.

Example 2.5 {V, R} = {R², R}, whose elements we write, traditionally, as “column vectors” of real numbers. To distinguish between the entries of the vector (its “components”), which are real numbers, and the scalars, which again are real numbers, we use Greek letters for the latter.
The operations are defined “component-wise” as you have seen earlier in your studies: If (x, y)ᵀ, (u, v)ᵀ ∈ V and α ∈ R, then addition and stretching are defined by

    (x, y)ᵀ + (u, v)ᵀ = (x + u, y + v)ᵀ   and   α (x, y)ᵀ = (αx, αy)ᵀ.

A similar example can be constructed for Rⁿ for any integer n > 0.

Example 2.6 The set P6(C) of all polynomial functions of degree less than or equal to six and having complex coefficients. A typical element then has the form

    p(z) = c0 + c1 z + c2 z² + c3 z³ + c4 z⁴ + c5 z⁵ + c6 z⁶.

As expected, the vector operations are defined pointwise so that, if q is another such polynomial, with coefficients d0, . . . , d6, we have

    (p + q)(z) = (c0 + d0) + (c1 + d1)z + . . . + (c6 + d6)z⁶

and

    (a p)(z) = a c0 + a c1 z + . . . + a c6 z⁶.

Remark: We could just as well define P6(R), where we take the scalars (including the coefficients) to be real. Likewise, we could take any integer n > 0 in place of the integer 6.

Example 2.7 We take V = C([0, 1], R), the set of all continuous real-valued functions defined on the interval [0, 1], over the field R. Again, we define the operations pointwise. Thus,

    (f + g)(x) = f(x) + g(x), for all x ∈ [0, 1]

and

    (a f)(x) = a f(x), for all x ∈ [0, 1].

Notice that since all polynomials defined on [0, 1] are continuous, the vector space {P6, R} is a subset of {C([0, 1], R), R}. Since the former is a vector space in its own right, we call it a subspace of the latter.

Example 2.8 We take V = Cp¹([0, 1], R), the set of all continuous real-valued functions defined on the interval [0, 1] which are piecewise differentiable on [0, 1]. For the field we again take R, and define the operations pointwise.

The final example here is particularly important, so we will make some additional comments.

Example 2.9 Let S := {y ∈ Cp²([0, 1]) | y′′ + 4y = 0}. Thus S is the solution set of the linear second order ODE which describes the harmonic oscillator with frequency ωo = 2.
Let H := {S, R}; then this pair H is a vector space under the usual pointwise definitions of addition and stretching for functions. Indeed, 0 ∈ H since the zero function is always a solution of a linear homogeneous equation, and the fact that if y1, y2 ∈ H and α ∈ R, then y1 + y2 ∈ S and αy1 ∈ S, is just a restatement of the Principle of Superposition! This is an important example! It gives us a hint about how the basic theory of linear algebra has some bearing on the nature of solutions of homogeneous equations.

Exercise 2.10 The same construction is possible for a system of linear algebraic equations. Show that the solution set of

    5x1 + 2x2 + x3 = 0
    2x1 − x2 − 3x3 = 0

is a vector space over R.

3 Subspaces and Bases

As we remarked after Example 2.7, the set P6(R) is a subset of the vector space C([0, 1], R) and it is a vector space in its own right with the same, i.e. pointwise, definitions of the vector space operations. This is, as is clear after a moment’s thought, also true of Cp¹([0, 1], R) and of the vector space H of Example 2.9. Likewise, if you have solved Exercise 2.10, you can see that the solution set is likewise a vector space sitting inside R³ considered as a vector space over R. This situation is sufficiently common to give it a special definition.

Definition 3.1 A subset of V which is itself a vector space over the same field and with the same operations is called a subspace of {V, F}.

All of the examples mentioned in the preceding paragraph are therefore subspaces of the corresponding vector space. It is obvious that if W ⊂ V, then the operations which are defined on V are likewise defined on W. It is likewise true that the basic rules of computation outlined in the definition of a vector space remain valid rules in W.
Therefore, to check that a particular subset W is or is not a subspace, we need only concentrate on whether the additive identity and additive inverses of elements of W are again in W, and whether the sums and scalar multiples of elements of W are likewise in W. In fact, the situation is less complicated than it may seem at first glance.

Theorem 3.2 A subset W ⊂ V is a subspace of V provided
(a) If u, v ∈ W, then u + v ∈ W, and
(b) If u ∈ W and α ∈ F, then α u ∈ W.

Remark: If both (a) and (b) are satisfied, we say that W is closed with respect to the vector space operations inherited from V, or more simply, that W is algebraically closed.

Proof: Suppose that W is algebraically closed. Then, as remarked above, we need only check that the vector 0 ∈ W and that, if u ∈ W, then −u ∈ W. Now, by hypothesis (b), given any vector u ∈ W the vector 0u ∈ W. But, by Lemma 2.3, 0u = 0; hence 0 ∈ W. Likewise, by hypothesis (b) the scalar multiple (−1)u ∈ W, and Lemma 2.4 asserts that (−1)u = −u, the additive inverse of u ∈ W. This finishes the proof.

We now take a look at a couple of different examples.

Example 3.3 Consider the set of all vectors in R³ which satisfy the constraint x1 + x2 = 0. So these vectors all look like

    (a, −a, z)ᵀ

where a, z ∈ R are arbitrary. The set of all these vectors forms a subspace since

    (a, −a, z)ᵀ + (b, −b, ẑ)ᵀ = (a + b, −(a + b), z + ẑ)ᵀ

and, for any α ∈ R,

    α (a, −a, z)ᵀ = (αa, −αa, αz)ᵀ,

so that this set is preserved under both addition of vectors and stretching by α ∈ R. Hence, according to the theorem, the set of such vectors is a subspace.

Contrast this situation with the following one, again in R³.

Example 3.4 Consider the set of all vectors in R³ which satisfy the equations

    x + y − z = 0
    3x − 2y + (z + 1) = 0

This set of vectors is not a subspace of R³. The easiest way to see this is that the set does not contain the vector 0, as can easily be seen by setting each of the variables equal to zero and observing that the last equation becomes the statement 1 = 0.
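The closure test of Theorem 3.2 is easy to experiment with numerically. The sketch below (illustrative only, assuming numpy is available) checks closure for the constraint set of Example 3.3, and confirms that the zero vector violates the second equation of Example 3.4:

```python
import numpy as np

def in_W(v):
    """Membership test for the set of Example 3.3: x1 + x2 = 0."""
    return np.isclose(v[0] + v[1], 0.0)

u = np.array([2.0, -2.0, 7.0])
v = np.array([-1.5, 1.5, 0.25])

# (a) closed under addition and (b) closed under stretching
assert in_W(u + v)
assert in_W(3.8 * u)

# Example 3.4: at the origin the equation 3x - 2y + (z + 1) = 0
# reads 1 = 0, so the solution set cannot be a subspace.
zero = np.zeros(3)
residual = 3 * zero[0] - 2 * zero[1] + (zero[2] + 1)
assert residual == 1.0
```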
Note that this algebraic system nevertheless has a one-parameter family of solutions, namely all vectors of the form ((y − 1)/4, y, (5y − 1)/4)ᵀ; it is not an inconsistent system.

Of course, as we hinted above, the solution set of a homogeneous linear ordinary differential equation of order n, defined on an interval (possibly infinite) I ⊂ R, is a subspace of, for example, {Cⁿ(I, R), R}. This is a fundamental fact in the theory of linear differential equations and is checked by using the theorem of this section. We repeat that the theorem, in the context of differential equations, is called the Principle of Superposition.

Exercise 3.5 Show that the solution set of the differential equation

    t² ẍ − 2t² ẋ + 5x = 0

is a subspace of the vector space {C²([0, ∞), R), R}.

Just for a moment, let us consider the set of real numbers itself. If we return to the definition of a vector space and look at R carefully, then we can see that we can, although we usually do not, think of the set R as a vector space over the field R. Our purpose in pointing this out is really the observation that for this very simple vector space there is a single vector, namely the vector 1, in terms of which every vector in R can be represented as an appropriate multiple. For example, the vector 5 can be written as 5 · 1, while the vector π/17 = (π/17) · 1. The crucial fact is that every vector in R can be represented as a sum of scalar multiples of a finite set of vectors, in this case the singleton set {1}. We can say that 1 is a “basic” vector or that the set {1} is a “spanning set” for the vector space R.

This is, in fact, a familiar situation in R² and R³, where it is often the case that the familiar vectors of these Euclidean spaces are “decomposed” into components along the coordinate axes. That is, say in the case of R³, we introduce the vectors which we usually call î, ĵ, and k̂, which are taken to have unit length and to be mutually perpendicular, lying as they do along the coordinate axes.
We then “take components” along the coordinate axes so that a vector from the point (0, 0, 0) to the point (1, −5, 3) is thought of as a vector (1, −5, 3)ᵀ and is written

    (1, −5, 3)ᵀ = 1î − 5ĵ + 3k̂.

Again, we say that, in considering the vector space {R³, R}, the set of vectors {î, ĵ, k̂} is a spanning set for the vector space. In light of the fact that, in physical applications, we often have to deal with Euclidean spaces of dimension greater than three, one often writes the elements of the spanning set as ê1, ê2, and ê3 instead of î, ĵ, k̂ respectively. We should emphasize that this is a symbolic notation. As the objects in the vector space are really columns of three real numbers, we have

    ê1 = (1, 0, 0)ᵀ,  ê2 = (0, 1, 0)ᵀ,  ê3 = (0, 0, 1)ᵀ.

Here is a less familiar example.

Example 3.6 Consider the vector space {P3(R), R}. Then, by definition of these polynomials, a spanning set is the set of monomials {1, x, x², x³}, since any polynomial of degree less than or equal to 3 can be written as a sum of real multiples of these vectors. Thus, for example, the polynomial that we usually write as x² − 1 we can represent as (−1)1 + 0x + 1x² + 0x³.

It is useful at this point to introduce the formal definition.

Definition 3.7 Given a finite set of vectors {v1, v2, . . . , vk} ⊂ V, a linear combination of the vectors in this set is a sum of the form

    c1 v1 + c2 v2 + . . . + ck−1 vk−1 + ck vk,

where the ci, i = 1, 2, . . . , k, are elements of the field F.

So, for example, 1î − 5ĵ + 3k̂ is a linear combination of the vectors î, ĵ, and k̂. Likewise (−1)1 + 0x + 1x² + 0x³ is a linear combination of the vectors 1, x, x², and x³.

To the preceding definition, we add the following:

Definition 3.8 A finite set of vectors S = {v1, v2, . . . , vn} is said to span the vector space {V, F} provided every vector in V can be written as a linear combination of the elements of S. We use the notation V = ⟨v1, v2, . . . , vn⟩, or V = ⟨S⟩.
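In coordinates, writing a given vector as a linear combination of a spanning set amounts to solving a linear system for the coefficients. A minimal sketch (assuming numpy; the particular spanning set below is chosen arbitrarily for illustration, not taken from the text):

```python
import numpy as np

# A spanning set for R^3, stored as the columns of S:
# v1 = (1,0,0), v2 = (1,1,0), v3 = (1,1,1).
S = np.array([[1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

v = np.array([1.0, -5.0, 3.0])

# Finding coefficients c with c1*v1 + c2*v2 + c3*v3 = v is
# exactly solving the linear system S c = v.
c = np.linalg.solve(S, v)        # coefficients 6, -8, 3
assert np.allclose(S @ c, v)
```

Here 6·v1 − 8·v2 + 3·v3 = (1, −5, 3)ᵀ, the same vector written above as 1î − 5ĵ + 3k̂ in the standard spanning set.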
With this definition and the remarks made above about the vector space R³, we see that, as a particular example, R³ = ⟨î, ĵ, k̂⟩.

Cautionary Remark: It may not be immediately obvious, but there is certainly no uniqueness associated with the existence of a spanning set. In other words, there is nothing that says that there cannot be more than one such set. Moreover, there is nothing that says that the coefficients of a representation of a vector in V with respect to a spanning set are the only choice of coefficients. The first of these facts turns out to be a great benefit; the second is one we must find a way to avoid. We consider these facts in order by means of examples. We can then decide how to avoid the second problem by introducing a new idea. We begin with the idea of two different sets spanning a given vector space. Indeed, there are infinitely many spanning sets for any vector space.

Example 3.9 The vector space {P3(R), R} is certainly spanned by the set S1 := {1, x, x², x³}. There is another, in this case important and famous, spanning set of polynomials S2 := {p0, p1, p2, p3}, where

    p0(x) = 1,  p1(x) = x,  p2(x) = (1/2)(3x² − 1),  and  p3(x) = (1/2)(5x³ − 3x).

These polynomials are called Legendre polynomials and have many interesting properties. What concerns us here is that the set of Legendre polynomials is a spanning set for P3(R). So, for example, if, as before, the vector q ∈ P3(R) is given by q(x) = x² − 1, then we can write

    q = −(2/3) p0 + 0 p1 + (2/3) p2 + 0 p3.

Indeed,

    −(2/3) p0(x) + (2/3) p2(x) = −2/3 + (2/3) · (1/2)(3x² − 1)
      = −2/3 + x² − 1/3
      = x² − 1.

Nor is it necessarily the case that two spanning sets have the same number of elements, as the following example demonstrates.

Example 3.10 Let S1 be the set of that name in the previous example and let S3 := {1, x, x², x³ + 3x² − 1, x³}. Then ⟨S1⟩ = ⟨S3⟩ = P3(R).
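The Legendre expansion of q(x) = x² − 1 in Example 3.9 can be double-checked with numpy's Legendre-series utilities (assuming numpy is available):

```python
import numpy as np
from numpy.polynomial import legendre

# Legendre-series coefficients (for P0, P1, P2) from Example 3.9:
# q = -(2/3) P0 + 0 P1 + (2/3) P2.
leg_coeffs = [-2/3, 0.0, 2/3]

# Converting to ordinary power-series coefficients (constant term
# first) should recover x^2 - 1, i.e. coefficients (-1, 0, 1).
poly_coeffs = legendre.leg2poly(leg_coeffs)
assert np.allclose(poly_coeffs, [-1.0, 0.0, 1.0])

# Sanity check by evaluation at sample points.
xs = np.linspace(-1.0, 1.0, 7)
assert np.allclose(legendre.legval(xs, leg_coeffs), xs**2 - 1)
```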
What is important to recognize in this rather artificial example is that there is a certain “redundancy” in the set S3. Thus, the last element of that set can be written as a sum of multiples of the vectors which precede it in the list. Indeed, “by inspection”, we see that

    x³ = 1 · 1 + 0x − 3x² + 1 · q3,  where  q3(x) := x³ + 3x² − 1.

This shows that the vector space can be described as the span of a smaller set, namely S4 := {1, x, x², q3}, which is a much more economical way of writing the vectors, as it involves, once again, only four coefficients rather than five.

4 Linearly Independent Sets, Basis, and Dimension

For many purposes it is convenient, and indeed crucial, to have a set of vectors which spans a given subspace and yet is not redundant, as was the set of monomials together with the vector q3 defined by the function x³ + 3x² − 1. As we pointed out in the example, if one uses such a redundant set, then there is more than one way to represent a given vector as a linear combination of the elements of the spanning set. In essence, what we want is a non-redundant or “minimal” spanning set. The hint as to how we can identify such sets is given to us if we look at the example, for if we write down the linear combination

    1 · 1 + 0x − 3x² − 1x³ + 1q3,

then we see that {1, 0, −3, −1, 1} is a set of coefficients, not all of which are zero, such that the linear combination above is the zero vector.

This observation is a generalization of the fact that a set of two vectors is a linearly independent set if one vector is not a multiple of the other. For if v2 is a scalar multiple of v1, say v2 = √(2π) v1, then the span, i.e., the set of all linear combinations of the set of vectors {v1, v2}, is just the span of the singleton {v1}, since any vector of the form

    c1 v1 + c2 v2 = [c1 + c2 √(2π)] v1

is just a scalar multiple, as indicated, of v1.
Moreover, we need only take d1 = −√(2π), d2 = 1 to find a linear combination of the two vectors which vanishes without both of the coefficients vanishing:

    −√(2π) v1 + 1 · v2 = (−√(2π) + √(2π)) v1 = 0.

We formalize these observations with the standard definition.

Definition 4.1 Given a set of vectors {v1, v2, . . . , vk} in a vector space {V, F}, the set is said to be linearly dependent if there is a set of scalars {c1, c2, . . . , ck} such that
(a) not all of the scalars are zero, and
(b) c1 v1 + c2 v2 + . . . + ck vk = 0.

In a given vector space it is easy to construct a linearly dependent set of vectors. The set {1, x, x², x³, q3} is, as we have seen, a linearly dependent set in the vector space {P3, R}. A slightly more subtle example is the following.

Example 4.2 In the vector space {C((−π, π), R), R}, consider the set of functions {1, sin 2t, cos 2t, cos² t}. If we form the linear combination

    1 · 1 + 0 · sin 2t + 1 · cos 2t − 2 cos² t = 0,

we see that these vectors are linearly dependent! To see that the previous equation is true, just remember that cos 2t = 2 cos² t − 1.

There is, naturally, a corresponding definition of the opposite case, and that is really the one that we are after.

Definition 4.3 A set of vectors in a vector space is said to be linearly independent provided that it is not a linearly dependent set.

As such, the definition is not very useful. What we need to do is to develop a computational test for linear independence. The usual test is one which uses the definition of dependence in a way that seems a little tricky until you get used to it. It is given in the following theorem.

Theorem 4.4 Let {V, F} be a vector space. Then a set of vectors {v1, v2, . . . , vk} is a linearly independent set of vectors provided that if the linear combination

    c1 v1 + c2 v2 + . . . + ck vk = 0,

then necessarily c1 = c2 = . . . = ck = 0.

Proof: Suppose that the set of vectors is linearly independent, and suppose that there are scalars ci, i = 1, 2, . . . , k, such that the linear combination c1 v1 + c2 v2 + . . . + ck vk is the zero vector. If not all of the coefficients ci were zero, then the scalars {c1, c2, . . . , ck} would satisfy both conditions (a) and (b) of Definition 4.1, so that the set of vectors would be linearly dependent, which contradicts our initial choice of the vectors {v1, v2, . . . , vk}. Hence all the coefficients must vanish. (Note, incidentally, what a nonvanishing coefficient would mean: if there were a first one, say ck₀ ≠ 0, we could solve to obtain

    vk₀ = −(ck₀₊₁/ck₀) vk₀₊₁ − . . . − (ck/ck₀) vk,

expressing vk₀ in terms of the remaining vectors; this is precisely the “redundancy” we saw earlier.)

Remark: Note that we claim that if a set of vectors is linearly independent, then it cannot contain the vector 0. Why?

We illustrate with an example.

Example 4.5 In {P3, R} we have seen that the set of vectors {1, x, x², x³, q3} is a linearly dependent set. However, the set {x, x², q3, x³} is a linearly independent set. Indeed, if we assume that

    c1 x + c2 x² + c3 q3 + c4 x³ = 0,

then, regrouping the left-hand side, we can write

    −c3 + c1 x + (c2 + 3c3)x² + (c3 + c4)x³ = 0.

Hence c3 = 0, c1 = 0, (c2 + 3c3) = 0, and (c3 + c4) = 0. The first and third equations imply that c2 = 0, while the first and last imply that c4 = 0. Hence the vectors are linearly independent according to the last theorem.

We started with the idea of a spanning set and found that there may well be certain redundancies that we wish to eliminate by finding a minimal spanning set. A minimal spanning set is necessarily a linearly independent set, and is called a basis of the vector space. That minimal spanning sets are linearly independent sets should be pretty clear. What takes a little more thought is the fact that, while there are many different choices of minimal spanning sets for a vector space, they all have the same number of vectors in them. This is a fact that we will not prove here.
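In a finite dimensional setting, the test of Theorem 4.4 reduces to a rank computation on coordinate vectors. A sketch for Example 4.5 (assuming numpy; each polynomial is encoded by its coefficients in the monomial basis 1, x, x², x³):

```python
import numpy as np

# Coordinate vectors with respect to the monomial basis (1, x, x^2, x^3):
one = [1, 0, 0, 0]          # the constant polynomial 1
x1  = [0, 1, 0, 0]          # x
x2  = [0, 0, 1, 0]          # x^2
x3  = [0, 0, 0, 1]          # x^3
q3  = [-1, 0, 3, 1]         # q3(x) = x^3 + 3x^2 - 1

# Example 4.5: {x, x^2, q3, x^3} is linearly independent
# (four vectors, rank four) ...
A = np.array([x1, x2, q3, x3])
assert np.linalg.matrix_rank(A) == 4

# ... while {1, x, x^2, x^3, q3} is linearly dependent
# (five vectors, but the rank is still only four).
B = np.array([one, x1, x2, x3, q3])
assert np.linalg.matrix_rank(B) == 4
```

The rank test is exactly the statement of Theorem 4.4: full rank means the only vanishing linear combination is the trivial one.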
Nevertheless it is true, and it enables us to associate a number (namely the number of vectors in a basis), called the dimension, with a vector space. There are two categories of vector spaces: those with only a finite number of elements in a basis, and those which contain an infinite set of vectors any finite subset of which is a linearly independent set. We formalize this remark:

Definition 4.6 Let {V, F} be a vector space and suppose that a finite set of vectors B := {v1, v2, . . . , vn} is a set of linearly independent vectors which span the vector space. Then we say that {V, F} is finite dimensional and has dimension n. If the vector space contains a set with infinitely many vectors, any finite subset of which is a linearly independent set, then the vector space {V, F} is called an infinite dimensional vector space.

The examples given so far afford several examples of bases. The most familiar is the set {î, ĵ, k̂}, which is a basis for what we usually call R³; we talk all the time about three-dimensional space. Likewise, if we look at the example {P3, R}, then the set of monomials {1, x, x², x³} is a linearly independent spanning set; it is a basis with four elements, and so the vector space {P3, R} has dimension four.

To end this section, we will consider an example which we have treated many times, but using different words. The purpose of the example is to show explicitly how these new ideas involving basis and dimension have already played a part in what we have done. Moreover, it will serve as motivation for the next section, in which we will treat systems of first order differential equations.

Example 4.7 Consider the vector space {C²(R, R), R} consisting of twice continuously differentiable real-valued functions. Like the vector space {C([0, 1], R), R}, this space contains the monomials and so is an infinite dimensional vector space.
Now, we look at the set S defined as

    S := {x ∈ C²(R, R) | ẍ − 4ẋ + 4x = 0}.

The Principle of Superposition tells us that the set S is a subspace of {C²(R, R), R} since the sum of two solutions of a homogeneous linear differential equation is again a solution, as is any constant multiple of a solution. Hence the set S ⊂ {C²(R, R), R} is closed with respect to the vector space operations. (See Theorem 3.2.)

When we analyze the given second order equation, we usually want to find the “general solution” in the form of a two-parameter family of solutions. Indeed, we even showed that every solution of the given differential equation could be found by proper choice of constants in the family c1 x1(t) + c2 x2(t), provided that one of the solutions x1 and x2 was not a multiple of the other (we called this property “linear independence”). Looking at the present specific case, the characteristic polynomial is

    λ² − 4λ + 4 = (λ − 2)².

Therefore we found that one solution was x1(t) = e^{2t}, and we used the method of variation of constants to find another, namely x2(t) = t e^{2t}. Indeed, in our present use of the term, these two functions are linearly independent vectors for, if a linear combination c1 x1(t) + c2 x2(t) = 0 for all t, then necessarily c1 = c2 = 0. This is easy to see, for if we evaluate the linear combination at t = 0, then c1 e⁰ + c2 · 0 = 0, or c1 = 0; and then, if we evaluate the expression c2 t e^{2t} at t = 1, we obtain c2 e² = 0, which implies that c2 = 0. Therefore, the two vectors x1 and x2 are linearly independent in the vector space {C²(R, R), R}.

So we have two elements of S which are linearly independent. We claim that the set {x1, x2} ⊂ S is in fact a spanning set of the subspace S. Much earlier, when we first studied second order equations, we gave a proof that any solution can be written in terms of the “general solution” with proper choices of the constants c1 and c2.
In our new vocabulary, we proved that the set {x1, x2} was a spanning set for S. We repeat the proof here.

Suppose that x ∈ S. Then this choice of solution determines the two initial conditions which are satisfied by x, namely x(0) = xo and ẋ(0) = x̂o, with xo, x̂o ∈ R. Moreover, the uniqueness part of the existence and uniqueness theorem for the initial value problem tells us that, if there exists a solution y ∈ C²(R, R) for which y(0) = xo and ẏ(0) = x̂o, then y(t) = x(t) for all t ∈ R. Let us construct such a function y in the form

    y = c1 x1 + c2 x2 = c1 e^{2t} + c2 t e^{2t}.

Indeed, we need only find the constants c1 and c2 such that

    c1 x1(0) + c2 x2(0) = xo
    c1 ẋ1(0) + c2 ẋ2(0) = x̂o.

Since ẋ1(t) = 2e^{2t} and ẋ2(t) = 2t e^{2t} + e^{2t}, that system becomes

    c1 · 1 + c2 · 0 = xo
    c1 · 2 + c2 · 1 = x̂o,

which has the solution c1 = xo, c2 = x̂o − 2xo. Hence the function

    y(t) = xo e^{2t} + (x̂o − 2xo) t e^{2t}

satisfies the same initial conditions as the original x, and so

    x(t) = xo e^{2t} + (x̂o − 2xo) t e^{2t}.

From this result we conclude that the set {e^{2t}, t e^{2t}} is a spanning set for the subspace S consisting of two linearly independent vectors. So this set is a basis for the subspace S, and we see that the subspace of all solutions of the given second order differential equation has dimension two. It will be useful to keep these details in mind as you read through the following section.

5 Applications to Linear ODE

The theory that we have developed in the preceding sections is directly relevant to the problem of describing general solutions of linear systems of ordinary differential equations. The connection is most obvious in our repeated reference to the Principle of Superposition, especially in Example 2.9 and Exercise 3.5. As we indicated there, the solution set of a homogeneous linear equation constitutes a subspace of a vector space of continuously differentiable functions.
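Before turning to systems, the computation in Example 4.7 — that y(t) = xo e^{2t} + (x̂o − 2xo) t e^{2t} satisfies the equation together with arbitrary initial data — can be verified symbolically. A sketch, assuming the sympy library is available (the names x0 and v0 below stand for xo and x̂o):

```python
import sympy as sp

t = sp.symbols('t')
x0, v0 = sp.symbols('x0 v0')    # the initial values x(0) and x'(0)

# The candidate solution built from the basis {e^{2t}, t e^{2t}}
# with c1 = x0 and c2 = v0 - 2*x0, as derived in Example 4.7.
y = x0 * sp.exp(2*t) + (v0 - 2*x0) * t * sp.exp(2*t)

# y satisfies the differential equation x'' - 4x' + 4x = 0 ...
assert sp.simplify(sp.diff(y, t, 2) - 4*sp.diff(y, t) + 4*y) == 0

# ... and the prescribed initial conditions.
assert y.subs(t, 0) == x0
assert sp.expand(sp.diff(y, t).subs(t, 0)) == v0
```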
5.1 The Solution Space for a First Order System

Since we have indicated the argument in the case of a single first or higher order differential equation, we make similar comments here regarding the case of a system of first order equations. To this end, let x ∈ Rⁿ and A ∈ Mn×n(R), i.e., A is an n × n matrix having real-valued entries or entries which are real-valued functions. We consider the first order system

    dx/dt = Ax,  or, in the time-varying case,  dx/dt = A(t)x.

As usual, if x = x(t) = (x1(t), x2(t), . . . , xn(t))ᵀ is an n-vector valued function of the independent variable t, a ≤ t ≤ b, then dx/dt represents the column vector whose n entries are the derivatives dxi/dt, i = 1, 2, . . . , n. Note that any nth order scalar equation can easily be put into this form, as we have seen earlier.

For non-constant coefficients, the entries of the n × n matrix A(t), namely the functions aij(t), i = 1, 2, . . . , n, j = 1, 2, . . . , n, are individually functions of t. We will assume that they are continuous functions on an interval [a, b] which may possibly be unbounded.

Now, as we explained in our discussion of dimension of a vector space, the space C¹(I, Rⁿ) of functions continuous on the interval I and continuously differentiable on its interior, and which take values in Rⁿ, does not have finite dimension. That is, there is no finite linearly independent set of functions which suffices to span the entire vector space. Recall, for example, that in the case n = 1, for the infinite set of monomials {1, x, x², . . .} any finite subset of distinct elements is a linearly independent set, but such sets fail to span C¹([0, 1], R).

It is therefore interesting that the set S of all solutions of the differential equation

    dx/dt = A(t)x

is not only a subspace of C¹(I, Rⁿ) but is also finite dimensional. Indeed, there is a basis for this subspace S consisting of exactly n linearly independent solutions of the homogeneous equation.
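To make the remark about rewriting an nth order scalar equation concrete: the equation ẍ − 4ẋ + 4x = 0 of Example 4.7 becomes a first order system by setting x1 = x, x2 = ẋ. A numerical sketch (assuming numpy and scipy are available; the 2 × 2 matrix below is just this illustrative rewriting, not taken from the text):

```python
import numpy as np
from scipy.linalg import expm

# x1 = x, x2 = x', so x1' = x2 and x2' = 4*x2 - 4*x1:
A = np.array([[0.0, 1.0],
              [-4.0, 4.0]])

# (1, 2) is an eigenvector of A with eigenvalue 2, mirroring the
# scalar solution e^{2t}, whose derivative is 2 e^{2t}.
v = np.array([1.0, 2.0])
assert np.allclose(A @ v, 2 * v)

# The solution of dx/dt = A x with x(0) = v is expm(t*A) @ v,
# which should equal e^{2t} * v for every t.
for t in (0.0, 0.5, 1.3):
    assert np.allclose(expm(t * A) @ v, np.exp(2 * t) * v)
```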
It follows, of course, that any solution of the system above can be expressed as a linear combination of these basic solutions. Actually, we have a technique that will allow us to compute such a basis in the case of constant coefficient systems, provided we can solve the characteristic polynomial equation, which, for an n × n system, is a polynomial equation of degree n. But before we review that technique, we want to look at some of the underlying theory. In particular we want to establish the character of the subspace S.

In order to see that S is a vector space of dimension n, we must go back to the basic existence and uniqueness theorem for the initial value problem

dx/dt = A(t)x,   x(t_o) = x_o,

which says that if the entries of A(t) are continuous on the interval I, then there exists a unique solution of the initial value problem for any t_o ∈ I. The next result gives us one method of ensuring that solutions are linearly independent. As we will see, it is useful when we want to generate a so-called principal fundamental matrix of solutions, which we will discuss presently.

Theorem 5.1 Let x_o^(1), x_o^(2), . . . , x_o^(n) be n linearly independent vectors in R^n and, for each i = 1, 2, . . . , n, let the vector function x^(i) = x^(i)(t) be the unique solution of the initial value problem

dx^(i)/dt = A(t)x^(i),   x^(i)(t_o) = x_o^(i).

Then the solutions x^(1), x^(2), . . . , x^(n) are linearly independent vectors in C^1(I, R^n).

Proof: This result is relatively easy to prove. Suppose, to the contrary, that these functions form a linearly dependent set. Then, as a relation between functions, there exists a set of n constants, not all zero, such that

c_1 x^(1) + c_2 x^(2) + . . . + c_n x^(n) = 0,

or equivalently,

c_1 x^(1)(t) + c_2 x^(2)(t) + . . . + c_n x^(n)(t) = 0   for all t ∈ I.

In particular, we must have, at t = t_o where the initial condition is given,

c_1 x^(1)(t_o) + c_2 x^(2)(t_o) + . . . + c_n x^(n)(t_o) = 0,

which, in light of the given initial conditions, yields

c_1 x_o^(1) + c_2 x_o^(2) + . . . + c_n x_o^(n) = 0,

where not all the c_i, i = 1, 2, . . . , n, are zero. But this is then a vanishing nontrivial linear combination of a set of vectors which was chosen to be linearly independent, and we thus arrive at a contradiction. It follows that the set of functions {x^(1), x^(2), . . . , x^(n)} is a linearly independent set of vectors in the vector space C^1(I, R^n).

So, for an n × n linear homogeneous system, we can produce n linearly independent solutions by simply choosing n linearly independent initial conditions.

Our next step is to show that this linearly independent set is a spanning set for the solution space S. To do so, we must show that, given any solution x ∈ S, we can find constants c_1, c_2, . . . , c_n such that

x = c_1 x^(1) + c_2 x^(2) + . . . + c_n x^(n).

Let t_o ∈ I be arbitrary. Then the vectors x^(1)(t_o), x^(2)(t_o), . . . , x^(n)(t_o) are linearly independent: if a nontrivial linear combination of them vanished at t_o, then by uniqueness the corresponding combination of solutions would vanish identically, contradicting the linear independence of the solutions as functions. Hence the vector x(t_o) ∈ R^n can be expressed as a linear combination of these n vectors, say

x(t_o) = d_1 x^(1)(t_o) + d_2 x^(2)(t_o) + . . . + d_n x^(n)(t_o).

Now look at the vector function

y(t) := d_1 x^(1)(t) + d_2 x^(2)(t) + . . . + d_n x^(n)(t),   t ∈ I.

Then, since the equation is linear, the function y is a solution of the original homogeneous equation which satisfies the initial condition

y(t_o) = d_1 x^(1)(t_o) + d_2 x^(2)(t_o) + . . . + d_n x^(n)(t_o) = x(t_o),

that is, the two solutions satisfy the same initial condition. Therefore, y and x coincide by the uniqueness of solutions of the initial value problem. Hence the function x ∈ S can be written as a linear combination of the linearly independent set of solutions {x^(1), x^(2), . . . , x^(n)}. Hence S = span{x^(1), x^(2), . . . , x^(n)}, and the set S has dimension n. This completes the proof.
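The recipe of the proof (choose linearly independent initial vectors, solve, and the resulting solutions stay independent) can be checked numerically. Below is a rough sketch, not from the text: a forward-Euler integrator (my own choice of method and step count) applied to the 2 × 2 matrix that appears later in Section 5.3, with initial conditions e_1 and e_2. By Liouville's formula, the determinant of the matrix of these two solutions equals e^{tr(A) t} = e^{3t}, so it remains nonzero for all t.

```python
import math

def euler_solve(A, x0, t_end, steps):
    """Forward-Euler approximation of x' = A x, x(0) = x0, for a 2x2 system."""
    h = t_end / steps
    x = list(x0)
    for _ in range(steps):
        dx0 = A[0][0] * x[0] + A[0][1] * x[1]
        dx1 = A[1][0] * x[0] + A[1][1] * x[1]
        x = [x[0] + h * dx0, x[1] + h * dx1]
    return x

A = [[2.0, 6.0], [1.0, 1.0]]   # the system matrix of Section 5.3

# Linearly independent initial conditions e1 and e2 give two solutions:
u = euler_solve(A, [1.0, 0.0], 0.5, 2000)
v = euler_solve(A, [0.0, 1.0], 0.5, 2000)

# Determinant of the solution matrix [u v] at t = 0.5; Liouville's formula
# predicts exp(tr(A) * 0.5) = exp(1.5), so the solutions stay independent.
wronskian = u[0] * v[1] - u[1] * v[0]
assert abs(wronskian - math.exp(1.5)) < 0.05
```

The nonvanishing determinant is exactly the statement that the two solution vectors remain linearly independent at every time.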
5.2 Fundamental Matrices

This approach shows how to produce a basis for the solution set: one need only choose n linearly independent vectors in R^n and solve the associated initial value problems. In particular we can introduce here the idea of a fundamental matrix of the homogeneous system.

Definition 5.2 Let {x^(1), x^(2), . . . , x^(n)} be a set of linearly independent solutions of the system

dx/dt = Ax

of n equations. Then the n × n matrix

X := col( x^(1), x^(2), . . . , x^(n) ),

i.e., the n × n matrix whose columns are the given linearly independent n-vector valued solutions, is called a fundamental matrix of the system.

Notice that X(t_o) is just the matrix whose columns are the initial conditions satisfied by the respective solutions. We note that, if C is an invertible matrix with inverse C^{-1}, then the matrix X̂ := XC is also a fundamental matrix of the system. This is easily seen by looking carefully at the definition of matrix multiplication and recognizing that the matrix resulting from post-multiplication of the fundamental matrix X by C is a matrix whose columns are linear combinations of the columns of X. Hence the columns of X̂ are again linearly independent solutions of the linear homogeneous system of equations.

From this observation, we see that fundamental matrices are not unique; but then we would not expect them to be. After all, any choice of linearly independent initial conditions leads to n linearly independent solutions and hence to a particular fundamental matrix.

Moreover, since we define the derivative of a matrix of functions as the matrix whose entries are the derivatives of the original, i.e., if M(t) = (m_ij(t)), then

dM/dt = ( dm_ij/dt ),

it follows that the fundamental matrix itself satisfies the matrix differential equation

dX/dt = A X(t),

since the product of the matrices on the right can be written simply as col( Ax^(1), Ax^(2), . . . , Ax^(n) ).
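The observation that X̂ = XC is again a fundamental matrix comes down to the fact that post-multiplication forms linear combinations of the columns. A small numerical sketch (the matrix C is chosen arbitrarily for illustration; X here is the value X(0) of the fundamental matrix from the example in Section 5.3):

```python
def matmul(X, C):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * C[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

X = [[1.0, 3.0], [-0.5, 1.0]]   # columns: two independent solutions at t = 0
C = [[2.0, 1.0], [0.0, 1.0]]    # an arbitrary invertible matrix (det = 2)
Xhat = matmul(X, C)

# Column j of X C is the combination sum_k C[k][j] * (column k of X);
# here, column 0 of Xhat is 2*(column 0 of X) + 0*(column 1 of X):
col0 = [2.0 * X[0][0] + 0.0 * X[0][1], 2.0 * X[1][0] + 0.0 * X[1][1]]

# Since det(XC) = det(X) det(C) = 2.5 * 2 = 5, the new columns are
# still linearly independent:
det = Xhat[0][0] * Xhat[1][1] - Xhat[0][1] * Xhat[1][0]
```

Because each column of X̂ is a linear combination of solutions, it is itself a solution, and the nonzero determinant shows the columns remain independent.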
It is often important, usually for ease of computation, to use the so-called principal fundamental matrix for the initial value problem

dx/dt = A(t)x,   x(t_o) = x_o.

The principal fundamental matrix is defined as the fundamental matrix which satisfies the initial condition X(t_o) = I. From our construction, it looks easy, at least theoretically, to produce the principal fundamental matrix for a given initial time t_o. One need only find the unique solution to each of the n initial value problems defined by setting the initial condition x(t_o) = e_i, i = 1, 2, . . . , n, where e_i is the usual unit vector (0, 0, . . . , 0, 1, 0, . . . , 0)^T with the 1 in the ith position.

On the other hand, in applications the given initial conditions are usually not these simple unit vectors e_i, and so we do not initially find the principal fundamental matrix. The solution is simple. Since the fundamental matrix X(t) is invertible for any fixed time t, the matrix X^{-1}(t_o) can be computed. One then produces the principal fundamental matrix by the simple device of forming the product

Y(t) = X(t) X^{-1}(t_o).

5.3 A Concrete Example

In this section we give a particular example, one with constant coefficients. We find, by the usual method of computing eigenvalues and eigenvectors for the system matrix A, a set of linearly independent solutions, and then find the principal fundamental matrix for the initial value problem at t_o = 0. We also check that the principal fundamental matrix satisfies the matrix form of the original differential equation. To this end, consider the 2 × 2 homogeneous system

dx/dt = [ 2  6 ; 1  1 ] x.

The system matrix is A = [ 2  6 ; 1  1 ], and we can find two linearly independent solutions by solving the characteristic equation det(A − λI) = 0 and finding two corresponding linearly independent eigenvectors. In this case the characteristic polynomial is

det [ 2−λ  6 ; 1  1−λ ] = (2 − λ)(1 − λ) − 6 = 2 − 2λ − λ + λ^2 − 6 = λ^2 − 3λ − 4 = (λ − 4)(λ + 1).
So we have two eigenvalues, λ_1 = −1 and λ_2 = 4. To find the corresponding eigenvectors, we must solve the matrix equations

(A − λ_1 I) v^(1) = 0   and   (A − λ_2 I) v^(2) = 0.

In the first case, setting λ_1 = −1, we have

[ 2−(−1)  6 ; 1  1−(−1) ] (v_1, v_2)^T = 0,   or equivalently   [ 3  6 ; 1  2 ] (v_1, v_2)^T = 0.

This last matrix is clearly row-equivalent to [ 1  2 ; 1  2 ]. Hence the eigenvectors are given by solutions of the equation v_1 + 2v_2 = 0, and if we set v_1 = 1, then the corresponding eigenvalue-eigenvector pair is {−1, (1, −1/2)^T}.

Similarly, for λ_2 = 4, we have

[ 2−4  6 ; 1  1−4 ] (v_1, v_2)^T = 0,   or equivalently   [ −2  6 ; 1  −3 ] (v_1, v_2)^T = 0.

Again, looking at the row-equivalent form [ 1  −3 ; 1  −3 ], the equation for the components of the corresponding eigenvector is v_1 − 3v_2 = 0, so setting v_2 = 1 we arrive at the eigenvalue-eigenvector pair {4, (3, 1)^T}.

It follows that two linearly independent solutions of the original differential equation are

x^(1)(t) = (1, −1/2)^T e^{−t}   and   x^(2)(t) = (3, 1)^T e^{4t}.

Notice that, at t_o = 0, x^(1)(0) = (1, −1/2)^T and x^(2)(0) = (3, 1)^T, so that the fundamental matrix associated with this pair of linearly independent solutions,

X(t) = [ e^{−t}  3e^{4t} ; −(1/2)e^{−t}  e^{4t} ],

is certainly not the principal fundamental matrix associated with the initial time t_o = 0. However, the matrix X(0) can be inverted and, in fact, has the inverse

X^{-1}(0) = (1/5) [ 2  −6 ; 1  2 ].

Hence the principal fundamental matrix for t_o = 0 is

Y(t) = (1/5) [ e^{−t}  3e^{4t} ; −(1/2)e^{−t}  e^{4t} ] [ 2  −6 ; 1  2 ] = (1/5) [ 2e^{−t} + 3e^{4t}  −6e^{−t} + 6e^{4t} ; −e^{−t} + e^{4t}  3e^{−t} + 2e^{4t} ].

It is easy to check that this matrix reduces to the 2 × 2 identity matrix at t = 0. We leave the following fact to the exercise.

Exercise 5.3 Compute the derivative (d/dt)Y(t) and the matrix product AY(t) and check that they are equal. In other words, show that the fundamental matrix Y satisfies the original homogeneous differential equation.

5.4 The Non-homogeneous Linear Equation

Here, we make an observation concerning the non-homogeneous system

dx/dt = A(t)x + f(t),

where f(t) = (f_1(t), f_2(t), . . . , f_n(t))^T is the forcing function, together with the initial condition x(t_o) = x_o. Since we know from the exercise above that a fundamental matrix satisfies the differential equation itself, we can assert the following result which, in light of our work with scalar first order problems, we call the Variation of Constants formula.

Theorem 5.4 If X is the principal fundamental matrix for the homogeneous problem at t = t_o, i.e., it is a fundamental matrix for the differential equation

dx/dt = Ax

and satisfies the relation X(t_o) = I, then the solution of the non-homogeneous initial value problem is given by the Variation of Constants formula

x(t) = X(t) x_o + X(t) ∫_{t_o}^{t} X^{-1}(s) f(s) ds.

Proof: Evaluating the formula at t = t_o, it is clear that the last term on the right vanishes because the two limits of integration coincide, while the first term on the right reduces to X(t_o) x_o = I x_o = x_o, so that the initial condition is satisfied. To see that the function defined by this formula indeed satisfies the differential equation, simply differentiate both sides. On the left we have simply the derivative of x, while on the right,

(d/dt) [ X(t) x_o + X(t) ∫_{t_o}^{t} X^{-1}(s) f(s) ds ]
  = A X(t) x_o + [ (d/dt) X(t) ] ∫_{t_o}^{t} X^{-1}(s) f(s) ds + X(t) (d/dt) ∫_{t_o}^{t} X^{-1}(s) f(s) ds
  = A [ X(t) x_o + X(t) ∫_{t_o}^{t} X^{-1}(s) f(s) ds ] + X(t) X^{-1}(t) f(t)
  = A x(t) + f(t),

and the function does indeed satisfy the non-homogeneous differential system.

It is hard to overestimate the power of this result in the geometric theory of ordinary differential equations. It is a central result that continues to have profound uses in modern-day theory and applications in such varied fields as dynamical systems, control theory, and mathematical physics.
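Both the claim of Exercise 5.3 and the Variation of Constants formula can be checked numerically on the example of Section 5.3. The sketch below is mine, not the text's: the trapezoid rule and step counts are arbitrary choices, and it uses the fact that for constant A the principal fundamental matrix at t_o = 0 satisfies Y(t) Y(s)^{-1} = Y(t − s), so the integrand simplifies. It verifies the initial condition and checks the residual x' − Ax − f by finite differences.

```python
import math

A = [[2.0, 6.0], [1.0, 1.0]]

def Y(t):
    """Principal fundamental matrix at t_o = 0, computed in Section 5.3."""
    e1, e4 = math.exp(-t), math.exp(4.0 * t)
    return [[(2 * e1 + 3 * e4) / 5, (-6 * e1 + 6 * e4) / 5],
            [(-e1 + e4) / 5, (3 * e1 + 2 * e4) / 5]]

def mv(M, v):
    """Matrix-vector product for the 2x2 case."""
    return [M[0][0] * v[0] + M[0][1] * v[1],
            M[1][0] * v[0] + M[1][1] * v[1]]

def x(t, x0, f, steps=4000):
    """Variation of constants: x(t) = Y(t) x0 + int_0^t Y(t - s) f(s) ds,
    with the integral approximated by the trapezoid rule."""
    total = mv(Y(t), x0)
    h = t / steps
    for k in range(steps + 1):
        w = 0.5 if k in (0, steps) else 1.0
        g = mv(Y(t - k * h), f(k * h))
        total[0] += w * h * g[0]
        total[1] += w * h * g[1]
    return total

f = lambda s: [1.0, 0.0]   # a constant forcing term, chosen for illustration
x0 = [1.0, -1.0]

# Y(0) is the identity, so the initial condition holds exactly:
assert x(0.0, x0, f) == x0

# Finite-difference check of the residual x' - A x - f at t = 0.25:
t, dh = 0.25, 1e-5
xp, xm, xt = x(t + dh, x0, f), x(t - dh, x0, f), x(t, x0, f)
for i in range(2):
    deriv = (xp[i] - xm[i]) / (2 * dh)
    rhs = A[i][0] * xt[0] + A[i][1] * xt[1] + f(t)[i]
    assert abs(deriv - rhs) < 1e-3
```

For a time-varying A(t) the simplification Y(t)Y(s)^{-1} = Y(t − s) is no longer available, and one must compute X^{-1}(s) inside the integral as the theorem states.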