A Brief on Linear Algebra
by
T.S. Angell (© 1999, all rights reserved)
1 Introduction
The parts of linear algebra that we will use in this course are those concerned with:
(A.) The basic theory of linear homogeneous and inhomogeneous systems of simultaneous
algebraic equations, e.g., for n = 3 a system of the form
a11 x1 + a12 x2 + a13 x3 = b1
a21 x1 + a22 x2 + a23 x3 = b2
a31 x1 + a32 x2 + a33 x3 = b3
and the corresponding homogeneous system in which b1 = b2 = b3 = 0.
In particular, it is necessary to know when the system has a unique solution, an infinity
of solutions, or no solution at all, as well as what the relationship of these solutions is to
the structure of the corresponding homogeneous system.
(B.) The basic ideas of matrix algebra and the way in which matrix algebra can be brought
to bear on the finding of answers to the questions raised in (A) above.
(C.) The basic idea of a vector space, subspace of a vector space, basis, and linear
independence of vectors.
(D.) The ideas of eigenvalues and eigenvectors of a matrix and how they are computed.
(E.) The idea of the exponential of a matrix and how it can be computed.
2 Vector Spaces
We start with the notion of a field. A field is simply a set of objects that behave like the
real numbers. So there are two ways of combining elements of a field (we usually call them
addition (a + b) and multiplication (a · b)), which follow the usual rules of arithmetic. There
are lots of these things running around. For example, the set of rational numbers Q is a
field, although the integers Z do not form a field (because, for example, 1/2 is not an
integer). We will concentrate entirely on two concrete examples, the set of real numbers R
and the set of complex numbers C.
We can now define the idea of a vector space over a field, which we will denote by
{V, F} or simply by V when the particular field involved is clear. The usual terminology is
that the objects in the set V are called vectors while the elements of F are called scalars.
Here is the formal definition which really is a description of the rules for computation in a
vector space.
Definition 2.1 A vector space over F = R or C is a non-empty set V such that for any
two elements x, y ∈ V there is a unique element x + y ∈ V (we call x + y “addition of x
and y”) and for every x ∈ V and every α ∈ F there is an element α x ∈ V (the “stretching”
of x by α). Furthermore the following axioms must hold:
The Laws of Vector Addition:
(a) Commutative Law for Sums:
x + y = y + x.
(b) Associative Law for Sums:
(x + y) + z = x + (y + z).
(c) Existence of a Zero Element: There is an element 0 ∈ V such that
x + 0 = x.
(d) Existence of Additive Inverses: To every x ∈ V there corresponds an element x0 such
that
x + x0 = 0.
We denote this additive inverse by the symbol −x. (Caution: This is NOT “subtraction” which is not an operation defined directly in a vector space!)
The Laws of Stretching:
(e) The Distributive Law #1:
(α + β)x = αx + βx.
(f ) The Distributive Law #2 :
α(x + y) = αx + αy.
(g) The Associative Property :
(αβ)x = α(βx).
(h) Property of 1 ∈ F: For 1 ∈ F,
1 x = x.
Surprisingly, perhaps, these are all the rules that are needed. All other facts can be proven
from these. Let us point out three simple ones in particular which initially appear to be a
little “nitpicky”; the last serves as a good illustration of the remark that subtraction is not
defined as a separate operation. First we prove a little result about uniqueness:
Lemma 2.2 If u, w ∈ V and u + w = 0, then w = −u, where this latter vector is the
additive inverse of u of axiom (d) above.
Proof: Suppose that the vector w ∈ V satisfies the equation u + w = 0. Then we can
write:
w = w + 0 = w + (u + (−u)) = (w + u) + (−u)    by axioms (b) and (d)
  = 0 + (−u) = −u.
Hence w = −u. It follows that additive inverses are unique; there can be no more than
one.
You will notice, if you review the axioms, that there is no specific relationship mentioned
between the scalar product 0u for u ∈ V, 0 ∈ F and the additive identity element 0 ∈ V.
That relationship we can establish simply as follows:
Lemma 2.3 For any u ∈ V the scalar product 0u represents the same vector as the
additive identity of V, namely 0 (see axiom (c) above).
Proof: Since 0 = 0 + 0 in the field F, the distributive law (e) gives
0u = (0 + 0)u = 0u + 0u,
and, since 0u ∈ V, this vector has an additive inverse −(0u). Using the property of the
additive inverse of any vector, together with the associative law,
0u + [−(0u)] = (0u + 0u) + [−(0u)],
which yields
0 = 0u + (0u + [−(0u)]) = 0u + 0.
From this, we conclude that 0u = 0.
Using this result we can show the relationship between the element −1 ∈ F and the additive
inverse of an element of V.
Lemma 2.4 Let u ∈ V and −1 ∈ F. Then (−1)u = −u.
Proof: Using Lemma 2.3 and the distributive laws, we have
u + [(−1)u] = 1u + (−1)u = [1 + (−1)]u = 0u = 0 = u + (−u),
so that, adding −u to both sides,
(−u) + (u + [(−1)u]) = (−u) + (u + (−u)),
[(−u) + u] + (−1)u = [(−u) + u] + (−u),
where we have used the associative law. This then yields
0 + (−1)u = 0 + (−u).
We conclude that (−1)u = −u.
While, to the beginner, the facts proved in these lemmas seem trivial, almost as if there
were nothing to prove, they illustrate the power of the axioms. They are, in fact,
properties of various elements of the vector space and relate operations in the scalar field to
the operations between vectors; they need to be proved from a logical point of view.
To flesh out the concepts in this section, we need some concrete working examples. Here
are some simple examples. Pay particular attention to the fifth one which gives you a hint
about how vector spaces and differential equations are interconnected.
Example 2.5 {V, R} = {R2 , R} which we write, traditionally, as “column vectors” of real
numbers. To distinguish between the entries of the vector (its “components”) which are real
numbers and the scalars, which again are real numbers, we use greek letters for the latter.
The operations are defined “component-wise” as you have seen earlier in your studies: If
(x, y)> , (u, v)> ∈ V and α ∈ R, then addition and stretching are defined by
(x, y)> + (u, v)> = (x + u, y + v)>
and
α (x, y)> = (αx, αy)> .
A similar example can be constructed for Rn for any integer n > 0.
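To make the componentwise operations concrete, here is a minimal sketch in Python (an addition to these notes, assuming the numpy library is available); the particular vectors and the scalar are arbitrary choices.

import numpy as np

# Vectors in R^2 represented as numpy arrays; the entries are the "components".
x = np.array([1.0, 2.0])
u = np.array([-3.0, 0.5])
alpha = 2.5  # a scalar in R

# Addition and "stretching" are exactly the componentwise operations above.
print(x + u)        # [-2.   2.5]
print(alpha * x)    # [ 2.5  5. ]

# The same componentwise definitions work in R^n for any n > 0.
v = np.arange(5, dtype=float)   # a vector in R^5
print(v + v, 3.0 * v)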
Example 2.6 The set P6 (C) of all polynomial functions of degree less than or equal to six
and having complex coefficients. A typical element then has the form
p(z) = c0 + c1 z + c2 z 2 + c3 z 3 + c4 z 4 + c5 z 5 + c6 z 6 .
As expected, the vector operations are defined pointwise so that, if q(z) = d0 + d1 z + . . . + d6 z 6 is another such polynomial, we have
(p + q)(z) = (c0 + d0 ) + (c1 + d1 )z + . . . + (c6 + d6 )z 6
and
(a p)(z) = a c0 + a c1 z + . . . + a c6 z 6 .
Remark: We could just as well define P6 (R), where we take the scalars (including the
coefficients) to be real. Likewise, we could take any integer n > 0 in place of the integer 6.
Example 2.7 We take V = C([0, 1], R), the set of all continuous real-valued functions
defined on the interval [0, 1] over the field R. Again, we define the operations pointwise.
Thus,
(f + g)(x) = f (x) + g(x), for all x ∈ [0, 1]
and
(a f )(x) = a f (x), for all x ∈ [0, 1].
Notice that, since all polynomials defined on [0, 1] are continuous, the vector space
{P6 , R} is a subset of {C([0, 1], R), R}. Since the former is a vector space in its own right, we
call it a subspace of the latter.
Example 2.8 We take V = Cp1 ([0, 1], R), the set of all continuous real-valued functions
defined on the interval [0, 1] and which are piecewise differentiable on [0, 1]. For the field we
again take R, and define the operations pointwise.
The final example here is particularly important, so we will make some additional comments.
Example 2.9 Let S := {y ∈ Cp2 ([0, 1]) | y ′′ + 4y = 0}. Thus S is the solution set of the
linear second order ODE which describes the harmonic oscillator with frequency ωo = 2.
Let H := {S, R}; then this pair, H, is a vector space under the usual pointwise definitions
of addition and stretching for functions. Indeed, 0 ∈ H since the zero function is always a
solution of a linear homogeneous equation, and the fact that if y1 , y2 ∈ S and α ∈ R, then
y1 + y2 ∈ S and αy1 ∈ S is just a restatement of the Principle of Superposition!
This is an important example! It gives us a hint about how the basic theory of linear algebra
has some bearing on the nature of solutions of homogeneous equations.
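The closure property just described can be checked numerically. The sketch below is an addition to these notes, written in Python and assuming the numpy and scipy libraries; it integrates y ′′ + 4y = 0 from two different (arbitrarily chosen) initial conditions and verifies that a linear combination of the two solutions is again the solution determined by the combined initial data.

import numpy as np
from scipy.integrate import solve_ivp

# Write y'' + 4y = 0 as a first order system for (y, y').
def osc(t, y):
    return [y[1], -4.0 * y[0]]

t_eval = np.linspace(0.0, 1.0, 101)
alpha = 2.5

# Two solutions with different initial data (y(0), y'(0)).
y1 = solve_ivp(osc, (0.0, 1.0), [1.0, 0.0], t_eval=t_eval, rtol=1e-10, atol=1e-12).y[0]
y2 = solve_ivp(osc, (0.0, 1.0), [0.0, 1.0], t_eval=t_eval, rtol=1e-10, atol=1e-12).y[0]

# Superposition: y1 + alpha*y2 should agree with the solution started from
# the combined initial data (1, 0) + alpha*(0, 1) = (1, alpha).
y3 = solve_ivp(osc, (0.0, 1.0), [1.0, alpha], t_eval=t_eval, rtol=1e-10, atol=1e-12).y[0]
print(np.max(np.abs((y1 + alpha * y2) - y3)))   # small (up to integration error)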
Exercise 2.10 The same construction is possible for a system of linear algebraic equations.
Show that the solution set of
5x1 + 2x2 + x3 = 0
2x1 − x2 − 3x3 = 0
is a vector space over R.
3 Subspaces and Bases
As we remarked after Example 2.7, the set P6 (R) is a subset of the vector space C([0, 1], R)
and it is a vector space in its own right with the same, i.e. pointwise, definitions of the vector
space operations. This is, as is clear after a moment’s thought, also true of Cp1 ([0, 1], R) and
of the vector space H of Example 2.9. Likewise, if you have solved Exercise 2.10, you can see
that the solution set is likewise a vector space sitting inside the vector space R3 considered
as a vector space over R.
This situation is sufficiently common to give it a special definition.
Definition 3.1 A subset of V which is itself a vector space over the same field and with the
same operations, is called a subspace of {V, F}.
All of the examples mentioned in the preceding paragraph are therefore subspaces of the
corresponding vector space. It is obvious that if W ⊂ V, then the operations which are
defined on V are likewise defined on W, and the basic rules of computation outlined in the
definition of a vector space remain valid rules in W. Therefore, if one is to check that a
particular subset is or is not a subspace, we need only concentrate on whether the additive
identity and additive inverses of elements of W are again in W and whether the sums and
scalar multiples of elements of W are likewise in W. In fact, the situation is less complicated
than it may seem at first glance.
Theorem 3.2 A subset W ⊂ V is a subspace of V provided
(a) If u, v ∈ W, then u + v ∈ W
and
(b) If u ∈ W and α ∈ F, then α u ∈ W.
Remark: If both (a) and (b) are satisfied, we say that W is closed with respect to the
vector space operations inherited from V, or more simply, that W is algebraically closed.
Proof: Suppose that W is algebraically closed. Then, as remarked above, we need only
check that the vector 0 ∈ W and that, if u ∈ W then −u ∈ W.
Now, by hypothesis (b), given any vector u ∈ W the vector 0u ∈ W. But, by Lemma 2.3
0u = 0, hence 0 ∈ W.
Likewise, by hypothesis (b), the scalar multiple (−1)u ∈ W, and Lemma 2.4 asserts that
(−1)u = −u, the additive inverse of u. This finishes the proof.
We now take a look at a couple of different examples.
Example 3.3 Consider the set of all vectors in R3 which satisfy the constraint x1 + x2 = 0.
So these vectors all look like
(a, −a, z)> ,
where a ∈ R is arbitrary (as is z). The set of all these vectors forms a subspace since
(a, −a, z)> + (b, −b, ẑ)> = (a + b, −(a + b), z + ẑ)>
and, for any α ∈ R,
α (a, −a, z)> = (αa, −αa, αz)> ,
so that this set is preserved under both addition of vectors and stretching by α ∈ R. Hence,
according to the theorem, the set of such vectors is a subspace.
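A small numerical sketch of these closure checks follows (an addition to these notes, in Python with numpy; the sample values of a, b, z, ẑ and α, and the helper name in_subspace, are ours and entirely arbitrary).

import numpy as np

def in_subspace(v, tol=1e-12):
    # Membership test for the set {(a, -a, z)} = {x in R^3 : x1 + x2 = 0}.
    return abs(v[0] + v[1]) < tol

u = np.array([ 2.0, -2.0, 5.0])    # a = 2,    z = 5
w = np.array([-1.5,  1.5, 0.3])    # b = -1.5, z-hat = 0.3
alpha = -4.0

print(in_subspace(u), in_subspace(w))              # True True
print(in_subspace(u + w), in_subspace(alpha * u))  # True True: closed under both operations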
Contrast this situation with the following one, again in R3 .
Example 3.4 Consider the set of all vectors in R3 which satisfy the equations
x + y − z = 0
3x − 2y + (z + 1) = 0
This set of vectors is not a subspace of R3 . The easiest way to see this is that the set does
not contain the vector 0, as can easily be seen by setting each of the variables equal to zero
and observing that the second equation becomes the statement 1 = 0. Note that this algebraic
system nevertheless has a one-parameter family of solutions, namely all vectors of the form
((y − 1)/4, y, (5y − 1)/4)> ; it is not an inconsistent system.
Of course, as we hinted above, the solution set of a homogeneous linear ordinary differential
equation of order n, defined on an interval (possibly infinite) I ⊂ R, is a subspace of, for example, {C n (I, R), R}. This is a fundamental fact in the theory of linear differential equations
and is checked by using the theorem of this section. We repeat that the theorem, in the
context of differential equations, is called the Principle of Superposition.
Exercise 3.5 Show that the solution set of the differential equation
t2 ẍ − 2t2 ẋ + 5x = 0
is a subspace of the vector space {C 2 ([0, ∞), R), R}.
Just for a moment, let us consider the set of real numbers itself. If we return to the definition
of a vector space and look at R carefully, then we can see that we can, although we usually
do not, think of the set R as a vector space over the field R. Our purpose in pointing this
out is really the observation that for this very simple vector space, there is a single vector,
namely the vector 1 in terms of which every vector in R can be represented as an appropriate
multiple. For example, the vector 5 can be written as 5 · 1, while the vector π/17 can be
written as (π/17) · 1. The crucial fact is that every vector in R can be represented as a sum
of scalar multiples of a finite set of vectors, in this case the singleton set {1}. We can say
that 1 is a “basic” vector, or that the set {1} is a “spanning set” for the vector space R.
This is, in fact, a familiar situation in R2 and R3 where it is often the case that the familiar
vectors of these Euclidean spaces are “decomposed” into components along the coordinate
axes. That is, say in the case of R3 we introduce the vectors which we usually call î, ĵ, and
k̂ which are taken to have unit length and to be mutually perpendicular, lying as they do
along the coordinate axes. We then “take components” along the coordinate axes so that a
vector from the point (0, 0, 0) to the point (1, −5, 3) is thought of as a “vector” (1, −5, 3)>
and is written
(1, −5, 3)> = 1î − 5ĵ + 3k̂.
Again, we say that, in considering the vector space {R3 , R} the set of vectors {î, ĵ, k̂} is
a spanning set for the vector space. In light of the fact that, in physical applications we
often have to deal with Euclidean spaces of dimension greater than three, one often writes
the elements of the spanning set as ê1 , ê2 , and ê3 instead of î, ĵ, k̂ respectively. We should
emphasize that this is a symbolic notation. As the objects in the vector space are really
columns of three real numbers, we have
ê1 = (1, 0, 0)> ,   ê2 = (0, 1, 0)> ,   ê3 = (0, 0, 1)> .
Here is a less familiar example.
Example 3.6 Consider the vector space {P3 (R), R}. Then, by definition of these polynomials, a spanning set is the set of monomials {1, x, x2 , x3 } since any polynomial of degree
less than or equal to 3 can be written as a sum of real multiples of these vectors. Thus, for
example, the polynomial that we usually write as x2 − 1 we can represent as
(−1)1 + 0x + 1x2 + 0x3 .
It is useful at this point to introduce the formal definition.
Definition 3.7 Given a finite set of vectors {v1 , v2 , . . . , vk } ⊂ V a linear combination of
the vectors in this set is a sum of the form
c1 v1 + c2 v2 + . . . + ck−1 vk−1 + ck vk ,
where the ci , i = 1, 2, . . . , k, are elements of the field F.
So, for example, 1î − 5ĵ + 3k̂ is a linear combination of the vectors î, ĵ, and k̂. Likewise
(−1)1 + 0x + 1x2 + 0x3 is a linear combination of the vectors 1, x, x2 , and x3 .
To the preceding definition, we add the following:
Definition 3.8 A finite set of vectors S = {v1 , v2 , . . . , vn } is said to span the vector space
{V, F}, provided every vector v ∈ V can be written as a linear combination of the elements of
S. We use the notation
V = ⟨v1 , v2 , . . . , vn ⟩ ,   or   V = ⟨S⟩ .
With this definition and the remarks made above about the vector space R3 , we see that, as
a particular example, R3 = ⟨î, ĵ, k̂⟩.
Cautionary Remark: It may not be immediately obvious, but there is certainly no uniqueness associated with the existence of a spanning set. In other words, there is nothing that
says that there cannot be more than one such set. Moreover, there is nothing that says that
the coefficients of a representation of a vector in V with respect to a spanning set are the
only choices of coefficients that can be chosen. The first of these facts turns out to be a great
benefit; the second is one we must find a way to avoid. We consider these facts in order by means
of examples. We can then decide how to avoid the second problem by introducing a new
idea.
We begin with the idea of two different sets spanning a given vector space. Indeed there are
infinitely many spanning sets for any vector space.
Example 3.9 The vector space {P3 (R), R} is certainly spanned by the set S1 := {1, x, x2 , x3 }.
There is another, and in this case important and famous, spanning set of polynomials
S2 := {po , p1 , p2 , p3 }, where po (x) = 1, p1 (x) = x, p2 (x) = (1/2)(3x2 − 1), and p3 (x) =
(1/2)(5x3 − 3x). These polynomials are called Legendre Polynomials and have many
interesting properties. What concerns us here is that the set of Legendre Polynomials is a
spanning set for P3 (R).
So, for example, if, as before, the vector q ∈ P3 (R) is given by q(x) = x2 − 1, then we can
write
q = −(2/3) po + 0 p1 + (2/3) p2 + 0 p3 .
Indeed,
−(2/3) po + (2/3) p2 = −2/3 + (2/3) · (1/2)(3x2 − 1) = −2/3 + x2 − 1/3 = x2 − 1.
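These coefficients can also be checked with numpy’s Legendre utilities; the sketch below is an addition to these notes. The routine numpy.polynomial.legendre.poly2leg converts coefficients in the monomial basis (constant term first) into coefficients with respect to the Legendre polynomials po , p1 , p2 , . . . .

import numpy as np
from numpy.polynomial import legendre as L

# q(x) = x^2 - 1 in the monomial basis {1, x, x^2, x^3}: coefficients (-1, 0, 1, 0).
q_monomial = [-1.0, 0.0, 1.0, 0.0]

# Coefficients of q with respect to the Legendre polynomials p0, p1, p2, p3.
q_legendre = L.poly2leg(q_monomial)
print(q_legendre)               # approximately [-2/3, 0, 2/3, 0], as computed above

# Converting back recovers the monomial coefficients.
print(L.leg2poly(q_legendre))   # approximately [-1, 0, 1, 0]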
Nor is it necessarily the case that two spanning sets have the same number of elements, as
the following example demonstrates.
Example 3.10 Let S1 be the set of that name in the previous example and let
S3 := {1, x, x2 , x3 + 3x2 − 1, x3 }.
Then ⟨S1 ⟩ = ⟨S3 ⟩ = P3 (R).
What is important to recognize in this rather artificial example is that there is a certain
“redundancy” in the set S3 . Thus, the last element of that set can be written as a sum of
multiples of the vectors which precede it in the list. Indeed, “by inspection”, we see that
x3 = 1 + 0x − 3(x2 ) + q3 ,   where q3 := x3 + 3x2 − 1.
This shows that the vector space can be described as the span of a smaller set, namely
S4 := {1, x, x2 , q3 },
which is a much more economical way of writing the vectors as it involves, once again, only
four coefficients rather than five.
4 Linearly Independent Sets, Basis, and Dimension
For many purposes, it is convenient and indeed crucial to have a set of vectors which spans
a given subspace and is yet not redundant, as was the set of monomials together with the
vector q3 defined by the function x3 + 3x2 − 1. As we pointed out in the example, if one
uses such a redundant set, then there is more than one way to represent a given vector as
a linear combination of the elements of the spanning set. In essence, what we want is a
non-redundant or “minimal” spanning set.
The hint as to how we can identify such sets is given to us if we look at the example, for if
we write down the linear combination
1 · 1 + 0x − 3x2 − 1x3 + 1q3 ,
then we see that the coefficients {1, 0, −3, −1, 1} are not all zero, and yet the linear
combination above is the zero vector. This observation is a
generalization of the fact that a set of two vectors is a linearly independent set if one vector
is not a multiple of the other. For if v2 is a scalar multiple of v1 , say v2 = √2π v1 , then the
span, i.e., the set of all linear combinations of the set of vectors {v1 , v2 }, is just the span of
the singleton {v1 } since any vector of the form
c1 v1 + c2 v2 = [c1 + c2 √2π ] v1
is just a scalar multiple, as indicated, of v1 . Moreover, we need only take d1 = −√2π, d2 = 1,
to find a linear combination of the two vectors which vanishes without both of the coefficients
vanishing:
d1 v1 + d2 v2 = −√2π v1 + 1 v2 = (−√2π + √2π) v1 = 0.
We formalize these observations with the standard definition.
Definition 4.1 Given a set of vectors {v1 , v2 , . . . , vk } in a vector space {V, F}, the set is
said to be linearly dependent if there is a set of scalars, {c1 , c2 , . . . , ck }, such that
(a) not all of the scalars are zero, and
(b) c1 v1 + c2 v2 + . . . + ck vk = 0.
In a given vector space, it is easy to construct a linearly dependent set of vectors. The set
{1, x, x2 , x3 , q3 } is, as we have seen, a linearly dependent set in the vector space {P3 , R}. A
slightly more subtle example is the following.
Example 4.2 In the vector space {C((−π, π), R), R}, consider the set of functions
{1, sin 2t, cos 2t, cos2 t}.
If we form the linear combination
1 · 1 + 0 sin 2t + 1 cos 2t − 2 cos2 t,
we obtain the zero function, so that these vectors are linearly dependent! To see that the
previous equation is true, just
remember that cos 2t = 2 cos2 t − 1.
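One way to detect such a dependence numerically (a sketch added to these notes, in Python with numpy) is to sample the four functions at many points and compute the rank of the resulting matrix; the number and placement of the sample points are arbitrary.

import numpy as np

t = np.linspace(-3.0, 3.0, 200)   # sample points inside (-pi, pi)

# Rows are samples of the four functions 1, sin 2t, cos 2t, cos^2 t.
M = np.vstack([np.ones_like(t), np.sin(2 * t), np.cos(2 * t), np.cos(t) ** 2])

# Rank 3 < 4 reflects the relation 1 + cos 2t - 2 cos^2 t = 0.
print(np.linalg.matrix_rank(M))   # 3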
There is, naturally, a corresponding definition of the opposite case, and that is really the
one that we are after.
Definition 4.3 A set of vectors in a vector space is said to be linearly independent provided
that it is not a linearly dependent set.
As such, the definition is not very useful. What we need to do is to develop a computational
test for linear independence. The usual test is one which uses the definition of dependence
in a way that seems a little tricky until you get used to it. It is given in the following
theorem.
Theorem 4.4 Let {V, F} be a vector space. Then a set of vectors {v1 , v2 , . . . , vk } is a
linearly independent set of vectors, provided that if the linear combination
c1 v1 + c2 v2 + . . . + ck vk = 0,
then necessarily c1 = c2 = . . . = ck = 0.
Proof: On the one hand, suppose that the set of vectors is linearly independent and that
there are scalars ci , i = 1, 2, . . . , k, such that the linear combination
c1 v1 + c2 v2 + . . . + ck vk
is the zero vector. If not all of the coefficients ci were zero, then, according to Definition 4.1,
the set of vectors would be linearly dependent, which contradicts our initial choice of the
vectors {v1 , v2 , . . . , vk }. Hence all of the ci must vanish.
On the other hand, suppose that the only linear combination of the vectors which yields the
zero vector is the one with c1 = c2 = . . . = ck = 0. Then there can be no set of scalars, not
all zero, whose linear combination with the vi vanishes, so the set is not linearly dependent;
that is, it is linearly independent.
Remark: Note that we claim that if a set of vectors is linearly independent, then it cannot
contain the vector 0. Why?
We illustrate with an example.
Example 4.5 In {P3 , R} we have seen that the set of vectors {1, x, x2 , x3 , q3 } is a linearly
dependent set. However the set {x, x2 , q3 , x3 } is a linearly independent set. Indeed if we
assume that
c1 x + c2 x2 + c3 q3 + c4 x3 = 0,
then, regrouping the left hand side, we can write
−c3 + c1 x + (c2 + 3c3 )x2 + (c3 + c4 )x3 = 0.
Hence c3 = 0, c1 = 0, c2 + 3c3 = 0, and c3 + c4 = 0. The first and third equations
imply that c2 = 0, while the first and last imply that c4 = 0. Hence the vectors are linearly
independent according to the last theorem.
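The same conclusion can be reached by placing the coordinate vectors of x, x2 , q3 and x3 , taken with respect to the monomial basis {1, x, x2 , x3 }, into the columns of a matrix and computing its rank with numpy; full column rank means that only the trivial linear combination vanishes. The sketch below is an addition to these notes.

import numpy as np

# Columns: coordinates of x, x^2, q3 = x^3 + 3x^2 - 1, and x^3
# with respect to the monomial basis {1, x, x^2, x^3}.
V = np.array([[0.0, 0.0, -1.0, 0.0],
              [1.0, 0.0,  0.0, 0.0],
              [0.0, 1.0,  3.0, 0.0],
              [0.0, 0.0,  1.0, 1.0]])

print(np.linalg.matrix_rank(V))   # 4: the four vectors are linearly independent

# Adding the coordinate vector of the constant polynomial 1 gives five vectors in a
# four-dimensional space, so the enlarged set must be linearly dependent.
one = np.array([[1.0], [0.0], [0.0], [0.0]])
print(np.linalg.matrix_rank(np.hstack([one, V])))   # still 4 < 5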
We started with the idea of a spanning set, and found that there may well be certain
redundancies that we wish to eliminate by finding a minimal spanning set. A minimal
spanning set is necessarily a linearly independent set, and is called a basis of the vector
space. That minimal spanning sets are linearly independent sets should be pretty clear.
What takes a little more thought is the fact that, while there are many different choices of
minimal spanning sets for a vector space, they all have the same number of vectors in them.
This is a fact that we will not prove here. Nevertheless it is true, and it enables us to
associate with a vector space a number (namely the number of vectors in a basis), called the
dimension of the vector space.
There are two categories of vector spaces: those with only a finite number of elements in
a basis, and those which contain an infinite set of vectors, any finite subset of which is a
linearly independent set. We formalize this remark:
Definition 4.6 Let {V, F} be a vector space and suppose that a finite set of vectors B :=
{v1 , v2 , . . . , vn } is a set of linearly independent vectors which span the vector space. Then
we say that {V, F} is finite dimensional and has dimension n. If the vector space contains
a set with infinitely many vectors, any finite subset of which is a linearly independent set,
then the vector space {V, F} is called an infinite dimensional vector space.
The examples given so far afford several examples of bases. The most familiar is the set
{î, ĵ, k̂} which is a basis for what we usually call R3 . We talk all the time about three-dimensional space.
Likewise, if we look at the example {P3 , R}, then the set of monomials {1, x, x2 , x3 } is a
linearly independent spanning set, it is a basis with four elements, and so the vector space
{P3 , R} has dimension four.
To end this section, we will consider an example which we have treated many times, but
using different words. The purpose of the example is to show explicitly how these new ideas
involving basis and dimension, have already played a part in what we have done. Moreover,
it will serve as motivation for the next section in which we will treat systems of first order
differential equations.
Example 4.7 Consider the vector space {C 2 (R, R), R} consisting of twice continuously differentiable real-valued functions. Like the vector space {C([0, 1], R), R}, this space
contains the monomials and so is an infinite dimensional vector space. Now, we look at the
set S defined as
S := {x ∈ C 2 (R, R) | ẍ − 4ẋ + 4x = 0}.
The Principle of Superposition tells us that the set S is a subspace of {C 2 (R, R), R} since
the sum of two solutions of a homogeneous linear differential equation is again a solution,
as is any constant multiple of a solution. Hence the set S ⊂ {C 2 (R, R), R} is closed with
respect to the vector space operations. (See Theorem 3.2.)
When we analyze the given second order equation, we usually want to find the “general
solution” in the form of a two-parameter family of solutions. Indeed, we even showed that
every solution of the given differential equation could be found by proper choice of constants
in the family c1 x1 (t) + c2 x2 (t), provided that one of the solutions x1 and x2 was not a multiple
of the other (we called this property “linear independence”).
Looking at the present specific case, the characteristic polynomial is λ2 − 4λ + 4 = (λ − 2)2 .
Therefore we found that one solution was x1 (t) = e2t , and we used the method of variation
of constants to find another, namely x2 (t) = te2t . Indeed, in our present use of the term,
these two functions are linearly independent vectors for, if a linear combination
c1 x1 (t) + c2 x2 (t) = 0,
for all t,
then necessarily, c1 = c2 = 0. This is easy to see, for if we evaluate the linear combination
at t = 0, then c1 e0 + c2 0 = 0 or c1 = 0, and then, if we evaluate the expression c2 te2t at
t = 1 we obtain c2 e2 = 0 which implies that c2 = 0. Therefore, the two vectors x1 and x2
are linearly independent in the vector space {C 2 (R, R), R}.
So we have two elements of S which are linearly independent. We claim that the set
{x1 , x2 } ⊂ S is in fact a spanning set of the subspace S. Much earlier, when we first studied
the second order equations, we gave a proof that any solution can be written in terms of the
“general solution” with proper choices of the constants c1 and c2 . In our new vocabulary, we
proved that the set {x1 , x2 } was a spanning set for S. We repeat the proof here.
Suppose that x ∈ S. Then this choice of solution determines the two initial conditions which
are satisfied by x, namely x(0) = xo and ẋ(0) = x̂o , with xo , x̂o ∈ R. Moreover, the uniqueness
part of the existence and uniqueness theorem for the initial value problem tells us that, if there
exists a solution y ∈ S for which y(0) = xo and ẏ(0) = x̂o , then y(t) = x(t) for all
t ∈ R.
Let us construct such a function y in the form
c1 x1 + c2 x2 = c1 e2t + c2 t e2t .
Indeed, we need only find the constants c1 and c2 , such that
c1 x1 (0) + c2 x2 (0) = xo
c1 ẋ1 (0) + c2 ẋ2 (0) = x̂o .
Since ẋ1 (t) = 2e2t and ẋ2 (t) = 2t e2t + e2t , that system becomes
c1 · 1 + c2 · 0 = xo
c1 · 2 + c2 · 1 = x̂o ,
which has the solution c1 = xo , c2 = x̂o − 2xo . Hence the function
y(t) = xo e2t + (x̂o − 2xo ) t e2t
satisfies the same initial conditions as the original x and so
x(t) = xo e2t + (x̂o − 2xo ) t e2t .
From this result, we conclude that the set {e2t , t e2t } is a spanning set for the subspace S
consisting of two linearly independent vectors. So this set is a basis for the subspace S and
we see that the subspace of all solutions of the given second order differential equation has
dimension two.
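The computation above can be mirrored numerically; the sketch below (an addition to these notes, in Python with numpy and scipy, and with arbitrarily chosen initial data) integrates ẍ − 4ẋ + 4x = 0 and compares the result with xo e2t + (x̂o − 2xo ) t e2t .

import numpy as np
from scipy.integrate import solve_ivp

# x'' - 4x' + 4x = 0 written as a first order system for (x, x').
def f(t, y):
    return [y[1], 4.0 * y[1] - 4.0 * y[0]]

x0, xdot0 = 1.3, -0.7            # arbitrary initial data x(0), x'(0)
t = np.linspace(0.0, 1.0, 101)

num = solve_ivp(f, (0.0, 1.0), [x0, xdot0], t_eval=t, rtol=1e-10, atol=1e-12).y[0]

# The representation derived above: c1 = x0, c2 = xdot0 - 2*x0.
exact = x0 * np.exp(2 * t) + (xdot0 - 2 * x0) * t * np.exp(2 * t)

print(np.max(np.abs(num - exact)))   # small: the basis {e^{2t}, t e^{2t}} spans S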
It will be useful to keep these details in mind as you read through the following section.
5 Applications to Linear ODE
The theory that we have developed in the preceding sections is directly relevant to the
problem of describing general solutions of linear systems of ordinary differential equations.
The connection is most obvious in our repeated reference to the Principle of Superposition,
especially in Example 2.9 and Exercise 3.5. As we indicated there, the solution set of
a homogeneous linear equation constitutes a subspace of a vector space of continuously
differentiable functions.
5.1 The Solution Space for a First Order System
Since we have indicated the argument in the case of a single first or higher order differential
equation, we make similar comments here regarding the case of a system of first order
equations. To this end, let x ∈ Rn and A ∈ Mn×n (R), i.e., an n × n matrix having real-valued
entries or entries which are real-valued functions. We consider the first order system
dx/dt = Ax,   or, in the time-varying case,   dx/dt = A(t)x.
As usual, if x = x(t) = (x1 (t), x2 (t), . . . , xn (t))> is an n-vector valued function of the
independent variable t, a ≤ t ≤ b, then dx/dt represents the column vector whose n entries are
the derivatives dxi /dt, i = 1, 2, . . . , n. Note that any nth order scalar equation can easily be put
into this form as we have seen earlier.
For non-constant coefficients, the entries of the n × n-matrix A(t), namely the functions
aij (t), i = 1, 2, . . . , n, j = 1, 2, . . . , n, are individually functions of t. We will assume that
they are continuous functions on an interval [a, b] which may possibly be unbounded.
Now, as we explained in our discussion of the dimension of a vector space, the space C 1 (I, Rn )
of functions continuous on the interval I, continuously differentiable on its interior, and
taking values in Rn , does not have finite dimension. That is, there is no finite linearly
independent set of functions which suffices to span the entire vector space. Recall, for example,
that in the case n = 1, any finite subset of distinct elements of the infinite set of monomials
{1, x, x2 , . . .} is a linearly independent set, but such finite subsets fail to span C 1 ([0, 1], R).
It is therefore interesting that the set S of all solutions of the differential equation
dx/dt = A(t)x
is not only a subspace of C 1 (I, Rn ) but is also finite dimensional. Indeed, there is a basis for
this subspace S consisting of exactly n linearly independent solutions of the homogeneous
equation. It follows, of course, that any solution of the system above can be expressed as a
linear combination of these basic solutions. Actually, we have a technique that will allow us
to compute such a basis in the case of constant coefficient systems provided we can solve the
characteristic polynomial equation which, in the case of an nth -order system is a polynomial
of degree n. But before we review that technique, we want to look at some of the underlying
theory. In particular we want to establish the character of the subspace S.
In order to see that S is a vector space of dimension n, we must go back to the basic existence
and uniqueness theorem for the initial value problem
dx/dt = A(t)x,   x(to ) = xo ,
which says that if the entries of A(t) are continuous on the interval I, then there exists a
unique solution of the initial value problem for any to ∈ I. The next result gives us some
idea of one method of ensuring that solutions are linearly independent. As we will see, it is
useful when we want to generate a so-called principal fundamental matrix of solutions, which
we will discuss presently.
Theorem 5.1 Let xo(1) , xo(2) , . . . , xo(n) be n linearly independent vectors in Rn and, for each
i = 1, 2, . . . , n, let the vector function x(i) = x(i) (t) be the unique solution of the initial value
problem
dx(i)/dt = A(t)x(i) ,   x(i) (to ) = xo(i) .
Then the solutions x(1) , x(2) , . . . , x(n) are linearly independent vectors in C 1 (I, Rn ).
Proof: This result is relatively easy to prove. Suppose, to the contrary, that these functions
form a linearly dependent set. Then, as a relation between functions, there exists
a set of n constants, not all zero, such that
c1 x(1) + c2 x(2) + . . . + cn x(n) = 0,
or equivalently,
c1 x(1) (t) + c2 x(2) (t) + . . . + cn x(n) (t) = 0, for all t ∈ I.
In particular, we must have that, at t = to where the initial condition is given,
c1 x(1) (to ) + c2 x(2) (to ) + . . . + cn x(n) (to ) = 0,
which, in light of the given initial conditions, yields
c1 xo(1) + c2 xo(2) + . . . + cn xo(n) = 0,
where not all the ci , i = 1, 2, . . . , n, are zero. But this is then a nontrivial vanishing linear
combination of a set of vectors which was chosen to be linearly independent, and we thus
arrive at a contradiction. It follows then that the set of functions
{x(1) , x(2) , . . . , x(n) }
is a linearly independent set of vectors in the vector space C 1 (I, Rn ).
So, for an n×n linear homogeneous system, we can produce n linearly independent solutions
by simply choosing n linearly independent initial conditions.
Our next step is to show that this linearly independent set is a spanning set for the solution
space S. To do so, we must show that, given any solution x ∈ S, we can find constants
c1 , c2 , . . . , cn , such that
x = c1 x(1) + c2 x(2) + . . . + cn x(n) .
Let to ∈ I be arbitrary. Then the vectors x(1) (to ), x(2) (to ), . . . , x(n) (to ) are linearly independent by the same argument as used above. Hence the vector x(to ) ∈ Rn can be expressed as
a linear combination of these n vectors, say
x(to ) = d1 x(1) (to ) + d2 x(2) (to ) + . . . + dn x(n) (to ).
Now look at the vector function
y(t) := d1 x(1) (t) + d2 x(2) (t) + . . . + dn x(n) (t),   t ∈ I.
Then, since the equation is linear, the function y is a solution of the original homogeneous
equation which satisfies the initial condition
y(to ) = d1 x(1) (to ) + d2 x(2) (to ) + . . . + dn x(n) (to ) = x(to ),
that is, the two solutions satisfy the same initial condition. Therefore, y and x coincide
according to the uniqueness of solutions of the initial value problem. Hence the function
x ∈ S can be written as a linear combination of the linearly independent set of solutions
{x(1) , x(2) , . . . , x(n) }.
Hence S = ⟨x(1) , x(2) , . . . , x(n) ⟩ and the set S has dimension n. This completes the
proof.
5.2 Fundamental Matrices
This approach shows how to produce a basis for the solution set; one need only choose
n linearly independent vectors in Rn and solve the associated initial value problems. In
particular we can introduce here the idea of a fundamental matrix of the homogeneous
system.
Definition 5.2 Let {x(1) , x(2) , . . . , x(n) } be a set of linearly independent solutions of the
system
dx/dt = Ax,
of n equations. Then the n × n matrix X := col (x(1) , x(2) , . . . , x(n) ), i.e., the n × n
matrix whose columns are the given linearly independent n-vector valued solutions, is called
a fundamental matrix of the system.
Notice that the value of the matrix X(to ) is just a matrix whose columns represent the initial
conditions satisfied by the respective solutions. We note that, if C is an invertible matrix
with inverse C −1 , then the matrix X̂ := X C is also a fundamental matrix of the system, as
is easily seen by looking carefully at the definition of matrix multiplication and recognizing
that the matrix resulting from post-multiplying the fundamental matrix X by C is a
matrix whose columns are linear combinations of the columns of X. Hence the columns of
X̂ are again linearly independent solutions of the linear homogeneous system of equations.
From this observation, we see that fundamental matrices are not unique; but then we would
not expect them to be. After all, any choice of linearly independent initial conditions leads
to n linearly independent solutions and hence to a particular fundamental matrix. Moreover,
since we define the derivative of a matrix of functions as the matrix whose entries are the
derivatives of the original, i.e., if M (t) = (mij (t)), then
dM/dt = (dmij /dt),
it follows that the fundamental matrix itself satisfies the matrix differential equation
dX/dt = A X(t),
since the product of the matrices on the right can be written simply as col (Ax(1) , Ax(2) , . . . , Ax(n) ).
It is often important, usually for ease of computations, to use the so-called principal fundamental matrix for the initial value problem
dx/dt = A(t)x,   x(to ) = xo .
The principal fundamental matrix is defined as the fundamental matrix which satisfies the
initial condition X(to ) = I.
From our construction, it looks easy, at least theoretically, to produce the principal fundamental matrix for a given initial time to . One need only find the unique solution to
each of the n initial value problems defined by setting the initial condition x(to ) = ei , i =
1, 2, . . . , n, where the vector ei is the usual unit vector (0, 0, . . . , 0, 1, 0, . . . , 0)> with the 1 in
the ith position.
On the other hand, in applications, the given initial conditions are usually not given by these
simple unit vectors ei and we do not initially find the principal fundamental matrix. The
solution is simple. Since the fundamental matrix X(t) is invertible for any choice of fixed time t, the
matrix X−1 (to ) can be computed. One then produces the principal fundamental matrix by
the simple device of forming the product Y(t) = X(t) X−1 (to ).
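The construction Y(t) = X(t) X−1 (to ) is easy to carry out numerically. The following sketch is an addition to these notes, written in Python and assuming the numpy and scipy libraries; the particular time-varying matrix A(t), the initial columns, and the time grid are hypothetical choices made purely for illustration.

import numpy as np
from scipy.integrate import solve_ivp

def A(t):
    # A hypothetical time-varying 2 x 2 coefficient matrix, for illustration only.
    return np.array([[0.0, 1.0], [-1.0, -0.1 * t]])

def rhs(t, x):
    return A(t) @ x

t0, t1 = 0.0, 2.0
t_eval = np.linspace(t0, t1, 5)

# Build a fundamental matrix X(t): its columns are solutions with linearly
# independent (but otherwise arbitrary) initial vectors.
init_columns = [np.array([1.0, 1.0]), np.array([1.0, -2.0])]
cols = [solve_ivp(rhs, (t0, t1), c, t_eval=t_eval, rtol=1e-10, atol=1e-12).y
        for c in init_columns]
X = np.stack([np.column_stack([c[:, k] for c in cols]) for k in range(len(t_eval))])

# Principal fundamental matrix: Y(t) = X(t) X(t0)^{-1}, so that Y(t0) = I.
X0_inv = np.linalg.inv(X[0])
Y = X @ X0_inv
print(np.allclose(Y[0], np.eye(2)))   # True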
5.3 A Concrete Example
In this section we give a particular example, one with constant coefficients; we find, by the usual
method of computing eigenvalues and eigenvectors of the system matrix A, a set of linearly
independent solutions, and we find the principal fundamental matrix for the initial value problem
at to = 0. We also check that the principal fundamental matrix satisfies the matrix form of
the original differential equation.
To this end, consider the 2 × 2 homogeneous system
dx/dt = [ 2  6 ] x.
        [ 1  1 ]
The system matrix is
A = [ 2  6 ]
    [ 1  1 ]
and we can find two linearly independent solutions by solving the characteristic equation
det(A − λI) = 0 and finding two corresponding linearly independent eigenvectors. In this
case the characteristic polynomial is
det [ 2 − λ     6    ]  =  (2 − λ)(1 − λ) − 6
    [   1    1 − λ   ]
                        =  2 − 2λ − λ + λ2 − 6  =  λ2 − 3λ − 4
                        =  (λ − 4)(λ + 1).
So we have two eigenvalues, λ1 = −1, and λ2 = 4. To find the corresponding eigenvectors,
we must solve the matrix equations,
(A − λ1 I)v(1) = 0,
and (A − λ2 I)v(2) = 0.
In the first case, we have, setting λ1 = −1,
[ 2 − (−1)      6      ] [ v1 ] = 0,
[     1     1 − (−1)   ] [ v2 ]
or equivalently
[ 3  6 ] [ v1 ] = 0.
[ 1  2 ] [ v2 ]
This last matrix is clearly equivalent to
[ 1  2 ]
[ 1  2 ] .
Hence the eigenvectors are given by solutions of the equation v1 + 2v2 = 0 and if we set
v1 = 1, then the corresponding eigenvalue-eigenvector pair is {−1, (1, −1/2)> }.
Similarly, for λ2 = 4, we have
[ 2 − 4      6     ] [ v1 ] = 0,
[   1     1 − 4    ] [ v2 ]
or equivalently
[ −2   6 ] [ v1 ] = 0.
[  1  −3 ] [ v2 ]
Again, looking at the row-equivalent form
[ 1  −3 ]
[ 1  −3 ] ,
the equation for the components of the corresponding eigenvector is v1 − 3v2 = 0, so setting
v2 = 1, we arrive at the eigenvalue-eigenvector pair {4, (3, 1)> }.
It follows that two linearly independent solutions of the original differential equation are
x(1) (t) = (1, −1/2)> e−t ,   and   x(2) (t) = (3, 1)> e4t .
Notice that, at say to = 0, x(1) (0) = (1, −1/2)> and x(2) (0) = (3, 1)> , so that the
fundamental matrix associated with this pair of linearly independent solutions,
X(t) = [     e−t       3e4t ]
       [ −(1/2)e−t     e4t  ] ,
is certainly not the principal matrix associated with the initial time to = 0.
However, the matrix X(0) can be inverted and, in fact, has the inverse
X(0)−1 = (1/5) [ 2  −6 ]
               [ 1   2 ] .
Hence the principal fundamental matrix for to = 0 is
Y(t) = (1/5) [     e−t       3e4t ] [ 2  −6 ]
             [ −(1/2)e−t     e4t  ] [ 1   2 ]
     = (1/5) [ 2e−t + 3e4t    −6e−t + 6e4t ]
             [ −e−t + e4t      3e−t + 2e4t ] .
It is easy to check that this matrix reduces to the 2 × 2 identity matrix at t = 0.
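The hand computations above can be cross-checked with numpy; the short sketch below is an addition to these notes (numpy normalizes its eigenvectors, so it returns scalar multiples of the eigenvectors found above).

import numpy as np

A = np.array([[2.0, 6.0], [1.0, 1.0]])

# Eigenvalues should be -1 and 4.
print(np.sort(np.linalg.eig(A)[0]))   # [-1.  4.]

def X(t):
    # Fundamental matrix with columns (1, -1/2)^T e^{-t} and (3, 1)^T e^{4t}.
    return np.array([[np.exp(-t),        3 * np.exp(4 * t)],
                     [-0.5 * np.exp(-t),     np.exp(4 * t)]])

X0_inv = np.linalg.inv(X(0.0))
print(5 * X0_inv)                 # [[2, -6], [1, 2]], i.e. X(0)^{-1} = (1/5)[[2, -6], [1, 2]]

Y = lambda t: X(t) @ X0_inv       # principal fundamental matrix at to = 0
print(np.allclose(Y(0.0), np.eye(2)))   # True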
We leave the verification of the following fact as an exercise.
Exercise 5.3 Compute the derivative (d/dt)Y(t) and the matrix product A Y(t), and check that
they are equal. In other words, show that the fundamental matrix Y satisfies the matrix form
of the original homogeneous differential equation.
5.4 The Non-homogeneous Linear Equation
Here, we make an observation concerning the non-homogeneous system
dx/dt = A(t)x + f (t),
where f (t) = (f1 (t), f2 (t), . . . , fn (t))> is the forcing function, together with the initial condition x(to ) = xo .
Since we know from the exercise above that a fundamental matrix satisfies the differential
equation itself, we can assert the following result which, in light of our work with scalar first
order problems, we call the Variation of Constants formula.
Theorem 5.4 If X is the principal fundamental matrix for the homogeneous problem at
t = to , i.e., it is a fundamental matrix for the differential equation
dx/dt = Ax,
and satisfies the relation X(to ) = I, then the solution of the non-homogeneous initial value
problem is given by the Variation of Constants formula
x(t) = X(t)xo + X(t) ∫[to ,t] X−1 (s) f (s) ds.
Proof: Evaluating the formula at t = to , it is clear that the last term on the right vanishes
because the two limits of integration coincide, while the first term on the right reduces to
X(to )xo = Ixo = xo so that the initial condition is satisfied. To see that the function
defined by this formula indeed satisfies the differential equation, simply differentiate both
sides. On the left we have simply the derivative of x while on the right,
d/dt [ X(t)xo + X(t) ∫[to ,t] X−1 (s) f (s) ds ]
   = [dX/dt](t) xo + [dX/dt](t) ∫[to ,t] X−1 (s) f (s) ds + X(t) (d/dt) ∫[to ,t] X−1 (s) f (s) ds
   = A X(t)xo + A X(t) ∫[to ,t] X−1 (s) f (s) ds + X(t) X−1 (t) f (t)
   = A [ X(t)xo + X(t) ∫[to ,t] X−1 (s) f (s) ds ] + f (t)
   = A x(t) + f (t),
and the function does indeed satisfy the non-homogeneous differential system.
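As a numerical illustration (again an addition to these notes, in Python with numpy and scipy), the sketch below applies the Variation of Constants formula to the constant coefficient example of the preceding subsection with an arbitrarily chosen forcing function f (t), and compares the result with a direct integration; the integral is approximated by a simple trapezoidal rule, and the helper name var_of_constants is ours.

import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[2.0, 6.0], [1.0, 1.0]])
f = lambda t: np.array([np.cos(t), 1.0])      # an arbitrary forcing function
x0 = np.array([1.0, -1.0])                    # arbitrary initial condition at t0 = 0

def X(t):
    # Fundamental matrix from the concrete example (not principal, but X(t) X(0)^{-1} is).
    return np.array([[np.exp(-t),        3 * np.exp(4 * t)],
                     [-0.5 * np.exp(-t),     np.exp(4 * t)]])

Y = lambda t: X(t) @ np.linalg.inv(X(0.0))    # principal fundamental matrix at t0 = 0

def var_of_constants(t, n=4000):
    # x(t) = Y(t) x0 + Y(t) * integral_0^t Y(s)^{-1} f(s) ds  (trapezoidal quadrature).
    s = np.linspace(0.0, t, n)
    integrand = np.array([np.linalg.solve(Y(si), f(si)) for si in s])
    integral = ((integrand[:-1] + integrand[1:]) / 2 * np.diff(s)[:, None]).sum(axis=0)
    return Y(t) @ x0 + Y(t) @ integral

t1 = 1.0
direct = solve_ivp(lambda t, x: A @ x + f(t), (0.0, t1), x0,
                   rtol=1e-10, atol=1e-12).y[:, -1]
print(np.max(np.abs(var_of_constants(t1) - direct)))   # small (dominated by quadrature error)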
It is hard to overestimate the power of this result in the geometric theory of ordinary differential equations. It is a central result that continues to have profound uses in modern-day
theory and applications to such varied fields as dynamical systems, control theory, and mathematical physics.