Lecture 3: First-Order Linear ODE’s
Dr. Michael Dougherty
January 13, 2010
1    Some Definitions
Here we briefly define a few terms which will be useful later. In fact, we will revisit the definition of
linear found in Farlow’s text (p. 12), and put it in a slightly expanded context. We will number the
coefficients differently, but the idea is the same. Also note that the order in which the terms y, y′, y′′, . . . , y^(n) appear changes according to convenience.
The general nth order linear ODE, where we assume y = y(x), is given by
a_n(x)\,\frac{d^n y}{dx^n} + a_{n-1}(x)\,\frac{d^{n-1} y}{dx^{n-1}} + \cdots + a_1(x)\,\frac{dy}{dx} + a_0(x)\,y = f(x).    (1)
We note that this can be rewritten using summation notation or matrix multiplication:
\sum_{k=0}^{n} a_k(x)\,\frac{d^k y}{dx^k} = f(x),
\qquad
\begin{bmatrix} a_0(x) & a_1(x) & a_2(x) & \cdots & a_{n-1}(x) & a_n(x) \end{bmatrix}
\begin{bmatrix} y \\ y' \\ y'' \\ \vdots \\ y^{(n-1)} \\ y^{(n)} \end{bmatrix}
= \bigl[\, f(x) \,\bigr].
It is important that the coefficients a0 (x), a1 (x), and so on are functions of x only, as is the right
hand side, f (x). An ODE which cannot be written in these ways would be called nonlinear. With
few exceptions, we will not be developing techniques for any nonlinear ODE’s.
A very useful notation we will use later in the course is borrowed from Calc I, where we define
the basic differential operator D = d/dx. Note that D^k = d^k/dx^k. With this we can write our original ODE yet another way, where we define another, albeit more complicated, differential operator L[ ]:

L[y] \overset{\text{Definition}}{=} a_n(x)D^n y + a_{n-1}(x)D^{n-1} y + \cdots + a_1(x)Dy + a_0(x)y
     = \bigl( a_n(x)D^n + a_{n-1}(x)D^{n-1} + \cdots + a_1(x)D + a_0(x) \bigr)\, y,    (2)
which will let us write the linear equation in the more compact form
L[y] = f(x),    (3)
with the understanding that we are solving an equation, the solution being a function y = y(x).
One reason that (1), and thus L[y] = f (x) is called linear is because L is a linear differential
operator in the linear algebraic sense. The “differential” part means involving derivatives and the
“linear” part means that
L[y_1 + y_2] = L[y_1] + L[y_2] \quad \text{for all relevant functions } y_1, y_2,    (4)

L[\beta y] = \beta L[y] \quad \text{for all “scalars” } \beta \in \mathbb{R}.    (5)
In fact (4) and (5) together form the general linear-algebraic definition of a linear operator, with the
vector space here being some kind of function space in which L makes sense (such as a space of
n-times differentiable functions, to be somewhat specific).1
In linear algebra terms, an L satisfying (4) is said to “preserve (vector) addition,” while an L
satisfying (5) is said to “preserve scalar multiplication,” the scalars here being the constants β ∈ R.
To see that L as in (2) does indeed fit the definition of linear operator, we note first that Dk is
a linear operator:
D^k(y_1 + y_2) = D^k y_1 + D^k y_2, \qquad D^k(\beta y) = \beta D^k y,
which is just the fact that the derivative (of any order) of a sum is the sum of derivatives, and that
multiplicative constants “go along for the ride” with derivatives.
The fact that the general L is still linear follows similarly, since the a_k(x) functions are coefficients which themselves “go along for the ride.” To prove this in the more general case, it is easier to cite a theorem from linear algebra:
Theorem 1 An operator L is linear, i.e., satisfies (4) and (5), if and only if the following holds for all functions y_1, y_2 and scalars \alpha, \beta:

L[\alpha y_1 + \beta y_2] = \alpha L[y_1] + \beta L[y_2].    (6)
The proof is a fairly quick linear algebra exercise. In short, if (6) holds, it is true for α = β = 1, which gives (4), and for α = 0, giving (5). Conversely, if (4) and (5) both hold, then
L[\alpha y_1 + \beta y_2] \;\underset{\text{by (4)}}{=}\; L[\alpha y_1] + L[\beta y_2] \;\underset{\text{by (5)}}{=}\; \alpha L[y_1] + \beta L[y_2],

which is (6), where we used preservation of addition first, and then preservation of scalar multiplication. That completes a proof.
The upshot of Theorem 1 is that we need only prove one equation, namely (6), i.e., that L preserves linear combinations of functions y_1, y_2, in order to show that (2) gives a linear operator in the sense of (4) and (5).
To save space, let us just show that an operator with nth, first and zero-order terms, i.e.,
L[y] = a_n(x)\,\frac{d^n y}{dx^n} + a_1(x)\,\frac{dy}{dx} + a_0(x)\,y
is linear (and the middle terms would fit in the obvious way if included).
L[\alpha y_1 + \beta y_2] = a_n(x)\,\frac{d^n}{dx^n}\{\alpha y_1 + \beta y_2\} + a_1(x)\,\frac{d}{dx}\{\alpha y_1 + \beta y_2\} + a_0(x)\,\{\alpha y_1 + \beta y_2\}

= \alpha a_n(x)\frac{d^n y_1}{dx^n} + \beta a_n(x)\frac{d^n y_2}{dx^n} + \alpha a_1(x)\frac{dy_1}{dx} + \beta a_1(x)\frac{dy_2}{dx} + \alpha a_0(x)y_1 + \beta a_0(x)y_2

= \alpha\left( a_n(x)\frac{d^n y_1}{dx^n} + a_1(x)\frac{dy_1}{dx} + a_0(x)y_1 \right) + \beta\left( a_n(x)\frac{d^n y_2}{dx^n} + a_1(x)\frac{dy_2}{dx} + a_0(x)y_2 \right)

= \alpha L[y_1] + \beta L[y_2],
¹A vector space is a set with operations called “vector addition” and “scalar multiplication,” satisfying several structural axioms. (See any text on linear algebra.) For our part later in the course, the crucial axioms are that it is closed under these operations, meaning if the vector space is V, then
(1) for all u, v ∈ V we have u + v ∈ V, and
(2) for all u ∈ V and β ∈ R, we have βu ∈ V.
More advanced texts define vector spaces which are “function spaces,” specifically

C^k(I) = \left\{ f : I \to \mathbb{R} \;\middle|\; f, f', f'', \ldots, f^{(k)} \text{ exist and are continuous on } I \right\},

where I is some interval, and f : I → R means that f inputs values in I and outputs values in R. (I will be the domain, and R will contain the range.) Thus C^n(I) for a given interval I, or even C^n(R), are natural domains of the operator L in (3). Note from calculus that if f^(k) is defined and continuous, so are f^(k−1), f^(k−2), etc., until we are down to f itself. C(I) is the space of functions which are continuous on I, C^1(I) is the set whose first derivatives are also continuous, etc. These are all vector spaces.
as we claimed. It is important that the coefficients ak (x) are functions of x only; if they are allowed
to contain y as well, we would lose linearity.
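As a quick illustration of (6), here is a minimal computational sketch (assuming SymPy is available; the particular operator L and the test functions below are illustrative choices, not from the lecture) checking linearity of a concrete L symbolically:

```python
# A minimal sketch (assuming SymPy; the operator L and the test functions are
# illustrative choices) checking the linearity property (6) for a concrete L.
import sympy as sp

x, alpha, beta = sp.symbols('x alpha beta')

def L(y):
    """An illustrative linear operator: L[y] = x*y'' + sin(x)*y' + y."""
    return x*sp.diff(y, x, 2) + sp.sin(x)*sp.diff(y, x) + y

y1 = sp.exp(x)      # any sufficiently differentiable test functions will do
y2 = sp.cos(x)

# L[alpha*y1 + beta*y2] - (alpha*L[y1] + beta*L[y2]) should simplify to zero.
print(sp.simplify(L(alpha*y1 + beta*y2) - (alpha*L(y1) + beta*L(y2))))   # 0
```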
In fact, solving (1) is quite difficult, if not impossible, without resorting to numerical methods, except under certain circumstances. If the coefficients a_0(x), a_1(x), . . . , a_n(x) are all constant, then there is great hope, as we will see in a later lecture. Fortunately, when we have such an equation which is only of order 1, a general method is available even when the coefficients are nonconstant. It is a “clever” enough method that it is best memorized and not re-invented each time it is needed. It is presented below.
2    Solving First-Order Linear ODE’s
By definition, these will be of the form
a_1(x)\,\frac{dy}{dx} + a_0(x)\,y = g(x).    (7)
However, this is not the form that we will use to build our method upon. Instead, we will divide by
a1 (x), to get
\frac{dy}{dx} + \frac{a_0(x)}{a_1(x)}\,y = \frac{g(x)}{a_1(x)},
which we then write for convenience as
\frac{dy}{dx} + P(x)\,y = f(x).    (8)
Most texts call (8) the standard form of (7). Solving (8) is the subject of Farlow’s Section 2.1.
Note that we divided by a_1(x), which may occasionally be zero. We have not yet discussed the topic of just where we can find a solution, i.e., for which x’s we can solve such an equation. Thus
anytime we try to solve such an equation, we must realize that our method may well break down
outside of intervals on which P (x) and f (x) are defined and continuous. Usually it is obvious, from
the form of the solution, just where the solution is valid. We will revisit this idea as we continue
our development.
Returning to (8), the following technique (trick?) was discovered over the years:
1. Given (8), i.e.,
\frac{dy}{dx} + P(x)\,y = f(x).

2. Multiply both sides by \mu(x) = e^{\int P(x)\,dx}:

\mu(x)\,\frac{dy}{dx} + \mu(x)P(x)\,y = \mu(x)f(x),    (9)

i.e.,

e^{\int P(x)\,dx}\,\frac{dy}{dx} + e^{\int P(x)\,dx}\,P(x)\,y = e^{\int P(x)\,dx}\,f(x).    (10)
3. Recognize that the LHS of (9) (or (10)) is a product rule. In fact, notice two things about this
new equation:
(a) The derivative of µ(x) is given by the chain rule and Fundamental Theorem of Calculus:
\frac{d\mu(x)}{dx} = e^{\int P(x)\,dx}\,\frac{d}{dx}\!\int P(x)\,dx = e^{\int P(x)\,dx}\cdot P(x) = P(x)\,\mu(x);    (11)
(b) The RHS is a function of x alone, call it q(x) = µ(x)f (x). Thus we have
\mu\,\frac{dy}{dx} + y\,\frac{d\mu}{dx} = q(x).    (12)
4. Re-write the LHS as a derivative of a product:
\frac{d}{dx}(\mu y) = q(x).    (13)

5. This gives \mu y = \int q(x)\,dx, so that

y = \frac{\int q(x)\,dx}{\mu}.
If we would like to trace everything through based upon (8), we would get
y = \frac{\int e^{\int P(x)\,dx}\,f(x)\,dx}{e^{\int P(x)\,dx}}.    (14)
The function \mu(x) = e^{\int P(x)\,dx} is called an integrating factor, because multiplying by this function gives a desirable form, in this case a product rule form on the LHS of (8), from which we can quickly “integrate,” or solve the ODE.
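To make the steps concrete, here is a minimal computational sketch (assuming SymPy is available; the helper name and the sample equation y′ + 2y = x are illustrative choices, not from the text) that carries out steps 1-5, i.e., formula (14):

```python
# A minimal sketch of steps 1-5 / formula (14) using SymPy.
# The helper name and the sample P, f below are illustrative assumptions.
import sympy as sp

x, C = sp.symbols('x C')

def first_order_linear(P, f):
    """One-parameter family of solutions of y' + P(x) y = f(x), via formula (14)."""
    mu = sp.exp(sp.integrate(P, x))          # integrating factor; constant of integration taken as 0
    return sp.simplify((sp.integrate(mu*f, x) + C) / mu)

# Illustrative choice: y' + 2y = x
y = first_order_linear(2, x)
print(y)                                      # expect something equivalent to x/2 - 1/4 + C*exp(-2*x)
print(sp.simplify(sp.diff(y, x) + 2*y - x))   # 0, confirming the family solves the ODE
```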
A couple of remarks about constants should be made here. First, we can use any constant we
would like in the integral appearing in the integrating factor. It is usually easier to just assume the
arbitrary constant of integration is zero there. In fact note
e^{\int P(x)\,dx + C_2} = e^{\int P(x)\,dx}\,e^{C_2} = C_3\,e^{\int P(x)\,dx},

so if we change the constant in \int P(x)\,dx we are simply multiplying the equation in standard form (8) by a nonzero constant, which does not change anything, including the product rule form in the LHS of the new equation (12). However, the whole of the integral in the numerator of (14) will contain an arbitrary additive constant which does matter, and becomes the parameter in the one-parameter family of solutions of the original ODE.
3    The Integrating Factor in Action
One could simply memorize the solution (14) to solve these. However, the above process is usually superior because the formula is sufficiently complicated, and there are places to catch mistakes if we break it into the smaller steps. Furthermore, all we need to memorize are the form (8) of the ODE, the spirit of the process, and the integrating factor

\mu(x) = e^{\int P(x)\,dx}.    (15)
Example 1 (From a textbook of Dennis G. Zill, which we used for years at SWOSU.) Solve the ODE: y' + 3x^2 y = x^2.

Solution: Here P(x) = 3x^2, so

\mu(x) = e^{\int P(x)\,dx} = e^{\int 3x^2\,dx} = e^{x^3}.
Multiplying our ODE by µ(x) gives us
y' + 3x^2 y = x^2  \Longrightarrow  e^{x^3} y' + 3x^2 e^{x^3} y = e^{x^3} x^2
                   \Longrightarrow  \bigl( e^{x^3} y \bigr)' = x^2 e^{x^3}
                   \Longrightarrow  e^{x^3} y = \int x^2 e^{x^3}\,dx
                   \Longrightarrow  e^{x^3} y = \tfrac{1}{3} e^{x^3} + C
                   \Longrightarrow  y = \frac{\tfrac{1}{3} e^{x^3} + C}{e^{x^3}}.
Usually this process gives us a solution which can then be simplified a bit:

y = \frac{1}{3} + C e^{-x^3}.

This is a one-parameter family of curves. In fact, the solution is valid for all x ∈ R, which is where this solution is defined and is continuous.
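(If software is available, the answer is easy to double-check; the following is an optional sketch assuming SymPy, not part of the original lecture.)

```python
# Optional check of Example 1 with SymPy.
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

ode = sp.Eq(y(x).diff(x) + 3*x**2*y(x), x**2)
sol = sp.dsolve(ode, y(x))
print(sol)                       # a form equivalent to y = 1/3 + C*exp(-x**3)
print(sp.checkodesol(ode, sol))  # (True, 0) confirms the family solves the ODE
```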
Example 1 is one of the simplest. In fact, it is separable (as the reader should check), unlike the subsequent examples given below. These can become less forgiving as the integrals \int P(x)\,dx, and therefore the integrating factor \mu = \exp\left(\int P(x)\,dx\right), become more difficult to compute. Indeed these can become much more involved. It is also crucial that the equation is in the form (8), i.e., y' + P(x)y = f(x).
Example 2 Solve the ODE: \frac{dy}{dx} = x + y.

First we need to get the correct form, which again was y' + P(x)y = f(x):

\frac{dy}{dx} - y = x.
This gives P(x) = -1, so \mu(x) = e^{\int P(x)\,dx} = e^{\int(-1)\,dx} = e^{-x}. Multiplying by this integrating factor gives

y' - y = x  \Longrightarrow  e^{-x} y' - e^{-x} y = e^{-x} x
            \Longrightarrow  \bigl( e^{-x} y \bigr)' = x e^{-x}
            \Longrightarrow  e^{-x} y = \int x e^{-x}\,dx.
Of course now we must integrate \int x e^{-x}\,dx by parts:

u = x,\qquad dv = e^{-x}\,dx,
du = dx,\qquad v = -e^{-x}.

This gives us

\int x e^{-x}\,dx = uv - \int v\,du = x(-e^{-x}) + \int e^{-x}\,dx = -x e^{-x} - e^{-x} + C.

Inserting this into our earlier solution e^{-x} y = \int x e^{-x}\,dx gives us

e^{-x} y = -x e^{-x} - e^{-x} + C.

Multiplying by e^{x} then gives us y = e^{x}\bigl( -x e^{-x} - e^{-x} + C \bigr), and so

y = -x - 1 + C e^{x}.
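(Again an optional software check, assuming SymPy; not part of the original lecture.)

```python
# Optional check of Example 2 with SymPy.
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

ode = sp.Eq(y(x).diff(x), x + y(x))    # dy/dx = x + y
sol = sp.dsolve(ode, y(x))
print(sol)                             # a form equivalent to y = -x - 1 + C*exp(x)
print(sp.checkodesol(ode, sol))        # (True, 0)
```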
Example 3 (From another edition of Zill’s text) Solve the ODE: \frac{dy}{dx} + y\cot x = 2\cos x.

Here P(x) = \cot x, and so

\mu(x) = e^{\int P(x)\,dx} = e^{\int \cot x\,dx} = e^{\ln|\sin x|} = |\sin x|.
Here we can wave our hands a bit. After all, | sin x| = ± sin x, depending upon whether sin x is
positive or negative, but we can certainly multiply both sides of our ODE by either function (and
check that the method works, i.e., that we get a product rule form on the LHS of our new ODE).
For simplicity we will multiply by sin x:
y' + y\cot x = 2\cos x  \Longrightarrow  y'\sin x + y\cot x\,\sin x = 2\cos x\,\sin x
                        \Longrightarrow  y'\sin x + y\cos x = 2\cos x\,\sin x
                        \Longrightarrow  (y\sin x)' = 2\cos x\,\sin x
                        \Longrightarrow  y\sin x = \int 2\cos x\,\sin x\,dx = \sin^2 x + C
                        \Longrightarrow  y = \frac{\sin^2 x + C}{\sin x}.
Thus
y = sin x + C csc x,
and is valid on all intervals of the form (nπ, (n + 1)π), i.e., except where sin x = 0.
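(An optional software check, assuming SymPy; the residual below should vanish on any interval avoiding the zeros of sin x.)

```python
# Optional check of Example 3 with SymPy: substitute y = sin x + C csc x back in.
import sympy as sp

x, C = sp.symbols('x C')
y = sp.sin(x) + C/sp.sin(x)                          # proposed solution

residual = sp.diff(y, x) + y*sp.cot(x) - 2*sp.cos(x)
print(sp.simplify(residual))                         # 0 wherever sin x != 0
```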
In the next example, a small amount of work has to be carried out to put the equation into the
correct form before using the formula for µ.
Example 4 Solve \cos x\,\frac{dy}{dx} + y\sin x = 1. (This is from Farlow 2.1, #10.)

Solution: It is important that the coefficient of the y' = dy/dx term is 1, so we divide:
\cos x\,\frac{dy}{dx} + y\sin x = 1  \Longrightarrow  \frac{dy}{dx} + (\tan x)\,y = \sec x
  \Longrightarrow  e^{\int\tan x\,dx}\,\frac{dy}{dx} + e^{\int\tan x\,dx}\,(\tan x)\,y = e^{\int\tan x\,dx}\,\sec x
  \Longrightarrow  e^{\ln|\sec x|}\,\frac{dy}{dx} + e^{\ln|\sec x|}\,(\tan x)\,y = e^{\ln|\sec x|}\,\sec x.
As before, we have other choices but we will simply use µ(x) = sec x here:
\sec x\,\frac{dy}{dx} + (\sec x\tan x)\,y = \sec^2 x  \Longrightarrow  \frac{d}{dx}(y\sec x) = \sec^2 x
  \Longrightarrow  y\sec x = \tan x + C
  \Longrightarrow  y = \cos x\,(\tan x + C)
  \Longrightarrow  y = \sin x + C\cos x.
Such ODE’s can also be part of initial value problems (IVP’s). If we wanted a particular solution,
say one whose graph runs through the point (π, 7), we would insert this into the solution:
7 = sin π + C cos π =⇒ 7 = 0 + C(−1) =⇒ C = −7.
The solution to the IVP would be y = sin x − 7 cos x:

\cos x\,\frac{dy}{dx} + y\sin x = 1, \quad y(\pi) = 7 \qquad\Longrightarrow\qquad y = \sin x - 7\cos x.
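(One last optional software check, assuming SymPy; dsolve can also impose the initial condition directly.)

```python
# Optional check of Example 4 and its IVP with SymPy.
import sympy as sp

x = sp.symbols('x')
y = sp.Function('y')

ode = sp.Eq(sp.cos(x)*y(x).diff(x) + y(x)*sp.sin(x), 1)
print(sp.dsolve(ode, y(x)))                      # a form equivalent to y = sin x + C cos x
print(sp.dsolve(ode, y(x), ics={y(sp.pi): 7}))   # equivalent to y = sin x - 7 cos x
```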
4    Derivation of µ(x)
For completeness the derivation of µ is given here. The idea is that one assumes an integrating
factor exists, and then attempts to use the ODE to find what the form of µ must be. Recall that
the whole point of multiplying by µ was to make the LHS a product rule, in this case. Thus we can ignore the RHS (so long as it is a function of x). What we really need is for the LHS to become (µy)′, i.e.,
(\mu y)' = \mu y' + P(x)\,\mu y.    (16)
Expanding the derivative on the left then gives
\mu y' + y\mu' = \mu y' + P(x)\,\mu y.    (17)
Subtracting µy ′ from both sides gives
y\mu' = P(x)\,\mu y,    (18)
which, after dividing by µy gives us a separable equation2
\frac{\mu'}{\mu} = P(x).    (19)
Recalling that µ = µ(x), and putting in the integrals gives
\int \frac{\mu'(x)}{\mu(x)}\,dx = \int P(x)\,dx.    (20)
The integrand on the LHS is the same as (ln |µ(x)|)′ , so we have
\ln|\mu(x)| = \int P(x)\,dx,    (21)

giving us

|\mu(x)| = e^{\int P(x)\,dx},    (22)

so that

\mu(x) = \pm\, e^{\int P(x)\,dx}.    (23)
Now as we mentioned before, if µ works as an integrating factor, so does −µ(x) (and in fact any
nonzero multiple of µ will work because multiplicative constants can go along for the ride), so we
wave our hands a little and just take
\mu(x) = e^{\int P(x)\,dx}.    (24)
Again, this derivation is not necessary once we know the method, but it is interesting to see how
one could derive the method. Furthermore the techniques used in this derivation can be attempted
with other, more exotic ODE’s, so it is included here.3
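(As a closing sanity check, assuming SymPy, one can verify symbolically that the µ in (24) really does what the derivation asks: µ′ = P(x)µ, and hence (µy)′ = µ(y′ + P(x)y).)

```python
# A small symbolic check (assuming SymPy) that mu = exp(∫P dx) satisfies mu' = P*mu,
# so that (mu*y)' = mu*(y' + P*y), the product rule form used in step 4.
import sympy as sp

x = sp.symbols('x')
P = sp.Function('P')
y = sp.Function('y')

mu = sp.exp(sp.integrate(P(x), x))                 # mu(x) = e^{∫P(x) dx}, left unevaluated

print(sp.simplify(sp.diff(mu, x) - P(x)*mu))       # 0, i.e. equation (19) read backwards

lhs = sp.diff(mu*y(x), x)                          # (mu*y)'
rhs = mu*(sp.diff(y(x), x) + P(x)*y(x))            # mu*(y' + P*y)
print(sp.simplify(lhs - rhs))                      # 0
```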
²Equation (19) is separable in the sense that it can be written in the form (1/µ) · dµ/dx = P(x), or (1/µ) dµ = P(x) dx. Recall that dµ(x) = µ′(x) dx (divide by dx if it is unclear), or dµ = µ′ dx. Thus we can see the “separation:”

\frac{\mu'}{\mu} = P(x) \iff \frac{1}{\mu}\,\frac{d\mu}{dx} = P(x) \iff \frac{1}{\mu}\,d\mu = P(x)\,dx.
³Farlow’s derivation is interesting, except that he first does a simple case of y′ + ay = f(x), with a being a constant, which he multiplies by e^{ax} to get

e^{ax}(y' + ay) = e^{ax}f(x) \iff (e^{ax}y)' = e^{ax}f(x).
From there (p. 31) he derives the more general µ following much the same procedure as above.
Coddington also has similar derivations, with the names of the variables changed, and slightly different ways of
looking at the intermediate steps. See Coddington, 1.7 (page 43), and adjacent sections for another derivation.