Higher–Dimensional Chain Rules
I. Introduction.
The one-dimensional Chain Rule tells us how to find the derivative of a composed function $w_1(t) = k(c(t))$ from the derivatives of $k$ and $c$, in the case where these are all Calculus I functions. As you are well aware, the Chain Rule says: If $k'(c(t))$ and $c'(t)$ both exist then $w_1'(t)$ exists and is given by
$$\frac{d}{dt}\Big[\,k(c(t))\,\Big] = w_1'(t) = k'(c(t))\,c'(t). \tag{1}$$
Now, the notion of composing functions extends to functions with more general domains and ranges. For example, if
$$\vec{C}(t) = \langle \cos(t), \sin(t) \rangle \quad\text{and}\quad k(x, y) = x^2 e^{3y},$$
then the composition of $k$ with $\vec{C}$ is
$$w_2(t) = k(\vec{C}(t)) = \cos^2(t)\, e^{3\sin(t)}, \tag{2}$$
and you might guess that there exist chain rules similar to the first one for situations such as this. That guess would be correct: there is indeed a chain rule for calculating $\frac{d}{dt}\big[\,k(\vec{C}(t))\,\big]$ from the derivatives of $k$ and $\vec{C}$. In fact, there is a whole family of chain rules, one for each case
$$\mathbb{R}^n \xrightarrow{\;\vec{C}\;} \mathbb{R}^k \xrightarrow{\;k\;} \mathbb{R}^\ell.$$
This handout will state and (almost) prove the first and simplest of these new chain rules, the case in which $n = \ell = 1$ and $k = 2$:
$$\mathbb{R} \xrightarrow{\;\vec{C}\;} \mathbb{R}^2 \xrightarrow{\;k\;} \mathbb{R}.$$
Note that Equation (2) above is an instance of this case.
II. The statement of the theorem. First, let me set the stage and introduce some notation.
[a]: Let $\vec{C} = \vec{C}(t) = \langle f(t), g(t) \rangle$ be a planar curve, with $(x_0, y_0) = \vec{C}(t_0) = \langle f(t_0), g(t_0) \rangle$. I will assume that $\vec{C}'(t_0) = \langle f'(t_0), g'(t_0) \rangle$ exists.
[b]: Let $k = k(x, y)$ be a real-valued function of two variables, defined in a neighborhood of $(x_0, y_0) = \vec{C}(t_0)$. I will assume that $k$ is differentiable at $(x_0, y_0)$, so that
$$\lim_{(x,y)\to(x_0,y_0)} \frac{|k(x, y) - L(x, y)|}{\|\langle x - x_0,\; y - y_0 \rangle\|} = 0, \tag{3}$$
where $L(x, y) = k(x_0, y_0) + k_x(x_0, y_0)(x - x_0) + k_y(x_0, y_0)(y - y_0)$.
[c]: For any $h \neq 0$, let
$$x_h = f(t_0 + h), \qquad y_h = g(t_0 + h).$$
I can now state the theorem.
Theorem. Let $k$ and $\vec{C}$ be as above, and let $w_2(t) = k(\vec{C}(t))$. Then $w_2$ is differentiable at $t_0$, and
$$w_2'(t_0) = k_x(x_0, y_0)\, f'(t_0) + k_y(x_0, y_0)\, g'(t_0). \tag{4}$$
In class, we will differentiate the function in Equation (2) both by Calculus I methods and with this new
formula, and we will compare the answers.
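As a quick sanity check on that comparison, here is a minimal Python sketch for the example in Equation (2). It evaluates the derivative three ways: by the Calculus I product and chain rules, by Equation (4), and by a central difference quotient. The function names and the sample points are illustrative choices, not part of the handout.

```python
import math

# Example from Equation (2): k(x, y) = x^2 e^{3y}, C(t) = <cos t, sin t>.
def k(x, y):
    return x**2 * math.exp(3 * y)

def k_x(x, y):
    return 2 * x * math.exp(3 * y)       # partial derivative of k in x

def k_y(x, y):
    return 3 * x**2 * math.exp(3 * y)    # partial derivative of k in y

def w2(t):
    return k(math.cos(t), math.sin(t))

def w2_prime_calc1(t):
    # Product rule and one-dimensional Chain Rule on cos^2(t) e^{3 sin t}.
    return math.exp(3 * math.sin(t)) * (
        3 * math.cos(t) ** 3 - 2 * math.sin(t) * math.cos(t)
    )

def w2_prime_chain(t):
    # Equation (4): k_x f'(t) + k_y g'(t), with f' = -sin and g' = cos.
    x0, y0 = math.cos(t), math.sin(t)
    return k_x(x0, y0) * (-math.sin(t)) + k_y(x0, y0) * math.cos(t)

def w2_prime_numeric(t, h=1e-6):
    # Central difference; only an approximation, used as an independent check.
    return (w2(t + h) - w2(t - h)) / (2 * h)

for t0 in (0.0, 0.7, 2.0):   # arbitrary sample points
    print(t0, w2_prime_calc1(t0), w2_prime_chain(t0), w2_prime_numeric(t0))
```

The first two columns should agree to rounding error, and the third should agree to several decimal places.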
III. The (almost) proof of the theorem.
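Step 1. I will start with the (Calculus I) definition of the derivative of $w_2$:
$$\begin{aligned}
w_2'(t_0) &= \lim_{h\to 0} \frac{k(\vec{C}(t_0 + h)) - k(\vec{C}(t_0))}{h}
= \lim_{h\to 0} \frac{k\big(f(t_0 + h), g(t_0 + h)\big) - k\big(f(t_0), g(t_0)\big)}{h} \\
&= \lim_{h\to 0} \frac{k(x_h, y_h) - k(x_0, y_0)}{h}.
\end{aligned} \tag{5}$$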
Step 2. I will add and subtract $L(x_h, y_h)$ in the numerator of (5), so as to break (5) into two limit problems:
$$(5) = \lim_{h\to 0} \underbrace{\frac{k(x_h, y_h) - L(x_h, y_h)}{h}}_{(B)} + \lim_{h\to 0} \underbrace{\frac{L(x_h, y_h) - k(x_0, y_0)}{h}}_{(A)},$$
if both of these limits exist.
Step 3: Evaluating the limit of quantity (A). Since
$$L(x_h, y_h) = k(x_0, y_0) + k_x(x_0, y_0)(x_h - x_0) + k_y(x_0, y_0)(y_h - y_0),$$
$$(A) = \frac{k_x(x_0, y_0)(x_h - x_0) + k_y(x_0, y_0)(y_h - y_0)}{h},$$
so that
$$\begin{aligned}
\lim_{h\to 0} (A) &= \lim_{h\to 0} k_x(x_0, y_0)\,\frac{x_h - x_0}{h} + \lim_{h\to 0} k_y(x_0, y_0)\,\frac{y_h - y_0}{h} \\
&= k_x(x_0, y_0) \lim_{h\to 0} \frac{x_h - x_0}{h} + k_y(x_0, y_0) \lim_{h\to 0} \frac{y_h - y_0}{h} \\
&= k_x(x_0, y_0) \lim_{h\to 0} \frac{f(t_0 + h) - f(t_0)}{h} + k_y(x_0, y_0) \lim_{h\to 0} \frac{g(t_0 + h) - g(t_0)}{h} \\
&= k_x(x_0, y_0)\, f'(t_0) + k_y(x_0, y_0)\, g'(t_0).
\end{aligned}$$
Thus, the limit of quantity (A) is exactly the right-hand side of Equation (4). To prove the theorem, then,
I must show that the limit of quantity (B) is zero.
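The split into quantities (A) and (B) can be watched numerically. The sketch below uses the example from Equation (2) with a base point $t_0 = 0.7$ picked arbitrarily for illustration; the names `quantity_A`, `quantity_B`, and `RHS_4` are hypothetical labels. It is a numerical illustration under those assumptions, not a proof.

```python
import math

T0 = 0.7                                  # arbitrary base point (assumption)
X0, Y0 = math.cos(T0), math.sin(T0)       # (x0, y0) = C(t0)

def k(x, y):
    return x**2 * math.exp(3 * y)

KX0 = 2 * X0 * math.exp(3 * Y0)           # k_x(x0, y0)
KY0 = 3 * X0**2 * math.exp(3 * Y0)        # k_y(x0, y0)

def L(x, y):
    # Linearization of k at (x0, y0), as defined after Equation (3).
    return k(X0, Y0) + KX0 * (x - X0) + KY0 * (y - Y0)

def quantity_A(h):
    xh, yh = math.cos(T0 + h), math.sin(T0 + h)
    return (L(xh, yh) - k(X0, Y0)) / h

def quantity_B(h):
    xh, yh = math.cos(T0 + h), math.sin(T0 + h)
    return (k(xh, yh) - L(xh, yh)) / h

# Right-hand side of Equation (4) for this example.
RHS_4 = KX0 * (-math.sin(T0)) + KY0 * math.cos(T0)

for h in (1e-1, 1e-2, 1e-3, 1e-4):
    print(h, quantity_A(h), quantity_B(h), RHS_4)
```

As $h$ shrinks, the (A) column should approach `RHS_4` and the (B) column should approach zero, while (A) + (B) remains the original difference quotient from (5).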
Step 4: Evaluating the limit of quantity (B). Assume, for all $h \neq 0$, that $\langle x_h, y_h \rangle \neq \langle x_0, y_0 \rangle$¹, so that one can multiply and divide quantity (B) by $\|\langle x_h - x_0,\; y_h - y_0 \rangle\|$. I will do this arithmetic (and also replace quantity (B) with its absolute value):
$$(B) = \underbrace{\frac{|k(x_h, y_h) - L(x_h, y_h)|}{\|\langle x_h - x_0,\; y_h - y_0 \rangle\|}}_{(C)} \cdot \underbrace{\frac{\|\langle x_h - x_0,\; y_h - y_0 \rangle\|}{|h|}}_{(D)}.$$
¹ This additional assumption is what makes this an almost proof. One must employ a trick involving continuity to deal with the case in which $\vec{C}(t) = (x_0, y_0)$ for multiple values of $t$. I will not bother you with the details.

It follows that
$$\lim_{h\to 0} (B) = \lim_{h\to 0} \underbrace{\frac{|k(x_h, y_h) - L(x_h, y_h)|}{\|\langle x_h - x_0,\; y_h - y_0 \rangle\|}}_{(C)} \cdot \lim_{h\to 0} \underbrace{\frac{\|\langle x_h - x_0,\; y_h - y_0 \rangle\|}{|h|}}_{(D)}, \tag{6}$$
if the two limits on the right of (6) exist. Furthermore, the limit of quantity (C) exists and equals zero; this follows immediately from Equation (3) and the fact that $\vec{C}$ is continuous at $t_0$ (see Exercise 1). The limit of quantity (D) will require some further computation:
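$$\begin{aligned}
\lim_{h\to 0} (D) &= \lim_{h\to 0} \frac{\|\langle x_h, y_h \rangle - \langle x_0, y_0 \rangle\|}{|h|} \\
&= \lim_{h\to 0} \left\| \frac{1}{h} \big( \langle x_h, y_h \rangle - \langle x_0, y_0 \rangle \big) \right\| \\
&= \lim_{h\to 0} \left\| \frac{1}{h} \big( \vec{C}(t_0 + h) - \vec{C}(t_0) \big) \right\| \\
&= \big\| \vec{C}'(t_0) \big\|.
\end{aligned}$$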
I can now complete the proof by finishing the computation begun in Equation (6):
$$\lim_{h\to 0} (B) = \lim_{h\to 0} \frac{|k(x_h, y_h) - L(x_h, y_h)|}{\|\langle x_h - x_0,\; y_h - y_0 \rangle\|} \cdot \lim_{h\to 0} \frac{\|\langle x_h - x_0,\; y_h - y_0 \rangle\|}{|h|} = (0)\, \big\| \vec{C}'(t_0) \big\| = 0.$$
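The two factors (C) and (D) can also be tabulated for the example in Equation (2). This is a hedged numerical illustration, with an arbitrarily chosen $t_0 = 0.7$ and hypothetical helper names; it suggests the limits but does not replace the argument above or Exercise 1.

```python
import math

T0 = 0.7                                  # arbitrary base point (assumption)
X0, Y0 = math.cos(T0), math.sin(T0)       # (x0, y0) = C(t0)

def k(x, y):
    return x**2 * math.exp(3 * y)

def L(x, y):
    # Linearization of k at (x0, y0); the partials are written out directly.
    e = math.exp(3 * Y0)
    return k(X0, Y0) + 2 * X0 * e * (x - X0) + 3 * X0**2 * e * (y - Y0)

def displacement_norm(h):
    # || <x_h - x0, y_h - y0> || for the unit-circle curve C(t) = <cos t, sin t>.
    return math.hypot(math.cos(T0 + h) - X0, math.sin(T0 + h) - Y0)

def quantity_C(h):
    xh, yh = math.cos(T0 + h), math.sin(T0 + h)
    return abs(k(xh, yh) - L(xh, yh)) / displacement_norm(h)

def quantity_D(h):
    return displacement_norm(h) / abs(h)

# ||C'(t0)|| = ||<-sin t0, cos t0>||, which is 1 for this curve.
SPEED = math.hypot(-math.sin(T0), math.cos(T0))

for h in (1e-1, 1e-2, 1e-3, 1e-4):
    print(h, quantity_C(h), quantity_D(h), SPEED)
```

For this curve the (C) column should shrink toward zero while the (D) column approaches `SPEED`, so their product, which is the absolute value of (B), goes to zero.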
Exercise 1. Prove that the limit of quantity (C) is zero.
IV. The gradient. Observe that the Calculus I Chain Rule (Equation (1)) is very compact: it says that
you can calculate $w_1'(t)$ by multiplying together two derivatives, one supplied by the function $k$ and the other supplied by the function $c$.
The new Chain Rule (Equation (4)) is messier: this formula instructs us, in order to calculate $w_2'(t)$, to combine four derivative-type quantities² (two supplied by $k$ and the other two supplied by $\vec{C}$) in a rather complicated way. As it turns out, though, we can do better. If you recombine the "$f'(t_0)$" and "$g'(t_0)$" into the derivative vector
$$\vec{C}'(t_0) = \langle f'(t_0), g'(t_0) \rangle$$
and you put the two partial derivatives of $k$ together to make the vector
$$\langle k_x(x_0, y_0),\; k_y(x_0, y_0) \rangle, \tag{7}$$
then the right-hand side of (4) is exactly the dot product of these vectors:
$$k_x(x_0, y_0)\, f'(t_0) + k_y(x_0, y_0)\, g'(t_0) = \langle k_x(x_0, y_0),\; k_y(x_0, y_0) \rangle \cdot \langle f'(t_0), g'(t_0) \rangle. \tag{8}$$
Equation (8) makes it possible to state the new Chain Rule as succinctly as we can state the old one:
you can calculate $w_2'(t)$ by taking the dot product of two vectors, one supplied by the function $k$ and the other supplied by the function $\vec{C}$.
The vector supplied by $k$ (Equation (7)) is important. It is called the gradient of $k$ at $(x_0, y_0)$, and it is denoted "$\nabla k(x_0, y_0)$." With this notation the new Chain Rule can be expressed:
$$w_2'(t_0) = \nabla k(\vec{C}(t_0)) \cdot \vec{C}'(t_0).$$
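The gradient form translates almost word for word into code. The sketch below, again for the example in Equation (2), builds the two vectors and takes their dot product; `grad_k`, `C_prime`, and `dot` are illustrative names chosen here, not notation from the handout.

```python
import math

def grad_k(x, y):
    # Gradient of k(x, y) = x^2 e^{3y}: the vector <k_x, k_y> of Equation (7).
    e = math.exp(3 * y)
    return (2 * x * e, 3 * x**2 * e)

def C(t):
    return (math.cos(t), math.sin(t))

def C_prime(t):
    return (-math.sin(t), math.cos(t))

def dot(u, v):
    # Dot product of two vectors of equal length.
    return sum(a * b for a, b in zip(u, v))

def w2_prime(t0):
    # New Chain Rule in gradient form: grad k(C(t0)) . C'(t0).
    return dot(grad_k(*C(t0)), C_prime(t0))

print(w2_prime(0.0))   # grad k(1, 0) = <2, 3> and C'(0) = <0, 1>
print(w2_prime(0.7))
```

One reason to prefer this form is that `dot` does not care how many components the vectors have, so the same pattern would carry over to a curve in three dimensions, assuming a suitable three-component gradient is supplied.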
² There will be six in the $\mathbb{R} \xrightarrow{\;\vec{C}\;} \mathbb{R}^3 \xrightarrow{\;k\;} \mathbb{R}$ case!