Higher–Dimensional Chain Rules
I. Introduction.
The one-dimensional Chain Rule tells us how to find the derivative of a composed function $w_1(t) = k\bigl(c(t)\bigr)$ from the derivatives of $k$ and $c$, in the case where these are all Calculus I functions. As you are well aware, the Chain Rule says: if $k'\bigl(c(t)\bigr)$ and $c'(t)$ both exist, then $w_1'(t)$ exists and is given by
\[
  \frac{d}{dt}\Bigl[k\bigl(c(t)\bigr)\Bigr] = w_1'(t) = k'\bigl(c(t)\bigr)\,c'(t). \tag{1}
\]
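For instance, as a quick reminder of (1) in action, with functions chosen purely for illustration: if $k(x) = x^2$ and $c(t) = \sin(t)$, then
\[
  \frac{d}{dt}\Bigl[\sin^2(t)\Bigr] = k'\bigl(c(t)\bigr)\,c'(t) = 2\sin(t)\cos(t).
\]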
Now, the notion of composing functions extends to functions with more general domains and ranges. For example, if
\[
  \vec{C}(t) = \langle \cos(t),\, \sin(t) \rangle
  \qquad\text{and}\qquad
  k(x, y) = x^2 e^{3y},
\]
then the composition of $k$ with $\vec{C}$ is
\[
  w_2(t) = k\bigl(\vec{C}(t)\bigr) = \cos^2(t)\, e^{3\sin(t)}, \tag{2}
\]
and you might guess that there exist chain rules similar to the first one for situations such as this. That guess would be correct: there is indeed a chain rule for calculating $\frac{d}{dt}\bigl[k\bigl(\vec{C}(t)\bigr)\bigr]$ from the derivatives of $k$ and $\vec{C}$. In fact, there is a whole family of chain rules, one for each case
\[
  \mathbb{R}^n \xrightarrow{\;\vec{C}\;} \mathbb{R}^k \xrightarrow{\;k\;} \mathbb{R}^\ell.
\]
This handout will state and (almost) prove the first and simplest of these new chain rules, the case in which $n = \ell = 1$ and $k = 2$:
\[
  \mathbb{R} \xrightarrow{\;\vec{C}\;} \mathbb{R}^2 \xrightarrow{\;k\;} \mathbb{R}.
\]
Note that Equation (2) above is an instance of this case.
II. The statement of the theorem. First, let me set the stage and introduce some notation.
[a]: Let $\vec{C} = \vec{C}(t) = \langle f(t),\, g(t) \rangle$ be a planar curve, with $(x_0, y_0) = \vec{C}(t_0) = \langle f(t_0),\, g(t_0) \rangle$. I will assume that $\vec{C}'(t_0) = \langle f'(t_0),\, g'(t_0) \rangle$ exists.
[b]: Let $k = k(x, y)$ be a real-valued function of two variables, defined in a neighborhood of $(x_0, y_0) = \vec{C}(t_0)$. I will assume that $k$ is differentiable at $(x_0, y_0)$, so that
\[
  \lim_{(x,y)\to(x_0,y_0)} \frac{|k(x, y) - L(x, y)|}{\|\langle x - x_0,\, y - y_0 \rangle\|} = 0, \tag{3}
\]
where $L(x, y) = k(x_0, y_0) + k_x(x_0, y_0)(x - x_0) + k_y(x_0, y_0)(y - y_0)$.
[c]: For any $h \neq 0$, let
\[
  x_h = f(t_0 + h), \qquad y_h = g(t_0 + h).
\]
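For the example curve in Equation (2), for instance, this notation reads $x_h = \cos(t_0 + h)$ and $y_h = \sin(t_0 + h)$.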
I can now state the theorem.
Theorem. Let $k$ and $\vec{C}$ be as above, and let $w_2(t) = k\bigl(\vec{C}(t)\bigr)$. Then $w_2$ is differentiable at $t_0$, and
\[
  w_2'(t_0) = k_x(x_0, y_0)\, f'(t_0) + k_y(x_0, y_0)\, g'(t_0). \tag{4}
\]
In class, we will differentiate the function in Equation (2) both by Calculus I methods and with this new
formula, and we will compare the answers.
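As a preview of that comparison, here is a sketch of the two computations for the function in Equation (2). By Calculus I methods (the product rule together with the old Chain Rule),
\[
  w_2'(t) = \frac{d}{dt}\Bigl[\cos^2(t)\, e^{3\sin(t)}\Bigr]
          = -2\sin(t)\cos(t)\, e^{3\sin(t)} + 3\cos^3(t)\, e^{3\sin(t)}.
\]
By the new formula (4), using $k_x(x, y) = 2x\, e^{3y}$, $k_y(x, y) = 3x^2 e^{3y}$, $f'(t) = -\sin(t)$, and $g'(t) = \cos(t)$, all evaluated along $(x, y) = (\cos(t), \sin(t))$,
\[
  w_2'(t) = 2\cos(t)\, e^{3\sin(t)} \cdot \bigl(-\sin(t)\bigr) + 3\cos^2(t)\, e^{3\sin(t)} \cdot \cos(t),
\]
which is the same answer.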
III. The (almost) proof of the theorem.
Step 1. I will start with the (Calculus I) definition of the derivative of $w_2$:
\begin{align*}
  w_2'(t_0) &= \lim_{h\to 0} \frac{k\bigl(\vec{C}(t_0+h)\bigr) - k\bigl(\vec{C}(t_0)\bigr)}{h}
             = \lim_{h\to 0} \frac{k\bigl(f(t_0+h),\, g(t_0+h)\bigr) - k\bigl(f(t_0),\, g(t_0)\bigr)}{h} \tag{5}\\
            &= \lim_{h\to 0} \frac{k(x_h, y_h) - k(x_0, y_0)}{h}.
\end{align*}
Step 2. I will add and subtract $L(x_h, y_h)$ in the numerator of (5), so as to break (5) into two limit problems:
\[
  (5) = \lim_{h\to 0} \underbrace{\frac{k(x_h, y_h) - L(x_h, y_h)}{h}}_{(B)}
      + \lim_{h\to 0} \underbrace{\frac{L(x_h, y_h) - k(x_0, y_0)}{h}}_{(A)},
\]
if both of these limits exist.
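To see that nothing has been lost in this step, note that the two numerators add back up to the numerator of (5):
\[
  \bigl[k(x_h, y_h) - L(x_h, y_h)\bigr] + \bigl[L(x_h, y_h) - k(x_0, y_0)\bigr] = k(x_h, y_h) - k(x_0, y_0).
\]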
Step 3: Evaluating the limit of quantity (A). Since
\[
  L(x_h, y_h) = k(x_0, y_0) + k_x(x_0, y_0)(x_h - x_0) + k_y(x_0, y_0)(y_h - y_0),
\]
we have
\[
  (A) = \frac{k_x(x_0, y_0)(x_h - x_0) + k_y(x_0, y_0)(y_h - y_0)}{h},
\]
so that
\begin{align*}
  \lim_{h\to 0} (A)
    &= \lim_{h\to 0} k_x(x_0, y_0)\frac{x_h - x_0}{h} + \lim_{h\to 0} k_y(x_0, y_0)\frac{y_h - y_0}{h} \\
    &= k_x(x_0, y_0) \lim_{h\to 0} \frac{x_h - x_0}{h} + k_y(x_0, y_0) \lim_{h\to 0} \frac{y_h - y_0}{h} \\
    &= k_x(x_0, y_0) \lim_{h\to 0} \frac{f(t_0 + h) - f(t_0)}{h} + k_y(x_0, y_0) \lim_{h\to 0} \frac{g(t_0 + h) - g(t_0)}{h} \\
    &= k_x(x_0, y_0)\, f'(t_0) + k_y(x_0, y_0)\, g'(t_0).
\end{align*}
Thus, the limit of quantity (A) is exactly the right-hand side of Equation (4). To prove the theorem, then,
I must show that the limit of quantity (B) is zero.
Step 4: Evaluating the limit of quantity (B). Assume, for all $h \neq 0$, that $\langle x_h, y_h \rangle \neq \langle x_0, y_0 \rangle$. (This additional assumption is what makes this an almost-proof: one must employ a trick involving continuity to deal with the case in which $\vec{C}(t) = (x_0, y_0)$ for multiple values of $t$, and I will not bother you with the details.) Under this assumption, one can multiply and divide quantity (B) by $\|\langle x_h - x_0,\, y_h - y_0 \rangle\|$. I will do this arithmetic (and also replace quantity (B) with its absolute value):
\[
  (B) = \underbrace{\frac{|k(x_h, y_h) - L(x_h, y_h)|}{\|\langle x_h - x_0,\, y_h - y_0 \rangle\|}}_{(C)}
        \cdot
        \underbrace{\frac{\|\langle x_h - x_0,\, y_h - y_0 \rangle\|}{|h|}}_{(D)}.
\]
It follows that
\[
  \lim_{h\to 0} (B) =
  \lim_{h\to 0} \underbrace{\frac{|k(x_h, y_h) - L(x_h, y_h)|}{\|\langle x_h - x_0,\, y_h - y_0 \rangle\|}}_{(C)}
  \cdot
  \lim_{h\to 0} \underbrace{\frac{\|\langle x_h - x_0,\, y_h - y_0 \rangle\|}{|h|}}_{(D)}, \tag{6}
\]
if the two limits on the right of (6) exist. Furthermore, the limit of quantity (C) exists and equals zero; this follows immediately from Equation (3) and the fact that $\vec{C}$ is continuous at $t_0$ (see Exercise 1). The limit of quantity (D) will require some further computation:
\begin{align*}
  \lim_{h\to 0} (D)
    &= \lim_{h\to 0} \frac{\|\langle x_h, y_h \rangle - \langle x_0, y_0 \rangle\|}{|h|}
     = \lim_{h\to 0} \left\| \frac{1}{h}\bigl( \langle x_h, y_h \rangle - \langle x_0, y_0 \rangle \bigr) \right\| \\
    &= \lim_{h\to 0} \left\| \frac{1}{h}\bigl( \vec{C}(t_0 + h) - \vec{C}(t_0) \bigr) \right\|
     = \bigl\| \vec{C}'(t_0) \bigr\|.
\end{align*}
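Two small facts about the norm are being used in this computation: the norm is homogeneous and continuous. Concretely, for any scalar $c$ and any vectors $\vec{v}_h$ and $\vec{v}$,
\[
  \|c\,\vec{v}\| = |c|\,\|\vec{v}\|,
  \qquad\text{and}\qquad
  \vec{v}_h \to \vec{v} \;\Longrightarrow\; \|\vec{v}_h\| \to \|\vec{v}\|,
\]
which justify, respectively, moving the factor $1/|h|$ inside the norm and passing the limit through the norm in the last step.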
I can now complete the proof by finishing the computation begun in Equation (6):
\[
  \lim_{h\to 0} (B)
  = \lim_{h\to 0} \frac{|k(x_h, y_h) - L(x_h, y_h)|}{\|\langle x_h - x_0,\, y_h - y_0 \rangle\|}
    \cdot \lim_{h\to 0} \frac{\|\langle x_h - x_0,\, y_h - y_0 \rangle\|}{|h|}
  = (0)\,\bigl\|\vec{C}'(t_0)\bigr\| = 0.
\]
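To spell out how the steps combine: Step 3 showed that the second limit in Step 2 exists and equals $k_x(x_0, y_0) f'(t_0) + k_y(x_0, y_0) g'(t_0)$, and Step 4 showed that the first limit exists and equals zero (its absolute value tends to zero). By Step 2, then,
\[
  w_2'(t_0) = 0 + \bigl[ k_x(x_0, y_0)\, f'(t_0) + k_y(x_0, y_0)\, g'(t_0) \bigr],
\]
which is exactly Equation (4).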
Exercise 1. Prove that the limit of quantity (C) is zero.
IV. The gradient. Observe that the Calculus I Chain Rule (Equation (1)) is very compact: it says that you can calculate $w_1'(t)$ by multiplying together two derivatives, one supplied by the function $k$ and the other supplied by the function $c$.
The new Chain Rule (Equation (4)) is messier: this formula instructs us, in order to calculate $w_2'(t)$, to combine four derivative-type quantities² (two supplied by $k$ and the other two supplied by $\vec{C}$) in a rather complicated way. As it turns out, though, we can do better. If you recombine the "$f'(t_0)$" and "$g'(t_0)$" into the derivative vector
\[
  \vec{C}'(t_0) = \langle f'(t_0),\, g'(t_0) \rangle
\]
and you put the two partial derivatives of $k$ together to make the vector
\[
  \langle k_x(x_0, y_0),\, k_y(x_0, y_0) \rangle, \tag{7}
\]
then the right-hand side of (4) is exactly the dot product of these vectors:
\[
  k_x(x_0, y_0)\, f'(t_0) + k_y(x_0, y_0)\, g'(t_0)
  = \langle k_x(x_0, y_0),\, k_y(x_0, y_0) \rangle \cdot \langle f'(t_0),\, g'(t_0) \rangle. \tag{8}
\]
Equation (8) makes it possible to state the new Chain Rule as succinctly as we can state the old one: you can calculate $w_2'(t)$ by taking the dot product of two vectors, one supplied by the function $k$ and the other supplied by the function $\vec{C}$.
The vector supplied by $k$ (Equation (7)) is important. It is called the gradient of $k$ at $(x_0, y_0)$, and it is denoted "$\nabla k(x_0, y_0)$." With this notation the new Chain Rule can be expressed:
\[
  w_2'(t_0) = \nabla k\bigl(\vec{C}(t_0)\bigr) \cdot \vec{C}'(t_0).
\]
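In the notation of the example from Equation (2), for instance, this compact statement reads
\[
  w_2'(t) = \nabla k\bigl(\vec{C}(t)\bigr) \cdot \vec{C}'(t)
          = \bigl\langle 2\cos(t)\, e^{3\sin(t)},\; 3\cos^2(t)\, e^{3\sin(t)} \bigr\rangle \cdot \bigl\langle -\sin(t),\; \cos(t) \bigr\rangle,
\]
since $\nabla k(x, y) = \langle 2x\, e^{3y},\, 3x^2 e^{3y} \rangle$ and $\vec{C}'(t) = \langle -\sin(t),\, \cos(t) \rangle$; this reproduces the computation previewed earlier with Equation (4).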
² There will be six in the $\mathbb{R} \xrightarrow{\;\vec{C}\;} \mathbb{R}^3 \xrightarrow{\;k\;} \mathbb{R}$ case!