Relevant parts of lecture notes of A. Pultr, slightly adapted.
Lecture from NMAI055 (2.5.2017)
Incomplete notes. Adapted by Tereza Klimošová from notes by Aleš Pultr.
Composed functions and the Chain Rule.
Recall the proof of the rule of the derivative for composed functions. It was
based on the “total differential formula for one variable”. By an analogous
procedure we will obtain the following
Theorem 1 (Chain Rule in its simplest form). Let f(x) have a total differential at a point a. Let real functions g_k(t) have derivatives at a point b and let g_k(b) = a_k for all k = 1, . . . , n. Put

F(t) = f(g(t)) = f(g_1(t), . . . , g_n(t)).

Then F has a derivative at b, and

F'(b) = Σ_{k=1}^{n} (∂f(a)/∂x_k) · g_k'(b).
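As a quick numerical sanity check (a hypothetical example, not part of the original notes), we can compare the value given by the formula with a finite-difference approximation of F'(b); the functions f, g_1, g_2 and the point b below are invented for illustration.

```python
import math

# Hypothetical example: f(x1, x2) = x1 * x2**2, g1(t) = sin(t), g2(t) = t**2.
def f(x1, x2):
    return x1 * x2**2

def g1(t): return math.sin(t)
def g2(t): return t**2

# Exact partial derivatives of f and derivatives of g1, g2.
def df_dx1(x1, x2): return x2**2
def df_dx2(x1, x2): return 2 * x1 * x2
def dg1(t): return math.cos(t)
def dg2(t): return 2 * t

b = 0.5
a1, a2 = g1(b), g2(b)

# Theorem 1: F'(b) = Σ_k (∂f(a)/∂x_k) · g_k'(b).
chain = df_dx1(a1, a2) * dg1(b) + df_dx2(a1, a2) * dg2(b)

# Central finite-difference approximation of F'(b) for comparison.
h = 1e-6
F = lambda t: f(g1(t), g2(t))
numeric = (F(b + h) - F(b - h)) / (2 * h)

print(abs(chain - numeric) < 1e-6)  # the two values agree
```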
Corollary 2 (The Chain Rule). Let f(x) have a total differential at a point a. Let real functions g_k(t_1, . . . , t_r) have partial derivatives at b = (b_1, . . . , b_r) and let g_k(b) = a_k for all k = 1, . . . , n. Then the function

(f ∘ g)(t_1, . . . , t_r) = f(g(t)) = f(g_1(t), . . . , g_n(t))

has all the partial derivatives at b, and

∂(f ∘ g)(b)/∂t_j = Σ_{k=1}^{n} (∂f(a)/∂x_k) · (∂g_k(b)/∂t_j).
Note. Merely possessing the partial derivatives (without a total differential) would not suffice.
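The same kind of numerical check works for Corollary 2 (again a hypothetical example with invented f and g_k): we compare ∂(f∘g)/∂t_1 from the formula with a finite-difference quotient.

```python
# Hypothetical example with n = 2, r = 2 (invented functions):
# f(x1, x2) = x1**2 + x1*x2, g1(t1, t2) = t1*t2, g2(t1, t2) = t1 + t2**2.
f = lambda x1, x2: x1**2 + x1 * x2
g1 = lambda t1, t2: t1 * t2
g2 = lambda t1, t2: t1 + t2**2

# Exact partial derivatives.
df_dx1 = lambda x1, x2: 2 * x1 + x2
df_dx2 = lambda x1, x2: x1
dg1_dt1 = lambda t1, t2: t2       # ∂g1/∂t1
dg2_dt1 = lambda t1, t2: 1.0      # ∂g2/∂t1

b1, b2 = 0.3, 0.7
a1, a2 = g1(b1, b2), g2(b1, b2)

# Corollary 2: ∂(f∘g)(b)/∂t1 = Σ_k (∂f(a)/∂x_k) · (∂g_k(b)/∂t1).
chain = df_dx1(a1, a2) * dg1_dt1(b1, b2) + df_dx2(a1, a2) * dg2_dt1(b1, b2)

# Central finite-difference approximation in t1 for comparison.
h = 1e-6
comp = lambda t1, t2: f(g1(t1, t2), g2(t1, t2))
numeric = (comp(b1 + h, b2) - comp(b1 - h, b2)) / (2 * h)
print(abs(chain - numeric) < 1e-6)  # the two values agree
```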
Chain rule for vector functions. Let us make one more step and consider a mapping f = (f_1, . . . , f_s) : D → R^s. Take its composition f ∘ g with a mapping g : D' → R^n (recall the convention in 1.4). Then we have

∂(f ∘ g)_i/∂t_j = Σ_k (∂f_i/∂x_k) · (∂g_k/∂t_j).   (∗)
It certainly has not escaped the reader's attention that the right hand side is the product of the matrices

(∂f_i/∂x_k)_{i,k} · (∂g_k/∂t_j)_{k,j}.   (∗∗)
Recall from linear algebra the role of matrices in describing linear functions
L : Vn → Vm . In particular recall that a composition of linear mappings
results in the product of the associated matrices. Then the formulas (∗)
resp. (∗∗) should not be surprising: they represent a fact to be expected,
namely that the linear approximation of a composition f ∘ g is the composition of the linear approximations of f and g.
Following the above comment, we may express the chain rule in matrix form as follows. For an f = (f_1, . . . , f_s) : U → R^s, U ⊆ R^n, define Df as the matrix

Df = (∂f_i/∂x_k)_{i,k}.
Then we have
D(f ◦ g) = Df · Dg.
More explicitly, at a concrete argument t we have

D(f ∘ g)(t) = Df(g(t)) · Dg(t).
Compare it with the one variable rule
(f ◦ g)0 (t) = f 0 (g(t)) · g 0 (t);
for 1 × 1 matrices we of course have (a)(b) = (ab).
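A small sketch (with invented f and g, and s = n = r = 2) illustrating D(f ∘ g) = Df · Dg at a single point; the Jacobians are written out by hand.

```python
# Hypothetical check of D(f∘g) = Df · Dg for s = n = r = 2, at one point t.
# f(x1, x2) = (x1*x2, x1 + x2), g(t1, t2) = (t1**2, t1*t2).

def Df(x1, x2):            # Jacobian of f: rows i, columns k
    return [[x2, x1],
            [1.0, 1.0]]

def Dg(t1, t2):            # Jacobian of g: rows k, columns j
    return [[2 * t1, 0.0],
            [t2, t1]]

def matmul(A, B):          # plain 2x2 matrix product
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

t1, t2 = 0.4, 1.5
x1, x2 = t1**2, t1 * t2    # the point g(t)

product = matmul(Df(x1, x2), Dg(t1, t2))

# Jacobian of f∘g computed directly from
# (f∘g)(t) = (t1**3 * t2, t1**2 + t1*t2):
direct = [[3 * t1**2 * t2, t1**3],
          [2 * t1 + t2, t1]]

ok = all(abs(product[i][j] - direct[i][j]) < 1e-12
         for i in range(2) for j in range(2))
print(ok)  # the matrix product equals the directly computed Jacobian
```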
Geometric meaning of the differential.
What is happening geometrically is this: If we think of a function f as
represented by its “graph”, the hypersurface
S = {(x1 , . . . , xn , f (x1 , . . . , xn )) | (x1 , . . . , xn ) ∈ D} ⊆ Rn+1 ,
the partial derivatives describe just the tangent lines in the directions of
the coordinate axes, while the total differential describes the entire tangent
hyperplane.
The normal vector of the graph of a function of two variables is then given by (∂f/∂x, ∂f/∂y, −1).
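To illustrate (a made-up example, not from the notes): for f(x, y) = x² + y² the normal (∂f/∂x, ∂f/∂y, −1) is orthogonal to the two tangent directions (1, 0, ∂f/∂x) and (0, 1, ∂f/∂y) of the graph.

```python
# Hypothetical illustration: for f(x, y) = x**2 + y**2 the graph's normal
# at (x, y) is (∂f/∂x, ∂f/∂y, -1) = (2x, 2y, -1).
x, y = 1.0, 1.0
fx, fy = 2 * x, 2 * y
normal = (fx, fy, -1.0)

# The tangent plane is spanned by (1, 0, ∂f/∂x) and (0, 1, ∂f/∂y);
# the normal must be orthogonal to both.
t1 = (1.0, 0.0, fx)
t2 = (0.0, 1.0, fy)
dot = lambda u, v: sum(a * b for a, b in zip(u, v))
print(dot(normal, t1), dot(normal, t2))  # prints: 0.0 0.0
```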
Higher order partial derivatives.
When we have a function g(x) = ∂f(x)/∂x_k then, similarly to taking the second derivative of a function of one variable, we may consider partial derivatives of g(x), that is,

∂g(x)/∂x_l.

The result, if it exists, is then denoted by

∂²f(x)/(∂x_k ∂x_l).
More generally, iterating this procedure we may obtain
∂^r f(x)/(∂x_{k_1} ∂x_{k_2} . . . ∂x_{k_r}),
the partial derivatives of order r.
Note that the order is given by the number of derivatives taken and does not depend on whether individual variables are repeated. Thus, for example,

∂³f(x, y, z)/(∂x ∂y ∂z)   and   ∂³f(x, y, z)/(∂x ∂x ∂x)

are derivatives of third order (even though in the former case we have taken a partial derivative by each variable only once).
To simplify notation, taking a partial derivative by the same variable more than once consecutively may be indicated by an exponent, e.g.

∂⁵f(x, y)/(∂x³ ∂y²) = ∂⁵f(x, y)/(∂x ∂x ∂x ∂y ∂y),
∂⁵f(x, y)/(∂x² ∂y² ∂x) = ∂⁵f(x, y)/(∂x ∂x ∂y ∂y ∂x).
Example. Compute the “mixed” second order derivatives of the function f(x, y) = x sin(y² + x). We obtain, first,

∂f(x, y)/∂x = sin(y² + x) + x cos(y² + x)   and
∂f(x, y)/∂y = 2xy cos(y² + x).

Now for the second order derivatives we get

∂²f/(∂x ∂y) = 2y cos(y² + x) − 2xy sin(y² + x) = ∂²f/(∂y ∂x).
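The example's mixed derivative can also be checked numerically (a sanity check, not a proof), comparing the closed form above with a central finite-difference approximation of ∂²f/∂x∂y; the point (x0, y0) and step h are arbitrary choices.

```python
import math

# Numerical check of the mixed derivative of f(x, y) = x*sin(y**2 + x).
f = lambda x, y: x * math.sin(y**2 + x)

# Closed form of ∂²f/∂x∂y from the example.
fxy = lambda x, y: 2*y*math.cos(y**2 + x) - 2*x*y*math.sin(y**2 + x)

x0, y0 = 0.3, 0.8
h = 1e-4
# Second-order mixed central difference approximating ∂²f/∂x∂y.
numeric = (f(x0 + h, y0 + h) - f(x0 + h, y0 - h)
           - f(x0 - h, y0 + h) + f(x0 - h, y0 - h)) / (4 * h * h)
print(abs(fxy(x0, y0) - numeric) < 1e-6)  # the two values agree
```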
Whether it is surprising or not, this suggests that higher order partial derivatives may not depend on the order of differentiation. This is true provided all the derivatives in question are continuous (it should be noted, though, that without this assumption the equality does not necessarily hold; see https://en.wikipedia.org/wiki/Symmetry_of_second_derivatives).
Theorem 3. Let f : U → R be a function such that the partial derivatives ∂²f/(∂x ∂y) and ∂²f/(∂y ∂x) are defined and continuous in a neighborhood of a point ā. Then we have

∂²f(ā)/(∂x ∂y) = ∂²f(ā)/(∂y ∂x).
Iterating these interchanges we obtain the following corollary by an easy
induction.
Corollary 4. Let a function f of n variables possess continuous partial
derivatives up to the order k on an open neighborhood U of ā. Then the values
of these derivatives depend only on the number of times a partial derivative
is taken in each of the individual variables x1 , . . . , xn .
Thus, under the assumption of the corollary, we can write a general partial
derivative of the order r ≤ k as
∂^r f/(∂x_1^{r_1} ∂x_2^{r_2} . . . ∂x_n^{r_n})   with r_1 + r_2 + · · · + r_n = r,

where, of course, r_j = 0 is allowed and indicates the absence of the symbol ∂x_j.