Lecture from NMAI055 (2.5.2017)

Incomplete notes. Adapted by Tereza Klimošová from notes by Aleš Pultr.

Composed functions and the Chain Rule.

Recall the proof of the rule for the derivative of composed functions. It was based on the "total differential formula for one variable". By an analogous procedure we obtain the following.

Theorem 1 (Chain Rule in its simplest form). Let f(x) have a total differential at a point a. Let real functions g_k(t) have derivatives at a point b and let g_k(b) = a_k for all k = 1, ..., n. Put F(t) = f(g(t)) = f(g_1(t), ..., g_n(t)). Then F has a derivative at b, and

    F'(b) = \sum_{k=1}^{n} \frac{\partial f(a)}{\partial x_k} \cdot g_k'(b).

Corollary 2 (The Chain Rule). Let f(x) have a total differential at a point a. Let real functions g_k(t_1, ..., t_r) have partial derivatives at b = (b_1, ..., b_r) and let g_k(b) = a_k for all k = 1, ..., n. Then the function (f ∘ g)(t_1, ..., t_r) = f(g(t)) = f(g_1(t), ..., g_n(t)) has all the partial derivatives at b, and

    \frac{\partial (f \circ g)(b)}{\partial t_j} = \sum_{k=1}^{n} \frac{\partial f(a)}{\partial x_k} \cdot \frac{\partial g_k(b)}{\partial t_j}.

Note. Merely possessing partial derivatives would not suffice: the existence of the total differential of f at a is essential.

Chain rule for vector functions. Let us make one more step and consider a mapping f = (f_1, ..., f_s) : D → R^s. Take its composition f ∘ g with a mapping g : D' → R^n (recall the convention in 1.4). Then we have

    \frac{\partial (f_i \circ g)}{\partial t_j} = \sum_k \frac{\partial f_i}{\partial x_k} \cdot \frac{\partial g_k}{\partial t_j}.    (∗)

It certainly has not escaped the reader's attention that the right-hand side is an entry of the product of the matrices

    \left( \frac{\partial f_i}{\partial x_k} \right)_{i,k} \cdot \left( \frac{\partial g_k}{\partial t_j} \right)_{k,j}.    (∗∗)

Recall from linear algebra the role of matrices in describing linear mappings L : V_n → V_m. In particular, recall that a composition of linear mappings corresponds to the product of the associated matrices. Then the formulas (∗) resp. (∗∗) should not be surprising: they express a fact to be expected, namely that the linear approximation of the composition f ∘ g is the composition of the linear approximations of f and g.
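The sum in Theorem 1 can be checked numerically. The following sketch (not part of the original notes; the functions f(x, y) = x² + xy and g(t) = (cos t, sin t) are hypothetical choices) compares the chain-rule value of F'(b) with a central finite difference of F(t) = f(g(t)).

```python
import math

# Hypothetical example: f(x, y) = x^2 + x*y, g(t) = (cos t, sin t).
def f(x, y):
    return x * x + x * y

def df_dx(x, y):  # partial f / partial x
    return 2 * x + y

def df_dy(x, y):  # partial f / partial y
    return x

def g(t):
    return math.cos(t), math.sin(t)

def dg(t):  # (g1'(t), g2'(t))
    return -math.sin(t), math.cos(t)

b = 0.7
a = g(b)  # the point a = g(b) from the theorem

# Chain rule: F'(b) = sum_k  df/dx_k(a) * g_k'(b)
chain = df_dx(*a) * dg(b)[0] + df_dy(*a) * dg(b)[1]

# Independent check: central finite difference of F(t) = f(g(t))
h = 1e-6
numeric = (f(*g(b + h)) - f(*g(b - h))) / (2 * h)

print(abs(chain - numeric) < 1e-6)
```

The two values agree up to finite-difference error, as the theorem predicts.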
Following the above comment, we may express the chain rule in matrix form as follows. For an f = (f_1, ..., f_s) : D → R^s, D ⊆ R^n, define Df as the matrix

    Df = \left( \frac{\partial f_i}{\partial x_k} \right)_{i,k}.

Then we have D(f ∘ g) = Df · Dg. More explicitly, at a concrete argument t we have

    D(f ∘ g)(t) = Df(g(t)) · Dg(t).

Compare this with the one-variable rule (f ∘ g)'(t) = f'(g(t)) · g'(t); for 1 × 1 matrices we of course have (a)(b) = (ab).

Geometric meaning of the differential. What is happening geometrically is this: if we think of a function f as represented by its "graph", the hypersurface

    S = \{ (x_1, ..., x_n, f(x_1, ..., x_n)) \mid (x_1, ..., x_n) \in D \} \subseteq R^{n+1},

then the partial derivatives describe just the tangent lines in the directions of the coordinate axes, while the total differential describes the entire tangent hyperplane. A normal vector of the graph of a function of two variables is then given by

    \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, -1 \right).

Higher order partial derivatives. When we have a function g(x) = ∂f(x)/∂x_k then, similarly to taking the second derivative of a function of one variable, we may consider partial derivatives of g(x), that is, ∂g(x)/∂x_l. The result, if it exists, is denoted by

    \frac{\partial^2 f(x)}{\partial x_k \, \partial x_l}.

More generally, iterating this procedure we may obtain

    \frac{\partial^r f(x)}{\partial x_{k_1} \partial x_{k_2} \cdots \partial x_{k_r}},

the partial derivatives of order r. Note that the order is given by the number of differentiations taken and does not depend on whether individual variables are repeated. Thus, for example,

    \frac{\partial^3 f(x, y, z)}{\partial x \, \partial y \, \partial z}   and   \frac{\partial^3 f(x, y, z)}{\partial x \, \partial x \, \partial x}

are derivatives of third order (even though in the former case we have taken a partial derivative with respect to each variable only once). To simplify notation, taking a partial derivative with respect to the same variable more than once consecutively may be indicated by an exponent, e.g.

    \frac{\partial^5 f(x, y)}{\partial x^3 \, \partial y^2} = \frac{\partial^5 f(x, y)}{\partial x \, \partial x \, \partial x \, \partial y \, \partial y},

    \frac{\partial^5 f(x, y)}{\partial x^2 \, \partial y^2 \, \partial x} = \frac{\partial^5 f(x, y)}{\partial x \, \partial x \, \partial y \, \partial y \, \partial x}.

Example. Compute the "mixed" second order derivatives of the function f(x, y) = x sin(y² + x).
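The matrix identity D(f ∘ g)(t) = Df(g(t)) · Dg(t) can likewise be illustrated numerically. In the sketch below (an illustration with hypothetical maps f, g : R² → R², not part of the notes), both Jacobians are estimated by central differences and their product is compared with the Jacobian of the composition.

```python
import math

# Hypothetical maps f, g : R^2 -> R^2 for illustrating D(f o g) = Df * Dg.
def f(x):
    return [x[0] * x[1], math.sin(x[0]) + x[1] ** 2]

def g(t):
    return [t[0] + t[1], math.exp(t[0] * t[1])]

def jacobian(F, p, h=1e-6):
    """Jacobian of F at p by central differences: J[i][k] = dF_i/dx_k."""
    n = len(p)
    cols = []
    for k in range(n):
        up, dn = p[:], p[:]
        up[k] += h
        dn[k] -= h
        cols.append([(u - d) / (2 * h) for u, d in zip(F(up), F(dn))])
    m = len(cols[0])
    return [[cols[k][i] for k in range(n)] for i in range(m)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

t = [0.3, 0.5]
lhs = jacobian(lambda u: f(g(u)), t)              # D(f o g)(t)
rhs = matmul(jacobian(f, g(t)), jacobian(g, t))   # Df(g(t)) * Dg(t)

err = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(err < 1e-4)
```

Up to finite-difference error, the Jacobian of the composition equals the product of the Jacobians, mirroring the fact that matrix multiplication encodes composition of linear maps.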
We obtain, first,

    \frac{\partial f(x, y)}{\partial x} = \sin(y^2 + x) + x \cos(y^2 + x)

and

    \frac{\partial f(x, y)}{\partial y} = 2xy \cos(y^2 + x).

Now for the second order derivatives we get

    \frac{\partial^2 f}{\partial x \partial y} = 2y \cos(y^2 + x) - 2xy \sin(y^2 + x) = \frac{\partial^2 f}{\partial y \partial x}.

Whether it is surprising or not, this suggests that higher order partial derivatives may not depend on the order of differentiation. This is true, provided all the derivatives in question are continuous (it should be noted, though, that without this assumption the equality does not necessarily hold; see https://en.wikipedia.org/wiki/Symmetry_of_second_derivatives).

Theorem 3. Let f : U → R be a function such that the partial derivatives ∂²f/∂x∂y and ∂²f/∂y∂x are defined and continuous in a neighborhood of a point ā. Then we have

    \frac{\partial^2 f(\bar{a})}{\partial x \partial y} = \frac{\partial^2 f(\bar{a})}{\partial y \partial x}.

Iterating these interchanges, we obtain the following corollary by an easy induction.

Corollary 4. Let a function f of n variables possess continuous partial derivatives up to the order k on an open neighborhood U of ā. Then the values of these derivatives depend only on the number of times a partial derivative is taken in each of the individual variables x_1, ..., x_n. Thus, under the assumptions of the corollary, we can write a general partial derivative of order r ≤ k as

    \frac{\partial^r f}{\partial x_1^{r_1} \partial x_2^{r_2} \cdots \partial x_n^{r_n}}   with   r_1 + r_2 + \cdots + r_n = r,

where, of course, r_j = 0 is allowed and indicates the absence of the symbol ∂x_j.
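A quick numerical check of the example (and of Theorem 3) is possible. The sketch below, not part of the original notes, estimates the mixed second derivative of f(x, y) = x sin(y² + x) with a central-difference stencil and compares it with the formula 2y cos(y² + x) - 2xy sin(y² + x) computed above.

```python
import math

# The function from the worked example: f(x, y) = x * sin(y^2 + x).
def f(x, y):
    return x * math.sin(y * y + x)

def mixed(x, y, h=1e-4):
    """Central-difference estimate of the mixed second derivative."""
    return (f(x + h, y + h) - f(x + h, y - h)
            - f(x - h, y + h) + f(x - h, y - h)) / (4 * h * h)

x, y = 0.4, 1.1
# The value both mixed derivatives should share, per the example:
analytic = 2 * y * math.cos(y * y + x) - 2 * x * y * math.sin(y * y + x)

print(abs(mixed(x, y) - analytic) < 1e-5)
```

Note that the stencil is symmetric in the two differentiation steps, so it cannot distinguish ∂²f/∂x∂y from ∂²f/∂y∂x; for this continuously differentiable f, Theorem 3 guarantees they coincide anyway.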