CALCULUS OF VARIATIONS AND OPTIMIZATION METHODS

Part II. Optimization methods

Lecture 13. Gradient methods and differentiation of functionals

We have considered methods of solving extremum problems that are based on the necessary conditions of an extremum. However, these methods transform the initial problem into another one: a system of optimality conditions, which is then solved by iterative methods. Our next step is the analysis of direct iterative methods of solving extremum problems, which do not use the conditions of the extremum.

13.1. Gradient method for functions of many variables

Let $F = F(v) = F(v_1, \dots, v_n)$ be a smooth enough function.

Problem 13.1. Find the vector $v = (v_1, \dots, v_n)$ which minimizes the function $F$.

The practical solution of this problem is based on the construction of a sequence $\{v^k\}$ of $n$-dimensional vectors. Let the value $v^k$ be known. The next iteration can be found by the formula

$$v^{k+1} = v^k + \alpha_k h^k, \quad (13.1)$$

where $\alpha_k$ is a number (a parameter of the algorithm, chosen positive), and $h^k$ is an $n$-dimensional vector. We try to choose these values so that the function $F$ decreases from iteration $k$ to iteration $k+1$. We obtain

$$F(v^{k+1}) = F(v^k + \alpha_k h^k) = F(v^k) + \alpha_k \sum_{i=1}^n \frac{\partial F(v^k)}{\partial v_i}\, h_i^k + \eta(\alpha_k),$$

where $\eta(\alpha_k)/\alpha_k \to 0$ as $\alpha_k \to 0$. The second term on the right side of this equality is a sum of products of vector components, that is, the scalar product of two vectors. The first of them has the partial derivatives of the function $F$ as its components; this vector is called the gradient of the function $F$ and is denoted $F'(v^k)$. So we get the equality

$$F(v^{k+1}) = F(v^k) + \alpha_k \langle F'(v^k), h^k \rangle + \eta(\alpha_k).$$

Choose the vector

$$h^k = -F'(v^k). \quad (13.2)$$

Then we have

$$F(v^{k+1}) = F(v^k) - \alpha_k \| F'(v^k) \|^2 + \eta(\alpha_k),$$

because the scalar square of a vector is the square of its norm. Choose the number $\alpha_k$ small enough. Then the sign of the sum of the last two terms on the right side of this equality is determined by the sign of the first of them.
Therefore we obtain the inequality $F(v^{k+1}) - F(v^k) < 0$, so the sequence $\{F(v^k)\}$ is decreasing. Therefore we can suppose that the limit of the sequence $\{v^k\}$ is a solution of Problem 13.1. The algorithm (13.1), (13.2) is called the gradient method.

Remark 13.1. In reality the value $\alpha_k$ may fail to be small enough, so we cannot guarantee the convergence of the method to the minimum of the function $F$. Besides, it can converge to a point of local minimum rather than the absolute one, because the properties of the function are the same in a small enough neighbourhood of a local minimum and of the absolute one.

Remark 13.2. If $F'(v^k) = 0$, then we have the equality $v^{k+1} = v^k$, and the gradient method converges. So the gradient method is, in fact, a method of finding solutions of the stationary condition.

There exist many variants of the gradient method, depending on the choice of the parameter $\alpha_k$. For example, we obtain the quickest descent method if we find it from the equality

$$F\big(v^k - \alpha_k F'(v^k)\big) = \min_{\alpha \ge 0} F\big(v^k - \alpha F'(v^k)\big). \quad (13.3)$$

The quickest descent method has the best rate of convergence. However, it requires solving the problem (13.3) of minimizing a function of one variable at each step of the iteration.

We would now like to extend these results to problems of minimization of functionals. So we need to define derivatives of functionals, namely analogues of the gradient of a function of many variables.

13.2. Projection gradient method

We have considered minimization problems without constraints. Let us try to use our results for conditional minimization problems. Let again $F$ be a function of $n$ variables, and let $U$ be a subset of the $n$-dimensional space.

Problem 13.2. Find the vector $v = (v_1, \dots, v_n)$ which minimizes the function $F$ on the set $U$.

We can use the following algorithm:

$$v^{k+1} = P\big[v^k - \alpha_k F'(v^k)\big], \quad (13.4)$$

where $\alpha_k$ is a positive number (a parameter of the algorithm), and $P$ is the operator of projection (the projector) onto the set $U$.
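As an illustration (not part of the lecture), the gradient method (13.1), (13.2) and the quickest descent rule (13.3) can be sketched for a simple quadratic function. The test function $F(v) = v_1^2 + 4 v_2^2$ and all numerical parameters below are illustrative assumptions; for a quadratic $F(v) = \langle Av, v\rangle/2$ the line-search problem (13.3) has the closed-form solution $\alpha = \langle g, g\rangle / \langle Ag, g\rangle$, which the sketch uses instead of a numerical one-dimensional minimization.

```python
# Gradient method (13.1)-(13.2) on an illustrative quadratic
# F(v) = v1^2 + 4*v2^2, with gradient F'(v) = (2*v1, 8*v2) and A = diag(2, 8).

def grad_F(v):
    return (2.0 * v[0], 8.0 * v[1])

def gradient_method(v, alpha=0.1, steps=200):
    """Fixed-step iteration v^{k+1} = v^k - alpha * F'(v^k)."""
    for _ in range(steps):
        g = grad_F(v)
        v = (v[0] - alpha * g[0], v[1] - alpha * g[1])
    return v

def quickest_descent(v, steps=40):
    """Exact line search (13.3): for a quadratic the minimizing step
    along -g is alpha = <g, g> / <Ag, g>."""
    for _ in range(steps):
        g = grad_F(v)
        Ag = (2.0 * g[0], 8.0 * g[1])          # A = diag(2, 8)
        alpha = (g[0] ** 2 + g[1] ** 2) / (Ag[0] * g[0] + Ag[1] * g[1])
        v = (v[0] - alpha * g[0], v[1] - alpha * g[1])
    return v

v_fixed = gradient_method((1.0, 1.0))
v_exact = quickest_descent((1.0, 1.0))
print(v_fixed, v_exact)   # both approach the minimizer (0, 0)
```

Both variants converge here because the step sizes keep the first term of the expansion dominant; the fixed step must satisfy a smallness condition, exactly as Remark 13.1 warns.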
It maps a given point of the $n$-dimensional space to the nearest point of the set $U$. This algorithm is called the projection gradient method.

Example 13.1. Consider the problem of minimization of a function $F = F(v)$ on the interval $[a, b]$. Then the projection gradient method is determined by the formula

$$v^{k+1} = \begin{cases} a, & \text{if } v^k - \alpha_k F'(v^k) < a, \\ v^k - \alpha_k F'(v^k), & \text{if } a \le v^k - \alpha_k F'(v^k) \le b, \\ b, & \text{if } v^k - \alpha_k F'(v^k) > b. \end{cases}$$

There exist many variants of the projection gradient method, with different choices of the parameter $\alpha_k$.

Remark 13.3. If the set $U$ is complicated enough, the calculation of the projection may itself be difficult.

13.3. Gateaux derivative

Let us consider a general functional $I$ on an arbitrary set $V$. We could, in principle, extend the stationary condition and the gradient method to its minimization if we had some method of differentiation of this functional. Let us try to use the standard technique for calculating its derivative at a point $v$ of the set $V$. It is known that the derivative of a function at a point is the limit of the increment of the function divided by the increment of the argument, as the increment of the argument tends to zero. Try to consider the increment of the argument at the point $v$; let it be described by a value $h$. Then the corresponding increment of the functional is the difference between its value at the new point $v + h$ and its value at the initial point $v$. There arises the question: what is the sum of two elements of the given set $V$? We need to guarantee that the sum of two arbitrary elements of $V$ belongs to this set; so the operation of addition must be defined on the set $V$. The difference $I(v + h) - I(v)$ then has sense. Our next step is the calculation of the ratio of the increments $[I(v + h) - I(v)]/h$. Since $I$ is a functional, the numerator is a number. But what is the sense of dividing the number $I(v + h) - I(v)$ by the element $h$ of the set $V$? It is clear if $h$ is a number.
However, $h$ can be a vector or a function, and then the considered fraction does not have any sense. So we need to correct the definition of the derivative.

We can define the derivative of a function $f = f(x)$ at the point $x$ in the following way. Consider the ratio $[f(x + \sigma h) - f(x)]/\sigma$, where $\sigma$ and $h$ are numbers, and try to pass to the limit as $\sigma$ tends to zero. If the limit exists and is linear with respect to $h$, then it has the form $f'(x)h$ for every value $h$, where $f'(x)$ is the derivative of the function $f$ at the point $x$.

We can try to use this idea in our case. Consider the ratio $[I(v + \sigma h) - I(v)]/\sigma$, where $\sigma$ is a number and $h$ is an element of the set $V$. This is a ratio of two numbers, so it has a natural sense, and we have no problem with the division. However, we need to interpret the product $\sigma h$. Our technique works only if we can define the product of an arbitrary number and the element $h$ of the set $V$. So the operation of multiplication of an element of $V$ by a number must be defined on the set $V$.

Definition 13.1. A set with addition of elements and multiplication of an element by a number (with natural additional properties) is called a linear space.

Hence we will consider the set $V$ as a linear space. The term $[I(v + \sigma h) - I(v)]/\sigma$ then has sense. Our next step is passing to the limit in this term. However, convergence is not defined on an arbitrary set; we can pass to the limit in a topological space only. So we suppose that our set $V$ is a topological space. But we also need a relation between the given algebraic operations and passing to the limit. In particular, we would like to have the convergence $\sigma h \to 0$ for every $h$ as $\sigma \to 0$; then we would obtain the convergence $v + \sigma h \to v$. We can guarantee these properties if our operations are continuous. Let us have the convergence $u_k \to u$.
The operation of multiplication by a number is continuous if we obtain $\alpha u_k \to \alpha u$. Let us have also the convergence $v_k \to v$. The operation of addition is continuous if we get $u_k + v_k \to u + v$.

Definition 13.2. If a set is a linear space and a topological space with continuous operations, then it is called a linear topological space.

Hence we will consider the set $V$ as a linear topological space. So passing to the limit in the term $[I(v + \sigma h) - I(v)]/\sigma$ as $\sigma \to 0$ has sense. Suppose the corresponding limit exists; it depends, of course, on $v$ and $h$. We suppose also that its dependence on $h$ is linear. So we have the equality

$$\lim_{\sigma \to 0} \frac{I(v + \sigma h) - I(v)}{\sigma} = I'(v)h, \quad (13.5)$$

where the map $h \mapsto I'(v)h$ is linear. The term on the left side of this equality is a number, because it is the limit of a ratio of two numbers. So the term on the right side of (13.5) is a number too. Therefore the map $I'(v)$ is a linear functional.

Definition 13.3. A functional $I$ is called Gateaux differentiable at the point $v$ if there exists a linear functional $I'(v)$ such that the equality (13.5) holds. Besides, $I'(v)$ is called the Gateaux derivative of the functional $I$ at the point $v$.

13.4. Examples of Gateaux derivatives

Consider examples of Gateaux derivatives.

Example 13.1. Linear functional. A functional $I$ on the set $V$ is linear if it satisfies the equality

$$I(\alpha u + \beta v) = \alpha I(u) + \beta I(v)$$

for all elements $u$ and $v$ of $V$ and numbers $\alpha, \beta$. Find the derivative of the linear functional $I$ at an arbitrary point $v$. We have

$$\lim_{\sigma \to 0} \frac{I(v + \sigma h) - I(v)}{\sigma} = \lim_{\sigma \to 0} \frac{I(v) + \sigma I(h) - I(v)}{\sigma} = I(h)$$

for all $h$. So the linear functional is Gateaux differentiable at every point; besides, its derivative is equal to the initial functional.

Example 13.2. Affine functional. A functional $I$ on the set $V$ is called affine if it is determined by the equality $I(v) = J(v) + a$, $v \in V$, where $J$ is a linear functional on $V$ and $a$ is a number. We find

$$\lim_{\sigma \to 0} \frac{I(v + \sigma h) - I(v)}{\sigma} = \lim_{\sigma \to 0} \frac{J(v + \sigma h) + a - J(v) - a}{\sigma} = J(h).$$
So the affine functional is Gateaux differentiable at every point; besides, its derivative is equal to the corresponding linear functional.

Example 13.3. Function of one variable. Consider a function $f = f(x)$ and suppose it is differentiable at the point $x$. Then we have the equality

$$\lim_{\sigma \to 0} \frac{f(x + \sigma h) - f(x)}{\sigma} = f'(x)h \quad (13.6)$$

for every number $h$. So the Gateaux derivative of the function $f$ at the point $x$ is its classical derivative at this point.

Example 13.4. Function of many variables. Consider a function $f = f(x_1, \dots, x_n)$ and determine its Gateaux derivative at the point $x = (x_1, \dots, x_n)$. We would like to pass to the limit in the ratio

$$\frac{f(x + \sigma h) - f(x)}{\sigma} = \frac{f(x_1 + \sigma h_1, \dots, x_n + \sigma h_n) - f(x_1, \dots, x_n)}{\sigma}$$

as $\sigma \to 0$ for every vector $h = (h_1, \dots, h_n)$. Let the function $f$ be differentiable with respect to all its arguments. Then we have

$$\lim_{\sigma \to 0} \frac{f(x + \sigma h) - f(x)}{\sigma} = \sum_{i=1}^n \frac{\partial f(x)}{\partial x_i}\, h_i. \quad (13.7)$$

The term on the right side of this equality is the scalar product of the gradient

$$f'(x) = \Big( \frac{\partial f(x)}{\partial x_1}, \dots, \frac{\partial f(x)}{\partial x_n} \Big)$$

and the vector $h$. So the Gateaux derivative of a function $f$ of many variables at the point $x$ is its gradient at this point.

Thus the right side of the equality (13.7) is the scalar product of the Gateaux derivative $f'(x)$ and the vector $h$. There exists a class of linear topological spaces where a scalar product has sense; the definition of the Gateaux derivative by means of the scalar product is natural for this case.

13.5. Unitary spaces and Gateaux derivatives

We have determined that the Gateaux derivative is defined by a scalar product for functions of many variables. It is possible to extend this result to general linear spaces with a scalar product.

Definition 13.4. The value $(u, v)$ is the scalar product of the elements $u$ and $v$ of the linear space $V$ if the maps $u \mapsto (u, v)$ and $v \mapsto (u, v)$ are linear functionals and $(u, v) = (v, u)$ for all $u, v \in V$; besides, $(u, u) \ge 0$, and $(u, u) = 0$ if and only if $u$ is the zero element of the space $V$, that is, $u + v = v$ for all $v$.
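The equality (13.7) can be checked numerically: for a smooth function, the difference quotient $[f(x + \sigma h) - f(x)]/\sigma$ approaches the scalar product of the gradient and $h$ as $\sigma \to 0$. The test function $f$ and the points $x$, $h$ below are illustrative assumptions, not part of the lecture.

```python
# Numerical check of (13.7): the Gateaux (directional) derivative of a
# smooth function of many variables equals the scalar product <f'(x), h>.
import math

def f(x):
    # illustrative smooth test function f(x1, x2) = x1^2 * x2 + sin(x2)
    return x[0] ** 2 * x[1] + math.sin(x[1])

def grad_f(x):
    # partial derivatives computed by hand
    return (2.0 * x[0] * x[1], x[0] ** 2 + math.cos(x[1]))

x, h = (1.0, 2.0), (0.5, -1.0)

sigma = 1e-6
quotient = (f((x[0] + sigma * h[0], x[1] + sigma * h[1])) - f(x)) / sigma
scalar_product = grad_f(x)[0] * h[0] + grad_f(x)[1] * h[1]

print(quotient, scalar_product)  # the two values agree up to O(sigma)
```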
The linear space with a scalar product is called a unitary space. The set of real numbers $\mathbb{R}$ is a unitary space with the scalar product $(u, v) = uv$, $u, v \in \mathbb{R}$. The Euclidean space $\mathbb{R}^n$ is a unitary space with the standard scalar product

$$(u, v) = \sum_{i=1}^n u_i v_i, \quad u, v \in \mathbb{R}^n.$$

The set $S(\Omega)$ of smooth enough functions on a set $\Omega$ with zero values on the boundary of $\Omega$ is a unitary space with the scalar product

$$(u, v) = \int_\Omega u(x) v(x)\, d\Omega, \quad u, v \in S(\Omega).$$

Let $V$ be a linear space with a scalar product. The value $\|v\| = \sqrt{(v, v)}$ is called the norm of the element $v$ of $V$. A sequence $\{v_k\}$ of the space $V$ converges to the point $v$ if

$$\lim_{k \to \infty} \|v_k - v\| = 0.$$

The unitary space with this convergence is a linear topological space.

Definition 13.5. A functional $I$ on the unitary space $V$ is called Gateaux differentiable at the point $v$ if there exists an element $I'(v)$ of $V$ such that

$$\lim_{\sigma \to 0} \frac{I(v + \sigma h) - I(v)}{\sigma} = \big(I'(v), h\big) \quad \forall h \in V.$$

Besides, $I'(v)$ is called the Gateaux derivative of the functional $I$ at the point $v$.

Obviously, Definitions 13.3 and 13.5 are equivalent for the set of real numbers and for the Euclidean space.

Example 13.5. Square of the norm. Consider the functional $I(v) = \|v\|^2$ on the unitary space $V$. Find the value

$$I(v + \sigma h) = (v + \sigma h, v + \sigma h) = (v, v) + 2\sigma (v, h) + \sigma^2 (h, h).$$

Then we have

$$I(v + \sigma h) - I(v) = 2\sigma (v, h) + \sigma^2 \|h\|^2.$$

After division by $\sigma$ and passing to the limit as $\sigma \to 0$ we get

$$\big(I'(v), h\big) = (2v, h) \quad \forall h \in V.$$

So $I'(v) = 2v$.

Example 13.6. Lagrange functional. Consider the functional

$$I(v) = \int_{x_1}^{x_2} F[x, v(x), v'(x)]\, dx$$

on the set $S[x_1, x_2]$ of smooth enough functions on the interval $[x_1, x_2]$ with zero values on the boundary of this interval. This functional is called the Lagrange functional. Let the function $F$ be smooth enough. We find

$$I(v + \sigma h) - I(v) = \int_{x_1}^{x_2} F\big[x, v(x) + \sigma h(x), v'(x) + \sigma h'(x)\big]\, dx - \int_{x_1}^{x_2} F\big[x, v(x), v'(x)\big]\, dx$$

$$= \sigma \int_{x_1}^{x_2} \frac{\partial F}{\partial v}\big[x, v(x), v'(x)\big]\, h(x)\, dx + \sigma \int_{x_1}^{x_2} \frac{\partial F}{\partial v'}\big[x, v(x), v'(x)\big]\, h'(x)\, dx + \eta(\sigma),$$

where $\eta(\sigma)/\sigma \to 0$ as $\sigma \to 0$.
Then we calculate

$$\lim_{\sigma \to 0} \frac{I(v + \sigma h) - I(v)}{\sigma} = \int_{x_1}^{x_2} \frac{\partial F}{\partial v}\big[x, v(x), v'(x)\big]\, h(x)\, dx + \int_{x_1}^{x_2} \frac{\partial F}{\partial v'}\big[x, v(x), v'(x)\big]\, h'(x)\, dx$$

for every function $h$ from the set $S[x_1, x_2]$. Integrating by parts we get

$$\int_{x_1}^{x_2} \frac{\partial F}{\partial v'}\, h'(x)\, dx = \Big[ \frac{\partial F}{\partial v'}\, h(x) \Big]_{x_1}^{x_2} - \int_{x_1}^{x_2} \frac{d}{dx} \frac{\partial F}{\partial v'}\, h(x)\, dx = - \int_{x_1}^{x_2} \frac{d}{dx} \frac{\partial F}{\partial v'}\, h(x)\, dx,$$

because the function $h$ is equal to zero at the points $x_1$ and $x_2$. Therefore the derivative of the Lagrange functional at the point (function) $v$ is determined by the equality

$$\big(I'(v), h\big) = \int_{x_1}^{x_2} \Big[ \frac{\partial F}{\partial v}\big(x, v(x), v'(x)\big) - \frac{d}{dx} \frac{\partial F}{\partial v'}\big(x, v(x), v'(x)\big) \Big] h(x)\, dx \quad \forall h \in S[x_1, x_2].$$

So we determine the Gateaux derivative

$$I'(v) = \frac{\partial F}{\partial v}\big(x, v(x), v'(x)\big) - \frac{d}{dx} \frac{\partial F}{\partial v'}\big(x, v(x), v'(x)\big).$$

Example 13.7. Dirichlet integral. Let $\Omega$ be an $n$-dimensional set with boundary $S$. Consider the integral

$$I(v) = \int_\Omega \Big[ \sum_{i=1}^n \Big( \frac{\partial v}{\partial x_i} \Big)^2 - 2vf \Big]\, d\Omega,$$

where $f$ is a given function. This integral is called the Dirichlet integral. We consider it on the set $S(\Omega)$ of smooth enough functions on $\Omega$ with zero values on $S$. Find the value

$$I(v + \sigma h) = \int_\Omega \Big[ \sum_{i=1}^n \Big( \frac{\partial (v + \sigma h)}{\partial x_i} \Big)^2 - 2(v + \sigma h) f \Big]\, d\Omega$$

$$= I(v) + 2\sigma \int_\Omega \Big[ \sum_{i=1}^n \frac{\partial v}{\partial x_i} \frac{\partial h}{\partial x_i} - fh \Big]\, d\Omega + \sigma^2 \int_\Omega \sum_{i=1}^n \Big( \frac{\partial h}{\partial x_i} \Big)^2 d\Omega.$$

Then we have

$$\lim_{\sigma \to 0} \frac{I(v + \sigma h) - I(v)}{\sigma} = 2 \int_\Omega \Big[ \sum_{i=1}^n \frac{\partial v}{\partial x_i} \frac{\partial h}{\partial x_i} - fh \Big]\, d\Omega.$$

Using Green's formula we get

$$\int_\Omega \sum_{i=1}^n \frac{\partial v}{\partial x_i} \frac{\partial h}{\partial x_i}\, d\Omega = - \int_\Omega (\Delta v)\, h\, d\Omega + \int_S \frac{\partial v}{\partial n}\, h\, dS,$$

where $\Delta$ is the Laplace operator and $\partial v / \partial n$ is the derivative of $v$ with respect to the normal to the boundary $S$. Since the function $h$ is equal to zero on the set $S$, we can determine the Gateaux derivative of the Dirichlet integral by the equality

$$\big(I'(v), h\big) = -2 \int_\Omega (\Delta v + f)\, h\, d\Omega \quad \forall h \in S(\Omega).$$

Then we find the Gateaux derivative $I'(v) = -2(f + \Delta v)$.

Example 13.8. Discontinuous function. Consider the function of two variables determined by the equality

$$f(x, y) = \begin{cases} y\, (x^2 + y^2)/x, & x \ne 0, \\ 0, & x = 0. \end{cases}$$

Find the difference

$$f(\sigma h, \sigma g) - f(0, 0) = \sigma^2 (h^2 + g^2)\, g / h$$

for all numbers $h \ne 0$ and $g$ (for $h = 0$ the difference is zero). After division by $\sigma$ and passing to the limit as $\sigma \to 0$ we obtain that the Gateaux derivative of the function $f$ at the zero point is zero, namely the two-dimensional zero vector. However, let us continue our analysis.
Determine $x = t^3$, $y = t$, where $t$ is a parameter. Then we have $f(x, y) = t^4 + 1$. Passing to the limit as $t \to 0$ we have $(x, y) \to (0, 0)$ and $f(x, y) \to 1$. But the value of the function $f$ at zero is zero. So this function is discontinuous at zero. Hence a discontinuous function of two variables can be Gateaux differentiable.

Example 13.9. Absolute value. Consider the function $f(x) = |x|$. We have $f(\sigma h) - f(0) = |\sigma|\, |h|$. Then we obtain

$$\lim_{\sigma \to +0} \frac{|\sigma|\, |h|}{\sigma} = |h|, \quad \lim_{\sigma \to -0} \frac{|\sigma|\, |h|}{\sigma} = -|h|.$$

The result depends on the manner of passing to the limit, and it is not linear with respect to $h$. So the absolute value function is not Gateaux differentiable at zero.

13.6. Gradient methods for functionals

Let us have the problem of minimization of a Gateaux differentiable functional $I$ on the unitary space $V$. We can find an approximate solution of this problem using the gradient method

$$v^{k+1} = v^k - \alpha_k I'(v^k), \quad (13.7)$$

where $v^k$ is the $k$-th iteration of the unknown function, $\alpha_k$ is a positive number (a parameter of the algorithm), and $I'(v^k)$ is the Gateaux derivative of the functional $I$ at the point $v^k$. If $I$ is a function of many variables and $V$ is the Euclidean space, then the formula (13.7) is equivalent to the algorithm (13.1), (13.2).

Consider the problem of minimization of the Lagrange functional on the space of smooth enough functions on the interval $[x_1, x_2]$. Using the formula for its Gateaux derivative, we get the gradient method

$$v^{k+1}(x) = v^k(x) - \alpha_k \Big[ \frac{\partial F}{\partial v}\big(x, v^k(x), (v^k)'(x)\big) - \frac{d}{dx} \frac{\partial F}{\partial v'}\big(x, v^k(x), (v^k)'(x)\big) \Big], \quad x \in [x_1, x_2].$$

Now consider the problem of minimization of the Dirichlet integral on the space of smooth enough functions on the set $\Omega$. Using the formula for its Gateaux derivative, we obtain the following form of the gradient method (13.7):

$$v^{k+1}(x) = v^k(x) + 2\alpha_k \big[ f(x) + \Delta v^k(x) \big].$$

We will also solve the problem of minimization of a differentiable functional $I$ on a subset $U$ of a unitary space $V$. Then we use the projection gradient method

$$v^{k+1} = P\big[v^k - \alpha_k I'(v^k)\big], \quad (13.8)$$

where $P$ is the projection onto the set $U$.
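As a sketch (not part of the lecture), the gradient iteration for the Lagrange functional can be imitated on a grid. The integrand $F(x, v, v') = (v'^2 + v^2)/2$, for which the Gateaux derivative is $I'(v) = v - v''$, as well as the grid size, the step parameter, and the initial guess, are illustrative assumptions; the minimizer with zero boundary values is $v = 0$.

```python
# Gradient method for a Lagrange functional, imitated on a uniform grid.
# Illustrative integrand: F(x, v, v') = (v'^2 + v^2) / 2, so I'(v) = v - v''.
import math

N = 20                      # number of grid intervals on [0, 1]
dx = 1.0 / N
alpha = 0.001               # step parameter alpha_k (kept constant)

# initial guess v^0(x) = sin(pi x), zero on the boundary
v = [math.sin(math.pi * j * dx) for j in range(N + 1)]

for _ in range(2000):
    # I'(v) = v - v'' via central differences at interior points
    grad = [0.0] * (N + 1)
    for j in range(1, N):
        v_xx = (v[j - 1] - 2.0 * v[j] + v[j + 1]) / dx ** 2
        grad[j] = v[j] - v_xx
    # gradient step v^{k+1} = v^k - alpha * I'(v^k)
    v = [v[j] - alpha * grad[j] for j in range(N + 1)]

print(max(abs(x) for x in v))   # approaches 0, the minimizer
```

The step must be small relative to the grid spacing, since the discrete second derivative makes the gradient large for oscillatory iterates; this is the functional counterpart of the smallness condition on $\alpha_k$ from Remark 13.1.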
Example 13.10. Consider the problem of minimization of the functional

$$I(v) = \int_\Omega \big[v(x) - y(x)\big]^2\, dx$$

on the set $U = \{ v \in S(\Omega) \mid a \le v(x) \le b \}$. Determine the derivative of the functional. We have

$$I(u + \sigma h) - I(u) = \int_\Omega \big[ (u - y + \sigma h)^2 - (u - y)^2 \big]\, dx = \int_\Omega \big[ 2\sigma (u - y) h + \sigma^2 h^2 \big]\, dx.$$

After division by $\sigma$ and passing to the limit we get

$$\big(I'(u), h\big) = \int_\Omega 2 (u - y)\, h\, dx.$$

Then we find the derivative $I'(u) = 2(u - y)$. So the projection gradient method (13.8) is determined by the equality

$$v^{k+1}(x) = \begin{cases} a, & \text{if } v^k(x) - 2\alpha_k \big[v^k(x) - y(x)\big] < a, \\ v^k(x) - 2\alpha_k \big[v^k(x) - y(x)\big], & \text{if } a \le v^k(x) - 2\alpha_k \big[v^k(x) - y(x)\big] \le b, \\ b, & \text{if } v^k(x) - 2\alpha_k \big[v^k(x) - y(x)\big] > b. \end{cases}$$

Task. Minimize the functional $I = I(v)$ on the given set $U$. Values of parameters:

Variant 1: $I(v_1, v_2) = v_1^4 + v_2^4$; $U = \{ v \mid v_i \le b_i,\ i = 1, 2 \}$.
Variant 2: $I(v) = \int_0^1 v^3(t)\, dt$; $U = \{ v \mid v(t) \le 1,\ t \in [0, 1] \}$.
Variant 3: $I(v_1, v_2) = v_1^6 + v_2^2$; $U = \{ v \mid v_i \ge a_i,\ i = 1, 2 \}$.
Variant 4: $I(v) = \int_\Omega v^2(x)\, dx$; $U = \{ v \mid v(x) \le 1,\ x \in \Omega \}$.
Variant 5: $I(v_1, v_2, v_3) = v_1^2 + v_2^2 + v_3^2$; $U = \{ v \mid v_1 \le b,\ v_2 \ge a \}$.
Variant 6: $I(v) = \int_\Omega v(x) \sin x\, dx$; $U = \{ v \mid 0 \le v(x) \le 1,\ x \in \Omega \}$.
Variant 7: $I(v_1, v_2) = v_1^4 + 2 v_2^2$; $U = \{ v \mid v_1 \ge a \}$.
Variant 8: $I(v) = \int_0^1 v^4(t)\, dt$; $U = \{ v \mid 0 \le v(t) \le 1,\ t \in [0, 1] \}$.
Variant 9: $I(v_1, v_2, v_3) = v_1^2 + v_2^2 + v_3^2$; $U = \{ v \mid a \le v_i \le b,\ i = 1, 2, 3 \}$.

Steps of the task:
1. Find the Gateaux derivative of the functional.
2. Determine the gradient method for the problem of minimization of the given functional without constraints.
3. Determine the gradient method for the problem of minimization of the given functional with the given constraint.
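As a closing illustration of Example 13.10, the projection gradient iteration acts pointwise, and its limit is the pointwise projection of $y$ onto $[a, b]$. The target function $y$, the bounds, the grid, and the step parameter below are illustrative assumptions.

```python
# Projection gradient method (13.8) for Example 13.10, imitated on a grid.
# The iteration acts pointwise:
#   v^{k+1}(x) = P[ v^k(x) - 2*alpha*(v^k(x) - y(x)) ],
# where P clips a value to the interval [a, b].
import math

a, b = 0.0, 0.5
alpha = 0.25
N = 10
xs = [j / N for j in range(N + 1)]
y = [math.sin(math.pi * x) for x in xs]      # target function y(x)

def project(t):
    """Projector P onto the interval [a, b] (nearest point of U)."""
    return min(max(t, a), b)

v = [0.0] * (N + 1)                          # initial guess v^0 = 0
for _ in range(100):
    v = [project(v[j] - 2.0 * alpha * (v[j] - y[j])) for j in range(N + 1)]

# the limit is the pointwise projection of y onto [a, b]
print(all(abs(v[j] - project(y[j])) < 1e-9 for j in range(N + 1)))  # prints True
```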