CALCULUS OF VARIATIONS AND OPTIMIZATION METHODS
Part II. Optimization methods
Lecture 13. Gradient methods and differentiation of functionals
We have considered methods of solving extremum problems that are based on the necessary conditions of an extremum. However, these methods transform the initial problem into another one: a system of optimality conditions, which is then solved by iterative methods. Our next step is the analysis of direct iterative methods that solve extremum problems without using extremum conditions.
13.1. Gradient method for functions of many variables
Let F = F(v) = F(v_1, ..., v_n) be a smooth enough function.
Problem 13.1. Find the vector v = (v_1, ..., v_n) that minimizes the function F.
The practical solution of this problem is based on the construction of a sequence {v^k} of n-dimensional vectors. Let the value v^k be known. The next iterate is found by the formula
v^{k+1} = v^k + \alpha_k h^k,   (13.1)
where \alpha_k is a positive number (a parameter of the algorithm) and h^k is an n-dimensional vector. We try to choose these values so that the function F decreases from the k-th iteration to the (k+1)-th iteration.
We obtain
F(v^{k+1}) = F(v^k + \alpha_k h^k) = F(v^k) + \alpha_k \sum_{i=1}^n \frac{\partial F(v^k)}{\partial v_i} h_i^k + \eta(\alpha_k),
where \eta(\alpha_k)/\alpha_k \to 0 as \alpha_k \to 0. The second term on the right side of this equality is a sum of products of vector components, that is, the scalar product of two vectors. The first of them has the partial derivatives of the function F as its components; this vector is called the gradient of the function F and denoted \nabla F. So we get the equality
F(v^{k+1}) = F(v^k) + \alpha_k \langle \nabla F(v^k), h^k \rangle + \eta(\alpha_k).
Determine the vector
h^k = -\nabla F(v^k).   (13.2)
Then we have
F(v^{k+1}) = F(v^k) - \alpha_k \| \nabla F(v^k) \|^2 + \eta(\alpha_k)
because the scalar square of a vector is the square of its norm. Choose the number \alpha_k small enough. Then the sign of the sum on the right side of this equality is determined by the sign of its first term. Therefore we obtain the inequality
F(v^{k+1}) - F(v^k) < 0.
So the sequence {F(v^k)} is decreasing. Therefore we can hope that the limit of the sequence {v^k} is a solution of Problem 13.1. The algorithm (13.1), (13.2) is called the gradient method.
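The algorithm (13.1), (13.2) can be sketched in a few lines of code. This is a minimal illustration, not part of the lecture: the quadratic test function F, its gradient, the fixed step size alpha, and the iteration count are all illustrative assumptions.

```python
# Sketch of the gradient method (13.1), (13.2) for the illustrative choice
# F(v) = (v1 - 1)^2 + 2*(v2 + 3)^2, whose minimizer is (1, -3).

def grad_F(v):
    # gradient of F(v) = (v1 - 1)^2 + 2*(v2 + 3)^2
    return [2.0 * (v[0] - 1.0), 4.0 * (v[1] + 3.0)]

def gradient_method(v0, alpha=0.1, iterations=200):
    v = list(v0)
    for _ in range(iterations):
        g = grad_F(v)                                      # nabla F(v^k)
        v = [vi - alpha * gi for vi, gi in zip(v, g)]      # v^{k+1} = v^k - alpha_k nabla F(v^k)
    return v

v_min = gradient_method([0.0, 0.0])
# v_min is close to the exact minimizer (1, -3)
```

With a fixed small step the iterates contract toward the minimizer; a step that is too large would make the sequence diverge, which is exactly the caveat of Remark 13.1.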
Remark 13.1. In reality the value \alpha_k may fail to be small enough, so we cannot guarantee the convergence of the method to the minimum of the function F. Besides, it can converge to a local minimum rather than the absolute one, because the properties of the function are the same in a small enough neighbourhood of a local minimum and of the absolute minimum.
Remark 13.2. If \nabla F(v^k) = 0, then we have the equality v^{k+1} = v^k, and the gradient method converges. So the gradient method is in reality a method of finding solutions of the stationary condition.
There exist many variants of the gradient method, depending on the choice of the parameter \alpha_k. For example, we obtain the quickest descent (steepest descent) method if we find it from the equality
F(v^k - \alpha_k \nabla F(v^k)) = \min_{\alpha \ge 0} F(v^k - \alpha \nabla F(v^k)).   (13.3)
The quickest descent method has the best rate of convergence. However, it requires solving the one-dimensional minimization problem (13.3) at each step of the iteration.
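The one-dimensional subproblem (13.3) can be solved by any scalar minimization method; the sketch below uses a crude grid search, which is an illustrative assumption (in practice one would use golden-section or another line-search routine), as are the test function F and the iteration count.

```python
# Sketch of the quickest (steepest) descent method: at each step the step size
# alpha_k is chosen by a one-dimensional minimization (13.3) along -nabla F.

def F(v):
    return (v[0] - 1.0) ** 2 + 2.0 * (v[1] + 3.0) ** 2   # illustrative quadratic

def grad_F(v):
    return [2.0 * (v[0] - 1.0), 4.0 * (v[1] + 3.0)]

def line_search(v, g, alphas):
    # approximate min over alpha >= 0 of F(v - alpha * g) on a finite grid
    return min(alphas, key=lambda a: F([vi - a * gi for vi, gi in zip(v, g)]))

def steepest_descent(v0, iterations=50):
    v = list(v0)
    alphas = [i / 1000.0 for i in range(1, 1001)]         # grid on (0, 1]
    for _ in range(iterations):
        g = grad_F(v)
        a = line_search(v, g, alphas)                     # solve (13.3) approximately
        v = [vi - a * gi for vi, gi in zip(v, g)]
    return v

v_min = steepest_descent([0.0, 0.0])
# v_min is close to the exact minimizer (1, -3)
```

Each iteration is more expensive than a fixed-step one, but no step-size tuning is needed.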
We would now like to extend these results to problems of functional minimization. For this we need to define derivatives of functionals, namely analogues of the gradient of functions of many variables.
13.2. Projection gradient method
We have considered minimization problems without any constraints. Let us try to use our results for conditional minimization problems. Let again F be a function of n variables, and let U be a subset of n-dimensional space.
Problem 13.2. Find the vector v = (v_1, ..., v_n) that minimizes the function F on the set U.
We can use the following algorithm:
v^{k+1} = P[v^k - \alpha_k \nabla F(v^k)],   (13.4)
where \alpha_k is a positive number (a parameter of the algorithm), and P is the operator of projection (the projector) onto the set U. It maps a given point of the n-dimensional space to the nearest point of the set U. This algorithm is called the projection gradient method.
Example 13.1. Consider the problem of minimization of a function F = F(v) on the interval [a, b]. Then the projection gradient method is determined by the formula
v^{k+1} = \begin{cases} a, & \text{if } v^k - \alpha_k F'(v^k) < a, \\ v^k - \alpha_k F'(v^k), & \text{if } a \le v^k - \alpha_k F'(v^k) \le b, \\ b, & \text{if } v^k - \alpha_k F'(v^k) > b. \end{cases}
There exist many variants of the projection gradient method with different choices of the parameter \alpha_k.
Remark 13.3. If the set U is complicated enough, the calculation of the projection can itself be difficult.
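For the interval case of Example 13.1 the projection is just clipping, so the method (13.4) fits in a few lines. The test function F(v) = (v - 5)^2, the interval [0, 2], and the step size are illustrative assumptions.

```python
# Sketch of the projection gradient method (13.4) on U = [a, b]: the projection
# onto an interval is clipping, as in Example 13.1.

def project(v, a, b):
    # nearest point of [a, b]: P(v) = min(max(v, a), b)
    return min(max(v, a), b)

def projection_gradient(v0, a, b, alpha=0.1, iterations=100):
    v = v0
    for _ in range(iterations):
        grad = 2.0 * (v - 5.0)                 # F'(v) for F(v) = (v - 5)^2
        v = project(v - alpha * grad, a, b)    # v^{k+1} = P[v^k - alpha_k F'(v^k)]
    return v

# the unconstrained minimizer 5 lies outside U = [0, 2], so the iterates
# settle at the boundary point b = 2
v_min = projection_gradient(0.0, 0.0, 2.0)
```

When the unconstrained minimizer lies inside U the projection is inactive and the method reduces to the ordinary gradient method.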
13.3. Gateaux derivative
Let us consider a general functional I on an arbitrary set V. We could in principle extend the stationary condition and the gradient method to its minimization if we had some method of differentiation of this functional. Let us try to use the standard technique for calculating its derivative at a point v of the set V. It is known that the derivative of a function at a point is the result of passing to the limit in the increment of the function divided by the increment of the argument, as the increment of the argument tends to zero.
Consider the increment of the argument at the point v. Let it be described by a value h. Then the corresponding increment of the functional is the difference between its value at the new point v + h and its value at the initial point v. There is a question: what is the sum of two elements of the given set V? We need to guarantee that the sum of two arbitrary elements of V belongs to this set. So an operation of addition must be defined on the set V. The difference I(v + h) - I(v) makes sense in this case.
Our next step is the calculation of the ratio of the increments [I(v + h) - I(v)]/h. Since I is a functional, the numerator is a number. But what is the sense of dividing the number I(v + h) - I(v) by the element h of the set V? It is clear if h is a number; however, h can be a vector or a function. Unfortunately, the considered fraction does not make any sense in that case. So we need to correct the definition of the derivative.
We can define the derivative of a function f = f(x) at the point x by the following method. Consider the ratio [f(x + \sigma h) - f(x)]/\sigma, where \sigma and h are numbers. Then we pass to the limit as \sigma tends to zero. If this limit exists and is linear with respect to h, then it has the form f'(x)h for every value h, where f'(x) is the derivative of the function f at the point x.
We can try to use this idea for our case. Consider the ratio [I(v + \sigma h) - I(v)]/\sigma, where \sigma is a number and h is an element of the set V. Here we have a ratio of two numbers, so this term has a natural sense, and there is no problem with the division. However, we need to interpret the product \sigma h. Our technique works if we can define the product of an arbitrary number \sigma with the element h of the set V. So an operation of multiplication of elements of V by numbers must be defined on the set V.
Definition 13.1. A set with addition of elements and multiplication of elements by numbers (with the natural additional properties) is called a linear space.
Hence we will consider the set V as a linear space. The term [I(v + \sigma h) - I(v)]/\sigma makes sense in this case. Our next step is passing to the limit in this term. However, convergence is not defined on an arbitrary set; we can pass to the limit on a topological space only. So we suppose that our set V is a topological space. But we also need a relation between the given algebraic operations and passing to the limit. In particular, we would like to have the convergence \sigma h \to 0 for all h as \sigma \to 0, and then the convergence v + \sigma h \to v. We can guarantee these properties if our operations are continuous.
Suppose we have the convergence u_k \to u. The operation of multiplication by a number is continuous if we obtain \sigma u_k \to \sigma u. Suppose also the convergence v_k \to v. The operation of addition is continuous if we get u_k + v_k \to u + v.
Definition 13.2. If a set is a linear space and a topological space with continuous operations, then it is called a linear topological space.
Hence we will consider the set V as a linear topological space. Then passing to the limit in the term [I(v + \sigma h) - I(v)]/\sigma as \sigma \to 0 makes sense. Suppose the corresponding limit exists. It depends, of course, on v and h. We suppose also that its dependence on h is linear. So we have the equality
\lim_{\sigma \to 0} \frac{I(v + \sigma h) - I(v)}{\sigma} = I'(v)h,   (13.5)
where the map h \mapsto I'(v)h is linear. The term on the left side of this equality is a number, because it is the limit of a ratio of two numbers. So the term on the right side of (13.5) is a number too. Therefore the map I'(v) is a linear functional.
Definition 13.3. A functional I is called Gateaux differentiable at the point v if there exists a linear functional I'(v) such that the equality (13.5) holds. Then I'(v) is called the Gateaux derivative of the functional I at the point v.
13.4. Examples of Gateaux derivatives
Consider examples of Gateaux derivatives.
Example 13.1. Linear functional. The functional I on the set V is linear if it satisfies the equality
I(\alpha u + \beta v) = \alpha I(u) + \beta I(v)
for all elements u and v of V and all numbers \alpha, \beta. Find the derivative of a linear functional I at an arbitrary point v. We have
\lim_{\sigma \to 0} \frac{I(v + \sigma h) - I(v)}{\sigma} = \lim_{\sigma \to 0} \frac{I(v) + \sigma I(h) - I(v)}{\sigma} = I(h)
for all h. So a linear functional is Gateaux differentiable at every point, and its derivative is equal to the initial functional.

Example 13.2. Affine functional. The functional I on the set V is called affine if it is determined by the equality
I(v) = J(v) + a, \quad v \in V,
where J is a linear functional on V, and a is a number. We find
\lim_{\sigma \to 0} \frac{I(v + \sigma h) - I(v)}{\sigma} = \lim_{\sigma \to 0} \frac{[J(v + \sigma h) + a] - [J(v) + a]}{\sigma} = J(h).
So an affine functional is Gateaux differentiable at every point, and its derivative is equal to the corresponding linear functional.

Example 13.3. Function of one variable. Consider a function f = f(x) and suppose it is differentiable at the point x. Then we have the equality
\lim_{\sigma \to 0} \frac{f(x + \sigma h) - f(x)}{\sigma} = f'(x)h   (13.6)
for every number h. So the Gateaux derivative of the function f at the point x is its classical derivative at this point.
Example 13.4. Function of many variables. Consider a function f = f(x_1, ..., x_n). Determine its Gateaux derivative at the point x = (x_1, ..., x_n). We would like to pass to the limit in the equality
\frac{f(x + \sigma h) - f(x)}{\sigma} = \frac{f(x_1 + \sigma h_1, ..., x_n + \sigma h_n) - f(x_1, ..., x_n)}{\sigma}
as \sigma \to 0 for every vector h = (h_1, ..., h_n). Let the function f be differentiable with respect to all its arguments. Then we have
\lim_{\sigma \to 0} \frac{f(x + \sigma h) - f(x)}{\sigma} = \sum_{i=1}^n \frac{\partial f(x)}{\partial x_i} h_i.   (13.7)
The term on the right side of this equality is the scalar product of the gradient
\nabla f(x) = \left( \frac{\partial f(x)}{\partial x_1}, ..., \frac{\partial f(x)}{\partial x_n} \right)
and the vector h. So the Gateaux derivative of a function f of many variables at the point x is its gradient at this point.
Thus the right side of the equality (13.7) is the scalar product of the Gateaux derivative \nabla f(x) and the vector h. There exists a class of linear topological spaces where a scalar product makes sense. The definition of the Gateaux derivative by means of the scalar product is natural in this case.
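The identity (13.7) is easy to check numerically: the difference quotient from (13.5) along a direction h should agree with the scalar product of the gradient and h. The test function f, the point x, the direction h, and the finite-difference step are all illustrative assumptions.

```python
# Numerical illustration of (13.7): for a function of many variables the
# Gateaux derivative acts as the scalar product <grad f(x), h>.
import math

def f(x):
    return x[0] ** 2 + math.sin(x[1]) + x[0] * x[1]   # illustrative smooth function

def grad_f(x):
    # analytic gradient of f
    return [2.0 * x[0] + x[1], math.cos(x[1]) + x[0]]

def gateaux_quotient(f, x, h, sigma=1e-7):
    # [f(x + sigma*h) - f(x)] / sigma, the difference quotient from (13.5)
    shifted = [xi + sigma * hi for xi, hi in zip(x, h)]
    return (f(shifted) - f(x)) / sigma

x = [1.0, 2.0]
h = [0.5, -1.5]
dot = sum(gi * hi for gi, hi in zip(grad_f(x), h))   # <grad f(x), h>
approx = gateaux_quotient(f, x, h)
# approx and dot agree up to the finite-difference error
```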
13.5. Unitary spaces and Gateaux derivatives
We have determined that the Gateaux derivative is defined by a scalar product for functions of many variables. It is possible to extend this result to general linear spaces with a scalar product.
Definition 13.4. The value (u, v) is the scalar product of the elements u and v of the linear space V if the maps u \mapsto (u, v) and v \mapsto (u, v) are linear functionals, (u, v) = (v, u) for all u, v \in V, and (u, u) \ge 0, with (u, u) = 0 if and only if u is the zero element of the space V, that is, u + v = v for all v. A linear space with a scalar product is called a unitary space.
The set of real numbers R is a unitary space with the scalar product
(u, v) = uv, \quad u, v \in R.
The Euclidean space R^n is a unitary space with the standard scalar product
(u, v) = \sum_{i=1}^n u_i v_i, \quad u, v \in R^n.
The set S(\Omega) of smooth enough functions on a set \Omega with zero values on the boundary of \Omega is a unitary space with the scalar product
(u, v) = \int_\Omega u(x) v(x) \, d\Omega, \quad u, v \in S(\Omega).
Let V be a linear space with a scalar product. The value
\|v\| = \sqrt{(v, v)}
is called the norm of the element v of V. A sequence {v_k} of the linear space V with scalar product converges to the point v if
\lim_{k \to \infty} \|v_k - v\| = 0.
A unitary space with this convergence is a linear topological space.
Definition 13.5. A functional I on a unitary space V is called Gateaux differentiable at the point v if there exists an element I'(v) of V such that
\lim_{\sigma \to 0} \frac{I(v + \sigma h) - I(v)}{\sigma} = \langle I'(v), h \rangle \quad \forall h \in V
holds. Then I'(v) is called the Gateaux derivative of the functional I at the point v.
Obviously, Definitions 13.3 and 13.5 are equivalent for the set of real numbers and for Euclidean space.
Example 13.5. Square of the norm. Consider the functional
I(v) = \|v\|^2
on a unitary space V. Find the value
I(v + \sigma h) = \langle v + \sigma h, v + \sigma h \rangle = \langle v, v \rangle + 2\sigma \langle v, h \rangle + \sigma^2 \langle h, h \rangle.
Then we have
I(v + \sigma h) - I(v) = 2\sigma \langle v, h \rangle + \sigma^2 \|h\|^2.
After division by \sigma and passing to the limit as \sigma \to 0 we get
\langle I'(v), h \rangle = \langle 2v, h \rangle \quad \forall h \in V.
So I'(v) = 2v.
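The conclusion of Example 13.5 can be verified numerically on R^n with the standard scalar product: the difference quotient along h should approach <2v, h>. The particular vectors and the step sigma are illustrative assumptions.

```python
# Numerical check of Example 13.5: for I(v) = ||v||^2 the Gateaux derivative
# is I'(v) = 2v, so the difference quotient tends to <2v, h>.

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def I(v):
    return dot(v, v)   # squared norm

v = [1.0, -2.0, 3.0]
h = [0.5, 0.5, -1.0]
sigma = 1e-8
quotient = (I([vi + sigma * hi for vi, hi in zip(v, h)]) - I(v)) / sigma
exact = dot([2.0 * vi for vi in v], h)   # <I'(v), h> = <2v, h>
# quotient and exact agree up to the finite-difference error
```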
Example 13.6. Lagrange functional. Consider the functional
I(v) = \int_{x_1}^{x_2} F[x, v(x), v'(x)] \, dx
on the set S[x_1, x_2] of smooth enough functions on the interval [x_1, x_2] with zero values on the boundary of this interval. This functional is called the Lagrange functional. Let the function F be smooth enough. We find
I(v + \sigma h) = \int_{x_1}^{x_2} F[x, v(x) + \sigma h(x), v'(x) + \sigma h'(x)] \, dx = \int_{x_1}^{x_2} F[x, v(x), v'(x)] \, dx + \sigma \int_{x_1}^{x_2} \frac{\partial}{\partial v} F[x, v(x), v'(x)] h(x) \, dx + \sigma \int_{x_1}^{x_2} \frac{\partial}{\partial v'} F[x, v(x), v'(x)] h'(x) \, dx + \eta(\sigma),
where \eta(\sigma)/\sigma \to 0 as \sigma \to 0. Then we calculate
\lim_{\sigma \to 0} \frac{I(v + \sigma h) - I(v)}{\sigma} = \int_{x_1}^{x_2} \frac{\partial}{\partial v} F[x, v(x), v'(x)] h(x) \, dx + \int_{x_1}^{x_2} \frac{\partial}{\partial v'} F[x, v(x), v'(x)] h'(x) \, dx
for every function h from the set S[x_1, x_2]. Integrating by parts, we get
\int_{x_1}^{x_2} \frac{\partial}{\partial v'} F[x, v(x), v'(x)] h'(x) \, dx = \left[ \frac{\partial}{\partial v'} F[x, v(x), v'(x)] h(x) \right]_{x_1}^{x_2} - \int_{x_1}^{x_2} \frac{d}{dx} \frac{\partial}{\partial v'} F[x, v(x), v'(x)] h(x) \, dx = -\int_{x_1}^{x_2} \frac{d}{dx} \frac{\partial}{\partial v'} F[x, v(x), v'(x)] h(x) \, dx
because the function h is equal to zero at the points x_1 and x_2. Therefore the derivative of the Lagrange functional at the point (function) v is determined by the equality
\langle I'(v), h \rangle = \int_{x_1}^{x_2} \left\{ \frac{\partial}{\partial v} F[x, v(x), v'(x)] - \frac{d}{dx} \frac{\partial}{\partial v'} F[x, v(x), v'(x)] \right\} h(x) \, dx \quad \forall h \in S[x_1, x_2].
So we determine the Gateaux derivative
I'(v) = \frac{\partial}{\partial v} F[x, v(x), v'(x)] - \frac{d}{dx} \frac{\partial}{\partial v'} F[x, v(x), v'(x)].





Example 13.7. Dirichlet integral. Let \Omega be a set in n-dimensional space with the boundary S. Consider the integral
I(v) = \int_\Omega \left[ \sum_{i=1}^n \left( \frac{\partial v}{\partial x_i} \right)^2 - 2vf \right] d\Omega,
where f is a given function. This integral is called the Dirichlet integral. We consider it on the set S(\Omega) of smooth enough functions on \Omega with zero values on S. Find the value
I(v + \sigma h) - I(v) = \int_\Omega \left[ \sum_{i=1}^n \left( \frac{\partial (v + \sigma h)}{\partial x_i} \right)^2 - 2(v + \sigma h) f \right] d\Omega - \int_\Omega \left[ \sum_{i=1}^n \left( \frac{\partial v}{\partial x_i} \right)^2 - 2vf \right] d\Omega = 2\sigma \int_\Omega \left[ \sum_{i=1}^n \frac{\partial v}{\partial x_i} \frac{\partial h}{\partial x_i} - fh \right] d\Omega + \sigma^2 \int_\Omega \sum_{i=1}^n \left( \frac{\partial h}{\partial x_i} \right)^2 d\Omega.
Then we have
\lim_{\sigma \to 0} \frac{I(v + \sigma h) - I(v)}{\sigma} = 2 \int_\Omega \left[ \sum_{i=1}^n \frac{\partial v}{\partial x_i} \frac{\partial h}{\partial x_i} - fh \right] d\Omega.
Using Green's formula we get
\int_\Omega \sum_{i=1}^n \frac{\partial v}{\partial x_i} \frac{\partial h}{\partial x_i} \, d\Omega = -\int_\Omega \Delta v \, h \, d\Omega + \int_S \frac{\partial v}{\partial n} h \, dS,
where \Delta is the Laplace operator, and \partial v / \partial n is the derivative of v with respect to the normal to the boundary S. So we can determine the Gateaux derivative of the Dirichlet integral by the equality
\langle I'(v), h \rangle = -2 \int_\Omega (\Delta v + f) h \, d\Omega \quad \forall h \in S(\Omega)
because the function h is equal to zero on the boundary S. Then we find the Gateaux derivative
I'(v) = -2(\Delta v + f).

Example 13.8. Discontinuous function. Consider the function of two variables determined by the equality
f(x, y) = \begin{cases} y(x^2 + y^2)/x, & x \ne 0, \\ 0, & x = 0. \end{cases}
Find the difference
f(\sigma h, \sigma g) - f(0, 0) = \sigma^2 (h^2 + g^2) g / h
for all numbers h and g with h \ne 0 (for h = 0 the difference is zero). After division by \sigma and passing to the limit as \sigma \to 0 we obtain that the Gateaux derivative of the function f at the zero point is zero, namely the two-dimensional zero vector.
However, let us continue the analysis. Set x = t^3, y = t, where t is a parameter. Then we have f(x, y) = t^4 + 1. Passing to the limit as t \to 0 we have (x, y) \to 0 and f(x, y) \to 1. But the value of the function f at zero is zero. So this function is discontinuous at zero. Hence a discontinuous function of two variables can be Gateaux differentiable.
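Both observations of Example 13.8 can be reproduced numerically: along any straight line through the origin the difference quotient vanishes, while along the curve x = t^3, y = t the function values approach 1, not f(0, 0) = 0. The particular direction (h, g) and the values of sigma and t are illustrative assumptions.

```python
# Numerical illustration of Example 13.8: zero Gateaux derivative at the origin,
# yet discontinuity along the curve x = t^3, y = t.

def f(x, y):
    return y * (x ** 2 + y ** 2) / x if x != 0 else 0.0

# difference quotient along a fixed direction (h, g) tends to 0 as sigma -> 0
h, g = 1.0, 2.0
for sigma in (1e-2, 1e-4, 1e-6):
    q = (f(sigma * h, sigma * g) - f(0.0, 0.0)) / sigma
    # q = sigma * (h^2 + g^2) * g / h, so it vanishes with sigma

# but along x = t^3, y = t the values approach 1, not f(0, 0) = 0
t = 1e-3
value = f(t ** 3, t)   # equals t^4 + 1, close to 1
```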

Example 13.9. Absolute value. Consider the function f(x) = |x|. We have
\frac{f(\sigma h) - f(0)}{\sigma} = \frac{|\sigma h|}{\sigma} = \frac{|\sigma|}{\sigma} |h|.
Then we obtain
\lim_{\sigma \to 0+} \frac{|\sigma h|}{\sigma} = |h|, \quad \lim_{\sigma \to 0-} \frac{|\sigma h|}{\sigma} = -|h|.
The result depends on the method of passing to the limit, and it is not linear with respect to h. So the absolute value function is not Gateaux differentiable at zero.
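The two one-sided limits of Example 13.9 are visible directly in a computation; the direction h and the small values of sigma are illustrative assumptions.

```python
# Numerical illustration of Example 13.9: the difference quotient of |x| at 0
# has different one-sided limits, so no Gateaux derivative exists there.

def quotient(sigma, h):
    return (abs(0.0 + sigma * h) - abs(0.0)) / sigma

h = 3.0
right = quotient(1e-9, h)     # sigma -> 0+ gives |h|
left = quotient(-1e-9, h)     # sigma -> 0- gives -|h|
# right is close to 3.0 and left is close to -3.0: the limit does not exist
```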

13.6. Gradient methods for functionals
Consider the problem of minimization of a Gateaux differentiable functional I on a unitary space V. We can find an approximate solution of this problem using the gradient method
v^{k+1} = v^k - \alpha_k I'(v^k),   (13.7)
where v^k is the k-th iterate of the unknown element, \alpha_k is a positive number (a parameter of the algorithm), and I'(v^k) is the Gateaux derivative of the functional I at the point v^k.
If I is a function of many variables and V is Euclidean space, then the formula (13.7) is equivalent to the algorithm (13.1), (13.2).
Consider the problem of minimization of the Lagrange functional on the space of smooth enough functions on the interval [x_1, x_2]. Using the formula for its Gateaux derivative, we get the gradient method
v^{k+1}(x) = v^k(x) - \alpha_k \left\{ \frac{\partial}{\partial v} F[x, v^k(x), v^{k\prime}(x)] - \frac{d}{dx} \frac{\partial}{\partial v'} F[x, v^k(x), v^{k\prime}(x)] \right\}, \quad x \in [x_1, x_2].
Now consider the problem of minimization of the Dirichlet integral on the space of smooth enough functions on \Omega. Using the formula for its Gateaux derivative, we obtain the following form of the gradient method (13.7):
v^{k+1}(x) = v^k(x) + 2\alpha_k [\Delta v^k(x) + f(x)].
We will also solve the problem of minimization of a differentiable functional I on a subset U of a unitary space V. Then we use the projection gradient method
v^{k+1} = P[v^k - \alpha_k I'(v^k)],   (13.8)
where P is the projection onto the set U.
Example 13.10. Consider the problem of minimization of the functional
I(v) = \int_\Omega [v(x) - y(x)]^2 \, dx
on the set
U = \{ v \in S(\Omega) \mid a \le v(x) \le b \}.
Determine the derivative of the functional. We have
I(v + \sigma h) - I(v) = \int_\Omega \left[ (v - y + \sigma h)^2 - (v - y)^2 \right] dx = \int_\Omega \left[ 2\sigma (v - y) h + \sigma^2 h^2 \right] dx.
After division by \sigma and passing to the limit we get
\langle I'(v), h \rangle = \int_\Omega 2(v - y) h \, dx.
Then we find the derivative I'(v) = 2(v - y). So the projection gradient method (13.8) is determined by the equality
v^{k+1}(x) = \begin{cases} a, & \text{if } v^k(x) - 2\alpha_k [v^k(x) - y(x)] < a, \\ v^k(x) - 2\alpha_k [v^k(x) - y(x)], & \text{if } a \le v^k(x) - 2\alpha_k [v^k(x) - y(x)] \le b, \\ b, & \text{if } v^k(x) - 2\alpha_k [v^k(x) - y(x)] > b. \end{cases}
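After discretization on a grid, Example 13.10 becomes a finite-dimensional projected gradient iteration whose limit is simply y clipped to [a, b]. The grid, the target function y, and the algorithm parameters below are illustrative assumptions.

```python
# Discretized sketch of Example 13.10: minimize the sum of (v_i - y_i)^2 over
# v with a <= v_i <= b by the projection gradient method (13.8).

def project(v, a, b):
    # pointwise projection onto U = {v : a <= v(x) <= b}
    return [min(max(vi, a), b) for vi in v]

def projection_gradient(y, a, b, alpha=0.25, iterations=100):
    v = [0.0] * len(y)
    for _ in range(iterations):
        grad = [2.0 * (vi - yi) for vi, yi in zip(v, y)]            # I'(v) = 2(v - y)
        v = project([vi - alpha * gi for vi, gi in zip(v, grad)], a, b)
    return v

y = [-1.0, 0.3, 0.8, 2.5]                 # target values on a 4-point grid
v_min = projection_gradient(y, 0.0, 1.0)
# v_min is close to y clipped to [a, b], i.e. [0.0, 0.3, 0.8, 1.0]
```

Components of y inside [a, b] are reproduced exactly in the limit, while components outside are replaced by the nearest bound, matching the piecewise formula above.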
Task. Minimize the functional I = I(v) on the given set U.
Values of the parameters for variants 1-9 (the table is only partially recoverable): the functionals I(v) include v_1^6 + v_2^2, v_1^4 + 2v_2^2, v_1^2 + v_2^2 + v_3^2, \int_0^1 v^3(t) \, dt, \int_0^1 v^4(t) \, dt, and \int_\Omega [v(x) - \sin x] \, dx; the constraint sets U include U = \{v \mid 0 \le v(t) \le 1, t \in [0, 1]\}, U = \{v \mid 0 \le v(x) \le 1, x \in \Omega\}, U = \{v \mid v(t) \le 1, t \in [0, 1]\}, U = \{v \mid v(x) \le 1, x \in \Omega\}, v_i \ge a_i and v_i \le b_i for i = 1, 2, v_1 \ge a, v_1 \le b with v_2 \ge a, and a \le v_i \le b for i = 1, 2, 3.
Steps of the task:
1. Find the Gateaux derivative of the functional.
2. Determine the gradient method for the problem of minimization of the given functional without any constraints.
3. Determine the gradient method for the problem of minimization of the given functional with the given constraint.
1. Akulich, M.L. Mathematical Programming in Examples and Problems. Moscow: Vysshaya Shkola, 1986, pp. 269-282.
2. Kuznetsov, Yu.N., Kuzubov, V.I., Voloshchenko, A.B. Mathematical Programming. Moscow: Vysshaya Shkola, 1976, pp. 249-270.