Unconstrained Univariate
Optimization
Univariate optimization means optimization of a scalar
function of a single variable:
y = P(x)
These optimization methods are important for a variety
of reasons:
1) there are many instances in engineering when
we want to find the optimum of functions such as these
(e.g. optimum reactor temperature, etc.),
2) almost all multivariable optimization methods
in commercial use today contain a line search step in
their algorithm,
3) they are easy to illustrate and many of the
fundamental ideas are directly carried over to
multivariable optimization.
Of course these discussions will be limited to nonlinear
functions, but before discussing optimization methods
we need some mathematical background.
Continuity
In this course we will limit our discussions to continuous
functions.

[Figure: a discontinuous function and a continuous function, each sketched around a point x0]

Functions can be continuous, but their derivatives may
not be.

[Figure: a function with discontinuous derivatives and a function with continuous derivatives, each sketched around a point x0]

A function P(x) is continuous at a point x0 iff:

    P(x0) exists

and:

    lim{x→x0+} P(x) = lim{x→x0−} P(x) = P(x0)
Some discontinuous functions include:
• price of processing equipment,
• manpower costs,
• measurements,
• ...
Many of the functions we deal with in engineering are
either continuous or can be approximated as continuous
functions. However, there is a growing interest in solving
optimization problems with discontinuities.
Convexity of Functions
In most optimization problems we require functions
which are either convex or concave over some region
(preferably globally):

[Figure: a concave P(x) and a convex P(x), each sketched over an interval [xa, xb]]
This can be expressed mathematically as:
i) for a convex function and ∀ λ ∈ [0,1]:

    P(λxa + [1 − λ]xb) ≤ λP(xa) + [1 − λ]P(xb)

ii) for a concave function and ∀ λ ∈ [0,1]:

    P(λxa + [1 − λ]xb) ≥ λP(xa) + [1 − λ]P(xb)
Convexity is a general idea used in optimization theory
which, when applied to a function, describes the property
of having an optimum. Convexity combines both
stationarity and curvature into a single concept. Often
both convex and concave functions are described as
exhibiting the property of convexity.
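The defining inequality can be tested by brute force at sampled points. A minimal Python sketch (the function name, sampling grid, and tolerance are my own illustrative choices, not from the notes):

```python
def is_convex_on(P, a, b, n=50):
    """Numerically test P(l*xa + (1-l)*xb) <= l*P(xa) + (1-l)*P(xb)
    for sampled point pairs in [a, b] and sampled l in [0, 1]."""
    xs = [a + (b - a) * i / n for i in range(n + 1)]
    lams = [j / 10 for j in range(11)]
    for xa in xs:
        for xb in xs:
            for lam in lams:
                lhs = P(lam * xa + (1 - lam) * xb)
                rhs = lam * P(xa) + (1 - lam) * P(xb)
                if lhs > rhs + 1e-12:       # inequality violated
                    return False
    return True

print(is_convex_on(lambda x: x**2, -2.0, 2.0))   # True
print(is_convex_on(lambda x: x**3, -2.0, 2.0))   # False
```

The sheer number of point pairs this has to check is exactly why the definition is inconvenient as a practical test.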
These conditions are very inconvenient for testing a
specific function, so we tend to test the derivatives of the
function at selected points. First derivatives give us a
measure of the rate of change of the function. Second
derivatives give us a measure of curvature, or the rate of
change of the first derivatives.
Notice for a convex function:

[Figure: a convex P(x) with dP/dx < 0 to the left of the minimum, dP/dx = 0 at the minimum, and dP/dx > 0 to the right]
The first derivatives increase (become more positive) as
x increases. Thus, in the region we are discussing, the
second derivative is positive. Since the first derivative
starts out negative and becomes positive in the region of
interest, there is some point where the function no longer
decreases in value as x increases and this point is the
function’s minimum. This is another way of stating that
the point x* is a (local) minimum of the function P(x) iff:
    P(x*) ≤ P(x)   ∀ x ∈ [xa, xb]
If this can be stated as a strict inequality x* is said to be
a unique (local) minimum.
A similar line of argument can be followed for concave
functions, with the exception that: the first derivatives
decrease (become more negative) with increasing x, the
second derivative is negative and as a result, the
optimum is a maximum.
The problem with these ideas is that we can only
calculate the derivatives at a point. So looking for an
optimum using these fundamental ideas would be
infeasible since a very large number of points
(theoretically all of the points) in the region of interest
must be checked.
Necessary and Sufficient Conditions for an Optimum
Recall that for a twice continuously differentiable function
P(x), the point x* is an optimum iff:
stationarity:

    dP/dx |x* = 0

and curvature:

    d²P/dx² |x* > 0   →  a minimum
    d²P/dx² |x* < 0   →  a maximum
What happens when the second derivative is zero?
Consider the function:

    y = x⁴

which we know to have a minimum at x = 0.

[Figure: the curve y = x⁴ with its minimum at the origin]

At the point x = 0:

    dy/dx   |x=0 = 4x³  |x=0 = 0    ← stationarity
    d²y/dx² |x=0 = 12x² |x=0 = 0
    d³y/dx³ |x=0 = 24x  |x=0 = 0
    d⁴y/dx⁴ |x=0 = 24        > 0    ← a minimum?
We need to add something to our optimality conditions. If
at the stationary point x* the second derivative is zero,
we must check the higher-order derivatives. If the first
non-zero derivative is odd, that is:

    dⁿP/dxⁿ |x* ≠ 0   where n ∈ {3, 5, 7, …}

then x* is an inflection point. Otherwise, if the first
higher-order, non-zero derivative is even:

    dⁿP/dxⁿ |x* > 0   where n ∈ {4, 6, 8, …}   →  a minimum
    dⁿP/dxⁿ |x* < 0   where n ∈ {4, 6, 8, …}   →  a maximum
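For polynomial objectives these higher-order conditions can be checked mechanically. A sketch (the helper names are my own), differentiating a coefficient list term by term until the first non-zero derivative at x* is found:

```python
def poly_deriv(coeffs):
    """Differentiate a polynomial given as coeffs[i] = coefficient of x**i."""
    return [i * c for i, c in enumerate(coeffs)][1:]

def poly_eval(coeffs, x):
    return sum(c * x**i for i, c in enumerate(coeffs))

def classify_stationary_point(coeffs, x_star):
    """Classify a stationary point x_star of a polynomial as 'minimum',
    'maximum' or 'inflection' using the first non-zero higher-order
    derivative (n odd -> inflection; n even -> min/max by its sign)."""
    d = poly_deriv(coeffs)          # first derivative (zero at x_star)
    n, val = 1, 0.0
    while d:
        d = poly_deriv(d)
        n += 1
        val = poly_eval(d, x_star)
        if abs(val) > 1e-12:        # first non-zero derivative found
            break
    if n % 2 == 1:
        return 'inflection'
    return 'minimum' if val > 0 else 'maximum'

print(classify_stationary_point([0, 0, 0, 0, 1], 0.0))  # y = x^4 -> minimum
```

For y = x⁴ at x = 0 it walks through the zero second and third derivatives and stops at n = 4 with value 24 > 0, confirming the minimum.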
Some Examples:

1)
    min_x  ax² + bx + c

stationarity:

    dP/dx = 2ax + b = 0   ⇒   x* = −b / 2a

curvature:

    d²P/dx² = 2a   ⇒   a > 0 → minimum;  a < 0 → maximum
2)
    min_x  x³ − 2x² − 5x + 6

stationarity:

    dP/dx = 3x² − 4x − 5 = 0   ⇒   x* = (2 ± √19) / 3

curvature:

    d²P/dx² = 6x − 4

    d²P/dx² |x* = 2(2 ± √19) − 4 = ±2√19

    +2√19 > 0  →  minimum at x* = (2 + √19) / 3
    −2√19 < 0  →  maximum at x* = (2 − √19) / 3

[Figure: the curve P(x) = x³ − 2x² − 5x + 6, showing the local maximum and local minimum]
3)
    min_x  (x − 1)ⁿ
4) A more realistic example. Consider the situation where
we have two identical settling ponds in a waste water
treatment plant:

[Figure: two identical settling ponds in series, each of volume V and density ρ; flow F with inlet concentration ci enters the first pond, concentration c1 passes from the first pond to the second, and co leaves the second]

These are normally operated at a constant throughput
flow of F, with an incoming concentration of
contaminants ci. The system is designed so that at
nominal operating conditions the outgoing concentration
co is well below the required levels. However, we know
that twice a day (~7 am and 7 pm) the incoming
concentration of contaminants to this section of the plant
increases dramatically above these nominal levels. What
we would like to determine is whether the peak outgoing
levels will remain within acceptable limits.
What assumptions can we make?
1) well mixed,
2) constant density,
3) constant flowrate.
contaminant balance:

first pond:
    d(ρVc1)/dt = F ci − F c1

second pond:
    d(ρVco)/dt = F c1 − F co

Let the residence time be τ = ρV / F. Then the
balances become:

first pond:
    τ dc1/dt = ci − c1

second pond:
    τ dco/dt = c1 − co

Since we wish to know how changes in ci will affect the
outlet concentration co, define perturbation variables
around the nominal operating point and take Laplace
transforms:

first pond:
    τs C1(s) = Ci(s) − C1(s)

second pond:
    τs Co(s) = C1(s) − Co(s)

Combining and simplifying into a single transfer function:

    Co(s) = Ci(s) / (τs + 1)²
If we represent the concentration disturbances as ideal
impulses:

    Ci(s) = k

then the transfer function of the outlet concentration
becomes:

    Co(s) = k / (τs + 1)²

In the time-domain this is:

    co(t) = k t e^(−t/τ)

[Figure: the impulse response co(t), rising to a peak co* at time t* and decaying thereafter]

Since we want to determine the peak, we solve:

    max_t  co(t) = k t e^(−t/τ)

stationarity:

    dco/dt = k (1 − t/τ) e^(−t/τ) = 0   ⇒   t* = τ

curvature:

    d²co/dt² |t=τ = (k/τ)(t/τ − 2) e^(−t/τ) |t=τ
                  = −k e⁻¹ / τ  <  0   (k > 0)

    ⇒  a maximum
Finally, the maximum outlet concentration is:

    co(τ) = k τ e⁻¹ ≈ 0.368 k ρV / F
This example serves to highlight some difficulties with
the analytical approach. These include:
→ determining when the first derivative goes to zero.
This required the solution of a nonlinear equation
which is often as difficult as the solution of the
original optimization problem.
→ computation of the appropriate higher-order
derivatives.
→ evaluation of the higher-order derivatives at the
appropriate points.
We will need more efficient methods for optimization of
complex nonlinear functions. Existing numerical methods
can be classified into two broad types:
1) those that rely solely on the evaluation of the
objective function,
2) those that use derivatives (or approximations of
the derivatives) of the objective function.
Interval Elimination Methods
These methods work by systematically reducing the size
of the interval within which the optimum is believed to
exist. Since we are only ever working with an interval,
we can never find the optimum exactly. There are many
interval elimination algorithms; however, they all consist
of the same steps:
1) determine an interval in which the optimum is
believed to exist,
2) reduce the size of the interval,
3) check for convergence,
4) repeat steps 2 & 3 until converged.
We will examine only one interval elimination method,
called the “Golden Section Search”.
Scanning & Bracketing
As an example, consider that our objective is to minimize
a function P(x). In order to start the optimization
procedure we need to determine the upper and lower
bounds for an interval in which we believe an optimum
will exist. These could be established by:
• physical limits,
• scanning:
   - choosing a starting point,
   - checking that the function initially decreases,
   - finding another point of higher value.
Golden Section Search
Consider that we want to break up the interval, in which
we believe the optimum exists, into two regions:

[Figure: the interval from a to b, of total length L, divided into segments l1 and l2]

where:

    l1 / l2 = l2 / L = r

Since L = l1 + l2, l2 = rL and l1 = r²L, then:

    L = rL + r²L

or:

    r² + r − 1 = 0

There are two solutions for this quadratic equation. We
are only interested in the positive root:

    r = (−1 + √5) / 2 ≈ 0.61803398…

which has the interesting property:

    r = 1 / (r + 1)

This is the Golden ratio.
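These two relations are easy to verify numerically (a throwaway check):

```python
import math

r = (math.sqrt(5.0) - 1.0) / 2.0          # positive root of r^2 + r - 1 = 0
print(round(r, 8))                        # 0.61803399
print(abs(r - 1.0 / (r + 1.0)) < 1e-12)   # True
```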
We will use this Golden ratio to break up our interval as
follows:

[Figure: the interval from ak to bk, of length L, with interior points x1k and x2k each placed a distance rL from an end]

By breaking up the interval in this way we will be able to
limit ourselves to one simple calculation of an xi and one
objective function evaluation per iteration.
The optimization algorithm is:
1) using the interval limits ak and bk, determine
x1k and x2k,
2) evaluate P(x1k ) and P(x2k ),
3) eliminate the part of the interval in which the
optimum is not located,
4) repeat steps 1 through 3 until the desired
accuracy is obtained.
How can we decide which part of the interval to
eliminate?
There are only three possible situations. For the
minimization case, consider:

i) P(x1k) > P(x2k):
eliminate the interval to the left of x1k. For a convex
function, all values to the right of x1k are less than
those on the left.

ii) P(x1k) < P(x2k):
eliminate the interval to the right of x2k. For a convex
function, all values to the left of x2k are less than
those on the right.

iii) P(x1k) = P(x2k):
we know that the optimum is between x1k and x2k, so we
could eliminate the intervals to the left of x1k and to the
right of x2k. In practice we eliminate only one region for
computational ease.
Graphically the algorithm is:

[Figure: the kth-iteration interval [ak, bk] with interior points x1k and x2k; the region to the left of x1k is eliminated, leaving the (k+1)th interval [ak+1, bk+1] with new points x1k+1 and x2k+1]

Three things worth noting:
1) only one new point needs to be evaluated
during each iteration,
2) the length of the remaining interval after k
iterations (or the accuracy to which you know the
optimum value of x) is:

    Lk = r^k L0 ≈ (0.618034)^k L0

3) the new point for the current iteration is:

    x1k = ak + bk − x2k
or:
    x2k = ak + bk − x1k
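The complete search fits in a few lines. A Python sketch (the function name and tolerance are my own choices); it reuses the new-point formula so only one new function evaluation is needed per iteration:

```python
import math

def golden_section_search(P, a, b, tol=1e-5):
    """Minimize P on [a, b] by Golden Section Search.
    Returns the midpoint of the final interval."""
    r = (math.sqrt(5.0) - 1.0) / 2.0   # golden ratio, ~0.618034
    x1 = b - r * (b - a)               # interior points, each r*L from an end
    x2 = a + r * (b - a)
    P1, P2 = P(x1), P(x2)
    while (b - a) > tol:
        if P1 > P2:                    # optimum lies right of x1: drop [a, x1]
            a, x1, P1 = x1, x2, P2
            x2 = a + b - x1            # one new point per iteration
            P2 = P(x2)
        else:                          # optimum lies left of x2: drop [x2, b]
            b, x2, P2 = x2, x1, P1
            x1 = a + b - x2
            P1 = P(x1)
    return 0.5 * (a + b)

x_star = golden_section_search(lambda x: x**2 - 2*x + 1, 0.0, 2.0)
print(round(x_star, 4))  # ~1.0
```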
Golden-Section Example
Find the solution to within 5% for:
    min_x  x² − 2x + 1

Guess an initial interval for the optimum as 0 ≤ x* ≤ 2.
 k    ak       bk       x1k      x2k      P(x1k)       P(x2k)
 0   0.0000   2.0000   0.7640   1.2360   5.5696E-02   5.5696E-02
 1   0.7640   2.0000   1.2360   1.5280   5.5696E-02   2.7878E-01
 2   0.7640   1.5280   1.0558   1.2360   3.1158E-03   5.5696E-02
 3   0.7640   1.2360   0.9443   1.0558   3.1025E-03   3.1158E-03
 4   0.7640   1.0558   0.8755   0.9443   1.5507E-02   3.1025E-03
 5   0.8755   1.0558   0.9443   0.9869   3.1025E-03   1.7130E-04
 6   0.9443   1.0558   0.9869   1.0132   1.7130E-04   1.7440E-04
Note that the number of calculations we have performed
is:
• (k+2) = 8 evaluations of P(xi),
• (k+2) = 8 determinations of a new point xi.
There are a variety of other interval elimination methods
including Interval Bisection, Fibonacci Searches, and so
forth. They are all simple to implement and not
computationally intensive. They tend to be fairly robust
and somewhat slow to converge.
Polynomial Approximation Methods
There are a number of polynomial approximation
methods, which differ only in the choice of polynomial
used to locally approximate the function to be optimized.
Polynomials are attractive since they are easily fit to data
and the optima of low-order polynomials are simple to
calculate. We will only discuss Successive Quadratic
Approximation as it illustrates the basic ideas and will
carry over into multivariable optimization techniques.
Successive Quadratic Approximation
Suppose we have evaluated some test points of a
function as follows:

[Figure: three test points x1, x2, x3 on a curve P(x)]

We could approximate the objective function by fitting a
quadratic, P(x) ≈ ax² + bx + c, through the test points.
The main advantage of using a quadratic approximation
is that we know its optimum is:

    x̂* = −b / 2a

The first step is to calculate the approximation constants
(a, b, c) in the quadratic from the available points (x1, x2,
x3). Using these three evaluation points we can write:
    P(x1) = ax1² + bx1 + c
    P(x2) = ax2² + bx2 + c      (linear in (a, b, c))
    P(x3) = ax3² + bx3 + c

We have three equations and three unknowns, so in
general the constants (a, b, c) can be uniquely
determined. In matrix form the set of equations is
written:

    [ x1²  x1  1 ] [ a ]   [ P(x1) ]
    [ x2²  x2  1 ] [ b ] = [ P(x2) ]
    [ x3²  x3  1 ] [ c ]   [ P(x3) ]

which has the solution:

    [ a ]   [ x1²  x1  1 ]⁻¹ [ P(x1) ]
    [ b ] = [ x2²  x2  1 ]   [ P(x2) ]
    [ c ]   [ x3²  x3  1 ]   [ P(x3) ]
Some algebra yields the solution for the minimum as:

    x̂* = ½ [ (x2² − x3²)P(x1) + (x3² − x1²)P(x2) + (x1² − x2²)P(x3) ]
           / [ (x2 − x3)P(x1) + (x3 − x1)P(x2) + (x1 − x2)P(x3) ]

Graphically, the situation is:

[Figure: the quadratic ax² + bx + c fitted through x1, x2, x3, with its minimum at the predicted optimum x̂*]
To perform the next iteration, we need to choose which
three points to use next. The textbook gives some
elaborate rules for coding into a computer. If you're
doing hand calculations, choose the point (xi) with the
smallest value of the objective function P(x) and the
points immediately on either side. Note that x̂* will not
necessarily have the smallest objective function value.
Finally, the accuracy to which the optimum is known at
any iteration is given by the two points which bracket the
point with the smallest objective function value.
The procedure is:
1) choose three points that bracket the optimum,
2) determine a quadratic approximation for the
objective function based on these three points,
3) calculate the optimum of the quadratic
approximation (the predicted optimum x̂*),
4) repeat steps 1 to 3 until the desired accuracy
is reached.
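The procedure above can be sketched in Python. The helper names and the simple point-selection rule are my own; a production code would use the textbook's more careful rules:

```python
import math

def quadratic_min_estimate(x1, x2, x3, P1, P2, P3):
    """Minimum of the quadratic fitted through three (x, P) points."""
    num = (x2**2 - x3**2) * P1 + (x3**2 - x1**2) * P2 + (x1**2 - x2**2) * P3
    den = (x2 - x3) * P1 + (x3 - x1) * P2 + (x1 - x2) * P3
    return 0.5 * num / den

def successive_quadratic(P, x1, x2, x3, tol=1e-4, max_iter=100):
    """Successive quadratic approximation for a bracketed minimum."""
    for _ in range(max_iter):
        x_hat = quadratic_min_estimate(x1, x2, x3, P(x1), P(x2), P(x3))
        if abs(x_hat - x2) < tol:
            return x_hat
        # keep the point with the smallest P and its two neighbours
        pts = sorted([x1, x2, x3, x_hat])
        i = min(range(4), key=lambda j: P(pts[j]))
        i = max(1, min(i, 2))          # clamp so both neighbours exist
        x1, x2, x3 = pts[i - 1], pts[i], pts[i + 1]
    return x2

# wastewater example with tau = 1: min_t -t*exp(-t)
t_star = successive_quadratic(lambda t: -t * math.exp(-t), 0.5, 1.5, 2.5)
print(round(t_star, 2))  # ~1.0
```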
Example: revisit the wastewater treatment problem
with τ = 1:

    max_t  t e^(−t)   ⇔   min_t  −t e^(−t)

 iteration   t1    t2       t3       P(t1)     P(t2)     P(t3)     t̂*       P(t̂*)
 0          0.5   1.5      2.5     -0.3033   -0.3347   -0.2052   1.1953   -0.3617
 1          0.5   1.1953   1.5     -0.3033   -0.3617   -0.3347   1.0910   -0.3664
 2          0.5   1.0910   1.1953  -0.3033   -0.3664   -0.3617   1.0396   -0.3676
 3          0.5   1.0396   1.0910  -0.3033   -0.3676   -0.3664   1.0185   -0.3678
 4          0.5   1.0185   1.0396  -0.3033   -0.3678   -0.3676   1.0083   -0.3679
Newton’s Method
This is a derivative-based method that uses first and
second derivatives to approximate the objective function.
The optimum of the objective function is then estimated
using this approximation.
Consider the 2nd-order Taylor Series approximation of
our objective function:

    P(x) ≈ P(xk) + dP/dx|xk (x − xk) + ½ d²P/dx²|xk (x − xk)²

a quadratic approximation of P(x) at the point xk.
Then we could approximate the first derivative of the
objective function as:

    dP/dx ≈ dP/dx|xk + d²P/dx²|xk (x − xk)

Since we require the first derivative of the objective
function to be zero at the optimum:

    dP/dx ≈ dP/dx|xk + d²P/dx²|xk (x − xk) = 0
A little algebra will allow us to estimate the optimum
value of x:

    x̂* = xk+1 = xk − [ dP/dx|xk / d²P/dx²|xk ]

This is called a Newton step and we can iterate on this
equation to find the optimum value of x.
Graphically, this is:

[Figure: dP/dx plotted against x; the tangent to dP/dx at xk crosses zero at xk+1, which approaches the stationary point x*]
Newton’s method attempts to find the stationary point of
a function by successive approximation using the first
and second derivatives of the original objective function.
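As a sketch in Python (the function name and tolerances are my own choices), supplying the two derivatives analytically:

```python
import math

def newton_univariate(dP, d2P, x0, tol=1e-8, max_iter=50):
    """Find a stationary point of P by iterating Newton steps."""
    x = x0
    for _ in range(max_iter):
        step = dP(x) / d2P(x)          # Newton step: P'(x) / P''(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# wastewater example, tau = 1: P(t) = -t*exp(-t)
dP  = lambda t: (t - 1.0) * math.exp(-t)   # P'(t)
d2P = lambda t: (2.0 - t) * math.exp(-t)   # P''(t)
print(round(newton_univariate(dP, d2P, 0.0), 4))  # ~1.0
```

Started from x0 = 0 this takes the same early steps (0.5, 0.8333, …) as the hand calculation that follows.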
Example: revisit the wastewater treatment problem
with τ = 1:

    max_t  t e^(−t)   ⇔   min_t  −t e^(−t)

 iteration   xk       P′(xk)    P″(xk)    P(xk)
 0          0.0000   -1.0000    2.0000    0.0000
 1          0.5000   -0.3033    0.9098   -0.3033
 2          0.8333   -0.0724    0.5070   -0.3622
 3          0.9762   -0.0090    0.3857   -0.3678
 4          0.9994   -0.0002    0.3683   -0.3679
 5          1.0000    0.0000    0.3679   -0.3679
In summary, Newton’s Method:
1) shows relatively fast convergence (finds the
optimum of a quadratic in a single step),
2) requires first and second derivatives of the
function,
3) has a computational load of first and second
derivative evaluations, and a Newton step
calculation per iteration,
4) may oscillate or fail to converge when there are
many local optima or stationary points.
Quasi-Newton Methods
A major drawback of Newton’s method is that it requires
us to have analytically determined both the first and
second derivatives of our objective function. Often this is
considered onerous, particularly in the case of the
second derivative. The large family of optimization
algorithms that use finite difference approximations of
the derivatives are called “Quasi-Newton” methods.
There are a variety of ways first derivatives can be
approximated using finite differences:

    P′(x) ≈ [P(x + Δx) − P(x − Δx)] / 2Δx     central
    P′(x) ≈ [P(x + Δx) − P(x)] / Δx           forward
    P′(x) ≈ [P(x) − P(x − Δx)] / Δx           backward

Approximations for the second derivatives can be
determined in terms of either the first derivatives or the
original objective function:

    P″(x) ≈ [P′(x + Δx) − P′(x − Δx)] / 2Δx
    P″(x) ≈ [P′(x + Δx) − P′(x)] / Δx
    P″(x) ≈ [P′(x) − P′(x − Δx)] / Δx
    P″(x) ≈ [P(x + Δx) − 2P(x) + P(x − Δx)] / Δx²
    P″(x) ≈ [P(x + 2Δx) − 2P(x + Δx) + P(x)] / Δx²
    P″(x) ≈ [P(x) − 2P(x − Δx) + P(x − 2Δx)] / Δx²

The major differences within the Quasi-Newton family of
algorithms arise from: the number of function and/or
derivative evaluations required, the speed of
convergence, and the stability of the algorithm.
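Substituting central finite differences for both derivatives turns the Newton iteration into a derivative-free sketch (the step size Δx = 1e-5 and the tolerances are illustrative choices of mine):

```python
import math

def quasi_newton(P, x0, dx=1e-5, tol=1e-8, max_iter=100):
    """Newton iteration with central finite-difference derivatives of P."""
    x = x0
    for _ in range(max_iter):
        d1 = (P(x + dx) - P(x - dx)) / (2 * dx)            # central P'
        d2 = (P(x + dx) - 2 * P(x) + P(x - dx)) / dx**2    # central P''
        step = d1 / d2
        x -= step
        if abs(step) < tol:
            break
    return x

# wastewater example, tau = 1: min_t -t*exp(-t)
print(round(quasi_newton(lambda t: -t * math.exp(-t), 0.0), 4))  # ~1.0
```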
Regula Falsi
This method approximates the second derivative as:

    d²P/dx² ≈ [P′(xq) − P′(xp)] / (xq − xp)

where xp and xq are chosen so that P′(xp) and P′(xq)
have different signs (i.e. the two points bracket the
stationary point).
Substituting this approximation into the formula for a
Newton step yields:

    x̂k* = xq − (xq − xp) [ dP/dx|xq / ( dP/dx|xq − dP/dx|xp ) ]

The Regula Falsi method iterates on this formula, taking
care on each iteration to retain two points (x̂k* and
either xp or xq), depending upon the sign of the
derivative at x̂k*:

    sign[P′(x̂k*)] = sign[P′(xp)]   →   keep x̂k* and xq
    sign[P′(x̂k*)] = sign[P′(xq)]   →   keep x̂k* and xp
Graphically:

[Figure: dP/dx plotted against x; the secant through (xp, P′(xp)) and (xq, P′(xq)) crosses zero at x̂k*, which replaces one endpoint on the next iteration]
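A Python sketch of the iteration (the function name and tolerance are mine); the retention rule above becomes a sign check:

```python
import math

def regula_falsi(dP, x_p, x_q, tol=1e-4, max_iter=100):
    """Regula Falsi on P'(x) = 0; x_p, x_q must bracket the stationary point."""
    assert dP(x_p) * dP(x_q) < 0, "endpoints must give P' of opposite signs"
    x_hat = x_q
    for _ in range(max_iter):
        g_p, g_q = dP(x_p), dP(x_q)
        x_hat = x_q - (x_q - x_p) * g_q / (g_q - g_p)   # secant step
        if abs(dP(x_hat)) < tol:
            break
        if dP(x_hat) * g_p > 0:    # same sign as P'(x_p): keep x_hat and x_q
            x_p = x_hat
        else:                      # same sign as P'(x_q): keep x_hat and x_p
            x_q = x_hat
    return x_hat

# wastewater example, tau = 1: P(t) = -t*exp(-t), P'(t) = (t - 1)*exp(-t)
dP = lambda t: (t - 1.0) * math.exp(-t)
print(round(regula_falsi(dP, 0.0, 2.0), 2))  # ~1.0
```

Started from [0, 2] on the τ = 1 example it produces the same early iterates (1.7616, 1.5578, …) as the hand calculation that follows.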
Example: revisit the wastewater treatment problem
with τ = 1:

    max_t  t e^(−t)   ⇔   min_t  −t e^(−t)

 iteration   xp       xq       x̂k*      P′(xp)   P′(xq)   P′(x̂k*)
 0          0.0000   2.0000   1.7616    -1       0.1353   0.1308
 1          0.0000   1.7616   1.5578    -1       0.1308   0.1175
 2          0.0000   1.5578   1.3940    -1       0.1175   0.0978
 3          0.0000   1.3940   1.2699    -1       0.0978   0.0758
 4          0.0000   1.2699   1.1804    -1       0.0758   0.0554
 5          0.0000   1.1804   1.1184    -1       0.0554   0.0387
 6          0.0000   1.1184   1.0768    -1       0.0387   0.0262
note: the computational load is one derivative
evaluation, a sign check and a step
calculation per iteration.

In summary, Quasi-Newton methods:
1) show slower convergence rates than Newton's
Method,
2) use finite difference approximations for
derivatives of the objective function,
3) may oscillate or fail to converge when there are
many local optima or stationary points.
Univariate Summary
There are only two basic ideas in unconstrained
univariate optimization:
1) direct search (scanning, bracketing, interval
elimination, approximation),
2) derivative-based methods (optimality conditions,
Newton and Quasi-Newton methods).
The direct search methods work directly on the objective
function to successively determine better values of x.
The derivative based methods try to find the stationary
point of the objective function, usually by some iterative
procedure.
Choosing which method to use is always a trade-off
between:
1) ease of implementation,
2) convergence speed,
3) robustness,
4) computational load (number of function
evaluations, derivative evaluations /
approximations, etc.).
There are no clear answers as to which univariate
method is better. The best choice is problem dependent:
• for nearly quadratic functions, the Newton /
Quasi-Newton methods are generally a good
choice,
• for very flat functions, interval elimination
methods can be very effective.
When the univariate search is buried in a multivariate
code, the best choice of univariate search will depend
upon the properties and requirements of the multivariate
search method.
Finally, remember that to solve an unconstrained
univariate optimization problem you must:
1) have a function (and derivatives) you can
evaluate,
2) choose an optimization method appropriate for
the problem,
3) choose a starting point (or interval),
4) select a termination criterion (or accuracy).