Unconstrained Univariate Optimization

Univariate optimization means optimization of a scalar function of a single variable:

    y = P(x)

These optimization methods are important for a variety of reasons: 1) there are many instances in engineering where we want to find the optimum of functions such as these (e.g. an optimum reactor temperature), 2) almost all multivariable optimization methods in commercial use today contain a line-search step in their algorithms, 3) they are easy to illustrate, and many of the fundamental ideas carry over directly to multivariable optimization. These discussions will be limited to nonlinear functions, but before discussing optimization methods we need some mathematical background.

Continuity

In this course we will limit our discussions to continuous functions.

[Figure: a discontinuous function and a continuous function, each shown around a point x0.]

Functions can be continuous while their derivatives are not.

[Figure: a function with discontinuous derivatives and a function with continuous derivatives, each shown around a point x0.]

A function P(x) is continuous at a point x0 iff P(x0) exists and

    lim(x→x0+) P(x) = lim(x→x0−) P(x) = P(x0)

Some discontinuous functions include:
• the price of processing equipment,
• manpower costs,
• measurements,
…

Many of the functions we deal with in engineering are either continuous or can be approximated as continuous functions. However, there is a growing interest in solving optimization problems with discontinuities.

Convexity of Functions

In most optimization problems we require functions which are either convex or concave over some region (preferably globally):

[Figure: a concave function and a convex function P(x), each plotted on an interval [xa, xb].]

This can be expressed mathematically as:

i) for a convex function:

    P(λxa + (1 − λ)xb) ≤ λP(xa) + (1 − λ)P(xb)    ∀ λ ∈ [0, 1]

ii) for a concave function:

    P(λxa + (1 − λ)xb) ≥ λP(xa) + (1 − λ)P(xb)    ∀ λ ∈ [0, 1]

Convexity is a general idea in optimization theory which, when applied to a function, describes the property of having an optimum: it combines both stationarity and curvature into a single concept. Often both convex and concave functions are described as exhibiting the property of convexity.

These conditions are very inconvenient for testing a specific function, so we tend to test the derivatives of the function at selected points. First derivatives give us a measure of the rate of change of the function; second derivatives give us a measure of curvature, the rate of change of the first derivatives.

Notice for a convex function:

[Figure: a convex P(x), with dP/dx < 0 to the left of the minimum, dP/dx = 0 at the minimum, and dP/dx > 0 to the right.]

The first derivatives increase (become more positive) as x increases; thus, in the region we are discussing, the second derivative is positive. Since the first derivative starts out negative and becomes positive in the region of interest, there is some point where the function no longer decreases in value as x increases, and this point is the function's minimum. This is another way of stating that the point x* is a (local) minimum of the function P(x) iff:

    P(x*) ≤ P(x)    ∀ x ∈ [xa, xb]

If this holds as a strict inequality, x* is said to be a unique (local) minimum.

A similar line of argument can be followed for concave functions, with the exceptions that: the first derivatives decrease (become more negative) with increasing x, the second derivative is negative, and as a result the optimum is a maximum.

The problem with these ideas is that we can only calculate the derivatives at a point. Looking for an optimum using these fundamental ideas directly would be infeasible, since a very large number of points (theoretically all of the points) in the region of interest would have to be checked.
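The convexity inequality above can be spot-checked numerically. The sketch below (plain Python; the test functions, the λ grid and the tolerance are arbitrary choices, not from the notes) evaluates the inequality over a grid of λ values between two chosen points:

```python
# Numerically spot-check the convexity inequality
#   P(lam*xa + (1-lam)*xb) <= lam*P(xa) + (1-lam)*P(xb)
# over a grid of lambda values in [0, 1].

def is_convex_on_points(P, xa, xb, n_lambdas=101, tol=1e-12):
    """Return True if the convexity inequality holds at every grid point."""
    for i in range(n_lambdas):
        lam = i / (n_lambdas - 1)
        lhs = P(lam * xa + (1 - lam) * xb)
        rhs = lam * P(xa) + (1 - lam) * P(xb)
        if lhs > rhs + tol:
            return False
    return True

print(is_convex_on_points(lambda x: x**2, -1.0, 3.0))   # convex: True
print(is_convex_on_points(lambda x: -x**2, -1.0, 3.0))  # concave: False
```

Note that passing this check is only evidence of convexity between the two chosen points, not a proof; a failed check, however, does demonstrate non-convexity.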
Necessary and Sufficient Conditions for an Optimum

Recall that for a twice continuously differentiable function P(x), the point x* is an optimum iff:

    dP/dx |x* = 0         (stationarity)

and:

    d²P/dx² |x* > 0       (a minimum)
    d²P/dx² |x* < 0       (a maximum)

What happens when the second derivative is zero? Consider the function

    y = x⁴

which we know has a minimum at x = 0.

[Figure: plot of y = x⁴, with its minimum at x = 0.]

At the point x = 0:

    dy/dx |x=0   = 4x³ |x=0  = 0      (stationarity)
    d²y/dx² |x=0 = 12x² |x=0 = 0      (a minimum?)
    d³y/dx³ |x=0 = 24x |x=0  = 0
    d⁴y/dx⁴ |x=0 = 24 > 0

We need to add something to our optimality conditions: if at the stationary point x* the second derivative is zero, we must check the higher-order derivatives. If the first non-zero derivative is of odd order, that is,

    dⁿP/dxⁿ |x* ≠ 0    where n ∈ {3, 5, 7, …}

then x* is an inflection point. Otherwise, if the first non-zero higher-order derivative is of even order:

    dⁿP/dxⁿ |x* > 0    where n ∈ {4, 6, 8, …}    (a minimum)
    dⁿP/dxⁿ |x* < 0    where n ∈ {4, 6, 8, …}    (a maximum)

Some Examples:

1)  min_x  ax² + bx + c

    stationarity:  dP/dx = 2ax + b = 0    ⇒    x* = −b/(2a)
    curvature:     d²P/dx² = 2a           ⇒    a > 0: minimum;  a < 0: maximum

2)  min_x  x³ − 2x² − 5x + 6

    stationarity:  dP/dx = 3x² − 4x − 5 = 0    ⇒    x* = (2 ± √19)/3
    curvature:     d²P/dx² |x* = 6x* − 4 = ±2√19

        x* = (2 + √19)/3:  +2√19 > 0  ⇒  minimum
        x* = (2 − √19)/3:  −2√19 < 0  ⇒  maximum

[Figure: plot of P(x) = x³ − 2x² − 5x + 6.]

3)  min_x  (x − 1)ⁿ

4) A more realistic example. Consider the situation where we have two identical settling ponds in a wastewater treatment plant:

[Figure: two ponds in series, each of volume V and density ρ; flow F enters the first pond at concentration ci, passes to the second pond at concentration c1, and leaves at concentration co.]

These are normally operated at a constant throughput flow F, with an incoming concentration of contaminants ci. The system is designed so that at nominal operating conditions the outgoing concentration co is well below the required levels. However, we know that twice a day (~7 am and 7 pm) the incoming concentration of contaminants to this section of the plant increases dramatically above these nominal levels. What we would like to determine is whether the peak outgoing levels will remain within acceptable limits.

What assumptions can we make? 1) well mixed, 2) constant density, 3) constant flowrate.

Contaminant balances:

    ρV dco/dt = F c1 − F co    (second pond)
    ρV dc1/dt = F ci − F c1    (first pond)

Let the residence time be τ = ρV/F. Then the balances become:

    τ dco/dt = c1 − co    (second pond)
    τ dc1/dt = ci − c1    (first pond)

Since we wish to know how changes in ci will affect the outlet concentration co, define perturbation variables around the nominal operating point and take Laplace transforms:

    τs Co(s) = C1(s) − Co(s)    (second pond)
    τs C1(s) = Ci(s) − C1(s)    (first pond)

Combining and simplifying into a single transfer function:

    Co(s) = Ci(s) / (τs + 1)²

If we represent the concentration disturbance as an ideal impulse, Ci(s) = k, the transfer function of the outlet concentration becomes:

    Co(s) = k / (τs + 1)²

In the time domain this is:

    co(t) = (k/τ²) t e^(−t/τ)

[Figure: co(t) rising to a peak co* at time t*, then decaying back toward zero.]

Since we want the peak, we must determine:

    max_t  co(t) = (k/τ²) t e^(−t/τ)

stationarity:

    dco/dt = (k/τ²) (1 − t/τ) e^(−t/τ) = 0    ⇒    t* = τ

curvature:

    d²co/dt² |t=τ = (k/τ³) (t/τ − 2) e^(−t/τ) |t=τ = −(k/τ³) e^(−1) < 0    ⇒    maximum (for k > 0)

Finally, the maximum outlet concentration is:

    co(τ) = (k/τ) e^(−1) ≈ 0.368 kF/(ρV)

This example serves to highlight some difficulties with the analytical approach. These include:

→ determining when the first derivative goes to zero. This requires the solution of a nonlinear equation, which is often as difficult as the solution of the original optimization problem;
→ computation of the appropriate higher-order derivatives;
→ evaluation of the higher-order derivatives at the appropriate points.

We will need more efficient methods for optimization of complex nonlinear functions. Existing numerical methods can be classified into two broad types:

1) those that rely solely on evaluation of the objective function,
2) those that use derivatives (or approximations of the derivatives) of the objective function.

Interval Elimination Methods

These methods work by systematically reducing the size of the interval within which the optimum is believed to exist. Since we only ever work with an interval, we can never find the optimum exactly. There are many interval elimination algorithms; however, they all consist of the same steps:

1) determine an interval in which the optimum is believed to exist,
2) reduce the size of the interval,
3) check for convergence,
4) repeat steps 2 & 3 until converged.

We will examine only one interval elimination method, called the "Golden Section Search".

Scanning & Bracketing

As an example, consider that our objective is to minimize a function P(x). In order to start the optimization procedure we need to determine upper and lower bounds for an interval in which we believe an optimum will exist. These could be established by:
• physical limits,
• scanning:
  - choosing a starting point,
  - checking that the function initially decreases,
  - finding another point of higher value.

Golden Section Search

Consider that we want to break up the interval in which we believe the optimum exists into two regions l1 and l2:

[Figure: interval from a to b of length L, divided into segments l1 and l2.]

where:

    l1/l2 = l2/L = r

Since L = l1 + l2, l2 = rL and l1 = r²L, then:

    L = rL + r²L

or:

    r² + r − 1 = 0

There are two solutions for this quadratic equation.
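Both roots can be checked numerically with the quadratic formula; a quick sketch (plain Python; nothing is assumed beyond the quadratic above):

```python
import math

# Roots of r**2 + r - 1 = 0 via the quadratic formula.
a, b, c = 1.0, 1.0, -1.0
disc = math.sqrt(b * b - 4 * a * c)    # sqrt(5)
r_pos = (-b + disc) / (2 * a)          # positive root
r_neg = (-b - disc) / (2 * a)          # negative root

print(r_pos, r_neg)                    # ~0.618034 and ~-1.618034

# The positive root satisfies r = 1/(r + 1):
assert abs(r_pos - 1.0 / (r_pos + 1.0)) < 1e-12
```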
We are only interested in the positive root:

    r = (−1 + √5)/2 ≈ 0.61803398…

which has the interesting property

    r = 1/(r + 1)

and is known as the Golden ratio.

We will use this Golden ratio to break up our interval as follows:

[Figure: interval [ak, bk] of length L, with interior points x1k and x2k each placed a distance rL from one end.]

By breaking up the interval in this way, we will be able to limit ourselves to one simple calculation of a new point xi and one objective function evaluation per iteration. The optimization algorithm is:

1) using the interval limits ak and bk, determine x1k and x2k,
2) evaluate P(x1k) and P(x2k),
3) eliminate the part of the interval in which the optimum is not located,
4) repeat steps 1 through 3 until the desired accuracy is obtained.

How can we decide which part of the interval to eliminate? There are only three possible situations. For the minimization case:

• P(x1k) > P(x2k): eliminate the interval to the left of x1k. For a convex function, the minimum cannot lie in that part of the interval.
• P(x1k) < P(x2k): eliminate the interval to the right of x2k. For a convex function, the minimum cannot lie in that part of the interval.
• P(x1k) = P(x2k): we know the optimum lies between x1k and x2k, so we could eliminate the intervals both to the left of x1k and to the right of x2k. In practice we eliminate only one region, for computational ease.

[Figure: the kth iteration eliminates the segment beyond x2k; the (k+1)th iteration then works on the interval [ak+1, bk+1] with new interior points x1k+1 and x2k+1.]

Three things are worth noting:

1) only one new point needs to be evaluated during each iteration;
2) the length of the remaining interval after k iterations (the accuracy to which you know the optimum value of x) is:

       Lk = r^k L0 ≈ (0.618034)^k L0

3) the new point for the current iteration is:

       x1k = ak + bk − x2k    or    x2k = ak + bk − x1k

Golden-Section Example

Find the solution to within 5% for:

    min_x  x² − 2x + 1

Guess an initial interval for the optimum as 0 ≤ x* ≤ 2.

    k   ak      bk      x1k     x2k     P(x1k)      P(x2k)
    0   0.0000  2.0000  0.7640  1.2360  5.5696E-02  5.5696E-02
    1   0.7640  2.0000  1.2360  1.5280  5.5696E-02  2.7878E-01
    2   0.7640  1.5280  1.0558  1.2360  3.1158E-03  5.5696E-02
    3   0.7640  1.2360  0.9443  1.0558  3.1025E-03  3.1158E-03
    4   0.7640  1.0558  0.8755  0.9443  1.5507E-02  3.1025E-03
    5   0.8755  1.0558  0.9443  0.9869  3.1025E-03  1.7130E-04
    6   0.9443  1.0558  0.9869  1.0132  1.7130E-04  1.7440E-04

Note the number of calculations we have performed: (k+2) = 8 evaluations of P(xi), and (k+2) = 8 determinations of a new point xi.

There are a variety of other interval elimination methods, including Interval Bisection, Fibonacci Searches, and so forth. They are all simple to implement and not computationally intensive. They tend to be fairly robust but somewhat slow to converge.

Polynomial Approximation Methods

There are a number of polynomial approximation methods, which differ only in the choice of polynomial used to locally approximate the function to be optimized. Polynomials are attractive since they are easily fit to data and the optima of low-order polynomials are simple to calculate. We will only discuss Successive Quadratic Approximation, as it illustrates the basic ideas and carries over into multivariable optimization techniques.

Successive Quadratic Approximation

Suppose we have evaluated some test points of a function:

[Figure: three test points x1, x2, x3 on P(x), with a fitted quadratic P(x) ≈ ax² + bx + c passing through them.]

We could approximate the objective function by fitting a quadratic through the test points. The main advantage of using a quadratic approximation is that we know its optimum is at:

    x̂* = −b/(2a)

The first step is to calculate the approximation constants (a, b, c) in the quadratic from the available points (x1, x2, x3). Using these three evaluation points we can write:
    P(x1) = a x1² + b x1 + c
    P(x2) = a x2² + b x2 + c        (linear in (a, b, c))
    P(x3) = a x3² + b x3 + c

We have three equations and three unknowns, so in general the constants (a, b, c) can be uniquely determined. In matrix form the set of equations is:

    [ x1²  x1  1 ] [ a ]   [ P(x1) ]
    [ x2²  x2  1 ] [ b ] = [ P(x2) ]
    [ x3²  x3  1 ] [ c ]   [ P(x3) ]

which has the solution:

    [ a ]   [ x1²  x1  1 ]⁻¹ [ P(x1) ]
    [ b ] = [ x2²  x2  1 ]   [ P(x2) ]
    [ c ]   [ x3²  x3  1 ]   [ P(x3) ]

Some algebra yields the solution for the minimum as:

    x̂* = ½ · [ (x2² − x3²)P(x1) + (x3² − x1²)P(x2) + (x1² − x2²)P(x3) ]
            / [ (x2 − x3)P(x1) + (x3 − x1)P(x2) + (x1 − x2)P(x3) ]

[Figure: the fitted quadratic through x1, x2, x3, with its minimum at the predicted optimum x̂*.]

To perform the next iteration, we need to choose which three points to use next. The textbook gives some elaborate rules suitable for coding into a computer. If you're doing hand calculations, choose the point xi with the smallest value of the objective function P(x) and the points immediately on either side. Note that x̂* will not necessarily have the smallest objective function value. Finally, the accuracy to which the optimum is known at any iteration is given by the two points which bracket the point with the smallest objective function value.

The procedure is:

1) choose three points that bracket the optimum,
2) determine a quadratic approximation for the objective function based on these three points,
3) calculate the optimum of the quadratic approximation (the predicted optimum x̂*),
4) repeat steps 1 to 3 until the desired accuracy is reached.

Example: revisit the wastewater treatment problem with τ = 1:

    max_t  t e^(−t)    ⇒    min_t  −t e^(−t)

    iter  t1    t2      t3      P(t1)    P(t2)    P(t3)    t̂*      P(t̂*)
    0     0.5   1.5     2.5     −0.3033  −0.3347  −0.2052  1.1953  −0.3617
    1     0.5   1.1953  1.5     −0.3033  −0.3617  −0.3347  1.0910  −0.3664
    2     0.5   1.0910  1.1953  −0.3033  −0.3664  −0.3617  1.0396  −0.3676
    3     0.5   1.0396  1.0910  −0.3033  −0.3676  −0.3664  1.0185  −0.3678
    4     0.5   1.0185  1.0396  −0.3033  −0.3678  −0.3676  1.0083  −0.3679

Newton's Method

This is a derivative-based method that uses first and second derivatives to approximate the objective function; the optimum of the objective function is then estimated using this approximation. Consider the second-order Taylor series approximation of our objective function:

    P(x) ≈ P(xk) + dP/dx|xk (x − xk) + ½ d²P/dx²|xk (x − xk)²

This is a quadratic approximation of P(x) at the point xk. We can then approximate the first derivative of the objective function as:

    dP/dx ≈ dP/dx|xk + d²P/dx²|xk (x − xk)

Since we require the first derivative of the objective function to be zero at the optimum:

    dP/dx ≈ dP/dx|xk + d²P/dx²|xk (x − xk) = 0

A little algebra gives the estimate of the optimum value of x:

    x̂* = xk+1 = xk − [ dP/dx|xk ] / [ d²P/dx²|xk ]

This is called a Newton step, and we can iterate on this equation to find the optimum value of x.

[Figure: dP/dx plotted against x; the tangent at xk crosses zero at xk+1, which approaches the stationary point x*.]

Newton's method attempts to find the stationary point of a function by successive approximation using the first and second derivatives of the original objective function.

Example: revisit the wastewater treatment problem with τ = 1:

    max_t  t e^(−t)    ⇒    min_t  −t e^(−t)

    iter  xk      P′(xk)   P″(xk)  P(xk)
    0     0.0000  −1.0000  2.0000   0.0000
    1     0.5000  −0.3033  0.9098  −0.3033
    2     0.8333  −0.0724  0.5070  −0.3622
    3     0.9762  −0.0090  0.3857  −0.3678
    4     0.9994  −0.0002  0.3683  −0.3679
    5     1.0000

In summary, Newton's Method:

1) shows relatively fast convergence (it finds the optimum of a quadratic in a single step),
2) requires first and second derivatives of the function,
3) has a computational load of one first-derivative evaluation, one second-derivative evaluation and one Newton step calculation per iteration,
4) may oscillate or fail to converge when there are many local optima or stationary points.

Quasi-Newton Methods

A major drawback of Newton's method is that it requires us to have analytically determined both the first and second derivatives of our objective function. Often this is considered onerous, particularly in the case of the second derivative. The large family of optimization algorithms that use finite-difference approximations of the derivatives are called "Quasi-Newton" methods. There are a variety of ways first derivatives can be approximated using finite differences:

    P′(x) ≈ [P(x + Δx) − P(x − Δx)] / (2Δx)    (central)
    P′(x) ≈ [P(x + Δx) − P(x)] / Δx            (forward)
    P′(x) ≈ [P(x) − P(x − Δx)] / Δx            (backward)

Approximations for the second derivatives can be determined in terms of either the first derivatives:

    P″(x) ≈ [P′(x + Δx) − P′(x − Δx)] / (2Δx)
    P″(x) ≈ [P′(x + Δx) − P′(x)] / Δx
    P″(x) ≈ [P′(x) − P′(x − Δx)] / Δx

or the original objective function:

    P″(x) ≈ [P(x + Δx) − 2P(x) + P(x − Δx)] / Δx²
    P″(x) ≈ [P(x + 2Δx) − 2P(x + Δx) + P(x)] / Δx²
    P″(x) ≈ [P(x) − 2P(x − Δx) + P(x − 2Δx)] / Δx²

The major differences within the Quasi-Newton family of algorithms arise from: the number of function and/or derivative evaluations required, the speed of convergence, and the stability of the algorithm.

Regula Falsi

This method approximates the second derivative as:

    d²P/dx² ≈ [P′(xq) − P′(xp)] / (xq − xp)

where xp and xq are chosen so that P′(xp) and P′(xq) have different signs (i.e.
the two points bracket the stationary point). Substituting this approximation into the formula for a Newton step yields:

    x̂k* = xq − P′(xq) (xq − xp) / [ P′(xq) − P′(xp) ]

The Regula Falsi method iterates on this formula, taking care on each iteration to retain two points that still bracket the stationary point:

    if sign[P′(x̂k*)] = sign[P′(xp)]:  keep x̂k* and xq
    if sign[P′(x̂k*)] = sign[P′(xq)]:  keep x̂k* and xp

[Figure: dP/dx plotted against x; the secant through (xp, P′(xp)) and (xq, P′(xq)) crosses zero at x̂k*, and successive iterates approach the stationary point x*.]

Example: revisit the wastewater treatment problem with τ = 1:

    max_t  t e^(−t)    ⇒    min_t  −t e^(−t)

    iter  xp  xq      x̂k*     P′(xp)  P′(xq)  P′(x̂k*)
    0     0   2.0000  1.7616  −1      0.1353  0.1308
    1     0   1.7616  1.5578  −1      0.1308  0.1175
    2     0   1.5578  1.3940  −1      0.1175  0.0978
    3     0   1.3940  1.2699  −1      0.0978  0.0758
    4     0   1.2699  1.1804  −1      0.0758  0.0554
    5     0   1.1804  1.1184  −1      0.0554  0.0387
    6     0   1.1184  1.0768  −1      0.0387  0.0262

Note: the computational load is one derivative evaluation, a sign check and a step calculation per iteration.

In summary, Quasi-Newton methods:

1) show slower convergence rates than Newton's Method,
2) use finite-difference approximations for derivatives of the objective function,
3) may oscillate or fail to converge when there are many local optima or stationary points.

Univariate Summary

There are only two basic ideas in unconstrained univariate optimization:

1) direct search (scanning, bracketing, interval elimination, approximation),
2) derivative-based methods (optimality conditions, Newton and Quasi-Newton methods).

The direct search methods work directly on the objective function to successively determine better values of x. The derivative-based methods try to find the stationary point of the objective function, usually by some iterative procedure.
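As an illustration of the derivative-based idea, here is a minimal Newton iteration for the wastewater example with τ = 1 (minimizing P(t) = −t·e^(−t), whose stationary point is t* = 1; the derivatives are those computed in the example, while the starting point and tolerance are arbitrary choices):

```python
import math

# Newton's method for min P(t) = -t*exp(-t).
# P'(t) = (t - 1)*exp(-t),  P''(t) = (2 - t)*exp(-t)

def dP(t):
    return (t - 1.0) * math.exp(-t)

def d2P(t):
    return (2.0 - t) * math.exp(-t)

t = 0.0                          # starting point (arbitrary)
for k in range(20):
    step = dP(t) / d2P(t)        # Newton step
    t = t - step
    if abs(step) < 1e-8:         # terminate when the step is negligible
        break

print(t)  # converges to 1.0, matching the analytic optimum t* = tau = 1
```

Each pass through the loop costs one first-derivative evaluation, one second-derivative evaluation and one step calculation, matching the computational load noted in the summary above.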
Choosing which method to use is always a trade-off between:

1) ease of implementation,
2) convergence speed,
3) robustness,
4) computational load (number of function evaluations, derivative evaluations / approximations, etc.).

There are no clear answers as to which univariate method is better; the best choice is problem dependent:

• for nearly quadratic functions, the Newton / Quasi-Newton methods are generally a good choice,
• for very flat functions, interval elimination methods can be very effective.

When the univariate search is buried in a multivariate code, the best choice of univariate search will depend upon the properties and requirements of the multivariate search method.

Finally, remember that to solve an unconstrained univariate optimization problem you must:

1) have a function (and derivatives) you can evaluate,
2) choose an optimization method appropriate for the problem,
3) choose a starting point (or interval),
4) select a termination criterion (or accuracy).
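As a closing sketch, those four ingredients can be assembled into the golden-section search described earlier, applied to the example min x² − 2x + 1 on [0, 2] (the 0.05 tolerance mirrors the 5% accuracy used in that example; the helper name is ours):

```python
import math

R = (math.sqrt(5.0) - 1.0) / 2.0     # Golden ratio, ~0.618034

def golden_section_min(P, a, b, tol=0.05):
    """Golden-section search for a minimum of P on [a, b]."""
    x1 = b - R * (b - a)             # interior points at the Golden ratio
    x2 = a + R * (b - a)
    P1, P2 = P(x1), P(x2)
    while (b - a) > tol:
        if P1 > P2:                  # minimum cannot lie left of x1
            a, x1, P1 = x1, x2, P2
            x2 = a + R * (b - a)     # only one new evaluation per iteration
            P2 = P(x2)
        else:                        # minimum cannot lie right of x2
            b, x2, P2 = x2, x1, P1
            x1 = b - R * (b - a)
            P1 = P(x1)
    return 0.5 * (a + b)

x_star = golden_section_min(lambda x: x**2 - 2*x + 1, 0.0, 2.0)
print(x_star)   # close to the true minimum at x = 1
```

Note how the interval shrinks by the factor r on every pass, reproducing the Lk = r^k L0 accuracy estimate from the interval-elimination discussion.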