Part II Static Optimization

We will study here techniques that fall under the category of (static) nonlinear programming. While these techniques still apply to the subdomain of linear programming, there exist stronger results for that domain that we will not explore. Good references for linear programming include Dantzig and Intriligator.

8 Statement of the Problem

The general optimization problem (for our purposes) consists of an objective function, assumed to be real-valued, together with a set of inequality constraints and equality constraints. Given θ ∈ Θ, the problem is to find x ∈ X that solves

max f(x, θ)  (8.1)

subject to the inequality constraints

gj(x, θ) ≤ 0,  1 ≤ j ≤ J  (8.2)

and the equality constraints

hk(x, θ) = 0,  1 ≤ k ≤ K.  (8.3)

If we define the constraint set as

C(θ) ≡ X ∩ ( ∩_{1≤j≤J} {x : gj(x, θ) ≤ 0} ) ∩ ( ∩_{1≤k≤K} {x : hk(x, θ) = 0} )  (8.4)

then the problem can be more compactly written as

max_{x ∈ C(θ)} f(x, θ).  (8.5)

A solution to equation (8.5) is a global maximizer.

Definition 8.1. A point x* ∈ C(θ) is a global maximizer (or just maximizer) for the maximization problem (8.5) if for all x ∈ C(θ), f(x*, θ) ≥ f(x, θ). It is a strict global maximizer if for all x ∈ C(θ), x ≠ x*, f(x*, θ) > f(x, θ). The definition of (strict) global minimizer has the inequality reversed.

While not necessarily a solution to the maximization problem, local maximizers are interesting candidates for solutions, since global maximizers are necessarily local maximizers.

Definition 8.2. A point x* ∈ C(θ) is a local maximizer for the maximization problem (8.5) if there exists an open⁶ neighborhood U ⊂ C(θ) of x* such that for all x ∈ U, f(x*, θ) ≥ f(x, θ). It is a strict local maximizer if for all x ∈ U, x ≠ x*, f(x*, θ) > f(x, θ). The definition of (strict) local minimizer has the inequality reversed.

Suppose x* ∈ C(θ) is a solution of equation (8.5) (where the notation for the dependence of x* on θ has been suppressed).
Then the value function can be defined as follows: V(θ) ≡ max_{x ∈ C(θ)} f(x, θ) = f(x*, θ).

Consider a statement of the following sort: "If x* ∈ C(θ) solves (8.5), then condition A", where A is a mathematical statement. This type of statement describes a necessary condition for a maximizer. Suppose instead we have a statement of the following sort: "If condition A, then x* ∈ C(θ) solves (8.5)". This type of statement describes a sufficient condition for a maximizer. A necessary condition furnishes a set of potential solutions and guarantees that any solution is a member of this set. A sufficient condition furnishes a set of guaranteed solutions but potentially excludes some solutions. Sufficient conditions can be viewed as existence theorems.

9 Existence of Optima

Theorem 9.1 (Finite Constraint Set). If the constraint set C(θ) is nonempty and finite, then the objective function f has both a maximizer and a minimizer in the constraint set.

Theorem 9.2 (Weierstrass Extreme Value Theorem). If the constraint set C(θ) is nonempty and compact and f is continuous, then f has both a maximizer and a minimizer in the constraint set.

Proof. Since continuous functions map compact sets to compact sets (see Theorem 5.9), V ≡ f(C(θ)) ⊂ R is a compact set. By the Heine-Borel Theorem, V is closed and bounded. But any closed and bounded subset of R contains its least upper bound, and thus has a maximal value. Then, there exists some x ∈ C(θ) that maps to the maximal member of the set V, and so x is a maximizer. The proof for the case of the existence of a minimizer is analogous.

⁶ The relevant topology for the problem is the relative topology of C(θ) derived from that of X, both of which can be generated from the metric of the space X.

Corollary 9.3. If X ≡ R^N and the constraint set C(θ) is nonempty, closed and bounded, and if f is continuous, then f has both a maximizer and a minimizer.

Example 9.1.
The following are examples illustrating the role of the assumptions of the Weierstrass Theorem.

1. Suppose C(θ) = R, which is nonempty but not compact (since it is not bounded). Then the continuous function f(x) = x has no maximizer. Also, if C(θ) = (0, 1), which is bounded but not closed and so not compact, then f has no maximizer. But if C(θ) = (0, 1], which is also nonempty and not compact, then our previously defined continuous function has a maximizer.

2. Suppose C(θ) = [−1, 1], which is nonempty and compact. Then the discontinuous function

f(x) = 1 − |x| if x ≠ 0, and f(0) = 0

has no maximizer.

3. Suppose we have a discontinuous function f on R such that f(x) = 1 when x is rational, and f(x) = 0 when x is irrational. Then f has a maximizer when C(θ) ≡ R, which is a noncompact set.

The condition of continuity of the objective function can be weakened to yield a generalization of the Weierstrass Theorem.

Definition 9.1 (Level Sets). Let f : X → R be a function on some space X. The level set of f at α (also termed the contour or isoquant) is the set I(α; f) ≡ {x ∈ X : f(x) = α}. The upper level set of f at α (also termed the upper contour) is the set U(α; f) ≡ {x ∈ X : f(x) ≥ α}. The lower level set of f at α (also termed the lower contour) is the set L(α; f) ≡ {x ∈ X : f(x) ≤ α}. When clear from context, I will denote upper and lower level sets of a function without reference to the function.

Definition 9.2 (Semicontinuity). Let (X, ρ) be a metric space. A function f : X → R is upper semicontinuous if for all α ∈ R, the upper level set U(α) is closed. It is lower semicontinuous if for all α ∈ R, the lower level set L(α) is closed. A function f is continuous if and only if it is both upper and lower semicontinuous.

Theorem 9.4 (Generalized Weierstrass Theorem). Suppose the constraint set C(θ) is compact. If f is upper semicontinuous, then it has a maximizer in the constraint set. If f is lower semicontinuous, then it has a minimizer.
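The role of continuity in Example 9.1 (item 2) can be spot-checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper name f is simply the example's function:

```python
import numpy as np

def f(x):
    # The discontinuous function from item 2: f(x) = 1 - |x| for x != 0, f(0) = 0.
    return np.where(x == 0.0, 0.0, 1.0 - np.abs(x))

# Points approaching 0 push f toward its supremum of 1 on [-1, 1] ...
approach = np.array([0.1, 0.01, 0.001, 1e-6])
print(f(approach))         # values climbing toward 1: [0.9, 0.99, 0.999, 0.999999]

# ... but the supremum is never attained: at x = 0 the function drops to 0,
# so f has no maximizer even though [-1, 1] is compact.
print(f(np.array([0.0])))  # [0.]
```

The compactness of [−1, 1] is not enough here precisely because upper semicontinuity fails at 0, which is what Theorem 9.4 isolates.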
10 Convex Sets and Functions on Convex Sets

Before we dig deeper into necessary or sufficient conditions for optimizers, we will define and understand the properties of four special classes of functions: quasiconcave, concave, quasiconvex, and convex functions. Suppose V is a vector space, for example R^N.

Definition 10.1 (Convex Set). A set S ⊂ V is convex if for all x, y ∈ S, λx + (1 − λ)y ∈ S for all λ ∈ (0, 1). A set is strictly convex if for all x, y ∈ S, λx + (1 − λ)y ∈ int S for all λ ∈ (0, 1). The empty set is assumed to be convex.

Proposition 10.1. The intersection of an arbitrary family of convex sets is convex.

Definition 10.2 (Convex Hull). The convex hull of a set S ⊂ V, denoted cvx S, is the smallest (under the set inclusion order) convex set that contains S. Equivalently, it is the intersection of all convex sets that contain S.

Definition 10.3 (Concavity and Convexity). Let f be a real-valued function on a convex subset S of a vector space V. The function f is concave if for all distinct x, y ∈ S, f(λx + (1 − λ)y) ≥ λf(x) + (1 − λ)f(y) for all λ ∈ (0, 1). The function f is strictly concave if for all distinct x, y ∈ S, f(λx + (1 − λ)y) > λf(x) + (1 − λ)f(y) for all λ ∈ (0, 1). The function f is convex if for all distinct x, y ∈ S, f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) for all λ ∈ (0, 1). The function f is strictly convex if for all distinct x, y ∈ S, f(λx + (1 − λ)y) < λf(x) + (1 − λ)f(y) for all λ ∈ (0, 1).

Definition 10.4 (Quasiconcavity and Quasiconvexity). Let f be a real-valued function on a convex subset S of a vector space V. The function f is quasiconcave if for all distinct x, y ∈ S, f(λx + (1 − λ)y) ≥ min{f(x), f(y)} for all λ ∈ (0, 1). The function f is strictly quasiconcave if for all distinct x, y ∈ S, f(λx + (1 − λ)y) > min{f(x), f(y)} for all λ ∈ (0, 1). The function f is quasiconvex if for all distinct x, y ∈ S, f(λx + (1 − λ)y) ≤ max{f(x), f(y)} for all λ ∈ (0, 1).
The function f is strictly quasiconvex if for all distinct x, y ∈ S, f(λx + (1 − λ)y) < max{f(x), f(y)} for all λ ∈ (0, 1).

Concavity and convexity could also be defined in terms of hypographs and epigraphs.

Definition 10.5 (Graph, Hypograph, Epigraph). The graph of a function f : X → R, where X is a convex set, is the set G(f) ≡ {(x, α) : f(x) = α} ⊂ X × R. The hypograph is H(f) ≡ {(x, α) : f(x) ≥ α} ⊂ X × R. The epigraph is E(f) ≡ {(x, α) : f(x) ≤ α} ⊂ X × R.

Proposition 10.2. Suppose we have a function f : X → R, where X is a convex set. The function f is concave if its hypograph H(f) is convex, and is strictly concave if its hypograph is strictly convex. The function is convex if its epigraph E(f) is convex, and is strictly convex if its epigraph is strictly convex.

Quasiconcavity and quasiconvexity could also be defined in terms of level sets.⁷

Proposition 10.3. A function f is quasiconcave if and only if for all α ∈ R, U(α) is convex. A function f is strictly quasiconcave if and only if for all α ∈ R, U(α) is strictly convex. A function f is quasiconvex if and only if for all α ∈ R, L(α) is convex. A function f is strictly quasiconvex if and only if for all α ∈ R, L(α) is strictly convex.

Notice that the definitions of these properties do not require continuity or differentiability. In fact, the weak versions of these properties do not require a topology on the space. The strict versions of these properties (for example, strict quasiconcavity) do require the vector space to have a norm, however, because our definition of a strictly convex set makes reference to the interior of the set, which is a topological concept. If we strengthen the assumptions to include differentiability (of varying degrees), we can obtain alternative conditions that are necessary or sufficient for these properties. We shall see this below.

It is straightforward to show that every concave function is quasiconcave and every convex function is quasiconvex.
The converse is not true. For example, any monotonic function is both quasiconcave and quasiconvex, but only linear functions are both concave and convex. Positive monotonic transformations of a concave (convex) function do not necessarily preserve concavity (convexity), but they do preserve quasiconcavity (quasiconvexity).

⁷ The definition in terms of level sets is the one put forth by Arrow, Enthoven (Econometrica 1961).

Proposition 10.4. Let f : S → R be a quasiconcave (quasiconvex) function. Then, for any nondecreasing function g : R → R, g ◦ f is quasiconcave (quasiconvex).

Example 10.1. Suppose f : R₊ → R is defined by f(x) = √x. This function is strictly concave. Now, suppose g : R → R is defined by g(x) = x^4, which is a nondecreasing function. Notice that g ◦ f(x) = x^2, which is a strictly convex function. Thus, concavity is not preserved. However, both f and g ◦ f are quasiconcave.

A natural question to ask is whether every quasiconcave function is just a positive monotonic transformation of some concave function. The answer is no; the following is an example from Arrow, Enthoven (Econometrica 1961).

Example 10.2. Suppose f(x, y) = (x − 1) + ((x − 1)^2 + 4(x + y))^{1/2}. The level sets of f are nonparallel straight lines (a Grapher file of the function and level sets is available here: https://www2.bc.edu/samson-alva/ec720f11/arrowQCexample.gcx).

Another example, originally from Aumann (Econometrica 1975): suppose f(x, y) = y + √(x + y^2). Again, the level sets of f are nonparallel straight lines (a Grapher file of the function and level sets is available here: https://www2.bc.edu/samson-alva/ec720f11/aumannQCexample.gcx). Notice that f is strictly concave when restricted to either the first or the second dimension, but linearity of the level sets implies that it is only weakly quasiconcave.
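Returning to Example 10.1, the failure of concavity under the transformation, alongside the survival of quasiconcavity, can be verified numerically. A small sketch, assuming NumPy; the helper name concave_violated is my own:

```python
import numpy as np

def concave_violated(h, x, y, lam=0.5):
    # True if h fails the concavity inequality
    # h(lam*x + (1-lam)*y) >= lam*h(x) + (1-lam)*h(y) at this triple.
    return h(lam * x + (1 - lam) * y) < lam * h(x) + (1 - lam) * h(y)

f = np.sqrt              # strictly concave on R_+
g = lambda t: t ** 4     # nondecreasing transformation
gf = lambda x: g(f(x))   # composition: x -> x**2, strictly convex

print(concave_violated(f, 0.0, 4.0))   # False: sqrt passes the test here
print(concave_violated(gf, 0.0, 4.0))  # True: concavity is not preserved

# Quasiconcavity survives: the midpoint value weakly exceeds the min endpoint value.
print(gf(2.0) >= min(gf(0.0), gf(4.0)))  # True
```

One violating triple suffices to refute concavity, while quasiconcavity of the composition follows from Proposition 10.4 rather than from any finite check; the last line is only an illustration.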
Philip Reny (2010) proves that a continuous quasiconcave function cannot be transformed by a strictly increasing function into a concave function unless it has parallel level sets (his result is actually even stronger than this). The two examples above are such continuous quasiconcave functions. Afriat's Theorem states that for any finite set of choices satisfying the Generalized Axiom of Revealed Preference there exists a continuous, strictly increasing, concave utility function that would generate those choices. For more details on the concavifiability of quasiconcave functions, see the extensive discussion in Connell, Rasmusen (2011).

Proposition 10.5. Here are some useful results about quasiconcave and concave functions:

1. If f is strictly concave and h is strictly increasing, then h ◦ f is strictly quasiconcave.
2. If f is strictly quasiconcave and h is strictly increasing, then h ◦ f is strictly quasiconcave.
3. If f is strictly quasiconcave and h is nondecreasing, then h ◦ f is weakly quasiconcave.
4. If f is weakly but not strictly quasiconcave and h is nondecreasing, then h ◦ f is weakly quasiconcave.
5. If f is weakly but not strictly quasiconcave and h is strictly increasing, then h ◦ f is NOT necessarily strictly quasiconcave.

Proposition 10.6. Here are some useful results about quasiconvex and convex functions:

1. If each fi is quasiconvex and wi ≥ 0, then f ≡ maxi{wi fi} is quasiconvex.
2. If each fi is convex, then maxi{fi} is convex.
3. If f, g are convex, and g is nondecreasing, then g(f) is convex.
4. If f, g are concave, and g is nondecreasing, then g(f) is concave.

Now, let's make some assumptions about differentiability.

Theorem 10.7. Let X ⊂ R^N, and suppose f : X → R, f ∈ C^1.

1. f is concave if and only if, for all x, y ∈ X, f(y) − f(x) ≤ Df(x)(y − x).
2. f is strictly concave if and only if, for all x, y ∈ X, y ≠ x, f(y) − f(x) < Df(x)(y − x).
3. f is convex if and only if, for all x, y ∈ X, f(y) − f(x) ≥ Df(x)(y − x).
4.
f is strictly convex if and only if, for all x, y ∈ X, y ≠ x, f(y) − f(x) > Df(x)(y − x).
5. f is quasiconcave if and only if, for all x, y ∈ X, f(y) ≥ f(x) implies Df(x)(y − x) ≥ 0.
6. If, for all x, y ∈ X, y ≠ x, f(y) ≥ f(x) implies Df(x)(y − x) > 0, then f is strictly quasiconcave. The converse is not true, as discussed below.
7. f is quasiconvex if and only if, for all x, y ∈ X, f(y) ≤ f(x) implies Df(x)(y − x) ≤ 0.
8. If, for all x, y ∈ X, y ≠ x, f(y) ≤ f(x) implies Df(x)(y − x) < 0, then f is strictly quasiconvex. The converse is not true, as discussed below.

Theorem 10.8. Let X ⊂ R^N, and suppose f : X → R, f ∈ C^2.

1. f is concave if and only if for all x, D^2 f(x) is negative semidefinite.
2. If, for all x, D^2 f(x) is negative definite, then f is strictly concave.
3. f is convex if and only if for all x, D^2 f(x) is positive semidefinite.
4. If, for all x, D^2 f(x) is positive definite, then f is strictly convex.
5. f is quasiconcave if and only if for all x, D^2 f(x) is negative semidefinite on the nullspace⁸ of Df(x).
6. If, for all x, D^2 f(x) is negative definite on the nullspace of Df(x), then f is strictly quasiconcave.
7. f is quasiconvex if and only if for all x, D^2 f(x) is positive semidefinite on the nullspace of Df(x).
8. If, for all x, D^2 f(x) is positive definite on the nullspace of Df(x), then f is strictly quasiconvex.

There is a characterization of (semi)definite matrices involving determinants.

Definition 10.6 (Principal Minors). Let A be a real-valued, symmetric N × N matrix. Then a principal minor of order m of the matrix A is a submatrix of A where all but m rows and the corresponding (by index) columns are deleted. There are N!/(m!(N − m)!) principal minors of order m. The leading principal minor of order m is the principal minor of order m with the last N − m rows and columns deleted.

The following theorems characterize (semi)definiteness of a symmetric matrix.

Theorem 10.9 (Characterization of Definiteness).
Suppose A is a real-valued, symmetric N × N matrix.

1. A is negative definite if and only if the determinant of the leading principal minor of order m is nonzero and has the sign (−1)^m, for all 1 ≤ m ≤ N.
2. A is positive definite if and only if the determinant of the leading principal minor of order m is strictly positive, for all 1 ≤ m ≤ N.
3. A is negative semidefinite if and only if the determinant of every principal minor of order m is zero or has the sign (−1)^m, for all 1 ≤ m ≤ N, i.e. odd-ordered principal minors have nonpositive determinants and even-ordered principal minors have nonnegative determinants.
4. A is positive semidefinite if and only if the determinant of every principal minor of order m is nonnegative, for all 1 ≤ m ≤ N.

⁸ The nullspace of a vector is the set of all vectors that are orthogonal to it. The nullspace of a matrix is the set of all vectors that the matrix maps to the zero vector.

Checking definiteness of a matrix on a subspace requires the Bordered Matrix test, where the matrix in question is bordered on top and on the left by the constraints.

Definition 10.7 (Bordered Matrix). Let A be a real-valued symmetric N × N matrix and bk ∈ R^N for k ∈ {1, ..., K} a set of independent vectors. Let B be the N × K matrix (b1 ... bK), and denote by B′ the transpose of B. Then, the bordered matrix of A with respect to B is

H ≡ [ 0  B′ ]
    [ B  A  ].

Definition 10.8 (Border-Respecting Principal Minors). Let A be a real-valued, symmetric N × N matrix, and B be a real-valued N × K matrix of full rank, and denote by H the bordered matrix of A with respect to B. Then a border-respecting principal minor of order m of the bordered matrix H is a submatrix of H where all but m rows and the corresponding (by index) columns are deleted, with the restriction that the index of a deleted row (and column) be greater than K.
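Returning to Theorem 10.9, the leading-principal-minor tests in parts 1 and 2 are mechanical to implement. A sketch assuming NumPy; the function names are my own:

```python
import numpy as np

def leading_principal_minor_dets(A):
    # Determinants of the leading principal minors of orders 1..N.
    N = A.shape[0]
    return [np.linalg.det(A[:m, :m]) for m in range(1, N + 1)]

def is_negative_definite(A, tol=1e-12):
    # Theorem 10.9(1): the order-m leading principal minor determinant
    # must be nonzero with sign (-1)^m, for every m.
    dets = leading_principal_minor_dets(A)
    return all(abs(d) > tol and np.sign(d) == (-1) ** m
               for m, d in enumerate(dets, start=1))

def is_positive_definite(A, tol=1e-12):
    # Theorem 10.9(2): every leading principal minor determinant is positive.
    return all(d > tol for d in leading_principal_minor_dets(A))

A = np.array([[-2.0, 1.0], [1.0, -2.0]])  # Hessian of a strictly concave quadratic
print(is_negative_definite(A))   # True: the minor determinants are -2 and 3
print(is_positive_definite(-A))  # True
```

Note that these leading-minor tests cover only strict definiteness; semidefiniteness (parts 3 and 4) requires checking every principal minor of every order, not just the leading ones.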
The leading border-respecting principal minor of order m is the border-respecting principal minor of order m with the last N + K − m rows and columns deleted.

Theorem 10.10 (Characterization of Definiteness on a Linear Constraint Set). Suppose A is a real-valued symmetric N × N matrix and bk ∈ R^N for k ∈ {1, ..., K} a set of independent vectors. Let B be the N × K matrix (b1 ... bK), and denote by B′ the transpose of B. Define the bordered matrix

H ≡ [ 0  B′ ]
    [ B  A  ].

1. A is negative definite on the subspace {v : B′v = 0} if and only if for each m ∈ {2K + 1, ..., N + K}, the determinant of the leading border-respecting principal minor of order m of matrix H is nonzero and has the sign (−1)^(m−K), i.e. the determinant of H has the sign (−1)^N and the last (largest) N − K leading border-respecting principal minors have alternating signs.

2. A is positive definite on the subspace {v : B′v = 0} if and only if for each m ∈ {2K + 1, ..., N + K}, the determinant of the leading border-respecting principal minor of order m of matrix H is nonzero and has the sign (−1)^K.

The characterization of semidefiniteness on a linear constraint set involves testing every border-respecting principal minor of every order m ∈ {2K + 1, ..., N + K}, and not just the leading border-respecting principal minors, analogous to the characterization of semidefiniteness of an unconstrained symmetric matrix.

Theorem 10.11 (Characterization of Semidefiniteness on a Linear Constraint Set). Suppose we have A, bk, and B as in Theorem 10.10.

1. A is negative semidefinite on the subspace {v : B′v = 0} if and only if for each m ∈ {2K + 1, ..., N + K}, the determinant of every border-respecting principal minor of order m alternates in sign or is equal to zero, with the sign of the determinant of H being (−1)^N or equal to zero, i.e. every border-respecting principal minor of order m has a nonpositive determinant if m − K is odd and a nonnegative determinant if m − K is even.

2.
A is positive semidefinite on the subspace {v : B′v = 0} if and only if for each m ∈ {2K + 1, ..., N + K}, the determinant of every border-respecting principal minor of order m is nonnegative if K is even and nonpositive if K is odd.

Therefore, to test for, say, quasiconcavity of a twice continuously differentiable function f in the neighborhood of a point x, we need to find the Hessian of f evaluated at x, which is a real-valued symmetric matrix, and check whether this Hessian, when bordered by the Jacobian of f evaluated at x, passes the test of negative semidefiniteness of a matrix on a linear subspace described in Theorem 10.11.

10.1 References

Afriat. The Construction of Utility Functions from Expenditure Data. (International Economic Review 1967)
Arrow, Enthoven. Quasiconcave Programming. (Econometrica 1961)
Aumann. (Econometrica 1975)
Connell, Rasmusen. Concavifying the Quasiconcave. (Working Paper 2011)
Reny. A Simple Proof of the Nonconcavifiability of Functions with Linear Not-All-Parallel Contour Sets. (Working Paper 2010)

11 Unconstrained Optimization With a Differentiable Objective Function

11.1 Overview

- FONC can be derived using a first-order Taylor expansion, which requires the objective function to be continuously differentiable
- SONC can be derived using a second-order Taylor expansion, which requires the objective function to be twice continuously differentiable
- SOSC can be derived using a second-order Taylor expansion, which requires the objective function to be twice continuously differentiable

Also, see section 12.4 below for more on second-order conditions for unconstrained problems.

11.2 References

See Kim Border's notes on the calculus of one variable: http://www.hss.caltech.edu/~kcb/Notes/Max1.pdf

12 Classical Programming: Optimization with Equality Constraints

Let us focus on optimization problems where the domain of the objective and constraint functions is an open subset X of R^N, with only equality constraints, i.e.
J = 0 in equation (8.2).

12.1 Overview

- Introduce auxiliary variables, called multipliers, for each equality constraint, thereby converting a constrained optimization problem to an unconstrained optimization problem with a larger set of choice variables
- Show that the necessary conditions for maxima of the Lagrangian problem yield necessary conditions for maxima of the original problem
- Intuition based on the gradients of the objective function and the constraint function
- Interpretation of the multipliers. Nice article on Wikipedia: http://en.wikipedia.org/wiki/Lagrange_multiplier
- Explanation of the constraint qualification and the failure of the theory to find optima when the CQ is violated

12.2 First Order Conditions

For illustrative purposes consider the problem with one equality constraint:

max f(x1, x2) subject to h(x1, x2) = c

Figure 1: Graphical Depiction of a Constrained Optimization Problem

A geometric visualization of this problem is given in Figure 1. Note from Figure 1 that at the point x* = (x1*, x2*) the level curve of f and the constraint are tangent to each other, i.e. both have a common slope. We will explore this observation in order to find a characterization of the solution for this class of problems and its generalization to K constraints.

In order to find the derivative of the level curve at the optimum point x*, recall from the implicit function theorem that for a function G(y1, y2) = c,

dy2/dy1(ŷ) = −(∂G/∂y2(ŷ))^(−1) ∂G/∂y1(ŷ),

where ŷ is some point in the domain, if, on an open neighborhood of ŷ, G is continuously differentiable and ∂G/∂y2(ŷ) is nonzero. In particular, according to Figure 1, dx2/dx1(x̂) defined implicitly by f(x̂) ≡ f(x*), and dx2/dx1(x̂) defined implicitly by h(x̂) ≡ h(x*), must be the same at x*:

dx2/dx1(x*) = −(∂f/∂x1(x*)) / (∂f/∂x2(x*)) = −(∂h/∂x1(x*)) / (∂h/∂x2(x*)),

which after some rearrangement yields

λ* ≡ (∂f/∂x1(x*)) / (∂h/∂x1(x*)) = (∂f/∂x2(x*)) / (∂h/∂x2(x*)),  (12.1)

where λ* ∈ R is the common value of these ratios at x*. We are assuming that the ratios above do not have zero denominators, the assurance of which is the motivation for the constraint qualifications discussed later. Now, rewrite equation (12.1) as two equations:

∂f/∂x1(x*) − λ* ∂h/∂x1(x*) = 0  (12.2)

∂f/∂x2(x*) − λ* ∂h/∂x2(x*) = 0.  (12.3)

Together with the constraint equation

h(x1, x2) = c,  (12.4)

we have a system of three equations (12.2), (12.3), and (12.4) in three unknowns: (x1*, x2*, λ*). This system is equivalent to the first-order conditions for stationary points of the following function

L(x1, x2, λ) ≡ f(x1, x2) − λ(h(x1, x2) − c),  (12.5)

which we call the Lagrangian; we also call the term λ the Lagrange multiplier. Thus, for the case with two choice variables and one constraint, a maximizer (subject to a qualification) x* satisfies ∂L/∂x1(x1*, x2*, λ*) = 0, ∂L/∂x2(x1*, x2*, λ*) = 0, and ∂L/∂λ(x1*, x2*, λ*) = 0, for some λ*.

The Lagrange method transforms a constrained problem into an unconstrained problem via the formulation of the Lagrangian. The transformation introduces a Lagrange multiplier for every constraint. It is important to note that the transformation is valid only if at least one of ∂h/∂x1(x*) and ∂h/∂x2(x*) is nonzero. If not, then there is no way to define a multiplier, as should be clear from examining equation (12.1). This is called the non-degenerate constraint qualification. If the constraint is linear, this qualification will automatically be satisfied.

We can mimic the steps above for an arbitrary problem with N choice variables and K constraints, where K < N. Suppose we have a solution to the general constrained maximization problem with only equality constraints; denote it x* ∈ R^N. Then it must be the case that h(x*) = 0, where h is the K-dimensional vector function of constraints.
Consider a linear approximation of this constraint function at x*: the derivative (the Jacobian) of this linear approximation will be Dx h(x*), according to the Taylor Approximation Theorem for a first-order approximation, assuming h is continuously differentiable at x*. This Jacobian matrix is a K × N matrix, where a generic term is ∂hk/∂xn, and when each row is viewed as a vector, the K vectors span a subspace of R^N. Each row of the Jacobian is a vector (the transpose of the gradient vector of the associated constraint) that has an associated (N − 1)-dimensional nullspace (as long as this vector is non-degenerate), which is the tangent plane to the associated constraint at x*, the linear approximation of the constraint function at x*. Then, the K rows define K such subspaces, and for a vector to satisfy all these constraints, the vector must be in every one of these subspaces. If the row vectors of the Jacobian are linearly independent, then the nullspace of the Jacobian is exactly the subspace of vectors that are in the tangent plane of each constraint, a subspace of (N − K) dimensions.

Now, given the gradient vector of the objective function Dx f(x*), consider the subspace orthogonal to the gradient. This hyperplane is a linear approximation of the level set of the objective function at the point x*, i.e. any local movement from x* within this hyperplane will not change the value of the objective function. So, if x* is a maximum, it must be the case that any local move from x* that does not locally violate the constraints, i.e. movement within the nullspace of the full-rank Jacobian Dx h(x*), does not change the value of the objective, and thus we can conclude that the nullspace of Dx h(x*) must be contained in the nullspace of Dx f(x*). But this means that the vector Dx f(x*) is orthogonal to the nullspace of Dx h(x*), i.e. it lies in the row space of Dx h(x*), and so it can be expressed as a linear combination of the rows of this matrix.
Thus, we can conclude that at a maximum, there exist K constants λk*, 1 ≤ k ≤ K, such that

Dx f(x*) = Σ_k λk* Dx hk(x*).

Combined with the constraints h, we have N + K equations that must be satisfied by the N + K unknowns, the maximizer x* and the constants λ*, under the qualification that Dx h(x*) has full rank. Notice, too, that the argument above is exactly the same if x* is a minimum. Thus, these necessary conditions hold for both maxima and minima. The results are formally stated below.

Definition 12.1 (Nondegenerate Constraint Qualification). The functions hk, 1 ≤ k ≤ K, satisfy the Nondegenerate Constraint Qualification (NDCQ) at x* if the rank of the Jacobian Dx h(x*) is K, i.e. the Jacobian has full rank.

Theorem 12.1 (First Order Necessary Conditions). Let f and h be continuously differentiable functions. Suppose x* is a constrained maximum or minimum of f, where the constraint set is defined by {x ∈ X : h(x) = 0}, and suppose that the constraints satisfy the NDCQ at x*. Define L(x, λ) ≡ f(x) − λh(x) to be the associated Lagrangian. Then there exists λ* ∈ R^K such that:

∂L/∂xn(x*, λ*) = ∂f/∂xn(x*) − Σ_k λk* ∂hk/∂xn(x*) = 0

and

∂L/∂λk(x*, λ*) = −hk(x*) = 0

for every n ∈ {1, ..., N}, k ∈ {1, ..., K}, which equivalently means that (x*, λ*) is a stationary point of the Lagrangian.

12.3 Meaning of the Multiplier

Theorem 12.2 (Meaning of the Multiplier). Let f, hk be C^1 functions with domain in R^N and let θk ∈ R. Consider the maximization problem:

max f(x) subject to hk(x) = θk, k ∈ {1, ..., K}.

Suppose x* is a constrained maximizer, and λ* the associated Lagrange multipliers, and suppose x* and λ* are C^1 functions of θ. Then,

∂/∂θk f(x*(θ)) = λk*(θ).

From Theorem 12.2 we conclude that the Lagrange multiplier can be interpreted as the change in the maximum achieved by the objective function if we slightly relax (tighten) the corresponding constraint.
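Theorem 12.2 can be checked on a concrete problem. A sketch using sympy, where the parametrized problem max x1·x2 subject to x1 + x2 = θ is my own illustration:

```python
import sympy as sp

x1, x2, lam, th = sp.symbols('x1 x2 lam theta', real=True)

# Illustrative parametrized problem (my choice): max x1*x2 s.t. x1 + x2 = theta.
L = x1 * x2 - lam * (x1 + x2 - th)
sol = sp.solve([sp.diff(L, v) for v in (x1, x2, lam)], [x1, x2, lam], dict=True)[0]

# Value function: substituting the maximizer gives V(theta) = theta**2/4.
V = (x1 * x2).subs(sol)
print(sp.simplify(V))

# The envelope result: dV/dtheta equals the multiplier lam* = theta/2.
print(sp.simplify(sp.diff(V, th) - sol[lam]))  # 0
```

The multiplier λ*(θ) = θ/2 is exactly the marginal value of relaxing the constraint, as the theorem asserts.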
Consider for example the case in which f is a utility function with two arguments x1 and x2, with h(x1, x2) = I representing the budget constraint given an income of I. The Lagrange multiplier associated with the budget constraint is equivalent to the marginal utility of income.

The principle behind this result is the envelope theorem, which states that only the direct effect of an exogenous parameter on the objective function matters when studying the total effect of a change on the optimal value. The change in the parameter also induces a change in the endogenous choice variables, but optimality requires that the first-order effect of a change in the endogenous variables have no effect on the value of the objective.

12.4 Second Order Conditions

12.4.1 Unconstrained Optimization

As discussed in section 10, a concave function f, when twice continuously differentiable, is characterized by having a negative semidefinite Hessian, i.e. Dx^2 f is negative semidefinite at every point of the domain. There is no such characterization of a strictly concave function, but a negative definite Hessian on the domain of the function is a sufficient condition for strict concavity. We can easily define local versions of these concepts at a point x, by weakening the requirement that the property of the Hessian hold on the whole domain to holding on some open neighborhood of the point x. Then, if x is a local maximum, i.e. x is a maximum of f in an open neighborhood of x, then f(x) ≥ f(x′) for all x′ in the neighborhood, and with a continuously differentiable function f, this yields Df(x)(x′ − x) ≥ f(x′) − f(x). But this is exactly the condition for a continuously differentiable function to be concave, and so local concavity of f is a necessary condition for a local maximum.
If we also know f is twice continuously differentiable, then we can conclude that the Hessian of f must be negative semidefinite on the neighborhood, given that f must be locally concave. Suppose instead that f is locally strictly concave on this open neighborhood; then x is a strict local maximum.⁹ Moreover, if f is twice continuously differentiable at x, then a Hessian Dx^2 f(x) that is negative definite on this neighborhood implies strict local concavity, and thus serves as a sufficient condition for a local maximum when combined with some necessary conditions for a maximum, such as the standard first-order conditions.

Definition 12.2 (Regular Maximum). For some twice continuously differentiable function f : X → R, we call a local maximizer x* regular if Dx^2 f(x*) is negative definite on an open neighborhood of x*.

Notice that from the arguments above, a regular maximum is a strict local maximum. However, the converse need not be true, as should be clear from the following example.

Example 12.1 (A strict local maximizer that is not regular). Suppose f(x) = −x^4. Then f is twice continuously differentiable, with a strict (local) maximum at x = 0. However, f′′(x) = −12x^2 evaluates to 0 at x = 0, and so f′′ is not negative definite at 0. However, f is strictly concave, and so we see that strict concavity does not imply that the Hessian is negative definite, and as a consequence, a strict local maximum need not be regular.

12.4.2 Constrained Optimization

For constrained maximization problems, the intuition for second order conditions is similar to the unconstrained case. However, the constraints imply that local (strict) concavity at the optimum need only be tested on the tangent space of the constraint set at the optimum.

⁹ It may seem that the converse should also be true, but notice that the function f(x) = −|x|, which has a strict maximum at x = 0, is only concave, and not strictly concave, even locally at 0.
Moreover, the relevant function is no longer the objective function but the associated Lagrangian, which is the function whose stationary points we actually compute.

Theorem 12.3 (Second Order Sufficient Conditions). Suppose the functions f : X → R and h : X → R^K are twice continuously differentiable. Let L(x, λ) ≡ f(x) − λh(x) be the Lagrangian. Suppose x∗ ∈ R^N and λ∗ ∈ R^K satisfy the first order conditions of Theorem 12.1 and the constraint h(x∗) = 0, and that x∗ satisfies the NDCQ. If D_x^2 L(x∗, λ∗) is negative definite on the subspace {v : D_x h(x∗)v = 0}, then x∗ is a strict local constrained maximum. If D_x^2 L(x∗, λ∗) is positive definite on the subspace {v : D_x h(x∗)v = 0}, then x∗ is a strict local constrained minimum.

Theorem 12.4 (Second Order Necessary Conditions). Suppose the functions f : X → R and h : X → R^K are twice continuously differentiable. Let L(x, λ) ≡ f(x) − λh(x) be the associated Lagrangian, and suppose that x∗ ∈ R^N satisfies the NDCQ. If x∗ is a local constrained maximum, then there exists λ∗ ∈ R^K such that D_x^2 L(x∗, λ∗) is negative semidefinite on the subspace {v : D_x h(x∗)v = 0}. If x∗ is a local constrained minimum, then there exists λ∗ ∈ R^K such that D_x^2 L(x∗, λ∗) is positive semidefinite on the subspace {v : D_x h(x∗)v = 0}.

13 Nonlinear Programming: The Karush-Kuhn-Tucker Approach

13.1 Overview

- The KKT method of dealing with inequality constraints: complementary slackness
- Show that the necessary conditions for maxima of the KKT problem yield necessary conditions for maxima of the original problem
- Intuition based on gradients of the objective function and constraint functions, and an explanation of the constraint qualification, paying attention to the difference in the meaning of the sign of the gradient of the constraint function, and hence the difference from the case with equality constraints
- Failure of the CQ is problematic

13.2 First Order Conditions

Theorem 13.1 (Karush-Kuhn-Tucker Theorem).
Let f and g be continuously differentiable functions. Suppose x∗ is a constrained maximum of f, where the constraint set is defined by {x ∈ X : g(x) ≤ 0} (for a minimum, apply the theorem to −f). Denote by J_B the subset of indices of the constraints that bind (hold with equality) at x∗, and suppose that these binding constraints satisfy the NDCQ at x∗. Define L(x, λ) ≡ f(x) − λg(x) to be the associated Lagrangian. Then there exists λ∗ ∈ R^J_+ such that:

∂L/∂x_n (x∗, λ∗) = ∂f/∂x_n (x∗) − Σ_j λ∗_j ∂g_j/∂x_n (x∗) = 0

and

∂L/∂λ_j (x∗, λ∗) = −g_j(x∗) ≥ 0,  λ∗_j ≥ 0,  λ∗_j ∂L/∂λ_j (x∗, λ∗) = 0

for every n ∈ {1, . . . , N}, j ∈ {1, . . . , J}.

Suppose we have nonnegativity constraints on the choice variables. We can treat these nonnegativity constraints separately, as is done in the original Kuhn-Tucker Theorem.

Theorem 13.2 (Original Kuhn-Tucker Theorem). Let f and g be continuously differentiable functions. Suppose x∗ is a constrained maximum of f, where the constraint set is defined by {x ∈ X : g(x) ≤ 0, x ≥ 0}. Denote by J_B the subset of indices of the constraints that bind (hold with equality) at x∗, and suppose that these binding constraints satisfy the NDCQ at x∗. Define L(x, λ) ≡ f(x) − λg(x) to be the associated Lagrangian. Then there exists λ∗ ∈ R^J_+ such that:

∂L/∂x_n (x∗, λ∗) = ∂f/∂x_n (x∗) − Σ_j λ∗_j ∂g_j/∂x_n (x∗) ≤ 0,  x∗_n ≥ 0,  x∗_n ∂L/∂x_n (x∗, λ∗) = 0

and

∂L/∂λ_j (x∗, λ∗) = −g_j(x∗) ≥ 0,  λ∗_j ≥ 0,  λ∗_j ∂L/∂λ_j (x∗, λ∗) = 0

for every n ∈ {1, . . . , N}, j ∈ {1, . . . , J}.

13.3 Second Order Conditions

The second order theorems for the case of equality constraints in the classical programming framework carry over: treat the binding inequality constraints, together with any equality constraints of the general nonlinear problem, as the equality constraints of the classical framework, and simply ignore the nonbinding constraints.
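As a concrete illustration of Theorem 13.1, consider the (made-up) problem of maximizing f(x) = −(x1 − 1)^2 − (x2 − 2)^2 subject to g(x) = x1 + x2 − 2 ≤ 0. The unconstrained maximizer (1, 2) is infeasible, so the constraint binds; solving the first order conditions along x1 + x2 = 2 by hand gives x∗ = (1/2, 3/2) with λ∗ = 1. The sketch below verifies stationarity of the Lagrangian, feasibility, and complementary slackness numerically:

```python
import numpy as np

# Illustrative problem: maximize f(x) = -(x1-1)^2 - (x2-2)^2
# subject to g(x) = x1 + x2 - 2 <= 0.
def grad_f(x):
    return np.array([-2.0 * (x[0] - 1.0), -2.0 * (x[1] - 2.0)])

def g(x):
    return x[0] + x[1] - 2.0

grad_g = np.array([1.0, 1.0])   # gradient of the (linear) constraint

# Candidate computed by hand from the binding first order conditions.
x_star = np.array([0.5, 1.5])
lam_star = 1.0

# Stationarity of the Lagrangian L = f - lambda * g in x:
stationarity = grad_f(x_star) - lam_star * grad_g
print(stationarity)             # [0. 0.]

# Feasibility, dual feasibility, and complementary slackness:
print(g(x_star) <= 0, lam_star >= 0, lam_star * g(x_star) == 0)
```

All three KKT requirements hold at (x∗, λ∗), and since the objective is concave and the constraint is affine, the sufficiency results of the later summary apply: x∗ is the global constrained maximizer.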
13.4 The Fritz John Theorem

The following theorem does not require a constraint qualification, which seems attractive, but in some cases it introduces many candidate points, even when the corresponding KKT theorem would yield sufficient conditions, such as the case of a concave objective.

Theorem 13.3 (Fritz John). Suppose f and g are continuously differentiable, as in the KKT Theorem 13.1. Suppose x∗ is a constrained maximizer of f subject to the constraints g(x) ≤ 0. Then there exist λ∗ ∈ R^J_+ and γ∗ ∈ R, with at least one of γ∗, λ∗_1, . . . , λ∗_J not equal to 0, such that

γ∗ Df(x∗) − λ∗ Dg(x∗) = 0.

For more on this theorem, see Simon and Blume, p. 475.

14 The Saddle Point Theorem

Definition 14.1 (Saddle Point). Let f : X × Y → R. A point (x∗, y∗) ∈ X × Y is a saddle point of f if

f(x, y∗) ≤ f(x∗, y∗) ≤ f(x∗, y) for all x ∈ X, y ∈ Y.

Lemma 14.1 (Interchangeability). Let f : X × Y → R, and let (x1, y1) ∈ X × Y and (x2, y2) ∈ X × Y be saddle points. Then (x1, y2) and (x2, y1) are also saddle points. Moreover, all saddle points have the same value.

Proof. By the saddle point property,

f(x, y1) ≤ f(x1, y1) ≤ f(x1, y) for all x ∈ X, y ∈ Y

and

f(x, y2) ≤ f(x2, y2) ≤ f(x2, y) for all x ∈ X, y ∈ Y.

Setting x = x2 and y = y2 in the first chain, and x = x1 and y = y1 in the second, gives

f(x2, y1) ≤ f(x1, y1) ≤ f(x1, y2) ≤ f(x2, y2) ≤ f(x2, y1),

so all of these values are equal. Then for all x ∈ X, y ∈ Y,

f(x, y2) ≤ f(x2, y2) = f(x1, y2) = f(x1, y1) ≤ f(x1, y),

so (x1, y2) is a saddle point. Similarly,

f(x, y1) ≤ f(x1, y1) = f(x2, y1) = f(x2, y2) ≤ f(x2, y),

so (x2, y1) is a saddle point.

Suppose we have the Lagrangian

L(x, λ) ≡ f(x) − λg(x),

where g : R^N → R^J.

Theorem 14.2 (Saddle Point Theorem). For any X ⊂ R^N and any f, g_j : X → R, if (x∗, λ∗) is a saddle point of L with λ∗ ≥ 0, then x∗ maximizes f over X subject to g_j(x) ≤ 0, and moreover λ∗_j g_j(x∗) = 0 for all j ∈ J.

Proof. Since (x∗, λ∗) is a saddle point, L(x∗, λ∗) ≤ L(x∗, λ) for all λ ≥ 0, and so

f(x∗) − λ∗g(x∗) ≤ f(x∗) − λg(x∗),

which implies λ∗g(x∗) ≥ λg(x∗) for all λ ≥ 0. We claim g(x∗) ≤ 0. If not, there exists j such that g_j(x∗) > 0.
Now choose λ̃ with λ̃_j > λ∗g(x∗)/g_j(x∗) and λ̃_{j′} = 0 for all j′ ≠ j; then λ̃g(x∗) = λ̃_j g_j(x∗) > λ∗g(x∗), violating the saddle point condition. Thus x∗ satisfies the constraints. Also, taking λ = 0 gives λ∗g(x∗) ≥ 0; but λ∗ ≥ 0 and g(x∗) ≤ 0 imply λ∗g(x∗) ≤ 0, so λ∗g(x∗) = 0. In fact, since each term λ∗_j g_j(x∗) is nonpositive and the sum is zero, λ∗_j g_j(x∗) = 0 for every j.

Next, L(x∗, λ∗) ≥ L(x, λ∗), and so f(x∗) − λ∗g(x∗) ≥ f(x) − λ∗g(x). But λ∗g(x∗) = 0, so f(x∗) ≥ f(x) − λ∗g(x). If x satisfies g(x) ≤ 0, then λ∗g(x) ≤ 0, which implies f(x) − λ∗g(x) ≥ f(x). Thus f(x∗) ≥ f(x) for every feasible x.

The converse of the Saddle Point Theorem 14.2 is not true in general, but there is a partial converse.

Theorem 14.3. Let X be a convex subset of R^N. Let f : X → R be quasiconcave and let each g_j : X → R be convex. Suppose there exists x̃ ∈ X such that g_j(x̃) < 0 for all j ∈ J, a condition known as the Slater Constraint Qualification. If x∗ is a constrained maximizer of f subject to g(x) ≤ 0, then there exists λ∗ ∈ R^J_+ such that (x∗, λ∗) is a saddle point of L : X × R^J_+ → R, L(x, λ) ≡ f(x) − λg(x).

15 The Hyperplane Theorems and the Farkas Lemma

Theorem 15.1 (Strictly Separating Hyperplane Theorem). Let X ⊂ R^N be nonempty, closed, and convex. Let y ∈ R^N \ X. Then there exist a ∈ R^N and c ∈ R such that

ax < c < ay for all x ∈ X.

Importance of the assumptions:

1. X is closed: otherwise y could be a boundary point, and then there are points of X arbitrarily close to y, yielding a failure of the strict inequality (though of course we can still find a ∈ R^N such that ax ≤ ay).

2. X is convex: otherwise the plane will intersect X for some choice of y ∉ X.

Theorem 15.2. Let X, Y ⊂ R^N be nonempty, convex, and disjoint. Then there exists a ∈ R^N such that ax ≤ ay for all x ∈ X, y ∈ Y, and there exist x′ ∈ X, y′ ∈ Y such that ax′ < ay′.

Theorem 15.3 (Separating Hyperplane Theorem). Let X, Y ⊂ R^N be nonempty and convex with int X ∩ Y = ∅. Then there exist a ∈ R^N and c ∈ R such that ax ≤ c ≤ ay for all x ∈ X, y ∈ Y.

Theorem 15.4 (Supporting Hyperplane Theorem). Let X ⊂ R^N be convex and let x′ ∈ X \ int X. Then there exists a ∈ R^N, a ≠ 0, such that
ax ≤ ax′ for all x ∈ X.

Note: x′ needs to be a boundary point of X; otherwise there exists x″ ∈ X, x″ ≠ x′, such that ax″ > ax′.

Theorem 15.5 (The Farkas Lemma). Let a_1, . . . , a_m be nonzero vectors in R^N, let A be the m × N matrix whose rows are a_1, . . . , a_m, and let b ∈ R^N. Then exactly one of the following is true:

1. There exists λ ∈ R^m_+ such that b = λA.

2. There exists x ∈ R^N such that bx > 0 and Ax ≤ 0.

16 Solving Constrained Optimization Problems

1. Find all points that violate the NDCQ (or other constraint qualification). These points are candidate optima.

2. Determine the KKT first order conditions.

3. Find all points that satisfy the KKT conditions: for every subset of constraints, assume these have nonzero multipliers and try to find any points that satisfy the KKT conditions. All such points that pass the NDCQ are candidate optima.

4. If the objective is concave (convex) and the constraint functions are quasiconvex, then every solution of the KKT conditions is a global constrained maximum (minimum). However, we need to ensure that we have not missed a solution that violates the NDCQ.

5. Evaluate all candidate points to find global optima, or use second order conditions to discriminate between candidate points.

17 Summary of Optimization Theorems

17.1 Unconstrained

Let f : X → R be a continuous function, where X is an open subset of R^N. Note that assuming X is open means that any local optimum is an interior optimum. Henceforth, I will assume the problem is to find maxima; the results are easily translated for minima. Also, keep in mind that an open domain implies a maximum may not exist.

1. If f is continuously differentiable, then a necessary condition for a local maximum x∗ is that Df(x∗) = 0.

2. If f is twice continuously differentiable, then a necessary condition for a local maximum x∗ is that D^2 f(x∗) is negative semidefinite.

3. If f is twice continuously differentiable, then the first order necessary condition is also a sufficient condition for a strict local maximum x∗ if D^2 f(x∗) is negative definite.

4.
If f is continuously differentiable and concave, then the first order necessary condition for a local maximum is also a sufficient condition for a global maximum.

5. If f is continuously differentiable and strictly concave, and Df(x∗) = 0, then x∗ is the unique maximizer of f.

6. If f is twice continuously differentiable and concave, then, if x∗ solves Df(x∗) = 0 and D^2 f(x∗) is negative definite, x∗ is the unique maximizer.

17.2 Constrained

Theorem 17.1 (Local-Global Theorem). Let f : X → R be a continuous function, where X is an open subset of R^N. Suppose C ⊂ X is convex and compact, and f is quasiconcave. Then every local maximum is a global maximum.

17.2.1 Classic KKT

Let f : R^N_+ → R and g : R^N_+ → R^J be continuously differentiable functions. Consider the following problem:

max f(x) over x ∈ R^N_+ subject to g(x) ≤ 0.

Notice that x comes from R^N_+, and so implicitly we have nonnegativity constraints. The classic KKT conditions are:

D_x f(x∗) − λ∗ D_x g(x∗) ≤ 0,  x∗ ≥ 0  (17.1)

x∗ (D_x f(x∗) − λ∗ D_x g(x∗)) = 0  (17.2)

g(x∗) ≤ 0,  λ∗ ≥ 0  (17.3)

λ∗ g(x∗) = 0  (17.4)

Suppose x∗ is a solution to the maximization problem. Then the KKT conditions are necessary conditions (and there exists an associated λ∗) if any one of the following is true:

1. The Jacobian of the constraints that bind at x∗ has full rank (NDCQ).

2. The constraints are affine (see footnote 10).

3. The constraint functions g_j are convex, and there exists an interior point of the constraint set, i.e. there exists x̃ ∈ R^N_+ such that g_j(x̃) < 0 for all j. This is the Slater condition.

4. The constraint functions g_j are quasiconvex, the constraint set has a nonempty interior (the Slater condition), and, for any j such that g_j is not convex, D_x g_j(x) ≠ 0 for all x ∈ R^N_+. This is a weakening of the previous item that rules out pesky stationary points of the constraint functions.

Suppose (x∗, λ∗) is a solution of the KKT conditions.
Then x∗ is a maximizer, that is, the KKT conditions are sufficient conditions, if g_j is quasiconvex for every j and any one of the following is true:

1. f is concave.

2. f is twice continuously differentiable, quasiconcave, and D_x f(x∗) ≠ 0.

3. f is quasiconcave, and one of the following holds:

(a) ∂f(x∗)/∂x_i < 0 for some i;

(b) ∂f(x∗)/∂x_i > 0 for some i such that there exists x̃ ∈ R^N_+ with x̃_i > 0.

Footnote 10: A function F is linear if F(ax + by) = aF(x) + bF(y). An affine function is a linear function with an added constant. For example, F(x) = 12x is linear (and affine), but F(x) = 12x + 3 is not linear, though still affine.

17.2.2 Modern KKT

Let f : X → R and g : X → R^J be continuously differentiable functions, where X ⊂ R^N is open. Consider the following problem:

max f(x) over x ∈ X subject to g(x) ≤ 0.

Any nonnegativity constraints should be included in the set of inequality constraints explicitly. Define the constraint set by C ≡ {x ∈ X : g(x) ≤ 0}. The modern KKT conditions are:

D_x f(x∗) − λ∗ D_x g(x∗) = 0  (17.6)

g(x∗) ≤ 0  (17.7)

λ∗ ≥ 0  (17.8)

λ∗ g(x∗) = 0  (17.9)

If every reference to the classic KKT conditions is replaced with the modern KKT conditions in the section on Classic KKT, then the results there apply here.

17.2.3 KKT - mixed constraints

Let f : R^N → R, g : R^N → R^J, h : R^N → R^K be continuously differentiable functions. Consider the following problem:

max f(x) over x ∈ R^N subject to g_j(x) ≤ 0, h_k(x) = 0.

Note also that an equality constraint h_k(x) = 0 could be replaced by two inequality constraints g_j(x) ≤ 0 and g_{j′}(x) ≤ 0, where g_j = h_k and g_{j′} = −h_k. Thus, the results below are just appropriate restatements of the results in the classic KKT section. The mixed KKT conditions are:

D_x f(x∗) − λ∗ D_x g(x∗) − µ∗ D_x h(x∗) = 0  (17.10)

g(x∗) ≤ 0,  λ∗ ≥ 0,  λ∗ g(x∗) = 0  (17.11)

h(x∗) = 0  (17.12)

Suppose x∗ is a solution to the maximization problem.
Then the mixed KKT conditions are necessary conditions (and there exist associated λ∗ and µ∗) if any one of the following is true:

1. The Jacobian of the equality constraints and the binding inequality constraints at x∗ has full rank (NDCQ).

2. The constraints are affine.

Suppose (x∗, λ∗, µ∗) is a solution of the mixed KKT conditions. Then x∗ is a maximizer, that is, the KKT conditions are sufficient conditions, if g_j is quasiconvex for every j, h_k is linear for every k, and any one of the following is true:

1. f is concave.

2. f is twice continuously differentiable, quasiconcave, and D_x f(x∗) ≠ 0.

3. f is quasiconcave, and one of the following holds:

(a) ∂f(x∗)/∂x_i < 0 for some i;

(b) ∂f(x∗)/∂x_i > 0 for some i such that there exists x̃ ∈ R^N_+ with x̃_i > 0.

17.2.4 Saddlepoint Theorem

Let f : X → R and g : X → R^J be functions on an arbitrary set X. Define the Lagrangian function L(x, λ) ≡ f(x) − λg(x), where λ ∈ R^J_+.

1. If (x∗, λ∗) is a saddle point of L, then x∗ is a constrained maximizer, and λ∗ g(x∗) = 0.

2. Suppose X ⊂ R^N is convex, f is concave, and g_j is convex for each j. If x∗ is a constrained maximizer of f subject to g ≤ 0 and there exists x̃ such that g(x̃) < 0 (the Slater condition), then there exists λ∗ ∈ R^J_+ such that (x∗, λ∗) is a saddle point of L.

18 References

George Dantzig. Linear Programming and Extensions. 1963. (A pdf version is available for free from the RAND Corporation: http://www.rand.org/pubs/reports/R366.html)

Michael Intriligator. Mathematical Optimization and Economic Theory. 1971.