Part II
Static Optimization
We will study here techniques that fall under the category of (static) nonlinear programming. While these techniques still apply to the subdomain of linear programming, there exist stronger results for that domain that we will not explore. Good references for linear programming include Dantzig and Intriligator.
8 Statement of the Problem
The general optimization problem (for our purposes) consists of an objective function, assumed to be real-valued, together with a set of inequality constraints and equality constraints.
Given θ ∈ Θ, the problem is to find x ∈ X that solves
max f (x, θ)    (8.1)

subject to the inequality constraints

gj (x, θ) ≤ 0,   1 ≤ j ≤ J    (8.2)

and the equality constraints

hk (x, θ) = 0,   1 ≤ k ≤ K.    (8.3)
If we define the constraint set as

C(θ) ≡ X ∩ (⋂_{1≤j≤J} {x : gj (x, θ) ≤ 0}) ∩ (⋂_{1≤k≤K} {x : hk (x, θ) = 0})    (8.4)
then the problem can be more compactly written as

max_{x∈C(θ)} f (x, θ).    (8.5)
A solution to equation (8.5) is a global maximizer.
Definition 8.1. A point x∗ ∈ C(θ) is a global maximizer (or just maximizer) for the maximization problem (8.5) if for all x ∈ C(θ), f (x∗, θ) ≥ f (x, θ). It is a strict global maximizer if for all x ∈ C(θ), x ≠ x∗, f (x∗, θ) > f (x, θ). The definition of (strict) global minimizer has the inequality reversed.
While not necessarily a solution to the maximization problem, local maximizers are interesting candidates for solutions since global maximizers are necessarily local maximizers.
Definition 8.2. A point x∗ ∈ C(θ) is a local maximizer for the maximization problem (8.5) if there exists an open⁶ neighborhood U ⊂ C(θ) of x∗ such that for all x ∈ U, f (x∗, θ) ≥ f (x, θ). It is a strict local maximizer if for all x ∈ U, x ≠ x∗, f (x∗, θ) > f (x, θ). The definition of (strict) local minimizer has the inequality reversed.
Suppose x∗ ∈ C(θ) is a solution of equation (8.5) (where the notation for the dependence of x∗ on θ has been suppressed). Then the value function can be defined as follows: V (θ) ≡ max_{x∈C(θ)} f (x, θ) = f (x∗, θ).
Consider a statement of the following sort: “If x∗ ∈ C(θ) solves (8.5), then condition
A”, where A is a mathematical statement. This type of statement describes a necessary
condition for a maximizer. Suppose instead we have a statement of the following sort: “If
condition A, then x∗ ∈ C(θ) solves (8.5)”. This type of statement describes a sufficient
condition for a maximizer. A necessary condition furnishes a set of potential solutions and
guarantees that any solution is a member of this set. A sufficient condition furnishes a set
of guaranteed solutions but potentially excludes some solutions. Sufficient conditions can be
viewed as existence theorems.
9 Existence of Optima
Theorem 9.1 (Finite Constraint Set). If the constraint set C(θ) is nonempty and finite,
then the objective function f has both a maximizer and a minimizer in the constraint set.
Theorem 9.2 (Weierstrass Extreme Value Theorem). If the constraint set C(θ) is nonempty
and compact and f is continuous, then f has both a maximizer and a minimizer in the
constraint set.
Proof. Since continuous functions map compact sets to compact sets (see Theorem 5.9),
V ≡ f (C(θ)) ⊂ R is a compact set. By the Heine-Borel Theorem, V is closed and bounded.
But any closed and bounded subset of R contains its least upper bound, and thus has a
maximal value. Then, there exists some x ∈ C(θ) that maps to the maximal member of the
set V , and so x is a maximizer. The proof for the case of the existence of a minimizer is
analogous.
⁶The relevant topology for the problem is the relative topology of C(θ) derived from that of X, both of which can be generated from the metric of the space X.
Corollary 9.3. If X ≡ RN and the constraint set C(θ) is nonempty, closed and bounded,
and if f is continuous, then f has both a maximizer and a minimizer.
Example 9.1. The following are examples illustrating the role of the assumptions of the
Weierstrass Theorem.
1. Suppose C(θ) = R, which is nonempty but not compact (since it is not bounded).
Then the continuous function f (x) = x has no maximizer. Also if C(θ) = (0, 1) which
is bounded but not closed and so not compact, then f has no maximizer. But if
C(θ) = (0, 1], which is also nonempty and not compact, then our previously defined
continuous function has a maximizer.
2. Suppose C(θ) = [−1, 1], which is nonempty and compact. Then the discontinuous function
   f (x) = 1 − |x| if x ≠ 0, and f (x) = 0 if x = 0
   has no maximizer.
3. Suppose we have a discontinuous function f on R such that f (x) = 1 when x is rational,
and f (x) = 0 when x is irrational. Then f has a maximizer when C(θ) ≡ R, which is
a noncompact set.
The condition of continuity of the objective function can be weakened to yield a generalization of the Weierstrass Theorem.
Definition 9.1 (Level Sets). Let f : X → R be a function on some space X.
The level set of f at α (also termed the contour or isoquant) is the set I(α; f ) ≡
{x ∈ X : f (x) = α}.
The upper level set of f at α (also termed the upper contour) is the set U (α; f ) ≡
{x ∈ X : f (x) ≥ α}.
The lower level set of f at α (also termed the lower contour) is the set L(α; f ) ≡
{x ∈ X : f (x) ≤ α}.
When clear from context, I will denote upper and lower level sets of a function without reference to the function.
Definition 9.2 (Semicontinuity). Let (X, ρ) be a metric space. A function f : X → R is
upper semicontinuous if for all α ∈ R, the upper level set U (α) is closed. It is lower
semicontinuous if for all α ∈ R, the lower level set L(α) is closed. A function f is
continuous if and only if it is both upper and lower semicontinuous.
Theorem 9.4 (Generalized Weierstrass Theorem). Suppose the constraint set C(θ) is compact. If f is upper semicontinuous, then it has a maximizer in the constraint set. If f is
lower semicontinuous, then it has a minimizer.
10 Convex Sets and Functions on Convex Sets
Before we dig deeper into necessary or sufficient conditions for optimizers, we will define and understand the properties of four special classes of functions: quasiconcave, concave, quasiconvex, and convex functions.
Suppose V is a vector space, for example RN .
Definition 10.1 (Convex Set). A set S ⊂ V is convex if for all x, y ∈ S, λx + (1 − λ)y ∈ S for all λ ∈ (0, 1). A set is strictly convex if for all distinct x, y ∈ S, λx + (1 − λ)y ∈ int S for all λ ∈ (0, 1).
The empty set is assumed to be convex.
Proposition 10.1. The intersection of an arbitrary family of convex sets is convex.
Definition 10.2 (Convex Hull). The convex hull of a set S ⊂ V, denoted cvx S, is the smallest (under the set inclusion order) convex set that contains S. Equivalently, it is the intersection of all convex sets that contain S.
Definition 10.3 (Concavity and Convexity). Let f be a real-valued function on a convex
subset S of a vector space V .
The function f is concave if for all distinct x, y ∈ S, f (λx + (1 − λ)y) ≥ λf (x) + (1 −
λ)f (y) for all λ ∈ (0, 1).
The function f is strictly concave if for all distinct x, y ∈ S, f (λx + (1 − λ)y) >
λf (x) + (1 − λ)f (y) for all λ ∈ (0, 1).
The function f is convex if for all distinct x, y ∈ S, f (λx+(1−λ)y) ≤ λf (x)+(1−λ)f (y)
for all λ ∈ (0, 1).
The function f is strictly convex if for all distinct x, y ∈ S, f (λx + (1 − λ)y) <
λf (x) + (1 − λ)f (y) for all λ ∈ (0, 1).
Definition 10.4 (Quasiconcavity and Quasiconvexity). Let f be a real-valued function on
a convex subset S of a vector space V .
The function f is quasiconcave if for all distinct x, y ∈ S, f (λx + (1 − λ)y) ≥
min{f (x), f (y)} for all λ ∈ (0, 1).
The function f is strictly quasiconcave if for all distinct x, y ∈ S, f (λx + (1 − λ)y) >
min{f (x), f (y)} for all λ ∈ (0, 1).
The function f is quasiconvex if for all distinct x, y ∈ S, f (λx + (1 − λ)y) ≤
max{f (x), f (y)} for all λ ∈ (0, 1).
The function f is strictly quasiconvex if for all distinct x, y ∈ S, f (λx + (1 − λ)y) <
max{f (x), f (y)} for all λ ∈ (0, 1).
Concavity and convexity could also be defined in terms of hypographs and epigraphs.
Definition 10.5 (Graph, Hypograph, Epigraph). The graph of a function f : X → R,
where X is a convex set, is the set G(f ) ≡ {(x, α) : f (x) = α} ⊂ X × R. The hypograph is
H(f ) ≡ {(x, α) : f (x) ≥ α} ⊂ X ×R. The epigraph is E(f ) ≡ {(x, α) : f (x) ≤ α} ⊂ X ×R.
Proposition 10.2. Suppose we have a function f : X → R, where X is a convex set.
The function f is concave if its hypograph H(f ) is convex, and is strictly concave if its
hypograph is strictly convex.
The function is convex if its epigraph E(f ) is convex, and is strictly convex if its epigraph
is strictly convex.
Quasiconcavity and quasiconvexity could also be defined in terms of level sets.⁷
Proposition 10.3. A function f is quasiconcave if and only if for all α ∈ R, U (α) is convex.
A function f is strictly quasiconcave if and only if for all α ∈ R, U (α) is strictly convex.
A function f is quasiconvex if and only if for all α ∈ R, L(α) is convex.
A function f is strictly quasiconvex if and only if for all α ∈ R, L(α) is strictly convex.
Notice that the definitions of these properties do not require continuity or differentiability.
In fact, the weak versions of these properties do not require a topology on the space. The
strict version of these properties (for example, strict quasiconcavity) does require the vector
space to have a norm, however, because our definition of a strictly convex set makes reference
to the interior of the set, which is a topological concept. If we strengthen the assumptions to
include differentiability (of varying degrees), we can obtain alternative conditions that are
necessary or sufficient for these properties. We shall see this below.
It is straightforward to show that every concave function is quasiconcave and every convex function is quasiconvex. The converse is not true. For example, any monotonic function is both quasiconcave and quasiconvex, but only affine functions are both concave and convex. Positive monotonic transformations of a concave (convex) function do not necessarily preserve concavity (convexity), but they do preserve quasiconcavity (quasiconvexity).
⁷The definition in terms of level sets is the one put forth by Arrow and Enthoven (Econometrica 1961).
Proposition 10.4. Let f : S → R be a quasiconcave (quasiconvex) function. Then, for any
nondecreasing function g : R → R, g ◦ f is quasiconcave (quasiconvex).
Example 10.1. Suppose f : R+ → R is defined by f (x) = √x. This function is strictly concave. Now, suppose g : R → R is defined by g(x) = x⁴, which is a non-decreasing function. Notice that g ◦ f (x) = x², which is a strictly convex function. Thus, concavity is not preserved. However, both f and g ◦ f are quasiconcave.
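A quick numerical check of Example 10.1 (a minimal sketch in Python; the points chosen are arbitrary and purely illustrative):

# Sketch: g∘f(x) = x^2 violates the concavity inequality of Definition 10.3 but
# satisfies the quasiconcavity inequality of Definition 10.4 on R+.
gf = lambda x: x ** 2
x, y, lam = 0.0, 2.0, 0.5
mid = gf(lam * x + (1 - lam) * y)                 # gf(1) = 1
print(mid >= lam * gf(x) + (1 - lam) * gf(y))     # False: 1 < 2, so gf is not concave
print(mid >= min(gf(x), gf(y)))                   # True: the quasiconcavity inequality holds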
A natural question to ask is whether every quasiconcave function is just an increasing monotonic transformation of some concave function. The answer is no; the following is an example from Arrow and Enthoven (Econometrica 1961).
Example 10.2. Suppose f (x, y) = (x − 1) + ((x − 1)² + 4(x + y))^{1/2}. The level sets of f are nonparallel straight lines (a Grapher file of the function and level sets is available here: https://www2.bc.edu/samson-alva/ec720f11/arrowQCexample.gcx).
Another example is originally from Aumann (Econometrica 1975). Suppose f (x, y) = y + √(x + y²). Again, the level sets of f are nonparallel straight lines (a Grapher file of the function and level sets is available here: https://www2.bc.edu/samson-alva/ec720f11/aumannQCexample.gcx). Notice that f is strictly concave when restricted to either the first or the second dimension, but linearity of the level sets implies that it is only weakly quasiconcave.
Philip Reny (2010) proves that a continuous quasiconcave function cannot be transformed
by a strictly increasing function into a concave function unless it has parallel level sets (his
result is actually even stronger than this). The two examples above are such continuous
quasiconcave functions.
Afriat’s Theorem states that for any finite set of choices satisfying the Generalized Axiom
of Revealed Preference there exists a continuous strictly increasing concave utility function
that would generate those choices.
For more details on concavifiability of quasiconcave functions, see the extensive discussion
in Connell, Rasmusen (2011).
Proposition 10.5. Here are some useful results about quasiconcave and concave functions:
1. If f is strictly concave, and h is strictly increasing, then h ◦ f is strictly quasiconcave.
2. If f is strictly quasiconcave and h is strictly increasing, then h ◦ f is strictly quasiconcave.
3. If f is strictly quasiconcave and h is nondecreasing, then h ◦ f is weakly quasiconcave.
4. If f is weakly but not strictly quasiconcave and h is nondecreasing, then h ◦ f is weakly
quasiconcave.
5. If f is weakly but not strictly quasiconcave and h is strictly increasing, then h ◦ f is
NOT necessarily strictly quasiconcave.
Proposition 10.6. Here are some useful results about quasiconvex and convex functions:
1. If each fi is quasiconvex and wi ≥ 0, then f ≡ maxi {wi fi } is quasiconvex.
2. If each fi is convex, then maxi {fi } is convex.
3. If f, g are convex, and g is nondecreasing, then g ◦ f is convex.
4. If f is convex, and g is concave and nonincreasing, then g ◦ f is concave.
Now, let’s make some assumptions about differentiability.
Theorem 10.7. Let X ⊂ RN , and suppose f : X → R, f ∈ C 1 .
1. f is concave if and only if, for all x, y ∈ X, f (y) − f (x) ≤ Df (x)(y − x).
2. f is strictly concave if and only if, for all x, y ∈ X, y ≠ x, f (y) − f (x) < Df (x)(y − x).
3. f is convex if and only if, for all x, y ∈ X, f (y) − f (x) ≥ Df (x)(y − x).
4. f is strictly convex if and only if, for all x, y ∈ X, y ≠ x, f (y) − f (x) > Df (x)(y − x).
5. f is quasiconcave if and only if, for all x, y ∈ X, f (y) ≥ f (x) implies Df (x)(y − x) ≥ 0.
6. If, for all x, y ∈ X, y ≠ x, f (y) ≥ f (x) implies Df (x)(y − x) > 0, then f is strictly quasiconcave. The converse is not true, as discussed below.
7. f is quasiconvex if and only if, for all x, y ∈ X, f (y) ≤ f (x) implies Df (x)(y − x) ≤ 0.
8. If, for all x, y ∈ X, y ≠ x, f (y) ≤ f (x) implies Df (x)(y − x) < 0, then f is strictly quasiconvex. The converse is not true, as discussed below.
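As a quick illustration of item 1, the following is a minimal numerical sketch (in Python with NumPy; the concave function f (x) = −‖x‖² is an assumption chosen for illustration) that checks the gradient inequality at randomly drawn points.

# Sketch: check f(y) - f(x) <= Df(x)(y - x) (Theorem 10.7, item 1) for the concave
# function f(x) = -||x||^2. Uses NumPy; purely illustrative.
import numpy as np

f = lambda x: -np.dot(x, x)
Df = lambda x: -2 * x                      # gradient of f

rng = np.random.default_rng(0)
checks = []
for _ in range(1000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    checks.append(f(y) - f(x) <= Df(x) @ (y - x) + 1e-12)
print(all(checks))                         # expected: True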
Theorem 10.8. Let X ⊂ RN , and suppose f : X → R, f ∈ C 2 .
1. f is concave if and only if for all x, D2 f (x) is negative semidefinite.
2. If, for all x, D2 f (x) is negative definite, then f is strictly concave.
3. f is convex if and only if for all x, D2 f (x) is positive semidefinite.
4. If, for all x, D2 f (x) is positive definite, then f is strictly convex.
5. f is quasiconcave if and only if for all x, D2 f (x) is negative semidefinite on the nullspace⁸ of Df (x).
6. If, for all x, D2 f (x) is negative definite on the nullspace of Df (x), then f is strictly
quasiconcave.
7. f is quasiconvex if and only if for all x, D2 f (x) is positive semidefinite on the nullspace
of Df (x).
8. If, for all x, D2 f (x) is positive definite on the nullspace of Df (x), then f is strictly
quasiconvex.
There is a characterization of (semi)definite matrices involving determinants.
Definition 10.6 (Principal Minors). Let A be a real-valued, symmetric N × N matrix. Then a principal minor of order m of the matrix A is a submatrix of A where all but m rows and the corresponding (by index) columns are deleted. There are N!/(m!(N − m)!) principal minors of order m.
The leading principal minor of order m is the principal minor of order m with the last N − m rows and columns deleted.
The following theorems characterize (semi)definiteness of a symmetric matrix.
Theorem 10.9 (Characterization of Definiteness). Suppose A is a real-valued, symmetric N × N matrix.
1. A is negative definite if and only if the determinant of the leading principal minor of order m is nonzero and has the sign (−1)^m, for all 1 ≤ m ≤ N.
2. A is positive definite if and only if the determinant of the leading principal minor of order m is strictly positive, for all 1 ≤ m ≤ N.
3. A is negative semidefinite if and only if the determinant of every principal minor of order m is zero or has the sign (−1)^m, for all 1 ≤ m ≤ N, i.e., odd-ordered principal minors have nonpositive determinants and even-ordered principal minors have nonnegative determinants.
4. A is positive semidefinite if and only if the determinant of every principal minor of order m is nonnegative, for all 1 ≤ m ≤ N.
⁸The nullspace of a vector is the set of all vectors that are orthogonal to it. The nullspace of a matrix is the set of all vectors that the matrix maps to the zero vector.
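As a concrete illustration of Theorem 10.9, here is a minimal numerical sketch (in Python with NumPy, an assumption about available tools; it is not a numerically robust test) that checks definiteness of a symmetric matrix via the determinants of its leading principal minors.

# Sketch: definiteness of a symmetric matrix via leading principal minors (Theorem 10.9).
import numpy as np

def leading_principal_minor_dets(A):
    # det of the submatrix keeping the first m rows and columns, for m = 1, ..., N
    N = A.shape[0]
    return [np.linalg.det(A[:m, :m]) for m in range(1, N + 1)]

def is_negative_definite(A, tol=1e-12):
    # negative definite iff the order-m leading principal minor has sign (-1)^m
    dets = leading_principal_minor_dets(A)
    return all(d * (-1) ** (m + 1) < -tol for m, d in enumerate(dets, start=1))

def is_positive_definite(A, tol=1e-12):
    # positive definite iff every leading principal minor is strictly positive
    return all(d > tol for d in leading_principal_minor_dets(A))

A = np.array([[-2.0, 1.0], [1.0, -2.0]])   # Hessian of a strictly concave quadratic
print(is_negative_definite(A))             # True: the minors are -2 (< 0) and 3 (> 0)
print(is_positive_definite(-A))            # True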
Checking definiteness of a matrix on a subspace requires using the Bordered Matrix test, where the matrix in question is bordered above and on the left by the constraints.
Definition 10.7 (Bordered Matrix). Let A be a real-valued symmetric N × N matrix and bk ∈ RN for k ∈ {1, . . . , K} a set of independent vectors. Let B be the N × K matrix (b1 · · · bK), and denote by B′ the transpose of B. Then, the bordered matrix of A with respect to B is

H ≡ [ 0   B′ ]
    [ B   A  ]

where 0 denotes the K × K zero matrix.
Definition 10.8 (Border-Respecting Principal Minors). Let A be a real-valued, symmetric
N ×N matrix, and B be a real-valued N ×K matrix of full rank, and denote by H the bordered
matrix of A with respect to B. Then a border-respecting principal minor of order m
of the bordered matrix H is a submatrix of H where all but m rows and corresponding (by
index) columns are deleted, with the restriction that the index of a deleted row (and column)
be greater than K.
The leading border-respecting principal minor of order m is the principal minor
of order m with the last N + K − m rows and columns deleted.
Theorem 10.10 (Characterization of Definiteness on a Linear Constraint Set). Suppose A is a real-valued symmetric N × N matrix and bk ∈ RN for k ∈ {1, . . . , K} a set of independent vectors. Let B be the N × K matrix (b1 · · · bK), and denote by B′ the transpose of B. Define the bordered matrix

H ≡ [ 0   B′ ]
    [ B   A  ]
1. A is negative definite on the subspace {v : B′v = 0} if and only if for each m ∈ {2K + 1, . . . , N + K}, the determinant of the leading border-respecting principal minor of order m of the matrix H is nonzero and has the sign (−1)^(m−K), i.e., the determinant of H has the sign (−1)^N and the last (largest) N − K leading border-respecting principal minors have alternating signs.
2. A is positive definite on the subspace {v : B′v = 0} if and only if for each m ∈ {2K + 1, . . . , N + K}, the determinant of the leading border-respecting principal minor of order m of the matrix H is nonzero and has the sign (−1)^K.
The characterization of semidefiniteness on a linear constraint set involves testing every
border-respecting principal minor of every order m ∈ {2K + 1, . . . , N + K}, and not just
the border-respecting leading principal minors, analogous to the characterization of semidefiniteness of an unconstrained symmetric matrix.
Theorem 10.11 (Characterization of Semidefiniteness on a Linear Constraint Set). Suppose
we have A, bk , and B as in Theorem 10.10.
1. A is negative semidefinite on the subspace {v : B′v = 0} if and only if for each m ∈ {2K + 1, . . . , N + K}, the determinant of every border-respecting principal minor of order m alternates in sign or is equal to zero, with the sign of the determinant of H being (−1)^N or equal to zero, i.e., every border-respecting principal minor of order m has a nonpositive determinant if m − K is odd and a nonnegative determinant if m − K is even.
2. A is positive semidefinite on the subspace {v : B′v = 0} if and only if for each m ∈ {2K + 1, . . . , N + K}, the determinant of every border-respecting principal minor of order m is nonnegative if K is even and nonpositive if K is odd.
Therefore, to test for, say, quasiconcavity of a twice continuously differentiable function
f in the neighborhood of a point x, we need to find the Hessian of f evaluated at x, which
is a real-valued symmetric matrix, and check whether this Hessian, when bordered by the
Jacobian of f evaluated at x, passes the test of negative semidefiniteness of a matrix on a
linear subspace described in Theorem 10.11.
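To make this procedure concrete, here is a minimal numerical sketch (in Python with NumPy; the Cobb-Douglas objective and the finite-difference derivatives are assumptions chosen for illustration) of the bordered Hessian test at a point, with the Hessian of f bordered by the gradient of f (so K = 1) and the leading border-respecting principal minors of order m = 3, . . . , N + 1 checked for the sign pattern of Theorem 10.10.

# Sketch: bordered Hessian check at a point (Theorems 10.10/10.11 with K = 1).
import numpy as np

def gradient(f, x, h=1e-6):
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return g

def hessian(f, x, h=1e-4):
    n = len(x)
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei = np.zeros(n); ei[i] = h
            ej = np.zeros(n); ej[j] = h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej) - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

def bordered_hessian_test(f, x):
    # Sufficient condition for D2 f to be negative definite on the nullspace of Df at x:
    # the leading border-respecting principal minor of order m has sign (-1)^(m-1),
    # for m = 3, ..., N + 1.
    n = len(x)
    Df = gradient(f, x)
    H = np.zeros((n + 1, n + 1))
    H[0, 1:] = Df            # border row (the transposed gradient)
    H[1:, 0] = Df            # border column
    H[1:, 1:] = hessian(f, x)
    return all(np.linalg.det(H[:m, :m]) * (-1) ** (m - 1) > 0 for m in range(3, n + 2))

f = lambda x: x[0] ** 0.5 * x[1] ** 0.5                   # quasiconcave on the positive orthant
print(bordered_hessian_test(f, np.array([1.0, 2.0])))     # expected: True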
10.1 References
Afriat. The Construction of Utility Functions from Expenditure Data. (International
Economic Review 1967)
Arrow, Enthoven. Quasiconcave Programming. (Econometrica 1961)
Aumann. (Econometrica 1975)
Connell, Rasmusen. Concavifying the Quasiconcave. (Working Paper 2011)
Reny. A Simple Proof of the Nonconcavifiability of Functions with Linear Not-All-Parallel
Contour Sets. (Working Paper 2010)
11 Unconstrained Optimization With a Differentiable Objective Function
11.1 Overview
- FONC can be derived using a first-order Taylor expansion, which requires the objective
function to be continuously differentiable
- SONC can be derived using a second-order Taylor expansion, which requires the objective function to be twice continuously differentiable
- SOSC can be derived using a second-order Taylor expansion, which requires the objective function to be twice continuously differentiable
Also, see section 12.4 below for more on second-order conditions for unconstrained problems.
11.2 References
See Kim Border's notes on the calculus of one variable: http://www.hss.caltech.edu/~kcb/Notes/Max1.pdf
12 Classical Programming: Optimization with Equality Constraints
Let us focus on optimization problems where the domain of the objective and constraint functions is an open subset X of RN, with only equality constraints, i.e., J = 0 in equation (8.2).
12.1 Overview
- Introduce auxiliary variables, called multipliers, for each equality constraint, thereby
converting a constrained optimization problem to an unconstrained optimization problem
with a larger set of choice variables
- Show that the necessary conditions for maxima of the Lagrangian problem yield necessary conditions for maxima of the original problem
- Intuition based on the gradients of the objective function and the constraint function
- Interpretation of the multipliers. Nice article on Wikipedia: http://en.wikipedia.org/wiki/Lagrange_multiplier
- Explanation of the constraint qualification and the failure of the theory to find optima
when the CQ is violated
12.2 First Order Conditions
For illustrative purposes consider the problem with one equality constraint:
max f (x1 , x2 ) subject to h1 (x1 , x2 ) = 0
Figure 1: Graphical Depiction of a Constrained Optimization Problem
A geometric visualization of this problem is given in Figure 1. Note from Figure 1 that at the point x∗ = (x∗1, x∗2) the level curve of f and the constraint are tangent to each other, i.e. both have a common slope. We will explore this observation in order to find a characterization of the solution for this class of problems and its generalization to K constraints.
In order to find the derivative of the level curve at the optimum point x∗, recall from the implicit function theorem that for a function G(y1, y2) = c,

dy2/dy1 (ŷ) = − (∂G/∂y2 (ŷ))^{−1} ∂G/∂y1 (ŷ),

where ŷ is some point in the domain, if, on an open neighborhood of ŷ, G is continuously differentiable and ∂G/∂y2 (ŷ) is nonzero. In particular, according to Figure 1, dx2/dx1 (x̂) defined implicitly by f (x̂) ≡ f (x∗), and dx2/dx1 (x̂) defined implicitly by h(x̂) ≡ h(x∗), must be the same at x∗:

dx2/dx1 (x∗) = − (∂f/∂x1 (x∗)) / (∂f/∂x2 (x∗)) = − (∂h/∂x1 (x∗)) / (∂h/∂x2 (x∗)),

which after some rearrangement yields

λ∗ ≡ (∂f/∂x1 (x∗)) / (∂h/∂x1 (x∗)) = (∂f/∂x2 (x∗)) / (∂h/∂x2 (x∗)),    (12.1)
where λ∗ ∈ R denotes the common value of the two ratios at x∗. We are assuming that the ratios above do not have zero denominators, the assurance of which is the motivation for the constraint qualifications discussed later.
Now, rewrite equation (12.1) as two equations

∂f/∂x1 (x∗) − λ∗ ∂h/∂x1 (x∗) = 0    (12.2)

and

∂f/∂x2 (x∗) − λ∗ ∂h/∂x2 (x∗) = 0.    (12.3)

Together with the constraint equation

h(x1, x2) = c,    (12.4)

we have a system of three equations (12.2), (12.3), and (12.4) with three unknowns: (x∗1, x∗2, λ∗).
This system is equivalent to the first-order conditions for stationary points of the following function

L(x1, x2, λ) ≡ f (x1, x2) − λ(h(x1, x2) − c),    (12.5)
which we call the Lagrangian; we also call the term λ the Lagrange multiplier. Thus, for the case with two choice variables and one constraint, a maximizer (subject to a qualification) x∗ satisfies ∂L/∂x1 (x∗1, x∗2, λ∗) = 0, ∂L/∂x2 (x∗1, x∗2, λ∗) = 0, and ∂L/∂λ (x∗1, x∗2, λ∗) = 0, for some λ∗.
The Lagrange method transforms a constrained problem into an unconstrained problem via the formulation of the Lagrangian. The transformation introduces a Lagrange multiplier for every constraint. It is important to note that the transformation is valid only if at least one of ∂h/∂x1 (x∗) and ∂h/∂x2 (x∗) is nonzero. If not, then there is no way to define a multiplier, as should be clear from examining equation (12.1). This is called the non-degenerate constraint qualification. If the constraint is linear, this qualification will automatically be satisfied.
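As a worked illustration of the two-variable case, here is a minimal sketch (in Python using SymPy; the particular objective and constraint are assumptions chosen for illustration) that forms the Lagrangian (12.5) and solves the stationarity conditions.

# Sketch: stationary points of the Lagrangian for max x1*x2 subject to x1 + x2 = 2.
import sympy as sp

x1, x2, lam = sp.symbols('x1 x2 lam', real=True)
f = x1 * x2                    # objective
h = x1 + x2                    # constraint function, with c = 2
c = 2

L = f - lam * (h - c)          # Lagrangian, as in equation (12.5)
foc = [sp.diff(L, v) for v in (x1, x2, lam)]
print(sp.solve(foc, (x1, x2, lam), dict=True))   # expected: [{x1: 1, x2: 1, lam: 1}]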
We can mimic the steps above for an arbitrary problem with N choice variables and K
constraints, where K < N. Suppose we have a solution to the general constrained maximization problem with only equality constraints; denote it x∗ ∈ RN. Then it must be the case that h(x∗) = 0, where h is the K-dimensional vector function of constraints. Consider a linear approximation of this constraint function at x∗: the derivative (the Jacobian) of this linear approximation will be Dx h(x∗), according to the Taylor Approximation Theorem for a first-order approximation, assuming h is continuously differentiable at x∗. This Jacobian matrix is a K × N matrix, where a generic term is ∂hk/∂xn, and when each row is viewed as
a vector, the K vectors span a subspace of RN . Each row of the Jacobian is a vector (the
transpose of the gradient vector of the associated constraint) that has an associated (N − 1)-dimensional nullspace (as long as this vector is non-degenerate), which is the tangent plane
to the associated constraint at x∗ , the linear approximation of the constraint function at x∗ .
Then, the K rows define K such subspaces, and for a vector to satisfy all these constraints,
the vector must be in every one of these subspaces. If the row vectors of the Jacobian are
linearly independent, then the nullspace of the Jacobian is exactly the subspace of vectors
that are in the tangent plane of each constraint, a subspace of (N − K) dimensions.
Now, given the gradient vector of the objective function Dx f (x∗ ), consider the subspace
orthogonal to the gradient. This hyperplane is a linear approximation of the level set of the
objective function at the point x∗ i.e. any local movement from x∗ within this hyperplane
will not change the value of the objective function. So, if x∗ is a maximum, it must be the
case that any local move from x∗ that does not locally violate the constraints i.e. movement
within the nullspace of the full-rank Jacobian Dx h(x∗ ), does not change the value of the
objective, and thus we can conclude that the nullspace of Dx h(x∗ ) must be contained by
the nullspace of Dx f (x∗). But this means that the vector Dx f (x∗) is orthogonal to the nullspace of Dx h(x∗), and so it can be expressed as a linear combination of the rows of this matrix. Thus,
we can conclude that at a maximum, there exist K constants λ∗k, 1 ≤ k ≤ K, such that

Dx f (x∗) = Σ_k λ∗k Dx hk (x∗).
Combined with the constraints h, we have N + K equations that must be satisfied by the N + K unknowns, the maximizer x∗ and the multipliers λ∗, under the qualification that Dx h(x∗)
has full rank. Notice, too, that the argument above is exactly the same if x∗ is a minimum.
Thus, these necessary conditions hold for both maxima and minima. The results are formally
stated below.
Definition 12.1 (Nondegenerate Constraint Qualification). The functions hk , 1 ≤ k ≤ K
satisfy the Nondegenerate Constraint Qualification (NDCQ) at x∗ if the rank of the
Jacobian Dx h(x∗ ) is K i.e. the Jacobian has full rank.
Theorem 12.1 (First Order Necessary Conditions). Let f and h be continuously differentiable functions. Suppose x∗ is a constrained maximum or minimum of f , where the
constraint set is defined by {x ∈ X : h(x) = 0}, and suppose that the constraints satisfy the
NDCQ at x∗ . Define L(x, λ) ≡ f (x) − λh(x) to be the associated Lagrangian. Then there
exists λ∗ ∈ RK such that:

∂L/∂xn (x∗, λ∗) = ∂f/∂xn (x∗) − Σ_k λ∗k ∂hk/∂xn (x∗) = 0

and

∂L/∂λk (x∗, λ∗) = −hk (x∗) = 0

for every n ∈ {1, . . . , N}, k ∈ {1, . . . , K}, which equivalently means that (x∗, λ∗) is a stationary point of the Lagrangian.
12.3 Meaning of the Multiplier
Theorem 12.2 (Meaning of the Multiplier). Let f and hk be C 1 functions with domain in RN and let θk ∈ R. Consider the maximization problem: max f (x) subject to hk (x) = θk, k ∈ {1, . . . , K}.
Suppose x∗ is a constrained maximizer, λ∗ the associated Lagrange multipliers, and suppose x∗ and λ∗ are C 1 functions of θ. Then,

∂f (x∗(θ))/∂θk = λ∗k (θ).
From Theorem 12.2 we conclude that the Lagrange multiplier can be interpreted as the
change in the maximum achieved by the objective function if we slightly relax (tighten) the
corresponding constraint. Consider for example the case in which f is a utility function with
two arguments x1 and x2 , with h(x1 , x2 ) = I representing the budget constraint given an
income of I. The Lagrange multiplier associated with the budget constraint is equivalent to
the marginal utility of income.
The principle behind this result is the envelope theorem, which states that only the direct
effect of an exogenous parameter on the objective function matters when studying the total
effect of a change on the optimal value. The change in the parameter also induces a change
in the endogenous choice variables, but optimality requires that the first-order effect of a
change in the endogenous variables will have no effect on the value of the objective.
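The following is a minimal numerical sketch of this interpretation (in Python with SymPy; the log utility, the prices, and the linear budget are assumptions chosen for illustration): the multiplier obtained from the first-order conditions is compared with the derivative of the value function with respect to income.

# Sketch: the multiplier equals the marginal utility of income (Theorem 12.2).
import sympy as sp

x1, x2, lam, I = sp.symbols('x1 x2 lam I', positive=True)
u = sp.log(x1) + sp.log(x2)           # utility
budget = x1 + 2 * x2 - I              # prices (1, 2) and income I

L = u - lam * budget
sol = sp.solve([sp.diff(L, x1), sp.diff(L, x2), sp.diff(L, lam)],
               (x1, x2, lam), dict=True)[0]
V = u.subs({x1: sol[x1], x2: sol[x2]})          # value function V(I)
print(sp.simplify(sp.diff(V, I) - sol[lam]))    # expected: 0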
12.4 Second Order Conditions

12.4.1 Unconstrained Optimization
As discussed in section 10, a concave function f, when twice continuously differentiable, is characterized by the result that it has a negative semidefinite Hessian, i.e., Dx2 f is negative semidefinite at every point of the domain. There is no such characterization of a strictly concave function, but a negative definite Hessian on the domain of the function is a sufficient condition for strict concavity.
We can easily define local versions of these concepts at a point x, by weakening the requirement that the property of the Hessian holds on the whole domain to holding on some open neighborhood of the point x. Then, if x is a local maximum, i.e., x is a maximum of f in an open neighborhood of x, then f (x) ≥ f (x′) for all x′ in the neighborhood, and with a continuously differentiable function f, this yields Df (x)(x′ − x) ≥ f (x′) − f (x). But this is exactly the condition for a continuously differentiable function to be concave, and so local concavity of f is a necessary condition for a local maximum. If we also know f is twice continuously differentiable, then we can conclude that the Hessian of f must be negative semidefinite on the neighborhood, given that f must be locally concave.
Suppose f is locally strictly concave on this open neighborhood; then x is a strict local maximum.⁹ Moreover, if f is twice continuously differentiable at x, then negative definiteness of Dx2 f on this neighborhood implies strict local concavity, and so a negative definite Hessian serves as a sufficient condition for a local maximum when combined with some necessary conditions for a maximum, such as the standard first-order conditions.
Definition 12.2 (Regular Maximum). For some twice continuously differentiable function
f : X → R, we call a local maximizer x∗ regular if Dx2 f (x∗ ) is negative definite on an open
neighborhood of x∗ .
Notice that from the arguments above, a regular maximum is a strict local maximum.
However, the converse need not be true, as should be clear from the following example.
Example 12.1 (A strict local maximizer that is not regular). Suppose f (x) = −x⁴. Then, f is twice continuously differentiable, with a strict (local) maximum at x = 0. However, f ′′(x) = −12x² evaluates to 0 at x = 0, and so f ′′ is not negative definite at 0. However, f is strictly concave, and so we see that strict concavity does not imply the Hessian is negative definite, and as a consequence, a strict local maximum need not be regular.
12.4.2 Constrained Optimization
For constrained maximization problems, the intuition for second order conditions is similar to the unconstrained case. However, the constraints imply that local (strict) concavity at the optimum need only be tested on the tangent space of the constraint set at the optimum. Moreover, the relevant function is no longer the objective function, but the associated Lagrangian function, which is the function whose stationary points we actually compute.

⁹It may seem that the converse should also be true, but notice that the function f (x) = −|x|, which has a strict maximum at x = 0, is only concave, and not strictly concave, even locally at 0.
Theorem 12.3 (Second Order Sufficient Conditions). Suppose the functions f : X → R and h : X → RK are twice continuously differentiable. Let L ≡ f − λh be the Lagrangian function. Suppose x∗ ∈ RN and λ∗ ∈ RK are such that x∗ and λ∗ satisfy the first order conditions of Theorem 12.1 and the constraint h(x∗) = 0, and x∗ satisfies the NDCQ.
If Dx2 L(x∗ , λ∗ ) is negative definite on the subspace {v : Dx h(x∗ )v = 0}, then x∗ is a strict
local constrained maximum.
If Dx2 L(x∗ , λ∗ ) is positive definite on the subspace {v : Dx h(x∗ )v = 0}, then x∗ is a strict
local constrained minimum.
Theorem 12.4 (Second Order Necessary Conditions). Suppose the functions f : X → R and h : X → RK are twice continuously differentiable. Let L ≡ f − λh be the associated Lagrangian function, and suppose that x∗ ∈ RN satisfies the NDCQ.
If x∗ is a local constrained maximum, then there exists λ∗ ∈ RK such that Dx2 L(x∗ , λ∗ ) is
negative semidefinite on the subspace {v : Dx h(x∗ )v = 0}.
If x∗ is a local constrained minimum, then there exists λ∗ ∈ RK such that Dx2 L(x∗ , λ∗ ) is
positive semidefinite on the subspace {v : Dx h(x∗ )v = 0}.
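A minimal numerical sketch of the second-order check (in Python with NumPy and SciPy; the problem and the candidate point are assumptions chosen for illustration): the Hessian of the Lagrangian is restricted to a basis of the nullspace of the constraint Jacobian and its eigenvalues are inspected.

# Sketch: second-order condition on the constraint tangent space (Theorems 12.3/12.4).
# Problem: max x1*x2*x3 subject to x1 + x2 + x3 = 3; candidate x* = (1,1,1), lam* = 1.
import numpy as np
from scipy.linalg import null_space

Dh = np.array([[1.0, 1.0, 1.0]])       # Jacobian of h(x) = x1 + x2 + x3 - 3
D2L = np.array([[0.0, 1.0, 1.0],       # Hessian of L = x1*x2*x3 - lam*(x1 + x2 + x3 - 3)
                [1.0, 0.0, 1.0],       # evaluated at x* = (1, 1, 1)
                [1.0, 1.0, 0.0]])

Z = null_space(Dh)                      # basis of the subspace {v : Dh v = 0}
eigs = np.linalg.eigvalsh(Z.T @ D2L @ Z)
print(eigs)                             # expected: both eigenvalues negative
print(np.all(eigs < 0))                 # so x* is a strict local constrained maximum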
13 Nonlinear Programming: The Karush-Kuhn-Tucker Approach
13.1 Overview
- The KKT method of dealing with inequality constraints: complementary slackness
- Show that the necessary conditions for maxima of the KKT problem yield necessary
conditions for maxima of the original problem
- Intuition based on gradients of the objective function and constraint function, and an
explanation of the constraint qualification, paying attention to the difference in the meaning
of the sign of the gradient of the constraint function, and hence the difference from the case
with equality constraints
- Failure of CQ is problematic
13.2 First Order Conditions
Theorem 13.1 (Karush-Kuhn-Tucker Theorem). Let f and g be continuously differentiable
functions. Suppose x∗ is a constrained maximum or minimum of f , where the constraint set
is defined by {x ∈ X : g(x) ≤ 0}. Denote by JB the subset of indices of the constraints that
bind (hold with equality) at x∗ , and suppose that these binding constraints satisfy the NDCQ
at x∗ . Define L(x, λ) ≡ f (x) − λg(x) to be the associated Lagrangian. Then there exists
λ∗ ∈ RJ+ such that:

∂L/∂xn (x∗, λ∗) = ∂f/∂xn (x∗) − Σ_j λ∗j ∂gj/∂xn (x∗) = 0

and

∂L/∂λj (x∗, λ∗) = −gj (x∗) ≥ 0,   λ∗j ≥ 0,   λ∗j ∂L/∂λj (x∗, λ∗) = 0

for every n ∈ {1, . . . , N}, j ∈ {1, . . . , J}.
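The following is a minimal sketch (in Python with NumPy; the objective, the single constraint, and the candidate point are assumptions chosen for illustration) that verifies the conditions of Theorem 13.1 at a candidate (x∗, λ∗).

# Sketch: verify the KKT conditions of Theorem 13.1 at a candidate point for
# max f(x) = -(x1 - 1)^2 - (x2 - 1)^2 subject to g(x) = x1 + x2 - 1 <= 0.
import numpy as np

Df = lambda x: np.array([-2 * (x[0] - 1), -2 * (x[1] - 1)])   # gradient of the objective
g = lambda x: x[0] + x[1] - 1                                  # inequality constraint, g <= 0
Dg = lambda x: np.array([1.0, 1.0])                            # gradient of the constraint

def kkt_holds(x, lam, tol=1e-8):
    stationarity = np.allclose(Df(x) - lam * Dg(x), 0, atol=tol)
    primal_feasible = g(x) <= tol
    dual_feasible = lam >= -tol
    complementary = abs(lam * g(x)) <= tol
    return stationarity and primal_feasible and dual_feasible and complementary

print(kkt_holds(np.array([0.5, 0.5]), 1.0))   # expected: True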
Suppose we have nonnegativity constraints on the choice variables. We can treat these nonnegativity constraints differently, as is done in the original Kuhn-Tucker Theorem.
Theorem 13.2 (Original Kuhn-Tucker Theorem). Let f and g be continuously differentiable
functions. Suppose x∗ is a constrained maximum or minimum of f , where the constraint set
is defined by {x ∈ X : g(x) ≤ 0, x ≥ 0}. Denote by JB the subset of indices of the constraints
that bind (hold with equality) at x∗ , and suppose that these binding constraints satisfy the
NDCQ at x∗ . Define L(x, λ) ≡ f (x) − λg(x) to be the associated Lagrangian. Then there
exists λ∗ ∈ RJ+ such that:

∂L/∂xn (x∗, λ∗) = ∂f/∂xn (x∗) − Σ_j λ∗j ∂gj/∂xn (x∗) ≤ 0,   x∗n ≥ 0,   x∗n ∂L/∂xn (x∗, λ∗) = 0

and

∂L/∂λj (x∗, λ∗) = −gj (x∗) ≥ 0,   λ∗j ≥ 0,   λ∗j ∂L/∂λj (x∗, λ∗) = 0

for every n ∈ {1, . . . , N}, j ∈ {1, . . . , J}.
13.3 Second Order Conditions
The second-order theorems for the case of equality constraints in the classical programming framework hold here, with binding constraints and any equality constraints of the
general nonlinear problem treated as the equality constraints in the classical programming
framework and the nonbinding constraints just ignored.
13.4 The Fritz John Theorem
The following theorem does not require a constraint qualification, which seems good,
but in some cases introduces many candidates, even when the corresponding KKT theorem
would yield sufficient conditions, such as the case of a concave objective.
Theorem 13.3 (Fritz John). Suppose f and g are continuously differentiable, as in the KKT Theorem 13.1. Suppose x∗ is a constrained maximizer of f subject to the constraints g(x) ≤ 0. Then there exist λ∗ ∈ RJ+ and γ∗ ∈ R, with at least one of γ∗, λ∗1, . . . , λ∗J not equal to 0, such that

γ∗ Df (x∗) − λ∗ Dg(x∗) = 0.
For more on this theorem, see Simon and Blume, p. 475.
14 The Saddle Point Theorem
Definition 14.1 (Saddle Point). Let f : X × Y → R. A point (x∗, y∗) ∈ X × Y is a saddle point of f if f (x, y∗) ≤ f (x∗, y∗) ≤ f (x∗, y) for all x ∈ X, y ∈ Y.
Lemma 14.1 (Interchangeability). Let f : X × Y → R, and let (x1, y1) ∈ X × Y and (x2, y2) ∈ X × Y be saddle points. Then (x1, y2) and (x2, y1) are also saddle points. Moreover, all saddle points have the same value.
Proof. We know that

f (x, y1) ≤ f (x1, y1) ≤ f (x1, y), for all x ∈ X, y ∈ Y

and

f (x, y2) ≤ f (x2, y2) ≤ f (x2, y), for all x ∈ X, y ∈ Y.

Setting y = y2 in the first line and x = x1 in the second gives f (x1, y1) ≤ f (x1, y2) ≤ f (x2, y2). Setting y = y1 in the second line and x = x2 in the first gives f (x2, y2) ≤ f (x2, y1) ≤ f (x1, y1). Combining, f (x1, y1) = f (x1, y2) = f (x2, y2) = f (x2, y1), so all saddle points have the same value. Then, for all x ∈ X, y ∈ Y,

f (x, y2) ≤ f (x2, y2) = f (x1, y2) = f (x1, y1) ≤ f (x1, y),

so (x1, y2) is a saddle point. Similarly, f (x, y1) ≤ f (x2, y1) ≤ f (x2, y) for all x ∈ X, y ∈ Y, so (x2, y1) is a saddle point.
Suppose we have the Lagrangian

L(x, λ) ≡ f (x) − λg(x),

where g : RN → RJ and λ ∈ RJ+.
Theorem 14.2 (Saddle Point Theorem). For any X ⊂ RN and any f, gj : X → R, if (x∗, λ∗) ∈ X × RJ+ is a saddle point of L, then x∗ maximizes f over X subject to gj (x) ≤ 0, and moreover λ∗j gj (x∗) = 0 for all j ∈ {1, . . . , J}.
Proof. Since (x∗, λ∗) is a saddle point, L(x∗, λ∗) ≤ L(x∗, λ) for all λ ≥ 0, and so f (x∗) − λ∗g(x∗) ≤ f (x∗) − λg(x∗), which implies λ∗g(x∗) ≥ λg(x∗) for all λ ≥ 0.
First, g(x∗) ≤ 0. If not, there exists j such that gj (x∗) > 0. Then choosing λ̃ with λ̃j > λ∗g(x∗)/gj (x∗) and λ̃j′ = 0 for all j′ ≠ j gives λ̃g(x∗) = λ̃j gj (x∗) > λ∗g(x∗), violating the inequality above. Thus, x∗ satisfies the constraints.
Also, setting λ = 0 gives λ∗g(x∗) ≥ 0. But λ∗ ≥ 0 and g(x∗) ≤ 0 imply λ∗g(x∗) ≤ 0, and so λ∗g(x∗) = 0. In fact, since each term is nonpositive, λ∗j gj (x∗) = 0 for every j.
Finally, L(x∗, λ∗) ≥ L(x, λ∗) for all x ∈ X, and so f (x∗) − λ∗g(x∗) ≥ f (x) − λ∗g(x). Since λ∗g(x∗) = 0, f (x∗) ≥ f (x) − λ∗g(x). If x satisfies g(x) ≤ 0, then λ∗g(x) ≤ 0, so f (x) − λ∗g(x) ≥ f (x). Thus, f (x∗) ≥ f (x) for all feasible x.
The converse of the Saddle Point Theorem 14.2 isn’t true in general. But there is a
partial converse result.
Theorem 14.3. Let X be a convex subset of RN. Let f : X → R be concave and gj : X → R be convex. Suppose there exists x̃ ∈ X such that gj (x̃) < 0 for all j ∈ {1, . . . , J}, a condition known as the Slater Constraint Qualification.
If x∗ is a constrained maximizer of f subject to g(x) ≤ 0, then there exists λ∗ ∈ RJ+ such that (x∗, λ∗) is a saddle point of L : X × RJ+ → R, L(x, λ) ≡ f (x) − λg(x).
15 The Hyperplane Theorems and the Farkas Lemma
Theorem 15.1 (Strictly Separating Hyperplane Theorem). Let X ⊂ RN be nonempty, closed, and convex. Let y ∈ RN with y ∉ X. Then there exist a ∈ RN and c ∈ R such that ax < c < ay for all x ∈ X.
Importance of assumptions:
1. X is closed: otherwise y could be a boundary point and then there are points arbitrarily
close to y, yielding a failure of the strict inequality (though of course we can still find
a ∈ RN , s.t. ax ≤ ay)
2. X is convex: otherwise the plane will intersect X for some choice of y ∉ X
Theorem 15.2. Let X, Y ⊂ RN be nonempty, convex, and disjoint. Then there exists a ∈ RN such that for all x ∈ X, y ∈ Y, ax ≤ ay, and there exist x′ ∈ X, y′ ∈ Y such that ax′ < ay′.
Theorem 15.3 (Separating Hyperplane Theorem). Let X, Y ⊂ RN be nonempty and convex, with int X ∩ Y = ∅. Then there exist a ∈ RN and c ∈ R such that ax ≤ c ≤ ay for all x ∈ X, y ∈ Y.
Theorem 15.4 (Supporting Hyperplane Theorem). Let X ⊂ RN be convex and let x′ ∈ X − int X. Then there exists a ∈ RN, a ≠ 0, such that ax ≤ ax′ for all x ∈ X.
Note: x′ needs to be a boundary point of X; otherwise there exists x′′ ∈ X, x′′ ≠ x′, such that ax′′ > ax′.
Theorem 15.5 (The Farkas Lemma). Let a1, . . . , am be nonzero vectors in RN and let b ∈ RN. Let A be the m × N matrix whose rows are a1, . . . , am:

A ≡ [ a1 ]
    [ ⋮  ]
    [ am ]

Then exactly one of the following is true:
1) there exists λ ∈ Rm+ such that b = λA;
2) there exists x ∈ RN such that bx > 0 and Ax ≤ 0.
16 Solving Constrained Optimization Problems
1. Find all points that violate NDCQ (or other constraint qualification). These points
are candidate optima.
2. Determine the KKT first order conditions.
3. Find all points that satisfy the KKT conditions: for every subset of constraints, assume those constraints bind (so their multipliers may be nonzero) and the rest are slack (with zero multipliers), and try to find any points that satisfy the KKT conditions. All such points that pass the NDCQ are candidate optima.
4. If the objective is concave (convex) and the constraint functions are quasiconvex, then every solution of the KKT conditions is a global constrained maximum (minimum). However, we need to ensure that we haven't missed a solution that violates the NDCQ.
5. Evaluate all candidate points to find global optima, or use second order conditions to discriminate between candidate points. A small numerical sketch of this recipe is given below.
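Here is a minimal numerical sketch of this recipe on a small problem (in Python with SciPy; the problem, max x1 x2 subject to x1² + x2² ≤ 2 and x ≥ 0, and the use of scipy.optimize.minimize are assumptions for illustration): a solver locates a candidate, which is then checked against the KKT conditions by hand.

# Sketch of the recipe: max x1*x2 subject to x1^2 + x2^2 <= 2 and x >= 0.
import numpy as np
from scipy.optimize import minimize

objective = lambda x: -(x[0] * x[1])                                   # minimize the negative
constraint = {'type': 'ineq', 'fun': lambda x: 2 - x[0]**2 - x[1]**2}  # SciPy uses fun(x) >= 0
bounds = [(0, None), (0, None)]                                        # nonnegativity

res = minimize(objective, x0=np.array([0.5, 0.5]), bounds=bounds,
               constraints=[constraint], method='SLSQP')
print(res.x)                                   # expected: approximately (1, 1)

# Candidate check: at x ~ (1, 1) the constraint binds, and Df(x) = lam * Dg(x) holds
# with lam = 0.5, since Df = (x2, x1) = (1, 1) and Dg = (2*x1, 2*x2) = (2, 2).
print(np.allclose(res.x, [1.0, 1.0], atol=1e-4))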
17 Summary of Optimization Theorems

17.1 Unconstrained
Let f : X → R be a continuous function, where X is an open subset of RN . Note that
assuming X is open means that any local optimum is an interior optimum. Henceforth, I
will assume the problem is to find maxima. The results can be easily translated for minima.
Also, keep in mind that an open domain implies a maximum may not exist.
1. If f is continuously differentiable, then a necessary condition for a local maximum x∗
is that Df (x∗ ) = 0.
2. If f is twice continuously differentiable, then a necessary condition for a local maximum
x∗ is that D2 f (x∗ ) is negative semidefinite.
3. If f is twice continuously differentiable, then the first order necessary condition is also
a sufficient condition for a local maximum x∗ if D2 f (x∗ ) is negative definite.
4. If f is continuously differentiable and concave, then the first order necessary condition
for a local maximum is also a sufficient condition for a global maximum.
5. If f is continuously differentiable and strictly concave, then x∗ is the unique maximizer
of f if Df (x∗ ) = 0.
6. If f is twice continuously differentiable and concave, then, if x∗ solves Df (x∗ ) = 0 and
D2 f (x∗ ) is negative definite, it is the unique maximizer.
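A minimal numerical sketch of items 1-3 above (in Python with NumPy; the quadratic objective is an assumption chosen for illustration):

# Sketch: first- and second-order conditions for f(x) = -(x1^2 + 2*x2^2) at x* = (0, 0).
import numpy as np

Df = lambda x: np.array([-2 * x[0], -4 * x[1]])     # gradient
D2f = np.array([[-2.0, 0.0], [0.0, -4.0]])          # Hessian (constant for this quadratic)

x_star = np.zeros(2)
print(np.allclose(Df(x_star), 0))                   # first order necessary condition holds
print(np.all(np.linalg.eigvalsh(D2f) < 0))          # Hessian negative definite, so x* is a local maximum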
17.2 Constrained
Theorem 17.1 (Local-Global Theorem). Let f : X → R be a continuous function, where
X is an open subset of RN . Suppose C ⊂ X is convex and compact, and f is quasiconcave.
Then every local maximum is a global maximum.
17.2.1 Classic KKT

Let f : RN+ → R and g : RN+ → RJ be continuous functions, and suppose f, g are continuously differentiable.
Consider the following problem:

max_{x∈RN+} f (x)   subject to   g(x) ≤ 0.

Notice that x comes from RN+, and so implicitly we have nonnegativity constraints.
The classic KKT conditions are:

Dx f (x∗) − λ∗ Dx g(x∗) ≤ 0,   x∗ ≥ 0,    (17.1)
g(x∗) ≤ 0,   λ∗ ≥ 0,    (17.2)
x∗ (Dx f (x∗) − λ∗ Dx g(x∗)) = 0,    (17.3)
λ∗ g(x∗) = 0.    (17.4)
Suppose x∗ is a solution to the maximization problem. Then the KKT conditions are
necessary conditions (and there exists an associated λ∗ ) if any one of the following is true:
1. The Jacobian of the constraints that bind at x∗ has full rank (NDCQ).
2. The constraints are affine.¹⁰
3. The constraint functions gj are convex, and there exists an interior point of the constraint set, i.e., there exists x̃ ∈ RN+ such that for all j, gj (x̃) < 0. This is the Slater condition.
4. The constraint functions gj are quasiconvex, the constraint set has a nonempty interior (the Slater condition), and if for any j, gj is not convex, then Dx gj (x) ≠ 0 for any x ∈ RN+. This is a weakening of the previous item, but it rules out pesky stationary points of the constraint functions.
Suppose (x∗ , λ∗ ) is a solution of the KKT conditions. Then x∗ is a maximizer, that is
the KKT conditions are sufficient conditions, if gj is quasiconvex for every j and any one of
the following is true:
1. f is concave.
2. f is twice continuously differentiable, quasiconcave, and Dx f (x∗) ≠ 0.
¹⁰A function F is linear if F (ax + by) = aF (x) + bF (y). An affine function is a linear function with an added constant. For example, F (x) = 12x is linear (and affine), but F (x) = 12x + 3 is not linear, though still affine.
3. f is quasiconcave, and one of the following holds:
   (a) ∂f/∂xi (x∗) < 0 for some i
   (b) ∂f/∂xi (x∗) > 0 for some i such that there exists x̃ ∈ RN+ with x̃i > 0.
17.2.2 Modern KKT
Let f : X → R and g : X → RJ be continuous functions, where X ⊂ RN is open, and
suppose f, g are continuously differentiable.
Consider the following problem:

max_{x∈X} f (x)   subject to   g(x) ≤ 0.

Any nonnegativity constraints should be included in the set of inequality constraints explicitly. Define the constraint set by C ≡ {x ∈ X : g(x) ≤ 0}.
The modern KKT conditions are:

Dx f (x∗) − λ∗ Dx g(x∗) = 0,    (17.6)
g(x∗) ≤ 0,   λ∗ ≥ 0,    (17.7)
λ∗ g(x∗) = 0.    (17.8)
If every reference to classic KKT is replaced with modern KKT in the section on Classic
KKT, then the results there apply here.
17.2.3 KKT - Mixed Constraints

Let f : RN → R, g : RN → RJ, and h : RN → RK be continuous functions, and suppose f, g, h are continuously differentiable.
Consider the following problem:

max_{x∈RN} f (x)   subject to   gj (x) ≤ 0,   hk (x) = 0.

Note also that an equality constraint hk (x) = 0 could be replaced by two inequality constraints gj (x) ≤ 0 and gj′ (x) ≤ 0, where gj = hk and gj′ = −hk. Thus, the results below are just appropriate restatements of the results in the classic KKT section.
The mixed KKT conditions are:

Dx f (x∗) − λ∗ Dx g(x∗) − µ∗ Dx h(x∗) = 0,    (17.10)
g(x∗) ≤ 0,   λ∗ ≥ 0,   λ∗ g(x∗) = 0,    (17.11)
h(x∗) = 0.    (17.12)
Suppose x∗ is a solution to the maximization problem. Then the mixed KKT conditions are necessary conditions (and there exist associated λ∗ and µ∗) if any one of the following is true:
1. The Jacobian of the equality constraints and the binding inequality constraints at x∗
has full rank (NDCQ).
2. The constraints are affine.
Suppose (x∗ , λ∗ , µ∗ ) is a solution of the mixed KKT conditions. Then x∗ is a maximizer,
that is the KKT conditions are sufficient conditions, if gj is quasiconvex for every j, hk is
linear for every k and any one of the following is true:
1. f is concave.
2. f is twice continuously differentiable, quasiconcave, and Dx f (x∗) ≠ 0.
3. f is quasiconcave, and one of the following holds:
   (a) ∂f/∂xi (x∗) < 0 for some i
   (b) ∂f/∂xi (x∗) > 0 for some i such that there exists x̃ ∈ RN+ with x̃i > 0.
17.2.4 Saddlepoint Theorem
Let f : X → R and g : X → RJ be functions on an arbitrary set X. Define the
Lagrangian function L(x, λ) ≡ f (x) − λg(x), where λ ∈ RJ+ .
1. If (x∗ , λ∗ ) is a saddlepoint of L, then x∗ is a constrained maximizer, and λ∗ g(x∗ ) = 0.
2. Suppose X ⊂ RN is convex, f is concave, gj is convex for each j. If x∗ is a constrained
maximizer of f subject to g ≤ 0 and there exists x̃ such that g(x̃) < 0 (the Slater
condition), then there exists λ∗ ∈ RJ+ such that (x∗ , λ∗ ) is a saddlepoint of L.
18 References
George Dantzig. Linear Programming and Extensions. 1963.¹¹
Michael Intriligator. Mathematical Optimization and Economic Theory. 1971.

¹¹A PDF version is available for free from the RAND Corporation. See http://www.rand.org/pubs/reports/R366.html