Introduction to Optimization Theory
Lecture Notes
JIANFEI SHEN
SCHOOL OF ECONOMICS
SHANDONG UNIVERSITY
Besides language and music, mathematics is one of the primary
manifestations of the free creative power of the human mind.
— Hermann Weyl
CONTENTS

1 Multivariable Calculus
  1.1 Functions on Euclidean Space
  1.2 Directional Derivative and Derivative
  1.3 Partial Derivatives and the Jacobian
  1.4 The Chain Rule
  1.5 The Implicit Function Theorem
  1.6 Gradient and Its Properties
  1.7 Continuously Differentiable Functions
  1.8 Quadratic Forms: Definite and Semidefinite Matrices
  1.9 Homogeneous Functions and Euler's Formula

2 Optimization in R^n
  2.1 Introduction
  2.2 Unconstrained Optimization
  2.3 Equality Constrained Optimization: Lagrange's Method
  2.4 Inequality Constrained Optimization: Kuhn-Tucker Theorem
  2.5 Envelope Theorem

3 Convex Analysis in R^n
  3.1 Convex Sets
  3.2 Separation Theorem
  3.3 Systems of Linear Inequalities: Theorems of the Alternative
  3.4 Convex Functions
  3.5 Convexity and Optimization

References

Index
1  MULTIVARIABLE CALCULUS
In this chapter we consider functions mapping R^m into R^n, and we define what we mean by the derivative of such a function. It is important to be familiar with the idea that the derivative at a point a of a map between open sets of (normed) vector spaces is a linear transformation between the vector spaces (in this chapter the linear transformation is represented as an n × m matrix).

This chapter is based on Spivak (1965, Chapters 1 & 2) and Munkres (1991, Chapter 2); one could do no better than to study these two excellent books for multivariable calculus.
Notation

We use standard notation:

    N := {1, 2, 3, ...} = the set of all natural numbers;
    Z := {0, ±1, ±2, ...} = the set of all integers;
    Q := {n/m : n, m ∈ Z and m ≠ 0} = the set of all rational numbers;
    R := the set of all real numbers.

We also define

    R+ := {x ∈ R : x ≥ 0}  and  R++ := {x ∈ R : x > 0}.
1.1 Functions on Euclidean Space
Norm, Inner Product and Metric
Definition 1.1 (Euclidean n-space) Euclidean n-space R^n is defined as the set of all n-tuples (x1, ..., xn) of real numbers x_i:

    R^n := {(x1, ..., xn) : each x_i ∈ R}.

An element of R^n is often called a point in R^n, and R^1, R^2, R^3 are often called the line, the plane, and space, respectively.
If x denotes an element of R^n, then x is an n-tuple of numbers, the i-th one of which is denoted x_i; thus, we can write

    x = (x1, ..., xn).

A point in R^n is frequently also called a vector in R^n, because R^n, with

    x + y = (x1 + y1, ..., xn + yn),  x, y ∈ R^n,

and

    αx = (αx1, ..., αxn),  α ∈ R and x ∈ R^n,

as operations, is a vector space.

To obtain the full geometric structure of R^n, we introduce three structures on R^n: the Euclidean norm, inner product and metric.
Definition 1.2 (Norm) In R^n, the length of a vector x ∈ R^n, usually called the norm ‖x‖ of x, is defined by

    ‖x‖ = √(x1² + ... + xn²).
Remark 1.3 The norm ‖·‖ satisfies the following properties: for all x, y ∈ R^n and α ∈ R,

  • ‖x‖ ≥ 0;
  • ‖x‖ = 0 iff¹ x = 0;
  • ‖αx‖ = |α| ‖x‖;
  • ‖x + y‖ ≤ ‖x‖ + ‖y‖ (triangle inequality).

¹ "iff" is the abbreviation of "if and only if".
Exercise 1.4 Prove that |‖x‖ − ‖y‖| ≤ ‖x − y‖ for any two vectors x, y ∈ R^n (use the triangle inequality).
Definition 1.5 (Inner Product) Given x, y ∈ R^n, the inner product of the vectors x and y, denoted x · y or ⟨x, y⟩, is defined as

    x · y = Σ_{i=1}^n x_i y_i.
Remark 1.6 The norm and the inner product are related through the following identity:

    ‖x‖ = √(x · x).
Theorem 1.7 (Cauchy-Schwarz Inequality) For any x, y ∈ R^n we have

    |x · y| ≤ ‖x‖ ‖y‖.

Proof. We assume that x ≠ 0, for otherwise the result is trivial. For every a ∈ R, we have

    0 ≤ ‖ax + y‖² = a²‖x‖² + 2a(x · y) + ‖y‖².

In particular, let a = −(x · y)/‖x‖². Then, from the above display, we get the desired result. ∎
Exercise 1.8 Prove the triangle inequality (use the Cauchy-Schwarz Inequality). Show it holds with equality iff one of the vectors is a nonnegative scalar multiple of the other.
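For readers who like to experiment, here is a small Python sketch (our own illustration, not part of the notes) that spot-checks Theorem 1.7 and the triangle inequality on random vectors; a numerical check is, of course, no substitute for the proofs.

    import numpy as np

    rng = np.random.default_rng(0)
    for _ in range(1000):
        x = rng.normal(size=4)
        y = rng.normal(size=4)
        # |x . y| <= ||x|| ||y||  (Cauchy-Schwarz)
        assert abs(x @ y) <= np.linalg.norm(x) * np.linalg.norm(y) + 1e-12
        # ||x + y|| <= ||x|| + ||y||  (triangle inequality)
        assert np.linalg.norm(x + y) <= np.linalg.norm(x) + np.linalg.norm(y) + 1e-12
    print("all samples satisfy both inequalities")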
Definition 1.9 (Metric) The distance d(x, y) between two vectors x, y ∈ R^n is given by

    d(x, y) = √( Σ_{i=1}^n (x_i − y_i)² ).

The distance function d is called a metric.
Example 1.10 In R², choose two points x¹ = (x1¹, x2¹) and x² = (x1², x2²) with x1² − x1¹ = a and x2² − x2¹ = b. Then Pythagoras tells us that (Figure 1.1)

    d(x¹, x²) = √(a² + b²) = √( (x1² − x1¹)² + (x2² − x2¹)² ).

Remark 1.11 The metric is related to the norm ‖·‖ through the identity

    d(x, y) = ‖x − y‖.
Figure 1.1: Distance in the plane.
Subsets of R^n

Definition 1.12 (Open Ball) Let x ∈ R^n and r > 0. The open ball B(x; r) with center x and radius r is given by

    B(x; r) := {y ∈ R^n : d(x, y) < r}.

Definition 1.13 (Interior) Let S ⊆ R^n. A point x ∈ S is called an interior point of S if there is some r > 0 such that B(x; r) ⊆ S. The set of all interior points of S is called its interior and is denoted S°.

Definition 1.14 Let S ⊆ R^n.

  • S is open if for every x ∈ S there exists r > 0 such that B(x; r) ⊆ S.
  • S is closed if its complement R^n \ S is open.
  • S is bounded if there exists r > 0 such that S ⊆ B(0; r).
  • S is compact if (and only if) it is closed and bounded (Heine-Borel Theorem).²

Example 1.15 On R, the interval (0, 1) is open, the interval [0, 1] is closed. Both (0, 1) and [0, 1] are bounded, and [0, 1] is compact. However, the interval (0, 1] is neither open nor closed. But R is both open and closed.

² This definition does not work for more general metric spaces. See Willard (2004) for details.
Limit and Continuity
Functions A function from R^m to R^n (sometimes called a vector-valued function of m variables) is a rule which associates to each point in R^m some point in R^n. We write

    f : R^m → R^n

to indicate that f(x) ∈ R^n is defined for x ∈ R^m.

The notation f : A → R^n indicates that f(x) is defined only for x in the set A, which is called the domain of f. If B ⊆ A, we define f(B) as the set of all f(x) for x ∈ B:

    f(B) := {f(x) : x ∈ B}.

If C ⊆ R^n we define

    f⁻¹(C) := {x ∈ A : f(x) ∈ C}.

The notation f : A → B indicates that f(A) ⊆ B. The graph of f : A → B is the set of all pairs (a, b) ∈ A × B such that b = f(a).

A function f : A → R^n determines n component functions f_1, ..., f_n : A → R by

    f(x) = (f_1(x), ..., f_n(x)).
Sequences A sequence is a function that assigns to each natural number n ∈ N a vector or point x^n in Euclidean space. We usually write the sequence as (x^n)_{n=1}^∞ or (x^n).

Example 1.16 Examples of sequences in R² are

  (a) (x^n) = ((n, n)).
  (b) (x^n) = ((cos nπ/2, sin nπ/2)).
  (c) (x^n) = (((−1)^n / 2^n, 1/2^n)).
  (d) (x^n) = (((−1)^n − 1/n, (−1)^n − 1/n)).

See Figure 1.2.
Definition 1.17 (Limit) A sequence (x^n) is said to have a limit x or to converge to x if for every ε > 0 there is N_ε ∈ N such that whenever n ≥ N_ε, we have x^n ∈ B(x; ε). We write

    lim_{n→∞} x^n = x  or  x^n → x.

Example 1.18 In Example 1.16, the sequences (a), (b) and (d) do not converge, while the sequence (c) converges to (0, 0).
Figure 1.2: Examples of sequences. (a) (x^n) = ((n, n)); (b) (x^n) = ((cos nπ/2, sin nπ/2)); (c) (x^n) = (((−1)^n / 2^n, 1/2^n)), which is convergent; (d) (x^n) = (((−1)^n − 1/n, (−1)^n − 1/n)).
Figure 1.3: Naive continuity. (a) A continuous function. (b) We cannot draw the graph of 1/x without taking our pencil off the paper.
Naive Continuity The simplest way to say that a function f : A → R is continuous is to say that one can draw its graph without taking the pencil off the paper. For example, a function whose graph looks like Figure 1.3(a) would be continuous in this sense (Crossley, 2005, Chapter 2).

But if we look at the function f(x) = 1/x, then we see that things are not so simple. The graph of this function has two parts: one part corresponding to negative x values, and the other to positive x values. The function is not defined at 0, so we certainly cannot draw both parts of this graph without taking our pencil off the paper; see Figure 1.3(b). Of course, f(x) = 1/x is continuous near every point in its domain. Such a function deserves to be called continuous. So this characterization of continuity in terms of graph-sketching is too simplistic.
Rigorous Continuity The notation lim_{x→a} f(x) = b means, as in the one-variable case, that we can get f(x) as close to b as desired by choosing x sufficiently close to, but not equal to, a. In mathematical terms this means that for every number ε > 0 there is a number δ > 0 such that ‖f(x) − b‖ < ε for all x in the domain of f which satisfy 0 < ‖x − a‖ < δ.

A function f : A → R^n is called continuous at a ∈ A if lim_{x→a} f(x) = f(a), and f is continuous if it is continuous at each a ∈ A.
Exercise 1.19 Let

    f(x) = x     if x ≠ 1,
           3/2   if x = 1.

Show that f(x) is not continuous at a = 1.
1.2 Directional Derivative and Derivative
Let us first recall how the derivative of a real-valued function of a real variable is defined. Let A ⊆ R and let f : A → R. Suppose A contains a neighborhood of the point a, that is, there is an open ball B(a; r) such that B(a; r) ⊆ A. We define the derivative of f at a by the equation

    f′(a) = lim_{t→0} [f(a + t) − f(a)] / t,    (1.1)

provided the limit exists. In this case, we say that f is differentiable at a. Geometrically, f′(a) is the slope of the tangent line to the graph of f at the point (a, f(a)).
Definition 1.20 For a function f : (a, b) → R and a point x_0 ∈ (a, b), if

    lim_{t↑0} [f(x_0 + t) − f(x_0)] / t

exists and is finite, we denote this limit by f′₋(x_0) and call it the left-hand derivative of f at x_0. Similarly, we define f′₊(x_0) and call it the right-hand derivative of f at x_0. Of course, f is differentiable at x_0 iff it has left-hand and right-hand derivatives at x_0 that are equal.
Now let A ⊆ R^m, where m > 1, and let f : A → R^n. Can we define the derivative of f by replacing a and t in the definition just given by points of R^m? Certainly we cannot, since we cannot divide a point of R^n by a point of R^m when m > 1.

Directional Derivative

The following is our first attempt at a definition of "derivative". Let A ⊆ R^m and let f : A → R^n. We study how f changes as we move from a point a ∈ A° (the interior of A) along a line segment to a nearby point a + u, where u ≠ 0. Each point on the segment can be expressed as a + tu, where t ∈ R. The vector u describes the direction of the line segment. Since a is an interior point of A, the line segment joining a to a + tu will lie in A if t is small enough.
Definition 1.21 (Directional Derivative) Let A ⊆ R^m and let f : A → R^n. Suppose A contains a neighborhood of a. Given u ∈ R^m with u ≠ 0, define

    f′(a; u) = lim_{t→0} [f(a + tu) − f(a)] / t,

provided the limit exists. This limit is called the directional derivative of f at a with respect to the vector u.

Remark 1.22 In calculus, one usually requires u to be a unit vector, i.e., ‖u‖ = 1, but that is not necessary.
Example 1.23 Let f : R² → R be given by the equation f(x1, x2) = x1 x2. The directional derivative of f at a = (a1, a2) with respect to the vector u = (u1, u2) is

    f′(a; u) = lim_{t→0} [f(a + tu) − f(a)] / t
             = lim_{t→0} [(a1 + t u1)(a2 + t u2) − a1 a2] / t
             = u2 a1 + u1 a2.
Example 1.24 Suppose the directional derivative of f at a with respect to u exists. Then for c ∈ R with c ≠ 0,

    f′(a; cu) = lim_{t→0} [f(a + tcu) − f(a)] / t
              = c lim_{t→0} [f(a + tcu) − f(a)] / (tc)
              = c lim_{s→0} [f(a + su) − f(a)] / s
              = c f′(a; u).

Remark 1.25 Example 1.24 shows that if u and v are collinear vectors in R^m, then f′(a; u) and f′(a; v) are collinear in R^n.
However, the directional derivative is not the appropriate generalization of the notion of "derivative". The main problems are:

Problem 1. Continuity does not follow from this definition of "differentiability". There exist functions such that f′(a; u) exists for all u ≠ 0 but which are not continuous.

Example 1.26 Define f : R² → R by setting

    f(x, y) = 0                 if (x, y) = (0, 0),
              x²y/(x⁴ + y²)     if (x, y) ≠ (0, 0).
We show that all directional derivatives of f exist at 0, but that f is not continuous at 0. Let u = (h, k) ≠ 0. Then

    [f(0 + tu) − f(0)] / t = (th)²(tk) / ([(th)⁴ + (tk)²] t) = h²k / (t²h⁴ + k²),

so that

    f′(0; u) = h²/k   if k ≠ 0,
               0      if k = 0.

However, the function f takes the value 1/2 at each point of the parabola y = x² (except at 0), so f is not continuous at 0 since f(0) = 0.
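A few lines of Python (our own illustration) make Example 1.26 concrete: the difference quotients along any fixed direction settle down, yet f equals 1/2 along the parabola y = x² arbitrarily close to the origin, so f cannot be continuous at 0.

    def f(x, y):
        return 0.0 if (x, y) == (0.0, 0.0) else x**2 * y / (x**4 + y**2)

    # along the parabola y = x^2 the value is constantly 1/2
    for x in (0.1, 0.01, 0.001):
        print(f(x, x**2))                 # 0.5 each time, but f(0, 0) = 0

    # difference quotients in the direction u = (1, 2) converge to h^2/k = 1/2
    h, k = 1.0, 2.0
    for t in (0.1, 0.01, 0.001):
        print((f(t * h, t * k) - f(0.0, 0.0)) / t)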
Problem 2. Composites of "differentiable" functions may not be differentiable.
Derivative

To give the right generalization of the notion of "derivative", let us reconsider (1.1). If f′(a) exists, let R_a(t) denote the difference

    R_a(t) := [f(a + t) − f(a)]/t − f′(a)   if t ≠ 0,
              0                             if t = 0.    (1.2)

From (1.2) we see that lim_{t→0} R_a(t) = 0. Then we have

    f(a + t) = f(a) + f′(a)t + R_a(t)t.    (1.3)

Note that (1.3) also holds for t = 0. This is called the first-order Taylor formula for approximating f(a + t) − f(a) by f′(a)t. The error committed is R_a(t)t.³ See Figure 1.4. It is this idea that leads to the following definition:

Definition 1.27 (Differentiability) Let A ⊆ R^m and let f : A → R^n. Suppose A contains a neighborhood of a. We say that f is differentiable at a if there is an n × m matrix B_a such that

    f(a + h) = f(a) + B_a h + ‖h‖ R_a(h),

where lim_{h→0} R_a(h) = 0. The matrix B_a, which is unique, is called the derivative of f at a; it is denoted Df(a).

³ "All science is dominated by the idea of approximation." (Bertrand Russell)
Figure 1.4: f′(a)t is the linear approximation to f(a + t) − f(a) at a.
Figure 1.5: Df(a) is the linear part of f at a.

Remark 1.28 [1] Notice that h is a point of R^m and f(a + h) − f(a) − B_a h is a point of R^n, so the norm signs are essential.

[2] The derivative Df(a) depends on the point a as well as the function f. We are not saying that there exists a B which works for all a, but that for a fixed a such a B exists.

[3] Here is how to visualize Df. Take m = n = 2. The function f : A → R² distorts shapes nonlinearly; its derivative describes the linear part of the distortion. Circles are sent by f to wobbly ovals, but they become ellipses under Df(a) (here we treat Df(a) as the matrix that represents a linear operator; see, e.g., Axler 1997). Lines are sent by f to curves, but they become straight lines under Df(a). See Figure 1.5.
Example 1.29 Let f : R^m → R^n be defined by the equation

    f(x) = Ax + b,

where A is an n × m matrix and b ∈ R^n. Then

    f(a + h) = A(a + h) + b = Aa + b + Ah = f(a) + Ah.

Hence, R_a(h) = [f(a + h) − f(a) − Ah]/‖h‖ = 0; that is, Df(a) = A.
We now show that the definition of derivative is stronger than that of the directional derivative. In particular, we have:

Theorem 1.30 Let A ⊆ R^m and let f : A → R^n. If f is differentiable at a, then f is continuous at a.

Proof. Differentiability at a implies that

    ‖f(a + h) − f(a)‖ = ‖Df(a)h + ‖h‖ R_a(h)‖ ≤ ‖Df(a)‖‖h‖ + ‖R_a(h)‖‖h‖ → 0

as a + h → a, where the inequality follows from the triangle inequality and the Cauchy-Schwarz Inequality (Theorem 1.7). ∎
However, there is a nice connection between the directional derivative and the derivative.

Theorem 1.31 Let A ⊆ R^m and let f : A → R^n. If f is differentiable at a, then all the directional derivatives of f at a exist, and

    f′(a; u) = Df(a) u.

Proof. Fix any u ∈ R^m and take h = tu. Then

    [f(a + tu) − f(a)] / t = [Df(a)(tu) + ‖tu‖ R_a(tu)] / t
                           = Df(a) u + (|t| ‖u‖ / t) R_a(tu).

The last term converges to zero as t → 0, which proves that f′(a; u) = Df(a) u. ∎
Exercise 1.32 Define f : R² → R by setting

    f(x, y) = 0                 if (x, y) = (0, 0),
              x²y/(x⁴ + y²)     if (x, y) ≠ (0, 0).

Show that f is not differentiable at (0, 0).
Exercise 1.33 Let f : R² → R be defined by f(x, y) = √|xy|. Show that f is not differentiable at (0, 0).
1.3 Partial Derivatives and the Jacobian
We now introduce the notion of the "partial derivatives" of a real-valued function. Let (e_1, ..., e_m) be the standard basis of R^m, i.e.,

    e_1 = (1, 0, 0, ..., 0),  e_2 = (0, 1, 0, ..., 0),  ...,  e_m = (0, 0, ..., 0, 1).

Definition 1.34 (Partial Derivatives) Let A ⊆ R^m and let f : A → R. We define the j-th partial derivative of f at a to be the directional derivative of f at a with respect to the vector e_j, provided this derivative exists; and we denote it by D_j f(a). That is,

    D_j f(a) = lim_{t→0} [f(a + t e_j) − f(a)] / t.

Remark 1.35 It is important to note that D_j f(a) is the ordinary derivative of a certain function; in fact, if g(x) = f(a_1, ..., a_{j−1}, x, a_{j+1}, ..., a_m), then D_j f(a) = g′(a_j). This means that D_j f(a) is the slope of the tangent line at (a, f(a)) to the curve obtained by intersecting the graph of f with the plane x_i = a_i, i ≠ j. See Figure 1.6.
The following theorem relates partial derivatives to the derivative in the case where f is a real-valued function.

Theorem 1.36 Let A ⊆ R^m and let f : A → R. If f is differentiable at a, then

    Df(a) = [D_1 f(a)  D_2 f(a)  ...  D_m f(a)].

Proof. If f is differentiable at a, then Df(a) is a (1 × m)-matrix. Let

    Df(a) = [α_1  α_2  ...  α_m].

It follows from Theorem 1.31 that

    D_j f(a) = f′(a; e_j) = Df(a) e_j = α_j. ∎
Figure 1.6: D_1 f(a, b).

1.4 The Chain Rule

We extend the familiar chain rule to the current setting.
Theorem 1.37 (Chain Rule) Let A ⊆ R^m and let B ⊆ R^n. Let f : A → R^n and g : B → R^p, with f(A) ⊆ B. Suppose f(a) = b. If f is differentiable at a, and if g is differentiable at b, then the composite function g ∘ f : A → R^p is differentiable at a. Furthermore,

    D(g ∘ f)(a) = Dg(b) Df(a).

Proof. Omitted. See, e.g., Spivak (1965, Theorem 2-2), Rudin (1976, Theorem 9.15), or Munkres (1991, Theorem 7.1). ∎
Here is an application of the Chain Rule.

Theorem 1.38 Let A ⊆ R^m and let f : A → R^n. Suppose A contains a neighborhood of a. Let f_i : A → R be the i-th component function of f, so that

    f(x) = (f_1(x), ..., f_n(x)).

(a) The function f is differentiable at a if and only if each component function f_i is differentiable at a.
(b) If f is differentiable at a, then its derivative is the (n × m)-matrix whose i-th row is the derivative of the function f_i. That is,

    Df(a) = [ Df_1(a) ]   [ D_1 f_1(a) ... D_m f_1(a) ]
            [   ...   ] = [    ...              ...   ]
            [ Df_n(a) ]   [ D_1 f_n(a) ... D_m f_n(a) ].
Proof. (a) Assume that f is differentiable at a and express the i-th component of f as

    f_i = π_i ∘ f,

where π_i : R^n → R is the projection that sends a vector x = (x_1, ..., x_n) to x_i. Notice that we can write π_i as π_i(x) = Ax, where A is the 1 × n matrix

    A = [0 ... 0 1 0 ... 0],

in which the number 1 appears in the i-th place. Then π_i is differentiable and Dπ_i(x) = A for all x ∈ R^n (see Example 1.29). By the Chain Rule, f_i is differentiable at a and

    Df_i(a) = D(π_i ∘ f)(a) = Dπ_i(f(a)) Df(a) = A Df(a).    (1.4)

Now suppose that each f_i is differentiable at a. Let B be the n × m matrix whose i-th row is Df_i(a). We show that Df(a) = B:

    ‖h‖ R_a(h) = f(a + h) − f(a) − Bh
               = ( f_1(a + h) − f_1(a) − Df_1(a) h, ..., f_n(a + h) − f_n(a) − Df_n(a) h )
               = ‖h‖ ( R_a(h; f_1), ..., R_a(h; f_n) ),

where R_a(h; f_i) is the Taylor remainder for f_i. It is clear that lim_{h→0} R_a(h) = 0, which proves that Df(a) = B.

(b) This claim follows from the previous part and Theorem 1.36. ∎
Remark 1.39 Theorem 1.38 implies that there is little loss of generality in assuming n = 1, i.e., that our functions are real-valued. Multidimensionality of the domain, not the range, is what distinguishes multivariable calculus from one-variable calculus.
Definition 1.40 (Jacobian Matrix) Let A ⊆ R^m and let f : A → R^n. If the partial derivatives of the component functions f_i of f exist at a, then one can form the matrix that has D_j f_i(a) as its entry in row i and column j. This matrix, denoted by Jf(a), is called the Jacobian matrix of f. That is,

    Jf(a) = [ D_1 f_1(a) ... D_m f_1(a) ]
            [    ...              ...   ]
            [ D_1 f_n(a) ... D_m f_n(a) ].
Remark 1.41 [1] The Jacobian encapsulates all the essential information regarding the linear function that best approximates a differentiable function at a particular point. For this reason it is the Jacobian which is usually used in practical calculations with the derivative.

[2] If f is differentiable at a, then Jf(a) = Df(a). However, it is possible for the partial derivatives, and hence the Jacobian matrix, to exist without it following that f is differentiable at a (see Exercise 1.32).
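The next Python sketch (function, point, and direction are our own) illustrates Remark 1.41[1] numerically: the error of the linear approximation f(a) + Jf(a)h is o(‖h‖), i.e., the ratio error/‖h‖ shrinks as h does.

    import numpy as np

    def f(x):
        return np.array([x[0]**2 * x[1], np.sin(x[0]) + x[1]])

    def Jf(x):
        # Jacobian computed by hand for this particular f
        return np.array([[2 * x[0] * x[1], x[0]**2],
                         [np.cos(x[0]),    1.0   ]])

    a = np.array([1.0, 2.0])
    d = np.array([0.6, -0.8])               # a fixed direction
    for eps in (1e-1, 1e-2, 1e-3):
        h = eps * d
        err = np.linalg.norm(f(a + h) - f(a) - Jf(a) @ h)
        print(err / np.linalg.norm(h))      # tends to 0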
1.5 The Implicit Function Theorem

Let U ⊆ R^k × R^n be open and let f : U → R^n. Fix a point (a, b) ∈ U with f(a, b) = 0. Our goal is to solve the equation

    f(x, y) = 0    (1.5)

near (a, b). More precisely, we hope to show that the set of points (x, y) near (a, b) at which f(x, y) = 0, the level set of f through 0, is the graph of a function y = g(x). If so, g is the implicit function defined by (1.5). See Figure 1.7.

Under various hypotheses we will show that g exists, is unique, and is differentiable. Let us first consider an example.

Example 1.42 Consider the function f : R² → R defined by

    f(x, y) = x² + y² − 1.
Figure 1.7: Near (a, b), L_f(0) is the graph of a function y = g(x).
If we choose (a, b) with f(a, b) = 0 and a ≠ ±1, there are (Figure 1.8) open intervals A containing a and B containing b with the following property: if x ∈ A, there is a unique y ∈ B with f(x, y) = 0. We can therefore define a function g : A → R by the condition g(x) ∈ B and f(x, g(x)) = 0 (if b > 0, as indicated in Figure 1.8, then g(x) = √(1 − x²)). For the function f we are considering there is another number b_1 such that f(a, b_1) = 0. There will also be an interval B_1 containing b_1 such that, when x ∈ A, we have f(x, g_1(x)) = 0 for a unique g_1(x) ∈ B_1 (here g_1(x) = −√(1 − x²)). Both g and g_1 are differentiable. These functions are said to be defined implicitly by the equation f(x, y) = 0.

If we choose a = 1 or −1, it is impossible to find any such function g defined in an open interval containing a.
Theorem 1.43 (Implicit Function Theorem) Let U ⊆ R^{k+n} be open and let f : U → R^n be of class C^r. Write f in the form f(x, y) for x ∈ R^k and y ∈ R^n. Suppose that (a, b) is a point of U such that f(a, b) = 0. Let M be the n × n matrix

    M = [ D_{k+1} f_1(a, b)   D_{k+2} f_1(a, b)   ...   D_{k+n} f_1(a, b) ]
        [ D_{k+1} f_2(a, b)   D_{k+2} f_2(a, b)   ...   D_{k+n} f_2(a, b) ]
        [        ...                 ...                       ...        ]
        [ D_{k+1} f_n(a, b)   D_{k+2} f_n(a, b)   ...   D_{k+n} f_n(a, b) ].

If det(M) ≠ 0, then near (a, b), L_f(0) is the graph of a unique function y = g(x). Besides, g is C^r.

Proof. The proof is too long to give here. You can find it in, e.g., Spivak (1965, Theorem 2-12), Rudin (1976, Theorem 9.28), Munkres (1991, Theorem 2.9.2), or Pugh (2002, Theorem 5.22). ∎
Figure 1.8: Implicit function theorem.
Example 1.44 Let f : R² → R be given by the equation

    f(x, y) = x² − y³.

Then (0, 0) is a solution of the equation f(x, y) = 0. Because ∂f(0, 0)/∂y = 0, we do not expect to be able to solve this equation for y in terms of x near (0, 0). But in fact we can, and furthermore the solution is unique! However, the function we obtain is not differentiable at x = 0. See Figure 1.9.

Example 1.45 Let f : R² → R be given by the equation

    f(x, y) = −x⁴ + y².

Then (0, 0) is a solution of the equation f(x, y) = 0. Because ∂f(0, 0)/∂y = 0, we do not expect to be able to solve for y in terms of x near (0, 0). In fact, however, we can do so, and we can do so in such a way that the resulting function is differentiable. However, the solution is not unique. See Figure 1.10.

Now the point (1, 1) is also a solution to f(x, y) = 0. Because ∂f(1, 1)/∂y = 2, one can solve this equation for y as a continuous function of x in a neighborhood of x = 1. See Figure 1.10.
Figure 1.9: y is not differentiable at x = 0.

Figure 1.10: Example 1.45.
Remark 1.46 We will use the Implicit Function Theorem in Theorem 2.12.
The theorem will also be used to derive comparative statics for economic
models, which we perhaps do not have time to discuss.
1.6 Gradient and Its Properties

In this section we investigate the significance of the gradient vector, which is defined as follows:

Definition 1.47 (Gradient) Let A ⊆ R^m and let f : A → R. Suppose A contains a neighborhood of a. The gradient of f at a, denoted ∇f(a), is defined by

    ∇f(a) := [D_1 f(a)  D_2 f(a)  ...  D_m f(a)].
Remark 1.48 [1] It follows from Theorem 1.36 that if f is differentiable at a, then ∇f(a) = Df(a). The converse does not hold; see Remark 1.41[2].

[2] With the notation of the gradient, we can write the Jacobian of f = (f_1, ..., f_n) as

    Jf(a) = [ ∇f_1(a) ]
            [   ...   ]
            [ ∇f_n(a) ].

Figure 1.11: The geometric interpretation of ∇f(x). (a) ∇f(1, 3) = (2, 6) and the tangent plane to L_f(10) at (1, 3); (b) the slope of the level set L_f(c) at x is −D_1 f(x)/D_2 f(x).
Let f : R^m → R^n with f = (f_1, ..., f_n). Recall that the level set of f through 0 is given by

    L_f(0) := {x ∈ R^m : f(x) = 0} = ∩_{i=1}^n L_{f_i}(0).    (1.6)

Given a point a ∈ L_f(0), it is intuitively clear what it means for a plane to be tangent to L_f(0) at a. Figure 1.12 shows some examples of tangent planes. A formal definition of the tangent plane will be given later. In this section we show that each gradient ∇f_i(a) is orthogonal to the tangent plane of L_f(0) at a and that, under some conditions, the tangent plane of L_f(0) at a can be characterized as the set of vectors orthogonal to ∇f_1(a), ..., ∇f_n(a). We begin with the simplest case, f : R² → R. Here is an example.
Example 1.49 Let f(x, y) = x² + y². The level set L_f(10) is given by

    L_f(10) = {(x, y) ∈ R² : x² + y² = 10}.

Calculus yields

    dy/dx along L_f(10) at (1, 3) = −1/3.
Figure 1.12: Examples of tangent planes. (a) f : R² → R; (b) f : R³ → R; (c) f_1, f_2 : R³ → R.
Hence, the tangent plane at (1, 3) is given by y = 3 − (x − 1)/3. Since ∇f(1, 3) = (2, 6), the result follows immediately; see Figure 1.11(a).
The result in Example 1.49 can be explained as follows. If we change x_1 and x_2 and are to remain on L_f(0), then dx_1 and dx_2 must be such as to leave the value of f unchanged at 0. They must therefore satisfy

    f′(x; (dx_1, dx_2)) = D_1 f(x) dx_1 + D_2 f(x) dx_2 = 0.    (1.7)

By solving (1.7) for dx_2/dx_1, the slope of the level set through x will be (see Figure 1.11(b))

    dx_2/dx_1 = −D_1 f(x) / D_2 f(x).

Since the slope of the vector ∇f(x) = (D_1 f(x), D_2 f(x)) is D_2 f(x)/D_1 f(x), we obtain the desired result.
We now present the general result. For simplicity, we assume throughout this section that each f_i ∈ C^1. For L_f(0) defined in (1.6), obtaining an explicit representation for the tangent plane is a fundamental problem that we now address. First we define curves on L_f(0) and the tangent plane at some point x ∈ R^m. You may want to refer to some differential geometry textbooks, e.g., O'Neill (2006), Spivak (1999), or Lee (2009), for a better understanding of some of the following concepts.

One can picture a curve in R^m as a trip taken by a moving point c. At each "time" t in some interval [a, b] ⊆ R, c is located at the point

    c(t) = (c_1(t), ..., c_m(t)) ∈ R^m.

In rigorous terms, then, c is a function from [a, b] to R^m, and the component functions c_1, ..., c_m are its Euclidean coordinate functions. We define the function c to be differentiable provided its component functions are differentiable.
Example 1.50 A helix c : R → R³ is obtained through the formula

    c(t) = (a cos t, a sin t, bt),

where a, b > 0. See Figure 1.13.
Definition 1.51 A curve on L_f(0) is a continuous curve c : [a, b] → L_f(0). A curve c(t) is said to pass through the point a ∈ L_f(0) if a = c(t̄) for some t̄ ∈ [a, b].
Figure 1.13: The helix.

Definition 1.52 The tangent plane at a ∈ L_f(0), denoted T_f(a), is defined as the collection of the derivatives at a of all differentiable curves on L_f(0) passing through a.
Ideally, we would like to express the tangent plane defined in Definition 1.52 in terms of the derivatives of the functions f_i that define the surface L_f(0) (see Example 1.49). We introduce the subspace

    M := {x ∈ R^m : Df(a) x = 0}    (1.8)

and investigate under what conditions M is equal to the tangent plane at a. The following result shows that ∇f_i(a) is orthogonal to the tangent plane T_f(a) for all a ∈ L_f(0).

Theorem 1.53 For each a ∈ L_f(0), the gradient ∇f_i(a) is orthogonal to the tangent plane T_f(a).

Proof. We establish this result by showing T_f(a) ⊆ M for each a ∈ L_f(0). Every curve c(t) passing through a at t = t̄ satisfies f(c(t)) = 0, and so

    Df(c(t̄)) Dc(t̄) = 0.

That is, Dc(t̄) ∈ M. ∎

Definition 1.54 A point a ∈ L_f(0) is said to be a regular point if the gradient vectors ∇f_1(a), ..., ∇f_n(a) are linearly independent.

In general, at regular points it is possible to characterize the tangent plane in terms of ∇f_1(a), ..., ∇f_n(a).
Theorem 1.55 At a regular point a of L_f(0), the tangent plane T_f(a) is equal to M.

Proof. We show that M ⊆ T_f(a). Combining this result with Theorem 1.53, we have T_f(a) = M.

To show M ⊆ T_f(a), we must show that if x ∈ M then there exists a curve on L_f(0) passing through a with derivative x. To construct such a curve we consider the equations

    f(a + tx + Df(a)ᵀ u(t)) = 0,    (1.9)

where for fixed t we consider u(t) ∈ R^n to be the unknown. This is a nonlinear system of n equations in n unknowns, parametrized continuously by t. At t = 0 there is a solution u(0) = 0. The Jacobian matrix of the system with respect to u at t = 0 is the n × n matrix

    Df(a) Df(a)ᵀ,

which is nonsingular, since Df(a) is of full rank if a is a regular point. Thus, by the Implicit Function Theorem (Theorem 1.43) there is a continuously differentiable solution u(t) on some interval around t = 0.

The curve

    c(t) := a + tx + Df(a)ᵀ u(t)

is thus a curve on L_f(0). By differentiating the system (1.9) with respect to t at t = 0 we obtain

    Df(a) [x + Df(a)ᵀ Du(0)] = 0.    (1.10)

By definition of x we have Df(a)x = 0 and thus, again since Df(a) Df(a)ᵀ is nonsingular, we conclude from (1.10) that

    Du(0) = −[Df(a) Df(a)ᵀ]⁻¹ 0 = 0.

Therefore,

    Dc(0) = x + Df(a)ᵀ Du(0) = x,

and the constructed curve has derivative x at a. ∎
Example 1.56 [1] In R², let f(x_1, x_2) = x_1. Then L_f(0) is the x_2-axis, and every point on that axis is regular, since ∇f(0, x_2) = (1, 0). In this case, T_f((0, x_2)) = M, which is the x_2-axis.

[2] In R², let f(x_1, x_2) = x_1². Again, L_f(0) is the x_2-axis, but now no point on the x_2-axis is regular: ∇f(0, x_2) = (0, 0). Indeed, in this case M = R², while the tangent plane is the x_2-axis.
We close this section by providing another property of gradient vectors. Let f : R^n → R be a differentiable function and a ∈ R^n, where ∇f(a) ≠ 0. Suppose that we want to determine the direction in which f increases most rapidly at a. By a "direction" here we mean a unit vector u. Let θ_u denote the angle between u and ∇f(a). Then

    f′(a; u) = ∇f(a) · u = ‖∇f(a)‖ cos θ_u.

But cos θ_u attains its maximum value of 1 when θ_u = 0, that is, when u and ∇f(a) are collinear and point in the same direction. We conclude that ‖∇f(a)‖ is the maximum value of f′(a; u) over unit vectors u, and that this maximum value is attained with u = ∇f(a)/‖∇f(a)‖.
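A small numerical experiment (function, point, and grid are our own choices) confirms the claim: over unit vectors u, the inner product ∇f(a) · u is maximized when u points along ∇f(a), with maximum value ‖∇f(a)‖.

    import numpy as np

    grad_f = lambda x: np.array([2 * x[0] + 3 * x[1], 3 * x[0]])  # for f = x1^2 + 3 x1 x2
    a = np.array([1.0, 1.0])
    g = grad_f(a)

    thetas = np.linspace(0.0, 2.0 * np.pi, 100000)
    units = np.column_stack((np.cos(thetas), np.sin(thetas)))
    vals = units @ g
    best = units[np.argmax(vals)]

    print(best, g / np.linalg.norm(g))    # essentially the same unit vector
    print(vals.max(), np.linalg.norm(g))  # maximum value equals ||grad f(a)||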
1.7 Continuously Differentiable Functions

We know that the mere existence of the partial derivatives does not imply differentiability (see Exercise 1.32). If, however, we impose the additional condition that these partial derivatives are continuous, then differentiability is assured.

Theorem 1.57 Let A be open in R^m. Suppose that the partial derivatives D_j f_i(x) of the component functions of f exist at each point x ∈ A and are continuous on A. Then f is differentiable at each point of A.

A function satisfying the hypotheses of this theorem is often said to be continuously differentiable, or of class C^1, on A.

Proof of Theorem 1.57. By Theorem 1.38 it suffices to show that each component function of f is differentiable. Therefore we may restrict ourselves to the case of a real-valued function f : A → R. Let a ∈ A. We claim that Df(a) = ∇f(a) when f ∈ C^1.
Recall that (e_1, ..., e_m) is the standard basis of R^m. Then every h = (h_1, ..., h_m) ∈ R^m can be written as h = Σ_{i=1}^m h_i e_i. For each i = 1, ..., m, let

    p_i := a + Σ_{k=1}^i h_k e_k = p_{i−1} + h_i e_i,

where p_0 := a. Figure 1.14 illustrates the case where m = 3 and all h_i are positive.

Figure 1.14: The segmented path from a to a + h.

For each i = 1, ..., m, we also define a function γ_i : [0, 1] → R^m by letting

    γ_i(t) = p_{i−1} + t h_i e_i.

So γ_i is a segment from p_{i−1} to p_i.

By the one-dimensional chain rule and the mean value theorem applied to the differentiable real-valued function g : [0, 1] → R defined by

    g(t) = (f ∘ γ_i)(t),

there exists t̄_i ∈ (0, 1) such that

    f(p_i) − f(p_{i−1}) = g(1) − g(0) = g′(t̄_i) = D_i f(γ_i(t̄_i)) h_i.

Telescoping f(a + h) − f(a) along (γ_1, ..., γ_m) gives

    f(a + h) − f(a) − ∇f(a) · h = Σ_{i=1}^m [f(p_i) − f(p_{i−1})] − ∇f(a) · h
                                = Σ_{i=1}^m [D_i f(γ_i(t̄_i)) − D_i f(a)] h_i.

Continuity of the partials implies that D_i f(γ_i(t̄_i)) − D_i f(a) → 0 as ‖h‖ → 0, which proves that f is differentiable at a with Df(a) = ∇f(a). ∎
Remark 1.58 It follows from Theorem 1.57 that sin(xy) and xy² + z e^{xy} are both differentiable, since they are of class C^1.
Let A ⊆ R^m and f : A → R^n. Suppose that the partial derivatives D_j f_i of the component functions of f exist on A. These then are functions from A to R, and we may consider their partial derivatives, which have the form

    D_k(D_j f_i) =: D_{jk} f_i

and are called the second-order partial derivatives of f. Similarly, one defines the third-order partial derivatives of the functions f_i, or more generally the partial derivatives of order r for arbitrary r.

Definition 1.59 If the partial derivatives of the functions f_i of order less than or equal to r are continuous on A, we say f is of class C^r on A. We say f is of class C^∞ on A if the partials of the functions f_i of all orders are continuous on A.
Definition 1.60 (Hessian) Let a ∈ A ⊆ R^m and let f : A → R be twice differentiable at a. The m × m matrix representing the second derivative of f is called the Hessian of f at a, denoted Hf(a):

    Hf(a) = [ D_{11} f(a)  D_{12} f(a)  ...  D_{1m} f(a) ]
            [ D_{21} f(a)  D_{22} f(a)  ...  D_{2m} f(a) ]
            [     ...           ...               ...    ]
            [ D_{m1} f(a)  D_{m2} f(a)  ...  D_{mm} f(a) ]  = D(∇f)(a).
Remark 1.61 If f : A → R is of class C², then the Hessian of f is a symmetric matrix, i.e., D_{ij} f(a) = D_{ji} f(a) for all i, j = 1, ..., m and for all a ∈ A. See Rudin (1976, Corollary to Theorem 9.41, p. 236).

Exercise 1.62 Find the Hessian of the Cobb-Douglas function

    f(x, y) = x^α y^β.
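The symmetry claim of Remark 1.61 is easy to check symbolically. The sketch below (our own; we deliberately pick a function other than the Cobb-Douglas of Exercise 1.62, which is left to the reader) computes a Hessian with sympy and verifies D_{ij} f = D_{ji} f.

    import sympy as sp

    x, y = sp.symbols('x y')
    f = sp.sin(x * y) + x**3 * y**2            # a C^2 function
    H = sp.hessian(f, (x, y))
    print(H)
    print(sp.simplify(H - H.T))                # the zero matrix: H is symmetric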
1.8 Quadratic Forms: Definite and Semidefinite Matrices

Definition 1.63 (Quadratic Form) Let A be a symmetric n × n matrix. A quadratic form on R^n is a function Q_A : R^n → R of the form

    Q_A(x) = x · Ax = Σ_{i=1}^n Σ_{j=1}^n a_{ij} x_i x_j.

Since the quadratic form Q_A is completely specified by the matrix A, we henceforth refer to A itself as the quadratic form. Observe that if f is of class C², then the Hessian Hf of f defines a quadratic form; see Remark 1.61.

Definition 1.64 A quadratic form A is said to be

  • positive definite if we have x · Ax > 0 for all x ∈ R^n \ {0};
  • positive semidefinite if we have x · Ax ≥ 0 for all x ∈ R^n;
  • negative definite if we have x · Ax < 0 for all x ∈ R^n \ {0};
  • negative semidefinite if we have x · Ax ≤ 0 for all x ∈ R^n.
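In practice one rarely tests x · Ax over all x directly; for symmetric A the standard route (a linear-algebra fact not stated in these notes) goes through eigenvalues: A is positive definite iff all eigenvalues are positive, negative semidefinite iff all are ≤ 0, and so on. A minimal Python sketch:

    import numpy as np

    def definiteness(A, tol=1e-12):
        w = np.linalg.eigvalsh(A)       # eigenvalues of the symmetric matrix A
        if np.all(w >  tol):  return "positive definite"
        if np.all(w < -tol):  return "negative definite"
        if np.all(w >= -tol): return "positive semidefinite"
        if np.all(w <=  tol): return "negative semidefinite"
        return "indefinite"

    print(definiteness(np.array([[2.0, 0.0], [0.0, 1.0]])))   # positive definite
    print(definiteness(np.array([[-1.0, 0.0], [0.0, 0.0]])))  # negative semidefinite
    print(definiteness(np.array([[1.0, 0.0], [0.0, -1.0]])))  # indefinite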
1.9 Homogeneous Functions and Euler's Formula

Definition 1.65 (Homogeneous Function) A function f : R^n → R is homogeneous of degree r (for r = ..., −1, 0, 1, ...) if for every t > 0 we have

    f(t x_1, ..., t x_n) = t^r f(x_1, ..., x_n).

Exercise 1.66 The function

    f(x, y) = A x^α y^β,    A, α, β > 0,

is known as the Cobb-Douglas function. Check whether this function is homogeneous.

Theorem 1.67 (Euler's Formula) Suppose that f : R^n → R is homogeneous of degree r (for some r = ..., −1, 0, 1, ...) and differentiable. Then at any x* ∈ R^n we have

    ∇f(x*) · x* = r f(x*).

Proof. By definition we have

    f(t x*) − t^r f(x*) = 0.

Differentiating with respect to t using the chain rule, we have

    ∇f(t x*) · x* = r t^{r−1} f(x*).

Evaluating at t = 1 gives the desired result. ∎

Lemma 1.68 If f is homogeneous of degree r, its partial derivatives are homogeneous of degree r − 1.

Exercise 1.69 Prove Lemma 1.68.

Exercise 1.70 Let f(x, y) = A x^α y^β with α + β = 1 and A > 0. Show that Theorem 1.67 and Lemma 1.68 hold for this function.
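A quick numerical check of Euler's formula (our own illustration, using the Cobb-Douglas function, which is homogeneous of degree α + β):

    import numpy as np

    A, alpha, beta = 2.0, 0.3, 0.5
    f  = lambda x, y: A * x**alpha * y**beta
    fx = lambda x, y: A * alpha * x**(alpha - 1) * y**beta
    fy = lambda x, y: A * beta * x**alpha * y**(beta - 1)

    x, y = 1.7, 2.4
    print(fx(x, y) * x + fy(x, y) * y)    # grad f(x) . x
    print((alpha + beta) * f(x, y))       # r f(x) with r = alpha + beta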
2  OPTIMIZATION IN R^n
This chapter is based on Luenberger (1969, Chapters 8 & 9), Mas-Colell,
Whinston and Green (1995, Sections M.J & M.K), Sundaram (1996), Vohra
(2005), Luenberger and Ye (2008, Chapter 11), Duggan (2010), and Jehle
and Reny (2011, Chapter A2).
2.1 Introduction

An optimization problem in R^n, or simply an optimization problem, is one where the values of a given function f : R^n → R are to be maximized or minimized over a given set X ⊆ R^n. The function f is called the objective function and the set X the constraint set. Notationally, we will represent these problems by

    Maximize f(x) subject to x ∈ X

and

    Minimize f(x) subject to x ∈ X,

respectively. More compactly, we shall also write

    max {f(x) : x ∈ X}  and  min {f(x) : x ∈ X}.

Example 2.1 [a] Let X = [0, ∞) and f(x) = x. Then the problem max{f(x) : x ∈ X} has no solution; see Figure 2.1(a).

[b] Let X = [0, 1] and f(x) = x(1 − x). Then the problem max{f(x) : x ∈ X} has exactly one solution, namely x = 1/2; see Figure 2.1(b).
Figure 2.1: Example 2.1. (a) No solution; (b) exactly one solution; (c) two solutions.
[c] Let X = [−1, 1] and f(x) = x². Then the problem max{f(x) : x ∈ X} has two solutions, namely x = −1 and x = 1; see Figure 2.1(c).
Example 2.1 suggests that we shall talk of the set of solutions of the optimization problem, which is denoted

    argmax {f(x) : x ∈ X} = {x ∈ X : f(x) ≥ f(y) for all y ∈ X}

and

    argmin {f(x) : x ∈ X} = {x ∈ X : f(x) ≤ f(y) for all y ∈ X}.
We close this section by considering an optimization problem in economics.

Example 2.2 There are n commodities in an economy. There is a consumer whose utility from consuming x_i ≥ 0 units of commodity i (i = 1, ..., n) is given by u(x_1, ..., x_n), where u : R^n_+ → R is the consumer's utility function. The consumer's income is I > 0, and he faces the price vector p = (p_1, ..., p_n). His budget set is given by (see Figure 2.2)

    B(p, I) := {x ∈ R^n_+ : p · x ≤ I}.

The consumer's objective is to maximize his utility over the budget set, i.e.,

    Maximize u(x) subject to x ∈ B(p, I).

Here we introduce an important fact we will make frequent use of.
Figure 2.2: The budget set B(p, I).
Lemma 2.3 Let f : R^n → R be differentiable. Then for all h ∈ R^n satisfying h · ∇f(x) > 0, we have

    f(x + εh) > f(x) for all ε > 0 sufficiently small.

Proof. We approximate f by its first-order Taylor expansion (see Definition 1.27):

    f(x + εh) = f(x) + εh · ∇f(x) + ‖εh‖ R_x(εh),

where lim_{ε→0} R_x(εh) = 0. Then

    lim_{ε↓0} [f(x + εh) − f(x)] / ε = h · ∇f(x) + ‖h‖ lim_{ε↓0} R_x(εh) = h · ∇f(x) > 0;

therefore, there is ε̄ > 0 such that for all ε ∈ (0, ε̄),

    [f(x + εh) − f(x)] / ε > 0,

i.e., f(x + εh) − f(x) > 0 for all ε ∈ (0, ε̄). ∎
Unconstrained Optimization
Definition 2.4 (Maximum)
is a maximum of f if
Given X Rn , f W X ! R and x 2 X , we say x
f .x/ D max ff .y/ W y 2 Xg :
We say x is a local maximum of f if there is some " > 0 such that for all
y 2 X \ B.xI "/ we have f .x/ > f .y/. And x is a strict local maximum of f
if the latter inequality holds strictly.
2.2.1 First-Order Necessary Conditions

Recall that X° is the interior of X ⊆ R^n (Definition 1.13), and that f′(x; u) is the directional derivative of f at x with respect to u (Definition 1.21).

Theorem 2.5 Let X ⊆ R^n and x ∈ X°, and let f : X → R be differentiable at x. If x is a local maximum of f, then ∇f(x) = 0.

Proof. Suppose that ∇f(x) ≠ 0. Let h = ∇f(x) (we can do this since x ∈ X°). Then h · ∇f(x) > 0. Hence f(x + εh) > f(x) for all ε > 0 sufficiently small by Lemma 2.3. This contradicts the local optimality of x. ∎
Definition 2.6 (Critical Point) A vector x ∈ R^n such that ∇f(x) = 0 is called a critical point.

Example 2.7 Let X = R² and f(x, y) = xy − 2x⁴ − y². The first-order condition is

    ∇f(x, y) = (y − 8x³, x − 2y) = (0, 0).

Thus, the critical points are (x, y) = (0, 0), (1/4, 1/8), (−1/4, −1/8).
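The critical points of Example 2.7 can be reproduced with sympy (a sketch of ours, not part of the original notes):

    import sympy as sp

    x, y = sp.symbols('x y', real=True)
    f = x * y - 2 * x**4 - y**2
    crit = sp.solve([sp.diff(f, x), sp.diff(f, y)], [x, y], dict=True)
    print(crit)   # (0, 0), (1/4, 1/8) and (-1/4, -1/8)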
2.2.2 Second-Order Sufficient Conditions

The first-order conditions for unconstrained local optima do not distinguish between maxima and minima (see Example 2.9 below). To obtain such a distinction in the behavior of f at an optimum, we need to examine the behavior of the Hessian Hf of f (see Definition 1.60).

Theorem 2.8 Suppose f is of class C² on X ⊆ R^n, and x ∈ X°.

(a) If f has a local maximum at x, then Hf(x) is negative semidefinite.
(b) If f has a local minimum at x, then Hf(x) is positive semidefinite.
(c) If ∇f(x) = 0 and Hf(x) is negative definite at some x, then x is a strict local maximum of f on X.
(d) If ∇f(x) = 0 and Hf(x) is positive definite at some x, then x is a strict local minimum of f on X.

Proof. See Sundaram (1996, Section 4.6). ∎
Example 2.9 Let f : R → R be defined by f(x) = 2x³ − 3x². It is easy to check that f ∈ C² on R and that there are two critical points: x = 0 and x = 1. Invoking the second-order conditions, we get f″(0) = −6 and f″(1) = 6. Thus, the point x = 0 is a strict local maximum of f on R, and the point x = 1 is a strict local minimum of f on R; see Figure 2.3.

Figure 2.3: x = 0 and x = 1 are local optima but not global optima.

However, there is nothing in the first- or second-order conditions that will help determine whether these points are global optima. In fact, they are not: global optima do not exist in this example, since lim_{x→+∞} f(x) = +∞ and lim_{x→−∞} f(x) = −∞.
2.3 Equality Constrained Optimization: Lagrange's Method

In this section we consider problems with equality constraints of the form

    max f(x)
    s.t. g_i(x) = 0,  i = 1, ..., k ≤ n.    (ECP)

We assume that f : R^n → R and g_i : R^n → R, i = 1, ..., k, are continuously differentiable functions. For notational convenience, we introduce the constraint function g : R^n → R^k, where

    g = (g_1, ..., g_k).

We can then write the constraints in the more compact form g(x) = 0. Also recall that a point x* such that g_i(x*) = 0 for all i = 1, ..., k is called a regular point if the gradient vectors

    ∇g_1(x*), ..., ∇g_k(x*)

are linearly independent (see Definition 1.54).
2.3.1 First-Order Necessary Conditions

Since the representation of the tangent plane is known from Theorem 1.55, it is fairly simple to derive the necessary and sufficient conditions for a point to be a local extremum point subject to equality constraints.

Lemma 2.10 Let x* be a regular point of the constraints g(x) = 0 and a local maximum of f subject to these constraints. Then every x ∈ R^n satisfying

    Dg(x*) x = 0    (2.1)

must also satisfy

    ∇f(x*) · x = 0.    (2.2)

Proof. Take an arbitrary point x ∈ M := {y ∈ R^n : Dg(x*) y = 0}. Since x* is regular, it follows from Theorem 1.55 that the tangent plane T_g(x*) coincides with M. Therefore, there exists a differentiable curve c : [a, b] → R^n on L_g(0) passing through x* with c(t̄) = x* and Dc(t̄) = x for some t̄ ∈ [a, b].

Since x* is a constrained local maximum of f, we have

    d f(c(t)) / dt at t = t̄ equals 0,

or equivalently,

    ∇f(x*) · x = 0. ∎
Lemma 2.10 says that ∇f(x*) is orthogonal to the tangent plane T_g(x*). Here is an example.

Example 2.11 Consider the following problem:

    max f(x_1, x_2) = −x_1/2 + x_2/2
    s.t. g(x_1, x_2) = x_1² + x_2² − 2 = 0.

At the local maximum x* = (−1, 1), the gradient ∇f(x*) = (−1/2, 1/2) is orthogonal to the tangent plane of the constraint surface. See Figure 2.4.
We now conclude from Lemma 2.10 that ∇f(x*) must be a linear combination of the gradients of g at x*.

Figure 2.4: Example 2.11.
Theorem 2.12 (Lagrange's Theorem) Let x* be a local maximum of f subject to g(x) = 0, and assume that x* is a regular point of these constraints. Then there exists a unique vector λ = (λ_1, ..., λ_k) such that

    ∇f(x*) = Σ_{i=1}^k λ_i ∇g_i(x*).

Proof. Since x* is a regular point of the constraints, the vectors ∇g_1(x*), ..., ∇g_k(x*) constitute a basis of R^n if k = n, and in this case it is obvious that the theorem holds. So we assume that k < n.

Let U be the subspace spanned by ∇g_1(x*), ..., ∇g_k(x*). Then R^n can be represented as the direct sum of U and U⊥, the orthogonal complement of U, i.e., R^n = U ⊕ U⊥. It follows from Theorem 1.55 that U⊥ = M. Therefore, dim(R^n) = dim(U) + dim(M), or equivalently,

    dim(M) = n − k.    (2.3)

Let V be the subspace spanned by ∇g_1(x*), ..., ∇g_k(x*), ∇f(x*). Suppose there do not exist λ_1, ..., λ_k such that ∇f(x*) = Σ_{i=1}^k λ_i ∇g_i(x*). Then ∇g_1(x*), ..., ∇g_k(x*), ∇f(x*) are linearly independent, and so dim(V) = k + 1. However, Lemma 2.10 says that

    M ⊆ V⊥,

and this means that

    dim(M) ≤ dim(V⊥) = n − dim(V) = n − (k + 1),    (2.4)

which contradicts (2.3). We thus conclude that ∇f(x*) ∈ U, i.e., there exists a unique vector λ = (λ_1, ..., λ_k) such that

    ∇f(x*) = Σ_{i=1}^k λ_i ∇g_i(x*). ∎
Example 2.13 Consider the problem

    max x_1 x_2 + x_2 x_3 + x_1 x_3
    s.t. x_1 + x_2 + x_3 = 3.

The necessary conditions become

    x_2 + x_3 = λ,  x_1 + x_3 = λ,  x_1 + x_2 = λ.

These three equations together with the one constraint equation give four equations that can be solved for the four unknowns x_1, x_2, x_3, λ. Solution yields x_1 = x_2 = x_3 = 1, λ = 2.
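The same first-order system can be handed to sympy; the sketch below (ours, not part of the notes) solves the three stationarity conditions together with the constraint.

    import sympy as sp

    x1, x2, x3, lam = sp.symbols('x1 x2 x3 lam', real=True)
    f = x1 * x2 + x2 * x3 + x1 * x3
    g = x1 + x2 + x3 - 3
    L = f - lam * g

    eqs = [sp.diff(L, v) for v in (x1, x2, x3, lam)]
    print(sp.solve(eqs, [x1, x2, x3, lam], dict=True))
    # [{x1: 1, x2: 1, x3: 1, lam: 2}]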
The Regularity Condition If x* is not a regular point, then Theorem 2.12 can fail. Consider the following example:

Example 2.14 Let f(x) = (x + 1)² and g(x) = x². Consider the problem of maximizing f subject to g(x) = 0. The maximum is clearly x* = 0 (the only feasible point). But Dg(0) = 0 and Df(0) = 2, so there is no λ such that Df(0) = λ Dg(0).
The Lagrange Multipliers The vector λ = (λ_1, ..., λ_k) in Theorem 2.12 is called the vector of Lagrange multipliers corresponding to the local optimum x*. The i-th multiplier λ_i measures the sensitivity of the value of the objective function at x* to a small relaxation of the i-th constraint g_i.

To clarify the notion of "relaxation of a constraint", let us consider the following one-constraint case:

    max f(x)  s.t.  g̃(x) − c = 0,

where c ∈ R. Then a relaxation of the constraint may be thought of as an increase in c.

Suppose that for each c ∈ R there exists a global maximum x*(c). Suppose further that x*(c) is a regular point for each c, so there exists λ(c) ∈ R such that

    ∇f(x*(c)) = λ(c) ∇g̃(x*(c)).

Suppose further that x*(c) is differentiable with respect to c. Then

    d f(x*(c))/dc = ∇f(x*(c)) · Dx*(c) = λ(c) [∇g̃(x*(c)) · Dx*(c)].

Since g̃(x*(c)) − c = 0 for every c, we also have ∇g̃(x*(c)) · Dx*(c) = 1. Therefore,

    d f(x*(c))/dc = λ(c).

In sum, the Lagrange multiplier λ(c) tells us that a small relaxation in the constraint will raise the maximized value of the objective function by λ(c). For this reason, λ(c) is also called the shadow price of the constraint.
Lagrange’s Method
to solve (ECP).
We now describe a procedure for using Theorem 2.12
Step 1 Set up a function L W Rn Rk ! R, called the Lagrangian, defined
by
L.x; / D f .x/
k
X
i gi .x/:
iD1
k
The vector D .1 ; : : : ; k / 2 R is called the vector of Lagrange multipliers.
Step 2 Find all critical points of L.x; /:
@L
.x; / D Di f .x/
@xi
@L
.x; / D gj .x/ D 0;
@j
k
X
` Di g` .x/ D 0;
i D 1; : : : ; n
(2.5)
`D1
j D 1; : : : ; k:
Define
n
o
Y ´ .x; / 2 RnCk W .x; / satisfies (2.5) and (2.6) :
(2.6)
CHAPTER 2
40
OPTIMIZATION IN RN
Step 3 Evaluate f at each point x in the set
˚
x 2 Rn W 9 such that .x; / 2 Y :
Thus, we see that Lagrange’s method is a clever way of converting a
maximization problem with constraints, to another maximization problem
without constraint, by increasing the number of variables.
Why the Lagrange’s Method typically succeeds in identifying the desired
optima? This is because the set of all critical points of L contains the set of
all local maximums and minimums of (ECP) when the regularity condition
is met. That is, if x is a local maximum or minimum of f subject to
g.x / D 0, and if x is regular, then there exists such that .x ; / is a
critical point of L.
We are not going to explain why the Lagrange’s method could fail (but
see Sundaram 1996, Section 5.4 for details).
Example 2.15 Consider the problem

    max −x² − y²
    s.t. x + y − 1 = 0.

First, form the Lagrangian,

    L(x, y, λ) = −x² − y² − λ(x + y − 1).

Then set all of its first-order partials equal to zero:

    ∂L/∂x = −2x − λ = 0,
    ∂L/∂y = −2y − λ = 0,
    ∂L/∂λ = −(x + y − 1) = 0.

So the critical point of L is

    (x*, y*, λ*) = (1/2, 1/2, −1).

Hence, f(x*, y*) = −1/2; see Figure 2.5.
Exercise 2.16 A consumer purchases a bundle (x, y) to maximize utility. His income is I > 0 and prices are p_x > 0 and p_y > 0. His utility function is

    u(x, y) = x^a y^b,

where a, b > 0. Find his optimal choice (x*, y*).
Figure 2.5: Lagrange's method.
Lagrange’s Theorem Is Not Sufficient Lagrange’s Theorem (Theorem 2.12)
only gives us a necessary—not a sufficient—condition for a constrained local maximum. To see why the first order condition is not generally sufficient, consider the following example.
Example 2.17 Consider the problem

    max x + y²
    s.t. x − 1 = 0.

Observe that (x*, y*) = (1, 0) satisfies the constraint g(x*, y*) = 0, and (1, 0) is regular. Furthermore, the first-order condition from Lagrange's Theorem is satisfied at (1, 0): because ∇f(1, 0) = ∇g(1, 0) = (1, 0), letting λ = 1 gives ∇f(1, 0) = λ ∇g(1, 0).

However, (1, 0) is not a constrained local maximum: for ε > 0, we have g(1, ε) = 0 and f(1, ε) = 1 + ε² > 1 = f(1, 0). See Figure 2.6.
2.3.2 Second-Order Analysis

We probably do not have time to discuss the second-order conditions. See Jehle and Reny (2011, Section A2.3.4) and Sundaram (1996, Section 5.3).
Figure 2.6: Lagrange's Theorem is not sufficient.
2.4 Inequality Constrained Optimization: Kuhn-Tucker Theorem

We now consider a problem of the form

    max f(x)
    s.t. h_1(x) ≤ 0, ..., h_ℓ(x) ≤ 0,    (ICP)

where f and the h_i are continuously differentiable functions from R^n to R. More succinctly, we can write this problem as

    max f(x)
    s.t. h(x) ≤ 0,

where h : R^n → R^ℓ is the function h = (h_1, ..., h_ℓ).

For any feasible point x, the set of active inequality constraints is denoted by

    A(x) := {i : h_i(x) = 0}.    (2.7)

If i ∉ A(x), we say that the i-th constraint is inactive at x. We note that if x* is a local maximum of (ICP), then x* is also a local maximum for a problem identical to (ICP) except that the inactive constraints at x* have been discarded. Thus, inactive constraints at x* do not matter; they can be ignored in the statement of optimality conditions.
Figure 2.7: Inequality constrained optimization.
Definition 2.18 Let x* be a point satisfying the constraint h(x*) ≤ 0. Then x* is said to be regular if the gradients ∇h_i(x*), i ∈ A(x*), are linearly independent.
2.4.1 First-Order Analysis

Example 2.19 Figure 2.7 illustrates a problem with two inequality constraints and depicts three possibilities, depending on whether none, one, or two constraints are active.

In the first case, we could have a constrained local maximum such as x, for which both constraints are inactive. Such a vector must be a critical point of the objective function.

In the second case, only a single constraint is active at a constrained local maximum such as y, and here the gradients of the objective and constraint are collinear. As we will see, these gradients actually point in the same direction.

Lastly, we could have a constrained local maximum such as z, where both constraints are active. Here, the gradient of the objective is not collinear with the gradient of either constraint, and it may appear that no gradient restriction is possible. But in fact, ∇f(z) can be written as a linear combination of ∇h_1(z) and ∇h_2(z) with non-negative weights.

Figure 2.8: Kuhn-Tucker Theorem.

The restrictions evident in Figure 2.7 are formalized in the next theorem.
Theorem 2.20 (Kuhn-Tucker Theorem) Let x* be a local maximum of (ICP), and assume that x* is regular. Then there exists a vector λ = (λ_1, ..., λ_ℓ) such that

    λ_i ≥ 0 and λ_i h_i(x*) = 0,  i = 1, ..., ℓ;    (KT-1)

    ∇f(x*) = Σ_{i=1}^ℓ λ_i ∇h_i(x*).    (KT-2)

Proof. See Sundaram (1996, Section 6.5). ∎
Remark 2.21 [1] Geometrically, the first-order condition from the Kuhn-Tucker Theorem means that the gradient of the objective function, ∇f(x*), is contained in the "semi-positive cone" generated by the gradients of the binding constraints, i.e., it is contained in the set

    { Σ_{i=1}^ℓ α_i ∇h_i(x*) : α_1, ..., α_ℓ ≥ 0 },

depicted in Figure 2.8.

[2] Condition (KT-1) in Theorem 2.20 is called the condition of complementary slackness: if h_i(x*) < 0 then λ_i = 0; if λ_i > 0 then h_i(x*) = 0.
The Kuhn-Tucker Multipliers The vector λ in Theorem 2.20 is called the vector of Kuhn-Tucker multipliers corresponding to the local maximum x*. The Kuhn-Tucker multipliers measure the sensitivity of the objective function at x* to relaxations of the various constraints:

  • If h_i(x*) < 0, then the i-th constraint is already slack, so relaxing it further will not help raise the value of the objective function, and λ_i must be zero.
  • If h_i(x*) = 0, then relaxing the i-th constraint may help increase the value of the maximization exercise, so we have λ_i ≥ 0.
Two Differences There are two important differences from the case of equality constraints (compare Theorem 2.12 and Theorem 2.20):

  • The regularity condition now involves only the gradients of the binding constraints.
  • The multipliers are non-negative. This difference comes from the fact that now only the inequality h_i(x) ≤ 0 needs to be maintained, so relaxing the constraint never hurts.
The Regularity Condition As with the analogous condition in Theorem 2.12
(see Example 2.14), here we show that the regularity condition in Theorem 2.20 is essential.
Example 2.22 Consider the following maximization problem:
$$\begin{aligned} \max \quad & f(x_1, x_2) = x_1 \\ \text{s.t.} \quad & h_1(x_1, x_2) = (1 - x_1)^3 + x_2 \le 0 \\ & h_2(x_1, x_2) = -x_1 \le 0 \\ & h_3(x_1, x_2) = -x_2 \le 0. \end{aligned}$$
See Figure 2.9. Clearly the solution is $(x_1^*, x_2^*) = (1, 0)$. At this point we have
$$\nabla h_1(1, 0) = (0, 1), \quad \nabla h_3(1, 0) = (0, -1), \quad\text{and}\quad \nabla f(1, 0) = (1, 0).$$
Since $x_1^* > 0$, it follows from the complementary slackness condition (KT-1) that $\lambda_2^* = 0$. But now (KT-2) fails: for any $\lambda_1^* \ge 0$ and $\lambda_3^* \ge 0$, we have
$$\lambda_1^* \nabla h_1(1, 0) + \lambda_3^* \nabla h_3(1, 0) = (0, \lambda_1^* - \lambda_3^*) \ne \nabla f(1, 0).$$
This is because $(1, 0)$ is not regular: there are two binding constraints at $(1, 0)$, namely $h_1$ and $h_3$, and the gradients $\nabla h_1(1, 0)$ and $\nabla h_3(1, 0)$ are collinear. Certainly $\nabla f(1, 0)$ cannot be contained in the cone generated by $\nabla h_1(1, 0)$ and $\nabla h_3(1, 0)$; see Remark 2.21.
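The failure of (KT-2) here can also be checked numerically. The following sketch is not part of the original notes; it assumes NumPy and SciPy are available, and searches for non-negative weights expressing $\nabla f(1, 0)$ in the cone of the binding gradients. A strictly positive residual confirms that no such weights exist.

import numpy as np
from scipy.optimize import nnls

# Gradients of the binding constraints at (1, 0), stacked as columns.
G = np.array([[0.0, 0.0],
              [1.0, -1.0]])          # columns: grad h1(1,0), grad h3(1,0)
grad_f = np.array([1.0, 0.0])        # grad f(1,0)

# Non-negative least squares: find lambda >= 0 minimizing ||G lambda - grad_f||.
lam, residual = nnls(G, grad_f)
print(lam, residual)                 # residual = 1.0 > 0, so (KT-2) fails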
Figure 2.9: The constraint qualification fails at $(1, 0)$.
The Lagrangian As with equality constraints, we can define the Lagrangian $L : \mathbb{R}^n \times \mathbb{R}^\ell \to \mathbb{R}$ by
$$L(x, \lambda) = f(x) - \sum_{i=1}^{\ell} \lambda_i h_i(x),$$
and then condition (KT-2) from Theorem 2.20 is the requirement that $x^*$ be a critical point of the Lagrangian given the multipliers $\lambda_1^*, \ldots, \lambda_\ell^*$.
Let us consider a numerical example.
Example 2.23 Consider the problem
$$\max \quad x^2 - y \qquad \text{s.t.} \quad x^2 + y^2 - 1 \le 0.$$
Set up the Lagrangian:
$$L(x, y, \lambda) = x^2 - y - \lambda (x^2 + y^2 - 1).$$
Figure 2.10: Example 2.23.
The critical points of $L$ are the solutions $(x, y, \lambda)$ to
$$2x - 2\lambda x = 0 \tag{2.8}$$
$$-1 - 2\lambda y = 0 \tag{2.9}$$
$$\lambda \ge 0 \tag{2.10}$$
$$x^2 + y^2 - 1 \le 0 \tag{2.11}$$
$$\lambda (x^2 + y^2 - 1) = 0. \tag{2.12}$$
For (2.8) to hold, we must have $x = 0$ or $\lambda = 1$.

If $\lambda = 1$, then (2.9) implies $y = -1/2$, and (2.12) implies $x = \pm\sqrt{3}/2$. That is,
$$(x, y, \lambda) = \left( \pm\frac{\sqrt{3}}{2}, -\frac{1}{2}, 1 \right).$$
We thus have $f(x, y) = 5/4$; see Figure 2.10.
If $x = 0$, then $\lambda > 0$ by (2.9). Hence $h$ is binding: $x^2 + y^2 - 1 = 0$, and so $y = \pm 1$. Since (2.9) implies that $y = 1$ is impossible, we have
$$(x, y, \lambda) = \left( 0, -1, \frac{1}{2} \right).$$
At this critical point, we have $f(0, -1) = 1 < 5/4$, which means that $(0, -1, 1/2)$ cannot be a solution. Since there are no other critical points, it follows that there are exactly two solutions to the maximization problem, namely $(x^*, y^*) = (\pm\sqrt{3}/2, -1/2)$.
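As a sanity check, a general-purpose solver started near one of the candidates converges to the corresponding maximizer with value $5/4$. The sketch below is not from the original notes and assumes SciPy:

import numpy as np
from scipy.optimize import minimize

# Maximize x^2 - y subject to x^2 + y^2 - 1 <= 0 by minimizing the negative.
# SLSQP expects inequality constraints written as c(v) >= 0.
res = minimize(lambda v: -(v[0]**2 - v[1]),
               x0=np.array([0.5, -0.5]),
               method="SLSQP",
               constraints=[{"type": "ineq",
                             "fun": lambda v: 1.0 - v[0]**2 - v[1]**2}])
print(res.x, -res.fun)   # approx (0.866, -0.5) and 1.25 = 5/4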
? Exercise 2.24 Let $U = \mathbb{R}^2$; let $f(x, y) = -(x - 1)^2 - y^2$, and let $h(x, y) = 2kx - y^2 \le 0$, where $k > 0$. Solve the maximization problem
$$\max \{ f(x, y) : h(x, y) \le 0 \}.$$
Example 2.25 Consider a consumer's problem:
$$\begin{aligned} \max_{(x, y) \in \mathbb{R}^2} \quad & u(x, y) = x + y \\ \text{s.t.} \quad & h_1(x, y) = -x \le 0 \\ & h_2(x, y) = -y \le 0 \\ & h_3(x, y) = p_x x + p_y y - I \le 0, \end{aligned} \tag{2.13}$$
where $p_x, p_y, I > 0$.
We first identify all possible combinations of constraints that can, in principle, be binding at the optimum. There are eight combinations to be checked:
$$\varnothing; \quad h_1; \quad h_2; \quad h_3; \quad (h_1, h_2); \quad (h_1, h_3); \quad (h_2, h_3); \quad\text{and}\quad (h_1, h_2, h_3).$$
Of these, the last one can be ruled out, since $h_1 = h_2 = 0$ implies that $h_3 < 0$. Moreover, since $u$ is strictly increasing in both arguments, $h_3 = 0$ at any optimum. So we only need to check three combinations: $(h_1, h_3)$, $(h_2, h_3)$, and $h_3$.
If the optimum occurs at a point where only $h_1$ and $h_3$ are binding, then
$$\{\nabla h_1(x, y), \nabla h_3(x, y)\} = \{(-1, 0), (p_x, p_y)\}$$
is linearly independent, so the constraint qualification holds at such a point.

Similarly, the constraint qualification holds if only $(h_2, h_3)$ bind.

If $h_3$ is the only binding constraint, then $\nabla h_3(x, y) = (p_x, p_y) \ne 0$; that is, the constraint qualification holds. (A numerical version of these checks appears below.)
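The rank computations behind the three cases can be mechanized. The following sketch is not from the original notes; the price values are hypothetical and only illustrate that the checks hold for any $p_x, p_y > 0$.

import numpy as np

px, py = 2.0, 3.0   # hypothetical positive prices

# Gradients of h1 = -x, h2 = -y, h3 = px*x + py*y - I.
grads = {"h1": np.array([-1.0, 0.0]),
         "h2": np.array([0.0, -1.0]),
         "h3": np.array([px, py])}

# The constraint qualification asks the active gradients to be linearly
# independent, i.e., the stacked matrix must have full row rank.
for active in [("h1", "h3"), ("h2", "h3"), ("h3",)]:
    G = np.vstack([grads[name] for name in active])
    print(active, np.linalg.matrix_rank(G) == len(active))   # True each time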
? Exercise 2.26 Solve the problem (2.13).

2.4.2 Second-Order Analysis

We probably do not have time to discuss the second-order conditions. See Duggan (2010, Section 6.3).
2.5 Envelope Theorem
Let $A \subseteq \mathbb{R}$. The graph of a real-valued function $f$ on $A$ is a curve in the plane $\mathbb{R}^2$, and we shall also refer to the curve itself as $f$. Given a one-dimensional parametrized family of curves $f_\alpha : A \to \mathbb{R}$, where $\alpha$ runs over some interval, the curve $h : A \to \mathbb{R}$ is the envelope of the family if

at each of its points, the curve $h$ is tangent to the graph of one of the curves $f_\alpha$, and

each curve $f_\alpha$ is tangent to $h$.

(See, e.g., Apostol 1974, p. 342 or Zorich 2004, p. 252 for this definition.) That is, for each $\alpha$ there is some $q$, and also for each $q$ there is some $\alpha$, satisfying
$$f_\alpha(q) = h(q) \quad\text{and}\quad f_\alpha'(q) = h'(q).$$
We may regard $h$ as a function of $\alpha$ if the correspondence between curves and points on the envelope is one-to-one.
2.5.1 An Envelope Theorem for Unconstrained Maximization

Consider now an unconstrained parametrized maximization problem. Let $x^*(q)$ be the value of the control variable $x$ that maximizes $f(x, q)$, where $q$ is our parameter of interest. For each fixed $x$, the function
$$\varphi_x(q) := f(x, q)$$
defines a curve. We also define the value function
$$V(q) := f(x^*(q), q) = \max_x \varphi_x(q).$$
Under appropriate conditions, the graph of the value function $V$ will be the envelope of the curves $\varphi_x$. "Envelope theorems" in maximization theory are concerned with the tangency conditions this entails.
Example 2.27 Let
$$f(x, q) = q - (x - q)^2 + 1, \qquad\text{where } x, q \in [0, 2].$$
Then given $q$, the maximizing $x$ is given by $x^*(q) = q$, and $V(q) = q + 1$.
Figure 2.11: The graph of $V$ is the envelope of the family of graphs of the functions $\varphi_x$. Panel (a) shows the family of curves $\{\varphi_x(q) : x = 0, 0.2, \ldots, 2\}$; panel (b) shows the envelope $V(q) = q + 1$.
For each $x$, the function $\varphi_x$ is given by
$$\varphi_x(q) = q - (x - q)^2 + 1.$$
The graphs of these functions and of $V$ are shown for selected values of $x$ in Figure 2.11. Observe that the graph of $V$ is the envelope of the family of graphs of the functions $\varphi_x$. Consequently the slope of $V$ is the slope of the $\varphi_x$ to which it is tangent, that is,
$$V'(q) = \left.\frac{\partial \varphi_x}{\partial q}\right|_{x = x^*(q) = q} = \left.\frac{\partial f}{\partial q}\right|_{x = x^*(q) = q} = \bigl[ 1 + 2(x - q) \bigr]\Bigr|_{x = x^*(q) = q} = 1.$$
This last observation is one version of the Envelope Theorem.
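This tangency can be confirmed with a quick numerical experiment. The sketch below is not part of the original notes; it assumes NumPy, computes $V$ by brute-force maximization over a grid, and differentiates it numerically:

import numpy as np

f = lambda x, q: q - (x - q)**2 + 1
xs = np.linspace(0.0, 2.0, 2001)          # grid of controls x in [0, 2]

def V(q):
    return np.max(f(xs, q))               # value function by direct search

q = 1.3                                   # any interior parameter value
slope = (V(q + 1e-5) - V(q - 1e-5)) / 2e-5
print(slope)                              # approx 1.0, matching V'(q) = 1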
2.5.2 An Envelope Theorem for Constrained Maximization
Consider the maximization problem
$$\max_{x \in \mathbb{R}^n} f(x, q) \qquad \text{s.t.} \quad g_i(x, q) = 0, \quad i = 1, \ldots, m, \tag{2.14}$$
where $x$ is a vector of choice variables, and $q = (q_1, \ldots, q_\ell) \in \mathbb{R}^\ell$ is a vector of parameters that may enter the objective function, the constraints, or both. Suppose that for each $q$ there exists a unique solution $x(q)$. Furthermore, we assume that the objective function $f : \mathbb{R}^n \times \mathbb{R}^\ell \to \mathbb{R}$, the constraints $g_i : \mathbb{R}^n \times \mathbb{R}^\ell \to \mathbb{R}$ ($i = 1, \ldots, m$), and the solution $x : \mathbb{R}^\ell \to \mathbb{R}^n$ are differentiable in the parameter $q$.

Then, for every parameter $q$, the maximized value of the objective function is $f(x(q), q)$. This defines a new function $V : \mathbb{R}^\ell \to \mathbb{R}$, called the value function. Formally,
$$V(q) := \max_{x \in \mathbb{R}^n} \{ f(x, q) : g_i(x, q) = 0, \; i = 1, \ldots, m \}. \tag{2.15}$$
Theorem 2.28 (Envelope Theorem) Consider the value function $V(q)$ for the problem (2.14). Let $(\lambda_1, \ldots, \lambda_m)$ be the values of the Lagrange multipliers associated with the maximum solution $x(\bar q)$ at $\bar q$. Then for each $k = 1, \ldots, \ell$,
$$\frac{\partial V(\bar q)}{\partial q_k} = \frac{\partial f(x(\bar q), \bar q)}{\partial q_k} - \sum_{j=1}^{m} \lambda_j \frac{\partial g_j(x(\bar q), \bar q)}{\partial q_k}. \tag{2.16}$$
Proof. By definition, $V(q) = f(x(q), q)$ for all $q$. Using the chain rule, we have
$$\frac{\partial V(\bar q)}{\partial q_k} = \sum_{i=1}^{n} \left[ \frac{\partial f(x(\bar q), \bar q)}{\partial x_i} \frac{\partial x_i(\bar q)}{\partial q_k} \right] + \frac{\partial f(x(\bar q), \bar q)}{\partial q_k}.$$
It follows from Theorem 2.12 that
$$\frac{\partial f(x(\bar q), \bar q)}{\partial x_i} = \sum_{j=1}^{m} \lambda_j \frac{\partial g_j(x(\bar q), \bar q)}{\partial x_i}.$$
Hence,
$$\frac{\partial V(\bar q)}{\partial q_k} = \sum_{i=1}^{n} \left[ \left( \sum_{j=1}^{m} \lambda_j \frac{\partial g_j(x(\bar q), \bar q)}{\partial x_i} \right) \frac{\partial x_i(\bar q)}{\partial q_k} \right] + \frac{\partial f(x(\bar q), \bar q)}{\partial q_k} = \sum_{j=1}^{m} \lambda_j \left[ \sum_{i=1}^{n} \frac{\partial g_j(x(\bar q), \bar q)}{\partial x_i} \frac{\partial x_i(\bar q)}{\partial q_k} \right] + \frac{\partial f(x(\bar q), \bar q)}{\partial q_k}.$$
Finally, since $g_j(x(q), q) = 0$ for all $q$, we have
$$\sum_{i=1}^{n} \frac{\partial g_j(x(\bar q), \bar q)}{\partial x_i} \frac{\partial x_i(\bar q)}{\partial q_k} + \frac{\partial g_j(x(\bar q), \bar q)}{\partial q_k} = 0.$$
Combining, we get (2.16). ∎

Example 2.29
We are given the problem
$$\max_{(x, y) \in \mathbb{R}^2} f((x, y), q) = xy \qquad \text{s.t.} \quad g((x, y), q) = 2x + 4y - q = 0.$$
Forming the Lagrangian, we get
$$L = xy - \lambda (2x + 4y - q),$$
with first-order conditions:
$$y - 2\lambda = 0, \qquad x - 4\lambda = 0, \qquad 2x + 4y - q = 0. \tag{2.17}$$
These solve for $x(q) = q/4$, $y(q) = q/8$ and $\lambda(q) = q/16$. Thus,
$$V(q) = x(q)\, y(q) = \frac{q^2}{32}.$$
Differentiating $V(q)$ with respect to $q$ we get
$$V'(q) = \frac{q}{16}.$$
Now let us verify this using the Envelope Theorem. The theorem tells us that
$$V'(q) = \frac{\partial f((x(q), y(q)), q)}{\partial q} - \lambda(q) \frac{\partial g((x(q), y(q)), q)}{\partial q} = 0 - \lambda(q)(-1) = \lambda(q) = \frac{q}{16}.$$
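The same verification can be done numerically, without solving the first-order conditions at all. In the sketch below (not from the original notes; NumPy assumed), the constraint is eliminated by substituting $x = (q - 4y)/2$, the value function is computed by direct search, and its numerical derivative is compared with $\lambda(q) = q/16$:

import numpy as np

def V(q):
    # Maximize xy along the budget line 2x + 4y = q, i.e. x = (q - 4y)/2.
    ys = np.linspace(0.0, q / 4.0, 20001)
    return np.max((q - 4.0 * ys) / 2.0 * ys)

q = 8.0
slope = (V(q + 1e-4) - V(q - 1e-4)) / 2e-4
print(slope, q / 16.0)   # both approx 0.5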
Example 2.30 Consider a consumer whose utility function $u : \mathbb{R}^n_+ \to \mathbb{R}$ is strictly increasing in every commodity $i = 1, \ldots, n$. Then this consumer's problem is
$$\max \; u(x) \qquad \text{s.t.} \quad \sum_{i=1}^{n} x_i p_i = I.$$
The Lagrangian is
$$L(x, \lambda) = u(x) - \lambda \left( \sum_{i=1}^{n} x_i p_i - I \right).$$
It follows from Theorem 2.28 that
$$V'(I) = \frac{\partial L(x, \lambda)}{\partial I} = \lambda.$$
That is, $\lambda$ measures the marginal utility of income.
2.5.3 Integral Form Envelope Theorem
The envelope theorems we have introduced so far rely on assumptions that are not satisfactory for some applications, e.g., mechanism design. Unfortunately, it is too technical to develop the more advanced treatment of the Envelope Theorem here. We refer the reader to Milgrom and Segal (2002) and Milgrom (2004, Chapter 3) for the integral form Envelope Theorem.
3
CONVEX ANALYSIS IN RN
Rockafellar (1970) is the classical reference for finite-dimensional convex analysis. As for infinite-dimensional convex analysis, Luenberger (1969)
is an excellent text.
To understand the material that follows, it is necessary that the reader have a good background in multivariable calculus (Chapter 1) and linear algebra (Axler, 1997).

In this chapter we will exclusively consider convexity in $\mathbb{R}^n$ for concreteness, but much of the discussion generalizes to infinite-dimensional vector spaces. You may want to consult Lay (1982), Hiriart-Urruty and Lemaréchal (2001), Berkovitz (2002), Bertsekas, Nedić and Ozdaglar (2003), Bertsekas (2009), Ok (2007, Chapter G) and Royden and Fitzpatrick (2010, Chapter 6).
3.1 Convex Sets
Definition 3.1 (Convex Set) A subset $C \subseteq \mathbb{R}^n$ is convex if for every pair of points $x^1, x^2 \in C$, the line segment
$$[x^1, x^2] := \{ x : x = \lambda x^1 + (1 - \lambda) x^2, \; 0 \le \lambda \le 1 \}$$
belongs to $C$.
? Exercise 3.2 Sketch the following sets in $\mathbb{R}^2$ and determine from the figure which sets are convex and which are not:

(a) $\{(x, y) : x^2 + y^2 \le 1\}$,

(b) $\{(x, y) : 0 < x^2 + y^2 \le 1\}$,

(c) $\{(x, y) : y \ge x^2\}$,

(d) $\{(x, y) : |x| + |y| \le 1\}$, and

(e) $\{(x, y) : y \ge 1/(1 + x^2)\}$.

Figure 3.1: $\Delta_2$ in $\mathbb{R}^3$.
Lemma 3.3 Let $\{C_\alpha\}$ be a collection of convex sets such that $C := \bigcap_\alpha C_\alpha \ne \varnothing$. Then $C$ is convex.

Proof. Let $x^1, x^2 \in C$. Then $x^1, x^2 \in C_\alpha$ for all $\alpha$. Since $C_\alpha$ is convex, we have $[x^1, x^2] \subseteq C_\alpha$ for all $\alpha$. Hence, $[x^1, x^2] \subseteq C$, so $C$ is convex. ∎
For each positive integer $n$, define
$$\Delta_{n-1} := \left\{ (\lambda_1, \ldots, \lambda_n) \in [0, 1]^n : \sum_{i=1}^{n} \lambda_i = 1 \right\}. \tag{3.1}$$
For $n = 1$, the set $\Delta_0$ is the singleton $\{1\}$. For $n = 2$, the set $\Delta_1$ is the closed line segment joining $(0, 1)$ and $(1, 0)$. For $n = 3$, the set $\Delta_2$ is the closed triangle with vertices $(1, 0, 0)$, $(0, 1, 0)$ and $(0, 0, 1)$ (see Figure 3.1).
Definition 3.4 A point $x \in \mathbb{R}^n$ is a convex combination of points $x^1, \ldots, x^k$ if there exists $\lambda \in \Delta_{k-1}$ such that
$$x = \sum_{i=1}^{k} \lambda_i x^i.$$
Lemma 3.5 A set $C \subseteq \mathbb{R}^n$ is convex iff every convex combination of points in $C$ is also in $C$.
Proof. The "if" part is evident. So we shall prove the "only if" statement by induction on $k$. It holds for $k = 2$ by definition. Suppose the statement is true for $k$; we show it holds for $k + 1$. Let $x^1, \ldots, x^{k+1} \in C$ and $\lambda \in \Delta_k$ with $\lambda_{k+1} \in (0, 1)$. Then
$$x = \sum_{i=1}^{k+1} \lambda_i x^i = \left( \sum_{j=1}^{k} \lambda_j \right) \left[ \sum_{i=1}^{k} \frac{\lambda_i}{\sum_{j=1}^{k} \lambda_j} x^i \right] + \lambda_{k+1} x^{k+1} \in C,$$
since the bracketed point lies in $C$ by the inductive hypothesis, and the whole expression is then a convex combination of two points of $C$. ∎
Let $A \subseteq \mathbb{R}^n$, and let $\mathcal{A}$ be the class of all convex subsets of $\mathbb{R}^n$ that contain $A$. We have $\mathcal{A} \ne \varnothing$ (after all, $\mathbb{R}^n \in \mathcal{A}$). Then, by Lemma 3.3, $\bigcap \mathcal{A}$ is a convex set in $\mathbb{R}^n$ that contains $A$. Clearly, this set is the smallest (that is, $\subseteq$-minimum) convex subset of $\mathbb{R}^n$ that contains $A$.

Definition 3.6 The convex hull of $A$, denoted by $\operatorname{cov}(A)$, is the intersection of all convex sets containing $A$.
? Exercise 3.7 For a given set $A$, let $K(A)$ denote the set of all convex combinations of points in $A$. Show that $K(A)$ is convex and $A \subseteq K(A)$.
Theorem 3.8 Let $A \subseteq \mathbb{R}^n$. Then $\operatorname{cov}(A) = K(A)$.

Proof. Let $\mathcal{A}$ be the family of convex sets containing $A$. Since $\operatorname{cov}(A) = \bigcap \mathcal{A}$ and $K(A) \in \mathcal{A}$ (Exercise 3.7), we have $\operatorname{cov}(A) \subseteq K(A)$.

To prove the reverse inclusion, take an arbitrary $C \in \mathcal{A}$. Then $A \subseteq C$. It follows from Lemma 3.5 that $K(A) \subseteq C$. Hence $K(A) \subseteq \bigcap \mathcal{A} = \operatorname{cov}(A)$. ∎
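Theorem 3.8 is easy to visualize computationally for a finite $A \subseteq \mathbb{R}^2$: every random convex combination of the points of $A$ lands inside the hull of $A$. The sketch below is illustrative only (not from the original notes) and assumes NumPy and SciPy:

import numpy as np
from scipy.spatial import ConvexHull, Delaunay

rng = np.random.default_rng(0)
A = rng.normal(size=(10, 2))                  # a finite set A in R^2
hull = Delaunay(A[ConvexHull(A).vertices])    # triangulated cov(A)

weights = rng.dirichlet(np.ones(len(A)), size=1000)   # points of Delta_9
combos = weights @ A                          # elements of K(A)
print(np.all(hull.find_simplex(combos) >= 0)) # True: K(A) lies in cov(A)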
3.2 Separation Theorem

This section is devoted to the establishment of separation theorems. In some sense, these theorems are the fundamental theorems of optimization theory. For simplicity, we restrict our analysis to $\mathbb{R}^n$.

Definition 3.9 A hyperplane $H_a^\beta$ in $\mathbb{R}^n$ is defined to be the set of points that satisfy the equation $\langle a, x \rangle = \beta$. Thus,
$$H_a^\beta := \{ x \in \mathbb{R}^n : \langle a, x \rangle = \beta \}.$$
The vector $a$ is said to be a normal to the hyperplane.
Figure 3.2: Hyperplane and half-spaces.
Remark 3.10 Geometrically, a hyperplane $H_a^\beta$ in $\mathbb{R}^n$ is a translation of an $(n-1)$-dimensional subspace (an affine manifold). Algebraically, it is a level set of a linear functional. For an excellent explanation of hyperplanes, see Luenberger (1969, Section 5.12).

A hyperplane $H_a^\beta$ divides $\mathbb{R}^n$ into two half-spaces, one on each side of $H_a^\beta$. The set $\{ x \in \mathbb{R}^n : \langle a, x \rangle \ge \beta \}$ is called the half-space above the hyperplane $H_a^\beta$, and the set $\{ x \in \mathbb{R}^n : \langle a, x \rangle \le \beta \}$ is called the half-space below the hyperplane $H_a^\beta$; see Figure 3.2.

It therefore seems natural to say that two sets $X$ and $Y$ are separated by a hyperplane $H_a^\beta$ if they are contained in different half-spaces determined by $H_a^\beta$. We will now develop a sequence of separation theorems.
Theorem 3.11 (A First Separation Theorem) Let $C$ be a closed and convex subset of $\mathbb{R}^n$; let $y \in \mathbb{R}^n \setminus C$. Then there exist a vector $a \in \mathbb{R}^n$ with $a \ne 0$ and a scalar $\beta \in \mathbb{R}$ such that $\langle a, y \rangle > \beta$ and $\langle a, x \rangle < \beta$ for all $x \in C$.
To motivate the proof, we argue heuristically from Figure 3.3, where $C$ is assumed to have a tangent at each boundary point. Draw a line from $y$ to $x^*$, the point of $C$ that is closest to $y$. The vector $y - x^*$ is orthogonal to $C$ in the sense that $y - x^*$ is orthogonal to the tangent line at $x^*$. The tangent line, which is exactly the set $\{ z \in \mathbb{R}^n : \langle y - x^*, z - x^* \rangle = 0 \}$, separates $y$ and $C$. The point $x^*$ is characterized by the fact that $\langle y - x^*, x - x^* \rangle \le 0$ for all $x \in C$. If we move the tangent line parallel to itself so as to pass through a point $x' \in (x^*, y)$, we are done. We now justify these steps in a series of claims.
Claim 1 Let $C$ be a convex subset of $\mathbb{R}^n$ and let $y \in \mathbb{R}^n \setminus C$. If there exists a point in $C$ that is closest to $y$, then it is unique.
Figure 3.3: The separating hyperplane theorem.
Proof. Suppose that there were two points $x^1$ and $x^2$ of $C$ that were closest to $y$. (For a set $X \subseteq \mathbb{R}^n$, the distance from $y$ to $X$ is $d(y, X) := \inf_{x \in X} \| y - x \|$.) Then $(x^1 + x^2)/2 \in C$ since $C$ is convex, and so
$$d(y, C) \le \left\| y - \frac{x^1 + x^2}{2} \right\| = \left\| \frac{1}{2}(x^1 - y) + \frac{1}{2}(x^2 - y) \right\| \le \frac{1}{2}\| x^1 - y \| + \frac{1}{2}\| x^2 - y \| = d(y, C).$$
Hence the triangle inequality holds with equality. It follows from Exercise 1.8 that there exists $\lambda \ge 0$ such that $x^1 - y = \lambda (x^2 - y)$. Clearly, $\lambda \ne 0$; for otherwise $x^1 - y = 0$ implies that $y = x^1 \in C$. Then $\lambda = 1$ since $\| x^1 - y \| = \| x^2 - y \| = d(y, C)$. But then $x^1 = x^2$. ∎

Claim 2 Let $C$ be a closed subset of $\mathbb{R}^n$ and let $y \in \mathbb{R}^n \setminus C$. Then there exists a point $x^* \in C$ that is closest to $y$.

Proof. Take an arbitrary point $x^0 \in C$. Let $r > \| x^0 - y \|$. Then $C_1 := B(y; r) \cap C$ is nonempty (at least $x^0$ is in the intersection), closed, and bounded, and hence is compact. The function $x \mapsto \| x - y \|$ is continuous on $C_1$ and so attains its minimum at some point $x^* \in C_1$, i.e.,
$$\| x^* - y \| \le \| x - y \| \quad \text{for all } x \in C_1.$$
For every $x \in C \setminus C_1$, we have
$$\| x - y \| \ge r > \| x^0 - y \| \ge \| x^* - y \|,$$
since $x^0 \in C_1$. Hence $x^*$ is closest to $y$ among all points of $C$. ∎
Claim 3 Let $C$ be a convex subset of $\mathbb{R}^n$ and let $y \in \mathbb{R}^n \setminus C$. Then $x^* \in C$ is a closest point in $C$ to $y$ iff
$$\langle y - x^*, x - x^* \rangle \le 0 \quad \text{for all } x \in C. \tag{3.2}$$

Proof. Let $x^* \in C$ be a closest point to $y$ and let $x \in C$. Since $C$ is convex, we have
$$[x^*, x] := \{ z(t) \in \mathbb{R}^n : z(t) = x^* + t(x - x^*), \; t \in [0, 1] \} \subseteq C.$$
Let
$$g(t) := \| z(t) - y \|^2 = \bigl\langle x^* + t(x - x^*) - y, \; x^* + t(x - x^*) - y \bigr\rangle = \sum_{i=1}^{n} \bigl( x_i^* - y_i + t(x_i - x_i^*) \bigr)^2.$$
Observe that $g(0) = \| x^* - y \|^2$. Since $g$ is continuously differentiable on $(0, 1]$ and $x^* \in \operatorname{argmin}_{x \in C} \| x - y \|$, we have $g_+'(0) \ge 0$. Since
$$g'(t) = 2 \sum_{i=1}^{n} \bigl( x_i^* - y_i + t(x_i - x_i^*) \bigr)(x_i - x_i^*) = 2 \bigl[ \langle x^* - y, x - x^* \rangle + t \| x - x^* \|^2 \bigr], \tag{3.3}$$
letting $t \downarrow 0$ we get (3.2).

Conversely, suppose that (3.2) holds. Take an arbitrary $x \in C \setminus \{x^*\}$. It follows from (3.3) that if $t \in (0, 1]$ then
$$g'(t) = 2 \bigl[ \langle x^* - y, x - x^* \rangle + t \| x - x^* \|^2 \bigr] \ge 2t \| x - x^* \|^2 > 0.$$
That is, $g$ is strictly increasing on $[0, 1]$. Thus, $g(1) = \| x - y \|^2 > \| x^* - y \|^2 = g(0)$. ∎
Proof of Theorem 3.11. We can now complete the proof of Theorem 3.11. Let $x^* \in C$ be the closest point to $y$ (by Claims 1 and 2). Let $a = y - x^*$. Then for all $x \in C$, we have $\langle a, x - x^* \rangle \le 0$ (by Claim 3), i.e., $\langle a, x \rangle \le \langle a, x^* \rangle$, with equality occurring when $x = x^*$. Hence,
$$\max_{x \in C} \langle a, x \rangle = \langle a, x^* \rangle.$$
On the other hand, $\langle a, y - x^* \rangle = \| a \|^2 > 0$, so
$$\langle a, y \rangle = \langle a, x^* \rangle + \| a \|^2 > \langle a, x^* \rangle.$$
Finally, take an arbitrary $\beta \in (\langle a, x^* \rangle, \langle a, y \rangle)$. We thus have $\langle a, y \rangle > \beta$ and $\langle a, x \rangle \le \langle a, x^* \rangle < \beta$ for all $x \in C$. ∎
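The construction in this proof is easy to trace numerically when $C$ is the closed unit ball, where the projection of $y$ onto $C$ is simply $y / \| y \|$. The following sketch (illustrative, not from the original notes; NumPy assumed) builds $a$ and $\beta$ exactly as in the proof and checks the two separation inequalities:

import numpy as np

y = np.array([2.0, 1.0])                 # a point outside the unit ball C
x_star = y / np.linalg.norm(y)           # closest point of C to y
a = y - x_star                           # normal vector from the proof
beta = 0.5 * (a @ x_star + a @ y)        # any beta in (<a,x*>, <a,y>) works

print(a @ y > beta)                      # True
pts = np.random.default_rng(1).normal(size=(1000, 2))
pts /= np.maximum(np.linalg.norm(pts, axis=1, keepdims=True), 1.0)  # points of C
print(np.all(pts @ a < beta))            # True: <a, x> < beta on all of C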
Theorem 3.12 (Separation Theorem) Let $X$ and $Y$ be two disjoint convex subsets of $\mathbb{R}^n$. Then there exist $a \in \mathbb{R}^n$ with $a \ne 0$ and a scalar $\beta \in \mathbb{R}$ such that $\langle a, x \rangle \le \beta$ for all $x \in X$ and $\langle a, y \rangle \ge \beta$ for all $y \in Y$. That is, there is a hyperplane $H_a^\beta$ that separates $X$ and $Y$.

Proof. First note that
$$X \cap Y = \varnothing \iff 0 \notin X - Y.$$
Since $X$ and $Y$ are convex, so is $X - Y$. Hence, there exists an $a \ne 0$ such that
$$\langle a, x - y \rangle \le 0 \quad \text{for all } x - y \in X - Y.$$
Therefore,
$$\langle a, x \rangle \le \langle a, y \rangle \quad \text{for all } x \in X \text{ and } y \in Y. \tag{3.4}$$
Let $\alpha := \sup \{ \langle a, x \rangle : x \in X \}$ and $\gamma := \inf \{ \langle a, y \rangle : y \in Y \}$. Then from (3.4) we get that $\alpha$ and $\gamma$ are finite, and for any number $\beta \in [\alpha, \gamma]$,
$$\langle a, x \rangle \le \beta \le \langle a, y \rangle \quad \text{for all } x \in X \text{ and } y \in Y.$$
Thus the hyperplane $H_a^\beta$ separates $X$ and $Y$. ∎
Theorem 3.13 (Proper Separation Theorem) Let $X$ and $Y$ be two convex sets such that $X^\circ \ne \varnothing$ and $X^\circ \cap Y = \varnothing$. Then there exists a hyperplane $H_a^\beta$ that properly separates $\overline{X}$ and $\overline{Y}$.

Proof. Since $X^\circ$ is convex and disjoint from $Y$, it follows from Theorem 3.12 that there exists a hyperplane $H_a^\beta$ such that
$$\langle a, x \rangle \le \beta \le \langle a, y \rangle \quad \text{for all } x \in X^\circ \text{ and all } y \in Y. \tag{3.5}$$
Let $x \in \overline{X}$. Then $x \in \overline{X^\circ}$ since $\overline{X} = \overline{X^\circ}$. So there exists a sequence $(x^n)$ in $X^\circ$ such that $x^n \to x$. It then follows from (3.5) and the continuity of the inner product that (3.5) holds for all $x \in \overline{X}$ and all $y \in \overline{Y}$. Hence, the hyperplane $H_a^\beta$ separates $\overline{X}$ and $\overline{Y}$. Since $(\overline{X})^\circ = X^\circ$, we know that $(\overline{X})^\circ \ne \varnothing$. Furthermore, $\langle a, x \rangle < \beta$ for $x \in (\overline{X})^\circ$, so the separation is proper. ∎
Theorem 3.14 (Strong Separation Theorem) Let $K$ be a compact convex set and let $C$ be a closed convex set such that $K \cap C = \varnothing$. Then $K$ and $C$ can be strongly separated.

Proof. For every $\varepsilon > 0$, define
$$K_\varepsilon := K + B(0; \varepsilon) = \bigcup_{x \in K} \bigl( x + B(0; \varepsilon) \bigr), \qquad C_\varepsilon := C + B(0; \varepsilon) = \bigcup_{y \in C} \bigl( y + B(0; \varepsilon) \bigr).$$
Clearly, both $K_\varepsilon$ and $C_\varepsilon$ are open and convex.

We now show that there exists $\varepsilon > 0$ such that $K_\varepsilon \cap C_\varepsilon = \varnothing$. If the assertion were false, there would exist a sequence $(\varepsilon_n)$ with $\varepsilon_n > 0$ and $\varepsilon_n \to 0$, and a sequence $(w^n)$ such that $w^n \in K_{\varepsilon_n} \cap C_{\varepsilon_n}$ for each $n$. Since
$$w^n = x^n + z^n \quad \text{with } x^n \in K \text{ and } \| z^n \| < \varepsilon_n, \qquad w^n = y^n + z'^n \quad \text{with } y^n \in C \text{ and } \| z'^n \| < \varepsilon_n,$$
we have a sequence $(x^n)$ in $K$ and a sequence $(y^n)$ in $C$ such that
$$\| w^n - x^n \| < \varepsilon_n, \qquad \| w^n - y^n \| < \varepsilon_n.$$
Hence,
$$\| x^n - y^n \| = \| (x^n - w^n) + (w^n - y^n) \| \le \| x^n - w^n \| + \| y^n - w^n \| < 2\varepsilon_n. \tag{3.6}$$
Since $K$ is compact, there exists a subsequence $(x^{n_k})$ that converges to some $x^0 \in K$. It follows from (3.6) that
$$\| y^{n_k} - x^0 \| \le \| x^{n_k} - x^0 \| + \| x^{n_k} - y^{n_k} \| \to 0,$$
that is, $y^{n_k} \to x^0$. Since $C$ is closed, $x^0 \in C$. This contradicts the assumption that $C \cap K = \varnothing$, and so the assertion is true.

We have shown that there exists $\varepsilon > 0$ such that $K_\varepsilon \cap C_\varepsilon = \varnothing$. Hence, by Theorem 3.13 there is a hyperplane that properly separates $K_\varepsilon$ and $C_\varepsilon$. Since both $K_\varepsilon$ and $C_\varepsilon$ are open, the separation is strict. According to the definition, this says that $K$ and $C$ are strongly separated. ∎
3.3 Systems of Linear Inequalities: Theorems of the Alternative
Let $A$ be an $m \times n$ matrix and let $b \ne 0$ be a vector in $\mathbb{R}^m$. We focus on finding a non-negative $x \in \mathbb{R}^n_+$ such that $Ax = b$, or showing that no such $x$ exists. The problem can be framed as follows: can $b$ be expressed as a non-negative linear combination of the columns of $A$?
Definition 3.15 The set of all non-negative linear combinations of the columns of $A$ is called the finite cone generated by the columns of $A$. It is denoted $\operatorname{cone}(A)$.

Lemma 3.16 Let $A$ be an $m \times n$ matrix. Then $\operatorname{cone}(A)$ is a convex set.
Proof. Take any two $y, y' \in \operatorname{cone}(A)$. Then there exist $x, x' \in \mathbb{R}^n_+$ such that
$$y = Ax \quad \text{and} \quad y' = Ax'.$$
For any $\lambda \in [0, 1]$ we have
$$\lambda y + (1 - \lambda) y' = \lambda Ax + (1 - \lambda) Ax' = A \bigl( \lambda x + (1 - \lambda) x' \bigr).$$
Since $\lambda x + (1 - \lambda) x' \in \mathbb{R}^n_+$, it follows that $\lambda y + (1 - \lambda) y' \in \operatorname{cone}(A)$. ∎
Lemma 3.17 Let $A$ be an $m \times n$ matrix. Then $\operatorname{cone}(A)$ is a closed set.

Proof (sketch). Consider first the case in which the columns of $A$ are linearly independent. Let $(w^n)$ be a convergent sequence in $\operatorname{cone}(A)$ with limit $w$; we show that $w \in \operatorname{cone}(A)$. For each $w^n$ there exists $x^n \in \mathbb{R}^n_+$ such that $w^n = Ax^n$. Since the columns of $A$ are linearly independent, the map $x \mapsto Ax$ has a continuous left inverse on its image, so the convergence of $(Ax^n)$ implies that $(x^n)$ converges to some $x \in \mathbb{R}^n_+$ (the orthant being closed), and hence $w = Ax \in \operatorname{cone}(A)$. The general case reduces to this one, because $\operatorname{cone}(A)$ is the union of the finitely many cones generated by linearly independent subsets of the columns of $A$, and a finite union of closed sets is closed. ∎
Theorem 3.18 (Farkas' Lemma) Let $A$ be an $m \times n$ matrix and let $b \in \mathbb{R}^m \setminus \{0\}$. Then one and only one of the systems
$$Ax = b, \quad x \ge 0, \quad x \in \mathbb{R}^n, \tag{F$_\text{I}$}$$
$$y^{\mathsf T} A \ge 0, \quad y^{\mathsf T} b < 0, \quad y \in \mathbb{R}^m, \tag{F$_\text{II}$}$$
has a solution.
Proof. We first suppose that (F$_\text{I}$) has a solution $x$. We shall show that (F$_\text{II}$) has no solution. Suppose there exists $y^0 \in \mathbb{R}^m$ such that $A^{\mathsf T} y^0 \ge 0$. Then
$$(y^0)^{\mathsf T} b = (y^0)^{\mathsf T} A x = x^{\mathsf T} (A^{\mathsf T} y^0) \ge 0,$$
since $x \ge 0$. Hence (F$_\text{II}$) has no solution.

We now suppose that (F$_\text{I}$) has no solution; that is,
$$b \notin \operatorname{cone}(A).$$
Since $\operatorname{cone}(A)$ is convex and closed (Lemmas 3.16 and 3.17), there exists a hyperplane $H_{y^0}^\beta$ that strongly separates $b$ and $\operatorname{cone}(A)$. Thus, $y^0 \ne 0$ and
$$\langle y^0, b \rangle > \beta > \langle y^0, z \rangle \quad \text{for all } z \in \operatorname{cone}(A).$$
Since $0 \in \operatorname{cone}(A)$, we get that $\beta > 0$. Thus,
$$\langle y^0, b \rangle > 0. \tag{3.7}$$
For all $z = Ax \in \operatorname{cone}(A)$ we have
$$\langle y^0, z \rangle = \langle y^0, Ax \rangle = \langle x, A^{\mathsf T} y^0 \rangle < \beta,$$
the last inequality holding for all $x \in \mathbb{R}^n_+$. Since $x$ can be scaled arbitrarily within $\mathbb{R}^n_+$, this is possible only if $A^{\mathsf T} y^0 \le 0$. Setting $y := -y^0$, we obtain $y^{\mathsf T} A \ge 0$ and, by (3.7), $y^{\mathsf T} b < 0$; that is, (F$_\text{II}$) has a solution. ∎
Example 3.19 We use Farkas' Lemma to decide whether the following system has a non-negative solution:
$$\begin{bmatrix} 4 & 1 & -5 \\ 1 & 0 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$
0
The Farkas alternative is
h
y1
Or equivalently,
"
i 4
y2
1
˚
1
0
#
h
5
> 0
2
0
i
0 ;
y1 C y2 < 0:
4y1 C y2
> 0;
y1
> 0;
5y1 C 2y2 > 0;
y1 C y2
< 0:
(3.8)
The system (3.8) has no solution: the second inequality requires that $y_1 \ge 0$; combining this with the last inequality, we conclude that $y_2 < 0$. But $y_1 \ge 0$ and $y_2 < 0$ contradict the third inequality of (3.8). So, by Farkas' Lemma, the original system has a non-negative solution.
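Feasibility questions of this kind are exactly what linear programming solvers answer. As a quick check (a sketch, not from the original notes; SciPy assumed), one can pose $Ax = b$, $x \ge 0$ as a linear program with a zero objective:

import numpy as np
from scipy.optimize import linprog

A = np.array([[4.0, 1.0, -5.0],
              [1.0, 0.0, 2.0]])
b = np.array([1.0, 1.0])

# Zero objective: we only care whether the feasible set is nonempty.
res = linprog(c=np.zeros(3), A_eq=A, b_eq=b, bounds=[(0, None)] * 3)
print(res.status == 0, res.x)   # True, together with one non-negative solution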
We use the Farkas’ Lemma to decide the solvability of the
Example 3.20
system:
2
1
6
60
6
61
4
1
1
1
0
1
3
2 3
0 2 3
2
7 x1
6 7
7
6
17 6 7 627
4x 2 5 D 6 7
7:
17
5 x
425
3
1
1
We are interested in non-negative solutions of this system. The Farkas
alternative is
2
h
y1
y2
y3
1
i6
60
y4 6
61
4
1
1
1
0
1
3
0
7 h
17
7> 0
17
5
1
0
0
i
0 ;
2y1 C 2y2 C 2y3 C y4 < 0:
One solution is y1 D y2 D y3 D
system has no solution.
3.4 Convex Functions

Throughout this section we will assume that the subset $C \subseteq \mathbb{R}^n$ is convex and that $f$ is a real-valued function defined on $C$, that is, $f : C \to \mathbb{R}$. When we take $x^1, x^2 \in C$, we will let $x^t := t x^1 + (1 - t) x^2$, for $t \in [0, 1]$, denote the convex combination of $x^1$ and $x^2$.
Definitions

Definition 3.21 A function $f : C \to \mathbb{R}$ is convex if for all $x^1, x^2 \in C$ and $t \in [0, 1]$,
$$f[t x^1 + (1 - t) x^2] \le t f(x^1) + (1 - t) f(x^2).$$
The function $f$ is strictly convex if the above inequality holds strictly whenever $x^1 \ne x^2$ and $t \in (0, 1)$.

A function $f : C \to \mathbb{R}$ is concave if for all $x^1, x^2 \in C$ and $t \in [0, 1]$,
$$f[t x^1 + (1 - t) x^2] \ge t f(x^1) + (1 - t) f(x^2).$$
The function $f$ is strictly concave if the above inequality holds strictly whenever $x^1 \ne x^2$ and $t \in (0, 1)$.
Figure 3.4: (a) $f$ is convex. (b) $f$ is concave. (c) $f$ is quasi-convex. (d) $f$ is quasi-concave.
Definition 3.22 A function $f : C \to \mathbb{R}$ is quasi-convex if, for all $x^1, x^2 \in C$ and $t \in [0, 1]$,
$$f[t x^1 + (1 - t) x^2] \le \max \{ f(x^1), f(x^2) \}.$$
The function $f$ is strictly quasi-convex if the above inequality holds strictly.

A function $f : C \to \mathbb{R}$ is quasi-concave if, for all $x^1, x^2 \in C$ and $t \in [0, 1]$,
$$f[t x^1 + (1 - t) x^2] \ge \min \{ f(x^1), f(x^2) \}.$$
The function $f$ is strictly quasi-concave if the above inequality holds strictly.
Geometric Interpretation

Given a function $f : C \to \mathbb{R}$ and $y_0 \in \mathbb{R}$, let us define:

The epigraph of $f$: $\operatorname{epi}(f) := \{ (x, y) \in C \times \mathbb{R} : f(x) \le y \}$;

The subgraph of $f$: $\operatorname{sub}(f) := \{ (x, y) \in C \times \mathbb{R} : f(x) \ge y \}$;

The superior set for level $y_0$: $S(y_0) := \{ x \in C : f(x) \ge y_0 \}$;

The inferior set for level $y_0$: $I(y_0) := \{ x \in C : f(x) \le y_0 \}$.

We then have (see Figure 3.4):

$f$ is convex $\iff$ $\operatorname{epi}(f)$ is convex.

$f$ is concave $\iff$ $\operatorname{sub}(f)$ is convex.

$f$ is quasi-convex $\iff$ $I(y_0)$ is convex for every $y_0$.

$f$ is quasi-concave $\iff$ $S(y_0)$ is convex for every $y_0$.
Convexity and Quasi-convexity

It is a simple matter to show that concavity (convexity) implies quasi-concavity (quasi-convexity).

Theorem 3.23 Let $C \subseteq \mathbb{R}^n$ and $f : C \to \mathbb{R}$. If $f$ is concave on $C$, it is also quasi-concave on $C$. If $f$ is convex on $C$, it is also quasi-convex on $C$.

Proof. We only prove the first claim, and leave the second as an exercise. Suppose $f$ is concave on $C$. Take any $x, y \in C$ and $t \in [0, 1]$. Without loss of generality, we let $f(x) \ge f(y)$. By the definition of concavity, we have
$$f\bigl( t x + (1 - t) y \bigr) \ge t f(x) + (1 - t) f(y) = t \bigl[ f(x) - f(y) \bigr] + f(y) \ge f(y) = \min \{ f(x), f(y) \}.$$
Hence, $f$ is quasi-concave. ∎

? Exercise 3.24 Prove the second claim in Theorem 3.23: if $f$ is convex, then it is also quasi-convex.
Concavity and the Hessian

We now characterize concavity of a function using the Hessian matrix.

Theorem 3.25 Let $A \subseteq \mathbb{R}^n$ be open and convex. The (twice continuously differentiable) function $f : A \to \mathbb{R}$ is concave if and only if $Hf(x)$ is negative semidefinite for every $x \in A$.

Proof. See Mas-Colell et al. (1995, Theorem M.C.2). ∎
Example 3.26 Let $A := (0, \infty) \times (-5, \infty)$. Let $f(x, y) = \ln x + \ln(y + 5)$. For each point $(x, y) \in A$, the Hessian of $f$ is
$$Hf(x, y) = \begin{bmatrix} -1/x^2 & 0 \\ 0 & -1/(y + 5)^2 \end{bmatrix}.$$
Then for each $(u, v) \in \mathbb{R}^2$, we have
$$\begin{bmatrix} u & v \end{bmatrix} \begin{bmatrix} -1/x^2 & 0 \\ 0 & -1/(y + 5)^2 \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = -\frac{u^2}{x^2} - \frac{v^2}{(y + 5)^2} \le 0.$$
That is, $Hf(x, y)$ is negative semidefinite. Hence, $f(x, y)$ is concave. See Figure 3.5.
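Negative semidefiniteness is equivalent to all eigenvalues of the (symmetric) Hessian being non-positive, which is easy to spot-check numerically. The sketch below is not from the original notes and assumes NumPy:

import numpy as np

def hessian(x, y):
    # Hessian of f(x, y) = ln x + ln(y + 5) on A = (0, inf) x (-5, inf).
    return np.array([[-1.0 / x**2, 0.0],
                     [0.0, -1.0 / (y + 5.0)**2]])

# Sample the domain and confirm every eigenvalue is <= 0.
ok = all(np.all(np.linalg.eigvalsh(hessian(x, y)) <= 0.0)
         for x in np.linspace(0.1, 30.0, 50)
         for y in np.linspace(-4.9, 30.0, 50))
print(ok)   # True: the Hessian is negative semidefinite on the sample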
Jensen’s Inequality
Theorem 3.27 (Jensen’s Inequality) Let f W C ! R, where C Rn is
convex. Then f is convex iff for every finite set of points x 1 ; : : : ; x k 2 C
and every t D .t1 ; : : : ; tk / 2 k 1 (see (3.1) for the definition of k 1 ),
0
f @
k
X
i D1
1
ti x i A 6
k
X
ti f .x i /:
(3.9)
i D1
Proof. The ”If” part is evident. So we only prove the “only if” part. Suppose
that f is convex. We shall prove (3.9) by induction on k . For k D 1 the
relation is trivial (remember that 0 D f1g when k D 1). For k D 2 the
relation follows from the definition of a convex function. Suppose that
k > 2 and that (3.9) has been established for k 1. We show that (3.9) holds
for k .
Figure 3.5: The function $\ln x + \ln(y + 5)$ is concave.
If $t_k = 1$, then there is nothing to prove. If $t_k < 1$, set $T = \sum_{i=1}^{k-1} t_i$. Then $T + t_k = 1$, $T = 1 - t_k > 0$ and
$$\sum_{i=1}^{k-1} \frac{t_i}{T} = \frac{T}{T} = 1.$$
Hence,
$$f\left( \sum_{i=1}^{k} t_i x^i \right) = f\left( \sum_{i=1}^{k-1} t_i x^i + t_k x^k \right) = f\left( T \left[ \sum_{i=1}^{k-1} \frac{t_i}{T} x^i \right] + t_k x^k \right) \le T f\left( \sum_{i=1}^{k-1} \frac{t_i}{T} x^i \right) + t_k f(x^k) \le T \left[ \sum_{i=1}^{k-1} \frac{t_i}{T} f(x^i) \right] + t_k f(x^k) = \sum_{i=1}^{k} t_i f(x^i),$$
where the first inequality follows from the convexity of $f$ and the second inequality follows from the inductive hypothesis. ∎
3.5 Convexity and Optimization

Concave Programming

We first present two results which indicate the importance of convexity for optimization theory:

In convex optimization problems, all local optima must also be global optima.

If a strictly convex optimization problem admits a solution, the solution must be unique.

Theorem 3.28 Let $X \subseteq \mathbb{R}^n$ be convex and $f : X \to \mathbb{R}$ be concave. Then

(a) Any local maximizer of $f$ is a global maximizer of $f$.

(b) The set $\operatorname{argmax} \{ f(x) : x \in X \}$ of maximizers of $f$ on $X$ is either empty or convex.
Proof. (a) Suppose $x$ is a local maximizer but not a global maximizer of $f$. Then there exists $\varepsilon > 0$ such that
$$f(x) \ge f(y) \quad \text{for all } y \in X \cap B(x; \varepsilon),$$
and there exists $z \in X$ such that
$$f(z) > f(x). \tag{3.10}$$
Since $X$ is convex, $t x + (1 - t) z \in X$ for all $t \in (0, 1)$. Take $\bar t$ sufficiently close to $1$ so that $\bar t x + (1 - \bar t) z \in B(x; \varepsilon)$. By the concavity of $f$ and (3.10), we have
$$f\bigl( \bar t x + (1 - \bar t) z \bigr) \ge \bar t f(x) + (1 - \bar t) f(z) > f(x),$$
a contradiction.

(b) Suppose that $x^1$ and $x^2$ are both maximizers of $f$ on $X$. Then $f(x^1) = f(x^2)$. For any $t \in (0, 1)$, we have
$$f(x^1) \ge f\bigl( t x^1 + (1 - t) x^2 \bigr) \ge t f(x^1) + (1 - t) f(x^2) = f(x^1).$$
That is, $f[t x^1 + (1 - t) x^2] = f(x^1)$. Thus, $t x^1 + (1 - t) x^2$ is a maximizer of $f$ on $X$. ∎
Theorem 3.29 Let $X \subseteq \mathbb{R}^n$ be convex and $f : X \to \mathbb{R}$ be strictly concave. Then $\operatorname{argmax} \{ f(x) : x \in X \}$ either is empty or contains a single point.

? Exercise 3.30 Prove Theorem 3.29.
We now present an extremely important theorem, which says that the first-order conditions of the Kuhn-Tucker Theorem (Theorem 2.20) are both necessary and sufficient to identify optima of convex inequality-constrained optimization problems, provided a mild regularity condition is met.
Theorem 3.31 (Kuhn-Tucker Theorem under Convexity) Let $U \subseteq \mathbb{R}^n$ be open and convex. Let $f : U \to \mathbb{R}$ be a concave $C^1$ function. For $i = 1, \ldots, \ell$, let $h_i : U \to \mathbb{R}$ be convex $C^1$ functions. Suppose there is some $\bar x \in U$ such that
$$h_i(\bar x) < 0, \qquad i = 1, \ldots, \ell.$$
(This is called Slater's condition.) Then $x^*$ maximizes $f$ over
$$X := \{ x \in U : h_i(x) \le 0, \; i = 1, \ldots, \ell \}$$
if and only if there is $\lambda^* \in \mathbb{R}^\ell$ such that the Kuhn-Tucker first-order conditions hold:
$$\lambda^* \ge 0, \qquad \sum_{i=1}^{\ell} \lambda_i^* h_i(x^*) = 0; \tag{KTC-1}$$
$$\nabla f(x^*) = \sum_{i=1}^{\ell} \lambda_i^* \nabla h_i(x^*). \tag{KTC-2}$$

Proof. See Sundaram (1996, Section 7.7). ∎
Example 3.32 Let $U = (0, \infty) \times (-5, \infty)$. Let $f(x, y) = \ln x + \ln(y + 5)$, $h_1(x, y) = x + y - 4$ and $h_2(x, y) = -y$. Consider the problem
$$\max_{(x, y) \in U} f(x, y) \qquad \text{s.t.} \quad h_1(x, y) \le 0, \quad h_2(x, y) \le 0. \tag{3.11}$$

? Exercise 3.33 Show that $(x^*, y^*) = (4, 0)$ is the unique point satisfying the first-order conditions for a local maximizer of the problem (3.11).

Clearly, Slater's condition holds. Then, combining Theorem 3.31, Exercise 3.33 and Example 3.26, we conclude that $(4, 0)$ is a global maximizer. See Figure 3.6.
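A numerical solver agrees. The following sketch (not from the original notes; SciPy assumed) maximizes the concave objective over the constraint set and recovers the maximizer $(4, 0)$:

import numpy as np
from scipy.optimize import minimize

# Maximize ln x + ln(y + 5) s.t. x + y <= 4 and y >= 0, with x > 0.
res = minimize(lambda v: -(np.log(v[0]) + np.log(v[1] + 5.0)),
               x0=np.array([1.0, 1.0]),
               method="SLSQP",
               bounds=[(1e-9, None), (0.0, None)],
               constraints=[{"type": "ineq",
                             "fun": lambda v: 4.0 - v[0] - v[1]}])
print(res.x)   # approx [4.0, 0.0], the global maximizer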
Slater’s Condition
For a formal demonstration of the need for Slater’s condition, let us consider the following example.
3.5 CONVEXITY AND OPTIMIZATION
73
y
Lf .ln
4Cl
4
n 5/
Lh
/
.0
1
X
.4; 0/
0
4
x
Figure 3.6: .4; 0/ is the global maximizer.
Example 3.34 Let $U = \mathbb{R}$; let $f(x) = x$ and $g(x) = x^2$. The only point in $\mathbb{R}$ satisfying $g(x) \le 0$ is $x = 0$, so this is trivially the constrained maximizer of $f$. But
$$f'(0) = 1 \quad \text{and} \quad g'(0) = 0,$$
so there is no $\lambda \ge 0$ such that $f'(0) = \lambda g'(0)$.
Quasi-concavity and Optimization

Quasi-concave and quasi-convex functions fail to exhibit many of the sharp properties that distinguish concave and convex functions. As an example, we show that Theorem 3.31 fails for quasi-concave objective functions.

Example 3.35 Let $f : \mathbb{R} \to \mathbb{R}$ and $h : \mathbb{R} \to \mathbb{R}$ be quasi-concave continuously differentiable functions, where
$$f(x) = \begin{cases} x^3 & \text{if } x \in (-\infty, 0) \\ 0 & \text{if } x \in [0, 1] \\ (x - 1)^2 & \text{if } x \in (1, \infty), \end{cases} \qquad\text{and}\qquad h(x) = -x;$$
see Figure 3.7.
? Exercise 3.36 Show that for every point $x \in [0, 1]$, there exists $\lambda \ge 0$ such that the pair $(x, \lambda)$ satisfies (KT-1) and (KT-2).
Furthermore, Slater's condition holds. However, it is clear that no point $x \in [0, 1]$ can be a solution to the problem (why?).

Figure 3.7: Quasi-concavity and optimization.
REFERENCES

[1] Apostol, Tom M. (1974) Mathematical Analysis, Pearson Education, 2nd edition.

[2] Axler, Sheldon (1997) Linear Algebra Done Right, Undergraduate Texts in Mathematics, New York: Springer-Verlag, 2nd edition.

[3] Berkovitz, Leonard D. (2002) Convexity and Optimization in $\mathbb{R}^n$, Pure and Applied Mathematics: A Wiley-Interscience Series of Texts, Monographs and Tracts, New York: Wiley-Interscience.

[4] Bertsekas, Dimitri P. (2009) Convex Optimization Theory, Belmont, Massachusetts: Athena Scientific.

[5] Bertsekas, Dimitri P., Angelia Nedić, and Asuman E. Ozdaglar (2003) Convex Analysis and Optimization, Belmont, Massachusetts: Athena Scientific.

[6] Crossley, Martin D. (2005) Essential Topology, Springer Undergraduate Mathematics Series, London: Springer-Verlag.

[7] Duggan, John (2010) "Notes on Optimization and Pareto Optimality," URL: http://www.johnduggan.net/lecture, Rochester University.

[8] Hiriart-Urruty, Jean-Baptiste and Claude Lemaréchal (2001) Fundamentals of Convex Analysis, Grundlehren Text Editions, Berlin: Springer-Verlag.

[9] Jehle, Geoffrey A. and Philip J. Reny (2011) Advanced Microeconomic Theory, Essex, England: Prentice Hall, 3rd edition.

[10] Lay, Steven R. (1982) Convex Sets and Their Applications, New York: John Wiley & Sons.

[11] Lee, Jeffrey M. (2009) Manifolds and Differential Geometry, Vol. 107 of Graduate Studies in Mathematics, Providence, Rhode Island: American Mathematical Society.

[12] Luenberger, David G. (1969) Optimization by Vector Space Methods, Wiley Professional Paperback Series, New York: John Wiley & Sons.

[13] Luenberger, David G. and Yinyu Ye (2008) Linear and Nonlinear Programming, International Series in Operations Research and Management Science, New York: Springer, 3rd edition.

[14] Mas-Colell, Andreu, Michael D. Whinston, and Jerry R. Green (1995) Microeconomic Theory, New York: Oxford University Press.

[15] Milgrom, Paul (2004) Putting Auction Theory to Work, Cambridge, U.K.: Cambridge University Press.

[16] Milgrom, Paul and Ilya Segal (2002) "Envelope Theorems for Arbitrary Choice Sets," Econometrica, 70 (2): 583–601.

[17] Munkres, James R. (1991) Analysis on Manifolds, Boulder, Colorado: Westview Press.

[18] Ok, Efe A. (2007) Real Analysis with Economic Applications, New Jersey: Princeton University Press.

[19] O'Neill, Barrett (2006) Elementary Differential Geometry, New York: Academic Press, revised 2nd edition.

[20] Pugh, Charles Chapman (2002) Real Mathematical Analysis, Undergraduate Texts in Mathematics, New York: Springer-Verlag.

[21] Rockafellar, R. Tyrrell (1970) Convex Analysis, Princeton Mathematical Series, New Jersey: Princeton University Press, 2nd edition.

[22] Royden, Halsey and Patrick Fitzpatrick (2010) Real Analysis, New Jersey: Prentice Hall, 4th edition.

[23] Rudin, Walter (1976) Principles of Mathematical Analysis, New York: McGraw-Hill, 3rd edition.

[24] Spivak, Michael (1965) Calculus on Manifolds: A Modern Approach to Classical Theorems of Advanced Calculus, Boulder, Colorado: Westview Press.

[25] Spivak, Michael (1999) A Comprehensive Introduction to Differential Geometry, Vol. Two, Houston, Texas: Publish or Perish, Inc.

[26] Sundaram, Rangarajan K. (1996) A First Course in Optimization Theory, Cambridge, Massachusetts: Cambridge University Press.

[27] Vohra, Rakesh V. (2005) Advanced Mathematical Economics, London: Routledge.

[28] Willard, Stephen (2004) General Topology, New York: Dover Publications, Inc.

[29] Zorich, Vladimir A. (2004) Mathematical Analysis II, Universitext, Berlin: Springer-Verlag.
INDEX

$C^1$, 25
$C^\infty$, 27
$C^r$, 27
Active inequality constraint, 42
Affine manifold, 58
Chain Rule, 14
Cobb-Douglas function, 28
Component function, 5
Concave function, 65
Continuously differentiable, 25
Convex combination, 56
Convex function, 65
Convex hull, 57
Convex set, 55
Critical point, 34
Curve, 22
Directional derivative, 9
Envelope, 49
Epigraph, 67
Euclidean n-space, 2
Farkas’ Lemma, 63
First-order Taylor formula, 10
Gradient, 19
Graph, 5
Half space, 58
Helix, 22
Hessian, 27
Hyperplane, 57
Implicit function, 16
Implicit function theorem, 17
Inactive constraint, 42
Inequality constrained optimization,
42
Inferior set, 67
Inner product, 3
Interior, 4
Jacobian matrix, 16, 24
Jensen’s inequality, 68
Kuhn-Tucker Multipliers, 44
Kuhn-Tucker Theorem, 44
Lagrange multiplier, 38, 39
Lagrange multipliers, 38
Lagrange’s Method, 39
Lagrange’s Theorem, 37
Lagrangian, 39
Limit, 5
Local maximum, 33
Maximum, 33
maximum, 33
Negative definite, 28
Negative semidefinite, 28
Norm, 2
Orthogonal complement, 37
Partial derivative, 13
INDEX
Positive definite, 28
Positive semidefinite, 28
Proper Separation Theorem, 61
Quadratic form, 27
quasi-concave function, 66
quasi-convex function, 66
Regular point, 23, 43
Second-order partial derivative, 27
Separation Theorem, 61
Separation theorem, 58
Sequence, 5
Shadow price, 39
Slater’s condition, 72
Standard basis, 13
Strict local maximum, 33
Strictly concave function, 65
Strictly convex function, 65
Strictly quasi-concave function, 66
Strictly quasi-convex function, 66
Strong Separation Theorem, 62
Subgraph, 67
Superior set, 67
Tangent plane, 23
Unconstrained optimization, 33
Value function, 49, 51