Introduction to Optimization Theory
Lecture Notes

JIANFEI SHEN
SCHOOL OF ECONOMICS, SHANDONG UNIVERSITY

Besides language and music, mathematics is one of the primary manifestations of the free creative power of the human mind. — Hermann Weyl

CONTENTS

1 Multivariable Calculus
  1.1 Functions on Euclidean Space
  1.2 Directional Derivative and Derivative
  1.3 Partial Derivatives and the Jacobian
  1.4 The Chain Rule
  1.5 The Implicit Function Theorem
  1.6 Gradient and Its Properties
  1.7 Continuously Differentiable Functions
  1.8 Quadratic Forms: Definite and Semidefinite Matrices
  1.9 Homogeneous Functions and Euler's Formula
2 Optimization in R^n
  2.1 Introduction
  2.2 Unconstrained Optimization
  2.3 Equality Constrained Optimization: Lagrange's Method
  2.4 Inequality Constrained Optimization: Kuhn-Tucker Theorem
  2.5 Envelope Theorem
3 Convex Analysis in R^n
  3.1 Convex Sets
  3.2 Separation Theorem
  3.3 Systems of Linear Inequalities: Theorems of the Alternative
  3.4 Convex Functions
  3.5 Convexity and Optimization
References
Index

1 MULTIVARIABLE CALCULUS

In this chapter we consider functions mapping R^m into R^n, and we define what we mean by the derivative of such a function. It is important to be familiar with the idea that the derivative at a point a of a map between open sets of (normed) vector spaces is a linear transformation between the vector spaces (in this chapter the linear transformation is represented as an n × m matrix). This chapter is based on Spivak (1965, Chapters 1 & 2) and Munkres (1991, Chapter 2); one could do no better than to study these two excellent books for multivariable calculus.

Notation. We use standard notation:

N := {1, 2, 3, ...} = the set of all natural numbers;
Z := {0, ±1, ±2, ...} = the set of all integers;
Q := {n/m : n, m ∈ Z and m ≠ 0} = the set of all rational numbers;
R := the set of all real numbers.

We also define R_+ := {x ∈ R : x ≥ 0} and R_++ := {x ∈ R : x > 0}.

1.1 Functions on Euclidean Space

Norm, Inner Product and Metric

Definition 1.1 (Euclidean n-space). Euclidean n-space R^n is defined as the set of all n-tuples (x_1, ..., x_n) of real numbers x_i:

R^n := {(x_1, ..., x_n) : each x_i ∈ R}.

An element of R^n is often called a point in R^n, and R^1, R^2, R^3 are often called the line, the plane, and space, respectively. If x denotes an element of R^n, then x is an n-tuple of numbers, the i-th one of which is denoted x_i; thus, we can write x = (x_1, ..., x_n).

A point in R^n is frequently also called a vector in R^n, because R^n, with

x + y = (x_1 + y_1, ..., x_n + y_n), x, y ∈ R^n,

and

αx = (αx_1, ..., αx_n), α ∈ R and x ∈ R^n,

as operations, is a vector space. To obtain the full geometric structure of R^n, we introduce three structures on R^n: the Euclidean norm, inner product and metric.
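Before the formal definitions, here is a quick numerical preview of the three structures (a minimal Python sketch using NumPy, added for illustration; the specific vectors are assumptions, not from the printed notes):

```python
import numpy as np

x = np.array([1.0, 2.0, 2.0])
y = np.array([4.0, 0.0, -3.0])

norm_x = np.sqrt(np.dot(x, x))    # Euclidean norm: ||x|| = sqrt(x . x) = 3.0
inner_xy = np.dot(x, y)           # inner product: 1*4 + 2*0 + 2*(-3) = -2.0
dist_xy = np.linalg.norm(x - y)   # metric: d(x, y) = ||x - y|| = sqrt(38)

# Two properties established below: Cauchy-Schwarz and the triangle inequality.
assert abs(inner_xy) <= norm_x * np.linalg.norm(y)
assert np.linalg.norm(x + y) <= norm_x + np.linalg.norm(y)
```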
Definition 1.2 (Norm). In R^n, the length of a vector x ∈ R^n, usually called the norm ‖x‖ of x, is defined by

‖x‖ = √(x_1² + ⋯ + x_n²).

Remark 1.3. The norm ‖·‖ satisfies the following properties: for all x, y ∈ R^n and α ∈ R,
- ‖x‖ ≥ 0, and ‖x‖ = 0 iff x = 0 ("iff" is the abbreviation of "if and only if");
- ‖αx‖ = |α| ‖x‖;
- ‖x + y‖ ≤ ‖x‖ + ‖y‖ (Triangle Inequality).

Exercise 1.4. Prove that |‖x‖ − ‖y‖| ≤ ‖x − y‖ for any two vectors x, y ∈ R^n (use the triangle inequality).

Definition 1.5 (Inner Product). Given x, y ∈ R^n, the inner product of the vectors x and y, denoted x·y or ⟨x, y⟩, is defined as

x·y = Σ_{i=1}^n x_i y_i.

Remark 1.6. The norm and the inner product are related through the following identity:

‖x‖ = √(x·x).

Theorem 1.7 (Cauchy-Schwarz Inequality). For any x, y ∈ R^n we have |x·y| ≤ ‖x‖ ‖y‖.

Proof. We assume that x ≠ 0, for otherwise the proof is trivial. For every a ∈ R, we have

0 ≤ ‖ax + y‖² = a² ‖x‖² + 2a (x·y) + ‖y‖².

In particular, let a = −(x·y)/‖x‖². Then the above display becomes 0 ≤ ‖y‖² − (x·y)²/‖x‖², which rearranges to the desired result. ∎

Exercise 1.8. Prove the triangle inequality (use the Cauchy-Schwarz Inequality). Show it holds with equality iff one of the vectors is a nonnegative scalar multiple of the other.

Definition 1.9 (Metric). The distance d(x, y) between two vectors x, y ∈ R^n is given by

d(x, y) = √( Σ_{i=1}^n (x_i − y_i)² ).

The distance function d is called a metric.

Example 1.10. In R², choose two points x¹ = (x_1¹, x_2¹) and x² = (x_1², x_2²) with x_1² − x_1¹ = a and x_2² − x_2¹ = b. Then Pythagoras tells us that (Figure 1.1)

d(x¹, x²) = √(a² + b²) = √( (x_1² − x_1¹)² + (x_2² − x_2¹)² ).

Remark 1.11. The metric is related to the norm ‖·‖ through the identity d(x, y) = ‖x − y‖.

[Figure 1.1: Distance in the plane.]

Subsets of R^n

Definition 1.12 (Open Ball). Let x ∈ R^n and r > 0. The open ball B(x; r) with center x and radius r is given by

B(x; r) := {y ∈ R^n : d(x, y) < r}.

Definition 1.13 (Interior). Let S ⊆ R^n. A point x ∈ S is called an interior point of S if there is some r > 0 such that B(x; r) ⊆ S. The set of all interior points of S is called its interior and is denoted S°.

Definition 1.14. Let S ⊆ R^n.
- S is open if for every x ∈ S there exists r > 0 such that B(x; r) ⊆ S.
- S is closed if its complement R^n ∖ S is open.
- S is bounded if there exists r > 0 such that S ⊆ B(0; r).
- S is compact if (and only if) it is closed and bounded (Heine-Borel Theorem). (This definition does not work for more general metric spaces. See Willard (2004) for details.)

Example 1.15. On R, the interval (0, 1) is open, and the interval [0, 1] is closed. Both (0, 1) and [0, 1] are bounded, and [0, 1] is compact. However, the interval (0, 1] is neither open nor closed. But R is both open and closed.

Limit and Continuity

Functions. A function from R^m to R^n (sometimes called a vector-valued function of m variables) is a rule which associates to each point in R^m some point in R^n. We write f: R^m → R^n to indicate that f(x) ∈ R^n is defined for x ∈ R^m. The notation f: A → R^n indicates that f(x) is defined only for x in the set A, which is called the domain of f. If B ⊆ A, we define f(B) as the set of all f(x) for x ∈ B:

f(B) := {f(x) : x ∈ B}.

If C ⊆ R^n we define f⁻¹(C) := {x ∈ A : f(x) ∈ C}. The notation f: A → B indicates that f(A) ⊆ B. The graph of f: A → B is the set of all pairs (a, b) ∈ A × B such that b = f(a).
A function f: A → R^n determines n component functions f_1, ..., f_n: A → R by

f(x) = (f_1(x), ..., f_n(x)).

Sequences. A sequence is a function that assigns to each natural number n ∈ N a vector or point x_n ∈ R^n. We usually write the sequence as (x_n)_{n=1}^∞ or (x_n).

Example 1.16. Examples of sequences in R² are
(a) (x_n) = ((n, n));
(b) (x_n) = ((cos nπ/2, sin nπ/2));
(c) (x_n) = (((−1)^n/2^n, 1/2^n));
(d) (x_n) = (((−1)^n − 1/n, (−1)^n − 1/n)).
See Figure 1.2.

Definition 1.17 (Limit). A sequence (x_n) is said to have a limit x or to converge to x if for every ε > 0 there is N_ε ∈ N such that whenever n ≥ N_ε, we have x_n ∈ B(x; ε). We write

lim_{n→∞} x_n = x or x_n → x.

Example 1.18. In Example 1.16, the sequences (a), (b) and (d) do not converge, while the sequence (c) converges to (0, 0).

[Figure 1.2: Examples of sequences. (a) x_n = (n, n); (b) x_n = (cos nπ/2, sin nπ/2); (c) x_n = ((−1)^n/2^n, 1/2^n), which is convergent; (d) x_n = ((−1)^n − 1/n, (−1)^n − 1/n).]

Naive Continuity. The simplest way to say that a function f: A → R is continuous is to say that one can draw its graph without taking the pencil off the paper. For example, a function whose graph looks like the one in Figure 1.3(a) would be continuous in this sense (Crossley, 2005, Chapter 2). But if we look at the function f(x) = 1/x, then we see that things are not so simple. The graph of this function has two parts: one part corresponding to negative x values, and the other to positive x values. The function is not defined at 0, so we certainly cannot draw both parts of this graph without taking our pencil off the paper; see Figure 1.3(b). Of course, f(x) = 1/x is continuous near every point in its domain. Such a function deserves to be called continuous. So this characterization of continuity in terms of graph-sketching is too simplistic.

[Figure 1.3: Naive continuity. (a) A continuous function. (b) We cannot draw the graph of 1/x without taking our pencil off the paper.]

Rigorous Continuity. The notation lim_{x→a} f(x) = b means, as in the one-variable case, that we can get f(x) as close to b as desired by choosing x sufficiently close to, but not equal to, a. In mathematical terms this means that for every number ε > 0 there is a number δ > 0 such that ‖f(x) − b‖ < ε for all x in the domain of f which satisfy 0 < ‖x − a‖ < δ. A function f: A → R^n is called continuous at a ∈ A if lim_{x→a} f(x) = f(a), and f is continuous if it is continuous at each a ∈ A.

Exercise 1.19. Let

f(x) = { x if x ≠ 1; 3/2 if x = 1 }.

Show that f(x) is not continuous at a = 1.

1.2 Directional Derivative and Derivative

Let us first recall how the derivative of a real-valued function of a real variable is defined. Let A ⊆ R and let f: A → R. Suppose A contains a neighborhood of the point a, that is, there is an open ball B(a; r) such that B(a; r) ⊆ A. We define the derivative of f at a by the equation

f'(a) = lim_{t→0} [f(a + t) − f(a)] / t, (1.1)

provided the limit exists. In this case, we say that f is differentiable at a. Geometrically, f'(a) is the slope of the tangent line to the graph of f at the point (a, f(a)).

Definition 1.20. For a function f: (a, b) → R and a point x_0 ∈ (a, b), if

lim_{t↑0} [f(x_0 + t) − f(x_0)] / t

exists and is finite, we denote this limit by f'₋(x_0) and call it the left-hand derivative of f at x_0.
Similarly, we define f'₊(x_0) and call it the right-hand derivative of f at x_0. Of course, f is differentiable at x_0 iff it has left-hand and right-hand derivatives at x_0 that are equal.

Now let A ⊆ R^m, where m > 1, and let f: A → R^n. Can we define the derivative of f by replacing a and t in the definition just given by points of R^m? Certainly we cannot, since we cannot divide a point of R^n by a point of R^m when m > 1.

Directional Derivative. The following is our first attempt at a definition of "derivative". Let A ⊆ R^m and let f: A → R^n. We study how f changes as we move from a point a ∈ A° (the interior of A) along a line segment to a nearby point a + u, where u ≠ 0. Each point on the segment can be expressed as a + tu, where t ∈ R. The vector u describes the direction of the line segment. Since a is an interior point of A, the line segment joining a to a + tu will lie in A if t is small enough.

Definition 1.21 (Directional Derivative). Let A ⊆ R^m and let f: A → R^n. Suppose A contains a neighborhood of a. Given u ∈ R^m with u ≠ 0, define

f'(a; u) = lim_{t→0} [f(a + tu) − f(a)] / t,

provided the limit exists. This limit is called the directional derivative of f at a with respect to the vector u.

Remark 1.22. In calculus, one usually requires u to be a unit vector, i.e., ‖u‖ = 1, but that is not necessary.

Example 1.23. Let f: R² → R be given by the equation f(x_1, x_2) = x_1 x_2. The directional derivative of f at a = (a_1, a_2) with respect to the vector u = (u_1, u_2) is

f'(a; u) = lim_{t→0} [f(a + tu) − f(a)] / t = lim_{t→0} [(a_1 + t u_1)(a_2 + t u_2) − a_1 a_2] / t = u_2 a_1 + u_1 a_2.

Example 1.24. Suppose the directional derivative of f at a with respect to u exists. Then for c ∈ R,

f'(a; cu) = lim_{t→0} [f(a + tcu) − f(a)] / t = c lim_{t→0} [f(a + tcu) − f(a)] / (tc) = c lim_{s→0} [f(a + su) − f(a)] / s = c f'(a; u).

Remark 1.25. Example 1.24 shows that if u and v are collinear vectors in R^m, then f'(a; u) and f'(a; v) are collinear in R^n.

However, the directional derivative is not the appropriate generalization of the notion of "derivative". The main problems are:

Problem 1. Continuity does not follow from this definition of "differentiability". There exist functions such that f'(a; u) exists for all u ≠ 0 but which are not continuous.

Example 1.26. Define f: R² → R by setting

f(x, y) = { 0 if (x, y) = (0, 0); x²y/(x⁴ + y²) if (x, y) ≠ (0, 0) }.

We show that all directional derivatives of f exist at 0, but that f is not continuous at 0. Let u = (h, k) ≠ 0. Then

[f(0 + tu) − f(0)] / t = (th)²(tk) / ( t [(th)⁴ + (tk)²] ) = h²k / (t²h⁴ + k²),

so that

f'(0; u) = { h²/k if k ≠ 0; 0 if k = 0 }.

However, the function f takes the value 1/2 at each point of the parabola y = x² (except at 0), so f is not continuous at 0 since f(0) = 0.

Problem 2. Composites of "differentiable" functions may fail to be differentiable.

Derivative. To give the right generalization of the notion of "derivative", let us reconsider (1.1). In fact, if f'(a) exists, let R_a(t) denote the difference

R_a(t) := { [f(a + t) − f(a)]/t − f'(a) if t ≠ 0; 0 if t = 0 }. (1.2)

From (1.2) we see that lim_{t→0} R_a(t) = 0. Then we have

f(a + t) = f(a) + f'(a) t + R_a(t) t. (1.3)

Note that (1.3) also holds for t = 0. This is called the first-order Taylor formula for approximating f(a + t) − f(a) by f'(a) t. The error committed is R_a(t) t. ("All science is dominated by the idea of approximation." — Bertrand Russell.) See Figure 1.4. It is this idea that leads to the following definition:

Definition 1.27 (Differentiability). Let A ⊆ R^m and let f: A → R^n.
Suppose A contains a neighborhood of a. We say that f is differentiable at a if there is an n × m matrix B_a such that

f(a + h) = f(a) + B_a h + ‖h‖ R_a(h),

where lim_{h→0} R_a(h) = 0. The matrix B_a, which is unique, is called the derivative of f at a; it is denoted Df(a).

[Figure 1.4: f'(a)t is the linear approximation to f(a + t) − f(a) at a.]

Remark 1.28.
[1] Notice that h is a point of R^m and f(a + h) is a point of R^n, so the norm signs are essential.
[2] The derivative Df(a) depends on the point a as well as the function f. We are not saying that there exists a B which works for all a, but that for a fixed a such a B exists.
[3] Here is how to visualize Df. Take m = n = 2. The function f: A → R² distorts shapes nonlinearly; its derivative describes the linear part of the distortion. Circles are sent by f to wobbly ovals, but they become ellipses under Df(a) (here we treat Df(a) as the matrix that represents a linear operator; see, e.g., Axler 1997). Lines are sent by f to curves, but they become straight lines under Df(a). See Figure 1.5.

[Figure 1.5: Df(a) is the linear part of f at a.]

Example 1.29. Let f: R^m → R^n be defined by the equation

f(x) = A x + b,

where A is an n × m matrix and b ∈ R^n. Then

f(a + h) = A(a + h) + b = A a + b + A h = f(a) + A h.

Hence, R_a(h) = [f(a + h) − f(a) − A h]/‖h‖ = 0; that is, Df(a) = A.

We now show that the definition of derivative is stronger than the directional derivative. In particular, we have:

Theorem 1.30. Let A ⊆ R^m and let f: A → R^n. If f is differentiable at a, then f is continuous at a.

Proof. Differentiability at a implies that

‖f(a + h) − f(a)‖ = ‖Df(a) h + ‖h‖ R_a(h)‖ ≤ ‖Df(a)‖ ‖h‖ + ‖R_a(h)‖ ‖h‖ → 0

as a + h → a, where the inequality follows from the Triangle Inequality and the Cauchy-Schwarz Inequality (Theorem 1.7). ∎

However, there is a nice connection between the directional derivative and the derivative.

Theorem 1.31. Let A ⊆ R^m and let f: A → R^n. If f is differentiable at a, then all the directional derivatives of f at a exist, and

f'(a; u) = Df(a) u.

Proof. Fix any u ∈ R^m and take h = tu. Then

[f(a + tu) − f(a)] / t = [Df(a)(tu) + ‖tu‖ R_a(tu)] / t = Df(a) u + (|t| ‖u‖ / t) R_a(tu).

The last term converges to zero as t → 0, which proves that f'(a; u) = Df(a) u. ∎

Exercise 1.32. Define f: R² → R by setting

f(x, y) = { 0 if (x, y) = (0, 0); x²y/(x⁴ + y²) if (x, y) ≠ (0, 0) }.

Show that f is not differentiable at (0, 0).

Exercise 1.33. Let f: R² → R be defined by f(x, y) = √|xy|. Show that f is not differentiable at (0, 0).

1.3 Partial Derivatives and the Jacobian

We now introduce the notion of the "partial derivatives" of a real-valued function. Let (e_1, ..., e_m) be the standard basis of R^m, i.e.,

e_1 = (1, 0, 0, ..., 0), e_2 = (0, 1, 0, ..., 0), ..., e_m = (0, 0, ..., 0, 1).

Definition 1.34 (Partial Derivatives). Let A ⊆ R^m and let f: A → R. We define the j-th partial derivative of f at a to be the directional derivative of f at a with respect to the vector e_j, provided this derivative exists; and we denote it by D_j f(a).
That is,

D_j f(a) = lim_{t→0} [f(a + t e_j) − f(a)] / t.

Remark 1.35. It is important to note that D_j f(a) is the ordinary derivative of a certain function; in fact, if g(x) = f(a_1, ..., a_{j−1}, x, a_{j+1}, ..., a_m), then D_j f(a) = g'(a_j). This means that D_j f(a) is the slope of the tangent line at (a, f(a)) to the curve obtained by intersecting the graph of f with the planes x_i = a_i for i ≠ j. See Figure 1.6.

The following theorem relates partial derivatives to the derivative in the case where f is a real-valued function.

Theorem 1.36. Let A ⊆ R^m and let f: A → R. If f is differentiable at a, then

Df(a) = [D_1 f(a)  D_2 f(a)  ⋯  D_m f(a)].

Proof. If f is differentiable at a, then Df(a) is a (1 × m)-matrix. Let

Df(a) = [λ_1  λ_2  ⋯  λ_m].

It follows from Theorem 1.31 that D_j f(a) = f'(a; e_j) = Df(a) e_j = λ_j. ∎

[Figure 1.6: D_1 f(a, b).]

1.4 The Chain Rule

We extend the familiar chain rule to the current setting.

Theorem 1.37 (Chain Rule). Let A ⊆ R^m and B ⊆ R^n. Let f: A → R^n and g: B → R^p, with f(A) ⊆ B. Suppose f(a) = b. If f is differentiable at a, and if g is differentiable at b, then the composite function g∘f: A → R^p is differentiable at a. Furthermore,

D(g∘f)(a) = Dg(b) Df(a).

Proof. Omitted. See, e.g., Spivak (1965, Theorem 2-2), Rudin (1976, Theorem 9.15), or Munkres (1991, Theorem 7.1). ∎

Here is an application of the Chain Rule.

Theorem 1.38. Let A ⊆ R^m and let f: A → R^n. Suppose A contains a neighborhood of a. Let f_i: A → R be the i-th component function of f, so that

f(x) = (f_1(x), ..., f_n(x)).

(a) The function f is differentiable at a if and only if each component function f_i is differentiable at a.

(b) If f is differentiable at a, then its derivative is the (n × m)-matrix whose i-th row is the derivative of the function f_i. That is,

Df(a) = [Df_1(a); ⋮; Df_n(a)] = [D_1 f_1(a) ⋯ D_m f_1(a); ⋮; D_1 f_n(a) ⋯ D_m f_n(a)].

Proof. (a) Assume that f is differentiable at a and express the i-th component of f as

f_i = π_i ∘ f,

where π_i: R^n → R is the projection that sends a vector x = (x_1, ..., x_n) to x_i. Notice that we can write π_i as π_i(x) = A x, where A is the 1 × n matrix

A = [0 ⋯ 0 1 0 ⋯ 0],

with the number 1 in the i-th place. Then π_i is differentiable and Dπ_i(x) = A for all x (see Example 1.29). By the Chain Rule, f_i is differentiable at a and

Df_i(a) = D(π_i ∘ f)(a) = Dπ_i(f(a)) Df(a) = A Df(a). (1.4)

Now suppose that each f_i is differentiable at a. Let B be the (n × m)-matrix whose i-th row is Df_i(a). We show that Df(a) = B. Writing the equation ‖h‖ R_a(h) = f(a + h) − f(a) − B h componentwise gives

f_i(a + h) − f_i(a) − Df_i(a) h = ‖h‖ R_a(h; f_i), i = 1, ..., n,

where R_a(h; f_i) is the Taylor remainder for f_i. It is clear that lim_{h→0} R_a(h) = 0, which proves that Df(a) = B.

(b) This claim follows from the previous part and Theorem 1.36. ∎

Remark 1.39. Theorem 1.38 implies that there is little loss of generality in assuming n = 1, i.e., that our functions are real-valued. Multidimensionality of the domain, not the range, is what distinguishes multivariable calculus from one-variable calculus.

Definition 1.40 (Jacobian Matrix). Let A ⊆ R^m and let f: A → R^n. If the partial derivatives of the component functions f_i of f exist at a, then one can form the matrix that has D_j f_i(a) as its entry in row i and column j. This matrix, denoted by J f(a), is called the Jacobian matrix of f.
That is,

J f(a) = [D_1 f_1(a) ⋯ D_m f_1(a); ⋮; D_1 f_n(a) ⋯ D_m f_n(a)].

Remark 1.41.
[1] The Jacobian encapsulates all the essential information regarding the linear function that best approximates a differentiable function at a particular point. For this reason it is the Jacobian which is usually used in practical calculations with the derivative.
[2] If f is differentiable at a, then J f(a) = Df(a). However, it is possible for the partial derivatives, and hence the Jacobian matrix, to exist without it following that f is differentiable at a (see Exercise 1.32).

1.5 The Implicit Function Theorem

Let U ⊆ R^k × R^n be open. Let f: U → R^n. Fix a point (a, b) ∈ U with f(a, b) = 0. Our goal is to solve the equation

f(x, y) = 0 (1.5)

near (a, b). More precisely, we hope to show that the set of points (x, y) near (a, b) at which f(x, y) = 0, the level set of f through 0, is the graph of a function y = g(x). If so, g is the implicit function defined by (1.5). See Figure 1.7. Under various hypotheses we will show that g exists, is unique, and is differentiable.

[Figure 1.7: Near (a, b), L_f(0) is the graph of a function y = g(x).]

Let us first consider an example.

Example 1.42. Consider the function f: R² → R defined by

f(x, y) = x² + y² − 1.

If we choose (a, b) with f(a, b) = 0 and a ≠ ±1, there are (Figure 1.8) open intervals A containing a and B containing b with the following property: if x ∈ A, there is a unique y ∈ B with f(x, y) = 0. We can therefore define a function g: A → R by the condition g(x) ∈ B and f(x, g(x)) = 0 (if b > 0, as indicated in Figure 1.8, then g(x) = √(1 − x²)). For the function f we are considering there is another number b_1 such that f(a, b_1) = 0. There will also be an interval B_1 containing b_1 such that, when x ∈ A, we have f(x, g_1(x)) = 0 for a unique g_1(x) ∈ B_1 (here g_1(x) = −√(1 − x²)). Both g and g_1 are differentiable. These functions are said to be defined implicitly by the equation f(x, y) = 0. If we choose a = 1 or −1, it is impossible to find any such function g defined in an open interval containing a.

Theorem 1.43 (Implicit Function Theorem). Let U ⊆ R^{k+n} be open and let f: U → R^n be of class C^r. Write f in the form f(x, y) for x ∈ R^k and y ∈ R^n. Suppose that (a, b) is a point of U such that f(a, b) = 0. Let M be the n × n matrix

M = [D_{k+1} f_1(a, b) D_{k+2} f_1(a, b) ⋯ D_{k+n} f_1(a, b); D_{k+1} f_2(a, b) D_{k+2} f_2(a, b) ⋯ D_{k+n} f_2(a, b); ⋮; D_{k+1} f_n(a, b) D_{k+2} f_n(a, b) ⋯ D_{k+n} f_n(a, b)].

If det(M) ≠ 0, then near (a, b), L_f(0) is the graph of a unique function y = g(x). Besides, g is C^r.

Proof. The proof is too long to give here. You can find it in, e.g., Spivak (1965, Theorem 2-12), Rudin (1976, Theorem 9.28), Munkres (1991, Theorem 2.9.2), or Pugh (2002, Theorem 5.22). ∎

[Figure 1.8: Implicit function theorem.]

Example 1.44. Let f: R² → R be given by the equation

f(x, y) = x² − y³.

Then (0, 0) is a solution of the equation f(x, y) = 0. Because ∂f(0, 0)/∂y = 0, we do not expect to be able to solve this equation for y in terms of x near (0, 0). But in fact we can, and furthermore the solution is unique! However, the function we obtain is not differentiable at x = 0. See Figure 1.9.

Example 1.45. Let f: R² → R be given by the equation

f(x, y) = x⁴ − y².

Then (0, 0) is a solution of the equation f(x, y) = 0.
Because ∂f(0, 0)/∂y = 0, we do not expect to be able to solve for y in terms of x near (0, 0). In fact, however, we can do so, and we can do so in such a way that the resulting function is differentiable. However, the solution is not unique. See Figure 1.10.

Now the point (1, 1) is also a solution to f(x, y) = 0. Because ∂f(1, 1)/∂y = −2 ≠ 0, one can solve this equation for y as a continuous function of x in a neighborhood of x = 1. See Figure 1.10.

[Figure 1.9: y is not differentiable at x = 0.]

[Figure 1.10: Example 1.45.]

Remark 1.46. We will use the Implicit Function Theorem in Theorem 2.12. The theorem will also be used to derive comparative statics for economic models, which we perhaps do not have time to discuss.

1.6 Gradient and Its Properties

In this section we investigate the significance of the gradient vector, which is defined as follows:

Definition 1.47 (Gradient). Let A ⊆ R^m and let f: A → R. Suppose A contains a neighborhood of a. The gradient of f at a, denoted by ∇f(a), is defined by

∇f(a) := [D_1 f(a)  D_2 f(a)  ⋯  D_m f(a)].

Remark 1.48.
[1] It follows from Theorem 1.36 that if f is differentiable at a, then ∇f(a) = Df(a). The converse does not hold; see Remark 1.41[2].
[2] With the notation of the gradient, we can write the Jacobian of f = (f_1, ..., f_n) as

J f(a) = [∇f_1(a); ⋮; ∇f_n(a)].

Let f: R^m → R^n with f = (f_1, ..., f_n). Recall that the level set of f through 0 is given by

L_f(0) := {x ∈ R^m : f(x) = 0} = ∩_{i=1}^n L_{f_i}(0). (1.6)

Given a point a ∈ L_f(0), it is intuitively clear what it means for a plane to be tangent to L_f(0) at a. Figure 1.12 shows some examples of tangent planes. A formal definition of the tangent plane will be given later. In this section we show that the gradient ∇f(a) is orthogonal to the tangent plane of L_f(0) at a and, under some conditions, the tangent plane of L_f(0) at a can be characterized by the vectors that are orthogonal to ∇f(a).

[Figure 1.12: Examples of tangent planes. (a) f: R² → R. (b) f: R³ → R. (c) f_1, f_2: R³ → R.]

We begin with the simplest case, f: R² → R. Here is an example.

Example 1.49. Let f(x, y) = x² + y². The level set L_f(10) is given by

L_f(10) = {(x, y) ∈ R² : x² + y² = 10}.

Calculus yields

dy/dx (along L_f(10) at (1, 3)) = −1/3.

Hence, the tangent plane at (1, 3) is given by y = 3 − (x − 1)/3. Since ∇f(1, 3) = (2, 6), the result follows immediately; see Figure 1.11(a).

[Figure 1.11: The geometric interpretation of ∇f(x). (a) ∇f(1, 3) = (2, 6) is orthogonal to the tangent plane of L_f(10) at (1, 3). (b) The slope of the level set L_f(c) at x is −D_1 f(x)/D_2 f(x).]

The result in Example 1.49 can be explained as follows. If we change x_1 and x_2, and are to remain on L_f(0), then dx_1 and dx_2 must be such as to leave the value of f unchanged at 0. They must therefore satisfy

f'(x; (dx_1, dx_2)) = D_1 f(x) dx_1 + D_2 f(x) dx_2 = 0. (1.7)

By solving (1.7) for dx_2/dx_1, the slope of the level set through x is (see Figure 1.11(b))

dx_2/dx_1 = −D_1 f(x) / D_2 f(x).

Since the slope of the vector ∇f(x) = (D_1 f(x), D_2 f(x)) is D_2 f(x)/D_1 f(x), the two slopes multiply to −1, and we obtain the desired result.

We then present the general result. For simplicity, we assume throughout this section that each f_i ∈ C¹.
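Before turning to the general result, here is a quick numerical check of Example 1.49 (a minimal Python sketch using NumPy, added for illustration; it is not part of the printed notes):

```python
import numpy as np

# Example 1.49: f(x, y) = x^2 + y^2, level set L_f(10), point a = (1, 3).
def grad_f(x, y):
    return np.array([2.0 * x, 2.0 * y])   # (D1 f, D2 f)

a = (1.0, 3.0)
g = grad_f(*a)                            # gradient at a: (2, 6)

# From (1.7), the slope of the level set at a is -D1f/D2f = -1/3,
# so v = (3, -1) is a direction vector of the tangent line at a.
v = np.array([3.0, -1.0])

print(np.dot(g, v))   # 0.0: the gradient is orthogonal to the tangent direction
```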
For L_f(0) defined in (1.6), obtaining an explicit representation for the tangent plane is a fundamental problem that we now address. First we define curves on L_f(0) and the tangent plane at some point x ∈ R^m. You may want to refer to some differential geometry textbooks, e.g., O'Neill (2006), Spivak (1999), or Lee (2009), for a better understanding of some of the following concepts.

One can picture a curve in R^m as a trip taken by a moving point c. At each "time" t in some interval [a, b] ⊆ R, c is located at the point

c(t) = (c_1(t), ..., c_m(t)) ∈ R^m.

In rigorous terms, then, c is a function from [a, b] to R^m, and the component functions c_1, ..., c_m are its Euclidean coordinate functions. We define the function c to be differentiable provided its component functions are differentiable.

Example 1.50. A helix c: R → R³ is obtained through the formula

c(t) = (a cos t, a sin t, bt),

where a, b > 0. See Figure 1.13.

[Figure 1.13: The helix.]

Definition 1.51. A curve on L_f(0) is a continuous curve c: [a, b] → L_f(0). A curve c(t) is said to pass through the point a ∈ L_f(0) if a = c(t̄) for some t̄ ∈ [a, b].

Definition 1.52. The tangent plane at a ∈ L_f(0), denoted T_f(a), is defined as the collection of the derivatives at a of all differentiable curves on L_f(0) passing through a.

Ideally, we would like to express the tangent plane defined in Definition 1.52 in terms of the derivatives of the functions f_i that define the surface L_f(0) (see Example 1.49). We introduce the subspace

M := {x ∈ R^m : Df(a) x = 0} (1.8)

and investigate under what conditions M is equal to the tangent plane at a.

The following result shows that ∇f_i(a) is orthogonal to the tangent plane T_f(a) for all a ∈ L_f(0).

Theorem 1.53. For each a ∈ L_f(0), the gradient ∇f_i(a) is orthogonal to the tangent plane T_f(a).

Proof. We establish this result by showing T_f(a) ⊆ M for each a ∈ L_f(0). Every differentiable curve c(t) on L_f(0) passing through a at t = t̄ satisfies f(c(t)) = 0, and so

Df(c(t̄)) Dc(t̄) = 0.

That is, Dc(t̄) ∈ M. ∎

Definition 1.54. A point a ∈ L_f(0) is said to be a regular point if the gradient vectors (∇f_1(a), ..., ∇f_n(a)) are linearly independent.

In general, at regular points it is possible to characterize the tangent plane in terms of ∇f_1(a), ..., ∇f_n(a).

Theorem 1.55. At a regular point a of L_f(0), the tangent plane T_f(a) is equal to M.

Proof. We show that M ⊆ T_f(a). Combining this result with Theorem 1.53, we have T_f(a) = M.

To show M ⊆ T_f(a), we must show that if x ∈ M then there exists a curve on L_f(0) passing through a with derivative x. To construct such a curve we consider the equations

f(a + tx + Df(a)^T u(t)) = 0, (1.9)

where for fixed t we consider u(t) ∈ R^n to be the unknown. This is a nonlinear system of n equations in n unknowns, parametrized continuously by t. At t = 0 there is a solution u(0) = 0. The Jacobian matrix of the system with respect to u at t = 0 is the n × n matrix

Df(a) Df(a)^T,

which is nonsingular, since Df(a) is of full rank if a is a regular point. Thus, by the Implicit Function Theorem (Theorem 1.43) there is a continuously differentiable solution u(t) in some region t ∈ [−a, a]. The curve c(t) := a + tx + Df(a)^T u(t) is thus a curve on L_f(0).
By differentiating the system (1.9) with respect to t at t = 0 we obtain

Df(a) [x + Df(a)^T Du(0)] = 0. (1.10)

By definition of x we have Df(a) x = 0, and thus, again since Df(a) Df(a)^T is nonsingular, we conclude from (1.10) that

Du(0) = −[Df(a) Df(a)^T]^{−1} · 0 = 0.

Therefore,

Dc(0) = x + Df(a)^T Du(0) = x,

and the constructed curve has derivative x at a. ∎

Example 1.56.
[1] In R², let f(x_1, x_2) = x_1. Then L_f(0) is the x_2-axis, and every point on that axis is regular since ∇f(0, x_2) = (1, 0). In this case, T_f((0, x_2)) = M, which is the x_2-axis.
[2] In R², let f(x_1, x_2) = x_1². Again, L_f(0) is the x_2-axis, but now no point on the x_2-axis is regular: ∇f(0, x_2) = (0, 0). Indeed, in this case M = R², while the tangent plane is the x_2-axis.

We close this section by providing another property of gradient vectors. Let f: R^n → R be a differentiable function and a ∈ R^n with ∇f(a) ≠ 0. Suppose that we want to determine the direction in which f increases most rapidly at a. By a "direction" here we mean a unit vector u. Let θ_u denote the angle between u and ∇f(a). Then

f'(a; u) = ∇f(a)·u = ‖∇f(a)‖ cos θ_u.

But cos θ_u attains its maximum value of 1 when θ_u = 0, that is, when u and ∇f(a) are collinear and point in the same direction. We conclude that ‖∇f(a)‖ is the maximum value of f'(a; u) for u a unit vector, and that this maximum value is attained with u = ∇f(a)/‖∇f(a)‖.

1.7 Continuously Differentiable Functions

We know that mere existence of the partial derivatives does not imply differentiability (see Exercise 1.32). If, however, we impose the additional condition that these partial derivatives are continuous, then differentiability is assured.

Theorem 1.57. Let A be open in R^m. Suppose that the partial derivatives D_j f_i(x) of the component functions of f exist at each point x ∈ A and are continuous on A. Then f is differentiable at each point of A.

A function satisfying the hypotheses of this theorem is often said to be continuously differentiable, or of class C¹, on A.

Proof of Theorem 1.57. By Theorem 1.38 it suffices to show that each component function of f is differentiable. Therefore we may restrict ourselves to the case of a real-valued function f: A → R.

Let a ∈ A. We claim that Df(a) = ∇f(a) when f ∈ C¹. Recall that (e_1, ..., e_m) is the standard basis of R^m. Then every h = (h_1, ..., h_m) ∈ R^m can be written as h = Σ_{i=1}^m h_i e_i. For each i = 1, ..., m, let

p_i := a + Σ_{k=1}^i h_k e_k = p_{i−1} + h_i e_i,

where p_0 := a. Figure 1.14 illustrates the case where m = 3 and all h_i are positive.

[Figure 1.14: The segmented path from a to a + h.]

For each i = 1, ..., m, we also define a function γ_i: [0, 1] → R^m by letting

γ_i(t) = p_{i−1} + t h_i e_i.

So γ_i is a segment from p_{i−1} to p_i. By the one-dimensional chain rule and mean value theorem applied to the differentiable real-valued function g: [0, 1] → R defined by g(t) = (f ∘ γ_i)(t), there exists t̄_i ∈ (0, 1) such that

f(p_i) − f(p_{i−1}) = g(1) − g(0) = g'(t̄_i) = D_i f(γ_i(t̄_i)) h_i.

Telescoping f(a + h) − f(a) along (γ_1, ..., γ_m) gives

f(a + h) − f(a) − ∇f(a)·h = Σ_{i=1}^m [f(p_i) − f(p_{i−1})] − ∇f(a)·h = Σ_{i=1}^m [D_i f(γ_i(t̄_i)) − D_i f(a)] h_i.

Continuity of the partials implies that D_i f(γ_i(t̄_i)) − D_i f(a) → 0 as ‖h‖ → 0. Since |h_i| ≤ ‖h‖ for each i, the right-hand side is of smaller order than ‖h‖, which proves the claim. ∎
Remark 1.58. It follows from Theorem 1.57 that sin(xy) and xy² + z e^{xy} are both differentiable, since they are of class C¹.

Let A ⊆ R^m and f: A → R^n. Suppose that the partial derivatives D_j f_i of the component functions of f exist on A. These then are functions from A to R, and we may consider their partial derivatives, which have the form

D_k(D_j f_i) =: D_{jk} f_i

and are called the second-order partial derivatives of f. Similarly, one defines the third-order partial derivatives of the functions f_i, or more generally the partial derivatives of order r for arbitrary r.

Definition 1.59. If the partial derivatives of the functions f_i of order less than or equal to r are continuous on A, we say f is of class C^r on A. We say f is of class C^∞ on A if the partials of the functions f_i of all orders are continuous on A.

Definition 1.60 (Hessian). Let a ∈ A ⊆ R^m and let f: A → R be twice differentiable at a. The m × m matrix representing the second derivative of f is called the Hessian of f, denoted Hf(a):

Hf(a) = [D_{11} f(a) D_{12} f(a) ⋯ D_{1m} f(a); D_{21} f(a) D_{22} f(a) ⋯ D_{2m} f(a); ⋮; D_{m1} f(a) D_{m2} f(a) ⋯ D_{mm} f(a)] = D(∇f).

Remark 1.61. If f: A → R is of class C², then the Hessian of f is a symmetric matrix, i.e., D_{ij} f(a) = D_{ji} f(a) for all i, j = 1, ..., m and for all a ∈ A. See Rudin (1976, Corollary to Theorem 9.41, p. 236).

Exercise 1.62. Find the Hessian of the Cobb-Douglas function

f(x, y) = x^α y^β.

1.8 Quadratic Forms: Definite and Semidefinite Matrices

Definition 1.63 (Quadratic Form). Let A be a symmetric n × n matrix. A quadratic form on R^n is a function Q_A: R^n → R of the form

Q_A(x) = x·Ax = Σ_{i=1}^n Σ_{j=1}^n a_{ij} x_i x_j.

Since the quadratic form Q_A is completely specified by the matrix A, we henceforth refer to A itself as the quadratic form. Observe that if f is of class C², then the Hessian Hf of f defines a quadratic form; see Remark 1.61.

Definition 1.64. A quadratic form A is said to be
- positive definite if x·Ax > 0 for all x ∈ R^n ∖ {0};
- positive semidefinite if x·Ax ≥ 0 for all x ∈ R^n;
- negative definite if x·Ax < 0 for all x ∈ R^n ∖ {0};
- negative semidefinite if x·Ax ≤ 0 for all x ∈ R^n.

1.9 Homogeneous Functions and Euler's Formula

Definition 1.65 (Homogeneous Function). A function f: R^n → R is homogeneous of degree r (for r = ..., −1, 0, 1, ...) if for every t > 0 we have

f(tx_1, ..., tx_n) = t^r f(x_1, ..., x_n).

Exercise 1.66. The function

f(x, y) = A x^α y^β, A, α, β > 0,

is known as the Cobb-Douglas function. Check whether this function is homogeneous.

Theorem 1.67 (Euler's Formula). Suppose that f: R^n → R is homogeneous of degree r (for some r = ..., −1, 0, 1, ...) and differentiable. Then at any x* ∈ R^n we have

∇f(x*)·x* = r f(x*).

Proof. By definition we have

f(tx*) − t^r f(x*) = 0.

Differentiating with respect to t using the chain rule, we have

∇f(tx*)·x* = r t^{r−1} f(x*).

Evaluating at t = 1 gives the desired result. ∎

Lemma 1.68. If f is homogeneous of degree r, its partial derivatives are homogeneous of degree r − 1.

Exercise 1.69. Prove Lemma 1.68.

Exercise 1.70. Let f(x, y) = A x^α y^β with α + β = 1 and A > 0. Show that Theorem 1.67 and Lemma 1.68 hold for this function.
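As a numerical companion to Exercise 1.70, the following minimal Python sketch (using NumPy; the parameter values are illustrative assumptions) checks homogeneity and Euler's Formula for a Cobb-Douglas function:

```python
import numpy as np

A, alpha, beta = 2.0, 0.3, 0.7   # assumed parameters; degree of homogeneity is alpha + beta

def f(x, y):
    return A * x**alpha * y**beta

def grad_f(x, y):                # (D1 f, D2 f), computed by hand
    return np.array([A * alpha * x**(alpha - 1) * y**beta,
                     A * beta * x**alpha * y**(beta - 1)])

x, y, t, r = 1.5, 4.0, 2.5, alpha + beta

# Homogeneity (Definition 1.65): f(t x, t y) = t^r f(x, y).
assert np.isclose(f(t * x, t * y), t**r * f(x, y))

# Euler's Formula (Theorem 1.67): grad f(x) . x = r f(x).
assert np.isclose(np.dot(grad_f(x, y), [x, y]), r * f(x, y))
```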
2 OPTIMIZATION IN R^n

This chapter is based on Luenberger (1969, Chapters 8 & 9), Mas-Colell, Whinston and Green (1995, Sections M.J & M.K), Sundaram (1996), Vohra (2005), Luenberger and Ye (2008, Chapter 11), Duggan (2010), and Jehle and Reny (2011, Chapter A2).

2.1 Introduction

An optimization problem in R^n, or simply an optimization problem, is one where the values of a given function f: R^n → R are to be maximized or minimized over a given set X ⊆ R^n. The function f is called the objective function and the set X the constraint set. Notationally, we will represent these problems by

Maximize f(x) subject to x ∈ X

and

Minimize f(x) subject to x ∈ X,

respectively. More compactly, we shall also write

max {f(x) : x ∈ X} and min {f(x) : x ∈ X}.

Example 2.1.
[a] Let X = [0, ∞) and f(x) = x. Then the problem max{f(x) : x ∈ X} has no solution; see Figure 2.1(a).
[b] Let X = [0, 1] and f(x) = x(1 − x). Then the problem max{f(x) : x ∈ X} has exactly one solution, namely x = 1/2; see Figure 2.1(b).
[c] Let X = [−1, 1] and f(x) = x². Then the problem max{f(x) : x ∈ X} has two solutions, namely x = −1 and x = 1; see Figure 2.1(c).

[Figure 2.1: Example 2.1. (a) No solution. (b) Exactly one solution. (c) Two solutions.]

Example 2.1 suggests that we shall talk of the set of solutions of the optimization problem, which is denoted

argmax {f(x) : x ∈ X} = {x ∈ X : f(x) ≥ f(y) for all y ∈ X}

and

argmin {f(x) : x ∈ X} = {x ∈ X : f(x) ≤ f(y) for all y ∈ X}.

We close this section by considering an optimization problem in economics.

Example 2.2. There are n commodities in an economy. There is a consumer whose utility from consuming x_i ≥ 0 units of commodity i (i = 1, ..., n) is given by u(x_1, ..., x_n), where u: R^n_+ → R is the consumer's utility function. The consumer's income is I > 0, and he faces the price vector p = (p_1, ..., p_n). His budget set is given by (see Figure 2.2)

B(p, I) := {x ∈ R^n_+ : p·x ≤ I}.

The consumer's objective is to maximize his utility over the budget set, i.e.,

Maximize u(x) subject to x ∈ B(p, I).

[Figure 2.2: The budget set B(p_1, p_2, I), with intercepts I/p_1 and I/p_2.]

Here we introduce an important fact we will make frequent use of.

Lemma 2.3. Let f: R^n → R be differentiable. Then for all h ∈ R^n satisfying h·∇f(x) > 0, we have

f(x + εh) > f(x)

for all ε > 0 sufficiently small.

Proof. We approximate f by its Taylor series expansion (see Definition 1.27):

f(x + εh) = f(x) + εh·∇f(x) + ‖εh‖ R_x(εh),

where lim_{ε→0} R_x(εh) = 0. Then

lim_{ε↓0} [f(x + εh) − f(x)] / ε = h·∇f(x) + ‖h‖ lim_{ε↓0} R_x(εh) = h·∇f(x) > 0;

therefore, there is ε̄ > 0 such that for all ε ∈ (0, ε̄),

[f(x + εh) − f(x)] / ε > 0,

i.e., f(x + εh) − f(x) > 0 for all ε ∈ (0, ε̄). ∎

2.2 Unconstrained Optimization

Definition 2.4 (Maximum). Given X ⊆ R^n, f: X → R and x ∈ X, we say x is a maximum of f if

f(x) = max {f(y) : y ∈ X}.

We say x is a local maximum of f if there is some ε > 0 such that for all y ∈ X ∩ B(x; ε) we have f(x) ≥ f(y). And x is a strict local maximum of f if the latter inequality holds strictly for y ≠ x.

2.2.1 First-Order Necessary Conditions

Recall that X° is the interior of X ⊆ R^n (Definition 1.13), and f'(x; u) is the directional derivative of f at x with respect to u (Definition 1.21).

Theorem 2.5. Let X ⊆ R^n and x ∈ X°, and let f: X → R be differentiable at x. If x is a local maximum of f, then ∇f(x) = 0.

Proof.
Suppose that ∇f(x) ≠ 0. Let h = ∇f(x) (we can do this since x ∈ X°). Then h·∇f(x) > 0. Hence f(x + εh) > f(x) for all ε > 0 sufficiently small by Lemma 2.3. This contradicts the local optimality of x. ∎

Definition 2.6 (Critical Point). A vector x ∈ R^n such that ∇f(x) = 0 is called a critical point.

Example 2.7. Let X = R² and f(x, y) = xy − 2x⁴ − y². The first-order condition is

∇f(x, y) = (y − 8x³, x − 2y) = (0, 0).

Thus, the critical points are (x, y) = (0, 0), (1/4, 1/8), (−1/4, −1/8).

2.2.2 Second-Order Sufficient Conditions

The first-order conditions for unconstrained local optima do not distinguish between maxima and minima (see Example 2.9 below). To obtain such a distinction in the behavior of f at an optimum, we need to examine the behavior of the Hessian Hf of f (see Definition 1.60).

Theorem 2.8. Suppose f is of class C² on X ⊆ R^n, and x ∈ X°.
(a) If f has a local maximum at x, then Hf(x) is negative semidefinite.
(b) If f has a local minimum at x, then Hf(x) is positive semidefinite.
(c) If ∇f(x) = 0 and Hf(x) is negative definite at some x, then x is a strict local maximum of f on X.
(d) If ∇f(x) = 0 and Hf(x) is positive definite at some x, then x is a strict local minimum of f on X.

Proof. See Sundaram (1996, Section 4.6). ∎

Example 2.9. Let f: R → R be defined by f(x) = 2x³ − 3x². It is easy to check that f ∈ C² on R and there are two critical points: x = 0 and x = 1. Invoking the second-order conditions, we get f''(0) = −6 and f''(1) = 6. Thus, the point x = 0 is a strict local maximum of f on R, and the point x = 1 is a strict local minimum of f on R; see Figure 2.3. However, there is nothing in the first- or second-order conditions that will help determine whether these points are global optima. In fact, they are not: global optima do not exist in this example, since lim_{x→+∞} f(x) = +∞ and lim_{x→−∞} f(x) = −∞.

[Figure 2.3: x = 0 and x = 1 are local optima but not global optima.]

2.3 Equality Constrained Optimization: Lagrange's Method

In this section we consider problems with equality constraints of the form

max f(x) s.t. g_i(x) = 0, i = 1, ..., k ≤ n. (ECP)

We assume that f: R^n → R and g_i: R^n → R, i = 1, ..., k, are continuously differentiable functions. For notational convenience, we introduce the constraint function g: R^n → R^k, where

g = (g_1, ..., g_k).

We can then write the constraints in the more compact form g(x) = 0. Also recall that a point x* such that g_i(x*) = 0 for all i = 1, ..., k is called a regular point if the gradient vectors

∇g_1(x*), ..., ∇g_k(x*)

are linearly independent (see Definition 1.54).

2.3.1 First-Order Necessary Conditions

Since the representation of the tangent plane is known from Theorem 1.55, it is fairly simple to derive the necessary and sufficient conditions for a point to be a local extremum point subject to equality constraints.

Lemma 2.10. Let x* be a regular point of the constraints g(x) = 0 and a local maximum of f subject to these constraints. Then all x ∈ R^n satisfying

Dg(x*) x = 0 (2.1)

must also satisfy

∇f(x*)·x = 0. (2.2)

Proof. Take an arbitrary point x ∈ M := {y ∈ R^n : Dg(x*) y = 0}. Since x* is regular, it follows from Theorem 1.55 that the tangent plane T_g(x*) coincides with M. Therefore, there exists a differentiable curve c: [a, b] → R^n on L_g(0) passing through x* with c(t̄) = x* and Dc(t̄) = x for some t̄ ∈ [a, b].
Since x* is a constrained local maximum of f, we have

d f(c(t))/dt |_{t=t̄} = 0,

or equivalently, ∇f(x*)·x = 0. ∎

Lemma 2.10 says that ∇f(x*) is orthogonal to the tangent plane T_g(x*). Here is an example.

Example 2.11. Consider the following problem:

max f(x_1, x_2) = −x_1/2 − x_2/2 s.t. g(x_1, x_2) = x_1² + x_2² − 2 = 0.

At the local maximum x* = (−1, −1), the gradient ∇f(x*) = (−1/2, −1/2) is orthogonal to the tangent plane of the constraint surface; it is collinear with ∇g(x*) = (−2, −2). See Figure 2.4.

[Figure 2.4: Example 2.11.]

We now conclude from Lemma 2.10 that ∇f(x*) is a linear combination of the gradients of g at x*.

Theorem 2.12 (Lagrange's Theorem). Let x* be a local maximum of f subject to g(x) = 0, and assume that x* is a regular point of these constraints. Then there exists a unique vector λ* = (λ_1*, ..., λ_k*) such that

∇f(x*) = Σ_{i=1}^k λ_i* ∇g_i(x*).

Proof. Since x* is a regular point of the constraints, the vectors ∇g_1(x*), ..., ∇g_k(x*) constitute a basis of R^n if k = n, and in this case it is obvious that the theorem holds. So we assume that k < n.

Let U be the subspace spanned by ∇g_1(x*), ..., ∇g_k(x*). Then R^n can be represented as the direct sum of U and U^⊥, the orthogonal complement of U, i.e., R^n = U ⊕ U^⊥. It follows from Theorem 1.55 that U^⊥ = M. Therefore, dim(R^n) = dim(U) + dim(M), or equivalently,

dim(M) = n − k. (2.3)

Let V be the subspace spanned by ∇g_1(x*), ..., ∇g_k(x*), ∇f(x*). Suppose there do not exist λ_1, ..., λ_k such that ∇f(x*) = Σ_{i=1}^k λ_i ∇g_i(x*). Then ∇g_1(x*), ..., ∇g_k(x*), ∇f(x*) are linearly independent, and so dim(V) = k + 1. However, Lemma 2.10 says that

M ⊆ V^⊥,

which means that

dim(M) ≤ dim(V^⊥) = n − dim(V) = n − k − 1, (2.4)

which contradicts (2.3). We thus conclude that ∇f(x*) ∈ U, i.e., there exists a unique vector λ* = (λ_1*, ..., λ_k*) such that

∇f(x*) = Σ_{i=1}^k λ_i* ∇g_i(x*). ∎

Example 2.13. Consider the problem

max x_1 x_2 + x_2 x_3 + x_1 x_3 s.t. x_1 + x_2 + x_3 = 3.

The necessary conditions become

x_2 + x_3 = λ, x_1 + x_3 = λ, x_1 + x_2 = λ.

These three equations together with the one constraint equation give four equations that can be solved for the four unknowns x_1, x_2, x_3, λ. Solution yields x_1 = x_2 = x_3 = 1, λ = 2.

The Regularity Condition. If x* is not a regular point, then Theorem 2.12 can fail. Consider the following example:

Example 2.14. Let f(x) = (x + 1)² and g(x) = x². Consider the problem of maximizing f subject to g(x) = 0. The maximum is clearly x* = 0. But Dg(0) = 0 and Df(0) = 2, so there is no λ such that Df(0) = λ Dg(0).

The Lagrange Multipliers. The vector λ* = (λ_1*, ..., λ_k*) in Theorem 2.12 is called the vector of Lagrange multipliers corresponding to the local optimum x*. The i-th multiplier λ_i* measures the sensitivity of the value of the objective function at x* to a small relaxation of the i-th constraint g_i. To clarify the notion of "relaxation of a constraint", let us consider the following one-constraint case:

max f(x) s.t. g̃(x) − c = 0,

where c ∈ R. Then a relaxation of the constraint may be thought of as an increase in c.

Suppose that for each c ∈ R there exists a global maximum x*(c). Suppose further that x*(c) is a regular point for each c, so there exists λ*(c) ∈ R such that

∇f(x*(c)) = λ*(c) ∇g̃(x*(c)).

Suppose further that x*(c) is differentiable with respect to c.
Then

d f(x*(c))/dc = ∇f(x*(c))·Dx*(c) = λ*(c) [∇g̃(x*(c))·Dx*(c)].

Since g̃(x*(c)) − c = 0 for all c, we also have ∇g̃(x*(c))·Dx*(c) = 1. Therefore,

d f(x*(c))/dc = λ*(c).

In sum, the Lagrange multiplier λ*(c) tells us that a small relaxation in the constraint will raise the maximized value of the objective function by λ*(c). For this reason, λ*(c) is also called the shadow price of the constraint.

Lagrange's Method. We now describe a procedure for using Theorem 2.12 to solve (ECP).

Step 1. Set up a function L: R^n × R^k → R, called the Lagrangian, defined by

L(x, λ) = f(x) − Σ_{i=1}^k λ_i g_i(x).

The vector λ = (λ_1, ..., λ_k) ∈ R^k is called the vector of Lagrange multipliers.

Step 2. Find all critical points of L(x, λ):

∂L(x, λ)/∂x_i = D_i f(x) − Σ_{ℓ=1}^k λ_ℓ D_i g_ℓ(x) = 0, i = 1, ..., n, (2.5)
∂L(x, λ)/∂λ_j = −g_j(x) = 0, j = 1, ..., k. (2.6)

Define

Y := {(x, λ) ∈ R^{n+k} : (x, λ) satisfies (2.5) and (2.6)}.

Step 3. Evaluate f at each point x in the set

{x ∈ R^n : there exists λ such that (x, λ) ∈ Y}.

Thus, we see that Lagrange's method is a clever way of converting a maximization problem with constraints into another maximization problem without constraints, by increasing the number of variables.

Why does Lagrange's method typically succeed in identifying the desired optima? Because the set of all critical points of L contains the set of all local maxima and minima of (ECP) when the regularity condition is met. That is, if x* is a local maximum or minimum of f subject to g(x*) = 0, and if x* is regular, then there exists λ* such that (x*, λ*) is a critical point of L. We are not going to explain why Lagrange's method could fail (but see Sundaram 1996, Section 5.4 for details).

Example 2.15. Consider the problem

max −x² − y² s.t. x + y − 1 = 0.

First, form the Lagrangian

L(x, y, λ) = −x² − y² − λ(x + y − 1).

Then set all of its first-order partials equal to zero:

∂L/∂x = −2x − λ = 0,
∂L/∂y = −2y − λ = 0,
∂L/∂λ = −(x + y − 1) = 0.

So the critical point of L is

(x*, y*, λ*) = (1/2, 1/2, −1).

Hence, f(x*, y*) = −1/2; see Figure 2.5.

[Figure 2.5: Lagrange's method.]

Exercise 2.16. A consumer purchases a bundle (x, y) to maximize utility. His income is I > 0 and prices are p_x > 0 and p_y > 0. His utility function is

u(x, y) = x^a y^b,

where a, b > 0. Find his optimal choice (x*, y*).

Lagrange's Theorem Is Not Sufficient. Lagrange's Theorem (Theorem 2.12) only gives us a necessary, not a sufficient, condition for a constrained local maximum. To see why the first-order condition is not generally sufficient, consider the following example.

Example 2.17. Consider the problem

max x + y² s.t. x − 1 = 0.

Observe that (x*, y*) = (1, 0) satisfies the constraint g(x*, y*) = 0, and (1, 0) is regular. Furthermore, the first-order condition from Lagrange's Theorem is satisfied at (1, 0). This is because ∇f(1, 0) = ∇g(1, 0) = (1, 0); hence, by letting λ = 1 we have ∇f(1, 0) = λ ∇g(1, 0). However, (1, 0) is not a constrained local maximum: for ε > 0, we have g(1, ε) = 0 and f(1, ε) = 1 + ε² > 1 = f(1, 0). See Figure 2.6.

[Figure 2.6: Lagrange's Theorem is not sufficient.]

2.3.2 Second-Order Analysis

We probably do not have time to discuss the second-order conditions. See Jehle and Reny (2011, Section A2.3.4) and Sundaram (1996, Section 5.3).

2.4 Inequality Constrained Optimization: Kuhn-Tucker Theorem

We now consider a problem of the form
max f(x) s.t. h_1(x) ≤ 0, ..., h_ℓ(x) ≤ 0, (ICP)

where f and the h_i are continuously differentiable functions from R^n to R. More succinctly, we can write this problem as

max f(x) s.t. h(x) ≤ 0,

where h: R^n → R^ℓ is the function h = (h_1, ..., h_ℓ).

For any feasible point x, the set of active inequality constraints is denoted by

A(x) := {i : h_i(x) = 0}. (2.7)

If i ∉ A(x), we say that the i-th constraint is inactive at x. We note that if x* is a local maximum of (ICP), then x* is also a local maximum for a problem identical to (ICP) except that the inactive constraints at x* have been discarded. Thus, inactive constraints at x* do not matter; they can be ignored in the statement of optimality conditions.

Definition 2.18. Let x* be a point satisfying the constraint h(x*) ≤ 0. Then x* is said to be regular if the gradients ∇h_i(x*), i ∈ A(x*), are linearly independent.

2.4.1 First-Order Analysis

Example 2.19. Figure 2.7 illustrates a problem with two inequality constraints and depicts three possibilities, depending on whether none, one, or two constraints are active. In the first case, we could have a constrained local maximum such as x, for which both constraints are inactive. Such a vector must be a critical point of the objective function. In the second case, only a single constraint is active at a constrained local maximum such as y, and here the gradients of the objective and constraint are collinear. As we will see, these gradients actually point in the same direction. Lastly, we could have a constrained local maximum such as z, where both constraints are active. Here, the gradient of the objective is not collinear with the gradient of either constraint, and it may appear that no gradient restriction is possible. But in fact, ∇f(z) can be written as a linear combination of ∇h_1(z) and ∇h_2(z) with non-negative weights.

[Figure 2.7: Inequality constrained optimization.]

The restrictions evident in Figure 2.7 are formalized in the next theorem.

Theorem 2.20 (Kuhn-Tucker Theorem). Let x* be a local maximum of (ICP), and assume that x* is regular. Then there exists a vector λ* = (λ_1*, ..., λ_ℓ*) such that

λ_i* ≥ 0 and λ_i* h_i(x*) = 0, i = 1, ..., ℓ, (KT-1)

∇f(x*) = Σ_{i=1}^ℓ λ_i* ∇h_i(x*). (KT-2)

Proof. See Sundaram (1996, Section 6.5). ∎

Remark 2.21.
[1] Geometrically, the first-order condition from the Kuhn-Tucker Theorem means that the gradient of the objective function, ∇f(x*), is contained in the "semi-positive cone" generated by the gradients of the binding constraints, i.e., it is contained in the set

{ Σ_{i=1}^ℓ α_i ∇h_i(x*) : α_1, ..., α_ℓ ≥ 0 },

depicted in Figure 2.8.
[2] Condition (KT-1) in Theorem 2.20 is called the condition of complementary slackness: if h_i(x*) < 0 then λ_i* = 0; if λ_i* > 0 then h_i(x*) = 0.

[Figure 2.8: Kuhn-Tucker Theorem.]

The Kuhn-Tucker Multipliers. The vector λ* in Theorem 2.20 is called the vector of Kuhn-Tucker multipliers corresponding to the local maximum x*. The Kuhn-Tucker multipliers measure the sensitivity of the objective function at x* to relaxations of the various constraints:
- If h_i(x*) < 0, then the i-th constraint is already slack, so relaxing it further will not help raise the value of the objective function, and λ_i* must be zero.
- If h_i(x*) = 0, then relaxing the i-th constraint may help increase the value of the maximization exercise, so we have λ_i* ≥ 0.

Two Differences. There are two important differences from the case of equality constraints (compare Theorem 2.12 and Theorem 2.20):
- The regularity condition now applies only to the gradients of the binding constraints.
- The multipliers are non-negative. This difference comes from the fact that now only the inequality h_i(x) ≤ 0 needs to be maintained, so relaxing the constraint never hurts.

The Regularity Condition. As with the analogous condition in Theorem 2.12 (see Example 2.14), here we show that the regularity condition in Theorem 2.20 is essential.

Example 2.22. Consider the following maximization problem:

max f(x_1, x_2) = x_1
s.t. h_1(x_1, x_2) = (1 − x_1)³ + x_2 ≤ 0,
h_2(x_1, x_2) = −x_1 ≤ 0,
h_3(x_1, x_2) = −x_2 ≤ 0.

See Figure 2.9. Clearly the solution is (x_1*, x_2*) = (1, 0). At this point we have

∇h_1(1, 0) = (0, 1), ∇h_3(1, 0) = (0, −1) and ∇f(1, 0) = (1, 0).

Since x_1* > 0, it follows from the complementary slackness condition (KT-1) that λ_2* = 0. But now (KT-2) fails: for any λ_1* ≥ 0 and λ_3* ≥ 0, we have

λ_1* ∇h_1(1, 0) + λ_3* ∇h_3(1, 0) = (0, λ_1* − λ_3*) ≠ ∇f(1, 0).

This is because (1, 0) is not regular: there are two binding constraints at (1, 0), namely h_1 and h_3, and the gradients ∇h_1(1, 0) and ∇h_3(1, 0) are collinear. Certainly ∇f(1, 0) cannot be contained in the cone generated by ∇h_1(1, 0) and ∇h_3(1, 0); see Remark 2.21.

[Figure 2.9: The constraint qualification fails at (1, 0).]

The Lagrangian. As with equality constraints, we can define the Lagrangian L: R^n × R^ℓ → R by

L(x, λ) = f(x) − Σ_{i=1}^ℓ λ_i h_i(x),

and then condition (KT-2) from Theorem 2.20 is the requirement that x* is a critical point of the Lagrangian given multipliers λ_1*, ..., λ_ℓ*.

Let us consider a numerical example.

Example 2.23. Consider the problem

max x² − y s.t. x² + y² − 1 ≤ 0.

Set the Lagrangian:

L(x, y, λ) = x² − y − λ(x² + y² − 1).

The critical points of L are the solutions (x, y, λ) to

2x − 2λx = 0, (2.8)
−1 − 2λy = 0, (2.9)
λ ≥ 0, (2.10)
x² + y² − 1 ≤ 0, (2.11)
λ(x² + y² − 1) = 0. (2.12)

For (2.8) to hold, we must have x = 0 or λ = 1.

If λ = 1, then (2.9) implies y = −1/2, and (2.12) implies x = ±√3/2. That is,

(x, y, λ) = (±√3/2, −1/2, 1).

We thus have f(x, y) = 5/4; see Figure 2.10.

If x = 0, then λ > 0 by (2.9). Hence h is binding: x² + y² − 1 = 0, and so y = ±1. Since (2.9) implies that y = 1 is impossible, we have

(x, y, λ) = (0, −1, 1/2).

At this critical point, we have f(0, −1) = 1 < 5/4, which means that (0, −1, 1/2) cannot be a solution. Since there are no other critical points, it follows that there are exactly two solutions to the maximization problem, namely (x*, y*) = (±√3/2, −1/2).

[Figure 2.10: Example 2.23.]

Exercise 2.24. Let U = R², let f(x, y) = −[(x − 1)² + y²], and let h(x, y) = 2kx − y² ≤ 0, where k > 0. Solve the maximization problem

max {f(x, y) : h(x, y) ≤ 0}.

Example 2.25. Consider a consumer's problem:

max_{(x,y)∈R²} u(x, y) = x + y
s.t. h_1(x, y) = −x ≤ 0,
h_2(x, y) = −y ≤ 0,
h_3(x, y) = p_x x + p_y y − I ≤ 0, (2.13)

where p_x, p_y, I > 0. We first identify all possible combinations of constraints that can, in principle, be binding at the optimum.
Example 2.25. Consider a consumer's problem:
$$\max_{(x,y)\in\mathbb{R}^2} u(x, y) = x + y \quad\text{s.t.}\quad h_1(x, y) = -x \le 0, \quad h_2(x, y) = -y \le 0, \quad h_3(x, y) = p_x x + p_y y - I \le 0, \tag{2.13}$$
where $p_x, p_y, I > 0$. We first identify all possible combinations of constraints that can, in principle, be binding at the optimum. There are eight combinations to check:
$$\varnothing;\ h_1;\ h_2;\ h_3;\ (h_1, h_2);\ (h_1, h_3);\ (h_2, h_3);\ \text{and}\ (h_1, h_2, h_3).$$
Of these, the last one can be ruled out, since $h_1 = h_2 = 0$ implies that $h_3 < 0$. Moreover, since $u$ is strictly increasing in both arguments, it is obvious that $h_3 = 0$ at the optimum. So we only need to check three combinations: $(h_1, h_3)$, $(h_2, h_3)$, and $h_3$.

If the optimum occurs at a point where only $h_1$ and $h_3$ are binding, then the pair
$$\big( \nabla h_1(x, y),\ \nabla h_3(x, y) \big) = \big( (-1, 0),\ (p_x, p_y) \big)$$
is linearly independent, so the constraint qualification holds at such a point. Similarly, the constraint qualification holds if only $(h_2, h_3)$ bind. If $h_3$ is the only binding constraint, then $\nabla h_3(x, y) = (p_x, p_y) \ne 0$; that is, the constraint qualification holds.

Exercise 2.26. Solve the problem (2.13).

2.4.2 Second-Order Analysis

We probably do not have time to discuss the second-order conditions. See Duggan (2010, Section 6.3).

2.5 Envelope Theorem

Let $A \subseteq \mathbb{R}$. The graph of a real-valued function $f$ on $A$ is a curve in the plane $\mathbb{R}^2$, and we shall also refer to the curve itself as $f$. Given a one-dimensional parametrized family of curves $f_\alpha \colon A \to \mathbb{R}$, where $\alpha$ runs over some interval, the curve $h \colon A \to \mathbb{R}$ is the envelope of the family if each point on the curve $h$ is a point of tangency with the graph of one of the curves $f_\alpha$, and each curve $f_\alpha$ is tangent to $h$. (See, e.g., Apostol 1974, p. 342, or Zorich 2004, p. 252 for this definition.) That is, for each $\alpha$ there is some $q$, and also for each $q$ there is some $\alpha$, satisfying
$$f_\alpha(q) = h(q), \qquad f_\alpha'(q) = h'(q).$$
We may regard $h$ as a function of $\alpha$ if the correspondence between curves and points on the envelope is one-to-one.

2.5.1 An Envelope Theorem for Unconstrained Maximization

Consider now an unconstrained parametrized maximization problem. Let $x^*(q)$ be the value of the control variable $x$ that maximizes $f(x, q)$, where $q$ is our parameter of interest. For each fixed $x$, the function $\varphi_x(q) := f(x, q)$ defines a curve. We also define the value function
$$V(q) := f\big( x^*(q), q \big) = \max_x \varphi_x(q).$$
Under appropriate conditions, the graph of the value function $V$ will be the envelope of the curves $\varphi_x$. "Envelope theorems" in maximization theory are concerned with the tangency conditions this entails.

Example 2.27. Let
$$f(x, q) = q - (x - q)^2 + 1, \qquad x, q \in [0, 2].$$
Then given $q$, the maximizing $x$ is given by $x^*(q) = q$, and $V(q) = q + 1$.

Figure 2.11: The graph of $V$ is the envelope of the family of graphs of the functions $\varphi_x$. (Panel (a): the family of curves $\varphi_x$ for $x = 0, 0.2, \ldots, 2$; panel (b): the envelope.)

For each $x$, the function $\varphi_x$ is given by $\varphi_x(q) = q - (x - q)^2 + 1$. The graphs of these functions and of $V$ are shown for selected values of $x$ in Figure 2.11. Observe that the graph of $V$ is the envelope of the family of graphs of the functions $\varphi_x$. Consequently the slope of $V$ is the slope of the $\varphi_x$ to which it is tangent, that is,
$$V'(q) = \frac{\partial \varphi_x}{\partial q}\bigg|_{x = x^*(q) = q} = \frac{\partial f}{\partial q}\bigg|_{x = x^*(q) = q} = \big[ 1 + 2(x - q) \big]\Big|_{x = x^*(q) = q} = 1.$$
This last observation is one version of the Envelope Theorem.
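The tangency in Example 2.27 can be confirmed by finite differences. A minimal sketch (assuming SciPy is available, and not part of the formal argument): compute $V(q)$ by one-dimensional maximization and compare its numerical slope with the predicted value $\partial f/\partial q|_{x = x^*(q)} = 1$.

    # Finite-difference check of Example 2.27: V'(q) should be 1.
    from scipy.optimize import minimize_scalar

    f = lambda x, q: q - (x - q) ** 2 + 1

    def V(q):
        # maximize over x in [0, 2], as in the example
        res = minimize_scalar(lambda x: -f(x, q), bounds=(0, 2), method="bounded")
        return -res.fun

    q, eps = 1.0, 1e-5
    print((V(q + eps) - V(q - eps)) / (2 * eps))   # approx 1, matching V'(q) = 1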
Furthermore, we assume that the objective function $f \colon \mathbb{R}^n \times \mathbb{R}^\ell \to \mathbb{R}$, the constraints $g_i \colon \mathbb{R}^n \times \mathbb{R}^\ell \to \mathbb{R}$ ($i = 1, \ldots, m$), and the solution $x \colon \mathbb{R}^\ell \to \mathbb{R}^n$ are differentiable in the parameter $q$. Then, for every parameter $q$, the maximized value of the objective function is $f(x(q), q)$. This defines a new function $V \colon \mathbb{R}^\ell \to \mathbb{R}$, called the value function. Formally,
$$V(q) := \max_{x\in\mathbb{R}^n} \{ f(x, q) : g_i(x, q) = 0,\ i = 1, \ldots, m \}. \tag{2.15}$$

Theorem 2.28 (Envelope Theorem). Consider the value function $V(q)$ for the problem (2.14). Let $(\lambda_1, \ldots, \lambda_m)$ be the values of the Lagrange multipliers associated with the maximizing solution $x(\bar q)$ at $\bar q$. Then for each $k = 1, \ldots, \ell$,
$$\frac{\partial V(\bar q)}{\partial q_k} = \frac{\partial f(x(\bar q), \bar q)}{\partial q_k} - \sum_{j=1}^{m} \lambda_j \frac{\partial g_j(x(\bar q), \bar q)}{\partial q_k}. \tag{2.16}$$

Proof. By definition, $V(q) = f(x(q), q)$ for all $q$. Using the chain rule, we have
$$\frac{\partial V(\bar q)}{\partial q_k} = \sum_{i=1}^{n} \frac{\partial f(x(\bar q), \bar q)}{\partial x_i} \frac{\partial x_i(\bar q)}{\partial q_k} + \frac{\partial f(x(\bar q), \bar q)}{\partial q_k}.$$
It follows from Theorem 2.12 that
$$\frac{\partial f(x(\bar q), \bar q)}{\partial x_i} = \sum_{j=1}^{m} \lambda_j \frac{\partial g_j(x(\bar q), \bar q)}{\partial x_i}.$$
Hence,
$$\frac{\partial V(\bar q)}{\partial q_k} = \sum_{i=1}^{n} \Big[ \sum_{j=1}^{m} \lambda_j \frac{\partial g_j(x(\bar q), \bar q)}{\partial x_i} \Big] \frac{\partial x_i(\bar q)}{\partial q_k} + \frac{\partial f(x(\bar q), \bar q)}{\partial q_k} = \sum_{j=1}^{m} \lambda_j \Big[ \sum_{i=1}^{n} \frac{\partial g_j(x(\bar q), \bar q)}{\partial x_i} \frac{\partial x_i(\bar q)}{\partial q_k} \Big] + \frac{\partial f(x(\bar q), \bar q)}{\partial q_k}.$$
Finally, since $g_j(x(q), q) = 0$ for all $q$, we have
$$\sum_{i=1}^{n} \frac{\partial g_j(x(\bar q), \bar q)}{\partial x_i} \frac{\partial x_i(\bar q)}{\partial q_k} + \frac{\partial g_j(x(\bar q), \bar q)}{\partial q_k} = 0.$$
Combining, we get (2.16). ∎

Example 2.29. We are given the problem
$$\max_{(x,y)\in\mathbb{R}^2}\ f((x, y), q) = xy \quad\text{s.t.}\quad g((x, y), q) = 2x + 4y - q = 0.$$
Forming the Lagrangian, we get $L = xy - \lambda(2x + 4y - q)$, with first-order conditions:
$$y - 2\lambda = 0, \qquad x - 4\lambda = 0, \qquad q - 2x - 4y = 0. \tag{2.17}$$
These solve for $x(q) = q/4$, $y(q) = q/8$ and $\lambda(q) = q/16$. Thus,
$$V(q) = x(q)\, y(q) = \frac{q^2}{32}.$$
Differentiating $V(q)$ with respect to $q$ we get $V'(q) = q/16$. Now let us verify this using the Envelope Theorem. The theorem tells us that
$$V'(q) = \frac{\partial f((x(q), y(q)), q)}{\partial q} - \lambda(q)\, \frac{\partial g((x(q), y(q)), q)}{\partial q} = 0 - \lambda(q)\cdot(-1) = \frac{q}{16}.$$

Example 2.30. Consider a consumer whose utility function $u \colon \mathbb{R}^n_+ \to \mathbb{R}$ is strictly increasing in every commodity $i = 1, \ldots, n$. Then this consumer's problem is
$$\max\ u(x) \quad\text{s.t.}\quad \sum_{i=1}^{n} x_i p_i = I.$$
The Lagrangian is
$$L(x, \lambda) = u(x) + \lambda \Big( I - \sum_{i=1}^{n} x_i p_i \Big).$$
It follows from Theorem 2.28 that
$$V'(I) = \frac{\partial L(x, \lambda)}{\partial I} = \lambda.$$
That is, $\lambda$ measures the marginal utility of income.

2.5.3 Integral Form Envelope Theorem

The envelope theorems we have introduced so far rely on assumptions that are not satisfactory for some applications, e.g., mechanism design. Unfortunately, it is too technical to develop the more advanced treatment of the Envelope Theorem here. We refer the reader to Milgrom and Segal (2002) and Milgrom (2004, Chapter 3) for the integral form of the Envelope Theorem.
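Before leaving this chapter, Example 2.29 admits a quick numerical check of (2.16). A sketch (plain NumPy-style Python, not part of the formal development): the closed-form solution gives $V(q) = q^2/32$, and the numerical derivative of $V$ should match the multiplier $\lambda(q) = q/16$.

    # Numeric check of Example 2.29: V'(q) = lambda(q) = q/16.
    def V(q):
        # closed-form solution from the first-order conditions (2.17)
        x, y = q / 4.0, q / 8.0
        return x * y            # = q^2 / 32

    q, eps = 2.0, 1e-6
    dV = (V(q + eps) - V(q - eps)) / (2 * eps)
    print(dV, q / 16.0)         # both approx 0.125 for q = 2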
3 CONVEX ANALYSIS IN $\mathbb{R}^n$

Rockafellar (1970) is the classical reference for finite-dimensional convex analysis. As for infinite-dimensional convex analysis, Luenberger (1969) is an excellent text. To understand the material that follows, it is necessary that the reader have a good background in multivariable calculus (Chapter 1) and linear algebra (Axler, 1997). In this chapter we consider convexity exclusively in $\mathbb{R}^n$ for concreteness, but much of the discussion generalizes to infinite-dimensional vector spaces. You may want to consult Lay (1982), Hiriart-Urruty and Lemaréchal (2001), Berkovitz (2002), Bertsekas, Nedić and Ozdaglar (2003), Bertsekas (2009), Ok (2007, Chapter G) and Royden and Fitzpatrick (2010, Chapter 6).

3.1 Convex Sets

Definition 3.1 (Convex Set). A subset $C \subseteq \mathbb{R}^n$ is convex if for every pair of points $x^1, x^2 \in C$, the line segment
$$[x^1, x^2] := \{ x : x = \lambda x^1 + (1 - \lambda) x^2,\ 0 \le \lambda \le 1 \}$$
belongs to $C$.

Exercise 3.2. Sketch the following sets in $\mathbb{R}^2$ and determine from the figure which sets are convex and which are not:
(a) $\{ (x, y) : x^2 + y^2 \le 1 \}$,
(b) $\{ (x, y) : 0 < x^2 + y^2 \le 1 \}$,
(c) $\{ (x, y) : y \ge x^2 \}$,
(d) $\{ (x, y) : |x| + |y| \le 1 \}$, and
(e) $\{ (x, y) : y \ge 1/(1 + x^2) \}$.

Figure 3.1: $\Delta^2$ in $\mathbb{R}^3$ [the triangle with vertices $(1,0,0)$, $(0,1,0)$, $(0,0,1)$].

Lemma 3.3. Let $\{C_\alpha\}$ be a collection of convex sets such that $C := \bigcap_\alpha C_\alpha \ne \varnothing$. Then $C$ is convex.

Proof. Let $x^1, x^2 \in C$. Then $x^1, x^2 \in C_\alpha$ for all $\alpha$. Since each $C_\alpha$ is convex, we have $[x^1, x^2] \subseteq C_\alpha$ for all $\alpha$. Hence $[x^1, x^2] \subseteq C$, so $C$ is convex. ∎

For each positive integer $n$, define
$$\Delta^{n-1} := \Big\{ (\lambda_1, \ldots, \lambda_n) \in [0, 1]^n : \sum_{i=1}^{n} \lambda_i = 1 \Big\}. \tag{3.1}$$
For $n = 1$, the set $\Delta^0$ is the singleton $\{1\}$. For $n = 2$, the set $\Delta^1$ is the closed line segment joining $(0,1)$ and $(1,0)$. For $n = 3$, the set $\Delta^2$ is the closed triangle with vertices $(1,0,0)$, $(0,1,0)$ and $(0,0,1)$ (see Figure 3.1).

Definition 3.4. A point $x \in \mathbb{R}^n$ is a convex combination of points $x^1, \ldots, x^k$ if there exists $\lambda \in \Delta^{k-1}$ such that
$$x = \sum_{i=1}^{k} \lambda_i x^i.$$

Lemma 3.5. A set $C \subseteq \mathbb{R}^n$ is convex iff every convex combination of points in $C$ is also in $C$.

Proof. The "if" part is evident, so we prove the "only if" statement by induction on the number of points $k$. It holds for $k = 2$ by definition. Suppose the statement is true for $k$, and consider $k + 1$ points $x^1, \ldots, x^{k+1} \in C$ and $\lambda \in \Delta^{k}$ with $\lambda_{k+1} \in (0, 1)$. Then
$$x = \sum_{i=1}^{k+1} \lambda_i x^i = \Big( \sum_{j=1}^{k} \lambda_j \Big) \Big[ \sum_{i=1}^{k} \frac{\lambda_i}{\sum_{j=1}^{k} \lambda_j}\, x^i \Big] + \lambda_{k+1} x^{k+1} \in C,$$
since the bracketed term is a convex combination of $k$ points of $C$ (and so lies in $C$ by the inductive hypothesis), whence $x$ is a convex combination of two points of $C$. ∎

Let $A \subseteq \mathbb{R}^n$, and let $\mathcal{A}$ be the class of all convex subsets of $\mathbb{R}^n$ that contain $A$. We have $\mathcal{A} \ne \varnothing$; after all, $\mathbb{R}^n \in \mathcal{A}$. Then, by Lemma 3.3, $\bigcap \mathcal{A}$ is a convex set in $\mathbb{R}^n$ that contains $A$. Clearly, this set is the smallest (that is, $\subseteq$-minimum) convex subset of $\mathbb{R}^n$ that contains $A$.

Definition 3.6. The convex hull of $A$, denoted by $\operatorname{cov}(A)$, is the intersection of all convex sets containing $A$.

Exercise 3.7. For a given set $A$, let $K(A)$ denote the set of all convex combinations of points in $A$. Show that $K(A)$ is convex and $A \subseteq K(A)$.

Theorem 3.8. Let $A \subseteq \mathbb{R}^n$. Then $\operatorname{cov}(A) = K(A)$.

Proof. Let $\mathcal{A}$ be the family of convex sets containing $A$. Since $\operatorname{cov}(A) = \bigcap \mathcal{A}$ and $K(A) \in \mathcal{A}$ (Exercise 3.7), we have $\operatorname{cov}(A) \subseteq K(A)$. To prove the reverse inclusion, take an arbitrary $C \in \mathcal{A}$. Then $A \subseteq C$, and it follows from Lemma 3.5 that $K(A) \subseteq C$. Hence $K(A) \subseteq \bigcap \mathcal{A} = \operatorname{cov}(A)$. ∎
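Theorem 3.8 also suggests a computational test: $x \in \operatorname{cov}(\{x^1, \ldots, x^k\})$ iff the feasibility system $\sum_i t_i x^i = x$, $t \in \Delta^{k-1}$, has a solution. A sketch using SciPy's linear programming routine with a zero objective (the points below are illustrative):

    # Membership in cov(A) as a feasibility LP over the simplex.
    import numpy as np
    from scipy.optimize import linprog

    pts = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])   # points x^1, ..., x^k
    x = np.array([0.5, 0.5])                               # query point

    k = len(pts)
    A_eq = np.vstack([pts.T, np.ones(k)])   # sum t_i x^i = x and sum t_i = 1
    b_eq = np.append(x, 1.0)
    res = linprog(c=np.zeros(k), A_eq=A_eq, b_eq=b_eq, bounds=[(0, 1)] * k)
    print("in cov(A):", res.success)        # True: (0.5, 0.5) lies in the triangle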
3.2 Separation Theorem

This section is devoted to the establishment of separation theorems. In some sense, these theorems are the fundamental theorems of optimization theory. For simplicity, we restrict our analysis to $\mathbb{R}^n$.

Definition 3.9. A hyperplane $H_a^\beta$ in $\mathbb{R}^n$ is defined to be the set of points that satisfy the equation $\langle a, x \rangle = \beta$. Thus,
$$H_a^\beta := \{ x \in \mathbb{R}^n : \langle a, x \rangle = \beta \}.$$
The vector $a$ is said to be a normal to the hyperplane.

Remark 3.10. Geometrically, a hyperplane $H_a^\beta$ in $\mathbb{R}^n$ is a translation of an $(n-1)$-dimensional subspace (an affine manifold). Algebraically, it is a level set of a linear functional. For an excellent explanation of hyperplanes, see Luenberger (1969, Section 5.12).

A hyperplane $H_a^\beta$ divides $\mathbb{R}^n$ into two half-spaces, one on each side of $H_a^\beta$. The set $\{ x \in \mathbb{R}^n : \langle a, x \rangle \ge \beta \}$ is called the half-space above the hyperplane $H_a^\beta$, and the set $\{ x \in \mathbb{R}^n : \langle a, x \rangle \le \beta \}$ is called the half-space below the hyperplane $H_a^\beta$; see Figure 3.2.

Figure 3.2: Hyperplane and half-spaces.

It therefore seems natural to say that two sets $X$ and $Y$ are separated by a hyperplane $H_a^\beta$ if they are contained in different half-spaces determined by $H_a^\beta$. We will introduce two separation theorems.

Theorem 3.11 (A First Separation Theorem). Let $C$ be a closed and convex subset of $\mathbb{R}^n$, and let $y \in \mathbb{R}^n \setminus C$. Then there exist a vector $a \in \mathbb{R}^n$ with $a \ne 0$ and a scalar $\beta \in \mathbb{R}$ such that $\langle a, y \rangle > \beta$ and $\langle a, x \rangle < \beta$ for all $x \in C$.

To motivate the proof, we argue heuristically from Figure 3.3, where $C$ is assumed to have a tangent at each boundary point. Draw a line from $y$ to $x^*$, the point of $C$ that is closest to $y$. The vector $y - x^*$ is orthogonal to $C$ in the sense that $y - x^*$ is orthogonal to the tangent line at $x^*$. The tangent line, which is exactly the set $\{ z \in \mathbb{R}^n : \langle y - x^*, z - x^* \rangle = 0 \}$, separates $y$ and $C$. The point $x^*$ is characterized by the fact that $\langle y - x^*, x - x^* \rangle \le 0$ for all $x \in C$. If we move the tangent line parallel to itself so as to pass through a point $x' \in (x^*, y)$, we are done. We now justify these steps in a series of claims.

Figure 3.3: The separating hyperplane theorem.

Claim 1. Let $C$ be a convex subset of $\mathbb{R}^n$ and let $y \in \mathbb{R}^n \setminus C$. If there exists a point in $C$ that is closest to $y$, then it is unique.

Proof. (Here, for a set $X \subseteq \mathbb{R}^n$, $d(y, X) := \inf_{x \in X} \| y - x \|$ is the distance from $y$ to $X$.) Suppose that there were two points $x^1$ and $x^2$ of $C$ that were closest to $y$. Then $(x^1 + x^2)/2 \in C$ since $C$ is convex, and so
$$d(y, C) \le \Big\| y - \frac{x^1 + x^2}{2} \Big\| = \frac{1}{2} \big\| (x^1 - y) + (x^2 - y) \big\| \le \frac{1}{2} \| x^1 - y \| + \frac{1}{2} \| x^2 - y \| = d(y, C).$$
Hence the triangle inequality holds with equality. It follows from Exercise 1.8 that there exists $\lambda \ge 0$ such that $x^1 - y = \lambda (x^2 - y)$. Clearly $\lambda \ne 0$, for otherwise $x^1 - y = 0$ would imply $y = x^1 \in C$. Then $\lambda = 1$ since $\| x^1 - y \| = \| x^2 - y \| = d(y, C)$. But then $x^1 = x^2$. ∎

Claim 2. Let $C$ be a closed subset of $\mathbb{R}^n$ and let $y \in \mathbb{R}^n \setminus C$. Then there exists a point $x^* \in C$ that is closest to $y$.

Proof. Take an arbitrary point $x^0 \in C$, and let $r > \| x^0 - y \|$. Then $C_1 := B(y; r) \cap C$ is nonempty (at least $x^0$ is in the intersection), closed, and bounded, and hence compact. The function $x \mapsto \| x - y \|$ is continuous on $C_1$ and so attains its minimum at some point $x^* \in C_1$, i.e.,
$$\| x^* - y \| \le \| x - y \| \quad\text{for all } x \in C_1.$$
For every $x \in C \setminus C_1$, we have $\| x - y \| > r > \| x^0 - y \| \ge \| x^* - y \|$, since $x^0 \in C_1$. ∎

Claim 3. Let $C$ be a convex subset of $\mathbb{R}^n$ and let $y \in \mathbb{R}^n \setminus C$. Then $x^* \in C$ is a closest point in $C$ to $y$ iff
$$\langle y - x^*, x - x^* \rangle \le 0 \quad\text{for all } x \in C. \tag{3.2}$$

Proof. Let $x^* \in C$ be a closest point to $y$ and let $x \in C$. Since $C$ is convex, we have
$$[x^*, x] := \{ z(t) \in \mathbb{R}^n : z(t) = x^* + t(x - x^*),\ t \in [0, 1] \} \subseteq C.$$
Let
$$g(t) := \| z(t) - y \|^2 = \big\langle x^* + t(x - x^*) - y,\ x^* + t(x - x^*) - y \big\rangle = \sum_{i=1}^{n} \big( x_i^* - y_i + t(x_i - x_i^*) \big)^2.$$
Observe that $g(0) = \| x^* - y \|^2$. Since $g$ is continuously differentiable on $(0, 1]$ and $x^* \in \operatorname{argmin}_{x \in C} \| x - y \|$, we have $g_+'(0) \ge 0$. Since
$$g'(t) = 2 \sum_{i=1}^{n} (x_i - x_i^*) \big( x_i^* - y_i + t(x_i - x_i^*) \big) = 2 \big[ \langle x^* - y, x - x^* \rangle + t \| x - x^* \|^2 \big], \tag{3.3}$$
letting $t \downarrow 0$ we get (3.2).

Conversely, suppose that (3.2) holds. Take an arbitrary $x \in C \setminus \{x^*\}$. It follows from (3.3) that if $t \in (0, 1]$ then
$$g'(t) = 2 \big[ \langle x^* - y, x - x^* \rangle + t \| x - x^* \|^2 \big] \ge 2 t \| x - x^* \|^2 > 0.$$
That is, $g$ is strictly increasing on $[0, 1]$. Thus $\| x - y \|^2 = g(1) > g(0) = \| x^* - y \|^2$. ∎
Proof of Theorem 3.11. We can now complete the proof of Theorem 3.11. Let $x^* \in C$ be the closest point to $y$ (by Claims 1 and 2), and let $a = y - x^*$. Then for all $x \in C$ we have $\langle a, x - x^* \rangle \le 0$ (by Claim 3), i.e., $\langle a, x \rangle \le \langle a, x^* \rangle$, with equality occurring when $x = x^*$. Hence,
$$\max_{x \in C} \langle a, x \rangle = \langle a, x^* \rangle.$$
On the other hand, $\langle a, y - x^* \rangle = \| a \|^2 > 0$, so
$$\langle a, y \rangle = \langle a, x^* \rangle + \| a \|^2 > \langle a, x^* \rangle.$$
Finally, take an arbitrary $\beta \in \big( \langle a, x^* \rangle,\ \langle a, y \rangle \big)$. We then have $\langle a, y \rangle > \beta$ and $\langle a, x \rangle \le \langle a, x^* \rangle < \beta$ for all $x \in C$. ∎

Theorem 3.12 (Separation Theorem). Let $X$ and $Y$ be two disjoint convex subsets of $\mathbb{R}^n$. Then there exist $a \in \mathbb{R}^n$ with $a \ne 0$ and a scalar $\beta \in \mathbb{R}$ such that $\langle a, x \rangle \le \beta$ for all $x \in X$ and $\langle a, y \rangle \ge \beta$ for all $y \in Y$. That is, there is a hyperplane $H_a^\beta$ that separates $X$ and $Y$.

Proof. First note that
$$X \cap Y = \varnothing \iff 0 \notin X - Y.$$
Since $X$ and $Y$ are convex, so is $X - Y$. Hence there exists an $a \ne 0$ such that
$$\langle a, x - y \rangle \le 0 \quad\text{for all } x - y \in X - Y.$$
Therefore,
$$\langle a, x \rangle \le \langle a, y \rangle \quad\text{for all } x \in X \text{ and } y \in Y. \tag{3.4}$$
Let $\alpha := \sup \{ \langle a, x \rangle : x \in X \}$ and $\gamma := \inf \{ \langle a, y \rangle : y \in Y \}$. Then from (3.4) we get that $\alpha$ and $\gamma$ are finite, and for any number $\beta \in [\alpha, \gamma]$,
$$\langle a, x \rangle \le \beta \le \langle a, y \rangle \quad\text{for all } x \in X \text{ and } y \in Y.$$
Thus the hyperplane $H_a^\beta$ separates $X$ and $Y$. ∎

Theorem 3.13 (Proper Separation Theorem). Let $X$ and $Y$ be two convex sets such that $X^\circ \ne \varnothing$ and $X^\circ \cap Y = \varnothing$. Then there exists a hyperplane $H_a^\beta$ that properly separates $\bar X$ and $\bar Y$.

Proof. Since $X^\circ$ is convex and disjoint from $Y$, it follows from Theorem 3.12 that there exists a hyperplane $H_a^\beta$ such that
$$\langle a, x \rangle \le \beta \le \langle a, y \rangle \quad\text{for all } x \in X^\circ \text{ and all } y \in Y. \tag{3.5}$$
Let $x \in \bar X$. Then $x \in \overline{X^\circ}$, since $\overline{X^\circ} = \bar X$. So there exists a sequence $(x^n)$ in $X^\circ$ such that $x^n \to x$. It then follows from (3.5) and the continuity of the inner product that (3.5) holds for all $x \in \bar X$ and all $y \in \bar Y$. Hence the hyperplane $H_a^\beta$ separates $\bar X$ and $\bar Y$. Since $(\bar X)^\circ = X^\circ$, we know that $(\bar X)^\circ \ne \varnothing$. Furthermore, $\langle a, x \rangle < \beta$ for $x \in (\bar X)^\circ$, so the separation is proper. ∎

Theorem 3.14 (Strong Separation Theorem). Let $K$ be a compact convex set and let $C$ be a closed convex set such that $K \cap C = \varnothing$. Then $K$ and $C$ can be strongly separated.

Proof. For every $\varepsilon > 0$, define
$$K_\varepsilon := K + B(0; \varepsilon) = \bigcup_{x \in K} \big( x + B(0; \varepsilon) \big), \qquad C_\varepsilon := C + B(0; \varepsilon) = \bigcup_{y \in C} \big( y + B(0; \varepsilon) \big).$$
Clearly, both $K_\varepsilon$ and $C_\varepsilon$ are open and convex. We now show that there exists $\varepsilon > 0$ such that $K_\varepsilon \cap C_\varepsilon = \varnothing$. If the assertion were false, there would exist a sequence $(\varepsilon_n)$ with $\varepsilon_n > 0$ and $\varepsilon_n \to 0$, and a sequence $(w^n)$ such that $w^n \in K_{\varepsilon_n} \cap C_{\varepsilon_n}$ for each $n$. Since
$$w^n = x^n + z^n \ \text{with } x^n \in K,\ \| z^n \| < \varepsilon_n, \qquad w^n = y^n + z'^n \ \text{with } y^n \in C,\ \| z'^n \| < \varepsilon_n,$$
we have a sequence $(x^n)$ in $K$ and a sequence $(y^n)$ in $C$ such that $\| w^n - x^n \| < \varepsilon_n$ and $\| w^n - y^n \| < \varepsilon_n$. Hence,
$$\| x^n - y^n \| = \| (x^n - w^n) + (w^n - y^n) \| \le \| x^n - w^n \| + \| y^n - w^n \| < 2 \varepsilon_n. \tag{3.6}$$
Since $K$ is compact, there exists a subsequence $(x^{n_k})$ that converges to some $x^0 \in K$. It follows from (3.6) that
$$\| y^{n_k} - x^0 \| \le \| y^{n_k} - x^{n_k} \| + \| x^{n_k} - x^0 \| \to 0,$$
that is, $y^{n_k} \to x^0$. Since $C$ is closed, $x^0 \in C$. This contradicts the assumption that $C \cap K = \varnothing$, and so the assertion is true.

We have shown that there exists $\varepsilon > 0$ such that $K_\varepsilon \cap C_\varepsilon = \varnothing$. Hence, by Theorem 3.13 there is a hyperplane that properly separates $K_\varepsilon$ and $C_\varepsilon$. Since both $K_\varepsilon$ and $C_\varepsilon$ are open, the separation is strict. According to the definition, this says that $K$ and $C$ are strongly separated. ∎
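The construction in the proof of Theorem 3.11 is easy to carry out concretely. A sketch (assuming NumPy, with $C$ taken to be the closed unit ball so that the projection has a closed form): project $y$ onto $C$, set $a = y - x^*$, and pick $\beta$ strictly between $\langle a, x^* \rangle$ and $\langle a, y \rangle$.

    # Separating a point from the closed unit ball, following Theorem 3.11.
    import numpy as np

    y = np.array([2.0, 1.0])
    x_star = y / np.linalg.norm(y)       # closest point of the unit ball to y
    a = y - x_star
    beta = 0.5 * (a @ x_star + a @ y)    # any beta strictly between the two works

    print(a @ y > beta)                  # True: y lies strictly above H
    # check <a, x> < beta on boundary points of C (where the sup is attained)
    theta = np.linspace(0, 2 * np.pi, 100)
    samples = np.c_[np.cos(theta), np.sin(theta)]
    print((samples @ a < beta).all())    # True: C lies strictly below H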
3.3 Systems of Linear Inequalities: Theorems of the Alternative

Let $A$ be an $m \times n$ matrix and let $b \ne 0$ be a vector in $\mathbb{R}^m$. We focus on finding a non-negative $x \in \mathbb{R}^n_+$ such that $Ax = b$, or showing that no such $x$ exists. The problem can be framed as follows: can $b$ be expressed as a non-negative linear combination of the columns of $A$?

Definition 3.15. The set of all non-negative linear combinations of the columns of $A$ is called the finite cone generated by the columns of $A$. It is denoted $\operatorname{cone}(A)$.

Lemma 3.16. Let $A$ be an $m \times n$ matrix. Then $\operatorname{cone}(A)$ is a convex set.

Proof. Take any two $y, y' \in \operatorname{cone}(A)$. Then there exist $x, x' \in \mathbb{R}^n_+$ such that $y = Ax$ and $y' = Ax'$. For any $\lambda \in [0, 1]$ we have
$$\lambda y + (1 - \lambda) y' = \lambda Ax + (1 - \lambda) Ax' = A \big( \lambda x + (1 - \lambda) x' \big).$$
Since $\lambda x + (1 - \lambda) x' \in \mathbb{R}^n_+$, it follows that $\lambda y + (1 - \lambda) y' \in \operatorname{cone}(A)$. ∎

Lemma 3.17. Let $A$ be an $m \times n$ matrix. Then $\operatorname{cone}(A)$ is a closed set.

Proof (sketch). Consider first the case in which the columns of $A$ are linearly independent. Let $(w^n)$ be a convergent sequence in $\operatorname{cone}(A)$ with limit $w$; we show that $w \in \operatorname{cone}(A)$. For each $w^n$ there exists $x^n \in \mathbb{R}^n_+$ such that $w^n = A x^n$. Using the fact that $(A x^n)$ converges, one shows that $(x^n)$ converges; its limit $x$ is non-negative and satisfies $Ax = w$. The general case reduces to this one. ∎

Theorem 3.18 (Farkas' Lemma). Let $A$ be an $m \times n$ matrix and let $b \in \mathbb{R}^m \setminus \{0\}$. Then one and only one of the systems
$$Ax = b, \quad x \ge 0, \quad x \in \mathbb{R}^n, \tag{F$_{\mathrm I}$}$$
$$A^{\mathsf T} y \ge 0, \quad \langle y, b \rangle < 0, \quad y \in \mathbb{R}^m, \tag{F$_{\mathrm{II}}$}$$
has a solution.

Proof. We first suppose that (F$_{\mathrm I}$) has a solution $x^*$; we shall show that (F$_{\mathrm{II}}$) has no solution. Suppose there exists $y^0 \in \mathbb{R}^m$ such that $A^{\mathsf T} y^0 \ge 0$. Then, since $x^* \ge 0$,
$$\langle y^0, b \rangle = \langle y^0, A x^* \rangle = \langle x^*, A^{\mathsf T} y^0 \rangle \ge 0.$$
Hence (F$_{\mathrm{II}}$) has no solution.

We now suppose that (F$_{\mathrm I}$) has no solution; that is, $b \notin \operatorname{cone}(A)$. Since $\operatorname{cone}(A)$ is convex (Lemma 3.16) and closed (Lemma 3.17), there exists a hyperplane $H_{y^0}^\beta$ that strongly separates $b$ and $\operatorname{cone}(A)$: $y^0 \ne 0$ and
$$\langle y^0, b \rangle > \beta > \langle y^0, z \rangle \quad\text{for all } z \in \operatorname{cone}(A).$$
Since $0 \in \operatorname{cone}(A)$, we get that $\beta > 0$. Thus,
$$\langle y^0, b \rangle > 0. \tag{3.7}$$
For all $z \in \operatorname{cone}(A)$ we have
$$\langle y^0, z \rangle = \langle y^0, Ax \rangle = \langle x, A^{\mathsf T} y^0 \rangle < \beta,$$
the last inequality now holding for all $x \in \mathbb{R}^n_+$. Since $x \in \mathbb{R}^n_+$ may be scaled arbitrarily, this forces $A^{\mathsf T} y^0 \le 0$. Setting $y := -y^0$, we obtain $A^{\mathsf T} y \ge 0$ and, by (3.7), $\langle y, b \rangle < 0$; that is, $y$ solves (F$_{\mathrm{II}}$). ∎

Example 3.19. We use Farkas' Lemma to decide whether the following system has a non-negative solution:
$$\begin{bmatrix} 4 & 1 & -5 \\ 1 & 0 & 2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$
The Farkas alternative is
$$\begin{bmatrix} y_1 & y_2 \end{bmatrix} \begin{bmatrix} 4 & 1 & -5 \\ 1 & 0 & 2 \end{bmatrix} \ge \begin{bmatrix} 0 & 0 & 0 \end{bmatrix}, \qquad y_1 + y_2 < 0,$$
or equivalently,
$$4 y_1 + y_2 \ge 0, \quad y_1 \ge 0, \quad -5 y_1 + 2 y_2 \ge 0, \quad y_1 + y_2 < 0. \tag{3.8}$$
The system (3.8) has no solution: the second inequality requires that $y_1 \ge 0$; combining this with the last inequality, we conclude that $y_2 < 0$. But $y_1 \ge 0$ and $y_2 < 0$ contradict the third inequality of (3.8). So the original system has a non-negative solution.

Example 3.20. We use Farkas' Lemma to decide the solvability of the system
$$\begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = \begin{bmatrix} 2 \\ 2 \\ 2 \\ 1 \end{bmatrix}.$$
We are interested in non-negative solutions of this system. The Farkas alternative is
$$\begin{bmatrix} y_1 & y_2 & y_3 & y_4 \end{bmatrix} \begin{bmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 1 \\ 1 & 1 & 1 \end{bmatrix} \ge \begin{bmatrix} 0 & 0 & 0 \end{bmatrix}, \qquad 2 y_1 + 2 y_2 + 2 y_3 + y_4 < 0.$$
One solution is $y_1 = y_2 = y_3 = -1/2$ and $y_4 = 1$, implying that the given system has no non-negative solution.
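In computations, the dichotomy of Theorem 3.18 is exactly what a linear-programming feasibility check delivers. A sketch (assuming SciPy): solving the feasibility problem for Example 3.19 with a zero objective either returns a non-negative solution of (F$_{\mathrm I}$) or reports infeasibility, which by Farkas' Lemma certifies that (F$_{\mathrm{II}}$) is solvable.

    # Farkas' Lemma as computation: is Ax = b, x >= 0 feasible?
    import numpy as np
    from scipy.optimize import linprog

    A = np.array([[4.0, 1.0, -5.0], [1.0, 0.0, 2.0]])   # system of Example 3.19
    b = np.array([1.0, 1.0])

    n = A.shape[1]
    res = linprog(c=np.zeros(n), A_eq=A, b_eq=b, bounds=[(0, None)] * n)
    print("non-negative solution exists:", res.success)
    if res.success:
        print("one solution:", res.x)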
3.4 Convex Functions

Throughout this section we will assume that the subset $C \subseteq \mathbb{R}^n$ is convex and that $f$ is a real-valued function defined on $C$, that is, $f \colon C \to \mathbb{R}$. When we take $x^1, x^2 \in C$, we will let $x^t := t x^1 + (1 - t) x^2$, for $t \in [0, 1]$, denote the convex combination of $x^1$ and $x^2$.

Definitions

Definition 3.21. A function $f \colon C \to \mathbb{R}$ is convex if for all $x^1, x^2 \in C$ and $t \in [0, 1]$,
$$f\big( t x^1 + (1 - t) x^2 \big) \le t f(x^1) + (1 - t) f(x^2).$$
The function $f$ is strictly convex if the above inequality holds strictly (for $x^1 \ne x^2$ and $t \in (0, 1)$). A function $f \colon C \to \mathbb{R}$ is concave if for all $x^1, x^2 \in C$ and $t \in [0, 1]$,
$$f\big( t x^1 + (1 - t) x^2 \big) \ge t f(x^1) + (1 - t) f(x^2).$$
The function $f$ is strictly concave if the above inequality holds strictly.

Definition 3.22. A function $f \colon C \to \mathbb{R}$ is quasi-convex if, for all $x^1, x^2 \in C$ and $t \in [0, 1]$,
$$f\big( t x^1 + (1 - t) x^2 \big) \le \max \{ f(x^1), f(x^2) \}.$$
The function $f$ is strictly quasi-convex if the above inequality holds strictly. A function $f \colon C \to \mathbb{R}$ is quasi-concave if, for all $x^1, x^2 \in C$ and $t \in [0, 1]$,
$$f\big( t x^1 + (1 - t) x^2 \big) \ge \min \{ f(x^1), f(x^2) \}.$$
The function $f$ is strictly quasi-concave if the above inequality holds strictly.

Figure 3.4: (a) $f$ is convex; (b) $f$ is concave; (c) $f$ is quasi-convex; (d) $f$ is quasi-concave.

Geometric Interpretation

Given a function $f \colon C \to \mathbb{R}$ and $y_0 \in \mathbb{R}$, let us define
the epigraph of $f$: $\operatorname{epi}(f) := \{ (x, y) \in C \times \mathbb{R} : f(x) \le y \}$;
the subgraph of $f$: $\operatorname{sub}(f) := \{ (x, y) \in C \times \mathbb{R} : f(x) \ge y \}$;
the superior set for level $y_0$: $S(y_0) := \{ x \in C : f(x) \ge y_0 \}$;
the inferior set for level $y_0$: $I(y_0) := \{ x \in C : f(x) \le y_0 \}$.

We then have (see Figure 3.4):
$f$ is convex $\iff$ $\operatorname{epi}(f)$ is convex;
$f$ is concave $\iff$ $\operatorname{sub}(f)$ is convex;
$f$ is quasi-convex $\iff$ $I(y_0)$ is convex for every $y_0$;
$f$ is quasi-concave $\iff$ $S(y_0)$ is convex for every $y_0$.

Convexity and Quasi-convexity

It is a simple matter to show that concavity (convexity) implies quasi-concavity (quasi-convexity).

Theorem 3.23. Let $C \subseteq \mathbb{R}^n$ and $f \colon C \to \mathbb{R}$. If $f$ is concave on $C$, it is also quasi-concave on $C$. If $f$ is convex on $C$, it is also quasi-convex on $C$.

Proof. We only prove the first claim, and leave the second as an exercise. Suppose $f$ is concave on $C$. Take any $x, y \in C$ and $t \in [0, 1]$. Without loss of generality, we let $f(x) \ge f(y)$. By the definition of concavity, we have
$$f\big( t x + (1 - t) y \big) \ge t f(x) + (1 - t) f(y) = t \big( f(x) - f(y) \big) + f(y) \ge f(y) = \min \{ f(x), f(y) \}.$$
Hence, $f$ is quasi-concave. ∎

Exercise 3.24. Prove the second claim in Theorem 3.23: if $f$ is convex, then it is also quasi-convex.

Concavity and the Hessian

We now characterize concavity of a function using the Hessian matrix.

Theorem 3.25. Let $A \subseteq \mathbb{R}^n$ be open and convex. The (twice continuously differentiable) function $f \colon A \to \mathbb{R}$ is concave if and only if $Hf(x)$ is negative semidefinite for every $x \in A$.

Proof. See Mas-Colell et al. (1995, Theorem M.C.2). ∎

Example 3.26. Let $A := (0, \infty) \times (-5, \infty)$, and let $f(x, y) = \ln x + \ln(y + 5)$. For each point $(x, y) \in A$, the Hessian of $f$ is
$$Hf(x, y) = \begin{bmatrix} -1/x^2 & 0 \\ 0 & -1/(y + 5)^2 \end{bmatrix}.$$
Then for each $(u, v) \in \mathbb{R}^2$, we have
$$\begin{bmatrix} u & v \end{bmatrix} \begin{bmatrix} -1/x^2 & 0 \\ 0 & -1/(y + 5)^2 \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = -\frac{u^2}{x^2} - \frac{v^2}{(y + 5)^2} \le 0.$$
That is, $Hf(x, y)$ is negative semidefinite. Hence $f(x, y)$ is concave. See Figure 3.5.

Figure 3.5: The function $\ln x + \ln(y + 5)$ is concave.
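Theorem 3.25 also gives a practical test: negative semidefiniteness can be verified by checking that all eigenvalues of the Hessian are non-positive. A sketch (assuming NumPy) for the function of Example 3.26 at a few sample points of $A$:

    # Numeric companion to Example 3.26: eigenvalues of the Hessian are <= 0.
    import numpy as np

    def hessian(x, y):
        return np.array([[-1.0 / x**2, 0.0],
                         [0.0, -1.0 / (y + 5.0)**2]])

    for (x, y) in [(0.5, -4.0), (1.0, 0.0), (3.0, 10.0)]:
        eig = np.linalg.eigvalsh(hessian(x, y))
        print((x, y), eig, (eig <= 0).all())   # True at every sample point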
Jensen's Inequality

Theorem 3.27 (Jensen's Inequality). Let $f \colon C \to \mathbb{R}$, where $C \subseteq \mathbb{R}^n$ is convex. Then $f$ is convex iff for every finite set of points $x^1, \ldots, x^k \in C$ and every $t = (t_1, \ldots, t_k) \in \Delta^{k-1}$ (see (3.1) for the definition of $\Delta^{k-1}$),
$$f\Big( \sum_{i=1}^{k} t_i x^i \Big) \le \sum_{i=1}^{k} t_i f(x^i). \tag{3.9}$$

Proof. The "if" part is evident, so we only prove the "only if" part. Suppose that $f$ is convex. We shall prove (3.9) by induction on $k$. For $k = 1$ the relation is trivial (remember that $\Delta^0 = \{1\}$). For $k = 2$ the relation follows from the definition of a convex function. Suppose that $k > 2$ and that (3.9) has been established for $k - 1$; we show that (3.9) holds for $k$. If $t_k = 1$, then there is nothing to prove. If $t_k < 1$, set $T := \sum_{i=1}^{k-1} t_i$. Then $T + t_k = 1$, $T = 1 - t_k > 0$, and
$$\sum_{i=1}^{k-1} \frac{t_i}{T} = \frac{T}{T} = 1.$$
Hence,
$$f\Big( \sum_{i=1}^{k} t_i x^i \Big) = f\Big( T \Big[ \sum_{i=1}^{k-1} \frac{t_i}{T}\, x^i \Big] + t_k x^k \Big) \le T f\Big( \sum_{i=1}^{k-1} \frac{t_i}{T}\, x^i \Big) + t_k f(x^k) \le T \Big[ \sum_{i=1}^{k-1} \frac{t_i}{T}\, f(x^i) \Big] + t_k f(x^k) = \sum_{i=1}^{k} t_i f(x^i),$$
where the first inequality follows from the convexity of $f$ and the second inequality follows from the inductive hypothesis. ∎

3.5 Convexity and Optimization

Concave Programming

We first present two results which indicate the importance of convexity for optimization theory: in convex optimization problems, all local optima must also be global optima; and if a strictly convex optimization problem admits a solution, the solution must be unique.

Theorem 3.28. Let $X \subseteq \mathbb{R}^n$ be convex and $f \colon X \to \mathbb{R}$ be concave. Then
(a) any local maximizer of $f$ is a global maximizer of $f$;
(b) the set $\operatorname{argmax} \{ f(x) : x \in X \}$ of maximizers of $f$ on $X$ is either empty or convex.

Proof. (a) Suppose $x$ is a local maximizer but not a global maximizer of $f$. Then there exists $\varepsilon > 0$ such that
$$f(x) \ge f(y) \quad\text{for all } y \in X \cap B(x; \varepsilon), \tag{3.10}$$
and there exists $z \in X$ such that $f(z) > f(x)$. Since $X$ is convex, $t x + (1 - t) z \in X$ for all $t \in (0, 1)$. Take $\bar t$ sufficiently close to 1 so that $\bar t x + (1 - \bar t) z \in B(x; \varepsilon)$. By the concavity of $f$,
$$f\big( \bar t x + (1 - \bar t) z \big) \ge \bar t f(x) + (1 - \bar t) f(z) > f(x),$$
contradicting (3.10).

(b) Suppose that $x^1$ and $x^2$ are both maximizers of $f$ on $X$. Then $f(x^1) = f(x^2)$. For any $t \in (0, 1)$, we have
$$f(x^1) \ge f\big( t x^1 + (1 - t) x^2 \big) \ge t f(x^1) + (1 - t) f(x^2) = f(x^1).$$
That is, $f\big( t x^1 + (1 - t) x^2 \big) = f(x^1)$. Thus $t x^1 + (1 - t) x^2$ is a maximizer of $f$ on $X$. ∎

Theorem 3.29. Let $X \subseteq \mathbb{R}^n$ be convex and $f \colon X \to \mathbb{R}$ be strictly concave. Then $\operatorname{argmax} \{ f(x) : x \in X \}$ either is empty or contains a single point.

Exercise 3.30. Prove Theorem 3.29.

We now present an extremely important theorem, which says that the first-order conditions of the Kuhn-Tucker Theorem (Theorem 2.20) are both necessary and sufficient to identify the optima of a convex inequality-constrained optimization problem, provided a mild regularity condition is met.

Theorem 3.31 (Kuhn-Tucker Theorem under Convexity). Let $U \subseteq \mathbb{R}^n$ be open and convex. Let $f \colon U \to \mathbb{R}$ be a concave $C^1$ function. For $i = 1, \ldots, \ell$, let $h_i \colon U \to \mathbb{R}$ be convex $C^1$ functions. Suppose there is some $\bar x \in U$ such that
$$h_i(\bar x) < 0, \qquad i = 1, \ldots, \ell.$$
(This is called Slater's condition.) Then $x^*$ maximizes $f$ over $X := \{ x \in U : h_i(x) \le 0,\ i = 1, \ldots, \ell \}$ if and only if there is $\lambda^* \in \mathbb{R}^\ell$ such that the Kuhn-Tucker first-order conditions hold:
$$\lambda^* \ge 0, \qquad \sum_{i=1}^{\ell} \lambda_i^* h_i(x^*) = 0, \tag{KTC-1}$$
$$\nabla f(x^*) = \sum_{i=1}^{\ell} \lambda_i^* \nabla h_i(x^*). \tag{KTC-2}$$

Proof. See Sundaram (1996, Section 7.7). ∎

Example 3.32. Let $U = (0, \infty) \times (-5, \infty)$. Let $f(x, y) = \ln x + \ln(y + 5)$, $h_1(x, y) = x + y - 4$ and $h_2(x, y) = -y$. Consider the problem
$$\max_{(x,y)\in U} f(x, y) \quad\text{s.t.}\quad h_1(x, y) \le 0,\ h_2(x, y) \le 0. \tag{3.11}$$

Exercise 3.33. Show that $(x^*, y^*) = (4, 0)$ is the unique point satisfying the first-order conditions for a local maximizer of the problem (3.11).

Clearly, Slater's condition holds. Then, combining Theorem 3.31, Exercise 3.33 and Example 3.26, we conclude that $(4, 0)$ is a global maximizer. See Figure 3.6.

Figure 3.6: $(4, 0)$ is the global maximizer.
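The conclusion of Example 3.32 is also easy to confirm numerically. A sketch (assuming SciPy; the starting point and the small lower bound on $x$, which keeps the iterates inside $U$, are implementation choices, not part of the problem):

    # Numeric check of Example 3.32: the maximizer should be (4, 0).
    import numpy as np
    from scipy.optimize import minimize

    obj = lambda v: -(np.log(v[0]) + np.log(v[1] + 5.0))     # negated objective
    cons = [{"type": "ineq", "fun": lambda v: 4.0 - v[0] - v[1]},  # h1 <= 0
            {"type": "ineq", "fun": lambda v: v[1]}]               # h2 <= 0

    res = minimize(obj, x0=np.array([1.0, 1.0]), method="SLSQP",
                   bounds=[(1e-9, None), (0.0, None)], constraints=cons)
    print(res.x)    # approx (4, 0)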
Slater's Condition

For a formal demonstration of the need for Slater's condition, let us consider the following example.

Example 3.34. Let $U = \mathbb{R}$; let $f(x) = x$ and $g(x) = x^2$. The only point in $\mathbb{R}$ satisfying $g(x) \le 0$ is $x = 0$, so this is trivially the constrained maximizer of $f$. But
$$f'(0) = 1 \quad\text{and}\quad g'(0) = 0,$$
so there is no $\lambda \ge 0$ such that $f'(0) = \lambda g'(0)$.

Quasi-concavity and Optimization

Quasi-concave and quasi-convex functions fail to exhibit many of the sharp properties that distinguish concave and convex functions. As an example, we show that Theorem 3.31 fails for quasi-concave objective functions.

Example 3.35. Let $f \colon \mathbb{R} \to \mathbb{R}$ and $h \colon \mathbb{R} \to \mathbb{R}$ be the quasi-concave continuously differentiable functions
$$f(x) = \begin{cases} x^3 & \text{if } x \in (-\infty, 0), \\ 0 & \text{if } x \in [0, 1], \\ (x - 1)^2 & \text{if } x \in (1, \infty), \end{cases} \qquad\text{and}\qquad h(x) = -x;$$
see Figure 3.7.

Exercise 3.36. Show that for every point $x \in [0, 1]$, there exists $\lambda \ge 0$ such that the pair $(x, \lambda)$ satisfies (KT-1) and (KT-2) (see p. 44).

Furthermore, Slater's condition holds. However, it is clear that no point $x \in [0, 1]$ can be a solution to the problem (why?).

Figure 3.7: Quasi-concavity and optimization.

REFERENCES

[1] Apostol, Tom M. (1974) Mathematical Analysis, Pearson Education, 2nd edition. [49]

[2] Axler, Sheldon (1997) Linear Algebra Done Right, Undergraduate Texts in Mathematics, New York: Springer-Verlag, 2nd edition. [11, 55]

[3] Berkovitz, Leonard D. (2002) Convexity and Optimization in R^n, Pure and Applied Mathematics: A Wiley-Interscience Series of Texts, Monographs and Tracts, New York: Wiley-Interscience. [55]

[4] Bertsekas, Dimitri P. (2009) Convex Optimization Theory, Belmont, Massachusetts: Athena Scientific. [55]

[5] Bertsekas, Dimitri P., Angelia Nedić, and Asuman E. Ozdaglar (2003) Convex Analysis and Optimization, Belmont, Massachusetts: Athena Scientific. [55]

[6] Crossley, Martin D. (2005) Essential Topology, Springer Undergraduate Mathematics Series, London: Springer-Verlag. [7]

[7] Duggan, John (2010) "Notes on Optimization and Pareto Optimality," URL: http://www.johnduggan.net/lecture, Rochester University. [31, 48]

[8] Hiriart-Urruty, Jean-Baptiste and Claude Lemaréchal (2001) Fundamentals of Convex Analysis, Grundlehren Text Editions, Berlin: Springer-Verlag. [55]

[9] Jehle, Geoffrey A. and Philip J. Reny (2011) Advanced Microeconomic Theory, Essex, England: Prentice Hall, 3rd edition. [31, 41]

[10] Lay, Steven R. (1982) Convex Sets and Their Applications, New York: John Wiley & Sons. [55]

[11] Lee, Jeffrey M. (2009) Manifolds and Differential Geometry, Vol. 107 of Graduate Studies in Mathematics, Providence, Rhode Island: American Mathematical Society. [22]

[12] Luenberger, David G. (1969) Optimization by Vector Space Methods, Wiley Professional Paperback Series, New York: John Wiley & Sons. [31, 55, 58]

[13] Luenberger, David G. and Yinyu Ye (2008) Linear and Nonlinear Programming, International Series in Operations Research and Management Science, New York: Springer, 3rd edition. [31]

[14] Mas-Colell, Andreu, Michael D. Whinston, and Jerry R. Green (1995) Microeconomic Theory, New York: Oxford University Press. [31, 68]

[15] Milgrom, Paul (2004) Putting Auction Theory to Work, Cambridge, U.K.: Cambridge University Press.
[53]

[16] Milgrom, Paul and Ilya Segal (2002) "Envelope Theorems for Arbitrary Choice Sets," Econometrica, 70 (2): 583–601. [53]

[17] Munkres, James R. (1991) Analysis on Manifolds, Boulder, Colorado: Westview Press. [1, 14, 17]

[18] Ok, Efe A. (2007) Real Analysis with Economic Applications, New Jersey: Princeton University Press. [55]

[19] O'Neill, Barrett (2006) Elementary Differential Geometry, New York: Academic Press, revised 2nd edition. [22]

[20] Pugh, Charles Chapman (2002) Real Mathematical Analysis, Undergraduate Texts in Mathematics, New York: Springer-Verlag. [17]

[21] Rockafellar, R. Tyrrell (1970) Convex Analysis, Princeton Mathematical Series, New Jersey: Princeton University Press, 2nd edition. [55]

[22] Royden, Halsey and Patrick Fitzpatrick (2010) Real Analysis, New Jersey: Prentice Hall, 4th edition. [55]

[23] Rudin, Walter (1976) Principles of Mathematical Analysis, New York: McGraw-Hill Companies, Inc., 3rd edition. [14, 17, 27]

[24] Spivak, Michael (1965) Calculus on Manifolds: A Modern Approach to Classical Theorems of Advanced Calculus, Boulder, Colorado: Westview Press. [1, 14, 17]

[25] Spivak, Michael (1999) A Comprehensive Introduction to Differential Geometry, Vol. Two, Houston, Texas: Publish or Perish, Inc. [22]

[26] Sundaram, Rangarajan K. (1996) A First Course in Optimization Theory, Cambridge, Massachusetts: Cambridge University Press. [31, 34, 40, 41, 44, 72]

[27] Vohra, Rakesh V. (2005) Advanced Mathematical Economics, London: Routledge. [31]

[28] Willard, Stephen (2004) General Topology, New York: Dover Publications, Inc. [4]

[29] Zorich, Vladimir A. (2004) Mathematical Analysis II, Universitext, Berlin: Springer-Verlag. [49]

INDEX

$C^1$, 25; $C^\infty$, 27; $C^r$, 27
Active inequality constraint, 42; Affine manifold, 58
Chain Rule, 14; Cobb-Douglas function, 28; Component function, 5; Concave function, 65; Continuously differentiable, 25; Convex combination, 56; Convex function, 65; Convex hull, 57; Convex set, 55; Critical point, 34; Curve, 22
Directional derivative, 9
Envelope, 49; Epigraph, 67; Euclidean n-space, 2
Farkas' Lemma, 63; First-order Taylor formula, 10
Gradient, 19; Graph, 5
Half-space, 58; Helix, 22; Hessian, 27; Hyperplane, 57
Implicit function, 16; Implicit function theorem, 17; Inactive constraint, 42; Inequality constrained optimization, 42; Inferior set, 67; Inner product, 3; Interior, 4
Jacobian matrix, 16, 24; Jensen's inequality, 68
Kuhn-Tucker multipliers, 44; Kuhn-Tucker Theorem, 44
Lagrange multipliers, 38, 39; Lagrange's Method, 39; Lagrange's Theorem, 37; Lagrangian, 39; Limit, 5; Local maximum, 33
Maximum, 33
Negative definite, 28; Negative semidefinite, 28; Norm, 2
Orthogonal complement, 37
Partial derivative, 13; Positive definite, 28; Positive semidefinite, 28; Proper Separation Theorem, 61
Quadratic form, 27; Quasi-concave function, 66; Quasi-convex function, 66
Regular point, 23, 43
Second-order partial derivative, 27; Separation theorem, 58, 61; Sequence, 5; Shadow price, 39; Slater's condition, 72; Standard basis, 13; Strict local maximum, 33; Strictly concave function, 65; Strictly convex function, 65; Strictly quasi-concave function, 66; Strictly quasi-convex function, 66; Strong Separation Theorem, 62; Subgraph, 67; Superior set, 67
Tangent plane, 23
Unconstrained optimization, 33
Value function, 49, 51