Jim Lambers
MAT 280
Spring Semester 2009-10
Lecture 9 Notes

These notes correspond to Sections 11.7-11.8 in Stewart and Sections 3.2-3.4 in Marsden and Tromba.

Maximum and Minimum Values, cont'd

Previously, we learned that when seeking a local minimum or maximum of a function of several variables, the Second Derivative Test from single-variable calculus, in which the sign of the second derivative indicates whether a local extremum is a maximum or minimum, generalizes to the Second Derivatives Test, which states that a local extremum $\mathbf{x}_0$ is a minimum if the Hessian, the matrix of second partial derivatives, is positive definite at $\mathbf{x}_0$. We will now use Taylor series to explain why this test is effective.

Recall that in single-variable calculus, Taylor's Theorem states that a function $f(x)$ with at least three continuous derivatives at $x_0$ can be written as
\[
f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{1}{2} f''(x_0)(x - x_0)^2 + \frac{1}{6} f'''(\xi)(x - x_0)^3,
\]
where $\xi$ is between $x$ and $x_0$. In the multivariable case, Taylor's Theorem states that if $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$ has continuous third partial derivatives at $\mathbf{x}_0 \in D$, then
\[
f(\mathbf{x}) = f(\mathbf{x}_0) + \nabla f(\mathbf{x}_0) \cdot (\mathbf{x} - \mathbf{x}_0) + \frac{1}{2}(\mathbf{x} - \mathbf{x}_0) \cdot H_f(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0) + R_2(\mathbf{x}_0, \mathbf{x}),
\]
where $H_f(\mathbf{x}_0)$ is the Hessian, the matrix of second partial derivatives at $\mathbf{x}_0$, defined by
\[
H_f(\mathbf{x}_0) = \begin{bmatrix}
\frac{\partial^2 f}{\partial x_1^2}(\mathbf{x}_0) & \frac{\partial^2 f}{\partial x_1 \partial x_2}(\mathbf{x}_0) & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n}(\mathbf{x}_0) \\
\frac{\partial^2 f}{\partial x_2 \partial x_1}(\mathbf{x}_0) & \frac{\partial^2 f}{\partial x_2^2}(\mathbf{x}_0) & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n}(\mathbf{x}_0) \\
\vdots & & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1}(\mathbf{x}_0) & \frac{\partial^2 f}{\partial x_n \partial x_2}(\mathbf{x}_0) & \cdots & \frac{\partial^2 f}{\partial x_n^2}(\mathbf{x}_0)
\end{bmatrix},
\]
and $R_2(\mathbf{x}_0, \mathbf{x})$ is the Taylor remainder, which satisfies
\[
\lim_{\mathbf{x} \to \mathbf{x}_0} \frac{R_2(\mathbf{x}_0, \mathbf{x})}{\|\mathbf{x} - \mathbf{x}_0\|^2} = 0.
\]
If we let $\mathbf{x}_0 = (x_1^{(0)}, x_2^{(0)}, \ldots, x_n^{(0)})$, then Taylor's Theorem can be rewritten using summations:
\[
f(\mathbf{x}) = f(\mathbf{x}_0) + \sum_{i=1}^n \frac{\partial f}{\partial x_i}(\mathbf{x}_0)(x_i - x_i^{(0)}) + \frac{1}{2} \sum_{i,j=1}^n \frac{\partial^2 f}{\partial x_i \partial x_j}(\mathbf{x}_0)(x_i - x_i^{(0)})(x_j - x_j^{(0)}) + R_2(\mathbf{x}_0, \mathbf{x}).
\]

Example. Let $f(x,y) = x^2 y^3 + x y^4$, and let $(x_0, y_0) = (1, -2)$.
Then, from partial differentiation of $f$, we obtain its gradient
\[
\nabla f = \begin{bmatrix} f_x & f_y \end{bmatrix} = \begin{bmatrix} 2xy^3 + y^4 & 3x^2 y^2 + 4xy^3 \end{bmatrix},
\]
and its Hessian,
\[
H_f(x,y) = \begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{bmatrix} = \begin{bmatrix} 2y^3 & 6xy^2 + 4y^3 \\ 6xy^2 + 4y^3 & 6x^2 y + 12xy^2 \end{bmatrix}.
\]
Therefore
\[
\nabla f(1,-2) = \begin{bmatrix} 0 & -20 \end{bmatrix}, \quad H_f(1,-2) = \begin{bmatrix} -16 & -8 \\ -8 & 36 \end{bmatrix},
\]
and the Taylor expansion of $f$ around $(1,-2)$ is
\begin{align*}
f(x,y) &= f(x_0,y_0) + \nabla f(x_0,y_0) \cdot \langle x - x_0, y - y_0 \rangle \\
&\quad + \frac{1}{2} \langle x - x_0, y - y_0 \rangle \cdot H_f(x_0,y_0)\langle x - x_0, y - y_0 \rangle + R_2((x_0,y_0),(x,y)) \\
&= 8 + \begin{bmatrix} 0 & -20 \end{bmatrix} \begin{bmatrix} x-1 \\ y+2 \end{bmatrix} + \frac{1}{2}\langle x-1, y+2 \rangle \cdot \begin{bmatrix} -16 & -8 \\ -8 & 36 \end{bmatrix} \begin{bmatrix} x-1 \\ y+2 \end{bmatrix} + R_2((1,-2),(x,y)) \\
&= 8 - 20(y+2) - 8(x-1)^2 - 8(x-1)(y+2) + 18(y+2)^2 + R_2((1,-2),(x,y)).
\end{align*}
The first three terms represent an approximation of $f(x,y)$ by a quadratic function that is valid near the point $(1,-2)$. $\Box$

Now, suppose that $\mathbf{x}_0$ is a critical point of $f$. If this point is to be a local minimum, then we must have $f(\mathbf{x}) \geq f(\mathbf{x}_0)$ for $\mathbf{x}$ near $\mathbf{x}_0$. Since $\nabla f(\mathbf{x}_0) = \mathbf{0}$, it follows that we must have
\[
(\mathbf{x} - \mathbf{x}_0) \cdot [H_f(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0)] \geq 0.
\]
However, if the Hessian $H_f(\mathbf{x}_0)$ is a positive definite matrix, then, by definition, this expression is strictly greater than zero for $\mathbf{x} \neq \mathbf{x}_0$. Therefore, we are assured that $\mathbf{x}_0$ is a local minimum. In fact, $\mathbf{x}_0$ is a strict local minimum, since we can conclude that $f(\mathbf{x}) > f(\mathbf{x}_0)$ for all $\mathbf{x}$ sufficiently near $\mathbf{x}_0$.

As discussed previously, there are various properties possessed by symmetric positive definite matrices. One, which provides a relatively straightforward method of checking whether a matrix is positive definite, is that a symmetric matrix is positive definite if and only if the determinants of its leading principal minors are all positive. Given an $n \times n$ matrix $A$, its leading principal minors are the submatrices consisting of its first $k$ rows and columns, for $k = 1, 2, \ldots, n$. Note that checking these determinants is equivalent to the test that we have previously described for determining whether a $2 \times 2$ matrix is positive definite.

Example. Let $f(x,y,z) = x^2 + y^2 + z^2 + xy$.
To find any local maxima or minima of this function, we compute its gradient, which is
\[
\nabla f(x,y,z) = \begin{bmatrix} 2x + y & 2y + x & 2z \end{bmatrix}.
\]
It follows that the only critical point is at $(x_0, y_0, z_0) = (0,0,0)$. To perform the Second Derivatives Test, we compute the Hessian of $f$, which is
\[
H_f(x,y,z) = \begin{bmatrix} f_{xx} & f_{xy} & f_{xz} \\ f_{yx} & f_{yy} & f_{yz} \\ f_{zx} & f_{zy} & f_{zz} \end{bmatrix} = \begin{bmatrix} 2 & 1 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}.
\]
To determine whether this matrix is positive definite, we can compute the determinants of the leading principal minors of $H_f(0,0,0)$, which are
\[
[H_f(0,0,0)]_{11} = 2, \quad [H_f(0,0,0)]_{1:2,1:2} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}, \quad [H_f(0,0,0)]_{1:3,1:3} = \begin{bmatrix} 2 & 1 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}.
\]
We have
\begin{align*}
\det([H_f(0,0,0)]_{11}) &= 2, \\
\det([H_f(0,0,0)]_{1:2,1:2}) &= 2(2) - 1(1) = 3, \\
\det([H_f(0,0,0)]_{1:3,1:3}) &= 2\det([H_f(0,0,0)]_{1:2,1:2}) = 6.
\end{align*}
Since all of these determinants are positive, we conclude that $H_f(0,0,0)$ is positive definite, and therefore the critical point is a minimum of $f$. $\Box$

Constrained Optimization

Now, we consider the problem of finding the maximum or minimum value of a function $f(\mathbf{x})$, except that the independent variables $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ are subject to one or more constraints. These constraints prevent us from using the standard approach for finding extrema, but the ideas behind the standard approach are still useful for developing an approach to the constrained problem.

We assume that the constraints are equations of the form
\[
g_i(\mathbf{x}) = 0, \quad i = 1, 2, \ldots, m,
\]
for given functions $g_i(\mathbf{x})$. That is, we may only consider $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ that belong to the intersection of the hypersurfaces (surfaces, when $n = 3$, or curves, when $n = 2$) defined by the $g_i$, when computing a maximum or minimum value of $f$. For conciseness, we rewrite these constraints as a vector equation $\mathbf{g}(\mathbf{x}) = \mathbf{0}$, where $\mathbf{g} : \mathbb{R}^n \to \mathbb{R}^m$ is a vector-valued function with component functions $g_i$, for $i = 1, 2, \ldots, m$.
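The leading-principal-minor test used in the example above can be sketched in a few lines of code. This is only an illustration, not a library routine: `det`, `leading_minor_dets`, and `is_positive_definite` are hypothetical helper names, and the determinant is computed by cofactor expansion for simplicity (fine for the small matrices considered here).

```python
def det(M):
    """Determinant by cofactor expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j+1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def leading_minor_dets(A):
    """Determinants of the k x k leading principal minors, k = 1..n."""
    n = len(A)
    return [det([row[:k] for row in A[:k]]) for k in range(1, n + 1)]

def is_positive_definite(A):
    """Sylvester's criterion: a symmetric matrix is positive definite
    iff all of its leading principal minor determinants are positive."""
    return all(d > 0 for d in leading_minor_dets(A))

# Hessian of f(x, y, z) = x^2 + y^2 + z^2 + xy at (0, 0, 0)
H = [[2, 1, 0],
     [1, 2, 0],
     [0, 0, 2]]
print(leading_minor_dets(H))    # [2, 3, 6]
print(is_positive_definite(H))  # True
```

The determinants $2, 3, 6$ match the hand computation in the example.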
By Taylor's theorem, we have, for $\mathbf{x}_0 \in \mathbb{R}^n$ at which $\mathbf{g}$ is differentiable,
\[
\mathbf{g}(\mathbf{x}) = \mathbf{g}(\mathbf{x}_0) + J_{\mathbf{g}}(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0) + R_1(\mathbf{x}_0, \mathbf{x}),
\]
where $J_{\mathbf{g}}(\mathbf{x}_0)$ is the Jacobian matrix of $\mathbf{g}$ at $\mathbf{x}_0$, consisting of the first partial derivatives of the $g_i$ evaluated at $\mathbf{x}_0$, and $R_1(\mathbf{x}_0, \mathbf{x})$ is the Taylor remainder, which satisfies
\[
\lim_{\mathbf{x} \to \mathbf{x}_0} \frac{R_1(\mathbf{x}_0, \mathbf{x})}{\|\mathbf{x} - \mathbf{x}_0\|} = 0.
\]
It follows that if $\mathbf{u}$ is a vector belonging to all of the tangent spaces of the hypersurfaces defined by the $g_i$, then, because each $g_i$ must remain constant as $\mathbf{x}$ deviates from $\mathbf{x}_0$ in the direction of $\mathbf{u}$, we must have $J_{\mathbf{g}}(\mathbf{x}_0)\mathbf{u} = \mathbf{0}$. In other words, $\nabla g_i(\mathbf{x}_0) \cdot \mathbf{u} = 0$ for $i = 1, 2, \ldots, m$.

Now, suppose that $\mathbf{x}_0$ is a local minimum of $f(\mathbf{x})$, subject to the constraints $\mathbf{g}(\mathbf{x}) = \mathbf{0}$. Then, $\mathbf{x}_0$ is not necessarily a critical point of $f$, but $f$ must not change, to first order, along any direction from $\mathbf{x}_0$ that satisfies the constraints. Therefore, we must have $\nabla f(\mathbf{x}_0) \cdot \mathbf{u} = 0$ for any vector $\mathbf{u}$ in the intersection of the tangent spaces, at $\mathbf{x}_0$, of the hypersurfaces defined by the constraints. In particular, if there exist constants $\lambda_1, \lambda_2, \ldots, \lambda_m$ such that
\[
\nabla f(\mathbf{x}_0) = \lambda_1 \nabla g_1(\mathbf{x}_0) + \lambda_2 \nabla g_2(\mathbf{x}_0) + \cdots + \lambda_m \nabla g_m(\mathbf{x}_0),
\]
then the requirement $\nabla f(\mathbf{x}_0) \cdot \mathbf{u} = 0$ follows directly from the fact that $\nabla g_i(\mathbf{x}_0) \cdot \mathbf{u} = 0$ for each $i$, and therefore $\mathbf{x}_0$ is a constrained critical point of $f$. The constants $\lambda_1, \lambda_2, \ldots, \lambda_m$ are called Lagrange multipliers.

Example. When $m = 1$, that is, when there is only one constraint, the problem of finding a constrained minimum or maximum reduces to finding a point $\mathbf{x}_0$ in the domain of $f$ such that
\[
\nabla f(\mathbf{x}_0) = \lambda \nabla g(\mathbf{x}_0),
\]
for a single Lagrange multiplier $\lambda$.

Let $f(x,y) = 4x^2 + 9y^2$. The minimum value of this function is 0, which is attained at $x = y = 0$, but we wish to find the minimum of $f(x,y)$ subject to the constraint $x + y = 8$. That is, we must have $g(x,y) = 0$, where $g(x,y) = x + y - 8$.

To find any points that are candidates for the constrained minimum, we compute the gradients of $f$ and $g$, which are
\[
\nabla f = \begin{bmatrix} 8x & 18y \end{bmatrix}, \quad \nabla g = \begin{bmatrix} 1 & 1 \end{bmatrix}.
\]
In order for the equation $\nabla f(x,y) = \lambda \nabla g(x,y)$ to be satisfied, we must have, for some choice of $\lambda$, $x$ and $y$,
\[
8x = \lambda, \quad 18y = \lambda.
\]
It follows from these equations that $8x = 18y$, or $y = \frac{4}{9}x$. However, we must also have $x + y - 8 = 0$, which yields
\[
x + \frac{4}{9}x = \frac{13}{9}x = 8,
\]
so $x = \frac{72}{13}$ and $y = \frac{32}{13}$. Therefore, the only critical point of the constrained optimization problem is
\[
(x_1, y_1) = \left( \frac{72}{13}, \frac{32}{13} \right).
\]
Evaluating $f(x,y)$ at this point yields
\[
f(x_1, y_1) = 4\left(\frac{72}{13}\right)^2 + 9\left(\frac{32}{13}\right)^2 = \frac{20736 + 9216}{169} = \frac{29952}{169} \approx 177.23.
\]
Because $f(x,y) \to \infty$ as $(x,y)$ moves away from this point along the line $x + y = 8$ in either direction, this critical point must be a minimum; $f$ has no maximum on this line. Therefore, the minimum value of $f(x,y)$, subject to $x + y = 8$, is $\frac{29952}{169} \approx 177.23$, achieved at $x = \frac{72}{13}$, $y = \frac{32}{13}$. $\Box$

Example. Let $f(x,y,z) = x + y + z$. We wish to find the extrema of this function subject to the constraints $x^2 + y^2 = 1$ and $2x + z = 1$. That is, we must have $g_1(x,y,z) = g_2(x,y,z) = 0$, where $g_1(x,y,z) = x^2 + y^2 - 1$ and $g_2(x,y,z) = 2x + z - 1$. We must find $\lambda_1$ and $\lambda_2$ such that $\nabla f = \lambda_1 \nabla g_1 + \lambda_2 \nabla g_2$, or
\[
\begin{bmatrix} 1 & 1 & 1 \end{bmatrix} = \lambda_1 \begin{bmatrix} 2x & 2y & 0 \end{bmatrix} + \lambda_2 \begin{bmatrix} 2 & 0 & 1 \end{bmatrix}.
\]
This equation, together with the constraints, yields the system of equations
\begin{align*}
1 &= 2x\lambda_1 + 2\lambda_2 \\
1 &= 2y\lambda_1 \\
1 &= \lambda_2 \\
1 &= x^2 + y^2 \\
1 &= 2x + z.
\end{align*}
From the third equation, $\lambda_2 = 1$, which, by the first equation, yields $2x\lambda_1 = -1$. It follows from the second equation that $x = -y$. This, in conjunction with the fourth equation, yields $(x,y) = (1/\sqrt{2}, -1/\sqrt{2})$ or $(x,y) = (-1/\sqrt{2}, 1/\sqrt{2})$. From the fifth equation, we obtain the two critical points
\[
(x_1, y_1, z_1) = \left( \frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}, 1 - \sqrt{2} \right), \quad (x_2, y_2, z_2) = \left( -\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}, 1 + \sqrt{2} \right).
\]
Substituting these points into $f$ yields $f(x_1, y_1, z_1) = 1 - \sqrt{2}$ and $f(x_2, y_2, z_2) = 1 + \sqrt{2}$, so we conclude that $(x_1, y_1, z_1)$ is a local minimum of $f$ and $(x_2, y_2, z_2)$ is a local maximum of $f$, subject to the constraints $g_1(x,y,z) = g_2(x,y,z) = 0$. $\Box$

The method of Lagrange multipliers can be used in conjunction with the method of finding unconstrained local maxima and minima in order to find the absolute maximum and minimum of a function on a compact (closed and bounded) set. The basic idea is as follows:

∙ Find the (unconstrained) critical points of the function, and exclude those that do not belong to the interior of the set.

∙ Use the method of Lagrange multipliers to find the constrained critical points that lie on the boundary of the set, using equations that characterize the boundary points as constraints. Also, include corners of the boundary, as they represent critical points due to the function, restricted to the boundary, not being differentiable there.

∙ Evaluate the function at all of the constrained and unconstrained critical points. The largest value is the absolute maximum value on the set, and the smallest value is the absolute minimum value on the set.

From a linear algebra point of view, $\nabla f(\mathbf{x}_0)$ must be orthogonal to any vector $\mathbf{u}$ in the null space of $J_{\mathbf{g}}(\mathbf{x}_0)$ (that is, the set consisting of all vectors $\mathbf{v}$ such that $J_{\mathbf{g}}(\mathbf{x}_0)\mathbf{v} = \mathbf{0}$), and therefore it must lie in the range of $J_{\mathbf{g}}(\mathbf{x}_0)^T$, the transpose of $J_{\mathbf{g}}(\mathbf{x}_0)$. That is, $\nabla f(\mathbf{x}_0) = J_{\mathbf{g}}(\mathbf{x}_0)^T \mathbf{u}$ for some vector $\mathbf{u}$, meaning that $\nabla f(\mathbf{x}_0)$ must be a linear combination of the rows of $J_{\mathbf{g}}(\mathbf{x}_0)$ (the columns of $J_{\mathbf{g}}(\mathbf{x}_0)^T$), which are the gradients of the component functions of $\mathbf{g}$ at $\mathbf{x}_0$.

Another way to view the method of Lagrange multipliers is as a modified unconstrained optimization problem.
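Before moving on, the two-constraint example above can be checked numerically. This is only a sanity-check sketch: it confirms that each critical point satisfies both constraints and the Lagrange condition $\nabla f = \lambda_1 \nabla g_1 + \lambda_2 \nabla g_2$, using the multipliers $\lambda_2 = 1$ and $\lambda_1 = 1/(2y)$ derived in the example.

```python
import math

def grad_f(x, y, z):   # f(x, y, z) = x + y + z
    return (1.0, 1.0, 1.0)

def grad_g1(x, y, z):  # g1(x, y, z) = x^2 + y^2 - 1
    return (2 * x, 2 * y, 0.0)

def grad_g2(x, y, z):  # g2(x, y, z) = 2x + z - 1
    return (2.0, 0.0, 1.0)

r = 1 / math.sqrt(2)
points = [(r, -r, 1 - math.sqrt(2)), (-r, r, 1 + math.sqrt(2))]

for (x, y, z) in points:
    # Both constraints are satisfied (up to rounding)...
    assert abs(x**2 + y**2 - 1) < 1e-12
    assert abs(2 * x + z - 1) < 1e-12
    # ...and grad f is the stated combination of grad g1 and grad g2.
    lam1, lam2 = 1 / (2 * y), 1.0
    for df, dg1, dg2 in zip(grad_f(x, y, z), grad_g1(x, y, z), grad_g2(x, y, z)):
        assert abs(df - (lam1 * dg1 + lam2 * dg2)) < 1e-12

print([x + y + z for (x, y, z) in points])  # approximately 1 - sqrt(2) and 1 + sqrt(2)
```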
If we define the function $h(\mathbf{x}, \boldsymbol{\lambda})$ by
\[
h(\mathbf{x}, \boldsymbol{\lambda}) = f(\mathbf{x}) - \boldsymbol{\lambda} \cdot \mathbf{g}(\mathbf{x}) = f(\mathbf{x}) - \sum_{i=1}^m \lambda_i g_i(\mathbf{x}),
\]
then we can find constrained extrema of $f$ by finding unconstrained extrema of $h$, for
\[
\nabla h(\mathbf{x}, \boldsymbol{\lambda}) = \begin{bmatrix} \nabla f(\mathbf{x}) - \boldsymbol{\lambda} \cdot J_{\mathbf{g}}(\mathbf{x}) & -\mathbf{g}(\mathbf{x}) \end{bmatrix}.
\]
Because all components of the gradient must be equal to zero at a critical point (when the gradient exists), the constraints must be satisfied at a critical point of $h$, and $\nabla f$ must be a linear combination of the $\nabla g_i$, so $f$ is only changing along directions that violate the constraints. Therefore, a critical point is a candidate for a constrained maximum or minimum. By the Second Derivatives Test, we can then use the Hessian of $h$ to determine if any constrained extremum is a maximum or minimum.

Appendix: Linear Algebra Concepts

Matrix Multiplication

As we work with Jacobian matrices for vector-valued functions of several variables, matrix multiplication is a highly relevant operation in multivariable calculus. We have previously defined the product of an $m \times n$ matrix $A$ (that is, $A$ has $m$ rows and $n$ columns) and an $n \times p$ matrix $B$ as the $m \times p$ matrix $C = AB$, where the entry in row $i$ and column $j$ of $C$ is the dot product of row $i$ of $A$ and column $j$ of $B$. This can be written using sigma notation as
\[
c_{ij} = \sum_{k=1}^n a_{ik} b_{kj}, \quad i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, p.
\]
Note that the number of columns in $A$ must equal the number of rows in $B$, or the product $AB$ is undefined. Furthermore, even if $A$ and $B$ can be multiplied in either order (that is, if they are square matrices of the same size), $AB$ does not necessarily equal $BA$.

In the special case where the matrix $B$ is actually a column vector $\mathbf{x}$ with $n$ components (that is, $p = 1$), it is useful to be able to recognize the summation
\[
y_i = \sum_{j=1}^n a_{ij} x_j
\]
as the formula for the $i$th component of the vector $\mathbf{y} = A\mathbf{x}$.

Example. Let $A$ be a $3 \times 2$ matrix, and $B$ be a $2 \times 2$ matrix, whose entries are given by
\[
A = \begin{bmatrix} 1 & -2 \\ -3 & 4 \\ 5 & -6 \end{bmatrix}, \quad B = \begin{bmatrix} -7 & 8 \\ 9 & -10 \end{bmatrix}.
\]
Then, because the number of columns in $A$ is equal to the number of rows in $B$, the product $C = AB$ is defined, and equal to the $3 \times 2$ matrix
\[
C = \begin{bmatrix} 1(-7) + (-2)9 & 1(8) + (-2)(-10) \\ (-3)(-7) + 4(9) & (-3)(8) + 4(-10) \\ 5(-7) + (-6)9 & 5(8) + (-6)(-10) \end{bmatrix} = \begin{bmatrix} -25 & 28 \\ 57 & -64 \\ -89 & 100 \end{bmatrix}.
\]
Because the number of columns in $B$ is not the same as the number of rows in $A$, it does not make sense to compute the product $BA$. $\Box$

In multivariable calculus, matrix multiplication most commonly arises when applying the Chain Rule, because the Jacobian matrix of the composition $\mathbf{f} \circ \mathbf{g}$ at a point $\mathbf{x}_0$ in the domain of $\mathbf{g}$ is the product of the Jacobian matrix of $\mathbf{f}$, evaluated at $\mathbf{g}(\mathbf{x}_0)$, and the Jacobian matrix of $\mathbf{g}$ evaluated at $\mathbf{x}_0$. It follows that the Chain Rule only makes sense when composing functions $\mathbf{f}$ and $\mathbf{g}$ such that the number of dependent variables of $\mathbf{g}$ (that is, the number of rows in its Jacobian matrix) equals the number of independent variables of $\mathbf{f}$ (that is, the number of columns in its Jacobian matrix).

Matrix multiplication also arises in Taylor series expansions of multivariable functions, because if $f : D \subseteq \mathbb{R}^n \to \mathbb{R}$, then the Taylor expansion of $f$ around $\mathbf{x}_0 \in D$ involves the dot product of $\nabla f(\mathbf{x}_0)$ with the vector $\mathbf{x} - \mathbf{x}_0$, which is a multiplication of a $1 \times n$ matrix with an $n \times 1$ matrix to produce a scalar (by convention, the gradient is written as a row vector, while points are written as column vectors). Also, such an expansion involves the dot product of $\mathbf{x} - \mathbf{x}_0$ with the product of the Hessian matrix, the matrix of second partial derivatives at $\mathbf{x}_0$, and the vector $\mathbf{x} - \mathbf{x}_0$. Finally, if $\mathbf{g} : U \subseteq \mathbb{R}^n \to \mathbb{R}^m$ is a vector-valued function of $n$ variables, then the second term in its Taylor expansion around $\mathbf{x}_0 \in U$ is the product of the Jacobian matrix of $\mathbf{g}$ at $\mathbf{x}_0$ and the vector $\mathbf{x} - \mathbf{x}_0$.

Eigenvalues

Previously, it was mentioned that the eigenvalues of a matrix that is both symmetric and positive definite are positive.
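The product computed in the matrix multiplication example above is a direct transcription of the sigma-notation formula for $c_{ij}$. As a sketch (with `matmul` a hypothetical helper name, not a library routine):

```python
def matmul(A, B):
    """C = AB via c_ij = sum_k a_ik * b_kj, for A m x n and B n x p."""
    m, n, p = len(A), len(B), len(B[0])
    # The number of columns in A must equal the number of rows in B.
    assert all(len(row) == n for row in A), "inner dimensions must agree"
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]

A = [[1, -2],
     [-3, 4],
     [5, -6]]
B = [[-7, 8],
     [9, -10]]

print(matmul(A, B))  # [[-25, 28], [57, -64], [-89, 100]]
```

The output matches the $3 \times 2$ matrix $C$ computed by hand; attempting `matmul(B, A)` fails the dimension check, mirroring the remark that $BA$ is undefined here.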
A scalar $\lambda$, which can be real or complex, is an eigenvalue of an $n \times n$ matrix $A$ (that is, $A$ has $n$ rows and $n$ columns) if there exists a nonzero vector $\mathbf{x}$ such that
\[
A\mathbf{x} = \lambda \mathbf{x}.
\]
That is, matrix-vector multiplication of $A$ and $\mathbf{x}$ reduces to a simple scaling of $\mathbf{x}$ by $\lambda$. The vector $\mathbf{x}$ is called an eigenvector of $A$ corresponding to $\lambda$.

The eigenvalues of $A$ are the roots of the characteristic polynomial $\det(A - \lambda I)$, which is a polynomial of degree $n$ in the variable $\lambda$. Therefore, an $n \times n$ matrix $A$ has $n$ eigenvalues, which may repeat. Although the eigenvalues of a matrix may be complex, even when the matrix is real, the eigenvalues of a real, symmetric matrix, such as the Hessian of any function with continuous second partial derivatives, are real.

For a general matrix $A$, $\det(A)$, the determinant of $A$, is the product of all of the eigenvalues of $A$. The trace of $A$, denoted by $\operatorname{tr}(A)$, which is defined to be the sum of the diagonal entries of $A$, is also the sum of the eigenvalues of $A$. It follows that when $A$ is a $2 \times 2$ symmetric matrix, the determinant and trace can be used to easily determine whether the eigenvalues of $A$ are both positive, both negative, or of opposite signs. This is the basis for the Second Derivatives Test for functions of two variables.

Example. Let $A$ be the symmetric $2 \times 2$ matrix defined by
\[
A = \begin{bmatrix} 4 & -6 \\ -6 & 10 \end{bmatrix}.
\]
Then
\[
\operatorname{tr}(A) = 4 + 10 = 14, \quad \det(A) = 4(10) - (-6)(-6) = 4.
\]
It follows that the product and the sum of $A$'s two eigenvalues are both positive. Because $A$ is symmetric, its eigenvalues are also real. Therefore, they must both be positive, and we can conclude that $A$ is positive definite.

To actually compute the eigenvalues, we can compute the characteristic polynomial, which is
\[
\det(A - \lambda I) = \det\left( \begin{bmatrix} 4 - \lambda & -6 \\ -6 & 10 - \lambda \end{bmatrix} \right) = (4 - \lambda)(10 - \lambda) - (-6)(-6) = \lambda^2 - 14\lambda + 4.
\]
Note that $\det(A - \lambda I) = \lambda^2 - \operatorname{tr}(A)\lambda + \det(A)$, which is true for $2 \times 2$ matrices in general.
To compute the eigenvalues, we use the quadratic formula to compute the roots of this polynomial, and obtain
\[
\lambda = \frac{14 \pm \sqrt{14^2 - 4(4)(1)}}{2(1)} = 7 \pm 3\sqrt{5} \approx 13.708, \; 0.292.
\]
If $A$ represented the Hessian of a function $f(x,y)$ at a point $(x_0, y_0)$, and $\nabla f(x_0, y_0) = \mathbf{0}$, then $f$ would have a local minimum at $(x_0, y_0)$. $\Box$

The Transpose, Inner Product and Null Space

The dot product of two vectors $\mathbf{u}$ and $\mathbf{v}$, denoted by $\mathbf{u} \cdot \mathbf{v}$, can also be written as $\mathbf{u}^T \mathbf{v}$, where $\mathbf{u}$ and $\mathbf{v}$ are both column vectors, and $\mathbf{u}^T$ is the transpose of $\mathbf{u}$, which converts $\mathbf{u}$ into a row vector. In general, the transpose of a matrix $A$ is the matrix $A^T$ whose entries are defined by $[A^T]_{ij} = [A]_{ji}$. That is, in the transpose, the sense of rows and columns is reversed. The dot product is also known as an inner product; the outer product of two column vectors $\mathbf{u}$ and $\mathbf{v}$ is $\mathbf{u}\mathbf{v}^T$, which is a matrix, whereas the inner product is a scalar.

Given an $m \times n$ matrix $A$, the null space of $A$ is the set $\mathcal{N}(A)$ of all $n$-vectors $\mathbf{x}$ such that $A\mathbf{x} = \mathbf{0}$. If $\mathbf{x}$ is such a vector, then for any $m$-vector $\mathbf{v}$,
\[
\mathbf{v}^T (A\mathbf{x}) = \mathbf{v}^T \mathbf{0} = 0.
\]
However, because of two properties of the transpose, $(A^T)^T = A$ and $(AB)^T = B^T A^T$, this inner product can be rewritten as
\[
\mathbf{v}^T A \mathbf{x} = \mathbf{v}^T (A^T)^T \mathbf{x} = (A^T \mathbf{v})^T \mathbf{x}.
\]
It follows that any vector in $\mathcal{N}(A)$ is orthogonal to any vector in the range of $A^T$, denoted by $\mathcal{R}(A^T)$, which is the set of all $n$-vectors of the form $A^T \mathbf{v}$, where $\mathbf{v}$ is an $m$-vector. This is the basis for the condition $\nabla f = J_{\mathbf{g}}^T \boldsymbol{\lambda}$ in the method of Lagrange multipliers when there are multiple constraints.

Example. Let
\[
A = \begin{bmatrix} 1 & -2 & 4 \\ 1 & 3 & -6 \\ 1 & -5 & 10 \end{bmatrix}.
\]
Then
\[
A^T = \begin{bmatrix} 1 & 1 & 1 \\ -2 & 3 & -5 \\ 4 & -6 & 10 \end{bmatrix}.
\]
The null space of $A$, $\mathcal{N}(A)$, consists of all vectors that are multiples of the vector
\[
\mathbf{v} = \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix},
\]
as it can be verified by matrix-vector multiplication that $A\mathbf{v} = \mathbf{0}$. Now, if we let $\mathbf{w}$ be any vector in $\mathbb{R}^3$, and we compute $\mathbf{u} = A^T \mathbf{w}$, then $\mathbf{v} \cdot \mathbf{u} = \mathbf{v}^T \mathbf{u} = 0$, because
\[
\mathbf{v}^T \mathbf{u} = \mathbf{v}^T A^T \mathbf{w} = (A\mathbf{v})^T \mathbf{w} = \mathbf{0}^T \mathbf{w} = 0.
\]
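This orthogonality can also be checked numerically for the example matrix. The sketch below uses small hypothetical helpers (`matvec`, `transpose`, `dot`) rather than a library, and the vectors $\mathbf{w}$ are arbitrary choices:

```python
def matvec(A, x):
    """Matrix-vector product: (A x)_i = sum_j a_ij * x_j."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def transpose(A):
    """[A^T]_ij = [A]_ji."""
    return [list(col) for col in zip(*A)]

def dot(u, v):
    """Inner product u . v = u^T v."""
    return sum(ui * vi for ui, vi in zip(u, v))

A = [[1, -2, 4],
     [1, 3, -6],
     [1, -5, 10]]
v = [0, 2, 1]  # spans the null space of A

assert matvec(A, v) == [0, 0, 0]  # v is in N(A)

# v is orthogonal to A^T w for arbitrary w, i.e. to the range R(A^T)
for w in ([1, 0, 0], [3, -1, 4], [2, 2, 2]):
    u = matvec(transpose(A), w)
    assert dot(v, u) == 0
```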
For example, it can be confirmed directly that $\mathbf{v}$ is orthogonal to any of the columns of $A^T$. $\Box$

Practice Problems

Practice problems from the recommended textbooks are:

∙ Stewart: Section 11.8, Exercises 1-29 odd

∙ Marsden/Tromba: Section 3.4, Exercises 1-11 odd