Jim Lambers
MAT 280
Spring Semester 2009-10
Lecture 9 Notes
These notes correspond to Sections 11.7-11.8 in Stewart and Sections 3.2-3.4 in Marsden and
Tromba.
Maximum and Minimum Values, cont’d
Previously, we learned that when seeking a local minimum or maximum of a function of several variables,
the Second Derivative Test from single-variable calculus, in which the sign of the second derivative
indicates whether a local extremum is a maximum or a minimum, generalizes to the Second Derivatives
Test, which indicates that a local extremum x0 is a minimum if the Hessian, the matrix of second
partial derivatives, is positive definite at x0 .
We will now use Taylor series to explain why this test is effective. Recall that in single-variable
calculus, Taylor’s Theorem states that a function 𝑓 (𝑥) with at least three continuous derivatives
at 𝑥0 can be written as
f(x) = f(x_0) + f'(x_0)(x - x_0) + \frac{1}{2} f''(x_0)(x - x_0)^2 + \frac{1}{6} f'''(\xi)(x - x_0)^3,
where 𝜉 is between 𝑥 and 𝑥0 . In the multivariable case, Taylor’s Theorem states that if 𝑓 : 𝐷 ⊆
ℝ𝑛 → ℝ has continuous third partial derivatives at x0 ∈ 𝐷, then
f(\mathbf{x}) = f(\mathbf{x}_0) + \nabla f(\mathbf{x}_0) \cdot (\mathbf{x} - \mathbf{x}_0) + \frac{1}{2} (\mathbf{x} - \mathbf{x}_0) \cdot H_f(\mathbf{x}_0)(\mathbf{x} - \mathbf{x}_0) + R_2(\mathbf{x}_0, \mathbf{x}),
where H_f(x_0) is the Hessian, the matrix of second partial derivatives at x_0, defined by

H_f(\mathbf{x}_0) = \begin{bmatrix}
\frac{\partial^2 f}{\partial x_1^2}(\mathbf{x}_0) & \frac{\partial^2 f}{\partial x_1 \partial x_2}(\mathbf{x}_0) & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n}(\mathbf{x}_0) \\
\frac{\partial^2 f}{\partial x_2 \partial x_1}(\mathbf{x}_0) & \frac{\partial^2 f}{\partial x_2^2}(\mathbf{x}_0) & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n}(\mathbf{x}_0) \\
\vdots & & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_n \partial x_1}(\mathbf{x}_0) & \frac{\partial^2 f}{\partial x_n \partial x_2}(\mathbf{x}_0) & \cdots & \frac{\partial^2 f}{\partial x_n^2}(\mathbf{x}_0)
\end{bmatrix},
and R_2(x_0, x) is the Taylor remainder, which satisfies

\lim_{\mathbf{x} \to \mathbf{x}_0} \frac{R_2(\mathbf{x}_0, \mathbf{x})}{\|\mathbf{x} - \mathbf{x}_0\|^2} = 0.

If we let \mathbf{x}_0 = (x_1^{(0)}, x_2^{(0)}, \ldots, x_n^{(0)}), then Taylor's Theorem can be rewritten using summations:
f(\mathbf{x}) = f(\mathbf{x}_0) + \sum_{i=1}^n \frac{\partial f}{\partial x_i}(\mathbf{x}_0)(x_i - x_i^{(0)}) + \frac{1}{2} \sum_{i,j=1}^n \frac{\partial^2 f}{\partial x_i \partial x_j}(\mathbf{x}_0)(x_i - x_i^{(0)})(x_j - x_j^{(0)}) + R_2(\mathbf{x}_0, \mathbf{x}).
Example Let f(x, y) = x^2 y^3 + xy^4, and let (x_0, y_0) = (1, -2). Then, from partial differentiation
of 𝑓 , we obtain its gradient
\nabla f = \begin{bmatrix} f_x & f_y \end{bmatrix} = \begin{bmatrix} 2xy^3 + y^4 & 3x^2 y^2 + 4xy^3 \end{bmatrix},
and its Hessian,
H_f(x, y) = \begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{bmatrix} = \begin{bmatrix} 2y^3 & 6xy^2 + 4y^3 \\ 6xy^2 + 4y^3 & 6x^2 y + 12xy^2 \end{bmatrix}.
Therefore
\nabla f(1, -2) = \begin{bmatrix} 0 & -20 \end{bmatrix}, \qquad H_f(1, -2) = \begin{bmatrix} -16 & -8 \\ -8 & 36 \end{bmatrix},
and the Taylor expansion of 𝑓 around (1, −2) is
f(x, y) = f(x_0, y_0) + \nabla f(x_0, y_0) \cdot \langle x - x_0, y - y_0 \rangle + \frac{1}{2} \langle x - x_0, y - y_0 \rangle \cdot H_f(x_0, y_0) \langle x - x_0, y - y_0 \rangle + R_2((x_0, y_0), (x, y))
= 8 + \begin{bmatrix} 0 & -20 \end{bmatrix} \begin{bmatrix} x - 1 \\ y + 2 \end{bmatrix} + \frac{1}{2} \langle x - 1, y + 2 \rangle \cdot \begin{bmatrix} -16 & -8 \\ -8 & 36 \end{bmatrix} \begin{bmatrix} x - 1 \\ y + 2 \end{bmatrix} + R_2((1, -2), (x, y))
= 8 - 20(y + 2) - 8(x - 1)^2 - 8(x - 1)(y + 2) + 18(y + 2)^2 + R_2((1, -2), (x, y)).
The first three terms represent an approximation of 𝑓 (𝑥, 𝑦) by a quadratic function that is valid
near the point (1, −2). □
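As a quick numerical check of this approximation, the following Python sketch (using only the standard library) compares f with the quadratic polynomial just derived at points approaching (1, -2); the coefficients are those of the expansion above.

# Compare f(x, y) = x^2*y^3 + x*y^4 with its quadratic Taylor
# approximation about (1, -2), using the expansion derived above.
def f(x, y):
    return x**2 * y**3 + x * y**4

def quad_approx(x, y):
    h, k = x - 1.0, y + 2.0
    # 8 - 20(y+2) - 8(x-1)^2 - 8(x-1)(y+2) + 18(y+2)^2
    return 8.0 - 20.0*k - 8.0*h**2 - 8.0*h*k + 18.0*k**2

for step in [0.1, 0.05, 0.01]:
    x, y = 1.0 + step, -2.0 + step
    # The gap should shrink roughly like step**3, reflecting the cubic remainder term.
    print(step, f(x, y), quad_approx(x, y), abs(f(x, y) - quad_approx(x, y)))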
Now, suppose that x0 is a critical point of 𝑓 . If this point is to be a local minimum, then we
must have 𝑓 (x) ≥ 𝑓 (x0 ) for x near x0 . Since ∇𝑓 (x0 ) = 0, it follows that we must have
(x − x0 ) ⋅ [𝐻𝑓 (x0 )(x − x0 )] ≥ 0.
However, if the Hessian 𝐻𝑓 (x0 ) is a positive definite matrix, then, by definition, this expression is
strictly greater than zero whenever x ≠ x0 . Therefore, we are assured that x0 is a local minimum. In fact,
x0 is a strict local minimum, since we can conclude that 𝑓 (x) > 𝑓 (x0 ) for all x sufficiently near x0 .
As discussed previously, symmetric positive definite matrices possess various useful properties.
One other, which provides a relatively straightforward method of checking whether a matrix is
positive definite, is that the determinants of its leading principal submatrices (its leading principal
minors) are all positive. Given an n × n matrix A, its leading principal submatrices are the
submatrices consisting of its first k rows and columns, for k = 1, 2, . . . , n. Note that checking these
determinants is equivalent to the test that we have previously described for determining whether a
2 × 2 matrix is positive definite.
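As a small illustration of this determinant test (Sylvester's criterion for symmetric matrices), here is a minimal Python sketch, assuming NumPy is available:

import numpy as np

def is_positive_definite_by_minors(A):
    """Return True if every leading principal minor of the symmetric
    matrix A is positive (Sylvester's criterion)."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    return all(np.linalg.det(A[:k, :k]) > 0 for k in range(1, n + 1))

# The Hessian from the example that follows: positive definite.
print(is_positive_definite_by_minors([[2, 1, 0], [1, 2, 0], [0, 0, 2]]))  # True
# The Hessian of the earlier example at (1, -2): not positive definite.
print(is_positive_definite_by_minors([[-16, -8], [-8, 36]]))              # False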
Example Let f(x, y, z) = x^2 + y^2 + z^2 + xy. To find any local maxima or minima of this function,
we compute its gradient, which is
\nabla f(x, y, z) = \begin{bmatrix} 2x + y & 2y + x & 2z \end{bmatrix}.
It follows that the only critical point is at (𝑥0 , 𝑦0 , 𝑧0 ) = (0, 0, 0). To perform the Second Derivatives
Test, we compute the Hessian of 𝑓 , which is
H_f(x, y, z) = \begin{bmatrix} f_{xx} & f_{xy} & f_{xz} \\ f_{yx} & f_{yy} & f_{yz} \\ f_{zx} & f_{zy} & f_{zz} \end{bmatrix} = \begin{bmatrix} 2 & 1 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}.
To determine whether this matrix is positive definite, we can compute the determinants of the
principal minors of 𝐻𝑓 (0, 0, 0), which are
[H_f(0,0,0)]_{11} = 2, \qquad [H_f(0,0,0)]_{1:2,1:2} = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}, \qquad [H_f(0,0,0)]_{1:3,1:3} = \begin{bmatrix} 2 & 1 & 0 \\ 1 & 2 & 0 \\ 0 & 0 & 2 \end{bmatrix}.
We have
det([𝐻𝑓 (0, 0, 0)]11 ) = 2,
det([𝐻𝑓 (0, 0, 0)]1:2,1:2 ) = 2(2) − 1(1) = 3,
det([𝐻𝑓 (0, 0, 0)]1:3,1:3 ) = 2 det([𝐻𝑓 (0, 0, 0)]1:2,1:2 ) = 6.
Since all of these determinants are positive, we conclude that 𝐻𝑓 (0, 0, 0) is positive definite, and
therefore the critical point is a local minimum of 𝑓 . □
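The computations in this example can also be reproduced symbolically; here is a minimal sketch, assuming SymPy is available:

import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**2 + y**2 + z**2 + x*y

grad = [sp.diff(f, v) for v in (x, y, z)]
print(sp.solve(grad, (x, y, z), dict=True))   # [{x: 0, y: 0, z: 0}]

H = sp.hessian(f, (x, y, z))
print(H)                                      # Matrix([[2, 1, 0], [1, 2, 0], [0, 0, 2]])
print([H[:k, :k].det() for k in (1, 2, 3)])   # leading principal minors: [2, 3, 6]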
Constrained Optimization
Now, we consider the problem of finding the maximum or minimum value of a function 𝑓 (x), except
that the independent variables x = (𝑥1 , 𝑥2 , . . . , 𝑥𝑛 ) are subject to one or more constraints. These
constraints prevent us from using the standard approach for finding extrema, but the ideas behind
the standard approach are still useful for developing an approach to the constrained problem.
We assume that the constraints are equations of the form
𝑔𝑖 (x) = 0,
𝑖 = 1, 2, . . . , 𝑚
for given functions 𝑔𝑖 (x). That is, we may only consider x = (𝑥1 , 𝑥2 , . . . , 𝑥𝑛 ) that belong to the
intersection of the hypersurfaces (surfaces, when 𝑛 = 3, or curves, when 𝑛 = 2) defined by the 𝑔𝑖 ,
when computing a maximum or minimum value of 𝑓 . For conciseness, we rewrite these constraints
as a vector equation g(x) = 0, where g : ℝ𝑛 → ℝ𝑚 is a vector-valued function with component
functions 𝑔𝑖 , for 𝑖 = 1, 2, . . . , 𝑚.
By Taylor’s theorem, we have, for x0 ∈ ℝ𝑛 at which g is differentiable,
𝑔(x) = 𝑔(x0 ) + 𝐽g (x0 )(x − x0 ) + 𝑅1 (x0 , x),
where 𝐽g (x0 ) is the Jacobian matrix of g at x0 , consisting of the first partial derivatives of the 𝑔𝑖
evaluated at x0 , and 𝑅1 (x0 , x) is the Taylor remainder, which satisfies
\lim_{\mathbf{x} \to \mathbf{x}_0} \frac{R_1(\mathbf{x}_0, \mathbf{x})}{\|\mathbf{x} - \mathbf{x}_0\|} = 0.
It follows that if u is a vector belonging to all of the tangent spaces of the hypersurfaces defined
by the 𝑔𝑖 , then, because each 𝑔𝑖 must remain constant as x deviates from x0 in the direction of u,
we must have 𝐽g (x0 )u = 0. In other words, ∇𝑔𝑖 (x0 ) ⋅ u = 0 for 𝑖 = 1, 2, . . . , 𝑚.
Now, suppose that x0 is a local minimum of 𝑓 (x), subject to the constraint g(x) = 0. Then
x0 need not be a critical point of 𝑓 , but, to first order, 𝑓 cannot change along any direction from
x0 that is consistent with the constraints. Therefore, we must have ∇𝑓 (x0 ) ⋅ u = 0 for any vector u in the
intersection of the tangent spaces, at x0 , of the hypersurfaces defined by the constraints.
It follows that if there exist constants 𝜆1 , 𝜆2 , . . . , 𝜆𝑚
such that

∇𝑓 (x0 ) = 𝜆1 ∇𝑔1 (x0 ) + 𝜆2 ∇𝑔2 (x0 ) + ⋅ ⋅ ⋅ + 𝜆𝑚 ∇𝑔𝑚 (x0 ),

then the requirement ∇𝑓 (x0 ) ⋅ u = 0 follows directly from the fact that ∇𝑔𝑖 (x0 ) ⋅ u = 0 for every
such tangent vector u, and therefore x0 is a constrained critical point of 𝑓 . The constants
𝜆1 , 𝜆2 , . . . , 𝜆𝑚 are called Lagrange multipliers.
Example When 𝑚 = 1, that is, when there is only one constraint, the problem of finding a
constrained minimum or maximum reduces to finding a point x0 in the domain of 𝑓 such that
∇𝑓 (x0 ) = 𝜆∇𝑔(x0 ),
for a single Lagrange multiplier 𝜆.
Let f(x, y) = 4x^2 + 9y^2. The minimum value of this function is 0, attained at
x = y = 0, but we wish to find the extreme values of f(x, y) subject to the constraint x^2 + y^2 - 2x - 2y = 2.
That is, we must have g(x, y) = 0 where g(x, y) = x^2 + y^2 - 2x - 2y - 2. To find any points that
are candidates for the constrained extrema, we compute the gradients of f and g, which are
\nabla f = \begin{bmatrix} 8x & 18y \end{bmatrix}, \qquad \nabla g = \begin{bmatrix} 2x - 2 & 2y - 2 \end{bmatrix}.
In order for the equation ∇𝑓 (𝑥, 𝑦) = 𝜆∇𝑔(𝑥, 𝑦) to be satisfied, we must have, for some choice of 𝜆,
𝑥 and 𝑦,
8𝑥 = 𝜆(2𝑥 − 2), 18𝑦 = 𝜆(2𝑦 − 2).
However, we must also have 𝑥 + 𝑦 − 8 = 0, which yields the equations
8𝑥 = 𝜆(2𝑥 − 2),
144 − 18𝑥 = 𝜆(14 − 2𝑥).
It follows from these equations that
\frac{8x}{2x - 2} = \frac{144 - 18x}{14 - 2x}.
Cross-multiplying yields
8𝑥(14 − 2𝑥) = (144 − 18𝑥)(2𝑥 − 2),
which, upon expanding, reduces to the quadratic equation
20x^2 - 212x + 288 = 0.
This equation has the roots 𝑥 = 9 and 𝑥 = 8/5. Therefore, the critical points of the constrained
optimization problem are
(x_1, y_1) = (9, -1), \qquad (x_2, y_2) = \left( \frac{8}{5}, \frac{32}{5} \right).
Evaluating 𝑓 (𝑥, 𝑦) at these points yields
𝑓 (𝑥1 , 𝑦1 ) = 𝑓 (9, −1) = 333,
𝑓 (𝑥2 , 𝑦2 ) = 𝑓 (8/5, 32/5) = 378.88.
Therefore, the maximum value of 𝑓 (𝑥, 𝑦), subject to 𝑥 + 𝑦 = 8, is 378.88, achieved at 𝑥 = 8/5,
𝑦 = 32/5, and the minimum is 333, achieved at 𝑥 = 9, 𝑦 = −1. □
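A quick numerical check of the algebra above, assuming NumPy is available:

import numpy as np

# Roots of 20x^2 - 212x + 288 = 0, with y = 8 - x on the line x + y = 8.
xs = np.roots([20, -212, 288])
for x in sorted(xs):
    y = 8 - x
    print(x, y, 4*x**2 + 9*y**2)   # prints 1.6, 6.4, 378.88 and then 9.0, -1.0, 333.0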
Example Let f(x, y, z) = x + y + z. We wish to find the extrema of this function subject to the
constraints x^2 + y^2 = 1 and 2x + z = 1. That is, we must have g_1(x, y, z) = g_2(x, y, z) = 0, where
g_1(x, y, z) = x^2 + y^2 - 1 and g_2(x, y, z) = 2x + z - 1. We must find λ_1 and λ_2 such that
∇𝑓 = 𝜆1 ∇𝑔1 + 𝜆2 ∇𝑔2 ,
or
\begin{bmatrix} 1 & 1 & 1 \end{bmatrix} = \lambda_1 \begin{bmatrix} 2x & 2y & 0 \end{bmatrix} + \lambda_2 \begin{bmatrix} 2 & 0 & 1 \end{bmatrix}.
This equation, together with the constraints, yields the system of equations
1 = 2x\lambda_1 + 2\lambda_2
1 = 2y\lambda_1
1 = \lambda_2
1 = x^2 + y^2
1 = 2x + z.
From the third equation, 𝜆2 = 1, which, by the first equation, yields 2𝑥𝜆1 = −1. It follows
from the second equation that 𝑥 = −𝑦. This, in conjunction with the fourth equation, yields
(x, y) = \left( \frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}} \right) or (x, y) = \left( -\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}} \right). From the fifth equation, we obtain the two
critical points

(x_1, y_1, z_1) = \left( \frac{1}{\sqrt{2}}, -\frac{1}{\sqrt{2}}, 1 - \sqrt{2} \right), \qquad (x_2, y_2, z_2) = \left( -\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}, 1 + \sqrt{2} \right).

Substituting these points into f yields f(x_1, y_1, z_1) = 1 - \sqrt{2} and f(x_2, y_2, z_2) = 1 + \sqrt{2}, so we
conclude that (x_1, y_1, z_1) is a local minimum of f and (x_2, y_2, z_2) is a local maximum of f, subject
to the constraints g_1(x, y, z) = g_2(x, y, z) = 0. □
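The five equations above can also be solved symbolically; a minimal sketch, assuming SymPy is available:

import sympy as sp

x, y, z, l1, l2 = sp.symbols('x y z lambda1 lambda2')
equations = [
    sp.Eq(1, 2*x*l1 + 2*l2),   # first component of grad f = l1 grad g1 + l2 grad g2
    sp.Eq(1, 2*y*l1),          # second component
    sp.Eq(1, l2),              # third component
    sp.Eq(1, x**2 + y**2),     # constraint g1 = 0
    sp.Eq(1, 2*x + z),         # constraint g2 = 0
]
for sol in sp.solve(equations, (x, y, z, l1, l2), dict=True):
    print(sol, '  f =', sp.simplify(sol[x] + sol[y] + sol[z]))
# The two solutions give f = 1 - sqrt(2) and f = 1 + sqrt(2).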
The method of Lagrange multipliers can be used in conjunction with the method of finding
unconstrained local maxima and minima in order to find the absolute maximum and minimum of
a function on a compact (closed and bounded) set. The basic idea is as follows:
∙ Find the (unconstrained) critical points of the function, and exclude those that do not belong
to the interior of the set.
∙ Use the method of Lagrange multipliers to find the constrained critical points that lie on the
boundary of the set, using equations that characterize the boundary points as constraints.
Also include any corners of the boundary, since the restriction of the function to the boundary
need not be differentiable there, so such points are critical points as well.
∙ Evaluate the function at all of the constrained and unconstrained critical points. The largest
value is the absolute maximum value on the set, and the smallest value is the absolute
minimum value on the set.
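As a hypothetical illustration of this procedure, the following sketch (assuming NumPy is available) finds the absolute extrema of f(x, y) = 4x^2 + 9y^2 on the closed unit disk x^2 + y^2 <= 1: the only interior critical point is the origin, and the boundary step is approximated by scanning the parametrization (cos t, sin t) rather than solving the Lagrange equations exactly.

import numpy as np

def f(x, y):
    return 4*x**2 + 9*y**2

# Step 1: interior critical points (here, grad f = (8x, 18y) = 0 only at the origin).
candidates = [(0.0, 0.0)]

# Step 2: boundary candidates; a fine scan of the parametrized boundary (cos t, sin t)
# stands in for the Lagrange multiplier computation.
t = np.linspace(0.0, 2*np.pi, 10001)
bx, by = np.cos(t), np.sin(t)
bvals = f(bx, by)
candidates.append((bx[np.argmin(bvals)], by[np.argmin(bvals)]))
candidates.append((bx[np.argmax(bvals)], by[np.argmax(bvals)]))

# Step 3: compare the function values at all candidates.
values = [f(x, y) for x, y in candidates]
print(min(values), max(values))   # approximately 0 (at the origin) and 9 (at (0, +/-1))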
From a linear algebra point of view, ∇𝑓 (x0 ) must be orthogonal to any vector u in the null
space of 𝐽g (x0 ) (that is, the set of all vectors v such that 𝐽g (x0 )v = 0), and therefore
it must lie in the range of 𝐽g (x0 )𝑇 , the transpose of 𝐽g (x0 ). That is, ∇𝑓 (x0 ) = 𝐽g (x0 )𝑇 𝜆 for some
vector 𝜆, meaning that ∇𝑓 (x0 ) must be a linear combination of the rows of 𝐽g (x0 ) (the columns
of 𝐽g (x0 )𝑇 ), which are the gradients of the component functions of g at x0 .
Another way to view the method of Lagrange multipliers is as a modified unconstrained optimization problem. If we define the function ℎ(x, 𝜆) by
h(\mathbf{x}, \boldsymbol{\lambda}) = f(\mathbf{x}) - \boldsymbol{\lambda} \cdot \mathbf{g}(\mathbf{x}) = f(\mathbf{x}) - \sum_{i=1}^m \lambda_i g_i(\mathbf{x}),
then we can find constrained critical points of 𝑓 by finding unconstrained critical points of ℎ, for
\nabla h(\mathbf{x}, \boldsymbol{\lambda}) = \begin{bmatrix} \nabla f(\mathbf{x}) - \boldsymbol{\lambda} \cdot J_{\mathbf{g}}(\mathbf{x}) & -\mathbf{g}(\mathbf{x}) \end{bmatrix}.
Because all components of the gradient must be equal to zero at a critical point (when the gradient
exists), the constraints must be satisfied at a critical point of ℎ, and ∇𝑓 must be a linear combination
of the ∇𝑔𝑖 , so 𝑓 is only changing along directions that violate the constraints. Therefore, a critical
point is a candidate for a constrained maximum or minimum. By the Second Derivatives Test, we
can then use the Hessian of ℎ to determine if any constrained extremum is a maximum or minimum.
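To make this viewpoint concrete, the following sketch solves ∇h = 0 numerically for the earlier example f(x, y, z) = x + y + z with constraints g_1 = x^2 + y^2 - 1 and g_2 = 2x + z - 1, using a generic root finder; SciPy and NumPy are assumed to be available, and the starting guesses are arbitrary choices.

import numpy as np
from scipy.optimize import fsolve

def grad_h(vars):
    x, y, z, l1, l2 = vars
    return [
        1 - 2*x*l1 - 2*l2,   # dh/dx
        1 - 2*y*l1,          # dh/dy
        1 - l2,              # dh/dz
        -(x**2 + y**2 - 1),  # dh/dl1 = -g1
        -(2*x + z - 1),      # dh/dl2 = -g2
    ]

# Different starting guesses converge to the two constrained critical points.
for guess in ([0.7, -0.7, 0.0, -1.0, 1.0], [-0.7, 0.7, 2.0, 1.0, 1.0]):
    x, y, z, l1, l2 = fsolve(grad_h, guess)
    print(round(x, 4), round(y, 4), round(z, 4), '  f =', round(x + y + z, 4))
# Expected: roughly (0.7071, -0.7071, -0.4142) with f = 1 - sqrt(2),
# and (-0.7071, 0.7071, 2.4142) with f = 1 + sqrt(2).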
Appendix: Linear Algebra Concepts
Matrix Multiplication
As we work with Jacobian matrices for vector-valued functions of several variables, matrix multiplication is a highly relevant operation in multivariable calculus. We have previously defined the
product of an 𝑚 × 𝑛 matrix 𝐴 (that is, 𝐴 has 𝑚 rows and 𝑛 columns) and an 𝑛 × 𝑝 matrix 𝐵 as
the 𝑚 × 𝑝 matrix 𝐶 = 𝐴𝐵, where the entry in row 𝑖 and column 𝑗 of 𝐶 is the dot product of row 𝑖
of 𝐴 and column 𝑗 of 𝐵. This can be written using sigma notation as
c_{ij} = \sum_{k=1}^n a_{ik} b_{kj}, \qquad i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, p.
Note that the number of columns in 𝐴 must equal the number of rows in 𝐵, or the product 𝐴𝐵 is
undefined. Furthermore, in general, even if 𝐴 and 𝐵 can be multiplied in either order (that is, if
they are square matrices of the same size), 𝐴𝐵 does not necessarily equal 𝐵𝐴. In the special case
where the matrix 𝐵 is actually a column vector x with 𝑛 components (that is, 𝑝 = 1), it is useful
to be able to recognize the summation
y_i = \sum_{j=1}^n a_{ij} x_j
as the formula for the 𝑖th component of the vector y = 𝐴x.
Example Let A be a 3 × 2 matrix, and B be a 2 × 2 matrix, whose entries are given by

A = \begin{bmatrix} 1 & -2 \\ -3 & 4 \\ 5 & -6 \end{bmatrix}, \qquad B = \begin{bmatrix} -7 & 8 \\ 9 & -10 \end{bmatrix}.
Then, because the number of columns in A is equal to the number of rows in B, the product
C = AB is defined, and equal to the 3 × 2 matrix

C = \begin{bmatrix} 1(-7) + (-2)(9) & 1(8) + (-2)(-10) \\ (-3)(-7) + 4(9) & (-3)(8) + 4(-10) \\ 5(-7) + (-6)(9) & 5(8) + (-6)(-10) \end{bmatrix} = \begin{bmatrix} -25 & 28 \\ 57 & -64 \\ -89 & 100 \end{bmatrix}.
Because the number of columns in 𝐵 is not the same as the number of rows in 𝐴, it does not make
sense to compute the product 𝐵𝐴. □
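The summation formula for c_ij can be checked directly against a library implementation; a minimal sketch using the matrices from the example above, assuming NumPy is available:

import numpy as np

A = [[1, -2], [-3, 4], [5, -6]]          # 3 x 2
B = [[-7, 8], [9, -10]]                  # 2 x 2

m, n, p = len(A), len(B), len(B[0])
# c_ij = sum over k of a_ik * b_kj
C = [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)] for i in range(m)]
print(C)                                                         # [[-25, 28], [57, -64], [-89, 100]]
print(np.array_equal(np.array(A) @ np.array(B), np.array(C)))    # True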
In multivariable calculus, matrix multiplication most commonly arises when applying the Chain
Rule, because the Jacobian matrix of the composition f ∘ g at point x0 in the domain of g is the
product of the Jacobian matrix of f , evaluated at 𝑔(x0 ), and the Jacobian matrix of g evaluated at
x0 . It follows that the Chain Rule only makes sense when composing functions f and g such that
the number of dependent variables of g (that is, the number of rows in its Jacobian matrix) equals
the number of independent variables of f (that is, the number of columns in its Jacobian matrix).
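The following sketch illustrates this matrix form of the Chain Rule numerically for a hypothetical pair of maps g : R^2 -> R^2 and f : R^2 -> R^2, comparing the product of their Jacobians with a finite-difference Jacobian of the composition; NumPy is assumed to be available.

import numpy as np

def g(v):                      # g(x, y) = (xy, x + y^2)
    x, y = v
    return np.array([x*y, x + y**2])

def f(v):                      # f(u, w) = (u + sin(w), u*w)
    u, w = v
    return np.array([u + np.sin(w), u*w])

def J_g(v):                    # analytic Jacobian of g
    x, y = v
    return np.array([[y, x], [1.0, 2*y]])

def J_f(v):                    # analytic Jacobian of f
    u, w = v
    return np.array([[1.0, np.cos(w)], [w, u]])

def numerical_jacobian(F, v, eps=1e-6):
    # central finite differences, one column per input variable
    v = np.asarray(v, dtype=float)
    cols = [(F(v + eps*e) - F(v - eps*e)) / (2*eps) for e in np.eye(len(v))]
    return np.column_stack(cols)

x0 = np.array([1.0, 2.0])
chain_rule = J_f(g(x0)) @ J_g(x0)                       # J_f(g(x0)) J_g(x0)
direct = numerical_jacobian(lambda v: f(g(v)), x0)      # Jacobian of the composition
print(np.allclose(chain_rule, direct, atol=1e-5))       # True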
Matrix multiplication also arises in Taylor series expansions of multivariable functions, because
if 𝑓 : 𝐷 ⊆ ℝ𝑛 → ℝ, then the Taylor expansion of 𝑓 around x0 ∈ 𝐷 involves the dot product of
∇𝑓 (x0 ) with the vector x − x0 , which is a multiplication of a 1 × 𝑛 matrix with an 𝑛 × 1 matrix to
produce a scalar (by convention, the gradient is written as a row vector, while points are written
as column vectors). Also, such an expansion involves the dot product of x − x0 with the product of
the Hessian matrix, the matrix of second partial derivatives at x0 , and the vector x − x0 . Finally,
if g : 𝑈 ⊆ ℝ𝑛 → ℝ𝑚 is a vector-valued function of 𝑛 variables, then the second term in its Taylor
expansion around x0 ∈ 𝑈 is the product of the Jacobian matrix of g at x0 and the vector x − x0 .
Eigenvalues
Previously, it was mentioned that the eigenvalues of a matrix that is both symmetric and positive
definite are positive. A scalar 𝜆, which can be real or complex, is an eigenvalue of an 𝑛 × 𝑛 matrix
𝐴 (that is, 𝐴 has 𝑛 rows and 𝑛 columns) if there exists a nonzero vector x such that
𝐴x = 𝜆x.
That is, matrix-vector multiplication of 𝐴 and x reduces to a simple scaling of x by 𝜆. The vector
x is called an eigenvector of 𝐴 corresponding to 𝜆.
The eigenvalues of 𝐴 are roots of the characteristic polynomial det(𝐴−𝜆𝐼), which is a polynomial
of degree 𝑛 in the variable 𝜆. Therefore, an 𝑛 × 𝑛 matrix 𝐴 has 𝑛 eigenvalues, which may repeat.
Although the eigenvalues of a matrix may be real or complex, even when the matrix is real, the
eigenvalues of a real, symmetric matrix, such as the Hessian of any function with continuous second
partial derivatives, are real.
For a general matrix 𝐴, det(𝐴), the determinant of 𝐴, is the product of all of the eigenvalues
of 𝐴. The trace of 𝐴, denoted by tr(𝐴), which is defined to be the sum of the diagonal entries of
𝐴, is also the sum of the eigenvalues of 𝐴. It follows that when 𝐴 is a 2 × 2 symmetric matrix,
the determinant and trace can be used to easily confirm that the eigenvalues of 𝐴 are either both
positive, both negative, or of opposite signs. This is the basis for the Second Derivatives Test for
functions of two variables.
Example Let A be a symmetric 2 × 2 matrix defined by

A = \begin{bmatrix} 4 & -6 \\ -6 & 10 \end{bmatrix}.
Then
tr(𝐴) = 4 + 10 = 14,
det(𝐴) = 4(10) − (−6)(−6) = 4.
It follows that the product and the sum of 𝐴’s two eigenvalues are both positive. Because 𝐴 is
symmetric, its eigenvalues are also real. Therefore, they must both also be positive, and we can
conclude that 𝐴 is positive definite.
To actually compute the eigenvalues, we can compute its characteristic polynomial, which is
\det(A - \lambda I) = \det \begin{bmatrix} 4 - \lambda & -6 \\ -6 & 10 - \lambda \end{bmatrix} = (4 - \lambda)(10 - \lambda) - (-6)(-6) = \lambda^2 - 14\lambda + 4.
Note that
det(A − λI) = λ^2 − tr(A)λ + det(A),
which is true for 2 × 2 matrices in general. To compute the eigenvalues, we use the quadratic
formula to compute the roots of this polynomial, and obtain
\lambda = \frac{14 \pm \sqrt{14^2 - 4(4)(1)}}{2(1)} = 7 \pm 3\sqrt{5} \approx 13.708, \ 0.292.
If 𝐴 represented the Hessian of a function 𝑓 (𝑥, 𝑦) at a point (𝑥0 , 𝑦0 ), and ∇𝑓 (𝑥0 , 𝑦0 ) = 0, then 𝑓
would have a local minimum at (𝑥0 , 𝑦0 ). □
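These values are easy to confirm numerically; a minimal check, assuming NumPy is available:

import numpy as np

A = np.array([[4.0, -6.0], [-6.0, 10.0]])
eigenvalues = np.linalg.eigvalsh(A)       # eigvalsh: for symmetric matrices, sorted ascending
print(eigenvalues)                        # approximately [0.292, 13.708]
print(np.trace(A), np.linalg.det(A))      # 14.0 and (approximately) 4.0
print(np.all(eigenvalues > 0))            # True: A is positive definite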
The Transpose, Inner Product and Null Space
The dot product of two vectors u and v, denoted by u ⋅ v, can also be written as u𝑇 v, where u and
v are both column vectors, and u𝑇 is the transpose of u, which converts u into a row vector. In
general, the transpose of a matrix 𝐴 is the matrix 𝐴𝑇 whose entries are defined by [𝐴𝑇 ]𝑖𝑗 = [𝐴]𝑗𝑖 .
That is, in the transpose, the sense of rows and columns are reversed. The dot product is also
known as an inner product; the outer product of two column vectors u and v is uv𝑇 , which is a
matrix, whereas the inner product is a scalar.
Given an m × n matrix A, the null space of A is the set 𝒩(A) of all n-vectors x such that
Ax = 0. If x is such a vector, then for any m-vector v, v^T(Ax) = v^T 0 = 0.
However, because of two properties of the transpose, (A^T)^T = A and (AB)^T = B^T A^T, this inner
product can be rewritten as v𝑇 𝐴x = v𝑇 (𝐴𝑇 )𝑇 x = (𝐴𝑇 v)𝑇 x. It follows that any vector in 𝒩 (𝐴) is
orthogonal to any vector in the range of 𝐴𝑇 , denoted by ℛ(𝐴𝑇 ), which is the set of all 𝑛-vectors
of the form 𝐴𝑇 v, where v is an 𝑚-vector. This is the basis for the condition ∇𝑓 = 𝐽g𝑇 𝜆 in the
method of Lagrange multipliers when there are multiple constraints.
Example Let

A = \begin{bmatrix} 1 & -2 & 4 \\ 1 & 3 & -6 \\ 1 & -5 & 10 \end{bmatrix}.

Then

A^T = \begin{bmatrix} 1 & 1 & 1 \\ -2 & 3 & -5 \\ 4 & -6 & 10 \end{bmatrix}.
The null space of A, 𝒩(A), consists of all vectors that are multiples of the vector

\mathbf{v} = \begin{bmatrix} 0 \\ 2 \\ 1 \end{bmatrix},
as can be verified by matrix-vector multiplication: 𝐴v = 0. Now, if we let w be any vector
in ℝ3 , and we compute u = 𝐴𝑇 w, then v ⋅ u = v𝑇 u = 0, because
v𝑇 u = v𝑇 𝐴𝑇 w = (𝐴v)𝑇 w = 0𝑇 w = 0.
For example, it can be confirmed directly that v is orthogonal to any of the columns of 𝐴𝑇 . □
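This orthogonality can be confirmed numerically as well; a short check, assuming NumPy is available:

import numpy as np

A = np.array([[1.0, -2.0, 4.0], [1.0, 3.0, -6.0], [1.0, -5.0, 10.0]])
v = np.array([0.0, 2.0, 1.0])

print(A @ v)                                       # [0. 0. 0.]: v is in the null space of A
w = np.random.default_rng(0).standard_normal(3)    # an arbitrary vector w
u = A.T @ w                                        # u lies in the range of A^T
print(np.isclose(v @ u, 0.0))                      # True: v is orthogonal to u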
Practice Problems
Practice problems from the recommended textbooks are:
∙ Stewart: Section 11.8, Exercises 1-29 odd
∙ Marsden/Tromba: Section 3.4, Exercises 1-11 odd