XI. LINEAR ALGEBRA: MATRICES FOR DERIVATIVES
I always think of matrices as compact ways of expressing systems of equations,
derivatives, observations, and such. Matrix algebra provides the tools for handling
these large systems very efficiently, including matrices of derivatives. In the simplest
case, think of a real-valued function of $n$ variables, $f : \mathbb{R}^n \to \mathbb{R}$. It has $n$ partial
derivatives. The column vector consisting of all these partial derivatives is the
gradient vector,
$$
\nabla_x f(x) = \begin{bmatrix} \partial f / \partial x_1 \\ \partial f / \partial x_2 \\ \vdots \\ \partial f / \partial x_n \end{bmatrix}
$$
We talk about gradients only for functions into $\mathbb{R}$. More generally, the matrix of
first derivatives of a function $f : \mathbb{R}^n \to \mathbb{R}^m$, also called the Jacobian matrix, is:
$$
f(x) = \begin{bmatrix} f_1(x) \\ f_2(x) \\ \vdots \\ f_m(x) \end{bmatrix}
\quad\Rightarrow\quad
D_x f(x) = \begin{bmatrix}
\dfrac{\partial f_1}{\partial x_1} & \dfrac{\partial f_1}{\partial x_2} & \cdots & \dfrac{\partial f_1}{\partial x_n} \\[4pt]
\dfrac{\partial f_2}{\partial x_1} & \dfrac{\partial f_2}{\partial x_2} & \cdots & \dfrac{\partial f_2}{\partial x_n} \\[4pt]
\vdots & & & \vdots \\[2pt]
\dfrac{\partial f_m}{\partial x_1} & \dfrac{\partial f_m}{\partial x_2} & \cdots & \dfrac{\partial f_m}{\partial x_n}
\end{bmatrix}
= \begin{bmatrix}
\big(\nabla_x f_1(x)\big)' \\
\big(\nabla_x f_2(x)\big)' \\
\vdots \\
\big(\nabla_x f_m(x)\big)'
\end{bmatrix}
$$
The matrix of first derivatives is $m \times n$. Going back to the case of $\mathbb{R}$-valued
functions, you will often see, instead of gradients, the "row vector" (and to those who
claim there is no such thing, I say "$1 \times n$ matrix") of partial derivatives:
$$
D_x f(x) = \begin{bmatrix} \dfrac{\partial f}{\partial x_1} & \dfrac{\partial f}{\partial x_2} & \cdots & \dfrac{\partial f}{\partial x_n} \end{bmatrix} = \big[ \nabla_x f(x) \big]'
$$
The only difference is geometric orientation.
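To make these objects concrete, here is a minimal numerical sketch, not part of the original notes, that approximates a gradient and a Jacobian by central finite differences. The demand function anticipates the first example below; the values $\alpha = 0.3$ and $w = 10$ are invented for illustration.

```python
import numpy as np

def gradient(f, x, h=1e-6):
    """Numerical gradient of a scalar-valued f at x (column of partials)."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)   # central difference for df/dx_i
    return g

def jacobian(f, x, h=1e-6):
    """Numerical Jacobian of a vector-valued f at x; row i is the gradient of f_i."""
    x = np.asarray(x, dtype=float)
    fx = np.asarray(f(x), dtype=float)
    J = np.zeros((fx.size, x.size))
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        J[:, i] = (np.asarray(f(x + e)) - np.asarray(f(x - e))) / (2 * h)
    return J

# Demand system from the first example below, with invented alpha = 0.3, w = 10
a, w = 0.3, 10.0
demand = lambda p: np.array([a * w / p[0], (1 - a) * w / p[1]])
print(jacobian(demand, np.array([2.0, 5.0])))   # the 2x2 matrix D_p x(p, w)
```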
Example: Let $x(p,w) = \big( x_1(p_1, p_2, w),\, x_2(p_1, p_2, w) \big) = \left( \dfrac{\alpha w}{p_1},\ \dfrac{(1-\alpha)w}{p_2} \right)$. What is $D_p x(p,w)$?

Example: Now let $x(p,w) = \left( \dfrac{w}{p_1} - 1,\ \dfrac{p_1}{p_2} \right)$. What is $D_p x(p,w)$?
The second important matrix in theory classes is the matrix of second derivatives
and cross-partials of a real-valued function of $n$ variables, $f : \mathbb{R}^n \to \mathbb{R}$. The Hessian
matrix is the $n \times n$ matrix whose $ij$-th entry is $\partial^2 f / \partial x_i \partial x_j$. On the diagonal of this matrix,
we have second derivatives; off the diagonal, we have cross-partials. Since
$\partial^2 f / \partial x_i \partial x_j = \partial^2 f / \partial x_j \partial x_i$, the $ij$-th entry of the matrix is the same as the $ji$-th entry, so
the Hessian matrix is always symmetric.
$$
\nabla_x f(x) \equiv \big( D_x f(x) \big)' = \begin{bmatrix} \partial f / \partial x_1 \\ \vdots \\ \partial f / \partial x_n \end{bmatrix},
\qquad
D^2_{xx} f(x) \equiv D_x\big[ \nabla_x f(x) \big] = \begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\[6pt]
\dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\[6pt]
\vdots & & \ddots & \vdots \\[2pt]
\dfrac{\partial^2 f}{\partial x_1 \partial x_n} & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}
$$
Hessian matrices are comparable to the second derivative of a function from $\mathbb{R}$ to $\mathbb{R}$,
and they will always be used to test the concavity of a function of more than one
variable. You should get very accustomed to finding Hessians.
Example: $F(K,L) = AK^\alpha L^{1-\alpha}$. Find $D^2 F(K,L)$ with respect to all the inputs.

Example: $U(x_1, x_2) = \big( x_1^\rho + x_2^\rho \big)^{1/\rho}$. Find $D^2_{xx} U(x_1, x_2)$.

Example: $U(x_1, x_2, x_3) = x_1^\alpha x_2^\beta x_3^\gamma$ and $x(p,w) = (\alpha + \beta + \gamma)^{-1} \left( \dfrac{\alpha w}{p_1},\ \dfrac{\beta w}{p_2},\ \dfrac{\gamma w}{p_3} \right)$.
Find $V(p,w) = U(x(p,w))$ and then $D^2_{pp} V(p,w)$.

Example: $U(x_1, x_2) = x_1 + \ln x_2$ and $x(p,w) = \left( \dfrac{w}{p_1} - 1,\ \dfrac{p_1}{p_2} \right)$. Find $V(p,w)$ and
then $D^2_{pp} V(p,w)$.
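As a quick check on hand-derived Hessians, here is a small sketch that approximates the Hessian of the Cobb-Douglas production function from the first example above by central differences; the values $A = 1$ and $\alpha = 0.3$ are assumptions made purely for illustration.

```python
import numpy as np

def hessian(f, x, h=1e-4):
    """Numerical Hessian of a scalar-valued f at x via central differences."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            ei, ej = np.zeros(n), np.zeros(n)
            ei[i], ej[j] = h, h
            H[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                       - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return H

# Cobb-Douglas production from the first example, with assumed A = 1, alpha = 0.3
A, alpha = 1.0, 0.3
F = lambda v: A * v[0] ** alpha * v[1] ** (1 - alpha)
print(hessian(F, np.array([2.0, 3.0])))   # a symmetric 2x2 matrix D^2 F(K, L)
```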
The test for concavity or convexity of a function of one variable was whether the
second derivative was negative or positive. We see that for functions of more than
one variable, there is a matrix of second derivatives. How do we identify whether a
matrix is positive or negative? The answer is not to look simply at each of
the $\partial^2 f / \partial x_i^2$. It is also not sufficient to ensure that each individual element is
negative.
Example: Is the function $f(x,y) = x^2 y^2$ convex? On first inspection, it looks like the
product of two convex functions. It's convex in each variable individually; that is, both (own)
second derivatives are positive. The Hessian matrix of this function is:
$$
D^2 f(x,y) = \begin{bmatrix} 2y^2 & 4xy \\ 4xy & 2x^2 \end{bmatrix}
$$
All the elements of this matrix are positive. And yet, consider this: at two points on the
axes, $f(0,1) = f(1,0) = 0$. At a convex combination of those two points, we find
that $\tfrac{1}{2} f(0,1) + \tfrac{1}{2} f(1,0) = 0 \not\geq \tfrac{1}{16} = f\!\left( \tfrac{1}{2}, \tfrac{1}{2} \right)$, so the function at the midpoint lies above the
average of the function values. This is not a convex function. (It looks like a
giant scalloped bowl.)
This demonstrates that we need a new way to define positive and negative for
matrices, at least as far as second-derivative tests are concerned.
Definition: An $n \times n$ matrix A is positive semidefinite if for all vectors $x \in \mathbb{R}^n$, the
number $x' A x \geq 0$.

Definition: An $n \times n$ matrix A is negative semidefinite if for all vectors $x \in \mathbb{R}^n$,
the number $x' A x \leq 0$.

Definition: An $n \times n$ matrix A is positive definite if for all vectors $x \in \mathbb{R}^n$, $x \neq 0$,
the number $x' A x > 0$.

Definition: An $n \times n$ matrix A is negative definite if for all vectors $x \in \mathbb{R}^n$, $x \neq 0$, the
number $x' A x < 0$.

Any matrix that is not one of these is called indefinite. Note that if A is a scalar
(which is really just a $1 \times 1$ matrix), these definitions correspond to the usual
definitions of weakly or strictly positive or negative, since $x^2$ changes the sign of
nothing, provided $x \neq 0$.
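These definitions are easy to probe numerically. The sketch below is an illustration rather than part of the notes: it samples the quadratic form $x'Ax$ over random directions and also reports the eigenvalues (for a symmetric matrix, all eigenvalues positive is equivalent to positive definiteness). The test matrix is the Hessian of $f(x,y) = x^2 y^2$ evaluated at $(1,1)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hessian of f(x, y) = x^2 y^2 evaluated at (1, 1): [[2, 4], [4, 2]]
A = np.array([[2.0, 4.0],
              [4.0, 2.0]])

# Sample the quadratic form x'Ax over random directions.
quad_forms = np.array([x @ A @ x for x in rng.normal(size=(1000, 2))])
print(quad_forms.min(), quad_forms.max())   # both signs appear -> indefinite

# For a symmetric A, the eigenvalues tell the same story.
print(np.linalg.eigvalsh(A))                # [-2., 6.]: one negative, one positive
```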
A function is concave if and only if its Hessian matrix is negative semidefinite, and
it is strictly concave if its Hessian matrix is negative definite (the converse of the
strict statement need not hold). Whenever we are maximizing a function of more
than one variable, we must find the Hessian matrix of the function and confirm that
it is negative semidefinite in order to ensure that we have found a maximum.
The definitions are generally hard to work with when doing these tests. There is a
set of rules for determining the sign and definiteness of a matrix.
If we have an $n \times n$ matrix A, a k-th order principal submatrix of A is a matrix
that results from deleting $n - k$ rows and the same $n - k$ columns from A. If we have
a $3 \times 3$ matrix,
$$
A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
$$
then we can form three second order principal submatrices: by deleting the first row
and first column, by deleting the second row and second column, and by deleting the
third row and third column:
$$
\begin{bmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{bmatrix}, \qquad
\begin{bmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{bmatrix}, \qquad
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
$$
The leading principal submatrices of A are only those principal submatrices
formed by deleting the last $n - k$ rows and columns of the matrix. For the $3 \times 3$ matrix
described above, the first-, second-, and third-order leading principal submatrices are:
$$
[a_{11}], \qquad
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \qquad \text{and} \qquad
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
$$
The determinant of a k-th order principal submatrix is called a k-th order
principal minor. Here is the relationship between the principal minors and the
sign and definiteness of a matrix:
Theorem: A symmetric $n \times n$ matrix A is positive definite if and only if all its n leading principal
minors are strictly positive.

Theorem: A symmetric $n \times n$ matrix A is positive semidefinite if and only if all its principal
minors (not just leading!) are nonnegative.

Theorem: A symmetric $n \times n$ matrix A is negative definite if and only if its n leading principal
minors alternate in sign, with the sign of the k-th order leading principal minor equal to $(-1)^k$.

Theorem: A symmetric $n \times n$ matrix A is negative semidefinite if and only if all its principal
minors (not just leading!) of order k equal zero or have the sign of $(-1)^k$.
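Here is one way these minor-based tests might be coded up for a symmetric matrix; the helper names are mine, and as the next paragraph notes, enumerating every principal minor quickly becomes painful as the dimension grows.

```python
import numpy as np
from itertools import combinations

def principal_minors(A, k):
    """All k-th order principal minors: keep the same k rows and columns."""
    n = A.shape[0]
    return [np.linalg.det(A[np.ix_(idx, idx)])
            for idx in combinations(range(n), k)]

def leading_principal_minors(A):
    """Determinants of the upper-left 1x1, 2x2, ..., nxn blocks."""
    return [np.linalg.det(A[:k, :k]) for k in range(1, A.shape[0] + 1)]

def is_negative_definite(A, tol=1e-12):
    """Leading principal minors alternate in sign; the k-th has the sign of (-1)^k."""
    return all((-1) ** k * d > tol
               for k, d in enumerate(leading_principal_minors(A), start=1))

def is_negative_semidefinite(A, tol=1e-12):
    """Every k-th order principal minor is zero or has the sign of (-1)^k."""
    return all((-1) ** k * d >= -tol
               for k in range(1, A.shape[0] + 1)
               for d in principal_minors(A, k))

# Hessian of the concave function f(x, y) = -x^2 - y^2 + x*y
H = np.array([[-2.0, 1.0],
              [1.0, -2.0]])
print(is_negative_definite(H), is_negative_semidefinite(H))   # True True
```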
This is what you have to do to test the concavity or convexity of a function of several
variables. Finding principal submatrices of two-by-two and three-by-three matrices
isn't terribly difficult. Once you get to four or five or more dimensions,
it's a real pain. Unfortunately, there's not really a better way to determine concavity.
Only a particularly sadistic professor would ask you to test the concavity of a
function of more than three variables. Here are the simple rules for the two- and
three-variable cases (just for concavity).
Suppose your Hessian matrix is $2 \times 2$:
$$
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
$$
To show that the function is strictly concave, you need to show that:
$$
a_{11} < 0 \qquad \text{and} \qquad \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} > 0
$$
For just plain concavity, you need to show that both of those hold with weak inequality, and
additionally that $a_{22} \leq 0$. For a function of three variables, you have a $3 \times 3$ Hessian
matrix. For strict concavity, you need to confirm that the leading principal minors
have the right signs:
$$
a_{11} < 0, \qquad
\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} > 0, \qquad \text{and} \qquad
\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} < 0
$$
For plain concavity, you need to show that all the second derivatives are nonpositive:
$$
a_{11} \leq 0, \qquad a_{22} \leq 0, \qquad a_{33} \leq 0
$$
And that all the second-order principal minors are nonnegative:
$$
\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} \geq 0, \qquad
\begin{vmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{vmatrix} \geq 0, \qquad
\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} \geq 0
$$
Finally, check that the determinant of the matrix itself is nonpositive.
Example: Let $f(x,y) = x^2 y^2$. Check for concavity or convexity (strict or otherwise).

Example: Let $f(x,y) = xy$. Check for concavity or convexity (strict or otherwise).

Example: Let $f(x,y) = x^2 + y^2 + xy$. Check for concavity or convexity (strict or otherwise).

Example: Let $F(K,L) = AK^\alpha L^{1-\alpha}$. Find the matrix $D^2 F(K,L)$ and check for
concavity or strict concavity.

Example: Let $U(x_1, x_2) = \big( x_1^\rho + x_2^\rho \big)^{1/\rho}$. Find the matrix $D^2_{xx} U(x_1, x_2)$ and check for
concavity or strict concavity.

Example: Let $U(x_1, x_2, x_3) = x_1^\alpha x_2^\beta x_3^\gamma$ and
$x(p,w) = \left( \dfrac{\alpha w}{p_1(\alpha + \beta + \gamma)},\ \dfrac{\beta w}{p_2(\alpha + \beta + \gamma)},\ \dfrac{\gamma w}{p_3(\alpha + \beta + \gamma)} \right)$. Find
the matrix $D^2_{pp} V(p,w)$ and check for concavity or strict concavity.

Example: Let $U(x_1, x_2) = x_1 + \ln x_2$ and $x(p,w) = \left( \dfrac{w}{p_1} - 1,\ \dfrac{p_1}{p_2} \right)$. Find the matrix
$D^2_{pp} V(p,w)$ and check for concavity or strict concavity.
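As a sanity check on the second of these examples, here is a tiny sketch applying the two-variable rules from above to $f(x,y) = xy$, whose Hessian is constant.

```python
import numpy as np

# f(x, y) = x*y has the constant Hessian [[0, 1], [1, 0]].
H = np.array([[0.0, 1.0],
              [1.0, 0.0]])

a11, det = H[0, 0], np.linalg.det(H)
print(a11 < 0 and det > 0)      # False: the strict-concavity test fails
print(a11 > 0 and det > 0)      # False: the analogous strict-convexity test fails too
print(np.linalg.eigvalsh(H))    # eigenvalues [-1, 1], so H is indefinite
```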
Okay, enough about concavity. Let’s talk briefly about eigenvalues and eigenvectors,
which will be useful for checking the stability of systems of differential (or
difference) equations.
In macro, you might have a matrix that describes how several variables in the
economy evolve, for instance:
$$
\begin{bmatrix} k_{t+1} \\ m_{t+1} \end{bmatrix}
= \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} k_t \\ m_t \end{bmatrix}
$$
where $k_t$ and $k_{t+1}$ describe the capital stock of the country at time $t$ and at $t+1$;
$m_t$ and $m_{t+1}$ are real money balances. The matrix A consists of constants, or likely as
not, linear first-order approximations of some functions (remember the Taylor
series?). If we want to find the values of these two variables two years in the future,
we would use the formula iteratively:
$$
\begin{bmatrix} k_{t+2} \\ m_{t+2} \end{bmatrix}
= \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} k_{t+1} \\ m_{t+1} \end{bmatrix}
= \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\begin{bmatrix} k_t \\ m_t \end{bmatrix}
$$
And then to find the value of the variables n years into the future we just repeat this
multiplication n times:
$$
\begin{bmatrix} k_{t+n} \\ m_{t+n} \end{bmatrix}
= \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}^{\!n}
\begin{bmatrix} k_t \\ m_t \end{bmatrix}
= A^n \begin{bmatrix} k_t \\ m_t \end{bmatrix}
$$
Does this settle down at some point, or do the variables keep growing forever?
Remember from earlier that if we multiply any $x \in (-1, 1)$ by itself a number of times,
it gets really small, and:
$$
x^n \to 0 \quad \text{as} \quad n \to \infty
$$
In macroeconomics we might be wondering something very similar, except that the
number x has been replaced with a matrix A. In order to see whether the matrix power $A^n$
converges or not, we have to look at the eigenvalues of A.
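A minimal simulation makes the question concrete. In the sketch below the transition matrix and starting values are invented for illustration; `np.linalg.matrix_power` computes $A^n$ directly, and we can watch whether the state settles down.

```python
import numpy as np

# Invented transition matrix and initial capital stock / real balances.
A = np.array([[0.9, 0.1],
              [0.2, 0.6]])
state = np.array([1.0, 0.5])     # [k_t, m_t]

# Iterate the system n periods ahead: A^n times the initial state.
for n in (1, 2, 10, 50, 200):
    print(n, np.linalg.matrix_power(A, n) @ state)
# With these invented values the state shrinks toward zero as n grows.
```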
Given an $n \times n$ matrix A, a scalar $\lambda$ is called an eigenvalue or characteristic
value of A if there exists a nonzero vector $x \in \mathbb{R}^n$ (called the eigenvector or
characteristic vector) such that:
$$
Ax = \lambda x
$$
Here are some alternative characterizations of an eigenvalue.

Theorem: The following statements are equivalent:
1. $\lambda$ is an eigenvalue of A.
2. $(A - \lambda I)x = 0$ has a solution other than $x = 0$.
3. $A - \lambda I$ is singular.
4. $|A - \lambda I| = 0$.
Matrices often have multiple eigenvalues. In fact, almost all $n \times n$ matrices have n
distinct eigenvalues. Given that a matrix A has m eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$ (counted
with their multiplicities), the following are true:
$$
\sum_{i=1}^{m} \lambda_i = \operatorname{tr}(A) \qquad \text{and} \qquad \prod_{i=1}^{m} \lambda_i = |A|
$$
The large capital pi indicates a product over a bunch of variables, just as a large capital
sigma indicates a sum. We can make a few observations (indirectly) from these
properties. First, a square matrix A is invertible if and only if zero is not an
eigenvalue of A. Second, if $\lambda$ is an eigenvalue of A and A is invertible, then $\lambda^{-1}$ is an
eigenvalue of $A^{-1}$. Third, if A and B are both $n \times n$ matrices, then the eigenvalues of
AB are the same as those of BA.
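These properties are easy to verify numerically; the sketch below uses random matrices, so the particular numbers are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))
B = rng.normal(size=(4, 4))

lam = np.linalg.eigvals(A)
print(np.isclose(lam.sum(), np.trace(A)))          # sum of eigenvalues = tr(A)
print(np.isclose(lam.prod(), np.linalg.det(A)))    # product of eigenvalues = det(A)

# Eigenvalues of AB and BA agree (up to ordering).
print(np.allclose(np.sort_complex(np.linalg.eigvals(A @ B)),
                  np.sort_complex(np.linalg.eigvals(B @ A))))

# If A is invertible, the eigenvalues of A^{-1} are the reciprocals of those of A.
print(np.allclose(np.sort_complex(np.linalg.eigvals(np.linalg.inv(A))),
                  np.sort_complex(1 / lam)))
```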
The fourth characterization of an eigenvalue is usually the easiest to work with in
order to solve for them. In the $2 \times 2$ case, we have that:
$$
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
\quad\Rightarrow\quad
|A - \lambda I| = 0
\quad\Leftrightarrow\quad
\begin{vmatrix} a - \lambda & b \\ c & d - \lambda \end{vmatrix} = 0
$$
This means that the eigenvalues are roots of the quadratic equation:
$$
(a - \lambda)(d - \lambda) - cb = 0 \quad\Rightarrow\quad \lambda^2 - (a + d)\lambda + (ad - cb) = 0
$$
Consulting Sydsæter, Strøm, and Berck (page seven!) for the quadratic formula, we
can write the solutions to the eigenvalue problem as:
$$
\lambda = \tfrac{1}{2}\left( \operatorname{tr}(A) \pm \sqrt{\big(\operatorname{tr}(A)\big)^2 - 4 \det(A)} \right)
$$
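Here is a quick sketch checking this formula against numpy's eigenvalue routine on an arbitrary $2 \times 2$ example.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])

tr, det = np.trace(A), np.linalg.det(A)
roots = 0.5 * (tr + np.array([1, -1]) * np.sqrt(tr ** 2 - 4 * det))
print(np.sort(roots))                     # from the quadratic formula above
print(np.sort(np.linalg.eigvals(A)))      # numpy's routine agrees
```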
This leads to a problem: eigenvalues are not necessarily real numbers, since the
term under the radical is not necessarily positive. What we are interested in is the
modulus of any complex eigenvalues.
If we have a complex number $z = x + yi$, where x and y are real scalars, the modulus
or magnitude of z is defined as $|z| = \sqrt{x^2 + y^2}$. This is like the length of the vector z
in the complex plane. If a number is strictly real, then its modulus is its absolute
value. All of this is needed for the following result, which is important for testing stability.
Theorem: All eigenvalues of a square matrix A have moduli strictly less than one if and
only if $A^n \to 0$ as $n \to \infty$.
Corollary: If $\det A \geq 1$ or $\det A \leq -1$, then $A^n$ does not converge to zero.
This second result matches the scalar case: when x is a scalar and $|x| \geq 1$, $x^n$ does not
converge to zero. Again, we see some relationship between the determinant of a matrix and
the absolute value of a scalar.
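The theorem is easy to see in action. The sketch below uses an invented matrix and compares the largest eigenvalue modulus to the behavior of $A^n$ for large n.

```python
import numpy as np

A = np.array([[0.5, 0.3],
              [-0.2, 0.7]])   # invented example with complex eigenvalues

moduli = np.abs(np.linalg.eigvals(A))      # modulus handles complex eigenvalues too
print(moduli.max())                        # the largest modulus is below one...
print(np.linalg.matrix_power(A, 200))      # ...so A^n is essentially zero for large n
```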
The last thing on the agenda for this lecture is to do a problem working with
matrices. You will have to do this frequently in econometrics—in fact, this example
is the fundamental principle behind linear regressions.
Somewhere out in the world, there is a relationship between some dependent
variable $y_i$ and some other variables. We would like to describe this relationship as
an affine function of m variables and a constant, more or less:
$$
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_m x_{im} + e_i
$$
Because we have $n > m$ observations, we can't exactly solve this system of equations
(we have more equations than unknowns). Therefore, we'll have to say that some of
each value of $y_i$ is explained by some "outside, unobservable things" captured in the
term $e_i$ that have absolutely no bearing on anything we're actually interested in. We
observe the values of each of the $x_{ij}$ and the $y_i$. The question is to find values of $\beta_k$
so that we blame as little as possible of the outcome on the "unobserved stuff" in the
$e_i$. First, though, let's write this problem in matrix form:
$$
\underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}}_{n \times 1}
=
\underbrace{\begin{bmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1m} \\
1 & x_{21} & x_{22} & \cdots & x_{2m} \\
\vdots & \vdots & & & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{nm}
\end{bmatrix}}_{n \times (m+1)}
\underbrace{\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_m \end{bmatrix}}_{(m+1) \times 1}
+
\underbrace{\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}}_{n \times 1}
$$
Then I rewrite this as:
$$
Y_{n \times 1} = X_{n \times (m+1)}\, \beta_{(m+1) \times 1} + e_{n \times 1}
$$
The problem will be that we want to find the value of beta that minimizes the size of
the vector e. Recall that for a vector $z \in \mathbb{R}^k$, its size or length is defined as:
$$
\|z\| = \sqrt{z_1^2 + z_2^2 + \cdots + z_k^2} = \sqrt{z'z}
$$
Essentially, the problem is to solve:
$$
\min_{\beta \in \mathbb{R}^{m+1}} \sqrt{e'e}
$$
And thus begins our first exercise in working with matrices.
First of all, I remember that minimizing a function is the same thing as minimizing a
strictly increasing transformation of that function, so I square the objective. Also, I
make a substitution.
$$
\min_{\beta \in \mathbb{R}^{m+1}} (e'e) = \min_{\beta \in \mathbb{R}^{m+1}} \big[ (Y - X\beta)'(Y - X\beta) \big]
$$
Then I multiply out the term in parentheses, keeping in mind that the usual formula
for squaring a term doesn’t work for matrices (since matrix multiplication is not
commutative, right?).
$$
\min_{\beta \in \mathbb{R}^{m+1}} \big( Y'Y - Y'X\beta - \beta'X'Y + \beta'X'X\beta \big)
$$
Since I have an objective function that I want to minimize with respect to the vector
beta, I am going to take a derivative and set it equal to zero.
Here is a rule for taking the derivative of a linear function of a vector, when the
vector is transposed:
$$
\frac{\partial\, (z'M)}{\partial z} = \frac{\partial\, (M'z)}{\partial z} = M'
$$
Then the first-order condition for this problem is to find $\beta$ to solve:
$$
\frac{\partial\, (e'e)}{\partial \beta} = -Y'X - Y'X + \beta'X'X + \beta'X'X = 0
$$
Collecting terms and transposing, we want to find:
$$
-2X'Y + 2X'X\beta = 0
$$
$$
X'X\beta = X'Y
$$
$$
(X'X)^{-1}(X'X)\beta = (X'X)^{-1}(X'Y)
$$
$$
\beta = (X'X)^{-1}(X'Y)
$$
With a little matrix algebraic manipulation, we have derived the expression for the
estimator of the coefficients in the least-squares linear regression model. At least, we've
found a critical point; the proof that this is indeed a minimum is left to the reader.
In some ways, working with systems of equations written in matrix form is only a bit
more complicated than working with a single variable.
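To close the loop, here is a short sketch that generates fake data and computes the coefficients by the formula just derived, then compares the answer against numpy's built-in least-squares routine; all parameter values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n, m = 100, 2

# Fake data: y = 1.0 + 2.0*x1 - 3.0*x2 + noise
X = np.column_stack([np.ones(n), rng.normal(size=(n, m))])   # n x (m+1), first column of ones
beta_true = np.array([1.0, 2.0, -3.0])
y = X @ beta_true + 0.1 * rng.normal(size=n)

# The estimator derived above: beta = (X'X)^{-1} X'Y, computed via a linear solve
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)

# Same answer from numpy's least-squares routine.
print(np.linalg.lstsq(X, y, rcond=None)[0])
```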