XI. LINEAR ALGEBRA: MATRICES FOR DERIVATIVES

Fall 2007 math class notes, page 80

I always think of matrices as compact ways of expressing systems of equations, derivatives, observations, and such. Matrix algebra provides the tools for handling these large systems very efficiently, including matrices of derivatives.

In the simplest case, think of a real-valued function of $n$ variables, $f : \mathbb{R}^n \to \mathbb{R}^1$. It has $n$ partial derivatives. The column vector consisting of all these partial derivatives is the gradient vector,

\[
\nabla_x f(x) = \begin{bmatrix} \partial f / \partial x_1 \\ \partial f / \partial x_2 \\ \vdots \\ \partial f / \partial x_n \end{bmatrix}
\]

We talk about gradients only for functions into $\mathbb{R}^1$. More generally, the matrix of first derivatives of a function $f : \mathbb{R}^n \to \mathbb{R}^m$, also called the Jacobian matrix, is:

\[
f(x) = \begin{bmatrix} f_1(x) \\ f_2(x) \\ \vdots \\ f_m(x) \end{bmatrix}
\implies
D_x f(x) = \begin{bmatrix}
\frac{\partial f_1}{\partial x_1} & \frac{\partial f_1}{\partial x_2} & \cdots & \frac{\partial f_1}{\partial x_n} \\
\frac{\partial f_2}{\partial x_1} & \frac{\partial f_2}{\partial x_2} & \cdots & \frac{\partial f_2}{\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial f_m}{\partial x_1} & \frac{\partial f_m}{\partial x_2} & \cdots & \frac{\partial f_m}{\partial x_n}
\end{bmatrix}
= \begin{bmatrix} (\nabla_x f_1(x))' \\ (\nabla_x f_2(x))' \\ \vdots \\ (\nabla_x f_m(x))' \end{bmatrix}
\]

The matrix of first derivatives is $m \times n$. Going back to the case of $\mathbb{R}^1$-valued functions, you will often see, instead of gradients, the "row vector" (and to those who claim there is no such thing, I say "$1 \times n$ matrix") of partial derivatives:

\[
D_x f(x) = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \frac{\partial f}{\partial x_2} & \cdots & \frac{\partial f}{\partial x_n} \end{bmatrix} = [\nabla_x f(x)]'
\]

The only difference is geometric orientation.

Example: Let $x(p, w) = (x_1(p_1, p_2, w), x_2(p_1, p_2, w)) = \left( \frac{\alpha w}{p_1}, \frac{(1 - \alpha) w}{p_2} \right)$. What is $D_p x(p, w)$?

Example: Now let $x(p, w) = \left( \frac{w}{p_1} - 1, \frac{p_1}{p_2} \right)$. What is $D_p x(p, w)$?

The second important matrix in theory classes is the matrix of second derivatives and cross-partials of a real-valued function of $n$ variables, $f : \mathbb{R}^n \to \mathbb{R}^1$. The Hessian matrix is the $n \times n$ matrix whose $ij$-th entry is $\partial^2 f / \partial x_i \partial x_j$. On the diagonal of this matrix, we have second derivatives; off the diagonal, we have cross-partials. Since $\partial^2 f / \partial x_i \partial x_j = \partial^2 f / \partial x_j \partial x_i$ (whenever the second partials are continuous), the $ij$-th entry of the matrix is the same as the $ji$-th entry, so the Hessian matrix is always symmetric.

\[
\nabla_x f(x) = \begin{bmatrix} \frac{\partial f}{\partial x_1} \\ \vdots \\ \frac{\partial f}{\partial x_n} \end{bmatrix}
\implies
D^2 f(x) \equiv D_{xx} f(x) = \begin{bmatrix}
\frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\
\frac{\partial^2 f}{\partial x_1 \partial x_2} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\frac{\partial^2 f}{\partial x_1 \partial x_n} & \frac{\partial^2 f}{\partial x_2 \partial x_n} & \cdots & \frac{\partial^2 f}{\partial x_n^2}
\end{bmatrix}
= D_x \nabla_x f(x)
\]

Hessian matrices are comparable to the second derivative of an $\mathbb{R}^1 \to \mathbb{R}^1$ function, and they will always be used to test the concavity of a function of more than one variable. You should get very accustomed to finding Hessians.

Example: $F(K, L) = A K^{\alpha} L^{1 - \alpha}$. Find $D^2 F(K, L)$ with respect to all the inputs.

Example: $U(x_1, x_2) = \left( x_1^{\rho} + x_2^{\rho} \right)^{1/\rho}$. Find $D^2_{xx} U(x_1, x_2)$.

Example: $U(x_1, x_2, x_3) = x_1^{\alpha} x_2^{\beta} x_3^{\gamma}$ and $x(p, w) = (\alpha + \beta + \gamma)^{-1} \left( \frac{\alpha w}{p_1}, \frac{\beta w}{p_2}, \frac{\gamma w}{p_3} \right)$. Find $V(p, w) = U(x(p, w))$ and then $D^2_{pp} V(p, w)$.

Example: $U(x_1, x_2) = x_1 + \ln x_2$ and $x(p, w) = \left( \frac{w}{p_1} - 1, \frac{p_1}{p_2} \right)$. Find $V(p, w)$ and then $D^2_{pp} V(p, w)$.

The test for concavity or convexity of a function of one variable was whether the second derivative was negative or positive. We see that for functions of more than one variable, there is a matrix of second derivatives. How do we identify whether a matrix is positive or negative? The answer is not to look simply at each of the $\partial^2 f / \partial x_i^2$. It is also not sufficient to ensure that each individual element is negative.

Example: Is the function $f(x, y) = x^2 y^2$ convex? On first inspection, it looks like the product of two convex functions. It's convex in each variable individually; that is, both (own) second derivatives are nonnegative. The Hessian matrix of this function is:

\[
D^2 f(x, y) = \begin{bmatrix} 2y^2 & 4xy \\ 4xy & 2x^2 \end{bmatrix}
\]

At any point with $x, y > 0$, all the elements of this matrix are positive. And yet, consider this: at two points on the axes, $f(0, 1) = f(1, 0) = 0$.
At a convex combination of those two points, we find that $\tfrac{1}{2} f(0, 1) + \tfrac{1}{2} f(1, 0) = 0 < \tfrac{1}{16} = f(\tfrac{1}{2}, \tfrac{1}{2})$, which violates the inequality that defines convexity. This is not a convex function. (It looks like a giant scalloped bowl.)

This demonstrates that we need a new way to define positive and negative for matrices, at least as far as second-derivative tests are concerned.

Definition: An $n \times n$ matrix $A$ is positive semidefinite if for all vectors $x \in \mathbb{R}^n$, the number $x'Ax \geq 0$.

Definition: An $n \times n$ matrix $A$ is negative semidefinite if for all vectors $x \in \mathbb{R}^n$, the number $x'Ax \leq 0$.

Definition: An $n \times n$ matrix $A$ is positive definite if for all vectors $x \in \mathbb{R}^n$, $x \neq 0$, the number $x'Ax > 0$.

Definition: An $n \times n$ matrix $A$ is negative definite if for all vectors $x \in \mathbb{R}^n$, $x \neq 0$, the number $x'Ax < 0$.

Any matrix that is not one of these is called indefinite. Note that if $A$ is a scalar (which is really just a $1 \times 1$ matrix), these definitions correspond to the usual definitions of weakly or strictly positive or negative, since $x'Ax = A x^2$ and multiplying by $x^2$ changes the sign of nothing, provided $x \neq 0$.

A twice-differentiable function is concave if and only if its Hessian matrix is negative semidefinite everywhere. If its Hessian matrix is negative definite everywhere, the function is strictly concave; this condition is sufficient but not necessary (for instance, $f(x) = -x^4$ is strictly concave even though $f''(0) = 0$). Whenever we are maximizing a function of more than one variable, we must find the Hessian matrix of the function and confirm that it is negative semidefinite in order to ensure that we have found a maximum.

The definitions are generally hard to work with when doing these tests. There is a set of rules for determining the sign and definiteness of a matrix. If we have an $n \times n$ matrix $A$, a $k$-th order principal submatrix of $A$ is a matrix that results from deleting $n - k$ rows and the same $n - k$ columns from $A$. If we have a $3 \times 3$ matrix,

\[
A = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\]

then we can form three second order principal submatrices: by deleting the first row and first column, by deleting the second row and second column, and by deleting the third row and third column:

\[
\begin{bmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{bmatrix}, \qquad
\begin{bmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{bmatrix}, \qquad
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\]

The leading principal submatrices of $A$ are only those principal submatrices formed by deleting the last $n - k$ rows and columns of the matrix. For the $3 \times 3$ matrix described above, the first, second, and third order leading principal submatrices are:

\[
\begin{bmatrix} a_{11} \end{bmatrix}, \qquad
\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}, \qquad \text{and} \qquad
\begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}
\]

The determinant of a $k$-th order principal submatrix is called a $k$-th order principal minor. Here is the relationship between the principal minors and the sign and definiteness of a symmetric matrix:

Theorem: A symmetric $n \times n$ matrix $A$ is positive definite if and only if all its $n$ leading principal minors are strictly positive.

Theorem: A symmetric $n \times n$ matrix $A$ is positive semidefinite if and only if all its principal minors (not just leading!) are nonnegative.

Theorem: A symmetric $n \times n$ matrix $A$ is negative definite if and only if its $n$ leading principal minors alternate in sign, with the sign of the $k$-th order leading principal minor equal to $(-1)^k$.

Theorem: A symmetric $n \times n$ matrix $A$ is negative semidefinite if and only if all its principal minors (not just leading!) of order $k$ equal zero or have the sign of $(-1)^k$.

This is what you have to do to test the concavity or convexity of a function of several variables. Finding principal submatrices of two-by-two and three-by-three matrices isn't terribly difficult. Once it gets to four, five, or more dimensions, it's a real pain.
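The principal-minor tests above are tedious by hand but mechanical, so they are easy to script. What follows is a minimal sketch, assuming NumPy is available; the function names (`principal_minors`, `leading_principal_minors`, `classify`) and the tolerance `tol` are illustrative choices, not part of the notes. It applies the four theorems in order to a symmetric matrix.

```python
from itertools import combinations

import numpy as np


def principal_minors(A, k):
    """All k-th order principal minors: keep the same k rows and columns."""
    n = A.shape[0]
    return [np.linalg.det(A[np.ix_(idx, idx)]) for idx in combinations(range(n), k)]


def leading_principal_minors(A):
    """Determinants of the upper-left 1x1, 2x2, ..., nxn blocks."""
    n = A.shape[0]
    return [np.linalg.det(A[:k, :k]) for k in range(1, n + 1)]


def classify(A, tol=1e-12):
    """Classify a symmetric matrix with the principal-minor theorems.

    tol is a hypothetical numerical tolerance for treating a minor as zero.
    """
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    lead = leading_principal_minors(A)
    # Definite cases only need the n leading principal minors.
    if all(d > tol for d in lead):
        return "positive definite"
    if all((-1) ** k * d > tol for k, d in enumerate(lead, start=1)):
        return "negative definite"
    # Semidefinite cases need every principal minor of every order.
    minors = {k: principal_minors(A, k) for k in range(1, n + 1)}
    if all(d >= -tol for ds in minors.values() for d in ds):
        return "positive semidefinite"
    if all((-1) ** k * d >= -tol for k, ds in minors.items() for d in ds):
        return "negative semidefinite"
    return "indefinite"


# Hessian of f(x, y) = x^2 y^2 evaluated at (1, 1): all entries positive.
hessian_x2y2 = np.array([[2.0, 4.0], [4.0, 2.0]])
print(classify(hessian_x2y2))

# Hessian of f(x, y) = x^2 + y^2 + xy (constant everywhere).
hessian_quadratic = np.array([[2.0, 1.0], [1.0, 2.0]])
print(classify(hessian_quadratic))
```

The first matrix has only positive entries, yet its determinant is $4 - 16 = -12$, so the test reports it as indefinite, in line with the $x^2 y^2$ counterexample. For larger matrices the number of principal minors grows quickly, which is the "real pain" the text mentions.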
Unfortunately, there's not really a better way to determine concavity. Only a particularly sadistic professor would ask you to test the concavity of a function of more than three variables. Here are the simple rules for the two- and three-variable cases (just for concavity).

Suppose your Hessian matrix is $2 \times 2$:

\[
A = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}
\]

To show that the function is strictly concave, you need to show that:

\[
a_{11} < 0 \quad \text{and} \quad \begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} > 0
\]

For just plain concavity, you need to show that both of those hold with weak inequality, and additionally that $a_{22} \leq 0$.

For a function of three variables, you have a $3 \times 3$ Hessian matrix. For strict concavity, you need to confirm that the leading principal minors have the right signs:

\[
a_{11} < 0, \qquad
\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} > 0, \qquad \text{and} \qquad
\begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} < 0
\]

For plain concavity, you need to show that all the second derivatives are nonpositive:

\[
a_{11} \leq 0, \qquad a_{22} \leq 0, \qquad a_{33} \leq 0
\]

And that all the second-order principal minors are nonnegative:

\[
\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} \geq 0, \qquad
\begin{vmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{vmatrix} \geq 0, \qquad
\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix} \geq 0
\]

Finally, check that the determinant of the matrix itself is nonpositive.

Example: Let $f(x, y) = x^2 y^2$. Check for concavity or convexity (strict or otherwise).

Example: Let $f(x, y) = xy$. Check for concavity or convexity (strict or otherwise).

Example: Let $f(x, y) = x^2 + y^2 + xy$. Check for concavity or convexity (strict or otherwise).

Example: Let $F(K, L) = A K^{\alpha} L^{1 - \alpha}$. Find the matrix $D^2 F(K, L)$ and check for concavity or strict concavity.

Example: Let $U(x_1, x_2) = \left( x_1^{\rho} + x_2^{\rho} \right)^{1/\rho}$. Find the matrix $D^2_{xx} U(x_1, x_2)$ and check for concavity or strict concavity.

Example: Let $U(x_1, x_2, x_3) = x_1^{\alpha} x_2^{\beta} x_3^{\gamma}$ and $x(p, w) = \left( \frac{\alpha w}{p_1 (\alpha + \beta + \gamma)}, \frac{\beta w}{p_2 (\alpha + \beta + \gamma)}, \frac{\gamma w}{p_3 (\alpha + \beta + \gamma)} \right)$. Find the matrix $D^2_{pp} V(p, w)$ and check for concavity or strict concavity.

Example: Let $U(x_1, x_2) = x_1 + \ln x_2$ and $x(p, w) = \left( \frac{w}{p_1} - 1, \frac{p_1}{p_2} \right)$. Find the matrix $D^2_{pp} V(p, w)$ and check for concavity or strict concavity.

Okay, enough about concavity. Let's talk briefly about eigenvalues and eigenvectors, which will be useful for checking the stability of systems of differential (or difference) equations. In macro, you might have a matrix that describes how several variables in the economy evolve, for instance:

\[
\begin{bmatrix} k_{t+1} \\ m_{t+1} \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} k_t \\ m_t \end{bmatrix}
\]

where $k_t$ and $k_{t+1}$ describe the capital stock of the country at time $t$ and at $t+1$; $m_t$ and $m_{t+1}$ are real money balances. The matrix $A$ consists of constants or, as likely as not, linear first-order approximations of some functions (remember the Taylor series?). If we want to find the values of these two variables two years in the future, we would use the formula iteratively:

\[
\begin{bmatrix} k_{t+2} \\ m_{t+2} \end{bmatrix}
= \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} k_{t+1} \\ m_{t+1} \end{bmatrix}
= \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \begin{bmatrix} k_t \\ m_t \end{bmatrix}
\]

And then to find the value of the variables $n$ years into the future we just repeat this multiplication $n$ times:

\[
\begin{bmatrix} k_{t+n} \\ m_{t+n} \end{bmatrix}
= \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix}^n \begin{bmatrix} k_t \\ m_t \end{bmatrix}
= A^n \begin{bmatrix} k_t \\ m_t \end{bmatrix}
\]

Does this settle down at some point, or do the variables keep growing forever? Remember from earlier that if we multiply any $x \in (-1, 1)$ by itself a number of times, it gets really small, and:

\[
x^n \to 0 \quad \text{as} \quad n \to \infty
\]

In macroeconomics we might be wondering something very similar, except that the number $x$ has been replaced with a matrix $A$. In order to see whether this matrix converges or not, we have to look at its eigenvalues.

Given an $n \times n$ matrix $A$, a scalar $\lambda$ is called an eigenvalue or characteristic value of $A$ if there exists a nonzero vector $x \in \mathbb{R}^n$ (called the eigenvector or characteristic vector) such that:

\[
Ax = \lambda x
\]

Here are some alternative characterizations of an eigenvalue.

Theorem: The following statements are equivalent:
1. $\lambda$ is an eigenvalue of $A$.
2. $(A - \lambda I) x = 0$ has a solution other than $x = 0$.
3. $A - \lambda I$ is singular.
4. $|A - \lambda I| = 0$.

Matrices often have multiple eigenvalues. In fact, almost all $n \times n$ matrices have $n$ distinct eigenvalues. Given that an $n \times n$ matrix $A$ has eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_n$ (counted with multiplicity, and possibly complex), the following are true:

\[
\sum_{i=1}^{n} \lambda_i = \operatorname{tr}(A) \qquad \text{and} \qquad \prod_{i=1}^{n} \lambda_i = |A|
\]

The large capital pi indicates the product over a bunch of variables, just as a large capital sigma indicates the sum. We can make a few observations (indirectly) from these properties. First, a square matrix $A$ is invertible if and only if zero is not an eigenvalue of $A$. Second, if $\lambda$ is an eigenvalue of $A$ and $A$ is invertible, then $\lambda^{-1}$ is an eigenvalue of $A^{-1}$. Third, if $A$ and $B$ are both $n \times n$ matrices, then the eigenvalues of $AB$ are the same as those of $BA$.

The fourth characterization of an eigenvalue is usually the easiest to work with in order to solve for them. In the $2 \times 2$ case, we have that:

\[
A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}
\implies |A - \lambda I| = 0
\implies \begin{vmatrix} a - \lambda & b \\ c & d - \lambda \end{vmatrix} = 0
\]

This means that the eigenvalues are roots of the quadratic equation:

\[
(a - \lambda)(d - \lambda) - cb = 0 \iff \lambda^2 - (a + d) \lambda + (ad - cb) = 0
\]

Consulting Sydsæter, Strøm, and Berck (page seven!) for the quadratic formula, we can write the solutions to the eigenvalue problem as:

\[
\lambda = \tfrac{1}{2} \left( \operatorname{tr}(A) \pm \sqrt{ (\operatorname{tr}(A))^2 - 4 \det(A) } \right)
\]

This leads to a problem: eigenvalues are not necessarily real numbers, since the term under the radical is not necessarily positive. What we are interested in is the modulus of any complex eigenvalues. If we have a complex number $z = x + yi$, where $x$ and $y$ are real scalars, the modulus or magnitude of $z$ is defined as $|z| = \sqrt{x^2 + y^2}$. This is like the length of the vector $z$ in the complex plane. If a number is strictly real, then its modulus is its absolute value.

All of this stuff is necessary for this result, important for testing stability.

Theorem: All eigenvalues of a square matrix $A$ have moduli strictly less than one if and only if $A^n \to 0$ as $n \to \infty$.
Corollary: If $|\det(A)| \geq 1$ (that is, $\det(A) \geq 1$ or $\det(A) \leq -1$), then $A^n$ does not converge to zero.

This second result mirrors the scalar case: when $x$ is a scalar and $|x| \geq 1$, then $x^n$ does not converge to zero. Again, we see some relationship between the determinant of a matrix and the absolute value of a scalar.

The last thing on the agenda for this lecture is to do a problem working with matrices. You will have to do this frequently in econometrics; in fact, this example is the fundamental principle behind linear regressions. Somewhere out in the world, there is a relationship between some dependent variable $y_i$ and some other variables. We would like to describe this relationship as an affine function of $m$ variables and a constant, more or less:

\[
y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_m x_{im} + e_i
\]

Because we have $n > m + 1$ observations, we can't exactly solve this system of equations (we have more equations than unknowns). Therefore, we'll have to say that some of each value of $y_i$ is explained by some "outside, unobservable things" captured in the term $e_i$ that have absolutely no bearing on anything we're actually interested in. We observe the values of each of the $x_{ij}$ and the $y_i$. The question is to find values of the $\beta_k$ so that we blame as little of the outcome as possible on the "unobserved stuff" $e_i$. First, though, let's write this problem in matrix form:

\[
\underbrace{\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}}_{n \times 1}
=
\underbrace{\begin{bmatrix}
1 & x_{11} & x_{12} & \cdots & x_{1m} \\
1 & x_{21} & x_{22} & \cdots & x_{2m} \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_{n1} & x_{n2} & \cdots & x_{nm}
\end{bmatrix}}_{n \times (m+1)}
\underbrace{\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_m \end{bmatrix}}_{(m+1) \times 1}
+
\underbrace{\begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix}}_{n \times 1}
\]

Then I rewrite this as:

\[
Y_{n \times 1} = X_{n \times (m+1)} \, \beta_{(m+1) \times 1} + e_{n \times 1}
\]

The problem will be that we want to find the value of beta that minimizes the size of the vector $e$. Recall that for a vector $z \in \mathbb{R}^k$, its size or length is defined as:

\[
\| z \| = \sqrt{ z_1^2 + z_2^2 + \ldots + z_k^2 } = \sqrt{ z'z }
\]

Essentially, the problem is to solve:

\[
\min_{\beta \in \mathbb{R}^{m+1}} \sqrt{ e'e }
\]

And thus begins our first exercise in working with matrices.
First of all, I remember that minimizing a function is the same thing as minimizing a strictly monotonic transformation of that function, so I square the objective. Also, I make a substitution.

\[
\min_{\beta \in \mathbb{R}^{m+1}} (e'e) = \min_{\beta \in \mathbb{R}^{m+1}} \left[ (Y - X\beta)'(Y - X\beta) \right]
\]

Then I multiply out the term in parentheses, keeping in mind that the usual formula for squaring a term doesn't work for matrices (since matrix multiplication is not commutative, right?).

\[
\min_{\beta \in \mathbb{R}^{m+1}} \left( Y'Y - Y'X\beta - \beta'X'Y + \beta'X'X\beta \right)
\]

Since I have an objective function that I want to minimize with respect to the vector beta, I am going to take a derivative and set it equal to zero. Here is a rule for taking the derivative of a linear function of a vector, when the vector is transposed. Because $z'M$ is a scalar, it equals its own transpose $M'z$, so:

\[
\frac{\partial}{\partial z} (z'M) = \frac{\partial}{\partial z} (M'z) = M'
\]

The quadratic term $\beta'X'X\beta$ is handled with the product rule, which contributes $\beta'X'X$ twice. Then the first order condition for this problem is to find $\beta$ to solve:

\[
\frac{\partial (e'e)}{\partial \beta} = -Y'X - Y'X + \beta'X'X + \beta'X'X = 0
\]

Collecting terms and transposing, we want to find:

\[
\begin{aligned}
-2X'Y + 2X'X\beta &= 0 \\
X'X\beta &= X'Y \\
(X'X)^{-1}(X'X)\beta &= (X'X)^{-1}(X'Y) \\
\beta &= (X'X)^{-1}(X'Y)
\end{aligned}
\]

With a little matrix algebraic manipulation (and assuming $X'X$ is invertible), we have derived the expression for the estimator of the coefficients in the least-squares linear regression model. At least, we've found a critical point; the proof that this is indeed a minimum is left to the reader. In some ways, working with systems of equations written in matrix form is only a bit more complicated than working with a single variable.
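To see the formula $\beta = (X'X)^{-1}(X'Y)$ at work, here is a small numerical sketch, assuming NumPy. The simulated data, the "true" coefficients `beta_true`, the noise scale, and the sample sizes are all invented for illustration. It solves the normal equations $X'X\beta = X'Y$ directly and compares the answer with NumPy's own least-squares routine.

```python
import numpy as np

# Simulated data (hypothetical): n observations, m regressors plus a constant.
rng = np.random.default_rng(0)
n, m = 200, 3
regressors = rng.normal(size=(n, m))
X = np.column_stack([np.ones(n), regressors])  # n x (m+1), first column of ones
beta_true = np.array([1.0, 2.0, -0.5, 0.25])   # made-up coefficients
Y = X @ beta_true + 0.1 * rng.normal(size=n)   # Y = X beta + e

# Solve the normal equations X'X beta = X'Y.
# np.linalg.solve avoids forming the inverse of X'X explicitly.
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Cross-check against NumPy's least-squares solver.
beta_lstsq, *_ = np.linalg.lstsq(X, Y, rcond=None)

residuals = Y - X @ beta_hat
print("beta_hat:", beta_hat)
print("matches lstsq:", np.allclose(beta_hat, beta_lstsq))
print("X'e (should be numerically zero):", X.T @ residuals)
```

The last line checks the first order condition in its transposed form: at the solution, $X'(Y - X\beta) = 0$, so the residuals are orthogonal to every column of $X$. Solving the linear system instead of computing $(X'X)^{-1}$ is a numerical convenience; the result is the same estimator.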