Lecture XXVI

• The material for this lecture is found in James R. Schott, Matrix Analysis for Statistics (New York: John Wiley & Sons, Inc., 1997).

• A matrix A of size m × n is an m × n rectangular array of scalars:

\[
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix}
\]

• It is sometimes useful to partition matrices into column vectors $a_{\cdot j}$ or row vectors $a_{i \cdot}$:

\[
A = \begin{pmatrix} a_{\cdot 1} & a_{\cdot 2} & \cdots & a_{\cdot n} \end{pmatrix}
  = \begin{pmatrix} a_{1 \cdot} \\ a_{2 \cdot} \\ \vdots \\ a_{m \cdot} \end{pmatrix},
\qquad
a_{\cdot j} = \begin{pmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{pmatrix}
\quad \text{or} \quad
a_{i \cdot} = \begin{pmatrix} a_{i1} & a_{i2} & \cdots & a_{in} \end{pmatrix}
\]

• The sum of two identically dimensioned matrices can be expressed as

\[
A + B = \left[ a_{ij} + b_{ij} \right]
\]

• In order to multiply a matrix by a scalar, multiply each element of the matrix by the scalar.

• In order to discuss matrix multiplication, we first discuss vector multiplication. Two vectors x and y can be multiplied together to form z (z = xy) only if they are conformable. If x is of order 1 × n and y is of order n × 1, then the vectors are conformable and the multiplication becomes:

\[
z = xy = \sum_{i=1}^{n} x_i y_i
\]

• Extending this discussion to matrices, two matrices A and B can be multiplied if they are conformable. If A is of order k × n and B is of order n × l, then the matrices are conformable. Using the partitioned matrix above, we have

\[
C = AB
= \begin{pmatrix} a_{1 \cdot} \\ a_{2 \cdot} \\ \vdots \\ a_{k \cdot} \end{pmatrix}
  \begin{pmatrix} b_{\cdot 1} & b_{\cdot 2} & \cdots & b_{\cdot l} \end{pmatrix}
= \begin{pmatrix}
a_{1 \cdot} b_{\cdot 1} & a_{1 \cdot} b_{\cdot 2} & \cdots & a_{1 \cdot} b_{\cdot l} \\
a_{2 \cdot} b_{\cdot 1} & a_{2 \cdot} b_{\cdot 2} & \cdots & a_{2 \cdot} b_{\cdot l} \\
\vdots & \vdots & \ddots & \vdots \\
a_{k \cdot} b_{\cdot 1} & a_{k \cdot} b_{\cdot 2} & \cdots & a_{k \cdot} b_{\cdot l}
\end{pmatrix}
\]

• Theorem 1.1 Let a and b be scalars and A, B, and C be matrices. Then when the operations involved are defined, the following properties hold:
◦ $A + B = B + A$.
◦ $(A + B) + C = A + (B + C)$.
◦ $a(A + B) = aA + aB$.
◦ $(a + b)A = aA + bA$.
◦ $A - A = A + (-A) = (0)$.
◦ $A(B + C) = AB + AC$.
◦ $(A + B)C = AC + BC$.
◦ $(AB)C = A(BC)$.

• The transpose of an m × n matrix is an n × m matrix with the rows and columns interchanged. The transpose of A is denoted A'.

• Theorem 1.2 Let a and b be scalars and A and B be matrices. Then when defined, the following hold:
◦ $(aA)' = aA'$.
◦ $(A')' = A$.
◦ $(aA + bB)' = aA' + bB'$.
◦ $(AB)' = B'A'$.

• The trace is a function defined as the sum of the diagonal elements of a square matrix:

\[
\operatorname{tr}(A) = \sum_{i=1}^{m} a_{ii}
\]

• Theorem 1.3 Let a be a scalar and A and B be matrices. Then when the appropriate operations are defined, we have
◦ $\operatorname{tr}(A') = \operatorname{tr}(A)$.
◦ $\operatorname{tr}(aA) = a \operatorname{tr}(A)$.
◦ $\operatorname{tr}(A + B) = \operatorname{tr}(A) + \operatorname{tr}(B)$.
◦ $\operatorname{tr}(AB) = \operatorname{tr}(BA)$.
◦ $\operatorname{tr}(A'A) = 0$ if and only if $A = (0)$.

• Traces can be very useful in statistical applications. For example, the log-likelihood of n observations $y_1, \ldots, y_n$ from an m-variate normal distribution with mean $\mu$ and covariance matrix $\Omega$ can be written as:

\[
\Lambda_n(\mu, \Omega) = -\tfrac{1}{2} mn \ln(2\pi) - \tfrac{1}{2} n \ln|\Omega| - \tfrac{1}{2} \operatorname{tr}\left( \Omega^{-1} Z \right),
\qquad
Z = \sum_{i=1}^{n} (y_i - \mu)(y_i - \mu)'
\]

◦ Jan R. Magnus and Heinz Neudecker, Matrix Differential Calculus with Applications in Statistics and Econometrics (New York: John Wiley & Sons, 1988), p. 314.

• The determinant is another function of square matrices. In its most technical form, the determinant is defined as:

\[
|A| = \sum (-1)^{f(i_1, i_2, \ldots, i_m)} a_{1 i_1} a_{2 i_2} \cdots a_{m i_m}
    = \sum (-1)^{f(i_1, i_2, \ldots, i_m)} a_{i_1 1} a_{i_2 2} \cdots a_{i_m m}
\]

where the summation is taken over all permutations $(i_1, i_2, \ldots, i_m)$ of the set of integers $(1, \ldots, m)$, and the function $f(i_1, i_2, \ldots, i_m)$ equals the number of transpositions necessary to change $(i_1, i_2, \ldots, i_m)$ into $(1, 2, \ldots, m)$.

• In the simple case of a 2 × 2 matrix, we have two possibilities, (1,2) and (2,1). The second requires one transposition. Under the basic definition of the determinant:

\[
|A| = (-1)^0 a_{11} a_{22} + (-1)^1 a_{12} a_{21} = a_{11} a_{22} - a_{12} a_{21}
\]

• In the slightly more complicated case of a 3 × 3 matrix, we have six possibilities: (1,2,3), (2,1,3), (2,3,1), (3,2,1), (3,1,2), (1,3,2). Each one of these differs from the previous one by one transposition. Thus, the numbers of transpositions are 0, 1, 2, 3, 4, 5. The determinant is then defined as:

\[
\begin{aligned}
|A| &= (-1)^0 a_{11} a_{22} a_{33} + (-1)^1 a_{12} a_{21} a_{33} + (-1)^2 a_{12} a_{23} a_{31} \\
    &\quad + (-1)^3 a_{13} a_{22} a_{31} + (-1)^4 a_{13} a_{21} a_{32} + (-1)^5 a_{11} a_{23} a_{32} \\
    &= a_{11} a_{22} a_{33} - a_{12} a_{21} a_{33} + a_{12} a_{23} a_{31} - a_{13} a_{22} a_{31} + a_{13} a_{21} a_{32} - a_{11} a_{23} a_{32}
\end{aligned}
\]

• A more straightforward definition involves the expansion down a column or across a row.
◦ In order to do this, I want to introduce the concept of minors.
▪ The minor of an element in a matrix, $m_{ij}$, is the determinant of the matrix that remains after the row and column of the element are removed.
▪ The minor times negative one raised to the row number plus the column number is called the cofactor of the element, $A_{ij} = (-1)^{i+j} m_{ij}$.
◦ The determinant is then the sum of the cofactors times the elements across a particular row (as written here, for a fixed row i) or down a particular column:

\[
|A| = \sum_{j=1}^{m} a_{ij} A_{ij} = \sum_{j=1}^{m} a_{ij} (-1)^{i+j} m_{ij}
\]

• In the three by three case, expanding down the first column:

\[
|A| = a_{11} (-1)^{1+1} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}
    + a_{21} (-1)^{2+1} \begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix}
    + a_{31} (-1)^{3+1} \begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix}
\]

\[
|A| = a_{11} a_{22} a_{33} - a_{11} a_{23} a_{32} - a_{12} a_{21} a_{33} + a_{13} a_{21} a_{32} + a_{12} a_{23} a_{31} - a_{13} a_{22} a_{31}
\]

• Theorem 1.4 If a is a scalar and A is an m × m matrix, then the following properties hold:
◦ $|A'| = |A|$.
◦ $|aA| = a^m |A|$.
◦ If A is a diagonal matrix, then $|A| = a_{11} a_{22} \cdots a_{mm}$.
◦ If all elements of a row (or column) of A are zero, $|A| = 0$.
◦ If two rows (or columns) of A are proportional to one another, $|A| = 0$.
◦ The interchange of two rows (or columns) of A changes the sign of $|A|$.
◦ If all the elements of a row (or column) of A are multiplied by a, then the determinant is multiplied by a.
◦ The determinant of A is unchanged when a multiple of one row (or column) is added to another row (or column).

• Any m × m matrix A such that $|A| \neq 0$ is said to be a nonsingular matrix and possesses an inverse denoted $A^{-1}$:

\[
A A^{-1} = A^{-1} A = I_m
\]

• Theorem 1.6 If a is a nonzero scalar, and A and B are nonsingular m × m matrices, then
◦ $(aA)^{-1} = a^{-1} A^{-1}$.
◦ $(A')^{-1} = (A^{-1})'$.
◦ $(A^{-1})^{-1} = A$.
◦ $|A^{-1}| = |A|^{-1}$.
◦ If $A = \operatorname{diag}(a_{11}, \ldots, a_{mm})$, then $A^{-1} = \operatorname{diag}(a_{11}^{-1}, \ldots, a_{mm}^{-1})$.
◦ If $A = A'$, then $A^{-1} = (A^{-1})'$.
◦ $(AB)^{-1} = B^{-1} A^{-1}$.

• The most general definition of an inverse involves the adjoint matrix (denoted $A^{\#}$). The adjoint matrix of A is the transpose of the matrix of cofactors of A.
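The cofactor expansion and the adjoint defined above can be sketched in a few lines of Python. This is only an illustrative sketch, not taken from Schott: the function names (`submatrix`, `det`, `cofactor`, `adjoint`, `inverse`) and the 3 × 3 test matrix are assumptions chosen for the example, the entries are assumed to be integers, and the recursive expansion is practical only for small matrices.

```python
from fractions import Fraction


def submatrix(a, i, j):
    """Copy of a with row i and column j removed (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]


def det(a):
    """Determinant by cofactor expansion across the first row."""
    if not a:  # the empty 0 x 0 matrix has determinant 1 by convention
        return 1
    return sum((-1) ** j * a[0][j] * det(submatrix(a, 0, j))
               for j in range(len(a)))


def cofactor(a, i, j):
    """Signed minor of element (i, j)."""
    return (-1) ** (i + j) * det(submatrix(a, i, j))


def adjoint(a):
    """Transpose of the matrix of cofactors."""
    m = len(a)
    return [[cofactor(a, j, i) for j in range(m)] for i in range(m)]


def inverse(a):
    """Adjoint divided by the determinant (integer entries assumed)."""
    d = det(a)
    if d == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(x, d) for x in row] for row in adjoint(a)]


# Illustrative matrix (an assumption for this sketch).
A = [[1, 9, 5], [3, 7, 8], [2, 3, 5]]
print(det(A))      # determinant by cofactor expansion
print(adjoint(A))  # transpose of the cofactor matrix
print(inverse(A))  # exact rational entries
```

Exact `Fraction` arithmetic is used so that the inverse can be compared entry by entry with a hand calculation, without floating-point rounding.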
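• By construction of the adjoint, we know that:

\[
A A^{\#} = A^{\#} A = \operatorname{diag}(|A|, |A|, \ldots, |A|) = |A| I_m
\]

• In order to see this identity, note that

\[
a_{i \cdot} b_{\cdot i} = |A| \quad \text{where } B = A^{\#}
\]

\[
a_{j \cdot} b_{\cdot i} = 0 \quad \text{where } B = A^{\#} \text{ and } i \neq j
\]

• Focusing on the first point:

\[
\begin{aligned}
\left( A A^{\#} \right)_{11}
&= \begin{pmatrix} a_{11} & a_{12} & a_{13} \end{pmatrix}
\begin{pmatrix}
(-1)^{1+1} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} \\
(-1)^{1+2} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} \\
(-1)^{1+3} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}
\end{pmatrix} \\
&= (-1)^{1+1} a_{11} \begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix}
 + (-1)^{1+2} a_{12} \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}
 + (-1)^{1+3} a_{13} \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}
 = |A|
\end{aligned}
\]

• Given this expression, we see that

\[
A^{-1} = |A|^{-1} A^{\#}
\]

• In practice, the inverse can be computed by applying elementary row operations to the augmented matrix $[A \mid I]$ until the left block becomes the identity; the right block is then $A^{-1}$:

\[
\begin{pmatrix} 1 & 0 & 0 \\ -3 & 1 & 0 \\ -2 & 0 & 1 \end{pmatrix}
\left( \begin{array}{ccc|ccc} 1 & 9 & 5 & 1 & 0 & 0 \\ 3 & 7 & 8 & 0 & 1 & 0 \\ 2 & 3 & 5 & 0 & 0 & 1 \end{array} \right)
=
\left( \begin{array}{ccc|ccc} 1 & 9 & 5 & 1 & 0 & 0 \\ 0 & -20 & -7 & -3 & 1 & 0 \\ 0 & -15 & -5 & -2 & 0 & 1 \end{array} \right)
\]

\[
\begin{pmatrix} 1 & \tfrac{9}{20} & 0 \\ 0 & -\tfrac{1}{20} & 0 \\ 0 & -\tfrac{15}{20} & 1 \end{pmatrix}
\left( \begin{array}{ccc|ccc} 1 & 9 & 5 & 1 & 0 & 0 \\ 0 & -20 & -7 & -3 & 1 & 0 \\ 0 & -15 & -5 & -2 & 0 & 1 \end{array} \right)
=
\left( \begin{array}{ccc|ccc} 1 & 0 & \tfrac{37}{20} & -\tfrac{7}{20} & \tfrac{9}{20} & 0 \\ 0 & 1 & \tfrac{7}{20} & \tfrac{3}{20} & -\tfrac{1}{20} & 0 \\ 0 & 0 & \tfrac{1}{4} & \tfrac{1}{4} & -\tfrac{3}{4} & 1 \end{array} \right)
\]

\[
\begin{pmatrix} 1 & 0 & -\tfrac{37}{5} \\ 0 & 1 & -\tfrac{7}{5} \\ 0 & 0 & 4 \end{pmatrix}
\left( \begin{array}{ccc|ccc} 1 & 0 & \tfrac{37}{20} & -\tfrac{7}{20} & \tfrac{9}{20} & 0 \\ 0 & 1 & \tfrac{7}{20} & \tfrac{3}{20} & -\tfrac{1}{20} & 0 \\ 0 & 0 & \tfrac{1}{4} & \tfrac{1}{4} & -\tfrac{3}{4} & 1 \end{array} \right)
=
\left( \begin{array}{ccc|ccc} 1 & 0 & 0 & -\tfrac{11}{5} & 6 & -\tfrac{37}{5} \\ 0 & 1 & 0 & -\tfrac{1}{5} & 1 & -\tfrac{7}{5} \\ 0 & 0 & 1 & 1 & -3 & 4 \end{array} \right)
\]

• The rank of a matrix is the number of linearly independent rows or columns. One way to determine the rank of a general m × n matrix is to delete rows or columns until the resulting r × r matrix has a nonzero determinant; the rank is the largest such r. What is the rank of the above matrix?

• If the above matrix had been:

\[
A = \begin{pmatrix} 1 & 9 & 5 \\ 3 & 7 & 8 \\ 4 & 16 & 13 \end{pmatrix}
\]

note $|A| = 0$. Thus, to determine the rank, we delete the last row and column, leaving

\[
A_1 = \begin{pmatrix} 1 & 9 \\ 3 & 7 \end{pmatrix}, \qquad |A_1| = 7 - 27 = -20
\]

Since $|A_1| \neq 0$, the rank of this matrix is 2.

• The rank of a matrix A remains unchanged by any of the following operations, called elementary transformations:
◦ The interchange of two rows (or columns) of A.
◦ The multiplication of a row (or column) of A by a nonzero scalar.
◦ The addition of a scalar multiple of a row (or column) of A to another row (or column) of A.

• An m × 1 vector p is said to be a normalized vector or a unit vector if $p'p = 1$. The m × 1 vectors $p_1, p_2, \ldots, p_n$, where n is less than or equal to m, are said to be orthogonal if $p_i' p_j = 0$ for all i not equal to j.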
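Returning to the rank discussion above: because elementary transformations leave the rank unchanged, the rank can also be found by row reduction rather than by deleting rows and columns and checking determinants. The following is a minimal sketch of that alternative, not the procedure described in the lecture; the function name `rank` is an assumption, and exact `Fraction` arithmetic is used to avoid rounding.

```python
from fractions import Fraction


def rank(a):
    """Rank of a matrix, found by row reduction with exact arithmetic."""
    rows = [[Fraction(x) for x in row] for row in a]
    n_rows, n_cols = len(rows), len(rows[0])
    r = 0  # number of pivot rows found so far
    for c in range(n_cols):
        # Look for a row at or below r with a nonzero entry in column c.
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue
        # Elementary transformation 1: interchange two rows.
        rows[r], rows[pivot] = rows[pivot], rows[r]
        # Elementary transformation 2: multiply a row by a nonzero scalar.
        p = rows[r][c]
        rows[r] = [x / p for x in rows[r]]
        # Elementary transformation 3: add a multiple of one row to another.
        for i in range(n_rows):
            if i != r:
                f = rows[i][c]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
        if r == n_rows:
            break
    return r


print(rank([[1, 9, 5], [3, 7, 8], [2, 3, 5]]))    # nonsingular example
print(rank([[1, 9, 5], [3, 7, 8], [4, 16, 13]]))  # singular example
```

The number of pivot rows that survive the reduction equals the number of linearly independent rows, which is the rank.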
If a group of n orthogonal vectors is also normalized, the vectors are said to be orthonormal. An m × m matrix whose columns are orthonormal vectors is said to be orthogonal. It then follows that:

\[
P'P = I
\]

• It is possible to show that the determinant of an orthogonal matrix is either 1 or $-1$.

• In general, a bilinear form in a matrix A can be written as:

\[
x'Ay = \sum_{i=1}^{m} \sum_{j=1}^{m} x_i y_j a_{ij}
\]

We are most often interested in the quadratic form $x'Ax$.

• Every symmetric matrix A can be classified into one of five categories:
◦ If $x'Ax > 0$ for all $x \neq 0$, then A is positive definite.
◦ If $x'Ax \geq 0$ for all $x \neq 0$ and $x'Ax = 0$ for some $x \neq 0$, then A is positive semidefinite.
◦ If $x'Ax < 0$ for all $x \neq 0$, then A is negative definite.
◦ If $x'Ax \leq 0$ for all $x \neq 0$ and $x'Ax = 0$ for some $x \neq 0$, then A is negative semidefinite.
◦ If $x'Ax > 0$ for some x and $x'Ax < 0$ for some x, then A is indefinite.

• Definition 2.1 Let S be a collection of m × 1 vectors satisfying the following:
◦ If $x_1 \in S$ and $x_2 \in S$, then $x_1 + x_2 \in S$.
◦ If $x \in S$ and a is a real scalar, then $ax \in S$.
Then S is called a vector space in m-dimensional space. If S is a subset of T, which is another vector space in m-dimensional space, then S is called a vector subspace of T.

• Definition 2.2 Let $\{x_1, \ldots, x_n\}$ be a set of m × 1 vectors in the vector space S. If each vector in S can be expressed as a linear combination of the vectors $x_1, \ldots, x_n$, then the set $\{x_1, \ldots, x_n\}$ is said to span or generate the vector space S, and $\{x_1, \ldots, x_n\}$ is called a spanning set of S.

• Definition 2.6 The set of m × 1 vectors $\{x_1, \ldots, x_n\}$ is said to be linearly independent if the only solution to the equation

\[
\sum_{i=1}^{n} a_i x_i = 0
\]

is the zero vector $a_1 = \cdots = a_n = 0$.
 1 0 0  1 9 5       3 1 0  3 7 8    4 0 1  4 16 13    1 9  1 0 9 5    20  0  1  0  0  20  7  20   1 1  0  20  7  0    1 0 37   20  0 1 7  20   0  0 0    This reduction implies that:  1 9 5       37  3   7  7    8  20 20  4 16  13       Or that the third column of the matrix is a linear combination of the first two.
