A Little Necessary Matrix Algebra for Doctoral Studies in Business Administration
James J. Cochran
Department of Computer Information Systems & Analysis, Louisiana Tech University
[email protected]

Matrix Algebra
Matrix algebra is a means of efficiently expressing large numbers of calculations to be made upon ordered sets of numbers. It is often referred to as linear algebra.

Why use it? Matrix algebra is used primarily to facilitate mathematical expression. Many equations would be completely intractable if scalar mathematics had to be used. It is also important to note that the scalar algebra is under there somewhere.

Definitions - Scalars
A scalar is a single value (i.e., a number).

Definitions - Vectors
A vector is a single row or column of numbers; each individual entry is called an element. Vectors are denoted with bold small letters, e.g., the row vector
$$\mathbf{a} = \begin{bmatrix} 1 & 2 & 3 & 4 \end{bmatrix}$$
and the column vector
$$\mathbf{a} = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}$$

Definitions - Matrices
A matrix is a rectangular array of numbers (called elements) arranged in orderly rows and columns, e.g.,
$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{bmatrix}$$
Subscripts denote the row (i = 1,…,n) and column (j = 1,…,m) location of an element. Matrices are denoted with bold capital letters. All matrices (and vectors) have an order, or dimensions: the number of rows x the number of columns. Thus the A above is referred to as a two by three matrix. A matrix A of dimension n x m is often denoted Anxm, and a vector a of dimension n (or m) is denoted An (or Am).

Null matrix – a matrix for which all elements are zero, i.e., aij = 0 for all i, j
Square matrix – a matrix for which the number of rows equals the number of columns (n = m)
Symmetric matrix – a matrix for which aij = aji for all i, j
Diagonal elements – elements of a square matrix for which the row and column locations are equal (i = j)
Upper triangular matrix – a matrix for which all elements below the diagonal are zero, i.e., aij = 0 for all i > j
Lower triangular matrix – a matrix for which all elements above the diagonal are zero, i.e., aij = 0 for all i < j

Matrix Equality
Two matrices are equal iff (if and only if) all of their elements are identical. Note that statistical data sets are matrices, usually with observations in the rows and variables in the columns:

                 Variable 1   Variable 2   …   Variable m
Observation 1      a11          a12        …     a1m
Observation 2      a21          a22        …     a2m
   ⋮
Observation n      an1          an2        …     anm

Basic Matrix Operations
Transpositions, sums and differences, products, and inversions.

The Transpose of a Matrix
The transpose A' of a matrix A is the matrix such that the ith row of A is the ith column of A', i.e., B is the transpose of A iff bij = aji for all i, j. This is equivalent to fixing the upper left and lower right corners and flipping the matrix about its main diagonal.

An example: if
$$\mathbf{A} = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix} \quad\text{then}\quad \mathbf{A}' = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \end{bmatrix}$$

More on the transpose: (A')' = A (think about it!).
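These definitions are easy to check numerically; here is a minimal sketch in Python with numpy (the code is illustrative, not part of the primer itself), reusing the matrix from the transpose example:

```python
import numpy as np

A = np.array([[1, 4],
              [2, 5],
              [3, 6]])

print(A.T)                           # the transpose A'
print(np.array_equal(A.T.T, A))      # (A')' = A -> True

S = np.array([[1, 2], [2, 3]])
print(np.array_equal(S, S.T))        # S = S', so S is symmetric -> True
```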
If A = A', then A is symmetric.

Sums and Differences of Matrices
Two matrices may be added (subtracted) iff they are of the same order. Simply add (subtract) elements in corresponding locations:
$$\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \\ a_{31} & a_{32} \end{bmatrix} + \begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \\ c_{31} & c_{32} \end{bmatrix}$$
where cij = aij + bij for every i, j.

An example: if
$$\mathbf{A} = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} \quad\text{and}\quad \mathbf{B} = \begin{bmatrix} 7 & 10 \\ 8 & 11 \\ 9 & 12 \end{bmatrix}$$
then we can calculate C = A + B as
$$\mathbf{C} = \mathbf{A} + \mathbf{B} = \begin{bmatrix} 8 & 12 \\ 11 & 15 \\ 14 & 18 \end{bmatrix}$$
Similarly, C = A − B is
$$\mathbf{C} = \mathbf{A} - \mathbf{B} = \begin{bmatrix} -6 & -8 \\ -5 & -7 \\ -4 & -6 \end{bmatrix}$$

Some Properties of Matrix Addition/Subtraction
- The transpose of a sum equals the sum of the transposes: (A+B+C)' = A'+B'+C'
- A+B = B+A (matrix addition is commutative)
- Matrix addition extends beyond two matrices and is associative: A+(B+C) = (A+B)+C

Products of Scalars and Matrices
To multiply a scalar by a matrix, simply multiply each element of the matrix by the scalar quantity:
$$b\begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} = \begin{bmatrix} ba_{11} & ba_{12} \\ ba_{21} & ba_{22} \end{bmatrix}$$
An example: if b = 3.5 and A is the 3 x 2 matrix above, then
$$b\mathbf{A} = 3.5\begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix} = \begin{bmatrix} 3.5 & 7.0 \\ 10.5 & 14.0 \\ 17.5 & 21.0 \end{bmatrix}$$
Note that bA = Ab if b is a scalar (scalar x matrix multiplication is commutative); it extends beyond two scalars and is associative, i.e., ab(C) = a(bC); and it leads to removal of a common factor, i.e., if every element of C contains the factor b, then C = bA.

Products of Matrices
We write the multiplication of two matrices A and B as AB. This is referred to either as pre-multiplying B by A or post-multiplying A by B; A is the premultiplier and B is the postmultiplier.

In order to multiply matrices, they must be conformable: the number of columns in the premultiplier must equal the number of rows in the postmultiplier. Note that
- an (m x n) times an (n x p) gives an (m x p)
- an (m x n) times a (p x n) cannot be done
- a (1 x n) times an (n x 1) gives a scalar (1 x 1)

If we have A (3 x 3) and B (3 x 2), then
$$\mathbf{AB} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}\begin{bmatrix} b_{11} & b_{12} \\ b_{21} & b_{22} \\ b_{31} & b_{32} \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \\ c_{31} & c_{32} \end{bmatrix} = \mathbf{C}$$
where cij = ai1 b1j + ai2 b2j + ai3 b3j for each i = 1, 2, 3 and j = 1, 2.

For the same A (3 x 3) and B (3 x 2), the product BA is undefined: B has two columns while A has three rows, so the matrices are not conformable in that order. That is, matrix multiplication is not commutative (why?).
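A quick numpy sketch of these rules (illustrative code, not from the slides), reusing A and B from the addition example:

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])      # 3 x 2
B = np.array([[7, 10], [8, 11], [9, 12]])   # 3 x 2

print(A + B)        # same order, so addition is defined
print(3.5 * A)      # scalar x matrix multiplication
print(A @ B.T)      # (3x2)(2x3) -> 3x3: conformable
# A @ B would raise ValueError: (3x2)(3x2) is not conformable
```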
Matrix Multiplication - An Example
If we have
$$\mathbf{A} = \begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{bmatrix} \quad\text{and}\quad \mathbf{B} = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}$$
then
$$\mathbf{AB} = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \\ c_{31} & c_{32} \end{bmatrix} = \begin{bmatrix} 30 & 66 \\ 36 & 81 \\ 42 & 96 \end{bmatrix}$$
where
c11 = 1(1) + 4(2) + 7(3) = 30,  c12 = 1(4) + 4(5) + 7(6) = 66,
c21 = 2(1) + 5(2) + 8(3) = 36,  c22 = 2(4) + 5(5) + 8(6) = 81,
c31 = 3(1) + 6(2) + 9(3) = 42,  c32 = 3(4) + 6(5) + 9(6) = 96.

Some Properties of Matrix Multiplication
- Even when both products are conformable, AB does not necessarily equal BA (matrix multiplication is not commutative).
- Matrix multiplication extends beyond two matrices and is associative: A(BC) = (AB)C.
- The transpose of a product equals the product of the transposes in reverse order: (ABC)' = C'B'A'.
- If AA = A (i.e., A² = A), then A is idempotent; if such an A is also symmetric, then AA' = A.

Special Uses for Matrix Multiplication

Sum the column elements of a matrix: premultiply a matrix A by a conformable row vector of 1s. If A is the 3 x 3 matrix above, then
$$\begin{bmatrix} 1 & 1 & 1 \end{bmatrix}\begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{bmatrix} = \begin{bmatrix} 6 & 15 & 24 \end{bmatrix}$$
the column totals of A.

Sum the row elements of a matrix: postmultiply A by a conformable column vector of 1s:
$$\begin{bmatrix} 1 & 4 & 7 \\ 2 & 5 & 8 \\ 3 & 6 & 9 \end{bmatrix}\begin{bmatrix} 1 \\ 1 \\ 1 \end{bmatrix} = \begin{bmatrix} 12 \\ 15 \\ 18 \end{bmatrix}$$
the row totals of A.

The Dot (or Inner) Product of Two Vectors
Premultiplication of a column vector b by a conformable row vector a yields a single value called the dot product or inner product. If a = [3 4 6] and b = [5 2 8]', then
$$\mathbf{a}\mathbf{b} = \begin{bmatrix} 3 & 4 & 6 \end{bmatrix}\begin{bmatrix} 5 \\ 2 \\ 8 \end{bmatrix} = 3(5) + 4(2) + 6(8) = 71$$
which is the sum of products of elements in corresponding positions of the two vectors.

The Outer Product of Two Vectors
Postmultiplication of a column vector b by a conformable row vector a yields a matrix containing the products of each pair of elements from the two vectors (called the outer product). For the same a and b,
$$\mathbf{b}\mathbf{a} = \begin{bmatrix} 5 \\ 2 \\ 8 \end{bmatrix}\begin{bmatrix} 3 & 4 & 6 \end{bmatrix} = \begin{bmatrix} 15 & 20 & 30 \\ 6 & 8 & 12 \\ 24 & 32 & 48 \end{bmatrix}$$

Sum the Squared Elements of a Vector
Premultiply a column vector a by its transpose: if a = [5 2 8]', then
$$\mathbf{a}'\mathbf{a} = 5^2 + 2^2 + 8^2 = 93$$
Similarly, postmultiply a row vector a by its transpose: if a = [7 10 1], then
$$\mathbf{a}\mathbf{a}' = 7^2 + 10^2 + 1^2 = 150$$

Determining whether Two Vectors are Orthogonal
Two conformable vectors a and b are orthogonal iff a'b = 0. Example: suppose a = [7 10 1]' and b = [−1 0.5 2]'. Then
$$\mathbf{a}'\mathbf{b} = 7(-1) + 10(0.5) + 1(2) = -7 + 5 + 2 = 0$$
so a and b are orthogonal.

Representing Systems of Simultaneous Equations
Suppose we have the following system of simultaneous equations:
px1 + qx2 + rx3 = M
dx1 + ex2 + fx3 = N
If we let
$$\mathbf{A} = \begin{bmatrix} p & q & r \\ d & e & f \end{bmatrix},\quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix},\quad \mathbf{b} = \begin{bmatrix} M \\ N \end{bmatrix}$$
then we can represent the system in matrix notation as Ax = b (why?).
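All of these special products take one line each in numpy (a sketch using the values from the examples above):

```python
import numpy as np

A = np.array([[1, 4, 7], [2, 5, 8], [3, 6, 9]])
ones = np.ones(3)

print(ones @ A)     # column totals: [ 6. 15. 24.]
print(A @ ones)     # row totals:    [12. 15. 18.]

a = np.array([3, 4, 6]); b = np.array([5, 2, 8])
print(a @ b)            # inner product: 71
print(np.outer(b, a))   # outer product, 3 x 3
print(np.array([7, 10, 1]) @ np.array([-1, 0.5, 2]))  # 0.0 -> orthogonal
```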
Linear Independence
Any subset of the columns (or rows) of a matrix A is said to be linearly independent if no column (row) in the subset can be expressed as a linear combination of the other columns (rows) in the subset. If such a combination exists, the columns (rows) are said to be linearly dependent.

The Rank of a matrix is defined to be the number of linearly independent columns (or rows) of the matrix.
Nonsingular (full rank) matrix – any matrix that has no linear dependencies among its columns (rows). For a square matrix A this implies that Ax = 0 iff x = 0.
Singular (not of full rank) matrix – any matrix that has at least one linear dependency among its columns (rows).

Example - the following matrix A
$$\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 3 & 4 & 9 \\ 6 & 5 & 18 \end{bmatrix}$$
is singular (not of full rank) because the third column is equal to three times the first column. This result implies there is either (i) no unique solution or (ii) no existing solution to a system of equations Ax = b (why?).

Example - the following matrix A
$$\mathbf{A} = \begin{bmatrix} 1 & 2 & 5 \\ 3 & 4 & 11 \\ 6 & 5 & 16 \end{bmatrix}$$
is singular (not of full rank) because the third column is equal to the first column plus two times the second column. Note that the number of linearly independent rows in a matrix will always equal the number of linearly independent columns.

Geometry of Vectors
Vectors have the geometric properties of length and direction. For a vector x = [x1, x2]' the length of x is
$$L_x = \sqrt{x_1^2 + x_2^2} = \sqrt{\mathbf{x}'\mathbf{x}}$$
Why? Recall the Pythagorean theorem: in any right triangle, the length of the hypotenuse Lh and the lengths of the other two sides La and Lb are related by the simple formula Lh² = La² + Lb²; here the two sides are the coordinates x1 and x2.

Vector addition – for the vectors x = [x1, x2]' and y = [y1, y2]' we have
$$\mathbf{x} + \mathbf{y} = \begin{bmatrix} x_1 + y_1 \\ x_2 + y_2 \end{bmatrix}$$

Scalar multiplication changes only the vector's length – for the vector x,
$$L_{c\mathbf{x}} = \sqrt{c^2x_1^2 + c^2x_2^2} = |c|\sqrt{\mathbf{x}'\mathbf{x}}$$

Vectors have angles between them – for the vectors x and y, the angle θ between them satisfies
$$\cos\theta = \frac{\mathbf{x}'\mathbf{y}}{L_xL_y} = \frac{\mathbf{x}'\mathbf{y}}{\sqrt{\mathbf{x}'\mathbf{x}}\sqrt{\mathbf{y}'\mathbf{y}}},\qquad \theta = \arccos\!\left(\frac{\mathbf{x}'\mathbf{y}}{L_xL_y}\right)$$

A Little Trigonometry Review
On the unit circle, cos θ runs from 1 (at θ = 0) through 0 (at θ = 90°) to −1 (at θ = 180°). If we rotate x and y so that x lies on axis 1, the angle θxy between the vectors is easy to read, and cos θxy behaves exactly like the correlation rxy between the two vectors.

What is the correlation between the vectors x = [1.0, 0.6]' and y = [−0.5, −0.3]'?
$$\cos\theta = \frac{\mathbf{x}'\mathbf{y}}{\sqrt{\mathbf{x}'\mathbf{x}}\sqrt{\mathbf{y}'\mathbf{y}}} = \frac{1.0(-0.5) + 0.6(-0.3)}{\sqrt{1.36}\sqrt{0.34}} = \frac{-0.68}{0.68} = -1.00$$
Plotted in the column space, x and y point in exactly opposite directions (θxy = 180°, which rotating x onto axis 1 makes easier to see). Of course, we can also see this by plotting the pairs (1.0, −0.5) and (0.6, −0.3) in the x, y (row) space: the two points lie on a single line through the origin.
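The length/angle formulas above translate directly to numpy (a sketch using the example vectors):

```python
import numpy as np

x = np.array([1.0, 0.6]); y = np.array([-0.5, -0.3])

Lx = np.sqrt(x @ x)                      # length of x
Ly = np.sqrt(y @ y)                      # length of y
cos_theta = (x @ y) / (Lx * Ly)
print(cos_theta)                         # -1.0: perfectly "anticorrelated"
print(np.degrees(np.arccos(np.clip(cos_theta, -1, 1))))  # 180.0 degrees
```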
What is the correlation between the vectors x = [1.0, 0.4]' and y = [−0.50, 1.25]'?
$$\cos\theta = \frac{\mathbf{x}'\mathbf{y}}{\sqrt{\mathbf{x}'\mathbf{x}}\sqrt{\mathbf{y}'\mathbf{y}}} = \frac{1.0(-0.50) + 0.4(1.25)}{\sqrt{1.16}\sqrt{1.8125}} = \frac{0.00}{1.45} = 0.00$$
Plotting in the column space shows θxy = 90°; rotating so x lies on axis 1 makes it easier to see.

Geometry of Vectors
The space of all real m-tuples, with scalar multiplication and vector addition as we have defined them, is called a vector space. The vector
$$\mathbf{y} = a_1\mathbf{x}_1 + a_2\mathbf{x}_2 + \cdots + a_k\mathbf{x}_k$$
is a linear combination of the vectors x1, x2,…, xk. The set of all linear combinations of x1, x2,…, xk is called their linear span.

A set of vectors x1, x2,…, xk is said to be linearly dependent if there exist k numbers a1, a2,…, ak, at least one of which is nonzero, such that
$$a_1\mathbf{x}_1 + a_2\mathbf{x}_2 + \cdots + a_k\mathbf{x}_k = \mathbf{0}$$
Otherwise the set of vectors x1, x2,…, xk is said to be linearly independent.

Are the vectors x = [1.0, 0.6]' and y = [−0.5, −0.3]' linearly independent? Take a1 = 0.5 and a2 = 1.0. Then we have
$$a_1\mathbf{x} + a_2\mathbf{y} = 0.5\begin{bmatrix} 1.0 \\ 0.6 \end{bmatrix} + 1.0\begin{bmatrix} -0.5 \\ -0.3 \end{bmatrix} = \begin{bmatrix} 0.0 \\ 0.0 \end{bmatrix}$$
so the vectors x and y are dependent. Geometrically, they lie on one line through the origin (θxy = 180°; rotating so x lies on axis 1 makes it easier to see).

Are the vectors x = [1.0, 0.4]' and y = [−0.50, 1.25]' linearly independent? There are no real values a1, a2 (not both zero) such that a1x + a2y = 0, so these vectors x and y are independent. Geometrically θxy = 90°; here x and y are called perpendicular (or orthogonal), which is written x ⊥ y.

Some properties of orthogonal vectors:
- x ⊥ y iff x'y = 0
- z is perpendicular to every vector iff z = 0
- If z is perpendicular to each of the vectors x1, x2, …, xk, then z is perpendicular to their linear span.

Example: the vectors x1 = [1.00, 0.40, 0.20]' and x2 = [−0.50, 0.50, 1.50]' (plotted in the column space) are orthogonal, and the vector z = [0.10, −0.32, 0.14]' is perpendicular to the linear span of x1 and x2 - check each of the dot products! Indeed x1'z = 0 (x1 and z are perpendicular), x2'z = 0 (x2 and z are perpendicular), and x1'x2 = 0 (x1 and x2 are perpendicular), so x1, x2, and z are mutually orthogonal. Note we could rotate x1, x2, and z until they lie on our three axes!
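Checking each of the dot products, as the slide suggests, is a one-liner in numpy (illustrative sketch; the rank check is an extra confirmation not in the slides):

```python
import numpy as np

x1 = np.array([1.00, 0.40, 0.20])
x2 = np.array([-0.50, 0.50, 1.50])
z  = np.array([0.10, -0.32, 0.14])

# all three dot products are (numerically) zero -> mutually orthogonal
print(x1 @ x2, x1 @ z, x2 @ z)

# rank 3 confirms x1, x2, z are also linearly independent
print(np.linalg.matrix_rank(np.column_stack([x1, x2, z])))
```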
Geometry of Vectors - Projection
The projection (or shadow) of a vector x on a vector y is given by
$$\frac{\mathbf{x}'\mathbf{y}}{L_y^2}\,\mathbf{y}$$
If y has unit length (i.e., Ly = 1), the projection of x on y simplifies to (x'y)y.

For the vectors x = [0.6, 1.0]' and y = [0.8, 0.3]', the projection of x on y is
$$\frac{\mathbf{x}'\mathbf{y}}{L_y^2}\,\mathbf{y} = \frac{0.6(0.8) + 1.0(0.3)}{0.8^2 + 0.3^2}\begin{bmatrix} 0.8 \\ 0.3 \end{bmatrix} = \frac{0.78}{0.73}\begin{bmatrix} 0.8 \\ 0.3 \end{bmatrix} = 1.0685\begin{bmatrix} 0.8 \\ 0.3 \end{bmatrix} = \begin{bmatrix} 0.8548 \\ 0.3205 \end{bmatrix}$$
Geometrically, the projection lies along y and the residual (x minus its projection) is perpendicular with respect to y; rotating so y lies on axis 1 makes it easier to see.

Note that we write the length of the projection of x on y as
$$\frac{\mathbf{x}'\mathbf{y}}{L_y} = L_x\frac{\mathbf{x}'\mathbf{y}}{L_xL_y} = L_x\cos\theta_{xy}$$
For our previous example, the length of the projection of x on y is
$$\frac{\mathbf{x}'\mathbf{y}}{L_y} = \frac{0.78}{\sqrt{0.73}} = 0.912921$$

The Gram-Schmidt (Orthogonalization) Process
For linearly independent vectors x1, x2,…, xk, there exist mutually perpendicular vectors u1, u2,…, uk with the same linear span. These may be constructed by setting
$$\mathbf{u}_1 = \mathbf{x}_1,\qquad \mathbf{u}_2 = \mathbf{x}_2 - \frac{\mathbf{x}_2'\mathbf{u}_1}{\mathbf{u}_1'\mathbf{u}_1}\mathbf{u}_1,\qquad \ldots,\qquad \mathbf{u}_k = \mathbf{x}_k - \frac{\mathbf{x}_k'\mathbf{u}_1}{\mathbf{u}_1'\mathbf{u}_1}\mathbf{u}_1 - \cdots - \frac{\mathbf{x}_k'\mathbf{u}_{k-1}}{\mathbf{u}_{k-1}'\mathbf{u}_{k-1}}\mathbf{u}_{k-1}$$
We can normalize (convert to unit length) the vectors u by setting
$$\mathbf{z}_j = \frac{\mathbf{u}_j}{\sqrt{\mathbf{u}_j'\mathbf{u}_j}}$$
Finally, note that we can project a vector xk onto the linear span of the vectors x1, x2,…, xk−1 as
$$\sum_{j=1}^{k-1}(\mathbf{x}_k'\mathbf{z}_j)\mathbf{z}_j$$

Example - recall the vectors x1 = [1.00, 0.40, 0.20]', x2 = [−0.50, 0.50, 1.50]', and z = [0.10, −0.32, 0.14]' from our previous problem. Let's construct mutually perpendicular vectors u1, u2, u3 with the same linear span. We arbitrarily select the first axis as u1:
u1 = [1.00, 0.00, 0.00]'
Now we construct a vector u2 perpendicular to u1 (and in the linear span of x1, x2, z):
$$\mathbf{u}_2 = \mathbf{x}_2 - \frac{\mathbf{x}_2'\mathbf{u}_1}{\mathbf{u}_1'\mathbf{u}_1}\mathbf{u}_1 = \begin{bmatrix} -0.50 \\ 0.50 \\ 1.50 \end{bmatrix} - \frac{-0.50}{1.00}\begin{bmatrix} 1.00 \\ 0.00 \\ 0.00 \end{bmatrix} = \begin{bmatrix} 0.00 \\ 0.50 \\ 1.50 \end{bmatrix}$$
Finally, we construct a vector u3 perpendicular to u1 and u2 (and in the linear span of x1, x2, z):
$$\mathbf{u}_3 = \mathbf{z} - \frac{\mathbf{z}'\mathbf{u}_1}{\mathbf{u}_1'\mathbf{u}_1}\mathbf{u}_1 - \frac{\mathbf{z}'\mathbf{u}_2}{\mathbf{u}_2'\mathbf{u}_2}\mathbf{u}_2 = \begin{bmatrix} 0.0000 \\ -0.3325 \\ 0.1025 \end{bmatrix}$$
If we normalize our vectors u1, u2, and u3, we get
$$\mathbf{z}_1 = \begin{bmatrix} 1.0000 \\ 0.0000 \\ 0.0000 \end{bmatrix},\qquad \mathbf{z}_2 = \frac{1}{\sqrt{2.5}}\begin{bmatrix} 0.0000 \\ 0.5000 \\ 1.5000 \end{bmatrix} = \begin{bmatrix} 0.0000 \\ 0.3162 \\ 0.9487 \end{bmatrix},\qquad \mathbf{z}_3 = \frac{1}{\sqrt{0.1211}}\begin{bmatrix} 0.0000 \\ -0.3325 \\ 0.1025 \end{bmatrix} = \begin{bmatrix} 0.0000 \\ -0.9556 \\ 0.2946 \end{bmatrix}$$
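The process is short enough to implement directly; here is a sketch (the helper name `gram_schmidt` is mine). Note that it takes u1 = x1, as in the formulas above, rather than the slides' choice of the first axis as u1, so its output is a different (equally valid) orthonormal basis for the same span:

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthogonalize linearly independent vectors; return unit-length
    vectors with the same linear span."""
    basis = []
    for x in vectors:
        u = x - sum((x @ u_j) * u_j for u_j in basis)  # remove projections
        basis.append(u / np.sqrt(u @ u))               # normalize
    return basis

x1 = np.array([1.00, 0.40, 0.20])
x2 = np.array([-0.50, 0.50, 1.50])
z  = np.array([0.10, -0.32, 0.14])
for v in gram_schmidt([x1, x2, z]):
    print(np.round(v, 4))
```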
Special Matrices
There are a number of special matrices. These include diagonal matrices, identity matrices, null matrices, commutative matrices, anti-commutative matrices, periodic matrices, idempotent matrices, nilpotent matrices, and orthogonal matrices.

Diagonal Matrices
A diagonal matrix is a square matrix that has values on the diagonal, with all off-diagonal entries being zero:
$$\begin{bmatrix} a_{11} & 0 & 0 & 0 \\ 0 & a_{22} & 0 & 0 \\ 0 & 0 & a_{33} & 0 \\ 0 & 0 & 0 & a_{44} \end{bmatrix}$$

Identity Matrices
An identity matrix is a diagonal matrix whose diagonal elements all equal 1:
$$\mathbf{I} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$
When used as a premultiplier or postmultiplier of any conformable matrix A, the identity matrix returns the original matrix: IA = AI = A. Why?

Null Matrices
A square matrix whose elements all equal 0. It usually arises as the difference between two equal square matrices: A − B = 0 iff A = B.

Commutative Matrices
Any two square matrices A and B such that AB = BA are said to commute. Note that it is easy to show that any square matrix A commutes both with itself and with a conformable identity matrix I.

Anti-Commutative Matrices
Any two square matrices A and B such that AB = −BA are said to anticommute.

Periodic Matrices
Any matrix A such that A^(k+1) = A is said to be of period k.

Idempotent Matrices
Any matrix A such that A² = A is said to be idempotent. Thus an idempotent matrix is of period k for any positive integer value of k (why?).

Nilpotent Matrices
Any matrix A such that A^p = 0, where p is a positive integer, is said to be nilpotent. Note that if p is the least positive integer such that A^p = 0, then A is said to be nilpotent of index p.

Orthogonal Matrices
Any square matrix A whose rows (considered as vectors) are mutually perpendicular and have unit lengths, i.e., A'A = I. Note that A is orthogonal iff A⁻¹ = A'.

The Determinant of a Matrix
The determinant of a matrix A is commonly denoted by |A| or det A. Determinants exist only for square matrices. They are a matrix characteristic (that can be somewhat tedious to compute).

The Determinant of a 2x2 Matrix
If we have a matrix A such that
$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix} \quad\text{then}\quad |\mathbf{A}| = a_{11}a_{22} - a_{12}a_{21}$$
For example,
$$\begin{vmatrix} 1 & 2 \\ 3 & 4 \end{vmatrix} = 1(4) - 2(3) = -2$$
Determinants of 2x2 matrices are easy!

The Determinant of a 3x3 Matrix
If we have a 3 x 3 matrix A, then the determinant is
$$|\mathbf{A}| = a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{13}\begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix}$$
which can be expanded and rewritten as
$$|\mathbf{A}| = a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32} + a_{12}a_{23}a_{31} - a_{12}a_{21}a_{33} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31}$$
(Why?)
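A few of these definitions, checked numerically (a sketch; the specific matrices are illustrative choices, not from the slides):

```python
import numpy as np

A = np.array([[0.5, 0.5], [0.5, 0.5]])   # idempotent: A @ A = A
print(np.allclose(A @ A, A))             # True

N = np.array([[0, 1], [0, 0]])           # nilpotent of index 2
print(np.allclose(N @ N, 0))             # True: N^2 = 0

Q = np.array([[0, -1], [1, 0]])          # a rotation, hence orthogonal
print(np.allclose(Q.T @ Q, np.eye(2)))   # True: Q'Q = I, so Q^{-1} = Q'
```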
The Determinant of a 3x3 Matrix
If we rewrite the determinants of each of the 2x2 submatrices as
$$\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} = a_{22}a_{33} - a_{23}a_{32},\quad \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} = a_{21}a_{33} - a_{23}a_{31},\quad \begin{vmatrix} a_{21} & a_{22} \\ a_{31} & a_{32} \end{vmatrix} = a_{21}a_{32} - a_{22}a_{31}$$
then by substitution we have
$$|\mathbf{A}| = a_{11}a_{22}a_{33} - a_{11}a_{23}a_{32} + a_{12}a_{23}a_{31} - a_{12}a_{21}a_{33} + a_{13}a_{21}a_{32} - a_{13}a_{22}a_{31}$$

Note that |A| can also be written as an expansion along the second row,
$$|\mathbf{A}| = -a_{21}\begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix} + a_{22}\begin{vmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{vmatrix} - a_{23}\begin{vmatrix} a_{11} & a_{12} \\ a_{31} & a_{32} \end{vmatrix}$$
or along the third row,
$$|\mathbf{A}| = a_{31}\begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix} - a_{32}\begin{vmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{vmatrix} + a_{33}\begin{vmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{vmatrix}$$

To do so, first create a matrix of the same dimensions as A consisting only of alternating signs (+, −, +, …):
$$\begin{bmatrix} + & - & + \\ - & + & - \\ + & - & + \end{bmatrix}$$
Then expand on any row or column: multiply each element in the selected row/column by the corresponding sign, then multiply each of these results by the determinant of the submatrix that results from eliminating the row and column to which the element belongs.

For example, let's expand on the second column of A. The three elements on which our expansion is based are a12, a22, and a32; the corresponding signs are −, +, −.

For the first term of our expansion we multiply −a12 by the determinant of the matrix formed when row 1 and column 2 are eliminated from A (called the minor, often denoted A_rc where r and c are the deleted row and column):
$$\mathbf{A}_{12} = \begin{bmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{bmatrix}, \text{ giving } -a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix}$$
This product is called a cofactor.

For the second term we multiply a22 by the determinant of the matrix formed when row 2 and column 2 are eliminated from A, giving
$$a_{22}\begin{vmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{vmatrix}$$
Finally, for the third term we multiply −a32 by the determinant of the matrix formed when row 3 and column 2 are eliminated from A, giving
$$-a_{32}\begin{vmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{vmatrix}$$
Putting this all together yields
$$|\mathbf{A}| = -a_{12}\begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} + a_{22}\begin{vmatrix} a_{11} & a_{13} \\ a_{31} & a_{33} \end{vmatrix} - a_{32}\begin{vmatrix} a_{11} & a_{13} \\ a_{21} & a_{23} \end{vmatrix}$$
So there are six distinct ways (three rows and three columns) to calculate the determinant of a 3x3 matrix! These can be expressed as
$$|\mathbf{A}| = \sum_{j=1}^{m} a_{ij}(-1)^{i+j}|\mathbf{A}_{ij}| = \sum_{i=1}^{n} a_{ij}(-1)^{i+j}|\mathbf{A}_{ij}|$$
Note that this is referred to as the method of cofactors and can be used to find the determinant of any square matrix.

An Example: suppose we have the following matrix A:
$$\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 4 \\ 1 & -3 & -2 \end{bmatrix}$$
Using row 1 (i.e., i = 1), the determinant is
$$|\mathbf{A}| = \sum_{j=1}^{3} a_{1j}(-1)^{1+j}|\mathbf{A}_{1j}| = 1(2) - 2(-8) + 3(-11) = 2 + 16 - 33 = -15$$
Note that this is the same result we would achieve using any other row or column!
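The method of cofactors translates directly into a short recursive routine (a sketch; `det_cofactor` is my name for the helper):

```python
import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)  # drop row 1, column j+1
        total += A[0, j] * (-1) ** j * det_cofactor(minor)     # sign (-1)^(1+j)
    return total

A = np.array([[1, 2, 3], [2, 5, 4], [1, -3, -2]], dtype=float)
print(det_cofactor(A), np.linalg.det(A))  # both -15 (up to rounding)
```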
Some Properties of Determinants
Determinants have several mathematical properties useful in matrix manipulations:
- |A| = |A'|
- If each element of a row (or column) of A is 0, then |A| = 0
- If every element in a row (or column) of A is multiplied by a constant k, the determinant of the resulting matrix is k|A|
- If two rows (or columns) are interchanged, the sign, but not the value, of |A| changes
- If two rows (or columns) of A are identical, |A| = 0
- |A| remains unchanged if each element of a row is multiplied by a constant and added to any other row
- If A is nonsingular, then |A| = 1/|A⁻¹|, i.e., |A||A⁻¹| = 1
- |AB| = |A||B| (the determinant of a product is the product of the determinants)
- For any scalar c, |cA| = cᵏ|A|, where k is the order of A
- The determinant of a diagonal matrix is simply the product of its diagonal elements

Why are Determinants Important?
Consider the small system of equations
a11x1 + a12x2 = b1
a21x1 + a22x2 = b2
which can be represented by Ax = b, where
$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{bmatrix},\quad \mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix},\quad \mathbf{b} = \begin{bmatrix} b_1 \\ b_2 \end{bmatrix}$$
If we were to solve this system simultaneously for x2 we would have
a21(a11x1 + a12x2 = b1)
−a11(a21x1 + a22x2 = b2)
which yields (through cancellation and rearranging)
a21a11x1 + a21a12x2 − a11a21x1 − a11a22x2 = a21b1 − a11b2
or
(a11a22 − a21a12)x2 = a11b2 − a21b1
which implies
$$x_2 = \frac{a_{11}b_2 - a_{21}b_1}{a_{11}a_{22} - a_{12}a_{21}}$$
Notice that the denominator is |A| = a11a22 − a12a21. Thus if |A| = 0, there is either (i) no unique solution or (ii) no existing solution to the system of equations Ax = b!

This result holds true if we solve the system for x1 as well, and for a square matrix A of any order. Thus we can use the determinant of the A matrix (the coefficient matrix in a system of simultaneous equations) to see whether the system has a unique solution.

Traces of Matrices
The trace of a square matrix A, denoted tr(A), is the sum of its diagonal elements:
$$tr(\mathbf{A}) = \sum_{i=1}^{n} a_{ii}$$
For example, the trace of A = [[1, 2], [3, 4]] is tr(A) = 1 + 4 = 5.

Some Properties of Traces
Traces have several mathematical properties useful in matrix manipulations:
- For any scalar c, tr(cA) = c·tr(A)
- tr(A ± B) = tr(A) ± tr(B)
- tr(AB) = tr(BA)
- tr(B⁻¹AB) = tr(A)
- $tr(\mathbf{A}\mathbf{A}') = \sum_{i=1}^{n}\sum_{j=1}^{m} a_{ij}^2$

The Inverse of a Matrix
The inverse of a matrix A is commonly denoted by A⁻¹ or inv A. The inverse of an n x n matrix A is the matrix A⁻¹ such that
AA⁻¹ = I = A⁻¹A
The matrix inverse is analogous to a scalar reciprocal. A matrix which has an inverse is called nonsingular; for some n x n matrices A an inverse matrix A⁻¹ may not exist, and a matrix which does not have an inverse is singular. An inverse of an n x n matrix A exists iff |A| ≠ 0.
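A short numerical illustration of the determinant test and the trace (the 2 x 2 system here is an arbitrary example of mine, not from the slides):

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([5.0, 10.0])

if np.linalg.det(A) != 0:           # |A| != 0, so a unique solution exists
    x = np.linalg.solve(A, b)
    print(x, np.allclose(A @ x, b)) # solution satisfies Ax = b

print(np.trace(A))                  # tr(A) = 2 + 3 = 5
```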
Inverse by Simultaneous Equations
Pre- or postmultiply your square matrix A by a dummy matrix of the same dimensions, and set the result equal to an identity matrix of the same dimensions as A:
$$\mathbf{A}\mathbf{A}^{-1} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{bmatrix}\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}$$
Recognize that the resulting expression implies a set of n² simultaneous equations that must be satisfied if A⁻¹ exists; solving this set of n² equations simultaneously yields A⁻¹.

An Example: if
$$\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 4 \\ 1 & -3 & -2 \end{bmatrix}$$
the resulting expression implies the following n² = 9 simultaneous equations:
1a + 2d + 3g = 1,  1b + 2e + 3h = 0,  1c + 2f + 3i = 0;
2a + 5d + 4g = 0,  2b + 5e + 4h = 1,  2c + 5f + 4i = 0;
1a − 3d − 2g = 0,  1b − 3e − 2h = 0,  1c − 3f − 2i = 1.
This system can be satisfied iff A⁻¹ exists. Solving the set of n² equations simultaneously yields
a = −2/15, b = 1/3, c = 7/15; d = −8/15, e = 1/3, f = −2/15; g = 11/15, h = −1/3, i = −1/15
so we have
$$\mathbf{A}^{-1} = \begin{bmatrix} -2/15 & 1/3 & 7/15 \\ -8/15 & 1/3 & -2/15 \\ 11/15 & -1/3 & -1/15 \end{bmatrix}$$
ALWAYS check your answer. How? Use the fact that AA⁻¹ = A⁻¹A = I and do a little matrix multiplication:
$$\mathbf{A}\mathbf{A}^{-1} = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 4 \\ 1 & -3 & -2 \end{bmatrix}\begin{bmatrix} -2/15 & 1/3 & 7/15 \\ -8/15 & 1/3 & -2/15 \\ 11/15 & -1/3 & -1/15 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} = \mathbf{I}_{3\times3}$$
So we have found A⁻¹!
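The same inverse, and the same check, in numpy (an illustrative sketch):

```python
import numpy as np

A = np.array([[1, 2, 3], [2, 5, 4], [1, -3, -2]], dtype=float)
A_inv = np.linalg.inv(A)

print(np.round(A_inv * 15))                # 15*A_inv has the integer entries above
print(np.allclose(A @ A_inv, np.eye(3)))   # ALWAYS check: AA^{-1} = I -> True
```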
Inverse by the Gauss-Jordan Algorithm
Augment your matrix A with an identity matrix of the same dimensions:
$$\mathbf{A}|\mathbf{I} = \left[\begin{array}{ccc|ccc} a_{11} & a_{12} & a_{13} & 1 & 0 & 0 \\ a_{21} & a_{22} & a_{23} & 0 & 1 & 0 \\ a_{31} & a_{32} & a_{33} & 0 & 0 & 1 \end{array}\right]$$
Now use valid row operations to convert A to I (and so A|I to I|A⁻¹):
- you may interchange rows;
- you may multiply a row by a scalar;
- you may replace a row with the sum of that row and another row multiplied by a scalar (which is often negative).
Every operation performed on A must also be performed on I.

An Example: for the same A, the augmented matrix is
$$\mathbf{A}|\mathbf{I} = \left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 2 & 5 & 4 & 0 & 1 & 0 \\ 1 & -3 & -2 & 0 & 0 & 1 \end{array}\right]$$
Step 1: subtract 2·Row 1 from Row 2 and substitute the result for Row 2:
$$\left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & -2 & -2 & 1 & 0 \\ 1 & -3 & -2 & 0 & 0 & 1 \end{array}\right]$$
Step 2: subtract Row 3 from Row 1, divide the result by 5, and substitute it for Row 3:
$$\left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & -2 & -2 & 1 & 0 \\ 0 & 1 & 1 & 1/5 & 0 & -1/5 \end{array}\right]$$
Step 3: subtract Row 2 from Row 3, divide the result by 3, and substitute it for Row 3:
$$\left[\begin{array}{ccc|ccc} 1 & 2 & 3 & 1 & 0 & 0 \\ 0 & 1 & -2 & -2 & 1 & 0 \\ 0 & 0 & 1 & 11/15 & -1/3 & -1/15 \end{array}\right]$$
Step 4: subtract 2·Row 2 from Row 1 and substitute the result for Row 1:
$$\left[\begin{array}{ccc|ccc} 1 & 0 & 7 & 5 & -2 & 0 \\ 0 & 1 & -2 & -2 & 1 & 0 \\ 0 & 0 & 1 & 11/15 & -1/3 & -1/15 \end{array}\right]$$
Step 5: subtract 7·Row 3 from Row 1 and substitute the result for Row 1:
$$\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & -2/15 & 1/3 & 7/15 \\ 0 & 1 & -2 & -2 & 1 & 0 \\ 0 & 0 & 1 & 11/15 & -1/3 & -1/15 \end{array}\right]$$
Step 6: add 2·Row 3 to Row 2 and substitute the result for Row 2:
$$\left[\begin{array}{ccc|ccc} 1 & 0 & 0 & -2/15 & 1/3 & 7/15 \\ 0 & 1 & 0 & -8/15 & 1/3 & -2/15 \\ 0 & 0 & 1 & 11/15 & -1/3 & -1/15 \end{array}\right]$$
Now that the left side of the augmented matrix is an identity matrix I, the right side of the augmented matrix is the inverse of the matrix A:
$$\mathbf{A}^{-1} = \begin{bmatrix} -2/15 & 1/3 & 7/15 \\ -8/15 & 1/3 & -2/15 \\ 11/15 & -1/3 & -1/15 \end{bmatrix}$$
To check our work, we verify that AA⁻¹ = I; it does, so our work checks out!

Inverse by Determinants
Replace each element aij of a matrix A with an element calculated as follows:
- find the determinant of the submatrix that results when the ith row and jth column are eliminated from A (i.e., |A_ij|);
- attach the sign (−1)^(i+j) that you identified in the method of cofactors;
- divide by the determinant of A.
After all elements have been replaced, transpose the resulting matrix.

An Example: again suppose we have
$$\mathbf{A} = \begin{bmatrix} 1 & 2 & 3 \\ 2 & 5 & 4 \\ 1 & -3 & -2 \end{bmatrix}$$
We have calculated the determinant of A to be −15, so we replace element 1,1 with
$$(-1)^{1+1}\frac{|\mathbf{A}_{11}|}{|\mathbf{A}|} = \frac{2}{-15} = -\frac{2}{15}$$
Similarly, we replace element 1,2 with
$$(-1)^{1+2}\frac{|\mathbf{A}_{12}|}{|\mathbf{A}|} = \frac{-(-8)}{-15} = -\frac{8}{15}$$
After using this approach to replace each of the nine elements of A and transposing, the eventual result is
$$\mathbf{A}^{-1} = \begin{bmatrix} -2/15 & 1/3 & 7/15 \\ -8/15 & 1/3 & -2/15 \\ 11/15 & -1/3 & -1/15 \end{bmatrix}$$
which is A⁻¹!

Eigenvalues and Eigenvectors
For a square matrix A, let I be a conformable identity matrix. Then the scalars λ satisfying the polynomial equation |A − λI| = 0 are called the eigenvalues (or characteristic roots) of A. The equation |A − λI| = 0 is called the characteristic equation or determinantal equation.

For example, if we have a matrix
$$\mathbf{A} = \begin{bmatrix} 2 & 4 \\ 4 & -4 \end{bmatrix}$$
then
$$|\mathbf{A}-\lambda\mathbf{I}| = \begin{vmatrix} 2-\lambda & 4 \\ 4 & -4-\lambda \end{vmatrix} = (2-\lambda)(-4-\lambda) - 16 = \lambda^2 + 2\lambda - 24 = 0$$
which implies there are two roots or eigenvalues: λ = −6 and λ = 4.

For a matrix A with eigenvalue λ, a nonzero vector x such that Ax = λx is called an eigenvector (or characteristic vector) of A associated with λ. The eigenvector of A associated with λ = −6 satisfies
$$\begin{bmatrix} 2 & 4 \\ 4 & -4 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = -6\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$
i.e., 2x1 + 4x2 = −6x1, so 8x1 + 4x2 = 0; and 4x1 − 4x2 = −6x2, so 4x1 + 2x2 = 0. Fixing x1 = 1 yields a solution for x2 of −2.

Note that eigenvectors are usually normalized so they have unit length, i.e.,
$$\mathbf{e} = \frac{\mathbf{x}}{\sqrt{\mathbf{x}'\mathbf{x}}} = \frac{1}{\sqrt{5}}\begin{bmatrix} 1 \\ -2 \end{bmatrix} = \begin{bmatrix} 1/\sqrt{5} \\ -2/\sqrt{5} \end{bmatrix}$$
Thus our arbitrary choice to fix x1 = 1 has no impact on the eigenvector associated with λ = −6.

For matrix A and eigenvalue λ = 4, we have 2x1 + 4x2 = 4x1, so −2x1 + 4x2 = 0; and 4x1 − 4x2 = 4x2, so 4x1 − 8x2 = 0. We again arbitrarily fix x1 = 1, which now yields a solution for x2 of 1/2. Normalization to unit length yields
$$\mathbf{e} = \frac{1}{\sqrt{5/4}}\begin{bmatrix} 1 \\ 1/2 \end{bmatrix} = \begin{bmatrix} 2/\sqrt{5} \\ 1/\sqrt{5} \end{bmatrix}$$
Again, our arbitrary choice to fix x1 = 1 has no impact on the eigenvector associated with λ = 4.
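numpy finds the same eigenvalue-eigenvector pairs (a sketch using the example matrix; note numpy may order or sign the pairs differently):

```python
import numpy as np

A = np.array([[2.0, 4.0], [4.0, -4.0]])
eigenvalues, eigenvectors = np.linalg.eig(A)

print(eigenvalues)                           # 4. and -6. (order may differ)
for lam, e in zip(eigenvalues, eigenvectors.T):  # eigenvectors are the columns
    print(np.allclose(A @ e, lam * e))       # Ae = lambda*e for each pair -> True
```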
Quadratic Forms
A quadratic form is a function Q(x) = x'Ax in k variables x1,…,xk, where x = [x1, …, xk]' and A is a k x k symmetric matrix. Note that a quadratic form has only squared terms and cross-products, and so can be written
$$Q(\mathbf{x}) = \sum_{i=1}^{k}\sum_{j=1}^{k} a_{ij}x_ix_j$$
Suppose we have
$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \quad\text{and}\quad \mathbf{A} = \begin{bmatrix} 1 & 2 \\ 2 & -2 \end{bmatrix}$$
Then
$$Q(\mathbf{x}) = \mathbf{x}'\mathbf{A}\mathbf{x} = x_1^2 + 4x_1x_2 - 2x_2^2$$

Spectral Decomposition and Quadratic Forms
Any k x k symmetric matrix A can be expressed in terms of its k eigenvalue-eigenvector pairs (λi, ei) as
$$\mathbf{A} = \sum_{i=1}^{k}\lambda_i\mathbf{e}_i\mathbf{e}_i'$$
This is referred to as the spectral decomposition of A.

For our previous example on eigenvalues and eigenvectors we showed that
$$\mathbf{A} = \begin{bmatrix} 2 & 4 \\ 4 & -4 \end{bmatrix}$$
has eigenvalues λ1 = −6 and λ2 = 4, with corresponding (normalized) eigenvectors
$$\mathbf{e}_1 = \begin{bmatrix} 1/\sqrt{5} \\ -2/\sqrt{5} \end{bmatrix},\qquad \mathbf{e}_2 = \begin{bmatrix} 2/\sqrt{5} \\ 1/\sqrt{5} \end{bmatrix}$$
Can we reconstruct A?
$$\sum_{i=1}^{2}\lambda_i\mathbf{e}_i\mathbf{e}_i' = -6\begin{bmatrix} 1/5 & -2/5 \\ -2/5 & 4/5 \end{bmatrix} + 4\begin{bmatrix} 4/5 & 2/5 \\ 2/5 & 1/5 \end{bmatrix} = \begin{bmatrix} 2 & 4 \\ 4 & -4 \end{bmatrix} = \mathbf{A}$$

Spectral decomposition can be used to develop/illustrate many statistical results and concepts. We start with a few basic definitions:
- Nonnegative definite matrix – a k x k symmetric matrix A such that 0 ≤ x'Ax for all x' = [x1, x2, …, xk]; the matrix A and its quadratic form are said to be nonnegative definite.
- Positive definite matrix – a k x k symmetric matrix A such that 0 < x'Ax for all x' = [x1, x2, …, xk] ≠ [0, 0, …, 0]; the matrix A and its quadratic form are said to be positive definite.

Example - show that the following quadratic form is positive definite:
$$6x_1^2 + 4x_2^2 - 4\sqrt{2}\,x_1x_2$$
We first rewrite the quadratic form in matrix notation:
$$Q(\mathbf{x}) = \begin{bmatrix} x_1 & x_2 \end{bmatrix}\begin{bmatrix} 6 & -2\sqrt{2} \\ -2\sqrt{2} & 4 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \mathbf{x}'\mathbf{A}\mathbf{x}$$
Now identify the eigenvalues of the resulting matrix A:
$$|\mathbf{A}-\lambda\mathbf{I}| = \begin{vmatrix} 6-\lambda & -2\sqrt{2} \\ -2\sqrt{2} & 4-\lambda \end{vmatrix} = (6-\lambda)(4-\lambda) - 8 = \lambda^2 - 10\lambda + 16 = (\lambda-2)(\lambda-8) = 0$$
so the eigenvalues are λ1 = 2 and λ2 = 8.

Next, using the spectral decomposition we can write
$$\mathbf{A} = \sum_{i=1}^{2}\lambda_i\mathbf{e}_i\mathbf{e}_i' = \lambda_1\mathbf{e}_1\mathbf{e}_1' + \lambda_2\mathbf{e}_2\mathbf{e}_2' = 2\mathbf{e}_1\mathbf{e}_1' + 8\mathbf{e}_2\mathbf{e}_2'$$
where, again, the vectors ei are the normalized and orthogonal eigenvectors associated with λ1 = 2 and λ2 = 8:
$$\mathbf{e}_1 = \begin{bmatrix} 1/\sqrt{3} \\ \sqrt{2}/\sqrt{3} \end{bmatrix},\qquad \mathbf{e}_2 = \begin{bmatrix} \sqrt{2}/\sqrt{3} \\ -1/\sqrt{3} \end{bmatrix}$$
Sidebar - note again that we can recreate the original matrix A from the spectral decomposition:
$$2\begin{bmatrix} 1/3 & \sqrt{2}/3 \\ \sqrt{2}/3 & 2/3 \end{bmatrix} + 8\begin{bmatrix} 2/3 & -\sqrt{2}/3 \\ -\sqrt{2}/3 & 1/3 \end{bmatrix} = \begin{bmatrix} 6 & -2\sqrt{2} \\ -2\sqrt{2} & 4 \end{bmatrix} = \mathbf{A}$$

Because λ1 and λ2 are scalars, premultiplication and postmultiplication by x' and x, respectively, yield
$$\mathbf{x}'\mathbf{A}\mathbf{x} = 2\,\mathbf{x}'\mathbf{e}_1\mathbf{e}_1'\mathbf{x} + 8\,\mathbf{x}'\mathbf{e}_2\mathbf{e}_2'\mathbf{x} = 2y_1^2 + 8y_2^2 \ge 0$$
where y1 = x'e1 = e1'x and y2 = x'e2 = e2'x. At this point it is obvious that x'Ax is at least nonnegative definite!

We now show that x'Ax is positive definite, i.e., that 2y1² + 8y2² > 0 for all x ≠ 0. From our definitions of y1 and y2 we have
$$\mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} \mathbf{e}_1'\mathbf{x} \\ \mathbf{e}_2'\mathbf{x} \end{bmatrix} = \mathbf{E}\mathbf{x}$$
Since E is an orthogonal matrix, E⁻¹ (= E') exists. Thus x = E'y, and x ≠ 0 implies y ≠ 0. At this point it is obvious that x'Ax is positive definite!
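The positive-definiteness argument can be checked numerically (a sketch; `eigh` is numpy's eigensolver for symmetric matrices and returns eigenvalues in ascending order):

```python
import numpy as np

A = np.array([[6.0, -2*np.sqrt(2)], [-2*np.sqrt(2), 4.0]])
lam, E = np.linalg.eigh(A)

print(lam)                        # [2. 8.] -> all positive, so A is positive definite
# rebuild A from its spectral decomposition: sum of lambda_i * e_i e_i'
A_rebuilt = sum(l * np.outer(e, e) for l, e in zip(lam, E.T))
print(np.allclose(A, A_rebuilt))  # True
```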
This suggests rules for determining whether a k x k symmetric matrix A (or, equivalently, its quadratic form x'Ax) is nonnegative definite or positive definite:
- A is a nonnegative definite matrix iff λi ≥ 0, i = 1,…, rank(A)
- A is a positive definite matrix iff λi > 0, i = 1,…, rank(A)

Measuring Distance
Euclidean (straight-line) distance – the Euclidean distance between two points x and y (whose coordinates are represented by the elements of the corresponding vectors) in p-space is given by
$$d(\mathbf{x},\mathbf{y}) = \sqrt{(x_1-y_1)^2 + \cdots + (x_p-y_p)^2}$$
For the normalized vectors z1, z2, z3 of our earlier Gram-Schmidt example, the Euclidean distances are
d(z1, z2) = √((1.00 − 0.00)² + (0.00 − 0.3162)² + (0.00 − 0.9487)²) = 1.414
d(z1, z3) = √((1.00 − 0.00)² + (0.00 + 0.9556)² + (0.00 − 0.2946)²) = 1.414
d(z2, z3) = √((0.00 − 0.00)² + (0.3162 + 0.9556)² + (0.9487 − 0.2946)²) = 1.430

Notice that the lengths of the vectors are their distances from the origin:
$$d(\mathbf{0},P) = \sqrt{(x_1-0)^2 + \cdots + (x_p-0)^2} = \sqrt{x_1^2 + \cdots + x_p^2}$$
This is yet another place where the Pythagorean theorem rears its head!

Notice also that if we connect all points equidistant from some given point z, the result is a hypersphere with its center at z. In p = 2 dimensions this yields a circle, and we actually talk about its area, πr². In p ≥ 3 dimensions we talk about volume, which is (4/3)πr³ for the sphere or, more generally, π^(p/2) r^p / Γ(p/2 + 1) in p dimensions.

Measuring Distance - a Problem
What if the coordinates of a point x (i.e., the elements of the vector x) are random variables with differing variances? Suppose
- we have n pairs of measurements on two variables X1 and X2, each having a mean of zero;
- X1 is more variable than X2;
- X1 and X2 vary independently.
In a scatter diagram of such data, which point really lies further from the origin in statistical terms (i.e., which point is less likely to have occurred randomly)? Euclidean distance does not account for differences in the variation of X1 and X2! Notice that a circle does not efficiently inscribe the data; an ellipse (with area πr1r2) does so much more efficiently.

How do we take the relative dispersions on the two axes into consideration? We standardize each value of Xi by dividing by its standard deviation. Note that the problem can extend beyond two dimensions: in three dimensions the inscribing ellipsoid has volume (4/3)πr1r2r3.

If we are looking at distances from the origin d(0, P), we could divide coordinate i by its sample standard deviation √sii:
$$x_i^* = \frac{x_i}{\sqrt{s_{ii}}}$$
The resulting measure is called statistical distance or Mahalanobis distance:
$$d(\mathbf{0},P) = \sqrt{(x_1^*)^2 + \cdots + (x_p^*)^2} = \sqrt{\frac{x_1^2}{s_{11}} + \cdots + \frac{x_p^2}{s_{pp}}}$$
Here x1 has a relative weight of k1 = 1/s11, …, and xp has a relative weight of kp = 1/spp.

Note that if we plot all points a constant squared distance c² from the origin, i.e., all points that satisfy
$$d^2(\mathbf{0},P) = \frac{x_1^2}{s_{11}} + \cdots + \frac{x_p^2}{s_{pp}} = c^2$$
we get (in p = 2 dimensions) an ellipse centered at the origin with semi-axes c√s11 and c√s22, and area π(c√s11)(c√s22).

What if the scatter diagram of the data instead shows that X1 and X2 have an obvious positive correlation?
We can plot a rotated coordinate system on axes x̃1 and x̃2; this suggests that we calculate distance based on the rotated axes. The relation between the original coordinates (x1, x2) and the rotated coordinates (x̃1, x̃2) is provided by
$$\tilde{x}_1 = x_1\cos\theta + x_2\sin\theta,\qquad \tilde{x}_2 = -x_1\sin\theta + x_2\cos\theta$$
Now we can write the distance from P = (x̃1, x̃2) to the origin in terms of the original coordinates x1 and x2 of P as
$$d(\mathbf{0},P) = \sqrt{a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2}$$
where
$$a_{11} = \frac{\cos^2\theta}{\cos^2\theta\,s_{11} + 2\sin\theta\cos\theta\,s_{12} + \sin^2\theta\,s_{22}} + \frac{\sin^2\theta}{\cos^2\theta\,s_{22} - 2\sin\theta\cos\theta\,s_{12} + \sin^2\theta\,s_{11}}$$
$$a_{22} = \frac{\sin^2\theta}{\cos^2\theta\,s_{11} + 2\sin\theta\cos\theta\,s_{12} + \sin^2\theta\,s_{22}} + \frac{\cos^2\theta}{\cos^2\theta\,s_{22} - 2\sin\theta\cos\theta\,s_{12} + \sin^2\theta\,s_{11}}$$
$$a_{12} = \frac{\cos\theta\sin\theta}{\cos^2\theta\,s_{11} + 2\sin\theta\cos\theta\,s_{12} + \sin^2\theta\,s_{22}} - \frac{\sin\theta\cos\theta}{\cos^2\theta\,s_{22} - 2\sin\theta\cos\theta\,s_{12} + \sin^2\theta\,s_{11}}$$
Note that when the coordinates x1 and x2 are uncorrelated (θ = 0), these weights reduce to a11 = 1/s11, a22 = 1/s22, and a12 = 0, recovering the standardized distance above.

What if we wish to measure distance from some fixed point Q = (y1, y2)? Here Q = (ȳ1, ȳ2) = (x̄1, x̄2) is called the centroid of the data, and the distance from any point P = (x1, x2) to Q is
$$d(P,Q) = \sqrt{a_{11}(x_1-y_1)^2 + 2a_{12}(x_1-y_1)(x_2-y_2) + a_{22}(x_2-y_2)^2}$$

Suppose we have the following ten bivariate observations (coordinate pairs (x1, x2)):

Obs #   1   2   3   4   5   6   7   8   9  10  | mean
x1     -3  -2  -1  -2  -3  -1  -3  -3   0  -2  | -2.0
x2      3   6   4   4   6   6   2   5   7   7  |  5.0

The plot of these points suggests a positive correlation between x1 and x2; the centroid is (−2, 5), and the inscribing ellipse (with its major and minor axes) is rotated θ = 45° from the coordinate axes.

Using θ = 45° and the sample quantities s11 = 1.11, s22 = 2.89, and s12 = 0.8, the rotational weights work out to
a11 = 0.6429,  a22 = 0.6429,  a12 = −0.3571
So the distances of the observed points from their centroid Q = (−2.0, 5.0) are (x̃1, x̃2 are the rotated coordinates):

Obs #   x1   x2     x̃1       x̃2      Euclidean d(P,Q)   Mahalanobis d(P,Q)
1       -3    3    0.0000   4.2426       2.2361              1.3363
2       -2    6    2.8284   5.6569       1.0000              0.8018
3       -1    4    2.1213   3.5355       1.4142              1.4142
4       -2    4    1.4142   4.2426       1.0000              0.8018
5       -3    6    2.1213   6.3640       1.4142              1.4142
6       -1    6    3.5355   4.9497       1.4142              0.7559
7       -3    2   -0.7071   3.5355       3.1623              2.0702
8       -3    5    1.4142   5.6569       1.0000              0.8018
9        0    7    4.9497   4.9497       2.8284              1.5119
10      -2    7    3.5355   6.3640       2.0000              1.6036

Mahalanobis distance can easily be generalized to p dimensions:
$$d(P,Q) = \sqrt{\sum_{i=1}^{p} a_{ii}(x_i-y_i)^2 + 2\sum_{i=1}^{p-1}\sum_{j=i+1}^{p} a_{ij}(x_i-y_i)(x_j-y_j)}$$
and all points satisfying
$$\sum_{i=1}^{p} a_{ii}(x_i-y_i)^2 + 2\sum_{i=1}^{p-1}\sum_{j=i+1}^{p} a_{ij}(x_i-y_i)(x_j-y_j) = c^2$$
form a hyperellipsoid with centroid Q.

Now let's backtrack: the Mahalanobis distance of a random p-dimensional point P from the origin is given by
$$d(\mathbf{0},P) = \sqrt{\sum_{i=1}^{p} a_{ii}x_i^2 + 2\sum_{i=1}^{p-1}\sum_{j=i+1}^{p} a_{ij}x_ix_j}$$
so we can say
$$d^2(\mathbf{0},P) = \sum_{i=1}^{p} a_{ii}x_i^2 + 2\sum_{i=1}^{p-1}\sum_{j=i+1}^{p} a_{ij}x_ix_j$$
provided that d² > 0 for all x ≠ 0.
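The table's Mahalanobis column can be reproduced with a few lines of numpy (a sketch; the weight matrix A is built from the slide's values a11 = a22 = 0.6429 and a12 = −0.3571):

```python
import numpy as np

x = np.array([[-3, 3], [-2, 6], [-1, 4], [-2, 4], [-3, 6],
              [-1, 6], [-3, 2], [-3, 5], [0, 7], [-2, 7]], dtype=float)
q = x.mean(axis=0)                          # centroid (-2, 5)

A = np.array([[0.6429, -0.3571],            # weights from the slide
              [-0.3571, 0.6429]])

d = x - q
mahalanobis = np.sqrt(np.einsum('ij,jk,ik->i', d, A, d))  # (x-q)'A(x-q) per row
print(np.round(mahalanobis, 4))
```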
Recognizing that aij = aji for i ≠ j, i = 1,…,p, j = 1,…,p, we have
$$0 < d^2(\mathbf{0},P) = \begin{bmatrix} x_1 & x_2 & \cdots & x_p \end{bmatrix}\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1p} \\ a_{21} & a_{22} & \cdots & a_{2p} \\ \vdots & & \ddots & \vdots \\ a_{p1} & a_{p2} & \cdots & a_{pp} \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{bmatrix} = \mathbf{x}'\mathbf{A}\mathbf{x}$$
for x ≠ 0. Thus the p x p symmetric matrix A is positive definite, i.e., distance is determined from a positive definite quadratic form x'Ax! We can also conclude from this result that a positive definite quadratic form can be interpreted as a squared distance. Finally, if the square of the distance from point x to the origin is given by x'Ax, then the square of the distance from x to some arbitrary fixed point μ is given by (x − μ)'A(x − μ).

Expressing distance as the square root of a positive definite quadratic form yields an interesting geometric interpretation based on the eigenvalues and eigenvectors of A. For example, in p = 2 dimensions all points x = [x1, x2]' that are a constant distance c from the origin must satisfy
$$\mathbf{x}'\mathbf{A}\mathbf{x} = a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2 = c^2$$
By the spectral decomposition, A = λ1e1e1' + λ2e2e2', so by substitution we now have
$$\mathbf{x}'\mathbf{A}\mathbf{x} = \lambda_1(\mathbf{x}'\mathbf{e}_1)^2 + \lambda_2(\mathbf{x}'\mathbf{e}_2)^2 = c^2$$
and A is positive definite, so λ1 > 0 and λ2 > 0, which means this is an ellipse.

A little algebra can be used to show that x = (c/√λ1)e1 satisfies x'Ax = λ1(c/√λ1)² = c², and similarly that x = (c/√λ2)e2 satisfies x'Ax = c². So the points at a distance c lie on an ellipse whose axes are given by the eigenvectors of A with lengths proportional to the reciprocals of the square roots of the corresponding eigenvalues (with constant of proportionality c). This generalizes to p dimensions.

Square Root Matrices
Because spectral decomposition allows us to express the inverse of a square matrix in terms of its eigenvalues and eigenvectors, it enables us to conveniently create a square root matrix. Let A be a p x p positive definite matrix with the spectral decomposition
$$\mathbf{A} = \sum_{i=1}^{p}\lambda_i\mathbf{e}_i\mathbf{e}_i'$$
Also let P = [e1 e2 ⋯ ep] be the matrix whose columns are the normalized eigenvectors of A. Then
$$\mathbf{A} = \mathbf{P}\boldsymbol{\Lambda}\mathbf{P}'$$
where P'P = PP' = I and
$$\boldsymbol{\Lambda} = \begin{bmatrix} \lambda_1 & 0 & \cdots & 0 \\ 0 & \lambda_2 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_p \end{bmatrix}$$
Now, since (PΛ⁻¹P')(PΛP') = (PΛP')(PΛ⁻¹P') = PP' = I, we have
$$\mathbf{A}^{-1} = \mathbf{P}\boldsymbol{\Lambda}^{-1}\mathbf{P}' = \sum_{i=1}^{p}\frac{1}{\lambda_i}\mathbf{e}_i\mathbf{e}_i'$$
Next let Λ^(1/2) denote the diagonal matrix with ith diagonal element √λi. The matrix
$$\mathbf{A}^{1/2} = \mathbf{P}\boldsymbol{\Lambda}^{1/2}\mathbf{P}' = \sum_{i=1}^{p}\sqrt{\lambda_i}\,\mathbf{e}_i\mathbf{e}_i'$$
is called the square root of A. The square root of A has the following properties:
- (A^(1/2))' = A^(1/2) (i.e., A^(1/2) is symmetric)
- A^(1/2)A^(1/2) = A
- A^(−1/2) = PΛ^(−1/2)P' = Σ (1/√λi) ei ei' = (A^(1/2))⁻¹
- A^(1/2)A^(−1/2) = A^(−1/2)A^(1/2) = I and A^(−1/2)A^(−1/2) = A⁻¹
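A square root matrix is easy to build from the eigendecomposition (a sketch reusing the positive definite matrix from the quadratic-form example):

```python
import numpy as np

A = np.array([[6.0, -2*np.sqrt(2)], [-2*np.sqrt(2), 4.0]])  # positive definite
lam, P = np.linalg.eigh(A)

A_half = P @ np.diag(np.sqrt(lam)) @ P.T            # A^(1/2) = P Lambda^(1/2) P'
print(np.allclose(A_half @ A_half, A))              # True: A^(1/2) A^(1/2) = A
A_neg_half = P @ np.diag(1/np.sqrt(lam)) @ P.T      # A^(-1/2)
print(np.allclose(A_half @ A_neg_half, np.eye(2)))  # True: A^(1/2) A^(-1/2) = I
```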
Singular Value Decomposition
We can extend the operations of spectral decomposition to a rectangular matrix by using the eigenvalues and eigenvectors of the square matrix A'A (or AA'). Suppose A is an m x k real matrix. There exist an m x m orthogonal matrix U and a k x k orthogonal matrix V such that
$$\mathbf{A} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{V}'$$
where the m x k matrix Λ has ith diagonal element λi ≥ 0 (the singular values of A) for i = 1, 2,…, min(m, k), and 0 for all other elements.

Singular value decomposition can also be expressed as a matrix expansion that depends on the rank r of A. There exist r positive constants λ1, λ2,…, λr, r orthogonal m x 1 unit vectors u1, u2, …, ur, and r orthogonal k x 1 unit vectors v1, v2, …, vr such that
$$\mathbf{A} = \sum_{i=1}^{r}\lambda_i\mathbf{u}_i\mathbf{v}_i' = \mathbf{U}_r\boldsymbol{\Lambda}_r\mathbf{V}_r'$$
where Ur = [u1, u2, …, ur], Vr = [v1, v2, …, vr], and Λr is an r x r diagonal matrix with diagonal entries λ1, λ2,…, λr and off-diagonal entries 0.

We can show that AA' has eigenvalue-eigenvector pairs (λi², ui), so
$$\mathbf{A}\mathbf{A}'\mathbf{u}_i = \lambda_i^2\mathbf{u}_i$$
with λi > 0 for i = 1,…,r and λi = 0 for i = r+1,…,m (for m > k); then vi = (1/λi)A'ui. Alternatively, we can show that A'A has eigenvalue-eigenvector pairs (λi², vi), so
$$\mathbf{A}'\mathbf{A}\mathbf{v}_i = \lambda_i^2\mathbf{v}_i$$
with λi > 0 for i = 1,…,r and λi = 0 for i = r+1,…,k (for k > m); then ui = (1/λi)Avi.

Example: suppose we have the rectangular matrix
$$\mathbf{A} = \begin{bmatrix} 2 & 1 & 1 \\ -1 & 2 & 2 \end{bmatrix}$$
Then
$$\mathbf{A}\mathbf{A}' = \begin{bmatrix} 6 & 2 \\ 2 & 9 \end{bmatrix}$$
and AA' has eigenvalues γ1 = 5 and γ2 = 10 with corresponding normalized eigenvectors
$$\mathbf{u}_1 = \begin{bmatrix} 2/\sqrt{5} \\ -1/\sqrt{5} \end{bmatrix},\qquad \mathbf{u}_2 = \begin{bmatrix} 1/\sqrt{5} \\ 2/\sqrt{5} \end{bmatrix}$$
Similarly, for the same rectangular matrix,
$$\mathbf{A}'\mathbf{A} = \begin{bmatrix} 5 & 0 & 0 \\ 0 & 5 & 5 \\ 0 & 5 & 5 \end{bmatrix}$$
and A'A has eigenvalues γ1 = 0, γ2 = 5, and γ3 = 10 with corresponding normalized eigenvectors
$$\mathbf{v}_1 = \begin{bmatrix} 0 \\ 1/\sqrt{2} \\ -1/\sqrt{2} \end{bmatrix},\qquad \mathbf{v}_2 = \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix},\qquad \mathbf{v}_3 = \begin{bmatrix} 0 \\ 1/\sqrt{2} \\ 1/\sqrt{2} \end{bmatrix}$$
Now taking the singular values λ1 = √γ1 = 0, λ2 = √γ2 = √5, and λ3 = √γ3 = √10, we find that the singular value decomposition of A is
$$\mathbf{A} = \sqrt{5}\begin{bmatrix} 2/\sqrt{5} \\ -1/\sqrt{5} \end{bmatrix}\begin{bmatrix} 1 & 0 & 0 \end{bmatrix} + \sqrt{10}\begin{bmatrix} 1/\sqrt{5} \\ 2/\sqrt{5} \end{bmatrix}\begin{bmatrix} 0 & 1/\sqrt{2} & 1/\sqrt{2} \end{bmatrix}$$

The singular value decomposition is closely connected to the approximation of a rectangular matrix by a lower-rank matrix [Eckart and Young, 1936]. First note that if an m x k matrix A is approximated by a matrix B of the same dimension but lower rank, then
$$\sum_{i=1}^{m}\sum_{j=1}^{k}(a_{ij}-b_{ij})^2 = \mathrm{tr}\!\left[(\mathbf{A}-\mathbf{B})(\mathbf{A}-\mathbf{B})'\right]$$
Eckart and Young used this fact to show that, for an m x k real matrix A with m ≥ k and singular value decomposition UΛV',
$$\mathbf{B} = \sum_{i=1}^{s}\lambda_i\mathbf{u}_i\mathbf{v}_i'$$
is the rank-s least squares approximation to A (where s < k = rank(A)). This matrix B minimizes
$$\sum_{i=1}^{m}\sum_{j=1}^{k}(a_{ij}-b_{ij})^2 = \mathrm{tr}\!\left[(\mathbf{A}-\mathbf{B})(\mathbf{A}-\mathbf{B})'\right]$$
over all m x k matrices of rank no greater than s! It can also be shown that the error of this approximation is
$$\sum_{i=s+1}^{k}\lambda_i^2$$
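The example and the Eckart-Young error formula can be verified with numpy's SVD (a sketch; `np.linalg.svd` returns singular values in descending order):

```python
import numpy as np

A = np.array([[2.0, 1.0, 1.0], [-1.0, 2.0, 2.0]])
U, s, Vt = np.linalg.svd(A, full_matrices=False)

print(s**2)                                   # [10. 5.] = eigenvalues of AA'
print(np.allclose(A, U @ np.diag(s) @ Vt))    # True: A = U Lambda V'

# rank-1 least squares approximation: keep only the largest singular value
B = s[0] * np.outer(U[:, 0], Vt[0])
print(np.sum((A - B)**2), s[1]**2)            # approximation error = lambda_2^2 = 5
```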
Random Vectors and Matrices
A random vector is a vector whose individual elements are random variables; a random matrix is a matrix whose individual elements are random variables.

The expected value of a random vector or matrix is the matrix containing the expected values of the individual elements:
$$E(\mathbf{X}) = \begin{bmatrix} E(x_{11}) & E(x_{12}) & \cdots & E(x_{1p}) \\ E(x_{21}) & E(x_{22}) & \cdots & E(x_{2p}) \\ \vdots & & \ddots & \vdots \\ E(x_{n1}) & E(x_{n2}) & \cdots & E(x_{np}) \end{bmatrix}$$
where
$$E(x_{ij}) = \sum_{\text{all } x_{ij}} x_{ij}\,p_{ij}(x_{ij}) \quad\text{or}\quad E(x_{ij}) = \int x_{ij}\,f_{ij}(x_{ij})\,dx_{ij}$$
for discrete and continuous random variables, respectively.

Note that for random matrices X and Y of the same dimension, conformable matrices of constants A and B, and scalar c:
- E(cX) = cE(X)
- E(X + Y) = E(X) + E(Y)
- E(AXB) = AE(X)B

Mean Vector – the vector whose elements are the means of the corresponding random variables, i.e.,
$$\mu_i = E(x_i) = \sum_{\text{all } x_i} x_i\,p_i(x_i) \quad\text{or}\quad \mu_i = \int x_i\,f_i(x_i)\,dx_i$$
In matrix notation we can write the mean vector as
$$E(\mathbf{X}) = \begin{bmatrix} E(x_1) \\ E(x_2) \\ \vdots \\ E(x_p) \end{bmatrix} = \begin{bmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_p \end{bmatrix} = \boldsymbol{\mu}$$

For the bivariate probability distribution

            X2 = -4   X2 = 1   X2 = 3  | p1(x1)
X1 = 1       0.15      0.20     0.05   |  0.40
X1 = 2       0.10      0.25     0.25   |  0.60
p2(x2)       0.25      0.45     0.30   |  1.00

the mean vector is
$$\boldsymbol{\mu} = \begin{bmatrix} E(x_1) \\ E(x_2) \end{bmatrix} = \begin{bmatrix} 1(0.4) + 2(0.6) \\ -4(0.25) + 1(0.45) + 3(0.30) \end{bmatrix} = \begin{bmatrix} 1.60 \\ 0.35 \end{bmatrix}$$

Covariance Matrix – the symmetric matrix whose diagonal elements are the variances of the corresponding random variables,
$$\sigma_i^2 = E(x_i-\mu_i)^2 = \sum_{\text{all } x_i}(x_i-\mu_i)^2\,p_i(x_i) \quad\text{or}\quad \int (x_i-\mu_i)^2 f_i(x_i)\,dx_i$$
and whose off-diagonal elements are the covariances of the corresponding pairs of random variables,
$$\sigma_{ik} = E(x_i-\mu_i)(x_k-\mu_k) = \sum_{\text{all } x_i}\sum_{\text{all } x_k}(x_i-\mu_i)(x_k-\mu_k)\,p_{ik}(x_i,x_k)$$
(or the corresponding integral for continuous variables). Notice that this expression, when i = k, returns the variance, i.e., σii = σi². In matrix notation we can write the covariance matrix as
$$\boldsymbol{\Sigma} = E(\mathbf{X}-\boldsymbol{\mu})(\mathbf{X}-\boldsymbol{\mu})' = \begin{bmatrix} \sigma_{11} & \sigma_{12} & \cdots & \sigma_{1p} \\ \sigma_{21} & \sigma_{22} & \cdots & \sigma_{2p} \\ \vdots & & \ddots & \vdots \\ \sigma_{p1} & \sigma_{p2} & \cdots & \sigma_{pp} \end{bmatrix}$$

For the bivariate probability distribution we used earlier, the covariance matrix is computed element by element:
σ11 = (1 − 1.6)²(0.4) + (2 − 1.6)²(0.6) = 0.24
σ22 = (−4 − 0.35)²(0.25) + (1 − 0.35)²(0.45) + (3 − 0.35)²(0.30) = 7.0275
σ12 = σ21 = ΣΣ (x1 − 1.6)(x2 − 0.35) p12(x1, x2) = 0.39
so
$$\boldsymbol{\Sigma} = \begin{bmatrix} 0.24 & 0.39 \\ 0.39 & 7.0275 \end{bmatrix}$$
Thus the population represented by this bivariate probability distribution has population mean vector μ = [1.60, 0.35]' and variance-covariance matrix Σ as above.
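The mean vector and covariance matrix for this joint pmf can be computed directly (a numpy sketch of the calculations above):

```python
import numpy as np

x1 = np.array([1.0, 2.0]); x2 = np.array([-4.0, 1.0, 3.0])
p = np.array([[0.15, 0.20, 0.05],     # joint pmf, rows indexed by x1
              [0.10, 0.25, 0.25]])    # columns indexed by x2

mu1 = x1 @ p.sum(axis=1)              # E[X1] from the row marginals -> 1.60
mu2 = x2 @ p.sum(axis=0)              # E[X2] from the column marginals -> 0.35

s11 = (x1 - mu1)**2 @ p.sum(axis=1)   # Var(X1) -> 0.24
s22 = (x2 - mu2)**2 @ p.sum(axis=0)   # Var(X2) -> 7.0275
s12 = (x1 - mu1) @ p @ (x2 - mu2)     # Cov(X1, X2) -> 0.39
print(np.array([mu1, mu2]), np.array([[s11, s12], [s12, s22]]))
```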
We can use the population variance-covariance matrix Σ to calculate the population correlation matrix ρ. Individual population correlation coefficients are defined as
$$\rho_{ik} = \frac{\sigma_{ik}}{\sqrt{\sigma_{ii}}\sqrt{\sigma_{kk}}}$$
In matrix notation we can write the correlation matrix as
$$\boldsymbol{\rho} = \begin{bmatrix} 1 & \rho_{12} & \cdots & \rho_{1p} \\ \rho_{21} & 1 & \cdots & \rho_{2p} \\ \vdots & & \ddots & \vdots \\ \rho_{p1} & \rho_{p2} & \cdots & 1 \end{bmatrix}$$
We can easily show that Σ = V^(1/2) ρ V^(1/2), where V^(1/2) is the standard deviation matrix
$$\mathbf{V}^{1/2} = \begin{bmatrix} \sqrt{\sigma_{11}} & 0 & \cdots & 0 \\ 0 & \sqrt{\sigma_{22}} & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{\sigma_{pp}} \end{bmatrix}$$
which implies
$$\boldsymbol{\rho} = (\mathbf{V}^{1/2})^{-1}\boldsymbol{\Sigma}(\mathbf{V}^{1/2})^{-1}$$

For the bivariate probability distribution we used earlier, the standard deviation matrix is
$$\mathbf{V}^{1/2} = \begin{bmatrix} \sqrt{0.24} & 0 \\ 0 & \sqrt{7.0275} \end{bmatrix} = \begin{bmatrix} 0.48989 & 0 \\ 0 & 2.65094 \end{bmatrix}$$
so the population correlation matrix is
$$\boldsymbol{\rho} = \begin{bmatrix} 2.04124 & 0 \\ 0 & 0.37739 \end{bmatrix}\begin{bmatrix} 0.24 & 0.39 \\ 0.39 & 7.0275 \end{bmatrix}\begin{bmatrix} 2.04124 & 0 \\ 0 & 0.37739 \end{bmatrix} = \begin{bmatrix} 1.00 & 0.30 \\ 0.30 & 1.00 \end{bmatrix}$$

Partitioned Matrices
We often deal with variables that naturally fall into groups. In the simplest case, we have two groups of size q and p − q variables. Under such circumstances it may be convenient to partition the mean vector and covariance matrix:
$$\boldsymbol{\mu} = \begin{bmatrix} \boldsymbol{\mu}^{(1)} \\ \boldsymbol{\mu}^{(2)} \end{bmatrix},\qquad \boldsymbol{\Sigma} = \begin{bmatrix} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\ \boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{bmatrix}$$
where μ⁽¹⁾ holds the means of the first q variables, Σ11 is the q x q covariance matrix of the first group, Σ22 is the (p − q) x (p − q) covariance matrix of the second group, and Σ12 = Σ21' holds the covariances between variables in the two groups.

Rules for Mean Vectors and Covariance Matrices for Linear Combinations of Random Variables
The linear combination of real constants c and random variables X,
$$\mathbf{c}'\mathbf{X} = c_1X_1 + c_2X_2 + \cdots + c_pX_p = \sum_{j=1}^{p}c_jX_j$$
has mean E(c'X) = c'E(X) = c'μ and variance Var(c'X) = c'Σc.

Suppose, for example, we have the random vector X = [X1, X2]' and we want to find the mean vector and covariance matrix for the linear combinations Z1 = X1 − X2 and Z2 = X1 + X2, i.e.,
$$\mathbf{Z} = \begin{bmatrix} Z_1 \\ Z_2 \end{bmatrix} = \begin{bmatrix} 1 & -1 \\ 1 & 1 \end{bmatrix}\begin{bmatrix} X_1 \\ X_2 \end{bmatrix} = \mathbf{C}\mathbf{X}$$
We have mean vector
$$\boldsymbol{\mu}_Z = E(\mathbf{Z}) = \mathbf{C}\boldsymbol{\mu}_X = \begin{bmatrix} \mu_1 - \mu_2 \\ \mu_1 + \mu_2 \end{bmatrix}$$
and covariance matrix
$$\boldsymbol{\Sigma}_Z = \mathrm{Cov}(\mathbf{Z}) = \mathbf{C}\boldsymbol{\Sigma}_X\mathbf{C}' = \begin{bmatrix} \sigma_{11} - 2\sigma_{12} + \sigma_{22} & \sigma_{11} - \sigma_{22} \\ \sigma_{11} - \sigma_{22} & \sigma_{11} + 2\sigma_{12} + \sigma_{22} \end{bmatrix}$$
Note what happens if σ11 = σ22: the off-diagonal terms vanish (i.e., the sum and difference of two random variables with identical variances are uncorrelated).

Matrix Inequalities and Maximization
These results will be useful in later applications:
- Cauchy-Schwarz inequality – let b and d be any two p x 1 vectors. Then (b'd)² ≤ (b'b)(d'd), with equality iff b = cd (or d = cb) for some constant c.
- Extended Cauchy-Schwarz inequality – let b and d be any two p x 1 vectors and B be a p x p positive definite matrix. Then (b'd)² ≤ (b'Bb)(d'B⁻¹d), with equality iff b = cB⁻¹d (or d = cBb) for some constant c.
- Maximization lemma – let d be a given p x 1 vector and B be a p x p positive definite matrix. Then for an arbitrary nonzero vector x,
$$\max_{\mathbf{x}\ne\mathbf{0}}\frac{(\mathbf{x}'\mathbf{d})^2}{\mathbf{x}'\mathbf{B}\mathbf{x}} = \mathbf{d}'\mathbf{B}^{-1}\mathbf{d}$$
with the maximum attained when x = cB⁻¹d for any constant c ≠ 0.
- Maximization of quadratic forms for points on the unit sphere – let B be a p x p positive definite matrix with eigenvalues λ1 ≥ λ2 ≥ … ≥ λp and associated eigenvectors e1, e2,…, ep. Then
$$\max_{\mathbf{x}\ne\mathbf{0}}\frac{\mathbf{x}'\mathbf{B}\mathbf{x}}{\mathbf{x}'\mathbf{x}} = \lambda_1 \ \text{(attained when } \mathbf{x}=\mathbf{e}_1\text{)},\qquad \min_{\mathbf{x}\ne\mathbf{0}}\frac{\mathbf{x}'\mathbf{B}\mathbf{x}}{\mathbf{x}'\mathbf{x}} = \lambda_p \ \text{(attained when } \mathbf{x}=\mathbf{e}_p\text{)}$$
and
$$\max_{\mathbf{x}\perp\mathbf{e}_1,\ldots,\mathbf{e}_k}\frac{\mathbf{x}'\mathbf{B}\mathbf{x}}{\mathbf{x}'\mathbf{x}} = \lambda_{k+1} \ \text{(attained when } \mathbf{x}=\mathbf{e}_{k+1},\ k = 1, 2, \ldots, p-1\text{)}$$
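The last result (the Rayleigh quotient is bounded by the extreme eigenvalues) is easy to probe numerically; here is a sketch using the positive definite matrix from the quadratic-form example and random test vectors:

```python
import numpy as np

rng = np.random.default_rng(0)
B = np.array([[6.0, -2*np.sqrt(2)], [-2*np.sqrt(2), 4.0]])  # positive definite
lam, E = np.linalg.eigh(B)                                  # ascending: [2., 8.]

ratios = []
for _ in range(1000):
    x = rng.standard_normal(2)
    ratios.append(x @ B @ x / (x @ x))       # the Rayleigh quotient x'Bx / x'x
print(min(ratios) >= lam[0], max(ratios) <= lam[-1] + 1e-9)  # bounded by [2, 8]

e1 = E[:, -1]                                # eigenvector for the largest eigenvalue
print(np.isclose(e1 @ B @ e1, lam[-1]))      # maximum attained at x = e1 -> True
```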
Summary
There are MANY other matrix algebra results that will be important to our study of statistical modeling. This site will be updated occasionally to reflect those results as they become necessary.