A matrix primer for ST 711.

***Basics:
A matrix is a rectangular array of numbers like
\[ X = \begin{pmatrix} 1 & 4 \\ 1 & 3 \\ 1 & 5 \end{pmatrix} \]
A vector is a matrix with just one row (row vector) or column (column vector) like
\[ Y = \begin{pmatrix} 14 \\ 8 \\ 10 \end{pmatrix} \quad\text{or}\quad Y' = \begin{pmatrix} 14 & 8 & 10 \end{pmatrix} \]
The transpose of a matrix is what you get by converting the columns to rows. A prime (apostrophe) denotes the transpose, as you see with Y′ above. A matrix is symmetric if it is equal to its own transpose. The word “scalar” denotes a single number, so 5 is called a scalar when we talk about matrices, to distinguish it from a matrix or vector.

***Arithmetic
[+, -]
You can only add matrices that have the same number of rows and the same number of columns. You add element by element; subtraction works the same way. The elements of a matrix are the entries in the matrix, and the ijth element is the one in row i and column j. For example
\[ \begin{pmatrix} 3 & -4 & 2 \\ 1 & 0 & 8 \end{pmatrix} + \begin{pmatrix} 1 & 0 & -2 \\ 1 & 4 & 2 \end{pmatrix} = \begin{pmatrix} 4 & -4 & 0 \\ 2 & 4 & 10 \end{pmatrix} = 2 \begin{pmatrix} 2 & -2 & 0 \\ 1 & 2 & 5 \end{pmatrix} \]
which also shows how scalar multiplication works, and
\[ \begin{pmatrix} 3 & -4 & 2 \\ 1 & 0 & 8 \end{pmatrix} - \begin{pmatrix} 1 & 0 & -2 \\ 1 & 4 & 2 \end{pmatrix} = \begin{pmatrix} 2 & -4 & 4 \\ 0 & -4 & 6 \end{pmatrix} = 2 \begin{pmatrix} 1 & -2 & 2 \\ 0 & -2 & 3 \end{pmatrix} \]
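The addition, subtraction and scalar multiplication examples above can be checked with a few lines of Python. This is only a minimal sketch: it assumes a matrix is stored as a list of rows, and the function names (`add`, `scale`, `subtract`) and the labels `P` and `Q` for the two example matrices are made up for illustration, not part of the primer.

```python
def add(a, b):
    """Element-by-element sum of two matrices stored as lists of rows."""
    same_shape = len(a) == len(b) and all(len(ra) == len(rb) for ra, rb in zip(a, b))
    if not same_shape:
        raise ValueError("matrices must have the same number of rows and columns")
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def scale(c, a):
    """Scalar multiplication: multiply every element of a by the number c."""
    return [[c * x for x in row] for row in a]


def subtract(a, b):
    """Element-by-element difference, computed as a + (-1)b."""
    return add(a, scale(-1, b))


# P and Q are hypothetical labels for the two matrices in the examples above.
P = [[3, -4, 2], [1, 0, 8]]
Q = [[1, 0, -2], [1, 4, 2]]

total = add(P, Q)
difference = subtract(P, Q)
print(total)       # compare with 2 * [[2, -2, 0], [1, 2, 5]]
print(difference)  # compare with 2 * [[1, -2, 2], [0, -2, 3]]
```

Trying `add` on two matrices of different shapes raises an error, which mirrors the rule that only matrices with matching dimensions can be added.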
You can multiply a row times a column. The result is the sum of cross products. For example you might have
\[ \begin{pmatrix} 2 & 1 & 4 \end{pmatrix} \begin{pmatrix} 5 \\ 4 \\ -3 \end{pmatrix} = (2), \]
because 2(5) + 1(4) + 4(−3) = 14 − 12 = 2. Another example would be
\[ Y'Y = \begin{pmatrix} 14 & 8 & 10 \end{pmatrix} \begin{pmatrix} 14 \\ 8 \\ 10 \end{pmatrix} = 14^2 + 8^2 + 10^2 = 360, \]
which is the sum of squared Y values.
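As a quick check of the two row-times-column products above, here is a small Python sketch. It assumes vectors are plain lists of numbers; the function name `row_times_column` is hypothetical.

```python
def row_times_column(row, col):
    """Sum of cross products of a row vector and a column vector."""
    if len(row) != len(col):
        raise ValueError("row and column must have the same number of entries")
    return sum(r * c for r, c in zip(row, col))


# First example: (2 1 4) times the column (5, 4, -3).
small = row_times_column([2, 1, 4], [5, 4, -3])

# Second example: Y'Y, the sum of squared Y values.
Y = [14, 8, 10]
sum_of_squares = row_times_column(Y, Y)

print(small, sum_of_squares)
```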
[multiplication]
To multiply matrices, AB for example, take the ith row of A times the jth column of B to get the ijth element of AB. The number of entries must match up (A must have as many columns as B has rows), otherwise the matrices cannot be multiplied (they are “not conformable” for multiplication). Example:
\[ A = \begin{pmatrix} 3 & 2 \\ 4 & 5 \end{pmatrix}, \quad B = \begin{pmatrix} 2 & 1 \\ 3 & -2 \end{pmatrix}, \quad I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \]
gives us
\[ AB = \begin{pmatrix} 3 & 2 \\ 4 & 5 \end{pmatrix} \begin{pmatrix} 2 & 1 \\ 3 & -2 \end{pmatrix} = \begin{pmatrix} 6 + 6 & 3 - 4 \\ 8 + 15 & 4 - 10 \end{pmatrix} = \begin{pmatrix} 12 & -1 \\ 23 & -6 \end{pmatrix} \]
as well as
\[ BA = \begin{pmatrix} 2 & 1 \\ 3 & -2 \end{pmatrix} \begin{pmatrix} 3 & 2 \\ 4 & 5 \end{pmatrix} = \begin{pmatrix} 6 + 4 & 4 + 5 \\ 9 - 8 & 6 - 10 \end{pmatrix} = \begin{pmatrix} 10 & 9 \\ 1 & -4 \end{pmatrix} \]
so clearly AB is not always the same as BA. If one of the matrices is just a scalar, then AB will equal BA. We also notice that AI=A, IA=A, BI=B, and IB=B. The matrix with 1s on its diagonal (upper left to lower right entries) and 0s off the diagonal is symbolized I and is called the “identity matrix” because when any conformable matrix is multiplied by I on the right or the left, it remains unchanged, identical to what it was to start with.

[division]
Technically, there is no such thing as division. What we do is invert and multiply. With numbers, you may recall that dividing 8 by 0.5 is the same as multiplying 8 by the inverse of 0.5. The inverse of 0.5 is a number X that solves 0.5X=1, and note that any number multiplied by 1 is unchanged (1 is the identity in arithmetic). The inverse of 0.5 is 2 and 8/0.5 = 8(2) = 16 (try it on your calculator). Also, there is a number (0) that has no inverse, so we cannot divide by 0. In matrices, the identity is the matrix I as described under [multiplication], so the inverse of a matrix A is another matrix C that satisfies AC=I. In order for A to have an inverse, it must be square and have full rank (see the next section). If A has an inverse then, instead of writing the inverse as C, we denote the inverse of matrix A by $A^{-1}$. This means that $A^{-1}$ is a matrix such that $A(A^{-1}) = I$, and whenever that happens, you can be sure that $A^{-1}A = I$ as well. For example, if
\[ A = \begin{pmatrix} 6 & 2 \\ 7 & 4 \end{pmatrix} \quad\text{then}\quad A^{-1} = \begin{pmatrix} 0.4 & -0.2 \\ -0.7 & 0.6 \end{pmatrix} \]
because
\[ AA^{-1} = \begin{pmatrix} 6 & 2 \\ 7 & 4 \end{pmatrix} \begin{pmatrix} 0.4 & -0.2 \\ -0.7 & 0.6 \end{pmatrix} = \begin{pmatrix} 2.4 - 1.4 & -1.2 + 1.2 \\ 2.8 - 2.8 & -1.4 + 2.4 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \]
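The row-by-column rule, the identity matrix, and the inverse check above can be reproduced with the following Python sketch. It is illustrative only: `matmul` is a hypothetical helper, matrices are assumed to be lists of rows, and exact fractions from the standard library are used so that products like 2.4 − 1.4 come out as exactly 1. The name `A2` is made up here because the primer reuses the letter A for the second example matrix.

```python
from fractions import Fraction


def matmul(a, b):
    """Matrix product: entry (i, j) is row i of a times column j of b."""
    if any(len(row) != len(b) for row in a):
        raise ValueError("matrices are not conformable for multiplication")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


A = [[3, 2], [4, 5]]
B = [[2, 1], [3, -2]]
I = [[1, 0], [0, 1]]

AB = matmul(A, B)
BA = matmul(B, A)
print(AB, BA, AB == BA)  # the two products differ

# Second example matrix (also called A in the text) and its stated inverse.
A2 = [[6, 2], [7, 4]]
A2_inv = [
    [Fraction("0.4"), Fraction("-0.2")],
    [Fraction("-0.7"), Fraction("0.6")],
]
print(matmul(A2, A2_inv) == I, matmul(A2_inv, A2) == I)
```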
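Usually we let the computer calculate the inverse of a matrix; however, it can be helpful to know a formula for a 2 by 2 matrix like our A here. The formula is
\[ \begin{pmatrix} a & b \\ c & d \end{pmatrix}^{-1} = \frac{1}{ad - bc} \begin{pmatrix} d & -b \\ -c & a \end{pmatrix} \]
so for our matrix A we got the inverse
\[ \begin{pmatrix} 6 & 2 \\ 7 & 4 \end{pmatrix}^{-1} = \frac{1}{24 - 14} \begin{pmatrix} 4 & -2 \\ -7 & 6 \end{pmatrix} = 0.1 \begin{pmatrix} 4 & -2 \\ -7 & 6 \end{pmatrix} = \begin{pmatrix} 0.4 & -0.2 \\ -0.7 & 0.6 \end{pmatrix} \]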
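The 2 by 2 formula translates almost directly into code. The sketch below is one possible rendering, not a production routine: `inverse_2x2` is a hypothetical name, the input is assumed to be a list of two rows of two numbers, and exact fractions are used to avoid rounding. Because the formula divides by ad − bc, the function refuses to proceed when that quantity is 0.

```python
from fractions import Fraction


def inverse_2x2(m):
    """Invert [[a, b], [c, d]] as 1/(ad - bc) * [[d, -b], [-c, a]]."""
    a, b, c, d = (Fraction(x) for row in m for x in row)
    det = a * d - b * c
    if det == 0:
        # Dividing by ad - bc is impossible here, so there is no inverse.
        raise ZeroDivisionError("ad - bc is 0, so the matrix has no inverse")
    return [[d / det, -b / det], [-c / det, a / det]]


A_inv = inverse_2x2([[6, 2], [7, 4]])
print([[float(x) for x in row] for row in A_inv])
```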
Suppose we have
\[ B = \begin{pmatrix} 6 & 4 \\ 9 & 6 \end{pmatrix}. \]
What is $B^{-1}$? We try to compute
\[ B^{-1} = \frac{1}{36 - 36} \begin{pmatrix} 6 & -4 \\ -9 & 6 \end{pmatrix} \]
but that involves 1/0, which is undefined. That number (ad-bc) is called the “determinant” as it determines if the matrix has an inverse.

*** rank and dependence, or, is there an inverse?
Notice that for matrix
\[ B = \begin{pmatrix} 6 & 4 \\ 9 & 6 \end{pmatrix} \]
we can multiply the columns of B by numbers (scalars) that are not all 0, add these together and get a column of 0s. We see that
\[ (-2) \begin{pmatrix} 6 \\ 9 \end{pmatrix} + (3) \begin{pmatrix} 4 \\ 6 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \]
where at least one of the coefficients (-2 and 3) is not 0. This is called a column dependency. Columns for which there are no dependencies are called “linearly independent”. The columns of matrix A are linearly independent. The number of linearly independent columns in a matrix is called the rank of the matrix. Matrix B has rank 1 because the two columns are linearly dependent (rank < 2) and at least one of the columns has the property that only a multiplier 0 will give a column of 0s (rank > 0). (The only rank 0 matrix is a matrix with all entries 0.) As another example,
\[ (0) \begin{pmatrix} 6 \\ 5 \end{pmatrix} + (3) \begin{pmatrix} 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix} \]
so the matrix $D = \begin{pmatrix} 6 & 0 \\ 5 & 0 \end{pmatrix}$ is of
rank 1, not 2, because we have demonstrated a linear dependency in the columns. The number of linearly independent rows in a matrix always equals the number of linearly independent columns, so you can use either one. A square matrix has an inverse if (and only if) all of its columns are linearly independent, that is, only if it is full rank. Full rank means that the rank (number of linearly independent columns) of the matrix is the same as the number of columns it contains. A square matrix of less than full rank cannot be inverted and is referred to as “singular”. Matrix A has an inverse and matrices B and D do not. We can transpose or try to invert the product of two matrices, but the left to right order is reversed, as follows: $(AB)' = B'A'$, and for the inverse of a product we have $(AB)^{-1} = B^{-1}A^{-1}$ if A and B both have inverses.

*** The matrix version of regression
Consider again
\[ X = \begin{pmatrix} 1 & 4 \\ 1 & 3 \\ 1 & 5 \end{pmatrix} \quad\text{and}\quad Y = \begin{pmatrix} 14 \\ 8 \\ 10 \end{pmatrix} \]
and the related motivating problem of simple linear regression for the (x,y) points (4,14), (3,8), and (5,10).

(A) Compute X′X
\[ X'X = \begin{pmatrix} 1 & 1 & 1 \\ 4 & 3 & 5 \end{pmatrix} \begin{pmatrix} 1 & 4 \\ 1 & 3 \\ 1 & 5 \end{pmatrix} = \begin{pmatrix} 3 & 12 \\ 12 & 50 \end{pmatrix} \]
which is symmetric. A matrix is symmetric if it is equal to its own transpose.

(B) Invert X′X
\[ (X'X)^{-1} = \frac{1}{150 - 144} \begin{pmatrix} 50 & -12 \\ -12 & 3 \end{pmatrix} = \begin{pmatrix} 50/6 & -2 \\ -2 & 0.5 \end{pmatrix} \]
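Steps (A) and (B) can be sketched in Python as follows. The helpers `transpose` and `matmul` are hypothetical names, matrices are assumed to be lists of rows, and the inverse is built with the 2 by 2 formula using exact fractions, so 50/6 is stored as 25/3.

```python
from fractions import Fraction


def transpose(m):
    """Turn the columns of m into rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Matrix product: entry (i, j) is row i of a times column j of b."""
    if any(len(row) != len(b) for row in a):
        raise ValueError("matrices are not conformable for multiplication")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


X = [[1, 4], [1, 3], [1, 5]]

# (A) X'X, and a check that it equals its own transpose.
XtX = matmul(transpose(X), X)
is_symmetric = XtX == transpose(XtX)

# (B) Inverse of X'X by the 2 by 2 formula.
(a, b), (c, d) = XtX
det = a * d - b * c
XtX_inv = [
    [Fraction(d, det), Fraction(-b, det)],
    [Fraction(-c, det), Fraction(a, det)],
]

print(XtX, is_symmetric, det)
print(XtX_inv)
```

Multiplying `XtX` by `XtX_inv` with `matmul` gives the identity matrix, which is a convenient way to confirm the inverse.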
(C) Compute X′Y
\[ X'Y = \begin{pmatrix} 1 & 1 & 1 \\ 4 & 3 & 5 \end{pmatrix} \begin{pmatrix} 14 \\ 8 \\ 10 \end{pmatrix} = \begin{pmatrix} 32 \\ 130 \end{pmatrix} \]
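(D) Write the normal equations
\[ X'X\hat{\beta} = X'Y \;\longrightarrow\; \begin{pmatrix} 3 & 12 \\ 12 & 50 \end{pmatrix} \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix} = \begin{pmatrix} 32 \\ 130 \end{pmatrix} \]

(E) Multiply both sides by $(X'X)^{-1}$, making sure that the products have $(X'X)^{-1}$ first (i.e. on the left)
\[ (X'X)^{-1}X'X\hat{\beta} = (X'X)^{-1}X'Y \;\longrightarrow\; \begin{pmatrix} 50/6 & -2 \\ -2 & 0.5 \end{pmatrix} \begin{pmatrix} 3 & 12 \\ 12 & 50 \end{pmatrix} \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix} = \begin{pmatrix} 50/6 & -2 \\ -2 & 0.5 \end{pmatrix} \begin{pmatrix} 32 \\ 130 \end{pmatrix} \]
\[ I\hat{\beta} = (X'X)^{-1}X'Y \;\longrightarrow\; \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix} = \begin{pmatrix} 50/6 & -2 \\ -2 & 0.5 \end{pmatrix} \begin{pmatrix} 32 \\ 130 \end{pmatrix} \]
\[ \hat{\beta} = (X'X)^{-1}X'Y \;\longrightarrow\; \begin{pmatrix} \hat{\beta}_0 \\ \hat{\beta}_1 \end{pmatrix} = \begin{pmatrix} 6.67 \\ 1 \end{pmatrix} \]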
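The whole calculation, from X and Y to the estimated coefficients, can be traced in a short Python sketch. As before, this is an illustration under stated assumptions: matrices are lists of rows, the helper names (`transpose`, `matmul`, `inverse_2x2`) are hypothetical, and exact fractions are used so the intercept appears as 20/3 before rounding to 6.67.

```python
from fractions import Fraction


def transpose(m):
    """Turn the columns of m into rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Matrix product: entry (i, j) is row i of a times column j of b."""
    if any(len(row) != len(b) for row in a):
        raise ValueError("matrices are not conformable for multiplication")
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def inverse_2x2(m):
    """Invert [[a, b], [c, d]] as 1/(ad - bc) * [[d, -b], [-c, a]]."""
    a, b, c, d = (Fraction(x) for row in m for x in row)
    det = a * d - b * c
    if det == 0:
        raise ZeroDivisionError("ad - bc is 0, so the matrix has no inverse")
    return [[d / det, -b / det], [-c / det, a / det]]


X = [[1, 4], [1, 3], [1, 5]]
Y = [[14], [8], [10]]  # column vector stored as three one-entry rows

Xt = transpose(X)
XtX = matmul(Xt, X)
XtY = matmul(Xt, Y)

# beta-hat = (X'X)^{-1} X'Y, with the inverse on the left.
beta_hat = matmul(inverse_2x2(XtX), XtY)
b0, b1 = beta_hat[0][0], beta_hat[1][0]

print(XtY)
print(b0, b1, round(float(b0), 2))
```

Plugging `beta_hat` back into the normal equations, `matmul(XtX, beta_hat)` reproduces `XtY`, which confirms the solution.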
The simple linear regression for the points (4,14), (3,8), and (5,10) is therefore $\hat{Y} = 6.67 + 1X$.

Check in SAS:

    Data Example;
      Input X Y;
      cards;
    4 14
    3 8
    5 10
    ;
    Proc reg;
      model Y=X;
    Run;

Parameter Estimates

Variable     DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept     1              6.66667         11.78511      0.57     0.6723
X             1              1.00000          2.88675      0.35     0.7877