Matrix Algebra in R
Entering a vector 𝒙
> x <- c(3,1,3,2,-1,5)
> x
[1] 3 1 3 2 -1 5
Subsetting from a Vector
> x = c(3,1,3,2,-1,5)
> x[4]
[1] 2
> x[1:4]
[1] 3 1 3 2
> x[c(4,6)]
[1] 2 5
> x[-c(4,6)]
[1] 3 1 3 -1
> x[-1]
[1]  1  3  2 -1  5
> x[x>2]
[1] 3 3 5
> x[x<0]
[1] -1
Find the length of 𝒙, i.e. ‖𝒙‖
> Lx <- sqrt(t(x)%*%x)   # note that t(x) denotes 𝒙′
> Lx
     [,1]
[1,]    7
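Equivalently, the length can be computed with sum() rather than matrix multiplication; this returns an ordinary scalar instead of a 1 × 1 matrix:
> sqrt(sum(x^2))
[1] 7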
Outer Product of a Vector (𝒙𝒙′)
> x = c(3,1,3,2,-1,5)
> x%*%t(x)   # the vector 𝒙 is 6 × 1 and 𝒙′ is 1 × 6, thus 𝒙𝒙′ is 6 × 6
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    9    3    9    6   -3   15
[2,]    3    1    3    2   -1    5
[3,]    9    3    9    6   -3   15
[4,]    6    2    6    4   -2   10
[5,]   -3   -1   -3   -2    1   -5
[6,]   15    5   15   10   -5   25
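The same outer product is available through the built-in outer() function, which avoids the explicit transpose:
> outer(x, x)   # same 6 × 6 matrix as x%*%t(x)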
Entering a matrix A
> A <- matrix(scan(),nrow=3,ncol=3,byrow=T)
1: 1 3 6
4: 2 -1 5
7: 3 2 6
10: <enter>
Read 9 items
> A
     [,1] [,2] [,3]
[1,]    1    3    6
[2,]    2   -1    5
[3,]    3    2    6
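For small matrices it is often quicker to pass the data vector directly rather than typing it at the scan() prompt:
> A <- matrix(c(1,3,6, 2,-1,5, 3,2,6), nrow=3, ncol=3, byrow=T)   # same matrix A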
Subsetting from a Matrix (or Data Frame)
> A[1,]
[1] 1 3 6
> A[3,]
[1] 3 2 6
> A[1:2,]
     [,1] [,2] [,3]
[1,]    1    3    6
[2,]    2   -1    5
> A[c(1,3),]
     [,1] [,2] [,3]
[1,]    1    3    6
[2,]    3    2    6
> A[,1]
[1] 1 2 3
> A[,3]
[1] 6 5 6
> A[,1:2]
     [,1] [,2]
[1,]    1    3
[2,]    2   -1
[3,]    3    2
> A[,c(1,3)]
     [,1] [,2]
[1,]    1    6
[2,]    2    5
[3,]    3    6
> A[2,3]
[1] 5
Multiplying a matrix by a vector (𝑨𝒙)
> x = c(2,1,3)
> A%*%x   # note the matrix multiplication operator in R is %*%
     [,1]
[1,]   23
[2,]   18
[3,]   26
Transpose of a matrix (𝑨′)
> At <- t(A)
> At
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    3   -1    2
[3,]    6    5    6
Matrix Multiplication
> AtA <- At %*% A   # compute 𝑨′𝑨
> AtA
     [,1] [,2] [,3]
[1,]   14    7   34
[2,]    7   14   25
[3,]   34   25   97
> A%*%t(A)          # compute 𝑨𝑨′
     [,1] [,2] [,3]
[1,]   46   29   45
[2,]   29   30   34
[3,]   45   34   49
> B = matrix(scan(),nrow=3,ncol=2)
1: 1 2
3: 0 -1
5: -1 3
7: <enter>
Read 6 items
> A%*%B
     [,1] [,2]
[1,]    7   14
[2,]    0   14
[3,]    7   13
> B%*%A
Error in B %*% A : non-conformable arguments
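The error occurs because the inner dimensions do not match: B is 3 × 2 and A is 3 × 3, so the product BA is not defined. The dimensions can always be checked with dim():
> dim(B)
[1] 3 2
> dim(A)
[1] 3 3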
Inverse of a Matrix 𝑨⁻¹
> Ainv <- solve(A)
> Ainv
            [,1]       [,2] [,3]
[1,] -0.45714286 -0.1714286  0.6
[2,]  0.08571429 -0.3428571  0.2
[3,]  0.20000000  0.2000000 -0.2
Verify that 𝑨⁻¹𝑨 = 𝑨𝑨⁻¹ = 𝑰
> Ainv%*%A
     [,1]          [,2]         [,3]
[1,]    1  2.220446e-16 4.440892e-16
[2,]    0  1.000000e+00 0.000000e+00
[3,]    0 -1.110223e-16 1.000000e+00
> A%*%Ainv
              [,1] [,2]         [,3]
[1,]  1.000000e+00    0 0.000000e+00
[2,] -1.110223e-16    1 2.220446e-16
[3,]  0.000000e+00    0 1.000000e+00
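In practice the explicit inverse is rarely needed. To solve a linear system 𝑨𝒙 = 𝒃, solve() accepts the right-hand side directly; a quick sketch with a made-up 𝒃:
> b <- c(1, 0, 2)   # a made-up right-hand side for illustration
> solve(A, b)       # solves Ax = b without forming Ainv explicitly
[1]  0.7428571  0.4857143 -0.2000000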
Eigenvalues and Eigenvectors
> eigA <- eigen(A)
> eigA
$values:
[1]  9.894725 -2.452321 -1.442404
$vectors:
          [,1]       [,2]        [,3]
[1,] 0.6619680 -1.1067100 -0.95589971
[2,] 0.4651549  0.9263453  0.01641638
[3,] 0.7487598  0.1736137  0.38090735
> P <- eigA$vectors    # store matrix of eigenvectors in 𝑃
> lam <- eigA$values   # store eigenvalues in a vector lam
> e1 <- P[,1]          # extract first eigenvector (i.e. column) from 𝑃
> lam1 <- lam[1]       # extract first eigenvalue from lam
> A%*%e1
         [,1]
[1,] 6.549991
[2,] 4.602580
[3,] 7.408772
> lam1*e1              # note that 𝑨𝒆₁ = λ₁𝒆₁
[1] 6.549991 4.602580 7.408772
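All the eigenpairs can be checked at once, since in matrix form the eigen relation is 𝑨𝑃 = 𝑃Λ:
> A %*% P          # column i is A times the i-th eigenvector
> P %*% diag(lam)  # column i is lam[i] times the i-th eigenvector; the two matrices agree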
Determinant of a matrix (we actually write our own function!)
Use the fact that det(𝐴) = |𝐴| = λ₁λ₂⋯λₖ, the product of the eigenvalues of 𝐴.
> det <- function(x) prod(eigen(x)$values)
> det(A)
[1] 35
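Note that base R already supplies a det() function, so the definition above masks it. Removing our version restores the built-in, which gives the same answer here:
> rm(det)   # remove our version; the base det() becomes visible again
> det(A)    # base R's det() also returns 35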
Spectral Decomposition of a Symmetric Matrix A
Example 1:
> A <- matrix(scan(),ncol=3,nrow=3) # Note A is a 3 × 3 symmetric matrix
1: 1 -2 3
4: -2 4 -1
7: 3 -1 5
10: <enter>
> eigA <- eigen(A)   # performs an eigen-analysis of the matrix A
> eigA
$values:
[1]  7.5895980  3.3838454 -0.9734434
$vectors:
           [,1]       [,2]       [,3]
[1,]  0.4799416  0.0151370  0.8771698
[2,] -0.4732062 -0.8374647  0.2733656
[3,]  0.7387367 -0.5462817 -0.3947713
> P <- eigA$vectors   # create matrix P whose columns are the eigenvectors of A
> t(P)%*%P
             [,1]         [,2]         [,3]
[1,] 1.000000e+00 5.551115e-17 0.000000e+00
[2,] 5.551115e-17 1.000000e+00 8.326673e-17
[3,] 0.000000e+00 8.326673e-17 1.000000e+00
> P%*%t(P)
              [,1]          [,2]         [,3]
[1,]  1.000000e+00 -8.326673e-17 5.551115e-17
[2,] -8.326673e-17  1.000000e+00 6.938894e-17
[3,]  5.551115e-17  6.938894e-17 1.000000e+00
Note that 𝑃′ 𝑃 = 𝑃𝑃′ = 𝐼, thus 𝑃 is an orthogonal matrix.
> Lam <- diag(eigA$values)   # this command creates a diagonal matrix, Λ, with the
                             # eigenvalues as the diagonal entries: Λ = diag(λ₁, λ₂, λ₃)
> Lam
         [,1]     [,2]       [,3]
[1,] 7.589598 0.000000  0.0000000
[2,] 0.000000 3.383845  0.0000000
[3,] 0.000000 0.000000 -0.9734434
> P %*% Lam %*% t(P)   # shows A = PΛP′
     [,1] [,2] [,3]
[1,]    1   -2    3
[2,]   -2    4   -1
[3,]    3   -1    5
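Equivalently, the spectral decomposition can be written as a sum of rank-1 outer products, A = λ₁𝑒₁𝑒₁′ + λ₂𝑒₂𝑒₂′ + λ₃𝑒₃𝑒₃′. A quick check of this form (it reproduces A up to rounding):
> lam <- eigA$values
> lam[1]*P[,1]%*%t(P[,1]) + lam[2]*P[,2]%*%t(P[,2]) + lam[3]*P[,3]%*%t(P[,3])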
Example 2:
> A <- matrix(scan(),ncol=2,nrow=2)
1: 2.0 2.0
3: 2.0 2.5
5: <enter>
> eigA <- eigen(A)
> eigA
$values:
[1] 4.2655644 0.2344356
$vectors:
          [,1]       [,2]
[1,] 0.6618026  0.7496782
[2,] 0.7496782 -0.6618026
> P <- eigA$vectors
> e1 <- P[,1]   # store the eigenvectors e1 and e2 by extracting the
> e2 <- P[,2]   # first and second columns of P
> A%*%e1                # compute Ae₁
         [,1]
[1,] 2.822961
[2,] 3.197801
> eigA$values[1]*e1     # this shows Ae₁ = λ₁e₁
[1] 2.822961 3.197801
> e1%*%e1               # this shows e₁ is normal (i.e. unit length, ‖𝑒₁‖ = 1)
     [,1]
[1,]    1
> e1%*%e2               # this shows e₁ and e₂ are orthogonal
     [,1]
[1,]    0
> P%*%t(P)              # this shows P is an orthogonal matrix (t(P)%*%P would
                        # work also), i.e. 𝑃′𝑃 = 𝑃𝑃′ = 𝐼
     [,1] [,2]
[1,]    1    0
[2,]    0    1
> Lam <- diag(eigA$values)   # create the diagonal matrix Λ
> P%*%Lam%*%t(P)             # shows A = PΛP′ (i.e. the spectral decomposition)
     [,1] [,2]
[1,]    2  2.0
[2,]    2  2.5
Square Root Matrix (and its inverse): 𝐴^(1/2) and 𝐴^(−1/2)
𝐴^(1/2) = 𝑃Λ^(1/2)𝑃′ and 𝐴^(−1/2) = 𝑃Λ^(−1/2)𝑃′
> A = matrix(scan(),nrow=2,ncol=2,byrow=T)
1: 2 2
3: 2 2.5
5:
Read 4 items
> eigA = eigen(A)
> P = eigA$vectors
> Lam = diag(eigA$values)
> sqrtA <- P%*%Lam^(1/2)%*%t(P) # creates the square root matrix
> sqrtA
          [,1]      [,2]
[1,] 1.1766968 0.7844645
[2,] 0.7844645 1.1766968
> sqrtA%*%sqrtA   # shows A = A^(1/2) A^(1/2)
     [,1] [,2]
[1,]    2  2.0
[2,]    2  2.5
> Lam.5 = diag(1/sqrt(eigA$values))
> sqrtAinv = P%*%Lam.5%*%t(P)   # creates the inverse square root matrix 𝐴^(−1/2)
> sqrtAinv
           [,1]       [,2]
[1,]  1.3728129 -0.7844645
[2,] -0.7844645  1.1766968
> Ainv = solve(A)               # find 𝐴^(−1)
> Ainv
     [,1] [,2]
[1,]  2.5   -2
[2,] -2.0    2
> sqrtAinv%*%sqrtAinv           # shows 𝐴^(−1/2) 𝐴^(−1/2) = 𝐴^(−1)
     [,1] [,2]
[1,]  2.5   -2
[2,] -2.0    2
> sqrtA%*%sqrtAinv   # shows 𝐴^(−1/2) 𝐴^(1/2) = 𝐴^(1/2) 𝐴^(−1/2) = 𝐼
              [,1] [,2]
[1,]  1.000000e+00    0
[2,] -2.220446e-16    1
> sqrtAinv%*%sqrtA
     [,1]          [,2]
[1,]    1 -2.220446e-16
[2,]    0  1.000000e+00
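The same 𝑃Λᵏ𝑃′ construction gives any power of a symmetric matrix. A minimal sketch (matpow is a made-up helper name; fractional powers assume nonnegative eigenvalues):
> matpow <- function(A, k) {
+   eig <- eigen(A)   # eigen-analysis of A
+   eig$vectors %*% diag(eig$values^k) %*% t(eig$vectors)
+ }
> matpow(A, -1)    # reproduces solve(A)
> matpow(A, 1/2)   # reproduces sqrtA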
Drawing Contours
> theta <- seq(0,2*pi,len=100)   # these commands generate 100 vectors of unit
> x <- cos(theta)                # length centered at the origin
> y <- sin(theta)
> par(pty="s")                   # creates a square plotting region
> plot(x,y,xlim=c(-4,4),ylim=c(-4,4),type="l")
      # x and y axis limits are (-4,4); type="l" plots a line instead of
      # points (type="p", the default)
> points(0,0,pch=16)     # puts a large point at the origin
> Ax <- A%*%rbind(x,y)   # multiply the set of unit length vectors by A
> lines(t(Ax))           # draw the ellipse formed by the vectors from the previous step
Connection to Eigenvalues and Eigenvectors
> eigA
$values:
[1] 4.2655644 0.2344356
$vectors:
          [,1]       [,2]
[1,] 0.6618026  0.7496782
[2,] 0.7496782 -0.6618026
> 4.2655644*e1                       # λ₁𝑒₁
[1] 2.822961 3.197801
> segments(0,0,2.822961,3.197801)    # Where did the segment appear? Why?
> .2344356*e2                        # λ₂𝑒₂
[1]  0.1757513 -0.1551501
> segments(0,0,.1757513,-.1551501)   # Where did this segment appear? Why?
Computing Mean Vectors, Variance-Covariance and Correlation Matrices
> attach(City)                         # attach the City data frame
> X <- cbind(income,welfare,poverty)   # columns are variables, rows are cities
> row.names(X) <- row.names(City)      # retain observation labels
> X[1:5,]                              # display the first 5 rows of the data matrix 𝑋
                income welfare poverty
New.York.NY      29823    13.1    19.3
Los.Angeles.CA   30925    10.7    18.9
Chicago.IL       26301    14.4    21.6
Houston.TX       26261     7.1    20.7
Philadelphia.PA  24603    14.0    20.3
> head(X)   # the function head() displays the first 6 rows of a matrix or a data frame
                income welfare poverty
New.York.NY      29823    13.1    19.3
Los.Angeles.CA   30925    10.7    18.9
Chicago.IL       26301    14.4    21.6
Houston.TX       26261     7.1    20.7
Philadelphia.PA  24603    14.0    20.3
San.Diego.CA     33686     8.8    13.4
> Xt <- t(X)   # columns are cities, rows are variables in 𝑋′
> Xt[,1:5]     # displays the values of these variables for the first 5 cities
               # (columns) in the data set
        New.York.NY Los.Angeles.CA Chicago.IL Houston.TX Philadelphia.PA
income      29823.0        30925.0    26301.0    26261.0         24603.0
welfare        13.1           10.7       14.4        7.1            14.0
poverty        19.3           18.9       21.6       20.7            20.3
> apply(X,2,mean)   # calculates and displays the sample mean vector for these
                    # variables, i.e. the column means of 𝑋
  income  welfare  poverty
26976.65 10.21688  17.9753
> var(X)   # computes the variance-covariance matrix, S, for these variables
              income      welfare      poverty
income  32202069.73 -14949.50453 -28010.65087
welfare   -14949.50     24.90616     23.22700
poverty   -28010.65     23.22700     35.80899
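The same matrix S can be computed directly from the matrix-algebra definition S = 𝑋c′𝑋c/(n − 1), where 𝑋c is the column-centered data matrix; a sketch:
> n <- nrow(X)
> Xc <- scale(X, center=TRUE, scale=FALSE)   # subtract each column mean
> t(Xc) %*% Xc / (n - 1)                     # same matrix as var(X)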
> cor(X)   # computes the correlation matrix, R, for these variables
            income    welfare    poverty
income        1.00 -0.5278756 -0.8248696
welfare -0.5278756       1.00  0.7777567
poverty -0.8248696  0.7777567       1.00
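The correlation matrix is related to S by R = D^(−1/2) S D^(−1/2), where D is the diagonal matrix of variances. A sketch of that identity:
> D.5inv <- diag(1/sqrt(diag(var(X))))   # D^(-1/2): reciprocal standard deviations
> D.5inv %*% var(X) %*% D.5inv           # reproduces cor(X)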
> sX <- scale(X)   # scales the variables to have mean 0 and variance 1. Notice that the
                   # variance-covariance matrix of the scaled variables is the correlation
                   # matrix. For many methods we will be examining, it is often necessary
                   # to standardize variables first. We will see an example of why this is
                   # important below.
> var(sX)
            income    welfare    poverty
income        1.00 -0.5278755 -0.8248696
welfare -0.5278755       1.00  0.7777567
poverty -0.8248696  0.7777567       1.00
Measuring Distance/Similarity Between Cities
On the basis of these three measured characteristics (median income, percent of the
population receiving welfare, and percent of the population below the poverty line),
how can we measure how different or similar two cities are, e.g. Detroit, MI and
Minneapolis, MN?
The Euclidean distance between two vectors 𝑥ᵢ and 𝑥ⱼ is given by
dist(𝑥ᵢ, 𝑥ⱼ) = ‖𝑥ᵢ − 𝑥ⱼ‖ = √((𝑥ᵢ − 𝑥ⱼ)′(𝑥ᵢ − 𝑥ⱼ))
> View(City)
> xd = X[9,]    # extract data for Detroit
> xd
 income welfare poverty
18742.0    26.1    32.4
> xm = X[47,]   # extract data for Minneapolis
> xm
 income welfare poverty
25324.0    10.5    18.5
> t(xd-xm)%*%(xd-xm)
         [,1]
[1,] 43323161
> sqrt(t(xd-xm)%*%(xd-xm))   # Euclidean distance
         [,1]
[1,] 6582.033
Calculations by hand:
dist(𝑥_d, 𝑥_m) = √((18742 − 25324)² + (26.1 − 10.5)² + (32.4 − 18.5)²)
              = √(43322724 + 243.36 + 193.21) = 6582.03
Clearly the distance between Detroit and Minneapolis (and between any other two cities,
for that matter) is dominated by median income. Thus the discrepancies in the percent on
welfare and the percent below the poverty level contribute little to the dissimilarity
between these two cities on the basis of these characteristics.
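R's built-in dist() function computes these pairwise Euclidean distances for all rows (cities) at once; a sketch:
> D <- dist(X)            # pairwise Euclidean distances between all cities
> as.matrix(D)[1:3,1:3]   # view the distances among the first three cities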
If we standardize the variables first, we put them all on the same scale.
> sxd = sX[9,]    # Detroit
> sxd
   income   welfare   poverty
-1.451120  3.182602  2.410516
> sxm = sX[47,]   # Minneapolis
> sxm
     income     welfare     poverty
-0.29123182  0.05672995  0.08767880
> sqrt(t(sxd-sxm)%*%(sxd-sxm))
         [,1]
[1,] 4.063495
𝑑𝑖𝑠𝑑(π‘₯𝑑 , π‘₯π‘š ) = √(βˆ’1.45 βˆ’ .291)2 + (3.183 βˆ’ .0567)2 + (2.411 βˆ’ .0877)2 = √3.03 + 9.774 + 5.397 = 4.0635
On the standardized scale, the discrepancies between the percentages on welfare and below
the poverty level are the largest contributors to the distance between Detroit and
Minneapolis. Using a similar process, the Euclidean distance between Detroit and
Stockton, CA is 8134, while the standardized-scale distance is 2.52. Given the
scatterplot below, which measure of discrepancy/distance is more appropriate?
Note: These plots were created in JMP.