Matrix Algebra in R

Entering a vector x

> x <- c(3,1,3,2,-1,5)
> x
[1]  3  1  3  2 -1  5

Subsetting from a Vector

> x = c(3,1,3,2,-1,5)
> x[4]
[1] 2
> x[1:4]
[1] 3 1 3 2
> x[c(4,6)]
[1] 2 5
> x[-c(4,6)]
[1]  3  1  3 -1
> x[-1]
[1]  1  3  2 -1  5
> x[x>2]
[1] 3 3 5
> x[x<0]
[1] -1

Finding the length of x, i.e. ‖x‖

> Lx <- sqrt(t(x)%*%x)   # note that t(x) denotes x′
> Lx
     [,1]
[1,]    7

Outer Product of a Vector (xx′)

> x = c(3,1,3,2,-1,5)
> x%*%t(x)   # the vector x is 6 × 1 and x′ is 1 × 6, thus xx′ is 6 × 6
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    9    3    9    6   -3   15
[2,]    3    1    3    2   -1    5
[3,]    9    3    9    6   -3   15
[4,]    6    2    6    4   -2   10
[5,]   -3   -1   -3   -2    1   -5
[6,]   15    5   15   10   -5   25

Entering a matrix A

> A <- matrix(scan(),nrow=3,ncol=3,byrow=T)
1: 1 3 6
4: 2 -1 5
7: 3 2 6
10: <enter>
> A
     [,1] [,2] [,3]
[1,]    1    3    6
[2,]    2   -1    5
[3,]    3    2    6

Subsetting from a Matrix (or Data Frame)

> A[1,]
[1] 1 3 6
> A[3,]
[1] 3 2 6
> A[1:2,]
     [,1] [,2] [,3]
[1,]    1    3    6
[2,]    2   -1    5
> A[c(1,3),]
     [,1] [,2] [,3]
[1,]    1    3    6
[2,]    3    2    6
> A[,1]
[1] 1 2 3
> A[,3]
[1] 6 5 6
> A[,1:2]
     [,1] [,2]
[1,]    1    3
[2,]    2   -1
[3,]    3    2
> A[,c(1,3)]
     [,1] [,2]
[1,]    1    6
[2,]    2    5
[3,]    3    6
> A[2,3]
[1] 5

Multiplying a matrix by a vector (Ax)

> x = c(2,1,3)
> A%*%x   # note the matrix multiplication operator in R is %*%
     [,1]
[1,]   23
[2,]   18
[3,]   26

Transpose of a matrix (A′)

> At <- t(A)
> At
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    3   -1    2
[3,]    6    5    6

Matrix Multiplication

> AtA <- At %*% A   # compute A′A
> AtA
     [,1] [,2] [,3]
[1,]   14    7   34
[2,]    7   14   25
[3,]   34   25   97
> A%*%t(A)   # compute AA′
     [,1] [,2] [,3]
[1,]   46   29   45
[2,]   29   30   34
[3,]   45   34   49
> B = matrix(scan(),nrow=3,ncol=2)
1: 1 2
3: 0 -1
5: -1 3
7:
Read 6 items
> A%*%B
     [,1] [,2]
[1,]    7   14
[2,]    0   14
[3,]    7   13
> B%*%A
Error in B %*% A : non-conformable arguments

Inverse of a Matrix (A⁻¹)

> Ainv <- solve(A)
> Ainv
            [,1]       [,2] [,3]
[1,] -0.45714286 -0.1714286  0.6
[2,]  0.08571429 -0.3428571  0.2
[3,]  0.20000000  0.2000000 -0.2

Verify that A⁻¹A = AA⁻¹ = I

> Ainv%*%A
     [,1]          [,2]         [,3]
[1,]    1  2.220446e-16 4.440892e-16
[2,]    0  1.000000e+00 0.000000e+00
[3,]    0 -1.110223e-16 1.000000e+00
> A%*%Ainv
              [,1] [,2]         [,3]
[1,]  1.000000e+00    0 0.000000e+00
[2,] -1.110223e-16    1 2.220446e-16
[3,]  0.000000e+00    0 1.000000e+00

Eigenvalues and Eigenvectors

> eigA <- eigen(A)
> eigA
$values:
[1]  9.894725 -2.452321 -1.442404

$vectors:
          [,1]       [,2]        [,3]
[1,] 0.6619680 -1.1067100 -0.95589971
[2,] 0.4651549  0.9263453  0.01641638
[3,] 0.7487598  0.1736137  0.38090735

> P <- eigA$vectors    # store the matrix of eigenvectors in P
> lam <- eigA$values   # store the eigenvalues in a vector lam
> e1 <- P[,1]          # extract the first eigenvector (i.e. column) from P
> lam1 <- lam[1]       # extract the first eigenvalue from lam
> A%*%e1
         [,1]
[1,] 6.549991
[2,] 4.602580
[3,] 7.408772
> lam1*e1   # note that A e1 = λ1 e1
[1] 6.549991 4.602580 7.408772

Determinant of a matrix (we actually write our own function!)

Use the fact that det(A) = |A| = ∏ᵢ₌₁ⁿ λᵢ

> det <- function(x) prod(eigen(x)$values)   # note: this masks R's built-in det()
> det(A)
[1] 35

Spectral Decomposition of a Symmetric Matrix A

Example 1:

> A <- matrix(scan(),ncol=3,nrow=3)   # note A is a 3 × 3 symmetric matrix
1: 1 -2 3
4: -2 4 -1
7: 3 -1 5
10: <enter>
> eigA <- eigen(A)   # performs an eigen-analysis of the matrix A
> eigA
$values:
[1]  7.5895980  3.3838454 -0.9734434

$vectors:
           [,1]       [,2]       [,3]
[1,]  0.4799416  0.0151370  0.8771698
[2,] -0.4732062 -0.8374647  0.2733656
[3,]  0.7387367 -0.5462817 -0.3947713

> P <- eigA$vectors   # create the matrix P whose columns are the eigenvectors of A
> t(P)%*%P
             [,1]         [,2]         [,3]
[1,] 1.000000e+00 5.551115e-17 0.000000e+00
[2,] 5.551115e-17 1.000000e+00 8.326673e-17
[3,] 0.000000e+00 8.326673e-17 1.000000e+00
> P%*%t(P)
              [,1]          [,2]         [,3]
[1,]  1.000000e+00 -8.326673e-17 5.551115e-17
[2,] -8.326673e-17  1.000000e+00 6.938894e-17
[3,]  5.551115e-17  6.938894e-17 1.000000e+00

Note that P′P = PP′ = I, thus P is an orthogonal matrix.

> Lam <- diag(eigA$values)   # creates a diagonal matrix Λ with the eigenvalues as the diagonal entries
Λ = diag(λ1, λ2, λ3)

> Lam
         [,1]     [,2]       [,3]
[1,] 7.589598 0.000000  0.0000000
[2,] 0.000000 3.383845  0.0000000
[3,] 0.000000 0.000000 -0.9734434
> P %*% Lam %*% t(P)   # shows A = PΛP′
     [,1] [,2] [,3]
[1,]    1   -2    3
[2,]   -2    4   -1
[3,]    3   -1    5

Example 2:

> A <- matrix(scan(),ncol=2,nrow=2)
1: 2.0 2.0
3: 2.0 2.5
5: <enter>
> eigA <- eigen(A)
> eigA
$values:
[1] 4.2655644 0.2344356

$vectors:
          [,1]       [,2]
[1,] 0.6618026  0.7496782
[2,] 0.7496782 -0.6618026

> P <- eigA$vectors
> e1 <- P[,1]   # store the eigenvectors e1 and e2 by extracting the first and second columns of P
> e2 <- P[,2]
> A%*%e1   # compute A e1
         [,1]
[1,] 2.822961
[2,] 3.197801
> eigA$values[1]*e1   # this shows A e1 = λ1 e1
[1] 2.822961 3.197801
> e1%*%e1   # this shows e1 is normal (i.e. unit length, ‖e1‖ = 1)
     [,1]
[1,]    1
> e1%*%e2   # this shows e1 and e2 are orthogonal
     [,1]
[1,]    0
> P%*%t(P)   # this shows P is an orthogonal matrix; t(P)%*%P would work also, i.e. P′P = PP′ = I
     [,1] [,2]
[1,]    1    0
[2,]    0    1
> Lam <- diag(eigA$values)   # create the diagonal matrix Λ
> P%*%Lam%*%t(P)   # shows A = PΛP′ (i.e. the spectral decomposition)
     [,1] [,2]
[1,]    2  2.0
[2,]    2  2.5

Square Root Matrix (and its inverse): A^(1/2) and A^(-1/2)

A^(1/2) = P Λ^(1/2) P′  and  A^(-1/2) = P Λ^(-1/2) P′

> A = matrix(scan(),nrow=2,ncol=2,byrow=T)
1: 2 2
3: 2 2.5
5:
Read 4 items
> eigA = eigen(A)
> P = eigA$vectors
> Lam = diag(eigA$values)
> sqrtA <- P%*%Lam^(1/2)%*%t(P)   # creates the square root matrix A^(1/2)
> sqrtA
          [,1]      [,2]
[1,] 1.1766968 0.7844645
[2,] 0.7844645 1.3728129
> sqrtA%*%sqrtA   # shows A = A^(1/2) A^(1/2)
     [,1] [,2]
[1,]    2  2.0
[2,]    2  2.5
> Lam.5 = diag(1/sqrt(eigA$values))
> sqrtAinv = P%*%Lam.5%*%t(P)   # creates the inverse square root matrix A^(-1/2)
> sqrtAinv
           [,1]       [,2]
[1,]  1.3728129 -0.7844645
[2,] -0.7844645  1.1766968
> Ainv = solve(A)   # find A⁻¹
> Ainv
     [,1] [,2]
[1,]  2.5   -2
[2,] -2.0    2
> sqrtAinv%*%sqrtAinv   # shows A^(-1/2) A^(-1/2) = A⁻¹
     [,1] [,2]
[1,]  2.5   -2
[2,] -2.0    2
> sqrtA%*%sqrtAinv   # shows A^(-1/2) A^(1/2) = A^(1/2) A^(-1/2) = I
     [,1] [,2]
[1,]  1.000000e+00    0
[2,] -2.220446e-16    1
> sqrtAinv%*%sqrtA
     [,1]          [,2]
[1,]    1 -2.220446e-16
[2,]    0  1.000000e+00

Drawing Contours

> theta <- seq(0,2*pi,len=100)   # generate 100 angles; (cos(theta), sin(theta)) gives 100 vectors of unit length centered at the origin
> x <- cos(theta)
> y <- sin(theta)
> par(pty="s")   # creates a square plotting region
> plot(x,y,xlim=c(-4,4),ylim=c(-4,4),type="l")   # x and y axis limits (-4,4); plots a line (type="l") instead of points (type="p", the default)
> points(0,0,pch=16)   # puts a large point at the origin
> Ax <- A%*%rbind(x,y)   # multiply the set of unit-length vectors by A
> lines(t(Ax))   # draw the ellipse formed by the vectors from the previous step

Connection to Eigenvalues and Eigenvectors

> eigA
$values:
[1] 4.2655644 0.2344356

$vectors:
          [,1]       [,2]
[1,] 0.6618026  0.7496782
[2,] 0.7496782 -0.6618026

> 4.2655644*e1   # λ1 e1
[1] 2.822961 3.197801
> segments(0,0,2.822961,3.197801)   # where did the segment appear? Why?
> .2344356*e2   # λ2 e2
[1]  0.1757513 -0.1551501
> segments(0,0,.1757513,-.1551501)   # where did this segment appear? Why?
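The segments drawn above are the scaled eigenvectors λ1 e1 and λ2 e2 of the ellipse. The same eigen-quantities also rebuild A itself: as a minimal sketch (re-using the 2 × 2 matrix from Example 2), the spectral decomposition can equivalently be written as a sum of rank-one outer products, A = λ1 e1 e1′ + λ2 e2 e2′.

```r
# Spectral decomposition as a sum of rank-one outer products,
# using the 2 x 2 matrix from Example 2 above.
A <- matrix(c(2, 2, 2, 2.5), nrow = 2, byrow = TRUE)
eigA <- eigen(A)
lam <- eigA$values
P <- eigA$vectors
A.rebuilt <- lam[1] * P[,1] %*% t(P[,1]) + lam[2] * P[,2] %*% t(P[,2])
all.equal(A, A.rebuilt)   # TRUE (up to floating-point error)
```

This form makes it clear that each eigenvalue weights a rank-one "direction" of A, which is the view used repeatedly in principal components.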
Computing Mean Vectors, Variance-Covariance and Correlation Matrices

> attach(City)   # attach the City data frame
> X <- cbind(income,welfare,poverty)   # columns are variables, rows are cities
> row.names(X) <- row.names(City)   # retain the observation labels
> X[1:5,]   # display the first 5 rows of the data matrix X
                income welfare poverty
New.York.NY      29823    13.1    19.3
Los.Angeles.CA   30925    10.7    18.9
Chicago.IL       26301    14.4    21.6
Houston.TX       26261     7.1    20.7
Philadelphia.PA  24603    14.0    20.3
> head(X)   # the function head() displays the first 6 rows of a matrix or a data frame
                income welfare poverty
New.York.NY      29823    13.1    19.3
Los.Angeles.CA   30925    10.7    18.9
Chicago.IL       26301    14.4    21.6
Houston.TX       26261     7.1    20.7
Philadelphia.PA  24603    14.0    20.3
San.Diego.CA     33686     8.8    13.4
> Xt <- t(X)   # columns are cities, rows are variables in X′
> Xt[,1:5]   # displays the values of these variables for the first 5 cities (columns) in the data set
        New.York.NY Los.Angeles.CA Chicago.IL Houston.TX Philadelphia.PA
income      29823.0        30925.0    26301.0    26261.0         24603.0
welfare        13.1           10.7       14.4        7.1            14.0
poverty        19.3           18.9       21.6       20.7            20.3
> apply(X,2,mean)   # calculates and displays the sample mean vector for these variables, i.e. the column means of X
  income  welfare  poverty
26976.65 10.21688  17.9753
> var(X)   # computes the variance-covariance matrix, S, for these variables
              income      welfare      poverty
income   32202069.73 -14949.50453 -28010.65087
welfare    -14949.50     24.90616     23.22700
poverty    -28010.65     23.22700     35.80899
> cor(X)   # computes the correlation matrix, R, for these variables
            income    welfare    poverty
income   1.0000000 -0.5278756 -0.8248696
welfare -0.5278756  1.0000000  0.7777567
poverty -0.8248696  0.7777567  1.0000000
> sX <- scale(X)   # scales the variables (columns of X) to have mean 0 and variance 1

Notice that the variance-covariance matrix of the scaled variables is the correlation matrix. For many of the methods we will be examining, it is often necessary to standardize the variables first. We will see an example of why this is important below.
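Both var() and cor() can be reproduced directly with matrix algebra. A minimal sketch on made-up data (the City data frame itself is not reproduced here): S = Xc′Xc/(n − 1), where Xc is the column-centered data matrix, and R = D^(-1/2) S D^(-1/2), where D^(-1/2) is the diagonal matrix of reciprocal standard deviations.

```r
# Variance-covariance and correlation matrices "by hand"
# (M is a small made-up data matrix, not the City data).
set.seed(1)
M <- matrix(rnorm(30), nrow = 10, ncol = 3)
n <- nrow(M)
Mc <- scale(M, center = TRUE, scale = FALSE)   # subtract the column means
S <- t(Mc) %*% Mc / (n - 1)                    # S = Mc'Mc / (n - 1)
D.inv <- diag(1 / sqrt(diag(S)))               # D^(-1/2)
R <- D.inv %*% S %*% D.inv                     # correlation matrix
all.equal(S, var(M))   # TRUE
all.equal(R, cor(M))   # TRUE
```

This is exactly the sense in which the correlation matrix is the covariance matrix of the standardized variables.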
> var(sX)
            income    welfare    poverty
income   1.0000000 -0.5278755 -0.8248696
welfare -0.5278755  1.0000000  0.7777567
poverty -0.8248696  0.7777567  1.0000000

Measuring Distance/Similarity Between Cities

On the basis of these three measured characteristics (median income, percent of the population receiving welfare, and percent of the population below the poverty line), how can we measure how different or similar two cities are, e.g. Detroit, MI and Minneapolis, MN?

The Euclidean distance between two vectors xᵢ and xⱼ is given by

dist(xᵢ, xⱼ) = ‖xᵢ − xⱼ‖ = √((xᵢ − xⱼ)′(xᵢ − xⱼ))

> View(City)
> xd = X[9,]   # extract the data for Detroit
> xd
 income welfare poverty
18742.0    26.1    32.4
> xm = X[47,]   # extract the data for Minneapolis
> xm
 income welfare poverty
25324.0    10.5    18.5
> t(xd-xm)%*%(xd-xm)
         [,1]
[1,] 43323161
> sqrt(t(xd-xm)%*%(xd-xm))   # Euclidean distance
         [,1]
[1,] 6582.033

Calculations by hand:

dist(xd, xm) = √((18742 − 25324)² + (26.1 − 10.5)² + (32.4 − 18.5)²)
             = √(43322724 + 243.36 + 193.21) = 6582.03

Clearly the distance between Detroit and Minneapolis (or between any other two cities, for that matter) is dominated by median income. Thus the discrepancies between the percent on welfare and the percent below the poverty level have little to do with the dissimilarity between these two cities on the basis of these characteristics.

If we standardize the variables first, we put them all on the same scale.

> sxd = sX[9,]   # Detroit
> sxd
   income   welfare   poverty
-1.451120  3.182602  2.410516
> sxm = sX[47,]   # Minneapolis
> sxm
     income     welfare     poverty
-0.29123182  0.05672995  0.08767880
> sqrt(t(sxd-sxm)%*%(sxd-sxm))
         [,1]
[1,] 4.063495

dist(sxd, sxm) = √((−1.451 − (−0.291))² + (3.183 − 0.057)² + (2.411 − 0.088)²)
              = √(1.345 + 9.771 + 5.396) = 4.0635

On the standardized scale the discrepancies between the percentages on welfare and below the poverty level are the largest contributors to the distance between Detroit and Minneapolis.
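Since xd and xm are just numeric vectors, the same Euclidean distance can also be computed with sqrt() and sum(). A minimal sketch, re-entering the Detroit and Minneapolis values quoted above:

```r
# Euclidean distance as the square root of the sum of squared differences
xd <- c(income = 18742.0, welfare = 26.1, poverty = 32.4)   # Detroit
xm <- c(income = 25324.0, welfare = 10.5, poverty = 18.5)   # Minneapolis
d <- sqrt(sum((xd - xm)^2))
d   # 6582.033, matching sqrt(t(xd-xm)%*%(xd-xm)) above
```

The matrix form (xd − xm)′(xd − xm) and sum((xd − xm)^2) are the same quantity; the matrix form generalizes to the weighted and Mahalanobis distances used later.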
Using a similar process, the Euclidean distance between Detroit and Stockton, CA is 8134, while on the standardized scale the distance is 2.52. Given the scatterplot below, which measure of discrepancy/distance is more appropriate?

Note: These plots were created in JMP.
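The effect standardization has on which variable drives a distance can be reproduced with made-up numbers. A minimal sketch (the values below are invented, not the City data): one variable is measured in dollars and one in percent, and dist() is applied before and after scale().

```r
# Made-up data: column 1 in dollars, column 2 in percent
Z <- rbind(a = c(20000, 5), b = c(26000, 25), c = c(26100, 6))
dist(Z)          # raw scale: b and c look nearly identical despite 25% vs 6%
dist(scale(Z))   # standardized scale: b and c are clearly separated
```

On the raw scale the dollar variable swamps every comparison, just as median income did for the cities; after scaling, both variables contribute on comparable terms.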