Matrix Algebra in R
Entering a vector 𝒙
> x <- c(3,1,3,2,-1,5)
> x
[1] 3 1 3 2 -1 5
Subsetting from a Vector
> x = c(3,1,3,2,-1,5)
> x[4]
[1] 2
> x[1:4]
[1] 3 1 3 2
> x[c(4,6)]
[1] 2 5
> x[-c(4,6)]
[1] 3 1 3 -1
> x[-1]
[1]  1  3  2 -1  5
> x[x>2]
[1] 3 3 5
> x[x<0]
[1] -1
Find the length of 𝒙, i.e. ‖𝒙‖
> Lx <- sqrt(t(x)%*%x)   # note that t(x) denotes 𝒙′
> Lx
     [,1]
[1,]    7
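Equivalently, the length can be computed with sum() rather than matrix multiplication; this returns an ordinary scalar instead of a 1 × 1 matrix:
> sqrt(sum(x^2))
[1] 7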
Outer Product of a Vector (𝒙𝒙′)
> x = c(3,1,3,2,-1,5)
> x%*%t(x)   # the vector 𝒙 is 6 × 1 and 𝒙′ is 1 × 6, thus 𝒙𝒙′ is 6 × 6
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    9    3    9    6   -3   15
[2,]    3    1    3    2   -1    5
[3,]    9    3    9    6   -3   15
[4,]    6    2    6    4   -2   10
[5,]   -3   -1   -3   -2    1   -5
[6,]   15    5   15   10   -5   25
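The same outer product is available through the built-in outer() function, which avoids the explicit transpose:
> outer(x, x)   # same 6 × 6 matrix as x%*%t(x)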
Entering a matrix A
> A <- matrix(scan(),nrow=3,ncol=3,byrow=T)
1: 1 3 6
4: 2 -1 5
7: 3 2 6
10: <enter>
Read 9 items
> A
     [,1] [,2] [,3]
[1,]    1    3    6
[2,]    2   -1    5
[3,]    3    2    6
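For small matrices it is often quicker to pass the data vector directly rather than typing it at the scan() prompt:
> A <- matrix(c(1,3,6, 2,-1,5, 3,2,6), nrow=3, ncol=3, byrow=T)   # same matrix A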
Subsetting from a Matrix (or Data Frame)
> A[1,]
[1] 1 3 6
> A[3,]
[1] 3 2 6
> A[1:2,]
     [,1] [,2] [,3]
[1,]    1    3    6
[2,]    2   -1    5
> A[c(1,3),]
     [,1] [,2] [,3]
[1,]    1    3    6
[2,]    3    2    6
> A[,1]
[1] 1 2 3
> A[,3]
[1] 6 5 6
> A[,1:2]
     [,1] [,2]
[1,]    1    3
[2,]    2   -1
[3,]    3    2
> A[,c(1,3)]
     [,1] [,2]
[1,]    1    6
[2,]    2    5
[3,]    3    6
> A[2,3]
[1] 5
Multiplying a matrix by a vector (𝑨𝒙)
> x = c(2,1,3)
> A%*%x   # note the matrix multiplication operator in R is %*%
     [,1]
[1,]   23
[2,]   18
[3,]   26
Transpose of a matrix (𝑨′)
> At <- t(A)
> At
     [,1] [,2] [,3]
[1,]    1    2    3
[2,]    3   -1    2
[3,]    6    5    6
Matrix Multiplication
> AtA <- At %*% A   # compute 𝑨′𝑨
> AtA
     [,1] [,2] [,3]
[1,]   14    7   34
[2,]    7   14   25
[3,]   34   25   97
> A%*%t(A)          # compute 𝑨𝑨′
     [,1] [,2] [,3]
[1,]   46   29   45
[2,]   29   30   34
[3,]   45   34   49
> B = matrix(scan(),nrow=3,ncol=2)
1: 1 2
3: 0 -1
5: -1 3
7: <enter>
Read 6 items
> A%*%B
     [,1] [,2]
[1,]    7   14
[2,]    0   14
[3,]    7   13
> B%*%A
Error in B %*% A : non-conformable arguments
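The error occurs because the inner dimensions do not match: B is 3 × 2 and A is 3 × 3, so the product BA is not defined. The dimensions can always be checked with dim():
> dim(B)
[1] 3 2
> dim(A)
[1] 3 3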
Inverse of a Matrix 𝑨⁻¹
> Ainv <- solve(A)
> Ainv
            [,1]       [,2] [,3]
[1,] -0.45714286 -0.1714286  0.6
[2,]  0.08571429 -0.3428571  0.2
[3,]  0.20000000  0.2000000 -0.2
Verify that 𝑨⁻¹𝑨 = 𝑨𝑨⁻¹ = 𝑰
> Ainv%*%A
     [,1]          [,2]         [,3]
[1,]    1  2.220446e-16 4.440892e-16
[2,]    0  1.000000e+00 0.000000e+00
[3,]    0 -1.110223e-16 1.000000e+00
> A%*%Ainv
              [,1] [,2]         [,3]
[1,]  1.000000e+00    0 0.000000e+00
[2,] -1.110223e-16    1 2.220446e-16
[3,]  0.000000e+00    0 1.000000e+00
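In practice the explicit inverse is rarely needed. To solve a linear system 𝑨𝒙 = 𝒃, solve() accepts the right-hand side directly; a quick sketch with a made-up 𝒃:
> b <- c(1, 0, 2)   # a made-up right-hand side for illustration
> solve(A, b)       # solves Ax = b without forming Ainv explicitly
[1]  0.7428571  0.4857143 -0.2000000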
Eigenvalues and Eigenvectors
> eigA <- eigen(A)
> eigA
$values:
[1]  9.894725 -2.452321 -1.442404
$vectors:
          [,1]       [,2]        [,3]
[1,] 0.6619680 -1.1067100 -0.95589971
[2,] 0.4651549  0.9263453  0.01641638
[3,] 0.7487598  0.1736137  0.38090735
> P <- eigA$vectors    # store matrix of eigenvectors in 𝑃
> lam <- eigA$values   # store eigenvalues in a vector lam
> e1 <- P[,1]          # extract first eigenvector (i.e. column) from 𝑃
> lam1 <- lam[1]       # extract first eigenvalue from lam
> A%*%e1
         [,1]
[1,] 6.549991
[2,] 4.602580
[3,] 7.408772
> lam1*e1              # note that 𝑨𝒆₁ = λ₁𝒆₁
[1] 6.549991 4.602580 7.408772
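All the eigenpairs can be checked at once, since in matrix form the eigen relation is 𝑨𝑃 = 𝑃Λ:
> A %*% P          # column i is A times the i-th eigenvector
> P %*% diag(lam)  # column i is lam[i] times the i-th eigenvector; the two matrices agree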
Determinant of a matrix (we actually write our own function!)
Use the fact that det(𝐴) = |𝐴| = λ₁λ₂⋯λₖ, the product of the eigenvalues of 𝐴.
> det <- function(x) prod(eigen(x)$values)
> det(A)
[1] 35
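Note that base R already supplies a det() function, so the definition above masks it. Removing our version restores the built-in, which gives the same answer here:
> rm(det)   # remove our version; the base det() becomes visible again
> det(A)    # base R's det() also returns 35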
Spectral Decomposition of a Symmetric Matrix A
Example 1:
> A <- matrix(scan(),ncol=3,nrow=3) # Note A is a 3 × 3 symmetric matrix
1: 1 -2 3
4: -2 4 -1
7: 3 -1 5
10: <enter>
> eigA <- eigen(A)   # performs an eigen-analysis of the matrix A
> eigA
$values:
[1]  7.5895980  3.3838454 -0.9734434
$vectors:
           [,1]       [,2]       [,3]
[1,]  0.4799416  0.0151370  0.8771698
[2,] -0.4732062 -0.8374647  0.2733656
[3,]  0.7387367 -0.5462817 -0.3947713
> P <- eigA$vectors   # create matrix P whose columns are the eigenvectors of A
> t(P)%*%P
             [,1]         [,2]         [,3]
[1,] 1.000000e+00 5.551115e-17 0.000000e+00
[2,] 5.551115e-17 1.000000e+00 8.326673e-17
[3,] 0.000000e+00 8.326673e-17 1.000000e+00
> P%*%t(P)
              [,1]          [,2]         [,3]
[1,]  1.000000e+00 -8.326673e-17 5.551115e-17
[2,] -8.326673e-17  1.000000e+00 6.938894e-17
[3,]  5.551115e-17  6.938894e-17 1.000000e+00
Note that 𝑃′ 𝑃 = 𝑃𝑃′ = 𝐼, thus 𝑃 is an orthogonal matrix.
> Lam <- diag(eigA$values)   # this command creates a diagonal matrix, Λ, with the
                             # eigenvalues as the diagonal entries: Λ = diag(λ₁, λ₂, λ₃)
> Lam
         [,1]     [,2]       [,3]
[1,] 7.589598 0.000000  0.0000000
[2,] 0.000000 3.383845  0.0000000
[3,] 0.000000 0.000000 -0.9734434
> P %*% Lam %*% t(P)   # shows A = PΛP′
     [,1] [,2] [,3]
[1,]    1   -2    3
[2,]   -2    4   -1
[3,]    3   -1    5
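Equivalently, the spectral decomposition can be written as a sum of rank-1 outer products, A = λ₁𝑒₁𝑒₁′ + λ₂𝑒₂𝑒₂′ + λ₃𝑒₃𝑒₃′. A quick check of this form (it reproduces A up to rounding):
> lam <- eigA$values
> lam[1]*P[,1]%*%t(P[,1]) + lam[2]*P[,2]%*%t(P[,2]) + lam[3]*P[,3]%*%t(P[,3])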
Example 2:
> A <- matrix(scan(),ncol=2,nrow=2)
1: 2.0 2.0
3: 2.0 2.5
5: <enter>
> eigA <- eigen(A)
> eigA
$values:
[1] 4.2655644 0.2344356
$vectors:
          [,1]       [,2]
[1,] 0.6618026  0.7496782
[2,] 0.7496782 -0.6618026
> P <- eigA$vectors
> e1 <- P[,1]   # store the eigenvectors e1 and e2 by extracting the
> e2 <- P[,2]   # first and second columns of P
> A%*%e1                # compute Ae₁
         [,1]
[1,] 2.822961
[2,] 3.197801
> eigA$values[1]*e1     # this shows Ae₁ = λ₁e₁
[1] 2.822961 3.197801
> e1%*%e1               # this shows e₁ is normal (i.e. unit length, ‖𝑒₁‖ = 1)
     [,1]
[1,]    1
> e1%*%e2               # this shows e₁ and e₂ are orthogonal
     [,1]
[1,]    0
> P%*%t(P)              # this shows P is an orthogonal matrix (t(P)%*%P would
                        # work also), i.e. 𝑃′𝑃 = 𝑃𝑃′ = 𝐼
     [,1] [,2]
[1,]    1    0
[2,]    0    1
> Lam <- diag(eigA$values)   # create the diagonal matrix Λ
> P%*%Lam%*%t(P)             # shows A = PΛP′ (i.e. the spectral decomposition)
     [,1] [,2]
[1,]    2  2.0
[2,]    2  2.5
Square Root Matrix (and its inverse): 𝐴^(1/2) and 𝐴^(−1/2)
𝐴^(1/2) = 𝑃Λ^(1/2)𝑃′ and 𝐴^(−1/2) = 𝑃Λ^(−1/2)𝑃′
> A = matrix(scan(),nrow=2,ncol=2,byrow=T)
1: 2 2
3: 2 2.5
5:
Read 4 items
> eigA = eigen(A)
> P = eigA$vectors
> Lam = diag(eigA$values)
> sqrtA <- P%*%Lam^(1/2)%*%t(P) # creates the square root matrix
> sqrtA
          [,1]      [,2]
[1,] 1.1766968 0.7844645
[2,] 0.7844645 1.1766968
> sqrtA%*%sqrtA   # shows A = A^(1/2) A^(1/2)
     [,1] [,2]
[1,]    2  2.0
[2,]    2  2.5
> Lam.5 = diag(1/sqrt(eigA$values))
> sqrtAinv = P%*%Lam.5%*%t(P)   # creates the inverse square root matrix 𝐴^(−1/2)
> sqrtAinv
           [,1]       [,2]
[1,]  1.3728129 -0.7844645
[2,] -0.7844645  1.1766968
> Ainv = solve(A)               # find 𝐴^(−1)
> Ainv
     [,1] [,2]
[1,]  2.5   -2
[2,] -2.0    2
> sqrtAinv%*%sqrtAinv           # shows 𝐴^(−1/2) 𝐴^(−1/2) = 𝐴^(−1)
     [,1] [,2]
[1,]  2.5   -2
[2,] -2.0    2
> sqrtA%*%sqrtAinv   # shows 𝐴^(−1/2) 𝐴^(1/2) = 𝐴^(1/2) 𝐴^(−1/2) = 𝐼
              [,1] [,2]
[1,]  1.000000e+00    0
[2,] -2.220446e-16    1
> sqrtAinv%*%sqrtA
     [,1]          [,2]
[1,]    1 -2.220446e-16
[2,]    0  1.000000e+00
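The same 𝑃Λᵏ𝑃′ construction gives any power of a symmetric matrix. A minimal sketch (matpow is a made-up helper name; fractional powers assume nonnegative eigenvalues):
> matpow <- function(A, k) {
+   eig <- eigen(A)   # eigen-analysis of A
+   eig$vectors %*% diag(eig$values^k) %*% t(eig$vectors)
+ }
> matpow(A, -1)    # reproduces solve(A)
> matpow(A, 1/2)   # reproduces sqrtA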
Drawing Contours
> theta <- seq(0,2*pi,len=100)   # these commands generate 100 vectors of unit
> x <- cos(theta)                # length centered at the origin
> y <- sin(theta)
> par(pty="s")                   # creates a square plotting region
> plot(x,y,xlim=c(-4,4),ylim=c(-4,4),type="l")
      # x and y axis limits are (-4,4); type="l" plots a line instead of
      # points (type="p", the default)
> points(0,0,pch=16)     # puts a large point at the origin
> Ax <- A%*%rbind(x,y)   # multiply the set of unit length vectors by A
> lines(t(Ax))           # draw the ellipse formed by the vectors from the previous step
Connection to Eigenvalues and Eigenvectors
> eigA
$values:
[1] 4.2655644 0.2344356
$vectors:
          [,1]       [,2]
[1,] 0.6618026  0.7496782
[2,] 0.7496782 -0.6618026
> 4.2655644*e1                       # λ₁𝑒₁
[1] 2.822961 3.197801
> segments(0,0,2.822961,3.197801)    # Where did the segment appear? Why?
> .2344356*e2                        # λ₂𝑒₂
[1]  0.1757513 -0.1551501
> segments(0,0,.1757513,-.1551501)   # Where did this segment appear? Why?
Computing Mean Vectors, Variance-Covariance and Correlation Matrices
> attach(City)                         # attach the City data frame
> X <- cbind(income,welfare,poverty)   # columns are variables, rows are cities
> row.names(X) <- row.names(City)      # retain observation labels
> X[1:5,]                              # display the first 5 rows of the data matrix 𝑋
                income welfare poverty
New.York.NY      29823    13.1    19.3
Los.Angeles.CA   30925    10.7    18.9
Chicago.IL       26301    14.4    21.6
Houston.TX       26261     7.1    20.7
Philadelphia.PA  24603    14.0    20.3
> head(X)   # the function head() displays the first 6 rows of a matrix or a data frame
                income welfare poverty
New.York.NY      29823    13.1    19.3
Los.Angeles.CA   30925    10.7    18.9
Chicago.IL       26301    14.4    21.6
Houston.TX       26261     7.1    20.7
Philadelphia.PA  24603    14.0    20.3
San.Diego.CA     33686     8.8    13.4
> Xt <- t(X)   # columns are cities, rows are variables in 𝑋′
> Xt[,1:5]     # displays the values of these variables for the first 5 cities
               # (columns) in the data set
        New.York.NY Los.Angeles.CA Chicago.IL Houston.TX Philadelphia.PA
income      29823.0        30925.0    26301.0    26261.0         24603.0
welfare        13.1           10.7       14.4        7.1            14.0
poverty        19.3           18.9       21.6       20.7            20.3
> apply(X,2,mean)   # calculates and displays the sample mean vector for these
                    # variables, i.e. the column means of 𝑋
  income  welfare  poverty
26976.65 10.21688  17.9753
> var(X)   # computes the variance-covariance matrix, S, for these variables
              income      welfare      poverty
income  32202069.73 -14949.50453 -28010.65087
welfare   -14949.50     24.90616     23.22700
poverty   -28010.65     23.22700     35.80899
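The same matrix S can be computed directly from the matrix-algebra definition S = 𝑋c′𝑋c/(n − 1), where 𝑋c is the column-centered data matrix; a sketch:
> n <- nrow(X)
> Xc <- scale(X, center=TRUE, scale=FALSE)   # subtract each column mean
> t(Xc) %*% Xc / (n - 1)                     # same matrix as var(X)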
> cor(X)   # computes the correlation matrix, R, for these variables
            income    welfare    poverty
income        1.00 -0.5278756 -0.8248696
welfare -0.5278756       1.00  0.7777567
poverty -0.8248696  0.7777567       1.00
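The correlation matrix is related to S by R = D^(−1/2) S D^(−1/2), where D is the diagonal matrix of variances. A sketch of that identity:
> D.5inv <- diag(1/sqrt(diag(var(X))))   # D^(-1/2): reciprocal standard deviations
> D.5inv %*% var(X) %*% D.5inv           # reproduces cor(X)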
> sX <- scale(X)   # scales the variables to have mean 0 and variance 1. Notice that the
                   # variance-covariance matrix of the scaled variables is the correlation
                   # matrix. For many methods we will be examining, it is often necessary
                   # to standardize variables first. We will see an example of why this is
                   # important below.
> var(sX)
            income    welfare    poverty
income        1.00 -0.5278755 -0.8248696
welfare -0.5278755       1.00  0.7777567
poverty -0.8248696  0.7777567       1.00
Measuring Distance/Similarity Between Cities
On the basis of these three measured characteristics (median income, percent of the
population receiving welfare, and percent of the population below the poverty line),
how can we measure how different or similar two cities are, e.g. Detroit, MI and
Minneapolis, MN?
The Euclidean distance between two vectors 𝑥ᵢ and 𝑥ⱼ is given by
dist(𝑥ᵢ, 𝑥ⱼ) = ‖𝑥ᵢ − 𝑥ⱼ‖ = √((𝑥ᵢ − 𝑥ⱼ)′(𝑥ᵢ − 𝑥ⱼ))
> View(City)
> xd = X[9,]    # extract data for Detroit
> xd
 income welfare poverty
18742.0    26.1    32.4
> xm = X[47,]   # extract data for Minneapolis
> xm
 income welfare poverty
25324.0    10.5    18.5
> t(xd-xm)%*%(xd-xm)
         [,1]
[1,] 43323161
> sqrt(t(xd-xm)%*%(xd-xm))   # Euclidean distance
         [,1]
[1,] 6582.033
Calculations by hand:
dist(𝑥_d, 𝑥_m) = √((18742 − 25324)² + (26.1 − 10.5)² + (32.4 − 18.5)²)
              = √(43322724 + 243.36 + 193.21) = 6582.03
Clearly the distance between Detroit and Minneapolis (and between any other two cities,
for that matter) is dominated by median income. Thus the discrepancies in the percent on
welfare and the percent below the poverty level contribute little to the dissimilarity
between these two cities on the basis of these characteristics.
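R's built-in dist() function computes these pairwise Euclidean distances for all rows (cities) at once; a sketch:
> D <- dist(X)            # pairwise Euclidean distances between all cities
> as.matrix(D)[1:3,1:3]   # view the distances among the first three cities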
If we standardize the variables first, we put them all on the same scale.
> sxd = sX[9,]    # Detroit
> sxd
   income   welfare   poverty
-1.451120  3.182602  2.410516
> sxm = sX[47,]   # Minneapolis
> sxm
     income     welfare     poverty
-0.29123182  0.05672995  0.08767880
> sqrt(t(sxd-sxm)%*%(sxd-sxm))
         [,1]
[1,] 4.063495
𝑑𝑖𝑠𝑑(π‘₯𝑑 , π‘₯π‘š ) = √(βˆ’1.45 βˆ’ .291)2 + (3.183 βˆ’ .0567)2 + (2.411 βˆ’ .0877)2 = √3.03 + 9.774 + 5.397 = 4.0635
On the standardized scale, the discrepancies between the percentages on welfare and below
the poverty level are the largest contributors to the distance between Detroit and
Minneapolis. Using a similar process, the Euclidean distance between Detroit and
Stockton, CA is 8134, while the standardized-scale distance is 2.52. Given the
scatterplot below, which measure of discrepancy/distance is more appropriate?
Note: These plots were created in JMP.