
International Workshop on Machine Learning and Text Analytics (MLTA2013)
Linear Algebra for Machine Learning and IR
Manoj Kumar Singh
DST-Centre for Interdisciplinary Mathematical Sciences (DST-CIMS)
Banaras Hindu University (BHU), Varanasi-221005, INDIA.
E-mail: [email protected]
December 15, 2013
South Asian University (SAU), New Delhi.
Contents
• Vector Matrix Model in IR, ML and Other Areas
• Vector Space
  - Formal definition
  - Linear Combination - Independence - Generator and Basis
  - Example: Rn(R), Rn(C)
  - Dimension - Inner Product, Norm, Orthogonality
• Linear Transformation
  - Definition
  - Matrix and Determinant - LT using Matrix - Rank and Nullity
  - Column Space and Row Space - Invertibility - Singularity and Non-Singularity - Eigenvalues and Eigenvectors - Linear Algebra
• Different Types of Matrices and Matrix Algebra
• Matrix Factorization
• Applications
Vector Matrix Model in IR

Classification:
A collection consisting of the following five documents is queried for "latent semantic indexing" (q):
d1 = LSI tutorials and fast tracks.
d2 = Books on semantic analysis.
d3 = Learning latent semantic indexing.
d4 = Advances in structures and advances in indexing.
d5 = Analysis of latent structures.
Rank the documents in decreasing order of relevance to the query.

Recommendation System:
Item-based collaborative filtering

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1
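The slide leaves Alice's rating for Item5 open. A minimal sketch of how item-based collaborative filtering could fill it in, assuming numpy; the scheme used here (similarity-weighted average over all four known items) is an illustrative choice, not prescribed by the slides:

```python
import numpy as np

# Ratings matrix from the slide: rows = Alice, User1..User4,
# columns = Item1..Item5; Alice's rating for Item5 is unknown.
R = np.array([[5, 3, 4, 4, 0],    # Alice (0 marks the missing entry)
              [3, 1, 2, 3, 3],
              [4, 3, 4, 3, 5],
              [3, 3, 1, 5, 4],
              [1, 5, 5, 2, 1]], dtype=float)

def cosine(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Item-item similarities between Item5 and Items1-4, computed over the
# users who actually rated Item5 (everyone except Alice).
known = R[1:, :]
sims = np.array([cosine(known[:, j], known[:, 4]) for j in range(4)])

# Predict Alice's rating as the similarity-weighted average of her
# own ratings for the other items.
prediction = sims @ R[0, :4] / sims.sum()
print(round(prediction, 2))
```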
Blind Source Separation

Cocktail Party Problem: a computer in a cocktail-party situation is confused about what is being said, by whom, and from where. Humans are capable of steering their hearing attention, which is like identifying the source of interest; BSS is about actually separating the sources, so it is very near to a solution of the cocktail party problem. In a multiple-speaker environment the microphones collect garbage: a hotchpotch of speech!

Mixing model (source s, measured x):
x = As,  A = [aij]m×n
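A minimal sketch of the mixing model x = As, assuming numpy; the sources and the mixing matrix are invented for illustration. Recovering s when A is unknown is the actual BSS problem (e.g., via ICA); this only shows that a known A makes un-mixing a linear solve:

```python
import numpy as np

# Two sources observed by two microphones: x = A s.
t = np.linspace(0.0, 1.0, 500)
s = np.vstack([np.sin(2 * np.pi * 5 * t),            # speaker 1
               np.sign(np.sin(2 * np.pi * 3 * t))])  # speaker 2

A = np.array([[1.0, 0.5],
              [0.4, 1.0]])     # mixing matrix, here m = n = 2
x = A @ s                      # what the microphones record

# With A known and non-singular, the sources follow from one solve;
# BSS proper must estimate them when A is unknown.
s_hat = np.linalg.solve(A, x)
print(np.allclose(s, s_hat))   # True
```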
Imaging Application

[Figure 1. PSF of the components in an FPA imaging system: the input scene I(x, y) passes through the lens h_optics(x, y), the WG array h_WG(x, y), the detector h_det(x, y), the sample-and-hold circuit h_sh(x, y), the Rx h_Rx(x, y), the electronics h_elec(x, y), the display h_disp(x, y) and the human eye h_eye(x, y) to give the output scene O(x, y).]

Restoration as a linear inverse problem: f̂ = H⁻¹y
Vector Space

Def.:
An algebraic structure (V, F, ⊕, +, ·, ⊙) with sets V ≠ ∅, F ≠ ∅ and binary operations
⊕ : V × V → V,   + : F × F → F,   · : F × F → F,   ⊙ : F × V → V
is a vector space if:

(V, ⊕) is an Abelian group:
i. Associativity: (a ⊕ b) ⊕ c = a ⊕ (b ⊕ c), ∀ a, b, c ∈ V
ii. Identity: ∃ e ∈ V s.t. a ⊕ e = e ⊕ a = a, ∀ a ∈ V
iii. Inverse: ∀ a ∈ V, ∃ a⁻¹ ∈ V s.t. a ⊕ a⁻¹ = a⁻¹ ⊕ a = e
iv. Commutativity: a ⊕ b = b ⊕ a, ∀ a, b ∈ V

(F, +, ·) is a field:
i. (F, +) is an Abelian group.
ii. (F*, ·) is an Abelian group, where F* = F − {0}.
iii. Multiplication · is distributive over +: a·(b + c) = a·b + a·c, ∀ a, b, c ∈ F

Scalar multiplication ⊙ satisfies the following:
i. α ⊙ a ∈ V, ∀ a ∈ V, α ∈ F
ii. α ⊙ (a ⊕ b) = α ⊙ a ⊕ α ⊙ b, ∀ a, b ∈ V, α ∈ F
iii. (α + β) ⊙ a = α ⊙ a ⊕ β ⊙ a, ∀ a ∈ V, α, β ∈ F
iv. (α · β) ⊙ a = α ⊙ (β ⊙ a), ∀ a ∈ V, α, β ∈ F
v. 1 ⊙ a = a, ∀ a ∈ V, where 1 is the unity element of F
Vector Space

Linear Algebra:
A vector space (V, F, ⊕, +, ·, ⊙) is called a linear algebra over the field F if there is an additional operation ⊗ : V × V → V, called multiplication of vectors, satisfying the following postulates:
i. a ⊗ b ∈ V, ∀ a, b ∈ V
ii. a ⊗ (b ⊗ c) = (a ⊗ b) ⊗ c, ∀ a, b, c ∈ V
iii. a ⊗ (b ⊕ c) = (a ⊗ b) ⊕ (a ⊗ c), ∀ a, b, c ∈ V
iv. α ⊙ (a ⊗ b) = (α ⊙ a) ⊗ b, ∀ a, b ∈ V, α ∈ F
If there is an element 1 in V such that 1 ⊗ a = a ⊗ 1 = a, ∀ a ∈ V, then V is a linear algebra with identity, and 1 is called the identity of V.
The algebra V(F) is commutative if a ⊗ b = b ⊗ a, ∀ a, b ∈ V.

Note:
1. Elements of V are called vectors and elements of F are called scalars.
2. "Vector" here does not mean a vector quantity as defined in vector algebra (a directed line segment).
3. We say "vector space V over the field F" and denote it as V(F).
Vector Space

Subspace:
Let V(F) be a vector space. W ⊆ V is called a subspace of V if W(F) is itself a vector space w.r.t. the operations in V(F).
e.g. W = {(x, 2y, 3z) : x, y, z ∈ R} is a subspace of R3(R).
W = {(x, y, 0) : x, y ∈ R} is a subspace of R3(R).
Let V(F) be the vector space of all n × 1 matrices and A an m × n matrix over F. Then W = {x ∈ V : Ax = 0} is a subspace of V.

Generator:
Let V(F) be a vector space and S ⊆ V. If U is a subspace of V containing S and is contained in every subspace of V containing S, then U is the smallest subspace of V containing S. This subspace U of V containing S is called the subspace of V generated (or spanned) by S, denoted [S], i.e., U = [S].

Linear Combination:
Let V(F) be a vector space. Any vector a = α1a1 + α2a2 + ... + αnan, where α1, α2, ..., αn ∈ F, is called a linear combination of the vectors a1, a2, ..., an.
Vector Space

Linear Span:
Let V(F) be a vector space and S (≠ ∅) ⊆ V. The linear span of S, L(S), is the set of all linear combinations of finite sets of elements of S:
L(S) = {α1a1 + α2a2 + α3a3 + ... + αnan : {α1, α2, α3, ..., αn} ⊆ F and {a1, a2, a3, ..., an} ⊆ S}.
Note: L(S) is a subspace of V(F) and L(S) = [S].

Linear Dependence (LD):
Let V(F) be a vector space. {a1, a2, ..., an} ⊆ V is said to be LD if ∃ α1, α2, ..., αn ∈ F s.t.
α1a1 + α2a2 + ... + αnan = 0 with some αi ≠ 0.

Linear Independence (LI):
Let V(F) be a vector space. {a1, a2, ..., an} ⊆ V is said to be LI if
α1a1 + α2a2 + ... + αnan = 0, αi ∈ F, 1 ≤ i ≤ n ⟹ αi = 0 ∀ 1 ≤ i ≤ n.

Basis: S ⊆ V(F) is a basis of the vector space V(F) if i) S consists of LI elements, and ii) V = [S] = L(S).

Dimension: V(F) is said to be finite dimensional if ∃ a finite subset S ⊆ V such that V = L(S) = [S]. The number of elements in a basis of the finite dimensional V(F) is the dimension of V(F).
e.g. S1 = {(1,0,0), (0,1,0), (0,0,1)} and S2 = {(1,0,0), (1,1,0), (1,1,1)} are bases of R3(R).
Vector Space

Inner Product:
An inner product on a vector space V(R/C) is a function ⟨·,·⟩ : V × V → R/C which assigns to each ordered pair of vectors a, b in V a scalar ⟨a, b⟩ such that
i. ⟨a, b⟩ = ⟨b, a⟩*   [* denotes the complex conjugate]
ii. ⟨αa + βb, c⟩ = α⟨a, c⟩ + β⟨b, c⟩
iii. ⟨a, a⟩ ≥ 0, and ⟨a, a⟩ = 0 ⟺ a = 0

V = C[a, b]: the inner product is ⟨x, y⟩ = ∫ from a to b of x(t)y(t) dt.
V = Rn: the inner product of x = (x1, x2, ..., xn) and y = (y1, y2, ..., yn) is ⟨x, y⟩ = x1y1 + x2y2 + ... + xnyn.

Norm / Length: the length of a vector in V(F): ‖x‖ = √⟨x, x⟩.
Distance: the distance between two vectors x, y in V(F): d(x, y) = ‖x − y‖ = √⟨x − y, x − y⟩.
Note: (V, d) is a metric space.

Orthogonality: let (V, ⟨·,·⟩) be an inner product space and x, y ∈ V. Vectors x and y are said to be orthogonal to each other if
⟨x, y⟩ = 0.
An orthogonal set of non-zero vectors is LI; LI does not imply orthogonality. However, the Gram-Schmidt process turns an LI set into an orthogonal set (and, after scaling to ‖x‖ = 1, an orthonormal one).
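The slide ends by noting that Gram-Schmidt turns an LI set into an orthonormal one. A minimal sketch, assuming numpy; the function name and example vectors are illustrative, not from the slides:

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize an LI list of vectors (same span, orthonormal)."""
    basis = []
    for v in vectors:
        w = v - sum((v @ u) * u for u in basis)   # subtract projections
        norm = np.linalg.norm(w)
        if norm > 1e-12:                          # guards against LD input
            basis.append(w / norm)
    return np.array(basis)

V = [np.array([1.0, 1.0, 0.0]),
     np.array([1.0, 0.0, 1.0])]
Q = gram_schmidt(V)
print(np.round(Q @ Q.T, 10))   # 2 x 2 identity: pairwise orthonormal
```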
Linear Transformation

Definition (LT):
Let U(F) and V(F) be two vector spaces. A linear transformation from U into V is a function T : U → V such that
T(αx + βy) = αT(x) + βT(y), ∀ α, β ∈ F and x, y ∈ U.

Linear Operator: a linear operator on V(F) is a function T : V → V such that
T(αx + βy) = αT(x) + βT(y), ∀ α, β ∈ F and x, y ∈ V.

Range Space of LT: let T : U(F) → V(F) be an LT. The range space of T, R(T), is given by R(T) = {T(x) ∈ V : x ∈ U}.
Null Space of LT: let T : U(F) → V(F) be an LT. The null space of T, N(T), is given by N(T) = {x ∈ U : T(x) = 0 ∈ V}.
Note: 1. R(T) ⊆ V is a subspace of V, and N(T) ⊆ U is a subspace of U.
2. If U(F) is finite dimensional, then R(T) is also finite dimensional.

Rank and Nullity of LT:
1. Rank: the dimension of the range space of the LT, ρ(T) = dim(R(T)).
2. Nullity: the dimension of the null space of the LT, ν(T) = dim(N(T)).
Note (rank-nullity): for T : U(F) → V(F), ρ(T) + ν(T) = dim(U).

Non-Singular Transform: an LT T : U → V is non-singular if N(T) = {0}, i.e., x ∈ U and T(x) = 0 ⟹ x = 0.
Singular Transform: an LT T : U → V is singular if ∃ x ≠ 0 ∈ U such that T(x) = 0.
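A quick numerical check of the rank-nullity relation ρ(T) + ν(T) = dim(U) for the map T(x) = Ax, assuming numpy; the matrix is an invented example with one dependent row:

```python
import numpy as np

# Treat the 3 x 4 matrix A as a linear map T : R^4 -> R^3, T(x) = A x.
A = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 0.],
              [1., 3., 1., 1.]])    # row 3 = row 1 + row 2, so rank < 3

rank = np.linalg.matrix_rank(A)     # dim R(T)
nullity = A.shape[1] - rank         # dim N(T) by rank-nullity
print(rank, nullity, rank + nullity == A.shape[1])   # 2 2 True
```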
Matrices

Definition: A set of mn elements of any field F arranged in the form of a rectangular array having m rows and n columns is called an m × n matrix over the field F:

A = [aij]m×n = [ a11  a12  ...  a1n ]
               [ a21  a22  ...  a2n ]
               [ ...                ]
               [ am1  am2  ...  amn ]

If m = n, the matrix A is called a square matrix. The elements aij for which i = j constitute the principal diagonal.

Unit / Identity Matrix: I = [δij]n×n with aij = 1 for i = j and aij = 0 for i ≠ j:

I = [ 1  0  0 ]
    [ 0  1  0 ]
    [ 0  0  1 ]

Diagonal Matrix: a square matrix A = [aij]n×n for which aij = 0 for i ≠ j, e.g.

D = [ d1  0   0  ]
    [ 0   d2  0  ]
    [ 0   0   d3 ]

Scalar Matrix: a diagonal matrix A = [aij]n×n for which aii = k for all i:

S = [ k  0  0 ]
    [ 0  k  0 ]
    [ 0  0  k ]

If A is any matrix and S is a scalar matrix, then SA = AS = kA.
Matrices

Upper Triangular Matrix: a square matrix A = [aij]n×n is upper triangular if aij = 0 whenever i > j:

A = [ a11  a12  a13  ...  a1n ]
    [ 0    a22  a23  ...  a2n ]
    [ 0    0    a33  ...  a3n ]
    [ ...                     ]
    [ 0    0    0    ...  ann ]

Lower Triangular Matrix: a square matrix A = [aij]n×n is lower triangular if aij = 0 whenever i < j:

A = [ a11  0    0    ...  0   ]
    [ a21  a22  0    ...  0   ]
    [ a31  a32  a33  ...  0   ]
    [ ...                     ]
    [ an1  an2  an3  ...  ann ]

Symmetric: a square matrix A = [aij]n×n is symmetric if aij = aji, ∀ i, j, e.g.

D = [ a  b  c ]
    [ b  e  d ]
    [ c  d  f ]

Skew Symmetric: a square matrix A = [aij]n×n is skew symmetric if aij = −aji, ∀ i, j, e.g.

D = [  0   h  g ]
    [ -h   0  f ]
    [ -g  -f  0 ]
Matrices

Transpose: for A = [aij]m×n, the n × m matrix obtained from A by changing its rows into columns and its columns into rows is called the transpose of A, denoted A' or Aᵀ. e.g.

A = [ 1 2 3 4 ]        Aᵀ = [ 1 2 3 ]
    [ 2 3 4 1 ]             [ 2 3 4 ]
    [ 3 4 2 1 ]3×4          [ 3 4 2 ]
                            [ 4 1 1 ]4×3

Trace: for a square matrix A = [aij]n×n, the sum of the main diagonal elements of A is the trace of the matrix: tr(A) = Σ(i=1..n) aii.

Addition: if A = [aij]m×n and B = [bij]m×n, then C = A + B is defined by cij = aij + bij; (M, +) is an Abelian group.

Scalar Mult.: kA = Ak = [k·aij]m×n.

Multiplication: for A = [aij]m×n and B = [bij]n×p, the product A·B is possible when the number of columns in A equals the number of rows in B. A·B is the m × p matrix C = [cik]m×p such that
cik = Σ(j=1..n) aij·bjk.

Row / Column Vector Representation of a Matrix:
The i-th row of the matrix is denoted by the vector ri = (ai1, ai2, ai3, ..., ain); the i-th column is denoted by the vector ci = (a1i, a2i, a3i, ..., ami). Thus
A = [ r1; r2; ...; rm ] = [ c1, c2, c3, ..., cn ]m×n,
with row vectors r1, r2, ..., rm ∈ Vn(F) and column vectors c1, c2, ..., cn ∈ Vm(F).
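A minimal check, assuming numpy, that the element formula cik = Σj aij·bjk agrees with the built-in product; the matrices are invented examples. It also illustrates tr(AB) = tr(BA), which holds even though AB and BA have different sizes here:

```python
import numpy as np

A = np.array([[1, 2, 3],
              [4, 5, 6]])          # 2 x 3
B = np.array([[7, 8],
              [9, 10],
              [11, 12]])           # 3 x 2

# c_ik = sum_j a_ij * b_jk, written out explicitly.
m, n = A.shape
_, p = B.shape
C = np.zeros((m, p), dtype=int)
for i in range(m):
    for k in range(p):
        C[i, k] = sum(A[i, j] * B[j, k] for j in range(n))

print(np.array_equal(C, A @ B))           # True
print(np.trace(A @ B), np.trace(B @ A))   # 212 212: tr(AB) = tr(BA)
```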
Matrices

Row Space and Row Rank of a Matrix:
Let R = {r1, r2, ..., rm}. The linear span L(R) ⊆ Vn(F) is called the row space of the matrix. The row rank of the matrix is ρr(A) = dim(L(R)) ≤ dim(Vn(F)) = n, and ρr(A) = m ⟺ {r1, r2, ..., rm} is LI.

Column Space and Column Rank of a Matrix:
Let C = {c1, c2, ..., cn}. The linear span L(C) ⊆ Vm(F) is called the column space of the matrix. The column rank of the matrix is ρc(A) = dim(L(C)) ≤ dim(Vm(F)) = m, and ρc(A) = n ⟺ {c1, c2, ..., cn} is LI.

Rank of a Matrix: ρ(A) = ρr(A) = ρc(A) ≤ min(m, n).

Determinant of a Square Matrix:
Let f be a scalar function (not a vector or matrix function) of the rows x1, x2, ..., xn of A, called the determinant of A, satisfying the following conditions:
i) f(x1, x2, ..., cxi, ..., xn) = c·f(x1, x2, ..., xi, ..., xn), where c is a scalar. If any row is multiplied by a scalar, the whole determinant is multiplied by that scalar.
ii) f(x1, x2, ..., xi, ..., cxi + xj, ..., xn) = f(x1, x2, ..., xi, ..., xj, ..., xn). If a scalar multiple of the i-th row (col.) is added to the j-th row (col.), the value of the determinant remains the same.
iii) If xi is written as the sum of two vectors, xi = yi + zi, then f(x1, x2, ..., yi + zi, ..., xn) = f(x1, x2, ..., yi, ..., xn) + f(x1, x2, ..., zi, ..., xn). If the i-th row (col.) is split as a sum of two vectors, the determinant becomes the sum of two determinants.
iv) f(e1, e2, ..., ei, ..., en) = 1, where e1, e2, ..., en are the basic unit vectors. The determinant of the identity matrix is 1.
Determinant

Conditions (i)-(iv) are the postulates that define the determinant of a square matrix. Standard notation: |A| or det(A).

Some Properties of the Determinant:
i. The determinant of a 2 × 2 matrix A = [ a b ; c d ] is |A| = ad − bc.
ii. The determinant of a square null matrix is zero. The determinant of a square matrix with one or more null rows or columns is zero.
iii. The determinant of a diagonal matrix is the product of the diagonal elements.
iv. The determinant of a triangular matrix is the product of the diagonal elements.
v. If any two rows (cols.) are interchanged, the value of the determinant of the new matrix is −1 times the value of the original determinant.
vi. The value of the determinant of a matrix of real numbers can be negative, positive, or zero.
By postulate (ii), the value of the determinant remains the same if any multiple of any row (col.) is added to any other row (col.). Thus if one or more rows (cols.) are LD on other rows (cols.), these dependent rows (cols.) can be made null by linear operations, and then the determinant is zero.
vii. |A| ≠ 0 iff all rows (cols.) form an LI set of vectors, and hence ρr(A) = ρc(A) = n.
viii. For n × n matrices A and B: |AB| = |A||B|.
ix. Let A be any n × n matrix. The matrix B, if it exists, such that AB = BA = In is denoted B = A⁻¹.
x. AA⁻¹ = A⁻¹A = I ⟹ |AA⁻¹| = |A⁻¹A| = |I| = 1 ⟹ |A||A⁻¹| = 1 ⟹ |A⁻¹| = 1/|A|.
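A quick numerical check of properties v, viii and x on random matrices, assuming numpy; a sketch, not a proof:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# viii. |AB| = |A||B|
print(np.isclose(np.linalg.det(A @ B),
                 np.linalg.det(A) * np.linalg.det(B)))
# x. |A^-1| = 1/|A|
print(np.isclose(np.linalg.det(np.linalg.inv(A)),
                 1.0 / np.linalg.det(A)))
# v. swapping two rows multiplies the determinant by -1
P = A[[1, 0, 2, 3], :]
print(np.isclose(np.linalg.det(P), -np.linalg.det(A)))
```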
Cofactor Expansion

Minors:
Let A = [aij] be an m × n matrix. Delete any set of rows and an equal number of columns so that a square submatrix remains; the determinant of the resulting submatrix is called a minor. If the i-th row and j-th column are deleted, the determinant of the resulting submatrix is called the minor of aij.

A = [ 2  0  1 ]    then  | 2  4 | = minor of a11,   | 2  1 | = minor of a22
    [ 1  2  4 ]          | 1  5 |                   | 0  5 |
    [ 0  1  5 ]

Leading Minors:
If the submatrices are formed by deleting the rows and columns from the 2nd onward, from the 3rd onward, and so on, then the corresponding minors are called the leading minors:

| 2 | = 2,   | 2  0 |,   | 2  0  1 |
             | 1  2 |    | 1  2  4 |
                         | 0  1  5 |

Cofactors:
Let A = [aij] be an n × n matrix. The cofactor of aij is defined as (−1)^(i+j) times the minor of aij. That is, if the cofactor and minor of aij are denoted by Cij and Mij respectively, then
Cij = (−1)^(i+j) Mij.
Cofactor Expansion

Evaluation of the Determinant:
Let A = [aij] be an n × n matrix, with the cofactor and minor of aij denoted Cij and Mij. Then
|A| = a11C11 + a12C12 + ... + a1nC1n
    = a11M11 − a12M12 + ... + (−1)^(n+1) a1nM1n,
and, expanding along the i-th row, for i = 1, 2, ..., n:
|A| = ai1Ci1 + ai2Ci2 + ... + ainCin
    = ai1(−1)^(i+1) Mi1 + ai2(−1)^(i+2) Mi2 + ... + ain(−1)^(i+n) Min.

Cofactor Matrix:
Let A = [aij] be an n × n matrix with the cofactor of aij denoted Cij. Then the cofactor matrix of A is

cof(A) = [ C11  C12  ...  C1n ]
         [ C21  C22  ...  C2n ]
         [ ...                ]
         [ Cn1  Cn2  ...  Cnn ]

Inverse of a Matrix: let A = [aij] be an n × n matrix. The inverse of A, if it exists, is given by
A⁻¹ = (1/|A|)·[cof(A)]ᵀ,  |A| ≠ 0.

Singular and Non-Singular Matrix:
A square matrix A = [aij]n×n is said to be non-singular or singular according as |A| ≠ 0 or |A| = 0.
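A direct implementation of cofactor expansion and the adjugate-based inverse, assuming numpy; the function names are mine, and the O(n!) recursion is only sensible for small matrices like the 3 × 3 example above:

```python
import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        total += (-1) ** j * A[0, j] * det_cofactor(minor)
    return total

def inverse_adjugate(A):
    """A^-1 = cof(A)^T / |A|, valid when |A| != 0."""
    n = A.shape[0]
    cof = np.zeros_like(A, dtype=float)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(A, i, axis=0), j, axis=1)
            cof[i, j] = (-1) ** (i + j) * det_cofactor(minor)
    return cof.T / det_cofactor(A)

A = np.array([[2., 0., 1.],
              [1., 2., 4.],
              [0., 1., 5.]])    # matrix from the minors example
print(np.isclose(det_cofactor(A), np.linalg.det(A)))       # True
print(np.allclose(inverse_adjugate(A), np.linalg.inv(A)))  # True
```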
Cofactor Expansion

Rank of a Matrix:
A number r is said to be the rank of a matrix A if it possesses the following two properties:
i. There is at least one square submatrix of A of size r × r whose determinant is not zero.
ii. If the matrix contains any square submatrix of size (r+1) × (r+1), then the determinant of every such submatrix must be zero.

Invertibility of a Matrix: the following are equivalent statements:
A⁻¹ exists ⟺ AA⁻¹ = A⁻¹A = I ⟺ |A| ≠ 0 ⟺ A is non-singular ⟺ ρ(A) = n ⟺ ρr(A) = n ⟺ ρc(A) = n ⟺ R = {r1, r2, ..., rn} is LI ⟺ C = {c1, c2, ..., cn} is LI.
LT using Matrix

Let T : U(F) → V(F) be an LT, and let B = {α1, α2, ..., αn} and B' = {β1, β2, ..., βm} be ordered bases for U and V. Then for each αj ∈ B, the vector T(αj) ∈ V is uniquely expressed as a linear combination of elements of B':

T(αj) = a1jβ1 + a2jβ2 + ... + amjβm,   i.e.   T(αj) = Σ(i=1..m) aij βi,   j = 1, ..., n.

The m × n matrix [T] = [T; B; B'] = [aij]m×n, whose j-th column holds the coordinates of T(αj) relative to B', is called the matrix of T relative to B, B'.

Example: let T be an LT on the vector space V2(F) defined by T(a, b) = (a, 0). Find the matrix of T relative to the standard basis B = {e1, e2} = {(1,0), (0,1)}.

T(e1) = T(1,0) = (1,0) = 1(1,0) + 0(0,1) = 1e1 + 0e2
T(e2) = T(0,1) = (0,0) = 0(1,0) + 0(0,1) = 0e1 + 0e2

The matrix of T relative to the ordered basis B is

TB = [T; B] = [ 1  0 ]
              [ 0  0 ]
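The example above, checked numerically; a minimal sketch assuming numpy, with column j of the matrix built from T(ej):

```python
import numpy as np

# T(a, b) = (a, 0) on R^2; column j of the matrix is T(e_j) expressed
# in the standard basis (where coordinates equal the components).
def T(x):
    return np.array([x[0], 0.0])

B = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
M = np.column_stack([T(e) for e in B])
print(M)                              # [[1. 0.]
                                      #  [0. 0.]]
x = np.array([3.0, 7.0])
print(np.allclose(M @ x, T(x)))       # the matrix reproduces the map
```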
Eigen Value and Eigen Vector

Eigen Value and Eigen Vector of an LT:
Let T : V(F) → V(F) be an LT. The scalar c ∈ F is called an eigenvalue of T if ∃ x (≠ 0) ∈ V such that
T(x) = cx.
The vector x is then called an eigenvector corresponding to the eigenvalue c.
T(x) = cx ⟹ T(x) = cI(x), where I is the identity transform ⟹ T(x) − cI(x) = 0 ⟹ (T − cI)(x) = 0 ⟹ T'(x) = 0, where T' is the LT T' = T − cI.
∃ x ≠ 0 ∈ V such that T'(x) = 0 ⟹ T' is singular ⟹ det(T') = 0.

Eigen Value and Eigen Vector of a Matrix:
Let A be an n × n matrix and consider the equation
Ax = λx,
where λ is a scalar and x is an n × 1 vector. The null vector is a trivial solution of this equation. If the equation has a solution for some λ and some non-null x, then λ is called an eigenvalue (or characteristic or latent root) of A, and the non-null x satisfying the equation for that particular λ is called an eigenvector (or characteristic or latent vector) corresponding to that eigenvalue λ.
Ax = λx ⟹ Ax = λI(x) ⟹ (A − λI)x = 0 is a homogeneous linear equation with a non-null solution ⟹ A − λI is singular ⟹ |A − λI| = 0.
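A minimal numerical illustration, assuming numpy; the 2 × 2 matrix is an invented example. It checks both Ax = λx and the singularity of A − λI at each eigenvalue:

```python
import numpy as np

A = np.array([[4., 1.],
              [2., 3.]])    # eigenvalues 5 and 2 (trace 7, det 10)

vals, vecs = np.linalg.eig(A)           # columns of vecs are eigenvectors
for lam, x in zip(vals, vecs.T):
    print(np.allclose(A @ x, lam * x))  # A x = lambda x
    # A - lambda I is singular at an eigenvalue:
    print(np.isclose(np.linalg.det(A - lam * np.eye(2)), 0.0))
```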
Eigen Value and Eigen Vector

Properties:
i. The eigenvalues of a diagonal matrix are its diagonal elements.
ii. The eigenvalues of a triangular (upper or lower) matrix are its diagonal elements.
iii. The eigenvalues of a scalar matrix with diagonal elements c each are c repeated n times.
iv. The eigenvalues of an identity matrix are 1 repeated n times.
v. |A| = λ1·λ2·λ3·...·λn.
vi. Matrix A is singular iff at least one of its eigenvalues is zero.
vii. tr(A) = a11 + a22 + ... + ann = λ1 + λ2 + λ3 + ... + λn.
viii. A and Aᵀ have the same eigenvalues.
ix. Eigenvectors corresponding to different eigenvalues are LI.
x. If x1, x2 are two eigenvectors corresponding to the same eigenvalue, then c1x1 + c2x2 (when non-zero) is also an eigenvector for that eigenvalue.
xi. The eigenvalues of a real symmetric matrix are real.
xii. Eigenvectors corresponding to different eigenvalues of a real symmetric matrix are orthogonal.
Similarity of Matrices

Def.:
Let A and B be square matrices of order n. Then B is said to be similar to A if there exists a non-singular matrix P such that
B = P⁻¹AP.
Note: 1. Similarity is an equivalence relation.
2. If matrix A is similar to a diagonal matrix D, then the diagonal elements of D are the eigenvalues of A.

Diagonalizable Matrix:
A matrix A is said to be diagonalizable if it is similar to a diagonal matrix. Thus A is diagonalizable if there exists an invertible matrix P such that
P⁻¹AP = D, where D is a diagonal matrix.
i. An n × n matrix is diagonalizable iff it possesses n LI eigenvectors.
ii. If the eigenvalues of an n × n matrix are all distinct, then it is always similar to a diagonal matrix.
iii. Two n × n matrices with the same set of n distinct eigenvalues are similar.
iv. P⁻¹AP = D ⟺ A = PDP⁻¹, the eigenvalue decomposition (EVD).
v. (Spectral Decomposition for a Symmetric Matrix): a square symmetric matrix A can be expressed in terms of its eigenvalue-eigenvector pairs (λi, ei) as
A = λ1·e1e1ᵀ + λ2·e2e2ᵀ + λ3·e3e3ᵀ + ... + λn·enenᵀ.
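A numerical check of points iv and v for a small symmetric matrix, assuming numpy (eigh handles the symmetric eigenproblem and returns orthonormal eigenvectors, so P⁻¹ = Pᵀ):

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 2.]])             # symmetric

vals, P = np.linalg.eigh(A)          # columns of P: orthonormal eigenvectors
D = np.diag(vals)
print(np.allclose(P @ D @ P.T, A))   # EVD: A = P D P^-1 = P D P^T

# Rebuild A one eigenpair at a time: spectral decomposition.
S = sum(lam * np.outer(e, e) for lam, e in zip(vals, P.T))
print(np.allclose(S, A))             # True
```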
Singular Value Decomposition

A singular value and corresponding singular vectors of a rectangular matrix A are, respectively, a scalar σ and a pair of vectors u and v that satisfy Av = σu and Aᵀu = σv.
With the singular values on the diagonal of a diagonal matrix Σ and the corresponding singular vectors forming the columns of two orthogonal matrices U and V, we have AV = UΣ and AᵀU = VΣ.
Since U and V are orthogonal, this becomes the singular value decomposition: A = UΣVᵀ.

Def.: every m × n matrix A can be written A = UΣVᵀ, where U is an m × m orthogonal matrix, V is an n × n orthogonal matrix, and Σ is an m × n diagonal matrix.
Note: 1. The diagonal elements of Σ are termed the singular values of A.
2. Using the SVD directly we get AᵀA = (UΣVᵀ)ᵀ(UΣVᵀ) = VΣ²Vᵀ and AAᵀ = UΣ²Uᵀ.
⟹ The columns of U and V are the eigenvectors of AAᵀ and AᵀA respectively, and the diagonal entries of Σ² are their eigenvalues.
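A minimal sketch of the SVD and of note 2, assuming numpy; the rectangular matrix is an invented example:

```python
import numpy as np

A = np.array([[3., 1., 1.],
              [-1., 3., 1.]])             # 2 x 3 rectangular matrix

U, s, Vt = np.linalg.svd(A)               # A = U diag(s) V^T
Sigma = np.zeros_like(A)
Sigma[:len(s), :len(s)] = np.diag(s)
print(np.allclose(U @ Sigma @ Vt, A))     # True

# Singular values squared are the eigenvalues of A A^T (and A^T A).
eig_vals = np.linalg.eigvalsh(A @ A.T)[::-1]   # descending order
print(np.allclose(np.sqrt(eig_vals), s))       # True
```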
Matrix Factorizations

Cholesky Factorization: the Cholesky factorization expresses a symmetric matrix as the product of a triangular matrix and its transpose:
A = RᵀR,
where R is an upper triangular matrix. Not all symmetric matrices can be factored in this way; the matrices that have such a factorization are said to be positive definite. The Cholesky factorization allows the linear system Ax = b to be replaced by RᵀRx = b, a pair of triangular systems of equations solved easily by forward and backward substitution.

LU Factorization: LU factorization, or Gaussian elimination, expresses any square matrix A as the product of a permutation of a lower triangular matrix and an upper triangular matrix:
A = LU,
where L is a permutation of a lower triangular matrix with ones on its diagonal and U is an upper triangular matrix. Then
|A| = |L||U| = ±u11·u22·...·unn (the sign coming from the permutation), and A⁻¹ = U⁻¹L⁻¹.

QR Factorization: the orthogonal, or QR, factorization expresses any rectangular matrix as the product of an orthogonal or unitary matrix and an upper triangular matrix:
A = QR,
where Q is orthogonal or unitary and R is upper triangular.
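A sketch of the three factorizations on a small symmetric positive definite matrix, assuming scipy; it shows the forward/backward substitution solve that the Cholesky form enables:

```python
import numpy as np
from scipy.linalg import cholesky, lu, qr, solve_triangular

A = np.array([[4., 2., 2.],
              [2., 5., 3.],
              [2., 3., 6.]])    # symmetric positive definite
b = np.array([1., 2., 3.])

# Cholesky: A = R^T R, then two triangular solves.
R = cholesky(A)                            # upper triangular factor
y = solve_triangular(R.T, b, lower=True)   # forward substitution
x = solve_triangular(R, y)                 # backward substitution
print(np.allclose(A @ x, b))               # True

# LU with partial pivoting (scipy returns A = P L U).
P, L, U = lu(A)
print(np.allclose(P @ L @ U, A))

# QR: A = Q R with Q orthogonal.
Q, R2 = qr(A)
print(np.allclose(Q @ R2, A), np.allclose(Q.T @ Q, np.eye(3)))
```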
APPLICATION
Documents Ranking
Documents Ranking

Rank the documents in decreasing order of relevance to the query.
A collection consisting of the following five documents:
d1 = LSI tutorials and fast tracks.
d2 = Books on semantic analysis.
d3 = Learning latent semantic indexing.
d4 = Advances in structures and advances in indexing.
d5 = Analysis of latent structures.
queried for "latent semantic indexing" (q). Rank by decreasing order of cosine similarities.

Assume that:
1. Documents are linearized, tokenized, and their stop words removed. Stemming is not used. The surviving terms are used to construct a term-document matrix A. This matrix is populated with term weights:
aij = Lij·Gi·Nj
Lij = fij, the frequency of term i in document j. This is the so-called FREQ model.
Gi = log(D/di), where D is the collection size and di is the number of documents containing term i. This is the so-called IDF model; IDF stands for Inverse Document Frequency.
Nj = 1/l; i.e., document lengths are normalized to 1/l. In general, l is the so-called L2-norm or Frobenius length.
Thus aij = fij·log(D/di)·Nj.
Documents Ranking

2. Query terms are scored using FREQ; i.e., aiq = Liq = fiq, where fiq is the frequency of term i in the query q.

Procedure:
1. Compute A and q.
2. Normalize the document vectors and the query vector: A → An, q → qn, where n denotes a normalized vector.
3. Compute qnᵀAn.

Term-Document Matrix
Documents in the collection:
d1 = LSI tutorials and fast tracks.
d2 = Books on semantic analysis.
d3 = Learning latent semantic indexing.
d4 = Advances in structures and advances in indexing.
d5 = Analysis of latent structures.

Term        d1  d2  d3  d4  d5
LSI          1   0   0   0   0
tutorials    1   0   0   0   0
fast         1   0   0   0   0
tracks       1   0   0   0   0
books        0   1   0   0   0
semantic     0   1   1   0   0
analysis     0   1   0   0   1
learning     0   0   1   0   0
latent       0   0   1   0   1
indexing     0   0   1   1   0
advances     0   0   0   2   0
structures   0   0   0   1   1
Documents Ranking

Step 1: Compute A and q. With aij = fij·log(5/di) (base-10 logs) and the query scored by FREQ:

Term        d1          d2          d3          d4          d5
LSI         1·log(5/1)  0           0           0           0
tutorials   1·log(5/1)  0           0           0           0
fast        1·log(5/1)  0           0           0           0
tracks      1·log(5/1)  0           0           0           0
books       0           1·log(5/1)  0           0           0
semantic    0           1·log(5/2)  1·log(5/2)  0           0
analysis    0           1·log(5/2)  0           0           1·log(5/2)
learning    0           0           1·log(5/1)  0           0
latent      0           0           1·log(5/2)  0           1·log(5/2)
indexing    0           0           1·log(5/2)  1·log(5/2)  0
advances    0           0           0           2·log(5/1)  0
structures  0           0           0           1·log(5/2)  1·log(5/2)

Numerically:

            d1      d2      d3      d4      d5        q
A =         0.6990  0       0       0       0         0    (LSI)
            0.6990  0       0       0       0         0    (tutorials)
            0.6990  0       0       0       0         0    (fast)
            0.6990  0       0       0       0         0    (tracks)
            0       0.6990  0       0       0         0    (books)
            0       0.3979  0.3979  0       0         1    (semantic)
            0       0.3979  0       0       0.3979    0    (analysis)
            0       0       0.6990  0       0         0    (learning)
            0       0       0.3979  0       0.3979    1    (latent)
            0       0       0.3979  0.3979  0         1    (indexing)
            0       0       0       1.3980  0         0    (advances)
            0       0       0       0.3979  0.3979    0    (structures)
Documents Ranking

Step 2: Normalization. Divide each document vector and the query vector by its Frobenius norm (L2-norm, Euclidean length):

            d1      d2      d3      d4      d5        qn
An =        0.5000  0       0       0       0         0       (LSI)
            0.5000  0       0       0       0         0       (tutorials)
            0.5000  0       0       0       0         0       (fast)
            0.5000  0       0       0       0         0       (tracks)
            0       0.7790  0       0       0         0       (books)
            0       0.4434  0.4054  0       0         0.5774  (semantic)
            0       0.4434  0       0       0.5774    0       (analysis)
            0       0       0.7121  0       0         0       (learning)
            0       0       0.4054  0       0.5774    0.5774  (latent)
            0       0       0.4054  0.2640  0         0.5774  (indexing)
            0       0       0       0.9277  0         0       (advances)
            0       0       0       0.2640  0.5774    0       (structures)
Documents Ranking

Step 3: Compute qnᵀAn.

            d1      d2      d3      d4      d5
qnᵀAn =     0       0.2560  0.7022  0.1524  0.3334

The documents rank as d3 > d5 > d2 > d4 > d1.
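The whole three-step procedure fits in a few lines. A minimal sketch assuming numpy; tokenization is naive whitespace splitting over the already-stopword-free documents, and the log base is irrelevant to the final cosine scores (it cancels in the normalization):

```python
import numpy as np

docs = ["lsi tutorials fast tracks",              # d1 (stopwords removed)
        "books semantic analysis",                # d2
        "learning latent semantic indexing",      # d3
        "advances structures advances indexing",  # d4
        "analysis latent structures"]             # d5
query = "latent semantic indexing"

terms = sorted({t for d in docs for t in d.split()})
D = len(docs)

# Raw term frequencies f_ij, then a_ij = f_ij * log(D / d_i).
F = np.array([[d.split().count(t) for d in docs] for t in terms], float)
df = (F > 0).sum(axis=1)                  # d_i: docs containing term i
A = F * np.log10(D / df)[:, None]

q = np.array([query.split().count(t) for t in terms], float)  # FREQ only

An = A / np.linalg.norm(A, axis=0)        # normalize document columns
qn = q / np.linalg.norm(q)

scores = qn @ An
for j in np.argsort(-scores):             # prints d3 > d5 > d2 > d4 > d1
    print(f"d{j + 1}: {scores[j]:.4f}")
```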
Exercises
1. Repeat the above calculations, this time including all stopwords. Explain any difference in computed results.
2. Repeat the above calculations, this time scoring global weights using probabilistic IDF (IDFP): Gi = log((D − di)/di). Explain any difference in computed results.
APPLICATION
Latent Semantic Indexing (LSI)
Using SVD
Latent Semantic Indexing

Use LSI to cluster terms, and to find the terms that could be used to expand or reformulate the query.
Example: the collection consists of the following documents:
d1 = Shipment of gold damaged in a fire.
d2 = Delivery of silver arrived in a silver truck.
d3 = Shipment of gold arrived in a truck.
Assume that the query is "gold silver truck".

SVD
Every matrix A of dimensions m × n (m ≥ n) can be decomposed as A = UΣVᵀ, where
- U has dimension m × m, and its columns are orthogonal, i.e., UUᵀ = UᵀU = Im×m;
- Σ has dimension m × n, and its only non-zero elements are on the main diagonal;
- V has dimension n × n, and its columns are orthogonal, i.e., VVᵀ = VᵀV = In×n.
Keeping only the p largest singular values gives the rank-p approximation A ≈ UpΣpVpᵀ, where
- Up is m × p, with orthogonal columns;
- Σp is p × p and diagonal;
- Vp is n × p, with orthogonal columns.
Latent Semantic Indexing (Procedure)

Step 1: Score term weights and construct the term-document matrix A and the query matrix q.

Term       d1  d2  d3   q
a           1   1   1   0
arrived     0   1   1   0
damaged     1   0   0   0
delivery    0   1   0   0
fire        1   0   0   0
gold        1   0   1   1
in          1   1   1   0
of          1   1   1   0
shipment    1   0   1   0
silver      0   2   0   1
truck       0   1   1   1
Latent Semantic Indexing (Procedure)

Step 2: Decompose matrix A using the SVD procedure into U, Σ and V matrices: A = UΣVᵀ, with

U = [ -0.42012  -0.0748    -0.04597  ]   (a)
    [ -0.29949   0.200092   0.407828 ]   (arrived)
    [ -0.12063  -0.27489   -0.4538   ]   (damaged)
    [ -0.15756   0.304648  -0.20065  ]   (delivery)
    [ -0.12063  -0.27489   -0.4538   ]   (fire)
    [ -0.26256  -0.37945    0.154674 ]   (gold)
    [ -0.42012  -0.0748    -0.04597  ]   (in)
    [ -0.42012  -0.0748    -0.04597  ]   (of)
    [ -0.26256  -0.37945    0.154674 ]   (shipment)
    [ -0.31512   0.609295  -0.40129  ]   (silver)
    [ -0.29949   0.200092   0.407828 ]   (truck)

Σ = [ 4.098872  0         0        ]
    [ 0         2.361571  0        ]
    [ 0         0         1.273669 ]

V = [ -0.49447  -0.64918  -0.57799  ]
    [ -0.64582   0.719447 -0.25556  ]
    [ -0.58174  -0.24691   0.774995 ]

Step 3: Rank-2 approximation. Keep the first two columns of U and V and the leading 2 × 2 block of Σ:

Uk = [ -0.42012  -0.0748   ]        Σk = [ 4.098872  0        ]
     [ -0.29949   0.200092 ]             [ 0         2.361571 ]
     [ -0.12063  -0.27489  ]
     [ -0.15756   0.304648 ]        Vk = [ -0.49447  -0.64918  ]
     [ -0.12063  -0.27489  ]             [ -0.64582   0.719447 ]
     [ -0.26256  -0.37945  ]             [ -0.58174  -0.24691  ]
     [ -0.42012  -0.0748   ]
     [ -0.42012  -0.0748   ]
     [ -0.26256  -0.37945  ]
     [ -0.31512   0.609295 ]
     [ -0.29949   0.200092 ]
Latent Semantic Indexing (Procedure)

Step 4: Find the new term vector coordinates in this reduced 2-dimensional space.
Rows of U hold the eigenvector values; these are the coordinates of the individual term vectors. Thus from the reduced matrix Uk:

 1   a          -0.42012  -0.0748
 2   arrived    -0.29949   0.200092
 3   damaged    -0.12063  -0.27489
 4   delivery   -0.15756   0.304648
 5   fire       -0.12063  -0.27489
 6   gold       -0.26256  -0.37945
 7   in         -0.42012  -0.0748
 8   of         -0.42012  -0.0748
 9   shipment   -0.26256  -0.37945
10   silver     -0.31512   0.609295
11   truck      -0.29949   0.200092
Step 5: Find the new query vector coordinates in the reduced 2-dimensional space, using
q' = qᵀ Uk Σk⁻¹.

q = [ 0 0 0 0 0 1 0 0 0 1 1 ]   (gold silver truck)

q' = qᵀ Uk [ 1/4.0989  0        ] = [ -0.2140  -0.1821 ]
           [ 0         1/2.3616 ]
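A minimal reproduction of steps 1-5, assuming numpy. Note that SVD factor signs are not unique, so individual coordinates may come out with flipped signs relative to the tables above while all cosine comparisons are unaffected. Ranking documents against the projected query (rows of Vk as document coordinates) is one common convention, added here for illustration:

```python
import numpy as np

docs = ["shipment of gold damaged in a fire",
        "delivery of silver arrived in a silver truck",
        "shipment of gold arrived in a truck"]
query = "gold silver truck"

# Step 1: term-document matrix and query vector (raw frequencies).
terms = sorted({t for d in docs for t in d.split()})
A = np.array([[d.split().count(t) for d in docs] for t in terms], float)
q = np.array([query.split().count(t) for t in terms], float)

# Steps 2-3: SVD and rank-2 truncation.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
Uk, sk, Vk = U[:, :2], s[:2], Vt[:2, :].T

# Step 5: project the query into the 2-D latent space.
q2 = q @ Uk / sk                   # q' = q^T U_k S_k^-1
print(np.round(q2, 4))             # about [-0.2140, -0.1821], up to sign

# Cosine similarity between the projected query and each document
# (row j of Vk is document j in the latent space).
sims = Vk @ q2 / (np.linalg.norm(Vk, axis=1) * np.linalg.norm(q2))
print(np.round(sims, 4))           # d2 (the silver-truck document) leads
```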
Latent Semantic Indexing (Procedure)

Step 6: Group terms into clusters.
Grouping is done by comparing cosine angles between any two pairs of vectors. The following clusters are obtained:
1. a, in, of
2. gold, shipment
3. damaged, fire
4. arrived, truck
5. silver
6. delivery
Some vectors are not shown in the plot since they are completely superimposed; this is the case for clusters 1-4. If unit vectors are used and small deviations are ignored, clusters 3 and 4 and clusters 4 and 5 can be merged.
Latent Semantic Indexing (Procedure)

Step 7: Find terms that could be used to expand or reformulate the query.
The query is "gold silver truck". Note that in relation to the query, clusters 1, 2 and 3 are far away from the query; similarity-wise these could be viewed as belonging to a "long tail". If we insist on combining these with the query, possible expanded queries could be:
gold silver truck shipment
gold silver truck damaged
gold silver truck shipment damaged
gold silver truck damaged in a fire
shipment of gold silver truck damaged in a fire
etc.
Looking around the query, the closer clusters are 4, 5, and 6. We could use these clusters to expand or reformulate the query. For example, the following are some of the expanded queries one could test:
gold silver truck arrived
delivery gold silver truck
gold silver truck delivery
gold silver truck delivery arrived
etc.
Documents containing these terms should be more relevant to the initial query.
APPLICATION
Latent Semantic Indexing (LSI)
Exercise
Latent Semantic Indexing (Exercise)

The SVD was the original factorization proposed for Latent Semantic Indexing (LSI), the process of replacing a term-document matrix A with a low-rank approximation Ap which reveals implicit relationships among documents that don't necessarily share common terms. Example:

Term         D1  D2  D3  D4  D5
twain        53  65   0  30   1
clemens      10  20  40  43   0
huckleberry  30  10  25  52  70

• A query on "clemens" will retrieve D1, D2, D3, and D4.
• A query on "twain" will retrieve D1, D2, and D4.

For p = 2, the SVD gives

Term         D1  D2  D3  D4  D5
twain        49  65   7  34  -5
clemens      23  22  14  30  21
huckleberry  25   9  34  57  63

• Now a query on "clemens" will retrieve all documents.
• A query on "twain" will retrieve D1, D2, D4, and possibly D3.
• The negative entry is disturbing to some and motivates the nonnegative factorizations.
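The rank-2 approximation in the second table can be reproduced directly; a minimal sketch assuming numpy (rounded values should come out close to the table above, small differences aside):

```python
import numpy as np

A = np.array([[53., 65.,  0., 30.,  1.],   # twain
              [10., 20., 40., 43.,  0.],   # clemens
              [30., 10., 25., 52., 70.]])  # huckleberry

U, s, Vt = np.linalg.svd(A, full_matrices=False)
A2 = U[:, :2] @ np.diag(s[:2]) @ Vt[:2, :]   # best rank-2 approximation
print(np.round(A2))   # compare with the table; note the negative entry
```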
References
1. Linear Algebra - I, Module 1: Vectors and Matrices, A. M. Mathai, Centre for Mathematical Sciences (CMS), Pala.
2. Linear Algebra - II, Module 2: Determinants and Eigenvalues, A. M. Mathai, Centre for Mathematical Sciences (CMS), Pala.
3. Introduction to Linear Algebra, G. Strang, Wellesley-Cambridge Press, 1993.
4. Matrix Computations, G. Golub and C. Van Loan, Johns Hopkins University Press, 1989.
5. Linear Algebra, A. R. Vasishtha and J. N. Sharma, Krishna Prakashan.
6. Matrices, A. R. Vasishtha and J. N. Sharma, Krishna Prakashan.
7. Linear Algebra, Ramji Lal, Sail Publication, Allahabad.
8. Introduction to Information Retrieval, Christopher D. Manning, Prabhakar Raghavan, Hinrich Schütze, Cambridge University Press.