International Workshop on Machine Learning and Text Analytics (MLTA 2013)

Linear Algebra for Machine Learning and IR

Manoj Kumar Singh
DST-Centre for Interdisciplinary Mathematical Sciences (DST-CIMS)
Banaras Hindu University (BHU), Varanasi-221005, INDIA.
E-mail: [email protected]

December 15, 2013, South Asian University (SAU), New Delhi.

Contents
- Vector Matrix Model in IR, ML and other areas
- Vector Space: formal definition - linear combination - independence - generators and bases - the examples R^n(R), C^n(C) - dimension - inner product, norm, orthogonality
- Linear Transformation: definition - matrices and determinants - LT using matrices - rank and nullity - column space and row space - invertibility - singularity and non-singularity - eigenvalues and eigenvectors - linear algebras
- Different types of matrices and matrix algebra
- Matrix factorization
- Applications

Vector Matrix Model in IR

A collection consisting of the following five documents is queried for "latent semantic indexing" (q):
d1 = LSI tutorials and fast tracks.
d2 = Books on semantic analysis.
d3 = Learning latent semantic indexing.
d4 = Advances in structures and advances in indexing.
d5 = Analysis of latent structures.
Task: rank the documents in decreasing order of relevance to the query.

Recommendation System: item-based collaborative filtering

        Item1  Item2  Item3  Item4  Item5
Alice     5      3      4      4      ?
User1     3      1      2      3      3
User2     4      3      4      3      5
User3     3      3      1      5      4
User4     1      5      5      2      1

Blind Source Separation (BSS): the Cocktail Party Problem

[Figure: a confused computer in a cocktail-party situation — what, who, from where? The microphones record "garbage".]
Humans are capable of steering their hearing attention, i.e. of identifying the source of interest. BSS goes further: it is about separating the sources themselves, which comes very near to a solution of the cocktail party problem. In a multiple-speaker environment the microphones collect garbage — a hotchpotch of speech!
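The "?" in the recommendation table above — Alice's missing rating for Item5 — can be predicted with a short item-based collaborative-filtering sketch. The cosine item-item similarity and the similarity-weighted average below are common choices for this scheme, not steps prescribed by the slide:

```python
import numpy as np

# Ratings matrix from the slide (rows: Alice, User1..User4; cols: Item1..Item5).
# Alice's missing rating for Item5 is the "?" to be predicted (stored as 0).
R = np.array([
    [5, 3, 4, 4, 0],   # Alice
    [3, 1, 2, 3, 3],   # User1
    [4, 3, 4, 3, 5],   # User2
    [3, 3, 1, 5, 4],   # User3
    [1, 5, 5, 2, 1],   # User4
], dtype=float)

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Item-item similarities against Item5, computed over the users who rated it.
known = R[1:, :]                                        # drop Alice's row
sims = np.array([cosine(known[:, j], known[:, 4]) for j in range(4)])

# Predicted rating: similarity-weighted average of Alice's other ratings.
pred = sims @ R[0, :4] / sims.sum()
print(round(pred, 2))   # → 4.08
```

With these (all-positive) similarities the prediction lands near Alice's other high ratings, as the slide's table suggests it should.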
Blind Source Separation model: the measured signals x are mixtures of the sources s, x = As, where A = [a_ij]_{m×n} is the (unknown) mixing matrix.

Imaging Application

[Figure 1: PSF of the components in an FPA imaging system — the input scene I(x, y) passes through the optics, lens, WG array, detector, sample-and-hold circuit, receiver, display electronics and the human eye, each contributing its own point-spread function h(x, y), to give the output scene O(x, y).]
The scene is recovered by inverting the composite system response: f̂ = H⁻¹y.

Vector Space

Def.: An algebraic structure (V, F, ⊕, +, ·, ∘) with sets V, F and binary operations
⊕ : V × V → V,  + : F × F → F,  · : F × F → F,  ∘ : F × V → V
is a vector space if:

(V, ⊕) is an Abelian group:
i.   Associativity: (a ⊕ b) ⊕ c = a ⊕ (b ⊕ c), ∀ a, b, c ∈ V
ii.  Identity: ∃ e ∈ V s.t. a ⊕ e = e ⊕ a = a, ∀ a ∈ V
iii. Inverse: ∀ a ∈ V, ∃ a⁻¹ ∈ V s.t. a ⊕ a⁻¹ = a⁻¹ ⊕ a = e
iv.  Commutativity: a ⊕ b = b ⊕ a, ∀ a, b ∈ V

(F, +, ·) is a field:
i.   (F, +) is an Abelian group.
ii.  (F*, ·) is an Abelian group, where F* = F − {0}.
iii. Multiplication · distributes over +: a·(b + c) = a·b + a·c, ∀ a, b, c ∈ F

Scalar multiplication ∘ satisfies:
i.   α ∘ a ∈ V, ∀ a ∈ V, α ∈ F
ii.  α ∘ (a ⊕ b) = (α ∘ a) ⊕ (α ∘ b), ∀ a, b ∈ V, α ∈ F
iii. (α + β) ∘ a = (α ∘ a) ⊕ (β ∘ a), ∀ a ∈ V, α, β ∈ F
iv.  (α · β) ∘ a = α ∘ (β ∘ a), ∀ a ∈ V, α, β ∈ F
v.   1 ∘ a = a, ∀ a ∈ V, where 1 is the unity element of F

Linear Algebra: A vector space (V, F, ⊕, +, ·, ∘) is called a linear algebra over the field F if there is an additional operation ⊗ : V × V → V, called multiplication of vectors, satisfying:
i.   a ⊗ b ∈ V, ∀ a, b ∈ V
ii.  a ⊗ (b ⊗ c) = (a ⊗ b) ⊗ c, ∀ a, b, c ∈ V
iii. a ⊗ (b ⊕ c) = (a ⊗ b) ⊕ (a ⊗ c), ∀ a, b, c ∈ V
iv.  α ∘ (a ⊗ b) = (α ∘ a) ⊗ b, ∀ a, b ∈ V, α ∈ F
If there is an element 1 ∈ V such that 1 ⊗ a = a ⊗ 1 = a, ∀ a ∈ V, then V is a linear algebra with identity, and 1 is called the identity of V. The algebra V(F) is commutative if a ⊗ b = b ⊗ a, ∀ a, b ∈ V.

Note:
1. Elements of V are called vectors and elements of F scalars.
2. "Vector" here does not mean a vector quantity (directed line segment) as in vector algebra.
3. We say "vector space V over field F" and write V(F).

Subspace: Let V(F) be a vector space. W ⊆ V is called a subspace of V if W(F) is itself a vector space w.r.t. the operations of V(F).
e.g. W = {(x, 2y, 3z) : x, y, z ∈ R} is a subspace of R³(R); W = {(x, y, 0) : x, y ∈ R} is a subspace of R³(R).
Let V(F) be the vector space of all n×1 matrices and A an m×n matrix over F. Then W = {x ∈ V : Ax = 0} is a subspace of V.

Generator: Let V(F) be a vector space and S ⊆ V. If U is a subspace of V containing S and contained in every subspace of V containing S, then U is the smallest subspace of V containing S. This subspace U is said to be generated (or spanned) by S, written U = [S].

Linear Combination: Let V(F) be a vector space. Any vector a = α₁a₁ + α₂a₂ + ... + αₙaₙ, where α₁, α₂, ..., αₙ ∈ F, is called a linear combination of the vectors a₁, a₂, ..., aₙ.

Linear Span: Let V(F) be a vector space and S (≠ ∅) ⊆ V. The linear span of S, L(S), is the set of all linear combinations of finite subsets of S:
L(S) = {α₁a₁ + α₂a₂ + ... + αₙaₙ : α₁, ..., αₙ ∈ F, a₁, ..., aₙ ∈ S}.
Note: L(S) is a subspace of V(F) and L(S) = [S].

Linear Dependence (LD): {a₁, ..., aₙ} ⊆ V is LD if there exist α₁, ..., αₙ ∈ F, not all zero, such that α₁a₁ + ... + αₙaₙ = 0.

Linear Independence (LI): {a₁, ..., aₙ} ⊆ V is LI if α₁a₁ + ... + αₙaₙ = 0 implies αᵢ = 0 for all 1 ≤ i ≤ n.

Basis: S ⊆ V(F) is a basis of the vector space V(F) if (i) S consists of LI elements, and (ii) V = [S] = L(S).

Dimension: V(F) is finite dimensional if there is a finite subset S ⊆ V with V = L(S) = [S]. The number of elements in a basis of a finite-dimensional V(F) is the dimension of V.
e.g. S₁ = {(1,0,0), (0,1,0), (0,0,1)} and S₂ = {(1,0,0), (1,1,0), (1,1,1)} are bases of R³(R).

Inner Product: An inner product on a vector space V(R or C) is a function ⟨·,·⟩ : V × V → R/C which assigns to each ordered pair of vectors a, b ∈ V a scalar ⟨a, b⟩ such that
i.   ⟨a, b⟩ = conjugate of ⟨b, a⟩
ii.  ⟨αa + βb, c⟩ = α⟨a, c⟩ + β⟨b, c⟩
iii. ⟨a, a⟩ ≥ 0, and ⟨a, a⟩ = 0 ⟺ a = 0
For V = C[a, b]: ⟨x, y⟩ = ∫ₐᵇ x(t)·conj(y(t)) dt.
For V = Rⁿ: ⟨x, y⟩ = x₁y₁ + x₂y₂ + ... + xₙyₙ for x = (x₁, ..., xₙ), y = (y₁, ..., yₙ).

Norm / Length: the length of a vector x in V(F) is ‖x‖ = √⟨x, x⟩.

Distance: the distance between vectors x, y ∈ V(F) is d(x, y) = ‖x − y‖ = √⟨x − y, x − y⟩.
Note: (V, d) is a metric space.

Orthogonality: Let (V, ⟨·,·⟩) be an inner product space and x, y ∈ V. The vectors x and y are orthogonal to each other if ⟨x, y⟩ = 0.
Notes: an orthogonal set of non-zero vectors is LI; LI does not imply orthogonality; the Gram-Schmidt process turns an LI set into an orthogonal one (orthonormal, if each vector is scaled to ‖x‖ = 1).

Linear Transformation

Definition (LT): Let U(F) and V(F) be two vector spaces. A linear transformation from U into V is a function T : U → V such that
T(αx + βy) = αT(x) + βT(y), ∀ α, β ∈ F and x, y ∈ U.

Linear Operator: a linear operator on V(F) is a linear transformation T : V → V.

Range Space of LT: for an LT T : U(F) → V(F), the range space is R(T) = {T(x) ∈ V : x ∈ U}.

Null Space of LT: the null space is N(T) = {x ∈ U : T(x) = 0 ∈ V}.

Note:
1. R(T) is a subspace of V; N(T) is a subspace of U.
2. If U(F) is finite dimensional, then R(T) is also finite dimensional.

Rank and Nullity of LT:
1. Rank: the dimension of the range space, ρ(T) = dim(R(T)).
2. Nullity: the dimension of the null space, ν(T) = dim(N(T)).
Note: for T : U(F) → V(F), ρ(T) + ν(T) = dim(U).

Non-Singular Transform: an LT T : U → V is non-singular if N(T) = {0}, i.e., x ∈ U and T(x) = 0 imply x = 0.
Singular Transform: an LT T : U → V is singular if there exists x ≠ 0 in U such that T(x) = 0.

Matrices

Definition: A set of mn elements of a field F arranged in a rectangular array of m rows and n columns is an m×n matrix over F, A = [a_ij]_{m×n}. If m = n the matrix is called square, and the entries a_ij with i = j constitute its principal diagonal.
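The Gram-Schmidt step mentioned under Orthogonality can be sketched in a few lines of Python/NumPy. The two input vectors here are an arbitrary LI pair chosen for illustration:

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize an LI sequence of vectors (modified Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = v.astype(float).copy()
        for b in basis:            # subtract the projection of w onto each
            w -= (w @ b) * b       # previously built orthonormal vector
        basis.append(w / np.linalg.norm(w))
    return np.array(basis)

Q = gram_schmidt([np.array([1.0, 1.0, 0.0]),
                  np.array([1.0, 0.0, 1.0])])
print(np.round(Q @ Q.T, 6))   # identity matrix: the rows are orthonormal
```

Each output vector has unit norm and zero inner product with the others, which is exactly the "LI → orthogonality" direction the slide notes Gram-Schmidt provides.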
Unit / Identity Matrix: I = [δ_ij]_{n×n}, where δ_ij = 1 if i = j and 0 otherwise, e.g.
I₃ = [[1, 0, 0], [0, 1, 0], [0, 0, 1]].

Diagonal Matrix: a square matrix A = [a_ij]_{n×n} with a_ij = 0 for i ≠ j, e.g. D = diag(d₁, d₂, d₃).

Scalar Matrix: a diagonal matrix A = [a_ij]_{n×n} with a_ii = k for all i, i.e. S = diag(k, k, ..., k). If A is any matrix and S a scalar matrix, then SA = AS = kA.

Upper Triangular Matrix: a square matrix A = [a_ij]_{n×n} is upper triangular if a_ij = 0 whenever i > j (all entries below the principal diagonal vanish).

Lower Triangular Matrix: A = [a_ij]_{n×n} is lower triangular if a_ij = 0 whenever i < j.

Symmetric Matrix: A = [a_ij]_{n×n} is symmetric if a_ij = a_ji for all i, j, e.g.
[[a, b, c], [b, e, d], [c, d, f]].

Skew-Symmetric Matrix: A = [a_ij]_{n×n} is skew-symmetric if a_ij = −a_ji for all i, j, e.g.
[[0, h, g], [−h, 0, f], [−g, −f, 0]].

Transpose: for A = [a_ij]_{m×n}, the n×m matrix obtained from A by changing its rows into columns and columns into rows is the transpose of A, denoted A' or Aᵀ (e.g. the transpose of a 3×4 matrix is 4×3).

Trace: for a square matrix A = [a_ij]_{n×n}, the trace is the sum of the main diagonal elements, tr(A) = Σᵢ₌₁ⁿ a_ii.

Addition: if A = [a_ij]_{m×n} and B = [b_ij]_{m×n}, then C = A + B is defined by c_ij = a_ij + b_ij; (M, +) is an Abelian group.

Scalar Multiplication: kA = Ak = [k·a_ij]_{m×n}.

Multiplication: for A = [a_ij]_{m×n} and B = [b_ij]_{n×p}, the product AB is defined only when the number of columns of A equals the number of rows of B.
The product C = AB is the m×p matrix C = [c_ik]_{m×p} with
c_ik = Σ_{j=1}^{n} a_ij b_jk.

Row / Column Vector Representation of a Matrix: for A = [a_ij]_{m×n}, the i-th row is the vector rᵢ = (a_i1, a_i2, a_i3, ..., a_in) and the i-th column is the vector cᵢ = (a_1i, a_2i, a_3i, ..., a_mi), so that
A = [r₁; r₂; ...; r_m] = [c₁, c₂, c₃, ..., c_n],
with row vectors r₁, r₂, ..., r_m ∈ V_n(F) and column vectors c₁, c₂, ..., c_n ∈ V_m(F).

Row Space and Row Rank of a Matrix: let R = {r₁, r₂, ..., r_m}. The linear span L(R) ⊆ V_n(F) is the row space of the matrix. The row rank is ρ_r(A) = dim(L(R)) ≤ dim(V_n(F)) = n, and ρ_r(A) = m ⟺ {r₁, r₂, ..., r_m} is LI.

Column Space and Column Rank of a Matrix: let C = {c₁, c₂, ..., c_n}. The linear span L(C) ⊆ V_m(F) is the column space of the matrix. The column rank is ρ_c(A) = dim(L(C)) ≤ dim(V_m(F)) = m, and ρ_c(A) = n ⟺ {c₁, c₂, ..., c_n} is LI.

Rank of a Matrix: ρ(A) = min(ρ_r(A), ρ_c(A)).

Determinant of a Square Matrix: let f be a scalar function (not a vector or matrix function) of the rows x₁, x₂, ..., x_n, called the determinant of A, satisfying the following conditions:
i.   f(x₁, x₂, ..., c·xᵢ, ..., x_n) = c·f(x₁, x₂, ..., xᵢ, ..., x_n), where c is a scalar: multiplying any row by a scalar is equivalent to multiplying the whole determinant by that scalar.
ii.  f(x₁, x₂, ..., xᵢ, ..., c·xᵢ + x_j, ..., x_n) = f(x₁, x₂, ..., xᵢ, ..., x_j, ..., x_n): if a scalar multiple of the i-th row (col.) is added to the j-th row (col.), the value of the determinant remains the same.
iii. if xᵢ is written as a sum of two vectors, xᵢ = yᵢ + zᵢ, then f(x₁, x₂, ..., yᵢ + zᵢ, ..., x_n) = f(x₁, x₂, ..., yᵢ, ..., x_n) + f(x₁, x₂, ..., zᵢ, ..., x_n): splitting the i-th row (col.) as a sum of two vectors splits the determinant into a sum of two determinants.
iv.  f(e₁, e₂, ..., e_n) = 1, where e₁, e₂, ..., e_n are the basic unit vectors: the determinant of the identity matrix is 1.
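The determinant postulates and the equality of row and column rank can be checked numerically. This sketch uses arbitrary 3×3 example matrices (one non-singular, one with a dependent row):

```python
import numpy as np

A = np.array([[1.0, 2.0, 3.0],
              [0.0, 1.0, 1.0],
              [5.0, 0.0, 2.0]])
d = np.linalg.det(A)

# Postulate (i): scaling one row scales the determinant by the same factor.
B = A.copy(); B[1] *= 7.0
assert np.isclose(np.linalg.det(B), 7.0 * d)

# Postulate (ii): adding a multiple of one row to another changes nothing.
C = A.copy(); C[2] += 4.0 * A[0]
assert np.isclose(np.linalg.det(C), d)

# A dependent row forces the determinant to zero (property derived from (ii)),
# and row rank always equals column rank.
D = np.array([[1.0, 2.0, 3.0],
              [2.0, 4.0, 6.0],   # 2 x row 1 -> rows are LD
              [0.0, 1.0, 1.0]])
assert np.isclose(np.linalg.det(D), 0.0)
assert np.linalg.matrix_rank(D) == np.linalg.matrix_rank(D.T) == 2
```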
The conditions (i)-(iv) are the postulates that define the determinant of a square matrix. Standard notation: |A| or det(A).

Some Properties of the Determinant:
i.    For a 2×2 matrix A = [[a₁₁, a₁₂], [a₂₁, a₂₂]], |A| = a₁₁a₂₂ − a₁₂a₂₁.
ii.   The determinant of a square null matrix is zero; so is the determinant of any square matrix with one or more null rows or columns.
iii.  The determinant of a diagonal matrix is the product of its diagonal elements.
iv.   The determinant of a triangular matrix is the product of its diagonal elements.
v.    If any two rows (cols.) are interchanged, the determinant of the new matrix is −1 times the original determinant.
vi.   The determinant of a matrix of real numbers can be negative, positive, or zero. By postulate (ii) the determinant is unchanged when a multiple of one row (col.) is added to another; hence if one or more rows (cols.) are LD on the others, those dependent rows (cols.) can be made null by such operations, and the determinant is zero.
vii.  |A| ≠ 0 iff the rows (cols.) form an LI set of vectors, and then ρ_r(A) = ρ_c(A) = n.
viii. For n×n matrices A and B: |AB| = |A||B|.
ix.   For an n×n matrix A, a matrix B with AB = BA = Iₙ, if it exists, is denoted B = A⁻¹. Then AA⁻¹ = A⁻¹A = I gives |A||A⁻¹| = |I| = 1, so |A⁻¹| = 1/|A|.

Cofactor Expansion

Minors: let A = [a_ij] be a square matrix. The determinant of the submatrix obtained by deleting some rows and columns of A is called a minor. If the i-th row and j-th column are deleted, the determinant of the resulting submatrix is the minor of a_ij. For
A = [[2, 0, 1], [1, 2, 4], [0, 1, 5]],
the minor of a₁₁ is |[2, 4; 1, 5]| and the minor of a₂₂ is |[2, 1; 0, 5]|.

Leading Minors: the minors formed by deleting the rows and columns from the 2nd onward, from the 3rd onward, and so on, are called the leading minors.
For A = [[2, 0, 1], [1, 2, 4], [0, 1, 5]] the leading minors are 2, |[2, 0; 1, 2]|, and |A| itself.

Cofactors: let A = [a_ij] be an n×n matrix. The cofactor of a_ij is defined as (−1)^{i+j} times the minor of a_ij; that is, if the cofactor and minor of a_ij are denoted C_ij and M_ij respectively, then C_ij = (−1)^{i+j} M_ij.

Evaluation of the Determinant: let A = [a_ij] be an n×n matrix with cofactors C_ij and minors M_ij. Then
|A| = a₁₁C₁₁ + a₁₂C₁₂ + ... + a₁ₙC₁ₙ = a₁₁M₁₁ − a₁₂M₁₂ + ... + (−1)^{n+1} a₁ₙM₁ₙ,
and more generally, for any i = 1, 2, ..., n,
|A| = a_i1C_i1 + a_i2C_i2 + ... + a_inC_in = Σ_j a_ij (−1)^{i+j} M_ij.

Cofactor Matrix: the cofactor matrix of A is cof(A) = [C_ij]_{n×n}.

Inverse of a Matrix: the inverse of A, if it exists, is given by
A⁻¹ = (1/|A|) [cof(A)]ᵀ, provided |A| ≠ 0.

Singular and Non-Singular Matrix: a square matrix A = [a_ij]_{n×n} is said to be non-singular or singular according as |A| ≠ 0 or |A| = 0.

Rank of a Matrix: a number r is said to be the rank of a matrix A if it possesses the following two properties:
i.  there is at least one r×r square submatrix of A whose determinant is not zero, and
ii. if A contains any (r+1)×(r+1) square submatrix, then the determinant of every such submatrix is zero.

Invertibility of a Matrix: the following statements are equivalent:
A is non-singular ⟺ ρ(A) = n ⟺ ρ_r(A) = n ⟺ ρ_c(A) = n ⟺ A⁻¹ exists (AA⁻¹ = A⁻¹A = I) ⟺ |A| ≠ 0 ⟺ R = {r₁, ..., r_n} is LI ⟺ C = {c₁, ..., c_n} is LI.

LT Using a Matrix

Let T : U(F) → V(F) be an LT, and let B = {α₁, α₂, ..., α_n} and B' = {β₁, β₂, ..., β_m} be ordered bases for U and V. Each of the n vectors T(α_j) ∈ V is uniquely expressed as a linear combination of elements of B':
T(α_j) = a_1j β₁ + a_2j β₂ + ... + a_mj β_m,  i.e.  T(α_j) = Σ_{i=1}^{m} a_ij βᵢ.
The m×n matrix [a_ij] so obtained is written T = [T; B; B'], the matrix of T relative to B, B'.
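The cofactor expansion and the adjugate formula for A⁻¹ above can be coded directly. This is an illustration on the 3×3 matrix from the Minors slide — the recursive expansion is exponential-time and shown only for clarity, not for practical use:

```python
import numpy as np

def det_cofactor(A):
    """Determinant by cofactor expansion along the first row (O(n!), didactic)."""
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    return sum((-1) ** j * A[0, j]
               * det_cofactor(np.delete(np.delete(A, 0, 0), j, 1))
               for j in range(n))

def inverse_adjugate(A):
    """A^{-1} = [cof(A)]^T / |A|, valid when |A| != 0."""
    n = A.shape[0]
    cof = np.array([[(-1) ** (i + j)
                     * det_cofactor(np.delete(np.delete(A, i, 0), j, 1))
                     for j in range(n)] for i in range(n)])
    return cof.T / det_cofactor(A)

A = np.array([[2.0, 0.0, 1.0],
              [1.0, 2.0, 4.0],
              [0.0, 1.0, 5.0]])
print(det_cofactor(A))                               # → 13.0
assert np.allclose(A @ inverse_adjugate(A), np.eye(3))
```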
Example: Let T be the LT on the vector space V₂(F) defined by T(a, b) = (a, 0). Find the matrix of T relative to the standard basis B = {e₁, e₂} = {(1,0), (0,1)}.
T(e₁) = T(1, 0) = (1, 0) = 1(1,0) + 0(0,1) = 1e₁ + 0e₂
T(e₂) = T(0, 1) = (0, 0) = 0(1,0) + 0(0,1) = 0e₁ + 0e₂
The matrix of T relative to the ordered basis B is T_B = [T; B] = [[1, 0], [0, 0]].

Eigenvalues and Eigenvectors

Eigenvalue and Eigenvector of an LT: let T : V(F) → V(F) be an LT. A scalar c ∈ F is called an eigenvalue of T if there exists x (≠ 0) ∈ V such that T(x) = cx; such an x is called an eigenvector corresponding to the eigenvalue c.
T(x) = cx ⟹ T(x) = cI(x), where I is the identity transform
⟹ (T − cI)(x) = 0 ⟹ T'(x) = 0, where T' = T − cI is an LT.
Since a non-zero x ∈ V satisfies T'(x) = 0, T' is singular, and hence det(T') = 0.

Eigenvalue and Eigenvector of a Matrix: let A be an n×n matrix and consider the equation Ax = λx, where λ is a scalar and x an n×1 vector. The null vector is a trivial solution. If the equation has a solution for some λ and a non-null x, then λ is called an eigenvalue (characteristic or latent root) of A, and the non-null x satisfying the equation for that λ is called an eigenvector (characteristic or latent vector) corresponding to it.
Ax = λx ⟹ (A − λI)x = 0, a homogeneous linear system; it has a non-null solution ⟺ A − λI is singular ⟺ |A − λI| = 0.

Eigenvalue Properties:
i.    The eigenvalues of a diagonal matrix are its diagonal elements.
ii.   The eigenvalues of a triangular (upper or lower) matrix are its diagonal elements.
iii.  The eigenvalues of an n×n scalar matrix with diagonal elements c are c repeated n times.
iv.   The eigenvalues of the identity matrix are 1 repeated n times.
v.    |A| = λ₁λ₂λ₃...λₙ.
vi.   A matrix A is singular iff at least one of its eigenvalues is zero.
vii.  tr(A) = a₁₁ + a₂₂ + ... + aₙₙ = λ₁ + λ₂ + λ₃ + ... + λₙ.
viii. A and Aᵀ have the same eigenvalues.
ix.   Eigenvectors corresponding to different eigenvalues are LI.
x.    If x₁, x₂ are two eigenvectors corresponding to the same eigenvalue, then c₁x₁ + c₂x₂ (when non-zero) is also an eigenvector for that eigenvalue.
xi.   The eigenvalues of a real symmetric matrix are real.
xii.  Eigenvectors corresponding to different eigenvalues of a real symmetric matrix are orthogonal.

Similarity of Matrices

Def.: Let A and B be square matrices of order n. B is said to be similar to A if there exists a non-singular matrix P such that B = P⁻¹AP.
Note:
1. Similarity is an equivalence relation.
2. If a matrix A is similar to a diagonal matrix D, the diagonal elements of D are the eigenvalues of A.

Diagonalizable Matrix: A is diagonalizable if it is similar to a diagonal matrix; that is, there exists an invertible matrix P such that P⁻¹AP = D, where D is diagonal.
i.   An n×n matrix is diagonalizable iff it possesses n LI eigenvectors.
ii.  If the eigenvalues of an n×n matrix are all distinct, it is similar to a diagonal matrix.
iii. Two n×n matrices with the same set of n distinct eigenvalues are similar.
iv.  P⁻¹AP = D ⟹ A = PDP⁻¹, the eigenvalue decomposition (EVD).
v.   (Spectral Decomposition for a Symmetric Matrix): a square symmetric matrix A can be expressed in terms of its eigenvalue-eigenvector pairs (λᵢ, eᵢ) as
A = λ₁e₁e₁ᵀ + λ₂e₂e₂ᵀ + λ₃e₃e₃ᵀ + ... + λₙeₙeₙᵀ.

Singular Value Decomposition

A singular value and corresponding singular vectors of a rectangular matrix A are, respectively, a scalar σ and a pair of vectors u and v that satisfy Av = σu and Aᵀu = σv. With the singular values on the diagonal of a diagonal matrix Σ and the corresponding singular vectors forming the columns of two orthogonal matrices U and V, we have AV = UΣ and AᵀU = VΣ. Since U and V are orthogonal, this becomes the singular value decomposition A = UΣVᵀ.

Def.: Every m×n matrix A can be written A = UΣVᵀ, where U is an m×m and V an n×n orthogonal matrix, and Σ is an m×n diagonal matrix.
Note:
1. The diagonal elements of Σ are termed the singular values of A.
2.
Using the SVD directly, AᵀA = (UΣVᵀ)ᵀ(UΣVᵀ) = VΣ²Vᵀ and AAᵀ = UΣ²Uᵀ. The columns of U and V are the eigenvectors of AAᵀ and AᵀA respectively, and the diagonal entries of Σ² are their eigenvalues.

Cholesky Factorization: the Cholesky factorization expresses a symmetric matrix as the product of a triangular matrix and its transpose, A = RᵀR, where R is an upper triangular matrix. Not all symmetric matrices can be factored in this way; the matrices that have such a factorization are said to be positive definite. The Cholesky factorization allows the linear system Ax = b to be replaced by RᵀRx = b, a pair of triangular systems solved easily by forward and backward substitution.

LU Factorization: LU factorization, or Gaussian elimination, expresses any square matrix A as the product of a permutation of a lower triangular matrix and an upper triangular matrix, A = LU, where L is a permutation of a lower triangular matrix with ones on its diagonal and U is upper triangular. Then |A| = |L||U| = ±u₁₁u₂₂...uₙₙ (the sign coming from the permutation) and A⁻¹ = U⁻¹L⁻¹.

QR Factorization: the orthogonal, or QR, factorization expresses any rectangular matrix as the product of an orthogonal or unitary matrix and an upper triangular matrix, A = QR, where Q is orthogonal or unitary and R is upper triangular.

APPLICATION: Documents Ranking

Rank the documents in decreasing order of relevance to the query. A collection consisting of the following five documents:
d1 = LSI tutorials and fast tracks.
d2 = Books on semantic analysis.
d3 = Learning latent semantic indexing.
d4 = Advances in structures and advances in indexing.
d5 = Analysis of latent structures.
is queried for "latent semantic indexing" (q); documents are ranked by decreasing cosine similarity.

Assume that:
1. Documents are linearized, tokenized, and their stop words removed. Stemming is not used.
Surviving terms are used to construct a term-document matrix A, populated with term weights a_ij = L_ij G_i N_j, where:
- L_ij = f_ij, the frequency of term i in document j (the so-called FREQ model);
- G_i = log(D/d_i), where D is the collection size and d_i the number of documents containing term i (the so-called IDF model; IDF stands for Inverse Document Frequency);
- N_j = 1/l, i.e. document lengths are normalized to 1/l, where l is the so-called L2 norm or Frobenius length of the document vector.
Thus a_ij = f_ij log(D/d_i) N_j.

2. Query terms are scored using FREQ; i.e., a_iq = L_iq = f_iq, where f_iq is the frequency of term i in the query q.

Procedure:
1. Compute A and q.
2. Normalize the document vectors and the query vector: A → A_n, q → q_n, where the subscript n denotes a normalized vector.
3. Compute q_nᵀ A_n.

Term-Document Matrix. Documents in the collection:
d1 = LSI tutorials and fast tracks.
d2 = Books on semantic analysis.
d3 = Learning latent semantic indexing.
d4 = Advances in structures and advances in indexing.
d5 = Analysis of latent structures.
Term        d1  d2  d3  d4  d5
LSI          1   0   0   0   0
tutorials    1   0   0   0   0
fast         1   0   0   0   0
tracks       1   0   0   0   0
books        0   1   0   0   0
semantic     0   1   1   0   0
analysis     0   1   0   0   1
learning     0   0   1   0   0
latent       0   0   1   0   1
indexing     0   0   1   1   0
advances     0   0   0   2   0
structures   0   0   0   1   1

Step 1: Weight matrix A (a_ij = f_ij log(5/d_i)) and query vector q:

Term        d1      d2      d3      d4      d5      q
LSI         0.6990  0       0       0       0       0
tutorials   0.6990  0       0       0       0       0
fast        0.6990  0       0       0       0       0
tracks      0.6990  0       0       0       0       0
books       0       0.6990  0       0       0       0
semantic    0       0.3979  0.3979  0       0       1
analysis    0       0.3979  0       0       0.3979  0
learning    0       0       0.6990  0       0       0
latent      0       0       0.3979  0       0.3979  1
indexing    0       0       0.3979  0.3979  0       1
advances    0       0       0       1.3980  0       0
structures  0       0       0       0.3979  0.3979  0

Step 2: Normalization — divide each document vector and the query vector by its Frobenius norm (L2 norm, Euclidean length):

Term        d1      d2      d3      d4      d5      q_n
LSI         0.5000  0       0       0       0       0
tutorials   0.5000  0       0       0       0       0
fast        0.5000  0       0       0       0       0
tracks      0.5000  0       0       0       0       0
books       0       0.7790  0       0       0       0
semantic    0       0.4434  0.4054  0       0       0.5774
analysis    0       0.4434  0       0       0.5774  0
learning    0       0       0.7121  0       0       0
latent      0       0       0.4054  0       0.5774  0.5774
indexing    0       0       0.4054  0.2640  0       0.5774
advances    0       0       0       0.9277  0       0
structures  0       0       0       0.2640  0.5774  0

Step 3: Compute q_nᵀ A_n:

            d1      d2      d3      d4      d5
q_nᵀ A_n =  0       0.2560  0.7022  0.1524  0.3334

The documents rank as: d3 > d5 > d2 > d4 > d1.

Exercises
1. Repeat the above calculations, this time including all stop words. Explain any difference in the computed results.
2. Repeat the above calculations, this time scoring global weights using probabilistic IDF (IDFP): G_i = log((D − d_i)/d_i). Explain any difference in the computed results.
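The three-step ranking procedure worked through above can be reproduced end-to-end in a few lines of Python/NumPy (log base 10, as in the slides):

```python
import numpy as np

# Stop-worded tokens of the five documents and the query, as on the slides.
terms = ["lsi", "tutorials", "fast", "tracks", "books", "semantic",
         "analysis", "learning", "latent", "indexing", "advances", "structures"]
docs = [["lsi", "tutorials", "fast", "tracks"],
        ["books", "semantic", "analysis"],
        ["learning", "latent", "semantic", "indexing"],
        ["advances", "structures", "advances", "indexing"],
        ["analysis", "latent", "structures"]]
query = ["latent", "semantic", "indexing"]

F = np.array([[d.count(t) for d in docs] for t in terms], dtype=float)
df = (F > 0).sum(axis=1)                     # d_i: docs containing term i
A = F * np.log10(len(docs) / df)[:, None]    # a_ij = f_ij * log10(D / d_i)
An = A / np.linalg.norm(A, axis=0)           # normalize document columns
q = np.array([query.count(t) for t in terms], dtype=float)
qn = q / np.linalg.norm(q)

scores = qn @ An
print(np.round(scores, 4))   # d3 > d5 > d2 > d4 > d1, as on the slide
```

The computed scores agree with the slide's (0, 0.2560, 0.7022, 0.1524, 0.3334) up to rounding in the last digit.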
APPLICATION: Latent Semantic Indexing (LSI) Using SVD

Use of LSI to cluster terms, and to find the terms that could be used to expand or reformulate the query.

Example: the collection consists of the following documents, and the query is "gold silver truck":
d1 = Shipment of gold damaged in a fire.
d2 = Delivery of silver arrived in a silver truck.
d3 = Shipment of gold arrived in a truck.

SVD: every matrix A of dimensions m×n can be decomposed as A = UΣVᵀ, where
- U has dimension m×m and orthogonal columns, i.e. UUᵀ = UᵀU = I_m;
- Σ has dimension m×n, with non-zero elements only on the main diagonal;
- V has dimension n×n and orthogonal columns, i.e. VVᵀ = VᵀV = I_n.
A rank-p approximation is A ≈ U_p Σ_p V_pᵀ, where U_p is m×p with orthogonal columns, Σ_p is p×p diagonal, and V_p is n×p with orthogonal columns.

LSI Procedure

Step 1: Score term weights and construct the term-document matrix A and the query vector q (raw term frequencies):

Term      d1  d2  d3    q
a          1   1   1    0
arrived    0   1   1    0
damaged    1   0   0    0
delivery   0   1   0    0
fire       1   0   0    0
gold       1   0   1    1
in         1   1   1    0
of         1   1   1    0
shipment   1   0   1    0
silver     0   2   0    1
truck      0   1   1    1

Step 2: Decompose matrix A using the SVD into the U, Σ (S) and V matrices:
U = [ -0.42012  -0.0748   -0.04597     (a)
      -0.29949   0.200092  0.407828    (arrived)
      -0.12063  -0.27489  -0.4538      (damaged)
      -0.15756   0.304648 -0.20065     (delivery)
      -0.12063  -0.27489  -0.4538      (fire)
      -0.26256  -0.37945   0.154674    (gold)
      -0.42012  -0.0748   -0.04597     (in)
      -0.42012  -0.0748   -0.04597     (of)
      -0.26256  -0.37945   0.154674    (shipment)
      -0.31512   0.609295 -0.40129     (silver)
      -0.29949   0.200092  0.407828    (truck)  ]

Σ = [ 4.098872  0         0
      0         2.361571  0
      0         0         1.273669 ]

V = [ -0.49447  -0.64918  -0.57799
      -0.64582   0.719447 -0.25556
      -0.58174  -0.24691   0.774995 ]

Step 3: Rank-2 approximation — keep the first two columns of U and V and the leading 2×2 block of Σ:

U_k = the first two columns of U above,
Σ_k = [ 4.098872  0
        0         2.361571 ],
V_k = [ -0.49447  -0.64918
        -0.64582   0.719447
        -0.58174  -0.24691 ].

Step 4: Find the new term-vector coordinates in this reduced 2-dimensional space. The rows of U_k hold the coordinates of the individual term vectors:

Term       x          y
a         -0.42012   -0.0748
arrived   -0.29949    0.200092
damaged   -0.12063   -0.27489
delivery  -0.15756    0.304648
fire      -0.12063   -0.27489
gold      -0.26256   -0.37945
in        -0.42012   -0.0748
of        -0.42012   -0.0748
shipment  -0.26256   -0.37945
silver    -0.31512    0.609295
truck     -0.29949    0.200092

Step 5: Find the new query-vector coordinates in the reduced 2-dimensional space, using q_k = qᵀ U_k Σ_k⁻¹. For q = "gold silver truck":
q_k = [-0.2140  -0.1821]

Step 6: Group terms into clusters by comparing the cosine angles between every pair of term vectors. The following clusters are obtained:
1. a, in, of
2. gold, shipment
3. damaged, fire
4. arrived, truck
5. silver
6.
delivery
(Some term vectors are not shown in the plot since they are completely superimposed; this is the case for clusters 1-4. If unit vectors are used and small deviations ignored, clusters 3 and 4, and clusters 4 and 5, can be merged.)

Step 7: Find terms that could be used to expand or reformulate the query. The query is "gold silver truck". In relation to the query, clusters 1, 2 and 3 are far away; similarity-wise they could be viewed as belonging to a "long tail". If we insist on combining these with the query, possible expanded queries could be:
gold silver truck shipment
gold silver truck damaged
gold silver truck shipment damaged
gold silver truck damaged in a fire
shipment of gold silver truck damaged in a fire
etc.
Looking around the query, the closer clusters are 4, 5 and 6. We could use these clusters to expand or reformulate the query; for example, the following are some of the expanded queries one could test:
gold silver truck arrived
delivery gold silver truck
gold silver truck delivery
gold silver truck delivery arrived
etc.
Documents containing these terms should be more relevant to the initial query.

Latent Semantic Indexing (Exercise)

The SVD was the original factorization proposed for Latent Semantic Indexing (LSI), the process of replacing a term-document matrix A with a low-rank approximation A_p which reveals implicit relationships among documents that do not necessarily share common terms.

Example:
Term          D1  D2  D3  D4  D5
twain         53  65   0  30   1
clemens       10  20  40  43   0
huckleberry   30  10  25  52  70

A query on "clemens" will retrieve D1, D2, D3, and D4. A query on "twain" will retrieve D1, D2, and D4.
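The rank-p truncation used by LSI can be computed directly from the SVD. A minimal sketch of the low-rank approximation step on this term-document matrix (not the full LSI retrieval pipeline), with p = 2 as in the exercise:

```python
import numpy as np

A = np.array([[53, 65,  0, 30,  1],               # twain
              [10, 20, 40, 43,  0],               # clemens
              [30, 10, 25, 52, 70]], dtype=float) # huckleberry

U, s, Vt = np.linalg.svd(A, full_matrices=False)
p = 2
Ap = U[:, :p] * s[:p] @ Vt[:p, :]   # rank-2 approximation A_p = U_p S_p V_p^T

assert np.linalg.matrix_rank(Ap) == 2
assert Ap.min() < 0                 # the "disturbing" negative entry appears
print(np.round(Ap).astype(int))     # compare with the p = 2 table in the exercise
```

By the Eckart-Young theorem this truncation is the best rank-2 approximation of A in the Frobenius norm; the residual norm equals the discarded singular value.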
For p = 2, the SVD gives

Term          D1  D2  D3  D4  D5
twain         49  65   7  34  -5
clemens       23  22  14  30  21
huckleberry   25   9  34  57  63

Now a query on "clemens" will retrieve all documents, and a query on "twain" will retrieve D1, D2, D4, and possibly D3. The negative entry is disturbing to some and motivates the non-negative factorizations.

References
1. Linear Algebra I, Module 1: Vectors and Matrices, A. M. Mathai, Centre for Mathematical Sciences (CMS), Pala.
2. Linear Algebra II, Module 2: Determinants and Eigenvalues, A. M. Mathai, Centre for Mathematical Sciences (CMS), Pala.
3. Introduction to Linear Algebra, G. Strang, Wellesley-Cambridge Press, 1993.
4. Matrix Computations, G. Golub and C. Van Loan, Johns Hopkins University Press, 1989.
5. Linear Algebra, A. R. Vasishtha and J. N. Sharma, Krishna Prakashan.
6. Matrices, A. R. Vasishtha and J. N. Sharma, Krishna Prakashan.
7. Linear Algebra, Ramji Lal, Sail Publication, Allahabad.
8. An Introduction to Information Retrieval, C. D. Manning, P. Raghavan and H. Schütze, Cambridge University Press.