LSA – University of Victoria
Latent Semantic Analysis

An Example
d1: Romeo and Juliet.
d2: Juliet: O happy dagger!
d3: Romeo died by dagger.
d4: "Live free or die", that's the New-Hampshire's motto.
d5: Did you know, New-Hampshire is in New-England.
q: dies, dagger
Which documents should be returned, and how should they be ranked?

Eigenvectors and Eigenvalues
• Let A be an n × n matrix.
• If x is an n-dimensional vector, then the matrix–vector product Ax is well defined, and the result is again an n-dimensional vector.
• In general, multiplication by a matrix changes the direction of a non-zero vector x, unless the vector is special and we have Ax = λx for some scalar λ. Such an x is called an eigenvector of A, and λ its eigenvalue.

Matrix Decomposition
• Let S be the matrix with the eigenvectors of A as columns.
• Let Λ be the diagonal matrix with the eigenvalues of A on the diagonal.
• Then A = SΛS⁻¹.
• If A is symmetric, then S⁻¹ = Sᵀ, so A = SΛSᵀ.

Singular Value Decomposition
• Let A be an m × n matrix with real entries and m > n.
• Consider the n × n square matrix B = AᵀA.
  – B is symmetric.
  – It has been shown that the eigenvalues of such (AᵀA) matrices are non-negative.
  – Since they are non-negative, we can write them in decreasing order as squares of non-negative real numbers: σ₁² ≥ … ≥ σₙ².
• For some index r (possibly n), the first r of these numbers are positive, whereas the rest are zero.
• Let x₁, …, xₙ be the corresponding eigenvectors of B, and set S₁ = [x₁, …, x_r].
• Let y₁ = (1/σ₁)Ax₁, …, y_r = (1/σ_r)Ax_r, and set S₂ = [y₁, …, y_r].
• We can show that A = S₂ΣS₁ᵀ.
• Σ is diagonal, and the values along its diagonal are σ₁, …, σ_r, which are called the singular values of A.
• If we denote S₂ by S and S₁ by U, we have A = SΣUᵀ.

Example (revisited)
d1–d5 and the query q as above.

Document-Term Matrix
(The matrix of term counts for d1–d5 is not reproduced here.)

Latent Concepts
• Latent Semantic Indexing (LSI) is a method for discovering hidden concepts in document data.
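The two decompositions above can be checked numerically. The following is a minimal NumPy sketch (the matrices A and M are arbitrary illustrative values, not from the slides): it verifies A = SΛSᵀ for a symmetric matrix, then builds an SVD exactly as the slides describe, from the eigenvectors of B = AᵀA and the vectors yᵢ = (1/σᵢ)Axᵢ.

```python
import numpy as np

# Symmetric case: eigendecomposition A = S Λ S^T (here S^-1 = S^T).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
eigvals, S = np.linalg.eigh(A)        # columns of S are eigenvectors
Lam = np.diag(eigvals)                # Λ: eigenvalues on the diagonal
assert np.allclose(A, S @ Lam @ S.T)

# SVD built as on the slides: B = A^T A, sigma_i = sqrt(eigenvalue_i),
# y_i = (1/sigma_i) A x_i, giving A = S2 Σ S1^T.
M = np.array([[3.0, 0.0],
              [4.0, 5.0],
              [0.0, 2.0]])            # m x n with m > n
B = M.T @ M
w, S1 = np.linalg.eigh(B)             # eigh returns ascending eigenvalues
order = np.argsort(w)[::-1]           # reorder so sigma_1 >= sigma_2 >= ...
w, S1 = w[order], S1[:, order]
sigma = np.sqrt(np.maximum(w, 0.0))   # singular values
S2 = (M @ S1) / sigma                 # column i is y_i = (1/sigma_i) M x_i
Sigma = np.diag(sigma)
assert np.allclose(M, S2 @ Sigma @ S1.T)
print(np.round(sigma, 4))
```

In practice one would call `np.linalg.svd` directly; the hand-built version is only meant to mirror the construction on the slides.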
• Each document and term (word) is then expressed as a vector with elements corresponding to these concepts.
  – Each element in a vector gives the degree of participation of the document or term in the corresponding concept.
• The goal is not to describe the concepts verbally, but to represent the documents and terms in a unified way that exposes
  – document–document,
  – document–term, and
  – term–term similarities, which are otherwise hidden.

Matrix
• Matrix A can be written: A = SΣUᵀ.
• Let's "neglect" the last three singular values of Σ as being too "small".
• Also, keep just two columns of S, obtaining S₂, and two rows of Uᵀ, obtaining U₂ᵀ (with Σ₂ the corresponding 2 × 2 block of Σ).
• Matrix A is then approximated as: A₂ = S₂Σ₂U₂ᵀ.
• In general: Aₖ = SₖΣₖUₖᵀ, where a good value for k is determined empirically.

Matrices Σ₂, S₂, U₂

Representing Documents, Terms, and Queries
• Represent documents by the column vectors of U₂ᵀ.
• Represent terms by the row vectors of S₂.
• Represent queries by the centroid vector of their terms.

Geometry
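The whole pipeline can be sketched in a few lines of NumPy. The term–document matrix below is my own reading of the five example sentences (the slides' actual matrix is not reproduced here), so the counts are an assumption; the steps, however, follow the slides: truncate the SVD to k = 2, take document vectors from the columns of U₂ᵀ and term vectors from the rows of S₂, represent the query as the centroid of its term vectors, and rank documents by cosine similarity.

```python
import numpy as np

# Hypothetical term-document matrix for d1..d5 (rows = terms, cols = docs),
# derived by hand from the five sentences -- an assumption, not the slides' data.
terms = ["romeo", "juliet", "happy", "dagger", "live", "die", "free", "new-hampshire"]
A = np.array([
    [1, 0, 1, 0, 0],   # romeo
    [1, 1, 0, 0, 0],   # juliet
    [0, 1, 0, 0, 0],   # happy
    [0, 1, 1, 0, 0],   # dagger
    [0, 0, 0, 1, 0],   # live
    [0, 0, 1, 1, 0],   # die
    [0, 0, 0, 1, 0],   # free
    [0, 0, 0, 1, 1],   # new-hampshire
], dtype=float)

k = 2
S, sigma, Ut = np.linalg.svd(A, full_matrices=False)
S2, Sigma2, U2t = S[:, :k], np.diag(sigma[:k]), Ut[:k, :]

docs = U2t.T            # row i: document d_{i+1} in 2-D concept space
term_vecs = S2          # row j: term j in 2-D concept space

# Query "die, dagger" as the centroid of its term vectors.
q = (term_vecs[terms.index("die")] + term_vecs[terms.index("dagger")]) / 2

# Rank documents by cosine similarity to the query.
cos = docs @ q / (np.linalg.norm(docs, axis=1) * np.linalg.norm(q))
ranking = np.argsort(cos)[::-1]
print([f"d{i + 1}" for i in ranking])
```

Note that in the reduced space a document can score above zero for a query even when it shares no terms with it, because the concepts tie related terms together; that is the "hidden similarity" the slides refer to.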