Latent Semantic Indexing
(mapping onto a smaller space of latent concepts)

Paolo Ferragina
Dipartimento di Informatica
Università di Pisa

Reading 18

Speeding up cosine computation
- What if we could take our vectors and "pack" them into fewer dimensions (say 50,000 → 100) while preserving distances?
- Now: O(nm) to compute cos(d,q) for all docs d
- Then: O(km + kn), where k << n, m
- Two methods: "Latent Semantic Indexing" and Random Projection

Briefly
- LSI is data-dependent
  - Create a k-dim subspace by eliminating redundant axes
  - Pull together "related" axes: hopefully car and automobile
  - What about polysemy?
- Random Projection is data-independent
  - Choose a k-dim subspace that guarantees good stretching properties with high probability between any pair of points

Notions from linear algebra
- Matrix A, vector v
- Matrix transpose (A^t)
- Matrix product
- Rank
- Eigenvalues λ and eigenvectors v: Av = λv

Overview of LSI
- Pre-process docs using a technique from linear algebra called Singular Value Decomposition
- Create a new (smaller) vector space
- Queries handled (faster) in this new space

Singular-Value Decomposition
- Recall the m × n matrix of terms × docs, A. A has rank r ≤ m, n
- Define the term-term correlation matrix T = A A^t
  - T is a square, symmetric m × m matrix
  - Let P be the m × r matrix of eigenvectors of T
- Define the doc-doc correlation matrix D = A^t A
  - D is a square, symmetric n × n matrix
  - Let R be the n × r matrix of eigenvectors of D

A's decomposition
- Given P (for T, m × r) and R (for D, n × r), formed by orthonormal columns (unit dot-product), it turns out that
  A = P S R^t
  where S is an r × r diagonal matrix holding the singular values of A (the square roots of the eigenvalues of T = A A^t) in decreasing order:
  A (m × n) = P (m × r) · S (r × r) · R^t (r × n)

Dimensionality reduction
- For some k << r, zero out all but the k biggest singular values in S [the choice of k is crucial]
- Denote by S_k this new version of S, having rank k
- Typically k is about 100, while r (A's rank) is > 10,000
- A_k = P (m × k) · S_k (k × k) · R^t (k × n): the columns of P and the rows of R^t beyond the k-th are useless, due to the 0-columns/0-rows of S_k

Guarantee
- A_k is a pretty good approximation to A:
  - Relative distances are (approximately) preserved
  - Of all m × n matrices of rank k, A_k is the best approximation to A w.r.t. the following measures:
    - min_{B, rank(B)=k} ||A − B||_2 = ||A − A_k||_2 = σ_{k+1}
    - min_{B, rank(B)=k} ||A − B||_F^2 = ||A − A_k||_F^2 = σ_{k+1}^2 + σ_{k+2}^2 + ... + σ_r^2
  - where the Frobenius norm is ||A||_F^2 = σ_1^2 + σ_2^2 + ... + σ_r^2

Reduction
- X_k = S_k R^t is the doc-matrix, k × n, hence reduced to k dimensions (R, P are formed by orthonormal eigenvectors of the matrices D, T)
- Since we are interested in doc/query correlation, we consider
  D = A^t A = (P S R^t)^t (P S R^t) = (S R^t)^t (S R^t)
- Approximating S with S_k, we get A^t A ≈ X_k^t X_k (both are n × n matrices)
- We use X_k to define how to project A and q:
  X_k = S_k R^t; substituting R^t = S^{-1} P^t A, we get X_k = S_k S^{-1} P^t A = P_k^t A
  (in fact S_k S^{-1} P^t = P_k^t, which is a k × m matrix)
- This means that to reduce a doc/query vector it is enough to multiply it by P_k^t, thus paying O(km) per doc/query
- Cost of sim(q,d), for all d, is O(kn + km) instead of O(mn)

Which are the concepts?
- The c-th concept = the c-th row of P_k^t (which is k × m)
- Denote it by P_k^t[c], whose size is m = #terms
- P_k^t[c][i] = strength of association between the c-th concept and the i-th term
- Projected document: d'_j = P_k^t d_j, where d'_j[c] = strength of concept c in d_j
- Projected query: q' = P_k^t q, where q'[c] = strength of concept c in q
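The decomposition and the approximation guarantee above can be checked numerically. The following is a small sketch (not part of the original slides) using numpy on a toy random term-document matrix; the sizes m, n, k and the Poisson-distributed counts are arbitrary illustrative choices.

```python
import numpy as np

# Toy term-document matrix A: m terms x n docs with random counts.
m, n, k = 1000, 200, 20
rng = np.random.default_rng(0)
A = rng.poisson(0.3, size=(m, n)).astype(float)

# SVD: A = P @ diag(s) @ Rt, with singular values s in decreasing order.
P, s, Rt = np.linalg.svd(A, full_matrices=False)

# Rank-k approximation A_k: keep only the k largest singular values.
A_k = P[:, :k] @ np.diag(s[:k]) @ Rt[:k, :]

# Guarantee slide: the spectral error equals sigma_{k+1}, and the squared
# Frobenius error equals the sum of the squared discarded singular values.
print(np.linalg.norm(A - A_k, 2), s[k])
print(np.linalg.norm(A - A_k, 'fro') ** 2, np.sum(s[k:] ** 2))
```

Up to floating-point error, the two quantities printed on each line coincide, matching the Guarantee slide.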
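Along the same lines, here is a minimal sketch of the Reduction step: projecting documents and a query with P_k^t and scoring by cosine in the k-dimensional concept space. Again the data is a toy random matrix and the variable names simply mirror the slides' symbols.

```python
import numpy as np

# Toy term-document matrix, same shape as in the previous sketch.
m, n, k = 1000, 200, 20
rng = np.random.default_rng(0)
A = rng.poisson(0.3, size=(m, n)).astype(float)

P, s, Rt = np.linalg.svd(A, full_matrices=False)
Pk_t = P[:, :k].T                      # k x m projection matrix P_k^t
X_k = Pk_t @ A                         # k x n reduced doc matrix (equals S_k R^t)

q = rng.poisson(0.3, size=m).astype(float)   # a query vector over the m terms
q_k = Pk_t @ q                         # O(km) projection of the query

# Cosine of q against all n docs in concept space: O(kn) instead of O(mn).
scores = (q_k @ X_k) / (np.linalg.norm(q_k) * np.linalg.norm(X_k, axis=0) + 1e-12)
top = np.argsort(-scores)[:5]          # top-5 documents by similarity
print(top, scores[top])
```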
Random Projections

Paolo Ferragina
Dipartimento di Informatica
Università di Pisa

Slides only!

An interesting math result
- Lemma (Johnson-Lindenstrauss, '82). Let P be a set of n distinct points in m dimensions. Given ε > 0, there exists a function f : P → IR^k such that for every pair of points u, v in P it holds
  (1 − ε) ||u − v||^2 ≤ ||f(u) − f(v)||^2 ≤ (1 + ε) ||u − v||^2
  where k = O(ε^-2 log n)
- f() is called a JL-embedding
- Setting v = 0 we also get a bound on f(u)'s stretching!

What about the cosine-distance?
- Expand ||f(u) − f(v)||^2 = ||f(u)||^2 + ||f(v)||^2 − 2 f(u)·f(v): using the bounds on f(u)'s and f(v)'s stretching, and substituting the formula above for ||u − v||^2, one also gets a bound on the dot product f(u)·f(v), and hence on the cosine.

How to compute a JL-embedding?
- Set R = (r_{i,j}) to be a random m × k matrix whose components are independent random variables with E[r_{i,j}] = 0 and Var[r_{i,j}] = 1 (e.g. Gaussian N(0,1), or ±1 with equal probability); then f(u) = u R / √k is a JL-embedding with high probability.

Finally...
- Random projections hide large constants: k ≈ (1/ε)^2 · log n, so it may be large… but it is simple and fast to compute.
- LSI is intuitive and may scale to any k; it is optimal under various metrics but costly to compute (though good libraries now exist).
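A quick way to see the lemma and the construction above in action: the sketch below (not from the original slides) draws a Gaussian random matrix with zero-mean, unit-variance entries, projects a point set, and checks the distortion of pairwise squared distances. The constant 8 in the choice of k and the toy sizes are illustrative assumptions, not values prescribed by the slides.

```python
import numpy as np

# JL-style random projection: Gaussian entries with E = 0, Var = 1,
# scaled by 1/sqrt(k) so squared distances are preserved in expectation.
m, n_points, eps = 10_000, 500, 0.2
k = int(np.ceil(8 * np.log(n_points) / eps ** 2))   # k = O(eps^-2 log n); constant 8 is illustrative

rng = np.random.default_rng(1)
pts = rng.normal(size=(n_points, m))                 # n points in m dimensions
R = rng.normal(size=(m, k))                          # random m x k matrix
proj = pts @ R / np.sqrt(k)                          # projected points, n x k

# Check the distortion on some random pairs: the ratio of squared distances
# should (with high probability) lie within [1 - eps, 1 + eps].
i, j = rng.integers(n_points, size=(2, 50))
keep = i != j                                        # skip degenerate pairs
orig = np.linalg.norm(pts[i[keep]] - pts[j[keep]], axis=1) ** 2
new = np.linalg.norm(proj[i[keep]] - proj[j[keep]], axis=1) ** 2
print((new / orig).min(), (new / orig).max())
```

Note how the projection is data-independent: R is drawn without ever looking at the point set, in contrast with the SVD-based construction of LSI.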