Dimensionality Reduction Part 2: Nonlinear Methods
Comp 790-090, Spring 2007
CS685: Special Topics in Data Mining, The University of Kentucky

Why Dimensionality Reduction
• Two approaches to reducing the number of features
  – Feature selection: select the salient features by some criterion
  – Feature extraction: obtain a reduced set of features by a transformation of all features
• Data visualization and exploratory data analysis also need reduced dimension
  – Usually reduce to 2-D or 3-D

Deficiencies of Linear Methods
• Data may not be best summarized by a linear combination of features
  – Example: PCA cannot discover the 1-D structure of a helix
(Figure: a one-dimensional helix curve embedded in 3-D.)

Intuition: how does your brain store these pictures?
(Figure: images of the same object under varying pose and lighting.)

Brain Representation
• Every pixel?
• Or perceptually meaningful structure?
  – Up-down pose
  – Left-right pose
  – Lighting direction
So your brain successfully reduced the high-dimensional inputs to an intrinsically 3-dimensional manifold!

Manifold Learning
• Discover low-dimensional representations (smooth manifolds) for data in high dimension.
(Figure: latent points y_i ∈ R^d, collected in Y, generate the observed points x_i ∈ R^N, collected in X.)
• Linear approaches (PCA, MDS)
• Nonlinear approaches (Isomap, LLE, others)

Linear Approach: PCA
• PCA finds linear subspace projections of the input data.

Linear Methods for Dimensionality Reduction
• PCA: rotate data so that the principal axes lie in the directions of maximum variance
• MDS: find coordinates that best preserve pairwise distances

Motivation
• Linear dimensionality reduction doesn't always work
• Data violates the underlying "linear" assumptions
  – Data is not accurately modeled by "affine" combinations of measurements
  – Structure of the data, while apparent, is not simple
  – In the end, linear methods do nothing more than "globally transform" (rotate, translate, and scale) all of the data; sometimes what's needed is to "unwrap" the data first

What does PCA Really Model?
• Principal Component Analysis assumptions:
• Mean-centered distribution
  – What if the mean itself is atypical?
• Eigenvectors of the covariance
  – Basis vectors aligned with successive directions of greatest variance
• Classic first-order statistical model
  – The distribution is characterized by its mean and variance (Gaussian hyperspheres)

Nonlinear Approaches: Isomap
(Joshua Tenenbaum, Vin de Silva, John Langford, 2000)
• Construct the neighborhood graph G.
• For each pair of points in G, compute the shortest-path distances: the geodesic distances.
• Run classical MDS with the geodesic distances.
(Figure: the Euclidean distance cuts across the manifold; the geodesic distance follows it.)

Sample points with Swiss Roll
• Altogether there are 20,000 points in the "Swiss roll" data set. We sample 1,000 out of the 20,000; a sketch of this step follows below.
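A minimal sketch of this sampling step, assuming scikit-learn's make_swiss_roll as a stand-in for the data set used in the slides (the random seed and variable names are illustrative, not from the original deck):

```python
# Generate a 20,000-point Swiss roll and sample 1,000 points from it.
import numpy as np
from sklearn.datasets import make_swiss_roll

X_full, t_full = make_swiss_roll(n_samples=20000, random_state=0)  # X_full: (20000, 3)

rng = np.random.default_rng(0)
idx = rng.choice(len(X_full), size=1000, replace=False)  # sample without replacement
X, t = X_full[idx], t_full[idx]  # t: position along the roll, useful for coloring plots
print(X.shape)  # (1000, 3)
```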
Construct neighborhood graph G
• k-nearest-neighbor graph (k = 7).
• D_G is the 1000 × 1000 matrix of (Euclidean) distances between neighboring points (Figure A).

Compute all-pairs shortest paths in G
• Now D_G is the 1000 × 1000 matrix of geodesic distances between arbitrary pairs of points along the manifold (Figure B).

Use MDS to embed graph in R^d
• Find a d-dimensional Euclidean embedding Y (Figure C) that preserves the pairwise distances.

Isomap
• Key observation: on a manifold, distances are measured using geodesic distances rather than Euclidean distances.
(Figure: two points on the roll with a small Euclidean distance but a large geodesic distance.)

Problem: How to Get Geodesics
• Without knowledge of the manifold it is difficult to compute the geodesic distance between points
• It is difficult even if you know the manifold
• Solution
  – Use a discrete geodesic approximation
  – Apply a graph algorithm to approximate the geodesic distances

Dijkstra's Algorithm
• Efficient solution to the all-pairs shortest-path problem (single-source Dijkstra run from every point)
• Greedy breadth-first algorithm

Isomap algorithm (a sketch of the full pipeline follows after this list)
– Compute a fully connected neighborhood of points for each item
  • Can be k nearest neighbors or an ε-ball
  • Neighborhoods must be symmetric
  • Test that the resulting graph is fully connected; if not, increase either k or ε
– Calculate the pairwise Euclidean distances within each neighborhood
– Use Dijkstra's algorithm to compute the shortest path from each point to its non-neighboring points
– Run MDS on the resulting distance matrix

Isomap Results
• Find a 2-D embedding of the 3-D S-curve (also shown for LLE)
• Isomap does a good job of preserving metric structure (not surprising)
• The affine structure is also well preserved

Residual Fitting Error
(Figure.)

Neighborhood Graph
(Figure.)

More Isomap Results
(Figures.)

Isomap Failures
• Isomap also has problems on closed manifolds of arbitrary topology

Local Linear Embeddings
• First insight: locally, at a fine enough scale, everything looks linear
• So: find the affine combination of the "neighborhood" about a point that best approximates it

Finding a Good Neighborhood
• This is the remaining "art" of nonlinear methods
• Common choices:
  – ε-ball: find all items that lie within an epsilon ball of the target item, as measured under some metric
    • Best if the density of items is high and every point has a sufficient number of neighbors
  – k-nearest neighbors: find the k closest neighbors to a point under some metric
    • Guarantees all items are similarly represented, even where the sampling density varies
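A minimal sketch of the Isomap pipeline just described (k-NN graph, all-pairs shortest paths via Dijkstra, classical MDS), assuming NumPy/SciPy/scikit-learn; k = 7 and d = 2 follow the Swiss-roll example, and the helper name `isomap` is illustrative:

```python
# Isomap: neighborhood graph -> geodesic distances -> classical MDS.
import numpy as np
from sklearn.neighbors import kneighbors_graph
from scipy.sparse.csgraph import shortest_path

def isomap(X, k=7, d=2):
    # 1. Neighborhood graph G, edges weighted by Euclidean distance.
    G = kneighbors_graph(X, n_neighbors=k, mode="distance")
    # Symmetrize: keep an edge if either endpoint lists the other as a neighbor.
    G = G.maximum(G.T)

    # 2. Geodesic distances: all-pairs shortest paths (Dijkstra from every source).
    D = shortest_path(G, method="D", directed=False)
    if np.isinf(D).any():
        raise ValueError("graph is not fully connected; increase k")

    # 3. Classical MDS on the geodesic distance matrix.
    n = len(D)
    J = np.eye(n) - np.ones((n, n)) / n     # centering matrix
    B = -0.5 * J @ (D ** 2) @ J             # double-centered squared distances
    w, V = np.linalg.eigh(B)                # eigenvalues in ascending order
    w, V = w[::-1][:d], V[:, ::-1][:, :d]   # keep the top-d eigenpairs
    return V * np.sqrt(np.maximum(w, 0))    # embedding Y, shape (n, d)
```

With `X` from the sampling sketch above, `Y = isomap(X, k=7, d=2)` should recover the roll's two-dimensional parameterization (the embedding of Figure C).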
Characteristics of a Manifold
(Figure: a manifold M in R^n; a point z on M has local coordinates x = (x1, x2) in R^2.)
• Locally the manifold is a linear patch
• Key: how to combine all the local patches together?

LLE: Intuition
• Assumption: the manifold is approximately "linear" when viewed locally, that is, in a small neighborhood
  – The approximation error, ε(W), can be made small
• Locality of the neighborhood is enforced by the constraint W_ij = 0 if z_j is not a neighbor of z_i
• A good projection should preserve this local geometric property as much as possible

LLE: Intuition
We expect each data point and its neighbors to lie on or close to a locally linear patch of the manifold. Each point can then be written as a linear combination of its neighbors, with the weights chosen to minimize the reconstruction error:
  ε(W) = Σ_i ‖ x_i − Σ_j W_ij x_j ‖²

LLE: Intuition
• The weights that minimize the reconstruction errors are invariant to rotation, rescaling, and translation of the data points.
  – Invariance to translation is enforced by adding the constraint that the weights sum to one: Σ_j W_ij = 1.
  – The weights characterize the intrinsic geometric properties of each neighborhood.
• The same weights that reconstruct the data points in D dimensions should reconstruct them on the manifold in d dimensions.
  – Local geometry is preserved.

LLE: Intuition
• Low-dimensional embedding: use the same weights from the original space (the i-th row of W holds point i's reconstruction weights) and minimize
  Φ(Y) = Σ_i ‖ y_i − Σ_j W_ij y_j ‖²

Local Linear Embedding (LLE)
• Assumption: the manifold is approximately "linear" when viewed locally, that is, in a small neighborhood
• The approximation error, ε(W), can be made small
• Meaning of W: a linear representation of every data point by its neighbors
  – This is an intrinsic geometric property of the manifold
• A good projection should preserve this geometric property as much as possible

Constrained Least-Squares Problem
Compute the optimal weights for each point individually:
  minimize ‖ x − Σ_j w_j η_j ‖²  subject to  Σ_j w_j = 1,
where the η_j are the neighbors of x and w_j is zero for all non-neighbors of x.

Finding a Map to a Lower-Dimensional Space
• y_i ∈ R^d: the projected vector for x_i
• The geometric property is best preserved if the error below is small:
  Φ(Y) = Σ_i ‖ y_i − Σ_j W_ij y_j ‖²  (use the same weights computed above)
• Y is given by the eigenvectors of the lowest d non-zero eigenvalues of the matrix M = (I − W)ᵀ(I − W)

Numerical Issues
• Numerical problems can arise in computing LLEs
• The least-squares covariance matrix that arises in computing the weight matrix W can be ill-conditioned
  – Regularization: add a small multiple of the identity, rescaled by the measurements, to the covariance matrix
• Finding small singular values (eigenvalues) is not as well conditioned as finding large ones; the small ones are subject to numerical precision errors and tend to get mixed
  – Good (but slow) solvers exist; you have to use them

Results
• The resulting parameter vector y_i gives the coordinates associated with the item x_i
• The d-th embedding coordinate is formed from the singular vector associated with the d-th smallest non-zero singular value of A = I − W
(A sketch of the full LLE computation follows below.)
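A minimal sketch of LLE as described in these slides: solve the constrained least-squares problem for W, with the regularization noted under "Numerical Issues", then take the bottom eigenvectors of M = (I − W)ᵀ(I − W). NumPy/scikit-learn are assumed, and the helper name `lle` and the regularization constant are illustrative:

```python
# LLE: reconstruction weights via regularized least squares, then
# embedding via the bottom eigenvectors of M = (I - W)^T (I - W).
import numpy as np
from sklearn.neighbors import NearestNeighbors

def lle(X, k=7, d=2, reg=1e-3):
    n = len(X)
    # k nearest neighbors of each point (column 0 is the point itself; drop it).
    nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X)
    idx = nbrs.kneighbors(X, return_distance=False)[:, 1:]

    W = np.zeros((n, n))
    for i in range(n):
        Z = X[idx[i]] - X[i]                 # neighbors centered on x_i
        C = Z @ Z.T                          # local covariance (Gram) matrix, k x k
        C += reg * np.trace(C) * np.eye(k)   # regularize: C may be ill-conditioned
        w = np.linalg.solve(C, np.ones(k))   # unnormalized optimal weights
        W[i, idx[i]] = w / w.sum()           # enforce the sum-to-one constraint

    # Embedding cost: Phi(Y) = sum_i || y_i - sum_j W_ij y_j ||^2.
    A = np.eye(n) - W
    vals, vecs = np.linalg.eigh(A.T @ A)     # eigenvalues in ascending order
    # Discard the bottom (constant) eigenvector and keep the next d.
    return vecs[:, 1:d + 1]
```

The bottommost eigenvector of M is the constant vector with eigenvalue ≈ 0, which is why the sketch discards it and keeps the next d eigenvectors as the embedding coordinates.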
Reprojection
• Often, for data analysis, a parameterization is enough
• For interpolation and compression we might want to map points from the parameter space back to the "original" space
• No perfect solution, but a few approximations:
  – Delaunay-triangulate the points in the embedding space, find the triangle that the desired parameter setting falls into, compute its barycentric coordinates, and use them as weights (see the sketch at the end of these notes)
  – Interpolate by using a radially symmetric kernel centered about the desired parameter setting

LLE Example
• 3-D S-curve manifold with points color-coded
• Compute a 2-D embedding
  – The local affine structure is well maintained
  – The metric structure is okay locally, but can drift slowly over the domain (this causes the manifold to taper)

More LLE Examples
(Figures.)

LLE Failures
• Does not work on closed manifolds
• Cannot recognize topology

Summary
• Nonlinear dimensionality reduction methods
  – These methods are considerably more powerful, and more temperamental, than linear methods
  – Applications of these methods are a hot area of research
• Comparisons
  – LLE is generally faster, but more brittle, than Isomap
  – Isomap tends to work better on smaller data sets
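A minimal sketch of the first reprojection approximation from the "Reprojection" slide, assuming SciPy's Delaunay triangulation; the helper name `reproject` is illustrative, with Y the embedding (e.g., from the LLE sketch above) and X the original data:

```python
# Map a query point in the embedding space back to the original data space
# by blending the corners of its containing Delaunay triangle.
import numpy as np
from scipy.spatial import Delaunay

def reproject(Y, X, query):
    tri = Delaunay(Y)                 # triangulate the embedded points
    s = tri.find_simplex(query)       # index of the triangle containing the query
    if s == -1:
        raise ValueError("query lies outside the triangulation")
    verts = tri.simplices[s]          # indices of the triangle's three corners
    # Barycentric coordinates of the query within simplex s.
    T = tri.transform[s]              # affine transform stored by SciPy
    b = T[:2] @ (query - T[2])
    w = np.append(b, 1.0 - b.sum())   # barycentric weights, summing to one
    return w @ X[verts]               # weighted blend of the original points
```

For example, `reproject(Y, X, Y.mean(axis=0))` blends the three original-space points whose embedded triangle contains the centroid of the embedding.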