Download Machine Learning ICS 273A

Machine Learning ICS 178
Instructor: Max Welling
visualization & k nearest neighbors

Types of Learning
• Supervised learning
  • Labels are provided, so there is a strong learning signal.
  • e.g. classification, regression.
• Semi-supervised learning
  • Only part of the data have labels.
  • e.g. a child growing up.
• Reinforcement learning
  • The learning signal is a (scalar) reward and may come with a delay.
  • e.g. trying to learn to play chess, a mouse in a maze.
• Unsupervised learning
  • There is no direct learning signal. We are simply trying to find structure in data.
  • e.g. clustering, dimensionality reduction.

Ingredients
• Data: what kind of data do we have?
• Prior assumptions: what do we know a priori about the problem?
• Representation: how do we represent the data?
• Model / hypothesis space: what hypotheses are we willing to entertain to explain the data?
• Feedback / learning signal: what kind of learning signal do we have (delayed, labels)?
• Learning algorithm: how do we update the model (or set of hypotheses) from feedback?
• Evaluation: how well did we do, and should we change the model?

Data Preprocessing
• Before you start modeling the data, you want to have a look at it to get a “feel”.
• What are the “modalities” of the data? e.g.
  • Netflix: users and movies
  • Text: word-tokens and documents
  • Video: pixels, frames, color-index (R,G,B)
• What is the domain?
  • Netflix: rating-values [1,2,3,4,5,?]
  • Text: # times a word appears: [0,1,2,3,...]
  • Video: brightness value: [0,..,255] or real-valued.
• Are there missing data-entries?
• Are there outliers in the data? (perhaps a typo?)

Data Preprocessing
• Often it is a good idea to compute the mean and variance of the data.

  mean: $E[X_i] = \frac{1}{N} \sum_{n=1}^{N} X_{in}$

  variance: $VAR[X_i] = \frac{1}{N} \sum_{n=1}^{N} \left( X_{in} - E[X_i] \right)^2$

  standard deviation: $STD[X_i] = \sqrt{VAR[X_i]}$

• Mean gives you a sense of location, Variance/STD a sense of scale.
• Better even is to histogram the data.
• Tricky issue: how do you choose the bin-size? Too small: you see noise. Too big: it’s one clump.

Preprocessing
• For Netflix you can histogram this for both modalities:
  • The rating distribution over users for a movie.
  • The rating distribution over movies for a user.
  • The rating distribution over users for all movies jointly.
  • The rating distribution over all movies for all users jointly.
• You can compute properties and plot them against each other. For example:
  • Compute the user-specific mean and variance over movies and plot a scatter plot: every dot is a different user.
  [Figure: scatter plot with axes user-mean and user-variance]

Scatter-Plots
This shows all the 2-D projections of the “Iris data”. Color indicates the class of iris. How many attributes do we have for Iris?

3-D visualization
[Figures: contour plot, meshgrid plot]

Embeddings
• Every red dot represents an image.
• An image has +/- 1000 pixels.
• Each image is projected to a 2-D space.
• Projections are such that similar images are projected to similar locations in the 2-D embedding.
• This gives us an idea how the data is organized.
These plots are produced by “locally linear embedding”: http://www.cs.toronto.edu/~roweis/lle/

Embeddings
[Figure]

Visualization by Clustering
By performing a clustering of the data and looking at the cluster-prototypes you can get an idea of the type of data.

Preprocessing
• Often it is useful to “standardize” (or “whiten”) the data before you start modeling.
• The idea is to remove the mean and rescale to unit variance so that your algorithm can focus on more sophisticated (higher order) structure.

  1) $X_{in} \leftarrow X_{in} - E[X_i]$
  2) $X_{in} \leftarrow X_{in} / STD[X_i]$

• In that order!

Be Creative!

WEKA DEMO
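
The mean, variance, standard deviation, and the two standardization steps from the slides can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the course: the function names and the `ratings` values are made up, a single attribute is treated as a plain list of numbers, and the 1/N (population) normalisation is assumed because that is what the slide formulas use.

```python
import math


def mean(values):
    """E[X_i]: the average of the N observations of one attribute."""
    return sum(values) / len(values)


def variance(values):
    """VAR[X_i] with 1/N normalisation, matching the slide formula."""
    mu = mean(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def std(values):
    """STD[X_i]: the square root of the variance."""
    return math.sqrt(variance(values))


def standardize(values):
    """Subtract the mean first, then divide by the standard deviation."""
    mu = mean(values)
    sigma = std(values)
    if sigma == 0:
        raise ValueError("cannot standardize a constant attribute")
    centered = [x - mu for x in values]      # step 1: remove the mean
    return [x / sigma for x in centered]     # step 2: rescale to unit variance


# Hypothetical ratings for one attribute (illustration only).
ratings = [1.0, 2.0, 3.0, 4.0, 5.0]
z = standardize(ratings)
print(mean(ratings), variance(ratings), std(ratings))
print(z)
```

After standardization the attribute has mean 0 and variance 1 (up to floating-point error), which is why the order of the two steps matters: dividing by the standard deviation before subtracting the mean would leave a non-zero mean behind.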
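
The bin-size issue raised for histograms can be explored with a small experiment. The sketch below is an assumption-laden illustration rather than anything from the course: the `histogram` helper is hypothetical, the data are synthetic Gaussian samples, and equal-width bins are assumed. It counts the same samples with very few, a moderate number, and very many bins so the "one clump" and "noise" regimes can be compared.

```python
import random


def histogram(values, num_bins, lo, hi):
    """Count values in num_bins equal-width bins covering [lo, hi]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    width = (hi - lo) / num_bins
    counts = [0] * num_bins
    for x in values:
        if lo <= x <= hi:
            # The right edge (x == hi) is placed in the last bin.
            idx = min(int((x - lo) / width), num_bins - 1)
            counts[idx] += 1
    return counts


random.seed(0)  # fixed seed so repeated runs give the same samples
samples = [random.gauss(0.0, 1.0) for _ in range(500)]  # synthetic data
lo, hi = min(samples), max(samples)

for num_bins in (2, 10, 100):
    counts = histogram(samples, num_bins, lo, hi)
    # Few bins hide the shape; many bins leave only a handful of
    # samples per bin, so the counts fluctuate strongly.
    print(num_bins, "bins, largest count:", max(counts),
          "empty bins:", counts.count(0))
```

With 500 samples, 100 bins average only five samples per bin, so neighbouring counts jump around, while two bins say little beyond "most of the data is in the middle".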
