Machine Learning
Marta Vomlelová, [email protected], KTIML, S303

Literature
● T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics. Springer, 2003 and later (available on the web), or
● Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani: An Introduction to Statistical Learning with Applications in R (2013).
● Some of the figures in this presentation are taken from "An Introduction to Statistical Learning, with applications in R" (Springer, 2013) with permission from the authors: G. James, D. Witten, T. Hastie and R. Tibshirani.

Further Reading
● I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques with Java Implementations. Academic Press, USA, 1999.
● P. Berka. Dobývání znalostí z databází [Knowledge Discovery in Databases]. Academia, 2003.
● T. Mitchell. Machine Learning. McGraw Hill, New York, 1997.
● S. Russell and P. Norvig. Artificial Intelligence: A Modern Approach. Prentice Hall, 2003.

Examples of Our Tasks
Create a model to:
● Estimate a robot's position from its last position and sensor readings.
● For a hospitalized patient, predict the risk of a new heart attack; we know demographic data and the patient's anamnesis.
● Classify e-mail as spam or not.
● Predict a stock price in 6 months from the current state of the company and economic data.

Typical Scenario
● We have a goal variable:
  – quantitative Y (stock price, robot position), or
  – categorical G (heart attack yes/no).
● We aim to predict it based on features X (= predictors, independent variables): sensor measurements, clinical tests.
● We have a training dataset where both the features and the goal variable are known.
● Based on the training set we create a model f̂ or ĝ.
● A good model predicts the goal with a small error.

Supervised / Unsupervised Learning
● The previous slide described supervised learning.
● In unsupervised learning no goal variable is given.
We are looking for clusters or dependencies.

Wage Dataset
● Features: age, year, education.
● Goal: wage (numeric).
● xy plots (blue line = model), boxplot.

Gene Expression Data
● 64 cell lines, 6830 measurements for each.
● Are there groups (clusters)?
● Right plot: symbols for the different cancer types.
● Cell lines of the same cancer type tend to be nearby.

Movies
● MovieLens, Netflix datasets.
● Users, movies, ratings.
● Tasks:
  – Predict the rating.
  – Predict the next selected movie.
  – Find interesting relations.

Scatterplot
● Three datasets N, X, T.
● Average rating of movies.
● Does it agree across datasets?

Models (for Regression)
● We (usually) assume that the goal variable Y depends on the predictors (X1, ..., Xp) through a fixed but unknown function f plus a random observation error ε:
  Y = f(X1, ..., Xp) + ε.

Why Estimate f: Prediction
● We estimate f for prediction and/or inference.
● Prediction: a black-box estimate Ŷ = f̂(X) (the error term averages to zero).
● The accuracy of Ŷ depends on the reducible and the irreducible error:
  E(Y − Ŷ)² = [f(X) − f̂(X)]² + Var(ε),
  where the first term is the reducible error and Var(ε) is the irreducible error.

Why Estimate f: Inference
● Inference tries to understand the relationship between the predictor(s) and Y.
● Example: Which media generate the biggest boost in sales?
● Simple and interpretable for inference: linear models, trees.
● Usually better predictions: non-linear models, random forests, ... .

How to Estimate f?
● We use training data.
● Two types of models: parametric and non-parametric.

Parametric Models
● Select a functional form (for example, linear).
● Estimate the parameters from the training data, usually by minimizing the (penalized) squared training error.
● Penalization tries to avoid overfitting, i.e. following the noise in the training data.

Non-parametric Models
● No explicit assumptions about the functional form of f. Examples:
● kNN (k-nearest neighbours),
● splines, thin-plate splines.

Prediction Accuracy versus Interpretability

Model Accuracy
● Most popular: mean squared error (MSE).
● The training error underestimates the true error,
● therefore we use test data.

Train vs. Test Error, Overfitting
● Black: true f; orange: linear model; green, blue: splines.
● The green spline shows overfitting:
● a less flexible model would have yielded a smaller test MSE.

Other Fitting Examples

The Bias-Variance Trade-Off
● E(test MSE) = variance + bias² + irreducible error.

Content
Chapter 2: Introduction to Supervised Learning
Chapter 3: Linear Regression
Chapter 4: Classification
Chapter 5: Resampling Methods, Model Assessment and Model Selection
Chapter 6: Linear Model Selection and Regularization
Chapter 7: Moving Beyond Linearity
Chapter 8: Tree-Based Methods, Boosting
Chapter 9: Support Vector Machines
Chapter 10: Unsupervised Learning
AND: EM Algorithm (ESL, Chapter 8)

Examination: Typical Questions
● Explain the term: overfitting, curse of dimensionality, Bayes decision boundary, maximum likelihood estimate, optimal separating hyperplane, ...
● Write the algorithm: AdaBoost, decision trees with pruning, EM, SVM, ...
● Additional questions to check your understanding and for the mark 1 (the best grade).
● The exam may also be conducted in Czech or Slovak.

R Code Demo
● The ISL book contains LAB sections.
● The R file can be downloaded from www.
● Neither the LABs nor the following demo are needed for the exam.
● They are just to show that IT IS EASY TO RUN; if you need it, you will manage it.

Curse of Dimensionality
● The nearest neighbour is quite far away in high dimensions.

Curse of Dimensionality 2

Curse of Dimensionality 3
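The curse-of-dimensionality claim above ("the nearest neighbour is quite far in high dimension") is easy to check numerically. The course LABs are in R; the following is only an illustrative Python sketch with made-up uniform data, not code from the book.

```python
# Illustrative sketch (not from the lecture): draw points uniformly in the
# unit cube [0,1]^p and measure the distance from the origin to its nearest
# neighbour. As the dimension p grows, even the *nearest* point is far away.
import numpy as np

def nn_distance(n_points, dim, rng):
    """Distance from the origin to the nearest of n_points uniform points in [0,1]^dim."""
    pts = rng.random((n_points, dim))
    return np.min(np.linalg.norm(pts, axis=1))

rng = np.random.default_rng(0)
for dim in (1, 10, 100):
    dists = [nn_distance(1000, dim, rng) for _ in range(50)]
    print(f"p = {dim:3d}: median nearest-neighbour distance = {np.median(dists):.3f}")
```

With 1000 points, the nearest neighbour in one dimension sits at a distance of about 0.001, while in 100 dimensions it is several units away: "local" neighbourhoods stop being local.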
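The train-vs-test picture discussed earlier (training error falls as the model gets more flexible, while test error eventually rises) can also be reproduced on synthetic data. Again, this is a hedged Python sketch with invented data, not the book's R LAB code.

```python
# Illustrative sketch: polynomial fits of increasing degree to y = sin(x) + noise.
# A more flexible model always achieves a training MSE at least as small
# (nested least-squares models), but its test MSE may be worse (overfitting).
import numpy as np

rng = np.random.default_rng(1)

def make_data(n):
    x = rng.uniform(0, 3, n)
    return x, np.sin(x) + rng.normal(0, 0.3, n)

x_train, y_train = make_data(30)
x_test, y_test = make_data(200)

for degree in (1, 3, 15):
    coef = np.polyfit(x_train, y_train, degree)   # least-squares polynomial fit
    train_mse = np.mean((np.polyval(coef, x_train) - y_train) ** 2)
    test_mse = np.mean((np.polyval(coef, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_mse:.3f}, test MSE {test_mse:.3f}")
```

Typically the degree-15 fit has the smallest training MSE of the three yet a worse test MSE than the moderate degree-3 fit, which is exactly the overfitting pattern from the spline figure.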
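Finally, the decomposition from the bias-variance slide, E(test MSE) = variance + bias² + irreducible error, can be checked by simulation: refit a (deliberately rigid) model on many fresh training sets and split its error at one query point. This generic sketch is my own illustration, not material from the book.

```python
# Illustrative sketch: bias-variance decomposition at a single point x0.
# Model: a degree-1 polynomial fit to y = sin(x) + eps, refit on many
# independent training sets; a linear model is too rigid for sin, so it is biased.
import numpy as np

rng = np.random.default_rng(2)
sigma = 0.3                      # noise std; sigma**2 is the irreducible error
x0, f0 = 2.0, np.sin(2.0)        # query point and the true value f(x0)

preds = []
for _ in range(2000):            # many independent training sets
    x = rng.uniform(0, 3, 30)
    y = np.sin(x) + rng.normal(0, sigma, 30)
    coef = np.polyfit(x, y, 1)
    preds.append(np.polyval(coef, x0))
preds = np.array(preds)

variance = np.var(preds)                    # Var f̂(x0)
bias2 = (np.mean(preds) - f0) ** 2          # (E f̂(x0) - f(x0))^2
mse_f = np.mean((preds - f0) ** 2)          # E (f̂(x0) - f(x0))^2
# Exact identity: E(f̂ - f)^2 = Var(f̂) + Bias^2; adding sigma^2 gives E(test MSE).
print(f"variance = {variance:.4f}, bias^2 = {bias2:.4f}, "
      f"sum = {variance + bias2:.4f}, mse = {mse_f:.4f}")
```

The identity mse = variance + bias² holds exactly for the simulated predictions (it is just the algebraic decomposition of a mean squared deviation); the irreducible term sigma² is added on top when a noisy test response Y is predicted instead of f(x0).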