Machine Learning
Marta Vomlelová
[email protected]
KTIML, S303
Literature
T. Hastie, R. Tibshirani, and J. Friedman:
The Elements of Statistical Learning: Data Mining, Inference and Prediction.
Springer Series in Statistics. Springer (2003 and later; available on the web)
or
Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani:
An Introduction to Statistical Learning with Applications in R (2013)
Some of the figures in this presentation are taken from
"An Introduction to Statistical Learning, with applications in R" (Springer, 2013)
with permission from the authors: G. James, D. Witten, T. Hastie and R. Tibshirani
Further Reading
I.H.Witten and E.Frank. Data Mining - Practical
machine learning tools and techniques with Java
implementation. Accademic Press Pub., USA,
1999.
P. Berka.Dobývání znalostí z databází. Academia,
2003.
T. Mitchell. Machine Learning. McGraw Hill, New
York, 1997.
S. Russel and P. Norwig. Artificial Intelligence: A
Modern Approach. Prentice Hall, 2003.
Examples of our tasks
Create a model to:
● Estimate the robot's position from its last position and sensor readings.
● For a hospitalized patient, predict the risk of a new heart attack, given demographic data and the patient's medical history.
● Classify e-mail as spam or not spam.
● Predict the stock price in 6 months from the current state of the company and economic data.
Typical Scenario
● We have a goal variable (cílová veličina):
  – quantitative Y – stock price, robot position, or
  – categorical G – heart attack yes/no.
● We aim to predict it based on features (příznaky) X (= predictors, independent variables): sensor measurements, clinical tests.
● We have a training dataset where both the features and the goal variable are known.
● Based on the training set we create a model f̂, ĝ (a regression workflow is sketched in R below).
● A good model predicts the goal with a small error.
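A minimal sketch of this workflow in R on simulated data (all names and numbers below are made up for illustration):
# Typical scenario on simulated data: features X, goal variable Y,
# training set, model f-hat, prediction error.
set.seed(1)
n  <- 200
x1 <- rnorm(n); x2 <- rnorm(n)           # features X
y  <- 2 * x1 - x2 + rnorm(n, sd = 0.5)   # quantitative goal variable Y
d  <- data.frame(y, x1, x2)
train <- sample(n, n / 2)                # training set indices
fhat  <- lm(y ~ x1 + x2, data = d, subset = train)   # model f-hat
pred  <- predict(fhat, newdata = d[-train, ])        # predict unseen cases
mean((y[-train] - pred)^2)               # small error = good model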
Supervised / Unsupervised Learning
(Učení s učitelem / bez učitele)
● The previous slide described supervised learning.
● In unsupervised learning no goal variable is given. We are looking for clusters or dependencies (see the k-means sketch below).
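A minimal unsupervised sketch in base R on simulated two-group data: k-means finds clusters without any goal variable.
# Unsupervised learning: no goal variable, we look for clusters.
set.seed(1)
X <- rbind(matrix(rnorm(100), ncol = 2),            # cluster around 0
           matrix(rnorm(100, mean = 3), ncol = 2))  # cluster around 3
km <- kmeans(X, centers = 2, nstart = 20)
table(km$cluster)           # sizes of the discovered clusters
plot(X, col = km$cluster)   # points coloured by cluster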
Wage dataset
● Features: age, year, education
● Goal: wage, numeric
● Figures: xy plots (blue line = model), boxplot
Gene Expression Data
● 64 cell lines, for each 6830 measurements.
● Are there groups – clusters?
● Right plot: symbols for different cancer types.
● Cell lines of the same cancer type tend to be nearby.
Movies
● Movielens, Netflix datasets.
● Users, Movies, Ratings.
● Tasks:
  – Predict the rating.
  – Predict the next selected movie.
  – Find interesting relations.
Scatterplot
● Three datasets: N, X, T.
● Average rating of movies.
● Does it agree across the datasets?
Models (for regression)
● We (usually) assume that the goal variable Y depends on the features (X1, ..., Xp) through a fixed but unknown function plus a random observation error:
  Y = f(X1, ..., Xp) + ε,  with E[ε] = 0.
● A simulation of this setup is sketched in R below.
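A sketch of this model in R with a known f chosen only for illustration (in practice f is unknown and must be estimated):
# The assumed model Y = f(X) + eps, simulated with a known f.
set.seed(1)
f   <- function(x) sin(2 * x)     # fixed but (normally) unknown function
x   <- runif(100, 0, 3)
eps <- rnorm(100, sd = 0.3)       # random observation error, E[eps] = 0
y   <- f(x) + eps
plot(x, y); curve(f, add = TRUE)  # data scatter around the true f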
Why estimate f: Prediction
● We estimate f for prediction and/or inference.
● Prediction: a black-box estimate Ŷ = f̂(X) (the error term averages to zero).
● The accuracy of Ŷ depends on:
  – the reducible error [f(X) − f̂(X)]², and
  – the irreducible error Var(ε):
  E[(Y − Ŷ)²] = [f(X) − f̂(X)]² + Var(ε).
● A simulation of the irreducible error is sketched below.
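A short simulation reusing the setup above: even the true f cannot predict better than Var(ε).
# Even a perfect estimate fhat = f leaves the irreducible error:
# the test MSE of the true f is about Var(eps) = 0.3^2 = 0.09.
set.seed(2)
x <- runif(10000, 0, 3)
y <- sin(2 * x) + rnorm(10000, sd = 0.3)
mean((y - sin(2 * x))^2)   # close to 0.09, cannot be reduced further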
Why estimate f: Inference
● Inference tries to understand the relationship between the predictor(s) and Y.
● Example: Which media generate the biggest boost in sales? (A sketch in R follows.)
● Simple and interpretable inference: linear models, trees.
● Usually better predictions: non-linear models, random forests, ... .
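An inference sketch in R on simulated advertising-style data (a stand-in for the ISL Advertising example; the true effects below are invented):
# Inference: which channel boosts sales most? Read the fitted
# coefficients and their standard errors, not just the predictions.
set.seed(1)
n     <- 100
TV    <- runif(n, 0, 300)
radio <- runif(n, 0, 50)
sales <- 5 + 0.05 * TV + 0.2 * radio + rnorm(n)   # invented true effects
fit <- lm(sales ~ TV + radio)
summary(fit)$coefficients   # estimated effect of each medium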
How to estimate f?
● We use training data.
● Two types of models: parametric, non-parametric.
Parametric models:
● Select a functional form (for example, linear).
● Estimate the parameters from the training data,
  – usually by minimizing the (penalized) squared training error.
● Penalization tries to avoid overfitting, i.e. following the noise in the training data. (A least-squares and ridge sketch follows.)
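A minimal parametric sketch: ordinary least squares with lm(), and a ridge-penalized fit via MASS::lm.ridge (the lambda value is arbitrary):
# Parametric model: choose a linear form, estimate the parameters by
# (penalized) least squares.
library(MASS)                 # for lm.ridge
set.seed(1)
x <- rnorm(50); y <- 1 + 2 * x + rnorm(50)
d <- data.frame(x, y)
coef(lm(y ~ x, data = d))                    # plain squared-error fit
coef(lm.ridge(y ~ x, data = d, lambda = 10)) # L2 penalty shrinks the slope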
Non-parametric models
● Without explicit assumptions about the functional form of f.
Examples (sketched in R below):
● kNN, k-nearest neighbours
● spline, thin-plate spline
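Two non-parametric sketches in R: a hand-rolled kNN regression (to show that no form is assumed for f) and a base-R smoothing spline.
# Non-parametric fits: no functional form assumed for f.
set.seed(1)
x <- runif(100, 0, 3); y <- sin(2 * x) + rnorm(100, sd = 0.3)

knn_reg <- function(x0, x, y, k = 5) {
  # average the y-values of the k training points nearest to x0
  mean(y[order(abs(x - x0))[1:k]])
}
knn_reg(1.5, x, y)            # kNN prediction at x = 1.5

fit <- smooth.spline(x, y)    # smoothing spline, flexibility chosen by CV
predict(fit, x = 1.5)$y       # spline prediction at the same point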
Prediction Accuracy versus Interpretability
Model Accuracy
● Most popular measure: the mean squared error (MSE).
● The training error underestimates the true error,
● therefore we use test data. (A one-line MSE helper is sketched below.)
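MSE as a one-line R helper (a sketch of the definition, not taken from the book's labs):
# Mean squared error: average squared gap between the observed y and
# the model's predictions yhat.
mse <- function(y, yhat) mean((y - yhat)^2)
mse(c(1, 2, 3), c(1.1, 1.9, 3.2))   # small deviations -> small MSE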
Train vs. test error, Overfitting
● black: true f; orange: linear model; green, blue: splines.
● The green spline overfits:
● a less flexible model would yield a smaller test MSE. (A degree sweep illustrating this follows.)
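A degree sweep in R on simulated data (the degrees are arbitrary): training MSE keeps falling with flexibility while test MSE turns back up.
# Overfitting: training MSE falls monotonically with flexibility,
# test MSE is U-shaped.
set.seed(1)
x <- runif(200, 0, 3); y <- sin(2 * x) + rnorm(200, sd = 0.3)
d <- data.frame(x, y); train <- 1:100
for (deg in c(1, 3, 10, 15)) {
  fit  <- lm(y ~ poly(x, deg), data = d, subset = train)
  pred <- predict(fit, newdata = d)
  cat(sprintf("degree %2d  train MSE %.3f  test MSE %.3f\n", deg,
              mean((y[train] - pred[train])^2),
              mean((y[-train] - pred[-train])^2)))
}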
Other Fitting Examples
The Bias-Variance Trade-Off
● E(test MSE) = variance of f̂(x0) + [bias of f̂(x0)]² + irreducible error Var(ε). (A simulation checking this decomposition follows.)
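A simulation checking the decomposition at a fixed point x0 (the setup is invented for illustration):
# Bias-variance check: refit a cubic polynomial 2000 times and compare
# variance, bias^2 and Var(eps) at x0 with the decomposition above.
set.seed(1)
x0 <- 1.5; f <- function(x) sin(2 * x); sd_eps <- 0.3
preds <- replicate(2000, {
  x <- runif(50, 0, 3)
  y <- f(x) + rnorm(50, sd = sd_eps)
  predict(lm(y ~ poly(x, 3)), newdata = data.frame(x = x0))
})
var(preds)                 # variance of f-hat(x0)
(mean(preds) - f(x0))^2    # squared bias at x0
sd_eps^2                   # irreducible error Var(eps)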
Content
Chapter 2: Introduction to Supervised Learning
Chapter 3: Linear Regression
Chapter 4: Classification
Chapter 5: Resampling Methods,
Model Assessment and Model Selection
Chapter 6: Linear Model Selection and
Regularization
Chapter 7: Moving Beyond Linearity
Chapter 8: Tree-Based Methods, Boosting
Chapter 9: Support Vector Machines
Chapter 10: Unsupervised Learning
AND: EM Algorithm (ESL, Chapter 8)
Examination: Typical Questions
● Explain the term: overfitting, curse of dimensionality, Bayes decision boundary, maximum likelihood estimate, optimal separating hyperplane, ...
● Write the algorithm: AdaBoost, decision trees with pruning, EM, SVM, ...
● Additional questions to check your understanding and for the best grade (1).
● The exam can also be taken in Czech or Slovak.
R code demo
● The ISL book contains LAB sections.
● The R file can be downloaded from the web.
● Neither the LABs nor the following demo are needed for the exam.
● They are just to show that IT IS EASY TO RUN;
  – if you need it, you will manage it.
Curse of Dimensionality
● In high dimensions, even the nearest neighbour is quite far away. (A simulation follows.)
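A small simulation in R (100 uniform points; the dimensions chosen are arbitrary):
# Curse of dimensionality: the median distance to the nearest
# neighbour grows quickly with the dimension p.
set.seed(1)
for (p in c(1, 2, 10, 50)) {
  X <- matrix(runif(100 * p), ncol = p)
  D <- as.matrix(dist(X)); diag(D) <- Inf   # drop self-distances
  cat(sprintf("p = %2d  median NN distance %.3f\n",
              p, median(apply(D, 1, min))))
}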
Curse of Dimensionality 2
Curse of Dimensionality 3