Center for Complex Engineering Systems
Introduction to Applied Machine Learning
Abdullah Almaatouq ([email protected])
Abdulaziz Alhassan ([email protected])
Ahmad Alabdulkareem ([email protected])

Tutorial Outline
•  What is machine learning
•  How to learn
•  How to learn well
•  Practical session

Examples of Machine Learning

What is Machine Learning
•  Giving computers the ability to learn without being explicitly programmed. – Arthur Samuel (1959)
•  Machine learning is a computer program that improves its performance at a certain task with experience. – Tom Mitchell (1998)

What is Machine Learning
•  Making predictions based on patterns learned from historical data.
•  Not to be confused with data mining
   –  Data mining focuses on exploiting patterns in historical data.
•  New field?
   –  Statistical generalization
   –  Computational statistics

Why Machine Learning?
•  Huge amounts of data!
•  Humans are expensive, computers are cheap.
   –  88 ECU, 32 CPU and 244 GB RAM ≈ $3.50/h
   –  The minimum worker wage in the US is $7.25/h
•  Rule-based systems are (often):
   –  Difficult to use to capture all relevant patterns
   –  Unable to cover patterns you don't know exist (you can't add rules for them)
   –  Very hard to maintain
   –  (often) not working well
•  Psychic superpowers (sometimes!)

When to Use Machine Learning
•  There is a pattern to be detected
•  You have data
•  You can NOT pin it down mathematically [Figure: distance vs. time plot of d = r × t]

Types of Problems
•  Supervised
   –  Training on seen, labeled examples to generalize to unseen new observations.
   –  Given some input xi, predict the class/value of yi
•  Unsupervised
   –  Find hidden structures in unlabeled data

Types of Problems
Machine learning problems divide into Supervised (Regression, Classification) and Unsupervised (Clustering, Outlier Detection).

Supervised Learning
•  Classification
   –  Binary: given xi, find yi in {-1, 1}
   –  Multicategory: given xi, find yi in {1, …, K}
•  Regression
   –  Given xi, find yi in R (or R^d)
•  Examples: spam vs. not spam, images to digits, beak depth prediction

Supervised Learning Framework
An unknown function f: x → y generates the training examples (x1, y1), …, (xN, yN). From a hypothesis set, guided by an error function and optimization routines, machine learning chooses and produces a known function g ≈ f; for a new observation x, it predicts the value/category y.

Linear Regression
Linear form: y = ax + b
•  Capture the collective behavior of credit officers
•  Predict the credit line for new customers

Input x:
   Bias                1
   Age                 23 years
   Annual salary       $144,000
   Years in residence  4 years
   Years in job        2 years
   …                   …
x = [1, 23, 144000, 4, 2, …]

h(x) = θ1 x1 + … + θd xd + θ0 x0
Linear regression output: h(x) = Σ_{i=0}^{d} θi xi = θᵀx

Linear Regression
•  The historical dataset: (x1, y1), (x2, y2), …, (xN, yN)
•  yn ∈ R is the credit line for customer xn
•  How well does our h(x) = θᵀx approximate f(x)?
   –  Squared error: (h(x) − f(x))²
   –  E(h) = (1/N) Σ_{n=1}^{N} (h(xn) − yn)²
•  The goal is to choose the h that minimizes E(h)

Linear Regression
[Figure: illustration of a 2-D regression line and a 3-D regression plane; it works the same way in higher dimensions]

Setting up the problem
E(h) = (1/N) Σ_{n=1}^{N} (h(xn) − yn)²
Stack the observations as the rows of a matrix, X = [x1ᵀ; x2ᵀ; …; xNᵀ], so that the rows of X are the observations and the columns are the features. Then
E(θ) = (1/N) Σ_{n=1}^{N} (θᵀxn − yn)² = (1/N) ‖Xθ − y‖²
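To make the vectorized form concrete, here is a minimal numpy sketch; the design matrix, targets, and θ are made-up toy values, not from the slides:

```python
import numpy as np

# Toy design matrix: 4 observations (rows) x 3 features (columns),
# with the bias x0 = 1 as the first column
X = np.array([[1.0, 23.0, 144.0],
              [1.0, 31.0, 250.0],
              [1.0, 45.0, 180.0],
              [1.0, 27.0,  95.0]])
y = np.array([12.0, 30.0, 20.0, 8.0])   # made-up credit lines

theta = np.array([1.0, 0.1, 0.05])      # some candidate parameters

# E(theta) = (1/N) * ||X theta - y||^2
N = len(y)
E = np.sum((X @ theta - y) ** 2) / N
print(E)
```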
Minimizing E(h)
E(θ) = (1/N) ‖Xθ − y‖²
∇E(θ) = (2/N) Xᵀ(Xθ − y)
Setting the gradient to zero:
(2/N) Xᵀ(Xθ − y) = 0
(2/N) (XᵀXθ − Xᵀy) = 0
θ = (XᵀX)⁻¹ Xᵀy
Python: theta = linalg.inv(X.T*X)*X.T*y
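A runnable numpy sketch of the one-liner above, on the same toy data; np.linalg.pinv is used, anticipating the pseudo-inverse note below, and np.linalg.lstsq is shown as the numerically safer equivalent:

```python
import numpy as np

# Same toy data as above
X = np.array([[1.0, 23.0, 144.0],
              [1.0, 31.0, 250.0],
              [1.0, 45.0, 180.0],
              [1.0, 27.0,  95.0]])
y = np.array([12.0, 30.0, 20.0, 8.0])

# Normal equations: theta = (X^T X)^{-1} X^T y
# pinv is the pseudo-inverse, so this also works when X^T X is singular
theta = np.linalg.pinv(X.T @ X) @ X.T @ y

# Numerically safer equivalent: solve the least-squares problem directly
theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

print(theta, theta_lstsq)
```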
The inverse here is the pseudo-inverse. MATLAB: theta = inv(X'*X)*X'*y

Linear Regression: Supervised Learning Framework
Here the unknown function f is the credit line officers, and the training examples it generated are the historical credit lines. The hypothesis set is h(x) = ax + b, the error function is the squared error, and the optimization routine is gradient descent. Machine learning chooses and produces the known function g ≈ f, which predicts the credit line y for new applications x.

Support Vector Machines
•  Separating plane with maximum margin
•  Margins are configurable with parameters to allow for mistakes
•  The gold-standard black box for many practitioners

Support Vector Machines
•  Effective in high-dimensional spaces
•  Effective even when #features > #observations
•  Can use different kernel functions: linear, polynomial, RBF (see the sketch below)
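As a rough illustration of these points, a minimal scikit-learn sketch; the toy data and the C/gamma values are illustrative assumptions, not recommendations:

```python
import numpy as np
from sklearn.svm import SVC

# Toy 2-D binary classification data
X = np.array([[0.0, 0.0], [0.2, 0.1], [0.1, 0.3],   # class -1
              [1.0, 1.0], [0.9, 1.2], [1.1, 0.8]])  # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

# C trades margin width against mistakes; gamma shapes the RBF kernel
clf = SVC(kernel="rbf", C=1.0, gamma="scale")
clf.fit(X, y)

print(clf.predict([[0.15, 0.2], [1.0, 0.9]]))  # -> [-1  1]
```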
Supervised Learning Framework
An unknown function f: x → y generates the training examples (x1, y1), …, (xN, yN). From a hypothesis set, guided by an error function and optimization routines, machine learning chooses and produces a known function g ≈ f; for a new observation x, it predicts the value/category y.

Types of Problems
Machine learning problems divide into Supervised (Regression, Classification) and Unsupervised (Clustering, Outlier Detection).

Unsupervised Learning
•  Trying to find hidden structure in unlabeled data.
•  Clustering
   –  Group similar data points in "clusters"
      •  k-means
      •  Mixture models
      •  Hierarchical clustering

Unsupervised Learning: K-means
•  "Birds of a feather flock together"
•  Group samples around a "mean" (the steps below are sketched in code after this list):
   –  Start with random "means" and assign each point to the nearest "mean"
   –  Calculate new "means"
   –  Re-assign points to the new "means"…
[Figure: animation frames showing random initial means, assignment, mean recalculation, re-assignment to the new means, another mean recalculation, and another re-assignment]
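The steps above translate almost line-for-line into numpy. A minimal sketch, assuming toy 2-D data, k = 2, and a fixed number of iterations (for simplicity, empty clusters are not handled):

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: two well-separated 2-D blobs
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)),
               rng.normal(2.0, 0.3, (50, 2))])

k = 2
means = X[rng.choice(len(X), size=k, replace=False)]  # random initial "means"

for _ in range(10):
    # Assign each point to its nearest mean
    dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
    labels = dists.argmin(axis=1)
    # Recalculate each mean from the points assigned to it
    means = np.array([X[labels == j].mean(axis=0) for j in range(k)])

print(means)  # close to the blob centers (0, 0) and (2, 2)
```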
Unsupervised Learning: Mixture Models
•  Presence of subpopulations within an overall population
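One common way to fit such subpopulations is a Gaussian mixture model; a minimal scikit-learn sketch on made-up 1-D data (the two-component choice is an assumption):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# An overall population made of two subpopulations
X = np.concatenate([rng.normal(0.0, 1.0, 200),
                    rng.normal(5.0, 0.5, 100)]).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
print(gmm.means_.ravel())   # approximately the subpopulation means 0 and 5
print(gmm.weights_)         # approximately their proportions 2/3 and 1/3
```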
Anomaly Detection
•  Detect abnormal data
•  Whitelisting good behavior

Anomaly Detection: Statistical
The historical observations form a table of m rows and n features:

#    X1   X2   …   Xn
1    12    8   …   48
2    14    7   …   52
3    16    8   …   57
4    11    6   …   53
…     …    …   …    …
m    13    7   …   56

A density is fitted to each feature [Figure: fitted probability density curve], and the probability of an observation is the product
P(x) = p(x1, α1, μ1) p(x2, α2, μ2) … p(xn, αn, μn)
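A minimal sketch of this idea; the slides do not pin down the form of each per-feature density p, so treating it as a Gaussian with parameters estimated from the historical table is an assumption, as is the way the two test points are judged:

```python
import numpy as np

# Historical observations: m rows x n features, as in the table above
X = np.array([[12.0, 8.0, 48.0],
              [14.0, 7.0, 52.0],
              [16.0, 8.0, 57.0],
              [11.0, 6.0, 53.0],
              [13.0, 7.0, 56.0]])

mu = X.mean(axis=0)            # per-feature mean
sigma = X.std(axis=0) + 1e-9   # per-feature spread (guard against zero)

def gaussian(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def P(x):
    # P(x) = p(x1) * p(x2) * ... * p(xn), features treated independently
    return np.prod(gaussian(x, mu, sigma))

print(P(np.array([13.0, 7.0, 52.0])))  # typical point: relatively large P(x)
print(P(np.array([30.0, 2.0, 10.0])))  # abnormal point: P(x) close to zero
```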
Anomaly Detection: Local Outlier Factor (LOF)
•  Based on local density
•  Locality is given by the nearest neighbors
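scikit-learn ships a LOF implementation; a minimal sketch on toy data (the n_neighbors value is an assumption):

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.5, (100, 2)),  # one dense cluster
               [[4.0, 4.0]]])                   # a single isolated point

lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)   # -1 = outlier, 1 = inlier
print(labels[-1])             # the isolated point is flagged as -1
```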
How to Learn (Well): Initial Critical Obstacles
•  Choosing features:
   –  By domain experts' knowledge and expertise
   –  Or by feature-extraction algorithms
•  Choosing parameters, for example:
   –  Degree of the regression equation
   –  SVM parameters (C, gamma)
   –  Number of clusters
      •  k-means (chosen pre-clustering)
      •  Agglomerative (chosen post-clustering)

A Simple Example: Fitting a Polynomial
•  The green curve is the true function (which is not a polynomial)
•  We will use a loss function that measures the squared error in the prediction of y(x) from x.
[Figure: some fits to the data. Which is best?]

Occam's Razor
•  "The simplest explanation for some phenomenon is more likely to be accurate than more complicated explanations."
Which is best?

A simple way to reduce model complexity: add penalties (figure from Bishop); one such penalty is sketched below.
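The slides only show the penalized fits, so as one concrete instance, here is a ridge (L2) penalty shrinking the coefficients of a high-degree polynomial fit; the degree and penalty weight lam are assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 10)
y = np.sin(2.0 * np.pi * x) + rng.normal(0.0, 0.1, x.size)  # noisy samples

degree, lam = 9, 1e-3
X = np.vander(x, degree + 1)   # polynomial features of the inputs

# Penalized least squares: theta = (X^T X + lam*I)^{-1} X^T y
I = np.eye(X.shape[1])
theta_ridge = np.linalg.solve(X.T @ X + lam * I, X.T @ y)
theta_plain = np.linalg.lstsq(X, y, rcond=None)[0]

# The penalty keeps coefficients small, i.e. a less wiggly fit
print(np.abs(theta_plain).max(), np.abs(theta_ridge).max())
```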
Example
•  Mixture models [Figure]

Using a Validation Set
•  Divide the total dataset into three subsets (a split is sketched below):
   –  Training data: for learning the parameters of the model.
   –  Validation data: for deciding what type of model and what amount of regularization work best.
   –  Test data: used to get a final, unbiased estimate of how well the algorithm works. We expect this estimate to be worse than on the validation data.
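A minimal sketch of such a three-way split; the 60/20/20 proportions are an assumption, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))   # toy features
y = rng.normal(size=100)        # toy targets

idx = rng.permutation(len(X))   # shuffle before splitting
n_train, n_val = 60, 20
train, val, test = np.split(idx, [n_train, n_train + n_val])

X_train, y_train = X[train], y[train]  # learn model parameters here
X_val,   y_val   = X[val],   y[val]    # pick model type / regularization here
X_test,  y_test  = X[test],  y[test]   # final, unbiased estimate only
print(len(train), len(val), len(test)) # 60 20 20
```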
PCA
•  Data visualization
•  Noise reduction
•  Data reduction
•  Can help with the curse of dimensionality
•  And more

What are PCs?
•  Orthogonal directions of greatest variance in the data
[Figure: PC 1 and PC 2 plotted against Original Variable A and Original Variable B]

Principal Components
•  The first PC is the direction of maximum variance from the origin
[Figure: Wavelength 1 vs. Wavelength 2 scatter with PC 1 drawn along the direction of maximum variance]
•  Subsequent PCs are orthogonal to the 1st PC and describe the maximum residual variance
[Figure: the same Wavelength 1 vs. Wavelength 2 scatter with PC 2 drawn orthogonal to PC 1]

Principal Components
[Figure: data scatter with the 1st principal component (y1) and 2nd principal component (y2) axes overlaid]
Principal Components
[Figure: each observation's coordinates (xi1, xi2) re-expressed as scores (yi,1, yi,2) along the two principal components]
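A minimal sketch of computing principal components via the SVD of the centered data; the correlated toy data stands in for the wavelength scatter above:

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-D toy data, like the wavelength scatter above
X = rng.normal(size=(200, 2)) @ np.array([[2.0, 0.8],
                                          [0.8, 0.5]])

Xc = X - X.mean(axis=0)                    # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

pcs = Vt                                   # rows: orthogonal directions of max variance
scores = Xc @ Vt.T                         # y_{i,1}, y_{i,2}: coordinates along the PCs
explained = S ** 2 / (len(X) - 1)          # variance captured by each PC
print(pcs)
print(explained)
```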
PCA on a Real Example
[Figure: an original image alongside PCA reconstructions labeled r = 4, 64, 256, 3600]

Thank You
•  Make sure you have Python and the IPython notebook
   –  The easiest way to get this running: http://continuum.io/downloads
•  Download the tutorial IPython notebook: http://www.amaatouq.com/notebook.zip