Numerical Learning

Introduction
Numerical learning methods learn the parameters or weights of a model,
often by optimizing an error function. Examples include:
Calculating the parameters of a probability distribution.
Separating positive from negative examples with a decision boundary.
Finding points close to positive but far from negative examples.
Updating parameters to decrease error (see the sketch below).
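As a minimal, hypothetical sketch of the last point (not from the notes), the loop below repeatedly updates a weight vector to decrease a squared-error function; the data, learning rate, and function name are assumptions for illustration only.

```python
import numpy as np

def gradient_descent_step(w, X, y, lr=0.1):
    """One update that decreases the squared error of a linear model y ≈ X @ w."""
    error = X @ w - y              # residuals of the current parameters
    grad = X.T @ error / len(y)    # gradient of the mean squared error / 2
    return w - lr * grad           # move the parameters downhill

# Toy usage: recover y = 2x with a single weight.
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([2.0, 4.0, 6.0])
w = np.zeros(1)
for _ in range(100):
    w = gradient_descent_step(w, X, y)
print(w)   # approaches [2.0]
```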
Contents
Introduction
Naive Bayes
  Introduction
  Naive Bayes Example
  Naive Bayes Example Continued
The EM Algorithm
The Nearest Neighbor Algorithm
Support Vector Machines
  Introduction
  Example SVM for Separable Examples
  Example SVM for Nonseparable Examples
  Example Gaussian Kernel SVM
  Example Gaussian Kernel, Zoomed In
Naive Bayes
Introduction
For class C and attributes Xi, assume:
P(C, X1, ..., Xn) = P(C) P(X1|C) ... P(Xn|C)
This corresponds to a Bayesian network where C is the sole parent of each Xi.
Estimate prior and conditional probabilities by counting.
If an outcome occurs m times out of n trials, Laplace’s law of succession
recommends the estimate (m + 1)/(n + k) where k is the number of
outcomes.
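As a minimal sketch (not part of the notes), the Laplace estimate can be wrapped in a small helper; the function name is an assumption for illustration, and the check uses the counts from the next slide.

```python
def laplace_estimate(m, n, k):
    """Laplace's law of succession: (m + 1) / (n + k).

    m: times the outcome occurred, n: number of trials, k: number of possible outcomes.
    """
    return (m + 1) / (n + k)

# Check against the prior estimated below: 9 positives out of 14 examples, 2 classes.
print(laplace_estimate(9, 14, 2))   # 10/16 = 0.625
```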
Naive Bayes Example
Using Laplace's law of succession on the 14 examples:
P(pos) = (9 + 1)/(14 + 2) = 10/16
P(neg) = (5 + 1)/(14 + 2) = 6/16
P(sunny | pos) = (2 + 1)/(9 + 3) = 3/12
P(overcast | pos) = (4 + 1)/(9 + 3) = 5/12
P(rain | pos) = (3 + 1)/(9 + 3) = 4/12
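Continuing the sketch above, the outlook estimates on this slide can be reproduced with the same hypothetical helper; the counts (2 sunny, 4 overcast, 3 rain among the 9 positive examples) are read off from the slide.

```python
# Reproduce the conditional estimates above with the assumed laplace_estimate helper.
outlook_counts_pos = {"sunny": 2, "overcast": 4, "rain": 3}   # counts among positives
n_pos, k_outlook = 9, 3                                       # 9 positives, 3 outlook values

for value, m in outlook_counts_pos.items():
    print(value, laplace_estimate(m, n_pos, k_outlook))
# sunny 0.25, overcast 0.41666..., rain 0.33333...
```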
Naive Bayes Example Continued
For the first example (sunny, hot, high, false):
P(pos | sunny, hot, high, false) = α (10/16) (3/12) (3/12) (4/11) (7/11) ≈ α 0.00904
P(neg | sunny, hot, high, false) = α (6/16) (4/8) (3/8) (5/7) (3/7) ≈ α 0.02152
P(neg | sunny, hot, high, false) ≈ 0.02152 / (0.00904 + 0.02152) ≈ 0.704
so naive Bayes predicts neg for this example.
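A minimal sketch (not part of the notes) of the same normalization step; the factors are copied from the calculation above, and the normalizing constant α cancels in the ratio.

```python
# Unnormalized naive Bayes scores from the slide.
score_pos = (10/16) * (3/12) * (3/12) * (4/11) * (7/11)
score_neg = (6/16) * (4/8) * (3/8) * (5/7) * (3/7)

p_neg = score_neg / (score_pos + score_neg)
print(round(score_pos, 5), round(score_neg, 5), round(p_neg, 3))   # 0.00904 0.02152 0.704
```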
The EM Algorithm
The Expectation-Maximization algorithm can be used to estimate hidden variables. Let C be the hidden variable.
1. Initialize. Guess an initial probability distribution P0(C, X1, ..., Xn).
2. E-Step. Compute Pi(cj | x1, ..., xn) for each example.
3. M-Step. Estimate Pi+1(C, X1, ..., Xn) from the probabilities computed in the E-Step.
4. Stop if the changes were small; otherwise go to the E-Step.
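A minimal sketch (not from the notes) of these four steps for a naive Bayes model with a hidden class and binary attributes; the tiny dataset, the random initialization, and the function name are assumptions for illustration only.

```python
import numpy as np

def em_naive_bayes(X, n_classes=2, n_iter=50, tol=1e-6, seed=0):
    """EM for a naive Bayes model P(C) prod_j P(Xj|C) with hidden class C.

    X is a 0/1 attribute matrix, one row per example.
    Returns the estimated class prior and per-class P(Xj = 1 | C).
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    prior = np.full(n_classes, 1.0 / n_classes)          # step 1: initial guess for P0(C)
    theta = rng.uniform(0.3, 0.7, size=(n_classes, d))   # initial guess for P0(Xj = 1 | C)

    old_ll = -np.inf
    for _ in range(n_iter):
        # E-step: posterior P(C = c | x1, ..., xn) for every example.
        log_joint = (np.log(prior)
                     + X @ np.log(theta).T
                     + (1 - X) @ np.log(1 - theta).T)
        shift = log_joint.max(axis=1, keepdims=True)
        ll = np.sum(shift[:, 0] + np.log(np.exp(log_joint - shift).sum(axis=1)))
        post = np.exp(log_joint - shift)
        post /= post.sum(axis=1, keepdims=True)

        # M-step: re-estimate P(C) and P(Xj | C) from the soft counts.
        counts = post.sum(axis=0)
        prior = counts / n
        theta = (post.T @ X + 1) / (counts[:, None] + 2)  # Laplace-smoothed, k = 2 outcomes

        # Stop if the change in log-likelihood was small, else repeat the E-step.
        if abs(ll - old_ll) < tol:
            break
        old_ll = ll
    return prior, theta

# Toy usage: two obvious clusters of binary attribute vectors.
X = np.array([[1, 1, 0], [1, 1, 0], [1, 0, 0],
              [0, 0, 1], [0, 1, 1], [0, 0, 1]], dtype=float)
prior, theta = em_naive_bayes(X)
print(prior.round(2), theta.round(2))
```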
The Nearest Neighbor Algorithm
The k-nearest neighbor algorithm classifies a test example by finding the k
closest training examples and returning their most common class.
Suppose there is 10% class noise, so the best possible test error is 10%.
With sufficient training examples, a test example agrees with its nearest
neighbor with probability (.9)(.9) + (.1)(.1) = .82 (both not noisy or both noisy)
and disagrees with probability (.9)(.1) + (.1)(.9) = .18.
In general, 1-nearest-neighbor error converges to less than twice the optimal
error (and 3-NN to less than 32% above optimal).
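A minimal sketch (not from the notes) of a k-nearest-neighbor classifier with Euclidean distance; the data and function name are assumptions for illustration only.

```python
import numpy as np

def knn_predict(x, train_X, train_y, k=1):
    """Classify x by majority vote among its k nearest training examples."""
    dists = np.linalg.norm(train_X - x, axis=1)     # Euclidean distance to every example
    nearest = np.argsort(dists)[:k]                 # indices of the k closest examples
    labels, votes = np.unique(train_y[nearest], return_counts=True)
    return labels[np.argmax(votes)]                 # most common class among them

# Toy usage: two classes of 2-D points.
train_X = np.array([[0.0, 0.0], [0.2, 0.1], [1.0, 1.0], [0.9, 1.1]])
train_y = np.array([-1, -1, 1, 1])
print(knn_predict(np.array([0.1, 0.2]), train_X, train_y, k=3))   # -> -1
```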
Support Vector Machines

Introduction
An SVM assigns a weight αi to each example (yi, xi), where yi is either −1 or 1
and xi is an attribute-value vector.
An SVM computes a discriminant by:
h(x) = sign(b + Σ_i αi yi K(x, xi))
where K is a kernel function.
An SVM learns by optimizing the error function:
minimize ‖h‖²/2 + Σ_i max(0, 1 − yi h(xi)) subject to 0 ≤ αi ≤ C
where ‖h‖ is the size of h in kernel space.
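A minimal sketch (not from the notes) of the discriminant h(x) = sign(b + Σ_i αi yi K(x, xi)) with a Gaussian (RBF) kernel; the weights, bias, and data below are assumptions for illustration and are not the result of actually solving the optimization above.

```python
import numpy as np

def gaussian_kernel(x, z, gamma=1.0):
    """K(x, z) = exp(-gamma * ||x - z||^2), the Gaussian (RBF) kernel."""
    return np.exp(-gamma * np.sum((x - z) ** 2))

def svm_discriminant(x, alphas, ys, xs, b, kernel=gaussian_kernel):
    """h(x) = sign(b + sum_i alpha_i * y_i * K(x, x_i))."""
    s = b + sum(a * y * kernel(x, xi) for a, y, xi in zip(alphas, ys, xs))
    return np.sign(s)

# Toy usage with made-up weights (in practice the alphas and b come from the
# constrained optimization above, with 0 <= alpha_i <= C).
xs = np.array([[1.0, 1.0], [2.0, 2.5], [5.0, 1.0], [6.0, 2.0]])
ys = np.array([1, 1, -1, -1])
alphas = np.array([0.7, 0.7, 0.7, 0.7])
b = 0.0
print(svm_discriminant(np.array([1.5, 1.5]), alphas, ys, xs, b))   # -> 1.0
```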
Example SVM for Separable Examples
[Figure: linearly separable training examples with the learned decision boundary w·x + b = 0 and the margin lines w·x + b = −1 and w·x + b = 1.]

Example SVM for Nonseparable Examples
[Figure: nonseparable training examples with the decision boundary w·x + b = 0 and the margin lines w·x + b = −1 and w·x + b = 1.]
Example Gaussian Kernel SVM
[Figure: Gaussian-kernel SVM decision surface with contours at −1, 0, and 1.]
Example Gaussian Kernel, Zoomed In
[Figure: zoomed-in view of the Gaussian-kernel SVM decision surface, with contours at −1, 0, and 1.]