CS 5233 Artificial Intelligence
Numerical Learning

Contents
- Introduction
- Naive Bayes: Introduction, Example, Example Continued
- The EM Algorithm
- The Nearest Neighbor Algorithm
- Support Vector Machines: Introduction, Example SVM for Separable Examples, Example SVM for Nonseparable Examples, Example Gaussian Kernel SVM, Example Gaussian Kernel Zoomed In

Introduction
Numerical learning methods learn the parameters or weights of a model, often by optimizing an error function. Examples include:
- Calculate the parameters of a probability distribution.
- Separate positive from negative examples by a decision boundary.
- Find points close to positive but far from negative examples.
- Update parameters to decrease error.

Naive Bayes

Introduction
For class C and attributes Xi, assume:

    P(C, X1, ..., Xn) = P(C) P(X1|C) ... P(Xn|C)

This corresponds to a Bayesian network where C is the sole parent of each Xi. Estimate the prior and conditional probabilities by counting. If an outcome occurs m times out of n trials, Laplace's law of succession recommends the estimate (m + 1)/(n + k), where k is the number of possible outcomes.

Naive Bayes Example
Using Laplace's law of succession on the 14 examples:

    P(pos) = (9 + 1)/(14 + 2) = 10/16
    P(neg) = (5 + 1)/(14 + 2) = 6/16
    P(sunny | pos) = (2 + 1)/(9 + 3) = 3/12
    P(overcast | pos) = (4 + 1)/(9 + 3) = 5/12
    P(rain | pos) = (3 + 1)/(9 + 3) = 4/12

Naive Bayes Example Continued
For the first example:

    P(pos | sunny, hot, high, false) = α (10/16)(3/12)(3/12)(4/11)(7/11) ≈ α 0.00904
    P(neg | sunny, hot, high, false) = α (6/16)(4/8)(3/8)(5/7)(3/7) ≈ α 0.02152

Normalizing, P(neg | sunny, hot, high, false) ≈ 0.02152 / (0.00904 + 0.02152) ≈ 0.704.
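The arithmetic above is easy to check in code. Below is a minimal Python sketch of the same calculation; the helper name laplace is ours, and the attribute value counts are the ones implied by the fractions on the slides for the 14-example weather data, so treat them as assumptions.

    # Sketch of the Naive Bayes example using Laplace's law of succession.
    # Counts are inferred from the fractions on the slides (an assumption).

    def laplace(m, n, k):
        """Laplace's law of succession: m occurrences in n trials, k outcomes."""
        return (m + 1) / (n + k)

    # Priors: 9 positive and 5 negative examples out of 14, 2 classes.
    p_pos = laplace(9, 14, 2)    # 10/16
    p_neg = laplace(5, 14, 2)    # 6/16

    # Unnormalized posteriors for the test example (sunny, hot, high, false).
    pos = p_pos * laplace(2, 9, 3) * laplace(2, 9, 3) * laplace(3, 9, 2) * laplace(6, 9, 2)
    neg = p_neg * laplace(3, 5, 3) * laplace(2, 5, 3) * laplace(4, 5, 2) * laplace(2, 5, 2)

    print(round(pos, 5))                # ~0.00904
    print(round(neg, 5))                # ~0.02152
    print(round(neg / (pos + neg), 3))  # ~0.704 = P(neg | sunny, hot, high, false)

Running this reproduces the 0.00904, 0.02152, and 0.704 values worked out above.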
The EM Algorithm
The Expectation-Maximization algorithm can be used to estimate hidden variables. Let C be the hidden variable.
1. Initialize. Guess the probability distribution P0(C, X1, ..., Xn).
2. E-Step. Find Pi(cj | x1, ..., xn) for each example.
3. M-Step. Estimate Pi+1(C, X1, ..., Xn) based on the probabilities from the E-Step.
4. Stop if the changes were small, else go to the E-Step.

The Nearest Neighbor Algorithm
The k-nearest neighbor algorithm classifies a test example by finding the k closest training examples and returning the most common class among them.
Suppose there is 10% noise, so the best possible test error is 10%. With sufficient training examples, a test example will agree with its nearest neighbor with probability (.9)(.9) + (.1)(.1) = .82 (both not noisy or both noisy) and disagree with probability (.9)(.1) + (.1)(.9) = .18. In general, 1-nearest neighbor converges to less than twice the optimal error (3-NN to less than 32% higher than optimal).

Support Vector Machines

Introduction
An SVM assigns a weight αi to each example (yi, xi), where yi is either −1 or 1 and xi is an attribute value vector. An SVM computes a discriminant by:

    h(x) = sign(b + Σi αi yi K(x, xi))

where K is a kernel function. An SVM learns by optimizing the error function:

    minimize ‖h‖²/2 + Σi max(0, 1 − yi h(xi))
    subject to 0 ≤ αi ≤ C

where ‖h‖ is the size of h in kernel space.

Example SVM for Separable Examples
[Figure: separable data with the decision boundary w·x + b = 0 and the margins w·x + b = −1 and w·x + b = 1.]

Example SVM for Nonseparable Examples
[Figure: nonseparable data with the decision boundary w·x + b = 0 and the margins w·x + b = −1 and w·x + b = 1.]

Example Gaussian Kernel SVM
[Figure: Gaussian kernel SVM with decision contours at levels −1, 0, and 1.]

Example Gaussian Kernel, Zoomed In
[Figure: the same Gaussian kernel SVM, zoomed in on the boundary region, with contours at levels −1, 0, and 1.]
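As an illustration of the discriminant h(x) = sign(b + Σi αi yi K(x, xi)) with a Gaussian kernel, here is a minimal Python sketch. It assumes NumPy and scikit-learn are available; the toy data, gamma, and C values are invented for illustration and are not the data behind the figures above.

    # Sketch: fit a Gaussian (RBF) kernel SVM, then evaluate the slide's
    # discriminant by hand.  scikit-learn stores alpha_i * y_i in dual_coef_
    # and b in intercept_.  Data and parameters are made up for illustration.
    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X_pos = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(20, 2))
    X_neg = rng.normal(loc=[4.0, 1.0], scale=0.5, size=(20, 2))
    X = np.vstack([X_pos, X_neg])
    y = np.array([1] * 20 + [-1] * 20)

    clf = SVC(kernel="rbf", gamma=1.0, C=1.0)   # Gaussian kernel SVM
    clf.fit(X, y)

    def h(x, gamma=1.0):
        # K(x, xi) = exp(-gamma * ||x - xi||^2) evaluated at the support vectors
        k = np.exp(-gamma * np.sum((clf.support_vectors_ - x) ** 2, axis=1))
        return np.sign(clf.intercept_[0] + clf.dual_coef_[0] @ k)

    x_test = np.array([3.0, 1.5])
    print(h(x_test), clf.predict([x_test])[0])   # the two predictions should agree

Only the support vectors (examples with αi > 0) contribute to the sum, which is why the hand-computed h(x) can ignore the rest of the training data.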