CS 5233 Artificial Intelligence
Numerical Learning

Contents

    Introduction
    Naive Bayes: Introduction; Example; Example Continued
    The EM Algorithm
    The Nearest Neighbor Algorithm
    Support Vector Machines: Introduction; Example SVM for Separable Examples;
        Example SVM for Nonseparable Examples; Example Gaussian Kernel SVM;
        Example Gaussian Kernel, Zoomed In

Introduction

Numerical learning methods learn the parameters or weights of a model, often by optimizing an error function. Examples include:

    Calculate the parameters of a probability distribution.
    Separate positive from negative examples by a decision boundary.
    Find points close to positive but far from negative examples.
    Update parameters to decrease error.

Naive Bayes

Introduction

For class C and attributes Xi, assume:

    P(C, X1, ..., Xn) = P(C) P(X1|C) ... P(Xn|C)

This corresponds to a Bayesian network where C is the sole parent of each Xi. Estimate the prior and conditional probabilities by counting. If an outcome occurs m times out of n trials, Laplace's law of succession recommends the estimate (m + 1)/(n + k), where k is the number of possible outcomes.

Naive Bayes Example

Using Laplace's law of succession on the 14 examples:

    P(pos) = (9 + 1)/(14 + 2) = 10/16
    P(neg) = (5 + 1)/(14 + 2) = 6/16
    P(sunny | pos)    = (2 + 1)/(9 + 3) = 3/12
    P(overcast | pos) = (4 + 1)/(9 + 3) = 5/12
    P(rain | pos)     = (3 + 1)/(9 + 3) = 4/12

Naive Bayes Example Continued

For the first example (a code sketch of these calculations follows the EM algorithm section below):

    P(pos | sunny, hot, high, false) = α (10/16)(3/12)(3/12)(4/11)(7/11) ≈ α 0.00904
    P(neg | sunny, hot, high, false) = α (6/16)(4/8)(3/8)(5/7)(3/7) ≈ α 0.02152

so P(neg | sunny, hot, high, false) ≈ 0.02152 / (0.00904 + 0.02152) ≈ 0.704.

The EM Algorithm

The Expectation-Maximization (EM) algorithm can be used to estimate hidden variables. Let C be the hidden variable.

    1. Initialize. Guess the probability distribution P0(C, X1, ..., Xn).
    2. E-Step. Find Pi(cj | x1, ..., xn) for each example.
    3. M-Step. Estimate Pi+1(C, X1, ..., Xn) based on the probabilities from the E-Step.
    4. Stop if the changes were small; otherwise go to the E-Step.
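As a concrete illustration of the four steps above, here is a minimal EM sketch for a naive Bayes model whose class variable C is hidden. The toy data, the choice of two hidden classes, and the function names (init, e_step, m_step) are illustrative assumptions, not part of the notes; the M-Step reuses Laplace's law of succession from the naive Bayes section.

    # Minimal EM sketch: naive Bayes with a hidden class variable C.
    import random

    examples = [("sunny", "high"), ("rain", "normal"), ("overcast", "high"),
                ("sunny", "normal"), ("rain", "high")]          # illustrative data
    classes = ["c0", "c1"]                                      # assumed two hidden classes
    n_attrs = len(examples[0])
    values = [sorted({x[j] for x in examples}) for j in range(n_attrs)]
    random.seed(0)

    def init():
        # 1. Initialize: guess P0(C) and P0(Xj | C).
        prior = {c: 1.0 / len(classes) for c in classes}
        cond = {c: [] for c in classes}
        for c in classes:
            for j in range(n_attrs):
                raw = {v: random.random() for v in values[j]}
                z = sum(raw.values())
                cond[c].append({v: raw[v] / z for v in values[j]})
        return prior, cond

    def e_step(prior, cond):
        # 2. E-Step: compute Pi(c | x1, ..., xn) for each example.
        post = []
        for x in examples:
            score = {c: prior[c] for c in classes}
            for c in classes:
                for j in range(n_attrs):
                    score[c] *= cond[c][j][x[j]]
            z = sum(score.values())
            post.append({c: score[c] / z for c in classes})
        return post

    def m_step(post):
        # 3. M-Step: re-estimate Pi+1 from expected counts, smoothed with
        #    Laplace's law of succession.
        n = len(examples)
        prior = {c: (sum(p[c] for p in post) + 1) / (n + len(classes)) for c in classes}
        cond = {c: [] for c in classes}
        for c in classes:
            weight = sum(p[c] for p in post)
            for j in range(n_attrs):
                counts = {v: 1.0 for v in values[j]}        # start each count at 1
                for x, p in zip(examples, post):
                    counts[x[j]] += p[c]
                cond[c].append({v: counts[v] / (weight + len(values[j])) for v in values[j]})
        return prior, cond

    prior, cond = init()
    for step in range(50):    # 4. Stop: a fixed iteration count stands in for a convergence test.
        post = e_step(prior, cond)
        prior, cond = m_step(post)
    print(prior)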
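Returning to the Naive Bayes example above, the short sketch below reproduces the smoothed estimates and the posterior for the first example. The counts are read directly off the fractions shown in the example (the first two attributes have 3 possible values each and the last two have 2, as the denominators indicate); laplace is an illustrative helper name.

    # Naive Bayes with Laplace's law of succession, reproducing the example above.

    def laplace(m, n, k):
        # An outcome seen m times in n trials with k possible outcomes: (m + 1) / (n + k).
        return (m + 1) / (n + k)

    p_pos = laplace(9, 14, 2)      # = 10/16
    p_neg = laplace(5, 14, 2)      # = 6/16

    # First example (sunny, hot, high, false): k is 3 for the first two attributes
    # and 2 for the last two, matching the denominators 12 and 11 above.
    pos = p_pos * laplace(2, 9, 3) * laplace(2, 9, 3) * laplace(3, 9, 2) * laplace(6, 9, 2)
    neg = p_neg * laplace(3, 5, 3) * laplace(2, 5, 3) * laplace(4, 5, 2) * laplace(2, 5, 2)

    print(round(pos, 5), round(neg, 5))    # ≈ 0.00904 and 0.02152 (before normalizing by α)
    print(round(neg / (pos + neg), 3))     # ≈ 0.704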
The Nearest Neighbor Algorithm

The k-nearest neighbor algorithm classifies a test example by finding the k closest training examples and returning their most common class. Suppose there is 10% noise, so the best possible test error is 10%. With sufficient training examples, a test example will agree with its nearest neighbor with probability (.9)(.9) + (.1)(.1) = .82 (both not noisy or both noisy) and disagree with probability (.9)(.1) + (.1)(.9) = .18. In general, 1-nearest neighbor converges to less than twice the optimal error (3-nearest neighbor to less than 32% higher than optimal). A code sketch appears after the SVM figures below.

Support Vector Machines

Introduction

An SVM assigns a weight αi to each example (yi, xi), where yi is either −1 or 1 and xi is an attribute value vector. An SVM computes a discriminant by:

    h(x) = sign( b + Σi αi yi K(x, xi) )

where K is a kernel function. An SVM learns by optimizing the error function:

    minimize ||h||²/2 + Σi max(0, 1 − yi h(xi))   subject to 0 ≤ αi ≤ C

where ||h|| is the size of h in kernel space. (A code sketch of the discriminant appears after the figures below.)

Example SVM for Separable Examples

    [Figure: linearly separable examples with the decision boundary w·x + b = 0 and the margins w·x + b = −1 and w·x + b = 1.]

Example SVM for Nonseparable Examples

    [Figure: nonseparable examples with the decision boundary w·x + b = 0 and the margins w·x + b = −1 and w·x + b = 1.]

Example Gaussian Kernel SVM

    [Figure: a Gaussian kernel SVM, showing the −1, 0, and 1 level curves of the discriminant.]

Example Gaussian Kernel, Zoomed In

    [Figure: the same Gaussian kernel SVM, zoomed in near the decision boundary.]
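The nearest neighbor section above refers here: a minimal k-nearest neighbor sketch, assuming numeric attributes and Euclidean distance. The toy training set and the name knn_classify are illustrative.

    # Minimal k-nearest neighbor sketch: return the most common class among the
    # k training examples closest to x (Euclidean distance is an assumption).
    from collections import Counter
    import math

    def knn_classify(train, x, k=3):
        # train is a list of (attribute_vector, class_label) pairs.
        def dist(a, b):
            return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
        nearest = sorted(train, key=lambda ex: dist(ex[0], x))[:k]
        return Counter(label for _, label in nearest).most_common(1)[0][0]

    train = [((1.0, 1.0), "neg"), ((1.5, 2.0), "neg"),
             ((4.0, 4.5), "pos"), ((5.0, 4.0), "pos"), ((4.5, 5.0), "pos")]
    print(knn_classify(train, (4.2, 4.4), k=3))    # -> pos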
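Finally, the SVM discriminant h(x) = sign(b + Σi αi yi K(x, xi)) from the introduction can be evaluated directly once the weights are known. The Gaussian kernel matches the last two figures; the support vectors, weights αi, and bias b below are made-up values for illustration, and learning them subject to 0 ≤ αi ≤ C is left to an optimizer.

    # SVM discriminant h(x) = sign(b + sum_i alpha_i * y_i * K(x, x_i)),
    # here with a Gaussian kernel; alpha_i, b, and the support vectors are
    # illustrative values, not learned weights.
    import math

    def gaussian_kernel(a, b, gamma=1.0):
        return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

    def discriminant(x, support, b, kernel=gaussian_kernel):
        # support is a list of (alpha_i, y_i, x_i) triples with alpha_i > 0.
        s = b + sum(alpha * y * kernel(x, xi) for alpha, y, xi in support)
        return 1 if s >= 0 else -1

    support = [(0.8, +1, (4.5, 1.5)), (0.8, -1, (5.5, 1.5))]   # made-up support vectors
    print(discriminant((4.6, 1.4), support, b=0.0))            # -> 1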