Machine Learning
Expectation Maximization and Gaussian Mixtures
CSE 473, Chapter 20.3
Feedback in Learning
• Supervised learning: correct answers for each example
• Unsupervised learning: correct answers not given
• Reinforcement learning: occasional rewards
The problem of finding labels for unlabeled data
So far we have solved “supervised” classification problems where a teacher
told us the label of each example. In nature, items often do not come with
labels. How can we learn labels without a teacher?
[Figure: two scatter plots in the (x1, x2) plane, one of unlabeled data and one of the same data with labels. From Shadmehr & Diedrichsen.]
Example: image segmentation
Identify pixels that are white matter, gray matter, or outside of the brain.
[Figure: histogram of normalized pixel values with three modes labeled "Outside the brain", "Gray matter", and "White matter". From Shadmehr & Diedrichsen.]
Raw Sensor Data
Measured distances for expected distance of 300 cm.
[Figure: histograms of sonar and laser range measurements.]
ANEMIA PATIENTS AND CONTROLS
[Figure: scatter plot of Red Blood Cell Hemoglobin Concentration (3.7 to 4.4) vs. Red Blood Cell Volume (3.3 to 4.0), unlabeled.]
Gaussians
Univariate: p(x) ~ N(μ, σ²):
$$p(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$
Multivariate: p(x) ~ N(μ, Σ):
$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2}\, |\Sigma|^{1/2}}\, e^{-\frac{1}{2}(\mathbf{x} - \boldsymbol{\mu})^{t}\, \Sigma^{-1}\, (\mathbf{x} - \boldsymbol{\mu})}$$
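To make these formulas concrete, here is a minimal Python sketch (added for illustration, not part of the original slides) that evaluates both densities directly from the definitions; the function names are my own, and in practice scipy.stats.norm and scipy.stats.multivariate_normal do the same job.

```python
import numpy as np

def univariate_gaussian(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

def multivariate_gaussian(x, mu, Sigma):
    """Density of N(mu, Sigma) at a d-dimensional point x."""
    d = len(mu)
    diff = x - mu
    norm_const = 1.0 / ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma)))
    return norm_const * np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff)

print(univariate_gaussian(0.0, 0.0, 1.0))                          # ~0.3989
print(multivariate_gaussian(np.zeros(2), np.zeros(2), np.eye(2)))  # ~0.1592
```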
Fitting a Gaussian PDF to Data
[Figure: two panels, "Good fit" and "Poor fit", each showing data with a Gaussian density curve overlaid. From Russell.]
Fitting a Gaussian PDF to Data
• Suppose y = y1, …, yn, …, yN is a set of N data values
• Given a Gaussian PDF with mean μ and variance σ², define:
$$p(y \mid \mu, \sigma) = \prod_{n=1}^{N} p(y_n \mid \mu, \sigma) = \prod_{n=1}^{N} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(y_n - \mu)^2}{2\sigma^2}}$$
• How do we choose μ and σ to maximise this probability?
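As a concrete illustration of this definition (my addition, not from the slides), the sketch below evaluates the likelihood in log space, which avoids numerical underflow when N is large; the data values are made up.

```python
import numpy as np

def gaussian_log_likelihood(y, mu, sigma):
    """log p(y | mu, sigma) = sum_n log N(y_n | mu, sigma^2)."""
    y = np.asarray(y, dtype=float)
    return np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                  - (y - mu) ** 2 / (2 * sigma**2))

y = np.array([1.2, 0.7, 1.9, 1.4, 0.3])
print(gaussian_log_likelihood(y, mu=1.0, sigma=0.5))   # better fit, higher value
print(gaussian_log_likelihood(y, mu=5.0, sigma=0.5))   # poor fit, much lower value
```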
Maximum Likelihood Estimation
• Define the best-fitting Gaussian to be the one such that p(y | μ, σ) is maximised.
• Terminology:
- p(y | μ, σ), thought of as a function of y, is the probability (density) of y
- p(y | μ, σ), thought of as a function of μ, σ, is the likelihood of μ, σ
• Maximising p(y | μ, σ) with respect to μ, σ is called Maximum Likelihood (ML) estimation of μ, σ
ML estimation of μ, σ
• Intuitively:
- The maximum likelihood estimate of μ should be the average value of y1, …, yN (the sample mean)
- The maximum likelihood estimate of σ² should be the variance of y1, …, yN (the sample variance)
• This turns out to be true: p(y | μ, σ) is maximised by setting:
$$\mu = \frac{1}{N} \sum_{n=1}^{N} y_n, \qquad \sigma^2 = \frac{1}{N} \sum_{n=1}^{N} (y_n - \mu)^2$$
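A quick numerical check of these formulas (an illustrative sketch, not from the slides; the data are synthetic): the closed-form ML estimates are exactly the sample mean and the 1/N (biased) sample variance that NumPy computes with ddof=0.

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=2.0, scale=1.5, size=1000)   # synthetic data

mu_ml = np.sum(y) / len(y)                      # (1/N) sum_n y_n
var_ml = np.sum((y - mu_ml) ** 2) / len(y)      # (1/N) sum_n (y_n - mu)^2

print(mu_ml, np.mean(y))          # identical
print(var_ml, np.var(y, ddof=0))  # identical
```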
[Figure: top panel, two Gaussian component densities ("Component 1", "Component 2") plotted as p(x) over x from -5 to 10; bottom panel, the resulting "Mixture Model" density.]
[Figure: the same two-panel layout with different component parameters.]
[Figure: the same two-panel layout; top panel labeled "Component Models" with narrower, taller components.]
Mixtures
If our data is not labeled, we can hypothesize that:
1. There are exactly m classes in the data: y ∈ {1, 2, …, m}
2. Each class y occurs with a specific frequency: P(y)
3. Examples of class y are governed by a specific distribution: p(x | y)
According to our hypothesis, each example x(i) must have been generated from a specific "mixture" distribution:
$$p(x) = \sum_{j=1}^{m} P(y = j)\, p(x \mid y = j)$$
We might hypothesize that the distributions are Gaussian:
$$p(x \mid \theta) = \sum_{j=1}^{m} P(y = j)\, N(x \mid \mu_j, \sigma_j)$$
where the parameters of the distributions are θ = {P(y=1), μ1, σ1, …, P(y=m), μm, σm}, the P(y=j) are the mixing proportions, and N is the normal distribution.
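A minimal sketch of this mixture density for the univariate Gaussian case (added for illustration; the mixing proportions and component parameters below are invented):

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

def mixture_pdf(x, weights, mus, sigmas):
    """p(x) = sum_j P(y=j) N(x | mu_j, sigma_j)."""
    return sum(w * gaussian_pdf(x, m, s)
               for w, m, s in zip(weights, mus, sigmas))

# Hypothetical two-component mixture:
weights = [0.3, 0.7]    # mixing proportions P(y = j); must sum to 1
mus     = [-1.0, 3.0]
sigmas  = [0.5, 1.0]
print(mixture_pdf(0.0, weights, mus, sigmas))
```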
Graphical Representation of Gaussian Mixtures
[Diagram: hidden variable y with prior P(y); measured variable x with conditional density p(x | y), here with three components p(x | y=1, μ1, σ1), p(x | y=2, μ2, σ2), p(x | y=3, μ3, σ3).]
$$p(x) = \sum_{i=1}^{3} p(y = i)\, p(x \mid y = i, \mu_i, \sigma_i)$$
Learning of mixture models
Learning Mixtures from Data
• Consider fixed K = 2, e.g., unknown parameters Θ = {μ1, σ1, μ2, σ2, α1}
• Given data D = {x1, …, xN}, we want to find the parameters Θ that "best fit" the data
Early Attempts
Weldon’s data, 1893
- n=1000 crabs from Bay of Naples
- Ratio of forehead to body length
- Suspected existence of 2 separate species
Early Attempts
Karl Pearson, 1894:
- JRSS paper
- proposed a mixture of 2 Gaussians
- 5 parameters Θ = {μ1, σ1, μ2, σ2, α1}
- parameter estimation -> method of moments
- involved solution of 9th-order equations!
(see Chapter 10, Stigler (1986), The History of Statistics)
“The solution of an equation of the ninth degree, where almost all powers, to the ninth, of the unknown quantity are existing, is, however, a very laborious task. Mr. Pearson has indeed possessed the energy to perform his heroic task…. But I fear he will have few successors…..”
(Charlier, 1906)
Maximum Likelihood Principle
• Fisher, 1922
- assume a probabilistic model
- likelihood = p(data | parameters, model)
- find the parameters that make the data most likely
1977: The EM Algorithm
• Dempster, Laird, and Rubin
• General framework for likelihood-based parameter estimation with missing data:
- start with initial guesses of parameters
- E-step: estimate memberships given params
- M-step: estimate params given memberships
- repeat until convergence
• Converges to a (local) maximum of likelihood
• E-step and M-step are often computationally simple
• Generalizes to maximum a posteriori (with priors)
EM for Mixture of Gaussians
• E-step: Compute the probability that point xj was generated by component i:
$$p_{ij} = P(C = i \mid x_j) = \alpha\, P(x_j \mid C = i)\, P(C = i), \qquad p_i = \sum_j p_{ij}$$
• M-step: Compute new mean, covariance, and component weights:
$$\mu_i = \sum_j p_{ij}\, x_j \,/\, p_i, \qquad \sigma_i^2 = \sum_j p_{ij}\,(x_j - \mu_i)^2 \,/\, p_i, \qquad w_i = p_i$$
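The two steps translate almost line for line into code. Below is a minimal EM sketch for a univariate K-component Gaussian mixture (my own illustration of the update rules above, not the authors' code); the only liberty taken is normalizing the component weights by N so they sum to one.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

def em_gmm_1d(x, k, n_iter=50, seed=0):
    """EM for a univariate Gaussian mixture with k components (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    mus = rng.choice(x, size=k, replace=False)   # initial guesses of the parameters
    sigmas = np.full(k, np.std(x))
    weights = np.full(k, 1.0 / k)

    for _ in range(n_iter):
        # E-step: p_ij = P(C=i | x_j), proportional to P(x_j | C=i) P(C=i)
        p = np.array([w * gaussian_pdf(x, m, s)
                      for w, m, s in zip(weights, mus, sigmas)])   # shape (k, n)
        p /= p.sum(axis=0, keepdims=True)        # normalize over components
        p_i = p.sum(axis=1)                      # p_i = sum_j p_ij

        # M-step: new means, variances, and component weights
        mus = (p @ x) / p_i
        sigmas = np.sqrt((p * (x[None, :] - mus[:, None]) ** 2).sum(axis=1) / p_i)
        weights = p_i / n                        # slide's w_i <- p_i, normalized by N

    return weights, mus, sigmas

# Usage on synthetic data drawn from two Gaussians:
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(5.0, 0.7, 200)])
print(em_gmm_1d(data, k=2))
```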
ANEMIA PATIENTS AND CONTROLS
[Figure: the unlabeled anemia data, Red Blood Cell Hemoglobin Concentration vs. Red Blood Cell Volume.]
EM ITERATIONS 1, 3, 5, 10, 15, 25
[Figures: the same scatter plot at EM iterations 1, 3, 5, 10, 15, and 25, with the fitted mixture components overlaid.]
LOG-LIKELIHOOD AS A FUNCTION OF EM ITERATIONS
[Figure: log-likelihood (y-axis, roughly 400 to 490) plotted against EM iteration (x-axis, 0 to 25).]
ANEMIA DATA WITH LABELS
[Figure: the same scatter plot with the true labels shown, a Control Group and an Anemia Group, Red Blood Cell Hemoglobin Concentration vs. Red Blood Cell Volume.]
Mixture Density
$$P(z \mid x, m) = \begin{pmatrix} \alpha_{\text{hit}} \\ \alpha_{\text{unexp}} \\ \alpha_{\text{max}} \\ \alpha_{\text{rand}} \end{pmatrix}^{T} \cdot \begin{pmatrix} P_{\text{hit}}(z \mid x, m) \\ P_{\text{unexp}}(z \mid x, m) \\ P_{\text{max}}(z \mid x, m) \\ P_{\text{rand}}(z \mid x, m) \end{pmatrix}$$
How can we determine the model parameters?
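As an illustration of how this weighted combination is evaluated (a sketch only: the weights and the component densities below are made up, whereas the real P_hit, P_unexp, P_max, and P_rand depend on the map m and the expected range):

```python
import numpy as np

def beam_model_density(z, weights, component_pdfs):
    """P(z | x, m) as a convex combination of four component densities."""
    densities = np.array([pdf(z) for pdf in component_pdfs])
    return float(np.dot(weights, densities))

# Hypothetical mixing weights (alpha_hit, alpha_unexp, alpha_max, alpha_rand), summing to 1:
weights = np.array([0.7, 0.1, 0.1, 0.1])

z_expected, z_max = 300.0, 500.0
component_pdfs = [
    lambda z: np.exp(-0.5 * ((z - z_expected) / 10.0) ** 2) / (10.0 * np.sqrt(2 * np.pi)),  # hit
    lambda z: 0.01 * np.exp(-0.01 * z) if z < z_expected else 0.0,                          # unexpected obstacle
    lambda z: 1.0 if abs(z - z_max) < 1.0 else 0.0,                                         # max range
    lambda z: 1.0 / z_max,                                                                  # random
]
print(beam_model_density(290.0, weights, component_pdfs))
```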
Raw Sensor Data
Measured distances for expected distance of 300 cm.
[Figure: histograms of sonar and laser range measurements.]
Approximation Results
[Figure: fitted mixture model compared with the laser and sonar measurements, for expected distances of 300 cm and 400 cm.]
Predict Goal and Path
[Figure: a map with the predicted goal and predicted path marked.]
Learned Transition Parameters
[Figure: two maps, "Going to the workplace" and "Going home", with learned transition modes marked by color: bus, car, and foot.]