Machine Learning – Lecture 17
Introduction to Gaussian Processes
14.07.2009
Bastian Leibe, RWTH Aachen
Perceptual and Sensory Augmented Computing, Machine Learning, Summer '09
http://www.umic.rwth-aachen.de/multimedia
[email protected]
Many slides adapted from B. Schiele

Course Outline
• Fundamentals (2 weeks)
– Bayes Decision Theory
– Probability Density Estimation
• Discriminative Approaches (5 weeks)
– Linear Discriminants, SVMs, Boosting
– Decision Trees, Random Forests, Model Selection
• Graphical Models (5 weeks)
– Bayesian Networks & Applications
– Markov Random Fields & Applications
– Exact Inference
– Approximate Inference
• Regression Problems (2 weeks)
– Gaussian Processes

Recap: Sampling Idea
• Objective: evaluate the expectation of a function f(z) with respect to a probability distribution p(z).
• Sampling idea: draw L independent samples z^{(l)}, l = 1, \dots, L, from p(z). The expectation can then be approximated by a finite sum:
  \hat{f} = \frac{1}{L} \sum_{l=1}^{L} f(z^{(l)})
• As long as the samples z^{(l)} are drawn independently from p(z), this is an unbiased estimate, independent of the dimensionality of z.

Recap: Sampling from a pdf
• In general, assume we are given the pdf p(x) and the corresponding cumulative distribution:
  F(x) = \int_{-\infty}^{x} p(z)\, dz
• To draw samples from this pdf, we can invert the cumulative distribution function:
  u \sim \mathrm{Uniform}(0, 1) \;\Rightarrow\; F^{-1}(u) \sim p(x)

Recap: Rejection Sampling
• Assumptions
– Sampling directly from p(z) is difficult.
– But we can easily evaluate p(z) up to some normalization factor Z_p: p(z) = \frac{1}{Z_p}\,\tilde{p}(z).
• Idea
– We need some simpler distribution q(z) (called the proposal distribution) from which we can draw samples.
– Choose a constant k such that k\, q(z) \ge \tilde{p}(z) for all z.
• Sampling procedure (a small numerical sketch is given at the end of this recap)
– Generate a number z_0 from q(z).
– Generate a number u_0 from the uniform distribution over [0, k\, q(z_0)].
– If u_0 > \tilde{p}(z_0), reject the sample; otherwise accept it.

Recap: Importance Sampling
• Approach
– Approximate expectations E[f] = \int f(z)\, p(z)\, dz directly (but this does not enable drawing samples from p(z) itself).
• Idea
– Use a proposal distribution q(z) from which it is easy to sample.
– Express the expectation as a finite sum over samples \{z^{(l)}\} drawn from q(z):
  E[f] \approx \frac{1}{L} \sum_{l=1}^{L} \frac{p(z^{(l)})}{q(z^{(l)})}\, f(z^{(l)})
– The ratios p(z^{(l)})/q(z^{(l)}) are called importance weights.

Recap: MCMC – Markov Chain Monte Carlo
• Overview
– Allows sampling from a large class of distributions.
– Scales well with the dimensionality of the sample space.
• Idea
– We maintain a record of the current state z^{(\tau)}.
– The proposal distribution depends on the current state: q(z \mid z^{(\tau)}).
– The sequence of samples forms a Markov chain z^{(1)}, z^{(2)}, \dots
• Approach
– At each time step, we generate a candidate sample from the proposal distribution and accept the sample according to a criterion.
– Different variants of MCMC correspond to different acceptance criteria.
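To close the sampling recap, here is a minimal rejection-sampling sketch in Python/NumPy. The unnormalized target (a two-component Gaussian mixture), the uniform proposal q(z), and the bound k are illustrative assumptions, not taken from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def p_tilde(z):
    """Unnormalized target p~(z): a two-component Gaussian mixture (illustrative)."""
    return np.exp(-0.5 * (z - 1.0) ** 2) + 0.5 * np.exp(-(z + 2.0) ** 2)

# Proposal q(z): uniform on [-6, 6], i.e. q(z) = 1/12 on that interval.
lo, hi = -6.0, 6.0
q_const = 1.0 / (hi - lo)

# Choose k such that k * q(z) >= p~(z) for all z (grid-based bound, illustrative).
grid = np.linspace(lo, hi, 10001)
k = p_tilde(grid).max() / q_const * 1.01

def rejection_sample(n):
    samples = []
    while len(samples) < n:
        z0 = rng.uniform(lo, hi)            # draw z0 from q(z)
        u0 = rng.uniform(0.0, k * q_const)  # draw u0 uniformly on [0, k q(z0)]
        if u0 <= p_tilde(z0):               # accept iff u0 <= p~(z0)
            samples.append(z0)
    return np.array(samples)

zs = rejection_sample(5000)
print("sample mean:", zs.mean(), "sample std:", zs.std())
```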
Recap: MCMC – Metropolis Algorithm
• Metropolis algorithm [Metropolis et al., 1953]
– The proposal distribution is symmetric: q(z_A \mid z_B) = q(z_B \mid z_A).
– The new candidate sample z^\star is accepted with probability
  A(z^\star, z^{(\tau)}) = \min\left(1, \frac{\tilde{p}(z^\star)}{\tilde{p}(z^{(\tau)})}\right)
– New candidate samples are always accepted if \tilde{p}(z^\star) \ge \tilde{p}(z^{(\tau)}).
– The algorithm sometimes accepts a state with lower probability.
• Metropolis-Hastings algorithm
– Generalization: the proposal distribution is not necessarily symmetric.
– The new candidate sample z^\star is accepted with probability
  A_k(z^\star, z^{(\tau)}) = \min\left(1, \frac{\tilde{p}(z^\star)\, q_k(z^{(\tau)} \mid z^\star)}{\tilde{p}(z^{(\tau)})\, q_k(z^\star \mid z^{(\tau)})}\right)
  where k labels the members of the set of considered transitions.

Recap: Gibbs Sampling
• Approach
– An MCMC algorithm that is simple and widely applicable.
– May be seen as a special case of Metropolis-Hastings.
• Idea
– Sample variable-wise: replace z_i by a value drawn from the distribution p(z_i \mid z_{\setminus i}), i.e. update one coordinate at a time.
– Repeat the procedure either by cycling through all variables or by choosing the next variable at random.
• Properties
– The algorithm always accepts.
– Completely parameter-free.
– Can also be applied to subsets of variables.

Topics of This Lecture
• Regression
– Least-squares regression
– Polynomial regression
– Overfitting
– Maximum-likelihood regression
• Gaussian Processes: weight-space view
– Linear model
– MAP estimate
– Prediction
– Non-linear model
• Gaussian Processes: function-space view
– Definition
– Prediction with noise-free observations
– Prediction with noisy observations

From Classification to Regression
• We will leave the realm of classification and turn to a different task: regression, i.e. predicting a continuous function value.
• Example: fitting polynomials of increasing order to the same data set [figures from Bishop, 2006]:
– Polynomial of order 0 (constant value)
– Polynomial of order 1 (line)
– Polynomial of order 2 (quadratic)
– Polynomial of order 9: massive overfitting!
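A minimal sketch of this overfitting effect on a small synthetic data set. The noisy sine data, the noise level, and the use of numpy.polyfit are illustrative assumptions, not the data set from the figures:

```python
import numpy as np

rng = np.random.default_rng(1)

# Small synthetic training set: 10 noisy samples of sin(2*pi*x) (illustrative).
x = np.linspace(0.0, 1.0, 10)
y = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)

x_test = np.linspace(0.0, 1.0, 100)
y_true = np.sin(2.0 * np.pi * x_test)

for degree in (0, 1, 3, 9):
    coeffs = np.polyfit(x, y, deg=degree)      # least-squares polynomial fit
    y_pred = np.polyval(coeffs, x_test)
    rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
    print(f"degree {degree}: test RMSE = {rmse:.3f}, "
          f"max |coefficient| = {np.abs(coeffs).max():.1f}")

# The degree-9 fit typically interpolates the training points exactly, with huge
# coefficients and poor test error. (NumPy may even warn that this fit is poorly
# conditioned: exactly the numerical-instability issue discussed below.)
```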
From Classification to Regression
• In two-class classification with a discriminant function, our goal was to find a function y(x) such that
– y(x) > 0 for all data points of class 1,
– y(x) < 0 for all data points of class -1.
• Regression
– Given training data \{(x_1, y_1), \dots, (x_n, y_n)\}, where x_i is a training point with desired function value y_i.
– We want to find a function y : \mathbb{R}^d \to \mathbb{R}, \; x \mapsto y(x).
– It should fit the training data well, but also generalize!

From Classification to Regression
• Regression: find a function y : \mathbb{R}^d \to \mathbb{R}, \; x \mapsto y(x).
– This is a generalization of binary classification to arbitrary real target values.
– It suggests that some of our classification methods might be adapted to this setting.
• First things first: least squares.

Least-Squares Regression
• We are given
– Training data points: X = \{x_1, \dots, x_n\} \subset \mathbb{R}^d
– Associated function values: Y = \{y_1, \dots, y_n\} \subset \mathbb{R}
• Start with a linear regressor:
– Try to enforce x_i^T w + w_0 = y_i for all i = 1, \dots, n.
– This gives one linear equation for each training data point / label pair.
– It is the same basic setup as in least-squares classification, only the target values are now continuous.

Least-Squares Regression
• Setup
– Step 1: Define \tilde{x}_i = \begin{pmatrix} x_i \\ 1 \end{pmatrix} and \tilde{w} = \begin{pmatrix} w \\ w_0 \end{pmatrix}.
– Step 2: Rewrite the constraints as \tilde{x}_i^T \tilde{w} = y_i for all i = 1, \dots, n.
– Step 3: Matrix-vector notation: \tilde{X}^T \tilde{w} = y with \tilde{X} = [\tilde{x}_1, \dots, \tilde{x}_n] and y = (y_1, \dots, y_n)^T.
– Step 4: Find the least-squares solution: \|\tilde{X}^T \tilde{w} - y\|^2 \to \min.
– Solution: \tilde{w} = (\tilde{X}\tilde{X}^T)^{-1} \tilde{X}\, y (a small sketch of this formula is given at the end of this section).

Polynomial Regression
• How can we fit arbitrary polynomials using least-squares regression?
– We introduce a feature transformation, as before:
  y(x) = w^T \phi(x) = \sum_{i=0}^{M} w_i\, \phi_i(x), with basis functions \phi_i and \phi_0(x) = 1 assumed.
– E.g. \phi(x) = (1, x, x^2, x^3)^T fits a cubic polynomial.

Overfitting
• Example: polynomial of degree 9 [figures from Bishop, 2006]
– Relatively little data: overfitting is typical.
– Enough data: good estimate.

What Is Happening Here?
• The coefficients become very large when fitting the data from before with high-order polynomials [coefficient table from Bishop, 2006].

What Is Happening Here?
• Obvious problems
– Overfitting
– Numerical instability
• How can we address these in a principled way?
– We use the probabilistic notions that we have been using throughout this lecture.
• First step: least-squares regression as maximum-likelihood estimation.
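Before turning to the probabilistic view, here is a small sketch of the closed-form least-squares solution \tilde{w} = (\tilde{X}\tilde{X}^T)^{-1}\tilde{X}\, y from the slides above. The random data set is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(2)

# Illustrative data: n points in R^d with targets from a known linear function plus noise.
n, d = 50, 3
X = rng.normal(size=(d, n))                  # columns are the training points x_i
w_true, w0_true = np.array([1.5, -2.0, 0.5]), 0.7
y = X.T @ w_true + w0_true + rng.normal(scale=0.1, size=n)

# Augment each x_i with a constant 1 to absorb the bias w0 (x_tilde = [x; 1]).
X_tilde = np.vstack([X, np.ones((1, n))])    # shape (d+1, n)

# Least-squares solution w_tilde = (X_tilde X_tilde^T)^{-1} X_tilde y.
w_tilde = np.linalg.solve(X_tilde @ X_tilde.T, X_tilde @ y)
print("estimated [w, w0]:", w_tilde)         # close to [1.5, -2.0, 0.5, 0.7]
```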
Probabilistic Regression
• First assumption: our target function values y are generated by adding noise to the function estimate:
  y = f(x; w) + \epsilon
– y: target function value, f: regression function (previously written y(\cdot)), x: input value, w: weights or parameters, \epsilon: noise.
• Second assumption: the noise is Gaussian distributed:
  p(y \mid x, w, \beta) = \mathcal{N}(y \mid f(x; w), \beta^{-1})
– with mean f(x; w) and variance \beta^{-1} (\beta is the precision).

Probabilistic Regression
• Given
– Training data points: X = [x_1, \dots, x_n] \in \mathbb{R}^{d \times n}
– Associated function values: y = (y_1, \dots, y_n)^T
• Conditional likelihood (assuming i.i.d. data):
  p(y \mid X, w, \beta) = \prod_{i=1}^{n} \mathcal{N}(y_i \mid f(x_i; w), \beta^{-1}) = \prod_{i=1}^{n} \mathcal{N}(y_i \mid w^T \phi(x_i), \beta^{-1})
– using the generalized linear regression function f(x_i; w) = w^T \phi(x_i).
• Maximize this with respect to w and \beta.

Maximum Likelihood Regression
• Simplify the log-likelihood:
  \log p(y \mid X, w, \beta) = \sum_{i=1}^{n} \log \mathcal{N}(y_i \mid w^T \phi(x_i), \beta^{-1})
  = \sum_{i=1}^{n} \left[ \log \sqrt{\frac{\beta}{2\pi}} - \frac{\beta}{2} (y_i - w^T \phi(x_i))^2 \right]
  = \frac{n}{2} \log \beta - \frac{n}{2} \log(2\pi) - \frac{\beta}{2} \sum_{i=1}^{n} (y_i - w^T \phi(x_i))^2
• Gradient with respect to w:
  \nabla_w \log p(y \mid X, w, \beta) = \beta \sum_{i=1}^{n} (y_i - w^T \phi(x_i))\, \phi(x_i)

Maximum Likelihood Regression
• Setting the gradient to zero:
  0 = \beta \sum_{i=1}^{n} (y_i - w^T \phi(x_i))\, \phi(x_i)
  \Leftrightarrow \sum_{i=1}^{n} y_i\, \phi(x_i) = \left[ \sum_{i=1}^{n} \phi(x_i)\phi(x_i)^T \right] w
  \Leftrightarrow \Phi y = \Phi\Phi^T w
  \Leftrightarrow w_{\mathrm{ML}} = (\Phi\Phi^T)^{-1} \Phi y, \quad \Phi = [\phi(x_1), \dots, \phi(x_n)]
• This is the same solution as in least-squares regression!

Regression – Two Common Approaches
• Restrict the class of functions that we consider
– E.g. linear functions of the input, or of non-linear features \phi(x).
– Solve with least-squares fitting or a maximum-likelihood (ML) estimate (see the previous slides; a short sketch follows below).
• Bayesian modeling
– Place a prior probability p(f) over all possible functions.
– f: possible functions, X: input data, y: output values.
– Calculate the MAP (maximum a posteriori) estimate from the posterior
  p(f \mid y, X) = \frac{p(y \mid X, f)\, p(f)}{p(y \mid X)}
– i.e. the likelihood of the observations times the prior over functions, divided by the normalization constant.

Visualization of Bayesian Modeling
• 1D example [figure from Rasmussen & Williams, 2006]
• Left: visualization of the prior p(f), defined by a Gaussian process.
– 4 function samples are drawn from the prior distribution.
– The point-wise mean is zero.
– Grey area: point-wise variance over all function values.
• Right: posterior given 2 observations (with no uncertainty in their values).
– Grey area: point-wise variance of the remaining function values.

(Outline recap: next up are Gaussian Processes from the weight-space view: linear model, MAP estimate, prediction, non-linear model.)
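Before moving to the weight-space view, a minimal sketch of the maximum-likelihood fit above. The cubic basis and the synthetic data are illustrative assumptions; the noise-precision estimate 1/\beta_{\mathrm{ML}} = \frac{1}{n}\sum_i (y_i - w_{\mathrm{ML}}^T \phi(x_i))^2 follows from setting the \beta-derivative of the log-likelihood to zero:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative 1-D data: noisy samples of a smooth function.
n = 30
x = rng.uniform(-1.0, 1.0, size=n)
y = np.sin(2.0 * x) + rng.normal(scale=0.15, size=n)

def phi(x):
    """Cubic polynomial basis phi(x) = (1, x, x^2, x^3)^T (illustrative choice)."""
    return np.stack([np.ones_like(x), x, x ** 2, x ** 3])

Phi = phi(x)                                   # shape (4, n), columns phi(x_i)

# w_ML = (Phi Phi^T)^{-1} Phi y  -- identical to the least-squares solution.
w_ml = np.linalg.solve(Phi @ Phi.T, Phi @ y)

# 1 / beta_ML = mean squared residual (from d/d(beta) log-likelihood = 0).
residuals = y - Phi.T @ w_ml
beta_ml = 1.0 / np.mean(residuals ** 2)

print("w_ML =", w_ml)
print("estimated noise std = %.3f (noise std used above: 0.15)" % (1.0 / np.sqrt(beta_ml)))
```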
Gaussian Process (Informal Introduction)
• Gaussian distribution
– A probability distribution over scalars / vectors.
• Gaussian process (a generalization of the Gaussian distribution)
– Describes properties of functions.
– Function: think of a function as a very long vector in which each entry specifies the function value f(x_i) at a particular point x_i.
– Issue: how do we deal with an infinite number of points?
– If you ask only for the properties of the function at a finite number of points, then inference in the Gaussian process gives you the same answer if you ignore the infinitely many other points.

Gaussian Process
• Example prior over functions p(f) [figure from Rasmussen & Williams, 2006]
– Represents our prior belief about functions before seeing any data.
– Although the specific sampled functions do not have zero mean, the mean of the f(x) values for any fixed x is zero (here).
– Favors smooth functions, i.e. functions that cannot vary too rapidly; this smoothness is induced by the covariance function of the Gaussian process.
– Learning in Gaussian processes mainly amounts to finding suitable properties of the covariance function.

Standard Linear Model
• Linear regression model with Gaussian noise:
  f(x) = x^T w, \quad y = f(x) + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma_n^2)
• Calculation of the likelihood
– Given input data X = (x_1, \dots, x_n) with corresponding output values y_1, \dots, y_n:
  p(y \mid w, X) = \prod_{i=1}^{n} p(y_i \mid x_i, w)

Linear Model – Likelihood
• Likelihood:
  p(y \mid w, X) = \prod_{i=1}^{n} p(y_i \mid x_i, w)
  = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}\,\sigma_n} \exp\left\{ -\frac{(y_i - x_i^T w)^2}{2\sigma_n^2} \right\}
  = \frac{1}{(2\pi\sigma_n^2)^{n/2}} \exp\left\{ -\frac{|y - X^T w|^2}{2\sigma_n^2} \right\}
• In short: p(y \mid w, X) = \mathcal{N}(X^T w, \sigma_n^2 I).

Linear Model – MAP Estimate
• Inference in the Bayesian model: calculate the posterior distribution over the weights.
  p(w \mid y, X) = \frac{p(y \mid w, X)\, p(w \mid X)\, p(X)}{p(y \mid X)\, p(X)} = \frac{p(y \mid w, X)\, p(w \mid X)}{p(y \mid X)} = \frac{\text{likelihood} \times \text{prior}}{\text{marginal likelihood}}
– Likelihood: p(y \mid w, X) = \mathcal{N}(X^T w, \sigma_n^2 I)
– Prior, e.g.: p(w) = \mathcal{N}(0, \Sigma_p)
– Marginal likelihood (normalization constant):
  p(y \mid X) = \int p(y \mid w, X)\, p(w)\, dw

Linear Model – MAP Estimate
• Posterior:
  p(w \mid y, X) \propto p(y \mid w, X)\, p(w)
  = \exp\left\{ -\frac{1}{2\sigma_n^2} (y - X^T w)^T (y - X^T w) \right\} \exp\left\{ -\frac{1}{2} w^T \Sigma_p^{-1} w \right\}
  \propto \exp\left\{ -\frac{1}{2} (w - \bar{w})^T \left( \frac{1}{\sigma_n^2} X X^T + \Sigma_p^{-1} \right) (w - \bar{w}) \right\}
  with A = \frac{1}{\sigma_n^2} X X^T + \Sigma_p^{-1} and \bar{w} = \frac{1}{\sigma_n^2} A^{-1} X y.
• MAP estimate
– It is simply the mean of p(w \mid y, X) = \mathcal{N}(\bar{w}, A^{-1}).

Linear Model: Predictions
• Predictions for a test case in the Bayesian model
– Average over all possible parameter values, weighted by their posterior probability.
– (Non-Bayesian alternative: a single parameter value chosen by some criterion, e.g. ML.)
• Predictive distribution for f_\star = f(x_\star) at a test point x_\star
– Given by averaging over all possible models:
  p(f_\star \mid x_\star, y, X) = \int p(f_\star \mid x_\star, w)\, p(w \mid y, X)\, dw
  = \mathcal{N}\left( \frac{1}{\sigma_n^2} x_\star^T A^{-1} X y, \; x_\star^T A^{-1} x_\star \right)
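A minimal numeric sketch of these weight-space formulas, assuming the same kind of setup as the 1-D example that follows (f(x) = w_1 + w_2 x, prior \mathcal{N}(0, I), \sigma_n = 1); the specific training points are an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(4)

# Illustrative setup: f(x) = w1 + w2 * x, i.e. inputs augmented with a constant 1.
sigma_n = 1.0                      # assumed noise standard deviation
Sigma_p = np.eye(2)                # assumed prior covariance on w

# Three training inputs (columns of X are (1, x_i)^T) and noisy targets.
x_train = np.array([-1.0, 0.5, 2.0])
X = np.vstack([np.ones_like(x_train), x_train])     # shape (2, 3)
w_true = np.array([0.5, -1.0])
y = X.T @ w_true + rng.normal(scale=sigma_n, size=3)

# Posterior over weights: A = (1/sigma_n^2) X X^T + Sigma_p^{-1},  w_bar = (1/sigma_n^2) A^{-1} X y.
A = X @ X.T / sigma_n**2 + np.linalg.inv(Sigma_p)
A_inv = np.linalg.inv(A)
w_bar = A_inv @ X @ y / sigma_n**2

# Predictive distribution at a test point x*: mean x*^T w_bar, variance x*^T A^{-1} x*.
x_star = np.array([1.0, 1.5])      # test input 1.5, augmented with the constant 1
mean = x_star @ w_bar
var = x_star @ A_inv @ x_star
print(f"predictive mean = {mean:.3f}, predictive std = {np.sqrt(var):.3f}")
```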
Linear Model: Predictions
• Predictive distribution:
  p(f_\star \mid x_\star, y, X) = \mathcal{N}\left( \frac{1}{\sigma_n^2} x_\star^T A^{-1} X y, \; x_\star^T A^{-1} x_\star \right)
– The predictive distribution is again Gaussian.
– Mean: x_\star^T \bar{w}, i.e. it uses the MAP estimate of the weight vector, \bar{w} = \frac{1}{\sigma_n^2} A^{-1} X y.
– Variance: x_\star^T A^{-1} x_\star, a quadratic form of the test input x_\star with the posterior covariance matrix A^{-1}.

Linear Model: Predictions
• 1D example: f(x) = w_1 + w_2 x [figure from Rasmussen & Williams, 2006]
– 3 training points (crosses): X = (x_1, x_2, x_3); assumed noise level \sigma_n = 1.
– Shown: predictive mean (solid line) and predictive standard deviation (dotted lines).
– Likelihood: p(y \mid w, X) = \mathcal{N}(X^T w, \sigma_n^2 I); prior: p(w) = \mathcal{N}(0, I); posterior: p(w \mid y, X) = \mathcal{N}(\bar{w}, A^{-1}).

Non-Linear Model
• Map the D-dimensional input x into an N-dimensional feature space: x \to \phi(x).
• Perform linear regression in the N-dimensional feature space: f(x) = \phi(x)^T w.
• Non-linear model: the previous analysis applies analogously after replacing X by \Phi(X) = (\phi(x_1), \dots, \phi(x_n)):
  p(f_\star \mid x_\star, y, X) = \mathcal{N}\left( \frac{1}{\sigma_n^2} \phi(x_\star)^T A^{-1} \Phi(X)\, y, \; \phi(x_\star)^T A^{-1} \phi(x_\star) \right)
  with A = \frac{1}{\sigma_n^2} \Phi(X)\Phi(X)^T + \Sigma_p^{-1}.

(Outline recap: next up are Gaussian Processes from the function-space view: definition, prediction with noise-free observations, prediction with noisy observations.)

Function Space View
• Function-space view
– We now derive the above results by performing inference directly in function space.
– We use a Gaussian process to describe a distribution over functions.
• Definition
– A Gaussian process (GP) is a collection of random variables, any finite number of which have a joint Gaussian distribution.

Gaussian Process
• A Gaussian process is completely defined by
– its mean function m(x): m(x) = \mathbb{E}[f(x)]
– and its covariance function k(x, x'): k(x, x') = \mathbb{E}[(f(x) - m(x))(f(x') - m(x'))]
• We write the Gaussian process (GP) as
  f(x) \sim \mathcal{GP}(m(x), k(x, x'))

Gaussian Process
• Property
– Being defined as a collection of random variables implies consistency.
– Consistency means: if the GP specifies e.g.
  (y_1, y_2) \sim \mathcal{N}(\mu, \Sigma), \quad \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix},
  then it must also specify y_1 \sim \mathcal{N}(\mu_1, \Sigma_{11}).
– I.e. examination of a larger set of variables does not change the distribution of a smaller set.

Gaussian Process: Example
• Example: the Bayesian linear regression model f(x) = \phi(x)^T w with Gaussian prior w \sim \mathcal{N}(0, \Sigma_p) is a Gaussian process:
– Mean: \mathbb{E}[f(x)] = \phi(x)^T \mathbb{E}[w] = 0
– Covariance: \mathbb{E}[f(x) f(x')] = \phi(x)^T \mathbb{E}[w w^T]\, \phi(x') = \phi(x)^T \Sigma_p\, \phi(x')
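A minimal sketch of this example: with a small polynomial basis (an illustrative choice), the induced covariance function k(x, x') = \phi(x)^T \Sigma_p\, \phi(x') can be evaluated on any finite set of inputs, and by the consistency property a joint Gaussian draw with that covariance matrix already yields valid function samples at exactly those points:

```python
import numpy as np

rng = np.random.default_rng(5)

def phi(x):
    """Illustrative basis phi(x) = (1, x, x^2)^T."""
    return np.stack([np.ones_like(x), x, x ** 2])

Sigma_p = np.eye(3)                                 # assumed prior covariance on w

def k(xa, xb):
    """Covariance of the induced GP: k(x, x') = phi(x)^T Sigma_p phi(x')."""
    return phi(xa).T @ Sigma_p @ phi(xb)

# Evaluate the GP at a finite set of input points and draw 3 function samples.
xs = np.linspace(-1.0, 1.0, 50)
K = k(xs, xs) + 1e-8 * np.eye(len(xs))              # small jitter for numerical stability
samples = rng.multivariate_normal(np.zeros(len(xs)), K, size=3)
print("shape of the sampled function values:", samples.shape)   # (3, 50)
```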
Gaussian Process: Squared Exponential
• A typical covariance function: the squared exponential (SE)
– The covariance function specifies the covariance between pairs of random variables:
  \mathrm{cov}(f(x_p), f(x_q)) = k(x_p, x_q) = \exp\left\{ -\tfrac{1}{2} |x_p - x_q|^2 \right\}
• Remarks
– The covariance between the outputs is written as a function of the inputs.
– The squared exponential covariance function corresponds to a Bayesian linear regression model with an infinite number of basis functions.
– For any positive definite covariance function k(\cdot,\cdot), there exists a (possibly infinite) expansion in terms of basis functions.

Gaussian Process: Prior over Functions
• Distribution over functions
– Specifying the covariance function implies a distribution over functions.
– I.e. we can draw samples from the distribution over functions evaluated at a (finite) number of points.
• Procedure
– Choose a number of input points X_\star.
– Write out the corresponding covariance matrix (e.g. using the SE covariance function) element-wise: K(X_\star, X_\star).
– Then generate a random Gaussian vector with this covariance matrix:
  f_\star \sim \mathcal{N}(0, K(X_\star, X_\star))
– [Figure from Rasmussen & Williams, 2006: three functions sampled this way.]

Prediction with Noise-free Observations
• Assume our observations are noise-free:
  \{(x_i, f_i) \mid i = 1, \dots, n\}
• Joint distribution of the training outputs f and the test outputs f_\star according to the prior:
  \begin{bmatrix} f \\ f_\star \end{bmatrix} \sim \mathcal{N}\left( 0, \begin{bmatrix} K(X, X) & K(X, X_\star) \\ K(X_\star, X) & K(X_\star, X_\star) \end{bmatrix} \right)
– K(X, X_\star) contains the covariances for all pairs of training and test points.
• To get the posterior (after including the observations)
– We need to restrict the above prior to contain only those functions that agree with the observed values.
– Think of generating functions from the prior and rejecting those that disagree with the observations (obviously prohibitively expensive).

Prediction with Noise-free Observations
• Calculation of the posterior
– This corresponds to conditioning the joint Gaussian prior distribution on the observations:
  f_\star \mid X_\star, X, f \sim \mathcal{N}(\bar{f}_\star, \mathrm{cov}(f_\star))
  with
  \bar{f}_\star = K(X_\star, X)\, K(X, X)^{-1} f
  \mathrm{cov}(f_\star) = K(X_\star, X_\star) - K(X_\star, X)\, K(X, X)^{-1} K(X, X_\star)
– (A closing sketch of these formulas is given after the references.)

Prediction with Noise-free Observations
• Example [figure from Rasmussen & Williams, 2006]
– Left: the prior.
– Right: the posterior using 5 noise-free observations.

References and Further Reading
• Gaussian Processes are (briefly) described in Chapter 6.4 of Bishop's book:
– Christopher M. Bishop, Pattern Recognition and Machine Learning, Springer, 2006.
• A better introduction can be found in Chapters 1 and 2 of the book by Rasmussen & Williams (also available online at http://www.gaussianprocess.org/gpml/):
– Carl E. Rasmussen, Christopher K.I. Williams, Gaussian Processes for Machine Learning, MIT Press, 2006.
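To close, a minimal sketch of the noise-free prediction formulas from the function-space view, using the squared-exponential covariance and a small synthetic training set (both illustrative assumptions):

```python
import numpy as np

def k(xa, xb):
    """Squared-exponential covariance k(x_p, x_q) = exp(-0.5 * |x_p - x_q|^2)."""
    return np.exp(-0.5 * (xa[:, None] - xb[None, :]) ** 2)

# Noise-free training observations (illustrative): f_i = sin(x_i) at 5 points.
X = np.array([-4.0, -2.0, 0.0, 1.0, 3.0])
f = np.sin(X)

# Test inputs at which we want the posterior over f_*.
X_star = np.linspace(-5.0, 5.0, 9)

K = k(X, X)
K_s = k(X, X_star)                 # K(X, X_*)
K_ss = k(X_star, X_star)           # K(X_*, X_*)

# Posterior by conditioning the joint Gaussian on f:
#   mean = K(X_*, X) K(X, X)^{-1} f
#   cov  = K(X_*, X_*) - K(X_*, X) K(X, X)^{-1} K(X, X_*)
mean = K_s.T @ np.linalg.solve(K, f)
cov = K_ss - K_s.T @ np.linalg.solve(K, K_s)

for xs, m, v in zip(X_star, mean, np.diag(cov)):
    print(f"x* = {xs:5.2f}   mean = {m:6.3f}   std = {np.sqrt(max(v, 0.0)):.3f}")
# At the training inputs the posterior mean interpolates f exactly and the std is ~0,
# mirroring the noise-free posterior shown in the example figure above.
```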