Lecture 6 – Data mining and statistical learning (IDA.LiU.se)
Overview
• Basis expansion
• Splines
• (Natural) cubic splines
• Smoothing splines
• Nonparametric logistic regression
• Multidimensional splines
• Wavelets

Linear basis expansion (1)
Linear regression:

x1    x2    x3    y
1     -3    6     12
…     …     …     …

True model: $y = f(x) = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3$
Question: How do we find $\hat{f}$?
Answer: Solve a system of linear equations to obtain $\hat\beta_1, \hat\beta_2, \hat\beta_3$.

Linear basis expansion (2)
Nonlinear model:

x1    x2    x3    y
1     -3    -1    12
…     …     …     …

True model: $y = \beta_1 x_1 x_2 + \beta_2 x_2 e^{x_3} + \beta_3 \sin x_3 + \beta_4 x_1^2$
Question: How do we find $\hat{f}$?
Answer: A) Introduce new variables $u_1 = x_1 x_2$, $u_2 = x_2 e^{x_3}$, $u_3 = \sin x_3$, $u_4 = x_1^2$.

Linear basis expansion (3)
B) Transform the data set:

u1    u2     u3     u4    y
-3    -1.1   -0.84  1     12
…     …      …      …     …

True model: $y = \beta_1 u_1 + \beta_2 u_2 + \beta_3 u_3 + \beta_4 u_4$
C) Apply linear regression to obtain $\hat\beta_1, \hat\beta_2, \hat\beta_3, \hat\beta_4$.

Linear basis expansion (4)
Conclusion: We can easily fit any model of the type
$$f(X) = \sum_{m=1}^{M} \beta_m h_m(X),$$
i.e., we can easily undertake a linear basis expansion in $X$.
Example: If the model is known to be nonlinear, but the exact form is unknown, we can try to introduce interaction terms:
$$f(X) = \beta_1 X_1 + \dots + \beta_p X_p + \beta_{11} X_1^2 + \beta_{12} X_1 X_2 + \dots$$

Piecewise polynomial functions
Assume $X$ is one-dimensional.
Def. Assume the domain $[a, b]$ of $X$ is split into intervals $[a, \xi_1], [\xi_1, \xi_2], \dots, [\xi_n, b]$. Then $f(X)$ is said to be piecewise polynomial if $f(X)$ is represented by a separate polynomial on each interval.
Note. The points $\xi_1, \dots, \xi_n$ are called knots.

Piecewise polynomials
Example. Continuous piecewise linear function.
Alternative A. Introduce a linear function on each interval together with continuity constraints at the knots:
$$f_i(x) = \alpha_i + \beta_i x, \quad i = 1, 2, 3, \qquad f_1(\xi_1) = f_2(\xi_1), \quad f_2(\xi_2) = f_3(\xi_2)$$
(6 coefficients minus 2 constraints = 4 free parameters)
[Figure 5.1, lower left panel]
Alternative B. Use a basis expansion (again 4 free parameters):
$$h_1(X) = 1, \quad h_2(X) = X, \quad h_3(X) = (X - \xi_1)_+, \quad h_4(X) = (X - \xi_2)_+$$
Theorem. The two formulations are equivalent.

Splines
Definition. A piecewise polynomial is called an order-$M$ spline if it has continuous derivatives up to order $M-2$ at the knots.
Alternative definition. An order-$M$ spline is a function that can be represented by the basis functions ($K$ = number of knots)
$$h_j(X) = X^{j-1}, \quad j = 1, \dots, M, \qquad h_{M+l}(X) = (X - \xi_l)_+^{M-1}, \quad l = 1, \dots, K.$$
Theorem. The two definitions are equivalent.
Terminology. An order-4 spline is called a cubic spline.
[Figure 5.2, lower right panel: look at the basis and compare the number of free parameters]
Note. For cubic splines, the discontinuity at the knots is not visible to the eye.

Variance of spline estimators – boundary effects
[Figure 5.3]

Natural cubic spline
Def. A cubic spline $f$ is called a natural cubic spline if its 2nd and 3rd derivatives are zero at $a$ and $b$.
Note. This implies that $f$ is linear on the two extreme intervals.
Basis functions of natural cubic splines:
$$N_1(X) = 1, \quad N_2(X) = X, \quad N_{k+2}(X) = d_k(X) - d_{K-1}(X), \quad k = 1, \dots, K-2,$$
where
$$d_k(X) = \frac{(X - \xi_k)_+^3 - (X - \xi_K)_+^3}{\xi_K - \xi_k}.$$

Fitting smooth functions to data
Minimize a penalized sum of squared residuals
$$\mathrm{RSS}(f, \lambda) = \sum_{i=1}^{N} \left( y_i - f(x_i) \right)^2 + \lambda \int \left( f''(t) \right)^2 dt,$$
where $\lambda$ is the smoothing parameter.
λ = 0: any function interpolating the data minimizes RSS.
λ = +∞: the least-squares straight-line fit.
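Before turning to smoothing splines, the basis-expansion recipe above can be made concrete: a minimal numpy sketch that fits a cubic spline by ordinary least squares on the truncated power basis $h_j(X) = X^{j-1}$, $h_{M+l}(X) = (X - \xi_l)_+^{M-1}$. The synthetic data, knot placement, and noise level are illustrative assumptions, not part of the lecture.

```python
import numpy as np

def truncated_power_basis(x, knots, M=4):
    """Basis matrix for an order-M spline (M=4: cubic spline):
    h_j(x) = x**(j-1) for j = 1..M, and h_{M+l}(x) = (x - xi_l)_+**(M-1)."""
    cols = [x**j for j in range(M)]                               # 1, x, ..., x^(M-1)
    cols += [np.maximum(x - xi, 0.0)**(M - 1) for xi in knots]    # truncated powers
    return np.column_stack(cols)

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 100))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.size)   # noisy nonlinear signal

# Step A/B: expand the inputs; step C: plain linear least squares on the new variables
H = truncated_power_basis(x, knots=[0.25, 0.5, 0.75])
beta, *_ = np.linalg.lstsq(H, y, rcond=None)
y_hat = H @ beta                                                  # fitted cubic spline
```

The point of the sketch is that once the basis columns are built, the fit itself is ordinary linear regression, exactly as in the "Linear basis expansion" slides.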
Optimality of smoothing splines
Theorem. The function $f$ minimizing RSS for a given $\lambda$ is a natural cubic spline with knots at all unique values of $x_i$ (NOTE: $N$ knots!).
The optimal spline can be computed as follows:
$$f(x) = \sum_{j=1}^{N} N_j(x)\,\theta_j = N(x)^T \theta$$
$$\mathrm{RSS}(\theta, \lambda) = (y - N\theta)^T (y - N\theta) + \lambda\, \theta^T \Omega_N \theta,$$
where $N_{ij} = N_j(x_i)$ and $(\Omega_N)_{jk} = \int N_j''(t)\, N_k''(t)\, dt$
$$\hat\theta = (N^T N + \lambda \Omega_N)^{-1} N^T y$$

A smoothing spline is a linear smoother
The fitted function
$$\hat{f} = N (N^T N + \lambda \Omega_N)^{-1} N^T y = S_\lambda y$$
is linear in the response values.

Degrees of freedom of smoothing splines
The effective degrees of freedom is $df_\lambda = \mathrm{trace}(S_\lambda)$, i.e., the sum of the diagonal elements of $S_\lambda$.

Smoothing splines and eigenvectors
It can be shown that
$$S_\lambda = (I + \lambda K)^{-1},$$
where $K$ is the so-called penalty matrix. Furthermore, the eigen-decomposition is
$$S_\lambda = \sum_{k=1}^{N} \rho_k(\lambda)\, u_k u_k^T, \qquad \rho_k(\lambda) = \frac{1}{1 + \lambda d_k}.$$
Note: $d_k$ and $u_k$ are the eigenvalues and eigenvectors, respectively, of $K$.

Smoothing splines and shrinkage
$$S_\lambda y = \sum_{k=1}^{N} \rho_k(\lambda)\, u_k \langle u_k, y \rangle$$
• A smoothing spline decomposes the vector $y$ with respect to the basis of eigenvectors and shrinks the respective contributions.
• The eigenvectors, ordered by decreasing $\rho_k$, increase in complexity. The higher the complexity, the more the contribution is shrunk.

Smoothing splines and local curve fitting
• The shrinkage factors $\rho_k(\lambda)$ are decreasing functions of $\lambda$: the higher $\lambda$, the stronger the penalization.
• The smoother matrix has a banded nature -> a local fitting method.
• $df_\lambda = \mathrm{trace}(S_\lambda) = \sum_{k=1}^{N} \frac{1}{1 + \lambda d_k}$
[Figure 5.8]

Fitting smoothing splines in practice (1)
Reinsch form: $S_\lambda = (I + \lambda K)^{-1}$
Theorem. If $f$ is a natural cubic spline with vector of values $f$ at the knots and vector of second derivatives $\gamma$ at the knots, then $Q^T f = R\gamma$, where $Q$ and $R$ are band matrices that depend on $\xi$ only.
Theorem. $K = Q R^{-1} Q^T$

Fitting smoothing splines in practice (2)
Reinsch algorithm:
• Evaluate $Q^T y$.
• Compute $R + \lambda Q^T Q$ and find its Cholesky decomposition (in linear time!).
• Solve the matrix equation for $\gamma$ (in linear time!).
• Obtain $f = y - \lambda Q \gamma$.

Automated selection of smoothing parameters (1)
What can be selected:
Regression splines
• Degree of the spline
• Placement of the knots -> MARS procedure
Smoothing spline
• Penalization parameter $\lambda$

Automated selection of smoothing parameters (2)
Fixing the degrees of freedom:
$$df_\lambda = \mathrm{trace}(S_\lambda) = \sum_{k=1}^{N} \frac{1}{1 + \lambda d_k}$$
• If we fix $df_\lambda$, we can find $\lambda$ by solving this equation numerically.
• One could try two different values of $df_\lambda$ and choose one based on F-tests, residual plots, etc.

Automated selection of smoothing parameters (3)
The bias-variance trade-off:
$$df_\lambda = \mathrm{trace}(S_\lambda) = \sum_{k=1}^{N} \frac{1}{1 + \lambda d_k}$$
[Figure 5.9: EPE = expected (integrated) squared prediction error, CV = cross-validation]
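The Reinsch form $S_\lambda = (I + \lambda K)^{-1}$, the identity $df_\lambda = \sum_k 1/(1 + \lambda d_k)$, and the "fix $df_\lambda$, solve for $\lambda$" recipe can all be checked numerically. The sketch below uses a discrete stand-in for the penalty matrix $K$, namely $K = D^T D$ with $D$ the second-difference operator (a Whittaker-type smoother), rather than the exact spline penalty; the sample size, $\lambda$, and target df are arbitrary choices.

```python
import numpy as np
from scipy.optimize import brentq

def smoother_matrix(n, lam):
    """Discrete analogue of the Reinsch form S_lambda = (I + lam*K)^{-1},
    with K = D^T D and D the second-difference operator, so lam*||D f||^2
    stands in for the curvature penalty lam * int f''(t)^2 dt."""
    D = np.diff(np.eye(n), n=2, axis=0)          # (n-2) x n second-difference matrix
    return np.linalg.inv(np.eye(n) + lam * (D.T @ D))

n, lam = 50, 5.0
S = smoother_matrix(n, lam)
df = np.trace(S)                                 # effective degrees of freedom df_lambda

# Check df_lambda = sum_k 1/(1 + lam*d_k), with d_k the eigenvalues of K
D = np.diff(np.eye(n), n=2, axis=0)
d = np.linalg.eigvalsh(D.T @ D)
assert np.isclose(df, np.sum(1.0 / (1.0 + lam * d)))

# Fix df_lambda and solve for lambda numerically: df is monotone decreasing in
# lambda and here ranges over (2, n), so pick a target inside that interval
target_df = 10.0
g = lambda log_lam: np.trace(smoother_matrix(n, 10.0**log_lam)) - target_df
lam_star = 10.0 ** brentq(g, -8.0, 8.0)
```

Note that two eigenvalues of $K$ are zero (constant and linear sequences are unpenalized), which is why $df_\lambda \to 2$ as $\lambda \to \infty$: the straight-line fit survives, mirroring the λ = +∞ case on the earlier slide.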
Nonparametric logistic regression
Logistic regression model:
$$\log \frac{\Pr(Y = 1 \mid X = x)}{\Pr(Y = 0 \mid X = x)} = f(x)$$
Note: $X$ is one-dimensional.
What can $f$ be?
• Linear -> ordinary logistic regression (Chapter 4)
• Sufficiently smooth -> nonparametric logistic regression (splines and others)
• Other choices are possible

Nonparametric logistic regression
Problem formulation: maximize the penalized log-likelihood
$$\ell_p(f, \lambda) = \ell(f) - \frac{\lambda}{2} \int \left( f''(t) \right)^2 dt$$
Good news: the solution is still a natural cubic spline.
Bad news: there is no closed-form expression for that spline.

Nonparametric logistic regression
How to proceed? Use Newton-Raphson to compute the spline numerically:
• Compute the gradient $\partial \ell_p / \partial \theta$ and the Hessian $\partial^2 \ell_p / \partial \theta\, \partial \theta^T$ analytically.
1. Compute the Newton direction using the current parameter value and the derivative information.
2. Compute the new parameter value from the old one via the update formula
$$\theta^{new} = \theta^{old} - \left( \frac{\partial^2 \ell_p}{\partial \theta\, \partial \theta^T} \right)^{-1} \frac{\partial \ell_p}{\partial \theta}$$

Multidimensional splines
How to fit data smoothly in higher dimensions?
A) Take bases of one-dimensional functions and produce a basis by tensor products:
$$g_{jk}(X) = h_{1j}(X_1)\, h_{2k}(X_2), \qquad g(X) = \sum_{j,k} \theta_{jk}\, g_{jk}(X)$$
Problem: exponential growth of the basis with the dimension.
[Figure 5.10]

Multidimensional splines
How to fit data smoothly in higher dimensions?
B) Formulate a new problem:
$$\min_f \sum_i \left( y_i - f(x_i) \right)^2 + \lambda J[f]$$
• The solution is a thin-plate spline.
• It has similar limiting properties as in one dimension (λ = 0 gives an interpolating function).
• In two dimensions the solution is essentially a sum of radial basis functions:
$$f(x) = \beta_0 + \beta^T x + \sum_j \alpha_j\, h_j(x), \qquad h_j(x) = \eta(\lVert x - x_j \rVert), \quad \eta(z) = z^2 \log z^2$$

Wavelets
Introduction
• The idea: fit a bumpy function by removing noise.
• Application areas: signal processing, compression.
• How it works: the function is represented in a basis of bumpy functions, and the small coefficients are filtered out.

Wavelets
Basis functions (Haar wavelets, Symmlet-8 wavelets)
[Figure 5.13]

Wavelets
Example
[Figure 5.14]
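To make the "represent in a bumpy basis, filter the small coefficients, reconstruct" recipe concrete, here is a self-contained numpy sketch of Haar wavelet denoising. The blocky test signal, the noise level sigma, the hard-thresholding rule, and the universal threshold $\sigma\sqrt{2 \log n}$ are illustrative choices, not taken from the lecture.

```python
import numpy as np

def haar_forward(x):
    """Full orthonormal Haar decomposition of a length-2^J signal."""
    details, approx = [], np.asarray(x, dtype=float)
    while approx.size > 1:
        even, odd = approx[0::2], approx[1::2]
        details.append((even - odd) / np.sqrt(2.0))   # detail ("bumpy") coefficients
        approx = (even + odd) / np.sqrt(2.0)          # coarser approximation
    return approx, details[::-1]                      # coarsest detail level first

def haar_inverse(approx, details):
    """Invert haar_forward by repeatedly undoing the averaging/differencing."""
    x = approx
    for d in details:
        up = np.empty(2 * x.size)
        up[0::2] = (x + d) / np.sqrt(2.0)
        up[1::2] = (x - d) / np.sqrt(2.0)
        x = up
    return x

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 256)
sigma = 0.2
y = np.sign(np.sin(4 * np.pi * t)) + rng.normal(scale=sigma, size=t.size)

approx, details = haar_forward(y)
thr = sigma * np.sqrt(2.0 * np.log(y.size))                      # universal threshold
details = [np.where(np.abs(d) > thr, d, 0.0) for d in details]   # filter small coefficients
y_hat = haar_inverse(approx, details)                            # denoised reconstruction
```

Because the Haar basis is itself piecewise constant, the reconstruction tracks the jumps in the blocky signal far better than a smoothing spline with comparable degrees of freedom would, which is the usual motivation for wavelets over splines on bumpy functions.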