Data mining and statistical learning - lecture 6
Overview
• Basis expansion
• Splines
• (Natural) cubic splines
• Smoothing splines
• Nonparametric logistic regression
• Multidimensional splines
• Wavelets
Linear basis expansion (1)
Linear regression
x1   x2   x3    y
 1   -3    6   12
 …    …    …    …
True model: $y = f(x) = \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon$
Question: How to find $\hat{f}$?
Answer: Solve a system of linear equations to obtain $\hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3$
Linear basis expansion (2)
Nonlinear model
x1   x2   x3    y
 1   -3   -1   12
 …    …    …    …
True model: $y = \beta_1 x_1 x_2 + \beta_2 x_2 e^{x_3} + \beta_3 \sin x_3 + \beta_4 x_1^2 + \varepsilon$
Question: How to find $\hat{f}$?
Answer: A) Introduce new variables
$u_1 = x_1 x_2, \quad u_2 = x_2 e^{x_3}, \quad u_3 = \sin x_3, \quad u_4 = x_1^2$
Linear basis expansion (3)
Nonlinear model
B) Transform the data set
u1     u2      u3    u4    y
-3   -1.1   -0.84     1   12
 …      …       …     …    …
True model: $y = \beta_1 u_1 + \beta_2 u_2 + \beta_3 u_3 + \beta_4 u_4 + \varepsilon$
C) Apply linear regression to obtain $\hat{\beta}_1, \hat{\beta}_2, \hat{\beta}_3, \hat{\beta}_4$
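In code, steps A)-C) amount to building a transformed design matrix and running ordinary least squares. A minimal NumPy sketch on synthetic data (the sample size, coefficients, and noise level are illustrative, not taken from the lecture's data set):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                        # columns x1, x2, x3
y = (1.0 * X[:, 0] * X[:, 1]
     + 2.0 * X[:, 1] * np.exp(X[:, 2])
     + 3.0 * np.sin(X[:, 2])
     + 4.0 * X[:, 0] ** 2
     + rng.normal(scale=0.1, size=100))              # true nonlinear model + noise

# A)-B) Introduce the new variables u1..u4, i.e. transform the data set
U = np.column_stack([X[:, 0] * X[:, 1],              # u1 = x1 * x2
                     X[:, 1] * np.exp(X[:, 2]),      # u2 = x2 * exp(x3)
                     np.sin(X[:, 2]),                # u3 = sin(x3)
                     X[:, 0] ** 2])                  # u4 = x1^2

# C) Apply ordinary linear regression to the transformed data
beta_hat, *_ = np.linalg.lstsq(U, y, rcond=None)     # estimates of beta_1..beta_4
```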
Linear basis expansion (4)
Conclusion:
We can easily fit any model of the type
$f(X) = \sum_{m=1}^{M} \beta_m h_m(X)$
i.e., we can easily undertake a linear basis expansion in X
Example: If the model is known to be nonlinear, but the exact form is unknown, we can try to introduce interaction terms
$f(X) = \beta_1 X_1 + \dots + \beta_p X_p + \beta_{11} X_1^2 + \beta_{12} X_1 X_2 + \dots$
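As a sketch, such an expansion with squares and pairwise interactions can be generated automatically, for instance with scikit-learn (assumed available here):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.random.default_rng(0).normal(size=(100, 3))   # p = 3 predictors

# Degree-2 expansion: X1..Xp plus the squares X_j^2 and interactions X_j X_k
H = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X)
# H now holds the columns h_m(X) of the basis expansion f(X) = sum_m beta_m h_m(X)
```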
Piecewise polynomial functions
Assume X is one-dimensional.
Def. Assume the domain [a, b] of X is split into intervals $[a, \xi_1]$, $[\xi_1, \xi_2]$, ..., $[\xi_n, b]$. Then f(X) is said to be piecewise polynomial if f(X) is represented by separate polynomials in the different intervals.
Note. The points $\xi_1, \dots, \xi_n$ are called knots.
Piecewise polynomials
Example. Continuous piecewise linear function
Alternative A. Introduce linear functions on each interval and a
set of constraints
$y_1 = \alpha_1 x + \beta_1$
$y_2 = \alpha_2 x + \beta_2$
$y_3 = \alpha_3 x + \beta_3$
with the continuity constraints
$y_1(\xi_1) = y_2(\xi_1), \quad y_2(\xi_2) = y_3(\xi_2)$
(6 parameters minus 2 constraints = 4 free parameters)
[Fig. 5.1, lower left]
Alternative B. Use a basis expansion (4 free parameters)
$h_1(X) = 1, \quad h_2(X) = X, \quad h_3(X) = (X - \xi_1)_+, \quad h_4(X) = (X - \xi_2)_+$
Theorem. The given formulations are equivalent.
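A minimal sketch of Alternative B, assuming observation vectors x, y and knots xi1 < xi2 (names are illustrative); fitting this basis by least squares reproduces the constrained fit of Alternative A:

```python
import numpy as np

def piecewise_linear_basis(x, xi1, xi2):
    """Basis h1..h4 for a continuous piecewise linear function with
    knots xi1 < xi2 (4 columns = 4 free parameters)."""
    return np.column_stack([np.ones_like(x),           # h1(X) = 1
                            x,                          # h2(X) = X
                            np.maximum(x - xi1, 0.0),   # h3(X) = (X - xi1)_+
                            np.maximum(x - xi2, 0.0)])  # h4(X) = (X - xi2)_+

# Example fit (x, y assumed given):
# beta, *_ = np.linalg.lstsq(piecewise_linear_basis(x, xi1, xi2), y, rcond=None)
```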
Splines
Definition. A piecewise polynomial is called an order-M spline if it has continuous derivatives up to order M-1 at the knots.
Alternative definition. An order-M spline is a function which can be represented by the basis functions (K = #knots)
$h_j(X) = X^{\,j-1}, \quad j = 1, \dots, M$
$h_{M+l}(X) = (X - \xi_l)_+^{M-1}, \quad l = 1, \dots, K$
Theorem. The definitions above are equivalent.
Terminology. An order-4 spline is called a cubic spline.
[Fig. 5.2, lower right]
(look at the bases and compare the numbers of free parameters)
Note. For cubic splines, the discontinuity at the knots is not visible.
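A sketch of the truncated power basis above (M = 4 gives the cubic-spline basis); the function name and defaults are illustrative:

```python
import numpy as np

def spline_basis(x, knots, M=4):
    """Truncated power basis of an order-M spline (M = 4: cubic spline).
    Returns an n x (M + K) matrix, where K is the number of knots."""
    cols = [x ** j for j in range(M)]                              # h_j(X) = X^(j-1)
    cols += [np.maximum(x - xi, 0.0) ** (M - 1) for xi in knots]   # (X - xi_l)_+^(M-1)
    return np.column_stack(cols)
```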
Variance of spline estimators – boundary effects
[Fig. 5.3]
Natural cubic spline
Def. A cubic spline f is called a natural cubic spline if its 2nd and 3rd derivatives are zero at a and b.
Note. This implies that f is linear on the extreme intervals.
Basis functions of natural cubic splines:
$N_1(X) = 1, \quad N_2(X) = X, \quad N_{k+2}(X) = d_k(X) - d_{K-1}(X), \quad k = 1, \dots, K-2$
where $d_k(X) = \dfrac{(X - \xi_k)_+^3 - (X - \xi_K)_+^3}{\xi_K - \xi_k}$
Fitting smooth functions to data
Minimize a penalized sum of squared residuals
$\mathrm{RSS}(f, \lambda) = \sum_{i=1}^{N} \{y_i - f(x_i)\}^2 + \lambda \int \{f''(t)\}^2\, dt$
where λ is the smoothing parameter:
λ = 0: any function interpolating the data
λ = +∞: the least-squares line fit
Optimality of smoothing splines
Theorem. The function f minimizing RSS for a given λ is a natural cubic spline with knots at all unique values of $x_i$ (NOTE: N knots!)
The optimal spline can be computed as follows.
$f(x) = \sum_{j=1}^{N} N_j(x)\,\theta_j = N(x)^T \theta$
$\mathrm{RSS}(\theta, \lambda) = (y - \mathbf{N}\theta)^T (y - \mathbf{N}\theta) + \lambda\, \theta^T \Omega_N \theta$
where $\{\mathbf{N}\}_{ij} = N_j(x_i)$ and $\{\Omega_N\}_{ij} = \int N_i''(t)\, N_j''(t)\, dt$
$\hat{\theta} = (\mathbf{N}^T \mathbf{N} + \lambda \Omega_N)^{-1} \mathbf{N}^T y$
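A minimal sketch of this closed-form solution, assuming the basis matrix N (the $N_j(x_i)$) and the penalty matrix Omega ($\Omega_N$) have already been built, e.g. from the natural cubic spline basis above:

```python
import numpy as np

def smoothing_spline_coef(N, Omega, y, lam):
    """theta_hat = (N^T N + lam * Omega)^{-1} N^T y"""
    return np.linalg.solve(N.T @ N + lam * Omega, N.T @ y)
```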
A smoothing spline is a linear smoother
The fitted function
$\hat{f} = \mathbf{N}(\mathbf{N}^T \mathbf{N} + \lambda \Omega_N)^{-1} \mathbf{N}^T y = S_\lambda y$
is linear in the response values.
Degrees of freedom of smoothing splines
The effective degrees of freedom is
$\mathrm{df}_\lambda = \mathrm{trace}(S_\lambda)$
i.e., the sum of the diagonal elements of $S_\lambda$.
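A sketch of the smoother matrix and its effective degrees of freedom, under the same assumptions as above:

```python
import numpy as np

def smoother_matrix(N, Omega, lam):
    """S_lambda = N (N^T N + lam * Omega)^{-1} N^T, so that f_hat = S_lambda y."""
    return N @ np.linalg.solve(N.T @ N + lam * Omega, N.T)

# Effective degrees of freedom:
# df_lam = np.trace(smoother_matrix(N, Omega, lam))
```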
Smoothing splines and eigenvectors
It can be shown that
$S_\lambda = (I + \lambda K)^{-1}$
where K is the so-called penalty matrix.
Furthermore, the eigen-decomposition is
$S_\lambda = \sum_{k=1}^{N} \rho_k(\lambda)\, u_k u_k^T, \qquad \rho_k(\lambda) = \dfrac{1}{1 + \lambda d_k}$
Note: $d_k$ and $u_k$ are the eigenvalues and eigenvectors, respectively, of K.
Smoothing splines and shrinkage
$S_\lambda y = \sum_{k=1}^{N} u_k\, \rho_k(\lambda)\, \langle u_k, y \rangle$
• The smoothing spline decomposes the vector y with respect to the basis of eigenvectors and shrinks the respective contributions.
• The eigenvectors, ordered by decreasing $\rho_k$, increase in complexity; the higher the complexity, the more the contribution is shrunk (a code sketch follows below).
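A sketch of this shrinkage view: apply $S_\lambda$ to y through the eigen-decomposition of the penalty matrix K (assumed symmetric and positive semi-definite):

```python
import numpy as np

def smooth_by_shrinkage(K, y, lam):
    """S_lambda y = sum_k u_k * rho_k(lam) * <u_k, y>, with S_lambda = (I + lam*K)^{-1}."""
    d, U = np.linalg.eigh(K)            # eigenvalues d_k and eigenvectors u_k of K
    rho = 1.0 / (1.0 + lam * d)         # shrinkage factors rho_k(lam)
    return U @ (rho * (U.T @ y))        # decompose y, shrink, recombine
```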
Smoothing splines and local curve fitting
• The eigenvalues $\rho_k(\lambda)$ are decreasing functions of λ: the higher λ, the stronger the penalization.
• The smoother matrix has a banded structure -> local fitting method
• $\mathrm{df}_\lambda = \mathrm{trace}(S_\lambda) = \sum_{k=1}^{N} \dfrac{1}{1 + \lambda d_k}$
[Fig. 5.8]
Fitting smoothing splines in practice (1)
Reinsch form:
$S_\lambda = (I + \lambda K)^{-1}$
Theorem. If f is a natural cubic spline with values f at the knots and second derivatives γ at the knots, then
$Q^T f = R\gamma$
where Q and R are band matrices, dependent on ξ only.
Theorem.
$K = Q R^{-1} Q^T$
Fitting smoothing splines in practice (2)
Reinsch algorithm (a code sketch follows below)
• Evaluate $Q^T y$
• Compute $R + \lambda Q^T Q$ and find its Cholesky decomposition (in linear time!)
• Solve the matrix equation $(R + \lambda Q^T Q)\gamma = Q^T y$ (in linear time!)
• Obtain $f = y - \lambda Q\gamma$
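A sketch of the Reinsch algorithm, using one standard construction of the band matrices Q and R from the knot spacings; a dense Cholesky factorization is used here for clarity, whereas banded solvers give the linear-time behaviour claimed above:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def reinsch_smooth(x, y, lam):
    """Smoothing-spline fitted values at the sorted, distinct knots x."""
    n = len(x)
    h = np.diff(x)                       # knot spacings h_i = x_{i+1} - x_i

    # Band matrices Q (n x (n-2)) and R ((n-2) x (n-2)); both depend on the knots only.
    Q = np.zeros((n, n - 2))
    R = np.zeros((n - 2, n - 2))
    for j in range(1, n - 1):            # interior knots
        Q[j - 1, j - 1] = 1.0 / h[j - 1]
        Q[j,     j - 1] = -1.0 / h[j - 1] - 1.0 / h[j]
        Q[j + 1, j - 1] = 1.0 / h[j]
        R[j - 1, j - 1] = (h[j - 1] + h[j]) / 3.0
        if j < n - 2:
            R[j - 1, j] = R[j, j - 1] = h[j] / 6.0

    # Reinsch steps: evaluate Q^T y, factor R + lam Q^T Q, solve for gamma, update.
    b = Q.T @ y
    c, low = cho_factor(R + lam * Q.T @ Q)
    gamma = cho_solve((c, low), b)
    return y - lam * (Q @ gamma)         # f = y - lam * Q * gamma
```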
Automated selection of smoothing parameters (1)
What can be selected:
Regression splines
• Degree of the spline
• Placement of the knots
-> the MARS procedure
Smoothing splines
• Penalization parameter λ
Automated selection of smoothing parameters (2)
Fixing the degrees of freedom
$\mathrm{df}_\lambda = \mathrm{trace}(S_\lambda) = \sum_{k=1}^{N} \dfrac{1}{1 + \lambda d_k}$
• If we fix $\mathrm{df}_\lambda$, then we can find λ by solving the equation numerically (see the sketch below).
• One could try two different $\mathrm{df}_\lambda$ and choose one based on F-tests, residual plots, etc.
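A sketch of the numerical solution, assuming the eigenvalues $d_k$ of the penalty matrix are available and that the chosen bracket contains the root:

```python
import numpy as np
from scipy.optimize import brentq

def lambda_for_df(d, target_df):
    """Find lambda such that trace(S_lambda) = sum_k 1/(1 + lambda*d_k) = target_df."""
    def df(lam):
        return np.sum(1.0 / (1.0 + lam * d))
    # df(lam) decreases monotonically in lam, so bracket the root and bisect.
    return brentq(lambda lam: df(lam) - target_df, 1e-12, 1e12)
```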
Automated selection of smoothing parameters (3)
The bias-variance trade-off
$\mathrm{df}_\lambda = \mathrm{trace}(S_\lambda) = \sum_{k=1}^{N} \dfrac{1}{1 + \lambda d_k}$
[Fig. 5.9]
EPE: integrated squared prediction error; CV: cross-validation
Nonparametric logistic regression
Logistic regression model
$\log \dfrac{\Pr(Y = 1 \mid X = x)}{\Pr(Y = 0 \mid X = x)} = f(x)$
Note: X is one-dimensional
What is f?
• Linear -> ordinary logistic regression (Chapter 4)
• Sufficiently smooth -> nonparametric logistic regression (splines and other methods)
• Other choices are possible
Nonparametric logistic regression
Problem formulation:
Maximize the penalized log-likelihood
$l_p(f; \lambda) = l_u(f) - \dfrac{1}{2}\lambda \int \{f''(t)\}^2\, dt$
Good news: the solution is still a natural cubic spline.
Bad news: there is no closed-form expression for that spline function.
Nonparametric logistic regression
How to proceed?
Use Newton-Raphson to compute the spline numerically, i.e.:
• Compute (analytically) the gradient and the Hessian
$\nabla l_p(\theta) = \dfrac{\partial l_p(\theta)}{\partial \theta}, \qquad \nabla^2 l_p(\theta) = \dfrac{\partial^2 l_p(\theta)}{\partial \theta\, \partial \theta^T}$
1. Compute the Newton direction using the current value of the parameter and the derivative information
2. Compute the new value of the parameter from the old value and the update formula
$\theta^{\mathrm{new}} = \theta^{\mathrm{old}} - \left(\nabla^2 l_p\right)^{-1} \nabla l_p$
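A sketch of this Newton-Raphson iteration in a natural cubic spline basis, assuming a basis matrix N, a penalty matrix Omega, and a 0/1 response y; this illustrates the update formula above, not the book's exact algorithm:

```python
import numpy as np

def penalized_logistic_newton(N, Omega, y, lam, n_iter=25):
    """Newton-Raphson for nonparametric logistic regression with f = N @ theta."""
    theta = np.zeros(N.shape[1])
    for _ in range(n_iter):
        f = N @ theta
        p = 1.0 / (1.0 + np.exp(-f))                 # Pr(Y = 1 | X = x_i)
        grad = N.T @ (y - p) - lam * (Omega @ theta)         # gradient of l_p
        W = p * (1.0 - p)
        hess = -(N.T * W) @ N - lam * Omega                   # Hessian of l_p
        theta = theta - np.linalg.solve(hess, grad)           # Newton update
    return theta
```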
Multidimensional splines
How to fit data smoothly in higher dimensions?
A) Use bases of one-dimensional functions and produce a basis by tensor products:
$g_{jk}(X) = h_{1j}(X_1)\, h_{2k}(X_2), \qquad g(X) = \sum_{j,k} \theta_{jk}\, g_{jk}(X)$
Problem: exponential growth of the basis with the dimension
[Fig. 6.10]
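A sketch of the tensor-product construction, assuming the one-dimensional basis matrices H1 and H2 are evaluated at the same n observation points:

```python
import numpy as np

def tensor_product_basis(H1, H2):
    """H1 (n x M1) and H2 (n x M2) hold the one-dimensional basis functions
    h_{1j}(x1_i) and h_{2k}(x2_i); returns the n x (M1*M2) matrix of products
    g_jk(x_i) = h_{1j}(x1_i) * h_{2k}(x2_i)."""
    n, M1 = H1.shape
    _, M2 = H2.shape
    return (H1[:, :, None] * H2[:, None, :]).reshape(n, M1 * M2)
```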
Multidimensional splines
How to fit data smoothly in higher dimensions?
B) Formulate a new problem
$\min_f \sum_i \{y_i - f(x_i)\}^2 + \lambda J(f)$
• The solution is the thin-plate spline.
• Similar properties as in the one-dimensional case hold (e.g., λ = 0 gives an interpolating function).
• The solution in 2 dimensions is essentially a sum of radial basis functions:
$f(x) = \beta_0 + \beta^T x + \sum_j \alpha_j\, \eta\!\left(\lVert x - x_j \rVert\right)$
Wavelets
Introduction
• The idea: fit a bumpy function by removing noise
• Application areas: signal processing, compression
• How it works: the function is represented in a basis of bumpy (localized) functions, and the small coefficients are filtered out (a code sketch follows below)
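A minimal sketch of this filtering, assuming the PyWavelets package is available; the threshold rule (the universal threshold with a MAD noise estimate) is a common choice, not necessarily the one used in the lecture:

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_denoise(y, wavelet="sym8", threshold=None):
    """Represent y in a wavelet basis, filter the small coefficients, transform back."""
    coeffs = pywt.wavedec(y, wavelet)                    # forward wavelet transform
    if threshold is None:
        sigma = np.median(np.abs(coeffs[-1])) / 0.6745   # noise estimate from finest scale
        threshold = sigma * np.sqrt(2.0 * np.log(len(y)))
    coeffs = [coeffs[0]] + [pywt.threshold(c, threshold, mode="soft")
                            for c in coeffs[1:]]         # shrink detail coefficients
    return pywt.waverec(coeffs, wavelet)[:len(y)]        # inverse transform
```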
Wavelets
Basis functions (Haar Wavelets, Symmlet-8 Wavelets)
[Fig. 5.13]
Wavelets
Example
[Fig. 5.14]