2005-03-30  Supplemental notes, BIOINF 2054/BIOSTAT 2018
Statistical Foundations for Bioinformatics Data Mining

Target readings: Hastie, Tibshirani, Friedman, Chapter 3: linear regression, principal components, ridge regression, partial least squares. No classes on Jan 25, 27.

Gauss-Markov Theorem: The "best linear unbiased estimator" minimizes RSS. To estimate a linear combination \theta = a^T \beta with an unbiased linear estimator c^T y, choose \hat{\theta} = a^T \hat{\beta}. But, by accepting some bias, you can do much better.

Best subset selection: Note that selecting the "best subset" is NOT a linear estimator. Why not?

Ridge Regression

See Fig 3.7. The principle: add a "ridge" of size \lambda to the diagonal of X^T X, to stabilize the matrix inverse:

    \hat{\beta}_{ridge} = (X^T X + \lambda I)^{-1} X^T Y.

Another view: penalized least squares,

    \hat{\beta}_{ridge} = \arg\min_\beta \, (Y - X\beta)^T (Y - X\beta) + \lambda \|\beta\|^2.

This can also be thought of as maximizing a Bayesian posterior, where the prior is [\beta] \sim N(0, (\lambda / \sigma^2)^{-1} I_p).

This is also an example of data augmentation: let

    X_{aug} = [ X ; \sqrt{\lambda}\, I_p ],   Y_{aug} = [ Y ; 0 ]

(append p extra "observations" with response 0). Then OLS on the augmented data yields \hat{\beta}_{ridge}.

Another view: solve a constrained optimization problem (restricting the model space):

    \hat{\beta} = \arg\min_\beta \, (Y - X\beta)^T (Y - X\beta)   restricted to the set { \beta : \|\beta\|^2 \le K }.

Note the error in Fig 3.12: if X^T X = I_p, then as in Table 3.4,

    \hat{\beta}_{ridge} = \frac{1}{1+\lambda} \hat{\beta}_{ls},

so the ellipse's major or minor axis must go through the origin.

The singular value decomposition of X, svd(X), is

    X = U D V^T,

where:
  U is N by p, with U^T U = I_p and U U^T = X (X^T X)^{-1} X^T = H (the "hat" matrix). U transforms data points in "scatterplot space" (rows of X, in R^p), creating a new dataset U^T X = D V^T.
  V is p by p, with V^T V = V V^T = I_p. V rotates data points in "variable space" (columns of X, in R^N), defining new variables XV = UD.
  D is diagonal; d_1 \ge ... \ge d_p \ge 0 are the singular values.
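The data-augmentation view of ridge regression can be checked numerically. A minimal sketch in Python/NumPy (the simulated dataset and all variable names are illustrative, not from the course materials; the course exercises use R):

```python
import numpy as np

rng = np.random.default_rng(0)
N, p = 50, 4
X = rng.standard_normal((N, p))
Y = X @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.standard_normal(N)
lam = 3.0

# Closed form: beta_ridge = (X'X + lam*I)^{-1} X'Y
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ Y)

# Data augmentation: append sqrt(lam)*I_p as p extra rows of X and
# p zeros to Y; then ordinary least squares on the augmented data
# gives exactly the ridge estimate.
X_aug = np.vstack([X, np.sqrt(lam) * np.eye(p)])
Y_aug = np.concatenate([Y, np.zeros(p)])
beta_aug, *_ = np.linalg.lstsq(X_aug, Y_aug, rcond=None)

assert np.allclose(beta_ridge, beta_aug)
```

The equivalence holds because the augmented residual sum of squares is (Y - X\beta)^T(Y - X\beta) + \lambda \|\beta\|^2, the ridge criterion.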
[What are the eigenvalues of X^T X?]

Then X \hat{\beta}_{ls} = \hat{Y} = H Y = U U^T Y.

Ridge regression: degrees of freedom (equation 3.47)

First, note that

    (U U^T)_{i_1 i_2} = \sum_{j=1}^p U_{i_1 j} (U^T)_{j i_2}    (definition of matrix multiplication)
                      = \sum_{j=1}^p U_{i_1 j} U_{i_2 j}
                      = \sum_{j=1}^p (u_j u_j^T)_{i_1 i_2}      (outer product of column j with itself).

Therefore U U^T = \sum_{j=1}^p u_j u_j^T.

Similarly,

    (U \, diag(a) \, U^T)_{i_1 i_2} = \sum_{j=1}^p U_{i_1 j} a_j (U^T)_{j i_2}
                                    = \sum_{j=1}^p a_j U_{i_1 j} U_{i_2 j}
                                    = \sum_{j=1}^p a_j (u_j u_j^T)_{i_1 i_2}.

Therefore

    U \, diag(a) \, U^T = \sum_{j=1}^p a_j u_j u_j^T    (regarding a_j as a scalar multiplier)
                        = \sum_{j=1}^p u_j a_j u_j^T    (regarding a_j as a 1-by-1 matrix).

In 3.47, diag(a) = D (D^2 + \lambda I)^{-1} D, i.e.

    a_j = \frac{d_j^2}{d_j^2 + \lambda}.

We conclude:

    X \hat{\beta}_{ridge} = U D (D^2 + \lambda I)^{-1} D U^T Y = \sum_{j=1}^p u_j \frac{d_j^2}{d_j^2 + \lambda} u_j^T Y.

Recall that X \hat{\beta}_{ls} = U U^T Y.

For a linear smoother \hat{Y} = X \hat{\beta} = S Y, the effective degrees of freedom are

    df = tr(S) = S_{11} + ... + S_{NN} = sum(diag(S))    (see 5.4.1, 7.6).

So for ridge regression

    df(\lambda) = \sum_{j=1}^p \frac{d_j^2}{d_j^2 + \lambda},

and for least squares, df = df(0) = p.

Lasso Regression

See Fig 3.9.

    \hat{\beta} = \arg\min_\beta \, (Y - X\beta)^T (Y - X\beta) + \lambda \|\beta\|_1.

Note the FIRST power of the length in the penalty function. Another view: solve a constrained optimization problem (restricting the model space):

    \hat{\beta} = \arg\min_\beta \, (Y - X\beta)^T (Y - X\beta)   restricted to the set { \beta : \|\beta\|_1 \le K }.

Be prepared to compare ridge regression to lasso regression. See Fig 3.12 and 3.13.

Principal Components

Recall: see Fig 3.10, and X = U D V^T. The principal component weights are the columns of V, v_1, ..., v_p:

    X^T X V = V D U^T U D V^T V = V D^2,

so X^T X v_j = d_j^2 v_j (the v_j are eigenvectors). The principal components are the linear combinations z_j = X v_j, j = 1, ..., p. Note that Z = (z_1 ... z_p) = X V = U D. This is a derived covariate technique: Z replaces X.
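The SVD identities above (eigenvectors of X^T X, Z = XV = UD, and the ridge degrees-of-freedom formula) can all be verified numerically. A minimal Python/NumPy sketch with simulated data (names and data are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(1)
N, p = 30, 4
X = rng.standard_normal((N, p))
lam = 2.0

# SVD: X = U D V^T with U'U = I_p, V'V = I_p
U, d, Vt = np.linalg.svd(X, full_matrices=False)
V = Vt.T

# The v_j are eigenvectors of X'X with eigenvalues d_j^2
for j in range(p):
    assert np.allclose(X.T @ X @ V[:, j], d[j] ** 2 * V[:, j])

# Principal components: Z = XV = UD (U * d scales column j of U by d_j)
assert np.allclose(X @ V, U * d)

# Ridge smoother matrix S = X (X'X + lam*I)^{-1} X';
# its trace equals sum_j d_j^2 / (d_j^2 + lam)
S = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
assert np.isclose(np.trace(S), np.sum(d**2 / (d**2 + lam)))

# lam = 0 recovers least squares, with df = p
S0 = X @ np.linalg.solve(X.T @ X, X.T)
assert np.isclose(np.trace(S0), p)
```

(ESL works with centered inputs for principal components; the SVD identities checked here hold for any X.)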
Algorithm for generating principal components: the successive principal components solve

    v_j = \arg\max_\alpha \, Var(X \alpha)

over all \alpha of length 1 and orthogonal to v_1, ..., v_{j-1}. (Important: note that Y does not enter into this.) See Fig 3.8.

Principal components regression is the model

    \hat{y}_{pcr} = \bar{y} + \sum_{j=1}^M \hat{\theta}_j z_j,   so   \hat{\beta}_{pcr} = \sum_{j=1}^M \hat{\theta}_j v_j.

Note that M, the number of components to include, is a model complexity tuning parameter.

Partial Least Squares

The successive PLS components solve

    \hat{\varphi}_j = \arg\max_\alpha \, Cov(Y, X \alpha)

over all \alpha of length 1 and orthogonal to \hat{\varphi}_1, ..., \hat{\varphi}_{j-1}. This is the same as

    \hat{\varphi}_j = \arg\max_\alpha \, corr^2(Y, X \alpha) \, Var(X \alpha).

Contrast this with principal components, where Y plays no role. PLS regression is the model

    \hat{y}_{pls} = \bar{y} + \sum_{j=1}^M \hat{\theta}_j z_j,   where z_j = X \hat{\varphi}_j,   so   \hat{\beta}_{pls} = \sum_{j=1}^M \hat{\theta}_j \hat{\varphi}_j

(this depends on M, a "smoothing" or "model complexity" parameter). This is another derived covariates method.

Comparing the methods: see Fig 3.6, 3.11, Table 3.3.

Summary: What do you need to remember about these methods? List here.

Exercises due Feb 1: Go to http://www-stat.stanford.edu/~tibs/ElemStatLearn/ . Obtain the prostate cancer data set. Load it into R. Carry out OLS regression, ridge regression, principal components regression, and partial least squares regression. Also do Exercises 3.1-3.7 (skip 3.3b), 3.9, 3.11, 3.17. As usual, bring to class at least one AHA and one Question about Chapter 3. You will read Ch. 4.1-4.3 for Friday Feb 3.
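As a study aid, here is a minimal Python/NumPy sketch of principal components regression on simulated data (the course exercises use R; the data and names below are illustrative only). It builds the components from the SVD of the centered inputs, regresses y on each component separately (valid because the z_j are orthogonal), and checks that retaining all p components reproduces ordinary least squares:

```python
import numpy as np

rng = np.random.default_rng(2)
N, p = 40, 3
X = rng.standard_normal((N, p))
y = X @ np.array([2.0, 0.0, -1.0]) + rng.standard_normal(N)

Xc = X - X.mean(axis=0)            # PCR uses centered inputs
U, d, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt.T                      # principal components z_j = X v_j (= UD)

M = 2                              # number of components retained
theta = np.array([Z[:, j] @ y / (Z[:, j] @ Z[:, j]) for j in range(M)])
beta_pcr = Vt.T[:, :M] @ theta     # beta_pcr = sum_{j<=M} theta_j v_j
yhat = y.mean() + Z[:, :M] @ theta # fitted values for the M-component model

# With M = p, PCR reproduces least squares on the centered data.
theta_full = np.array([Z[:, j] @ y / (Z[:, j] @ Z[:, j]) for j in range(p)])
beta_full = Vt.T @ theta_full
beta_ols, *_ = np.linalg.lstsq(Xc, y - y.mean(), rcond=None)
assert np.allclose(beta_full, beta_ols)
```

Choosing M < p discards the low-variance directions, which is where the shrinkage (and the bias) comes from.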