Smoothing/Natural Splines and Penalized Likelihood
Regression data: (xi, Yi), a < x1 < · · · < xn < b
Yi = µ(xi) + εi,   var(εi) = σ²
min over smooth µ:   (1/n) Σ (Yi − µ(xi))² + λ P(µ)
where λ > 0 is a smoothing parameter and P is a roughness penalty.
EXAMPLE (smooth.spline). Minimize
(1/n) Σ (Yi − µ(xi))² + λ ∫ (µ″)²
over all µ with µ′ absolutely continuous and ∫ (µ″)² < ∞.
(µ′ absolutely continuous means µ″ exists almost everywhere and
∫ₛᵗ µ″(x) dx = µ′(t) − µ′(s) for all s < t.)
FACTS FOR THIS EXAMPLE:
• lines are reproduced: Yi = a + bxi ⇒ µ̂(x) = a + bx
• As λ → ∞, µ̂ converges to the least squares line.
• If λ = 0, µ̂ interpolates the data.
• µ̂ is piecewise cubic, with joins at the data points. µ̂ has a continuous
second derivative and µ̂00(x) = 0 for x ∈ [a, x1) ∪ (xn, b].
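These facts are easy to check numerically. A minimal sketch, assuming SciPy ≥ 1.10 is available (its `make_smoothing_spline` fits a cubic smoothing spline under this same second-derivative penalty), verifying that lines are reproduced for any λ:

```python
import numpy as np
from scipy.interpolate import make_smoothing_spline

# Exactly linear data: Yi = a + b*xi with no noise.
x = np.linspace(0.0, 1.0, 20)
y = 2.0 + 3.0 * x

# Fit with a strictly positive smoothing parameter lam (the lambda above).
spl = make_smoothing_spline(x, y, lam=1.0)

# A line has zero second derivative, so it incurs no penalty and is
# recovered exactly, whatever lambda is.
fit = spl(x)
print(np.max(np.abs(fit - y)))  # essentially zero
```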
EXAMPLE (smooth.Pspline). Minimize
(1/n) Σ (Yi − µ(xi))² + λ ∫ (µ^(m))²
over all µ with µ^(m−1) absolutely continuous and ∫ (µ^(m))² < ∞.
(µ^(m−1) absolutely continuous means µ^(m) exists almost everywhere and
∫ₛᵗ µ^(m)(x) dx = µ^(m−1)(t) − µ^(m−1)(s) for all s < t.)
FACTS FOR THIS EXAMPLE:
• degree (m − 1) polynomials are reproduced
• As λ → ∞, µ̂ converges to the least squares degree m − 1 polynomial.
• If λ = 0, µ̂ interpolates the data.
• µ̂ is piecewise polynomial of degree 2m − 1 with 2m − 2 continuous
derivatives, and µ̂(m)(x) = 0 for x ∈ [a, x1) ∪ (xn, b].
That is:
THEOREM. µ̂ = a natural spline of degree 2m−1 with knots at x1, . . . , xn
RECALL: a spline of degree 2m − 1 is a piecewise polynomial of degree
2m − 1 with 2m − 2 continuous derivatives.
DEFINITION of a natural spline
f is a natural spline of degree 2m − 1 with (interior) knots at x1, . . . , xn if
f is a spline of degree 2m − 1 and f (m) ≡ 0 on [a, x1) ∪ (xn, b].
DIMENSION OF NATURAL SPLINE SPACE is n
REFERENCES: Grace Wahba, Chong Gu, Bernard Silverman ...
IMPLEMENTED IN R:
smooth.spline (2nd derivative penalty)
smooth.Pspline (general integer mth derivative penalty)
R commands: smooth.spline, predict.smooth.spline, derivatives too
smooth.spline(x, y=NULL, w=NULL, df, spar=NULL, cv=FALSE,
all.knots=FALSE, nknots=NULL, df.offset = 0, penalty = 1,
control.spar=list())
predict(object, x, deriv = 0, ...)
Choose λ (spar) by CV, GCV (no plug-in)
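The GCV idea applies to any linear smoother µ̂ = SλY: minimize GCV(λ) = (1/n)||(I − Sλ)Y||² / (1 − tr(Sλ)/n)² over a grid of λ. A hypothetical numpy illustration with a simple discrete first-difference roughness penalty standing in for the spline penalty (not the banded algorithm inside smooth.spline):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = np.linspace(0, 1, n)
y = np.sin(2 * np.pi * x) + 0.3 * rng.standard_normal(n)

# First-difference matrix D: (D mu)_i = mu_{i+1} - mu_i.
D = np.diff(np.eye(n), axis=0)
K = D.T @ D  # discrete roughness penalty matrix

def gcv(lam):
    """GCV score for the linear smoother S = (I + lam*K)^{-1}."""
    S = np.linalg.inv(np.eye(n) + lam * K)
    resid = y - S @ y
    return (resid @ resid / n) / (1 - np.trace(S) / n) ** 2

lams = 10.0 ** np.arange(-3, 4)
best = min(lams, key=gcv)
print("GCV-chosen lambda:", best)
```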
TO MINIMIZE (1/n) Σ (Yi − µ(xi))² + λ ∫ (µ^(m))² IN THEORY:
• old-fashioned: calculus or calculus of variations
• elegant: functional analysis with inner products, projections, Reproducing Kernel Hilbert Spaces (Chong Gu’s book, Grace Wahba’s book)
Heuristics of minimization when m = 1:
Minimize
(1/n) Σ (Yi − µ(xi))² + λ ∫ₐᵇ (µ′(x))² dx
(1) µ̂ is a line on (xi, xi+1) and µ̂ is constant on [a, x1) ∪ (xn, b].
(2) Find â1 ≡ µ̂(x1), . . . , ân ≡ µ̂(xn).
Since µ̂ is linear on each (xi−1, xi),
∫ of (µ̂′)² over (xi−1, xi) = (xi − xi−1) × (slope)² = [µ̂(xi) − µ̂(xi−1)]² / (xi − xi−1),   2 ≤ i ≤ n.
Write
(µ(x2) − µ(x1), . . . , µ(xn) − µ(xn−1))′ = D (µ(x1), . . . , µ(xn))′,
Λ = diag( 1/(xi − xi−1) ).
So we minimize:
(1/n)(Y − µ)′(Y − µ) + λ(Dµ)′Λ(Dµ) = (1/n)(Y − µ)′(Y − µ) + λµ′D′ΛDµ
⇒ µ̂ = (I + nλD′ΛD)⁻¹Y ≡ SλY
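The m = 1 formula can be implemented directly. A numpy sketch (dense linear algebra for clarity; real implementations exploit the banded structure of D′ΛD):

```python
import numpy as np

def m1_smoother(x, y, lam):
    """mu_hat = (I + n*lam*D'Lambda D)^{-1} y  for the m = 1 penalty."""
    n = len(x)
    # D maps (mu(x1),...,mu(xn)) to successive differences mu(x_i)-mu(x_{i-1}).
    D = np.diff(np.eye(n), axis=0)
    Lam = np.diag(1.0 / np.diff(x))          # Lambda = diag(1/(x_i - x_{i-1}))
    S = np.linalg.inv(np.eye(n) + n * lam * D.T @ Lam @ D)
    return S @ y

x = np.linspace(0, 1, 50)
y = np.sin(2 * np.pi * x)

# lambda -> 0: interpolation; lambda -> infinity: shrink toward a constant
# (constants are the null space of the penalty, since D @ 1 = 0).
print(np.max(np.abs(m1_smoother(x, y, 1e-12) - y)))   # ~ 0
print(np.ptp(m1_smoother(x, y, 1e6)))                 # ~ 0 (nearly constant)
```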
GENERAL SMOOTHING SPLINE CASE:
minimize (1/n) Σ (Yi − µ(xi))² + λ ∫ (µ^(m))²
⇒ µ̂ is a natural spline.
So for some basis φ1, . . . , φn, µ̂(x) = Σⱼ₌₁ⁿ β̂j φj(x).
So we minimize, as a function of β1, . . . , βn,
(1/n) Σᵢ₌₁ⁿ [Yi − Σⱼ βj φj(xi)]² + λ Σ over j,k of βj βk ∫ φj^(m)(x) φk^(m)(x) dx
= (1/n) ||Y − Φβ||² + λβ′Pβ
⇒ Ŷ = Φβ̂ = Φ(Φ′Φ + nλP)⁻¹Φ′Y ≡ SλY.
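The algebra is ordinary penalized (ridge-type) regression once Φ and P are in hand. A sketch with hypothetical stand-ins — a monomial basis and its exact second-derivative penalty matrix, not an actual natural-spline basis:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
x = np.linspace(0, 1, n)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(n)

# Stand-in basis: powers of x (just to show the algebra).
p = 6
Phi = np.vander(x, p, increasing=True)   # columns 1, x, ..., x^5

# Stand-in penalty P_jk = integral over [0,1] of phi_j'' phi_k'',
# computed exactly for phi_j(x) = x^j:
# phi_j'' = j(j-1) x^{j-2}, and integral of x^{j+k-4} is 1/(j+k-3).
P = np.zeros((p, p))
for j in range(2, p):
    for k in range(2, p):
        P[j, k] = j * (j - 1) * k * (k - 1) / (j + k - 3)

def fit(lam):
    beta = np.linalg.solve(Phi.T @ Phi + n * lam * P, Phi.T @ y)
    return Phi @ beta

# lam = 0 is ordinary least squares; large lam shrinks toward a line
# (the null space of the second-derivative penalty).
ols = np.linalg.lstsq(Phi, y, rcond=None)[0]
print(np.max(np.abs(fit(0.0) - Phi @ ols)))   # ~ 0
```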
MODIFICATIONS/EXTENSIONS
THIN PLATE SPLINES:
Xi ∈ ℝ^d. Minimize
(1/n) Σ (Yi − µ(Xi))² + λ ∫ ( ∂²µ/∂x1² + · · · + ∂²µ/∂xd² )² dx1 · · · dxd
Implemented in Chong Gu’s gss.
MODIFICATIONS/EXTENSIONS
PARTIAL LINEAR MODEL: data: (Yi, xi, zi) with
E(Yi) = βzi + µ(xi)
µ smooth, β and µ unknown. Minimize
(1/n) Σ (Yi − βzi − µ(xi))² + λ ∫ (µ″)².
Implemented in gam (yes), in gss (??)
COMMENT: gcv/cv chooses a λ that is good for predicting Y* = βz + µ(x) + ε. In general, this λ is good for estimating µ but is too big for estimating β.
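One standard way to compute β̂ here is to profile out µ: given a symmetric linear smoother Sλ, the penalized least squares criterion collapses to a quadratic in β with minimizer β̂ = [z′(I − Sλ)z]⁻¹ z′(I − Sλ)Y. A sketch using a hypothetical discrete second-difference smoother (not the algorithm inside gam):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = np.linspace(0, 1, n)
z = rng.standard_normal(n)
beta_true = 1.5
y = beta_true * z + np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(n)

# Linear smoother S = (I + n*lam*K)^{-1}, with K a discrete
# second-difference penalty standing in for the spline penalty.
D2 = np.diff(np.eye(n), n=2, axis=0)
K = D2.T @ D2
lam = 1.0
S = np.linalg.inv(np.eye(n) + n * lam * K)

# Profiling out mu: beta_hat = [z'(I-S)z]^{-1} z'(I-S)y.
M = np.eye(n) - S
beta_hat = (z @ M @ y) / (z @ M @ z)
print(beta_hat)  # should be close to 1.5
```

The filter I − Sλ passes the rough covariate z almost unchanged while removing the smooth component, which is why the ratio recovers β.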
MODIFICATIONS/EXTENSIONS
PENALIZED LIKELIHOOD. Minimize
− log likelihood(µ(x1), . . . , µ(xn)) + λ ∫ (µ^(m))²
⇒ natural spline of degree 2m − 1 with knots at x1, . . . , xn.
EXAMPLE: Suppose Yi ∈ {0, 1} with P{Yi = 1} = p(xi).
log likelihood = Σᵢ₌₁ⁿ log[ p(xi)^Yi (1 − p(xi))^(1−Yi) ]
= Σᵢ₌₁ⁿ [ Yi log(p(xi)/(1 − p(xi))) + log(1 − p(xi)) ]
Set
µ(xi) = log[ p(xi)/(1 − p(xi)) ].
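With µi = logit p(xi), the penalized criterion −loglik(µ) + λµ′Pµ is smooth and convex, so Newton’s method (IRLS) applies. A sketch, with a discrete second-difference matrix as a hypothetical stand-in for the derivative penalty:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x = np.linspace(0, 1, n)
p_true = 1 / (1 + np.exp(-(4 * x - 2)))        # true p(x), increasing in x
y = (rng.random(n) < p_true).astype(float)

# Discrete second-difference penalty standing in for lambda * int (mu'')^2.
D2 = np.diff(np.eye(n), n=2, axis=0)
P = D2.T @ D2
lam = 1.0

# Newton / IRLS for: minimize -loglik(mu) + lam * mu' P mu,
# where loglik = sum_i [ y_i mu_i - log(1 + exp(mu_i)) ].
mu = np.zeros(n)
for _ in range(50):
    p = 1 / (1 + np.exp(-mu))
    grad = -(y - p) + 2 * lam * P @ mu           # gradient of the criterion
    hess = np.diag(p * (1 - p)) + 2 * lam * P    # Hessian (positive definite)
    mu = mu - np.linalg.solve(hess, grad)

p_hat = 1 / (1 + np.exp(-mu))                    # fitted probabilities
```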
MODIFICATIONS/EXTENSIONS
OTHER PENALTIES (Ramsay and Heckman)
Minimize
(1/n) Σ (Yi − µ(xi))² + λ ∫ (Lµ)²
where L is an mth order linear differential operator (Lµ = Σⱼ₌₀ᵐ αj µ^(j)).
The solution is a spline of generalized type.
Examples:
• ∫ (Lµ)² = ∫ (µ″)²: don’t penalize a line
• ∫ (Lµ)² = ∫₀¹ (µ″ + (2π)²µ)²: don’t penalize µ(x) = a0 cos(2πx) + a1 sin(2πx)
• ∫ (Lµ)² = ∫₀¹ (µ″ − βµ′)²: don’t penalize µ(x) = a0 + a1 exp(βx) (can estimate β)
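Each "don’t penalize" claim just says the named functions lie in the null space of L. The second bullet can be sanity-checked by finite differences (purely illustrative):

```python
import numpy as np

# mu(x) = a0 cos(2 pi x) + a1 sin(2 pi x) should satisfy
# L mu = mu'' + (2 pi)^2 mu = 0 exactly.
a0, a1 = 0.7, -1.3
mu = lambda t: a0 * np.cos(2 * np.pi * t) + a1 * np.sin(2 * np.pi * t)

x = np.linspace(0.1, 0.9, 9)
h = 1e-4
mu2 = (mu(x + h) - 2 * mu(x) + mu(x - h)) / h**2   # central 2nd difference
Lmu = mu2 + (2 * np.pi) ** 2 * mu(x)
print(np.max(np.abs(Lmu)))  # ~ 0, up to O(h^2) discretization error
```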
DENSITY ESTIMATION (Silverman, 1982)
Estimate a density f based on data X1, . . . , Xn iid ∼ f .
log likelihood = Σᵢ₌₁ⁿ log f(Xi)
Remove the positivity constraint by letting g = log f and maximize
(1/n) Σᵢ₌₁ⁿ g(Xi) − λ ∫ (Lg)²   subject to   ∫ exp(g(x)) dx = 1.
The maximizer exists provided the MLE exists within the parametric class of densities {f : L(log f) ≡ 0}.
Silverman showed that this is equivalent to maximizing the unconstrained functional
(1/n) Σᵢ₌₁ⁿ g(Xi) − λ ∫ (Lg)² − ∫ exp(g(x)) dx.
(Perturbing g by a constant c and differentiating at c = 0 shows that the maximizer automatically satisfies ∫ exp(ĝ) = 1.)
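The self-normalization at the maximizer can be seen in a crude discretized version of the unconstrained problem. A gradient-ascent sketch on a grid, with Lg = g′ (m = 1) for simplicity — an illustration of the idea, not Silverman’s algorithm:

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.beta(2, 2, size=200)          # iid sample on [0, 1]

# Discretize g on a grid of m points.
m = 80
grid = np.linspace(0, 1, m)
h = grid[1] - grid[0]
counts = np.histogram(X, bins=m, range=(0.0, 1.0))[0] / len(X)

lam = 1e-3
g = np.zeros(m)
for _ in range(5000):
    # Concave objective: sum_k counts_k g_k
    #                    - lam * sum_k ((g_{k+1}-g_k)/h)^2 h
    #                    - sum_k exp(g_k) h
    dg = np.diff(g)
    grad = counts - np.exp(g) * h
    grad[:-1] += 2 * lam * dg / h     # penalty gradient, left endpoints
    grad[1:] -= 2 * lam * dg / h      # penalty gradient, right endpoints
    g += 0.5 * grad                   # fixed-step gradient ascent

print(np.sum(np.exp(g)) * h)  # ~ 1: the fitted density self-normalizes
```

Summing the stationarity conditions over the grid telescopes the penalty terms away and leaves 1 − Σ exp(g_k)·h = 0, the discrete analogue of ∫ exp(ĝ) = 1.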