Revisiting Regression – local models, and non-parametric…
Peter Fox
Data Analytics – ITWS-4963/ITWS-6965
Week 12a, April 21, 2015
Why local?
Sparse?
Remember this one?
[Scatter plot: log(bronx$SALE.PRICE) versus log(bronx$GROSS.SQUARE.FEET)]
How would you apply local methods here?
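A minimal sketch of one possible answer, using loess() on the log-transformed sales data; the data frame name bronx and its SALE.PRICE and GROSS.SQUARE.FEET columns follow the axis labels above, and the span value is only an illustrative assumption.

# Local regression on the Bronx sales data (sketch; assumes `bronx` exists)
bronx <- bronx[bronx$SALE.PRICE > 0 & bronx$GROSS.SQUARE.FEET > 0, ]
x <- log(bronx$GROSS.SQUARE.FEET)
y <- log(bronx$SALE.PRICE)
fit <- loess(y ~ x, span = 0.75)            # locally weighted fit
plot(x, y, pch = ".")
ord <- order(x)
lines(x[ord], predict(fit)[ord], col = "red", lwd = 2)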
SVM-type
• One-class-classification: this model tries to
find the support of a distribution and thus
allows for outlier/novelty detection;
• epsilon-regression: here, the data points lie in
between the two borders of the margin which
is maximized under suitable conditions to
avoid outlier inclusion;
• nu-regression: with analogous modifications of the regression model, as in the classification case.
Reminder SVM and margin
Loss functions…
[Plots of loss functions for classification, outlier detection, and regression]
Regression
• By using a different loss function, the ε-insensitive loss function ||y − f(x)||_ε = max{0, ||y − f(x)|| − ε}, SVMs can also perform regression.
• This loss function ignores errors that are smaller than a certain threshold ε > 0, thus creating a tube around the true output.
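A minimal sketch of ε-regression with svm() from e1071 (the package introduced on a later slide); the synthetic data and the epsilon/cost values are illustrative assumptions, not recommended settings.

library(e1071)
set.seed(1)
x <- seq(0, 10, length.out = 200)
y <- sin(x) + rnorm(200, sd = 0.2)
fit <- svm(y ~ x, type = "eps-regression", kernel = "radial",
           epsilon = 0.1, cost = 1)     # epsilon sets the tube half-width
pred <- predict(fit, data.frame(x = x))
plot(x, y); lines(x, pred, col = "blue", lwd = 2)

Errors smaller than epsilon fall inside the tube and contribute nothing to the loss; widening epsilon gives a sparser, smoother fit.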
Example lm v. svm
Again SVM in R
• e1071 – the svm() function in e1071 provides a rigid interface to libsvm along with visualization and parameter tuning methods.
• kernlab features a variety of kernel-based methods and includes an SVM method based on the optimizers used in libsvm and bsvm.
• Package klaR includes an interface to
SVMlight, a popular SVM implementation that
additionally offers classification tools such as
Regularized Discriminant Analysis.
• svmpath – you get the idea…
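For comparison, a hedged sketch of the same ε-regression via ksvm() in kernlab; the data frame and parameter values are again illustrative assumptions.

library(kernlab)
df <- data.frame(x = seq(0, 10, length.out = 200))
df$y <- sin(df$x) + rnorm(200, sd = 0.2)
fit <- ksvm(y ~ x, data = df, type = "eps-svr",
            kernel = "rbfdot", C = 1, epsilon = 0.1)
pred <- predict(fit, df)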
KNN is local – right?
• Nearest neighbors is a simple algorithm that stores all available cases and predicts the numerical target based on a similarity measure (e.g., distance functions).
• KNN has been used in statistical estimation and pattern recognition since the beginning of the 1970s as a non-parametric technique.
Distance…
• A simple implementation of KNN regression
is to calculate the average of the numerical
target of the K nearest neighbors.
• Another approach uses an inverse distance
weighted average of the K nearest
neighbors. Choosing K!
• KNN regression uses the same distance
functions as KNN classification.
• knn.reg (in the FNN package) and also kknn
• http://cran.r-project.org/web/packages/kknn/kknn.pdf
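A minimal sketch of both approaches; the synthetic data and k = 5 are illustrative assumptions. knn.reg() averages the targets of the k nearest neighbours, while kknn() can weight them by distance (e.g., kernel = "inv" for inverse-distance weights).

library(FNN)
set.seed(1)
x    <- sort(runif(100, 0, 10))
y    <- sin(x) + rnorm(100, sd = 0.2)
xnew <- data.frame(x = seq(0, 10, length.out = 50))
fit  <- knn.reg(train = data.frame(x), test = xnew, y = y, k = 5)
fit$pred                                 # simple average of 5 neighbours

library(kknn)
df   <- data.frame(x, y)
wfit <- kknn(y ~ x, train = df, test = xnew, k = 5, kernel = "inv")
fitted(wfit)                             # inverse-distance weighted average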
Classes of local regression
• Locally (weighted) scatterplot smoothing
– LOESS
– LOWESS
• Fitting is done locally: the fit at a point x is made using points in a neighbourhood of x, weighted by their distance from x (with differences in ‘parametric’ variables being ignored when computing the distance)
Classes of local regression
• The size of the neighborhood is controlled by
α (set by span).
• For α < 1, the neighbourhood includes proportion α of the points, and these have tricubic weighting (proportional to (1 − (dist/maxdist)^3)^3). For α > 1, all points are used, with the ‘maximum distance’ assumed to be α^(1/p) times the actual maximum distance for p explanatory variables.
Classes of local regression
• For the default family, fitting is by (weighted)
least squares. For family="symmetric" a few
iterations of an M-estimation procedure with
Tukey's biweight are used.
• Be aware that as the initial value is the least-squares fit, this need not be a very resistant fit.
• It can be important to tune the control list to
achieve acceptable speed.
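A hedged illustration using the built-in cars data; the span value is an assumption for demonstration.

# Default: locally weighted least squares
fit  <- loess(dist ~ speed, data = cars, span = 0.75)
# Robust variant: a few iterations of M-estimation with Tukey's biweight
rfit <- loess(dist ~ speed, data = cars, span = 0.75, family = "symmetric")
# The control list trades exactness for speed (surface = "direct" is exact)
cfit <- loess(dist ~ speed, data = cars,
              control = loess.control(surface = "direct"))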
Friedman (supsmu in modreg)
• supsmu is a running-lines smoother that chooses between three spans for the lines.
• The running lines smoothers are symmetric,
with k/2 data points each side of the predicted
point, and values of k as 0.5 * n, 0.2 * n and
0.05 * n, where n is the number of data
points.
• If span is specified, a single smoother with
span span * n is used.
Friedman
• The best of the three smoothers is chosen by
cross-validation for each prediction. The best
spans are then smoothed by a running lines
smoother and the final prediction chosen by
linear interpolation.
• “For small samples (n < 40) or if there are
substantial serial correlations between
observations close in x-value, then a
prespecified fixed span smoother (span > 0)
should be used. Reasonable span values are
0.2 to 0.4.”
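A minimal sketch with the cars data; the fixed span of 0.3 follows the quoted guidance and is otherwise an arbitrary choice.

# Cross-validated span selection (the default)
fit  <- supsmu(cars$speed, cars$dist)
# Prespecified fixed span, e.g. for small n or serially correlated data
ffit <- supsmu(cars$speed, cars$dist, span = 0.3)
plot(cars); lines(fit); lines(ffit, lty = 2)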
Local non-param
• lplm (in Rearrangement)
• A local nonparametric method: a local linear regression estimator with a box kernel (the default), for conditional mean functions
Ridge regression
• Addresses ill-posed regression problems
using filtering approaches (e.g. high-pass)
• Often called “regularization”
• lm.ridge (in MASS)
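A minimal sketch using the built-in longley data (a classic multicollinearity example); the lambda grid is an illustrative assumption.

library(MASS)
fit <- lm.ridge(GNP.deflator ~ ., data = longley,
                lambda = seq(0, 0.1, by = 0.001))  # grid of ridge penalties
select(fit)   # HKB, L-W, and GCV suggestions for lambda
plot(fit)     # coefficient paths versus lambda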
• Quantile regression
– is desired if conditional quantile functions are of
interest. One advantage of quantile regression, relative to ordinary least squares regression, is that the quantile regression estimates are more robust against outliers in the response measurements.
– In practice we often prefer using different
measures of central tendency and statistical
dispersion to obtain a more comprehensive
analysis of the relationship between variables.
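A minimal sketch using rq() from the quantreg package (one standard implementation, not named on the slide) with the built-in stackloss data; the choice of quantiles is illustrative.

library(quantreg)
# Median (tau = 0.5) and upper/lower quartile regressions
fit <- rq(stack.loss ~ ., data = stackloss, tau = c(0.25, 0.50, 0.75))
summary(fit)   # coefficients per quantile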
More…
• Partial Least Squares Regression (PLSR)
• mvr (in pls)
• Principal Component Regression (PCR)
• Canonical Powered Partial Least Squares
(CPPLS)
• PCR creates components to explain the
observed variability in the predictor variables,
without considering the response variable at
all.
• On the other hand, PLSR does take the
response variable into account, and therefore
often leads to models that are able to fit the
response variable with fewer components.
• Whether or not that ultimately translates into
a better model, in terms of its practical use,
depends on the context.
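A hedged sketch with the yarn data shipped with the pls package; plsr() and pcr() are wrappers around mvr(), and ncomp = 6 is an arbitrary illustrative choice.

library(pls)
data(yarn)   # NIR spectra (predictors) and density (response)
plsfit <- plsr(density ~ NIR, ncomp = 6, data = yarn, validation = "CV")
pcrfit <- pcr(density ~ NIR, ncomp = 6, data = yarn, validation = "CV")
summary(plsfit)   # CV error by number of components; compare with pcrfit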
Splines
• smooth.spline, splinefun (stats, modreg) and
ns (in splines)
– http://www.inside-r.org/r-doc/splines
• a numeric function that is piecewise-defined
by polynomial functions, and which
possesses a sufficiently high degree of
smoothness at the places where the
polynomial pieces connect (which are known
as knots)
Splines
• For interpolation, splines are often preferred to polynomial interpolation – they yield similar results to interpolating with higher-degree polynomials while avoiding instability due to overfitting
• Features: simplicity of their construction, their
ease and accuracy of evaluation, and their
capacity to approximate complex shapes
• Most common: cubic spline, i.e., of order 3 – in particular, the cubic B-spline
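A minimal sketch of both a smoothing spline and an interpolating spline; the data are the built-in cars set (shown on the next slide) plus a small synthetic grid.

# Smoothing spline: smoothness chosen automatically by (generalized) CV
fit <- smooth.spline(cars$speed, cars$dist)
plot(cars); lines(fit, col = "blue")

# Interpolating cubic spline through known points
xs <- 1:10
f  <- splinefun(xs, sin(xs))   # returns a function
f(2.5)                         # evaluate between the knots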
cars
Smoothing/ local …
• https://web.njit.edu/all_topics/Prog_Lang_Docs/html/library/modreg/html/00Index.html
• http://cran.r-project.org/doc/contrib/Ricci-refcard-regression.pdf