One-Dimensional Curve-Fitting
Presented by Wenli Li, Shuhong Li, and Vivian Tam
Venables and Ripley, Section 8.7
November 22, 2004

INTRODUCTION
• Curve-fitting:
  • Sample data: {(x0,y0), (x1,y1), ..., (xn,yn)}
  • Interpolation & extrapolation
• One-dimensional curve-fitting (Section 8.7):
  • The functional form is not pre-specified
  • SPLINES (ns, smooth.spline)
  • Local regression (LOESS, SUPSMU, KERNEL SMOOTHER and LOCPOLY)
• Data set:
  • One independent & one dependent variable
  • Examples: GAGurine & Mercury level

GAGurine (MASS)
• Dataset:
  • Variables:
    • Age: independent
    • GAG: dependent
  • Sample size: 314

Classical way:
  library(MASS)
  attach(GAGurine)
  plot(Age, GAG, main="Degree 6 polynomial")
  # Fit a degree-8 polynomial and test the terms sequentially
  GAG.lm <- lm(GAG ~ Age + I(Age^2) + I(Age^3) + I(Age^4) + I(Age^5) + I(Age^6) + I(Age^7) + I(Age^8))
  anova(GAG.lm)
  # The degree-7 and degree-8 terms are not significant, so refit with degree 6
  GAG.lm2 <- lm(GAG ~ Age + I(Age^2) + I(Age^3) + I(Age^4) + I(Age^5) + I(Age^6))
  xx <- seq(0, 17, len=200)
  lines(xx, predict(GAG.lm2, data.frame(Age=xx)), col="red")

Age: 0.00 0.00 ... 0.46 0.47 ... 17.30 7.67
GAG: 23.0 23.8 ... 18.6 26.4 ... 1.9 9.3
=======================================
Terms added sequentially (first to last)
            Df  Sum of Sq  Mean Sq  F-value    Pr(F)
  Age        1      12590    12590   593.58   0.0000
  I(Age^2)   1       3751     3751   176.84   0.0000
  I(Age^3)   1       1492     1492    70.32   0.0000
  I(Age^4)   1        449      449    21.18   0.00001
  I(Age^5)   1        174      174     8.22   0.00444
  I(Age^6)   1        286      286    13.48   0.00028
  I(Age^7)   1         57       57     2.70   0.10151
  I(Age^8)   1         45       45     2.12   0.14667

SPLINES
• Algorithm:
  • Function: ns( )
  • Generates a basis matrix for natural cubic splines
• Usage:
  ns(x, df, knots, intercept=F, Boundary.knots, derivs)
• Arguments:
  • Required:
    • x: the predictor variable.
  • Optional:
    • df: degrees of freedom. One can supply df rather than knots; ns then chooses df-1-intercept knots at suitably chosen quantiles of x. This argument is ignored if knots is supplied.
    • knots: breakpoints that define the spline.

SPLINES
Function: smooth.spline( )
• Fits a cubic B-spline smooth to the input data.
• Usage:
  smooth.spline(x, y, w = <<see below>>, df = <<see below>>, spar = 0, cv = F, all.knots = F, df.offset = 0, penalty = 1)
• Arguments:
  • Required:
    • x: values of the predictor variable. There should be at least ten distinct x values.
  • Optional:
    • y: response variable, of the same length as x.
    • df: a number which supplies the degrees of freedom = trace(S) rather than a smoothing parameter.

SPLINES
  library(splines)
  plot(Age, GAG, type="n", main="Spline")  # splines
  lines(Age, fitted(lm(GAG ~ ns(Age, df=5))), col="red")
  lines(Age, fitted(lm(GAG ~ ns(Age, df=10))), lty=3, col="green")
  lines(Age, fitted(lm(GAG ~ ns(Age, df=20))), lty=4, col="blue")
  lines(smooth.spline(Age, GAG), lwd=3, col="black")  # smoothing splines
  legend(12, 50, c("red: df=5", "green: df=10", "blue: df=20", "Smoothing"), lty=c(1,3,4,1), lwd=c(1,1,1,3), bty="n")

KERNEL SMOOTH
Function: ksmooth( )
• Estimates a probability density or performs scatterplot smoothing using kernel estimates.
• Usage:
  ksmooth(x, y=NULL, kernel="box", bandwidth=0.5, range.x=range(x), n.points=length(x), x.points=<<see below>>)
• Arguments:
  • Required:
    • x: vector of x data.
  • Optional:
    • y: vector of y data. This must be the same length as x, and missing values are not accepted.
    • kernel: "box", "triangle", "parzen", "normal"
    • bandwidth: larger values of bandwidth make smoother estimates, smaller values of bandwidth make less smooth estimates.
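The bandwidth behaviour described above can be checked numerically. The following is a minimal sketch in R, assuming simulated data (not the GAGurine data) and a hypothetical helper, roughness, that sums the squared second differences of the fitted values. Note that ksmooth in R accepts only the "box" and "normal" kernels, so "normal" is used here.

```r
# Illustrative only: simulated data, not the GAGurine data from the slides.
set.seed(1)
x <- sort(runif(200, 0, 10))
y <- sin(x) + rnorm(200, sd = 0.3)

# Same kernel and same evaluation grid, two different bandwidths.
fit_narrow <- ksmooth(x, y, kernel = "normal", bandwidth = 0.5, n.points = 100)
fit_wide   <- ksmooth(x, y, kernel = "normal", bandwidth = 5,   n.points = 100)

# Hypothetical helper: roughness as the sum of squared second differences.
roughness <- function(fit) sum(diff(fit$y, differences = 2)^2)

roughness(fit_narrow)  # larger: the curve follows local wiggles
roughness(fit_wide)    # smaller: the curve is much smoother
```

The wider bandwidth averages over more neighbouring points, so its fitted curve changes slope more slowly; this is the same trade-off that the bandwidth=1 and bandwidth=5 curves show in the plots.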
Kernel Smoother
  # kernel smoother:
  plot(Age, GAG, type="n", main="ksmooth")
  lines(ksmooth(Age, GAG, "normal", bandwidth=1), col="red")
  lines(ksmooth(Age, GAG, "normal", bandwidth=5))
  legend(12, 50, c("red: bandwidth=1", "black: bandwidth=5"), bty="n")

LOESS
• Fits a curve determined by one or more numerical predictors using local polynomial regression
• Gets a predicted value at each point by fitting a weighted linear regression, where the weights decrease with distance from the point of interest

LOESS Parameters
• f: controls the window size
• weights: based on the distance from the point x of interest
• span: the parameter alpha which controls the degree of smoothing
• degree: the degree of the polynomials to be used, up to 2

LOESS Code:
  library(MASS)
  attach(GAGurine)
  plot(Age, GAG, type="n", main="loess")
  # degree may be at most 2 (see LOESS Parameters)
  lines(loess.smooth(Age, GAG, span=2/3, degree=1), col="red", lwd=1)
  lines(loess.smooth(Age, GAG, span=2/3, degree=2), col="blue", lwd=2)
  lines(loess.smooth(Age, GAG, span=1/3, degree=2), col="green", lwd=1)
  legend(10, 45, c("Red: span=2/3,deg=1", "Blue: span=2/3,deg=2", "Green: span=1/3,deg=2"), bty="n")

SUPSMU
• Serves a purpose similar to that of the function loess
• Fits running lines smoothers with three different spans; the best of the three smoothers is chosen by cross-validation
• If there are substantial serial correlations between observations close in x-value, then a pre-specified fixed span smoother should be used. Reasonable span values are 0.2 to 0.4

SUPSMU Parameters:
• span: the fraction of the observations in the span of the running lines smoother, or "cv" to choose this by leave-one-out cross-validation
• bass: controls the smoothness of the fitted curve. Values of up to 10 indicate increasing smoothness
• periodic: if TRUE, the smoother assumes x is a periodic variable with values in the range [0.0, 1.0] and period 1.0. An error occurs if x has values outside this range

References:
Friedman, J. H. (1984) A variable span scatterplot smoother. Laboratory for Computational Statistics, Stanford University Technical Report No. 5

Code:
  plot(Age, GAG, type="n", main="supsmu")
  lines(supsmu(Age, GAG))
  lines(supsmu(Age, GAG, bass=3), lty=3)
  lines(supsmu(Age, GAG, bass=10), lty=4)
  legend(12, 50, c("default", "bass=3", "bass=10"), lty=c(1,3,4), bty="n")

LOCPOLY
• Estimates a probability density function using local polynomials
• A fast binned implementation over an equally-spaced grid is used
• Uses approximations over an equally-spaced grid for fast computation
• In a simple form: locpoly(x, y, degree=#, bandwidth=#)

Parameters:
• locpoly(x, y, drv=0, degree=<<see below>>, kernel="normal", bandwidth, gridsize=401, bwdisc=25, range.x=<<see below>>, binned=FALSE, truncate=TRUE)
• drv: order of derivative to be estimated
• degree: degree of local polynomial used
• bandwidth: the kernel bandwidth smoothing parameter
• range.x: vector containing the minimum and maximum values of 'x' at which to compute the estimate

Code: LOCPOLY
  library(MASS)
  attach(GAGurine)
  library(KernSmooth)
  plot(Age, GAG, type="n", main="(Age, GAG) Locpoly")
  # select the bandwidth with the direct plug-in rule and print it
  (h <- dpill(Age, GAG))
  lines(locpoly(Age, GAG, degree=0, bandwidth=h), col="red", lty=1, lwd=2)
  lines(locpoly(Age, GAG, degree=1, bandwidth=h), col="blue", lty=3, lwd=3)
  lines(locpoly(Age, GAG, degree=2, bandwidth=h), col="green", lty=4, lwd=3)
  legend(10, 40, c("const=0 red", "linear=1 blue", "quad=2 green"), lty=c(1,3,4), bty="n")
  detach()

LOCPOLY: GAGurine

Example: Mercury Level
• Model: Mercury and Alkalinity
• From 1990 to 1991, largemouth bass were studied in 53 different Florida lakes to examine the mercury contamination level and the factors that influenced the level of mercury absorption in the fish
• One factor studied was the alkalinity level of the water
• The graph of mercury level against alkalinity level is plotted to study the relationship

Mercury Level Graphs

Coding:
  # The code below assumes the variables Alkalinity and Mercury are already available (e.g. from an attached data frame)

  #1 loess
  plot(Alkalinity, Mercury, main="Alkalinity and Mercury, Loess")
  lines(loess.smooth(Alkalinity, Mercury, span=2/3, degree=1), col="red", lwd=2)
  lines(loess.smooth(Alkalinity, Mercury, span=2/3, degree=2), col="blue", lwd=2)
  legend(65, 1.0, c("deg=1 Red", "deg=2 Blue"), bty="n")

  #2 supsmu
  plot(Alkalinity, Mercury, main="Alkalinity and Mercury, Supsmu")
  lines(supsmu(Alkalinity, Mercury, bass=1), lty=1, col="red", lwd=2)
  lines(supsmu(Alkalinity, Mercury, bass=10), lty=3, col="blue", lwd=3)
  legend(58, 1.0, c("bass=1 red", "bass=10 blue"), lty=c(1,3), bty="n", lwd=2)

  #3 ksmooth
  plot(Alkalinity, Mercury, type="n", main="Alkalinity and Mercury, Ksmooth")
  lines(ksmooth(Alkalinity, Mercury, "normal", bandwidth=1), col="green", lwd=2)
  lines(ksmooth(Alkalinity, Mercury, "normal", bandwidth=5), col="red", lty=2, lwd=2)
  legend(75, 1.0, c("bw=1", "bw=5"), lty=c(1,2), bty="n")

  #4 locpoly
  library(KernSmooth)
  plot(Alkalinity, Mercury, type="n", main="Alkalinity and Mercury, Locpoly")
  # select bandwidth
  (h <- dpill(Alkalinity, Mercury))
  lines(locpoly(Alkalinity, Mercury, degree=0, bandwidth=h), lty=1, col="green", lwd=2)
  lines(locpoly(Alkalinity, Mercury, degree=1, bandwidth=h), lty=2, col="red", lwd=2)
  lines(locpoly(Alkalinity, Mercury, degree=2, bandwidth=h), lty=3, col="purple", lwd=3)
  legend(75, 1.0, c("const", "linear", "quad"), lty=c(1,2,3), bty="n")

SUMMARY
• Use One-Dimensional Curve-Fitting when:
  • The scatter plot does not suggest a linear model
  • Data transformation does not give a satisfactory linear model
  • Future data need to be accommodated
  • Previous outliers need to be included
  • Business applications
• Several methods discussed, including:
  1. SPLINES
  2. LOESS
  3. SUPSMU
  4. KSMOOTH
  5. LOCPOLY
• Parameters such as bandwidth, df, derivative, smoothness and degree control how closely the fitted curve follows the data.
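As a rough illustration of how one such parameter changes a fit, the sketch below refits smooth.spline to the GAGurine data with several values of df and records the residual sum of squares. The particular df values and the use of training-data RSS as a summary are assumptions made for this example, not something taken from the slides.

```r
library(MASS)  # provides the GAGurine data used in the slides

# Illustrative choice of equivalent degrees of freedom.
dfs <- c(3, 5, 10, 20)

# For each df, fit a smoothing spline and compute the residual
# sum of squares at the observed Age values.
rss <- sapply(dfs, function(d) {
  fit <- smooth.spline(GAGurine$Age, GAGurine$GAG, df = d)
  sum((GAGurine$GAG - predict(fit, GAGurine$Age)$y)^2)
})

data.frame(df = dfs, rss = rss)
```

The RSS falls as df grows because a more flexible curve tracks the observations more closely. Since it is measured on the same data used for fitting, a smaller value does not by itself indicate a better smoother; cross-validation (for example cv = TRUE in smooth.spline) is one way to choose the parameter.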