Mathematics 241: Nonlinear models

1 Non-linear Curve Fitting

1.1 Linearization
Suppose that we wish to fit a function y = f(x) to data for which a linear function is clearly not appropriate. We generally know this because we see a definite non-linear pattern in the scatterplot (or in a residual plot) or because the science behind the relationship tells us that a non-linear relationship might be more appropriate.
Of course we cannot simply search for an arbitrary function f. We could fit the data exactly with a polynomial of sufficiently high degree, but such a polynomial is unlikely to be a useful model. Therefore, to fit a non-linear function to data, we generally constrain the function we are looking for to be in some small class of functions. Usually this class is defined by a small number of parameters. This is what we did in the linear case – we limited ourselves to a two-parameter (β0, β1) family of functions.
We use the example of the world records in track.
Example 1.1. Various biological models suggest that a possible model for the men's world records in track might have the form

y = b0 x^b1

where x is the distance in meters, y is the time in seconds, and b0 and b1 are parameters. One way to fit these data is to linearize the relationship. Notice that if this equation is correct,

ln y = ln b0 + b1 ln x

or

y' = β̂0 + β̂1 x'.

Notice here that we have rewritten this relationship so that we can see that it really is linear. The variables x' and y' are just data: x' = ln x and y' = ln y. The estimated parameters β̂0 and β̂1 are just ln b0 and b1 respectively. We can compute β̂0 and β̂1 using lm.
> ltrack=lm(log(Seconds)~log(Meters),data=mentrack)
> ltrack
Call:
lm(formula = log(Seconds) ~ log(Meters), data = mentrack)
Coefficients:
(Intercept)  log(Meters)
     -2.921        1.124
> exp(-2.921)
[1] 0.05387978
Thus −2.921 is ln b0 and 1.124 is b1. Therefore the curve that we want is the fitted relationship Seconds = 0.05 · Meters^1.12.
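The same back-transformation can be done without retyping numbers by pulling the coefficients out of the fitted object. This is only a small convenience sketch, assuming the ltrack fit from above (the names b, b0, and b1 are introduced here); the values it prints should match those just computed.

> b = coef(ltrack)   # named vector: (Intercept) and log(Meters)
> b0 = exp(b[1])     # back-transform the intercept to recover b0
> b1 = b[2]          # the slope estimates b1 directly
> c(b0, b1)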
1.2 Least-squares Solution
There is another way to estimate a non-linear relationship of the form y = f(x) that does not involve the step of linearizing the relationship. Since not all relationships can be linearized, this method should be in any scientist's toolbox.
We found the least-squares line in Example 1.1 by minimizing the sum of squares of the residuals in the transformed equation. Notice that the residuals in this case were in units of log-seconds. We next try to find b0 and b1 without transforming the equation, by minimizing the sum of squares of residuals directly. Our residuals have the form ei = yi − b0 xi^b1, and minimizing the sum of squares of these quantities amounts to minimizing a non-linear function.
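To make this criterion concrete before introducing a dedicated tool, here is a minimal sketch that minimizes this sum of squares with R's general-purpose optimizer optim. It assumes the mentrack data frame used above; ss is just a name chosen here for the objective function, and the starting guess comes from the linearized fit.

> ss = function(b) sum((mentrack$Seconds - b[1]*mentrack$Meters^b[2])^2)  # sum of squared residuals
> optim(c(0.05, 1.12), ss)$par   # numerical minimizer; returns estimates of b0 and b1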
The R function nls minimizes the sum of squares of residuals for a wide range of functions. However, a non-linear minimization routine needs a starting guess that is sufficiently close to the minimum in order to be successful. We have a good starting guess in the solution to our linearization.
> nltrack=nls(Seconds~b0*Meters^b1, start=list(b0=.05,b1=1),data=mentrack)
> nltrack
Nonlinear regression model
model: Seconds ~ b0 * Meters^b1
data: mentrack
     b0      b1
0.08395 1.06864
residual sum-of-squares: 168.1
Number of iterations to convergence: 5
Achieved convergence tolerance: 1.484e-06
The fitted function is now Seconds = 0.08 · Meters^1.07. This is not the same solution as we got in Example 1.1. This is because we are minimizing a different quantity and so we have a different criterion for the best-fitting curve. In this case, our residuals are in seconds. Roughly, this means that larger errors (in seconds) are penalized more in this method of fitting than in the transformed model (since log-seconds are much smaller).
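One quick way to see how the two fits differ is to draw both curves over the data. A minimal sketch, assuming the mentrack data frame and the ltrack and nltrack fits from above:

> plot(Seconds ~ Meters, data=mentrack)                                # scatterplot of the records
> curve(exp(coef(ltrack)[1]) * x^coef(ltrack)[2], add=TRUE)            # linearized fit, back-transformed
> curve(coef(nltrack)["b0"] * x^coef(nltrack)["b1"], add=TRUE, lty=2)  # nls fit, dashed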
1.3 Comparison of the Two Methods
There is no single right method. Both methods fit the function by some reasonable criterion of "best." Sometimes the first method suits our purposes better, and in other cases the second does. Quite often, the functions computed by the two methods will be close. Here we compare the two models for the men's track records.
In the linear model, we found that the fitted relationship ln y = −2.92 + 1.12 ln x minimized the sum of squares of residuals. To see what this means, we write the equation for the residuals:

ln yi = −2.92 + 1.12 ln xi + ei

Transforming this equation back to the original units, we have

yi = e^(−2.92) e^(ei) xi^1.12 = e^(ei) (0.05 xi^1.12)
In other words, the residuals are multiplicative in the original model and what we are minimizing is really percentage
error rather than absolute error.
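Since a residual ei on the log scale corresponds to a multiplicative factor e^(ei) on the original scale, the percentage errors of the linearized fit can be read off directly. A minimal sketch, assuming the ltrack fit from above (these factors reappear as lfactor in the comparison below):

> exp(residuals(ltrack))             # factor by which each actual time exceeds its prediction
> 100*(exp(residuals(ltrack)) - 1)   # the same factors expressed as percentage errors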
The following output compares the errors of the two models. It is easy to see the differences between the two models
by looking at the extreme points. For the 10,000 meter race, the prediction of the linearized model is more than a
minute off (104 seconds) but is only about 6% off. The maximum percentage error for the linear model is about 7%
(for the 200 meter race). On the other hand, the non-linear least squares model has a maximum error of at most 7.5
seconds but is more than 15% off in its prediction of the 100 meter race time. Clearly the question here is whether
we prefer to minimize absolute error (in seconds) or relative error (in percentages).
Historically, the first of our two methods was usually preferred. The reason is that the linear least-squares solution can be computed with a minimum of technology. The second method is now widely used – for example, it is the method implemented in Logger-Pro for all non-linear problems. However, the choice between the two methods should be made based on what one considers a good model rather than on which technology is more convenient. As we have shown above, both methods are easy to use in R.
> options(digits=3)
> lmodel = exp(predict(ltrack))
> lerror = mentrack$Seconds - lmodel
> lfactor = exp(residuals(ltrack))
> nmodel = predict(nltrack)
> nerror = residuals(nltrack)
> nfactor = mentrack$Seconds/nmodel
> results = data.frame(actual=mentrack$Seconds, lmodel, lerror, lfactor, nmodel, nerror, nfactor)
> results
    actual  lmodel   lerror lfactor nmodel nerror nfactor
1     9.69    9.52    0.173   1.018   11.5  -1.83   0.841
2    19.30   20.74   -1.436   0.931   24.2  -4.85   0.799
3    43.18   45.18   -1.999   0.956   50.7  -7.48   0.852
4   101.11   98.44    2.673   1.027  106.3  -5.15   0.952
5   131.96  126.48    5.475   1.043  134.9  -2.92   0.978
6   206.00  199.47    6.527   1.033  208.0  -2.03   0.990
7   223.13  215.88    7.254   1.034  224.3  -1.14   0.995
8   284.79  275.59    9.203   1.033  282.9   1.89   1.007
9   440.67  434.61    6.056   1.014  436.3   4.34   1.010
10  757.35  771.54  -14.193   0.982  753.2   4.19   1.006
11 1577.53 1681.05 -103.516   0.938 1579.7  -2.18   0.999
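The worst-case errors quoted above can be checked directly from this data frame. A minimal sketch, assuming the results object just built:

> max(abs(results$lerror))       # largest absolute error (seconds) of the linearized model
> max(abs(results$nerror))       # largest absolute error (seconds) of the nls model
> max(abs(results$lfactor - 1))  # largest relative error of the linearized model
> max(abs(results$nfactor - 1))  # largest relative error of the nls model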
Problem

Find functions g and h that transform the following non-linear equations y = f(x), which depend on parameters b0 and b1, into linear equations g(y) = b0' + b1' h(x).

1. y = b0/(b1 + x)

2. y = x/(b0 + b1 x)

3. y = 1/(1 + b0 e^(b1 x))