MULTIPLE MINIMA: PHENOMENA OF THE DATA OR A DRAWBACK OF THE METHOD?
(COMPARISON OF SAS/IML NONLINEAR OPTIMIZATION ROUTINES)

Ekaterina Gibiansky, Sandoz Pharmaceuticals Corp.

ABSTRACT

Several SAS/IML nonlinear optimization routines and least squares routines were used to estimate the parameters of a nonlinear model. The model came from a pharmacokinetic setting and had a complicated, highly nonlinear form. Eight parameters had to be estimated from 73 data points for each of ten data sets. The derivatives of the objective function were not provided analytically. The analyses were repeated with 100 random (in a plausible range) sets of starting values for each of the ten sets of data. The only two methods that consistently ran without error messages were the Nelder-Mead Simplex method (NLPNMS) and the Quasi-Newton method (NLPQN). Results of these methods are compared, and the pluses and minuses of each method are analyzed. The importance of repeating the estimation with many sets of starting values is shown, and a basic strategy for using and combining the methods is proposed.

INTRODUCTION

In a clinical trial, subjects are given repeated doses of a drug, and measurements of the drug concentration in their blood are taken on one or several occasions after some of the doses. One of the pharmacokinetic objectives is to characterize the drug concentration in the body as a function of time and to estimate several pharmacokinetic parameters that are constant within a subject. According to pharmacokinetic theory, the concentration at time t following a single dose of the drug is of the form [1]:

    C(t) = \mathrm{Dose} \cdot \begin{cases} 0, & t < \tau, \\ \sum_{i=1}^{N} A_i \exp(-b_i t), & t \ge \tau. \end{cases}

Here \tau and the b_i are unknown parameters that have to be estimated; the A_i are known functions of the b_i and of some other unknown parameters that also have to be estimated; N is a known number of terms.

After repeated doses of the drug, the concentration is a superposition of the concentrations from all the doses taken by the time of the measurement. It is a complicated, highly nonlinear function of time and of a set of unknown parameters, and the parameters have to be estimated from the experimental data. This is a problem of nonlinear estimation. Several SAS/IML routines are provided for this purpose: NLPNMS, NLPCG, NLPQN, NLPDQN, NLPDD, NLPTR, NLPNRR, NLPLM, NLPHQN. But none of them provides an easy way to the solution.

DATA

The data came from ten different subjects who were given the drug every day for 30 days. Each of them had 73 blood samples taken during the 58 days after the start of dosing. On three occasions (days 1, 16, 30) blood was drawn 13 times during 24 hours. The rest of the samples were taken 24 hours post dose, right before the next dose, on some of the days. Thus there were ten data sets with 73 data points each, describing the drug concentration in a subject's blood during the study.
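To make the estimation problem concrete, the following SAS/IML sketch shows how a superposition model of this kind and its least-squares objective could be coded as an IML module. It is an illustration only, not the program used in the paper: the two-exponential (N = 2) form, the lag-time parameterization, the module name SSE, and the data vectors time, conc, doseTimes and doses are placeholder assumptions, whereas the actual model had eight parameters and a more complicated structure.

   proc iml;
      /* Placeholder data standing in for one subject: the real study had   */
      /* 73 samples over 58 days and one dose per day for 30 days.          */
      time      = do(1, 58, 1)`;         /* hypothetical sampling times (days)   */
      conc      = j(nrow(time), 1, 1);   /* hypothetical measured concentrations */
      doseTimes = do(0, 29, 1)`;         /* once-daily dosing for 30 days        */
      doses     = j(nrow(doseTimes), 1, 1);

      /* Objective function: sum of squared errors for a superposition of    */
      /* single-dose two-exponential curves with lag time tau.               */
      /* theta = {A1 A2 b1 b2 tau} -- an illustrative parameterization only. */
      start SSE(theta) global(time, conc, doseTimes, doses);
         pred = j(nrow(time), 1, 0);
         do k = 1 to nrow(doseTimes);
            tk  = time - doseTimes[k] - theta[5]; /* time since dose k, minus lag     */
            on  = (tk > 0);                       /* dose k contributes only after lag */
            tku = tk # on;
            pred = pred + doses[k] # on #
                   (theta[1]#exp(-theta[3]#tku) + theta[2]#exp(-theta[4]#tku));
         end;
         return( ssq(conc - pred) );              /* SSE to be minimized */
      finish SSE;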
METHODS

The routines for nonlinear optimization available in SAS/IML are:

1. Two least squares methods: the Levenberg-Marquardt (NLPLM) and Hybrid Quasi-Newton (NLPHQN) methods.

2. Seven optimization methods:

a. The Trust-Region (NLPTR) and Newton-Raphson Ridge (NLPNRR) optimization methods use second-order derivatives. Finite-difference approximations are used to obtain the first- and second-order derivatives if they are not provided. To be efficient, these methods require at least the gradients to be specified analytically.

b. The Conjugate Gradient (NLPCG), Quasi-Newton (NLPQN), Dual Quasi-Newton (NLPDQN) and Double Dogleg (NLPDD) optimization methods use gradients. If the gradients are not provided analytically, they are approximated by finite differences.

c. The Nelder-Mead Simplex (NLPNMS) method uses only the function itself; it does not need the derivatives.

All of the above algorithms require initial estimates of the unknown parameters as input. To obtain them, the program STRIP [2] was applied to a subset of the data (corresponding to the first 24 hours after the start of dosing) for each of the ten subjects. The optimization methods also need an objective function; least squares was used, so the sum of squared errors (SSE) had to be minimized.

All the methods listed above were tried. All of them, except the Nelder-Mead Simplex and Quasi-Newton methods, gave unclear error messages and exited with no results; probably, more tuning of the starting values could have helped. Two methods worked: the Nelder-Mead Simplex and the Quasi-Newton methods.

To study the sensitivity of the methods to starting values, the estimation was repeated 100 times for each of the ten data sets. Each time, the starting values for all the parameters were generated as uniformly distributed random variables within a range of 100% change from the initial starting values obtained with STRIP.
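Continuing in the same PROC IML session as the sketch above, the estimation and the sensitivity study might be scripted roughly as follows. The starting values in x0, the seed, and the form of the +/-100% perturbation are illustrative assumptions; the ABSFTOL/ABSGTOL settings used in the paper are not reproduced here (they would be passed through the termination-criteria argument of the NLP calls), and only the first five arguments of NLPNMS and NLPQN are used.

      x0   = {0.5 0.5 0.3 0.05 0.25};  /* placeholder starting values (from STRIP in the paper) */
      optn = {0 2};                    /* minimize; moderate amount of printed output           */

      /* One fit with each of the two methods that worked */
      call nlpnms(rcNMS, xNMS, "SSE", x0, optn);   /* Nelder-Mead Simplex */
      call nlpqn (rcQN,  xQN,  "SSE", x0, optn);   /* Quasi-Newton        */
      print rcNMS xNMS, rcQN xQN;

      /* Sensitivity to starting values: 100 restarts from random points within */
      /* +/-100% of x0 (the paper repeats this for NMS as well as for QN).      */
      call randseed(20250101);                     /* arbitrary seed            */
      best = 1e300;  xbest = x0;
      do rep = 1 to 100;
         u = j(1, ncol(x0), .);
         call randgen(u, "Uniform");               /* U(0,1) for each parameter */
         xstart = x0 # (2#u);                      /* uniform on (0, 2*x0)      */
         call nlpqn(rc, xr, "SSE", xstart, optn);
         if rc > 0 then do;                        /* keep converged runs only  */
            f = SSE(xr);
            if f < best then do;  best = f;  xbest = xr;  end;
         end;
      end;
      print best xbest;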
RESULTS

Table 1 shows the results of the first run, with STRIP's starting values, for both the Nelder-Mead Simplex (NMS) and Quasi-Newton (QN) methods. A Return Code (RC) greater than zero indicates convergence; a negative Return Code indicates unsuccessful termination. For RC = -5 the SAS manual gives: "The subroutine cannot improve the function value (this is a very general formulation and is used for various circumstances)." The absolute function convergence criterion ABSFTOL was used in NMS; the absolute gradient convergence criterion ABSGTOL was used in QN.

Table 1. The objective function (SSE) and the Return Code (RC) obtained by the Nelder-Mead Simplex (NMS, ABSFTOL=1E-5) and Quasi-Newton (QN, ABSGTOL=1E-5) methods for the ten data sets with STRIP's sets of starting values.

Data set   NMS RC   NMS SSE    QN RC   QN SSE
    1         2       1.180      5     305.595
    2         2     314.222      5     314.195
    3         2       2.199      5       2.123
    4         2       3.321     -5       3.209
    5         2       0.938     -5       0.865
    6         2       1.721     -5       1.719
    7         2       2.352     -5       2.349
    8         2       2.213     -5       2.050
    9         2       3.718     -5       3.541
   10         2       3.696     -5       3.694

Though convergence was reached for all ten data sets with NMS and only for the first three with QN, the values of the objective function are smaller for QN in every case except the first data set, where QN clearly converged to a local minimum very far from the global one.

Table 2 displays the best results of the 100 runs; only runs that converged (RC > 0) were taken into account. The experiment was repeated twice for NMS with different convergence criteria. The criteria used were ABSGTOL=1E-5 for QN, ABSFTOL=1E-5 for NMS1 and ABSFTOL=1E-7 for NMS2. The table also shows the time the runs took on a VAXstation 4000-90.

Table 2. The smallest SSE obtained in 100 runs of the Quasi-Newton (ABSGTOL=1E-5) and Nelder-Mead Simplex (ABSFTOL=1E-5 and ABSFTOL=1E-7) methods for each of the ten data sets; elapsed computer time for each algorithm; maximum error of the NMS parameter estimates relative to the QN parameter estimates.

Data set                               NMS1       NMS2       QN
    1                                  1.146      1.146      1.146
    2                                  2.357      2.355      2.355
    3                                  2.123      2.123      2.123
    4                                  3.209      3.209      3.209
    5                                  0.865      0.865      0.865
    6                                  1.719      1.719      1.719
    7                                  2.349      2.349      2.349
    8                                  2.050      2.050      2.050
    9                                  3.433      3.433      3.432
   10                                  3.298      3.298      3.298
Converged (out of 1000)                  988        917        243
Elapsed time on VAXstation 4000-90   10 hours   35 hours     21 min
Max. relative error vs. QN estimates     3%        14%         --

There are only two differences between the columns of the table: in data set 2, where NMS2 improved the value of the objective function relative to NMS1 to match QN, and in data set 9, where QN has a superior result regardless of the NMS criterion used. Though the differences in the objective functions are small (in the fourth or fifth significant digit), the differences in the parameter estimates are much larger: the maximum errors of the NMS1 and NMS2 parameter estimates relative to the QN estimates are 3% and 14%, respectively.

Another important result comes from a comparison of the two tables. In 4 out of 10 data sets QN produced the wrong result on the first run, and NONE of the NMS results was correct! In theory we all know that nonlinear algorithms converge to a local minimum, but in practice how often do we check the results? This result makes repeating the estimation from different starting values A MUST.

From another point of view, on the first run NMS converged in all 10 cases. Only one of the objective functions was very far from the global minimum; for the rest the relative error did not exceed 12%. Interestingly, the maximum relative error of the parameter estimates for that "bad" case did not exceed 50%, whereas it exceeded 500% in one of the other cases. QN converged in only three cases, and two of those were local minima very far from the global one. But in 5 out of 7 cases where the algorithm said it did not converge, it apparently really did!

Another disadvantage of NMS is that it is very inefficient. It takes NMS 10 hours to complete the 1000 (100 x 10) estimations, whereas QN takes 21 minutes; with the more rigorous convergence criterion NMS takes several times longer.

Table 3 shows how many distinct minima each algorithm produced during the 100 runs. Again, only runs that converged are counted. The last column shows the number of minima produced by NMS2 that are within the 100% error range of the smallest one.

Table 3. Number of distinct minima (when rounded to 0.001) produced by QN, NMS1 and NMS2 in 100 runs with different starting values, and the number of minima produced by NMS2 that are within the 100% error range of the smallest one.

Data set    QN    NMS1   NMS2   NMS2, <100% error
    1        4     69     49      38
    2        5     70     59      56
    3        3     79     75      73
    4        4     53     39      35
    5        3     81     66      54
    6        5     88     67      63
    7        1     84     71      57
    8        5     88     73      64
    9        6     88     80      76
   10        7     59     43      42
Average     4.3   75.9   62.2    55.8

Whereas QN detects 1 to 7 minima (4.3 on average) in the neighborhood of the global minimum, NMS accounts for an average of 76 (NMS1) or 62 (NMS2) minima! Even if you restrict your attention to minima within a 100% range of a "true" minimum, the average number of minima produced by NMS2 is still 56. This means that the algorithm cannot produce the correct result, no matter how strict the termination criteria are: almost every new set of starting values gives a new local minimum.
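The bookkeeping behind Table 3 (how many distinct minima the converged runs produced, and how many of them lie within the 100% error range of the smallest one) is easy to reproduce. The fragment below, meant to run inside a PROC IML session, uses a toy vector of converged SSE values; in practice the vector would be accumulated inside the restart loop shown earlier, and the "within 100%" rule is interpreted here as an SSE no larger than twice the smallest SSE.

      /* Toy vector of SSE values from converged runs (illustrative only) */
      sseRuns   = {1.146, 1.147, 1.146, 1.212, 2.305, 1.146};
      rounded   = round(sseRuns, 0.001);      /* round to 0.001, as in Table 3 */
      nDistinct = ncol(unique(rounded));      /* number of distinct minima     */
      best      = min(rounded);
      nNear     = ncol(unique(rounded[loc(rounded <= 2#best)]));
                                              /* distinct minima within 100% of the best */
      print nDistinct best nNear;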
CONCLUSIONS

For a multidimensional, highly nonlinear problem the Nelder-Mead Simplex method is unable to find the minimum, no matter how tight the convergence criterion is. It converges somewhere around the minimum, in a new place for each set of starting values. However, by repeating the fit many times with different starting values and choosing the best fit, the minimum (or at least a value very close to the minimum) can be obtained. The algorithm is very slow, and this makes the idea of repeating the fit many times impractical.

The Quasi-Newton method, on the contrary, is a very efficient method that is able to find the minimum. To converge, it needs finer tuning of the starting values, and it also needs repetition of the fit with many sets of starting values to find the global minimum.

The other methods available in IML need analytical specification of the derivatives or very fine tuning of the starting values; otherwise it is impossible to make them work.

The proposed strategy of model fitting (a code sketch is given at the end of the paper) is:

1. Use the Nelder-Mead Simplex method with a loose convergence criterion (for speed) to obtain a set of starting values.
2. Improve the result with the Quasi-Newton method.
3. Confirm or improve the result by repeating the fit (as many times as computer speed allows) with different sets of starting values in some neighborhood of the result obtained in step 2, using a grid or random generation of starting values.

The first step may be skipped if starting values obtained from other considerations allow the Quasi-Newton algorithm to converge.

ACKNOWLEDGMENTS

I am grateful to J.R. Nedelman for stimulating discussions and comments that helped to improve the paper.

REFERENCES

1. Wagner, J.G. (1975) Fundamentals of Clinical Pharmacokinetics. Drug Intelligence, Illinois.
2. Gibiansky, E.L. (1994) Best initial values for nonlinear estimation: stripping algorithm implemented as a SAS/IML routine. Proceedings of the Seventh Annual NorthEast SAS Users Group Conference.
3. Hartmann, W.M. (1994) SAS Internal Document: Nonlinear Optimization in IML, Release 6.08.

To contact the author:
Ekaterina Gibiansky
Sandoz Pharmaceuticals Corp., 59 Route 10, East Hanover, NJ 07936
Phone: (201) 503-7399
Fax: (201) 503-8865
E-mail: [email protected]

SAS and SAS/IML are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. (R) indicates USA registration.
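A minimal IML sketch of the proposed three-step strategy follows, assuming the SSE module, the data vectors and the starting values x0 from the earlier fragments are still in scope. The default termination criteria are used for both calls (the paper loosens the NMS criterion for speed and sets ABSFTOL/ABSGTOL explicitly), and the +/-20% neighborhood and 20 restarts in step 3 are arbitrary illustrative choices.

      /* Step 1: Nelder-Mead Simplex from the initial starting values.       */
      call nlpnms(rc1, x1, "SSE", x0, optn);

      /* Step 2: refine the simplex result with the Quasi-Newton method.     */
      call nlpqn(rc2, x2, "SSE", x1, optn);
      fbest = SSE(x2);  xbest = x2;

      /* Step 3: confirm or improve the result by restarting QN from random  */
      /* points in a neighborhood of x2 (here +/-20%, 20 restarts).          */
      call randseed(54321);
      do rep = 1 to 20;
         u = j(1, ncol(x2), .);
         call randgen(u, "Uniform");
         xstart = x2 # (0.8 + 0.4#u);          /* uniform on (0.8, 1.2) * x2 */
         call nlpqn(rc, xr, "SSE", xstart, optn);
         if rc > 0 then do;
            f = SSE(xr);
            if f < fbest then do;  fbest = f;  xbest = xr;  end;
         end;
      end;
      print fbest xbest;
   quit;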