MULTIPLE MINIMA: PHENOMENA OF THE DATA
OR A DRAWBACK OF THE METHOD?
(COMPARISON OF SAS/IML® NONLINEAR OPTIMIZATION ROUTINES)
Ekaterina Gibiansky
Sandoz Pharmaceuticals Corp.
ABSTRACT
Several SAS/IML nonlinear optimization routines and the least squares routines were used to estimate the parameters of a nonlinear model. The model came from a pharmacokinetic setting and had a complicated, highly nonlinear form. Eight parameters had to be estimated from 73 data points for each of ten data sets. The derivatives of the objective function were not provided analytically. The analyses were repeated with 100 random (in a plausible range) sets of starting values for each of the ten sets of data. The only two methods that consistently ran without error messages were the Nelder-Mead Simplex method (NLPNMS) and the Quasi-Newton method (NLPQN). Results of these methods are compared, and the strengths and weaknesses of each are analyzed. The importance of repeating the estimation with many sets of starting values is shown, and a basic strategy for using and combining the methods is proposed.
INTRODUCTION

In a clinical trial subjects are given repeated doses of a drug, and measurements of the drug concentration in their blood are taken on one or several occasions after some of the doses. One of the pharmacokinetic objectives is to characterize the drug concentration in the body as a function of time and to estimate several pharmacokinetic parameters that are constant within a subject. According to pharmacokinetic theory, the concentration at time t following a single dose of the drug is of the form [1]:

                 { 0,                              if t < τ
   C(t) = Dose · {
                 { Σ (i=1..N) A_i exp(-b_i t),     if t ≥ τ

Here τ and the b_i are the unknown parameters that have to be estimated; the A_i are known functions of the b_i and of some other unknown parameters that also have to be estimated; N is a known number of terms.

After repeated doses of the drug, the concentration is a superposition of the concentrations from all the doses taken by the time of the measurement. It is a complicated, highly nonlinear function of time and of a set of unknown parameters. The parameters have to be estimated from the experimental data. This is a problem of nonlinear estimation. Several SAS/IML routines are provided for this purpose: NLPNMS, NLPCG, NLPQN, NLPDQN, NLPDD, NLPTR, NLPNRR, NLPLM, NLPHQN. But none of them provides an easy way to the solution.
DATA
The data came from ten subjects who were given the drug every day for 30 days. Each of them had 73 blood samples taken during the 58 days following the start of dosing. On three occasions (days 1, 16 and 30) blood was drawn 13 times during 24 hours. The rest of the samples were taken 24 hours post dose, right before the next dose, on some of the other days. Thus there were ten data sets with 73 data points each, describing the drug concentration in a subject's blood during the study.
METHODS
Routines for nonlinear optimization available in SAS/IML:

1. Two least squares methods: the Levenberg-Marquardt (NLPLM) and Hybrid Quasi-Newton (NLPHQN) methods.

2. Seven optimization methods:

a. The Trust-Region (NLPTR) and Newton-Raphson Ridge (NLPNRR) optimization methods use second-order derivatives. Finite difference approximations are used to obtain the first- and second-order derivatives if they are not provided. To be efficient, these methods require at least the gradients to be specified analytically.
b. The Conjugate Gradient (NLPCG), Quasi-Newton (NLPQN), Dual Quasi-Newton (NLPDQN) and Double-Dogleg (NLPDD) optimization methods use gradients. If these are not provided analytically, they are approximated by finite differences.

c. The Nelder-Mead Simplex (NLPNMS) method uses only the function itself; it does not need the derivatives.
All the above algorithms require initial estimates of the unknown parameters. To obtain them, the program STRIP [2] was used on a subset of the data (corresponding to the first 24 hours after the start of dosing) for each of the ten subjects. The optimization methods also need an objective function. Least squares was used: the sum of squared errors (SSE) had to be minimized.
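Continuing the sketch above, the SSE objective and the calls to the two routines that eventually worked might look roughly as follows. The superposition module CONCSUM, the global data vectors TIME and CONC, and the STRIP starting values X0STRIP are stand-ins introduced for the example; the positional call NLPxxx(rc, xr, "fun", x0, opt) follows the convention documented for the IML nonlinear optimization routines [3].

   /* Least-squares objective for one subject.  TIME and CONC are      */
   /* assumed global vectors of sampling times and measured            */
   /* concentrations; CONCSUM is a hypothetical module returning the   */
   /* multiple-dose (superposed) concentration at time t for the       */
   /* parameter vector x.                                              */
   start sse(x) global(TIME, CONC);
      pred = j(nrow(TIME), 1, 0);
      do k = 1 to nrow(TIME);
         pred[k] = concsum(TIME[k], x);
      end;
      return( ssq(CONC - pred) );        /* sum of squared errors      */
   finish sse;

   optn = {0 1};                         /* minimize; modest printed output */
   x0   = x0strip;                       /* starting values from STRIP [2]  */
   call nlpnms(rcnms, xnms, "sse", x0, optn);   /* Nelder-Mead Simplex */
   call nlpqn (rcqn,  xqn,  "sse", x0, optn);   /* Quasi-Newton        */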
All the methods listed above were tried. All of them, except the Nelder-Mead Simplex and Quasi-Newton methods, gave unclear error messages and exited with no results. Probably, more tuning of the starting values could have helped. Two methods worked: the Nelder-Mead Simplex and Quasi-Newton methods. To study the sensitivity of these methods to the starting values, the estimation was repeated 100 times for each of the ten data sets. Each time the starting values for all the parameters were generated as uniformly distributed random variables in a range of 100% change from the initial starting values obtained using STRIP.
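Such a repetition loop might be sketched as follows (continuing the previous fragments; the use of RANDGEN and the exact form of the 100% perturbation are assumptions made for the illustration):

   nrep = 100;
   npar = ncol(x0strip);
   best = .;                                 /* smallest SSE found so far */
   do r = 1 to nrep;
      /* random starting values within 100% of the STRIP values: each    */
      /* coordinate varies uniformly between 0 and twice its STRIP value */
      u  = j(1, npar, 0);
      call randgen(u, "Uniform");
      x0 = x0strip # (2 * u);
      call nlpqn(rc, xr, "sse", x0, optn);
      if rc > 0 then do;                     /* keep converged runs only  */
         f = sse(xr);
         if best = . | f < best then do;
            best  = f;
            xbest = xr;
         end;
      end;
   end;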
RESULTS
Table 1 shows the results of the first run, with STRIP's starting values, for both the Nelder-Mead Simplex (NMS) and Quasi-Newton (QN) methods. A Return Code (RC) greater than zero indicates convergence; a negative Return Code indicates unsuccessful termination. For RC=-5 the SAS manual gives: "The subroutine cannot improve the function value (this is a very general formulation and is used for various circumstances)". The absolute function convergence criterion ABSFTOL was used in NMS; the absolute gradient convergence criterion ABSGTOL was used in QN.
Though convergence was reached for all ten data sets in NMS and only for the first three in QN, the values of the objective function are smaller for QN in every case except the first data set, where QN clearly converged to a local minimum very far from the global one.
Data set     NMS RC     NMS SSE     QN RC      QN SSE
    1           2         1.180        5      305.595
    2           2       314.222        5      314.195
    3           2         2.199        5        2.123
    4           2         3.321       -5        3.209
    5           2         0.938       -5        0.865
    6           2         1.721       -5        1.719
    7           2         2.352       -5        2.349
    8           2         2.213       -5        2.050
    9           2         3.718       -5        3.541
   10           2         3.696       -5        3.694

Table 1. The objective function (SSE) and the Return Code (RC) obtained by the Nelder-Mead Simplex (NMS) and Quasi-Newton (QN) methods for the ten data sets with STRIP's sets of starting values.

Table 2 displays the best results of the 100 runs. Only the runs that converged (RC > 0) were taken into account. The experiment was repeated twice for NMS with different convergence criteria. The criteria used were ABSGTOL=10^-5 for QN, ABSFTOL=10^-5 for NMS1 and ABSFTOL=10^-7 for NMS2. The table also shows the time it took on a VAXstation 4000-90.

There are only two differences among the columns of the table: in data set 2, where NMS2 improved the value of the objective function over NMS1 to match QN, and in data set 9, where QN has a superior result regardless of the NMS criterion used. Though the differences in the objective functions are small (in the fourth or fifth significant digit), the differences in the parameter estimates are much larger: the maximum errors of the NMS1 and NMS2 parameter estimates relative to the QN estimates are 3% and 14%, respectively. Another important result comes from a comparison of the two tables. In 4 out of 10 data sets QN produced the wrong results on the first run, and NONE of the NMS results was correct!
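Incidentally, once both fits are available, the relative-error comparison quoted above is a one-liner in IML (xnms and xqn stand for the parameter vectors estimated by NMS and QN for one data set):

   /* maximum relative error of the NMS parameter estimates,           */
   /* taking the QN estimates as the reference                         */
   maxrelerr = max( abs( (xnms - xqn) / xqn ) );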
In theory we all know that nonlinear algorithms converge to a local minimum, but in practice how often do we check the results? The above result makes repetition of the estimation with different starting values A MUST.
From another point of view, on the first run NMS converged in all 10 cases. Only one of the objective functions was very far from the global minimum; for the rest, the relative error did not exceed 12%. Interestingly, the maximum relative error of the parameter estimates for that 'bad' case did not exceed 50%, whereas it exceeded 500% in one of the other cases. QN converged in only three cases, and two of them were local minima very far from the global one. But in 5 out of 7 cases where the algorithm said it did not converge, it apparently really did!
Another disadvantage of NMS is that it is very inefficient. It takes NMS 10 hours to complete 1000 (100*10) estimations, whereas QN takes 21 minutes. And with the more rigorous convergence criterion, NMS takes several times longer.
Data set                          NMS1        NMS2          QN
    1                            1.146       1.146       1.146
    2                            2.357       2.355       2.355
    3                            2.123       2.123       2.123
    4                            3.209       3.209       3.209
    5                            0.865       0.865       0.865
    6                            1.719       1.719       1.719
    7                            2.349       2.349       2.349
    8                            2.050       2.050       2.050
    9                            3.433       3.433       3.432
   10                            3.298       3.298       3.298
Converged (out of 1000)            988         917         243
Elapsed time (VAX 4000-90)    10 hours    35 hours      21 min
Maximum relative error of
parameter estimates (vs. QN)        3%         14%           -

Table 2. The smallest SSE obtained in 100 runs of the Quasi-Newton (ABSGTOL=10^-5) and Nelder-Mead Simplex (NMS1: ABSFTOL=10^-5, NMS2: ABSFTOL=10^-7) methods for each of the ten data sets; the number of converged runs; the elapsed computer time for each of the algorithms; and the maximum error of the NMS parameter estimates relative to the QN parameter estimates.
Table 3 shows how many distinct minima each of the algorithms produced during the 100 runs. Again, only the runs that converged are counted. The last column shows the number of the minima produced by NMS2 that lie within the 100% error range of the smallest one.

Data set      QN    NMS1    NMS2    NMS2, within 100% of the smallest
    1          4      69      49      38
    2          5      70      59      56
    3          3      79      75      73
    4          4      53      39      35
    5          3      81      66      54
    6          5      88      67      63
    7          1      84      71      57
    8          5      88      73      64
    9          6      88      80      76
   10          7      59      43      42
Average      4.3    75.9    62.2    55.8

Table 3. Number of distinct minima (when rounded to 0.001) produced by QN, NMS1 and NMS2 in 100 runs with different starting values, and the number of minima produced by NMS2 that are within the 100% error range of the smallest one.

Whereas QN detects 1 to 7 minima (4.3 on average) in the neighborhood of the global minimum, NMS accounts for an average of 76 (NMS1) or 62 (NMS2) minima! Even if attention is restricted to minima within a 100% range of the "true" minimum, the average number of minima produced by NMS2 is still 56. This means that the algorithm cannot produce the correct result, no matter how strict the termination criterion is: almost every new set of starting values gives a new local minimum.
CONCLUSIONS

For a multidimensional, highly nonlinear problem the Nelder-Mead Simplex method is unable to find the minimum, no matter how tight the convergence criterion is. It converges somewhere around the minimum, in a new place for each set of starting values. However, by repeating the fit many times with different starting values and choosing the best fit, the minimum (or at least a value very close to the minimum) can be obtained. The algorithm is very slow, and this makes the idea of repeating the fit many times impractical.

The Quasi-Newton method, on the contrary, is a very efficient method that is able to reach the minimum. To converge, it needs finer tuning of the starting values, and it also needs the fit to be repeated with many sets of starting values in order to find the global minimum.

The other methods available in IML need analytical specification of the derivatives or very fine tuning of the starting values. Otherwise, it is impossible to make them work.

The proposed strategy of model fitting (sketched in code below) is:

1. Use the Nelder-Mead Simplex method with a loose convergence criterion (for speed) to obtain a set of starting values;

2. Improve the result with the Quasi-Newton method;

3. Confirm or improve the result by repeating the fit (as many times as computer speed allows) with different sets of starting values in some neighborhood of the result obtained in step 2, using a grid or random generation of starting values.

The first step may be skipped if starting values obtained from other considerations allow the Quasi-Newton algorithm to converge.
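In SAS/IML terms, the first two steps of this strategy might be sketched as follows (illustrative only; a loose ABSFTOL for the Nelder-Mead step would be supplied through the termination-criteria argument of NLPNMS, which is omitted here):

   /* Step 1: Nelder-Mead Simplex from the STRIP starting values,      */
   /* run with a loose convergence criterion purely for speed.         */
   call nlpnms(rc1, x1, "sse", x0strip, optn);

   /* Step 2: refine the Nelder-Mead result with Quasi-Newton.         */
   call nlpqn(rc2, x2, "sse", x1, optn);

   /* Step 3: repeat the Quasi-Newton fit from random or gridded       */
   /* starting values in a neighborhood of x2 (as in the loop shown    */
   /* in the METHODS section) and keep the best converged result.      */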
ACKNOWLEDGMENTS
I am grateful to J.R. Nedelman for stimulating discussions and comments that helped to improve the paper.
REFERENCES

1. Wagner, J.G. (1975) Fundamentals of Clinical Pharmacokinetics. Drug Intelligence Publications, Illinois.

2. Gibiansky, E.L. (1994) Best initial values for nonlinear estimation: stripping algorithm implemented as a SAS/IML routine. Proceedings of the Seventh Annual NorthEast SAS Users Group Conference.

3. Hartmann, W.M. (1994) SAS Internal Document: Nonlinear Optimization in IML, Release 6.08.
To contact the author:
Ekaterina Gibiansky
Sandoz Pharmaceuticals Corp.
59 Route 10, East Hanover, NJ 07936
Phone: (201) 503-7399
Fax: (201) 503-8865
E-mail: [email protected]
SAS and SAS/IML are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.