NESUG 15
Pharmaceuticals
Practice of SAS® Logistic Regression on Binary Pharmacodynamic Data – Problems
and Solutions
Alan J Xiao, Cognigen Corporation, Buffalo NY
ABSTRACT
Logistic regression has been widely applied to population pharmacodynamic analyses of dose-response data (binary efficacy or safety endpoints). Because of the model structure p(y) = exp(y)/(1+exp(y)), where p(y) is the response probability and y = logit(x) is a function of the explanatory variable vector x (usually drug exposure and other covariates), applying this procedure directly to some special data might yield misleading results. Although logarithmic (and other) transformations of explanatory variables can extend logistic regression to those types of data, variables with zero values, such as dose or drug exposure in placebo subjects, cannot be transformed. This could make the analysis even more difficult, especially when all dosages lie near or at the plateau of the response. An alternative solution may be to use PROC NLIN to model the outcome probabilities at each level of the explanatory variable as a continuous function. A dose-response case study with a limited number of treatment groups, including placebo, is presented, in which alternative modeling methods were better implemented in PROC NLIN. Using SAS PROC LOGISTIC or alternative approaches requires a thorough understanding of these procedures, the underlying methodology, the data features, and the physiological meaning of the variables.
Keywords: logistic regression, nonlinear, log transformation
INTRODUCTION
The exposure-response relationship is very important in evaluating the efficacy and safety of a drug. The response, as a pharmacodynamic (PD) endpoint in both efficacy and safety studies, is frequently recorded as binary data. The exposure may refer to dose, drug or metabolite concentrations, or AUC (area under the concentration-time profile) values. The exposure-response relationship is frequently modeled using logistic regression, for example with SAS® PROC LOGISTIC [1].
In logistic regression, the basic model structure is given by Equation 1:
p(y) = exp(y)/(1+exp(y))    (1)
Equation 1 assumes that the response probability (p) of a patient to y (a function of exposure, such as dose) always lies between 0 and 1 and follows an S-shaped pattern, as illustrated in Figure 1.
Figure 1. The shape of the curve of p(y) versus y in Equation 1.
This paper discusses the application of the basic model defined by Equation 1 to pharmacodynamic data using different approaches. To simplify the description, a simulated case study is introduced. In this study, 400 patients were evenly divided into groups taking a daily dose of 0 (placebo), 2, 4 or 8 mg of a hypothetical drug. At the end of treatment, 12, 41, 41 and 54 patients (out of 100 in each group), respectively, were observed to have a response (yes) to a PD endpoint. That is, the population response probabilities were 0.12, 0.41, 0.41 and 0.54, respectively.
ORDINARY LOGISTIC REGRESSION
When Equation 1 is applied to pharmacodynamic endpoints of a drug, y is usually a function of exposure variables such as dose, concentrations or AUC values. In ordinary logistic regression, y is assumed to be a linear function of exposure (x), as expressed in Equation 2:
y = intercept + coeff*x    (2)
Substitution of y in Equation 1 with Equation 2 leads to:
p(x) = B*exp(coeff*x)/(1+B*exp(coeff*x))    (3)
where B = exp(intercept). Ordinary logistic regression therefore obtains the estimates of intercept and coeff in Equation 2 (expressed as the logit function in SAS®) by fitting Equation 3 to the exposure-response data.
Obviously, Equation 1 is a special case of Equation 3 with intercept = 0 (thus B = 1) and coeff = 1. When coeff = 1 and intercept ≠ 0 (so B ≠ 1), the curve of p(x) vs. x is the curve of p(y) vs. y in Figure 1 shifted by a distance of |intercept| to the right (if intercept < 0) or to the left (if intercept > 0). Therefore, the curves of p(x) vs. x and p(y) vs. y in this case have exactly the same S-shape in a regular coordinate system. In general, the curve of p(x) vs. x has a different steepness (determined by coeff) and a different x50 (the x value at which the response probability is 0.5, determined by B and coeff) than the curve of p(y) vs. y, even though both have the same S-shape. Note that, in practice, where ordinary logistic regression applies, the S-shape does not necessarily appear in the exposure-response graph if the measured exposure (x) range lies beyond the inflection point (x_i = -intercept/coeff), i.e., x ≥ x_i, where the response probability p(x) ≥ 0.5.
The characteristics of the log transformation and the interpretation of odds ratios from logistic regression with log transformation were investigated by Keene [2] and Elswick et al. [3], respectively. Assuming:
y = a + b*log(x)    (4)
where a, b and x are the intercept, coefficient and exposure (e.g., dose), respectively, the probability function (Equation 1) becomes:
p(x) = A*x^b/(1+A*x^b)    (5)
where A = exp(a). Note that when x = C, b = γ and A = 1/C50^γ, this equation becomes the same as that for the concentration-response probability relationship used by Bailey and Gregg [4] for investigating inter-patient variability (Probit regression).
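The algebra behind Equation 5 can be checked numerically. The following is only a minimal sketch: the function names and the values of a and b are illustrative assumptions, not estimates from the case study. It confirms that Equation 1 combined with Equation 4 gives the same probabilities as Equation 5 for x > 0, and that only the power form can be evaluated at x = 0 (placebo).

```python
import math


def logistic_on_log(x, a, b):
    """Equation 1 with y = a + b*log(x) (Equation 4)."""
    y = a + b * math.log(x)
    return math.exp(y) / (1.0 + math.exp(y))


def power_form(x, A, b):
    """Equation 5: p(x) = A*x^b / (1 + A*x^b)."""
    return A * x**b / (1.0 + A * x**b)


# Illustrative values only (assumed for this sketch, not from the paper).
a, b = -0.5, 0.8
A = math.exp(a)

# Both forms agree for every positive exposure.
for x in (0.5, 2.0, 4.0, 8.0):
    assert math.isclose(logistic_on_log(x, a, b), power_form(x, A, b))

# p(x50) = 0.5 requires A*x50^b = 1, i.e. x50 = A^(-1/b),
# which equals C50 when A = 1/C50^b.
x50 = A ** (-1.0 / b)

# Equation 5 is defined at x = 0; the log-transformed form is not.
placebo_power_form = power_form(0.0, A, b)
try:
    logistic_on_log(0.0, a, b)
    placebo_ok = True
except ValueError:  # math.log(0) is undefined
    placebo_ok = False
```

Note that the power form returns a probability of exactly 0 at x = 0, so by itself it cannot describe a nonzero placebo response.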
When ordinary logistic regression is applied to the case study, the parameters in Equation 2 are estimated as (mean ± standard error):
intercept = -1.271 ± 0.178 and coeff = 0.200 ± 0.037.
The model predictions and measurements are illustrated in Figure 2. As shown, the model-predicted response probabilities at placebo and at the 2 mg dose are about 0.1 off from the measurements (a relative deviation of up to 50% or 25%, respectively). In addition, the predicted S-shaped dose-response relationship does not agree well with the measured C-shaped relationship, in which the measurements at all dose levels appear to be near or at the plateau of the curve. Since only four dose levels including placebo are available to develop a dose-response relationship, and all three active dose levels appear to be near or at the plateau of the response curve, it is worthwhile to explore other methods of developing the dose-response relationship.
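The ordinary logistic fit can be re-derived outside SAS as a cross-check. The sketch below is a hand-written Newton-Raphson maximum likelihood fit on the grouped case-study counts. It is an assumed stand-in for PROC LOGISTIC, not the procedure's actual algorithm or output, and the function and variable names are hypothetical.

```python
import math

# Case-study data from the paper: dose (mg), responders, patients per group.
doses = [0.0, 2.0, 4.0, 8.0]
responders = [12, 41, 41, 54]
group_sizes = [100, 100, 100, 100]


def expit(y):
    """Equation 1 written in the equivalent form 1/(1+exp(-y))."""
    return 1.0 / (1.0 + math.exp(-y))


def fit_logistic(x, r, n, iterations=25):
    """Newton-Raphson maximum likelihood for y = intercept + coeff*x (Equation 2)."""
    intercept, coeff = 0.0, 0.0
    for _ in range(iterations):
        p = [expit(intercept + coeff * xi) for xi in x]
        # Score vector (gradient of the binomial log-likelihood).
        g0 = sum(ri - ni * pi for ri, ni, pi in zip(r, n, p))
        g1 = sum(xi * (ri - ni * pi) for xi, ri, ni, pi in zip(x, r, n, p))
        # Fisher information matrix (2 x 2, symmetric).
        w = [ni * pi * (1.0 - pi) for ni, pi in zip(n, p)]
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        # Newton step: inverse information times score.
        intercept += (i11 * g0 - i01 * g1) / det
        coeff += (i00 * g1 - i01 * g0) / det
    # Standard errors from the inverse information of the last iteration.
    return intercept, coeff, math.sqrt(i11 / det), math.sqrt(i00 / det)


intercept, coeff, se_intercept, se_coeff = fit_logistic(doses, responders, group_sizes)
predicted = [expit(intercept + coeff * d) for d in doses]
x50 = -intercept / coeff  # dose at which the predicted probability is 0.5

print(f"intercept = {intercept:.3f} (SE {se_intercept:.3f})")
print(f"coeff     = {coeff:.3f} (SE {se_coeff:.3f})")
print("predicted:", [round(p, 3) for p in predicted], "x50 =", round(x50, 2))
```

On these data the sketch should land close to the reported estimates, and its predicted placebo probability sits well above the measured 0.12, which is the mismatch described above.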
Figure 2. Response probability versus dose for the case
study. Pred_logo in the legend represents model
predictions from ordinary logistic regression.
In a regular coordinate system, the curve of p(x) vs. x expressed by Equation 5 is C-shaped when b ≤ 1 and S-shaped when b > 1. The maximum probability is 1 (as x → infinity). Also note that the S-shape may not appear in a graph of response probability vs. exposure if the exposure range (x) lies beyond the inflection point, x_i = [exp(-a)*(b-1)/(b+1)]^(1/b), i.e., x ≥ x_i, where the response probability p(x) ≥ (b-1)/(2b).
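The shape statements for Equation 5 can be checked numerically. This is a rough sketch under assumed, illustrative parameter values (a = 0, b = 2 and b = 0.5), with hypothetical helper names. It evaluates the inflection point formula, confirms that the probability there equals (b-1)/(2b), and uses a finite-difference second derivative to show the curvature changing sign.

```python
import math


def p5(x, a, b):
    """Equation 5 with A = exp(a)."""
    A = math.exp(a)
    return A * x**b / (1.0 + A * x**b)


def inflection_point(a, b):
    """x_i = [exp(-a)*(b-1)/(b+1)]^(1/b); returns None for the C-shaped case b <= 1."""
    if b <= 1:
        return None
    return (math.exp(-a) * (b - 1) / (b + 1)) ** (1.0 / b)


def second_difference(x, a, b, h=1e-3):
    """Finite-difference proxy for the sign of the second derivative at x."""
    return p5(x + h, a, b) - 2.0 * p5(x, a, b) + p5(x - h, a, b)


# Illustrative S-shaped case (values assumed for this sketch).
a, b = 0.0, 2.0
x_i = inflection_point(a, b)
p_at_inflection = p5(x_i, a, b)  # should equal (b-1)/(2b)

# Convex below the inflection point, concave above it.
curvature_below = second_difference(x_i - 0.1, a, b)
curvature_above = second_difference(x_i + 0.1, a, b)

# Illustrative C-shaped case: concave wherever it is checked, no inflection.
c_shape_curvature = second_difference(1.0, 0.0, 0.5)
```

With b ≤ 1 the helper returns None because the curve is concave for all x > 0, which is why only the plateau-like C-shape is seen.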
LOGISTIC REGRESSION WITH
LOGARITHM TRANSFORMATION
Log transformation extends logistic regression analysis from S-shaped curves to C-shaped curves, and the interpretation of the parameter estimates is different [2, 3].
Generally, the smaller the perturbation, the closer the model-predicted probability at placebo is to the measurement. When the perturbation is less than 10^-6, the model-predicted dose-response curves are visually indistinguishable, although their parameter estimates in the logistic model are different. Table 1 lists the parameter estimates of the logistic regression model after log transformation with different perturbation levels added to the dose values.
For the case study, since only four dose levels including placebo are used in the clinical trial, direct log transformation discards the data point for placebo, because log(0) is undefined. Thus, only three dose levels are available for regression. The model inference based on these three dose levels would be misleading: the predicted response probability is constant across the dose range (not shown in Figure 3), as indicated by the parameter estimates in the last row of Table 1. At the very least, the model prediction of a constant probability (0.45) should not be extrapolated to placebo.
Visually, the fit by logistic regression with log transformation and perturbation is slightly better than that by ordinary logistic regression. However, approximating placebo with the perturbation approach is tentative and might not be readily accepted. In fact, this approximation can be avoided if Equation 5 is fitted to the data directly, instead of using log transformation in logistic regression (Equation 1 plus Equation 4). This nonlinear modeling can be implemented with SAS® PROC NLIN, which is extensively applied in engineering.
Table 1. Parameter Estimates for a Logistic Model after Log Transformation (refer to Equation 4)
Perturbation level to doses    a, mean (std err)    b, mean (std err)
0.01                           -0.610 (0.115)       0.306 (0.053)
0.0001                         -0.424 (0.109)       0.175 (0.031)
0.000001                       -0.351 (0.109)       0.122 (0.022)
Placebo excluded               -0.187 (0.116)       0
Figure 3. Dose-response relationship fit by logistic regression with log transformation of dose. Pred_log2, Pred_log4 and Pred_log6 are model predictions under three different perturbation levels (dose+0.01, dose+0.0001 and dose+0.000001, respectively), applied in order to implement the log transformation on placebo.
Therefore, when all response measurements are near or at the plateau of the exposure-response curve, the data point that represents placebo is critical for the analysis and should be included. To do this, a small perturbation added to the doses before log transformation might be helpful. For example, 0.000001 can be added to all dose levels, including placebo. The log-transformed value for placebo is then -13.8 (natural logarithm), while the log-transformed values for doses of 2, 4 and 8 mg are essentially unchanged. When this strategy is used, the model predictions are reasonably consistent with the measurements. Figure 3 demonstrates the fit of the model (Equation 5) by logistic regression after log transformation (Equation 4) with different levels of perturbation on the doses. Stepwise selection of covariates was used with an entry criterion of p=0.05 and an elimination criterion of p=0.01.
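The effect of the perturbation can be illustrated with the estimates reported in Table 1. The sketch below does not refit anything. It plugs the reported (rounded) values of a and b into Equation 4 with log(dose + perturbation) and compares the predictions with the measured probabilities. The helper names are assumptions made for this illustration.

```python
import math

doses = [0.0, 2.0, 4.0, 8.0]
observed = [0.12, 0.41, 0.41, 0.54]

# (perturbation, a, b) as reported in Table 1.
table1 = [
    (0.01, -0.610, 0.306),
    (0.0001, -0.424, 0.175),
    (0.000001, -0.351, 0.122),
]


def predict(dose, perturbation, a, b):
    """Equation 1 with y = a + b*log(dose + perturbation)."""
    y = a + b * math.log(dose + perturbation)
    return 1.0 / (1.0 + math.exp(-y))


predictions = {}
for perturbation, a, b in table1:
    predictions[perturbation] = [predict(d, perturbation, a, b) for d in doses]
    print(perturbation, [round(p, 3) for p in predictions[perturbation]])

# The perturbation moves placebo to a finite log value and barely touches the rest.
log_placebo = math.log(0.0 + 0.000001)              # about -13.8
shift_at_2mg = math.log(2.0 + 0.000001) - math.log(2.0)

# Placebo excluded (last row of Table 1): b = 0, so the prediction is constant
# and close to the pooled response rate of the three active groups.
constant_prediction = 1.0 / (1.0 + math.exp(0.187))
pooled_active_rate = (41 + 41 + 54) / 300
```

Every perturbation level brings the placebo prediction close to the measured 0.12, while the placebo-excluded model returns the same probability (about 0.45) at every dose.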
PROC NLIN PROCEDURE
When the regression model or the shape of the exposure-response curve is known, SAS® PROC NLIN is a good option for nonlinear models, including those for PD data. In fact, the dose-response curve shape for the case study can be described with the following general model structure:
p(x) = α + β*x^γ/(δ + x^γ)    (6)
Equation 5 is just a special case of Equation 6 with α=0, β=1, γ=b and δ=A^-1. Similar to Equation 5, Equation 6 also represents two different curve shapes: C-shaped when γ ≤ 1 and S-shaped otherwise, with the inflection point at x_i = [δ(γ-1)/(1+γ)]^(1/γ).
However, this model implies that the maximum probability is 0.6. Theoretically, this does not make sense because, for any drug, the response probability should approach 1 (for either efficacy or safety) as the dose goes to infinity. In practice, it could be true that within a certain range of dosages the response probability stays below a certain value. Whether this is really true can be confirmed from measurements at additional, expanded dose levels or from prior information from fundamental studies. When the maximum probability is constrained to 1, the fit is slightly worse, but it is at least comparable to, if not better than, that obtained through logistic regression with log transformation.
Therefore, exposure-response models that can be developed through PROC LOGISTIC with log transformation can potentially be developed through PROC NLIN without any transformation. In addition, PROC NLIN can work on more types of datasets. One advantage of using PROC NLIN is that placebo data can be included directly in the analysis without any approximation.
As a comparison, Equation 5 was fitted directly to the data using PROC NLIN, and the model prediction is shown in Figure 4 by the legend Pred_nlin4. The parameter estimates are (mean ± standard error): A = 0.486 ± 0.162 and b = 0.384 ± 0.220 (refer to Equation 5). Obviously, this fit is much better than that using logistic regression with log transformation although the models are equivalent.
Table 2. Parameter Estimates of the Model Expressed by Equation 6 via SAS® PROC NLIN
Modeling    α, mean (std err)    β, mean (std err)    δ, mean (std err)
Nlin1       0.122 (0.046)        0.475 (0.109)        1.644 (1.272)
Nlin2       0.149 (0.070)        1-α (fixed)          7.95 (3.54)
Nlin3       0.12 (fixed)         1-α (fixed)          9.02 (1.75)
Figure 4. Model predictions of the exposure-response relationship (Equation 6) with γ=1 under different conditions. Pred_nlin1: all three parameters α, β and δ are estimated from fitting. Pred_nlin2: α and δ are estimated from fitting while β is fixed as 1. Pred_nlin3: only δ is estimated from fitting while α is fixed as 0.12, referring to the measured response probability for the placebo, and β is fixed as 1. Pred_nlin4: model (Equation 5) fitting using PROC NLIN.
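The Nlin1 row of Table 2 (Equation 6 with γ=1 and all three parameters free) can be approximated with a small least-squares sketch. It assumes an unweighted fit to the four group proportions, which is how the response variable for the nonlinear model is described in this paper. It also replaces PROC NLIN's iterative optimizer with a crude grid search over δ, solving for α and β by ordinary linear least squares at each grid point. The grid range and step are arbitrary choices made for this illustration.

```python
doses = [0.0, 2.0, 4.0, 8.0]
observed = [0.12, 0.41, 0.41, 0.54]  # response proportion in each dose group


def fit_alpha_beta(delta):
    """For a fixed delta, p = alpha + beta*g with g = x/(delta + x) is linear in alpha and beta."""
    g = [x / (delta + x) for x in doses]
    n = len(doses)
    g_mean = sum(g) / n
    y_mean = sum(observed) / n
    sgg = sum((gi - g_mean) ** 2 for gi in g)
    sgy = sum((gi - g_mean) * (yi - y_mean) for gi, yi in zip(g, observed))
    beta = sgy / sgg
    alpha = y_mean - beta * g_mean
    sse = sum((yi - alpha - beta * gi) ** 2 for gi, yi in zip(g, observed))
    return alpha, beta, sse


# Grid search over delta from 0.001 to 20 in steps of 0.001 (assumed range).
best = None
for k in range(1, 20001):
    delta_k = k / 1000.0
    alpha_k, beta_k, sse_k = fit_alpha_beta(delta_k)
    if best is None or sse_k < best[3]:
        best = (alpha_k, beta_k, delta_k, sse_k)

alpha, beta, delta, sse = best
print(f"alpha = {alpha:.3f}, beta = {beta:.3f}, delta = {delta:.3f}, SSE = {sse:.5f}")
print(f"maximum probability alpha + beta = {alpha + beta:.3f}")
```

The sum of squares is quite flat in δ around its minimum, which is consistent with the large standard error reported for δ in the Nlin1 row.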
For the case study, a reduced form of Equation 6 with γ=1 was tried. Unlike logistic regression, which directly uses raw binary data (0 or 1) for the dichotomous response variable, the response variable in nonlinear models is continuous: its values are the subpopulation probabilities (from 0 to 1) calculated at each exposure level and/or covariate group. Since no covariate was identified as significant for this particular case study during logistic regression, the response probability is simply calculated from the subpopulation at each dose level. The fits of the model via PROC NLIN under three conditions are illustrated in Figure 4, and their parameter estimates are listed in Table 2. As expected, when all parameters are obtained from regression, the fit is the best (refer to Pred_nlin1 in Figure 4) since more parameters are included in the model.
MODEL/METHOD SELECTION
As discussed previously, for the case study in this paper, there are at least three potential approaches to modeling the limited data: ordinary logistic regression, logistic regression with log transformation, and nonlinear model fitting. Each of them has advantages and disadvantages. The quality of fit also differs, as demonstrated in Figure 5.
If the measurements are representative and reliable, i.e., close to the “true” values, the model with the best prediction is the best. For this particular case study, the general nonlinear model fitted through PROC NLIN is superior to the logistic regressions, since the model (refer to Pred_nlin1 in Figure 5) fits the data best.
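One way to put a number on the comparison of the three approaches is the unweighted sum of squared differences between predicted and measured response probabilities at the four dose levels. This criterion is an assumption made for this sketch, since the paper compares the fits graphically. The predictions are computed from the rounded parameter estimates reported in the text, Table 1 and Table 2.

```python
import math

doses = [0.0, 2.0, 4.0, 8.0]
observed = [0.12, 0.41, 0.41, 0.54]


def expit(y):
    return 1.0 / (1.0 + math.exp(-y))


def pred_logo(x):
    """Ordinary logistic regression (Equation 2 estimates)."""
    return expit(-1.271 + 0.200 * x)


def pred_log6(x):
    """Logistic regression after log transformation, perturbation 0.000001 (Table 1)."""
    return expit(-0.351 + 0.122 * math.log(x + 0.000001))


def pred_nlin1(x):
    """Equation 6 with gamma = 1, all parameters free (Table 2, Nlin1)."""
    return 0.122 + 0.475 * x / (1.644 + x)


def sum_of_squares(model):
    """Unweighted sum of squared prediction errors over the four dose groups."""
    return sum((model(x) - y) ** 2 for x, y in zip(doses, observed))


scores = {
    "Pred_logo": sum_of_squares(pred_logo),
    "Pred_log6": sum_of_squares(pred_log6),
    "Pred_nlin1": sum_of_squares(pred_nlin1),
}
for name, value in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name}: {value:.4f}")
```

Under this particular criterion the ranking agrees with the qualitative statements above. A different criterion, such as a likelihood-based one, could weight the dose groups differently.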
REFERENCE
1. SAS Institute Inc. SAS/STAT® User’s Guide, Volume 2. Version 6, 4th edition. Cary, NC: SAS Institute Inc., 1994.
2. N Keene. The Log Transformation Is Special. Statistics in Medicine, Vol. 14, 811-819 (1995).
3. R K Elswick, Jr, P F Schwartz and J A Welsh. Interpretation of the Odds Ratio from Logistic Regression after a Transformation of the Covariate Vector. Statistics in Medicine, Vol. 16, 1695-1703 (1997).
4. J M Bailey and K M Gregg. A Technique for Population Pharmacodynamic Analysis of Concentration-Binary Response Data. Anesthesiology, Vol. 86, 825-835 (1997).
SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.
Figure 5. Comparison of the quality of fitting for three
different approaches: Pred_nlin1 – totally free nonlinear
model (Equation 6 with γ=1) fitting using PROC NLIN;
Pred_nlin4 – nonlinear model (Equation 5) fitting via
PROC NLIN; Pred_logo – ordinary logistic regression
via PROC LOGISTIC; and Pred_log6 – logistic
regression after log transformation on doses plus
approximating placebo with a dosage of 0.000001.
SUMMARY
Logistic regression is a powerful tool widely used for PD data analysis. However, its applicability is limited by the strict assumptions inherent in the model structure. Although log transformation can extend the application of logistic regression, the transformation itself might restrict this extension when placebo data have to be included in the analysis. Nonlinear modeling through PROC NLIN is generally a more flexible and powerful approach. However, prior information about the model structure is required, and whether PROC NLIN succeeds sometimes depends on the model structure and the appropriateness of the initial guesses for the parameter estimates.
Exploratory graphs of exposure versus response probability should help in selecting the primary methods and models, and logistic regression could be a convenient primary option. However, when logistic regression does not work well, alternative methods should be explored. The selection of the final model should be based on the combined information about the characteristics of the data, the quality of fit and the physiological rationale.
DISCLAIMER
This presentation only reflects my current personal thinking on potential alternative approaches to PD data analysis when logistic regression as a primary option does not work well, based on my experience in the engineering area. So far, none of these approaches has been applied to any real projects.
No material in this presentation is from Cognigen Corporation. Under no circumstances should this presentation be related to the position that Cognigen Corporation takes on PD data analysis.
CONTACT INFORMATION
The author can be contacted at:
Alan J Xiao, Ph.D.
Population PK/PD Scientist
Cognigen Corporation
395 Youngs Road
Buffalo, NY 14221-5831
Phone: 716-633-3463 ext. 265
Fax: 716-633-7404
Email: [email protected]
Web: www.cognigencorp.com