An Alternative to the Odds Ratio: A Method for Comparing Adjusted Treatment Group
Effects on a Dichotomous Outcome
P. Chris Holland, CL McIntosh and Associates, Inc., Rockville, MD USA
Abstract
The odds ratio is a commonly used statistic for measuring
the association between two groups on a dichotomous
outcome. In clinical trials, it can be used as a measure of
association between two treatment groups on a clinical
response rate. However, it is sometimes desired to directly estimate and compare the difference in response rates between two treatment groups. While the SAS/STAT® software does offer features for such comparisons, it does not allow for adjustments made to
the treatment effect based on other explanatory variables.
I present a method for testing success rate differences
between two treatment groups while adjusting for other
factors and a SAS macro that performs the necessary
steps for carrying out the procedure. The macro produces
an output data set that contains estimated adjusted
success rates, the difference between them, confidence
intervals around the difference, and an associated p-value.
Introduction
The odds ratio is a widely used statistic in logistic
regression analysis. In clinical trials, it allows researchers
to answer the question, “which treatment is better?” with
respect to a certain outcome or event. In a study that
examines the effects of a drug used to treat heart disease,
the event could be a heart attack occurring within a specified follow-up period. Or there could be some pre-defined criteria that are used to determine whether a study subject is a treatment responder or non-responder.
In a simple logistic regression model, the odds ratio
between two groups is often used to determine which
group increases or decreases a subject’s chances of
experiencing whatever outcome is being modeled. An
odds ratio of 1 indicates that the odds of the event are the same in both groups. Confidence intervals can be used to see whether the interval around the odds ratio contains the value 1. If so, one could conclude that,
with a specified level of certainty, the two groups are
statistically equal. Conversely, a confidence interval that
does not contain 1 would signify that the odds of the event
occurring are statistically significantly greater for one
treatment group as compared to another.
In a multiple logistic regression model, factors such as
age, gender or smoking history can be added to a model
and used to help better predict the chances of an event.
For example, if the gender distribution differs between two
groups and gender is known to have an effect on the
outcome, then the effect from the gender disparity can be
used to adjust the group’s effect on the event. This
results in a more accurate estimate of the model
coefficient for the group effect and the odds ratio
associated with it.
Sometimes, however, researchers are interested in
knowing the estimated event rates and the difference
between those rates as actual percentages. One reason for this would be the ease of interpreting the results. Percentages, the differences between those percentages, and confidence intervals around those differences are likely to be more intuitive to the reader than an odds ratio.
Another, and perhaps more compelling, reason for certain clinical studies can be attributed to FDA guidelines. For studies that evaluate the
efficacy of antimicrobials, for example, the FDA guidance
documents suggest trial success criteria that are based
on the differences in treatment success rates between the
test drug and an active control. The recommended
analytical approach involves estimation through the use of
95% confidence intervals around the treatment difference
and a certain threshold for the lower limit of those
intervals (the threshold depends on the treatment success
rates observed during the trial).
Presenting results in the form of success rates,
differences between the observed success rates, and
confidence intervals around those differences is
straightforward when no adjustments are made. The 6.12
release of the SAS System made these analyses easy to
perform with the addition of the RISKDIFF option for the
FREQ procedure. However, the same philosophy that
explained the advantage of using adjusted odds ratios
could be applied to estimated response rates. Without
making adjustments for factors that are known to affect
the response, disparities among these factors between
the treatment groups could be mistaken for a treatment
effect or non-effect.
I describe the PCNTDIFF macro that, with the use of the
GENMOD procedure and the SAS macro language,
derives estimated and adjusted response rates from a
given logistic regression model for two specified groups.
The difference between these rates and an associated p-value (using a normal approximation) are then computed.
Associated confidence intervals are also constructed.
The results are then saved in an output data set.
Logistic Regression Background
Logistic regression is used to model dichotomous or
binary outcomes. It represents a way of transforming
these data so that the properties of simple linear
regression can hold up.
In any simple regression equation, we model the mean
value of the outcome variable, Y, given the value of the
independent variable, x. This is known as the conditional
mean, or the “expected value of Y given x,” and is denoted
as E(Y|x). The regression equation is:
E(Y|x) = β0 + β1 x
As the name suggests, logistic regression is based on the
logistic distribution. Common notation to represent E(Y|x)
when logistic regression is used is π(x). The form of the
logistic regression model is:
π(x) = e^(β0 + β1x) / [1 + e^(β0 + β1x)]
In order to take advantage of the properties of linear
regression, a transformation of π(x) is necessary. This
transformation is called the logit transformation and is
defined as:
g(x) = ln[ π(x) / (1 - π(x)) ]
Given π(x) from above we have:
g(x) = β0 + β1 x
Expanding this equation for multiple regression we get:
g(x1, x2, …, xp) = β0 + β1x1 + β2x2 + … + βpxp
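As a quick numeric illustration of the logit and its inverse, the short DATA step below evaluates g(x) and π(x) side by side. The coefficient values are made up purely for illustration and are not taken from any example in this paper.

* Illustration of the logit transformation and its inverse;
* (hypothetical coefficients beta0 = -2 and beta1 = 0.5);
data logit_demo;
   beta0 = -2;
   beta1 = 0.5;
   do x = 0 to 8;
      g  = beta0 + beta1*x;         /* logit scale: g(x) = ln[pi/(1-pi)] */
      pi = exp(g) / (1 + exp(g));   /* probability scale: pi(x)          */
      output;
   end;
run;

proc print data=logit_demo noobs;
run;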
Adjusted Effect Estimates
Although the odds ratio is a useful and widely used
statistic, it doesn’t tell the whole story when trying to
compare two groups to one another. Most importantly, it
doesn’t tell a reader the difference in the outcome rates
between the two groups. The methodology below
explains how adjusted effect estimates, the difference
between the estimates, a confidence interval around the
difference, and the associated p-value are all
constructed.
Given x’=(x1, x2,…, xp), we have the following equation for
this multiple logistic regression model:
g(x’) = β0 + β1 x1 + β2x2 + …+ βp xp
Reverting back from the logit transformation gives us:
π(x’) = e^(β0 + β1x1 + … + βpxp) / [1 + e^(β0 + β1x1 + … + βpxp)]
This can perhaps more simply be expressed as:
π(x’) = 1 / [1 + e^-(β0 + β1x1 + … + βpxp)]
Now, let’s assume that the two groups we want to compare are represented by x1, whose corresponding coefficient is β1. Since π(x’) represents the equation for the estimated outcome rate, our objective then is to fit π(x’) for x1=1 and x1=0 (noted as π1 and π0) and then find the difference between these two values. The approach used for doing this is the same as that used to construct least-squares means, or population marginal means, which results in the mean for each group that you would expect from a balanced design.
For each model parameter, a co-efficient (the value at which that term is evaluated) must be chosen. For
the covariates (continuous variables), we use the overall
population mean. One thought may be to choose the
mean value for each factor in the group where x1=1 and
then do likewise for x1=0. This would, after all, allow us to
find the most accurate predictor from each of the two
treatment groups. However, finding the most accurate
predictor isn’t the objective. In fact, allowing potential
differences between the two groups with respect to other
factors in the model would confound the objective of trying
to quantify the differences between the two groups while
holding all other effects equal.
For a categorical variable, the weight (or co-efficient) is
the inverse of the number of levels in the category.
Alternatively, if desired, one could use the population
mean percentages for each category. To do this, replace
the categorical variable from the CLASS statement with
dummy or indicator variables that are treated as
covariates.
For interaction terms, simply use the product of the co-efficients used for the interacting terms. So, if modeling for x1=1, an
interaction term involving the group and a categorical
variable with k levels would use 1*1/k as the co-efficient
for each of the (k-1) interaction parameters. If modeling
for π0, then the co-efficient is zero.
Let’s say we have a model with the group variable and
parameter β1, a categorical variable with k levels and
parameters β21 to β2(k-1), a continuous covariate with
parameter β3, and a term for the group-by-categorical
variable interaction with parameters β41 to β4(k-1). The two
equations for finding estimates of π1 and π0 would then
be:
π1 = 1 / (1 + e^-[β0 + β1 + β21(1/k) + … + β2(k-1)(1/k) + β3(xbar) + β41(1/k) + … + β4(k-1)(1/k)])

π0 = 1 / (1 + e^-[β0 + β21(1/k) + … + β2(k-1)(1/k) + β3(xbar)])
The estimate of interest is then π1 - π0, which, at this point, is simple to compute. Finding the variance of this difference, however, is admittedly heuristic. Unlike the difference between least-squares means in linear models, where the associated variance is well understood, no precise variance estimator for π1 - π0 is available. As a result, π1 and π0 are treated as parameters of binomial distributions. The variance of each is thus π1(1 - π1) and π0(1 - π0), respectively. We can estimate the variance of π1 - π0 as π1(1 - π1)/n1 + π0(1 - π0)/n0. For sufficiently large n1 and n0, we can regard π1 - π0 as a normal random variable and can use the normal approximation to construct a two-sided confidence interval around this difference. This gives us the following formula:
(π1 - π0) ± zα/2 * sqrt[π1(1 - π1)/n1 + π0(1 - π0)/n0]
where zα/2 represents the upper α/2 critical value of the standard normal distribution. It follows that the p-value is also derived from the normal distribution.
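To make these steps concrete, the DATA step below sketches the calculation of π1, π0, their difference, the confidence interval, and the p-value for a model like the one above (a group effect, a categorical variable with k levels, a covariate, and a group-by-categorical interaction). All numeric values are hypothetical placeholders; in practice the coefficients would come from the fitted model, the covariate mean from the data, and n1 and n0 from the group sample sizes.

* Sketch of the adjusted-rate, difference, CI, and p-value computation;
* (all numeric values below are hypothetical placeholders);
data pcnt_calc;
   k    = 3;                         /* levels of the categorical variable */
   b0   = -1.20;                     /* intercept                          */
   b1   =  0.45;                     /* group effect (x1)                  */
   b21  =  0.30;  b22 = -0.10;       /* k-1 = 2 categorical parameters     */
   b3   =  0.02;  xbar = 45;         /* covariate effect and overall mean  */
   b41  =  0.05;  b42 = -0.08;       /* k-1 = 2 interaction parameters     */
   n1   = 100;    n0  = 110;         /* group sample sizes                 */

   /* linear predictors, weighting each CLASS level by 1/k */
   lp1 = b0 + b1 + (b21 + b22)*(1/k) + b3*xbar + (b41 + b42)*(1/k);
   lp0 = b0 +      (b21 + b22)*(1/k) + b3*xbar;  /* interactions drop out  */

   pi1 = 1 / (1 + exp(-lp1));
   pi0 = 1 / (1 + exp(-lp0));

   diff   = pi1 - pi0;
   se     = sqrt( pi1*(1-pi1)/n1 + pi0*(1-pi0)/n0 );
   z      = probit(0.975);                   /* z(alpha/2) for a 95% CI    */
   lower  = diff - z*se;
   upper  = diff + z*se;
   pvalue = 2*(1 - probnorm(abs(diff/se)));  /* normal-approximation test  */
run;

proc print data=pcnt_calc noobs;
   var pi1 pi0 diff lower upper pvalue;
run;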
The PCNTDIFF Macro
All of the computations mentioned so far are automatically
taken care of by the PCNTDIFF macro. To use the
macro, one simply needs to provide a few macro
parameters, such as the input data set, the response
variable, the group variable, x1, and other explanatory
variables for the requested logistic regression model.
These include the CLASS (categorical) variables, the
covariates, and the interaction terms. A macro call using
all of these parameters would look something like this:
%pcntdiff(data=dataset, response=success, grp=trt,
classvrs=site race, intrax=trt*site trt*race, covariat=age,
byvars= study);
The first three parameters are required, although without
any adjustments from class variables, covariates, or
interactions, the purpose of using the macro is defeated.
The first three parameters represent the input data set,
the dependent variable, and the group variable,
respectively. Class variables, interaction terms, and
covariates are specified with the CLASSVRS, INTRAX,
and COVARIAT parameters, respectively. If more than
one term is desired for any of these, then each term
should be separated by a space (however, with interaction
terms, the term itself should have no spaces on either
side of the “*” symbol). Lastly, the BYVARS parameter is
used to run the analysis for different BY groups.
Once all of the macro parameters are specified and the
macro is executed, the GENMOD procedure in the
SAS/STAT software is used to construct the parameter
estimates for the logistic regression model. The FREQ
procedure is used to find the weights for the CLASS
variables and the MEANS procedure is used to calculate
the means of each covariate. As shown above, if
interaction terms are present, the coefficients for
interaction terms involving the treatment group variable
fall out of the π0 equation (since x1=0).
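The macro's generated code is not shown in this paper; as a rough illustration only, the calls it issues for the invocation above might look something like the following (the placement of the group variable in the CLASS statement and the specific options are assumptions, and BY-group processing is omitted):

* Parameter estimates for the requested logistic regression model (sketch);
proc genmod data=dataset;
   class trt site race;
   model success = trt site race trt*site trt*race age
         / dist=binomial link=logit;
run;

* Level frequencies used to weight the CLASS variables (sketch);
proc freq data=dataset;
   tables site race;
run;

* Overall means of the covariates (sketch);
proc means data=dataset mean;
   var age;
run;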
If desired, the procedure output from all procedures used
within the macro can be printed to an output file (or the
output window if running SAS interactively). This can be
helpful for double-checking the macro’s results.
After all of the necessary values are found, the estimated
and adjusted outcome rates are computed. An output data
set is created containing variables that represent the BY
variables (if any are specified), character or numeric
representations for each of the two comparison groups,
n1, n0, π1, π0, (π1 - π0), the CI around (π1 - π0), the
associated p-value, and other related statistics.
An Applied Example
To illustrate the advantage of constructing adjusted
success rates and the use of the PCNTDIFF macro,
consider an analysis using fabricated data that examines
the effects of an experimental drug with an active control
on subjects who have been diagnosed with pneumonia. A
subject is considered a success if his or her condition
improves or the subject is cured after one week on
treatment (as judged by the investigating physician). In
order for the trial to be considered a success, the results
should demonstrate that the experimental treatment is
statistically and clinically superior or equivalent in efficacy
to the active control.
The recommended analytical
approach as defined by FDA guidance documents is to
construct a two-sided 95% confidence interval of the
treatment difference (test drug minus control) in success
rates. The confidence interval should contain zero, and the lower limit of the confidence interval should not fall below the clinically specified boundary for establishing
efficacy equivalence, which depends on the better of the
two treatment success rates.
The data used to demonstrate this application contain 219
intent-to-treat subjects, 113 of whom are in the active
control group and 106 of whom are in the test drug group.
In accordance with the analytical approach mentioned
above, the efficacy analysis is based on a two-sided 95%
confidence interval around the difference in the treatment
success rates.
These are obtained by using the
RISKDIFF option in PROC FREQ. Output 1 contains the
contingency table.
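As a sketch of how these unadjusted results are produced (the data set name pneumo is taken from the macro call shown later, and the TMT and SUCCESS variables from the output below), the call would look roughly like this:

proc freq data=pneumo;
   tables tmt*success / chisq riskdiff;   /* cell counts, chi-square tests, and risk differences */
run;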
Output 1:
TABLE OF TMT BY SUCCESS
(cell entries are frequency and row percent)

                        SUCCESS
TMT                     No             Yes             Total
--------------------------------------------------------------
Test Drug               35 (33.02%)    71 (66.98%)      106
Active Control          21 (18.58%)    92 (81.42%)      113
--------------------------------------------------------------
Total                   56             163              219
As can be seen from the table, the success rate in the
active control group looks considerably greater than that
in the test group (81.4% vs. 67.0%). Since the overall
success rate of the better of the two groups is in the 80-90% range, the lower limit of the confidence interval is going
to have to be greater than –15% in order to demonstrate
clinical and statistical equivalence (according to the FDA
guidelines). The Chi-Square test statistics appear in
Output 2.
Output 2:
STATISTICS FOR TABLE OF TMT BY SUCCESS

Statistic                        DF      Value      Prob
----------------------------------------------------------
Chi-Square                        1      5.988      0.014
Likelihood Ratio Chi-Square       1      6.027      0.014
Continuity Adj. Chi-Square        1      5.253      0.022
Mantel-Haenszel Chi-Square        1      5.961      0.015
Fisher's Exact Test (Left)                          0.011
                    (Right)                         0.995
                    (2-Tail)                        0.020
Phi Coefficient                         -0.165
Contingency Coefficient                  0.163
Cramer's V                              -0.165
Regardless of which test you use, statistical significance
falls in favor of the active control group, which suggests
that the asymptotic confidence interval is not going to
contain zero and the study success criteria are therefore not going to be met. Nonetheless, let's look at the column
2 risk estimates in Output 3:
Output 3:
Column 2 Risk Estimates

                              Risk     ASE     95% Confidence Bounds    95% Confidence Bounds
                                               (Asymptotic)             (Exact)
----------------------------------------------------------------------------------------------
Row 1                        0.670    0.046    0.580      0.759         0.572      0.758
Row 2                        0.814    0.037    0.742      0.886         0.730      0.881
Total                        0.744    0.029    0.687      0.802         0.681      0.801
Difference (Row 1 - Row 2)  -0.144    0.059   -0.259     -0.030

The confidence interval for Test minus Control is (-0.259, -0.030). All may not be lost, however. Review of the demographics and baseline characteristics between the two treatment groups reveals a statistically significant difference with respect to age. Results from a t-test are in Output 4.

Output 4:
TTEST PROCEDURE

Variable: AGE

TMT                 N     Mean           Std Dev        Std Error
-------------------------------------------------------------------
Active Control    113     42.33628319    4.95625618     0.46624536
Test Drug         106     47.31132075    6.93419849     0.67350890

Variances         T         DF      Prob>|T|
-----------------------------------------------
Unequal        -6.0734     189.0     0.0001
Equal          -6.1369     217.0     0.0000

Subjects in the test group are five years older, on average, than those in the active control group. This is not something you would expect to happen in a properly randomized trial, but it can happen nonetheless. Since age can be considered an influential factor on the outcome, there is good reason to re-construct the response rates by adjusting for age. This is where the PCNTDIFF macro comes into play. The macro call would look like:

%pcntdiff(data=pneumo, response=success, grp=trt, covariat=age);

Here, we are adjusting treatment group effects by the covariate age. Looking at the parameter estimates from the logistic regression model as seen in Output 5, we can get a good idea about the strength and direction of the age effect.

Output 5:
The GENMOD Procedure
Analysis Of Parameter Estimates

Parameter                 DF    Estimate    Std Err    ChiSquare    Pr>Chi
----------------------------------------------------------------------------
INTERCEPT                  1     4.8700     1.1000      19.6017     0.0001
TMT    Test Drug           1    -0.3817     0.3470       1.2097     0.2714
TMT    Active Control      0     0.0000     0.0000          .           .
AGE                        1    -0.0791     0.0247      10.2394     0.0014
SCALE                      0     1.0000     0.0000          .           .

NOTE: The scale parameter was held fixed.

The evidence is strong that response rates adjusted for age would yield more favorable results. The results from the PCNTDIFF macro are in Output 6.

Output 6:
Final Results -- Data Set logit

OBS                                    1
Group 1 Value                          Test Drug
Group 2 Value                          Active Control
Adjusted Success Rate for Group 1      0.7212
Adjusted Success Rate for Group 2      0.7912
Difference                             -0.07
95% Confidence Interval                (-.1836, 0.0436)
Normal approximation p-value           0.22712

As suspected, the results are much different from what was seen before the age adjustment was made. Applying these results to the study success criteria, we see that the confidence interval does contain zero. The lower limit of the confidence interval is less than -15%, but since the success rate for the better of the two treatments is now less than 80%, the lower bound threshold changes to -20% according to the FDA criteria. By these parameters, the trial success criteria are met.

Conclusion
The odds ratio has become an important statistic for
comparing two groups with
respect to a dichotomous or binary outcome. Multiple
logistic regression models allow researchers to adjust
group effects for other possible sources of variation.
However, there remains a missing link between the odds
ratio (and the model parameter estimate with which it is
associated) and the adjusted event rates that multiple
logistic regression models help estimate.
The
methodology presented in this paper, and the SAS macro
used to carry out this methodology, attempt to bridge the
gap from hypothesis-based statistics for the logistic
regression model’s parameters, to the estimated event
rates derived from those models. The macro creates a
data set that contains adjusted estimates of the event
rates for the groups of interest, an estimated difference
between those event rates, a two-sided confidence
interval around the difference, and a p-value for the test that
the two groups are equal. Under proper conditions, the
conclusions drawn from this hypothesis test should
closely match those from the multiple logistic regression
model’s odds ratio and parameter estimate hypothesis
tests.
References
Fisher, Lloyd and van Belle, Gerald, Biostatistics: A Methodology for the Health Sciences, New York: Wiley, 1993.

Holland, P. Chris, "More Class to PROC PHREG: An Enhanced SAS® Macro for the Analysis of Cox Proportional Hazards Models that Involve Multinomial Effects," PharmaSUG 1999 Conference Proceedings.

Hosmer, David W. and Lemeshow, Stanley, Applied Logistic Regression, New York: Wiley, 1989.

SAS Institute Inc., SAS/STAT User's Guide, Version 6, Fourth Edition, Volumes 1 and 2, Cary, NC: SAS Institute Inc., 1989.

Stokes, Maura E., Davis, Charles S., and Koch, Gary G., Categorical Data Analysis Using the SAS System, Cary, NC: SAS Institute Inc., 1995. 499 pp.

US FDA Draft Guidance Document for Evaluating Clinical Studies of Antimicrobials in the Division of Anti-infective Drug Products, issued February 1997.

SAS Institute Inc., SAS/STAT Software: Changes and Enhancements through Release 6.11, Cary, NC: SAS Institute Inc., 1996. 1104 pp.
SAS and SAS/STAT are registered trademarks or
trademarks of SAS Institute Inc. in the USA and other
countries. ® indicates USA registration.
Acknowledgements
I would like to thank Dr. Hoi Leung for his help in
explaining and introducing this procedure to me.
Contact Information
P. Chris Holland, MS
Statistician
CL McIntosh & Associates, Inc.
12300 Twinbrook Parkway, Suite 625
Rockville, MD 20852
Phone: (301) 770-9590 ext. 271 (W)
(703) 524-9810 (H)
e-mail: [email protected]
The PCNTDIFF macro can be downloaded from the World
Wide Web at:
http://www.erols.com/petey/macrodoc