A CONCISE AND EASILY INTERPRETABLE WAY OF PRESENTING
THE RESULTS OF REGRESSION ANALYSES
Leland Stewart, Lockheed Palo Alto Research Laboratories
Stephen Senzer, Lockheed Palo Alto Research Laboratories
Introduction
Our interest is in problems where
stepwise or all-subsets regression would
normally be used, i.e., it is suspected that
some of the independent variables will have
negligible effect, or equivalently, that some of
the θi's will equal zero. (In this paper, we
say "θi = 0" when θi is small enough that Xi is
considered to have a negligible effect.)
This paper describes a method of
presenting the results of regression analyses
that uses a graphical display of the posterior
distributions of the regression parameters.
This is an effective way of concisely
communicating both the posterior probability
that a regression parameter equals zero (i.e.,
the corresponding variable has no effect) and
the posterior density of the parameter if it
does not equal zero. This approach also
allows the user to compute the posterior
probability for each model in a set of possible
models and therefore to retain consideration
of several or many models throughout the
analysis rather than to restrict attention to
just one "best" model.
Presenting the Results of a Regression
Analysis
In Bayesian analysis knowledge and
uncertainty about parameter values are
expressed by probability distributions. Before
seeing the data this distribution is called the
prior distribution. The prior distribution has
the same status as other standard
assumptions such as normality, linearity,
independence, etc., in the sense that they all
add information to the analysis of the data.
The details of the prior that is used here can
be found in Stewart (1987).
It is suggested that this method be
considered for future inclusion in the SAS*
system as a supplement or alternative to
currently used methods of displaying results
in regression analyses.
The joint distribution of the parameters,
given the data, is called the posterior
distribution. The posterior distribution
represents the combination of the information
in the data with the information and
uncertainty expressed by the prior. All
inferences, uncertainty intervals, decisions,
etc. are computed from the posterior
distribution.
Example
A specific example in multiple logistic
regression is used to illustrate this method;
however, this approach has general
applicability. In this example the observed
response is success or failure versus eight
'independent' or 'regressor' variables, Xi. The
usual linear relationship was employed:

    y = θ1 X1 + θ2 X2 + ... + θ8 X8 + θ9

where the θi are unknown parameters
(regression coefficients). In logistic regression
Prob[Failure] = F(y), where F is the logistic
cumulative distribution function. This
example is described and analyzed in detail
in Stewart (1987).
Figure 1 shows the marginal posterior
distributions for the regression coefficients.
This is the type of display that we propose for
future inclusion in the SAS System. The
scale of values for the θi's is shown at the
bottom of Figure 1. Note particularly the
location of θi = 0. When θi = 0 the
corresponding independent variable Xi has no
effect on the response.
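The linear predictor and logistic link described above can be sketched directly. In the following minimal illustration the coefficient and regressor values are made up for illustration; they are not the paper's estimates:

```python
import math

def logistic_cdf(y):
    """Logistic cumulative distribution function F(y) = 1 / (1 + exp(-y))."""
    return 1.0 / (1.0 + math.exp(-y))

def prob_failure(theta, x):
    """Prob[Failure] = F(y), where y = theta_1*x_1 + ... + theta_8*x_8 + theta_9.
    theta holds the eight slope coefficients followed by the intercept theta_9."""
    y = sum(t * xi for t, xi in zip(theta[:8], x)) + theta[8]
    return logistic_cdf(y)

# Hypothetical coefficients and one observation's regressor values:
theta = [0.5, 0.0, -0.3, 0.2, 0.0, 0.1, -0.1, 0.4, -1.0]
x = [1.0, 2.0, 0.5, 1.5, 0.0, 1.0, 2.0, 0.5]
print(round(prob_failure(theta, x), 4))
```

Note that two of the hypothetical slope coefficients are exactly zero, mirroring the paper's premise that some Xi have negligible effect.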
Figure 1
POSTERIOR DISTRIBUTIONS OF THE PARAMETERS
[Display not reproduced. For each coefficient the plot shows a spike at zero whose height gives the posterior probability that the coefficient equals zero (e.g., p(θ2 = 0) = 0.29) together with the posterior density over non-zero values, on a common scale running roughly from -0.6 to 0.8.]
To understand the meaning of Figure 1
one can examine the marginal posterior
distribution for the second regression
coefficient θ2. The "spike" of probability
denotes that the probability that θ2 = 0, given
the data, is 0.29; that is, the posterior
probability is 29% that the independent
variable X2 has negligible effect. The
posterior probability that θ2 is not equal to
zero is 0.71, and the uncertainty in the true
value of θ2 is indicated by the density shown.
(Actually this is not a true density since its
area equals 0.71, not 1.)
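This spike-and-slab structure is easy to check numerically. In the sketch below the slab's location and spread are hypothetical (the paper reports only the mass p(θ2 = 0) = 0.29); the point is that the spike's mass plus the scaled density's area sum to 1, and that moments follow directly:

```python
import math

# Marginal posterior of theta_2 as a spike-and-slab mixture:
# a point mass at zero plus a continuous "slab" whose area is 1 - p_zero.
p_zero = 0.29                    # posterior probability that theta_2 = 0 (Figure 1)
slab_mean, slab_sd = 0.4, 0.15   # hypothetical slab parameters, not from the paper

def slab_density(t):
    """Normal density scaled so its total area is 1 - p_zero
    (hence, as the text notes, not a true density)."""
    z = (t - slab_mean) / slab_sd
    return (1 - p_zero) * math.exp(-0.5 * z * z) / (slab_sd * math.sqrt(2 * math.pi))

# Numerically integrate the slab to confirm the mixture's total mass is 1:
steps = 100000
dt = 4.0 / steps
grid = [-2.0 + dt * k for k in range(steps + 1)]
slab_area = sum(slab_density(t) for t in grid) * dt
total_mass = p_zero + slab_area

# Posterior mean: the spike contributes 0 * p_zero, the slab contributes its integral.
posterior_mean = 0.0 * p_zero + sum(t * slab_density(t) for t in grid) * dt
```

Under these assumed slab parameters the posterior mean works out to (1 - p_zero) times the slab mean, i.e., the spike shrinks the estimate toward zero.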
Computation
In the computation of the posterior
distributions in Figure 1, all subsets of the
independent variables were considered.
There are 2^8 = 256 subsets, each of which
defines a regression model with non-zero
coefficients. A Bayesian approach allows the
analyst to compute the posterior probability
for each model in a set of possible models and
therefore to retain consideration of several or
many models throughout the analysis rather
than to restrict attention to just one 'best'
model. This makes possible the inclusion of
model uncertainty in predictions and other
results. Prediction uncertainty limits derived
from a 'best' model fail to account for model
uncertainty and are usually overly optimistic.
Also notice that by not narrowing
consideration to a single model one avoids the
problem of deciding whether or not variables
such as X2 should be included in the model.
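The all-subsets computation can be sketched as follows. The paper's actual machinery (Monte Carlo integration over a hierarchical prior; see Stewart 1987) is not reproduced here; instead this toy uses simulated linear-regression data and the common BIC approximation, with exp(-BIC/2) standing in for each model's marginal likelihood under equal prior model probabilities:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
n, p = 60, 3                      # 3 candidate variables -> 2**3 = 8 models
X = rng.normal(size=(n, p))
# Simulated truth: only variable 0 matters; variables 1 and 2 have zero effect.
y = 2.0 * X[:, 0] + rng.normal(scale=1.0, size=n)

def bic(subset):
    """BIC of the linear model using the given subset of columns plus an intercept."""
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
    resid = y - A @ np.linalg.lstsq(A, y, rcond=None)[0]
    rss = float(resid @ resid)
    return n * np.log(rss / n) + A.shape[1] * np.log(n)

subsets = [s for r in range(p + 1) for s in itertools.combinations(range(p), r)]
weights = np.array([np.exp(-0.5 * bic(s)) for s in subsets])
post = weights / weights.sum()    # posterior probability of each model

# Posterior probability that theta_j = 0: total mass of models excluding variable j.
p_zero = [sum(pr for s, pr in zip(subsets, post) if j not in s) for j in range(p)]
```

Because every model retains some posterior mass, predictions averaged over `post` carry the model uncertainty that a single "best" model would discard.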
One of the advantages of a Bayesian
approach is that it makes possible the type of
presentation of the results of a regression
analysis that is shown in Figure 1. A great
deal of information is conveyed in that figure
about what is implied by the data, and it is
presented in a form that can be easily and
correctly interpreted by the user. The
posterior uncertainties associated with each
6 i are clearly shown.
This same example was run using the
LOGIST Procedure. In the backward
stepwise mode the hypothesis that θ2 = 0 was
not rejected in spite of the fact that the
posterior probability was 71% that θ2 has a
non-zero value.
P-values
In most cases the P-value is much smaller
than the posterior probability that the
hypothesis is true. In Table A the posterior
probabilities that θi = 0 and the corresponding
P-values are compared for this example. The
differences shown are typical of many cases
that have been studied. A Bayesian would
argue that posterior probabilities should be
used in decision making. For more
information on this subject see Berger and
Sellke (1987) and the discussion that follows
their paper.

Table A
PROB[θi = 0 | DATA] VS. P-VALUES

 i   Posterior Probability   P-value
     that θi = 0             (Proc LOGIST)
 1          .00                 .00
 2          .29                 .04
 3          .16                 .015
 4          .10                 .005
 5          .56                 .40
 6          .47                 .21
 7          .54                 .40
 8          .56                 .40
When the number of independent
variables gets large, say 30, then there would
be 2^30, or about 10^9, possible models. However,
only a small fraction of those have to be
examined if we can draw a suitable Monte
Carlo sample of the models. Techniques for
doing this efficiently need to be developed.
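One way such a Monte Carlo sample might be drawn is a Metropolis random walk over the vector of inclusion indicators, flipping one variable at a time. The sketch below is not the paper's method; it substitutes a made-up log-score for a real posterior model weight simply to show the sampler's shape:

```python
import math
import random

random.seed(0)
p = 30  # number of candidate variables: 2**30, about 10**9, possible models

def log_score(model):
    """Stand-in for the log posterior weight of a model (a 0/1 inclusion tuple).
    A toy function that rewards including the first three variables and
    penalizes model size; in practice this would come from the data."""
    reward = sum(3.0 for j in range(3) if model[j])
    return reward - 0.5 * sum(model)

# Metropolis random walk over inclusion indicators: propose flipping one bit,
# accept with probability min(1, new_weight / old_weight).
model = tuple([0] * p)
visits = {}
n_steps = 20000
for _ in range(n_steps):
    j = random.randrange(p)
    proposal = list(model)
    proposal[j] = 1 - proposal[j]
    proposal = tuple(proposal)
    if math.log(random.random()) < log_score(proposal) - log_score(model):
        model = proposal
    visits[model] = visits.get(model, 0) + 1

# Visit frequencies approximate posterior model probabilities, so marginal
# inclusion probabilities fall out by summing over visited models.
incl = [sum(c for m, c in visits.items() if m[j]) / n_steps for j in range(p)]
```

Only the models the chain actually visits are ever scored, which is the "small fraction" the text anticipates.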
Contact
If you have any questions or comments
please write or call:
Lee Stewart
0/92-20 BI2ME
Lockheed Research Laboratories
3251 Hanover St.
Palo Alto, CA 94304
Phone: (415) 424-2710
Conclusions
The methodology that we have presented
has several desirable properties. It provides a
concise overall picture of the results of the
analysis. The displays are informative and
easily interpretable. Results are given in
terms of posterior probabilities instead of
P-values (it would be interesting and
informative to display these two side by side).
Consideration of several or many models can
be retained throughout the analysis which
means that model uncertainty can be
included in the results.
References
Berger, J.O. and Sellke, T. (1987),
"Testing a Point Null Hypothesis: The
Irreconcilability of P-Values and Evidence,"
Journal of the American Statistical
Association, 82, 112-122.
Stewart, L.T. (1987), "Hierarchical
Bayesian Analysis Using Monte Carlo
Integration: Computing Posterior
Distributions When There are Many Possible
Models," The Statistician, 36, 211-219.
We suggest that these methods be
considered for future inclusion in the SAS
System.
* SAS is the registered trademark of SAS
Institute Inc., Cary, N.C., USA