A CONCISE AND EASILY INTERPRETABLE WAY OF PRESENTING THE RESULTS OF REGRESSION ANALYSES

Leland Stewart, Lockheed Palo Alto Research Laboratories
Stephen Senzer, Lockheed Palo Alto Research Laboratories

Introduction

Our interest is in problems where stepwise or all-subsets regression would normally be used, i.e., where it is suspected that some of the independent variables will have negligible effect, or equivalently, that some of the θi's will equal zero. (In this paper, we say "θi = 0" when θi is small enough that Xi is considered to have a negligible effect.)

This paper describes a method of presenting the results of regression analyses that uses a graphical display of the posterior distributions of the regression parameters. This is an effective way of concisely communicating both the posterior probability that a regression parameter equals zero (i.e., that the corresponding variable has no effect) and the posterior density of the parameter if it does not equal zero. This approach also allows the user to compute the posterior probability of each model in a set of possible models, and therefore to retain consideration of several or many models throughout the analysis rather than restricting attention to just one "best" model.

Presenting the Results of a Regression Analysis

In Bayesian analysis, knowledge and uncertainty about parameter values are expressed by probability distributions. Before seeing the data, this distribution is called the prior distribution. The prior distribution has the same status as other standard assumptions such as normality, linearity, and independence, in the sense that they all add information to the analysis of the data. The details of the prior used here can be found in Stewart (1987). It is suggested that this method be considered for future inclusion in the SAS* System as a supplement or alternative to currently used methods of displaying results in regression analyses.
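The prior just described gives each coefficient a point mass at zero plus a continuous density elsewhere. A minimal sketch of such a "spike-and-slab" prior is shown below; the mixing weight p0, the normal slab, and the value slab_sd are illustrative assumptions for this sketch, not the actual prior specified in Stewart (1987).

```python
import math

def spike_slab_prior(theta, p0=0.5, slab_sd=1.0, eps=1e-12):
    """Mixture ("spike-and-slab") prior for one coefficient: a point
    mass p0 at theta = 0 (the spike) plus, with probability 1 - p0,
    a normal density centered at zero (the slab).  p0 and slab_sd are
    illustrative choices, not the values used in Stewart (1987)."""
    if abs(theta) < eps:
        return p0                        # the spike: P(theta = 0) = p0
    # the slab: (1 - p0) times a N(0, slab_sd**2) density
    norm = 1.0 / (slab_sd * math.sqrt(2.0 * math.pi))
    return (1.0 - p0) * norm * math.exp(-theta ** 2 / (2.0 * slab_sd ** 2))
```

The posterior then has the same two-part form, which is exactly what the proposed display shows: a spike of posterior probability at zero and a (sub-unit-area) density away from zero.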
Example

The joint distribution of the parameters, given the data, is called the posterior distribution. The posterior distribution represents the combination of the information in the data with the information and uncertainty expressed by the prior. All inferences, uncertainty intervals, decisions, etc., are computed from the posterior distribution.

A specific example in multiple logistic regression is used to illustrate this method; however, the approach has general applicability. In this example the observed response is success or failure versus eight "independent" or "regressor" variables, Xi. The usual linear relationship was employed:

    y = Σ(i=1 to 8) θi Xi + θ9

where the θi are unknown parameters (regression coefficients). In logistic regression Prob[Failure] = F(y), where F is the logistic cumulative distribution function. This example is described and analyzed in detail in Stewart (1987).

Figure 1 shows the marginal posterior distributions for the regression coefficients. This is the type of display that we propose for future inclusion in the SAS System. The scale of values for the θi's is shown at the bottom of Figure 1. Note particularly the location of θi = 0. When θi = 0, the corresponding independent variable Xi has no effect on the response.

[Figure 1: POSTERIOR DISTRIBUTIONS OF THE PARAMETERS. The spike at zero for the second coefficient is annotated p(θ2 = 0) = 0.29.]

To understand the meaning of Figure 1, one can examine the marginal posterior distribution for the second regression coefficient, θ2. The "spike" of probability denotes that the probability that θ2 = 0, given the data, is 0.29; that is, the posterior probability is 29% that the independent variable X2 has negligible effect. The posterior probability that θ2 is not equal to zero is 0.71, and the uncertainty in the true value of θ2 is indicated by the density shown. (Strictly, this is not a true density, since its area equals 0.71, not 1.)
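The model above can be written directly in code. The sketch below computes Prob[Failure] = F(y) from the linear predictor y = θ1 X1 + ... + θ8 X8 + θ9; the function names and argument layout are this sketch's conventions, not notation from the paper.

```python
import math

def logistic_cdf(y):
    """Logistic cumulative distribution function F(y)."""
    return 1.0 / (1.0 + math.exp(-y))

def prob_failure(x, theta):
    """Prob[Failure] = F(y), with y = theta_1*x_1 + ... + theta_8*x_8 + theta_9.
    x holds the eight regressor values; theta holds the eight slope
    coefficients followed by the intercept theta_9."""
    y = sum(t * xi for t, xi in zip(theta[:8], x)) + theta[8]
    return logistic_cdf(y)

# With every regressor and coefficient at zero, y = 0 and F(0) = 0.5.
print(prob_failure([0.0] * 8, [0.0] * 9))  # 0.5
```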
Computation

In the computation of the posterior distributions in Figure 1, all subsets of the independent variables were considered. There are 2^8 = 256 subsets, each of which defines a regression model with non-zero coefficients. A Bayesian approach allows the analyst to compute the posterior probability of each model in a set of possible models, and therefore to retain consideration of several or many models throughout the analysis rather than restricting attention to just one "best" model. This makes possible the inclusion of model uncertainty in predictions and other results. Prediction uncertainty limits derived from a "best" model fail to account for model uncertainty and are usually overly optimistic. Notice also that by not narrowing consideration to a single model, one avoids the problem of deciding whether or not variables such as X2 should be included in the model.

One of the advantages of a Bayesian approach is that it makes possible the type of presentation of the results of a regression analysis that is shown in Figure 1. A great deal of information is conveyed in that figure about what is implied by the data, and it is presented in a form that can be easily and correctly interpreted by the user. The posterior uncertainties associated with each θi are clearly shown.

This same example was run using the LOGIST Procedure. In the backward stepwise mode, the hypothesis that θ2 = 0 was not rejected, in spite of the fact that the posterior probability was 71% that θ2 has a non-zero value. In Table A the posterior probabilities that θi = 0 and the corresponding P-values are compared for this example. In most cases the P-value is much smaller than the posterior probability that the hypothesis is true. The differences shown are typical of many cases that have been studied. A Bayesian would argue that posterior probabilities should be used in decision making.
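The bookkeeping behind the spike heights can be sketched as follows: enumerate all subsets of regressors, and sum the posterior probabilities of the models that exclude variable i to get P(θi = 0 | data). The model probabilities below are made up for illustration; in the paper they are computed by Monte Carlo integration as described in Stewart (1987).

```python
from itertools import chain, combinations

def all_subsets(p):
    """All 2**p subsets of the regressor indices 0..p-1; each subset
    defines one candidate model (its non-zero coefficients)."""
    return chain.from_iterable(combinations(range(p), k) for k in range(p + 1))

def prob_coef_zero(i, model_post):
    """P(theta_i = 0 | data): the total posterior probability of the
    models that exclude variable i.  model_post maps a frozenset of
    included variable indices to that model's posterior probability,
    normalized to sum to 1."""
    return sum(p for model, p in model_post.items() if i not in model)

# Toy posterior over the 4 models on p = 2 variables (made-up numbers):
post = {frozenset(): 0.1, frozenset({0}): 0.2,
        frozenset({1}): 0.3, frozenset({0, 1}): 0.4}
print(len(list(all_subsets(8))))   # 256 models for 8 variables
print(prob_coef_zero(0, post))     # 0.4  (models {} and {1})
```

Summing model probabilities this way is also what lets predictions average over models instead of conditioning on a single "best" one.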
For more information on this subject see Berger and Sellke (1987) and the discussion that follows their paper.

Table A: PROB[θi = 0 | DATA] vs. P-VALUES

i   Posterior Probability that θi = 0   P-Values (PROC LOGIST)
1   .00                                 .00
2   .29                                 .04
3   .16                                 .015
4   .10                                 .005
5   .56                                 .40
6   .47                                 .21
7   .54                                 .40
8   .56                                 .40

When the number of independent variables gets large, say 30, then there would be about 10^9 possible models. However, only a small fraction of those have to be examined if we can draw a suitable Monte Carlo sample of the models. Techniques for doing this efficiently need to be developed.

Contact

If you have any questions or comments please write or call:

Lee Stewart 0/92-20 BI2ME
Lockheed Research Laboratories
3251 Hanover St.
Palo Alto, CA 94304
Phone: (415) 424-2710

Conclusions

The methodology that we have presented has several desirable properties. It provides a concise overall picture of the results of the analysis. The displays are informative and easily interpretable. Results are given in terms of posterior probabilities instead of P-values (it would be interesting and informative to display these two side by side). Consideration of several or many models can be retained throughout the analysis, which means that model uncertainty can be included in the results. We suggest that these methods be considered for future inclusion in the SAS System.

References

Berger, J.O. and Sellke, T. (1987), "Testing a Point Null Hypothesis: The Irreconcilability of P-Values and Evidence," Journal of the American Statistical Association, 82, 112-122.

Stewart, L.T. (1987), "Hierarchical Bayesian Analysis Using Monte Carlo Integration: Computing Posterior Distributions When There Are Many Possible Models," The Statistician, 36, 211-219.

* SAS is the registered trademark of SAS Institute Inc., Cary, NC, USA.