Lovely Lucid Logistics
The analysis and graphic presentation of effects of nominal and metric variables on binary outcomes

Diana Eugenie Kornbrot
Blended Learning Unit, University of Hertfordshire
11-Nov-05 Lucid Logistics: Kornbrot, Blended Learning Unit, University of Hertfordshire SPSS York

Abstract
Logistic regression can be used to answer the same questions about binary variables that ANOVA and ANCOVA answer about metric variables. However, SPSS provides much less support for logistic regression. The Logistic Regression Procedure provides no equivalent of ANOVA Means Tables or Profile Plots. This presentation shows how to use a combination of SPSS Procedures to produce Tables and Graphs of predicted logits and probabilities as a function of categorical factor and metric covariate variables.
Diagnostics for model fit are NOT discussed; they merit their own presentation.

Acknowledgments
• Lia Kvavilashvili
  • For all the prospective memory data
  • Stimulating theoretical discussion on content
• ESRC Project Grant

Goals
• Motivate logistic regression
• Graphic presentation of logistic model results
  • Predictions: logits and probabilities as a function of explanatory variables
  • Factors and contrasts: identification of statistically reliable effects
  • Interpretation much easier from graphs
• Application to different designs
  • Explanatory variables: 2 or 3 categorical
  • Explanatory variables: 1 metric, 1 or 2 categorical
• Recommendations to users of logistic regression
• Recommendations to SPSS

Why Logistic Analysis?
• Need to analyse binary, i.e. 2-alternative, responses
  • Errors: right, wrong
  • Events: remembered, forgotten
  • Success: grant awarded, grant rejected; patient recovered, or not
• More than 1 categorical variable
  • Chi-square not sufficient
• Combination of metric and categorical explanatory variables
• Interactions matter

Why Interpretation of Results is a Problem
• Analysis is on log (odds ratio) or logits
  • Lack of intuitive feel for logits
  • Lack of intuitive feel for odds ratios for non-betters
  • Probabilities are more 'natural'?
• Need for packages, SPSS or other
  • Can't hand calculate, as there is no closed-form answer
• SPSS output
  • Primary output is in logits
  • No directly useful graphics output
  • BUT Save permits direct saving of probabilities, not logits
  • ?No confidence levels on probabilities

Analysis
• GLM framework
  • Effects assumed to be linear on logits
• Model goodness of fit
  • Test on -2 Log Likelihood, -2LL
• Model fitting procedure
• Effect evaluation criteria
  • SPSS uses Wald; other packages use deviance = -2LL
  • On factors and covariates
  • On model parameters
• Other packages vary; all give Wald as a minimum
  • JMP, SPSS, SAS, SYSTAT

Data Example: Prospective Memory
• Prospective memory
  • Does the person have GOOD prospective memory?
  • GOOD = 5 or 6 occasions remembered from 6 opportunities
  • Model 1: task(action, event, time), age(4 categories)
  • Model 2: task(action, event, time), age(4), intellect
• Presentation criteria
  • Easy to interpret > Graphics
  • Predicted probability and logits
  • Estimate of accuracy as part of results
  • Tests for explanatory variable effects and contrasts

Model 1 using SPSS menus
Analyze > Regression > Binary Logistic
  Dependent:    good#
  Covariates:   task#(cat), age#(cat), task#(cat)*age#(cat)
  Method:       Enter
  Categorical:  task#(deviation), age#(deviation)
                or task#(repeated), age#(repeated)
                !!!NOT indicator, the default!!! not a lot of people know that!
  Save:         probabilities, Cook's, deviation
  Options:      CI for exp(B)

Model 1 Global Results
Model 1: task(action, event, time), age(4 categories)
• Omnibus Test: Significant = Good
• Model Summary: Substantial variance accounted for

Omnibus Tests of Model Coefficients
  Step 1    Chi-square   df   Sig.
  Step      48.948       11   .000
  Block     48.948       11   .000
  Model     48.948       11   .000

Model Summary
  Step   -2 Log likelihood   Cox & Snell R Square   Nagelkerke R Square
  1      180.197 a           .238                   .331
  a. Estimation terminated at iteration number 6 because parameter estimates changed by less than .001.

SPSS: Model 1 Parameters
• Variable effect not salient
• No effects or standard errors for reference (last) category
• Wald: estimates of s.e. may not be those that are needed?

Variables in the Equation (Step 1 a)
                                                                   95.0% C.I. for EXP(B)
  Term                  B        S.E.   Wald     df   Sig.   Exp(B)   Lower    Upper
  TASK#                                 16.521   2    .000
  TASK#(1)              .828     .299   7.677    1    .006   2.288    1.274    4.110
  TASK#(2)              -1.024   .259   15.611   1    .000   .359     .216     .597
  AGE#                                  17.675   3    .001
  AGE#(1)               1.309    .371   12.476   1    .000   3.704    1.791    7.660
  AGE#(2)               .064     .335   .036     1    .850   1.066    .553     2.054
  AGE#(3)               -.303    .344   .774     1    .379   .739     .376     1.450
  AGE# * TASK#                          7.540    6    .274
  AGE#(1) by TASK#(1)   .290     .600   .234     1    .629   1.337    .412     4.337
  AGE#(1) by TASK#(2)   -.478    .441   1.175    1    .278   .620     .261     1.472
  AGE#(2) by TASK#(1)   .014     .512   .001     1    .978   1.014    .372     2.766
  AGE#(2) by TASK#(2)   .257     .442   .337     1    .561   1.293    .543     3.075
  AGE#(3) by TASK#(1)   -.536    .480   1.246    1    .264   .585     .228     1.500
  AGE#(3) by TASK#(2)   -.476    .469   1.028    1    .311   .621     .248     1.559
  Constant              .704     .199   12.554   1    .000   2.022
  a. Variable(s) entered on step 1: TASK#, AGE#, AGE# * TASK#.
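
The columns of the Variables in the Equation table are linked by simple arithmetic, which may help when reading it. The sketch below is an illustration only, not SPSS's own routine: it assumes the usual Wald definitions (Wald = (B / S.E.)^2, Exp(B) = exp(B), 95% limits = exp(B -/+ 1.96 S.E.)), and the function name wald_summary is hypothetical. Because it starts from the rounded B and S.E. printed for TASK#(1), it reproduces the tabled values only to about two decimal places.

```python
import math


def wald_summary(b, se, z=1.96):
    """Return (Wald chi-square, Exp(B), lower CI, upper CI) for one coefficient.

    Assumes the standard Wald formulas; z=1.96 gives 95% limits.
    """
    wald = (b / se) ** 2            # Wald chi-square on 1 df
    odds_ratio = math.exp(b)        # Exp(B)
    lower = math.exp(b - z * se)    # lower limit for Exp(B)
    upper = math.exp(b + z * se)    # upper limit for Exp(B)
    return wald, odds_ratio, lower, upper


# TASK#(1) row of the table: B = .828, S.E. = .299
wald, odds_ratio, lower, upper = wald_summary(0.828, 0.299)
print(f"Wald = {wald:.3f}, Exp(B) = {odds_ratio:.3f}, "
      f"95% CI = ({lower:.3f}, {upper:.3f})")
```

The same call can be repeated for any other row that has both B and S.E.; the factor-level rows (TASK#, AGE#, AGE# * TASK#) are multi-df tests and cannot be recovered this way.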
SPSS Graphic Representation
• Predicted probabilities, pre_1
  • Directly available from Save
• Logits can be calculated
  • Transform > Compute
  • Lgt = ln(pre_1/(1-pre_1))
  • NB Most other packages allow direct saving of logits
• Graph > Interactive > Line plot
  • Y axis: predicted probability (mean)
  • X axis: age#
  • Colour: task#
• No interactions, so expect logit plots to be 'more' linear

SPSS: Logit & Probability Graphs
[Figure: two line plots against age# (18-30, 61-65, 71-75, 76-80), one line per task# (action, event, time). Left panel: raw predicted probability. Right panel: logit (lgt).]
• Logit: ??looks more linear??
• Confidence levels??? NOT in SPSS!!!

Confidence Levels
• Assume no extra-binomial dispersion
• Asymptotic for logit
  • Symmetric about mean(lgt)
  • se(lgt)^2 = 1/N_occur + 1/N_not_occur
  • Lower Confidence Level, 95%: LCL(lgt) = mean(lgt) - 1.96 se(lgt)
  • Upper Confidence Level, 95%: UCL(lgt) = mean(lgt) + 1.96 se(lgt)
• Asymptotic for probability
  • Asymmetric about mean(prob); calculate from the lgt CLs
  • probability = exp(lgt)/[1 + exp(lgt)]
  • LCL(prob) = exp(LCL(lgt))/[1 + exp(LCL(lgt))]
  • UCL(prob) = exp(UCL(lgt))/[1 + exp(UCL(lgt))]
• Use EXCEL; can't customise error bars in SPSS

EXCEL: Logit & Probability Graphs
[Figure: two line plots with error bars against age group (18-30, 61-65, 71-75, 76-80), one line per task (action, event, time). Left panel: raw probability. Right panel: logit.]
Errors are for each group.
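
The recipe on the Confidence Levels slide can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the author's EXCEL worksheet: the group logit is taken as the observed log odds ln(N_occur/N_not_occur), the standard error uses the standard large-sample form 1/N_occur + 1/N_not_occur, and the counts (12 remembered, 4 not) are hypothetical. The function names are likewise made up for the example.

```python
import math


def inv_logit(lgt):
    """Map a logit back to a probability: exp(lgt) / (1 + exp(lgt))."""
    return math.exp(lgt) / (1.0 + math.exp(lgt))


def group_confidence_limits(n_occur, n_not_occur, z=1.96):
    """95% limits for one group, on the logit and probability scales.

    Assumes no extra-binomial dispersion and non-zero counts in both cells.
    """
    lgt = math.log(n_occur / n_not_occur)               # observed log odds
    se = math.sqrt(1.0 / n_occur + 1.0 / n_not_occur)   # asymptotic s.e. of the logit
    lcl_lgt = lgt - z * se                               # symmetric on the logit scale
    ucl_lgt = lgt + z * se
    return {
        "lgt": lgt,
        "se": se,
        "lcl_lgt": lcl_lgt,
        "ucl_lgt": ucl_lgt,
        "prob": inv_logit(lgt),
        "lcl_prob": inv_logit(lcl_lgt),                  # asymmetric on the probability scale
        "ucl_prob": inv_logit(ucl_lgt),
    }


# Hypothetical group: 12 people with GOOD prospective memory, 4 without
limits = group_confidence_limits(12, 4)
for name, value in limits.items():
    print(f"{name:9s} {value: .4f}")
```

Working on the logit scale first and back-transforming keeps the probability limits inside 0 to 1, which is why the resulting error bars are asymmetric about the predicted probability.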
Note on the EXCEL graphs: the error bars are per group, so power to detect the interaction is low.

Model 2 using SPSS menus
Analyze > Regression > Binary Logistic
  Dependent:    good#
  Covariates:   task#(cat), age#(cat), intellec
                task#(cat)*age#(cat)
                task#(cat)*intellec
                intellec*age#(cat)
                task#(cat)*age#(cat)*intellec
  Method:       Enter
  Categorical:  task#(deviation), age#(deviation)
                or task#(repeated), age#(repeated)
  Save:         probabilities, Cook's, deviation
  Options:      CI for exp(B)

Model 2 Summary
• Omnibus = whole model: LR chi2(23)=82.2, p=.0000001
• Various r2 values: McFadden=.36; Cox & Snell=.37; Nagelkerke=.51

Variable Effects
  Source               DF   Wald chi^2   Wald Prob   LR Chi^2   LR Prob
  TASK                 2    14.03        .000899     29.70      .000000
  AGE                  3    4.45         .217040     4.96       .174500
  intellect            1    2.87         .089995     6.03       .014101
  TASK*AGE             6    6.00         .423621     14.63      .023371
  TASK*intellect       2    4.32         .115183     7.73       .021003
  AGE*intellect        3    5.00         .171542     7.07       .069614
  TASK*AGE*intellect   6    10.52        .104480     21.43      .001532

Comparison of Variable Effects with different methods/packages
1. Likelihood Ratio shows strong effects of intellec + intellec interactions
   Used JMP-IN [even version 3; 5 is better for some things]
2. Wald does NOT show these effects - WORRYING
3. Model improvement with intellec: chi2(12)=33.3, p=.00087

Model 2 Probability by Age
[Figure: predicted probability (Model 2) against intellec, one panel per age group (18-30, 61-65, 71-75, 76-80), one line per task# (action, event, time).]
• Not very clear!
• Task effect:
  • Event has lower prob
• Intellect:
  • Most groups: prob increases with intellec
• 3-way interactions:
  • > 70, event; 61-65, time: prob decreases with intellec

Model 2 Logit by Age
[Figure: predicted logit (lgt2, Model 2) against intellec, one panel per age group (18-30, 61-65, 71-75, 76-80), one line per task# (action, event, time).]
• Bit clearer!
• Task effect:
  • Event has lower prob
• Intellect:
  • Most groups: prob increases with intellec
  • Large: 71-75 time, 76-80 action
• 3-way interactions:
  • > 70, event; 61-65, time: prob decreases with intellec

Summary & Recommendations
• Recommend logit analyses as a very important tool
• Recommend graphic displays to improve interpretability
• SPSS provides the basic procedure
• Limitations of SPSS
  • No direct predicted logit or probability Table or Graph Summary
  • Poor model diagnostics and power procedures
  • No direct group standard errors
  • No Maximum Likelihood estimates for explanatory variables
  • No mixed models
• Other general packages are also DIRE - in different ways
• Need simple tools for routine logistic applications
• Can SPSS User Groups do anything?

References
Agresti, A. (1990). Categorical data analysis. Chichester: Wiley.
Agresti, A. (1996). An introduction to categorical data analysis. Chichester: Wiley.
Agresti, A., & Finlay, B. (1997). Statistical methods for the social sciences (3rd ed.). Upper Saddle River, NJ: Prentice Hall.
Agresti, A., & Hartzel, J. (2000). Tutorial in biostatistics: strategies for comparing treatments on a binary response with multi-centre data. Statistics in Medicine, 19, 1115-1139.
Everitt, B., & Dunn, G. (2001). Applied multivariate data analysis (2nd ed.). London: Edward Arnold.
Kornbrot, D. E. (2000, 17-20 July). Counting on prospective memory: Advantages of logistic and log linear models over ANOVA and correlations. Paper presented at the 1st International Prospective Memory Conference, Hatfield, Hertfordshire, U.K.
Kvavilashvili, L., Kornbrot, D. E., Mash, V., Cockburn, J., & Milne, A. (2000, 17-20 July). Remembering event-, time- and activity-based tasks in young, young-old and old-old people. Paper presented at the 1st International Prospective Memory Conference, Hatfield, Hertfordshire, U.K.
Lindsey, J. K. (1999). Models for repeated measurements (2nd ed.). Oxford: Oxford University Press.
Sofroniou, N., & Hutcheson, G. D. (2002). Confidence intervals for the predictions of logistic regression in the presence and absence of a variance–covariance matrix. Understanding Statistics, 1(1), 3–18.
Tabachnick, B. G., & Fidell, L. S. (1996). Using multivariate statistics (3rd ed.). New York: Harper Collins.
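
Worked check of the Model 2 Summary figures. The "model improvement with intellec" value appears to be the difference between the two whole-model likelihood-ratio chi-squares reported in the slides (82.2 on 23 df for Model 2, 48.948 on 11 df for Model 1). The sketch below assumes that reading and that the models are nested; the helper chi2_sf_even_df is a hypothetical name, and it relies on the closed-form upper tail of the chi-square distribution that holds only for an even number of degrees of freedom.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(chi-square > x) for an EVEN number of df.

    Uses the closed form exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form is only valid for positive even df")
    half = x / 2.0
    term = 1.0    # (x/2)^0 / 0!
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i      # build (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total


# Whole-model LR chi-squares as reported in the slides
lr_model1, df_model1 = 48.948, 11   # task, age, task*age
lr_model2, df_model2 = 82.2, 23     # Model 1 terms plus intellec and its interactions

improvement = lr_model2 - lr_model1  # extra fit from the intellec terms
df_diff = df_model2 - df_model1      # extra parameters
p_value = chi2_sf_even_df(improvement, df_diff)

print(f"chi2({df_diff}) = {improvement:.1f}, p = {p_value:.5f}")
```

The result agrees with the slide's chi2(12)=33.3, p=.00087 to within rounding of the reported inputs, which supports the likelihood-ratio (rather than Wald) reading of the intellec effects.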