Correlation and Regression
It’s the Last Lecture. Hooray!

Correlation
• Analyze → Correlate → Bivariate…
  – Click over the variables you wish to correlate
  – Options: can select descriptives and pairwise vs. listwise deletion
    • Pairwise deletion – a case is excluded only from correlations involving a variable it is missing; all other pairs still use it (default)
    • Listwise deletion – only cases with data for all variables are included

Correlation
• Assumptions:
  – Linear relationship between variables
    • Inspect scatterplot
  – Normality
    • Shapiro-Wilk’s W
• Other issues:
  – Range restriction & heterogeneous subgroups
    • Identified methodologically
  – Outliers
    • Inspect scatterplot

[SPSS Correlations output, N = 98–112: Time 1 Generality (mean of all ASQ Stability and Globality scores for bad events) correlates .507** with Time 2 Generality; Time 1 and Time 2 Total BDI Score (sum of all 21 BDI items) correlate .692**; the generality–BDI correlations are small and nonsignificant (.097–.160). **Correlation is significant at the 0.01 level (2-tailed).]

Correlation
• Partial Correlation – removes variance from a 3rd variable, like ANCOVA
  – Analyze → Correlate → Partial…

[SPSS Partial Correlations output: with no control variable, Time 2 Generality correlates .160 with Total BDI Score (p = .116); controlling for Time 1 Generality, the correlation drops to .109 (p = .286, df = 95).]
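The partial correlation SPSS reports can be reproduced by hand from the zero-order correlations. A minimal sketch in plain Python (not SPSS syntax), using the values from the output above:

```python
import math

def partial_corr(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Zero-order correlations from the output above:
# x = Time 2 Generality, y = Total BDI Score, z = Time 1 Generality
r_partial = partial_corr(0.160, 0.507, 0.131)
print(round(r_partial, 3))  # ~.11; SPSS reports .109 from the unrounded data
```

The small drop from .160 to about .11 shows how little of the Generality–BDI association was carried by the control variable here.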
Regression
• Analyze → Regression → Linear…
  – Use if both predictor(s) and criterion variables are continuous
  – Dependent = Criterion
  – Independent = Predictor(s)
  – Statistics…
    • Regression Coefficients (b & β)
      – Estimates
      – Confidence intervals
      – Covariance matrix

Regression
– Statistics…
  • Model fit
  • R square change
  • Descriptives
  • Part and partial correlations
  • Collinearity diagnostics
    – Recall that you don’t want your predictors to be too highly related to one another
    – Collinearity/Multicollinearity – when predictors are too highly correlated with one another
    – Eigenvalues of the scaled and uncentered cross-products matrix, condition indices, and variance-decomposition proportions are displayed, along with variance inflation factors (VIF) and tolerances for individual predictors
    – Tolerances should be > .2; VIFs should be < 4

Regression
– Statistics…
  • Residuals
    – Durbin-Watson
      » Tests correlation among residuals (i.e., autocorrelation); significant correlation implies nonindependent data
      » Clicking on this will also display a histogram of residuals, a normal probability plot of residuals, and the case numbers and standardized residuals for the 10 cases with the largest standardized residuals
    – Casewise diagnostics
      » Identifies outliers according to pre-specified criteria

Regression
– Plots…
  • Plot standardized residuals (*ZRESID) on the y-axis and standardized predicted values (*ZPRED) on the x-axis
  • Check “Normal probability plot” under “Standardized Residual Plots”

Regression
• Assumptions:
  – Observations are independent
  – Linearity of regression
    • Look for residuals that get larger at extreme values
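With only two predictors, tolerance and VIF follow directly from the correlation between them, so the rule of thumb above (tolerance > .2, VIF < 4) is easy to check by hand. A rough sketch in plain Python, using the predictor intercorrelation of .084 reported later in this lecture’s output:

```python
# Tolerance_j = 1 - R²_j, where R²_j comes from regressing predictor j on the
# other predictors; VIF_j = 1 / Tolerance_j. With exactly two predictors,
# R²_j is just their squared correlation.
r_predictors = 0.084  # correlation between the two Time 1 predictors

tolerance = 1 - r_predictors**2
vif = 1 / tolerance

print(round(tolerance, 3), round(vif, 3))  # matches the SPSS output: .993, 1.007
assert tolerance > 0.2 and vif < 4  # both predictors pass the rule of thumb
```

Values this close to 1 mean the predictors share almost no variance, so collinearity is not a concern in this model.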
  – Normality in arrays: check whether the residuals are normally distributed
    • Save unstandardized residuals
      » Click Save…; under “Residuals,” click “Unstandardized” when you run your regression
    • Run a Shapiro-Wilk’s W test on this variable (RES_1)

Regression
– Normality in arrays
  • Examine the normal probability plot of the residuals; the residuals should resemble a normal distribution curve
  [Figure: normal probability plots of residuals — a bad (non-normal) and a good (normal) example]

Regression
– Homogeneity of variance in arrays
  • Look for residuals getting more spread out as a function of predicted value, i.e., a cone-shaped pattern in the plot of standardized residuals vs. standardized predicted values
  [Figure: standardized residual vs. standardized predicted plots — a bad (cone-shaped) and a good (even spread) example]

Regression Output
[SPSS linear regression output, N = 98. Criterion: Time 2 Total BDI Score (M = 7.90, SD = 6.87); predictors: Time 1 Total BDI Score (M = 9.17, SD = 6.43) and Time 1 Pessimism (M = 4.78, SD = .45), entered simultaneously (Method: Enter). Time 1 Total BDI correlates .692 with Time 2 Total BDI; Time 1 Pessimism correlates .131 with Time 2 Total BDI and .084 with Time 1 Total BDI. Model Summary: R = .696, R² = .484, adjusted R² = .473, SE of estimate = 4.99, Durbin-Watson = 1.951. ANOVA: F(2, 95) = 44.543, p < .001 (MS regression = 1107.48, MS residual = 24.86).]
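The model-fit numbers in the output hang together arithmetically: both F and adjusted R² can be recomputed from R², the number of predictors k, and the sample size n. A sketch in plain Python with the Model Summary values:

```python
r_sq, k, n = 0.484, 2, 98  # R², number of predictors, number of cases

# F(k, n - k - 1) tests whether the model explains significant variance
f = (r_sq / k) / ((1 - r_sq) / (n - k - 1))

# Adjusted R² penalizes R² for the number of predictors
adj_r_sq = 1 - (1 - r_sq) * (n - 1) / (n - k - 1)

# Close to the reported F = 44.543 and adjusted R² = .473
# (small discrepancies come from rounding R² to three decimals)
print(round(f, 1), round(adj_r_sq, 3))
```

Checking these by hand is a quick way to catch transcription errors when copying results out of SPSS output.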
Regression Output
[SPSS Coefficients table (DV: Time 2 Total BDI Score): constant B = -4.193, p = .441; Time 1 Total BDI Score B = .732, β = .686, t = 9.270, p < .001; Time 1 Pessimism B = 1.123, β = .073, t = .993, p = .323. Tolerance = .993 and VIF = 1.007 for both predictors. Collinearity diagnostics: eigenvalues 2.761, .235, .004; condition indices 1.000, 3.428, 25.224. Residuals statistics: standardized residuals range from -1.91 to 4.01; standardized predicted values from -1.57 to 3.48.]

Logistic Regression
• Analyze → Regression → Binary Logistic…
  – Use if the criterion is dichotomous [no assumptions about predictor(s)]
  – Use “Multinomial Logistic…” if the criterion is polychotomous (3+ groups)
    • Don’t worry about that, though

Logistic Regression
• Assumptions:
  – Observations are independent
  – Criterion is dichotomous
  • No stats needed to show either one of these
• Important issues:
  – Outliers
    • Save… → Influence: check “Cook’s” and “Leverage values”
    • Cook’s statistic – outlier = any case > 4/(n-k-1), where n = # of cases & k = # of predictors
    • Leverage values – outlier = anything > .5

Logistic Regression
– Multicollinearity
  • Tolerance and/or VIF statistics aren’t easily obtained with SPSS, so you’ll just have to let this one go
• Options…
  – Classification plots
    • Table of actual # of S’s in each criterion group vs. predicted group membership
      – Shows, in detail, how well the regression predicted the data

Logistic Regression
• Options…
  – Hosmer-Lemeshow goodness-of-fit
    • More robust than the traditional χ² goodness-of-fit statistic, particularly for models with continuous covariates and small sample sizes
  – Casewise listing of residuals
    • Helps ID cases with large residuals (outliers)

Logistic Regression
• Options…
  – Correlations of estimates
    • Just what it sounds like: correlations among the coefficient estimates
  – Iteration history
  – CI for exp(B)
    • Provides confidence intervals for the exponentiated logistic regression coefficient (the odds ratio)
• Categorical…
  – If any predictors are discrete, they must be identified here, along with which group is the reference group (identified as 0 vs. 1)

Logistic Regression Output
[SPSS output: 112 cases included, none missing. Dependent variable encoding: Attritor = 0, Non-Attritor = 1. Constant-only model: -2 Log Likelihood = 84.397 after 5 iterations; constant B = 1.946, Wald = 46.385, p < .001, Exp(B) = 7.000. Baseline classification table (cut value .500): predicting every case as Non-Attritor classifies 98 of 112 correctly (87.5%). Score statistics for variables not yet in the equation: t1gen = 1.905, t1bidtot = .892, overall = 3.078 (df = 2).]
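exp(B) is the odds ratio for a one-unit increase in the predictor, and the CI option simply exponentiates B ± 1.96 · SE. Reproducing the step-1 output values for t1gen (B = .930, SE = .639) in plain Python:

```python
import math

b, se = 0.930, 0.639  # t1gen coefficient and standard error from the output

odds_ratio = math.exp(b)         # Exp(B): odds multiplier per 1-unit increase
lower = math.exp(b - 1.96 * se)  # 95% CI lower bound
upper = math.exp(b + 1.96 * se)  # 95% CI upper bound

# SPSS reports Exp(B) = 2.534 with 95% CI [.724, 8.873]; tiny differences
# come from rounding B and SE to three decimals
print(round(odds_ratio, 3), round(lower, 3), round(upper, 3))
```

Because the interval spans 1.0 (no change in odds), the t1gen effect is not significant, consistent with its Wald test.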
[Score test p-values: t1gen p = .168, t1bidtot p = .345, overall p = .215.]

Logistic Regression Output
[SPSS step 1 output (Method: Enter; predictors t1gen, t1bidtot): -2 Log Likelihood drops from 84.397 to 81.407 over 5 iterations; Cox & Snell R² = .026, Nagelkerke R² = .050. Omnibus test of model coefficients: χ²(2) = 2.990, p = .224. Hosmer and Lemeshow test: χ²(8) = 6.603, p = .580, with observed and expected counts close in all 10 groups of the contingency table. The step 1 classification table (cut value .500) is unchanged from baseline: all cases predicted Non-Attritor, 87.5% correct. Variables in the equation: t1gen B = .930, SE = .639, Wald = 2.114, p = .146, Exp(B) = 2.534 (95% CI .724–8.873); t1bidtot B = -.044, SE = .041, Wald = 1.152, p = .283, Exp(B) = .957 (95% CI .884–1.037); constant B = -1.981, SE = 2.960, p = .503.]
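The Hosmer-Lemeshow χ² can be recomputed from its contingency table: sum (observed − expected)²/expected over both outcome columns across the 10 groups. A sketch using the counts reported in the output:

```python
# (observed, expected) pairs from the Hosmer-Lemeshow contingency table
attritor = [(4, 2.776), (1, 1.993), (0, 1.672), (3, 1.424), (1, 1.310),
            (2, 1.318), (1, 1.042), (1, 0.925), (1, 0.819), (0, 0.722)]
non_attritor = [(7, 8.224), (10, 9.007), (11, 9.328), (8, 9.576), (10, 9.690),
                (10, 10.682), (10, 9.958), (10, 10.075), (10, 10.181), (12, 11.278)]

chi_sq = sum((o - e) ** 2 / e for o, e in attritor + non_attritor)
print(round(chi_sq, 2))  # ~6.60, matching the reported 6.603 (df = 8, p = .580)
```

A nonsignificant result here means no group shows a large observed–expected gap, i.e., the model's predicted probabilities fit the data adequately.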
[SPSS correlation matrix of coefficient estimates: constant–t1gen r = -.985, constant–t1bidtot r = -.038, t1gen–t1bidtot r = -.108.]

Logistic Regression Output
[Classification plot, step 1: histogram of predicted probabilities of Non-Attritor membership (cut value .50), each symbol representing 2 cases. Both Attritors (A) and Non-Attritors (N) cluster at high predicted probabilities, so the model cannot separate the groups at the .50 cut.]
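The plot’s predicted probabilities come from the fitted equation: p(Non-Attritor) = 1 / (1 + e^-(B0 + B1·t1gen + B2·t1bidtot)). A sketch with the step-1 coefficients; the predictor values 4.78 and 9.17 are the Time 1 means reported earlier in the regression descriptives, assumed here to correspond to t1gen and t1bidtot:

```python
import math

def predicted_prob(t1gen, t1bidtot):
    """Predicted probability of Non-Attritor (coded 1) from the step-1 model."""
    logit = -1.981 + 0.930 * t1gen - 0.044 * t1bidtot
    return 1 / (1 + math.exp(-logit))

# At roughly the sample means (assumed predictor values, not from the output):
p = predicted_prob(4.78, 9.17)
print(round(p, 2))  # well above the .50 cut, so the case is classified Non-Attritor
```

This is why the classification table never predicts an Attritor: typical cases all land far above the .50 cut, close to the 87.5% base rate of Non-Attritors.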