Statistics for Health Research
Regression: Checking the Model
Peter T. Donnan
Professor of Epidemiology and Biostatistics

Objectives of session
• Recognise the need to check fit of the model
• Carry out checks of assumptions in SPSS for simple linear regression
• Understand predictive model
• Understand residuals

How is the fitted line obtained?
• Use the method of least squares (LS)
• Seek to minimise the sum of squared vertical differences between each point and the fitted line
• Results in parameter estimates, or regression coefficients, of slope (b) and intercept (a): y = a + bx

[Figure: dependent variable (y) plotted against explanatory variable (x), with fitted line y = a + bx and intercept a marked]

Consider the regression of minimum LDL cholesterol achieved on age
• Select Regression > Linear…
• Dependent (y): Min LDL achieved
• Independent (x): Age_Base

Output from SPSS linear regression

Coefficients(a)
                     Unstandardized Coefficients   Standardized Coefficients
Model 1              B         Std. Error          Beta                        t         Sig.
(Constant)           2.024     .105                                            19.340    .000
Age at baseline      -.008     .002                -.121                       -4.546    .000
a. Dependent Variable: Min LDL achieved

N.B. -0.008 may look very small, but it represents the DECREASE in LDL achieved for each increase of one unit of age, i.e. ONE year.

H0: slope b = 0
Test t = slope/se = -4.546 with p < 0.001, so statistically significant
(SPSS computes t from the unrounded estimates; the rounded values -0.008/0.002 would give -4.)
Predicted LDL = 2.024 - 0.008 x Age

Prediction equation from linear regression
Predicted LDL achieved = 2.024 - 0.008 x Age
So for a man aged 65, the predicted LDL achieved = 2.024 - 0.008 x 65 = 1.504

Age    Predicted Min LDL
45     1.664
55     1.584
65     1.504
75     1.424

Assumptions of Regression
1. Relationship is linear
2. Outcome variable, and hence residuals or error terms, are approx. Normally distributed

Use Graphs and Scatterplot to obtain the Lowess line of fit
1. Create Scatterplot and then double-click to enter chart editor
2. Choose icon ‘Add fit line at total’
3. Then select type of fit, such as Lowess

Linear assumption: fitted Lowess smoothed line
Lowess smoothed line (red) gives a good eyeball examination of the linear assumption (green)

Definition of a residual
A residual is the difference between the actual value and the predicted value (fitted line), i.e. the unexplained variation:
ri = yi - E(yi)
or
ri = yi - (a + b xi)

Residuals
To assess the residuals in SPSS linear regression, select Plots…
• Plot the normalised (standardised) residual against the normalised (standardised) predicted value of LDL
• Select histogram of residuals and normal probability plot

In SPSS linear regression, select Statistics…
• Model fit
• Select confidence intervals for regression coefficients
• Select Durbin-Watson for serial correlation, and casewise diagnostics for identification of outliers

Output: Scatterplot of residuals vs. predicted
Note
1) Mean of residuals = 0
2) Most of the data lie within + or - 3 SDs of the mean

Output: Histogram of standardised residuals
Plot of residuals with normal curve superimposed

Output: Cumulative probability plot
Look for deviation from the diagonal line to indicate non-normality

Output: Description of residuals
Descriptive statistics for residuals

Residuals Statistics(a)
                        Minimum      Maximum      Mean         Std. Deviation   N
Predicted Value         1.314867     1.843205     1.556478     .0878548         1383
Residual                -1.65389     4.0658469    .0000000     .7181448         1383
Std. Predicted Value    -2.750       3.264        .000         1.000            1383
Std. Residual           -2.302       5.660        .000         1.000            1383
a. Dependent Variable: Min LDL achieved

Worth investigation? Subjects with standardised residuals > 3

Casewise Diagnostics(a)
Case Number    Std. Residual    Min LDL    Predicted    Residual
164            5.660            5.5840     1.518153     4.0658471
209            4.395            4.5260     1.368685     3.1573148
250            3.143            3.7875     1.529325     2.2581750
268            3.064            3.8730     1.671664     2.2013357
274            3.227            4.0953     1.777153     2.3180975
362            4.095            4.5350     1.593460     2.9415398
517            3.636            4.3240     1.711788     2.6122125
849            3.968            4.3290     1.478113     2.8508873
1047           4.207            4.4360     1.413686     3.0223141
1075           3.885            4.4040     1.613219     2.7907805
1103           3.519            3.9905     1.462584     2.5279157
1229           3.016            3.7660     1.599254     2.1667456
1290           3.975            4.2345     1.379107     2.8553933
a. Dependent Variable: Min LDL achieved

Output: Model fit and serial correlation

Model Summary
Model    R          R Square    Adjusted R Square    Std. Error of the Estimate    Durbin-Watson
1        .121(a)    .015        .014                 .7184048                      2.034
a. Predictors: (Constant), Age at baseline

R: correlation between min LDL achieved and age at baseline, here 0.121 in magnitude (the slope is negative, so the correlation itself is -0.121)
R2: % variation explained, here 1.5%, not particularly high
Durbin-Watson test: the statistic should be approximately 2 if there is no serial correlation of residuals; here it is 2.034

Summary
After fitting any regression model, check assumptions
• Functional form: linearity is the default, often not the best fit, consider quadratic…
• Check residuals for approx. normality
• Check residuals for outliers (> 3 SDs)
• All accomplished within SPSS

Practical on Model Checking
Read in ‘LDL Data.sav’
1) Fit an age squared term in the min LDL model and check the fit of the model compared to the linear fit (Hint: use Transform/Compute to create the age squared term and fit age and age squared)
2) Fit separate linear regressions of min Chol achieved with predictors of
   1) baseline Chol
   2) APOE_lin
   3) adherence
Check assumptions and interpret results
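
The hand calculations used in this session (prediction equation, residual, standardised residual and the > 3 SD rule for outliers) can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the SPSS workflow: it hard-codes the rounded coefficients and the standard error of the estimate reported in the output, the function names are hypothetical, and it assumes a standardised residual is the raw residual divided by the standard error of the estimate, which is consistent with the reported casewise values.

```python
# Values copied from the SPSS output in this session (rounded as reported).
INTERCEPT = 2.024         # a: (Constant)
SLOPE = -0.008            # b: Age at baseline
SE_ESTIMATE = 0.7184048   # Std. Error of the Estimate from Model Summary


def predict_ldl(age):
    """Predicted min LDL achieved for a baseline age in years: a + b * age."""
    return INTERCEPT + SLOPE * age


def residual(observed, predicted):
    """Residual r_i = y_i - predicted y_i."""
    return observed - predicted


def standardised_residual(observed, predicted):
    """Assumed definition: raw residual divided by the SE of the estimate."""
    return residual(observed, predicted) / SE_ESTIMATE


def is_outlier(std_resid, threshold=3.0):
    """Flag standardised residuals more than `threshold` SDs from zero."""
    return abs(std_resid) > threshold


# Reproduce the prediction table for ages 45 to 75.
for age in (45, 55, 65, 75):
    print(age, round(predict_ldl(age), 3))

# Case 164 from the casewise diagnostics: observed 5.5840, predicted 1.518153
# (the SPSS prediction, which uses unrounded coefficients).
z_164 = standardised_residual(5.5840, 1.518153)
print(round(z_164, 3), is_outlier(z_164))
```

Note that predictions from the rounded coefficients will differ slightly from the SPSS "Predicted" column, which is why the casewise check above takes the SPSS predicted value directly.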