Logistic regression analysis
Martin van der Esch, PhD
Amsterdam Rehabilitation Research Center | Reade

Discovering statistics using SPSS, Andy Field
http://www.youtube.com/watch?v=OvQShzJ7Sns (part 1)
http://www.youtube.com/watch?v=zdJhydkcqv4 (part 2)
http://www.youtube.com/watch?v=hxcDOoupB4Y (part 3)
etc.

Logistic regression analysis
• The basic principle of logistic regression is much the same as in linear regression analysis
• The aim is to predict a transformation of the dichotomous dependent variable: the logit transformation

Steps to follow
Step 1: simple linear regression equation for a binary dependent variable:
  $Y_{0,1} = b_0 + b_1 X_1 + \dots$
Step 2: formulate the estimated probability of Y:
  $P(Y_{0,1}) = b_0 + b_1 X_1 + \dots$
Step 3: in logistic regression we use the odds of the estimated probability:
  $\frac{p_Y}{1 - p_Y} = b_0 + b_1 X_1 + \dots$

Steps to follow 2
Step 4: in case of skewed data (right sided): the logit transformation turns the odds into log odds:
  $\ln\left(\frac{p_Y}{1 - p_Y}\right) = b_0 + b_1 X_1 + \dots$
Step 5: different ways of presentation: the estimated probability $p_Y$ can be calculated from the combination of variables:
  $p_Y = 1 - \frac{1}{1 + e^{b_0 + b_1 X_1 + \dots}}$

Binary instead of continuous outcome
• We are interested in a binary outcome measure
• For example: heart attack
  Y = 0 ("no")
  Y = 1 ("yes")

… and we want
  $Y_{0,1} = \beta_0 + \beta_1 X$
But how do we get there…?
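Steps 3 to 5 above can be tried out numerically. The following is a minimal Python sketch, not part of the original slides: the function names and the coefficient values `b0` and `b1` are hypothetical and chosen only for illustration. It uses the form $p = 1 / (1 + e^{-z})$, which is algebraically the same as $1 - 1 / (1 + e^{z})$.

```python
import math

def odds(p):
    """Step 3: odds of an event with probability p (0 <= p < 1)."""
    return p / (1.0 - p)

def logit(p):
    """Step 4: natural logarithm of the odds."""
    return math.log(odds(p))

def inv_logit(z):
    """Step 5: probability recovered from the linear predictor z."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, for illustration only (not taken from the slides).
b0, b1 = -4.0, 0.08
x = 50                       # for example an age of 50 years
z = b0 + b1 * x              # linear predictor on the log-odds scale
p = inv_logit(z)             # estimated probability, always between 0 and 1

print(f"z = {z:.3f}, p = {p:.3f}, odds = {odds(p):.3f}")
```

Whatever value the linear predictor takes, `inv_logit` returns a number between 0 and 1, which is why the model is written on the log-odds scale rather than on the probability scale.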

Analysing a binary variable (Y) as if it were a continuous variable
[Figure: heart attack (0 to 1) plotted against age (20 to 90 years)]
Not possible, because Y (heart attack) is no or yes (0 or 1)

Number of heart attacks in different age groups

  Age      n    Heart attack: No   Yes   P
  20-29    10   9                  1     0.10
  30-34    15   13                 2     0.13
  35-39    12   9                  3     0.25
  40-44    15   10                 5     0.33
  45-49    13   7                  6     0.46
  50-54    8    3                  5     0.63
  55-59    17   4                  13    0.76
  60-69    10   2                  8     0.80

Possible…
[Figure: relation between age (20 to 90 years) and probability of heart attack, p(y=1), on a scale from 0 to 1]

Use of the logistic model
• NO modelling of the dichotomous outcome event itself
• Model the probability of the outcome event given a set of prognostic factors:
  probability (D=1 | X1, X2, …, Xn)
  • probability (death | man, 80 yrs, with hypertension, normal cholesterol level)

  $Y_{0,1} = \beta_0 + \beta_1 X$
Outcome becomes estimated probability of outcome

Estimated probability of outcome
  $P(Y_{0,1}) = b_0 + b_1 X_1 + \dots$
But the distribution of the probability is skewed…

Possible…
[Figure repeated: relation between age and probability of heart attack, p(y=1)]

Logit(p) of outcome
  $\frac{p_Y}{1 - p_Y} = b_0 + b_1 X_1 + \dots$
Logit transformation of the proportion to remove skewness!

Logit(p) of outcome
A probability can be transformed into a number between minus infinity and infinity in two steps:
• obtain the odds (2 out of 5 is sick: odds = ?)
  $\mathrm{odds}(\text{event}) = \frac{\mathrm{prob}(\text{event})}{1 - \mathrm{prob}(\text{event})}$
• take the natural logarithm: ln(odds)
The natural logarithm is the logarithm with base e (e = 2.71828…): 'elog' or 'ln'

Model
• The ln(odds) of an event is modelled
• The model is similar to the linear regression model
  $\ln\left(\frac{P(\text{sick})}{1 - P(\text{sick})}\right) = a + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$
  $\ln(\text{odds}) = a + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$

Summary
1. Rewrite the outcome as the probability of the outcome
2. Logit transformation: rewrite the outcome as ln(odds)

Model for logistic regression
  $\ln\left(\frac{p_{Y=1}}{1 - p_{Y=1}}\right) = \beta_0 + \beta_1 X$
The β's (betas) are estimated with the maximum likelihood procedure

Logistic regression analysis
• The 'best' line is calculated with the maximum likelihood procedure
• Maximum likelihood: obtained by several repeated cycles of calculation

Example: binary outcome (heart attack) and one binary predictor (smoking)

Variables in the Equation
                        B        S.E.    Wald      df   Sig.    Exp(B)
  Step 1(a)  ROKEN      0.800    0.245   10.623    1    0.001   2.225
             Constant   -0.171   0.111   2.384     1    0.123   0.843
  a. Variable(s) entered on step 1: ROKEN (smoking).

Hypothesis testing: statistical difference between smokers and non-smokers
1) Wald test
2) 95% CI of the odds ratio
3) Likelihood ratio test (see M2-HC7 diagnosis)

Wald test = $(b / SE(b))^2$
  $(0.7997 / 0.2454)^2 = 10.6231$
Significance? The p-value is derived from a chi-square distribution with one degree of freedom (i.e.
Wald is from the chi-square family)

Testing the model
• The -2 log likelihood of the model with the determinant is compared with the -2 log likelihood of the model without the determinant
• The difference is chi-square distributed
• The number of df equals the difference in the number of variables between the two models

Logistic regression with categorical predictor
Analysis of three groups

Frequency of 'recovery'

  Group          Recovery yes   Recovery no
  medication1    35             65
  medication2    40             60
  placebo        20             80

What to do?
We analyse both medication groups and the placebo group with dummy variables

Variables in the Equation
                                                                        95.0% C.I. for EXP(B)
                        B        S.E.    Wald     df   Sig.    Exp(B)   Lower    Upper
  Step 1(a)  GROEP                       9.698    2    0.008
             GROEP(1)   0.767    0.326   5.529    1    0.019   2.154    1.136    4.083
             GROEP(2)   0.981    0.323   9.235    1    0.002   2.667    1.417    5.020
             Constant   -1.386   0.250   30.748   1    0.000   0.250
  a. Variable(s) entered on step 1: GROEP (group).

We are also able to analyse the relationship between a continuous variable and a binary outcome with logistic regression analysis.

Logistic regression analysis with a continuous variable
Relation between age and pain (no/yes)

Variables in the Equation
                                                                       95.0% C.I. for EXP(B)
                        B        S.E.    Wald    df   Sig.    Exp(B)   Lower    Upper
  Step 1(a)  age        0.079    0.026   9.157   1    0.002   1.083    1.028    1.140
             Constant   -4.302   1.629   6.970   1    0.008   0.014
  a. Variable(s) entered on step 1: age.

EXP(B) is the odds ratio for a change of one unit of the determinant, i.e. one additional year of age gives an 8.3% increase in the odds of having pain
Be careful!
The relationship is unlikely to be linear over the whole age range

Linearity check
Similar to linear regression analysis
• No scatter plot, but a histogram
• Adding a quadratic term and splitting the exposure variable into groups
• Be careful: do not use the OR (EXP(β)), but β itself!
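
As a recap, several figures in the SPSS output shown earlier can be reproduced by hand. The following is a minimal Python sketch, not part of the original slides, and the function and variable names are hypothetical. It recomputes the Wald statistic, its p-value and the odds ratio for the smoking example (b = 0.7997, SE = 0.2454), and the odds ratios of the two medication groups against placebo from the recovery frequencies. The 95% interval for smoking is computed here as exp(b ± 1.96 × SE), a standard Wald interval; it is not reported in the slides. Because b and SE are rounded, the results agree with the SPSS values only up to rounding.

```python
import math

def wald_test(b, se):
    """Wald statistic (b / SE)^2 and its p-value.

    For a chi-square distribution with 1 degree of freedom,
    P(X > w) equals erfc(sqrt(w / 2)).
    """
    wald = (b / se) ** 2
    p_value = math.erfc(math.sqrt(wald / 2.0))
    return wald, p_value

def odds_ratio_ci(b, se, z=1.96):
    """Odds ratio exp(b) with an approximate 95% Wald interval."""
    return math.exp(b), math.exp(b - z * se), math.exp(b + z * se)

# Smoking example: coefficient and standard error as quoted on the slide.
wald, p_value = wald_test(0.7997, 0.2454)
or_smoking, ci_low, ci_high = odds_ratio_ci(0.7997, 0.2454)
print(f"Wald = {wald:.2f}, p = {p_value:.4f}, OR = {or_smoking:.3f}")

# Recovery example: (recovery yes, recovery no) per group, from the slide.
recovery = {
    "medication1": (35, 65),
    "medication2": (40, 60),
    "placebo": (20, 80),
}

def group_odds(yes, no):
    """Odds of recovery within one group."""
    return yes / no

odds_placebo = group_odds(*recovery["placebo"])            # reference group
or_med1 = group_odds(*recovery["medication1"]) / odds_placebo
or_med2 = group_odds(*recovery["medication2"]) / odds_placebo
b_constant = math.log(odds_placebo)                        # B of the constant

print(f"OR medication1 = {or_med1:.3f}, OR medication2 = {or_med2:.3f}")
print(f"B constant = {b_constant:.3f}")
```

With placebo as the reference group, the two odds ratios correspond to Exp(B) of GROEP(1) and GROEP(2), and the log of the placebo odds corresponds to B of the constant.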