Logistic regression analysis
Martin van der Esch, PhD
Amsterdam Rehabilitation Research Center | Reade
Discovering Statistics Using SPSS, Andy Field
http://www.youtube.com/watch?v=OvQShzJ7Sns (part 1)
http://www.youtube.com/watch?v=zdJhydkcqv4 (part 2)
http://www.youtube.com/watch?v=hxcDOoupB4Y (part 3)
etc.
Logistic regression analysis
The basic principle of logistic regression is much the same as in linear regression analysis.
The aim is to predict a transformation of the dichotomized dependent variable:
→ logit transformation
Steps to follow
Step 1: simple linear regression equation for binary dependent variable:
$$Y_{0,1} = b_0 + b_1 X_1 + \dots$$
Step 2: formulate estimated probability of Y:
$$P(Y_{0,1}) = b_0 + b_1 X_1 + \dots$$
Step 3: in logistic regression we use the odds of the estimated probability:
$$\frac{p(Y)}{1 - p(Y)} = b_0 + b_1 X_1 + \dots$$
Steps to follow 2
Step 4: in case of skewed data (right-sided): the logit transformation gives the log odds.
$$\ln\left(\frac{p(Y)}{1 - p(Y)}\right) = b_0 + b_1 X_1 + \dots$$
Step 5: different ways of presentation: the estimated probability p can be calculated from a combination of variables.
$$p(Y = 1) = \frac{1}{1 + e^{-(b_0 + b_1 X_1 + \dots)}}$$
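The step from log odds back to a probability (Step 4 and Step 5) can be sketched in a few lines of Python. This is only an illustration: the coefficients b0 and b1 below are hypothetical values chosen for the example, not estimates from these slides.

```python
import math

def logit(p):
    """Log odds of a probability p (0 < p < 1)."""
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    """Probability that belongs to a log odds value z."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, chosen only for illustration.
b0, b1 = -4.0, 0.08

for x in (20, 50, 80):
    z = b0 + b1 * x        # linear predictor = ln(odds)
    p = inverse_logit(z)   # Step 5: back to a probability between 0 and 1
    print(f"x={x}: ln(odds)={z:.2f}, p={p:.3f}")
```

Whatever value the linear predictor takes, the resulting probability stays between 0 and 1, which is why the model is built on ln(odds) rather than on Y itself.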
Binary instead of continuous outcome
We are interested in a binary outcome measure
For example: heart attack
Y = 0 (“no”)
Y = 1 (“yes”)
… and we want
$$Y_{0,1} = \beta_0 + \beta_1 X$$
But, how do we get there…?
Analysing a binary variable (Y) as if it were a continuous variable
[Figure: heart attack (y-axis, 0 to 1) plotted against age (x-axis, 20 to 90)]
Not possible, because Y (heart attack) is no or yes (0 or 1)
Number of heart attacks in different age groups

Age      n     Heart attack: no   Heart attack: yes   P
20-29    10    9                  1                   0.10
30-34    15    13                 2                   0.13
35-39    12    9                  3                   0.25
40-44    15    10                 5                   0.33
45-49    13    7                  6                   0.46
50-54    8     3                  5                   0.63
55-59    17    4                  13                  0.76
60-69    10    2                  8                   0.80
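The column P is the proportion of heart attacks per age group, i.e. the number of 'yes' divided by n. A small Python sketch that recomputes it from the counts in the table (the variable names are made up for this example):

```python
# Counts per age group, copied from the table in these slides.
age_groups = ["20-29", "30-34", "35-39", "40-44", "45-49", "50-54", "55-59", "60-69"]
no_attack = [9, 13, 9, 10, 7, 3, 4, 2]
attack = [1, 2, 3, 5, 6, 5, 13, 8]

group_sizes = [no + yes for no, yes in zip(no_attack, attack)]
proportions = [yes / n for yes, n in zip(attack, group_sizes)]

for group, n, p in zip(age_groups, group_sizes, proportions):
    print(f"{group}: n={n}, p(heart attack)={p:.3f}")
```

The proportions rise with age and stay between 0 and 1, so it is the proportion, not the 0/1 outcome itself, that can be related to age.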
Possible…
[Figure: relation between age (x-axis, 20 to 90) and probability of heart attack, p(y=1) (y-axis, 0 to 1)]
use of logistic model
NO modelling of the dichotomous outcome event itself
model probability of the outcome event given a set of
prognostic factors
probability (D=1 | X1,X2,…,Xn)
• probability (death | man, 80 yrs, with hypertension,
normal cholesterol level)
$$Y_{0,1} = \beta_0 + \beta_1 X$$
Outcome becomes estimated
probability of outcome
Estimated probability of outcome
$$P(Y_{0,1}) = b_0 + b_1 X_1 + \dots$$
But, distribution of probability is skewed…
Possible…
[Figure: relation between age (x-axis, 20 to 90) and probability of heart attack, p(y=1) (y-axis, 0 to 1)]
Logit(p) of outcome
$$\frac{p(Y)}{1 - p(Y)} = b_0 + b_1 X_1 + \dots$$
Logit transformation of proportion to
remove skewness!
Logit(p) of outcome
A probability can be transformed into a number between minus infinity and infinity in two steps:
1. Obtain the odds (2 out of 5 are sick: odds = ?)
$$\text{odds(event)} = \frac{\text{prob(event)}}{1 - \text{prob(event)}}$$
2. Take the natural logarithm: ln(odds)
The natural logarithm is the logarithm with base e (e = 2.71828…): 'elog' or 'ln'
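The two steps can be worked through for the example of 2 sick persons out of 5. A minimal Python sketch (variable names are chosen for this example only):

```python
import math

sick, total = 2, 5

p = sick / total            # probability of being sick: 0.4
odds = p / (1.0 - p)        # step 1: 2 sick against 3 not sick
log_odds = math.log(odds)   # step 2: natural logarithm of the odds

print(f"p = {p:.3f}, odds = {odds:.3f}, ln(odds) = {log_odds:.3f}")

# The same two steps for probabilities close to 0 and close to 1 show that
# ln(odds) is not restricted to the interval from 0 to 1.
for prob in (0.001, 0.5, 0.999):
    print(prob, math.log(prob / (1.0 - prob)))
```

A probability below 0.5 gives odds below 1 and therefore a negative ln(odds); a probability of 0.5 gives odds of 1 and ln(odds) of 0.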
Model
The ln(odds) of an event is modelled.
The model is similar to the linear regression model:
$$\ln\left(\frac{p(\text{sick})}{1 - p(\text{sick})}\right) = a + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$$
$$\ln(\text{odds}) = a + b_1 x_1 + b_2 x_2 + \dots + b_n x_n$$
Summary
1. Rewrite the outcome as the probability of the outcome
2. Logit transformation: rewrite the outcome as ln(odds)
Model for logistic regression
$$\ln\left(\frac{p(Y = 1)}{1 - p(Y = 1)}\right) = \beta_0 + \beta_1 X$$
The β's (betas) are estimated with the maximum likelihood procedure
Logistic regression analysis
• The 'best' line is calculated with the maximum likelihood procedure
• Maximum likelihood: the estimates are obtained iteratively, by several repeated cycles of calculation
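What 'repeated cycles of calculation' can look like is sketched below with Newton-Raphson iterations, one common way to maximise the likelihood. This is an illustration, not the routine SPSS uses. The counts come from the heart attack table in these slides; using the midpoint of each age group as a numeric age is an assumption made only for this example, and the function name fit_logistic is hypothetical.

```python
import math

# Heart attack counts per age group (from the table in these slides).
# ASSUMPTION: the midpoint of each age group is used as the age value.
midpoints = [24.5, 32.0, 37.0, 42.0, 47.0, 52.0, 57.0, 64.5]
n_total = [10, 15, 12, 15, 13, 8, 17, 10]
n_yes = [1, 2, 3, 5, 6, 5, 13, 8]

def fit_logistic(x, n, y, cycles=25, tol=1e-10):
    """Newton-Raphson maximum likelihood for ln(odds) = b0 + b1 * x."""
    b0, b1 = 0.0, 0.0
    for cycle in range(1, cycles + 1):
        # g: first derivatives of the log likelihood (score),
        # h: information matrix (negative second derivatives).
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, ni, yi in zip(x, n, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = ni * p * (1.0 - p)
            g0 += yi - ni * p
            g1 += (yi - ni * p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        # Solve the 2x2 system h * step = g for the update.
        det = h00 * h11 - h01 * h01
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) < tol and abs(step1) < tol:
            break  # the estimates no longer change: converged
    return b0, b1, cycle

b0_hat, b1_hat, cycles_used = fit_logistic(midpoints, n_total, n_yes)
print(f"b0 = {b0_hat:.3f}, b1 = {b1_hat:.3f}, cycles = {cycles_used}")
```

Each cycle improves the current estimates of b0 and b1; the procedure stops when a further cycle no longer changes them.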
Example:
Binary outcome (heart attack) and one binary predictor (smoking)

Variables in the Equation
                             B        S.E.    Wald     df   Sig.    Exp(B)
Step 1(a)  ROKEN (smoking)   0.800    0.245   10.623   1    0.001   2.225
           Constant          -0.171   0.111   2.384    1    0.123   0.843
a. Variable(s) entered on step 1: ROKEN.
Hypothesis testing: statistical difference between smokers and non-smokers
1) Wald test
2) 95% CI of odds ratio
3) Likelihood-ratio test (see M2-HC7 diagnosis)
$$\text{Wald} = \left(\frac{b}{SE(b)}\right)^2$$
$$\left(\frac{0.7997}{0.2454}\right)^2 = 10.6231$$
Significance?
The p-value is derived from a chi-square distribution with one degree of freedom (i.e. the Wald statistic belongs to the chi-square family)
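The Wald test and the 95% CI of the odds ratio for smoking can be recomputed from B and S.E. The sketch below uses only the Python standard library. For one degree of freedom the chi-square tail probability can be written with the complementary error function, and the factor 1.96 is the usual normal approximation for a 95% interval. Because the inputs are rounded to four decimals, the results can differ slightly in the last digits from the SPSS output.

```python
import math

# B and S.E. for smoking, as quoted on the Wald test slide.
b, se = 0.7997, 0.2454

# 1) Wald test: (b / SE(b)) squared, chi-square distributed with 1 df.
wald = (b / se) ** 2
# Tail probability of a chi-square distribution with 1 degree of freedom.
p_value = math.erfc(math.sqrt(wald / 2.0))

# 2) Odds ratio with 95% confidence interval (normal approximation).
odds_ratio = math.exp(b)
ci_lower = math.exp(b - 1.96 * se)
ci_upper = math.exp(b + 1.96 * se)

print(f"Wald = {wald:.2f}, p = {p_value:.4f}")
print(f"OR = {odds_ratio:.3f}, 95% CI {ci_lower:.2f} to {ci_upper:.2f}")
```

If the 95% CI of the odds ratio does not include 1, the Wald test is significant at the 5% level as well; the two approaches lead to the same conclusion.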
Testing the model
The -2 log likelihood of the model with the determinant is compared with the -2 log likelihood of the model without the determinant.
The difference is chi-square distributed.
• The number of degrees of freedom equals the difference in the number of variables between the two models
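A likelihood-ratio test can be sketched with the recovery counts of the three-group example in these slides (medication1, medication2, placebo). The model with 'group' as determinant has two dummy variables more than the model without it, so df = 2. Two facts are used: with a single categorical determinant the fitted probabilities equal the observed proportions per group, and for 2 degrees of freedom the chi-square tail probability is exp(-x/2). The function name is hypothetical.

```python
import math

# Recovery counts (yes, no) per group, from the three-group example.
counts = [(35, 65), (40, 60), (20, 80)]

def neg2_log_likelihood(counts, probabilities):
    """-2 log likelihood of grouped binary data for given fitted probabilities."""
    log_lik = 0.0
    for (yes, no), p in zip(counts, probabilities):
        log_lik += yes * math.log(p) + no * math.log(1.0 - p)
    return -2.0 * log_lik

# Model without the determinant: one overall probability of recovery.
total_yes = sum(yes for yes, _ in counts)
total = sum(yes + no for yes, no in counts)
neg2ll_without = neg2_log_likelihood(counts, [total_yes / total] * len(counts))

# Model with the determinant 'group': one probability per group.
neg2ll_with = neg2_log_likelihood(counts, [yes / (yes + no) for yes, no in counts])

difference = neg2ll_without - neg2ll_with
df = 2                                  # two dummy variables added
p_value = math.exp(-difference / 2.0)   # chi-square tail probability, valid for df = 2 only

print(f"-2LL without = {neg2ll_without:.2f}, -2LL with = {neg2ll_with:.2f}")
print(f"difference = {difference:.2f}, df = {df}, p = {p_value:.4f}")
```

A model with more variables never has a larger -2 log likelihood than the model it extends; the test asks whether the decrease is larger than expected by chance.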
Logistic regression with categorical predictor
Analysis of three groups
Frequency of 'recovery'

group         recovery: yes   recovery: no
medication1   35              65
medication2   40              60
placebo       20              80
What to do?
We analyse both medication groups and the placebo
group with dummy variables
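How dummy variables work can be sketched with the recovery frequencies. The sketch assumes that placebo is the reference category (both dummies equal to 0); with that coding the constant is the ln(odds) of recovery in the placebo group and each coefficient is the difference in ln(odds) between a medication group and placebo. The function names are made up for this example.

```python
import math

# Recovery counts (yes, no) per group, from the frequency table.
recovery = {"medication1": (35, 65), "medication2": (40, 60), "placebo": (20, 80)}

def dummies(group):
    """Dummy coding with placebo as reference category (an assumption)."""
    return (1 if group == "medication1" else 0,
            1 if group == "medication2" else 0)

def log_odds(yes, no):
    return math.log(yes / no)

constant = log_odds(*recovery["placebo"])             # ln(odds) in the reference group
b1 = log_odds(*recovery["medication1"]) - constant    # medication1 versus placebo
b2 = log_odds(*recovery["medication2"]) - constant    # medication2 versus placebo

for group in recovery:
    d1, d2 = dummies(group)
    ln_odds = constant + b1 * d1 + b2 * d2
    print(f"{group}: dummies=({d1}, {d2}), ln(odds)={ln_odds:.3f}")

print(f"constant={constant:.3f}, b1={b1:.3f}, b2={b2:.3f}")
print(f"OR medication1={math.exp(b1):.3f}, OR medication2={math.exp(b2):.3f}")
```

With a single categorical predictor the maximum likelihood estimates can be read off the frequency table in this way, so the values can be compared directly with the B and Exp(B) columns of the SPSS output.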
Variables in the Equation
                                                                     95.0% C.I. for Exp(B)
                       B        S.E.    Wald     df   Sig.    Exp(B)   Lower    Upper
Step 1(a)  GROEP                        9.698    2    0.008
           GROEP(1)    0.767    0.326   5.529    1    0.019   2.154    1.136    4.083
           GROEP(2)    0.981    0.323   9.235    1    0.002   2.667    1.417    5.020
           Constant    -1.386   0.250   30.748   1    0.000   0.250
a. Variable(s) entered on step 1: GROEP (group).
Logistic regression analysis can also be used to analyse the relationship between a continuous variable and a binary outcome.
Logistic regression analysis with a continuous variable
Relation between age and pain (no/yes)

Variables in the Equation
                                                                    95.0% C.I. for Exp(B)
                      B        S.E.    Wald    df   Sig.    Exp(B)   Lower    Upper
Step 1(a)  age        0.079    0.026   9.157   1    0.002   1.083    1.028    1.140
           Constant   -4.302   1.629   6.970   1    0.008   0.014
a. Variable(s) entered on step 1: age.
EXP(B) is the odds ratio for a change of one unit in the determinant, i.e. one additional year of age gives an 8.3% increase in the odds of having pain.
Be careful! The relationship is unlikely to be linear over the whole age range.
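The interpretation of B for a continuous determinant can be sketched with the rounded coefficients from the SPSS output (B = 0.079 for age, constant = -4.302). Because of the rounding, exp(0.079) comes out marginally below the 1.083 in the output. The ages used below are arbitrary examples, and the calculation assumes that the effect of age is linear on the ln(odds) scale, which is exactly the assumption the slide warns about.

```python
import math

# Rounded coefficients from the SPSS output for age and pain.
constant, b_age = -4.302, 0.079

# Odds ratio for 1 year and for 10 years of age difference.
or_1_year = math.exp(b_age)
or_10_years = math.exp(10 * b_age)   # equals or_1_year to the power 10

def probability_of_pain(age):
    """Estimated probability of pain, assuming linearity on the ln(odds) scale."""
    ln_odds = constant + b_age * age
    return 1.0 / (1.0 + math.exp(-ln_odds))

print(f"OR per year = {or_1_year:.3f}, OR per 10 years = {or_10_years:.3f}")
for age in (40, 55, 70):   # arbitrary example ages
    print(f"age {age}: estimated p(pain) = {probability_of_pain(age):.3f}")
```

The odds ratio is multiplicative: 10 years of age difference multiplies the odds by the one-year odds ratio ten times over, not by ten times 8.3%.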
Linearity check
Similar to linear regression analysis
• No scatter plot, but histogram
• Adding a quadratic term and splitting the exposure variable into groups
• Be careful: do not use the OR (EXP(β)), but β itself!
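Splitting the exposure variable into groups and looking at β (the ln(odds) scale) can be sketched with the heart attack counts per age group from these slides. Using the midpoint of each age group as its age is an assumption made for this example. If the relation is linear, the ln(odds) should change by roughly the same amount per year from one group to the next; the odds themselves do not change by a constant amount, which is why β is inspected and not the OR.

```python
import math

# Heart attack counts per age group (from the table in these slides).
# ASSUMPTION: the midpoint of each age group represents its age.
midpoints = [24.5, 32.0, 37.0, 42.0, 47.0, 52.0, 57.0, 64.5]
n_no = [9, 13, 9, 10, 7, 3, 4, 2]
n_yes = [1, 2, 3, 5, 6, 5, 13, 8]

odds = [yes / no for yes, no in zip(n_yes, n_no)]
log_odds = [math.log(o) for o in odds]

# Change in ln(odds) per year of age between neighbouring groups.
slopes = []
for i in range(1, len(midpoints)):
    years = midpoints[i] - midpoints[i - 1]
    slopes.append((log_odds[i] - log_odds[i - 1]) / years)

for age, o, lo in zip(midpoints, odds, log_odds):
    print(f"age {age}: odds = {o:.2f}, ln(odds) = {lo:.2f}")
print("change in ln(odds) per year:", [round(s, 3) for s in slopes])
```

With groups as small as these the per-group estimates are imprecise, so only a clear and systematic departure from a constant change per year would point to non-linearity.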