Download Lab Lec 7 Logistic

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts
no text concepts found
Transcript
1
Psychology 522/622
Winter 2008
Lab Lecture 7
Logistic Regression
DATAFILE: BIRTHWEIGHT.SAV
The following example involves data collected at Baystate Medical Center, Springfield,
Massachusetts, during 1986. We are interested in understanding the variables that predict
the likelihood of a mother giving birth to a baby with low-birth weight (defined as a baby
weighing less than 2500 grams). The variables we’ll use to predict low birth weight will
be the mother’s age, whether the mother smoked during the pregnancy, whether the
mother had hypertension, and the weight of the mother during her last menstrual period.
Before we begin, here’s a table that should help you “translate” odds, log odds, and
likelihood of the outcome (i.e., whichever level of the variable is coded 1).
Likelihood of Outcome (e.g., low birthweight)
Odds
Log Odds
Outcome is more likely to occur
>1
>0
Outcome is not more or less likely to occur
1
0
Outcome is less likely to occur
0 – .99
<0
Also, a quick conversion table might be helpful:
Desired Conversion
Calculation
Probability  Odds
Probability of outcome/Probability of not outcome = odds
Odds  Log Odds
(ln)odds = log odds
Log Odds  Odds
elog odds = odds
Log Odds  Probability elog odds / (1 + elog odds) = probability of outcome
Descriptive Analysis
Let’s run descriptive statistics for our data.
Analyze Descriptives StatisticsDescriptivesMove the following variables into the
Variable(s) box: Low birth weight, age of mother, mother’s last weight at menstrual
period, smoking status during pregnancy, history of hypertension.
Click OK.
De scri ptive Statistics
N
Low Birth W eight
Age of Mot her
Mother's W eight at Las t
Menst rual Period
Smoking Status During
Pregnancy
History of Hypertension
Valid N (lis twis e)
189
189
Minimum
0
14.0
Maximum
1
45.0
Mean
.31
23.238
St d. Deviat ion
.465
5.2987
189
80.00
250.00
129.8148
30.57938
189
0
1
.39
.489
189
189
0
1
.06
.244
2
Note: 31% of mothers had children with low birth weight. The odds of a low birth
weight baby are .31/.69 = .45, meaning that having a low birth weight baby is less likely
than having a baby of normal weight (because the odds are less than 1). The log of the
odds of having a low birth weight baby is ln(.45) = -.80. Again, because the log of the
odds is less than 0, low birth weight babies are less likely than normal birth weight
babies.
Now let’s look at the correlations among the study variables.
AnalyzeCorrelateBivariate Move the following variables into the Variable(s) box:
Low birth weight, age of mother, mother’s last weight at menstrual period, smoking
status during pregnancy, history of hypertension. *I unchecked the “flag significant
correlations” box because we are not interested in p-values associated with these
correlations. They are for descriptive purposes only.*
Click OK.
Correlations
Low Birth Weight
Age of Mother
Mother's Weight at Last
Menstrual Period
Smoking Status During
Pregnancy
His tory of Hypertens ion
Pearson Correlation
Sig. (2-tailed)
N
Pearson Correlation
Sig. (2-tailed)
N
Pearson Correlation
Sig. (2-tailed)
N
Pearson Correlation
Sig. (2-tailed)
N
Pearson Correlation
Sig. (2-tailed)
N
Low Birth
Weight
1
.
189
-.119
.103
189
-.170
.020
Age of Mother
-.119
.103
189
1
.
189
.180
.013
Mother's
Weight at Last
Menstrual
Period
-.170
.020
189
.180
.013
189
1
.
189
189
189
189
189
.161
.027
189
.152
.036
189
-.044
.545
189
-.016
.829
189
-.044
.546
189
.236
.001
189
1
.
189
.013
.855
189
.013
.855
189
1
.
189
Smoking
Status During
Pregnancy
.161
.027
189
-.044
.545
189
-.044
.546
His tory of
Hypertension
.152
.036
189
-.016
.829
189
.236
.001
Note: We look at this to get a descriptive sense of how the variables are related and don’t
pay much attention to the significance of the correlations. We see that the tendency to
have a low birth weight baby is lower for younger mothers (-.119) and lighter mothers (.170), while it is higher for smoking mothers (.161) and hypertensive mothers (.152). We
also inspect the correlations among the predictors to make sure that collinearity issues are
not present. Nothing appears out of the ordinary.
Logistic Regression
Analyze  Regression  Binary Logistic
Make LBW the dependent variable (1 = low birth weight, 0 = Normal weight).
Move age, weight, smoke, and hyper to the covariates box.
Click OK.
3
Logistic Regression
Case Processing Summary
Unweighted Cases
Selected Cases
a
N
Included in Analysis
Mis sing Cases
Total
Unselected Cas es
Total
189
0
189
0
189
Percent
100.0
.0
100.0
.0
100.0
a. If weight is in effect, s ee class ification table for the total
number of cases.
De pendent Varia ble Encoding
Original Value Int ernal Value
Normal
0
Low
1
Note: It is good to double-check this to see that the group you think you coded 1 is the
group SPSS kept as 1 and likewise for the group coded 0. When SPSS does change things
around we’d have a lot of fun interpreting what is going on when in fact the opposite
relationships are in fact true. Nothing funny happened here so we proceed.
Block 0: Beginning Block
Classification Table a,b
Predicted
Step 0
Observed
Low Birth Weight
Normal
Low
Low Birth Weight
Normal
Low
130
0
59
0
Overall Percentage
Percentage
Correct
100.0
.0
68.8
a. Constant is included in the model.
b. The cut value is .500
Note: Because having a low birth weight baby is less likely, when no predictors are
considered, all mothers are predicted to have a normal birth weight baby.
Va riables in the Equa tion
St ep 0
Constant
B
-.790
S. E.
.157
W ald
25.327
df
1
Sig.
.000
Ex p(B)
.454
Note: The logistic regression coefficient (which is the constant) is the log of the odds of
having a low birth weight baby. This number agrees to rounding error what we calculated
by hand above; ln(.45) = -.80. Exp(B) are the odds of having a low birth weight baby
which also agrees within rounding error of what we computed by hand above; .31/.69 =
.45.
4
Va riables not in the Equa tion
St ep
0
Variables
Sc ore
5.438
2.674
4.924
4.388
18.060
weight
age
smoke
hy per
Overall Statistics
df
1
1
1
1
4
Sig.
.020
.102
.026
.036
.001
Block 1: Method = Enter
Omnibus Tests of Model Coefficients
Step 1
Step
Block
Model
Chi-square
18.988
18.988
18.988
df
4
4
4
Sig.
.001
.001
.001
Note: This output is similar to the F-statistic for a multiple regression model. These
statistics ask whether the four predictor variables taken all together are helpful in
predicting the outcome (i.e., the log odds of giving a low birth weight baby). The null
hypothesis here is that these predictors are not useful (i.e., their logistic regression
coefficients are all zero). We reject this idea. Taken together, these 4 predictor variables
appear to be useful in predicting the log odds of having a low birth weight baby.
Model Summary
Step
1
-2 Log
Cox & Snell
likelihood
R Square
215.684a
.096
Nagelkerke
R Square
.134
a. Es timation terminated at iteration number 5 because
parameter estimates changed by les s than .001.
Note: These R Square like statistics are just as they seem, they are R Square like
statistics and have similar kinds of interpretation as they do in multiple linear regression
(i.e., the bigger the better).
Classification Table a
Predicted
Step 1
Observed
Low Birth Weight
Overall Percentage
Normal
Low
Low Birth Weight
Normal
Low
121
9
49
10
Percentage
Correct
93.1
16.9
69.3
a. The cut value is .500
Note: If we were interested in using logistic regression for classification, this table would
tell us how well we would do. Comparing this table with the one above with no
predictors, our classification accuracy improves (but just a little). This is partly due to the
fact that having a low birth weight baby is a low probability event.
5
Va riables in the Equa tion
Sta ep
1
weight
age
smoke
hy per
Constant
B
-.017
-.036
.679
1.788
1.767
S. E.
.007
.033
.332
.686
1.052
W ald
6.554
1.144
4.185
6.797
2.819
df
1
1
1
1
1
Sig.
.010
.285
.041
.009
.093
Ex p(B)
.983
.965
1.972
5.978
5.852
a. Variable(s) ent ered on step 1: weight, age, smoke, hyper.
Note: The B column contains our logistic regression coefficients. Remember that these
are partial logistic regression coefficients (i.e., they are the relationship between mother’s
weight and the log odds of having a low birth weight baby controlling for mother’s age,
smoking history, and hypertension history).
Age: The age coefficient is not statistically significant whereas the others are. Controlling
for the other variables, age does not appear to be much related to the log odds of having a
low birth weight baby.
Weight: Controlling for the other variables, weight is negatively related to the log odds of
having a low birth weight baby (i.e., controlling for the other variables heavier mothers
are less likely to have a low birth weight baby). Specifically, for every additional pound
of mother’s weight the log odds of having a low birth weight baby decreases by .017.
Smoke: Controlling for the other variables, smoking increases the log odds of having a
low birth weight baby by 1.788. *This is a categorical variable, smoked/didn’t smoke.
Hyper: Controlling for the other variables, a history of hypertension increases the log
odds of having a low birth weight baby by 1.767. *This is a categorical variable, had
hypertension/didn’t have hypertension.
Regarding Exp(B): this is the odds value associated with the log odds value in column B.
For example: exp-.017 = .983; ln(.983) = -.02
Some Potential Follow-up Analyses for Additional Interpretation
Let’s compare two groups of women to get a sense of the effect of a history of
hypertension on the likelihood of having a low birth weight baby. The groups will be the
same on age (the mean age of 23.2), the same weight (the mean weight of 129.8), and
they will not smoke. However, one group will have a history of hypertension
(Hypertension group) and one group will not have a history of hypertension (No
Hypertension group).
No Hypertension Group
Log odd (LBW) = 1.767 - .017*(weight = 129.8) - .036*(age = 23.2) + .679*(smoke = 0)
+ 1.788*(hyper = 0)
= -1.27
e 1.27
.281
prob( LBW ) 

 .219
1.27
1  .281
1 e
It is rare to have a low birth weight baby whose mother is of average weight, average age,
did not smoke during the pregnancy, and has no history of hypertension.
6
Hypertension Group
Log odd (LBW) = 1.767 - .017*(weight = 129.8) - .036*(age = 23.2) + .679*(smoke = 0)
+ 1.788*(hyper = 1) = .51
e .51
1.665

 .625
.51
1  1.665
1 e
The probability of a low birth weight child being born to a mother with this constellation
of predictors (i.e., average weight, average age, didn’t smoke during pregnancy, history
of hypertension) is .625.
prob( LBW ) 
High Risk Group
Let’s predict the outcome for a high risk group. This group will consist of smokers with
a history of hypertension. They will be 1 SD below the mean for weight and 1 SD below
the mean for age.
Log odd (LBW) = 1.767 - .017*(weight = 129.8 – 30.6 = 99.2) - .036*(age = 23.2-5.3 =
17.9) + .679*(smoke = 1) + 1.788*(hyper = 1)
= 1.767 - .017(99.2) - .036(17.9) + .679(1) + 1.788(1)
= 1.71
e1.71
5.529

 .847
1.71
1  5.529
1 e
Here the probability of having a low birth weight baby within this high risk group is .847,
a pretty high probability.
prob( LBW ) 
Sample Results
A logistic regression analysis was conducted to evaluate the relationships between the
likelihood of having a low birth weight (LBW) and certain maternal characteristics. The
predictors included: mother’s weight at last menstrual period, mother’s age, whether or
not the mother smoked during pregnancy, and whether or not the mother had a history of
hypertension. The outcome variable was birth weight of the child, with 1 = low birth
weight (5 lbs 8 oz. or less), 0 = normal birth weight (greater than 5 lbs. 8 oz.). The four
predictors together were significantly related to the log odds of having a LBW baby, χ2
(4) = 18.99, p = .001, Cox-Snell R2 = .096.
With the exception of mother’s age, all predictors were significant. Controlling
for the other variables, weight is negatively related to the log odds of having a low birth
weight baby, slope = -.02, Wald χ2 statistic = 6.55, p =.01 . Specifically, for every
additional pound of mother’s weight the log odds of having a low birth weight baby
decreases by .017. Controlling for the other variables, smoking increases the log odds of
having a low birth weight baby by 1.788, slope = .68, Wald χ2 statistic = 4.19, p =.04.
Controlling for the other variables, a history of hypertension increases the log odds of
having a low birth weight baby by 1.767, slope = 1.79, Wald χ2 statistic = 1.05, p =.01.
The probability of “high risk” mothers (1 SD below the mean weight and age, smoked
during pregnancy and had a history of hypertension) having a LBW baby was .85.