Determining Probability Estimates from Logistic Regression Coefficient Estimates
Vartanian: SW541
Method 1: Evaluating the coefficient estimates at the mean values of the independent variables.
The outcome is the likelihood of being at or below 150% of the poverty line 10 years after initially entering the sample.
HiAFDC = using AFDC for 3 or 4 out of 4 years in late adolescence or early adulthood.
LowAFDC = using AFDC for 1 or 2 out of 4 years in late adolescence or early adulthood.
HiPov = being at or below 150% of the poverty line for 3 or 4 years in late adolescence or early adulthood.
LowPov = being at or below 150% of the poverty line for 1 or 2 years in late adolescence or early adulthood.
Npsamp = never using AFDC or being below 150% of the poverty line in any of the four years in late adolescence or early adulthood.
The above variables are a set of dummy variables, with HiPov being the excluded (reference) category.
Kids1 = number of children.
White = dummy variable equal to 1 if the respondent is white.
The LOGISTIC Procedure

Model Information
Data Set                     WORK.Z
Response Variable            IN150
Number of Response Levels    2
Number of Observations       1708
Weight Variable              WEIGHT
Sum of Weights               1708
Link Function                Logit
Optimization Technique       Fisher's scoring

Model Fit Statistics
                Intercept    Intercept and
Criterion       Only         Covariates
AIC             1958.664     1517.652
SC              1964.107     1555.754
-2 Log L        1956.664     1503.652

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio    453.0119      6     <.0001
Score               471.6869      6     <.0001
Wald                344.0779      6     <.0001

Analysis of Maximum Likelihood Estimates
Parameter    DF    Estimate    Standard Error    Chi-Square    Pr > ChiSq
Intercept    1      1.0902     0.2098             26.9979      <.0001
KIDS1        1     -0.3963     0.1465              7.3228      0.0068
WHITE        1     -0.9588     0.1449             43.7570      <.0001
HIAFDC       1      0.8817     0.2541             12.0388      0.0005
LOWAFDC      1     -0.4945     0.2514              3.8694      0.0492
LOWPOV       1     -1.3263     0.2033             42.5562      <.0001
NPSAMP       1     -2.1128     0.1976            114.2793      <.0001

Odds Ratio Estimates
             Point       95% Wald
Effect       Estimate    Confidence Limits
KIDS1        0.673       0.505    0.896
WHITE        0.383       0.289    0.509
HIAFDC       2.415       1.468    3.974
LOWAFDC      0.610       0.373    0.998
LOWPOV       0.265       0.178    0.395
NPSAMP       0.121       0.082    0.178
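
The odds ratio table follows directly from the coefficient table: each point estimate is e raised to the coefficient estimate, and the Wald limits are e raised to the estimate plus or minus roughly 1.96 standard errors. The Python sketch below is only an illustration of that arithmetic. The critical value of 1.96 and all variable and function names are assumptions, not something taken from the SAS output.

```python
import math

# Coefficient estimates and standard errors copied from the
# "Analysis of Maximum Likelihood Estimates" table.
estimates = {
    "KIDS1":   (-0.3963, 0.1465),
    "WHITE":   (-0.9588, 0.1449),
    "HIAFDC":  (0.8817, 0.2541),
    "LOWAFDC": (-0.4945, 0.2514),
    "LOWPOV":  (-1.3263, 0.2033),
    "NPSAMP":  (-2.1128, 0.1976),
}

Z_95 = 1.96  # assumption: normal critical value for 95% Wald limits


def odds_ratio(estimate, std_error, z=Z_95):
    """Return (point estimate, lower limit, upper limit) on the odds scale."""
    point = math.exp(estimate)
    lower = math.exp(estimate - z * std_error)
    upper = math.exp(estimate + z * std_error)
    return point, lower, upper


odds_ratios = {name: odds_ratio(b, se) for name, (b, se) in estimates.items()}

for name, (point, lower, upper) in odds_ratios.items():
    print(f"{name:8s} {point:6.3f} {lower:6.3f} {upper:6.3f}")
```

An odds ratio above 1 (HIAFDC) means higher odds of being at or below 150% of the poverty line than the excluded HiPov group; values below 1 mean lower odds.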

Determining Mean Values for the Independent Variables
The MEANS Procedure
Variable    N       Mean         Std Dev      Minimum    Maximum
----------------------------------------------------------------
KIDS1       1708    0.7684085    0.4219729    0          1.0000000
WHITE       1708    0.7776183    0.4159680    0          1.0000000
HIAFDC      1708    0.0914987    0.2884014    0          1.0000000
LOWAFDC     1708    0.0683175    0.2523639    0          1.0000000
hipov       1708    0.0985880    0.2981954    0          1.0000000
LOWPOV      1708    0.2213286    0.4152628    0          1.0000000
NPSAMP      1708    0.5202671    0.4997354    0          1.0000000
----------------------------------------------------------------
To determine the overall probability of being at or below 150% of the poverty line, use the formula for the logistic function:
$$\text{prob} = \frac{e^{XB}}{1 + e^{XB}}$$
Here XB is the intercept plus the sum of each coefficient estimate multiplied by its x value. Replace the x values with the means of the x values. Remember that you must also include the intercept.
XB=1.0902 +.7684085*(-.3963)+.7776183*(-.9588)+.0914987*(.8817)+
.0683175*(-.4945)+.2213286*(-1.3263)+.5202671*(-2.1128) = -1.30578
Use this value of -1.30578 in the logistic formula to come up with an overall likelihood of being at or
below 150% of the poverty line:
$$\text{prob} = \frac{e^{-1.30578}}{1 + e^{-1.30578}} = \frac{.270961}{1.270961} = .213194$$
In other words, with every independent variable set to its mean, there is a 21.3194% chance of being at or below 150% of the poverty line.
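
The Method 1 arithmetic can be checked with a few lines of Python. This is a minimal sketch, not the SAS code used in class. The coefficients and means are copied from the output above, and the dictionary and function names are assumptions made for illustration.

```python
import math

# Coefficient estimates from the LOGISTIC output.
coef = {
    "Intercept": 1.0902,
    "KIDS1": -0.3963,
    "WHITE": -0.9588,
    "HIAFDC": 0.8817,
    "LOWAFDC": -0.4945,
    "LOWPOV": -1.3263,
    "NPSAMP": -2.1128,
}

# Means from the MEANS output (hipov is the excluded category, so it is omitted).
means = {
    "KIDS1": 0.7684085,
    "WHITE": 0.7776183,
    "HIAFDC": 0.0914987,
    "LOWAFDC": 0.0683175,
    "LOWPOV": 0.2213286,
    "NPSAMP": 0.5202671,
}


def linear_predictor(coef, x):
    """XB: the intercept plus each coefficient times its x value."""
    return coef["Intercept"] + sum(coef[name] * value for name, value in x.items())


def logistic(xb):
    """The logistic function e^XB / (1 + e^XB)."""
    return math.exp(xb) / (1 + math.exp(xb))


xb_at_means = linear_predictor(coef, means)
prob_at_means = logistic(xb_at_means)

print(round(xb_at_means, 5), round(prob_at_means, 6))
```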
If you’re interested in the likelihood of being at or below 150% of the poverty line for those who are in the high AFDC group, you would multiply the coefficient estimate for the high AFDC group by 1 (instead of by the mean value for this group) and multiply the coefficient estimates for the other groups in this set of dummy variables by 0 (low AFDC, low poverty, and non-poor sample). White and number of kids are still evaluated at their mean values.
XB=1.0902 +.7684085*(-.3963)+.7776183*(-.9588)+1*(.8817)+0*(-.4945)+0*(-1.3263)+
0*(-2.1128) = .921799
Plug this XB into the logistic formula:
$$\text{prob} = \frac{e^{.921799}}{1 + e^{.921799}} = \frac{2.514}{3.514} = .715424$$
This gives you the probability of being at or below 150% of the poverty line for those who are in the high AFDC group, controlling for the effects of race and number of children.
If you wanted to determine the likelihood of being in or near poverty for those with 5 children, you would use the first XB equation above (the one evaluated at the means), except that you would multiply the coefficient estimate for kids by 5 instead of by its mean value.
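
Both scenarios above (the high AFDC group and a family with 5 children) amount to starting from the mean values and overriding selected x values. The sketch below shows one way to express that in Python. The helper `probability_with_overrides` is a hypothetical name introduced here for illustration, and the numbers are copied from the SAS output.

```python
import math

coef = {
    "Intercept": 1.0902, "KIDS1": -0.3963, "WHITE": -0.9588,
    "HIAFDC": 0.8817, "LOWAFDC": -0.4945, "LOWPOV": -1.3263, "NPSAMP": -2.1128,
}
means = {
    "KIDS1": 0.7684085, "WHITE": 0.7776183, "HIAFDC": 0.0914987,
    "LOWAFDC": 0.0683175, "LOWPOV": 0.2213286, "NPSAMP": 0.5202671,
}


def probability_with_overrides(overrides):
    """Evaluate the model at the means, replacing any x values given in overrides."""
    x = {**means, **overrides}  # overrides win over the mean values
    xb = coef["Intercept"] + sum(coef[name] * value for name, value in x.items())
    return math.exp(xb) / (1 + math.exp(xb))


# High AFDC group: its dummy is 1 and the other dummies in the set are 0.
prob_high_afdc = probability_with_overrides(
    {"HIAFDC": 1, "LOWAFDC": 0, "LOWPOV": 0, "NPSAMP": 0}
)

# Five children: only the kids value changes; everything else stays at its mean.
prob_five_kids = probability_with_overrides({"KIDS1": 5})

print(round(prob_high_afdc, 4), round(prob_five_kids, 4))
```

Because the coefficient estimate for kids is negative, the five-children probability comes out lower than the probability evaluated at the means.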
Method 2: Evaluating the coefficient estimates at the actual values for each sample member
for the independent variables.
This is what I was showing you with the SAS code in class.
Let’s say we only had 3 individuals in the sample (for simplicity) but somehow came up with the
coefficients given above. Let’s say that these 3 individuals had the following independent variable
values:
Observation    Kids1    White    HiAFDC    LowAFDC    Lowpov    Npsamp
1              1        1        0         1          0         0
2              2        0        1         0          0         0
3              3        1        0         0          0         0
We would then determine XB values for each of the observations and use these XB values to determine probability estimates for each of the individuals. We would then take the mean of these probability estimates to come up with an overall probability estimate.
For observation 1:
XB= 1.0902 +1*(-.3963)+1*(-.9588)+0*(.8817)+1*(-.4945)+0*(-1.3263)+0*(-2.1128) = -.7594
$$\text{prob} = \frac{e^{-.7594}}{1 + e^{-.7594}} = \frac{.4679}{1.4679} = .318755$$
For observation 2:
XB= 1.0902 +2*(-.3963)+0*(-.9588)+1*(.8817)+0*(-.4945)+0*(-1.3263)+0*(-2.1128) = 1.1793
$$\text{prob} = \frac{e^{1.1793}}{1 + e^{1.1793}} = \frac{3.25}{4.25} = .7647$$
For observation 3:
XB= 1.0902 +3*(-.3963)+1*(-.9588)+0*(.8817)+0*(-.4945)+0*(-1.3263)+0*(-2.1128) = -1.058
$$\text{prob} = \frac{e^{-1.058}}{1 + e^{-1.058}} = \frac{.347}{1.347} = .2577$$
To determine the overall mean:
(.318755+.7647+.2577)/3 = .447052, or 44.7052% likelihood of being in or near poverty.
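
Method 2 can be sketched the same way: compute a probability for each sample member from that person's actual x values, then average the probabilities. The Python below is an illustrative stand-in for the SAS code shown in class, not a copy of it. The three rows are the hypothetical observations from the table above, and all names are assumptions.

```python
import math

coef = {
    "Intercept": 1.0902, "KIDS1": -0.3963, "WHITE": -0.9588,
    "HIAFDC": 0.8817, "LOWAFDC": -0.4945, "LOWPOV": -1.3263, "NPSAMP": -2.1128,
}

# The three hypothetical sample members from the table above.
sample = [
    {"KIDS1": 1, "WHITE": 1, "HIAFDC": 0, "LOWAFDC": 1, "LOWPOV": 0, "NPSAMP": 0},
    {"KIDS1": 2, "WHITE": 0, "HIAFDC": 1, "LOWAFDC": 0, "LOWPOV": 0, "NPSAMP": 0},
    {"KIDS1": 3, "WHITE": 1, "HIAFDC": 0, "LOWAFDC": 0, "LOWPOV": 0, "NPSAMP": 0},
]


def predicted_probability(row):
    """Probability for one person, using that person's actual x values."""
    xb = coef["Intercept"] + sum(coef[name] * value for name, value in row.items())
    # 1 / (1 + e^-XB) is algebraically the same as e^XB / (1 + e^XB).
    return 1 / (1 + math.exp(-xb))


probs = [predicted_probability(row) for row in sample]
mean_prob = sum(probs) / len(probs)

print([round(p, 4) for p in probs], round(mean_prob, 4))
```

The results can differ from the hand calculation in the fourth decimal place because the code does not round the intermediate values.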