Stat 701 Handout on
Binary Logistic Regression
The Study Of Interest (Example on page 575 of text): The data provided below are from a study to assess the ability to complete, within a specified time, a task pertaining to a complex programming problem, and to relate this ability to the experience level of the programmer. Twenty-five programmers were used in this study, and they were all given the same task. The data set from the study is given below.
X = Months of Programming Experience;
Y = Success in Task (1 = Successful, 0 = Failure).
Note that X, the predictor variable, is a quantitative variable, while Y, the response variable, is a dichotomous (qualitative) variable.
The scatterplot of the data is given below.

[Scatterplot: Y = Task Success (0 or 1) versus X = Months of Experience for the 25 programmers.]
The problem is to obtain a model for relating the response variable (Y) to the predictor variable (X). The
model utilized is called the logistic regression model described as follows:
Let π(x) = P{Y = 1 | X = x} be the conditional probability of observing a successful outcome in performing the task when the level of programming experience of the subject is x. In the logistic regression model it is assumed that

    log[ π(x) / (1 - π(x)) ] = β0 + β1 x.

This is equivalent to assuming that

    π(x) = exp(β0 + β1 x) / [1 + exp(β0 + β1 x)].
Here are two graphs of this logistic function corresponding to two sets of values of (β0, β1). Note that one of the graphs will be a very bad model for the data above, while the other graph might be a good model for the success probability of the programming data above.
[Figure: Two graphs of the logistic probability function, plotted as probability of success versus x for x from 0 to 30. The decreasing curve has β0 = 4, β1 = -2; the increasing curve has β0 = -3.05, β1 = 0.16.]
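If you would like to reproduce these two curves, the following short SAS sketch (not part of the original handout; the data set name twocurves and the SGPLOT step are arbitrary choices) generates points on both logistic functions and plots them:

data twocurves;
   do x = 0 to 32 by 0.5;
      /* decreasing curve: beta0 = 4, beta1 = -2 */
      p_bad = exp(4 - 2*x) / (1 + exp(4 - 2*x));
      /* increasing curve: beta0 = -3.05, beta1 = 0.16 */
      p_good = exp(-3.05 + 0.16*x) / (1 + exp(-3.05 + 0.16*x));
      output;
   end;
run;
proc sgplot data=twocurves;
   series x=x y=p_good;
   series x=x y=p_bad;
run;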
Interpretation of the Coefficients (discussed in more detail in class):
β0 = intercept term for the linear model of the log-odds.
First, the ODDS of the probability π(x) is given by

    ODDS[π(x)] = π(x) / [1 - π(x)].
The coefficient β1 could be interpreted in several ways.
1. It could be viewed as the change in the value of the log-odds when the value of the predictor variable is changed by one unit.
2. exp(β1) could also be interpreted as the ODDS RATIO (OR), which is the ratio of the odds when the predictor value is (x + 1) to the odds when the predictor value is x. Symbolically,

    exp(β1) = ODDS[π(x + 1)] / ODDS[π(x)] = ODDS RATIO (OR).

Thus, β1 could also be interpreted as the LOGARITHM of the ODDS RATIO, that is, β1 = ln(OR).
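To make these interpretations concrete, here is a small worked sketch (not part of the original handout) that uses the estimated coefficients b0 = -3.0597 and b1 = 0.1615 reported in the output later in this handout. It computes the estimated odds at a few experience levels and shows that the estimated odds ratio for a one-month increase in experience is exp(0.1615), about 1.175, no matter what x is:

data oddsdemo;
   b0 = -3.0597;                      /* estimates taken from the output below */
   b1 = 0.1615;
   do x = 10 to 14;
      logit = b0 + b1*x;              /* estimated log-odds at x months        */
      odds  = exp(logit);             /* estimated odds pi(x)/(1 - pi(x))      */
      pi    = odds / (1 + odds);      /* estimated success probability pi(x)   */
      oddsratio = exp(b1);            /* odds at (x + 1) divided by odds at x  */
      output;
   end;
run;
proc print data=oddsdemo; run;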
Estimation and Testing When Dealing with the Logistic Model
1. Maximum likelihood estimation procedure.
2. Hypothesis testing via likelihood ratio tests.
We will not go into any detail about these methods of inference, but will simply illustrate them using the results from the logistic regression analysis in Minitab. It should be noted that there are no closed-form expressions for the regression coefficient estimates. They are obtained iteratively, and the object of this iterative procedure is to find the regression coefficients that maximize the likelihood function. As such, the estimation procedure is very computer-intensive.
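To give a feel for what the software does internally, here is a rough PROC IML sketch (an illustration written for this handout's data, not part of the original) of the Newton-Raphson (Fisher scoring) iterations that maximize the logistic likelihood; it reproduces, up to rounding, the final estimates shown in the Minitab and SAS output below:

proc iml;
/* programming-task data (the same values appear in the SAS program later on) */
mon = {14,29,6,25,18,4,18,12,22,6,30,11,30,5,20,13,9,32,24,13,19,4,28,22,8};
y   = {0,0,0,1,1,0,0,0,1,0,1,0,1,0,1,0,0,1,0,1,0,0,1,1,1};
X = j(nrow(mon), 1, 1) || mon;   /* design matrix: intercept and MonExp      */
b = j(2, 1, 0);                  /* starting values beta = (0, 0)            */
do iter = 1 to 10;               /* Newton-Raphson / Fisher scoring updates  */
   p = 1 / (1 + exp(-X*b));      /* current fitted success probabilities     */
   W = diag(p # (1 - p));        /* weights pi(1 - pi)                       */
   b = b + inv(X`*W*X) * (X`*(y - p));
end;
print b;                         /* converges to about (-3.060, 0.1615)      */
quit;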
We now illustrate the results of the Minitab Analysis.
Binary Logistic Regression
(Minitab Output)
Step    Log-Likelihood
   0           -17.148
   1           -12.866
   2           -12.714
   3           -12.712
   4           -12.712
   5           -12.712
Link Function: Logit

Response Information

Variable    Value    Count
TaskSucc    1           11  (Event)
            0           14
            Total       25
Logistic Regression Table

                                                     Odds      95% CI
Predictor      Coef     StDev       Z      P        Ratio   Lower   Upper
Constant     -3.060     1.259   -2.43  0.015
MonOfExp    0.16149   0.06498    2.49  0.013         1.18    1.03    1.33
Log-Likelihood = -12.712
Test that all slopes are zero: G = 8.872, DF = 1, P-Value = 0.003
Goodness-of-Fit Tests

Method             Chi-Square    DF       P
Pearson                19.623    17   0.294
Deviance               19.879    17   0.280
Hosmer-Lemeshow         5.946     8   0.653
Table of Observed and Expected Frequencies:
(See Hosmer-Lemeshow Test for the Pearson Chi-Square Statistic)

                                      Group
Value      1     2     3     4     5     6     7     8     9    10   Total
1
  Obs      0     0     1     1     1     1     2     1     1     3      11
  Exp    0.2   0.3   0.3   1.0   1.2   1.0   1.2   1.4   1.6   2.6
0
  Obs      2     3     1     3     2     1     0     1     1     0      14
  Exp    1.8   2.7   1.7   3.0   1.8   1.0   0.8   0.6   0.4   0.4
Total      2     3     2     4     3     2     2     2     2     3      25
Measures of Association:
(Between the Response Variable and Predicted Probabilities)

Pairs         Number   Percent
Concordant       127     82.5%
Discordant        25     16.2%
Ties               2      1.3%
Total            154    100.0%

Summary Measures
Somers' D                0.66
Goodman-Kruskal Gamma    0.67
Kendall's Tau-a          0.34
A Goodness-Of-Fit Criterion
Model Deviance: compares the log-likelihood of the fitted logistic model with the perfectly fitting model
(called the saturated model). The smaller the value of this deviance, the better is the fit. The DEVIANCE
statistic is given by:
    DEV(X) = -2 Σ_{i=1}^{n} [ Y_i ln(p(X_i)) + (1 - Y_i) ln(1 - p(X_i)) ].

Here p(X_i) is the estimate of the success probability at the predictor value X_i. Under the hypothesis that the logistic model is correct, the statistic DEV(X) follows a chi-square distribution with n - 2 degrees of freedom (in general, n - p, where p - 1 is the number of predictor variables).
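As a quick check (a sketch, not in the original handout), DEV(X) can be computed directly from the fitted probabilities. Using the prgtask data set created in the SAS program later in this handout, in which the column Est holds p(X_i), the sum works out to about 25.42, which is -2 times the fitted log-likelihood of -12.712 reported in the output. (The Deviance entry of 19.879 in the Minitab goodness-of-fit table is computed from the data grouped by distinct covariate patterns, so it is a different quantity.)

data devcheck;
   set prgtask;                    /* TskSucc = Y, Est = fitted p(X) */
   devterm = -2*( TskSucc*log(Est) + (1 - TskSucc)*log(1 - Est) );
run;
proc means data=devcheck sum;      /* the sum of devterm is DEV(X)   */
   var devterm;
run;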
Chi-Square Statistic: The data are grouped into classes according to their fitted logit values. Let there be c groups. For each group, determine the number of observed successes (denoted by O_j1) and the number of observed failures (denoted by O_j0). Also, for each group, obtain the expected numbers of successes and failures (denoted by E_j1 and E_j0). If the logistic regression model is appropriate, then the observed and expected frequencies for each of the cells/groupings will tend to be close to each other. This closeness, or lack thereof, is measured by the chi-square statistic given by:

    χ² = Σ_{j=1}^{c} Σ_{k=0}^{1} (O_jk - E_jk)² / E_jk.

If the model is appropriate, then this chi-square statistic follows a chi-square distribution with c - 2 degrees of freedom, so to test the model, it is compared to the 100(1 - α)th percentile of the chi-square distribution with c - 2 degrees of freedom.
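As a worked illustration (a sketch, not part of the original handout), the statistic can be computed from the observed and expected counts in the eight-group Hosmer-Lemeshow partition printed by SAS later in this handout. The sum comes out to about 5.15, agreeing, up to rounding of the expected counts, with the value 5.1453 that SAS reports with c - 2 = 6 degrees of freedom.

data hlgroups;
   input obs1 exp1 obs0 exp0;    /* observed/expected successes and failures per group */
   chi = (obs1 - exp1)**2/exp1 + (obs0 - exp0)**2/exp0;
   cards;
0 0.26 3 2.74
1 0.37 2 2.63
0 0.63 3 2.37
1 0.86 2 2.14
1 1.43 2 1.57
3 1.78 0 1.22
2 2.23 1 0.77
3 3.44 1 0.56
;
proc means data=hlgroups sum;    /* the sum of chi is the chi-square statistic */
   var chi;
run;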
Some Diagnostic Plots
These diagnostic plots are obtained by computing the above statistics when a given observation is deleted.
[Plot: Delta Chi-Square versus fitted Probability.]

[Plot: Delta Deviance versus fitted Probability.]
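One way to obtain these two diagnostic plots in SAS (a sketch only; DIFCHISQ= and DIFDEV= are standard OUTPUT statement keywords of PROC LOGISTIC, while the data set and variable names diag, phat, dchisq, and ddev are arbitrary choices) is shown below. Here prgtask is the data set created in the SAS program of the next section.

proc logistic descending data=prgtask;
   model TskSucc = MonExp;
   output out=diag prob=phat difchisq=dchisq difdev=ddev;
run;
proc sgplot data=diag;
   scatter x=phat y=dchisq;   /* delta chi-square versus fitted probability */
run;
proc sgplot data=diag;
   scatter x=phat y=ddev;     /* delta deviance versus fitted probability   */
run;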
Implementation Using SAS
THE PROGRAM
/* Logistic Regression Illustration */
data prgtask;
input MonExp TskSucc Est;
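/* Est = estimated success probability for each case, as produced by the fitted logistic model */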
cards;
14 0 0.310262
29 0 0.835263
6 0 0.109996
25 1 0.726602
18 1 0.461837
4 0 0.082130
18 0 0.461837
12 0 0.245666
22 1 0.620812
6 0 0.109996
30 1 0.856299
11 0 0.216980
30 1 0.856299
5 0 0.095154
20 1 0.542404
13 0 0.276802
9 0 0.167100
32 1 0.891664
24 0 0.693379
13 1 0.276802
19 0 0.502134
4 0 0.082130
28 1 0.811825
22 1 0.620812
8 1 0.145815
;
proc print;
proc logistic DESCENDING;
/* The keyword DESCENDING is to indicate that 1=Success */
model TskSucc = MonExp / waldcl corrb covb itprint lackfit plcl plrl rsquare;
run;
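The Est column typed into the data step above holds the fitted success probabilities. If you prefer to have SAS compute and save them rather than typing them in, one possibility (a sketch; the data set name fitted and the variable name phat are arbitrary) is to use the OUTPUT statement of PROC LOGISTIC:

proc logistic descending data=prgtask;
   model TskSucc = MonExp;
   output out=fitted prob=phat;   /* phat = fitted P(TskSucc = 1) */
run;
proc print data=fitted; run;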
The OUTPUT
Obs    MonExp    TskSucc        Est
  1        14          0    0.31026
  2        29          0    0.83526
  3         6          0    0.11000
  4        25          1    0.72660
  5        18          1    0.46184
  6         4          0    0.08213
  7        18          0    0.46184
  8        12          0    0.24567
  9        22          1    0.62081
 10         6          0    0.11000
 11        30          1    0.85630
 12        11          0    0.21698
 13        30          1    0.85630
 14         5          0    0.09515
 15        20          1    0.54240
 16        13          0    0.27680
 17         9          0    0.16710
 18        32          1    0.89166
 19        24          0    0.69338
 20        13          1    0.27680
 21        19          0    0.50213
 22         4          0    0.08213
 23        28          1    0.81183
 24        22          1    0.62081
 25         8          1    0.14582
The LOGISTIC Procedure
Model Information

Data Set                     WORK.PRGTASK
Response Variable            TskSucc
Number of Response Levels    2
Number of Observations       25
Link Function                Logit
Optimization Technique       Fisher's scoring
Response Profile

Ordered                  Total
  Value    TskSucc    Frequency
      1          1           11
      2          0           14
Maximum Likelihood Iteration History

Iter    Ridge      -2 Log L     Intercept      MonExp
   0        0     34.296490     -0.241162           0
   1        0     25.732187     -2.401052    0.127956
   2        0     25.428428     -2.982504    0.157626
   3        0     25.424575     -3.058497    0.161427
   4        0     25.424574     -3.059696    0.161486
Last Change in -2 Log L    9.1283891E-7

Last Evaluation of Gradient

   Intercept          MonExp
-1.577658E-7    5.635832E-7

Convergence criterion (GCONV=1E-8) satisfied.
The LOGISTIC Procedure
Model Fit Statistics

                            Intercept
              Intercept          and
Criterion          Only   Covariates
AIC              36.296       29.425
SC               37.515       31.862
-2 Log L         34.296       25.425

R-Square = 0.2987    Max-rescaled R-Square = 0.4003
Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio        8.8719     1        0.0029
Score                   7.9742     1        0.0047
Wald                    6.1760     1        0.0129
Analysis of Maximum Likelihood Estimates

                           Standard
Parameter    DF    Estimate    Error    Chi-Square    Pr > ChiSq
Intercept     1     -3.0597   1.2594        5.9029        0.0151
MonExp        1      0.1615   0.0650        6.1760        0.0129
Odds Ratio Estimates

             Point         95% Wald
Effect    Estimate    Confidence Limits
MonExp       1.175       1.035    1.335
Association of Predicted Probabilities and Observed Responses

Percent Concordant    82.5    Somers' D    0.662
Percent Discordant    16.2    Gamma        0.671
Percent Tied           1.3    Tau-a        0.340
Pairs                  154    c            0.831
The LOGISTIC Procedure
Profile Likelihood Confidence Interval for Parameters

Parameter    Estimate    95% Confidence Limits
Intercept     -3.0597     -6.0369      -0.9159
MonExp         0.1615      0.0500       0.3140
Wald Confidence Interval for Parameters

Parameter    Estimate    95% Confidence Limits
Intercept     -3.0597     -5.5280      -0.5914
MonExp         0.1615      0.0341       0.2888
Profile Likelihood Confidence Interval for Adjusted Odds Ratios

Effect      Unit    Estimate    95% Confidence Limits
MonExp    1.0000       1.175       1.051       1.369
Estimated Covariance Matrix

Variable     Intercept      MonExp
Intercept     1.585967     -0.0754
MonExp         -0.0754    0.004222
Estimated Correlation Matrix

Variable     Intercept     MonExp
Intercept       1.0000    -0.9214
MonExp         -0.9214     1.0000
The LOGISTIC Procedure
Partition for the Hosmer and Lemeshow Test

                    TskSucc = 1             TskSucc = 0
Group    Total    Observed    Expected    Observed    Expected
    1        3           0        0.26           3        2.74
    2        3           1        0.37           2        2.63
    3        3           0        0.63           3        2.37
    4        3           1        0.86           2        2.14
    5        3           1        1.43           2        1.57
    6        3           3        1.78           0        1.22
    7        3           2        2.23           1        0.77
    8        4           3        3.44           1        0.56

Hosmer and Lemeshow Goodness-of-Fit Test

Chi-Square    DF    Pr > ChiSq
    5.1453     6        0.5253
Another Example
Multiple Logistic Regression
Study Considered (Example on page 582 but using the whole data set): To investigate an epidemic outbreak of a disease that is spread by mosquitoes, individuals were randomly sampled within two sectors of a city to determine whether the person had recently contracted the disease under study. The response variable was coded 1 = Yes, 0 = No. The predictor variables considered are:
1. Age, a quantitative variable;
2. Socioeconomic status, a qualitative variable taking the values Upper, Middle, and Lower, which were then coded using two dummy variables as follows: (0, 0) = Upper, (1, 0) = Middle, and (0, 1) = Lower (a small coding sketch is given right after this list);
3. City sector, a qualitative variable taking the values Sector 1 (coded 1) and Sector 2 (coded 2).
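If socioeconomic status were stored as a single three-level variable, the two dummy variables could be created in a SAS data step along the following lines (a sketch only; the data set name rawdata and the variable SocEcon with coding 1 = Upper, 2 = Middle, 3 = Lower are hypothetical and not from the original study):

data coded;
   set rawdata;                 /* rawdata and SocEcon are hypothetical names */
   SocEcD1 = (SocEcon = 2);     /* 1 if Middle socioeconomic status, else 0   */
   SocEcD2 = (SocEcon = 3);     /* 1 if Lower socioeconomic status, else 0    */
run;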
To give you an idea of the data set, the plot below is a scatterplot of Disease Status versus Age.
[Scatterplot: Disease Status (0 or 1) versus Age, with ages ranging from 0 to about 90.]
Using Minitab, we fit a multiple logistic regression model. The results of this analysis are summarized next.
Binary Logistic Regression
Link Function: Logit

Response Information

Variable    Value    Count
DiseaseS    1          107  (Event)
            0           89
            Total      196
Logistic Regression Table

                                                      Odds      95% CI
Predictor       Coef      StDev       Z       P      Ratio   Lower   Upper
Constant      0.1963     0.7011    0.28   0.780
Age          0.03596    0.01001    3.59   0.000       1.04    1.02    1.06
SocEcoSt     -0.9768     0.2012   -4.85   0.000       0.38    0.25    0.56
SocEcoSt      0.7751     0.3584    2.16   0.031       2.17    1.08    4.38
CitySect     -0.0213     0.3927   -0.05   0.957       0.98    0.45    2.11

Log-Likelihood = -107.826
Test that all slopes are zero: G = 54.406, DF = 4, P-Value = 0.000
Goodness-of-Fit Tests

Method             Chi-Square     DF       P
Pearson               165.767    165   0.469
Deviance              185.154    165   0.135
Hosmer-Lemeshow         9.343      8   0.314
Table of Observed and Expected Frequencies:
(See Hosmer-Lemeshow Test for the Pearson Chi-Square Statistic)

                                         Group
Value       1      2      3      4      5      6      7      8      9     10   Total
1
  Obs       6      3      3      9      9     13     14     15     18     17     107
  Exp     2.9    4.5    6.0    8.2   10.5   11.5   15.0   14.4   16.5   17.6
0
  Obs      13     17     16     11     11      6      7      4      2      2      89
  Exp    16.1   15.5   13.0   11.8    9.5    7.5    6.0    4.6    3.5    1.4
Total      19     20     19     20     20     19     21     19     20     19     196
Measures of Association:
(Between the Response Variable and Predicted Probabilities)

Pairs         Number   Percent
Concordant      7560     79.4%
Discordant      1930     20.3%
Ties              33      0.3%
Total           9523    100.0%

Summary Measures
Somers' D                0.59
Goodman-Kruskal Gamma    0.59
Kendall's Tau-a          0.29
CONCLUSIONS??
Question: Suppose now that we want to see the effect of SocioEconomic Status on Disease Outbreak,
given that the predictors of AGE and CITY SECTOR are already in the model. To answer this question, we
need to fit the reduced model which only contains AGE and CITY SECTOR as predictors in order to be
able to compute the DEVIANCE statistic for SOCIOECONOMIC STATUS after accounting for AGE and
CITY SECTOR. This statistic will be denoted by
DEV(SocEconStat | Age, City Sector) = DEV(Age, City Sector) - DEV(Age, SocEconStat, CitySect).
This is called the partial deviance and is analogous to the extra-sum of squares idea in multiple linear
regression.
The results of fitting the reduced model are given below; a worked computation of the partial deviance follows the reduced-model output.
Binary Logistic Regression
Link Function: Logit

Response Information

Variable    Value    Count
DiseaseS    1          107  (Event)
            0           89
            Total      196
Logistic Regression Table

                                                        Odds      95% CI
Predictor        Coef       StDev       Z       P      Ratio   Lower   Upper
Constant      -0.6875      0.2599   -2.65   0.008
Age          0.034064    0.009345    3.65   0.000       1.03    1.02    1.05
CitySect       0.1739      0.3449    0.50   0.614       1.19    0.61    2.34

Log-Likelihood = -126.065
Test that all slopes are zero: G = 17.928, DF = 2, P-Value = 0.000
Goodness-of-Fit Tests

Method             Chi-Square    DF       P
Pearson                93.182    91   0.417
Deviance              119.708    91   0.023
Hosmer-Lemeshow        13.116     8   0.108
Table of Observed and Expected Frequencies:
(See Hosmer-Lemeshow Test for the Pearson Chi-Square Statistic)

                                         Group
Value       1      2      3      4      5      6      7      8      9     10   Total
1
  Obs      11      8      9      5     10     12     11      9     13     19     107
  Exp     7.5    7.4    8.9    8.9    9.6   10.9   11.2   12.9   13.7   16.0
0
  Obs      10     11     12     14      9      8      8     11      6      0      89
  Exp    13.5   11.6   12.1   10.1    9.4    9.1    7.8    7.1    5.3    3.0
Total      21     19     21     19     19     20     19     20     19     19     196
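Here is the promised worked computation of the partial deviance (this computation is not in the original handout). Using the log-likelihoods reported by Minitab, -107.826 for the full model and -126.065 for the reduced model, the partial deviance is 2(126.065 - 107.826) = 36.478. It is referred to a chi-square distribution with 2 degrees of freedom, since two socioeconomic dummy variables were dropped, and the resulting p-value is essentially zero.

data partialdev;
   llfull = -107.826;              /* log-likelihood, full model (Minitab)    */
   llred  = -126.065;              /* log-likelihood, reduced model (Minitab) */
   dev    = 2*(llfull - llred);    /* partial deviance = 36.478               */
   df     = 2;                     /* two dummy variables were dropped        */
   pvalue = 1 - probchi(dev, df);  /* essentially zero                        */
run;
proc print data=partialdev; run;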
SAS IMPLEMENTATION
/* Multiple Logistic Regression */
data DisOut;
input ObsNum Age SocEcD1 SocEcD2 CitySect DisSta;
label SocEcD1 = "Indicator for Middle SocioEcon Status"
SocEcD2 = "Indicator for Lower SocioEcon Status"
CitySect = "City Sector (0 = Sector 1)"
DisSta = "Disease Status (1=Diseased)";
Cards;
(Data Set to be Inserted here)
run;
proc print;
run;
proc logistic;
model DisSta = Age SocEcD1 SocEcD2 CitySect / itprint plcl plrl rsquare lackfit;
run;
THE OUTPUT
Data Set: WORK.DISOUT
Response Variable: DISSTA   Disease Status (1=Diseased)
Response Levels: 2
Number of Observations: 196
Link Function: Logit

Response Profile

Ordered
  Value    DISSTA    Count
      1         0       89
      2         1      107
Maximum Likelihood Iterative Phase

Iter  Step         -2 Log L    INTERCPT         AGE     SOCECD1     SOCECD2    CITYSECT
   0  INITIAL    270.058302   -0.184192           0           0           0           0
   1  IRLS       217.769394   -0.316255   -0.025919    0.811906   -0.580761    0.016157
   2  IRLS       215.679039   -0.213630   -0.034772    0.958907   -0.751783    0.021381
   3  IRLS       215.652532   -0.196544   -0.035939    0.976515   -0.774711    0.021312
   4  IRLS       215.652526   -0.196262   -0.035956    0.976768   -0.775062    0.021305
   5  IRLS       215.652526   -0.196262   -0.035956    0.976768   -0.775062    0.021305
Last Change in -2 Log L: 1.136868E-13

Last Evaluation of Gradient

    INTERCPT             AGE         SOCECD1         SOCECD2        CITYSECT
-8.223913E-7    -0.000054821    -4.526762E-7    -1.795719E-6    -5.701351E-7
The LOGISTIC Procedure

Model Fitting Information and Testing Global Null Hypothesis BETA=0

                            Intercept
              Intercept          and
Criterion          Only   Covariates   Chi-Square for Covariates
AIC             272.058      225.653   .
SC              275.336      242.043   .
-2 LOG L        270.058      215.653   54.406 with 4 DF (p=0.0001)
Score                 .            .   48.404 with 4 DF (p=0.0001)

RSquare = 0.2424    Max-rescaled RSquare = 0.3241
Analysis of Maximum Likelihood Estimates

                Parameter    Standard        Wald        Pr >    Standardized     Odds
Variable   DF    Estimate       Error  Chi-Square  Chi-Square        Estimate    Ratio
INTERCPT    1     -0.1963      0.7011      0.0784      0.7795               .        .
AGE         1     -0.0360      0.0100     12.9026      0.0003       -0.374763    0.965
SOCECD1     1      0.9768      0.2012     23.5669      0.0001        0.467169    2.656
SOCECD2     1     -0.7751      0.3584      4.6760      0.0306       -0.210140    0.461
CITYSECT    1      0.0213      0.3927      0.0029      0.9567        0.005348    1.022
Association of Predicted Probabilities and Observed Responses

Concordant = 79.4%          Somers' D = 0.590
Discordant = 20.4%          Gamma     = 0.591
Tied       =  0.2%          Tau-a     = 0.294
(9523 pairs)                c         = 0.795
Parameter Estimates and 95% Confidence Intervals

                             Profile Likelihood
                             Confidence Limits
             Parameter
Variable      Estimate        Lower        Upper
INTERCPT       -0.1963      -1.5875       1.1740
AGE            -0.0360      -0.0565      -0.0170
SOCECD1         0.9768       0.5926       1.3843
SOCECD2        -0.7751      -1.4879      -0.0772
CITYSECT        0.0213      -0.7506       0.7961
Conditional Odds Ratios and 95% Confidence Intervals

                                Profile Likelihood
                                Confidence Limits
                      Odds
Variable      Unit   Ratio        Lower        Upper
AGE         1.0000   0.965        0.945        0.983
SOCECD1     1.0000   2.656        1.809        3.992
SOCECD2     1.0000   0.461        0.226        0.926
CITYSECT    1.0000   1.022        0.472        2.217
Hosmer and Lemeshow Goodness-of-Fit Test

                      DISSTA = 0               DISSTA = 1
Group    Total    Observed    Expected    Observed    Expected
    1       21           2        1.71          19       19.29
    2       20           2        3.71          18       16.29
    3       20           5        4.95          15       15.05
    4       20           7        5.90          13       14.10
    5       21           6        8.62          15       12.38
    6       20          12        9.78           8       10.22
    7       21          14       13.05           7        7.95
    8       20          15       14.11           5        5.89
    9       20          17       16.03           3        3.97
   10       13           9       11.14           4        1.86

Goodness-of-fit Statistic = 7.1833 with 8 DF (p=0.5170)
_________________________________________________________________________________________
SELECTING BEST VARIABLES
You may also use SAS to select the appropriate variables to include in your model. You do this by using the INCLUDE=p and SELECTION=STEPWISE options in the MODEL statement. The value of p tells SAS to include in the model the first p variables listed. Thus, for the above data set, we could use the command
proc logistic;
model DisSta =SocEcD1 SocEcD2 CitySect Age / include = 2 selection=stepwise;
run;
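The output excerpt below notes that additional variables enter at the 0.05 significance level. If you want to use different entry or stay levels, PROC LOGISTIC provides the SLENTRY= and SLSTAY= options; for example (a sketch, not part of the original handout):

proc logistic;
   model DisSta = SocEcD1 SocEcD2 CitySect Age
         / include=2 selection=stepwise slentry=0.10 slstay=0.10;
run;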
The relevant part of the output is given below:
Stepwise Selection Procedure

The following variables will be included in each model:

INTERCPT    SOCECD1    SOCECD2

Step 0. The INCLUDE variables were entered.
Model Fitting Information and Testing Global Null Hypothesis BETA=0

                            Intercept
              Intercept          and
Criterion          Only   Covariates   Chi-Square for Covariates
AIC             272.058      236.851   .
SC              275.336      246.685   .
-2 LOG L        270.058      230.851   39.207 with 2 DF (p=0.0001)
Score                 .            .   37.067 with 2 DF (p=0.0001)

Residual Chi-Square = 14.7090 with 2 DF (p=0.0006)
Step 1. Variable AGE entered:

Model Fitting Information and Testing Global Null Hypothesis BETA=0

                            Intercept
              Intercept          and
Criterion          Only   Covariates   Chi-Square for Covariates
AIC             272.058      223.655   .
SC              275.336      236.768   .
-2 LOG L        270.058      215.655   54.403 with 3 DF (p=0.0001)
Score                 .            .   48.402 with 3 DF (p=0.0001)

Residual Chi-Square = 0.0029 with 1 DF (p=0.9567)
NOTE: No (additional) variables met the 0.05 significance level for entry into the model.
Summary of Stepwise Procedure

        Variable             Number        Score         Wald         Pr >
Step    Entered    Removed       In   Chi-Square   Chi-Square   Chi-Square
   1    AGE                        3      14.7000            .       0.0001
Analysis of Maximum Likelihood Estimates

                Parameter    Standard        Wald        Pr >    Standardized     Odds
Variable   DF    Estimate       Error  Chi-Square  Chi-Square        Estimate    Ratio
INTERCPT    1     -0.2009      0.6960      0.0833      0.7729               .        .
SOCECD1     1      0.9772      0.2010     23.6272      0.0001        0.467385    2.657
SOCECD2     1     -0.7700      0.3463      4.9459      0.0262       -0.208780    0.463
AGE         1     -0.0358     0.00978     13.4347      0.0002       -0.373561    0.965
Association of Predicted Probabilities and Observed Responses

Concordant = 79.4%          Somers' D = 0.591
Discordant = 20.3%          Gamma     = 0.592
Tied       =  0.3%          Tau-a     = 0.294
(9523 pairs)                c         = 0.795
Conclusions: By using this procedure, SAS determined that the variable City Sector is not an important predictor.
Note: If you do not include the option INCLUDE = 2, then the procedure will also assess whether the socioeconomic variables are important. The relevant portion of the program is given below:
Relevant Program Portion:
proc logistic;
model DisSta = SocEcD1 SocEcD2 CitySect Age / selection=stepwise;
run;