SAS Commands for Logistic Regression
*SAS EXAMPLE FOR LOGISTIC REGRESSION USING
PROC LOGISTIC AND PROC GENMOD;
options yearcutoff=1900;
options pageno=1 formdlim=" " nodate;
data bcancer;
infile "e:\510\2007\data" lrecl=300;
input idnum 1-4 stopmens 5 agestop1 6-7 numpreg1 8-9 agebirth 10-11
mamfreq4 12 @13 dob mmddyy8. educ 21-22
totincom 23 smoker 24 weight1 25-27;
format dob mmddyy10.;
if dob = "09SEP99"D then dob=.;
if stopmens=9 then stopmens=.;
if agestop1 = 88 or agestop1=99 then agestop1=.;
if agebirth =99 then agebirth=.;
if numpreg1=99 then numpreg1=.;
if mamfreq4=9 then mamfreq4=.;
if educ=99 then educ=.;
if totincom=8 or totincom=9 then totincom=.;
if smoker=9 then smoker=.;
if weight1=999 then weight1=.;
if stopmens = 1 then menopause=1;
if stopmens = 2 then menopause=0;
yearbirth = year(dob);
age = int(("01JAN1997"d - dob)/365.25);
if educ not=. then do;
if educ in (1,2,3,4) then edcat = 1;
if educ in (5,6) then edcat = 2;
if educ in (7,8) then edcat = 3;
highed = (educ in (6,7,8));
end;
if age not=. then do;
if age <50 then agecat=1;
if age >=50 and age < 60 then agecat=2;
if age >=60 and age < 70 then agecat=3;
if age >=70 then agecat=4;
end;
run;
title "Descriptive Statistics for Breast Cancer Data";
proc means data=bcancer n nmiss min max mean std;
run;
title "Logistic Regression with a Continuous Predictor";
proc logistic data=bcancer descending;
/* The DESCENDING option matters because of the way the response
variable, Y, is coded (0 or 1). With this option, PROC LOGISTIC
models the probability of the event occurring, Prob(Y = 1).
If this option is not used, you're modelling the probability
of the event NOT occurring, Prob(Y = 0). */
model menopause = age / risklimits rsquare;
units age = 1 5 10; *Calculates 3 different odds ratios (ORs) corresponding to a 1, 5
and 10 unit increase in age... The risklimits option includes
95% CI for each of these ORs;
run;
title "Logistic Regression with a Continuous Predictor";
title2 "Without the Descending Option";
proc logistic data=bcancer ;
model menopause = age / risklimits rsquare;
units age = 1 5 10;
run;
title "Logistic Regression Using Proc Genmod";
proc genmod data=bcancer descending;
model menopause = age / dist = bin; *You need DIST=BIN to get same results as in Proc Logistic;
run;
proc univariate data=bcancer;
var age; *get quartiles for age. The cut-off is arbitrary but a good N in
each category is usually preferred; *Be sure to check the
distribution of the response in each category, too; *You need at
least some variation in the response for each level of your categorical
predictor for the logistic model to work;
run;
data bcancer2; set bcancer;
if age not=. then do;
if 40<=age<=57 then AgeCat2 = 0;
if age > 57 then AgeCat2 = 1;
end;
if educ not=. then do;
if educ in (1,2,3,4) then edcat = 1;
if educ in (5,6) then edcat = 2;
if educ in (7,8) then edcat = 3;
highed = (educ in (6,7,8));
end;
run;
title "Logistic Regression with Dummy Variable Predictor";
title2 "Use Dummy Variable, Coded as 0, 1";
proc logistic data=bcancer2 descending;
model menopause = AgeCat2/ risklimits rsquare;
run;
title "Logistic Regression to Predict Menopause From Education";
proc logistic data=bcancer2 descending;
class edcat(ref="1") / param = ref;
model menopause = edcat/ risklimits rsquare;
run;
title "Logistic Regression with AGECAT";
title2 "This Analysis Does not Work";
title3 "Check out the Parameter Estimates and Standard Errors";
proc logistic data=bcancer descending;
class agecat(ref="1") / param = ref; *Has 4 levels in original dataset;
model menopause = agecat/ risklimits rsquare;
run;
title "Use Proc Freq to check the relationship between AGECAT and MENOPAUSE";
proc freq data=bcancer;
tables agecat*menopause/ chisq;
run;
*Recode Agecat into AGECAT3 with 3 categories;
data bcancer3;
set bcancer;
if age not=. then do;
if age < 50 then agecat3 = 1;
if age >=50 and age < 60 then agecat3 = 2;
if age >=60 then agecat3 = 3;
end;
run;
title "Logistic Regression with Ordinal Categorical Predictor";
title2 "This Analysis Works";
proc logistic data=bcancer3 descending;
class agecat3(ref="1") / param = ref;
model menopause = agecat3/ risklimits rsquare;
run; *Note to self: if the CIs and/or SEs look funny, do a proc freq;
proc freq data=bcancer3;
tables agecat3*menopause/ chisq;
run;
*Similarly this code can be written as the following;
proc logistic data=bcancer3 descending;
class agecat3 / param = ref reference = first;
model menopause = agecat3/ risklimits rsquare;
run;
*There is usually more than one way to write code in SAS;
*If you want your last group to be the ref category then specify reference = last;
title "Logistic Regression with Several Predictors";
title2 "Predictors are a mix of the aforementioned types";
proc logistic data=bcancer descending;
class edcat(ref="1") / param = ref;
model menopause = age edcat smoker totincom numpreg1
/ rsquare;
run;
title "Logistic Regression Using Proc Genmod";
proc genmod data=bcancer descending;
class edcat(ref="1") / param = ref;
model menopause = age edcat smoker totincom numpreg1
/ dist=bin type3; /* If you don't specify dist = bin, your results
WON'T match the results of proc logistic. */
run;
SAS Output with Corresponding Code
*************************************************************************************
title "Descriptive Statistics for Breast Cancer Data";
proc means data=bcancer n nmiss min max mean std;
run;
************************************************************************************
Descriptive Statistics for Breast Cancer Data
The MEANS Procedure
Variable      N   N Miss       Minimum       Maximum          Mean       Std Dev
--------------------------------------------------------------------------------
idnum       370        0       1008.00       2448.00       1761.69   412.7290352
stopmens    369        1     1.0000000     2.0000000     1.1598916     0.3670031
agestop1    297       73    27.0000000    61.0000000    47.1818182     6.3101650
numpreg1    366        4             0    12.0000000     2.9480874     1.8726683
agebirth    359       11     9.0000000    88.0000000    30.2228412    19.5615468
mamfreq4    328       42     1.0000000     6.0000000     2.9420732     1.3812853
dob         361        9     -19734.00      -1248.00      -7899.50       4007.12
educ        365        5     1.0000000     9.0000000     5.6410959     1.6374595
totincom    325       45     1.0000000     5.0000000     3.8276923     1.3080364
smoker      364        6     1.0000000     2.0000000     1.4862637     0.5004993
weight1     360       10    86.0000000   295.0000000   148.3527778    31.1093049
menopause   369        1             0     1.0000000     0.8401084     0.3670031
yearbirth   361        9       1905.00       1956.00       1937.86    10.9836177
age         361        9    40.0000000    91.0000000    58.1440443    10.9899588
edcat       364        6     1.0000000     3.0000000     2.0137363     0.7694786
highed      365        5             0     1.0000000     0.4383562     0.4968666
agecat      361        9     1.0000000     4.0000000     2.3296399     1.0798313
over50      361        9             0     1.0000000     0.7257618     0.4467488
highage     361        9     1.0000000     2.0000000     1.2742382     0.4467488
--------------------------------------------------------------------------------
**************************************************************************************************************
title "Logistic Regression with a Continuous Predictor";
proc logistic data=bcancer descending;
/* The DESCENDING option matters because of the way the response
variable, Y, is coded (0 or 1). With this option, PROC LOGISTIC
models the probability of the event occurring, Prob(Y = 1).
If this option is not used, you're modelling the probability
of the event NOT occurring, Prob(Y = 0). */
model menopause = age / risklimits rsquare;
units age = 1 5 10; *Calculates 3 different odds ratios (ORs)
corresponding to a 1, 5 and 10 unit increase
in age... The risklimits option includes
95% Wald CI for each of these ORs;
run;
***********************************************************************************************************
Logistic Regression with a Continuous Predictor
The LOGISTIC Procedure

Model Information
Data Set                       WORK.BCANCER
Response Variable              menopause
Number of Response Levels      2
Model                          binary logit
Optimization Technique         Fisher's scoring

Number of Observations Read    370
Number of Observations Used    360

Response Profile
Ordered                   Total
  Value   menopause   Frequency
      1           1         301
      2           0          59

Probability modeled is menopause=1.
NOTE: 10 observations were deleted due to missing values for the response or explanatory variables.

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
                Intercept    Intercept and
Criterion            Only       Covariates
AIC               323.165          201.019
SC                327.051          208.792
-2 Log L          321.165          197.019

R-Square   0.2917    Max-rescaled R-Square   0.4942

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio      124.1456     1        <.0001
Score                  81.0669     1        <.0001
Wald                   49.7646     1        <.0001

Analysis of Maximum Likelihood Estimates
                             Standard          Wald
Parameter   DF    Estimate      Error    Chi-Square    Pr > ChiSq
Intercept    1    -12.8675     1.9360       44.1735        <.0001
age          1      0.2829     0.0401       49.7646        <.0001

Odds Ratio Estimates
             Point         95% Wald
Effect    Estimate    Confidence Limits
age          1.327      1.227     1.436

Association of Predicted Probabilities and Observed Responses
Percent Concordant     89.3    Somers' D    0.806
Percent Discordant      8.7    Gamma        0.822
Percent Tied            2.0    Tau-a        0.222
Pairs                 17759    c            0.903

Wald Confidence Interval for Adjusted Odds Ratios
Effect       Unit    Estimate    95% Confidence Limits
age        1.0000       1.327       1.227      1.436
age        5.0000       4.115       2.778      6.097
age       10.0000      16.935       7.716     37.170
******************************************************************************************************
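The odds ratios in the last table can be checked by hand. For a k-unit increase in age the odds ratio is exp(k x estimate), and the Wald limits are exp(k x (estimate +/- 1.96 x standard error)). The data step below is a minimal sketch of that arithmetic, using the printed estimate (0.2829) and standard error (0.0401). The dataset name or_check and the variable names are made up for illustration. Small differences from the table in the third decimal are to be expected, because the printed coefficient is rounded.

```sas
data or_check;                     /* hypothetical dataset name */
   beta = 0.2829;                  /* age estimate from the output above */
   se   = 0.0401;                  /* its standard error */
   z    = probit(0.975);           /* about 1.96, for a 95% interval */
   do k = 1, 5, 10;                /* same increments as the UNITS statement */
      or_k  = exp(k*beta);         /* odds ratio for a k-unit increase */
      lower = exp(k*(beta - z*se));
      upper = exp(k*(beta + z*se));
      output;
   end;
run;

proc print data=or_check noobs;
   var k or_k lower upper;
   format or_k lower upper 8.3;
run;
```

Note that the 5- and 10-unit odds ratios are powers of the 1-unit odds ratio, not multiples of it.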
COMPARE THE PREVIOUS RESULTS TO A PROC LOGISTIC RUN WITHOUT THE 'DESCENDING' OPTION: THE SIGNS OF THE
PARAMETER ESTIMATES ARE REVERSED, AND THE ODDS RATIOS ARE THE INVERSE (1/OR) OF THE PREVIOUS OR ESTIMATES.
title "Logistic Regression with a Continuous Predictor";
title2 "Without the Descending Option";
proc logistic data=bcancer ;
model menopause = age / risklimits rsquare;
units age = 1 5 10;
run;
Logistic Regression with a Continuous Predictor
Without the Descending Option
The LOGISTIC Procedure

Model Information
Data Set                       WORK.BCANCER
Response Variable              menopause
Number of Response Levels      2
Model                          binary logit
Optimization Technique         Fisher's scoring

Number of Observations Read    370
Number of Observations Used    360

Response Profile
Ordered                   Total
  Value   menopause   Frequency
      1           0          59
      2           1         301

Probability modeled is menopause=0.
NOTE: 10 observations were deleted due to missing values for the response or explanatory variables.

Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
                Intercept    Intercept and
Criterion            Only       Covariates
AIC               323.165          201.019
SC                327.051          208.792
-2 Log L          321.165          197.019

R-Square   0.2917    Max-rescaled R-Square   0.4942

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio      124.1456     1        <.0001
Score                  81.0669     1        <.0001
Wald                   49.7646     1        <.0001

Analysis of Maximum Likelihood Estimates
                             Standard          Wald
Parameter   DF    Estimate      Error    Chi-Square    Pr > ChiSq
Intercept    1     12.8675     1.9360       44.1735        <.0001
age          1     -0.2829     0.0401       49.7646        <.0001

Odds Ratio Estimates
             Point         95% Wald
Effect    Estimate    Confidence Limits
age          0.754      0.697     0.815

Wald Confidence Interval for Adjusted Odds Ratios
Effect       Unit    Estimate    95% Confidence Limits
age        1.0000       0.754       0.697      0.815
age        5.0000       0.243       0.164      0.360
age       10.0000       0.059       0.027      0.130
**********************************************************************************
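An alternative to DESCENDING is to name the event level directly in the MODEL statement with the EVENT= response option, so the modelled level does not depend on sort order. The step below is a minimal sketch, assuming the same bcancer dataset; the title2 text is just an illustrative label.

```sas
title "Logistic Regression with a Continuous Predictor";
title2 "Using EVENT= Instead of DESCENDING";
proc logistic data=bcancer;
   /* event='1' asks PROC LOGISTIC to model Prob(menopause = 1) */
   model menopause(event='1') = age / risklimits rsquare;
   units age = 1 5 10;
run;
```

This should give the same estimates as the run with DESCENDING. The two sets of odds ratios shown above are reciprocals of each other: 1/1.327 is about 0.754.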
title "Logistic Regression Using Proc Genmod";
proc genmod data=bcancer descending;
model menopause = age / dist = bin; *You need DIST=BIN to get same results as in Proc Logistic;
run;
Logistic Regression Using Proc Genmod
The GENMOD Procedure

Model Information
Data Set              WORK.BCANCER
Distribution          Binomial
Link Function         Logit
Dependent Variable    menopause

Number of Observations Read    370
Number of Observations Used    360
Number of Events               301
Number of Trials               360
Missing Values                  10

Response Profile
Ordered                   Total
  Value   menopause   Frequency
      1           1         301
      2           0          59

PROC GENMOD is modeling the probability that menopause='1'.

Criteria For Assessing Goodness Of Fit
Criterion               DF       Value    Value/DF
Deviance               358    197.0195      0.5503
Scaled Deviance        358    197.0195      0.5503
Pearson Chi-Square     358    250.8081      0.7006
Scaled Pearson X2      358    250.8081      0.7006
Log Likelihood                -98.5097

Algorithm converged.

Analysis Of Parameter Estimates
                             Standard    Wald 95% Confidence      Chi-
Parameter   DF    Estimate      Error          Limits           Square    Pr > ChiSq
Intercept    1    -12.8675     1.9360    -16.6621    -9.0730     44.17        <.0001
age          1      0.2829     0.0401      0.2043     0.3615     49.76        <.0001
Scale        0      1.0000     0.0000      1.0000     1.0000

NOTE: The scale parameter was held fixed.
YOU DON'T NEED TO WORRY ABOUT THE SCALE PARAMETER. JUST KNOW THAT IT IS SET TO 1.00.
*********************************************************************************************************
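PROC GENMOD reports coefficients on the log-odds scale and does not print odds ratios by default. One way to get them is an ESTIMATE statement with the EXP option, which exponentiates the estimate and its confidence limits. The step below is a sketch under that assumption; the labels in quotes are chosen here for illustration.

```sas
proc genmod data=bcancer descending;
   model menopause = age / dist=bin link=logit;
   /* exponentiated estimates: odds ratios for 1- and 10-year increases in age */
   estimate "OR per 1 year of age"   age 1  / exp;
   estimate "OR per 10 years of age" age 10 / exp;
run;
```

The exponentiated values should agree with the PROC LOGISTIC odds ratios for the same model (1.327 and 16.935 in the output above).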
proc univariate data=bcancer;
var age; *get quartiles for age. The cut-off is arbitrary but a good N
in each category is usually preferred; *You need at
least some variation in the response for each level of your categorical
predictor for the logistic model to work;
run;
Use Proc Univariate to get Quartiles for AGE
The UNIVARIATE Procedure
Variable: age

Quantiles (Definition 5)
Quantile       Estimate
75% Q3               67
50% Median           57
25% Q1               49
10%                  45
5%                   43
1%                   41
0% Min               40
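If quartile-based groups are wanted instead of a single cut at the median, PROC RANK can assign them without typing the cut-offs by hand. This is a sketch only; the output dataset bcancer_q and the rank variable age_q are hypothetical names.

```sas
proc rank data=bcancer out=bcancer_q groups=4;
   var age;
   ranks age_q;        /* 0 = lowest quarter, ..., 3 = highest quarter */
run;

/* check the group sizes and the response distribution within each group */
proc freq data=bcancer_q;
   tables age_q*menopause;
run;
```

Because age is recorded in whole years, ties can make the four groups unequal in size, so the PROC FREQ check is still worth running before fitting the model.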
data bcancer2; set bcancer;
if age not=. then do;
if 40<=age<=57 then AgeCat2 = 0;
if age > 57 then AgeCat2 = 1;
end;
if educ not=. then do;
if educ in (1,2,3,4) then edcat = 1;
if educ in (5,6) then edcat = 2;
if educ in (7,8) then edcat = 3;
highed = (educ in (6,7,8));
end;
run;
title "Logistic Regression with Dummy Variable Predictor";
title2 "Use Dummy Variable, Coded as 0, 1";
proc logistic data=bcancer2 descending;
model menopause = AgeCat2/ risklimits rsquare;
run;
Logistic Regression with Dummy Variable Predictor
Use Dummy Variable, Coded as 0, 1
The LOGISTIC Procedure

Response Profile
Ordered                   Total
  Value   menopause   Frequency
      1           1         301
      2           0          59

Probability modeled is menopause=1.

Model Fit Statistics
                Intercept    Intercept and
Criterion            Only       Covariates
AIC               323.165          249.345
SC                327.051          257.117
-2 Log L          321.165          245.345

R-Square   0.1899    Max-rescaled R-Square   0.3218

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       75.8204     1        <.0001
Score                  59.3694     1        <.0001
Wald                   18.1149     1        <.0001

Analysis of Maximum Likelihood Estimates
                             Standard          Wald
Parameter   DF    Estimate      Error    Chi-Square    Pr > ChiSq
Intercept    1      0.8148     0.1577       26.6865        <.0001
AgeCat2      1      4.3210     1.0152       18.1149        <.0001

Odds Ratio Estimates
             Point         95% Wald
Effect    Estimate    Confidence Limits
AgeCat2     75.262     10.290   550.474

Wald Confidence Interval for Adjusted Odds Ratios
Effect       Unit    Estimate    95% Confidence Limits
AgeCat2    1.0000      75.262      10.290    550.474
title "Logistic Regression to Predict Menopause From Education";
proc logistic data=bcancer2 descending;
class edcat(ref="1") / param = ref;
model menopause = edcat/ risklimits rsquare;
run;
Logistic Regression to Predict Menopause From Education
The LOGISTIC Procedure

Model Information
Data Set                       WORK.BCANCER2
Response Variable              menopause
Number of Response Levels      2
Model                          binary logit
Optimization Technique         Fisher's scoring

Number of Observations Read    370
Number of Observations Used    363

Response Profile
Ordered                   Total
  Value   menopause   Frequency
      1           1         305
      2           0          58

Probability modeled is menopause=1.
NOTE: 7 observations were deleted due to missing values for the response or explanatory variables.

Class Level Information
                    Design
Class    Value    Variables
edcat    1          0    0
         2          1    0
         3          0    1
/*Edcat = 1 is the reference category. It has zeroes for
both dummy variables*/

Model Fit Statistics
                Intercept    Intercept and
Criterion            Only       Covariates
AIC               320.935          315.598
SC                324.829          327.281
-2 Log L          318.935          309.598

R-Square   0.0254    Max-rescaled R-Square   0.0434

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio        9.3370     2        0.0094
Score                   9.1172     2        0.0105
Wald                    8.6314     2        0.0134

Type 3 Analysis of Effects
                   Wald
Effect    DF    Chi-Square    Pr > ChiSq
edcat      2        8.6314        0.0134

Analysis of Maximum Likelihood Estimates
                               Standard          Wald
Parameter     DF    Estimate      Error    Chi-Square    Pr > ChiSq
Intercept      1      2.3671     0.3486       46.1069        <.0001
edcat     2    1     -0.6743     0.4159        2.6279        0.1050
edcat     3    1     -1.1944     0.4146        8.2990        0.0040

Odds Ratio Estimates
                   Point         95% Wald
Effect          Estimate    Confidence Limits
edcat 2 vs 1       0.510      0.225     1.151
edcat 3 vs 1       0.303      0.134     0.683

Wald Confidence Interval for Adjusted Odds Ratios
Effect            Unit    Estimate    95% Confidence Limits
edcat 2 vs 1    1.0000       0.510       0.225      1.151
edcat 3 vs 1    1.0000       0.303       0.134      0.683
*********************************************************************************************
title "Logistic Regression with AGECAT";
title2 "This Analysis Does not Work";
title3 "Check out the Parameter Estimates and Standard Errors";
proc logistic data=bcancer descending;
class agecat(ref="1") / param = ref; *AGECAT has 4 levels in the
original dataset;
model menopause = agecat/ risklimits rsquare;
run;
Logistic Regression with AGECAT
This Analysis Does not Work
Check out the Parameter Estimates and Standard Errors
The LOGISTIC Procedure

Model Information
Data Set                       WORK.BCANCER
Response Variable              menopause
Number of Response Levels      2
Model                          binary logit
Optimization Technique         Fisher's scoring

Number of Observations Read    370
Number of Observations Used    360

Response Profile
Ordered                   Total
  Value   menopause   Frequency
      1           1         301
      2           0          59

Probability modeled is menopause=1.
NOTE: 10 observations were deleted due to missing values for the response or explanatory variables.

Class Level Information
Class     Value    Design Variables
agecat    1          0    0    0    /*The reference category*/
          2          1    0    0
          3          0    1    0
          4          0    0    1

Model Convergence Status
Quasi-complete separation of data points detected.
WARNING: The maximum likelihood estimate may not exist.
WARNING: The LOGISTIC procedure continues in spite of the above warning. Results shown are based on the last
maximum likelihood iteration. Validity of the model fit is questionable.

Model Fit Statistics
                Intercept    Intercept and
Criterion            Only       Covariates
AIC               323.165          218.990
SC                327.051          234.534
-2 Log L          321.165          210.990

R-Square   0.2636    Max-rescaled R-Square   0.4467

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio      110.1752     3        <.0001
Score                 111.6605     3        <.0001
Wald                   50.0793     3        <.0001
WARNING: The validity of the model fit is questionable.

Type 3 Analysis of Effects
                   Wald
Effect    DF    Chi-Square    Pr > ChiSq
agecat     3       50.0793        <.0001

Analysis of Maximum Likelihood Estimates
                               Standard          Wald
Parameter     DF    Estimate      Error    Chi-Square    Pr > ChiSq
Intercept      1      0.0202     0.2010        0.0101        0.9199
agecat    2    1      2.4460     0.4012       37.1721        <.0001
agecat    3    1      4.2839     1.0266       17.4126        <.0001
agecat    4    1     14.8969      205.9        0.0052        0.9423

Odds Ratio Estimates
                    Point          95% Wald
Effect           Estimate     Confidence Limits
agecat 2 vs 1      11.542       5.258      25.339
agecat 3 vs 1      72.520       9.696     542.384
agecat 4 vs 1    >999.999      <0.001    >999.999
WARNING: The validity of the model fit is questionable.

Wald Confidence Interval for Adjusted Odds Ratios
Effect             Unit    Estimate    95% Confidence Limits
agecat 2 vs 1    1.0000      11.542       5.258      25.339
agecat 3 vs 1    1.0000      72.520       9.696     542.384
agecat 4 vs 1    1.0000    >999.999      <0.001    >999.999
****************************************************************************************************
/*Take a look at proc freq to see what caused the problem in the logistic regression model*/
proc freq data=bcancer;
tables agecat*menopause/ chisq;
run;
Note below that AGECAT=3 has only one person who has not yet gone through menopause, while
AGECAT=4 has no one who has not yet gone through menopause. This is the cause of the problem.
Table of agecat by menopause

agecat     menopause

Frequency|
Percent  |
Row Pct  |
Col Pct  |       0|       1|  Total
---------+--------+--------+
       1 |     49 |     50 |     99
         |  13.61 |  13.89 |  27.50
         |  49.49 |  50.51 |
         |  83.05 |  16.61 |
---------+--------+--------+
       2 |      9 |    106 |    115
         |   2.50 |  29.44 |  31.94
         |   7.83 |  92.17 |
         |  15.25 |  35.22 |
---------+--------+--------+
       3 |      1 |     74 |     75
         |   0.28 |  20.56 |  20.83
         |   1.33 |  98.67 |
         |   1.69 |  24.58 |
---------+--------+--------+
       4 |      0 |     71 |     71
         |   0.00 |  19.72 |  19.72
         |   0.00 | 100.00 |
         |   0.00 |  23.59 |
---------+--------+--------+
Total          59      301      360
            16.39    83.61   100.00

Frequency Missing = 10

Statistics for Table of agecat by menopause
Statistic                      DF       Value      Prob
-------------------------------------------------------
Chi-Square                      3    111.6605    <.0001
Likelihood Ratio Chi-Square     3    110.1752    <.0001
Mantel-Haenszel Chi-Square      1     78.6978    <.0001
Phi Coefficient                        0.5569
Contingency Coefficient                0.4866
Cramer's V                             0.5569

Effective Sample Size = 360
Frequency Missing = 10
*************************************************************************************************
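With several categorical predictors it is easy to miss an empty cell in printed tables. As a sketch, the cell counts can be written to a dataset with zero-count combinations included, and any empty cells listed. The dataset name cells is illustrative; COUNT is the frequency variable that PROC FREQ writes to an OUT= dataset.

```sas
proc freq data=bcancer;
   /* SPARSE keeps zero-count combinations in the OUT= dataset */
   tables agecat*menopause / noprint sparse out=cells;
run;

proc print data=cells noobs;
   /* list empty cells, ignoring any rows for missing levels */
   where count = 0 and agecat ne . and menopause ne .;
run;
```

For the table above this should list the single combination agecat=4, menopause=0, which is the cell responsible for the quasi-complete separation warning.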
*Recode Agecat into AGECAT3 with 3 categories;
data bcancer3;
set bcancer;
if age not=. then do;
if age < 50 then agecat3 = 1;
if age >=50 and age < 60 then agecat3 = 2;
if age >=60 then agecat3 = 3;
end;
run;
title "Logistic Regression with Ordinal Categorical Predictor";
title2 "This Analysis Works";
proc logistic data=bcancer3 descending;
class agecat3(ref="1") / param = ref;
model menopause = agecat3/ risklimits rsquare;
run;
*Similarly this code can be written as the following;
proc logistic data=bcancer3 descending;
class agecat3 / param = ref reference = first;
model menopause = agecat3/ risklimits rsquare;
run;
*There is usually more than one way to write code in SAS;
*If you want your last group to be the ref category then specify reference = last;
*******************************************************************************************************
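As one more illustration that the same thing can be written several ways, the recode can use an IF / ELSE IF chain, so that each record is tested only until a condition matches. This is a sketch that should produce the same agecat3 coding; the dataset name bcancer3b is hypothetical.

```sas
data bcancer3b;                      /* hypothetical name, same coding as bcancer3 */
   set bcancer;
   if missing(age) then agecat3 = .; /* test for missing first */
   else if age < 50 then agecat3 = 1;
   else if age < 60 then agecat3 = 2;
   else agecat3 = 3;
run;
```

The missing check has to come first: SAS treats a missing numeric value as smaller than any number, so without it a missing age would satisfy age < 50 and be coded 1.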
Logistic Regression with Ordinal Categorical Predictor
This Analysis Works

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio      108.8365     2        <.0001
Score                 111.6132     2        <.0001
Wald                   55.3535     2        <.0001

Type 3 Analysis of Effects
                    Wald
Effect     DF    Chi-Square    Pr > ChiSq
agecat3     2       55.3535        <.0001

Analysis of Maximum Likelihood Estimates
                               Standard          Wald
Parameter     DF    Estimate      Error    Chi-Square    Pr > ChiSq
Intercept      1      0.0202     0.2010        0.0101        0.9199
agecat3   2    1      2.4460     0.4012       37.1721        <.0001
agecat3   3    1      4.9565     1.0234       23.4578        <.0001

Odds Ratio Estimates
                     Point          95% Wald
Effect            Estimate     Confidence Limits
agecat3 2 vs 1      11.542       5.258      25.339
agecat3 3 vs 1     142.097      19.120    >999.999
***************************************************************************************************
title "Logistic Regression Using Proc Genmod";
proc genmod data=bcancer descending;
class edcat(ref="1") / param = ref;
model menopause = age edcat smoker totincom numpreg1
/ dist=bin type3; /* If you don't specify dist = bin,
your results WON'T match the
results of proc logistic. */
run;
Logistic Regression with Several Predictors
Predictors are a mix of the aforementioned types

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio      110.3657     6        <.0001
Score                  73.1512     6        <.0001
Wald                   44.6630     6        <.0001

Odds Ratio Estimates
                   Point         95% Wald
Effect          Estimate    Confidence Limits
age                1.323      1.214     1.442
edcat 2 vs 1       0.647      0.219     1.910
edcat 3 vs 1       0.432      0.143     1.303
smoker             0.520      0.245     1.102
totincom           0.911      0.655     1.268
numpreg1           1.006      0.779     1.300

Analysis Of Parameter Estimates
                               Standard    Wald 95% Confidence      Chi-
Parameter     DF    Estimate      Error          Limits           Square    Pr > ChiSq
Intercept      1    -10.8151     2.2132    -15.1530    -6.4773     23.88        <.0001
age            1      0.2797     0.0439      0.1937     0.3658     40.61        <.0001
edcat     2    1     -0.4356     0.5524     -1.5182     0.6470      0.62        0.4304
edcat     3    1     -0.8401     0.5636     -1.9448     0.2647      2.22        0.1361
smoker         1     -0.6543     0.3836     -1.4062     0.0976      2.91        0.0881
totincom       1     -0.0927     0.1683     -0.4226     0.2372      0.30        0.5819
numpreg1       1      0.0065     0.1305     -0.2494     0.2623      0.00        0.9605
Scale          0      1.0000     0.0000      1.0000     1.0000

NOTE: The scale parameter was held fixed.

LR Statistics For Type 3 Analysis
                    Chi-
Source      DF    Square    Pr > ChiSq
age          1     89.12        <.0001
edcat        2      2.45        0.2932
smoker       1      2.96        0.0852
totincom     1      0.31        0.5794
numpreg1     1      0.00        0.9605