Running head: QUASI-NONCONVERGENCE LOGISTIC REGRESSION
Handling Quasi-Nonconvergence in Logistic Regression:
Technical Details and an Applied Example
Jeffrey M. Miller
Northcentral University
M. David Miller
University of Florida
Author Note
Jeffrey M. Miller, Ph.D., College of Education, Northcentral University
M. David Miller, Ph.D., Research & Evaluation Methodology, University of Florida
Correspondence regarding this article should be addressed to Jeffrey M. Miller,
4117 SW 20th Ave., #73, Gainesville, FL 32607.
contact: [email protected]
Abstract
Nonconvergence is a concern for any iterative data analysis process. However,
there are instances in which convergence is obtained for the overall solution but not
for a specific estimate. In most software packages, this problem is not easy to notice
unless the researcher has a priori knowledge of reasonable solutions. Hence, faulty
inferences can be disguised by presumably successful estimation, a problem known as
“quasi-nonconvergence.”
This type of nonconvergence occurs in logistic regression models when the data
are quasi-completely separated, that is, when prediction is perfect or nearly perfect.
Firth (1993) presented a penalized likelihood correction that was then extended by
Heinze and Ploner (2003) to solve the quasi-nonconvergence problem. This procedure
was applied to educational research data to demonstrate its success in eliminating the
problem.
Keywords: quasi-nonconvergence, nonconvergence, logistic regression
Handling Quasi-Nonconvergence in Logistic Regression:
Technical Details and an Applied Example
Many researchers have experienced nonconvergence errors in which, for one reason
or another, a maximum likelihood solution cannot be calculated or does not exist.
Conditions producing nonconvergence include sparse data, multiple maxima,
unspecified boundary constraints, and data separation. Logistic regression analyses are
especially prone to data separation issues resulting in complete or quasi-complete
separation. This article provides the calculations for maximum likelihood (ML) estimates
in logistic regression, a solution to the data separation problem, and an example using real data.
Logistic regression is often used to describe and/or predict a binary outcome given
a set of covariates. The technique is widely used in research on topics including
dropping out of school (Suh, Suh, & Houston, 2007), retention and graduation
(Wohlgemuth, Whalen, Sullivan, Nading, Mack, & Wang, 2007), and skipping classes
(Kimberly, 2007). In the parlance of generalized linear modeling (McCullagh & Nelder,
1989; Nelder & Wedderburn, 1972), the logistic model has a random component, a
systematic component, and a link function. The random component specifies information
about the response variable. In this case, given n randomly selected, identically distributed,
and independent trials, the random component is specified as Y ~ B(n, π) (Agresti, 1996).
This is to say that outcome Y has a binomial (B) distribution with the n parameter
representing the number of trials and the π parameter representing the probability that
Y equals one.
As explained by McCullagh and Nelder (1989), the probability of k successes,
given by the binomial probability mass function, can be calculated by hand when
provided the total number of trials (n), the number of successes (k), and the probability of
a success:
f(k; n, p) = \frac{n!}{k!(n-k)!} \, p^k (1-p)^{n-k}.   (1)
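To make Equation 1 concrete, the calculation can be checked in R; the values of n, k, and p below are arbitrary illustrations, and dbinom() is R's built-in binomial mass function.

    # Probability of k = 3 successes in n = 10 trials with success probability p = 0.4,
    # first by the factorial formula in Equation 1, then with dbinom().
    n <- 10; k <- 3; p <- 0.4
    by_hand  <- (factorial(n) / (factorial(k) * factorial(n - k))) * p^k * (1 - p)^(n - k)
    built_in <- dbinom(k, size = n, prob = p)
    c(by_hand = by_hand, built_in = built_in)   # both equal about 0.215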
The systematic component for the logistic regression model contains the P
covariates that are specified as predictors of the probability that Y = 1 through their P
coefficients. The link function specifies how the random component relates to
the systematic component. For typical regression analyses (i.e., ordinary least squares
regression), we equate the mean of the response variable to a linear combination of the
predictor variables, thereby assuming the identity link. However, if we do this with a binary response variable
then we can feasibly obtain a regression prediction equation that permits predicted values
less than zero or greater than one (Hair, Black, Babin, Anderson, & Tatham, 2006;
Agresti, 1996). This is undesirable since we are modeling a probability that is bounded
between zero and one. Further, ordinary least squares regression models assume normally
distributed residuals, which is rarely the case for binary response variables.
The typical (i.e., canonical) link for logistic regression is the logit, which is the
natural log of the odds, where the odds equal the probability divided by one minus the
probability:

\eta = g(\pi) = \log\big(\pi / (1 - \pi)\big)   (2)
Combining the random component, systematic component, and link function yields
the logistic generalized linear model

\mathrm{logit}(\pi) = \log\!\left(\frac{\pi}{1 - \pi}\right) = \beta_0 + \beta_1 X_1 + \dots + \beta_p X_p.   (3)
Since natural logs can be reversed through exponentiation, and since odds can be
converted to probabilities, the fitted equation can be used to predict probabilities via

\Pr(y_i = 1 \mid \mathbf{x}_i) = \left\{1 + \exp\!\left(-\sum_{p=1}^{P} x_{ip}\beta_p\right)\right\}^{-1}   (4)
or equivalently

\pi = \frac{\exp(\beta_0 + \beta_1 X_1 + \dots + \beta_p X_p)}{1 + \exp(\beta_0 + \beta_1 X_1 + \dots + \beta_p X_p)}.   (5)
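As a quick illustration of Equations 2 through 5, R's qlogis() and plogis() functions implement the logit link and its inverse; the coefficient values below are hypothetical.

    # The logit link (Equation 2) and its inverse (Equations 4-5).
    pi_hat <- 0.75
    eta <- qlogis(pi_hat)    # log(0.75 / 0.25) = 1.0986
    plogis(eta)              # recovers 0.75
    # Mapping a fitted linear predictor back to a probability,
    # with hypothetical coefficients b0 = -2 and b1 = 0.5 at x = 6:
    plogis(-2 + 0.5 * 6)     # about 0.731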
Obtaining the required P + 1 estimates of β requires maximum likelihood
estimation. An iterative process is used to find the coefficient values that are most
likely given the data, that is, the values that maximize the likelihood function

L(\beta \mid y) = \prod_{i=1}^{N} \frac{n_i!}{y_i!\,(n_i - y_i)!}\, \pi_i^{y_i} (1 - \pi_i)^{n_i - y_i}.   (6)
First and second derivatives of this function are required in order to determine the
maximizing estimates. The calculations are simplified after applying algebraic
manipulations to the function. First, we rewrite the term with the exponent (n_i − y_i) as a
quotient of powers, since subtracting exponents corresponds to dividing. Second, we
return to the generalized linear model for logistic regression, exponentiate both sides of
the equation, and solve for π. The result of exponentiating can be substituted into the
left-hand side of the likelihood equation, and the solution for π can be substituted into the
right-hand side.
This simplification contains terms that are powers of powers, so the equation can
be reduced further because a power raised to a power equals the base raised to the
product of the exponents. Finally, applying the natural logarithm yields a function that
readily permits calculation of first and second derivatives:

l(\beta) = \sum_{i=1}^{N}\left[ y_i\left(\sum_{p=1}^{P} x_{ip}\beta_p\right) - n_i \log\!\left(1 + \exp\!\left(\sum_{p=1}^{P} x_{ip}\beta_p\right)\right)\right]   (7)
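For readers who prefer code to algebra, the following is a direct R translation of Equation 7 for the common Bernoulli case (all n_i = 1); the function name and arguments are ours, not part of any package.

    # Log-likelihood of Equation 7 with n_i = 1. X is an N x (P + 1) design
    # matrix whose first column is all 1s; beta includes the intercept.
    loglik_logistic <- function(beta, X, y) {
      eta <- as.vector(X %*% beta)       # linear predictor for each observation
      sum(y * eta - log(1 + exp(eta)))   # Equation 7
    }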
Then, using the fact that the derivative of the linear predictor with respect to a single
coefficient is

\frac{\partial}{\partial \beta_p} \sum_{p=0}^{P} x_{ip} \beta_p = x_{ip},   (8)

differentiation of the log-likelihood leads to P + 1 equations that are solved for the β_p by
setting them equal to zero:
\frac{\partial l(\beta)}{\partial \beta_p} = \sum_{i=1}^{N}\left( y_i x_{ip} - n_i \pi_i x_{ip} \right)   (9)
This derivative of the log-likelihood with respect to the parameters is also known as the
score function (U), or the gradient of the log-likelihood. Next, differentiating a second
time,

\frac{\partial^2 l(\beta)}{\partial \beta_p \partial \beta_{p'}} = \frac{\partial}{\partial \beta_{p'}} \sum_{i=1}^{N}\left( y_i x_{ip} - n_i \pi_i x_{ip} \right),   (10)
leads to another set of equations to be solved for β:

\frac{\partial^2 l(\beta)}{\partial \beta_p \partial \beta_{p'}} = -\sum_{i=1}^{N} n_i x_{ip}\, \pi_i (1 - \pi_i)\, x_{ip'}.   (11)
During the iterative procedure, a maximum will be achieved if and when the
matrix of second derivatives (the Hessian) is negative definite. Note that the equation for
the second derivative includes the term π_i(1 − π_i). This is also the variance of a binomial
variable; hence, it is not a coincidence that the inverse of the negated matrix of second
derivatives (the information matrix) serves as the covariance matrix for the maximum
likelihood estimates.
Iteratively solving the equations is a tedious process. Ferron and Hess (2007)
provide an excellent didactic example of both maximum likelihood estimation and the
iterative procedure for structural equation models. Succinctly, starting values for the
estimates are declared, and each succeeding iteration uses the first two terms of a Taylor
expansion to produce estimates that move progressively closer to the maximizing
solution.
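A minimal Newton-Raphson sketch in R makes the iteration tangible: the score is Equation 9, the negated Hessian is Equation 11, and the inverse information supplies the standard errors discussed above. This is an illustration only; R's glm() performs the equivalent fitting via iteratively reweighted least squares, and all names below are ours.

    # Newton-Raphson for Bernoulli logistic regression (illustrative sketch).
    newton_logistic <- function(X, y, tol = 1e-8, max_iter = 25) {
      beta <- rep(0, ncol(X))                              # starting values
      for (iter in seq_len(max_iter)) {
        pi_hat <- plogis(as.vector(X %*% beta))            # current fitted probabilities
        score  <- t(X) %*% (y - pi_hat)                    # Equation 9
        info   <- t(X) %*% (X * (pi_hat * (1 - pi_hat)))   # information = -Hessian, Equation 11
        step   <- as.vector(solve(info, score))
        beta   <- beta + step
        if (max(abs(step)) < tol) break                    # converged
      }
      # The inverse information matrix estimates the covariance of beta.
      list(coef = beta, se = sqrt(diag(solve(info))))
    }

    # Check against glm() on simulated data:
    set.seed(1)
    x <- rnorm(200)
    y <- rbinom(200, 1, plogis(-0.5 + 1.2 * x))
    X <- cbind(1, x)
    newton_logistic(X, y)$coef
    coef(glm(y ~ x, family = binomial))   # should agree closely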
A problem can arise when maximizing estimates for data that suffer from
complete or quasi-complete separation (Albert & Anderson, 1984). This is to say that the
probability of y = 1 or y = 0 is nearly perfectly predictable from a predictor or set of
predictors (Webb, Wilson, & Chong, 2004). An extreme example of the separation issue
occurs when a 2 × 2 analysis has one cell containing all 0s or all 1s (Heinze & Ploner,
2003). The problem has been addressed in detail in the fields of medicine (Heinze, 2006)
and biometrics (Firth, 1993). However, no research was found that addressed the issue
using educational data.
Convergence for logistic regression models is affected by the configurations of
the observed values in the sample space (So, 1999; Santner & Duffy, 1986; Albert &
Anderson, 1984). This is simple to conceptualize by imagining a scatterplot relating y =
the probability of being in elementary school and x = subject age (ranging from 5 to 35).
There would obviously be two distinct clusters of observations; this is the configuration
of the space. Suppose that the resulting maximum likelihood estimate for the age
coefficient perfectly predicts being enrolled or not being enrolled in elementary school.
In fact, given perfect predictability, the log-likelihood can be driven all the way to zero
(a likelihood of one), and no finite maximum likelihood estimate exists. This is complete
separation, and it would be apparent in the data after sorting by age. In a scatterplot, a
line could be drawn between the two clusters; no observation belonging to one cluster
would appear either on the line or within the other cluster.
A more realistic scenario is one that distinguishes between borderline
observations. Due to month of birth differences, there is presumably an age at which only
some observations will be classified as enrolled in elementary school. The scatterplot
would still neatly divide the clusters; however, some observations would fall on the line
itself. In this case, a maximum likelihood estimate exists and is usually very close to zero.
However, the dispersion matrix will usually take on unrealistically large or small values.
In other words, without prior knowledge of reasonable coefficient and variance estimates,
the “successful convergence” may create an illusion of accurate results, leading to
incorrect inferences and interpretations, a problem we term quasi-nonconvergence.
If the researcher has an idea of what the coefficients and/or variances should be,
then the problem can be identified because the reported estimates may be absurd.
Variances tend to be extremely inflated (Webb, Wilson, & Chong, 2004), and odds ratios
are often infinite (Heinze, 2006). For example, SAS will provide an odds ratio estimate of
“>999.999”. Program logs and/or output sometimes provide a clue, although there is much
variability in how the clue is presented (Zorn, 2005). SAS reports:

Quasi-complete separation of data points detected. Warning: The maximum
likelihood estimate may not exist. Warning: The LOGISTIC procedure continues
in spite of the above warning. Results shown are based on the last maximum
likelihood iteration. Validity of the model fit is questionable.

SPSS reports, “Estimation terminated at iteration ___ because maximum iterations have
been reached. Final solution cannot be found.”
The R program reports, “fitted probabilities numerically 0 or 1 occurred in: ____,” which is
not to say that the observed responses for a variable were all 0s or 1s. Stata actually
eliminates variables under complete separation to produce a solution and fails to provide
any estimates under quasi-complete separation.
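The R warning above is easy to reproduce. In the sketch below, y is 1 exactly when x exceeds 3, so the toy data are completely separated; glm() returns output but issues the warning, and the coefficient and its standard error are absurdly large.

    # A tiny completely separated data set.
    x <- c(1, 2, 3, 4, 5, 6)
    y <- c(0, 0, 0, 1, 1, 1)
    fit <- glm(y ~ x, family = binomial)
    # Warnings include:
    #   glm.fit: fitted probabilities numerically 0 or 1 occurred
    summary(fit)$coefficients   # huge estimate and standard error for x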
Researchers encountering complete or quasi-complete separation tend to take
arbitrary steps to eliminate the problem. The simplest solution is to delete the predictor
that is responsible for the separation. This assumes that the problem is due to a particular
predictor and not a linear combination (Heinze, 2006). Some researchers insert artificial
data to eliminate separation. Another possibility is to use exact logistic regression as
proposed by Cox (1970); however, this procedure leads to degenerate estimates when
some or all of the predictors are continuous (Heinze & Schemper, 2002). Zorn (2005)
presents many examples of published research that have attempted to resolve separation
issues using such procedures; the procedures have been reiterated by others (Heinze,
2006; Heinze & Schemper, 2002). A more appropriate, and more recent, procedure is
penalized maximum likelihood. The procedure is based on otherwise unrelated research
by Firth (1993) and was applied to resolving separation issues in logistic regression by
Heinze and Ploner (2003).
It has long been known that maximum likelihood logistic regression estimates are
biased. Firth (1993) cites findings of 3.4% asymptotic bias away from the true
value (Copas, 1988). Hence, Firth (1993) proceeded to construct a “penalized likelihood”
correction. This correction is a shrinkage quantity intended to remove the bias. The
correction, known in the Bayesian literature as the Jeffreys invariant prior (Zorn, 2005),
adds one-half of the natural logarithm of the determinant of the information matrix to the
log-likelihood while concurrently adjusting the score function. As the score approaches
zero over the iterations, the adjustment serves to counteract the bias. The resulting
penalized likelihood, log-likelihood, and score equations are displayed below.

L(\beta)^* = L(\beta)\, |I(\beta)|^{1/2}   (12)

\ln L(\beta)^* = \ln L(\beta) + 0.5 \ln |I(\beta)|   (13)

U(\beta_p)^* = U(\beta_p) + 0.5\, \mathrm{tr}\!\left[ I(\beta)^{-1} \frac{\partial I(\beta)}{\partial \beta_p} \right]   (14)
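For logistic regression specifically, the trace adjustment in Equation 14 reduces to adding h_i(1/2 − π_i) to each score contribution, where h_i is the ith diagonal element of the hat matrix H = W^{1/2}X(X'WX)^{-1}X'W^{1/2} (Heinze & Schemper, 2002). The R sketch below is our own minimal implementation of that form, not the authors' macro; the logistf package provides a full implementation.

    # Minimal Firth-penalized Newton iteration for Bernoulli logistic regression.
    firth_logistic <- function(X, y, tol = 1e-8, max_iter = 50) {
      beta <- rep(0, ncol(X))
      for (iter in seq_len(max_iter)) {
        pi_hat <- plogis(as.vector(X %*% beta))
        w  <- pi_hat * (1 - pi_hat)                       # binomial variance weights
        XW <- X * sqrt(w)                                 # W^{1/2} X
        h  <- diag(XW %*% solve(t(XW) %*% XW, t(XW)))     # hat-matrix diagonal
        score <- t(X) %*% (y - pi_hat + h * (0.5 - pi_hat))  # penalized score
        info  <- t(X) %*% (X * w)                         # information matrix
        step  <- as.vector(solve(info, score))
        beta  <- beta + step
        if (max(abs(step)) < tol) break
      }
      beta
    }

Run on the separated toy data shown earlier, this sketch returns finite coefficients where ordinary maximum likelihood diverges.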
Heinze and Ploner (2003) found that this correction also eliminates what they
term the “nonconvergence bug” due to separation issues. They noted that the correction
will always provide a finite solution in the presence of separation. Heinze and Schemper
(2002) found 97.5% coverage for the confidence intervals as well as high power. Zorn
(2005) noted that “because they are shrunken towards zero, penalized-likelihood
estimates will typically be smaller in absolute value than standard MLEs, though their
standard errors will also be reduced, yielding similar inferences about the significance of
parameter estimates for those parameters whose MLE is finite” (p. 160).
This is not to say that penalized maximum likelihood is a panacea for separation
issues. Caution is advised when constructing confidence intervals for the odds ratio: the
penalized likelihood can be asymmetric, leading to inappropriate coverage for traditional
Wald intervals. In that case, the analyst should construct profile likelihood intervals for
the penalized estimates (Heinze, 2006; Heinze & Ploner, 2003; Zorn, 2005). Another
caution is not to ignore other modeling problems; for example, penalized likelihood does
not resolve issues of multicollinearity (Heinze & Schemper, 2002).
The primary difference between Firth's correction and those of others (Rubin &
Schenker, 1987; Clogg, Rubin, Schenker, Schultz, & Weidman, 1991) is that the other
procedures are not iterative and hence do not apply the correction over the course of
maximization. Heinze and Ploner (2004) wrote SAS, S-PLUS, and R macros to implement
penalized maximum likelihood estimation; the code is freely available at
http://www.meduniwien.ac.at/msi/biometrie/programme/fl/. This code also produces
figures to help determine whether Wald or profile-based confidence intervals are more
appropriate.
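In current R, the most direct route is the logistf package, which to our knowledge implements Heinze and Ploner's penalized approach; the formula and data frame names below are hypothetical.

    # Penalized (Firth-type) logistic regression with the logistf package.
    # install.packages("logistf")
    library(logistf)
    fit <- logistf(y ~ x1 + x2, data = some_data)   # some_data is hypothetical
    summary(fit)   # penalized estimates with profile-likelihood confidence intervals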
Applied Example
The data for this example were extracted from the Fall 2005 administration of the
Internet Access in U.S. Public Schools survey (NCES, 2005). The survey was developed
to measure the extent of Internet usage and aspects related to Internet usage, such as the
type of connection, control of content access, and integration into the curriculum.
Responses were obtained from 1,104 school technology coordinators (or the staff
members most knowledgeable about school Internet access). For the purposes of this
example, listwise deletion was used to remove all observations with missing data on one
or more of the variables used as predictors or responses, resulting in an analyzed sample
size of 822. Further listwise deletions were made in order to produce quasi-complete
separation, reducing the sample size to 814.
The binary response variable was item Q7DA: Students with Disabilities (0 = No
Access to Internet, 1 = Access to Internet). There were three binary predictors:
NSLEVEL: School Level (0 = Secondary, 1 = Elementary); Q9AB: What technologies or
other procedures does your school use to prevent student access to inappropriate material
on the Internet? – Intranet (0 = No, 1 = Yes); and Q10: Does your school allow students
to access instructional computers with Internet access at times other than regular school
hours? (0 = No, 1 = Yes). In addition, there were three continuous predictors: Q4A: How
many computers in your school currently have Internet access?; PCTMIN: percent
minority enrollment; and FLEP: percent of students eligible for free or reduced-price
school lunch. The adjusted variable was NSLEVEL, such that the probability of a
student having Internet access was perfectly predictable for elementary schools but not
for secondary schools. In other words, for elementary schools, all students with
disabilities had Internet access; for secondary schools, not all students with disabilities
had Internet access. If perfect prediction had been obtained for secondary schools as
well, the condition would have been complete, not quasi-complete, separation.
Logistic regression was used to analyze these data with SAS PROC LOGISTIC.
Table 1 displays the portion of the SAS output containing the warning of quasi-complete
separation. Despite the warning, a maximum likelihood solution appears, at first glance,
to exist. Further, the maximum likelihood estimates appear reasonable until one inspects
the standard errors and notices the value of 100.8 for the school-level (NSLEVEL)
variable, which is also the standard error for the intercept. These estimates are displayed
in Table 2.
The problem becomes transparent when inspecting the maximum likelihood estimates
of the odds ratios. The school-level variable has a reported odds ratio estimate of <0.001
with a 95% Wald confidence interval of (<0.001, >999.999). Given the warning and the
unreasonable odds ratio estimate, we conclude quasi-nonconvergence for this solution.
These results are displayed in Table 3.
The data were analyzed again using the penalized log-likelihood macro in SAS.
This time there were no nonconvergence warnings. As seen in Table 4, the standard
errors for the intercept and the school-level (NSLEVEL) variable are more reasonable.
Table 5 suggests that the odds of students with disabilities having Internet access are
42.415 times higher for elementary schools than for secondary schools. This is a
considerably more reasonable estimate than the previously obtained estimate of <0.001.
It is also interesting to note that maximum likelihood produced a p-value for school-level
of 0.929, which strongly suggests a lack of support for this variable as a predictor.
However, the penalized maximum likelihood results indicate that this variable has a very
small p-value of 0.007. This tells us that quasi-nonconvergence should not be ignored.
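Although the analysis above was run in SAS, a hypothetical R parallel of the same two-step comparison would look like the sketch below, using the variable names from the text; the data frame nces is assumed, and this is not the authors' code.

    # Step 1: ordinary ML fit, which triggers the separation warning.
    fit_ml <- glm(Q7DA ~ NSLEVEL + Q9AB + Q10 + Q4A + PCTMIN + FLEP,
                  family = binomial, data = nces)
    # Step 2: penalized ML fit, which yields finite, reasonable estimates.
    library(logistf)
    fit_fl <- logistf(Q7DA ~ NSLEVEL + Q9AB + Q10 + Q4A + PCTMIN + FLEP,
                      data = nces)
    exp(coef(fit_fl))   # penalized odds ratios, comparable to Table 5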
Conclusion
The primary purpose of this article is to inform applied researchers about a
technique for handling a common problem that occurs when analyzing binary response
variables. Logistic regression was reviewed from the perspective of generalized linear
models, followed by an extensive treatment of maximum likelihood estimation.
Not all logistic regression models truly converge. Quasi-nonconvergence can arise
due to complete or quasi-complete separation of data in the configuration space. Much
previous research has addressed the problem by eliminating the culprit variables,
transforming the data, and even inserting artificial data. We also suspect that some
research has not addressed the problem at all, since statistical software does not always
provide a clear indication of the problem or how to handle it.
Firth (1993) proposed penalized likelihood as a mechanism for correcting bias
inherent in logistic regression. Heinze and Ploner (2003) extended penalized likelihood
estimation to resolve nonconvergence in logistic regression due to separation issues.
Their research demonstrated adequate coverage and power for these odds ratio estimates,
which always have a finite solution.
It is intended that researchers will become more aware of the potential for
quasi-nonconvergence in logistic regression, which may otherwise go unnoticed.
Convergence of the overall maximum likelihood estimation does not guarantee
convergence of each parameter estimate, and the consequences of reporting such results
may be incorrect or even nonsensical inferences. Researchers are encouraged to obtain
the macro for penalized maximum likelihood and apply it when faced with complete or
quasi-complete data separation.
References
Agresti, A. (1996). An introduction to categorical data analysis. New York: John Wiley
and Sons.
Albert, A., & Anderson, J. A. (1984). On the existence of maximum likelihood estimates
in logistic regression models. Biometrika, 71, 1-10.
Clogg, C. C., Rubin, D. B., Schenker, N., Schultz, B., & Weidman, L. (1991). Multiple
imputation of industry and occupation codes in census public-use samples using
Bayesian logistic regression. Journal of the American Statistical Association, 86,
68-78.
Copas, J. B. (1988). Binary regression models for contaminated data (with discussion).
Journal of the Royal Statistical Society: Series B, 50, 225-265.
Cox, D. (1970). Analysis of binary data. New York: Wiley.
Firth, D. (1993). Bias reduction of maximum likelihood estimates. Biometrika, 80, 27-38.
Hair, J. F., Black, W. C., Babin, B. J., Anderson, R. E., & Tatham, R. L. (2006).
Multivariate data analysis (6th ed.). Upper Saddle River, NJ: Pearson Education.
Heinze, G. (2006). A comparative investigation of methods for logistic regression with
separated or nearly separated data. Statistics in Medicine, 25, 4216-4226.
Heinze, G., & Ploner, M. (2003). Fixing the nonconvergence bug in logistic regression
with SPLUS and SAS. Computer Methods and Programs in Biomedicine, 71,
181-187.
Heinze, G., & Ploner, M. (2004). A SAS macro, S-PLUS library, and R package to
perform logistic regression without convergence problems (Technical Report
2/2004). Vienna: Medical University of Vienna, Department of Computer
Sciences, Section of Clinical Biometrics.
Heinze, G., & Schemper, M. (2002). A solution to the problem of separation in logistic
regression. Statistics in Medicine, 21, 2409-2419.
Kimberly, L. (2007). Who's skipping school: Characteristics of truants in 8th and 10th
grade. Journal of School Health, 77, 29-35.
McCullagh, P., & Nelder, J. A. (1989). Generalized linear models. London: Chapman
and Hall.
Nelder, J. A., & Wedderburn, R. W. M. (1972). Generalized linear models. Journal of
the Royal Statistical Society: Series A, 135, 370-384.
Rubin, D. B., & Schenker, N. (1987). Logit-based interval estimation for binomial data
using the Jeffreys prior. In C. C. Clogg (Ed.), Sociological methodology 1987
(pp. 131-144). Washington, DC: American Sociological Association.
Santner, T. J., & Duffy, E. D. (1986). A note on A. Albert and J. A. Anderson's
conditions for the existence of maximum likelihood estimates in logistic
regression models. Biometrika, 73, 755-758.
So, Y. (1999). A tutorial on logistic regression (SAS Technical Report). Cary, NC: SAS
Institute.
Suh, S., Suh, J., & Houston, I. (2007). Predictors of categorical at-risk high school
dropouts. Journal of Counseling & Development, 85, 196-203.
Webb, M. C., Wilson, J. R., & Chong, J. (2004). An analysis of quasi-complete binary
data with logistic models: Applications to alcohol abuse data. Journal of Data
Science, 2, 273-285.
Wohlgemuth, D., Whalen, D., Sullivan, J., Nading, C., Mack, S., & Wang, Y. (2007).
Financial, academic, and environmental influences on the retention and
graduation of students. Journal of College Student Retention: Research, Theory
& Practice, 8, 457-475.
Zorn, C. (2005). A solution to separation in binary response models. Political Analysis,
13, 151-170.
Table 1
SAS Output Displaying Warning of Quasi-Complete Separation

Model Convergence Status
Quasi-complete separation of data points detected.

Warning: The maximum likelihood estimate may not exist.
Warning: The LOGISTIC procedure continues in spite of the above warning.
Results shown are based on the last maximum likelihood iteration.
Warning: The validity of the model fit is questionable.
Table 2
SAS Output Displaying ML Logit Estimates

Parameter    DF   Estimate   Standard Error   Chi-Square   Pr > ChiSq
Intercept     1     8.9894         100.8000       0.0080       0.9290
NSLEVEL 0     1    -6.5541         100.8000       0.0042       0.0948
Q9AB 0        1    -0.1131           0.2142       0.2788       0.5975
Q10 0         1    -0.2481           0.1792       1.9175       0.1661
Q4A           1     0.0125           0.0043       8.2648       0.0040
PCTMIN        1     0.0012           0.0070       0.0299       0.8627
FLEP          1    -0.0179           0.0094       3.6856       0.0549
Table 3
SAS Output Displaying ML Odds Ratio Estimates

Effect            Point Estimate   95% Wald Confidence Limits
NSLEVEL 0 vs 1        <0.001           <0.001     >999.999
Q9AB 0 vs 1            0.798            0.345        1.847
Q10 0 vs 1             0.609            0.302        1.229
Q4A                    1.013            1.004        1.021
PCTMIN                 1.001            0.988        1.015
FLEP                   0.982            0.964        1.000
Table 4
SAS Output Displaying Penalized ML Logit Estimates

FL estimates and Wald confidence limits and tests
NOTE: Confidence interval for Intercept based on Wald method.

Variable    Parameter Estimate   Standard Error   Lower 95% c.l.   Upper 95% c.l.   Pr > Chi-Square
INTERCEP          2.812               0.382            2.063            3.561            <.0001
NSLEVEL           3.748               1.400            1.012            6.483             0.007
Q9AB              0.457               0.400           -0.328            1.242             0.254
Q4A              -0.000               0.000           -0.001            0.000             0.557
PCTMIN            0.001               0.007           -0.013            0.014             0.910
FLEP             -0.021               0.009           -0.038           -0.004             0.019
Table 5
SAS Output Displaying Penalized ML Odds Ratio Estimates

Variable    Odds Ratio Estimate   Lower 95% c.l.   Upper 95% c.l.   Pr > Chi-Square
NSLEVEL           42.415               2.751           653.914            0.007
Q9AB               1.579               0.720             3.462            0.254
Q4A                0.999               0.999             1.000            0.556
PCTMIN             1.009               0.988             1.014            0.910
FLEP               0.979               0.963             0.997            0.019
0.019