The Ridge Regression Estimated Linear Probability Model: A Report of Recent Work
Introduction
The linear probability model (LPM) is Y = Xb + u, where Y takes only the values 0
and 1 (Goldberger 1964). Models with such a regressand are usually estimated by logit or probit
analysis. The LPM, which is sufficiently well known to econometricians and statisticians, is
discussed in many standard texts on econometrics (e.g., Judge, Hill, Griffiths & Lee 1985;
Maddala 1992; Gujarati 1995). The LPM is a heteroscedastic model: since E(u) = 0, each ui has
variance E(Yi)(1 − E(Yi)). Goldberger suggested estimating E(Yi) by ordinary least squares (OLS)
and then re-estimating the model by weighted least squares (WLS) to achieve homoscedasticity.
Goldberger’s LPM estimator is consistent (McGillivray 1970), and the problem of obtaining
negative residual variances is not an asymptotic one (Amemiya 1977) but a finite-sample one,
thus hindering empirical work.
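Goldberger’s two-step procedure sketches easily. The code below is an illustrative reconstruction, not code from any of the cited papers; the clipping of fitted probabilities to (0, 1) is one of the ad-hoc fixes mentioned in the next paragraph, applied here only so the weights stay defined:

```python
import numpy as np

def goldberger_wls(X, y, eps=1e-6):
    """Two-step LPM estimator in the spirit of Goldberger (1964).

    Step 1: OLS gives fitted probabilities p_i = x_i b.
    Step 2: WLS with weights 1 / (p_i (1 - p_i)), since each
    residual u_i has variance E(Y_i)(1 - E(Y_i)).
    """
    # Step 1: ordinary least squares.
    b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    p = X @ b_ols
    # Negative "variances" p(1 - p) can occur in finite samples; clip
    # the fitted probabilities (an ad-hoc fix) to keep weights defined.
    p = np.clip(p, eps, 1 - eps)
    w = 1.0 / (p * (1.0 - p))
    # Step 2: weighted least squares via the weighted normal equations.
    XtW = X.T * w
    return np.linalg.solve(XtW @ X, XtW @ y)
```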
So, the classic problem with the LPM is that the least squares estimator of b cannot
guarantee that the LPM predictions, which represent conditional probabilities, will lie between 0
and 1. This problem has made the LPM, despite its simplicity, unfashionable. However, over the
years, several researchers have been attracted to the LPM and have proposed interesting methods
to resolve this problem (see Judge et al.; Mullahy 1990). One ad-hoc method simply sets LPM
predictions greater than 1 to a number close to 1 (such as 0.999) and negative LPM predictions to
a number close to 0 (such as 0.001). Another ad-hoc method uses the absolute values of the OLS
estimated residual variances to do the WLS estimation. Goldfeld and Quandt (1972) proposed
only using those observations having OLS estimates between 0 and 1 to do the WLS estimation.
In another approach, the sum of squared errors is minimized subject to the constraints 0 ≤ Xb ≤ 1
(see Judge & Takayama 1966). Hensher and Johnson (1981) proposed bounding the weights and
assigning negative weights a constant value. More recently, Mullahy proposed a quasi-generalized
least squares estimator which is a generalization of the Goldfeld-Quandt and Hensher-Johnson
estimators.
In the spirit of this tradition, ridge regression (RR) estimation (Hoerl & Kennard 1970;
1990) of the LPM was also proposed (Gana 1995). A detailed account of RR is provided by
Vinod and Ullah (1981) and, more recently, by Gruber (1998). The RR estimator of b, bR, is
given by (XᵀX + kI)⁻¹XᵀY, where k ≥ 0 is the smallest constant for which all of the resultant LPM
predictions, XbR, lie between 0 and 1. The classical bisection method can be used to calculate
such a value of k. Next, WLS is used to re-estimate b by using the weights XbR(1 − XbR). If
any of the resultant WLS estimated LPM predictions fall outside the 0-1 interval, then RR is used
Author: Rajaram Gana
Occasion: Invited Presentation, The Philadelphia Chapter of the American Statistical Association Meeting
Date: March 16, 1999, at The Wharton School, University of Pennsylvania, Philadelphia, PA
again (this is “weighted” RR) to resolve the problem as before (Gana 1996); let bWR denote this
estimate of b.
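The ridge step and the bisection search for the smallest admissible k might be sketched as follows. This is a hypothetical implementation, not code prescribed by Gana (1995); in particular, it assumes (as the method does) that an admissible k exists, and it simply brackets one by doubling before bisecting:

```python
import numpy as np

def ridge_lpm(X, y, tol=1e-8, max_doublings=60):
    """Sketch: find the smallest k >= 0 for which every ridge LPM
    prediction X b_R(k) lies in [0, 1], via classical bisection."""
    p = X.shape[1]
    XtX, Xty = X.T @ X, X.T @ y

    def b_ridge(k):
        # b_R = (X'X + kI)^-1 X'Y
        return np.linalg.solve(XtX + k * np.eye(p), Xty)

    def admissible(k):
        pred = X @ b_ridge(k)
        return pred.min() >= 0.0 and pred.max() <= 1.0

    if admissible(0.0):                  # least squares already works
        return b_ridge(0.0), 0.0
    k_hi, n = 1.0, 0
    while not admissible(k_hi):          # grow an upper bracket
        k_hi *= 2.0
        n += 1
        if n > max_doublings:
            raise RuntimeError("no admissible k found")
    k_lo = 0.0
    while k_hi - k_lo > tol:             # bisect; k_hi stays admissible
        k_mid = 0.5 * (k_lo + k_hi)
        k_lo, k_hi = (k_lo, k_mid) if admissible(k_mid) else (k_mid, k_hi)
    return b_ridge(k_hi), k_hi
```

The WLS re-estimation step would then use the fitted probabilities XbR to form the weights.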
The conventional use of RR is to resolve the problem of multicollinearity in the usual
linear model (i.e., where Y is a continuous regressand). Furthermore, Brook and Moore (1980)
have shown that the least squares estimated coefficient vector is much too long on average.
Hence, some shrinkage of the least squares coefficient vector may, in general, be desirable.
Obenchain (1977) has shown that the RR estimator yields the same t-statistics and F-ratios as
does the classical least squares estimator. Saccucci (1985) showed that the RR estimator is also
robust to the effects of outliers under the assumption that the usual linear model has a
multicollinearity problem. Frank and Friedman (1993) have shown that RR employs optimal
linear shrinkage of the coefficient vector as well.
It is interesting to note that when Saccucci’s thesis is applied to the case of the LPM there
is an indication that the RR estimated LPM will be robust to the effects of outliers as well (Gana
1996 – when my 1995 paper was completed, I was not aware of Saccucci’s work). This is
interesting because RR estimation of the LPM does not necessarily require invoking the
assumption of multicollinearity to justify its application. Furthermore, this may be useful in
applied work with 0-1 dummy regressands because logit models, for example, are sensitive to
outliers (see Pregibon 1981). Since Saccucci’s work is unpublished, it is reviewed in this report
as an addition to the brief history, presented here, of the flow of ideas that have influenced me
and, thus, allowed me to lay the foundations for the doctoral work of John Monyak done during
the years 1996-1998 at the University of Delaware (UD). Finally, with hindsight, it is easy to
wonder why using RR to estimate the LPM was not thought of in the 1970s, after the invention of
RR.
On Michael Saccucci’s Thesis
Little work has been done on the impact of outliers on RR (Saccucci 1985; Walker &
Birch 1988; Chalton & Troskie 1992). Saccucci considered the case of variance inflated outliers
(VIOs) in the usual linear model where Y is a continuous regressand. A VIO is an observation
whose residual variance is σ²w, where w > 1 is a constant. Saccucci assumed that, given n
observations, m of them are VIOs, each with residual variance σ²w. He assumed that the
remaining n − m observations each have residual variance σ². Let Xm denote the sub-matrix of X
containing the VIOs.
Saccucci showed that the mean square error (MSE) of the RR estimated b under this
assumption of VIOs is equal to the MSE of the RR estimated b under the assumption of no
outliers plus σ²(w − 1) times the sum of the diagonal elements of the following matrix:
(XᵀX + kI)⁻¹XmᵀXm(XᵀX + kI)⁻¹
where, as usual, I denotes the identity matrix, ᵀ denotes the transpose operator, and k ≥ 0 is a
constant. Saccucci showed that this matrix (which carries the additional MSE for the RR
estimator) decreases monotonically with k. He showed that there always exists a k > 0 for which
the MSE of the RR estimated b under his assumption of VIOs, MSE(bR | VIOs), say, is less than
the MSE of the least squares estimated b, bLS, under his assumption of VIOs, MSE(bLS | VIOs),
say. That is, in symbols we have:
∃ k > 0 ∋ MSE(bR | VIOs) < MSE(bLS | VIOs)
This can be viewed as a generalization of the original existence theorem of Hoerl and Kennard
(1970). Saccucci used simulation to show that his result holds (with probability > 0.5) for the
values of k selected using the algorithms proposed by Lawless and Wang (1976), Hoerl, Kennard
and Baldwin (1975), Hoerl and Kennard (1976), and Dempster, Schatzoff and Wermuth (1977).
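The additional-MSE matrix and its monotone decrease in k are easy to check numerically. The sketch below is illustrative only (the test data are random, not Saccucci’s six X matrices); it computes the trace of the matrix above:

```python
import numpy as np

def additional_mse_trace(X, Xm, k):
    """Trace of (X'X + kI)^-1 Xm'Xm (X'X + kI)^-1: the extra MSE,
    up to the factor sigma^2 (w - 1), that m VIOs (rows Xm) add to the
    ridge estimator with biasing constant k (Saccucci 1985)."""
    p = X.shape[1]
    A = np.linalg.inv(X.T @ X + k * np.eye(p))
    return float(np.trace(A @ (Xm.T @ Xm) @ A))
```

Evaluating this trace on a grid of k values shows the monotone decrease Saccucci proved.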
Saccucci allowed w to take the integer values 1 through 10, and m the values 1, 2 and 3. Saccucci
used six known X matrices for his simulation. Five of these were obtained from real data, and the
other was artificially created to illustrate the effects of multicollinearity on bR. The dimensions
(i.e., number of regressors × number of observations) of these X matrices are 3 × 10, 4 × 13,
7 × 20, 6 × 16, 10 × 36 and 19 × 36. Saccucci generated the true bi values using the formula:
bi = R ui / √(Σj uj²)
where i indexes the coefficients, ui is a random uniform number on the interval [−0.5, +0.5], and
R is the pre-selected length of the coefficient vector (i.e., R² = bᵀb). Saccucci used the following
values of R²: 10.0, 15.8, 25.1, 39.8, 63.1, 100, 158, 251, 398, 631, 1000, 1580, 2510, 3980, 6310,
and 10000. The bi values generated are pairwise uncorrelated and create an approximate, but not
exact, uniform distribution of the vector b over the hypersphere of radius R centered at the origin
(see Lawless and Wang). For each combination of factors over the six data sets, he generated 500
regression simulations.
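The coefficient generator is straightforward to reproduce; by construction bᵀb = R² exactly. The function below is a small sketch in that spirit, not Saccucci’s original code:

```python
import numpy as np

def generate_b(p, R, rng):
    """Draw a length-p coefficient vector with b'b = R^2 exactly:
    b_i = R u_i / sqrt(sum_j u_j^2), u_j ~ Uniform(-0.5, +0.5),
    as in Lawless and Wang (1976)."""
    u = rng.uniform(-0.5, 0.5, size=p)
    return R * u / np.sqrt(np.sum(u ** 2))
```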
At this stage of the analysis it is easy to see, as Saccucci did, that embedded in the
additional MSE term is a diagnostic to flag outliers. The additional MSE for an incremental
increase in the variance of observation i is given by σ²xiᵀ(XᵀX)⁻²xi, where the vector xi denotes
row i of X. Saccucci’s diagnostic is xiᵀ(XᵀX)⁻²xi. He indicated, by example, that his diagnostic
(which is related to Cook’s (1977) distance) can flag an outlier which remains undetected by
Cook’s distance. Saccucci ended his dissertation by suggesting that his results could be extended
to the case of m VIOs with distinct variances (i.e., variances of the form σ²wi, where wi > 1),
and that the RR estimator would exhibit similar MSE properties for other types of outliers.
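The diagnostic can be computed for all rows at once. The sketch below is an illustration of the quantity xiᵀ(XᵀX)⁻²xi, not Saccucci’s implementation:

```python
import numpy as np

def vio_diagnostic(X):
    """Saccucci's outlier diagnostic x_i' (X'X)^-2 x_i for each row i:
    the extra least squares MSE, per unit of variance inflation at
    observation i, up to the factor sigma^2."""
    G2 = np.linalg.matrix_power(np.linalg.inv(X.T @ X), 2)
    # Row-wise quadratic forms x_i' G2 x_i in one pass.
    return np.einsum('ij,jk,ik->i', X, G2, X)
```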
The Next Step
Let us now look at Saccucci’s work in a new way and, thereby, connect it to the idea of
the RR estimated LPM. Each residual, ui, under the LPM has variance xib(1 − xib). Hence, the
LPM can be viewed as a linear model with observations having distinct variances, in Saccucci’s
sense. Therefore, it is not unreasonable to conjecture the following existence property:
∃ k > 0 ∋ MSE(bR | LPM) < MSE(bLS | LPM)
where MSE(· | LPM) denotes the MSE of “·” under the LPM assumption. Furthermore, if we
can show that this result also holds for the proposed value of k (i.e., the smallest value of k > 0
for which all of the LPM predictions are between 0 and 1), then we have a stronger case for
considering the RR estimated LPM to have some measure of usefulness. Now, Theobald (1974)
had shown that the RR estimator can also improve prediction properties. Thus, we are led to
conjecture the following property:
∃ k > 0 ∋ MSE(XbR | LPM) < MSE(XbLS | LPM)
Again, this result will be stronger if we can also show that it holds for the proposed value of k.
Investigating these conjectures requires the use of standard results in matrix theory (see,
for example, the text of Rao & Toutenburg 1995) and the use of simulation in the spirit of Hoerl,
Schuenemeyer, and Hoerl (1984). And interestingly enough, even though the proofs use some
non-elementary ideas, they are, like the idea of the LPM itself, simple in style and in execution.
Most of the statistical computing involved can be done using “proc iml” in the SAS System (SAS
Institute, Inc.). The last page of this presentation includes a SAS macro that I wrote to compute
bR without having to invoke “proc iml”. In early 1996 (more than 30 years after Goldberger’s
LPM proposition!), my ideas outlined in this section were imparted to John Monyak, who was
then a doctoral student at UD in search of a thesis topic (Professor John Schuenemeyer of UD,
whom I have known for many years, introduced me to him). He found these ideas interesting and
decided to take up the task of demonstrating their worth.
On John Monyak’s Thesis
Using matrix algebra, Monyak showed that the conjectures outlined above are true. In
the spirit of McGillivray, he showed that the RR estimated LPM is consistent as well. His
simulation results indicate that the RR estimated LPM is superior to the least squares estimated
LPM both in terms of coefficient and prediction MSEs. Monyak’s simulation results indicate that
the best improvement in coefficient and prediction MSEs is achieved by bR (closely followed by
bWR ). This is interesting because it indicates that solving for bR resolves the twin problems of
heteroscedasticity and getting the predictions to lie in the range 0-1.
For his simulation, Monyak used two known X matrices. One is the classic data set of
Spector and Mazzeo (1980) which has 3 regressors and 32 observations. The other is a
modification of an X matrix used by Hoerl, Schuenemeyer and Hoerl, and has 5 regressors and 36
observations. To simulate the large sample case, the sample sizes of the two data sets were
increased (as in Hoerl, Schuenemeyer & Hoerl 1984) to 200 without changing the correlation
structure of XᵀX. The condition number of XᵀX took the values 1 (no multicollinearity), 1,000
(“medium” multicollinearity), and 10,000 (“high” multicollinearity). Monyak also used “low”
and “high” levels of heteroscedasticity. For the “low” level he generated b vectors such that
0.1 ≤ Xb ≤ 0.9. For the “high” level he generated b vectors in the usual manner so that
0 ≤ Xb ≤ 1. Due to computing constraints in generating the b vectors, Monyak limited the
number of regressors to five. For each scenario, 1,000 b vectors were generated. Since there are
24 combinations (2 × 2 × 3 × 2) of these factors, he generated 24,000 b vectors in all.
While estimating bWR using bR, values of XbR that are close to 0 or 1 will tend to
produce large weights. This could lead to large values of k when estimating bWR. So, such a
point was deleted before solving for bWR (Gana 1996). Monyak noticed this phenomenon in his
simulation runs. He noted that the problem was more pronounced when the observation was a
high leverage point (RR changes the leverage of points relative to least squares). Instead of
deleting such a point, Monyak set an upper bound of 100 on the weights (in the spirit of Hensher
and Johnson). He then calculated bWR. Some of his results for the proposed value of k are stated
next. MSE(bR | LPM) and MSE(bWR | LPM) are 77.5% (standard error = 3.6%) and 80.2%
(standard error = 3.9%) of MSE(bLS | LPM), respectively. MSE(XbR | LPM) and
MSE(XbWR | LPM) are 87.5% (standard error = 1.8%) and 87.8% (standard error = 2.0%) of
MSE(XbLS | LPM), respectively. MSE(bR | LPM) < MSE(bLS | LPM) 60.5% of the time, and
MSE(XbR | LPM) < MSE(XbLS | LPM) 66.0% of the time. Hence, the probability that the RR
estimated LPM improves upon least squares is greater than 0.5.
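Monyak’s capped-weight WLS step might look like the following. This is a hypothetical sketch: it assumes inverse-variance weights 1/[p(1 − p)] formed from first-stage fitted probabilities and the bound of 100 described above:

```python
import numpy as np

def wls_capped(X, y, p_hat, cap=100.0):
    """WLS re-estimation with weights 1/(p(1-p)) bounded above by
    `cap`, in the spirit of Monyak's variant of the Hensher-Johnson
    idea. `p_hat` holds first-stage (e.g. ridge) fitted probabilities
    strictly inside (0, 1)."""
    w = 1.0 / (p_hat * (1.0 - p_hat))
    w = np.minimum(w, cap)   # bound the influence of p_hat near 0 or 1
    XtW = X.T * w
    return np.linalg.solve(XtW @ X, XtW @ y)
```

With p_hat = 0.001, the raw weight would be about 1001; the cap replaces it with 100, so a single near-boundary point cannot dominate the fit.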
When the RR estimated LPM is compared with some of the other proposed LPM
estimators (like the ad-hoc, Goldfeld & Quandt, and Mullahy estimators), Monyak’s simulation
results indicate that its superiority, in terms of MSEs, continues to hold. The Goldfeld-Quandt
and Mullahy methods produce coefficient and prediction vector MSEs that are about 118% each,
of the corresponding least squares MSE values. The first ad-hoc method (rounding predictions to
0.999 or 0.001) produces coefficient and prediction MSEs of 98.3% and 98.6% of the
corresponding least squares MSE values, respectively. Least squares produces predictions
outside [0, 1] 42% of the time.
LPM versus Logit and Probit Models: Some Empirical Results
It is natural to ask how the RR estimated LPM compares with logit and probit models
when doing applied work. Three empirical studies (Trusheim & Gana 1994; Gana & Trusheim
1995; Gana & Rossi 1998) have addressed this question. Two of these studies are discussed next.
Two data sets were modeled in the study of Gana and Trusheim. The first set consisted
of data on 296 freshmen who were offered admission to UD for Fall 1991. Of the 296, 128
students enrolled at the University. The aim was to model the college selection process, which is
a complex one. The following regressors were used: SAT score, high school GPA, parental
income, ethnicity (White, Black, or other), number of colleges applied to, type of high school
attended (public, independent non-religious, independent Catholic, or other), and a UD “attitude”
score (which is a composite score developed from students’ rating of some 20 college
characteristics as “very important”, “somewhat important”, or “not important”). Assuming that
estimated probabilities greater than 0.43 (128/296) predict an enrolling student (although such
cutoff probabilities are arbitrary), we found that there are virtually no differences between logit,
probit, and the RR estimated LPM.
The second data set consisted of data on 3,215 first-time UD freshmen in Fall 1993 and
their retention to the sophomore year. Of these freshmen, 2,834, or about 88%, returned to UD
for their second year. The aim was to model the probability of retention to the second year.
Clearly, here the distribution of the dependent variable is much more skewed than the distribution
of the dependent variable in the first data set. The following regressors were used: academic
probation status (a 0-1 dummy variable), GPA, ethnicity, gender, and deficit points (a number
between 0 and 30 which students start to receive if their GPA falls below 2.0). Assuming that
estimated probabilities greater than or equal to 0.88 predict retention to the second year, we found
that the RR estimated LPM produced errors (i.e., false positives and negatives) of 17.7%, while
logit and probit models produced errors of 15.4% and 16.6%, respectively. However, when
specific retention probabilities are compared, all three models show close agreement on average.
We also noted that the academic probation status variable was not significant (and had a positive
coefficient) in logit and probit models, but was significant in the RR estimated LPM (and had a
negative coefficient).
In the study of Gana and Rossi, a sample of 95 mortgage loans was selected. Of the 95
loans, 30 loans were delinquent. The aim was to model the probability of delinquency. The
following regressors were used: FICO score (a credit score given by Fair, Isaac & Company, San
Rafael, CA), the ratio of the loan amount to the value of the property, the borrower’s income, and
whether or not the property is located in California (0-1 dummy variable). Both the RR estimated
LPM and the weighted RR (WRR) estimated LPM were compared to the logit model. Assuming
that predicted probabilities greater than 30/95 predict a delinquent loan, we found that the RR
estimated LPM, WRR estimated LPM, and the logit model produced errors of 24%, 20%, and
18%, respectively. Next, assuming that predicted probabilities greater than 0.5 predict a
delinquent loan, we found that the RR estimated LPM, WRR estimated LPM, and the logit model
produced errors of 15%, 16%, and 19%, respectively. Finally, the Kolmogorov-Smirnov (KS)
statistic (Smirnov 1939) was computed to measure the degree of separation between the
distributions of predicted probabilities for the delinquent and non-delinquent loans in the sample.
All three models yielded a KS statistic value of 0.60 (this statistic is often used in credit scoring).
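The KS separation statistic used here can be computed directly from the two samples of predicted probabilities. The sketch below (with made-up numbers in the check, not the study’s loan data) takes the maximum vertical gap between the two empirical CDFs:

```python
import numpy as np

def ks_separation(p_bad, p_good):
    """Two-sample KS statistic between the predicted-probability
    distributions for delinquent (p_bad) and current (p_good) loans:
    max vertical distance between their empirical CDFs."""
    grid = np.sort(np.concatenate([p_bad, p_good]))
    cdf_bad = np.searchsorted(np.sort(p_bad), grid, side='right') / len(p_bad)
    cdf_good = np.searchsorted(np.sort(p_good), grid, side='right') / len(p_good)
    return float(np.max(np.abs(cdf_bad - cdf_good)))
```

A value of 1 means the two score distributions are completely separated; 0 means they coincide.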
Monyak also empirically compared the RR and WRR estimated LPMs with logit and
probit models by using the classic data set of Spector and Mazzeo. He found that the RR and
WRR estimated LPMs, logit, and probit models produced PRESS statistic (Allen 1971) values of
5.3, 5.0, 5.8, and 5.8, respectively, and errors of 25%, 19%, 19%, and 19%, respectively, when
the cutoff probability is set equal to the sample mean of the regressand. These error proportions
become 19% each, for all models, when the cutoff probability is set equal to 0.5.
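For least squares fits, the PRESS statistic can be computed without refitting via the standard leave-one-out identity. The sketch below covers the OLS case only; the report’s values also cover ridge, logit, and probit fits, which need their own leave-one-out machinery:

```python
import numpy as np

def press_ols(X, y):
    """Allen's (1971) PRESS statistic for least squares: the sum of
    squared leave-one-out prediction errors, obtained without refitting
    via PRESS = sum_i (e_i / (1 - h_ii))^2, where e_i are the OLS
    residuals and h_ii the hat-matrix diagonal."""
    H = X @ np.linalg.inv(X.T @ X) @ X.T   # hat matrix
    e = y - H @ y                          # OLS residuals
    return float(np.sum((e / (1.0 - np.diag(H))) ** 2))
```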
Finally, it should be mentioned that Monyak’s simulation results show that the RR and
WRR estimated LPMs produce errors of 31% and 32%, respectively, when the cutoff probability
is set to 0.5, and errors of 36% each, when the cutoff probability is set to the sample mean of the
regressand. The trends in these results are consistent with the empirical results considered above.
In contrast, the OLS estimated LPM produces errors of 30% when the cutoff probability is 0.5.
These results indicate that the RR estimated LPM is competitive with logit and probit
models.
References
Allen, D.M. (1971): The prediction sum of squares as a criterion for selecting predictor variables,
Technical Report No. 23, Department of Statistics, University of Kentucky.
Amemiya, T. (1977): Some theorems on the linear probability model, International Economic Review.
Brook, R.J. and T. Moore (1980): On the expected length of the least squares coefficient vector, Journal of
Econometrics.
Chalton, D.O. and C.G. Troskie (1992): Identification of outlying and influential data with biased
estimation: a simulation study, Communications in Statistics – Simulation.
Cook, R.D. (1977): Detection of influential observations in linear regression, Technometrics.
Dempster, A.P., M. Schatzoff, and N. Wermuth (1977): A simulation study of alternatives to ordinary least
squares, Journal of the American Statistical Association.
Frank, I.E. and J.H. Friedman (1993): A statistical view of some chemometrics regression tools,
Technometrics.
Gana, R. (1995): Ridge regression estimation of the linear probability model, Journal of Applied Statistics,
England.
Gana, R. and D.W. Trusheim (1995): An empirical comparison of linear probability, logit, and probit
models of enrollment, presented at the Association for Institutional Research national meeting,
Boston, Massachusetts.
Gana, R. (1996): The effect of influential data on the ridge regression estimated linear probability model,
Proceedings of the Northeast Decision Sciences Institute annual meeting, St. Croix, USA.
Gana, R. and C.V. Rossi (1998): An empirical comparison of linear probability and logit models of
mortgage default, presented at the international conference on “Credit Scoring and Control V”,
University of Edinburgh, UK, September 1997.
Goldberger, A.S. (1964): Econometric Theory, John Wiley, New York.
Goldfeld, S.M. and R.E. Quandt (1972): Nonlinear Methods in Econometrics, North-Holland, Amsterdam.
Gruber, M.H.J. (1998): Improving Efficiency By Shrinkage – The James-Stein and Ridge Regression
Estimators, Marcel Dekker, New York.
Gujarati, D.N. (1995): Basic Econometrics, McGraw-Hill, New York.
Hensher, D.A. and L.W. Johnson (1981): Applied Discrete Choice Modeling, Croom Helm, London.
Hoerl, A.E. and R.W. Kennard (1970): Ridge regression: biased estimation for nonorthogonal problems,
Technometrics.
Hoerl, A.E., R.W. Kennard and K.F. Baldwin (1975): Ridge regression: some simulations,
Communications in Statistics.
Hoerl, A.E. and R.W. Kennard (1976): Ridge regression iterative estimation of the biasing parameter,
Communications in Statistics.
Hoerl, R.W., J.H. Schuenemeyer, and A.E. Hoerl (1984): A simulation of biased estimation and subset
regression techniques, Technometrics.
Hoerl, A.E. and R.W. Kennard (1990): Ridge regression: degrees of freedom in the analysis of variance,
Communications in Statistics.
Judge, G. and T. Takayama (1966): Inequality restrictions in regression analysis, Journal of the American
Statistical Association.
Judge, G., C. Hill, W. Griffiths, and T. Lee (1985): The Theory and Practice of Econometrics, John Wiley.
Lawless, J.F. and P. Wang (1976): A simulation study of ridge and other regression estimators,
Communications in Statistics.
Maddala, G.S. (1992): Introduction to Econometrics, Macmillan, New York.
McGillivray, R.G. (1970): Estimating the linear probability function, Econometrica.
Monyak, J.T. (1998): Mean squared error properties of the ridge regression estimated linear probability
model, Ph.D. dissertation, University of Delaware, Newark, Delaware.
Mullahy, J. (1990): Weighted least squares estimation of the linear probability model revisited, Economics
Letters.
Obenchain, R.L. (1977): Classical F-tests and confidence regions for ridge regression, Technometrics.
Pregibon, D. (1981): Logistic regression diagnostics, Annals of Statistics.
Rao, C.R. and H. Toutenburg (1995): Linear Models: Least Squares and Alternatives, Springer-Verlag.
Saccucci, M.S. (1985): The effect of variance-inflated outliers on least squares and ridge regression,
unpublished Ph.D. dissertation (supervised by Arthur E. Hoerl), University of Delaware, Newark,
Delaware.
SAS Institute, Inc. The SAS System, Cary, North Carolina.
Smirnov, N.V. (1939): On the estimation of the discrepancy between empirical curves of distribution for
two independent samples, Bulletin of the University of Moscow, Russia.
Spector, L.C. and M. Mazzeo (1980): Probit analysis and economic education, Journal of Economic
Education.
Theobald, C.M. (1974): Generalization of the mean square error applied to ridge regression, Journal of the
Royal Statistical Society.
Trusheim, D.W. and R. Gana (1994): How much can financial aid increase the probability of freshman
enrollment?, presented at the annual meeting of the Association for Institutional Research, New
Orleans.
Vinod, H.D. and A. Ullah (1981): Recent Advances in Regression Methods, Marcel Dekker.
Walker, E. and J.B. Birch (1988): Influence measures in ridge regression, Technometrics.