A Study on the Process Optimizing of Bank’s Lending Service
HUANG Xiaokun
School of Business Administration, South China University of Technology, Guangzhou, P.R. China, 510640
[email protected]
Abstract A commercial bank can be seen as an enterprise that manufactures loans for firms and
individuals. In the lending process, the credit-scoring model plays an important role in evaluating the
probability of default of a loan application. In general, credit-scoring models suffer from
sample-selection bias. This paper uses the bivariate probit approach to estimate an unbiased
credit-scoring model. The data set used to estimate the model, consisting of large commercial loans
provided by a commercial bank in China, contains financial and firm information on both rejected and
approved applicants. We find that the bivariate selection model provides more efficient estimates than a
single-equation model. The results show that the bivariate probit model can help the loan committee of
the commercial bank to optimize the lending process.
Keywords Lending process, Credit scoring, Sample Selection Bias, Bivariate Probit Selection Model
1 Introduction
Commercial banks have played an important role as an accelerator of economic growth in China since
the reform and opening-up. As part of the service industry, the banking industry provides financial
services for firms and individuals. Indeed, a commercial bank can be regarded as an enterprise that
manufactures loans for other enterprises or individuals. Generally, in evaluating an application for a
large loan, such as a mortgage or a construction loan, a commercial bank relies on direct, individual
scrutiny by a loan committee.
Figure 1 The process of bank lending
[Flowchart; box labels: Firm Financial Index, Firm Ordinary Data, Macroeconomic Status, The Bank's
Lending Policy, Statistical Model, Loan Committee, Rejected Loans, Accepted Loans, Bad Loans, Good
Loans, Feedback and Model Amended.]
The loan committee carefully analyzes the firm's financial indices, ordinary data (such as the number
of employees and registered capital), the macroeconomic status and the bank's lending policy (see
Fig. 1). It then decides whether to accept the loan application according to all the information it
has. However, this lending decision process is not optimal, because it lacks efficiency and relies
heavily on subjective judgement. Many advanced international commercial banks use statistical models
to help them make better decisions and improve the efficiency of lending scrutiny. Among these
statistical models, the credit-scoring model is the most important one.
The objective of most credit-scoring models is to minimize the misclassification rate or the expected
default rate. To achieve this, various statistical methods are used to separate loan applicants who are
expected to pay back their debts from those who are likely to fall into arrears. The most commonly used
statistical methods have been some form of discriminant analysis (DA). The DA model assumes that the
exogenous variables xi are normally distributed but with different means conditional on the group to
which the dependent variable belongs [1]. The objective is then to estimate these means and predict
which group an observation with characteristics xi is most likely to come from. DA thus differs from
probit and logit analysis, in which the exogenous variables explicitly determine group membership. One
potential weakness of the DA model is that its underlying assumptions are easily violated. More
importantly, the models can only be estimated on samples of granted loans (that is, the data are a
selected subset of all loan applications), which causes a sample selection bias in the parameter
estimates [2].
In practice, most credit-scoring models suffer from a sample selection bias because they are
estimated from a sample of granted loans and the criteria by which applicants are rejected are not taken
into account [3]. Many researchers have developed effective approaches to the sample selection bias
problem in credit scoring. Boyes, Hoffman and Low (1989) avoided the bias by designing a bivariate
probit model with two sequential events as the dependent variables: the lender's decision to grant the
loan or not and, conditional on the loan having been provided, the borrower's ability to pay it off or
not [4]. Greene (1998) developed a similar binary choice model for sample selection that is relevant for
modeling credit scoring by commercial banks [5]. Jacobson and Roszbach (2003) followed the same
method to analyze the default rate of credit card loans with a larger data set [6]. All of these earlier
studies focused only on credit card loans, which are revolving and have no predetermined maturity.
Large commercial loans, however, differ from credit card loans because they have a predetermined
maturity.
The contribution of this paper is to extend the use of credit-scoring models to large commercial
loans and to study whether the credit-scoring model can optimize the lending process of the commercial
bank. This paper is organized as follows. Section 2 presents the bivariate probit selection model.
Section 3 presents the empirical analysis in two parts: the first describes the data set and the
variables used in the model, and the second focuses on the parameter estimates of the bivariate sample
selection probit model. Section 4 provides a summary of the results.
2 Econometric Model
In this section, we briefly present the bivariate probit selection model. For details, we refer to
Boyes et al. (1989), Greene (1998) and Jacobson and Roszbach (2003). The model consists of two
simultaneous equations, one for the binary decision to provide a loan or not, and another for the binary
outcome, "default" or "not default". Let the superscript * indicate an unobserved variable and assume
that y1i* and y2i* follow

\[
y_{1i}^{*} = X_{1i}\alpha_{1} + \varepsilon_{1i}, \qquad
y_{2i}^{*} = X_{2i}\alpha_{2} + \varepsilon_{2i}, \qquad i = 1, 2, \ldots, N \tag{1}
\]
where Xji, j = 1, 2, are 1 × kj vectors of explanatory variables and the disturbances ε1i and ε2i are
assumed to be zero-mean, bivariate normally distributed with unit variances and correlation
coefficient ρ:

\[
\begin{pmatrix} \varepsilon_{1i} \\ \varepsilon_{2i} \end{pmatrix}
\sim N\left( \begin{pmatrix} 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix} \right) \tag{2}
\]
If ρ = 0, the selection is of no consequence, and there is no need to correct for sample selection bias.
The binary choice variable y1i takes value 1 if the loan was granted and 0 if the application was
rejected:
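\[
y_{1i} = \begin{cases} 0 & \text{if } y_{1i}^{*} < 0 \\ 1 & \text{if } y_{1i}^{*} \ge 0 \end{cases} \tag{3}
\]

The second variable, y2i, takes value 0 if the loan defaults and 1 if not:

\[
y_{2i} = \begin{cases} 0 & \text{if } y_{2i}^{*} < 0 \\ 1 & \text{if } y_{2i}^{*} \ge 0 \end{cases} \tag{4}
\]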
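The remark that selection is of no consequence when ρ = 0 can be illustrated with a small simulation. The following Python sketch is not part of the original study: it assumes constant-only indices with hypothetical values (X1iα1 = 0.5 and X2iα2 = 1.0), draws the errors according to (2), applies the rules (3) and (4), and compares the default rate in the whole population with the default rate among granted loans only.

```python
import math
import random

def simulate_default_rates(rho, n=200_000, seed=0):
    """Simulate model (1)-(4) with constant-only indices (an illustrative assumption).

    Returns (population default rate, default rate among granted loans).
    """
    rng = random.Random(seed)
    a1 = 0.5   # hypothetical X1*alpha1: latent propensity to grant the loan
    a2 = 1.0   # hypothetical X2*alpha2: latent propensity not to default
    defaults_all = granted = defaults_granted = 0
    for _ in range(n):
        e1 = rng.gauss(0.0, 1.0)
        # e2 has unit variance and correlation rho with e1, as in (2)
        e2 = rho * e1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        y1 = 1 if a1 + e1 >= 0 else 0   # equation (3): 1 = loan granted
        y2 = 1 if a2 + e2 >= 0 else 0   # equation (4): 0 = loan defaults
        defaults_all += 1 - y2
        if y1 == 1:
            granted += 1
            defaults_granted += 1 - y2
    return defaults_all / n, defaults_granted / granted

for rho in (0.0, -0.5):
    pop, obs = simulate_default_rates(rho)
    print(f"rho={rho:+.1f}  population={pop:.4f}  granted only={obs:.4f}")
```

With ρ = 0 the two rates agree up to sampling noise, whereas with a non-zero ρ the rate observed among granted loans differs from the population rate. This gap is what a model fitted only to granted loans cannot account for.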
Generally, one only observes whether a loan is good or bad if it was granted. There is thus not only a
censoring rule for (y1i, y2i) but also an observation rule. Because we have three types of observations
(no loans, bad loans and good loans), the likelihood function takes the following form:

\[
l = \prod_{\text{no loans}} \text{prob}(\text{no loan}) \times
    \prod_{\text{bad loans}} \text{prob}(\text{bad loan}) \times
    \prod_{\text{good loans}} \text{prob}(\text{good loan}) \tag{5}
\]

where "no loan" means that the application was rejected, "bad loan" means that the loan defaulted, and
"good loan" means that the loan did not default.
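Combining (3) and (4) with the three types of observations, the likelihood function in equation (5)
becomes:

\[
l = \prod_{i=1}^{N} \text{prob}(y_{1i}^{*} < 0)^{(1 - y_{1i})} \times
    \prod_{i=1}^{N} \text{prob}(y_{1i}^{*} \ge 0,\ y_{2i}^{*} < 0)^{y_{1i}(1 - y_{2i})} \times
    \prod_{i=1}^{N} \text{prob}(y_{1i}^{*} \ge 0,\ y_{2i}^{*} \ge 0)^{y_{1i} y_{2i}} \tag{6}
\]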
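Substituting (1) into (6) gives the following log-likelihood function:

\[
\begin{aligned}
\ln l ={} & \sum_{i=1}^{N} (1 - y_{1i}) \cdot \ln\left[\text{prob}(\varepsilon_{1i} < -X_{1i}\alpha_{1})\right] \\
& + \sum_{i=1}^{N} y_{1i}(1 - y_{2i}) \cdot \ln\left[\text{prob}(\varepsilon_{1i} \ge -X_{1i}\alpha_{1} \cap \varepsilon_{2i} < -X_{2i}\alpha_{2})\right] \\
& + \sum_{i=1}^{N} y_{1i}\, y_{2i} \cdot \ln\left[\text{prob}(\varepsilon_{1i} \ge -X_{1i}\alpha_{1} \cap \varepsilon_{2i} \ge -X_{2i}\alpha_{2})\right]
\end{aligned} \tag{7}
\]

Because of the symmetry of the bivariate normal distribution, the probability in the last term of (7)
can be rewritten as

\[
\text{prob}(\varepsilon_{1i} \ge -X_{1i}\alpha_{1} \cap \varepsilon_{2i} \ge -X_{2i}\alpha_{2})
= \Phi_{2}(X_{1i}\alpha_{1}, X_{2i}\alpha_{2}; \rho) \tag{8}
\]

for all i, so the log-likelihood function can be written as: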
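\[
\begin{aligned}
\ln l ={} & \sum_{i=1}^{N} (1 - y_{1i}) \cdot \ln\left[1 - \Phi(X_{1i}\alpha_{1})\right] \\
& + \sum_{i=1}^{N} y_{1i}(1 - y_{2i}) \cdot \ln\left[\Phi(X_{1i}\alpha_{1}) - \Phi_{2}(X_{1i}\alpha_{1}, X_{2i}\alpha_{2}; \rho)\right] \\
& + \sum_{i=1}^{N} y_{1i}\, y_{2i} \cdot \ln \Phi_{2}(X_{1i}\alpha_{1}, X_{2i}\alpha_{2}; \rho)
\end{aligned} \tag{9}
\]

where Φ(⋅) and Φ2(⋅, ⋅; ρ) represent the univariate and bivariate standard normal cumulative
distribution functions, the latter with correlation coefficient ρ.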
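The log-likelihood (9) can be evaluated numerically as sketched below. This is a minimal illustration using only the Python standard library, not the estimation code used for the results in this paper. The helper names `phi2` and `log_likelihood`, the Simpson-rule integration for Φ2 and the toy sample are all assumptions made for the example.

```python
import math
from statistics import NormalDist

_STD = NormalDist()  # standard normal distribution: provides pdf() and cdf()

def phi2(a, b, rho, steps=2000):
    """Bivariate standard normal CDF P(e1 <= a, e2 <= b) for |rho| < 1.

    Uses Phi2(a, b; rho) = integral over x <= a of
    phi(x) * Phi((b - rho * x) / sqrt(1 - rho^2)),
    evaluated with Simpson's rule on [-8, a] (the mass below -8 is negligible).
    """
    lo = -8.0
    if a <= lo:
        return 0.0
    s = math.sqrt(1.0 - rho * rho)
    h = (a - lo) / steps  # steps must be even for Simpson's rule

    def f(x):
        return _STD.pdf(x) * _STD.cdf((b - rho * x) / s)

    total = f(lo) + f(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f(lo + k * h)
    return total * h / 3.0

def log_likelihood(obs, alpha1, alpha2, rho):
    """Log-likelihood (9); obs is a list of (x1, x2, y1, y2), y2 is ignored when y1 == 0."""
    ll = 0.0
    for x1, x2, y1, y2 in obs:
        a = sum(x * c for x, c in zip(x1, alpha1))  # X1i * alpha1
        b = sum(x * c for x, c in zip(x2, alpha2))  # X2i * alpha2
        if y1 == 0:        # rejected application: first sum in (9)
            ll += math.log(1.0 - _STD.cdf(a))
        elif y2 == 0:      # granted loan that defaulted: second sum in (9)
            ll += math.log(_STD.cdf(a) - phi2(a, b, rho))
        else:              # granted loan that did not default: third sum in (9)
            ll += math.log(phi2(a, b, rho))
    return ll

# Hypothetical toy sample with constant-only regressors: one rejection,
# one defaulted loan and one good loan.
sample = [([1.0], [1.0], 0, 0), ([1.0], [1.0], 1, 0), ([1.0], [1.0], 1, 1)]
print(log_likelihood(sample, [0.5], [1.0], -0.5))
```

Maximizing this function over α1, α2 and ρ with a numerical optimizer would give the maximum likelihood estimates; the optimizer itself is left out of the sketch.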
3 Empirical Analysis
3.1 Data
The original data set consists of 16384 commercial loan contracts at one branch of a major
commercial bank of China between December 1991 and February 2004. These loan contracts include
ordinary commercial loans (such as mortgage loans, pledge loans and credit loans), loans for private
housing, acceptance credit loans, outward documentary loans and discount loans. In the original data
set, ordinary commercial loans account for more than 70% of total lending. Because ordinary commercial
loans are the primary assets of the bank we study and carry higher risk, we focus on this type of loan
and exclude the others. Moreover, in order to study which firm characteristics affect credit rating, we
exclude individual loans and retain only firm loans. Before the combined data were handed over for
analysis, the names of the firms were removed. The final data set consists of 2798 granted loans and
299 rejected loans.
Table 1 Definition of variables
Variable    Definition
CAPT        The registered capital of a firm (in 10 thousand Yuan)
COMTY       Dummy, takes value 1 if the firm is a joint-stock company, otherwise 0
COMOWN      Dummy, takes value 1 if the firm is a state-owned enterprise, otherwise 0
RELAT       Dummy, takes value 1 if the firm has some relationship with the bank (such as an
            enterprise in which the bank holds shares), otherwise 0
ESTATE      Dummy, takes value 1 if the firm is a real estate development enterprise, otherwise 0
RMB         Dummy, takes value 1 if the loan currency is RMB, otherwise 0
LOANSIZE    Amount of the loan (in 10 thousand Yuan)
RATE        Interest rate of the loan (%)
MATURITY    The maturity of the loan (days)
MORTGAGE    Dummy, takes value 1 if the loan is a pledge loan or a mortgage loan, otherwise 0
APPROVAL    Categorical, takes value 1 if the loan was examined and approved by a sub-branch of the
            bank, 2 if it was examined and approved by a branch of the bank, and 3 if it was
            examined and approved by the bank headquarters

Table 2 Descriptive statistics for all loans
            Rejections (N=153)                     Granted loans (N=653)
Variable    Mean     Stdev    Min      Max         Mean     Stdev    Min      Max
CAPT        4336     10752    40       12000       6882     17010    15       150000
COMTY       0.26     0.44     0        1           0.20     0.42     0        1
COMOWN      0.46     0.50     0        1           0.32     0.47     0        1
RELAT       0.013    0.11     0        1           0.005    0.07     0        1
ESTATE      0.11     0.32     0        1           0.12     0.32     0        1
RMB         0.78     0.41     0        1           0.90     0.31     0        1
LOANSIZE    484      645      27       5000        1558     2016     1        10000
RATE        9.83%    3.01%    5.841%   24.6%       5.39%    1.35%    2.42%    18%
MATURITY    301      222      31       1826        553      358      23       3652
MORTGAGE    0.36     0.48     0        1           0.47     0.49     0        1
APPROVAL    0.95     0.47     1        3           1.02     0.30     1        3

Table 3 Descriptive statistics for granted loans
            Defaulted loans (N=57)                 Good loans (N=596)
Variable    Mean     Stdev    Min      Max         Mean     Stdev    Min      Max
CAPT        5874     10642    100      60000       7642     19778    15       200000
COMTY       0.26     0.44     0        1           0.20     0.40     0        1
COMOWN      0.42     0.50     0        1           0.31     0.46     0        1
RELAT       0.018    0.13     0        1           0.005    0.07     0        1
ESTATE      0.18     0.38     0        1           0.11     0.31     0        1
RMB         0.81     0.40     0        1           0.91     0.29     0        1
LOANSIZE    609      854      2        4500        2714     12666    1        30000
RATE        6.30%    1.77%    3%       7.623%      5.31%    1.28%    2.42%    18%
MATURITY    247      330      23       2192        370      355      66       3652
MORTGAGE    0.37     0.49     0        1           0.47     0.50     0        1
APPROVAL    1.00     0.22     1        3           1.02     0.32     1        3
The database includes useful information, such as guarantee type, loan size, interest rate and loan
maturity, which can be used as important variables when developing a statistical credit-scoring model.
Other information, such as the loan contract code, the starting date of the loan and the name of the
guarantor, cannot be used as a determining factor in the model. In total we have 38 variables at our
disposal. Of these, 27 were not used in the final estimation of the model described in Section 2.
Most were disregarded because they were unrelated to the probability of loan default or displayed
extremely high correlation with another variable that measured approximately the same thing but had
greater explanatory power. Table 1 contains definitions of the variables selected for the estimation of
the model in Section 2.
Table 2 and Table 3 contain descriptive statistics for the variables used in the empirical model in
Section 2. Of all loans, 2798 are granted loans and 299 are rejected loans. Both types are firm
loans. Of the granted loans, 210 are defaulted loans and 2588 are good loans.
3.2 Empirical analysis result
We employ Maximum Likelihood Estimation (MLE) to estimate the model presented in Section 2.
Table 4 presents the results of the bivariate and single-equation probit models. Standard errors of the
regression coefficients are included, along with their respective t-statistics.
A univariate probit model, which assumes zero correlation between ε1i and ε2i (see Section 2),
contains the same independent variables as the regression in the bivariate probit selection model. This
equation also has a significant overall fit. RMB, LOANSIZE and MATURITY have a positive effect on the
probability of loan default at the 0.05 level of significance, while COMOWN, RATE and APPROVAL
have a negative effect. In contrast to the bivariate probit selection model, the variables CAPT and
MORTGAGE have no significant impact on the probability of loan default in the univariate probit model.
The significant estimate of ρ in the bivariate probit model leads to the inference that a sample
selection bias is present in the single-equation estimates of the probability of loan default. ρ is an
estimate of the correlation between the outcomes after the effects of the included variables have been
accounted for. The correlation coefficient takes the value -0.5268, which implies that non-systematic
tendencies to hold loans on the balance sheet are clearly correlated with non-systematic increases in
default risk. In other words, the elements in the bank's business described by the first equation in (1)
that increase a loan's odds of being on the current balance sheet are positively related to increases in
default risk that cannot be explained by a systematic relation with the covariates X2i.
Table 4 Result of MLE
             Univariate Probit (N=2798)                    Bivariate Probit selection (N=3097)
Variable     Estimate (α̂1)  Std. Error  t-stat.           Estimate (α̂2)  Std. Error  t-stat.
CONSTANT     3.726294       0.360225    10.34435 ***      2.342374       0.303638    7.714374 ***
CAPT         1.11E-05       8.03E-06    1.383071          1.11E-05       4.82E-06    2.302901 ***
COMTY        -0.247540      0.176469    -1.402744         -0.210365      0.179794    -1.170034
COMOWN       -0.406916      0.149876    -2.715010 ***     -0.479586      0.155308    -3.087964 ***
RELAT        0.042303       0.816670    0.051799          -0.800273      0.897579    -0.891591
ESTATE       0.531193       0.952175    0.557873          -0.094837      0.216255    -0.438541
RMB          0.494534       0.227626    0.415307          0.586126       0.213365    2.747059 ***
LOANSIZE     0.000221       7.50E-05    2.172574 **       0.000330       7.48E-05    4.415668 ***
RATE         -0.471971      0.035274    -13.37997 ***     -0.440710      0.038318    -11.50136 ***
MATURITY     -0.000107      0.000208    -0.514697         0.000557       0.000335    1.664689 *
MORTGAGE     0.159942       0.157018    1.018622          0.646626       0.155685    4.153429 ***
APPROVAL     -0.491712      0.209751    -2.344265 ***     -0.302108      0.087666    -3.446106 ***
ρ            n/a            n/a         n/a               -0.526815      0.032439    -5.064592 ***
Chi-square   419.6083 (p-value 0.0000 ***)                552.0222 (p-value 0.0000 ***)
McFadden R2  0.535635                                     0.596983
The coefficient estimates correspond to the parameters of model (1).
*, **, *** represent statistical significance at 10%, 5% and 1% respectively.
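The t-statistics in Table 4 are the ratios of the parameter estimates to their standard errors. As a quick consistency check, the short Python sketch below recomputes them for three rows of the bivariate model as reported in the table. The two-sided p-value uses the asymptotic normal approximation, which is an assumption of this sketch and not necessarily the rule behind the significance stars in the table.

```python
from statistics import NormalDist

def t_stat_and_p_value(estimate, std_error):
    """Return (t, two-sided p-value) using the asymptotic normal approximation."""
    t = estimate / std_error
    p = 2.0 * (1.0 - NormalDist().cdf(abs(t)))
    return t, p

# Bivariate-model rows as reported in Table 4: (estimate, standard error)
rows = {
    "CONSTANT": (2.342374, 0.303638),
    "MATURITY": (0.000557, 0.000335),
    "APPROVAL": (-0.302108, 0.087666),
}
for name, (est, se) in rows.items():
    t, p = t_stat_and_p_value(est, se)
    print(f"{name:9s} t = {t:8.4f}   p = {p:.4f}")
```

Small differences from the tabulated t-statistics are expected because the reported estimates and standard errors are rounded.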
COMOWN, RMB, LOANSIZE, RATE, MATURITY and APPROVAL are significant in both
models. We can infer that state-owned enterprises have a higher default rate than other enterprises,
that RMB loans have a lower default rate than loans in other currencies, that larger loan size and
longer maturity lead to a lower default rate, and that a higher interest rate leads to a higher default
rate. These conclusions are consistent with the experience-based judgment of the loan committee
when it examines the risk factors of a loan application. In addition, we found an interesting result in
the models: the loan-approval level clearly affects the risk of loans. Because the bank headquarters
has more risk management experience and scrutiny skill than the branches and sub-branches, the loans
that it examined and approved had lower risk. If the loan-approval level is raised by one step, the
probability of loan default decreases by 30% in the bivariate probit selection model and by 49% in the
univariate probit model.
Furthermore, apart from the coefficients and the estimation precision, there is another important
difference between the two probit models: the variables CAPT and MORTGAGE are significant in the
bivariate selection probit model but not in the univariate probit model. Both variables are important
factors for the loan committee to consider when making lending decisions. According to the lending
experience of this commercial bank, a higher CAPT indicates a larger firm, which may lead to a lower
probability of loan default. When a loan is secured by a mortgage, the risk of the loan is reduced.
Moreover, the average estimated loan default probability is 15.3% in the bivariate sample selection
model and 21.9% in the univariate probit model. Therefore, the sample selection model can help the loan
committee to improve both the efficiency and the precision of its scrutiny of loan applications.
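An average estimated default probability of this kind is the sample mean of the fitted default probabilities. Under the coding in (4), where y2i = 1 denotes a loan that does not default, the fitted probability of default for applicant i is 1 - Φ(X2iα2). The Python sketch below illustrates the calculation; the index values are hypothetical and are not taken from the data set used in this paper.

```python
from statistics import NormalDist, fmean

def average_default_probability(indices):
    """Mean of P(default) = P(y2* < 0) = 1 - Phi(X2i * alpha2) over applicants."""
    std = NormalDist()
    return fmean(1.0 - std.cdf(b) for b in indices)

# Hypothetical fitted indices X2i * alpha2_hat for four applicants (not from the paper)
print(average_default_probability([0.4, 1.1, 1.8, 0.9]))
```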
4 Conclusions
In this paper, the bivariate probit selection model has been applied to the credit rating of large
loans. From a data set provided by a commercial bank, we find evidence that a loan credit rating
model yields biased estimates when it ignores the sample selection problem. We develop a bivariate
probit selection model to study loan credit. The bivariate probit selection model gives better
estimates than the univariate probit model. Because settled loans have been excluded from the current
balance sheet, the sample selection model predicts a much lower default rate for the population as a
whole (15.3% vs. 21.9%). Our results show that using the bivariate probit model to measure the
probability of default of a loan application can help the loan committee of the commercial bank to make
proper lending decisions and optimize the lending process.
In this study, extensive financial information such as the asset-liability ratio and liquidity ratio of
the firms is lacking; these are primary variables in developing a credit-scoring model, but the bank did
not provide them for our study. Our empirical sample selection model is therefore not perfect, but
despite the data limitation it is sufficient for studying the lending process. In any case, when
constructing a credit-scoring model for practical use, it is important to include financial variables in
addition to accounting for the sample selection bias of the data.
References
[1] Carling, K., Jacobson, T., Roszbach, K., Dormancy, risk and expected profits of consumer loans,
Journal of Banking and Finance, 2001, vol. 25, no. 2, pp. 717-739.
[2] Henley, W.E., Hand, D.J., A k-nearest-neighbour classifier for assessing consumer credit risk,
The Statistician, 1996, vol. 45, no. 1, pp. 77-95.
[3] Phillips, R.F., Yezer, A.M.J., Self-selection and tests for bias and risk in mortgage lending:
Can you price the mortgage if you don't know the process, The Journal of Real Estate Research, 1996,
vol. 11, no. 1, pp. 87-102.
[4] Boyes, W.J., Hoffman, D.L., Low, S.A., An econometric analysis of the bank credit scoring problem,
Journal of Econometrics, 1989, vol. 40, no. 2, pp. 3-14.
[5] Greene, W., Sample selection in credit-scoring models, Japan and the World Economy, 1998, vol. 10,
no. 3, pp. 299-316.
[6] Jacobson, T., Roszbach, K., Bank lending policy, credit scoring and value-at-risk, Journal of Banking
and Finance, 2003, vol. 27, no. 4, pp. 615-633.
[7] Greene, W., Econometric Analysis, 2nd edition, New York: Macmillan, 1993.