Download Fulltext - Brunel University Research Archive

Survey
yes no Was this document useful for you?
   Thank you for your participation!

* Your assessment is very important for improving the workof artificial intelligence, which forms the content of this project

Document related concepts

Agent-based model in biology wikipedia , lookup

Machine learning wikipedia , lookup

Cross-validation (statistics) wikipedia , lookup

Neural modeling fields wikipedia , lookup

Pattern recognition wikipedia , lookup

Affective computing wikipedia , lookup

Mathematical model wikipedia , lookup

Time series wikipedia , lookup

K-nearest neighbors algorithm wikipedia , lookup

Transcript
A systematic credit scoring model based on
heterogeneous classifier ensembles
Maher Ala’raj
Maysam Abbod
Department of Electronic and Computer Engineering
Brunel University London
Uxbridge, UK
maher.ala’[email protected]
Department of Electronic and Computer Engineering
Brunel University London
Uxbridge, UK
[email protected]
Abstract— lending loans to borrowers is considered one of the
main profit sources for banks and financial institutions. Thus,
careful assessment and evaluation should be taken when deciding
to grant credit to potential borrowers. With the rapid growth of
credit industry and the massive volume of financial data,
developing effective credit scoring models is very crucial. The
literature in this area is very dense with models that aim to get
the best predictive performance. Recent studies stressed on using
ensemble models or multiple classifiers over single ones to solve
credit scoring problems. Therefore, this study propose to develop
and introduce a systematic credit scoring model based on
homogenous and heterogeneous classifier ensembles based on
three state-of-the art classifiers: logistic regression (LR), artificial
neural network (ANN) and support vector machines (SVM).
Results revealed that heterogeneous classifier ensembles gives
better predictive performance than homogenous and single
classifiers in terms of average accuracy.
Keywords—credit scoring; LR; ANN; SVM; homogenous
ensembles; heterogeneous ensembles; bagging; majority voting
I. INTRODUCTION
Credit granting is considered a key business activity that
generates profits to banks and financial institutions. However,
it can also be a great source of risk. The recent financial crises
resulted in huge loses globally and increased the attention of
banks and financial institutions on credit risk. Therefore, Basel
Committee for banking supervision [1], requested all banks and
financial institutions that it should implement powerful credit
scoring systems to help them in estimating their credit risk
levels and different risk exposures, as well as improving capital
allocation and credit pricing. As a result, it has become very
important for every bank and financial institution to have
rigorous evaluation procedures when it comes in making a
credit decision to either grant a loan to an applicant or not [2].
Because of the huge amount of consumer data available,
banks and financial institutions have currently a need for
advanced analytical and systematic tools that support their
credit managerial support decisions in order to comply with
the Basel regulatory requirements. Credit scoring systems are
models aligned with assessing financial risks associated with
granting credit for potential applicants [3]. Credit scoring
models are developed to classify and assign future loan
applicants in to two groups either ‘good’ or ‘bad’ with respect
to their characteristics such as age, income and marital
condition [4], [5]. In the light of the applicants score that is
generated from the model, good or accepted applicants are
expected to repay their obligation over the loan period while
bad or rejected applicants are expected to fail on their
obligation over the same loan period. To build a credit scoring
system, data or loans of past granted applicants are collected
in order to capture patterns and relationships between the
characteristics of the applicants and their known loan status
and to identify which are best to distinguish between good and
bad applicants [6], [7]. Generally speaking, credit scoring
models are used to estimate or predict the new applicant’s
credit risk [8] or their probability of default [3] using past loan
applicants information.
Credit scoring models are typically built using traditional or
statistical methods such LR [9], [10], [11] and data mining or
machine learning techniques, such as ANN [11], [12], [13],
[14], [15], [16], SVM [11], [17], [18], [19], [20]. It is
concluded in [9], [21] that data mining techniques are superior
to traditional methods in dealing with credit scoring problems,
especially for nonlinear relationships between variables and for
its generalization abilities. Moreover, Baesens et al., reflected
the need of banks and financial institutions in investigating
new advanced methods in order to improve their credit
granting decisions [11].
Despite of the encouraging results achieved by related
studies, and its ability to hold classification problems, there
was a direction by academic and practitioners for having more
developed models with improved accuracy. However, they
have been developing scoring models based on new advanced
techniques, such hybrid and ensemble techniques which show
superiority over single models [22], [23].
Hybrid methods are mainly based on combining two
different methods together. The first method will be
responsible for per-processing training data and then the
second method will learn from the pre-processed data to output
decision result [22]. While ensemble methods are based on
training multiple or number of classifiers to solve the same
problem, and then their outputs are combined or aggregated by
a method(s) into one classifier to give an enhanced single
output [23].
According to Lessmanna et al., ensemble classifiers models
training can be achieved in two ways [3]:
 Homogenous ensembles: create multiple base models using
the same classification algorithm and combine their
prediction in to one single classifier.
 Heterogeneous ensembles: create base models using
different types of classification algorithms and combine
their prediction in to one single classifier.
In this paper a mixture of individual classifiers,
homogenous and heterogeneous classifier ensembles will be
built based on three mostly used classification techniques in the
literature namely LR, ANN and SVM.
However, the systematic proposed model is made up of
following phases:
 Collect the datasets.
 Splitting data in training, and testing sets.
 Choose and generate the classification algorithms.
 Combine the trained classifiers in to an ensemble.
 Assess the model using performance evaluation measures.
In practice, the results of the proposed ensemble classifier
will be compared against the result of each individual classifier
to prove the superiority of proposed method. Moreover, the
performance of proposed model will be compared to individual
classifiers and homogenous and heterogonous classifiers
ensembles independently.
In conclusion there is no overall best classification
technique used in building credit scoring model, selecting a
model that could classify between classes depends on the
nature or the domain of the problem, data structure, variables
used and the market or the environment it will be developed in
[24], [25], [26] The rest of the paper is organized as follows:
section 2 reviews some related literature on ensemble
techniques in credit scoring. Section 3 represents the research
methodology. Section 4 presents the experimental setup of the
experiments. Section 5 demonstrates experimental work.
Section 6 carries on discussion and analysis of the results.
Finally, section 6 draws the conclusion and future direction of
research.
II.
RELATED LITERATURE
Building well-organized and efficient credit scoring
models is demanded and it is quite a challenging task, due to
the massive growth of credit industry. The number of studies
that have been developing credit scoring models is increasing
massively. Their research incentive is to improve models that
maximize profits, minimize risks and support efficient and
reliable decision making. The research tendency is going
towards hybrid and ensemble modeling, because of its
efficiency in handling the credit scoring problems. As a result,
models are getting more complex, besides to higher costs of
implementation [2]. Due to the huge volume of current
literature in the field of credit scoring, we will focus only on
the studies that implemented ensemble classifiers in
chronological order.
West et al., carried out a study using ANN ensemble over
3 methods (cross-validation, bagging and boosting) to
investigate their performance, accuracy and how it reduce the
generalization error of ANN models, ANN ensembles showed
more accurate results and low generalization error than the
single best ANN classifier [27]. Cross -validation and bagging
were better than boosting in testing generalization error. Tsai
and Wu in their study compared single ANN classifier with
multiple and diversified classifiers in terms of accuracy and
misclassification cost. Multiple classifiers showed better
performance only in one data set [28]. In [29] Nanni and
Lumini investigated the use of ANN Levenberg-Marquardt,
SVM, and k-nn as base classifiers, and then use them as
ensemble classifiers with random subspace, bagging and
boosting methods. Results showed superiority of ensemble in
terms of classification performance. Yu et al. [30] constructed
multi-stage ANN ensemble classifiers based on bagging
sampling method, the ensemble method showed better results
than SVM and LR. Zhang et al., proposed new vertical
bagging ensemble decision tress technique based on attribute
reduction, and then used the reduced data for classification
[31]. Zhou et al., in [32] developed ensemble models based on
LS-SVM in order to reduce bias and improve classification
accuracy.
Hsieh and Hung built a complete efficient ensemble
classifier based on hybrid clustering to reduce data,
discretization to rank data, and screen useless ones then use
the reduced data to fit the ensemble models of ANN, SVM
and Bayesian networks, the classifiers are combined using
confidence-weighted voting [33]. Twala [34] investigated
ANN, Bayesian networks, decision tress, k-nn and LR
classifiers with different noise levels of attributes, and try to
improve the accuracy their ensembles, results revealed that
ensemble classifiers are more accurate at different noise
levels. Yu et al., in another study used ensemble SVM
classifiers to increase performance over single SVM classifier
[35].
The work in [26] tested LR, decision tress, NN, SVM as single
classifiers and ensemble classifiers over ensemble methods
bagging, boosting and stacking. Ensemble methods showed
improvement over single classifiers. Finlay in [36] empirically
investigated several multiple classifiers system using bagging,
boosting, new boosting method called Error Trimmed
boosting. In their study [37] use d dual ensemble strategies
based on bagging and random subspace with decision trees
and compare its performance with other ensemble methods
and classification techniques. Marques et al [38] explored the
behavior of several base classifiers such as ANN, SVM, LR
when used as classifier ensembles, several ensemble methods
were used to construct them, the aim of the study is to to
suggest appropriate classifiers for each ensemble method used.
Again, Marques et al in another study [39] proposed a new
two-level ensemble for credit scoring by combing different
ensembles to obtain better accuracy, both bagging and
boosting were used for data resampling and both random
subspace and rotation forest for attribute resampling, results
were better than traditional single ensembles and individual
classifiers. Tsai developed a novel hybrid ensemble classifiers
based on clustering, homogenous and heterogeneous
ensembles using ANN, LR and decision tress [23]. Results
showed best prediction results when using hybrid with
homogenous and heterogeneous ensembles classifiers with
weighted voting combination.
The above summarized studies focused on using ensemble
classifiers in order to achieve high predictive accuracy rate.
However, it is widely accepted that improvement in prediction
accuracy might translate into future significant savings [24].
It can be observed that vast majority of the studies above
have used ANN, SVM and LR as their base classifiers.
Furthermore, bagging and boosting were the common method
used to create classifier ensembles. The ensemble classifiers
have been covered in the related literature, but the focus were
typically on homogeneous ensembles, while the
heterogeneous ensembles are rarely considered. Out of ten
studies mentioned seven focused on building homogenous
classifier ensembles. Four focused on heterogeneous
classifiers, and one study focused on both.
For this reason, we purpose a model which combines both
homogenous and heterogeneous ensembles using several
ensemble methods and combining strategies to exploit the
most from the views of different classifiers about the data.
Basically, there can be a right model for the right purpose,
and any credit scoring model when it is developed, must take
in consideration the problem and how to fit it based the data
availability, computational capacity and the model should be
easy and understandable on how it works, the model should be
adaptable with all conditions and through time and it should
also extends to be profitable and detect problems [2].
2) Artificial Neural Networks (ANN)
ANN is machine learning technique that mimics the human
brain function. It is made up of interconnected neurons that
process information concurrently, learning and adapting from
historical data [42]. ANN has been comprehensively used in
developing credit scoring models as an alternative to the
traditional statistical approaches such as LR.
As shown in Figure 1 ANN model consists of three layers:
input, hidden and output layers [43], [10]. The structure of the
ANN model for credit scoring problem starts by passing the
attributes of each applicant to input layer to process them, then
they are transferred to the hidden layer for further processing,
then the values are sent to the output layer which gives the
final answer for problem either to give or not to give a loan.
The output is calculated by using weights.
Weights are assigned to each attribute expressing its
relative importance, then all weighted attributes are summed
together and fed to a transfer function (i.e. sigmoid, tansig) to
create output. Output gives a final result based on adjusting
weights iteratively by minimizing the error between the
predicted output and the actual targets [15].
III. RESEARCH METHODOLOGY
A. Classification techniques
1) Logistic Regression (LR)
LR is an extensively used statistical technique which is
popular for solving classification and regression problems
[16]. LR is used to model a binary outcome variable usually
represented by 0 or 1.
Thomas Stated that the outcome of the scoring model have
to be binary (accept/ good loan, 0 or reject/ bad loan, 1) and
this depends on group of independent variables. The LR
formula is expressed as [8]:
log[p(1-p)]= β₀ +β1X 1 + β2X2 + …. + βnXn.
(1)
Where p is the probability of outcome of interest (0/1), β₀ is
the intercept term, and βi represent the coefficient related to
the independent variables Xi (i=1… n), and log [p (1-p)] is the
dependent variable which is the logarithm of ratio of two
probabilities outcome of interest. The objective of LR in credit
scoring is to determine the conditional probability of a specific
input (the characteristics of the customer or features)
belonging to a certain class [40].
Despite that the widely application of LR its accuracy
decreases when the relationship between variables are nonlinear [16]. However, ANN and SVM are introduced to deal
with this issue [41], [21].
Figure 1. ANN topology [28]
The ANN model tend to find the relationship between an
applicant’s probability of default and its given attribute, filter
them and choose among the most important attributes. The
core advantages of ANN models is that it can handle
incomplete, missing and noisy data, no previous assumptions
required about the distribution of the inputs; it can recognize
complex patterns between variables [10].
In contrast ANN has some disadvantages such as the long
training process, lack of theory [12] [10]. Conversely, ANN
has been applied in many application domains such as
medicine, engineering and education [44]. In this study a
feed-forward backpropagation neural network will be
employed.
3) Support Vector Machines (SVM)
SVM are another machine learning technique used in
classification and credit scoring problems. It is been widely
used in the area of credit scoring and other fields for its
superior results [17]. SVM was first proposed by Vapnik [45].
SVM finds an optimal hyperplane that categorize the training
input data in two classes (good and bad loans) figure 5 show
the basis of support vector machines [17].
implement it and it gives good performance results. The
diversification of data to be trained is achieved by creating
different number of bags (n-bags) each bag is filled by data
randomly generated from training dataset, with replication.
For instance each bag could have data selected more than once
or not selected at all, and each bag is trained by only one
classifier.
Figure 2. The basis of support vector machines [17]
As illustrated in Figure 2, the dashed lines parallel to the
optimal hyperplane measures the distance to the solid line.
The dashed lines are called margin, and the training data that
lies on the margins are called support vectors, however, the
SVM finds the tries to find the best optimal hyperplane that
separate the data correctly, so the margin width between the
optimal hyperplane and the support vectors are maximized to
fit the data. Based on the features of the support vectors, it can
be predictable which applicant belongs to good or bad credit.
The strength of the SVM lies in its ability to model non-linear
data, which provides high prediction accuracy compared with
other techniques. In case if the training data are not linear the
SVM use different types of kernels which transform the data
in higher dimensional space. Linear, RBF and Polynomial are
types of kernels SVM use [32]. Unlike NN, SVM is sensitive
towards outliers and noisy data [46].
B. Classifiers ensembles
Ensemble classifiers is are a set multiple classifiers are
trained to solve the same problem, it comprises a set of
individually or single trained classifiers also called the base
classifiers whose predictions are combined in some manner,
usually by voting or weighted voting, when classifying new
instance [47]. in comparison to the individual classifiers, they
only learn and trained on one set of data, but ensemble
classifiers learn and train on diversified sets of data generated
from the original dataset, which in a result will built a set of
hypothesis from the data trained, which will lead to a better
accuracy . Several strategies have been developed to impose
diversity on the classifiers that make up the ensemble [47]
Outlined four basic ways to achieve diversity by using
 Different combination patterns.
 Different or diversified classification models
 Different features subsets.
 Different training sets.
In the following sub-section these popular approaches will
be briefly described.
To ensure diversity of data training amongst classifiers,
several ensemble strategies can be used such as crossvalidation, bagging, boosting, random forest and rotation
forest. Due to space restriction, only the bagging method will
be explained.
1) Bagging
It is a shortcut for bootstrap aggregation [38], it is the one
of the first ensemble learning algorithms. It is easy to
2) Combining ensembles outputs
After training classifiers on diverse data, each one will
produce a prediction; these predictions can be combined
through several ways:
 Simple averaging: where the average of all classifiers
predictions for each sample are calculated, to give a final
prediction.

Majority voting: where all classifiers predictions are
pooled together and the class with the highest vote for
each sample will be chosen as a final output [38].
Bagging usually uses this method
IV.
EXPERIMENTAL SETUP
A. The datasets
Two real –world credit datasets were used to evaluate the
predictive accuracy of the proposed model and compare it
with single classifies, homogenous and heterogeneous
ensembles. German, Australian datasets are the most widely
used datasets by researchers in the credit scoring literature.
The datasets can be are available publicly online from UCI
machine learning depository [49]. Summary of the datasets are
illustrated in Table I.
TABLE I. DESCRIPTION OF THE DATASETS USED IN THE STUDY
Data set
German
Australian
# Loans
# Good Loans
1000
690
700
307
# Bad Loans
300
383
# Attributes
20
14
B. Data Normalization
Each attribute in the dataset contains values that differ in
range. In order to avoid any bias and build up the model with
data within the same interval, data should be transformed from
different scale values to common scale ones. To achieve this,
dataset attributes are normalized to values in the range
between 0 and 1. These transformations are done by taking the
maximum value within each attribute and divide all the values
in the attributes with its maximum value.
C. Performance measurements
The evaluation measurements of this study were embraced
from the traditional performance measures in the area of credit
scoring. These measurements are average accuracy, Type I
and Type II errors [28]. A combination of these measurements
is used, rather than a single measure to measure the predictive
performance of the proposed credit scoring model.
TABLE II. CONFUSION MATRIX FOR CREDIT SCORING [28]
Predicted class (%)
Actual class (%)
Good loans
Bad loans
Good loans
TP
FN (Type II error)
Bad loans
FP (Type I error)
TN
From the confusion matrix table the following calculations
are defined:
Average Accuracy (ACC) = TP+TN/TP+FN+TN+FP (2)
Type I error = FP/ TN+FP
(3)
Type II error = FN/ TP+FN
(4)
A TP stand for good applicant correctly classified as good,
TN stands for bad applicant correctly classified as bad, FN
(Type II) stands for good applicant incorrectly classified as
bad customer and FP (Type I) stands for Bad customer
incorrectly classified as Good customer (high risk).
V. EXPERIMENTAL WORK
The full experiments in this study were executed using
Matlab 2013b version, on a PC with 3.4 GHz, Intel CORE i7
and 8 GB RAM, using Microsoft Windows 7 operating
system. The dataset were divided in two sets: Training set to
train the developed model, and Testing set (unseen data) to
validate the developed model. The distribution of the data was
80% and 20% for training, testing sets respectively. To
minimize the effect of the variability of the training set the
experiments are repeated 100 times based on different training
and testing sets at each run, but all the models are trained and
tested on the same partition of the dataset.
In order to investigate whether the proposed model is
efficient in terms of its predictive performance, different tests
will be conducted independently as follows:
 Testing each single classifier.
 Testing homogenous ensemble for each classifier using
bagging.
 Testing the proposed heterogeneous ensemble model.
 Finally, compare the results achieved based on average
accuracy, Type I and Type II error and draw out
conclusions.
VI. RESULTS AND DISCUSSION
In this section all the results reported were based on the
testing set (unseen data), German dataset includes 140 good
loans and 60 bad loans, while Australian dataset include 61
good loans and 77 bad loans.
To define the best number of neurons in the hidden layer
for the ANN model, we determine it by trial and error test in
order to choose the best number of neurons in the hidden
layer; we tried a variety of neurons ranged from 5 to 30
neurons. The ANN stopped training after reaching the predetermined number of epochs. The ultimate topology was
selected based on the lowest Mean Square Error, and the
highest predictive accuracy, essentially for bad loans. The
activation function used was ‘tan-sigmoid’. However, the
ANN topology was of 20-20-1 for German dataset and 14-281 for Australian dataset.
However for the SVM models, kernel function selection
plays a significant role in the building of SVM [32]. However,
the SVM models was developed using linear kernel function.
Tables III and IV show the results of the individual
classifiers for both German and Australian dataset. The best
single model for German and Australian datasets was LR and
ANN respectively. LR scored 75.50% and ANN scored
86.2%. ANN achieved the lowest standard deviation of 2.46%
for German dataset which reflect the model stability over
several model runs. LR this time got the lowest standard
deviation for Australian dataset by 2.8% showing it is more
stable model despite ANN accuracy. SVM achieved the
lowest accuracy on both datasets, and highest standard
deviations.
In terms of Type I and II error in German dataset, SVM
achieved the lowest Type I error by 28.3% and for the type II
error LR and ANN scored the lowest by 11.4%. The surprising
thing in the single models is that SVM was perfect in
classifying bad loans correctly unlike LR and ANN which
were imperfect. But in classifying good loans it was inferior to
LR and ANN.
For the Australian dataset, ANN showed good
performance in classifying bad loan. For the type II error all
the classifiers achieved the same error rate. This indicates their
good ability in classify good loans correctly in the Australian
set. Tables V and VI show the results of the homogenous
classifier ensembles independently using bagging method with
majority voting combination technique. Again LR ensembles
achieved the highest accuracy in the German dataset by 76%
also it got the highest for the Australian dataset by 86.59%.
This indicates how superior the LR in classifying binary class
problems, unlike what is said about it in the literature.
The superiority of LR ensembles can be seen also in
classifying bad loan at it achieved the lowest type I error for
German dataset, followed by ANN and SVM. Unlike its good
ability in classifying bad loans as single classifier, SVM
achieved the lowest type II error. In Australian dataset SVM
achieved the lowest type I and, LR and ANN achieved lowest
type II error. In general, it can be concluded that using
classifier ensembles can enhance the predictive accuracy of
the model; consequently it is superior to single classifiers in
most of the cases.
Finally, Table VII and VIII shows the result of the proposed
model which is based on combining all the prediction that is
out of the homogenous classifiers in order to exploit the most
from the opinions of the different classifiers about the data
classes. Final results are combined using majority voting.
The heterogeneous ensembles achieved the highest
accuracies in both German and Australian datasets by 80%
and 89.13% respectively. Moreover, it can be concluded that
heterogeneous classifier ensembles is superior in predicting
good loans correctly in both datasets as it achieved the lowest
type II error among all the models. Regarding type I error in
German dataset, the model came after the single SVM in
predicting bad loans correctly. For Australian dataset the
model showed good ability to classify bad loans perfectly as
SVM bagging. However, this gives an insight that combining
different classifiers from different families can enhance the
predictive performance in general.
TABLE III. RESULTS OF INDIVIDUAL CLASSIFIERS (GERMAN)
German dataset LR was achieved the best accuracy of 75.5%.
SVM got the lowest type I error.
TABLE VI. RESULTS OF HOMOGENOUS CLASSIFIER ENSEMBLES
(AUSTRALIAN)
Predicted Class
Classifier
TP
FN (Type
TN
FP
ACC % (SD)
(Type I)
II)
Predicted Class
Classifier
TP
FN (Type
TN
FP
(Type I)
II)
LR_ENS
56
5
64
13
86.59 (2.84)
ANN_ENS
56
5
63
14
86.2 (2.58)
SVM_ENS
52
9
65
12
84.74 (2.76)
ACC % (SD)
LR
124
16
27
33
75.5 (2.5)
ANN
124
16
26
34
75.0 (2.46)
SVM
98
42
43
17
70.5 (2.89)
TABLE IV. RESULTS OF INDIVIDUAL CLASSIFIERS (AUSTRALIAN)
TABLE VII. RESULTS OF HETEROGONOUS CLASSIFIER ENSEMBLES (GERMAN)
Predicted Class
Classifier
TP
TP
FN (Type
TN
FP
ACC % (SD)
HETRO_ENS
(Type I)
II)
TN
LR
56
5
62
15
85.5 (2.8)
ANN
56
5
63
14
86.2 (3.25)
SVM
56
5
60
17
84.05 (2.87)
130
10
FP
ACC % (SD)
(Type I)
II)
Predicted Class
Classifier
FN (Type
30
30
80.0 (2.3)
TABLE VIII. RESULTS OF HETEROGENEOUS CLASSIFIER ENSEMBLES
(AUSTRALIAN)
Predicted Class
TABLE V. RESULTS OF HOMOGENOUS CLASSIFIER ENSEMBLES (GERMAN)
Predicted Class
Classifier
TP
FN (Type
TN
FP
ACC % (SD)
(Type I)
II)
LR_ENS
124
16
28
32
76.0 (2.23)
ANN_ENS
124
16
26
34
75.0 (2.58)
SVM_ENS
125
15
23
37
74.0 (2.25)
I.
CONCLUSIONS AND FUTURE DIRECTIONS
The aim of this study is to build a model based on building
classifiers of heterogeneous ensembles in order to achieve
higher predictive performance. The proposed model was
developed systematically by following the main phases of
building a credit scoring model. It started with collecting
datasets (German and Australian), after collecting the data we
applied splitting method by dividing the data in to training (to
learn the model) and testing (to validate the model) sets, then
classifiers were generated to train the model. Classifiers were
trained individually, and then trained independently using
homogenous ensembles.Finally the outcomes of the each
homogenous classifier were combined in heterogeneous
ensemble to achieve final results.
The models performances were evaluated via average
accuracy, type I and type II error. For the single models, in
Classifier
HETRO_ENS
TP
58
FN (Type
II)
TN
3
65
FP
ACC % (SD)
(Type I)
12
89.13 (2.38)
In the Australian dataset ANN achieved the highest
accuracy by 86.2% and scored the lowest type I error.
Regarding the homogenous ensembles it is clear that in most
of the cases it is better than single models. LR ensembles were
the superior by achieving the highest accuracy (76%, 86.59%),
lowest type I (53.3%) and SVM (15.5%) in German and
Australian dataset respectively, which is more costly to banks.
Heterogeneous ensembles achieved the highest accuracies and
lowest type I among all the testes models. This indicates that
combining different families of classifiers together into one
single model exploits all the opinions generated by the
classifiers to get better and enhanced results. In German
dataset the model achieved 80% and in Australian dataset
89.13 %. It is widely accepted that improvements in prediction
accuracy will translate into significant future savings [24].
Future research directions could be extended by: (1) using
more datasets with different sizes and attributes for extra
validation of the model (2) using different ensemble methods
such as boosting and random forest to learn more on
diversified data subsets (3) using different algorithms methods
such as decision tress and Bayesian classifiers.
REFERENCES
[1]
A. Estrella, "Credit ratings and complementary sources of credit quality
information," 2000.
[2]
[3]
[4]
[5]
[6]
[7]
[8]
[9]
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]
[18]
[19]
[20]
[21]
[22]
[23]
[24]
M. Alaraj, M. Abbod and Z. Hunaiti, "Evaluating Consumer Loans
Using Neural Networks Ensembles," International Conference on
Machine Learning, Electrical and Mechanical Engineering, 2014.
S. Lessmanna, H. Seowb, B. Baesenscd and L. C. Thomasd,
"Benchmarking state-of-the-art classification algorithms for credit
scoring: A ten-year update," in Credit Research Centre, Conference
Archive, 2013,
Y. S. Kim and S. Y. Sohn, "Managing loan customers using
misclassification patterns of credit scoring model," Expert Syst. Appl.,
vol. 26, pp. 567-573, 2004.
M. Chen and S. Huang, "Credit scoring and rejected instances
reassigning through evolutionary computation techniques," Expert Syst.
Appl., vol. 24, pp. 433-441, 2003..
T. H. T. Dinh and S. Kleimeier, "A credit scoring model for Vietnam's
retail banking market," International Review of Financial Analysis, vol.
16, pp. 471-495, 2007.
A. Lahsasna, R. N. Ainon and Y. W. Teh, "Credit Scoring Models Using
Soft Computing Methods: A Survey." Int.Arab J.Inf.Technol., vol. 7, pp.
115-123, 2010.
L. C. Thomas, "A survey of credit and behavioural scoring: forecasting
financial risk of lending to consumers," Int. J. Forecast., vol. 16, pp.
149-172, 2000.
V. S. Desai, J. N. Crook and G. A. Overstreet, "A comparison of neural
networks and linear scoring models in the credit union environment,"
Eur. J. Oper. Res., vol. 95, pp. 24-37, 1996.
T. Lee and I. Chen, "A two-stage hybrid credit scoring model using
artificial neural networks and multivariate adaptive regression splines,"
Expert Syst. Appl., vol. 28, pp. 743-752, 2005.
B. Baesens, T. Van Gestel, S. Viaene, M. Stepanova, J. Suykens and J.
Vanthienen, "Benchmarking state-of-the-art classification algorithms for
credit scoring," J. Oper. Res. Soc., vol. 54, pp. 627-635, 2003.
D. West, "Neural network credit scoring models," Comput. Oper. Res.,
vol. 27, pp. 1131-1152, 2000.
A. Khashman, "Neural networks for credit risk evaluation: Investigation
of different neural models and learning schemes," Expert Syst. Appl.,
vol. 37, pp. 6233-6239, 2010.
A. F. Atiya, "Bankruptcy prediction for credit risk using neural
networks: A survey and new results," Neural Networks, IEEE
Transactions on, vol. 12, pp. 929-935, 2001.
R. Malhotra and D. Malhotra, "Evaluating consumer loans using neural
networks," Omega, vol. 31, pp. 83-96, 2003.
S. Akkoç, "An empirical comparison of conventional techniques, neural
networks and the three stage hybrid Adaptive Neuro Fuzzy Inference
System (ANFIS) model for credit scoring analysis: The case of Turkish
credit card data," Eur. J. Oper. Res., vol. 222, pp. 168-178, 2012.
C. Huang, M. Chen and C. Wang, "Credit scoring with a data mining
approach based on support vector machines," Expert Syst. Appl., vol. 33,
pp. 847-856, 2007.
T. Bellotti and J. Crook, "Support vector machines for credit scoring and
discovery of significant features," Expert Syst. Appl., vol. 36, pp. 33023308, 2009.
Y. Wang, S. Wang and K. K. Lai, "A new fuzzy support vector machine
to evaluate credit risk," Fuzzy Systems, IEEE Transactions on, vol. 13,
pp. 820-831, 2005.
I. T. Van Gestel, B. Baesens, I. J. Garcia and P. Van Dijcke, "A support
vector machine approach to credit scoring," in Forum Financier-Revue
Bancaire Et Financiaire Bank En Financiewezen-, 2003, pp. 73-82.
Z. Huang, H. Chen, C. Hsu, W. Chen and S. Wu, "Credit rating analysis
with support vector machines and neural networks: a market
comparative study," Decis. Support Syst., vol. 37, pp. 543-558, 2004.
C. Tsai and M. Chen, "Credit rating by hybrid machine learning
techniques," Applied Soft Computing, vol. 10, pp. 374-380, 2010.
C. Tsai, "Combining cluster analysis with classifier ensembles to predict
financial distress," Information Fusion, vol. 16, pp. 46-58, 2014.
D. J. Hand and W. E. Henley, "Statistical classification methods in
consumer credit scoring: a review," Journal of the Royal Statistical
Society: Series A (Statistics in Society), vol. 160, pp. 523-541, 1997.
[25] H. A. Abdou and J. Pointon, "Credit scoring and decision making in
Egyptian public sector banks," International Journal of Managerial
Finance, vol. 5, pp. 391-406, 2009.
[26] G. Wang, J. Hao, J. Ma and H. Jiang, "A comparative assessment of
ensemble learning for credit scoring," Expert Syst. Appl., vol. 38, pp.
223-230, 2011.
[27] D. West, S. Dellana and J. Qian, "Neural network ensemble strategies
for financial decision applications," Comput. Oper. Res., vol. 32, pp.
2543-2559, 2005.
[28] C. Tsai and J. Wu, "Using neural network ensembles for bankruptcy
prediction and credit scoring," Expert Syst. Appl., vol. 34, pp. 26392649, 2008.
[29] L. Nanni and A. Lumini, "An experimental comparison of ensemble of
classifiers for bankruptcy prediction and credit scoring," Expert Syst.
Appl., vol. 36, pp. 3028-3033, 2009.
[30] L. Yu, S. Wang and K. K. Lai, "Credit risk assessment with a multistage
neural network ensemble learning approach," Expert Syst. Appl., vol. 34,
pp. 1434-1444, 2, 2008.
[31] D. Zhang, X. Zhou, S. C. Leung and J. Zheng, "Vertical bagging
decision trees model for credit scoring," Expert Syst. Appl., vol. 37, pp.
7838-7843, 2010.
[32] L. Zhou, K. K. Lai and L. Yu, "Least squares support vector machines
ensemble models for credit scoring," Expert Syst. Appl., vol. 37, pp.
127-133, 2010.
[33] N. Hsieh and L. Hung, "A data driven ensemble classifier for credit
scoring analysis," Expert Syst. Appl., vol. 37, pp. 534-545, 2010.
[34] B. Twala, "Multiple classifier application to credit risk assessment,"
Expert Syst. Appl., vol. 37, pp. 3326-3336, 2010.
[35] L. Yu, W. Yue, S. Wang and K. K. Lai, "Support vector machine based
multiagent ensemble learning for credit risk evaluation," Expert Syst.
Appl., vol. 37, pp. 1351-1360, 2010.
[36] S. Finlay, "Multiple classifier architectures and their application to credit
risk assessment," Eur. J. Oper. Res., vol. 210, pp. 368-378, 2011.
[37] G. Wang, J. Ma, L. Huang and K. Xu, "Two credit scoring models based
on dual strategy ensemble trees," Knowledge-Based Syst., vol. 26, pp.
61-68, 2012.
[38] A. Marqués, V. García and J. S. Sánchez, "Exploring the behaviour of
base classifiers in credit scoring ensembles," Expert Syst. Appl., vol. 39,
pp. 10244-10250, 2012.
[39] A. I. Marqués, V. García and J. S. Sánchez, "Two-level classifier
ensembles for credit risk assessment," Expert Syst. Appl., vol. 39, pp.
10916-10922, 9/15, 2012.
[40] T. Lee and I. Chen, "A two-stage hybrid credit scoring model using
artificial neural networks and multivariate adaptive regression splines,"
Expert Syst. Appl., vol. 28, pp. 743-752, 2005.
[41] C. Ong, J. Huang and G. Tzeng, "Building credit scoring models using
genetic programming," Expert Syst. Appl., vol. 29, pp. 41-47, 2005.
[42] H. L. Jensen, "Using neural networks for credit scoring," Managerial
Finance, vol. 18, pp. 15-26, 1992.
[43] S. Haykin, "Adaptive filters," Signal Processing Magazine, vol. 6, 1999.
[44] A. Vellido, P. J. Lisboa and J. Vaughan, "Neural networks in business: a
survey of applications (1992–1998)," Expert Syst. Appl., vol. 17, pp. 5170, 1999.
[45] C. Cortes and V. Vapnik, "Support-vector networks," Mach. Learning,
vol. 20, pp. 273-297, 1995.
[46] S. Teng, H. Du, N. Wu, W. Zhang and J. Su, "A cooperative network
intrusion detection based on fuzzy SVMs," Journal of Networks, vol. 5,
pp. 475-483, 2010.
[47] L. I. Kuncheva, Combining Pattern Classifiers: Methods and
Algorithms. John Wiley & Sons, 2004.
[48] Y. Freund, R. Schapire and N. Abe, "A short introduction to boosting,"
Journal-Japanese Society for Artificial Intelligence, vol. 14, pp. 1612,
1999.
[49] A. Asuncion and D. Newman, UCI Machine Learning Repository, 2007.