Evaluation of Credit Scoring Methods using Data Mining
Mahya Mirzaei
11793306
Table of Contents
Abstract
Introduction
Literature Review
Data Mining
Use of Data Mining in the banking industry
Credit scoring
Classification algorithms for credit scoring
1) Logistic regression, linear and quadratic discriminant analysis
2) Linear programming
3) Support vector machines
4) Neural Networks
5) Bayesian network classifiers
6) Decision trees and rules
Performance criteria for classification
PCC
Sensitivity and specificity
ROC
Methods
Results
Conclusion
References
Abstract
Credit scoring was developed to provide an accurate means of distinguishing between good
applicants, who are likely to repay, and bad applicants, who are likely to default. The loan
portfolios of financial institutions are expanding significantly, and different alternatives are
therefore required to improve the accuracy of credit scoring. Significant future savings
will be obtained if the accuracy of credit scoring models is improved by even a fraction of a
per cent.
A significant number of classification techniques have been implemented for credit scoring;
however, the conclusions of these studies often conflict as to which technique is the most
accurate. It is therefore difficult to determine which classifier to use for credit scoring.
The aim of this thesis is to investigate the performance of various state-of-the-art
classification algorithms by applying them to real-life credit scoring data sets. Well-known
classification algorithms such as logistic regression, discriminant analysis, k-nearest
neighbour, neural networks, decision trees and support vector machines will be used and their
suitability and performance will be investigated. The performance will be assessed using the
classification accuracy and the area under the receiver operating characteristic curve.
Statistically significant performance differences will be identified using appropriate test
statistics.
Introduction
In the financial industry, consumers regularly request credit to make purchases. The risk for
financial institutions to extend the requested credit depends on how well they distinguish the
good credit applicants from the bad credit applicants (Abdou & Pointon 2011).
One widely adopted technique for solving this problem is “Credit Scoring.” Credit scoring is
the set of decision models and their underlying techniques that aid lenders in the granting of
consumer credit. These techniques determine which applicants should receive credit, how
much credit they should obtain, and which operational strategies will enhance the
profitability of the borrowers to the lenders. Furthermore, credit scoring assists in assessing
the risk in lending. Credit scoring is a dependable assessment of an applicant's
creditworthiness since it is based on actual data (Lee et al. 2002). A lender commonly makes
two types of decisions:
first, whether to grant credit to a new applicant or not, and second, how to deal with existing
applicants, including whether to increase their credit limits or not. In both cases, whatever the
techniques used, it is critical that there is a large sample of previous customers with their
application details, behavioural patterns, and subsequent credit history available. Most of the
techniques use this sample to identify the connection between the characteristics of the
consumers (annual income, age, number of years in employment with their current employer,
etc.) and how “good” or “bad” their subsequent history is. Typical application areas in the
consumer market include: credit cards, auto loans, home mortgages, home equity loans, mail
catalogue orders, and a wide variety of personal loan products (Jensen 1992).
Nowadays, financial institutions see their loan portfolios expand and are actively
investigating various alternatives to improve the accuracy of their credit scoring practice.
Even a slight improvement in the accuracy of credit scoring models can result in dramatic
savings (Baesens et al. 2003).
A significant number of classification techniques have been implemented for credit scoring.
The techniques include the following (Baesens et al. 2003):
1) Traditional statistical methods such as discriminant analysis and logistic regression.
2) Non-parametric statistical models such as k-nearest neighbour and decision trees.
3) Neural networks.
When comparing the conclusions of some of these studies, conflicts often arise. As an
example, Desai et al (2002) came to the conclusion that neural networks result in a dramatic
improvement in performance compared to linear discriminant analysis when predicting bad
loans, whereas Yobas et al (2004) found the opposite. It is therefore difficult to determine
which classifier to use for a specific credit scoring dataset.
The objective of this thesis is to conduct a benchmarking study of various classification
techniques on real-life credit scoring data sets to identify the most accurate one. Techniques
that will be implemented are logistic regression, linear and quadratic discriminant analysis,
linear programming, support vector machines, neural networks, naïve Bayes and nearest
neighbour classifications. All techniques will be evaluated in terms of the percentage of
correctly classified observations and the area under the receiver operating characteristic
(ROC) curve; the ROC curve illustrates the behaviour of a classifier irrespective of the class
distribution or misclassification costs. Both performance measures will be compared by using
the appropriate test statistics.
The remainder of this report is organised as follows. First, a brief introduction to data mining
and credit scoring is given. Next, a short overview of the classification techniques to be used
is provided. This is followed by a discussion of the classification performance criteria.
Literature Review
Data Mining
The traditional approach to data analysis for decision support has been to couple domain
expertise with statistical modelling to create solutions for specific problems. However, the
availability of multidimensional data and the competitive demand for creating and utilising
data-driven analysis in a timely manner have forced the traditional approach to change (Apte
et al. 2002). In addition, end users require analytics results that are readily understandable
and can be used to gain the insight needed to make critical decisions. Knowledge Discovery
in Databases (KDD) techniques, which focus on reliability, scalability and full automation,
are consequently being used in addition to, and sometimes instead of, the
human-expert-intensive analytical techniques in order to improve the quality of decisions
(Apte et al. 2002).
Data mining identifies potentially useful information in large collections of data, providing
organisations with a competitive advantage and improved performance. Data mining can be
defined as the extraction of important information from existing data, through which decision
making within an organisation can be improved (Jayasree & Vijayalakshmi Siva Balan 2013).
Data mining can improve decision making by uncovering the relationships and patterns in
collected data, in addition to reducing the amount of data that needs to be examined (Wu et
al. 2014). With the aid of data mining, managers can make more knowledgeable decisions, as
data mining allows organisations to focus on the most important information in the database,
making it less expensive and time consuming to search through large amounts of data
(Hormozi & Giles 2004).
KDD applications result in improvements in quality of service and profitability through the
decreased cost of doing business. These improvements have been realised in many industries,
including the insurance and banking industries (Pulakkazhy & Balan 2013).
Use of Data Mining in the banking industry
The banking industry utilises the large amount of information collected from customers to
gain competitive advantage and improve the quality of service. The extremely high volume of
data that banks have been collecting over the years significantly affects the success rate of
data mining efforts (Jayasree & Vijayalakshmi Siva Balan 2013).
With the aid of data mining, it is possible to analyse patterns and trends that allow bank
executives to predict, with greater accuracy, how customers will react to rate adjustments,
which customers are most likely to accept new product offers, which customers present a
higher risk of defaulting on a loan, and how to form more profitable customer relationships.
One area of banking in which the use of data mining is proving particularly effective and
useful is risk management (Hormozi & Giles 2004).
Risk management
It is important for bank executives to know whether the customers they are dealing with are
reliable. Offering new customers credit cards, extending lines of credit to existing customers,
and approving loans can be risky decisions for banks if they know nothing about their
customers. With the aid of data mining, card-issuing banks can identify the customers with a
higher likelihood of defaulting on their accounts and thereby decrease their risk. An example
is a bank which, through data mining, discovered that cardholders who drew money at
casinos had increased rates of bankruptcy and delinquency (Hormozi & Giles 2004).
Credit scoring was one of the first financial risk management tools developed (Pulakkazhy &
Balan 2013). When making lending decisions, credit scoring is very valuable to lenders in the
banking industry; without an accurate, objective and controllable means of assessing risk,
lenders could not have expanded the number of loans they offer (Pulakkazhy & Balan 2013).
Profiles of good and bad new applicants can be developed from examples of both good and
bad loan applicants' histories. The credit behaviour of individual borrowers with instalment,
credit card and mortgage loans can be derived using data mining by taking into account
parameters such as credit history, length of residency and length of employment. From this
information a score is produced, with which the lender can evaluate the customer and
determine whether a particular customer presents a high risk of default or is a good loan
candidate (Hormozi & Giles 2004).
Credit scoring
As discussed in the previous section, effective management of various financial and credit
risks is crucial for bankers who have realised that the operations of the banks affect and are
affected by social, environmental and economic risks. Even though banks face a dramatic
amount of risk from the environment, the environment also presents profitable opportunities.
Risk management is one of the most important factors in the banking sector and the
management of the risk associated with the personal credit decision is one of the key
components of risk management. It involves one of the most vital banking decisions, which
requires distinguishing between customers with good and bad credit (Martens et al. 2007).
Credit scoring can be defined as the use of statistical models to transform relevant data into
numerical measures that guide credit decisions by determining the probability of a
prospective borrower defaulting on a loan. Credit evaluation is regarded as one of the most
important processes in banks' credit management decisions and consists of the collection,
analysis and classification of various credit elements and variables to support credit
decisions. The competitiveness, survival and profitability of banks depend largely on the
quality of the loans they provide, because without an accurate and automated risk assessment
tool, lenders of consumer credit could not effectively expand their loan portfolios (Thomas et
al. 2002).
The behaviour of the two classes of customers (good and bad) provides historical data that is
vital for predicting the behaviour of new applicants. Using credit scoring to assign credit to
good applicants and to distinguish between good and bad credit reduces the cost of credit
processing and the expected risk associated with bad loans. The credit decision is thereby
enhanced, and time, effort and money are saved. For this reason, credit scoring is regarded as
one of the most important techniques in banking and has become a crucial tool in the past
decade owing to the worldwide rapid growth of the credit industry and the management of
huge loan portfolios (Ong, Huang & Tzeng 2005).
In his studies, Crook (1996) stated that in credit evaluation the features and characteristics of
new loan applicants are compared with those of previous borrowers. If the prospective
customer's characteristics are close to those of previous customers who were granted a loan
and subsequently defaulted, the application will usually be rejected. On the other hand, the
application will be approved if the customer's characteristics are sufficiently similar to those
of customers who did not default.
A quantitative model is therefore derived for separating acceptable and unacceptable
applications, based on the analysts' historical experience with debtors. Using such a model,
credit application processing becomes an automatic, self-operating procedure that can be
applied consistently to all credit applications, without the subjectivity, inconsistency and
individual preferences that could motivate decisions under a judgemental technique
(Sullivan, 1981; Bailey, 2004).
Credit scoring has sometimes been criticised because of statistical issues with the data used
to develop the models and because of the assumptions of the particular statistical techniques
used to derive the point scores. In spite of these criticisms, however, credit scoring is
regarded as one of the most successful modelling approaches in the field of finance and
business (Sullivan, 1981; Bailey, 2004).
The limitations of credit scoring models are that their quality depends on the original
specification and that the data they use are historical. Variables or constants (or both) are
assumed to remain constant over time, so the accuracy of a model decreases unless it is
frequently updated. If banks keep records of type I and type II errors and apply a new or
updated model to make the necessary changes, this problem can be reduced. One of the
crucial shortfalls of such a model is that it can provide only two outcomes: the prospective
borrower defaults or does not default. In reality, however, there is a range of possible
outcomes, from delays in interest payments to non-payment of interest to default on principal
and interest. Frequently the borrower reports a problem with payments and the loan terms are
renegotiated. It is possible to include these different outcomes, but only two at a time
(Heffernan, 2005).
The optimal method for evaluating customers has not yet been identified, nor have the
variables a credit analyst should include when assessing applications or the kind of
information needed to improve and facilitate the decision-making process. The best measure
for predicting loan quality (whether a customer will default or not), and the extent to which a
customer can be classified as good or bad, also remain unknown.
A further gap in the literature is that the best statistical technique, whether judged by the
highest average correct classification rate, the lowest misclassification cost or other
evaluation criteria, has not been accurately identified. There also seems to be little
understanding of how well the credit quality predicted by conventional techniques compares
with that predicted by more advanced approaches.
Classification algorithms for credit scoring
Below is a brief overview of the classification algorithms that are going to be used in the
benchmarking study of various classification techniques.
1) Logistic regression, linear and quadratic discriminant analysis
Suppose a training set of $N$ data points $D = \{(x_i, y_i)\}_{i=1}^{N}$ is given, where the
input data $x_i \in \mathbb{R}^n$ and the corresponding binary class labels $y_i \in \{0, 1\}$.
Logistic regression (LOG) performs classification by estimating the probability
$P(y = 1 \mid x)$ as follows (Baesens et al. 2003):

$$P(y = 1 \mid x) = \frac{1}{1 + \exp\left(-(w_0 + \mathbf{w}^T \mathbf{x})\right)} \tag{1}$$

where $x \in \mathbb{R}^n$ is an $n$-dimensional input vector, $\mathbf{w}$ is the parameter
vector and $w_0$ is the scalar intercept. The parameters $\mathbf{w}$ and $w_0$ are usually
estimated using the maximum likelihood procedure.
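As a minimal sketch of this step (synthetic applicant data and scikit-learn are assumed; the
hypothetical features are not the thesis's variables), the snippet below estimates $w_0$ and
$\mathbf{w}$ by maximum likelihood:

```python
# Hedged sketch: maximum-likelihood logistic regression on synthetic data.
# scikit-learn is an assumed dependency; feature names are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))              # e.g. income, age, years employed
w_true = np.array([1.5, -0.8, 0.6])         # planted parameters for simulation
p = 1.0 / (1.0 + np.exp(-(0.2 + X @ w_true)))
y = rng.binomial(1, p)                      # 1 = good applicant, 0 = bad

model = LogisticRegression().fit(X, y)      # maximum-likelihood estimation
print("w0 =", model.intercept_, "w =", model.coef_)
print("P(y=1|x), first applicant:", model.predict_proba(X[:1])[0, 1])
```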
Using discriminant analysis, an observation $x$ is assigned to the class $y \in \{0, 1\}$ that
has the largest posterior probability $p(y \mid x)$, which is calculated using Bayes' theorem
as follows:

$$p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)} \tag{2}$$

If it is assumed that the class-conditional distributions $p(x \mid y)$ are multivariate
Gaussian,

$$p(x \mid y = 1) = \frac{1}{(2\pi)^{n/2}\, |\Sigma_1|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1) \right\} \tag{3}$$

where $\mu_1$ is the mean vector of class 1 and $\Sigma_1$ the covariance matrix of class 1,
the classification rule becomes: decide $y = 1$ if

$$(x - \mu_1)^T \Sigma_1^{-1} (x - \mu_1) - (x - \mu_0)^T \Sigma_0^{-1} (x - \mu_0) < 2\left( \log P(y = 1) - \log P(y = 0) \right) + \log |\Sigma_0| - \log |\Sigma_1|$$

and $y = 0$ otherwise. This classifier is called quadratic discriminant analysis (QDA)
because the decision boundary is quadratic in $x$ owing to the presence of the quadratic terms
$x^T \Sigma_1^{-1} x$ and $-x^T \Sigma_0^{-1} x$. The classifier is called linear discriminant
analysis (LDA) if $\Sigma_0 = \Sigma_1 = \Sigma$, in which case the quadratic terms cancel and
the classification rule becomes linear in $x$.
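As a small, hedged sketch (synthetic Gaussian data with unequal class covariances, the
setting in which QDA's quadratic boundary differs from LDA's; scikit-learn's estimators are
assumed):

```python
# Sketch: LDA vs QDA on synthetic two-class Gaussian data with unequal
# covariance matrices.
import numpy as np
from sklearn.discriminant_analysis import (
    LinearDiscriminantAnalysis, QuadraticDiscriminantAnalysis)

rng = np.random.default_rng(1)
X0 = rng.multivariate_normal([0, 0], [[1.0, 0.3], [0.3, 1.0]], size=500)
X1 = rng.multivariate_normal([2, 2], [[0.5, -0.2], [-0.2, 2.0]], size=500)
X = np.vstack([X0, X1])
y = np.array([0] * 500 + [1] * 500)

for clf in (LinearDiscriminantAnalysis(), QuadraticDiscriminantAnalysis()):
    clf.fit(X, y)                        # estimates mu_y, Sigma_y and priors
    print(type(clf).__name__, "training PCC:", clf.score(X, y))
```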
2) Linear programming
One of the most frequently used techniques for credit scoring in the industry is linear
programming (LP). Below is a very popular formulation (Tsai & Wu 2008):
Subject to
Where ξ represents the vector of ξi values. Separation of the goods from bads is done by the
first set of inequalities which assigns a score 𝑤 𝑇 𝑥𝑖 to them that is higher than the prescribed
cutoff c. The positive slack variables of ξi are entered as the misclassifications need to be
taken into account. Similarly the second inequality separates the bads from the goods by
assigning them a score 𝑤 𝑇 𝑥𝑖 that is lower than the prescribed cutoff c. Different variations of
this method is provided in the literature. For instance, mixed integer programming approach
for classification is suggested by Glen (10) . LP methods can easily model domain knowledge
or a priori bias by including additional constraints. This is one of the main advantages of
using LP methods for credit scoring (9).
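A minimal sketch of this formulation follows (SciPy's linprog and a fixed cutoff $c = 1$ on
synthetic goods/bads are assumptions, not the thesis toolchain):

```python
# Hedged sketch: the LP scoring formulation above, solved with SciPy's
# linprog on synthetic data. The fixed cutoff c = 1 is an assumption.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(2)
X_good = rng.normal(1.0, 1.0, size=(50, 2))   # applicants who repaid
X_bad = rng.normal(-1.0, 1.0, size=(50, 2))   # applicants who defaulted
n, N, c = 2, 100, 1.0

# Decision variables z = [w_1, w_2, xi_1, ..., xi_N]; minimise total slack.
obj = np.concatenate([np.zeros(n), np.ones(N)])
A_good = np.hstack([-X_good, -np.eye(N)[:50]])  # -w.x_i - xi_i <= -c (goods)
A_bad = np.hstack([X_bad, -np.eye(N)[50:]])     #  w.x_i - xi_i <=  c (bads)
A_ub = np.vstack([A_good, A_bad])
b_ub = np.concatenate([-c * np.ones(50), c * np.ones(50)])
bounds = [(None, None)] * n + [(0, None)] * N   # w free, slacks non-negative

res = linprog(obj, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
print("w =", res.x[:n], "total slack =", res.fun)
```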
3) Support vector machines
Suppose a training set of N data points {(𝑥𝑖 , 𝑦𝑖 )}𝑁
𝑖=1 is given. In this data set, input data is
𝑛
given by 𝑥𝑖 ∈ 𝑅 and the corresponding binary class labels are given by 𝑦𝑖 ∈ {−1, +1}.
According to Vapnik’s orriginal formulation, the SVM classifier, satisfies the following
conditions (Huang, Chen & Wang 2007):
Which is equivalent to:
𝜑(. ) is a nonlinear function that maps the input space to a high dimensional feature space
(possibly infinite). The above inequalities construct a hyperplane 𝑤 𝑇 𝜑(𝑥) + 𝑏 = 0 in this
feauture space, which discriminates between both classes. The figure below visualises this for
a typical two dimensional scenario.
Figure 1. SVM optimisation of the margin in the feature space.
In primal weight space, the classifier takes the following form:
But the classifier is never evaluated in this form. The convex optimisation problem is defined
as below:
Subject to
To allow misclassifications in the set of inequalities (for example because of overlapping
distributions), the slack variables ξi are required. The objective function has two parts, the
first part attempts to maximise the margin between both classes in the feature space, and the
second part tries to minimise the misclassification error. The tuning paramter in the algorithm
is the positive real constant C. The SVM is closely related to the LP formulation and the main
distinctions are the following:
1
1) SVM classifier introduces a large margin term 2 𝑤 𝑇 𝑤 in the objective function.
2) SVM considers a margin to separate the classes.
3) SVM allows for non-linear decision boundaries due to the mapping 𝜑(. ).
The Lagrangian to the constraint optimisation problem is given by
The saddle point of the Lagrangian gives the solution to the optimisation problem above.
Therefore by minimising
with respect to w, b, ξ and maximising it with resepct
to α and v:
And the following will be obtained:
From which we can obtain:
Where 𝐾(𝑥𝑖 , 𝑥) = 𝜑(𝑥𝑖 )𝑇 𝜑(𝑥) is taken with a positive kernel satisfyinf the mercer theorem.
Using the following optimisation problem the Lagrange multipliers αi are then determined:
Subject to
Now the classifier constructiton problem simplifies to a convex quadratic programming
problem (QP) in αi. To determine the decision surface w or 𝜑(𝑥𝑖 ) do not have to be
calculated and thereofre the explicit construction of the non-linear mapping 𝜑(𝑥) is not
required and the kernel function K will be used instead.
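As a hedged illustration (synthetic data with a non-linear boundary; scikit-learn's SVC,
which solves the dual QP above, is assumed):

```python
# Sketch: kernel SVM classifier; the RBF kernel plays the role of
# K(x_i, x) = phi(x_i)^T phi(x), and C is the margin/slack trade-off.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.normal(size=(400, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 > 1.5, 1, -1)  # non-linear classes

clf = SVC(C=1.0, kernel="rbf").fit(X, y)    # solves the dual QP in alpha_i
print("support vectors per class:", clf.n_support_)  # points with alpha_i > 0
print("training PCC:", clf.score(X, y))
```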
4) Neural Networks
Inspired by the functioning of human brain, the mathematical representations called Neural
Networks (NNs) were introduced and many types of NNs have been suggested in the
literature. The most popular NN for clasification is the multilayer perceptron (MLP) (Jensen
1992).
The typical structure of an MLP consists of an input layer, one or more hidden layers and an
output layer, each consisting of several of several neurons. One output value is generated by
each neuron by processing its inputs and the output is then transmitted to the neurons in the
subsequent layer. An example of an MLP with a hidden layer and an output neuron is
provided in the following figure (Jensen 1992).
Figure 2. Architecture of a multilayer perceptron with one hidden layer.
By processing the weighted inputs and its bias term bi(1), the output of hidden neuron I is
computed as follows:
W is the weight matrix and therefore Wij represents the weight connecting input j to hidden
unit i.
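The forward pass below is a minimal sketch of this computation (random placeholder weights;
the sigmoid transfer function is an assumption):

```python
# Sketch: forward pass of an MLP with one hidden layer, matching the
# hidden-neuron formula above. Weights are random placeholders and the
# sigmoid transfer function is an assumption.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(4)
x = rng.normal(size=3)                   # one applicant's standardised inputs
W1, b1 = rng.normal(size=(5, 3)), rng.normal(size=5)   # input -> hidden
W2, b2 = rng.normal(size=(1, 5)), rng.normal(size=1)   # hidden -> output

h = sigmoid(W1 @ x + b1)                 # h_i = f(b_i + sum_j W_ij * x_j)
y_hat = sigmoid(W2 @ h + b2)             # network estimate of P(good | x)
print(y_hat)
```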
5) Bayesian network classifiers
Naïve Bayes is a simple classifer that performs very well in practice. The classifer works by
learning the class conditional probablities 𝑝(𝑥𝑖 |𝑦) of each input xi given the class label y. The
posterior probability of each class y is computed by classifying a new test case using Bayes’
rule given the vector of observed attribute values (Baesens et al. 2003):
𝑝(𝑦|𝑥) =
𝑝(𝑥|𝑦)𝑝(𝑦)
𝑝(𝑥)
The Naïve Bayes classifier assumes that the attributes are conditionally independent given the
class label and therefore this simplifying assumption results in:
By utilising the frequency counts for discrete attributes and a normal or kernel density based
method for the continuos attributes, the probabilities p(xi|y) are estimated (15).
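A brief sketch (Gaussian naïve Bayes from scikit-learn on synthetic data, matching the
normal-density estimation option mentioned above):

```python
# Sketch: Gaussian naive Bayes; p(x_i | y) is modelled by a per-attribute
# normal density, and posteriors follow from Bayes' rule.
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(5)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=300) > 0).astype(int)

nb = GaussianNB().fit(X, y)                 # estimates p(x_i | y) and p(y)
print(nb.predict_proba(X[:2]))              # posterior p(y | x) per class
```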
6) Decision trees and rules
A large number of decision tree and rule induction algorithms have been introduced in the
literature and C4.5 algorithm is one of the most popular algorithms (17). Based on
information theoretical concepts, C4.5 induces decision trees. Suppose p1 is the proportion of
examples of class one and p0 the proportion of examples of class 0 in sample S. Then the
entropy of S will be given by the following equation (Baesens et al. 2003):
Where 𝑝0 + 𝑝1 = 1 and entropy reaches its maximum (1) when 𝑝1 = 𝑝0 = 0.5 and minimum
(0) when 𝑝1 = 𝑝0 = 0. The expected reduction in entropy because od splitting on attribute 𝑥𝑖
is defined by Gain(S, xi) and is given by:
Where Sv represents a subsample of S where the attribute x i has one specific value. When the
Gain principle is utilised to decide whether the node needs to split or not, the attributes with
many distinct values are favoured to split.
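The entropy and gain definitions above can be computed directly; the snippet below is a toy
sketch on a hand-made sample (the attribute values are hypothetical):

```python
# Toy sketch of the entropy and gain formulas above; the sample and the
# attribute values ("a", "b") are hypothetical.
import numpy as np

def entropy(y):
    p1 = np.mean(y)
    if p1 in (0.0, 1.0):                 # pure sample: entropy is zero
        return 0.0
    p0 = 1.0 - p1
    return -(p1 * np.log2(p1) + p0 * np.log2(p0))

y = np.array([1, 1, 0, 0, 1, 0, 1, 1])                  # class labels in S
x = np.array(["a", "a", "a", "b", "b", "b", "b", "a"])  # one attribute x_i

gain = entropy(y) - sum(
    np.mean(x == v) * entropy(y[x == v]) for v in np.unique(x))
print("Entropy(S) =", entropy(y))
print("Gain(S, x) =", gain)
```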
Performance criteria for classification
The performance of classification algorithms is measured using the following performance
criteria:
PCC
The percentage correctly classified (PCC) measures the proportion of correctly classified
cases in a sample. PCC assumes equal misclassification costs for false-positive and
false-negative predictions, which makes using PCC as a performance criterion problematic,
since for most real-life problems one type of misclassification may be significantly more
costly than the other. PCC also presumes that the class distribution (class priors) is
constant over time and relatively balanced (Abdou & Pointon 2011). Because class
distributions and misclassification costs are rarely uniform, the use of PCC alone is
normally inadequate. However, it should be noted that taking class distributions and
misclassification costs into account can be very difficult in practice, as these two factors
can rarely be specified accurately and can also vary over time (Tsai & Wu 2008).
Sensitivity and specificity
Suppose TP, FP, FN and TN stand for the numbers of true positives, false positives, false
negatives and true negatives respectively. Sensitivity is defined as the proportion of
positive examples that are predicted to be positive (TP/(TP+FN)), and specificity as the
proportion of negative examples that are predicted to be negative (TN/(FP+TN)) (Martens et
al. 2007).
As the threshold on a classifier's continuous output is varied between its extremes,
sensitivity, specificity and PCC change together (Martens et al. 2007).
ROC
The receiver operating characteristic (ROC) curve is a two-dimensional graph that depicts
1 − specificity (the false positive rate) on the x-axis and sensitivity on the y-axis for
different values of the classification threshold. The ROC graph represents the behaviour of a
classifier regardless of the class distribution or misclassification costs, which means that
the classification performance is given irrespective of these two factors (Tsai & Wu 2008;
Lee et al. 2006).
The ROC curves of different classifiers can be compared by calculating the area under the
receiver operating characteristic curve (AUC). The AUC can be interpreted as a figure of
merit that estimates the probability that a randomly chosen instance of class 1 (a positive
instance) is ranked higher than a randomly chosen instance of class 0 (a negative instance).
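A compact sketch tying the three criteria together (toy labels and scores; scikit-learn's
metric functions are assumed):

```python
# Sketch: PCC, sensitivity, specificity and AUC for a toy scorer.
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score

y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
scores = np.array([0.9, 0.2, 0.7, 0.4, 0.3, 0.6, 0.8, 0.1])
y_pred = (scores >= 0.5).astype(int)     # one choice of threshold

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("PCC        :", accuracy_score(y_true, y_pred))
print("sensitivity:", tp / (tp + fn))
print("specificity:", tn / (tn + fp))
print("AUC        :", roc_auc_score(y_true, scores))  # threshold-independent
```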
Methods
Case studies will be used to demonstrate the accuracy and usefulness of the methods investigated.
Results
The results of the study will be presented in this section.
Conclusion
In conclusion, in the last few decades quantitative methods known as credit scoring models
have been developed for the credit granting decision and have gained significant importance.
The objective of quantitative credit scoring models is to assign credit applicants to one of two
groups: a “good credit” group that is likely to repay the financial obligation, or a “bad credit”
group that should be denied credit because of a high likelihood of defaulting on the financial
obligation. With the growth of the credit industry and the large loan portfolios under
management today, the industry is actively developing more accurate credit scoring models.
Even a fraction of a percent increase in credit scoring accuracy is a significant
accomplishment. This effort is leading to the investigation of nonparametric statistical
methods, classification trees, and neural network technology for credit scoring applications.
The purpose of this research is to investigate the accuracy of the most popular credit scoring
architectures and to benchmark their performance against the models currently under
investigation.
All techniques will be evaluated in terms of the percentage of correctly classified
observations and the area under the receiver operating characteristic curve, which
illustrates the behaviour of a classifier irrespective of the class distribution or
misclassification costs. Both performance measures will be compared using appropriate test
statistics.
References
Abdou, H.A. & Pointon, J. 2011, 'Credit scoring, statistical techniques and evaluation criteria:
A review of the literature', Intelligent Systems in Accounting, Finance and
Management, vol. 18, no. 2-3, pp. 59-88.
Apte, C., Bing, L., Pednault, E.P.D. & Smyth, P. 2002, 'Business applications of data
mining', Communications of the ACM, vol. 45, no. 8, pp. 49-53.
Baesens, B., Van Gestel, T., Viaene, S., Stepanova, M., Suykens, J. & Vanthienen, J. 2003,
'Benchmarking state-of-the-art classification algorithms for credit scoring', Journal of
the Operational Research Society, vol. 54, no. 6, pp. 627-35.
Hormozi, A.M. & Giles, S. 2004, 'Data mining: a competitive weapon for banking and retail
industries', Information systems management, vol. 21, no. 2, pp. 62-71.
Hsinchun, C., Chiang, R.H.L. & Storey, V.C. 2012, 'Business intelligence and analytics:
From big data to big impact', MIS Quarterly, vol. 36, no. 4,
pp. 1165-88.
Huang, C.-L., Chen, M.-C. & Wang, C.-J. 2007, 'Credit scoring with a data mining approach
based on support vector machines', Expert Systems with Applications, vol. 33, no. 4,
pp. 847-56.
Jayasree, V. & Vijayalakshmi Siva Balan, R. 2013, 'A review on data mining in
banking sector', American Journal of Applied Sciences, vol. 10, no. 10, pp.
1160-5.
Jensen, H.L. 1992, 'Using neural networks for credit scoring', Managerial Finance, vol. 18,
no. 6, pp. 15-26.
Labrinidis, A. & Jagadish, H. 2012, 'Challenges and opportunities with big data', Proceedings
of the VLDB Endowment, vol. 5, no. 12, pp. 2032-3.
Lee, T.-S., Chiu, C.-C., Chou, Y.-C. & Lu, C.-J. 2006, 'Mining the customer credit using
classification and regression tree and multivariate adaptive regression splines',
Computational Statistics & Data Analysis, vol. 50, no. 4, pp. 1113-30.
Lee, T.-S., Chiu, C.-C., Lu, C.-J. & Chen, I.-F. 2002, 'Credit scoring using the hybrid neural
discriminant technique', Expert Systems with applications, vol. 23, no. 3, pp. 245-54.
Madden, S. 2012, 'From Databases to Big Data', Internet Computing, IEEE, vol. 16, no. 3, pp.
4-6.
Martens, D., Baesens, B., Van Gestel, T. & Vanthienen, J. 2007, 'Comprehensible credit
scoring models using rule extraction from support vector machines', European journal
of operational research, vol. 183, no. 3, pp. 1466-76.
Mervis, J. 2012, 'Agencies Rally to Tackle Big Data', Science, vol. 336, no. 6077, p. 22.
Ong, C.-S., Huang, J.-J. & Tzeng, G.-H. 2005, 'Building credit scoring models using genetic
programming', Expert Systems with Applications, vol. 29, no. 1, pp. 41-7.
Pulakkazhy, S. & Balan, R.V.S. 2013, 'Data mining in banking and its
applications: a review', Journal of Computer Science, vol. 9, no. 10, pp.
1252-9.
Shull, F. 2013, 'Getting an Intuition for Big Data', IEEE Software, vol. 30, no. 4, pp. 3-6.
Tsai, C.-F. & Wu, J.-W. 2008, 'Using neural network ensembles for bankruptcy prediction
and credit scoring', Expert Systems with Applications, vol. 34, no. 4, pp. 2639-49.
Wu, X., Zhu, X., Wu, G.-Q. & Ding, W. 2014, 'Data Mining with Big Data', IEEE
Transactions on Knowledge & Data Engineering, vol. 26, no. 1, pp. 97-107.