Evaluation of Credit Scoring Methods using Data Mining

Mahya Mirzaei
11793306

Table of Contents

Abstract
Introduction
Literature Review
Data Mining
Use of Data Mining in the Banking Industry
Credit scoring
Classification algorithms for credit scoring
1) Logistic regression, linear and quadratic discriminant analysis
2) Linear programming
3) Support vector machines
4) Neural networks
5) Bayesian network classifiers
6) Decision trees and rules
Performance criteria for classification
PCC
Sensitivity and specificity
ROC
Methods
Results
Conclusion
References

Abstract

Credit scoring was developed to provide an accurate means of distinguishing between good applicants, who are likely to repay, and bad applicants, who are likely to default. Loan portfolios of financial institutions are expanding significantly, and different alternatives are therefore required to improve the accuracy of credit scoring. Improving the accuracy of credit scoring models by even a fraction of a per cent can yield significant future savings. A large number of classification techniques have been applied to credit scoring; however, when the conclusions of these studies are compared, conflicts often arise regarding which classification technique is more accurate.
Determining which classification technique to use for credit scoring is therefore a challenging issue. The aim of this thesis is to investigate the performance of various state-of-the-art classification algorithms by applying them to real-life credit scoring data sets. Well-known classification algorithms such as logistic regression, discriminant analysis, k-nearest neighbour, neural networks, decision trees and support vector machines will be used, and their suitability and performance will be investigated. Performance will be assessed using the classification accuracy and the area under the receiver operating characteristic curve. Statistically significant performance differences will be identified using appropriate test statistics.

Introduction

In the financial industry, consumers regularly request credit to make purchases. The risk to financial institutions of extending the requested credit depends on how well they distinguish good credit applicants from bad credit applicants (Abdou & Pointon 2011). One widely adopted technique for solving this problem is credit scoring. Credit scoring is the set of decision models, and their underlying techniques, that aid lenders in granting consumer credit. These techniques determine which applicants receive credit, how much credit they can obtain, and the operational strategies that will enhance the profitability of borrowers to lenders. Furthermore, credit scoring assists in assessing the risk of lending. Credit scoring is a dependable assessment of an applicant's creditworthiness since it is based on actual data (Lee et al. 2002). A lender commonly makes two types of decision: first, whether to grant credit to a new applicant, and second, how to deal with existing applicants, including whether to increase their credit limits.
In both cases, whatever the techniques used, it is critical that a large sample of previous customers, with their application details, behavioural patterns and subsequent credit history, is available. Most techniques use this sample to identify the connection between the characteristics of consumers (annual income, age, number of years with their current employer, etc.) and how "good" or "bad" their subsequent history is. Typical application areas in the consumer market include credit cards, auto loans, home mortgages, home equity loans, mail catalogue orders, and a wide variety of personal loan products (Jensen 1992). Nowadays, financial institutions see their loan portfolios expand and are actively investigating various alternatives to improve the accuracy of their credit scoring practice. Even a slight improvement in the accuracy of credit scoring models can result in dramatic savings (Baesens et al. 2003). A significant number of classification techniques have been implemented for credit scoring, including (Baesens et al. 2003): 1) traditional statistical methods such as discriminant analysis and logistic regression; 2) non-parametric statistical models such as k-nearest neighbour and decision trees; and 3) neural networks. When the conclusions of these studies are compared, conflicts often arise. For example, Desai et al. (2002) concluded that neural networks dramatically improve performance compared with linear discriminant analysis for predicting bad loans, whereas Yobas et al. (2004) found the opposite. Determining which classification technique to use for a specific credit scoring dataset is therefore a challenging issue. The objective of this thesis is to conduct a benchmarking study of various classification techniques on real-life credit scoring data sets to identify the most accurate one.
Techniques that will be implemented are logistic regression, linear and quadratic discriminant analysis, linear programming, support vector machines, neural networks, naïve Bayes and nearest neighbour classification. All techniques will be evaluated in terms of the percentage of correctly classified observations and the area under the receiver operating characteristic curve. The ROC curve illustrates the behaviour of a classifier without regard to class distribution or misclassification cost. Both performance measures will be compared using appropriate test statistics. The remainder of this report is organised as follows. A brief introduction to data mining and credit scoring is given first. Next, a short overview of the classification techniques to be used is given. This is followed by a discussion of the classification performance criteria.

Literature Review

Data Mining

Coupling domain expertise with statistical modelling to create solutions for specific problems was the traditional approach to data analysis for decision support. However, the availability of multidimensional data and the competitive demand for the creation and utilisation of data-driven analysis in a timely manner have forced the traditional approach to change (Apte et al. 2002). In addition, end users require analytics results that are readily understandable and can be used to gain the insight needed to make critical decisions. Furthermore, Knowledge Discovery in Databases (KDD) techniques, which focus on reliability, scalability and full automation, have come to be used in addition to, and sometimes instead of, human-expert-intensive analytical techniques in order to improve the quality of decisions (Apte et al. 2002). Data mining recognises the potentially useful information in large collections of data, which provides organisations with a competitive advantage and improves performance.
Data mining can be defined as the extraction of important information from existing data, by the use of which decision making can be improved within an organisation (Jayasree & Vijayalakshmi Siva Balan 2013). Data mining can improve decision making by uncovering relationships and patterns in the collected data, in addition to reducing the amount of data that needs to be examined (Wu et al. 2014). With the aid of data mining, managers can make more knowledgeable decisions, as data mining allows organisations to focus on the most important information in the database, making it less expensive and time consuming to search through large amounts of data (Hormozi & Giles 2004). KDD applications result in improvements in quality of service and profitability due to the decreased cost of doing business. These improvements have been realised in many industries, including the insurance and banking industries (Pulakkazhy & Balan 2013).

Use of Data Mining in the Banking Industry

The banking industry utilises the large amount of information collected from customers to gain competitive advantage and improve the quality of service. The extremely high volume of data that banks have collected over the years significantly affects the success rate of data mining efforts (Jayasree & Vijayalakshmi Siva Balan 2013). With the aid of data mining, it is possible to analyse patterns and trends, by the use of which bank executives can more accurately predict issues such as: how customers will react to rate adjustments, which customers are more likely to accept new product offers, which riskier customers are more likely to default on a loan, and how to form more profitable customer relationships. One of the areas in banking where the use of data mining is proving very effective and useful is risk management (Hormozi & Giles 2004).
Risk management

It is important for bank executives to know whether the customers they are dealing with are reliable. If banks do not have any knowledge about their customers, offering new customers credit cards, extending lines of credit to existing customers, and approving loans can be risky decisions. With the aid of data mining, banks that issue credit cards can determine which customers have a higher likelihood of defaulting on their accounts, and thereby decrease their risk. An example of this is a bank that, using data mining, discovered that card holders who drew money at casinos had an increased rate of bankruptcy and delinquency (Hormozi & Giles 2004). Credit scoring was one of the first financial risk management tools developed (Pulakkazhy & Balan 2013). When making lending decisions, credit scoring is very valuable to lenders in the banking industry. Without an accurate, objective and controllable means of assessing risk, lenders would not have expanded the number of loans they offer (Pulakkazhy & Balan 2013). A profile of good and bad new applicants can be developed from the histories of both good and bad loan applicants. The credit behaviour of individual borrowers with instalment, credit card and mortgage loans can be derived using data mining by taking into account parameters such as credit history, length of residency and employment. From this information a score is produced, by the use of which the lender can evaluate the customer and determine whether a particular customer has a high risk of default or is a good loan candidate (Hormozi & Giles 2004).

Credit scoring

As discussed in the previous section, effective management of various financial and credit risks is crucial for bankers, who have realised that the operations of banks affect and are affected by social, environmental and economic risks.
Even though banks face a dramatic amount of risk from the environment, the environment also presents profitable opportunities. Risk management is one of the most important factors in the banking sector, and managing the risk associated with personal credit decisions is one of its key components. This involves one of the most vital banking decisions, which requires a distinction between customers with good and bad credit (Martens et al. 2007). Credit scoring can be defined as the use of statistical models to transform relevant data into numerical measures that guide credit decisions by determining the probability of a prospective borrower defaulting on a loan. Credit evaluation is regarded as one of the most important processes in the credit management decisions of banks and consists of the collection, analysis and classification of various credit elements and variables to assess credit decisions. The competitiveness, survival and profitability of banks depend largely on the quality of the loans they provide, because without an accurate and automated risk assessment tool, lenders of consumer credit could not effectively expand their loan portfolios (Thomas et al. 2002). The behaviour of the two classes of customers (good or bad) provides historical data that is vital for predicting the behaviour of new applicants. The use of credit scoring to assign credit to good applicants and distinguish between good and bad credit reduces the cost of credit processing and the expected risk associated with a bad loan. The credit decision is therefore enhanced, and time, effort and money are saved. For these reasons, credit scoring is regarded as one of the most important techniques in banking and has become a crucial tool in the past decade, owing to the worldwide rapid growth of the credit industry and the management of huge loan portfolios (Ong, Huang & Tzeng 2005).
In his studies, Crook (1996) stated that in credit evaluation, the features and characteristics of new loan applicants are compared with those of previous customers who have already repaid their loans. If the prospective customer's characteristics are close to those of previous customers who were granted a loan and subsequently defaulted, the application will usually be rejected. On the other hand, the application will be approved if the customer's characteristics are sufficiently similar to those of customers who did not default. A quantitative model is therefore derived for separating acceptable and unacceptable applications, based on the analysts' historical experience with debtors. Using such a model, credit application assessment becomes an automatic, self-operating process that can be applied to all credit applications consistently, without the subjectivity, inconsistency and individual preferences that could motivate decisions if a judgemental technique were used (Sullivan 1981; Bailey 2004). Credit scoring has sometimes been criticised because of statistical issues with the data used to develop the model, in addition to the assumptions of the particular statistical technique used to derive the point scores. In spite of these criticisms, however, credit scoring models are known as among the most successful models in the field of finance and business (Sullivan 1981; Bailey 2004). The limitations of credit scoring models are that their quality depends on the original specification and that the data they use is historical. The variables, or the weights attached to them (or both), are assumed to be constant over time, which causes the accuracy of a model to decrease unless it is frequently updated. If banks keep records of type I and type II errors and apply a new or updated model to make the necessary changes, this problem can be reduced. One of the crucial shortfalls of such a model is that it provides only two outcomes: the prospective borrower either defaults or does not default.
However, there is a range of possible outcomes, from a delay in interest payments, to non-payment of interest, to default on principal and interest. Frequently the borrower reports a problem with payments and the loan terms are renegotiated. It is possible to include these different outcomes, but only two at a time (Heffernan 2005). So far, it has not been established what the optimal method of evaluating customers is, which variables a credit analyst should include when assessing applications, or what kind of information is needed to improve and facilitate the decision-making process. The best measure for predicting loan quality (whether a customer will default or not), and the extent to which a customer can be classified as good or bad, is also still unknown. A further gap in the literature is that the best statistical technique, on the basis of the highest average correct classification rate, the lowest misclassification cost, or other evaluation criteria, has not been accurately identified. Finally, there seems to be little understanding of how the credit quality predicted by conventional techniques compares with that predicted by more advanced approaches.

Classification algorithms for credit scoring

Below is a brief overview of the classification algorithms that will be used in the benchmarking study.

1) Logistic regression, linear and quadratic discriminant analysis

Suppose a training set of N data points D = {(x_i, y_i)}, i = 1, ..., N, is given, where the inputs are x_i ∈ R^n and the corresponding binary class labels are y_i ∈ {0, 1}. Logistic regression (LOG) performs classification by estimating the probability P(y = 1|x) as follows (Baesens et al. 2003):

P(y = 1|x) = 1 / (1 + exp(−(w_0 + w^T x)))    (1)

where x ∈ R^n is an n-dimensional input vector, w is the parameter vector and w_0 is the scalar intercept.
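Equation (1) can be sketched in a few lines of plain Python; the weights and applicant features below are invented illustrative values, not estimated parameters:

```python
import math

def logistic_score(x, w, w0):
    """P(y = 1 | x) from equation (1): 1 / (1 + exp(-(w0 + w.x)))."""
    z = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical applicant features (e.g. scaled income, age, years employed)
x = [0.4, 0.1, 0.3]
w = [1.2, -0.5, 0.8]   # illustrative weights, not fitted by maximum likelihood
w0 = -0.2

p_good = logistic_score(x, w, w0)   # probability of class 1 ("good"), in (0, 1)
```

An applicant whose predicted probability falls below a chosen threshold would then be classed as a bad credit risk.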
The parameters w and w_0 are usually estimated using the maximum likelihood procedure. Using discriminant analysis, an observation x is assigned to the class y ∈ {0, 1} with the largest posterior probability p(y|x), which is calculated using Bayes' theorem:

p(y|x) = p(x|y) P(y) / p(x)    (2)

If the class-conditional distributions p(x|y) are assumed to be multivariate Gaussian,

p(x|y = 1) = (2π)^(−n/2) |Σ_1|^(−1/2) exp{ −(1/2) (x − μ_1)^T Σ_1^(−1) (x − μ_1) }    (3)

where μ_1 is the mean vector of class 1 and Σ_1 the covariance matrix of class 1, the classification rule becomes: decide y = 1 if

(x − μ_1)^T Σ_1^(−1) (x − μ_1) − (x − μ_0)^T Σ_0^(−1) (x − μ_0) < 2(log P(y = 1) − log P(y = 0)) + log|Σ_0| − log|Σ_1|

and y = 0 otherwise. This classifier is called quadratic discriminant analysis (QDA), as the decision boundary is quadratic in x due to the presence of the quadratic terms x^T Σ_1^(−1) x and −x^T Σ_0^(−1) x. The classifier is called linear discriminant analysis (LDA) if Σ_0 = Σ_1 = Σ, in which case the quadratic terms cancel and the classification rule becomes linear in x.

2) Linear programming

One of the most frequently used techniques for credit scoring in industry is linear programming (LP). A very popular formulation is the following (Tsai & Wu 2008):

min over w, ξ of  Σ_{i=1}^{N} ξ_i

subject to:

w^T x_i ≥ c − ξ_i, for all good applicants x_i
w^T x_i ≤ c + ξ_i, for all bad applicants x_i
ξ_i ≥ 0, i = 1, ..., N

where ξ represents the vector of ξ_i values. The first set of inequalities separates the goods from the bads by assigning them a score w^T x_i that is higher than the prescribed cutoff c. The positive slack variables ξ_i are included because misclassifications need to be taken into account. Similarly, the second set of inequalities separates the bads from the goods by assigning them a score w^T x_i that is lower than the prescribed cutoff c. Different variations of this method are provided in the literature; for instance, a mixed integer programming approach to classification is suggested by Glen (10).
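The accept/reject rule that such a linear scorecard implies can be sketched as follows; the weights and cutoff below are invented for illustration, whereas in practice they would be obtained by solving the LP:

```python
def accept(x, w, c):
    """Accept an applicant when the linear score w.x reaches the cutoff c."""
    score = sum(wi * xi for wi, xi in zip(w, x))
    return score >= c

w = [2.0, 1.0, 0.5]   # illustrative scorecard weights
c = 3.0               # illustrative cutoff

good_applicant = [1.0, 1.5, 1.0]   # score 2.0 + 1.5 + 0.5 = 4.0 -> accepted
bad_applicant = [0.5, 0.5, 1.0]    # score 1.0 + 0.5 + 0.5 = 2.0 -> rejected
```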
LP methods can easily model domain knowledge or a priori bias by including additional constraints. This is one of the main advantages of using LP methods for credit scoring (9).

3) Support vector machines

Suppose a training set of N data points {(x_i, y_i)}, i = 1, ..., N, is given, where the inputs are x_i ∈ R^n and the corresponding binary class labels are y_i ∈ {−1, +1}. According to Vapnik's original formulation, the SVM classifier satisfies the following conditions (Huang, Chen & Wang 2007):

w^T φ(x_i) + b ≥ +1, if y_i = +1
w^T φ(x_i) + b ≤ −1, if y_i = −1

which is equivalent to:

y_i (w^T φ(x_i) + b) ≥ 1, i = 1, ..., N

Here φ(·) is a nonlinear function that maps the input space to a (possibly infinite-dimensional) feature space. The above inequalities construct a hyperplane w^T φ(x) + b = 0 in this feature space, which discriminates between the two classes. Figure 1 visualises this for a typical two-dimensional scenario.

Figure 1. SVM optimisation of the margin in the feature space.

In primal weight space, the classifier takes the form

y(x) = sign(w^T φ(x) + b)

but the classifier is never evaluated in this form. The convex optimisation problem is defined as:

min over w, b, ξ of  (1/2) w^T w + C Σ_{i=1}^{N} ξ_i

subject to:

y_i (w^T φ(x_i) + b) ≥ 1 − ξ_i, i = 1, ..., N
ξ_i ≥ 0, i = 1, ..., N

The slack variables ξ_i are required to allow misclassifications in the set of inequalities (for example, because of overlapping distributions). The objective function has two parts: the first part attempts to maximise the margin between the two classes in the feature space, and the second part tries to minimise the misclassification error. The tuning parameter in the algorithm is the positive real constant C. The SVM is closely related to the LP formulation; the main distinctions are the following: 1) the SVM classifier introduces a large-margin term (1/2) w^T w in the objective function; 2) the SVM considers a margin to separate the classes; 3) the SVM allows for non-linear decision boundaries through the mapping φ(·). The Lagrangian of the constrained optimisation problem is given by

L(w, b, ξ; α, ν) = (1/2) w^T w + C Σ_i ξ_i − Σ_i α_i [y_i (w^T φ(x_i) + b) − 1 + ξ_i] − Σ_i ν_i ξ_i

with Lagrange multipliers α_i ≥ 0 and ν_i ≥ 0. The saddle point of the Lagrangian gives the solution to the optimisation problem above.
The Lagrangian is therefore minimised with respect to w, b, ξ and maximised with respect to α and ν, which yields:

∂L/∂w = 0  →  w = Σ_{i=1}^{N} α_i y_i φ(x_i)
∂L/∂b = 0  →  Σ_{i=1}^{N} α_i y_i = 0
∂L/∂ξ_i = 0  →  0 ≤ α_i ≤ C, i = 1, ..., N

From this the classifier is obtained:

y(x) = sign( Σ_{i=1}^{N} α_i y_i K(x_i, x) + b )

where K(x_i, x) = φ(x_i)^T φ(x) is taken to be a positive definite kernel satisfying the Mercer theorem. The Lagrange multipliers α_i are then determined by the following optimisation problem:

max over α of  Σ_{i=1}^{N} α_i − (1/2) Σ_{i,j=1}^{N} y_i y_j α_i α_j K(x_i, x_j)

subject to:

Σ_{i=1}^{N} α_i y_i = 0
0 ≤ α_i ≤ C, i = 1, ..., N

The classifier construction problem thus simplifies to a convex quadratic programming (QP) problem in α_i. Neither w nor φ(x_i) has to be calculated to determine the decision surface; the explicit construction of the non-linear mapping φ(x) is therefore not required, and the kernel function K is used instead.

4) Neural Networks

Neural networks (NNs) are mathematical representations inspired by the functioning of the human brain, and many types of NN have been suggested in the literature. The most popular NN for classification is the multilayer perceptron (MLP) (Jensen 1992). The typical structure of an MLP consists of an input layer, one or more hidden layers and an output layer, each consisting of several neurons. Each neuron generates one output value by processing its inputs, and this output is then transmitted to the neurons in the subsequent layer. An example of an MLP with one hidden layer and one output neuron is shown in Figure 2 (Jensen 1992).

Figure 2. Architecture of a multilayer perceptron with one hidden layer.

By processing its weighted inputs and its bias term b_i^(1), the output of hidden neuron i is computed as

h_i = f^(1)( Σ_j W_ij x_j + b_i^(1) )

where W is the weight matrix, so W_ij represents the weight connecting input j to hidden unit i, and f^(1) is the transfer function of the hidden layer.

5) Bayesian network classifiers

Naïve Bayes is a simple classifier that performs very well in practice. The classifier works by learning the class-conditional probabilities p(x_i|y) of each input x_i given the class label y.
A new test case is classified by computing the posterior probability of each class y, given the vector of observed attribute values, using Bayes' rule (Baesens et al. 2003):

p(y|x) = p(x|y) p(y) / p(x)

The naïve Bayes classifier assumes that the attributes are conditionally independent given the class label, and this simplifying assumption results in:

p(x|y) = Π_{i=1}^{n} p(x_i|y)

The probabilities p(x_i|y) are estimated using frequency counts for discrete attributes and a normal or kernel-density-based method for continuous attributes (15).

6) Decision trees and rules

A large number of decision tree and rule induction algorithms have been introduced in the literature; C4.5 is one of the most popular (17). C4.5 induces decision trees based on information-theoretic concepts. Suppose p_1 is the proportion of examples of class 1 and p_0 the proportion of examples of class 0 in sample S. The entropy of S is then given by (Baesens et al. 2003):

Entropy(S) = −p_1 log_2(p_1) − p_0 log_2(p_0)

where p_0 + p_1 = 1. Entropy reaches its maximum (1) when p_1 = p_0 = 0.5, and its minimum (0) when p_1 = 0 or p_1 = 1. The expected reduction in entropy from splitting on attribute x_i is defined by Gain(S, x_i):

Gain(S, x_i) = Entropy(S) − Σ_v (|S_v| / |S|) Entropy(S_v)

where S_v represents the subsample of S in which attribute x_i has the specific value v. When the gain criterion is used to decide whether a node should be split, attributes with many distinct values are favoured.

Performance criteria for classification

The performance of the classification algorithms is measured using the following criteria.

PCC

The percentage correctly classified (PCC) measures the proportion of correctly classified cases in a sample of data. PCC assumes equal misclassification costs for false-positive and false-negative predictions, which makes using PCC as a performance criterion problematic, since for most real-life problems one type of misclassification may be significantly more costly than the other.
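A toy numeric sketch of this point, with invented labels and costs: a classifier with the higher PCC can still incur the higher total cost once misclassification costs are asymmetric.

```python
def pcc(actual, predicted):
    """Percentage correctly classified over paired actual/predicted labels."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def total_cost(actual, predicted, cost_fn, cost_fp):
    """Total cost: cost_fn per missed defaulter (FN), cost_fp per rejected good (FP)."""
    cost = 0
    for a, p in zip(actual, predicted):
        if a == 1 and p == 0:
            cost += cost_fn   # false negative: defaulter approved
        elif a == 0 and p == 1:
            cost += cost_fp   # false positive: good applicant rejected
    return cost

# 1 = defaulter, 0 = repayer; invented toy labels
actual = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
pred_a = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]   # classifier A never flags anyone
pred_b = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]   # B flags both defaulters plus three goods

# Suppose a missed defaulter costs 10 and a rejected good applicant costs 1.
pcc_a, cost_a = pcc(actual, pred_a), total_cost(actual, pred_a, 10, 1)  # PCC 0.8, cost 20
pcc_b, cost_b = pcc(actual, pred_b), total_cost(actual, pred_b, 10, 1)  # PCC 0.7, cost 3
```

Classifier A scores the higher PCC yet is far more expensive, which is exactly why PCC alone can mislead.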
PCC also presumes that the class distribution (class priors) is constant over time and relatively balanced (Abdou & Pointon 2011). Because class distributions and misclassification costs are rarely uniform, the use of PCC alone is normally inadequate. It should be noted, however, that taking class distributions and misclassification costs into account can be very difficult in practice, as these two factors can rarely be specified accurately and can also vary over time (Tsai & Wu 2008).

Sensitivity and specificity

Suppose TP, FP, FN and TN stand for the number of true positives, false positives, false negatives and true negatives respectively. Sensitivity is defined as the proportion of positive examples that are predicted to be positive, TP/(TP+FN), and specificity measures the proportion of negative examples that are predicted to be negative, TN/(FP+TN) (Martens et al. 2007). As the threshold on a classifier's continuous output is varied between its extremes, sensitivity, specificity and PCC change together (Martens et al. 2007).

ROC

The receiver operating characteristic (ROC) curve is a two-dimensional graph that depicts 1 − specificity (the false positive rate) on the x-axis and sensitivity on the y-axis for different values of the classification threshold. The ROC graph represents the behaviour of a classifier regardless of class distribution or misclassification cost, which means that the classification performance is given irrespective of these two factors (Tsai & Wu 2008; Lee et al. 2006). The ROC curves of different classifiers can be compared by calculating the area under the receiver operating characteristic curve (AUC). The AUC can be interpreted as a figure of merit that estimates the probability that a randomly chosen instance of class 1 (a positive instance) is ranked higher than a randomly selected instance of class 0 (a negative instance).
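These measures can be sketched directly in plain Python; the scores below are invented classifier outputs, and the AUC is computed via its pairwise-ranking interpretation rather than by integrating the curve:

```python
def sensitivity(tp, fn):
    """Proportion of positives predicted positive: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Proportion of negatives predicted negative: TN / (FP + TN)."""
    return tn / (fp + tn)

def auc(scores_pos, scores_neg):
    """AUC as the probability that a random positive is ranked above a
    random negative (ties count half), per the ranking interpretation."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))

# Hypothetical classifier outputs for class 1 (good) and class 0 (bad)
pos = [0.9, 0.8, 0.6]
neg = [0.7, 0.4, 0.2]
# 9 pairs: 0.9 beats all three, 0.8 beats all three, 0.6 beats two -> AUC = 8/9
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation of the two classes.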
Methods

Case studies will be used to demonstrate the accuracy and usefulness of the methods obtained.

Results

The results of the study will go in this section.

Conclusion

In conclusion, over the last few decades, quantitative methods known as credit scoring models have been developed for the credit granting decision and have gained significant importance. The objective of quantitative credit scoring models is to assign credit applicants to one of two groups: a "good credit" group that is likely to repay the financial obligation, or a "bad credit" group that should be denied credit because of a high likelihood of defaulting on the financial obligation. With the growth of the credit industry and the large loan portfolios under management today, the industry is actively developing more accurate credit scoring models; even a fraction of a per cent increase in credit scoring accuracy is a significant accomplishment. This effort is leading to the investigation of nonparametric statistical methods, classification trees and neural network technology for credit scoring applications. The purpose of this research is to investigate the accuracy of the most popular credit scoring architectures and to benchmark their performance against the models currently under investigation. All techniques will be evaluated in terms of the percentage of correctly classified observations and the area under the receiver operating characteristic curve, which illustrates the behaviour of a classifier without regard to class distribution or misclassification cost. Both performance measures will be compared using appropriate test statistics.

References

Abdou, H.A. & Pointon, J. 2011, 'Credit scoring, statistical techniques and evaluation criteria: A review of the literature', Intelligent Systems in Accounting, Finance and Management, vol. 18, no. 2-3, pp. 59-88.
Apte, C., Bing, L., Pednault, E.P.D. & Smyth, P.
2002, 'Business applications of data mining', Communications of the ACM, vol. 45, no. 8, pp. 49-53.
Baesens, B., Van Gestel, T., Viaene, S., Stepanova, M., Suykens, J. & Vanthienen, J. 2003, 'Benchmarking state-of-the-art classification algorithms for credit scoring', Journal of the Operational Research Society, vol. 54, no. 6, pp. 627-35.
Hormozi, A.M. & Giles, S. 2004, 'Data mining: a competitive weapon for banking and retail industries', Information Systems Management, vol. 21, no. 2, pp. 62-71.
Hsinchun, C., Chiang, R.H.L. & Storey, V.C. 2012, 'Business intelligence and analytics: from big data to big impact', MIS Quarterly, vol. 36, no. 4, pp. 1165-88.
Huang, C.-L., Chen, M.-C. & Wang, C.-J. 2007, 'Credit scoring with a data mining approach based on support vector machines', Expert Systems with Applications, vol. 33, no. 4, pp. 847-56.
Jayasree, V. & Vijayalakshmi Siva Balan, R. 2013, 'A review on data mining in banking sector', American Journal of Applied Sciences, vol. 10, no. 10, pp. 1160-5.
Jensen, H.L. 1992, 'Using neural networks for credit scoring', Managerial Finance, vol. 18, no. 6, pp. 15-26.
Labrinidis, A. & Jagadish, H. 2012, 'Challenges and opportunities with big data', Proceedings of the VLDB Endowment, vol. 5, no. 12, pp. 2032-3.
Lee, T.-S., Chiu, C.-C., Chou, Y.-C. & Lu, C.-J. 2006, 'Mining the customer credit using classification and regression tree and multivariate adaptive regression splines', Computational Statistics & Data Analysis, vol. 50, no. 4, pp. 1113-30.
Lee, T.-S., Chiu, C.-C., Lu, C.-J. & Chen, I.-F. 2002, 'Credit scoring using the hybrid neural discriminant technique', Expert Systems with Applications, vol. 23, no. 3, pp. 245-54.
Madden, S. 2012, 'From databases to big data', IEEE Internet Computing, vol. 16, no. 3, pp. 4-6.
Martens, D., Baesens, B., Van Gestel, T. & Vanthienen, J. 2007, 'Comprehensible credit scoring models using rule extraction from support vector machines', European Journal of Operational Research, vol. 183, no. 3, pp. 1466-76.
Mervis, J. 2012, 'Agencies rally to tackle big data', Science, vol. 336, no. 6077, p. 22.
Ong, C.-S., Huang, J.-J. & Tzeng, G.-H. 2005, 'Building credit scoring models using genetic programming', Expert Systems with Applications, vol. 29, no. 1, pp. 41-7.
Pulakkazhy, S. & Balan, R.V.S. 2013, 'Data mining in banking and its applications: a review', Journal of Computer Science, vol. 9, no. 10, pp. 1252-9.
Shull, F. 2013, 'Getting an intuition for big data', IEEE Software, vol. 30, no. 4, pp. 3-6.
Tsai, C.-F. & Wu, J.-W. 2008, 'Using neural network ensembles for bankruptcy prediction and credit scoring', Expert Systems with Applications, vol. 34, no. 4, pp. 2639-49.
Wu, X., Zhu, X., Wu, G.-Q. & Ding, W. 2014, 'Data mining with big data', IEEE Transactions on Knowledge & Data Engineering, vol. 26, no. 1, pp. 97-107.