Assessment of Model Development Techniques and Evaluation Methods for Binary Classification in the Credit Industry
DSI Conference
Jennifer Lewis Priestley
Satish Nargundkar
November 24, 2003

Research Questions
This paper addresses the following two research questions:
1. Does model development technique improve classification accuracy?
2. How will model selection vary based upon the evaluation method used?

Discussion Outline
• Discussion of Modeling Techniques
• Discussion of Model Evaluation Methods
  • Global Classification Rate
  • Loss Function
  • K-S Test
  • ROC Curves
• Empirical Example

Model Development Techniques
Modeling plays an increasingly important role in CRM strategies for creating value across the customer life cycle, from target marketing through product planning:
• Customer Acquisition: Response Models, Risk Models
• Customer Management: Behavioral Models, Usage Models, Attrition Models, Activation Models
• Collections/Recovery: Collections Models, Recovery Models
• Other Models: Segmentation Models, Bankruptcy Models, Fraud Models

Model Development Techniques
Given that even minimal improvements in model classification accuracy can translate into significant savings or incremental revenue, an entire literature exists on the comparison of model development techniques (e.g., Atiya, 2001; Reichert et al., 1983; West, 2000; Vellido et al., 1993; Zhang et al., 1999).
• Statistical Techniques: Linear Discriminant Analysis, Logistic Analysis, Multiple Regression Analysis
• Non-Statistical Techniques: Neural Networks, Cluster Analysis, Decision Trees

Model Evaluation Methods
But developing the model is really only half the problem. How do you then determine which model is "best"?

Model Evaluation Methods
In the context of binary classification (one of the most common objectives in CRM modeling), one of four outcomes is possible: (1) true positive, (2) false positive, (3) true negative, or (4) false negative.

                   True Good   True Bad
  Pred. Good          TP          FP
  Pred. Bad           FN          TN

Model Evaluation Methods
If all of these outcomes, specifically the errors, have the same associated costs, then a simple global classification rate is a highly appropriate evaluation method:

                   True Good   True Bad   Total
  Predicted Good        650         50      700
  Predicted Bad         200        100      300
  Total                 850        150     1000

Global Classification Rate = 75% ((650 + 100)/1000)

Model Evaluation Methods
The global classification rate is the most commonly used evaluation method (Bernardi and Zhang, 1999), but it fails when the costs of the misclassification errors differ (Type I vs. Type II errors):
• Model 1 results: Global Classification Rate = 75%, False Positive Rate = 5%, False Negative Rate = 20%
• Model 2 results: Global Classification Rate = 80%, False Positive Rate = 15%, False Negative Rate = 5%
What if the cost of a false positive was great, and the cost of a false negative was negligible? What if it was the other way around?

Model Evaluation Methods
If the misclassification error costs are understood with some certainty, a loss function can be used to select the best model:

  Loss = π0·f0·c0 + π1·f1·c1

where πi is the prior probability that an element comes from class i, fi is the probability that an element from class i will be misclassified, and ci is the cost associated with that misclassification error.
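To make the cost discussion concrete, here is a minimal Python sketch (not from the paper) that computes the global classification rate and the loss function from the illustrative confusion matrix above; the cost values c_good and c_bad are hypothetical.

```python
# Illustrative counts from the confusion matrix above (not real model output).
tp, fp = 650, 50    # predicted good: actually good, actually bad
fn, tn = 200, 100   # predicted bad:  actually good, actually bad
total = tp + fp + fn + tn

# Global classification rate: share of all applicants classified correctly.
global_rate = (tp + tn) / total               # (650 + 100) / 1000 = 0.75

# Loss = pi0*f0*c0 + pi1*f1*c1
pi_good = (tp + fn) / total                   # prior probability of a "good"
pi_bad = (fp + tn) / total                    # prior probability of a "bad"
f_good = fn / (tp + fn)                       # P(misclassified | good)
f_bad = fp / (fp + tn)                        # P(misclassified | bad)
c_good, c_bad = 1.0, 10.0                     # hypothetical misclassification costs
loss = pi_good * f_good * c_good + pi_bad * f_bad * c_bad

print(f"global classification rate = {global_rate:.0%}")
print(f"expected loss              = {loss:.3f}")
```

With these hypothetical costs, a false positive is ten times as expensive as a false negative, so the model with the lower false positive rate would be preferred even if its global classification rate were worse.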
Model Evaluation Methods
An evaluation method that uses the same conceptual foundation as the global classification rate is the Kolmogorov-Smirnov (K-S) test:
[Figure: cumulative percentage of observations plotted against the score cutoff (0.00 to 1.00); the greatest separation between the two cumulative distributions occurs at a cutoff score of .65]

Model Evaluation Methods
What if you don't have ANY information regarding misclassification error costs... or the costs are in the eye of the beholder?

Model Evaluation Methods
The area under the ROC (Receiver Operating Characteristic) curve accounts for all possible outcomes (Swets et al., 2000; Thomas et al., 2002; Hanley and McNeil, 1982, 1983):
[Figure: ROC curve plotting sensitivity (true positives) against 1 - specificity (false positives); θ = .5 corresponds to the diagonal of no discrimination, .5 < θ < 1 to a model with some discriminating power, and θ = 1 to perfect classification]
(A brief computational sketch of the K-S statistic and θ follows the conclusions below.)

Empirical Example
So, given this background, the guiding questions of our research were:
1. Does model development technique impact prediction accuracy?
2. How will model selection vary with the evaluation method used?

Empirical Example
We elected to evaluate these questions using a large data set from a pool of car loan applicants. The data set included:
• 14,042 US applicants for car loans between June 1, 1998 and June 30, 1999.
• Of these applicants, 9,442 were considered to have been "good" and 4,600 were considered to be "bad" as of December 31, 1999.
• 65 variables, split into two groups:
  • Transaction variables (miles on the vehicle, selling price, age of vehicle, etc.)
  • Applicant variables (bankruptcies, balances on other loans, number of revolving trades, etc.)

Empirical Example
The LDA and Logistic models were developed using SAS 8.2, while the Neural Network models were developed using Backpack® 4.0. Because there are no accepted guidelines for the number of hidden nodes in Neural Network development (Zhang et al., 1999; Chen and Huang, 2003), we tested a range of hidden nodes from 5 to 50.

Empirical Example
Feed-Forward Back-Propagation Neural Networks:
[Figure: a network with an input layer, a hidden layer, and an output layer; each node applies a combination function, which combines all inputs into a single value (usually a weighted summation, Σ), followed by a transfer function, which calculates the output value from the combination function]

Empirical Example: Results

  Technique            Class Rate "Goods"   Class Rate "Bads"   Class Rate "Global"   Theta    K-S Test
  LDA                        73.91%               43.40%              59.74%          68.98%     19%
  Logistic                   70.54%               59.64%              69.45%          68.00%     24%
  NN-5 Hidden Nodes          63.50%               56.50%              58.88%          63.59%     38%
  NN-10 Hidden Nodes         75.40%               44.50%              55.07%          64.46%     11%
  NN-15 Hidden Nodes         60.10%               62.10%              61.40%          65.89%     24%
  NN-20 Hidden Nodes         62.70%               59.00%              60.29%          65.27%     24%
  NN-25 Hidden Nodes         76.60%               41.90%              53.78%          63.55%     16%
  NN-30 Hidden Nodes         52.70%               68.50%              63.13%          65.74%     22%
  NN-35 Hidden Nodes         60.30%               59.00%              59.46%          63.30%     22%
  NN-40 Hidden Nodes         62.40%               58.30%              59.71%          64.47%     17%
  NN-45 Hidden Nodes         54.10%               65.20%              61.40%          64.50%     31%
  NN-50 Hidden Nodes         53.20%               68.50%              63.27%          65.15%     37%

Conclusions
What were we able to demonstrate?
1. The "best" model depends upon the evaluation method selected;
2. The appropriate evaluation method depends upon the situational and data context;
3. No multivariate technique is "best" under all circumstances.
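As an illustration of how the θ (area under the ROC curve) and K-S values reported in the results table can be obtained from a set of model scores, here is a minimal Python sketch. It is not the authors' code; the function name ks_and_theta is our own, and the scores and labels at the bottom are made-up example data.

```python
# Minimal sketch (illustrative only): K-S statistic and theta (area under
# the ROC curve) computed from model scores and actual good/bad labels.
def ks_and_theta(scores, labels):
    """labels: 1 = good, 0 = bad; a higher score means 'more likely good'."""
    n_good = sum(labels)
    n_bad = len(labels) - n_good

    # K-S statistic: the largest gap between the cumulative distributions
    # of goods and bads as the score cutoff sweeps from low to high.
    cum_good = cum_bad = 0
    ks = 0.0
    for _, label in sorted(zip(scores, labels)):
        if label == 1:
            cum_good += 1
        else:
            cum_bad += 1
        ks = max(ks, abs(cum_good / n_good - cum_bad / n_bad))

    # Theta (AUC): the probability that a randomly chosen "good" receives a
    # higher score than a randomly chosen "bad" (ties count as one half).
    concordant = 0.0
    for s_g, l_g in zip(scores, labels):
        if l_g == 1:
            for s_b, l_b in zip(scores, labels):
                if l_b == 0:
                    concordant += 1.0 if s_g > s_b else (0.5 if s_g == s_b else 0.0)
    theta = concordant / (n_good * n_bad)
    return ks, theta


# Made-up example scores and good/bad labels, for illustration only.
scores = [0.95, 0.80, 0.70, 0.65, 0.60, 0.40, 0.35, 0.20]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
ks, theta = ks_and_theta(scores, labels)
print(f"K-S = {ks:.2f}, theta = {theta:.2f}")
```

The nested loop makes the pairwise definition of θ explicit (θ = .5 for no discrimination, θ = 1 for perfect separation); for samples the size of the applicant pool above, one would instead compute it from score ranks or a sorted sweep.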