Assessment of Model Development
Techniques and Evaluation Methods for
Binary Classification in the Credit Industry
DSI Conference
Jennifer Lewis Priestley
Satish Nargundkar
November 24, 2003
Paper Research Questions
This paper addresses the following two research questions:
1. Does model development technique improve classification accuracy?
2. How will model selection vary based upon the evaluation method used?
Discussion Outline
Discussion of Modeling Techniques
Discussion of Model Evaluation Methods
 Global Classification Rate
 Loss Function
 K-S Test
 ROC Curves
Empirical Example
Model Development Techniques
Modeling plays an increasingly important role in CRM strategies:

[Figure: models supporting the stages of the CRM lifecycle]
 Customer Acquisition: Target Marketing Response Models, Risk Models
 Customer Management: Customer Behavioral Models, Usage Models, Attrition Models, Activation Models
 Collections/Recovery: Collections Models, Recovery Models
 Product Planning/Creating Value
 Other Models: Segmentation Models, Bankruptcy Models, Fraud Models
Model Development Techniques
Given that even minimal improvements in model
classification accuracy can translate into significant
savings or incremental revenue, an entire literature exists
on the comparison of model development techniques
(e.g., Atiya, 2001; Reichert et al., 1983; West, 2000;
Vellido et al., 1993; Zhang et al., 1999).
Statistical Techniques
 Linear Discriminant Analysis
 Logistic Analysis
 Multiple Regression Analysis
Non-Statistical Techniques
 Neural Networks
 Cluster Analysis
 Decision Trees
Model Evaluation Methods
But, developing the model is really only half the problem.
How do you then determine which model is “best”?
Model Evaluation Methods
In the context of binary classification (one of the most
common objectives in CRM modeling), one of four outcomes
is possible:
1. True positive
2. False positive
3. True negative
4. False negative

              True Good   True Bad
Pred. Good       TP          FP
Pred. Bad        FN          TN
Model Evaluation Methods
If all of these outcomes, specifically the errors, have the
same associated costs, then a simple global classification
rate is a highly appropriate evaluation method:
                 True Good   True Bad   Total
Predicted Good      650          50       700
Predicted Bad       200         100       300
Total               850         150      1000

Classification Rate = 75% ((650 + 100)/1000)
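As an illustrative sketch (not from the original paper), the rate can be computed directly from the counts above:

```python
# Minimal sketch: global classification rate from the 2x2 matrix above.
tp, fp = 650, 50    # predicted good: truly good, truly bad
fn, tn = 200, 100   # predicted bad:  truly good, truly bad

global_rate = (tp + tn) / (tp + fp + fn + tn)  # correct / total
print(f"Global classification rate: {global_rate:.0%}")  # -> 75%
```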
Model Evaluation Methods
The global classification rate is the most commonly used method (Bernardi and Zhang, 1999), but it fails when the costs of the misclassification errors differ (Type I vs. Type II errors):

Model 1 results:
Global Classification Rate = 75%
False Positive Rate = 5%
False Negative Rate = 20%

Model 2 results:
Global Classification Rate = 80%
False Positive Rate = 15%
False Negative Rate = 5%

What if the cost of a false positive were high and the cost of a false negative negligible? What if it were the other way around?
Model Evaluation Methods
If the misclassification error costs are understood with some
certainty, a loss function could be used to evaluate the best
model:
Loss = \pi_0 f_0 c_0 + \pi_1 f_1 c_1

where \pi_i is the prior probability that an element comes from class i, f_i is the probability that an element from class i will be misclassified, and c_i is the cost associated with that misclassification error.
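As a hedged sketch, the loss for the earlier 1,000-applicant example could be computed as below; the cost values are purely hypothetical, with a misclassified "bad" assumed ten times as costly as a misclassified "good":

```python
# Illustrative sketch of the two-class loss function above.
def expected_loss(pi, f, c):
    """Loss = pi_0*f_0*c_0 + pi_1*f_1*c_1 for classes 0 ("good") and 1 ("bad")."""
    return pi[0] * f[0] * c[0] + pi[1] * f[1] * c[1]

# Priors from the earlier table (850 goods, 150 bads out of 1,000);
# misclassification rates 200/850 and 50/150; costs are hypothetical.
loss = expected_loss(pi=(0.85, 0.15), f=(200/850, 50/150), c=(1.0, 10.0))
print(f"Expected loss: {loss:.3f}")
```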
Model Evaluation Methods
An evaluation method that uses the same conceptual foundation as the global classification rate is the Kolmogorov-Smirnov (K-S) Test:
[Figure: K-S chart plotting the cumulative percentage of observations against the score cutoff for each class; the greatest separation between the two cumulative distributions occurs at a cutoff score of .65]
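A minimal sketch of the K-S computation, assuming scores scaled to [0, 1] and labels of 0 ("good") and 1 ("bad"); the names and test data are illustrative, not from the study:

```python
import numpy as np

def ks_statistic(scores, labels, cutoffs=np.linspace(0.0, 1.0, 101)):
    """Maximum gap between the two classes' cumulative score distributions."""
    scores, labels = np.asarray(scores, float), np.asarray(labels)
    good, bad = scores[labels == 0], scores[labels == 1]
    cdf_good = np.array([(good <= c).mean() for c in cutoffs])  # cum. % of goods
    cdf_bad = np.array([(bad <= c).mean() for c in cutoffs])    # cum. % of bads
    gap = np.abs(cdf_bad - cdf_good)
    return gap.max(), cutoffs[gap.argmax()]  # K-S statistic and best cutoff

# Synthetic example: bads cluster at low scores, goods at high scores.
rng = np.random.default_rng(0)
scores = np.concatenate([rng.beta(5, 2, 500), rng.beta(2, 5, 500)])
labels = np.concatenate([np.zeros(500), np.ones(500)])
print(ks_statistic(scores, labels))
```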
Model Evaluation Methods
What if you don’t have ANY information regarding
misclassification error costs…or…the costs are in the eye of
the beholder?
Model Evaluation Methods
The area under the ROC (Receiver Operating Characteristics) Curve, θ, accounts for all possible outcomes (Swets et al., 2000; Thomas et al., 2002; Hanley and McNeil, 1982, 1983):

[Figure: ROC curve plotting Sensitivity (True Positives) against 1-Specificity (False Positives), with the 2x2 outcome matrix as an inset; θ = 1 indicates perfect classification, .5 < θ < 1 a typical model, and θ = .5 a model no better than chance]
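As an illustrative sketch, θ can be estimated with the Mann-Whitney formulation noted by Hanley and McNeil (1982): the probability that a randomly chosen "good" receives a higher score than a randomly chosen "bad". The function below is our assumption about an implementation, not the authors' code:

```python
import numpy as np

def auc_theta(scores, labels):
    """theta = P(score of a random "good" > score of a random "bad"),
    counting ties as half; labels: 0 = "good", 1 = "bad"."""
    scores, labels = np.asarray(scores, float), np.asarray(labels)
    good, bad = scores[labels == 0], scores[labels == 1]
    wins = (good[:, None] > bad[None, :]).sum()   # good outscores bad
    ties = (good[:, None] == bad[None, :]).sum()  # equal scores
    return (wins + 0.5 * ties) / (len(good) * len(bad))
```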
Empirical Example
So, given this background, the guiding questions of our research were:
1. Does model development technique impact prediction accuracy?
2. How will model selection vary with the evaluation method used?
Empirical Example
We elected to evaluate these questions using a large data
set from a pool of car loan applicants. The data set
included:
• 14,042 US applicants for car loans between June 1, 1998 and June
30, 1999.
• Of these applicants, 9442 were considered to have been “good” and
4600 were considered to be “bad” as of December 31, 1999.
• 65 variables, split into two groups –
• Transaction variables (miles on the vehicle, selling price, age of
vehicle, etc.)
• Applicant variables (bankruptcies, balances on other loans,
number of revolving trades, etc.)
Empirical Example
The LDA and Logistic models were developed using SAS
8.2, while the Neural Network models were developed using
Backpack® 4.0.
Because there are no accepted guidelines for the number of hidden nodes in Neural Network development (Zhang et al., 1999; Chen and Huang, 2003), we tested a range of hidden node counts from 5 to 50.
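The original networks were trained in Backpack® 4.0; as a rough modern stand-in (scikit-learn here is our substitution, not the tool the authors used, and the data is synthetic), the hidden-node search might look like:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the applicant/transaction variables and good/bad flag.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for n_hidden in range(5, 55, 5):  # hidden node counts 5, 10, ..., 50
    net = MLPClassifier(hidden_layer_sizes=(n_hidden,), max_iter=500,
                        random_state=0)
    net.fit(X_train, y_train)
    print(n_hidden, round(net.score(X_test, y_test), 3))  # global class. rate
```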
Empirical Example
Feed Forward Back Propagation Neural Networks:

[Figure: feed-forward network with multiple input nodes in an Input Layer, a Hidden Layer, and a single-node Output Layer]

At each node, a combination function combines all inputs into a single value, usually as a weighted summation (Σ), and a transfer function calculates the output value from the combination function.
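A minimal sketch of a single node, assuming a weighted-sum combination function and a logistic transfer function (the weights and inputs are illustrative):

```python
import numpy as np

def neuron_output(inputs, weights, bias=0.0):
    combined = np.dot(weights, inputs) + bias   # combination function: weighted sum
    return 1.0 / (1.0 + np.exp(-combined))      # transfer function: logistic sigmoid

print(neuron_output(np.array([0.2, 0.7, 0.1]), np.array([0.5, -1.2, 0.3])))
```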
Empirical Example - Results
Technique             Class Rate   Class Rate   Class Rate   Theta    K-S Test
                      "Goods"      "Bads"       "Global"
LDA                   73.91%       43.40%       59.74%       68.98%   19%
Logistic              70.54%       59.64%       69.45%       68.00%   24%
NN-5 Hidden Nodes     63.50%       56.50%       58.88%       63.59%   38%
NN-10 Hidden Nodes    75.40%       44.50%       55.07%       64.46%   11%
NN-15 Hidden Nodes    60.10%       62.10%       61.40%       65.89%   24%
NN-20 Hidden Nodes    62.70%       59.00%       60.29%       65.27%   24%
NN-25 Hidden Nodes    76.60%       41.90%       53.78%       63.55%   16%
NN-30 Hidden Nodes    52.70%       68.50%       63.13%       65.74%   22%
NN-35 Hidden Nodes    60.30%       59.00%       59.46%       63.30%   22%
NN-40 Hidden Nodes    62.40%       58.30%       59.71%       64.47%   17%
NN-45 Hidden Nodes    54.10%       65.20%       61.40%       64.50%   31%
NN-50 Hidden Nodes    53.20%       68.50%       63.27%       65.15%   37%
Conclusions
What were we able to demonstrate?
1. The “best” model depends upon the evaluation method
selected;
2. The appropriate evaluation method depends upon
situational and data context;
3. No multivariate technique is “best” under all
circumstances.