University of Rhode Island
Department of Computer Science and Statistics
March 30, 2007
An Overview and Example
of Data Mining
Daniel T. Larose, Ph.D.
Professor of Statistics
Director, Data Mining @CCSU
Editor, Wiley Series on Methods and Applications in Data Mining
[email protected]
www.math.ccsu.edu/larose
URI Dept of Computer Science and Statistics - An Overview and Example of Data Mining - Larose
Overview
• Part One:
– A Brief Overview of Data Mining
• Part Two:
– An Example of Data Mining:
– Modeling Response to Direct Mail Marketing
• But first, a shameless plug …
Master of Science in DM at CCSU
Faculty
• Dr. Roger Bilisoly (from Ohio State Univ., Statistics)
– Text Mining, Intro to Data Mining
• Dr. Darius Dziuda (from Warsaw Polytechnic Univ, CS)
– Data Mining for Genomics and Proteomics, Biomarker Discovery
• Dr. Zdravko Markov (from Sofia Univ, CS)
– Data Mining (CS perspective), Machine Learning
• Dr. Daniel Miller (from UConn, Statistics)
– Applied Multivariate Analysis, Mathematical Statistics II, Intro to Data Mining
• Dr. Krishna Saha (from Univ of Windsor, Statistics)
– Intro to Data Mining using R
• Dr. Daniel Larose (Program Director) (from UConn, Statistics)
– Intro to Data Mining, Data Mining Methods, Applied Data Mining, Web Mining
Master of Science in DM at CCSU
Program (36 credits)
• Core Courses (27 credits). All available online.
– Stat 521 Introduction to Data Mining (4 cr)
– Stat 522 Data Mining Methods (4 cr)
– Stat 523 Applied Data Mining (4 cr)
– Stat 525 Web Mining
– Stat 526 Data Mining for Genomics and Proteomics
– Stat 527 Text Mining
– Stat 416 Mathematical Statistics II
– Stat 570 Applied Multivariate Analysis
• Electives (6 credits. Choose two)
– CS 570 Topics in Artificial Intelligence: Machine Learning
– CS 580 Topics in Advanced Database: Data Mining
– Stat 455 Experimental Design
– Stat 551 Applied Stochastic Processes
– Stat 567 Linear Models
– Stat 575 Mathematical Statistics III
– Stat 529 Current Issues in Data Mining
• Capstone Requirement: Stat 599 Thesis (3 credits)
Master of Science in DM at CCSU
• Only MS in DM that is entirely online.
• Some courses available on campus.
• Student must come to CCSU to present Thesis
• We reach students in about 30 US States and a dozen foreign countries
• Half of our students already have master’s degrees
• About 15% already have Ph.D.’s
• Typical student is a mid-career professional
• Backgrounds are diverse: Computer Science, Engineering, Finance, Chemistry, Database Admin, Statistics, etc.
• www.ccsu.edu/datamining
Graduate Certificate in Data Mining
• 18 Credits:
• Required Courses (12 credits)
– Stat 521 Introduction to Data Mining
– Stat 522 Data Mining Methods and Models
– Stat 523 Applied Data Mining
• Elective Courses (6 credits. Choose Two):
– Stat 525 Web Mining
– Stat 526 Data Mining for Genomics and Proteomics
– Stat 527 Text Mining
– Stat 529 Current Issues in Data Mining
– Some other graduate-level data mining or statistics course, with approval of advisor.
• No Mathematical Statistics requirement.
Material for Part I Drawn From:
Discovering Knowledge in Data:
An Introduction to Data Mining
(Wiley, 2005)
• Chapter 1. An Introduction to Data Mining
• Chapter 2. Data Preprocessing
• Chapter 3. Exploratory Data Analysis
• Chapter 4. Statistical Approaches to Estimation and Prediction
• Chapter 5. K-Nearest Neighbor
• Chapter 6. Decision Trees
• Chapter 7. Neural Networks
• Chapter 8. Hierarchical and K-Means Clustering
• Chapter 9. Kohonen networks
• Chapter 10. Association Rules
• Chapter 11. Model Evaluation Techniques
Material for Part II Drawn From:
Data Mining Methods and Models
(Wiley, 2006)
• Chapter 1. Dimension Reduction Methods
• Chapter 2. Regression Modeling
• Chapter 3. Multiple Regression and Model Building
• Chapter 4. Logistic Regression
• Chapter 5. Naïve Bayes Classification and Bayesian Networks
• Chapter 6. Genetic Algorithms
• Chapter 7. Case Study: Modeling Response to Direct-Mail Marketing
No Material Drawn From:
Data Mining the Web: Uncovering Patterns in
Web Content, Structure, and Usage
(Wiley, April 2007)
• Part One: Web Structure Mining
– Information Retrieval and Web Search
– Hyperlink-Based Ranking
• Part Two: Web Content Mining
– Clustering
– Evaluating Clustering
– Classification
• Part Three: Web Usage Mining
– Data Preprocessing,
– Exploratory Data Analysis,
– Association Rules, Clustering, and
Classification for Web Usage Mining
• With Dr. Zdravko Markov, Computer
Science, CCSU
Call for Book Proposals
Wiley Series on
Methods and Applications in Data Mining
• Suggested topics:
– Data Mining in Bioinformatics
– Emerging Techniques in Data Mining (e.g., SVM)
– Data Mining with Evolutionary Algorithms
– Drug Discovery Using Data Mining
– Mining Data Streams
– Visual Analysis in Data Mining
• Books in press:
– Data Mining for Genomics and Proteomics, by Darius Dziuda
– Practical Text Mining Using Perl, by Roger Bilisoly
• Contact Series Editor at [email protected]
What is Data Mining?
• “Data mining is the analysis of
(often large) observational data
sets to find unsuspected
relationships and to summarize
the data in novel ways that are
both understandable and useful
to the data owner.”
– David Hand, Heikki Mannila &
Padhraic Smyth, Principles of Data
Mining, MIT Press, 2001
Why Data Mining?
• “We are drowning in information but
starved for knowledge.”
– John Naisbitt, Megatrends, 1984.
• “The problem is that there are not enough
trained human analysts available who are
skilled at translating all of this data into
knowledge, and thence up the taxonomy
tree into wisdom.”
– Daniel Larose, Discovering Knowledge in Data: An
Introduction to Data Mining, Wiley, 2005.
Need for Human Direction
• Automation is no substitute for human
supervision and input.
– Humans need to be actively involved
at every phase of data mining process.
• “Rather than asking where humans fit into data mining, we should instead inquire about how we may design data mining into the very human process of problem solving.”
– Daniel Larose, Discovering Knowledge in Data: An Introduction to Data Mining, Wiley, 2005.
“Data Mining is Easy to Do Badly”
• Black box software
– Powerful, “easy-to-use” data mining algorithms
– Their power and ease of use make misuse dangerous.
– Too easy to point and click your way to disaster.
• What is needed:
– An understanding of the underlying
algorithmic and statistical model
structures.
– An understanding of which algorithms
are most appropriate in which
situations and for which types of data.
CRISP-DM: Cross-Industry Standard Process
for Data Mining
CRISP: DM as a Process
1. Business / Research Understanding Phase: Enunciate your objectives
2. Data Understanding Phase: EDA
3. Data Preparation Phase: Preprocessing
4. Modeling Phase: Fun and interesting!
5. Evaluation Phase: Confluence of results? Objectives met?
6. Deployment Phase: Use results to solve problem.
If desired: Use lessons learned to reformulate business / research objective.
What About Data Dredging?
Data Dredging
“A sufficiently exhaustive search will
certainly throw up patterns of some kind.
Many of these patterns will simply be a
product of random fluctuations, and will
not represent any underlying structure.”

– David J. Hand, “Data Mining: Statistics and More?”, The American Statistician, May 1998.
Guarding Against Data Dredging:
Cross-Validation is the Key
• Partition the data into a training set and a test set.
• If the pattern shows up in both data sets, the probability that it represents noise decreases.
• More generally, one may use n-fold cross-validation.
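The partitioning idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `k_fold_indices`, the seed, and the toy record count are hypothetical and are not taken from the case study.

```python
import random

def k_fold_indices(n_records, k, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = list(range(n_records))
    random.Random(seed).shuffle(idx)          # random assignment of records
    folds = [idx[i::k] for i in range(k)]     # k nearly equal-sized folds
    for i in range(k):
        test = folds[i]                       # fold i is held out
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test

# Toy usage: 10 records, 5 folds (hypothetical sizes).
splits = list(k_fold_indices(n_records=10, k=5))
for train, test in splits:
    print(sorted(test), "held out;", len(train), "records used for training")
```

A pattern that reappears on every held-out fold is less likely to be a product of random fluctuation than one found on a single pass over all the data.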
Inference and Huge Data Sets
• Hypothesis testing becomes sensitive at the huge
sample sizes prevalent in data mining applications.
– Even very tiny effects will be found significant.
– So, data mining tends to de-emphasize inference
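A small numerical sketch of this point, using a standard two-proportion z-test. The response rates (16.6% vs. 16.5%) and the sample sizes are made-up illustrative values, not figures from the case study.

```python
import math

def two_proportion_z_test(p1, p2, n1, n2):
    """Return (z, two-sided p-value) for H0: the two proportions are equal."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail area
    return z, p_value

# Same tiny difference in response rate, very different sample sizes.
small = two_proportion_z_test(0.166, 0.165, 1_000, 1_000)
huge = two_proportion_z_test(0.166, 0.165, 10_000_000, 10_000_000)
print("n = 1,000 per group:      z = %.2f, p = %.3g" % small)
print("n = 10,000,000 per group: z = %.2f, p = %.3g" % huge)
```

The effect size is identical in both calls; only the sample size changes, and with it the verdict of the test.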
Need for Transparency and Interpretability
• Data mining models should be transparent
– Results should be interpretable by humans
• Decision Trees are transparent
• Neural Networks tend to be opaque
• If a customer complains about why he/she was turned down for credit, we should be able to explain why, without saying “Our neural net said so.”
Part Two:
Modeling Response to Direct Mail Marketing
Business Understanding Phase:
– Clothing Store Purchase Data
• Results of a direct mail marketing
campaign
• Task: Construct a classification model
– For classifying customers as either
responders or non-responders to the
marketing campaign,
– To reduce costs and increase return on investment
Data Understanding:
The Clothing Store dataset
List of fields in the dataset (28,799 customers, 51 fields)
• Customer ID: Unique, encrypted customer identification
• Number of days the customer has been on file
• Product uniformity (Low score = diverse spending patterns)
• Zip Code
• Lifetime average time between visits
• Number of purchase visits
• Total net sales
• Average amount spent per visit
• Amount spent at each of four different franchises (four variables)
• Amount spent in the past month, the past three months, and the past six months
• Amount spent the same period last year
• Gross margin percentage
• Number of marketing promotions on file
• Number of days between purchases
• Markdown percentage on customer purchases
• Number of different product classes purchased
• Number of coupons used by the customer
• Total number of individual items purchased by the customer
• Number of stores the customer shopped at
• Number of promotions mailed in the past year
• Number of promotions responded to in the past year
• Promotion response rate for the past year
• Microvision® Lifestyle Cluster Type
• Percent of Returns
• Flag: Credit card user
• Flag: Valid phone number on file
• Flag: Web shopper
• 15 variables providing the percentages spent by the customer on specific classes of clothing, including sweaters, knit tops, knit dresses, blouses, jackets, career pants, casual pants, shirts, dresses, suits, outerwear, jewelry, fashion, legwear, and the collectibles line.
• Also a variable showing the brand of choice (encrypted).
• Target variable: Response to promotion
Data Preparation and EDA Phase
• Not covered in this presentation.
Modeling Strategy
• Apply principal components analysis to address multicollinearity.
• Apply cluster analysis. Briefly profile clusters.
• Balance the training data set.
• Establish baseline model performance
– In terms of expected profit per customer contacted.
• Apply classification algorithms to training data set:
– CART
– C5.0 (C4.5)
– Neural networks
– Logistic regression.
Modeling Strategy continued
• Evaluate each model using test data set.
• Apply misclassification costs in line with cost benefit table.
• Apply overbalancing as a surrogate for misclassification costs.
– Find best overbalancing proportion.
• Combine predictions from four models
– Using model voting.
– Using mean response probabilities.
Principal Components Analysis (PCA)
• Multicollinearity does not degrade prediction accuracy.
– But muddles individual predictor coefficients.
• Interested in predictor characteristics, customer
profiling, etc?
– Then PCA is required.
• But, if interested solely in classification (prediction,
estimation),
– PCA not strictly required.
Report Two Model Sets:
• Model Set A:
– Includes principal components
– All purpose model set
• Model Set B:
– Includes correlated predictors, not principal
components
– Use restricted to classification
Principal Components Analysis (PCA)
• Seven correlated variables.
– Two components extracted
– Account for 87% of variability
Principal Components Analysis (PCA)
• Principal Component 1:
– Purchasing Habits
– Customer general purchasing habits
– Expect component to be strongly indicative of
response
• Principal Component 2:
– Promotion Contacts
– Unclear whether component will be associated with
response
• Components validated by test data set
BIRCH Clustering Algorithm
• Requires only one pass through data set
– Scalable for large data sets
• Benefit: Analyst need not pre-specify number of clusters
• Drawback: Sensitive to initial records encountered
– Leads to widely variable cluster solutions
• Requires “outer loop” to find consistent cluster solution
• Zhang, Ramakrishnan and Livny, BIRCH: A New Data Clustering Algorithm
and Its Applications, Data Mining and Knowledge Discovery 1, 1997.
BIRCH Clusters
• Cluster 3 shows:
– Higher response for flag predictors
– Higher averages for numeric
predictors
Variable                                    Cluster 1   Cluster 2   Cluster 3
z ln Purchase Visits                        –0.575      –0.570      1.011
z ln Total Net Sales                        –0.177      –0.804      0.971
z sqrt Spending Last One Month              –0.279      –0.314      0.523
z ln Lifetime Average Time Between Visits   0.455       0.484       –0.835
z ln Product Uniformity                     0.493       0.447       –0.834
z sqrt # Promotion Responses in Past Year   –0.480      –0.573      0.950
z sqrt Spending on Sweaters                 –0.486      0.261       0.116
BIRCH Clusters
• Cluster 3 has highest
response rate (red).
– Cluster 1: 7.6%
– Cluster 2: 7.1%
– Cluster 3: 33.0%
Balancing the Data
• For “rare” classes, balancing provides a more equitable distribution.
• Drawback: Loss of data:
– Here, 40% of non-responders randomly omitted
– All responders retained
– Proportion of responders increases from 16.58% to 24.76%
• Test data set should never be balanced
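The arithmetic behind balancing can be checked with a short sketch. The function name is hypothetical; the 16.58% starting share and the 40% omission rate are the figures quoted above.

```python
def responder_share_after_balancing(responder_share, fraction_nonresponders_dropped):
    """Expected responder share when all responders are kept and a fixed
    fraction of non-responders is discarded."""
    responders = responder_share
    nonresponders_kept = (1 - responder_share) * (1 - fraction_nonresponders_dropped)
    return responders / (responders + nonresponders_kept)

expected = responder_share_after_balancing(0.1658, 0.40)
print(f"Expected responder share after balancing: {expected:.2%}")
```

The expected share comes out slightly above the reported 24.76%. A gap of that size is plausible if the non-responders were omitted by random selection rather than by removing exactly 40% of them, though the slides do not say so.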
False Positive vs. False Negative:
Which is Worse?
• For direct mail marketing, a false negative error is probably worse than a false positive.
• Generate misclassification costs based on the observed data.
– Construct cost-benefit table
Decision Cost / Benefit Analysis
Outcome          Classified   Actual   Cost      Rationale
True Negative    No           No       $0        No contact made; no revenue lost
True Positive    Yes          Yes      -$26.40   (Anticipated revenue) – (Cost of contact)
False Negative   No           Yes      $28.40    Loss of anticipated revenue
False Positive   Yes          No       $2.00     Cost of contact
Establish Baseline Model Performance
• Benchmarks
– “Don’t Send a Marketing Promotion to Anyone” Model
– “Send a Marketing Promotion to Everyone” Model
• Will compare candidate models against this baseline error
rate.
Model                  TN Cost  TP Cost  FN Cost  FP Cost  Overall     Overall Cost
                       $0       -$26.40  $28.40   $2.00    Error Rate
“Don’t Send Anyone”    5908     0        1151     0        16.3%       $32,688.40 ($4.63 per customer)
“Send to Everyone”     0        1151     0        5908     83.7%       -$18,570.40 (-$2.63 per customer)
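The baseline figures follow directly from the cost-benefit table. A minimal sketch of the calculation, with a hypothetical helper `cost_summary` and the unit costs and counts taken from the slides:

```python
# Unit costs from the decision cost / benefit table (negative cost = profit).
COSTS = {"TN": 0.00, "TP": -26.40, "FN": 28.40, "FP": 2.00}

def cost_summary(tn, tp, fn, fp, costs=COSTS):
    """Return (overall cost, cost per customer, overall error rate)."""
    n = tn + tp + fn + fp
    total = (tn * costs["TN"] + tp * costs["TP"]
             + fn * costs["FN"] + fp * costs["FP"])
    error_rate = (fn + fp) / n
    return total, total / n, error_rate

dont_send = cost_summary(tn=5908, tp=0, fn=1151, fp=0)
send_all = cost_summary(tn=0, tp=1151, fn=0, fp=5908)
for name, (total, per_customer, err) in [("Don't send anyone", dont_send),
                                         ("Send to everyone", send_all)]:
    print(f"{name}: total {total:,.2f}, per customer {per_customer:.2f}, error {err:.1%}")
```

The same function can be applied to the TN, TP, FN, and FP counts of any candidate model to compare it with the “Send to Everyone” benchmark.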
Model Set A
• No model beats benchmark of $2.63 profit per customer
• Misclassification costs had not been applied
(With 50% Balancing)
Model                 TN Cost  TP Cost  FN Cost       FP Cost        Overall     Overall Cost
                      $0       -$26.40  $28.40        $2.00          Error Rate  per Customer
Neural Network        4694     672      479 (9.3%)    1214 (64.4%)   24.0%       -$0.24
CART                  4348     829      322 (6.9%)    1560 (65.3%)   26.7%       -$1.36
C5.0                  4465     782      369 (7.6%)    1443 (64.9%)   25.7%       -$1.03
Logistic Regression   4293     872      279 (6.1%)    1615 (64.9%)   26.8%       -$1.68
• Now define FN cost = $28.40, FP cost = $2
– Outperformed baseline “Send to everyone” model
Model                 TN Cost  TP Cost  FN Cost       FP Cost        Overall     Overall Cost
                      $0       -$26.40  $28.40        $2.00          Error Rate  per Customer
CART                  754      1147     4 (0.5%)      5154 (81.8%)   73.1%       -$2.81
C5.0                  858      1143     8 (0.9%)      5050 (81.5%)   71.7%       -$2.81
Model Set A:
Effect of Misclassification Costs
• For the 447 highlighted records:
– Only 20.8% responded.
– But model predicts positive response.
– Due to high false negative misclassification cost.
Model Set A:
PCA Component 1 is Best Predictor
• First principal component ($F-PCA-1),
Purchasing Habits, represents both the
root node split and the secondary split
– Most important factor for predicting response
Over-Balancing as a Surrogate for
Misclassification Costs
• Software limitation:
• Neural network and logistic regression models in
Clementine:
– Lack methods for applying misclassification costs
• Over-balancing is an alternate method which can achieve similar results
• Starves the classifier of instances of non-response
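One way to picture over-balancing is as a calculation of how many non-responders to discard. The sketch below assumes that all responders are kept and that a split such as “80% - 20%” means 80% responders, which matches the “No Balancing (16.3% - 83.7%)” label used in the results. The function name is hypothetical, and this is not a description of how Clementine implements balancing.

```python
def nonresponder_keep_fraction(base_responder_share, target_responder_share):
    """Fraction of non-responders to retain so that responders make up the
    target share of the training set (all responders are kept)."""
    r, t = base_responder_share, target_responder_share
    keep = r * (1 - t) / (t * (1 - r))
    if keep > 1:
        raise ValueError("target share is below the base share; nothing to drop")
    return keep

base = 0.1658  # responder share of the unbalanced training data
keep_by_target = {t: nonresponder_keep_fraction(base, t) for t in (0.50, 0.65, 0.80, 0.90)}
for target, keep in keep_by_target.items():
    print(f"{target:.0%} responders: keep {keep:.1%} of non-responders")
```

The higher the target share of responders, the fewer non-response records the classifier sees, which pushes it toward predicting response, much as a high false negative cost would.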
Over-Balancing as a Surrogate for
Misclassification Costs
• Neural network model results
– Three over-balanced models outperform baseline
• Properly applied, over-balancing can be used as a
surrogate for misclassification costs
Model                          TN Cost  TP Cost  FN Cost        FP Cost        Overall     Overall Cost
                               $0       -$26.40  $28.40         $2.00          Error Rate  per Customer
No Balancing (16.3% - 83.7%)   5865     124      1027 (14.9%)   43 (25.7%)     15.2%       +$3.68
50% - 50% Balancing            4694     672      479 (9.3%)     1214 (64.4%)   24.0%       -$0.24
65% - 35% Over-Balancing       1918     1092     59 (3.0%)      3990 (78.5%)   57.4%       -$2.72
80% - 20% Over-Balancing       1032     1129     22 (2.1%)      4876 (81.2%)   69.4%       -$2.75
90% - 10% Over-Balancing       592      1141     10 (1.7%)      5316 (82.3%)   75.4%       -$2.72
Over-Balancing as a Surrogate for
Misclassification Costs
• Apply 80% - 20% over-balancing to the
other models.
Model                 TN Cost  TP Cost  FN Cost       FP Cost        Overall     Overall Cost
                      $0       -$26.40  $28.40        $2.00          Error Rate  per Customer
Neural Network        885      1132     19 (2.1%)     5023 (81.6%)   71.4%       -$2.73
CART                  1724     1111     40 (2.3%)     4184 (79.0%)   59.8%       -$2.81
C5.0                  1467     1116     35 (2.3%)     4441 (79.9%)   63.4%       -$2.77
Logistic Regression   2389     1106     45 (1.8%)     3519 (76.1%)   50.5%       -$2.96
Combination Models: Voting
• Smooths out strengths and weaknesses of each model
– Each model supplies a prediction for each record
– Count the votes for each record
• Disadvantage of combination models:
– Lack of easy interpretability
• Four competing combination models…
Combination Models: Voting
Mail a Promotion only if:
• All four models predict response
– Protects against false positive
– All four classification algorithms must agree on a
positive prediction
• At least three models predict response
• At least two models predict response
• Any model predicts response
– Protects against false negatives
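The four voting rules differ only in the number of positive votes required. A minimal sketch, with hypothetical model names and a made-up set of predictions for a single customer record:

```python
def vote_to_mail(predictions, min_votes):
    """Mail a promotion if at least `min_votes` models predict response."""
    return sum(predictions) >= min_votes

# Hypothetical predictions for one record (True = predicts response).
record_votes = {"neural_net": True, "cart": True, "c50": False, "logistic": True}
votes = list(record_votes.values())

# min_votes = 4: all four agree; 3: at least three; 2: at least two; 1: any model.
decisions = {k: vote_to_mail(votes, k) for k in (4, 3, 2, 1)}
print(decisions)
```

Raising the required number of votes guards against false positives; lowering it guards against false negatives.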
Combination Models: Voting
• None beat the logistic regression model: $2.96 profit per customer
• Perhaps combination models will do better with Model Collection B…
Combination Model     TN Cost  TP Cost  FN Cost       FP Cost        Overall     Overall Cost
                      $0       -$26.40  $28.40        $2.00          Error Rate  per Customer
Mail a Promotion Only if All Four Models Predict Response
                      2772     1067     84 (2.9%)     3136 (74.6%)   45.6%       -$2.76
Mail a Promotion Only if Three or Four Models Predict Response
                      1936     1115     36 (1.8%)     3972 (78.1%)   56.8%       -$2.90
Mail a Promotion Only if At Least Two Models Predict Response
                      1207     1135     16 (1.3%)     4701 (80.6%)   66.8%       -$2.85
Mail a Promotion if Any Model Predicts Response
                      550      1148     3 (0.5%)      5358 (82.4%)   75.9%       -$2.76
Model Collection B: Non-PCA Models
• Models retain correlated variables
– Use restricted to prediction only
• Since the correlated variables are highly predictive
– Expect Collection B will outperform the PCA models
Model Collection B: CART and C5.0
• Using misclassification costs, and 50% balancing
• Both models outperform the best PCA model
Model                 TN Cost  TP Cost  FN Cost       FP Cost        Overall     Overall Cost
                      $0       -$26.40  $28.40        $2.00          Error Rate  per Customer
CART                  1645     1140     11 (0.7%)     4263 (78.9%)   60.5%       -$3.01
C5.0                  1562     1147     4 (0.3%)      4346 (79.1%)   61.6%       -$3.04
Model Collection B: Over-Balancing
• Apply over-balancing as a surrogate for misclassification costs for all models
• Best performance thus far.
Model                 TN Cost  TP Cost  FN Cost       FP Cost        Overall     Overall Cost
                      $0       -$26.40  $28.40        $2.00          Error Rate  per Customer
Neural Network        1301     1123     28 (2.1%)     4607 (80.4%)   65.7%       -$2.78
CART                  2780     1100     51 (1.8%)     3128 (74.0%)   45.0%       -$3.02
C5.0                  2640     1121     30 (1.1%)     3268 (74.5%)   46.7%       -$3.15
Logistic Regression   2853     1110     41 (1.4%)     3055 (73.3%)   43.9%       -$3.12
Combination Models: Voting
• Combine the four models via voting and 80%-20% overbalancing
• Synergy: Combination model outperforms any individual
model.
Combination Model     TN Cost  TP Cost  FN Cost       FP Cost        Overall     Overall Cost
                      $0       -$26.40  $28.40        $2.00          Error Rate  per Customer
Mail a Promotion Only if All Four Models Predict Response
                      3307     1065     86 (2.5%)     2601 (70.9%)   38.1%       -$2.90
Mail a Promotion Only if Three or Four Models Predict Response
                      2835     1111     40 (1.4%)     3073 (73.4%)   44.1%       -$3.12
Mail a Promotion Only if At Least Two Models Predict Response
                      2357     1133     18 (0.7%)     3551 (75.8%)   50.6%       -$3.16
Mail a Promotion if Any Model Predicts Response
                      1075     1145     6 (0.6%)      4833 (80.8%)   68.6%       -$2.89
Combining Models Using
Mean Response Probabilities
• Combine the confidences that each model
reports for its decisions
– Allows finer tuning of the decision space
• Derive a new variable:
– Mean Response Probability (MRP):
• Average of response confidences of the four
models.
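As a rough sketch, the mean response probability is a plain average that is then compared with a chosen partition value. The per-model probabilities below are invented for illustration, the function names are hypothetical, and the step that converts each model's reported confidence into a response probability is not shown because the slides do not spell it out.

```python
def mean_response_probability(probabilities):
    """Average the response probabilities supplied by the individual models."""
    return sum(probabilities) / len(probabilities)

def mail_promotion(probabilities, partition=0.5):
    """Mail only if the mean response probability reaches the partition value."""
    return mean_response_probability(probabilities) >= partition

# Hypothetical response probabilities from the four models for two customers.
customer_a = [0.90, 0.60, 0.40, 0.30]
customer_b = [0.60, 0.50, 0.40, 0.30]
print(mean_response_probability(customer_a), mail_promotion(customer_a))
print(mean_response_probability(customer_b), mail_promotion(customer_b))
```

Unlike voting, which allows only four cut points, the partition value can be moved in arbitrarily small steps, which is what permits the finer tuning of the decision space.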
Combining Models Using
Mean Response Probabilities
• Multi-modality due to the discontinuity of the
transformation used in derivation of MRP
Combining Models Using
Mean Response Probabilities
• Where shall we define response vs. non-response?
– Recall that FN is 14.2 times worse than FP
– Set partitions on the low side => fewer FN decisions are made
Combining Models Using
Mean Response Probabilities
• Optimal partition: near 50%.
• Mail a promotion to a prospective customer only if the mean response probability is at least 50%
• Best model in case study.
– MRP = 0.51
• $3.1744 profit
– “send to everyone”
• $2.63 profit
– 20.7% profit enhancement (54.44 cents)
Combination Model: Partition     TN Cost  TP Cost  FN Cost       FP Cost        Overall     Overall Cost
                                 $0       -$26.40  $28.40        $2.00          Error Rate  per Customer
MRP < 0.95 vs. MRP ≥ 0.95        5648     353      798 (12.4%)   260 (42.4%)    15.0%       +$1.96
MRP < 0.85 vs. MRP ≥ 0.85        3810     994      157 (4.0%)    2098 (67.8%)   31.9%       -$2.49
MRP < 0.65 vs. MRP ≥ 0.65        2995     1104     47 (1.5%)     2913 (72.5%)   41.9%       -$3.11
MRP < 0.54 vs. MRP ≥ 0.54        2796     1113     38 (1.3%)     3112 (73.7%)   44.6%       -$3.13
MRP < 0.52 vs. MRP ≥ 0.52        2738     1121     30 (1.1%)     3170 (73.9%)   45.3%       -$3.1736
MRP < 0.51 vs. MRP ≥ 0.51        2686     1123     28 (1.0%)     3222 (74.2%)   46.0%       -$3.1744
MRP < 0.50 vs. MRP ≥ 0.50        2625     1125     26 (1.0%)     3283 (74.5%)   46.9%       -$3.1726
MRP < 0.46 vs. MRP ≥ 0.46        2493     1129     22 (0.9%)     3415 (75.2%)   48.7%       -$3.166
MRP < 0.42 vs. MRP ≥ 0.42        2369     1133     18 (0.8%)     3539 (75.7%)   50.4%       -$3.162
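The search for the best partition can be replayed from the reported counts. The sketch below recomputes cost per customer for each partition value, using the unit costs from the cost-benefit table, and picks the cheapest. The row layout and function name are illustrative choices; the counts are those reported for the case study.

```python
COST_TP, COST_FN, COST_FP = -26.40, 28.40, 2.00   # true negatives cost $0

# (partition, TN, TP, FN, FP) for each MRP partition value
rows = [
    (0.95, 5648,  353, 798,  260),
    (0.85, 3810,  994, 157, 2098),
    (0.65, 2995, 1104,  47, 2913),
    (0.54, 2796, 1113,  38, 3112),
    (0.52, 2738, 1121,  30, 3170),
    (0.51, 2686, 1123,  28, 3222),
    (0.50, 2625, 1125,  26, 3283),
    (0.46, 2493, 1129,  22, 3415),
    (0.42, 2369, 1133,  18, 3539),
]

def cost_per_customer(tn, tp, fn, fp):
    """Overall cost divided by the number of customers (negative = profit)."""
    total = tp * COST_TP + fn * COST_FN + fp * COST_FP
    return total / (tn + tp + fn + fp)

best = min(rows, key=lambda r: cost_per_customer(*r[1:]))
print("Best partition:", best[0])
print("Cost per customer: %.4f" % cost_per_customer(*best[1:]))
```

The cost curve is very flat between 0.42 and 0.54, so the exact optimum matters less than staying in that neighborhood.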
Summary
• For more on this Case Study, see Data Mining
Methods and Models (Wiley, 2006)
• So, the best part about all this is:
– Data mining is fun!
– If you love to play with data, and you love to
construct and evaluate models, then data
mining is for you.