Intelligent Miner for Data
Applications Guide

Peter Cabena, Hyun Hee Choi, Il Soo Kim, Shuichi Otsuka,
Joerg Reinschmidt, Gary Saarenvirta

International Technical Support Organization
http://www.redbooks.ibm.com

SG24-5252-00
March 1999
Take Note!
Before using this information and the product it supports, be sure to read the general information in
Appendix A, “Special Notices” on page 137.
First Edition (March 1999)
This edition applies to Version 2, Release 1 of the Intelligent Miner for Data, Program Number 5801-AAR for use
with the AIX Operating System.
Comments may be addressed to:
IBM Corporation, International Technical Support Organization
Dept. QXXE Building 80-E2
650 Harry Road
San Jose, California 95120-6099
When you send information to IBM, you grant IBM a non-exclusive right to use or distribute the information in any
way it believes appropriate without incurring any obligation to you.
© Copyright International Business Machines Corporation 1999. All rights reserved.
Note to U.S. Government Users — Documentation related to restricted rights — Use, duplication or disclosure is
subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corp.
Contents

Figures
Tables
Preface
  The Team That Wrote This Redbook
  Comments Welcome
Chapter 1. Introduction
  1.1 Why Now?
    1.1.1 Changed Business Environment
    1.1.2 Drivers
    1.1.3 Enablers
  1.2 What Is Data Mining?
  1.3 Data Mining and Business Intelligence
    1.3.1 Where to from Here?
  1.4 Data Mining Applications
  1.5 Data Mining Techniques
    1.5.1 Predictive Modeling
    1.5.2 Database Segmentation
    1.5.3 Link Analysis
  1.6 General Approach to Data Mining
    1.6.1 Business Requirements Analysis
    1.6.2 Project Management
    1.6.3 Business Solution Design
    1.6.4 Data Mining Run
    1.6.5 Business Implementation Design
    1.6.6 Business Implementation
    1.6.7 Results Tracking
    1.6.8 Final Business Result Determination
    1.6.9 Business Result Analysis
Chapter 2. Introduction to the Intelligent Miner
  2.1 History
  2.2 Intended Customers
  2.3 What Is the Intelligent Miner?
  2.4 Data Mining with the Intelligent Miner
  2.5 Overview of the Intelligent Miner Components
    2.5.1 Intelligent Miner Architecture
    2.5.2 Intelligent Miner TaskGuides
    2.5.3 Mining and Statistics Functions
    2.5.4 Processing Functions
    2.5.5 Modes
Chapter 3. Case Study Framework
  3.1 Customer Relationship Management
  3.2 Case Studies
  3.3 Strategic Customer Segmentation
  3.4 Case Studies
Chapter 4. Customer Segmentation
  4.1 Executive Summary
  4.2 Business Requirements
  4.3 Data Mining Process
    4.3.1 Data Selection
    4.3.2 Data Preparation
    4.3.3 Data Mining
  4.4 Data Mining Results
    4.4.1 Cluster Details Analysis
    4.4.2 Cluster Characterization
    4.4.3 Cluster Profiling
    4.4.4 Decision Tree Characterization
  4.5 Business Implementation and Next Steps
Chapter 5. Cross-Selling Opportunity Identification
  5.1 Executive Summary
  5.2 Business Requirement
  5.3 Data Mining Process
    5.3.1 Cluster Selection
    5.3.2 Data Selection
    5.3.3 Data Preparation
    5.3.4 Product Association Analysis
  5.4 Data Mining Results
    5.4.1 Cluster Selection
    5.4.2 Association Rule Discovery
  5.5 Business Implementation and Next Steps
Chapter 6. Target Marketing Model to Support a Cross-Selling Campaign
  6.1 Executive Summary
  6.2 Business Requirements
  6.3 Data Mining Process
    6.3.1 Create Objective Variable
    6.3.2 Data Preparation
    6.3.3 Data Sampling for Training and Test
    6.3.4 Feature Selection
    6.3.5 Train and Test
    6.3.6 Select "Best Model"
    6.3.7 Perform Population Stability Tests on Application Universe
  6.4 Data Mining Results
    6.4.1 Decision Tree
    6.4.2 RBF
    6.4.3 Neural Network
  6.5 Business Implementation
Chapter 7. Attrition Model to Improve Customer Retention
  7.1 Executive Summary
  7.2 Business Requirement
  7.3 Data Mining Process
    7.3.1 Data Definition
    7.3.2 Data Preparation
    7.3.3 Data Mining
    7.3.4 Gains Chart
    7.3.5 Clustering
  7.4 Data Mining Results
    7.4.1 Decision Tree
    7.4.2 RBF Modeling
    7.4.3 Neural Network
    7.4.4 Clustering
    7.4.5 Time-Series Prediction
  7.5 Business Implementation
Chapter 8. Intelligent Miner Advantages
Appendix A. Special Notices
Appendix B. Related Publications
  B.1 International Technical Support Organization Publications
  B.2 Redbooks on CD-ROMs
  B.3 Other Publications
How to Get ITSO Redbooks
  How IBM Employees Can Get ITSO Redbooks
  How Customers Can Get ITSO Redbooks
  IBM Redbook Order Form
Glossary
List of Abbreviations
Index
ITSO Redbook Evaluation
Figures

1.  New Customer Relationships Out of Reach
2.  Data Mining Positioning
3.  Data Mining and Business Intelligence
4.  Predictive Modeling
5.  Database Segmentation
6.  Pattern Matching
7.  The Data Mining Process
8.  The Intelligent Miner Architecture
9.  The Data Task Guide
10. Customer Segmentation Model
11. Data Mining Process: Customer Segmentation
12. Customer Transaction Data Model
13. Original Data Profile
14. Post-Discretized Data Profile
15. Post Logarithm Transformed Data Profile
16. Clustering Process Flow
17. Shareholder Value Demographic Clusters
18. Shareholder Value Neural Network Clusters
19. Shareholder Value Demographic Cluster Details
20. Cluster 6 Detailed View
21. Cluster 3 Detailed View
22. Cluster 5 Detailed View
23. Cluster 5 Tabulated Details
24. Cluster 1 Detailed View
25. Decision Tree Confusion Matrix
26. Decision Tree Model
27. Data Mining Process: Cross-Selling Opportunity
28. Typical Transaction Record
29. Product Association Analysis Workflow
30. Parameter Settings for Associations
31. Associations on Good Customer Set
32. Associations on Good Customer Set Detail
33. Associations for Good Customer Set: LIS Removed
34. Associations for Good Customer Set: LIS Removed, Detail
35. Associations on Okay Customer Set
36. Associations on Okay Customer Set Detail
37. Associations for Okay Customer Set: LIS Removed
38. Associations for Good Customer Set: LIS Removed, Summary
39. Associations for Good Customer Set: LIS Removed, Detail
40. Associations for Good Customer Set: LIS and Certain Products Removed, Summary
41. Associations for Good Customer Set: LIS and Certain Products Removed, Detail
42. Associations for Okay Customer Set: LIS and Certain Products Removed, Summary
43. Associations for Okay Customer Set: LIS and Certain Products Removed, Detail
44. Associations for All Transactions: LIS Removed, Summary
45. Associations for All Transactions: LIS Removed, Detail
46. Data Mining Process: Cross-Selling
47. Creating an Objective Variable
48. Cross Selling: Data Sampling
49. Detailed Predictive Modeling Process
50. Decision Tree Results: Isolating the Key Decision Criteria
51. Gains Chart for Decision Tree Results
52. RBF Results
53. Cross-Selling: Comparison of Three Predictive Models
54. Cross-Selling: ROI Analysis Figures
55. Reducing Defections 5% Boosts Profits 25% to 85%
56. Data Mining Process: Attrition Analysis
57. Attrition Analysis: Data Definition
58. Time Series: Setting the Parameters
59. Attrition Analysis: Decision Tree Structure
60. Decision Tree Gains Chart: Training and Testing
61. RBF: Results Window
62. Attrition Analysis: Predicting Values Result
63. Attrition Analysis: Predicting Values
64. Attrition Analysis: Comparative Gains Charts for All Methods
65. Attrition Analysis: Demographic Clustering of Likely Defectors
66. Profile of Time-Series Prediction
67. Time Profile of Defection Probability for Defectors
68. Time Profile of Defection Probability for Nondefectors
Tables

1. Customer Revenue by Cluster
2. Comparison of Neural and Demographic Clustering Results
3. Demographic Clustering Results: Percentage
4. Cross-Selling: Summary - Predictive Modeling More Than Doubles ROI
5. Cross-Selling: Baseline ROI Calculation
6. Cross-Selling: ROI Analysis Figures
Preface
This redbook is a step-by-step guide to data mining with Intelligent Miner
Version 2. It will help customers better understand the usability and the
business value of the product.
The focus is on helping the Intelligent Miner V2 user determine which algorithms
to use and how to effectively exploit them. The business utilized as a case study
in the book is a retail bank client of Loyalty Consulting, an IBM business partner
based in Toronto, Canada.
After a short introduction to data mining technology and Intelligent Miner V2, the
case study framework is described. The rest of the book covers each data
mining technique in detail and provides ideas on how to implement the
techniques.
Although no in-depth knowledge of the Intelligent Miner V2 is required, a basic
understanding of data mining technology is assumed.
The Team That Wrote This Redbook
This redbook was produced by a team of specialists from around the world
working at the International Technical Support Organization, San Jose Center.
Peter Cabena is a data warehouse and data mining specialist at IBM's
International Technical Support Organization - San Jose Center. He holds a
Bachelor of Science degree in computer science from Trinity College, Dublin,
Ireland. Peter has been extensively involved in the IBM data warehouse effort
since its inception in 1991. In recent years, he has taught and presented
internationally on the subjects of data warehousing and data mining.
Peter conceived and managed the project that produced this book.
Hyun Hee Choi is a data mining researcher at the Korea Software Development
Institute, a branch of IBM in Korea. She holds a Master of Science degree in
statistics from Korea University, Seoul, Korea, where she focused her research
on time-series analysis. Hyun Hee has several years of experience in data
mining and business intelligence consulting projects for airline, banking,
insurance, and credit card customer data analysis. She can be reached by
e-mail at [email protected].
Il Soo Kim is a Business Intelligence Solution Specialist at IBM Korea. He holds
a Master of Science degree in engineering from Seoul National University, Seoul,
Korea. Il Soo specializes in content management. Recently he has been
involved in constructing an in-house patent data warehouse and designing a
patent data analysis program.
Shuichi Otsuka works for the Business Intelligence Solution Center, IBM Japan.
He has been engaged for several years in data mining projects, mainly in
distribution industries. Shuichi and his colleagues have translated Data Mining
with Neural Networks by Joe Bigus into Japanese.
Joerg Reinschmidt is a data management and data mining specialist at IBM's
International Technical Support Organization, San Jose Center. He has been
engaged for several years in all data-management-related topics, such as
second-level support and technical marketing support. For the last several years, Joerg
has taught several technical classes on DB2, while focusing on DB2 and IMS
Internet connectivity.
Gary Saarenvirta is a principal consultant of Loyalty Consulting at The Loyalty
Group in Toronto, Canada. He has worked in the business intelligence industry
for more than eight years, providing data mining and data warehousing
consulting services for Global 2000 companies. Gary joined The Loyalty Group
to manage the design, construction, and operation of the company's data
warehouse. He played a key role in the development of Loyalty Consulting's
Decision Support business over the last few years.
Gary was the lead editor of this book and conceived the framework and data
mining methodology for each case study.
Thanks to the following people for their invaluable contributions to this project:
Hanspeter Nagel
International Technical Support Organization, San Jose Center
Susan Dahm
IBM Santa Teresa Laboratory
Ingrid Foerster
IBM Santa Teresa Laboratory
Comments Welcome
Your comments are important to us!
We want our redbooks to be as helpful as possible. Please send us your
comments about this or other redbooks in one of the following ways:
•  Fax the evaluation form found in "ITSO Redbook Evaluation" on page 155 to
   the fax number shown on the form.
•  Use the electronic evaluation form found on the Redbooks Web sites:
   For Internet users:       http://www.redbooks.ibm.com
   For IBM Intranet users:   http://w3.itso.ibm.com
•  Send us a note at the following address:
   [email protected]
Chapter 1. Introduction
Data mining is an interdisciplinary field bringing together techniques from
machine learning, pattern recognition, statistics, databases, and visualization to
address the issue of information extraction from large databases. The genesis
of the field came with the realization that traditional decision-support
methodologies, which combine simple statistical techniques with executive
information systems, do not scale to the point where they can deal with large
databases and data warehouses within the time limits imposed by today's
business environment. Data mining has captured the imagination of the
business and academic worlds, moving very quickly from a niche research
discipline in the mid-eighties to a flourishing field today. In fact, 80% of the
Fortune 500 companies are currently involved in a data mining pilot project or
have already deployed one or more data mining production systems.
1.1 Why Now?
Much of the current upsurge of interest in data mining arises from the
confluence of two forces: the need for data mining (drivers) and the means to
implement it (enablers). The drivers are primarily the business environment
changes that have resulted in an increasingly competitive marketplace. The
enablers are mostly recent technical advances in machine learning research
and database technology. This happy coincidence of growing commercial
pressures and major advances in research and information technology lends an
inevitable push toward a more advanced approach to informing critical business
decisions.
Before looking at these drivers and enablers in some detail, it is worth reviewing
the commercial backdrop against which these two forces are coming together.
1.1.1 Changed Business Environment
Today's business environment is in flux. Fundamental changes are influencing
the way organizations view their customers and plan their approach to them.
Among these changes are:
•  Customer behavior patterns
   Consumers are becoming more demanding and have access to better
   information through buyers' guides, catalogs, and the Web. New
   demographics are emerging: Only 15% of U.S. families are now traditional
   single-earner units, that is, a married couple with or without children where
   only the husband works outside the home. Many consumers are reportedly
   confused by too many choices and are starting to limit the number of
   businesses with which they are prepared to deal. They are starting to put
   more value on the time they spend shopping for goods and services.
•  Market saturation
   Many markets have become saturated. For example, in the United States
   almost everyone uses a bank account, has at least one credit card, has
   some form of automobile and property insurance, and has well-established
   purchasing patterns in basic food items. Thus, in these areas, few options
   are available to organizations wanting to expand their market share. If a
   merger or takeover is not possible, such organizations often must resort to
   effectively stealing customers from competitors, frequently by what is called
   predatory pricing. Lowering prices is not a sound long-term strategy,
   however, as only one supplier can be the lowest-cost provider.
•  New niche markets
   New, untapped markets are opening up. Examples are the handicapped and
   ethnic groups or the current U.S. inner-city hip-hop culture. Also, highly
   specialized stores such as SunGlass Hut are emerging.
•  Increased commoditization
   Increased commoditization, where even many leading brand products and
   services are finding it increasingly difficult to differentiate themselves, has
   sent many suppliers in search of new distribution channels. Witness the
   increase in online service outlets, from catalogs to banking and insurance to
   Internet-based shopping malls.
•  Traditional marketing approaches under pressure
   Traditional mass marketing and even database marketing approaches are
   becoming ineffective, as customers are increasingly turning to more targeted
   channels. Customers are shopping in fewer stores and are expecting to do
   more one-stop shopping.
•  Time to market
   Time to market has become increasingly important. Witness the recent
   emergence and spectacular rise of Netscape Communications Corporation in
   the Web browser marketplace. With only a few months' lead over its rivals,
   Netscape captured an estimated 80% of the browser market within a year of
   establishment. This is the exception, of course; most companies operate by
   making small incremental changes to services or products to capture
   additional customers.
•  Shorter product life cycles
   Today products are brought to market quickly but often have a short life
   cycle. This phenomenon is currently exemplified by the personal computer
   and Internet industries, where new products and services are offered at
   arguably faster rates than at any other time in the history of computing. The
   result of these shortened life cycles is that providers have less time to turn a
   profit or to "milk" their products and services.
•  Increased competition and business risks
   Many of the above changes combine to create a significantly more
   competitive climate and a challenging risk management environment for
   many organizations. General trends like commoditization, globalization,
   deregulation, and the Internet make it increasingly difficult to keep track of
   competitive forces, both traditional and new. Equally, rapidly changing
   consumer trends inject new risks into doing business.
1.1.2 Drivers
Against this background, many organizations have been forced to reevaluate
their traditional approaches to doing business and have started to look for ways
to respond to changes in the business environment. The main requirements
driving this reevaluation are:
•  Focus on the customer
   The requirement here is to rejuvenate customer relationships with an
   emphasis on greater intimacy, collaboration, and one-to-one partnership. In
   turn, this requirement has forced organizations to ask new questions about
   their existing customers and potential customers, for example:
   −  Which general classes of customer do I have?
   −  How can I sell more to my existing customers?
   −  Is there a recognizable pattern whereby my customers acquire products
      or use services?
   −  Which of my customers will prove to be good, long-term valuable
      customers and which will not?
   −  Can I predict which of my customers are more likely to default on their
      payments or to defraud me?
•  Focus on the competition
   Organizations need to focus increasingly on competitive forces with a view to
   building up a modern armory of business weapons. Some of the approaches
   to building such an armory are:
   −  Prediction of potential strategies or major business plans by leading
      competitors
   −  Prediction of tactical movements by local competitors
   −  Discovery of subpopulations of existing customers that are especially
      vulnerable to competitive offers
•  Focus on the data asset
   Business and information technology (IT) managers are becoming
   increasingly aware that there is an information-driven opportunity to be
   seized. Many organizations are now beginning to view their accumulated
   data resources as a critical business asset.
   Some of the factors contributing to this growing awareness are:
   −  Growing evidence of exponential return on investment (ROI) numbers
      from industry watchers and consultants on the benefits of a modern,
      corporate, decision-making strategy based on data-driven techniques
      such as data warehousing. Data mining is a high-leverage business
      where even small improvements in the accuracy of business decisions
      can have huge benefits.
   −  Growing availability of data warehouses. As the data warehouse
      approach becomes more pervasive, early adopters are forced to
      leverage further value from their investments by pushing into new
      technology areas to maintain their competitive edge.
   −  Growing availability of success stories, both anecdotal and otherwise, in
      the popular trade press.
Figure 1 on page 4 summarizes the situation. The frustrated business executive
is attempting to grasp new opportunities such as better customer relationships
and improved services. He fails, however, given the combination of a rapidly
changing business environment and poor or outdated in-house technology
systems.
Figure 1. New Customer Relationships Out of Reach
1.1.3 Enablers
There is a set of enablers for data mining that, when combined with the driving
forces discussed above, substantially increases the momentum toward a revised
approach to business decision making:
•  Data flood
   Forty years of information technology have led to the storage of enormous
   amounts of data (measured in gigabytes and terabytes) on computer
   systems. A typical business trip today generates an automatic electronic
   audit trail of a traveler's habits and preferences in airline travel, car hire,
   credit card usage, reading material, mobile phone services, and perhaps
   Web sites.
   In addition, the increasing availability of demographic and psychographic
   data from syndicated providers, such as A.C. Nielsen and Acxiom in the
   United States, has provided data miners with a useful data source. The
   availability of such data is particularly important given the focus in data
   mining on consumer behavior, which is often driven by preferences and
   choices that are not visible in a single organization's database.
•  Growth of data warehousing
   The growth of data warehousing in organizations has led to a ready supply
   of the basic raw material for data mining: clean and well-documented
   databases. Early adopters of the warehousing approach are now poised to
   further capitalize on their investment. See "The Data Warehouse Connection"
   on page 18 for a detailed discussion of the integration of data warehouse
   and data mining approaches.
•  New information technology solutions
   More cost-effective IT solutions in terms of storage and processing ability
   have made large-scale data mining projects possible. This is particularly true
   of parallel technologies, as many of the data mining algorithms are parallel
   by nature. Furthermore, increasingly affordable desktop power has enabled
   the emergence of sophisticated visualization packages, which are a key
   weapon in the data mining armory.
•  New research in machine learning
   New algorithms from research centers and universities are being pressed
   into commercial service more quickly than ever. Emphasis on commercial
   applications has focused attention on better and more scalable algorithms,
   which are beginning to come to market through commercial products. This
   movement is supported by increasing contact and joint ventures between
   research centers and commercial industries around the world.
The net effect of the changed business environment is that decision making has
become much more complicated, problems have become more complex, and the
decision-making process less structured. Decision makers today need a set of
strategies and tools to address these fundamental changes.
1.2 What Is Data Mining?
It is difficult to make definitive statements about an evolving area, and data
mining is certainly an area in rapid evolution. However, we need a framework
within which to position and better understand the subject. Figure 2 shows a
general positioning of the components in a data mining environment.
Figure 2. Data Mining Positioning
Although there is no single definition of data mining that would meet with
universal approval, the following definition is generally acceptable:
Data Mining...
is the process of extracting previously unknown, valid, and actionable
information from large databases and then using the information to make
crucial business decisions.
The highlighted words in the definition lend insight into the essential nature of
data mining and help to explain the fundamental differences between it and the
traditional approaches to data analysis such as query and reporting and online
analytical processing (OLAP). In essence, data mining is distinguished by the
fact that it is aimed at the discovery of information, without a previously
formulated hypothesis.
First, the information discovered must have been previously unknown. Although
this sounds obvious, the real issue here is that it must be unlikely that the
information could have been hypothesized in advance; that is, the data miner is
looking for something that is not intuitive or, perhaps, even counterintuitive. The
further away the information is from being obvious, potentially the more value it
has. Data mining can uncover information that could not even have been
hypothesized with other approaches.
Second, the new information must be valid. This element of the definition relates
to the problem of overoptimism in data mining; that is, if data miners look hard
enough in a large collection of data, they are bound to find something of interest
sooner or later. For example, the potential number of associations between
items in customers' shopping baskets rises exponentially with the number of
items. The possibility of spurious results applies to all data mining and
highlights the constant need for post-data-mining validation and sanity checking.
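To make the scale of this problem concrete: a store stocking just 1,000 distinct
items already offers 1,000 × 999 / 2 = 499,500 possible item pairs, and the
number of possible itemsets of all sizes is 2 to the power of 1,000. A search
over a space that large is bound to surface some purely coincidental patterns,
which is why the validation step matters.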
Third, and most critically, the new information must be actionable; that is, it
must be possible to translate it into some business advantage. Consider the
classic example of the retail store manager who, using data mining, discovered
a strong association between the sales of diapers and beer on Friday evenings.
He could clearly leverage the results of the analysis by placing the beer and
diapers closer together in the store or by ensuring that the two items were not
discounted at the same time. In many cases, however, the
actionable criterion is not so simple.
The ability to use the mined data to inform crucial business decisions is another
critical environmental condition for successful commercial data mining and
underpins data mining's strong association with and applicability to business
problems. Needless to say, an organization must have the necessary political
will to carry out the action implied by the mining.
1.3 Data Mining and Business Intelligence
We use business intelligence as a global term for all the processes, techniques,
and tools that support business decision-making based on information
technology. The approaches can range from a simple spreadsheet to a major
competitive intelligence undertaking. Data mining is an important new
component of business intelligence. Figure 3 on page 7 shows the logical
positioning of different business intelligence technologies according to their
potential value as a basis for tactical and strategic business decisions.
In general, the value of the information to support decision-making increases
from the bottom of the pyramid to the top. A decision based on data in the lower
layers, where there are typically millions of data records, will typically affect only
a single customer transaction. A decision based on the highly summarized data
in the upper layers is much more likely to be about company or department
initiatives or even major redirection. Therefore we generally also find different
types of users on the different layers. A database administrator works primarily
with databases on the data source and data warehouse level, whereas business
analysts and executives work primarily on the higher levels of the pyramid.
Note that Figure 3 portrays a logical positioning and not a physical
interdependence among the various technology layers. For example, data mining
can be based on data warehouses or flat files, and the presentation techniques
can, of course, be used outside data mining.
Figure 3. Data Mining and Business Intelligence
1.3.1 Where to from Here?
It is probably a little early to ponder the future of data mining, but some trends
on the horizon are already becoming clear.
Data mining technology trends are becoming established as we see vendors
scramble to position their tools and services within the new data mining
paradigm. This scramble will be followed by the inevitable technology shakeout
where some vendors will manage to establish leadership positions in the
provision of tools and services and others will simply follow. Doubtless, new data
mining algorithms will continue to be developed, but, over time, the technology
will begin to dissolve into the general backdrop of database and data
management technology. Already, we are seeing the merging of OLAP and
multidimensional database analysis (MDA) tools and the introduction of
structured query language (SQL) extensions for mining data directly from
relational databases.
On the data mining process side, there will be more open sharing of experiences
by the early adopters of data mining. Solid, verifiable success stories are
already beginning to appear. Over time, as more of the implementation details
of these successes emerge, knowledge of the data mining process will begin to
move out into the public domain.
The final phase in the evolution will be the integration of the data mining process
into the overall business intelligence machinery. In the long run, data mining,
like all truly great technologies, may simply become transparent!
1.4 Data Mining Applications
Large customers in mature, competitive industries can no longer establish
competitive advantage through transaction systems or business process
improvement. To distinguish their strategies and operations from those of
competitors, they must discover and extract strategic value from their
operational data. Companies generate enormous amounts of data during the
course of doing business, and Business Intelligence is the process of
transforming that data into knowledge.
Business Intelligence enables companies to make strategic marketing decisions
about which markets to enter and which products to promote, all in an effort to
increase profitability. Some customers use business intelligence for marketing
purposes; others use it to detect fraud. New marketing strategies and the
implementation of fraud detection can also reduce operating costs through
effective financial analysis, risk management, fraud management, distribution
and logistics management, and sales analysis.
Perhaps the best known application area for data mining is database marketing.
The objective is to drive targeted and therefore effective marketing and
promotional campaigns through the analysis of corporate databases. Data known
through credit card transactions or loyalty cards, for example, mixed with
publicly available information from sources such as lifestyle studies forms a
potent concoction. Data mining algorithms then sift through the data, looking for
clusters of "model" consumers who share the same characteristics such as
interests, income level, and spending habits. It is a win-win game for both the
consumers and marketers: Consumers perceive greater value in the (reduced)
number of advertising messages, and marketers save by limiting their
distribution costs and getting an improved response to the campaign.
Another application area for data mining is that of determining customer
purchasing patterns over time. Marketers can determine much about the
behavior of consumers, such as the sequence in which they take up financial
services as their family grows, or how they change their cars. Commonly the
conversion of a single bank account to a joint account indicates marriage, which
could lead to future opportunities to sell a mortgage, a loan for a honeymoon
vacation, life insurance, a home equity loan, or a loan to cover college fees. By
understanding these patterns, marketers can advertise just-in-time to these
consumers, thus ensuring that the message is focused and likely to draw a
response. In the long run, focusing on long-term customer purchasing patterns
provides a full appreciation of the lifetime value of customers, where the strategy
is to move away from share of market to share of customer. An average
supermarket customer is worth $200,000 over his or her lifetime, and General
Motors estimates that the lifetime value of an automobile customer is $400,000,
which includes car, service, and income on loan financing. Clearly,
understanding and cultivating long-term relationships bring commercial benefits.
Cross-selling campaigns constitute another application area where data mining
is widely used. Cross selling is where a retailer or service provider makes it
attractive for customers who buy one product or service to buy an associated
product or service.
1.5 Data Mining Techniques
Data mining techniques are specific implementations of the algorithms that are
used to carry out the data mining operations.
Predictive modeling, database segmentation, link analysis, and deviation
detection are the four major operations for implementing any of the business
applications. We deliberately do not show a fixed, one-to-one link between the
business applications and data mining layers, to avoid the suggestion that only
certain operations are appropriate for certain applications and vice versa. (On
the contrary, truly breakthrough results can sometimes come from the use of
nonintuitive approaches to problems.) Nevertheless, certain well-established
links between the applications and the corresponding operations do exist. For
example, modern target marketing strategies are almost always implemented by
means of the database segmentation operation. However, fraud detection could
be implemented by any of the four operations, depending on the nature of the
problem and input data. Furthermore, the operations are not mutually exclusive.
For example, a common approach to customer retention is to segment the
database first and then apply predictive modeling to the resultant, more
homogeneous segments. Typically the data analyst, perhaps in conjunction with
the business analyst, selects the data mining operations to use.
Not all algorithms to implement a particular data mining operation are equal,
and each has its own strengths and weaknesses.
The key message is this: There is rarely one foolproof technique for any given
operation or application, and the success of the data mining exercise relies
critically on the experience and intuition of the data analyst.
In the sections that follow we discuss in detail the operations associated with
data mining.
1.5.1 Predictive Modeling
Predictive modeling is akin to the human learning experience, where we use
observations to form a model of the essential, underlying characteristics of some
phenomenon. For example, in its early years, a young child observes several
different examples of dogs and can then later in life use the essential
characteristics of dogs to accurately identify (classify) new animals as dogs.
This predictive ability is critical in that it helps us to make sound generalizations
about the world around us and to fit new information into a general framework.
In data mining, we use a predictive model to analyze an existing database to
determine some essential characteristics about the data. Of course, the data
must include complete, valid observations from which the model can learn how
to make accurate predictions. The model must be told the correct answer to
some already solved cases before it can start to make up its own mind about
new observations. When an algorithm works in this way, the approach is called
supervised learning. Physically, the model can be a set of IF THEN rules in some
proprietary format, a block of SQL, or a segment of C source code.
Figure 4 illustrates the predictive modeling approach. Here a service company,
for example an insurance company, is interested in understanding the increasing
rates of customer attrition. A predictive model has determined that only two
variables are of interest: the length of time the client has been with the company
(Tenure), and the number of the company's services that the client uses
(Services). The decision tree presents the analysis in an intuitive way. Clearly,
those customers who have been with the company less than 2.5 years and use
only one or two services are the most likely to leave.
Figure 4. Predictive Modeling
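As a small illustration of this point (ours, not a feature of any product), the
tree in Figure 4 could be expressed as a few lines of rule code; the attribute
names and the 2.5-year and two-service thresholds come straight from the
description above:

    def classify_customer(tenure_years, services_used):
        """Apply the decision-tree rules of Figure 4: short-tenure customers
        who use few services are the most likely to leave."""
        if tenure_years < 2.5 and services_used <= 2:
            return "LEAVE"
        return "STAY"

    print(classify_customer(1.5, 1))   # LEAVE
    print(classify_customer(4.0, 3))   # STAY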
Models are developed in two phases: training and testing. Training refers to
building a new model by using historical data, and testing refers to trying out the
model on new, previously unseen data to determine its accuracy and physical
performance characteristics. Training is typically done on a large proportion of
the total data available, whereas testing is done on some small percentage of
the data that has been held out exclusively for this purpose.
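A minimal sketch of such a holdout split, assuming the historical records are
held in a simple Python list; the 70/30 proportion is an illustrative choice,
not a product default:

    import random

    def split_train_test(records, test_fraction=0.3, seed=1):
        """Shuffle the records, train on the larger part, and hold the
        rest out exclusively for testing."""
        rng = random.Random(seed)              # fixed seed: repeatable split
        shuffled = list(records)
        rng.shuffle(shuffled)
        cut = int(len(shuffled) * (1.0 - test_fraction))
        return shuffled[:cut], shuffled[cut:]

    training_set, test_set = split_train_test(range(100))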
The predictive modeling approach has broad applicability across many
industries. Typical business applications that it supports are customer retention
management, credit approval, cross selling, and target marketing.
There are two specializations of predictive modeling: classification and value
prediction. Although both have the same basic objective, namely, to make an
educated guess about some variable of interest, they can be distinguished by the
nature of the variable being predicted.
With classification, a predictive model is used to establish a specific class for
each record in a database. The class must be one from a finite set of possible,
predetermined class values. The insurance example in Figure 4 is a case in
point. The variable of interest is the class of customer, and it has two possible
values: STAY and LEAVE.
With value prediction, a predictive model is used to estimate a continuous
numeric value that is associated with a database record. For example, a car
retailer may want to predict the lifetime value of a new customer. A mining run
on the historical data of present long-standing clients, including some
agreed-upon measure of their financial worth to date, produces a model that can
estimate the likely lifetime value of new customers.
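As an illustration of value prediction in its simplest form, a one-variable
least-squares fit can estimate a continuous target; this stands in for whatever
model a real mining run produces, and the figures are entirely hypothetical:

    def fit_line(xs, ys):
        """Least-squares fit of y = a + b*x over historical observations."""
        n = float(len(xs))
        mx, my = sum(xs) / n, sum(ys) / n
        b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
            sum((x - mx) ** 2 for x in xs)
        return my - b * mx, b

    # Hypothetical history: first-year revenue vs. eventual lifetime value.
    first_year = [500.0, 800.0, 1200.0, 2000.0]
    lifetime = [9000.0, 15000.0, 22000.0, 38000.0]
    a, b = fit_line(first_year, lifetime)
    print(a + b * 1000.0)   # estimated lifetime value of a new customer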
A specialization of value prediction is scoring, where the variable to be predicted
is a probability or propensity. Probability and propensity are similar in that they
are both indicators of likelihood. Both use an ordinal scale, that is, the higher
the number, the more likely it is that the predicted event will occur. Typical
applications are the prediction of the likelihood of fraud or the probability that a
customer will respond to a promotional mailing.
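A sketch of how such scores are typically put to work: rank the database by
propensity and act only on the top of the list. The model below is a
placeholder; a real score would come from a trained classifier:

    def rank_by_score(customers, propensity):
        """Sort customers by predicted propensity, highest first."""
        return sorted(customers, key=propensity, reverse=True)

    # Placeholder model: pretend propensity rises with account balance.
    customers = [{"id": 1, "balance": 500}, {"id": 2, "balance": 9000}]
    ranked = rank_by_score(customers, lambda c: c["balance"] / 10000.0)
    top_decile = ranked[:max(1, len(ranked) // 10)]   # e.g. mail only these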
1.5.2 Database Segmentation
The goal of database segmentation is to partition a database into segments of
similar records, that is, records that share a number of properties and so are
considered to be homogeneous. In some literature the words segmentation and
clustering are used interchangeably. Here, we use segmentation to describe the
data mining operation, and segments or clusters to describe the resulting groups
of data records. By definition, two records in different segments are different in
some way. The segments should have high internal (within segment)
homogeneity and high external (between segment) heterogeneity.
Database segmentation is typically done to discover homogeneous
subpopulations in a customer database to improve the accuracy of the profiles.
A subpopulation, which might be "wealthy, older males" or "urban, professional
females," can be targeted for specialized treatment. Equally, as databases grow
and are populated with diverse types of data, it is often necessary to partition
them into collections of related records to obtain a summary of each database or
before performing a data mining operation such as predictive modeling.
Figure 5 on page 12 shows a scatterplot of income and age from a sample
population. The population has been segmented into clusters (indicated by
circles) that represent significant subpopulations within the database. For
example, one cluster might be labeled "young, well-educated professionals" and
another, "older, highly paid managers."
The grid lines and shaded sectors on the plot illustrate the comparative
inefficiency of the traditional, slice-and-dice approach to the problem of database
segmentation. The overlaid areas do not account for the truly homogeneous
clusters because they either miss many of the cluster members or take in
extraneous cluster members—which will skew the results.
In contrast, the segmentation algorithm can segment a database without any
prompting from the user about the type of segments or even the number of
segments it is expected to find in the database. Thus, any element of human
bias or intuition is removed, and the true discovery nature of the mining can be
leveraged. When an algorithm works in this way, the approach is called
unsupervised learning.
Figure 5. Database Segmentation
Database segmentation can be accomplished by using either demographic or
neural clustering methods. The methods are distinguished by:
•
The data types of the input attributes that are allowed
•
The way in which they calculate the distance between records (that is, the
measure of similarity or difference between the records, which is the
essence of the segmentation operation)
•
The way in which they organize the resulting segments for analysis
Demographic clustering methods operate primarily on records with categorical
variables. They use a distance measurement technique based on the voting
principle called Condorcet, and the resulting segments are not prearranged on
output in any particular hierarchy.
Neural clustering methods are built on neural networks, typically by using
Kohonen feature maps. Neural networks accept only numeric input, but
categorical input is possible by first transforming the input variables into
quantitative variables. The distance measurement technique is based on
Euclidean distance, and the resulting segments are arranged in a hierarchy
where the most similar segments are placed closest together.
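The two families of methods therefore rest on different record-to-record
comparisons. A minimal sketch of each follows, assuming records are plain
Python sequences; the Intelligent Miner's actual Condorcet-based measure is
more elaborate than this simple matching count:

    import math

    def voting_similarity(rec_a, rec_b):
        """Fraction of categorical fields on which two records agree,
        in the spirit of the Condorcet voting principle."""
        agree = sum(1 for a, b in zip(rec_a, rec_b) if a == b)
        return agree / float(len(rec_a))

    def euclidean_distance(rec_a, rec_b):
        """Straight-line distance between two numeric records."""
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(rec_a, rec_b)))

    print(voting_similarity(["urban", "female", "renter"],
                            ["urban", "male", "renter"]))       # 0.666...
    print(euclidean_distance([35.0, 40000.0], [42.0, 52000.0]))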
Segmentation differs from other data mining techniques in that its objective is
generally far less precise than the objectives of predictive modeling or link
analysis. As a result, segmentation algorithms are sensitive to redundant and
irrelevant features. This sensitivity can be alleviated by directing the
segmentation algorithm to ignore a subset of the attributes that describe each
instance or by assigning a weight factor to each variable.
Segmentation supports such business applications as customer profiling or
target marketing, cross selling, and customer retention. Clearly, this operation
has broad, cross-industry applicability.
1.5.3 Link Analysis
In contrast to the predictive modeling and database segmentation operations,
which aim to characterize the contents of the database as a whole, the link
analysis operation seeks to establish links (associations) between individual
records, or sets of records, in the database. A classic application of this
operation is associations discovery, that is, discovering the associations between
the products or services that customers tend to purchase together or in a
sequence over time. Other examples of business applications that link analysis
supports are cross selling, target marketing, and stock price movement analysis.
There are three specializations of link analysis: associations discovery,
sequential pattern discovery, and similar time sequence discovery. The
differences among the three are best illustrated by some examples. If we define
a transaction as a set of goods purchased in one visit to a shop, associations
discovery can be used to analyze the goods purchased within the transaction to
reveal hidden affinities among the products, that is, which products tend to sell
well together. This type of analysis is called market basket analysis (MBA) or
product affinity analysis.
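A minimal sketch of the counting that underlies market basket analysis,
computing the support and confidence of a single candidate rule; the baskets
echo the diapers-and-beer example given earlier in this chapter:

    def rule_stats(baskets, antecedent, consequent):
        """Support and confidence of the rule: antecedent => consequent."""
        both = sum(1 for b in baskets if antecedent <= b and consequent <= b)
        ante = sum(1 for b in baskets if antecedent <= b)
        support = both / float(len(baskets))
        confidence = both / float(ante) if ante else 0.0
        return support, confidence

    baskets = [{"diapers", "beer"}, {"diapers", "beer", "chips"},
               {"milk", "bread"}, {"diapers", "milk"}]
    print(rule_stats(baskets, {"diapers"}, {"beer"}))   # support 0.5, confidence 0.666...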
Sequential pattern discovery is used to identify associations across related
purchase transactions over time that reveal information about the sequence in
which consumers purchase goods and services. It aims to understand long-term
customer buying behavior and thus leverage this new information through timely
promotions.
Similar time sequence discovery, the discovery of links between two sets of data
that are time dependent, is based on the degree of similarity between the
patterns that both time series demonstrate. Retailers would use this approach
when they want to see whether a product with a particular pattern of sales over
time matches the sales curve of other products, even if the pattern match is
lagging some time behind. Figure 6 shows an example of three apparently
unrelated patterns that could represent sales histories or even stock movements
over time. At first glance the graphs appear not to be related in any significant
way. However, on closer examination, definite patterns can be identified, which,
when translated into business terms, can be exploited for commercial gain.
Figure 6. Pattern Matching
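One way to quantify such a lagged match, as a sketch: slide one series against
the other and keep the lag at which their correlation peaks. A production
similar-sequence search would also normalize scale and handle gaps:

    def correlation(xs, ys):
        """Pearson correlation of two equal-length sequences."""
        n = float(len(xs))
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = sum((x - mx) ** 2 for x in xs) ** 0.5
        sy = sum((y - my) ** 2 for y in ys) ** 0.5
        return cov / (sx * sy) if sx and sy else 0.0

    def best_lag(leader, follower, max_lag=12):
        """Find the shift (in periods) at which 'follower' best tracks 'leader'."""
        best = (0, -1.0)
        for lag in range(max_lag + 1):
            a, b = leader[:len(leader) - lag], follower[lag:]
            m = min(len(a), len(b))
            if m < 2:
                break
            r = correlation(a[:m], b[:m])
            if r > best[1]:
                best = (lag, r)
        return best   # e.g. best_lag(sales_a, sales_b) might return (3, 0.92)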
1.6 General Approach to Data Mining
Figure 7 depicts a general data mining process that fits into an overall business
process. In this section we briefly describe most of the actions to be performed
within the data mining process.
Figure 7. The Data Mining Process
1.6.1 Business Requirements Analysis
The first part of any data mining project is to understand the client′s business
requirements. The business requirements that form the project objectives
should be clearly presented and understood by all members of the project team.
The data mining process is driven by the client′s business requirements.
• Economics of the business problem
In order to turn the data mining results into actionable business results, it is important to understand the economics and/or other drivers of the client's business requirements. The data mining activities undertaken must adhere to and improve the economics and/or other drivers of the requirements.
• Review of current methods used
An understanding of the current methods used and the current business performance of the methods is required to ensure that the application of new technologies and methods adds incremental value beyond the status quo. The status quo performance is the minimum performance required of any new methods.
• Expected performance of a new method
The expected improvement over status quo methods should be presented by the client to ensure that the project team has clear objectives. It is also important to set attainable expectations of results.
1.6.2 Project Management
The second part of a data mining project is to define the scope of the project and
the team to run the project.
• Project team identification
A cross-functional project management team, including representation from all parties, is defined to ensure that all project issues are appropriately discussed and resolved.
• Project plan design
The first task of the project management team is to agree on a project plan that includes identification of all project tasks, project task resourcing, project task scheduling, and project task estimation.
• Project objectives
At the outset of the project, a clear set of objectives must be defined to maintain project focus and help resolve project issues. A project without clear objectives has a high probability of not being completed on time and with positive results.
• Project evaluation criteria
The client must present criteria that will be used to evaluate the success of the project. The project management team should modify the evaluation criteria as required to set an appropriate expectation of success and achieve an objective evaluation.
1.6.3 Business Solution Design
Before the actual data mining phase of a project, a business solution must be
designed. The solution should define the detailed data mining tasks and can be
illustrated as a flow diagram.
1.6.4 Data Mining Run
As illustrated in Figure 7 on page 14, the data mining action is iterative and
consists of these steps:
• Data selection
This step involves identifying and selecting all relevant data that can be used for data mining. The business requirements define the data selection; the data requirements activity defined earlier in the project maps directly to this mining step.
• Data preparation
Data preparation is a substantial portion of a data mining project. It involves the treatment of missing values and outliers and the creation of new variables based on data transformations. Each data mining algorithm has different data preparation requirements. Data preparation could also include data reduction, which is bounded by the maximum number of variables that an algorithm can effectively utilize, and data sampling. The sampling requirements are driven by the different data mining algorithms.
• Data mining
Data mining involves the execution of the various data mining algorithms against the prepared data sets. Many (tens to hundreds of) mining runs are completed for each data mining project. The effects of algorithm parameters and data transformations are systematically evaluated.
• Results analysis
Once a data model has been created and tested, its performance is analyzed. The analysis includes a description of all key variables and findings that the model permits. All modeling assumptions are outlined, and implementation issues are presented.
1.6.5 Business Implementation Design
Business implementation design involves designing the implementation of the
data mining results, with the goal of meeting the defined business requirements.
The design should support quality control, tracking of business results, and the
ability to prove the causal effect of the data mining result. The design must also
take into account any business implementation issues that are not part of the
data mining project. The business implementation design is experimental rather than fixed: it should be treated as a designed experiment whose outcome can be measured.
1.6.6 Business Implementation
The business implementation is the execution of the experimental design.
1.6.7 Results Tracking
If required by the client, preliminary business results can be tracked against the expected performance to ensure the success of the business implementation. Preliminary results can be used to modify the current business activity if warranted.
1.6.8 Final Business Result Determination
At the conclusion of any business activity, the profitability of the business implementation is analyzed in full. The performance of the model is also compared against its expected performance.
1.6.9 Business Result Analysis
The final business result should be analyzed to identify general learnings that
can be fed into future projects. Many companies are beginning to create
learning warehouses to store corporate knowledge.
Chapter 2. Introduction to the Intelligent Miner
The IBM Intelligent Miner for Data (IM in this book) is leading the way in helping
customers identify and extract high-value business intelligence from their data
assets. The process is one of discovery. Companies are empowered to leverage information hidden within enterprise data and discover associations, patterns, and trends; detect deviations; group and classify information; and develop predictive models.
2.1 History
IBM's award-winning Intelligent Miner was released in 1996. It enables users to
mine structured data stored in conventional databases or flat files. Customers
and partners have successfully deployed its mining algorithms to address such
business areas as market analysis, fraud and abuse, and customer relationship
management.
2.2 Intended Customers
The Intelligent Miner offerings are intended for use by data analysts and
business technologists in areas such as marketing, finance, product
management, and customer relationship management. In addition, the text
mining technologies have applicability to a wide range of users who regularly
review or research documents - for example, patent attorneys, corporate
librarians, public relations teams, researchers, and students.
2.3 What Is the Intelligent Miner?
The IBM Intelligent Miner is a suite of statistical, processing, and mining
functions that you can use to analyze large databases. It also provides
visualization tools for viewing and interpreting mining results. The server
software runs on AIX, AS/400, OS/390, and Sun Solaris operating systems. AIX,
OS/2, and Windows operating systems can be used for the clients.
Some of the features provided by the Intelligent Miner include:
• Extension of the associations, classification, clustering, and prediction functions
• Neural prediction
• Statistical functions
• Export and import of mining bases across operating systems
• Exploitation of DB2 Parallel Edition and DB2 Universal Database Enterprise Extended Edition
• Repeatable sequences
• API for all server platforms
The Intelligent Miner provides a complete graphical user interface with
TaskGuides that lead you through the steps of creating the different Intelligent
Miner objects. General help for each TaskGuide provides additional information,
examples, and valid values for the controls on each page.
In the sections that follow we introduce the data mining technology and the data
mining process of the Intelligent Miner. We also explain in general the
statistical, processing, and mining functions that Intelligent Miner provides.
2.4 Data Mining with the Intelligent Miner
Data mining is the process of discovering valid, previously unknown, and
ultimately comprehensible information from large stores of data. It can be used
to extract information to form a prediction or classification model, or to identify
similarities between database records. The resulting information can help you
make more informed decisions.
The Intelligent Miner helps organizations perform data mining tasks. For
example, a retail store might use the Intelligent Miner to identify groups of
customers that are most likely to respond to new products and services or to
identify new opportunities for cross selling. An insurance company might use the
Intelligent Miner with claims data to isolate likely fraud indicators.
2.5 Overview of the Intelligent Miner Components
In this section we provide a high-level overview of the product architecture. See
the Intelligent Miner Application Programming Interface and Utility Reference for
more detailed information about the architecture and the APIs for the Intelligent
Miner.
The Intelligent Miner links the mining and processing functions on the server
with the administrative and visualization tools on the client. The client
component includes a user interface from which you can invoke the mining and
processing functions on an Intelligent Miner server. The results of the mining
process can be returned to the client where you can visualize and analyze them.
The client components are available for AIX, OS/2, Windows NT, and Windows 95
operating systems. The server components are available for AIX, OS/390,
AS/400, and Sun Solaris systems. They are also available for RS/6000 SP and
exploit parallel mining on multiple processing nodes. You can have client and
server components on the same machine.
2.5.1 Intelligent Miner Architecture
Figure 8 on page 21 illustrates the client and server components of the
Intelligent Miner and the way they are related to one another:
Figure 8. The Intelligent Miner Architecture
User Interface: The user interface is a program that enables you to define data mining functions in a graphical environment. You can define preferences for the user interface that are stored on the client.
Environment Layer API: The environment layer API is a set of API functions that control the execution of mining runs and results. Sequences of functions and mining operations can be defined and executed by using the user interface through the environment layer API.
Data Definition: This feature of Intelligent Miner provides the ability to collect and prepare the data for the data mining process.
Visualizer: The Intelligent Miner provides a rich set of visualization tools. You can also use other visualization tools.
Data Access: The Intelligent Miner provides access to flat files, database tables, and database views.
Databases and Flat Files: The Intelligent Miner components work directly with data stored in a relational database or in flat files. The data is not copied to a special format. You define input and output data objects that are logical descriptions of the physical data. Therefore the physical location of the data can be changed without affecting objects that use the data; only the logical descriptions must be changed. The change might be as simple as changing a database name.
Processing Library: The processing library provides access to database functions such as bulk load of data and data transformation.
Mining Bases: Mining bases are collections of data mining objects used for a mining objective or business problem. Mining bases are stored on the server, which allows access from different clients.
Mining Kernels: Mining kernels provide the data mining and statistical functions.
Mining Results, Result API, and Export Tools: Mining results are the data resulting from running a mining or statistics function. These components allow you to visualize results at the client. Results can be exported for use by visualization tools.
2.5.2 Intelligent Miner TaskGuides
Data mining in the Intelligent Miner is accomplished through the creation of
interrelated objects. The objects are displayed as icons and represent the
collection of attributes or settings that define the data or function.
Working with the Intelligent Miner graphical user interface is fairly simple because Intelligent Miner offers TaskGuides. In this section we explain how to use a TaskGuide to create a settings object.
To create a settings object, use the Create menu or click on a settings object
icon in the task bar. A TaskGuide opens to guide you through the creation of the
object.
Each TaskGuide starts with a Welcome page that provides an overview of the
type of settings object that you are creating. Each TaskGuide page provides
step-by-step instructions for filling in the fields and making selections that define
the settings for the object. You can click on a highlighted term to see a short
definition of the term.
Click the Next button to navigate to the next TaskGuide page. The last page of
every TaskGuide summarizes the settings object that you created. Click the
Finish button to create the object.
Figure 9 on page 23 shows the TaskGuide for creating a data settings object.
Figure 9. The Data TaskGuide
You can have more than one TaskGuide open at a time. Thus you can leave a
TaskGuide to create another object that is required to complete the first
TaskGuide. For example, while you are in the process of defining a mining
function, you might have to define or modify an input data object. You can open
a Data TaskGuide to define an input data object, then continue with the Mining
TaskGuide.
2.5.3 Mining and Statistics Functions
Mining and statistics settings objects are similar in that they represent analytical
functions that are run against data. In both cases, you must indicate which data
settings object you want to use.
Mining and statistics settings objects produce a results object when run. You can
view and analyze the results object with visualization tools. You can also
indicate in the settings for these functions that you want to create output data in
addition to a results object.
The Intelligent Miner has many types of mining and statistics functions:
Mining:
Associations
Clustering − demographic
Clustering − neural
Sequential patterns
Time sequence
Classification − tree
Classification − neural
Prediction − Radial-Basis-Function
Prediction − neural

Statistics:
Cross-correlation
Correlation matrixes
Factor analysis
Linear regression
Principal component analysis
Univariate curve fitting
Bivariate statistics
2.5.4 Processing Functions
Processing functions are used to make data suitable for mining or analysis.
Processing settings objects apply only to database tables and views because
they take advantage of the processing capability of the database engine.
The Intelligent Miner has many processing functions:
Aggregate values
Calculate values
Clean up input data or output data
Convert to lowercase or uppercase
Copy records to file
Discard records with missing values
Discretization into quantiles
Discretization using ranges
Encode missing values
Encode nonvalid values
Filter fields
Filter records
Filter records using a value set
Get random sample
Group records
Join data sources
Map values
Pivot fields to records
Run SQL
Processing settings objects always read input from a database and create output
data in a database. The only exception is the Copy Records to File function,
which copies data to a file. When you create a processing settings object or
update an existing one, you can use a data settings object to identify input data
or output data. In this way the name of a database table or view is copied to the
processing settings object. Subsequent changes to the data settings object have
no effect on the processing settings object.
2.5.5 Modes
How results objects are used with Intelligent Miner depends on the mode in
which functions are run. Intelligent Miner provides the following modes under
which to perform the mining process:
Training: In training mode, a mining function builds a model on the basis of the selected input data.
Clustering: In clustering mode, the clustering functions build a model on the basis of the selected input data. Clustering mode is similar to training mode for the predictive algorithms. Clustering mode offers the choice of using background statistics from the input data or an input result.
Test: In test mode, a mining function uses new or the same data with known results to verify that the model created in training mode produces consistent results. Results objects are used for input and created as output.
Application: In application mode, a mining function uses a model created in training mode to predict the specified field for every record in the new input data. The data format must be identical to that used to generate the model.
For more information about how to work with the Intelligent Miner, see Using the
Intelligent Miner for Data, SH12-6325-01, the documentation shipped with the
product.
Chapter 3. Case Study Framework
Customer Relationship Management (CRM) is a key focus area today in
marketing departments in many different industries including finance,
telecommunications, utilities, and insurance. Businesses in these industries
have changed or are changing their marketing focus from a product-centric view
to a customer-centric view. There are several reasons for this change in focus:
increased competition for nongrowing markets, government deregulation, a
technology revolution enabling the consolidation of corporate data and access to
new data sources, and a growing awareness that the primary assets of a
business are its customers.
3.1 Customer Relationship Management
CRM is a methodology used to market to customers. CRM's key features include customer profitability, customer lifetime value, and customer loyalty. In managing their customers, businesses recognize that all customers are not created equal and that they should focus their marketing efforts on retaining their best customers, increasing the profitability of their high-potential customers, spending fewer marketing dollars on their low-potential customers, and acquiring new high-potential customers at a lower cost. A customer segmentation based on key customer characteristics is central to CRM and is used to derive strategic marketing campaigns.
A consolidated customer view enabled through the process of data warehousing permits businesses to determine the current and potential value of customers. A business can associate customer purchase behaviors with their customers' value to the shareholders. By understanding the association between transaction behavior and shareholder value, marketers can influence customers to change their purchase behavior in ways profitable to the organization.
By further understanding the complete view of its customers, including
demographic, geodemographic, and psychographic profiles, a business can do
more than simply influence behavioral change through the use of customer
rewards. By understanding the needs of customers, as exhibited through their purchase behavior, marketers can use the customer profile information to better serve these customers by targeting them for products/services that they are likely to purchase. Increased understanding of their customers also allows marketers to communicate relevant messages through customer-preferred channels such as direct mail or phone campaigns. Serving the needs of the customer effectively requires less incentive spending to change customer behavior. Increased
targeting of customers, focusing on meeting strategic campaign initiatives for
smaller customer segments, substantially reduces the cost of marketing and can
increase its effectiveness.
Strategic campaign initiatives can be derived by creating customer segmentation
models. Several different strategic initiatives can be applied to the different
customer segments. Businesses have realized that a minority of customers,
10%-25%, contribute the lion's share, 40%-80%, of the bottom line. A retention strategy is the primary initiative for these "best" customers. As many as five average customers are required to replace a "best" customer. With the high cost of customer acquisition, businesses have a strong business case to invest heavily in retaining their "best" customers and best potential customers. Loyal
customers increase in value over time; they spend more over time, consolidate
their purchases, and refer new customers.
Another important customer segment to consider is that containing customers
with a high potential value. In addition to retention, high-potential customers are
candidates for cross-selling and up-selling campaigns. Finding additional
products and services that can be marketed to this segment can be determined
by analyzing the customer purchase behaviors. By profiling and understanding
the characteristics of best customers, a business can effectively target customer
lists to acquire more profitable customers.
In addition to changing the way in which organizations market to their
customers, a change is occurring in the way marketing campaigns are
implemented. The status quo in marketing science is the implementation of
marketing campaigns in a series of waves or tactical campaigns. In this type of
marketing, groups of customers are targeted for a specific promotion. The
customers' buying behavior initiates a promotional period during which
customers can respond to the promotional offer. At the end of such a campaign,
the results are determined and then fed back into future waves of marketing
activity.
A new method of continuous marketing has recently appeared. With multiple
customer interaction channels, including the Internet, inbound telephone calls,
outbound telephone calls, direct sales, and direct mail, organizations with the
capability to provide CRM data to operational customer service applications can
continuously market to single customers. For instance, if a customer segment
definition and its sensitivity to purchase certain product or service information
are made available to customer service agents during inbound telephone calls,
the customer service agent can be directed to deliver the appropriate marketing
message to the customer interactively. Furthermore, if organizations had the
capability to update a customer segment and other purchase behavior models in
real time, they would be able to conduct continuous interactive marketing
campaigns. Organizations must track all customer interactions and provide
timely and accurate customer behavioral information to the marketer to execute
such a campaign. In the example above, customer service agents must have
real-time information to know that the customer has not already purchased the
products they are marketing. Failure to have real-time information in this
instance can have a detrimental effect on customer service.
Continuous marketing is also driven by the technology revolution. The technical
challenge in continuous marketing is the ability to access real-time information.
In order to deliver real-time information, an organization must be able to
transform its customer purchase behavior into decision support information in
real time. With wave marketing campaigns, it can take an organization several weeks or months to provide the decision support information that drives the marketing strategy. Organizations can no longer wait for their knowledge workers
to spend weeks creating models and decision support analysis to support
marketing campaigns. Automated models and expert systems will create the
decision support information required by continuous marketing. Data mining
technology will play an ever-increasing role in providing decision support
information to continuous marketing campaigns.
In summary, technology plays a fundamental role in CRM and continuous
interactive marketing (CIM). Data warehousing permits the consolidation of an
organization′s operational data. Data mining is used to create customer
segments and to identify profitable marketing opportunities. Campaign
management tools are used to implement and manage the design, execution,
tracking, and postanalysis of marketing campaigns.
Technology is the key enabler in the implementation of CIM. This case study
guide illustrates the use of data warehousing and the Intelligent Miner to support
CRM and CIM.
3.2 Case Studies
The business used for the case studies presented in this book is a retail bank, which is a client of Loyalty Consulting. Throughout the book this retail bank is referred to as the "Bank".
Loyalty Consulting, a subsidiary of The Loyalty Group, grew out of the
experience of building and outsourcing the data warehouse for the Air Miles
Reward Program (AMRP). By maintaining the data warehouse and providing
analytical services to the AMRP and sponsor companies, Loyalty Consulting
gained substantial experience in the application of technology to real business
requirements. It was one of IBM's original partners for the Intelligent Miner data mining product and has been applying the technology for more than two years.
Loyalty Consulting offers services that can be broadly categorized as:
• Database and data warehouse consulting
• Data mining or knowledge discovery in databases
• Geographic information system (GIS)
3.3 Strategic Customer Segmentation
In meeting its database marketing needs, the Bank currently uses standard
analytical techniques. The Bank's business analysts use recency, frequency, monetary (RFM) analysis, OLAP tools, and linear statistical methods to mine the
data for marketing opportunities and to analyze the success of the various
marketing initiatives undertaken by various lines of business. The Bank
recognizes the opportunity to increase the efficiency of its database marketing
activities and improve the knowledge of its customers through advanced data
mining technology.
The case studies presented in this book are driven by the Bank's business
requirement to use data mining to identify new business opportunities and/or to
reduce the cost of marketing campaigns to existing customers. In this section we
describe a framework for customer relationship management. We illustrate the
framework by using four data mining case studies, which we present in 3.4,
“Case Studies” on page 30.
Customer segmentation is one of the most important data mining methods in
marketing or CRM. Segmentation using behavioral data creates strategic
business initiatives. The customer purchase data that a company collects forms
the basis of the behavioral data. It is important to create customer segments by
using the variables that calculate customer profitability. These variables
typically include current customer profitability and some measure of risk and/or
a measure of the lifetime value of a customer.
Creating customer segments based on variables that calculate customer
profitability will highlight obvious marketing opportunities. For example, a
segment of high-profit, high-value, and low-risk customers is the segment a
company wants to keep. This segment typically represents the 10% to 20% of
customers who create 50% to 80% of a company's profits. The strategic
initiative for this group is obviously retention. A company would not want to lose
these customers. A low-profit, high-value, and low-risk customer segment is
also attractive to a company. The obvious goal of the company for this segment
would be to increase its profitability. Cross-selling (selling new products) and
up-selling (selling more of what customers currently buy) to this segment are the
marketing initiatives of choice.
Within the behavioral segments, demographic clusters and/or segments are
created. Customer demographic data does not typically correlate with customer
profitability, which is why it should not be mixed with the behavioral data used to define the segments. Creating
demographic segments allows the marketer to create relevant advertising, select
the appropriate marketing channel, and identify campaigns within the strategic
customer segment defined above.
Let us say a bank has both a high-profit and a low-profit behavioral customer
segment that have similar demographic subsegments. The profile of the
subsegment is young, high-income professionals with families. The marketer
would want to ask the following question: Why do these similar demographic
segments behave differently and how do I change the low-profit group to a
high-profit group? It is difficult, if not impossible, to answer the why, but data
mining provides an answer to the how. Affinity analysis discovers that the
high-profit segment of young wealthy professionals has a distinct product pattern
- mortgages, mutual funds, and credit cards. Using affinity analysis on the
low-profit segment reveals that two of its product patterns are the same as those
of the high-profit segment - mutual funds and credit cards. The marketing
campaign to increase the profitability of the low profit segment would thus be to
market mortgages to it.
In summary, behavioral segmentation helps derive strategic marketing initiatives
by using the variables that determine customer profitability. Demographic
segmentation within the behavioral segments defines tactical marketing
campaigns and the appropriate marketing channel and advertising for the
campaigns. It is then possible to target those customers most likely to exhibit
the desired behavior (in the above example, those customers most likely to
purchase a mortgage) by creating predictive models. See Figure 10.
Figure 10. Customer Segmentation Model
3.4 Case Studies
In this book we present the following four case studies that highlight the role
IBM's Intelligent Miner and data mining technology play in supporting a CRM
system:
• Customer Segmentation
The first case study creates a customer segmentation that will be used in the other case studies. Using shareholder value variables to create the segmentation will drive strategic initiatives for the customer segments discovered. Two of Intelligent Miner's clustering techniques and a decision tree are used to build segmentation models.
• Cross-Selling Opportunity Identification
Identifying a cross-selling opportunity that is actionable and profitable using Intelligent Miner's product associations algorithms is the topic of this case study. This study is based on the customer segment from the first case study whose strategic initiative is to increase its profitability.
• Target Marketing Model to Support a Cross-Selling Campaign
In this case study, we build a predictive model to target those customers likely to buy the product identified as a cross-selling opportunity in the previous case study. Several algorithms from Intelligent Miner are used. The models built with the Intelligent Miner decision tree, radial basis function (RBF) regression, and neural network are compared.
• Attrition Model to Improve Customer Retention
In this case study, profitable customer segments are selected from the segmentation model built in the first case study. An attrition model is built to identify those profitable customers likely to defect. Several algorithms from Intelligent Miner are compared. In addition to the predictive modeling algorithms used in the previous case study, a time-series neural network is utilized.
The four case studies represent four major components of a CRM program that
an organization can implement. The strengths of Intelligent Miner's algorithms
and visualization tools and its ability to work on a wide variety of business
problems are illustrated through the case study results. Figure 10 on page 30
shows the customer segmentation model used in the case studies shown in this
book.
Chapter 4. Customer Segmentation
This case study creates a customer segmentation that will be used in the other
case studies. Using shareholder value variables to create the segmentation will
drive strategic initiatives for the customer segments discovered. Two of
Intelligent Miner's clustering techniques and a decision tree are used to build
the segmentation models.
4.1 Executive Summary
The Bank wanted to create an advanced segmentation of its customer base in
order to further understand customer behavior. The segmentation was to be
compared with the existing segmentation that was created through RFM
analysis. A segmentation framework as described in 3.3, “Strategic Customer
Segmentation” on page 29, was to be created to meet these key business
requirements:
• Define "shareholder value" for the corporation
• Define strategic objectives for customer management
• Understand customer behavior in terms of shareholder value
• Understand the interaction between customer transaction behavior and shareholder value
Shareholder value was a well-understood concept for the Bank. However, the
specific variables that make up shareholder value were not previously
considered in detail. The selection or creation of these variables was a primary
requirement.
Having defined the metrics or variables used to approximate shareholder value,
the Bank wanted to understand how the customer base was segmented by
shareholder value. An analysis of customer segments defined by shareholder value was to be used to derive strategic initiatives for managing the shareholder value of each of the segments.
Further segmentation using detailed customer transaction behavior, defined by
RFM variables by product over time, would provide insight into which customer
behaviors were related to positive and negative shareholder value.
Understanding the relationship between customer behavior and shareholder
value would drive the creation of tactical marketing initiatives that could be
executed to meet the various customer segment strategies.
4.2 Business Requirements
The Bank wanted to create an advanced segmentation of its customer base to
further understand customer behavior. This segmentation was to be compared
to the existing segmentation that was created with RFM analysis. A
segmentation framework as described in 3.3, “Strategic Customer
Segmentation” on page 29, was to be created to meet the following key
business requirements:
• Define "shareholder value" for the corporation
• Define strategic objectives for customer management
• Understand customer behavior in terms of shareholder value
• Understand the interaction between customer transaction behavior and shareholder value
The Bank's data warehouse was used as a data source for this case study. The Bank had spent considerable effort cleaning and transforming the data prior to loading it into the warehouse. Therefore, some of the data preparation activities that are usually time consuming were not required in this case study.
Customer segments were to be determined using the following shareholder value variables that were identified by the Bank's executives as key drivers of their business:
• Number of products used by the customer over a lifetime
• Number of products used by the customer in the last 12 months
• Revenue contribution of the customer over a lifetime
• Revenue contribution of the customer over the last 12 months
• Most recent customer credit score
• Customer tenure in months
• Ratio of (number of products/tenure)
• Ratio of (revenue/tenure)
• Recency
A review of the clustering process was presented in sufficient detail so that
technical analysts could use their own data to reproduce a clustering project.
The results showed that the existing segmentation scheme was valid but could
use some additional refinement. The key drivers of profitability were verified. A
highly profitable customer segment was identified and represented 35% of the
corporate profit with only 9% of customers. Some cross-selling opportunities
were quantified; they represented a potential profit increase of 18% over the
entire customer base.
The Bank executives decided that there was potential value in data mining and
started several data mining projects, including target marketing, opportunity
identification, and further segmentation work.
4.3 Data Mining Process
In this section we outline the data mining process that was used to meet the
business requirements of the Bank (see 4.2, “Business Requirements” on
page 33). Figure 11 on page 35 highlights the major steps in the process:
1. Shareholder value definition
2. Data selection
3. Data preparation including discretization
4. Demographic clustering
5. Neural clustering
6. Cluster result analysis
7. Classification of clusters with decision tree
8. Comparison of results
9. Selection of clusters and/or segments for further analysis
We describe the first four topics in this section. We discuss topics 5 through 8 in
4.4, “Data Mining Results” on page 50.
Figure 11. Data Mining Process: Customer Segmentation
A high-level tabulated comparison of the demographic clustering algorithm and
neural clustering results is made. We chose to present the demographic
clustering results in detail because they are more interesting than the neural
clustering results. This difference is not a general observation; it is true for this
particular case study. In our experience both algorithms produce good results,
usually one slightly better than the other, depending on the business problem
and more importantly the characteristics of the data that was mined.
4.3.1 Data Selection
We use the data model in Figure 12 on page 37 as the primary source of data.
Approximately 50,000 customers and their associated transaction data for a 12-month period were selected as a representative sample for the study. (We
used this data in all of our case studies.) The transaction data used contained
transactions across all possible products. We selected the complete transaction
data because we wanted to develop an understanding of the different customer
transaction behaviors. All customer transaction behaviors are contained entirely
within the transaction and customer tables.
The shareholder value variables we defined for this case study included
revenue, tenure, number of products purchased over the customer tenure,
number of products purchased over the last 12 months, customer credit score
and recency (in months) of the last transaction. These variables form the core of
the top layer of the hierarchical clustering model that we develop in this case
study (see Figure 10 on page 30). We had to calculate all of these variables from
the raw transaction data. The selection of these variables was driven entirely by
the business requirement. These are the variables the business had decided to
use in managing its customer base.
The profitability data in the data model in Figure 12 on page 37 was contained in
the transactions table. Each transaction record contained a revenue figure that
could be used to estimate profitability by applying a gross profit margin or
interest rate spread. More sophisticated profit models could be developed but
were outside the scope of this work. 1 The other shareholder value variables
were calculated by using aggregate functions on the transaction data while
joining the data to each customer record.
1 More sophisticated profit models may include the transaction cost as well as the transaction gross revenue. The cost of marketing to the customer can be determined from the promotion history table as in Figure 12 on page 37. Other costs can be allocated by the customer's transaction intensity or by some other variable relevant to the business problem at hand.
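The roll-up itself is straightforward; the following pandas sketch shows the kind of per-customer aggregation described above (the table layout, column names, and flat 20% margin are assumptions made for illustration):

    import pandas as pd

    # Assumed layout of the transaction table: one row per transaction.
    tx = pd.DataFrame({
        "customer_id": [1, 1, 2, 2, 2],
        "month":       [3, 11, 1, 6, 12],   # month index in the 12-month window
        "revenue":     [120.0, 80.0, 40.0, 55.0, 60.0],
        "product":     ["cat #1", "cat #2", "cat #1", "cat #1", "cat #3"],
    })

    GROSS_MARGIN = 0.20  # illustrative flat margin used to approximate profit

    per_customer = tx.groupby("customer_id").agg(
        revenue_12m=("revenue", "sum"),
        n_products_12m=("product", "nunique"),
        last_month=("month", "max"),
    )
    per_customer["profit_12m"] = per_customer["revenue_12m"] * GROSS_MARGIN
    per_customer["recency_months"] = 12 - per_customer["last_month"]
    print(per_customer)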
Figure 12. Customer Transaction Data Model
The Bank has divided its products into the 14 categories listed below. The category labels in the results will be denoted by "cat #____" to protect the Bank's confidentiality.
• Loans
• Mortgages
• Leases
• Credit Card
• Term Deposits
• ATM Card
• Savings Accounts
• Personal Banking
• Internet Banking
• Telephone Banking
• Business Loans
• Business Mortgages
• Business Deposit Accounts
• Business Credit Cards
We created transaction variables for each of the above product categories. For
each customer we calculated the recency in months, revenue by quarter, and number of transactions by quarter for two consecutive quarters in 1997.
4.3.2 Data Preparation
Once the data required for the data mining process is selected, it must be put into the appropriate format or distribution. Therefore it has to be cleaned and transformed to meet the requirements of the data mining algorithms.
4.3.2.1 Data Cleaning
Very little data cleaning was required for this case study because the data was
extracted from the Bank's data warehouse. During the load process for this
warehouse, substantial data cleaning occurs to minimize the data preparation
required for all analytical activities, including data mining.
After we created all the variables on each customer record, we had to clean the
data. We profiled the data to determine how many variables had records with
missing values, unknown values, invalid values, or valid values. Following are
the definitions for possible field contents:
• Missing value: A record has no value for a particular field.
• Unknown value: A record has a value for a particular field that has no known meaning.
• Invalid value: A record has a value for a particular field that is invalid but whose meaning is known.
• Valid value: A record has a value for a field that is valid.
Data cleaning is the process of assigning valid values to all records with
missing, invalid, and unknown values. In this case study only the transaction
variables had missing values. (Transaction data is usually very consistent and
has no invalid or unknown values). The missing values resulted from particular
customers having no transaction activity for a particular product. We assigned
these missing values a value of zero.
We assigned a new value to all categorical variables that had records with
missing and unknown values. We corrected the invalid values for these variables
to valid values.
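A minimal sketch of this cleaning step follows, assuming a pandas data set with invented column names; transaction variables get zero for missing values, while categorical variables get an explicit new category and invalid codes are mapped back to valid ones:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        "rev_cat1_q1": [120.0, np.nan, 40.0],  # missing = no activity for the product
        "tx_cat1_q1":  [3.0, np.nan, 1.0],
        "region":      ["east", None, "wst"],  # None = missing; "wst" = invalid code
    })

    print(df.isna().sum())  # profile missing values per field before cleaning

    # Transaction variables: a missing value means no activity, so assign zero.
    for col in ["rev_cat1_q1", "tx_cat1_q1"]:
        df[col] = df[col].fillna(0.0)

    # Categorical variables: give missing values their own category and
    # correct known-invalid codes to valid values.
    df["region"] = df["region"].fillna("unknown").replace({"wst": "west"})
    print(df)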
4.3.2.2 Data Transformation
After we cleaned the data, handled all missing and invalid values, and made the
known valid values consistent, we were ready to transform the data to maximize
the information content that can be retrieved.
For statistical analysis the data transformation phase is critical because some statistical methodologies require that the data be linearly related to an objective variable, normally distributed, and free of outliers. Artificial intelligence
and machine learning methods do not strictly require the data to be normal or
linearized, and some methods, like the decision tree, do not even require
outliers to be dealt with. This is a major difference between statistical analysis
and data mining. The machine learning algorithms can automatically deal with
the nonlinearity and nonnormal distributions, although the algorithms work better
in many cases if these criteria are met. A good statistician with a lot of time can
manually linearize, standardize, and remove outliers better than the artificial
intelligence and machine learning methods. The challenge is that with millions of
data records and thousands of variables, it is not feasible to do this work
manually. Also, most analysts are not qualified statisticians, so using automated
methods is the only reasonable solution.
After cleaning the original data variables, we created new variables using ratios,
differences, and business intuition. We created total transaction variables, which
were the sum of the transaction variables over two quarters. We used these
totals to create ratio variables. We created time-series variables to capture the
time difference in all transaction variables between quarters.
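A short sketch of these derived variables, totals, ratios, and quarter-over-quarter differences, on invented column names:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        "rev_q1": [100.0, 0.0, 250.0],
        "rev_q2": [150.0, 40.0, 200.0],
    })

    # Total transaction variable: the sum over the two quarters.
    df["rev_total"] = df["rev_q1"] + df["rev_q2"]

    # Ratio variable: share of revenue in the latest quarter (guarding against /0).
    df["rev_q2_share"] = df["rev_q2"] / df["rev_total"].replace(0.0, np.nan)

    # Time-series variable: quarter-over-quarter difference.
    df["rev_diff"] = df["rev_q2"] - df["rev_q1"]
    print(df)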
Other variables that we calculated on the basis of our knowledge of the business
were:
• Number of products purchased by the customer over a lifetime
• Number of products purchased by the customer in the last 12 months
• Revenue contribution of the customer over a lifetime
• Revenue contribution of the customer over the last 12 months
• Most recent customer credit score
• Customer tenure in months
• Ratio of (number of products/tenure)
• Ratio of (revenue/tenure)
• Recency
This last group of variables was designated as "shareholder value" variables; they were the variables selected by the business to be used to create strategic customer relationship marketing initiatives.
To use the data in the demographic clustering algorithm, we discretized it. Discretization facilitates interpreting the results, for both the neural clustering and demographic clustering algorithms, and takes care of outliers. The following quantiles were calculated for all numeric variables: 10, 25, 50, 75, and 90. The values of the variables at these breakpoints were determined, and the data was divided into six ordinal values.
We arbitrarily chose the quantiles for the discretization, and we found the
selection useful. 2 The quantile breaks were generated in an automated fashion.
We then profiled the resulting distributions and manually adjusted them to be
unimodal or at least monotonic. We selected the modality and monotonicity
criteria for ease of interpretation; in our experience these criteria provide useful
results.
To improve the clustering results, advanced analysts removed the correlated
variables. Factor analysis can be used to create linearly independent
components. For easy interpretation of results, the original data can be
clustered against the components, and the variables most representative of the
components chosen as input to the clustering algorithm.
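As a rough stand-in for that step, the sketch below drops one variable from each highly correlated pair using a plain correlation matrix (the 0.8 cutoff and the synthetic data are assumptions; a full factor analysis would go further, as described above):

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    n = 500
    rev = rng.gamma(2.0, 50.0, n)
    df = pd.DataFrame({
        "revenue": rev,
        "revenue_per_month": rev / 12.0 + rng.normal(0.0, 1.0, n),  # near-duplicate
        "tenure": rng.integers(6, 120, n).astype(float),
    })

    corr = df.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > 0.8).any()]
    print("dropping correlated variables:", to_drop)
    df_reduced = df.drop(columns=to_drop)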
Refer to Figure 13 on page 41 for a view of the original data and to Figure 14 on
page 42 for a post-discretized view of the data.
2 We give credit for this discretization scheme to Dr. Messatfa from the IBM ECAM lab in Paris, France.
Figure 13 on page 41 shows the variable names taken from the data source, and Figure 14 on page 42 shows the variable names as follows:
• Unchanged variables have the original variable names.
• Changed variables have an underscore added to the end of the variable name.
• New variables appear like unchanged variables.
Figure 13. Original Data Profile
Figure 14. Post-Discretized Data Profile
Some of the key features to note in Figure 13 on page 41 and Figure 14 on
page 42 are:
• The original data has missing values treated. (The Bank data warehouse data is cleaned before loading, and thus less cleaning is required.)
• The original data has continuous variables that are extremely skewed.
• The original data has multimodal variables.
• The discretized data is much easier to interpret than the original data.
• Many of the previously skewed distributions are "normal" in shape, which enables the algorithms to obtain accurate results and/or allows the results to be easily interpreted.
• Some of the data in the discretized set is still skewed, indicating that the data may not be useful.
To prepare the data for clustering with the neural clustering algorithm, we
standardized some of the continuous variables, using a logarithmic transform.
See Figure 15.
Figure 15. Post Logarithm Transformed Data Profile
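The transform itself is a one-liner; the sketch below uses log(1 + x) so that the many zero-valued transaction records (the cause of the extra peaks noted below) remain defined (the variable name follows the L-prefix convention in the figure):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"REV12": [0.0, 10.0, 120.0, 1500.0, 48000.0]})

    # log1p handles zero values that a plain log cannot; the long skewed tail
    # is compressed so the clustering algorithm sees a more even spread.
    df["LREV12"] = np.log1p(df["REV12"])
    print(df)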
Some key features of the logarithm transformed data are:
• The data is much less skewed.
• Some of the variables are unimodal (LAVGBAL, LRATIO1, LRATIO2, LRATIO3, LTENURE).
• Some variables (LREV12, LREV3) have two peaks because of a large number of records with zero or small values.
• Some variables (LDIFF3, LDIFF3TX, LDIFF6, LDIFF6TX) have three modes or peaks because of the transformation used.

The data in transformed form is much easier to visualize than in its original pre-prepared state. The algorithms should achieve better results using this data and/or results that will be much easier to interpret.
Once the data has been selected, prepared, and transformed, it is possible to
run the data mining algorithms.
4.3.3 Data Mining
Figure 16 on page 45 shows the clustering process flow for this case study. We
used demographic clustering so that we could use the results to interpret the
output from neural clustering. Neural clustering can be difficult to interpret
because of the use of continuous data, which is typically skewed or has been
logarithm transformed to remove the skew.
Figure 16. Clustering Process Flow
4.3.3.1 Parameter Selection
Referring to Figure 16, you see that the first step in the clustering process, after
selecting the data set (the discretized data in this case) and after selecting an
algorithm to be run (demographic clustering in this case), is to choose these
basic run parameters for the algorithm:
• Maximum number of clusters
This parameter indicates the maximum number of clusters allowed. The algorithm may find fewer. This feature is unique to Intelligent Miner; most other clustering algorithms require that the exact number of clusters be specified.
• Maximum number of passes through the data
This parameter indicates how many times the algorithm can read the data. The higher this number and the lower the accuracy criterion (see below), the longer the algorithm will run and the more accurate the result will be. This parameter is a stopping criterion for the algorithm: if the algorithm has not satisfied the accuracy criterion after the maximum number of passes, it stops anyway.
• Accuracy
This number is a stopping criterion for the algorithm. If the change in the Condorcet criterion between data passes is smaller than the accuracy (as a percentage), the algorithm terminates.
• Similarity threshold
This parameter defines the similarity threshold between two values in distance units. The default distance unit is the absolute number; with the default threshold of 0.5, two values are considered equal if their absolute difference is less than or equal to 0.5.
The neural clustering algorithm has the following parameters:
• Number of rows and number of columns
Multiply the two numbers together to get the maximum number of clusters. The rectangle defined by the number of rows and columns of neural network nodes changes the resulting clusters. Unless you are an advanced user, we recommend choosing the most "square" output grid shape. For example, if you want 9 clusters, choose 3 rows by 3 columns (the default). If you want 12 clusters, choose 4 rows by 3 columns as opposed to 6 rows by 2 columns.
• Number of passes
This parameter indicates the number of passes through the data the algorithm will make to build the neural network.
For the first clustering run, we selected a maximum number of clusters larger than the number we wanted at the end of the project. By selecting more, we allowed the algorithm to find fewer if that is all that is in the data. If the algorithm returns the maximum, we know that there are likely more clusters in the data. The number of clusters chosen is driven by how many clusters the business can manage. In our experience, this number is less than 10 for most companies. For this case study we chose 9 for the maximum number of clusters.
For the maximum number of passes, we chose 5 and specified the accuracy as
0.1. We left the similarity threshold at the default value of 0.5. The parameter
settings for the number of passes and accuracy were arbitrary. We wanted a
reasonable number of passes through the data to ensure a reasonable
convergence of the solution.
For the initial neural clustering run, we selected a three row by three column
grid; this selection results in a maximum of nine clusters. We left the number of
passes at the default.
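Intelligent Miner's demographic and neural clustering algorithms are product-specific, but the shape of such a run can be sketched with an open-source stand-in. The k-means run below (a swapped-in technique, not what Intelligent Miner implements) uses loose analogs of the parameters above: 9 clusters, 5 passes (max_iter), and a 0.1 convergence tolerance. Note that, unlike demographic clustering, k-means always returns exactly the requested number of clusters:

    import numpy as np
    from sklearn.cluster import KMeans

    # Stand-in for the prepared customer data: 50,000 records, 9 input fields.
    X = np.random.default_rng(1).normal(size=(50000, 9))

    km = KMeans(n_clusters=9, max_iter=5, tol=0.1, n_init=1, random_state=0)
    labels = km.fit_predict(X)
    print(np.bincount(labels))  # cluster sizes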
The analysis of the results of each run will guide the selection of parameters for
follow-on runs. The clustering process is highly iterative as shown in Figure 16
on page 45.
For the first run of the demographic clustering algorithm, we left the advanced
parameter settings at the default. Because we discretized the data ahead of
time, and all the discretized variables had approximately the same range, many
of the advanced parameters were not required. The following advanced parameter settings allow continuous data to be effectively clustered with the algorithm:
• Distance measure
− Absolute: One unit of absolute difference in the magnitude of two record values for one variable.
− Range: The range (difference between maximum and minimum) of a variable is considered one distance unit.
− Standard deviation: The standard deviation of a variable is considered one distance unit. This setting is only meaningful if the variable is normally distributed.
• Field weighting
− Probability weighting: Uses the probability of the occurrence of a variable value to compensate for its contribution to the overall cluster result.
− Information theoretic weighting: Uses manually selected weights to compensate for the contribution of a variable to the overall cluster result.
4.3.3.2 Input Field Selection
We selected these input field variables for the first run:
• Number of products purchased by the customer over a lifetime
• Number of products purchased by the customer in the last 12 months
• Revenue contribution of the customer over a lifetime
• Most recent credit score
• Revenue contribution over the last 12 months
• Customer tenure in months
• Ratio of (revenue/tenure): Ratio 1
• Ratio of (number of products/tenure): Ratio 3
• Region
• Recency
• Tenure (number of months since the customer first activated at the bank)
We used the discretized versions of these variables for demographic clustering
and the log-transformed continuous versions for neural clustering.
As discussed in section 4.2, “Business Requirements” on page 33, the first layer
of clusters in the CRM framework is created by using shareholder value
variables and any other variables the business would like to use to manage its
customers. All other discrete and categorical variables and some interesting
continuous variables were input as supplementary variables to be profiled with
the clusters but not used to define them. These supplementary variables can be
used to interpret the cluster as well. The ability to add supplementary variables
at the outset of clustering is a very useful feature of Intelligent Miner, which
allows the direct interpretation of clusters using other data very quickly and
easily.
4.3.3.3 Output Field Selection
The entire data set was output with the cluster information appended to the end
of each record. The entire data set was output so that the results of other
clustering runs using both the demographic clustering and neural clustering
algorithms could be directly compared by cross-tabulating the cluster IDs from
the various schemes. This is one advantage of Intelligent Miner. Having multiple
algorithms allows the output of one algorithm to be used as the input to another.
The algorithms used in combination are more powerful than those applied alone.
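With the cluster ID appended to every record, comparing two schemes reduces to a cross-tabulation, as in this small sketch with invented IDs:

    import pandas as pd

    scored = pd.DataFrame({
        "demographic_cluster": [0, 0, 1, 1, 2, 2, 2, 0],
        "neural_cluster":      [3, 3, 1, 1, 0, 0, 1, 3],
    })

    # Rows: demographic clusters; columns: neural clusters. Heavy concentrations
    # in single cells indicate that the two schemes largely agree.
    print(pd.crosstab(scored["demographic_cluster"], scored["neural_cluster"]))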
4.3.3.4 Results Visualization
The output of the clustering algorithms is an output data set and a visualization.
The visual results display the number of clusters, the size of each cluster, the
distribution of each variable in each cluster, and the importance of each variable
to the definition of each cluster (based on several metrics including chi-square
test, entropy, and condorect criteria).
The result is completely unsatisfactory if there is only one cluster, or if there is
one very large cluster (> 90%) and several small clusters. This situation will
occur if highly skewed continuous variables are used as input or if the modal
frequency of some of the discretized variables is very large (> 50%-90%). If
this situation occurs, we recommend using probability field weighting for the
discrete variables and discretization of the continuous variables. The statistics
of the input variables can be viewed in the cluster details.
4.3.3.5 Cluster Details Analysis
The cluster details contain some tabulated statistics for the cluster model. The global measures include the Condorcet criterion for the demographic clustering algorithm and the quality for neural clustering. Realistic "good" values for the Condorcet criterion are in the 0.6-0.75 range. Higher values are usually associated with the case of one very large cluster and a number of smaller clusters. "Good" neural cluster quality values are in the 0.5-0.7 range.
For the demographic clustering algorithm, the details view also shows the Condorcet criterion for each cluster and for each variable, globally and within each cluster; the similarities among all clusters; and the global statistics and statistics within each cluster for each variable. The neural clustering algorithm also shows global statistics and statistics within each cluster for each variable.
The details can be used, for example, to assess the quality of the cluster
models, to assess the contribution of each variable to the model, and to
compare different cluster models.
4.3.3.6 Cluster Profiling
The next step in the clustering process is to profile the clusters by executing
SQL queries. The purpose of profiling is to quantitatively assess the potential
business value of each cluster by profiling the aggregate values of the
shareholder value variables by cluster. The scientific quality of the clusters
should also be profiled. Some of the variables for profiling include (a minimal
profiling sketch follows the list):
• Record scores
  − Intelligent Miner provides a score on each record in addition to the
    cluster ID, which is a measure of how well the record fits the cluster
    model.
• 2nd choice cluster
  − Intelligent Miner provides a cluster ID for the second-choice cluster to
    which the record could have been assigned.
• 2nd choice scores
  − Intelligent Miner provides the score for how well the record fits the
    second-choice cluster assignment.
• Comparison of methods considering 2nd choice clusters and scores
• Other measures, including entropy, chi-square, and Euclidean distance
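The book's profiling is done with SQL queries against the warehouse; as a rough
pandas equivalent (the file and column names here are hypothetical), a
shareholder value profile by cluster might look like this:

    import pandas as pd

    # Hypothetical scored output: one row per customer, with the cluster ID
    # appended by the clustering run.
    scored = pd.read_csv("clustered_customers.csv")

    # Aggregate shareholder value variables by cluster.
    profile = scored.groupby("cluster_id").agg(
        customers=("customer_id", "count"),
        revenue=("revenue", "sum"),
        avg_products=("num_products", "mean"),
        avg_tenure=("tenure_months", "mean"),
    )
    profile["revenue_pct"] = 100 * profile["revenue"] / profile["revenue"].sum()
    profile["customer_pct"] = 100 * profile["customers"] / profile["customers"].sum()

    # Leverage: share of revenue relative to share of customers.
    profile["leverage"] = profile["revenue_pct"] / profile["customer_pct"]
    print(profile.sort_values("revenue_pct", ascending=False))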
4.3.3.7 Cluster Characterization (Qualitative)
Once the cluster algorithm has been run, the next step is to qualitatively
characterize the clusters. Cluster characterization can be completed using the
results visualization. Each cluster should be considered variable by variable.
Note the differences and similarities among the clusters, the variable
distributions within each cluster versus the global distributions, the cluster
sizes, and the ordering of variables within each cluster under different
metrics.
4.3.3.8 Cluster Characterization Using a Decision Tree
One of the disadvantages of cluster models is that there are no explicit rules to
define each cluster. The model is thus difficult to implement, and there is no
clear understanding of how the model assigns cluster IDs. The cluster model
tends to be a black box. You can use a decision tree to classify the cluster IDs
using all the input data and supplementary data that was used in the clustering
algorithms. The decision tree will define rules that classify the records using the
cluster ID. In many instances, based on our experience, the decision tree
produces a very accurate representation of the cluster model (>90% accuracy).
If the tree representation is accurate, it is preferable to implement a tree
because it provides explicit, easy-to-understand rules for each cluster.
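A minimal sketch of this characterization step, using scikit-learn's tree as a
stand-in for Intelligent Miner's decision tree algorithm (the file and column
names are hypothetical, and numeric inputs are assumed):

    import pandas as pd
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier, export_text

    # Hypothetical scored output of the clustering run.
    scored = pd.read_csv("clustered_customers.csv")
    X = scored.drop(columns=["customer_id", "cluster_id"])
    y = scored["cluster_id"]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0
    )

    # Classify the cluster ID; a shallow tree keeps the rules readable.
    tree = DecisionTreeClassifier(max_depth=6, random_state=0)
    tree.fit(X_train, y_train)

    # If holdout accuracy is high (>90% in many of our runs), the explicit
    # rules can stand in for the opaque cluster model.
    print("holdout accuracy:", tree.score(X_test, y_test))
    print(export_text(tree, feature_names=list(X.columns)))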
4.3.3.9 Final Result
The final clustering result is selected on the basis of a combination of
scientific and business reasons. Cluster models that have good global values of
the Condorcet criterion or quality, whose clusters are distinct and different
from each other, and which can be accurately modeled with a decision tree are
"scientifically" good. Good business models are defined by sensible
interpretation of the clusters, good segmentation in shareholder value
variables, segmentation that drives obvious business strategies, and segments
that are actionable.
4.4 Data Mining Results
Figure 17 on page 51 presents the results of several iterations of demographic
clustering. This diagram is the cluster visualizer in Intelligent Miner that is used
by both demographic clustering and neural clustering.
Here is some general information to help you read the diagram:
• Each strip of the diagram represents a cluster.
• The clusters are ordered from top to bottom according to their size.
• The numbers down the left side show the size of the cluster as a percentage
  of the universe.
• The numbers down the right side are cluster IDs.
• The variables are ordered from left to right in their order of importance to
  the cluster, based on chi-square tests between the variables and cluster IDs.
  This is the default metric. Other ordering criteria you could use include
  entropy, the Condorcet criterion, and database order.
• The variables in square brackets are the supplementary variables. Variables
  without brackets are those used to define a cluster.
• Numeric (integer), discrete numeric (smallint), binary, and continuous
  variables have their frequency distribution or histogram shown as a bar
  graph. The outlines in the foreground of the bars indicate the distribution
  of the variable within the current cluster. The grey solid bars in the
  background indicate the distribution of the variable in the entire universe.
  The more the cluster distribution differs from the distribution in the entire
  universe, the more interesting or distinct the cluster is.
• Categorical variables are shown as pie charts. The inner pie represents the
  distribution of the categories for the current cluster, and the outer ring
  represents the distribution of the variable for the entire universe. Again,
  the more the distribution of the variable for the current cluster differs
  from the average distribution, the more interesting or distinct the cluster
  is.
Figure 17. Shareholder Value Demographic Clusters
The result shows that there are nine clusters in the model. There are likely
more clusters in the data, because we chose nine as the maximum number of
clusters allowed. The clusters are reasonably distributed (not one very large
cluster). The
variable distributions within the clusters tend to be different from their global
distributions. The Best98, Revenue, and CreditScore variables are commonly
important to several clusters.
For comparison purposes, a high-level neural clustering is shown in Figure 18 to
highlight some similarities and differences between the results from the two
different methods.
Figure 18. Shareholder Value Neural Network Clusters
The input variables chosen for the neural network were the logarithm
transformed versions of the variables used for the demographic clustering. The
discretized variables used for the demographic clustering were input as
supplementary variables to aid in the interpretation of the neural cluster. Some
key features to note are:
• The neural clusters are not quite as uniformly distributed with respect to
  cluster size as the demographic cluster results. In our experience, the
  opposite is usually the case.
• The same variables as in demographic clustering appear as the most important
  variables (for example, Best98, other Best vars, REVENUE_, CREDSCORE_, and
  NUMPROD_).
• The discretized variables are more significant to the cluster definitions
  than the logarithm-transformed variables used to create the clusters. This
  illustrates one of the values of discretization. You can use the discrete
  variables to assist in the interpretation of clusters while using the
  continuous variables to build the clusters.
Because of the similarity of the neural clustering and demographic clustering
results and in an effort to reduce redundancy in the presentation of results, the
discussion below focuses on the demographic clustering results.
4.4.1 Cluster Details Analysis
Figure 19 on page 54 shows the cluster details. From this result we can see
that the global Condorcet value is 0.6098. This value is at the low end of a
reasonable result. The lower value may be due to several factors: we restricted
the output to nine clusters when there may have been more, some variables may
not discriminate well, or the data may not contain distinct clusters.
The quality of the clusters ranges from a Condorcet criterion value of 0.42 to
0.72, as shown in the Cluster Characteristics section in Figure 19 on page 54.
In the Similarity Between Clusters section of Figure 19 on page 54, you can see
that there is some similarity among clusters, with the similarity measure ranging
from <0.25 to a maximum of 0.42.
Figure 19 on page 54 also shows that the REVENUE12_, CREDSCORE_, and
NPRODL12 variables have low Condorcet values. Therefore they could be
removed from the cluster model to improve the result (see the Reference Field
Characteristics section in Figure 19 on page 54).
These results indicate that further iteration is warranted.
Figure 19. Shareholder Value Demographic Cluster Details
4.4.2 Cluster Characterization
In this section we discuss the characterizations of some of the interesting
clusters in Figure 17 on page 51.
The Best98 variable is a binary variable that indicates the best customers in the
database as determined by other means. The clustering model presented seems
to agree very well with this existing definition as most of the clusters seem to
have almost all Best or no Best. As a first pass, this is an exciting result, as the
status quo Best segment has been confirmed with little effort! To be confident of
the data mining results, you should always observe the current business
knowledge in the results. Any successful company knows its business well
enough that the obvious results should show clearly in any data mining results.
Observing the current business knowledge provides confidence that the data
selection and data preparation efforts have been valid. If results are observed
that were previously unknown, one can have confidence in them as long as they
appear alongside currently known facts.
This clustering result not only validates the existing concept of best
customers, it also extends the idea of best customers by creating clusters
within best. It can be seen from Figure 17 on page 51 that there are several
clusters with varying levels of revenue. Perhaps this builds a case for
creating a "VeryBest" customer group?
Cluster 6 can be interpreted as almost all Best98 customers, whose credit
score, revenue in the last 12 months, revenue per month, and number of products
used per month are in the 50th to 75th percentile. (Recall the
discretization definition in 4.3.2.2, “Data Transformation” on page 38). Cluster 6
represents 24% of the population. Refer to Figure 20 on page 56 for a detailed
view of cluster 6.
Figure 20. Cluster 6 Detailed View
Cluster 3 can be interpreted as almost no Best98 customers, whose revenue,
credit score, revenue in the last 12 months, revenue per month, and number of
products per month are all in the 25th to 50th percentile. (Recall the
discretization definition given in 4.3.2.2, “Data Transformation” on page 38).
Cluster 3 represents 23% of the population. Refer to Figure 21 on page 58 for a
detailed view of cluster 3.
Figure 21. Cluster 3 Detailed View
Cluster 5 represents 9% of the population; the customers' revenue, credit
score, and number of products per month are all in the 75th percentile and
above, skewed to almost all greater than the 90th percentile. The Best95,
Best96, and Best97 variables represent the status of the customers in the
calendar years 1995, 1996, and 1997. The fraction of customers who were best
increased each
year! This looks like a very profitable cluster. Refer to Figure 22 on page 60 for
a detailed view of cluster 5.
Figure 22. Cluster 5 Detailed View
Figure 23 on page 61 provides the tabulated details for cluster 5.
Figure 23. Cluster 5 Tabulated Details
Cluster 5 contains 8.9% of the customer population. The Condorcet value for
cluster 5 is 0.5946, just below the global value. Cluster 5 is most similar to
cluster 7 and cluster 0. Notice that REVENUE_ and CREDSCORE_ have
Condorcet values of 0.71 and 0.83, respectively. Recall that globally these
variables had low Condorcet values, but for this cluster they have very high
values. NPRODL12_ has a low Condorcet value, 0.37, for this cluster and is low
globally. This information can be used to decide whether or not these variables
should be included in the model. The details also present the chi-square value
and the entropy value, which are measures of the association between the
variable and the cluster to which the records have been assigned.
In cluster 1, the supplementary variable, NEW, is a binary variable that indicates
whether or not the customer is new to the Bank. This cluster clearly consists of
new customers. The recency is low (which means the customer has not had a
recent transaction, that is, they have opened accounts but not transacted yet),
and the tenure is low. It would be very interesting to track these customers
over time to see how they progress. Refer to Figure 24 on page 63 for a
detailed view of cluster 1.
Figure 24. Cluster 1 Detailed View
4.4.3 Cluster Profiling
In this section we present an example of a profile of revenue, number of
products purchased, and customer tenure (see Table 1). The Leverage column is
the ratio of a cluster's revenue share to its customer share. Table 1 shows
that cluster 5 is the most profitable cluster in that it represents 35% of the
revenue yet only 9% of the customers. The leverage ratio is the highest for
this cluster. From Table 1 you can also see that as profitability increases, so
does the average number of products purchased. The product index is the average
number of products purchased by the customers in the cluster divided by the
average number of products purchased overall. It is also interesting to note
that customer profitability increases as customer tenure increases.
Table 1. Customer Revenue by Cluster

Cluster ID   Revenue    Customer   Product index   Leverage   Tenure
5            34.74%      8.82%     1.77            3.94       60.92
6            26.13%     23.47%     1.41            1.11       57.87
7            21.25%     10.71%     1.64            1.98       63.52
3             6.62%     23.32%     0.73            0.28       47.23
0             4.78%      3.43%     1.45            1.40       31.34
2             4.40%      2.51%     1.46            1.75       61.38
4             1.41%      2.96%     0.99            0.48       20.10
8             0.45%     14.14%     0.36            0.03       30.01
1             0.22%     10.64%     0.00            0.02        4.66
From this simple result it is possible to derive some high-level business
strategies. From Table 1 it is obvious that the best customers (considering only
the data in the table) are in clusters 2, 5, and 7. These customers have a higher
revenue per person than other clusters as indicated by the leverage column.
Some possible high-level business strategies are:
• Retention strategy for best customers (clusters 2, 5, and 7)
  − A business does not want to lose its best customers.
• Cross-sell strategy for clusters 2, 6, and 0, by contrasting them with
  clusters 5 and 7
  − Clusters 2, 6, and 0 have a product index close to that of clusters 5 and
    7, which have the highest number of products purchased. Because the
    clusters are close in number of products purchased, it is not a big stretch
    to convert customers from clusters 2, 6, and 0. By comparing the products
    bought by the best customers to those purchased by clusters 2, 6, and 0,
    you can find missing products, which are candidates for cross selling.
  − If you could increase the number of products purchased by 10% of cluster 6
    customers by one additional product, you could increase the profitability
    of cluster 7 by 20% and the entire base by 5%.
  − If you could increase the number of products purchased by 10% of cluster 7
    customers by two products, you could increase the profitability of cluster
    2 by 25% and the entire base by 9%.
• You can similarly cross-sell to clusters 3 and 4 by contrasting them with
  clusters 2, 6, and 0, as they are close in value.
• The strategy for cluster 1 would be a wait-and-see plus information strategy.
  − Cluster 1 appears to be a group of new customers. As they are new
    customers, sufficient data has not been collected to determine the
    behaviors they may exhibit. Informing cluster 1 of the products and
    services the business offers could make them profitable quickly.
• The strategy for cluster 8 may be not to spend any significant marketing
  dollars.
  − Cluster 8 appears to be the worst cluster; it has a very low revenue
    percentage and purchases very few products, although it has been with the
    company for about 30 months.
4.4.3.1 Cluster Results Comparison
Intelligent Miner permits the output of one algorithm to be used as the input to
another. Table 2 is a cross-tabulation of the cluster IDs created by the neural
clustering model and the demographic clustering model. The neural network
cluster ID distribution is presented by row, and the demographic clustering
distribution by column. The comparison shows the similarity of the two models.
Table 2. Comparison of Neural and Demographic Clustering Results
(rows: neural cluster IDs; columns: demographic cluster IDs)

            0      1      2      3      4      5      6      7      8    Total
0           3   5306      0      0      7      0      0      0      0     5316
1           2      7      0    183     89      0      1      0    567      849
2           1      8      3    665     21      0      0      0   5182     5880
3        1247      0     37     14    648    533    812    169      0     3460
4           3      0     11   2163    455      1    355      0     45     3033
5           2      0     28   5343     32      0      9      2   1277     6693
6          69      0    744      4      3   3733   4661   4625      0    13839
7         124      0    400   2461     33     99   4707    490      0     8314
8         262      0     34    828    193     43   1189     67      0     2616
Total    1713   5321   1257  11661   1481   4409  11734   5353   7071    50000
The highlighted cells indicate a significant overlap between the two models.
From Table 2 it is possible to conclude the following:
• Neural cluster 0 and demographic cluster 1 agree almost 100%. The agreement
  is not usually this good unless the cluster is very distinct. In this case
  the cluster contains new Bank customers with very little activity. The fact
  that both models agree would allow you to apply this particular cluster with
  confidence.
• The cluster models agree fairly well with each other. The results indicate
  that there are likely more than nine clusters. Rerunning the models with a
  higher maximum number of clusters should result in better agreement between
  the two.
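Such a cross-tabulation is straightforward to reproduce from the scored output;
a minimal pandas sketch (the file and column names are hypothetical):

    import pandas as pd

    # Hypothetical frame holding both cluster assignments per customer.
    scored = pd.read_csv("clustered_customers.csv")

    # Neural cluster IDs by row, demographic cluster IDs by column,
    # with marginal totals, as in Table 2.
    xtab = pd.crosstab(
        scored["neural_cluster_id"],
        scored["demog_cluster_id"],
        margins=True,
        margins_name="Total",
    )
    print(xtab)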
4.4.4 Decision Tree Characterization
One disadvantage of clustering methods is that the cluster definitions are not
easily extracted. Building a decision tree model with the cluster ID as the field
to be classified and using all available input data allows explicit rules to be
extracted for each cluster. The decision tree model built using the demographic
clustering result from above showed an accuracy of 95% (see the confusion
matrix in Figure 25). The confusion matrix shows the distribution of the
classification errors and the global accuracy.
Figure 25. Decision Tree Confusion Matrix
See Figure 26 on page 67 for a view of the decision tree model and a rule for
cluster 5. Rules for each of the clusters can be extracted.
Figure 26. Decision Tree Model
As the accuracy of the decision tree is very high (95%), it is preferable to
implement the decision tree version of the customer segmentation model rather
than the original demographic clustering model.
4.5 Business Implementation and Next Steps
The results of this case study drew several reactions from the Bank executives:
1. Excellent visualization of results allows for more meaningful and actionable
analysis.
2. The original segmentation methodology was validated very quickly.
3. Refinement to the original segmentation is indicated and meaningful.
Based on the results of this case study, several data mining opportunities were
identified, and several projects were undertaken. Some of these projects
include:
• Several predictive models for direct mail targeting
• Further work on segmentation, using more detailed behavioral data
• Opportunity identification using association algorithms within the segments
  discovered
Data mining tools can be used to quickly find business opportunities in customer
transaction data. The simple example presented herein attempts to highlight a
process that can be used to achieve profitable data mining results.
Once a segmentation model is built and the customer is satisfied with the result,
the model is ready to be implemented. The first step in the implementation is to
integrate the model into the data warehouse and to modify the data warehouse
load process to automatically assign customers to the appropriate segments.
The variables used in the final segmentation model should be calculated and
stored in the data warehouse permanently. A data warehouse table should be
created to track each customer over time and record which segment the
customer was a part of in each time period. Such a table is very useful for
analytical purposes and can be used to measure the overall effectiveness of
marketing campaigns by observing their effect on customer behavior over time.
The segmentation model should also be rebuilt periodically (in our experience,
from monthly to annually depending on the organization). A comparison of
segmentation models over time should reveal changing market dynamics and
changing customer behavior due to an organization's marketing efforts, changing
products and services, and social, political, and economic changes.
When the segmentation model has been implemented in the data warehouse, it
is possible to begin using it to drive actionable business activities. The
customer segment information can be used in operational data stores to support
continuous marketing and other operational activities, to create standard
reports highlighting the shareholder value, demographic profiles, and
transaction behavior of each segment, and as a framework to support opportunity
identification.
The next case study explores the use of affinity analysis within the segmentation
model defined herein, to find profitable cross-selling opportunities.
Chapter 5. Cross-Selling Opportunity Identification
Using Intelligent Miner's product association algorithms to identify a
cross-selling opportunity that is actionable and profitable is the topic of
this case study. It is based on the customer segments derived from the first
case study, whose strategic initiative is to increase profitability.
5.1 Executive Summary
The business requirements for this case study are to identify cross-selling
opportunities for the customer segments defined in Chapter 4 and ensure that
the opportunities discovered adhere to the corporate objectives.
Customer purchase transactions or billing data are required to perform product
associations. We used the Bank's data warehouse to analyze transaction data,
thereby reducing the data preparation requirements for the case study.
A review of the cross-selling process using association discovery was presented
in sufficient detail for technical analysts to be able to reproduce the project using
their own data.
The cluster selected for cross-selling opportunity represented 7% of revenue for
23% of customers. The target behavior cluster represented 26% of revenue for
23% of customers (see Figure 17 on page 51 or Table 1 on page 64 for the
corresponding clusters). Changing the behavior of 10% of the cross-selling
cluster to that of the target cluster would represent a 25% increase in the
cluster's profitability, or a 3% increase in the overall profitability of the business.
A credit card product category was identified as a cross-selling opportunity.
Several specific products within the credit card category were also identified.
The selection of these products was driven by the fact that the Bank's
executives, based on previous analyses, knew that credit card products were
very profitable. Our analyses also revealed that these products have the highest
profit potential opportunity. Confirmation of the business intuition provided
additional confidence in proceeding with a campaign. Although the data mining
analysis simply confirmed the business intuition, it provided quantitative results
and a specific target group, both of which were previously missing.
The recommended next steps include some demographic profiling of the target
customer group to assist the marketer in creating appropriate advertising and
marketing messages as well as selecting marketing channels. To further refine
the target group, we also recommend the construction of a predictive model,
which is the content of a later case study (see Chapter 7, “Attrition Model to
Improve Customer Retention” on page 111).
5.2 Business Requirement
The main objective of this case study was to use data mining techniques to find
actionable cross-selling opportunities from the analysis of customer transaction
data. Any opportunities that are identified should support strategic marketing
initiatives for the customer segments used by the organization. The
segmentation and the strategic initiatives recommended from the previous case
study in Chapter 4, “Customer Segmentation” on page 33 should be used.
Finally, the next steps required to implement the cross-selling opportunities as a
marketing campaign should be recommended.
5.3 Data Mining Process
Figure 27 on page 71 highlights the data mining process implemented in this
case study to meet the business requirements. The major steps in the process
are:
1. Cluster (segment) selection
2. Transaction data selection
3. Data preparation
4. Product association mining
5. Results analysis
6. Compare to identify cross-selling opportunities
7. Compare methodology
8. Select a cross-selling opportunity
We cover topics 1 through 4 in this section. We cover the other topics in 5.4,
“Data Mining Results” on page 76.
Figure 27. Data Mining Process: Cross-Selling Opportunity
5.3.1 Cluster Selection
The process to find cross-selling opportunities within a specific customer
segment depends on contrasting the purchase behavior of two or more clusters.
(The method discussed here is not the only method of finding cross-selling
opportunities.) One cluster is selected to be the group of customers whose
behavior is to be replicated in other clusters; this cluster is usually the
more profitable one. For cross-selling opportunity identification, the
purchase behaviors of interest are represented as product associations derived
from purchase transactions. Comparing the patterns of two or more clusters
highlights product pattern differences. For instance, if cluster A had a product
association, (A,B-->C) and cluster B had a product association (A-->B), the
cross-selling opportunity would be to market product C to cluster B. The
behavior of the clusters being contrasted should not differ significantly. The
gap in product holdings should be no more than about one to three products,
because it is difficult to change customer behavior drastically. Furthermore,
too large a gap between clusters could indicate fundamentally different
customer behaviors that would be impossible or difficult to bridge.
5.3.2 Data Selection
The data required to create product associations is customer transaction data,
market basket data, or any other data that has a similar layout. Figure 28
illustrates a typical transaction record used in a data mining project as input.
Figure 28. Typical Transaction Record
For market basket analysis the customer is usually not known, and product
associations are found directly from the transaction or market basket data. In
this case study the customer is known, and it is therefore possible to link
customer transactions over time; this is much more powerful than an analysis of
market baskets without the customer ID.
The following considerations are important in the selection of transaction data
for association rule mining:
• Time window of transactions
• Level of product aggregation
• Definition of product activity
The selection of a time window for the transactions is driven by the product
purchase cycle. We typically choose 2 to 4 product cycles, a range that has
produced positive results. The average purchase cycle can be determined by
query analysis of customer purchase transactions. (If the customer ID is not
known, the product cycles must be determined by empirical or survey methods.)
For frequently purchased products, a short time window is sufficient. A long time
window and hence more transaction records are required for low frequency
items. It is typically more difficult to find patterns in low frequency items
because of the amount of data and the prevalence of too many product cycles of
high-frequency-item transactions. To find patterns between low frequency items,
we recommend removing the transaction line items for all high-frequency
products. If the objective is to find patterns between low- and high-frequency
purchases, there is no choice but to use the long time window and all
transaction details.
For this case study we selected a 12-month window of customer transaction
data. This implies that the patterns or associations discovered will be for
customer purchases that occur with a frequency of six months or less.
Another important consideration is the level of aggregation chosen for product
definition. If product codes are too specific (that is, they are based on product
details like size and flavor groupings), fewer associations will be discovered.
The associations discovered will also be less actionable because of the
specificity required in a promotional advertisement. A product taxonomy or
hierarchy is usually helpful in guiding the selection of product definition.
For this case study we used product categories, which resulted in a reduction in
the number of possible product codes from more than 130 to 13.
A final consideration important to product association analysis is the definition of
what constitutes a product purchase. This is more relevant when the customer
ID is known. If all products that were purchased only once over time are
included, more product patterns will be discovered, some of which may not be
very strong (that is, have lower confidence). Setting some minimum criterion for
inclusion of a particular product for a particular customer should reduce the
number of weak rules, and thus permit easier analysis. A threshold may be to
consider only products that have been purchased more than once by a customer,
or products on which the customer has spent some minimum amount of money.
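A sketch of such a minimum-activity filter (the extract layout, column names,
and the $100 threshold are all hypothetical):

    import pandas as pd

    # Hypothetical transaction extract: one row per purchase line item.
    tx = pd.read_csv("transactions.csv")  # assumed: customer_id, product_id, amount

    # Keep a product for a customer only if that customer purchased it more
    # than once, one of the thresholds suggested above.
    tx["n_purchases"] = tx.groupby(["customer_id", "product_id"])["amount"].transform("size")
    tx_filtered = tx[tx["n_purchases"] > 1]

    # Alternative: require a minimum spend rather than a minimum count.
    tx["spend"] = tx.groupby(["customer_id", "product_id"])["amount"].transform("sum")
    tx_by_spend = tx[tx["spend"] >= 100]  # hypothetical $100 threshold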
5.3.3 Data Preparation
Transaction data for organizations that generate revenue through customer
billing is typically very "clean." Such industry sectors include finance,
telecommunications, insurance, and utilities. In these industries the
transaction data to be analyzed is actually billing data. Very little data
preparation is required to perform product association analysis for these
industry sectors. The preparation activities conducted would typically include:
• Ensuring that product codes are consistent
• Adding product hierarchy information
• Creating new product hierarchy levels
Product IDs that reference the same product should be made consistent.
Variations in the product ID could result from the use of different codes in
different stores or regions, code changes due to supplier changes, new coding
systems being implemented, and errors. If the product IDs are not made
consistent, the support for patterns, and hence the number of patterns
discovered, will be lower.
The product codes in the customer transaction data used for this case study
were consistent, with only a few exceptions.
Adding in the product hierarchy information (as illustrated in Figure 28 on
page 72) allows the product association mining to be easily conducted using
different levels of product definition. The Bank′s data warehouse already
contained the product hierarchy information on each transaction record,
obviating this step in the process.
The final data preparation activity that may be required is the manual creation of
new product hierarchy levels. This activity is required when too few patterns are
discovered as a result of product definitions that are too specific. In such cases,
we recommend using a higher level in the product hierarchy. If, however, using
the higher level results in rules that are too general and hence difficult to act on,
the creation of an intermediate layer is required. This process could be very
laborious if the number of possible products is large. (Most industries have
several hundred products, except retail, which may have tens of thousands!)
The creation of the product hierarchy for the Bank′s data was based on past
analytical experience and was therefore appropriate for analysis without
modification.
5.3.4 Product Association Analysis
Figure 29 illustrates the steps required to discover product associations:
1. Parameter Settings
2. Association Discovery
3. Profile Rules and Large Item Sets (LIS)
4. Selectively Remove Large Item Sets
5. Iteration back to step 1
6. Rebuild Rules
Figure 29. Product Association Analysis Workflow
5.3.4.1 Parameter Settings
The first step in setting up an association run is to select the algorithm
parameters. The parameters available include:
• Minimum support
  This is the minimum frequency of occurrence of a pattern required for a rule
  to be considered.
• Minimum confidence
  This is the minimum conditional probability of the rule tail given the rule
  head required for a rule to be considered.
• Maximum rule length
  This is the maximum number of products allowed in any rule to be considered.
• Item constraints
  This is a list of items that all rules must contain in order to be
  considered.
Starting with values that are too low for support and confidence may cause
unnecessary computational load. The association algorithm is very memory and
CPU intensive as the number of products and the number of rules considered
grow. We recommend choosing very high values for support and confidence (50%
for both) and gradually lowering them until the number of patterns becomes
unwieldy; the sketch below illustrates this loop. We usually leave the
confidence level at 50% to eliminate most of the permutations of rules that
meet the minimum support criterion. For example, if rule (A-->B) meets the
support criterion, so does rule (B-->A); if the original rule meets the
confidence criterion, the permutation usually does not. Reducing the
permutations results in fewer rules and simplifies results analysis. We never
limit the maximum rule length or constrain the list of items within the
algorithm. If certain items are not to be considered, it is more convenient to
remove them from the transaction records.
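To illustrate the start-high-and-lower-gradually loop outside Intelligent
Miner, here is a sketch using the open-source mlxtend library as a stand-in
(the basket file and its boolean product columns are hypothetical):

    import pandas as pd
    from mlxtend.frequent_patterns import apriori, association_rules

    # Hypothetical basket matrix: one row per customer, one boolean column
    # per product category.
    baskets = pd.read_csv("baskets.csv").astype(bool)

    # Start with high support and lower it gradually, watching the rule count.
    for min_support in (0.5, 0.4, 0.3, 0.2):
        itemsets = apriori(baskets, min_support=min_support, use_colnames=True)
        if itemsets.empty:
            print(f"support={min_support}: no frequent itemsets")
            continue
        # Confidence stays fixed at 50% to prune most rule permutations.
        rules = association_rules(itemsets, metric="confidence", min_threshold=0.5)
        print(f"support={min_support}: {len(rules)} rules")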
5.3.4.2 Association Discovery
Association discovery is run separately for each of the clusters selected for
contrasting, using the parameter settings chosen above. The rules and large
item sets discovered for each cluster are the raw material for the comparison
described in 5.3.4.4, “Profile Rules and Large Item Sets.”
5.3.4.3 Selectively Remove Large Item Sets
Having determined the parameter bounds, you can discover the association
rules. The number of rules generated initially is usually very large and
intimidating. Rather than changing the parameter settings at this point, it is
possible to begin temporarily removing certain products from the transaction
records. The products removed are LIS. There are two types of LIS:
1. Large item sets whose frequency in the entire transaction data universe is
statistically equivalent to the frequency in the current data set. These items
complicate the analysis and should be removed to achieve less complicated
rules that are easy to analyze.
2. Large item sets whose frequency in the entire transaction data universe is
statistically different from the frequency in the current data set. These items
are the items that make up the patterns discovered.
After the first type of LIS is removed from the transaction data, the associations
are rediscovered. If the number of rules is still unmanageable, begin removing
the second type of LIS, noting carefully what is removed. The associations are
again rediscovered. This process is repeated until the number of rules is
manageable (usually 20 to 50 rules). Removing the LIS allows you to understand
the "structure" of the rules. The remaining 20 to 50 rules at this point form
the core of the rules. The initial unmanageable set of rules is created by
permuting the LIS with the remaining rules and applying the support and
confidence criteria.
5.3.4.4 Profile Rules and Large Item Sets
This step is repeated for all the clusters that were selected for contrasting. The
minimum set of rules in the two cases is compared to identify products not
present in some of the clusters. The list of removed LIS is also compared to
identify products not present in some clusters. These missing products are the
cross-selling candidate opportunities. The acceptance of candidates as
actionable opportunities is usually driven by the number of customers that
bought the missing product. Too small a group of customers will have too little
return to justify the promotional investment.
5.3.4.5 Rebuild Rules
Once you have determined the candidate opportunities, it is important to
reconstruct the original and actual rules present in the data. Removing items
from the transaction data affects the statistics of the rules, so to get
accurate statistics, the LIS must be returned to the data. Adding the LIS back
one by one and observing the change in the discovered associations will give
useful insight into the rule "structure."
5.4 Data Mining Results
As mentioned earlier, the result of the demographic clustering process
described in our first case study is used for this case study. Identifying
opportunities for cross selling is a two-step process:
1. From the customer segments created earlier, select the clusters containing
valuable customers.
2. Perform product association discovery on the selected cluster data.
5.4.1 Cluster Selection
We created Table 3 on page 77 using the result of the demographic clustering,
enhanced by some data we selected through query analysis against the original
cluster data.
Table 3. Demographic Clustering Results: Percentage

Cluster ID   Profit     Customer   RevenueL12   Product Index   Leverage (Profit/Cust)   Tenure
5            34.74%      8.82%     32.83%       1.77            3.94                     60.92
6            26.13%     23.47%     28.36%       1.41            1.11                     57.87
7            21.25%     10.71%     20.10%       1.64            1.98                     63.52
3             6.62%     23.32%      5.98%       0.73            0.28                     47.23
0             4.78%      3.43%      6.78%       1.45            1.40                     31.34
2             4.40%      2.51%      3.00%       1.46            1.75                     61.38
4             1.41%      2.96%      2.46%       0.99            0.48                     20.10
8             0.45%     14.14%      0.47%       0.36            0.03                     30.01
1             0.22%     10.64%      0.01%       0.00            0.02                      4.66
Total       100.00%    100.00%    100.00%
The two clusters chosen for further study in this case study were clusters 3
and 6. From Table 3 you can see that cluster 6 represents a profitable customer
segment, with 26% of revenue represented by 23% of customers. In contrast,
cluster 3 represents only 7% of revenue for 23% of customers. The number of
products used by cluster 6 customers (indicated by the product index) is
greater than that for cluster 3; query analysis reveals that the difference is
on average two products. Furthermore, cluster 6 has a slightly longer tenure.
These two clusters were chosen because of the sizeable opportunity and the
small gap in purchase behavior between them.
5.4.2 Association Rule Discovery
We initially performed product association discovery on the selected cluster
data, using the Intelligent Miner parameter settings illustrated in Figure 30 on
page 78.
Figure 30. Parameter Settings for Associations
5.4.2.1 Association Results for Cluster 6
Figure 31 shows that associations for the entire Good Customer Set returned
many (2,218) rules.
Figure 31. Associations on Good Customer Set
Figure 32 on page 79 shows that the frequent item sets include loan (94%),
mortgage (90%), and credit card (79%).
Figure 32. Associations on Good Customer Set Detail
Figure 33 shows the associations when loan, mortgage, and credit card are
removed. Note that the number of rules has been reduced to 286.
Figure 33. Associations for Good Customer Set: LIS Removed
There are many multiple-item (more than four items) rules in the Good Customer
Set (see Figure 34 on page 80).
Figure 34. Associations for Good Customer Set: LIS Removed, Detail
5.4.2.2 Okay Customer Set
Figure 35 shows the associations for the entire Okay Customer Set. Many (212)
rules have been generated.
Figure 35. Associations on Okay Customer Set
Figure 36 on page 81 shows that the frequent item sets include loan (70%),
mortgage (70%), and credit card (24%). The substantially lower frequency of
credit card activity in cluster 3 represents a cross-selling opportunity.
Figure 36. Associations on Okay Customer Set Detail
Figure 37 shows the associations when loan and mortgage are removed. Note
that the number of rules has been reduced to 48.
Figure 37. Associations for Okay Customer Set: LIS Removed
No rules in the Okay Customer Set contain more than four items. Further
detailed comparison of the association rules will reveal other cross-selling
opportunities. The largest cross-selling opportunities are revealed by
differences in the large item sets.
5.4.2.3 Association Rules Discovery: Product Detail Level
So far, all of the associations have been processed on product categories that
summarize products. A comparison of Figure 38 on page 82 (and Figure 39 on
page 82) with Figure 33 on page 79 (and Figure 34 on page 80) shows what
happens when associations are run on a more detailed level. With low-level
products instead of product categories specified, associations exploded from 286
to 1,521 for the Good Customer Set.
5.4.2.4 Good Customer Set
Figure 38. Associations for Good Customer Set: LIS Removed, Summary
Figure 39. Associations for Good Customer Set: LIS Removed, Detail
Figure 40 and Figure 41 on page 83 show the results of associations when, in
addition to all types of loans, mortgages, and credit cards, we also removed
frequent account types from the Good Customer Set customer product sets. Note
that the number of rules has been reduced to 55.
Figure 40. Associations for Good Customer Set: LIS and Certain Products Removed, Summary
Figure 41. Associations for Good Customer Set: LIS and Certain Products Removed, Detail
5.4.2.5 Okay Customer Set
Figure 42 and Figure 43 show the results of associations when transactions
containing large item sets are removed from the Okay Customer Set. The
number of rules has been reduced from 513 to 15.
Figure 42. Associations for Okay Customer Set: LIS and Certain Products Removed, Summary
Figure 43. Associations for Okay Customer Set: LIS and Certain Products Removed, Detail
The difference in large item sets removed for cluster 6 and cluster 3 reveals the
opportunity to cross-sell web banking to cluster 3. Furthermore, a lower
frequency of occurrence of term deposits in cluster 3 reveals another substantial
opportunity. Further detailed comparison of the differences in the rules
generated in cluster 6 and cluster 3 will reveal further cross-selling
opportunities.
5.4.2.6 Association Rule Discovery for the Entire Universe
Figure 44 and Figure 45 show the association rule results discovered from
mining against all customer transaction records without using segmentation.
Product categories were used in this example.
Figure 44. Associations for All Transactions: LIS Removed, Summary
Figure 45. Associations for All Transactions: LIS Removed, Detail
The number of rules in this case is 480 compared to 286 from cluster 6. The
increased number of rules results in a more complex analysis. Furthermore, the
lack of segment objectives makes it difficult to know what to search for.
5.5 Business Implementation and Next Steps
Several cross-selling opportunities were identified. At the product category
level, cross-selling CREDIT CARD to cluster 3 was the best opportunity. The
strategic initiative for cluster 3 derived in the segmentation case study was
to identify cross-selling opportunities. This CREDIT CARD opportunity is thus
consistent with the segment objectives.
With previous methods and common business experience, the Bank had recognized
that cross-selling CREDIT CARD to its customer base was an objective. The
method presented in this case study provides the additional benefit of
targeting CREDIT CARD to customer segments that have high shareholder value.
Using more detailed product definitions revealed several specific product
cross-selling opportunities. These included cross-selling term deposits and web
banking to cluster 3.
Before the cross-selling opportunity can be implemented, several activities must
be completed. Some demographic profiling or clustering of the target universe
is required to assist the marketer and advertiser in creating the appropriate
marketing message and selecting the appropriate marketing channel. It is also
not efficient to target the entire group for this cross-selling campaign. Other
factors must be considered to target those customers most likely to use a credit
card. Building a predictive model to target those customers most likely to use or
require a Credit Card as well as targeting those customers most likely to pass a
credit check would further reduce the mailing cost in executing this campaign.
The creation of a predictive model to target these customers is the topic of the
next case study.
Chapter 6. Target Marketing Model to Support a Cross-Selling Campaign
In this case study, we build three predictive models to target those customers
likely to buy the product identified as a cross-selling opportunity in the
previous case study. Several algorithms from Intelligent Miner are used. The
models built with Intelligent Miner (decision tree, radial basis function (RBF)
regression, and neural network) are compared.
6.1 Executive Summary
In Chapter 5, “Cross-Selling Opportunity Identification” on page 69 we describe
how we identified a new business opportunity: cross-selling credit cards to
existing customers to make them more profitable. The focus was on customers
in the Okay Customer Set. Our strategy was to market the Bank's credit card to
these customers. By getting some of the customers in the Okay Customer Set to
use a credit card we would migrate them to the Good Customer Set and
increase their profitability.
A simple approach would have been to conduct a direct mail campaign to all
customers in the Okay Customer Set, but our limited marketing budget did not
permit that. Furthermore, an additional goal was to reduce the cost per
customer acquisition and thus increase the campaign ROI while maximizing the
number of customers cross-sold.
Using the customers from the Okay Customer Set and those of the Good
Customer Set, we created a data set that we could use to predict which
customers in the Okay Customer Set had a propensity to use a credit card. We
built three predictive models, using the three prediction techniques available
in Intelligent Miner:
• Decision tree
• Value prediction with RBF
• Neural network
A process for predictive modeling was presented to the analysts and the results
from each algorithm were compared.
The neural network model had the best performance. By mailing only 40% of
the total Okay Customer Set, we managed to include 76% of the customers with
the highest propensity to use a credit card. Furthermore, we expected to get an
ROI of 113%, in contrast to the 60% ROI we could have expected by mailing the
entire Okay Customer Set. The higher ROI was achieved by reducing the cost
per customer acquisition from $167 to $88.
Table 4 on page 88 summarizes the financial details.
Table 4. Cross-Selling: Summary - Predictive Modeling More Than Doubles ROI

                                            Without Prediction   With Prediction
                                            Model                Model
Mailed customers                            25,000               10,000
Cost of mailing material and
mailing per customer                        $5                   $5
Total cost                                  $125,000             $50,000
New acquisitions                            750                  570
Cost per acquisition                        $167                 $88
Average profit/year per customer            $100                 $100
Total profit per year                       $75,000              $57,000
Return on investment                        60%                  113%
6.2 Business Requirements
The general objective in this case study was to improve company revenue and
profitability by attracting more customers to the credit card. We specifically
wanted to cross-sell to customers from the Okay Customer Set and, in
successfully doing so, increase their profitability and move them to the more
profitable Good Customer Set. The average profit from a customer who uses a
credit card is $100 per year.
We used a direct mailing campaign to target the best prospects for the credit
card from the Okay Customer Set.
The first task in designing the campaign was to establish a baseline against
which to measure the success of the planned mailing campaign. In other words,
we had to calculate the ROI that would be expected from such a mailing campaign
without any data mining. We calculated the ROI by looking at the historical
trends in the movement of customers from the Okay Customer Set to the Good
Customer Set. Table 5 summarizes the calculations.
Table 5. Cross-Selling: Baseline ROI Calculation

Total number of customers                   25,000
Cost of mailing and creation (per piece)    $5
Total cost of mailing                       $125,000
Expected takeup rate                        734/25,000 = 3% (see Figure 48 on page 94)
Expected new acquisitions                   750 (3% of 25,000)
Cost per acquisition                        $125,000/750 = $167
Average profit per customer                 $100
Total profit per year                       $100 * 750 = $75,000
Return on investment                        $75,000/$125,000 = 60%
The Okay Customer Set and Good Customer Set total approximately 25,000
customers. The expected response to the credit card offer is about 3%, based on
the observed movement of customers from the Okay Customer Set to the Good
Customer Set (see Figure 48 on page 94 for details). The baseline ROI for a
mass marketing campaign is 60%, so it was not feasible to use mass marketing
methods to implement the direct mailing campaign. The specific campaign goals
were to achieve a positive return while moving as many customers as possible
from the Okay Customer Set to the Good Customer Set.
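As a quick check on these figures, the ROI arithmetic can be wrapped in a small
helper (a sketch; the input numbers are taken from Tables 4 and 5):

    def campaign_roi(mailed, cost_per_piece, acquisitions, profit_per_customer):
        """Return (total_cost, cost_per_acquisition, roi) for a mail campaign."""
        total_cost = mailed * cost_per_piece
        cost_per_acquisition = total_cost / acquisitions
        roi = acquisitions * profit_per_customer / total_cost
        return total_cost, cost_per_acquisition, roi

    # Baseline mass mailing (Table 5): 25,000 pieces, 3% takeup.
    print(campaign_roi(25_000, 5, 750, 100))   # (125000, ~$167, 0.60)

    # Targeted mailing (Table 4): 10,000 pieces, 570 acquisitions.
    print(campaign_roi(10_000, 5, 570, 100))   # (50000, ~$88, 1.14; the book rounds to 113%)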
6.3 Data Mining Process
Our general approach was to build predictive models to help identify those
customers in the Okay Customer Set who were the best prospects for using the
credit card. In fact, we used several different predictive techniques in Intelligent
Miner, both as a means of gaining more insight into the target customer set and
as a cross-validation of the different mining algorithms. Figure 46 on page 90
illustrates the overall approach.
Figure 46. Data Mining Process: Cross-Selling
6.3.1 Create Objective Variable
The first step in the predictive modeling process is to determine the objective
variable to be modeled. When building models for targeting direct mail
campaigns, the objective variable is usually based on the customer′s historical
response data to similar campaigns. In this particular case, we did not have a
historical campaign similar enough to the proposed campaign to use to create a
response variable. In practice this situation occurs frequently. An alternative to
building a response model is to build a propensity model. A propensity model
predicts which of the customers who do not currently purchase the product being
cross-sold have a higher likelihood or propensity to purchase the product.
The first step in creating the target or objective variable is to select the
time period under consideration; setting this window correctly is critical.
The size of the time window selected is driven by the time horizon of the
desired prediction. For instance, if the marketing campaign to be executed has
a six-month window for the customer to respond, the objective variable should
be defined with a six-month window.
In this case, the marketing objective was to cross-sell a particular product to a
group of customers targeted for a direct mail campaign that would have a
six-month window of opportunity for the customer to respond. We thus
considered customers who used the credit card product in question in the most
recent six-month period. In fact, we only considered customers who had
activated in the most recent six months, that is, they had never used the
Bank's credit card more than six months ago. This last statement is extremely
important. To create a predictive model we must be able to predict the future
behavior of a customer before that customer exhibits the behavior. If we are to
predict the propensity of a customer to use the Bank's credit card in the next
six months, we must do so using past data for the customer, that is, data from
a period before the customer used the credit card.
We assigned the objective variable a value of 1 if a customer had no credit
card activity before the third quarter of 1997 but had activity in the third or
fourth quarter of 1997. All other customer records were assigned a value of 0
for the objective
variable. To predict those customers who activated in the third and fourth
quarters of 1997, we used the customer transaction records for only the first two
quarters of 1997 (see Figure 47).
The final consideration in creating data for prediction is to ensure that no data
related to the objective variable is used for prediction. For instance, if you are
predicting the credit card profitability of customers, do not use credit card data
because it is one and the same. Profitable credit card customers will have more
activity on their cards, so using credit card activity to predict credit card
profitability is a self-fulfilling prophecy.
Figure 47. Creating an Objective Variable
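A minimal sketch of this flag construction (the file and quarterly activity
columns are hypothetical; only the 1/0 logic comes from the text, approximating
"never used before" with the quarters available):

    import pandas as pd

    # Hypothetical per-customer credit card activity flags for 1997 quarters.
    cust = pd.read_csv("cc_activity_1997.csv")  # assumed columns: q1, q2, q3, q4 as 0/1

    # Objective = 1: no card activity before Q3 1997, then activity in Q3 or Q4.
    no_prior = (cust["q1"] == 0) & (cust["q2"] == 0)
    activated = (cust["q3"] == 1) | (cust["q4"] == 1)
    cust["objective"] = (no_prior & activated).astype(int)

    # Predictor variables are then built from Q1-Q2 transaction data only,
    # so the objective window never leaks into the inputs.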
We also selected customers only from the Okay Customer Set and the Good
Customer Set instead of sampling from the entire customer universe. By
focusing on these customer clusters, both of which were profitable, we
automatically eliminated low-profit customers from the direct mail
campaign. Note that some of the customers in the Okay Customer Set and Good
Customer Set had no activity in the last 12 months with the credit card we were
cross-selling. It is important to have a mix of target groups and non-target
groups in order to develop a model that can distinguish between the two
extremes.
We specifically chose three different types of data to use for the predictive
modeling problem:
1. Customer transaction data
Transaction data includes revenue, the number of transactions, and recency
of transactions from each Bank product category by time period (in this case
by quarter). The Bank's data warehouse categorizes products into 14 groups
as explained in 4.3.1, “Data Selection” on page 36. We therefore created a
total of 84 variables (3 variables * 14 categories * 2 quarters).
2. Customer demographic and summary data
This data consists of demographic data including customer age, gender,
income, and household size. The Bank's data warehouse also contains
summarized transaction data including total revenue lifetime to date (LTD),
number of products used LTD, total transactions LTD, recency, and first
transaction date.
3. Third party census and tax data
In Canada the government permits the reselling of census and tax data. This
data is aggregated to the enumeration area, which contains 300 to 400
households. A variety of data is available, including profession, ethnicity,
education, income, and income by source.
Promotion history data is also typically used when building response models
for direct mail campaigns. In this case, promotion history was not available
for the particular product we were targeting, so the model built is not a
response model. (The credit card we identified as a cross-selling opportunity
was a card with new features that had not been marketed via direct mail
previously.) It is a propensity model, which predicts the customer's propensity
to use a credit card.
6.3.2 Data Preparation
We performed two types of data preparation: data cleaning and data transformation.
6.3.2.1 Data Cleaning
The data cleaning required for predictive modeling is similar to data cleaning for
clustering as discussed in 4.3.2, “Data Preparation” on page 38. The only
difference is that more care is required in assigning values to missing records.
In choosing a value to assign, the resulting distribution of the variable in
question should not be drastically altered. Ideally you want to assign values that
do not change the characteristics of the distribution (for example, the min, max,
and mean). If it is not possible to assign values without dramatically altering a
variable′s distribution, discard that variable to avoid spurious correlations.
We assigned all transaction variables that had missing values a value of zero.
Such an assignment is appropriate as an absence of transaction activity (null in
the database) implies zero activity. We discarded demographic data with
missing values if the missing portion was significant. Also, we created binary
variables indicating the missing portion of all categorical variables.
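These cleaning rules translate directly into a few lines of data preparation code.
The sketch below is an illustration only; the 40% discard threshold and the column
names are assumptions made for the example, not values prescribed by the
methodology.

   import pandas as pd

   df = pd.read_csv("customers.csv")  # hypothetical extract of the mining table

   # Nulls in transaction columns mean "no activity", so impute zero.
   txn_cols = [c for c in df.columns if c.startswith("txn_")]
   df[txn_cols] = df[txn_cols].fillna(0)

   # Discard demographic variables whose missing portion is significant
   # (the 40% threshold is illustrative).
   for col in ["age", "gender", "income", "household_size"]:
       if df[col].isna().mean() > 0.40:
           df = df.drop(columns=col)

   # Binary indicator for the missing portion of each categorical variable.
   for col in df.select_dtypes(include="object").columns:
       df[col + "_missing"] = df[col].isna().astype(int)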
6.3.2.2 Data Transformation
After we cleaned the data, handled all missing and invalid values, and made the
known valid values consistent, we transformed the data to maximize the
information content that can be retrieved.
For statistical analysis the data transformation phase is critical, as some
statistical methodologies require that the data be linearly related to an objective
variable, normally distributed, and free of outliers. Artificial intelligence
and machine learning methods do not strictly require the data to be normal or
linearized, and some methods, like the decision tree, do not even require
outliers to be dealt with. This is a major difference between statistical analysis
and data mining. The machine learning algorithms can automatically deal with
the nonlinearity and nonnormal distributions, although the algorithms work better
in many cases if these criteria are met. A good statistician with a lot of time can
manually linearize, standardize, and remove outliers better than the artificial
intelligence and machine learning methods. The challenge is that with millions of
data records and thousands of variables, it is not feasible to do this work
manually. Also, most analysts are not qualified statisticians, so using automated
methods is the only reasonable solution.
After cleaning the original data variables, we created new variables using ratios,
differences, and business intuition. We created total transaction variables, which
were the sum of the transaction variables over two quarters. We used these
totals as normalizing constants to create ratio variables. We created time-series
variables to capture the time difference in all transaction variables between
quarters.
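As a hedged illustration, these derived variables might be built as follows; the
q1_/q2_ column-name convention is assumed for the example.

   import pandas as pd

   df = pd.read_csv("transactions_by_quarter.csv")  # one row per customer

   for q1 in [c for c in df.columns if c.startswith("q1_")]:
       q2 = q1.replace("q1_", "q2_")
       base = q1[3:]  # for example, "savings_revenue"

       # Total over the two quarters, used as a normalizing constant.
       df["tot_" + base] = df[q1] + df[q2]

       # Ratio variable normalized by the two-quarter total.
       df["ratio_q2_" + base] = df[q2] / df["tot_" + base].replace(0, pd.NA)

       # Time-series difference between quarters.
       df["diff_" + base] = df[q2] - df[q1]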
The data set for predictive modeling is almost identical to that created for a
clustering model except that more care is taken in the data cleaning and data
transformation processes. In addition, it is important to remove all collinearities
from the input variables before you execute any algorithms. Variables that are
collinear cause most data mining algorithms difficulty and worsen model
performance. Collinearities can be removed by using one of the following:
• Correlation analysis
• Principal component analysis
• Regression
Removing collinearities is especially important when you use RBF and the neural
network. One of the assumptions made in the back-propagation algorithm is that
the input variables are linearly independent. If this is not true, it may take a long
time to train the neural network, and the results may be poor depending on how
correlated the inputs are.
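One simple way to remove collinear inputs, sketched below under stated
assumptions, is a pairwise correlation filter: compute the correlation matrix and
drop one variable from each highly correlated pair. The 0.9 cutoff is an
illustrative choice, not a threshold mandated by the methodology.

   import pandas as pd

   def drop_collinear(df: pd.DataFrame, threshold: float = 0.9) -> pd.DataFrame:
       """Drop one variable from each pair whose |correlation| exceeds threshold."""
       corr = df.corr().abs()
       to_drop = set()
       cols = list(corr.columns)
       for i, a in enumerate(cols):
           for b in cols[i + 1:]:
               if a not in to_drop and b not in to_drop and corr.loc[a, b] > threshold:
                   to_drop.add(b)  # keep the first variable, drop the second
       return df.drop(columns=sorted(to_drop))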
6.3.3 Data Sampling for Training and Test
Finally, we took a sample of the data for training and testing the Intelligent Miner
prediction algorithms (see Figure 48 on page 94).
Figure 48. Cross-Selling: Data Sampling
It′s important to create both a training data set, which is used to build the model,
and a test or hold-back data set, which is used to test the model. A model should
be tested against data that it has never seen before to ensure that there is no
overfitting. In this case we were trying to build a model that would predict a
binary outcome (that is, the customer propensity to a particular product). In the
customer universe of the Okay Customer Set and the Good Customer Set, we
sampled approximately 23000 records. The distribution of a positive event (that
is, customer used a credit card in second half of 1997 but not before) was 734
records out of the 23000. The minimum number of positive events required to build
a predictive model is approximately 250. (We chose that number on the basis of
our experience.) On the basis of this distribution, we randomly split the
entire file into two equal sized data sets as shown in Figure 48. One data set
was to be used for testing and was left as is.
The training portion of the data set was further sampled to create a 50/50
distribution of the target variable. This is known as stratified sampling. When
the distribution of the target to non target is less than 10%, stratified sampling
tends to improve the modeling results. The stratified sample data set size is
usually driven by the number of positive events, but when the number of records
becomes small, as in this case, it is important to consider the sample size of the
non target or negative events relative to the entire universe. To avoid sample
bias, the sample of non targets should not be too small. If sample bias is a
concern, it is possible to distribute the target to non target events unevenly (for
example, 20/80) or to duplicate records with positive events to permit a larger
non target sample. To consider these effects, we recommend creating multiple
training data sets with different target and non target distributions to ensure
valid samples and to maximize model performance.
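A minimal sketch of the 50/50 stratified training sample, assuming the binary
objective column derived earlier:

   import pandas as pd

   train = pd.read_csv("training_half.csv")  # the 50% split reserved for training

   pos = train[train["objective"] == 1]
   # Sample an equal number of negatives to get a 50/50 target distribution;
   # the random_state is fixed only to make the example reproducible.
   neg = train[train["objective"] == 0].sample(n=len(pos), random_state=42)

   stratified = pd.concat([pos, neg]).sample(frac=1, random_state=42)  # shuffle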
In this case study we simply explored the 50/50 case for the sake of brevity, even
though the non target sample is small. (The wavy results in the gains charts in
Figure 51 on page 105 could be a symptom of these effects.)
6.3.4 Feature Selection
If the number of variables in the training and test data sets is very large (that is,
greater than 300), it is useful to reduce the number of variables before building
any models. Feature selection, the process of selecting a subset of variables
most correlated to the target variable from a larger set of variables, is an entire
discipline in itself. Here we simply mention some of the methods for selecting
variables:
• Linear and nonlinear correlation
• Principal components analysis
• Factor analysis
• Regression
• Decision trees
Most problems that we have worked on had over 1000 variables at the outset,
and feature selection was a part of the predictive modeling process.
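As a simple illustration of the first method in this list, the sketch below ranks
candidate features by their absolute linear correlation with the target; the column
names and the top-50 cutoff are assumptions made for the example.

   import pandas as pd

   df = pd.read_csv("training.csv")  # hypothetical mining table with "objective"

   # Rank numeric candidate features by absolute correlation with the target
   # and keep the top 50 (an illustrative cutoff, not a prescribed one).
   features = df.drop(columns=["cust_id", "objective"]).select_dtypes("number")
   ranking = features.corrwith(df["objective"]).abs().sort_values(ascending=False)
   selected = ranking.head(50).index.tolist()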
6.3.5 Train and Test
We used several methods to build the predictive model after preparing the data:
• Classification using a decision tree
• Value prediction with RBF regression
• Classification with a back-propagating neural network
Figure 49 on page 96 outlines the detailed steps in running a predictive
modeling algorithm.
Figure 49. Detailed Predictive Modeling Process
6.3.5.1 Algorithm Selection
The first algorithm that is usually used to build a model is the decision tree.
There are two reasons for using a decision tree:
1. The tree is very good at finding anomalies in the data. The first half dozen
runs typically fine tune the data preparation. The decision tree discovers
missed details.
2. The tree can also be used as a data reduction tool. It typically reduces the
number of variables by one order of magnitude if a few hundred variables
are input into the algorithm. The tree algorithm is very scalable, and
performance is not hampered by several hundred variables. Selecting the
variables that are in the tree model as input to the value prediction and
neural classification algorithms improves their accuracy and performance.
The tree is used to create a reduced set of variables. The top 10 to 20 variables
are selected from the tree according to the position and number of occurrences
in the tree (that is, the higher up in the tree a variable occurs and the more
times it occurs, the more significant it is). Value prediction with RBF requires a
reduced set of variables because the algorithm does a clustering pass before
building the predictive model. The more variables present, the more difficult it is
to get good clusters and the worse the results. If you use linearly independent
input variables, such as those created by principal components analysis, RBF
can handle many more variables. Creating training and test data sets with
Principal Component Analysis factors can improve the accuracy of both RBF and
the neural network.
The neural network also requires a reduced set of input variables. The major
concern in using the neural network is the algorithm run time. This is the
reason for selecting a reduced variable set.
6.3.5.2 Parameter Selection
After completing the data preparation and selecting the data set to be mined (in
this case the training data set first), you have to select the algorithm parameters.
Set the basic parameters first and, if you are an advanced user, you can set
advanced settings to be different from the defaults.
Decision Tree — For the decision tree the parameters available for selection are:
• Maximum tree depth
The maximum tree depth sets the maximum number of levels to which the
tree can grow. We typically leave this at no limit. When no limit is chosen,
the algorithm fits the data and then prunes back the tree, using minimum
description length (MDL) pruning. If you want to prevent overfitting or limit
the complexity of a tree, set this limit.
• Maximum purity per internal node
The maximum purity per internal node sets a limit for the purity beyond
which the tree will no longer split the data. We typically leave this at 100%,
which allows the tree to fit the data before pruning. If you are concerned
about overfitting, choose a lower value.
• Minimum number of records per internal node
The minimum number of records per internal node sets a minimum number
of records required per node. We typically set this parameter to 50. If a node
contains at least 50 records, the resulting rule is likely to be statistically
significant.
Value Prediction with RBF — For the RBF algorithm the parameters
available for selection are:
• In-sample size
In-sample size is the number of consecutive records assigned to the training
data set before out-sample records are assigned to the cross-validation data
set. The ratio of in-sample size to out-sample size is the same as the ratio
of the training to cross-validation data sets. A cross-validation data set is
used to test the accuracy of the model between successive passes through
the data and model iterations. Cross-validation is used to choose the best
model and to minimize the likelihood of overfitting. Although this algorithm
has cross-validation, we strongly recommend that the model be tested
against the hold back test data set. The in-sample to out-sample ratio is
driven by the number of positive target events. You would like to have at
least 250 positive target events in the training or in-sample. If this criterion
is met, we usually use an 80/20 split, where the in-sample data set is the
larger data set.
• Out-sample size
Out-sample size is the number of consecutive records assigned to the
cross-validation data set.
• Maximum number of passes
Maximum number of passes is the maximum number of passes the
algorithm makes through the data. This is a stopping criterion for the
algorithm (that is, if the algorithm has not achieved its accuracy criterion, it
will continue to run until it has made the maximum passes through the
data). We usually start with 25 passes. If the algorithm uses fewer, the value
chosen was good. If the algorithm stops at 25 passes, we recommend
doubling the number of passes until the accuracy result is achieved before
the maximum passes or until it seems that the accuracy criterion will not be
achieved no matter how high the number of passes.
• Maximum number of centers
Maximum number of centers is the maximum number of gaussian regions
that will be built by the model. If this value is set to zero, the algorithm
chooses the number of centers to maximize the accuracy.
• Minimum region size
Minimum region size is the minimum number of records that the clustering
portion of the algorithm will assign to one gaussian region. Any gaussians
with fewer than this number of records will be deleted after each pass through
the data. We use approximately 50 records so that the gaussians are
assigned to regions that are statistically significant. If there are not sufficient
data records to set the minimum region size to 50, choose a minimum region
size to get at least 5 to 10 regions in the output.
• Minimum number of passes
Minimum number of passes is the minimum number of passes the algorithm
will take through the data. During these initial passes, the algorithm does
not do cross-validation.
Neural Network — For the neural network algorithm the parameters are:
• In-sample size
The in-sample and out-sample size parameters are used to split the input
data set into a training data set and cross-validation data set exactly as
described above for value prediction with RBF regression. The neural
network uses the cross-validation data set to choose a network architecture
as well as to find the weights that minimize the model root mean square
error. Again it is important to test the model against a hold back test data
set.
• Out-sample size
Out-sample size is the number of consecutive records assigned to the
cross-validation data set.
• Maximum number of passes
Maximum number of passes is a stopping criterion for the algorithm. If the
accuracy and error criteria are not achieved, the algorithm will stop after
taking the maximum number of passes through the data. We use 500 passes
as a starting point and test the effect of increasing the number of passes on
the accuracy.
• Accuracy
Accuracy is a stopping criterion for the algorithm. It is the percentage of
records that the algorithm classified correctly. The accuracy is tested
against the out-sample or cross validation data set.
• Error rate
Error rate is a stopping criterion for the algorithm. It is the percentage of
records that the algorithm classified incorrectly. This is different from the
accuracy rate, because an unknown class is assigned if the network cannot
make a decision. In the predictive modeling case, where you are interested
in simply rank ordering the records, which is different from classification, the
accuracy and error rate of classification are not necessarily important. The
network may have poor accuracy yet still rank order the records correctly.
The network outputs a confidence, which is the actual output of the neural
network that can be used to rank order.
• Network architecture
Using the manual architecture option of IM it is possible to assign the
number of nodes per hidden layer. The neural network can have up to three
hidden layers. The number of nodes in each layer can be selected by
specifying the number in the hidden layer 1, hidden layer 2, and hidden layer
3 parameters. Selecting the default setting or automatic architecture
determination causes the algorithm to iterate several architectures and
choose the best one based on preliminary cross-validation results. Unless
you have some reason to specify an architecture, we recommend using
automated architecture selection. Sometimes the algorithm creates a neural
network with no hidden layers; in this case you may want to force some hidden
layers and compare the results.
• Learning rate
Learning rate can be used to control the rate of gradient descent. The
parameter can range from 0 to 1. Too high a value causes the network to
diverge, and too low a value causes the neural network to train very slowly.
The academic literature recommends a value of 0.1, which seems to work
best in most cases. This is the default setting. If the algorithm is converging
too slowly, you might slowly increase the value of this parameter.
• Momentum
Momentum can be used to control the rate of convergence. It controls the
direction of gradient descent and it is the fraction of the previous direction
that is maintained in the current descent step. The parameter can range
from 0 to 1. Too high a value causes the algorithm to converge very slowly
or not at all as the descent direction is not sufficiently changed. Too low a
value causes very slow convergence as the convergence direction changes
too much, causing the descent direction to ″zig-zag″ across the error
surface. The academic literature recommends a value of 0.9, which is the
default value.
6.3.5.3 Input Field Selection
In this case we selected transaction and demographic data that we had created
as input to the tree algorithm. Refer to 6.3.1, ″Data Definition,″ for a
detailed description. We also included some of the clustering algorithm data in
the tree to test their significance in predicting propensity to use a credit card.
Decision Tree — To rank order the records in order of the customer propensity
to buy a particular product, you must set the objective variable type to discrete
numeric or continuous.
Value Prediction with RBF — The RBF algorithm requires that the objective
variable be continuous. RBF also allows the use of supplementary variables,
which are profiled by model region but not used to build the model. This useful
feature of RBF enables you to immediately profile the model scores. We selected
the top decision tree variables as input to the RBF algorithm.
Neural Network — The neural network requires that the objective variable be
categorical. We selected the top decision tree variables as input to the neural
network algorithm.
6.3.5.4 Output Field Selection
To build a gains chart, the minimum output requirements are, for each algorithm:
• Customer ID
• Objective variable
• Algorithm prediction
If you want to use the output of one algorithm, including its prediction, as the
input to another algorithm, you should output the entire original data set. Having
the scores of multiple algorithms in the same file is useful for comparisons. For
instance, if the tree places certain records in the top decile, and the RBF
algorithm assigns them to the middle decile, it may be possible to correct the
models by creating new variables or altering the training data set to compensate
for this disagreement.
6.3.5.5 Results Visualization
We used three different mining algorithms in this case study. The following
gives an idea of how the results may look for each of the algorithms. We also
show how the most common problems for each algorithm appear within the
results visualization.
Decision Tree — The tree algorithm outputs a summary screen showing the
mean and root mean square error. From this screen it is possible to view both
the unpruned and pruned trees. As discussed earlier, the tree will find all data
anomalies. Symptoms of anomalous trees are:
• One leaf only
This symptom is caused by using a variable in the input data that is perfectly
correlated with the objective variable. This typically occurs with variables such as
dates or customer IDs or other fields that are unique to each customer.
These fields produce a one-to-one mapping to the objective.
• Highly unbalanced tree with only one leaf to one side of the root node
This symptom is also caused by a variable that is highly correlated with the
objective variable.
• Very shallow tree when many input variables are used
This symptom can also be caused by variables that are highly correlated
with the objective.
Reasonable tree visualizations should produce balanced trees with a reasonable
number of levels, depending on the number of input variables. The purity of the
leaf nodes should range from highly pure with either value of the target to leaf
nodes with mixed distributions of the target values.
Value Prediction with RBF — The RBF algorithm outputs a visualization similar to
that of the cluster viewer. The main difference between the two visualizations is
that RBF presents the records by region and not cluster. A record is assigned to
the region to which it has the highest probability of belonging. The visualization
shows the average model score by region and root mean square error by region
as well.
Anomalous results are indicated by these symptoms:
• Visualizations with only one or two regions
This symptom is usually indicative of very strong predictor variables that
mask the effect of other inputs. In this situation look at the decision tree
results to determine whether there are segments at the top of the tree with
the same variables that are most important to the regions. To correct this
situation, remove the strong variables from the chosen input fields and split
the data into multiple files based on the segmentation by the strong
variables as indicated by the tree. It is then possible to run RBF against each
of the separate files, and after scoring, simply append the results into one
file.
• A low ratio of the average score in the top region to the average score in
the bottom region
This symptom is caused by either too many input variables or just poor data
used for the prediction problem.
Good results are indicated by:
• A high ratio (>2-3) between the top and bottom regions′ average prediction
scores
• Several (>5) regions present
Neural Network — The algorithm outputs a confusion matrix that shows the
classification accuracy of the network. The algorithm adds an unknown class to
the possible predicted class set. Anomalous output is indicated by too many
records being classified as unknown. If too many records are unknown, try
increasing the number of passes. The algorithm also outputs a sensitivity matrix
that assigns each input variable a percentage. The percentage indicates how
sensitive the output is to changes in that variable. Anomalous results may occur
if one or a few variables contribute a very large fraction of the sensitivity. These
variables may indicate the presence of segments in the data. If this occurs,
observe the decision tree results and see whether the high sensitivity variables
occur at the top of the tree. If they do, split the data into segments as indicated
by the tree, and train a neural network for each segment. Once each segment is
scored, it is possible to append the results together to analyze.
6.3.5.6 Results Analysis/Refinement
The results of a predictive model that rank orders records are typically displayed
as a gains chart (see Figure 51 on page 105). A gains chart contrasts the
performance of the model with the results achieved by random chance.
Several iterations of the algorithm are executed, varying the parameters. Gains
charts for each run should be compared and studied. In training mode the gains
curves should be perfectly smooth, and the counts of the positive target event by
descending decile should decrease monotonically, without wavering.
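A minimal sketch of such a monotonicity check on scored output follows; the
column names are assumed for illustration.

   import pandas as pd

   scored = pd.read_csv("scored_training.csv")  # cust_id, objective, score

   # Assign deciles by descending model score (decile 1 = highest scores).
   scored["decile"] = pd.qcut(scored["score"].rank(method="first"), 10, labels=False)
   scored["decile"] = 10 - scored["decile"]  # flip so decile 1 is the top

   counts = scored.groupby("decile")["objective"].sum().sort_index()
   print(counts)
   print("monotonically decreasing:", counts.is_monotonic_decreasing)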
6.3.5.7 Run Model against Test Data
To ensure that the model has not overfit the data and to assess the model
performance against a data set that has the same characteristics as the
application universe, the model should be executed against the test data in test
mode. Test mode permits using an existing model to score the records. The
test mode results should be approximately equal to the training results, except
when stratified sampling is used. When stratified sampling is used, the test
mode gains chart should be better for the test data set than the training data set.
The performance of the model prediction by descending decile should result in a
monotonic decrease in the counts of positive target events. Any wavering in the
top deciles of the model that are likely to be mailed should be studied. The
cause of the wavering should be identified and corrected. If the model performs
well against the test data set, it should perform similarly against the application
universe, if both populations have the same statistics. (This point is discussed
further in 6.3.7, “Perform Population Stability Tests on Application Universe” on
page 103.)
6.3.6 Select ″Best Model″
After using gains charts to analyze the model results, you have to explain why
the model is scoring as it is. Perform clustering on the input data, using the
score decile (or other quantile) as a supplementary variable, and observe and
characterize the clusters that appear. If the model is working properly, the
clusters should separate the quantile field. The clusters can be used to explain
the difference between records containing high scores and low scores. Compare
the characterizations of the scores from each of the algorithms to determine
whether the algorithms are observing the same effect or one algorithm is
discovering something the others are not. Use the differences found to
iteratively improve the results of each algorithm.
After having iteratively improved the models, you choose the ″best model″.
Typically the best model has the highest performance as measured by the gains
chart, that is, it rank orders the input records the best. Sometimes, however,
you may choose a model that does not rank order the best, for several reasons:
• The model is easy to explain
Sometimes the best model contains variables that are not easily explained
or are not related to the current business problem, and it may be difficult
to justify its application. It is just as important to be able to explain why a
model works as it is for the model to work well.
• The model agrees with the current business intuition
If the model reflects the current understanding of the factors that affect the
business problem, more confidence can be assigned to the result.
Furthermore, if new learnings appear alongside the confirmed current
understanding, more confidence can be assigned to those learnings. If a
model contains unusual factors that cannot be explained, the model should
not be implemented.
• The model is simple to implement
A simple model with few variables, or one that requires little data processing, is
preferable to more complex models. The implementation of complex models
could result in errors as well as a high tendency to overfit.
6.3.7 Perform Population Stability Tests on Application Universe
After you have selected the ″best″ model, it is crucial to ensure that the
application data set that the model will be implemented against is the same as
the test data set that the model was tested against. The similarity can be
determined by univariate and multivariate profiles of the data sets. A
comparison of statistics from these profiles should show very little difference
between the universes. If the statistics are very different, the model will
probably not work properly. The statistics could be different for a few
reasons:
• Sample bias
The test and training samples created were biased samples. If this is the
case, the data should be re-created, and the modeling process repeated.
• Incorrect problem setup
The design of the test and training data differs from the design to which the model
was intended to be applied. The process used to create the application data
set should be identical to the process used to create the test data set, except
of course for the difference in time periods.
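As a minimal sketch of the univariate profile comparison described above, the
following compares means between the test and application universes; the variable
names and the 10% drift tolerance are assumptions made for the example.

   import pandas as pd

   test = pd.read_csv("test_set.csv")
   application = pd.read_csv("application_universe.csv")

   # Compare simple univariate statistics between the two universes;
   # the 10% relative tolerance is an illustrative threshold only.
   for col in ["total_revenue", "tenure_months", "products_used"]:
       t, a = test[col].mean(), application[col].mean()
       drift = abs(t - a) / max(abs(t), 1e-9)
       flag = "OK" if drift < 0.10 else "CHECK"
       print(f"{col}: test={t:.2f} application={a:.2f} drift={drift:.1%} {flag}")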
6.4 Data Mining Results
In this section we explain how to use the Intelligent Miner visualization tools to
present the results of the mining algorithms and how to interpret those results.
6.4.1 Decision Tree
Figure 50 on page 104 shows the visualization results from the decision tree.
Figure 50. Decision Tree Results: Isolating the Key Decision Criteria
This result was achieved after several iterations during which some variables
were removed. The variables that appeared in the tree rules included total
revenue, total number of transactions in Q1, savings account revenue in Q2, the
number of savings account transactions in Q2, Best customer in 96, and the
second-choice cluster ID assigned by Intelligent Miner during the Customer
Segmentation case study (see Chapter 4). All of these variables agreed with the
current business understanding.
The gains chart for the training data produced a smooth curve as expected. The
training results are typically non-interesting, as in most cases the models
achieve good results against training. A more important test is how well the
model performs against the test or holdback data set. A gains chart was created
for the test data set (see Figure 51 on page 105).
Figure 51. Gains Chart for Decision Tree Results
Gains Chart — A gains chart is a graph created from the rank ordered model
scores. For algorithms that create a continuous score, such as RBF and the
neural network, the score variable can be quantiled. The gains chart is then
created by plotting the number of cumulative positive events by descending
quantile versus the cumulative number of records by descending quantile. For
an algorithm containing discontinuous scores, such as the decision tree, it is not
possible to quantile the scores. The decision tree scores records by assigning
the average leaf node score to all records in the leaf node. You can therefore
build the gains chart by plotting the number of cumulative positive events
by descending leaf node score versus the cumulative number of records by
descending leaf node score. The line labeled random indicates that a random
rank ordering of records results in an even number of positive events by
quantile. This is the expected result for random ordering. If our model is rank
ordering well, there should be more positive target events in the top quantiles,
and the slope of the gains curve should be higher in the top quantiles than the
random line. This higher slope will result in a curve that is lifted above the
random line.
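A hedged sketch of building such a gains curve from scored records and reading
off the lift at a given contact depth (the column names are assumed):

   import pandas as pd

   scored = pd.read_csv("scored_test.csv")  # cust_id, objective, score

   # Sort by descending score and accumulate positives; the gains curve is
   # cumulative positives versus cumulative records, both as fractions.
   s = scored.sort_values("score", ascending=False).reset_index(drop=True)
   cum_pos = s["objective"].cumsum() / s["objective"].sum()

   # Lift at a 25% contact depth, for comparison with the random line.
   depth = int(0.25 * len(s))
   print(f"lift at 25% of universe: {cum_pos.iloc[depth - 1] / 0.25:.2f}")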
The business action taken using the output of a predictive model typically
uses 10% to 40% of the possible universe. It is therefore important to
note the ratio of gains curve to random at the implementation cutoff. In
Figure 53 on page 109 we observe a lift of approximately 1.5 at 25% of the
universe. This is a modest lift curve. In our experience most gains charts have
a lift ratio ranging between 1.5 and 3.5. Too low a lift indicates that the data is not
very predictive of what was being modeled. Too high a lift is also suspicious
and may indicate sample bias or the use of input data that is too closely related
to the target variable.
Another feature to note is the smoothness of the gains curve. If the curve is very
smooth, the number of positive events by quantile is distributed monotonically.
A monotonic distribution of positive events indicates that the model is rank
ordering correctly. A wavy curve indicates that the positive events are not
monotonically distributed. This implies that there is a secondary factor in the
data that the model did not capture. If the waviness occurs in the top quantiles
or in a range in which you intend to use the model, it should be corrected. If the
waviness occurs in the bottom quantiles or out of range, you can ignore it.
In this case the tree gains chart had a modest positive lift of approximately 1.5
times random and was a smooth curve.
6.4.2 RBF
Figure 52 on page 107 presents the RBF visualization results. One immediate
advantage of using RBF is apparent: the results of the RBF algorithm present a
profile by model region, which can be used to characterize or explain why the
model is working.
Figure 52. RBF Results
In Figure 52 on page 107, observe that the top region with an average model
score of 0.7778 is characterized by customers with higher than average revenue
in Q2, and a larger positive revenue difference between Q2 and Q1, indicating
growth in activity. The third region from the top is characterized by customers
who were in the Best segment in 1996 and who have much higher than average
withdrawal amounts from savings accounts. These characterizations are
consistent with the current business understanding of customers likely to use a
credit card. The gains chart for RBF is plotted in Figure 53 on page 109. The
RBF model used against the training data set results in a gains curve similar to
that of the decision tree, with a modest lift of 1.5 over random. The model is,
however, wavy in the top quantiles, which raises some concern and should be
resolved before implementation.
6.4.3 Neural Network
The neural network algorithm was run against the top six variables selected
from the decision tree. The following sensitivity analysis results were output:
Field Name             Sensitivity
Savings_Revenue_Q2     4.6
Savings_Txns_Q2        4.7
Loan_Revenue_Q1        12.0
Best96 Status          0.2
Total Revenue          22.2
Total_Txns_Q1          56.0
This result indicates that the output is most affected by changes in the total
number of transactions in the first quarter of 1997, which accounts for 56% of the
total change observed. Total revenue in the first half of 1997 accounts for 22.2%
of the observed change. The large fraction of sensitivity accounted for by two of
the variables raises some concern. These two variables are also at the very top
of the decision tree. Better results may be achieved with the neural network if
the training file is first segmented using the tree rules for these variables and
then a neural network is trained on each segment.
Figure 53 on page 109 shows the gains chart for both training and test for this
neural network model. Although the training results outperform both other
models, the gains curve is much wavier below the top 20% of the list.
In training, the lift of the model is approximately 2 times random. In test mode
the inflection point of the lift curve moves to the left. When the model is built
against a stratified training data set, this is expected. The lift of the test curve is
approximately 3 times random at 20% of the total population. The curve is very
wavy at the top, however. This severe waviness indicates that the model has
missed a major factor and should be resolved. The two overpowering variables
in the model as indicated by the sensitivity results above could be masking other
effects that would otherwise be present and could explain the gains curve
waviness. Preliminary results indicate that the neural network model will work
better in the end for this data set.
Figure 53. Cross-Selling: Comparison of Three Predictive Models
6.5 Business Implementation
You may recall that the business objective was: given a limited mailing budget,
target the most likely prospects for a credit card offer to reduce the cost of
customer acquisition and improve the campaign ROI.
The optimal number of customers to target can be decided by looking at Table 6
and its graphic representation in Figure 54 on page 110.
Table 6. Cross-Selling: ROI Analysis Figures

Percentage of   Predicted          Number of   Total Cost of      Annual Credit Card   Predicted
Universe (%)    Response Rate (%)  Responses   Mailing (U.S. $)   Profit (U.S. $)      ROI (%)
10              55                 416         12,500             41,600               333
20              58                 435         25,000             43,500               174
30              65                 488         37,500             48,800               130
40              76                 570         50,000             57,000               114
50              83                 623         62,500             62,300               99.7
60              90                 675         75,000             67,500               90
70              92                 690         87,500             69,000               79
80              95                 713         100,000            71,300               71
90              98                 735         112,500            73,500               65
100             100                750         125,000            75,000               60
Figure 54. Cross-Selling: ROI Analysis Figures
The ROI analysis table is built from a combination of the comparative gains chart
(Figure 53 on page 109) and the baseline ROI calculation in Table 5 on page 88.
The first two columns in Table 6 are derived by reading the gains chart in
Figure 53 from left to right as follows: contacting 20% of the universe will yield
58% of the respondents, contacting 40% of the universe will yield 76% of the
respondents, and so on. The number of responses is derived from the neural
network test model. The cost and profit figures are taken directly from Table 5
on page 88.
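As an illustration, the table rows can be approximately reproduced with a few
lines of arithmetic. The unit figures used here (a mailing cost of U.S. $1.25 per
contact, U.S. $100 annual profit per response, a universe of 100,000 customers, and
750 total potential respondents) are inferred from Table 5 and Table 6 for this
sketch; small differences from the published figures come from rounding in the
response rates.

   # Illustrative reconstruction of Table 6; the unit economics are inferred,
   # not given explicitly in the text.
   UNIVERSE = 100_000
   TOTAL_RESPONDERS = 750
   COST_PER_MAIL = 1.25
   PROFIT_PER_RESPONSE = 100

   gains = {10: 55, 20: 58, 30: 65, 40: 76, 50: 83,
            60: 90, 70: 92, 80: 95, 90: 98, 100: 100}  # % of responders captured

   for pct, captured in gains.items():
       responses = round(TOTAL_RESPONDERS * captured / 100)
       cost = UNIVERSE * pct / 100 * COST_PER_MAIL
       profit = responses * PROFIT_PER_RESPONSE
       print(f"{pct:>3}% of universe: {responses} responses, "
             f"ROI {100 * profit / cost:.0f}%")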
In conclusion, to achieve a positive return and at the same time maximize the
migration of customers from the Okay Customer Set to the Good Customer Set,
40% of the potential customer universe should be targeted.
Chapter 7. Attrition Model to Improve Customer Retention
In this chapter we discuss attrition management analysis: how to keep your
customers satisfied, how to predict the customers who will leave within six
months, and how to make these expected defectors loyal customers. In
general, it is more profitable to influence the nonloyal customers to be loyal to
your company than to strive to gather new customers.
With many analysts estimating customer attrition rates at almost 50% every five
years, the challenge to manage customer attrition is driving companies to gain a
more comprehensive understanding of their customers.
Figure 55 illustrates the point well. This chart is taken from the Harvard
Business Review and demonstrates the value of good attrition control to the
profitability of businesses in several different sectors. The message is clear: by
decreasing the rate of attrition, you can increase the profitability of your
business.
Figure 55. Reducing Defections 5% Boosts Profits 25% to 85%
Intelligent Miner can develop models that you can use to accurately target
customers who might defect. If you take the appropriate business action to stop
the potential defection, you can reduce customer attrition.
7.1 Executive Summary
The goal of this case study was to identify which profitable customers were likely
to defect. The profitable customers selected for analysis were the Okay
Customer Set and the Good Customer Set customers from the Customer
Segmentation case study. In addition to being able to predict which customers
had a higher likelihood of defection, we wanted to understand the characteristics
of the defectors and nondefectors and how we could use that information to
increase the company′s retention rate.
We used four different methods to solve the business problem. The methodology
to implement these techniques, including data preparation and analysis of
results, is also presented. For this prediction we used a combination of
customer transaction data and demographic data. We contrasted the results
from the three standard prediction techniques with a time-series technique.
The neural networks, both the standard and time-series versions, were able to
predict the best which customers were likely to defect. The standard neural
network could identify 95% of the defectors in only 20% of the customer
population. The time-series neural network could identify 92% of defectors in
20% of the customer population. In addition the time-series neural network
could reduce the window of predicting the time of defection to one month,
instead of six months for the standard techniques.
We profiled and characterized the output of all the techniques to distinguish
between a typical defector customer and a nondefector customer. The
characterizations from all algorithms agreed very well. The defining
characteristics of defectors were:
• Mostly from the Okay Customer Set
• No Best customers
• Lower product usage than average
• Shorter tenure
• In general lower usage of all products, especially telebanking, credit card,
mortgages, and loans
The defining characteristics of nondefectors were:
• Mostly from the Good Customer Set, although not as heavily skewed as the
defectors
• Higher ratio of Best customers
• Higher product usage than average
• Longer tenure
• In general a higher usage of all products, especially telebanking, credit card,
personal banking, mortgages, and loans
The Bank′s personal banking service is a bundle of savings, checking, credit
card, and lower fees for various services. This bundle was associated only with
nondefector customers. Customers with a multi-faceted relationship with the
Bank are less likely to defect. Selling the personal banking bundle is a good
start to building a strong relationship with customers.
7.2 Business Requirement
The goal of this case study was to identify those customers in profitable
segments who have a high probability of defection. Once customers likely to
defect have been identified, it is then possible to take business action to reduce
the likelihood by offering the customer incentives to remain loyal. The reason
for analyzing only profitable customers is that they provide sufficient margin to
permit discounting and rewards and still be profitable. Customers who are likely
to defect and are not profitable should be let go. We built an attrition model for
the Okay Customer Set and the Good Customer Set, both of which are profitable
customer segments.
We defined customer defection as a customer who had no activity for at least six
months. Our analysis was completed in January 1998, so the most recent
defectors were customers who had no activity in July 1997 through December
1997.
In addition to identifying those customers most likely to defect, we analyzed how
customers could be prevented from defecting. We also profiled the customers
likely to defect.
7.3 Data Mining Process
We took two broad approaches to the problem:
• Model 1: A combination of three tried-and-tested Intelligent Miner algorithms
• Model 2: A new Intelligent Miner algorithm called time-series prediction
We combined decision tree, RBF, and neural classification much as we did for
the Cross-Selling case study (see Chapter 6, “Target Marketing Model to
Support a Cross-Selling Campaign” on page 87). On the basis of detailed
customer transactions from January through December 1997, we identified those
customers with a high probability of defecting sometime in the last six months of
1997. The more sophisticated time-series analysis algorithm used the same
input data but provided us with not just a single probability of defection but also
with six different data points corresponding to the probability of defection in any
one of the last six months of 1997. Figure 56 on page 114 illustrates the general
approach.
Figure 56. Data Mining Process: Attrition Analysis
• The modeling approach shown on the left-hand side of Figure 56 is identical
to the approach we used in the Cross-Selling case study (see Chapter 6,
“Target Marketing Model to Support a Cross-Selling Campaign” on page 87).
• The modeling approach shown on the right-hand side of Figure 56 uses
time-series prediction.
The only differences are the definition and creation of the objective variable. In
fact, because the same time periods were involved for both case studies, we
used the exact same initial data set to start the data mining process. Therefore
we only discuss method 1, the combination of decision tree, RBF, and neural
classification, for the definition of the objective variable. For details of the
process refer to Chapter 6, “Target Marketing Model to Support a Cross-Selling
Campaign” on page 87. The data mining process focuses on the second
method. We present and discuss the results for both methods.
7.3.1 Data Definition
Refer to Figure 57 on page 116 for a layout of the data required for each
method. Note that for the standard prediction methods, there is one record per
customer, with one variable being the target. For the time-series approach
each customer has one record per month and one value of the objective variable
per month; that is, the target variable has a profile, and the time-series
approach tries to predict that profile.
7.3.1.1 Method 1
As a first step we created the objective variable to be modeled. We assigned
the objective variable a value of 0 or 1. Customers who had activity in the first
half of 1997 and had no activity in the second half were assigned a value of 1,
and all other customers were assigned a value of 0.
7.3.1.2 Method 2
For this method we defined the objective variable in the same way as in method
1, but implemented it differently. Customers who had activity in the first half of
1997 and had no activity in the second half were assigned a value of 1 for the
objective variable for each of the six months of the second half of 1997, and a
value of zero for each of the six months of the first half of 1997. All other
customers were assigned a zero for the objective variable for all 12 months.
Because of the definition of customer defection in both methods, we actually
built a model to identify which customers defected in July 1997.
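A minimal sketch of building this monthly objective profile, assuming a
hypothetical monthly activity extract:

   import pandas as pd

   monthly = pd.read_csv("monthly_activity_1997.csv")  # cust_id, month (1-12), txns

   activity = monthly.pivot_table(index="cust_id", columns="month",
                                  values="txns", aggfunc="sum").fillna(0)

   active_h1 = activity[[1, 2, 3, 4, 5, 6]].sum(axis=1) > 0
   inactive_h2 = activity[[7, 8, 9, 10, 11, 12]].sum(axis=1) == 0
   defector = active_h1 & inactive_h2

   # Objective = 1 for each month in the second half for defectors, 0 elsewhere.
   monthly["objective"] = (monthly["cust_id"].map(defector)
                           & (monthly["month"] >= 7)).astype(int)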
Figure 57. Attrition Analysis: Data Definition
7.3.2 Data Preparation
Given the two models we used, shown in Figure 56 on page 114, we performed
the data preparation for each model.
7.3.2.1 Method 1
Refer to Chapter 6, “Target Marketing Model to Support a Cross-Selling
Campaign” on page 87.
7.3.2.2 Method 2
One advantage of using the time-series prediction method is that it does not
require pivoting the transaction data: that is, taking the transaction data out of its
natural time sequence and creating time variables for each customer record.
The time-series algorithm uses data in its time-like sequence, whereas the
standard prediction methods use a different variable on the same record for
each time period. This fact limits the number of variables that can be analyzed
with the standard prediction techniques. To capture time effects, a separate
variable must be created for each time period being considered, the differences
between time periods, and other factors. In method 1 the selection of quarterly
time periods was driven by the desire to keep the number of variables small.
For 14 product categories, with 3 variables (recency, number of transactions, and
revenue) and 2 quarters, there are 84 base variables with 42 differences, 42
totals, and 84 ratios. If we had used months instead of quarters, we would have
had 3 times the number of variables. With the time-series method we only had
42 variables (that is, 14 categories with 3 variables per category) and one record
per month, or 12 times the number of records. The time-series neural network
does not require the creation of time-derivative terms as it captures those effects
using time-delay layers. The algorithms tend to scale logarithmically with the
number of records and exponentially with the number of variables. It is
therefore much more efficient and elegant to use the time-series method.
The algorithm requires that each customer have a record for each time period.
Unfortunately, if a customer has no transaction activity in a period, it has no
transaction record. Therefore a dummy table must be created containing all
customer ID and month-number pairs. The transaction data can then be joined
through an outer join, which will assign null values to all missing customer ID
and month-number pairs. The null values can then be updated to zeroes.
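The same dummy-table outer join can be sketched in pandas as follows (an SQL
outer join against a customer-by-month table works the same way); the file and
column names are assumed.

   import pandas as pd

   txns = pd.read_csv("monthly_txns.csv")  # cust_id, month, revenue, n_txns

   # Dummy table of all customer ID and month-number pairs.
   all_pairs = pd.MultiIndex.from_product(
       [txns["cust_id"].unique(), range(1, 13)],
       names=["cust_id", "month"]).to_frame(index=False)

   # The outer join assigns nulls to missing pairs; update the nulls to zeroes.
   full = all_pairs.merge(txns, on=["cust_id", "month"], how="left")
   full[["revenue", "n_txns"]] = full[["revenue", "n_txns"]].fillna(0)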
7.3.3 Data Mining
Figure 49 on page 96 outlines the detailed steps in running a predictive
modeling algorithm. We refer to Figure 49 on page 96 in our discussion of the
time-series prediction method.
7.3.3.1 Parameter Selection
For the two methods described here, we used the following parameter settings:
Method 1 — Refer to Chapter 6, “Target Marketing Model to Support a
Cross-Selling Campaign” on page 87.
Method 2 — The time-series algorithm has the following parameters
(see Figure 58 on page 118).
• In-sample size
Refer to 6.3.5.2, ″Parameter Selection,″ under Neural Network.
• Out-sample size
Refer to 6.3.5.2, ″Parameter Selection,″ under Neural Network.
• Maximum number of passes
Refer to 6.3.5.2, ″Parameter Selection,″ under Neural Network.
• Forecast horizon
The forecast horizon is the number of periods in the future from the record in
consideration for which the algorithm is making a prediction. In this case we
used a forecast horizon of 1, that is, we were trying to predict one month in
advance.
• Window size
The window size is the number of historical records used to make a future
prediction. In our case we used six months of historical data to predict one
month in advance.
• Average error
Average error is an algorithm stopping criterion. It is the average root mean
square (RMS) error limit. If the average RMS error is greater than this limit,
the algorithm continues training until the criterion is met or the maximum
number of passes is exceeded.
• Neural architecture
Refer to 6.3.5.2, ″Parameter Selection,″ under Neural Network.
• Learning rate
Refer to 6.3.5.2, ″Parameter Selection,″ under Neural Network.
• Momentum
Refer to 6.3.5.2, ″Parameter Selection,″ under Neural Network.
Figure 58. Time Series: Setting the Parameters
7.3.3.2 Input Field Selection
We selected the input fields for each model as described below.
Method 1 — Refer to Chapter 6, “Target Marketing Model to Support a
Cross-Selling Campaign” on page 87.
Method 2 — The prediction variable should be continuous for the time-series
algorithm. We selected the variables indicated by the decision tree to be
important predictors of defection. The algorithm also profiles the output by
quantile break points that can be user defined. We input as supplementary all
data not used in the prediction.
7.3.3.3 Output Field Selection
We selected the following output fields for the two models used in this case
study:
Method 1 — Refer to Chapter 6, “Target Marketing Model to Support a
Cross-Selling Campaign” on page 87.
Method 2 — The output of the time-series neural network can be viewed as a
series of gains charts, one gains chart for each time period prediction.
Therefore, at a minimum we required:
• Customer ID
• Objective variable
• Algorithm prediction
7.3.3.4 Results Visualization
As we used two different models in this case study, the following will give you an
idea of what the results would look like for each model:
Method 1 — Refer to Chapter 6, “Target Marketing Model to Support a
Cross-Selling Campaign” on page 87.
Method 2 — The algorithm generates a profile of the quantile breaks, using the
clustering visualizer. This output also shows the average score by each
quantile. A reasonable result should have the following characteristics:
• Ratio of average score for top quantile to bottom quantile should be at
least 2 or 3 to 1.
If this criterion is not satisfied, a mistake was made in setting up the data for
the model or the data selected is not predictive of the target.
• The characteristics of the top and bottom quantiles should be different.
If the top and bottom quantiles do not have differing characteristics, the
model will have poor lift, or the data was not defined correctly.
• The average score should be monotonically decreasing with decreasing
quantile bucket.
If the average score is not monotonic with decreasing bucket, there is
sample bias or an effect that the model is not capturing.
• The order of importance of the variables within each quantile should be
different.
If the order of importance of the first few variables in each quantile is the
same, those variables are likely so powerful that a separate
model should be built for each segment. Compare the results of the
prediction with a decision tree. If the powerful variables appear in the top of
the tree, then this is the case. If not, the variables may be systematically
related to the target variable.
7.3.4 Gains Chart
Refer to Chapter 6, “Target Marketing Model to Support a Cross-Selling
Campaign” on page 87 for a discussion of the role of gains charts in predictive
modeling.
7.3.5 Clustering
In addition to the exercise of building predictive models to identify those
customers likely to defect, we completed some clustering of the resulting models
to explain the characteristics of defectors and nondefectors. We extracted the
top and bottom decile customer records from the neural prediction model and
appended them together into a data set. We used the demographic clustering
algorithm to cluster all input data, not just the variables used to create the
neural network model. The quantile number was input into the clustering
algorithm as a supplementary variable to determine whether the algorithm could
distinguish the customer records using other data. We then used the insight into
the differences between defectors and nondefectors to design a campaign to
reduce the defection rate in the targeted customer segments.
In addition to clustering the neural network scores, RBF output visualization
automatically provides profile information about the customer records in each
region. For the run of the RBF algorithm, we made sure to include all possible
input data available as supplementary data if it was not used already as input
into the model.
7.4 Data Mining Results
In this section we describe the results achieved with each of the data mining
algorithms used in this case study. Remember that the decision tree, RBF
modeling, neural network, and clustering represent the first model used, whereas
the time-series prediction represents the second model used for the attrition
management analysis.
7.4.1 Decision Tree
The following primary variables appeared in the decision tree:
• Lifetime to date revenue
• Mortgage balance
• Total loan balance
• Customer tenure in months
• Number of products used in 1997
Figure 59 on page 121 shows a node for customers who will leave with 85%
probability. The characteristics of these customers are:
• Total revenue is less than 1091.
• They have lower than average mortgage balances.
• They have no loans.
Figure 59. Attrition Analysis: Decision Tree Structure
All of these variables were meaningful for the business problem. Figure 60 on
page 122 shows the gains chart for both training and test for the decision tree.
The training results are very smooth, indicating monotonicity in the distribution
of the target variable positive events by descending leaf node score. The
training result has a lift ratio of 1.75 times random at 20% of the customer
population. The test gains curve has a higher lift ratio of 3.3 at 20%. This
improvement is due to the use of stratified sampling in the training mode. The
problem with the test mode is the severe waviness of the gains curve in the top
portion of the curve. This is due to either a biased training sample (because of a
small sample of negative target events) or some effect that the model was
missing. The former explanation is the most probable.
Figure 60. Decision Tree Gains Chart: Training and Testing
7.4.2 RBF Modeling
The RBF results visualization in Figure 61 on page 123 is positive, as we have
multiple regions. The top and bottom regions have a defection ratio of
over 10 to 1, and the characteristics of each region are different and
interpretable.
Figure 61. RBF: Results Window
Those customers in the top region of Figure 62 on page 124 have low activity
across all products and tend to live in Ontario.
Figure 62. Attrition Analysis: Predicting Values Result
The second region from the top, which also has a high likelihood of defection,
also contains customers skewed to Ontario on whom the Bank didn′t have any
demographics. The reason for no demographics would be that these customers
had not applied for any credit and hence had not completed a credit check. This
region also had low activity across all products. The first two regions also
contained more male customers than average.
The regions that had a low probability of defection were drawn mostly from the
Good Customer Set and a small segment of the Okay Customer Set. These
customers tended to be the Bank′s Best customers and to come from Western
Canada. They had high product activity and tended to have a longer tenure as
well. All these characterizations reflect the current business understanding
(see Figure 63 on page 125).
Figure 63. Attrition Analysis: Predicting Values
Figure 64 on page 126 shows the gains curves for the RBF models against both
the training and test data sets. The RBF training result is similar to the
decision tree gains chart at the top quantiles, below 25% of the customer
universe. Above 25% of the cumulative customer population, the tree training
model performs significantly better. The test result for the RBF model is
worse than that of the decision tree because it has a lower lift ratio and is
wavier in the top quantiles. The model is still very promising, showing a
lift over random of approximately 2.8 at 20% of the customer population. Some
further analysis to account for the source of the waviness may improve the
result.
Figure 64. Attrition Analysis: Comparative Gains Charts for All Methods
7.4.3 Neural Network
We built the neural network model, using the most significant variables as
identified by the decision tree. The following sensitivity analysis resulted from
the neural network:
Field Name          Sensitivity
REVENUE                     7.4
TENURE                     14.3
AVG_LOAN_BAL                1.7
AVG_MTG_BAL                 2.2
PRODUCTS_USED              74.3
The PRODUCTS_USED variable seems to be highly related to customer defection.
It is concerning that one variable accounts for such a high fraction of the
observed output sensitivity. However, exploring this variable in the other
algorithms shows that neither the decision tree nor the RBF identified it as
the most important variable. The business interpretation of the result is
that the more products a customer uses, the less likely that customer is to
defect, which reflects the current business understanding.
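Intelligent Miner produces this sensitivity report directly. Conceptually, a
comparable per-field measure can be approximated by perturbing one input at a
time and observing how far the output moves; the sketch below illustrates the
idea only and is not the product's algorithm:

   import numpy as np

   def sensitivities(model_fn, X, field_names, eps=0.1):
       # Approximate per-field sensitivity: nudge one input column at a
       # time and measure the mean absolute change in the model output.
       # model_fn maps an (n, d) array of records to n scores.
       base = model_fn(X)
       result = {}
       for j, name in enumerate(field_names):
           Xp = X.copy()
           Xp[:, j] += eps * X[:, j].std()     # perturb one field only
           result[name] = float(np.mean(np.abs(model_fn(Xp) - base)))
       return result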
Referring to Figure 64, you can see that the neural network achieves the best
results in training with a lift over random of 2 to 1 up to 40% of the customer
population. The training curve is also smooth. The test results have an
exceptional lift over random of approximately 4.75 to 1 at 20% of the customer
population. The gains curve, however, is wavy at the top quantiles, so the
result should be accepted with caution. The high sensitivity of the single
dominant variable, the ″too good″ gains curve, and the waviness should all be
explained before the result is accepted as valid. Still, the initial results
are very promising and indicate that the neural network will produce the best
model for this business problem.
7.4.4 Clustering
Finally, we clustered the neural network results to:
1. Validate the models that were developed
A clustering algorithm should be able to distinguish between the top decile
and bottom decile of predictive model scored output if the model is valid.
2. Identify the distinguishing characteristics of defectors and nondefectors
The distinguishing features could then be utilized to create retention
campaigns.
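A minimal sketch of how the top- and bottom-decile records can be drawn from
the scored output to form such a validation set (the array names are
illustrative):

   import numpy as np

   def top_bottom_decile_set(scores, records):
       # Keep only the records in the top and bottom score deciles,
       # giving the 50/50 clustering input described below. `records`
       # is assumed to be a NumPy array aligned with `scores`.
       lo, hi = np.quantile(scores, [0.10, 0.90])
       mask = (scores <= lo) | (scores >= hi)
       return records[mask], (scores >= hi)[mask]   # records, defector flag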
Figure 65 on page 128 shows the results of the clustering on the neural network
model output. The data set used for the clustering contained a 50/50 split of top
decile and bottom decile customer records. As expected, we got two large
clusters, containing 43% and 38% of the records, respectively. If the model
had worked perfectly, we would see only these two clusters; in this case we
also got five smaller clusters. The largest cluster contains the defectors,
as indicated by the quantile variable (the left two bars are the quantile
buckets for the bottom two 5-percentile buckets, and the right two bars are
the top two 5-percentile buckets). Notice the slight sliver of low-quantile
customers in this cluster. These are mistakes made by either the neural
network or the clustering algorithm; further analysis of these records may
improve the prediction result. Because the clustering found a larger segment
for defectors, defectors probably have fewer distinguishing characteristics
than nondefectors.
The characteristics of defectors are:
•  Mostly from the Okay Customer Set segment (recall that we used only
   customers from the Okay Customer Set and the Good Customer Set)
•  No Best customers
•  Lower product usage than average
•  Shorter tenure
•  In general, lower usage of all products, especially telebanking, credit
   card, mortgages, and loans
In contrast, the characteristics of nondefectors are:
•  Mostly from the Good Customer Set, although not as heavily skewed as the
   defectors are to the Okay Customer Set
•  Higher ratio of Best customers
•  Higher product usage than average
•  Longer tenure
•  In general, higher usage of all products, especially telebanking, credit
   card, personal banking, mortgages, and loans
Figure 65. Attrition Analysis: Demographic Clustering of Likely Defectors
7.4.5 Time-Series Prediction
The time-series neural network outputs a profile of the model scores by quantile
(see Figure 66 on page 129). The output is very good with a ratio of almost 20
to 1 between the top and bottom quantiles. The scores are monotonically
distributed by decreasing quantile, although the distribution could be a little
smoother, and variables have differing importance to each quantile group.
Customers with a high likelihood to defect, the top quantile in Figure 66 on
page 129, have lower than average activity in the different products.
Customers with a low likelihood to defect have high levels of activity across
all products. This characterization of defectors and nondefectors agrees with
the output of all the other algorithms, including those using standard
prediction methods.
Figure 66. Profile of Time-Series Prediction
Figure 64 on page 126 shows both the test and training gains curves for the
time-series neural network. The training case is almost identical to the standard
back-propagating neural network. The test case is slightly worse than the
standard neural network, but the curve is smooth with no waviness. The lift
over random for the test case is very high at approximately 4.6 to 1 at 20%.
Again you should be skeptical of such good performance and do some analysis
to ensure that there is no sample bias and that the input data is not
trivially related to the target field.
The time-series neural network can also be used to generate prediction profiles
over time by plotting the prediction versus time period. Figure 67 and Figure 68
on page 131 show the profiles of a few randomly selected customers from the
test data set. These plotted profiles are representative of the many that we
analyzed in the case study.
Figure 67. Time Profile of Defection Probability for Defectors
Figure 68. Time Profile of Defection Probability for Nondefectors
The profile of a defector is, by definition, step-like, with a step between
months 6 and 7, as visible in Figure 67 on page 130. The predicted profiles
have a similar step-like shape, except that the height of the step is smaller
and its location is not always between months 6 and 7. One profile also has a
blip in the first few months of the year. The nondefector profiles in
Figure 68 should ideally be a flat line at 0. In practice they are not flat
but wavy, with no distinct step like the defectors′. The clear difference
between a defector and a nondefector profile makes this algorithm very useful.
Not only can you distinguish between defectors and nondefectors, as you can
with the standard predictive methods, but you can approximate when the
customer is likely to defect. This fact is of critical importance in customer
defection problems because the timing of a business activity to reduce customer
defection should occur very close to the time that a customer is likely to defect.
With the standard techniques in this case, our prediction of defection is
windowed to within 6 months. Using the time-series approach, we can narrow
this approximation to a particular month, a substantial improvement. The
added difficulty of predicting the time of defection tends to make the gains
curves of the time-series prediction slightly worse than those of the
standard neural technique, but in this case the model can still distinguish
defectors and nondefectors very easily.
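Because the defector profile is defined by its step, the likely defection
month can be approximated as the month with the largest jump in the predicted
score, provided the jump is large enough to rule out the wavy nondefector
profiles. A sketch under those assumptions (the threshold is illustrative,
not a value from the case study):

   def likely_defection_month(profile, min_step=0.3):
       # profile: one customer's monthly defection scores. Returns the
       # 1-based month after which the largest score jump occurs, or
       # None if no jump is big enough to call the profile a defector's.
       jumps = [profile[i + 1] - profile[i] for i in range(len(profile) - 1)]
       best = max(range(len(jumps)), key=jumps.__getitem__)
       return best + 1 if jumps[best] >= min_step else None

   # A step between months 6 and 7 yields 6; a flat, wavy nondefector
   # profile yields None.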
7.5 Business Implementation
Once customers with a high likelihood of defection are identified, it is
possible to execute a direct mail campaign to target them. If we believe the
neural network model results, we could reach 95% of the customers likely to
defect for 20% of the cost of mailing to all customers (in the Okay Customer
Set and the Good Customer Set), a substantial cost saving. Using the
time-series prediction, we can also time the customer communication to fall
as close as possible to the time of likely defection, making the contact as
relevant as possible.
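In practice the targeting rule is simple: rank the customers by model score
and mail only the top fraction. A sketch of the selection and the implied
saving (the unit mailing cost is a placeholder, not a case-study figure):

   def build_mailing_list(customers, scores, depth=0.20, unit_cost=1.0):
       # Keep the top `depth` fraction of customers by defection score
       # and report the saving versus mailing to everyone.
       ranked = sorted(zip(scores, customers), key=lambda t: t[0],
                       reverse=True)
       n_target = int(len(ranked) * depth)
       targets = [customer for _, customer in ranked[:n_target]]
       saving = (len(customers) - n_target) * unit_cost
       return targets, saving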
Once we have identified a list of customers we intend to target, we can profile
the profitability of those customers to determine the margin available to be used
to increase customer retention.
We can then use this budget to try to turn likely defectors into
nondefectors. The defining characteristic of nondefectors was heavy product
usage, indicating a strong relationship with the Bank. A key product in
building a multiproduct relationship is the personal banking bundle, which
was present only among nondefector customers. This bundle includes a savings
account, a checking account, a credit card, and other services, all at a low
fee. A campaign to cross-sell this product to profitable customers likely to
defect may help consolidate the Bank′s relationship.
Chapter 8. Intelligent Miner Advantages
In the first case study we created a segmentation model to be used as a basis
for CRM. Using several techniques, we were able to create segments of
customers with differing levels of shareholder value. The differences in
shareholder value of the customer segments allowed us to identify the most
profitable customers, high potential customers, and low potential customers.
The best customer segment represented 35% of customer revenue from 9% of
customers. Several high potential segments were also identified. If, by
marketing additional products or services, we could shift the purchase
behavior of 10% of the high potential customers to resemble that of our best
customers, we could increase total revenue by 18%.
Selecting one of the high potential customer segments, we used product
association techniques to find cross-selling opportunities in the second case
study. By contrasting the behavior of the high potential segment against the
behavior of a higher potential group, we were able to identify missing products
that could be marketed in a cross-selling campaign. The product to cross sell
was identified to be a credit card. If through marketing we were able to activate
10% of the high potential cluster to use the credit card, we could impact the
cluster revenue by 25% and the overall revenue by 3%.
Having identified a customer segment and a product for cross-selling, we could
execute a promotion. Rather than marketing to every customer in the segment
of interest, building a predictive model to target just those customers likely to
activate with the bank′s credit card would be more cost effective. In the third
case study, we built several predictive models, using different techniques. The
best of these models was able to predict 65% of likely activations with only 30%
of the mailing cost. If we targeted 30% of the customer segment in question, our
expected ROI would be 160%.
In the fourth case study, we built predictive models to identify which customers
were likely to defect. From the customer segmentation model in the first case
study we were able to identify the current best customers and future high
potential value customers. An important marketing strategy for these customers
is retention. By marketing to these customers, we would be able to reduce the
defection rate in our best customer segments, ensuring the corporation′s future
earning potential and maintaining current revenue levels.
Using IBM′s Intelligent Miner for Data product in these four case studies, we
were able to illustrate how data mining can be used to support a CRM program
within an organization. We also showed the power of Intelligent Miner and its
capability to work on a wide variety of business problems. In selecting a tool for
data mining, an organization should consider the range of problems to be solved
and the potential for feeding the results of one business problem into the input of
the next. Intelligent Miner was able to execute a sequence of business activities
fundamental to CRM:
•  Create strategic marketing initiatives
•  Identify marketing opportunities to support strategic initiatives
•  Effectively target customers for a particular promotion
Based on the case studies presented, the total impact on the bottom line was
approximately a 25% increase in profitability.
IBM′s Intelligent Miner is the leading data mining product in today′s
marketplace, offering these competitive advantages:
•  Algorithms based on open academic research
The algorithms within Intelligent Miner are based on open academic
research. They were developed in IBM laboratories around the world by
world-leading researchers in artificial intelligence and machine learning.
This body of research dates back more than 20 years. For end users of this
technology, this large body of research means higher quality algorithms that
produce better results than other tools in the marketplace.
•  Research grown out of IBM core competence
IBM has been cultivating artificial intelligence and machine learning through
billions of dollars of investment in R&D for decades. In the corporate world
IBM research labs are second to none. Organizations building competitive
data mining products do not have nearly as strong a competency in the
disciplines required for data mining.
In addition to a core competency in data mining, IBM has a core competency
in software development. The technical challenge of implementing data
mining technology with the ability to work against millions of customer
records is immense. No other organization has developed such complex
algorithms that are as scalable as Intelligent Miner′s. Most of the
competition produces data mining products for PCs and uniprocessor server
platforms.
To customers, this advantage means higher quality algorithms that have
been so efficiently implemented that the time required to create decision
support information has been significantly reduced.
•  Algorithms that have existed for a long time
Some of the algorithms in Intelligent Miner have existed in other IBM
products for 10 years. The newest algorithms are more than three years old.
The competitors are just creating and releasing products today. In addition
to being first in the marketplace, IBM consultants have been using the
algorithms in more than 100 engagements around the world. In fact,
Intelligent Miner was created because the data mining consultants
recognized the need for an integrated data mining product with several
different analytical methods.
End users benefit from these advantages: more product use means that the
current product has fewer software bugs, because the algorithm defects have
been shaken out through years of practical application. Investments in
million-dollar marketing campaigns are therefore more secure with Intelligent
Miner.
•  Wider variety of algorithms and visualizations in one tool
As shown in the case studies presented in this book, the wide variety of
algorithms in Intelligent Miner allows for a wider range of analysis than other
data mining packages. The powerful combination of visualization tools and
data mining algorithms, some of which are unique to Intelligent Miner,
permits better business results than other products permit.
•  Unique algorithms
Intelligent Miner has two algorithms that are unique and were invented by
IBM researchers. The demographic algorithm is the only clustering
algorithm that can cluster categorical data. The product associations
algorithms were also invented in IBM research labs.
These unique capabilities enable end users to perform analyses not possible
with other tools.
•  ″Infinitely scalable″
Intelligent Miner runs on the SP2 MPP platform, which can scale to handle
terabyte-sized data warehouses. No competitive products are as scalable.
Intelligent Miner can also connect to operational databases for scoring and
validation. This scalability enables end users with millions of customers to
efficiently integrate data mining technology into their businesses today.
•  Open technology
Intelligent Miner runs on other vendor platforms, including HP, Sun, and
Windows NT. It can also interface to other databases, using IBM′s DataJoiner
product. Thus end-user customers can use IBM data mining technology on their
existing platforms today.
Appendix A. Special Notices
This publication is intended to help all customers to better understand the
different data mining algorithms used by the Intelligent Miner for Data. The
information in this publication is not intended as the specification of any
programming interfaces that are provided by Intelligent Miner for Data. See the
PUBLICATIONS section of the IBM Programming Announcement for Intelligent
Miner for Data for more information about what publications are considered to
be product documentation.
References in this publication to IBM products, programs or services do not
imply that IBM intends to make these available in all countries in which IBM
operates. Any reference to an IBM product, program, or service is not intended
to state or imply that only IBM′s product, program, or service may be used. Any
functionally equivalent program that does not infringe any of IBM′s intellectual
property rights may be used instead of the IBM product, program or service.
Information in this book was developed in conjunction with use of the equipment
specified, and is limited in application to those specific hardware and software
products and levels.
IBM may have patents or pending patent applications covering subject matter
in this document. The furnishing of this document does not give you any
license to these patents. You can send license inquiries, in writing, to the
IBM Director of Licensing, IBM Corporation, North Castle Drive, Armonk,
NY 10504-1785.
Licensees of this program who wish to have information about it for the purpose
of enabling: (i) the exchange of information between independently created
programs and other programs (including this one) and (ii) the mutual use of the
information which has been exchanged, should contact IBM Corporation, Dept.
600A, Mail Drop 1329, Somers, NY 10589 USA.
Such information may be available, subject to appropriate terms and conditions,
including in some cases, payment of a fee.
The information contained in this document has not been submitted to any
formal IBM test and is distributed AS IS.
The use of this information or the implementation of any of these techniques is a
customer responsibility and depends on the customer′s ability to evaluate and
integrate them into the customer′s operational environment. While each item
may have been reviewed by IBM for accuracy in a specific situation, there is no
guarantee that the same or similar results will be obtained elsewhere.
Customers attempting to adapt these techniques to their own environments do
so at their own risk.
Any pointers in this publication to external Web sites are provided for
convenience only and do not in any manner serve as an endorsement of these
Web sites.
The following terms are trademarks of the International Business Machines
Corporation in the United States and/or other countries:
AIX                       AIX/6000
DATABASE 2                DB2
DB2 Universal Database    IBM
Intelligent Miner         QMF
RISC System/6000          RS/6000
TextMiner                 Visual Warehouse
The following terms are trademarks of other companies:
C-bus is a trademark of Corollary, Inc.
Java and HotJava are trademarks of Sun Microsystems, Incorporated.
Microsoft, Windows, Windows NT, and the Windows 95 logo are trademarks
or registered trademarks of Microsoft Corporation.
PC Direct is a trademark of Ziff Communications Company and is used
by IBM Corporation under license.
Pentium, MMX, ProShare, LANDesk, and ActionMedia are trademarks or
registered trademarks of Intel Corporation in the U.S. and other
countries.
UNIX is a registered trademark in the United States and other
countries licensed exclusively through X/Open Company Limited.
Other company, product, and service names may be trademarks or
service marks of others.
Appendix B. Related Publications
The publications listed in this section are considered particularly suitable for a
more detailed discussion of the topics covered in this redbook.
B.1 International Technical Support Organization Publications
For information on ordering these ITSO publications see “How to Get ITSO
Redbooks” on page 141.
•  Discovering Data Mining, SG24-4839
•  Mining Relational and Nonrelational Data with IBM Intelligent Miner for
   Data, SG24-5278
B.2 Redbooks on CD-ROMs
Redbooks are also available on CD-ROMs. Order a subscription and receive
updates 2-4 times a year at significant savings.
CD-ROM Title                                            Subscription   Collection Kit
                                                        Number         Number
System/390 Redbooks Collection                          SBOF-7201      SK2T-2177
Networking and Systems Management Redbooks Collection   SBOF-7370      SK2T-6022
Transaction Processing and Data Management Redbook      SBOF-7240      SK2T-8038
AS/400 Redbooks Collection                              SBOF-7270      SK2T-2849
RS/6000 Redbooks Collection (HTML, BkMgr)               SBOF-7230      SK2T-8040
RS/6000 Redbooks Collection (PostScript)                SBOF-7205      SK2T-8041
Application Development Redbooks Collection             SBOF-7290      SK2T-8037
Personal Systems Redbooks Collection                    SBOF-7250      SK2T-8042
B.3 Other Publications
These publications are also relevant as further information sources:
•  Using the Intelligent Miner for Data, SH12-6325
How to Get ITSO Redbooks
This section explains how both customers and IBM employees can find out about ITSO redbooks, CD-ROMs,
workshops, and residencies. A form for ordering books and CD-ROMs is also provided.
This information was current at the time of publication, but is continually subject to change. The latest
information may be found at http://www.redbooks.ibm.com.
How IBM Employees Can Get ITSO Redbooks
Employees may request ITSO deliverables (redbooks, BookManager BOOKs, and CD-ROMs) and information about
redbooks, workshops, and residencies in the following ways:
•  PUBORDER — to order hardcopies in United States
•  GOPHER link to the Internet — type GOPHER.WTSCPOK.ITSO.IBM.COM
•  Tools disks
To get LIST3820s of redbooks, type one of the following commands:
TOOLS SENDTO EHONE4 TOOLS2 REDPRINT GET SG24xxxx PACKAGE
TOOLS SENDTO CANVM2 TOOLS REDPRINT GET SG24xxxx PACKAGE (Canadian users only)
To get BookManager BOOKs of redbooks, type the following command:
TOOLCAT REDBOOKS
To get lists of redbooks, type one of the following commands:
TOOLS SENDTO USDIST MKTTOOLS MKTTOOLS GET ITSOCAT TXT
TOOLS SENDTO USDIST MKTTOOLS MKTTOOLS GET LISTSERV PACKAGE
To register for information on workshops, residencies, and redbooks, type the following command:
TOOLS SENDTO WTSCPOK TOOLS ZDISK GET ITSOREGI 1998
For a list of product area specialists in the ITSO: type the following command:
TOOLS SENDTO WTSCPOK TOOLS ZDISK GET ORGCARD PACKAGE
•  Redbooks Web Site on the World Wide Web
   http://w3.itso.ibm.com/redbooks
•  IBM Direct Publications Catalog on the World Wide Web
   http://www.elink.ibmlink.ibm.com/pbl/pbl
   IBM employees may obtain LIST3820s of redbooks from this page.
•  REDBOOKS category on INEWS
•  Online — send orders to: USIB6FPL at IBMMAIL or DKIBMBSH at IBMMAIL
•  Internet Listserver
With an Internet e-mail address, anyone can subscribe to an IBM Announcement Listserver. To initiate the
service, send an e-mail note to [email protected] with the keyword subscribe in the body of
the note (leave the subject line blank). A category form and detailed instructions will be sent to you.
Redpieces
For information so current it is still in the process of being written, look at ″Redpieces″ on the Redbooks Web
Site ( http://www.redbooks.ibm.com/redpieces.htm). Redpieces are redbooks in progress; not all redbooks
become redpieces, and sometimes just a few chapters will be published this way. The intent is to get the
information out much quicker than the formal publishing process allows.
How Customers Can Get ITSO Redbooks
Customers may request ITSO deliverables (redbooks, BookManager BOOKs, and CD-ROMs) and information about
redbooks, workshops, and residencies in the following ways:
•  Online Orders — send orders to:

                             IBMMAIL                  Internet
   In United States:         usib6fpl at ibmmail      [email protected]
   In Canada:                caibmbkz at ibmmail      [email protected]
   Outside North America:    dkibmbsh at ibmmail      [email protected]

•  Telephone Orders

   United States (toll free)    1-800-879-2755
   Canada (toll free)           1-800-IBM-4YOU
   Outside North America        (long distance charges apply)
     (+45) 4810-1320 - Danish       (+45) 4810-1020 - German
     (+45) 4810-1420 - Dutch        (+45) 4810-1620 - Italian
     (+45) 4810-1540 - English      (+45) 4810-1270 - Norwegian
     (+45) 4810-1670 - Finnish      (+45) 4810-1120 - Spanish
     (+45) 4810-1220 - French       (+45) 4810-1170 - Swedish

•  Mail Orders — send orders to:

   IBM Publications
   Publications Customer Support
   P.O. Box 29570
   Raleigh, NC 27626-0570
   USA

   IBM Publications
   144-4th Avenue, S.W.
   Calgary, Alberta T2P 3N5
   Canada

   IBM Direct Services
   Sortemosevej 21
   DK-3450 Allerød
   Denmark

•  Fax — send orders to:

   United States (toll free)    1-800-445-9269
   Canada                       1-403-267-4455
   Outside North America        (+45) 48 14 2207 (long distance charge)

•  1-800-IBM-4FAX (United States) or (+1) 001-408-256-5422 (Outside USA) — ask for:
   Index # 4421 Abstracts of new redbooks
   Index # 4422 IBM redbooks
   Index # 4420 Redbooks for last six months
•  Direct Services — send note to [email protected]
•  On the World Wide Web
   Redbooks Web Site                  http://www.redbooks.ibm.com
   IBM Direct Publications Catalog    http://www.elink.ibmlink.ibm.com/pbl/pbl
•  Internet Listserver
With an Internet e-mail address, anyone can subscribe to an IBM Announcement Listserver. To initiate the
service, send an e-mail note to [email protected] with the keyword subscribe in the body of
the note (leave the subject line blank).
Redpieces
For information so current it is still in the process of being written, look at ″Redpieces″ on the Redbooks Web
Site ( http://www.redbooks.ibm.com/redpieces.htm). Redpieces are redbooks in progress; not all redbooks
become redpieces, and sometimes just a few chapters will be published this way. The intent is to get the
information out much quicker than the formal publishing process allows.
IBM Redbook Order Form
Please send me the following:

Title                                          Order Number        Quantity


First name                     Last name

Company

Address

City                           Postal code         Country

Telephone number               Telefax number      VAT number

•  Invoice to customer number

•  Credit card number

   Credit card expiration date     Card issued to      Signature

We accept American Express, Diners, Eurocard, Master Card, and Visa. Payment by credit card not
available in all countries. Signature mandatory for credit card payment.
Glossary
A

adaptive connection. A numeric weight used to describe the strength of the
connection between two processing units in a neural network. The connection
is called adaptive because it is adjusted during training. Values typically
range from zero to one, or -0.5 to +0.5.

aggregate. To summarize data in a field.

application program interface (API). A functional interface supplied by the
operating system or a separate orderable licensed program that allows an
application program written in a high-level language to use specific data or
functions of the operating system or the licensed program.

architecture. The number of processing units in the input, output, and hidden
layer of a neural network. The number of units in the input and output layers
is calculated from the mining data and input parameters. An intelligent data
mining agent calculates the number of hidden layers and the number of
processing units in those hidden layers.

associations. The relationship of items in a transaction in such a way that
items imply the presence of other items in the same transaction.

attribute. Characteristics or properties that can be controlled, usually to
obtain a required appearance. For example, the color is an attribute of a
line. In object-oriented programming, a data element defined within a class.

B

back propagation. A general-purpose neural network named for the method used
to adjust weights while learning data patterns. The Classification − Neural
mining function uses such a network.

boundary field. The upper limit of an interval as used for discretization
using ranges of a processing function.

bucket. One of the bars in a bar chart showing the frequency of a specific
value.

C

categorical values. Discrete, nonnumerical data represented by character
strings; for example, colors or special brands.

chi-square test. A test to check whether two variables are statistically
dependent or not. Chi-square is calculated by subtracting the expected
frequencies (imaginary values) from the observed frequencies (actual values).
The expected frequencies represent the values that were to be expected if the
variables in question were statistically independent.

classification. The assignment of objects into groups or categories based on
their characteristics.

cluster. A group of records with similar characteristics.

cluster prototype. The attribute values that are typical of all records in a
given cluster. Used to compare the input records to determine whether a
record should be assigned to the cluster represented by these values.

clustering. A mining function that creates groups of data records within the
input data on the basis of similar characteristics. Each group is called a
cluster.

confidence factor. Indicates the strength or the reliability of the
associations detected.

continuous field. A field that can have any floating point number as its
value.

D

DATABASE 2 (DB2). An IBM relational database management system.

database table. A table residing in a database.

database view. An alternative representation of data from one or more
database tables. A view can include all or some of the columns contained in
the database table or tables on which it is defined.

data field. In a database table, the intersection of table description and
table column where the corresponding data is entered.

data format. There are different kinds of data formats, for example, database
tables, database views, pipes, or flat files.

data table. A data table, regardless of the data format it contains.

data type. There are different kinds of Intelligent Miner data types, for
example, discrete numeric, discrete nonnumeric, binary, or continuous.

discrete. Pertaining to data that consists of distinct elements such as
character or to physical quantities having a finite number of distinctly
recognizable values.

discretization. The act of making mathematically discrete.

E

envelope. The area between two curves that are parallel to a curve of
time-sequence data. The first curve runs above the curve of time-sequence
data, the second one below. Both curves have the same distance to the curve
of time-sequence data. The width of the envelope, that is, the distance from
the first parallel curve to the second, is defined as epsilon.

epsilon. The maximum width of an envelope that encloses a sequence. Another
sequence is epsilon-similar if it fits in this envelope.

epsilon-similar. Two sequences are epsilon-similar if one sequence does not
go beyond the envelope that encloses the other sequence.

equality compatible. Pertaining to different data types that can be operands
for the = logical operator.

Euclidean distance. The square root of the sum of the squared differences
between two numeric vectors. The Euclidean distance is used to calculate the
error between the calculated network output and the target output in neural
classification, and to calculate the difference between a record and a
prototype cluster value in neural clustering. A zero value indicates an exact
match; larger numbers indicate greater differences.

F

field. A set of one or more related data items grouped for processing. In
this document, with regard to database tables and views, field is synonymous
with column.

file. A collection of related data that is stored and retrieved by an
assigned name.

file name. (1) A name assigned or declared for a file. (2) The name used by a
program to identify a file.

flat file. (1) A one-dimensional or two-dimensional array: a list or table of
items. (2) A file that has no hierarchical structure.

formatted information. An arrangement of information into discrete units and
structures in a manner that facilitates its access and processing. Contrast
with narrative information.

F-test. A statistical test that checks whether two estimates of the variances
of two independent samples are the same. In addition, the F-test checks
whether the null hypothesis is true or false.

function. Any instruction or set of related instructions that perform a
specific operation.

fuzzy logic. In artificial intelligence, a technique using approximate rules
of inference in which truth values and quantifiers are defined as possibility
distributions that carry linguistic labels.
I
input data. The metadata of the database table,
database view, or flat file containing the data you
specified to be mined.
input layer. A set of processing units in a neural
network which present the numeric values derived
from user data to the network. The number of fields
and type of data in those fields are used to calculate
the number of processing units in the input layer.
instance. In object-oriented programming, a single, actual occurrence of a
particular object. Any level of the object class hierarchy can have
instances. An instance can be considered in terms of a copy of the object
type frame that is filled in with particular information.
interval. A set of real numbers between two numbers
either including or excluding both of them.
interval boundaries. Values that represent the upper
and lower limits of an interval.
item category. A categorization of an item. For
example, a room in a hotel can have the following
categories: Standard, Comfort, Superior, Luxury. The
lower category is called the child item category. Each
child item category can have several parent item
categories. Each parent item category can have
several grandparent item categories.
item description. The descriptive name of a
character string in a data table.
item ID.
The identifier for an item.
item set. A collection of items. For example, all items
bought by one customer during one visit to a
department store.
K
Kohonen Feature Map. A neural network model
comprised of processing units arranged in an input
layer and output layer. All processors in the input
layer are connected to each processor in the output
layer by an adaptive connection. The learning
algorithm used involves competition between units for
each input pattern and the declaration of a winning
unit. Used in neural clustering to partition data into
similar record groups.
L

large item sets. The total volume of items above the specified support factor
returned by the Associations mining function.

learning algorithm. The set of well-defined rules used during the training
process to adjust the connection weights of a neural network. The criteria
and methods used to adjust the weights define the different learning
algorithms.

learning parameters. The variables used by each neural network model to
control the training of a neural network, which is accomplished by modifying
network weights.

lift. Confidence factor divided by expected confidence.

M

metadata. In databases, data that describes data objects.

mining. Synonym for analyzing or searching.

mining base. A repository where all information about the input data, the
mining run settings, and the corresponding results is stored.

model. A specific type of neural network and its associated learning
algorithm. Examples include the Kohonen Feature Map and back propagation.

N

narrative information. Information that is presented according to the syntax
of a natural language. Contrast with formatted information.

neural network. A collection of processing units and adaptive connections
that is designed to perform a specific processing function.

Neural Network Utility (NNU). A family of IBM application development
products for creating neural network and fuzzy rule system applications.

nonsupervised learning. A learning algorithm that requires only input data to
be present in the data source during the training process. No target output
is provided; instead, the desired output is discovered during the mining run.
A Kohonen Feature Map, for example, uses nonsupervised learning.

O

offset. (1) The number of measuring units from an arbitrary starting point in
a record, area, or control block, to some other point. (2) The distance from
the beginning of an object to the beginning of a particular field.

operator. (1) A symbol that represents an operation to be done. (2) In a
language statement, the lexical entity that indicates the action to be
performed on operands.

output data. The metadata of the database table, database view, or flat file
containing the data being produced or to be produced by a function.

output layer. A set of processing units in a neural network which contain the
output calculated by the network. The number of outputs depends on the number
of classification categories or maximum clusters value in neural
classification and neural clustering, respectively.

P

pass. One cycle of processing a body of data.

prediction. The dependency and the variation of one field′s value within a
record on the other fields within the same record. A profile is then
generated that can predict a value for the particular field in a new record
of the same form, based on its other field values.

processing unit. A processing unit in a neural network is used to calculate
an output by summing all incoming values multiplied by their respective
adaptive connection weights.

Q

quantile. One of a finite number of nonoverlapping subranges or intervals,
each of which is represented by an assigned value.

Q is an N%-quantile of a value set S when:
•  Approximately N percent of the values in S are lower than or equal to Q.
•  Approximately (100-N) percent of the values are greater than or equal
   to Q.

The approximation is less exact when there are many values equal to Q. N is
called the quantile label. The 50%-quantile represents the median.
R
radial basis function. In data mining functions, radial
basis functions are used to predict values. They
represent functions of the distance or the radius from
a particular point. They are used to build up
approximations to more complicated functions.
record. A set of one or more related data items
grouped for processing. In reference to a database
table, record is synonymous with row .
region. (Sub)set of records with similar
characteristics in their active fields. Regions are used
to visualize a prediction result.
round-robin method. A method by which items are
sequentially assigned to units. When an item has
been assigned to the last unit in the series, the next
item is assigned to the first again. This process is
repeated until the last item has been assigned. The
Intelligent Miner uses this method, for example, to
store records in output files during a partitioning job.
rule. A clause in the form head ⇐ body. It specifies
that the head is true if the body is true.
rule body. Represents the specified input data for a
mining function.
rule group. Covers all rules containing the same
items in different variations.
rule head. Represents the derived items detected by
the Associations mining function.
S
scale. A system of mathematical notation: fixed-point
or floating-point scale of an arithmetic value.
scaling. To adjust the representation of a quantity by
a factor in order to bring its range within prescribed
limits.
scale factor. A number used as a multiplier in
scaling. For example, a scale factor of 1/1000 would
be suitable to scale the values 856, 432, -95, and /182
to lie in the range from -1 to +1, inclusive.
self-organizing feature map.
Map .
See Kohonen Feature
sensitivity analysis report. An output from the
Classification − Neural mining function that shows
which input fields are relevant to the classification
decision.
sequential patterns. Intertransaction patterns such that the presence of one
set of items is followed by another set of items in a database of
transactions over a period of time.

similar time sequences. Occurrences of similar sequences in a database of
time sequences.
Structured Query Language (SQL). An established
set of statements used to manage information stored
in a database. By using these statements, users can
add, delete, or update information in a table, request
information through a query, and display results in a
report.
supervised learning. A learning algorithm that
requires input and resulting output pairs to be
presented to the network during the training process.
Back propagation, for example, uses supervised
learning and makes adjustments during training so
that the value computed by the neural network will
approach the actual value as the network learns from
the data presented. Supervised learning is used in
the techniques provided for predicting classifications
as well as for predicting values.
support factor. Indicates the occurrence of the
detected association rules and sequential patterns
based on the input data.
symbolic name. In a programming language, a
unique name used to represent an entity such as a
field, file, data structure, or label. In the Intelligent
Miner you specify symbolic names, for example, for
input data, name mappings, or taxonomies.
T
taxonomy. Represents a hierarchy or a lattice of
associations between the item categories of an item.
These associations are called taxonomy relations.
taxonomy relation. The hierarchical associations
between the item categories you defined for an item.
A taxonomy relation consists of a child item category
and a parent item category.
trained network. A neural network containing
connection weights that have been adjusted by a
learning algorithm. A trained network can be
considered a virtual processor: it transforms inputs to
outputs.
training. The process of developing a model which
understands the input data. In neural networks, the
model is created by reading the records of the input
and modifying the network weights until the network
calculates the desired output data.
translation process. Converting the data provided in
the database to scaled numeric values in the
appropriate range for a mining kernel using neural
networks. Different techniques are used depending
on whether the data is numeric or symbolic. Also,
converting neural network output back to the units
used in the database.
transaction. A set of items or events that are linked
by a common key value, for example, the articles
(items) bought by a customer (customer number) on a
particular date (transaction identifier). In this
example, the customer number represents the key
value.
transaction ID. The identifier for a transaction, for
example, the date of a transaction.
transaction group. The identifier for a set of
transactions. For example, a customer number, can
represent a transaction group that includes all
purchases of a particular customer during the month
of May.
V
vector. A quantity usually characterized by an
ordered set of numbers.
W
weight. The numeric value of an adaptive connection
representing the strength of the connection between
two processing units in a neural network.
winner. The index of the cluster which has the
minimum Euclidean distance from the input record.
Used in the Kohonen Feature Map to determine which
output units will have their weights adjusted.
List of Abbreviations
AMRP     air miles reward program
API      application programming interface
CIM      continuous interactive marketing
CPU      central processing unit
CRM      customer relationship marketing
DB2      DATABASE 2
GB       gigabyte
GIS      graphical information system
IBM      International Business Machines Corporation
IT       information technology
ITSO     International Technical Support Organization
LIS      large item sets
MBA      market basket analysis
MDA      multidimensional database analysis
MDL      minimum description length
MPP      massive parallel processor
OLAP     online analytical processing
PC       personal computer
POS      point of sale
PROFS    Professional Office System
R&D      research and development
RBF      radial basis function
RFM      recency frequency monetary
RMS      root mean square
ROI      return on investment
SQL      structured query language
TB       terabyte
Index
A
accuracy 46, 98
affinity analysis 30
aggregate function
aggregation
algorithm selection
analysis
affinity 30
cluster detail 48, 53
data 6
decision tree 120
factor 39
intelligence 6
link 9, 13
market basket 13
multidimensional database 7
product affinity 13
result 16, 101, 120
time-series 113
anomalous decision tree 100
application
data mining 8
mode 24
architecture
Intelligent Miner 20
neural 118
association discovery 13
association rule discovery
attrition model
clustering result 126
data definition 114
data preparation 116
decision tree result 120
gains chart 119
input field selection 118
mining process 113
neural network result 126
output field selection 118
parameter selection 117
RBF result 122
result visualization 119
time-series result 128
average error
C
calculate ROI 88
campaign
cross selling 9
categoric variables 12, 50
chart
gains 105
cleaning
data 92
cluster
characterization 49, 54
detail analysis 48, 53
maximum number of 45
profiling 48, 63
result comparison 65
selection 71, 76
values 48
clustering
demographic 12
disadvantages 49
mode 24
neural 12
process 44
competition
focus on 3
components
CRM 32
Intelligent Miner 20
Condorect criteria 12
confidence
minimum 74
confusion matrix 65, 101
continuous marketing 28
continuous variables 50
create
objective variable 90
CRM
components 32
cross selling
association discovery 75
association rule discovery 77
campaign 9
cluster selection 71, 76
data preparation 73
data selection 72
identification 32
large item set removal 75
mining process 70
mining results 76
opportunity identification 68
parameter settings 74
rebuild rules 76
target marketing model 32
B
behavior pattern
customer 1
binary variables
business
analyst 7
50
customer
behavior pattern 1
focus on 2
purchasing pattern 8
relationship management 25
retention 30, 32, 110
retention management 10
segmentation 29, 32
customer segmentation
cluster analysis 48, 53
cluster characterization 49, 54
cluster comparison 65
cluster profiling 48, 63
data preparation 37
data selection 35
decision tree characterization 65
input field selection 47
mining process 34
output field selection 48
parameter selection 45
result visualization 48, 50
D
data
access 21
analysis 6
cleaning 38, 92
definition 21, 114
flood 4
mining 5, 16
preparation 16, 37, 73, 92, 116
reduction 16
sampling 16, 93
selection 16, 35, 72
transformation 38, 92
data mining
application 8
process 34, 70, 89
results 103
techniques 9
data warehouse 3, 7
database
analysis 7
marketing 8
segmentation 9, 11
decision tree
results 103
decision tree 10, 49, 65, 96, 99
anomalous 100
parameter 97
demographic clustering 12, 45
demographic profile 27
deviation
detection 9
standard 47
discovery
association 13, 75
association rule 77
subpopulations 3
discrete numeric variables 50
discretization 39
distance
absolute 47
range 47
standard deviation 47
drivers 2
E
enablers 4
error
average 117
rate 98
F
factor analysis 39
feature selection 95
focus
on competition 3
on customer 2
on data assets 3
relationship 2
forecast horizon 117
function
aggregate 36
analytical 23
mining 23
processing 24
statistical 23
G
gains chart 105
geodemographic profile 27
H
hierarchy 73
horizon
forecast 117
I
input field selection 47, 99
integer variables 50
Intelligent Miner
architecture 20
components 20
invalid value 38
item constraints 75
item set
large 75
removal 75
neural
clustering 12
network 12, 98
network parameter 98, 100
prediction 19
numeric variables 50
K
Kohonen feature map 12
L
large item sets
r e m o v a l 75
learning
supervised 9
unsupervised 11
level aggregation 73
library processing 22
link analysis 9, 13
logarithmic transformation 43
M
machine learning 1, 5
management
customer relationship 25
market
niche 2
saturation 1
market basket analysis 13
marketing
continuous 28
database 8
matrix
confusion 101
maximum rule length 75
minimum
confidence 74
support 74
mining
base 22
data 5
functions 23
kernel 22
result 22
missing value 38
model
attrition 110
propensity 90
modeling
predictive 9
modes
application 24
clustering 24
test 24
training 24
momentum 98, 118
N
network
neural 98, 100
neural
architecture 118
O
output field selection 48, 100
P
parameter
accuracy 98
clustering algorithm 46
error rate 98
in-sample size 97, 98
item constraints 74
learning rate 98
maximum number of clusters 45
maximum number of passes 45
minimum confidence 74
minimum support 74
momentum 98
number of centers 97
number of passes 97, 98
number of records 97
out-sample size 97, 98
purity per internal node 97
region size 97, 98
rule length 74
selection 45, 97, 117
settings 74
tree depth 97
passes
maximum number of 45
permutation 75
prediction
neural 19
potential strategies 3
profile 130
strategies 2
tactical movements 3
time-series 128
value 97
value with RBF 100, 101
predictive modeling 9
probability weighting 47
process
clustering 44
data mining 34, 70, 89
processing
functions 24
library 22
product
aggregation 72
association analysis 73, 74
hierarchy 73
ID 73
product affinity
analysis 13
product association 73, 74
profile prediction 130
profiles
demographic 27
geodemographic 27
psychographic 27
project
design 15
evaluation 15
management 15
objectives 15
plan 15
t e a m 15
propensity 11
model 90
psychographic profile 27
R
range
distance measure 47
rate
error 98
learning 98
RBF modeling
result 122
record scores 48
result
analysis 16, 101, 120
data mining 103
decision tree 103
RBF modeling 122
visualization 49, 100, 119
result visualization
customer segmentation 48
ROI
calculate 88
rules
rebuild 76
S
sampling
data 93
stratified 94
scatterplot 11
scores
record 48
scoring 11
segmentation 11
customer 29, 32
database 9
selection
algorithm 96
cluster 76
data 35
feature 95
input field 47, 99
output field 48, 100
parameter 45, 97, 117
selling
cross 9, 13, 30
up 30
shareholder value 33, 36
similarity threshold 46
standard deviation 47
statistical functions 23
stratified sampling 94
subpopulation discovery 3
supervised learning 9
support
minimum 74
T
target marketing
algorithm selection 96
data preparation 92
data sampling 93
decision tree result 103
feature selection 95
input field selection 99
mining process 89
neural network result 108
output field selection 100
parameter selection 97
RBF result 106
result analysis 101
result visualization 100
train and test 95
variable creation 90
TaskGuide 19, 22
techniques
data mining 9
test 95
threshold similarity 46
time sequence 13
time-series
analysis 113
parameter 117
prediction 128
result analysis 128
train 95
transformation
data 38, 92
logarithmic 43
U
unknown value 38
unsupervised learning 11
up-selling 30
user interface 21
V
value
invalid 38
missing 38
prediction with RBF 97, 100, 101
shareholder 33, 36
unknown 38
valid 38
variable
binary 50
categoric 12
categorical 50
continuous 50
create objective 90
discrete numeric 50
integer 50
numeric 50
visualization
result 48, 49, 100, 119
visualizer 21
W
weighting
information theoretic 47
probability 47
window size 117
ITSO Redbook Evaluation
Intelligent Miner for Data Applications Guide
SG24-5252-00
Your feedback is very important to help us maintain the quality of ITSO redbooks. Please complete this
questionnaire and return it using one of the following methods:
•  Use the online evaluation form found at http://www.redbooks.com
•  Fax this form to: USA International Access Code + 1 914 432 8264
•  Send your comments in an Internet note to [email protected]
Please rate your overall satisfaction with this book using the scale:
(1 = very good, 2 = good, 3 = average, 4 = poor, 5 = very poor)
Overall Satisfaction
____________
Please answer the following questions:
Was this redbook published in time for your needs?
Yes____ No____
If no, please explain:
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
What other redbooks would you like to see published?
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
Comments/Suggestions:
( THANK YOU FOR YOUR FEEDBACK! )
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
_____________________________________________________________________________________________________
IBML
Intelligent Miner for Data Applications Guide
SG24-5252-00
Printed in the U.S.A.