Data Mining and Knowledge Discovery
for Strategic Business Optimization
Peter van der Putten
ALP Group, LIACS & KiQ Ltd
November 2004
Why is a business in business?
• Successful businesses create a lot of added
value for their customers and capture it
– Maximize long term profit
• Optimize: Maximize sales, minimize costs, minimize risk
Challenges
• Businesses are bigger
• Fragmentation of products, customer interaction
channels, market segments
• Fierce competition, chaotic economic climate
and dynamic customer behavior
• Data glut & information overflow
• Solution: data mining & knowledge discovery for
strategic business optimization
Credit scoring case: minimizing loan
risk while maximizing loan acceptance
[Figure: accepted volume out of all applications]
• Expert knowledge: 29.8% accepted, 12.7% infection
• Prediction model plus rules: 34.5% accepted, 9.1% infection
Marketing case: maximizing direct
mail response while minimizing cost
A model was created that predicts the probability of responding to a mailing. By using the model to select which customers to mail, we could reach 50% of the responders by mailing only 20% of all customers.
[Figure: cumulative gains chart for the Logistic-Regression model; x-axis: Cases (%), 0 to 100; y-axis: Cum. positive, 0.00 to 100.00]
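The "50% of the responders by mailing 20% of the customers" figure is a point on a cumulative gains curve. A minimal sketch of how such a point can be computed follows; the function name `cumulative_gains` and the toy scores and responses are hypothetical and are not taken from the original campaign.

```python
def cumulative_gains(scores, responded, fraction):
    """Share of all responders reached when mailing the top `fraction` of customers by score."""
    # Rank customers from highest to lowest predicted response probability.
    ranked = sorted(zip(scores, responded), key=lambda pair: pair[0], reverse=True)
    n_mailed = round(len(ranked) * fraction)
    reached = sum(flag for _, flag in ranked[:n_mailed])
    total = sum(responded)
    return reached / total if total else 0.0


# Hypothetical toy data: 10 customers, 4 of whom responded (1 = responded).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
responded = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0]

gain_at_20 = cumulative_gains(scores, responded, 0.20)
print(gain_at_20)  # 0.5: mailing the top 20% reaches half of the responders
```

Evaluating the function over a range of fractions (10%, 20%, ..., 100%) gives the full curve that is plotted against "Cases (%)".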
[Screenshots: Siebel call-center screens with OMEGA recommendations, including an appropriate cross-sell script for car insurance and one-click access to a retention script]
Although the customer might have general preferences as well, OMEGA predicts exit risk is overriding. Using a combination of predictive models and business rules, OMEGA suggests to Siebel an immediate attempt to retain the customer.
Overview
• Why Data Mining?
• The Data Mining Process
• Data Mining Tasks
• Data Mining Techniques
• Future Outlook
• Data Mining Opportunities by Sector and Function
• Q&A
Some working definitions….
• ‘Data Mining’ and ‘Knowledge Discovery in
Databases’ (KDD) are used interchangeably
• Data mining =
– the discovery of interesting, meaningful and
actionable patterns hidden in large amounts of data
• Multidisciplinary field originating from artificial
intelligence, pattern recognition, statistics,
machine learning, econometrics, ….
Data mining is a process…
• Model Development
– Objective
– Data collection & preparation
– Model construction
– Model evaluation
– Combining models with business knowledge into
decision logic
• Model / decision logic deployment
• Model / decision logic monitoring
Data mining tasks
• Undirected, explorative, descriptive,
‘unsupervised’ data mining
– Matching & search
– Profile & rule extraction
– Clustering & segmentation
• Directed, predictive, ‘supervised’ data mining
– Predictive modeling
Data mining task example:
Clustering & segmentation
[Screenshots: Looking Glass at start, intermediate result, and final result]
Data mining task example:
predictive modeling
[Figure: past experience (data on cases whose behaviour is known to be good or bad) is used to build a scoring model. The model scores new cases on a scale from 1 (worse business) to 10 (better business): Case A scores 7, Case B scores 4.]
Data mining task example:
predictive modeling
Collected data:
Income   Age   Children
60K      38    2
30K      23    1
30K      29    0
...      ...   ...
120K     55    2
Data mining task example:
predictive modeling
Known customer behaviour:
Income   Age   Children   Status
60K      38    2          Good
30K      23    1          Good
30K      29    0          Bad
...      ...   ...        ...
120K     55    2          Bad
Data mining task example:
predictive modeling
Income   Age   Children   Status   Value   Score
60K      38    2          Good     100     12
30K      23    1          Good     45      2
30K      29    0          Bad      -80     -29
...      ...   ...        ...      ...     ...
120K     55    2          Bad      -40     -5
score = (0 x Income) + (-1 x Age) + (25 x Children)
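The scoring formula can be applied row by row. The sketch below does this for three of the table rows; the dictionary layout and the names `WEIGHTS` and `score` are illustrative assumptions, and only the weights and input values come from the slide.

```python
# Weights from the slide: score = (0 x Income) + (-1 x Age) + (25 x Children)
WEIGHTS = {"income": 0, "age": -1, "children": 25}


def score(customer):
    """Weighted sum of the predictors, as in a linear regression model."""
    return sum(WEIGHTS[name] * customer[name] for name in WEIGHTS)


customers = [
    {"income": 60_000, "age": 38, "children": 2},
    {"income": 30_000, "age": 23, "children": 1},
    {"income": 120_000, "age": 55, "children": 2},
]

scores = [score(c) for c in customers]
print(scores)  # [12, 2, -5]
```

Because the weight on income is 0, income has no influence on the score in this example; only age and number of children do.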
Data mining task example:
predictive modeling
• Recruitment
– Who will respond to a mailing campaign?
– To whom can we cross-sell which products?
– What will be the customer value one year from now?
• Retention
– Who is going to cancel his/her mobile phone subscription?
Should I attempt to keep this customer?
– Which customers have accounts that will go dormant?
• Risk
– Should I sell a loan to this person?
– How much money will someone claim on a policy?
– Is this caller going to pay his bills?
Data mining techniques
for predictive modeling
• Linear and logistic regression
• Decision trees
• Neural Networks
• Genetic Algorithms
• ….
Linear Regression Models
score = (0 x Income) + (-1 x Age) + (25 x Children)
Regression in pattern space
Only a single line available in pattern space to separate classes
[Figure: income plotted against age, with one straight line separating class ‘square’ from class ‘circle’]
Decision Trees
20000 customers, response 1%
Income > 150000?
  yes: 1200 customers
    Balance > 50000?
      yes: 400 customers, response 0.1%
      no: 800 customers, response 1.8%
  no: 18800 customers
    Purchases > 10?
      etc.
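A decision tree of this kind reads naturally as nested conditions. The sketch below encodes only the branches for which the slide gives figures; the function name and dictionary keys are assumptions, and the "Purchases > 10?" branch is left open because the slide ends it with "etc.".

```python
def predicted_response_rate(customer):
    """Follow the splits of the example tree and return the leaf's response rate.

    Returns None for the branch whose leaves are not given on the slide.
    """
    if customer["income"] > 150_000:
        if customer["balance"] > 50_000:
            return 0.001  # leaf with 400 customers, response 0.1%
        return 0.018  # leaf with 800 customers, response 1.8%
    # 18800 customers continue to the "Purchases > 10?" split (not specified further).
    return None


high_balance = predicted_response_rate({"income": 200_000, "balance": 80_000})
low_balance = predicted_response_rate({"income": 200_000, "balance": 10_000})
print(high_balance, low_balance)  # 0.001 0.018
```

Each `if` corresponds to one question in the tree, and each `return` to one leaf.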
Decision Trees in Pattern Space
Line pieces perpendicular to axes
Each line is a split in the tree, two answers to a question
[Figure: income plotted against age, divided into rectangular regions by axis-parallel line pieces]
Infotrees (Genetic Programming)
• Nested regression formulas
– sum(average(region, spend), max(age, children))
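[Figure: the formula drawn as a tree, with sum at the root and two branches: average (over region and spend) and max (over age and children)]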
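A nested formula such as sum(average(region, spend), max(age, children)) can be stored as a tree and evaluated recursively. The sketch below is one possible representation, not the actual Infotree implementation: the tuple layout, the `OPERATORS` table and the numeric encoding of `region` are all assumptions made for illustration.

```python
from statistics import mean

# Assumed operator set; each operator takes a list of numbers.
OPERATORS = {"sum": sum, "average": mean, "max": max}


def evaluate(node, record):
    """Evaluate a formula tree: a node is a predictor name or (operator, [children])."""
    if isinstance(node, str):
        return record[node]  # leaf: look up the predictor value
    operator, children = node
    return OPERATORS[operator]([evaluate(child, record) for child in children])


# sum(average(region, spend), max(age, children))
tree = ("sum", [("average", ["region", "spend"]), ("max", ["age", "children"])])

# Hypothetical customer record; region is assumed to be numerically encoded.
record = {"region": 2, "spend": 10, "age": 38, "children": 2}

value = evaluate(tree, record)
print(value)  # 44, i.e. average(2, 10) + max(38, 2) = 6 + 38
```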
Infotrees in Pattern Space
Infotrees can separate any class in pattern space, even if the class boundary is non-linear
→ Can model complex customer behavior
[Figure: income plotted against age, with a non-linear class boundary]
Genetic Algorithms / Programming
• How to find the best Infotree? Genetic algorithms
– Based on the idea of evolution
– Start with (random) Infotrees
– Build a new generation
• Fittest models can reproduce to create offspring, worst
models die
• Small amount of mutation occurs to keep exploring
– Repeat process
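The evolutionary loop described above can be sketched in a few lines. To keep the example short, the models here are lists of three integer weights rather than Infotrees, and the fitness function rewards closeness to a made-up target; all names, parameters and the toy problem are assumptions for illustration.

```python
import random

TARGET = [0, -1, 25]  # hypothetical "ideal" weights used only to define a toy fitness


def fitness(model):
    """Higher is better; 0 means the model matches the target exactly."""
    return -sum((w - t) ** 2 for w, t in zip(model, TARGET))


def random_model(rng):
    return [rng.randint(-30, 30) for _ in TARGET]


def crossover(mother, father, rng):
    point = rng.randint(1, len(mother) - 1)  # swap the tail of one model for the other's
    return mother[:point] + father[point:]


def mutate(model, rng):
    changed = list(model)
    changed[rng.randrange(len(changed))] += rng.choice([-1, 1])
    return changed


def evolve(generations=50, size=30, mutation_rate=0.1, seed=0):
    rng = random.Random(seed)
    population = [random_model(rng) for _ in range(size)]  # start with random models
    for _ in range(generations):  # repeat process
        population.sort(key=fitness, reverse=True)
        survivors = population[: size // 2]  # worst models die
        offspring = []
        while len(survivors) + len(offspring) < size:
            mother, father = rng.sample(survivors, 2)  # fittest models reproduce
            child = crossover(mother, father, rng)
            if rng.random() < mutation_rate:  # small amount of mutation
                child = mutate(child, rng)
            offspring.append(child)
        population = survivors + offspring
    return max(population, key=fitness)


best = evolve()
print(best, fitness(best))
```

Because the fittest half of each generation is carried over unchanged, the best fitness never decreases from one generation to the next.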
Notes about Infotree models:
Cross-over
• New models can be created by cross-over:
– part of one model is swapped with part of another
– parts may be chosen randomly or intelligently
[Figure: two old models, drawn as trees of operators (s1, amean, quadv, convex, concave, invert) over predictors (region, age, spend, salary, children), exchange sub-trees at their cross-over points to form a new model]
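Cross-over on tree-shaped models amounts to cutting a sub-tree out of one parent and grafting in a sub-tree from the other. The sketch below picks both cross-over points at random; the tuple representation and helper names are assumptions, and the operator names are borrowed from the slide purely as labels, since their definitions are not given.

```python
import random


def subtrees(tree, path=()):
    """Yield (path, subtree) for every node; a path is a tuple of child indexes."""
    yield path, tree
    if isinstance(tree, tuple):
        _, children = tree
        for i, child in enumerate(children):
            yield from subtrees(child, path + (i,))


def replace(tree, path, new_subtree):
    """Return a copy of `tree` with the node at `path` replaced by `new_subtree`."""
    if not path:
        return new_subtree
    operator, children = tree
    i = path[0]
    grafted = replace(children[i], path[1:], new_subtree)
    return (operator, children[:i] + [grafted] + children[i + 1:])


def crossover(mother, father, rng):
    """Swap a randomly chosen part of `mother` for a randomly chosen part of `father`."""
    path, _ = rng.choice(list(subtrees(mother)))
    _, donated = rng.choice(list(subtrees(father)))
    return replace(mother, path, donated)


mother = ("amean", ["region", ("quadv", ["age", "spend"])])
father = ("convex", [("invert", ["salary"]), ("concave", ["age", "children"])])

child = crossover(mother, father, random.Random(1))
print(child)
```

Choosing the cross-over points "intelligently" would mean replacing the two `rng.choice` calls with a rule, for example one that favours sub-trees from the fitter parent.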
Notes about Infotree models:
Mutation
• New models can be created by mutation:
– part of a model (a sub-tree, operator or predictor) is changed
– part and type of change may be chosen randomly or intelligently
[Figure: three mutation examples on a model tree: sub-tree mutation, operator mutation and predictor mutation; the trees combine operators (convex, concave, amean, s2) with predictors (region, spend, age, children, house, TV Region)]
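Operator and predictor mutation can be sketched as changing the label of one randomly chosen node. The example below covers only these two mutation types and leaves out sub-tree mutation; the tuple representation, the `OPERATORS` and `PREDICTORS` lists, and the assumption that any operator can stand in for any other are all illustrative.

```python
import random

# Assumed label sets, borrowed from the slide; the operators' definitions are not given.
OPERATORS = ["convex", "concave", "amean"]
PREDICTORS = ["region", "spend", "age", "children", "house"]


def node_paths(tree, path=()):
    """Yield the path (tuple of child indexes) of every node in the tree."""
    yield path
    if isinstance(tree, tuple):
        for i, child in enumerate(tree[1]):
            yield from node_paths(child, path + (i,))


def mutate(tree, rng, path=None):
    """Return a copy of `tree` in which one node's operator or predictor is changed."""
    if path is None:
        path = rng.choice(list(node_paths(tree)))  # pick the part to change at random
    if path:
        operator, children = tree
        i = path[0]
        mutated = mutate(children[i], rng, path[1:])
        return (operator, children[:i] + [mutated] + children[i + 1:])
    if isinstance(tree, str):  # predictor mutation
        return rng.choice([p for p in PREDICTORS if p != tree])
    operator, children = tree  # operator mutation
    return (rng.choice([o for o in OPERATORS if o != operator]), children)


model = ("convex", ["region", ("concave", ["age", "children"])])
mutant = mutate(model, random.Random(2))
print(mutant)
```

Passing an explicit `path` selects the node to change deliberately instead of at random.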
Short Demo
(if time allows…)
Model to predict caravan policy ownership
Combining this model with other models
and business rules
Data Mining: the Future
• Business (marketing)
– More fine-grained segmentation down to the cluster or
individual level
– More personalised actions, inbound and outbound, in all
customer contact channels
– Optimization of both value for the business and the
customer
– Privacy
• Technical
– From Data Mining to Decisioning, combining multiple
models with business rules
– Monitoring business and model performance
– Data Mining Process Automation
Let’s discuss:
Data Mining Opportunities by Function
• Marketing, Sales, CRM
• Product Development, R&D
• Manufacturing, Production, Logistics
• Customer service
• Finance
• Procurement
• Human Resources
• IT
• ….
Let’s discuss:
Data Mining Opportunities by Sector
• Retail
• Telco
• Pharma
• Government
• Automotive
• Oil
• Charity
• Consumers / Citizens
• ….
The Paper: Requirements
• 2500 words ±10%, APA style references
• No plagiarism / copying! Rephrase in your own words,
reference, cite & quote
• Two parts of 1250 words each
– Your grasp of the research topic: what is data mining? Own
interpretation, clear, put into context
– Memo to CEO/CIO of a specific company / industry: what are
the benefits/changes/opportunities and next steps (best
practice, proof of concept)? Impact, convincing, plan to
action.
The Paper: Suggestions
• Suggestions for ‘companies’
– KPN Mobile, Marketing: how to reduce loss of customers to
competitors
– Dutch Police, Strategic Innovation: opportunities for law
enforcement, privacy implications
– Pfizer, Drug Discovery: using data mining to find new drugs
– Google, Product Management / R&D: opportunities for new
data mining features to enhance customer experience
– Your Idea!
The Paper: Resources
• Webpage for this talk:
– http://www.liacs.nl/~putten/ictvision.html
• General Writing Resources:
– http://www.liacs.nl/~putten/writingpapers.html
• Homepage:
– www.liacs.nl/~putten , mail [email protected]
Dilbert’s Perspective on Data Mining