How to approach a segmentation project for ten million customers with SAS Enterprise Miner
Luis Miguel Muruzabal
Endesa - Spain
Florence, 1st June, 2001
Index
Introduction to Endesa
Starting point
Short term solution: Segmentation project using SAS
SeUGI Florence 2001
IM&P
Electricity Liberalization Process in Spain
Main Agents
[Diagram: monetary and physical relations among the main agents: GENERATION, POOL - OME, REE (Transport), RETAILERS (liberalized), DISTRIBUTORS (regulated) and FINAL CUSTOMERS]
Planned and real schedules of liberalization
[Figure: two timelines, planned and real, showing the eligibility threshold falling in steps (>15 GWh, >9 GWh, >5 GWh, >1 GWh, High Voltage, Mass Market) until the whole market is free; years marked: 1998, 1999, 2000, 2002, 2003, 2004 and 2007]
Sharp acceleration of the liberalization process
FREE MARKET
Endesa in Spain and other countries (year 2000)
                         Supplies        GWh
FECSA                   3.693.000     25.347
ERZ                       701.000      3.957
GESA                      553.000      3.941
Sevillana               3.872.000     20.307
Unelco                    877.000      5.700
Viesgo                    495.000      3.350
Endesa Energía              6.300     20.500
TOTAL Spain            10.198.000     83.102
Chile                  >1.500.000    >22.000
Argentina              >4.000.000    >30.000
Colombia               >1.500.000    >15.000
Brazil                 >3.000.000    >15.000
Perú                     >800.000     >6.900
TOTAL Latin America   >11.000.000    >80.000
Holland                  >500.000
Total Europe             >500.000
TOTAL                 >22.000.000   >150.000
Endesa Suppliers
Competitors
NATIONAL MARKET SHARE = 44,7%
Countries where Endesa distributes electricity
Main market segments of Endesa in Spain (1999)
[Figure: three pie charts showing the share of CUSTOMERS, USAGE and REVENUE held by each segment: Residential, Small businesses, Commercial + Industrial (C&I), Key Customers]
                     Customers      GWh   Mill. Ptas   kWh/cust.
Residential          8.486.000   20.800      417.000       2.500
Small businesses     1.620.000    7.300      140.000       4.500
C&I                     87.100   17.200      250.000     197.500
Key Customers            6.900   37.800      387.000   5.500.000
Total               10.200.000   83.100    1.194.000       8.150
% Mass Market            99,9%      34%          47%       2.800
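The kWh/cust. column can be reproduced by dividing each segment's usage by its number of customers. The sketch below uses the 1999 figures from the table; the function and variable names are illustrative only, and the slide values appear to be rounded.

```python
# Figures taken from the 1999 segment table (thousands separators removed).
SEGMENTS = {
    "Residential": {"customers": 8_486_000, "gwh": 20_800},
    "Small businesses": {"customers": 1_620_000, "gwh": 7_300},
    "C&I": {"customers": 87_100, "gwh": 17_200},
    "Key Customers": {"customers": 6_900, "gwh": 37_800},
}

KWH_PER_GWH = 1_000_000


def kwh_per_customer(customers, gwh):
    """Average annual usage per customer, in kWh."""
    return gwh * KWH_PER_GWH / customers


def average_usage(segments):
    """Return {segment name: kWh per customer} for every segment."""
    return {
        name: kwh_per_customer(seg["customers"], seg["gwh"])
        for name, seg in segments.items()
    }


usage = average_usage(SEGMENTS)
for name, value in usage.items():
    print(f"{name:18s} {value:12,.0f} kWh/customer")
```

The spread between segments (a few thousand kWh for a household against several million for a key customer) is what makes usage such a strong first segmentation variable.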
Index
Introduction to Endesa
Starting point
– Starting point
– Data Mart Implementation
– Operating Plan (medium term)
Short term solution: Segmentation project using SAS
Starting Point
No real need to know the customers and their needs well, because the market was a monopoly
All efforts focused on cost reduction
DATA BASES
➤ Seven databases and a massive volume (10 million customers)
➤ Few variables relevant for segmentation
➤ Data quality uneven across the databases
➤ Analysis information was mainly on paper
SYSTEMS
➤ Oriented to customer administration, not to analysis
➤ Data requested from the Information Systems (S.I.) Department as needed
➤ Long response times, low flexibility
➤ No specific marketing tools
SEGMENTATION
➤ “Home-made”, with very basic tools (Excel, Access…)
➤ Based on internal variables: tariff, usage, kW, revenue...
➤ Descriptive statistics and basic activities for C&I
➤ Difficult to define common strategies for the whole Group
➤ Need to know the customers better and to improve the marketing campaigns
Data-Mart Launching Approach
➤ Overcome scepticism about the tool
➤ Very low initial budget (implementation, tools, etc.)
➤ Acquire experience
➤ Use very few, but relevant, data
INITIAL VERSION → Revision → Revision → Revision → DATA WAREHOUSE
➤ Incorporate more data and external databases
➤ Incorporate other tools (OLAP, Data Mining)
➤ Increase processing power
➤ Increase the number of users
Implementation Plan Scheme
Data-Mining Pilot
Prove Benefits
Human Resources
Implement Data-Mart
Marketing Campaigns
Short term solution
Increase specialization
Increase functionality
Technical environment
Tools (OLAP, Data Mining, GIS)
Increased Knowledge
Improve data quality
Medium term solution
Existing Tools in the market
Company decision support capabilities
Decision support (Final User)
Presentation: Visualization tools (Business Analysts)
Data Mining: Discover Information (Technical Analysts)
Exploration: OLAP, ROLAP, Statistical Analysis, Query reporting
Data Warehouse / Data Mart
Data Sources: Internal and External Databases, files... (Data Base Administrator)
Source: IBM
Main Analysis Tools
Query Reporting
➤ Data extraction (usually relational databases)
➤ Basic segmentations
➤ Data preparation for marketing campaigns
➤ Users need high technical knowledge
Decision Support System (DSS) / EIS
➤ Visualization and multidimensional analysis (presentation)
➤ Pre-defined queries
➤ Solid, consistent information over time
➤ Users need less technical knowledge
Data Mining (SAS Enterprise Miner™)
➤ Discover relations among variables
➤ Behaviour patterns
➤ Customer profiles
➤ Requires very expert users
Geomarketing / GIS
➤ Visualize customers on the territory
➤ Search for potential customers based on standard profiles
➤ Easy to use, but with some limitations
➤ Requires data normalization
Operating plan (Medium Term)
OPERATING PLAN
➤ MARKET SEGMENTATION
➤ FRAUD DETECTION
➤ CHURN DETECTION - ACTION
➤ LOCATE CUSTOMERS (OWN - OTHERS)
➤ PREDICTIONS
➤ MARKETING CAMPAIGNS
➤ DEFINITION AND MONITORING OF GOALS
➤ OTHER USES
Structure and Resources Needed
FUNCTIONAL
TECHNICAL / SYSTEMS
EXTERNAL (CONSULTING)
MANAGEMENT / ORGANIZATION
➤ CREATE A SPECIFIC TEAM
➤ ALIGNMENT OF THE FUNCTIONAL AND TECHNICAL AREAS
➤ SUPPORT FROM THE ORGANIZATION
➤ LEARN WITH EXTERNAL SUPPORT
Index
Introduction to Endesa
Starting point
Short term solution: Segmentation project using SAS
– Main goals
– Methodology
– Results
Specific questions to answer
➤ Is the existing, purely electrical data enough to make a good segmentation?
➤ What can we learn about our customers from the electricity usage data?
➤ What should we know about our customers to define segments?
– To sell electricity
– To sell other products and services
➤ How good is our database? (quality)
➤ What kind of problems arise from our group structure of companies?
➤ What are the main steps in advanced market research?
Main goals of the Data Mining Pilot Project
➤ Segment the residential and SME customers, using Data Mining tools
➤ Check the feasibility of these tools for Endesa’s needs
➤ Obtain a description of the main customer typologies
➤ Be able to carry out strategic analysis until the final Data Mart and its tools are implemented
➤ Define the main segmentation variables
➤ Identify potential improvements in data quality
➤ Identify the main features of the future Data Mart, and the tools needed
SAS Methodology applied to Endesa: Data Mining SEMMA
SAMPLING: selection of the initial variables; with sampling or without sampling
EXPLORATION: visual exploration; descriptive statistics
MODIFICATION: creation of new variables; selection and removal of variables; imputation of missing values
MODELING: neural networks; decision trees; logistic regression; user-defined models
ASSESSMENT: compare models against statistical and business rules; select the best model (final model)
IMPROVE AND REFINE
Tool: SAS Enterprise Miner™
External help: SAS + Apex Group
Data source: Endesa's distributors' files (supplies - customers)
Sampling Model
Representative sample of supplies data (tariff, power, usage, revenue, geographic zone, climate zone…)
Residential
➤ Random stratified sample, using proportional criteria on the variables: Tariff, Company, Habitat, Activity code
➤ Sample size: 1.500.000 supplies
➤ No personal data, to avoid regulation problems
SMEs
➤ Without sampling
➤ Only companies (not personal data)
➤ Analyze the total population, once the data has been modified
➤ Sample size: 1.250.000 supplies
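A proportional stratified sample like the residential one can be sketched as follows. This is a minimal illustration in plain Python, not the procedure actually run in SAS Enterprise Miner: the field names, the toy population and the 10% sampling fraction are assumptions made up for the example.

```python
import random
from collections import defaultdict


def stratified_sample(records, strata_keys, fraction, seed=0):
    """Draw the same fraction from every stratum (proportional allocation).

    records     -- list of dicts, one per supply
    strata_keys -- field names that jointly define a stratum
    fraction    -- sampling fraction applied to each stratum
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for rec in records:
        strata[tuple(rec[key] for key in strata_keys)].append(rec)

    sample = []
    for key in sorted(strata):
        members = strata[key]
        # Keep at least one record so that very small strata are not lost.
        n = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, n))
    return sample


# Hypothetical toy population; tariff, company and habitat values are invented.
STRATA = ["tariff", "company", "habitat", "activity_code"]
population = [
    {
        "supply_id": i,
        "tariff": "2.0" if i % 4 else "3.0",
        "company": "A" if i % 2 else "B",
        "habitat": "urban" if i % 5 else "rural",
        "activity_code": "home",
    }
    for i in range(1000)
]

sample = stratified_sample(population, STRATA, fraction=0.1)
print(len(sample), "supplies sampled out of", len(population))
```

Because every stratum contributes in proportion to its size, the sample keeps the population's mix of tariffs, companies and habitats, which is the point of the proportional criteria named on the slide.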
Exploration and Modification
EXPLORATION
➤ Analysis of the sample distribution
➤ Descriptive analysis of the variables to study
➤ Generation of frequency counts for the main variables and cross-tabulations
➤ Visual exploration of data
➤ Correlation analysis
➤ Identification of the main discriminant variables
MODIFICATION
➤ Creation of new variables, through direct and statistical methods (factor analysis)
➤ Transformation and replacement of variables to suit the model that is going to be used
➤ Imputation of missing values
➤ Creation of typologies (target variable), through cluster analysis (k-means algorithm)
➤ Data partition using proportional criteria for the same variables as the sample:
– 50% Training data set
– 30% Validation data set
– 20% Test data set
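The last two steps, creating typologies with k-means and splitting the data 50/30/20, can be sketched in plain Python. This is an illustration under assumptions, not the Enterprise Miner implementation: the two features are invented, the initial centroids are passed explicitly to keep the run deterministic, and the partition is stratified on the cluster label as a stand-in for the sampling variables used in the project.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)),
               key=lambda j: squared_distance(point, centroids[j]))


def k_means(points, initial_centroids, max_iterations=100):
    """Plain Lloyd iteration; returns (centroids, labels)."""
    centroids = [list(c) for c in initial_centroids]
    labels = [nearest(p, centroids) for p in points]
    for _ in range(max_iterations):
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
        new_labels = [nearest(p, centroids) for p in points]
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels


def partition(strata, shares=(0.5, 0.3), seed=0):
    """Split record indices into training/validation/test, stratum by stratum.

    shares gives the training and validation shares; the rest is the test set.
    """
    rng = random.Random(seed)
    train, valid, test = [], [], []
    for stratum in sorted(set(strata)):
        idx = [i for i, s in enumerate(strata) if s == stratum]
        rng.shuffle(idx)
        n_train = round(len(idx) * shares[0])
        n_valid = round(len(idx) * shares[1])
        train += idx[:n_train]
        valid += idx[n_train:n_train + n_valid]
        test += idx[n_train + n_valid:]
    return train, valid, test


# Hypothetical data: two invented, already scaled features per supply.
low = [[0.1 * (i % 5), 0.1 * (i // 5)] for i in range(20)]
high = [[5 + 0.1 * (i % 5), 5 + 0.1 * (i // 5)] for i in range(20)]
points = low + high

centroids, labels = k_means(points, initial_centroids=[points[0], points[-1]])
train, valid, test = partition(labels)
print(len(train), len(valid), len(test))
```

The cluster label returned by `k_means` plays the role of the target variable that the predictive models are later trained to reproduce.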
Modeling
• Create a predictive model for the different segments:
➤ Neural networks
➤ Decision trees, created automatically and interactively
➤ Logistic regression
➤ Creation of specific models (combination of different models)
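As an illustration of one of the candidate model types, the sketch below fits a one-vs-rest logistic regression for a single segment by batch gradient descent. It is a teaching sketch in plain Python, not the Enterprise Miner regression node: the single scaled feature, the target values, the learning rate and the number of epochs are all assumptions chosen for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, x):
    """Estimated probability that supply x belongs to the segment."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))


def fit_logistic(rows, targets, learning_rate=0.5, epochs=5000):
    """Batch gradient descent on the log-loss; returns [bias, w1, ..., wn]."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        gradient = [0.0] * len(weights)
        for x, y in zip(rows, targets):
            error = predict(weights, x) - y
            gradient[0] += error
            for j, value in enumerate(x, start=1):
                gradient[j] += error * value
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, gradient)]
    return weights


# Hypothetical training set: one feature (usage scaled to [0, 1]);
# target is 1 when the supply belongs to the segment being modelled.
rows = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]
targets = [0, 0, 0, 1, 1, 1]

weights = fit_logistic(rows, targets)
print("P(segment | usage=0.2):", round(predict(weights, [0.2]), 3))
print("P(segment | usage=0.8):", round(predict(weights, [0.8]), 3))
```

One such model per segment yields the membership probabilities that are compared and combined in the validation and scoring step.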
Validation and Scoring
• There is no single best model to discriminate all segments
• Selection of the best discrimination model for each segment
• Execution of the models and scoring of the whole database
• Classification of supplies according to the highest probability of segment membership
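The four steps above can be sketched as follows. The slide does not say which criterion was used to pick the best model per segment; this sketch assumes the area under the ROC curve (AUC) on the validation set and assumes that each model outputs a one-vs-rest membership probability per segment. Segment names, model names and probabilities are invented.

```python
def auc(scores_pos, scores_neg):
    """Probability that a random positive outscores a random negative (ties count half)."""
    wins = sum((p > n) + 0.5 * (p == n) for p in scores_pos for n in scores_neg)
    return wins / (len(scores_pos) * len(scores_neg))


def best_model_per_segment(validation, segments, models):
    """validation: list of (true_segment, {model: {segment: probability}})."""
    best = {}
    for seg in segments:
        def seg_auc(model):
            pos = [probs[model][seg] for true, probs in validation if true == seg]
            neg = [probs[model][seg] for true, probs in validation if true != seg]
            return auc(pos, neg)
        best[seg] = max(models, key=seg_auc)
    return best


def score(probs, best):
    """Assign the segment whose selected model gives the highest probability."""
    return max(best, key=lambda seg: probs[best[seg]][seg])


# Hypothetical validation results for two segments and two models.
validation = [
    ("S1", {"tree": {"S1": 0.9, "S2": 0.4}, "neural_net": {"S1": 0.6, "S2": 0.1}}),
    ("S1", {"tree": {"S1": 0.8, "S2": 0.6}, "neural_net": {"S1": 0.3, "S2": 0.2}}),
    ("S2", {"tree": {"S1": 0.2, "S2": 0.5}, "neural_net": {"S1": 0.5, "S2": 0.9}}),
    ("S2", {"tree": {"S1": 0.1, "S2": 0.7}, "neural_net": {"S1": 0.2, "S2": 0.8}}),
]

best = best_model_per_segment(validation, ["S1", "S2"], ["tree", "neural_net"])

new_supply = {"tree": {"S1": 0.3, "S2": 0.9}, "neural_net": {"S1": 0.8, "S2": 0.6}}
print(best, "->", score(new_supply, best))
```

Each segment's probability is read from the model that discriminates that segment best, so a supply can end up classified by a different model than its neighbour.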
Pilot Project consequences
➤ Significant improvement in customer knowledge
➤ Organizational changes: orientation towards the main market segments
➤ Improvements in campaign preparation
➤ Strong need to improve data quality
➤ Need to obtain additional (non-electrical) information
➤ Need to create a Data-Mart:
– Obtain customer authorization to use personal data
– Primary and secondary information
➤ Need to invest in specific tools
Data Mart Structure (Implementation scheme)
SECONDARY SOURCES (e.g. surveys)
PERCEIVED QUALITY SYSTEM
EXTERNAL INFORMATION
STATISTICAL INFORMATION (e.g. EGM, INE)
COMMERCIAL SYSTEMS
CUSTOMERS
DATA QUALITY PROCEDURES AND DATA PROTECTION
Public databases
CUSTOMER MANAGEMENT SYSTEMS
• CUSTOMERS
• CONTACTS
• PROSPECTS
Data Mart:
• Residential
• SMEs
• ...
MARKET RESEARCH
ANALYSIS TOOLS
• Extractions
• Reports
• Analysis
• Operating Plan
• ...
• e.g. Enterprise Miner™
Additional data from customers
SURVEYS (samples)
DIRECT ACTION
INTERNAL ANALYSIS
CAMPAIGNS
CORRECTION
(only authorized)
(Supply Companies)
(Previous authorization from the customer)
COMMERCIAL SYSTEM
CUSTOMER PROFILES
PUBLIC INFORMATION
DATA MART
UPDATE
DATA MART + PROFILES
CAMPAIGNS (profiles)
Results application in Endesa
APPLICATIONS                    IMPLEMENTATION
Descriptive Segmentation        Current
Basic Segmentation              Current
Behaviour Patterns              Future
Customer Scoring                Current / Future
Marketing Campaigns             Current / Future
Churn Analysis - Prevention     Future
Customer Value                  Current / Future
Fraud Detection                 Future
Strategic Analysis              Current / Future
Summary...
➤ It is key that the functional area leads the project, with the support of the technical area
➤ It is easier to start with a low-cost pilot than at full scale
➤ Focus the pilot on proving the benefits of these tools
➤ You don't need to have all the tools from the beginning
➤ You need an operating plan and a growth plan
➤ You need external partners (e.g. SAS) who know these techniques well and have solid, proven tools