How to approach a segmentation project for ten million customers with SAS Enterprise Miner

Luis Miguel Muruzabal
Endesa - Spain
Florence, 1st June, 2001
SeUGI Florence 2001, IM&P

Index
Introduction to Endesa
Starting point
Short term solution: Segmentation project using SAS

Electricity Liberalization Process in Spain: Main Agents

[Diagram: monetary and physical relations among the main agents. Labels: Generation, Pool - OME, REE (Transport), Retailers (liberalized), Distributors (regulated), Final Customers]

Planned and real schedules of liberalization

[Timeline figure: planned versus real liberalization schedules between 1998 and 2007, by usage threshold (>15 GWh, >9 GWh, >5 GWh, >1 GWh, High Voltage, Mass Market). Caption: sharp acceleration of the liberalization process toward a free market]

Endesa in Spain and other countries (year 2000)

Company / country        Supplies         GWh
FECSA                   3.693.000      25.347
ERZ                       701.000       3.957
GESA                      553.000       3.941
Sevillana               3.872.000      20.307
Unelco                    877.000       5.700
Viesgo                    495.000       3.350
Endesa Energía              6.300      20.500
TOTAL Spain            10.198.000      83.102
Chile                  >1.500.000     >22.000
Argentina              >4.000.000     >30.000
Colombia               >1.500.000     >15.000
Brazil                 >3.000.000     >15.000
Perú                     >800.000      >6.900
TOTAL Latin America   >11.000.000     >80.000
Holland                  >500.000
Total Europe             >500.000
TOTAL                 >22.000.000    >150.000

National market share = 44,7%

[Maps: countries where Endesa distributes electricity. Legend: Endesa, Suppliers, Competitors]

Main market segments of Endesa in Spain (1999)

[Pie charts: share of customers, usage and revenue by segment (Residential, Small businesses, Commercial + Industrial, Key Customers)]

Segment              Customers       GWh   Mill. Ptas   kWh/cust.
Residential          8.486.000    20.800      417.000       2.500
Small businesses     1.620.000     7.300      140.000       4.500
C&I                     87.100    17.200      250.000     197.500
Key Customers            6.900    37.800      387.000   5.500.000
Total               10.200.000    83.100    1.194.000       8.150
% Mass Market            99,9%       34%          47%       2.800

Index
Introduction to Endesa
Starting point
– Starting point
– Data Mart Implementation
– Operating Plan (medium term)
Short term solution: Segmentation project using SAS

Starting Point

There was no real need to know the customers and their needs well, because the market was a monopoly. All efforts were focused on cost reduction.

DATA BASES
➤ Seven databases and massive volume (10 million customers)
➤ Few relevant variables for segmentation
➤ Data quality uneven across the different databases
➤ Analysis information was mainly "on paper"

SYSTEMS
➤ Oriented to customer administration, not to analysis
➤ Data requests to the IS Department made on demand
➤ Long response time, low flexibility
➤ No specific marketing tools

SEGMENTATION
➤ "Home-made", with very basic tools (Excel, Access…)
➤ Based on internal variables: tariff, usage, kW, revenue...
➤ Descriptive statistics and basic activities for C&I
➤ Difficult to define common strategies for the whole Group
➤ Need to know the customers better and to improve the marketing campaigns

Data-Mart Launching Approach

INITIAL VERSION
➤ Overcome skepticism about the tool
➤ Very low initial budget (implementation, tools, etc.)
➤ Acquire experience
➤ Use only a few relevant data items

[Diagram: the initial version evolves into the data warehouse through successive revisions]

DATA WAREHOUSE
➤ Incorporate more data and external databases
➤ Incorporate other tools (OLAP, Data Mining)
➤ Increase processing power
➤ Increase the number of users

Implementation Plan Scheme

[Diagram labels]
Short term solution: Data-Mining Pilot, Prove Benefits, Human Resources, Implement Data-Mart, Marketing Campaigns
Medium term solution: Increase specialization, Increase functionality, Technical environment, Tools (OLAP, Data Mining, GIS), Increased Knowledge, Improve data quality

Existing Tools in the market

Company decision support capabilities

[Pyramid diagram, from top to bottom]
Decision support
Presentation: visualization tools
Data Mining: discover information
Exploration: OLAP, ROLAP, statistical analysis, query reporting
Data Warehouse / Data Mart
Data Sources: internal and external databases, files...

Users shown alongside the pyramid: Final User, Business Analysts, Technical Analysts, Database Administrator
Source: IBM

Main Analysis Tools

Query Reporting
➤ Data extraction (usually from relational databases)
➤ Basic segmentations
➤ Data preparation for marketing campaigns
➤ Users need high technical knowledge

Decision Support System (DSS) / EIS
➤ Visualization and multidimensional analysis (presentation)
➤ Pre-defined queries
➤ Consistent information over time
➤ Users need less technical knowledge

Data Mining (SAS Enterprise Miner™)
➤ Discover relations among variables
➤ Behaviour patterns
➤ Customer profiles
➤ Requires very expert users

Geomarketing / GIS
➤ Visualize customers on the territory
➤ Search for potential customers based on standard profiles
➤ Easy to use, but with some limitations
➤ Requires data normalization

Operating plan (Medium Term)

[Diagram: OPERATING PLAN at the centre, surrounded by its uses]
➤ Market segmentation
➤ Fraud detection
➤ Churn detection and action
➤ Locate customers (own and others)
➤ Predictions
➤ Marketing campaigns
➤ Definition and monitoring of goals
➤ Other uses

Structure and Resources Needed

Areas involved: Functional, Technical / Systems, External (Consulting), Management / Organization
➤ Create a specific team
➤ Alignment of the functional and technical areas
➤ Support from the organization
➤ Learn with external support

Index
Introduction to Endesa
Starting point
Short term solution: Segmentation project using SAS
– Main goals
– Methodology
– Results

Specific questions to answer
➤ Is the existing purely electric data enough to make a good segmentation?
➤ What can we know about our customers from the electric usage data?
➤ What should we know about our customers to define segments?
    ➤ To sell electricity
    ➤ To sell other products and services
➤ How good is our database? (quality)
➤ What kind of problems do we have because of our structure of companies?
➤ What are the main steps in advanced market research?
Main goals of the Data Mining Pilot Project
➤ Segment the residential customers and SMEs using Data Mining tools
➤ Check the feasibility of these tools for Endesa's needs
➤ Obtain a description of the main typologies of customers
➤ Be able to make strategic analyses until the final Data Mart and its tools are implemented
➤ Define the main segmentation variables
➤ Detect potential improvements in data quality
➤ Identify the main features of the future Data Mart and the tools needed

SAS Methodology applied to Endesa: Data Mining SEMMA

[Flow diagram; the grouping of steps under each phase is reconstructed from the slide labels]
SAMPLING: selection of the initial variables; sampling or without sampling
EXPLORATION: visual exploration; descriptive statistics
MODIFICATION: creation of new variables; selection and removal of variables; imputation of missing values
MODELING: neural networks; decision trees; logistic regression; user-defined models
ASSESSMENT: compare models against statistical and business rules; select the best model; final model
Feedback loop: improve and refine

Tool: SAS Enterprise Miner™
External help: SAS + Apex Group
Data source: Endesa's Distributors files (supplies - customers)

Sampling Model

Residential
➤ Representative sample of supplies data (tariff, power, usage, revenue, geographic zone, climate zone…)
➤ Random stratified sample, using proportional criteria on the variables:
    ➤ Tariff
    ➤ Company
    ➤ Habitat
    ➤ Activity code
➤ Sample size: 1.500.000 supplies
➤ No personal data, to avoid regulatory problems

SMEs
➤ Without sampling
➤ Only companies (no personal data)
➤ Analyze the total population, once the data has been modified
➤ Sample size: 1.250.000 supplies

Exploration and Modification

EXPLORATION
➤ Analysis of the sample distribution
➤ Descriptive analysis of the variables to study
➤ Generation of frequency counts for the main variables, and cross tabulation
➤ Visual exploration of data
➤ Correlation analysis
➤ Identification of the main discriminant variables

MODIFICATION
➤ Creation of new variables, through direct and statistical methods (factor analysis)
➤ Transformation and replacement of variables to suit the model that is going to be used
➤ Imputation of missing values
➤ Creation of typologies (target variable) through cluster analysis (k-means algorithm)
➤ Data partition using proportional criteria on the same variables as the sample:
    ➤ 50% Training data set
    ➤ 30% Validation data set
    ➤ 20% Test data set

Modeling
Create a predictive model for the different segments:
➤ Neural networks
➤ Decision trees, created automatically and interactively
➤ Logistic regression
➤ Creation of specific models (combination of different models)

Validation and Scoring
➤ There is no single best model to discriminate all segments
➤ Selection of the best discrimination model for each segment
➤ Execution of the models and scoring of the whole database
➤ Classification of supplies according to the highest probability of belonging to a segment

Pilot Project consequences
➤ Significant improvement in customer knowledge
➤ Organizational changes: orientation to the main market segments
➤ Improvements in campaign preparation
➤ Strong need to improve data quality
➤ Need to obtain additional (non-electric) information
➤ Need to create a Data Mart:
    ➤ Obtain customer authorization to use personal data
    ➤ Primary and secondary information
➤ Need to invest in specific tools

Data Mart Structure (Implementation scheme)

[Diagram labels]
Inputs: Secondary sources (e.g. surveys); Perceived quality system; External information; Statistical information (e.g. EGM, INE); Public databases; Commercial systems; Customers
Control: Data quality procedures and data protection
Customer management systems: Customers, Contacts, Prospects
Data Mart: Residential, SMEs, ...
Market research
Analysis tools (e.g. Enterprise Miner™): Extractions, Reports, Analysis, Operating Plan, ...

Additional data from customers

[Diagram labels: Surveys (samples); Direct action; Internal analysis; Campaigns; Correction; Commercial system; Customer profiles; Public information; Data Mart update; Data Mart + profiles; Campaigns (profiles). Annotations: only authorized; Supply Companies; previous authorization from the customer]

Results application in Endesa

APPLICATIONS                      IMPLEMENTATION
Descriptive Segmentation          Current
Basic Segmentation                Current
Behaviour Patterns                Future
Customer Scoring                  Current / Future
Marketing Campaigns               Current / Future
Churn Analysis - Prevention       Future
Customer Value                    Current / Future
Fraud Detection                   Future
Strategic Analysis                Current / Future

Summary...
➤ It is key that the functional area leads the project, with the support of the technical area
➤ It is easier to start with a low-cost pilot than at full scale
➤ Focus the pilot on proving the benefits of these tools
➤ You don't need to have all the tools from the beginning
➤ You need an operating plan and a growth plan
➤ You need external partners (e.g. SAS) who know these techniques well and have solid, proven tools
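The sampling, partition and scoring steps described under Sampling Model, Exploration and Modification, and Validation and Scoring could be sketched roughly as follows. This is a minimal illustration in plain Python, not the SAS Enterprise Miner flow used in the project: the function names, the T1/T2 tariffs, the companies A/B and the synthetic supplies are all hypothetical, and only two of the four stratification variables are used.

```python
import random
from collections import defaultdict


def group_by_stratum(records, key):
    """Group records by the value of the stratification key."""
    strata = defaultdict(list)
    for rec in records:
        strata[key(rec)].append(rec)
    return strata


def stratified_sample(records, key, fraction, seed=0):
    """Draw the same fraction from every stratum (proportional allocation)."""
    rng = random.Random(seed)
    sample = []
    for members in group_by_stratum(records, key).values():
        sample.extend(rng.sample(members, round(len(members) * fraction)))
    return sample


def partition(records, key, shares=(0.5, 0.3, 0.2), seed=0):
    """Split each stratum into training, validation and test sets."""
    rng = random.Random(seed)
    train, valid, test = [], [], []
    for members in group_by_stratum(records, key).values():
        members = members[:]  # copy so the caller's list is not shuffled
        rng.shuffle(members)
        n_train = round(len(members) * shares[0])
        n_valid = round(len(members) * shares[1])
        train.extend(members[:n_train])
        valid.extend(members[n_train:n_train + n_valid])
        test.extend(members[n_train + n_valid:])  # remainder goes to test
    return train, valid, test


def best_segment(probabilities):
    """Return the segment with the highest predicted probability."""
    return max(probabilities, key=probabilities.get)


def stratum(supply):
    """Hypothetical stratum: the (tariff, company) pair of a supply."""
    return (supply["tariff"], supply["company"])


# Synthetic supplies, invented for illustration only.
supplies = [
    {"id": i,
     "tariff": "T2" if i % 4 == 0 else "T1",
     "company": "A" if i % 2 else "B"}
    for i in range(1000)
]

sample = stratified_sample(supplies, stratum, fraction=0.2)
train, valid, test = partition(sample, stratum)
segment = best_segment({"seg_1": 0.2, "seg_2": 0.7, "seg_3": 0.1})
```

Splitting inside each stratum, rather than over the whole sample, keeps the 50/30/20 proportions within every tariff and company combination, which is one way to read the "proportional criteria" mentioned on the slides.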