Churn prediction in telecoms via data mining approach: A case study using SAS Enterprise Miner
Jean-Jacques ESSOME BELL
The approach used: Supervised learning from usage pattern
This method is the classic data mining approach using so-called supervised learning. The foundation of this approach is that we learn from the behavior of some customers in order to predict what could happen to similar ones. The principle of this method is depicted in the graph below and consists of the following:
•We define a key variable to study (aka target variable), here Churn status;
•We constitute a database of customers described with the target variable (customers who churned and those who did not) and all other available variables coming from internal sources (data warehouse) and/or external sources (market research);
•We build a sample from this database (the generic sample in the diagram), the size of which is defined using market research principles (*);
•This sample is split into a learning (aka training) sample, which will serve to build several models; a validation sample, which will serve to observe the behavior of these models and to fine-tune them; and a test sample, which will serve to determine the accuracy of the model selected;
•The model selected will be applied to the whole customer base by determining the probability that any subscriber will churn; this is the scoring.
(*) We do not necessarily need to build an astronomically large sample. Even if the company has 10 or 20 million subscribers, we can get accurate results with a sample of 10,000 subscribers if the selection has been done correctly.
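As an illustration of this workflow, here is a minimal sketch in Python with scikit-learn, standing in for SAS Enterprise Miner; the file names and column names are assumptions for the example, not taken from the case study data.

```python
# Minimal sketch of the supervised-learning workflow: learn from customers
# whose churn status is known, then score similar ones.
# File and column names are hypothetical; inputs are assumed numeric.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

sample = pd.read_csv("generic_sample.csv")    # labeled sample of customers
X = sample.drop(columns=["churn_ind"])        # descriptive variables
y = sample["churn_ind"]                       # target variable: churn status

model = DecisionTreeClassifier().fit(X, y)    # learn from churners and non-churners

# Scoring: a churn probability for every subscriber in the whole base.
# Column [:, 1] is the "yes" class when the labels are no/yes.
base = pd.read_csv("customer_base.csv")
base["p_churn"] = model.predict_proba(base[X.columns])[:, 1]
```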
The case study
The case study developed here concerns a telecom company that faces a
critical problem of churn/attrition, the rate being estimated at 4.1% per month.
Data requirements for the Analysis
The basic requirements are:
• Data from the customer information file, such as age, sex, ZIP code, etc.
• Data from the service account file, such as pricing plan, activation date, contract identification, etc.
• Data from the billing system, such as number of calls, airtime, fixed-line time, total amount spent, number of calls made to the customer care center, change in price plan, etc.
1. Description of the variables
Due to our supposed lack of knowledge about the factors that can explain churn, many variables have been considered here. Note that only the variable name and the description are presented; in the original file, information such as the type of variable or its range is available.
Var. # | Variable name | Description
1 | Msisdn | Telephone number
2 | Offer | Tariff plan subscribed to
3 | Three_g | Customer has 3G
4 | mms | Customer has MMS
5 | gprs | Customer has GPRS
6 | tenure | Length of the relationship between the customer and Lambda Telecoms to date
7 | Churn_ind | Churn status (the target variable)
8 | Count_voice_national | Number of national calls the customer has made
9 | Duration_voice_national | Duration of national calls made by the customer
10 | Value_voice_national | Amount of national calls made by the customer
11 | Count_voice_international | Number of international calls the customer has made
12 | Duration_voice_international | Duration of international calls made by the customer
13 | Value_voice_international | Amount of international calls made by the customer
14 | Count_voice_roaming | Number of roaming calls the customer has made
15 | Duration_voice_roaming | Duration of roaming calls made by the customer
16 | Value_voice_roaming | Amount of roaming calls made by the customer
17 | Count_SMS | Number of SMS the customer has sent
18 | Value_SMS | Value of SMS sent by the customer
19 | Count_MMS | Number of MMS the customer has sent
20 | Value_MMS | Value of MMS sent by the customer
21 | Count_contents | Number of contents the customer has downloaded
22 | Value_contents | Total amount of contents downloaded by the customer
23 | Total_value_ASPU | Total amount of usage by the customer
24 | Total_count | Total number of calls made by the customer
25 | Count_peak | Total number of peak calls made by the customer
26 | Count_offpeak | Total number of off-peak calls made by the customer
27 | Total_duration | Total duration of the calls the customer has made
28 | Call_lenght | Average duration of the calls made by the customer
29 | SOI | Number of different persons called by the customer
30 | Count_incoming_calls | Number of calls received by the customer
31 | SOR | Number of different persons who have called the customer
32 | Duration_incoming_calls | Total duration of calls received by the customer
33 | Change_total_value | Change in amount of usage by the customer compared with the previous period
34 | Change_total_duration | Change in total duration by the customer compared with the previous period
35 | Ratio_SOI/SOR | Ratio between the number of persons called and the number of persons who called the customer
36 | Ratio_Peak/offpeak | Ratio between peak calls and off-peak calls made by the customer
2. Process flow of the project (Churn 31-JayJay)
The process flow of this churn prediction modeling exercise using SAS Enterprise Miner is depicted in the process flow diagram. We can observe the following steps of the data mining process.
1. After being imported into the SAS interface, the sample dataset is described via classic techniques of descriptive statistics in order to obtain a preliminary understanding of the churn phenomenon: proportion of churners, ranking of the factors that have an impact on churn, etc.
2. The sample has been partitioned between:
•A learning sample, aka training sample, that helps to build the various models;
•A validation sample that is used to prevent a modeling node from overfitting the training data and to compare models;
•A test sample which is used for a final assessment of the model.
Note that in SAS Enterprise Miner the partition of the total sample can be specified in terms of absolute numbers or in terms of proportions. We chose to partition the sample as follows:
•Learning sample: 50%;
•Validation sample: 25%;
•Test sample: 25%.
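As a sketch of this partition in scikit-learn terms (with X and y as in the earlier sketch), two successive stratified splits yield the 50/25/25 proportions:

```python
from sklearn.model_selection import train_test_split

# First split: 50% learning sample, 50% held out
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.50, stratify=y, random_state=42)

# Second split: the held-out half becomes 25% validation and 25% test.
# Stratifying on the target keeps the churn proportion stable across samples.
X_valid, X_test, y_valid, y_test = train_test_split(
    X_rest, y_rest, train_size=0.50, stratify=y_rest, random_state=42)
```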
3. A phase of data preparation is implemented before the models are used. The key tasks of this phase concern the imputation of missing values, the transformation of variables, and the selection of variables.
Imputation of missing values.
This task is very important before using approaches like Neural Networks or Logistic
Regression because they ignore observations that contain missing values. The
imputation method varies according to the type of data:
•For numerical variables, we selected the median of the non-missing values to replace the missing ones.
•For categorical variables, we chose the tree surrogate method: the predicted values of a decision tree are then used to replace the missing values.
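A hedged sketch of these two imputation choices: scikit-learn's SimpleImputer covers the median case directly, while the tree surrogate method has no direct scikit-learn equivalent, so a decision tree trained on the complete rows is used here as an approximation; the "offer" column name is an assumption.

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.tree import DecisionTreeClassifier

# Numerical variables: replace missing values with the median
num_cols = X.select_dtypes("number").columns
X[num_cols] = SimpleImputer(strategy="median").fit_transform(X[num_cols])

def tree_impute(df: pd.DataFrame, col: str) -> pd.Series:
    """Predict missing categories with a decision tree trained on the rows
    where the category is known (an approximation of tree surrogate)."""
    known = df[col].notna()
    tree = DecisionTreeClassifier(max_depth=5).fit(
        df.loc[known, num_cols], df.loc[known, col])
    filled = df[col].copy()
    if (~known).any():
        filled[~known] = tree.predict(df.loc[~known, num_cols])
    return filled

X["offer"] = tree_impute(X, "offer")   # "offer" is a hypothetical column name
```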
Variables transformation
The issue of variable transformation arises from the fact that input data can be more informative on a scale other than the one on which it was originally collected. For example, variable transformations can be used to stabilize variance, remove nonlinearity, or counter non-normality. Therefore, for many models, transformations of the input data can lead to a better model.
Many methods can be used for variable transformation. We decided to select the variables with a strong skewness and to apply a log transformation.
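A sketch of this step: compute the skewness of each numeric input and log-transform the strongly skewed ones. The cutoff of 1.0 is an assumption; log1p keeps zero values (e.g. zero calls) defined, and the sketch assumes non-negative inputs.

```python
import numpy as np

# Log-transform numeric inputs whose skewness is strong (|skew| > 1.0)
skewed = [c for c in num_cols if abs(X[c].skew()) > 1.0]
for col in skewed:
    X[col] = np.log1p(X[col])
```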
Variables selection
In some cases, it is advisable to reduce the number of variables included in the model. This is because some variables (as identified in the exploratory data analysis phase) have very poor discriminating power. In addition, when we have too many variables, we also need a lot of computing resources.
In our example, we decided to change the role of these variables to rejected. This decision was taken just before the selection of the neural network model.
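One possible sketch of this rejection step, using mutual information as the measure of discriminating power; the measure and the cutoff are assumptions, as SAS Enterprise Miner offers its own variable selection criteria.

```python
from sklearn.feature_selection import mutual_info_classif

# Flag inputs that carry almost no information about the target and drop
# them, mirroring the "rejected" role in SAS Enterprise Miner.
mi = mutual_info_classif(X[num_cols], y, random_state=0)
rejected = [c for c, score in zip(num_cols, mi) if score < 1e-3]
X = X.drop(columns=rejected)
```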
4. For the effective stage of modelling, we decided to make use of four approaches:
•Decision tree: CART algorithm;
•Logistic regression: stepwise algorithm;
•Neural network: auto-neural cascade architecture;
•Memory-based reasoning (aka K-Nearest Neighbours).
In this handbook (see following pages), we elaborate only on decision tree and logistic regression.
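As a hedged sketch, the four approaches map to the following scikit-learn counterparts; the auto-neural cascade and memory-based reasoning nodes have no exact equivalents, so a small multilayer perceptron and k-nearest neighbours stand in for them, and all hyperparameters here are assumptions.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier

models = {
    "decision_tree": DecisionTreeClassifier(criterion="gini"),       # CART-style
    "logistic_regression": LogisticRegression(max_iter=1000),        # stepwise sketched later
    "neural_network": MLPClassifier(hidden_layer_sizes=(10,), max_iter=500),
    "memory_based_reasoning": KNeighborsClassifier(n_neighbors=15),  # k-NN
}
for name, model in models.items():
    model.fit(X_train, y_train)   # X_train, y_train as prepared above
```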
5. Each of these models produced outcomes regarding churn rules, and they were compared in terms of accuracy and predictive power in order to select the best one, using both the learning sample and the validation sample.
6. The best model is then assessed via the test sample, and the scoring process is applied to it in order to produce a churn probability for every customer in the whole database.
For tutorial purposes, the reader can see in the ribbon above the diagram the steps that constitute the process flow of a data mining project as applied by SAS: the so-called SEMMA process (Sample, Explore, Modify, Model, Assess).
3. Outcomes of the project
For space reasons, we present the outcomes of only two churn prediction models here:
decision tree and logistic regression.
Outcomes from Decision Trees
The decision tree approach is actually the predictive model most often used in combination with others.
Various algorithms can be used (CART, CHAID, C5.0, etc.), each of them with various parameters we can play with. That is why it is possible to build many different models based on decision trees.
In our example, we selected the CART (Classification And Regression Tree) algorithm, for which the splitting rule is based on the Gini index.
Note that with decision trees, the other decisions to take concern:
•The maximum depth of the tree. For instance, when setting 10, the tree will have up to ten generations of nodes below the root node.
•The leaf size constraint. When choosing 100, we decide that the minimum number of training observations (here, subscribers) in any leaf will be 100.
•The number of surrogate rules. This specification enables SAS Enterprise Miner to use a given number of surrogate rules in each non-leaf node if the main splitting rule relies on an input whose value is missing. Missing values are therefore not problematic in the decision tree approach.
Below is an example of the tree built with the CART algorithm.
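In scikit-learn terms, these settings correspond roughly to the following sketch; scikit-learn's trees have no surrogate rules, which is why missing values were imputed in the preparation step.

```python
from sklearn.tree import DecisionTreeClassifier

cart = DecisionTreeClassifier(
    criterion="gini",      # CART splitting rule based on the Gini index
    max_depth=10,          # up to ten generations below the root node
    min_samples_leaf=100,  # at least 100 training observations per leaf
).fit(X_train, y_train)
```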
The outcomes of a decision tree can also be presented in the form of business rules (called English Rules in SAS terminology), as depicted in the diagram below. The English Rules window contains the IF-THEN logic that distributes observations into each leaf node of the decision tree.
It is important to notice that several churn rules are generally produced, due to the various profiles of potential churners.
Coming back to the tree on the previous page, below are two examples of churn rules for our example.
Note that there are seven leaf nodes in this tree. For each leaf node, the following information is listed:
•the node number (for instance, 14);
•the number of training observations in the node (for example, 192 in node 14);
•the percentage of training observations in the node with churn_ind = no (did not churn), adjusted for prior probabilities;
•the percentage of training observations in the node with churn_ind = yes (churned), adjusted for prior probabilities.
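An equivalent of these English Rules can be obtained in scikit-learn with export_text, which prints the IF-THEN splitting logic leading to each leaf of the fitted tree:

```python
from sklearn.tree import export_text

# Print the rule path to every leaf of the tree fitted above
print(export_text(cart, feature_names=list(X_train.columns)))
```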
Outcomes from Logistic Regression
Before being applied, logistic regression models need an intensive phase of data
preparation in order to deal with missing values and the distribution shape of
some variables.
Three selection methods can be used with the logistic regression approach: backward, forward, and stepwise.
In our example, we selected stepwise. Below is an example of the outcomes as
produced with SAS EM.
We can observe that the two most discriminant variables are the same as those identified with the decision tree approach: ratio_SOI_SOR and ratio_peak_offpeak.
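scikit-learn has no built-in stepwise procedure, so the sketch below uses forward sequential feature selection as a rough stand-in for SAS EM's stepwise logistic regression; the number of retained variables (10) is an assumption.

```python
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

logit = LogisticRegression(max_iter=1000)
selector = SequentialFeatureSelector(
    logit, n_features_to_select=10, direction="forward").fit(X_train, y_train)
selected = X_train.columns[selector.get_support()]

# Refit the model on the selected variables only
logit.fit(X_train[selected], y_train)
```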
4. Models comparison
The four churn prediction models built (decision tree, logistic regression, memory-based reasoning, neural network) are then compared in order to select the best one, using the validation sample.
Many criteria can be used for the selection.
Some of them are statistical metrics: average squared error, misclassification rate (based on the confusion matrix), etc.
Others take the shape of a graph: ROC chart, lift curve, etc.
Three examples of the outcomes of the models comparison are presented. The table depicts the statistical metrics of the four models, and SAS Enterprise Miner recommends the best one (aka the champion model), flagged with the value Y in the first column.
It appears that the decision tree is the best model, with a misclassification rate of 6.97% on the training sample and 7.33% on the validation sample. On the other hand, the neural network is the worst, with a misclassification rate that reaches 31.9% on the validation sample (!).
It is interesting to notice that the error is lower for the training sample than for the validation and test samples. A good indicator of the predictive capacity of the model is how much this error increases from one sample to another.
When we look at the lift curve for the test sample, we can observe that with the decision tree, selecting the 20% of the base with the highest churn scores identifies churners 3.7 times more effectively than a random choice. In other words, instead of applying a loyalty program to the whole base, we can select 20% of the base, reach 74% of the churners, and thereby save a lot of money. The cumulative % captured response curve presents the same result in another form.
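A sketch of both comparisons: the misclassification rate of each model on the training and validation samples, and the lift of the decision tree at the top-scored 20% of the test sample; the churn_ind labels are assumed to be yes/no, as in the leaf-node listing above.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Misclassification rate = 1 - accuracy, on training and validation samples
for name, model in models.items():
    mis_train = 1 - accuracy_score(y_train, model.predict(X_train))
    mis_valid = 1 - accuracy_score(y_valid, model.predict(X_valid))
    print(f"{name}: train {mis_train:.2%}, validation {mis_valid:.2%}")

# Lift at 20%: churners captured in the top-scored 20% of the test sample,
# divided by the 20% a random selection of the same size would capture.
p = models["decision_tree"].predict_proba(X_test)[:, 1]   # P(churn_ind = "yes")
top = np.argsort(p)[::-1][: int(0.20 * len(p))]
captured = (y_test.to_numpy()[top] == "yes").sum() / (y_test == "yes").sum()
print(f"lift at 20%: {captured / 0.20:.1f} (captures {captured:.0%} of churners)")
```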
5. Model deployment via Scoring
The model deployment phase concerns the scoring of new observations (a new data set) based on the churn rules of the champion model. In other words, what is the probability that a given customer will churn over the next months?
At this stage, we need to write SAS code, as depicted in the diagram below.
We can observe in the small diagram that it is possible to select only the subscribers who reach a given threshold of churn probability, here 0.85.
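The scoring step itself would be generated as SAS code by the Score node; an equivalent sketch in Python, with a hypothetical file name, looks like this:

```python
import pandas as pd

# Score a new data set with the champion model and keep only the
# subscribers whose churn probability reaches the 0.85 threshold.
new_data = pd.read_csv("new_subscribers.csv")
new_data["p_churn"] = cart.predict_proba(new_data[X_train.columns])[:, 1]
high_risk = new_data[new_data["p_churn"] >= 0.85]
```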
6. Customer Knowledge Matrix (aka Customer Strategic Map)
Building a Customer Strategic Map is the first step that makes it possible to move from Mass
Marketing to Target Marketing and ultimately to real CRM.
Below is the Customer Strategic Map generated from our case study. It then becomes possible to adopt a clear strategy for each individual subscriber, knowing the quadrant he/she belongs to.