Churn prediction in telecoms via data mining approach: A case study using SAS Enterprise Miner
Jean-Jacques ESSOME BELL
The approach used: Supervised learning from usage pattern
This method is the classic data mining approach using so-called supervised learning. The foundation of this approach is that we learn from the behavior of some customers in order to predict what could happen to similar ones. The principle of this method is depicted in the graph below and consists of the following:
•We define a key variable to study (aka target variable), here Churn status;
•We constitute a database of customers described with the target variable (customers who churned and those who did not) and all other available variables coming from internal sources (data warehouse) and/or external sources (market research);
•We build a sample from this database (the generic sample in the diagram), the size of which is defined using market research principles (*);
•This sample is split into a learning (aka training) sample, which will serve to build several models; a validation sample, which will serve to observe the behavior of these models and to fine-tune them; and a test sample, which will serve to determine the accuracy of the model selected;
•The model selected will be applied to the whole customer base by determining the probability that any subscriber will churn; this is the scoring.
(*) We do not necessarily need to build an astronomically large sample. Even if the company has 10 or 20 million subscribers, we can get accurate results with a sample of 10,000 subscribers if the selection has been done correctly.
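As an illustration of this workflow, here is a minimal sketch in Python with scikit-learn, standing in for SAS Enterprise Miner; the file names and column names are assumptions for the example, not taken from the case study data.

```python
# Minimal sketch of the supervised-learning workflow: learn from customers
# whose churn status is known, then score similar ones.
# File and column names are hypothetical; inputs are assumed numeric.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

sample = pd.read_csv("generic_sample.csv")    # labeled sample of customers
X = sample.drop(columns=["churn_ind"])        # descriptive variables
y = sample["churn_ind"]                       # target variable: churn status

model = DecisionTreeClassifier().fit(X, y)    # learn from churners and non-churners

# Scoring: a churn probability for every subscriber in the whole base.
# Column [:, 1] is the "yes" class when the labels are no/yes.
base = pd.read_csv("customer_base.csv")
base["p_churn"] = model.predict_proba(base[X.columns])[:, 1]
```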
The case study
The case study developed here concerns a telecom company that faces a
critical problem of churn/attrition, the rate being estimated at 4.1% per month.
Data requirements for the Analysis
The basic requirements are:
• Data from the customer information file, such as age, sex, ZIP code, etc.
• Data from the service account file, such as pricing plan, activation date, contract identification, etc.
• Data from the billing system, such as number of calls, airtime, fixed-line time, total amount spent, number of calls made to the customer care center, change in price plan, etc.
1. Description of the variables
Due to our supposed lack of knowledge about the factors that can explain churn, many variables have been considered here. Note that only the variable name and the description are presented; in the original file, information such as the type of variable or its range is available.
Var. # | Variable name | Description
1 | Msisdn | Telephone number
2 | Offer | Tariff plan subscribed to
3 | Three_g | Customer has 3G
4 | mms | Customer has MMS
5 | gprs | Customer has GPRS
6 | tenure | Length of the relationship between the customer and Lambda Telecoms to date
7 | Churn_ind | Churn status (the target variable)
8 | Count_voice_national | Number of national calls the customer has made
9 | Duration_voice_national | Duration of national calls made by the customer
10 | Value_voice_national | Amount of national calls made by the customer
11 | Count_voice_international | Number of international calls the customer has made
12 | Duration_voice_international | Duration of international calls made by the customer
13 | Value_voice_international | Amount of international calls made by the customer
14 | Count_voice_roaming | Number of roaming calls the customer has made
15 | Duration_voice_roaming | Duration of roaming calls made by the customer
16 | Value_voice_roaming | Amount of roaming calls made by the customer
17 | Count_SMS | Number of SMS the customer has sent
18 | Value_SMS | Value of SMS sent by the customer
19 | Count_MMS | Number of MMS the customer has sent
20 | Value_MMS | Value of MMS sent by the customer
21 | Count_contents | Number of contents the customer has downloaded
22 | Value_contents | Total amount of contents downloaded by the customer
23 | Total_value_ASPU | Total amount of usage by the customer
24 | Total_count | Total number of calls made by the customer
25 | Count_peak | Total number of peak calls made by the customer
26 | Count_offpeak | Total number of off-peak calls made by the customer
27 | Total_duration | Total duration of the calls the customer has made
28 | Call_lenght | Average duration of the calls made by the customer
29 | SOI | Number of different persons called by the customer
30 | Count_incoming_calls | Number of calls received by the customer
31 | SOR | Number of different persons who have called the customer
32 | Duration_incoming_calls | Total duration of calls received by the customer
33 | Change_total_value | Change in amount of usage by the customer compared with the previous period
34 | Change_total_duration | Change in total duration by the customer compared with the previous period
35 | Ratio_SOI/SOR | Ratio between the number of persons called and the number of persons who called the customer
36 | Ratio_Peak/offpeak | Ratio between peak calls and off-peak calls made by the customer
2. Process flow of the project (Churn 31-JayJay)
The process flow of this churn prediction modeling exercise using SAS Enterprise Miner is depicted in the process flow diagram. We can observe the following steps of the data mining process.
1. After being imported into the SAS interface, the sample dataset is described via classic techniques of descriptive statistics in order to obtain a preliminary understanding of the churn phenomenon: proportion of churners, ranking of the factors that have an impact on churn, etc.
2. The sample has been partitioned between:
•A learning sample, aka training sample, that helps to build the various models;
•A validation sample that is used to prevent a modeling node from overfitting the training data and to compare models;
•A test sample which is used for a final assessment of the model.
Note that in SAS Enterprise Miner the partition of the total sample can be specified in terms of absolute numbers or in terms of proportions. We chose to partition the sample as follows:
•Learning sample: 50%;
•Validation sample: 25%;
•Test sample: 25%.
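As a sketch of this partition in scikit-learn terms (with X and y as in the earlier sketch), two successive stratified splits yield the 50/25/25 proportions:

```python
from sklearn.model_selection import train_test_split

# First split: 50% learning sample, 50% held out
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.50, stratify=y, random_state=42)

# Second split: the held-out half becomes 25% validation and 25% test.
# Stratifying on the target keeps the churn proportion stable across samples.
X_valid, X_test, y_valid, y_test = train_test_split(
    X_rest, y_rest, train_size=0.50, stratify=y_rest, random_state=42)
```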
3. A phase of data preparation is implemented before the models are used. The key tasks of this phase concern the imputation of missing values, the transformation of variables, and the selection of variables.
Imputation of missing values.
This task is very important before using approaches like Neural Networks or Logistic
Regression because they ignore observations that contain missing values. The
imputation method varies according to the type of data:
•For numerical variables, we selected the median of the non-missing values to replace the missing ones.
•For categorical variables, we chose the tree surrogate method: the predicted values of a decision tree are then used to replace the missing values.
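A hedged sketch of these two imputation choices: scikit-learn's SimpleImputer covers the median case directly, while the tree surrogate method has no direct scikit-learn equivalent, so a decision tree trained on the complete rows is used here as an approximation; the "offer" column name is an assumption.

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.tree import DecisionTreeClassifier

# Numerical variables: replace missing values with the median
num_cols = X.select_dtypes("number").columns
X[num_cols] = SimpleImputer(strategy="median").fit_transform(X[num_cols])

def tree_impute(df: pd.DataFrame, col: str) -> pd.Series:
    """Predict missing categories with a decision tree trained on the rows
    where the category is known (an approximation of tree surrogate)."""
    known = df[col].notna()
    tree = DecisionTreeClassifier(max_depth=5).fit(
        df.loc[known, num_cols], df.loc[known, col])
    filled = df[col].copy()
    if (~known).any():
        filled[~known] = tree.predict(df.loc[~known, num_cols])
    return filled

X["offer"] = tree_impute(X, "offer")   # "offer" is a hypothetical column name
```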
Variables transformation
The issue of variable transformation arises from the fact that input data can be more informative on a scale other than the one on which it was originally collected. For example, variable transformations can be used to stabilize variance, remove nonlinearity, or counter non-normality. Therefore, for many models, transformations of the input data can lead to a better model.
Many methods can be used for variable transformation. We decided to select the variables with a strong skewness and to apply a log transformation.
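A sketch of this step: compute the skewness of each numeric input and log-transform the strongly skewed ones. The cutoff of 1.0 is an assumption; log1p keeps zero values (e.g. zero calls) defined, and the sketch assumes non-negative inputs.

```python
import numpy as np

# Log-transform numeric inputs whose skewness is strong (|skew| > 1.0)
skewed = [c for c in num_cols if abs(X[c].skew()) > 1.0]
for col in skewed:
    X[col] = np.log1p(X[col])
```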
Variables selection
In some cases, it is advisable to reduce the number of variables included in the model. This is because some variables (as identified in the exploratory data analysis phase) have very poor discriminating power. In addition, when we have too many variables, we also need a lot of computing resources.
In our example, we decided to change the role of these variables to rejected. This decision was taken just before the selection of the neural network model.
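One possible sketch of this rejection step, using mutual information as the measure of discriminating power; the measure and the cutoff are assumptions, as SAS Enterprise Miner offers its own variable selection criteria.

```python
from sklearn.feature_selection import mutual_info_classif

# Flag inputs that carry almost no information about the target and drop
# them, mirroring the "rejected" role in SAS Enterprise Miner.
mi = mutual_info_classif(X[num_cols], y, random_state=0)
rejected = [c for c, score in zip(num_cols, mi) if score < 1e-3]
X = X.drop(columns=rejected)
```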
4. For the effective stage of modelling, we decided to make use of four approaches:
•Decision tree: CART algorithm;
•Logistic regression: stepwise algorithm;
•Neural network: auto-neural cascade architecture;
•Memory-based reasoning (aka K-Nearest Neighbours).
In this handbook (see following pages), we elaborate only on decision tree and logistic regression.
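As a hedged sketch, the four approaches map to the following scikit-learn counterparts; the auto-neural cascade and memory-based reasoning nodes have no exact equivalents, so a small multilayer perceptron and k-nearest neighbours stand in for them, and all hyperparameters here are assumptions.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier

models = {
    "decision_tree": DecisionTreeClassifier(criterion="gini"),       # CART-style
    "logistic_regression": LogisticRegression(max_iter=1000),        # stepwise sketched later
    "neural_network": MLPClassifier(hidden_layer_sizes=(10,), max_iter=500),
    "memory_based_reasoning": KNeighborsClassifier(n_neighbors=15),  # k-NN
}
for name, model in models.items():
    model.fit(X_train, y_train)   # X_train, y_train as prepared above
```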
5. Each of these models produced outcomes regarding churn rules, and they were compared in terms of accuracy and predictive power in order to select the best one, using both the learning sample and the validation sample.
6. The best model is then assessed via the test sample, and the scoring process is applied to it in order to produce a churn probability for every customer in the whole database.
For tutorial purposes, the reader can see in the ribbon above the diagram the steps that constitute the process flow of a data mining project as applied by SAS: the so-called SEMMA process (Sample, Explore, Modify, Model, Assess).
3. Outcomes of the project
For space reasons, we present the outcomes of only two churn prediction models here:
decision tree and logistic regression.
Outcomes from Decision Trees
The decision tree approach is actually the predictive model most often used in combination with others.
Various algorithms can be used (CART, CHAID, C5.0, etc.), each of them with various parameters we can play with. That is why it is possible to build many different models based on decision trees.
In our example, we selected the CART (Classification And Regression Tree) algorithm, for which the splitting rule is based on the Gini index.
Note that with decision trees, the other decisions to take concern:
•The maximum depth of the tree. For instance, when setting 10, the tree will have up to ten generations of nodes below the root node.
•The leaf size constraint. When choosing 100, we decide that the minimum number of training observations (here, subscribers) in any leaf will be 100.
•The number of surrogate rules. This specification enables SAS Enterprise Miner to use a given number of surrogate rules in each non-leaf node if the main splitting rule relies on an input whose value is missing. Missing values are therefore not problematic in the decision tree approach.
Below is an example of the tree built with the CART algorithm.
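In scikit-learn terms, these settings correspond roughly to the following sketch; scikit-learn's trees have no surrogate rules, which is why missing values were imputed in the preparation step.

```python
from sklearn.tree import DecisionTreeClassifier

cart = DecisionTreeClassifier(
    criterion="gini",      # CART splitting rule based on the Gini index
    max_depth=10,          # up to ten generations below the root node
    min_samples_leaf=100,  # at least 100 training observations per leaf
).fit(X_train, y_train)
```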
The outcomes of a decision tree can also be presented in the form of business rules (called English Rules in SAS terminology), as depicted in the diagram below. The English Rules window contains the IF-THEN logic that distributes observations into each leaf node of the decision tree.
It is important to notice that several churn rules are generally produced, due to the various profiles of potential churners.
Coming back to the tree on the previous page, below are two examples of churn rules for our example.
Note that there are seven leaf nodes in this tree. For each leaf node, the following information is listed:
•the node number (for instance, 14);
•the number of training observations in the node (for example, 192 in node 14);
•the percentage of training observations in the node with churn_ind = no (did not churn), adjusted for prior probabilities;
•the percentage of training observations in the node with churn_ind = yes (churned), adjusted for prior probabilities.
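An equivalent of these English Rules can be obtained in scikit-learn with export_text, which prints the IF-THEN splitting logic leading to each leaf of the fitted tree:

```python
from sklearn.tree import export_text

# Print the rule path to every leaf of the tree fitted above
print(export_text(cart, feature_names=list(X_train.columns)))
```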
Outcomes from Logistic Regression
Before being applied, logistic regression models need an intensive phase of data
preparation in order to deal with missing values and the distribution shape of
some variables.
Three selection methods can be used with the logistic regression approach: backward, forward, and stepwise.
In our example, we selected stepwise. Below is an example of the outcomes as
produced with SAS EM.
We can observe that the two most discriminant variables are the same as those identified with the decision tree approach: ratio_SOI_SOR and ratio_peak_offpeak.
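scikit-learn has no built-in stepwise procedure, so the sketch below uses forward sequential feature selection as a rough stand-in for SAS EM's stepwise logistic regression; the number of retained variables (10) is an assumption.

```python
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

logit = LogisticRegression(max_iter=1000)
selector = SequentialFeatureSelector(
    logit, n_features_to_select=10, direction="forward").fit(X_train, y_train)
selected = X_train.columns[selector.get_support()]

# Refit the model on the selected variables only
logit.fit(X_train[selected], y_train)
```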
4. Models comparison
The four churn prediction models built (decision tree, logistic regression, memory-based reasoning, neural network) are then compared in order to select the best one, using the validation sample.
Many criteria can be used for the selection.
Some of them are statistical metrics: average squared error, misclassification rate (based on the confusion matrix), etc.
Others take the shape of a graph: ROC chart, lift curve, etc.
Three examples of the outcomes of the models comparison are presented. The table depicts the statistical metrics of the four models, and SAS Enterprise Miner recommends the best one (aka the champion model), flagged with the value Y in the first column.
It appears that the decision tree is the best model, with a misclassification rate of 6.97% on the training sample and 7.33% on the validation sample. On the other hand, the neural network is the worst, with a misclassification rate that reaches 31.9% on the validation sample (!).
It is interesting to notice that the error is lower for the training sample than for the validation and test samples. A good indicator of the predictive capacity of the model is how much this error increases from one sample to another.
When we look at the lift curve for the test sample, we can observe that with the decision tree, selecting the 20% of the base with the highest churn scores identifies churners 3.7 times more effectively than a random choice. In other words, instead of applying a loyalty program to the whole base, we can select 20% of the base, reach 74% of the churners, and thereby save a lot of money. The cumulative % captured response curve presents the same result in another form.
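A sketch of both comparisons: the misclassification rate of each model on the training and validation samples, and the lift of the decision tree at the top-scored 20% of the test sample; the churn_ind labels are assumed to be yes/no, as in the leaf-node listing above.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Misclassification rate = 1 - accuracy, on training and validation samples
for name, model in models.items():
    mis_train = 1 - accuracy_score(y_train, model.predict(X_train))
    mis_valid = 1 - accuracy_score(y_valid, model.predict(X_valid))
    print(f"{name}: train {mis_train:.2%}, validation {mis_valid:.2%}")

# Lift at 20%: churners captured in the top-scored 20% of the test sample,
# divided by the 20% a random selection of the same size would capture.
p = models["decision_tree"].predict_proba(X_test)[:, 1]   # P(churn_ind = "yes")
top = np.argsort(p)[::-1][: int(0.20 * len(p))]
captured = (y_test.to_numpy()[top] == "yes").sum() / (y_test == "yes").sum()
print(f"lift at 20%: {captured / 0.20:.1f} (captures {captured:.0%} of churners)")
```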
5. Model deployment via Scoring
The model deployment phase concerns the scoring of new observations (a new data set) based on the churn rules of the champion model. In other words, what is the probability that a given customer will churn over the next months?
At this stage, we need to write SAS code, as depicted in the diagram below.
We can observe in the small diagram that it is possible to select only the subscribers who reach a given threshold of churn probability, here 0.85.
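The scoring step itself would be generated as SAS code by the Score node; an equivalent sketch in Python, with a hypothetical file name, looks like this:

```python
import pandas as pd

# Score a new data set with the champion model and keep only the
# subscribers whose churn probability reaches the 0.85 threshold.
new_data = pd.read_csv("new_subscribers.csv")
new_data["p_churn"] = cart.predict_proba(new_data[X_train.columns])[:, 1]
high_risk = new_data[new_data["p_churn"] >= 0.85]
```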
6. Customer Knowledge Matrix (aka Customer Strategic Map)
Building a Customer Strategic Map is the first step that makes it possible to move from Mass
Marketing to Target Marketing and ultimately to real CRM.
Below is the Customer Strategic Map generated from our case study. It then becomes possible to adopt a clear strategy for each individual subscriber, knowing the quadrant he/she belongs to.