Churn Prediction in Mobile Social Games: Towards a Complete Assessment Using Survival Ensembles
África Periáñez, Alain Saas, Anna Guitart and Colin Magne
IEEE/ACM DSAA 2016, Montreal, October 19th, 2016

About us
Who are we?
● Game and technology company based in Tokyo (spin-off of Silicon Graphics)
● Research project to provide Game Data Science as a Service
● Goals: predict player behavior, scale to big data, and visualize results intuitively

Our data
● Free-to-play mobile social games
● In-app purchases and activity behavioral data

Churn prediction in Free-To-Play games
We focus on the top spenders: the whales
➔ 0.2% of the players, 50% of the revenues
➔ Their high engagement makes them more likely to respond positively to actions taken to retain them
➔ For this group, we can define churn as 10 days of inactivity
◆ The definition of churn in F2P games is not straightforward

The model
Survival Ensembles

Challenge: modeling churn
◎ Survival analysis focuses on predicting the time-to-event, e.g. churn
○ When will a player stop playing?
◎ Classical methods, like regressions, are appropriate when all players have left the game
◎ Censoring problem: dataset with incomplete churning information
◎ Censoring is inherent to churn
➔ Survival analysis is used in biology and medicine to deal with this problem
➔ Ensemble learning techniques provide high-quality prediction results

Output of the model
◎ We focus on whales
◎ Churn definition as 10 days of inactivity
◎ Cumulative survival probability (Kaplan-Meier estimates)
◎ Step function that changes every time that a player churns

Challenge: modeling churn
◎ Two approaches:
○ Churn as a binary classification
○ Churn as a censored data problem
◎ One model: Conditional Inference Survival Ensembles (1)
○ deals with censoring
○ high accuracy due to ensemble learning
Survival Analysis
➔ Survival analysis methods (e.g.
Cox regression) do not assume any particular statistical distribution: they are fitted from data
➔ Fixed link between output and features: effort is needed for model selection and evaluation
1) Hothorn et al., 2006. Unbiased recursive partitioning: A conditional inference framework

Conditional inference survival ensembles
Survival Tree
➔ Split the feature space recursively
➔ Based on a survival statistical criterion, the root node is divided into two daughter nodes
➔ Maximize the survival difference between nodes
➔ A single tree produces unstable predictions
Conditional Survival Ensembles
➔ Outstanding predictions
➔ Make use of hundreds of trees
➔ Conditional inference survival ensembles use a Kaplan-Meier function as splitting criterion
➔ Overfitting is not present
➔ Robust information about variable importance
➔ Unbiased approach

Survival tree
Linear rank statistics as splitting criterion
Figure: Conditional inference survival tree partition with Kaplan-Meier estimates of the survival time, which characterize the players placed in every terminal node group

Conditional inference survival ensembles
◎ Two-step algorithm:
○ 1) the optimal split variable is selected: association between covariates and response
○ 2) the optimal split point is determined by comparing two-sample linear statistics for all possible partitions of the split variable
Random Survival Forest
➔ RSF is based on the original random forest algorithm (1)
➔ RSF favors variables with many possible split points over variables with fewer
1) Breiman L. 2001. Random Forests.

Feature selection
◎ Game independent features:
○ player attention:
● time spent per day
○ player loyalty:
● number of days connecting (loyalty index)
● days from registration to first purchase
● days since last purchase
○ player intensity:
● number of actions, sessions, etc.
● amount of in-app purchases
◎ Game dependent features:
● player level (concept common to most games)

Feature selection
◎ Game independent features:
○ player attention: time spent per day, lifetime
○ player loyalty: number of days connecting, loyalty index (number of days played over lifetime), days from registration to first purchase, days since last purchase
○ player intensity: number of actions, sessions, amount of in-app purchases, action activity distance (total average actions compared to last days' behaviour)
○ player level (concept common to most games)
◎ Game dependent features researched but ultimately not part of our model:
○ participation in a guild (social feature)
○ actions measured by categories

The Results
With “Age of Ishtaria” Game Data

Binary classification results and comparison with other models

Censored data problem results
Predicted Kaplan-Meier survival curves as a function of time (days) for new or existing players

Validation -- Churn prediction
1000 bootstrap cross-validation error curves for the survival ensemble model and Cox regression

Survival ensembles approach
◎ Treating churn as a censoring problem is the right approach
○ the median survival time, i.e. the time at which the probability of surviving in the game is 50%, can be used as a time threshold to categorize a player as at risk of churning
◎ Binary problem -- static model
○ also brings relevant information
○ useful insight for short-term prediction
◎ SVM, ANN, Decision Trees, etc. are useful tools for regression or classification problems.
○ in their original form they cannot handle censored data
○ this requires 1) modification of the algorithm or 2) transformation of the data

Summary and conclusion
◎ Application of the state-of-the-art algorithm “conditional inference survival ensembles”
○ to predict churn
○ and the survival probability of players in social games
◎ Model able to make predictions every day in an operational environment
◎ Adapts to other game data: Democratize Game Data Science
◎ Relevant information about whales' behaviour
○ discovering new playing patterns as a function of time
○ classifying gamers by risk factors of survival experience
◎ Step towards the challenging goal of a comprehensive understanding of players

Other work related to Game Data Science
Discovering Playing Patterns: Time Series Clustering of Free-To-Play Game Data
Alain Saas, Anna Guitart and África Periáñez
IEEE CIG 2016

Special Session on Game Data Science
Chaired by Alain Saas and África Periáñez
IEEE/ACM DSAA 2016

www.gamedatascience.org
[email protected]
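
Supplementary sketch: the slides describe the model output as a Kaplan-Meier step function and suggest the median survival time as a churn-risk threshold. The plain-Python example below illustrates only those two ideas under stated assumptions. The function names (`kaplan_meier`, `median_survival_time`) and the toy data are hypothetical, chosen for illustration; this is not the authors' implementation, it does not use the “Age of Ishtaria” data, and it does not implement conditional inference survival ensembles.

```python
from typing import List, Optional, Tuple


def kaplan_meier(durations: List[float],
                 churned: List[bool]) -> List[Tuple[float, float]]:
    """Return the Kaplan-Meier curve as (time, survival probability) steps.

    durations: days each player was observed.
    churned:   True if the player churned at that time, False if the
               observation is censored (player still active at the end).
    """
    if len(durations) != len(churned):
        raise ValueError("durations and churned must have the same length")

    # Distinct times at which at least one churn was observed, in order.
    event_times = sorted({t for t, e in zip(durations, churned) if e})

    survival = 1.0
    curve = []
    for t in event_times:
        # Players still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        # Players who churned exactly at time t (censored ones excluded).
        events = sum(1 for d, e in zip(durations, churned) if e and d == t)
        # The curve only steps down when a churn occurs.
        survival *= 1.0 - events / at_risk
        curve.append((t, survival))
    return curve


def median_survival_time(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which survival is 50% or lower; None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


if __name__ == "__main__":
    # Toy data (made up): days observed and whether churn was seen.
    days = [5, 8, 8, 12, 15, 20, 20, 25]
    churn_seen = [True, True, False, True, False, True, True, False]

    curve = kaplan_meier(days, churn_seen)
    for t, s in curve:
        print(f"day {t}: survival probability {s:.3f}")
    print("median survival time:", median_survival_time(curve))
```

For this toy data the curve steps down at days 5, 8, 12 and 20, and the median survival time is day 20. Censored players (still active when observation ends) count in the at-risk group up to their last observed day but never trigger a step, which is how the estimator uses incomplete churn information instead of discarding it.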