Churn Prediction in Mobile Social Games:
Towards a Complete Assessment Using Survival Ensembles
África Periáñez, Alain Saas, Anna Guitart and Colin Magne
IEEE/ACM DSAA 2016
Montreal, October 19th, 2016
1
About us
Who are we?
● Game and technology company based in Tokyo (spin-off of Silicon Graphics)
● Research project to provide Game Data Science as a Service
● Goals: predict player behavior, scale to big data, and visualize results intuitively
2
Our data
● Free-to-play mobile social games
● in-app purchases and activity behavioral data
3
Churn prediction in Free-To-Play games
We focus on the top spenders: the whales
➔ 0.2% of the players, 50% of the revenue
➔ Their high engagement makes them more likely to respond positively to retention actions
➔ For this group, we can define churn as 10 days of inactivity
◆ The definition of churn in F2P games is not straightforward
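The 10-day inactivity rule can be turned into survival-style labels, as in the minimal sketch below. The function name `label_player`, the use of `>=` for the threshold, and the choice to measure a censored player's lifetime up to the observation date are assumptions made for illustration, not details taken from the talk.

```python
from datetime import date

CHURN_DAYS = 10  # inactivity threshold quoted on the slide for whales


def label_player(first_login, last_login, observation_date, churn_days=CHURN_DAYS):
    """Return (duration_days, churned) for one player.

    churned=True  -> inactive for at least churn_days at observation time
    churned=False -> still active, so the lifetime is right-censored
    """
    inactive_days = (observation_date - last_login).days
    churned = inactive_days >= churn_days
    # Churned players: the lifetime ends at the last login.
    # Censored players: we only know they survived up to the observation date.
    end = last_login if churned else observation_date
    return (end - first_login).days, churned


# One churned whale and one who is still playing (censored).
print(label_player(date(2016, 1, 1), date(2016, 3, 1), date(2016, 4, 1)))
print(label_player(date(2016, 1, 1), date(2016, 3, 28), date(2016, 4, 1)))
```

Each player becomes a (duration, event) pair, which is the input format survival methods expect.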
4
The model
Survival Ensembles
5
Challenge: modeling churn
◎ Survival analysis focuses on predicting the time-to-event, e.g. churn
○ When will a player stop playing?
◎ Classical methods, like regressions, are appropriate when all players have left the game
◎ Censoring problem: the dataset contains incomplete churn information
◎ Censoring is inherent to churn: players who are still active have not yet revealed when they will leave
➔ Survival analysis is used in biology and medicine to deal with this problem
➔ Ensemble learning techniques provide high-quality prediction results
6
Output of the model
◎ We focus on whales
◎ Churn definition as 10 days of inactivity
◎ Cumulative survival probability (Kaplan-Meier estimates)
◎ Step function that changes every time that a player churns
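As a rough illustration of the step function described above, here is a minimal pure-Python Kaplan-Meier estimator. The function name and the toy data are hypothetical; the product-limit formula itself is the standard one, S(t) = Π (1 − d_i / n_i) over churn times t_i ≤ t, with d_i churns among n_i players still at risk.

```python
def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct time where a churn was observed.

    durations: days each player was observed
    events:    True if the player churned, False if censored (still active)
    """
    churn_times = sorted({t for t, e in zip(durations, events) if e})
    survival = 1.0
    curve = []
    for t in churn_times:
        # Players whose observed lifetime reaches t are still at risk at t.
        at_risk = sum(1 for d in durations if d >= t)
        churned = sum(1 for d, e in zip(durations, events) if d == t and e)
        survival *= 1.0 - churned / at_risk
        curve.append((t, survival))
    return curve


# Toy data: five players, two of them censored.
for day, prob in kaplan_meier([5, 8, 8, 12, 20], [True, True, False, True, False]):
    print(day, round(prob, 3))
```

The curve only drops at days where a churn is observed; censored players leave the at-risk set without causing a step.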
7
Challenge: modeling churn
◎ Two approaches:
○ Churn as a binary classification
○ Churn as a censored data problem
◎ One model: Conditional Inference Survival Ensembles (1)
○ deals with censoring
○ high accuracy due to ensemble learning
Survival Analysis
➔ Survival analysis methods (e.g. Cox regression) do not assume any particular statistical distribution: they are fitted from data
➔ Fixed link between output and features: effort goes into model selection and evaluation
1) Hothorn et al., 2006. Unbiased recursive partitioning: A conditional inference framework
8
Conditional inference survival ensembles
Survival Tree
➔ Split the feature space recursively
➔ Based on a survival statistical criterion, the root node is divided into two daughter nodes
➔ Maximize the survival difference between nodes
➔ A single tree produces unstable predictions
Conditional Survival Ensembles
➔ Outstanding predictions
➔ Make use of hundreds of trees
➔ Conditional inference survival ensembles use a Kaplan-Meier function as splitting criterion
➔ Overfitting is not present
➔ Robust information about variable importance
➔ Unbiased approach
9
Survival tree
Linear rank statistics as splitting criterion
Conditional inference survival tree partition, with Kaplan-Meier estimates of the survival time that characterize the players placed in each terminal node group
10
Conditional inference survival ensembles
◎ Two-step algorithm:
○ 1) the optimal split variable is selected: association between covariates and response
○ 2) the optimal split point is determined by comparing two-sample linear statistics for all possible partitions of the split variable
Random Survival Forest
➔ RSF is based on the original random forest algorithm (1)
➔ RSF favors variables with many possible split points over variables with fewer
1) Breiman L. 2001. Random Forests.
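The two-step split selection can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the implementation used in the talk: it uses log-rank scores (event indicator minus the Nelson-Aalen cumulative hazard) as the response, ranks variables by the absolute standardized linear statistic rather than by permutation-test p-values, and omits the stopping rule. All function names and the toy data are hypothetical.

```python
import math


def logrank_scores(durations, events):
    """Log-rank score per player: event indicator minus cumulative hazard."""
    hazard_at, cum = {}, 0.0
    for t in sorted({t for t, e in zip(durations, events) if e}):
        at_risk = sum(1 for d in durations if d >= t)
        churned = sum(1 for d, e in zip(durations, events) if d == t and e)
        cum += churned / at_risk
        hazard_at[t] = cum
    scores = []
    for d, e in zip(durations, events):
        # Cumulative hazard is non-decreasing, so max() picks its value at d.
        h = max((v for t, v in hazard_at.items() if t <= d), default=0.0)
        scores.append((1.0 if e else 0.0) - h)
    return scores


def standardized_statistic(x, scores):
    """Linear statistic sum(x_i * s_i), standardized under permutations."""
    n = len(x)
    mx, ms = sum(x) / n, sum(scores) / n
    sxx = sum((v - mx) ** 2 for v in x)
    sss = sum((s - ms) ** 2 for s in scores)
    if sxx == 0 or sss == 0:
        return 0.0
    t = sum(v * s for v, s in zip(x, scores))
    return (t - n * mx * ms) / math.sqrt(sxx * sss / (n - 1))


def choose_split(features, durations, events):
    scores = logrank_scores(durations, events)
    # Step 1: pick the covariate most strongly associated with the response.
    name = max(features, key=lambda k: abs(standardized_statistic(features[k], scores)))
    values = features[name]
    # Step 2: pick the cut point with the largest two-sample statistic.
    cuts = sorted(set(values))[:-1]
    if not cuts:
        return name, None
    best = max(cuts, key=lambda c: abs(standardized_statistic(
        [1.0 if v <= c else 0.0 for v in values], scores)))
    return name, best


# Toy data: low-level players churn early, high-level players are censored.
features = {"level": [1, 2, 1, 2, 9, 8, 9, 10], "noise": [5, 1, 4, 2, 3, 5, 1, 4]}
durations = [2, 3, 4, 5, 20, 22, 25, 30]
events = [True, True, True, True, False, False, False, False]
print(choose_split(features, durations, events))
```

Because the variable is chosen before any cut point is searched, a covariate does not gain an advantage merely by offering more candidate split points, which is the bias the slide attributes to RSF.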
11
Feature selection
◎ Game independent features:
○ player attention:
● time spent per day
○ player loyalty:
● number of days connecting (loyalty index)
● days from registration to first purchase
● days since last purchase
○ player intensity:
● number of actions, sessions, etc.
● amount of in-app purchases
◎ Game dependent features:
● player level (concept common to most games)
12
Feature selection
◎ Game independent features:
○ player attention: time spent per day, lifetime
○ player loyalty: number of days connecting, loyalty index (number of days played over lifetime), days from registration to first purchase, days since last purchase
○ player intensity: number of actions, sessions, amount of in-app purchases, action activity distance (total average actions compared to last days' behaviour)
○ player level (concept common to most games)
◎ Game dependent features researched but ultimately not part of our model:
○ participation in a guild (social feature)
○ actions measured by categories
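Two of the features above can be computed as in the sketch below. The slides only give one-line definitions, so the exact formulas are assumptions: the loyalty index is taken as days played divided by lifetime in days, and the action activity distance as the lifetime average of daily actions minus the average over the last `recent_days` days (a hypothetical parameter).

```python
def loyalty_index(days_played, lifetime_days):
    """Fraction of the player's lifetime on which they actually connected."""
    return days_played / lifetime_days if lifetime_days > 0 else 0.0


def action_activity_distance(daily_actions, recent_days=7):
    """Lifetime average of daily actions minus the average over recent days.

    A positive value means the player has recently been less active than usual.
    """
    if not daily_actions:
        return 0.0
    overall = sum(daily_actions) / len(daily_actions)
    recent = daily_actions[-recent_days:]
    return overall - sum(recent) / len(recent)


# A player who connected on 15 of 30 days, with activity dropping recently.
print(loyalty_index(15, 30))
print(round(action_activity_distance([10, 10, 10, 10, 2, 2], recent_days=2), 3))
```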
13
The Results
With “Age of Ishtaria” Game Data
14
Binary classification results and comparison with other models
15
Censored data problem results
Predicted Kaplan-Meier survival curves as a function of time (days) for new or existing players
16
Validation -- Churn prediction
17
Validation -- Churn prediction
1000 bootstrap cross-validation error curves for the survival ensemble model and Cox regression
18
Survival ensembles approach
◎ Treating churn as a censoring problem is the right approach
○ the median survival time, i.e. the time at which the probability of surviving in the game is 50%, can be used as a time threshold to categorize a player as at risk of churning
◎ Binary problem -- static model
○ also brings relevant information
○ useful insight for short-term prediction
◎ SVM, ANN, Decision Trees, etc. are useful tools for regression or classification problems
○ in their original form they cannot handle censored data
○ this requires 1) modification of the algorithm or 2) transformation of the data
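The median survival time threshold can be read off a predicted survival curve, as in this minimal sketch. The curve is assumed to be a list of (day, survival probability) pairs sorted by day; the function names and the `horizon_days` cut-off are hypothetical.

```python
def median_survival_time(curve):
    """First day on which the survival probability is at or below 50%."""
    for day, prob in curve:
        if prob <= 0.5:
            return day
    return None  # survival never drops to 50% within the observed horizon


def at_risk_of_churning(curve, horizon_days):
    """Flag a player whose predicted median survival falls within the horizon."""
    median = median_survival_time(curve)
    return median is not None and median <= horizon_days


# Toy predicted curve for one player.
curve = [(5, 0.8), (8, 0.6), (12, 0.3)]
print(median_survival_time(curve))
print(at_risk_of_churning(curve, horizon_days=14))
```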
19
Summary and conclusion
◎ Application of the state-of-the-art algorithm “conditional inference survival ensembles”
○ to predict churn
○ and survival probability of players in social games
◎ Model able to make predictions every day in an operational environment
◎ Adapts to other game data: Democratize Game Data Science
◎ Relevant information about whale behaviour
○ discovering new playing patterns as a function of time
○ classifying gamers by risk factors of survival experience
◎ Step towards the challenging goal of a comprehensive understanding of players
20
Other work related to Game Data Science
Discovering Playing Patterns:
Time Series Clustering of Free-To-Play Game Data
Alain Saas, Anna Guitart and África Periáñez
IEEE CIG 2016
Special Session on Game Data Science
Chaired by Alain Saas and África Periáñez
IEEE/ACM DSAA 2016
www.gamedatascience.org
21