Reducing user attrition, i.e. churn, is a broad challenge faced by several industries. In mobile social games, decreasing churn is decisive for increasing player retention and raising revenue. Churn prediction models make it possible to understand player loyalty and to anticipate when players will stop playing a game. Thanks to these predictions, several initiatives can be taken to retain those players who are more likely to churn.
Survival analysis focuses on predicting the time of occurrence of a certain event, churn in our case. Classical methods, like regressions, could be applied only when all players have left the game. The challenge arises for datasets with incomplete churning information, as most players still connect to the game. This is called a censored data problem and is in the nature of churn. Censoring is commonly dealt with using survival analysis techniques, but due to the inflexibility of the survival statistical algorithms, the accuracy achieved is often poor. In contrast, novel ensemble learning techniques, increasingly popular in a variety of scientific fields, provide high-quality prediction results.
In this work, we develop, for the first time in the social games domain, a survival ensemble model which provides a comprehensive analysis together with an accurate prediction of churn. For each player, we predict the probability of churning as a function of time, which makes it possible to distinguish various levels of loyalty profiles. Additionally, we assess the risk factors that explain the predicted player survival times. Our results show that churn prediction by survival ensembles significantly improves on the accuracy and robustness of traditional analyses, like Cox regression.
Predicting Churn in Mobile Games Using Survival Ensembles
1. Churn Prediction in Mobile Social Games: Towards a Complete Assessment Using Survival Ensembles
África Periáñez, Alain Saas, Anna Guitart, Colin Magne
IEEE/ACM DSAA 2016
Montreal, October 19th, 2016
2. Churn prediction in Free-To-Play games
We focus on the top spenders: the whales
➔ 0.2% of the players, 50% of the revenue
➔ Their high engagement makes them more likely to respond positively to
actions taken to retain them
➔ For this group, we can define churn as 10 days of inactivity
◆ The definition of churn in F2P games is not straightforward
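A minimal Python sketch of the 10-day inactivity rule for whales. Only the 10-day window comes from the slide; the function name `is_churned`, its parameters and the constant name are illustrative assumptions, not the authors' code.

```python
from datetime import date

# From the slide: a whale counts as churned after 10 days of inactivity.
CHURN_WINDOW_DAYS = 10


def is_churned(last_login: date, today: date,
               window_days: int = CHURN_WINDOW_DAYS) -> bool:
    """Return True when the player has been inactive for at least window_days."""
    # Hypothetical helper: inactivity is measured in whole calendar days.
    return (today - last_login).days >= window_days


if __name__ == "__main__":
    print(is_churned(date(2016, 10, 1), date(2016, 10, 19)))   # 18 days inactive
    print(is_churned(date(2016, 10, 15), date(2016, 10, 19)))  # 4 days inactive
```

For other player segments the window would probably need to differ, which is why it is a parameter rather than a hard-coded value.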
3. Feature selection
◎ Game independent features:
○ player attention: time spent per day, lifetime
○ player loyalty: number of days connecting, loyalty index (number of days
played over lifetime), days from registration to first purchase, days since
last purchase
○ player intensity: number of actions, sessions, amount of in-app purchases,
action activity distance (total average actions compared to last days'
behaviour)
○ player level (concept common to most games)
◎ Game dependent features researched but ultimately not part of our model:
○ participation in a guild (social feature)
○ actions measured by categories
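Two of the game-independent loyalty features can be sketched in Python as follows. The slide only gives the definition "number of days played over lifetime"; the function names, the input layout (lists of dates) and the choice to count the registration day as part of the lifetime are assumptions made for illustration.

```python
from datetime import date
from typing import Optional, Sequence


def lifetime_days(registration: date, today: date) -> int:
    """Lifetime in days, counting the registration day itself (assumption)."""
    return (today - registration).days + 1


def loyalty_index(play_dates: Sequence[date], registration: date,
                  today: date) -> float:
    """Number of distinct days played divided by the lifetime in days."""
    return len(set(play_dates)) / lifetime_days(registration, today)


def days_since_last_purchase(purchase_dates: Sequence[date],
                             today: date) -> Optional[int]:
    """Days since the most recent purchase, or None if there was no purchase."""
    if not purchase_dates:
        return None
    return (today - max(purchase_dates)).days
```

A player who connected on 3 distinct days during a 10-day lifetime gets a loyalty index of 0.3 under these assumptions.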
5. Challenge: modeling churn
◎ Survival analysis focuses on predicting the
time-to-event, e.g. churn
○ when will a player stop playing?
◎ Classical methods, like regressions, are appropriate
when all players have left the game
◎ Censoring Problem: dataset with incomplete churning
information
◎ Censoring is in the nature of churn
➔ Survival analysis is used in biology and medicine to
deal with this problem
➔ Ensemble learning techniques provide high-quality
prediction results
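To make the censoring problem concrete, the sketch below turns one player's dates into the (duration, event) pair that survival methods expect. It assumes the 10-day inactivity definition of churn used for whales; the record layout and all names are hypothetical.

```python
from datetime import date
from typing import NamedTuple


class SurvivalRecord(NamedTuple):
    duration: int  # days the player was observed, counted from registration
    event: int     # 1 = churn observed, 0 = censored (player still active)


def to_survival_record(registration: date, last_login: date, today: date,
                       window_days: int = 10) -> SurvivalRecord:
    """Turn one player's dates into a (duration, event) pair."""
    churned = (today - last_login).days >= window_days
    # Churned players stop being observed at their last login; active players
    # are censored at the extraction date because their churn time is unknown.
    end = last_login if churned else today
    return SurvivalRecord((end - registration).days, int(churned))
```

A plain regression on `duration` would treat censored players as if they had already left, which is the bias that survival methods are designed to avoid.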
6. Output of the model
◎ We focus on whales
◎ Cumulative survival probability (Kaplan-Meier estimates)
◎ Step function that changes every time a player churns
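The Kaplan-Meier step function mentioned on this slide can be sketched in a few lines of plain Python. This is the textbook product-limit estimator, not the authors' implementation; the function name and the list-based inputs are assumptions.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(durations: Sequence[float],
                 events: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (time, survival probability) pairs, one per observed churn time.

    events[i] is 1 if player i churned at durations[i], 0 if censored.
    """
    n_at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(set(durations)):
        churned = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        leaving = sum(1 for d in durations if d == t)  # churned or censored at t
        if churned:
            # The curve only steps down at times where a churn was observed.
            survival *= 1.0 - churned / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve
```

Censored players never trigger a step, but they do shrink the at-risk count, so later steps become larger.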
7. Challenge: modeling churn
◎ Two approaches:
○ Churn as a binary classification
○ Churn as a censored data problem
◎ One model: Conditional Inference Survival Ensembles¹
○ deals with censoring
○ high accuracy due to ensemble learning
Survival Analysis
➔ Survival analysis methods (e.g. Cox regression) do not assume any
particular statistical distribution: they are fitted from data
➔ Fixed link between output and features: effort is needed for model selection and
evaluation
1) Hothorn et al., 2006. Unbiased recursive partitioning: A conditional inference framework
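For reference, the Cox regression named on this slide is usually written as the proportional hazards model below. The baseline hazard $h_0(t)$ is left unspecified and estimated from data, while the features $x$ enter through a fixed exponential link. The symbols $h_0$, $\beta$ and $x$ are standard textbook notation, not taken from the slides.

```latex
h(t \mid x) = h_0(t)\, \exp\!\left(\beta^{\top} x\right)
```

This fixed form is what the slide calls the "fixed link between output and features".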
8. Conditional inference survival ensembles
Survival Tree
➔ Split the feature space
recursively
➔ Based on a survival statistical
criterion, the root node is
divided into two daughter nodes
➔ Maximize the survival
difference between nodes
➔ A single tree produces
unstable predictions
Conditional Survival Ensembles
➔ Outstanding predictions
➔ Make use of hundreds of trees
➔ Conditional inference survival
ensembles use a Kaplan-Meier
function as splitting criterion
➔ Overfitting is not present
➔ Robust information about
variable importance
➔ Unbiased approach
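As an illustration of "maximize the survival difference between nodes", the sketch below scores a candidate split with the classical two-sample log-rank statistic and keeps the threshold with the largest value. This is a generic survival-tree criterion chosen for the example, not necessarily the one used by the authors; all function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def logrank_statistic(durations: Sequence[float], events: Sequence[int],
                      in_left: Sequence[bool]) -> float:
    """Chi-square log-rank statistic comparing the left and right daughter nodes."""
    observed_minus_expected = 0.0
    variance = 0.0
    event_times = sorted({d for d, e in zip(durations, events) if e == 1})
    for t in event_times:
        n = sum(1 for d in durations if d >= t)  # players still at risk
        n_left = sum(1 for d, l in zip(durations, in_left) if d >= t and l)
        churned = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
        churned_left = sum(1 for d, e, l in zip(durations, events, in_left)
                           if d == t and e == 1 and l)
        observed_minus_expected += churned_left - churned * n_left / n
        if n > 1:
            # Hypergeometric variance of the churn count in the left node.
            variance += (churned * (n_left / n) * (1 - n_left / n)
                         * (n - churned) / (n - 1))
    return observed_minus_expected ** 2 / variance if variance > 0 else 0.0


def best_split(feature: Sequence[float], durations: Sequence[float],
               events: Sequence[int]) -> Tuple[Optional[float], float]:
    """Threshold c (left node: feature <= c) with the largest log-rank statistic."""
    best: Tuple[Optional[float], float] = (None, 0.0)
    for threshold in sorted(set(feature))[:-1]:
        in_left = [x <= threshold for x in feature]
        stat = logrank_statistic(durations, events, in_left)
        if stat > best[1]:
            best = (threshold, stat)
    return best
```

A full tree would apply `best_split` recursively to each daughter node. Because the log-rank statistic depends only on the ordering of the survival times, the chosen threshold can differ from the one suggested by the raw gaps between durations.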
9. Conditional inference survival tree partition with
Kaplan-Meier estimates of the survival time which
characterizes the players placed in every terminal node group
9
Linear rank
statistics as
splitting criterion
Survival tree
10. Conditional inference survival ensembles
◎ Two-step algorithm:
○ 1) the optimal split variable is selected: association between
covariates and response
○ 2) the optimal split point is determined by comparing two-sample
linear statistics for all possible partitions of the split variable
Random Survival Forest
➔ RSF is based on the original random forest algorithm¹
➔ RSF favors variables with many possible split points over variables
with fewer
1) Breiman L. 2001. Random Forests.
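The two-step algorithm on this slide can be sketched as follows, under simplifying assumptions. The response is summarised by log-rank scores, and association is measured by a linear statistic standardised with its permutation mean and variance. The framework of Hothorn et al. selects the variable through p-values with a stopping rule; this sketch uses the raw standardised statistic as a stand-in and only handles numeric covariates. All names are hypothetical.

```python
import math
from typing import Dict, Optional, Sequence, Tuple


def logrank_scores(durations: Sequence[float], events: Sequence[int]) -> list:
    """Event indicator minus the Nelson-Aalen cumulative hazard at each player's time."""
    scores = []
    for t_i, e_i in zip(durations, events):
        cum_hazard = 0.0
        for t in {d for d, e in zip(durations, events) if e == 1 and d <= t_i}:
            at_risk = sum(1 for d in durations if d >= t)
            churned = sum(1 for d, e in zip(durations, events) if d == t and e == 1)
            cum_hazard += churned / at_risk
        scores.append(e_i - cum_hazard)
    return scores


def standardized_statistic(x: Sequence[float], scores: Sequence[float]) -> float:
    """|T - E[T]| / sd[T] for T = sum(x_i * score_i) under permutation of the scores."""
    n = len(x)
    x_bar = sum(x) / n
    s_bar = sum(scores) / n
    centred = sum((xi - x_bar) * (si - s_bar) for xi, si in zip(x, scores))
    var = (sum((xi - x_bar) ** 2 for xi in x)
           * sum((si - s_bar) ** 2 for si in scores) / (n - 1))
    return abs(centred) / math.sqrt(var) if var > 0 else 0.0


def select_split(features: Dict[str, Sequence[float]], durations: Sequence[float],
                 events: Sequence[int]) -> Tuple[str, Optional[float]]:
    scores = logrank_scores(durations, events)
    # Step 1: pick the covariate most strongly associated with the response.
    name = max(features, key=lambda k: standardized_statistic(features[k], scores))
    x = features[name]
    # Step 2: pick the cut whose two-sample statistic (x <= cut vs. x > cut) is largest.
    cut = max(sorted(set(x))[:-1],
              key=lambda c: standardized_statistic(
                  [1.0 if xi <= c else 0.0 for xi in x], scores),
              default=None)
    return name, cut
```

Separating variable selection from split-point search is what keeps the procedure from favoring covariates with many possible split points, the bias the slide attributes to RSF.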
16. Survival ensembles approach
◎ Treating churn as a censoring problem is the right approach
○ the median survival time, i.e. the time at which the percentage of players
surviving in the game is 50%, can be used as a time threshold
to categorize a player as being at risk of churning
◎ Binary problem -- static model
○ also brings relevant information
○ useful insight for a short-term prediction
◎ SVM, ANN, Decision Trees, etc. are useful tools for regression or
classification problems.
○ in their original form they cannot handle censored data
○ this requires 1) a modification of the algorithm or 2) a transformation of the data
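The median survival time threshold described on this slide can be read off a predicted survival curve as sketched below. The curve format (time-sorted pairs of time and survival probability) and the helper names are assumptions; the slides do not specify how the authors store predictions.

```python
from typing import Optional, Sequence, Tuple


def median_survival_time(curve: Sequence[Tuple[float, float]]) -> Optional[float]:
    """Earliest time at which the survival probability is at most 50%.

    curve holds (time, survival probability) pairs sorted by time.
    Returns None when the curve never drops to 50% in the observed range.
    """
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


def at_risk(curve: Sequence[Tuple[float, float]], horizon_days: float) -> bool:
    """Flag a player whose median survival time falls within the horizon."""
    median = median_survival_time(curve)
    return median is not None and median <= horizon_days
```

A curve that never reaches 50% yields `None`, which the sketch treats as "not at risk" within the chosen horizon.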
17. Summary and conclusion
◎ Application of the state-of-the-art algorithm “conditional inference
survival ensembles”
○ to predict churn
○ and survival probability of players in social games
◎ Model able to make predictions every day in an operational
environment
◎ Adapts to other game data: Democratize Game Data Science
◎ Relevant information about whale behaviour
○ discovering new playing patterns as a function of time
○ classifying gamers by risk factors of survival experience
◎ Step towards the challenging goal of a comprehensive
understanding of players
18. Other work of the authors related to Game Data Science
Discovering Playing Patterns:
Time Series Clustering of Free-To-Play Game Data
Alain Saas, Anna Guitart and África Periáñez
IEEE CIG 2016
Special Session on Game Data Science
Chaired by Alain Saas and África Periáñez
IEEE/ACM DSAA 2016
www.gamedatascience.org