These slides are from a talk I gave at Google Campus Madrid for the Machine Learning Meetup. The main subject is uplift modelling. Starting from a churn model approach for an e-gaming company, we cover when to apply uplift methods, how to model them mathematically, and finally how to evaluate them.
2. A few words about me
• Senior Data Scientist at Dataiku
  (worked on churn prediction, fraud detection, bot detection, recommender systems, graph analytics, smart cities, …)
• Occasional Kaggle competitor
• Mostly code in Python and SQL
• Twitter @prrgutierrez
3. Plan
• Introduction / Client situation
• Uplift use case examples
• Uplift modelling
• Uplift evaluation & results
4. Client situation
• French Online Gaming Company (RPG)
• A lot of users are leaving
• Let's do a churn prediction model !
• Target : no return within 14 or 28 days.
  (14 days absent -> 80 % chance of not coming back
  28 days absent -> 90 % chance of not coming back)
• Features :
  • Connection features :
    • Time played in the last 1, 7, 15, 30, … days
    • Time since last connection
    • Connection frequency
    • Days of week / hours of day played
  • Equivalent for payments and subscriptions
  • Age, sex, country
  • Number of accounts, is a bot, …
  • No in-game features (no data)
5. Client situation
• Model Results :
  • AUC 0.88
  • Very stable model
• Marketing actions :
  • 7 different actions based on customer segmentation
    (offers, promotions, …)
  • A/B test
    -> -5 % churn for people contacted by email
• Going further :
  • Feature engineering : guilds, close network, in-game actions, …
  • Study long-term churn …
6. Client situation
• But wait !
• Strong hypothesis : target the people who are most likely to churn
7. Client situation
• But wait !
• Strong hypothesis : target the people who are most likely to churn
• What is the gain / person for an action ?
  • $c$ : cost of action
  • $v_i$ : value of customer $i$
  • $X$ : independent variables
  • $T$ : "treated" population and $C$ : "control" population
  • $Y = 1$ if the customer churns, $0$ otherwise
• Value with action : $E^T(V_i) = v_i (1 - P^T(Y = 1 \mid X)) - c$
• Value without action : $E^C(V_i) = v_i (1 - P^C(Y = 1 \mid X))$
• Gain (if $v_i$ is independent of treatment) : $E(G_i) = v_i (P^C(Y = 1 \mid X) - P^T(Y = 1 \mid X)) - c$
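The gain formula can be tried on a few numbers. The sketch below is only an illustration: the three customer profiles, their churn probabilities, the value of 100 and the cost of 5 are all hypothetical and do not come from the client data.

```python
def expected_gain(value, p_churn_control, p_churn_treated, cost):
    """Expected gain of acting on one customer: v_i * (P^C - P^T) - c."""
    return value * (p_churn_control - p_churn_treated) - cost


# Hypothetical customers: (value v_i, P^C(Y=1|X), P^T(Y=1|X))
customers = {
    "persuadable": (100.0, 0.60, 0.30),         # the action halves the churn risk
    "already_lost": (100.0, 0.95, 0.95),        # the action changes nothing
    "annoyed_by_contact": (100.0, 0.20, 0.30),  # the action increases churn
}
cost = 5.0  # hypothetical cost c of one action

gains = {
    name: expected_gain(v, p_c, p_t, cost)
    for name, (v, p_c, p_t) in customers.items()
}
for name, gain in gains.items():
    print(f"{name}: {gain:+.2f}")
```

Note that the customer with the highest churn probability ("already_lost") has a negative expected gain: only the difference between the two probabilities matters, not the churn probability itself.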
8. Client situation
• But wait !
• Strong hypothesis : target the people who are most likely to churn
• What is the gain / person for an action ? $E(G_i) = v_i (P^C(Y = 1 \mid X) - P^T(Y = 1 \mid X)) - c$
• Objective : maximize this gain
• Targeting highly probable churners -> minimize $P^T(Y = 1 \mid X)$
  But not the difference !
• Intuitive examples :
  • $P^C(Y = 1) < P^T(Y = 1)$ : action is expected to make the situation worse. Spam ?
  • $P^C(Y = 1) \approx P^T(Y = 1)$ : user does not care, is already lost
• Uplift = Model $\Delta P$
9. Uplift
• Model the effect of the action
• 4 groups of customers / patients
  • 1. Responded because of the action
    (the people we want)
  • 2. Responded, but would have responded anyway
    (unnecessary costs)
  • 3. Did not respond and the action had no impact
    (unnecessary costs)
  • 4. Did not respond because the action had a negative impact
    (negative impact)
• Incomplete knowledge
10. Uplift Examples
• Healthcare :
  • A typical medical trial:
    • treatment group: gets the treatment
    • control group: gets placebo (or another treatment)
    • do a statistical test to show that the treatment is better than placebo
  • With uplift modeling we can find out for whom the treatment works best
  • Personalized medicine
  • Ex : What is the gain in survival probability ?
    -> classification/uplift problem
11. Uplift Examples
• Churn :
  • E-gaming
  • Other Ex : Coyote
• Retail :
  • Compare coupon campaigns
12. Uplift Examples
• Mailing : Hillstrom challenge
  • 2 campaigns :
    • one men's email
    • one women's email
  • Question : who are the people to target / who have the best response rate ?
13. Uplift Examples
• Common pattern
  • Experiment or A/B testing -> Test and control
  • Warning : Control can be biased easily :
    • Target the most probable churners and use the rest as control
    • Call only the people who come to a shop
  • Limited experiment trial -> no bandit algorithm :
    (once a medical experiment is done, you don't continue the "exploration")
    -> feedback is relatively large and discrete in time.
14. Uplift modelling
• Three main methods :
  • Two models approach
  • Class variable modification
  • Modification of existing machine learning models
15. Uplift modelling : Two model approach
• Build a model on treatment to get $P^T(Y \mid X)$
• Build a model on control to get $P^C(Y \mid X)$
• Set : $\Delta P = P^T(Y \mid X) - P^C(Y \mid X)$
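A minimal sketch of the two model approach, under stated assumptions: `SegmentRateModel` is a hypothetical toy estimator (a smoothed response rate per feature value) that stands in for any real classifier, the data are made up, and $Y = 1$ denotes the favourable outcome.

```python
from collections import defaultdict


class SegmentRateModel:
    """Toy stand-in for any classifier: estimates P(Y=1 | X) as the
    Laplace-smoothed response rate observed for each distinct value of X."""

    def fit(self, X, y):
        self.counts = defaultdict(lambda: [0, 0])  # x -> [n_responses, n_rows]
        for x, label in zip(X, y):
            self.counts[x][0] += label
            self.counts[x][1] += 1
        return self

    def predict_proba(self, x):
        pos, n = self.counts.get(x, (0, 0))
        return (pos + 1) / (n + 2)


def two_model_uplift(rows):
    """rows: list of (x, group, y), group in {"T", "C"}, y in {0, 1}.
    Fits one model per group and returns x -> P^T(Y=1|x) - P^C(Y=1|x)."""
    treated = [(x, y) for x, g, y in rows if g == "T"]
    control = [(x, y) for x, g, y in rows if g == "C"]
    model_t = SegmentRateModel().fit(*zip(*treated))
    model_c = SegmentRateModel().fit(*zip(*control))
    return lambda x: model_t.predict_proba(x) - model_c.predict_proba(x)


# Made-up data: the action helps "casual" players and does nothing for "hardcore" ones.
rows = (
    [("casual", "T", 1)] * 6 + [("casual", "T", 0)] * 4
    + [("casual", "C", 1)] * 2 + [("casual", "C", 0)] * 8
    + [("hardcore", "T", 1)] * 8 + [("hardcore", "T", 0)] * 2
    + [("hardcore", "C", 1)] * 8 + [("hardcore", "C", 0)] * 2
)
uplift = two_model_uplift(rows)
print(uplift("casual"), uplift("hardcore"))
```

In practice the toy estimator would be replaced by a real classifier exposing fit / predict-probability methods; the subtraction step stays the same.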
16. Uplift modelling : Two model approach
• Advantages :
  • Standard ML models can be used
  • In theory, two good estimators -> a good uplift model
  • Works well in practice
  • Generalizes to regression and multi-treatment easily
• Drawbacks
  • The difference of two estimators is probably not the best estimator of the difference
  • The two classifiers can ignore the weaker uplift signal (since it's not their target)
  • Algorithms focusing on estimating the difference should perform better
17. Uplift modelling : Class variable modification
• Introduced in Jaskowski, Jaroszewicz 2012
• Allows any classifier to be adapted to uplift modeling
• Let $G \in \{T, C\}$ denote the group membership (Treatment or Control)
• Let's define the new target variable :
  $$Z = \begin{cases} 1 & \text{if } G = T \text{ and } Y = 1 \\ 1 & \text{if } G = C \text{ and } Y = 0 \\ 0 & \text{otherwise} \end{cases}$$
• This corresponds to flipping the target in the control dataset.
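The definition of the new target can be sketched in a few lines. The function names and the four example rows are hypothetical; only the rule for Z comes from the slide.

```python
def flip_target(group, y):
    """Z = 1 if (treated and Y = 1) or (control and Y = 0), else 0."""
    return int((group == "T" and y == 1) or (group == "C" and y == 0))


def class_modification_dataset(rows):
    """rows: list of (x, group, y). Returns (x, z) pairs with treatment and
    control rows kept together, ready for any single classifier."""
    return [(x, flip_target(g, y)) for x, g, y in rows]


rows = [("a", "T", 1), ("a", "T", 0), ("b", "C", 1), ("b", "C", 0)]
transformed = class_modification_dataset(rows)
print(transformed)
```

Treated rows keep their label and control rows get the opposite one, so a classifier trained on `transformed` predicts $P(Z = 1 \mid X)$.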
18. Uplift modelling : Class variable modification
• Why does it work ?
  $P(Z = 1 \mid X) = P^T(Y = 1 \mid X) P(G = T \mid X) + P^C(Y = 0 \mid X) P(G = C \mid X)$
• By design (A/B test warning !), $G$ should be independent from $X$ :
  $P(Z = 1 \mid X) = P^T(Y = 1 \mid X) P(G = T) + P^C(Y = 0 \mid X) P(G = C)$
• Possibly with a reweighting of the datasets we should have : $P(G = T) = P(G = C) = 1/2$
  thus $2 P(Z = 1 \mid X) = P^T(Y = 1 \mid X) + P^C(Y = 0 \mid X)$
19. Uplift modelling : Class variable modification
• Why does it work ?
  Thus
  $2 P(Z = 1 \mid X) = P^T(Y = 1 \mid X) + P^C(Y = 0 \mid X) = P^T(Y = 1 \mid X) + 1 - P^C(Y = 1 \mid X)$
  $\Delta P = 2 P(Z = 1 \mid X) - 1$
  And sorting by $P(Z = 1 \mid X)$ is the same as sorting by $\Delta P$
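The identity $\Delta P = 2 P(Z = 1 \mid X) - 1$ can be checked with exact arithmetic on a small made-up dataset. This is only a sanity check under the assumption used in the derivation: each segment contains as many treated as control rows, so $P(G = T \mid X) = 1/2$. The helper names are hypothetical.

```python
from fractions import Fraction


def uplift_from_z(rows, segment):
    """Estimate Delta P for one segment as 2 * P(Z=1 | X) - 1."""
    z = [int((g == "T") == (y == 1)) for x, g, y in rows if x == segment]
    return 2 * Fraction(sum(z), len(z)) - 1


def uplift_direct(rows, segment):
    """Reference value: P^T(Y=1 | X) - P^C(Y=1 | X) from raw response rates."""
    def rate(group):
        ys = [y for x, g, y in rows if x == segment and g == group]
        return Fraction(sum(ys), len(ys))
    return rate("T") - rate("C")


# Made-up balanced data: 10 treated and 10 control rows per segment.
rows = (
    [("casual", "T", 1)] * 6 + [("casual", "T", 0)] * 4
    + [("casual", "C", 1)] * 2 + [("casual", "C", 0)] * 8
    + [("hardcore", "T", 1)] * 8 + [("hardcore", "T", 0)] * 2
    + [("hardcore", "C", 1)] * 8 + [("hardcore", "C", 0)] * 2
)
for segment in ("casual", "hardcore"):
    print(segment, uplift_from_z(rows, segment), uplift_direct(rows, segment))
```

With an unbalanced split the two quantities no longer coincide, which is why the reweighting step matters.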
20. Uplift modelling : Class variable modification
• Summary :
  • Flip class for control dataset
  • Concatenate treatment and control datasets
  • Build a classifier
  • Target users with highest probability
• Advantages :
  • Any classifier can be used
  • Directly predict uplift (and not each class separately)
  • Single model on a larger dataset (instead of two small ones)
• Drawbacks :
  • Complex decision surface -> model can perform poorly
  • Interpretation : what is AUC in this case ?
21. Uplift modeling : Other methods
• Based on decision trees :
  • Rzepakowski Jaroszewicz 2012
    new decision tree split criterion based on information theory
  • Soltys Rzepakowski Jaroszewicz 2013
    Ensemble methods for uplift modeling
    (out of today's scope)
22. Evaluation
• We used :
  • 2 model approach -> AUC ? Not very informative.
  • 1 model approach -> does AUC mean anything ?
• How can we evaluate / compare them ?
• Cross Validation :
  • 4 datasets : treatment/control x train/test
• Problem :
  • We don't have a clear 0/1 target.
  • We would need to know for each customer
    • Response to treatment
    • Response to control
    -> not possible
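The "4 datasets" used for cross validation can be obtained by splitting each group separately, so that the treated/control ratio is the same in train and test. The sketch below is one possible way to do it; `split_by_group`, the 30 % test fraction and the dummy rows are assumptions made for illustration.

```python
import random


def split_by_group(rows, test_fraction=0.3, seed=0):
    """Split (x, group, y) rows into train/test separately within each group.
    Returns a dict keyed by (group, "train") and (group, "test")."""
    rng = random.Random(seed)
    parts = {}
    for group in ("T", "C"):
        members = [r for r in rows if r[1] == group]
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        parts[(group, "test")] = members[:n_test]
        parts[(group, "train")] = members[n_test:]
    return parts


# Dummy rows: 10 treated and 20 control customers.
rows = [(i, "T", 0) for i in range(10)] + [(i, "C", 0) for i in range(20)]
parts = split_by_group(rows)
for key, subset in sorted(parts.items()):
    print(key, len(subset))
```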
23. Evaluation
• Gain for group of customers :
  • Gain for the 10% highest scoring customers =
    % of successes for top 10% treated customers - % of successes for top 10% control customers
• Uplift curve ? :
  • Difference between two lift curves
  • Interpretation : net gain in success rate if a given percentage of the population is treated
  • Problem : no theoretical maximum
  • Problem 2 : weird behaviour for 2 wizard models.
24. Evaluation : Qini
• Qini Measure :
  • Similar to Gini (Area under lift curve). Lift Curve <-> Qini Curve
  • Parametric curve defined by : $f(t) = Y_T(t) - Y_C(t) \cdot N_T(t) / N_C(t)$
  • When taking the first $t$ observations
    • $Y_T$ is the total number of 1s seen in treated observations
    • $Y_C$ is the total number of 1s seen in control observations
    • $N_T$ is the total number of treated observations
    • $N_C$ is the total number of control observations
  • Balanced setting : $f(t) = Y_T(t) - Y_C(t)$
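A minimal sketch of how the Qini curve can be computed from scored observations. It assumes the usual convention where control successes are rescaled to the size of the treated group (factor N_T / N_C), and it sets that term to 0 while no control observation has been seen; the function name and the four example rows are hypothetical.

```python
def qini_curve(rows, score):
    """rows: list of (x, group, y); score: callable x -> predicted uplift.
    Returns [f(0), f(1), ..., f(n)], f(t) being computed on the t best-scored rows."""
    ranked = sorted(rows, key=lambda r: score(r[0]), reverse=True)
    y_t = y_c = n_t = n_c = 0
    curve = [0.0]
    for _, group, y in ranked:
        if group == "T":
            n_t += 1
            y_t += y
        else:
            n_c += 1
            y_c += y
        # Rescale control successes to the treated group's size (0 if no control yet).
        scaled_control = y_c * n_t / n_c if n_c else 0.0
        curve.append(y_t - scaled_control)
    return curve


# Made-up example: the model scores segment "p" above segment "q".
rows = [("p", "T", 1), ("p", "C", 0), ("q", "T", 0), ("q", "C", 1)]
scores = {"p": 1.0, "q": 0.0}
curve = qini_curve(rows, scores.get)
print(curve)
```

The last point of the curve only depends on the data, not on the ranking: it is the global effect of the treatment, which is also where a random ranking ends up.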
25. Evaluation : Qini
• Personal intuition :
  • We can't know everything :
    • treated that convert, not treated that don't convert. What would have happened ?
  • But we don't want to see :
    • Treated not converting
    • Not treated converting (in our top list)
  • In the first $t$ observations we want to minimize : $N_T(t) - Y_T(t) + Y_C(t)$
  • Very similar to lift taking into account only negative examples.
27. Evaluation : Qini
• Best model :
  • Rank all positives in treatment first and all positives in control last.
  • No theoretical best model :
    • depends on possibility of negative effect
    • Displayed for no negative effect
• Random model :
  • Corresponds to global effect of treatment
• Hillstrom Dataset :
  • For women, models are comparable and useful
  • For men, there are no clear individuals to target
29. Evaluation : Qini
• Back to our study :
  • Class modification performs best
  • Two models approach performs poorly
• A/B test failure :
  • Control dataset is way too small !
  • Class modification model very close to lift
  • Two models approach slightly better than random
  -> need to redo the A/B test.
30. Conclusion
• Uplift :
  • Surprisingly little literature / examples
  • The theory is rather easy to test
    • Two models
    • Class modification
  • The intuition and evaluation are not easy to grasp
• On the client side :
  • I haven't lost hope that we'll do the A/B test again
  • A good lead to select the best offer for a customer
31. A few references
• Data :
  • Churn in gaming :
    WOWAH dataset (blog post to come)
  • Uplift for healthcare :
    Colon Dataset
  • Uplift in mailing :
    Hillstrom data challenge
  • Uplift in General :
    Simulated data (blog post to come)
32. A few references
• Application
  • Uplift modeling for clinical trial data (Jaskowski, Jaroszewicz)
  • Uplift Modeling in Direct Marketing (Rzepakowski, Jaroszewicz)
33. A few references
• Modeling techniques :
  • Rzepakowski Jaroszewicz 2011 (decision trees)
  • Soltys Rzepakowski Jaroszewicz 2013 (ensemble for uplift)
  • Jaskowski Jaroszewicz 2012 (Class modification model)
34. A few references
• Evaluation
  • Using Control Groups to Target on Predicted Lift (Radcliffe)
  • Testing a New Metric for Uplift Models (Mesalles Naranjo)