The document proposes a framework that combines process mining and causal machine learning to discover causal rules from event logs. It identifies candidate treatments using action rule mining on event log data. It then uses uplift trees to discover subgroups where a treatment has a high causal effect on outcomes, addressing confounding. The approach was tested on a loan application dataset, discovering 8 causal rules that recommend process changes to increase loan selections. Future work includes incorporating other recommendation types and addressing unobserved confounding.
1. Process Mining Meets Causal Machine Learning: Discovering Causal Rules from Event Logs
Zahra Dasht Bozorgi, Irene Teinemaa, Marlon Dumas,
Marcello La Rosa, and Artem Polyvyanyy
2nd International Conference on Process Mining (ICPM 2020)
Padua, Italy, 4-9 October 2020
2. Introduction
An approach that addresses the following:
• Discover case-level treatment
recommendations to increase the positive
outcome rate of a process;
• Identify subsets of cases to which a
recommendation should be applied;
• Estimate the causal effect and the incremental
Return-on-Investment (ROI) of a treatment.
3. Preliminaries
What is Causal Inference?
Inferring the effects of any
treatment/policy/intervention/etc.
Examples:
• Effect of a treatment on a disease
• Effect of climate change policy on emissions
• Effect of an action on a business process outcome
Notation:
Y denotes outcome
T denotes treatment assignment
X denotes case attributes
4. Preliminaries
How do we measure causal effects?
Taking the difference between potential outcomes
Example: loan application process
Treatment/intervention: calling the customer after offer
Outcome: whether customer selects a loan offer or not
[Figure: two hypothetical worlds for the same case: World 1, where the treatment is applied (expected outcome E[Y1]), and World 2, where it is not (E[Y0])]
Average Treatment Effect:
ATE = E[Y1 – Y0]
Conditional Average Treatment Effect:
CATE = E[Y1 – Y0 | X = x]
CATE is also known as Uplift
in marketing literature.
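As a concrete illustration, in a randomised setting the ATE and CATE estimates reduce to differences in group means. A minimal Python sketch with made-up records (all values and attribute names are invented for illustration):

```python
# Illustrative sketch, not from the paper: estimating ATE and CATE
# from a (tiny, fabricated) randomised sample.
# Each record: (treated?, outcome, loan_goal).
records = [
    (1, 1, "home"), (1, 1, "home"), (1, 0, "car"),
    (0, 0, "home"), (0, 1, "car"),  (0, 0, "home"),
]

def mean_outcome(rows):
    """Average outcome Y over a list of records."""
    return sum(y for _, y, _ in rows) / len(rows)

treated = [r for r in records if r[0] == 1]
control = [r for r in records if r[0] == 0]

# ATE estimate: difference in mean outcomes between the two groups
ate = mean_outcome(treated) - mean_outcome(control)

# CATE (uplift) estimate for the subgroup X = "home"
home_t = [r for r in treated if r[2] == "home"]
home_c = [r for r in control if r[2] == "home"]
cate_home = mean_outcome(home_t) - mean_outcome(home_c)

print(round(ate, 3), cate_home)
```

Note that this simple difference-in-means is only valid because treatment was randomised; with observational data the same arithmetic is biased by confounding, which is exactly the problem the later slides address.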
5. Preliminaries
id | T | Y | Y1 | Y0 | Y1 – Y0
 1 | 0 | 0 | ?  | 0  | ?
 2 | 1 | 1 | 1  | ?  | ?
 3 | 1 | 0 | 0  | ?  | ?
 4 | 0 | 0 | ?  | 0  | ?
 5 | 0 | 1 | ?  | 1  | ?
 6 | 1 | 1 | 1  | ?  | ?
Fundamental Problem of Causal Inference:
For each case, only one potential outcome is ever observed; causal inference is therefore a missing data problem.
Ideal solution: a randomised experiment, but this is often:
• Expensive
• Time-consuming
• Potentially unethical
Next best solution: an observational study
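The missing-data structure of the table above can be made explicit in code. A minimal Python sketch, assuming the six observed rows from this slide:

```python
# Sketch of the fundamental problem: for each case only one potential
# outcome is observed; the other is missing (None).
# Rows mirror the table on this slide: (id, T, Y).
observed = [(1, 0, 0), (2, 1, 1), (3, 1, 0), (4, 0, 0), (5, 0, 1), (6, 1, 1)]

table = []
for cid, t, y in observed:
    y1 = y if t == 1 else None   # Y1 is observed only for treated cases
    y0 = y if t == 0 else None   # Y0 is observed only for controls
    effect = None                # Y1 - Y0 is never observable per case
    table.append((cid, t, y, y1, y0, effect))

# Every individual-level effect entry is missing; this is why we fall
# back to group-level quantities such as the ATE and CATE.
assert all(row[5] is None for row in table)
```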
8. Candidate Treatment Identification
Aim: generating recommendations automatically
Method: Action rule mining
• Extension of classification rules
• Suggests actions to change class label
• Based on support
Action Rule: r = [(a: a1) ∧ (b: b1 → b2)] ⇒ [y: y1 → y2]
Input: case attributes, divided into controllable and uncontrollable attributes
9. Candidate Treatment Identification
Example: loan application process
Attributes: Loan Goal (uncontrollable), Number of terms (controllable)
Rule: r = [(LoanGoal: Home Improvement) ∧ (NumOfTerms: 48 → 84)] ⇒ [Selected: 0 → 1], with support = 0.05
Structure: Pre-condition ∧ Action ⇒ Outcome
Reading: in loan applications where the loan goal is home improvement, changing the number of payback months (terms) from 48 to 84 increases the chance that the customer selects the loan offer.
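A rule like this can be represented as a simple data structure. The sketch below is a hypothetical encoding, not the ActionRules package's API; attribute names follow the example above:

```python
# Hypothetical encoding of the slide's action rule (structure invented
# for illustration): a precondition on an uncontrollable attribute, an
# action on a controllable one, and a desired outcome flip.
rule = {
    "precondition": {"LoanGoal": "Home Improvement"},
    "action": {"NumOfTerms": (48, 84)},   # change 48 -> 84
    "outcome": {"Selected": (0, 1)},      # desired flip 0 -> 1
}

def applicable(case, rule):
    """A case is a candidate for the rule if it satisfies the
    precondition and currently has the action's 'before' value."""
    pre_ok = all(case.get(k) == v for k, v in rule["precondition"].items())
    act_ok = all(case.get(k) == before
                 for k, (before, _) in rule["action"].items())
    return pre_ok and act_ok

case = {"LoanGoal": "Home Improvement", "NumOfTerms": 48, "Selected": 0}
print(applicable(case, rule))  # prints True: a candidate for treatment
```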
10. Causal Rules Discovery
Aim: Discovering subgroups where an action has a
high causal effect on the outcome
Method: Uplift Trees [1]
Uplift tree: predicts which cases will have a
positive outcome because a treatment is applied.
[1] P. Rzepakowski and S. Jaroszewicz, "Decision trees for uplift modelling with single and multiple treatments," Knowl. Inf. Syst., vol. 32, no. 2, 2012.
11. Uplift Tree
Splitting criterion (divergence gain):
D_gain(A) = D(P^T(Y) : P^C(Y) | A) − D(P^T(Y) : P^C(Y))
where:
• D(·) is a divergence measure
• P^T is the probability distribution of the outcome in the treatment group
• P^C is the probability distribution of the outcome in the control group
• A is a candidate split (test)
Divergence metrics:
• Kullback–Leibler divergence
• Squared Euclidean distance
• Chi-squared divergence
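The three divergence metrics are straightforward to compute for binary outcome distributions. A minimal Python sketch with hypothetical treatment and control distributions:

```python
import math

# Sketch of the three divergence measures D(P_T : P_C) named on the
# slide, for outcome distributions given as [P(Y=0), P(Y=1)].
def kl(p, q):
    """Kullback-Leibler divergence sum(p_i * log(p_i / q_i))."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def euclid_sq(p, q):
    """Squared Euclidean distance sum((p_i - q_i)^2)."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))

def chi_sq(p, q):
    """Chi-squared divergence sum((p_i - q_i)^2 / q_i)."""
    return sum((pi - qi) ** 2 / qi for pi, qi in zip(p, q))

# Hypothetical outcome distributions (numbers invented):
p_t = [0.4, 0.6]   # treated: 60% positive outcomes
p_c = [0.7, 0.3]   # control: 30% positive outcomes

print(round(euclid_sq(p_t, p_c), 2))  # 0.18
```

A split is preferred when it increases the chosen divergence between the treatment and control outcome distributions within its child nodes, i.e. when it isolates regions where the treatment changes the outcome distribution most.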
12. Uplift Tree
Recall:
Uplift = E[Y1 | X=x] – E[Y0 | X=x]
But in observational data, in general:
E[Y | T=1, X=x] – E[Y | T=0, X=x] ≠ E[Y1 | X=x] – E[Y0 | X=x]
The reason is Confounding.
13. Confounding
Confounding is the presence of a variable that affects both treatment assignment and the outcome.
How do we address confounding in uplift trees?
Normalisation
NT and NC are the numbers of cases in the treatment and control groups respectively,
N is the total number of cases, A is a candidate split (test), and H(·) is entropy.
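The normalisation factor itself appears only as an image in the original slide. For the KL-based criterion, the cited uplift-tree paper (Rzepakowski and Jaroszewicz, 2012) defines it along the following lines; this is a sketch reconstructed from that paper, not the slide's exact formula:

```latex
I(A) = H\!\left(\tfrac{N_T}{N}, \tfrac{N_C}{N}\right) \cdot
       \mathrm{KL}\!\left(P^{T}(A) : P^{C}(A)\right)
     + \tfrac{N_T}{N}\, H\!\left(P^{T}(A)\right)
     + \tfrac{N_C}{N}\, H\!\left(P^{C}(A)\right)
     + \tfrac{1}{2}
```

Here P^T(A) and P^C(A) are the distributions of the split attribute's values in the treatment and control groups. The divergence term penalises splits whose values separate treatment from control (a symptom of confounding), and the entropy terms penalise tests with many values.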
14. Example Causal Rule
Example rule:
• Action: [NumOfTerms: 48 → 84]
• Objective: [Selected: 0 → 1]
• Sub-group: customers where
a) the loan goal is not existing loan takeover, AND
b) the credit score is less than 920, AND
c) the offer's monthly cost is greater than 149, AND
d) the first withdrawal amount is less than 8,304
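Applying such a causal rule operationally amounts to a subgroup membership test. A hypothetical Python sketch of the rule above (attribute names invented to mirror the slide's conditions):

```python
# Hypothetical membership test for the example rule's subgroup.
# Attribute names are invented for illustration.
def in_subgroup(case):
    """True if the case satisfies all four subgroup conditions,
    i.e. it should be treated with NumOfTerms 48 -> 84."""
    return (case["LoanGoal"] != "Existing loan takeover"
            and case["CreditScore"] < 920
            and case["MonthlyCost"] > 149
            and case["FirstWithdrawalAmount"] < 8304)

case = {"LoanGoal": "Home Improvement", "CreditScore": 900,
        "MonthlyCost": 200, "FirstWithdrawalAmount": 5000}
print(in_subgroup(case))  # prints True
```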
15. Ranking Rules using Cost-benefit Model
Parameters:
• v: value (benefit) of a positive outcome
• c: impression cost for a treatment
• u: uplift of applying a treatment
• n: size of the treated group
Net value of applying the recommendation:
Net = n × (u × v - c)
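With the net value formula in hand, ranking rules is a one-liner. A minimal Python sketch with invented numbers:

```python
# Sketch of the slide's cost-benefit model: Net = n * (u * v - c).
def net_value(n, u, v, c):
    """n: size of the treated group, u: uplift of the treatment,
    v: value of a positive outcome, c: per-case impression cost."""
    return n * (u * v - c)

# Two hypothetical rules ranked by net value (all numbers invented):
rules = {"rule_a": net_value(n=500, u=0.10, v=100, c=2),
         "rule_b": net_value(n=2000, u=0.02, v=100, c=1)}
ranked = sorted(rules, key=rules.get, reverse=True)
print(ranked[0], round(rules[ranked[0]]))  # prints: rule_a 4000
```

A rule with a small uplift can still rank first if it applies to a large enough group, which is why the model multiplies the per-case margin by n.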
16. Dataset and Experimental setup
BPI Challenge 2017 dataset:
• 31,509 applications (cases)
• 42,995 offers
• Outcome: whether the customer selects the offer
Uncontrollable attributes:
• Application Type
• Loan Goal
• Credit Score
• Requested Amount
Controllable attributes:
• Number of Offers
• Number of Payback Terms
• Monthly Cost
• Initial Withdrawal Amount
Setup and results:
• The approach was implemented in Python 3.7, using the ActionRules package to generate candidate treatments and the CausalML package to construct uplift trees.
• Running the action rule discovery algorithm on this dataset yielded 24 rules containing 17 distinct recommendations.
• An uplift tree was constructed for each of the 17 recommendations, resulting in 8 causal rules.
• Details of the recommendations can be found in the paper.
17. Comparison of Results
• Increase the number of terms from 48 months to more than 120 months for customers whose credit scores are between 899 and 943 and whose first withdrawal amount is less than 8,304.
• Decrease the duration of the process: delays are mostly due to unresponsive customers.
18. Future Work
• Incorporating other types of recommendation
• Addressing other goals, such as reduction of cycle time and waste
• Adjusting for unobserved confounding effects
• Conducting complementary evaluations such as randomised experiments
19. Thank you
Any Questions?
Zahra Dasht Bozorgi
zdashtbozorg@student.unimelb.edu.au
School of Computing and Information Systems
University of Melbourne