Keynote talk by Marlon Dumas at the SIMULTECH 2021 conference. The talk gives an overview of ongoing research on automated construction of simulation models / digital twins from business process execution logs, including approaches that combine discrete event simulation with deep learning methods.
Mine Your Simulation Model: Automated Discovery of Business Process Simulation Models from Execution Data
1. Marlon Dumas
Professor @ University of Tartu
Co-founder @ Apromore
With contributions by Manuel Camargo and Oscar González-Rojas
SIMULTECH’2021, 7 July 2021
8. • The simulated logs are analyzed and compared using dashboards displaying
performance measures such as case duration, waiting time, and cost, and/or via
heat maps or animations
9. Business Process Simulation: Assumptions
The process model is authoritative (always followed to the letter)
• No deviations
• No workarounds
The simulation parameters accurately reflect reality
• …whereas in reality, they are often guesstimates
A resource only works on one task instance at a time / a task is performed by one resource
• No multi-tasking / no multi-resource tasks (teamwork)
Resources have robotic behavior (eager resources consume work items in FIFO mode)
• No batching
• No tiredness effects, no interruptions, no distractions beyond “stochastic” ones
Undifferentiated resources
• Every resource in a pool has the same performance as others
No time-sharing outside the simulated process
• Resources fully dedicated to one process
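The "robotic behavior" assumption can be made concrete with a minimal sketch (illustrative only; the function name and structure are not taken from any tool discussed here): eager resources always grab the oldest waiting work item, one at a time.

```python
import heapq

def simulate_fifo(arrivals, durations, n_resources):
    """Eager 'robotic' resources: each free resource immediately takes the
    oldest waiting work item (FIFO) and works on one item at a time."""
    free_at = [0.0] * n_resources        # min-heap of times at which resources free up
    heapq.heapify(free_at)
    waits = []
    for arrive, dur in sorted(zip(arrivals, durations)):  # FIFO by arrival
        t_free = heapq.heappop(free_at)   # earliest-available resource
        start = max(arrive, t_free)       # eager: starts as soon as possible
        waits.append(start - arrive)
        heapq.heappush(free_at, start + dur)
    return waits

# Two resources, four work items: the third and fourth items must wait.
print(simulate_fifo([0, 0, 1, 2], [3, 3, 3, 3], 2))  # [0, 0, 2, 1]
```

Relaxing any of the assumptions above (batching, multi-tasking, differentiated resources) breaks this simple loop, which is precisely why they are baked into classical simulators.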
10. End Result
Business process simulations based on incomplete models, guesstimates, and
simplifying assumptions are not faithful
⇒ adoption of business process simulation is disappointing
13. Given
• one or more business processes, for which we have:
• one or more process specifications and/or
• event logs generated by the execution of the processes on top of one or more
information systems
• one or more process performance measures of interest (e.g. cycle time,
resource cost)
• one or more changes to the process (interventions)
Predict
• the values of the process performance measures after the given interventions
14. Non-Functional Requirements
Predictions should be accurate. Accuracy may be measured e.g. via an error
between the predicted and the actual performance measures after the
intervention.
Predictions should be accompanied by a reliability estimate, and in most cases
the reliability should be high. Reliability could be captured e.g. by
confidence intervals.
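As an illustration of the reliability requirement, a percentile bootstrap is one simple way to attach a confidence interval to a performance measure estimated from simulated cases (a generic sketch; the function and its parameters are assumptions, not part of any tool mentioned in the talk).

```python
import random
import statistics

def bootstrap_ci(samples, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=42):
    """Percentile-bootstrap confidence interval for a performance measure
    (e.g. mean cycle time) estimated from a set of simulated cases."""
    rng = random.Random(seed)
    stats = sorted(stat(rng.choices(samples, k=len(samples)))
                   for _ in range(n_boot))
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

cycle_times = [10, 12, 11, 13, 9, 14, 10, 12]   # simulated case durations
low, high = bootstrap_ci(cycle_times)
# Report the point estimate together with its 95% interval, e.g.
# "mean cycle time 11.4, 95% CI (low, high)".
```

A narrow interval signals high reliability; a wide one warns the analyst that the what-if prediction should not be trusted on its own.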
15. Stochastic Process Model Discovery (Simod)
Input: event log
Control-flow discovery (BPMN model, via SplitMiner)
• Filtering threshold
• Parallelism threshold
Log repair & branching probability discovery
• Trace alignment + replay to determine how many times each sequence flow is
traversed
Simulation parameters discovery
• Resource pools
• Pool-task assignment
• Resource timetables
• Multi-tasking behavior
• Inter-arrival distribution
• Activity duration distributions
Simulation
• Simulator + optimizer
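The branching-probability discovery step can be approximated as follows (a simplified stand-in: Simod first aligns traces to the discovered BPMN model and replays them, whereas this sketch merely counts direct successors of each split activity in the raw log; all names are illustrative).

```python
from collections import Counter

def branching_probabilities(traces, decision_points):
    """decision_points maps a split activity to its possible direct successors.
    Count which successor follows each split occurrence across all traces and
    normalise the counts into branching probabilities."""
    counts = {act: Counter() for act in decision_points}
    for trace in traces:
        for cur, nxt in zip(trace, trace[1:]):
            if cur in decision_points and nxt in decision_points[cur]:
                counts[cur][nxt] += 1
    probs = {}
    for act, succs in decision_points.items():
        total = sum(counts[act][s] for s in succs)
        probs[act] = {s: counts[act][s] / total for s in succs} if total else {}
    return probs

# After activity A, the process branches to B or C:
log = [["A", "B", "D"], ["A", "C", "D"], ["A", "B", "D"], ["A", "B", "D"]]
print(branching_probabilities(log, {"A": ["B", "C"]}))
# {'A': {'B': 0.75, 'C': 0.25}}
```

The alignment step matters in practice because noisy logs contain events that do not fit the discovered model; counting on the repaired, replayed log avoids skewed probabilities.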
19. Data-Driven (Discrete Event) Simulation
- Assumes undifferentiated resources with robotic behavior
- Branches are selected using branching probabilities
- Models the case creation process via a probability distribution
- Takes into account resource constraints
- Models resource availability as calendars (possibly discovered from
historical data)
- May take as input a process specification (helps with interpretability)
- Provides a natural mechanism for capturing the effect of changes to the
process
Deep Learning Sequence Gen Methods
- Does not explicitly take into account resource constraints
- Models resource availability via neural networks that may capture non-linear
availability functions
- Learns the case arrival process from data (univariate or multivariate
models)
- No interpretable process specification
- Branching behavior modeled via neural networks (e.g. LSTM) that may capture
complex relations
- May capture differentiated resources and non-robotic behavior
- Does not have a mechanism for capturing the effect of changes to the process
22. • DDS Models (SIMOD) and DL
models have comparable
performance w.r.t. control-flow
similarity (CLFS)
• DL models sometimes clearly
outperform DDS models on
temporal metrics (MAE, ELS)
Could we combine them?
23. Introducing DeepSimulator
Phase 1: Activity sequence generation
• Stochastic process model discovery: control-flow model discovery, trace
alignment (log repair), branching probabilities discovery
• Activity sequence generation
• Model evaluation and optimization
Phase 2: Case start-times generation
• Time-series analysis: Prophet model training
• Case start-times generation
• Model evaluation and optimization
Phase 3: Activity timestamps generation
• Activity timestamps generator: log replay, feature engineering / embeddings,
model training
• Output log assembly
• Model evaluation and optimization
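Phase 2 trains a Prophet time-series model on the observed case arrivals. As a dependency-free approximation of the same idea, the sketch below learns an average per-weekday arrival rate from the log's case start times and then generates new start times day by day (the 9:00-17:00 working window and all function names are assumptions for illustration, not DeepSimulator's actual implementation).

```python
import random
from collections import Counter
from datetime import datetime, timedelta

def learn_weekday_rates(case_starts):
    """Average number of new cases observed per weekday in the log."""
    per_day = Counter(ts.date() for ts in case_starts)
    totals, days_seen = Counter(), Counter()
    for day, n in per_day.items():
        totals[day.weekday()] += n
        days_seen[day.weekday()] += 1
    return {wd: totals[wd] / days_seen[wd] for wd in days_seen}

def generate_start_times(rates, first_day, n_days, seed=7):
    """Generate case start timestamps day by day: take the (rounded) weekday
    rate as that day's case count and spread the cases uniformly over an
    assumed 9:00-17:00 working window."""
    rng = random.Random(seed)
    out = []
    for d in range(n_days):
        day = first_day + timedelta(days=d)
        for _ in range(round(rates.get(day.weekday(), 0))):
            out.append(day + timedelta(seconds=rng.uniform(9 * 3600, 17 * 3600)))
    return sorted(out)
```

A Prophet model additionally captures trend and holiday effects, which is why DeepSimulator uses it instead of a flat weekday profile like this one.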
25. Given
• one or more event logs recording the
execution of one or more processes
• one or more performance measures that we
seek to maximize/minimize
• a process model, decision rules and resource
allocation rules
• a set of allowed changes to the process
model and associated rules
Find
• Possible sets of changes to the process to
optimize the performance measures
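A naive way to attack this search problem is to enumerate small subsets of the allowed changes and score each one with a simulation run (a brute-force sketch under assumed names; practical approaches would prune the combinatorial search space).

```python
from itertools import combinations

def best_change_set(allowed_changes, simulate, max_size=2):
    """Enumerate all subsets of allowed changes up to max_size, score each one
    with the given simulation callback, and return the subset that minimises
    the predicted performance measure (e.g. cycle time)."""
    best, best_val = (), simulate(())
    for k in range(1, max_size + 1):
        for subset in combinations(allowed_changes, k):
            val = simulate(subset)
            if val < best_val:
                best, best_val = subset, val
    return best, best_val

# Toy scoring function standing in for a full simulation run:
score = lambda s: 100 - 10 * ("add_resource" in s) - 5 * ("remove_activity" in s)
changes = ["add_resource", "remove_activity", "extend_timetable"]
print(best_change_set(changes, score))  # (('add_resource', 'remove_activity'), 85)
```

Each call to `simulate` here stands for an entire simulation experiment, so the cost of exhaustive search grows quickly with the number of allowed changes.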
28.
• Inputs: embedded activity categories, processing or waiting time, contextual
features, inter-case features
• Concatenate
• LSTM layer, activation in {tanh, selu}, size in {50, 100}
• LSTM layer, activation in {tanh, selu}, size in {50, 100}
• Dense layer {linear}
• Output: current activity processing time, or waiting time until the next
activity
30. What-if assessment (example change: remove an activity)
Baseline model:
• Train DSIM baseline model
• Generate log
• Simulate using the simulation model
Updated model:
• Apply the list of changes: update model, generate synthetic examples, update
embeddings
• Replace embeddings of the generative models
• Train DSIM modified model
• Generate log
• Simulate the change using the modified simulation model
Evaluator: compare the baseline and updated simulated logs
31. Results
Accuracy of the "as-is" simulation model vs. accuracy of the "what-if"
simulation model after adding an activity:

Dataset | Version | MAE     | RMSE    | MAPE
D1      | As-is   | 7155    | 22006   | 0.25497
D1      | What-if | 17546   | 33137   | 0.42854
D2      | As-is   | 283061  | 357717  | 0.25912
D2      | What-if | 1040344 | 1052255 | 0.95599
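For reference, the three error metrics in the table are computed as follows (straightforward textbook definitions, not code from the experiments).

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error: penalises large errors more heavily than MAE."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))

def mape(y_true, y_pred):
    """Mean absolute percentage error (as a fraction, matching the table)."""
    return sum(abs(a - b) / abs(a) for a, b in zip(y_true, y_pred)) / len(y_true)

# Example: ground-truth vs. simulated cycle times
print(mae([100, 200], [110, 190]))   # 10.0
print(rmse([100, 200], [110, 190]))  # 10.0
```

Because MAPE is scale-free, it is the easiest column to compare across datasets D1 and D2, whose absolute time scales differ by orders of magnitude.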
32. References
Limitations and pitfalls of traditional BP simulation
• van der Aalst: Business Process Simulation Survival Guide. In Handbook on Business
Process Management Vol. 1, 2015, 337-370
Data-Driven Simulation (discovering simulation models from logs)
• Rozinat et al. Discovering simulation models. Inf. Syst. 34(3): 305-327 (2009)
• Martin et al. The Use of Process Mining in Business Process Simulation Model
Construction - Structuring the Field. Bus. Inf. Syst. Eng. 58(1): 73-87 (2016)
• Camargo et al. Automated discovery of business process simulation models from event
logs. Decis. Support Syst. 134:113284, 2020 https://arxiv.org/abs/2009.03567
• Pourbafrani et al. Extracting Process Features from Event Logs to Learn Coarse-Grained
Simulation Models. CAiSE 2021: 125-140
Data-Driven Simulation and Deep Learning
• Camargo et al. Discovering Generative Models from Event Logs: Data-driven Simulation vs
Deep Learning, 2020 https://arxiv.org/abs/2009.03567
• Camargo et al. Learning Accurate Business Process Simulation Models from Event Logs via
Automated Process Discovery and Deep Learning. arXiv, 2021.
https://arxiv.org/abs/2103.11944