Two-day course delivered at the Chinese Business Process Management (BPM) Summer School in Jinan, China, 23-24 August 2018. The course introduces a range of techniques, tools, and algorithms for process monitoring and mining.
3. Back to basics…
1. Any process is better than no process
2. A good process is better than a bad process
3. Even a good process can be improved
4. Any good process eventually becomes a bad process
• …unless continuously cared for
— Michael Hammer
8. Operational process dashboards
• Aimed at process workers & operational managers
• Emphasis on monitoring (detect-and-respond), e.g.:
- Work-in-progress
- Problematic cases – e.g. overdue/at-risk cases
- Resource load
9. Tactical dashboards
• Aimed at process owners / managers
• Emphasis on analysis and management, e.g. detecting bottlenecks
• Typical process performance indicators:
- Cycle times
- Error rates
- Resource utilization
11. Strategic dashboards
• Aimed at executives & managers
• Emphasis on linking process performance to strategic objectives
12. Strategic Performance Dashboard @ Australian Utilities Provider
Key Performance measure per process (Manage Unplanned Outages / Manage Emergencies & Disasters / Manage Work Programming & Resourcing / Manage Procurement):
• Customer Satisfaction: 0.5 / 0.55 / - / 0.2
• Customer Complaint: 0.6 / - / - / 0.5
• Customer Feedback: 0.4 / - / - / 0.8
• Connection Less Than Agreed Time: 0.3 / 0.6 / 0.7 / -
13. Overall Process Performance
[Dashboard figure drilling down from strategy to processes:
• 1st layer – Key Result Areas: Financial, Customer Excellence, Operational Excellence, People, Risk Management, Health & Safety
• 2nd layer – Key Performance indicators: Customer Satisfaction, Customer Complaint, Customer Rating (%), Customer Loyalty Index, Satisfied Customer Index, Market Share (%), Average Time Spent on Plan
• 3rd & 4th layer – Process Performance Measures for the processes Manage Emergencies & Disasters, Manage Procurement, Manage Unplanned Outages (overall scores 0.54, 0.58, 0.67)]
14. Teamwork
Sketch operational and tactical process monitoring dashboards for CVS Pharmacy’s prescription fulfillment process.
Consider the viewpoints of each stakeholder in the process.
17. Process Mining
[Overview diagram: an event log feeds Process Discovery, which produces a discovered model; an event log plus an input model feed Conformance Checking; two event logs (event log and event log’) feed Variants Analysis, which produces difference diagnostics; an event log plus an input model feed Performance Mining, which produces an enhanced model]
22. Process Maps
• A process map of an event log is a graph where:
• Each activity is represented by one node
• An arc from activity A to activity B means that A is directly
followed by B in at least one trace in the log
• Arcs in a process map can be annotated with:
• Absolute frequency: how many times is A directly followed by B?
• Relative frequency: in what percentage of the cases where A is
executed is it directly followed by B?
• Time: what is the average time between the occurrence of A
and the occurrence of B?
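The directly-follows relation behind a process map is easy to compute. The sketch below (a minimal illustration, not from any specific tool; the function name and toy log are made up) counts arc frequencies from a log given as lists of activity labels:

```python
from collections import Counter

def directly_follows(traces):
    """Count how often each pair (A, B) occurs with B directly
    following A across all traces in the log."""
    arcs = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            arcs[(a, b)] += 1
    return arcs

log = [list("abcd"), list("acbd"), list("abd")]
arcs = directly_follows(log)
print(arcs[("a", "b")])            # absolute frequency of arc a -> b
count_a = sum(t.count("a") for t in log)
print(arcs[("a", "b")] / count_a)  # relative frequency of a -> b
```

Dividing an arc count by the number of occurrences of its source activity gives the relative-frequency annotation described above.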
24. Process Maps – Exercise

Case ID | Task Name       | Originator | Timestamp
1       | File Fine       | Anne       | 20-07-2004 14:00:00
2       | File Fine       | Anne       | 20-07-2004 15:00:00
1       | Send Bill       | system     | 20-07-2004 15:05:00
2       | Send Bill       | system     | 20-07-2004 15:07:00
3       | File Fine       | Anne       | 21-07-2004 10:00:00
3       | Send Bill       | system     | 21-07-2004 14:00:00
4       | File Fine       | Anne       | 22-07-2004 11:00:00
4       | Send Bill       | system     | 22-07-2004 11:10:00
1       | Process Payment | system     | 24-07-2004 15:05:00
1       | Close Case      | system     | 24-07-2004 15:06:00
2       | Reminder        | Mary       | 20-08-2004 10:00:00
3       | Reminder        | John       | 21-08-2004 10:00:00
2       | Process Payment | system     | 22-08-2004 09:05:00
2       | Close Case      | system     | 22-08-2004 09:06:00
4       | Reminder        | John       | 22-08-2004 15:10:00
4       | Reminder        | Mary       | 22-08-2004 17:10:00
4       | Process Payment | system     | 29-08-2004 14:01:00
4       | Close Case      | system     | 29-08-2004 17:30:00
3       | Reminder        | John       | 21-09-2004 10:00:00
3       | Reminder        | John       | 21-10-2004 10:00:00
3       | Process Payment | system     | 25-10-2004 14:00:00
3       | Close Case      | system     | 25-10-2004 14:01:00
25. Process Maps in Disco
• Disco (and other commercial process mining tools) use
process maps as the main visualization technique for
event logs
• These tools also provide three types of operations:
1. Abstract the process map:
• Show only most frequent activities
• Show only most frequent arcs
2. Filter the traces in the event log…
26. Types of filters
• Event filters
• Retain only events that fulfil a given condition (e.g. all events
of type “Create purchase order”)
• Performance filter
• Retain traces that have a duration above or below a given
value
• Event pair filter (a.k.a. “follower” filter)
• Retain traces where there is a pair of events that fulfil a given
condition (e.g. “Create invoice” followed by “Create purchase
order”)
• Endpoint filter
• Retain traces that start with or finish with an event that fulfils
a given condition
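The four filter types above can be sketched over a log represented as lists of activity labels. This is a minimal illustration of the semantics, not the API of Disco or any other tool; all function names, parameters, and the duration encoding are assumptions:

```python
def event_filter(log, pred):
    """Retain, within each trace, only the events that satisfy pred."""
    return [[e for e in trace if pred(e)] for trace in log]

def performance_filter(log, durations, min_d=None, max_d=None):
    """Retain traces whose duration lies within the given bounds
    (durations[i] is the cycle time of log[i])."""
    return [t for t, d in zip(log, durations)
            if (min_d is None or d >= min_d) and (max_d is None or d <= max_d)]

def follower_filter(log, first, second):
    """Retain traces where `first` is eventually followed by `second`."""
    return [t for t in log
            if first in t and second in t[t.index(first) + 1:]]

def endpoint_filter(log, start=None, end=None):
    """Retain traces that start with `start` and/or finish with `end`."""
    return [t for t in log
            if (start is None or t[0] == start)
            and (end is None or t[-1] == end)]
```

Note that the event filter removes events inside traces, while the other three keep or drop entire traces — the same distinction the tools make.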
27. Process Maps in Disco (cont.)
In addition to abstracting the process map and filtering the traces in the event log, these tools provide a third type of operation:
3. Enhance the process map
28. Process Map Enhancement
• Nodes and arcs in a process map can be color-coded or thickness-coded to capture:
• Frequency: how often does a given task or a given directly-follows relation occur?
• Time performance: processing times, waiting times, and cycle times of tasks
• More advanced tools support enhancement by other attributes (e.g. cost, revenue) if the data is available.
30. Using Disco, answer the following questions on the
PurchasingExample log:
• How many cases had to settle a dispute with the
purchasing agent?
• Is there a difference in cycle time for the cases that
had to settle a dispute with the purchasing agent,
compared to the ones that did not? Make sure you
only compare cases that actually reach the endpoint
‘Pay invoice’
• Are there any cases where the invoice is released and
authorized by the same resource? And if so, who is
doing this most often?
Exercise
Exercise by Anne Rozinat, Fluxicon
31. One more exercise: Refund process
Consider the dataset of a refund process from an electronics manufacturer.
Customer complaints and the inspection of individual cases indicate that this
process suffers from inefficiencies and overly long cycle times. Assume that only
cases that have reached the ‘Order completed’ event are finished.
Questions:
1. Is it a problem if you take the average cycle time of all cases, including the
ones that have not finished yet?
2. In general, which channel(s) have the biggest problems with missing
documents that need to be requested from the customer?
3. How many customers have received a refund without the product being
received by the electronics manufacturing company? This should not happen
in this process.
4. Has a customer ever received a double payment? This should not happen in
this process.
To complete this exercise, use the log RefundProcess.fbt
32. Process Maps - Limitations
• Process maps over-generalize: some paths in a process map may not exist in the process or may not make sense
• Example: draw the process map of [abc, adc, afce, afec] and check which traces it recognizes for which there is no support in the event log
• Process maps make it difficult to distinguish conditional branching, parallelism, and loops
• See the previous example… or a simpler one: [abcd, acbd]
• Solution: automated BPMN process discovery
• More on this tomorrow…
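The over-generalization in the example above can be checked mechanically: build the directly-follows graph of the log and enumerate the short start-to-end paths it recognizes. This is a throwaway sketch (all names are made up) over traces written as strings:

```python
from collections import defaultdict

def dfg(traces):
    """Build the directly-follows graph of a log (traces as strings)."""
    arcs = defaultdict(set)
    for t in traces:
        for a, b in zip(t, t[1:]):
            arcs[a].add(b)
    return arcs

def accepted(arcs, starts, ends, max_len):
    """All paths through the DFG from a start to an end activity,
    up to max_len activities: everything the process map 'recognizes'."""
    out, stack = set(), [(s,) for s in starts]
    while stack:
        path = stack.pop()
        if path[-1] in ends:
            out.add("".join(path))
        if len(path) < max_len:
            for nxt in arcs[path[-1]]:
                stack.append(path + (nxt,))
    return out

log = ["abc", "adc", "afce", "afec"]
arcs = dfg(log)
starts, ends = {t[0] for t in log}, {t[-1] for t in log}
extra = accepted(arcs, starts, ends, 5) - set(log)
print(sorted(extra))  # traces the map recognizes but the log never shows
```

For this log the recognized-but-unobserved traces include, for instance, afc and abce — exactly the over-generalization the slide warns about.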
33. Process Mining
[Overview diagram repeated]
34. Process Performance Mining
• Dotted charts
• One line per trace; each line contains one point per event
• Each event type is mapped to a colour
• The position of a point denotes its occurrence time (on a relative scale)
• Bird’s-eye view of the timing of different events (e.g. activity end times), but does not allow one to see the processing times of activities
• Timeline diagrams
• One line per trace; each line contains segments capturing the start and end of tasks
• Captures processing time (unlike dotted charts)
• Not scalable to large event logs – good for showing “representative” traces
• Performance-enhanced process maps
• Process maps where nodes are colour-coded w.r.t. a performance measure
• Nodes may represent activities (the default option), but they may also represent resources, in which case arcs denote hand-offs between resources
39. Exercise
• Consider the following event log of a telephone
repair process: http://tinyurl.com/repairLogs
• What are the bottlenecks in this process?
• Which task has the longest waiting time and which one
has the longest processing time?
40. Process Mining
[Overview diagram repeated]
41. Variants Analysis
Given two logs, find the differences and the root causes of variation or deviance between the two logs.
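A simple starting point for variants analysis is to compare how often each trace variant occurs in the two logs. The sketch below (an illustration with made-up names and toy logs, not a description of any tool's method) ranks variants by their frequency gap:

```python
from collections import Counter

def variant_diff(log1, log2):
    """Compare the relative frequency of each trace variant across two
    logs; return rows (variant, freq1, freq2, delta) sorted by |delta|."""
    c1, c2 = Counter(map(tuple, log1)), Counter(map(tuple, log2))
    n1, n2 = len(log1), len(log2)
    rows = []
    for v in set(c1) | set(c2):
        f1, f2 = c1[v] / n1, c2[v] / n2
        rows.append((v, f1, f2, f1 - f2))
    rows.sort(key=lambda r: abs(r[3]), reverse=True)
    return rows

L1 = [("a", "b", "c")] * 8 + [("a", "c")] * 2
L2 = [("a", "b", "c")] * 3 + [("a", "c")] * 7
for variant, f1, f2, delta in variant_diff(L1, L2):
    print(variant, f1, f2, delta)
```

The variants with the largest gaps are the first candidates for root-cause analysis; real techniques (such as the one used in the Suncorp case study below) go further and compare behaviour, not just frequencies.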
42. Case Study: Variants Analysis at Suncorp
[Chart plotting claim variants against an expected performance line, with variants rated OK, Good, or Bad]
43. Variants Analysis via Process Map Comparison
[Side-by-side process maps: simple claims handled quickly vs. simple claims handled slowly — where do they differ?]
S. Suriadi et al.: Understanding Process Behaviours in a Large Insurance Company in Australia: A Case Study. CAiSE 2013
44. Variants analysis - Exercise
We consider a process for handling health insurance claims, for which
we have extracted two event logs, namely L1 and L2. Log L1 contains
all the cases executed in 2011, while L2 contains all cases executed in
2012. The logs are available in the book’s companion website or
directly at: http://tinyurl.com/InsuranceLogs
Based on these logs, answer the following questions using a process
mining tool:
1. What is the cycle time of each log?
2. Where are the bottlenecks (highest waiting times) in each of the
two logs and how do these bottlenecks differ?
3. Describe the differences between the frequency of tasks and the
order in which tasks are executed in 2011 (L1) versus 2012 (L2).
Hint: If you are using process maps, you should consider using the
abstraction slider in your tool to hide some of the most
infrequent arcs so as to make the maps more readable
45. Process Mining
[Overview diagram repeated]
47. Conformance Checking:
Unfitting vs. Additional Behavior
Unfitting behaviour:
• Task C is optional (i.e. may be skipped) in the log
Additional behavior:
• The cycle including IGDF is not observed in the log
Event log:
ABCDEH
ACBDEH
ABCDFH
ACBDFH
ABDEH
ABDFH
48. Conformance Checking in Apromore
Full demo at:
https://www.youtube.com/watch?v=3d00pORc9X8
49. Open-source tools: Apromore (apromore.org)
• Open-source BPM analytics platform, delivered as Software as a Service
• Focus is on end users (business analysts and operations managers), not on data scientists
• Over 40 plugins
50. Key features
• Repository of process models and event logs (BPMN, AML, XPDL, EPML, YAWL, XES, MXML)
• Offers a range of features along the BPM lifecycle:
From logs
• Automated discovery of BPMN models
• Filter noise from a log
• Visualize a log
• Mine process stages
From models
• Structure a model
On logs
• Animate logs
• Compare model-log and log-log
• Detect and characterize drifts
• Measure log complexity
• Mine process performance
• Predict outcomes and performance (via Nirdizati)
On models
• Measure model complexity
• Compare model-model
• Detect clones
• Search for similar models
• Simulate a model
• Merge model variants
• Configure a model with a questionnaire
51. Access Apromore
You can access it in the cloud or download and install a standalone version
Cloud-version
• Node 1(Estonia): http://apromore.cs.ut.ee
• Node 2 (Australia): http://apromore.qut.edu.au
Standalone
• One-click: a lightweight version of Apromore. Simply unzip and run from
localhost
• Full-fledged: for developers and advanced users, this distribution gives you
full control over Apromore
Source code
• Apromore’s source code is open-source, licensed under LGPL 3.0
• The code can be accessed from GitHub
52. ProM: the very first process mining tool
• 600+ plug-ins available for the whole process mining
spectrum
• Open source license
• Download it from www.processmining.org
53. Nirdizati: predictive process monitoring (nirdizati.com)
• Predict process outcome (e.g. “Is this loan offer going to be rejected?”)
• Predict process performance (e.g. “Will this claim take longer than 5 days to be handled?”)
• Predict future events (e.g. “What activity is likely to be executed next? And after that?”)
55. BPMN-Based Process Mining
[Overview diagram repeated]
58. Accuracy of Automatically Discovered Process Models
• Fitness: to what extent does the behaviour observed in the event log fit the process model?
• No unfitting behaviour ⇒ Fitness = 1
• Precision: how much additional behaviour does the process model allow that is not observed in the event log?
• No additional behaviour ⇒ Precision = 1
• Generalization (of an algorithm): given a (partial) event log of a process, to what extent does the discovery algorithm produce models that fit the behaviour of the process that is not observed in the log?
59. Measuring Fitness
• Replay
• Replay each trace against the model
• When a parsing error occurs, repair it locally
• Keep track of the parsing errors
• Does not calculate an exact distance measure!
• Optimal Trace Alignment
• For each trace t in the log, find the trace t’ of the process model such that the string-edit distance between t and t’ is minimal
• Use the string-edit distances to calculate a “distance” between log and model
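The alignment idea can be sketched with plain string-edit distance, under the simplifying assumption that the model's trace set can be enumerated (real aligners work on the model directly; the function names here are made up):

```python
def edit_distance(s, t):
    """Classic Levenshtein distance via dynamic programming (one row)."""
    dp = list(range(len(t) + 1))
    for i, a in enumerate(s, 1):
        prev, dp[0] = dp[0], i
        for j, b in enumerate(t, 1):
            # deletion, insertion, or (mis)match on the diagonal
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1,
                                     prev + (a != b))
    return dp[-1]

def alignment_fitness(log, model_traces):
    """Average closeness of each log trace to its best-matching model
    trace, normalized to [0, 1] (1 = every trace reproduced exactly)."""
    total = 0.0
    for trace in log:
        total += max(1 - edit_distance(trace, m) / max(len(trace), len(m))
                     for m in model_traces)
    return total / len(log)
```

Unlike the replay heuristic, this yields a proper distance between log and model, which is why alignment-based fitness is the preferred measure.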
81. Accuracy of automatically discovered process models
The accuracy of an automatically discovered process model consists of three quality dimensions:
1. Fitness: the discovered model should allow for the behavior seen in the event log.
A model has perfect fitness if all traces in the log can be replayed from the beginning to the end.
82. Accuracy of process models
The accuracy of an automatically discovered process model consists of three quality dimensions:
1. Fitness
2. Precision (avoid underfitting): the discovered model should not allow for behavior completely unrelated to what was seen in the event log.
83. Accuracy of process models
The accuracy of an automatically discovered process model consists of three quality dimensions:
1. Fitness
2. Precision (avoid underfitting)
3. Generalization (avoid overfitting): the discovered model should generalize the example behavior seen in the event log.
96. Computing fitness: basic approach
L = { <a,b,i,j,k,l>10, <a,b,g,j,k,i,l>140, <a,f,g,j,i,k>5, <a,f,g,i,j,k,l>360 }
A “basic approach” to compute fitness is to count the fraction of cases that can be “parsed completely” (i.e., the proportion of cases corresponding to firing sequences leading from [start] to [end]).
97. Computing fitness: basic approach (cont.)
Fitness = 0.97
98. Computing fitness: event-based approach
• In the basic fitness computation, we stop replaying a trace once we encounter a problem and mark the trace as non-fitting.
• An event-based approach to calculating fitness consists of continuing to replay the trace on the model and:
• recording all situations where a transition is forced to fire without being enabled, i.e., counting all missing tokens;
• recording the tokens that remain at the end.
• It uses four counters:
• p = produced tokens
• c = consumed tokens
• m = missing tokens
• r = remaining tokens
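The counters can be maintained in a few lines of code. This is a minimal sketch of token replay on a toy Petri net encoded as {transition: (input places, output places)}; the encoding, names, and example net are illustrative, not from the slides:

```python
from collections import Counter

def replay(trace, net, source, sink):
    """Token replay of one trace on a Petri net given as
    {transition: (input_places, output_places)}.
    Returns the counters (p, c, m, r)."""
    marking = Counter({source: 1})
    p, c, m = 1, 0, 0            # environment puts one token in the source
    for t in trace:
        ins, outs = net[t]
        for place in ins:        # consume input tokens, noting missing ones
            if marking[place] > 0:
                marking[place] -= 1
            else:
                m += 1           # transition fired without being enabled
            c += 1
        for place in outs:       # produce output tokens
            marking[place] += 1
            p += 1
    c += 1                       # environment consumes the token in the sink
    if marking[sink] > 0:
        marking[sink] -= 1
    else:
        m += 1
    r = sum(marking.values())    # tokens left behind
    return p, c, m, r

def trace_fitness(p, c, m, r):
    # standard token-replay fitness: penalize missing and remaining tokens
    return 0.5 * (1 - m / c) + 0.5 * (1 - r / p)

# A tiny sequential net: source -[A]-> p1 -[B]-> sink
net = {"A": (["source"], ["p1"]), "B": (["p1"], ["sink"])}
print(replay(["A", "B"], net, "source", "sink"))  # perfectly fitting trace
print(replay(["B"], net, "source", "sink"))       # skips A: m and r both hit
```

Replaying the skipped-A trace forces B to fire without being enabled (m = 1) and leaves the source token behind (r = 1), exactly the two penalties the slides track.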
106–114. Computing fitness: event-based approach
[Animation: step-by-step token replay of one trace on the Petri net, updating the counters (p, c, m, r) per place; one transition fires without being enabled and one token is left behind. Final counters: p = 12, c = 12, m = 1, r = 1.]
122–123. Computing fitness: event-based approach
L = { <a,b,i,j,k,l>10, <a,b,g,j,k,i,l>140, <a,f,g,j,i,k>5, <a,f,g,i,j,k,l>360 }
[Animation: token replay of a trace that does not reach the end; one token remains. Final counters: p = 12, c = 11, m = 0, r = 1.]
129. Computing fitness at log level
L = { <a,b,i,j,k,l>10, <a,b,g,j,k,i,l>140, <a,f,g,j,i,k>5, <a,f,g,i,j,k,l>360 }
L(σ) denotes the number of occurrences of a specific trace σ in the log (e.g., if a trace σ appears 200 times in the log, L(σ) = 200).
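At log level, the per-trace counters are weighted by the trace frequencies L(σ) and plugged into the standard token-replay fitness formula (following van der Aalst's token replay; the subscript notation is added here for readability):

```latex
\mathit{fitness}(L, N) =
  \frac{1}{2}\left(1 - \frac{\sum_{\sigma \in L} L(\sigma)\, m_{N,\sigma}}
                            {\sum_{\sigma \in L} L(\sigma)\, c_{N,\sigma}}\right)
+ \frac{1}{2}\left(1 - \frac{\sum_{\sigma \in L} L(\sigma)\, r_{N,\sigma}}
                            {\sum_{\sigma \in L} L(\sigma)\, p_{N,\sigma}}\right)
```

where p, c, m, and r are the produced, consumed, missing, and remaining token counts of replaying trace σ on net N. Weighting the four variants' counters by their frequencies in this log yields the Fitness = 0.998 shown on slide 131.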
130. Computing fitness at log level
L = { <a,b,i,j,k,l>10, <a,b,g,j,k,i,l>140, <a,f,g,j,i,k>5, <a,f,g,i,j,k,l>360 }
Per-variant replay counters (p, c, m, r): (13, 13, 0, 0), (12, 12, 1, 1), (13, 13, 0, 0), (12, 11, 0, 1)
131. Computing fitness at log level (cont.)
L = { <a,b,i,j,k,l>10, <a,b,g,j,k,i,l>140, <a,f,g,j,i,k>5, <a,f,g,i,j,k,l>360 }
Per-variant replay counters (p, c, m, r): (13, 13, 0, 0), (12, 12, 1, 1), (13, 13, 0, 0), (12, 11, 0, 1)
Fitness = 0.998
132. Calculating precision
• Precision = 1 ⇒ the behaviour allowed by the model is contained in or equal to the behaviour in the log
• Precision close to 0 ⇒ almost none of the behaviour allowed by the model is observed in the log
• Precision can be calculated as a “difference” between a state space representing the behaviour of the model and a state space representing the behaviour of the log
• Adriano Augusto et al. “Abstract-and-Compare: A Family of Scalable Precision Measures for Automated Process Discovery”. In Proceedings of BPM 2018
142. α-algorithm: the Origin of Process Discovery
van der Aalst, W.M.P., Weijters, A.J.M.M., and Maruster, L. (2003). Workflow Mining: Discovering process models from event logs. IEEE Transactions on Knowledge and Data Engineering.
143. α-algorithm
Basic idea: ordering relations
• Direct succession: x > y iff for some case x is directly followed by y
• Causality: x → y iff x > y and not y > x
• Parallel: x || y iff x > y and y > x
• Unrelated: x # y iff not x > y and not y > x
Example event stream:
case 1: task A; case 2: task A; case 3: task A; case 3: task B; case 1: task B; case 1: task C; case 2: task C; case 4: task A; case 2: task B; ...
Derived relations:
A>B, A>C, B>C, B>D, C>B, C>D, E>F
A→B, A→C, B→D, C→D, E→F
B||C, C||B
Traces: ABCD, ACBD, EF
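The four ordering relations are a small computation over the log. A minimal sketch (the function name is made up; traces are written as strings, matching the ABCD/ACBD/EF example above):

```python
from itertools import product

def alpha_relations(traces):
    """Derive the alpha-algorithm ordering relations from a log:
    direct succession (>), causality (->), parallel (||), unrelated (#)."""
    succ = {(a, b) for t in traces for a, b in zip(t, t[1:])}
    acts = {a for t in traces for a in t}
    causal = {(a, b) for a, b in succ if (b, a) not in succ}
    parallel = {(a, b) for a, b in succ if (b, a) in succ}
    unrelated = {(a, b) for a, b in product(acts, acts)
                 if (a, b) not in succ and (b, a) not in succ}
    return succ, causal, parallel, unrelated

succ, causal, parallel, unrelated = alpha_relations(["abcd", "acbd", "ef"])
print(sorted(causal))  # a->b, a->c, b->d, c->d, e->f, as on the slide
```

Running it on the example reproduces the relations listed above: b and c are parallel (both b > c and c > b hold), while all other successions are causal.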
179. Limitations of the alpha miner
• Completeness: all possible traces of the process (model) need to be in the log
• Short loops: c > b and b > c implies c || b and b || c instead of c → b and b → c
• Self-loops: b → b would require b > b and not b > b (impossible!)
181. Little Thumb to Deal with Noise
van der Aalst, W.M.P. and Weijters, A.J.M.M. (2003). Rediscovering workflow models from event-based data using little thumb. Integrated Computer-Aided Engineering.
190. Process Model discovered with Inductive Miner
• Structured by construction
• Based on process trees
191. Process Discovery Algorithms: The Two Worlds
• High fitness, high precision, high complexity: Heuristic Miner, Fodina Miner
• High fitness, low precision, low complexity: Inductive Miner, Evolutionary Tree Miner
192. Split Miner
Augusto, A., Conforti, R., Dumas, M., and La Rosa, M. (2017). Split Miner: Discovering Accurate and Simple Business Process Models from Event Logs. ICDM 2017.
195. From Event Log to Process Model in 5 Steps
Event Log → 1. Directly-Follows Graph and Loops Discovery → 2. Filtering → 3. Concurrency Discovery → 4. Splits Discovery → 5. Joins Discovery → Process Model
196. Running example: input event log

Trace                  #obs
a » b » c » g » e » h  10
a » b » c » f » g » h  10
a » b » d » g » e » h  10
a » b » d » e » g » h  10
a » b » e » c » g » h  10
a » b » e » d » g » h  10
a » c » b » e » g » h  10
a » c » b » f » g » h  10
a » d » b » e » g » h  10
a » d » b » f » g » h  10

[Pipeline: Event Log → Directly-Follows Graph and Loops Discovery → Filtering → Concurrency Discovery → Splits Discovery → Joins Discovery → Process Model]
197. [Same event log; next pipeline step highlighted]
206. Topics not covered in this class
• Event log filtering
• Removing anomalous or infrequent behaviour from an
event log
• Business process drift detection
• Detecting changes in a business process (over time)
using event logs
• Predictive process monitoring
• Predicting the outcome or a future property of a process
based on an event log containing completed cases, and
an incomplete case