Window aggregation is a core operation in data stream processing. Existing aggregation techniques focus on reducing latency, eliminating redundant computations, and minimizing memory usage.
However, each technique operates under different assumptions with respect to workload characteristics such as properties of aggregation functions (e.g., invertible, associative), window types (e.g., sliding, sessions), windowing measures (e.g., time- or count-based), and stream (dis)order. Violating the assumptions of a technique can render it unusable or drastically reduce its performance.
In this talk, we present Scotty, an implementation of a general stream slicing technique for window aggregation. This technique automatically adapts to workload characteristics to improve performance without sacrificing its general applicability. Our experiments show that Scotty outperforms alternative implementations, such as the default window operator in Flink, by up to one order of magnitude.
Furthermore, we show how to use Scotty as a library in Flink, Storm, or Beam without changing the underlying stream processing system.
General stream slicing was first published at EDBT 2019 (http://www.user.tu-berlin.de/powibol/assets/publications/traub-efficient-window-aggregation-with-general-stream-slicing-edbt-2019.pdf) where it received the Best Paper Award.
The Scotty library and its connectors are available as open source (https://github.com/TU-Berlin-DIMA/scotty-window-processor), and contributions are highly welcome.
FlinkForward Berlin 2019 - Scotty: Efficient Window Aggregation with General Stream Slicing
1. Scotty: Efficient Window Aggregation with General Stream Slicing
Berlin, October 7-9, 2019
Philipp M. Grulich
Research Associate (TU Berlin)
Jonas Traub
Research Associate (TU Berlin)
2. Jonas Traub (TU Berlin), Philipp M. Grulich (TU Berlin) - Efficient Window Aggregation with Stream Slicing
Aggregations in Stream Processing Pipelines
A stream processing pipeline is a series of concurrently running operators.
[Figure: pipeline of operators, one of them labeled "Window Aggregation"]
Motivation
Research Background
Cutty: Aggregate Sharing for User-Defined Windows
P. Carbone, J. Traub, A. Katsifodimos, S. Haridi, V. Markl
ACM International Conference on Information and Knowledge Management (CIKM 2016)
Scotty: Efficient Window Aggregation for out-of-order Stream Processing
J. Traub, P. M. Grulich, A. R. Cuéllar, S. Breß, A. Katsifodimos, T. Rabl, V. Markl
IEEE International Conference on Data Engineering (ICDE 2018)
Efficient Window Aggregation with General Stream Slicing
J. Traub, P. M. Grulich, A. R. Cuéllar, S. Breß, A. Katsifodimos, T. Rabl, V. Markl
International Conference on Extending Database Technology (EDBT 2019; Best Paper Award)
Scotty Window Processor:
Efficient Window Aggregations for Flink, Beam, and Storm
https://github.com/TU-Berlin-DIMA/scotty-window-processor
Stream Slicing Example
The number of slices depends on the workload.
We store partial aggregates instead of all tuples. => Small memory footprint.
We assign each tuple to exactly one slice. => O(1) per-tuple complexity.
We require just a few computation steps to calculate final aggregates. => Low latency.
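The three properties named in the stream slicing example (partial aggregates instead of tuples, one slice per tuple, few steps per final aggregate) can be illustrated with a minimal Python sketch. It assumes a sum aggregation, in-order tuples, and time-based windows whose boundaries are multiples of a fixed slice length. The names `SliceStore`, `add`, and `window_sum` are hypothetical and are not part of the Scotty API.

```python
class SliceStore:
    """Keeps one partial sum per slice instead of the raw tuples."""

    def __init__(self, slice_length):
        self.slice_length = slice_length
        self.partials = {}  # slice start time -> partial sum

    def add(self, timestamp, value):
        # Each tuple updates exactly one slice: O(1) work per tuple.
        start = (timestamp // self.slice_length) * self.slice_length
        self.partials[start] = self.partials.get(start, 0) + value

    def window_sum(self, window_start, window_end):
        # A final aggregate combines a few partial sums, not all tuples.
        return sum(partial for start, partial in self.partials.items()
                   if window_start <= start < window_end)


store = SliceStore(slice_length=5)
for ts, value in [(1, 4), (3, 2), (6, 1), (12, 7), (14, 3), (17, 5)]:
    store.add(ts, value)

# Two overlapping windows share the slices starting at 5 and 10.
first = store.window_sum(0, 15)    # slices starting at 0, 5, 10
second = store.window_sum(5, 20)   # slices starting at 5, 10, 15
```

Six tuples are reduced to four partial sums here, and both windows reuse the shared slices rather than re-reading tuples.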
General Stream Slicing
Workload Characteristics:
● Window Types
○ Context Free
○ Forward Context Free
○ Forward Context Aware
● Stream Order
○ in-order
○ out-of-order
● Window Measures
○ time
○ tuple count
○ arbitrary
● Aggregation Functions
○ distributive
○ algebraic
○ holistic
○ associativity
○ commutativity
○ invertibility
General Stream Slicing combines generality and efficiency in a single solution.
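To make the aggregation function classes concrete, the following Python sketch expresses a distributive function (sum), an algebraic function (average), and a holistic function (median) as lift, combine, and lower steps over per-slice partial aggregates. The decomposition and all function names are assumptions chosen for illustration; they do not reproduce Scotty's interfaces.

```python
from functools import reduce
from statistics import median

# Distributive (sum): partial aggregates have the same form as the result.
def sum_lift(value):
    return value

def sum_combine(a, b):
    return a + b

def sum_invert(total, removed):
    # Invertibility: remove a partial without recomputing the rest.
    return total - removed

# Algebraic (average): a fixed-size partial (sum, count) is lowered at the end.
def avg_lift(value):
    return (value, 1)

def avg_combine(a, b):
    return (a[0] + b[0], a[1] + b[1])

def avg_lower(partial):
    return partial[0] / partial[1]

# Holistic (median): the partial must keep every value it has seen.
def median_lift(value):
    return [value]

def median_combine(a, b):
    return a + b

def median_lower(partial):
    return median(partial)

def aggregate(lift, combine, values):
    """Fold the tuples of one slice into a partial aggregate."""
    return reduce(combine, map(lift, values))

slice_a = [1, 2, 1, 4, 3]
slice_b = [1, 5, 2, 2, 3]

total = sum_combine(aggregate(sum_lift, sum_combine, slice_a),
                    aggregate(sum_lift, sum_combine, slice_b))
average = avg_lower(avg_combine(aggregate(avg_lift, avg_combine, slice_a),
                                aggregate(avg_lift, avg_combine, slice_b)))
middle = median_lower(median_combine(aggregate(median_lift, median_combine, slice_a),
                                     aggregate(median_lift, median_combine, slice_b)))
```

Sum and average need only constant-size partials per slice, whereas the median partial grows with the number of tuples, which is why the function class affects how much a slicing technique can save.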
Impact of Workload Characteristics
Count-based tumbling window with a length of 5 tuples.
Tuple values: 1 2 1 4 3 1 5 2 2 3 6 1 2 2 1
Tuple count: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Event time: 5 12 13 20 35 37 42 46 48 51 52 57 63 64 65
Window aggregates (sums): 11 13 12
What if the stream is out-of-order?
Out-of-order tuple: value 5, event time 49
Updated window aggregates: 13 + 5 - 3 and 12 + 3 - 1
What if the aggregation function is not invertible?
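The out-of-order case can be replayed in a short Python sketch using the tuple values and event times from the slide. It assumes that the aggregation is a sum and that the late tuple has value 5 and event time 49; the helper names `window_sums` and `insert_out_of_order` are hypothetical. Because the window is count-based, the late tuple pushes the last tuple of each affected window into the next one.

```python
import bisect

values = [1, 2, 1, 4, 3, 1, 5, 2, 2, 3, 6, 1, 2, 2, 1]
times = [5, 12, 13, 20, 35, 37, 42, 46, 48, 51, 52, 57, 63, 64, 65]
WINDOW_LENGTH = 5

def window_sums(vals):
    """Recompute every complete count-based tumbling window from scratch."""
    return [sum(vals[i:i + WINDOW_LENGTH])
            for i in range(0, len(vals) - WINDOW_LENGTH + 1, WINDOW_LENGTH)]

def insert_out_of_order(sums, vals, position, value):
    """Update the sums incrementally; this relies on sum being invertible."""
    updated = list(sums)
    carried = value  # value entering the current window
    for w in range(position // WINDOW_LENGTH, len(updated)):
        # The last tuple of window w (before the insert) moves to window w + 1.
        shifted_out = vals[(w + 1) * WINDOW_LENGTH - 1]
        updated[w] = updated[w] + carried - shifted_out
        carried = shifted_out
    return updated

before = window_sums(values)
position = bisect.bisect(times, 49)  # index at which the late tuple belongs
after = insert_out_of_order(before, values, position, 5)

# Cross-check against a full recomputation on the reordered stream.
reordered = values[:position] + [5] + values[position:]
```

For a non-invertible function such as max, the subtraction step is not available, so the affected windows would have to be recomputed from stored tuples or finer-grained partial aggregates.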
Scotty Window Processor:
Efficient Window Aggregations for Flink, Beam, and Storm
https://github.com/TU-Berlin-DIMA/scotty-window-processor
Key-Facts
Features:
● One window operator for many systems.
● High performance window aggregations with stream slicing.
● Scales to thousands of concurrent windows.
● Aggregate sharing among multiple window queries.
● Adapts to workload characteristics:
○ Window Types
○ Aggregation Functions
○ Window Measures
○ Stream Order
Connectors:
…more coming soon…
Scotty Core
Scotty adapts to workload characteristics and combines generality and efficiency in a single solution.
Benchmark
Concurrent Windows with Built-in Window Operator:
● Flink performs well with a single window (no overlap; one bucket at a time).
● With overlapping concurrent windows, the throughput drops drastically.
[Figure: throughput (tuples/sec., axis from 0 to 2,500,000) over the number of concurrent windows (1, 10, 20, 50, 100, 500, 1000) for Flink, Storm, and Flink on Beam]
Concurrent Windows with Scotty:
● With Scotty, the throughput is independent of the number of concurrent windows.
[Figure: throughput (tuples/sec., axis from 0 to 2,500,000) over the number of concurrent windows (1, 10, 20, 50, 100, 500, 1000) for Flink+Scotty, Storm+Scotty, and Beam+Flink+Scotty]
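One way to reason about the two benchmark results is to count how many aggregate updates a single tuple triggers. The Python sketch below is a simplified cost model, not a measurement: it assumes that a bucket-based operator adds each tuple to every window instance containing it, that a slicing operator adds each tuple to one slice, and that the workload consists of tumbling windows with different lengths. All names are hypothetical.

```python
def bucket_updates_per_tuple(windows):
    """windows: list of (length, slide) pairs.

    With buckets, a tuple is added to every window instance that contains it,
    which is length // slide instances per window definition.
    """
    return sum(length // slide for length, slide in windows)

def slice_updates_per_tuple(windows):
    """With slicing, a tuple updates one slice regardless of the window count."""
    return 1

def concurrent_windows(n):
    # Hypothetical workload: n tumbling windows with lengths 1..n.
    return [(length, length) for length in range(1, n + 1)]

for n in (1, 10, 100, 1000):
    workload = concurrent_windows(n)
    print(n, bucket_updates_per_tuple(workload), slice_updates_per_tuple(workload))
```

The model ignores the cost of computing final aggregates from slices, so it only explains the trend in per-tuple work and does not predict the throughput numbers in the figures.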
Using Scotty on Flink
1. Clone Scotty and install it with Maven
2. Add Scotty to your Flink project
Using Scotty on Flink (continued)
1. Initialize the Scotty window operator
2. Add window definitions
3. Add Scotty to your Flink job
Acknowledgements: This talk is supported by the Berlin Big Data Center (01IS14013A), the Berlin Center for Machine Learning (01IS18037A), and Software Campus (1-3000473-18TP).
Scotty Window Processor
Scotty Features:
● One window operator for many systems.
● High performance with stream slicing.
● Scales to thousands of concurrent windows.
● Aggregate sharing among multiple window queries.
● Adapts to workload characteristics
Open Source Repository: tu-berlin-dima.github.io/scotty-window-processor