Scotty: Efficient Window Aggregation with General Stream Slicing
Berlin, October 7-9, 2019
Philipp M. Grulich
FlinkForward Berlin 2019 - Scotty: Efficient Window Aggregation with General Stream Slicing

Window aggregation is a core operation in data stream processing. Existing aggregation techniques focus on reducing latency, eliminating redundant computations, and minimizing memory usage.

However, each technique operates under different assumptions with respect to workload characteristics such as properties of aggregation functions (e.g., invertible, associative), window types (e.g., sliding, sessions), windowing measures (e.g., time- or count-based), and stream (dis)order. Violating the assumptions of a technique can render it unusable or drastically reduce its performance.
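
To make the function properties named above more concrete, here is a minimal Python sketch of an average kept as a partial aggregate. It is an illustration only and not Scotty's API: the class `PartialAvg` and its methods `lift`, `combine`, `invert`, and `lower` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartialAvg:
    """Partial aggregate for an average: running sum and tuple count (hypothetical helper)."""
    total: float = 0.0
    count: int = 0

    @staticmethod
    def lift(value: float) -> "PartialAvg":
        # Turn a single tuple value into a partial aggregate.
        return PartialAvg(value, 1)

    def combine(self, other: "PartialAvg") -> "PartialAvg":
        # Associative and commutative (up to floating-point rounding):
        # partials can be merged in any grouping and order.
        return PartialAvg(self.total + other.total, self.count + other.count)

    def invert(self, other: "PartialAvg") -> "PartialAvg":
        # Invertible: remove a previously combined partial without recomputing.
        return PartialAvg(self.total - other.total, self.count - other.count)

    def lower(self) -> float:
        # Produce the final average from the partial aggregate.
        return self.total / self.count


a = PartialAvg.lift(2.0).combine(PartialAvg.lift(4.0))
b = PartialAvg.lift(9.0)
merged = a.combine(b)

print(merged.lower())          # average of 2, 4, 9 -> 5.0
print(merged.invert(b) == a)   # removing b restores a -> True
```

A function such as max can be combined in the same way but has no `invert`: once the maximum is removed, the remaining maximum cannot be derived from the aggregate alone, which is the kind of assumption a technique may silently rely on.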

In this talk, we present Scotty, an implementation of a general stream slicing technique for window aggregation. This technique automatically adapts to workload characteristics to improve performance without sacrificing its general applicability. Our experiments show that Scotty outperforms alternative implementations, like the default window operator in Flink, by up to one order of magnitude.
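
The basic idea of stream slicing can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Scotty's implementation: it covers only time-based sliding windows with a sum aggregate, requires the window length to be a multiple of the slide, and the function name `sliced_window_sums` is hypothetical. The input reuses the event times and values of the example stream shown in the slides.

```python
from collections import defaultdict


def sliced_window_sums(tuples, length, slide):
    """Sums of time-based sliding windows, built from per-slice partial sums.

    tuples: iterable of (event_time, value) pairs with event_time >= 0.
    Windows are [start, start + length) for start = 0, slide, 2 * slide, ...
    Assumes length is a multiple of slide, so every slice has length `slide`.
    """
    if length % slide != 0:
        raise ValueError("this sketch requires length to be a multiple of slide")

    slices = defaultdict(int)
    for event_time, value in tuples:
        # Each tuple updates exactly one slice, however many windows overlap.
        slices[event_time // slide] += value

    if not slices:
        return {}

    slices_per_window = length // slide
    sums = {}
    for first in range(max(slices) + 1):
        start = first * slide
        # A window's result combines a few partial aggregates, not all tuples.
        sums[(start, start + length)] = sum(
            slices.get(i, 0) for i in range(first, first + slices_per_window)
        )
    return sums


event_times = [5, 12, 13, 20, 35, 37, 42, 46, 48, 51, 52, 57, 63, 64, 65]
values = [1, 2, 1, 4, 3, 1, 5, 2, 2, 3, 6, 1, 2, 2, 1]
stream = list(zip(event_times, values))

window_sums = sliced_window_sums(stream, length=20, slide=10)
print(window_sums[(40, 60)])  # 19
```

Only the per-slice sums are kept, so memory grows with the number of slices rather than the number of tuples, and adding more overlapping windows does not change the per-tuple work.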

Furthermore, we show how to use Scotty as a library in Flink, Storm, or Beam without changing the underlying stream processing system.

General stream slicing was first published at EDBT 2019 (http://www.user.tu-berlin.de/powibol/assets/publications/traub-efficient-window-aggregation-with-general-stream-slicing-edbt-2019.pdf) where it received the Best Paper Award.

The Scotty library and its connectors are available as open source (https://github.com/TU-Berlin-DIMA/scotty-window-processor), and contributions are highly welcome.

FlinkForward Berlin 2019 - Scotty: Efficient Window Aggregation with General Stream Slicing

  1. Scotty: Efficient Window Aggregation with General Stream Slicing. Berlin, October 7-9, 2019. Philipp M. Grulich, Research Associate (TU Berlin); Jonas Traub, Research Associate (TU Berlin).
  2-5. Aggregations in Stream Processing Pipelines: A stream processing pipeline is a series of concurrently running operators (figure label: Window Aggregation). Slide footer, repeated on every slide: Jonas Traub (TU Berlin), Philipp M. Grulich (TU Berlin) - Efficient Window Aggregation with Stream Slicing.
  6-7. Motivation (figure slides; no text in the transcript).
  8-11. Research Background:
       ● Cutty: Aggregate Sharing for User-Defined Windows. P. Carbone, J. Traub, A. Katsifodimos, S. Haridi, V. Markl. ACM International Conference on Information and Knowledge Management (CIKM 2016).
       ● Scotty: Efficient Window Aggregation for out-of-order Stream Processing. J. Traub, P. M. Grulich, A. R. Cuéllar, S. Breß, A. Katsifodimos, T. Rabl, V. Markl. IEEE International Conference on Data Engineering (ICDE 2018).
       ● Efficient Window Aggregation with General Stream Slicing. J. Traub, P. M. Grulich, A. R. Cuéllar, S. Breß, A. Katsifodimos, T. Rabl, V. Markl. International Conference on Extending Database Technology (EDBT 2019; Best Paper Award).
       ● Scotty Window Processor: Efficient Window Aggregations for Flink, Beam, and Storm. https://github.com/TU-Berlin-DIMA/scotty-window-processor
  12-20. Stream Slicing Example (figure slides). Key points:
       ● The number of slices depends on the workload.
       ● We store partial aggregates instead of all tuples. => Small memory footprint.
       ● We assign each tuple to exactly one slice. => O(1) per-tuple complexity.
       ● We require just a few computation steps to calculate final aggregates. => Low latency.
  21-27. General Stream Slicing. Workload characteristics:
       ● Aggregation Functions: distributive, algebraic, holistic; associativity, commutativity, invertibility.
       ● Window Types: context free, forward context free, forward context aware.
       ● Window Measures: time, tuple count, arbitrary.
       ● Stream Order: in-order, out-of-order.
     General Stream Slicing combines generality and efficiency in a single solution.
  28-43. Impact of Workload Characteristics. Example stream:
       Tuple Count:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
       Event Time:   5 12 13 20 35 37 42 46 48 51 52 57 63 64 65
       Value:        1  2  1  4  3  1  5  2  2  3  6  1  2  2  1
     Count-based tumbling window with a length of 5 tuples. Window aggregates (sums): 11, 13, 12.
     What if the stream is out-of-order? Out-of-order tuple: value 5 at event time 49. The late tuple shifts every later tuple by one count, so the second window becomes 13 + 5 - 3 and the third window becomes 12 + 3 - 1.
     What if the aggregation function is not invertible?
  44. Scotty Window Processor: Efficient Window Aggregations for Flink, Beam, and Storm. https://github.com/TU-Berlin-DIMA/scotty-window-processor
  45-51. Key-Facts. Features:
       ● One window operator for many systems.
       ● High-performance window aggregations with stream slicing.
       ● Scales to thousands of concurrent windows.
       ● Aggregate sharing among multiple window queries.
       ● Adapts to workload characteristics:
         ○ Window Types
         ○ Aggregation Functions
         ○ Window Measures
         ○ Stream Order
     Connectors: …more coming soon…
  52-56. Scotty Core (figure slides): Scotty adapts to workload characteristics and combines generality and efficiency in a single solution.
  57-58. Benchmark: Concurrent Windows with Built-in Window Operator. [Figure: throughput (tuples/sec., axis from 0 to 2,500,000) over the number of concurrent windows (1, 10, 20, 50, 100, 500, 1000) for Flink, Storm, and Flink on Beam.]
       ● Flink performs well with a single window (no overlap; one bucket at a time).
       ● With overlapping concurrent windows, the throughput drops drastically.
  59. Benchmark: Concurrent Windows with Scotty. [Figure: same axes for Flink+Scotty, Storm+Scotty, and Beam+Flink+Scotty.]
       ● With Scotty, the throughput is independent of the number of concurrent windows.
  60-61. Using Scotty on Flink (setup):
       1. Clone Scotty and install it to Maven.
       2. Add Scotty to your Flink project.
  62-64. Using Scotty on Flink (usage):
       1. Initialize the Scotty window operator.
       2. Add window definitions.
       3. Add Scotty to your Flink job.
  65. Scotty Window Processor. Scotty features:
       ● One window operator for many systems.
       ● High performance with stream slicing.
       ● Scales to thousands of concurrent windows.
       ● Aggregate sharing among multiple window queries.
       ● Adapts to workload characteristics.
     Open source repository: tu-berlin-dima.github.io/scotty-window-processor
     Acknowledgements: This talk is supported by the Berlin Big Data Center (01IS14013A), the Berlin Center for Machine Learning (01IS18037A), and Software Campus (1-3000473-18TP).
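
The count-based example from the "Impact of Workload Characteristics" slides can be replayed in a short Python sketch. It is a hedged illustration, not Scotty code: the helper names `tumbling_count_sums` and `insert_late_tuple` are hypothetical, and reading the out-of-order tuple as "value 5 at event time 49" is an interpretation of the slide text.

```python
import bisect


def tumbling_count_sums(values, size):
    """Sum of each count-based tumbling window of `size` tuples (last one may be partial)."""
    return [sum(values[i:i + size]) for i in range(0, len(values), size)]


def insert_late_tuple(sums, values, pos, new_value, size):
    """Return updated window sums after inserting new_value at index pos.

    `values` is the tuple list before the insert. The update needs the stored
    tuples (to find the one pushed out of each window) and uses subtraction,
    so it only works for invertible functions such as sum.
    """
    sums = list(sums)
    carry = new_value
    window = pos // size
    while True:
        if window == len(sums):
            sums.append(carry)       # the carried tuple opens a new window
            break
        end = (window + 1) * size    # index one past the window's last tuple
        if end > len(values):
            sums[window] += carry    # a partial last window absorbs the carry
            break
        evicted = values[end - 1]    # tuple pushed into the next window
        sums[window] += carry - evicted
        carry = evicted
        window += 1
    return sums


event_times = [5, 12, 13, 20, 35, 37, 42, 46, 48, 51, 52, 57, 63, 64, 65]
values = [1, 2, 1, 4, 3, 1, 5, 2, 2, 3, 6, 1, 2, 2, 1]

sums = tumbling_count_sums(values, 5)
pos = bisect.bisect_right(event_times, 49)   # position of the late tuple by event time
updated = insert_late_tuple(sums, values, pos, 5, 5)

print(sums)     # [11, 13, 12]
print(updated)  # [11, 15, 14, 1]
```

The update touches every window after the insert position and needs both the stored tuples and an inverse operation. For a non-invertible function such as max, one fallback is to recompute the affected windows from the stored tuples, which is the cost the final question on the slide points at.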
