
code.talks 2019 - Scotty: Efficient Window Aggregation for your Stream Processing System

This presentation was given at Code.Talks 2019 in Hamburg.
A video is available at: https://www.youtube.com/watch?v=K1y5dJvP1jM

Window aggregation is a core operation in data stream processing.
Stream processing systems such as Flink or Storm implement general-purpose aggregation techniques that perform poorly under specific workloads (e.g., sliding windows).
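
To show why sliding windows are expensive for a general technique, here is a minimal Python sketch of a bucket-per-window operator. It is a simplified model assumed for illustration, not the actual code of Flink or Storm, and the function names are hypothetical. Each window keeps its own running sum, so every tuple must be added to every window that contains it. For a sliding window of length `size` and slide `slide`, that is `size / slide` buckets per tuple.

```python
from collections import defaultdict


def assign_sliding_windows(ts, size, slide):
    """Return the [start, end) bounds of every sliding window that contains timestamp ts.

    Windows start at multiples of `slide` (possibly negative) and are `size` long.
    """
    last_start = ts - ts % slide  # latest window start at or before ts
    windows = []
    start = last_start
    while start > ts - size:      # window [start, start + size) still contains ts
        windows.append((start, start + size))
        start -= slide
    return windows


def bucket_sums(stream, size, slide):
    """Bucket-per-window sum: each (ts, value) tuple updates every bucket it falls into."""
    buckets = defaultdict(int)
    updates = 0
    for ts, value in stream:
        for window in assign_sliding_windows(ts, size, slide):
            buckets[window] += value
            updates += 1
    return dict(buckets), updates


# A window of length 10 sliding by 1: each of the 3 tuples updates 10 buckets.
sums, updates = bucket_sums([(0, 1), (3, 2), (7, 4)], size=10, slide=1)
print(updates)  # → 30
```

The per-tuple cost grows with the number of overlapping windows. Stream slicing, as described in the talk, instead assigns each tuple to exactly one slice.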

To address this, we present Scotty, a new, highly efficient window operator.
Scotty exploits specific workload properties such as the type of aggregation function (e.g., invertible, associative), the window type (e.g., sliding, session), the windowing measure (e.g., time- or count-based), and stream (dis)order. This allows Scotty to outperform systems like Flink by up to an order of magnitude.
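
The interaction between invertibility, count-based windows, and stream disorder can be traced on the example stream from the talk's slides. The following Python sketch is a simplified illustration, not Scotty's implementation. Its function names are hypothetical, and it assumes distinct event times. It keeps the sums of count-based tumbling windows of 5 tuples. When a late tuple (value 5, event time 49) arrives, every later tuple moves one position, so each affected window gains one tuple and loses one. Because sum is invertible, each window can be fixed with one addition and one subtraction.

```python
# Example stream from the talk: (event_time, value) pairs.
STREAM = [(5, 1), (12, 2), (13, 1), (20, 4), (35, 3), (37, 1), (42, 5), (46, 2),
          (48, 2), (51, 3), (52, 6), (57, 1), (63, 2), (64, 2), (65, 1)]
WINDOW_LEN = 5  # count-based tumbling window of 5 tuples


def window_sums(stream, n=WINDOW_LEN):
    """Recompute the sum of every complete count-based window from all stored tuples."""
    values = [v for _, v in sorted(stream)]  # order by event time
    return [sum(values[i:i + n]) for i in range(0, len(values) - n + 1, n)]


def insert_late_tuple(stream, sums, late, n=WINDOW_LEN):
    """Update window sums for one out-of-order tuple using only + and - (sum is invertible).

    Each affected window gains the tuple shifted in from the left and loses its
    last tuple, which is shifted into the next window.
    """
    values = [v for _, v in sorted(stream)]
    pos = sum(1 for t, _ in stream if t < late[0])  # count position of the late tuple
    new_sums = list(sums)
    entering = late[1]
    for k in range(pos // n, len(sums)):
        leaving = values[(k + 1) * n - 1]
        new_sums[k] += entering - leaving
        entering = leaving
    return new_sums


before = window_sums(STREAM)                        # [11, 13, 12]
after = insert_late_tuple(STREAM, before, (49, 5))  # [11, 13 + 5 - 3, 12 + 3 - 1]
print(before, after)  # → [11, 13, 12] [11, 15, 14]
```

A function such as max has no inverse, so the departing tuple cannot be subtracted. In that case the affected windows must be recomputed from stored data, as `window_sums` does. This is why the choice of strategy depends on the aggregation function's properties.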

The structure of this talk is threefold:
First, we give an introduction to the semantics and implementations of window aggregations in modern Stream Processing Systems.
Second, we discuss the design of Scotty and show why Scotty is able to outperform the default window operators of many stream processing systems.
Third, we give a hands-on introduction to Scotty and demonstrate how it can be integrated into standard Flink, Storm, or Beam stream processing pipelines.

Scotty and its connectors are available as open source (https://github.com/TU-Berlin-DIMA/scotty-window-processor), and contributions are very welcome.

  1. Scotty: Efficient Window Aggregation for your Stream Processing System. Philipp M. Grulich (TU Berlin) & Jonas Traub (TU Berlin). Big Data Track
  2. About Us: Philipp M. Grulich, Research Associate (TU Berlin), grulich@tu-berlin.de; Jonas Traub, Research Associate (TU Berlin), jonas.traub@tu-berlin.de. Database Systems and Information Management Research Group at TU Berlin, www.dima.tu-berlin.de
  4. Stream Processing Systems (source: Rajaraman, A., & Ullman, J. D. (2012). Mining of Massive Datasets (Vol. 77). Cambridge University Press. Chapter 4, www.mmds.org)
  6. Aggregations in Stream Processing Pipelines: A stream processing pipeline is a series of concurrently running operators, one of which is window aggregation. Aggregation examples: arithmetic operations (sum, min, max, etc.), statistics/analysis (e.g., reservoir sampling), ML model updates, and concept drift detection.
  13. Apache Flink - Stateful Stream Processing: “Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments, perform computations at in-memory speed and at any scale.” (flink.apache.org)
  14. Motivation
  15. Research Background
      ● Cutty: Aggregate Sharing for User-Defined Windows. P. Carbone, J. Traub, A. Katsifodimos, S. Haridi, V. Markl. ACM International on Conference on Information and Knowledge Management (CIKM 2016)
      ● Scotty: Efficient Window Aggregation for out-of-order Stream Processing. J. Traub, P. M. Grulich, A. R. Cuéllar, S. Breß, A. Katsifodimos, T. Rabl, V. Markl. IEEE International Conference on Data Engineering (ICDE 2018)
      ● Efficient Window Aggregation with General Stream Slicing. J. Traub, P. M. Grulich, A. R. Cuéllar, S. Breß, A. Katsifodimos, T. Rabl, V. Markl. International Conference on Extending Database Technology (EDBT 2019; Best Paper Award)
      ● Scotty Window Processor: Efficient Window Aggregations for Flink, Beam, and Storm. https://github.com/TU-Berlin-DIMA/scotty-window-processor
  17. Stream Slicing Example: The number of slices depends on the workload.
  21. Stream Slicing Example: We store partial aggregates instead of all tuples, which keeps the memory footprint small.
  22. Stream Slicing Example: We assign each tuple to exactly one slice, which gives O(1) per-tuple complexity.
  24. Stream Slicing Example: We require only a few computation steps to calculate final aggregates, which keeps latency low.
  25. Workload Characteristics
      ● Window Types: Context Free, Forward Context Free, Forward Context Aware
      ● Stream Order: in-order, out-of-order
      ● Window Measures: time, tuple count, arbitrary
      ● Aggregation Functions: distributive, algebraic, holistic; associativity, commutativity, invertibility
      General Stream Slicing combines generality and efficiency in a single solution.
  27. Impact of Workload Characteristics: an example stream of 15 tuples with the values 1 2 1 4 3 1 5 2 2 3 6 1 2 2 1.
  28. Count-based tumbling window with a length of 5 tuples.
  30. The three windows (tuples 1–5, 6–10, and 11–15) have the sums 11, 13, and 12.
  31. What if the stream is out-of-order?
  32. The tuples carry the event times 5 12 13 20 35 37 42 46 48 51 52 57 63 64 65.
  33. An out-of-order tuple with value 5 and event time 49 arrives. It belongs between the tuples with event times 48 and 51, so every later tuple shifts by one position.
  39. The second window gains the late tuple (+5) and loses its last tuple (−3): 13 + 5 − 3 = 15.
  40. The third window gains the shifted tuple (+3) and loses its last tuple (−1): 12 + 3 − 1 = 14.
  41. What if the aggregation function is not invertible?
  42. Scotty Window Processor: Efficient Window Aggregations for Flink, Beam, and Storm. https://github.com/TU-Berlin-DIMA/scotty-window-processor
  43. Key Facts
      Features:
      ● One window operator for many systems.
      ● High-performance window aggregation with stream slicing.
      ● Scales to thousands of concurrent windows.
      ● Aggregate sharing among multiple window queries.
      ● Adapts to workload characteristics:
        ○ Window Types
        ○ Aggregation Functions
        ○ Window Measures
        ○ Stream Order
      Connectors: …more coming soon…
  45. Scotty Core: Scotty adapts to workload characteristics and combines generality and efficiency in a single solution.
  50. Benchmark: Concurrent Windows with Built-in Window Operators
      ● Flink performs well with a single window (no overlap; one bucket at a time).
      ● With overlapping concurrent windows, the throughput drops drastically.
      [Chart: throughput (tuples/sec., 0 to 2,500,000) vs. number of concurrent windows (1 to 1000) for Flink, Storm, and Flink on Beam.]
  52. Benchmark: Concurrent Windows with Scotty
      ● With Scotty, the throughput is independent of the number of concurrent windows.
      [Chart: throughput (tuples/sec.) vs. number of concurrent windows (1 to 1000) for Flink+Scotty, Storm+Scotty, and Beam+Flink+Scotty.]
  53. Using Scotty on Flink
      1. Clone Scotty and install it into your local Maven repository.
      2. Add Scotty to your Flink project.
  55. Using Scotty on Flink
      1. Initialize the Scotty window operator.
      2. Add window definitions.
      3. Add Scotty to your Flink job.
  58. Implement your own Aggregation Functions
      Example: Average -> Sum/Count, with a partial state of the form sum|count.
      1. lift: the incoming tuple 5 becomes the partial state 5|1.
      2. combine: 5|1 is merged with the existing partial state 3|3 into 8|4.
      3. lower: 8|4 is turned into the average 8 / 4 = 2 on the output stream.
  64. Upcoming Research Projects
      ● The NebulaStream Platform: Data and Application Management for the Internet of Things, https://arxiv.org/pdf/1910.07867.pdf
      ● Agora: An Open Ecosystem for Democratizing Data Science & Artificial Intelligence, https://arxiv.org/pdf/1909.03026.pdf
      We are hiring! Research Associates / PhD Students & Post Docs (m/f/d). Catch us after the talks or send an email to jobs@dima.tu-berlin.de.
  68. Scotty Window Processor
      Scotty Features:
      ● One window operator for many systems.
      ● High performance with stream slicing.
      ● Scales to thousands of concurrent windows.
      ● Aggregate sharing among multiple window queries.
      ● Adapts to workload characteristics.
      Open Source Repository: tu-berlin-dima.github.io/scotty-window-processor
      Acknowledgements: This talk is supported by the Berlin Big Data Center (01IS14013A), the Berlin Center for Machine Learning (01IS18037A), and Software Campus (1-3000473-18TP).
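
The lift/combine/lower steps from the aggregation-function slides and the stream slicing slides fit together as in the following minimal Python sketch. It is an illustration under stated assumptions, not Scotty's Java API. The names `AvgState`, `slice_stream`, and `window_average` are hypothetical. Slices are assumed to be fixed-length and event-time-based, and every window's start and length are assumed to be multiples of the slice length. Each tuple is lifted and combined into exactly one slice. Overlapping windows then reuse (share) the same slice partials before `lower` produces the average.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class AvgState:
    """Partial aggregate for the average: a (sum | count) pair."""
    total: float
    count: int


def lift(value):
    return AvgState(value, 1)  # 5 -> 5|1


def combine(a, b):
    return AvgState(a.total + b.total, a.count + b.count)  # 3|3 + 5|1 -> 8|4


def lower(state):
    return state.total / state.count  # 8|4 -> 2.0


def slice_stream(stream, slice_len):
    """Assign each (ts, value) tuple to exactly one slice and keep only partial aggregates."""
    slices = {}
    for ts, value in stream:
        idx = ts // slice_len
        lifted = lift(value)
        slices[idx] = combine(slices[idx], lifted) if idx in slices else lifted
    return slices


def window_average(slices, start, size, slice_len):
    """Combine the slices covering [start, start + size) and lower the result.

    Assumes start and size are multiples of slice_len and the window is non-empty.
    """
    parts = [slices[i] for i in range(start // slice_len, (start + size) // slice_len)
             if i in slices]
    return lower(reduce(combine, parts))


slices = slice_stream([(0, 2), (1, 4), (2, 6), (3, 8), (5, 10)], slice_len=2)
# Two overlapping windows of length 4 (slide 2) share slice 1.
print(window_average(slices, 0, 4, 2), window_average(slices, 2, 4, 2))  # → 5.0 8.0
```

Each tuple triggers one `lift` and one `combine`, regardless of how many windows are defined. The per-window work happens only when a result is produced, which matches the per-tuple O(1) and aggregate-sharing properties described in the slides.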
