Window aggregation is a core operation in data stream processing. Existing aggregation techniques focus on reducing latency, eliminating redundant computations, and minimizing memory usage.
However, each technique operates under different assumptions with respect to workload characteristics such as properties of aggregation functions (e.g., invertible, associative), window types (e.g., sliding, sessions), windowing measures (e.g., time- or count-based), and stream (dis)order. Violating the assumptions of a technique can render it unusable or drastically reduce its performance.
In this talk, we present Scotty, an implementation of a general stream slicing technique for window aggregation. This technique automatically adapts to workload characteristics to improve performance without sacrificing its general applicability. Our experiments show that Scotty outperforms alternative implementations, such as the default window operator in Flink, by up to one order of magnitude.
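To make the idea of stream slicing concrete, here is a minimal sketch in Python. It is not Scotty's implementation or API. It assumes in-order events with integer timestamps, time-based sliding windows whose size is a multiple of the slide, and an associative aggregation function; the names `slice_aggregate` and `sliding_windows` are hypothetical. Each event is folded into exactly one non-overlapping slice, and overlapping windows are then assembled from the shared per-slice partial aggregates instead of re-aggregating raw events.

```python
from functools import reduce
from operator import add


def slice_aggregate(events, slide, combine):
    """Fold each (timestamp, value) event into the partial aggregate of its slice.

    Slices are non-overlapping intervals of length `slide`; slice i covers
    timestamps [i * slide, (i + 1) * slide).
    """
    slices = {}
    for ts, value in events:
        idx = ts // slide
        slices[idx] = value if idx not in slices else combine(slices[idx], value)
    return slices


def sliding_windows(events, size, slide, combine):
    """Return {(start, end): aggregate} for every sliding window that holds data.

    Assumption of this sketch: `size` is a multiple of `slide`, so each window
    is the union of exactly size // slide consecutive slices.
    """
    if size % slide != 0:
        raise ValueError("this sketch requires size to be a multiple of slide")
    slices = slice_aggregate(events, slide, combine)
    if not slices:
        return {}
    per_window = size // slide
    results = {}
    # Visit every window that overlaps at least one non-empty slice.
    for start_idx in range(min(slices) - per_window + 1, max(slices) + 1):
        parts = [slices[i]
                 for i in range(start_idx, start_idx + per_window)
                 if i in slices]
        if parts:
            start = start_idx * slide
            # Combine shared partial aggregates; raw events are not revisited.
            results[(start, start + size)] = reduce(combine, parts)
    return results


events = [(0, 1), (1, 2), (5, 3), (11, 4)]
window_sums = sliding_windows(events, size=10, slide=5, combine=add)
window_maxes = sliding_windows(events, size=10, slide=5, combine=max)
```

Because the sketch only requires associativity, the same code works for a non-invertible function such as `max`. Handling out-of-order events, session windows, or count-based measures would need additional logic that this sketch deliberately omits.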
Furthermore, we show how to use Scotty as a library in Flink, Storm, or Beam without changing the underlying stream processing system.
General stream slicing was first published at EDBT 2019 (http://www.user.tu-berlin.de/powibol/assets/publications/traub-efficient-window-aggregation-with-general-stream-slicing-edbt-2019.pdf), where it received the Best Paper Award.
The Scotty library and its connectors are available as open source (https://github.com/TU-Berlin-DIMA/scotty-window-processor), and contributions are highly welcome.