Philipp M. Grulich (TU Berlin) & Jonas Traub (TU Berlin)
Scotty: Efficient Window Aggregation
for your Stream Processing System
Big Data Track
Jonas Traub (TU Berlin), Philipp M. Grulich (TU Berlin) - Scotty: Efficient Window Aggregation for your Stream Processing System
About Us
Philipp M. Grulich
Research Associate (TU Berlin)
grulich@tu-berlin.de
Jonas Traub
Research Associate (TU Berlin)
jonas.traub@tu-berlin.de
Database Systems and Information Management Research Group at TU Berlin
www.dima.tu-berlin.de
Stream Processing Systems
Source: Rajaraman, A., & Ullman, J. D. (2012). Mining of massive datasets (Vol. 77). Cambridge University Press. Chapter 4, www.mmds.org
Aggregations in Stream Processing Pipelines
A stream processing pipeline is a series of concurrently running operators.
Window Aggregation
Aggregation Examples:
● Arithmetic Operations: sum, min, max, etc.
● Statistics / Analysis: reservoir sampling
● ML Model Updates
● Concept Drift Detection
Apache Flink - Stateful Stream Processing
“Apache Flink is a framework and distributed processing engine for stateful computations
over unbounded and bounded data streams. Flink has been designed to run in all common cluster
environments, perform computations at in-memory speed and at any scale.” (flink.apache.org)
Motivation
Research Background
Cutty: Aggregate Sharing for User-Defined Windows
P. Carbone, J. Traub, A. Katsifodimos, S. Haridi, V. Markl
ACM International Conference on Information and Knowledge Management (CIKM 2016)
Scotty: Efficient Window Aggregation for Out-of-Order Stream Processing
J. Traub, P. M. Grulich, A. R. Cuéllar, S. Breß, A. Katsifodimos, T. Rabl, V. Markl
IEEE International Conference on Data Engineering (ICDE 2018)
Efficient Window Aggregation with General Stream Slicing
J. Traub, P. M. Grulich, A. R. Cuéllar, S. Breß, A. Katsifodimos, T. Rabl, V. Markl
International Conference on Extending Database Technology (EDBT 2019; Best Paper Award)
Scotty Window Processor:
Efficient Window Aggregations for Flink, Beam, and Storm
https://github.com/TU-Berlin-DIMA/scotty-window-processor
Stream Slicing Example
The number of slices depends on the workload.
We store partial aggregates instead of all tuples. => Small memory footprint.
We assign each tuple to exactly one slice. => O(1) per-tuple complexity.
We require just a few computation steps to calculate final aggregates. => Low latency.
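A minimal Python sketch of these three properties, assuming an in-order stream, a single sliding time window (length 20, slide 10, chosen for illustration) and a sum aggregation. It uses fixed-length slices of gcd(length, slide), the simplest form of slicing, not Scotty's general stream slicing; `Slice`, `build_slices` and `window_results` are hypothetical names. The sample tuples reuse the values and event times from the "Impact of Workload Characteristics" example.

```python
# Illustrative sketch only; not Scotty's implementation.
from dataclasses import dataclass
from math import gcd


@dataclass
class Slice:
    start: int        # inclusive slice start (event time)
    end: int          # exclusive slice end
    partial: int = 0  # partial aggregate of the slice, here a running sum


def build_slices(tuples, window_size, slide):
    """Assign every (event_time, value) tuple to exactly one slice.

    Assumes an in-order stream and a single sliding window, so fixed-length
    slices of gcd(window_size, slide) line up with every window boundary.
    """
    slice_len = gcd(window_size, slide)
    slices = {}
    for ts, value in tuples:
        start = ts - ts % slice_len
        if start not in slices:
            slices[start] = Slice(start, start + slice_len)
        slices[start].partial += value  # constant work per tuple; the tuple is not stored
    return [slices[k] for k in sorted(slices)]


def window_results(slices, window_size, slide, first_start, last_start):
    """Compute each window's sum by combining the partial aggregates of its slices."""
    results = []
    for w_start in range(first_start, last_start + 1, slide):
        w_end = w_start + window_size
        total = sum(s.partial for s in slices if w_start <= s.start and s.end <= w_end)
        results.append((w_start, w_end, total))
    return results


values = [1, 2, 1, 4, 3, 1, 5, 2, 2, 3, 6, 1, 2, 2, 1]
times = [5, 12, 13, 20, 35, 37, 42, 46, 48, 51, 52, 57, 63, 64, 65]
slices = build_slices(zip(times, values), window_size=20, slide=10)
print(window_results(slices, 20, 10, first_start=0, last_start=50))
# [(0, 20, 4), (10, 30, 7), (20, 40, 8), (30, 50, 13), (40, 60, 19), (50, 70, 15)]
```

Each tuple updates a single slice, and each window result here combines just two slice aggregates, however many tuples the window contains.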
Workload Characteristics
● Window Types: Context Free, Forward Context Free, Forward Context Aware
● Stream Order: in-order, out-of-order
● Window Measures: time, tuple count, arbitrary
● Aggregation Functions: distributive, algebraic, holistic; properties: associativity, commutativity, invertibility
General Stream Slicing combines generality and efficiency in a single solution.
Impact of Workload Characteristics
Input stream (values in arrival order): 1 2 1 4 3 1 5 2 2 3 6 1 2 2 1
Tuple count: 1 to 15
Event time: 5 12 13 20 35 37 42 46 48 51 52 57 63 64 65
Count-based tumbling window with a length of 5 tuples: the window sums are 11, 13, and 12.
What if the stream is out-of-order?
An out-of-order tuple with value 5 and event time 49 arrives late. It belongs between the tuples with event times 48 and 51, so every later tuple moves back one position: the second window gains 5 and loses 3 (13 + 5 - 3 = 15), and the third window gains 3 and loses 1 (12 + 3 - 1 = 14).
What if the aggregation function is not invertible?
For a function such as max, a value that leaves a window cannot be subtracted out, so the affected windows must be recomputed.
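The following Python sketch replays this example, assuming a sum aggregation over count-based tumbling windows of 5 tuples and that the tuples are kept in memory sorted by event time. `windows`, `insert_late` and `repair_sums` are hypothetical helpers, not Scotty functions. The repair adds the value that shifts into each affected window and subtracts the one that shifts out, which works only because sum is invertible; for max, the sketch falls back to recomputation.

```python
# Illustrative sketch only; helper names are hypothetical, not Scotty's API.
import bisect

SIZE = 5  # count-based tumbling window length in tuples

times = [5, 12, 13, 20, 35, 37, 42, 46, 48, 51, 52, 57, 63, 64, 65]
values = [1, 2, 1, 4, 3, 1, 5, 2, 2, 3, 6, 1, 2, 2, 1]


def windows(vals, size):
    """Complete count-based tumbling windows over values sorted by event time."""
    return [vals[i:i + size] for i in range(0, len(vals) - size + 1, size)]


def insert_late(ts, vals, late_time, late_value):
    """Insert a late tuple at its event-time position; return new lists and its index."""
    pos = bisect.bisect_right(ts, late_time)
    return (ts[:pos] + [late_time] + ts[pos:],
            vals[:pos] + [late_value] + vals[pos:],
            pos)


def repair_sums(old_sums, new_vals, pos, size):
    """Update per-window sums after one tuple was inserted at index pos.

    Each affected window gains the value that shifts in at its front (the late
    tuple itself for the window that contains pos) and loses the value pushed
    out at its end. Subtracting is only valid because sum has an inverse.
    """
    sums = list(old_sums)
    for w in range(pos // size, len(old_sums)):
        start, end = w * size, (w + 1) * size
        sums[w] += new_vals[max(start, pos)] - new_vals[end]
    return sums


old_sums = [sum(w) for w in windows(values, SIZE)]            # [11, 13, 12]
new_times, new_vals, pos = insert_late(times, values, 49, 5)  # value 5, event time 49
print(repair_sums(old_sums, new_vals, pos, SIZE))             # [11, 15, 14]
# max has no inverse, so its windows are recomputed from the stored tuples.
print([max(w) for w in windows(new_vals, SIZE)])              # [4, 5, 6]
```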
Scotty Window Processor:
Efficient Window Aggregations
for Flink, Beam, and Storm
https://github.com/TU-Berlin-DIMA/scotty-window-processor
Key-Facts
Features:
● One window operator for many systems.
● High performance window aggregations with stream slicing.
● Scales to thousands of concurrent windows.
● Aggregate sharing among multiple window queries.
● Adapts to workload characteristics:
○ Window Types
○ Aggregation Functions
○ Window Measures
○ Stream Order
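To illustrate aggregate sharing, here is a minimal Python sketch, assuming two hypothetical queries over the same stream (a tumbling window of length 15 and a sliding window of length 20 with slide 10), a sum aggregation, and event times in [0, 70). The slice edges are the union of all window boundaries, so both queries are answered from one set of partial aggregates. The helper names are illustrative, and the binary-search slice lookup is a simplification, not Scotty's implementation.

```python
# Illustrative sketch only; queries, horizon and helper names are hypothetical.
import bisect


def window_bounds(size, slide, horizon):
    """All [start, end) windows of one query; tumbling when slide == size."""
    return [(s, s + size) for s in range(0, horizon - size + 1, slide)]


def slice_edges(queries, horizon):
    """Shared slice edges: the union of every window boundary of every query."""
    edges = {0, horizon}
    for size, slide in queries:
        for start, end in window_bounds(size, slide, horizon):
            edges.update((start, end))
    return sorted(edges)


def slice_partials(tuples, edges):
    """One partial sum per slice; each tuple updates exactly one slice."""
    partials = [0] * (len(edges) - 1)
    for ts, value in tuples:  # assumes 0 <= ts < horizon
        partials[bisect.bisect_right(edges, ts) - 1] += value
    return partials


def window_result(edges, partials, start, end):
    """Combine the partial aggregates of the slices that form [start, end)."""
    return sum(partials[edges.index(start):edges.index(end)])


HORIZON = 70
queries = [(15, 15), (20, 10)]  # (length, slide): a tumbling and a sliding window
times = [5, 12, 13, 20, 35, 37, 42, 46, 48, 51, 52, 57, 63, 64, 65]
values = [1, 2, 1, 4, 3, 1, 5, 2, 2, 3, 6, 1, 2, 2, 1]

edges = slice_edges(queries, HORIZON)  # [0, 10, 15, 20, 30, 40, 45, 50, 60, 70]
partials = slice_partials(zip(times, values), edges)
for size, slide in queries:
    print((size, slide), [window_result(edges, partials, s, e)
                          for s, e in window_bounds(size, slide, HORIZON)])
# (15, 15) [4, 4, 9, 14]
# (20, 10) [4, 7, 8, 13, 19, 15]
```

Adding a query only adds slice edges; the per-tuple work stays one slice update.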
Connectors:
…more coming soon…
Scotty Core
Scotty adapts to workload characteristics
and combines generality and efficiency in a single solution.
Benchmark
Concurrent Windows with Built-in Window Operator:
● Flink performs well with a single window (no overlap; one bucket at a time).
● With overlapping concurrent windows, the throughput drops drastically.
[Figure: throughput (tuples/sec., 0 to 2,500,000) over the number of concurrent windows (1, 10, 20, 50, 100, 500, 1000) for Flink, Storm, and Flink on Beam]
Concurrent Windows with Scotty:
● With Scotty, the throughput is independent of the number of concurrent windows.
[Figure: throughput (tuples/sec., 0 to 2,500,000) over the number of concurrent windows (1, 10, 20, 50, 100, 500, 1000) for Flink+Scotty, Storm+Scotty, and Beam+Flink+Scotty]
Using Scotty on Flink
1. Clone Scotty and install it with Maven
2. Add Scotty to your Flink Project:
Using Scotty on Flink
1. Initialize Scotty Window Operator
2. Add Window Definitions
3. Add Scotty to your Flink Job
Implement your own Aggregation Functions
Example:
• Average -> Sum/Count
Input stream: 6 7 1 2 5 (the next tuple to process is 5)
Partial state (sum|count): 3|3
1. lift: 5 -> 5|1
2. combine: 3|3 and 5|1 -> 8|4
3. lower: 8|4 -> 8 / 4 = 2
Output stream: 2
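The three steps can be written as plain functions. This Python sketch mirrors the lift, combine and lower steps for the average; it is not Scotty's Java interface, and `SumCount` and `invert` are hypothetical names. `invert` is included because sum and count can both be subtracted, which makes the average an invertible aggregation.

```python
# Illustrative sketch only; not Scotty's Java aggregate-function interface.
from dataclasses import dataclass


@dataclass(frozen=True)
class SumCount:
    """Partial aggregate (sum|count) for the average."""
    total: float
    count: int


def lift(value):
    """1. lift: turn one input tuple into a partial aggregate."""
    return SumCount(value, 1)


def combine(a, b):
    """2. combine: merge two partial aggregates (associative and commutative)."""
    return SumCount(a.total + b.total, a.count + b.count)


def lower(p):
    """3. lower: turn a partial aggregate into the final result."""
    return p.total / p.count


def invert(a, b):
    """Remove partial aggregate b from a; possible because sum and count are subtractable."""
    return SumCount(a.total - b.total, a.count - b.count)


state = SumCount(3, 3)           # partial state before the next tuple
state = combine(state, lift(5))  # 3|3 combined with 5|1
print(state.total, state.count, lower(state))  # 8 4 2.0
```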
Upcoming Research Projects
The NebulaStream Platform:
Data and Application Management
for the Internet of Things
https://arxiv.org/pdf/1910.07867.pdf
Agora:
An Open Ecosystem for Democratizing Data Science & Artificial Intelligence
https://arxiv.org/pdf/1909.03026.pdf
We are hiring!
Research Associates / PhD Students & Post Docs (m/f/d)
Catch us after the talks or send an email to
jobs@dima.tu-berlin.de
Acknowledgements: This talk is supported by the Berlin Big Data Center (01IS14013A), the Berlin Center for Machine Learning (01IS18037A), and Software Campus (1-3000473-18TP).
Scotty Window Processor
Scotty Features:
● One window operator for many systems.
● High performance with stream slicing.
● Scales to thousands of concurrent windows.
● Aggregate sharing among multiple window queries.
● Adapts to workload characteristics.
Open Source Repository: tu-berlin-dima.github.io/scotty-window-processor

Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
 
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
Feature-aligned N-BEATS with Sinkhorn divergence (ICLR '24)
 
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
❤Jammu Kashmir Call Girls 8617697112 Personal Whatsapp Number 💦✅.
 
GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)GBSN - Microbiology (Unit 2)
GBSN - Microbiology (Unit 2)
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
 
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Alandi Call Me 7737669865 Budget Friendly No Advance Booking
 
Bacterial Identification and Classifications
Bacterial Identification and ClassificationsBacterial Identification and Classifications
Bacterial Identification and Classifications
 
Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptxPSYCHOSOCIAL NEEDS. in nursing II sem pptx
PSYCHOSOCIAL NEEDS. in nursing II sem pptx
 
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verifiedConnaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
Connaught Place, Delhi Call girls :8448380779 Model Escorts | 100% verified
 
GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)GBSN - Microbiology (Unit 3)
GBSN - Microbiology (Unit 3)
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 

code.talks 2019 - Scotty: Efficient Window Aggregation for your Stream Processing System

  • 1. Scotty: Efficient Window Aggregation for your Stream Processing System. Philipp M. Grulich (TU Berlin) & Jonas Traub (TU Berlin). Big Data Track.
  • 2-3. About Us: Philipp M. Grulich, Research Associate (TU Berlin), grulich@tu-berlin.de; Jonas Traub, Research Associate (TU Berlin), jonas.traub@tu-berlin.de. Database Systems and Information Management Research Group at TU Berlin, www.dima.tu-berlin.de
  • 4-5. Stream Processing Systems. Source: Rajaraman, A., & Ullman, J. D. (2012). Mining of Massive Datasets (Vol. 77). Cambridge University Press. Chapter 4, www.mmds.org
  • 6-12. Aggregations in Stream Processing Pipelines: A stream processing pipeline is a series of concurrently running operators. (Figure: a pipeline that includes a window aggregation operator.) Aggregation examples:
    ● Arithmetic operations: sum, min, max, etc.
    ● Statistics / analysis: reservoir sampling, ML model updates, concept drift detection
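To make the window aggregation operator concrete, here is a minimal Python sketch (not Scotty or Flink code) of an event-time tumbling window that sums values. It assumes the stream arrives in event-time order as `(event_time, value)` pairs; the helper name `tumbling_sum` and the sample data are illustrative only.

```python
from typing import Iterable, List, Tuple


def tumbling_sum(stream: Iterable[Tuple[int, int]], length: int) -> List[Tuple[int, int, int]]:
    """Sum (event_time, value) pairs per tumbling window [start, start + length).

    Assumes an in-order stream; returns (window_start, window_end, sum)
    for every window that received at least one tuple.
    """
    results = []
    current_start = None
    acc = 0
    for ts, value in stream:
        start = ts - ts % length  # the single window this tuple belongs to
        if current_start is None:
            current_start = start
        elif start != current_start:
            # A tuple of a later window closes the current window.
            results.append((current_start, current_start + length, acc))
            current_start, acc = start, 0
        acc += value
    if current_start is not None:
        results.append((current_start, current_start + length, acc))  # flush the last window
    return results


times = [5, 12, 13, 20, 35, 37, 42, 46, 48, 51, 52, 57, 63, 64, 65]
values = [1, 2, 1, 4, 3, 1, 5, 2, 2, 3, 6, 1, 2, 2, 1]
print(tumbling_sum(zip(times, values), 20))  # [(0, 20, 4), (20, 40, 8), (40, 60, 19), (60, 80, 5)]
```

Because tumbling windows do not overlap, each tuple updates exactly one running sum; overlapping (sliding) windows are where such a per-window approach gets expensive.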
  • 13. Apache Flink - Stateful Stream Processing: “Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments, perform computations at in-memory speed and at any scale.” (flink.apache.org)
  • 14. Motivation
  • 15. Research Background:
    ● Cutty: Aggregate Sharing for User-Defined Windows. P. Carbone, J. Traub, A. Katsifodimos, S. Haridi, V. Markl. ACM International Conference on Information and Knowledge Management (CIKM 2016).
    ● Scotty: Efficient Window Aggregation for out-of-order Stream Processing. J. Traub, P. M. Grulich, A. R. Cuéllar, S. Breß, A. Katsifodimos, T. Rabl, V. Markl. IEEE International Conference on Data Engineering (ICDE 2018).
    ● Efficient Window Aggregation with General Stream Slicing. J. Traub, P. M. Grulich, A. R. Cuéllar, S. Breß, A. Katsifodimos, T. Rabl, V. Markl. International Conference on Extending Database Technology (EDBT 2019; Best Paper Award).
    ● Scotty Window Processor: Efficient Window Aggregations for Flink, Beam, and Storm. https://github.com/TU-Berlin-DIMA/scotty-window-processor
  • 16-24. Stream Slicing Example (figure):
    ● The number of slices depends on the workload.
    ● We store partial aggregates instead of all tuples => small memory footprint.
    ● We assign each tuple to exactly one slice => O(1) per-tuple complexity.
    ● We require just a few computation steps to calculate final aggregates => low latency.
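The slicing idea can be sketched in Python for the simple case of one sliding time window over an in-order stream. Assumptions not taken from the talk: windows are `[k * slide, k * slide + size)`, slices have the fixed length `gcd(size, slide)` (the classic "panes" simplification, not the workload-driven slice boundaries of General Stream Slicing), and the class name `SlicingSlidingSum` is hypothetical.

```python
from math import gcd
from typing import Dict, List, Tuple


class SlicingSlidingSum:
    """Sliding-window sum over an in-order stream using fixed-length slices.

    Windows are [k * slide, k * slide + size) for k = 0, 1, 2, ...
    Slices have length gcd(size, slide), so every window start and end
    falls on a slice boundary.
    """

    def __init__(self, size: int, slide: int):
        self.size, self.slide = size, slide
        self.slice_len = gcd(size, slide)
        self.slices: Dict[int, int] = {}  # slice start -> partial sum (raw tuples are not kept)
        self.next_window_start = 0        # oldest window that has not been emitted yet

    def process(self, ts: int, value: int) -> List[Tuple[int, int, int]]:
        """Add one tuple and return the windows that ended at or before ts."""
        out = self.flush(ts)
        start = ts - ts % self.slice_len  # each tuple updates exactly one slice: O(1)
        self.slices[start] = self.slices.get(start, 0) + value
        return out

    def flush(self, watermark: int) -> List[Tuple[int, int, int]]:
        """Emit (start, end, sum) for every window with end <= watermark."""
        out = []
        while self.next_window_start + self.size <= watermark:
            ws = self.next_window_start
            # The final aggregate combines only size / slice_len partial sums.
            total = sum(self.slices.get(s, 0)
                        for s in range(ws, ws + self.size, self.slice_len))
            out.append((ws, ws + self.size, total))
            self.next_window_start += self.slide
            # Slices that start before the next window are no longer needed.
            for s in [s for s in self.slices if s < self.next_window_start]:
                del self.slices[s]
        return out


times = [5, 12, 13, 20, 35, 37, 42, 46, 48, 51, 52, 57, 63, 64, 65]
values = [1, 2, 1, 4, 3, 1, 5, 2, 2, 3, 6, 1, 2, 2, 1]
op = SlicingSlidingSum(size=20, slide=10)
results = []
for ts, value in zip(times, values):
    results.extend(op.process(ts, value))
results.extend(op.flush(80))
print(results)
```

With size 20 and slide 10, every tuple lands in one 10-unit slice and each window result adds just two partial sums, no matter how many tuples the window holds.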
  • 25-26. Workload Characteristics:
    ● Window types: context-free, forward context-free, forward context-aware
    ● Stream order: in-order, out-of-order
    ● Window measures: time, tuple count, arbitrary
    ● Aggregation functions: distributive, algebraic, holistic; associativity, commutativity, invertibility
    General Stream Slicing combines generality and efficiency in a single solution.
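The aggregation function classes listed above decide what a slice has to store. The following Python sketch uses illustrative data (two slices of one window, not from the talk) and the standard distributive / algebraic / holistic taxonomy to show which final results can be built from per-slice partials.

```python
from statistics import median

# Two slices of the same window; stream slicing keeps one partial result per slice.
slice_a = [1, 2, 3]
slice_b = [4, 5, 6, 7, 8]
window = slice_a + slice_b

# Distributive (sum, min, max, count): combine the partial results with the same function.
sum_combined = sum(slice_a) + sum(slice_b)
max_combined = max(max(slice_a), max(slice_b))

# Algebraic (average): a fixed-size partial state (sum, count) per slice is enough.
state_a = (sum(slice_a), len(slice_a))
state_b = (sum(slice_b), len(slice_b))
avg_combined = (state_a[0] + state_b[0]) / (state_a[1] + state_b[1])

# Holistic (median): no fixed-size partial state works in general;
# here the median of the slice medians differs from the true window median.
median_of_medians = median([median(slice_a), median(slice_b)])
true_median = median(window)

# Invertibility: a sum can drop a value by subtracting it, while max has no inverse,
# so removing the current maximum means looking at the remaining values again.
sum_without_8 = sum_combined - 8
max_without_8 = max(v for v in window if v != 8)

print(sum_combined, max_combined, avg_combined)  # 36 8 4.5
print(median_of_medians, true_median)            # 4.0 4.5
print(sum_without_8, max_without_8)              # 28 7
```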
  • 27-41. Impact of Workload Characteristics:
    ● Count-based tumbling window with a length of 5 tuples. The stream carries the values 1 2 1 4 3 | 1 5 2 2 3 | 6 1 2 2 1 (tuple counts 1 to 15) at the event times 5 12 13 20 35 | 37 42 46 48 51 | 52 57 63 64 65. Summing each window yields 11, 13, and 12.
    ● What if the stream is out-of-order? An out-of-order tuple with value 5 and event time 49 arrives. It belongs between the tuples at event times 48 and 51, so every later tuple moves one position later in the count order: the second window gains 5 and loses 3 (13 + 5 - 3 = 15), and the third window gains 3 and loses 1 (12 + 3 - 1 = 14).
    ● What if the aggregation function is not invertible? Then the value that leaves a window cannot simply be subtracted; for a function such as max, the affected windows have to be recomputed from their individual tuples.
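The arithmetic of this example can be replayed in Python. The sketch below uses illustrative helper names, represents the stream as a list of `(event_time, value)` pairs sorted by event time, and places a late tuple after any existing tuples with the same timestamp (an assumption). It patches the sums with one addition and one subtraction per affected window, which only works because sum is invertible, and recomputes max from the tuples.

```python
from bisect import bisect_right
from typing import List, Tuple

Item = Tuple[int, int]  # (event_time, value)


def window_sums(stream: List[Item], length: int) -> List[int]:
    """Sums of all complete count-based tumbling windows, computed from scratch."""
    values = [v for _, v in stream]
    return [sum(values[i:i + length]) for i in range(0, len(values) - length + 1, length)]


def window_maxes(stream: List[Item], length: int) -> List[int]:
    """max is not invertible, so after a shift the windows are recomputed from their tuples."""
    values = [v for _, v in stream]
    return [max(values[i:i + length]) for i in range(0, len(values) - length + 1, length)]


def insert_late_tuple(stream: List[Item], sums: List[int], late: Item,
                      length: int) -> Tuple[List[Item], List[int]]:
    """Insert an out-of-order tuple and patch the window sums with + and - only.

    All tuples after the insert position move one count position later, so each
    affected window adds the value that enters it (the late tuple, or the value
    pushed out of the previous window) and subtracts the value leaving its end.
    """
    pos = bisect_right([t for t, _ in stream], late[0])  # count position of the late tuple
    new_stream = stream[:pos] + [late] + stream[pos:]
    new_sums = list(sums)
    entering = late[1]
    for w in range(pos // length, len(sums)):
        leaving = stream[(w + 1) * length - 1][1]  # last value of window w before the insert
        new_sums[w] += entering - leaving
        entering = leaving  # the shifted value becomes the first value of window w + 1
    return new_stream, new_sums


times = [5, 12, 13, 20, 35, 37, 42, 46, 48, 51, 52, 57, 63, 64, 65]
values = [1, 2, 1, 4, 3, 1, 5, 2, 2, 3, 6, 1, 2, 2, 1]
stream = list(zip(times, values))
sums = window_sums(stream, 5)
print(sums)  # [11, 13, 12]
new_stream, new_sums = insert_late_tuple(stream, sums, (49, 5), 5)
print(new_sums)  # [11, 15, 14]
print(window_maxes(new_stream, 5))  # [4, 5, 6]
```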
  • 42. Scotty Window Processor: Efficient Window Aggregations for Flink, Beam, and Storm, https://github.com/TU-Berlin-DIMA/scotty-window-processor
  • 43-44. Key Facts:
    Features:
    ● One window operator for many systems.
    ● High-performance window aggregations with stream slicing.
    ● Scales to thousands of concurrent windows.
    ● Aggregate sharing among multiple window queries.
    ● Adapts to workload characteristics:
        ○ Window types
        ○ Aggregation functions
        ○ Window measures
        ○ Stream order
    Connectors: …more coming soon…
  • 45-49. Scotty Core: Scotty adapts to workload characteristics and combines generality and efficiency in a single solution.
  • 50-51. Benchmark: Concurrent Windows with Built-in Window Operators:
    ● Flink performs well with a single window (no overlap; one bucket at a time).
    ● With overlapping concurrent windows, the throughput drops drastically.
    (Figure: throughput in tuples/sec. vs. number of concurrent windows, from 1 to 1000, for Flink, Storm, and Flink on Beam.)
  • 52. Benchmark: Concurrent Windows with Scotty:
    ● With Scotty, the throughput is independent of the number of concurrent windows.
    (Figure: throughput in tuples/sec. vs. number of concurrent windows, from 1 to 1000, for Flink+Scotty, Storm+Scotty, and Beam+Flink+Scotty.)
  • 53-54. Using Scotty on Flink:
    1. Clone Scotty and install it to Maven.
    2. Add Scotty to your Flink project.
  • 55-57. Using Scotty on Flink:
    1. Initialize the Scotty window operator.
    2. Add window definitions.
    3. Add Scotty to your Flink job.
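The three steps (initialize an operator, add window definitions, attach it to the stream) follow a general pattern that can be sketched as a Python toy. Everything here is hypothetical: `ToyWindowOperator`, `add_window`, `process`, and `flush` are illustrative names, not Scotty's Java API. The toy assumes an in-order stream, sliding time windows starting at 0, an associative and distributive `combine` (so a raw value can serve as its own partial aggregate), and that all windows are registered before the first tuple. All windows share one set of slices, which illustrates aggregate sharing among multiple window queries.

```python
from math import gcd
from functools import reduce
from operator import add
from typing import Callable, Dict, List, Tuple

Result = Tuple[int, int, int, int]  # (window id, start, end, aggregate)


class ToyWindowOperator:
    """Hypothetical toy operator: several sliding windows share one set of slices."""

    def __init__(self, combine: Callable[[int, int], int]):
        self.combine = combine
        self.windows: List[Tuple[int, int]] = []  # (size, slide) per window definition
        self.next_start: List[int] = []           # next window start to emit, per definition
        self.slices: Dict[int, int] = {}          # slice start -> shared partial aggregate
        self.slice_len = 0

    def add_window(self, size: int, slide: int) -> int:
        """Register a sliding window (tumbling means slide == size); returns its id."""
        self.windows.append((size, slide))
        self.next_start.append(0)
        # gcd(0, x) == x, so the first window sets the initial slice length.
        self.slice_len = gcd(gcd(self.slice_len, size), slide)
        return len(self.windows) - 1

    def process(self, ts: int, value: int) -> List[Result]:
        """Put one tuple into exactly one shared slice; emit windows ending at or before ts."""
        if not self.windows:
            raise ValueError("add at least one window before processing tuples")
        out = self.flush(ts)
        start = ts - ts % self.slice_len
        if start in self.slices:
            self.slices[start] = self.combine(self.slices[start], value)
        else:
            self.slices[start] = value
        return out

    def flush(self, watermark: int) -> List[Result]:
        """Emit every window whose end is <= watermark; windows without tuples are skipped."""
        out: List[Result] = []
        for wid, (size, slide) in enumerate(self.windows):
            while self.next_start[wid] + size <= watermark:
                ws = self.next_start[wid]
                parts = [self.slices[s] for s in range(ws, ws + size, self.slice_len)
                         if s in self.slices]
                if parts:
                    out.append((wid, ws, ws + size, reduce(self.combine, parts)))
                self.next_start[wid] += slide
        # Evict slices that no registered window can still use.
        oldest_needed = min(self.next_start, default=watermark)
        for s in [s for s in self.slices if s < oldest_needed]:
            del self.slices[s]
        return out


# 1. Initialize the operator, 2. add window definitions, 3. feed the stream.
op = ToyWindowOperator(add)
sliding = op.add_window(size=20, slide=10)
tumbling = op.add_window(size=30, slide=30)
times = [5, 12, 13, 20, 35, 37, 42, 46, 48, 51, 52, 57, 63, 64, 65]
values = [1, 2, 1, 4, 3, 1, 5, 2, 2, 3, 6, 1, 2, 2, 1]
results = []
for ts, value in zip(times, values):
    results.extend(op.process(ts, value))
results.extend(op.flush(90))
```

Adding another window definition here only changes which slices a result combines; the per-tuple work stays a single slice update.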
  • 58-63. Implement your own Aggregation Functions:
    Example: Average -> Sum/Count
    1. lift: turn an input tuple into a partial aggregate, e.g. 5 becomes 5|1 (sum|count).
    2. combine: merge two partial aggregates, e.g. the partial state 3|3 and 5|1 become 8|4.
    3. lower: compute the final aggregate from a partial aggregate, e.g. 8|4 becomes 8 / 4 = 2 in the output stream.
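The lift / combine / lower steps can be sketched in Python. The sketch mirrors the three method names on the slides but is not Scotty's Java interface; `SumCount`, `AverageFunction`, and `aggregate` are illustrative names.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Iterable


@dataclass(frozen=True)
class SumCount:
    """Partial aggregate for the average: running sum and tuple count."""
    total: float
    count: int


class AverageFunction:
    """Average expressed as lift / combine / lower."""

    def lift(self, value: float) -> SumCount:
        # One input tuple becomes a partial aggregate, e.g. 5 -> 5|1.
        return SumCount(value, 1)

    def combine(self, a: SumCount, b: SumCount) -> SumCount:
        # Merge two partial aggregates, e.g. 3|3 and 5|1 -> 8|4.
        return SumCount(a.total + b.total, a.count + b.count)

    def lower(self, partial: SumCount) -> float:
        # Final result from a partial aggregate, e.g. 8|4 -> 2.0.
        return partial.total / partial.count


def aggregate(fn: AverageFunction, values: Iterable[float]) -> float:
    """Fold a non-empty sequence of values through lift, combine, and lower."""
    return fn.lower(reduce(fn.combine, map(fn.lift, values)))


avg = AverageFunction()
state = avg.combine(SumCount(3, 3), avg.lift(5))
print(state, avg.lower(state))  # SumCount(total=8, count=4) 2.0
```

Because `combine` is associative, partial aggregates from different slices can be merged in any grouping, which is what allows slices to be reused across windows.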
  • 64-67. Upcoming Research Projects:
    ● The NebulaStream Platform: Data and Application Management for the Internet of Things, https://arxiv.org/pdf/1910.07867.pdf
    ● Agora: An Open Ecosystem for Democratizing Data Science & Artificial Intelligence, https://arxiv.org/pdf/1909.03026.pdf
    We are hiring! Research Associates / PhD Students & Post Docs (m/f/d). Catch us after the talks or send a mail to jobs@dima.tu-berlin.de
  • 68. Scotty Window Processor:
    Scotty features:
    ● One window operator for many systems.
    ● High performance with stream slicing.
    ● Scales to thousands of concurrent windows.
    ● Aggregate sharing among multiple window queries.
    ● Adapts to workload characteristics.
    Open source repository: tu-berlin-dima.github.io/scotty-window-processor
    Acknowledgements: This talk is supported by the Berlin Big Data Center (01IS14013A), the Berlin Center for Machine Learning (01IS18037A), and Software Campus (1-3000473-18TP).