Demand-based Data Stream Gathering, Processing, and Transmission
PhD Defense, Berlin, 22.03.2019
Candidate: Jonas Traub
Committee: Prof. Dr. Manfred Hauswirth, Prof. Dr. Amr El Abbadi, Prof. Dr. Volker Markl, Prof. Dr. Albert Bifet
Thesis Goal
Efficiently gather, transmit, and process sensor data
from a large number of sensor nodes, and
make it available to a large number of applications
in real-time.
Efficiency
Scalability
Timeliness
22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 2
Stream Processing Pipelines
A stream processing pipeline is a series of concurrently running operators.
Pipeline stages: Data Gathering → Data Processing → Data Transmission
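A minimal sketch of such a pipeline, assuming one thread per operator and queues between them. The operator bodies (a running maximum as the processing step, a list as the sink) and all names are hypothetical and only illustrate how the three stages run concurrently.

```python
import queue
import threading

SENTINEL = object()  # marks the end of the stream


def gather(out_q, readings):
    """Data gathering: emit raw sensor readings one by one."""
    for value in readings:
        out_q.put(value)
    out_q.put(SENTINEL)


def process(in_q, out_q):
    """Data processing: forward the running maximum (hypothetical operator)."""
    current_max = None
    while True:
        item = in_q.get()
        if item is SENTINEL:
            out_q.put(SENTINEL)
            return
        current_max = item if current_max is None else max(current_max, item)
        out_q.put(current_max)


def transmit(in_q, sink):
    """Data transmission: deliver results to a consumer (here, a list)."""
    while True:
        item = in_q.get()
        if item is SENTINEL:
            return
        sink.append(item)


def run_pipeline(readings):
    """Start all three operators at once and wait until the stream ends."""
    q1, q2, sink = queue.Queue(), queue.Queue(), []
    operators = [
        threading.Thread(target=gather, args=(q1, readings)),
        threading.Thread(target=process, args=(q1, q2)),
        threading.Thread(target=transmit, args=(q2, sink)),
    ]
    for op in operators:
        op.start()
    for op in operators:
        op.join()
    return sink
```

Each operator blocks only on its own input queue, so a reading can be processed while the next one is still being gathered.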
Demand-oblivious Data Streaming in the Internet of Things
[Figure: Sensor Nodes (s1, s2, …, sN), Stream Analysis System, Front-End Applications.]
Definitions:
Data Demand: The minimum number of data points that allows for providing a desired functionality.
Demand-oblivious: Not considering the data demand of data consumers.
Problems:
• Combined stream bandwidth
• Parallel network connections
• Expensive cluster scale-out
• Front-end overload
Demand-oblivious stream processing pipelines do not suit the IoT.
Demand-based Data Streaming in the Internet of Things
[Figure: Sensor Nodes (shown in two groups: s2, s5, sN and s1, s3, s4, s6, sN-1), Sensor Control System, Stream Analysis System, Front-End Applications.]
Definitions:
Data Demand: The minimum number of data points that allows for providing a desired functionality.
Demand-oblivious: Not considering the data demand of data consumers.
Demand-based: Utilizing requirement specifications of data consumers to save resources.
Solution:
We enable demand-based optimizations by introducing control interfaces for expressing data demands.
End-to-End Architecture
Chapter 2: Optimized On-Demand Data Streaming from Sensor Nodes
Scope of Chapter 2: Read Scheduling on Sensor Nodes
Sensor Read Scheduling
User-Defined Sampling Functions
Symbol Key: Read time tolerance, Desired read time, Penalty function
Sensor Read Fusion
Local Filtering
Our scheduler minimizes the number of sensor reads and data transmissions
based on the joint data demand of all data consumers.
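The idea of sharing sensor reads across queries can be sketched as follows. This is a simplified illustration, not the scheduler from the thesis: it assumes each query states only a read time tolerance as an interval (earliest, latest) and ignores desired read times and penalty functions. The function name and the tuple layout are hypothetical.

```python
def schedule_shared_reads(requests):
    """Return a minimal list of read times such that every request's
    tolerance interval (earliest, latest) contains at least one read.

    Greedy interval stabbing: process requests by their latest allowed
    time and add a read only if no earlier read already serves them.
    """
    read_times = []
    for earliest, latest in sorted(requests, key=lambda r: r[1]):
        if not read_times or read_times[-1] < earliest:
            # No existing read falls into this tolerance: read as late as
            # allowed so that later requests can share this read.
            read_times.append(latest)
    return read_times
```

For example, the four requests (0, 4), (2, 6), (5, 9), and (8, 12) are served by two reads, at times 4 and 9. A scheduler that also weighs penalty functions would pick read times inside the shared time frame instead of always reading at its end.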
Read Scheduling Example
Symbol Key: Read time tolerance, Desired read time, Penalty function, Sum of penalty functions, Optimal time frame for read sharing, Sensor read time
Increasing the number of concurrent queries
On-demand scheduling reduces sensor reads and data transfer by up to 87%.
The number of reads and transfers increases sub-linearly with the number of queries.
Conclusion of Chapter 2
• We introduced user-defined sampling functions (UDSFs).
• We contributed a multi-query read scheduling algorithm.
• We optimize read times based on given read time tolerances.
• We experimentally and analytically validated our approach.
Full Paper: Optimized On-Demand Data Streaming from Sensor Nodes. Jonas Traub, Sebastian Breß, Tilmann Rabl, Asterios Katsifodimos, and Volker Markl. ACM Symposium on Cloud Computing 2017 (SoCC '17).
Chapter 3: Scalable Data Acquisition with Guaranteed Time Coherence
Scope of Chapter 3: The Sensor Control Layer
Architecture: Central Join Topology
[Figure: Sensor Nodes s1, s2, …, sN send individual tuples such as (t3,v3) to a Central Join, which emits joined tuples (t,v1,v2,…,vN).]
Central Stream Joins do not scale to thousands of input streams.
Architecture: Sensing Loop Topology
[Figure: A Loop Node sends a tuple (t) through the sensor nodes s1 to s4. The tuple grows at each node: (t,v1), (t,v1,v2), (t,v1,v2,v3), (t,v1,v2,v3,v4). A second loop over s5 to s8 produces (t,v5), (t,v5,v6), (t,v5,v6,v7), (t,v5,v6,v7,v8).]
We dynamically combine sensing loops and central joins to solve scalability challenges.
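The combination of sensing loops and a central join can be sketched as below. This is an illustrative model only: sensors are plain callables, network communication is omitted, and the function names are hypothetical.

```python
def run_sensing_loop(t, sensors):
    """Pass a tuple that starts as (t,) through the sensors in loop order.

    Each sensor appends the value it reads for time t, so the tuple grows
    from (t,) to (t, v1) to (t, v1, v2) and so on.
    """
    tup = (t,)
    for read in sensors:
        tup = tup + (read(t),)
    return tup


def central_join(loop_results):
    """Join the tuples of several loops that carry the same timestamp."""
    t = loop_results[0][0]
    if any(result[0] != t for result in loop_results):
        raise ValueError("all loop results must share one timestamp")
    values = tuple(v for result in loop_results for v in result[1:])
    return (t,) + values
```

With two loops of four sensors each, the central join receives two inputs instead of eight, which is what makes the combined topology attractive when the number of sensors grows.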
Architecture: Sensor Node Internals
Time Coherence of Data Tuples
Definition of Time Coherence:
A tuple (t, v1, v2, …, vN) is time coherent if all its values v1, v2, …, vN were read exactly at time t.
How can we measure, optimize, and guarantee time coherence?
Time Coherence Measures
Coherence Estimate (Ce): Based on untrusted clocks. Best case: Ce = 0.
Coherence Guarantee (Cg): Based on trusted clocks. Best case: Cg = round trip time.
Time Coherence Optimization
If untrusted clocks are correct, data quality will be better if you just trust them.
If untrusted clocks are wrong, data quality is ensured iff you don't trust them.
"I can't rely on untrusted clocks."
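The two measures can be illustrated with a small sketch. The slides do not give formulas, so the definitions below are assumptions, one plausible reading rather than the definitions from the thesis: Ce is taken as the spread of the read timestamps reported by the sensor nodes' own (untrusted) clocks, and Cg as the time that elapsed on a trusted clock between sending a read request and receiving the completed tuple. All function and parameter names are hypothetical.

```python
def coherence_estimate(reported_read_times):
    """Assumed Ce: spread of the read times reported by untrusted clocks.

    If every sensor reports the same read time, the estimate is 0.
    """
    return max(reported_read_times) - min(reported_read_times)


def coherence_guarantee(sent_at, received_at):
    """Assumed Cg: elapsed time on one trusted clock.

    All reads happened between sending the request and receiving the
    result, so they lie at most this far apart, whatever the sensor
    clocks claim. It cannot drop below the round trip time.
    """
    if received_at < sent_at:
        raise ValueError("received_at must not precede sent_at")
    return received_at - sent_at


def meets_requirement(sent_at, received_at, d_max):
    """Check the guarantee against a desired upper limit Dmax."""
    return coherence_guarantee(sent_at, received_at) <= d_max
```

Under these assumptions the best cases match the slide: Ce can reach 0, while Cg is bounded from below by the round trip time.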
Example of Coherence Optimization
Symbol Key: Coherence Estimate (Ce), Coherence Guarantee (Cg), Round Trip Time, Dmax (desired upper limit for Cg)
Large Scale Parameter Exploration
Our optimization works for arbitrary coherence requirements and large numbers of sensor nodes.
Conclusion of Chapter 3
• We present an architecture for acquiring values from large numbers of
sensors with guaranteed time coherence and low latency.
• We introduce time coherence as a fundamental data characteristic.
• We optimize the coherence of tuples and provide coherence guarantees.
Full Paper: SENSE: Scalable Data Acquisition from Distributed Sensors with Guaranteed Time Coherence. Jonas Traub, Julius Hülsmann, Tim Stullich, Sebastian Breß, Tilmann Rabl, and Volker Markl. (Under Submission)
Chapter 4: Efficient Window Aggregation with General Stream Slicing
Scope of Chapter 4: Flexible Stream Discretization and Efficient Window Aggregation in Stream Analysis Systems
Stream Slicing Example
The number of slices depends on data consumers. → Demand-based memory requirement.
We store partial aggregates instead of all tuples. → Small memory footprint.
We assign each tuple to exactly one slice. → O(1) per-tuple complexity.
We require just a few computation steps to calculate final aggregates. → Low latency.
We share partial aggregations among all users and queries. → Efficiency by preventing redundancy.
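The slicing properties stated above can be illustrated with a minimal sketch. It assumes a sum aggregation, integer timestamps, and slices of one fixed length chosen so that every window edge falls on a slice edge. The class and method names are hypothetical, and deriving slice edges from the registered queries is left out.

```python
from collections import defaultdict


class SlicedSum:
    """Keeps one partial sum per slice instead of storing all tuples."""

    def __init__(self, slice_length):
        self.slice_length = slice_length
        self.partials = defaultdict(int)  # slice index -> partial sum

    def add(self, timestamp, value):
        # Each tuple updates exactly one slice: O(1) work per tuple.
        self.partials[timestamp // self.slice_length] += value

    def window_sum(self, start, end):
        """Aggregate over [start, end) by combining the covered slices."""
        if start % self.slice_length or end % self.slice_length:
            raise ValueError("window edges must align with slice edges")
        first = start // self.slice_length
        last = end // self.slice_length
        return sum(self.partials.get(i, 0) for i in range(first, last))
```

Two queries over the same stream, for example a window of length 10 and a window of length 15, both call `window_sum` on the same `SlicedSum` instance and therefore reuse the same partial sums.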
General Stream Slicing
Workload Characteristics:
• Stream Order: in-order, out-of-order
• Window Types: Context Free, Forward Context Free, Forward Context Aware
• Window Measures: time, tuple count, arbitrary
• Aggregation Functions: distributive, algebraic, holistic; associativity, commutativity, invertibility
General Stream Slicing combines generality and efficiency in a single solution.
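The aggregation function properties matter because they decide which shortcuts are possible. The sketch below uses the average as an example of an algebraic function: it is computed from a fixed-size partial aggregate (sum, count). The names lift, combine, invert, and lower are assumptions borrowed from common window aggregation terminology, not taken from the slides.

```python
def lift(value):
    """Turn one input value into a partial aggregate (sum, count)."""
    return (value, 1)


def combine(a, b):
    """Merge two partial aggregates; associative and commutative."""
    return (a[0] + b[0], a[1] + b[1])


def invert(total, removed):
    """Remove a partial aggregate again; possible because sum and count
    are invertible."""
    return (total[0] - removed[0], total[1] - removed[1])


def lower(partial):
    """Turn a partial aggregate into the final average."""
    return partial[0] / partial[1]
```

A holistic function such as the median has no fixed-size partial aggregate of this kind, so it needs access to the underlying values.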
General Stream Slicing Internals
Merge Slices Split Slices Update Slices
Part 1: Three Fundamental Operations on Slices
Part 2: Adapt to Workload Characteristics:
Do we need to store original tuples?
Do we potentially need to split slices?
Do we potentially need
to remove tuples from slices?
General Stream Slicing Internals
22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 38
Merge Slices Split Slices Update Slices
Part 1: Three Fundamental Operations on Slices
Part 2: Adapt to Workload Characteristics:
Do we need to store original tuples?
Do we potentially need to split slices?
Do we potentially need
to remove tuples from slices?
General Stream Slicing Internals
Part 1: Three Fundamental Operations on Slices
• Merge Slices
• Split Slices
• Update Slices
Part 2: Adapt to Workload Characteristics:
• Do we need to store original tuples?
• Do we potentially need to split slices?
• Do we potentially need to remove tuples from slices?
General Stream Slicing adapts to current workload characteristics.
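The three slice operations can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis implementation or the Scotty API: the `Slice` class, its fields, and the use of sum as the aggregate are all hypothetical. It also shows why the question of storing original tuples matters: this split recomputes both aggregates from the stored tuples.

```python
from dataclasses import dataclass, field


@dataclass
class Slice:
    start: int   # inclusive start timestamp (hypothetical field)
    end: int     # exclusive end timestamp (hypothetical field)
    agg: int = 0  # partial aggregate; sum is used as an example
    tuples: list = field(default_factory=list)  # (timestamp, value) pairs kept for splits

    def update(self, ts, value):
        """Update: add one tuple to the slice and to its partial aggregate."""
        assert self.start <= ts < self.end
        self.tuples.append((ts, value))
        self.agg += value


def merge(a, b):
    """Merge: combine two adjacent slices into one."""
    assert a.end == b.start
    return Slice(a.start, b.end, a.agg + b.agg, a.tuples + b.tuples)


def split(s, ts):
    """Split: cut a slice at ts and recompute both aggregates from stored tuples."""
    assert s.start < ts < s.end
    left = [t for t in s.tuples if t[0] < ts]
    right = [t for t in s.tuples if t[0] >= ts]
    return (Slice(s.start, ts, sum(v for _, v in left), left),
            Slice(ts, s.end, sum(v for _, v in right), right))


s = Slice(0, 10)
s.update(1, 5)
s.update(7, 3)
left_slice, right_slice = split(s, 5)
merged = merge(left_slice, right_slice)
```

If a workload never requires splits or tuple removal, a sketch like this could drop the `tuples` list and keep only the partial aggregate per slice.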
Performance Evaluation with 20% out-of-order Tuples
Conclusion of Chapter 4
• We identify workload characteristics which impact the applicability and performance of window aggregation techniques.
• We contribute the General Stream Slicing technique.
• We show that general stream slicing is generally applicable and offers better performance than alternative approaches.
Full Paper: Efficient Window Aggregation with General Stream Slicing
Jonas Traub, Philipp Grulich, Alejandro Rodríguez Cuéllar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl.
International Conference on Extending Database Technology (EDBT), 2019.
Short Paper: Scotty: Efficient Window Aggregation for out-of-order Stream Processing
Jonas Traub, Philipp Grulich, Alejandro Rodríguez Cuéllar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl.
IEEE International Conference on Data Engineering (ICDE), 2018.
Open Source: https://tu-berlin-dima.github.io/scotty-window-processor/
Chapter 5: Interactive Real-Time Visualization for Streaming Data
Scope of Chapter 5: Connecting Stream Analysis Systems and Front-End Applications
Demonstration Data
The DEBS 2013 Grand Challenge
Christopher Mutschler, Holger Ziekow, Zbigniew Jerzak
Data:
● Sensor data from a football match (speed, acceleration, and position of the ball and players)
● Up to 2000 Hz frequency
● Roughly 12,000 data points per second
Conclusion of Chapter 5
• We connect visualization front-ends with stream analysis clusters to enable demand-based optimizations.
• We show that our solution significantly reduces the amount of processed and transferred data, while still providing loss-free visualizations.
• We provide an algorithm for the live visualization of time series in line charts.
Demonstration: I²: Interactive Real-Time Visualization for Streaming Data
Jonas Traub, Nikolaas Steenbergen, Philipp M Grulich, Tilmann Rabl, and Volker Markl.
International Conference on Extending Database Technology (EDBT), 2017.
Open Source: https://tu-berlin-dima.github.io/i2
Conclusion
List of Publications
PhD Thesis Publications:
• Optimized On-Demand Data Streaming from Sensor Nodes (SoCC 2017)
Jonas Traub, Sebastian Breß, Tilmann Rabl, Asterios Katsifodimos, Volker Markl
• SENSE: Scalable Data Acquisition from Distributed Sensors with Guaranteed Time Coherence.
Jonas Traub, Julius Hülsmann, Tim Stullich, Sebastian Breß, Tilmann Rabl, Volker Markl
• Scotty: Efficient Window Aggregation for out-of-order Stream Processing (ICDE 2018)
Jonas Traub, Philipp Grulich, Alejandro Rodríguez Cuéllar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl
• Efficient Window Aggregation with General Stream Slicing (EDBT 2019, Best Paper)
Jonas Traub, Philipp Grulich, Alejandro Rodríguez Cuéllar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl
• I²: Interactive Real-Time Visualization for Streaming Data (EDBT 2017, Best Demo)
Jonas Traub, Nikolaas Steenbergen, Philipp M Grulich, Tilmann Rabl, Volker Markl
Additional Contributions:
• Cutty: Aggregate Sharing for User-Defined Windows (CIKM 2016)
Paris Carbone, Jonas Traub, Asterios Katsifodimos, Seif Haridi, Volker Markl
• Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing (EDBT 2018)
Philipp Marian Grulich, René Saitenmacher, Jonas Traub, Sebastian Breß, Tilmann Rabl, Volker Markl
• A Concept Drift based Approach for Predicting Event-Time Progress in Data Streams (EDBT 2019)
Ahmed Awad, Jonas Traub, Sherif Sakr
• Analyzing Efficient Stream Processing on Modern Hardware (PVLDB 2019)
Steffen Zeuch, Bonaventura Del Monte, Jeyhun Karimov, Clemens Lutz, Manuel Renz, Jonas Traub, Sebastian Breß, Tilmann Rabl, Volker Markl
• Efficient SIMD Vectorization for Hashing in OpenCL (EDBT 2018)
Tobias Behrens, Viktor Rosenfeld, Jonas Traub, Sebastian Breß, Volker Markl
• Resense: Transparent Record and Replay of Sensor Data in the Internet of Things (EDBT 2019)
Dimitrios Giouroukis, Julius Hülsmann, Janis von Bleichert, Morgan Geldenhuys, Tim Stullich, Felipe Gutierrez, Jonas Traub, Kaustubh Beedkar, Volker Markl
• Apache Flink in current research (it-Information Technology 2016)
Tilmann Rabl, Jonas Traub, Asterios Katsifodimos, Volker Markl
• Die Apache Flink Plattform zur parallelen Analyse von Datenströmen und Stapeldaten (LWA 2015)
Jonas Traub, Tilmann Rabl, Fabian Hueske, Till Rohrmann, Volker Markl

Demand-based Data Stream Gathering, Processing, and Transmission - PhD Defense

  • 1. Candidate: Jonas Traub PhD Defense, Berlin, 22.03.2019 Demand-based Data Stream Gathering, Processing, and Transmission Committee: Prof. Dr. Manfred Hauswirth Prof. Dr. Amr El Abbadi Prof. Dr. Volker Markl Prof. Dr. Albert Bifet
  • 3. Thesis Goal: Efficiently gather, transmit, and process sensor data from a large number of sensor nodes, and make it available to a large number of applications in real-time. Efficiency, Scalability, Timeliness.
  • 12. Stream Processing Pipelines: A stream processing pipeline is a series of concurrently running operators. Data Gathering, Data Processing, Data Transmission.
  • 24. Demand-oblivious Data Streaming in the Internet of Things. Sensor Nodes (s1, s2, …, sN), Stream Analysis System, Front-End Applications. Problems: Combined Stream Bandwidth, Parallel Network Connections, Expensive Cluster Scale-Out, Front-End Overload. Definitions: Data Demand: Minimum number of data points which allows for providing a desired functionality. Demand-oblivious: Not considering the data demand of data consumers. Demand-oblivious Stream Processing Pipelines do not suit the IoT.
  • 32. Demand-based Data Streaming in the Internet of Things Stream Analysis System Front-End Applications s2 s5 … sN Sensor Nodes s1 s3 s4 s6 sN-1 Sensor Control System Data Demand: Minimum number of data points which allows for providing a desired functionality. Demand-oblivious: Not considering the data demand of data consumers. Definitions: Demand-based: Utilize requirement specifications of data consumers to save resources. 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 5
  • 33. Demand-based Data Streaming in the Internet of Things Stream Analysis System Front-End Applications s2 s5 sN Sensor Nodes s1 s3 s4 s6 sN-1 Sensor Control System Data Demand: Minimum number of data points which allows for providing a desired functionality. Demand-oblivious: Not considering the data demand of data consumers. Definitions: Demand-based: Utilize requirement specifications of data consumers to save resources. 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 5
  • 34. Demand-based Data Streaming in the Internet of Things. Diagram components: Sensor Nodes (s1, s2, …, sN), Sensor Control System, Stream Analysis System, Front-End Applications. Definitions: Data Demand: the minimum number of data points that allows for providing a desired functionality. Demand-oblivious: not considering the data demand of data consumers. Demand-based: utilizing requirement specifications of data consumers to save resources. Solution: We enable demand-based optimizations by introducing control interfaces for expressing data demands.
  • 35. End-to-End Architecture
  • 36. Chapter 2: Optimized On-Demand Data Streaming from Sensor Nodes. Scope of Chapter 2: Read Scheduling on Sensor Nodes.
  • 37. Sensor Read Scheduling
  • 38. User-Defined Sampling Functions. Symbol key: read time tolerance, desired read time, penalty function.
  • 39. Sensor Read Fusion. Symbol key: read time tolerance, desired read time, penalty function.
  • 42. Local Filtering. Symbol key: read time tolerance, desired read time, penalty function. Our scheduler minimizes the number of sensor reads and data transmissions based on the joint data demand of all data consumers.
  • 43. Read Scheduling Example. Symbol key: read time tolerance, desired read time, penalty function, sum of penalty functions, optimal time frame for read sharing, sensor read time.
  • 44. Read Scheduling Example (continued). Symbol key: read time tolerance, desired read time, penalty function, optimal time frame for read sharing, sensor read time.
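The read scheduling slides name a desired read time, a read time tolerance, a penalty function, and an optimal time frame for read sharing derived from the sum of penalty functions. The following is a minimal sketch of that idea, not the thesis's scheduler: it assumes each tolerance is a closed interval, takes the intersection of all tolerances as the sharing time frame, and searches a discrete grid for the time with the lowest summed penalty. The names `ReadRequest`, `shared_read_time`, and the `step` parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ReadRequest:
    """One query's demand for a sensor read (illustrative names, not from the slides)."""
    desired: float                      # desired read time
    earliest: float                     # start of the read time tolerance
    latest: float                       # end of the read time tolerance
    penalty: Callable[[float], float]   # cost of reading at time t


def shared_read_time(requests: List[ReadRequest], step: float = 0.1) -> Optional[float]:
    """Return one read time that serves all requests, or None if none exists."""
    # Assumption: the time frame for read sharing is the intersection
    # of all read time tolerances.
    start = max(r.earliest for r in requests)
    end = min(r.latest for r in requests)
    if start > end:
        return None  # tolerances do not overlap, so the reads cannot be shared
    # Evaluate the sum of penalty functions on a grid inside the time frame.
    steps = int((end - start) / step + 1e-9)
    candidates = [start + i * step for i in range(steps + 1)] + [end]
    return min(candidates, key=lambda t: sum(r.penalty(t) for r in requests))
```

With two requests whose tolerances overlap, one sensor read at the returned time replaces two separate reads; when the tolerances are disjoint, the function returns `None` and each request needs its own read.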
  • 46. Increasing the number of concurrent queries. On-demand scheduling reduces sensor reads and data transfer by up to 87%. The number of reads and transfers increases sub-linearly with the number of queries.
  • 48. Conclusion of Chapter 2
      • We introduced user-defined sampling functions (UDSFs).
      • We contributed a multi-query read scheduling algorithm.
      • We optimize read times based on given read time tolerances.
      • We experimentally and analytically validated our approach.
      Full paper: Optimized On-Demand Data Streaming from Sensor Nodes. Jonas Traub, Sebastian Breß, Tilmann Rabl, Asterios Katsifodimos, and Volker Markl. ACM Symposium on Cloud Computing 2017 (SoCC '17).
  • 49. Chapter 3: Scalable Data Acquisition with Guaranteed Time Coherence. Scope of Chapter 3: The Sensor Control Layer.
  • 52. Architecture: Central Join Topology. Sensor nodes s1, s2, s3, s4, …, sN send individual tuples such as (t3, v3) to a central join, which outputs (t, v1, v2, …, vN). Central stream joins do not scale to thousands of input streams.
  • 61. Architecture: Sensing Loop Topology. A loop node sends (t) through the sensor nodes s1, s2, s3, s4, and the tuple grows at each node: (t) → (t, v1) → (t, v1, v2) → (t, v1, v2, v3) → (t, v1, v2, v3, v4). A second loop over s5, s6, s7, s8 likewise produces (t, v5), (t, v5, v6), (t, v5, v6, v7), (t, v5, v6, v7, v8). We dynamically combine sensing loops and central joins to solve scalability challenges.
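The sensing loop slides show a tuple that starts as (t) at the loop node and gains one value per sensor node until it returns as (t, v1, …, v4), with several loops combined by a central join. A minimal sketch of that data flow follows. It models each sensor node as a plain function of the requested read time, which is an assumption made for illustration; `run_sensing_loop` and `join_loops` are hypothetical names, and networking, clocks, and failures are ignored.

```python
from typing import Callable, List, Tuple

# Assumption: a sensor node is modeled as a function that returns
# its value for the requested read time t.
SensorNode = Callable[[float], float]


def run_sensing_loop(t: float, sensors: List[SensorNode]) -> Tuple[float, ...]:
    """Pass one tuple through every sensor node of a loop."""
    message: Tuple[float, ...] = (t,)        # the loop node emits (t)
    for read in sensors:
        message = message + (read(t),)       # each node appends its value and forwards
    return message                           # (t, v1, ..., vk) returns to the loop node


def join_loops(results: List[Tuple[float, ...]]) -> Tuple[float, ...]:
    """Centrally join the results of several loops that share one timestamp."""
    t = results[0][0]
    if any(r[0] != t for r in results):
        raise ValueError("loop results carry different timestamps")
    return (t,) + tuple(v for r in results for v in r[1:])
```

In this sketch the central join only sees one input per loop instead of one input per sensor node, which is where the combination of loops and joins reduces the join's fan-in.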
  • 62. Architecture: Sensor Node Internals
  • 78. Time Coherence of Data Tuples. Definition of Time Coherence: A tuple $(t, v_1, v_2, \dots, v_N)$ is time coherent if all its values $v_1, v_2, \dots, v_N$ were read exactly at time $t$. How can we measure, optimize, and guarantee time coherence?
  • 84. Time Coherence Measures. Coherence Estimate (Ce): based on untrusted clocks; best case: Ce = 0. Coherence Guarantee (Cg): based on trusted clocks; best case: Cg = round trip time.
  • 91. Time Coherence Optimization. Coherence Estimate (Ce): based on untrusted clocks; best case: Ce = 0. Coherence Guarantee (Cg): based on trusted clocks; best case: Cg = round trip time. If untrusted clocks are correct, data quality will be better if you just trust them. If untrusted clocks are wrong, data quality is ensured iff you don't trust them. "I can't rely on untrusted clocks." Figure label: Dmax.
  • 92. Example of Coherence Optimization. Symbol key: Coherence Estimate (Ce), Coherence Guarantee (Cg), round trip time, Dmax (desired upper limit for Cg).
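The slides name two measures, the coherence estimate Ce (untrusted clocks, best case 0) and the coherence guarantee Cg (trusted clocks, best case equal to the round trip time), plus Dmax as the desired upper limit for Cg. They do not give the formulas. The sketch below is one plausible reading and is an assumption throughout: Ce is taken as the spread of the read times reported by the sensor nodes, and Cg as the time a tuple spends travelling, measured on the trusted clock of the node that sent and received it. All function names are hypothetical.

```python
from typing import Sequence


def coherence_estimate(reported_read_times: Sequence[float]) -> float:
    """Assumed Ce: spread of the read times reported by untrusted sensor clocks."""
    return max(reported_read_times) - min(reported_read_times)


def coherence_guarantee(sent_at: float, received_at: float) -> float:
    """Assumed Cg: travel time of the tuple, measured on the trusted clock.

    All reads must have happened between sending and receiving, so this
    interval bounds their spread no matter what the sensor clocks report.
    """
    return received_at - sent_at


def meets_requirement(cg: float, d_max: float) -> bool:
    """Check a guarantee against the desired upper limit Dmax."""
    return cg <= d_max
```

Under these assumptions the best cases match the slides: Ce is 0 when every node reports the same read time, and Cg cannot drop below the round trip time of the tuple.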
  • 94. Large Scale Parameter Exploration. Our optimization works for arbitrary coherence requirements and large numbers of sensor nodes.
  • 96. Conclusion of Chapter 3
      • We present an architecture for acquiring values from large numbers of sensors with guaranteed time coherence and low latency.
      • We introduce time coherence as a fundamental data characteristic.
      • We optimize the coherence of tuples and provide coherence guarantees.
      Full paper: SENSE: Scalable Data Acquisition from Distributed Sensors with Guaranteed Time Coherence. Jonas Traub, Julius Hülsmann, Tim Stullich, Sebastian Breß, Tilmann Rabl, and Volker Markl. (Under Submission)
  • 97. Chapter 4: Efficient Window Aggregation with General Stream Slicing. Scope of Chapter 4: Flexible Stream Discretization and Efficient Window Aggregation in Stream Analysis Systems.
  • 99. Stream Slicing Example
  • 101. Stream Slicing Example: The number of slices depends on data consumers. → Demand-based memory requirement.
  • 106. Stream Slicing Example: We store partial aggregates instead of all tuples. → Small memory footprint.
  • 108. Stream Slicing Example: We assign each tuple to exactly one slice. → O(1) per-tuple complexity.
  • 110. Stream Slicing Example: We require just a few computation steps to calculate final aggregates. → Low latency.
  • 112. Stream Slicing Example: We share partial aggregations among all users and queries. → Efficiency by preventing redundancy.
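The stream slicing example states four properties: slices are determined by the queries, each slice stores a partial aggregate, each tuple goes to exactly one slice, and final window aggregates are combined from shared slices. The sketch below illustrates these properties for a deliberately narrow case and is not the General Stream Slicing implementation. It assumes an in-order stream, time-based tumbling windows, integer timestamps in `[0, horizon)`, and a sum aggregation; `slice_edges`, `aggregate`, and `horizon` are hypothetical names.

```python
from typing import Dict, List, Tuple


def slice_edges(window_lengths: List[int], horizon: int) -> List[int]:
    """Slice edges are the window starts and ends of all queries."""
    edges = {0, horizon}
    for length in window_lengths:
        edges.update(range(0, horizon + 1, length))
    return sorted(edges)


def aggregate(stream: List[Tuple[int, float]], window_lengths: List[int],
              horizon: int) -> Dict[Tuple[int, int], float]:
    """Sum per tumbling window, computed from shared slices."""
    edges = slice_edges(window_lengths, horizon)
    partials = [0.0] * (len(edges) - 1)      # one partial sum per slice
    current = 0                              # index of the slice being filled
    for ts, value in stream:                 # assumption: stream is in order
        if not 0 <= ts < horizon:
            raise ValueError("timestamp outside [0, horizon)")
        while ts >= edges[current + 1]:      # advance to the slice containing ts
            current += 1
        partials[current] += value           # each tuple updates exactly one slice
    results: Dict[Tuple[int, int], float] = {}
    for length in window_lengths:
        for start in range(0, horizon - length + 1, length):
            end = start + length
            lo, hi = edges.index(start), edges.index(end)
            results[(start, end)] = sum(partials[lo:hi])   # combine shared slices
    return results
```

For window lengths 10 and 15 over a horizon of 30, the edges are 0, 10, 15, 20, 30, so four partial sums serve five windows across both queries.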
  • 119. General Stream Slicing. Workload characteristics: Window Types: context free, forward context free, forward context aware. Window Measures: time, tuple count, arbitrary. Stream Order: in-order, out-of-order. Aggregation Functions: distributive, algebraic, holistic; associativity, commutativity, invertibility. General Stream Slicing combines generality and efficiency in a single solution.
  • 130. General Stream Slicing Internals. Part 1: Three fundamental operations on slices: merge slices, split slices, update slices. Part 2: Adapt to workload characteristics: Do we need to store original tuples? Do we potentially need to split slices? Do we potentially need to remove tuples from slices? General Stream Slicing adapts to current workload characteristics.
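The three slice operations (merge, split, update) and the question of whether original tuples must be stored can be sketched together. This is an illustrative data structure under stated assumptions, not the Scotty implementation: the aggregate is a sum, slices cover half-open time intervals, and storing tuples is optional. The class name `Slice` and its fields are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Slice:
    """A slice over [start, end) holding a partial sum (illustrative only)."""
    start: int
    end: int
    total: float = 0.0
    # None means the workload does not require keeping original tuples.
    tuples: Optional[List[Tuple[int, float]]] = None

    def update(self, ts: int, value: float) -> None:
        """Update: add one tuple to the partial aggregate."""
        self.total += value
        if self.tuples is not None:
            self.tuples.append((ts, value))

    def merge(self, other: "Slice") -> "Slice":
        """Merge: combine two adjacent slices by combining their partials."""
        if self.end != other.start:
            raise ValueError("slices are not adjacent")
        tuples = None
        if self.tuples is not None and other.tuples is not None:
            tuples = self.tuples + other.tuples
        return Slice(self.start, other.end, self.total + other.total, tuples)

    def split(self, at: int) -> Tuple["Slice", "Slice"]:
        """Split: recompute both halves, which needs the original tuples."""
        if self.tuples is None:
            raise ValueError("splitting requires stored tuples")
        if not self.start < at < self.end:
            raise ValueError("split point outside the slice")
        left = Slice(self.start, at, tuples=[])
        right = Slice(at, self.end, tuples=[])
        for ts, value in self.tuples:
            (left if ts < at else right).update(ts, value)
        return left, right
```

The sketch shows why the workload questions matter: merge and update work on partial aggregates alone, while split fails unless tuples were kept, so a workload that never splits can skip tuple storage and its memory cost.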
  • 131. Performance Evaluation with 20% out-of-order Tuples
  • 133. Conclusion of Chapter 4
      • We identify workload characteristics which impact applicability and performance of window aggregation techniques.
      • We contribute the General Stream Slicing technique.
      • We show that general stream slicing is generally applicable and offers better performance than alternative approaches.
      Full paper: Efficient Window Aggregation with General Stream Slicing. Jonas Traub, Philipp Grulich, Alejandro Rodríguez Cuéllar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl. International Conference on Extending Database Technology (EDBT), 2019.
      Short paper: Scotty: Efficient Window Aggregation for out-of-order Stream Processing. Jonas Traub, Philipp Grulich, Alejandro Rodríguez Cuéllar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl. IEEE International Conference on Data Engineering (ICDE), 2018.
      Open source: https://tu-berlin-dima.github.io/scotty-window-processor/
  • 134. Chapter 5: Interactive Real-Time Visualization for Streaming Data. Scope of Chapter 5: Connecting Stream Analysis Systems and Front-End Applications.
  • 136. Demonstration Data. The DEBS 2013 Grand Challenge (Christopher Mutschler, Holger Ziekow, Zbigniew Jerzak). Data:
      ● Sensor data from a football match (speed, acceleration, and position of the ball and players)
      ● Up to 2000 Hz frequency
      ● Roughly 12,000 data points per second
  • 143. Conclusion of Chapter 5
      • We connect visualization front-ends with stream analysis clusters to enable demand-based optimizations.
      • We show that our solution significantly reduces the amount of processed and transferred data, while still providing loss-free visualizations.
      • We provide an algorithm for the live visualization of time series in line charts.
      Demonstration: I²: Interactive Real-Time Visualization for Streaming Data. Jonas Traub, Nikolaas Steenbergen, Philipp M Grulich, Tilmann Rabl, and Volker Markl. International Conference on Extending Database Technology (EDBT), 2017.
      Open source: https://tu-berlin-dima.github.io/i2
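The slides claim a large reduction in transferred data with loss-free line charts but do not describe the algorithm. One well-known technique with that property is M4 aggregation, which keeps the first, last, minimum, and maximum point of every pixel column. The sketch below shows M4 as a stand-in for illustration; whether I² works exactly this way is not stated in the slides. It assumes points arrive sorted by time, and `m4_reduce` and its parameters are hypothetical names.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (timestamp, value)


def m4_reduce(points: List[Point], t_start: float, t_end: float,
              width_px: int) -> List[Point]:
    """Keep first, last, min, and max point per pixel column (M4-style)."""
    columns: Dict[int, List[Point]] = {}
    for t, v in points:                       # assumption: sorted by timestamp
        if not t_start <= t < t_end:
            continue                          # outside the visible time range
        col = int((t - t_start) / (t_end - t_start) * width_px)
        columns.setdefault(col, []).append((t, v))
    reduced: List[Point] = []
    for col in sorted(columns):
        pts = columns[col]
        keep = {
            pts[0],                           # first point in the column
            pts[-1],                          # last point in the column
            min(pts, key=lambda p: p[1]),     # lowest value
            max(pts, key=lambda p: p[1]),     # highest value
        }
        reduced.extend(sorted(keep))          # back in time order
    return reduced
```

The output size depends on the chart width in pixels rather than on the stream rate, which is the sense in which the front-end's demand bounds the data that has to be transferred.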
  • 144. Conclusion
  • 145. List of Publications
      PhD Thesis Publications:
      • Optimized On-Demand Data Streaming from Sensor Nodes (SoCC 2017). Jonas Traub, Sebastian Breß, Tilmann Rabl, Asterios Katsifodimos, Volker Markl
      • SENSE: Scalable Data Acquisition from Distributed Sensors with Guaranteed Time Coherence. Jonas Traub, Julius Hülsmann, Tim Stullich, Sebastian Breß, Tilmann Rabl, Volker Markl
      • Scotty: Efficient Window Aggregation for out-of-order Stream Processing (ICDE 2018). Jonas Traub, Philipp Grulich, Alejandro Rodríguez Cuéllar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl
      • Efficient Window Aggregation with General Stream Slicing (EDBT 2019, Best Paper). Jonas Traub, Philipp Grulich, Alejandro Rodríguez Cuéllar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl
      • I²: Interactive Real-Time Visualization for Streaming Data (EDBT 2017, Best Demo). Jonas Traub, Nikolaas Steenbergen, Philipp M Grulich, Tilmann Rabl, Volker Markl
      Additional Contributions:
      • Cutty: Aggregate Sharing for User-Defined Windows (CIKM 2016). Paris Carbone, Jonas Traub, Asterios Katsifodimos, Seif Haridi, Volker Markl
      • Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing (EDBT 2018). Philipp Marian Grulich, René Saitenmacher, Jonas Traub, Sebastian Breß, Tilmann Rabl, Volker Markl
      • A Concept Drift based Approach for Predicting Event-Time Progress in Data Streams (EDBT 2019). Ahmed Awad, Jonas Traub, Sherif Sakr
      • Analyzing Efficient Stream Processing on Modern Hardware (PVLDB 2019). Steffen Zeuch, Bonaventura Del Monte, Jeyhun Karimov, Clemens Lutz, Manuel Renz, Jonas Traub, Sebastian Breß, Tilmann Rabl, Volker Markl
      • Efficient SIMD Vectorization for Hashing in OpenCL (EDBT 2018). Tobias Behrens, Viktor Rosenfeld, Jonas Traub, Sebastian Breß, Volker Markl
      • Resense: Transparent Record and Replay of Sensor Data in the Internet of Things (EDBT 2019). Dimitrios Giouroukis, Julius Hülsmann, Janis von Bleichert, Morgan Geldenhuys, Tim Stullich, Felipe Gutierrez, Jonas Traub, Kaustubh Beedkar, Volker Markl
      • Apache Flink in current research (it-Information Technology 2016). Tilmann Rabl, Jonas Traub, Asterios Katsifodimos, Volker Markl
      • Die Apache Flink Plattform zur parallelen Analyse von Datenströmen und Stapeldaten (LWA 2015). Jonas Traub, Tilmann Rabl, Fabian Hueske, Till Rohrmann, Volker Markl