Diese Präsentation wurde erfolgreich gemeldet.
Wir verwenden Ihre LinkedIn Profilangaben und Informationen zu Ihren Aktivitäten, um Anzeigen zu personalisieren und Ihnen relevantere Inhalte anzuzeigen. Sie können Ihre Anzeigeneinstellungen jederzeit ändern.

Demand-based Data Stream Gathering, Processing, and Transmission - PhD Defense

16 Aufrufe

Veröffentlicht am

This slide set was presented at my PhD defense in March 2019.

The thesis can be fount at: https://depositonce.tu-berlin.de/bitstream/11303/10519/4/traub_jonas.pdf

Abstract:
The Internet of Things (IoT) consists of billions of devices which form a cloud of network connected sensor nodes. These sensor nodes supply a vast number of data streams with massive amounts of sensor data. Real-time sensor data enables diverse applications including traffic-aware navigation, machine monitoring, and home automation. Current stream processing pipelines are demand-oblivious, which means that they gather, transmit, and process as much data as possible. In contrast, a demand-based processing pipeline uses requirement specifications of data consumers, such as failure tolerances and latency limitations, to save resources. In this thesis, we present an end-to-end architecture for demand-based data stream gathering, processing, and transmission. Our solution unifies the way applications express their data demands, i.e., their requirements with respect to their input streams. This unification allows for multiplexing the data demands of all concurrently running applications. On sensor nodes, we schedule sensor reads based on the data demands of all applications, which saves up to 87% in sensor reads and data transfers in our experiments with real-world sensor data. Our demand-based control layer optimizes the data acquisition from thousands of sensors. We introduce time coherence as a fundamental data characteristic, which is the delay between the first and the last sensor read that contribute values to a tuple. A large scale parameter exploration shows that our solution scales to large numbers of sensors and operates reliably under varying latency and coherence constraints. On stream analysis systems, we tackle the problem of efficient window aggregation. We contribute a general aggregation technique, which adapts to four key workload characteristics: Stream (dis)order, aggregation types, window types, and window measures. Our experiments show that our solution outperforms alternative solutions by an order of magnitude in throughput, which prevents expensive system scale-out. We further derive data demands from visualization needs of applications and make these data demands available to streaming systems such as Apache Flink. This enables streaming systems to preprocess data with respect to changing visualization needs. Experiments show that our solution reliably prevents overloads when data rates increase.

Veröffentlicht in: Wissenschaft
  • Als Erste(r) kommentieren

  • Gehören Sie zu den Ersten, denen das gefällt!

Demand-based Data Stream Gathering, Processing, and Transmission - PhD Defense

  1. 1. Candidate: Jonas Traub PhD Defense, Berlin, 22.03.2019 Demand-based Data Stream Gathering, Processing, and Transmission Committee: Prof. Dr. Manfred Hauswirth Prof. Dr. Amr El Abbadi Prof. Dr. Volker Markl Prof. Dr. Albert Bifet
  2. 2. Thesis Goal Efficiently gather, transmit, and process sensor data from a large number of sensor nodes, and make it available to a large number of applications in real-time. 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 2
  3. 3. Thesis Goal Efficiently gather, transmit, and process sensor data from a large number of sensor nodes, and make it available to a large number of applications in real-time. Efficiency Scalability Timeliness 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 2
  4. 4. Stream Processing Pipelines A stream processing pipeline is a series of concurrently running operators. 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 3
  5. 5. Stream Processing Pipelines A stream processing pipeline is a series of concurrently running operators. 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 3
  6. 6. Stream Processing Pipelines A stream processing pipeline is a series of concurrently running operators. 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 3
  7. 7. Stream Processing Pipelines A stream processing pipeline is a series of concurrently running operators. Data Gathering 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 3
  8. 8. Stream Processing Pipelines A stream processing pipeline is a series of concurrently running operators. Data Gathering 53 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 3
  9. 9. Stream Processing Pipelines A stream processing pipeline is a series of concurrently running operators. Data Gathering Data Processing 53 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 3
  10. 10. Stream Processing Pipelines A stream processing pipeline is a series of concurrently running operators. Data Gathering Data Processing 8 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 3
  11. 11. Stream Processing Pipelines A stream processing pipeline is a series of concurrently running operators. Data Gathering Data Processing Data Transmission 8 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 3
  12. 12. Stream Processing Pipelines A stream processing pipeline is a series of concurrently running operators. Data Gathering Data Processing Data Transmission 8 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 3
  13. 13. Demand-oblivious Data Streaming in the Internet of Things s1 s2 s3 s4 s5 s6 … sN-1 sN Sensor Nodes 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 4
  14. 14. Demand-oblivious Data Streaming in the Internet of Things s1 s2 s3 s4 s5 s6 … sN-1 sN Sensor Nodes 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 4
  15. 15. Demand-oblivious Data Streaming in the Internet of Things s1 s2 s3 s4 s5 s6 … sN-1 sN Sensor Nodes Stream Analysis System 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 4
  16. 16. Demand-oblivious Data Streaming in the Internet of Things s1 s2 s3 s4 s5 s6 … sN-1 sN Sensor Nodes Stream Analysis System Front-End Applications 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 4
  17. 17. Demand-oblivious Data Streaming in the Internet of Things s1 s2 s3 s4 s5 s6 … sN-1 sN Sensor Nodes Stream Analysis System Front-End Applications Data Demand: Minimum number of data points which allows for providing a desired functionality. Definitions: 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 4
  18. 18. Demand-oblivious Data Streaming in the Internet of Things s1 s2 s3 s4 s5 s6 … sN-1 sN Sensor Nodes Stream Analysis System Front-End Applications Data Demand: Minimum number of data points which allows for providing a desired functionality. Demand-oblivious: Not considering the data demand of data consumers. Definitions: 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 4
  19. 19. Problems Demand-oblivious Data Streaming in the Internet of Things s1 s2 s3 s4 s5 s6 … sN-1 sN Sensor Nodes Stream Analysis System Front-End Applications Data Demand: Minimum number of data points which allows for providing a desired functionality. Demand-oblivious: Not considering the data demand of data consumers. Definitions: 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 4
  20. 20. Problems Demand-oblivious Data Streaming in the Internet of Things Combined Stream Bandwidth s1 s2 s3 s4 s5 s6 … sN-1 sN Sensor Nodes Stream Analysis System Front-End Applications Data Demand: Minimum number of data points which allows for providing a desired functionality. Demand-oblivious: Not considering the data demand of data consumers. Definitions: 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 4
  21. 21. Problems Demand-oblivious Data Streaming in the Internet of Things Parallel Network Connections Combined Stream Bandwidth s1 s2 s3 s4 s5 s6 … sN-1 sN Sensor Nodes Stream Analysis System Front-End Applications Data Demand: Minimum number of data points which allows for providing a desired functionality. Demand-oblivious: Not considering the data demand of data consumers. Definitions: 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 4
  22. 22. Problems Expensive Cluster Scale-Out Demand-oblivious Data Streaming in the Internet of Things Parallel Network Connections Combined Stream Bandwidth s1 s2 s3 s4 s5 s6 … sN-1 sN Sensor Nodes Stream Analysis System Front-End Applications Data Demand: Minimum number of data points which allows for providing a desired functionality. Demand-oblivious: Not considering the data demand of data consumers. Definitions: 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 4
  23. 23. Problems Expensive Cluster Scale-Out Demand-oblivious Data Streaming in the Internet of Things Parallel Network Connections Combined Stream Bandwidth Front-End Overload s1 s2 s3 s4 s5 s6 … sN-1 sN Sensor Nodes Stream Analysis System Front-End Applications Data Demand: Minimum number of data points which allows for providing a desired functionality. Demand-oblivious: Not considering the data demand of data consumers. Definitions: 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 4
  24. 24. Problems Expensive Cluster Scale-Out Demand-oblivious Data Streaming in the Internet of Things Parallel Network Connections Combined Stream Bandwidth Front-End Overload s1 s2 s3 s4 s5 s6 … sN-1 sN Sensor Nodes Stream Analysis System Front-End Applications Demand-oblivious Stream Processing Pipelines do not suite the IoT. Data Demand: Minimum number of data points which allows for providing a desired functionality. Demand-oblivious: Not considering the data demand of data consumers. Definitions: 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 4
  25. 25. Demand-based Data Streaming in the Internet of Things Stream Analysis System Front-End Applications s2 s5 … sN Sensor Nodes s1 s3 s4 s6 sN-1 Data Demand: Minimum number of data points which allows for providing a desired functionality. Demand-oblivious: Not considering the data demand of data consumers. Definitions: Demand-based: Utilize requirement specifications of data consumers to save resources. 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 5
  26. 26. Demand-based Data Streaming in the Internet of Things Stream Analysis System Front-End Applications s2 s5 … sN Sensor Nodes s1 s3 s4 s6 sN-1 Data Demand: Minimum number of data points which allows for providing a desired functionality. Demand-oblivious: Not considering the data demand of data consumers. Definitions: Demand-based: Utilize requirement specifications of data consumers to save resources. 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 5
  27. 27. Demand-based Data Streaming in the Internet of Things Stream Analysis System Front-End Applications s2 s5 … sN Sensor Nodes s1 s3 s4 s6 sN-1 Data Demand: Minimum number of data points which allows for providing a desired functionality. Demand-oblivious: Not considering the data demand of data consumers. Definitions: Demand-based: Utilize requirement specifications of data consumers to save resources. 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 5
  28. 28. Demand-based Data Streaming in the Internet of Things Stream Analysis System Front-End Applications s2 s5 … sN Sensor Nodes s1 s3 s4 s6 sN-1 Data Demand: Minimum number of data points which allows for providing a desired functionality. Demand-oblivious: Not considering the data demand of data consumers. Definitions: Demand-based: Utilize requirement specifications of data consumers to save resources. 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 5
  29. 29. Demand-based Data Streaming in the Internet of Things Stream Analysis System Front-End Applications s2 s5 … sN Sensor Nodes s1 s3 s4 s6 sN-1 Sensor Control System Data Demand: Minimum number of data points which allows for providing a desired functionality. Demand-oblivious: Not considering the data demand of data consumers. Definitions: Demand-based: Utilize requirement specifications of data consumers to save resources. 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 5
  30. 30. Demand-based Data Streaming in the Internet of Things Stream Analysis System Front-End Applications s2 s5 … sN Sensor Nodes s1 s3 s4 s6 sN-1 Sensor Control System Data Demand: Minimum number of data points which allows for providing a desired functionality. Demand-oblivious: Not considering the data demand of data consumers. Definitions: Demand-based: Utilize requirement specifications of data consumers to save resources. 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 5
  31. 31. Demand-based Data Streaming in the Internet of Things Stream Analysis System Front-End Applications s2 s5 … sN Sensor Nodes s1 s3 s4 s6 sN-1 Sensor Control System Data Demand: Minimum number of data points which allows for providing a desired functionality. Demand-oblivious: Not considering the data demand of data consumers. Definitions: Demand-based: Utilize requirement specifications of data consumers to save resources. 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 5
  32. 32. Demand-based Data Streaming in the Internet of Things Stream Analysis System Front-End Applications s2 s5 … sN Sensor Nodes s1 s3 s4 s6 sN-1 Sensor Control System Data Demand: Minimum number of data points which allows for providing a desired functionality. Demand-oblivious: Not considering the data demand of data consumers. Definitions: Demand-based: Utilize requirement specifications of data consumers to save resources. 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 5
  33. 33. Demand-based Data Streaming in the Internet of Things Stream Analysis System Front-End Applications s2 s5 sN Sensor Nodes s1 s3 s4 s6 sN-1 Sensor Control System Data Demand: Minimum number of data points which allows for providing a desired functionality. Demand-oblivious: Not considering the data demand of data consumers. Definitions: Demand-based: Utilize requirement specifications of data consumers to save resources. 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 5
  34. 34. Demand-based Data Streaming in the Internet of Things Stream Analysis System Front-End Applications s2 s5 sN Sensor Nodes s1 s3 s4 s6 sN-1 Sensor Control System Data Demand: Minimum number of data points which allows for providing a desired functionality. Demand-oblivious: Not considering the data demand of data consumers. Definitions: Demand-based: Utilize requirement specifications of data consumers to save resources. Solution We enable demand-based optimizations by introducing control interfaces for expressing data demands. 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 5
  35. 35. End-to-End Architecture 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 6
  36. 36. Optimized On-Demand Data Streaming from Sensor Nodes Chapter 2: Scope of Chapter 2: Read Scheduling on Sensor Nodes 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 7
  37. 37. Sensor Read Scheduling 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 8
  38. 38. User-Defined Sampling Functions Read time tolerance Desired read time Penalty function 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 9
  39. 39. Sensor Read Fusion Read time tolerance Desired read time Penalty function 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 10
  40. 40. Sensor Read Fusion Read time tolerance Desired read time Penalty function 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 11
  41. 41. Local Filtering Read time tolerance Desired read time Penalty function 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 12
  42. 42. Local Filtering Read time tolerance Desired read time Penalty function Our scheduler minimizes the number of sensor reads and data transmissions based on the joint data demand of all data consumers. 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 12
  43. 43. Read Scheduling Example Read time tolerance Desired read time Penalty function Optimal time frame for read sharing Sensor read time Symbol Key Sum of penalty functions 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 13
  44. 44. Read Scheduling Example Read time tolerance Desired read time Penalty function Optimal time frame for read sharing Sensor read time Symbol Key 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 14
  45. 45. Increasing the number of concurrent queries 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 15
  46. 46. Increasing the number of concurrent queries 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 15 On-Demand scheduling reduces sensor reads and data transfer by up to 87%. The # of reads and transfers increases sub-linearly with the # of queries.
  47. 47. Conclusion of Chapter 2 • We introduced user-defined sampling functions (UDSFs). • We contributed a multi-query read scheduling algorithm. • We optimize read times based on given read time tolerances. • We experimentally and analytically validated our approach. 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 16
  48. 48. Conclusion of Chapter 2 • We introduced user-defined sampling functions (UDSFs). • We contributed a multi-query read scheduling algorithm. • We optimize read times based on given read time tolerances. • We experimentally and analytically validated our approach. Optimized On-Demand Data Streaming from Sensor Nodes Jonas Traub, Sebastian Breß, Tilmann Rabl, Asterios Katsifodimos, and Volker Markl. ACM Symposium on Cloud Computing 2017 (SoCC '17) Full Paper: 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 16
  49. 49. Scalable Data Acquisition with Guaranteed Time Coherence Chapter 3: Scope of Chapter 3: The Sensor Control Layer 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 17
  50. 50. Scalable Data Acquisition with Guaranteed Time Coherence Chapter 3: Scope of Chapter 3: The Sensor Control Layer 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 17
  51. 51. Architecture: Central Join Topology s1 s2 s3 s4 … sN Sensor Nodes (t3,v3) (t,v1,v2, …, vN) Central Join 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 18
  52. 52. Architecture: Central Join Topology s1 s2 s3 s4 … sN Sensor Nodes (t3,v3) (t,v1,v2, …, vN) Central Join 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 18 Central Stream Joins do not scale to thousands of input streams.
  53. 53. s1 s2 s3 s4 Architecture: Sensing Loop Topology s1 s2 s3 s4 Sensor Nodes 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 19
  54. 54. s1 s2 s3 s4 Architecture: Sensing Loop Topology s1 s2 s3 s4 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 19
  55. 55. s1 s2 s3 s4 Architecture: Sensing Loop Topology s1 s2 s3 s4Loop Node (t) 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 19
  56. 56. s1 s2 s3 s4 Architecture: Sensing Loop Topology s1 s2 s3 s4 (t,v1) Loop Node (t) 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 19
  57. 57. s1 s2 s3 s4 Architecture: Sensing Loop Topology s1 s2 s3 s4 (t,v1,v2)(t,v1) (t,v1,v2,v3) (t,v1,v2,v3,v4) Loop Node (t) 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 19
  58. 58. s1 s2 s3 s4 Architecture: Sensing Loop Topology s1 s2 s3 s4 (t,v1,v2)(t,v1) (t,v1,v2,v3) (t,v1,v2,v3,v4)(t) Loop Node 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 19
  59. 59. s1 s2 s3 s4 Architecture: Sensing Loop Topology s1 s2 s3 s4 (t,v1,v2)(t,v1) (t,v1,v2,v3) (t,v1,v2,v3,v4)(t) Loop Node s5 s6 s7 s8 (t,v5,v6)(t,v5) (t,v5,v6,v7) (t,v5,v6,v7,v8)(t) 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 19
  60. 60. s1 s2 s3 s4 Architecture: Sensing Loop Topology s1 s2 s3 s4 (t,v1,v2)(t,v1) (t,v1,v2,v3) (t,v1,v2,v3,v4)(t) Loop Node s5 s6 s7 s8 (t,v5,v6)(t,v5) (t,v5,v6,v7) (t,v5,v6,v7,v8)(t) 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 19
  61. 61. s1 s2 s3 s4 Architecture: Sensing Loop Topology s1 s2 s3 s4 (t,v1,v2)(t,v1) (t,v1,v2,v3) (t,v1,v2,v3,v4)(t) Loop Node s5 s6 s7 s8 (t,v5,v6)(t,v5) (t,v5,v6,v7) (t,v5,v6,v7,v8)(t) 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 19 We dynamically combine sensing loops and central joins to solve scalability challenges.
  62. 62. Architecture: Sensor Node Internals 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 20
  63. 63. Architecture: Sensor Node Internals 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 20
  64. 64. Architecture: Sensor Node Internals 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 20
  65. 65. Architecture: Sensor Node Internals 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 20
  66. 66. Architecture: Sensor Node Internals 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 20
  67. 67. Architecture: Sensor Node Internals 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 20
  68. 68. Architecture: Sensor Node Internals 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 20
  69. 69. Architecture: Sensor Node Internals 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 20
  70. 70. Architecture: Sensor Node Internals 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 20
  71. 71. Time Coherence of Data Tuples 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 21
  72. 72. Time Coherence of Data Tuples 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 21 Definition of Time Coherence: A tuple 𝑡, 𝑣1, 𝑣2,⋯ , 𝑣 𝑁 is time coherent if all its values 𝑣1, 𝑣2,⋯ , 𝑣 𝑁 were read exactly at time 𝑡.
  73. 73. Time Coherence of Data Tuples 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 21 Definition of Time Coherence: A tuple 𝑡, 𝑣1, 𝑣2,⋯ , 𝑣 𝑁 is time coherent if all its values 𝑣1, 𝑣2,⋯ , 𝑣 𝑁 were read exactly at time 𝑡.
  74. 74. Time Coherence of Data Tuples 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 21 Definition of Time Coherence: A tuple 𝑡, 𝑣1, 𝑣2,⋯ , 𝑣 𝑁 is time coherent if all its values 𝑣1, 𝑣2,⋯ , 𝑣 𝑁 were read exactly at time 𝑡.
  75. 75. Time Coherence of Data Tuples 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 21 Definition of Time Coherence: A tuple 𝑡, 𝑣1, 𝑣2,⋯ , 𝑣 𝑁 is time coherent if all its values 𝑣1, 𝑣2,⋯ , 𝑣 𝑁 were read exactly at time 𝑡.
  76. 76. Time Coherence of Data Tuples 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 21 Definition of Time Coherence: A tuple 𝑡, 𝑣1, 𝑣2,⋯ , 𝑣 𝑁 is time coherent if all its values 𝑣1, 𝑣2,⋯ , 𝑣 𝑁 were read exactly at time 𝑡.
  77. 77. Time Coherence of Data Tuples 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 21 Definition of Time Coherence: A tuple 𝑡, 𝑣1, 𝑣2,⋯ , 𝑣 𝑁 is time coherent if all its values 𝑣1, 𝑣2,⋯ , 𝑣 𝑁 were read exactly at time 𝑡.
  78. 78. Time Coherence of Data Tuples 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 21 Definition of Time Coherence: A tuple 𝑡, 𝑣1, 𝑣2,⋯ , 𝑣 𝑁 is time coherent if all its values 𝑣1, 𝑣2,⋯ , 𝑣 𝑁 were read exactly at time 𝑡. How can we measure, optimize, and guarantee time coherency?
  79. 79. Coherence Estimate (Ce) Coherence Guarantee (Cg) Time Coherence Measures 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 22
  80. 80. Coherence Estimate (Ce) Coherence Guarantee (Cg) Time Coherence Measures Based on Untrusted Clocks 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 22
  81. 81. Coherence Estimate (Ce) Coherence Guarantee (Cg) Time Coherence Measures Based on Trusted ClocksBased on Untrusted Clocks 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 22
  82. 82. Coherence Estimate (Ce) Coherence Guarantee (Cg) Time Coherence Measures Based on Trusted ClocksBased on Untrusted Clocks 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 22
  83. 83. Coherence Estimate (Ce) Coherence Guarantee (Cg) Time Coherence Measures Based on Trusted ClocksBased on Untrusted Clocks 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 22
  84. 84. Coherence Estimate (Ce) Coherence Guarantee (Cg) Time Coherence Measures Based on Trusted ClocksBased on Untrusted Clocks 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 22 Best Case: Ce=0 Best Case: Cg = round trip time
  85. 85. Coherence Estimate (Ce) Coherence Guarantee (Cg) Time Coherence Optimization Based on Trusted ClocksBased on Untrusted Clocks 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 23 Best Case: Ce=0 Best Case: Cg = round trip time
  86. 86. Coherence Estimate (Ce) Coherence Guarantee (Cg) Time Coherence Optimization Based on Trusted ClocksBased on Untrusted Clocks 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 23 Best Case: Ce=0 Best Case: Cg = round trip time
  87. 87. Coherence Estimate (Ce) Coherence Guarantee (Cg) Time Coherence Optimization Based on Trusted ClocksBased on Untrusted Clocks 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 23 If untrusted clocks are correct, data quality will be better if you just trust them. Best Case: Ce=0 Best Case: Cg = round trip time
  88. 88. Coherence Estimate (Ce) Coherence Guarantee (Cg) Time Coherence Optimization Based on Trusted ClocksBased on Untrusted Clocks 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 23 If untrusted clocks are correct, data quality will be better if you just trust them. I can‘t rely on untrusted clocks Best Case: Ce=0 Best Case: Cg = round trip time
  89. 89. Coherence Estimate (Ce) Coherence Guarantee (Cg) Time Coherence Optimization Based on Trusted ClocksBased on Untrusted Clocks 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 23 If untrusted clocks are correct, data quality will be better if you just trust them. I can‘t rely on untrusted clocks If untrusted clocks are wrong, data quality is ensured iff you don‘t trust them. Best Case: Ce=0 Best Case: Cg = round trip time
  90. 90. Coherence Estimate (Ce) Coherence Guarantee (Cg) Time Coherence Optimization Based on Trusted ClocksBased on Untrusted Clocks 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 23 If untrusted clocks are correct, data quality will be better if you just trust them. I can‘t rely on untrusted clocks If untrusted clocks are wrong, data quality is ensured iff you don‘t trust them. Best Case: Ce=0 Best Case: Cg = round trip time
  91. 91. Coherence Estimate (Ce) Coherence Guarantee (Cg) Time Coherence Optimization Based on Trusted ClocksBased on Untrusted Clocks 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 23 If untrusted clocks are correct, data quality will be better if you just trust them. I can‘t rely on untrusted clocks If untrusted clocks are wrong, data quality is ensured iff you don‘t trust them. Dmax Best Case: Ce=0 Best Case: Cg = round trip time
  92. 92. Example of Coherence Optimization 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 24 Coherence Estimate (Ce) Coherence Guarantee (Cg) Round Trip Time Dmax (Desired upper limit for Cg) Symbol Key
  93. 93. Large Scale Parameter Exploration 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 25
  94. 94. Large Scale Parameter Exploration 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 25 Our optimization works for arbitrary coherence requirements and large numbers of sensor nodes.
  95. 95. Conclusion of Chapter 3 • We present an architecture for acquiring values from large numbers of sensors with guaranteed time coherence and low latency. • We introduce time coherence as a fundamental data characteristic. • We optimize the coherence of tuples and provide coherence guarantees. 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 26
  96. 96. Conclusion of Chapter 3 • We present an architecture for acquiring values from large numbers of sensors with guaranteed time coherence and low latency. • We introduce time coherence as a fundamental data characteristic. • We optimize the coherence of tuples and provide coherence guarantees. 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 26 SENSE: Scalable Data Acquisition from Distributed Sensors with Guaranteed Time Coherence. Jonas Traub, Julius Hülsmann, Tim Stullich, Sebastian Breß, Tilmann Rabl, and Volker Markl. (Under Submission) Full Paper:
  97. 97. Efficient Window Aggregation with General Stream Slicing Chapter 4: Scope of Chapter 4: Flexible Stream Discretization and Efficient Window Aggregation in Stream Analysis Systems 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 27
  98. 98. Efficient Window Aggregation with General Stream Slicing Chapter 4: Scope of Chapter 4: Flexible Stream Discretization and Efficient Window Aggregation in Stream Analysis Systems 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 27
  99. 99. Stream Slicing Example 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 28
  100. 100. Stream Slicing Example 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 29
  101. 101. The number of slices depends on data consumers. → Demand-based memory requirement. Stream Slicing Example 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 29
  102. 102. Stream Slicing Example 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 30
  103. 103. Stream Slicing Example 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 31
  104. 104. Stream Slicing Example 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 32
  105. 105. Stream Slicing Example 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 33
  106. 106. We store partial aggregates instead of all tuples. → Small memory footprint. Stream Slicing Example 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 33
  107. 107. Stream Slicing Example 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 34
  108. 108. We assign each tuple to exactly one slice. → O(1) per-tuple complexity. Stream Slicing Example 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 34
  109. 109. Stream Slicing Example 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 35
  110. 110. We require just a few computation steps to calculate final aggregates. → Low latency. Stream Slicing Example 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 35
  111. 111. Stream Slicing Example 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 36
  112. 112. We share partial aggregations among all users and queries. → Efficiency by preventing redundancy. Stream Slicing Example 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 36
  113. 113. General Stream Slicing 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 37
  114. 114. General Stream Slicing Workload Characteristics 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 37
  115. 115. General Stream Slicing Workload Characteristics Aggregation Functions distributive algebraic holistic associativity cummutativity invertibility 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 37
  116. 116. General Stream Slicing Workload Characteristics Window Types Context Free Forward Context Free Forward Context Aware Aggregation Functions distributive algebraic holistic associativity cummutativity invertibility 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 37
  117. 117. General Stream Slicing Workload Characteristics Window Types Context Free Forward Context Free Forward Context Aware Window Measures time tuple count arbitrary Aggregation Functions distributive algebraic holistic associativity cummutativity invertibility 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 37
  118. 118. General Stream Slicing Workload Characteristics Window Types Context Free Forward Context Free Forward Context Aware Stream Order in-order out-of-order Window Measures time tuple count arbitrary Aggregation Functions distributive algebraic holistic associativity cummutativity invertibility 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 37
  119. 119. General Stream Slicing Workload Characteristics Window Types Context Free Forward Context Free Forward Context Aware Stream Order in-order out-of-order Window Measures time tuple count arbitrary Aggregation Functions distributive algebraic holistic associativity cummutativity invertibility 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 37 General Stream Slicing combines generality and efficiency in a single solution.
  120. 120. General Stream Slicing Internals 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 38
  121. 121. General Stream Slicing Internals 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 38 Part 1: Three Fundamental Operations on Slices
  122. 122. General Stream Slicing Internals 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 38 Merge Slices Part 1: Three Fundamental Operations on Slices
  123. 123. General Stream Slicing Internals 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 38 Merge Slices Split Slices Part 1: Three Fundamental Operations on Slices
  124. 124. General Stream Slicing Internals 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 38 Merge Slices Split Slices Update Slices Part 1: Three Fundamental Operations on Slices
  125. 125. General Stream Slicing Internals 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 38 Merge Slices Split Slices Update Slices Part 1: Three Fundamental Operations on Slices Part 2: Adapt to Workload Characteristics:
  126. 126. General Stream Slicing Internals 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 38 Merge Slices Split Slices Update Slices Part 1: Three Fundamental Operations on Slices Part 2: Adapt to Workload Characteristics: Do we need to store original tuples?
  127. 127. General Stream Slicing Internals 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 38 Merge Slices Split Slices Update Slices Part 1: Three Fundamental Operations on Slices Part 2: Adapt to Workload Characteristics: Do we need to store original tuples? Do we potentially need to split slices?
  128. 128. General Stream Slicing Internals 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 38 Merge Slices Split Slices Update Slices Part 1: Three Fundamental Operations on Slices Part 2: Adapt to Workload Characteristics: Do we need to store original tuples? Do we potentially need to split slices? Do we potentially need to remove tuples from slices?
  129. 129. General Stream Slicing Internals 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 38 Merge Slices Split Slices Update Slices Part 1: Three Fundamental Operations on Slices Part 2: Adapt to Workload Characteristics: Do we need to store original tuples? Do we potentially need to split slices? Do we potentially need to remove tuples from slices?
  130. 130. General Stream Slicing Internals 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 38 Merge Slices Split Slices Update Slices Part 1: Three Fundamental Operations on Slices Part 2: Adapt to Workload Characteristics: Do we need to store original tuples? Do we potentially need to split slices? Do we potentially need to remove tuples from slices? General Stream Slicing adapts to current workload characteristics.
  131. 131. Performance Evaluation with 20% out-of-order Tuples 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 39
  132. 132. Conclusion of Chapter 4 • We identify workload characteristics which impact applicability and performance of window aggregation techniques. • We contribute the General Stream Slicing technique. • We show that general stream slicing is generally applicable and offers better performance than alternative approaches. 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 40
  133. 133. Conclusion of Chapter 4 • We identify workload characteristics which impact applicability and performance of window aggregation techniques. • We contribute the General Stream Slicing technique. • We show that general stream slicing is generally applicable and offers better performance than alternative approaches. Efficient Window Aggregation with General Stream Slicing Jonas Traub, Philipp Grulich, Alejandro Rodríguez Cuéllar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl. International Conference on Extending Database Technology (EDBT), 2019. Full Paper: Scotty: Efficient Window Aggregation for out-of-order Stream Processing Jonas Traub, Philipp Grulich, Alejandro Rodríguez Cuéllar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl. IEEE International Conference on Data Engineering (ICDE), 2018. Short Paper: Open Source: https://tu-berlin-dima.github.io/scotty-window-processor/ 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 40
  134. 134. Interactive Real-Time Visualization for Streaming Data Chapter 5: Scope of Chapter 5: Connecting Stream Analysis Systems and Front-End Applications 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 41
  135. 135. Interactive Real-Time Visualization for Streaming Data Chapter 5: Scope of Chapter 5: Connecting Stream Analysis Systems and Front-End Applications 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 41
  136. 136. Demonstration Data 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 42 The DEBS 2013 Grand Challenge ChristopherMutschler, Holger Ziekow, ZbigniewJerzak Data: ● Sensor data from a football match (speed, acceleration, and position of the ball and players) ● Up to 2000 Hz frequency ● roughly 12.000 data points per second
  137. 137. Conclusion of Chapter 5 • We connect visualization front-ends with stream analysis clusters to enable demand-based optimizations. • We show that our solution significantly reduces the amount of processed and transferred data, while still providing loss-free visualizations. • We provide an algorithm for the live visualization of time series in line charts. 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 44
  138. 138. Conclusion of Chapter 5 • We connect visualization front-ends with stream analysis clusters to enable demand-based optimizations. • We show that our solution significantly reduces the amount of processed and transferred data, while still providing loss-free visualizations. • We provide an algorithm for the live visualization of time series in line charts. 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 44 I²: Interactive Real-Time Visualization for Streaming Data Jonas Traub, Nikolaas Steenbergen, Philipp M Grulich, Tilmann Rabl, and Volker Markl. International Conference on Extending Database Technology (EDBT), 2017. Demonstration: Open Source: https://tu-berlin-dima.github.io/i2
  139. 139. Conclusion 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 45
  140. 140. Additional Contributions: • Cutty: Aggregate Sharing for User-Defined Windows. (CIKM 2016) Paris Carbone, Jonas Traub, Asterios Katsifodimos, Seif Haridi, Volker Markl • Scalable Detection of Concept Drifts on Data Streams with Parallel Adaptive Windowing. (EDBT 2018) Philipp Marian Grulich, René Saitenmacher, Jonas Traub, Sebastian Breß,Tilmann Rabl, Volker Markl • A Concept Drift based Approach for Predicting Event-Time Progress in Data Streams (EDBT 2019) Ahmed Awad, Jonas Traub, Sherif Sakr • Analyzing Efficient Stream Processing on Modern Hardware (PVLDB 2019) Steffen Zeuch, Bonaventura Del Monte, Jeyhun Karimov, Clemens Lutz, Manuel Renz, Jonas Traub, Sebastian Breß, Tilmann Rabl, Volker Markl • Efficient SIMD Vectorization for Hashing in OpenCL (EDBT 2018) Tobias Behrens, Viktor Rosenfeld, Jonas Traub, Sebastian Breß, Volker Markl • Resense: Transparent Record and Replay of Sensor Data in the Internet of Things (EDBT 2019) Dimitrios Giouroukis, Julius Hülsmann, Janis von Bleichert, Morgan Geldenhuys, Tim Stullich, Felipe Gutierrez, Jonas Traub, Kaustubh Beedkar, Volker Markl • Apache Flink in current research (it-Information Technology 2016) Tilmann Rabl, Jonas Traub, Asterios Katsifodimos, and Volker Markl • Die Apache Flink Plattform zur parallelen Analyse von Datenströmen und Stapeldaten (LWA 2015) Jonas Traub, Tilmann Rabl, Fabian Hueske, Till Rohrmann, Volker Markl 22.03.2019 Jonas Traub - Demand-based Data Stream Gathering, Processing, and Transmission 46 List of Publications PhD Thesis Publications: • Optimized On-Demand Data Streaming from Sensor Nodes (SoCC 2017) Jonas Traub, Sebastian Breß, Tilmann Rabl, Asterios Katsifodimos, Volker Markl • SENSE: Scalable Data Acquisition from Distributed Sensors with Guaranteed Time Coherence. Jonas Traub, Julius Hülsmann, Tim Stullich, Sebastian Breß, Tilmann Rabl, and Volker Markl • Scotty: Efficient Window Aggregation for out-of-order Stream Processing (ICDE 2018) Jonas Traub, Philipp Grulich, Alejandro Rodríguez Cuéllar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl • Efficient Window Aggregation with General Stream Slicing (EDBT 2019, Best Paper) Jonas Traub, Philipp Grulich, Alejandro Rodríguez Cuéllar, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl • I²: Interactive Real-Time Visualization for Streaming Data (EDBT 2017, Best Demo) Jonas Traub, Nikolaas Steenbergen, Philipp M Grulich, Tilmann Rabl, Volker Markl

×