Diese Präsentation wurde erfolgreich gemeldet.
Wir verwenden Ihre LinkedIn Profilangaben und Informationen zu Ihren Aktivitäten, um Anzeigen zu personalisieren und Ihnen relevantere Inhalte anzuzeigen. Sie können Ihre Anzeigeneinstellungen jederzeit ändern.

Grizzly: Efficient Stream Processing Through Adaptive Query Compilation

21 Aufrufe

Veröffentlicht am

Stream Processing Engines (SPEs) execute long-running queries on unbounded data streams. They follow an interpretation-based processing model and do not perform runtime
optimizations. This limits the utilization of modern hardware
and neglects changing data characteristics at runtime.
In this paper, we present Grizzly, a novel adaptive query
compilation-based SPE, to enable highly efficient query execution. We extend query compilation and task-based parallelization for the unique requirements of stream processing and apply adaptive compilation to enable runtime reoptimizations. The combination of light-weight statistic gathering with just-in-time compilation enables Grizzly to adjust
to changing data-characteristics dynamically at runtime. Our
experiments show that Grizzly outperforms state-of-the-art
SPEs by up to an order of magnitude in throughput.

Veröffentlicht in: Wissenschaft
  • Als Erste(r) kommentieren

Grizzly: Efficient Stream Processing Through Adaptive Query Compilation

  1. 1. Grizzly: Efficient Stream Processing Through Adaptive Query Compilation Philipp M. Grulich¹, Sebastian Breß², Steffen Zeuch¹², Jonas Traub¹, Janis von Bleichert¹, Zongxiong Chen², Tilmann Rabl³, Volker Markl¹² Technische Universität Berlin¹, DFKI GmbH², HPI & Universität Potsdam³ 1
  2. 2. Sigmod 2020, Grizzly: Efficient Stream Processing Through Adaptive Query Compilation, Grulich et al. Limitations of state-of-the-art SPEs Current SPEs use hardware resources inefficiently [Zeuch et al., Zhang et al.] 2
  3. 3. Sigmod 2020, Grizzly: Efficient Stream Processing Through Adaptive Query Compilation, Grulich et al. Limitations of state-of-the-art SPEs 3 1. Interpretation-based processing model causes poor cache utilization. Current SPEs use hardware resources inefficiently [Zeuch et al., Zhang et al.]
  4. 4. Sigmod 2020, Grizzly: Efficient Stream Processing Through Adaptive Query Compilation, Grulich et al. Limitations of state-of-the-art SPEs 4 1. Interpretation-based processing model causes poor cache utilization. 2. Upfront-Partitioning causes high overhead on single nodes. Current SPEs use hardware resources inefficiently [Zeuch et al., Zhang et al.]
  5. 5. Sigmod 2020, Grizzly: Efficient Stream Processing Through Adaptive Query Compilation, Grulich et al. Limitations of state-of-the-art SPEs 5 1. Interpretation-based processing model causes poor cache utilization. 2. Upfront-Partitioning causes high overhead on single nodes. 3. SPEs do not react to changing data-characteristics at runtime. Current SPEs use hardware resources inefficiently [Zeuch et al., Zhang et al.] Data Stream
  6. 6. Sigmod 2020, Grizzly: Efficient Stream Processing Through Adaptive Query Compilation, Grulich et al. Limitations of state-of-the-art SPEs 6 1. Interpretation-based processing model causes poor cache utilization. 2. Upfront-Partitioning causes high overhead on single nodes. 3. SPEs do not react to changing data-characteristics. An SPE should be hardware- and data-conscious. Current SPEs use hardware resources inefficiently [Zeuch et al., Zhang et al.]
  7. 7. Sigmod 2020, Grizzly: Efficient Stream Processing Through Adaptive Query Compilation, Grulich et al. Our Proposal Grizzly: Efficient Stream Processing Through Adaptive Query Compilation 7
  8. 8. Sigmod 2020, Grizzly: Efficient Stream Processing Through Adaptive Query Compilation, Grulich et al. Grizzly’s Core Principles Order Preserving Task-based Parallelization Continuous Adaptive Optimizations 8 Query Compilation for Stream Processing
  9. 9. Sigmod 2020, Grizzly: Efficient Stream Processing Through Adaptive Query Compilation, Grulich et al. Grizzly’s Core Principles Query Compilation for Stream Processing ● Fuses operators to compact code blocks. ● Support unique stream processing operators. 9 Order Preserving Task-based Parallelization Continuous Adaptive Optimizations
  10. 10. Sigmod 2020, Grizzly: Efficient Stream Processing Through Adaptive Query Compilation, Grulich et al. Query Compilation 10 From(user_purchases ) .filter(origin=’Germany’) .keyBy(userid) .windowBy(TumblingWindow(days(7)), Max(price).as(max_price)) .filter(max_price > 42)
  11. 11. Sigmod 2020, Grizzly: Efficient Stream Processing Through Adaptive Query Compilation, Grulich et al. Grizzly’s Core Principles Query Compilation for Stream Processing ● Fuses operators to compact code blocks. ● How to support combination of window assignment, function, and trigger? Order Preserving Task-based Parallelization ● Concurrent execution on a global state. ● Supporting order requirement of stream processing. ● Exploiting NUMA-configuration. 11
  12. 12. Sigmod 2020, Grizzly: Efficient Stream Processing Through Adaptive Query Compilation, Grulich et al. Task-based Parallelization 12 ● Input stream is processed in small batches (sized to network buffer). ● Pipelines are executed concurrently on a shared state.
  13. 13. Sigmod 2020, Grizzly: Efficient Stream Processing Through Adaptive Query Compilation, Grulich et al. Task-based Parallelization Lock-Free Window Processing ● Allows threads to process windows concurrently. ● Lightweight coordination for window triggering. NUMA-awareness ● Pre-aggregate window results on locally to minimize inter-NUMA node communication. 13
  14. 14. Sigmod 2020, Grizzly: Efficient Stream Processing Through Adaptive Query Compilation, Grulich et al. Grizzly’s Core Principles Order Preserving Task-based Parallelization Continuous Adaptive Optimizations ● Feedback loop between code-generation and query execution. ● Lightweight monitoring at runtime. 14 Query Compilation for Stream Processing
  15. 15. Sigmod 2020, Grizzly: Efficient Stream Processing Through Adaptive Query Compilation, Grulich et al. Adaptive Re-Optimization Generic Execution: ● Without data-dependent optimizations. 15 Instrumentalized Execution: ● Injects profiling code to collect statistics. (predicate selectivity, value distribution) Specialized Execution: ● Specialize operator implementation (predication, fixed hash-tables)
  16. 16. Sigmod 2020, Grizzly: Efficient Stream Processing Through Adaptive Query Compilation, Grulich et al. Adaptive Optimization 16 Deoptimization: ● Migrates from optimized to less optimized execution. ● Caused by violated assumptions or changed data characteristics.
  17. 17. Sigmod 2020, Grizzly: Efficient Stream Processing Through Adaptive Query Compilation, Grulich et al. Evaluation 17
  18. 18. Sigmod 2020, Grizzly: Efficient Stream Processing Through Adaptive Query Compilation, Grulich et al. Grizzly outperforms state-of-the-art SPEs by up-to 10x. Evaluation: System Comparison 18
  19. 19. Sigmod 2020, Grizzly: Efficient Stream Processing Through Adaptive Query Compilation, Grulich et al. Code generation is beneficial for a wide range of workloads. Evaluation: Workloads 19
  20. 20. Sigmod 2020, Grizzly: Efficient Stream Processing Through Adaptive Query Compilation, Grulich et al. Evaluation: Adaptive Optimizations Adaptive optimizations are crucial to reach peak performance. 20
  21. 21. Sigmod 2020, Grizzly: Efficient Stream Processing Through Adaptive Query Compilation, Grulich et al. Summary www.nebula.stream @NebulaStream Grizzly: ● Query compilation for stream processing. ● Task-based parallelization while taking ordering requirements into account. ● Adaptive optimization to reach to changing data characteristics. 21
  22. 22. Sigmod 2020, Grizzly: Efficient Stream Processing Through Adaptive Query Compilation, Grulich et al. Query Compilation 22
  23. 23. Sigmod 2020, Grizzly: Efficient Stream Processing Through Adaptive Query Compilation, Grulich et al. System Architecture 23

×