Database Research at TU Berlin DIMA and DFKI IAM - USA Excursion Slides 2019

Database Research at TU Berlin
Today's Talks:
• Jonas Traub: Optimized On-Demand Data Streaming from Sensor Nodes. ACM Symposium on Cloud Computing (SoCC), 2017.
• Sebastian Breß: Generating Custom Code for Efficient Query Execution on Heterogeneous Processors. The VLDB Journal, 27(6), 2018.
• Martin Kiefer: Estimating Join Selectivities using Bandwidth-Optimized Kernel Density Models. Proceedings of the VLDB Endowment (PVLDB), 2017.
• Andreas Kunft: BlockJoin: Efficient Matrix Partitioning Through Joins. Proceedings of the VLDB Endowment (PVLDB), 2017.
Database Systems and Information Management Group (DIMA) of Volker Markl

Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ’17
Optimized On-Demand Data
Streaming from Sensor Nodes
Jonas Traub, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl
ACM Symposium on Cloud Computing (SoCC), 2017

The Sensor Cloud
Real-time
insights
Billions of sensor nodes form a sensor cloud
and provide data streams to analysis systems.

The Sensor Cloud – Problems
Real-time
insights
Streaming all data from billions of sensors to all applications at maximal frequency is impossible.
Increasing data rates require expensive system scale-out.

The Sensor Cloud – Solutions
Tailor Data Streams to the Demand of Applications
• User-Defined Sampling Functions (UDSFs): Provide an abstraction to define the data demand of applications.
• Read-Time Optimization: Optimize communication costs while maintaining the result accuracy.
• Multi-Query / Multi-User Optimization: Share sensor reads and data transfer among users and queries.

Architecture Overview

Sensor Read Scheduling

User-Defined Sampling Functions
Input: Sensor read time and value
Output: Next sensor read request
Enable adaptive sampling techniques to reduce data transmission,
e.g., Adam [Trihinas ’15], FAST [Fan ’14], L-SIP [Gaura ’13]
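To make the input/output contract of a user-defined sampling function concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the interface from the paper: the names `ReadRequest` and `make_adaptive_udsf`, the representation of a read request as an `[earliest, latest]` time tolerance, and the halving/doubling policy are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ReadRequest:
    earliest: float  # earliest acceptable time for the next read (assumed field)
    latest: float    # latest acceptable time for the next read (assumed field)


def make_adaptive_udsf(min_interval=1.0, max_interval=16.0,
                       threshold=0.5, slack=0.25):
    """Build a sampling function: (read_time, value) -> next ReadRequest.

    Hypothetical policy: halve the sampling interval when the value changed
    by more than `threshold` since the last read, otherwise double it.
    """
    state = {"last_value": None, "interval": min_interval}

    def udsf(read_time, value):
        last = state["last_value"]
        if last is not None:
            if abs(value - last) > threshold:
                state["interval"] = max(min_interval, state["interval"] / 2)
            else:
                state["interval"] = min(max_interval, state["interval"] * 2)
        state["last_value"] = value
        target = read_time + state["interval"]
        # Allow the scheduler to read a little earlier than the target time.
        return ReadRequest(earliest=target - slack * state["interval"],
                           latest=target)

    return udsf
```

Calling the returned function after every sensor read yields the next request; a stable signal stretches the interval, a changing signal shrinks it.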

Sensor Read Fusion
1) Minimize Sensor Reads and Data Transfer:
Latest possible read time
2) Optimize Sensor Read Times:
● Check the paper for all details on the read time optimizer!
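The "latest possible read time" idea can be sketched as a greedy interval-covering routine. This is only an illustration: it assumes every pending read request is a tolerance interval `(earliest, latest)`, and `fuse_reads` is a hypothetical name. The read time optimizer described in the paper covers more than this.

```python
def fuse_reads(requests):
    """Pick a small set of read times that serves every request.

    requests: list of (earliest, latest) tolerance intervals (assumed format).
    Greedy rule: process requests by deadline and schedule a read at the
    latest possible time of the first request that is not yet served.
    """
    read_times = []
    for earliest, latest in sorted(requests, key=lambda r: r[1]):
        # The most recent read serves this request if it falls in its interval.
        if not read_times or earliest > read_times[-1]:
            read_times.append(latest)
    return read_times
```

Delaying each read to the deadline of the most urgent request lets as many later requests as possible share that one read.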

Read Execution

Local Filtering
● Enable adaptive filtering in combination with adaptive sampling
● Enable model-driven data acquisition

Increasing the Number of Concurrent Queries
• On-demand scheduling reduces sensor reads and data transfer by up to 87%.
• The number of reads and transfers increases sub-linearly with the number of queries.
[Figure: independent queries vs. on-demand scheduling]

Further Publications on Data Streams and Sensor Data:
• Optimized On-Demand Data Streaming from Sensor Nodes. Jonas Traub, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl. ACM Symposium on Cloud Computing (SoCC), 2017
• Efficient Window Aggregation with General Stream Slicing. EDBT 2019
• I²: Interactive Real-Time Visualization for Streaming Data. EDBT 2017
• Resense: Transparent Record and Replay of Sensor Data in the Internet of Things. EDBT 2019

Generating Custom Code for Efficient Query
Execution on Heterogeneous Processors
Sebastian Breß, Bastian Köcher, Henning Funke, Steffen Zeuch, Tilmann Rabl, Volker Markl
VLDB Journal, 27(6), 797-822, 2018

Heterogeneous Processors
CPUs MICs GPUs
Goal: Enable databases to automatically exploit heterogeneous processors

Problem and Key Ideas
Problem: Writing efficient code for different processors is costly and error-prone
Key Idea 1: Generate custom code for each query and processor
Key Idea 2: Identify efficient code modifications and parameters automatically

Challenges
Intermediate Representation: Represent code modifications in the query plan
Variant Optimization: Select efficient parameters and code modifications
Code Generation: Generate hardware-tailored code

Hawk Code Generator
Alternative Execution Engine: No changes to SQL parser and optimizer
Multi-Processor Support: Execute queries on CPUs/GPUs/MICs
Automatic Performance Optimization: Tunes code and parameters to processors

Step 1: Query Segmentation
[Figure: SQL query]

Step 2: Select Processor-Specific Code Variants
Pipeline program → Variant Optimizer → Optimized Pipeline Programs

Step 3: Generate Target Code
Optimized Pipeline Programs → Code Generator → Target Code

Pipeline Program IR
SQL Query:
SELECT id, age
FROM person
WHERE age < 25;

Pipeline Program:
LOOP(person)
FILTER(age<25)
HASH_PUT(id)
PROJECT(id, age)
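A pipeline program is a flat list of operations applied to each tuple. The following Python sketch interprets such a list to show the data flow; it is not how Hawk works (Hawk generates and compiles code rather than interpreting), and the tuple encoding, the `run_pipeline` name, and the reading of HASH_PUT as "insert the column value into a hash set" are assumptions made for illustration.

```python
def run_pipeline(program, tables):
    """Interpret a pipeline program over row dictionaries (illustrative only)."""
    first_op, table_name = program[0]
    if first_op != "LOOP":
        raise ValueError("a pipeline program must start with LOOP")
    hash_table = set()
    output = []
    for row in tables[table_name]:
        for op, arg in program[1:]:
            if op == "FILTER":
                if not arg(row):
                    break  # tuple rejected, skip the remaining operations
            elif op == "HASH_PUT":
                hash_table.add(row[arg])
            elif op == "PROJECT":
                output.append(tuple(row[col] for col in arg))
            else:
                raise ValueError(f"unknown operation: {op}")
    return output, hash_table


# The pipeline program from the slide, in the assumed tuple encoding.
PROGRAM = [
    ("LOOP", "person"),
    ("FILTER", lambda row: row["age"] < 25),
    ("HASH_PUT", "id"),
    ("PROJECT", ("id", "age")),
]
```

Because every operation is a separate entry, a modification such as a different predication mode or hash table implementation can be attached to exactly one entry without touching the rest.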

Pipeline Program IR: Modifications
LOOP(table)
FILTER(age<25)
HASH_PUT(id)
PROJECT(id, age)
Modifications: Memory Access Pattern, Predication Mode, Hash Table Implementation, Parallelization Strategy

Pipeline Program IR: Modifications (2)
LOOP(table) → LOOP(table, sequential)
FILTER(age<25) → FILTER(age<25, branched)
HASH_PUT(id) → HASH_PUT(id, linear_probing)
PROJECT(id, age) → PROJECT(id, age, single-pass)

Generating Code: Sequential Memory Access
// Each thread scans one contiguous chunk of rows.
int thread_id = get_thread_id();
int start = start_idx(thread_id, num_rows);
int end = end_idx(thread_id, num_rows);
for (int tid = start; tid < end; tid += 1) {
    if (age[tid] < 25) {
        OUTPUT(id[tid], age[tid]);
    }
}

Memory Access Patterns

Pipeline Program IR: Rewrite
LOOP(table, coalesced)
Generating Code: Coalesced Memory Access
// Threads interleave: thread k handles rows k, k + num_threads, ...
int thread_id = get_thread_id();
int num_threads = get_num_threads();
for (int tid = thread_id; tid < num_rows; tid += num_threads) {
    if (age[tid] < 25) {
        OUTPUT(id[tid], age[tid]);
    }
}
Pipeline programs provide fine-grained control over generated code
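The difference between the two generated loops is only which row indices each thread visits. A small Python sketch makes the two index patterns visible; the even chunking used in `sequential_rows` is an assumption, since the slides do not show how `start_idx` and `end_idx` are defined.

```python
def sequential_rows(thread_id, num_threads, num_rows):
    """Rows visited by one thread when each thread owns a contiguous chunk."""
    chunk = (num_rows + num_threads - 1) // num_threads  # assumed even split
    start = min(thread_id * chunk, num_rows)
    end = min(start + chunk, num_rows)
    return list(range(start, end))


def coalesced_rows(thread_id, num_threads, num_rows):
    """Rows visited by one thread when threads interleave with a fixed stride."""
    return list(range(thread_id, num_rows, num_threads))
```

With the coalesced pattern, neighbouring threads touch neighbouring rows in the same step, which is the access pattern GPUs typically prefer; the sequential pattern keeps each thread on its own contiguous range.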

Performance: Memory Access Patterns

Terminology
Modification: A change to a pipeline program that preserves its semantics but changes the generated code
Variant configuration: Provides a value for each supported modification; defines the generated code
Code variant: Compilation result of a pipeline program

• Derive an efficient code variant for each processor
• Perform an offline calibration phase on a test workload
• Explore the impact of each code modification separately

Variant Optimization - Algorithm
[Figure: variant space on a slow-to-fast scale; the search moves from the initial variant to Variant 1 and then to Variant 2]

Search Algorithm
• Finds an efficient variant with run-time linear in the number of dimensions
• Code modifications are not strictly orthogonal (space not convex)
• Perform multiple iterations of the algorithm to find the best code variant

Optimizing Search Time
Early Termination: Terminate the search if no faster variant is found during an iteration
Feature Ordering: Explore the parameter values of the most critical modifications first
Nested Modifications: Only include code modifications that change the code
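The search described on these slides can be read as a coordinate-wise search over the variant space. The Python sketch below follows that reading and is not taken from the Hawk code base: `search_variant`, the dictionary encoding of a variant configuration, and the `cost` callback (which would stand for compiling and timing a variant) are assumptions. Nested modifications are not modelled.

```python
def search_variant(dimensions, cost, initial, max_iterations=10):
    """Coordinate-wise search for a fast variant configuration.

    dimensions: dict mapping each modification to its candidate values,
                ordered with the most critical modification first.
    cost:       callable returning the measured run-time of a configuration.
    initial:    starting configuration (one value per modification).
    """
    best = dict(initial)
    best_cost = cost(best)
    for _ in range(max_iterations):
        improved = False
        # Explore one modification at a time, keeping all others fixed.
        for name, values in dimensions.items():
            for value in values:
                if value == best[name]:
                    continue
                candidate = dict(best, **{name: value})
                candidate_cost = cost(candidate)
                if candidate_cost < best_cost:
                    best, best_cost, improved = candidate, candidate_cost, True
        if not improved:
            break  # early termination: a full iteration found nothing faster
    return best, best_cost
```

Each iteration evaluates a number of variants that is linear in the number of dimensions. Repeating iterations is what lets the search recover when modifications interact, as noted on the slide about non-orthogonal modifications.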

Evaluation of Search Time
Variant exploration times for SSB Q4.1 on SF1
Our strategy outperforms backtracking by up to two orders of magnitude

Handling Query Dependencies
Reuse Variant Configurations: The variant configuration of a processor serves as the starting point for further tuning
Heuristic-Based Rewrites: Set a query-dependent modification to another parameter value when we expect a performance improvement
Example: Software Predication: Switch to software predication in FILTER when selectivity is 50%
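For readers unfamiliar with the two predication modes, the following Python sketch contrasts a branched filter with a software-predicated one. It only illustrates the control-flow difference; the function names are hypothetical, and the performance effect appears in compiled code on real processors, not in Python.

```python
def filter_branched(age, limit):
    """Branched filter: a conditional jump decides whether a row is written."""
    out = []
    for tid in range(len(age)):
        if age[tid] < limit:
            out.append(tid)
    return out


def filter_predicated(age, limit):
    """Software predication: always write, advance the cursor only on a match."""
    out = [0] * len(age)
    pos = 0
    for tid in range(len(age)):
        out[pos] = tid               # unconditional write
        pos += age[tid] < limit      # a bool adds 0 or 1, no branch needed
    return out[:pos]
```

Both functions return the same row indices. The predicated form trades extra writes for the absence of a data-dependent branch, which matters most when the branch outcome is hard to predict.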

Query Compilation Times
• Compilation times of OpenCL are on the order of hundreds of milliseconds
• Compilation times grow linearly with the number of pipelines in a query

Evaluation Results
[Figure: evaluation results plot]
• Performance difference among variants up to two orders of magnitude
• Hawk reliably identifies efficient code variants for CPUs, GPUs, MICs
• Best code depends on query characteristics

Conclusion
Hawk: A hardware-tailored code generator
Code Variant Generation: Produce custom code variants for each processor
No manual tuning for a specific processor
https://github.com/TU-Berlin-DIMA/Hawk-VLDBJ

Further Publications on Data Management on Modern Hardware:
• Generating Custom Code for Efficient Query Execution on Heterogeneous Processors. Sebastian Breß, Bastian Köcher, Henning Funke, Steffen Zeuch, Tilmann Rabl, Volker Markl. VLDB Journal, 27(6), 797-822, 2018
• Pipelined Query Processing in Coprocessor Environments. SIGMOD 2018
• Efficient and Scalable k-Means on GPUs. Datenbank-Spektrum 2018
• Analyzing Efficient Stream Processing on Modern Hardware. PVLDB 2019

GPU-Accelerated Join Selectivity Estimation using
KDE Models
Paper:
Estimating Join Selectivities using Bandwidth-Optimized Kernel Density Models,
Martin Kiefer, Max Heimel, Sebastian Breß, Volker Markl
PVLDB, Volume 10 Issue 13, September 2017

GPU-Accelerated Kernel Density Estimation for
Join Selectivity Estimation
[Figure: Database Engine with Query Optimizer (Query → Plan) and Statistical Coprocessor (Parameters, Estimates)]

Background: Kernel Density Estimators
Dataset → Sample $S$ → Kernels $K_H$ → Estimate $\hat{P}_H$

$$\hat{P}_H(\vec{x}) = \frac{1}{|S|} \sum_{i=1}^{|S|} K_H(s_i, \vec{x})$$
Average ($\frac{1}{|S|}$) over the kernel contributions ($\sum_{i=1}^{|S|} K_H(s_i, \vec{x})$)

Selectivity of a query region $\Omega$:
$$\mathrm{sel}(\Omega) = \frac{1}{|S|} \sum_{i=1}^{|S|} \int_{\Omega} K_H(s_i, \vec{x}) \, d\vec{x}$$
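The selectivity formula can be evaluated in closed form for a concrete kernel. The Python sketch below assumes a Gaussian product kernel with a diagonal bandwidth matrix (one bandwidth per dimension) and an axis-aligned range query; these choices and the names `gaussian_cdf` and `kde_selectivity` are assumptions for illustration, not a statement about the kernel used in the paper.

```python
import math


def gaussian_cdf(x, mean, bandwidth):
    """CDF of a normal distribution with the given mean and standard deviation."""
    return 0.5 * (1.0 + math.erf((x - mean) / (bandwidth * math.sqrt(2.0))))


def kde_selectivity(sample, bandwidths, lower, upper):
    """Estimate sel(Omega) for the box Omega = [lower, upper].

    sample:     list of points, each a tuple with one value per dimension.
    bandwidths: one bandwidth per dimension (diagonal of H, assumed).
    """
    total = 0.0
    for point in sample:
        # Integral of a product kernel over a box factorises per dimension.
        contribution = 1.0
        for s, h, lo, hi in zip(point, bandwidths, lower, upper):
            contribution *= gaussian_cdf(hi, s, h) - gaussian_cdf(lo, s, h)
        total += contribution
    return total / len(sample)  # average over the kernel contributions
```

A very small bandwidth makes each contribution close to 0 or 1, so the estimate approaches plain sample evaluation; larger bandwidths smooth the sample.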

Background: Kernel Density Estimators for Multi-Dimensional Selectivity Estimation [1]
[Figure panels: Good fit, Overfit, Underfit]
The bandwidth matrix $H$ controls the smoothing applied to the sample
• Range selections over base tables
• Bandwidth optimization based on the estimation error
• Easy model maintenance
[1] Self-Tuning, GPU-Accelerated Kernel Density Models for Multidimensional Selectivity Estimation, SIGMOD’15

The Problem:
Multi-Dimensional Join Selectivity Estimation
• and generalization to multiple joins
• Databases: Independence Assumption
  • Often violated
  • Introduces large errors, potentially bad query plans
• Research: Various Methods (e.g., Sampling, Sketches)
• Our Approach: Kernel Density Estimators

Why KDEs for Join Selectivities?
• Multivariate Estimator
  • No independence assumption
• Hybrid between samples and histograms
  • Small bandwidth: Sample evaluation
  • Increasing bandwidth: More smoothing, increasing bucket sizes
• Bandwidth optimization selects proper bandwidth

The Approach: Join and Base Table Models

Join KDE Model ($P$)
Sample from $R_1 \bowtie_{R_1.A_1 = R_2.A_1} R_2$, Bandwidth $H$
Compute: $P(c_1 \wedge c_2)$

Base Table KDE Models ($P_1$, $P_2$)
Sample from $R_1$, Bandwidth $H$ ($P_1$)
Sample from $R_2$, Bandwidth $H$ ($P_2$)
Compute: $\sum_{v \in A} P_1(A_1 = v \wedge c_1) \cdot P_2(A_2 = v \wedge c_2)$

Easy to evaluate, better estimates
Support for base table and join selectivities
Easy to construct and to maintain

Table Model: Computation Components
Selectivity: [Formula: selectivity estimate of the table model]
• Sum over cross product of two samples
• Invariant Contributions: Contribution of sample points w.r.t. the selection predicate
• Cross Contribution: Distance function on join attributes of sample points
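The three components named on this slide can be combined as in the following Python sketch. It is a simplified illustration, not the estimator from the paper: it assumes a continuous join attribute and Gaussian kernels, for which the integral of the product of two kernels centred at a1 and a2 is a Gaussian density in a1 − a2 with variance h1² + h2². The name `table_model_selectivity` and the input format are hypothetical.

```python
import math


def gaussian_pdf(x, sigma):
    """Density of a zero-mean normal distribution with standard deviation sigma."""
    return math.exp(-0.5 * (x / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def table_model_selectivity(sample1, sample2, h1, h2):
    """Join selectivity estimate from two base table samples.

    sample1, sample2: lists of (join_value, invariant_contribution), where the
                      invariant contribution is the kernel mass of the sample
                      point inside the selection predicate (assumed input).
    h1, h2:           bandwidths on the join attribute of each model.
    """
    sigma = math.sqrt(h1 * h1 + h2 * h2)
    total = 0.0
    # Sum over the cross product of the two samples.
    for a1, p1 in sample1:
        for a2, p2 in sample2:
            cross = gaussian_pdf(a1 - a2, sigma)  # cross contribution
            total += p1 * p2 * cross
    return total / (len(sample1) * len(sample2))
```

The invariant contributions depend only on one sample point and the selection predicate, so they can be computed once per point; only the cross contribution has to be evaluated per pair.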

Table Model: Sample Pruning
[Figure: for each tuple $t_1^{(i)}$ of Sample 1 ($i = 1, \dots, 5$), compute its contribution $p_1^{(i)}$; then filter by contribution, which in the example keeps only $(t_1^{(1)}, p_1^{(1)})$ and $(t_1^{(4)}, p_1^{(4)})$]

Table Model: Cross Pruning
[Figure: Sample 1 with tuples $t_1^{(i)}$ and contributions $p_1^{(i)}$; Sample 2 (sorted on join attribute) with tuples $t_2^{(j)}$ and contributions $p_2^{(j)}$]
Only pairs that satisfy $|t_1^{(i)}.A - t_2^{(j)}.A| < \theta$ are evaluated
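Sample pruning and cross pruning together shrink the set of pairs that need a cross contribution. A minimal Python sketch, assuming each sample point is a `(join_value, contribution)` pair and using binary search on the sorted second sample; `pruned_pairs` and `min_contribution` are hypothetical names, and how the threshold θ is chosen is not covered here.

```python
import bisect


def pruned_pairs(sample1, sample2, theta, min_contribution=0.0):
    """Return the pairs of sample points that survive both pruning steps."""
    # Sample pruning: drop points whose invariant contribution is negligible.
    kept1 = [(a, p) for a, p in sample1 if p > min_contribution]
    kept2 = sorted((a, p) for a, p in sample2 if p > min_contribution)
    keys = [a for a, _ in kept2]  # sorted join attribute values of sample 2

    pairs = []
    for a1, p1 in kept1:
        # Cross pruning: only partners with |a1 - a2| < theta are considered.
        lo = bisect.bisect_right(keys, a1 - theta)
        hi = bisect.bisect_left(keys, a1 + theta)
        for a2, p2 in kept2[lo:hi]:
            pairs.append(((a1, p1), (a2, p2)))
    return pairs
```

Sorting the second sample on the join attribute is what turns the distance test into a contiguous range lookup instead of a scan over all pairs.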

Evaluation: Scaling the Model Size
Dataset: DMV
Query: Q1U
[Figure series: estimators added one per frame: Postgres, Table Sample, Correlated Sample, AGMS Sketch, Join Sample, Join Sample + KDE, Table Sample + KDE]

Runtime: CPU vs GPU
Dataset: IMDB
Workload: Q1U
GPU: Tesla V100
CPU: Intel Xeon Gold 5115
TS+KDE: 4x faster
JS+KDE: 5x faster
[Figure: Average Estimation Time (ms, log scale from 0.1 to 100) vs. Sample Size (Relative to Base Table Size: 1%, 2%, 4%, 8%, 16%) for TS+KDE (GPU), TS+KDE (CPU), JS+KDE (GPU), JS+KDE (CPU)]

Conclusion
• KDE models for join selectivity estimation
• “Getting most out of your sample”
• Based on join or base table KDE models
• Learning hybrid between histograms and samples
• GPU-acceleration possible
• Experiments, data, and code online: github.com/martinkiefer/join-kde
“Estimating Join Selectivities using Bandwidth-Optimized Kernel Density Models”, PVLDB 2017

Further Publications on GPU-Accelerated Kernel Density Estimation:
• Estimating Join Selectivities using Bandwidth-Optimized Kernel Density Models. Martin Kiefer, Max Heimel, Sebastian Breß, Volker Markl. Proceedings of the VLDB Endowment, 10(13), 2017
• Demonstrating Transfer-Efficient Sample Maintenance on Graphics Cards. EDBT 2015
• Self-Tuning, GPU-Accelerated Kernel Density Models for Multidimensional Selectivity Estimation. SIGMOD 2015

BlockJoin: Efficient Matrix
Partitioning Through Joins
Andreas Kunft, Asterios Katsifodimos, Sebastian Schelter, Tilmann Rabl, Volker Markl
PVLDB, Volume 10 Issue 13, September 2017

INTRODUCTION
Common pattern in end-to-end machine learning pipelines:
1. Relational operators, e.g., join and filter the input data
2. User-defined functions, e.g., feature transformation and vectorization
3. Linear algebra operators, e.g., model training and cross-validation
[Figure: pipeline ⋈ → 𝒇 → ML]

Parallel dataflow engines implement
• Relational operators on row-partitioned datasets
• Linear algebra operators on block-partitioned matrices
>> Pipelines combining both require expensive re-partitioning (shuffle) steps

STANDARD WORKFLOW
Products (PK):
P1 | 1
P2 | 2
P3 | 3

Reviews (FK):
P1 | 1 1 1
P2 | 2 2 2
P1 | 3 3 3
P1 | 4 4 4

Products ⋈ Reviews: Join Result (row-wise):
P1 | 1 | 1 1 1
P2 | 2 | 2 2 2
P1 | 1 | 3 3 3
P1 | 1 | 4 4 4

Global row-index (row-wise):
0 | 1 1 1 1
1 | 2 2 2 2
2 | 1 3 3 3
3 | 1 4 4 4

Matrix (block-partitioned, 2×2 blocks):
rows 0-1, columns 0-1: [1 1; 2 2]
rows 0-1, columns 2-3: [1 1; 2 2]
rows 2-3, columns 0-1: [1 3; 1 4]
rows 2-3, columns 2-3: [3 3; 4 4]

STANDARD WORKFLOW - PROBLEMS
Steps: Distributed Join, then Re-Partitioning
Materializes the join result, just to apply a sequential row-index:
• Shuffles data for row-wise partitioning, which is split up immediately
• Puts heavy load on a few machines in case of skewed keys
• Forces early matrix block materialization
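The standard workflow can be mimicked on a single machine to see where the intermediate join result appears. This Python sketch is an illustration only: `standard_workflow`, the dictionary and list inputs, and the square blocks of size `block_size` are assumptions, and the distributed shuffles are not modelled.

```python
def standard_workflow(products, reviews, block_size):
    """Join, assign a global row-index, then cut the result into blocks.

    products: dict mapping primary key -> list of feature values.
    reviews:  list of (foreign_key, list of feature values).
    Returns a dict mapping (block_row, block_col) -> block as a list of rows.
    """
    # Step 1: materialize the full join result row-wise.
    joined = [products[fk] + features
              for fk, features in reviews if fk in products]
    num_cols = len(joined[0]) if joined else 0

    # Step 2: the list position serves as the global row-index; re-partition
    # the materialized rows into blocks.
    blocks = {}
    for r0 in range(0, len(joined), block_size):
        for c0 in range(0, num_cols, block_size):
            key = (r0 // block_size, c0 // block_size)
            blocks[key] = [row[c0:c0 + block_size]
                           for row in joined[r0:r0 + block_size]]
    return blocks
```

Run on the Products and Reviews example from the slide with `block_size=2`, this yields the four 2×2 blocks shown there. The list `joined` is exactly the intermediate result that exists only to receive a row-index.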

HOW CAN WE IMPROVE?
• We propose: Specialized operators at the intersection of linear and relational algebra
• Here, we focus on: Efficient creation of block-partitioned results from normalized data

OUR APPROACH
[Figure: Local Join Kernel: a local TID-join of Products (PK) and Reviews (FK) prunes non-matching tuples (P3) and applies the row-index to both relations. Distributed Fetch Kernel: builds the block-partitioned matrix from the indexed relations.]
Creates block-partitioned results from normalized data
JOIN KERNEL: Local TID-Join on driver to create block-index meta-data
1. Meta-data provides mapping of TID to row-index for both relations
2. Row-index is applied independently: no materialization of join result
FETCH KERNEL: Materialization strategy of matrix blocks based on matrix shape:
• Late materialization: Blocks are materialized on the receiver node
|PK columns| >> |FK columns|
• Early materialization: Blocks are materialized on the sender node
|PK columns| << |FK columns|
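The split into a join kernel and a fetch kernel can be sketched on a single machine as follows. This is a simplified illustration, not the BlockJoin implementation: all function names and data layouts are assumptions, TIDs are taken to be list positions, and the choice between late and early materialization is not modelled.

```python
def join_kernel(product_keys, review_fks):
    """Join on keys only; return (row_index, TID) pairs for both relations."""
    tid_of_key = {key: tid for tid, key in enumerate(product_keys)}
    pk_index, fk_index = [], []
    row_index = 0
    for review_tid, fk in enumerate(review_fks):
        if fk in tid_of_key:  # tuples without a join partner are pruned
            pk_index.append((row_index, tid_of_key[fk]))
            fk_index.append((row_index, review_tid))
            row_index += 1
    return pk_index, fk_index


def fetch_kernel(index, rows, block_size, col_offset, blocks):
    """Write one relation's values directly into their target blocks."""
    for row_index, tid in index:
        for c, value in enumerate(rows[tid]):
            col = col_offset + c
            key = (row_index // block_size, col // block_size)
            cell = (row_index % block_size, col % block_size)
            blocks.setdefault(key, {})[cell] = value


def densify(cells):
    """Turn a {(row, col): value} block into a list of rows."""
    n_rows = 1 + max(r for r, _ in cells)
    n_cols = 1 + max(c for _, c in cells)
    return [[cells[(r, c)] for c in range(n_cols)] for r in range(n_rows)]


def block_join(product_keys, product_rows, review_fks, review_rows, block_size):
    """Build the block-partitioned matrix without materializing the join."""
    pk_index, fk_index = join_kernel(product_keys, review_fks)
    blocks = {}
    # Each relation applies its row-index independently of the other.
    fetch_kernel(pk_index, product_rows, block_size, 0, blocks)
    fetch_kernel(fk_index, review_rows, block_size, len(product_rows[0]), blocks)
    return {key: densify(cells) for key, cells in blocks.items()}
```

Only keys and TIDs pass through the join kernel. The wide feature columns of both relations go straight to their blocks, so no joined row is ever assembled.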

Evaluation

PK – FK JOIN
PK Table: 100k rows, scaling columns
FK Table: 1m rows, 5k columns
[Figure panels: a. Uniform distributed FKs; b. Power-law distributed FKs]
• up to 2.5x speedup
• skew resistant, while the baseline fails

RECAP
BlockJoin is a logically fused operator pipeline
• Separation of matrix index creation and matrix materialization
> No materialization of join result
> Skew resistant
• Cost-based block materialization based on data shape
> Late materialization
> Early materialization

Further Publications:
• BlockJoin: Efficient Matrix Partitioning Through Joins. Andreas Kunft, Asterios Katsifodimos, Sebastian Schelter, Tilmann Rabl, and Volker Markl. PVLDB 10.13, 2017
• Bridging the gap: towards optimization across linear and relational algebra. BeyondMR 2016
• Implicit Parallelism through Deep Language Embedding. SIGMOD 2015
• ScootR: Scaling R Dataframes on Dataflow Systems. SoCC 2018
